TOFASTA

Table of Contents
FUNCTION
DESCRIPTION
EXAMPLE
OUTPUT
INPUT FILES
CONSIDERATIONS
COMMAND-LINE SUMMARY
LOCAL DATA FILES
PARAMETER REFERENCE

FUNCTION

ToFastA converts GCG sequence(s) into FastA format.

DESCRIPTION

Sequence files in GCG format can be converted into a format suitable for use by programs that require sequences in FastA format. ToFastA accepts one or more GCG sequences as input and by default creates one output file containing all the sequences in FastA format. However, NCBI's BLAST family of programs accepts only one sequence per input file. Therefore, if you put -BLAst on the command line, ToFastA writes your output into separate files, naming each output file with the input sequence's name and the file name extension .tfa.
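The choice between one combined output file and one file per sequence can be summarized in a few lines of Python. This is a minimal sketch, not ToFastA's code. The helper name plan_output_files and the example names GBETA and globins.tfa are hypothetical, and lower-casing the sequence name is an assumption that mirrors the ggamma.tfa name in the EXAMPLE section.

```python
def plan_output_files(seq_names, outfile, blast=False, ext=".tfa"):
    """Map each output file name to the sequence names written into it.

    Hypothetical helper that models only the documented file layout.
    """
    if not blast:
        # Default: all sequences go into a single output file.
        return {outfile: list(seq_names)}
    # -BLAst: one file per sequence, named after the sequence plus the extension.
    # Lower-casing is an assumption; two sequences with the same name would collide.
    return {name.lower() + ext: [name] for name in seq_names}


print(plan_output_files(["GGAMMA", "GBETA"], "globins.tfa"))
# {'globins.tfa': ['GGAMMA', 'GBETA']}
print(plan_output_files(["GGAMMA", "GBETA"], "globins.tfa", blast=True))
# {'ggamma.tfa': ['GGAMMA'], 'gbeta.tfa': ['GBETA']}
```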

EXAMPLE

Here is a session using ToFastA to convert the sequence ggamma.pep into FastA format.


% tofasta

 TOFASTA of what input sequence(s) ?  ggamma.pep

                  Begin (* 1 *) ?
                End (*   148 *) ?
               Reverse (* No *) ?

 What should I call the output file (* ggamma.tfa *) ?

               GGAMMA     148 characters.

 148 symbols written into "ggamma.tfa".

%

OUTPUT

Here is the output file:


>GGAMMA TRANSLATE of: gamma.seq check: 6474 from: 2179 to: 2270
MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPK
VKAHGKKVLTSLGDAIKHLDDLKGTFAQLSELHCDKLHVDPENFKLLGNVLVTVLAIHFG
KEFTPEVQASWQKMVTGVASALSSRYH*
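The record above consists of a ">" header line carrying the sequence name and a description, followed by the residues in lines of 60 characters. The Python sketch below reproduces that layout. The function name to_fasta is hypothetical, and the fixed width of 60 is taken from this example rather than from a documented ToFastA setting.

```python
def to_fasta(name, description, residues, width=60):
    """Format one FastA record: a '>' header line, then fixed-width sequence lines."""
    header = f">{name} {description}".rstrip()  # drop the trailing space if no description
    body = [residues[i:i + width] for i in range(0, len(residues), width)]
    return "\n".join([header, *body]) + "\n"


seq = ("MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPK"
       "VKAHGKKVLTSLGDAIKHLDDLKGTFAQLSELHCDKLHVDPENFKLLGNVLVTVLAIHFG"
       "KEFTPEVQASWQKMVTGVASALSSRYH*")
print(to_fasta("GGAMMA", "TRANSLATE of: gamma.seq check: 6474 from: 2179 to: 2270", seq), end="")
```

Run as shown, it prints the same four lines as the output file above, and the 148 residues (including the final *) match the count reported in the session.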

INPUT FILES

ToFastA accepts one or more nucleotide or protein sequences as input. You can specify multiple sequences in several ways: with a list file, for example @project.list; with an MSF or RSF file, for example project.msf{*}; or with a sequence specification that contains an asterisk (*) wildcard, for example GenEMBL:*.
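The three ways of naming multiple sequences can be told apart by their shape. The Python sketch below classifies a specification using only the literal forms shown above; the function input_kind is hypothetical, and it is an illustration of the notation, not GCG's actual specification parser.

```python
def input_kind(spec):
    """Classify an input specification by the forms described above (hypothetical)."""
    if spec.startswith("@"):
        return "list file"        # e.g. @project.list
    if spec.endswith("{*}"):
        return "MSF/RSF file"     # e.g. project.msf{*}; checked before the wildcard test
    if "*" in spec:
        return "wildcard"         # e.g. GenEMBL:*
    return "single sequence"      # e.g. ggamma.pep


for spec in ["@project.list", "project.msf{*}", "GenEMBL:*", "ggamma.pep"]:
    print(spec, "->", input_kind(spec))
```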

If the input is a list file, ToFastA applies any Begin, End, and Strand (for nucleic acids) attributes it finds in that file. With one exception, the command-line qualifiers -BEGin, -END, -REVerse, and -NOREVerse override any conflicting attributes in the list file. The exception: if an -END value given on the command line is less than a Begin attribute in the list file, the output sequence begins and ends at the position given by that Begin attribute, so it contains only that single base or residue.
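The precedence rule for the range can be expressed as a small Python function. This is a sketch under stated assumptions: resolve_range is a hypothetical name, the list file is assumed to supply both Begin and End, strand handling is omitted, and the exception is applied exactly as worded above (whenever a command-line -END is below the list-file Begin). Cases the text does not cover, such as a command-line -BEGin greater than -END, are not modeled.

```python
from typing import Optional, Tuple


def resolve_range(list_begin: int, list_end: int,
                  cmd_begin: Optional[int] = None,
                  cmd_end: Optional[int] = None) -> Tuple[int, int]:
    """Combine list-file Begin/End attributes with command-line -BEGin/-END."""
    # The single documented exception: a command-line -END below the list
    # file's Begin yields a one-position range at that Begin.
    if cmd_end is not None and cmd_end < list_begin:
        return list_begin, list_begin
    # Otherwise command-line values override conflicting list-file attributes.
    begin = cmd_begin if cmd_begin is not None else list_begin
    end = cmd_end if cmd_end is not None else list_end
    return begin, end


print(resolve_range(10, 100, cmd_end=50))  # (10, 50)
print(resolve_range(10, 100, cmd_end=5))   # (10, 10)
```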

CONSIDERATIONS

To be compatible with NCBI's BLAST server, ToFastA deletes all non-alphabetic characters except periods (.), tildes (~), and asterisks (*). It changes periods and tildes into hyphens (-) to represent gaps, and it keeps asterisks because NCBI's BLAST server tolerates them in protein sequences, where they represent translated stop codons. At the time of this writing, we are not aware of the character requirements of other applications that use FastA format.
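The character rules above amount to a per-character filter. The Python sketch below applies them to a residue string; clean_for_blast is a hypothetical name, "alphabetic" is taken to mean ASCII letters (an assumption), and letter case is left unchanged.

```python
def clean_for_blast(residues):
    """Apply the character rules described above (sketch, not GCG's code)."""
    out = []
    for ch in residues:
        if ch in ".~":
            out.append("-")   # periods and tildes become gap hyphens
        elif ch == "*" or (ch.isascii() and ch.isalpha()):
            out.append(ch)    # letters and stop-codon asterisks are kept
        # anything else (digits, spaces, other punctuation) is deleted
    return "".join(out)


print(clean_for_blast("GATC..~TT 12*"))  # GATC---TT*
```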

FastA format does not differentiate protein from nucleotide sequences. FastA format is not rigorously defined, so requirements may differ from one application to another. Please call us at (608) 231-5200 or send us e-mail at Help@GCG.Com if you find programs that do not work with the output of ToFastA.

COMMAND-LINE SUMMARY

All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose optional parts of a parameter; for example, [-INfile=]ggamma.pep means that you can omit -INfile= and type just the file name. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.
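The abbreviation convention can be checked mechanically: a typed qualifier must contain at least the capitalized prefix and must not go beyond the full name. The Python sketch below illustrates that rule. The function matches_parameter is hypothetical, matching is assumed to be case-insensitive, and values after "=" are not handled.

```python
def matches_parameter(typed, spec):
    """Return True if `typed` is an acceptable abbreviation of `spec`.

    `spec` is written as in the summary, e.g. "-BEGin": the capitalized
    prefix ("-BEG") must be typed; the remaining letters are optional.
    """
    required = ""
    for ch in spec:
        if ch.islower():
            break
        required += ch            # collect the leading run of non-lowercase characters
    typed_u, spec_u = typed.upper(), spec.upper()
    # At least the required prefix, and nothing beyond the full parameter name.
    return typed_u.startswith(required) and spec_u.startswith(typed_u)


print(matches_parameter("-beg", "-BEGin"))  # True
print(matches_parameter("-be", "-BEGin"))   # False
```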


Minimal Syntax: % tofasta [-INfile=]ggamma.pep -Default

Prompted Parameters:

-BEGin=1 -END=148       sets the range of interest (single sequences only)
-REVerse                uses the reverse strand (single sequences only)
[-OUTfile=]ggamma.tfa   names the output file

Local Data Files:  None

Optional Parameters:

-BLAst           creates a separate output file for each sequence
-EXTension=.tfa  uses .tfa as a file name extension
-NOMONitor       suppresses the screen monitor

LOCAL DATA FILES

None.

PARAMETER REFERENCE

You can set the parameters listed below from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.

-BLAst

creates a separate output file for each sequence in the input file. The output file names consist of the names of the sequences in the input file followed by a .tfa extension.

-EXTension=.tfa

changes the output file name extension from the default, .tfa.

-MONitor

This program normally monitors its progress on your screen. However, when you use -Default to suppress all program interaction, you also suppress the monitor. You can turn it back on with this parameter. If you are running the program in batch, the monitor will appear in the log file.

Printed: December 9, 1998 16:27 (1162)

Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com

Copyright (c) 1982, 1983, 1985, 1986, 1987, 1989, 1991, 1994, 1995, 1996, 1997, 1998 Genetics Computer Group Inc., a wholly owned subsidiary of Oxford Molecular Group, Inc. All rights reserved.

Licenses and Trademarks

Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

Genetics Computer Group

www.gcg.com