GCGToBLAST

Table of Contents
FUNCTION
DESCRIPTION
EXAMPLE
OUTPUT
INPUT FILES
RELATED PROGRAMS
RESTRICTIONS
SPECIFYING DATABASES TO BLAST
CONSIDERATIONS
COMMAND-LINE SUMMARY
LOCAL DATA FILES
PARAMETER REFERENCE

FUNCTION

GCGToBLAST combines any set of GCG sequences into a database that you can search with BLAST.

DESCRIPTION

BLAST can search only databases that have been compressed into a special format. Such databases must be searched in their entirety. GCGToBLAST is provided to allow you to create a BLAST-searchable database from a group of sequences that interest you.

GCGToBLAST accepts any GCG multiple sequence specification as input and creates the three or four output files necessary for BLAST. These files share a common base name (the database name) and must be kept together in the same directory.

The output is written into your current working directory. If you want your output written into another directory, use the -DIRectory command-line parameter, for example -DIRectory=/usr/user/burgess/seq/.

EXAMPLE

Here is a session with GCGToBLAST that converts all the sequences specified by hsp70.list into a database suitable for input to BLAST.


% gcgtoblast

 GCGTOBLAST of what input sequence(s) ?  @hsp70.list

 What should I call the database ?  hsp70

        PIR1:JU0062     675 characters.
        PIR2:A25646     634 characters.

        ///////////////////////////////

        PIR1:S05776     682 characters.
        PIR2:B36590     642 characters.
        PIR1:S29261     638 characters.

 GCGTOBLAST complete:

         Sequences: 25
           Symbols: 16,073
   Output files in: .

%

OUTPUT

GCGToBLAST writes three or four files in your current working directory unless you redirect the output with the -DIRectory parameter.

INPUT FILES

GCGToBLAST accepts multiple sequences of the same type. You can specify multiple sequences in a number of ways: by using a list file, for example @project.list; by using an MSF or RSF file, for example project.msf{*}; or by using a sequence specification with an asterisk (*) wildcard, for example GenEMBL:*.
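The three forms of multiple-sequence specification described above can be told apart by their shape alone. The Python sketch below is illustrative only. The function name classify_input and its return labels are hypothetical. It inspects only the text of the specification, not the files or databases it names.

```python
import re


def classify_input(spec: str) -> str:
    """Classify a multiple-sequence specification by its form.

    Hypothetical helper: the labels returned are illustrative,
    not GCG terminology, and no files are opened.
    """
    if spec.startswith("@"):
        return "list file"        # for example @project.list
    if re.search(r"\.(msf|rsf)\{\*\}$", spec, re.IGNORECASE):
        return "msf/rsf file"     # for example project.msf{*}
    if "*" in spec:
        return "wildcard"         # for example GenEMBL:*
    return "single sequence"      # anything else, for example PIR1:JU0062
```

The MSF/RSF test runs before the wildcard test because project.msf{*} also contains an asterisk.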

RELATED PROGRAMS

DataSet creates a GCG data library from any set of sequences in GCG format.

BLAST searches one or more nucleic acid or protein databases for sequences similar to one or more query sequences of any type. BLAST can produce gapped alignments for the matches it finds.

RESTRICTIONS

All the sequences compressed by GCGToBLAST must be of the same type, that is, all nucleotide or all protein. The output files must be kept together in the same directory.
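Because mixed input is not allowed, you may want to check a set of sequences before you run GCGToBLAST. This Python sketch is a hypothetical pre-check and is not part of GCGToBLAST. The names looks_like_nucleotide and check_same_type are made up for illustration. The 90 percent threshold for calling a sequence nucleotide is an arbitrary assumption.

```python
from typing import Dict

# Assumption: an illustrative heuristic; GCG's own type test may differ.
NUCLEOTIDE_SYMBOLS = set("ACGTUN")


def looks_like_nucleotide(seq: str, threshold: float = 0.9) -> bool:
    """Call seq nucleotide if at least `threshold` of its letters are A/C/G/T/U/N."""
    letters = [c for c in seq.upper() if c.isalpha()]
    if not letters:
        return False
    hits = sum(c in NUCLEOTIDE_SYMBOLS for c in letters)
    return hits / len(letters) >= threshold


def check_same_type(seqs: Dict[str, str]) -> str:
    """Return 'nucleotide' or 'protein'; raise ValueError if the set is empty or mixed."""
    if not seqs:
        raise ValueError("no sequences given")
    kinds = {name: "nucleotide" if looks_like_nucleotide(s) else "protein"
             for name, s in seqs.items()}
    distinct = set(kinds.values())
    if len(distinct) > 1:
        raise ValueError(f"mixed sequence types: {kinds}")
    return distinct.pop()
```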

SPECIFYING DATABASES TO BLAST

By default, BLAST does local searches by reading files from the directory whose logical name is BLASTDB. Each database known to BLAST is named in one of three local data files: blast.rdbs, blast.ldbs, and blast.sdbs. If your BLAST-searchable database is in some other directory, you must name that directory as part of the search set specification you give to BLAST. For instance, you could use a specification like /usr/user/burgess/seq/mydatabase, which includes both the directory name and the name of the BLAST-searchable database (mydatabase in this example).
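The following Python sketch assembles a directory-qualified specification like the one above. It also checks that the database files are present together in that directory. The sketch assumes that every output file is named database.extension, and this manual does not confirm that naming. The function name blast_search_spec is hypothetical.

```python
from pathlib import Path


def blast_search_spec(directory: str, dbname: str) -> str:
    """Return directory/dbname for use as a BLAST search set specification.

    Assumption: GCGToBLAST names each output file <dbname>.<extension>.
    Raises FileNotFoundError if no such files are in the directory.
    """
    folder = Path(directory)
    if not any(folder.glob(f"{dbname}.*")):
        raise FileNotFoundError(f"no files named {dbname}.* in {folder}")
    return str(folder / dbname)
```

If the files for mydatabase are in /usr/user/burgess/seq, blast_search_spec("/usr/user/burgess/seq", "mydatabase") would return /usr/user/burgess/seq/mydatabase.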

CONSIDERATIONS

The compressed representation of nucleotide sequences in the output from GCGToBLAST is not rich enough to represent nucleotide ambiguity codes accurately. So, in addition to the compressed form of the database, GCGToBLAST writes an ASCII version of the data in FastA format. If the sequences in the database are nucleotide and contain ambiguous symbols, GCGToBLAST saves this file. BLAST uses it to display any sequences found in a search that contain ambiguous symbols.
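To see why a compact nucleotide encoding loses ambiguity codes, consider a packing that stores each base in 2 bits. Two bits hold only four values, and A, C, G, and T (or U) use all of them, so a symbol such as N or R has no code. This manual does not describe the compressed format GCGToBLAST actually writes. The 2-bit scheme, the CODE table, and the function names in this Python sketch are assumptions chosen for illustration.

```python
# Assumption: a 2-bit-per-base packing chosen for illustration; the
# compressed format GCGToBLAST really writes is not described here.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3, "U": 3}


def has_ambiguity(seq: str) -> bool:
    """True if seq uses any symbol other than A, C, G, T, or U."""
    return any(c not in CODE for c in seq.upper())


def pack_2bit(seq: str) -> bytes:
    """Pack four bases per byte, first base in the high bits.

    Raises ValueError for ambiguity codes, which have no 2-bit value.
    """
    seq = seq.upper()
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for c in chunk:
            if c not in CODE:
                raise ValueError(f"no 2-bit code for ambiguity symbol {c!r}")
            byte = (byte << 2) | CODE[c]
        # Left-align a short final chunk so unused low bits are zero.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)
```

In this model, any sequence for which has_ambiguity returns True cannot be stored faithfully in the packed form. That is why a separate ASCII copy is needed to display it correctly.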

The FastA format file can be large. If the display of the correct original ambiguity codes in your segment pair output is not important to you, you might want to delete this file or use GCGToBLAST with the -DELete parameter so that GCGToBLAST will delete it for you once the database is created. GCGToBLAST automatically deletes this file if the sequences in the data set are proteins, since the compressed amino acid codes can express ambiguity.

COMMAND-LINE SUMMARY

All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.
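The abbreviation rule for parameter names can be expressed as a small matching function. In this Python sketch, min_abbrev and matches are hypothetical names. The sketch assumes that any longer, case-insensitive prefix of the full parameter name is also accepted. It ignores any value after an equals sign.

```python
def min_abbrev(spec: str) -> str:
    """Return the leading capitals of a name such as 'DIRectory' -> 'DIR'."""
    n = 0
    while n < len(spec) and spec[n].isupper():
        n += 1
    return spec[:n]


def matches(typed: str, spec: str) -> bool:
    """True if `typed` (for example '-dir=/tmp') selects parameter `spec`.

    Assumption: any case-insensitive prefix of the full name that is at
    least as long as the capitalized part is accepted.
    """
    word = typed.lstrip("-").split("=", 1)[0].lower()
    return len(word) >= len(min_abbrev(spec)) and spec.lower().startswith(word)
```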


Minimal Syntax: % gcgtoblast [-INfile=]@hsp70.list [-OUTfile=]hsp70 -Default

Prompted Parameters: None

Local Data Files:  None

Optional Switches:

-DIRectory=dirname writes into a directory other than the current directory
-NOMONitor         suppresses the screen monitor
-NOSUMmary         suppresses the screen summary
-OLDblast          creates a database for pre-2.0 versions of BLAST
  -NODELete        keeps the FastA file instead of deleting it
-BATch             submits the program to run in the batch queue
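If you drive GCGToBLAST from a script, you can assemble the switches above into an argument list. This Python sketch only builds the list from the documented parameter names. The function gcgtoblast_argv and its keyword arguments are hypothetical. It does not check whether your installation accepts every combination.

```python
from typing import List, Optional


def gcgtoblast_argv(infile: str, dbname: str, directory: Optional[str] = None,
                    oldblast: bool = False, nodelete: bool = False,
                    nomonitor: bool = False, nosummary: bool = False) -> List[str]:
    """Build (but do not run) a non-interactive gcgtoblast command line."""
    argv = ["gcgtoblast", f"-INfile={infile}", f"-OUTfile={dbname}", "-Default"]
    if directory:
        argv.append(f"-DIRectory={directory}")
    if oldblast:
        argv.append("-OLDblast")
    if nodelete:
        argv.append("-NODELete")
    if nomonitor:
        argv.append("-NOMONitor")
    if nosummary:
        argv.append("-NOSUMmary")
    return argv
```

Assuming gcgtoblast is on your PATH, you could pass the resulting list to Python's subprocess.run.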

LOCAL DATA FILES

None.

PARAMETER REFERENCE

You can set the parameters listed below from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.

-DIRectory=DirName

This parameter allows you to redirect the output files written by GCGToBLAST to a directory other than your current working directory.

-MONitor

This program normally monitors its progress on your screen. However, when you use -Default to suppress all program interaction, you also suppress the monitor. You can turn it back on with this parameter. If you are running the program in batch, the monitor will appear in the log file.

-SUMmary

writes a summary of the program's work to the screen when you've used -Default to suppress all program interaction. A summary typically displays at the end of a program run interactively. You can suppress the summary for a program run interactively with -NOSUMmary.

You can also use this parameter to cause a summary of the program's work to be written in the log file of a program run in batch.
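The interplay of -Default, -MONitor, and -SUMmary reduces to a simple rule. This Python sketch encodes one reading of the two descriptions above for an interactive run. The name screen_output is hypothetical. Batch runs, where the monitor and summary go to the log file, are not modeled.

```python
from typing import Optional, Tuple


def screen_output(default: bool, monitor: Optional[bool] = None,
                  summary: Optional[bool] = None) -> Tuple[bool, bool]:
    """Return (show_monitor, show_summary) for an interactive run.

    monitor/summary are None when neither -MONitor/-NOMONitor nor
    -SUMmary/-NOSUMmary was given; an explicit switch always wins.
    Otherwise both appear, unless -Default suppresses them.
    """
    show_monitor = (not default) if monitor is None else monitor
    show_summary = (not default) if summary is None else summary
    return show_monitor, show_summary
```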

-OLDblast

directs GCGToBLAST to create databases compatible with versions of BLAST earlier than 2.0. Such databases are incompatible with BLAST version 2.0 and later.

-DELete

directs GCGToBLAST to delete the FastA-format version of a nucleotide sequence database that it creates in addition to the compressed database. You would do this to free disk space if displaying the correct original ambiguity codes in the output of a BLAST search is not important to you. The FastA file is deleted by default. If you use the -OLDblast parameter, the FastA file is deleted by default only when you are creating a protein database. Use -NODELete if you want to retain the FastA file when you are not building a nucleic acid database for a pre-2.0 version of BLAST.
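Whether the FastA file is deleted by default depends on both the database type and -OLDblast. This Python sketch encodes one reading of the description above. The name fasta_file_deleted is hypothetical. The manual does not say what happens when -DELete and -NODELete are both given, so letting -NODELete win is an assumption.

```python
def fasta_file_deleted(protein: bool, oldblast: bool = False,
                       delete: bool = False, nodelete: bool = False) -> bool:
    """Return True if the FastA copy is removed after the database is built."""
    if nodelete:
        return False      # -NODELete keeps the file (assumed to beat -DELete)
    if delete:
        return True       # -DELete removes it
    if not oldblast:
        return True       # BLAST 2.0 databases: deleted by default
    return protein        # -OLDblast: deleted by default only for protein
```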

-BATch

submits the program to the batch queue for processing after prompting you for all required user inputs. Any information that would normally appear on the screen while the program is running is written into a log file. Whether that log file is deleted, printed, or saved to your current directory depends on how your system manager has set up the command that submits this program to the batch queue. All output files are written to your current directory unless you redirect them with the -DIRectory parameter.

Printed: December 9, 1998 16:29 (1162)

Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com

Copyright (c) 1982, 1983, 1985, 1986, 1987, 1989, 1991, 1994, 1995, 1996, 1997, 1998 Genetics Computer Group Inc., a wholly owned subsidiary of Oxford Molecular Group, Inc. All rights reserved.

Licenses and Trademarks Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

Genetics Computer Group

www.gcg.com