Short Descriptions

This appendix lists and briefly describes programs in the Wisconsin Package. Programs are grouped by function and may appear under multiple functional headings. For more information on using these programs, see the Program Manual.

+ - Denotes a program that generates graphics which require a graphics output device.

Comparison

Pairwise Comparison
Gap Uses the algorithm of Needleman and Wunsch to find the alignment of two complete sequences that maximizes the number of matches and minimizes the number of gaps.
BestFit Makes an optimal alignment of the best segment of similarity between two sequences. Optimal alignments are found by inserting gaps to maximize the number of matches using the local homology algorithm of Smith and Waterman.
FrameAlign Creates an optimal alignment of the best segment of similarity (local alignment) between a protein sequence and the codons in all possible reading frames on a single strand of a nucleotide sequence. Optimal alignments may include reading frame shifts.
Compare Compares two protein or nucleic acid sequences and creates a file of the points of similarity between them for plotting with DotPlot. Compare finds the points using either a window/stringency or a word match criterion. The word comparison is 1,000 times faster than the window/stringency comparison, but somewhat less sensitive.
DotPlot+ Makes a dot-plot with the output file from Compare or StemLoop.
GapShow+ Displays an alignment by making a graph that shows the distribution of similarities and gaps. The two input sequences should be aligned with either Gap or BestFit before they are given to GapShow for display.
ProfileGap Makes an optimal alignment between a profile and one or more sequences.
Multiple Comparison
PileUp+ Creates a multiple sequence alignment from a group of related sequences using progressive, pairwise alignments. It can also plot a tree showing the clustering relationships used to create the alignment.
SeqLab Is the graphical user interface for the Wisconsin Package. For additional information, refer to the SeqLab Guide.
PlotSimilarity+ Plots the running average of the similarity among the sequences in a multiple sequence alignment.
Pretty Displays multiple sequence alignments and calculates a consensus sequence. It does not create the alignment; it simply displays it.
PrettyBox+ Displays multiple sequence alignments as shaded boxes in PostScript format for printing or displaying with a PostScript-compatible device. PrettyBox optionally calculates a consensus sequence. The program does not create the alignment; it simply displays it.
MEME (Multiple EM for Motif Elicitation) Finds conserved motifs in a group of unaligned sequences. MEME saves these motifs as a set of profiles. You can search a database of sequences with these profiles using the MotifSearch program.
ProfileMake Creates a position-specific scoring table, called a profile, that quantitatively represents the information from a group of aligned sequences. The profile can then be used for database searching (ProfileSearch) or sequence alignment (ProfileGap).
ProfileGap Makes an optimal alignment between a profile and one or more sequences.
Overlap Compares two sets of DNA sequences to each other in both orientations using a WordSearch style comparison.
NoOverlap Identifies the places where a group of nucleotide sequences do not share any common subsequences.
OldDistances Makes a table of the pairwise similarities within a group of aligned sequences.
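
As a rough illustration of the difference between the global alignment made by Gap and the local alignment made by BestFit, the following sketch computes only the alignment scores with the two dynamic-programming recurrences. The function names and the scoring values (match +1, mismatch -1, gap -1 per position) are assumptions chosen for the example; they are not the scoring matrices or gap penalties used by the Wisconsin Package programs.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch style score: both sequences are aligned end to end."""
    # Row 0: aligning an empty prefix of `a` against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]  # column 0: a[:i] against an empty prefix of `b`
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman style score: best-scoring pair of segments."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero, so an alignment can start anywhere.
            cell = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best
```

The global score penalizes every unmatched end, while the local score ignores everything outside the best segment, which is why a local method is the better choice when only part of each sequence is expected to be similar.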

Database Searching

Reference Searching
LookUp Identifies sequence database entries by name, accession number, author, organism, keyword, title, reference, feature, definition, length, or date. The output is a list of sequences.
StringSearch Identifies sequences by searching for character patterns such as "globin" or "human" in the sequence documentation.
Names Identifies GCG data files and sequence entries by name. It can show you what set of sequences is implied by any sequence specification.
Sequence Searching
BLAST Searches for sequences similar to a query sequence. The query and the database searched can be either peptide or nucleic acid in any combination. BLAST can search databases on your own computer or databases maintained at the National Center for Biotechnology Information (NCBI) in Bethesda, Maryland, USA.
NetBLAST Searches for sequences similar to a query sequence. The query and the database searched can be either peptide or nucleic acid in any combination. NetBLAST can search only databases maintained at the National Center for Biotechnology Information (NCBI) in Bethesda, Maryland, USA.
FastA Does a Pearson and Lipman search for similarity between a query sequence and a group of sequences of the same type (nucleic acid or protein). For nucleotide searches, FastA may be more sensitive than BLAST.
SSearch Does a rigorous Smith-Waterman search for similarity between a query sequence and a group of sequences of the same type (nucleic acid or protein). This may be the most sensitive method available for similarity searches. Compared to BLAST and FastA, it is very slow.
TFastA Does a Pearson and Lipman search for similarity between a protein query sequence and any group of nucleotide sequences. TFastA translates the nucleotide sequences in all six reading frames before performing the comparison. It is designed to answer the question, "What implied protein sequences in a nucleotide sequence database are similar to my protein sequence?"
TFastX Does a Pearson and Lipman search for similarity between a protein query sequence and any group of nucleotide sequences, taking frameshifts into account. It is designed to be a replacement for TFastA, and like TFastA, it is designed to answer the question, "What implied protein sequences in a nucleotide sequence database are similar to my protein sequence?" TFastA treats each of the six reading frames of a nucleotide sequence as a separate sequence, resulting in three separate alignments for each strand. TFastX, on the other hand, compares the protein query sequence to only one translated protein per strand of the nucleotide sequence, resulting in one alignment per strand. It calculates a similarity score for alignments that takes frameshifts into account, allowing it to "join" short regions separated by frameshifts into a single long alignment. TFastX may alert you to more meaningful hits than TFastA does when the nucleotide sequences contain frameshift errors.
FastX Does a Pearson and Lipman search for similarity between a nucleotide query sequence and a group of protein sequences, taking frameshifts into account. FastX translates both strands of the nucleic acid sequence before performing the comparison. It is designed to answer the question, "What implied protein sequences in my nucleic acid sequence are similar to sequences in a protein database?"
FrameSearch+ Searches a group of protein sequences for similarity to one or more nucleotide query sequences, or searches a group of nucleotide sequences for similarity to one or more protein query sequences. For each sequence comparison, the program finds an optimal alignment between the protein sequence and all possible codons on each strand of the nucleotide sequence. Optimal alignments may include reading frame shifts.
MotifSearch Uses a set of profiles (representing similarities within a family of sequences) as a query to either a) search a database for new sequences similar to the original family, or b) annotate the members of the original family with details of the matches between the profiles and each of the members. Normally, the profiles are created with the program MEME.
ProfileSearch Uses a profile (representing a group of aligned sequences) as a query to search the database for new sequences with similarity to the group. The profile is created with the program ProfileMake.
ProfileSegments Makes optimal alignments showing the segments of similarity found by ProfileSearch.
FindPatterns Identifies sequences that contain short patterns like GAATTC or YRYRYRYR. You can define the patterns ambiguously and allow mismatches. You can provide the patterns in a file or simply type them in from the terminal.
Motifs Looks for sequence motifs by searching through proteins for the patterns defined in the PROSITE Dictionary of Protein Sites and Patterns. Motifs can display an abstract of the current literature on each of the motifs it finds.
WordSearch+ Identifies sequences in the database that share large numbers of common words in the same register of comparison with your query sequence. The output of WordSearch can be displayed with Segments.
Segments Aligns and displays the segments of similarity found by WordSearch.
Sequence Retrieval
Fetch Copies GCG sequences or data files from the GCG database into your directory or displays them on your terminal screen.
NetFetch Retrieves entries from NCBI listed in a NetBLAST output file. It can also be used to retrieve entries individually by entry name or accession number. The output of NetFetch is an RSF file.
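
The kind of search FindPatterns performs (listed above under Sequence Searching) can be sketched as a scan for an ambiguous nucleotide pattern that tolerates a limited number of mismatches. The function name `find_pattern` and its parameters are hypothetical, and the sketch scans only the strand it is given; it is not the FindPatterns implementation or its pattern syntax.

```python
# IUPAC nucleotide ambiguity codes mapped to the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_pattern(sequence, pattern, max_mismatches=0):
    """Return 0-based start positions where `pattern` fits `sequence`."""
    sequence = sequence.upper()
    allowed = [IUPAC[symbol] for symbol in pattern.upper()]
    hits = []
    for start in range(len(sequence) - len(allowed) + 1):
        window = sequence[start:start + len(allowed)]
        # Count positions whose base is not permitted by the pattern symbol.
        mismatches = sum(base not in ok for base, ok in zip(window, allowed))
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits
```

For example, `find_pattern("AAGAATTCAA", "GAATTC")` reports the single site at position 2, and overlapping hits are reported individually.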

Editing and Publication

SeqEd Is an interactive editor for entering and modifying sequences and for assembling parts of existing sequences into new genetic constructs. You can enter sequences from the keyboard or from a digitizer.
SeqLab Is the graphical user interface for the Wisconsin Package. For additional information, refer to the SeqLab Guide.
Assemble Constructs new sequences from pieces of existing sequences. It concatenates the fragments you specify and writes them out as a new sequence file. SeqEd is a better tool for assembling sequences interactively, but Assemble is best for assembling sequences from fragments defined in a list file.
Pretty Displays multiple sequence alignments and calculates a consensus sequence. It does not create the alignment; it simply displays it.
PrettyBox+ Displays multiple sequence alignments as shaded boxes in PostScript format for printing or displaying with a PostScript-compatible device. PrettyBox optionally calculates a consensus sequence. The program does not create the alignment; it simply displays it.
Publish Arranges sequences for publication. It creates a text file that you can modify to your own needs with a text editor.
PlasmidMap+ Draws a circular plot of a plasmid construct. It can display restriction patterns, inserts, and known genetic elements. The plot is suitable for publication, record keeping, or analysis. It is drawn from one or more labeling files such as those written by MapSort.
LineUp Is a screen editor for editing multiple sequence alignments. You can edit up to 30 sequences simultaneously. New sequences can be typed in by hand or added from existing sequence files. A consensus sequence identifies places where the sequences are in conflict.
Figure+ Makes figures and posters by drawing graphics and text together. You can include output from other Wisconsin Package graphics programs as part of a figure.
Red Is a text formatter that creates publication-quality documents on a PostScript printer such as the Apple LaserWriter. You can use 13 different fonts, scaling each font to any size. You can also include figures and graphics from any Wisconsin Package graphics program within the text of the document.

Evolution

PAUPSearch Provides a GCG interface to the tree-searching options in PAUP (Phylogenetic Analysis Using Parsimony). Starting with a set of aligned sequences, you can search for phylogenetic trees that are optimal according to parsimony, distance, or maximum likelihood criteria; reconstruct a neighbor-joining tree; or perform a bootstrap analysis. The program PAUPDisplay can produce a graphical version of a PAUPSearch trees file. PAUP is the copyrighted property of the Smithsonian Institution. Use the program Fetch to obtain a copy of paup-license.txt to read about rights and limitations for using PAUP.
PAUPDisplay+ Provides a GCG interface to tree manipulation, diagnosis, and display options in PAUP (Phylogenetic Analysis Using Parsimony). Starting with a trees file that contains a sequence alignment and one or more trees reconstructed from this alignment (such as the output from PAUPSearch), you can plot the tree(s); compute the score of the tree(s) according to the criteria of parsimony, distance, or maximum likelihood; or calculate a consensus tree (two or more input trees). PAUPDisplay can also plot the trees from a GrowTree trees file. PAUP is the copyrighted property of the Smithsonian Institution. Use the program Fetch to obtain a copy of paup-license.txt to read about rights and limitations for using PAUP.
Distances Creates a table of the pairwise distances within a group of aligned sequences.
GrowTree+ Creates a phylogenetic tree from a distance matrix created by Distances using either the UPGMA or neighbor-joining method. You can create a text or graphics output file.
Diverge Estimates the pairwise number of synonymous and nonsynonymous substitutions per site between two or more aligned nucleic acid sequences that code for proteins. It uses a variant of the method published by Li et al.
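
To make the idea of a pairwise distance table concrete, here is a minimal sketch that uses the simplest possible measure: the uncorrected proportion of differing positions between two aligned sequences. The function names, the choice of measure, and the gap characters are assumptions for illustration; they are not necessarily what Distances computes.

```python
def p_distance(seq1, seq2, gap_chars="-."):
    """Fraction of compared alignment columns where the two symbols differ."""
    compared = differences = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x in gap_chars or y in gap_chars:
            continue  # skip columns where either sequence has a gap
        compared += 1
        differences += x != y
    return differences / compared if compared else 0.0


def distance_table(alignment):
    """Map each unordered pair of sequence names to its distance.

    `alignment` is a dict of name -> aligned sequence (equal lengths assumed).
    """
    names = list(alignment)
    return {
        (a, b): p_distance(alignment[a], alignment[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
```

A table of this shape is the kind of input a tree-building method such as UPGMA or neighbor-joining starts from.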

Fragment Assembly

GelStart Begins a fragment assembly session by creating a new fragment assembly project or by identifying an existing project.
GelEnter Adds fragment sequences to a fragment assembly project. It accepts sequence data from your terminal keyboard, a digitizer, or existing sequence files.
GelMerge Aligns the sequences in a fragment assembly project into assemblies called contigs. You can view and edit these assemblies in GelAssemble.
GelAssemble Is a multiple sequence editor for viewing and editing contigs assembled by GelMerge.
GelView Displays the structure of the contigs in a fragment assembly project.
GelDisassemble Breaks up the contigs in a fragment assembly project into single fragments.

Gene Finding and Pattern Recognition

TestCode+ Helps you identify protein coding sequences by plotting a measure of the non-randomness of the composition at every third base. The statistic does not require a codon frequency table.
CodonPreference+ Is a frame-specific gene finder that tries to recognize protein coding sequences by virtue of the similarity of their codon usage to a codon frequency table or by the bias of their composition (usually GC) in the third position of each codon.
Frames+ Shows open reading frames for the six translation frames of a DNA sequence. Frames can superimpose the pattern of rare codon choices if you provide it with a codon frequency table.
Terminator Searches for prokaryotic factor-independent RNA polymerase terminators according to the method of Brendel and Trifonov.
Motifs Looks for sequence motifs by searching through proteins for the patterns defined in the PROSITE Dictionary of Protein Sites and Patterns. Motifs can display an abstract of the current literature on each of the motifs it finds.
MEME (Multiple EM for Motif Elicitation) Finds conserved motifs in a group of unaligned sequences. MEME saves these motifs as a set of profiles. You can search a database of sequences with these profiles using the MotifSearch program.
Repeat Finds direct repeats in sequences. You must set the size, stringency, and range within which the repeat must occur; all the repeats of that size or greater are displayed as short alignments.
FindPatterns Identifies sequences that contain short patterns like GAATTC or YRYRYRYR. You can define the patterns ambiguously and allow mismatches. You can provide the patterns in a file or simply type them in from the terminal.
Composition Determines the composition of sequence(s). For nucleotide sequence(s), Composition also determines dinucleotide and trinucleotide content.
CodonFrequency Tabulates codon usage from sequences and/or existing codon usage tables. The output file is correctly formatted for input to the CodonPreference, Correspond, and Frames programs.
Correspond Looks for similar patterns of codon usage by comparing codon frequency tables.
Window Makes a table of the frequencies of different sequence patterns within a window as it is moved along a sequence. A pattern is any short sequence like GC or R or ATG. You can plot the output with the program StatPlot.
StatPlot+ Plots a set of parallel curves from a table of numbers like the table written by the Window program. The statistics in each column of the table are associated with a position in the analyzed sequence.
FitConsensus Uses a consensus table written by Consensus as a probe to find the best examples of the consensus in a DNA sequence. You can specify the number of fits you want to see, and FitConsensus tabulates them with their position, frame, and a statistical measure of their quality.
Consensus Calculates a consensus sequence for a set of pre-aligned short nucleic acid sequences by tabulating the percent of G, A, T, and C for each position in the set. FitConsensus uses the Consensus output table as a probe to search for the best examples of the derived consensus in other nucleotide sequences.
Xnu Replaces statistically significant tandem repeats in protein sequences with X characters. If a resulting protein sequence is used as a query for a BLAST search, the regions with X characters are ignored.
Seg Replaces low complexity regions in protein sequences with X characters. If a resulting protein sequence is used as a query for a BLAST search, the regions with X characters are ignored.
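
The tabulation described for Consensus, the percent of G, A, T, and C at each position of a set of pre-aligned short sequences, can be sketched as follows. The function names and the output layout (a list of per-position dictionaries) are assumptions for the example and do not reproduce the Consensus output file format.

```python
from collections import Counter


def consensus_table(aligned):
    """Return one {base: percent} dict per alignment position."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be pre-aligned to the same length")
    table = []
    for column in zip(*aligned):
        counts = Counter(base.upper() for base in column)
        table.append(
            {base: 100.0 * counts[base] / len(aligned) for base in "GATC"}
        )
    return table


def consensus_sequence(table):
    """Pick the most frequent base per position (ties resolve in G, A, T, C order)."""
    return "".join(max("GATC", key=row.get) for row in table)
```

With the four sequences `TATAAT`, `TATGAT`, `TACAAT`, and `TATAAT`, the fourth position is 75% A and 25% G, and the derived consensus is `TATAAT`.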

Importing / Exporting

Reformat Rewrites sequence file(s), scoring matrix file(s), or enzyme data file(s) so that they can be read by GCG programs.
BreakUp Reads a GCG-format sequence file containing more than 350,000 sequence characters and writes it as a set of separate, shorter, overlapping sequence files that can be analyzed by Wisconsin Package programs.
ChopUp Converts a non-GCG sequence file containing lines as long as 32,000 characters into a new file containing lines no longer than 50 characters. The new file can be read by Reformat to create a GCG-format sequence file.
FromStaden Changes a sequence from Staden format into GCG format. If the file contains a nucleotide sequence, the ambiguity codes are converted as shown in Appendix III of the Program Manual.
FromEMBL Reformats sequences from the distribution (flat file) format of the EMBL database into individual sequence files in GCG format.
FromGenBank Reformats one or more sequences in the flat file format of the GenBank database into individual sequence files in GCG format.
FromPIR Reformats sequences from the protein database of the Protein Identification Resource (PIR) into individual files in GCG format.
FromIG Reformats one or more sequences from IntelliGenetics format into individual files in GCG format.
FromTrace Converts one or more ABI or SCF files into individual sequence files in GCG format.
FromFasta Reformats one or more sequences from FastA format into individual files in GCG format.
ToStaden Writes a GCG sequence into a file in Staden format. If the file contains a nucleotide sequence, the ambiguity codes are converted as shown in Appendix III of the Program Manual.
ToPIR Writes GCG sequence(s) into a single file in PIR format.
ToIG Converts GCG sequence file(s) into a single file in IntelliGenetics format.
ToFastA Converts GCG sequence(s) into FastA format.
GetSeq Reads a sequence from a computer that is acting as a terminal and writes it into a new sequence file in GCG format on the computer running the Wisconsin Package.
Spew Sends a GCG sequence from the computer that runs the Wisconsin Package to a personal computer acting as a terminal.

Mapping

Map Maps a DNA sequence and displays both strands of the mapped sequence with restriction enzyme cut points above the sequence and protein translations below. Map can also create a peptide map of an amino acid sequence.
MapPlot+ Displays restriction sites graphically. If you don't have a plotter, MapPlot can write a text file that approximates the graph.
MapSort Finds the coordinates of the restriction enzyme cuts in a DNA sequence and sorts the fragments of the resulting digest by size. MapSort can sort the fragments from single or multiple enzyme digests.
FingerPrint Identifies the products of T1 ribonuclease digestion.
PeptideMap Creates a peptide map of an amino acid sequence.
PlasmidMap+ Draws a circular plot of a plasmid construct. It can display restriction patterns, inserts, and known genetic elements. The plot is suitable for publication, record keeping, or analysis. It is drawn from one or more labeling files such as those written by MapSort.
PeptideSort Shows the peptide fragments from a digest of an amino acid sequence. It sorts the peptides by weight, position, and HPLC retention at pH 2.1, and shows the composition of each peptide. It also prints a summary of the composition of the whole protein.
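
The two steps described for MapSort, locating the cut coordinates and sorting the resulting fragments by size, can be sketched for a single enzyme on a linear sequence. The function name `digest` and its parameters are hypothetical; the sketch searches only the given strand for an exact recognition site and is not the MapSort implementation.

```python
def digest(sequence, site, cut_offset):
    """Cut a linear sequence at every occurrence of `site`.

    `cut_offset` is the number of bases from the start of the site to the cut.
    Returns the cut positions and the fragments as (length, begin, end),
    largest fragment first.
    """
    sequence = sequence.upper()
    site = site.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)  # allow overlapping sites
    bounds = [0] + cuts + [len(sequence)]
    fragments = [
        (end - begin, begin, end)
        for begin, end in zip(bounds, bounds[1:])
        if end > begin
    ]
    return cuts, sorted(fragments, reverse=True)
```

For EcoRI, which recognizes GAATTC and cuts between the G and the first A, the call would be `digest(seq, "GAATTC", 1)`.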

Primer Selection

Prime+ Selects oligonucleotide primers for a template DNA sequence. The primers may be useful for the polymerase chain reaction (PCR) or for DNA sequencing. You can allow Prime to choose primers from the whole template or limit the choices to a particular set of primers listed in a file.

Protein Analysis

Motifs Looks for sequence motifs by searching through proteins for the patterns defined in the PROSITE Dictionary of Protein Sites and Patterns. Motifs can display an abstract of the current literature on each of the motifs it finds.
ProfileScan Uses a database of profiles to find structural and sequence motifs in protein sequences.
CoilScan Locates coiled-coil segments in protein sequences.
HTHScan Scans protein sequences for the presence of helix-turn-helix motifs, indicative of sequence-specific DNA-binding structures often associated with gene regulation.
SPScan Scans protein sequences for the presence of secretory signal peptides (SPs).
PeptideSort Shows the peptide fragments from a digest of an amino acid sequence. It sorts the peptides by weight, position, and HPLC retention at pH 2.1, and shows the composition of each peptide. It also prints a summary of the composition of the whole protein.
Isoelectric+ Plots the charge as a function of pH for any peptide sequence.
PeptideMap Creates a peptide map of an amino acid sequence.
PepPlot+ Plots measures of protein secondary structure and hydrophobicity in parallel panels of the same plot.
PeptideStructure Makes secondary structure predictions for a peptide sequence. The predictions include (in addition to alpha, beta, coil, and turn) measures for antigenicity, flexibility, hydrophobicity, and surface probability. PlotStructure displays the predictions graphically.
PlotStructure+ Plots the measures of protein secondary structure in the output file from PeptideStructure. The measures can be shown on parallel panels of a graph or with a two-dimensional "squiggly" representation.
Moment+ Makes a contour plot of the helical hydrophobic moment of a peptide sequence.
HelicalWheel+ Plots a peptide sequence as a helical wheel to help you recognize amphiphilic regions.
Xnu Replaces statistically significant tandem repeats in protein sequences with X characters. If a resulting protein sequence is used as a query for a BLAST search, the regions with X characters are ignored.
Seg Replaces low complexity regions in protein sequences with X characters. If a resulting protein sequence is used as a query for a BLAST search, the regions with X characters are ignored.

DNA/RNA Secondary Structure

MFold Predicts optimal and suboptimal secondary structures for an RNA or DNA molecule using the most recent energy minimization method of Zuker.
PlotFold+ Displays the optimal and suboptimal secondary structures for an RNA or DNA molecule predicted by MFold.
StemLoop Finds stems (inverted repeats) within a sequence. You specify the minimum stem length, minimum and maximum loop sizes, and the minimum number of bonds per stem. All loops or only the best loops can be displayed on your screen or written into a file.
DotPlot+ Makes a dot-plot with the output file from Compare or StemLoop.
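
A stem, in the sense used by StemLoop, is an inverted repeat: a segment followed, after a loop, by its own reverse complement. The sketch below finds only perfect Watson-Crick stems of a fixed length in a DNA sequence. The function names and parameters are assumptions for illustration, and the minimum-bonds criterion mentioned above is not modeled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (unambiguous bases only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_stems(sequence, stem_length, min_loop, max_loop):
    """Return (start, loop_size) for each perfect inverted repeat found."""
    sequence = sequence.upper()
    hits = []
    # The last useful start leaves room for two stem arms and the smallest loop.
    for start in range(len(sequence) - 2 * stem_length - min_loop + 1):
        partner = reverse_complement(sequence[start:start + stem_length])
        for loop in range(min_loop, max_loop + 1):
            other = start + stem_length + loop
            if sequence[other:other + stem_length] == partner:
                hits.append((start, loop))
    return hits
```

In `GGGAAAACCC`, for example, the arm `GGG` at position 0 pairs with `CCC` across a four-base loop.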

Translation

Translate Translates nucleotide sequences into peptide sequences.
BackTranslate Backtranslates an amino acid sequence into a nucleotide sequence. The output helps you recognize minimally ambiguous regions that might be good for constructing synthetic probes.
Map Maps a DNA sequence and displays both strands of the mapped sequence with restriction enzyme cut points above the sequence and protein translations below. Map can also create a peptide map of an amino acid sequence.
ExtractPeptide Writes a peptide sequence from one or more of the translation frames displayed in the output from Map. Translate supersedes ExtractPeptide for most applications.
Pepdata Translates DNA sequence(s) in all six frames.
Reverse Reverses and/or complements a sequence.
DataSet Creates a GCG data library from any set of sequences in GCG format. To translate nucleotide sequences into peptide sequences, include the ToProt parameter.
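
The operations behind Reverse and six-frame translation can be sketched in a few lines. The sketch assumes the standard genetic code and unambiguous DNA bases, and marks any codon it cannot look up with X; the function names are hypothetical and the output is not in any GCG file format.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq, frame=0):
    """Translate complete codons starting at offset `frame` (0, 1, or 2)."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(seq):
    """Three forward-strand frames followed by three reverse-strand frames."""
    reverse = reverse_complement(seq)
    return [translate(seq, f) for f in range(3)] + [
        translate(reverse, f) for f in range(3)
    ]
```

Stop codons are written as `*`, so `translate("ATGGCCTAA")` gives `MA*`.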

Utilities

Sequence Utilities
Reverse Reverses and/or complements a sequence.
Shuffle Randomizes the order of the symbols in a sequence without changing the composition.
Simplify Lets you reduce the number of symbols in a sequence. Such a simplification would allow you, for instance, to treat all hydrophobic amino acids as equivalent.
Comptable Creates a scoring matrix using equivalences defined in a simplification scheme such as the one used for Simplify.
Corrupt Randomly introduces small numbers of substitutions, insertions, and deletions into nucleotide sequence(s).
Xnu Replaces statistically significant tandem repeats in protein sequences with X characters. If a resulting protein sequence is used as a query for a BLAST search, the regions with X characters are ignored.
Seg Replaces low complexity regions in protein sequences with X characters. If a resulting protein sequence is used as a query for a BLAST search, the regions with X characters are ignored.
Sample Extracts sequence fragments randomly from sequence(s). You can set a sampling rate to determine how many fragments Sample extracts.
Database Utilities
DataSet Creates a GCG data library from any set of sequences in GCG format.
GCGtoBLAST Combines any set of GCG sequences into a database that you can search with BLAST.
Sample Extracts sequence fragments randomly from sequence(s). You can set a sampling rate to determine how many fragments Sample extracts.
Printing / Plotting Utilities
LPrint Prints text file(s) on a PostScript printer connected to LPrintPort.
ListFile Prints a file on a printer attached to your terminal's pass-through printer port.
SetPlot Allows you to choose a plotting configuration from a menu of available graphics devices at your site.
Figure+ Makes figures and posters by drawing graphics and text together. You can include output from other Wisconsin Package graphics programs as part of a figure.
PlotTest+ Plots a test pattern to test your graphics configuration. The pattern created by PlotTest uses every Wisconsin Package graphics feature. It should resemble the example test pattern in the documentation for PlotTest in the Program Manual.
File Utilities
ChopUp Converts a non-GCG sequence file containing lines as long as 32,000 characters into a new file containing lines no longer than 50 characters. The new file can be read by Reformat to create a GCG-format sequence file.
Replace Makes character string replacements in text file(s). You provide a table of replacements in a file showing each existing string and its replacement.
CompressText Removes any or all of the following from files: A) trailing space; B) blank lines; C) extra space between words; D) all space; or E) leading space.
OneCase Puts all of the alphabetic characters in a file into lower or UPPER case. It can also capitalize every word.
ShiftOver Moves a file to the right or to the left as many columns as you specify.
Detab Replaces the tab characters in one or more files with spaces. The files can be written out in card-image format with records of fixed length.
Miscellaneous Utilities
SetKeys Writes a file in your current directory that redefines your keyboard's keys for easier sequence entry with the SeqEd, LineUp, GelEnter and GelAssemble programs and the SeqLab sequence editor. The output file, called set.keys, can be edited if you want to redefine keys that were not considered by the SetKeys program.
Reformat Rewrites sequence file(s), scoring matrix file(s), or enzyme data file(s) so that they can be read by GCG programs.
Red Is a text formatter that creates publication-quality documents on a PostScript printer such as the Apple LaserWriter. You can use 13 different fonts, scaling each font to any size. You can also include figures and graphics from any Wisconsin Package graphics program within the text of the document.
Name Creates, changes, deletes, or displays GCG logical name(s) from the GCG logical names table.
Symbol Creates, changes, deletes, or displays GCG symbol(s) from the GCG symbol table.
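
The operation described for Shuffle under Sequence Utilities, randomizing the order of the symbols without changing the composition, amounts to a random permutation of the sequence. A minimal sketch, in which the function name and the optional `seed` parameter are assumptions for illustration:

```python
import random


def shuffle_sequence(sequence, seed=None):
    """Return the symbols of `sequence` in random order.

    Passing a `seed` makes the result reproducible.
    """
    rng = random.Random(seed)
    symbols = list(sequence)
    rng.shuffle(symbols)  # in-place permutation; composition is unchanged
    return "".join(symbols)
```

Shuffled copies of a sequence are commonly used as controls, for instance to judge whether an alignment score is higher than expected from composition alone.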

Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com

Copyright (c) 1982-2000 Genetics Computer Group Inc., a wholly owned subsidiary of Oxford Molecular Group, Inc. All rights reserved.

Licenses and Trademarks: Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

Genetics Computer Group

www.gcg.com