OLDDISTANCES

FUNCTION

OldDistances makes a table of the pairwise similarities within a group of aligned sequences.

DESCRIPTION

OldDistances writes a matrix of the pairwise similarities between up to 50 different sequences in a multiple sequence alignment. The similarity value for each pair is the number of "matches" between the two sequences, normally divided by a measure of their length (see Denominator below).

Matches

A match occurs if the value in the scoring matrix for a pair of bases or amino acids is greater than or equal to a set match threshold.
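The match rule can be sketched in a few lines of Python. This is a minimal illustration, not the program's own code: the table holds only five entries copied from the standard BLOSUM62 matrix (OldDistances reads the complete table from blosum62.cmp), and the names BLOSUM62_EXCERPT, score, and is_match are hypothetical. With the default threshold of 2, conservative substitutions such as I/V count as matches, while A/S does not.

```python
# Sketch of the OldDistances match rule: an aligned pair of symbols is a
# "match" when its scoring-matrix value is >= the match threshold.
# Only a few BLOSUM62 entries are listed, for illustration; pairs that are
# missing from this excerpt raise KeyError.
BLOSUM62_EXCERPT = {
    ("A", "A"): 4,
    ("I", "V"): 3,
    ("L", "I"): 2,
    ("A", "S"): 1,
    ("A", "W"): -3,
}


def score(a: str, b: str) -> int:
    """Look up a symmetric score, trying both orders of the pair."""
    key = (a.upper(), b.upper())
    if key in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[key]
    return BLOSUM62_EXCERPT[(key[1], key[0])]


def is_match(a: str, b: str, threshold: int = 2) -> bool:
    """True when the pair's score reaches the threshold."""
    return score(a, b) >= threshold


for pair in [("A", "A"), ("V", "I"), ("I", "L"), ("A", "S"), ("W", "A")]:
    print(pair, score(*pair), is_match(*pair))
```

Lowering the threshold (for example to 1) makes the rule more permissive, so A/S would then also count as a match.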

Denominator

The denominator can be any of four functions of sequence length: 1) the length of the shorter sequence of the pair; 2) the length without gaps of the shorter sequence of the pair; 3) the average length of the two sequences; or 4) the average length of the two sequences without gaps. A fifth option reports the sum of the matches without dividing it by anything.
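The five menu choices can be sketched as a single Python function. This is one reading of the menu, under two assumptions: "." is the only gap symbol, and choice 2 means the smaller of the two ungapped lengths (which agrees with the sample output, where the 1-versus-2 similarity uses 516, the shorter ungapped length). The names GAP, ungapped_length, and denominator are hypothetical.

```python
GAP = "."  # assumption: "." is the only gap symbol in the alignment


def ungapped_length(seq: str) -> int:
    """Number of non-gap symbols in an aligned sequence."""
    return sum(1 for symbol in seq if symbol != GAP)


def denominator(seq1: str, seq2: str, menu: int = 2) -> float:
    """Divisor applied to the sum of matches, for menu choices 1-5."""
    if menu == 1:  # length of the shorter sequence, gaps included
        return min(len(seq1), len(seq2))
    if menu == 2:  # shorter length after removing gaps
        return min(ungapped_length(seq1), ungapped_length(seq2))
    if menu == 3:  # average length, gaps included
        return (len(seq1) + len(seq2)) / 2
    if menu == 4:  # average length, gaps excluded
        return (ungapped_length(seq1) + ungapped_length(seq2)) / 2
    if menu == 5:  # "Nothing": the raw sum of matches is reported
        return 1
    raise ValueError("menu must be between 1 and 5")


a, b = "MKV..LLA", "MKVQILL"
print([denominator(a, b, m) for m in range(1, 6)])
```

For these two toy sequences (8 and 7 symbols, 6 and 7 without gaps) the five choices give 7, 6, 7.5, 6.5, and 1.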

EXAMPLE

Here is a session using OldDistances to determine similarities between aligned sequences in the file hsp70.msf:


% olddistances

 OLDDISTANCES within what multiple sequence alignment ?  hsp70.msf{*}

 What is the threshold for a match (* 2 *) ?

 Divide the sum of the matches by:

     1)  Length of shorter sequence including gaps
     2)  Length of shorter sequence excluding gaps
     3)  Average sequence length including gaps
     4)  Average sequence length excluding gaps
     5)  Nothing

 Please choose one (* 2 *) :

     hsp70.msf{s11448} 743
     hsp70.msf{s06443} 743
     hsp70.msf{a25398} 743

     /////////////////////

     hsp70.msf{s20149} 743
     hsp70.msf{a32493} 743
     hsp70.msf{s29261} 743

 What should I call the output file (* hsp70.olddistances *) ?

%

OUTPUT

Here is part of the output file. The file contains a 25 x 25 matrix, only part of which is shown:


 OLDDISTANCES within: hsp70.msf{*}  October 20, 1998 12:35

Threshold of comparison: 2
            Denominator: "Length of shorter sequence without gaps"
    Number of sequences: 25
Symbol Comparison Table: GenRunData:blosum62.cmp

Key for column and row indices:

  1         hsp70.msf{S11448}  Length: 743       Length without gaps: 653
  2         hsp70.msf{S06443}  Length: 743       Length without gaps: 516
  3         hsp70.msf{A25398}  Length: 743       Length without gaps: 661

  ///////////////////////////////////////////////////////////////////////

 25         hsp70.msf{S29261}  Length: 743       Length without gaps: 638

 Distance Matrix Part: 1

                 1         2         3         4         5         6   ...
 _____________________________________________________________________ ...
|    1   |    1.0000    0.9845    0.8698    0.8760    0.7679    0.7668 ...
|    2   |              1.0000    0.9380    0.9360    0.8120    0.8140 ...
|    3   |                        1.0000    0.9334    0.7586    0.7590 ...
|    4   |                                  1.0000    0.7695    0.7637 ...

//////////////////////////////////////////////////////////////////////
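The sample values can be checked by hand. With denominator 2, each similarity is the number of matches divided by the shorter length without gaps, so multiplying a reported value by that length should give a whole number of matches. The Python sketch below uses only numbers copied from the output above to back-calculate those counts. The counts are inferred: the output file does not report them. The names UNGAPPED and REPORTED are hypothetical.

```python
# Lengths without gaps, from the "Key for column and row indices" above.
UNGAPPED = {1: 653, 2: 516, 3: 661}

# Similarities from "Distance Matrix Part: 1" (rows and columns 1-3).
REPORTED = {(1, 2): 0.9845, (1, 3): 0.8698, (2, 3): 0.9380}

for (i, j), value in REPORTED.items():
    shorter = min(UNGAPPED[i], UNGAPPED[j])  # denominator choice 2
    matches = round(value * shorter)         # inferred match count
    print(f"{i} vs {j}: {matches} matches / {shorter} = "
          f"{matches / shorter:.4f} (reported {value:.4f})")
```

Each recomputed ratio (508/516, 568/653, 484/516) agrees with the reported value to four decimal places, which supports reading choice 2 as the smaller of the two ungapped lengths.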

RELATED PROGRAMS

PileUp creates a multiple sequence alignment from a group of related sequences using progressive, pairwise alignments. It can also plot a tree showing the clustering relationships used to create the alignment. Gap makes sequence alignments. LineUp edits multiple sequence alignments. ProfileGap aligns a new sequence to an existing multiple sequence alignment. Pretty displays multiple sequence alignments.

Distances writes a matrix of the pairwise genetic distances between sequences in a multiple sequence alignment. These distances are suitable for input into programs such as GrowTree that create evolutionary trees. Distances provides several methods for correcting the distance calculations to account for multiple substitutions at a single site, and the distance value is expressed as the number of nucleotide or amino acid substitutions per 100 residues.

RESTRICTIONS

The sequences must be properly aligned before you run OldDistances; the program compares them position by position and does not align them itself.

ALGORITHM

OldDistances compares each pair of aligned sequences symbol by symbol, from the first symbol to the last symbol of the shorter sequence. Because the comparison is position by position, it makes sense only if the sequences have already been aligned. OldDistances simply counts the positions where the scoring matrix value is greater than or equal to the match threshold. The sum of the matches is then divided by the denominator you choose, such as the length of the shorter sequence.

Gaps are treated like any other symbol: the gap symbol (.) matches another symbol if that pair's value in the scoring matrix is greater than or equal to the threshold.
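The steps in this section can be sketched in Python for a toy alignment. This is an illustration under stated assumptions, not GCG's implementation: toy_score is a hypothetical stand-in for the scoring matrix (every pair involving a gap scores 0 here), the match test uses "greater than or equal to" as in the DESCRIPTION, and only denominator choice 2 is implemented. The table keeps the upper triangle and the diagonal, like the sample output.

```python
from itertools import combinations_with_replacement

GAP = "."  # gap symbol named in this section


def toy_score(a: str, b: str) -> int:
    # Hypothetical scores for illustration only; a real run reads
    # blosum62.cmp or dnadistances.cmp. The gap is scored like any other
    # symbol: here every pair involving a gap scores 0.
    if GAP in (a, b):
        return 0
    return 5 if a == b else -1


def ungapped(seq: str) -> int:
    return sum(symbol != GAP for symbol in seq)


def similarity(s1: str, s2: str, threshold: int = 2) -> float:
    # zip() stops at the end of the shorter sequence, following the
    # "first symbol to the last symbol of the shorter sequence" rule.
    matches = sum(toy_score(a, b) >= threshold for a, b in zip(s1, s2))
    return matches / min(ungapped(s1), ungapped(s2))  # denominator choice 2


def similarity_table(seqs):
    # Upper triangle including the diagonal, keyed by 1-based indices.
    table = {}
    for i, j in combinations_with_replacement(range(len(seqs)), 2):
        table[(i + 1, j + 1)] = similarity(seqs[i], seqs[j])
    return table


aligned = ["ACGT.A", "ACGTTA", "AC.TTG"]
for (i, j), value in similarity_table(aligned).items():
    print(f"{i} {j} {value:.4f}")
```

Sequences 2 and 3 share 4 matching positions and the shorter of them has 5 non-gap symbols, so their similarity is 0.8000. Because a gap never reaches the threshold under these toy scores, each sequence compared with itself gives 1.0000.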

CONSIDERATIONS

OldDistances chooses a default match threshold that is appropriate for the scoring matrix it reads. If you select a different scoring matrix with the -MATRix command-line parameter, the program adjusts the default match threshold accordingly.

SUGGESTIONS

If the sequences are not in an MSF file, use Pretty to display the aligned sequences before you pass them to OldDistances. If they look correctly aligned in the Pretty display, they should work sensibly with OldDistances.

COMMAND-LINE SUMMARY

All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.


Minimal Syntax: % olddistances [-INfile=]hsp70.msf{*} -Default

Prompted Parameters:

-THReshold=2                sets minimum scoring matrix score for a match
-MENu=2                     divides the sum of the matches by:
                              1=length of the shorter sequence
                              2=length of the shorter sequence without gaps
                              3=average length
                              4=average length without gaps
                              5=nothing

[-OUTfile=]hsp70.olddistances  names the output file

Local Data Files:

-MATRix=blosum62.cmp        assigns the scoring matrix for proteins
-MATRix=dnadistances.cmp    assigns the scoring matrix for nucleic acids

Optional Parameters: None

LOCAL DATA FILES

The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.

Local Scoring Matrices

This program reads one or more scoring matrices for the comparison of sequence characters. The program automatically reads the program's default scoring matrix in a public data directory unless you either 1) have a data file with exactly the same name as the program default scoring matrix in your current working directory; or 2) have a data file with exactly the same name as the program default scoring matrix in the directory with the logical name MyData; or 3) name a file on the command line with an expression like -MATRix=mymatrix.cmp. If you don't include a directory specification when you name a file with -MATRix, the program searches for the file first in your local directory, then in the directory with the logical name MyData, then in the public data directory with the logical name GenMoreData, and finally in the public data directory with the logical name GenRunData. For more information see "Using a Special Kind of Data File: A Scoring Matrix" in Chapter 4, Using Data Files in the User's Guide.
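The search order for a -MATRix file named without a directory can be sketched in Python. This is a hypothetical illustration, not GCG code: it assumes each logical name (MyData, GenMoreData, GenRunData) is visible as an environment variable holding a directory path, and that a name that includes a directory is used as-is without any search. The function name find_matrix and the demo file name are made up for the example.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional

# Assumption: logical names are exposed as environment variables.
LOGICAL_NAMES = ["MyData", "GenMoreData", "GenRunData"]


def find_matrix(name: str, env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return the first existing file called `name`, or None if none is found."""
    env = os.environ if env is None else env
    if os.path.dirname(name):
        # Assumed: with a directory specification, no search is done.
        path = Path(name)
        return path if path.is_file() else None
    candidates = [Path.cwd() / name]  # 1) current working directory
    for logical in LOGICAL_NAMES:     # 2) MyData 3) GenMoreData 4) GenRunData
        directory = env.get(logical)
        if directory:
            candidates.append(Path(directory) / name)
    for path in candidates:
        if path.is_file():
            return path
    return None


# Demo: the same file exists in GenMoreData and GenRunData; GenMoreData wins
# because it is searched first.
with tempfile.TemporaryDirectory() as more, tempfile.TemporaryDirectory() as run:
    for directory in (more, run):
        (Path(directory) / "demo_matrix_7f3a.cmp").write_text("demo\n")
    demo_env = {"GenMoreData": more, "GenRunData": run}
    found = find_matrix("demo_matrix_7f3a.cmp", demo_env)
    found_in_more_data = found is not None and found.parent == Path(more)
    print(found_in_more_data)
```

Building the whole candidate list before checking it keeps the search order explicit and easy to compare with the order given in the text.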

OldDistances reads the scoring matrix file blosum62.cmp for peptide comparisons and dnadistances.cmp for nucleotide comparisons.

PARAMETER REFERENCE

You can set the parameters listed below from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.

-THReshold=2

sets the minimum scoring matrix value for a match.

-MENu=2

sets the method used to determine the final score. Methods 1 through 4 divide the sum of the matches by the length of the shorter sequence, the length of the shorter sequence without gaps, the average length of the two sequences, and the average length of the two sequences without gaps, respectively. Method 5 reports the sum of the matches without modification.

-MATRix=mymatrix.cmp

allows you to specify a scoring matrix file name other than the program default. If you don't include a directory specification when you name a file with -MATRix, the program searches for the file first in your local directory, then in the directory with the logical name MyData, then in the public data directory with the logical name GenMoreData, and finally in the public data directory with the logical name GenRunData.

For more information see the Local Scoring Matrices section.

Printed: December 9, 1998 16:23 (1162)

Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com

Licenses and Trademarks Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.