PEPTIDEMAP

FUNCTION

PeptideMap creates a peptide map of an amino acid sequence.

DESCRIPTION

PeptideMap marks a peptide sequence at every position where a known proteolytic enzyme or reagent might cut it. You can select one or a few enzymes or let PeptideMap use the whole list.

PeptideMap is simply the program Map run with -PROGRAMname=PeptideMap. (See the documentation for Map in the Program Manual for a complete description.)
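As a rough illustration of what a peptide map records, the Python sketch below scans a sequence and lists every bond that a cleavage rule allows. The two rules (trypsin cutting after K or R unless the next residue is P, and cyanogen bromide cutting after M) are common textbook simplifications assumed for this example. PeptideMap itself reads the actual specificities from proenzyme.dat, and the names RULES, cut_sites, and peptide_map are hypothetical.

```python
from __future__ import annotations

# Hypothetical, simplified cleavage rules for illustration only; the real
# specificities used by PeptideMap come from proenzyme.dat.
# Each rule: (residues after which the bond is cut,
#             residues that block the cut when they come next).
RULES: dict[str, tuple[set[str], set[str]]] = {
    "Trypsin": ({"K", "R"}, {"P"}),
    "CnBr": ({"M"}, set()),
}


def cut_sites(seq: str, after: set[str], not_before: set[str]) -> list[int]:
    """Return 1-based positions p such that the bond between residues p and p+1 is cut."""
    sites = []
    for i in range(len(seq) - 1):  # the last residue has no following bond
        if seq[i] in after and seq[i + 1] not in not_before:
            sites.append(i + 1)
    return sites


def peptide_map(seq: str) -> dict[str, list[int]]:
    """Apply every rule in RULES to the (case-insensitive) sequence."""
    seq = seq.upper()
    return {name: cut_sites(seq, after, block) for name, (after, block) in RULES.items()}


if __name__ == "__main__":
    # First 12 residues of gzeinaa.pep, as shown in the output file below.
    print(peptide_map("RKHNIVPIMAAK"))
```

Running the sketch prints {'Trypsin': [1, 2], 'CnBr': [9]}, where each number is the residue after which the bond is cut.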

EXAMPLE

Here is a session using PeptideMap to create a peptide map of gzeinaa.pep:


% peptidemap

 (Linear) (Peptide) MAP of what sequence ?  gzeinaa.pep

                  Begin (* 1 *) ?
                End (*   283 *) ?

 Select the enzymes:  Type nothing or "*" to get all enzymes. Type "?"
 for help on which enzymes are available and how to select them.

                                       Enzyme(* * *):

  What should I call the output file (* gzeinaa.map *) ?

 Mapping .......

 Writing ..... ..
 MAP complete with:

   Sequence Length:     283
    Enzymes Chosen:       8
    Cutsites found:     114
          CPU time:   00.29

    Output file(s): gzeinaa.map

%

OUTPUT FILE

Here is some of the output file:


 (Linear) (Peptide) MAP of: gzeinaa.pep  check: 2106  from: 1  to: 283

Corn Storage Protein Am. Ac. (19,000, genomic)
extracted from GZEIN.SEQ, checksum 2842, row a

 With 8 enzymes: *

                             October 8, 1998 14:40  ..

                          Chymo                                ProEn
                        Chymo |                              Staph |
                       Chymo| |                             NTCB | |
                      Chymo|| |                         Chymo  | | |
                       CnBr|| |                          CnBr  | | |
                    Chymo ||| |                       ProEn |  | | |
                  Chymo | ||| |                     Chymo | |  | | |
                   NTCB | ||| |                    Chymo| | |  | | |
              Trypsin | | ||| |                   ProEn|| | |  | | |
             Chymo  | | | ||| |                  ProEn||| | |  | | |
    Trypsin   CnBr  | | | ||| |        NTCB    Chymo |||| | |  | | |
   Trypsin|ProEn |  | | | ||| |      ProEn|ProEn   | |||| | |  | | |
         ||    | |  | | | ||| |          ||    |   | |||| | |  | | |
         RKHNIVPIMAAKIFCLIMLLGLSASAATASIFPQCSQAPIASLLPPYLSPAMSSVCENPI
       1 ---------+---------+---------+---------+---------+---------+ 60

 ///////////////////////////////////////////////////////////////////////

 Enzymes that do cut:

    Chymo     CnBr     NTCB    ProEn    Staph  Trypsin

 Enzymes that do not cut:

    NH2OH    pH2.5

INPUT FILES

PeptideMap accepts a single protein sequence as input. If PeptideMap rejects your protein sequence, see Appendix VI for information on how to change or set the type of a sequence.

RELATED PROGRAMS

PeptideSort shows the peptide fragments from a digest of an amino acid sequence. It sorts the peptides by position, putative molecular weight, and relative HPLC retention at pH 2.1, and shows the composition of each peptide. It also prints a summary of the composition of the whole protein.

CHOOSING THE ENZYMES

See the documentation for the program Map for a complete description of choosing enzymes.

COMMAND-LINE SUMMARY

All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.


Minimal Syntax: % peptidemap [-INfile=]gzeinaa.pep -Default

Prompted Parameters:

-BEGin=1 -END=283          sets the range of interest
-ENZymes=*[,...]           selects peptidases used in the search
[-OUTfile=]gzeinaa.map     names the output file

Local Data Files:

-DATa=proenzyme.dat        specifies name and specificity of each peptidase

Optional Parameters:

-WIDth=100      sets display width to something other than 60 aa/line
-PAGe[=62]      adds form-feeds to keep clusters on a single page
-ONCe           shows peptidases that cut only once
-MINCuts=2      shows only peptidases that cut at least 2 times
-MAXCuts=2      shows only peptidases that cut no more than 2 times
-EXCLude=n1,n2  suppresses peptidases that cut between n1 and n2
-APPend         appends the input data files to the output file
-MISmatch=1     finds cleavage sites with one or fewer mismatches
-NOSCALeline    suppresses the scale line
-VERtical       displays peptidase name vertically over cleavage site
-RSF[=map.rsf]  saves sites as features in RSF file

LOCAL DATA FILES

The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.

The file proenzyme.dat contains the enzyme data.

PARAMETER REFERENCE

You can set the parameters listed below from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.

-ENZymes=*[,...]

specifies the enzymes or reagents whose cleavage sites you want to search for. If you search for several different enzymes, separate their names with commas. -ENZymes=* selects all enzymes, -ENZymes=** selects all enzymes, including isoschizomers, and -ENZymes=Al* selects all enzymes whose names start with Al.
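The selection rules above can be sketched in Python with shell-style wildcards from the standard fnmatch module. This is a sketch under stated assumptions, not Map's actual code: matching is assumed to be case-insensitive, each enzyme entry is assumed to carry a flag saying whether it is an isoschizomer, "*" is assumed to skip flagged entries, and the function name select_enzymes and the IsoX entry in the usage note are hypothetical.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def select_enzymes(spec: str, enzymes: list[tuple[str, bool]]) -> list[str]:
    """Select names from `enzymes` according to an -ENZymes style spec.

    `enzymes` is a list of (name, is_isoschizomer) pairs in data-file order.
    Assumptions (not taken from Map): matching is case-insensitive,
    "*" skips entries flagged as isoschizomers, "**" keeps everything,
    and any other pattern is a shell-style wildcard matched against the name.
    """
    patterns = [p.strip().upper() for p in spec.split(",") if p.strip()]
    if not patterns:  # typing nothing at the prompt means "all enzymes"
        patterns = ["*"]
    selected = []
    for name, is_isoschizomer in enzymes:
        for pat in patterns:
            if pat == "**":
                hit = True
            elif pat == "*":
                hit = not is_isoschizomer
            else:
                hit = fnmatchcase(name.upper(), pat)
            if hit:
                selected.append(name)
                break  # each enzyme is selected at most once
    return selected
```

With the hypothetical list [("AluI", False), ("AlwI", False), ("EcoRI", False), ("IsoX", True)], select_enzymes("Al*", ...) returns ["AluI", "AlwI"], "*" returns the three unflagged names, and "**" returns all four. Keeping the data-file order, rather than the order typed on the command line, is a design choice of this sketch.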

-MENu=t

specifies which nucleotide reading frames are translated into protein sequences in the output file. Specify t for three forward frames, s for all six frames, o for open frames only, or n for no protein translation. You can also specify one of the letters a through f for any one of the six possible reading frames.

-TRANSlate=filename.txt

Usually, translation is based on the translation table in a default or local data file called translate.txt. This parameter allows you to use a translation table in a different file. (See Appendix VII for information about translation tables.)

-RSF=map.rsf

writes an RSF (rich sequence format) file containing the input sequences annotated with features generated from the results of Map. This RSF file is suitable for input to other Wisconsin Package programs that support RSF files. In particular, you can use SeqLab to view these feature annotations graphically. If you don't specify a file name with this parameter, the program creates one using map for the file basename and .rsf for the extension. For more information on RSF files, see "Using Rich Sequence Format (RSF) Files" in Chapter 2 of the User's Guide, or "Rich Sequence Format (RSF) Files" in Appendix C of the SeqLab Guide.

-OPEn=20

restricts the display of translations to open reading frames (ORFs). If you supply a number like 20 with this parameter, an ORF is displayed only if it codes for at least 20 amino acids.

-CIRcular

tells Map to treat your sequence as circular. If a possible recognition site starts at the end and continues into the beginning of the sequence, the site is marked at the point where a circular molecule would be cut. For instance, if your sequence ends in GAA and starts with TTC, Map shows an EcoRI cut two bases before the end of the sequence. The sequence is only circularized at the ends found in the file, so if you want a subrange to be treated as circular, you have to create a file in which the subrange is the entire sequence (see the Assemble program).
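One way to find sites that span the junction, sketched in Python: append the first len(site) - 1 bases to the end of the sequence, scan only start positions inside the original sequence, and reduce each cut coordinate modulo the sequence length. This is an assumed approach rather than Map's implementation; circular_cuts and its cut_offset argument (the number of site bases to the left of the cut, 1 for EcoRI's G^AATTC) are hypothetical.

```python
from __future__ import annotations


def circular_cuts(seq: str, site: str, cut_offset: int) -> list[int]:
    """Find exact matches of `site` in a circular sequence.

    Returns 1-based positions p meaning "cut between base p and the next
    base", wrapping around the origin (p == len(seq) is a cut at the origin).
    """
    n = len(seq)
    # Append the first len(site)-1 bases so that sites spanning the
    # end/beginning junction become visible to an ordinary scan.
    extended = seq + seq[: len(site) - 1]
    cuts = []
    for start in range(n):  # only starts inside the original sequence
        if extended[start : start + len(site)] == site:
            cuts.append((start + cut_offset - 1) % n + 1)
    return cuts


if __name__ == "__main__":
    # Ends in GAA and starts with TTC, so the EcoRI site spans the origin.
    print(circular_cuts("TTCAAAAGAA", "GAATTC", 1))
```

circular_cuts("TTCAAAAGAA", "GAATTC", 1) returns [8]: the cut falls after base 8 of 10, two bases before the end, which matches the EcoRI example above, whereas a linear scan of the same sequence finds no site.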

-LINear

is the opposite of -CIRcular. If you have defined a command that runs Map with -CIRcular as the default, use the -LINear parameter to make Map treat your sequence as linear.

-PAGe=60

Printed output from this program may break across pages at awkward places. Use this parameter to add form feeds to the output file so that clusters of related information are more likely to stay together on one page. You can set the number of lines per page by supplying a number after -PAGe.

-WIDth=100

sets the number of sequence symbols shown on each line of output. The default is 60, which fits nicely on a terminal screen, but 100 symbols per line makes it easy to estimate the sizes of fragments between cuts.
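The scale line under each sequence line can be rebuilt from the sample output in this document: the starting coordinate right-justified in eight columns, a ruler with a '+' at every tenth symbol, and the ending coordinate. The Python sketch below reproduces that layout for any -WIDth; the function names and the handling of a short final line are assumptions, not Map's code.

```python
from __future__ import annotations


def scale_line(start: int, width: int) -> str:
    """Start coordinate right-justified in 8 columns, a ruler with '+'
    at every 10th symbol, then the end coordinate."""
    ruler = "".join("+" if i % 10 == 0 else "-" for i in range(1, width + 1))
    return f"{start:>8} {ruler} {start + width - 1}"


def wrap_with_scale(seq: str, width: int = 60, begin: int = 1) -> list[str]:
    """Split `seq` into lines of `width` symbols, each followed by its scale line.
    Assumption: a short final line gets a correspondingly short ruler."""
    lines = []
    for offset in range(0, len(seq), width):
        chunk = seq[offset : offset + width]
        lines.append(" " * 9 + chunk)  # indent so the sequence sits above the ruler
        lines.append(scale_line(begin + offset, len(chunk)))
    return lines
```

scale_line(1, 60) reproduces the scale line under the first line of gzeinaa.map shown earlier, and scale_line(2161, 60) the one under coordinates 2161 to 2220 in the -BOTtom example; passing width=100 to wrap_with_scale gives the wider layout described above.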

-THReeletter

sets the translation to show three-letter amino acid codes instead of the one-letter codes. Normally you can set the translation to show three-letter amino acid codes by capitalizing your response to the protein translation program prompt. However, when you choose protein translation from the command line, you must add -THReeletter to get three-letter amino acid codes.

-MISmatch=1

causes the program to recognize sites that are like the recognition site but with one or fewer mismatches. If too many mismatches are allowed, the results may not be meaningful. The output from most mapping programs distinguishes between sites with no mismatches and sites with mismatches.
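A minimal Python sketch of mismatch-tolerant site finding, assuming that a mismatch means a substitution at one position (no insertions or deletions) and that site and sequence are compared position by position with plain equality. The function name is hypothetical; reporting the mismatch count alongside each hit mirrors the way the output distinguishes exact sites from sites with mismatches.

```python
from __future__ import annotations


def sites_with_mismatches(seq: str, site: str, max_mismatches: int = 1) -> list[tuple[int, int]]:
    """Return (1-based start, mismatch count) for every window of `seq`
    that differs from `site` at no more than `max_mismatches` positions.
    Only substitutions are counted; insertions and deletions are not considered."""
    hits = []
    for start in range(len(seq) - len(site) + 1):
        window = seq[start : start + len(site)]
        mismatches = sum(a != b for a, b in zip(window, site))
        if mismatches <= max_mismatches:
            hits.append((start + 1, mismatches))
    return hits
```

sites_with_mismatches("GAATTCGATTTC", "GAATTC", 1) returns [(1, 0), (7, 1)]: an exact site at position 1 and a one-mismatch site at position 7.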

-SILent

shows the places where restriction sites can be introduced (by site-directed mutagenesis) without changing the peptide translation of the sequence. The -SILent parameter assumes that the range you have chosen defines a coding region and reading frame precisely. Sites may be found that have any number of bases changed as long as the changes do not alter the translation. The reading frame is implied by the beginning coordinate you specify. The output from most mapping programs distinguishes between real sites and sites with one or more mismatches. The data file translate.txt defines the genetic code.

-PERFect

sets the program to look for a perfect alphabetic match between the site and the sequence. Ambiguity codes are normally translated so that the site RXY would find sequences like ACT or GAC. With this parameter, the ambiguity codes are not translated so the site RXY would only match the sequence RXY. This parameter is not the same as -MISmatch=0!
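The difference between ordinary and -PERFect matching can be sketched in Python with a partial IUPAC ambiguity table (only the codes needed for the RXY example, with X treated as "any base"). The table and the function site_matches are assumptions for illustration. Because ambiguity codes are still expanded when no mismatches are allowed, RXY would still match ACT under -MISmatch=0, which is why -PERFect is a different setting.

```python
# Partial IUPAC nucleotide ambiguity table (an assumption for this sketch):
# only the codes needed for the RXY example; X is treated like N ("any base").
AMBIGUITY = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "N": "ACGT", "X": "ACGT",
}


def site_matches(site: str, window: str, perfect: bool = False) -> bool:
    """Compare a recognition site with a same-length stretch of sequence.

    By default each site symbol is expanded to the bases it stands for;
    with perfect=True the comparison is purely alphabetic, like -PERFect.
    """
    if len(site) != len(window):
        return False
    if perfect:
        return site == window
    return all(base in AMBIGUITY.get(code, code) for code, base in zip(site, window))
```

site_matches("RXY", "ACT") and site_matches("RXY", "GAC") are True, while site_matches("RXY", "ACT", perfect=True) is False and site_matches("RXY", "RXY", perfect=True) is True.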

-ALL

makes an overlap-set map instead of the usual subset map. If your sequence is very ambiguous (for instance, as a back-translated sequence would be) and you want to see where restriction sites could be, then an overlap-set map is for you. Overlap-set and subset pattern recognition is discussed in more detail in the Program Manual entry for Window.

-APPend

appends the enzyme data file to your output file. If you provided your own translation scheme, that file is also appended.

-CUTters=gamma.cutters

writes out a new enzyme data file containing the selected enzymes that cut your sequence and were not excluded by any of the -MINCuts, -ONCe, -MAXCuts, or -EXClude parameters. If you do not add a file name to the -CUTters parameter, the output file is given the name of your sequence followed by the file name extension .cutters.

-NONCUTters=gamma.noncutters

writes out a new enzyme data file containing the selected enzymes that did NOT cut your sequence. If you do not add a file name to this parameter, the output file is given the name of your sequence followed by the file name extension .noncutters.

-EXCUTters=gamma.excutters

writes out a new enzyme data file containing the enzymes that cut your sequence but were excluded by any of the -EXClude, -MINCuts, -ONCe, or -MAXCuts parameters. If you do not add a file name to this parameter, the output file is given the name of your sequence followed by the file name extension .excutters.

The parameters -MINSitelen and -OVErhang restrict the domain of enzymes selected.

-MINSitelen=6

selects only patterns with the specified number or more bases in the recognition site. You can display the sites from any pattern in the enzyme or pattern file that you take the trouble to name individually, but when you use all of the patterns, the program uses all of the patterns whose recognition sites have the specified number or more non-N, non-X bases. -MINSitelen=6 replaces the -SIXbase parameter from earlier versions of the Wisconsin Package.
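Counting site length as described above (non-N, non-X bases only) is straightforward. In this Python sketch, any non-letter character in a site, such as a cut-position mark, is assumed to be ignored; the function names and the three example sites are illustrative, and the filter models only the case where all patterns are selected with "*".

```python
from __future__ import annotations


def site_length(site: str) -> int:
    """Count the specific bases in a recognition site: letters other than N or X.
    Non-letter characters (for example, cut-position marks) are ignored."""
    return sum(1 for c in site.upper() if c.isalpha() and c not in "NX")


def select_by_min_sitelen(sites: dict[str, str], min_len: int) -> list[str]:
    """Names of the patterns whose sites have at least `min_len` specific bases."""
    return [name for name, site in sites.items() if site_length(site) >= min_len]
```

With the sites {"AluI": "AGCT", "EcoRI": "GAATTC", "BglI": "GCCNNNNNGGC"}, select_by_min_sitelen(..., 6) keeps EcoRI and BglI (GCCNNNNNGGC has six specific bases) and drops the four-base AluI site.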

-OVErhang=0

selects only enzymes that leave blunt ends. Use 5 with this parameter to search only with enzymes that leave 5' overhangs, and 3 to search only with enzymes that leave 3' overhangs. You can use multiple values, separated by commas. For instance, -OVErhang=5,3 searches with all enzymes that leave either 5' or 3' overhangs. You can display the cuts from any enzyme in the enzyme data file that you take the trouble to name individually, but when you use * (meaning all), the program uses all of the enzymes whose overhangs conform to your choice with this parameter.
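The overhang type follows from where an enzyme cuts the two strands. In the Python sketch below, each enzyme is described by two cut offsets, the number of site bases to the left of the cut on the top strand and on the bottom strand, both measured in top-strand coordinates (an assumed representation, not the layout of the enzyme data file). A top-strand cut to the left of the bottom-strand cut leaves a 5' overhang; the reverse leaves a 3' overhang. The offsets in the usage note are the well-known cut positions of EcoRI (G^AATTC), PstI (CTGCA^G), and SmaI (CCC^GGG); the function names are hypothetical.

```python
from __future__ import annotations


def overhang_kind(top_cut: int, bottom_cut: int) -> int:
    """Classify an enzyme's ends from its cut offsets (site bases to the
    left of each cut, both in top-strand coordinates):
    0 = blunt, 5 = 5' overhang, 3 = 3' overhang."""
    if top_cut == bottom_cut:
        return 0
    return 5 if top_cut < bottom_cut else 3


def filter_by_overhang(cuts: dict[str, tuple[int, int]], spec: str) -> list[str]:
    """Keep enzymes whose overhang kind is listed in an -OVErhang value such as "5,3"."""
    wanted = {int(v) for v in spec.split(",")}
    return [name for name, (top, bottom) in cuts.items() if overhang_kind(top, bottom) in wanted]
```

With offsets {"EcoRI": (1, 5), "PstI": (5, 1), "SmaI": (3, 3)}, filter_by_overhang(..., "0") returns ["SmaI"] and filter_by_overhang(..., "5,3") returns ["EcoRI", "PstI"].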

The -MINCuts, -MAXCuts, -ONCe, and -EXClude parameters suppress the display of selected enzymes. The list of excluded enzymes in the program output includes both selected enzymes that cut within excluded ranges and selected enzymes that did not cut the right number of times.

-MINCuts=2

excludes enzymes that do not cut at least two times.

-MAXCuts=2

excludes enzymes that cut more than two times.

-ONCe

excludes, from the set of enzymes displayed, those enzymes that cut your sequence more than once (equivalent to setting both -MINCuts and -MAXCuts to 1).
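How -MINCuts, -MAXCuts, and -ONCe divide the selected enzymes can be sketched in Python. The three lists correspond to the groups written by -CUTters, -EXCUTters, and -NONCUTters. Treating an enzyme with zero cuts as a non-cutter rather than as excluded is an assumption, and the function name and cut counts are hypothetical.

```python
from __future__ import annotations


def classify(
    cut_counts: dict[str, int],
    min_cuts: int | None = None,
    max_cuts: int | None = None,
    once: bool = False,
) -> tuple[list[str], list[str], list[str]]:
    """Split selected enzymes into (shown, excluded, noncutters)."""
    if once:  # -ONCe behaves like -MINCuts=1 -MAXCuts=1
        min_cuts = max_cuts = 1
    shown, excluded, noncutters = [], [], []
    for name, n in cut_counts.items():
        if n == 0:  # assumption: zero cuts means "does not cut", not "excluded"
            noncutters.append(name)
        elif (min_cuts is not None and n < min_cuts) or (max_cuts is not None and n > max_cuts):
            excluded.append(name)
        else:
            shown.append(name)
    return shown, excluded, noncutters
```

For hypothetical counts {"Chymo": 9, "CnBr": 2, "Trypsin": 1, "NH2OH": 0}, classify(counts, once=True) returns (["Trypsin"], ["Chymo", "CnBr"], ["NH2OH"]).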

-EXClude=n1,n2[,n3,n4,...]

excludes enzymes that cut anywhere within one or more ranges of the sequence. If an enzyme cuts within an excluded range, it is not displayed; the list of excluded enzymes includes enzymes that cut within excluded ranges. Each range is defined by a pair of numbers, and all numbers are separated by commas. Spaces between numbers are not allowed. The numbers must be integers that fall within the sequence beginning and ending points you have chosen. The range may be circular if circular mapping is being done. Exclusion is not done if any of the numbers contains non-numeric characters, if any number is out of range, or if an odd number of integers follows the parameter.
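The validation rules for -EXClude translate directly into a small Python sketch. parse_exclude returns None when exclusion should not be done; cuts_in_excluded then checks whether an enzyme's cut positions fall in any range. Both function names are hypothetical, and the handling of a reversed range (n1 greater than n2) on a linear sequence is an assumption, since the text above describes wrapping ranges only for circular mapping.

```python
from __future__ import annotations

import re


def parse_exclude(value: str, begin: int, end: int) -> list[tuple[int, int]] | None:
    """Parse an -EXClude value such as "10,20,150,160" into (n1, n2) ranges.

    Returns None ("do no exclusion") for an odd number of fields, a field that
    is not a plain unsigned integer (spaces included), or a number outside
    begin..end.
    """
    fields = value.split(",")
    if len(fields) % 2 != 0:
        return None
    if not all(re.fullmatch(r"[0-9]+", f) for f in fields):
        return None
    nums = [int(f) for f in fields]
    if any(n < begin or n > end for n in nums):
        return None
    return list(zip(nums[0::2], nums[1::2]))


def cuts_in_excluded(cuts: list[int], ranges: list[tuple[int, int]], circular: bool = False) -> bool:
    """True if any cut position falls inside any excluded range.
    A range with n1 > n2 wraps around the origin when mapping is circular;
    on a linear sequence it is treated as n2..n1 (an assumption)."""
    for n1, n2 in ranges:
        for c in cuts:
            if n1 <= n2:
                inside = n1 <= c <= n2
            elif circular:
                inside = c >= n1 or c <= n2
            else:
                inside = n2 <= c <= n1
            if inside:
                return True
    return False
```

parse_exclude("10,20", 1, 283) returns [(10, 20)], while "10, 20" (a space), "10,20,30" (an odd count), and "10,300" (out of range for a 283-residue sequence) all return None.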

-BOTtom

shows where each enzyme cuts the reverse strand as well as the forward strand. The cut point on the bottom strand is the 5' end of the fragment which continues to the left.


                           HgaI
                           SimI
                        NlaIII|
                     BsaJI   ||
                      DsaI   ||
                      NcoI   ||
                      StyI   ||
                  BsaHI  |   ||                         RleAI
              BspGI   |  |   ||    MnlI              BseRI  |
          BfaI    |   |  |   || MnlI  |           CviJI  |  |  CviJI
             |    |   |  |   ||    |  |               |  |  |      |
         GCTCCTAGTCCAGACGCCATGGGTCATTTCACAGAGGAGGACAAGGCTACTATCACAAGC
    2161 ---------+---------+---------+---------+---------+---------+ 2220
         CGAGGATCAGGTCTGCGGTACCCAGTAAAGTGTCTCCTCCTGTTCCGATGATAGTGTTCG
               |  |     ||   |   ||| |                || |         |
            BfaI  | BsaHI|StyI   ||| |            CviJI| |     CviJI
              BspGI NlaIII   |SimI|| |             BseRI |
                          NcoI MnlI| |               RleAI
                          DsaI  HgaI |
                         BsaJI    MnlI
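The bottom line of the display is the base-by-base complement of the top strand, written left to right so that each base sits under its partner. A minimal Python sketch, assuming only unambiguous bases plus N (ambiguity codes such as R and Y would need their own complement pairs); the function name is hypothetical.

```python
def complement_line(top: str) -> str:
    """Return the base-by-base complement of `top`, written left to right
    (3' to 5') so that each base sits under its partner, as in the bottom
    strand of the Map display. Only A, C, G, T and N are handled here."""
    return top.translate(str.maketrans("ACGTN", "TGCAN"))
```

Applied to the top strand in the example above (GCTCCTAGTC...CACAAGC), complement_line returns exactly the bottom-strand line shown under the scale.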

-VERtical

shows enzyme names vertically over (or under) the position where they cut. When a collision at a cut point requires more than one enzyme to be displayed at that point, Map uses the next unoccupied column to the right. A '/' below the enzyme's name indicates that the name of the enzyme has been displaced. When the number of finds is very great, the resolution of this kind of display is inadequate. If the display seems too full, either restrict the number of enzymes chosen or use the default horizontal enzyme display.


                             N
                  B   B  B   l                        C  B  R      C
             B    s   s  sDNSaH    M  M               v  s  l      v
             f    p   a  asctIg    n  n               i  e  e      i
             a    G   H  JaoyIa    l  l               J  R  A      J
             I    I   I  IIIIII    I  I               I  I  I      I
             |    |   |  |///||    |  |               |  |  |      |
         GCTCCTAGTCCAGACGCCATGGGTCATTTCACAGAGGAGGACAAGGCTACTATCACAAGC
    2161 ---------+---------+---------+---------+---------+---------+ 2220
         CGAGGATCAGGTCTGCGGTACCCAGTAAAGTGTCTCCTCCTGTTCCGATGATAGTGTTCG

The center of the Map display is a line showing the cut points with '|' characters, the top strand of the sequence, a scale, and the bottom sequence strand. These parameters let you suppress any of these lines.

-NOCUTline

suppresses the line of '|' characters between the enzyme name and the strand it cuts.

-NOSEQline

suppresses the sequence display.

-NOSCALeline

suppresses the scale line between the sequence and its complement.

-NOCOMPline

suppresses complement sequence display.

-TABle

If you simply want a table showing which enzymes cut where, use this parameter. See the topic TABLE OUTPUT.

-SORtbyenzyme

Table output is normally sorted by the position of the cut in the top strand of the sequence. Use this parameter to see the cuts sorted first by enzyme and then by position. See the topic TABLE OUTPUT.
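The two orderings can be sketched in Python as sort keys over (enzyme, position) rows. The row layout and the tie-break by enzyme name for cuts at the same position are assumptions; the table format itself is not reproduced here, and the function name and rows are hypothetical.

```python
from __future__ import annotations


def sort_table(rows: list[tuple[str, int]], by_enzyme: bool = False) -> list[tuple[str, int]]:
    """Sort (enzyme, top-strand cut position) rows for table output.

    Default: by position (ties broken by enzyme name, an assumption).
    by_enzyme=True mimics -SORtbyenzyme: by enzyme, then by position.
    """
    if by_enzyme:
        return sorted(rows, key=lambda r: (r[0], r[1]))
    return sorted(rows, key=lambda r: (r[1], r[0]))
```

For rows [("Trypsin", 12), ("CnBr", 9), ("Trypsin", 1), ("Chymo", 9)], the default order is Trypsin 1, Chymo 9, CnBr 9, Trypsin 12, and by_enzyme=True gives Chymo 9, CnBr 9, Trypsin 1, Trypsin 12.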

-MONitor

This program normally monitors its progress on your screen. However, when you use -Default to suppress all program interaction, you also suppress the monitor. You can turn it back on with this parameter. If you are running the program in batch, the monitor will appear in the log file.

-SUMmary

writes a summary of the program's work to the screen when you've used -Default to suppress all program interaction. A summary typically displays at the end of a program run interactively. You can suppress the summary for a program run interactively with -NOSUMmary.

You can also use this parameter to cause a summary of the program's work to be written in the log file of a program run in batch.

Printed: December 9, 1998 16:28 (1162)


Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com

Licenses and Trademarks Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.