APPENDIX III
Sequence Symbols
Wisconsin Package programs allow all upper- and lowercase letters, periods (.), asterisks (*), tildes (~), ampersands (&), and at (@) symbols in biological sequences. Nucleotide symbols, their complements, and the standard one-letter amino acid symbols are shown below in separate lists. The meanings of the symbols & and @ have not been assigned at this writing (October 1996).
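As an illustration of the symbol rule above, the following Python sketch checks a sequence string against the allowed set. It assumes the sequence has already been separated from any file header, line numbers, and spacing; the names `ALLOWED_SYMBOLS` and `invalid_symbols` are hypothetical and not part of the Wisconsin Package.

```python
import string

# Hypothetical name: the symbols the text above says Wisconsin Package
# programs accept in a sequence (all ASCII letters plus . * ~ & @).
ALLOWED_SYMBOLS = frozenset(string.ascii_letters + ".*~&@")


def invalid_symbols(sequence: str) -> set[str]:
    """Return the characters in `sequence` that are not allowed symbols.

    Assumes headers, line numbers, and spacing were already removed.
    """
    return {ch for ch in sequence if ch not in ALLOWED_SYMBOLS}


print(invalid_symbols("ACGTacgt..~~*&@"))   # set()
print(sorted(invalid_symbols("ACGT-12")))  # ['-', '1', '2']
```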
The Wisconsin Package supports two gap characters: the period (.) and the tilde (~). Wisconsin Package programs run from the command line or from the Main List mode of SeqLab treat the two gap characters identically in input sequences. Wisconsin Package programs run from the Editor mode of SeqLab remove any tilde gap characters from the right end of each input sequence before performing their analyses. In the future, programs run from either the command line or SeqLab may differentiate between the two gap characters in their analyses.
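The two behaviors described above can be modeled as follows. This is a minimal sketch under stated assumptions, not GCG code: treating the gaps "identically" is modeled by rewriting every tilde as a period, and Editor-mode trimming is read literally as removing only tildes at the right end, so a trailing period stops the trim. Both function names are hypothetical.

```python
# Hypothetical helpers modeling the two gap-handling behaviors described above.

def normalize_gaps(seq: str) -> str:
    """Command line / Main List: '.' and '~' are treated identically.

    Modeled here (an assumption) by rewriting every tilde as a period.
    """
    return seq.replace("~", ".")


def strip_trailing_tildes(seq: str) -> str:
    """Editor mode: remove tilde gaps from the right end of the sequence.

    Only tildes are trimmed; a period at the end stops the trim.
    """
    return seq.rstrip("~")


print(normalize_gaps("AC~G..T~"))         # AC.G..T.
print(strip_trailing_tildes("AC~GT~~~"))  # AC~GT
print(strip_trailing_tildes("ACGT.~~"))   # ACGT.
```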
The period gap character will increasingly be used as a space holder that may represent a missing character in a sequence. For example, the period gap character may represent a missed base call in a contig alignment in fragment assembly.
The tilde gap character will increasingly be used as a simple placeholder that never represents an actual character in a sequence. For example, two tildes may be used in a translated sequence to align each codon in a nucleotide sequence with its corresponding single-letter amino acid symbol.
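A minimal sketch of the codon-alignment use of tildes: each residue in the translation is followed by two tildes so that it occupies the same three columns as its codon. Putting the tildes after the residue, rather than before or around it, is an assumption; `codon_aligned` is a hypothetical name.

```python
def codon_aligned(protein: str) -> str:
    """Follow each residue with two tildes so it spans one 3-base codon.

    Placing the tildes after the residue is an assumption.
    """
    return "".join(residue + "~~" for residue in protein)


dna = "ATGAAATGGTAA"
translation = codon_aligned("MKW*")
print(dna)          # ATGAAATGGTAA
print(translation)  # M~~K~~W~~*~~
```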
As another example, gaps at the ends of sequences in an alignment may be written as tildes when those gaps are due to differences in input sequence lengths rather than missing characters in the input sequences.
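The end-gap convention can be sketched by padding each aligned sequence on the right with tildes up to the length of the longest one, leaving interior period gaps untouched. This assumes all sequences start in the same column, so only right-end gaps are handled; `pad_ends` is a hypothetical helper.

```python
def pad_ends(aligned: list[str]) -> list[str]:
    """Right-pad each aligned sequence with '~' to the longest length.

    Interior '.' gaps are left as they are; leading offsets are not handled.
    """
    width = max(len(seq) for seq in aligned)
    return [seq.ljust(width, "~") for seq in aligned]


for row in pad_ends(["ACG.TTGA", "ACGGTTGACC", "ACGGT"]):
    print(row)
# ACG.TTGA~~
# ACGGTTGACC
# ACGGT~~~~~
```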
GCG uses the one-letter codes for amino acids and for nucleotide ambiguity proposed by IUPAC-IUB. These codes are compatible with the codes used by the EMBL, GenBank, and PIR databases.
Nucleotides
The meaning of each symbol, its complement, and the Cambridge (Staden/Sanger) equivalents are shown below. Cambridge files can be converted into GCG files and vice versa with the programs FromStaden and ToStaden.
IUB/GCG  Meaning             Complement  Staden/Sanger
A        A                   T           A
C        C                   G           C
G        G                   C           G
T/U      T                   A           T
M        A or C              K           M
R        A or G              Y           R
W        A or T              W           W
S        C or G              S           S
Y        C or T              R           Y
K        G or T              M           K
V        A or C or G         B           V
H        A or C or T         D           H
D        A or G or T         H           D
B        C or G or T         V           B
X/N      G or A or T or C    X/N         N
./~      gap character       ./~         -
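The complement column above is enough to reverse-complement a sequence. The sketch below transcribes that column into a Python translation table. It assumes lowercase symbols complement to lowercase, and because the table lists A as the complement of U, a reverse-complemented RNA sequence comes back as DNA. Characters not in the table, such as `*`, pass through unchanged. The function name is hypothetical, and this is not the code of any particular GCG program.

```python
# Complements transcribed from the nucleotide table above (uppercase IUB/GCG symbols).
_UPPER_COMPLEMENTS = {
    "A": "T", "C": "G", "G": "C", "T": "A", "U": "A",
    "M": "K", "R": "Y", "W": "W", "S": "S", "Y": "R", "K": "M",
    "V": "B", "H": "D", "D": "H", "B": "V",
    "X": "X", "N": "N",
    ".": ".", "~": "~",  # gap characters complement to themselves
}

# Assumption: lowercase symbols complement to lowercase symbols.
_ALL_COMPLEMENTS = {
    **_UPPER_COMPLEMENTS,
    **{k.lower(): v.lower() for k, v in _UPPER_COMPLEMENTS.items() if k.isalpha()},
}
COMPLEMENT_TABLE = str.maketrans(_ALL_COMPLEMENTS)


def reverse_complement(seq: str) -> str:
    """Reverse `seq` and complement each symbol; unknown characters pass through."""
    return seq[::-1].translate(COMPLEMENT_TABLE)


print(reverse_complement("ATGMRN.~"))  # ~.NYKCAT
print(reverse_complement("acgu"))      # acgt
```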
The uncertainty and frame ambiguity codes used by Staden are not supported by GCG; FromStaden converts them to the lowercase single-letter equivalents listed below.
Staden Code  Meaning      GCG
1            probably C   c
2            probably T   t
3            probably A   a
4            probably G   g
5            A or C       m
6            G or T       k
7            A or T       w
8            G or C       s
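The conversion in the table above amounts to a character-for-character substitution. The sketch below is illustrative only and is not the FromStaden source: it maps the eight Staden codes to their lowercase GCG symbols, leaves the standard bases alone, and maps the Staden gap `-` to a period, which is an assumption because the documentation does not say which GCG gap character FromStaden writes. `from_staden_symbols` is a hypothetical name.

```python
# Staden uncertainty codes and their lowercase GCG equivalents, from the table above.
STADEN_TO_GCG = str.maketrans({
    "1": "c", "2": "t", "3": "a", "4": "g",
    "5": "m", "6": "k", "7": "w", "8": "s",
    "-": ".",  # assumption: Staden gap '-' written as a GCG period
})


def from_staden_symbols(seq: str) -> str:
    """Translate Staden symbols to GCG symbols; other characters are unchanged."""
    return seq.translate(STADEN_TO_GCG)


print(from_staden_symbols("AC1G5-T8"))  # ACcGm.Ts
```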
Amino Acids
Here is a list of the standard one-letter amino acid codes and their three-letter equivalents. The synonymous codons for each amino acid and their depiction in the IUB codes are also shown. You should recognize that the depictions following semicolons (;) are not specific enough to define a single amino acid, even though they represent the best possible backtranslation into the IUB codes! You can redefine all of the relationships in this list in a local data file as described in Appendix VII.
Symbol  3-letter  Meaning               Codons                   IUB Depiction
A       Ala       Alanine               GCT,GCC,GCA,GCG          !GCX
B       Asp,Asn   Aspartic, Asparagine  GAT,GAC,AAT,AAC          !RAY
C       Cys       Cysteine              TGT,TGC                  !TGY
D       Asp       Aspartic              GAT,GAC                  !GAY
E       Glu       Glutamic              GAA,GAG                  !GAR
F       Phe       Phenylalanine         TTT,TTC                  !TTY
G       Gly       Glycine               GGT,GGC,GGA,GGG          !GGX
H       His       Histidine             CAT,CAC                  !CAY
I       Ile       Isoleucine            ATT,ATC,ATA              !ATH
K       Lys       Lysine                AAA,AAG                  !AAR
L       Leu       Leucine               TTG,TTA,CTT,CTC,CTA,CTG  !TTR,CTX,YTR;YTX
M       Met       Methionine            ATG                      !ATG
N       Asn       Asparagine            AAT,AAC                  !AAY
P       Pro       Proline               CCT,CCC,CCA,CCG          !CCX
Q       Gln       Glutamine             CAA,CAG                  !CAR
R       Arg       Arginine              CGT,CGC,CGA,CGG,AGA,AGG  !CGX,AGR,MGR;MGX
S       Ser       Serine                TCT,TCC,TCA,TCG,AGT,AGC  !TCX,AGY;WSX
T       Thr       Threonine             ACT,ACC,ACA,ACG          !ACX
V       Val       Valine                GTT,GTC,GTA,GTG          !GTX
W       Trp       Tryptophan            TGG                      !TGG
X       Xxx       Unknown                                        !XXX
Y       Tyr       Tyrosine              TAT,TAC                  !TAY
Z       Glu,Gln   Glutamic, Glutamine   GAA,GAG,CAA,CAG          !SAR
*       End       Terminator            TAA,TAG,TGA              !TAR,TRA;TRR
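To see why the depictions after the semicolons are ambiguous, the sketch below expands an IUB depiction into every codon it matches and reports which amino acids those codons encode, using the codon column of the table above. `expand` and `residues_matching` are hypothetical names; B, Z, and X are omitted because they are themselves ambiguous.

```python
from itertools import product

# IUB nucleotide codes from the nucleotide table; X and N mean any base.
IUB_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT",
    "X": "ACGT", "N": "ACGT",
}

# Codon column of the amino acid table (B, Z, and X omitted).
CODON_TEXT = {
    "A": "GCT,GCC,GCA,GCG", "C": "TGT,TGC", "D": "GAT,GAC", "E": "GAA,GAG",
    "F": "TTT,TTC", "G": "GGT,GGC,GGA,GGG", "H": "CAT,CAC", "I": "ATT,ATC,ATA",
    "K": "AAA,AAG", "L": "TTG,TTA,CTT,CTC,CTA,CTG", "M": "ATG", "N": "AAT,AAC",
    "P": "CCT,CCC,CCA,CCG", "Q": "CAA,CAG", "R": "CGT,CGC,CGA,CGG,AGA,AGG",
    "S": "TCT,TCC,TCA,TCG,AGT,AGC", "T": "ACT,ACC,ACA,ACG", "V": "GTT,GTC,GTA,GTG",
    "W": "TGG", "Y": "TAT,TAC", "*": "TAA,TAG,TGA",
}
CODONS = {aa: set(text.split(",")) for aa, text in CODON_TEXT.items()}


def expand(depiction: str) -> set[str]:
    """Return every codon matched by a 3-letter IUB depiction such as 'YTR'."""
    return {"".join(bases) for bases in product(*(IUB_BASES[b] for b in depiction))}


def residues_matching(depiction: str) -> set[str]:
    """Return the one-letter symbols whose codons overlap the depiction."""
    matched = expand(depiction)
    return {aa for aa, codons in CODONS.items() if codons & matched}


print(sorted(residues_matching("YTR")))  # ['L']
print(sorted(residues_matching("YTX")))  # ['F', 'L']
print(sorted(residues_matching("MGX")))  # ['R', 'S']
print(sorted(residues_matching("WSX")))  # ['*', 'C', 'R', 'S', 'T', 'W']
```

For example, YTR matches only leucine codons, while YTX also matches TTT and TTC and so cannot distinguish leucine from phenylalanine.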
Printed: December 9, 1998 16:22 (1162)
Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com
Copyright (c) 1982, 1983, 1985, 1986, 1987, 1989,
1991, 1994, 1995, 1996, 1997, 1998 Genetics Computer Group Inc., a wholly
owned subsidiary of Oxford Molecular Group, Inc. All rights reserved.
Licenses and Trademarks
Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.
All other product names mentioned in this documentation may
be trademarks, and if so, are trademarks or registered trademarks of
their respective holders and are used in this documentation for
identification purposes only.

www.gcg.com