Composition determines the composition of sequence(s). For nucleotide sequence(s), Composition also determines dinucleotide and trinucleotide content.
Composition measures the composition of one or a group of sequences. If you specify only one sequence, you can choose a range within the sequence. Lowercase letters are converted to uppercase and counted with their uppercase equivalents. If you specify a group of sequences, Composition displays the name of each sequence as it finishes the measurement for that sequence.
Here is a session using Composition to measure the composition and di- and trinucleotide content for all of the bacterial sequences in GenEMBL:
  % composition

   COMPOSITION on what sequence(s) ?  Bacterial:*

   What should I call the output file (* bacterial.composition *) ?

   A33344
   A33349
   A34992
   /////////
   ZSSSURNAS1
   ZSSSURNAS2
   ZYP16SRNA

   COMPOSITION complete.

        Sequences: 48,022
     Total Length: 120,532,386
         CPU time: 54.37
      Output file: bacterial.composition

  %
Here is the output file:
  COMPOSITION of: Primate:*  October 7, 1998 11:13

      Sequences: 48,022
   Total_Length: 120,532,386
       CPU_Time: 54.37

  *****

      A:  31,313,460      B:         102      C:  28,735,001      D:          95
      G:  30,961,705      H:         314      K:         849      M:         777
      N:      91,852      R:       1,571      S:       1,568      T:  29,422,612
      V:         102      W:         964      Y:       1,414

  Other:           0
  Total: 120,532,386

  *****

     GG: 8,050,342     GA: 7,862,220     GT: 6,360,816     GC: 8,666,422
     AG: 7,038,203     AA: 9,998,328     AT: 7,864,675     AC: 6,393,582
     TG: 7,923,448     TA: 6,046,193     TT: 8,693,358     TC: 6,736,239
     CG: 7,924,019     CA: 7,381,735     CT: 6,486,257     CC: 6,920,823

  Other: 137,704
  Total: 120,484,364

  *****

    GGG: 1,809,031    GGA: 1,960,301    GGT: 1,901,998    GGC: 2,371,501
    GAG: 1,672,334    GAA: 2,489,765    GAT: 2,125,037    GAC: 1,570,445
    GTG: 1,747,910    GTA: 1,331,828    GTT: 1,828,370    GTC: 1,446,886
    GCG: 2,339,125    GCA: 2,092,399    GCT: 2,019,219    GCC: 2,210,163
    AGG: 1,758,829    AGA: 1,840,304    AGT: 1,406,725    AGC: 2,027,036
    AAG: 2,315,186    AAA: 3,373,559    AAT: 2,308,923    AAC: 1,993,654
    ATG: 1,990,442    ATA: 1,691,234    ATT: 2,235,792    ATC: 1,942,982
    ACG: 1,661,442    ACA: 1,628,545    ACT: 1,365,670    ACC: 1,733,627
    TGG: 2,225,297    TGA: 2,195,490    TGT: 1,492,723    TGC: 2,005,470
    TAG: 1,051,637    TAA: 1,930,346    TAT: 1,737,112    TAC: 1,323,841
    TTG: 2,036,860    TTA: 1,910,148    TTT: 2,790,885    TTC: 1,949,165
    TCG: 1,767,614    TCA: 1,899,594    TCT: 1,527,840    TCC: 1,535,419
    CGG: 2,250,632    CGA: 1,857,615    CGT: 1,553,902    CGC: 2,257,754
    CAG: 1,993,100    CAA: 2,196,518    CAT: 1,686,654    CAC: 1,502,049
    CTG: 2,143,874    CTA: 1,109,501    CTT: 1,833,157    CTC: 1,393,229
    CCG: 2,151,571    CCA: 1,757,503    CCT: 1,568,927    CCC: 1,436,717

  Other: 173,936
  Total: 120,436,342

  *****
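The totals in the listing are consistent with counting every overlapping window in each sequence: the dinucleotide total equals the total length minus one per sequence (120,532,386 - 48,022 = 120,484,364), and the trinucleotide total equals the total length minus two per sequence (120,436,342). The Python sketch below illustrates that kind of counting for windows of two or more bases. It is an inference from the numbers, not the program's documented algorithm; the function name and the rule that pools any window containing a non-ACGT symbol under "Other" are assumptions made for illustration.

```python
from collections import Counter

BASES = set("ACGT")

def kmer_composition(seq, k):
    """Count every overlapping window of length k (k >= 2) in one sequence."""
    seq = seq.upper()  # lowercase letters count with their uppercase equivalents
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        # Assumption: windows with any non-ACGT symbol are pooled as "Other".
        counts[word if set(word) <= BASES else "Other"] += 1
    return counts

# Hypothetical 8-base sequence: 7 dinucleotide windows, 6 trinucleotide windows.
di = kmer_composition("acgtNacg", 2)
tri = kmer_composition("acgtNacg", 3)
print(dict(di))
print(dict(tri))
```

Summing such counts over a group of sequences gives a total of (total length) minus (k - 1) per sequence, which is the pattern the listing shows.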
Unknown.
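You can infer the composition of the bottom strand of a nucleic acid sequence from the composition of the top strand. The -BOTHstrands parameter measures both strands, but information is lost because the combined counts become symmetric: G equals C, A equals T, and each di- or trinucleotide count equals the count of its reverse complement.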
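As a rough illustration of that inference, the Python sketch below derives bottom-strand counts from top-strand counts by reverse-complementing each word, then adds the two strands together. The function names are hypothetical, and only the unambiguous bases A, C, G, and T are complemented; ambiguity codes are left unchanged for simplicity. The four base counts are taken from the output file shown earlier.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(word):
    """Return the reverse complement of a word made of A, C, G, T."""
    return word.translate(COMPLEMENT)[::-1]

def bottom_strand_counts(top_counts):
    """Each word on the top strand appears as its reverse complement below."""
    return {reverse_complement(w): n for w, n in top_counts.items()}

def both_strand_counts(top_counts):
    """Sum top and bottom strands; complementary words end up with equal counts."""
    bottom = bottom_strand_counts(top_counts)
    words = set(top_counts) | set(bottom)
    return {w: top_counts.get(w, 0) + bottom.get(w, 0) for w in words}

# Single-base counts from the example output file.
top = {"A": 31313460, "C": 28735001, "G": 30961705, "T": 29422612}
both = both_strand_counts(top)
print(both["A"] == both["T"], both["G"] == both["C"])
```

Once the strands are summed, the difference between the A and T counts (or between G and C) on the top strand can no longer be recovered, which is the loss of information described above.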
Composition takes either a single or a multiple sequence file specification. You can specify multiple sequences in a number of ways: by using a list file, for example @project.list; by using an MSF or RSF file, for example project.msf{*}; or by using a sequence specification with an asterisk (*) wildcard, for example GenEMBL:*. The function of Composition depends on whether your input sequence(s) are protein or nucleotide. Programs determine the type of a sequence by the presence of either Type: N or Type: P on the last line of the text heading just above the sequence. If your sequence(s) are not the correct type, see Appendix VI for information on how to change or set the type of a sequence.
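The type check described above can be sketched in Python as follows. This is only an illustration: it assumes that the last line of the text heading is the one ending in two periods (..), which is not stated in this section, and both the function name and the sample record are hypothetical.

```python
import re

def gcg_sequence_type(text):
    """Return "N", "P", or None, based on the last heading line of a record."""
    for line in text.splitlines():
        # Assumption: the heading ends at the first line that ends with "..".
        if line.rstrip().endswith(".."):
            match = re.search(r"Type:\s*([NP])\b", line)
            return match.group(1) if match else None
    return None

# Hypothetical record, invented for this example.
example = """A made-up test sequence.

  example.seq  Length: 8  Type: N  ..

       1  ACGTNACG
"""
print(gcg_sequence_type(example))
```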
CodonFrequency tabulates codon frequencies for any range of a sequence in a particular reading frame, as opposed to counting all trinucleotides.
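The difference between the two kinds of count can be sketched in Python. The functions below are illustrative only and do not reproduce either program; the names and the begin, end, and frame arguments are hypothetical.

```python
from collections import Counter

def overlapping_trinucleotides(seq):
    """Count a trinucleotide starting at every position."""
    seq = seq.upper()
    return Counter(seq[i:i + 3] for i in range(len(seq) - 2))

def codons_in_frame(seq, begin=1, end=None, frame=1):
    """Count non-overlapping triplets in one reading frame of a 1-based range."""
    seq = seq.upper()
    end = len(seq) if end is None else end
    start = (begin - 1) + (frame - 1)
    # Step by 3 and stop while a full triplet still fits before `end`.
    return Counter(seq[i:i + 3] for i in range(start, end - 2, 3))

seq = "ATGAAATGA"  # made-up 9-base sequence
print(dict(overlapping_trinucleotides(seq)))
print(dict(codons_in_frame(seq, frame=1)))
```

For this nine-base sequence, the overlapping count sees seven trinucleotides, while frame 1 contains only three codons.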
If you need to stop this program, use <Ctrl>C to reset your terminal and session as gracefully as possible. Searches and comparisons write out the results from the part of the search that is complete when you use <Ctrl>C.
You can run this program in the batch queue using a script that we supply. Use Fetch with a filename that starts with this program's name and ends with the filename extension .csh. Modify the file with any text editor so that it specifies the experiment you want to do and queue the script.
See the sections on specifying sequences in Chapter 2, Using Sequence Files and Databases of the User's Guide.
All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.
Minimal Syntax: % composition [-INfile=]Bacterial:* -Default

Prompted Parameters:

  -BEGin=1 -END=1000                 sets the range of interest (single seqs only)
  [-OUTfile=]bacterial.composition   names the output file

Local Data Files: None

Optional Parameters:

  -BOTHstrands   determines composition of both strands of nucleic acids
  -NOCOMmas      removes the commas from the numbers in the output
  -NOMONitor     suppresses the screen monitor showing each sequence
  -NOSUMmary     suppresses the screen summary at the end of the program
None.
You can set the parameters listed below from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.
-BOTHstrands measures the composition of both strands of a nucleic acid sequence.
Composition normally displays numbers greater than 999 with commas to make them easier to read; for example, the number 1234567 appears as 1,234,567. Many programs cannot read numbers that contain commas. If you are going to use the output file from this program as input to another program, you can suppress the commas with -NOCOMmas.
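If an output file has already been written with commas, a downstream script can strip them when it reads the counts. The Python sketch below shows one way to do that; the helper name is hypothetical, and the pattern assumes the "label: number" layout of the composition lines in the example output.

```python
import re

def parse_counts(text):
    """Read "label: 1,234,567" pairs into a dict of integers."""
    pairs = re.findall(r"(\w+):\s+([\d,]+)", text)
    # Remove the thousands separators before converting to int.
    return {label: int(value.replace(",", "")) for label, value in pairs}

line = "A: 31,313,460 B: 102 C: 28,735,001"
print(parse_counts(line))
```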
This program normally monitors its progress on your screen. However, when you use -Default to suppress all program interaction, you also suppress the monitor. You can turn it back on with -MONitor. If you are running the program in batch, the monitor will appear in the log file.
-SUMmary writes a summary of the program's work to the screen when you've used -Default to suppress all program interaction. A summary is typically displayed at the end of a program run interactively. You can suppress the summary for a program run interactively with -NOSUMmary.
You can also use this parameter to cause a summary of the program's work to be written in the log file of a program run in batch.
Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com
Copyright (c) 1982, 1983, 1985, 1986, 1987, 1989, 1991, 1994, 1995, 1996, 1997, 1998 Genetics Computer Group Inc., a wholly owned subsidiary of Oxford Molecular Group, Inc. All rights reserved.
Licenses and Trademarks: Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.
All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.