FromIG reformats one or more sequences from IntelliGenetics format into individual files in GCG format.
Use FromIG when you want to move sequences being used or assembled with IntelliGenetics software into a format suitable for use with programs in the Wisconsin Package(TM). Since IG software maintains many sequences in one file, FromIG must write many output files, one for each sequence in the IG file. Each output file is named according to the identifier word just above the sequence data in the IG file. All the documentation from the IntelliGenetics input file is preserved in the GCG output files. If an IG sequence is circular, the GCG sequence file says (circular sequence) just above the dividing line.
Here is a session using FromIG to convert the IG-format sequences in urchin.nih into separate files in GCG format:
  % fromig

   FROMIG of what IntelliGenetics sequence file(s) ?  urchin.nih

     surphist1    788 bp
     surphist2    188 bp
     surphist3    159 bp

   ///////////////////

     surshist2    682 bp

   Finished FROMIG with 12 files written.
   6418 bases were reformatted.

  %
Here is part of the first output file, surphist1, from the example above:
  FROMIG of: urchin.nih

  definition sea urchin(p.mil.) histone genes; h4 gene. 788bp
  locus surphist1 788 bp updated 11/01/82
  segment 1 of 9

  //////////////////////////////////////////////////////////////////////

  composition: 180 a 158 c 198 g 129 t 123 n
  total: 788 nucleotides.
  (circular sequence)

  Surphist1.  Length: 788  September 29, 1998 17:59  Type: N  Check: 6642  ..

         1  CAACATATTA GAGGAAGGGA GAGAGAGAGA GAGAGAGAGA GAGAGAGAGA
        51  GGGGGGGGGG GAGGGAGAAT TGCCCAAAAC ACTGTAAATG TAGCGTTAAT

  ////////////////////////////////////////////////////////////

       701  NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNGGCCGAAC ACTGTACGGC
       751  TTCGGCGGCT AAGTGAAGCA GACTTGGCTA GAATAACG
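The sequence layout in the output excerpt above (a starting position followed by five blocks of ten symbols per line) can be reproduced with a short sketch. This is only an illustration of the layout as it appears in the example, not FromIG's own code; the function name gcg_sequence_lines and the column width are assumptions made for the example.

```python
def gcg_sequence_lines(seq, per_line=50, block=10, width=8):
    """Lay out a sequence as numbered lines of space-separated blocks.

    per_line and block follow the excerpt above (50 symbols per line,
    in blocks of 10). width is an assumed column width for the position.
    """
    lines = []
    for start in range(0, len(seq), per_line):
        chunk = seq[start:start + per_line]
        blocks = [chunk[i:i + block] for i in range(0, len(chunk), block)]
        # Each line begins with the 1-based position of its first symbol.
        lines.append(f"{start + 1:>{width}}  " + " ".join(blocks))
    return lines


# Print a short made-up sequence in the numbered layout.
for line in gcg_sequence_lines("CAACATATTAGAGGAAGGGA" * 3):
    print(line)
```

Each line starts at position 1, 51, 101 and so on, which matches the numbering visible in the surphist1 excerpt.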
FromIG accepts one or more files containing sequences in IG format as input. You can specify multiple input files as a file of file names, for example @igseqs.list, or by using a file specification with an asterisk (*) wildcard, for example ig*.seq. Each input file may contain one or more sequences. Here is part of the input file used for the example above (the number 2 appears at the end of circular sequences in IG format):
  ; DEFINITION SEA URCHIN(P.MIL.) HISTONE GENES; H4 GENE. 788BP
  ; LOCUS SURPHIST1 788 BP UPDATED 11/01/82
  ; SEGMENT 1 OF 9

  ////////////////////////////////////////////////////////////

  ; Composition: 180 A 158 C 198 G 129 T 123 N
  ; Total: 788 nucleotides.
  SURPHIST1
  CAACATATTAGAGGAAGGGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGGGGGGGGGG
  GAGGGAGAATTGCCCAAAACACTGTAAATGTAGCGTTAATGAACTTTTCATCTCATCGAC

  ////////////////////////////////////////////////////////////

  NNNNNNNNNNNNGGCCGAACACTGTACGGCTTCGGCGGCTAAGTGAAGCAGACTTGGCTA
  GAATAACG2
  ; DEFINITION SEA URCHIN(P.MIL.) HISTONE GENES; PARTIAL SPACER. 188BP
  ; LOCUS SURPHIST2 188 BP UPDATED 11/01/82

  /////////////////////////////////////////////////////////////////////
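To make the structure of an IG file concrete, here is a minimal Python sketch that splits IG-format text into separate records. It is not FromIG's implementation. It assumes the layout suggested by the excerpt above: comment lines begin with a semicolon, the next line is the identifier, and sequence lines follow until one ends in a terminator digit. The text above confirms only that 2 marks a circular sequence; treating 1 as the terminator for linear sequences is an assumption, as is the function name parse_ig. Entries without an identifier line are not handled.

```python
def parse_ig(text):
    """Split IG-format text into a list of record dictionaries.

    Assumed layout: ';' comment lines, then an identifier line, then
    sequence lines ending with '2' (circular) or '1' (assumed: linear).
    """
    records = []
    comments, ident, seq = [], None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(";"):
            # Documentation line: keep the text after the semicolon.
            comments.append(line[1:].strip())
        elif ident is None:
            # First non-comment line of an entry is its identifier.
            ident = line
        elif line[-1] in "12":
            # Terminator digit closes the entry.
            seq.append(line[:-1])
            records.append({
                "id": ident,
                "comments": comments,
                "sequence": "".join(seq),
                "circular": line[-1] == "2",
            })
            comments, ident, seq = [], None, []
        else:
            seq.append(line)
    return records


sample = """; LOCUS SURPHIST1 788 BP UPDATED 11/01/82
SURPHIST1
CAACATATTAGAGG
GAATAACG2
"""
for record in parse_ig(sample):
    print(record["id"], len(record["sequence"]), record["circular"])
```

Each record carries its own comments, which corresponds to the way FromIG preserves the IG documentation in every output file.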
When FromIG writes GCG sequence files, it assigns the sequence type based on the composition of the sequence characters. This method is not foolproof, so to ensure that the output files are written with the correct sequence type, use -PROtein or -NUCleotide on the command line when running FromIG.
If FromIG is run interactively, you can watch the program monitor to see if the sequences are assigned the correct type. As each new file is written, its name and the number of bases (bp) or amino acids (aa) appear on the screen. If the wrong abbreviation appears (for example, bp appears for a protein sequence), the sequence file was assigned the wrong type. The sequence type also appears in the sequence file. Look on the last line of the text heading just above the sequence itself for Type: N or Type: P.
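When many files are written, checking each heading by eye is tedious. The following Python sketch reads the type from the text of a GCG sequence file. It assumes that the last heading line is the first line ending in two periods (..), as in the surphist1 example, and that the type appears there as Type: N or Type: P. The function name gcg_sequence_type is hypothetical.

```python
import re


def gcg_sequence_type(text):
    """Return 'N' or 'P' from a GCG file's heading line, or None if absent."""
    for line in text.splitlines():
        # Assumption: the heading ends at the first line that finishes with "..".
        if line.rstrip().endswith(".."):
            match = re.search(r"Type:\s*([NP])\b", line)
            return match.group(1) if match else None
    return None


heading = (
    "FROMIG of: urchin.nih\n"
    "(circular sequence)\n"
    "Surphist1.  Length: 788  September 29, 1998 17:59  Type: N  Check: 6642  ..\n"
    "       1  CAACATATTA GAGGAAGGGA\n"
)
print(gcg_sequence_type(heading))
```

A result of None means the sketch found no heading line or no Type field, so the file should be inspected by hand.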
If the sequence type was incorrectly assigned, see Appendix VI for information on how to change or set the type of a sequence.
The following programs convert sequences between other formats and GCG format: FromEMBL, FromGenBank, FromIG, FromPIR, FromStaden, FromFastA, ToIG, ToPIR, ToStaden and ToFastA.
DataSet creates a GCG data library from any set of sequences in GCG format. GCGToBLAST creates a database that can be searched by the BLAST program from any set of sequences in GCG format.
As of IntelliGenetics Release 5.3, IntelliGenetics programs use only IUB-IUPAC nucleotide ambiguity codes. Prior to Release 5.3, IntelliGenetics programs used the Stanford ambiguity codes. The GCG program FromIG assumes that sequence files in IntelliGenetics format contain only IUB-IUPAC sequence symbols and will not perform any symbol conversion.
If there is no identifier above the sequence entry in the IntelliGenetics file, the sequence is written into a file called scratch.fromig.
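The naming rule can be summarized in a few lines of Python. This is a sketch, not FromIG's code: the lowercase conversion is inferred from the example session (SURPHIST1 in the input becomes surphist1 on output) and the function name output_file_name is hypothetical.

```python
def output_file_name(identifier):
    """Choose an output file name for one IG entry.

    Assumption: the identifier is lowercased, as in the example session.
    An empty or missing identifier falls back to scratch.fromig.
    """
    name = (identifier or "").strip().lower()
    return name if name else "scratch.fromig"


print(output_file_name("SURPHIST1"))
print(output_file_name(""))
```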
All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.
  Minimal Syntax: % fromig [-INfile=]urchin.nih -Default

  Prompted Parameters: None

  Local Data Files: None

  Optional Parameters:

  -PROtein                  insists that the input sequences are proteins
  -NUCleotide               insists that the input sequences are nucleic acids
  -LIStfile[=fromig.list]   writes a list file of output sequence names
FromIG does not use any local data files.
You can set the parameters listed below from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.
-PROtein and -NUCleotide set the program to expect protein or nucleic acid sequences, respectively. Normally, FromIG determines whether an input sequence is protein or nucleic acid by looking at its composition. If the first 300 alphabetic characters in a sequence are composed entirely of IUB-IUPAC nucleotide codes (see Appendix III), it is reformatted as a nucleic acid sequence in GCG format; otherwise it is reformatted as a protein sequence. Using these command-line parameters, you can insist that your sequences are proteins (-PROtein) or nucleic acids (-NUCleotide).
-LIStfile=fromig.list writes a list file with the names of the output sequence files. This list file is suitable for input to other Wisconsin Package programs that support list files (see Chapter 2, Using Sequence Files and Databases, in the User's Guide). If you don't specify a file name, FromIG creates one using fromig for the file name and .list for the file name extension.
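The composition test that -PROtein and -NUCleotide override can be illustrated with a short Python sketch. It is written from the description above, not from FromIG's source. The set of nucleotide codes used here is the standard IUB-IUPAC set plus U, which is an assumption; Appendix III may list additional symbols. The function name guess_sequence_type is hypothetical.

```python
# Assumed IUB-IUPAC nucleotide codes; the authoritative list is in Appendix III.
IUPAC_NUCLEOTIDE_CODES = set("ACGTURYMKSWBDHVN")


def guess_sequence_type(sequence, sample_size=300):
    """Return 'N' if the first sample_size letters are all nucleotide codes, else 'P'."""
    letters = [c.upper() for c in sequence if c.isalpha()][:sample_size]
    if all(c in IUPAC_NUCLEOTIDE_CODES for c in letters):
        return "N"
    return "P"


print(guess_sequence_type("CAACATATTAGAGGAAGGGA"))
print(guess_sequence_type("MKVLAAGIVGE"))
```

The sketch also shows why the test is not foolproof: every IUB-IUPAC nucleotide code is a letter that can also occur in a protein sequence, so a short peptide written with only those letters would be classed as a nucleic acid.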
Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com
Copyright (c) 1982, 1983, 1985, 1986, 1987, 1989, 1991, 1994, 1995, 1996, 1997, 1998 Genetics Computer Group Inc., a wholly owned subsidiary of Oxford Molecular Group, Inc. All rights reserved.
Licenses and Trademarks Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.
All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.