This document may be useful for programmers and script writers, but can be skipped by most users of the FastA program family (FastA, FastX, TFastA, TFastX, and SSearch).
The standard alignment formats of the FastA program family are difficult to parse, and so it has been hard to extract the alignment information from the output file for further processing. A new command-line parameter, -MARKx=10, saves the alignments in a format which is easily parsed. The following is a description of the parsable output file.
The output file has three types of records. The header record starts with >>>. It contains information about the search as a whole: which version of the program was used, which analysis parameters were used, and so on. There is only one header record per output file.
An alignment record contains information pertaining to a pairwise alignment, such as the scores for the alignment. It starts with >>. There will be one alignment record for each alignment that was saved.
Following each alignment record are two aligned sequence records, which start with >. Each of these records contains the information for one of the sequences in the alignment: the length of the sequence, the beginning and end of the alignment in that sequence's coordinates, etc.
The end of the parsable records is denoted with >>><<<.
Information in each record consists of parameters and their values in a specific format. Parameters consist of a parameter tag, followed by an underscore, followed by the parameter's name. The complete format is:
; tag_name: value(s)
Parameters originating in William Pearson's FASTA package always have a two-character tag. The FASTA tags that appear in the example output in this document are mp, pg, fa, sw, sq, and al.
Redistributors of the FASTA package may create their own parameters. If they do, they must use a tag with more than two characters, for example:
; ebi_access: M61687
; gcg_ver: 9.0
GCG currently has no Wisconsin Package-specific parameters.
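A parameter line in the format described above can be split into its tag, name, and value with a short regular expression. The following Python sketch is only an illustration: the function name parse_parameter and the from_fasta_package key are hypothetical, and the pattern assumes that tags contain only letters and digits, so the first underscore always separates the tag from the name.

```python
import re

# "; tag_name: value(s)" -- the tag ends at the first underscore,
# the name ends at the first colon, the rest of the line is the value.
PARAM = re.compile(r"^;\s*([A-Za-z0-9]+)_(\S+?):\s*(.*)$")

def parse_parameter(line):
    """Return a dict for a parameter line, or None for any other line."""
    match = PARAM.match(line.strip())
    if match is None:
        return None
    tag, name, value = match.groups()
    return {
        "tag": tag,
        "name": name,
        "value": value.strip(),
        # two-character tags come from the FASTA package itself;
        # longer tags belong to redistributors
        "from_fasta_package": len(tag) == 2,
    }

for example in ("; pg_gap-pen: -12 -2", "; ebi_access: M61687"):
    print(parse_parameter(example))
```

Values are kept as strings because some parameters carry several values (for example, the two gap penalties in pg_gap-pen); converting them to numbers is left to the caller.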
Most of the parameters specified by two-character tags correspond to values that are presented in other FastA output formats. Notable exceptions are the parameters with the al tag:
al_start gives the location of the beginning of the alignment in the original sequence
al_stop gives the location of the end of the alignment in the original sequence
al_display_start gives the location of the first displayed residue in the original sequence. (This may not be the same as the first residue in the aligned region, because FastA provides some context for an alignment; even if the -SHOWall parameter is not used, FastA will try to provide about 30 residues on either side of the actual aligned region if the alignment is in the middle of one or the other sequence.)
Sequences may be padded with leading hyphens, if necessary. For example, if the beginning of the query sequence aligns with the tenth residue of the library sequence, then the query sequence will be padded with nine leading hyphens (-), one for each library residue that precedes the aligned position, to produce the alignment. The leading hyphens are a formatting convenience only; they are not considered in the numbering system for al_display_start, al_start, or al_stop.
As an example, here is a pair of aligned sequence records:
>gtm1_mouse ..
; sq_len: 217
; sq_offset: 1
; sq_type: p
; al_start: 3
; al_stop: 180
; al_display_start: 1
---PMILGYWNVRGLTHPIRMLLEYTDSSYDEKRYTMGDAPDFDRSQWLN
EKFKLGLDFPNLPYLIDGSHKITQSNAILRYLARKHH---LDGETEEERI
RADIVENQVMDTRMQLIMLCYNPDFEKQKPEFLKTIPEKMKLYSEFLGKR
PWFAGDKVTYVDFLAYDILDQYRMFEPKCLDA------FPNLRDFLARFE
GLKKISAYMKSSRYIATPIFSKMAHWSNK
>GTX2_TOBAC ..
; sq_len: 223
; sq_type: p
; al_start: 6
; al_stop: 181
; al_display_start: 1
MAEVKLLGFW-YSPFSHRVEWALKIKGVKYE---YIEEDRDN--KSSLLL
QSNPV---YKKVPVLIHNGKPIVESMIILEYIDETFEGPSILPKDPYDRA
LARFWAKFLDDKVAAVVNTFFRKGEEQEKGK--EEVYEMLKVLDNELKDK
KFFAGDKFGFADIAANLVGFWLGVFEEGYGDVLVKSEKFPNFSKWRDEYI
NCSQVNESLPPRDELLAFFRARFQAVVASRSAPK
To properly display this alignment, the first P of gtm1_mouse must line up with the first V in GTX2_TOBAC, and the actual aligned region (the region that scores as the best local alignment) starts with the first I in gtm1_mouse (amino acid 3) and the first L (amino acid 6) in GTX2_TOBAC.
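The coordinate rules in this example can be checked with a few lines of Python. The sketch below is illustrative only: the function name column_of_residue is hypothetical, and it assumes that gap hyphens inside a sequence are skipped in the numbering in the same way as the leading padding hyphens. Only the first 50 characters of each aligned sequence are used.

```python
def column_of_residue(padded, residue_number, display_start=1):
    """Return the 0-based column of the residue with the given number.

    padded         -- sequence string as stored in the aligned sequence record
    residue_number -- coordinate in the original sequence (e.g. al_start)
    display_start  -- value of al_display_start for this sequence
    """
    number = display_start - 1
    for column, char in enumerate(padded):
        if char == "-":
            continue  # hyphens (padding or gaps) are not numbered
        number += 1
        if number == residue_number:
            return column
    raise ValueError("residue number lies outside the displayed sequence")

query = "---PMILGYWNVRGLTHPIRMLLEYTDSSYDEKRYTMGDAPDFDRSQWLN"    # gtm1_mouse
library = "MAEVKLLGFW-YSPFSHRVEWALKIKGVKYE---YIEEDRDN--KSSLLL"  # GTX2_TOBAC

q_col = column_of_residue(query, 3)    # al_start of gtm1_mouse
l_col = column_of_residue(library, 6)  # al_start of GTX2_TOBAC
print(q_col, query[q_col], l_col, library[l_col])  # prints: 5 I 5 L
```

Both al_start values map to the same column, which is what is expected when the two padded sequences are displayed one above the other.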
Here is a printout of a complete parsable output file containing three alignment records, followed by a printout of the first alignment as it is output by FastA when the default parameter -MARKx=3 is used.
>>>A41264, 496 aa vs @GLUT4.LIST library
; mp_name: FASTA
; mp_ver: Wisconsin Package 10.0 implementation of FASTA 3.1t12
; pg_name: FASTA
; pg_ver: 3.15 August, 1998
; pg_matrix: GenRunData:Blosum50.Cmp
; pg_gap-pen: -12 -2
; pg_ktup: 2
; pg_optcut: 25
; pg_cgap: 37
>>Pir2:A49158
; fa_initn: 1844
; fa_init1: 1201
; fa_opt: 1915
; sw_score: 1915
; sw_ident: 0.593
; sw_overlap: 496
>A41264 ..
; sq_len: 496
; sq_offset: 1
; sq_type: p
; al_start: 4
; al_stop: 493
; al_display_start: 1
-------------MADKKKITASLIYAVSVAAIGSLQFGYNTGVINAPEK
IIQAFYNRTLSQRSG----ETISPELLTSLWSLSVAIFSVGGMIGSFSVS
LFVNRFGRRNSMLLVNVLAFAGGALMALSKIAKAVEMLIIGRFIIGLFCG
LCTGFVPMYISEVSPTSLRGAFGTLNQLGIVVGILVAQIFGLEGIMGTEA
LWPLLLGFTIVPAVLQCVALLFCPESPRFLLINKMEEEKAQTVLQKLRGT
QDVSQDISEMKEESAKMSQEKKATVLELFRSPNYRQPIIISITLQLSQQL
SGINAVFYYSTGIFERAGITQPVYATIGAGVVNTVFTVVSLFLVERAGRR
TLHLVGLGGMAVCAAVMTIALALKEK--WIRYISIVATFGFVALFEIGPG
PIPWFIVAELFSQGPRPAAMAVAGCSNWTSNFLVGMLFPYAEKLCGPYVF
LIFLVFLLIFFIFTYFKVPETKGRTFEDISRGFEEQVETSSPSSPPIEKN
PMVEMNSIEPDKEVA
>A49158 ..
; sq_len: 509
; sq_type: p
; al_start: 17
; al_stop: 507
; al_display_start: 1
MPSGFQQIGSEDGEPPQQRVTGTLVLAVFSAVLGSLQFGYNIGVINAPQK
VIEQSYNETWLGRQGPEGPSSIPPGTLTTLWALSVAIFSVGGMISSFLIG
IISQWLGRKRAMLVNNVLAVLGGSLMGLANAAASYEMLILGRFLIGAYSG
LTSGLVPMYVGEIAPTHLRGALGTLNQLAIVIGILIAQVLGLESLLGTAS
LWPLLLGLTVLPALLQLVLLPFCPESPRYLYIIQNLEGPARKSLKRLTGW
ADVSGVLAELKDEKRKLERERPLSLLQLLGSRTHRQPLIIAVVLQLSQQL
SGINAVFYYSTSIFETAGVGQPAYATIGAGVVNTVFTLVSVLLVERAGRR
TLHLLGLAGMCGCAILMTVALLLLERVPAMSYVSIVAIFGFVAFFEIGPG
PIPWFIVAELFSQGPRPAAMAVAGFSNWTSNFIIGMGFQYVAEAMGPYVF
LLFAVLLLGFFIFTFLRVPETRGRTFDQISAAFHR-----TPSLLEQEVK
PSTELEYLGPDEND
>>Pir2:A32101
; fa_initn: 1822
; fa_init1: 1188
; fa_opt: 1883
; sw_score: 1883
; sw_ident: 0.589
; sw_overlap: 496
>A41264 ..
; sq_len: 496
; sq_offset: 1
; sq_type: p
; al_start: 4
; al_stop: 493
; al_display_start: 1
-------------MADKKKITASLIYAVSVAAIGSLQFGYNTGVINAPEK
IIQAFYNRTLSQRSG----ETISPELLTSLWSLSVAIFSVGGMIGSFSVS
LFVNRFGRRNSMLLVNVLAFAGGALMALSKIAKAVEMLIIGRFIIGLFCG
LCTGFVPMYISEVSPTSLRGAFGTLNQLGIVVGILVAQIFGLEGIMGTEA
LWPLLLGFTIVPAVLQCVALLFCPESPRFLLINKMEEEKAQTVLQKLRGT
QDVSQDISEMKEESAKMSQEKKATVLELFRSPNYRQPIIISITLQLSQQL
SGINAVFYYSTGIFERAGITQPVYATIGAGVVNTVFTVVSLFLVERAGRR
TLHLVGLGGMAVCAAVMTIALALKEKW--IRYISIVATFGFVALFEIGPG
PIPWFIVAELFSQGPRPAAMAVAGCSNWTSNFLVGMLFPYAEKLCGPYVF
LIFLVFLLIFFIFTYFKVPETKGRTFEDISRGFEEQVETSSPSSPPIEKN
PMVEMNSIEPDKEVA
>A32101 ..
; sq_len: 509
; sq_type: p
; al_start: 17
; al_stop: 507
; al_display_start: 1
MPSGFQQIGSEDGEPPQQRVTGTLVLAVFSAVLGSLQFGYNIGVINAPQK
VIEQSYNATWLGRQGPGGPDSIPQGTLTTLWALSVAIFSVGGMISSFLIG
IISQWLGRKRAMLANNVLAVLGGALMGLANAAASYEILILGRFLIGAYSG
LTSGLVPMYVGEIAPTHLRGALGTLNQLAIVIGILVAQVLGLESMLGTAT
LWPLLLAITVLPALLQLLLLPFCPESPRYLYIIRNLEGPARKSLKRLTGW
ADVSDALAELKDEKRKLERERPLSLLQLLGSRTHRQPLIIAVVLQLSQQL
SGINAVFYYSTSIFELAGVEQPAYATIGAGVVNTVFTLVSVLLVERAGRR
TLHLLGLAGMCGCAILMTVALLLLERVPSMSYVSIVAIFGFVAFFEIGPG
PIPWFIVAELFSQGPRPAAMAVAGFSNWTCNFIVGMGFQYVADAMGPYVF
LLFAVLLLGFFIFTFLRVPETRGRTFDQISATFRR-----TPSLLEQEVK
PSTELEYLGPDEND
>>Pir2:B30310
; fa_initn: 1796
; fa_init1: 1179
; fa_opt: 1862
; sw_score: 1862
; sw_ident: 0.585
; sw_overlap: 496
>A41264 ..
; sq_len: 496
; sq_offset: 1
; sq_type: p
; al_start: 4
; al_stop: 493
; al_display_start: 1
-------------MADKKKITASLIYAVSVAAIGSLQFGYNTGVINAPEK
IIQAFYNRTLSQRSG----ETISPELLTSLWSLSVAIFSVGGMIGSFSVS
LFVNRFGRRNSMLLVNVLAFAGGALMALSKIAKAVEMLIIGRFIIGLFCG
LCTGFVPMYISEVSPTSLRGAFGTLNQLGIVVGILVAQIFGLEGIMGTEA
LWPLLLGFTIVPAVLQCVALLFCPESPRFLLINKMEEEKAQTVLQKLRGT
QDVSQDISEMKEESAKMSQEKKATVLELFRSPNYRQPIIISITLQLSQQL
SGINAVFYYSTGIFERAGITQPVYATIGAGVVNTVFTVVSLFLVERAGRR
TLHLVGLGGMAVCAAVMTIALALKEKW--IRYISIVATFGFVALFEIGPG
PIPWFIVAELFSQGPRPAAMAVAGCSNWTSNFLVGMLFPYAEKLCGPYVF
LIFLVFLLIFFIFTYFKVPETKGRTFEDISRGFEEQVETSSPSSPPIEKN
PMVEMNSIEPDKEVA
>B30310 ..
; sq_len: 508
; sq_type: p
; al_start: 17
; al_stop: 506
; al_display_start: 1
MPSGFQQIGSDDGEPPRQRVTGTLVLAVFSAVLGSLQFGYNIGVINAPQK
VIEQSYNATWLGRQGPGGPDSIPQGTLTTLWALSVAIFSVGGMISSFLIG
IISQWLGRKRAMLANNVLAVLGGALMGLANAVASYEILILGRFLIGAYSG
LTSGLVPMYVGEIAPTHLRGALGTLNRLAIVIGILVAQVLGLESMLGTAT
LWPLLLALTVLPALLQLILLPFCPESPRYLYIIRNLEGPARKSLKPLTGW
ADVSDALAELKDEKRKLERERPMSLLQLLGSRTHRQPLIIAVVLQLSQQL
SGINAVFYYSTSIFESAGVGQPAYATIGAGVVNTVFTLVSVLLVERAGRR
TLHLLGLAGMCGCAILMTVALLLLERVPAMSYVSIVAIFGFVAFFEIGPG
PIPWF-VAELFSQGPRPAAMAVAGFSNWTCNFIVGMGFQYVADRMGPYVF
LLFAVLLLGFFIFTFLKVPETRGRTFDQISAAFRR-----TPSLLEQEVK
PSTELEYLGPDEND
>>><<<
------------------------------------------------------------------------------
SCORES Init1: 1201 Initn: 1844 Opt: 1915
Smith-Waterman score: 1915; 59.3% identity in 496 aa overlap

       10 20 30 40
A41264              MADKKKITASLIYAVSVAAIGSLQFGYNTGVINAPEKIIQAFYNRTL
       ::::|::|: || |::|||||||| ||||||:|:|: ||:|
A49158 MPSGFQQIGSEDGEPPQQRVTGTLVLAVFSAVLGSLQFGYNIGVINAPQKVIEQSYNETW
       10 20 30 40 50 60

       50 60 70 80 90 100
A41264 SQRSG----ETISPELLTSLWSLSVAIFSVGGMIGSFSVSLFVNRFGRRNSMLLVNVLAF
       |:| :| | ||:||:||||||||||||:|| :::: : :||: :||: ||||
A49158 LGRQGPEGPSSIPPGTLTTLWALSVAIFSVGGMISSFLIGIISQWLGRKRAMLVNNVLAV
       70 80 90 100 110 120

       110 120 130 140 150 160
A41264 AGGALMALSKIAKAVEMLIIGRFIIGLFCGLCTGFVPMYISEVSPTSLRGAFGTLNQLGI
       ||:||:|:: | : ||||:|||:|| : || :|:||||::|::|| ||||:||||||:|
A49158 LGGSLMGLANAAASYEMLILGRFLIGAYSGLTSGLVPMYVGEIAPTHLRGALGTLNQLAI
       130 140 150 160 170 180

       170 180 190 200 210 220
A41264 VVGILVAQIFGLEGIMGTEALWPLLLGFTIVPAVLQCVALLFCPESPRFLLINKMEEEKA
       |:|||:||::|||:::|| :|||||||:|::||:|| | | |||||||:| | : | |
A49158 VIGILIAQVLGLESLLGTASLWPLLLGLTVLPALLQLVLLPFCPESPRYLYIIQNLEGPA
       190 200 210 220 230 240

       230 240 250 260 270 280
A41264 QTVLQKLRGTQDVSQDISEMKEESAKMSQEKKATVLELFRSPNYRQPIIISITLQLSQQL
       : |::| | ||| ::|:|:|: |: :|: ::|:|: | ::|||:||:::|||||||
A49158 RKSLKRLTGWADVSGVLAELKDEKRKLERERPLSLLQLLGSRTHRQPLIIAVVLQLSQQL
       250 260 270 280 290 300

       290 300 310 320 330 340
A41264 SGINAVFYYSTGIFERAGITQPVYATIGAGVVNTVFTVVSLFLVERAGRRTLHLVGLGGM
       |||||||||||:||| ||: ||:||||||||||||||:||::||||||||||||:||:||
A49158 SGINAVFYYSTSIFETAGVGQPAYATIGAGVVNTVFTLVSVLLVERAGRRTLHLLGLAGM
       310 320 330 340 350 360

       350 360 370 380 390 400
A41264 AVCAAVMTIALALKEK--WIRYISIVATFGFVALFEIGPGPIPWFIVAELFSQGPRPAAM
       || :||:|| | |: : |:|||| |||||:||||||||||||||||||||||||||
A49158 CGCAILMTVALLLLERVPAMSYVSIVAIFGFVAFFEIGPGPIPWFIVAELFSQGPRPAAM
       370 380 390 400 410 420

       410 420 430 440 450 460
A41264 AVAGCSNWTSNFLVGMLFPYAEKLCGPYVFLIFLVFLLIFFIFTYFKVPETKGRTFEDIS
       |||| |||||||::|| | |: : ||||||:| |:|| |||||:::||||:||||::||
A49158 AVAGFSNWTSNFIIGMGFQYVAEAMGPYVFLLFAVLLLGFFIFTFLRVPETRGRTFDQIS
       430 440 450 460 470 480

       470 480 490
A41264 RGFEEQVETSSPSSPPIEKNPMVEMNSIEPDKEVA
       :|:: :|| | :| :|:: : ||::
A49158 AAFHR-----TPSLLEQEVKPSTELEYLGPDEND
       490 500
Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com
Copyright (c) 1982, 1983, 1985, 1986, 1987, 1989, 1991, 1994, 1995, 1996, 1997, 1998 Genetics Computer Group Inc., a wholly owned subsidiary of Oxford Molecular Group, Inc. All rights reserved.
Licenses and Trademarks
Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.
All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.