BreakUp reads a GCG-format sequence file containing more than 350,000 sequence characters and writes it as a set of separate, shorter, overlapping sequence files that can be analyzed by Wisconsin Package programs.
This program converts a user sequence that is longer than 350,000 bases into a set of sequences, none longer than 110,000 bases. By default, it breaks the input sequence at 100,000-base boundaries and includes 10,000 bases of overlap in the output files. Both values can be changed with the -SEGmentsize and -OVErlap parameters.
Here is a session using BreakUp to convert the user sequence lengthy.seq, of length 600,000 bases, to a set of six output sequence files, each with no more than 110,000 bases.
% breakup

 BREAKUP what file(s) ?  lengthy.seq

 lengthy_0.seq  length: 110000 bp
 lengthy_1.seq  length: 110000 bp
 lengthy_2.seq  length: 110000 bp
 lengthy_3.seq  length: 110000 bp
 lengthy_4.seq  length: 110000 bp
 lengthy_5.seq  length: 100000 bp

%
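The arithmetic behind the session above can be sketched in a few lines of Python. This is an illustrative reconstruction, not BreakUp's own code: the function name `breakup_ranges` is hypothetical, and the handling of a final remainder shorter than the overlap is an assumption, since the manual does not describe that case.

```python
def breakup_ranges(length, segment_size=100_000, overlap=10_000):
    """Return 1-based inclusive (start, end) ranges, one per output file.

    Assumption: each segment starts on a multiple of segment_size and
    extends overlap bases into the next segment, truncated at the end
    of the input sequence.
    """
    ranges = []
    start = 0  # 0-based offset of the current segment
    while start < length:
        end = min(start + segment_size + overlap, length)
        ranges.append((start + 1, end))
        start += segment_size
    return ranges


if __name__ == "__main__":
    # Reproduce the lengths reported for the 600,000-base lengthy.seq.
    for index, (start, end) in enumerate(breakup_ranges(600_000)):
        print(f"lengthy_{index}.seq  {start}-{end}  length: {end - start + 1} bp")
```

For a 600,000-base input this yields six ranges, the first five of 110,000 bases and the last of 100,000 bases, which agrees with the lengths shown in the session.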
BreakUp accepts a single sequence or multiple sequences as input. You can specify multiple sequences in a number of ways: by using a list file, for example @project.list; by using an MSF or RSF file, for example project.msf{*}; or by using a sequence specification with an asterisk (*) wildcard, for example GenEMBL:*. The function of BreakUp depends on whether your input sequence(s) are protein or nucleotide. Programs determine the type of a sequence by the presence of either Type: N or Type: P on the last line of the text heading just above the sequence. If your sequence(s) are not the correct type, see Appendix VI for information on how to change or set the type of a sequence.
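As a rough illustration of the type check described above, the Python sketch below looks for Type: N or Type: P on the last line of a heading. The function name `sequence_type` and the sample heading line are hypothetical; the exact layout of the heading line is an assumption and is not taken from this page.

```python
import re

# Matches "Type: N" or "Type: P" as a standalone token.
TYPE_PATTERN = re.compile(r"\bType:\s*([NP])\b")


def sequence_type(heading_lines):
    """Return 'N', 'P', or None based on the last line of the heading."""
    if not heading_lines:
        return None
    match = TYPE_PATTERN.search(heading_lines[-1])
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical heading; only the "Type: N" token matters here.
    heading = [
        "A long user sequence.",
        "lengthy.seq  Length: 600000  Type: N  ..",
    ]
    print(sequence_type(heading))
```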
Replace, CompressText, OneCase, ShiftOver, DeTab, ChopUp, LPrint, and ListFile are the Wisconsin Package file utility programs.
Sequence files prepared with a text editor or brought to your computer from other sources may contain lines longer than 511 characters. These sequence files must be converted by ChopUp before being read by BreakUp.
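Before running BreakUp, it can help to find out whether a file has lines over the 511-character limit. The Python sketch below is one way to check; `long_lines` is a hypothetical helper and is not part of the Wisconsin Package. It only reports the offending lines and does not replace ChopUp.

```python
def long_lines(path, limit=511):
    """Return (line_number, length) for every line longer than limit."""
    found = []
    with open(path, "r", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            width = len(line.rstrip("\r\n"))  # ignore the line terminator
            if width > limit:
                found.append((number, width))
    return found


if __name__ == "__main__":
    import sys

    for number, width in long_lines(sys.argv[1]):
        print(f"line {number}: {width} characters")
```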
All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.
Minimal Syntax: % breakup [-INfile=]breakup.txt -Default

Prompted Parameters: None

Local Data Files: None

Optional Parameters:

-NOMONitor             suppresses the screen trace showing each file
-LINesize=50           sets number of characters per line
-BLOcksize=10          sets number of characters per block
-BLAnklines=1          puts blank lines between the sequence lines
-SEGmentsize=100000    sets number of nonoverlapping bases per segment
-OVErlap=10000         sets number of overlapping bases per segment
-NONUMbering           suppresses numbering
-NOCOMments            suppresses comments
-PROtein               insists that the sequences are reformatted as protein sequences
-NUCleotide            insists that the sequences are reformatted as nucleic acid sequences
[-OUTfile=]newseqname  lets you name the output file
-EXTension=.seq        defines a file name extension
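The capitalization rule for parameter names can be illustrated with a short Python sketch. The helper names `minimal_prefix` and `matches` are hypothetical. The sketch assumes that matching is case-insensitive and that any spelling from the capitalized prefix up to the full name is accepted; the manual only states that the capitalized letters must be typed.

```python
def _bare(text):
    """Strip the leading dash and any '=value' suffix."""
    return text.lstrip("-").split("=", 1)[0]


def minimal_prefix(name):
    """Return the leading run of capitals, e.g. '-SEGmentsize' -> 'SEG'."""
    prefix = ""
    for ch in _bare(name):
        if not ch.isupper():
            break
        prefix += ch
    return prefix


def matches(typed, name):
    """True if typed contains the required letters and abbreviates name."""
    typed_body = _bare(typed).upper()
    full = _bare(name).upper()
    return typed_body.startswith(minimal_prefix(name)) and full.startswith(typed_body)


if __name__ == "__main__":
    print(matches("-seg=50000", "-SEGmentsize=100000"))  # enough letters typed
    print(matches("-se", "-SEGmentsize=100000"))         # too few letters typed
```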
Local data files: None.
You can set the parameters listed below from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.
-MONitor
This program normally monitors its progress on your screen. However, when you use -Default to suppress all program interaction, you also suppress the monitor. You can turn it back on with this parameter. If you are running the program in batch, the monitor will appear in the log file.
-LINesize=50
lets you set the number of sequence characters per line to any number between 1 and 120.
-BLOcksize=10
lets you set the number of sequence characters in each block to any number between 1 and the line size.
-BLAnklines=1
leaves zero or more blank lines between the sequence lines.
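To show how the line size, block size, and blank-line settings interact, here is a minimal Python sketch. The function `format_sequence` is hypothetical, and the column widths and spacing are assumptions; the output is not guaranteed to match the layout that BreakUp writes. The `numbering` flag stands in for the effect of -NONUMbering from the summary.

```python
def format_sequence(seq, linesize=50, blocksize=10, blanklines=1, numbering=True):
    """Lay out seq in numbered lines made of space-separated blocks."""
    lines = []
    for start in range(0, len(seq), linesize):
        chunk = seq[start:start + linesize]
        # Split the line into blocks of at most blocksize characters.
        blocks = [chunk[i:i + blocksize] for i in range(0, len(chunk), blocksize)]
        text = " ".join(blocks)
        if numbering:
            # Position of the first character on the line (assumed column width).
            text = f"{start + 1:>8}  {text}"
        lines.append(text)
        lines.extend([""] * blanklines)
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_sequence("ACGT" * 30))
```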
-SEGmentsize=100000
lets you set the number of non-overlapping sequence characters in each output file to any number greater than the overlap and less than 350000.
-OVErlap=10000
lets you set the number of overlapping sequence characters in each output file to any number between 0 and the segment size. The sum of the segment size and the overlap size must, however, be less than 350000.
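The limits on the segment size and the overlap can be collected into one check, sketched below in Python. The function `check_segment_settings` is hypothetical. Where the text says "between", the sketch treats the bounds as inclusive unless another rule excludes them; that reading is an assumption.

```python
MAX_CHARS = 350_000  # limit quoted in the two parameter descriptions


def check_segment_settings(segment_size, overlap):
    """Return a list of problems; an empty list means the pair is acceptable."""
    problems = []
    if overlap < 0:
        problems.append("overlap must not be negative")
    if segment_size <= overlap:
        problems.append("segment size must be greater than the overlap")
    if segment_size >= MAX_CHARS:
        problems.append("segment size must be less than 350000")
    if segment_size + overlap >= MAX_CHARS:
        problems.append("segment size plus overlap must be less than 350000")
    return problems


if __name__ == "__main__":
    print(check_segment_settings(100_000, 10_000))  # the documented defaults
    print(check_segment_settings(300_000, 60_000))  # sum is too large
```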
-NONUMbering
suppresses the numbering next to each sequence line.
-NOCOMments
suppresses any comments that may have been in the input sequence file.
-PROtein
sets the sequence type to protein.
-NUCleotide
sets the sequence type to nucleotide.
-OUTfile=newseqname
selects an output filename other than the name of the input file.
-EXTension=.seq
selects a filename extension other than the input filename extension.
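The output names in the example session (lengthy_0.seq through lengthy_5.seq) suggest a simple naming pattern, sketched below in Python. The function `output_names` is hypothetical. How -OUTfile and -EXTension combine with the numeric suffix is an assumption extrapolated from that one example, not documented behavior.

```python
import os


def output_names(infile, count, outfile=None, extension=None):
    """Build name_0.ext, name_1.ext, ... for count output files.

    Assumption: outfile replaces the input file's base name, and
    extension replaces the input file's extension.
    """
    stem, ext = os.path.splitext(os.path.basename(infile))
    stem = outfile or stem
    ext = extension or ext
    return [f"{stem}_{index}{ext}" for index in range(count)]


if __name__ == "__main__":
    print(output_names("lengthy.seq", 6))
```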
Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com
Copyright (c) 1982, 1983, 1985, 1986, 1987, 1989, 1991, 1994, 1995, 1996, 1997, 1998 Genetics Computer Group Inc., a wholly owned subsidiary of Oxford Molecular Group, Inc. All rights reserved.
Licenses and Trademarks Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.
All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.