NoOverlap identifies the places where a group of nucleotide sequences do not share any common subsequences.
Witkiewicz, Bolander, and Edwards assert that hybridization probes specific enough to detect individual members of a gene family can be prepared if a region 100 bases or longer can be found that does not have a perfect match of nine or more bases with any other member of the family (BioTechniques 14(3): 458-463). NoOverlap is designed to find out whether such regions occur in a group of sequences.
To use NoOverlap, you name a group of related sequences in which you want to find regions that do not share any 9-mer with any other sequence in the group. The resulting output is a list of the sequences that have such regions and the coordinates of the regions where no common 9-mers occur.
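The search described above can be sketched in a few lines of Python. This is only an illustration, not NoOverlap's actual implementation: the function names are hypothetical, and the sketch assumes that a reported region is a maximal run of bases not covered by any word shared with another sequence, and that a double-stranded search also compares against the reverse complement of the other sequences.

```python
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def words(seq: str, k: int) -> set:
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def no_overlap_regions(
    seqs: Dict[str, str],
    word_size: int = 9,
    min_length: int = 100,
    one_strand: bool = False,
) -> Dict[str, List[Tuple[int, int]]]:
    """Map each sequence name to its 1-based, inclusive word-free regions."""
    # Treat RNA like DNA (U is equivalent to T).
    seqs = {name: s.upper().replace("U", "T") for name, s in seqs.items()}

    # Collect the words of every sequence, on both strands unless one_strand.
    table = {}
    for name, s in seqs.items():
        w = words(s, word_size)
        if not one_strand:
            w |= words(reverse_complement(s), word_size)
        table[name] = w

    # A minimum longer than the longest sequence is adjusted downwards.
    min_length = min(min_length, max(len(s) for s in seqs.values()))

    result = {}
    for name, s in seqs.items():
        others = set().union(*(w for n, w in table.items() if n != name))
        # Mark every base that lies inside a word shared with another sequence.
        covered = [False] * len(s)
        for i in range(len(s) - word_size + 1):
            if s[i:i + word_size] in others:
                for j in range(i, i + word_size):
                    covered[j] = True
        # Report maximal uncovered runs; the trailing True closes the last run.
        regions, start = [], None
        for pos, hit in enumerate(covered + [True]):
            if not hit and start is None:
                start = pos
            elif hit and start is not None:
                if pos - start >= min_length:
                    regions.append((start + 1, pos))
                start = None
        if regions:
            result[name] = regions
    return result
```

Sequences with no qualifying region are left out of the result, mirroring the description of the output as a list of the sequences that have such regions.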
Here is a session using NoOverlap to find all of the regions of length 100 or greater that contain no common 9-mers in the sequences named in the list file inhibit.list.
% nooverlap

  (Double-stranded) NOOVERLAP among what sequences ?  @inhibit.list

  What is the word size (* 9 *) ?

  What minimum region length with no 9-mers (* 100 *) ?

  What should I call the output file (* nooverlap.dat *) ?

  Reading ..
  Comparing ..

  NOOVERLAP complete!

              Sequences:  2
           Total Length:  1,844
          Common 9-mers:  22
  Regions of no overlap:  7

%
NoOverlap makes an output file with a list of all the non-overlapping regions in every sequence that meet your requirements for word size and length. Here is the output file from this session:
(Double-stranded) NOOVERLAP of: @inhibit.list  October 8, 1998 10:57

  Window: 9
  Minimum No-hit region: 100
  Sequences: 2

  Sequence   Ranges ..

  X03124     1-116    422-583  593-772
  J05593     1-195    275-402  493-599  691-790
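If you want to post-process the output file, a small parser along the following lines may help. It is a sketch based only on the single example above: it assumes that the header ends at the first line finishing with "..", and that each later non-blank line holds a sequence name followed by start-end ranges. The function name parse_nooverlap is hypothetical.

```python
from typing import Dict, List, Tuple


def parse_nooverlap(text: str) -> Dict[str, List[Tuple[int, int]]]:
    """Parse NoOverlap-style output into {sequence name: [(start, end), ...]}."""
    regions: Dict[str, List[Tuple[int, int]]] = {}
    in_body = False
    for line in text.splitlines():
        if not in_body:
            # Assumption: the header ends at the line that finishes with "..".
            if line.rstrip().endswith(".."):
                in_body = True
            continue
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        name, ranges = fields[0], fields[1:]
        regions[name] = [
            (int(start), int(end))
            for start, end in (r.split("-") for r in ranges)
        ]
    return regions


sample = """(Double-stranded) NOOVERLAP of: @inhibit.list  October 8, 1998 10:57

  Window: 9
  Minimum No-hit region: 100
  Sequences: 2

  Sequence   Ranges ..

  X03124     1-116    422-583  593-772
  J05593     1-195    275-402  493-599  691-790
"""
parsed = parse_nooverlap(sample)
```

Here parsed["X03124"] holds the three ranges listed for that sequence, as (start, end) pairs of integers.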
NoOverlap accepts multiple (two or more) nucleotide sequences as input. You can specify multiple sequences in a number of ways: by using a list file, for example @project.list; by using an MSF or RSF file, for example project.msf{*}; or by using a sequence specification with an asterisk (*) wildcard, for example GenEMBL:*. If NoOverlap rejects your nucleotide sequence, see Appendix VI for information on how to change or set the type of a sequence.
Compare compares two protein or nucleic acid sequences and creates a file of the points of similarity between them for plotting with DotPlot. Compare finds the points using either a window/stringency or a word match criterion. The word comparison is 1,000 times faster than the window/stringency comparison, but somewhat less sensitive.
NoOverlap only works with nucleotide sequences. The total of all sequence lengths cannot be greater than 350,000 bases.
If your setting for the minimum region length without an n-mer is greater than the longest sequence in the set of sequences you search, NoOverlap will adjust it downwards to the length of the longest sequence in the group.
Different ambiguity codes will not necessarily match one another. That is, NoOverlap converts ambiguity codes to single, unambiguous bases. Thus, ambiguity codes match only those other ambiguity codes which have been converted to the same unambiguous base.
RNA and DNA are treated the same way; that is, T is equivalent to U.
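The effect of these two rules on word matching can be illustrated with a short sketch. The mapping below is hypothetical: the manual does not say which unambiguous base each ambiguity code is converted to, so ASSUMED_COLLAPSE only stands in for whatever table NoOverlap actually uses.

```python
# HYPOTHETICAL mapping, for illustration only. The manual states that each
# ambiguity code becomes a single unambiguous base, but not which one.
ASSUMED_COLLAPSE = {
    "R": "A", "Y": "C", "M": "A", "K": "G", "S": "C", "W": "A",
    "B": "C", "D": "A", "H": "A", "V": "A", "N": "A",
}


def collapse(seq: str) -> str:
    """Upper-case seq, treat U as T, and collapse ambiguity codes."""
    out = []
    for ch in seq.upper():
        if ch == "U":
            ch = "T"  # RNA and DNA are treated the same way
        out.append(ASSUMED_COLLAPSE.get(ch, ch))
    return "".join(out)


# Under the assumed table, R and M collapse to the same base and so would
# match each other, while R and Y would not.
same = collapse("ACGRT") == collapse("ACGMT")
different = collapse("ACGRT") != collapse("ACGYT")
```

Whatever the real table is, the consequence is the same: two words containing different ambiguity codes match only if those codes happen to collapse to the same base.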
All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.
Minimal Syntax: % nooverlap [-INfile1=]@inhibit.list -Default

Prompted Parameters:

  -WORdsize=9                sets length of words that must not occur
  -MINlength=100             sets minimum size of region with no common words
  [-OUTfile=]nooverlap.dat   names the output file

Local Data Files: None

Optional Parameters:

  -ONEstrand    searches only the top strand of your sequences
  -NOMONitor    suppresses the screen trace: "Reading ..."
  -NOSUMmary    suppresses the summary at the end of the program
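When NoOverlap is driven from a script, the command line can be assembled programmatically. The helper below is a hypothetical convenience, not part of the Wisconsin Package; it uses only the parameters listed in the summary above and returns an argument list without running anything.

```python
from typing import List


def build_nooverlap_command(
    infile: str,
    word_size: int = 9,
    min_length: int = 100,
    outfile: str = "nooverlap.dat",
    one_strand: bool = False,
) -> List[str]:
    """Assemble a non-interactive nooverlap command line as an argument list."""
    cmd = [
        "nooverlap",
        f"-INfile1={infile}",
        f"-WORdsize={word_size}",
        f"-MINlength={min_length}",
        f"-OUTfile={outfile}",
    ]
    if one_strand:
        cmd.append("-ONEstrand")  # search only the top strand
    cmd.append("-Default")  # suppress all program interaction
    return cmd


command = build_nooverlap_command("@inhibit.list")
```

On a system where the package is installed, a list like this could be passed to subprocess.run(command, check=True). Because -Default also suppresses the monitor and the summary, nothing is written to the screen unless they are turned back on.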
NoOverlap was written by John Devereux in collaboration with Dr. Halina Witkiewicz at the Mayo Clinic.
Local data files: none.
You can set the parameters listed below from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.
-WORdsize=9 sets the word size.
-MINlength=100 sets the minimum length of the region that must contain no word matches among the sequences in the specified list.
-ONEstrand searches only for regions in the top strand of each of your sequences.
-MONitor: This program normally monitors its progress on your screen. However, when you use -Default to suppress all program interaction, you also suppress the monitor. You can turn it back on with -MONitor. If you are running the program in batch, the monitor will appear in the log file.
-SUMmary writes a summary of the program's work to the screen when you've used -Default to suppress all program interaction. A summary is normally displayed at the end of a program run interactively. You can suppress the summary for a program run interactively with -NOSUMmary.
You can also use -SUMmary to have a summary of the program's work written to the log file of a program run in batch.
Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com
Copyright (c) 1982, 1983, 1985, 1986, 1987, 1989, 1991, 1994, 1995, 1996, 1997, 1998 Genetics Computer Group Inc., a wholly owned subsidiary of Oxford Molecular Group, Inc. All rights reserved.
Licenses and Trademarks: Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.
All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.