MFOLD

Table of Contents
FUNCTION
DESCRIPTION
EXAMPLE
OUTPUT
INPUT FILES
RELATED PROGRAMS
ALGORITHM
FOLDING CONSTRAINTS
RESTRICTIONS
BATCH QUEUE
CONSIDERATIONS
COMMAND-LINE SUMMARY
ACKNOWLEDGEMENTS
LOCAL DATA FILES
PARAMETER REFERENCE

FUNCTION

MFold predicts optimal and suboptimal secondary structures for an RNA or DNA molecule using the most recent energy minimization method of Zuker.

DESCRIPTION

MFold is an adaptation of the mfold package (version 2.3) by Zuker and Jaeger that has been modified to work with the Wisconsin Package(TM). Their method uses the energy rules developed by Turner and colleagues to determine optimal and suboptimal secondary structures for an RNA molecule and the energy rules compiled and developed by SantaLucia and colleagues to determine optimal and suboptimal secondary structures for a single-stranded DNA molecule. (See the ACKNOWLEDGEMENTS topic for references.)

Using energy minimization criteria, any predicted "optimal" secondary structure for an RNA or DNA molecule depends on the model of folding and the specific folding energies used to calculate that structure. Different optimal foldings may be calculated if the folding energies are changed even slightly. Because of uncertainties in the folding model and the folding energies, the "correct" folding may not be the "optimal" folding determined by the program. You may therefore want to view many optimal and suboptimal structures within a few percent of the minimum energy. You can use the variation among these structures to determine which regions of the secondary structure you can predict reliably. For instance, a region of the RNA molecule containing the same helix in most calculated optimal and suboptimal secondary structures may be more reliably predicted than other regions with greater variation.

MFold calculates energy matrices that determine all optimal and suboptimal secondary structures for an RNA or DNA molecule. The program writes these energy matrices to an output file. A companion program, PlotFold, reads this output file and displays a representative set of optimal and suboptimal secondary structures for the molecule within any increment of the computed minimum free energy you choose. You can choose any of several different graphic representations for displaying the secondary structures in PlotFold.

EXAMPLE

Here is a session using MFold to predict optimal and suboptimal secondary structures for an Alu consensus RNA sequence.


% mfold

 (Linear) MFOLD what sequence ? alucons.seq

                  Begin (* 1 *) ?
                End (*   290 *) ?

 What should I call the energy matrix output file (* alucons.mfold *) ?

   Folding .........................................................

               CPU time: 40.21

            Output file: alucons.mfold

%

OUTPUT

The output file produced by MFold contains the calculated energy matrices that determine all optimal and suboptimal secondary structures for the folded nucleic acid molecule. The file is not human-readable. It is read by the companion program, PlotFold, which can display any of several different graphic representations of optimal and suboptimal secondary structures for the folded molecule.

INPUT FILES

MFold accepts a single nucleotide sequence as input. If MFold rejects your nucleotide sequence, turn to Appendix VI to see how to change or set the type of a sequence.

RELATED PROGRAMS

MFold predicts optimal and suboptimal secondary structures for an RNA or DNA molecule using the most recent energy minimization method of Zuker. PlotFold displays the optimal and suboptimal secondary structures for an RNA or DNA molecule predicted by MFold.

StemLoop finds all possible stems (inverted repeats) above some minimum quality that you can set, but StemLoop cannot recognize a structure with gaps (bulge loops or uneven bifurcation loops). The stems can be plotted with DotPlot.

ALGORITHM

The general algorithm for determining multiple optimal and suboptimal secondary structures is described by the author of the program, Dr. Michael Zuker (Science 244, 48-52 (1989)). A description of the folding parameters used in the algorithm is presented in Jaeger, Turner, and Zuker (Proc. Natl. Acad. Sci. USA, 86, 7706-7710 (1989)).

FOLDING CONSTRAINTS

You may want to constrain the computed foldings to require specific helices and/or unpaired regions based on experimental data.

Forcing Bases to Pair

You can insist that all optimal and suboptimal foldings include a specified helix. (This is equivalent to the double force option in Zuker's original version of the program.) To do this, specify the first base pair, between bases i and j, and the length of the helix, k, using -FORCe1=i,j,k. This forces base pairs s(i)-s(j), s(i+1)-s(j-1),..., s(i+k-1)-s(j-k+1).

You can insist that a group of consecutive bases be double-stranded without specifying the pairing partner for each base. (This is equivalent to the single force option in Zuker's original version of the program.) To do this, specify the first base of the forced region, i, and the length of the forced region, k, using -FORCe1=i,0,k. The 0 between i and k is necessary to tell the program that you are forcing a group of contiguous bases to be double-stranded, rather than forcing a specific helix. This forces bases s(i), s(i+1),..., s(i+k-1) to be double-stranded.

You can force up to eight additional regions to pair with -FORCe2=l,m,n ... -FORCe9=x,y,z.

The only allowable base pairs are A-T/U, G-C, and G-T/U. Attempts to force other base pairing produce undefined results.
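
To see which base pairs a given -FORCe1=i,j,k value implies, you could enumerate them with a short Python sketch like the one below. It is an illustration only and is not part of MFold; the helper names forced_pairs and is_allowed_pair are hypothetical.

```python
# Illustrative only: these helpers are hypothetical and not part of MFold.

# Allowable base pairs according to this manual: A-T/U, G-C, and G-T/U.
ALLOWED = {frozenset("AU"), frozenset("AT"), frozenset("GC"),
           frozenset("GU"), frozenset("GT")}


def forced_pairs(i, j, k):
    """Return the 1-based base pairs implied by -FORCe1=i,j,k (with j > 0).

    The helix runs from the pair (i, j) to the pair (i+k-1, j-k+1).
    """
    return [(i + n, j - n) for n in range(k)]


def is_allowed_pair(seq, i, j):
    """Return True if bases i and j (1-based) form an allowable pair."""
    pair = frozenset((seq[i - 1].upper(), seq[j - 1].upper()))
    return pair in ALLOWED
```

Checking every pair returned by forced_pairs with is_allowed_pair before running the program is one way to avoid forcing a pair that is not allowable.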

Preventing Bases from Pairing

You can prevent a specified helix from forming in all optimal and suboptimal foldings. (This is equivalent to the double prevent option in Zuker's original version of the program.) To do this, specify the first base pair of the helix you want to prevent, between bases i and j, and the length of the helix, k, using -PREVent1=i,j,k. This prevents the helix containing base pairs s(i)-s(j), s(i+1)-s(j-1),..., s(i+k-1)-s(j-k+1) from forming. Only a specific, single helix is prevented; the prevented bases are still free to participate in other helices.

You can prevent a group of consecutive bases from being involved in any helix, forcing them to remain single-stranded in all predicted foldings. (This is equivalent to the single prevent option in Zuker's original version of the program.) To do this, specify the first base of the single-stranded region, i, and the length of the single-stranded region, k, using -PREVent1=i,0,k. The 0 between i and k is necessary to tell the program that you are forcing a single-stranded region, rather than preventing a specific helix from forming. This will force bases s(i), s(i+1),..., s(i+k-1) to be single-stranded.

You can prevent up to eight additional regions from pairing with -PREVent2=l,m,n ... -PREVent9=x,y,z.

Removing Bases

You can exclude a region of the RNA molecule from folding when a secondary structure model for that region already exists. (This is equivalent to the closed excision option in Zuker's original version of the program.) To do this, specify the base pair that closes off the excluded region, between bases i and j, using -CLOSedexcise1=i,j. MFold folds the remainder of the sequence, including the base pair between i and j. The only allowable base pairs are A-T/U, G-C, and G-T/U. Attempts to force other base pairing produce undefined results. You can specify up to eight additional regions for closed excisions with -CLOSedexcise2=k,l ... -CLOSedexcise9=y,z.

You can also exclude a region of the RNA molecule from participating in a secondary structure as if that region were spliced from the molecule before folding. (This is equivalent to the open excision option in Zuker's original version of the program.) To do this, specify the beginning and ending base numbers, i and j respectively, of the excluded region using -OPENexcise1=i,j. The region from i to j, inclusive, is removed and base i-1 is "ligated" to j+1 before folding the molecule. You can specify up to eight additional regions for open excisions with -OPENexcise2=k,l ... -OPENexcise9=y,z.
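
The two kinds of excision can be pictured with a minimal Python sketch. It only mimics the base numbering described above and does not reproduce anything MFold does internally; the function names open_excise and closed_excise_excluded are hypothetical.

```python
# Illustrative only: these helpers are hypothetical and not part of MFold.

def open_excise(seq, i, j):
    """Mimic -OPENexcise1=i,j on a sequence string.

    Bases i through j (1-based, inclusive) are removed, so that base i-1
    is joined directly to base j+1.
    """
    return seq[:i - 1] + seq[j:]


def closed_excise_excluded(i, j):
    """List the 1-based positions excluded by -CLOSedexcise1=i,j.

    Bases i+1 through j-1 are excluded; bases i and j themselves remain
    and form the closing base pair.
    """
    return list(range(i + 1, j))
```

For example, open_excise("ACGUACGU", 3, 5) removes bases 3 through 5 and gives "ACCGU", while closed_excise_excluded(3, 7) lists positions 4, 5, and 6.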

See RESTRICTIONS for constraints in plotting the secondary structures of RNA molecules in which a region has been excluded from folding with either -CLOSedexcise or -OPENexcise.

If you want to specify multiple regions for any folding constraint discussed above, you must number that constraint sequentially. For instance, if you want to specify two excluded regions for open excisions, you would need to specify -OPENexcise1 and -OPENexcise2; specifying -OPENexcise1 and -OPENexcise3 would cause the program to recognize only the first excluded region.
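
If you assemble command lines from a script, one way to guarantee sequential numbering is to generate the parameter names from a list of regions. The Python sketch below does this; the function name numbered_params is hypothetical, and only the parameter names themselves come from this manual.

```python
# Illustrative only: numbered_params is a hypothetical helper.

def numbered_params(name, regions):
    """Build sequentially numbered parameters such as -OPENexcise1=10,20.

    name    -- parameter name without the leading dash, e.g. "OPENexcise"
    regions -- list of tuples holding the values for each region
    """
    if len(regions) > 9:
        raise ValueError("at most 9 regions can be specified")
    return ["-{}{}={}".format(name, n, ",".join(str(v) for v in region))
            for n, region in enumerate(regions, start=1)]
```

Because the numbers come from the position in the list, a gap such as -OPENexcise1 followed by -OPENexcise3 cannot occur.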

If you don't specify any folding constraints as described above, yet an optimal folding is inconsistent with the experimental data, then one of the predicted suboptimal foldings may be consistent.

RESTRICTIONS

A maximum of 1400 bases can be folded.

Sequences should only contain the symbols A, C, G, and T/U.

If you exclude a region of the RNA or DNA molecule from folding with either -CLOSedexcise or -OPENexcise, you can display the predicted secondary structures using only the text output option in PlotFold; do not use any of the graphic plotting options of PlotFold to display the results.

MFold does not predict RNA secondary structures containing pseudoknots.
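
You may want to check a sequence against the first two restrictions before submitting a long run. The Python sketch below is one way to do so; check_sequence is a hypothetical helper, and treating lowercase symbols as equivalent to uppercase is an assumption of the sketch.

```python
# Illustrative only: check_sequence is a hypothetical helper.

MAX_BASES = 1400               # maximum foldable length stated above
VALID_SYMBOLS = set("ACGTU")   # symbols a sequence should contain


def check_sequence(seq):
    """Return a list of problems found in seq (empty if none).

    Assumption: case is ignored when checking symbols.
    """
    problems = []
    if len(seq) > MAX_BASES:
        problems.append("sequence has {} bases; the maximum is {}"
                        .format(len(seq), MAX_BASES))
    bad = sorted(set(seq.upper()) - VALID_SYMBOLS)
    if bad:
        problems.append("unexpected symbols: " + ", ".join(bad))
    return problems
```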

BATCH QUEUE

MFold uses an algorithm whose running time is proportional to the cube of the length of the folded sequence. It takes a DEC 5000/300 about one minute to fold 290 bases. You can predict, therefore, that 500 bases will take a little more than five times as long. Because of this, you might want to consider running MFold in the batch queue for long sequences. You can specify that this program run at a later time in the batch queue by using -BATch. Run this way, the program prompts you for all the required parameters and then automatically submits itself to the batch or at queue. For more information, see "Using the Batch Queue" in Chapter 3, Using Programs in the User's Guide.
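
The estimate above follows from the cubic scaling: (500/290) cubed is about 5.1. The Python sketch below applies the same arithmetic to other lengths. The function name estimated_seconds is hypothetical, and the reference point of 60 seconds for 290 bases is taken from the figure quoted above for one particular machine, so treat the result as a rough guide only.

```python
# Illustrative only: estimated_seconds is a hypothetical helper.

def estimated_seconds(n_bases, ref_bases=290, ref_seconds=60.0):
    """Scale a reference folding time by the cube of the length ratio."""
    return ref_seconds * (n_bases / ref_bases) ** 3


# Ratio of the estimated time for 500 bases to that for 290 bases.
ratio = estimated_seconds(500) / estimated_seconds(290)
```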

CONSIDERATIONS

There are several differences between the GCG implementation of MFold and Dr. Zuker's mfold package. Dr. Zuker's lrna and crna programs, which fold linear and circular sequences, respectively, are combined into a single GCG program. By default, MFold treats the input sequence as a linear molecule. To fold a circular sequence, use -CIRcular.

In Dr. Zuker's original implementation, the program takes a nucleic acid sequence as input, computes the energy matrices, and then displays representations of optimal and suboptimal secondary structures. Dr. Zuker's program allows you the option of storing the energy matrices in a save run of the program and later displaying the secondary structures in a separate continue run. The GCG version of MFold always saves the energy matrices into an output file. A separate program, PlotFold, reads these energy matrices and displays representative secondary structures. Depending on the size of the RNA sequence, the file containing the energy matrices can be very large. For example, the output file created in the MFold example session requires approximately 0.35 megabytes of disk storage. You should consider deleting files that you no longer need.

The default energy files are used by the program to predict folding at 37°C. Dr. Zuker's newtemp program allows you to generate energy files for folding nucleic acid molecules at any temperature between 0°C and 100°C. The GCG version of MFold does not require separate energy files for folding at another temperature. You can specify another folding temperature by using -TEMperature=45 (to fold at 45°C, for example).

In Dr. Zuker's original implementation, the symbols B, Z, H, and V/W represent, respectively, the bases A, C, G, and U that are accessible to single-strand nuclease cleavage. The GCG version of MFold does not recognize these symbols as nuclease-sensitive bases; sequences should only contain the symbols A, C, G, and T/U.

COMMAND-LINE SUMMARY

All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.


Minimal Syntax:  % mfold [-INfile=]alucons.seq -Default

Prompted Parameters:

-BEGin=1 -END=290          sets the range of interest
[-OUTfile=]alucons.mfold   names the energy matrix output file

Local Data Files:

                         RNA Energy Tables

-DATa1=dangle.mfoldr037     assigns energies for single base stacking
-DATa2=loop.mfoldr037       assigns destabilizing energies for internal,
                              bulge, and hairpin loops
-DATa3=stack.mfoldr037      assigns energies for base stacking
-DATa4=tstackh.mfoldr037    assigns energies for terminal mismatched pairs
                              in hairpin loops
-DATa5=tstacki.mfoldr037    assigns energies for terminal mismatched pairs
                              in interior loops
-DATa6=tloop.mfoldr037      assigns bonus energies for recognized "tetraloops"
-DATa7=miscloop.mfoldr037   assigns energies for multi-branched and asymmetric
                              interior loops

                         DNA Energy Tables

-DATa1=dangle.mfoldd037     assigns energies for single base stacking
-DATa2=loop.mfoldd037       assigns destabilizing energies for internal,
                              bulge, and hairpin loops
-DATa3=stack.mfoldd037      assigns energies for base stacking
-DATa4=tstackh.mfoldd037    assigns energies for terminal mismatched pairs
                              in hairpin loops
-DATa5=tstacki.mfoldd037    assigns energies for terminal mismatched pairs
                              in interior loops
-DATa6=tloop.mfoldd037      assigns bonus energies for recognized "tetraloops"
-DATa7=miscloop.mfoldd037   assigns energies for multi-branched and asymmetric
                              interior loops

Optional Parameters:

-DNA                  folds a DNA molecule
-CIRcular             folds a circular molecule
-TEMperature=37.0     sets the folding temperature (Celsius)
-EXTension=mfoldr037  sets the default extension for all local data files
-MAXLoopsize=30       sets the maximum size of interior loop
-LOPsidedness=30      sets the maximum lopsidedness of an interior loop
-FORCe=i,j,k          forces k consecutive base pairs, starting
                        with the base pair between i and j
-FORCe=i,0,k          forces k consecutive bases, beginning with i,
                        to form base pairs
-PREVent=i,j,k        prevents k consecutive base pairs, starting
                        with the base pair between i and j
-PREVent=i,0,k        prevents k consecutive bases, beginning with i,
                        from base pairing
-CLOSedexcise=i,j     excludes bases i+1 through j-1 from folding,
                        forcing a base pair between i and j
-OPENexcise=i,j       excludes bases i through j from folding,
                        ligating bases i-1 and j+1 together
-NOMONitor            suppresses screen trace of program progress
-NOSUMmary            suppresses screen summary at the end of the
                        program
-BATch                submits program to the batch queue

ACKNOWLEDGEMENTS

GCG is licensed to distribute MFold by the National Research Council of Canada. If you use MFold for published research, please cite Dr. Zuker's Science paper (reference below). We are very grateful to Dr. Zuker both for making his work available to GCG and for helping us incorporate his work into the Wisconsin Package.

MFold is an adaptation of the mfold package (version 2.3) by Zuker and Jaeger (Zuker, M. (1989). Science 244, 48-52; Jaeger, J.A., Turner, D.H., and Zuker, M. (1989). Proc. Natl. Acad. Sci. USA 86, 7706-7710; Jaeger, J.A., Turner, D.H., and Zuker, M. (1990). In Methods in Enzymology 183, 281-306) that has been modified to work with the Wisconsin Package. Their method uses the energy rules developed by Turner and colleagues (Freier, S.M., Kierzek, R., Jaeger, J.A., Sugimoto, N., Caruthers, M.H., Neilson, T., and Turner, D.H. (1986). Proc. Natl. Acad. Sci. USA 83, 9373-9377; Turner, D.H., Sugimoto, N., Jaeger, J.A., Longfellow, C.E., Freier, S.M., and Kierzek, R. (1987). Cold Spring Harbor Symp. Quant. Biol. 52, 123-133; Turner, D.H., Sugimoto, N., and Freier, S.M. (1988). Annu. Rev. Biophys. Biophys. Chem. 17, 167-192) to determine optimal and suboptimal secondary structures for an RNA molecule.

The energy rules to determine optimal and suboptimal secondary structures for a single-stranded DNA molecule were developed and compiled by Dr. John SantaLucia, Jr. and colleagues. If you use MFold to predict single-stranded DNA secondary structures, please cite the following references: SantaLucia, J.Jr. (1998). Proc. Natl. Acad. Sci. USA 95, 1460-1465; SantaLucia, J.Jr. and Allawi, H.T. (1997). Biochemistry 36, 10581-10594.

MFold was modified to work with version 7.2 of the Wisconsin Package by Irv Edelman.

LOCAL DATA FILES

The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.

For RNA secondary structure predictions, MFold reads the file dangle.mfoldr037 for the single base stacking energies; loop.mfoldr037 for the internal, bulge, and hairpin loop energies; stack.mfoldr037 for the base stacking energies; tstackh.mfoldr037 for the energies for terminal mismatched pairs in hairpin loops; tstacki.mfoldr037 for the energies for terminal mismatched pairs in interior loops; tloop.mfoldr037 for the bonus energies for recognized tetraloops; and miscloop.mfoldr037 for the energies for multi-branched and asymmetric interior loops. For DNA secondary structure predictions, MFold reads data files with these same names, but with the file name extension .mfoldd037 instead of .mfoldr037.
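
The naming pattern of the default data files can be summarized in a short Python sketch. It only rebuilds the file names listed above and does not read any files; the function name default_data_files is hypothetical.

```python
# Illustrative only: default_data_files is a hypothetical helper.

# Table names in the order of -DATa1 through -DATa7.
TABLES = ("dangle", "loop", "stack", "tstackh", "tstacki", "tloop", "miscloop")


def default_data_files(dna=False):
    """Map each -DATa parameter to its default RNA or DNA energy file name."""
    extension = "mfoldd037" if dna else "mfoldr037"
    return {"-DATa{}".format(n): "{}.{}".format(table, extension)
            for n, table in enumerate(TABLES, start=1)}
```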

PARAMETER REFERENCE

You can set the parameters listed below from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.

-DNA

folds a single-stranded DNA molecule using the thermodynamic parameters determined for DNA. (See the ACKNOWLEDGEMENTS topic for references.)

-CIRcular

tells MFold to treat the nucleic acid molecule as circular.

-TEMperature=37

lets you select the folding temperature in degrees Celsius. The default folding temperature is 37°C.

-EXTension=mfoldr037

selects a file name extension for all local data files.

-MAXLoopsize=30

sets the maximum size for an interior or bulge loop in the predicted secondary structures. An interior loop is an unpaired region interrupting a helix, with unpaired bases on both strands of the interrupted region. A bulge loop is a loop-out in a helix involving only one of the helix strands. The size of the loop is the total number of unpaired bases in the loop.

-LOPsidedness=30

sets the maximum lopsidedness for an interior or bulge loop in the predicted secondary structures. For an interior loop, this is the maximum difference between the number of single-stranded bases on one side of the loop and the number of single-stranded bases on the other side. For a bulge loop, this is the maximum number of bases in the loop.
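
The definitions of loop size and lopsidedness given for -MAXLoopsize and -LOPsidedness can be written out as a minimal Python sketch. The function names are hypothetical, and the sketch only restates the definitions in this manual; it does not show how MFold applies the limits internally.

```python
# Illustrative only: these helpers are hypothetical and not part of MFold.

def loop_size(side1, side2):
    """Total number of unpaired bases in an interior or bulge loop."""
    return side1 + side2


def lopsidedness(side1, side2):
    """Difference between the unpaired bases on the two sides of a loop.

    For a bulge loop one side is 0, so this equals the number of bases
    in the loop, matching the definition above.
    """
    return abs(side1 - side2)


def within_limits(side1, side2, max_loopsize=30, max_lopsidedness=30):
    """Return True if a loop satisfies both limits (defaults from above)."""
    return (loop_size(side1, side2) <= max_loopsize
            and lopsidedness(side1, side2) <= max_lopsidedness)
```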

-FORCe1=i,j,k ... -FORCe9=x,y,z

forces the helix that begins with the base pair between bases i and j and extends for k bases to the base pair between i+k-1 and j-k+1.

If j is 0, then the sequence of k consecutive bases, beginning with base i, is forced to be double-stranded (although the pairing partner for each base is not specified).

You can force up to 9 regions to pair by specifying sequential numbers with the -FORCe parameter (-FORCe1=l,m,n ... -FORCe9=x,y,z).

The only allowable base pairs are A-T/U, G-C, and G-T/U. Attempts to force other base pairing produce undefined results.

-PREVent1=i,j,k ... -PREVent9=x,y,z

prevents the helix that begins with the base pair between bases i and j and extends for k bases to the base pair between bases i+k-1 and j-k+1.

If j is 0, then the sequence of k consecutive bases, beginning at base i, is prevented from participating in any helix, forcing them to remain single-stranded.

You can prevent up to 9 regions from pairing by specifying sequential numbers with the -PREVent parameter (-PREVent1=l,m,n ... -PREVent9=x,y,z).

-CLOSedexcise1=i,j ... -CLOSedexcise9=y,z

excludes the sequence range from base i+1 through base j-1 from folding, forcing a base pair between the bases i and j.

You can exclude up to 9 regions from folding in this manner by specifying sequential numbers with the -CLOSedexcise parameter (-CLOSedexcise1=i,j ... -CLOSedexcise9=y,z).

The only allowable base pairs are A-T/U, G-C, and G-T/U. Attempts to force other base pairing produce undefined results.

-OPENexcise1=i,j ... -OPENexcise9=y,z

excludes the sequence range from base i through base j from folding, "ligating" base i-1 to j+1 before folding the molecule.

You can exclude up to 9 regions from folding in this manner by specifying sequential numbers with the -OPENexcise parameter (-OPENexcise1=i,j ... -OPENexcise9=y,z).

-MONitor

shows the progress of MFold on your screen. Use this parameter to see this same monitor in the log file for a batch process. If the monitor is slowing down the program because your terminal is connected to a slow modem, suppress it by including -NOMONitor on the command line.

-SUMmary

writes a summary of the program's work to the screen when you've used -Default to suppress all program interaction. A summary typically displays at the end of a program run interactively. You can suppress the summary for a program run interactively with -NOSUMmary.

You can also use this parameter to cause a summary of the program's work to be written in the log file of a program run in batch.

-BATch

submits the program to the batch queue for processing after prompting you for all required user inputs. Any information that would normally appear on the screen while the program is running is written into a log file. Whether that log file is deleted, printed, or saved to your current directory depends on how your system manager has set up the command that submits this program to the batch queue. All output files are written to your current directory, unless you direct the output to another directory when you specify the output file.

Printed: December 15, 1998 13:25 (1162)


Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com

Copyright (c) 1982, 1983, 1985, 1986, 1987, 1989, 1991, 1994, 1995, 1996, 1997, 1998 Genetics Computer Group Inc., a wholly owned subsidiary of Oxford Molecular Group, Inc. All rights reserved.

Licenses and Trademarks Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

Genetics Computer Group

www.gcg.com