RNACOFOLD(1)			 User Commands			  RNACOFOLD(1)

       RNAcofold - manual page for RNAcofold 2.4.14

       RNAcofold [OPTIONS]... [FILES]...

       RNAcofold 2.4.14

       calculate secondary structures of two RNAs with dimerization

       The program works much like RNAfold, but allows one to specify two RNA
       sequences which are then allowed to form a dimer structure. RNA
       sequences are read from stdin in the usual format, i.e. each line of
       input corresponds to one sequence, except for lines starting with ">",
       which contain the name of the next sequence. To compute the hybrid
       structure of two molecules, the two sequences must be concatenated
       using the '&' character as separator. RNAcofold can compute minimum
       free energy (mfe) structures, as well as the partition function (pf)
       and base pairing probability matrix (using the -p switch). Since dimer
       formation is concentration dependent, RNAcofold can be used to compute
       equilibrium concentrations for all five monomer and (homo/hetero)-dimer
       species, given input concentrations for the monomers. Output consists
       of the mfe structure in bracket notation as well as PostScript
       structure plots and "dot plot" files containing the pair
       probabilities; see the RNAfold man page for details. In the dot plots
       a cross marks the chain break between the two concatenated sequences.
       The program will continue to read new sequences until a line
       consisting of the single character @ or an end of file condition is
       encountered.
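
       As a minimal sketch of the input format described above, the helpers
       below (hypothetical names, not part of ViennaRNA) build records of a
       ">" name line followed by two sequences joined with '&', and end the
       input with a line holding only '@'. The "<A>_<B>" header naming is an
       arbitrary choice for illustration.

```python
from typing import Iterable, Tuple


def cofold_record(name_a: str, seq_a: str, name_b: str, seq_b: str) -> str:
    """One input record: a '>' name line, then both sequences joined by '&'."""
    for seq in (seq_a, seq_b):
        if not seq or "&" in seq:
            raise ValueError("sequences must be non-empty and must not contain '&'")
    # the header naming scheme "<A>_<B>" is an arbitrary choice for this sketch
    return f">{name_a}_{name_b}\n{seq_a}&{seq_b}\n"


def cofold_input(records: Iterable[Tuple[str, str, str, str]]) -> str:
    """Concatenate records and terminate the input with a line holding '@'."""
    return "".join(cofold_record(*r) for r in records) + "@\n"
```

       The resulting string could, for example, be passed to the program via
       subprocess.run(["RNAcofold"], input=text, text=True,
       capture_output=True).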

       -h, --help
	      Print help and exit

	      Print help, including all	details	and hidden options, and	exit

	      Print help, including hidden options, and	exit

       -V, --version
	      Print version and	exit

   General Options:
              Command line options which alter the general behavior of this
              program

       -v, --verbose
	      Be verbose.


       -j, --jobs[=number]
	      Split batch input	into jobs and start processing in parallel us-
	      ing multiple threads. A value of 0 indicates to use as many par-
	      allel threads as computation cores are available.


              Default processing of input data is performed in a serial
              fashion, i.e. one sequence pair at a time. Using this switch, a
              user can instead start the computation for many sequence pairs
              in the input in parallel. RNAcofold will create as many parallel
              computation slots as specified and assign input sequences of the
              input file(s) to the available slots. Note that this increases
              memory consumption since input sequences have to be kept in
              memory until an empty compute slot is available and each running
              job requires its own dynamic programming matrices.

	      Do  not  try  to	keep output in order with input	while parallel
	      processing is in place.


              When parallel input processing (--jobs flag) is enabled, the
              order in which input is processed depends on the host machine's
              job scheduler. Therefore, any output to stdout or files generated
              by this program will most likely not follow the order of the
              corresponding input data set. The default of RNAcofold is to use
              a specialized data structure to still keep the results output in
              order with the input data. However, this comes with a trade-off
              in terms of memory consumption, since all output must be kept in
              memory for as long as no chunks of consecutive, ordered output
              are available. By setting this flag, RNAcofold will not buffer
              individual results but print them as soon as they have been
              computed.
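
              The ordering trade-off described above can be illustrated with
              a generic reorder buffer. This is a sketch of the idea only, not
              RNAcofold's actual data structure: results arrive as (index,
              text) pairs in completion order, and every consecutive chunk
              starting at the next expected index is released immediately.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple


def in_order(results: Iterable[Tuple[int, str]]) -> Iterator[str]:
    """Yield results in input order although they arrive out of order.

    Indices start at 0. Anything that cannot be printed yet is buffered in a
    min-heap, which is where the extra memory consumption comes from.
    """
    pending: List[Tuple[int, str]] = []  # results waiting for their turn
    next_index = 0                       # index of the next printable result
    for index, text in results:
        heapq.heappush(pending, (index, text))
        # flush every consecutive, ordered chunk that is now available
        while pending and pending[0][0] == next_index:
            yield heapq.heappop(pending)[1]
            next_index += 1
    if pending:
        raise ValueError("results for some indices never arrived")
```

              Printing each result as soon as it arrives, as the flag above
              does, simply skips the heap and therefore needs no buffer.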

       --noPS Do not produce postscript	drawing	of the mfe structure.


	      Do not automatically substitute nucleotide "T" with "U"


	      Automatically generate an	ID for each sequence.  (default=off)

              The default mode of RNAcofold is to automatically determine an
              ID from the input sequence data if the input file format allows
              it. Sequence IDs are usually given in the FASTA header of input
              sequences. If this flag is active, RNAcofold ignores any IDs
              retrieved from the input and automatically generates an ID for
              each sequence. This ID consists of a prefix and an increasing
              number. This flag can also be used to add a FASTA header to the
              output even if the input has none.

              Prefix for automatically generated IDs (as used in output file
              names)


	      If  this	parameter  is set, each	sequence will be prefixed with
	      the provided string. Hence, the output files will	obey the  fol-
	      lowing  naming  scheme: "" (secondary structure
	      plot),  ""   (dot-plot),	  ""
	      (stack  probabilities),  etc. where xxxx is the sequence number.
	      Note: Setting this parameter implies --auto-id.

	      Change the delimiter between prefix and  increasing  number  for
	      automatically generated IDs (as used in output file names)


              This parameter can be used to change the default delimiter "_"
              between the prefix string and the increasing number for
              automatically generated IDs.

	      Specify  the  number  of	digits of the counter in automatically
	      generated	alignment IDs.


              When IDs are automatically generated, they receive an increasing
              number, starting with 1. This number will always be left-padded
              with leading zeros, such that the number takes up a certain
              width. Using this parameter, the width can be specified to the
              user's needs. We allow numbers in the range [1:18]. This option
              implies --auto-id.

              Specify the first number in automatically generated alignment
              IDs.


              When sequence IDs are automatically generated, they receive an
              increasing number, usually starting with 1. Using this
              parameter, the first number can be specified to the user's
              requirements. Note: negative numbers are not allowed. Note:
              setting this parameter implies that any IDs retrieved from the
              input data are ignored, i.e. it activates the --auto-id flag.
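
              A sketch of the ID scheme described by the options above: a
              prefix, a delimiter (default "_") and a zero-padded counter. The
              function name auto_ids is hypothetical, and the default width of
              4 digits is an assumption for illustration; the [1:18] width
              range and the ban on negative start values follow the text.

```python
from itertools import count
from typing import Iterator


def auto_ids(prefix: str, delim: str = "_", digits: int = 4, start: int = 1) -> Iterator[str]:
    """Generate IDs of the form <prefix><delim><zero-padded counter>."""
    if not 1 <= digits <= 18:
        raise ValueError("digits must be in the range [1:18]")
    if start < 0:
        raise ValueError("negative start numbers are not allowed")
    # validation happens eagerly; the IDs themselves are produced lazily
    return (f"{prefix}{delim}{n:0{digits}d}" for n in count(start))
```

              For example, next(auto_ids("seq")) yields "seq_0001", and
              next(auto_ids("x", delim="-", digits=2, start=7)) yields "x-07".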

              Change the delimiting character that is used for sanitized
              filenames


              This parameter can be used to change the delimiting character
              used while sanitizing filenames, i.e. replacing invalid
              characters. Note that the default delimiter is ALWAYS the first
              character of the "ID delimiter" as supplied through the
              --id-delim option. If the delimiter is a whitespace character or
              empty, invalid characters will simply be removed rather than
              substituted. Currently, we regard the following characters as
              illegal for use in filenames: backslash '\', slash '/', question
              mark '?', percent sign '%', asterisk '*', colon ':', pipe symbol
              '|', double quote '"', and triangular brackets '<' and '>'.
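
              A minimal sketch of the sanitizing rule above, using exactly the
              listed illegal characters (sanitize_filename is a hypothetical
              helper, not an RNAlib function). An empty or whitespace
              delimiter removes the characters instead of substituting them.

```python
ILLEGAL = set('\\/?%*:|"<>')  # backslash, slash, ?, %, *, :, |, ", <, >


def sanitize_filename(name: str, delim: str = "_") -> str:
    """Replace illegal filename characters with delim, or drop them."""
    drop = delim == "" or delim.isspace()
    out = []
    for ch in name:
        if ch in ILLEGAL:
            if not drop:
                out.append(delim)
        else:
            out.append(ch)
    return "".join(out)
```

              For example, sanitize_filename("a/b:c") gives "a_b_c", while
              sanitize_filename("a/b", " ") gives "ab".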

	      Use full FASTA header to create filenames


	      This parameter can be used to deactivate the default behavior of
	      limiting	output filenames to the	first word of the sequence ID.
	      Consider the following  example:	An  input  with	 FASTA	header
	      ">NM_0001	 Homo Sapiens some gene" usually produces output files
	      with the prefix "NM_0001"	without	the additional data  available
	      in  the  FASTA header, e.g. "" for secondary	struc-
	      ture plots. With this flag set,  no  truncation  of  the	output
	      filenames	 is done, i.e. output filenames	receive	the full FASTA
              header data as prefixes. Note, however, that invalid characters
              (such as whitespace) will be substituted by a delimiting
              character or simply removed (see also the parameter option --file-
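
              The truncation behavior described above can be sketched as
              follows (output_prefix is a hypothetical helper). By default
              only the first word of the header is kept. In "full" mode, the
              assumption here is that each run of whitespace is replaced by
              "_"; the program's exact substitution rules may differ.

```python
def output_prefix(fasta_header: str, full: bool = False) -> str:
    """Derive an output file prefix from a FASTA header line."""
    text = fasta_header.lstrip(">").strip()
    if not text:
        return ""
    if full:
        # assumption: whitespace runs become the default delimiter "_"
        return "_".join(text.split())
    return text.split()[0]  # default: first word of the sequence ID only
```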

	      Change the default output	format


              The following output formats are currently supported:

              ViennaRNA format (V) and delimiter-separated format (D), also
              known as CSV.


	      Change the delimiting character for  Delimiter-separated	output
	      format, such as CSV


              Delimiter-separated output defaults to comma-separated values
              (CSV), i.e. all data in one data set is delimited by a comma
              character. This option allows one to change the delimiting
              character to something else. Note: to switch to tab-separated
              data, use $'\t' as the delimiting character.

	      Do not print header for Delimiter-separated output, such as CSV


   Structure Constraints:
	      Command  line options to interact	with the structure constraints
	      feature of this program

	      Set the maximum base pair	span


       -C, --constraint[=<filename>]  Calculate structures subject to constraints.

	      The  program  reads first	the sequence, then a string containing
	      constraints on the structure encoded with	the symbols:

	      .	(no constraint for this	base)

              | (the corresponding base has to be paired)

	      x	(the base is unpaired)

	      <	(base i	is paired with a base j>i)

	      >	(base i	is paired with a base j<i)

	      and matching brackets ( )	(base i	pairs base j)

	      With the exception of "|", constraints will disallow  all	 pairs
	      conflicting  with	 the constraint. This is usually sufficient to
	      enforce the constraint, but occasionally a  base	may  stay  un-
	      paired  in  spite	of constraints.	PF folding ignores constraints
	      of type "|".
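
              The constraint notation above can be checked with a small
              parser. This sketch (hypothetical name constrained_pairs) only
              validates the symbols and extracts the pairs forced by round
              brackets; it does not fold anything. It assumes that, for a
              dimer, the constraint string carries '&' at the same position as
              the sequence, and it numbers positions without counting '&'.

```python
from typing import List, Tuple

ALLOWED = set(".|x<>()")


def constrained_pairs(seq: str, constraint: str) -> List[Tuple[int, int]]:
    """Validate a constraint string and return the pairs forced by ( )."""
    if len(seq) != len(constraint):
        raise ValueError("constraint and sequence must have the same length")
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    pos = 0
    for s, c in zip(seq, constraint):
        if s == "&" or c == "&":
            # assumption: the chain break sits at the same place in both strings
            if s != c:
                raise ValueError("'&' must be at the same position in both strings")
            continue
        pos += 1  # 1-based position, not counting '&'
        if c not in ALLOWED:
            raise ValueError(f"unknown constraint symbol {c!r} at position {pos}")
        if c == "(":
            stack.append(pos)
        elif c == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)
```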

	      Use constraints for multiple sequences.  (default=off)

              Usually, constraints provided from an input file only apply to a
              single input sequence. Therefore, RNAcofold will stop its
              computation and quit after the first input sequence has been
              processed. Using this switch, RNAcofold processes multiple input
              sequences and applies the same provided constraints to each of
              them.

	      Remove non-canonical base	pairs from the structure constraint


              Enforce base pairs given by round brackets ( ) in structure
              constraint


	      Use SHAPE	reactivity data	to guide structure predictions

       --shapeMethod=[D/Z/W] + [optional parameters]
              Select method to incorporate SHAPE reactivity data.
              (default=`D')

	      The  following methods can be used to convert SHAPE reactivities
	      into pseudo energy contributions.

	      'D': Convert by using a linear equation according	to  Deigan  et
	      al  2009.	The calculated pseudo energies will be applied for ev-
	      ery nucleotide involved in a stacked pair. This method is	recog-
	      nized  by	 a  capital  'D'  in  the  provided  parameter,	 i.e.:
	      --shapeMethod="D"	is the default setting.	The slope 'm' and  the
	      intercept	 'b'  can  be set to a non-default value if necessary,
	      otherwise	m=1.8 and b=-0.6.  To  alter  these  parameters,  e.g.
	      m=1.9   and   b=-0.7,   use   a	parameter  string  like	 this:
	      --shapeMethod="Dm1.9b-0.7". You may also provide only one	of the
	      two      parameters      like:	  --shapeMethod="Dm1.9"	    or

	      'Z': Convert SHAPE reactivities to pseudo	energies according  to
	      Zarringhalam et al 2012. SHAPE reactivities will be converted to
	      pairing probabilities by using linear mapping.  Aberration  from
	      the  observed pairing probabilities will be penalized during the
              folding recursion. The magnitude of the penalties can be affected
              by adjusting the factor beta (e.g. --shapeMethod="Zb0.8").

	      'W':  Apply  a given vector of perturbation energies to unpaired
	      nucleotides according to Washietl	et al 2012. Perturbation  vec-
	      tors can be calculated by	using RNApvmin.
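
              For the 'D' method, the Deigan et al. (2009) conversion maps a
              reactivity S to a pseudo energy m*ln(S + 1) + b, with the
              defaults m=1.8 and b=-0.6 given above. The sketch below parses
              parameter strings such as "Dm1.9b-0.7" and evaluates the term
              for a single nucleotide. Treating missing or negative
              reactivities as contributing nothing is an assumption here, and
              how the per-nucleotide terms are summed over stacked pairs
              inside the energy model is not shown.

```python
import math
import re
from typing import Optional, Tuple

_NUM = r"-?\d+(?:\.\d+)?"


def parse_deigan(method: str = "D") -> Tuple[float, float]:
    """Parse 'D', 'Dm1.9', 'Db-0.7' or 'Dm1.9b-0.7' into (slope m, intercept b)."""
    match = re.fullmatch(rf"D(?:m({_NUM}))?(?:b({_NUM}))?", method)
    if match is None:
        raise ValueError(f"not a Deigan-style method string: {method!r}")
    m = float(match.group(1)) if match.group(1) is not None else 1.8
    b = float(match.group(2)) if match.group(2) is not None else -0.6
    return m, b


def deigan_pseudo_energy(reactivity: Optional[float], m: float = 1.8, b: float = -0.6) -> float:
    """Pseudo energy (kcal/mol) for one nucleotide: m * ln(reactivity + 1) + b."""
    if reactivity is None or reactivity < 0:
        return 0.0  # assumption: missing or negative data add no pseudo energy
    return m * math.log(reactivity + 1.0) + b
```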

              + [optional parameters]
              Select method to convert SHAPE reactivities to pairing
              probabilities.

	      This parameter is	useful when dealing with the SHAPE  incorpora-
	      tion  according to Zarringhalam et al. The following methods can
	      be used to convert SHAPE reactivities into the probability for a
	      certain nucleotide to be unpaired.

              'M': Use linear mapping according to Zarringhalam et al.

              'C': Use a cutoff approach to divide into paired and unpaired
              nucleotides (e.g. "C0.25").

              'S': Skip the normalizing step since the input data already
              represents probabilities for being unpaired rather than raw
              reactivity values.

              'L': Use a linear model to convert the reactivity into a
              probability for being unpaired (e.g. "Ls0.68i0.2" to use a
              slope of 0.68 and an intercept of 0.2).

              'O': Use a linear model to convert the log of the reactivity
              into a probability for being unpaired (e.g. "Os1.6i-2.29" to
              use a slope of 1.6 and an intercept of -2.29).
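
              A sketch of the 'C', 'L' and 'O' conversions listed above, with
              the example values from the text used as defaults (they are not
              necessarily the program's defaults). Several details are
              assumptions: that 'O' uses the natural logarithm, that results
              are clamped to [0, 1], that non-positive reactivities map to 0
              for 'O', and that 'C' counts values strictly above the cutoff
              as unpaired.

```python
import math


def clamp01(x: float) -> float:
    """Clamp a value to the probability range [0, 1]."""
    return min(1.0, max(0.0, x))


def unpaired_prob_cutoff(reactivity: float, cutoff: float = 0.25) -> float:
    # 'C': reactivities above the cutoff count as unpaired (assumed strict >)
    return 1.0 if reactivity > cutoff else 0.0


def unpaired_prob_linear(reactivity: float, slope: float = 0.68, intercept: float = 0.2) -> float:
    # 'L': p = slope * reactivity + intercept, clamped to [0, 1]
    return clamp01(slope * reactivity + intercept)


def unpaired_prob_log(reactivity: float, slope: float = 1.6, intercept: float = -2.29) -> float:
    # 'O': p = slope * ln(reactivity) + intercept, clamped to [0, 1]
    if reactivity <= 0:
        return 0.0  # assumption: log undefined, treat as paired
    return clamp01(slope * math.log(reactivity) + intercept)
```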

	      Read additional commands from file

              Commands include hard and soft constraints, but also structure
              motifs in hairpin and interior loops that need to be treated
              differently. Furthermore, commands can be set for unstructured
              and structured domains.

              Select additional algorithms which should be included in the
              calculations. The minimum free energy (MFE) and a structure
              representative are calculated in any case.

       -p, --partfunc[=INT]
	      Calculate	 the  partition	 function and base pairing probability
	      matrix in	addition to the	mfe structure. Default is  calculation
	      of mfe structure only.


	      In  addition  to the MFE structure we print a coarse representa-
	      tion of the pair probabilities in	form of	a pseudo bracket nota-
	      tion,  followed by the ensemble free energy, as well as the cen-
	      troid structure derived from  the	 pair  probabilities  together
	      with  its	 free energy and distance to the ensemble.  Finally it
	      prints the frequency of the mfe structure,  and  the  structural
	      diversity	 (mean	distance  between the structures in the	ensem-
	      ble).  See the description of pf_fold() and  mean_bp_dist()  and
	      centroid()  in  the RNAlib documentation for details.  Note that
	      unless you also specify -d2 or -d0, the partition	 function  and
	      mfe calculations will use	a slightly different energy model. See
	      the discussion of	dangling end options below.
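
              The structural diversity mentioned above can be obtained
              directly from the pair probabilities. For two structures drawn
              independently from the ensemble, a pair (i,j) belongs to exactly
              one of them with probability 2*p_ij*(1 - p_ij), so the expected
              base pair distance is the sum of these terms. A minimal sketch
              (the function name is hypothetical):

```python
from typing import Dict, Tuple


def ensemble_diversity(bpp: Dict[Tuple[int, int], float]) -> float:
    """Mean base pair distance between two structures drawn independently
    from the Boltzmann ensemble, given pair probabilities p_ij (i < j)."""
    return sum(2.0 * p * (1.0 - p) for p in bpp.values())
```

              A pair that is certain (p=1) or impossible (p=0) contributes
              nothing; a pair with p=0.5 contributes the maximum of 0.5.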

	      An additionally passed value to this option changes the behavior
	      of partition function calculation:

              In order to calculate the partition function but not the pair
              probabilities, use the -p0 option and save about 50% in runtime.
              This prints the ensemble free energy -kT ln(Z).
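
              To make -kT ln(Z) concrete, the sketch below evaluates it for a
              toy ensemble given explicitly as a list of structure energies,
              together with the frequency of the lowest-energy structure.
              This is for illustration only: the program computes Z by
              dynamic programming, not by enumeration. The per-mole constant
              R (so "kT" is in kcal/mol) and the shift by the minimum energy
              to avoid overflow are choices of this sketch.

```python
import math
from typing import Sequence

R = 0.0019872  # gas constant in kcal/(mol*K), approximate


def ensemble_free_energy(energies: Sequence[float], temp_c: float = 37.0) -> float:
    """-kT ln(Z) for a toy ensemble of structure energies (kcal/mol)."""
    kT = R * (temp_c + 273.15)
    e_min = min(energies)
    # Z = exp(-e_min/kT) * sum(exp(-(e - e_min)/kT)); the shift avoids overflow
    z_shifted = sum(math.exp(-(e - e_min) / kT) for e in energies)
    return e_min - kT * math.log(z_shifted)


def mfe_frequency(energies: Sequence[float], temp_c: float = 37.0) -> float:
    """Boltzmann probability of the lowest-energy structure."""
    kT = R * (temp_c + 273.15)
    g = ensemble_free_energy(energies, temp_c)
    return math.exp(-(min(energies) - g) / kT)
```

              Two structures with equal energy each get a frequency of 0.5,
              and the ensemble free energy is always at or below the mfe.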

       -a, --all_pf[=INT]
	      Compute the partition function and free energies not only	of the
	      hetero-dimer  consisting	of  the	 two  input sequences (the "AB
	      dimer"), but also	of the homo-dimers AA and BB as	well as	A  and
	      B	monomers.


	      The  output  will	 contain  the  free energies for each of these
	      species, as well as 5 dot	plots containing the conditional  pair
	      probabilities,  called "", ""	and so on. For
	      later use, these dot plot	files also contain the free energy  of
	      the  ensemble  as	 a comment. Using -a automatically switches on
	      the -p option. Base pair probability computations	may be	turned
	      off  altogether  by providing "0"	as an argument to this parame-
	      ter. In that case, no dot	plot files will	be generated.

       -c, --concentrations
              In addition to everything listed under the -a option, read in
              initial monomer concentrations and compute the expected
              equilibrium concentrations of the 5 possible species (AB, AA, BB,
              A, and B).


              Start concentrations are read from stdin (unless the -f option
              is used) in [mol/l]; equilibrium concentrations are given
              relative to the sum of the two inputs. An arbitrary number of
              initial concentrations can be specified (one pair of
              concentrations per line).
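
              A sketch of the equilibrium step under the law of mass action,
              with hypothetical association constants given directly as input
              in l/mol: [AB] = K_AB[A][B], [AA] = K_AA[A]^2,
              [BB] = K_BB[B]^2, plus conservation of the total amounts of A
              and B. How RNAcofold derives these constants from the computed
              free energies is not reproduced here. For a trial [A], the free
              [B] follows from a quadratic; the excess of A is then monotone
              in [A], so bisection finds the solution.

```python
import math
from typing import Tuple


def equilibrium(a0: float, b0: float, k_ab: float, k_aa: float,
                k_bb: float) -> Tuple[float, float, float, float, float]:
    """Return equilibrium ([A], [B], [AB], [AA], [BB]) for totals a0, b0.

    Conservation: a0 = [A] + [AB] + 2[AA] and b0 = [B] + [AB] + 2[BB].
    """
    def free_b(a: float) -> float:
        # non-negative root of 2*k_bb*B^2 + (1 + k_ab*a)*B - b0 = 0,
        # in a cancellation-free form that also works for k_bb == 0
        p = 1.0 + k_ab * a
        return 2.0 * b0 / (p + math.sqrt(p * p + 8.0 * k_bb * b0))

    def excess_a(a: float) -> float:
        b = free_b(a)
        return a + k_ab * a * b + 2.0 * k_aa * a * a - a0

    lo, hi = 0.0, a0  # excess_a increases with a: < 0 at 0, >= 0 at a0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if excess_a(mid) < 0:
            lo = mid
        else:
            hi = mid
    a = 0.5 * (lo + hi)
    b = free_b(a)
    return a, b, k_ab * a * b, k_aa * a * a, k_bb * b * b
```

              Dividing each returned value by a0 + b0 gives values relative to
              the sum of the two inputs, as in the program output described
              above.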

       -f, --concfile=filename
              Specify a file with initial concentrations for the two sequences.

              The table consists of an arbitrary number of lines with just two
              numbers (the concentrations of sequences A and B). This option
              automatically toggles the -c (and thus -a and -p) options (see
              above).
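
              Reading such a table can be sketched as below (hypothetical
              helper). Whitespace-separated columns and skipping blank lines
              are assumptions; the program's own parser may accept other
              layouts.

```python
from typing import List, Tuple


def read_concentrations(text: str) -> List[Tuple[float, float]]:
    """Parse lines of two numbers: concentrations of A and B in mol/l."""
    pairs = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are skipped
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected exactly two numbers")
        pairs.append((float(fields[0]), float(fields[1])))
    return pairs
```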

	      Compute the centroid structure.  (default=off)

              In addition to the MFE structure, compute the centroid
              representative of the structure ensemble. Here, we apply the
              base pair distance as distance measure, and report the structure
              that minimizes its Boltzmann weighted base pair distance to the
              rest of the ensemble. Computing the centroid structure requires
              equilibrium base pair probabilities. Therefore, this option
              implies the -p switch. For historical reasons, the centroid
              structure output is deactivated by default.
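
              Under the base pair distance, the structure that minimizes the
              Boltzmann weighted distance to the ensemble consists of exactly
              the pairs with probability above 0.5. Such pairs can never share
              a base or cross, so the result is always a valid structure. A
              minimal sketch (hypothetical function), using 1-based positions
              over the concatenated sequence without the '&' separator:

```python
from typing import Dict, Tuple


def centroid(n: int, bpp: Dict[Tuple[int, int], float]) -> str:
    """Dot-bracket string containing every pair with probability > 0.5."""
    s = ["."] * n
    for (i, j), p in bpp.items():  # 1-based positions, i < j
        if p > 0.5:
            s[i - 1], s[j - 1] = "(", ")"
    return "".join(s)
```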

              Calculate an MEA (maximum expected accuracy) structure, where
              the expected accuracy is computed from the pair probabilities:
              each base pair (i,j) gets a score 2*gamma*p_ij and the score of
              an unpaired base is given by the probability of not forming a
              pair.


              The parameter gamma tunes the importance of correctly predicted
              pairs versus unpaired bases. Thus, for small values of gamma the
              MEA structure will contain only pairs with very high
              probability. Using --MEA implies -p for computing the pair
              probabilities.
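
              The scoring above can be maximized with a simple Nussinov-style
              dynamic program over the pair probabilities. The following is a
              sketch of that technique (hypothetical function mea_structure),
              not RNAlib's implementation: it ignores loop-size restrictions
              and the chain break, and only considers pairs listed in bpp.

```python
from typing import Dict, List, Tuple


def mea_structure(n: int, bpp: Dict[Tuple[int, int], float],
                  gamma: float = 1.0) -> Tuple[str, float]:
    """Maximum expected accuracy structure and its score.

    bpp maps 1-based pairs (i, j), i < j, to p_ij. A pair scores 2*gamma*p_ij;
    an unpaired base i scores q_i = 1 - sum of all p_ij involving i.
    """
    q = [1.0] * (n + 2)
    partners: Dict[int, List[Tuple[int, float]]] = {i: [] for i in range(1, n + 1)}
    for (i, j), p in bpp.items():
        q[i] -= p
        q[j] -= p
        partners[i].append((j, p))

    # M[i][j] = best score on the interval i..j; empty intervals score 0
    M = [[0.0] * (n + 2) for _ in range(n + 2)]

    def pair_score(i: int, k: int, j: int, p: float) -> float:
        return 2 * gamma * p + M[i + 1][k - 1] + M[k + 1][j]

    for i in range(n, 0, -1):
        for j in range(i, n + 1):
            best = M[i + 1][j] + q[i]  # i stays unpaired
            for k, p in partners[i]:
                if k <= j:
                    best = max(best, pair_score(i, k, j, p))  # i pairs with k
            M[i][j] = best

    # traceback: recompute the same expressions to see which case won
    s = ["."] * n
    stack = [(1, n)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if M[i][j] == M[i + 1][j] + q[i]:
            stack.append((i + 1, j))
            continue
        for k, p in partners[i]:
            if k <= j and M[i][j] == pair_score(i, k, j, p):
                s[i - 1], s[k - 1] = "(", ")"
                stack.extend([(i + 1, k - 1), (k + 1, j)])
                break
    return "".join(s), M[1][n]
```

              With a single candidate pair (1,4) of probability 0.9, gamma=1
              keeps the pair, whereas gamma=0.1 leaves all bases unpaired,
              matching the effect of small gamma described above.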

       -S, --pfScale=scaling factor
	      In the calculation of the	pf use scale*mfe as  an	 estimate  for
	      the ensemble free	energy (used to	avoid overflows).

	      The  default is 1.07, useful values are 1.0 to 1.2. Occasionally
	      needed for long sequences.  You can also recompile  the  program
	      to use double precision (see the README file).

	      Set  the	threshold  for base pair probabilities included	in the
	      postscript output


              By setting the threshold, the base pair probabilities that are
              included in the output can be varied. By default, only those
              exceeding 1e-5 in probability will be shown as squares in the
              dot plot. Changing the threshold to any other value includes
              more or fewer base pairs.

       -g, --gquad
              Incorporate G-Quadruplex formation into the structure prediction
              algorithm.


   Model Details:
       -T, --temp=DOUBLE
              Rescale energy parameters to a temperature of temp C. Default is
              37C.

       -4, --noTetra
	      Do not include special tabulated stabilizing energies for	 tri-,
	      tetra- and hexaloop hairpins.


	      Mostly for testing.

       -d, --dangles=INT
	      How  to  treat "dangling end" energies for bases adjacent	to he-
	      lices in free ends and multi-loops


	      With -d1 only unpaired bases can participate in at most one dan-
	      gling  end.   With  -d2 this check is ignored, dangling energies
	      will be added for	the bases adjacent to a	helix on both sides in
	      any  case;  this	is  the	default	for mfe	and partition function
	      folding (-p).  The option	-d0 ignores dangling  ends  altogether
	      (mostly for debugging).  With -d3	mfe folding will allow coaxial
	      stacking of adjacent helices in multi-loops. At the  moment  the
	      implementation  will not allow coaxial stacking of the two inte-
	      rior pairs in a loop of degree 3 and works only for mfe folding.

              Note that with -d1 and -d3 only the MFE computations will use
              this setting while the partition function uses the -d2 setting,
              i.e. dangling ends will be treated differently.

       --noLP Produce structures without lonely	pairs (helices of length 1).


	      For partition function folding this only	disallows  pairs  that
	      can  only	occur isolated.	Other pairs may	still occasionally oc-
	      cur as helices of	length 1.
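
              A lonely pair is a pair (i,j) for which neither (i-1,j+1) nor
              (i+1,j-1) is also paired. The sketch below (hypothetical helpers)
              lists such pairs in a dot-bracket string. It counts '&' as a
              position of its own, so pairs on either side of the chain break
              are never treated as stacked neighbors; positions after the
              break are therefore shifted by one.

```python
from typing import Dict, List, Tuple


def pair_table(db: str) -> Dict[int, int]:
    """Map each paired position (1-based, '&' counted) to its partner."""
    stack: List[int] = []
    pt: Dict[int, int] = {}
    for i, ch in enumerate(db, start=1):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at {i}")
            j = stack.pop()
            pt[i], pt[j] = j, i
    if stack:
        raise ValueError(f"unmatched '(' at {stack[-1]}")
    return pt


def lonely_pairs(db: str) -> List[Tuple[int, int]]:
    """Pairs (i, j) with neither (i-1, j+1) nor (i+1, j-1) also paired."""
    pt = pair_table(db)
    return [(i, j) for i, j in sorted(pt.items())
            if i < j and pt.get(i - 1) != j + 1 and pt.get(i + 1) != j - 1]
```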

       --noGU Do not allow GU pairs


	      Do not allow GU pairs at the end of helices


       -P, --paramFile=paramfile
	      Read energy parameters from paramfile, instead of	using the  de-
	      fault parameter set.

	      Different	 sets  of energy parameters for	RNA and	DNA should ac-
	      company your distribution.  See the RNAlib documentation for de-
	      tails on the file	format.	When passing the placeholder file name
	      "DNA", DNA parameters are	loaded without the  need  to  actually
	      specify any input	file.

              Allow other pairs in addition to the usual AU, GC, and GU pairs.

              Its argument is a comma-separated list of additionally allowed
              pairs. If the first character is a "-", then AB will imply that
              AB and BA are allowed pairs, e.g. RNAcofold -nsp -GA will allow
              GA and AG pairs. Nonstandard pairs are given 0 stacking energy.
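
              The pairing rules described by --noGU and the option above can
              be sketched as a set of allowed two-letter pairs (hypothetical
              helper, not how RNAlib stores them). Assumptions: without a
              leading "-" only the pair as written is added, and letters are
              upper-cased.

```python
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def allowed_pairs(nsp: str = "", no_gu: bool = False) -> set:
    """Allowed base pairs for a comma-separated list of extra pairs."""
    pairs = set(CANONICAL)
    if no_gu:
        pairs -= {"GU", "UG"}
    if not nsp:
        return pairs
    symmetric = nsp.startswith("-")  # leading "-": AB also allows BA
    for item in nsp.lstrip("-").split(","):
        item = item.strip().upper()
        if len(item) != 2:
            raise ValueError(f"a pair must consist of two letters: {item!r}")
        pairs.add(item)
        if symmetric:
            pairs.add(item[::-1])
    return pairs
```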

       -e, --energyModel=INT
	      Rarely used option to fold sequences from	the artificial ABCD...
	      alphabet,	where A	pairs B, C-D etc.  Use the  energy  parameters
	      for GC (-e 1) or AU (-e 2) pairs.

	      Set the scaling of the Boltzmann factors (default=`1.')

              The argument provided with this option enables scaling of the
              thermodynamic temperature used in the Boltzmann factors
              independently from the temperature used to scale the individual
              energy contributions of the loop types. The Boltzmann factors
              then become exp(-dG/(kT*betaScale)) where k is the Boltzmann
              constant, dG the free energy contribution of the state and T the
              absolute temperature.
       If you use this program in your work you	might want to cite:

       R.  Lorenz,  S.H.  Bernhart,  C.	 Hoener	 zu Siederdissen, H. Tafer, C.
       Flamm, P.F. Stadler and I.L. Hofacker (2011), "ViennaRNA	Package	 2.0",
       Algorithms for Molecular	Biology: 6:26

       I.L.  Hofacker,	W. Fontana, P.F. Stadler, S. Bonhoeffer, M. Tacker, P.
       Schuster	(1994),	"Fast Folding and Comparison of	RNA  Secondary	Struc-
       tures", Monatshefte f. Chemie: 125, pp 167-188

       R.  Lorenz,  I.L. Hofacker, P.F.	Stadler	(2016),	"RNA folding with hard
       and soft	constraints", Algorithms for Molecular Biology 11:1 pp 1-13

       S.H. Bernhart, Ch. Flamm, P.F. Stadler, I.L. Hofacker (2006),
       "Partition Function and Base Pairing Probabilities of RNA
       Heterodimers", Algorithms Mol. Biol.

       The energy parameters are taken from:

       D.H. Mathews, M.D. Disney, J.L. Childs, S.J. Schroeder, M. Zuker, D.H.
       Turner (2004), "Incorporating chemical modification constraints into a
       dynamic programming algorithm for prediction of RNA secondary
       structure", Proc. Natl. Acad. Sci. USA: 101, pp 7287-7292

       D.H. Turner, D.H. Mathews (2009), "NNDB: The nearest neighbor parameter
       database	for predicting stability of nucleic acid secondary structure",
       Nucleic Acids Research: 38, pp 280-282

       Ivo L Hofacker, Peter F Stadler,	Stephan	Bernhart, Ronny	Lorenz

       If  in doubt our	program	is right, nature is at fault.  Comments	should
       be sent to

RNAcofold 2.4.14		  August 2019			  RNACOFOLD(1)

