RNAForester(2.0.1)					    RNAForester(2.0.1)

NAME
       RNAforester - compare RNA secondary structures via forest alignment

SYNOPSIS
       RNAforester [options]
       Options are:
       --help			 shows this help info
       --version		 shows version information
       -d			 calculate distance instead of similarity
       -r			 calculate relative score
       -l			 local similarity
       -so=int			 local suboptimal alignments within int%
       -s			 small-in-large	similarity
       -m			 multiple alignment mode
       -mt=double		 clustering threshold
       -mc=double		 clustering cutoff
       -p			 predict structures from sequences
       -pmin=num		 minimum basepair frequency for	prediction
       -pm=int			 basepair(bond)	match score
       -pd=int			 basepair bond indel score
       -bm=int			 base match score
       -br=int			 base mismatch score
       -bd=int			 base indel score
       --RIBOSUM		 RIBOSUM85-60 scoring matrix
       -cmin=double		  minimum  basepair  frequency	for  consensus
       -2d			 generate alignment  2D	 plots	in  postscript
       --2d_hidebasenum		 hide base numbers in 2D plot
       --2d_basenuminterval=n	 show every n-th base number
       --2d_grey		 use only grey colors in 2D plots
       --2d_scale=double	 scale factor for the 2d plots
       --score			 compute only scores, no alignment
       --fasta			 generate fasta	output of alignments
       -f=file			 read input from file
       --noscale		 suppress output of scale

DESCRIPTION
       RNAforester calculates RNA secondary structure alignments, both
       pairwise and multiple. The comparison is based on the tree alignment
       model.

       The  model  for	pairwise  and multiple alignment differs slightly. The
       pairwise	model is based on the following	edit  operations  on  sequence
       and structure:

       basepair	 replacement/match: A basepair,	INCLUDING the paired bases, is
       substituted by another basepair.	 The scoring contribution is p_m.
       basepair	bond deletion: A basepair bond WITHOUT the paired bases	is re-
       moved. The scoring contribution is p_d.
       Sequence edit operations: Base match, base mismatch and base deletion
       give the scoring contributions b_m, b_r and b_d, respectively.

       In the multiple alignment mode (-m), parameter p_m  is  the  score  for
       matching	a basepair bond	WITHOUT	the paired bases.  Thus, the score for
       a whole basepair	replacement is p_m+2*b_m. For more  information	 about
       multiple	alignment refer	to the description of parameter	-m.
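
       The difference between the two scoring models can be sketched in a few
       lines of Python. This is only an illustration of the arithmetic
       described above, not code from RNAforester; the function name and the
       numeric values are hypothetical and are not the program's defaults.

```python
def basepair_replacement_score(p_m: int, b_m: int, multiple: bool) -> int:
    """Score for replacing a whole basepair (the bond plus both paired bases)."""
    if multiple:
        # Multiple mode (-m): p_m scores only the bond, so the two paired
        # bases are scored separately with the base match score b_m.
        return p_m + 2 * b_m
    # Pairwise mode: p_m already covers the bond and both paired bases.
    return p_m


# Hypothetical parameter values, chosen only for illustration.
print(basepair_replacement_score(p_m=10, b_m=1, multiple=False))
print(basepair_replacement_score(p_m=10, b_m=1, multiple=True))
```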

       RNAforester reads  RNA  secondary structures from stdin by default.  It
       accepts sequences and structures	in Fasta format, where matching	brack-
       ets symbolize base pairs	and unpaired bases are represented by a	dot. A
       line containing the primary sequence  can  precede  the	RNA  secondary
       structure(s). An	example	is given below:

	 > test
	 accaguuacccauucgggaaccggu   primary structure
	 .((..(((...)))..((..)))).   secondary structure

       All  characters	after a	"blank"	are ignored and	all '-'	characters are
       removed.	 The  program will continue to read  new  structures  until  a
       line consisting of the single character @ or an end of file is  encoun-
       tered. Input lines starting with	> can contain a	structure name.
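
       The input rules above can be sketched as a small Python reader. It is
       a minimal illustration under stated assumptions, not RNAforester's own
       parser: the function names are hypothetical, leading whitespace is
       assumed to be insignificant, and a line is treated as a structure when
       it contains only dots and brackets.

```python
def read_input(lines):
    """Return (kind, text) entries; kind is 'name', 'sequence' or 'structure'."""
    entries = []
    for raw in lines:
        stripped = raw.strip()
        if stripped == "@":          # a line consisting only of '@' ends the input
            break
        if not stripped:             # assumption: empty lines are skipped
            continue
        if stripped.startswith(">"):
            entries.append(("name", stripped[1:].strip()))
            continue
        # Everything after the first blank is ignored, and '-' is removed.
        token = stripped.split()[0].replace("-", "")
        if not token:
            continue
        kind = "structure" if set(token) <= set(".()") else "sequence"
        entries.append((kind, token))
    return entries


def base_pairs(structure):
    """Return sorted (open, close) index pairs for matching brackets."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return sorted(pairs)


example = [
    "> test",
    "accaguuacccauucgggaaccggu   primary structure",
    ".((..(((...)))..((..)))).   secondary structure",
    "@",
]
entries = read_input(example)
pairs = base_pairs(entries[2][1])
```

       Applied to the example record, the reader yields the name, the
       sequence and the structure without the trailing comments, and the
       structure contains seven base pairs.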

       Option -f=filename lets RNAforester read the input from a file. Result
       files are then written to files prefixed by filename.

       Alignments  in ASCII format are written to stdout. Option -2d generates
       postscript drawings of structure	alignments.

       -d     Calculate distance instead of similarity. In contrast to
              similarity, scoring contributions are minimized. The scoring
              parameters must not be negative, and equal structures achieve a
              distance of zero. This parameter cannot be used in conjunction
              with multiple alignment, where relative similarity is computed.

       -r     Calculate relative score, defined by
              sr(a,b) = 2*s(a,b) / (s(a,a) + s(b,b)). Relative scores are
              upper bounded by 1, which is the score for equal structures.
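
              As a worked illustration of the formula, the following Python
              sketch computes a relative score from three similarity scores.
              The function name and the input values are hypothetical; the
              scores themselves would come from RNAforester.

```python
def relative_score(s_ab: float, s_aa: float, s_bb: float) -> float:
    """Relative score sr(a,b) = 2*s(a,b) / (s(a,a) + s(b,b))."""
    return 2.0 * s_ab / (s_aa + s_bb)


# Equal structures: s(a,b) equals both self-scores, so the result is 1.
print(relative_score(30, 30, 30))
# A weaker match against the same self-scores gives a smaller value.
print(relative_score(15, 30, 30))
```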

       -l     Calculate locally similar structures. The term local refers to
              subwords of the input sequences and structures. If parameter -so
              is used, suboptimal solutions are calculated. These are not
              suboptimal solutions of the same local structures, but different
              substructures which do not include each other.

       -so=int
              Calculates suboptimal local alignments within int% of the
              optimum. This option requires option -l.
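
              A calling program can check the documented option constraints
              before starting RNAforester. The sketch below is a hypothetical
              helper, not part of RNAforester: it only assembles an argument
              list from options named in this page and rejects the two
              combinations described above (-so without -l, and -d together
              with -m).

```python
def build_args(distance=False, local=False, subopt_percent=None,
               multiple=False, input_file=None):
    """Return an argument list for RNAforester, rejecting documented conflicts."""
    if subopt_percent is not None and not local:
        raise ValueError("-so requires -l")
    if distance and multiple:
        raise ValueError("-d cannot be combined with -m")
    args = ["RNAforester"]
    if distance:
        args.append("-d")
    if local:
        args.append("-l")
    if subopt_percent is not None:
        args.append(f"-so={subopt_percent}")
    if multiple:
        args.append("-m")
    if input_file is not None:
        args.append(f"-f={input_file}")
    return args


# "in.fa" is a placeholder file name used only for this example.
args = build_args(local=True, subopt_percent=10, input_file="in.fa")
```

              The resulting list could be passed to subprocess.run, which
              avoids shell quoting issues with file names.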

       -s     Calculates small-in-large	similarity, i.e. the best alignment of
	      the first	structure against  all	substructures  of  the	second
	      structure	is computed.

       -m, -mc=double, -mt=double, -cmin=double
	      Multiple	alignment  mode. Multiple alignments of	structures are
	      calculated in a progressive fashion. First,  an  all-against-all
	      comparison  of structures	is performed (relative scores) and af-
	      terwards structural alignments are joined	 along	a  guide  tree
	      (the  guide tree is constructed dynamically).  If	the best score
	      which a single structure or structure alignment can  achieve  by
	      aligning	to  all	 others	 is  below cutoff value	-mc, it	is not
	      joined and put into the results list. Thus, a multiple structure
	      alignment	 can produce a list of alignments. The main purpose of
	      parameter	-mc is to identify alternative	and  wrong  structures
	      produced	by structure predictions. The default value for	-mc is
	      zero, as this separates similar from dissimilar in a  similarity
	      scoring model.

              In each step of the multiple alignment calculation, the best
              scoring pair is joined and then the guide tree is adjusted. To
              speed up computation, parameter -mt defines a threshold: if it
              is exceeded, multiple pairs are joined before the guide tree is
              adjusted.
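
              The progressive joining with a cutoff can be sketched
              generically in Python. This is a simplified illustration, not
              RNAforester's implementation: the score and join functions are
              supplied by the caller and are hypothetical stand-ins for
              structure alignment, only one pair is joined per step, and the
              -mt speed-up is left out.

```python
def progressive_clusters(items, score, join, cutoff=0.0):
    """Greedily join the best scoring pair until no pair reaches the cutoff."""
    active = list(items)
    while len(active) > 1:
        # All-against-all comparison of the current clusters.
        best, i, j = max(
            ((score(a, b), i, j)
             for i, a in enumerate(active)
             for j, b in enumerate(active) if i < j),
            key=lambda t: t[0],
        )
        if best < cutoff:
            break  # no remaining cluster can be joined with another one
        merged = join(active[i], active[j])
        active = [c for k, c in enumerate(active) if k not in (i, j)]
        active.append(merged)
    return active  # the result list: one entry per cluster


# Toy stand-ins: clusters are tuples of numbers, similarity is 1 minus the
# distance between their means, and joining concatenates the tuples.
def toy_score(a, b):
    return 1.0 - abs(sum(a) / len(a) - sum(b) / len(b))

def toy_join(a, b):
    return a + b

clusters = progressive_clusters([(0.0,), (0.1,), (5.0,)], toy_score, toy_join)
```

              With these toy inputs the first two items are joined, while the
              third stays separate because its best score is below the cutoff
              of zero.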

              Besides the sequence and structure alignment, a consensus
              sequence and structure are computed. The minimum frequency
              required for a basepair to appear in the consensus is controlled
              by parameter -cmin.

	      The console output could look like this (excerpt):

				  * *  ****
				  * *  ****
				 ** *  ****
				 ** *  ****		     *
				 ** *  ****  ********	  ****
				 ** *  ****  ********	  ****
				 ** *  ****  ********	  ****
		**************** ** * ****************	  ******
		**************** ** ****************************
		**************** ** ****************************
		**************** ** ****************************
		**************** ** ** *************************
		**************** ** *  ***************	 *******
				 ** *  ****  ********	 *****
				 ** *  ****  ********	 *****
				 ** *  ****   *******	 *** *
				 ** *  ****		     *
				  * *  ****
				  * *  ****

              The number of * above the primary sequence shows the frequency
              of the base. Each * stands for 10% frequency. Accordingly, the
              number of * below the secondary structure shows the frequency of
              the occurrence of a paired or unpaired base.
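
              The relation between frequencies and the number of * can be
              sketched as follows. The helper is hypothetical and assumes
              that frequencies are rounded down to full 10% steps, which this
              page does not specify.

```python
def render_bars(percentages):
    """Draw one column of '*' per position, one '*' per full 10% frequency."""
    heights = [p // 10 for p in percentages]   # assumption: round down
    rows = []
    for level in range(max(heights, default=0), 0, -1):
        row = "".join("*" if h >= level else " " for h in heights)
        rows.append(row.rstrip())
    return rows


# Three alignment columns with 20%, 10% and 0% frequency.
for row in render_bars([20, 10, 0]):
    print(row)
```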

	      The guide	tree is	written	to a file ""	in dot format.
	      If a filename was	specified by  parameter	 -f  the  filename  is
	      "".	     Refer	to	for more details about the dot
	      format and tools.

       -p, -pmin=double
	      Structures  (in  fact, a consensus of compatible structures) are
	      predicted	from the partition function which is calculated	 using
	      the Vienna RNA library [3]. Structure lines in the input are ig-
	      nored.  -pmin is the minimum frequency of	a basepair which  must
	      be exceeded to be	considered for the prediction of structures.

       -pm=int, -pd=int, -bm=int, -br=int, -bd=int
              Scoring parameters. Refer to Section DESCRIPTION.

       --RIBOSUM
              Uses the RIBOSUM85-60 base and basepair substitution matrix as
              proposed in [4]. Requires the pairwise alignment model.

       -2d    RNAforester provides different types of visualizations for pair-
	      wise and multiple	alignment.

              Pairwise alignment: Since bases paired in a structure S1 can be
              aligned to bases unpaired in a structure S2, the presentation of
              a common secondary structure leaves some choice. For an
              alignment of those structures, an RNA secondary structure
              "S2-at-S1" is drawn that highlights the differences as
              deviations of S2 from S1, or vice versa, "S1-at-S2". Both are
              alternative visualizations of the same alignment. Bases printed
              in black show structure elements that occur in both structures
              with the same sequence. Sequence variations are displayed in red
              letters. Bases or base pairs that can only be found in S1 are
              printed in blue, while bases that only occur in S2 are printed
              in green.

              The drawings are written to files "" and "" where n is the
              number of the alignment. n enumerates the suboptimal solutions
              if option -so is used. The regions of local similarity are
              highlighted in the original structures in the drawings "" and
              "".

              Multiple alignment: Each cluster of the result list of a
              multiple alignment is visualized in two alternative drawings,
              written to the files "" and "" if option -f is used. In both
              plots, the consensus structure is shown. The lighter a basepair
              bond is drawn, the less frequently it occurs in the structures.
              Bases or basepair bonds that have a frequency of one hundred
              percent are drawn in red. In "file-", the most frequent base at
              each residue is printed, with the base frequency indicated by
              grey-scale. In "", the frequencies of the bases a,c,g,u are
              proportional to the radius of circles that are arranged
              clockwise on the corners of a square, starting at the upper left
              corner. Additionally, these circles are colored red, green,
              blue and magenta for the bases a,c,g,u, respectively. The
              frequency of a gap is proportional to a black circle growing at
              the center of the square.

              Parameters --2d_hidebasenum, --2d_basenuminterval=n, --2d_grey
              and --2d_scale=double affect the drawings of alignments and
              consensus structures as implied by their names.

       --score
              Only the optimal score of an alignment is printed. This option
              is useful when RNAforester is called by another program that
              only needs a similarity or distance value.

       --fasta
              Alignments are printed in Fasta format.

       [1] Jiang T, Wang J T L and Zhang K, (1995) Alignment of	Trees -	An Al-
       ternative to Tree Edit, Theoretical Computer Science 143(1), 137-148

       [2] Hoechsmann M, Toeller T, Giegerich R	and Kurtz S, (2003) Local Sim-
       ilarity	of  RNA	Secondary Structures, Proc. of the IEEE	Bioinformatics
       Conference (CSB 2003), 159-168

       [3] Ivo L. Hofacker, Walter Fontana, Peter  F.  Stadler,	 L.  Sebastian
       Bonhoeffer, Manfred Tacker, and Peter Schuster, (1994) Fast Folding and
       Comparison of RNA Secondary Structures, Monatsh.Chem. 125: 167-188.

       [4] Klein R.J. and Eddy S.R., (2003) RSEARCH: finding homologs of  sin-
       gle  structured	RNA sequences, BMC Bioinformatics. 2003	Sep 22;4(1):44

       This man	page documents version 1.4 of RNAforester.

       Matthias	Hoechsmann

BUGS
       I hope you won't find any. Comments should be sent to

				 November 2017		    RNAForester(2.0.1)

