FreeBSD Manual Pages
RNAForester(2.0.1)                                            RNAForester(2.0.1)

NAME
       RNAforester - compare RNA secondary structures via forest alignment

SYNOPSIS
       RNAforester [options]

       Options are:
       --help                  shows this help info
       --version               shows version information
       -d                      calculate distance instead of similarity
       -r                      calculate relative score
       -l                      local similarity
       -so=int                 local suboptimal alignments within int%
       -s                      small-in-large similarity
       -m                      multiple alignment mode
       -mt=double              clustering threshold
       -mc=double              clustering cutoff
       -p                      predict structures from sequences
       -pmin=num               minimum basepair frequency for prediction
       -pm=int                 basepair(bond) match score
       -pd=int                 basepair bond indel score
       -bm=int                 base match score
       -br=int                 base mismatch score
       -bd=int                 base indel score
       --RIBOSUM               RIBOSUM85-60 scoring matrix
       -cmin=double            minimum basepair frequency for consensus structure
       -2d                     generate alignment 2D plots in postscript format
       --2d_hidebasenum        hide base numbers in 2D plot
       --2d_basenuminterval=n  show every n-th base number
       --2d_grey               use only grey colors in 2D plots
       --2d_scale=double       scale factor for the 2d plots
       --score                 compute only scores, no alignment
       --fasta                 generate fasta output of alignments
       -f=file                 read input from file
       --noscale               suppress output of scale

DESCRIPTION
       RNAforester calculates RNA secondary structure alignments, both pairwise
       and multiple. The comparison is based on the tree alignment model [1,2].

   Model
       The models for pairwise and multiple alignment differ slightly. The
       pairwise model is based on the following edit operations on sequence
       and structure:

       basepair replacement/match: A basepair, INCLUDING the paired bases, is
       substituted by another basepair. The scoring contribution is p_m.

       basepair bond deletion: A basepair bond WITHOUT the paired bases is
       removed. The scoring contribution is p_d.

       Sequence edit operations: Base match, base mismatch and base deletion
       give the scoring contributions b_m, b_r and b_d, respectively.
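       In the pairwise model, the score of an alignment that has already been
       computed is the sum of the contributions of its edit operations. The
       Python sketch below shows only this bookkeeping. It does not show the
       tree alignment algorithm [1,2] that finds the optimal alignment.

       The sketch makes these assumptions, all of them hypothetical:
       the alignment is given as a list of operation names ('pm', 'pd',
       'bm', 'br', 'bd'); the function name pairwise_score is invented;
       the parameter values are invented and are NOT RNAforester's defaults.

```python
from collections import Counter

# Hypothetical parameter values for illustration only; they are NOT
# claimed to be RNAforester's defaults.
PARAMS = {
    "pm": 10,   # basepair replacement/match, paired bases included (p_m)
    "pd": -5,   # basepair bond deletion, bases are kept (p_d)
    "bm": 1,    # base match (b_m)
    "br": 0,    # base mismatch (b_r)
    "bd": -10,  # base deletion (b_d)
}


def pairwise_score(operations, params=PARAMS):
    """Sum the scoring contributions of a list of edit-operation names.

    Under the pairwise model a basepair match ('pm') already covers both
    paired bases, so those bases must not be listed again as 'bm'/'br'.
    """
    counts = Counter(operations)
    unknown = set(counts) - set(params)
    if unknown:
        raise ValueError(f"unknown operations: {sorted(unknown)}")
    return sum(params[op] * n for op, n in counts.items())


# Two basepair matches, two base matches, one mismatch, one base deletion.
print(pairwise_score(["pm", "pm", "bm", "bm", "br", "bd"]))
```

       With the values above this prints 12. In distance mode (-d) the same
       sum is formed, but all parameters must be non-negative and the best
       alignment is the one with the minimum total.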
       In the multiple alignment mode (-m), parameter p_m is the score for
       matching a basepair bond WITHOUT the paired bases. Thus, the score for
       a whole basepair replacement is p_m+2*b_m. For more information about
       multiple alignment, refer to the description of parameter -m.

   Input
       RNAforester reads RNA secondary structures from stdin by default. It
       accepts sequences and structures in Fasta format, where matching
       brackets symbolize base pairs and unpaired bases are represented by a
       dot. A line containing the primary sequence can precede the RNA
       secondary structure(s). An example is given below:

              > test
              accaguuacccauucgggaaccggu      primary structure
              .((..(((...)))..((..)))).      secondary structure

       All characters after a blank are ignored and all '-' characters are
       removed. The program continues to read new structures until it
       encounters a line consisting of the single character @ or the end of
       the file. Input lines starting with > can contain a structure name.

       Option -f=filename lets RNAforester read the input from a file. Result
       files are then written to files prefixed by filename.

   Output
       Alignments in ASCII format are written to stdout. Option -2d generates
       PostScript drawings of structure alignments.

Options
       -d     Calculate distance instead of similarity. In contrast to
              similarity, scoring contributions are minimized. The scoring
              parameters must not be negative, and equal structures achieve a
              distance of zero. This parameter cannot be used in conjunction
              with multiple alignment, where relative similarity is computed.

       -r     Calculate relative score, defined by
              sr(a,b) = 2*s(a,b) / (s(a,a) + s(b,b)).
              Relative scores are upper bounded by 1, which is the score for
              equal structures.

       -l     Calculate locally similar structures. The term local refers to
              subwords of the input sequences and structures. If parameter
              -so is used, suboptimal solutions are calculated. These are not
              suboptimal solutions for the same local structures. They are
              different substructures, none of which contains another.
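       The relative score of option -r normalizes a pairwise similarity by
       the two self-similarities. The following is a minimal Python sketch of
       that formula. It assumes that s(a,b), s(a,a) and s(b,b) have already
       been obtained, for instance from separate runs with --score. The
       function name relative_score is hypothetical.

```python
def relative_score(s_ab: float, s_aa: float, s_bb: float) -> float:
    """Relative score sr(a,b) = 2*s(a,b) / (s(a,a) + s(b,b)).

    s_ab is the similarity of structures a and b; s_aa and s_bb are their
    self-similarities. For equal structures all three values coincide, so
    the result is exactly 1.
    """
    denom = s_aa + s_bb
    if denom == 0:
        raise ValueError("s(a,a) + s(b,b) must be non-zero")
    return 2.0 * s_ab / denom


print(relative_score(40, 40, 40))  # identical structures
print(relative_score(30, 40, 50))  # 2*30 / (40+50)
```

       The first call prints 1.0. The second call yields 60/90, about 0.667.
       The explicit check for a zero denominator is a design choice of this
       sketch.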
       -so=int
              Calculates suboptimal local alignments within int% of the
              optimum. This option requires option -l.

       -s     Calculates small-in-large similarity, i.e. the best alignment of
              the first structure against all substructures of the second
              structure is computed.

       -m, -mc=double, -mt=double, -cmin=double
              Multiple alignment mode. Multiple alignments of structures are
              calculated in a progressive fashion. First, an all-against-all
              comparison of structures is performed (relative scores), and
              afterwards structural alignments are joined along a guide tree
              (the guide tree is constructed dynamically). If the best score
              that a single structure or structure alignment can achieve by
              aligning to all others is below the cutoff value -mc, it is not
              joined and is put into the results list. Thus, a multiple
              structure alignment can produce a list of alignments. The main
              purpose of parameter -mc is to identify alternative and wrong
              structures produced by structure predictions. The default value
              for -mc is zero, as this separates similar from dissimilar
              structures in a similarity scoring model.

              In each step of the multiple alignment calculation, the best
              scoring pair is joined and then the guide tree is adjusted. To
              speed up computation, parameter -mt defines a threshold. If this
              threshold is exceeded, multiple pairs are joined before the
              guide tree is adjusted.

              Besides the sequence and structure alignment, a consensus
              sequence and structure are computed. The minimum frequency of a
              basepair in the consensus structure is controlled by parameter
              -cmin.

              The console output could look like this (just a part):

                     [histogram of * above the sequence]
                     ggggcuauagcucagcugggggagcuauagcucagcugggagcgggga
                     .((((....))))....((.(.(((((..((((........))))...
                     [histogram of * below the structure]

              The number of * above the primary sequence shows the frequency
              of the base; each * stands for 10% frequency. Accordingly, the
              number of * below the secondary structure shows the frequency
              of a paired or unpaired base at that position.

              The guide tree is written to a file "cluster.dot" in dot
              format. If a filename was specified by parameter -f, the file
              name is "filename_cluster.dot". Refer to
              http://www.research.att.com/sw/tools/graphviz for more details
              about the dot format and tools.

       -p, -pmin=double
              Structures (in fact, a consensus of compatible structures) are
              predicted from the partition function, which is calculated using
              the Vienna RNA library [3]. Structure lines in the input are
              ignored. -pmin is the minimum frequency that a basepair must
              exceed to be considered in the prediction of structures.

       -pm=int, -pd=int, -bm=int, -br=int, -bd=int
              Scoring parameters. Refer to Section DESCRIPTION.

       --RIBOSUM
              Uses the RIBOSUM85-60 base and basepair substitution matrix as
              proposed in [4]. Requires the pairwise alignment model.

       -2d    RNAforester provides different types of visualizations for
              pairwise and multiple alignment.

              pairwise alignment
                     Since bases paired in a structure S1 can be aligned to
                     bases unpaired in a structure S2, the presentation of a
                     common secondary structure leaves some choice. For an
                     alignment of those structures, an RNA secondary structure
                     "S2-at-S1" is drawn that highlights the differences as
                     deviations of S2 from S1, or vice versa, "S1-at-S2". Both
                     are alternative visualizations of the same alignment.
                     Bases printed in black show structure elements that occur
                     in both structures with the same sequence.
                     Sequence variations are displayed in red letters. Bases
                     or base pairs that can only be found in S1 are printed in
                     blue, while bases that only occur in S2 are printed in
                     green.

                     The drawings are written to files "x_n.ps" and "y_n.ps",
                     where n is the number of the alignment; n enumerates the
                     suboptimal solutions if option -so is used. The regions
                     of local similarity are highlighted in the original
                     structures in the drawings "x_str.ps" and "y_str.ps".

              multiple alignment
                     Each cluster of the result list of a multiple alignment
                     is visualized in two alternative drawings. If option -f
                     is used, they are written to the files
                     "filename_cons_n.ps" and "filename_n.ps". In both plots,
                     the consensus structure is shown. The lighter a basepair
                     bond is drawn, the less frequently it occurs in the
                     structures. Bases or basepair bonds that have a frequency
                     of one hundred percent are drawn in red. In
                     "filename_cons_n.ps", the most frequent base at each
                     residue is printed, with the base frequency indicated by
                     grey-scale. In "filename_n.ps", the frequencies of the
                     bases a, c, g, u are proportional to the radii of circles
                     that are arranged clockwise on the corners of a square,
                     starting at the upper left corner. These circles are
                     colored red, green, blue and magenta for the bases a, c,
                     g, u, respectively. The frequency of a gap is shown by a
                     black circle growing at the center of the square.

              Parameters --2d_hidebasenum, --2d_basenuminterval=n, --2d_grey
              and --2d_scale=double affect the drawings of alignments and
              consensus structures as implied by their names.

       --score
              Only the optimal score of an alignment is printed. This option
              is useful when RNAforester is called by another program that
              only needs a similarity or distance value.
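       The per-column glyph described for the multiple alignment drawing
       "filename_n.ps" can be computed as in the hypothetical Python sketch
       below. It is not RNAforester's drawing code. The sketch counts base
       and gap frequencies in one alignment column and places a, c, g, u
       clockwise on the corners of a square, starting at the upper left.

       Each circle gets the color listed above and a radius proportional to
       the frequency. These details are assumptions of the sketch:
       the function name column_glyph; the unit-square coordinates; the
       max_radius scale; and treating the black gap circle's radius as
       proportional to the gap frequency.

```python
from collections import Counter

# Corners of a square (x, y with y pointing up), listed clockwise from the
# upper left corner, paired with the colors given for a, c, g, u.
GLYPH_LAYOUT = {
    "a": ((-1.0, 1.0), "red"),
    "c": ((1.0, 1.0), "green"),
    "g": ((1.0, -1.0), "blue"),
    "u": ((-1.0, -1.0), "magenta"),
}


def column_glyph(column, max_radius=1.0):
    """Return (x, y, radius, color) tuples for one alignment column.

    column holds one character per aligned structure, '-' for a gap.
    Radii are proportional to the frequencies (assumption for the gap).
    """
    if not column:
        raise ValueError("empty alignment column")
    counts = Counter(ch.lower() for ch in column)
    n = len(column)
    circles = []
    for base, ((x, y), color) in GLYPH_LAYOUT.items():
        circles.append((x, y, max_radius * counts.get(base, 0) / n, color))
    # Gap circle at the center of the square.
    circles.append((0.0, 0.0, max_radius * counts.get("-", 0) / n, "black"))
    return circles


for circle in column_glyph("aag-"):
    print(circle)
```

       For the column "aag-", the sketch yields the following circles:
       a red circle of radius 0.5 at the upper left; blue and black circles
       of radius 0.25; green and magenta circles of radius 0.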
       --fasta
              Alignments are printed in Fasta format.

REFERENCES
       [1] Jiang T, Wang J T L and Zhang K (1995) Alignment of Trees - An
           Alternative to Tree Edit. Theoretical Computer Science 143(1),
           137-148.

       [2] Hoechsmann M, Toeller T, Giegerich R and Kurtz S (2003) Local
           Similarity of RNA Secondary Structures. Proc. of the IEEE
           Bioinformatics Conference (CSB 2003), 159-168.

       [3] Hofacker I L, Fontana W, Stadler P F, Bonhoeffer L S, Tacker M and
           Schuster P (1994) Fast Folding and Comparison of RNA Secondary
           Structures. Monatsh. Chem. 125, 167-188.

       [4] Klein R J and Eddy S R (2003) RSEARCH: finding homologs of single
           structured RNA sequences. BMC Bioinformatics 4(1), 44
           (22 Sep 2003).

VERSION
       This man page documents version 1.4 of RNAforester.

AUTHORS
       Matthias Hoechsmann

BUGS
       I hope you won't find any. Comments should be sent to
       mhoechsm@techfak.uni-bielefeld.de

November 2017                                                 RNAForester(2.0.1)