ANALYSESEQS(1)		    General Commands Manual		ANALYSESEQS(1)

NAME
       AnalyseSeqs - Analyse a set of sequences	of common length

SYNOPSIS
       AnalyseSeqs [-X[bswn]] [-Q] [-M{mask}[+|!]] [-D{H|A|G}] [-d{S|H|D|B}]

DESCRIPTION
       AnalyseSeqs reads a set of sequences from stdin and applies a variety
       of sequence analysis methods to them. Currently available are:
       statistical geometry for quadruples of sequences (preliminary and not
       yet well tested); split decomposition; and neighbour joining and
       Ward's variance method for reconstructing phylogenies using various
       distance measures. PostScript output is available for statistical
       geometry and for the cluster methods.
       The program continues reading until it encounters one of	the  separator
       characters  '@' or '%'. Only sequences of alphabetical characters or of
       a specified alphabet are	processed, all other lines  are	 ignored.  The
       program	stops  reading if it either encounters an EOF condition, or if
       there are no valid sequence data	between	two lines beginning with sepa-
       rator characters.
       A list of taxa names can be specified in the input stream. The list
       begins with a line beginning with '*'. Optionally, a file name prefix
       [fn] for the PostScript output can be specified in this line. The
       entries have the form 'x : Taxon', where x is the number of the taxon,
       i.e., the position of the corresponding entry in the list of input
       sequences. The taxa list need not be complete. It must, however, end
       with a line beginning with '*' or any of the separator characters.
       The taxa list is printed at the top of the output. The specified taxa
       names are used as labels in the PostScript output.
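
       The reading rule described above can be approximated in a few lines of
       Python. This is only a sketch, not the program's parser: the function
       name read_sequences and the example input are hypothetical, and it
       assumes that a sequence line consists entirely of alphabetical
       characters and that input is converted to upper case (the default
       without a -M mask).

```python
SEPARATORS = ("@", "%")


def read_sequences(lines):
    """Return the sequence lines found before the first separator line."""
    sequences = []
    for raw in lines:
        line = raw.strip()
        if line.startswith(SEPARATORS):
            break  # a separator character ends the data set
        if line.isalpha():
            sequences.append(line.upper())  # assumed default: upper case
        # every other line (taxa list, blank lines, ...) is skipped here
    if len({len(s) for s in sequences}) > 1:
        raise ValueError("sequences must have a common length")
    return sequences


# Hypothetical input: a taxa list with prefix "demo", two sequences, separator.
example_input = """\
* demo
1 : Taxon one
2 : Taxon two
*
GCAUGCAU
gcaugcaa
%
"""

print(read_sequences(example_input.splitlines()))
```

       Lines of the taxa list are skipped by this sketch because they contain
       digits, colons, or asterisks; the real program additionally records
       them as labels.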

OPTIONS
       -X[bswn]
	      specifies	the analysis methods to	be used.

       [b]    Statistical Geometry. A PostScript file named '[fn_]box.ps' giv-
	      ing a graphical representation of	the  statistical  geometry  is
	      created.	The resulting box is a good measure of 'tree likeness'
	      of the data set.	This is	the default.

       [s]    Split decomposition.

       [w]    Cluster analysis using Ward's method. A  PostScript  file	 named
	      '[fn_]wards.ps' is created containing a drawing of the tree.

       [n]    Cluster  analysis	 using	Saitou's  neighbour  joining method. A
	      PostScript file named '[fn_]nj.ps' is created containing a draw-
	      ing of the tree.

       -Q     indicates	 that  a  statistical  geometry	analysis is to be per-
	      formed comparing four data sets, for  instance  to  confirm  the
	      significance of a	proposed phylogeny. This option	is only	useful
	      for statistical geometry analysis	and hence the -X option	is ig-
	      nored. Each of the four data sets	must be	of the form
	      *	[filename_prefix]
	      #	number
	      [list of taxa names]
	      *
	      list of sequences
	      %
	      where number is 1,2,3,4 for the four groups to be	compared.
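
	      Input for -Q can be assembled mechanically from four groups.
	      The following Python sketch lays the groups out in the form
	      shown above. The function name format_quadruple_input and the
	      shape of its arguments are hypothetical, and it assumes that
	      taxa entries use the 'x : Taxon' form with numbering restarting
	      at 1 in each group, which this page does not state.

```python
def format_quadruple_input(groups, prefix=""):
    """Lay out four groups of sequences in the form described for -Q.

    groups: four (taxa_names, sequences) pairs (hypothetical shape).
    """
    if len(groups) != 4:
        raise ValueError("-Q expects exactly four data sets")
    lines = []
    for number, (taxa, sequences) in enumerate(groups, start=1):
        lines.append(f"* {prefix}".rstrip())  # optional file name prefix
        lines.append(f"# {number}")           # group number 1..4
        # Assumption: entries are numbered from 1 within each group.
        lines.extend(f"{i} : {name}" for i, name in enumerate(taxa, start=1))
        lines.append("*")                     # end of the taxa list
        lines.extend(sequences)
        lines.append("%")                     # separator ends the data set
    return "\n".join(lines) + "\n"


demo_groups = [
    (["a1", "a2"], ["GCAU", "GCAA"]),
    (["b1"], ["GGAU"]),
    (["c1"], ["GCUU"]),
    (["d1"], ["ACAU"]),
]
print(format_quadruple_input(demo_groups, prefix="demo"), end="")
```

	      The resulting text would then be fed to the program on stdin
	      together with the -Q option.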

       -M{mask}[+|!]
	      specifies a mask for the input file. '{mask}' can be one of the
	      following letters, each indicating a predefined alphabet, or a
	      % sign followed by all characters to be accepted. A + sign at
	      the very end of the mask indicates that the input is to be
	      handled case-sensitively; the default is to convert the input
	      to upper case. A ! sign converts the input data to RY code:
	      GgAaXx -> R, UuCcKkTt -> Y; all other letters are converted to
	      *.

       -Ma    all letters A-Z and a-z.

       -Mu    uppercase	letters.

       -Ml    lowercase	letters.

       -Mc    digits [0-9].

       -Mn    all alphanumeric characters.

       -MR    RNA alphabet (GCAUgcau).

       -MD    DNA alphabet (GCATgcat).

       -MA    Amino acids in one-letter	code.

       -MS    Secondary structures coded as '^.()'

       -M%alphabet
	      use the specified	alphabet.
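
	      The RY conversion selected by the ! sign of the -M option can
	      be written out directly from the mapping given above. The
	      following Python sketch is illustrative only; the function name
	      to_ry is hypothetical and not part of the program.

```python
PURINE_CODES = set("GgAaXx")        # mapped to R
PYRIMIDINE_CODES = set("UuCcKkTt")  # mapped to Y


def to_ry(sequence):
    """Convert a sequence to RY code; any other character becomes '*'."""
    out = []
    for ch in sequence:
        if ch in PURINE_CODES:
            out.append("R")
        elif ch in PYRIMIDINE_CODES:
            out.append("Y")
        else:
            out.append("*")
    return "".join(out)


print(to_ry("GCAUgcatN"))  # prints RYRYRYRY*
```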

       -D     specifies the algorithm used to calculate the distance matrix
	      of the input data set. The following are available:

       -DH    Hamming Distance

       -DA[,cost]
	      Simple alignment distance according to Needleman and Wunsch. A
	      gap cost other than the default of 1. can be specified after
	      the comma.

       -DG[,cost1,cost2]
	      Gotoh's distance with gap cost function g(k) =
	      cost2+cost1*(k-1). cost2<=cost1 has to be fulfilled. Default
	      values are cost1=1., cost2=1., yielding the same distance as
	      option A.
	      Note: only the Hamming distance is well tested so far.
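
	      Two of the quantities named above are easy to state in code.
	      The following Python sketch shows a Hamming distance for
	      sequences of common length and the gap cost function g(k) given
	      for -DG. The function names hamming_distance and gap_cost are
	      hypothetical; this is not the program's implementation.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have a common length")
    return sum(x != y for x, y in zip(a, b))


def gap_cost(k, cost1=1.0, cost2=1.0):
    """Gap cost g(k) = cost2 + cost1*(k-1) for a gap of length k >= 1."""
    if k < 1:
        raise ValueError("gap length must be at least 1")
    return cost2 + cost1 * (k - 1)


print(hamming_distance("GCAU", "GCUU"))  # prints 1
print(gap_cost(3))                       # prints 3.0
```

	      With the default values cost1=1. and cost2=1. the function
	      reduces to g(k) = k, that is, a cost of 1 per gap position,
	      which is why the defaults give the same distance as option A.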

       -d     specifies the edit cost matrix to be used. The following are
	      available:

       -dS    simple  distance.	Indel and substitution of different characters
	      all have cost 1. The indel cost can be set by specifying the gap
	      costs  with  the	algorithm options -DA and -DG. This is the de-
	      fault.

       -dH    A distance matrix for RNA secondary structures, inspired by
	      Hogeweg's similarity measure (J.Mol.Biol 1988). The gap
	      function is set automatically.

       -dD    Dayhoff's	matrix for amino acid distances.

       -dB    Distinguish purines and pyrimidines only. CAUTION: this option
	      influences only the calculation of distances. It does NOT
	      affect the computation of the statistical geometry, which is
	      done directly on the sequences. To do statistical geometry on
	      RY sequences, use the ! sign with the -M option, for instance
	      -MR!.
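
	      How an edit cost matrix and a gap cost combine into an alignment
	      distance can be illustrated with a small dynamic programming
	      sketch in Python. It uses a plain Needleman-Wunsch style
	      recursion with a fixed cost per gap position. All names
	      (simple_cost, ry_cost, alignment_distance) are hypothetical, and
	      the 0/1 costs used for the purine/pyrimidine case are an
	      assumption; this page does not list the actual matrix values.

```python
def simple_cost(x, y):
    """Simple distance: substituting different characters costs 1."""
    return 0.0 if x == y else 1.0


def ry_cost(x, y):
    """Assumed 0/1 cost that only distinguishes purines from pyrimidines."""
    def ry_class(ch):
        if ch in "GA":
            return "R"
        if ch in "CUT":
            return "Y"
        return ch
    return 0.0 if ry_class(x) == ry_class(y) else 1.0


def alignment_distance(a, b, cost=simple_cost, gap=1.0):
    """Minimum total cost of aligning a with b (row-by-row DP)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * gap]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + gap,               # gap in b
                            curr[j - 1] + gap,           # gap in a
                            prev[j - 1] + cost(x, y)))   # match or substitution
        prev = curr
    return prev[-1]


print(alignment_distance("GCAU", "GAU"))            # prints 1.0
print(alignment_distance("GCAU", "ACGU"))           # prints 2.0
print(alignment_distance("GCAU", "ACGU", ry_cost))  # prints 0.0
```

	      The last two calls show the effect of the cost matrix: the two
	      sequences differ at two positions, but both differences exchange
	      one purine for another, so the purine/pyrimidine cost sees no
	      difference.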

REFERENCES
       The method of statistical geometry was introduced by M. Eigen, R.
       Winkler-Oswatitsch and A.W.M. Dress (Proc Natl Acad Sci, 85:1988,5912).
       The method of split decomposition was proposed by H.J. Bandelt and
       A.W.M. Dress (Adv Math, 92:1992,47). The variance method for cluster
       analysis is due to H.J. Ward (J Amer Stat Ass, 58:1963,236). The
       neighbour joining method was published by Saitou and Nei (Mol Biol
       Evol, 4:1987,406).

       This program is part of the Vienna RNA Package

WARNING
       This is the beta test version. Some options or combinations of options
       may still produce nonsense. Please send bug reports to
       ivo@tbi.univie.ac.at.

VERSION
       This man	page is	part of	the Vienna RNA Package version 1.2.

AUTHOR
       Peter F Stadler,	Ivo L. Hofacker.

BUGS
       Comments	should be sent to ivo@itc.univie.ac.at.

								ANALYSESEQS(1)
