ALIGN(1)		    General Commands Manual		      ALIGN(1)

       align - compute the global alignment of two protein or DNA sequences

       align0  -  compute the global alignment of two protein or DNA sequences
       without penalizing for end-gaps

       align [ -f # -g # -O filename -m # -s SMATRIX -w # ] sequence-file-1 sequence-file-2

       align  produces	an optimal global alignment between two	protein	or DNA
       sequences.  align will automatically decide whether the query  sequence
       is  DNA	or protein by reading the query	sequence as protein and	deter-
       mining whether the `amino-acid composition' is more than	 85%  A+C+G+T.
       align uses a modification of the	algorithm described by E. Myers	and W.
       Miller in  "Optimal Alignments in Linear	Space" CABIOS (1988)  4:11-17.
       The program can be invoked either with command line arguments or	in in-
       teractive mode.
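
       The DNA/protein decision described above can be illustrated with a
       short sketch.  This is not the code used by align; the function name
       and the treatment of non-letter characters are assumptions made for
       illustration, and only the 85% A+C+G+T threshold comes from the text.

```python
def looks_like_dna(sequence: str, threshold: float = 0.85) -> bool:
    """Guess whether a sequence is DNA from its letter composition.

    Hypothetical helper: returns True when more than `threshold` of the
    letters are A, C, G, or T, mirroring the 85% rule described above.
    """
    # Consider letters only; digits, blanks, and punctuation are skipped
    # (an assumption of this sketch).
    letters = [ch for ch in sequence.upper() if ch.isalpha()]
    if not letters:
        return False  # an empty sequence cannot be classified as DNA
    acgt = sum(ch in "ACGT" for ch in letters)
    return acgt / len(letters) > threshold
```

       Under this rule a short protein that happens to be rich in alanine,
       cysteine, glycine, and threonine could be classified as DNA.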

       align weights end gaps, so that an alignment of the form
       will have a higher score	than:
       align0 uses the same algorithm, but does	not weight  end	 gaps.	 Some-
       times this can have surprising effects.

       align and align0 use the standard FASTA format sequence file.  Lines
       beginning with '>' or ';' are considered comments and ignored.
       Sequences can be upper or lower case; blanks, tabs, and unrecognizable
       characters are ignored.  align expects sequences to use the
       single-letter amino acid codes; see protcodes(5).
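
       The reading rules above can be sketched as follows.  This is an
       illustrative reader, not the parser used by align; the function name
       is hypothetical, and treating every letter A-Z as a recognizable
       residue is an assumption of the sketch.

```python
import string


def read_sequence(text: str, alphabet: str = string.ascii_uppercase) -> str:
    """Extract the residues from FASTA-format text.

    Hypothetical helper: `alphabet` (all letters A-Z) is an assumption about
    which characters count as recognizable residue codes.
    """
    residues = []
    for line in text.splitlines():
        # Lines beginning with '>' or ';' are comments and are skipped.
        if line.startswith((">", ";")):
            continue
        # Fold to upper case; blanks, tabs, and anything outside the
        # alphabet are dropped.
        residues.extend(ch for ch in line.upper() if ch in alphabet)
    return "".join(residues)
```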

       align can be directed to change the scoring matrix and output format by
       entering options on the command line (preceded by a `-', or by a `/'
       under MS-DOS).  All of the options should precede the file name
       arguments.  Alternatively, these options can be changed by setting
       environment variables.  The options and environment variables are:

       -f #   Penalty for the first residue in a gap (-12 by default).

       -g #   Penalty for additional residues in a gap (-2 by default).

       -O filename
	      Sends copy of results to "filename".

       -m #   (MARKX) =1,2,3.  Alternate display of matches and mismatches in
	      alignments.  MARKX=1 uses ":", ".", and " " for identities,
	      conservative replacements, and non-conservative replacements,
	      respectively.  MARKX=2 uses " ", "x", and "X".  MARKX=3 does not
	      show the second sequence, but uses the second alignment line to
	      display matches with a "." for identity, or with the mismatched
	      residue for mismatches.  MARKX=3 is useful for aligning large
	      numbers of similar sequences.

       -s str (SMATRIX)	the filename of	an alternative scoring matrix file  or
	      "250" to use the PAM250 matrix.

       -w #   (LINLEN)	output line length for sequence	alignments.  (normally
	      60, can be set up	to 200).
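
       As a worked illustration of the -f and -g penalties, the sketch below
       scores a single gap.  It assumes the two penalties combine in the
       usual affine way (first residue, then each additional residue); the
       function name is hypothetical and the defaults are the ones listed
       above.

```python
def gap_score(length: int, first: int = -12, additional: int = -2) -> int:
    """Score contributed by one gap of `length` residues.

    Assumption of this sketch: the first gapped residue costs `first` (-f)
    and every further residue in the same gap costs `additional` (-g).
    """
    if length <= 0:
        return 0  # no gap, no penalty
    return first + (length - 1) * additional
```

       With the defaults, a gap of one residue scores -12 and a gap of three
       residues scores -12 + 2 * (-2) = -16.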

       (1)    align musplfm.aa lcbo.aa

       Compare the amino acid sequence in the file musplfm.aa with the amino
       acid sequence in the file lcbo.aa.  Each sequence should be in the form:
	    >LCBO bovine preprolactin
	    WILLLSQ ...

       (2)    align -w 80 musplfm.aa lcbo.aa > musplfm.aln

       Compare the amino acid sequence in the file musplfm.aa with the
       sequence in the file lcbo.aa.  Show both sequences with 80 residues on
       each output line and write the output to the file musplfm.aln.

       (3)    align

       Run the align program in	interactive mode.  The program will prompt for
       the file	name for the first sequence and	the second sequence.

       rdf2(1), protcodes(5), dnacodes(5)

       Bill Pearson

				     local			      ALIGN(1)

