alimask(1)			 HMMER Manual			    alimask(1)

NAME
       alimask	-  calculate and add column mask to a multiple sequence	align-
       ment

SYNOPSIS
       alimask [options] msafile postmsafile

DESCRIPTION
       alimask is used to apply	a mask line to a multiple sequence  alignment,
       based  on  provided  alignment or model coordinates.  When hmmbuild re-
       ceives a	masked alignment as input, it  produces	 a  profile  model  in
       which  the  emission probabilities at masked positions are set to match
       the background frequency, rather	than being set based on	observed  fre-
       quencies	 in  the  alignment.  Position-specific	insertion and deletion
       rates are not altered, even in masked regions.  alimask autodetects in-
       put  format,  and  produces  masked  alignments	in  Stockholm  format.
       msafile may contain only	one sequence alignment.

       A common	motivation for masking a region	in an alignment	 is  that  the
       region contains a simple	tandem repeat that is observed to cause	an un-
       acceptably high rate of false positive hits.

       In the simplest case, a mask range is given in coordinates relative  to
       the input alignment, using --alirange _s_.  However, it is more often
       the case	that the region	to be masked has been  identified  in  coordi-
       nates relative to the profile model (e.g. based on recognizing a	simple
       repeat pattern in false hit alignments or in the	HMM  logo).   Not  all
       alignment columns are converted to match	state positions	in the profile
       (see the	--symfrac flag for hmmbuild for	discussion),  so  model	 posi-
       tions  do  not  necessarily match up to alignment column	positions.  To
       remove the burden of converting model positions to alignment positions,
       alimask	accepts	the mask range input in	model coordinates as well, us-
       ing --modelrange	_s_.  When using this flag, alimask  determines	 which
       alignment  positions would be identified	by hmmbuild as match states, a
       process that requires that all hmmbuild flags impacting	that  decision
       be  supplied  to	 alimask.  It is for this reason that many of the hmm-
       build flags are also used by alimask.
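
       The relationship between model positions and alignment columns can be
       sketched as follows.  This is only an illustration, not alimask's
       implementation: the function name and the list of per-column booleans
       are hypothetical, and the sketch assumes the consensus (match) columns
       have already been decided.

```python
def model_to_alignment(match_columns, model_start, model_end):
    """Map a 1-based, inclusive model range to an alignment column range.

    match_columns[i] is True when alignment column i+1 is a consensus
    (match) column. This input format is an assumption for illustration.
    """
    # 1-based alignment column for each model position, in order.
    columns = [i + 1 for i, is_match in enumerate(match_columns) if is_match]
    if not 1 <= model_start <= model_end <= len(columns):
        raise ValueError("model range lies outside the model")
    return columns[model_start - 1], columns[model_end - 1]


# Columns 2 and 5 are not consensus columns, so the model has 4 positions.
match_columns = [True, False, True, True, False, True]
ali_range = model_to_alignment(match_columns, 2, 3)
```

       In this example model positions 2-3 fall on alignment columns 3-4,
       which shows why the two coordinate systems cannot be used
       interchangeably.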

OPTIONS
       -h     Help; print a brief reminder  of	command	 line  usage  and  all
	      available	options.

       -o _f_ Direct the summary output	to file	_f_, rather than to stdout.

OPTIONS	FOR SPECIFYING MASK RANGE
       A single mask range is given as a dash-separated pair, such as
       --modelrange 10-20.  Multiple ranges may be submitted as a
       comma-separated list, such as --modelrange 10-20,30-42.
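
       As a rough sketch of the range syntax, the following parses a range
       string into coordinate pairs and marks the covered positions.  The
       function names are hypothetical, and the 'm' and '.' characters are
       chosen here only for illustration; they are not a statement about the
       exact mask line that alimask writes.

```python
def parse_ranges(spec):
    """Parse a string such as "10-20,30-42" into (start, end) pairs."""
    ranges = []
    for part in spec.split(","):
        start_text, end_text = part.strip().split("-")
        start, end = int(start_text), int(end_text)
        if start < 1 or end < start:
            raise ValueError(f"invalid range: {part!r}")
        ranges.append((start, end))
    return ranges


def mask_line(ranges, width):
    """Return a string of length width with 'm' at masked positions.

    Positions are 1-based and inclusive; anything past width is ignored.
    """
    chars = ["."] * width
    for start, end in ranges:
        for pos in range(start, min(end, width) + 1):
            chars[pos - 1] = "m"
    return "".join(chars)


ranges = parse_ranges("10-20,30-42")
line = mask_line(parse_ranges("2-4"), 6)
```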

       --modelrange _s_
	      Supply the given range(s)	in model coordinates.

       --alirange _s_
	      Supply the given range(s)	in alignment coordinates.

       --apendmask
	      Add  to the existing mask	found with the alignment.  The default
	      is to overwrite any existing mask.

       --model2ali _s_
              Rather than actually produce the masked alignment, simply print
              alignment range(s) corresponding to input model range(s).

       --ali2model _s_
              Rather than actually produce the masked alignment, simply print
              model range(s) corresponding to input alignment range(s).

OPTIONS	FOR SPECIFYING THE ALPHABET
       --amino
	      Assert that sequences in msafile are protein, bypassing alphabet
	      autodetection.

       --dna  Assert that sequences in msafile are DNA,	bypassing alphabet au-
	      todetection.

       --rna  Assert that sequences in msafile are RNA,	bypassing alphabet au-
	      todetection.

OPTIONS	CONTROLLING PROFILE CONSTRUCTION
       These  options  control	how consensus columns are defined in an	align-
       ment.

       --fast Define consensus columns as those that have a fraction >= symfrac
              of residues as opposed to gaps. (See below for the --symfrac
              option.) This is the default.

       --hand Define consensus columns using the reference annotation of the
              multiple alignment.  This allows you to define any consensus
              columns you like.

       --symfrac _x_
	      Define the residue fraction threshold necessary to define	a con-
	      sensus  column when using	the --fast option. The default is 0.5.
	      The symbol fraction in each column is  calculated	 after	taking
	      relative sequence	weighting into account,	and ignoring gap char-
	      acters corresponding to ends of sequence fragments  (as  opposed
	      to  internal  insertions/deletions).   Setting this to 0.0 means
	      that every alignment column will be assigned as consensus, which
	      may  be  useful in some cases. Setting it	to 1.0 means that only
	      columns that include 0 gaps (internal insertions/deletions) will
	      be assigned as consensus.

       --fragthresh _x_
	      We  only want to count terminal gaps as deletions	if the aligned
	      sequence is known	to be full-length, not if  it  is  a  fragment
	      (for  instance,  because	only  part of it was sequenced). HMMER
	      uses a simple rule to infer fragments: if	the sequence length  L
	      is  less	than  or  equal	 to a fraction _x_ times the alignment
	      length in	columns, then the sequence is handled as  a  fragment.
	      The  default  is	0.5.   Setting	--fragthresh  0	will define no
	      (nonempty) sequence as a fragment; you might want	to do this  if
	      you know you've got a carefully curated alignment	of full-length
	      sequences.  Setting --fragthresh 1 will define all sequences  as
	      fragments;  you might want to do this if you know	your alignment
	      is entirely composed of  fragments,  such	 as  translated	 short
	      reads in metagenomic shotgun data.
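
       The fragment rule and the residue fraction test can be sketched
       together as below.  This is a simplified illustration under stated
       assumptions, not HMMER's code: all sequences get uniform weight
       (the real calculation applies relative sequence weights), '-' and '.'
       are treated as gap characters, and the function names are
       hypothetical.

```python
GAP_CHARS = "-."


def is_fragment(aligned_seq, fragthresh=0.5):
    """Fragment rule: residue count L <= fragthresh * alignment length."""
    length = sum(1 for ch in aligned_seq if ch not in GAP_CHARS)
    return length <= fragthresh * len(aligned_seq)


def consensus_columns(alignment, symfrac=0.5, fragthresh=0.5):
    """Return one boolean per column: True if the column is consensus."""
    ncols = len(alignment[0])
    residues = [0] * ncols  # residues seen in each column
    counted = [0] * ncols   # residues plus gaps that count as deletions
    for seq in alignment:
        first, last = 0, ncols - 1
        if is_fragment(seq, fragthresh):
            # Ignore the terminal gaps of a fragment: only count columns
            # between its first and last residue.
            residue_cols = [i for i, ch in enumerate(seq)
                            if ch not in GAP_CHARS]
            if not residue_cols:
                continue
            first, last = residue_cols[0], residue_cols[-1]
        for i in range(first, last + 1):
            counted[i] += 1
            if seq[i] not in GAP_CHARS:
                residues[i] += 1
    return [counted[i] > 0 and residues[i] / counted[i] >= symfrac
            for i in range(ncols)]


alignment = ["ACDE", "AC-E", "--DE"]
with_fragments = consensus_columns(alignment, symfrac=0.7, fragthresh=0.5)
without_fragments = consensus_columns(alignment, symfrac=0.7, fragthresh=0.0)
```

       With the default threshold the third sequence is treated as a
       fragment, so its leading gaps do not count against the first two
       columns.  With --fragthresh 0 the same gaps count as deletions and
       those columns fall below the 0.7 cutoff used here.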

OPTIONS	CONTROLLING RELATIVE WEIGHTS
       HMMER uses an ad	hoc sequence weighting algorithm to downweight closely
       related sequences and upweight distantly	related	ones. This has the ef-
       fect  of	 making	 models	less biased by uneven phylogenetic representa-
       tion. For example, two identical	sequences would	typically each receive
       half  the  weight that one sequence would.  These options control which
       algorithm gets used.

       --wpb  Use  the	Henikoff  position-based  sequence  weighting	scheme
	      [Henikoff	 and  Henikoff,	J. Mol.	Biol. 243:574, 1994].  This is
	      the default.

       --wgsc Use the Gerstein/Sonnhammer/Chothia weighting algorithm
              [Gerstein et al., J. Mol. Biol. 235:1067, 1994].

       --wblosum
              Use the same clustering scheme that was used to weight data in
              calculating BLOSUM substitution matrices [Henikoff and Henikoff,
              Proc. Natl. Acad. Sci. 89:10915, 1992]. Sequences are
              single-linkage clustered at an identity threshold (default 0.62;
              see --wid), and within each cluster of c sequences, each
              sequence gets relative weight 1/c.

       --wnone
	      No relative weights. All sequences are assigned uniform weight.

       --wid _x_
	      Sets the identity	threshold used	by  single-linkage  clustering
	      when  using --wblosum.  Invalid with any other weighting scheme.
	      Default is 0.62.
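
       The clustering idea behind --wblosum and --wid can be sketched as
       below.  This is a minimal illustration, not HMMER's implementation:
       the function names are hypothetical, the pairwise identity definition
       (identical residues over columns where both sequences have a residue)
       is an assumption, and no final normalization of the weights is
       applied.

```python
def fractional_identity(a, b):
    """Identity over columns where both sequences have a residue."""
    pairs = [(x, y) for x, y in zip(a, b) if x not in "-." and y not in "-."]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def blosum_style_weights(alignment, wid=0.62):
    """Single-linkage cluster at threshold wid; weight is 1/cluster size."""
    n = len(alignment)
    cluster = list(range(n))  # cluster[i] is the cluster label of sequence i
    for i in range(n):
        for j in range(i + 1, n):
            if fractional_identity(alignment[i], alignment[j]) >= wid:
                # Single linkage: one sufficiently similar pair is enough
                # to merge the two clusters.
                old, new = cluster[j], cluster[i]
                cluster = [new if label == old else label
                           for label in cluster]
    sizes = {label: cluster.count(label) for label in set(cluster)}
    return [1.0 / sizes[label] for label in cluster]


weights = blosum_style_weights(["ACDEFG", "ACDEFG", "WWWWWW"])
```

       Here the two identical sequences share one cluster and each receive
       weight 0.5, while the unrelated third sequence keeps weight 1.0.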

OTHER OPTIONS
       --informat _s_
              Assert that input msafile is in alignment format _s_, bypassing
              format autodetection.  Common choices for _s_ include:
              stockholm, a2m, afa, psiblast, clustal, phylip.  For more
              information, and for codes for some less common formats, see
              the main documentation.  The string _s_ is case-insensitive
              (a2m or A2M both work).

       --outformat _s_
	      Write  the  output  postmsafile in alignment format _s_.	Common
	      choices for _s_ include: stockholm, a2m, afa, psiblast, clustal,
	      phylip.	The  string  _s_  is case-insensitive (a2m or A2M both
	      work).  Default is stockholm.

       --seed _n_
	      Seed the random number generator with _n_, an integer >= 0.   If
	      _n_ is nonzero, any stochastic simulations will be reproducible;
	      the same command will give the same results.  If _n_ is  0,  the
	      random  number  generator	 is seeded arbitrarily,	and stochastic
	      simulations will vary from run to	run of the same	command.   The
	      default seed is 42.

SEE ALSO
       See  hmmer(1)  for  a master man	page with a list of all	the individual
       man pages for programs in the HMMER package.

       For complete documentation, see the user guide that came with your
       HMMER distribution (Userguide.pdf); or see the HMMER web page
       (http://hmmer.org/).

COPYRIGHT
       Copyright (C) 2019 Howard Hughes	Medical	Institute.
       Freely distributed under	the BSD	open source license.

       For additional information on copyright and  licensing,	see  the  file
       called  COPYRIGHT  in  your HMMER source	distribution, or see the HMMER
       web page	(http://hmmer.org/).

AUTHOR
       http://eddylab.org

HMMER 3.3			   Nov 2019			    alimask(1)
