nhmmscan(1)			 HMMER Manual			   nhmmscan(1)

NAME
       nhmmscan	- search DNA sequence(s) against a DNA profile database

SYNOPSIS
       nhmmscan	[options] hmmdb	seqfile

DESCRIPTION
       nhmmscan	 is used to search nucleotide sequences	against	collections of
       nucleotide profiles. For	each sequence in seqfile, use that  query  se-
       quence  to  search the target database of profiles in hmmdb, and	output
       ranked lists of the profiles with the most significant matches  to  the
       sequence.

       The  seqfile  may  contain  more	 than one query	sequence. It can be in
       FASTA format, or	several	other common sequence file  formats  (genbank,
       embl,  and uniprot, among others), or in	alignment file formats (stock-
       holm, aligned fasta, and	others). See the --qformat option for  a  com-
       plete list.

       The hmmdb needs to be press'ed using hmmpress before it can be searched
       with nhmmscan.  This creates four binary	files, suffixed	.h3{fimp}.

       The query seqfile may be	'-' (a dash  character),  in  which  case  the
       query sequences are read	from a stdin pipe instead of from a file.  The
       hmmdb cannot be read from a stdin stream, because it needs to have  the
       four auxiliary binary files generated by	hmmpress.

       The output format is designed to	be human-readable, but is often	so vo-
       luminous	that reading it	is impractical,	and parsing it is a pain.  The
       --tblout	option saves output in a simple	tabular	format that is concise
       and easier to parse.  The -o option allows redirecting the main output,
       including throwing it away in /dev/null.
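
       The workflow described above (press the database once, then scan and
       keep only the tabular output) might be scripted roughly as follows.
       This is a minimal sketch, not part of HMMER: the helper names
       build_nhmmscan_cmd and run_nhmmscan are hypothetical, and it assumes
       hmmpress and nhmmscan are on the PATH.

```python
import subprocess


def build_nhmmscan_cmd(hmmdb, seqfile, tblout, main_out="/dev/null"):
    """Return an argv list for nhmmscan; nothing is executed here.

    Only options documented on this page are used: -o sends the main
    human-readable output to main_out, --tblout saves the tabular summary.
    """
    return ["nhmmscan", "-o", main_out, "--tblout", tblout, hmmdb, seqfile]


def run_nhmmscan(hmmdb, seqfile, tblout):
    """Press the profile database, then scan seqfile against it.

    Hypothetical helper: requires HMMER to be installed. hmmpress only
    needs to be run once per database; it creates the .h3{fimp} files.
    """
    subprocess.run(["hmmpress", hmmdb], check=True)
    subprocess.run(build_nhmmscan_cmd(hmmdb, seqfile, tblout), check=True)
```

       Passing "-" as seqfile to build_nhmmscan_cmd would correspond to
       reading the query sequences from stdin.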

OPTIONS
       -h     Help;  print  a  brief  reminder	of  command line usage and all
	      available	options.

OPTIONS	FOR CONTROLLING	OUTPUT
       -o _f_ Direct the main human-readable output to a file _f_  instead  of
	      the default stdout.

       --tblout	_f_
	      Save  a  simple  tabular	(space-delimited) file summarizing the
	      per-hit output, with one data line per homologous	 target	 model
	      hit found.

       --dfamtblout _f_
	      Save  a  tabular	(space-delimited) file summarizing the per-hit
	      output, similar to --tblout but more succinct.

       --aliscoresout _f_
	      Save to file a list of per-position scores for each  hit.	  This
	      is  useful,  for	example,  in identifying regions of high score
	      density for use in resolving  overlapping	 hits  from  different
	      models.

       --acc  Use accessions instead of	names in the main output, where	avail-
	      able for profiles	and/or sequences.

       --noali
	      Omit the alignment  section  from	 the  main  output.  This  can
	      greatly reduce the output	volume.

       --notextw
	      Unlimit  the length of each line in the main output. The default
	      is a limit of 120	characters per line, which helps in displaying
	      the output cleanly on terminals and in editors, but can truncate
	      target profile description lines.

       --textw _n_
	      Set the main output's line length	limit to  _n_  characters  per
	      line. The	default	is 120.
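
       Because the --tblout and --dfamtblout files are space-delimited with
       one data line per hit, they can be read with a few lines of code.  The
       sketch below is illustrative only; it assumes that comment and header
       lines begin with '#' and that a trailing free-text description may
       itself contain spaces, neither of which is specified on this page.
       The function name parse_tblout and its n_columns parameter are
       hypothetical.

```python
def parse_tblout(lines, n_columns=None):
    """Split tabular hit lines into lists of fields.

    lines     -- any iterable of text lines (e.g. an open file)
    n_columns -- if given, split into at most this many fields so that a
                 trailing description containing spaces stays in one piece
                 (assumption: the description is the last column)
    """
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and '#' comment/header lines (assumed convention).
        if not line.strip() or line.startswith("#"):
            continue
        if n_columns is None:
            hits.append(line.split())
        else:
            hits.append(line.split(None, n_columns - 1))
    return hits
```

       Check the header lines of your own output file for the actual column
       names and order before relying on field positions.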

OPTIONS	FOR REPORTING THRESHOLDS
       Reporting  thresholds  control  which hits are reported in output files
       (the main output, --tblout, and --dfamtblout).  Hits are	ranked by sta-
       tistical	significance (E-value).

       -E _x_ Report  target  profiles with an E-value of <= _x_.  The default
	      is 10.0, meaning that on average,	about 10 false positives  will
	      be  reported  per	query, so you can see the top of the noise and
	      decide for yourself if it's really noise.

       -T _x_ Instead of thresholding output on E-value, report target
              profiles with a bit score of >= _x_.
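
       The interaction between -E and -T can be summarized in a small
       predicate.  This is a sketch of the rule as described above, not
       HMMER's actual code; the function name is_reported is hypothetical.

```python
def is_reported(evalue, score, E=10.0, T=None):
    """Mimic the reporting rule described for -E and -T.

    If a bit score threshold T is given, it replaces the E-value test;
    otherwise a hit is reported when its E-value is <= E (default 10.0).
    """
    if T is not None:
        return score >= T
    return evalue <= E
```

       For example, a hit with E-value 11.0 and bit score 50.0 is not
       reported under the defaults, but is reported when T=25.0 is given.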

OPTIONS	FOR INCLUSION THRESHOLDS
       Inclusion thresholds are	stricter than reporting	thresholds.  Inclusion
       thresholds control which	hits are considered to be reliable  enough  to
       be  included  in  an output alignment or a subsequent search round.  In
       nhmmscan, which (unlike nhmmer) does not produce any alignment output,
       inclusion thresholds have little effect. They only affect which hits
       get marked as significant (!) or questionable (?) in hit output.

       --incE _x_
	      Use an E-value of	<= _x_ as the inclusion	 threshold.   The  de-
	      fault  is	 0.01, meaning that on average,	about 1	false positive
	      would be expected	in every 100 searches with different query se-
	      quences.

       --incT _x_
              Instead of using E-values for setting the inclusion threshold,
              use a bit score of >= _x_ as the inclusion threshold.  It would
              be unusual to use bit score thresholds with nhmmscan, because
              you don't expect a single score threshold to work for different
              profiles; different profiles have slightly different expected
              score distributions.
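
       The significance marks and the "1 false positive per 100 searches"
       reading of the default --incE can be illustrated as follows.  This is
       a sketch under the rules stated above; hit_flag and
       expected_false_positives are hypothetical names, not HMMER functions.

```python
def hit_flag(evalue, score, incE=0.01, incT=None):
    """Return '!' for a hit that meets the inclusion threshold, else '?'.

    As with --incE/--incT above, a bit score threshold (incT), when
    given, is used instead of the E-value threshold (incE).
    """
    if incT is not None:
        included = score >= incT
    else:
        included = evalue <= incE
    return "!" if included else "?"


def expected_false_positives(evalue_threshold, n_searches):
    """Expected false positives at the threshold across n_searches
    independent searches (E-value threshold times number of searches)."""
    return evalue_threshold * n_searches
```

       With the default incE of 0.01, expected_false_positives(0.01, 100)
       comes to about one false positive, matching the description of
       --incE.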

OPTIONS	FOR MODEL-SPECIFIC SCORE THRESHOLDING
       Curated profile databases may define specific bit score thresholds  for
       each profile, superseding any thresholding based	on statistical signif-
       icance alone.

       To use these options, the profile must contain the appropriate (GA, TC,
       and/or  NC)  optional  score threshold annotation; this is picked up by
       hmmbuild from Stockholm format alignment files. For a nucleotide model,
       each thresholding option has a single per-hit threshold _x_.  This acts
       as if -T _x_ --incT _x_ had been applied specifically using each
       model's curated thresholds.

       --cut_ga
              Use the GA (gathering) bit score threshold in the model to set
              per-hit reporting and inclusion thresholds.  GA thresholds are
              generally considered to be the reliable curated thresholds
              defining family membership; for example, in Dfam, these
              thresholds are applied when annotating a genome with a model of
              a family known to be found in that organism. They are intended
              to keep the expected false discovery rate minimal.

       --cut_nc
	      Use  the	NC  (noise cutoff) bit score threshold in the model to
	      set per-hit reporting and	inclusion  thresholds.	NC  thresholds
	      are  less	 stringent  than  GA; in the context of	Pfam, they are
	      generally	used to	store the score	of the	highest-scoring	 known
	      false positive.

       --cut_tc
	      Use  the TC (trusted cutoff) bit score threshold in the model to
	      set per-hit reporting and	inclusion  thresholds.	TC  thresholds
	      are  more	 stringent than	GA, and	are generally considered to be
	      the score	of the lowest-scoring  known  true  positive  that  is
	      above  all  known	 false	positives; for example,	in Dfam, these
	      thresholds are applied when annotating a genome with a model  of
	      a	family not known to be found in	that organism.
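
       The statement that a cutoff option acts like "-T _x_ --incT _x_" for
       each model can be made concrete with a small sketch.  The dictionary
       of cutoffs and the function name cutoff_flags are hypothetical, and
       the numbers in the usage note are made-up illustrations; the real
       program reads the thresholds from the profile itself.

```python
def cutoff_flags(cutoffs, which):
    """Return the -T/--incT pair equivalent to a --cut_* option.

    cutoffs -- per-model bit score annotations, e.g. {"GA": 25.0}
               (hypothetical representation of one profile's annotation)
    which   -- one of "GA", "TC", "NC"
    """
    if which not in ("GA", "TC", "NC"):
        raise ValueError("which must be 'GA', 'TC', or 'NC'")
    if which not in cutoffs:
        # The options only work if the profile carries the annotation.
        raise KeyError(f"model has no {which} annotation")
    x = cutoffs[which]
    return ["-T", str(x), "--incT", str(x)]
```

       For a model annotated with an illustrative GA of 25.0,
       cutoff_flags({"GA": 25.0}, "GA") yields the equivalent of
       "-T 25.0 --incT 25.0" for that model only.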

CONTROL	OF THE ACCELERATION PIPELINE
       HMMER3  searches	 are  accelerated in a three-step filter pipeline: the
       scanning-SSV filter, the	Viterbi	filter,	and the	 Forward  filter.  The
       first  filter is	the fastest and	most approximate; the last is the full
       Forward scoring algorithm. There	is also	a bias filter step between SSV
       and  Viterbi. Targets that pass all the steps in	the acceleration pipe-
       line are	then subjected to postprocessing -- domain identification  and
       scoring using the Forward/Backward algorithm.

       Changing	 filter	 thresholds only removes or includes targets from con-
       sideration; changing filter thresholds does not alter  bit  scores,  E-
       values,	or  alignments,	all of which are determined solely in postpro-
       cessing.

       --max  Turn off (nearly)	all filters, including the  bias  filter,  and
	      run  full	 Forward/Backward postprocessing on most of the	target
	      sequence.	 In contrast to	hmmscan, where this flag  really  does
	      turn  off	 the filters entirely, the --max flag in nhmmscan sets
	      the scanning-SSV filter threshold	to 0.4,	not 1.0. Use  of  this
	      flag increases sensitivity somewhat, at a	large cost in speed.

       --F1 _x_
              Set the P-value threshold for the SSV filter step.  The default
              is 0.02, meaning that roughly 2% of the highest scoring
              nonhomologous targets are expected to pass the filter.

       --F2 _x_
	      Set  the P-value threshold for the Viterbi filter	step.  The de-
	      fault is 0.001.

       --F3 _x_
	      Set the P-value threshold	for the	Forward	filter step.  The  de-
	      fault is 1e-5.

       --nobias
	      Turn  off	 the bias filter. This increases sensitivity somewhat,
	      but can come at a	high cost in speed, especially	if  the	 query
	      has  biased  residue  composition	(such as a repetitive sequence
	      region, or if it is a membrane protein with large	regions	of hy-
	      drophobicity).  Without  the bias	filter,	too many sequences may
	      pass the filter with biased queries, leading to slower than  ex-
	      pected   performance   as	 the  computationally  intensive  For-
	      ward/Backward algorithms shoulder	an abnormally heavy load.
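
       The cascade of P-value thresholds can be pictured with the following
       sketch.  It is a simplification for illustration only: it ignores the
       bias filter and treats each stage as a single P-value comparison, and
       the function names are hypothetical.

```python
STAGES = ("SSV", "Viterbi", "Forward")


def first_failed_stage(p_ssv, p_vit, p_fwd, F1=0.02, F2=1e-3, F3=1e-5):
    """Return the name of the first filter a target fails, or None.

    A target passes a stage when its P-value at that stage is <= the
    stage threshold (defaults as documented for --F1, --F2, --F3).
    """
    for name, p, threshold in zip(STAGES, (p_ssv, p_vit, p_fwd), (F1, F2, F3)):
        if p > threshold:
            return name
    return None


def passes_filters(p_ssv, p_vit, p_fwd, **thresholds):
    """True if the target survives all three stages and would go on to
    Forward/Backward postprocessing."""
    return first_failed_stage(p_ssv, p_vit, p_fwd, **thresholds) is None
```

       Loosening a threshold (for example F1=0.4, the value --max is
       described as using for the scanning-SSV filter) only lets more
       targets through; it does not change the scores computed in
       postprocessing.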

OTHER OPTIONS
       --nonull2
	      Turn off the null2 score corrections for biased composition.

       -Z _x_ Assert that the total number of targets in your searches is _x_,
	      for  the	purposes  of per-sequence E-value calculations,	rather
	      than the actual number of	targets	seen.

       --seed _n_
	      Set the random number seed to _n_.  Some steps in	postprocessing
	      require  Monte  Carlo simulation.	 The default is	to use a fixed
	      seed (42), so that results are exactly reproducible.  Any	 other
	      positive integer will give different (but	also reproducible) re-
	      sults. A choice of 0 uses	an arbitrarily chosen seed.

       --qformat _s_
	      Assert that input	query seqfile is in format _s_,	bypassing for-
	      mat autodetection.  Common choices for _s_ include: fasta, embl,
	      genbank.	Alignment formats also work; common  choices  include:
	      stockholm, a2m, afa, psiblast, clustal, phylip.  For more	infor-
	      mation, and for codes for	some less  common  formats,  see  main
	      documentation.   The  string  _s_	 is case-insensitive (fasta or
	      FASTA both work).

       --w_beta	_x_
              Window length tail mass.  The upper bound, W, on the length at
              which nhmmscan expects to find an instance of the model is set
              such that the fraction of all sequences generated by the model
	      with  length  >= W is less than _x_.  The	default	is 1e-7.  This
	      flag may be used to override the value of	W established for  the
	      model by hmmbuild.

       --w_length _n_
	      Override the model instance length upper bound, W, which is oth-
	      erwise controlled	by --w_beta.  It should	 be  larger  than  the
	      model  length.  The value	of  W is used deep in the acceleration
	      pipeline,	and modest changes are not expected to impact  results
	      (though  larger  values  of W do lead to longer run time).  This
	      flag may be used to override the value of	W established for  the
	      model by hmmbuild.

       --watson
	      Only  search  the	top strand. By default both the	query sequence
	      and its reverse-complement are searched.

       --crick
	      Only search the bottom (reverse-complement) strand.  By  default
	      both the query sequence and its reverse-complement are searched.

       --cpu _n_
	      Set  the number of parallel worker threads to _n_.  On multicore
	      machines,	the default is 2.  You can also	control	this number by
	      setting  an  environment	variable, HMMER_NCPU.  There is	also a
	      master thread, so	the actual number of threads that HMMER	spawns
	      is _n_+1.

	      This  option  is	not available if HMMER was compiled with POSIX
	      threads support turned off.

       --stall
	      For debugging the	MPI master/worker version: pause after	start,
	      to  enable the developer to attach debuggers to the running mas-
	      ter and worker(s)	processes. Send	SIGCONT	signal to release  the
	      pause.  (Under gdb: (gdb)	signal SIGCONT)

	      (Only  available if optional MPI support was enabled at compile-
	      time.)

       --mpi  Run under	MPI control with master/worker parallelization	(using
	      mpirun,  for example, or equivalent). Only available if optional
	      MPI support was enabled at compile-time.
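
       For the --watson and --crick options above, the "bottom strand" is
       the reverse complement of the query sequence as given.  The sketch
       below shows that transformation for plain A/C/G/T sequences; it is
       an illustration, not HMMER code, and it leaves any other characters
       (such as N) unchanged rather than handling the full set of ambiguity
       codes.

```python
# Complement table for unambiguous DNA bases, preserving case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string.

    Characters outside A/C/G/T (either case) are passed through
    unchanged, which is a simplification.
    """
    return seq.translate(_COMPLEMENT)[::-1]
```

       Applying reverse_complement twice returns the original sequence,
       which is a quick sanity check when preparing strand-specific input.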

SEE ALSO
       See hmmer(1) for	a master man page with a list of  all  the  individual
       man pages for programs in the HMMER package.

       For complete documentation, see the user guide that came with your
       HMMER distribution (Userguide.pdf); or see the HMMER web page
       (http://hmmer.org/).

COPYRIGHT
       Copyright (C) 2019 Howard Hughes	Medical	Institute.
       Freely distributed under	the BSD	open source license.

       For  additional	information  on	 copyright and licensing, see the file
       called COPYRIGHT	in your	HMMER source distribution, or  see  the	 HMMER
       web page	(http://hmmer.org/).

AUTHOR
       http://eddylab.org

HMMER 3.3			   Nov 2019			   nhmmscan(1)
