hmmscan(1)			 HMMER Manual			    hmmscan(1)

NAME
       hmmscan - search	sequence(s) against a profile database

SYNOPSIS
       hmmscan [options] hmmdb seqfile

DESCRIPTION
       hmmscan is used to search protein sequences against collections of pro-
       tein profiles. For each sequence	in seqfile, use	that query sequence to
       search  the  target  database  of  profiles in hmmdb, and output	ranked
       lists of	the profiles with the most  significant	 matches  to  the  se-
       quence.

       The  seqfile  may  contain  more	 than one query	sequence. Each will be
       searched	in turn	against	hmmdb.

       The hmmdb needs to be press'ed using hmmpress before it can be searched
       with hmmscan.  This creates four	binary files, suffixed .h3{fimp}.

       The  query  seqfile  may	 be  '-' (a dash character), in	which case the
       query sequences are read	from a stdin pipe instead of from a file.  The
       hmmdb  cannot  be  read	from  a	stdin stream, because it needs to have
       those four auxiliary binary files generated by hmmpress.
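       As a minimal sketch of the requirement above, the following Python
       helper checks whether the four hmmpress files are present before a
       search is attempted. The function name and the example path are
       hypothetical; the only facts taken from this page are the four
       suffixes .h3f, .h3i, .h3m and .h3p, which are assumed to be appended
       to the full hmmdb file name.

```python
from pathlib import Path

# The four suffixes that .h3{fimp} expands to.
PRESSED_SUFFIXES = (".h3f", ".h3i", ".h3m", ".h3p")


def missing_pressed_files(hmmdb):
    """Return the hmmpress auxiliary files that do not exist for hmmdb.

    Assumption: each auxiliary file is named <hmmdb><suffix>,
    for example profiles.hmm.h3f next to profiles.hmm.
    """
    candidates = [Path(str(hmmdb) + suffix) for suffix in PRESSED_SUFFIXES]
    return [path for path in candidates if not path.is_file()]


if __name__ == "__main__":
    # "profiles.hmm" is a placeholder path, not a file shipped with HMMER.
    missing = missing_pressed_files("profiles.hmm")
    if missing:
        print("run hmmpress first; missing:", ", ".join(p.name for p in missing))
    else:
        print("hmmdb looks pressed")
```

       An empty list suggests the database has been pressed; it does not
       prove the binary files are up to date with the profile file.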

       The output format is designed to	be human-readable, but is often	so vo-
       luminous	 that reading it is impractical, and parsing it	is a pain. The
       --tblout	and --domtblout	options	save output in simple tabular  formats
       that are	concise	and easier to parse.  The -o option allows redirecting
       the main	output,	including throwing it away in /dev/null.
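       The combination described above (tabular output kept, main output
       discarded, queries read from stdin) can be sketched as a small
       Python helper that assembles the argument list. The helper name and
       the file names are hypothetical; only the options -o, --tblout and
       --domtblout and the '-' seqfile convention come from this page.

```python
import shlex


def build_hmmscan_command(hmmdb, seqfile, main_output=None,
                          tblout=None, domtblout=None):
    """Assemble an hmmscan argument list: options first, then hmmdb, seqfile."""
    cmd = ["hmmscan"]
    if main_output is not None:
        cmd += ["-o", main_output]          # redirect the main output
    if tblout is not None:
        cmd += ["--tblout", tblout]         # per-target table
    if domtblout is not None:
        cmd += ["--domtblout", domtblout]   # per-domain table
    cmd += [hmmdb, seqfile]
    return cmd


if __name__ == "__main__":
    # Placeholder file names; '-' means "read query sequences from stdin".
    cmd = build_hmmscan_command("profiles.hmm", "-",
                                main_output="/dev/null",
                                tblout="hits.tbl")
    print(shlex.join(cmd))
```

       The list form can be passed directly to subprocess.run() together
       with an input stream for the query sequences, which avoids shell
       quoting issues.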

OPTIONS
       -h     Help; print a brief reminder  of	command	 line  usage  and  all
	      available	options.

OPTIONS	FOR CONTROLLING	OUTPUT
       -o _f_ Direct  the  main	human-readable output to a file	_f_ instead of
	      the default stdout.

       --tblout	_f_
	      Save a simple tabular  (space-delimited)	file  summarizing  the
	      per-target  output,  with	 one  data  line per homologous	target
	      model found.

       --domtblout _f_
	      Save a simple tabular  (space-delimited)	file  summarizing  the
	      per-domain  output, with one data	line per homologous domain de-
	      tected in	a query	sequence for each homologous model.

       --pfamtblout _f_
	      Save an especially succinct tabular (space-delimited) file  sum-
	      marizing	the  per-target	output,	with one data line per homolo-
	      gous target model	found.

       --acc  Use accessions instead of	names in the main output, where	avail-
	      able for profiles	and/or sequences.

       --noali
	      Omit  the	 alignment  section  from  the	main  output. This can
	      greatly reduce the output	volume.

       --notextw
	      Unlimit the length of each line in the main output. The  default
	      is a limit of 120	characters per line, which helps in displaying
	      the output cleanly on terminals and in editors, but can truncate
	      target profile description lines.

       --textw _n_
	      Set  the	main  output's line length limit to _n_	characters per
	      line. The	default	is 120.
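       A --tblout file is intended for parsing. The sketch below reads one
       in Python. The column layout is not specified on this page; the code
       assumes the layout documented in the HMMER user guide (18
       whitespace-delimited fields followed by a free-text description,
       with comment lines starting with '#'). The sample line is invented
       for illustration and is not real search output.

```python
from typing import NamedTuple


class TargetHit(NamedTuple):
    target_name: str
    target_accession: str
    query_name: str
    evalue: float       # full-sequence E-value
    score: float        # full-sequence bit score
    description: str


def parse_tblout(lines):
    """Parse per-target table lines, skipping comments and blank lines.

    Assumed layout: 18 fixed fields, then a description that may
    itself contain spaces.
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 18)
        if len(fields) < 18:
            raise ValueError("unexpected tblout line: %r" % line)
        description = fields[18] if len(fields) > 18 else ""
        hits.append(TargetHit(fields[0], fields[1], fields[2],
                              float(fields[4]), float(fields[5]),
                              description))
    return hits


if __name__ == "__main__":
    sample = [
        "# invented example line, not real output",
        "DemoFam XX00001.1 query1 - 1.2e-30 105.3 0.1 "
        "1.5e-30 105.0 0.1 1.0 1 0 0 1 1 1 1 Demo family description",
    ]
    for hit in parse_tblout(sample):
        print(hit.target_name, hit.evalue, hit.description)
```

       Splitting with a maximum of 18 splits keeps the description intact
       even though it contains spaces.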

OPTIONS	FOR REPORTING THRESHOLDS
       Reporting thresholds control which hits are reported  in	 output	 files
       (the main output, --tblout, and --domtblout).

       -E _x_ In the per-target	output,	report target profiles with an E-value
	      of <= _x_.  The default is 10.0, meaning that on average,	 about
	      10  false	 positives  will be reported per query,	so you can see
	      the top of the noise and decide  for  yourself  if  it's	really
	      noise.

       -T _x_ Instead of thresholding per-profile output on E-value, report
	      target profiles with a bit score of >= _x_.

       --domE _x_
	      In the per-domain	output,	for target profiles that have  already
	      satisfied	the per-profile	reporting threshold, report individual
	      domains with a conditional E-value of <= _x_.   The  default  is
	      10.0.   A	conditional E-value means the expected number of addi-
	      tional false positive domains in the  smaller  search  space  of
	      those comparisons	that already satisfied the per-profile report-
	      ing threshold (and thus must have	at least one homologous	domain
	      already).

       --domT _x_
	      Instead of thresholding per-domain output on E-value, report
	      domains with a bit score of >= _x_.
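       The reporting rules in this section can be sketched in Python as
       follows. This is an illustration of the documented behaviour, not
       HMMER's own code: the function names and the dictionary keys are
       hypothetical, and the sketch assumes that a bit score threshold,
       when given, replaces the corresponding E-value threshold.

```python
def passes(evalue, score, max_evalue=10.0, min_score=None):
    """Apply either a bit score threshold or an E-value threshold."""
    if min_score is not None:
        return score >= min_score       # -T / --domT style
    return evalue <= max_evalue         # -E / --domE style


def reported_domains(target, domains, E=10.0, T=None, domE=10.0, domT=None):
    """Return the domains that would be reported for one target profile.

    Domains are only considered once the target itself has satisfied
    the per-profile reporting threshold.
    """
    if not passes(target["evalue"], target["score"], E, T):
        return []
    return [d for d in domains
            if passes(d["c_evalue"], d["score"], domE, domT)]


if __name__ == "__main__":
    # Invented numbers, for illustration only.
    target = {"evalue": 1e-5, "score": 40.0}
    domains = [{"c_evalue": 1e-6, "score": 38.0},
               {"c_evalue": 20.0, "score": 3.0}]
    print(reported_domains(target, domains))
```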

OPTIONS	FOR INCLUSION THRESHOLDS
       Inclusion thresholds are	stricter than reporting	thresholds.  Inclusion
       thresholds  control  which hits are considered to be reliable enough to
       be included in an output alignment or a subsequent search round. In
       hmmscan, which has no alignment output (unlike hmmsearch or phmmer)
       and no iterative search steps (unlike jackhmmer), inclusion
       thresholds have little effect. They only affect which domains get
       marked as significant (!) or questionable (?) in domain output.

       --incE _x_
	      Use an E-value of	<= _x_ as the per-target inclusion  threshold.
	      The default is 0.01, meaning that	on average, about 1 false pos-
	      itive would be expected in every	100  searches  with  different
	      query sequences.

       --incT _x_
	      Instead of using E-values to set the inclusion threshold, use a
	      bit score of >= _x_ as the per-target inclusion threshold. It
	      would be unusual to use bit score thresholds with hmmscan,
	      because a single score threshold is not expected to work for
	      different profiles; different profiles have slightly different
	      expected score distributions.

       --incdomE _x_
	      Use a conditional	E-value	of <= _x_ as the per-domain  inclusion
	      threshold,  in  targets  that have already satisfied the overall
	      per-target inclusion threshold.  The default is 0.01.

       --incdomT _x_
	      Instead of using E-values, use a bit score of >= _x_ as the
	      per-domain inclusion threshold. As with --incT above, it would
	      be unusual to use a single bit score threshold in hmmscan.
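       How the inclusion thresholds translate into the (!) and (?) marks
       can be sketched as below. This is an assumption-laden illustration:
       the function name is hypothetical, and the rule (a domain is marked
       significant only when both the per-target and the per-domain
       inclusion thresholds are satisfied) is inferred from the option
       descriptions in this section rather than taken from HMMER's source.

```python
def domain_mark(target_evalue, domain_c_evalue, inc_e=0.01, incdom_e=0.01):
    """Return '!' for an included domain and '?' for a questionable one.

    inc_e corresponds to --incE, incdom_e to --incdomE (defaults 0.01).
    """
    target_included = target_evalue <= inc_e
    domain_included = domain_c_evalue <= incdom_e
    return "!" if (target_included and domain_included) else "?"


if __name__ == "__main__":
    # Invented E-values, for illustration only.
    print(domain_mark(1e-20, 1e-18))   # both thresholds satisfied
    print(domain_mark(1e-20, 0.5))     # domain fails --incdomE
    print(domain_mark(2.0, 1e-3))      # target fails --incE
```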

OPTIONS	FOR MODEL-SPECIFIC SCORE THRESHOLDING
       Curated profile databases may define specific bit score thresholds  for
       each profile, superseding any thresholding based	on statistical signif-
       icance alone.

       To use these options, the profile must contain the appropriate (GA, TC,
       and/or  NC)  optional  score threshold annotation; this is picked up by
       hmmbuild from Stockholm format alignment files. Each thresholding
       option has two scores: the per-sequence threshold _x1_ and the
       per-domain threshold _x2_. These act as if -T _x1_ --incT _x1_
       --domT _x2_ --incdomT _x2_ had been applied specifically using each
       model's curated thresholds.

       --cut_ga
	      Use the GA (gathering) bit scores	in the model  to  set  per-se-
	      quence  (GA1)  and  per-domain  (GA2)  reporting	and  inclusion
	      thresholds. GA thresholds	are generally considered to be the re-
	      liable  curated thresholds defining family membership; for exam-
	      ple, in Pfam, these thresholds define what gets included in Pfam
	      Full alignments based on searches	with Pfam Seed models.

       --cut_nc
	      Use  the	NC (noise cutoff) bit score thresholds in the model to
	      set per-sequence (NC1) and per-domain (NC2) reporting and	inclu-
	      sion  thresholds.	 NC  thresholds	are generally considered to be
	      the score	of the highest-scoring known false positive.

       --cut_tc
	      Use the TC (trusted cutoff) bit score thresholds in the model to
	      set per-sequence (TC1) and per-domain (TC2) reporting and	inclu-
	      sion thresholds. TC thresholds are generally  considered	to  be
	      the  score  of  the  lowest-scoring  known true positive that is
	      above all	known false positives.
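       The equivalence stated at the start of this section can be written
       out as a short Python sketch that expands one model's pair of
       curated scores into the four threshold options. The function name
       and the example scores are hypothetical; hmmscan applies the
       model-specific values internally, so this only illustrates what
       --cut_ga, --cut_nc and --cut_tc amount to for a single profile.

```python
def expand_cutoff(x1, x2):
    """Return the options equivalent to a curated cutoff pair.

    x1 is the per-sequence score (GA1, NC1 or TC1);
    x2 is the per-domain score (GA2, NC2 or TC2).
    """
    return ["-T", str(x1), "--incT", str(x1),
            "--domT", str(x2), "--incdomT", str(x2)]


if __name__ == "__main__":
    # 25.0 and 22.5 are made-up scores, not taken from any real profile.
    print(" ".join(expand_cutoff(25.0, 22.5)))
```

       Because the curated scores differ from profile to profile, a single
       expanded command line like this would only be correct for one model,
       which is why the --cut_* options exist.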

CONTROL	OF THE ACCELERATION PIPELINE
       HMMER3 searches are accelerated in a three-step	filter	pipeline:  the
       MSV  filter, the	Viterbi	filter,	and the	Forward	filter.	The first fil-
       ter is the fastest and most approximate;	the last is the	 full  Forward
       scoring	algorithm.  There  is  also a bias filter step between MSV and
       Viterbi.	Targets	that pass all the steps	in the	acceleration  pipeline
       are then	subjected to postprocessing -- domain identification and scor-
       ing using the Forward/Backward algorithm.

       Changing	filter thresholds only removes or includes targets  from  con-
       sideration;  changing  filter  thresholds does not alter	bit scores, E-
       values, or alignments, all of which are determined solely  in  postpro-
       cessing.

       --max  Turn  off	 all  filters, including the bias filter, and run full
	      Forward/Backward postprocessing on every target. This  increases
	      sensitivity somewhat, at a large cost in speed.

       --F1 _x_
	      Set  the P-value threshold for the MSV filter step.  The default
	      is 0.02, meaning that roughly 2% of the highest  scoring	nonho-
	      mologous targets are expected to pass the	filter.

       --F2 _x_
	      Set  the P-value threshold for the Viterbi filter	step.  The de-
	      fault is 0.001.

       --F3 _x_
	      Set the P-value threshold	for the	Forward	filter step.  The  de-
	      fault is 1e-5.

       --nobias
	      Turn  off	 the bias filter. This increases sensitivity somewhat,
	      but can come at a	high cost in speed, especially	if  the	 query
	      has  biased  residue  composition	(such as a repetitive sequence
	      region, or if it is a membrane protein with large	regions	of hy-
	      drophobicity).  Without  the bias	filter,	too many sequences may
	      pass the filter with biased queries, leading to slower than  ex-
	      pected   performance   as	 the  computationally  intensive  For-
	      ward/Backward algorithms shoulder	an abnormally heavy load.
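       The filter cascade described in this section can be sketched in
       Python as follows. The sketch is deliberately simplified: it takes
       the three P-values as given inputs, leaves out the bias filter, and
       uses hypothetical function names. In the real pipeline a later
       filter is only computed for targets that passed the earlier ones.

```python
def passes_pipeline(p_msv, p_viterbi, p_forward,
                    f1=0.02, f2=1e-3, f3=1e-5, use_max=False):
    """Decide whether a target reaches postprocessing.

    f1, f2 and f3 correspond to --F1, --F2 and --F3 (defaults shown);
    use_max corresponds to --max, which turns all filters off.
    """
    if use_max:
        return True
    return p_msv <= f1 and p_viterbi <= f2 and p_forward <= f3


def expected_msv_survivors(n_nonhomologous, f1=0.02):
    """Expected number of nonhomologous targets passing the MSV filter."""
    return n_nonhomologous * f1


if __name__ == "__main__":
    # Invented P-values, for illustration only.
    print(passes_pipeline(0.01, 5e-4, 1e-6))
    print(passes_pipeline(0.05, 1e-9, 1e-9))
    print(passes_pipeline(0.05, 0.5, 0.5, use_max=True))
    print(expected_msv_survivors(10000))
```

       Loosening a threshold lets more targets through to postprocessing
       but, as noted above, does not change the scores or E-values of the
       targets that are reported.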

OTHER OPTIONS
       --nonull2
	      Turn off the null2 score corrections for biased composition.

       -Z _x_ Assert that the total number of targets in your searches is _x_,
	      for  the	purposes  of per-sequence E-value calculations,	rather
	      than the actual number of	targets	seen.

       --domZ _x_
	      Assert that the total number of targets in your searches is _x_,
	      for the purposes of per-domain conditional E-value calculations,
	      rather than the number of	 targets  that	passed	the  reporting
	      thresholds.

       --seed _n_
	      Set the random number seed to _n_.  Some steps in	postprocessing
	      require Monte Carlo simulation.  The default is to use  a	 fixed
	      seed  (42),  so that results are exactly reproducible. Any other
	      positive integer will give different (but	also reproducible) re-
	      sults. A choice of 0 uses	an arbitrarily chosen seed.

       --qformat _s_
	      Assert that input	seqfile	is in format _s_, bypassing format au-
	      todetection.  Common choices for _s_ include: fasta, embl,  gen-
	      bank.   Alignment	 formats  also	work;  common choices include:
	      stockholm, a2m, afa, psiblast, clustal, phylip. For more
	      information, and for codes for some less common formats, see
	      the main documentation. The string _s_ is case-insensitive
	      (fasta or FASTA both work).

       --cpu _n_
	      Set  the number of parallel worker threads to _n_.  On multicore
	      machines,	the default is 2.  You can also	control	this number by
	      setting  an  environment	variable, HMMER_NCPU.  There is	also a
	      master thread, so	the actual number of threads that HMMER	spawns
	      is _n_+1.

	      This  option  is	not available if HMMER was compiled with POSIX
	      threads support turned off.

       --stall
	      For debugging the MPI master/worker version: pause after start,
	      to enable the developer to attach debuggers to the running
	      master and worker processes. Send a SIGCONT signal to release
	      the pause. (Under gdb: (gdb) signal SIGCONT)

	      (Only  available if optional MPI support was enabled at compile-
	      time.)

       --mpi  Run under	MPI control with master/worker parallelization	(using
	      mpirun,  for example, or equivalent). Only available if optional
	      MPI support was enabled at compile-time.
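       The effect of -Z described above can be illustrated with a short
       Python sketch. It assumes that a per-sequence E-value is the
       P-value multiplied by the search space size Z, as described in the
       HMMER user guide, so that an E-value computed for one Z can be
       rescaled to another. The function name and the numbers are
       hypothetical.

```python
def rescale_evalue(evalue, z_used, z_asserted):
    """Rescale an E-value from one search space size to another.

    Assumes E = P * Z, so changing Z scales the E-value linearly.
    """
    if z_used <= 0 or z_asserted <= 0:
        raise ValueError("Z must be positive")
    return evalue * z_asserted / z_used


if __name__ == "__main__":
    # An E-value of 0.5 obtained against 1000 profiles, restated as if
    # the search had been run against 20000 profiles (made-up numbers).
    print(rescale_evalue(0.5, 1000, 20000))
```

       Asserting a fixed Z in this way makes E-values comparable between
       searches run against profile databases of different sizes.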

SEE ALSO
       See hmmer(1) for	a master man page with a list of  all  the  individual
       man pages for programs in the HMMER package.

       For complete documentation, see the user guide that came with your
       HMMER distribution (Userguide.pdf); or see the HMMER web page
       (http://hmmer.org/).

COPYRIGHT
       Copyright (C) 2019 Howard Hughes	Medical	Institute.
       Freely distributed under	the BSD	open source license.

       For  additional	information  on	 copyright and licensing, see the file
       called COPYRIGHT	in your	HMMER source distribution, or  see  the	 HMMER
       web page	(http://hmmer.org/).

AUTHOR
       http://eddylab.org

HMMER 3.3			   Nov 2019			    hmmscan(1)
