hmmsearch(1)			 HMMER Manual			  hmmsearch(1)

NAME
       hmmsearch - search profile(s) against a sequence	database

SYNOPSIS
       hmmsearch [options] hmmfile seqdb

DESCRIPTION
       hmmsearch  is  used  to	search one or more profiles against a sequence
       database.  For each profile in  hmmfile,	 use  that  query  profile  to
       search  the  target  database  of sequences in seqdb, and output	ranked
       lists of	the sequences with the most significant	matches	 to  the  pro-
       file.  To build profiles	from multiple alignments, see hmmbuild.

       Either the query	hmmfile	or the target seqdb may	be '-' (a dash charac-
       ter), in	which case the query profile or	target database	input will  be
       read  from  a  stdin pipe instead of from a file. Only one input	source
       can come	through	stdin, not both.  An exception is that if the  hmmfile
       contains	 more  than  one  profile  query,  then	seqdb cannot come from
       stdin, because we can't rewind the streaming target database to	search
       it with another profile.
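
       The stdin rules above can be checked before launching a search. The
       following is a minimal Python sketch, not part of HMMER: the function
       name build_hmmsearch_argv and the n_profiles argument (a count the
       caller is assumed to know already) are hypothetical.

```python
def build_hmmsearch_argv(hmmfile, seqdb, n_profiles=1, options=()):
    """Return an argv list for hmmsearch, enforcing the stdin rules.

    n_profiles is a hypothetical, caller-supplied count of the profiles
    in hmmfile; this sketch does not inspect the file itself.
    """
    if hmmfile == "-" and seqdb == "-":
        # Only one input source can come through stdin, not both.
        raise ValueError("only one of hmmfile and seqdb may be '-'")
    if seqdb == "-" and n_profiles > 1:
        # A streamed target database cannot be rewound for a second profile.
        raise ValueError("seqdb cannot be '-' when hmmfile has several profiles")
    return ["hmmsearch", *options, hmmfile, seqdb]
```

       The returned list is suitable for passing to a process launcher such
       as Python's subprocess.run, with the profile or the sequences supplied
       on the child's standard input.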

       The output format is designed to	be human-readable, but is often	so vo-
       luminous	that reading it	is impractical,	and parsing it is a pain.  The
       --tblout	 and --domtblout options save output in	simple tabular formats
       that are	concise	and easier to parse.  The -o option allows redirecting
       the main	output,	including throwing it away in /dev/null.
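
       As an illustration of why the tabular formats are easier to parse, here
       is a minimal Python sketch that reads a --tblout file. It assumes the
       column layout described in the HMMER user guide (18 whitespace-
       delimited fields followed by a free-text description, with comment
       lines starting with '#'); that layout is not specified on this page.
       The names TargetHit and parse_tblout are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class TargetHit(NamedTuple):
    target: str
    query: str
    evalue: float       # full-sequence E-value
    score: float        # full-sequence bit score
    description: str


def parse_tblout(lines: Iterable[str]) -> Iterator[TargetHit]:
    """Yield one TargetHit per data line of a --tblout file.

    Assumed layout: 18 fixed columns, then the target description,
    which may itself contain spaces.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split(None, 18)
        if len(fields) < 18:
            raise ValueError("unexpected --tblout line: %r" % line)
        description = fields[18].strip() if len(fields) > 18 else ""
        yield TargetHit(fields[0], fields[2], float(fields[4]),
                        float(fields[5]), description)
```

       A file to feed this parser could be produced with, for example,
       hmmsearch --tblout hits.tbl -o /dev/null hmmfile seqdb, and then read
       with parse_tblout(open("hits.tbl")).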

OPTIONS
       -h     Help;  print  a  brief  reminder	of  command line usage and all
	      available	options.

OPTIONS	FOR CONTROLLING	OUTPUT
       -o _f_ Direct the main human-readable output to a file _f_  instead  of
	      the default stdout.

       -A _f_ Save  a multiple alignment of all	significant hits (those	satis-
	      fying inclusion thresholds) to the file _f_.

       --tblout	_f_
	      Save a simple tabular  (space-delimited)	file  summarizing  the
	      per-target  output, with one data	line per homologous target se-
	      quence found.

       --domtblout _f_
              Save a simple tabular (space-delimited) file summarizing the
              per-domain output, with one data line per homologous domain
              detected in a target sequence for each query profile.

       --acc  Use accessions instead of	names in the main output, where	avail-
	      able for profiles	and/or sequences.

       --noali
	      Omit  the	 alignment  section  from  the	main  output. This can
	      greatly reduce the output	volume.

       --notextw
              Remove the limit on the length of each line in the main output.
              The default is a limit of 120 characters per line, which helps
              in displaying the output cleanly on terminals and in editors,
              but can truncate target sequence description lines.

       --textw _n_
	      Set  the	main  output's line length limit to _n_	characters per
	      line. The	default	is 120.

OPTIONS	CONTROLLING REPORTING THRESHOLDS
       Reporting thresholds control which hits are reported  in	 output	 files
       (the main output, --tblout, and --domtblout).  Sequence hits and	domain
       hits are	ranked by statistical significance  (E-value)  and  output  is
       generated  in  two sections called per-target and per-domain output. In
       per-target output, by default, all sequence hits	with an	E-value	<=  10
       are reported. In	the per-domain output, for each	target that has	passed
       per-target reporting thresholds,	all domains satisfying per-domain  re-
       porting	thresholds  are	 reported.  By default,	these are domains with
       conditional E-values of <= 10.  The  following  options	allow  you  to
       change  the  default  E-value reporting thresholds, or to use bit score
       thresholds instead.
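
       The two-level reporting logic described above can be sketched in
       Python as follows. This is an illustration of the rules in this
       section, not HMMER's implementation; the function name report and the
       (name, E-value, list of conditional E-values) input tuples are
       hypothetical.

```python
def report(targets, max_evalue=10.0, max_dom_evalue=10.0):
    """Return (target name, reported domain conditional E-values) pairs.

    targets is an iterable of (name, evalue, dom_cevalues) tuples.
    Output is ranked by per-target E-value, most significant first.
    """
    reported = []
    for name, evalue, dom_cevalues in sorted(targets, key=lambda t: t[1]):
        if evalue > max_evalue:
            # The target fails the per-target threshold, so none of its
            # domains are considered for the per-domain output.
            continue
        kept = [c for c in dom_cevalues if c <= max_dom_evalue]
        reported.append((name, kept))
    return reported
```

       The defaults of 10.0 mirror the default E-value thresholds given
       above; passing smaller values plays the role of -E and --domE.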

       -E _x_ In the per-target	output,	report target  sequences  with	an  E-
	      value  of	<= _x_.	 The default is	10.0, meaning that on average,
	      about 10 false positives will be reported	per query, so you  can
	      see  the top of the noise	and decide for yourself	if it's	really
	      noise.

       -T _x_ Instead of thresholding per-target output on E-value, report
              target sequences with a bit score of >= _x_.

       --domE _x_
              In the per-domain output, for target sequences that have already
              satisfied the per-target reporting threshold, report individual
	      domains  with  a	conditional E-value of <= _x_.	The default is
	      10.0.  A conditional E-value means the expected number of	 addi-
	      tional  false  positive  domains	in the smaller search space of
	      those comparisons	that already satisfied the per-target  report-
	      ing threshold (and thus must have	at least one homologous	domain
	      already).

       --domT _x_
              Instead of thresholding per-domain output on E-value, report
              domains with a bit score of >= _x_.

OPTIONS	FOR INCLUSION THRESHOLDS
       Inclusion thresholds are	stricter than reporting	thresholds.  Inclusion
       thresholds control which	hits are considered to be reliable  enough  to
       be  included  in	 an  output alignment or a subsequent search round, or
       marked as significant ("!") as opposed to questionable ("?")  in	domain
       output.

       --incE _x_
              Use an E-value of <= _x_ as the per-target inclusion threshold.
              The default is 0.01, meaning that on average, about 1 false
              positive would be expected in every 100 searches with different
              query profiles.

       --incT _x_
              Instead of using E-values for setting the inclusion threshold,
              use a bit score of >= _x_ as the per-target inclusion
              threshold.  By default this option is unset.

       --incdomE _x_
	      Use a conditional	E-value	of <= _x_ as the per-domain  inclusion
	      threshold,  in  targets  that have already satisfied the overall
	      per-target inclusion threshold.  The default is 0.01.

       --incdomT _x_
	      Instead of using E-values, use a bit score of >= _x_ as the per-
	      domain inclusion threshold.
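
       The relationship between the two inclusion thresholds and the "!" and
       "?" markers can be sketched in Python. This is one reading of the text
       above (a domain is marked significant only when its target has passed
       the per-target inclusion threshold and the domain itself passes the
       per-domain one), not HMMER's actual code; the function name
       domain_marker is hypothetical.

```python
def domain_marker(target_evalue, dom_cevalue, inc_e=0.01, inc_dom_e=0.01):
    """Return '!' for an included (significant) domain, '?' otherwise.

    inc_e and inc_dom_e default to the documented defaults of
    --incE and --incdomE.
    """
    target_included = target_evalue <= inc_e
    domain_included = dom_cevalue <= inc_dom_e
    return "!" if (target_included and domain_included) else "?"
```

       Only domains that already satisfy the reporting thresholds appear in
       the output at all, so this function would be applied to reported
       domains only.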

OPTIONS	FOR MODEL-SPECIFIC SCORE THRESHOLDING
       Curated	profile	databases may define specific bit score	thresholds for
       each profile, superseding any thresholding based	on statistical signif-
       icance alone.

       To use these options, the profile must contain the appropriate (GA, TC,
       and/or NC) optional score threshold annotation; hmmbuild picks this up
       from Stockholm format alignment files. Each thresholding option has two
       scores: the per-sequence threshold _x1_ and the per-domain threshold
       _x2_. These act as if -T _x1_ --incT _x1_ --domT _x2_ --incdomT _x2_
       had been applied specifically using each model's curated thresholds.
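
       The equivalence stated above can be written out as a small Python
       sketch. The function name and the example cutoff values are
       hypothetical; real values come from the GA, TC, or NC annotation of
       each profile.

```python
def equivalent_threshold_options(x1, x2):
    """Expand a curated (per-sequence, per-domain) bit score pair into
    the explicit options that a model-specific cutoff stands for."""
    return [
        "-T", str(x1), "--incT", str(x1),        # per-sequence thresholds
        "--domT", str(x2), "--incdomT", str(x2),  # per-domain thresholds
    ]
```

       For a model whose (made-up) gathering cutoffs are 25.0 and 22.5,
       equivalent_threshold_options(25.0, 22.5) yields the options that
       --cut_ga would effectively apply to that model.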

       --cut_ga
	      Use  the	GA  (gathering)	bit scores in the model	to set per-se-
	      quence  (GA1)  and  per-domain  (GA2)  reporting	and  inclusion
	      thresholds. GA thresholds	are generally considered to be the re-
	      liable curated thresholds	defining family	membership; for	 exam-
	      ple, in Pfam, these thresholds define what gets included in Pfam
	      Full alignments based on searches	with Pfam Seed models.

       --cut_nc
	      Use the NC (noise	cutoff)	bit score thresholds in	the  model  to
	      set per-sequence (NC1) and per-domain (NC2) reporting and	inclu-
	      sion thresholds. NC thresholds are generally  considered	to  be
	      the score	of the highest-scoring known false positive.

       --cut_tc
	      Use the TC (trusted cutoff) bit score thresholds in the model to
	      set per-sequence (TC1) and per-domain (TC2) reporting and	inclu-
	      sion  thresholds.	 TC  thresholds	are generally considered to be
	      the score	of the lowest-scoring  known  true  positive  that  is
	      above all	known false positives.

OPTIONS	CONTROLLING THE	ACCELERATION PIPELINE
       HMMER3  searches	 are  accelerated in a three-step filter pipeline: the
       MSV filter, the Viterbi filter, and the Forward filter. The first  fil-
       ter  is	the fastest and	most approximate; the last is the full Forward
       scoring algorithm. There	is also	a bias filter  step  between  MSV  and
       Viterbi.	 Targets  that pass all	the steps in the acceleration pipeline
       are then	subjected to postprocessing -- domain identification and scor-
       ing using the Forward/Backward algorithm.

       Changing	 filter	 thresholds only removes or includes targets from con-
       sideration; changing filter thresholds does not alter  bit  scores,  E-
       values,	or  alignments,	all of which are determined solely in postpro-
       cessing.

       --max  Turn off all filters, including the bias filter,	and  run  full
	      Forward/Backward	postprocessing on every	target.	This increases
	      sensitivity somewhat, at a large cost in speed.

       --F1 _x_
	      Set the P-value threshold	for the	MSV filter step.  The  default
	      is  0.02,	 meaning that roughly 2% of the	highest	scoring	nonho-
	      mologous targets are expected to pass the	filter.

       --F2 _x_
	      Set the P-value threshold	for the	Viterbi	filter step.  The  de-
	      fault is 0.001.

       --F3 _x_
	      Set  the P-value threshold for the Forward filter	step.  The de-
	      fault is 1e-5.

       --nobias
	      Turn off the bias	filter.	This increases	sensitivity  somewhat,
	      but  can	come  at a high	cost in	speed, especially if the query
	      has biased residue composition (such as  a  repetitive  sequence
	      region, or if it is a membrane protein with large	regions	of hy-
	      drophobicity). Without the bias filter, too many	sequences  may
	      pass  the	filter with biased queries, leading to slower than ex-
	      pected  performance  as  the  computationally   intensive	  For-
	      ward/Backward algorithms shoulder	an abnormally heavy load.
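
       The filter cascade can be pictured with a short Python sketch. It
       models only the three P-value filters with their documented default
       thresholds and leaves out the bias filter; it is an illustration, not
       HMMER's implementation. The function name first_failed_stage and the
       per-stage P-value dictionary are hypothetical.

```python
# Default thresholds for --F1, --F2 and --F3, in pipeline order.
STAGES = (("MSV", 0.02), ("Viterbi", 1e-3), ("Forward", 1e-5))


def first_failed_stage(pvalues, stages=STAGES, use_max=False):
    """Return the name of the first filter that rejects a target,
    or None if the target reaches postprocessing.

    pvalues maps a stage name to the target's P-value at that stage.
    use_max mimics --max: all filters are skipped.
    """
    if use_max:
        return None
    for name, threshold in stages:
        if pvalues[name] > threshold:
            return name  # rejected here; later stages never run
    return None
```

       Loosening a threshold lets more targets reach postprocessing but, as
       noted above, does not change the scores or E-values of targets that
       already passed.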

OTHER OPTIONS
       --nonull2
	      Turn off the null2 score corrections for biased composition.

       -Z _x_ Assert that the total number of targets in your searches is _x_,
	      for the purposes of per-sequence	E-value	 calculations,	rather
	      than the actual number of	targets	seen.

       --domZ _x_
	      Assert that the total number of targets in your searches is _x_,
	      for the purposes of per-domain conditional E-value calculations,
	      rather  than  the	 number	 of  targets that passed the reporting
	      thresholds.

       --seed _n_
	      Set the random number seed to _n_.  Some steps in	postprocessing
	      require  Monte  Carlo simulation.	 The default is	to use a fixed
	      seed (42), so that results are exactly reproducible.  Any	 other
	      positive integer will give different (but	also reproducible) re-
	      sults. A choice of 0 uses	a randomly chosen seed.

       --tformat _s_
              Assert that the target sequence file seqdb is in format _s_,
              bypassing format autodetection. Common choices for _s_ include:
              fasta, embl, genbank. Alignment formats also work; common
              choices include: stockholm, a2m, afa, psiblast, clustal, phylip.
              For more information, and for codes for some less common
              formats, see the main documentation. The string _s_ is
              case-insensitive (fasta or FASTA both work).

       --cpu _n_
	      Set the number of	parallel worker	threads	to _n_.	 On  multicore
	      machines,	the default is 2.  You can also	control	this number by
	      setting an environment variable, HMMER_NCPU.  There  is  also  a
	      master thread, so	the actual number of threads that HMMER	spawns
	      is _n_+1.

	      This option is not available if HMMER was	 compiled  with	 POSIX
	      threads support turned off.

       --stall
              For debugging the MPI master/worker version: pause after start,
              to enable the developer to attach debuggers to the running
              master and worker processes. Send a SIGCONT signal to release
              the pause (under gdb: (gdb) signal SIGCONT). Only available if
              optional MPI support was enabled at compile-time.

       --mpi  Run  under MPI control with master/worker	parallelization	(using
	      mpirun, for example, or equivalent). Only	available if  optional
	      MPI support was enabled at compile-time.
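
       To illustrate what -Z changes, the Python sketch below rescales a
       per-sequence E-value from one search space size to another. It assumes
       that the E-value is the P-value multiplied by the number of targets
       (as described in the HMMER user guide, not on this page), so the
       E-value scales linearly with Z. The function name rescale_evalue is
       hypothetical.

```python
def rescale_evalue(evalue, z_searched, z_asserted):
    """Convert an E-value computed for z_searched targets into the
    E-value that would be reported with -Z z_asserted.

    Assumes E = P * Z, so E scales linearly with the target count.
    """
    if z_searched <= 0 or z_asserted <= 0:
        raise ValueError("target counts must be positive")
    return evalue * z_asserted / z_searched
```

       This is the situation that arises when a large database is searched in
       separate chunks: setting -Z to the size of the whole database makes
       the E-values from each chunk comparable.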

SEE ALSO
       See  hmmer(1)  for  a master man	page with a list of all	the individual
       man pages for programs in the HMMER package.

       For complete documentation, see the user guide that came with your
       HMMER distribution (Userguide.pdf); or see the HMMER web page
       (http://hmmer.org/).

COPYRIGHT
       Copyright (C) 2019 Howard Hughes	Medical	Institute.
       Freely distributed under	the BSD	open source license.

       For additional information on copyright and  licensing,	see  the  file
       called  COPYRIGHT  in  your HMMER source	distribution, or see the HMMER
       web page	(http://hmmer.org/).

AUTHOR
       http://eddylab.org

HMMER 3.3			   Nov 2019			  hmmsearch(1)
