cmemit(1)			Infernal Manual			     cmemit(1)

NAME
       cmemit -	sample sequences from a	covariance model

SYNOPSIS
       cmemit [options]	_cmfile_

DESCRIPTION
       The  cmemit  program  samples  (emits)  sequences  from	the covariance
       model(s)	in _cmfile_, and writes	them to	 output.   Sampling  sequences
       may  be	useful for a variety of	purposes, including creating synthetic
       true positives for benchmarks or	tests.

       The default is to sample ten unaligned sequences from each CM.
       Alternatively, with the -c option, you can emit a single majority-rule
       consensus sequence; or with the -a option, you can emit an alignment.
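
       The three output modes above can be sketched as command lines. The
       helper below is a hypothetical convenience wrapper, not part of
       Infernal; it only assembles an argument list from options documented
       on this page (-c, -a, -N, -o), and "my.cm" is a placeholder file name.

```python
import shlex

def cmemit_argv(cmfile, mode="sample", n=None, outfile=None):
    """Build an argv list for cmemit (hypothetical helper, not part of Infernal)."""
    argv = ["cmemit"]
    if mode == "consensus":
        argv.append("-c")        # single majority-rule consensus sequence
    elif mode == "alignment":
        argv.append("-a")        # aligned output instead of unaligned FASTA
    elif mode != "sample":
        raise ValueError(f"unknown mode: {mode!r}")
    if n is not None:
        argv += ["-N", str(n)]   # number of sequences; cmemit defaults to 10
    if outfile is not None:
        argv += ["-o", outfile]  # write to a file instead of stdout
    argv.append(cmfile)          # "my.cm" below is a placeholder path
    return argv

# Print one command line per mode; nothing is executed here.
for mode in ("sample", "consensus", "alignment"):
    print(shlex.join(cmemit_argv("my.cm", mode=mode)))
```

       To actually run a command, the returned list could be passed to
       subprocess.run() on a system where cmemit is installed.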

       The _cmfile_ may	contain	a library of CMs, in which case	each  CM  will
       be used in turn.

       _cmfile_	 may  be '-' (dash), which means reading this input from stdin
       rather than a file.

       For models with zero basepairs, sequences are sampled from the  profile
       HMM  filter  instead  of	 the  CM.  However, since these	models will be
       nearly identical	(unless	special	options	were used in cmbuild  to  pre-
       vent  this), using the HMM instead of the CM will not change the	output
       in a significant	way, unless the	-l option is used. With	 -l,  the  HMM
       will  be	 configured  for  equiprobable	model begin and	end positions,
       while the CM will not. You can force cmemit to always sample  from  the
       CM with the --nohmmonly option.

OPTIONS
       -h     Help; print a brief reminder of command line usage and available
	      options.

       -o _f_ Save the synthetic sequences to file  _f_	 rather	 than  writing
	      them to stdout.

       -N _n_ Generate _n_ sequences. The default value	for _n_	is 10.

       -u     Write  the generated sequences in	unaligned format (FASTA). This
	      is the default behavior.

       -a     Write the	generated sequences in an aligned  format  (STOCKHOLM)
	      with  consensus  structure  annotation  rather than FASTA. Other
	      output formats are possible with the --outformat option.

       -c     Predict a	single majority-rule  consensus	 sequence  instead  of
	      sampling	sequences  from	 the  CM's  probability	 distribution.
	      Highly conserved	residues  (base	 paired	 residues  that	 score
	      higher  than  3.0	 bits,	or single stranded residues that score
	      higher than 1.0 bits) are	shown in upper case; others are	 shown
	      in lower case.
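
	      The case rule can be illustrated with a minimal sketch. It
	      assumes the per-residue bit score is already known (how
	      Infernal computes that score is not described here), and the
	      function name and parameters are hypothetical.

```python
def consensus_case(residue, bits, base_paired):
    """Apply the upper/lower case rule described for -c.

    `bits` is an assumed, precomputed score for the residue; this sketch
    does not compute scores from a model.
    """
    # Base-paired residues need more than 3.0 bits; single-stranded
    # residues need more than 1.0 bits.
    threshold = 3.0 if base_paired else 1.0
    return residue.upper() if bits > threshold else residue.lower()

print(consensus_case("g", 3.5, base_paired=True))
print(consensus_case("a", 0.4, base_paired=False))
```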

       -e _n_ Embed the CM emitted sequences in a larger randomly generated
	      sequence of length _n_ generated from an HMM that was trained on
	      real genomic sequences with various GC contents (the same HMM
	      used by cmcalibrate). With the --iid option, the larger
	      sequence is instead generated as 25% each A, C, G, and U. The
	      CM emitted sequence will begin at a random position within the
	      larger sequence and will be included in its entirety unless the
	      --u5p or --u3p options are used. When -e is used in combination
	      with --u5p, the CM emitted sequence will always begin at
	      position 1 of the larger sequence and will be truncated 5'. When
	      used in combination with --u3p, the CM emitted sequence will
	      always end at position _n_ of the larger sequence and will be
	      truncated 3'.
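
	      As a rough illustration of embedding, the sketch below places
	      an already-emitted sequence at a random position inside a
	      background of uniformly random A, C, G, and U (the --iid case).
	      It assumes that _n_ is the total length of the result and that
	      the emitted residues overwrite background residues; neither
	      detail is spelled out above. The default HMM-generated
	      background is not modeled, and the function name is
	      hypothetical.

```python
import random

def embed(emitted, n, rng):
    """Embed `emitted` at a random offset in an iid background of length n.

    Returns the combined sequence and the 0-based start offset.
    Assumption: n is the total output length (not stated in the man page).
    """
    if len(emitted) > n:
        raise ValueError("emitted sequence is longer than the target length")
    # Background with equal probability for each of A, C, G, U.
    background = [rng.choice("ACGU") for _ in range(n)]
    # Choose a start so that the whole emitted sequence fits.
    start = rng.randrange(n - len(emitted) + 1)
    background[start:start + len(emitted)] = emitted
    return "".join(background), start

rng = random.Random(1)
seq, start = embed("GGGAAACCC", 30, rng)
print(start, seq)
```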

       -l     Configure	the CMs	into local mode	before emitting	sequences.  By
	      default  the  model will be in global mode. In local mode, large
	      insertions and deletions are more	common than in global mode.

OPTIONS	FOR TRUNCATING EMITTED SEQUENCES
       --u5p  Truncate all emitted sequences at	a randomly chosen start	 posi-
	      tion  <n>, by only outputting residues beginning at <n>.	A dif-
	      ferent start point is randomly chosen for	each sequence.

       --u3p  Truncate all emitted sequences at	a randomly chosen end position
	      <n>,  by only outputting residues	up to position <n>.  A differ-
	      ent end point is randomly	chosen for each	sequence.
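
	      The two truncation modes can be sketched on a plain string.
	      This is only an illustration of the description above, using
	      1-based positions; the function is hypothetical, and the
	      behavior when both options are combined (end chosen at or after
	      the start) is an assumption.

```python
import random

def truncate(seq, rng, five_prime=False, three_prime=False):
    """Truncate `seq` at a random start (--u5p style) and/or end (--u3p style)."""
    if not seq:
        return seq
    start, end = 1, len(seq)                 # 1-based, inclusive
    if five_prime:
        start = rng.randint(1, len(seq))     # keep residues from start onward
    if three_prime:
        end = rng.randint(start, len(seq))   # keep residues up to end
    return seq[start - 1:end]

rng = random.Random(5)
full = "GGGAAACCCUUU"
# A new cut point is drawn on every call, i.e. for each sequence.
print(truncate(full, rng, five_prime=True))
print(truncate(full, rng, three_prime=True))
```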

       --a5p _n_
	      In combination with the -a option, truncate the emitted
	      alignment at a randomly chosen start match position _n_, by only
	      outputting alignment columns for positions after match state
	      _n_ - 1. _n_ must be an integer between 0 and the consensus
	      length of the model (which can be determined using the cmstat
	      program). As a special case, using 0 as _n_ will result in a
	      randomly chosen start position.

       --a3p _n_
	      In combination with the -a option, truncate the  emitted	align-
	      ment  at	a randomly chosen end match position _n_, by only out-
	      putting alignment	columns	for positions before match state _n_ +
	      1.  _n_ must be an integer between 1 and the consensus length of
	      the model	(which can be determined using the cmstat program). As
	      a	 special case, using 0 as _n_ will result in a randomly	chosen
	      end position.

OTHER OPTIONS
       --seed _n_
	      Seed the random number generator with _n_, an integer >=	0.  If
	      _n_  is nonzero, stochastic sampling of sequences	will be	repro-
	      ducible; the same	command	will give the same results.  If	_n_ is
	      0,  the  random number generator is seeded arbitrarily, and sto-
	      chastic samplings	will vary from run to run of the same command.
	      The default seed is 0.
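
	      The seeding convention can be mimicked as follows. This sketch
	      uses Python's own random number generator, not Infernal's, so
	      it will not reproduce cmemit output; it only shows the "nonzero
	      seed is reproducible, 0 is arbitrary" behavior. Both function
	      names are hypothetical.

```python
import random

def make_rng(seed=0):
    """Return a generator that is reproducible only for a nonzero seed."""
    if seed < 0:
        raise ValueError("seed must be an integer >= 0")
    # A seed of 0 means "seed arbitrarily", mirroring the --seed default.
    return random.Random(seed) if seed != 0 else random.Random()

def sample(rng, length=20):
    """Draw a toy sequence; stands in for stochastic sampling from a model."""
    return "".join(rng.choice("ACGU") for _ in range(length))

# Same nonzero seed, same result.
print(sample(make_rng(42)) == sample(make_rng(42)))
```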

       --iid  With  -e,	 generate the larger sequences as 25% each A, C, G and
	      U.

       --rna  Specify that the emitted sequences be output as  RNA  sequences.
	      This is true by default.

       --dna  Specify  that  the emitted sequences be output as	DNA sequences.
	      By default, the output alphabet is RNA.

       --idx _n_
	      Specify that the emitted sequences be named starting with	 _mod-
	      elname_._n_.  By default _n_ is 1.

       --outformat _s_
	      With -a, specify the output alignment format as _s_.  Acceptable
	      formats are: Pfam,  AFA,	A2M,  Clustal,	and  Phylip.   AFA  is
	      aligned  fasta.  Only  Pfam and Stockholm	alignment formats will
	      include consensus	structure annotation.

       --tfile _f_
	      Dump tabular sequence parsetrees (tracebacks) for	 each  emitted
	      sequence to file _f_.  Primarily useful for debugging.

       --exp _x_
	      Exponentiate the emission and transition probabilities of the CM
	      by _x_ and then renormalize those distributions before emitting
	      sequences. This option changes the CM probability distribution
	      of parsetrees relative to default. With _x_ less than 1.0, the
	      emitted sequences will tend to have lower bit scores upon
	      alignment to the CM. With _x_ greater than 1.0, the emitted
	      sequences will tend to have higher bit scores upon alignment to
	      the CM. This bit score difference will increase as _x_ moves
	      further away from 1.0 in either direction. If _x_ equals 1.0,
	      this option has no effect relative to default. This option is
	      useful for generating sequences that are either more difficult
	      (_x_ < 1.0) or easier (_x_ > 1.0) for the CM to distinguish as
	      homologous from background, random sequence.
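
	      The effect of exponentiating and renormalizing can be seen on
	      a single toy distribution. The sketch below applies the
	      operation to one made-up emission distribution over A, C, G,
	      and U; the values and the function name are illustrative
	      assumptions, not taken from any real model.

```python
def exponentiate(probs, x):
    """Raise each probability to the power x, then renormalize to sum to 1."""
    powered = [p ** x for p in probs]
    total = sum(powered)
    return [p / total for p in powered]

# Hypothetical emission distribution favoring the first residue.
emission = [0.7, 0.1, 0.1, 0.1]
for x in (0.5, 1.0, 2.0):
    print(x, [round(p, 3) for p in exponentiate(emission, x)])
```

	      With an exponent above 1.0 the most probable residue gains
	      probability mass, so samples stay closer to the model's
	      preferences; with an exponent below 1.0 the distribution
	      flattens and samples drift toward less probable choices.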

       --hmmonly
	      Emit from	the filter profile HMM instead of the CM.

       --nohmmonly
	      Never  emit from the filter profile HMM, always use the CM, even
	      for models with zero basepairs.

SEE ALSO
       See infernal(1) for a master man	page with a list of all	the individual
       man pages for programs in the Infernal package.

       For complete documentation, see the user guide that came with your
       Infernal distribution (Userguide.pdf); or see the Infernal web page.

COPYRIGHT
       Copyright (C) 2019 Howard Hughes	Medical	Institute.
       Freely distributed under	the BSD	open source license.

       For additional information on copyright and licensing, see the file
       called COPYRIGHT in your Infernal source distribution, or see the
       Infernal web page.

AUTHOR
       The Eddy/Rivas Laboratory
       Janelia Farm Research Campus
       19700 Helix Drive
       Ashburn VA 20147	USA
       http://eddylab.org

Infernal 1.1.3			   Nov 2019			     cmemit(1)
