vsearch(1)			 USER COMMANDS			    vsearch(1)

NAME
       vsearch	-- chimera detection, clustering, dereplication	and rereplica-
       tion, FASTA/FASTQ file processing, masking, pairwise alignment, search-
       ing,  shuffling,	 sorting, subsampling, and taxonomic classification of
       amplicons for metagenomics, genomics, and population genetics.

SYNOPSIS
       Chimera detection:
	      vsearch (--uchime_denovo | --uchime2_denovo | --uchime3_denovo)
	      fastafile	(--chimeras | --nonchimeras | --uchimealns | --uchime-
	      out) outputfile [options]

	      vsearch --uchime_ref fastafile (--chimeras | --nonchimeras |
	      --uchimealns | --uchimeout) outputfile --db fastafile [options]

       Clustering:
	      vsearch (--cluster_fast |	--cluster_size | --cluster_smallmem |
	      --cluster_unoise)	fastafile (--alnout | --biomout	| --blast6out
	      |	--centroids | --clusters | --mothur_shared_out | --msaout |
	      --otutabout | --profile |	--samout | --uc	| --userout) output-
	      file --id	real [options]

       Dereplication and rereplication:
	      vsearch (--derep_fulllength | --derep_id | --derep_prefix)
	      fastafile	(--output | --uc) outputfile [options]

	      vsearch --rereplicate fastafile --output outputfile [options]

       Extraction of sequences:
	      vsearch --fastx_getseq fastafile (--fastaout | --fastqout	|
	      --notmatched | --notmatchedfq) outputfile	--label	label [op-
	      tions]

              vsearch --fastx_getseqs fastafile (--fastaout | --fastqout |
              --notmatched | --notmatchedfq) outputfile (--label label |
              --labels labelfile | --label_word label | --label_words
              labelfile) [options]

	      vsearch --fastx_getsubseq	fastafile (--fastaout |	--fastqout |
	      --notmatched | --notmatchedfq) outputfile	--label	label [--sub-
	      seq_start	position] [--subseq_end	position] [options]

       FASTA/FASTQ file	processing:
	      vsearch --fastq_chars fastqfile [options]

	      vsearch --fastq_convert fastqfile	--fastqout outputfile [op-
	      tions]

	      vsearch (--fastq_eestats | --fastq_eestats2) fastqfile --output
	      outputfile [options]

              vsearch --fastq_filter fastqfile [--reverse fastqfile]
              (--fastaout | --fastaout_discarded | --fastqout |
              --fastqout_discarded | --fastaout_rev |
              --fastaout_discarded_rev | --fastqout_rev |
              --fastqout_discarded_rev) outputfile [options]

	      vsearch --fastq_join fastqfile --reverse fastqfile (--fastaout |
	      --fastqout) outputfile [options]

	      vsearch --fastq_mergepairs fastqfile --reverse fastqfile (--fas-
	      taout | --fastqout | --fastaout_notmerged_fwd | --fastaout_not-
	      merged_rev | --fastqout_notmerged_fwd | --fastqout_notmerged_rev
	      |	--eetabbedout) outputfile [options]

	      vsearch --fastq_stats fastqfile [--log logfile] [options]

              vsearch --fastx_filter inputfile [--reverse inputfile]
              (--fastaout | --fastaout_discarded | --fastqout |
              --fastqout_discarded | --fastaout_rev |
              --fastaout_discarded_rev | --fastqout_rev |
              --fastqout_discarded_rev) outputfile [options]

	      vsearch --fastx_revcomp inputfile	(--fastaout | --fastqout) out-
	      putfile [options]

	      vsearch --sff_convert sff-file --fastqout	outputfile [options]

       Masking:
	      vsearch --fastx_mask fastxfile (--fastaout | --fastqout) output-
	      file [options]

	      vsearch --maskfasta fastafile --output outputfile	[options]

       Orienting:
	      vsearch --orient fastxfile --db fastafile	(--fastaout |
	      --fastqout | --notmatched	| --tabbedout) outputfile [options]

       Pairwise	alignment:
	      vsearch --allpairs_global	fastafile (--alnout | --blast6out |
	      --matched	| --notmatched | --samout | --uc | --userout) output-
	      file (--acceptall	| --id real) [options]

       Restriction site	cutting:
	      vsearch --cut fastafile --cut_pattern pattern (--fastaout	|
	      --fastaout_rev | --fastaout_discarded | --fastaout_dis-
	      carded_rev) outputfile [options]

       Searching:
	      vsearch --search_exact fastafile --db fastafile (--alnout	|
	      --biomout	| --blast6out |	--mothur_shared_out | --otutabout |
	      --samout | --uc |	--userout) outputfile [options]

	      vsearch --usearch_global fastafile --db fastafile	(--alnout |
	      --biomout	| --blast6out |	--mothur_shared_out | --otutabout |
	      --samout | --uc |	--userout) outputfile --id real	[options]

       Shuffling and sorting:
	      vsearch (--shuffle | --sortbylength | --sortbysize) fastafile
	      --output outputfile [options]

       Subsampling:
	      vsearch --fastx_subsample	fastafile (--fastaout |	--fastqout)
	      outputfile (--sample_pct real | --sample_size positive integer)
	      [options]

       Taxonomic classification:
	      vsearch --sintax fastafile --db fastafile	--tabbedout outputfile
	      [--sintax_cutoff real] [options]

       UDB database handling:
	      vsearch --makeudb_usearch	fastafile --output outputfile [op-
	      tions]

	      vsearch --udb2fasta udbfile --output outputfile [options]

	      vsearch (--udbinfo | --udbstats) udbfile [options]

DESCRIPTION
       Environmental or clinical molecular diversity studies generate large
       volumes of amplicons (e.g. SSU-rRNA sequences) that need to be checked
       for chimeras, dereplicated, masked, sorted, searched, clustered or
       compared to reference sequences. The aim of vsearch is to offer an
       all-in-one open source tool to perform these tasks, using optimized
       algorithm implementations and harnessing the full potential of modern
       computers, thus providing fast and accurate data processing.

       Comparing  nucleotide  sequences	is at the core of vsearch. To speed up
       comparisons, vsearch implements an extremely fast Needleman-Wunsch  al-
       gorithm,	 making	 use  of  the  Streaming  SIMD	Extensions  (SSE2)  of
       post-2003 x86-64	CPUs.  If SSE2 instructions are	not available, vsearch
       exits with an error message. On Power8 CPUs it will use AltiVec/VSX/VMX
       instructions, and on ARMv8 CPUs it will use Neon	 instructions.	Memory
       usage increases rapidly with sequence length: for example comparing two
       sequences of length 1 kb	requires 8 MB of memory	per thread,  and  com-
       paring  two  10	kb sequences requires 800 MB of	memory per thread. For
       comparisons involving sequences with a length product greater  than  25
       million	(for  example  two  sequences  of length 5 kb),	vsearch	uses a
       slower alignment	method described by Hirschberg (1975)  and  Myers  and
       Miller (1988), with much	smaller	memory requirements.
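
       The memory figures above can be turned into a rough estimate. The
       following Python sketch is illustrative only: the constant of 8 bytes
       per cell is inferred from the two figures quoted above (1 kb x 1 kb
       giving 8 MB) and is an assumption, not a value taken from the vsearch
       source. The function names are hypothetical.

```python
# Rough per-thread memory estimate for one pairwise comparison.
# ASSUMPTION: 8 bytes per (length_a x length_b) cell, inferred from the
# figures quoted in this manual; not taken from the vsearch source code.
BYTES_PER_CELL = 8
LENGTH_PRODUCT_THRESHOLD = 25_000_000  # threshold quoted in this manual


def quadratic_bytes(length_a, length_b):
    """Memory the fast quadratic-memory aligner would need, in bytes."""
    return length_a * length_b * BYTES_PER_CELL


def uses_linear_memory(length_a, length_b):
    """True when the length product exceeds the quoted threshold, in which
    case the slower, memory-frugal alignment method is used instead."""
    return length_a * length_b > LENGTH_PRODUCT_THRESHOLD


for a, b in [(1_000, 1_000), (10_000, 10_000)]:
    print(a, b, quadratic_bytes(a, b), uses_linear_memory(a, b))
```

       Under these assumptions, two 10 kb sequences exceed the threshold, so
       the 800 MB figure describes what the fast method would need rather
       than what the slower method consumes.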

   Input
       vsearch accepts as input fasta or fastq files containing one or several
       nucleotidic entries. In fasta files, each entry is made of a header and
       a  sequence.  The header	is defined as the string comprised between the
       initial '>' symbol and the first	space, tab or the end of the line, un-
       less  the --notrunclabels option	is in effect, in which case the	entire
       line is included. The header should contain printable ascii  characters
       (33-126).  The  program	will terminate with a fatal error if there are
       unprintable ascii characters. A warning will  be	 issued	 if  non-ascii
       characters (128-255) are	encountered.

       If the header matches '>[;]size=integer;label', vsearch interprets
       integer as the number of occurrences (or abundance) of the sequence in the
       study. That abundance information is used or created during chimera de-
       tection,	clustering, dereplication, sorting and searching.
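
       As an illustration, abundance annotations can be extracted from a
       header with a regular expression. This Python sketch approximates the
       pattern described above; the function name and the fallback abundance
       of 1 for unannotated headers are assumptions made for the example.

```python
import re

# "size=<integer>" at the start of the header or after a ';', followed by
# a ';' or the end of the header. This approximates the pattern described
# in this manual; it is not the parser used by vsearch.
SIZE_RE = re.compile(r"(?:^|;)size=(\d+)(?:;|$)")


def abundance(header, default=1):
    """Return the abundance annotated in a fasta header (without '>').

    ASSUMPTION: headers without an annotation get `default` (1 here).
    """
    match = SIZE_RE.search(header)
    return int(match.group(1)) if match else default


print(abundance("seq1;size=42;"))
print(abundance("seq3"))
```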

       The sequence is defined as a string of  IUPAC  symbols  (ACGTURYSWKMDB-
       HVN),  starting	after the end of the identifier	line and ending	before
       the next	identifier line, or the	file  end.  vsearch  silently  ignores
       ascii  characters  9  to	 13,  and exits	with an	error message if ascii
       characters 0 to 8, 14 to	31, '.'	or '-' are present. All	other ascii or
       non-ascii  characters  are  stripped  and complained about in a warning
       message.
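
       The character-handling rules above can be summarized in a short Python
       sketch. It only mirrors the rules as stated in this section; the
       function name and the use of an exception for fatal errors are choices
       made for the example.

```python
IUPAC_SYMBOLS = set("ACGTURYSWKMDBHVN")


def clean_sequence(raw):
    """Apply the rules described above to the raw sequence lines.

    Returns (sequence, stripped), where `stripped` lists the characters
    that would be removed with a warning.
    """
    kept, stripped = [], []
    for ch in raw:
        code = ord(ch)
        if 9 <= code <= 13:
            continue                      # silently ignored
        if code <= 8 or 14 <= code <= 31 or ch in ".-":
            raise ValueError(f"fatal error: illegal character {ch!r}")
        if ch.upper() in IUPAC_SYMBOLS:
            kept.append(ch)               # case is preserved
        else:
            stripped.append(ch)           # stripped, reported in a warning
    return "".join(kept), stripped


print(clean_sequence("ACGT\nacgu\n"))
```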

       In fastq files, each entry is made of a sequence header starting with a
       symbol '@', a nucleotidic sequence (same	rules as for fasta sequences),
       a quality header	starting with a	symbol '+' and a string	of ASCII char-
       acters  (offset	33  or 64), each one encoding the quality value	of the
       corresponding position in the nucleotidic sequence.
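
       For example, quality values can be recovered by subtracting the offset
       from the ASCII code of each character. A minimal Python sketch (the
       function name is hypothetical):

```python
def quality_values(quality_string, offset=33):
    """Decode a fastq quality string into a list of integer quality values."""
    if offset not in (33, 64):
        raise ValueError("offset must be 33 or 64")
    return [ord(ch) - offset for ch in quality_string]


print(quality_values("I!"))        # offset 33
print(quality_values("h@", 64))    # offset 64
```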

       vsearch operations are case insensitive,	except when  soft  masking  is
       activated.  Masking  is automatically applied during chimera detection,
       clustering, masking, pairwise alignment and searching. Soft masking  is
       specified  with	the options '--dbmask soft' (for searching and chimera
       detection with a	reference) or '--qmask soft' (for searching,  de  novo
       chimera	detection,  clustering	and masking). When using soft masking,
       lower case letters indicate masked symbols, while  upper	 case  letters
       indicate	 regular  symbols.  Masked  symbols  are never included	in the
       unique index words used for sequence comparisons,  otherwise  they  are
       treated as normal symbols.

       When  comparing	sequences  during  chimera  detection,	dereplication,
       searching and clustering, T and U are considered	identical,  regardless
       of  their case. When aligning sequences,	identical symbols will receive
       a positive match	score (default +2). If two symbols are not  identical,
       their alignment results in a negative mismatch score (default -4).
       Aligning	a pair of symbols where	at least one of	them is	 an  ambiguous
       symbol  (BDHKMNRSVWY)  will always result in a score of zero. Alignment
       of two identical	ambiguous symbols (for example,	R vs R)	also  receives
       a  score	 of  zero. When	computing the amount of	similarity by counting
       matches and mismatches after alignment,	ambiguous  nucleotide  symbols
       will  count  as	matching to other symbols if they have at least	one of
       the nucleotides (ACGTU) they may	represent in common.  For  example:  W
       will  match  A  and  T, but also	any of MRVHDN. When showing alignments
       (for example with the --alnout option) matches involving	ambiguous sym-
       bols  will  be shown with a plus	character (+) between them while exact
       matches between non-ambiguous symbols will be shown with	a vertical bar
       character (|).
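
       The scoring and matching rules above can be sketched in Python using
       the standard IUPAC nucleotide codes. This is an illustration of the
       rules as described here, not the vsearch implementation; the function
       names are hypothetical, and the blank used for mismatches in
       alignment_symbol is an assumption.

```python
# Standard IUPAC nucleotide codes; U is treated as T.
IUPAC_SETS = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
AMBIGUOUS = set("BDHKMNRSVWY")


def alignment_score(a, b, match=2, mismatch=-4):
    """Score used while aligning: zero as soon as one symbol is ambiguous."""
    a, b = a.upper(), b.upper()
    if a in AMBIGUOUS or b in AMBIGUOUS:
        return 0
    return match if IUPAC_SETS[a] == IUPAC_SETS[b] else mismatch


def counts_as_match(a, b):
    """After alignment: symbols match if they share at least one nucleotide."""
    return bool(set(IUPAC_SETS[a.upper()]) & set(IUPAC_SETS[b.upper()]))


def alignment_symbol(a, b):
    """'|' for exact unambiguous matches, '+' for matches involving an
    ambiguous symbol. ASSUMPTION: a blank is shown for mismatches."""
    if not counts_as_match(a, b):
        return " "
    if a.upper() in AMBIGUOUS or b.upper() in AMBIGUOUS:
        return "+"
    return "|"


print(alignment_score("T", "u"), alignment_score("A", "C"),
      alignment_score("R", "R"))
print([s for s in "ATMRVHDN" if counts_as_match("W", s)])
```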

       vsearch	can read data from standard files and write to standard	files,
       but it can also read from pipes and write to pipes! For example,	multi-
       ple  fasta files	can be piped into vsearch for dereplication. To	do so,
       file names can be replaced with:

	      -	the symbol '-',	representing '/dev/stdin' for input  files  or
		'/dev/stdout' for output files,

	      -	a named	pipe created with the command mkfifo,

	      -	a  process  substitution '<(command)' as input or '>(command)'
		as output.

       vsearch can automatically read compressed gzip or bzip2	files  if  the
       appropriate  libraries  are present during the compilation. vsearch can
       also read pipes streaming compressed gzip or bzip2 data if the  options
       --gzip_decompress or --bzip2_decompress are selected. When reading from
       a pipe, the progress indicator is not updated.

   Options
       vsearch recognizes a large number of command-line commands and options.
       For  easier navigation, options are grouped below by theme (chimera de-
       tection,	clustering, dereplication and rereplication, FASTA/FASTQ  file
       processing, masking, pairwise alignment,	searching, shuffling, sorting,
       and subsampling). We start with the general options that	apply  to  all
       themes.	Options	 start	with a double dash (--). A single dash (-) may
       also be used, except on NetBSD systems. Option names may	 be  shortened
       as long as they are not ambiguous (e.g. --derep_f).

       Help and	version	commands:

	      --help --h
		       Display help text with brief information	about all com-
		       mands and options.

	      --version	--v
		       Output version  information  and	 a  citation  for  the
		       VSEARCH publication. Show the status of the support for
		       gzip- and bzip2-compressed input	files.

       General options:

	      --bzip2_decompress
		       When reading from  a  pipe  streaming  bzip2-compressed
		       data,  decompress  the  data. That option is not	needed
		       when reading from a standard bzip2-compressed file.

	      --fasta_width positive integer
		       Fasta files produced by vsearch are wrapped  (sequences
		       are  written on lines of	integer	nucleotides, 80	by de-
		       fault). Set that	value to zero to eliminate  the	 wrap-
		       ping.
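
              The wrapping behaviour can be pictured with this small Python
              sketch (the function name is hypothetical):

```python
def wrap_sequence(sequence, width=80):
    """Split a sequence into lines of `width` symbols; 0 disables wrapping."""
    if width == 0:
        return [sequence]
    return [sequence[i:i + width] for i in range(0, len(sequence), width)]


print([len(line) for line in wrap_sequence("A" * 170)])
```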

	      --gzip_decompress
		       When  reading  from  a  pipe  streaming gzip-compressed
		       data, decompress	the data. That option  is  not	needed
		       when reading from a standard gzip-compressed file.

	      --log filename
		       Write  messages	to the specified log file. Information
		       written includes	 program  version,  amount  of	memory
		       available,  number  of  cores and command line options,
		       and if need be, informational  messages,	 warnings  and
		       fatal  errors.  The  start  and	finish	times are also
		       recorded	as well	as the elapsed time  and  the  maximum
                       amount of memory consumed. The different vsearch
                       commands can also write additional information to the
                       log file.

	      --maxseqlength positive integer
                       All vsearch operations discard sequences of length
                       equal to or greater than integer (50,000 nucleotides
                       by default).

	      --minseqlength positive integer
		       All  vsearch  operations	 discard  sequences  of	length
		       smaller than integer: 1 nucleotide by default for sort-
		       ing  or	shuffling,  32	nucleotides for	clustering and
		       dereplication as	well as	 the  commands	--makeudb_use-
		       arch, --sintax, and --usearch_global.

	      --no_progress
		       Do  not	show the gradually increasing progress indica-
		       tor.

	      --notrunclabels
		       Do not truncate sequence	labels at first	space or  tab,
		       use  the	full header in output files. Turned off	by de-
		       fault for all commands except the sintax	command.

	      --quiet  Suppress	all messages to	stdout and stderr  except  for
		       warnings	and fatal error	messages.

	      --threads	positive integer
                       Number of computation threads to use (1 to 1024). The
                       number of threads should be less than or equal to the
                       number of available CPU cores. The default is to use all
		       available resources and to launch one thread per	 logi-
		       cal  core.  The	following commands are multi-threaded:
		       allpairs_global,	 cluster_fast,	 cluster_size,	 clus-
		       ter_smallmem,	 cluster_unoise,     fastq_mergepairs,
		       fastx_mask,    maskfasta,     search_exact,     sintax,
		       uchime_ref, and usearch_global. Only one	thread is used
		       for the other commands.

       Chimera detection options:

	      Chimera detection	is based on a scoring function	controlled  by
	      five  options  (--dn,  --mindiffs,  --mindiv, --minh, --xn). Se-
	      quences are first	sorted by decreasing abundance,	if  available,
	      and compared on their plus strand	only (case insensitive).

	      Input  sequences	are  masked  as	specified with the --qmask and
	      --hardmask options. Masking of the database for reference	 based
	      chimera detection	is specified with the --dbmask option.

              In de novo mode, the input fasta file must contain abundance
              annotations (i.e. a pattern [;]size=integer[;] in the fasta
              header). Input order matters for chimera detection, so we
              recommend sorting sequences by decreasing abundance (the default
              of the --derep_fulllength command). If your sequence set needs
              to be sorted, please see the --sortbysize command in the sorting
              section.

	      --abskew real
		       When using --uchime_denovo, the abundance skew is  used
		       to  distinguish in a three-way alignment	which sequence
		       is the chimera and which	are the	parents.  The  assump-
		       tion  is	that chimeras appear later in the PCR amplifi-
		       cation process and are  therefore  less	abundant  than
		       their  parents.	For --uchime3_denovo the default value
		       is 16.0.	For the	other commands,	the default  value  is
		       2.0,  which means that the parents should be at least 2
                       times more abundant than their chimera. Any value
                       equal to or greater than 1.0 can be used.
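
              The abundance skew condition can be expressed as a simple
              comparison. This Python sketch only restates the rule given
              above; the function name is hypothetical.

```python
def can_be_parent(parent_abundance, query_abundance, abskew=2.0):
    """True if the candidate parent is at least `abskew` times more
    abundant than the query sequence."""
    if abskew < 1.0:
        raise ValueError("abskew must be equal to or greater than 1.0")
    return parent_abundance >= abskew * query_abundance


print(can_be_parent(20, 10))                 # default skew of 2.0
print(can_be_parent(100, 10, abskew=16.0))   # --uchime3_denovo default
```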

	      --alignwidth positive integer
		       When using --uchimealns,	set the	width of the three-way
		       alignments (80 nucleotides by default). Set to zero  to
		       eliminate wrapping.

	      --borderline filename
		       Output  borderline  chimeric  sequences to filename, in
		       fasta format. Borderline	 chimeric  sequences  are  se-
		       quences that have a high	enough score but which are not
		       sufficiently different from their closest parent.

	      --chimeras filename
		       Output chimeric sequences to filename, in fasta format.
		       Output order may	vary when using	multiple threads.

	      --db filename
		       When  using  --uchime_ref,  detect  chimeras  using the
		       fasta-formatted reference sequences contained in	 file-
		       name.  Reference	 sequences  are	assumed	to be chimera-
		       free. Chimeras cannot be	detected if their parents,  or
		       sufficiently  close  relatives,	are not	present	in the
		       database.

	      --dn strictly positive real number
		       pseudo-count prior on the number	of  no	votes,	corre-
		       sponding	 to  the  parameter  n	in the chimera scoring
                       function (default value is 1.4). Increasing --dn
                       reduces the likelihood of tagging a sequence as a chimera
                       (fewer false positives, but also more false negatives).

	      --fasta_score
		       Add the chimera score to	the headers in the fasta  out-
		       put files for chimeras, non-chimeras and	borderline se-
		       quences,	using the format ';uchime_denovo=float;'.

	      --mindiffs positive integer
		       Minimum number  of  differences	per  segment  (default
		       value   is   3).	  The	parameter   is	 ignored  with
		       --uchime2_denovo	and --uchime3_denovo.

	      --mindiv real
		       Minimum divergence from closest parent  (default	 value
		       is 0.8).	The parameter is ignored with --uchime2_denovo
		       and --uchime3_denovo.

	      --minh real
		       Minimum score (h). Increasing this value	tends  to  re-
		       duce the	number of false	positives and to decrease sen-
		       sitivity. Default value is  0.28,  and  values  ranging
		       from 0.0	to 1.0 included	are accepted. The parameter is
		       ignored with --uchime2_denovo and --uchime3_denovo.

	      --nonchimeras filename
		       Output non-chimeric sequences  to  filename,  in	 fasta
		       format.	Output	order  may  vary  when	using multiple
		       threads.

	      --relabel	string
		       Relabel sequences using the prefix string and a	ticker
		       (1,  2,	3,  etc.)  to  construct  the new headers. Use
		       --sizeout to conserve the abundance annotations.

	      --relabel_keep
		       When relabelling, keep the old identifier in the	header
		       after a space.

	      --relabel_md5
		       Relabel	sequences  using  the MD5 message digest algo-
		       rithm applied to	each sequence. Former sequence headers
		       are  discarded. The sequence is converted to upper case
		       and each	'U' is replaced	by a 'T' before	computation of
		       the  digest.  The  MD5  digest  is a cryptographic hash
		       function	designed to minimize the probability that  two
		       different  inputs  give	the same output, even for very
		       similar,	but non-identical inputs. Still,  there	 is  a
		       very  small, but	non-zero, probability that two differ-
		       ent inputs give the same	digest (i.e. a collision). MD5
		       generates  a  128-bit  (16-byte)	 digest	that is	repre-
		       sented by 16  hexadecimal  numbers  (using  32  symbols
		       among  0123456789abcdef). Use --sizeout to conserve the
		       abundance annotations.

	      --relabel_self
		       Relabel sequences using each sequence itself as	a  la-
		       bel.

	      --relabel_sha1
		       Relabel	sequences  using the SHA1 message digest algo-
		       rithm applied to	each sequence. It is  similar  to  the
		       --relabel_md5  option  but  uses	the SHA1 algorithm in-
		       stead of	the MD5	algorithm. SHA1	 generates  a  160-bit
		       (20-byte)  digest that is represented by	20 hexadecimal
		       numbers (40 symbols). The probability  of  a  collision
		       (two  non-identical sequences resulting in the same di-
		       gest) is	smaller	for the	SHA1 algorithm than it is  for
		       the MD5 algorithm.
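
              The digest-based labels can be reproduced in principle with
              Python's hashlib module. The sketch below follows the
              normalization described for --relabel_md5 (upper case, 'U'
              replaced by 'T'); applying the same normalization for SHA1 is
              an assumption, and the function names are hypothetical.

```python
import hashlib


def normalized(sequence):
    """Upper-case the sequence and replace each 'U' by a 'T'."""
    return sequence.upper().replace("U", "T")


def md5_label(sequence):
    """128-bit digest written as 32 hexadecimal symbols."""
    return hashlib.md5(normalized(sequence).encode("ascii")).hexdigest()


def sha1_label(sequence):
    """160-bit digest written as 40 hexadecimal symbols.

    ASSUMPTION: same normalization as for the MD5 label.
    """
    return hashlib.sha1(normalized(sequence).encode("ascii")).hexdigest()


print(md5_label("acgu") == md5_label("ACGT"))
print(len(md5_label("ACGT")), len(sha1_label("ACGT")))
```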

	      --self   When  using  --uchime_ref,  ignore a reference sequence
		       when its	label matches the label	of the query  sequence
		       (useful	to  estimate  false-positive rate in reference
		       sequences).

	      --selfid When using --uchime_ref,	ignore	a  reference  sequence
		       when  its  nucleotide sequence is strictly identical to
		       the nucleotidic sequence	of the query.

	      --sizein In  de  novo  mode,  abundance	annotations   (pattern
		       '[>;]size=integer[;]')  present in sequence headers are
		       taken into account by default (--sizein is  always  im-
		       plied). This option is ignored by --uchime_ref.

	      --sizeout
		       When  relabelling,  add	abundance annotations to fasta
		       headers (using the format ';size=integer;').

	      --uchime_denovo filename
		       Detect chimeras present in  the	fasta-formatted	 file-
		       name, without external references (i.e. de novo). Auto-
		       matically sort the sequences in filename	by  decreasing
		       abundance  beforehand  (see the sorting section for de-
		       tails). Multithreading is not supported.

	      --uchime2_denovo filename
		       Detect chimeras present in  the	fasta-formatted	 file-
		       name,  using  the  UCHIME2 algorithm. This algorithm is
		       designed	for denoised amplicons (see --cluster_unoise).
		       Automatically  sort  the	 sequences  in filename	by de-
		       creasing	abundance beforehand (see the sorting  section
		       for details).  Multithreading is	not supported.

	      --uchime3_denovo filename
		       Detect  chimeras	 present  in the fasta-formatted file-
		       name, using the UCHIME2 algorithm. The only  difference
		       from --uchime2_denovo is	that the default minimum abun-
		       dance skew (--abskew) is	set to 16.0 rather than	2.0.

	      --uchime_ref filename
		       Detect chimeras present in the fasta-formatted filename
		       by  comparing  them  with  reference  sequences (option
		       --db). Multithreading is	supported.

	      --uchimealns filename
		       Write the three-way global  alignments  (parentA,  par-
		       entB,  chimera) to filename using a human-readable for-
		       mat. Use	--alignwidth to	modify alignment length.  Out-
		       put order may vary when using multiple threads. All se-
		       quences are converted to	upper case  before  alignment.
		       Lower  case letters indicate disagreement in the	align-
		       ment.

	      --uchimeout filename
                       Write chimera detection results to filename using an
                       18-field, tab-separated uchime-like format. Use
		       --uchimeout5 to use a format compatible with usearch v5
		       and  earlier  versions. Rows output order may vary when
		       using multiple threads.

			      1.  score: higher	 score	means  a  more	likely
				  chimeric alignment.

			      2.  Q: query sequence label.

			      3.  A: parent A sequence label.

			      4.  B: parent B sequence label.

			      5.  T:  top  parent  sequence label (i.e.	parent
				  most similar to the query).  That  field  is
				  removed when using --uchimeout5.

			      6.  idQM:	 percentage of similarity of query (Q)
				  and model (M)	constructed as a part of  par-
				  ent A	and a part of parent B.

			      7.  idQA:	 percentage of similarity of query (Q)
				  and parent A.

			      8.  idQB:	percentage of similarity of query  (Q)
				  and parent B.

			      9.  idAB:	 percentage  of	similarity of parent A
				  and parent B.

			      10. idQT:	percentage of similarity of query  (Q)
				  and top parent (T).

			      11. LY: yes votes	in the left part of the	model.

			      12. LN: no votes in the left part	of the model.

			      13. LA:  abstain	votes  in the left part	of the
				  model.

			      14. RY: yes votes	 in  the  right	 part  of  the
				  model.

			      15. RN: no votes in the right part of the	model.

			      16. RA:  abstain	votes in the right part	of the
				  model.

			      17. div: divergence, defined as (idQM - idQT).

			      18. YN: query is chimeric	(Y), or	not (N), or is
				  a borderline case (?).
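
              A row of this format can be read back with a few lines of
              Python. The sketch below uses the field names listed above; the
              function name is hypothetical and the example row is invented
              purely for illustration.

```python
UCHIMEOUT_FIELDS = [
    "score", "Q", "A", "B", "T", "idQM", "idQA", "idQB", "idAB", "idQT",
    "LY", "LN", "LA", "RY", "RN", "RA", "div", "YN",
]


def parse_uchimeout_line(line, uchimeout5=False):
    """Map one tab-separated row to a dict of strings keyed by field name.

    With uchimeout5=True the 5th field (T) is expected to be absent.
    """
    names = [n for n in UCHIMEOUT_FIELDS if not (uchimeout5 and n == "T")]
    values = line.rstrip("\n").split("\t")
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} fields, got {len(values)}")
    return dict(zip(names, values))


# Hypothetical row, invented for illustration only.
row = "\t".join(["0.75", "query1", "parentA", "parentB", "parentA",
                 "99.5", "97.0", "96.0", "95.0", "97.0",
                 "5", "0", "1", "6", "0", "2", "2.5", "Y"])
record = parse_uchimeout_line(row)
# div is defined as idQM - idQT
print(record["Q"], record["YN"],
      float(record["idQM"]) - float(record["idQT"]))
```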

	      --uchimeout5
		       When using --uchimeout, write chimera detection results
		       using  a	 17-field,  tab-separated  uchime-like	format
		       (drop  the  5th	field of --uchimeout), compatible with
		       usearch version 5 and earlier versions.

	      --xn strictly positive real number
		       weight of no votes, corresponding to the	parameter beta
                       in the scoring function (default value is 8.0).
                       Increasing --xn reduces the likelihood of tagging a
                       sequence as a chimera (fewer false positives, but also
                       more false negatives).
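
              This manual does not spell out the scoring function. As a
              sketch only, and assuming the form published for the original
              UCHIME algorithm, each segment is scored as
              Y / (beta * (N + n) + A), where Y, N and A are the yes, no and
              abstain votes, and the left and right segment scores are
              multiplied. The function names below are hypothetical.

```python
def segment_score(yes, no, abstain, dn=1.4, xn=8.0):
    """ASSUMED UCHIME segment score: Y / (beta * (N + n) + A),
    with n = --dn and beta = --xn."""
    return yes / (xn * (no + dn) + abstain)


def chimera_score(ly, ln, la, ry, rn, ra, dn=1.4, xn=8.0):
    """ASSUMED overall score: product of left and right segment scores."""
    return (segment_score(ly, ln, la, dn, xn)
            * segment_score(ry, rn, ra, dn, xn))


h = chimera_score(10, 0, 0, 10, 0, 0)
print(h, h >= 0.28)   # compared with the default --minh
```

              Under this assumed form, raising either --dn or --xn enlarges
              the denominator and therefore lowers the score, which is
              consistent with the behaviour described for both options.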

	      --xsize  Strip abundance information from	the headers when writ-
		       ing the output file.

       Clustering options:

	      vsearch implements a single-pass,	greedy centroid-based cluster-
	      ing algorithm, similar to	the algorithms implemented in usearch,
	      DNAclust and sumaclust for example. Important parameters are the
	      global clustering	threshold (--id)  and  the  pairwise  identity
	      definition (--iddef).

	      Input  sequences	are  masked  as	specified with the --qmask and
	      --hardmask options.

	      --biomout	filename
		       Generate	an OTU table in	the biom version 1.0 JSON file
		       format as specified at <http://biom-format.org/documen-
		       tation/format_versions/biom-1.0.html>.  The format  de-
		       scribes	how  to	 store	a sparse matrix	containing the
		       abundances of the OTUs in the different	samples.  This
		       format  is  much	 more  efficient  than the classic and
		       mothur OTU table	formats	available with the --otutabout
		       and  --mothur_shared_out	 options, respectively,	and is
		       recommended at least for	large  tables.	The  OTUs  are
		       represented by the cluster centroids. Taxonomy informa-
		       tion will be included for the OTUs if available.	Sample
		       identifiers  will  be extracted from the	headers	of all
		       sequences in the	input file.  If	 the  header  contains
		       ';sample=abc123;' or ';barcodelabel=abc123;' or a simi-
		       lar string somewhere, then the given sample  identifier
		       (here  'abc123')	 will  be  used.  The semicolon	is not
		       mandatory at the	beginning or end of  the  header.  The
		       sample  identifier  may contain any printable character
		       except semicolons. If no	such sample  label  is	found,
		       the  identifier	in the initial part of the header will
		       be used,	but only letters, digits and  underscores  are
		       allowed.	 OTU  identifiers  will	 be extracted from the
		       headers of  the	cluster	 centroid  sequences.  If  the
		       header  contains	 ';otu=def789;'	 or  a	similar	string
		       somewhere,  then	 the  given   OTU   identifier	 (here
		       'def789')  will be used.	The semicolon is not mandatory
		       at the beginning	or end of the header. The OTU  identi-
		       fier  may  contain any printable	character except semi-
		       colons. If no such OTU label is found,  the  identifier
		       in the initial part of the header will be used, and all
                       characters except semicolons are allowed.
                       Alternatively, OTU identifiers can be generated using
                       the relabelling options (--relabel, --relabel_self,
                       --relabel_sha1, or --relabel_md5). Taxonomy information, if
		       present,	will also be extracted from the	headers	of the
		       centroid	   sequences.	 If    the   header   contains
		       ';tax=Homo_sapiens;' or	a  similar  string  somewhere,
		       then  the  given	taxonomy information (here 'Homo_sapi-
		       ens') will be used. The semicolon is not	 mandatory  at
		       the beginning or	end of the header. The taxonomy	infor-
		       mation may contain any printable	character except semi-
		       colons.	If  an	OTU table in the biom version 2.1 HDF5
		       file format is required,	the biom utility may  be  used
		       as   described	at  <http://biom-format.org/documenta-
		       tion/biom_conversion.html>.
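
              The sample identifier rules can be approximated with regular
              expressions. This Python sketch handles only the two
              annotations named above ('sample=' and 'barcodelabel='), and it
              interprets the fallback as the leading run of letters, digits
              and underscores; both points are assumptions, as is the
              function name.

```python
import re

# Annotation at the start of the header or after a ';', value up to the
# next ';' (semicolons are not allowed inside the identifier).
SAMPLE_RE = re.compile(r"(?:^|;)(?:sample|barcodelabel)=([^;]+)")
# ASSUMED fallback: leading run of letters, digits and underscores.
INITIAL_RE = re.compile(r"[A-Za-z0-9_]+")


def sample_identifier(header):
    """Return the sample identifier for a fasta header (without '>')."""
    match = SAMPLE_RE.search(header)
    if match:
        return match.group(1)
    match = INITIAL_RE.match(header)
    return match.group(0) if match else ""


print(sample_identifier("read1;sample=abc123;size=3"))
print(sample_identifier("S1_read9 extra"))
```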

	      --centroids filename
		       Output cluster centroid sequences to filename, in fasta
		       format.	The  centroid  is the sequence that seeded the
		       cluster (i.e. the first sequence	of the cluster).

	      --clusterout_id
		       Add cluster identifier information to the output	 files
		       when using the --centroids, --consout and --profile op-
		       tions.

	      --clusterout_sort
		       Sort some output	files by decreasing abundance  instead
		       of  input order.	It applies to the --consout, --msaout,
		       --profile, --centroids, and --uc	options. For --uc, the
		       sorting	applies	 only to the centroid information part
		       (the C lines).

	      --cluster_fast filename
                       Cluster the fasta sequences in filename, after
                       automatically sorting them by decreasing sequence length.

	      --cluster_size filename
                       Cluster the fasta sequences in filename, after
                       automatically sorting them by decreasing sequence
                       abundance.

	      --cluster_smallmem filename
                       Cluster the fasta sequences in filename without
                       automatically modifying their order beforehand.
                       Sequences are expected to be sorted by decreasing
                       sequence length, unless --usersort is used.

	      --cluster_unoise filename
		       Perform	denoising  of  the fasta sequences in filename
		       according to the	UNOISE version 3 algorithm  by	Robert
		       Edgar,  but  without  the chimera removal step. The op-
		       tions --minsize (default	8) and --unoise_alpha (default
		       2.0) may	be specified. Chimera removal (de novo)	should
		       be performed afterwards with --uchime3_denovo.

	      --clusters string
		       Output each cluster to a	separate fasta file using  the
		       prefix string and a ticker (0, 1, 2, etc.) to construct
		       the path	and filenames.

	      --consout	filename
		       Output cluster consensus	 sequences  to	filename.  For
		       each  cluster,  a multiple alignment is computed, and a
		       consensus sequence is constructed by taking the	major-
		       ity  symbol (nucleotide or gap) from each column	of the
		       alignment. Columns containing a majority	 of  gaps  are
		       skipped,	 except	for terminal gaps. If the --sizein op-
		       tion is specified, sequence abundances  will  be	 taken
		       into account.

	      --cons_truncate
		       This option is ignored. A warning is issued.

	      --id real
		       Do  not	add  the target	to the cluster if the pairwise
		       identity	with the centroid is lower  than  real	(value
		       ranging	from  0.0 to 1.0 included). The	pairwise iden-
		       tity is defined as the number of	(matching  columns)  /
		       (alignment length - terminal gaps). That	definition can
		       be modified by --iddef.

	      --iddef 0|1|2|3|4
		       Change the pairwise identity definition used  in	 --id.
		       Values accepted are:

			      0.  CD-HIT   definition:	(matching  columns)  /
				  (shortest sequence length).

			      1.  edit distance: (matching columns) /  (align-
				  ment length).

			      2.  edit	distance excluding terminal gaps (same
				  as --id).

			      3.  Marine Biological  Lab  definition  counting
				  each gap opening (internal or	terminal) as a
				  single mismatch, whether or not the gap  was
				  extended:  1.0  -  [(mismatches  + gap open-
				  ings)/(longest sequence length)]

			      4.  BLAST	definition, equivalent to --iddef 1 in
				  a context of global pairwise alignment.
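
		       To illustrate the five definitions above, the following
		       Python sketch computes each value from a pair of already
		       aligned sequences (gaps written as '-'). The function
		       name and the way terminal gaps and gap openings are
		       counted are assumptions made for this example; it is not
		       the vsearch implementation, and it does not treat T and U
		       as equivalent.

```python
def pairwise_identity(query_aln, target_aln, iddef=2):
    """Identity of two aligned sequences under a given --iddef value."""
    if len(query_aln) != len(target_aln):
        raise ValueError("aligned sequences must have the same length")
    q, t = query_aln.upper(), target_aln.upper()
    columns = list(zip(q, t))
    length = len(columns)
    matches = sum(1 for a, b in columns if a == b and a != "-")
    mismatches = sum(1 for a, b in columns if a != b and "-" not in (a, b))

    # Terminal gaps: gap columns at the very start or end of the alignment.
    lead = 0
    while lead < length and "-" in columns[lead]:
        lead += 1
    trail = 0
    while trail < length - lead and "-" in columns[length - 1 - trail]:
        trail += 1

    def gap_openings(row):
        # A gap opening is a '-' that does not follow another '-'.
        return sum(1 for i, c in enumerate(row)
                   if c == "-" and (i == 0 or row[i - 1] != "-"))

    openings = gap_openings(q) + gap_openings(t)
    q_len = length - q.count("-")
    t_len = length - t.count("-")

    if iddef == 0:                      # CD-HIT definition
        return matches / min(q_len, t_len)
    if iddef in (1, 4):                 # edit distance / BLAST definition
        return matches / length
    if iddef == 2:                      # default, terminal gaps excluded
        return matches / (length - lead - trail)
    if iddef == 3:                      # Marine Biological Lab definition
        return 1.0 - (mismatches + openings) / max(q_len, t_len)
    raise ValueError("iddef must be 0, 1, 2, 3 or 4")
```

		       For the alignment ACGTAC-- / ACCTACGT, this sketch gives
		       5/6 for definitions 0 and 2, 5/8 for definitions 1 and 4,
		       and 0.75 for definition 3.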

	      --minsize	positive integer
		       Specify	the minimum abundance of sequences for denois-
		       ing using --cluster_unoise. The default is 8.

	      --msaout filename
		       Output a	multiple sequence alignment  and  a  consensus
		       sequence	for each cluster to filename, in fasta format.
		       Be warned that vsearch computes	center	star  multiple
		       sequence	 alignments using a fast method	whose accuracy
		       can decrease  significantly  when  using	 low  pairwise
		       identity	 thresholds.  The  consensus  sequence is con-
		       structed	by taking the majority symbol  (nucleotide  or
		       gap)  from  each	 column	of the alignment. Columns con-
		       taining a majority of gaps are skipped, except for ter-
		       minal  gaps.  If	 the --sizein option is	specified, se-
		       quence abundances will be taken into account when  com-
		       puting the consensus.

	      --mothur_shared_out filename
		       Output  an  OTU	table in the mothur 'shared' tab-sepa-
		       rated   plain   text    format	 as    described    at
		       <https://www.mothur.org/wiki/Shared_file>.  The	format
		       describes how a matrix containing the abundances	of the
		       OTUs in the different samples is	stored.	The first line
		       will start with the strings 'label', 'group' and	'numO-
		       tus'  and is followed by	a list of all OTU identifiers.
		       The following lines, one for each sample, start with
		       the string 'vsearch' followed by	the sample identifier,
		       the total number	of OTUs, and a list of abundances  for
		       each  OTU  in  that  sample,  in	the order given	on the
		       first line. The OTU  and	 sample	 identifiers  are  ex-
		       tracted	from  the  FASTA headers of the	sequences. The
		       OTUs are	represented by the cluster centroids. See  the
		       --biomout option	for further details.

	      --otutabout filename
		       Output  an OTU table in the classic tab-separated plain
		       text format as a	matrix containing  the	abundances  of
		       the  OTUs in the	different samples. The first line will
		       start with the string '#OTU ID' and is  followed	 by  a
		       tab-separated list of all sample identifiers. The
		       following lines, one for each OTU, start with the OTU
		       identifier  and	is followed by a tab-separated list of
		       abundances for that OTU in each sample,	in  the	 order
		       given on	the first line.	The OTU	and sample identifiers
		       are extracted from the FASTA headers of the  sequences.
		       The  OTUs  are represented by the cluster centroids. An
		       extra column is added to	the right of the table if tax-
		       onomy  information is available for at least one	of the
		       OTUs. This column will be labelled 'taxonomy' and  each
		       row  will  then	contain	 the  taxonomy information ex-
		       tracted for that	OTU. See the --biomout option for fur-
		       ther details.
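
		       The layout described above can be sketched in Python as
		       follows. The function name, the input dictionary and the
		       alphabetical ordering of OTUs and samples are assumptions
		       made for this example; the ordering used by vsearch may
		       differ.

```python
def format_otu_table(counts, taxonomy=None):
    """Build a classic OTU table.

    counts maps (otu_id, sample_id) to an abundance; taxonomy optionally
    maps otu_id to a taxonomy string (hypothetical inputs for this sketch).
    """
    taxonomy = taxonomy or {}
    otus = sorted({otu for otu, _ in counts})
    samples = sorted({sample for _, sample in counts})

    header = ["#OTU ID"] + samples
    if taxonomy:                        # extra column only if any taxonomy
        header.append("taxonomy")
    lines = ["\t".join(header)]

    for otu in otus:
        row = [otu] + [str(counts.get((otu, s), 0)) for s in samples]
        if taxonomy:
            row.append(taxonomy.get(otu, ""))
        lines.append("\t".join(row))
    return "\n".join(lines) + "\n"
```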

	      --profile	filename
		       Output  a sequence profile to a text file with the fre-
		       quency of each nucleotide in each position in the  mul-
		       tiple alignment for each	cluster. There is a FASTA-like
		       header line for each cluster, followed by  the  profile
		       information  in	a tab-separated	format.	The eight col-
		       umns are:  position  (0-based),	consensus  nucleotide,
		       number  of As, number of	Cs, number of Gs, number of Ts
		       or Us, number of	gap symbols,  and  finally  the	 total
		       number  of ambiguous nucleotide symbols (B, D, H, K, M,
		       N, R, S,	Y, V or	W). All	numbers	are integers.  If  the
		       --sizein	 option	is specified, sequence abundances will
		       be taken	into account.

	      --qmask none|dust|soft
		       Mask regions in sequences using the dust	 or  the  soft
		       methods,	 or  do	 not  mask (none). Warning, when using
		       soft masking, clustering	becomes	 case  sensitive.  The
		       default is to mask using	dust.

	      --relabel	string
		       Relabel	sequence  identifiers in the output files pro-
		       duced by	--consout, --profile and --centroids  options.
		       Please  see  the	 description  of the same option under
		       Chimera detection for details.

	      --relabel_keep
		       When relabelling, keep the old identifier in the	header
		       after a space.

	      --relabel_md5
		       Relabel	sequence  identifiers in the output files pro-
		       duced by	--consout, --profile and --centroids  options.
		       Please  see  the	 description  of the same option under
		       Chimera detection for details.

	      --relabel_self
		       Relabel sequence	identifiers in the output  files  pro-
		       duced  by --consout, --profile and --centroids options.
		       Please see the description of  the  same	 option	 under
		       Chimera detection for details.

	      --relabel_sha1
		       Relabel	sequence  identifiers in the output files pro-
		       duced by	--consout, --profile and --centroids  options.
		       Please  see  the	 description  of the same option under
		       Chimera detection for details.

	      --sizein Take into account the abundance annotations present  in
		       the   input   fasta   file   (search  for  the  pattern
		       '[>;]size=integer[;]' in	sequence headers).
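
		       As a rough illustration, the header pattern can be
		       translated into a Python regular expression. Reading the
		       trailing '[;]' as an optional semicolon, and falling back
		       to an abundance of 1 when no annotation is found, are
		       assumptions of this sketch.

```python
import re

# Literal translation of '[>;]size=integer[;]' (trailing ';' optional).
SIZE_RE = re.compile(r"[>;]size=(\d+);?")

def abundance(header, default=1):
    """Return the abundance annotated in a FASTA header line."""
    match = SIZE_RE.search(header)
    return int(match.group(1)) if match else default
```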

	      --sizeorder
		       When an amplicon	is close to 2 or more centroids,  both
		       within the distance specified with the --id option, re-
		       solve the ambiguity by clustering it with the  centroid
		       having the highest abundance, not necessarily the clos-
		       est one.	The option only	 has  effect  when  the	 value
		       specified  with	--maxaccepts  is  higher than one. The
		       --sizeorder option turns	on what	is sometimes  referred
		       to  as abundance-based greedy clustering	(AGC), in con-
		       trast to	the default distance-based  greedy  clustering
		       (DGC).
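
		       The difference between the two strategies can be
		       sketched in Python. The function and the tuple layout
		       are hypothetical, and the tie-breaking rules are
		       assumptions; the candidates are the centroids already
		       found within the --id threshold.

```python
def choose_centroid(candidates, sizeorder=False):
    """Pick a centroid label among (label, identity, abundance) tuples."""
    if sizeorder:
        # Abundance-based greedy clustering: most abundant centroid wins.
        key = lambda c: (c[2], c[1])
    else:
        # Distance-based greedy clustering: closest centroid wins.
        key = lambda c: (c[1], c[2])
    return max(candidates, key=key)[0]
```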

	      --sizeout
		       Add  abundance  annotations  to	the output fasta files
		       (add the	pattern	';size=integer;' to sequence headers).
		       If --sizein is specified, abundance annotations are re-
		       ported to output	files, and each	cluster	 centroid  re-
		       ceives a	new abundance value corresponding to the total
		       abundance of the	 amplicons  included  in  the  cluster
		       (--centroids option). If	--sizein is not	specified, in-
		       put abundances are set to 1 for amplicons, and  to  the
		       number of amplicons per cluster for centroids.

	      --strand plus|both
		       When  comparing	sequences with the cluster seed, check
		       the plus	strand only (default) or check both strands.

	      --uc filename
		       Output clustering results in filename using a
		       tab-separated uclust-like format with 10 columns and 3
		       different types of entries (S, H or C). Each fasta
		       sequence in
		       the  input file can be either a cluster centroid	(S) or
		       a hit (H) assigned to a cluster.	 Cluster  records  (C)
		       summarize  information  (size, centroid label) for each
		       cluster.	In  the	 context  of  clustering,  the	option
		       --uc_allhits  has  no effect on the --uc	output.	Column
		       content varies with the type of entry (S, H or C):

			      1.  Record type: S, H, or	C.

			      2.  Cluster number (zero-based).

			      3.  Centroid length (S), query  length  (H),  or
				  cluster size (C).

			      4.  Percentage  of  similarity with the centroid
				  sequence (H),	or set to '*' (S, C).

			      5.  Match	orientation + or - (H),	or set to  '*'
				  (S, C).

			      6.  Not  used,  always  set  to '*' (S, C) or to
				  zero (H).

			      7.  Not used, always set to '*'  (S,  C)	or  to
				  zero (H).

			      8.  Set to '*' (S, C) or, for H, a compact
				  representation of the pairwise alignment using
				  the	CIGAR  format  (Compact	 Idiosyncratic
				  Gapped  Alignment  Report):  M   (match/mis-
				  match),  D (deletion)	and I (insertion). The
				  equal	sign '=' indicates that	the  query  is
				  identical to the centroid sequence.

			      9.  Label	 of  the query sequence	(H), or	of the
				  centroid sequence (S,	C).

			      10. Label	of the centroid	sequence (H),  or  set
				  to '*' (S, C).
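
		       A file in this format can be read with a few lines of
		       Python. The record type and field names below are
		       hypothetical names chosen for this sketch; only the
		       column order comes from the list above.

```python
from collections import namedtuple

UcRecord = namedtuple(
    "UcRecord",
    "type cluster length_or_size identity strand col6 col7 "
    "cigar label centroid_label",
)

def parse_uc_line(line):
    """Parse one tab-separated line of --uc output into a UcRecord."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 10:
        raise ValueError("expected 10 tab-separated columns")
    identity = None if fields[3] == "*" else float(fields[3])
    return UcRecord(fields[0], int(fields[1]), int(fields[2]), identity,
                    fields[4], fields[5], fields[6], fields[7],
                    fields[8], fields[9])

def cluster_sizes(lines):
    """Map each centroid label to its cluster size, using the C records."""
    records = (parse_uc_line(line) for line in lines if line.strip())
    return {r.label: r.length_or_size for r in records if r.type == "C"}
```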

	      --unoise_alpha real
		       Specify	the  alpha  parameter  to the --cluster_unoise
		       command.	The default is 2.0.

	      --usersort
		       When using --cluster_smallmem, allow any	sequence input
		       order, not just a decreasing length ordering.

	      --xsize  Strip abundance information from	the headers when writ-
		       ing the output file.

	      ...      Most searching options as well as score filtering,  gap
		       penalties and masking also apply	to clustering (see the
		       Searching   section   for    definitions):    --alnout,
		       --blast6out, --fastapairs, --matched, --notmatched,
		       --maxaccepts, --maxrejects, --samout, --userout,
		       --userfields.

       Dereplication and rereplication options:

	      --derep_fulllength filename
		       Merge  strictly	identical sequences contained in file-
		       name. Identical sequences are  defined  as  having  the
		       same  length  and  the same string of nucleotides (case
		       insensitive, T and U are	considered the same). See  the
		       options --sizein	and --sizeout to take into account and
		       compute abundance values. This command does not support
		       multithreading.
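
		       The principle of full-length dereplication can be
		       sketched in Python. The function name and the record
		       layout are assumptions made for this example; the sketch
		       keeps the header of the first sequence of each group and
		       sorts by decreasing abundance, as described under
		       --output.

```python
def derep_fulllength(records):
    """Merge strictly identical sequences.

    records is an iterable of (header, sequence, abundance) tuples.
    Returns (header, sequence, total_abundance) tuples, most abundant first.
    """
    groups = {}
    for header, seq, abundance in records:
        # Case insensitive, and T and U are considered the same.
        key = seq.upper().replace("U", "T")
        if key in groups:
            groups[key][2] += abundance
        else:
            groups[key] = [header, seq, abundance]
    # sorted() is stable, so ties keep their input order.
    return sorted((tuple(g) for g in groups.values()), key=lambda g: -g[2])
```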

	      --derep_id filename
		       Merge  strictly	identical sequences contained in file-
		       name, as	with the --derep_fulllength command,  but  the
		       sequence	 labels	 (identifiers) on the header line need
		       to be identical too.

	      --derep_prefix filename
		       Merge sequences with identical  prefixes	 contained  in
		       filename.   A  short  sequence  identical to an initial
		       segment (prefix)	of another sequence  is	 considered  a
		       replicate  of  the  longer  sequence.  If a sequence is
		       identical to the	prefix	of  two	 or  more  longer  se-
		       quences,	 it is clustered with the shortest of them. If
		       they are	equally	long, it is clustered  with  the  most
		       abundant.  Remaining  ties  are	solved	using sequence
		       headers and sequence input order. Sequence  comparisons
		       are  case insensitive, and T and	U are considered iden-
		       tical. This command does	not support multithreading.

	      --maxuniquesize positive integer
		       Discard sequences with a	 post-dereplication  abundance
		       value greater than integer.

	      --minuniquesize positive integer
		       Discard	sequences  with	a post-dereplication abundance
		       value smaller than integer.

	      --output filename
		       Write the dereplicated sequences	to filename, in	 fasta
		       format  and  sorted  by decreasing abundance. Identical
		       sequences receive the header of the first  sequence  of
		       their group. If --sizeout is used, the number of	occur-
		       rences (i.e. abundance) of each sequence	 is  indicated
		       at  the	end  of	 their	fasta header using the pattern
		       ';size=integer;'.

	      --relabel	string
		       Please see the description of  the  same	 option	 under
		       Chimera detection for details.

	      --relabel_keep
		       When relabelling, keep the old identifier in the	header
		       after a space.

	      --relabel_md5
		       Please see the description of  the  same	 option	 under
		       Chimera detection for details.

	      --relabel_self
		       Please  see  the	 description  of the same option under
		       Chimera detection for details.

	      --relabel_sha1
		       Please see the description of  the  same	 option	 under
		       Chimera detection for details.

	      --rereplicate filename
		       Duplicate  each	sequence the number of times indicated
		       by the abundance	of each	sequence in the	specified file
		       (option	--sizein  is always implied). The sequence la-
		       bels are	identical for the same sequence, unless	 --re-
		       label,  --relabel_self, --relabel_sha1 or --relabel_md5
		       is used to create unique	labels.	Output is  written  to
		       the  file  specified with the --output option, in FASTA
		       format. The output file does not	contain	abundance  in-
		       formation  unless --sizeout is specified, in which case
		       an abundance of 1 is used.
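
		       Conceptually, rereplication is the inverse of
		       dereplication, as in this Python sketch. The generator
		       name and the record layout are assumptions made for this
		       example, and relabelling is not shown.

```python
def rereplicate(records, sizeout=False):
    """Yield each (label, sequence) as many times as its abundance.

    records is an iterable of (label, sequence, abundance) tuples, where
    label no longer carries a size annotation.
    """
    for label, seq, abundance in records:
        # With --sizeout, every copy is annotated with an abundance of 1.
        header = label + ";size=1;" if sizeout else label
        for _ in range(abundance):
            yield header, seq
```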

	      --sizein Take into account the abundance annotations present  in
		       the   input   fasta   file   (search  for  the  pattern
		       '[>;]size=integer[;]' in	sequence headers). That	option
		       is active by default when rereplicating.

	      --sizeout
		       Add abundance annotations to the	output fasta file (add
		       the pattern ';size=integer;' to sequence	 headers).  If
		       --sizein	 is specified, each unique sequence receives a
		       new abundance value corresponding to  its  total	 abun-
		       dance  (sum  of	the abundances of its occurrences). If
		       --sizein	is not specified, input	abundances are set  to
		       1,  and	each  unique sequence receives a new abundance
		       value corresponding to its number of occurrences	in the
		       input file.

	      --strand plus|both
		       When  searching for strictly identical sequences, check
		       the plus	strand only (default) or check both strands.

	      --topn positive integer
		       Output only the top integer sequences  (i.e.  the  most
		       abundant).

	      --uc filename
		       Output  full-length  or prefix-dereplication results in
		       filename	using a	tab-separated uclust-like format  with
		       10 columns and 3 different types of entries (S, H or C).
		       Each fasta sequence in the input	file can be  either  a
		       cluster	centroid  (S) or a hit (H) assigned to a clus-
		       ter. Cluster records (C)	summarize  information	(size,
		       centroid	 label)	 for  each  cluster. In	the context of
		       dereplication, the option --uc_allhits has no effect on
		       the --uc	output.	Column content varies with the type of
		       entry (S, H or C):

			      1.  Record type: S, H, or	C.

			      2.  Cluster number (zero-based).

			      3.  Sequence length (S, H), or cluster size (C).

			      4.  Percentage of	similarity with	 the  centroid
				  sequence (H),	or set to '*' (S, C).

			      5.  Match	 orientation + or - (H), or set	to '*'
				  (S, C).

			      6.  Not used, always set to '*' (S, C) or	0 (H).

			      7.  Not used, always set to '*' (S, C) or	0 (H).

			      8.  Not used, always set to '*'.

			      9.  Label	of the query sequence (H), or  of  the
				  centroid sequence (S,	C).

			      10. Label	 of  the centroid sequence (H),	or set
				  to '*' (S, C).

	      --xsize
		       Strip abundance information from the headers when
		       writing the output file.

       Extraction options:

	      Sequences	 with  headers	matching  certain  criteria can	be ex-
	      tracted from FASTA and FASTQ  files  using  the  --fastx_getseq,
	      --fastx_getseqs and --fastx_getsubseq commands.

	      The  --fastx_getseq command requires the header to match a label
	      specified	with the --label option.  If the  --label_substr_match
	      option  is  given, the label may be a substring located anywhere
	      in the header, otherwise the entire header must match the	label.
	      These  matches  are not case-sensitive. The headers in the input
	      file are truncated at the	first space or	tab  character	unless
	      the  --notrunclabels  option  is	given.	The matching sequences
	      will be written to the files specified with the  --fastaout  and
	      --fastqout options, in FASTA and FASTQ format, respectively. Se-
	      quences that do not match	are written  to	 the  files  specified
	      with the --notmatched and	--notmatchedfq options,	respectively.
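
	      The matching rules for --label can be summarised with a short
	      Python sketch. The function and its keyword arguments are
	      hypothetical names that mirror the options described above; this
	      is not the vsearch implementation.

```python
def label_matches(header, label, substr_match=False, notrunclabels=False):
    """Decide whether a sequence header matches a --label value."""
    if not notrunclabels:
        # Truncate the header at the first space or tab character.
        for i, ch in enumerate(header):
            if ch in " \t":
                header = header[:i]
                break
    h, l = header.lower(), label.lower()    # matching is not case-sensitive
    return l in h if substr_match else h == l
```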

	      The  --fastx_getsubseq  command is similar to the	--fastx_getseq
	      command, but will extract a subsequence of the matching sequences.
	      The start position is specified with the --subseq_start
	      option and the end position is specified with  the  --subseq_end
	      option. The positions are	1-based, meaning that the first	symbol
	      of the sequence is at position 1.	If the start or	 end  position
	      option  is  not  specified, the default is to start at the first
	      position and end at the last position in the sequence.
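
	      In terms of 0-based string slicing, the 1-based positions
	      translate as in the following Python sketch. The function name
	      is hypothetical, and treating the end position as inclusive is
	      an assumption.

```python
def get_subseq(sequence, subseq_start=None, subseq_end=None):
    """Extract positions subseq_start..subseq_end (1-based, inclusive)."""
    start = 1 if subseq_start is None else subseq_start
    end = len(sequence) if subseq_end is None else subseq_end
    # Position 1 is index 0; an inclusive end maps to an exclusive slice end.
    return sequence[start - 1:end]
```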

	      The --fastx_getseqs command is  similar  to  the	--fastx_getseq
	      command  but  allows more	flexibility in specifying the label(s)
	      to be matched. A single label may	be specified using the --label
	      option  as  described  above. Alternatively, a file containing a
	      list of labels to	be matched may be specified with the  --labels
	      option.  The  file  must	be a plain text	file with one label on
	      each line. The --label_word and  --label_words  options  may  be
	      used to specify either a single word or a	file containing	a list
	      of words,	respectively, to be  matched.  Words  are  defined  as
	      character	 sequences delimited either by a character that	is not
	      alpha-numeric (A-Z, a-z, or 0-9) or by the beginning or  end  of
	      the  header.  Word matching is case-sensitive. The --label_field
	      option will limit	the matching of	words to a  certain  field  in
	      the header.
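
	      The definition of a word given above can be expressed with a
	      regular expression, as in this Python sketch (the function names
	      are hypothetical):

```python
import re

def header_words(header):
    """Split a header into words at every non-alphanumeric character."""
    return [w for w in re.split(r"[^A-Za-z0-9]+", header) if w]

def word_matches(header, word):
    """Case-sensitive match of a whole word anywhere in the header."""
    return word in header_words(header)
```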

	      --fastaout filename
		       Write  the  extracted  sequences	in FASTA format	to the
		       file with the given name.

	      --fastqout filename
		       Write the extracted sequences in	FASTQ  format  to  the
		       file with the given name. This option is	illegal	if the
		       input is	in FASTA format.

	      --fastx_getseq filename
		       Extract sequences from the given	FASTA or  FASTQ	 file.
		       Specify a label to match	using the --label option. Out-
		       put  files   are	  specified   with   the   --fastaout,
		       --fastqout, --notmatched	and --notmatchedfq options.

	      --fastx_getseqs filename
		       Extract	sequences  from	the given FASTA	or FASTQ file.
		       Specify the label or labels to match using one  of  the
		       following  options: --label, --labels, --label_word, or
		       --label_words. Output  files  are  specified  with  the
		       --fastaout, --fastqout, --notmatched and	--notmatchedfq
		       options.

	      --fastx_getsubseq	filename
		       Extract a certain part of some of the sequences in  the
		       given  FASTA or FASTQ file. Specify labels to match us-
		       ing the --label option. Specify the  subsequence	 range
		       to  be  extracted  with	the  --subseq_start and	--sub-
		       seq_end options.	Output files are  specified  with  the
		       --fastaout, --fastqout, --notmatched and	--notmatchedfq
		       options.

	      --label string
		       Specify the label to match in the sequence header.
		       Unless the --label_substr_match option is given, the
		       label must match the entire header. The comparison is
		       not case-sensitive.

	      --label_field string
		       Specify a field name to be used when matching using the
		       --label_word or --label_words option. The field name is
		       a  string  like	"abc" that must	precede	the word to be
		       matched with an equals sign (=) in between.  The	 field
		       must be delimited by semicolons or the beginning	or end
		       of the header. The following header will	match the  la-
		       bel 123 in the field abc: "seq1;abc=123".
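
		       A minimal Python sketch of this field matching follows.
		       The function name is hypothetical, and requiring the
		       whole field value to equal the word is a simplifying
		       assumption.

```python
def field_word_matches(header, field, word):
    """Match word against the value of 'field=' in a ';'-delimited header."""
    prefix = field + "="
    for part in header.split(";"):
        if part.startswith(prefix) and part[len(prefix):] == word:
            return True
    return False
```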

	      --label_substr_match
		       The  labels  specified with the --label or the --labels
		       option may match	anywhere in the	header if this	option
		       is  given.  Otherwise a label needs to match the	entire
		       header.

	      --label_word string
		       Specify a word to match in the sequence header. Words
		       are defined as strings delimited	by either the start or
		       end of the header or by any symbol that is not a	letter
		       (A-Z,  a-z) or digit (0-9). The comparison is case-sen-
		       sitive.

	      --label_words filename
		       Specify a file containing words to be  matched  against
		       the  sequence headers. The plain	text file must contain
		       one word	on each	line.  Words are  defined  as  strings
		       delimited  by  either the start or end of the header or
		       by any symbol that is not a letter (A-Z,	a-z) or	 digit
		       (0-9). The comparison is	case-sensitive.

	      --labels filename
		       Specify	a file containing labels to be matched against
		       the sequence headers. The plain text file must  contain
		       one label on each line. Unless the --label_substr_match
		       option is given,	a label	must match the entire  header.
		       The comparison is not case-sensitive.

	      --notmatched filename
		       Write the sequences that	were not extracted to the file
		       with the	given name, in FASTA format.

	      --notmatchedfq filename
		       Write the sequences that	were not extracted to the file
		       with  the  given	 name, in FASTQ	format.	This option is
		       illegal if the input is in FASTA	format.

	      --subseq_end positive integer
		       Specify the end position in the sequences when
		       extracting subsequences using the --fastx_getsubseq
		       command. Positions are 1-based, so the sequences start
		       at position 1. The default is to end at the end of the
		       sequence if this option is not specified.

	      --subseq_start positive integer
		       Specify the starting position in the sequences when
		       extracting  subsequences	 using	the  --fastx_getsubseq
		       command.	Positions are 1-based, so the sequences	 start
		       at position 1. The default is to	start at the beginning
		       of the sequence (position 1), if	 this  option  is  not
		       specified.

       FASTA/FASTQ file	processing options:

	      Analyse,	trim,  filter,	convert	 or  merge  sequences in FASTQ
	      files, or	reverse	complement sequences in	FASTA or FASTQ	files.
	      The  --fastq_chars command can be	used to	analyse	FASTQ files to
	      identify the quality encoding and	the  range  of	quality	 score
	      values  used.  To	convert	between	different FASTQ	file variants,
	      use the --fastq_convert command.	Statistical  analysis  of  the
	      quality  and length of the sequences in a	FASTQ file may be per-
	      formed   with   the    --fastq_stats,    --fastq_eestats,	   and
	      --fastq_eestats2	commands.  Sequences  may be trimmed, filtered
	      and converted by the --fastq_filter or --fastx_filter  commands.
	      Paired-end reads can be merged using the --fastq_mergepairs com-
	      mand. The	--fastx_revcomp	command	reverse-complements sequences.
	      Finally,	the  --sff_convert  command can	be used	to convert SFF
	      files to FASTQ.

	      --eeout  When   using    --fastq_filter,	  --fastx_filter    or
		       --fastq_mergepairs,  include the	number of expected er-
		       rors (ee) in the	sequence header	 of  FASTQ  and	 FASTA
		       output	files.	 This  option  is  a  synonym  of  the
		       --fastq_eeout option. Use the --xee  option  to	remove
		       this information	from headers.

	      --eetabbedout filename
		       When  specified	with  the  --fastq_mergepairs command,
		       write statistics	with expected errors  of  each	merged
		       read to the given file. The file is tab-separated and
		       has four columns: the number of expected errors in the
		       forward read, the number of expected errors in the
		       reverse read, the number of observed errors in the
		       forward read, and the number of observed errors in the
		       reverse read. The observed number of errors is the
		       number of differences in the overlap region of the
		       merged sequence relative to each of the reads in the
		       pair.

	      --fastaout filename
		       When   using   --fastq_filter,	--fastq_mergepairs  or
		       --fastx_filter, write to	the given FASTA-formatted file
		       the  sequences  passing	the  filter, or	the merged se-
		       quences.

	      --fastaout_rev filename
		       When using --fastq_filter or --fastx_filter, write to
		       the  given FASTA-formatted file the reverse reads pass-
		       ing the filter.

	      --fastaout_notmerged_fwd filename
		       When using --fastq_mergepairs, write forward reads  not
		       merged to the specified FASTA file.

	      --fastaout_notmerged_rev filename
		       When  using --fastq_mergepairs, write reverse reads not
		       merged to the specified FASTA file.

	      --fastaout_discarded filename
		       Write sequences that do not  pass  the  filter  of  the
		       --fastq_filter  or  --fastx_filter command to the given
		       FASTA-formatted file.

	      --fastaout_discarded_rev filename
		       Write reverse reads that	do not pass the	filter of  the
		       --fastq_filter  or  --fastx_filter command to the given
		       FASTA-formatted file.

	      --fastq_allowmergestagger
		       When using --fastq_mergepairs, allow merging of staggered
		       read  pairs. Staggered pairs are	pairs where the	3' end
		       of the reverse read has an overhang to the left of  the
		       5'  end	of  the	forward	read. This situation can occur
		       when a very short fragment is sequenced.	The  3'	 over-
		       hang  of	the reverse read is not	included in the	merged
		       sequence. The opposite option is	the  --fastq_nostagger
		       option. The default is to discard staggered pairs.

	      --fastq_ascii positive integer
		       Define the ASCII	character number used as the basis for
		       the FASTQ quality score.	The default is	33,  which  is
		       used  by	 the  Sanger  /	 Illumina  1.8+	 FASTQ	format
		       (phred+33). The value 64	is used	by the	Solexa,	 Illu-
		       mina 1.3+ and Illumina 1.5+ formats (phred+64). Only 33
		       and 64 are valid	arguments.
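
		       The role of the offset can be illustrated with a short
		       Python sketch (the function name is hypothetical): each
		       quality character is converted to a score by subtracting
		       the offset from its ASCII value.

```python
def quality_scores(quality_string, fastq_ascii=33):
    """Convert a FASTQ quality string to a list of integer scores."""
    if fastq_ascii not in (33, 64):
        raise ValueError("only 33 and 64 are valid offsets")
    return [ord(ch) - fastq_ascii for ch in quality_string]
```

		       For example, with the default offset of 33 the character
		       'I' (ASCII 73) corresponds to a quality score of 40.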

	      --fastq_asciiout positive	integer
		       When using --fastq_convert or --sff_convert, define the
		       ASCII  character	number used as the basis for the FASTQ
		       quality score when writing FASTQ	output files. The  de-
		       fault is	33. Only 33 and	64 are valid arguments.

	      --fastq_chars filename
		       Summarize  the  composition  of	sequence  and  quality
		       strings contained in the	input FASTQ file. For each  of
		       the four	DNA letters, --fastq_chars gives the number of
		       occurrences of the letter, its relative	frequency  and
		       the  length of the longest run of that letter. For each
		       character present in the	quality	strings, --fastq_chars
		       gives  the  ASCII  value	of the character, its relative
		       frequency, and the number of  times  a  k-mer  of  that
		       character  appears  at  the end of quality strings. The
		       length of the k-mer can be set using --fastq_tail (4 by
		       default).  The command --fastq_chars tries to automati-
		       cally detect the	 quality  encoding  (Solexa,  Illumina
		       1.3+, Illumina 1.5+ or Illumina 1.8+/Sanger) by analyz-
		       ing the range of	observed quality score values. In case
		       of  success,  --fastq_chars  suggests  values  for  the
		       --fastq_ascii (33 or 64), --fastq_qmin and --fastq_qmax
		       options to be used with the other commands that require
		       a FASTQ input file.

	      --fastq_convert filename
		       Convert between the different  variants	of  the	 FASTQ
		       file  format.  The  quality  encoding of	the input file
		       must be specified with the --fastq_ascii	option (either
		       33  or  64,  the	default	is 33),	and the	output quality
		       encoding	must be	specified  with	 the  --fastq_asciiout
		       option  (default	 33).  The  minimum and	maximum	output
		       quality scores may be limited using the --fastq_qminout
		       and  --fastq_qmaxout options. The output	file is	speci-
		       fied with the --fastqout	option.

	      --fastq_eeout
		       When   using    --fastq_filter,	  --fastx_filter    or
		       --fastq_mergepairs,  include the	number of expected er-
		       rors (ee) in the	sequence header	 of  FASTQ  and	 FASTA
		       files.  This option is a	synonym	of the --eeout option.
		       Use the --xee option to remove  this  information  from
		       headers.

	      --fastq_eestats filename
		       Analyze	a FASTQ	file and report	statistics on the dis-
		       tributions of quality scores, error  probabilities  and
		       expected	 accumulated errors. The report, a table of 21
		       tab-separated columns, is written to the	file specified
		       with  the --output option. The first column corresponds
		       to the position in the  reads  (Pos).  The  second  and
		       third columns correspond	to the number of reads (Reads)
		       and percentage of reads (PctRecs) that include this po-
		       sition. The remaining columns include information about
		       the distribution	of quality  scores  in	this  position
		       (Q), error probabilities	in this	position (Pe), and fi-
		       nally the expected number of  accumulated  errors  from
		       the beginning of the reads up to the current position
		       (EE). For each of the Q, Pe and EE distributions,
		       the  following  statistics  are included: minimum value
		       (Min), lower quartile (Low), median (Med), mean (Mean),
		       upper quartile (Hi), and	maximum	value (Max). The qual-
		       ity encoding and	the range of  quality  values  may  be
		       specified with --fastq_ascii, --fastq_qmin and
		       --fastq_qmax.
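
		       The relation between the three quantities can be
		       sketched in Python, assuming the usual Phred relation
		       Pe = 10^(-Q/10), which this section does not state
		       explicitly. The function names are hypothetical.

```python
from itertools import accumulate

def error_probabilities(scores):
    """Per-position error probability (Pe) for Phred quality scores (Q)."""
    return [10 ** (-q / 10) for q in scores]

def cumulative_expected_errors(scores):
    """Expected errors (EE) accumulated from the start of the read."""
    return list(accumulate(error_probabilities(scores)))
```

		       For a read with quality scores 10, 20 and 30, the
		       accumulated expected errors are approximately 0.1, 0.11
		       and 0.111.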

	      --fastq_eestats2 filename
		       Analyze the specified FASTQ file	and report  statistics
		       on  the number of sequences that	would be retained at a
		       combination of selected cutoffs for  length  truncation
		       and  maximum expected errors, that could	potentially be
		       used  as	 arguments   to	  the	--fastq_trunclen   and
		       --fastq_maxee  options  to  the --fastq_filter command.
		       The result, a table of two or more columns, is  written
		       to  the	file specified with the	--output option. There
		       is a line for each length truncation cutoff. The	 first
		       column  on  each	 line contains the selected truncation
		       length, while the following columns contain the	number
		       of sequences and, in parentheses, the percentage of
		       sequences that would be retained at the selected EE
		       levels. The truncation length cutoffs may be specified
		       with the --length_cutoffs option, which requires a list of
		       three  comma-separated integers indicating the shortest
		       cutoff, the longest cutoff, and the  increment  between
		       cutoffs.	 The  longest  cutoff  may be specified	with a
		       star (*)	which indicates	that the limit is equal	to the
		       longest sequence	in the input file. The default setting
		       is "50,*,50" meaning that  truncation  lengths  of  50,
		       100,  150  and  so on up	to the longest sequence	length
		       should be used.	The maximum expected error  (EE)  cut-
		       offs  may  be  specified	 with  the --ee_cutoffs	option
		       which requires a	comma-separated	list of	floating point
		       numbers as its argument. The default setting is
		       "0.5,1.0,2.0", which indicates that expected error levels
		       of 0.5, 1.0 and 2.0 should be used.
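
		       The way the two cutoff lists combine into a table can
		       be sketched in Python as follows. This is an illustra-
		       tion under stated assumptions, not the vsearch imple-
		       mentation: the function names are hypothetical, the
		       quality offset is assumed to be 33, the longest length
		       is assumed to be included when it falls on the incre-
		       ment, and reads shorter than a cutoff are assumed not
		       to be retained (as with --fastq_trunclen).

```python
def parse_length_cutoffs(spec, longest):
    """Expand 'shortest,longest,increment' into truncation lengths.

    A star (*) as the second field stands for the longest input sequence.
    """
    first, last, step = spec.split(",")
    stop = longest if last == "*" else int(last)
    return list(range(int(first), stop + 1, int(step)))


def parse_ee_cutoffs(spec):
    """Turn a comma-separated list such as '0.5,1.0,2.0' into floats."""
    return [float(x) for x in spec.split(",")]


def retained_table(qualities, lengths, ee_levels, ascii_offset=33):
    """For each length, count reads retained at each expected error level."""
    rows = []
    for length in lengths:
        # Expected errors of each sufficiently long read after truncation.
        ees = [
            sum(10 ** (-(ord(c) - ascii_offset) / 10) for c in q[:length])
            for q in qualities
            if len(q) >= length
        ]
        rows.append((length, [sum(ee <= lvl for ee in ees) for lvl in ee_levels]))
    return rows
```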

	      --fastq_filter filename
		       Trim  and/or  filter sequences in the given FASTQ file.
		       Similar to the --fastx_filter command, but  works  only
		       on FASTQ	files. See --fastx_filter for details.

	      --fastq_join filename
		       Join  paired-end	 sequence  reads into one sequence and
		       add a gap between them using a  padding	sequence.  The
		       sequences are not merged as with the --fastq_mergepairs
		       command, but simply joined with a gap. The forward
		       reads  are specified as the argument to this option and
		       the reverse reads are specified with the	--reverse  op-
		       tion.  The  resulting  sequences	consist	of the forward
		       read, the padding sequence and the  reverse  complement
		       of  the reverse read. The padding sequence is specified
		       with the	--join_padgap option and the  padding  quality
		       is  specified  with  the	--join_padgapq option. The de-
		       fault padding sequence string is	NNNNNNNN and  the  de-
		       fault padding quality string is IIIIIIII, corresponding
		       to a base quality score of  40  (a  very	 high  quality
		       score  with  error  probability 0.0001).	The joined se-
		       quences are output to the file(s)  specified  with  the
		       --fastaout or --fastqout	options.
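
		       The joining rule described above can be sketched in
		       Python. The function name and argument names are hypo-
		       thetical; reversing the quality string of the reverse
		       read, so that it stays aligned with the reverse-comple-
		       mented bases, is an assumption of this sketch.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def join_pair(fwd_seq, fwd_qual, rev_seq, rev_qual,
              padgap="NNNNNNNN", padgapq="IIIIIIII"):
    """Join a read pair: forward + padding + reverse complement of reverse."""
    rev_rc = rev_seq.translate(_COMPLEMENT)[::-1]
    joined_seq = fwd_seq + padgap + rev_rc
    # Qualities of the reverse read are reversed to follow the new base order.
    joined_qual = fwd_qual + padgapq + rev_qual[::-1]
    return joined_seq, joined_qual
```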

	      --fastq_maxdiffs positive	integer
		       When using --fastq_mergepairs, specify the maximum num-
		       ber of non-matching nucleotides allowed in the  overlap
		       region. That option has a strong	influence on the merg-
		       ing success rate. The default value is 10.

	      --fastq_maxdiffpct real
		       When using --fastq_mergepairs, specify the maximum per-
		       centage	of  non-matching  nucleotides  allowed	in the
		       overlap region. The default value is 100.0%. There  are
		       other more sophisticated	rules in the merging algorithm
		       that will discard read pairs with a  high  fraction  of
		       mismatches.

	      --fastq_maxee real
		       When   using   --fastq_filter,	--fastq_mergepairs  or
		       --fastx_filter, discard sequences with  more  than  the
		       specified number	of expected errors.

	      --fastq_maxee_rate real
		       When  using  --fastq_filter  or --fastx_filter, discard
		       sequences with more than	the specified  number  of  ex-
		       pected errors per base.

	      --fastq_maxlen positive integer
		       When   using   --fastq_filter,	--fastq_mergepairs  or
		       --fastx_filter, discard sequences with  more  than  the
		       specified number	of bases.

	      --fastq_maxmergelen positive integer
		       When  using  --fastq_mergepairs,	 specify  the  maximum
		       length of the merged sequence. By default there	is  no
		       limit.

	      --fastq_maxns positive integer
		       When   using   --fastq_filter,	--fastq_mergepairs  or
		       --fastx_filter, discard sequences with  more  than  the
		       specified number	of N's.

	      --fastq_mergepairs filename
		       Merge  paired-end sequence reads	into one sequence. The
		       forward reads are specified as the argument to this op-
		       tion and	the reverse reads are specified	with the --re-
		       verse option. The merged	sequences are  output  to  the
		       file(s) specified with the --fastaout or	--fastqout op-
		       tions. The non-merged reads can be output to the	 files
		       specified  with	the  --fastaout_notmerged_fwd,	--fas-
		       taout_notmerged_rev,    --fastqout_notmerged_fwd	   and
		       --fastqout_notmerged_rev	 options.  Statistics  may  be
		       output to the file specified with the --eetabbedout op-
		       tion.  Sequences	 are  truncated	 as specified with the
		       --fastq_truncqual option	to remove low-quality bases in
		       the  3'	end.  Sequences	 shorter  than	specified with
		       --fastq_minlen (after truncation) are discarded	(1  by
		       default).  Sequences  with  too	many  ambiguous	 bases
		       (N's), as specified with the --fastq_maxns option, are also
		       discarded  (no  limit  by default). Staggered reads are
		       not merged unless the --fastq_allowmergestagger	option
		       is  specified. The minimum length of the	overlap	region
		       between the reads may be	specified with the --fastq_mi-
		       novlen option (at least 5, default 10). The overlap re-
		       gion may	not include  more  mismatches  than  specified
		       with  the  --fastq_maxdiffs option (10 by default) or a
		       higher percentage of mismatches than specified with the
		       --fastq_maxdiffpct  option  (100.0% by default),	other-
		       wise the	read pair is discarded.	Additional rules  will
		       avoid  merging of reads that cannot be aligned reliably
		       and unambiguously. The minimum and maximum length of
		       the merged sequence may be specified with the
		       --fastq_minmergelen and --fastq_maxmergelen options,
		       respectively. The quality value limits for output files
		       may be specified with the --fastq_qminout and
		       --fastq_qmaxout	options,  but  they  apply only	to the
		       merged	region.	   Other   relevant    options	  are:
		       --fastq_ascii,	  --fastq_maxee,    --fastq_nostagger,
		       --fastq_qmax, --fastq_qmin, and --label_suffix.

	      --fastq_minlen positive integer
		       When  using   --fastq_filter,   --fastq_mergepairs   or
		       --fastx_filter,	discard	 sequences  with less than the
		       specified number	of bases (default 1).

	      --fastq_minmergelen positive integer
		       When  using  --fastq_mergepairs,	 specify  the  minimum
		       length of the merged sequence. The default is 1.

	      --fastq_minovlen positive	integer
		       When  using  --fastq_mergepairs,	 specify  the  minimum
		       overlap between the merged reads. The  default  is  10.
		       Must be at least	5.

	      --fastq_nostagger
		       When  using  --fastq_mergepairs,	 forbid	the merging of
		       staggered read pairs. This is the default behaviour  of
		       --fastq_mergepairs.  To	change that behaviour, see the
		       --fastq_allowmergestagger option.

	      --fastq_qmax positive integer
		       Specify the maximum quality score accepted when reading
		       FASTQ  files. The default is 41,	which is usual for re-
		       cent Sanger/Illumina 1.8+ files.

	      --fastq_qmaxout positive integer
		       When  using  --fastq_mergepairs,	  --fastq_convert   or
		       --sff_convert,  specify	the maximum quality score used
		       when writing FASTQ files. The default is	41,  which  is
		       usual for recent	Sanger/Illumina	1.8+ files. Older for-
		       mats may	use a maximum quality score of 40.  The	 limit
		       only   applies	to   the   merged  region  when	 using
		       --fastq_mergepairs.

	      --fastq_qmin positive integer
		       Specify the minimum quality score  accepted  for	 FASTQ
		       files.  The  default  is	 0,  which is usual for	recent
		       Sanger/Illumina	1.8+  files.  Older  formats  may  use
		       scores between -5 and 2.

	      --fastq_qminout positive integer
		       When   using   --fastq_mergepairs,  --fastq_convert  or
		       --sff_convert, specify the minimum quality  score  used
		       when  writing  FASTQ  files. The	default	is 0, which is
		       usual for Sanger/Illumina 1.8+ files. Older versions of
		       the  format  may	use scores between -5 and 2. The limit
		       applies	only  to  the	merged	 region	  when	 using
		       --fastq_mergepairs.

	      --fastq_stats filename
		       Analyze	a FASTQ	file and report	the number of reads it
		       contains. The quality encoding and the range of quality
		       values may be specified with --fastq_ascii, --fastq_qmin
		       and --fastq_qmax. This command requires the --log op-
		       tion and outputs the following detailed statistics on
		       read length, quality score, length vs. quality  distri-
		       butions,	and length / quality filtering:

		       Read length distribution:

			      1.  L: read length.

			      2.  N: number of reads.

			      3.  Pct: fraction	of reads with this length.

			      4.  AccPct: fraction of reads with this length
				  or longer.

		       Quality score distribution:

			      1.  ASCII: character encoding the	quality	score.

			      2.  Q: Phred quality score.

			      3.  Pe: probability of error associated with the
				  quality score.

			      4.  N: number of bases with this quality score.

			      5.  Pct:	fraction  of  bases  with this quality
				  score.

			      6.  AccPct: fraction of bases with this quality
				  score	or higher.

		       Length vs. quality distribution:

			      1.  L: position in reads (starting from position
				  2).

			      2.  PctRecs: fraction of	reads  with  at	 least
				  this length.

			      3.  AvgQ:	 average  quality score	over all reads
				  up to	this position.

			      4.  P(AvgQ): error probability corresponding  to
				  AvgQ.

			      5.  AvgP:	average	error probability.

			      6.  AvgEE: average expected error over all reads
				  up to	this position.

			      7.  Rate: growth rate of AvgEE between this po-
				  sition and position - 1.

			      8.  RatePct: Rate (as explained above) expressed
				  as a percentage.

		       Effect of expected error	and length filtering:
			      The first	column indicates read lengths (L). The
			      next  four  columns indicate the number of reads
			      that would be  retained  by  the	--fastq_filter
			      command  if the reads were truncated at length L
			      (option --fastq_trunclen L) and filtered to have
			      a	 maximum  expected  error of 1.0, 0.5, 0.25 or
			      0.1 (with	the option --fastq_maxee  float).  The
			      last four	columns	indicate the fraction of reads
			      that would be  retained  by  the	--fastq_filter
			      command  using  the  same	length and maximum ex-
			      pected error parameters.

		       Effect of minimum quality and length filtering:
			      The first	column indicates read  lengths	(Len).
			      The  next	 four columns indicate the fraction of
			      reads that would be retained by the --fastq_fil-
			      ter  command  if	the  reads  were  truncated at
			      length Len (option --fastq_trunclen Len)	or  at
			      the first	position with a	quality	Q below	5, 10,
			      15 or 20 (option --fastq_truncqual Q).

	      --fastq_stripleft	positive integer
		       When using --fastq_filter or --fastx_filter, strip  the
		       specified  number  of  bases  from  the left end	of the
		       reads.

	      --fastq_stripright positive integer
		       When using --fastq_filter or --fastx_filter, strip  the
		       specified  number  of  bases  from the right end	of the
		       reads.

	      --fastq_tail positive integer
		       When using --fastq_chars, count the number of  times  a
		       series  of characters of	length k appears at the	end of
		       quality strings.	By default, k =	4.

	      --fastq_truncee real
		       When using --fastq_filter or  --fastx_filter,  truncate
		       sequences  so  that  their  total expected error	is not
		       higher than the specified value.

	      --fastq_trunclen positive	integer
		       When using --fastq_filter or  --fastx_filter,  truncate
		       sequences  to  the  specified length. Shorter sequences
		       are discarded.

	      --fastq_trunclen_keep positive integer
		       When using --fastq_filter or  --fastx_filter,  truncate
		       sequences  to  the  specified length. Shorter sequences
		       are not discarded.

	      --fastq_truncqual	positive integer
		       When using --fastq_filter or  --fastx_filter,  truncate
		       sequences  starting from	the first base with the	speci-
		       fied base quality score value or	lower.
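
		       The difference between the truncation options above can
		       be sketched in Python. The function names are hypothet-
		       ical and a quality offset of 33 is assumed; a return
		       value of None stands for a discarded read.

```python
def truncate_by_quality(seq, qual, truncqual, ascii_offset=33):
    """Cut the read at the first base whose quality is <= truncqual."""
    for i, ch in enumerate(qual):
        if ord(ch) - ascii_offset <= truncqual:
            return seq[:i], qual[:i]
    return seq, qual


def truncate_by_length(seq, qual, length, keep_shorter=False):
    """Truncate to `length` bases.

    keep_shorter=False mimics --fastq_trunclen (shorter reads discarded);
    keep_shorter=True mimics --fastq_trunclen_keep.
    """
    if len(seq) < length and not keep_shorter:
        return None
    return seq[:length], qual[:length]
```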

	      --fastqout filename
		       When  using   --fastq_filter,   --fastq_mergepairs   or
		       --fastx_filter, write to	the given FASTQ-formatted file
		       the sequences passing the filter,  or  the  merged  se-
		       quences.

	      --fastqout_rev filename
		       When  using  --fastq_filter or --fastx_filter, write to
		       the given FASTQ-formatted file the reverse reads	 pass-
		       ing the filter.

	      --fastqout_discarded filename
		       When  using --fastq_filter or --fastx_filter, write se-
		       quences that do not pass	the filter to the given	FASTQ-
		       formatted file.

	      --fastqout_discarded_rev filename
		       When  using --fastq_filter or --fastx_filter, write re-
		       verse reads that	do not pass the	filter	to  the	 given
		       FASTQ-formatted file.

	      --fastqout_notmerged_fwd filename
		       When  using --fastq_mergepairs, write forward reads not
		       merged to the specified FASTQ file.

	      --fastqout_notmerged_rev filename
		       When using --fastq_mergepairs, write reverse reads  not
		       merged to the specified FASTQ file.

	      --fastx_filter filename
		       Trim  and/or filter the sequences in the	given FASTA or
		       FASTQ file and output the remaining  sequences  to  the
		       FASTQ  file specified with the --fastqout option	and/or
		       to the FASTA file specified with	the --fastaout option.
		       Discarded  sequences are	written	to the files specified
		       with the	--fastaout_discarded and  --fastqout_discarded
		       options.	The input format (FASTA	or FASTQ) is automati-
		       cally detected. If the input  consists  of  paired  se-
		       quences,	an input file with reverse reads may be	speci-
		       fied with the --reverse option, and corresponding  out-
		       put  will  be  written  to the files specified with the
		       --fastqout_rev, --fastaout_rev, --fastqout_dis-
		       carded_rev, and --fastaout_discarded_rev options. Out-
		       put cannot be written to FASTQ files if the input is
		       in FASTA format. The sequences are first trimmed and
		       then filtered based on the remaining  bases.  Sequences
		       may  be	trimmed	 using	the options --fastq_stripleft,
		       --fastq_stripright, --fastq_truncee,  --fastq_trunclen,
		       --fastq_trunclen_keep  and  --fastq_truncqual.  The se-
		       quences	 may   be   filtered   using	the    options
		       --fastq_maxee,	 --fastq_maxee_rate,   --fastq_maxlen,
		       --fastq_maxns,	  --fastq_minlen     (default	   1),
		       --fastq_trunclen,  --maxsize,  and --minsize. Sequences
		       not satisfying  the  requirements  are  discarded.  For
		       pairs  of sequences, both sequences in a	pair must sat-
		       isfy the	requirements, otherwise	both are discarded. If
		       no  shortening  or filtering options are	given, all se-
		       quences are written to the output files,	possibly after
		       conversion  from	 FASTQ	to FASTA format. The --relabel
		       option may be used to relabel the output	sequences. The
		       --eeout	option may be used to output the expected num-
		       ber of errors in	each  sequence.	 After	all  sequences
		       have  been  processed, the number of kept and discarded
		       sequences will be shown,	as well	as  how	 many  of  the
		       kept sequences were trimmed. When the input is in FASTA
		       format, the following options are not accepted  because
		       quality	  scores    are	   not	 available:   --eeout,
		       --fastq_ascii,	   --fastq_eeout,	--fastq_maxee,
		       --fastq_maxee_rate, --fastqout, --fastq_qmax,
		       --fastq_qmin,	--fastq_truncee,    --fastq_truncqual,
		       --fastqout_discarded,	     --fastqout_discarded_rev,
		       --fastqout_rev.

	      --fastx_revcomp filename
		       Reverse-complement the sequences in the given FASTA or
		       FASTQ file and write them to the file(s) specified with
		       the --fastaout and/or --fastqout options. If the input
		       file is in FASTA format, the output cannot be written
		       to a FASTQ file because base quality scores are missing.

	      --join_padgap string
		       When running --fastq_join, use the string as a sequence
		       padding string. The default is NNNNNNNN (8 N's).

	      --join_padgapq string
		       When  running --fastq_join, use the string as a quality
		       padding string. The default is a	string of I's equal in
		       length  to  the	sequence  padding string. The letter I
		       corresponds to a	base quality score of 40 indicating  a
		       very  high  quality  base  with	error  probability  of
		       0.0001.

	      --label_suffix string
		       When using --fastx_revcomp or  --fastq_mergepairs,  add
		       the suffix string to sequence headers.

	      --maxsize	positive integer
		       When  using  --fastq_filter  or --fastx_filter, discard
		       sequences with an abundance higher than	the  specified
		       value.

	      --minsize	positive integer
		       When  using  --fastq_filter  or --fastx_filter, discard
		       sequences with an abundance lower  than	the  specified
		       value.

	      --output filename
		       When  using  --fastq_eestats or --fastq_eestats2, write
		       tabulated results to  filename.	See  --fastq_eestats's
		       and --fastq_eestats2's documentation for	a complete de-
		       scription of the	table.

	      --relabel_keep
		       When using --relabel, keep the old  identifier  in  the
		       header after a space.

	      --relabel	string
		       Please  see  the	 description  of the same option under
		       Chimera detection for details.

	      --relabel_md5
		       Please see the description of  the  same	 option	 under
		       Chimera detection for details.

	      --relabel_self
		       Please  see  the	 description  of the same option under
		       Chimera detection for details.

	      --relabel_sha1
		       Please see the description of  the  same	 option	 under
		       Chimera detection for details.

	      --reverse	filename
		       When using --fastq_filter, --fastx_filter,
		       --fastq_mergepairs or --fastq_join, specify the FASTQ
		       file containing the reverse reads.

	      --sff_convert filename
		       Convert	the  given SFF file to FASTQ. The FASTQ	output
		       file is specified with the --fastqout option.  The  se-
		       quence  may  be clipped as specified in the SFF file if
		       the option --sff_clip is	specified, otherwise no	 clip-
		       ping  occurs.  Bases  that  would have been clipped are
		       converted to lower case,	while the  rest	 is  in	 upper
		       case. The output	quality	encoding may be	specified with
		       the --fastq_asciiout option (default 33).  The  minimum
		       and  maximum output quality scores may be limited using
		       the --fastq_qminout and --fastq_qmaxout options.

	      --sff_clip
		       Specifies  that	the   sequences	  converted   by   the
		       --sff_convert command should be clipped in both ends as
		       indicated in the	SFF file. By default  no  clipping  is
		       performed.

	      --xsize  Strip abundance information from	the headers when writ-
		       ing the output file.

	      --xee    Strip information about expected	errors (ee)  from  the
		       output  file  headers. This information is added	by the
		       --fastq_eeout and --eeout options.

       Masking options:

	      An input sequence	can be composed	of lower-  or  uppercase  let-
	      ters.  When  soft	 masking  is specified,	lower case letters are
	      treated as symbols that should be	masked.	Otherwise the case  of
	      the input	sequences is ignored.

	      Masking  is  performed  by  the  commands	 for chimera detection
	      (uchime_denovo,  uchime_ref),  clustering	 (cluster_fast,	 clus-
	      ter_smallmem,  cluster_size),  masking  (maskfasta, fastx_mask),
	      pairwise alignment (allpairs_global) and	searching  (search_ex-
	      act, usearch_global).

	      Masking  is usually specified with the --qmask option, while the
	      --dbmask option is used for  the	database  sequences  specified
	      with  the	 --db option with the --usearch_global,	--search_exact
	      and --uchime_ref commands.

	      The argument to the --qmask and --dbmask options may be none,
	      soft or dust. If the argument is none, no masking is per-
	      formed. If the argument is soft, the lower case symbols are
	      masked. Finally, if the argument is dust, the sequence is masked
	      using the	DUST algorithm by Tatusov and Lipman to	mask  low-com-
	      plexity regions.

	      If  the  --hardmask  option is specified,	all masked regions are
	      converted	to N's,	otherwise  masked  regions  are	 indicated  by
	      lower case letters.

	      If  any  sequence	 is masked, the	masked version of the sequence
	      (with lower case letters or N's) is used in  all	output	files.
	      Otherwise	 the  sequence is unmodified. The exception is the se-
	      quences in the output file specified with	the  --uchimealns  op-
	      tion,  where  the	 input	sequences  are converted to upper case
	      first and	lower case letters indicate disagreement  between  the
	      aligned sequences.

	      The  --qmask  option (or --dbmask	for database sequences)	may be
	      combined with the	--hardmask option. The results	of  using  the
	      none, dust or soft argument to --qmask or	--dbmask are presented
	      below, assuming each input sequence contains both	lower and  up-
	      percase symbols.

	      Results if the --hardmask	option is off (default):

		     none:    no masking, all symbols used, no change

		     dust:    masked symbols lowercased, rest uppercased

		     soft:    lowercase	symbols	masked,	no case	changes

	      Results if the --hardmask	option is on:

		     none:    no masking, all symbols used, no change

		     dust:    masked symbols changed to	Ns, rest unchanged

		     soft:    lowercase	symbols	masked and changed to Ns
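
	      The none and soft rows of the two tables above can be sketched
	      in Python. The function name is hypothetical and the DUST algo-
	      rithm is deliberately left out of this illustration.

```python
def apply_mask(seq, mode="none", hardmask=False):
    """Return the sequence as it would appear in output files.

    Only the 'none' and 'soft' modes are sketched; 'dust' is not implemented.
    """
    if mode == "none":
        # No masking: every symbol is used and nothing changes.
        return seq
    if mode == "soft":
        if hardmask:
            # Lower case symbols are masked and replaced by N's.
            return "".join("N" if ch.islower() else ch for ch in seq)
        # Lower case symbols are masked, but the case is left as is.
        return seq
    raise ValueError("only 'none' and 'soft' are sketched here")
```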

	      When  a  sequence	 region	is masked, words in the	region are not
	      included in the indices used in the heuristic search  algorithm.
	      In all other aspects, the	region is treated as other regions.

	      Regions  in sequences that are hardmasked	(with N's) have	a zero
	      alignment	score and do not contribute to an alignment.

	      --fastaout filename
		       Write the masked	sequences to filename, in  fasta  for-
		       mat. Applies only to the	--fastx_mask command.

	      --fastqout filename
		       Write  the  masked sequences to filename, in fastq for-
		       mat. Applies only to the	--fastx_mask command.

	      --fastx_mask filename
		       Mask regions in sequences contained  in	the  specified
		       fasta  or fastq file. The default is to mask using DUST
		       (use --qmask to modify that behavior). The output files
		       are  specified  with  the --fastaout and	--fastqout op-
		       tions. The minimum and maximum percentage  of  unmasked
		       residues	 may  be specified with	the --min_unmasked_pct
		       and --max_unmasked_pct options, respectively.

	      --hardmask
		       Symbols in masked regions are replaced by N's. The  de-
		       fault  is  to  replace the masked regions by lower case
		       letters.

	      --maskfasta filename
		       Mask regions in sequences contained in the  fasta  file
		       filename.  The  default	is  to	mask  using  dust (use
		       --qmask to modify that behavior). The  output  file  is
		       specified with the --output option. This command is
		       deprecated; please use --fastx_mask instead.

	      --max_unmasked_pct real
		       Discard sequences with more than	the specified  maximum
		       percentage   of	unmasked  residues.  Works  only  with
		       --fastx_mask.

	      --min_unmasked_pct real
		       Discard sequences with less than	the specified  minimum
		       percentage   of	unmasked  residues.  Works  only  with
		       --fastx_mask.

	      --output filename
		       Write the masked sequences to filename, in fasta for-
		       mat. Applies only to the --maskfasta command.

	      --qmask none|dust|soft
		       If  the argument	is dust, mask regions in sequences us-
		       ing the DUST algorithm that detects simple repeats  and
		       low-complexity regions. This is the default. If the ar-
		       gument is soft, mask the	lower case letters in the  in-
		       put sequence. If	the argument is	none, do not mask.

       Orienting options:

	      The  --orient  command  can be used to orient the	sequences in a
	      given file in either the forward or  the	reverse	 complementary
	      direction	 based on a reference database specified with the --db
	      option. The two strands of each input sequence are  compared  to
	      the reference database using nucleotide words. If one of the
	      strands shares many more words with at least one sequence in the
	      database than the other, that strand is chosen. The correctly
	      oriented sequences may be	written	to a FASTA file	specified with
	      the --fastaout option, and to a FASTQ file specified with the
	      --fastqout option (as long as the input was also in FASTQ for-
	      mat). If the result is uncertain, because the number of matching
	      words is too similar, the	original sequence is  written  to  the
	      file  specified  with  the  --notmatched option. The results may
	      also be written to a tab-delimited text file specified with  the
	      --tabbedout  option. This	file will contain the query label, the
	      direction	(+, - or ?), the number	of matching words on the  for-
	      ward  strand,  and  the  number of matching words	on the reverse
	      complementary strand. By default,	a word length of  12  is  used
	      for  this	 command.  The	word  length may be adjusted using the
	      --wordlength option. There must be at least 4 times as many
	      matches on one strand as on the other for a strand to be se-
	      lected. In addition to the common options, the following options
	      may also be specified for	this command: --dbmask,	--qmask, --re-
	      label, --relabel_keep,  --relabel_md5,  --relabel_self,  --rela-
	      bel_sha1,	--sizein, and --sizeout.

	      --db filename
		       Read the	reference database from	the given file.	It may
		       be in FASTA, FASTQ or UDB format. If a UDB file is
		       used, it should have been created with a wordlength of
		       12.

	      --fastaout filename
		       Write the correctly oriented sequences to filename,  in
		       fasta format.

	      --fastqout filename
		       Write  the correctly oriented sequences to filename, in
		       fastq format.

	      --notmatched filename
		       Write the  sequences  with  undetermined	 direction  to
		       filename, in the original format.

	      --orient filename
		       Orient the sequences in the given file.

	      --tabbedout filename
		       Write the results to a tab-delimited text file with the
		       specified filename. This	file will  contain  the	 query
		       label,  the direction (+, - or ?), the number of	match-
		       ing words on the	forward	 strand,  and  the  number  of
		       matching	words on the reverse complementary strand.

       Restriction site	cutting	options:

	      The input	sequences in the file specified	with the --cut command
	      are cut into fragments at	all  restriction  sites	 matching  the
	      pattern  given  with  the	--cut_pattern option. The fragments on
	      the forward strand are written to the file specified with the
	      --fastaout option and the fragments on the reverse strand are
	      written to the file specified with the --fastaout_rev option.
	      Input sequences that do not match are written to the file speci-
	      fied with the --fastaout_discarded option, and their reverse
	      complements are written to the file specified with the --fas-
	      taout_discarded_rev option. The relabel options (--relabel,
	      --relabel_self, --relabel_keep, --relabel_md5, and --rela-
	      bel_sha1) may be used to relabel the output sequences.

	      --cut filename
		       Specify the input file with sequences in	FASTA format.

	      --cut_pattern string
		       Specify the restriction site cutting pattern and	 posi-
		       tions.  The  pattern is a string	of lower- or uppercase
		       letters specifying the nucleotides that must match, and
		       may  include  ambiguous nucleotide symbols. The special
		       characters "^" (circumflex) and	"_"  (underscore)  are
		       used  to	 indicate  the cutting position	on the forward
		       and reverse strand, respectively. For example, the pat-
		       tern "G^AATT_C" is the pattern for the EcoRI restric-
		       tion site. For such palindromic patterns (identical to
		       their reverse complement) the command will output all
		       possible	fragments on both strands. For non-palindromic
		       sites,  it  may be necessary to run the command also on
		       the reverse complemented	input sequences.  Exactly  one
		       cutting site on each strand must	be indicated.
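
		       How such a pattern decomposes into a recognition site
		       and two cut offsets can be sketched in Python. Only
		       cutting on the forward strand is shown; the function
		       names are hypothetical and the exact fragment bound-
		       aries produced by vsearch may differ from this sketch.

```python
# Bases matched by each IUPAC nucleotide symbol.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def parse_cut_pattern(pattern):
    """Split a pattern such as 'G^AATT_C' into site and cut offsets."""
    if pattern.count("^") != 1 or pattern.count("_") != 1:
        raise ValueError("exactly one '^' and one '_' are required")
    site = ""
    fwd_cut = rev_cut = None
    for ch in pattern.upper():
        if ch == "^":
            fwd_cut = len(site)
        elif ch == "_":
            rev_cut = len(site)
        else:
            site += ch
    return site, fwd_cut, rev_cut


def cut_forward(seq, pattern):
    """Cut `seq` at every forward-strand match of the pattern."""
    site, fwd_cut, _ = parse_cut_pattern(pattern)
    upper = seq.upper()
    fragments, start = [], 0
    for pos in range(len(upper) - len(site) + 1):
        window = upper[pos:pos + len(site)]
        if all(base in IUPAC.get(sym, "") for sym, base in zip(site, window)):
            fragments.append(seq[start:pos + fwd_cut])
            start = pos + fwd_cut
    fragments.append(seq[start:])
    return fragments
```

		       In this sketch a sequence without any match comes back
		       as a single fragment; vsearch instead writes such se-
		       quences to the file given with --fastaout_discarded.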

	      --fastaout filename
		       Specify	the output file	for the	resulting fragments on
		       the forward strand.

	      --fastaout_rev filename
		       Specify the output file for the resulting fragments  on
		       the reverse strand.

	      --fastaout_discarded filename
		       Specify the output file for the non-matching sequences.

	      --fastaout_discarded_rev filename
		       Specify the output file for the non-matching sequences,
		       reverse complemented.

       Pairwise	alignment options:

	      The results of the n * (n - 1) / 2 pairwise alignments are writ-
	      ten to the result files specified with --alnout, --blast6out,
	      --fastapairs, --matched, --notmatched, --samout, --uc or
	      --userout (see Searching section below). Specify either the
	      --acceptall option to output all pairwise	alignments, or specify
	      an  identity  level  with	 --id to discard weak alignments. Most
	      other accept/reject options (see Searching  options  below)  may
	      also  be	used. Sequences	are aligned on their plus strand only.
	      Masking is performed as usual and	 specified  with  --qmask  and
	      --hardmask.
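
	      The n * (n - 1) / 2 figure is the number of unordered pairs
	      among n sequences. A minimal Python sketch, with a hypothetical
	      function name, enumerates them:

```python
from itertools import combinations


def all_pairs(labels):
    """List every unordered pair of sequence labels exactly once."""
    return list(combinations(labels, 2))
```

	      For 5 sequences this yields 5 * 4 / 2 = 10 pairs.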

	      --acceptall
		       Write  the  results  of all alignments to output	files.
		       This option overrides all other	accept/reject  options
		       (including --id).

	      --allpairs_global	filename
		       Perform	optimal	 global	pairwise alignments of all vs.
		       all fasta sequences contained in	filename. This command
		       is multi-threaded.

	      --id real
		       Reject  the  sequence match if the pairwise identity is
		       lower than real (value ranging  from  0.0  to  1.0  in-
		       cluded).

	      --threads	positive integer
		       Number of computation threads to use (1 to 1024). The
		       number of threads should be less than or equal to the num-
		       ber of available CPU cores. The default is to use all
		       available resources and to launch one thread per	 logi-
		       cal core.

	      --uc filename
		       Output  pairwise	 alignment results in filename using a
		       tab-separated uclust-like format	with 10	columns.  Each
		       sequence	 is  compared  to all other sequences, and all
		       hits (--acceptall) or only some hits (--id real) are
		       reported, with one pairwise comparison per line:

			      1.  Record type, always set to 'H'.

			      2.  Ordinal number of the	target sequence	(based
				  on input order, starting from	zero).

			      3.  Sequence length.

			      4.  Percentage of	similarity with	the target se-
				  quence.

			      5.  Match	orientation, always set	to '+'.

			      6.  Not used, always set to zero.

			      7.  Not used, always set to zero.

			      8.  Compact   representation   of	 the  pairwise
				  alignment using the  CIGAR  format  (Compact
				  Idiosyncratic	 Gapped	 Alignment  Report): M
				  (match/mismatch), D (deletion) and I (inser-
				  tion). The equal sign	'=' indicates that the
				  query	is identical to	the centroid sequence.

			      9.  Label	of the query sequence.

			      10. Label	of the target sequence.
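
		       As an illustration only, the CIGAR column (field 8)
		       can be expanded with a short Python sketch. The
		       function names are hypothetical, and treating a
		       missing count as 1 is an assumption about the
		       compact form, not something stated above.

```python
import re

# One operation: optional run length followed by M, D or I.
CIGAR_OP = re.compile(r"(\d*)([MDI])")


def parse_uc_cigar(cigar):
    """Return a list of (count, operation) pairs for a uc CIGAR string.

    '=' (query identical to target) and '*' (no alignment) yield an empty list.
    """
    if cigar in ("=", "*"):
        return []
    operations = []
    position = 0
    for match in CIGAR_OP.finditer(cigar):
        if match.start() != position:
            raise ValueError(f"unexpected characters in CIGAR string {cigar!r}")
        count = int(match.group(1)) if match.group(1) else 1  # assumed default
        operations.append((count, match.group(2)))
        position = match.end()
    if position != len(cigar):
        raise ValueError(f"unexpected characters in CIGAR string {cigar!r}")
    return operations


def alignment_columns(cigar):
    """Total number of alignment columns described by the CIGAR string."""
    return sum(count for count, _ in parse_uc_cigar(cigar))
```

		       For example, parse_uc_cigar("3M2I5M") yields three
		       runs that together span 10 alignment columns.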

       Searching options:

	      --alnout filename
		       Write pairwise global alignments	to  filename  using  a
		       human-readable format. Use --rowlen to modify alignment
		       length. Output  order  may  vary	 when  using  multiple
		       threads.

	      --biomout	filename
		       Write  search  results to an OTU	table in the biom ver-
		       sion 1.0	file format. The query file contains the  sam-
		       ples, while the database	file contains the OTUs.	Sample
		       and OTU identifiers are extracted from  the  header  of
		       these  sequences. See the --biomout option in the Clus-
		       tering section for further details.

	      --blast6out filename
		       Write search results to	filename  using	 a  blast-like
		       tab-separated  format  of twelve	fields (listed below),
		       with one	line per query-target  matching	 (or  lack  of
		       matching	if --output_no_hits is used). Warning, vsearch
		       uses global pairwise alignments,	not blast's  seed-and-
		       extend  algorithm.  Therefore, some common blast	output
		       values (alignment start and end,	evalue,	bit score) are
		       reported	 differently. Output order may vary when using
		       multiple threads. A similar output can be obtained with
		       --userout    filename   and   --userfields   query+tar-
		       get+id+alnlen+mism+opens+qlo+qhi+tlo+thi+evalue+bits.
		       A  complete  list  and  description is available	in the
		       section 'Userfields' of this manual.

			      1.  query: query label.

			      2.  target: target  (database  sequence)	label.
				  The  field  is  set  to  '*'	if there is no
				  alignment.

			      3.  id: percentage of identity (real value rang-
				  ing from 0.0 to 100.0). The percentage iden-
				  tity is defined as 100 * (matching  columns)
				  /  (alignment	 length	 - terminal gaps). See
				  fields id0 to	id4 for	other definitions.

			      4.  alnlen: length of the	query-target alignment
				  (number  of  columns). The field is set to 0
				  if there is no alignment.

			      5.  mism:	number of mismatches in	the  alignment
				  (zero	or positive integer value).

			      6.  opens:  number  of  columns containing a gap
				  opening (zero	or positive integer value).

			      7.  qlo: first nucleotide	of the	query  aligned
				  with	the target. Always equal to 1 if there
				  is an	alignment, 0 otherwise	(see  qilo  to
				  ignore initial gaps).

			      8.  qhi:	last  nucleotide  of the query aligned
				  with the target. Always equal	to the	length
				  of  the pairwise alignment, 0	otherwise (see
				  qihi to ignore terminal gaps).

			      9.  tlo: first nucleotide	of the target  aligned
				  with	the  query. Always equal to 1 if there
				  is an	alignment, 0 otherwise	(see  tilo  to
				  ignore initial gaps).

			      10. thi:	last  nucleotide of the	target aligned
				  with the query. Always equal to  the	length
				  of  the pairwise alignment, 0	otherwise (see
				  tihi to ignore terminal gaps).

			      11. evalue: expectancy-value (not	 computed  for
				  nucleotide alignments). Always set to	-1.

			      12. bits:	bit score (not computed	for nucleotide
				  alignments). Always set to 0.
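
		       The twelve fields listed above map naturally onto a
		       small record type. The following Python sketch is
		       not part of vsearch; the class and function names
		       are hypothetical, and the field called id above is
		       renamed identity to avoid shadowing a Python
		       built-in.

```python
from typing import NamedTuple


class Blast6Hit(NamedTuple):
    query: str
    target: str
    identity: float  # field 3, 'id' in the manual
    alnlen: int
    mism: int
    opens: int
    qlo: int
    qhi: int
    tlo: int
    thi: int
    evalue: float
    bits: float


def parse_blast6_line(line):
    """Split one tab-separated --blast6out line into a Blast6Hit."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 12:
        raise ValueError(f"expected 12 fields, found {len(fields)}")
    query, target, identity, *integers, evalue, bits = fields
    return Blast6Hit(
        query,
        target,
        float(identity),
        *(int(value) for value in integers),  # alnlen .. thi
        float(evalue),
        float(bits),
    )
```

		       A line can then be read with
		       parse_blast6_line(line).identity, and so on for the
		       other fields.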

	      --db filename
		       Compare	query	sequences   (specified	 with	--use-
		       arch_global)  to	 the  fasta-formatted target sequences
		       contained in filename, using global pairwise alignment.
		       Alternatively,  the name	of a preformatted UDB database
		       created using the --makeudb_usearch command (see below)
		       may be specified.

	      --dbmask none|dust|soft
		       Mask regions in the target database sequences using the
		       dust method or the soft method, or do not mask  (none).
		       Warning,	when using soft	masking	search commands	become
		       case sensitive. The default is to mask using dust.

	      --dbmatched filename
		       Write database target sequences matching	at  least  one
		       query sequence to filename, in fasta format. If the op-
		       tion --sizeout is used,	the  number  of	 queries  that
		       matched	each  target  sequence	is indicated using the
		       pattern ";size=integer;".

	      --dbnotmatched filename
		       Write database target sequences not matching query  se-
		       quences to filename, in fasta format.

	      --fastapairs filename
		       Write pairwise alignments of query and target sequences
		       to filename, in fasta format.

	      --fulldp Dummy option for compatibility with usearch.  To	 maxi-
		       mize  search  sensitivity,  vsearch uses an 8-way 16-bit
		       SIMD  vectorized	 full  dynamic	programming  algorithm
		       (Needleman-Wunsch),  whether  or	not --fulldp is	speci-
		       fied.

	      --gapext string
		       Set penalties for a gap extension. See --gapopen	for  a
		       complete	description of the penalty declaration system.
		       The default is to initialize the	six gap	extending pen-
		       alties using a penalty of 2 for extending internal gaps
		       and a penalty of	1 for extending	terminal gaps, in both
		       query and target	sequences (i.e.	2I/1E).

	      --gapopen	string
		       Set  penalties for a gap	opening. A gap opening can oc-
		       cur in six different contexts: in the query (Q)	or  in
		       the  target  (T)	sequence, at the left (L) or right (R)
		       extremity of the	sequence, or inside the	sequence  (I).
		       Sequence	 symbols  (Q and T) can	be combined with loca-
		       tion symbols (L,	I, and R), and numerical values	to de-
		       clare	penalties    for    all	  possible   contexts:
		       aQL/bQI/cQR/dTL/eTI/fTR,	where abcdef are zero or posi-
		       tive integers, and '/' is used as a separator.
		       To  simplify  declarations, the location	symbols	(L, I,
		       and R) can be combined, the symbol (E) can be  used  to
		       treat  both extremities (L and R) equally, and the sym-
		       bols Q and T can	be omitted to treat query  and	target
		       sequences  equally. For instance, the default is	to de-
		       clare a penalty of 20 for opening internal gaps	and  a
		       penalty of 2 for	opening	terminal gaps (left or right),
		       in both query and target	sequences  (i.e.  20I/2E).  If
		       only  a	numerical value	is given, without any sequence
		       or location symbol, then	the penalty applies to all gap
		       openings.  To  forbid  gap-opening, an infinite penalty
		       value can be declared  with  the	 symbol	 '*'.  To  use
		       vsearch as a semi-global	aligner, a null-penalty	can be
		       applied to the left (L) or right	(R) gaps.
		       vsearch always initializes the six gap  opening	penal-
		       ties using the default parameters (20I/2E). The user is
		       then free to declare only the values they want to
		       modify.	The  string is scanned from left to right, ac-
		       cepted symbols are (0123456789/LIREQT*),	and later val-
		       ues override previous values.
		       Please  note that vsearch, in contrast to usearch, only
		       allows integer gap penalties. Because  the  lowest  gap
		       penalties  are  0.5  by default in usearch, all default
		       scores and gap penalties	in vsearch have	 been  doubled
		       to maintain equivalent penalties	and to produce identi-
		       cal alignments.
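
		       To make the declaration system concrete, here is a
		       Python sketch of one possible reading of the rules
		       above. It is not the vsearch parser: the function
		       names are hypothetical, and it assumes that the
		       number (or '*') comes first in each token and that a
		       token with no location symbol applies to all three
		       locations.

```python
import math
import re

# Each token: a non-negative integer or '*', followed by optional symbols.
TOKEN = re.compile(r"(\d+|\*)([QTLIRE]*)")


def apply_gap_spec(penalties, spec):
    """Update the six-context penalty table in place from a declaration string."""
    for token in spec.split("/"):
        if not token:
            continue
        match = TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"cannot parse gap penalty token {token!r}")
        value = math.inf if match.group(1) == "*" else int(match.group(1))
        symbols = match.group(2)
        # Q and T omitted: treat query and target equally.
        sequences = [s for s in "QT" if s in symbols] or ["Q", "T"]
        locations = set()
        for symbol in symbols:
            if symbol == "E":          # both extremities
                locations.update("LR")
            elif symbol in "LIR":
                locations.add(symbol)
        if not locations:              # assumed: no location means all of them
            locations = set("LIR")
        for sequence in sequences:
            for location in locations:
                penalties[sequence + location] = value


def parse_gapopen(spec=""):
    """Start from the documented defaults (20I/2E), then apply the user string."""
    penalties = {}
    apply_gap_spec(penalties, "20I/2E")
    apply_gap_spec(penalties, spec)  # later values override earlier ones
    return penalties
```

		       Under these assumptions, parse_gapopen("0L") keeps
		       the defaults but sets the left terminal gap opening
		       penalty to 0 in both query and target, which is the
		       semi-global use mentioned above.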

	      --hardmask
		       Mask sequence regions by	replacing them with Ns instead
		       of  setting  them  to lower case	as is the default. For
		       more information, please	see the	Masking	section.

	      --id real
		       Reject the sequence match if the	pairwise  identity  is
		       lower  than  real  (value  ranging  from	0.0 to 1.0 in-
		       cluded).	The search process sorts target	 sequences  by
		       decreasing  number  of  k-mers they have	in common with
		       the query sequence, using that information as  a	 proxy
		       for  sequence  similarity. That efficient pre-filtering
		       also prevents pairwise alignments with weakly  matching
		       targets,	 as there needs	to be at least 6 shared	k-mers
		       to start	the pairwise alignment,	and at least  one  out
		       of  every  16  k-mers from the query needs to match the
		       target. Consequently, using values lower	than --id  0.5
		       is  not likely to capture more weakly matching targets.
		       The pairwise identity is	by default defined as the num-
		       ber  of (matching columns) / (alignment length -	termi-
		       nal gaps). That definition can be modified by --iddef.

	      --iddef 0|1|2|3|4
		       Change the pairwise identity definition used  in	 --id.
		       Values accepted are:

			      0.  CD-HIT   definition:	(matching  columns)  /
				  (shortest sequence length).

			      1.  edit distance: (matching columns) /  (align-
				  ment length).

			      2.  edit	distance  excluding terminal gaps (de-
				  fault	definition for --id).

			      3.  Marine Biological  Lab  definition  counting
				  each gap opening (internal or	terminal) as a
				  single mismatch, whether or not the gap  was
				  extended:  1.0  -  [(mismatches  + gap open-
				  ings)/(longest sequence length)]

			      4.  BLAST	definition, equivalent	to  --iddef  1
				  for global pairwise alignments.

		       The  option --userfields	accepts	the fields id0 to id4,
		       in addition to the field	id,  to	 report	 the  pairwise
		       identity	 values	corresponding to the different defini-
		       tions.
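
		       The five definitions can be compared on a single
		       alignment. The Python sketch below is illustrative
		       only: the function name is hypothetical, the
		       alignment is passed as two gapped rows using '-',
		       and terminal gaps are taken to be the gap columns
		       at either end of the alignment.

```python
def identity_definitions(query_row, target_row):
    """Return the identity (0.0 to 1.0) under --iddef 0 to 4 for one alignment."""
    if len(query_row) != len(target_row):
        raise ValueError("aligned rows must have the same length")
    columns = len(query_row)

    matches = mismatches = 0
    for q, t in zip(query_row.upper(), target_row.upper()):
        if q == "-" or t == "-":
            continue
        if q == t:
            matches += 1
        else:
            mismatches += 1

    def gap_openings(row):
        # A gap opening is the first column of each run of '-'.
        return sum(
            1 for i, c in enumerate(row) if c == "-" and (i == 0 or row[i - 1] != "-")
        )

    # Count gap columns at the left and right ends of the alignment.
    left = 0
    while left < columns and "-" in (query_row[left], target_row[left]):
        left += 1
    right = 0
    while right < columns - left and "-" in (query_row[-1 - right], target_row[-1 - right]):
        right += 1
    terminal = left + right

    query_length = columns - query_row.count("-")
    target_length = columns - target_row.count("-")
    openings = gap_openings(query_row) + gap_openings(target_row)

    return {
        0: matches / min(query_length, target_length),
        1: matches / columns,
        2: matches / (columns - terminal),
        3: 1.0 - (mismatches + openings) / max(query_length, target_length),
        4: matches / columns,  # same as definition 1 for global alignments
    }
```

		       For the rows "--ACGTACGT" and "TTACGTA-GT" there are
		       7 matching columns out of 10, 2 of which are
		       terminal gaps, so definitions 1 and 2 give 0.7 and
		       0.875 respectively.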

	      --idprefix positive integer
		       Reject the sequence match if the	first integer  nucleo-
		       tides of	the target do not match	the query.

	      --idsuffix positive integer
		       Reject  the  sequence match if the last integer nucleo-
		       tides of	the target do not match	the query.

	      --leftjust
		       Reject the sequence match if the	pairwise alignment be-
		       gins with gaps.

	      --match integer
		       Score  assigned to a match (i.e.	identical nucleotides)
		       in the pairwise alignment. The default value is 2.

	      --matched	filename
		       Write query  sequences  matching	 database  target  se-
		       quences to filename, in fasta format.

	      --maxaccepts positive integer
		       Maximum	number	of  hits to accept before stopping the
		       search. The default value is 1. This option works to-
		       gether with --maxrejects. The search process sorts target
		       sequences by decreasing number of k-mers	they  have  in
		       common  with the	query sequence,	using that information
		       as a proxy  for	sequence  similarity.  After  pairwise
		       alignments, if the first target sequence passes the ac-
		       ceptance criteria, it is accepted as best hit and the
		       search process stops for	that query. If --maxaccepts is
		       set to a	higher	value,	more  hits  are	 accepted.  If
		       --maxaccepts  and  --maxrejects	are both set to	0, the
		       complete	database is searched.

	      --maxdiffs positive integer
		       Reject the sequence match if the	alignment contains  at
		       least integer substitutions, insertions or deletions.

	      --maxgaps	positive integer
		       Reject  the sequence match if the alignment contains at
		       least integer insertions	or deletions.

	      --maxhits	non-negative integer
		       Maximum number of hits to show once the search is  ter-
		       minated	(hits  are sorted by decreasing	identity). Un-
		       limited by default or if the argument is zero. This op-
		       tion  applies  to  --alnout, --blast6out, --fastapairs,
		       --samout, --uc, or --userout output files.

	      --maxid real
		       Reject the sequence match if the	percentage of identity
		       between the two sequences is greater than real.

	      --maxqsize positive integer
		       Reject  query  sequences	with an	abundance greater than
		       integer.

	      --maxqt real
		       Reject if the query/target  sequence  length  ratio  is
		       greater than real.

	      --maxrejects positive integer
		       Maximum number of non-matching target sequences to con-
		       sider before stopping the search. The default value  is
		       32. This option works together with --maxaccepts. The
		       search process sorts  target  sequences	by  decreasing
		       number of k-mers	they have in common with the query se-
		       quence, using that information as a proxy for  sequence
		       similarity.  After  pairwise alignments,	if none	of the
		       first 32 examined target sequences pass the acceptance
		       criteria,  the  search process stops for	that query (no
		       hit). If	--maxrejects is	set to a  higher  value,  more
		       target  sequences  are  considered. If --maxaccepts and
		       --maxrejects are	both set to 0, the  complete  database
		       is searched.
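
		       The interplay of --maxaccepts and --maxrejects can
		       be modelled with a simplified Python loop. This is
		       a sketch of the stopping rule only, not the vsearch
		       implementation. It assumes the candidates are
		       already sorted by decreasing number of shared
		       k-mers, and that a value of 0 disables the
		       corresponding limit; search_query and is_accepted
		       are hypothetical names.

```python
def search_query(candidates, is_accepted, maxaccepts=1, maxrejects=32):
    """Examine candidates in order and stop once either limit is reached."""
    hits = []
    rejects = 0
    for target in candidates:
        if is_accepted(target):
            hits.append(target)
            if maxaccepts and len(hits) >= maxaccepts:
                break  # enough hits accepted
        else:
            rejects += 1
            if maxrejects and rejects >= maxrejects:
                break  # too many non-matching targets examined
    return hits


# Hypothetical candidates as (label, identity), in k-mer order.
candidates = [("t1", 0.90), ("t2", 0.98), ("t3", 0.99)]
passes_id = lambda target: target[1] >= 0.97

print(search_query(candidates, passes_id))        # stops at the first accepted hit
print(search_query(candidates, passes_id, 0, 0))  # examines every candidate
```

		       In this toy example the default settings return t2
		       and never examine t3, although t3 has the higher
		       identity. Raising --maxaccepts trades speed for a
		       more exhaustive search.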

	      --maxsizeratio real
		       Reject  if  the query/target abundance ratio is greater
		       than real.

	      --maxsl real
		       Reject if the shorter/longer sequence length  ratio  is
		       greater than real.

	      --maxsubs	positive integer
		       Reject  the  sequence  match  if	the pairwise alignment
		       contains	more than integer substitutions.

	      --mid real
		       Reject the sequence match if the	percentage of identity
		       is  lower  than	real  (ignoring	all gaps, internal and
		       terminal).

	      --mincols	positive integer
		       Reject the sequence match if the	 alignment  length  is
		       shorter than integer.

	      --minqt real
		       Reject  if  the	query/target  sequence length ratio is
		       lower than real.

	      --minsizeratio real
		       Reject if the query/target  abundance  ratio  is	 lower
		       than real.

	      --minsl real
		       Reject  if  the shorter/longer sequence length ratio is
		       lower than real.

	      --mintsize positive integer
		       Reject target sequences with an	abundance  lower  than
		       integer.

	      --minwordmatches non-negative integer
		       Minimum	number of word matches required	for a sequence
		       to be considered	further. Default value is 12  for  the
		       default	word  length 8.	For word lengths 3-15, the de-
		       fault minimum word matches are 18, 17, 16, 15, 14,  12,
		       11,  10,	 9,  8,	7, 5 and 3, respectively. If the query
		       sequence	has fewer unique words than the	number	speci-
		       fied,  all  words in the	query must match. If the argu-
		       ment is 0, no word matches are required.
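
		       The defaults listed above can be written as a lookup
		       table. The Python sketch below is illustrative; the
		       names are hypothetical, and the handling of short
		       queries follows the sentence above about queries
		       with fewer unique words than required.

```python
# Default minimum word matches for word lengths 3 to 15, as listed above.
DEFAULT_MIN_WORD_MATCHES = dict(
    zip(range(3, 16), (18, 17, 16, 15, 14, 12, 11, 10, 9, 8, 7, 5, 3))
)


def required_word_matches(wordlength=8, unique_query_words=None, minwordmatches=None):
    """Number of shared words a target needs before it is considered further."""
    if minwordmatches is None:
        required = DEFAULT_MIN_WORD_MATCHES[wordlength]
    else:
        required = minwordmatches
    if unique_query_words is not None:
        # A query with few unique words only needs all of them to match.
        required = min(required, unique_query_words)
    return required
```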

	      --mismatch integer
		       Score assigned to a mismatch  (i.e.  different  nucleo-
		       tides)  in the pairwise alignment. The default value is
		       -4.

	      --mothur_shared_out filename
		       Write search results to an  OTU	table  in  the	mothur
		       'shared'	 tab-separated	plain  text  file  format. The
		       query file contains the	samples,  while	 the  database
		       file  contains the OTUs.	Sample and OTU identifiers are
		       extracted from the header of these sequences.  See  the
		       --otutabout  option  in the Clustering section for fur-
		       ther details.

	      --notmatched filename
		       Write query sequences not matching database target  se-
		       quences to filename, in fasta format.

	      --otutabout filename
		       Write  search  results  to  an OTU table	in the classic
		       tab-separated plain text	format.	The  query  file  con-
		       tains the samples, while	the database file contains the
		       OTUs. Sample and	OTU identifiers	are extracted from the
		       header  of these	sequences. See the --mothur_shared_out
		       option in the Clustering	section	for further details.

	      --output_no_hits
		       Write both matching and non-matching queries  to	 --al-
		       nout,  --blast6out, --samout or --userout output	files.
		       Non-matching queries are	labelled 'No hits' in --alnout
		       files.

	      --pattern	string
		       This  option is ignored.	It is provided for compatibil-
		       ity with	usearch.

	      --qmask none|dust|soft
		       Mask regions in the query sequences using the  dust  or
		       the  soft  algorithms,  or do not mask (none). Warning,
		       when using soft masking	search	commands  become  case
		       sensitive. The default is to mask using dust.

	      --query_cov real
		       Reject if the fraction of the query aligned to the tar-
		       get sequence is lower than real.	The query coverage  is
		       computed	 as  (matches  +  mismatches) /	query sequence
		       length. Internal	or terminal gaps are  not  taken  into
		       account.

	      --rightjust
		       Reject  the  sequence  match  if	the pairwise alignment
		       ends with gaps.

	      --rowlen positive	integer
		       Width of	alignment lines	in --alnout  output.  The  de-
		       fault value is 64. Set to 0 to eliminate	wrapping.

	      --samheader
		       Include header lines in the SAM file when --samout is
		       specified. The header includes lines starting with @HD,
		       @SQ    and    @PG,    but    no	  @RG	 lines	  (see
		       <https://github.com/samtools/hts-specs>). By default no
		       header line is written.

	      --samout filename
		       Write  alignment	results	to filename using the SAM for-
		       mat (a tab-separated text file).	When using the	--sam-
		       header  option,	the SAM	file starts with header	lines.
		       Each non-header line is a SAM record, which  represents
		       either a	query-target alignment or the absence of match
		       for a query (output order may vary when using  multiple
		       threads).  Each record contains 11 mandatory fields and
		       optional	fields (see  <https://github.com/samtools/hts-
		       specs> for a complete description of the	format):

			      1.  query	sequence label.

			      2.  combination  of bitwise flags. Possible val-
				  ues are: 0 (top hit),	4 (no  hit),  16  (re-
				  verse-complemented hit), 256 (secondary hit,
				  i.e. all hits	except the top hit).

			      3.  target sequence label.

			      4.  first	position of a target aligned with  the
				  query	 (always  1 for	global pairwise	align-
				  ments, 0 if there is no match).

			      5.  mapping  quality  (ignored,  always  set  to
				  '*').

			      6.  CIGAR	 string	 (set  to  '*'	if there is no
				  match).

			      7.  name of the target  sequence	matching  with
				  the  next  read of the query (for mate reads
				  only,	ignored	and always set to '*').

			      8.  position of the  primary  alignment  of  the
				  next read of the query (for mate reads only,
				  ignored and always set to 0).

			      9.  target sequence  length  (for	 multi-segment
				  targets, ignored and always set to 0).

			      10. query	 sequence (complete, not only the seg-
				  ment aligned to the target as	usearch	does).

			      11. quality string (ignored, always set to '*').

		       Optional	fields for query-target	 matches  (number  and
		       order of	fields may vary):

			      12. AS:i:?  alignment  score (i.e. percentage of
				  identity).

			      13. XN:i:? next best alignment score (always set
				  to 0).

			      14. XM:i:? number	of mismatches.

			      15. XO:i:?  number  of  gap  openings (excluding
				  terminal gaps).

			      16. XG:i:? number	of gap	extensions  (excluding
				  terminal gaps).

			      17. NM:i:?  edit	distance to the	target (sum of
				  XM and XG).

			      18. MD:Z:? string	for mismatching	positions.

			      19. YT:Z:UU string  representing	the  alignment
				  type.
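
		       The flag values in field 2 combine bitwise, so a
		       secondary hit on the reverse strand would carry
		       256 + 16 = 272. A minimal Python sketch for decoding
		       them follows; the function name is hypothetical and
		       only the four values listed above are handled.

```python
# Bits described for field 2 of each SAM record.
SAM_FLAGS = {4: "no hit", 16: "reverse-complemented hit", 256: "secondary hit"}


def describe_sam_flag(flag):
    """Return human-readable names for the bits set in a SAM flag."""
    names = [name for bit, name in SAM_FLAGS.items() if flag & bit]
    if not flag & 4 and not flag & 256:
        # Neither 'no hit' nor 'secondary hit': this record is the top hit.
        names.insert(0, "top hit")
    return names
```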

	      --search_exact filename
		       Search  for  exact full-length matches to the query se-
		       quences contained in filename in	the database of	target
		       sequences  (--db). Only 100% exact matches are reported
		       and this	command	is much	faster than  --usearch_global.
		       The --id, --maxaccepts and --maxrejects options are ig-
		       nored, but the rest of the  searching  options  may  be
		       specified.

	      --self   Reject  the  sequence match if the query	and target la-
		       bels are	identical.

	      --selfid Reject the sequence match if the	query and  target  se-
		       quences are strictly identical.

	      --sizeout
		       Add  abundance  annotations to the output of the	option
		       --dbmatched (using the  pattern	';size=integer;'),  to
		       report the number of queries that matched each target.

	      --strand plus|both
		       When  searching	for  similar sequences,	check the plus
		       strand only (default) or	check both strands.

	      --target_cov real
		       Reject the sequence match if the	fraction of the	target
		       sequence	 aligned  to  the query	sequence is lower than
		       real. The target	coverage is  computed  as  (matches  +
		       mismatches) / target sequence length.  Internal or ter-
		       minal gaps are not taken	into account.
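
		       Query and target coverage (--query_cov and
		       --target_cov) share the same formula and differ
		       only in the sequence length used as denominator.
		       A small Python sketch, with a hypothetical function
		       name and made-up numbers:

```python
def coverage(matches, mismatches, sequence_length):
    """Fraction of a sequence aligned to the other one, ignoring all gaps."""
    if sequence_length <= 0:
        raise ValueError("sequence length must be positive")
    return (matches + mismatches) / sequence_length


# Hypothetical alignment: 90 matches and 5 mismatches,
# query of 100 nucleotides, target of 120 nucleotides.
query_cov = coverage(90, 5, 100)
target_cov = coverage(90, 5, 120)
```

		       With these numbers the query coverage is 0.95, and
		       the target coverage is 95 / 120, a little under 0.8.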

	      --top_hits_only
		       Only the	top hits between the query  and	 database  se-
		       quence  sets  are  written to the output	specified with
		       the options --alnout, --samout, --userout, --blast6out,
		       --uc,  --fastapairs, --matched or --notmatched (but not
		       --dbmatched and --dbnotmatched).	For  each  query,  the
		       top hit is the one presenting the highest percentage of
		       identity	(see the --iddef  option  to  change  the  way
		       identity	 is  measured).	 For a given query, if several
		       top hits	present	exactly	the same percentage  of	 iden-
		       tity,  the number of hits reported is controlled	by the
		       --maxaccepts value (1 by	default).

	      --uc filename
		       Output searching	results	in filename using a  tab-sepa-
		       rated  uclust-like  format  with	10 columns. When using
		       the --search_exact command, the table layout is the
		       same as with --allpairs_global. When using the
		       --usearch_global command, the table presents two differ-
		       ent types of entries: hit (H) or no hit (N). Each query
		       sequence is compared to all other sequences, and the
		       best hit (--maxaccepts 1) or several hits (--maxaccepts
		       > 1) are reported (H). Output order may vary when using
		       multiple	 threads.  Column content varies with the type
		       of entry	(H or N):

			      1.  Record type: H, or N ('hit' or 'no hit').

			      2.  Ordinal number of the	target sequence	(based
				  on  input order, starting from zero).	Set to
				  '*' for N.

			      3.  Sequence length. Set to '*' for N.

			      4.  Percentage of	similarity with	the target se-
				  quence. Set to '*' for N.

			      5.  Match orientation, '+' or '-'. Set to '.' for N.

			      6.  Not used, always set to zero for H,  or  '*'
				  for N.

			      7.  Not  used,  always set to zero for H,	or '*'
				  for N.

			      8.  Compact  representation  of	the   pairwise
				  alignment  using  the	 CIGAR format (Compact
				  Idiosyncratic	Gapped	Alignment  Report):  M
				  (match/mismatch), D (deletion) and I (inser-
				  tion). The equal sign	'=' indicates that the
				  query	is identical to	the centroid sequence.
				  Set to '*' for N.

			      9.  Label	of the query sequence.

			      10. Label	of the target centroid	sequence.  Set
				  to '*' for N.
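
		       A reader for this ten-column layout might look like
		       the following Python sketch. It is not part of
		       vsearch; the names are hypothetical, placeholder
		       values ('*') are mapped to None, and the two unused
		       columns are dropped.

```python
from typing import NamedTuple, Optional


class UcRecord(NamedTuple):
    record_type: str             # 'H' or 'N'
    target_index: Optional[int]  # column 2
    length: Optional[int]        # column 3
    identity: Optional[float]    # column 4, a percentage
    strand: str                  # column 5: '+', '-' or '.'
    cigar: Optional[str]         # column 8
    query: str                   # column 9
    target: Optional[str]        # column 10


def parse_uc_line(line):
    """Parse one tab-separated --uc line written by a search command."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 10:
        raise ValueError(f"expected 10 columns, found {len(fields)}")

    def unless_star(value, convert):
        return None if value == "*" else convert(value)

    return UcRecord(
        fields[0],
        unless_star(fields[1], int),
        unless_star(fields[2], int),
        unless_star(fields[3], float),
        fields[4],
        unless_star(fields[7], str),
        fields[8],
        unless_star(fields[9], str),
    )
```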

	      --uc_allhits
		       When using the --uc option, show	all hits, not just the
		       top hit for each	query.

	      --usearch_global filename
		       Compare target sequences	(--db) to the  fasta-formatted
		       query  sequences	 contained  in	filename, using	global
		       pairwise	alignment.

	      --userfields string
		       When using --userout, select and	order the fields writ-
		       ten  to	the  output  file. Fields are separated	by '+'
		       (e.g. query+target+id). See  the	 'Userfields'  section
		       for a complete list of fields.

	      --userout	filename
		       Write  user-defined  tab-separated  output to filename.
		       Select the fields with the option --userfields.	Output
		       order  may vary when using multiple threads. If --user-
		       fields is empty or not present, filename	is empty.

	      --weak_id	real
		       Show hits with percentage of identity of	at least real,
		       without	terminating  the search. A normal search stops
		       as soon as enough hits are found	(as defined by	--max-
		       accepts,	 --maxrejects, and --id). As --weak_id reports
		       weak hits that are not counted towards --maxaccepts, high
		       --id  values  can  be used, hence preserving both speed
		       and sensitivity.	Logically, real	must be	 smaller  than
		       the value indicated by --id.

	      --wordlength positive integer
		       Length  of  words  (i.e.	k-mers)	for database indexing.
		       The range of possible values goes from  3  to  15,  but
		       values  near  8	or 9 are generally recommended.	Longer
		       words may reduce	the sensitivity/recall for weak	 simi-
		       larities,  but  can  increase  precision.  On the other
		       hand, shorter words may increase	sensitivity or recall,
		       but  may	 reduce	 precision. Computation	time generally
		       increases with shorter words and	decreases with	longer
		       words, but it increases again for very long words. Mem-
		       ory requirements for a part of the index increase by
		       a  factor  of  4	each time word length increases	by one
		       nucleotide, and this generally becomes significant  for
		       long words (12 or more).	The default value is 8.

       Shuffling options:
	      Fasta entries in the input file are written in a pseudo-random
	      order.

	      --output filename
		       Write the shuffled sequences to filename, in fasta for-
		       mat.

	      --randseed positive integer
		       When  shuffling	sequence order,	use integer as seed. A
		       given seed always produces the same output order	 (use-
		       ful for replicability). Set to 0	to use a pseudo-random
		       seed (default behavior).
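
		       The effect of a fixed seed can be illustrated with
		       Python's own generator. This is an analogy only:
		       vsearch uses its own pseudo-random generator, so the
		       orders produced here will not match its output. The
		       function name is hypothetical.

```python
import random


def shuffle_labels(labels, seed):
    """Return a shuffled copy; seed 0 asks for a non-reproducible order."""
    generator = random.Random(seed if seed != 0 else None)
    shuffled = list(labels)  # leave the input list untouched
    generator.shuffle(shuffled)
    return shuffled


labels = ["s1", "s2", "s3", "s4", "s5"]
first = shuffle_labels(labels, 42)
second = shuffle_labels(labels, 42)
print(first == second)  # the same non-zero seed gives the same order
```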

	      --relabel	string
		       Relabel sequences using the prefix string and a	ticker
		       (1,  2,	3,  etc.)  to  construct  the new headers. Use
		       --sizeout to conserve the abundance annotations.

	      --relabel_keep
		       When relabelling, keep the old identifier in the	header
		       after a space.

	      --relabel_md5
		       Relabel	sequences  using  the MD5 message digest algo-
		       rithm applied to	each sequence. Former sequence headers
		       are  discarded. The sequence is converted to upper case
		       and U is	replaced by T before the digest	 is  computed.
		       The  MD5	 digest	 is  a cryptographic hash function de-
		       signed to minimize the probability that	two  different
		       inputs give the same output, even for very similar,
		       but non-identical inputs. Still,	there is always	a very
		       small,  but non-zero probability	that two different in-
		       puts give the same result. The MD5 digest  generates  a
		       128-bit	(16-byte)  digest  that	 is  represented by 16
		       hexadecimal   numbers   (using	32    symbols	 among
		       0123456789abcdef).  Use --sizeout to conserve the abun-
		       dance annotations.

	      --relabel_self
		       Relabel sequences using the sequence itself as the  la-
		       bel.

	      --relabel_sha1
		       Relabel	sequences  using the SHA1 message digest algo-
		       rithm applied to	each sequence. It is  similar  to  the
		       --relabel_md5  option  but  uses	the SHA1 algorithm in-
		       stead of	the MD5	algorithm. The SHA1 digest generates a
		       160-bit	(20-byte)  result  that	 is  represented by 20
		       hexadecimal numbers (40 symbols). The probability of  a
		       collision  (two non-identical sequences having the same
		       digest) is smaller for the SHA1 algorithm  than	it  is
		       for  the	 MD5  algorithm. Use --sizeout to conserve the
		       abundance annotations.
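
		       The digests used by --relabel_md5 and --relabel_sha1
		       can be reproduced in spirit with Python's hashlib.
		       This sketch follows the normalization described for
		       --relabel_md5 (upper case, U replaced by T) and
		       assumes the same normalization applies to SHA1; the
		       function names are hypothetical.

```python
import hashlib


def normalize(sequence):
    """Upper-case the sequence and replace U by T before hashing."""
    return sequence.upper().replace("U", "T")


def md5_label(sequence):
    """32 hexadecimal symbols (128 bits)."""
    return hashlib.md5(normalize(sequence).encode("ascii")).hexdigest()


def sha1_label(sequence):
    """40 hexadecimal symbols (160 bits)."""
    return hashlib.sha1(normalize(sequence).encode("ascii")).hexdigest()
```

		       Because of the normalization, "acgu" and "ACGT"
		       receive the same label.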

	      --sizeout
		       When using --relabel, --relabel_self, --relabel_md5  or
		       --relabel_sha1,	preserve  and report abundance annota-
		       tions to	the  output  fasta  file  (using  the  pattern
		       ';size=integer;').

	      --shuffle	filename
		       Pseudo-randomly	shuffle	 the  order  of	sequences con-
		       tained in filename.

	      --topn positive integer
		       Output only the first integer sequences	after  pseudo-
		       random reordering.

	      --xsize  Strip abundance information from	the headers when writ-
		       ing the output file.

       Sorting options:
	      Fasta entries are	sorted by decreasing abundance	(--sortbysize)
	      or sequence length (--sortbylength). To obtain a stable sorting
	      order, ties are broken by decreasing abundance, then by label in
	      increasing alphanumerical order (--sortbylength), or by label in
	      increasing alphanumerical order only (--sortbysize). Label sort-
	      ing assumes that all sequences have unique labels. The same ap-
	      plies to the automatic sorting performed during chimera checking
	      (--uchime_denovo), dereplication (--derep_fulllength), and clus-
	      tering (--cluster_fast and --cluster_size).
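
	      The tie-breaking rules above can be expressed as sort keys. The
	      Python sketch below is illustrative: the record type and key
	      functions are hypothetical, and Python's default string
	      comparison only approximates the label order used by vsearch.

```python
from typing import NamedTuple


class Record(NamedTuple):
    label: str
    abundance: int
    sequence: str


def sortbysize_key(record):
    # Decreasing abundance, ties broken by increasing label.
    return (-record.abundance, record.label)


def sortbylength_key(record):
    # Decreasing length, then decreasing abundance, then increasing label.
    return (-len(record.sequence), -record.abundance, record.label)


records = [
    Record("a", 5, "ACGT"),
    Record("b", 5, "ACGTAC"),
    Record("c", 9, "ACGT"),
]
by_size = [r.label for r in sorted(records, key=sortbysize_key)]
by_length = [r.label for r in sorted(records, key=sortbylength_key)]
```

	      Here by_size is c, a, b (a and b tie on abundance) and by_length
	      is b, c, a (a and c tie on length, c being more abundant).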

	      --maxsize	positive integer
		       When using  --sortbysize,  discard  sequences  with  an
		       abundance value greater than integer.

	      --minsize	positive integer
		       When  using  --sortbysize,  discard  sequences  with an
		       abundance value smaller than integer.

	      --output filename
		       Write the sorted	sequences to filename, in  fasta  for-
		       mat.

	      --relabel	string
		       Please  see  the	 description  of the same option under
		       Chimera detection for details.

	      --relabel_keep
		       When relabelling, keep the old identifier in the	header
		       after a space.

	      --relabel_md5
		       Please  see  the	 description  of the same option under
		       Chimera detection for details.

	      --relabel_self
		       Please see the description of  the  same	 option	 under
		       Chimera detection for details.

	      --relabel_sha1
		       Please  see  the	 description  of the same option under
		       Chimera detection for details.

	      --sizeout
		       When using --relabel, report abundance  annotations  to
		       the  output  fasta file (using the pattern ';size=inte-
		       ger;').

	      --sortbylength filename
		       Sort by decreasing length the  sequences	 contained  in
		       filename.  See  the  general options --minseqlength and
		       --maxseqlength to eliminate short and long sequences.

	      --sortbysize filename
		       Sort by decreasing abundance the	sequences contained in
		       filename	 (missing  abundance  values are assumed to be
		       ';size=1'). See the options --minsize and --maxsize  to
		       eliminate rare and dominant sequences.

	      --topn positive integer
		       Output only the top integer sequences (i.e. the longest
		       or the most abundant).

	      --xsize  Strip abundance information from	the headers when writ-
		       ing the output file.

       Subsampling options:
	      Subsampling randomly extracts a certain number or a certain
	      percentage of the sequences in the input file. If the --sizein
	      option is in effect, the abundances of the input sequences are
	      taken into account and the sampling is performed as if the input
	      sequences were rereplicated, subsampled and dereplicated before
	      being written to the output file. The extraction is performed as
	      a random sampling with a uniform distribution among the input
	      sequences and is performed without replacement. The input file
	      is specified with the --fastx_subsample option, the output files
	      are specified with the --fastaout and --fastqout options and the
	      number of sequences to be sampled is specified with the
	      --sample_pct or --sample_size options. The sequences not sampled
	      may be written to files specified with the options
	      --fastaout_discarded and --fastqout_discarded. The --fastq_ascii,
	      --fastq_qmin and --fastq_qmax options are also available.
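
	      The "rereplicate, subsample, dereplicate" behaviour with --sizein
	      can be sketched as follows. This is a minimal Python illustration
	      under stated assumptions: the function name, the dictionary
	      input and the random generator are hypothetical, and the sampled
	      reads will not match those drawn by vsearch for the same seed.

```python
import random
from collections import Counter

def subsample(abundances, sample_size, seed):
    """Sample reads without replacement, taking abundances into account.

    abundances maps each label to its abundance; the result maps each
    sampled label to its abundance in the sample.
    """
    rng = random.Random(seed)
    # Rereplicate: one pool entry per read.
    pool = [label for label, count in abundances.items() for _ in range(count)]
    # Subsample uniformly, without replacement.
    picked = rng.sample(pool, sample_size)
    # Dereplicate: merge identical labels back into abundances.
    return dict(Counter(picked))

# Hypothetical input abundances.
abundances = {"a": 5, "b": 3, "c": 2}
sampled = subsample(abundances, 4, seed=13)
```

	      Sampling all reads returns the input abundances unchanged, and a
	      fixed seed gives the same sample on every run.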

	      --fastaout filename
		       Write  the sampled sequences to filename, in fasta for-
		       mat.

	      --fastaout_discarded filename
		       Write the sequences not sampled to filename,  in	 fasta
		       format.

	      --fastq_ascii positive integer
		       Define the ASCII	character number used as the basis for
		       the FASTQ quality score.	The default is	33,  which  is
		       used  by	 the  Sanger  /	 Illumina  1.8+	 FASTQ	format
		       (phred+33). The value 64	is used	by the	Solexa,	 Illu-
		       mina 1.3+ and Illumina 1.5+ formats (phred+64). Only 33
		       and 64 are valid	arguments.

	      --fastq_qmax positive integer
		       Specify the maximum quality score accepted when reading
		       FASTQ  files. The default is 41,	which is usual for re-
		       cent Sanger/Illumina 1.8+ files.

	      --fastq_qmin positive integer
		       Specify the minimum quality score  accepted  for	 FASTQ
		       files.  The  default  is	 0,  which is usual for	recent
		       Sanger/Illumina	1.8+  files.  Older  formats  may  use
		       scores between -5 and 2.

	      --fastqout filename
		       Write  the sampled sequences to filename, in fastq for-
		       mat. Requires input in fastq format.

	      --fastqout_discarded filename
		       Write the sequences not sampled to filename,  in	 fastq
		       format. Requires	input in fastq format.

	      --fastx_subsample	filename
		       Perform subsampling from	the sequences in the specified
		       input file that is in FASTA or FASTQ format.

	      --randseed positive integer
		       Use integer as a	seed for the pseudo-random  generator.
		       A  given	seed always produces the same output, which is
		       useful for replicability. Set to	0 to use a pseudo-ran-
		       dom seed	(default behavior).

	      --relabel	string
		       Relabel	sequences using	the prefix string and a	ticker
		       (1, 2, 3, etc.)	to  construct  the  new	 headers.  Use
		       --sizeout to conserve the abundance annotations.

	      --relabel_keep
		       When relabelling, keep the old identifier in the	header
		       after a space.

	      --relabel_md5
		       Relabel sequences using the MD5	message	 digest	 algo-
		       rithm applied to	each sequence. Former sequence headers
		       are discarded. The sequence is converted	to upper  case
		       and  U  is replaced by T	before the digest is computed.
		       The MD5 digest is a  cryptographic  hash	 function  de-
		       signed  to  minimize the	probability that two different
		       inputs give the same output, even for very similar, but
		       non-identical  inputs.  Still,  there  is always	a very
		       small, but non-zero probability that two	different  in-
		       puts  give  the same result. The	MD5 digest generates a
		       128-bit (16-byte) digest	 that  is  represented	by  16
		       hexadecimal    numbers	 (using	  32   symbols	 among
		       0123456789abcdef). Use --sizeout	to conserve the	 abun-
		       dance annotations.

	      --relabel_self
		       Relabel	sequences using	the sequence itself as the la-
		       bel.

	      --relabel_sha1
		       Relabel sequences using the SHA1	message	 digest	 algo-
		       rithm  applied  to  each	sequence. It is	similar	to the
		       --relabel_md5 option but	uses the  SHA1	algorithm  in-
		       stead of	the MD5	algorithm. The SHA1 digest generates a
		       160-bit (20-byte) result	 that  is  represented	by  20
		       hexadecimal  numbers (40	symbols). The probability of a
		       collision (two non-identical sequences having the  same
		       digest)	is  smaller  for the SHA1 algorithm than it is
		       for the MD5 algorithm. Use --sizeout  to	 conserve  the
		       abundance annotations.

	      --sample_pct real
		       Subsample  the given percentage of the input sequences.
		       Accepted	values range from 0.0 to 100.0.

	      --sample_size positive integer
		       Extract the given number	of sequences.

	      --sizein Take the	abundance information of the input  file  into
		       account,	 otherwise  the	 abundance of each sequence is
		       considered to be	1.

	      --sizeout
		       Write abundance information to the output file.

	      --xsize  Strip abundance information from	the headers when writ-
		       ing the output file.
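
	      The digest-based labels described for --relabel_md5 and
	      --relabel_sha1 can be reproduced approximately with Python's
	      hashlib module. This is a sketch: the function names are
	      hypothetical, and applying the same upper-case and U-to-T
	      normalisation before the SHA1 digest is an assumption based on
	      the statement that the option is similar to --relabel_md5.

```python
import hashlib

def normalise(sequence):
    # Upper case, with U replaced by T, as described for --relabel_md5.
    return sequence.upper().replace("U", "T")

def md5_label(sequence):
    # 128-bit digest written as 32 hexadecimal symbols.
    return hashlib.md5(normalise(sequence).encode("ascii")).hexdigest()

def sha1_label(sequence):
    # 160-bit digest written as 40 hexadecimal symbols.
    # Assumption: the same normalisation is applied as for MD5.
    return hashlib.sha1(normalise(sequence).encode("ascii")).hexdigest()

rna_label = md5_label("acgu")
dna_label = md5_label("ACGT")
```

	      With this normalisation, an RNA sequence and its DNA equivalent
	      receive the same label regardless of letter case.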

       Taxonomic classification	options:
	      The  vsearch  command --sintax will classify the input sequences
	      according	to the Sintax algorithm	as described by	 Robert	 Edgar
	      (2016)  in SINTAX: a simple non-Bayesian taxonomy	classifier for
	      16S  and	ITS  sequences,	 BioRxiv,   074161.   Preprint.	  doi:
	      10.1101/074161

	      The  name	of the fasta file containing the input sequences to be
	      classified is given as an	argument to the	--sintax command.  The
	      reference	 sequence  database is specified with the --db option.
	      The results are written in a tab delimited text file whose  name
	      is  specified  with  the --tabbedout option. The --sintax_cutoff
	      option may be used to set	a minimum level	of  bootstrap  support
	      for the taxonomic	ranks to be reported.

	      Multithreading  is  supported.  Databases	 in UDB	files are sup-
	      ported.  The strand option may be	specified.

	      The reference database must contain taxonomic information	in the
	      header  of  each	sequence in the	form of	a string starting with
	      ";tax=" and followed by a	comma-separated	list of	 up  to	 eight
	      taxonomic	identifiers. Each taxonomic identifier must start with
	      an indication of the rank by one of the letters d (for domain),
	      k
	      (kingdom),  p  (phylum),	c  (class),  o	(order), f (family), g
	      (genus), or s (species). The letter is followed by a  colon  (:)
	      and the name of that rank. Commas	and semicolons are not allowed
	      in the name of the rank.

	      Example:	  ">X80725_S000004313;tax=d:Bacteria,p:Proteobacteria,
	      c:Gammaproteobacteria,o:Enterobacteriales,f:Enterobacteriaceae,
	      g:Escherichia/Shigella,s:Escherichia_coli".

	      The option --notrunclabels is turned on by default for this com-
	      mand, allowing spaces in the taxonomic identifiers.
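
	      A reference header in this format can be split into ranks with a
	      few lines of Python. This is a sketch for checking a database
	      before use; the function name and the decision to skip malformed
	      items silently are assumptions, not vsearch behaviour.

```python
RANK_LETTERS = "dkpcofgs"  # domain, kingdom, phylum, class, order, family, genus, species

def parse_taxonomy(header):
    """Return a list of (rank letter, name) pairs from a ';tax=' annotation."""
    marker = ";tax="
    start = header.find(marker)
    if start < 0:
        return []
    # Semicolons are not allowed in names, so the annotation ends at the
    # next semicolon or at the end of the header.
    annotation = header[start + len(marker):].split(";", 1)[0]
    lineage = []
    for item in annotation.split(","):
        rank, separator, name = item.strip().partition(":")
        if separator and len(rank) == 1 and rank in RANK_LETTERS and name:
            lineage.append((rank, name))
    return lineage

# Header taken from the example above (without the leading '>').
header = (
    "X80725_S000004313;tax=d:Bacteria,p:Proteobacteria,"
    "c:Gammaproteobacteria,o:Enterobacteriales,f:Enterobacteriaceae,"
    "g:Escherichia/Shigella,s:Escherichia_coli"
)
lineage = parse_taxonomy(header)
```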

	      --db filename
		       Read the reference sequences from filename, in FASTA,
		       FASTQ or UDB format. These sequences need to be
		       annotated with taxonomy.

	      --sintax_cutoff real
		       Specify	a  minimum  level of bootstrap support for the
		       taxonomic ranks that will be included in	 column	 4  of
		       the  output  file.  For	instance 0.9, corresponding to
		       90%.

	      --sintax filename
		       Read the	input sequences	from  filename,	 in  FASTA  or
		       FASTQ format.

	      --tabbedout filename
		       Write  the results to filename, in a tab-separated text
		       format. Column 1	contains the  query  label.  Column  2
		       contains	 the  predicted	taxonomy in the	same format as
		       for the reference data, with  bootstrap	support	 indi-
		       cated in	parentheses after each rank. Column 3 contains
		       the strand. If the --sintax_cutoff option is used,  the
		       predicted  taxonomy  will be repeated in	column 4 while
		       omitting	the bootstrap values and  including  only  the
		       ranks with support at or	above the threshold.
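
	      The relation between columns 2 and 4 can be sketched in Python.
	      The sample line below is hypothetical, as are the function names
	      and the exact number formatting of the bootstrap values; only
	      the column layout is taken from the description above.

```python
import re

# One item of column 2: rank letter, colon, name, support in parentheses.
ITEM_RE = re.compile(r"^(?P<rank>[dkpcofgs]):(?P<name>.*)\((?P<support>[0-9.]+)\)$")

def parse_prediction(column2):
    """Return (rank, name, support) tuples from the predicted taxonomy."""
    ranks = []
    for item in column2.split(","):
        match = ITEM_RE.match(item.strip())
        if match:
            ranks.append((match["rank"], match["name"], float(match["support"])))
    return ranks

def apply_cutoff(ranks, cutoff):
    """Rebuild a column-4 style string: no supports, only ranks >= cutoff."""
    return ",".join(f"{rank}:{name}" for rank, name, support in ranks
                    if support >= cutoff)

# Hypothetical output line: query label, prediction, strand.
line = "query1\td:Bacteria(1.00),p:Proteobacteria(0.95),c:Gammaproteobacteria(0.62)\t+"
label, prediction, strand = line.split("\t")
ranks = parse_prediction(prediction)
confident = apply_cutoff(ranks, 0.9)
```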

       UDB options:
	      Databases	 to  be	 used with the --usearch_global	command	may be
	      prepared from FASTA files	and stored to a	binary	UDB  formatted
	      file in order to speed up	searching. This	may be worthwhile when
	      searching	a large	database repeatedly. The sequences are indexed
	      and  stored in a way that	can be quickly loaded into memory. The
	      commands and options below can be	used to	create and inspect UDB
	      files. A UDB file may be specified with the --db option instead
	      of a FASTA formatted file	with the --usearch_global command.

	      --dbmask none|dust|soft
		       Specify the  sequence  masking  method  used  with  the
		       --makeudb_usearch  command,  either none, dust or soft.
		       No masking is performed when none  is  specified.  When
		       dust  is	specified, the DUST algorithm will be used for
		       masking	low  complexity	 regions  (short  repeats  and
		       skewed  composition).  Lower  case letters in the input
		       file will be masked when	soft is	specified (soft	 mask-
		       ing).

	      --hardmask
		       Mask  sequences	by  replacing  letters	with N for the
		       --makeudb_usearch command. The default is to use	 lower
		       case letters (soft masking).

	      --makeudb_usearch	filename
		       Create a UDB database file from the FASTA-formatted
		       sequences in the	file with the given filename. The  UDB
		       database	 is  written  to  the  file specified with the
		       --output	option.

	      --output filename
		       Specify the filename of the UDB or FASTA output file for
		       the --makeudb_usearch or the --udb2fasta command,
		       respectively.

	      --udb2fasta filename
		       Read the	UDB database in	the file with the given	 file-
		       name  and  output  the sequences	in FASTA format	in the
		       file specified by the --output option.

	      --udbinfo	filename
		       Show information	about the UDB  database	 in  the  file
		       with the	given filename.

	      --udbstats filename
		       Report  statistics  about  the indexed words in the UDB
		       database	in the file with the given filename.

	      --wordlength positive integer
		       Specify the length of the words to be used when	creat-
		       ing  the	UDB database index using the --makeudb_usearch
		       command.	Valid numbers range from 3 to 15. The  default
		       is 8.

       Userfields (fields accepted by the --userfields option):

	      aln      Print a string of M (match/mismatch, i.e. not a gap), D
		       (delete,	i.e. a gap in the query) and I (insert,	i.e. a
		       gap in the target) representing the pairwise alignment.
		       Empty field if there is no alignment.

	      alnlen   Print the length	of the query-target alignment  (number
		       of  columns).  The  field  is  set  to 0	if there is no
		       alignment.

	      bits     Bit score (not computed for nucleotide alignments). Al-
		       ways set	to 0.

	      caln     Compact	representation of the pairwise alignment using
		       the CIGAR format	(Compact Idiosyncratic	Gapped	Align-
		       ment  Report):  M  (match/mismatch), D (deletion) and I
		       (insertion). Empty field	if there is no alignment.

	      evalue   E-value (not computed for nucleotide  alignments).  Al-
		       ways set	to -1.

	      exts     Number  of  columns containing a	gap extension (zero or
		       positive	integer	value).

	      gaps     Number of columns containing a gap  (zero  or  positive
		       integer value).

	      id       The  percentage	of identity, according to the identity
		       definition specified by the --iddef option.   Equal  to
		       id0, id1, id2, id3 or id4 below.	By default the same as
		       id2.

	      id0      CD-HIT definition of the	percentage of  identity	 (real
		       value  ranging  from  0.0 to 100.0) using the length of
		       the shortest sequence in	the pairwise alignment as  de-
		       nominator:  100	*  (matching  columns) / (shortest se-
		       quence length).

	      id1      The percentage of identity (real	value ranging from 0.0
		       to  100.0)  is  defined	as  the	 edit  distance: 100 *
		       (matching columns) / (alignment length).

	      id2      The percentage of identity (real	value ranging from 0.0
		       to  100.0)  is  defined as the edit distance, excluding
		       terminal	gaps.

	      id3      Marine Biological Lab definition	of the	percentage  of
		       identity	(real value ranging from 0.0 to	100.0),	count-
		       ing each	gap opening (internal or terminal) as a	single
		       mismatch,  whether or not the gap was extended, and us-
		       ing the length of the longest sequence in the  pairwise
		       alignment  as  denominator: 100 * (1.0 -	[(mismatches +
		       gaps) / (longest	sequence length)]).

	      id4      BLAST definition	of the percentage  of  identity	 (real
		       value ranging from 0.0 to 100.0), equivalent to --iddef
		       1 in a context of global	pairwise alignment. The	 field
		       id4 is always equal to the field	id1.

	      ids      Number  of  matches  in the alignment (zero or positive
		       integer value).

	      mism     Number of mismatches in the alignment (zero or positive
		       integer value).

	      opens    Number  of  columns  containing	a gap opening (zero or
		       positive	integer	value).

	      pairs    Number of columns  containing  only  nucleotides.  That
		       value  corresponds to the length	of the alignment minus
		       the gap-containing columns (zero	 or  positive  integer
		       value).

	      pctgaps  Number  of  columns containing gaps expressed as	a per-
		       centage of the alignment	 length	 (real	value  ranging
		       from 0.0	to 100.0).

	      pctpv    Percentage  of  positive	columns. When working with nu-
		       cleotide	sequences, this	is equivalent to the  percent-
		       age of matches (real value ranging from 0.0 to 100.0).

	      pv       Number  of  positive columns. When working with nucleo-
		       tide sequences, this is equivalent  to  the  number  of
		       matches (zero or	positive integer value).

	      qcov     Percentage of the query sequence that is aligned with
		       the target sequence (real value ranging from 0.0 to
		       100.0). The query coverage is computed as 100.0 *
		       (matches + mismatches) / query sequence length.
		       Internal or terminal gaps are not taken into account.
		       The field is set to 0.0 if there is no alignment.

	      qframe   Query frame (-3 to +3). That field only concerns	coding
		       sequences and is	not computed by	vsearch. Always	set to
		       +0.

	      qhi      Last nucleotide of the query aligned with the target.
		       Equal to the length of the pairwise alignment if there
		       is an alignment, 0 otherwise (see qihi to ignore
		       terminal gaps).

	      qihi     Last nucleotide of the query aligned  with  the	target
		       (ignoring  terminal  gaps). Nucleotide numbering	starts
		       from 1. The field is set	to 0 if	there is no alignment.

	      qilo     First nucleotide	of the query aligned with  the	target
		       (ignoring  initial  gaps).  Nucleotide numbering	starts
		       from 1. The field is set	to 0 if	there is no alignment.

	      ql       Query sequence length  (positive	 integer  value).  The
		       field is	set to 0 if there is no	alignment.

	      qlo      First  nucleotide of the	query aligned with the target.
		       Always equal to 1 if there is an	alignment, 0 otherwise
		       (see qilo to ignore initial gaps).

	      qrow     Print  the sequence of the query	segment	as seen	in the
		       pairwise	alignment (i.e.	with gap  insertions  if  need
		       be). Empty field	if there is no alignment.

	      qs       Query  segment  length.	Always equal to	query sequence
		       length.

	      qstrand  Query strand orientation	(+ or  -  for  nucleotide  se-
		       quences). Empty field if	there is no alignment.

	      query    Query label.

	      raw      Raw alignment score (negative, zero or positive integer
		       value). The score is the	sum  of	 match	rewards	 minus
		       mismatch	 penalties,  gap  openings and gap extensions.
		       The field is set	to 0 if	there is no alignment.

	      target   Target label. The field is set to '*' if	 there	is  no
		       alignment.

	      tcov     Percentage of the target sequence that is aligned with
		       the query sequence (real value ranging from 0.0 to
		       100.0). The target coverage is computed as 100.0 *
		       (matches + mismatches) / target sequence length.
		       Internal or terminal gaps are not taken into account.
		       The field is set to 0.0 if there is no alignment.

	      tframe   Target frame (-3	to +3).	That field only	concerns  cod-
		       ing  sequences  and  is not computed by vsearch.	Always
		       set to +0.

	      thi      Last nucleotide of the target aligned with the query.
		       Equal to the length of the pairwise alignment if there
		       is an alignment, 0 otherwise (see tihi to ignore
		       terminal gaps).

	      tihi     Last nucleotide of the target aligned  with  the	 query
		       (ignoring  terminal  gaps). Nucleotide numbering	starts
		       from 1. The field is set	to 0 if	there is no alignment.

	      tilo     First nucleotide	of the target aligned with  the	 query
		       (ignoring  initial  gaps).  Nucleotide numbering	starts
		       from 1. The field is set	to 0 if	there is no alignment.

	      tl       Target sequence length (positive	 integer  value).  The
		       field is	set to 0 if there is no	alignment.

	      tlo      First  nucleotide of the	target aligned with the	query.
		       Always equal to 1 if there is an	alignment, 0 otherwise
		       (see tilo to ignore initial gaps).

	      trow     Print the sequence of the target	segment	as seen	in the
		       pairwise	alignment (i.e.	with gap  insertions  if  need
		       be). Empty field	if there is no alignment.

	      ts       Target  segment length. Always equal to target sequence
		       length. The field is set	to 0 if	there is no alignment.

	      tstrand  Target strand orientation (+ or -  for  nucleotide  se-
		       quences).  Always set to	'+', so	reverse	strand matches
		       have tstrand '+'	and qstrand '-'. Empty field if	 there
		       is no alignment.
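
	      The counting fields and the identity definitions above can be
	      cross-checked from the qrow and trow fields. The Python sketch
	      below is an illustration, not the vsearch implementation: the
	      function name is hypothetical, gaps are assumed to be written as
	      '-', the "gaps" term of id3 is read as the number of gap
	      openings, and the alignment is assumed to contain at least one
	      column without a gap.

```python
def alignment_stats(qrow, trow):
    """Derive userfield-style values from two aligned rows ('-' marks a gap)."""
    if len(qrow) != len(trow):
        raise ValueError("aligned rows must have the same length")
    columns = list(zip(qrow.upper(), trow.upper()))
    alnlen = len(columns)
    ids = sum(1 for q, t in columns if q == t and q != "-")
    gaps = sum(1 for q, t in columns if "-" in (q, t))
    mism = alnlen - ids - gaps
    if ids + mism == 0:
        raise ValueError("alignment has no column without a gap")

    # A gap opening is a gap column not preceded by a gap in the same row.
    opens = 0
    prev_q_gap = prev_t_gap = False
    for q, t in columns:
        q_gap, t_gap = q == "-", t == "-"
        if (q_gap and not prev_q_gap) or (t_gap and not prev_t_gap):
            opens += 1
        prev_q_gap, prev_t_gap = q_gap, t_gap

    def terminal_gap_columns(cols):
        count = 0
        for q, t in cols:
            if "-" not in (q, t):
                break
            count += 1
        return count

    internal = (alnlen - terminal_gap_columns(columns)
                - terminal_gap_columns(reversed(columns)))
    ql = alnlen - qrow.count("-")   # query sequence length
    tl = alnlen - trow.count("-")   # target sequence length
    id1 = 100.0 * ids / alnlen
    return {
        "alnlen": alnlen, "ids": ids, "mism": mism, "gaps": gaps,
        "opens": opens, "exts": gaps - opens, "pairs": ids + mism,
        "id0": 100.0 * ids / min(ql, tl),
        "id1": id1,
        "id2": 100.0 * ids / internal,
        "id3": 100.0 * (1.0 - (mism + opens) / max(ql, tl)),
        "id4": id1,
        "qcov": 100.0 * (ids + mism) / ql,
        "tcov": 100.0 * (ids + mism) / tl,
    }

# Hypothetical alignment: one mismatch, one internal gap, one terminal gap.
stats = alignment_stats("ACGTACGT--", "ACGAAC-TGG")
```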

DELIBERATE CHANGES
       If  you	are a usearch user, our	objective is to	make you feel at home.
       That's why vsearch was designed to behave like usearch, to some extent.
       Like  any  complex software, usearch is not free	from quirks and	incon-
       sistencies. We decided not to reproduce some of them, and for  complete
       transparency, to	document here the deliberate changes we	made.

       During  a  search  with usearch,	when using the options --blast6out and
       --output_no_hits, for queries with no match the number  of  fields  re-
       ported is 13, where it should be	12. This is corrected in vsearch.

       The field raw of	the --userfields option	is not informative in usearch.
       This is corrected in vsearch.

       The fields qlo, qhi, tlo, thi now have counterparts (qilo, qihi,	 tilo,
       tihi) reporting alignment coordinates ignoring terminal gaps.

       In usearch, when using the option --output_no_hits, queries that
       receive no match are reported in the --blast6out file, but not in the
       alignment output file. This is corrected in vsearch.

       vsearch introduces a new	--cluster_size command that sorts sequences by
       decreasing abundance before clustering.

       vsearch reintroduces --iddef alternative	pairwise identity  definitions
       that were removed from usearch.

       vsearch extends the --topn option to sorting commands.

       vsearch	extends	 the  --sizein	option to dereplication	(--derep_full-
       length) and clustering (--cluster_fast).

       vsearch treats T	and U as identical nucleotides during dereplication.

       vsearch sorting is stabilized by using sequence abundances or sequence
       labels as secondary or tertiary keys.

       vsearch	by  default uses the DUST algorithm for	masking	low-complexity
       regions.	Masking	behavior is also slightly changed to be	 more  consis-
       tent.

NOVELTIES
       vsearch	introduces new commands	and new	options	not present in usearch
       7. They are described in	the 'Options' section of this manual. Here  is
       a short list:

	      -	uchime2_denovo,	   uchime3_denovo,   alignwidth,   borderline,
		fasta_score (chimera checking)

	      -	cluster_size, cluster_unoise, clusterout_id,  clusterout_sort,
		profile	(clustering)

	      -	fasta_width, gzip_decompress, bzip2_decompress (general
		options)

	      -	iddef (clustering, pairwise alignment, searching)

	      -	maxuniquesize (dereplication)

	      -	relabel_md5, relabel_self and relabel_sha1 (chimera detection,
		dereplication, FASTQ processing, shuffling, sorting)

	      -	shuffle	(shuffling)

	      -	fastq_eestats,	 fastq_eestats2,  fastq_maxlen,	 fastq_truncee
		(FASTQ processing)

	      -	fastaout_discarded, fastqout_discarded (subsampling)

	      -	rereplicate (dereplication/rereplication)

EXAMPLES
       Align all sequences in a	database with each other and output all	 pair-
       wise alignments:

	      vsearch	--allpairs_global  database.fas	 --alnout  results.aln
	      --acceptall

       Check for the presence of chimeras (de  novo);  parents	should	be  at
       least  1.5  times  more abundant	than chimeras. Output non-chimeric se-
       quences in fasta	format (no wrapping):

	      vsearch --uchime_denovo queries.fas --abskew  1.5	 --nonchimeras
	      results.fas --fasta_width	0

       Cluster with a 97% similarity threshold,	collect	cluster	centroids, and
       write cluster descriptions using	a uclust-like format:

	      vsearch --cluster_fast queries.fas --id  0.97  --centroids  cen-
	      troids.fas --uc clusters.uc

       Dereplicate  the	 sequences contained in	queries.fas, take into account
       the abundance information already present, write	 unwrapped  fasta  se-
       quences	to queries_unique.fas with the new abundance information, dis-
       card all	sequences with an abundance of 1:

	      vsearch --derep_fulllength queries.fas --sizein --fasta_width  0
	      --sizeout	--output queries_unique.fas --minuniquesize 2
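
       The effect of this command can be approximated in Python to make the
       abundance bookkeeping explicit. This is a sketch only: the function
       name and record layout are hypothetical, and case-insensitive
       comparison and keeping the label of the first occurrence are
       assumptions.

```python
import re

SIZE_RE = re.compile(r";size=(\d+);?")

def dereplicate(records, min_unique_size=1):
    """Merge identical sequences and sum their ';size=' abundances."""
    totals = {}  # normalised sequence -> [label of first occurrence, abundance]
    for header, sequence in records:
        # T and U are treated as identical; comparing in upper case is an assumption.
        key = sequence.upper().replace("U", "T")
        match = SIZE_RE.search(header)
        size = int(match.group(1)) if match else 1
        label = SIZE_RE.sub(";", header).strip(";")
        if key in totals:
            totals[key][1] += size
        else:
            totals[key] = [label, size]
    unique = [(label, size, seq) for seq, (label, size) in totals.items()
              if size >= min_unique_size]
    # Most abundant first, ties broken by label.
    return sorted(unique, key=lambda item: (-item[1], item[0]))

# Hypothetical (header, sequence) records.
records = [("s1;size=3;", "ACGU"), ("s2;size=1;", "acgt"), ("s3", "GGCC")]
kept = dereplicate(records, min_unique_size=2)
```

       Here s1 and s2 merge into a single sequence with abundance 4, and s3
       is discarded because its abundance is below 2.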

       Mask  simple repeats and	low complexity regions in the input fasta file
       with the	DUST algorithm (masked regions are lowercased),	and write  the
       results to the output file:

	      vsearch	 --maskfasta   queries.fas   --qmask   dust   --output
	      queries_masked.fas

       Search queries in a reference database, with an 80%-similarity
       threshold, take terminal gaps into account when calculating pairwise
       similarities, output pairwise alignments:

	      vsearch --usearch_global queries.fas  --db  references.fas  --id
	      0.8 --iddef 1 --alnout results.aln

       Search  a  sequence  dataset against itself (ignore self	hits), get all
       matches with at least 60% similarity, and collect results in  a	blast-
       like tab-separated format. Accept an unlimited number of	hits (--maxac-
       cepts 0), and compare each query	to all other sequences,	including  un-
       likely candidates (--maxrejects 0):

	      vsearch  --usearch_global	 queries.fas  --db  queries.fas	--self
	      --id 0.6 --blast6out results.blast6 --maxaccepts 0  --maxrejects
	      0
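
       The resulting file can be read back with a short Python helper. The
       column order below is the usual 12-field blast-like layout and is an
       assumption here, as are the function name and the sample line; the
       evalue and bits values follow the userfield descriptions in this
       manual (-1 and 0).

```python
# Assumed column order of the blast-like tab-separated output.
BLAST6_FIELDS = (
    "query", "target", "id", "alnlen", "mism", "opens",
    "qlo", "qhi", "tlo", "thi", "evalue", "bits",
)

def parse_blast6(line):
    """Split one tab-separated line into a dictionary keyed by field name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(BLAST6_FIELDS):
        raise ValueError(f"expected {len(BLAST6_FIELDS)} fields, got {len(values)}")
    hit = dict(zip(BLAST6_FIELDS, values))
    hit["id"] = float(hit["id"])
    for name in ("alnlen", "mism", "opens", "qlo", "qhi", "tlo", "thi"):
        hit[name] = int(hit[name])
    return hit

# Hypothetical result line.
hit = parse_blast6("q1\tq2\t98.5\t200\t3\t0\t1\t200\t1\t200\t-1\t0\n")
```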

       Shuffle	the  input fasta file (change the order	of sequences) in a re-
       peatable	fashion	(fixed seed), and write	unwrapped fasta	 sequences  to
       the output file:

	      vsearch	--shuffle  queries.fas	--output  queries_shuffled.fas
	      --randseed 13 --fasta_width 0

       Sort by decreasing abundance the	 sequences  contained  in  queries.fas
       (using  the  'size=integer'  information),  relabel the sequences while
       preserving the abundance	information (with --sizeout),  keep  only  se-
       quences with an abundance equal to or greater than 2:

	      vsearch  --sortbysize  queries.fas  --output  queries_sorted.fas
	      --relabel	sampleA_ --sizeout --minsize 2

AUTHORS
       Implementation by Torbjørn Rognes and Tomás Flouri, documentation by
       Frédéric Mahé.

CITATION
       Rognes T, Flouri T, Nichols B, Quince C, Mahé F. (2016) VSEARCH: a
       versatile open source tool for metagenomics. PeerJ 4:e2584 doi:
       10.7717/peerj.2584

REPORTING BUGS
       Submit suggestions and bug-reports at
       <https://github.com/torognes/vsearch/issues>, send a pull request on
       <https://github.com/torognes/vsearch>, or compose a friendly or
       curmudgeonly e-mail to Torbjørn Rognes <torognes@ifi.uio.no>.

AVAILABILITY
       Source	   code	     and      binaries	    are	     available	    at
       <https://github.com/torognes/vsearch>.

COPYRIGHT
       Copyright (C) 2014-2021, Torbjørn Rognes, Frédéric Mahé and
       Tomás Flouri

       All rights reserved.

       Contact: Torbjørn Rognes <torognes@ifi.uio.no>, Department of
       Informatics, University of Oslo, PO Box 1080 Blindern, NO-0316 Oslo,
       Norway

       This  software  is dual-licensed	and available under a choice of	one of
       two licenses, either under the terms of the GNU General Public  License
       version 3 or the	BSD 2-Clause License.

       GNU General Public License version 3

       This program is free software: you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published  by  the
       Free  Software Foundation, either version 3 of the License, or (at your
       option) any later version.

       This program is distributed in the hope that it	will  be  useful,  but
       WITHOUT	ANY  WARRANTY;	without	 even  the  implied  warranty  of MER-
       CHANTABILITY or FITNESS FOR A PARTICULAR	PURPOSE.  See the GNU  General
       Public License for more details.

       You should have received	a copy of the GNU General Public License along
       with this program.  If not, see <http://www.gnu.org/licenses/>.

       The BSD 2-Clause	License

       Redistribution and use in source	and binary forms, with or without mod-
       ification,  are	permitted  provided  that the following	conditions are
       met:

       1. Redistributions of source code must retain the above	copyright  no-
       tice, this list of conditions and the following disclaimer.

       2.  Redistributions  in	binary form must reproduce the above copyright
       notice, this list of conditions and the	following  disclaimer  in  the
       documentation and/or other materials provided with the distribution.

       THIS SOFTWARE IS	PROVIDED BY THE	COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
       IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT  NOT  LIMITED
       TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTIC-
       ULAR PURPOSE ARE	DISCLAIMED. IN NO EVENT	SHALL THE COPYRIGHT HOLDER  OR
       CONTRIBUTORS  BE	 LIABLE	FOR ANY	DIRECT,	INDIRECT, INCIDENTAL, SPECIAL,
       EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,	BUT  NOT  LIMITED  TO,
       PROCUREMENT  OF	SUBSTITUTE  GOODS  OR  SERVICES; LOSS OF USE, DATA, OR
       PROFITS;	OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON	ANY THEORY  OF
       LIABILITY,  WHETHER  IN	CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
       NEGLIGENCE OR OTHERWISE)	ARISING	IN ANY WAY OUT	OF  THE	 USE  OF  THIS
       SOFTWARE, EVEN IF ADVISED OF THE	POSSIBILITY OF SUCH DAMAGE.

       We would	like to	thank the authors of the following projects for	making
       their source code available:

	      -	vsearch includes code from Google's CityHash project by Geoff
		Pike and Jyrki Alakuijala, providing some excellent hash
		functions available under an MIT license.

	      -	vsearch	includes code derived from Tatusov and	Lipman's  DUST
		program	that is	in the public domain.

	      -	vsearch	 includes  public  domain  code	 written  by Alexander
		Peslyak	for the	MD5 message digest algorithm.

	      -	vsearch	includes public	domain code written by Steve Reid  and
		others for the SHA1 message digest algorithm.

	      -	vsearch	binaries may include code from the zlib	library, copy-
		right Jean-Loup	Gailly and Mark	Adler.

	      -	vsearch	binaries may include  code  from  the  bzip2  library,
		copyright Julian R. Seward.

SEE ALSO
       swipe, an extremely fast pairwise local (Smith-Waterman) database
       search tool by Torbjørn Rognes, available at
       <https://github.com/torognes/swipe>.

       swarm, a fast and accurate amplicon clustering method by Frédéric
       Mahé and Torbjørn Rognes, available at
       <https://github.com/torognes/swarm>.

VERSION	HISTORY
       New features and	important modifications	of vsearch (short lived	or mi-
       nor bug releases	may not	be mentioned):

       v1.0.0 released November	28th, 2014
	      First public release.

       v1.0.1 released December	1st, 2014
	      Bug fixes	(sortbysize, semicolon after size annotation in	 head-
	      ers)  and	 minor	changes	(labels	as secondary sort key for most
	      sorts, treat T and U as identical	for dereplication, only	output
	      size in --dbmatched file if --sizeout specified).

       v1.0.2 released December	6th, 2014
	      Bug fixes	(ssse3/sse4.1 requirement, memory leak).

       v1.0.3 released December	6th, 2014
	      Bug fix (now writes help to stdout instead of stderr).

       v1.0.4 released December	8th, 2014
	      Added   --allpairs_global	 option.  Reduce  memory  requirements
	      slightly and eliminate memory leaks.

       v1.0.5 released December	9th, 2014
	      Fixes a minor bug	with  --allpairs_global	 and  --acceptall  op-
	      tions.

       v1.0.6 released December	14th, 2014
	      Fixes a memory allocation	bug in chimera detection (--uchime_ref
	      option).

       v1.0.7 released December	19th, 2014
	      Fixes a bug in  the  output  from	 chimera  detection  with  the
	      --uchimeout option.

       v1.0.8 released January 22nd, 2015
	      Introduces several changes and bug fixes:

	      -	a  new linear memory aligner for alignment of sequences	longer
		than 5,000 nucleotides,

	      -	a new --cluster_size command that sorts	sequences by  decreas-
		ing abundance before clustering,

	      -	meaning	 of userfields qlo, qhi, tlo, thi changed for compati-
		bility with usearch,

	      -	new userfields qilo, qihi, tilo, tihi give  alignment  coordi-
		nates ignoring terminal	gaps,

	      -	in  --uc output	files, a perfect alignment is indicated	with a
		'=' sign,

	      -	the option --cluster_fast now sorts  sequences	by  decreasing
		length,	 then  by decreasing abundance and finally by sequence
		identifier,

	      -	default	--maxseqlength value set to 50,000 nucleotides,

	      -	fix for	bug in alignment in rare cases,

	      -	fix for	lack of	 detection  of	under-	or  overflow  in  SIMD
		aligner.
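
	      The --cluster_fast ordering listed above can be pictured as a
	      three-part sort key. The following is only a minimal sketch,
	      not vsearch's implementation: the Record type, its field names
	      and the function name are hypothetical, and ascending order for
	      the identifier tie-break is an assumption.

```python
from typing import NamedTuple


class Record(NamedTuple):
    """Hypothetical in-memory representation of one input sequence."""
    label: str       # sequence identifier
    sequence: str    # nucleotide string
    abundance: int   # e.g. taken from a size annotation in the header


def cluster_fast_order(records):
    """Sort by decreasing length, then decreasing abundance, then label.

    Ascending label order is an assumption made for this illustration.
    """
    return sorted(
        records,
        key=lambda r: (-len(r.sequence), -r.abundance, r.label),
    )


records = [
    Record("b", "ACGT", 5),
    Record("a", "ACGT", 5),
    Record("c", "ACGTAC", 1),
    Record("d", "ACGT", 9),
]
ordered_labels = [r.label for r in cluster_fast_order(records)]
print(ordered_labels)
```

	      The longest sequence comes first regardless of abundance;
	      abundance and identifier only break ties among sequences of
	      equal length.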

       v1.0.9 released January 22nd, 2015
	      Fixes  a	bug  in	 the  function sorting sequences by decreasing
	      abundance	(--sortbysize).

       v1.0.10 released	January	23rd, 2015
	      Fixes a bug where	the --sizein option  was  ignored  and	always
	      treated as on, affecting clustering and dereplication commands.

       v1.0.11 released	February 5th, 2015
	      Introduces  the possibility to output results in SAM format (for
	      clustering, pairwise alignment and searching).

       v1.0.12 released	February 6th, 2015
	      Temporarily fixes	a problem with long headers in FASTA files.

       v1.0.13 released	February 17th, 2015
	      Fix a memory allocation problem when computing multiple sequence
	      alignments with the --msaout and --consout options, as well as a
	      memory leak.  Also increased line	buffer for reading FASTA files
	      to 4MB.

       v1.0.14 released	February 17th, 2015
	      Fix  a  bug  where the multiple alignment	and consensus sequence
	      computed after clustering	ignored	the strand of  the  sequences.
	      Also  decreased  size  of	line buffer for	reading	FASTA files to
	      1MB again	due to excessive stack memory usage.

       v1.0.15 released	February 18th, 2015
	      Fix bug in calculation of	identity metric	between	sequences when
	      using the	MBL definition (--iddef	3).

       v1.0.16 released	February 19th, 2015
	      Integrated  patches from Debian for increased compatibility with
	      various architectures.

       v1.1.0 released February	20th, 2015
	      Added the	--quiet	option to suppress all output  to  stdout  and
	      stderr except for	warnings and fatal errors. Added the --log op-
	      tion to write messages to	a log file.

       v1.1.1 released February	20th, 2015
	      Added info about --log and --quiet options to help text.

       v1.1.2 released March 18th, 2015
	      Fix bug with large datasets. Fix format of help info.

       v1.1.3 released March 18th, 2015
	      Fix more bugs with large datasets.

       v1.2.0-1.2.19 released July 6th to September 8th, 2015
	      Several new commands and options added. Bugs  fixed.  Documenta-
	      tion updated.

       v1.3.0 released September 9th, 2015
	      Changed to autotools build system.

       v1.3.1 released September 14th, 2015
	      Several new commands and options.	Bug fixes.

       v1.3.2 released September 15th, 2015
	      Fixed  memory leaks. Added '-h' shortcut for help. Removed extra
	      'v' in version number.

       v1.3.3 released September 15th, 2015
	      Fixed bug	in hexadecimal digits of MD5 and SHA1  digests.	 Added
	      --samheader option.

       v1.3.4 released September 16th, 2015
	      Fixed compilation	problems with zlib and bzip2lib.

       v1.3.5 released September 17th, 2015
	      Minor  configuration/makefile  changes  to compile to native CPU
	      and simplify makefile.

       v1.4.0 released September 25th, 2015
	      Added --sizeorder	option.

       v1.4.1 released September 29th, 2015
	      Inserted public domain MD5 and SHA1 code to eliminate dependency
	      on crypto	and openssl libraries and their	licensing issues.

       v1.4.2 released October 2nd, 2015
	      Dynamic  loading	of  libraries  for reading gzip	and bzip2 com-
	      pressed files if available. Circumvention	 of  missing  gzoffset
	      function in zlib 1.2.3 and earlier.

       v1.4.3 released October 3rd, 2015
	      Fix  a bug with determining amount of memory on some versions of
	      Apple OS X.

       v1.4.4 released October 3rd, 2015
	      Remove debug message.

       v1.4.5 released October 6th, 2015
	      Fix memory allocation bug	when reading long FASTA	sequences.

       v1.4.6 released October 6th, 2015
	      Fix subtle bug in	SIMD alignment code that reduced accuracy.

       v1.4.7 released October 7th, 2015
	      Fixes a problem with searching for or clustering sequences with
	      repeats. In this new version, vsearch looks at all words
	      occurring at least once in the sequences in the initial step.
	      Previously, only words occurring exactly once were considered.
	      In addition, vsearch now requires at least 10 words to be shared
	      by the sequences; previously, only 6 were required. If the query
	      contains fewer than 10 words, all words must be present for a
	      match. This change seems to lead to slightly reduced recall but
	      somewhat increased precision, ending up with slightly improved
	      overall accuracy.
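
	      The word-sharing rule described for v1.4.7 can be sketched as
	      follows. This is an illustration of the rule as stated in this
	      entry, not vsearch's actual code: the function names, the word
	      length of 8 and the handling of queries too short to contain
	      any word are assumptions.

```python
def unique_words(sequence, k=8):
    """Return the set of distinct words (k-mers) occurring at least once."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def passes_word_filter(query, target, k=8, min_shared=10):
    """Apply the initial word-sharing test described for v1.4.7.

    At least `min_shared` distinct words must be shared. If the query
    has fewer than `min_shared` distinct words, all of them must occur
    in the target.
    """
    query_words = unique_words(query, k)
    if not query_words:
        # Assumption: a query too short to contain a word never passes.
        return False
    shared = len(query_words & unique_words(target, k))
    required = min(min_shared, len(query_words))
    return shared >= required


# A short, repetitive query: only 4 distinct words of length 4.
short_query = "ACGTACGTAC"
print(passes_word_filter(short_query, "TTACGTACGTACTT", k=4))
print(passes_word_filter(short_query, "ACGTTTTT", k=4))
```

	      Words are collected into a set, so a word repeated within a
	      sequence is counted once, in line with "occurring at least
	      once".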

       v1.5.0 released October 7th, 2015
	      This version introduces the new option --minwordmatches that al-
	      lows the user to specify the minimum number of  matching	unique
	      words  before a sequence is considered further. New default val-
	      ues for different	word lengths are also set.  The	 minimum  word
	      length is	increased to 7.

       v1.6.0 released October 9th, 2015
	      This  version  adds  the	relabeling options (--relabel, --rela-
	      bel_md5 and --relabel_sha1) to the shuffle command. It also adds
	      the  --xsize  option to the clustering, dereplication, shuffling
	      and sorting commands.

       v1.6.1 released October 14th, 2015
	      Fix bugs and update manual and help text regarding  relabelling.
	      Add  all relabelling options to the subsampling command. Add the
	      --xsize option to	chimera	 detection,  dereplication  and	 fastq
	      filtering	commands. Refactoring of code.

       v1.7.0 released October 14th, 2015
	      Add --relabel_keep option.

       v1.8.0 released October 19th, 2015
	      Added --search_exact, --fastx_mask and --fastq_convert commands.
	      Changed most commands to read FASTQ input	files as well as FASTA
	      files.   Modified	--fastx_revcomp	and --fastx_subsample to write
	      FASTQ files.

       v1.8.1 released November	2nd, 2015
	      Fixes for	compatibility with QIIME and older OS X	versions.

       v1.9.0 released November	12th, 2015
	      Added the	--fastq_mergepairs  command  and  associated  options.
	      This  command  has not been tested well yet. Included additional
	      files to avoid dependency on autoconf for compilation. Fixed an
	      error where identifiers in FASTA headers were not truncated at
	      tabs, just spaces. Fixed a bug in detection of the file format
	      (FASTA/FASTQ) of a gzip compressed input file.

       v1.9.1 released November	13th, 2015
	      Fixed   memory   leak   and   a  bug  in	score  computation  in
	      --fastq_mergepairs, and improved speed.

       v1.9.2 released November	17th, 2015
	      Fixed  a	bug  in	 the   computation   of	  some	 values	  with
	      --fastq_stats.

       v1.9.3 released November	19th, 2015
	      Workaround for missing x86intrin.h with old compilers.

       v1.9.4 released December	3rd, 2015
	      Fixed incrementation of counter when relabeling dereplicated se-
	      quences.

       v1.9.5 released December	3rd, 2015
	      Fixed bug	resulting in inferior chimera detection	performance.

       v1.9.6 released January 8th, 2016
	      Fixed bug	in aligned sequences produced  with  --fastapairs  and
	      --userout	(qrow, trow) options.

       v1.9.7 released January 12th, 2016
	      Masking  behavior	is changed somewhat to keep the	letter case of
	      the input	sequences unchanged  when  no  masking	is  performed.
	      Masking is now performed also during chimera detection. Documen-
	      tation updated.

       v1.9.8 released January 22nd, 2016
	      Fixed bug	causing	segfault when chimera detection	 is  performed
	      on extremely short sequences.

       v1.9.9 released January 22nd, 2016
	      Adjusted	default	minimum	number of word matches during searches
	      for improved performance.

       v1.9.10 released	January	25th, 2016
	      Fixed bug	related	to masking and lower case database sequences.

       v1.10.0 released	February 11th, 2016
	      Parallelized and improved	merging	of paired-end  reads  and  ad-
	      justed  some defaults. Removed progress indicator	when stderr is
	      not a terminal. Added --fasta_score  option  to  report  chimera
	      scores  in  FASTA	files. Added --rereplicate and --fastq_eestats
	      commands.	Fixed typos. Added relabelling to files	produced  with
	      --consout	and --profile options.

       v1.10.1 released	February 23rd, 2016
	      Fixed  a	bug  affecting	the --fastq_mergepairs command causing
	      FASTQ headers to be truncated at first space  (despite  the  bug
	      fix  release 1.9.0 of November 12th, 2015). Full headers are now
	      included in the output (no matter	if --notrunclabels is  in  ef-
	      fect or not).

       v1.10.2 released	March 18th, 2016
	      Fixed  a	bug  causing  a	segmentation fault when	running	--use-
	      arch_global with an empty	query sequence.	Also fixed a bug caus-
	      ing imperfect alignments to be reported with an alignment	string
	      of '=' in	uc output  files.  Fixed  typos	 in  man  file.	 Fixed
	      fasta/fastq  processing  code  regarding	presence or absence of
	      compression library header files.

       v1.11.1 released	April 13th, 2016
	      Added strand information in UC file for  --derep_fulllength  and
	      --derep_prefix.  Added  expected	errors (ee) to header of FASTA
	      files specified with --fastaout  and  --fastaout_discarded  when
	      --eeout  or  --fastq_eeout  option is in effect for fastq_filter
	      and fastq_mergepairs. The	options	--eeout	and --fastq_eeout  are
	      now equivalent.

       v1.11.2 released	June 21st, 2016
	      Two  bugs	 were  fixed.  The  first  issue  was  related	to the
	      --query_cov option that used  a  different  coverage  definition
	      than  the	 qcov  userfield.  The	coverage is now	defined	as the
	      fraction of the whole query sequence length that is aligned with
	      matching or mismatching residues in the target. All gaps are ig-
	      nored. The other issue was related to  the  consensus  sequences
	      produced	during	clustering  when only N's were present in some
	      positions. Previously these would	be converted  to  A's  in  the
	      consensus.  The behaviour	is changed so that N's are produced in
	      the consensus, and it should now be more	compatible  with  use-
	      arch.
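
	      The coverage definition given for v1.11.2 can be illustrated
	      with a small calculation. This is a sketch only: it assumes the
	      alignment is available as two gapped rows of equal length using
	      '-' for gaps, and the function and parameter names are
	      hypothetical.

```python
def query_coverage(query_row, target_row, query_length):
    """Fraction of the whole query aligned to a target residue.

    A column counts when both rows hold a residue, whether the residues
    match or not. Columns with a gap in either row are ignored.
    """
    if len(query_row) != len(target_row):
        raise ValueError("alignment rows must have equal length")
    aligned = sum(
        1 for q, t in zip(query_row, target_row) if q != "-" and t != "-"
    )
    return aligned / query_length


# The query has 8 residues; 6 of them face a target residue.
coverage = query_coverage("ACGT-ACGT", "ACCTTAC--", query_length=8)
print(coverage)
```

	      The denominator is the full query length rather than the
	      alignment length, so unaligned query residues lower the
	      coverage.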

       v2.0.0 released June 24th, 2016
	      This major new version supports reading from pipes. Two new
	      options are added: --gzip_decompress and --bzip2_decompress. One
	      of these options must be specified when reading compressed input
	      from a pipe; neither is required when reading from ordinary
	      files. The vsearch header that was previously written to stdout
	      is now written to stderr, which enables piping of results for
	      further processing. The file name '-' now represents standard
	      input (/dev/stdin) when reading files and standard output
	      (/dev/stdout) when writing files. Code for reading FASTA and
	      FASTQ files has been refactored.

       v2.0.1 released June 30th, 2016
	      Avoid segmentation fault when masking very long sequences.

       v2.0.2 released July 5th, 2016
	      Avoid warnings when compiling with GCC 6.

       v2.0.3 released August 2nd, 2016
	      Fixed bad	compiler options resulting in Illegal instruction  er-
	      rors when	running	precompiled binaries.

       v2.0.4 released September 1st, 2016
	      Improved	error  message	for bad	FASTQ quality values. Improved
	      manual.

       v2.0.5 released September 9th, 2016
	      Add options  --fastaout_discarded	 and  --fastqout_discarded  to
	      output  discarded	 sequences from	subsampling to separate	files.
	      Updated manual.

       v2.1.0 released September 16th, 2016
	      New  command:  --fastx_filter.  New   options:   --fastq_maxlen,
	      --fastq_truncee. Allow --minwordmatches down to 3.

       v2.1.1 released September 23rd, 2016
	      Fixed bugs in output to UC-files.	Improved help text and manual.

       v2.1.2 released September 28th, 2016
	      Fixed   incorrect	  abundance   output   from  fastx_filter  and
	      fastq_filter when	relabelling.

       v2.2.0 released October 7th, 2016
	      Added    OTU     table	 generation	options	    --biomout,
	      --mothur_shared_out   and	 --otutabout  to  the  clustering  and
	      searching	commands.

       v2.3.0 released October 10th, 2016
	      Allowed zero-length sequences in FASTA and  FASTQ	 files.	 Added
	      --fastq_trunclen_keep  option.  Fixed bug	with output of OTU ta-
	      bles to pipes.

       v2.3.1 released November	16th, 2016
	      Fixed bug	where --minwordmatches 0 was interpreted  as  the  de-
	      fault  minimum word matches for the given	word length instead of
	      zero. When used in combination with --maxaccepts 0 and  --maxre-
	      jects 0 it will allow complete bypass of kmer-based heuristics.

       v2.3.2 released November	18th, 2016
	      Fixed bug where vsearch reported the ordinal number of the
	      target sequence instead of the cluster number in column 2 on
	      H-lines in the uc output file after clustering. For search and
	      alignment commands, both usearch and vsearch report the target
	      sequence number here.

       v2.3.3 released December	5th, 2016
	      A	minor speed improvement.

       v2.3.4 released December	9th, 2016
	      Fixed  bug in output of sequence profiles	and updated documenta-
	      tion.

       v2.4.0 released February	8th, 2017
	      Added support for	Linux on Power8	systems	(ppc64le) and  Windows
	      on  x86_64.  Improved  detection of pipes	when reading FASTA and
	      FASTQ files. Corrected option for specifying output from
	      fastq_eestats command in help text.

       v2.4.1 released March 1st, 2017
	      Fixed an overflow	bug in fastq_stats and fastq_eestats affecting
	      analysis of very large FASTQ files. Fixed	maximum	 memory	 usage
	      reporting	on Windows.

       v2.4.2 released March 10th, 2017
	      Default  value  for fastq_minovlen increased to 16 in accordance
	      with help	text and for compatibility with	usearch. Minor changes
	      for improved accuracy of paired-end read merging.

       v2.4.3 released April 6th, 2017
	      Fixed bug	with progress bar for shuffling. Fixed missing N-lines
	      in  UC  files  with  usearch_global,   search_exact   and	  all-
	      pairs_global when	the output_no_hits option was not specified.

       v2.4.4 released August 28th, 2017
	      Fixed a few minor	bugs, improved error messages and updated doc-
	      umentation.

       v2.5.0 released October 5th, 2017
	      Support for UDB database files. New commands:  fastq_stripright,
	      fastq_eestats2,  makeudb_usearch,	 udb2fasta,  udbinfo, and udb-
	      stats. New general option: no_progress. New options minsize  and
	      maxsize to fastx_filter. Minor bug fixes,	error message improve-
	      ments and	documentation updates.

       v2.5.1 released October 25th, 2017
	      Fixed bug	with bad default value of 1 instead of 32  for	minse-
	      qlength when using the makeudb_usearch command.

       v2.5.2 released October 30th, 2017
	      Fixed bug where '-' as an argument to the fastq_eestats2 option
	      was treated literally instead of as equivalent to stdin.

       v2.6.0 released November	10th, 2017
	      Rewritten	paired-end reads merger	with  improved	accuracy.  De-
	      creased  default	value for fastq_minovlen option	from 16	to 10.
	      The default value	for the	 fastq_maxdiffs	 option	 is  increased
	      from  5  to  10. There are now other more	important restrictions
	      that will	avoid merging reads that cannot	be reliably aligned.

       v2.6.1 released December	8th, 2017
	      Improved parallelisation of paired end reads merging.

       v2.6.2 released December	18th, 2017
	      Fixed option xsize that  was  partially  inactive	 for  commands
	      uchime_denovo, uchime_ref, and fastx_filter.

       v2.7.0 released February	13th, 2018
	      Added commands cluster_unoise, uchime2_denovo and	uchime3_denovo
	      contributed by Davide Albanese based on Robert  Edgar's  papers.
	      Refactored  fasta	 and fastq print functions as well as code for
	      extraction of abundance and other	attributes from	the headers.

       v2.7.1 released February	16th, 2018
	      Fix several bugs on Windows related to large files, use  of  "-"
	      as a file	name to	mean stdin or stdout, alignment	errors,	missed
	      kmers and	corrupted UDB files. Added  documentation  of  UDB-re-
	      lated commands.

       v2.7.2 released April 20th, 2018
	      Added  the  sintax command for taxonomic classification. Fixed a
	      bug with incorrect FASTA headers of  consensus  sequences	 after
	      clustering.

       v2.8.0 released April 24th, 2018
	      Added  the  fastq_maxdiffpct option to the fastq_mergepairs com-
	      mand.

       v2.8.1 released June 22nd, 2018
	      Fixes for	compilation warnings with GCC 8.

       v2.8.2 released August 21st, 2018
	      Fix for wrong placement of semicolons in header  lines  in  some
	      cases  when  using  the sizeout or xsize options.	Reduced	memory
	      requirements for full-length dereplication in  cases  with  many
	      duplicate	 sequences.   Improved wording of fastq_mergepairs re-
	      port. Updated manual regarding use of sizein  and	 sizeout  with
	      dereplication. Changed a compiler	option.

       v2.8.3 released August 31st, 2018
	      Fix for segmentation fault for --derep_fulllength	with --uc.

       v2.8.4 released September 3rd, 2018
	      Further  reduce  memory  requirements for	dereplication when not
	      using the	uc option. Fix output during subsampling when quiet or
	      log options are in effect.

       v2.8.5 released September 26th, 2018
	      Fixed  a	bug in fastq_eestats2 that caused the values for large
	      lengths to be much too high when the input sequences had varying
	      lengths.

       v2.8.6 released October 9th, 2018
	      Fixed  a bug introduced in version 2.8.2 that caused derep_full-
	      length to	include	the full FASTA header in its output instead of
	      stopping	at the first space (unless the notrunclabels option is
	      in effect).

       v2.9.0 released October 10th, 2018
	      Added the	fastq_join command.

       v2.9.1 released October 29th, 2018
	      Changed compiler options that select the target cpu  and	tuning
	      to  allow	 the  software	to run on any 64-bit x86 system, while
	      tuning for more modern variants. Avoid illegal instruction error
	      on  some architectures. Update documentation of rereplicate com-
	      mand.

       v2.10.0 released	December 6th, 2018
	      Added the sff_convert command to convert SFF files to FASTQ.
	      Added some additional option argument checks. Fixed segmentation
	      fault bug	after some fatal errors	when a log file	was specified.

       v2.10.1 released	December 7th, 2018
	      Improved sff_convert command. It will now	read several  variants
	      of the SFF format. It is also able to read from a	pipe. Warnings
	      are given if there are minor problems. Error messages have been
	      improved.	Minor speed and	memory usage improvements.

       v2.10.2 released	December 10th, 2018
	      Fixed bug	in sintax with reversed	order of domain	and kingdom.

       v2.10.3 released	December 19th, 2018
	      Ported  to  Linux	 on ARMv8 (aarch64). Fixed compilation warning
	      with gcc version 8.1.0 and 8.2.0.

       v2.10.4 released	January	4th, 2019
	      Fixed serious bug	in x86_64 SIMD alignment  code	introduced  in
	      version  2.10.3.	Added link to BioConda in README. Fixed	bug in
	      fastq_stats with sequence	length 1. Fixed	use of	equals	symbol
	      in UC files for identical	sequences with cluster_fast.

       v2.11.0 released	February 13th, 2019
	      Added  ability to	trim and filter	paired-end reads using the re-
	      verse option with	the fastx_filter  and  fastq_filter  commands.
	      Added  --xee  option to remove ee	attributes from	FASTA headers.
	      Minor invisible improvement to the progress indicator.

       v2.11.1 released	February 28th, 2019
	      Minor change to the handling of the weak_id and id options  when
	      using cluster_unoise.

       v2.12.0 released	March 19th, 2019
	      Take  sequence  abundance	 into account when computing consensus
	      sequences	or profiles after clustering. Warn when	 rereplicating
	      sequences	 without abundance info. Guess offset 33 in more cases
	      with fastq_chars.	Stricter checking of option arguments and  op-
	      tion combinations.

       v2.13.0 released	April 11th, 2019
	      Added  the --fastx_getseq, --fastx_getseqs and --fastx_getsubseq
	      commands to extract sequences from a FASTA or FASTQ  file	 based
	      on their labels. Improved handling of ambiguous nucleotide
	      symbols. Corrected behaviour of --uchime_ref command with the
	      options --self and --selfid. Strict detection of illegal options
	      for each command.

       v2.13.1 released	April 26th, 2019
	      Minor changes to the allowed options for each command. All  com-
	      mands now	allow the log, quiet and threads options. If more than
	      1	thread is specified for	commands that are not  multi-threaded,
	      a	warning	will be	issued.	Minor changes to the manual.

       v2.13.2 released	April 30th, 2019
	      Fixed  bug  related to improper handling of newlines on Windows.
	      Allowed option strand plus to uchime_ref for compatibility.

       v2.13.3 released	April 30th, 2019
	      Fixed bug	in FASTQ parsing introduced in version 2.13.2.

       v2.13.4 released	May 10th, 2019
	      Added information	about support for gzip-	 and  bzip2-compressed
	      input files to the output	of the version command.	Adapted	source
	      code for compilation on FreeBSD and NetBSD systems.

       v2.13.5 released	July 2nd, 2019
	      Added cut	command	to fragment sequences  at  restriction	sites.
	      Silenced output from the fastq_stats command if quiet option was
	      given. Updated manual.

       v2.13.6 released	July 2nd, 2019
	      Added info about cut command to output of	help command.

       v2.13.7 released	September 2nd, 2019
	      Fixed bug	in consensus sequence introduced in version 2.13.0.

       v2.14.0 released	September 11th,	2019
	      Added relabel_self option. Made fasta_width, sizein, sizeout and
	      relabelling options valid	for certain commands.

       v2.14.1 released	September 18th,	2019
	      Fixed  bug  with	sequences  written to file specified with fas-
	      taout_rev	for commands fastx_filter and fastq_filter.

       v2.14.2 released	January	28th, 2020
	      Fixed some issues	with the  cut,	fastx_revcomp,	fastq_convert,
	      fastq_mergepairs,	and makeudb_usearch commands. Updated manual.

       v2.15.0 released	June 19th, 2020
	      Update  manual  and  documentation. Turn on notrunclabels	option
	      for sintax command by default. Change maxhits 0 to  mean	unlim-
	      ited hits, like the default. Allow non-ascii characters in head-
	      ers, with	a warning. Sort	centroids and  uc  too	when  cluster-
	      out_sort	specified.  Add	 cluster  id  to centroids output when
	      clusterout_id specified. Improve	error  messages	 when  parsing
	      FASTQ files. Add missing fastq_qminout option and	fix label_suf-
	      fix option  for  fastq_mergepairs.  Add  derep_id	 command  that
	      dereplicates  based  on both label and sequence. Remove compila-
	      tion warnings.

       v2.15.1 released	October	28th, 2020
	      Fix for dereplication  when  including  reverse  complement  se-
	      quences  and  headers.  Make some	extra checks when loading com-
	      pression libraries and add more diagnostic output	about them  to
	      the  output  of  the  version  command.  Report  an  error  when
	      fastx_filter is used with	FASTA input and	options	 that  require
	      FASTQ input. Update manual.

       v2.15.2 released	January	26th, 2021
	      No  real	functional  changes,  but  some	 code  and compilation
	      changes. Compiles	successfully on	macOS running on Apple Silicon
	      (ARMv8).	 Binaries  available.  Code  updated  for C++11. Minor
	      adaptations for Windows compatibility, including the use of  the
	      C++  standard library for	regular	expressions. Minor changes for
	      compatibility with Power8. Switch	to C++ header files.

       v2.16.0 released	March 22nd, 2021
	      This version adds	the orient command. It also handles empty  in-
	      put files	properly. Documentation	has been updated.

       v2.17.0 released March 29th, 2021
	      The  fastq_mergepairs  command  has  been	changed. It now	allows
	      merging of sequences with	overlaps as  short  as	5  bp  if  the
	      --fastq_minovlen	option has been	adjusted down from the default
	      10. In addition, far fewer pairs of reads should now be
	      rejected with the reason 'multiple potential alignments', as the
	      algorithm for detecting those has been changed.

       v2.17.1 released	June 14th, 2021
	      Modernized code. Minor changes to	help info.

version	2.17.1			 June 14, 2021			    vsearch(1)
