FreeBSD Manual Pages

samtools(1)		     Bioinformatics tools		   samtools(1)

NAME
       samtools - Utilities for the Sequence Alignment/Map (SAM) format

SYNOPSIS
       samtools view -bt ref_list.txt -o aln.bam aln.sam.gz

       samtools tview aln.sorted.bam ref.fasta

       samtools quickcheck in1.bam in2.cram

       samtools index aln.sorted.bam

       samtools sort -T /tmp/aln.sorted -o aln.sorted.bam aln.bam

       samtools collate -o aln.name_collated.bam aln.sorted.bam

       samtools idxstats aln.sorted.bam

       samtools flagstat aln.sorted.bam

       samtools flags PAIRED,UNMAP,MUNMAP

       samtools stats aln.sorted.bam

       samtools bedcov region.bed aln.sorted.bam

       samtools depth aln.sorted.bam

       samtools mpileup -C50 -f ref.fasta -r chr3:1,000-2,000 in1.bam in2.bam

       samtools coverage aln.sorted.bam

       samtools merge out.bam in1.bam in2.bam in3.bam

       samtools split merged.bam

       samtools cat out.bam in1.bam in2.bam in3.bam

       samtools fastq input.bam > output.fastq

       samtools fasta input.bam > output.fasta

       samtools faidx ref.fasta

       samtools fqidx ref.fastq

       samtools dict -a GRCh38 -s "Homo sapiens" ref.fasta

       samtools calmd in.sorted.bam ref.fasta

       samtools fixmate in.namesorted.sam out.bam

       samtools markdup in.algnsorted.bam out.bam

       samtools addreplacerg -r 'ID:fish' -r 'LB:1334' -r 'SM:alpha' -o output.bam input.bam

       samtools reheader in.header.sam in.bam > out.bam

       samtools targetcut input.bam

       samtools phase input.bam

       samtools depad input.bam

       samtools ampliconclip -b bed.file input.bam

DESCRIPTION
       Samtools is a set of utilities that manipulate alignments in the SAM
       (Sequence Alignment/Map), BAM, and CRAM formats. It converts between
       the formats, does sorting, merging and indexing, and can quickly
       retrieve reads in any region.

       Samtools is designed to work on a stream. It regards an input file `-'
       as the standard input (stdin) and an output file `-' as the standard
       output (stdout). Several commands can thus be combined with Unix pipes.
       Samtools always writes warning and error messages to the standard error
       output (stderr).

       Samtools is also able to open files on remote FTP or HTTP(S) servers if
       the file name starts with `ftp://', `http://', etc. Samtools checks the
       current working directory for the index file and downloads the index
       if it is absent. Samtools does not retrieve the entire alignment file
       unless it is asked to do so.

       If an index is needed, samtools looks for a file named with the index
       suffix appended to the data filename, and if that isn't found it tries
       again without the data filename's suffix (for example in.bam.bai
       followed by in.bai). However, if an index is in a completely different
       location or has a different name, the data filename and the index
       filename can be joined with ##idx##. For example
       /data/in.bam##idx##/indices/in.bam.bai may be used to explicitly
       indicate where the data and index files reside.
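
       The lookup order above can be sketched in Python as follows. This is
       an illustration only, not HTSlib's code: the helper name locate_index
       and its suffix parameter are hypothetical, and only one index suffix
       (for example .bai, .csi or .crai) is tried per call.

```python
import os
from typing import List, Tuple

IDX_MARKER = "##idx##"


def locate_index(filename: str, suffix: str = ".bai") -> Tuple[str, List[str]]:
    """Hypothetical helper: return (data path, candidate index paths in search order)."""
    if IDX_MARKER in filename:
        # Explicit form: everything after ##idx## names the index file.
        data, index = filename.split(IDX_MARKER, 1)
        return data, [index]
    candidates = [filename + suffix]          # e.g. in.bam.bai
    stem, ext = os.path.splitext(filename)
    if ext:
        candidates.append(stem + suffix)      # e.g. in.bai
    return filename, candidates
```

       For example, locate_index("in.bam") returns
       ("in.bam", ["in.bam.bai", "in.bai"]), while the ##idx## form yields the
       explicit index path only.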

COMMANDS
       Each command has its own man page, which can be viewed using e.g.
       man samtools-view or, with a recent GNU man, man samtools view. Below
       is a brief summary of the syntax and a description of each sub-command.

       Options shared by several sub-commands are documented below in the
       GLOBAL COMMAND OPTIONS section.

       view      samtools view [options] in.sam|in.bam|in.cram [region...]

                 With no options or regions specified, prints all alignments
                 in the specified input alignment file (in SAM, BAM, or CRAM
                 format) to standard output in SAM format (with no header by
                 default).

                 You may specify one or more space-separated region
                 specifications after the input filename to restrict output to
                 only those alignments which overlap the specified region(s).
                 Use of region specifications requires a coordinate-sorted and
                 indexed input file.

                 Options exist to change the output format from SAM to BAM or
                 CRAM, so this command also acts as a file format conversion
                 utility.

       tview     samtools tview [-p chr:pos] [-s STR] [-d display]
                 <in.sorted.bam> [ref.fasta]

                 Text alignment viewer (based on the ncurses library). In the
                 viewer, press `?' for help and press `g' to jump to an
                 alignment start position given as a region such as
                 `chr10:10,000,000', or `=10,000,000' when viewing the same
                 reference sequence.

       quickcheck
                 samtools quickcheck [options] in.sam|in.bam|in.cram [ ... ]

                 Quickly check that input files appear to be intact. Checks
                 that the beginning of the file contains a valid header (all
                 formats) containing at least one target sequence, and then
                 seeks to the end of the file and checks that an end-of-file
                 (EOF) marker is present and intact (BAM only).

                 Data in the middle of the file is not read since that would
                 be much more time consuming, so this command will not detect
                 internal corruption. It is, however, useful for testing that
                 files are not truncated before performing more intensive
                 tasks on them.

                 This command exits with a non-zero exit code if any input
                 file does not have a valid header or is missing an EOF
                 block. Otherwise it exits successfully (with a zero exit
                 code).
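
                 The end-of-file part of this check can be approximated as
                 below, assuming the fixed 28-byte empty BGZF block that the
                 SAM/BAM specification defines as the BAM EOF marker. The
                 function names are hypothetical, and unlike quickcheck this
                 sketch does not validate the header.

```python
import os

# The empty BGZF block that terminates a well-formed BAM file (SAM/BAM spec).
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00"
    " 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def tail_is_bgzf_eof(data: bytes) -> bool:
    """True if the given bytes end with the BGZF EOF block."""
    return data.endswith(BGZF_EOF)


def has_bgzf_eof(path: str) -> bool:
    """Read only the last 28 bytes of the file, as a truncation check."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as fh:
        fh.seek(-len(BGZF_EOF), os.SEEK_END)
        return tail_is_bgzf_eof(fh.read())
```

                 Reading just the file tail keeps the check cheap, which is
                 the same trade-off described above: truncation is caught,
                 internal corruption is not.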

       index     samtools index [-bc] [-m INT] aln.sam.gz|aln.bam|aln.cram
                 [out.index]

                 Index a coordinate-sorted SAM, BAM or CRAM file for fast
                 random access. Note that for SAM this only works if the file
                 has been BGZF compressed first.

                 This index is needed when region arguments are used to limit
                 samtools view and similar commands to particular regions of
                 interest.

                 If an output filename is given, the index file will be
                 written to out.index. Otherwise, for a CRAM file aln.cram,
                 index file aln.cram.crai will be created; for a BAM or SAM
                 file aln.bam, either aln.bam.bai or aln.bam.csi will be
                 created, depending on the index format selected.

       sort      samtools sort [-l level] [-m maxMem] [-o out.bam] [-O format]
                 [-n] [-t tag] [-T tmpprefix] [-@ threads]
                 [in.sam|in.bam|in.cram]

                 Sort alignments by leftmost coordinates, or by read name when
                 -n is used. An appropriate @HD-SO sort order header tag will
                 be added or an existing one updated if necessary.

                 The sorted output is written to standard output by default,
                 or to the specified file (out.bam) when -o is used. This
                 command will also create temporary files tmpprefix.%d.bam as
                 needed when the entire alignment data cannot fit into memory
                 (as controlled via the -m option).

                 Consider using samtools collate instead if you need
                 name-collated data without a full lexicographical sort.

       collate   samtools collate [options] in.sam|in.bam|in.cram [prefix]

                 Shuffles and groups reads together by their names. A faster
                 alternative to a full query name sort, collate ensures that
                 reads of the same name are grouped together in contiguous
                 groups, but doesn't make any guarantees about the order of
                 read names between groups.

                 The output from this command should be suitable for any
                 operation that requires all reads from the same template to
                 be grouped together.
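
                 The grouping guarantee can be illustrated with the in-memory
                 Python sketch below. Records are modelled as hypothetical
                 (name, flag) tuples; reads sharing a name become contiguous
                 and groups appear in first-seen order, which is just one
                 valid outcome, since collate promises no particular order
                 between groups. The sketch holds every record in memory, so
                 it only suits small inputs.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record: (read name, FLAG value).
Record = Tuple[str, int]


def collate(records: Iterable[Record]) -> List[Record]:
    """Group records by read name; no ordering is implied between groups."""
    groups: Dict[str, List[Record]] = {}
    for rec in records:
        groups.setdefault(rec[0], []).append(rec)
    out: List[Record] = []
    for group in groups.values():   # dicts keep first-seen insertion order
        out.extend(group)
    return out
```

                 Unlike a sort, no comparison between different names is
                 needed, which is why collating can be cheaper than a full
                 name sort.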

       idxstats  samtools idxstats in.sam|in.bam|in.cram

                 Retrieve and print stats in the index file corresponding to
                 the input file. Before calling idxstats, the input BAM file
                 should be indexed by samtools index.

                 If run on a SAM or CRAM file or an unindexed BAM file, this
                 command will still produce the same summary statistics, but
                 does so by reading through the entire file. This is far
                 slower than using the BAM indices.

                 The output is TAB-delimited, with each line consisting of
                 reference sequence name, sequence length, number of mapped
                 reads and number of unmapped reads. It is written to stdout.
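
                 Because the output is plain TAB-delimited text, it is easy to
                 post-process. The Python sketch below parses it into a
                 dictionary keyed by reference name; parse_idxstats and
                 IdxStat are hypothetical names, and the four-column layout
                 is taken from the description above.

```python
from typing import Dict, NamedTuple


class IdxStat(NamedTuple):
    length: int
    mapped: int
    unmapped: int


def parse_idxstats(text: str) -> Dict[str, IdxStat]:
    """Parse name<TAB>length<TAB>mapped<TAB>unmapped lines."""
    stats: Dict[str, IdxStat] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, length, mapped, unmapped = line.split("\t")
        stats[name] = IdxStat(int(length), int(mapped), int(unmapped))
    return stats
```

                 Summing the mapped field over all entries then gives the
                 overall count of mapped reads recorded in the output.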

       flagstat  samtools flagstat in.sam|in.bam|in.cram

                 Does a full pass through the input file to calculate and
                 print statistics to stdout.

                 Provides counts for each of 13 categories based primarily on
                 bit flags in the FLAG field. Each category in the output is
                 broken down into QC pass and QC fail, which is presented as
                 "#PASS + #FAIL" followed by a description of the category.

       flags     samtools flags INT|STR[,...]

                 Convert between textual and numeric flag representation.

                 FLAGS:

                   0x1  PAIRED         paired-end (or multiple-segment) sequencing technology
                   0x2  PROPER_PAIR    each segment properly aligned according to the aligner
                   0x4  UNMAP          segment unmapped
                   0x8  MUNMAP         next segment in the template unmapped
                  0x10  REVERSE        SEQ is reverse complemented
                  0x20  MREVERSE       SEQ of the next segment in the template is reverse complemented
                  0x40  READ1          the first segment in the template
                  0x80  READ2          the last segment in the template
                 0x100  SECONDARY      secondary alignment
                 0x200  QCFAIL         not passing quality controls
                 0x400  DUP            PCR or optical duplicate
                 0x800  SUPPLEMENTARY  supplementary alignment
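
                 The conversion is a bitwise OR over the table above. The
                 Python sketch below, with hypothetical function names,
                 reproduces the numeric value for the synopsis example
                 samtools flags PAIRED,UNMAP,MUNMAP (0x1 | 0x4 | 0x8 = 0xd,
                 i.e. 13). It does not mimic the exact output format printed
                 by samtools flags.

```python
# Flag names and bits, in the same order as the table above.
FLAG_BITS = {
    "PAIRED": 0x1, "PROPER_PAIR": 0x2, "UNMAP": 0x4, "MUNMAP": 0x8,
    "REVERSE": 0x10, "MREVERSE": 0x20, "READ1": 0x40, "READ2": 0x80,
    "SECONDARY": 0x100, "QCFAIL": 0x200, "DUP": 0x400, "SUPPLEMENTARY": 0x800,
}


def flags_to_int(spec: str) -> int:
    """Combine comma-separated flag names into a numeric FLAG value."""
    value = 0
    for name in spec.split(","):
        value |= FLAG_BITS[name.strip().upper()]
    return value


def int_to_flags(value: int) -> str:
    """List the names of all bits set in a numeric FLAG value."""
    return ",".join(name for name, bit in FLAG_BITS.items() if value & bit)
```

                 For instance, int_to_flags(83) returns
                 "PAIRED,PROPER_PAIR,REVERSE,READ1", since 83 = 0x53 =
                 0x40 + 0x10 + 0x2 + 0x1.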

       stats     samtools stats [options] in.sam|in.bam|in.cram [region...]

                 samtools stats collects statistics from BAM files and outputs
                 them in a text format. The output can be visualized
                 graphically using plot-bamstats.

       bedcov    samtools bedcov [options] region.bed
                 in1.sam|in1.bam|in1.cram[...]

                 Reports the total read base count (i.e. the sum of per-base
                 read depths) for each genomic region specified in the
                 supplied BED file. The regions are output as they appear in
                 the BED file and are 0-based. Counts for each alignment file
                 supplied are reported in separate columns.
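
                 In other words, the value reported for a BED line is the sum
                 of depths over its interval. The sketch below assumes a
                 precomputed per-base depth table (a hypothetical structure,
                 not something bedcov exposes) and ignores any read filtering
                 that bedcov options may apply. BED intervals are 0-based and
                 half-open, so the end coordinate is excluded.

```python
from typing import Dict, Tuple

# Hypothetical per-base depth table: (chrom, 0-based position) -> depth.
Depth = Dict[Tuple[str, int], int]


def bedcov_sum(depth: Depth, chrom: str, start: int, end: int) -> int:
    """Sum per-base depths over a BED interval [start, end)."""
    return sum(depth.get((chrom, pos), 0) for pos in range(start, end))
```

                 With depths 2, 3 and 1 at positions 0, 1 and 2, the BED
                 interval 0-2 covers positions 0 and 1 only, giving 5.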

       depth     samtools depth [options] [in1.sam|in1.bam|in1.cram
                 [in2.sam|in2.bam|in2.cram] [...]]

                 Computes the read depth at each position or region.

       mpileup   samtools mpileup [-EB] [-C capQcoef] [-r reg] [-f in.fa]
                 [-l list] [-Q minBaseQ] [-q minMapQ] in.bam [in2.bam [...]]

                 Generate textual pileup for one or multiple BAM files. For
                 VCF and BCF output, please use the bcftools mpileup command
                 instead. Alignment records are grouped by sample (SM)
                 identifiers in @RG header lines. If sample identifiers are
                 absent, each input file is regarded as one sample.

                 See the samtools-mpileup man page for a description of the
                 pileup format and options.

       coverage  samtools coverage [options] [in1.sam|in1.bam|in1.cram
                 [in2.sam|in2.bam|in2.cram] [...]]

                 Produces a histogram or table of coverage per chromosome.

       merge     samtools merge [-nur1f] [-h inh.sam] [-t tag] [-R reg]
                 [-b list] out.bam in1.bam [in2.bam in3.bam ... inN.bam]

                 Merge multiple sorted alignment files, producing a single
                 sorted output file that contains all the input records and
                 maintains the existing sort order.

                 If -h is specified the @SQ headers of input files will be
                 merged into the specified header, otherwise they will be
                 merged into a composite header created from the input
                 headers. If the @SQ headers differ in order this may require
                 the output file to be re-sorted after merge.

                 The ordering of the records in the input files must match the
                 usage of the -n and -t command-line options. If they do not,
                 the output order will be undefined. See sort for information
                 about record ordering.
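
                 The core of merging already-sorted inputs is a k-way merge.
                 The Python sketch below uses heapq.merge on hypothetical
                 (ref_id, pos, name) tuples and assumes every input uses the
                 same @SQ order for ref_id; it is not how samtools itself is
                 implemented. As noted above, if an input is not sorted by the
                 merge key the output order is not meaningful.

```python
import heapq
from operator import itemgetter
from typing import Iterable, List, Tuple

# Hypothetical record: (reference index in a shared @SQ order, position, read name).
Rec = Tuple[int, int, str]


def merge_sorted(*inputs: Iterable[Rec]) -> List[Rec]:
    """k-way merge of coordinate-sorted inputs, preserving (ref_id, pos) order."""
    return list(heapq.merge(*inputs, key=itemgetter(0, 1)))
```

                 Each input is consumed only once, front to back, which is
                 why merging sorted files is much cheaper than re-sorting
                 their concatenation.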

       split     samtools split [options] merged.sam|merged.bam|merged.cram

                 Splits a file by read group, producing one or more output
                 files matching a common prefix (by default based on the input
                 filename), each containing one read group.

       cat       samtools cat [-b list] [-h header.sam] [-o out.bam] in1.bam
                 in2.bam [ ... ]

                 Concatenate BAMs or CRAMs. Although this works on either BAM
                 or CRAM, all input files must be the same format as each
                 other. The sequence dictionary of each input file must be
                 identical, although this command does not check this. This
                 command uses a similar trick to reheader which enables fast
                 BAM concatenation.

       fastq/a   samtools fastq [options] in.bam
                 samtools fasta [options] in.bam

                 Converts a BAM or CRAM into either FASTQ or FASTA format
                 depending on the command invoked. The files will be
                 automatically compressed if the file names have a .gz or
                 .bgzf extension.

                 The input to this program must be collated by name. Use
                 samtools collate or samtools sort -n to ensure this.

       faidx     samtools faidx <ref.fasta> [region1 [...]]

                 Index a reference sequence in FASTA format or extract a
                 subsequence from an indexed reference sequence. If no region
                 is specified, faidx will index the file and create
                 ref.fasta.fai on the disk. If regions are specified, the
                 subsequences will be retrieved and printed to stdout in
                 FASTA format.

                 The input file can be compressed in the BGZF format.

                 FASTQ files can be read and indexed by this command. Without
                 using --fastq, any extracted subsequence will be in FASTA
                 format.

       fqidx     samtools fqidx <ref.fastq> [region1 [...]]

                 Index a reference sequence in FASTQ format or extract a
                 subsequence from an indexed reference sequence. If no region
                 is specified, fqidx will index the file and create
                 ref.fastq.fai on the disk. If regions are specified, the
                 subsequences will be retrieved and printed to stdout in
                 FASTQ format.

                 The input file can be compressed in the BGZF format.

                 samtools fqidx should only be used on FASTQ files with a
                 small number of entries. Trying to use it on a file
                 containing millions of short sequencing reads will produce an
                 index that is almost as big as the original file, and
                 searches using the index will be very slow and use a lot of
                 memory.

       dict      samtools dict ref.fasta|ref.fasta.gz

                 Create a sequence dictionary file from a FASTA file.

       calmd     samtools calmd [-Eeubr] [-C capQcoef] aln.bam ref.fasta

                 Generate the MD tag. If the MD tag is already present, this
                 command will give a warning if the MD tag generated is
                 different from the existing tag. Outputs SAM by default.

                 Calmd can also read and write CRAM files, although in most
                 cases this is pointless as CRAM recalculates MD and NM tags
                 on the fly. The one exception is where both the input and
                 output CRAM files have been, or are being, created with the
                 no_ref option.

       fixmate   samtools fixmate [-rpcm] [-O format] in.nameSrt.bam out.bam

                 Fill in mate coordinates, ISIZE and mate-related flags from a
                 name-sorted alignment.

       markdup   samtools markdup [-l length] [-r] [-s] [-T] [-S]
                 in.algsort.bam out.bam

                 Mark duplicate alignments from a coordinate-sorted file that
                 has been run through samtools fixmate with the -m option.
                 This program relies on the MC and ms tags that fixmate
                 provides.

       rmdup     samtools rmdup [-sS] <input.srt.bam> <out.bam>

                 This command is obsolete. Use markdup instead.

       addreplacerg
                 samtools addreplacerg [-r rg-line | -R rg-ID] [-m mode]
                 [-l level] [-o out.bam] in.bam

                 Adds or replaces read group tags in a file.

       reheader  samtools reheader [-iP] in.header.sam in.bam

                 Replace the header in in.bam with the header in
                 in.header.sam. This command is much faster than replacing
                 the header with a BAM->SAM->BAM conversion.

                 By default this command outputs the BAM or CRAM file to
                 standard output (stdout), but for CRAM format files it has
                 the option to perform an in-place edit, both reading and
                 writing to the same file. No validity checking is performed
                 on the header, nor is it checked that the header is suitable
                 for use with the sequence data itself.

       targetcut samtools targetcut [-Q minBaseQ] [-i inPenalty] [-0 em0]
                 [-1 em1] [-2 em2] [-f ref] in.bam

                 This command identifies target regions by examining the
                 continuity of read depth, computes haploid consensus
                 sequences of targets and outputs a SAM with each sequence
                 corresponding to a target. When option -f is in use, BAQ
                 will be applied. This command is only designed for cutting
                 fosmid clones from fosmid pool sequencing [Ref. Kitzman et
                 al. (2010)].

       phase     samtools phase [-AF] [-k len] [-b prefix] [-q minLOD]
                 [-Q minBaseQ] in.bam

                 Call and phase heterozygous SNPs.

       depad     samtools depad [-SsCu1] [-T ref.fa] [-o output] in.bam

                 Converts a BAM aligned against a padded reference to a BAM
                 aligned against the depadded reference. The padded reference
                 may contain verbatim "*" bases in it, but "*" bases are also
                 counted in the reference numbering. This means that a
                 sequence base-call aligned against a reference "*" is
                 considered to be a CIGAR match ("M" or "X") operator (if the
                 base-call is "A", "C", "G" or "T"). After depadding, the
                 reference "*" bases are deleted and such aligned sequence
                 base-calls become insertions. Similar transformations apply
                 for deletions and padding CIGAR operations.
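
                 The coordinate change can be illustrated with a small Python
                 sketch (a hypothetical helper, not samtools' implementation)
                 that maps each padded reference position to its depadded
                 coordinate. Positions holding "*" map to None: a read base
                 aligned there has no reference base after depadding, which
                 is why it becomes an insertion.

```python
from typing import List, Optional


def padded_to_unpadded(padded_ref: str) -> List[Optional[int]]:
    """0-based depadded coordinate for each padded position, None for '*' pads."""
    mapping: List[Optional[int]] = []
    unpadded = 0
    for base in padded_ref:
        if base == "*":
            mapping.append(None)      # pad: removed from the depadded reference
        else:
            mapping.append(unpadded)
            unpadded += 1
    return mapping
```

                 For the padded reference AC*GT the depadded reference is
                 ACGT, and padded position 3 (G) becomes depadded position 2.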

       ampliconclip
                 samtools ampliconclip [-o out.file] [-f stat.file]
                 [--soft-clip] [--hard-clip] [--both-ends] [--strand]
                 [--clipped] [--fail] [--no-PG] -b bed.file in.file

                 Clip reads in a SAM-compatible file based on data from a BED
                 file.

SAMTOOLS OPTIONS
       These are options that are passed after the samtools command, before
       any sub-command is specified.

       help, --help
              Display a brief usage message listing the samtools commands
              available. If the name of a command is also given, e.g.,
              samtools help view, the detailed usage message for that
              particular command is displayed.

       --version
              Display the version numbers and copyright information for
              samtools and the important libraries used by samtools.

       --version-only
              Display the full samtools version number in a machine-readable
              format.

GLOBAL COMMAND OPTIONS
       Several long options are shared between multiple samtools sub-commands:
       --input-fmt, --input-fmt-option, --output-fmt, --output-fmt-option,
       --reference, --write-index, and --verbosity. The input format is
       typically auto-detected, so specifying it is usually unnecessary; the
       option is included for completeness. Note that not all sub-commands
       have all options. Consult the sub-command help for more details.

       Format strings recognised are "sam", "sam.gz", "bam" and "cram". They
       may be followed by a comma-separated list of options as key or
       key=value. See below for examples.

       The fmt-option arguments accept either a single option or
       option=value. Note that some options only work on some file formats
       and only on read or write streams. If the value is unspecified for a
       boolean option, the value is assumed to be 1. The valid options are as
       follows.
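
       The format-string syntax can be sketched as a small parser. This
       illustrates the rules just described (options separated by commas, a
       bare key meaning key=1); it is not HTSlib's own parser and does not
       check that an option is valid for the chosen format. parse_fmt is a
       hypothetical name.

```python
from typing import Dict, Tuple


def parse_fmt(spec: str) -> Tuple[str, Dict[str, str]]:
    """Split e.g. "cram,version=3.0,embed_ref" into a format and its options."""
    fmt, *opts = spec.split(",")
    options: Dict[str, str] = {}
    for opt in opts:
        key, sep, value = opt.partition("=")
        # A bare boolean option such as "embed_ref" is treated as embed_ref=1.
        options[key] = value if sep else "1"
    return fmt, options
```

       For example, parse_fmt("cram,version=3.0,embed_ref") returns
       ("cram", {"version": "3.0", "embed_ref": "1"}).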

       level=INT
           Output only. Specifies the compression level from 1 to 9, or 0 for
           uncompressed. If the output format is SAM, this also enables BGZF
           compression; otherwise SAM defaults to uncompressed.

       nthreads=INT
           Specifies the number of threads to use during encoding and/or
           decoding. For BAM this will be encoding only. In CRAM the threads
           are dynamically shared between encoder and decoder.

       reference=fasta_file
           Specifies a FASTA reference file for use in CRAM encoding or
           decoding. It usually is not required for decoding except when the
           MD5 cannot be obtained via the REF_PATH or REF_CACHE environment
           variables.

       decode_md=0|1
           CRAM input only; defaults to 1 (on). CRAM does not typically store
           MD and NM tags, preferring to generate them on the fly. When this
           option is 0, missing MD and NM tags will not be generated. It can
           be particularly useful when combined with a file encoded using
           store_md=1 and store_nm=1.

       store_md=0|1
           CRAM output only; defaults to 0 (off). CRAM normally only stores
           MD tags when the reference is unknown and lets the decoder
           generate these values on the fly (see decode_md).

       store_nm=0|1
           CRAM output only; defaults to 0 (off). CRAM normally only stores
           NM tags when the reference is unknown and lets the decoder
           generate these values on the fly (see decode_md).

       ignore_md5=0|1
           CRAM input only; defaults to 0 (off). When enabled, MD5 checksum
           errors on the reference sequence and block checksum errors within
           CRAM are ignored. Use of this option is strongly discouraged.

       required_fields=bit-field
           CRAM input only; specifies which SAM columns need to be populated.
           By default all fields are used. Limiting the decode to specific
           columns can have significant performance gains. The bit-field is a
           numerical value constructed from the following table.

                 0x1   SAM_QNAME
                 0x2   SAM_FLAG
                 0x4   SAM_RNAME
                 0x8   SAM_POS
                0x10   SAM_MAPQ
                0x20   SAM_CIGAR
                0x40   SAM_RNEXT
                0x80   SAM_PNEXT
               0x100   SAM_TLEN
               0x200   SAM_SEQ
               0x400   SAM_QUAL
               0x800   SAM_AUX
              0x1000   SAM_RGAUX
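
           For example, a job that only needs read names, flags and sequences
           would OR together 0x1, 0x2 and 0x200, giving 0x203 (decimal 515).
           The Python sketch below builds such a value with an IntFlag; the
           SamField class and required_fields helper are hypothetical mirrors
           of the table above, not an HTSlib binding.

```python
from enum import IntFlag


class SamField(IntFlag):
    """Bit values copied from the required_fields table."""
    QNAME = 0x1
    FLAG = 0x2
    RNAME = 0x4
    POS = 0x8
    MAPQ = 0x10
    CIGAR = 0x20
    RNEXT = 0x40
    PNEXT = 0x80
    TLEN = 0x100
    SEQ = 0x200
    QUAL = 0x400
    AUX = 0x800
    RGAUX = 0x1000


def required_fields(*fields: SamField) -> int:
    """OR the selected columns together into a plain integer bit-field."""
    value = 0
    for field in fields:
        value |= field
    return int(value)
```

           The resulting number could then be supplied as, for example,
           --input-fmt-option required_fields=515.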

       name_prefix=string
           CRAM input only; defaults to output filename. Any sequences with
           auto-generated read names will use string as the name prefix.

       multi_seq_per_slice=0|1
           CRAM output only; defaults to 0 (off). By default CRAM generates
           one container per reference sequence, except in the case of many
           small references (such as a fragmented assembly).

       version=major.minor
           CRAM output only. Specifies the CRAM version number. Acceptable
           values are "2.1" and "3.0".

       seqs_per_slice=INT
           CRAM output only; defaults to 10000.

       slices_per_container=INT
           CRAM output only; defaults to 1. The effect of having multiple
           slices per container is to share the compression header block
           between multiple slices. This is unlikely to have any significant
           impact unless the number of sequences per slice is reduced.
           (Together these two options control the granularity of random
           access.)

       embed_ref=0|1
           CRAM output only; defaults to 0 (off). If 1, this will store
           portions of the reference sequence in each slice, permitting
           decoding without requiring an external copy of the reference
           sequence.

       no_ref=0|1
           CRAM output only; defaults to 0 (off). If 1, sequences will be
           stored verbatim with no reference encoding. This can be useful if
           no reference is available for the file.

       use_bzip2=0|1
           CRAM output only; defaults to 0 (off). Permits use of bzip2 in
           CRAM block compression.

       use_lzma=0|1
           CRAM output only; defaults to 0 (off). Permits use of lzma in CRAM
           block compression.

       lossy_names=0|1
           CRAM output only; defaults to 0 (off). If 1, templates with all
           members within the same CRAM slice will have their read names
           removed. New names will be automatically generated during
           decoding. Also see the name_prefix option.

       For example:

           samtools view --input-fmt-option decode_md=0 \
               --output-fmt cram,version=3.0 --output-fmt-option embed_ref \
               --output-fmt-option seqs_per_slice=2000 -o foo.cram foo.bam

       The --write-index option enables automatic index creation while
       writing out BAM, CRAM or BGZF SAM files. Note that to get compressed
       SAM as the output format you need to manually request a compression
       level; otherwise all SAM files are uncompressed. By default SAM and BAM
       will use CSI indices while CRAM will use CRAI indices. If you need to
       create BAI indices, note that it is possible to specify the name of
       the index being written to, and hence the format, by using the
       filename##idx##indexname notation.

       For example, to convert a BAM to a compressed SAM with CSI indexing:

           samtools view -h -O sam,level=6 --write-index in.bam -o out.sam.gz

       To convert a SAM to a compressed BAM using BAI indexing:

           samtools view --write-index in.sam -o out.bam##idx##out.bam.bai

       The --verbosity INT option sets the verbosity level for samtools and
       HTSlib. The default is 3 (HTS_LOG_WARNING); 2 reduces warning messages
       and 0 or 1 also reduces some error messages, while values greater than
       3 produce increasing numbers of additional warnings and logging
       messages.

REFERENCE SEQUENCES
       The CRAM format requires use of a reference sequence for both reading
       and writing.

       When reading a CRAM, the @SQ headers are interrogated to identify the
       reference sequence MD5sum (M5: tag) and the local reference sequence
       filename (UR: tag). Note that http:// and ftp:// based URLs in the UR:
       field are not used, but local FASTA filenames (with or without
       file://) can be used.

       To create a CRAM, the @SQ headers will also be read to identify the
       reference sequences, but M5: and UR: tags may not be present. In this
       case the -T and -t options of samtools view may be used to specify the
       fasta or fasta.fai filenames respectively (provided the .fasta.fai
       file is also backed up by a .fasta file).

       The search order to obtain a reference is:

       1. Use any local file specified by the command-line options (e.g. -T).

       2. Look for the MD5 via the REF_CACHE environment variable.

       3. Look for the MD5 in each element of the REF_PATH environment
          variable.

       4. Look for a local file listed in the UR: header tag.

ENVIRONMENT VARIABLES
       HTS_PATH
              A colon-separated list of directories in which to search for
              HTSlib plugins. If $HTS_PATH starts or ends with a colon or
              contains a double colon (::), the built-in list of directories
              is searched at that point in the search.

              If no HTS_PATH variable is defined, the built-in list of
              directories specified when HTSlib was built is used, which
              typically includes /usr/local/libexec/htslib and similar
              directories.

       REF_PATH
              A colon-separated (semicolon on Windows) list of locations in
              which to look for sequences identified by their MD5sums. This
              can be either a list of directories or URLs. Note that if a URL
              is included then the colon in http:// and ftp:// and the
              optional port number will be treated as part of the URL and not
              as a PATH field separator. For URLs, the text %s will be
              replaced by the MD5sum being read.

              If no REF_PATH has been specified it will default to
              http://www.ebi.ac.uk/ena/cram/md5/%s and if REF_CACHE is also
              unset, it will be set to $XDG_CACHE_HOME/hts-ref/%2s/%2s/%s. If
              $XDG_CACHE_HOME is unset, $HOME/.cache (or a local system
              temporary directory if no home directory is found) will be used
              similarly.
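
              The separator rule can be sketched in Python as below. This is
              a hypothetical heuristic written for illustration, not HTSlib's
              parser; it handles only the colon-separated (non-Windows) form
              and treats http, https and ftp as URL schemes.

```python
import re
from typing import List

URL_SCHEMES = ("http", "https", "ftp")


def split_ref_path(ref_path: str) -> List[str]:
    """Split a colon-separated REF_PATH, keeping URL colons and ports intact."""
    parts = ref_path.split(":")
    entries: List[str] = []
    i = 0
    while i < len(parts):
        part = parts[i]
        if part in URL_SCHEMES and i + 1 < len(parts) and parts[i + 1].startswith("//"):
            url = part + ":" + parts[i + 1]
            i += 2
            # An optional port ("host:8080/...") also belongs to the URL.
            if i < len(parts) and re.match(r"\d+(/|$)", parts[i]):
                url += ":" + parts[i]
                i += 1
            entries.append(url)
        else:
            entries.append(part)
            i += 1
    return [e for e in entries if e]


def url_for_md5(entry: str, md5: str) -> str:
    """Substitute the MD5sum for %s in a URL entry."""
    return entry.replace("%s", md5)
```

              For example, "/local/ref:http://host:8080/md5/%s" splits into
              the directory /local/ref and the single URL
              http://host:8080/md5/%s.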

       REF_CACHE
              This can be defined to a single location housing a local cache
              of references. Upon downloading a reference it will be stored
              in the location pointed to by REF_CACHE. REF_CACHE will be
              searched before attempting to load via the REF_PATH search
              list. If no REF_PATH is defined, both REF_PATH and REF_CACHE
              will be automatically set (see above), but if REF_PATH is
              defined and REF_CACHE is not, then no local cache is used.

              To avoid many files being stored in the same directory,
              REF_CACHE may be defined as a pattern using %nums to consume
              num characters of the MD5sum and %s to consume all remaining
              characters. If REF_CACHE lacks %s then it will get an implicit
              /%s appended.

              To aid population of the REF_CACHE directory, a script
              misc/seq_cache_populate.pl is provided in the Samtools
              distribution. This takes a FASTA file or a directory of FASTA
              files and generates the MD5sum-named files.

              For example, if you use seq_cache_populate -subdirs 2 -root
              /local/ref_cache to create 2 nested subdirectories (the
              default), each consuming 2 characters of the MD5sum, then
              REF_CACHE must be set to /local/ref_cache/%2s/%2s/%s.
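
              The pattern rules can be expressed as a short Python sketch;
              expand_cache_pattern is a hypothetical name and this is not
              HTSlib's code. With the pattern from the example above, the
              first two pairs of MD5 characters become directory names and
              the remainder becomes the file name.

```python
import re


def expand_cache_pattern(pattern: str, md5: str) -> str:
    """Expand a REF_CACHE pattern: %Ns takes N MD5 characters, %s takes the rest."""
    if "%s" not in pattern:
        pattern += "/%s"  # the implicit /%s described above
    pos = 0

    def take(match: "re.Match") -> str:
        nonlocal pos
        width = match.group(1)
        if width:
            chunk = md5[pos:pos + int(width)]
            pos += int(width)
        else:
            chunk = md5[pos:]
            pos = len(md5)
        return chunk

    # re.sub visits matches left to right, so pos advances through the MD5.
    return re.sub(r"%(\d*)s", take, pattern)
```

              For an MD5sum 0123456789abcdef0123456789abcdef, the pattern
              /local/ref_cache/%2s/%2s/%s expands to
              /local/ref_cache/01/23/456789abcdef0123456789abcdef.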

EXAMPLES
       o Import SAM to BAM when @SQ lines are present in the header:

           samtools view -b aln.sam > aln.bam

         If @SQ lines are absent:

           samtools faidx ref.fa
           samtools view -bt ref.fa.fai aln.sam > aln.bam

         where ref.fa.fai is generated automatically by the faidx command.

       o Convert a BAM file to a CRAM file using a local reference sequence.

           samtools view -C -T ref.fa aln.bam > aln.cram

LIMITATIONS
       o Unaligned words used in bam_endian.h, bam.c and bam_aux.c.

AUTHOR
       Heng Li from the Sanger Institute wrote the original C version of
       samtools. Bob Handsaker from the Broad Institute implemented the BGZF
       library. Petr Danecek and Heng Li wrote the VCF/BCF implementation.
       James Bonfield from the Sanger Institute developed the CRAM
       implementation. Other large code contributions have been made by John
       Marshall, Rob Davies, Martin Pollard, Andrew Whitwham, Valeriu Ohan
       (all while primarily at the Sanger Institute), with numerous other
       smaller but valuable contributions. See the per-command manual pages
       for further authorship.

SEE ALSO
       samtools-addreplacerg(1), samtools-ampliconclip(1), samtools-bedcov(1),
       samtools-calmd(1), samtools-cat(1), samtools-collate(1),
       samtools-coverage(1), samtools-depad(1), samtools-depth(1),
       samtools-dict(1), samtools-faidx(1), samtools-fasta(1),
       samtools-fastq(1), samtools-fixmate(1), samtools-flags(1),
       samtools-flagstat(1), samtools-fqidx(1), samtools-idxstats(1),
       samtools-index(1), samtools-markdup(1), samtools-merge(1),
       samtools-mpileup(1), samtools-phase(1), samtools-quickcheck(1),
       samtools-reheader(1), samtools-rmdup(1), samtools-sort(1),
       samtools-split(1), samtools-stats(1), samtools-targetcut(1),
       samtools-tview(1), samtools-view(1), bcftools(1), sam(5), tabix(1)

       Samtools website: <http://www.htslib.org/>
       File format specification of SAM/BAM, CRAM, VCF/BCF:
       <http://samtools.github.io/hts-specs>
       Samtools latest source: <https://github.com/samtools/samtools>
       HTSlib latest source: <https://github.com/samtools/htslib>
       Bcftools website: <http://samtools.github.io/bcftools>

samtools-1.11		       22 September 2020		   samtools(1)
