samtools(1)		     Bioinformatics tools		   samtools(1)

       samtools	- Utilities for	the Sequence Alignment/Map (SAM) format

       samtools	view -bt ref_list.txt -o aln.bam aln.sam.gz

       samtools	sort -T	/tmp/aln.sorted	-o aln.sorted.bam aln.bam

       samtools	index aln.sorted.bam

       samtools	idxstats aln.sorted.bam

       samtools	flagstat aln.sorted.bam

       samtools	stats aln.sorted.bam

       samtools	bedcov region.bed aln.sorted.bam

       samtools	depth aln.sorted.bam

       samtools	view aln.sorted.bam chr2:20,100,000-20,200,000

       samtools	merge out.bam in1.bam in2.bam in3.bam

       samtools	faidx ref.fasta

       samtools	fqidx ref.fastq

       samtools	tview aln.sorted.bam ref.fasta

       samtools	split merged.bam

       samtools	quickcheck in1.bam in2.cram

       samtools	dict -a	GRCh38 -s "Homo	sapiens" ref.fasta

       samtools	fixmate	in.namesorted.sam out.bam

       samtools	mpileup	-C50 -f	ref.fasta -r chr3:1,000-2,000 in1.bam in2.bam

       samtools	flags PAIRED,UNMAP,MUNMAP

       samtools	fastq input.bam	> output.fastq

       samtools	fasta input.bam	> output.fasta

       samtools	addreplacerg -r 'ID:fish' -r 'LB:1334' -r 'SM:alpha'
	       -o output.bam input.bam

       samtools	collate	-o aln.name_collated.bam aln.sorted.bam

       samtools	depad input.bam

       samtools	markdup	in.algnsorted.bam out.bam

       Samtools	is a set of utilities that manipulate alignments  in  the  BAM
       format. It imports from and exports to the SAM (Sequence	Alignment/Map)
       format, does sorting, merging and indexing, and allows reads in any
       region to be retrieved swiftly.

       Samtools	 is designed to	work on	a stream. It regards an	input file `-'
       as the standard input (stdin) and an output file	`-'  as	 the  standard
       output (stdout).	Several	commands can thus be combined with Unix	pipes.
       Samtools	always outputs warning and error messages to the standard
       error output (stderr).

       Samtools	 is  also able to open a BAM (not SAM) file on a remote	FTP or
       HTTP server if the BAM file name	starts	with  `ftp://'	or  `http://'.
       Samtools	checks the current working directory for the index file and
       downloads the index if it is absent. Samtools does not retrieve the
       entire alignment	file unless it is asked	to do so.

       Each  command  has  its own man page which can be viewed	using e.g. man
       samtools-view or	with a recent GNU man using man	samtools view.	 Below
       we have a brief summary of syntax and sub-command description.

       Options	common	to all sub-commands are	documented below in the	GLOBAL
       COMMAND OPTIONS section.

       view	 samtools view [options] in.sam|in.bam|in.cram [region...]

		 With no options or regions specified, prints  all  alignments
		 in  the  specified input alignment file (in SAM, BAM, or CRAM
		 format) to standard output in SAM format (with	no header by
		 default).

		 You may specify one or	more space-separated region specifica-
		 tions after the input filename	to  restrict  output  to  only
		 those	alignments  which overlap the specified	region(s). Use
		 of region specifications requires a coordinate-sorted and in-
		 dexed input file.
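
		 As a rough illustration of the region syntax (not the parser
		 samtools itself uses), the shell sketch below splits a
		 specification such as the chr2:20,100,000-20,200,000 example
		 from the synopsis into name, start and end, dropping the
		 thousands separators.  The function name parse_region is
		 hypothetical, and the sketch assumes that the reference name
		 contains no colon and that both start and end are given.

```shell
# Hypothetical helper: split "name:start-end" into "name start end".
# Assumes no colon in the reference name and an explicit start and end.
parse_region() {
    region_name=${1%%:*}                               # text before the first colon
    region_range=$(printf '%s' "${1#*:}" | tr -d ',')  # coordinates without commas
    printf '%s %s %s\n' "$region_name" "${region_range%-*}" "${region_range#*-}"
}

parse_region 'chr2:20,100,000-20,200,000'   # prints: chr2 20100000 20200000
```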

		 Options  exist	to change the output format from SAM to	BAM or
		 CRAM, so this command also acts as a file  format  conversion
		 utility.

       sort	 samtools sort [-l level] [-m maxMem] [-o out.bam] [-O format]
		 [-n] [-t tag] [-T tmpprefix] [-@ threads]

		 Sort alignments by leftmost coordinates, or by	read name when
		 -n is used.  An appropriate @HD-SO sort order header tag will
		 be added or an	existing one updated if	necessary.

		 The  sorted  output is	written	to standard output by default,
		 or to the specified file (out.bam) when  -o  is  used.	  This
		 command  will also create temporary files tmpprefix.%d.bam as
		 needed	when the entire	alignment data cannot fit into	memory
		 (as controlled	via the	-m option).

		 Consider using	samtools collate instead if you	need name col-
		 lated data without a full lexicographical sort.

       index	 samtools index	[-bc] [-m INT] aln.bam|aln.cram	[out.index]

		 Index a coordinate-sorted BAM or CRAM file  for  fast	random
		 access.  (Note	that this does not work	with SAM files even if
		 they are  bgzip  compressed  --  to  index  such  files,  use
		 tabix(1) instead.)

		 This  index is	needed when region arguments are used to limit
		 samtools view and similar commands to particular  regions  of
		 interest.

		 If  an	output filename	is given, the index file will be writ-
		 ten to	out.index.  Otherwise, for a CRAM file aln.cram, index
		 file  aln.cram.crai  will be created; for a BAM file aln.bam,
		 either	aln.bam.bai or aln.bam.csi will	be created,  depending
		 on the	index format selected.

       idxstats	 samtools idxstats in.sam|in.bam|in.cram

		 Retrieve  and	print stats in the index file corresponding to
		 the input file.  Before calling idxstats, the input BAM  file
		 should	be indexed by samtools index.

		 If  run  on a SAM or CRAM file	or an unindexed	BAM file, this
		 command will still produce the	same summary  statistics,  but
		 does  so  by  reading	through	 the entire file.  This	is far
		 slower	than using the BAM indices.

		 The output is TAB-delimited with each line consisting of ref-
		 erence	 sequence  name, sequence length, # mapped reads and #
		 unmapped reads. It is written to stdout.
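
		 Because the output is plain TAB-delimited text, it is easy to
		 post-process.  The sketch below totals the mapped and
		 unmapped columns with awk; the function name sum_idxstats is
		 hypothetical and the sample rows are made up for
		 illustration.

```shell
# Hypothetical helper: total the mapped and unmapped columns of idxstats
# output read from standard input (columns: name, length, mapped, unmapped).
sum_idxstats() {
    awk -F '\t' '{ mapped += $3; unmapped += $4 }
                 END { print mapped + 0, unmapped + 0 }'
}

# Made-up sample rows in the four-column layout described above.
printf 'chr1\t1000\t10\t1\nchr2\t500\t5\t0\n' | sum_idxstats   # prints: 15 1
```

		 With samtools installed, the same function could be fed from
		 "samtools idxstats aln.sorted.bam".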

       flagstat	 samtools flagstat in.sam|in.bam|in.cram

		 Does a	full pass through the  input  file  to	calculate  and
		 print statistics to stdout.

		 Provides  counts for each of 13 categories based primarily on
		 bit flags in the FLAG field. Each category in the  output  is
		 broken	 down  into QC pass and	QC fail, which is presented as
		 "#PASS	+ #FAIL" followed by a description of the category.

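		 The "#PASS + #FAIL" layout described for flagstat means the
		 two counts are the first and third whitespace-separated
		 fields of each line.  A minimal sketch of pulling them out
		 for one category follows; the function name flagstat_counts
		 is hypothetical and the sample lines are illustrative rather
		 than real output.

```shell
# Hypothetical helper: print "PASS FAIL" for the first flagstat line (read
# from standard input) whose description contains the text given as $1.
flagstat_counts() {
    awk -v category="$1" 'index($0, category) { print $1, $3; exit }'
}

# Illustrative lines following the "#PASS + #FAIL description" layout.
printf '1000 + 20 in total\n950 + 10 mapped\n' | flagstat_counts mapped
# prints: 950 10
```
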
       stats	 samtools stats	[options] in.sam|in.bam|in.cram	[region...]

		 samtools stats	collects statistics from BAM files and outputs
		 in  a	text format.  The output can be	visualized graphically
		 using plot-bamstats.

       bedcov	 samtools bedcov [options] region.bed
		 in1.sam|in1.bam|in1.cram[...]

		 Reports  the  total read base count (i.e. the sum of per base
		 read depths) for each genomic region specified	 in  the  sup-
		 plied	BED file. The regions are output as they appear	in the
		 BED file and are 0-based.  Counts  for	 each  alignment  file
		 supplied are reported in separate columns.

       depth	 samtools     depth	[options]    [in1.sam|in1.bam|in1.cram
		 [in2.sam|in2.bam|in2.cram] [...]]

		 Computes the read depth at each position or region.

       merge	 samtools merge	[-nur1f] [-h inh.sam] [-t tag]	[-R  reg]  [-b
		 list] out.bam in1.bam [in2.bam	in3.bam	... inN.bam]

		 Merge	multiple  sorted  alignment  files, producing a	single
		 sorted	output file that contains all the  input  records  and
		 maintains the existing	sort order.

		 If  -h	 is  specified	the @SQ	headers	of input files will be
		 merged	into the specified  header,  otherwise	they  will  be
		 merged	 into  a composite header created from the input head-
		 ers.  If the @SQ headers differ in order this may require the
		 output	file to	be re-sorted after merge.

		 The ordering of the records in	the input files	must match the
		 usage of the -n and -t	command-line options.  If they do not,
		 the output order will be undefined.  See sort for information
		 about record ordering.

       faidx	 samtools faidx	<ref.fasta> [region1 [...]]

		 Index reference sequence in the FASTA format or extract  sub-
		 sequence  from	 indexed  reference  sequence. If no region is
		 specified, faidx will index the file and create
		 <ref.fasta>.fai on the disk. If regions are specified, the
		 subsequences will be retrieved	and printed to stdout  in  the
		 FASTA format.

		 The input file	can be compressed in the BGZF format.

		 FASTQ files can be read and indexed by	this command.  Without
		 using --fastq any extracted subsequence will be in FASTA
		 format.

       fqidx	 samtools fqidx	<ref.fastq> [region1 [...]]

		 Index	reference sequence in the FASTQ	format or extract sub-
		 sequence from indexed reference sequence.  If	no  region  is
		 specified, fqidx will index the file and create
		 <ref.fastq>.fai on the disk. If regions are specified, the
		 subsequences  will  be	retrieved and printed to stdout	in the
		 FASTQ format.

		 The input file	can be compressed in the BGZF format.

		 samtools fqidx	should only be used  on	 fastq	files  with  a
		 small number of entries.  Trying to use it on a file contain-
		 ing millions of short sequencing reads	will produce an	 index
		 that  is almost as big	as the original	file, and searches us-
		 ing the index will be very slow and use a lot of memory.

       tview	 samtools  tview  [-p	chr:pos]   [-s	 STR]	[-d   display]
		 <in.sorted.bam> [ref.fasta]

		 Text alignment viewer (based on the ncurses library).  In the
		 viewer, press `?' for help and press `g' to jump to an
		 alignment position given in a format like `chr10:10,000,000',
		 or `=10,000,000' when staying on the same reference sequence.

       split	 samtools split	[options] merged.sam|merged.bam|merged.cram

		 Splits	 a  file  by  read group, producing one	or more	output
		 files matching	a common prefix	(by default based on the input
		 filename) each	containing one read-group.

       quickcheck
		 samtools quickcheck [options] in.sam|in.bam|in.cram [ ... ]

		 Quickly check that input files appear to be intact.  Checks
		 that the beginning of the file contains a valid header (all
		 formats) containing at least one target sequence and then
		 seeks to the end of the file and checks that an end-of-file
		 (EOF) block is present and intact (BAM only).

		 Data  in  the middle of the file is not read since that would
		 be much more time consuming, so please	note that this command
		 will  not detect internal corruption, but is useful for test-
		 ing that files	are not	truncated before performing  more  in-
		 tensive tasks on them.

		 This command will exit	with a non-zero	exit code if any input
		 files don't have a valid header or are	missing	an EOF	block.
		 Otherwise it will exit	successfully (with a zero exit code).
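
		 The exit code makes quickcheck convenient as a guard in
		 scripts.  The sketch below prints only the files that pass
		 and reports the others on stderr.  The function name
		 filter_intact and the QUICKCHECK variable are hypothetical;
		 the variable exists only so the loop can be tried without
		 samtools installed.

```shell
# Print only the files that pass the check; report the rest on stderr.
# QUICKCHECK is a hypothetical override hook; by default the real
# "samtools quickcheck" command is run on each file.
filter_intact() {
    for aln_file in "$@"; do
        if ${QUICKCHECK:-samtools quickcheck} "$aln_file"; then
            printf '%s\n' "$aln_file"
        else
            printf 'skipping %s: quickcheck failed\n' "$aln_file" >&2
        fi
    done
}

# Typical use, assuming samtools is installed:
#   filter_intact in1.bam in2.cram
```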

       dict	 samtools dict ref.fasta|ref.fasta.gz

		 Create	a sequence dictionary file from	a fasta	file.

       fixmate	 samtools fixmate [-rpcm] [-O format] in.nameSrt.bam out.bam

		 Fill in mate coordinates, ISIZE and mate related flags	from a
		 name-sorted alignment.

       mpileup	 samtools mpileup [-EB]	[-C capQcoef] [-r reg] [-f in.fa]  [-l
		 list] [-Q minBaseQ] [-q minMapQ] in.bam [in2.bam [...]]

		 Generate  textual  pileup for one or multiple BAM files.  For
		 VCF and BCF output, please use	the bcftools  mpileup  command
		 instead.   Alignment records are grouped by sample (SM) iden-
		 tifiers in @RG	header lines.  If sample identifiers  are  ab-
		 sent, each input file is regarded as one sample.

		 See  the  samtools-mpileup  man page for a description	of the
		 pileup	format and options.

       flags	 samtools flags	INT|STR[,...]

		 Convert between textual and numeric flag representation.


		   0x1	 PAIRED		 paired-end (or	multiple-segment) sequencing technology
		   0x2	 PROPER_PAIR	 each segment properly aligned according to the	aligner
		   0x4	 UNMAP		 segment unmapped
		   0x8	 MUNMAP		 next segment in the template unmapped
		  0x10	 REVERSE	 SEQ is	reverse	complemented
		  0x20	 MREVERSE	 SEQ of	the next segment in the	template is reverse complemented
		  0x40	 READ1		 the first segment in the template
		  0x80	 READ2		 the last segment in the template
		 0x100	 SECONDARY	 secondary alignment
		 0x200	 QCFAIL		 not passing quality controls
		 0x400	 DUP		 PCR or	optical	duplicate
		 0x800	 SUPPLEMENTARY	 supplementary alignment
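
		 The table maps directly onto bit tests.  As an illustration
		 of the numeric-to-text direction (a sketch, not the samtools
		 implementation), the hypothetical function decode_flags below
		 walks the twelve bits in table order.

```shell
# Hypothetical helper: list the names of the bits set in a numeric FLAG
# value, in the order of the table above (0x1 first, 0x800 last).
decode_flags() {
    flag_value=$(($1))
    flag_names=
    flag_bit=1
    for flag_name in PAIRED PROPER_PAIR UNMAP MUNMAP REVERSE MREVERSE \
                     READ1 READ2 SECONDARY QCFAIL DUP SUPPLEMENTARY; do
        if [ $((flag_value & flag_bit)) -ne 0 ]; then
            flag_names=${flag_names:+$flag_names,}$flag_name
        fi
        flag_bit=$((flag_bit * 2))
    done
    printf '%s\n' "$flag_names"
}

decode_flags 13   # prints: PAIRED,UNMAP,MUNMAP
```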

       fastq/a	 samtools fastq	[options] in.bam
		 samtools fasta	[options] in.bam

		 Converts a BAM	or CRAM	into either FASTQ or FASTA format
		 depending on the command invoked. The files will be
		 automatically compressed if the file names have a .gz or
		 .bgzf extension.

		 The input to this program must	be collated by name.  Use sam-
		 tools collate or samtools sort	-n to ensure this.

       collate	 samtools collate [options] in.sam|in.bam|in.cram [<prefix>]

		 Shuffles and groups reads together by their names.  A	faster
		 alternative  to  a full query name sort, collate ensures that
		 reads of the same name	are  grouped  together	in  contiguous
		 groups,  but  doesn't	make any guarantees about the order of
		 read names between groups.

		 The output from this command should be	suitable for any oper-
		 ation	that  requires	all reads from the same	template to be
		 grouped together.

       reheader	 samtools reheader [-iP] in.header.sam in.bam

		 Replace  the  header	in   in.bam   with   the   header   in
		 in.header.sam.	  This	command	 is much faster	than replacing
		 the header with a BAM->SAM->BAM conversion.

		 By default this command outputs the BAM or CRAM file to stan-
		 dard  output  (stdout),  but for CRAM format files it has the
		 option	to perform an in-place edit, both reading and  writing
		 to  the  same file.  No validity checking is performed	on the
		 header, nor that it is	suitable to use	with the sequence data
		 itself.

       cat	 samtools  cat	[-b list] [-h header.sam] [-o out.bam] in1.bam
		 in2.bam [ ... ]

		 Concatenate BAMs or CRAMs. Although this works	on either  BAM
		 or  CRAM,  all	 input	files  must be the same	format as each
		 other.	The sequence dictionary	of each	 input	file  must  be
		 identical,  although  this  command does not check this. This
		 command uses a	similar	trick to reheader which	 enables  fast
		 BAM concatenation.

       rmdup	 samtools rmdup	[-sS] <> <out.bam>

		 This command is obsolete. Use markdup instead.

       addreplacerg
		 samtools  addreplacerg	 [-r rg-line | -R rg-ID] [-m mode] [-l
		 level]	[-o out.bam] in.bam

		 Adds or replaces read group tags in a file.

       calmd	 samtools calmd	[-Eeubr] [-C capQcoef] aln.bam ref.fasta

		 Generate the MD tag. If the MD	tag is already	present,  this
		 command  will	give a warning if the MD tag generated is dif-
		 ferent	from the existing tag. Output SAM by default.

		 Calmd can also	read and write CRAM  files  although  in  most
		 cases	it is pointless	as CRAM	recalculates MD	and NM tags on
		 the fly.  The one exception to	this case is where both	 input
		 and  output CRAM files	have been / are	being created with the
		 no_ref	option.

       targetcut samtools targetcut [-Q	minBaseQ] [-i inPenalty] [-0 em0]  [-1
		 em1] [-2 em2] [-f ref]	in.bam

		 This  command identifies target regions by examining the con-
		 tinuity of read depth,	computes haploid  consensus  sequences
		 of targets and	outputs	a SAM with each	sequence corresponding
		 to a target. When option -f is	in use,	BAQ will  be  applied.
		 This  command is only designed	for cutting fosmid clones from
		 fosmid	pool sequencing	[Ref. Kitzman et al. (2010)].

       phase	 samtools phase	[-AF] [-k len] [-b  prefix]  [-q  minLOD]  [-Q
		 minBaseQ] in.bam

		 Call and phase	heterozygous SNPs.

       depad	 samtools depad	[-SsCu1] [-T ref.fa] [-o output] in.bam

		 Converts  a  BAM  aligned against a padded reference to a BAM
		 aligned against the depadded reference.  The padded reference
		 may  contain verbatim "*" bases in it,	but "*"	bases are also
		 counted in the	reference numbering.  This means  that	a  se-
		 quence	 base-call  aligned against a reference	"*" is consid-
		 ered to be a cigar match ("M" or "X") operator	(if the	 base-
		 call is "A", "C", "G" or "T").	 After depadding the reference
		 "*" bases are deleted and such	 aligned  sequence  base-calls
		 become	insertions.  Similar transformations apply for
		 deletions and padding cigar operations.

       markdup	 samtools markdup [-l length] [-r] [-s]	[-T] [-S]
		 gsort.bam out.bam

		 Mark  duplicate alignments from a coordinate sorted file that
		 has been run through samtools fixmate	with  the  -m  option.
		 This  program	relies on the MC and ms	tags that fixmate
		 provides.
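
		 The requirement for a coordinate-sorted file that has been
		 through fixmate -m suggests a four-step workflow: name sort,
		 fixmate, coordinate sort, markdup.  The sketch below wraps
		 those steps; it assumes samtools is installed, and the
		 function name mark_duplicates and the intermediate file names
		 are placeholders.

```shell
# Hypothetical wrapper: mark duplicates in $1 and write the result to $2.
# Intermediate files live in a temporary directory that is removed afterwards.
mark_duplicates() {
    in_bam=$1
    out_bam=$2
    work_dir=$(mktemp -d) || return 1
    samtools sort -n -o "$work_dir/namesorted.bam" "$in_bam" &&
    samtools fixmate -m "$work_dir/namesorted.bam" "$work_dir/fixmate.bam" &&
    samtools sort -o "$work_dir/positionsorted.bam" "$work_dir/fixmate.bam" &&
    samtools markdup "$work_dir/positionsorted.bam" "$out_bam"
    rc=$?                     # exit status of the first step that failed, or 0
    rm -rf "$work_dir"
    return "$rc"
}

# Example: mark_duplicates aln.bam aln.markdup.bam
```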

       These are options that are passed after the  samtools  command,	before
       any sub-command is specified.

       help, --help
	      Display  a  brief	 usage	message	 listing the samtools commands
	      available.  If the name of a command is also given,  e.g.,  sam-
	      tools help view,	the detailed usage message for that particular
	      command is displayed.

	      Display the version numbers and copyright	information  for  sam-
	      tools and	the important libraries	used by	samtools.

	      Display  the  full samtools version number in a machine-readable
	      format.

       Several long-options are	shared between multiple	samtools sub-commands:
       --input-fmt,   --input-fmt-option,  --output-fmt,  --output-fmt-option,
       --reference, --write-index, and --verbosity.  The input format is typi-
       cally auto-detected so specifying the format is usually unnecessary and
       the option is included for completeness.	 Note that not all subcommands
       have all	options.  Consult the subcommand help for more details.

       Format  strings recognised are "sam", "sam.gz", "bam" and "cram".  They
       may be followed by  a  comma  separated	list  of  options  as  key  or
       key=value. See below for	examples.
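
       To make the comma-separated syntax concrete, the sketch below joins a
       format name and any number of key or key=value options into a single
       string.  The function name fmt_string is hypothetical; the option names
       used are the ones that appear in the examples in this page.

```shell
# Hypothetical helper: join a format name and its options with commas,
# producing a string in the "format,key,key=value" form described above.
fmt_string() {
    fmt_out=$1
    shift
    for fmt_opt in "$@"; do
        fmt_out="$fmt_out,$fmt_opt"
    done
    printf '%s\n' "$fmt_out"
}

fmt_string cram version=3.0 embed_ref   # prints: cram,version=3.0,embed_ref
```

       The result could then be passed as the argument of --output-fmt.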

       The fmt-option arguments	accept either a	single option or option=value.
       Note that some options only work	on some	file formats and only on  read
       or  write  streams.   If	value is unspecified for a boolean option, the
       value is	assumed	to be 1.  The valid options are	as follows.

	   Output only.	Specifies the compression level	from 1 to 9, or	0  for
	   uncompressed.   If the output format	is SAM,	this also enables BGZF
	   compression,	otherwise SAM defaults to uncompressed.

	   Specifies the number	of threads to use during encoding  and/or  de-
	   coding.   For  BAM this will	be encoding only.  In CRAM the threads
	   are dynamically shared between encoder and decoder.

	   Specifies a FASTA reference file for	use in CRAM encoding or	decod-
	   ing.	  It usually is	not required for decoding except in the	situa-
	   tion	of the MD5 not being obtainable	via the	REF_PATH or  REF_CACHE
	   environment variables.

	   CRAM	input only; defaults to	1 (on).	 CRAM does not typically store
	   MD and NM tags, preferring to generate them on the fly.  When  this
	   option  is  0 missing MD, NM	tags will not be generated.  It	can be
	   particularly	 useful	 when  combined	 with  a  file	encoded	 using
	   store_md=1 and store_nm=1.

	   CRAM	 output	 only; defaults	to 0 (off).  CRAM normally only	stores
	   MD tags when	the reference is unknown and lets the decoder generate
	   these values	on-the-fly (see	decode_md).

	   CRAM	 output	 only; defaults	to 0 (off).  CRAM normally only	stores
	   NM tags when	the reference is unknown and lets the decoder generate
	   these values	on-the-fly (see	decode_md).

	   CRAM	 input	only; defaults to 0 (off).  When enabled, md5 checksum
	   errors on the reference sequence and	block checksum	errors	within
	   CRAM	are ignored.  Use of this option is strongly discouraged.

	   CRAM	 input only; specifies which SAM columns need to be populated.
	   By default all fields are used.  Limiting the  decode  to  specific
	   columns can have significant	performance gains.  The	bit-field is a
	   numerical value constructed from the	following table.

	      0x1   SAM_QNAME
	      0x2   SAM_FLAG
	      0x4   SAM_RNAME
	      0x8   SAM_POS
	     0x10   SAM_MAPQ
	     0x20   SAM_CIGAR
	     0x40   SAM_RNEXT
	     0x80   SAM_PNEXT
	    0x100   SAM_TLEN
	    0x200   SAM_SEQ
	    0x400   SAM_QUAL
	    0x800   SAM_AUX
	   0x1000   SAM_RGAUX
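
	   As a worked example of building the bit-field, the sketch below
	   combines the bits for QNAME, FLAG, SEQ and QUAL, a selection one
	   might choose when only read names, flags, bases and qualities are
	   needed.  The choice of columns is an illustration only.

```shell
# Combine column bits from the table above into one numeric value.
# Here: SAM_QNAME (0x1), SAM_FLAG (0x2), SAM_SEQ (0x200), SAM_QUAL (0x400).
fields=$(( 0x1 | 0x2 | 0x200 | 0x400 ))
printf 'decimal: %d  hex: 0x%x\n' "$fields" "$fields"
# prints: decimal: 1539  hex: 0x603
```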

	   CRAM	input only; defaults to	output filename.  Any  sequences  with
	   auto-generated read names will use string as	the name prefix.

	   CRAM	 output	 only; defaults	to 0 (off).  By	default	CRAM generates
	   one container per reference sequence, except	in the	case  of  many
	   small references (such as a fragmented assembly).

	   CRAM	 output	 only.	Specifies the CRAM version number.  Acceptable
	   values are "2.1" and	"3.0".

	   CRAM	output only; defaults to 10000.

	   CRAM	output only; defaults to 1.  The  effect  of  having  multiple
	   slices  per	container is to	share the compression header block be-
	   tween multiple slices.  This	is unlikely to	have  any  significant
	   impact  unless  the number of sequences per slice is	reduced.  (To-
	   gether these	two options control the	granularity of random access.)

	   CRAM	output only; defaults to 0 (off).  If 1, this will store
	   portions of the reference sequence in each slice, permitting decode
	   without requiring an external copy of the reference sequence.

	   CRAM	 output	 only;	defaults  to 0 (off).  If 1, sequences will be
	   stored verbatim with	no reference encoding.	This can be useful  if
	   no reference	is available for the file.

	   CRAM	 output	 only;	defaults  to 0 (off).  Permits use of bzip2 in
	   CRAM	block compression.

	   CRAM	output only; defaults to 0 (off).  Permits use of lzma in CRAM
	   block compression.

	   CRAM	 output	 only;	defaults to 0 (off).  If 1, templates with all
	   members within the same CRAM	slice will have	their read  names  re-
	   moved.   New	names will be automatically generated during decoding.
	   Also	see the	name_prefix option.

       For example:

	   samtools view --input-fmt-option decode_md=0
	       --output-fmt cram,version=3.0 --output-fmt-option embed_ref
	       --output-fmt-option seqs_per_slice=2000 -o foo.cram foo.bam

       The --write-index option	enables	automatic index	creation while writing
       out  BAM,  CRAM	or  bgzf SAM files.  Note to get compressed SAM	as the
       output format you need to manually request a compression	level,	other-
       wise  all SAM files are uncompressed.  SAM and BAM will use CSI indices
       while CRAM will use CRAI	indices.

       For example: to convert a BAM to	a compressed SAM with CSI indexing:

	   samtools view -h -O sam,level=6 --write-index in.bam	-o out.sam.gz

       The --verbosity INT option sets the verbosity level  for	 samtools  and
       HTSlib.	The default is 3 (HTS_LOG_WARNING); 2 reduces warning messages
       and 0 or	1 also reduces some error messages, while values greater  than
       3  produce  increasing  numbers of additional warnings and logging
       messages.

       The CRAM	format requires	use of a reference sequence for	 both  reading
       and writing.

       When  reading  a	 CRAM the @SQ headers are interrogated to identify the
       reference sequence MD5sum (M5: tag) and the  local  reference  sequence
       filename	(UR: tag).  Note that http:// and ftp:// based URLs in the UR:
       field are not used, but local fasta filenames (with or without file://)
       can be used.

       To create a CRAM	the @SQ	headers	will also be read to identify the ref-
       erence sequences, but M5: and UR: tags may not be present. In this case
       the -T and -t options of	samtools view may be used to specify the fasta
       or fasta.fai filenames respectively (provided the  .fasta.fai  file  is
       also backed up by a .fasta file).

       The search order	to obtain a reference is:

       1. Use any local	file specified by the command line options (eg -T).

       2. Look for MD5 via REF_CACHE environment variable.

       3. Look for MD5 in each element of the REF_PATH environment variable.

       4. Look for a local file	listed in the UR: header tag.

	      A	colon-separated	list of	directories in which to	search for HT-
	      Slib plugins.  If	$HTS_PATH starts or ends with a	colon or  con-
	      tains  a	double colon (::), the built-in	list of	directories is
	      searched at that point in	the search.

	      If no HTS_PATH variable is defined, the built-in list of	direc-
	      tories  specified	when HTSlib was	built is used, which typically
	      includes /usr/local/libexec/htslib and similar directories.

	      A	colon separated	(semi-colon on Windows)	list of	 locations  in
	      which  to	 look for sequences identified by their	MD5sums.  This
	      can be either a list of directories or URLs. Note	that if	a  URL
	      is  included  then  the  colon in	http://	and ftp:// and the op-
	      tional port number will be treated as part of the	URL and	not  a
	      PATH field separator.  For URLs, the text	%s will	be replaced by
	      the MD5sum being read.

	      If  no  REF_PATH	has  been  specified  it   will	  default   to  and	 if  REF_CACHE is also
	      unset, it	will be	set to $XDG_CACHE_HOME/hts-ref/%2s/%2s/%s.  If
	      $XDG_CACHE_HOME is unset,	$HOME/.cache (or a local system
	      temporary directory if no home directory is found) will be used
	      similarly.

	      This  can	be defined to a	single directory housing a local cache
	      of references.  Upon downloading a reference it will  be	stored
	      in  the location pointed to by REF_CACHE.	 When reading a	refer-
	      ence it will be looked for in this  directory  before  searching
	      REF_PATH.	  To  avoid many files being stored in the same	direc-
	      tory, a pathname may be constructed using	%nums and %s notation,
	      consuming	 num  characters  of  the  MD5sum.   For  example /lo-
	      cal/ref_cache/%2s/%2s/%s will  create  2	nested	subdirectories
	      with  the	 filenames  in the deepest directory being the last 28
	      characters of the	md5sum.
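
	      The expansion of the %2s/%2s/%s pattern can be sketched in shell
	      as follows.  This only illustrates the path layout described
	      above, it is not HTSlib code; the function name ref_cache_path
	      and the MD5 value are made up.

```shell
# Hypothetical helper: build the cache path that a ROOT/%2s/%2s/%s pattern
# yields for a 32-character MD5sum: two 2-character directory levels,
# then the remaining 28 characters as the file name.
ref_cache_path() {
    cache_root=$1
    md5=$2
    level1=$(printf '%s' "$md5" | cut -c1-2)
    level2=$(printf '%s' "$md5" | cut -c3-4)
    rest=$(printf '%s' "$md5" | cut -c5-)
    printf '%s/%s/%s/%s\n' "$cache_root" "$level1" "$level2" "$rest"
}

ref_cache_path /local/ref_cache 0123456789abcdef0123456789abcdef
# prints: /local/ref_cache/01/23/456789abcdef0123456789abcdef
```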

	      The REF_CACHE directory will be searched before attempting to
	      load via the REF_PATH search list.  If no REF_PATH is defined,
	      both REF_PATH and REF_CACHE will be automatically set (see
	      above), but if REF_PATH is defined and REF_CACHE is not, then
	      no local cache is used.

	      To  aid  population  of  the  REF_CACHE	directory   a	script
	      misc/ is provided in	the Samtools distribu-
	      tion. This takes a fasta file or a directory of fasta files  and
	      generates	the MD5sum named files.

       o Import	SAM to BAM when	@SQ lines are present in the header:

	   samtools view -b aln.sam > aln.bam

	 If @SQ	lines are absent:

	   samtools faidx ref.fa
	   samtools view -bt ref.fa.fai	aln.sam	> aln.bam

	 where ref.fa.fai is generated automatically by	the faidx command.

       o Convert a BAM file to a CRAM file using a local reference sequence.

	   samtools view -C -T ref.fa aln.bam >	aln.cram

       o Unaligned words used in bam_endian.h, bam.c and bam_aux.c.

       Heng  Li	from the Sanger	Institute wrote	the original C version of sam-
       tools.  Bob Handsaker from the Broad Institute implemented the BGZF li-
       brary.	Petr  Danecek  and  Heng  Li wrote the VCF/BCF implementation.
       James Bonfield from the Sanger Institute	developed the CRAM implementa-
       tion.   Other large code	contributions have been	made by	John Marshall,
       Rob Davies, Martin Pollard, Andrew Whitwham, Valeriu  Ohan  (all	 while
       primarily  at  the  Sanger  Institute), with numerous other smaller but
       valuable	contributions.	See the	per-command manual pages  for  further
       authorship.

       samtools-addreplacerg(1),  samtools-bedcov(1),  samtools-calmd(1), sam-
       tools-cat(1),   samtools-collate(1),    samtools-depad(1),    samtools-
       depth(1),  samtools-dict(1), samtools-faidx(1), samtools-fasta(1), sam-
       tools-fastq(1), samtools-fixmate(1), samtools-flags(1),	samtools-flag-
       stat(1),	 samtools-fqidx(1),  samtools-idxstats(1),  samtools-index(1),
       samtools-markdup(1), samtools-merge(1), samtools-mpileup(1),  samtools-
       phase(1),   samtools-quickcheck(1),   samtools-reheader(1),   samtools-
       rmdup(1), samtools-sort(1), samtools-split(1), samtools-stats(1),  sam-
       tools-targetcut(1),  samtools-tview(1),	samtools-view(1), bcftools(1),
       sam(5), tabix(1)

       Samtools	website: <>
       File  format  specification   of	  SAM/BAM,CRAM,VCF/BCF:	  <http://sam->
       Samtools	latest source: <>
       HTSlib latest source: <>
       Bcftools	website: <>

samtools-1.10			6 December 2019			   samtools(1)

