tabix(1)		     Bioinformatics tools		      tabix(1)

       tabix - Generic indexer for TAB-delimited genome	position files

       tabix  [-0lf]  [-p gff|bed|sam|vcf] [-s seqCol] [-b begCol] [-e endCol]
       [-S lineSkip] [-c metaChar] [region1 [region2	[...]]]

       Tabix indexes a TAB-delimited genome position file and creates an
       index file when region is absent from the command line. The input data
       file must be position sorted and compressed by bgzip, which has a
       gzip(1)-like interface.

       After  indexing,	 tabix is able to quickly retrieve data	lines overlap-
       ping regions specified in the format  "chr:beginPos-endPos".   (Coordi-
       nates specified in this region format are 1-based and inclusive.)
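
The coordinate convention above matters when a query interval comes from a
0-based source such as a BED file. The following is a minimal sketch, not part
of tabix itself: bed_to_region is a hypothetical helper name, and it assumes
the usual BED convention of 0-based, half-open intervals, so only the start
value shifts by one.

```shell
#!/bin/sh
# bed_to_region CHROM START END
# Hypothetical helper: turn a 0-based, half-open BED interval into the
# 1-based, inclusive "chr:beginPos-endPos" string used for tabix queries.
# The start moves up by one; the end value stays the same.
bed_to_region() {
    printf '%s:%d-%d\n' "$1" "$(( $2 + 1 ))" "$3"
}

# BED interval chr1 [100, 200) covers 1-based positions 101..200.
bed_to_region chr1 100 200   # prints chr1:101-200
```

The resulting string can be passed as a region argument after the data file
name.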

       Fast data retrieval also works over a network if a URI is given as the
       file name; in this case the index file is downloaded if it is not
       present locally.

       -0, --zero-based
		 Specify  that	the position in	the data file is 0-based (e.g.
		 UCSC files) rather than 1-based.

       -b, --begin INT
		 Column	of start chromosomal position. [4]

       -c, --comment CHAR
		 Skip lines starting with character CHAR. [#]

       -C, --csi Produce CSI format index instead of classical	tabix  or  BAI
		 style indices.

       -e, --end INT
		 Column	of end chromosomal position. The end column can	be the
		 same as the start column. [5]

       -f, --force
		 Overwrite the index file if it already exists.

       -m, --min-shift INT
		 Set the minimal interval size for CSI indices to 2^INT. [14]

       -p, --preset STR
		 Input format for indexing. Valid values are: gff,  bed,  sam,
		 vcf.	This option should not be applied together with	any of
		 -s, -b, -e, -c	and -0;	it is not used for data	retrieval  be-
		 cause this setting is stored in the index file. [gff]

       -s, --sequence INT
		 Column	of sequence name. Options -s, -b, -e, -S, -c and -0
		 are all stored in the index file and are thus not used in
		 data retrieval. [1]

       -S, --skip-lines	INT
		 Skip first INT	lines in the data file.	[0]

       -h, --print-header
	      Also print the header/meta lines.

       -H, --only-header
	      Print only the header/meta lines.

       -l, --list-chroms
	      List the sequence	names stored in	the index file.

       -r, --reheader FILE
	      Replace the header with the content of FILE.

       -R, --regions FILE
	      Restrict to regions listed in FILE. FILE can be a BED file
	      (requires a .bed, .bed.gz, or .bed.bgz file name extension) or
	      a TAB-delimited file with CHROM, POS, and, optionally, POS_TO
	      columns, where positions are 1-based and inclusive. When this
	      option is in use, the input file may not be sorted.

       -T, --targets FILE
	      Similar to -R but	the entire input will be read sequentially and
	      regions not listed in FILE will be skipped.

       -D     Do not download the index	file before opening it.	Valid for  re-
	      mote files only.

       (grep  ^"#"  in.gff; grep -v ^"#" in.gff	| sort -k1,1 -k4,4n) | bgzip >

       tabix -p	gff sorted.gff.gz;

       tabix sorted.gff.gz chr1:10,000,000-20,000,000;
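
The same sort, compress, index and query steps can be sketched for a file that
matches none of the presets. This is an illustration under stated assumptions
rather than a recommended recipe: the three-column layout (sequence name,
start, end), the file names in.tsv and sorted.tsv, and the temporary directory
are all hypothetical. The bgzip and tabix steps run only if both tools are
installed.

```shell
#!/bin/sh
# Sketch: prepare a hypothetical 3-column file (chrom, start, end) for tabix.
LC_ALL=C; export LC_ALL            # byte-wise sort order for reproducibility
workdir=$(mktemp -d)

# Hypothetical unsorted input with one '#' header line.
printf '#chrom\tstart\tend\nchr2\t500\t600\nchr1\t300\t400\nchr1\t100\t200\n' \
    > "$workdir/in.tsv"

# Keep header lines first, then sort by sequence name and numeric start.
(grep '^#' "$workdir/in.tsv"; grep -v '^#' "$workdir/in.tsv" | sort -k1,1 -k2,2n) \
    > "$workdir/sorted.tsv"
cat "$workdir/sorted.tsv"

# Compress, index and query only when the tools are available.
if command -v bgzip >/dev/null 2>&1 && command -v tabix >/dev/null 2>&1; then
    # -c writes to stdout, so the uncompressed sorted file is kept.
    bgzip -c "$workdir/sorted.tsv" > "$workdir/sorted.tsv.gz"
    # Explicit columns instead of -p: sequence name 1, start 2, end 3.
    tabix -f -s 1 -b 2 -e 3 "$workdir/sorted.tsv.gz"
    # Retrieve data lines overlapping chr1:150-350.
    tabix "$workdir/sorted.tsv.gz" chr1:150-350
fi
```

Because -s, -b and -e are stored in the index, the query command does not
repeat them.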

       It is straightforward to	achieve	overlap	queries	using the standard  B-
       tree  index (with or without binning) implemented in all	SQL databases,
       or the R-tree index in PostgreSQL and Oracle. But there are still  many
       reasons	to  use	 tabix.	 Firstly,  tabix  directly works with a	lot of
       widely used TAB-delimited formats such as GFF/GTF and BED.  We  do  not
       need  to	 design	database schema	or specialized binary formats. Data do
       not need	to be duplicated in different formats, either. Secondly, tabix
       works  on  compressed  data  files while	most SQL databases do not. The
       GenCode annotation GTF can be compressed	down to	4%.  Thirdly, tabix is
       fast.  The  same	indexing algorithm is known to work efficiently	for an
       alignment with a	few billion short reads. SQL databases probably	cannot
       easily handle data at this scale. Last but not least, tabix supports
       remote data retrieval. One can put the data file and the index on an
       FTP or HTTP server, and other users or even web services will be able
       to get a slice without downloading the entire file.

       Tabix was written by Heng Li. The BGZF library  was  originally	imple-
       mented  by Bob Handsaker	and modified by	Heng Li	for remote file	access
       and in-memory caching.

       bgzip(1), samtools(1)

htslib-1.10.2		       19 December 2019			      tabix(1)

