abyss-sealer(1)			     ABySS		       abyss-sealer(1)

Name
       abyss-sealer - Close gaps within scaffolds

Synopsis
       abyss-sealer -b <Bloom filter size> -k <kmer size> -k <kmer size>... -o
       <output_prefix> -S <path to scaffold file> [options]... <reads1>

       For example:

       abyss-sealer  -b20G  -k64  -k80 -k96 -k112 -k128	-o test	-S scaffold.fa
       read1.fa	read2.fa

Description
       Sealer is an application of Konnector that closes intra-scaffold gaps.
       It  performs  three  sequential	functions.  First, regions with	Ns are
       identified from an input scaffold.  Flanking nucleotides (2 x 100bp) are
       extracted from those regions while respecting the strand	(5' to 3') di-
       rection on the sequence immediately downstream of  each	gap.   In  the
       second  step,  flanking	sequence  pairs	are used as input to Konnector
       along with a set	of reads with a	high  level  of	 coverage  redundancy.
       Ideally,	the reads should represent the original	dataset	from which the
       draft assembly is generated, or further whole genome shotgun (WGS)  se-
       quencing	 data  generated  from the same	sample.	 Within	Konnector, the
       input WGS reads are used	to populate a Bloom filter, tiling  the	 reads
       with a sliding window of	length k, thus generating a probabilistic rep-
       resentation of all the k-mers in	the reads.  Konnector also uses	 crude
       error  removal  and correctional	algorithms, eliminating	singletons (k-
       mers that are observed only once) and fixing  base  mismatches  in  the
       flanking	 sequence  pairs.  Sealer launches Konnector processes using a
       user-input range	of k-mer lengths.  In the third	and  final  operation,
       successfully  merged sequences are inserted into	the gaps of the	origi-
       nal scaffolds, and Sealer outputs a new gap-filled scaffold file.
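
       The first step described above (locating runs of N and taking the
       sequence on either side) can be sketched as follows.  This is only an
       illustration, not Sealer's implementation: the function name
       find_gap_flanks is hypothetical, positions are assumed to be 1-based,
       strand handling of the downstream flank is left out, and a flank
       length of 4 is used instead of the documented default of 100 to keep
       the output readable.

```shell
#!/bin/sh
# Sketch only: find runs of N in one scaffold sequence and print, per gap,
# start position (1-based, an assumption), gap size, left flank, right flank.
# find_gap_flanks is a hypothetical helper, not part of ABySS.
find_gap_flanks() {
    awk -v seq="$1" -v L="${2:-100}" 'BEGIN {
        offset = 0                      # characters of seq already consumed
        rest = seq
        while (match(rest, /[Nn]+/)) {
            start = offset + RSTART     # gap start on the full sequence
            size = RLENGTH              # number of N characters in the gap
            lstart = start - L
            if (lstart < 1) lstart = 1  # clip the left flank at the sequence start
            left = substr(seq, lstart, start - lstart)
            right = substr(seq, start + size, L)
            printf "%d\t%d\t%s\t%s\n", start, size, left, right
            offset += RSTART + RLENGTH - 1
            rest = substr(rest, RSTART + RLENGTH)
        }
    }'
}

# One gap of four Ns with 4 bp flanks on each side.
find_gap_flanks "ACGTACGTNNNNTTGCAAGG" 4
```

       In this simplified version a flank may run into a neighbouring gap
       when two gaps are closer together than the flank length.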

Installation
       See ABySS installation instructions.

How to run as stand-alone application
       abyss-sealer [-b	bloom filter size][-k values...] [-o outputprefix] [-S
       assembly	file] [options...] [reads...]

       Sealer requires the following information to run:

       o draft assembly

       o user-supplied k values (>0)

       o output prefix

       o WGS reads (for building Bloom filters)

Sample commands
       Without pre-built bloom filters:

       abyss-sealer -b20G -k64 -k96 -o run1 -S test.fa read1.fq.gz read2.fq.gz

       With pre-built bloom filters:

       abyss-sealer  -k64  -k96	 -o  run1 -S test.fa -i	k64.bloom -i k96.bloom
       read1.fq.gz read2.fq.gz

       Reusable	Bloom filters can be pre-built with abyss-bloom	build, e.g.:

       abyss-bloom  build  -vv	-k64  -j12  -b20G  -l2	k64.bloom  read1.fq.gz

       Note:  when  using  pre-built  bloom  filters  generated	by abyss-bloom
       build, Sealer must be compiled with the same  maxk  value  that	abyss-
       bloom was compiled with.	 For example, if a Bloom filter	was built with
       a maxk of 64, Sealer must be compiled with a maxk of 64	as  well.   If
       different  values are used between the pre-built	bloom filter and Seal-
       er, any sequences generated will	be nonsensical and incorrect.
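
       The pre-built workflow above can be scripted so that the -k and -i
       arguments always stay in step.  The following is a dry-run sketch: it
       only prints the commands.  The function name print_prebuilt_workflow
       and the file names are placeholders, the flag values (-vv, -j12,
       -b20G, -l2) are copied from the sample commands, and passing both
       read files to abyss-bloom build is an assumption.

```shell
#!/bin/sh
# Dry run: print (do not execute) one abyss-bloom build command per k value,
# followed by the matching abyss-sealer command that loads those filters.
# print_prebuilt_workflow is a hypothetical helper; file names are placeholders.
print_prebuilt_workflow() (
    prefix=$1
    scaffold=$2
    shift 2
    reads="read1.fq.gz read2.fq.gz"
    kflags=""
    iflags=""
    for k in "$@"; do
        echo "abyss-bloom build -vv -k$k -j12 -b20G -l2 k$k.bloom $reads"
        kflags="$kflags -k$k"           # one -k per Bloom filter
        iflags="$iflags -i k$k.bloom"   # one -i per Bloom filter, same order
    done
    echo "abyss-sealer$kflags -o $prefix -S $scaffold$iflags $reads"
)

print_prebuilt_workflow run1 test.fa 64 96
```

       The printed lines can be reviewed and then piped to sh.  Both tools
       must still be compiled with the same maxk, as noted above.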

Output files
       o prefix_log.txt

       o prefix_scaffold.fa

       o prefix_merged.fa

       o prefix_flanks_1.fq -> if --print-flanks option	used

       o prefix_flanks_2.fq -> if --print-flanks option	used

       The log file contains results of	each Konnector run.  The structure  of
       one run is as follows:

       o ## unique gaps	closed for k##

       o No start/goal kmer: ###

       o No path: ###

       o Unique	path: ###

       o Multiple paths: ###

       o Too many paths: ###

       o Too many branches: ###

       o Too many path/path mismatches:	###

       o Too many path/read mismatches:	###

       o Contains cycle: ###

       o Exceeded mem limit: ###

       o Skipped: ###

       o ### flanks left

       o k## run complete

       o Total gaps closed so far = ###
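
       A log with this layout can be summarized with a few lines of awk.
       The sketch below assumes that each entry sits on its own line exactly
       as listed above, with no bullet or extra prefix.  The function name
       summarize_sealer_log is hypothetical and the sample log contains
       made-up numbers with most counters omitted.

```shell
#!/bin/sh
# Sketch: report gaps closed per k and the final running total from a
# Sealer log. Assumes the line layout listed above; summarize_sealer_log
# is a hypothetical helper.
summarize_sealer_log() {
    awk '
        / unique gaps closed for k[0-9]+$/ {
            k = $NF
            sub(/^k/, "", k)            # "k64" -> "64"
            printf "k=%s closed=%s\n", k, $1
        }
        /^Total gaps closed so far = / { total = $NF }   # keep the last value
        END { if (total != "") printf "total=%s\n", total }
    ' "$1"
}

# Made-up sample log, for illustration only.
log=$(mktemp)
cat > "$log" <<'EOF'
12 unique gaps closed for k64
No path: 30
Too many paths: 4
Too many branches: 2
k64 run complete
Total gaps closed so far = 12
5 unique gaps closed for k96
No path: 21
Too many paths: 1
Too many branches: 0
k96 run complete
Total gaps closed so far = 17
EOF
summarize_sealer_log "$log"
rm -f "$log"
```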

       The scaffold.fa file is a gap-filled version of the draft assembly
       supplied to Sealer.  The merged.fa file contains every newly generated
       sequence that was inserted into a gap, including the flanking
       sequences.  A negative size for a new sequence indicates that Konnector
       collapsed the pair of flanking sequences.  For example:

       >[scaffold  ID]_[original  start	 position of gap on scaffold]_[size of

       If the --print-flanks option is enabled, Sealer outputs the flanking
       sequences passed to Konnector.  This may be useful should users wish
       to double-check that the tool is extracting the correct sequences
       surrounding gaps.  The structure of these files is as follows:

       >[scaffold  ID]_[original  start	 position of gap on scaffold]_[size of
       gap]/[1 or 2 indicating whether left or right  flank]  GCTAGCTAGCTAGCT-
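
       A flank header in this format can be split back into its fields with
       plain shell parameter expansion.  This is a sketch under assumptions:
       the function name parse_flank_header and the example header are made
       up, and the fields are split from the right because a scaffold ID may
       itself contain underscores.

```shell
#!/bin/sh
# Sketch: split ">[scaffold ID]_[gap start]_[gap size]/[1 or 2]" into
# tab-separated fields. parse_flank_header is a hypothetical helper.
parse_flank_header() (
    header=${1#\>}          # drop the leading ">"
    side=${header##*/}      # text after the last "/": 1 or 2
    rest=${header%/*}
    gap_size=${rest##*_}    # last "_"-separated field
    rest=${rest%_*}
    gap_start=${rest##*_}   # second-to-last field
    scaffold=${rest%_*}     # everything before it, underscores included
    printf '%s\t%s\t%s\t%s\n' "$scaffold" "$gap_start" "$gap_size" "$side"
)

# Made-up header: scaffold "scaffold_7", gap at position 1500, size 230, flank 2.
parse_flank_header ">scaffold_7_1500_230/2"
```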

How to optimize	for gap	closure
       To  optimize  Sealer, users can observe the log files generated after a
       run and adjust parameters accordingly.  If k runs are showing gaps hav-
       ing  too	 many  paths or	branches, consider increasing -P or -B parame-
       ters, respectively.

       Also consider increasing	the number of k	values used.  Generally, large
       k-mers  are  better  able to address highly repetitive genomic regions,
       while smaller k-mers are	better able to resolve areas of	low coverage.
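
       The log check described above can be automated.  The following sketch
       adds up the "Too many paths" and "Too many branches" counters over all
       k runs and prints a hint for each counter that is non-zero.  The
       function name suggest_limits is hypothetical, the log lines are
       assumed to appear exactly as listed under Output files, and the sample
       numbers are made up.

```shell
#!/bin/sh
# Sketch: total the path and branch limit counters in a Sealer log and
# suggest which limit to raise. suggest_limits is a hypothetical helper.
suggest_limits() {
    awk -F': ' '
        /^Too many paths: /    { paths += $2 }
        /^Too many branches: / { branches += $2 }
        END {
            if (paths > 0)
                print "Too many paths total: " paths " (consider raising -P)"
            if (branches > 0)
                print "Too many branches total: " branches " (consider raising -B)"
        }
    ' "$1"
}

# Made-up counters from two k runs, for illustration only.
log=$(mktemp)
cat > "$log" <<'EOF'
Too many paths: 4
Too many branches: 2
Too many paths: 1
Too many branches: 0
EOF
suggest_limits "$log"
rm -f "$log"
```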

Runtime	and memory usage
       More k values mean more bloom filters will be required, which will  in-
       crease  runtime as it takes time	to build/load each bloom filter	at the
       beginning of each k run.	 Memory	usage is not affected  by  using  more
       bloom filters.

       The larger value	used for parameters such as -P,	-B or -F will increase

Options
       Parameters of abyss-sealer

       o --print-flanks: outputs flank files

       o -S,--input-scaffold=FILE: load	scaffold from FILE

       o -L,--flank-length=N: length of	flanks to be used as pseudoreads [100]

       o -j,--threads=N: use N parallel	threads	[1]

       o -k,--kmer=N: the size of a k-mer

       o -b,--bloom-size=N: size of bloom filter.   Required  when  not	 using
	 pre-built Bloom filter(s).

       o -B,--max-branches=N:  max  branches in	de Bruijn graph	traversal; use
	 `nolimit' for no limit	[1000]

       o -d,--dot-file=FILE: write graph traversals to a DOT file

       o -e,--fix-errors: find and fix single-base errors when reads  have  no
	 kmers in bloom	filter [disabled]

       o -f,--min-frag=N: min fragment size in base pairs [0]

       o -F,--max-frag=N: max fragment size in base pairs [1000]

       o -i,--input-bloom=FILE:	load bloom filter from FILE

       o --mask: mask new and changed bases as lower case

       o --no-mask: do not mask	bases [default]

       o --chastity: discard unchaste reads [default]

       o --no-chastity:	do not discard unchaste	reads

       o --trim-masked:	trim masked bases from the ends	of reads

       o --no-trim-masked:  do	not  trim  masked bases	from the ends of reads

       o -l,--long-search: start path search as	close as possible to  the  be-
	 ginnings  of  reads.  Takes more time but improves results when bloom
	 filter	false positive rate is high [disabled]

       o -m,--flank-mismatches=N: max mismatches between paths and flanks; use
         `nolimit' for no limit [nolimit]

       o -M,--max-mismatches=N: max mismatches between all alternate paths;
         use `nolimit' for no limit [nolimit]

       o -n,--no-limits: disable all limits; equivalent to `-B nolimit -m
         nolimit -M nolimit -P nolimit'

       o -o,--output-prefix=FILE: prefix of output FASTA files [required]

       o -P,--max-paths=N: merge at most N alternate paths; use `nolimit' for
         no limit [2]

       o -q,--trim-quality=N: trim bases from the ends of reads whose quality
         is less than the threshold

       o --standard-quality:  zero  quality  is	`!' (33) default for FASTQ and
	 SAM files

       o --illumina-quality: zero quality is `@' (64) default for qseq and ex-
	 port files

       o -r,--read-name=STR: only process reads with names that contain STR

       o -s,--search-mem=N: mem limit for graph searches; multiply by the
         number of threads (-j) to get the total mem used for graph traversal

       o -t,--trace-file=FILE: write graph search stats to FILE

       o -v,--verbose: display verbose output

       o --help: display this help and exit

       o --version: output version information and exit

       k is the	size of	k-mer for the de Bruijn	graph.	You may	specify	multi-
       ple values of k,	which will increase the	number of gaps closed  at  the
       cost of increased run time.  Multiple values of k ought to be specified
       in increasing order, as lower values of k have fewer coverage gaps  and
       are less	likely to misassemble.
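
       When the k values come from a script or a configuration file, they
       can be sorted before the command line is assembled, so that the
       increasing order recommended above is always respected.  A minimal
       sketch follows; the function name k_flags is hypothetical.

```shell
#!/bin/sh
# Sketch: turn an unordered list of k values into "-kN" flags in increasing
# numeric order, dropping duplicates. k_flags is a hypothetical helper.
k_flags() (
    flags=""
    for k in $(printf '%s\n' "$@" | sort -n -u); do
        flags="$flags -k$k"
    done
    printf '%s\n' "${flags# }"   # strip the leading space
)

k_flags 96 64 128 80
# prints: -k64 -k80 -k96 -k128
```

       The result can be spliced into a call such as
       abyss-sealer -b20G $(k_flags 96 64 128 80) -o test -S scaffold.fa
       read1.fa read2.fa.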

       P  is  the threshold for	number of paths	allowed	to be traversed.  When
       set to 10, Konnector will attempt to close gaps even when there are  10
       different paths found.  It would	attempt	to create a consensus sequence
       between these paths.  The default setting is 2.

AUTHORS
       Daniel Paulino.

ABySS				  2014-11-13		       abyss-sealer(1)
