Bio::DB::Qual(3)      User Contributed Perl Documentation     Bio::DB::Qual(3)

       Bio::DB::Qual - Fast indexed access to quality files

	 use Bio::DB::Qual;

	 # create database from	directory of qual files
	 my $db	     = Bio::DB::Qual->new('/path/to/qual/files/');
	 my @ids     = $db->get_all_primary_ids;

	 # Simple access
	 my @qualarr = @{$db->qual('CHROMOSOME_I',4_000_000 => 4_100_000)};
	 my @revqual = @{$db->qual('CHROMOSOME_I',4_100_000 => 4_000_000)};
	 my $length  = $db->length('CHROMOSOME_I');
	 my $header  = $db->header('CHROMOSOME_I');

	 # Access to sequence objects. See Bio::PrimarySeqI.
	 my $obj     = $db->get_Qual_by_id('CHROMOSOME_I');
	 my @qual    = @{$obj->qual};
	 my @subqual = @{$obj->subqual(4_000_000 => 4_100_000)};
	 $length     = $obj->length;

	 # Loop	through	sequence objects
	 my $stream  = $db->get_PrimarySeq_stream;
	 while (my $qual = $stream->next_seq) {
	   # Bio::Seq::PrimaryQual operations
	 }

	 # Filehandle access
	 my $fh	= Bio::DB::Qual->newFh('/path/to/qual/files/');
	 while (my $qual = <$fh>) {
	   # Bio::Seq::PrimaryQual operations
	 }

	 # Tied	hash access
	 tie my %qualities,'Bio::DB::Qual','/path/to/qual/files/';
	 print $qualities{'CHROMOSOME_I:1,20000'};

       Bio::DB::Qual provides indexed access to a single qual file, several
       files, or a directory of files. It provides random access to each
       quality score entry without having to read the file from the beginning.
       Access to subqualities (portions of a quality score) is provided,
       although, unlike Bio::DB::Fasta, the full quality score has to be
       loaded into memory. Bio::DB::Qual is based on Bio::DB::IndexedBase; see
       that module's documentation for details.
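
       The idea of random access through an index can be sketched in plain
       Perl. This is a minimal illustration only, assuming an in-memory hash
       of byte offsets; it is not the index format that Bio::DB::IndexedBase
       writes, and the names build_offset_index and fetch_scores are
       hypothetical helpers.

```perl
use strict;
use warnings;

# Map each entry ID to the byte offset of its header line.
# Illustration only: not the on-disk index used by Bio::DB::IndexedBase.
sub build_offset_index {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my %offset_of;
    my $pos = tell $fh;
    while (my $line = <$fh>) {
        $offset_of{$1} = $pos if $line =~ /^>(\S+)/;
        $pos = tell $fh;    # offset of the next line
    }
    close $fh;
    return \%offset_of;
}

# Seek straight to one entry and return its scores as an array reference.
# Returns undef when the ID is not in the index.
sub fetch_scores {
    my ($path, $index, $id) = @_;
    my $offset = $index->{$id};
    return unless defined $offset;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    seek $fh, $offset, 0;
    my $header = <$fh>;     # read past the header line
    my @scores;
    while (my $line = <$fh>) {
        last if $line =~ /^>/;          # next entry starts here
        push @scores, split ' ', $line;
    }
    close $fh;
    return \@scores;
}
```

       Only the requested entry is read; earlier entries in the file are
       skipped by the seek.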

       The qual	files should contain decimal quality scores. Entries may have
       any line	length up to 65,536 characters,	and different line lengths are
       allowed in the same file. However, within a quality score entry,	all
       lines must be the same length except for	the last. An error will	be
       thrown if this is not the case.
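
       Before indexing, it may help to check files against the line-length
       rule above. The following is a sketch under the assumption that the
       rule means "all lines of an entry except the last have identical
       length"; entries_with_ragged_lines is a hypothetical helper and not
       part of BioPerl.

```perl
use strict;
use warnings;

# Return the IDs of entries whose lines, other than the last one,
# do not all have the same length. Input is qual-formatted text.
sub entries_with_ragged_lines {
    my ($text) = @_;
    my (@bad, $id, @lengths);

    # Check the entry collected so far, if any.
    my $flush = sub {
        return unless defined $id;
        my @body = @lengths;
        pop @body;                          # the last line is exempt
        my %seen = map { $_ => 1 } @body;
        push @bad, $id if scalar(keys %seen) > 1;
    };

    for my $line (split /\n/, $text) {
        if ($line =~ /^>(\S+)/) {
            my $new_id = $1;
            $flush->();
            ($id, @lengths) = ($new_id);
        }
        elsif (defined $id) {
            push @lengths, length($line);
        }
    }
    $flush->();
    return @bad;
}
```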

       The module uses /^>(\S+)/ to extract the	primary	ID of each quality
       score from the qual header. See -makeid in Bio::DB::IndexedBase to pass
       a callback routine to reversibly	modify this primary ID,	e.g. if	you
       wish to extract a specific portion of the gi|gb|abc|xyz GenBank IDs.
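
       As an illustration, the default ID extraction and a possible -makeid
       callback might look like the sketch below. It assumes the callback
       receives the header line and returns the ID to index; third_field_id
       and the choice of the third pipe-separated field are hypothetical.

```perl
use strict;
use warnings;

# Default behaviour described above: the first run of non-space
# characters after '>' is the primary ID.
sub default_id {
    my ($header) = @_;
    return $header =~ /^>(\S+)/ ? $1 : undef;
}

# Hypothetical -makeid callback: keep only the third pipe-separated field
# of a gi|gb|abc|xyz style ID, falling back to the whole ID otherwise.
sub third_field_id {
    my ($header) = @_;
    my $id = default_id($header);
    return unless defined $id;
    my @fields = split /\|/, $id;
    return @fields >= 3 ? $fields[2] : $id;
}

# With BioPerl installed, the callback would be passed like this
# (the path is a placeholder):
#   my $db = Bio::DB::Qual->new('/path/to/qual/files/',
#                               -makeid => \&third_field_id);
```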

       The object-oriented constructor is new(), the filehandle constructor is
       newFh() and the tied hash constructor is tie(). They all allow one to
       index a single qual file, several files, or a directory of files. See
       Bio::DB::IndexedBase for the optional arguments.

       When a quality score is deleted from one of the qual files, the module
       does not detect the deletion and the entry stays in the index. As a
       result, a "ghost" entry remains in the index and returns garbage
       results if accessed. Currently, the only way to accommodate deletions
       is to rebuild the entire index, either by deleting it manually or by
       passing -reindex=>1 to new() when initializing the database object.
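
       A small wrapper can make the forced rebuild explicit. This is a sketch
       that assumes BioPerl is installed when open_qual_db is actually
       called; index_options and open_qual_db are hypothetical helper names,
       while -reindex is the option named above.

```perl
use strict;
use warnings;

# Build the option list for new(): request a full index rebuild
# only when the caller asks for it.
sub index_options {
    my ($force_reindex) = @_;
    return $force_reindex ? (-reindex => 1) : ();
}

# Open the database, optionally discarding a stale index first.
# Bio::DB::Qual is loaded lazily so this file compiles without BioPerl.
sub open_qual_db {
    my ($path, $force_reindex) = @_;
    require Bio::DB::Qual;
    return Bio::DB::Qual->new($path, index_options($force_reindex));
}

# Example call (placeholder path):
#   my $db = open_qual_db('/path/to/qual/files/', 1);
```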

       All quality score lines for a given quality score must have the same
       length except for the last (the reason for this limitation is unclear).
       This is not problematic for sequences but could be annoying for quality
       scores. A workaround is to make sure that your quality scores fit on no
       more than 2 lines. Another solution could be to pad them with blank
       spaces so that each line has the same number of characters (maybe this
       padding should be implemented in Bio::SeqIO::qual?).
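
       The padding idea could be sketched as follows. This is an untested
       assumption about what the module accepts, not a documented feature;
       pad_score_lines is a hypothetical helper that works on the score
       lines of a single entry.

```perl
use strict;
use warnings;

# Pad the score lines of one entry with trailing spaces so that every
# line is as wide as the longest one.
sub pad_score_lines {
    my (@lines) = @_;
    my $width = 0;
    for my $line (@lines) {
        $width = length($line) if length($line) > $width;
    }
    # '%-*s' left-justifies each line in a field of $width characters.
    return map { sprintf '%-*s', $width, $_ } @lines;
}
```

       Splitting a padded line on whitespace still yields the original
       scores, since the padding only adds trailing spaces.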

       Florent E Angly <florent	. angly	@ gmail-dot-com>.

       Module largely based on and adapted from Bio::DB::Fasta by Lincoln
       Stein.

       Copyright (c) 2007 Florent E Angly.

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

       The rest of the documentation details each of the object methods.
       Internal methods usually have a leading underscore (_).

       For BioPerl-style access, the following methods are provided:

	Title	: get_Seq_by_id,  get_Seq_by_acc, get_Seq_by_version, get_Seq_by_primary_id,
		  get_Qual_by_id, get_qual_by_acc, get_qual_by_version,	get_qual_by_primary_id
	Usage	: my $seq = $db->get_Seq_by_id($id);
	Function: Given	an ID, fetch the corresponding sequence	from the database.
	Returns	: A Bio::PrimarySeq::Fasta object (Bio::PrimarySeqI compliant).
		  Note that to save resources, Bio::PrimarySeq::Fasta sequence objects
		  only load the	sequence string	into memory when requested using seq().
		  See Bio::PrimarySeqI for methods provided by the sequence objects
		  returned from	get_Seq_by_id()	and get_PrimarySeq_stream().
	Args	: ID

	Title	: get_Seq_stream, get_PrimarySeq_stream
	Usage	: my $stream = $db->get_Seq_stream();
	Function: Get a	stream of Bio::PrimarySeq::Fasta objects. The stream supports a
		  single method, next_seq(). Each call to next_seq() returns a new
		  Bio::PrimarySeq::Fasta sequence object, until	no more	sequences remain.
	Returns	: A Bio::DB::Indexed::Stream object
	Args	: None

       For simple access, the following	methods	are provided:

	Title	: new
	Usage	: my $db = Bio::DB::Qual->new( $path, %options);
	Function: Initialize a new database object. When indexing a directory, files
		  ending in .qual or .qa are indexed by default.
	Returns	: A new	Bio::DB::Qual object
	Args	: A single file, or path to dir, or arrayref of	files
		  Optional arguments: see Bio::DB::IndexedBase

	Title	: qual,	quality, subqual
	Usage	: # All	quality	scores
		  my @qualarr =	@{$qualdb->subqual($id)};
		  # Subset of the quality scores
		  my @subqualarr = @{$qualdb->subqual($id, $start, $stop, $strand)};
		  # or...
		  my @subqualarr = @{$qualdb->subqual($compound_id)};
	Function: Get a	subqual	of an entry in the database. For your convenience,
		  the region to extract can also be specified with a compound ID,
		  for example 'CHROMOSOME_I:1,20000'.
		  If $stop is less than	$start,	then the quality scores are returned
		  in reverse order. Avoid using this if possible since it goes
		  against Bio::Seq conventions.
	Returns	: Reference to an array	of quality scores
	Args	: Compound ID of entry to retrieve, or
		  ID, optional start (defaults to 1), optional end (defaults to	the
		  number of quality scores for this sequence), and strand (defaults to
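
       The coordinate handling described for qual() and subqual() can be
       illustrated on a plain array. This is a sketch of the semantics only,
       assuming 1-based inclusive coordinates and covering only the
       "id:start,stop" compound form shown in the tied-hash example;
       parse_compound_id and slice_scores are hypothetical helpers, not the
       module's implementation.

```perl
use strict;
use warnings;

# Split a compound ID of the form "id:start,stop" into its parts.
# A plain ID yields undefined coordinates.
sub parse_compound_id {
    my ($compound) = @_;
    return ($1, $2, $3) if $compound =~ /^(.+):(\d+),(\d+)$/;
    return ($compound, undef, undef);
}

# Slice a score array with 1-based, inclusive coordinates.
# Start defaults to 1 and stop to the number of scores; when stop is
# less than start, the slice is returned in reverse order.
sub slice_scores {
    my ($scores, $start, $stop) = @_;
    $start = 1               unless defined $start;
    $stop  = scalar @$scores unless defined $stop;
    my $reversed = $stop < $start;
    ($start, $stop) = ($stop, $start) if $reversed;
    my @slice = @{$scores}[ $start - 1 .. $stop - 1 ];
    return $reversed ? [ reverse @slice ] : \@slice;
}
```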

	Title	: header
	Usage	: my $header = $db->header($id);
	Function: Get the header line (ID and description fields) of the specified entry.
	Returns	: String
	Args	: ID of	entry

perl v5.32.0			  2019-12-07		      Bio::DB::Qual(3)

