Bio::DB::Fasta(3)     User Contributed Perl Documentation    Bio::DB::Fasta(3)

NAME
       Bio::DB::Fasta -	Fast indexed access to fasta files

SYNOPSIS
	 use Bio::DB::Fasta;

	 # Create database from	a directory of Fasta files
	 my $db	      =	Bio::DB::Fasta->new('/path/to/fasta/files/');
	 my @ids      =	$db->get_all_primary_ids;

	 # Simple access
	 my $seqstr   =	$db->seq('CHROMOSOME_I', 4_000_000 => 4_100_000);
	 my $revseq   =	$db->seq('CHROMOSOME_I', 4_100_000 => 4_000_000);
	 my $length   =	$db->length('CHROMOSOME_I');
	 my $header   =	$db->header('CHROMOSOME_I');
	 my $alphabet =	$db->alphabet('CHROMOSOME_I');

	 # Access to sequence objects. See Bio::PrimarySeqI.
	 my $seq     = $db->get_Seq_by_id('CHROMOSOME_I');
	 my $seqstr  = $seq->seq;
	 my $subseq  = $seq->subseq(4_000_000 => 4_100_000);
	 my $trunc   = $seq->trunc(4_000_000 =>	4_100_000);
	 my $length  = $seq->length;

	 # Loop	through	sequence objects
	 my $stream  = $db->get_PrimarySeq_stream;
	 while (my $seq	= $stream->next_seq) {
	   # Bio::PrimarySeqI stuff
	 }

	 # Filehandle access
	 my $fh	= Bio::DB::Fasta->newFh('/path/to/fasta/files/');
	 while (my $seq	= <$fh>) {
	   # Bio::PrimarySeqI stuff
	 }

	 # Tied	hash access
	 tie my %sequences, 'Bio::DB::Fasta', '/path/to/fasta/files/';
	 print $sequences{'CHROMOSOME_I:1,20000'};

DESCRIPTION
       Bio::DB::Fasta provides indexed access to a single Fasta	file, several
       files, or a directory of	files. It provides persistent random access to
       each sequence entry (either as a	Bio::PrimarySeqI-compliant object or a
       string),	and to subsequences within each	entry, allowing	you to
       retrieve	portions of very large sequences without bringing the entire
       sequence into memory. Bio::DB::Fasta is based on Bio::DB::IndexedBase;
       see that module's documentation for details.

       The Fasta files may contain any combination of nucleotide and protein
       sequences; during indexing the module guesses the molecular type.
       Entries may have	any line length	up to 65,536 characters, and different
       line lengths are	allowed	in the same file.  However, within a sequence
       entry, all lines	must be	the same length	except for the last. An	error
       will be thrown if this is not the case.
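
       The line-length rule above can be checked before indexing. The
       following is a minimal sketch in core Perl, not the module's own
       check: ragged_entries() is a hypothetical helper name, and the two
       entries in $example are made up for illustration.

```perl
use strict;
use warnings;

# Return the IDs of entries that break the rule described above: within
# one entry, every sequence line except the last must have the same length.
sub ragged_entries {
    my ($fasta_text) = @_;
    my (@bad, $id, @lengths);

    my $flush = sub {
        return unless defined $id;
        my @body = @lengths;
        pop @body;                        # the last line is exempt
        my %seen;
        $seen{$_} = 1 for @body;          # distinct lengths among the rest
        push @bad, $id if scalar(keys %seen) > 1;
    };

    for my $line (split /\r?\n/, $fasta_text) {
        if ($line =~ /^>(\S+)/) {         # header: close the previous entry
            my $new_id = $1;
            $flush->();
            $id      = $new_id;
            @lengths = ();
        }
        elsif (length $line) {
            push @lengths, length $line;
        }
    }
    $flush->();                           # close the final entry
    return @bad;
}

my $example = join "\n",
    '>ok',     'ACGTACGT', 'ACGTACGT', 'ACG',
    '>ragged', 'ACGTACGT', 'ACGT',     'ACGTACGT', 'AC';

print join(',', ragged_entries($example)), "\n";    # ragged
```

       Entries reported by such a check would need to be re-wrapped to a
       uniform line width before they can be indexed.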

       The module uses /^>(\S+)/ to extract the	primary	ID of each sequence
       from the	Fasta header. See -makeid in Bio::DB::IndexedBase to pass a
       callback	routine	to reversibly modify this primary ID, e.g. if you wish
       to extract a specific portion of	the gi|gb|abc|xyz GenBank IDs.
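
       As an illustration of the ID rule, the sketch below applies the same
       pattern to a header line and then derives a shorter ID from it. It is
       core Perl only; default_id() and accession_id() are hypothetical
       names, the header is made up, and it is assumed here that a -makeid
       callback receives the header line as its argument (see
       Bio::DB::IndexedBase for the authoritative description).

```perl
use strict;
use warnings;

# Default rule: the primary ID is the first run of non-whitespace after '>'.
sub default_id {
    my ($header_line) = @_;
    return $header_line =~ /^>(\S+)/ ? $1 : undef;
}

# Hypothetical callback: for gi|...|gb|...| style IDs keep only the fourth
# field (the accession); leave any other ID unchanged.
sub accession_id {
    my ($header_line) = @_;
    my $id = default_id($header_line);
    return undef unless defined $id;
    my @fields = split /\|/, $id;
    return (@fields >= 4 && $fields[0] eq 'gi') ? $fields[3] : $id;
}

my $header = '>gi|129295|gb|AAA12345.1| example protein';   # made-up header
print default_id($header),   "\n";    # gi|129295|gb|AAA12345.1|
print accession_id($header), "\n";    # AAA12345.1
```

       Under that assumption, a routine of this shape could be supplied as
       -makeid => \&accession_id when the database is created.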

DATABASE CREATION AND INDEXING
       The object-oriented constructor is new(), the filehandle	constructor is
       newFh() and the tied hash constructor is	tie(). They all	allow one to
       index a single Fasta file, several files, or a directory	of files. See
       Bio::DB::IndexedBase.

SEE ALSO
       Bio::DB::IndexedBase

       Bio::DB::Qual

       Bio::PrimarySeqI

AUTHOR
       Lincoln Stein <lstein@cshl.org>.

       Copyright (c) 2001 Cold Spring Harbor Laboratory.

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.  See DISCLAIMER.txt	for
       disclaimers of warranty.

APPENDIX
       The rest	of the documentation details each of the object	methods.
       Internal methods are usually prefixed with an underscore (_).

       For BioPerl-style access, the following methods are provided:

   get_Seq_by_id
	Title	: get_Seq_by_id, get_Seq_by_acc, get_Seq_by_primary_id
	Usage	: my $seq = $db->get_Seq_by_id($id);
	Function: Given	an ID, fetch the corresponding sequence	from the database.
	Returns	: A Bio::PrimarySeq::Fasta object (Bio::PrimarySeqI compliant)
		  Note that to save resources, Bio::PrimarySeq::Fasta sequence objects
		  only load the sequence string into memory when requested using seq().
		  See Bio::PrimarySeqI for methods provided by the sequence objects
		  returned from get_Seq_by_id() and get_PrimarySeq_stream().
	Args	: ID

   get_PrimarySeq_stream
	Title	: get_PrimarySeq_stream
	Usage	: my $stream = $db->get_PrimarySeq_stream();
	Function: Get a	stream of Bio::PrimarySeq::Fasta objects. The stream supports a
		  single method, next_seq(). Each call to next_seq() returns a new
		  Bio::PrimarySeq::Fasta sequence object, until	no more	sequences remain.
	Returns	: A Bio::DB::Indexed::Stream object
	Args	: None

       For simple access, the following	methods	are provided:

   new
	Title	: new
	Usage	: my $db = Bio::DB::Fasta->new(	$path, %options);
	Function: Initialize a new database object. When indexing a directory, files
		  ending in .fa, .fasta, .fast, .dna, .fna, .faa or .fsa are indexed
		  by default.
	Returns	: A new	Bio::DB::Fasta object.
	Args	: A single file, or path to dir, or arrayref of	files
		  Optional arguments: see Bio::DB::IndexedBase
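
       To see which files in a directory would match the suffix list given
       for new(), a filter along the following lines may help. This is only
       a sketch in core Perl: looks_indexable() is a hypothetical helper,
       the file names are invented, and the match is case-sensitive here,
       which may differ from what the module itself accepts.

```perl
use strict;
use warnings;

# Suffixes listed in the description of new() above.
my @default_suffixes = qw(fa fasta fast dna fna faa fsa);
my $suffix_re = join '|', map { quotemeta } @default_suffixes;

# True if the file name ends in one of the listed suffixes.
sub looks_indexable {
    my ($filename) = @_;
    return $filename =~ /\.(?:$suffix_re)$/ ? 1 : 0;
}

for my $name (qw(chr1.fa genes.fasta reads.fastq notes.txt proteins.faa)) {
    printf "%-14s %s\n", $name, looks_indexable($name) ? 'indexed' : 'skipped';
}
```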

   seq
	Title	: seq, sequence, subseq
	Usage	: # Entire sequence string
		  my $seqstr	= $db->seq($id);
		  # Subsequence
		  my $subseqstr	= $db->seq($id,	$start,	$stop, $strand);
		  # or...
		  my $subseqstr	= $db->seq($compound_id);
	Function: Get a	subseq of a sequence from the database.	For your convenience,
		  the sequence to extract can be specified with	any of the following
		  compound IDs:
		     $db->seq("$id:$start,$stop")
		     $db->seq("$id:$start..$stop")
		     $db->seq("$id:$start-$stop")
		     $db->seq("$id:$start,$stop/$strand")
		     $db->seq("$id:$start..$stop/$strand")
		     $db->seq("$id:$start-$stop/$strand")
		     $db->seq("$id/$strand")
		  For DNA or RNA sequences, if $stop is less than $start, the reverse
		  complement of the requested region is returned. Avoid relying on
		  this where possible, since it goes against Bio::Seq conventions.
	Returns	: A string
	Args	: ID of	sequence to retrieve
		    or
		  Compound ID of subsequence to	fetch
		    or
		  ID, optional start (defaults to 1), optional end (defaults to	length
		  of sequence) and optional strand (defaults to	1).
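
       The coordinate handling described for seq() can be pictured with a
       small in-memory model. This is a sketch in core Perl, not the
       module's implementation: subseq_sketch() is a hypothetical name, it
       assumes 1-based inclusive coordinates, handles DNA letters only, and
       assumes that a reversed range simply flips whatever strand was given.

```perl
use strict;
use warnings;

sub subseq_sketch {
    my ($seq, $start, $stop, $strand) = @_;
    $start  //= 1;                # documented default
    $stop   //= length $seq;      # documented default
    $strand //= 1;                # documented default

    # A stop before the start requests the opposite strand.
    if ($stop < $start) {
        ($start, $stop) = ($stop, $start);
        $strand = -$strand;
    }

    # 1-based, inclusive coordinates mapped onto Perl's 0-based substr().
    my $sub = substr $seq, $start - 1, $stop - $start + 1;
    if ($strand == -1) {
        $sub = reverse $sub;              # reverse ...
        $sub =~ tr/ACGTacgt/TGCAtgca/;    # ... and complement
    }
    return $sub;
}

my $dna = 'AACCGGTT';
print subseq_sketch($dna, 2, 5),     "\n";    # ACCG
print subseq_sketch($dna, 5, 2),     "\n";    # CGGT
print subseq_sketch($dna, 2, 5, -1), "\n";    # CGGT
print subseq_sketch($dna),           "\n";    # AACCGGTT
```

       The second and third calls give the same result, which is why an
       explicit strand argument can stand in for a reversed range.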

   length
	Title	: length
	Usage	: my $length = $db->length($id);
	Function: Get the number of residues in	the indicated sequence.
	Returns	: Number
	Args	: ID of	entry

   header
	Title	: header
	Usage	: my $header = $db->header($id);
	Function: Get the header line (ID and description fields) of the specified
		  sequence.
	Returns	: String
	Args	: ID of	sequence

   alphabet
	Title	: alphabet
	Usage	: my $alphabet = $db->alphabet($id);
	Function: Get the molecular type of the	indicated sequence: dna, rna or	protein
	Returns	: String
	Args	: ID of	sequence

perl v5.32.0			  2019-12-07		     Bio::DB::Fasta(3)
