Bio::DB::Flat::BinarySearch(3)  User Contributed Perl Documentation  Bio::DB::Flat::BinarySearch(3)

NAME
       Bio::DB::Flat::BinarySearch - BinarySearch search indexing system for
       sequence files

SYNOPSIS
	 TODO: SYNOPSIS NEEDED!

DESCRIPTION
       This module can be used both to index sequence files and to retrieve
       sequences from existing sequence files.

       This object allows indexing of sequence files both by a primary key
       (say, accession) and by multiple secondary keys (say, ids).  This is
       different from Bio::Index::Abstract (see Bio::Index::Abstract), which
       uses DBM files as storage.  This module uses a binary search to
       retrieve sequences, which is more efficient for large datasets.

   Index creation
	   my $sequencefile;  # Some fasta sequence file

       Patterns have to be supplied to define where the keys to be indexed are
       found and where each record starts.  For example, for FASTA:

	   my $start_pattern   = '^>';
	   my $primary_pattern = '^>(\S+)';

       So a record starts at a line beginning with >, and the primary key is
       every character after the > up to the first whitespace.
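       The two patterns above can be tried out on their own before building
       an index.  The following core-Perl sketch (no BioPerl needed) walks a
       small, made-up FASTA text line by line, treats each line matching the
       start pattern as a new record, and takes the primary key from capture
       group 1 of the primary pattern.  It only illustrates what the patterns
       match; it is not the module's indexing code.

```perl
use strict;
use warnings;

# Patterns copied from the text above
my $start_pattern   = '^>';
my $primary_pattern = '^>(\S+)';

# Made-up sample input (illustration only)
my $fasta = <<'END';
>seq1 first example
ACGTACGT
>seq2 second example
GGCCTTAA
END

my @ids;
for my $line (split /\n/, $fasta) {
    next unless $line =~ /$start_pattern/;          # not the start of a record
    push @ids, $1 if $line =~ /$primary_pattern/;   # group 1 is the primary key
}
print "@ids\n";    # prints: seq1 seq2
```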

       A string also has to be supplied to define what the primary key is
       called (-primary_namespace).

       The index can now be created using

	   my $index = Bio::DB::Flat::BinarySearch->new(
	       -directory         => "/home/max/",
	       -dbname            => "mydb",
	       -write_flag        => 1,
	       -start_pattern     => $start_pattern,
	       -primary_pattern   => $primary_pattern,
	       -primary_namespace => "ID",
	       -format            => "fasta");

	   my @files = ("file1","file2","file3");

	   $index->build_index(@files);

       The index is now ready to use.  For large sequence files, indexing in
       pure Perl takes a *long* time and a *huge* amount of memory.  For
       indexing things like dbEST, I recommend using the DB_File-based
       indexer, Bio::DB::Flat::BDB.

       The formats currently supported by this module are fasta, Swissprot,
       and EMBL.

   Creating indices with secondary keys
       Sometimes indexing files with one id per entry is not enough.  For
       instance, you may want to retrieve sequences from SwissProt using their
       accessions as well as their ids.

       To do this, pass a reference to a hash of secondary patterns
       (-secondary_patterns) when creating the index; the keys of the hash
       are the namespaces.

       For example, consider indexing an entry like

	   ID   1433_CAEEL     STANDARD;      PRT;   248 AA.
	   AC   P41932;
	   DT   01-NOV-1995 (Rel. 32, Created)
	   DT   01-NOV-1995 (Rel. 32, Last sequence update)
	   DT   15-DEC-1998 (Rel. 37, Last annotation update)
	   DE   14-3-3-LIKE PROTEIN 1.
	   GN   FTT-1 OR M117.2.
	   OS   Caenorhabditis elegans.
	   OC   Eukaryota; Metazoa; Nematoda; Chromadorea; Rhabditida; Rhabditoidea;
	   OC   Rhabditidae; Peloderinae; Caenorhabditis.
	   OX   NCBI_TaxID=6239;
	   RN   [1]

       where we want to index the accession (P41932) as the primary key and
       the id (1433_CAEEL) as a secondary key.  The index is created as
       follows:

	   my %secondary_patterns;

	   my $start_pattern   = '^ID   (\S+)';
	   my $primary_pattern = '^AC   (\S+)\;';

	   $secondary_patterns{"ID"} = '^ID   (\S+)';

	   my $index = Bio::DB::Flat::BinarySearch->new(
	       -directory          => $index_directory,
	       -dbname             => "ppp",
	       -write_flag         => 1,
	       -verbose            => 1,
	       -start_pattern      => $start_pattern,
	       -primary_pattern    => $primary_pattern,
	       -primary_namespace  => 'AC',
	       -secondary_patterns => \%secondary_patterns);

	   $index->build_index($seqfile);

       Of course, having secondary indices makes indexing slower and uses more
       memory.
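       As a rough check of what the three patterns in this example capture,
       the core-Perl sketch below applies them to the opening lines of the
       SwissProt entry shown above, one field per line as in a real flat
       file.  It illustrates only the regular expressions, not how the module
       stores the keys.

```perl
use strict;
use warnings;

# Patterns from the example above
my $start_pattern      = '^ID   (\S+)';
my $primary_pattern    = '^AC   (\S+)\;';
my %secondary_patterns = (ID => '^ID   (\S+)');

# First lines of the entry shown above
my @lines = (
    'ID   1433_CAEEL     STANDARD;      PRT;   248 AA.',
    'AC   P41932;',
    'DE   14-3-3-LIKE PROTEIN 1.',
);

my $records = 0;
my $primary;
my %secondary;
for my $line (@lines) {
    $records++ if $line =~ /$start_pattern/;        # a new entry begins
    $primary = $1 if $line =~ /$primary_pattern/;   # the trailing ; is not captured
    for my $ns (sort keys %secondary_patterns) {
        my $re = $secondary_patterns{$ns};
        push @{ $secondary{$ns} }, $1 if $line =~ /$re/;
    }
}
print "entries=$records AC=$primary ID=$secondary{ID}[0]\n";
# prints: entries=1 AC=P41932 ID=1433_CAEEL
```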

   Index reading
       To fetch sequences using an existing index, first create the index
       object:

	   my $index = Bio::DB::Flat::BinarySearch->new(
	       -directory => $index_directory,
	       -dbname    => "mydb");

       Now you can happily fetch sequences either by the primary key or by the
       secondary keys.

	   my $entry = $index->get_entry_by_id('HBA_HUMAN');

       This returns just a string containing the whole entry.  This is useful
       if you just want to print the sequence to the screen or write it to a
       file.

       Other ways of getting sequences are

	   my $fh = $index->get_stream_by_id('HBA_HUMAN');

       This filehandle can then be passed to a Bio::SeqIO object for output or
       for conversion into sequence objects.

	   my $seqio = Bio::SeqIO->new(-fh     => $fh,
	                               -format => 'fasta');
	   my $seq   = $seqio->next_seq;

       The last way is to retrieve a sequence object directly.  This is the
       slowest method, because the sequence objects have to be built.

	   my $seq = $index->get_Seq_by_id('HBA_HUMAN');

       To access the secondary indices, the secondary namespace must be known:

	   $index->secondary_namespaces("ID");

       Then the following call can be used:

	   my $seq   = $index->get_Seq_by_secondary('ID','1433_CAEEL');

       These calls are not yet implemented

	   my $fh    = $index->get_stream_by_secondary('ID','1433_CAEEL');
	   my $entry = $index->get_entry_by_secondary('ID','1433_CAEEL');
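       Conceptually, a secondary lookup is a two-step search: the secondary
       key (here an id) is resolved to one or more primary keys, and those are
       then fetched through the primary index.  The sketch below models both
       indices as plain Perl hashes; the data structures and the
       lookup_secondary name are hypothetical, and the module itself keeps its
       indices in files on disk.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-ins for the on-disk primary and secondary indices
my %primary_index   = (P41932 => "ID   1433_CAEEL ...\n//\n");
my %secondary_index = (ID => { '1433_CAEEL' => ['P41932'] });

# Hypothetical helper: secondary key -> primary key(s) -> entries
sub lookup_secondary {
    my ($namespace, $id) = @_;
    my $ns = $secondary_index{$namespace}
        or die "unknown secondary namespace '$namespace'\n";
    my $primaries = $ns->{$id} || [];
    return map { $primary_index{$_} } @$primaries;
}

my @entries = lookup_secondary('ID', '1433_CAEEL');
print scalar(@entries), " entry found\n";    # prints: 1 entry found
```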

FEEDBACK
   Mailing Lists
       User feedback is an integral part of the evolution of this and other
       Bioperl modules.  Send your comments and suggestions preferably to one
       of the Bioperl mailing lists.  Your participation is much appreciated.

	 bioperl-l@bioperl.org			- General discussion
	 http://bioperl.org/wiki/Mailing_lists	- About	the mailing lists

   Support
       Please direct usage questions or support issues to the mailing list:

       bioperl-l@bioperl.org

       rather than to the module maintainer directly.  Many experienced and
       responsive experts will be able to look at the problem and quickly
       address it.  Please include a thorough description of the problem with
       code and data examples if at all possible.

   Reporting Bugs
       Report bugs to the Bioperl bug tracking system to help us keep track of
       the bugs and their resolution.  Bug reports can be submitted via the
       web:

	 https://github.com/bioperl/bioperl-live/issues

AUTHOR - Michele Clamp
       Email - michele@sanger.ac.uk

CONTRIBUTORS
       Jason Stajich, jason@bioperl.org

APPENDIX
       The rest of the documentation details each of the object methods.
       Internal methods are usually preceded by an "_" (underscore).

   new
	Title	: new
	Usage	: For reading
		    my $index = Bio::DB::Flat::BinarySearch->new(
		        -directory => '/Users/michele/indices/dbest',
		        -dbname    => 'mydb',
		        -format    => 'fasta');

		  For writing

		    my %secondary_patterns = ("ACC" => "^>\\S+ +(\\S+)");
		    my $index = Bio::DB::Flat::BinarySearch->new(
		        -directory          => '/Users/michele/indices',
		        -dbname             => 'mydb',
		        -write_flag         => 1,
		        -start_pattern      => "^>",
		        -primary_pattern    => "^>(\\S+)",
		        -secondary_patterns => \%secondary_patterns,
		        -primary_namespace  => "ID");

		    my @files = ('file1','file2','file3');

		    $index->build_index(@files);

	Function: create a new Bio::DB::Flat::BinarySearch object
	Returns	: new Bio::DB::Flat::BinarySearch
	Args	: -directory          Root directory for index files
		  -dbname             Name of subdirectory containing indices
		                      for named database
		  -write_flag         Allow building index
		  -start_pattern      Regexp defining the start of each record
		  -primary_pattern    Regexp defining the primary id
		  -secondary_patterns A hash ref containing the secondary
		                      patterns with the namespaces as keys
		  -primary_namespace  A string defining what the primary key
		                      is
		  -format             Format of the sequence files (e.g. fasta)

	Status	: Public
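       Because the patterns in the "For writing" example are double-quoted
       Perl strings, each backslash has to be doubled: "^>\\S+ +(\\S+)" holds
       the regular expression ^>\S+ +(\S+).  The core-Perl sketch below checks
       what the primary and ACC patterns capture from a FASTA header line; the
       header text is made up for illustration.

```perl
use strict;
use warnings;

# Same strings as in the "For writing" example
my $primary_pattern    = "^>(\\S+)";
my %secondary_patterns = ("ACC" => "^>\\S+ +(\\S+)");

print "$primary_pattern\n";    # prints: ^>(\S+)

my $header = '>seqA ACC001 made-up description';    # hypothetical header
my $acc_re = $secondary_patterns{ACC};
my ($id)  = $header =~ /$primary_pattern/;           # first word after >
my ($acc) = $header =~ /$acc_re/;                    # second word
print "$id $acc\n";            # prints: seqA ACC001
```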

   get_Seq_by_id
	Title	: get_Seq_by_id
	Usage	: $seq = $obj->get_Seq_by_id($id)
	Function: Gets a Bio::SeqI object for a unique ID
	Example	:
	Returns	: Bio::SeqI object
	Args	: string (the primary id)

   get_entry_by_id
	Title	: get_entry_by_id
	Usage	: $entry = $obj->get_entry_by_id($id)
	Function: Gets the full text of the entry for a unique ID
	Returns	: string
	Args	: string

   get_stream_by_id
	Title	: get_stream_by_id
	Usage	: $fh = $obj->get_stream_by_id($id)
	Function: Gets a filehandle from which the entry for an id can be read
	Returns	: filehandle, suitable for Bio::SeqIO->new(-fh => $fh)
	Args	: Id to look up by

   get_Seq_by_acc
	Title	: get_Seq_by_acc
	Usage	: $obj->get_Seq_by_acc($acc)
	Function: Gets a Bio::SeqI object by accession number
	Returns	: Bio::SeqI object
	Args	: string representing accession	number

   get_Seq_by_version
	Title	: get_Seq_by_version
	Usage	: $obj->get_Seq_by_version($version)
	Function: Gets a Bio::SeqI object by accession.version number
	Returns	: Bio::SeqI object
	Args	: string representing accession.version	number

   get_Seq_by_secondary
	Title	: get_Seq_by_secondary
	Usage	: $obj->get_Seq_by_secondary($namespace,$acc)
	Function: Gets a Bio::SeqI object looking up secondary accessions
	Returns	: Bio::SeqI object
	Args	: secondary namespace name and an id

   read_header
	Title	: read_header
	Usage	: $obj->read_header($fhl)
	Function: Reads	the header from	the db file
	Returns	: width	of a record
	Args	: filehandle

   read_record
	Title	: read_record
	Usage	: $obj->read_record($fh,$pos,$len)
	Function: Reads	a record from a	filehandle
	Returns	: String
	Args	: filehandle, offset, and length
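       read_record is documented as taking a filehandle, an offset and a
       length.  A plausible core-Perl equivalent seeks to the offset and reads
       that many bytes.  The read_record_at name and the in-memory sample data
       below are hypothetical, and the module's own implementation may differ.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);

# Hypothetical helper: read $len bytes starting at byte offset $pos
sub read_record_at {
    my ($fh, $pos, $len) = @_;
    seek($fh, $pos, SEEK_SET) or die "seek failed: $!";
    my $got = read($fh, my $buf, $len);
    die "read failed: $!" unless defined $got;
    return $buf;    # may be shorter than $len at end of file
}

# In-memory filehandle standing in for an index file
my $data = "HEADERfirst-recordsecond-record";
open(my $fh, '<', \$data) or die "cannot open: $!";
print read_record_at($fh, 6, 12), "\n";    # prints: first-record
```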

   get_all_primary_ids
	Title	: get_all_primary_ids
	Usage	: @ids = $seqdb->get_all_primary_ids()
	Function: gives	an array of all	the primary_ids	of the
		  sequence objects in the database.
	Returns	: an array of strings
	Args	: none

   find_entry
	Title	: find_entry
	Usage	: $obj->find_entry($fh,$start,$end,$id,$recsize)
	Function: Extract an entry based on the	start,end,id and record	size
	Returns	: string
	Args	: filehandle, start, end, id, recordsize
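       The arguments of find_entry (start, end, id, record size) suggest a
       binary search over fixed-size records sorted by id.  The sketch below
       shows that technique under stated assumptions: every record is padded
       to exactly $recsize bytes, fields are tab-separated with the id first,
       records are sorted by string comparison of the id, there is no file
       header, and $start/$end are 0-based record indices.  These layout
       details and the find_entry_sketch name are hypothetical, not the
       module's documented file format.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);

# Binary search over fixed-width records sorted by their first (tab-separated) field.
# Assumed layout (hypothetical): no header, records padded with spaces to $recsize bytes.
sub find_entry_sketch {
    my ($fh, $start, $end, $id, $recsize) = @_;
    while ($start <= $end) {
        my $mid = int(($start + $end) / 2);
        seek($fh, $mid * $recsize, SEEK_SET) or die "seek failed: $!";
        my $n = read($fh, my $record, $recsize);
        die "short read at record $mid" unless defined $n && $n == $recsize;
        my ($key) = split /\t/, $record;
        if    ($key lt $id) { $start = $mid + 1 }
        elsif ($key gt $id) { $end   = $mid - 1 }
        else {
            $record =~ s/ +\z//;    # strip the padding
            return $record;
        }
    }
    return;    # not found
}

# Tiny in-memory index: id, file id, offset, length, each padded to 24 bytes
my $recsize = 24;
my @rows = ("seq1\t0\t0\t120", "seq2\t0\t120\t95",
            "seq3\t1\t0\t210", "seq4\t1\t210\t88");
my $index = join '', map { sprintf "%-${recsize}s", $_ } @rows;
open(my $fh, '<', \$index) or die "cannot open: $!";

print find_entry_sketch($fh, 0, $#rows, 'seq3', $recsize), "\n";
# prints seq3, 1, 0 and 210 separated by tabs
```

       Fixed-width records are what make this work: the byte position of
       record $mid is simply $mid * $recsize, so each probe costs one seek
       and one read.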

   build_index
	Title	: build_index
	Usage	: $obj->build_index(@files)
	Function: Build	the index based	on a set of files
	Returns	: count	of the number of entries
	Args	: List of filenames
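       The indexing pass behind build_index can be pictured as follows: scan
       each file once, note the byte offset where each record starts (a line
       matching the start pattern), take the primary key from the primary
       pattern, accumulate the record's length, and finally sort the keys so
       they can be binary-searched.  The core-Perl sketch below does this for
       one in-memory FASTA string; build_index_sketch and its
       id/offset/length output are hypothetical and leave out file ids,
       secondary keys and fixed-width padding.

```perl
use strict;
use warnings;

# Hypothetical sketch: return "id<TAB>offset<TAB>length" rows sorted by id
sub build_index_sketch {
    my ($text, $start_re, $primary_re) = @_;
    my (%pos, $current);
    my $offset = 0;
    for my $line (split /^/, $text) {    # split /^/ keeps each line's newline
        if ($line =~ $start_re) {
            ($current) = $line =~ $primary_re;
            $pos{$current} = [$offset, 0] if defined $current;
        }
        $pos{$current}[1] += length $line if defined $current;
        $offset += length $line;
    }
    return map { join("\t", $_, @{ $pos{$_} }) } sort keys %pos;
}

my $fasta = ">seq2 b\nGGCC\n>seq1 a\nACGTACGT\n";
my @rows  = build_index_sketch($fasta, qr/^>/, qr/^>(\S+)/);
print scalar(@rows), " entries\n";    # prints: 2 entries
print "$_\n" for @rows;
```

       Like build_index, the number of entries is available as the size of
       the result.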

   _index_file
	Title	: _index_file
	Usage	: $obj->_index_file($newval)
	Function:
	Example	:
	Returns	: value	of _index_file
	Args	: newvalue (optional)

   write_primary_index
	Title	: write_primary_index
	Usage	: $obj->write_primary_index($newval)
	Function:
	Example	:
	Returns	: value	of write_primary_index
	Args	: newvalue (optional)

   write_secondary_indices
	Title	: write_secondary_indices
	Usage	: $obj->write_secondary_indices($newval)
	Function:
	Example	:
	Returns	: value	of write_secondary_indices
	Args	: newvalue (optional)

   new_secondary_filehandle
	Title	: new_secondary_filehandle
	Usage	: $obj->new_secondary_filehandle($newval)
	Function:
	Example	:
	Returns	: value	of new_secondary_filehandle
	Args	: newvalue (optional)

   open_secondary_index
	Title	: open_secondary_index
	Usage	: $obj->open_secondary_index($newval)
	Function:
	Example	:
	Returns	: value	of open_secondary_index
	Args	: newvalue (optional)

   _add_id_position
	Title	: _add_id_position
	Usage	: $obj->_add_id_position($newval)
	Function:
	Example	:
	Returns	: value	of _add_id_position
	Args	: newvalue (optional)

   make_config_file
	Title	: make_config_file
	Usage	: $obj->make_config_file($newval)
	Function:
	Example	:
	Returns	: value	of make_config_file
	Args	: newvalue (optional)

   read_config_file
	Title	: read_config_file
	Usage	: $obj->read_config_file($newval)
	Function:
	Example	:
	Returns	: value	of read_config_file
	Args	: newvalue (optional)

   get_fileid_by_filename
	Title	: get_fileid_by_filename
	Usage	: $obj->get_fileid_by_filename($newval)
	Function:
	Example	:
	Returns	: value	of get_fileid_by_filename
	Args	: newvalue (optional)

   get_filehandle_by_fileid
	Title	: get_filehandle_by_fileid
	Usage	: $obj->get_filehandle_by_fileid($newval)
	Function:
	Example	:
	Returns	: value	of get_filehandle_by_fileid
	Args	: newvalue (optional)

   primary_index_file
	Title	: primary_index_file
	Usage	: $obj->primary_index_file($newval)
	Function:
	Example	:
	Returns	: value	of primary_index_file
	Args	: newvalue (optional)

   primary_index_filehandle
	Title	: primary_index_filehandle
	Usage	: $obj->primary_index_filehandle($newval)
	Function:
	Example	:
	Returns	: value	of primary_index_filehandle
	Args	: newvalue (optional)

   format
	Title	: format
	Usage	: $obj->format($newval)
	Function: get/set the format of the sequence files (e.g. "fasta")
	Example	:
	Returns	: value	of format
	Args	: newvalue (optional)

   write_flag
	Title	: write_flag
	Usage	: $obj->write_flag($newval)
	Function: get/set the flag that allows building the index
	Example	:
	Returns	: value	of write_flag
	Args	: newvalue (optional)

   dbname
	Title	: dbname
	Usage	: $obj->dbname($newval)
	Function: get/set database name
	Example	:
	Returns	: value	of dbname
	Args	: newvalue (optional)

   index_directory
	Title	: index_directory
	Usage	: $obj->index_directory($newval)
	Function: get/set the root directory for index files
	Example	:
	Returns	: value	of index_directory
	Args	: newvalue (optional)

   record_size
	Title	: record_size
	Usage	: $obj->record_size($newval)
	Function: get/set the width of a record in the index file
	Example	:
	Returns	: value	of record_size
	Args	: newvalue (optional)

   primary_namespace
	Title	: primary_namespace
	Usage	: $obj->primary_namespace($newval)
	Function: get/set the name of the primary key (e.g. "ID" or "AC")
	Example	:
	Returns	: value	of primary_namespace
	Args	: newvalue (optional)

   index_type
	Title	: index_type
	Usage	: $obj->index_type($newval)
	Function:
	Example	:
	Returns	: value	of index_type
	Args	: newvalue (optional)

   index_version
	Title	: index_version
	Usage	: $obj->index_version($newval)
	Function:
	Example	:
	Returns	: value	of index_version
	Args	: newvalue (optional)

   primary_pattern
	Title	: primary_pattern
	Usage	: $obj->primary_pattern($newval)
	Function: get/set the regexp defining the primary id
	Example	:
	Returns	: value	of primary_pattern
	Args	: newvalue (optional)

   start_pattern
	Title	: start_pattern
	Usage	: $obj->start_pattern($newval)
	Function: get/set the regexp marking the start of each record
	Example	:
	Returns	: value	of start_pattern
	Args	: newvalue (optional)

   secondary_patterns
	Title	: secondary_patterns
	Usage	: $obj->secondary_patterns($newval)
	Function: get/set the hash ref of secondary patterns, keyed by namespace
	Example	:
	Returns	: value	of secondary_patterns
	Args	: newvalue (optional)

   secondary_namespaces
	Title	: secondary_namespaces
	Usage	: $obj->secondary_namespaces($newval)
	Function: get/set the secondary namespaces (e.g. "ID")
	Example	:
	Returns	: value	of secondary_namespaces
	Args	: newvalue (optional)

perl v5.32.0			  2019-12-07	Bio::DB::Flat::BinarySearch(3)
