Bio::Search::BlastUtils(3)  User Contributed Perl Documentation  Bio::Search::BlastUtils(3)

NAME
       Bio::Search::BlastUtils - Utility functions for Bio::Search:: BLAST

SYNOPSIS
        # This module is just a collection of subroutines, not an object.

       See Bio::Search::Hit::BlastHit.

DESCRIPTION
       This module is a collection of subroutines used primarily by
       Bio::Search::Hit::BlastHit objects for some of their additional
       functionality, such as HSP tiling. Right now, BlastUtils is just a
       collection of subroutines, not an object, and it is tightly coupled to
       Bio::Search::Hit::BlastHit. A goal for the future is to generalize it
       to work through the Bio::Search interfaces, so that it can work with
       any objects that implement them.

AUTHOR
       Steve Chervitz <>

   tile_hsps
        Usage     : tile_hsps( $sbjct );
                  : This is called automatically by Bio::Search::Hit::BlastHit
                  : during object construction or as needed by methods that
                  : rely on having tiled data.
        Purpose   : Collect statistics about the aligned sequences in a set of HSPs.
                  : Calculates the following data across all HSPs:
                  :    -- total alignment length
                  :    -- total identical residues
                  :    -- total conserved residues
        Returns   : n/a
        Argument  : A Bio::Search::Hit::BlastHit object
        Throws    : n/a
        Comments  :
                  : This method is *strongly* coupled to Bio::Search::Hit::BlastHit
                  : (it accesses BlastHit data members directly).
                  : TODO: Re-write this to use the Bio::Search::Hit::HitI interface.
                  : This method sums data across all HSPs in the Sbjct object
                  : more carefully than a simple total. Only HSPs that are in the
                  : same strand and frame are tiled. Simply summing the data from
                  : all HSPs in the same strand and frame will overestimate the
                  : actual length of the alignment if there is overlap between
                  : different HSPs (often the case).
                  : The strategy is to tile the HSPs and sum over the
                  : contigs, collecting data separately from overlapping and
                  : non-overlapping regions of each HSP. To facilitate this, the
                  : HSP object now permits extraction of data from sub-sections
                  : of an HSP.
                  : Additional useful information is collected from the results
                  : of the tiling. It is possible that sub-sequences in
                  : different HSPs will overlap significantly. In this case, it
                  : is impossible to create a single unambiguous alignment by
                  : concatenating the HSPs. The ambiguity may indicate the
                  : presence of multiple, similar domains in one or both of the
                  : aligned sequences. This ambiguity is recorded using the
                  : ambiguous_aln() method.
                  : This method does not attempt to discern biologically
                  : significant vs. insignificant overlaps. The allowable amount
                  : of overlap can be set with the overlap() method or with the
                  : -OVERLAP parameter used when constructing the Blast & Sbjct
                  : objects.
                  : For a given hit, both the query and the sbjct sequences are
                  : tiled independently.
                  :    -- If only query sequence HSPs overlap,
                  :       this may suggest multiple domains in the sbjct.
                  :    -- If only sbjct sequence HSPs overlap,
                  :       this may suggest multiple domains in the query.
                  :    -- If both query & sbjct sequence HSPs overlap,
                  :       this suggests multiple domains in both.
                  :    -- If neither query nor sbjct sequence HSPs overlap,
                  :       this suggests either no multiple domains in either
                  :       sequence OR that both sequences have the same
                  :       distribution of multiple similar domains.
                  : This method handles the special case in which multiple
                  : HSPs overlap exactly.
                  : Efficiency concerns:
                  :  Speed will be an issue for sequences with numerous HSPs.
        Bugs      : Currently, tile_hsps() does not properly account for
                  : the number of non-tiled but overlapping HSPs, which becomes a
                  : problem as overlap() grows. Large overlap() values may thus
                  : lead to incorrect statistics for some hits. For best results,
                  : keep overlap() below 5 (the default is 2). For more about
                  : this, see the "HSP Tiling and Ambiguous Alignments" section
                  : in Bio::Search::Hit::BlastHit.

        See Also  : _adjust_contigs(), Bio::Search::Hit::BlastHit
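       The tiling idea can be illustrated with a minimal sketch, assuming each
       HSP in one strand/frame group has been reduced to a hypothetical
       [start, end] pair of 1-based, inclusive query coordinates. The real
       method works on BlastHit and BlastHSP objects and also tracks identical
       and conserved residues; merge_ranges(), naive_length() and
       tiled_length() are illustrative names, not part of this module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Search::BlastUtils): merge
# [start, end] ranges (1-based, inclusive) into non-overlapping contigs.
sub merge_ranges {
    my @sorted = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @_;
    my @contigs;
    for my $r (@sorted) {
        if ( @contigs && $r->[0] <= $contigs[-1][1] + 1 ) {
            # Overlapping or abutting: extend the current contig.
            $contigs[-1][1] = $r->[1] if $r->[1] > $contigs[-1][1];
        }
        else {
            push @contigs, [@$r];
        }
    }
    return @contigs;
}

# Naive total: counts residues covered by several HSPs more than once.
sub naive_length {
    my $total = 0;
    $total += $_->[1] - $_->[0] + 1 for @_;
    return $total;
}

# Tiled total: sum the lengths of the merged contigs instead.
sub tiled_length {
    return naive_length( merge_ranges(@_) );
}

my @hsps = ( [ 10, 60 ], [ 50, 120 ], [ 200, 230 ] );
printf "naive=%d tiled=%d\n", naive_length(@hsps), tiled_length(@hsps);
```

       For these three ranges the naive sum is 153 residues while the tiled
       total is 142: residues 50 to 60, shared by the first two HSPs, are
       counted only once.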

   _adjust_contigs
        Usage     : n/a; called automatically during object construction.
        Purpose   : Builds HSP contigs for a given BLAST hit.
                  : Utility method called by tile_hsps().
        Returns   :
        Argument  :
        Throws    : Exceptions propagated from Bio::Search::HSP::BlastHSP::matches()
                  : for invalid sub-sequence ranges.
        Status    : Experimental
        Comments  : This method does not currently support gapped alignments.
                  : Also, it does not keep track of the number of HSPs that
                  : overlap within the amount specified by overlap().
                  : This will lead to significant tracking errors for large
                  : overlap values.

        See Also  : tile_hsps(), Bio::Search::HSP::BlastHSP::matches
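       _adjust_contigs() is internal, and the bookkeeping it performs is not
       spelled out here. As a sketch under stated assumptions only, the loop
       below shows the overlap tolerance described under tile_hsps(): an HSP
       that shares no more than $max_overlap residues with the current contig
       is merged into it, while a larger overlap marks the alignment as
       ambiguous (in the sense of the ambiguous_aln() flag). build_contigs()
       and $max_overlap are hypothetical names, HSPs are again reduced to
       [start, end] pairs, and the value 2 mirrors the documented default
       for overlap().

```perl
use strict;
use warnings;

# Hypothetical sketch: build contigs from [start, end] ranges and report
# whether any HSP overlaps its contig by more than $max_overlap residues.
# Unlike the real module, ambiguous HSPs are still merged here.
sub build_contigs {
    my ( $max_overlap, @ranges ) = @_;
    my @contigs;
    my $ambiguous = 0;
    for my $r ( sort { $a->[0] <=> $b->[0] } @ranges ) {
        if ( @contigs && $r->[0] <= $contigs[-1][1] ) {
            # Number of residues shared with the current contig.
            my $end = $r->[1] < $contigs[-1][1] ? $r->[1] : $contigs[-1][1];
            my $overlap = $end - $r->[0] + 1;
            $ambiguous = 1 if $overlap > $max_overlap;
            $contigs[-1][1] = $r->[1] if $r->[1] > $contigs[-1][1];
        }
        else {
            push @contigs, [@$r];
        }
    }
    return ( \@contigs, $ambiguous );
}

# Tolerance of 2 residues, as in the documented default:
my ( $c1, $amb1 ) = build_contigs( 2, [ 1, 50 ], [ 49, 90 ] );  # 2 shared
my ( $c2, $amb2 ) = build_contigs( 2, [ 1, 50 ], [ 30, 90 ] );  # 21 shared
print "first: ",  scalar(@$c1), " contig, ambiguous=$amb1\n";
print "second: ", scalar(@$c2), " contig, ambiguous=$amb2\n";
```

       The first call stays within the tolerance and reports ambiguous=0; the
       second shares 21 residues and reports ambiguous=1.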

   get_exponent
        Usage     : &get_exponent( number );
        Purpose   : Determines the power-of-10 exponent of an integer, float,
                  : or scientific notation number.
        Example   : &get_exponent("4.0e-206");
                  : &get_exponent("0.00032");
                  : &get_exponent("10.");
                  : &get_exponent("1000.0");
                  : &get_exponent("e+83");
        Argument  : Float, Integer, or scientific notation number
        Returns   : Integer representing the exponent part of the number (+ or -).
                  : If argument == 0 (zero), return value is "-999".
        Comments  : Exponents are rounded up (less negative) if the mantissa is >= 5.
                  : Exponents are rounded down (more negative) if the mantissa is <= -5.
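       A minimal sketch of the documented behaviour, not the BioPerl source.
       The page does not say how plain integers and floats are handled, so
       this sketch assumes every input is first normalised to
       mantissa x 10**exponent with 1 <= |mantissa| < 10, and only then
       applies the rounding rule from the Comments field. The name
       exponent_of() is hypothetical, and it returns the number -999 rather
       than the string "-999".

```perl
use strict;
use warnings;

# Hypothetical re-implementation of the documented get_exponent() rules.
sub exponent_of {
    my ($str) = @_;
    my ( $num, $exp ) = split /[eE]/, $str;
    # A missing mantissa (e.g. "e+83") is treated as 1.
    $num = 1 if !defined $num || $num eq '';
    $exp = defined $exp ? $exp + 0 : 0;
    return -999 if $num == 0;
    # Assumption: normalise the mantissa into [1, 10) in absolute value.
    while ( abs($num) >= 10 ) { $num /= 10; $exp++ }
    while ( abs($num) < 1 )   { $num *= 10; $exp-- }
    $exp++ if $num >= 5;     # rounded up (less negative)
    $exp-- if $num <= -5;    # rounded down (more negative)
    return $exp;
}

print "$_ => ", exponent_of($_), "\n"
    for qw(4.0e-206 0.00032 10. 1000.0 e+83);
```

       Under these assumptions the five inputs from the Example field give
       -206, -4, 1, 3 and 83.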

   collapse_nums
        Usage     : @cnums = collapse_nums( @numbers );
        Purpose   : Collapses a list of numbers into a set of ranges of consecutive terms.
                  : Useful for condensing long lists of consecutive numbers.
                  :  EXPANDED:
                  :     1 2 3 4 5 6 10 12 13 14 15 17 18 20 21 22 24 26 30 31 32
                  :  COLLAPSED:
                  :     1-6 10 12-15 17 18 20-22 24 26 30-32
        Argument  : List of numbers sorted numerically.
        Returns   : List of numbers mixed with ranges of numbers (see above).
        Throws    : n/a

        See Also  : Bio::Search::Hit::BlastHit::seq_inds()
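       The collapsing rule implied by the example (runs of three or more
       consecutive numbers become a range, while pairs such as 17 18 stay
       separate) can be sketched as follows. collapse_runs() is a hypothetical
       name; this is a reading of the documented example, not the module's
       code, and like collapse_nums() it expects numerically sorted input.

```perl
use strict;
use warnings;

# Hypothetical sketch of the collapse_nums() example output.
# Runs of three or more consecutive integers become "first-last";
# runs of one or two numbers are kept as individual items.
sub collapse_runs {
    my @nums = @_;
    my @out;
    my $i = 0;
    while ( $i < @nums ) {
        # Advance $j to the end of the consecutive run starting at $i.
        my $j = $i;
        $j++ while $j + 1 < @nums && $nums[ $j + 1 ] == $nums[$j] + 1;
        if ( $j - $i >= 2 ) {
            push @out, "$nums[$i]-$nums[$j]";
        }
        else {
            push @out, @nums[ $i .. $j ];
        }
        $i = $j + 1;
    }
    return @out;
}

my @expanded = ( 1 .. 6, 10, 12 .. 15, 17, 18, 20 .. 22, 24, 26, 30 .. 32 );
print join( ' ', collapse_runs(@expanded) ), "\n";
```

       Applied to the EXPANDED list, this prints the COLLAPSED line shown in
       the Purpose field.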

   strip_blast_html
        Usage     : $boolean = &strip_blast_html( string_ref );
                  : This method is exported.
        Purpose   : Removes HTML formatting from a supplied string.
                  : Attempts to restore the Blast report to enable
                  : parsing by
        Returns   : Boolean: true if string was stripped, false if not.
        Argument  : string_ref = reference to a string containing the whole
                  :              HTML-formatted Blast report.
        Throws    : Croaks if the argument is not a scalar reference.
        Comments  : Based on code originally written by Alex Dong Li
                  : (
                  : This method does some Blast-specific stripping
                  : (adds back a '>' character in front of each HSP
                  : alignment listing).
                  : Removal of the HTML tags and accurate reconstitution of the
                  : non-HTML-formatted report is highly dependent on the structure
                  : of the HTML-formatted version. For example, it assumes that the
                  : first line of each alignment section (HSP listing) starts with
                  : an <a name=..> anchor tag. This permits the reconstruction of
                  : the original report, in which these lines begin with a ">".
                  : This is required for parsing.
                  : If the structure of the Blast report itself is not intended to
                  : be a standard, the structure of the HTML-formatted version
                  : is even less so. Therefore, use of this method to
                  : reconstitute parsable Blast reports from HTML-format versions
                  : should be considered a temporary solution.
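       The general approach can be sketched with a few substitutions on the
       referenced string. This is a hypothetical illustration, not the
       module's code: strip_html_report() is an invented name, and it assumes,
       as described above, that each alignment section starts with an
       <a name=...> anchor at the beginning of a line. Only three common
       entities are decoded. The fragility of such regex-based stripping is
       exactly why the method is described as a temporary solution.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical sketch of HTML stripping for a BLAST report held in a string.
sub strip_html_report {
    my ($ref) = @_;
    croak "Argument must be a scalar reference" unless ref($ref) eq 'SCALAR';
    my $original = $$ref;
    # A line starting with an <a name=...> anchor gets a leading ">".
    $$ref =~ s/^<a\s+name\s*=[^>]*>/>/gim;
    # Remove all remaining tags, then decode a few common entities
    # (&amp; last, so "&amp;gt;" decodes once to "&gt;").
    $$ref =~ s/<[^>]*>//g;
    $$ref =~ s/&gt;/>/g;
    $$ref =~ s/&lt;/</g;
    $$ref =~ s/&amp;/&/g;
    # True if anything was stripped, false otherwise.
    return $$ref ne $original ? 1 : 0;
}

my $html = "<PRE>\n<a name = 42></a>gi|42 hypothetical protein\n"
         . "Score = 50.1 bits, Expect = 3e-06\n</PRE>\n";
my $changed = strip_html_report( \$html );
print "changed=$changed\n$html";
```

       Here the anchor line comes back as ">gi|42 hypothetical protein" and
       the <PRE> tags disappear, so the function returns true.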

perl v5.32.1                      2019-12-07          Bio::Search::BlastUtils(3)

