Bio::Search::SearchUtils(3)  User Contributed Perl Documentation  Bio::Search::SearchUtils(3)

       Bio::Search::SearchUtils	- Utility functions for	Bio::Search:: objects

	 # This	module is just a collection of subroutines, not	an object.

       The module is a collection of subroutines	used primarily
       by Bio::Search::Hit::HitI objects for some of the additional
       functionality, such as HSP tiling. Right now, SearchUtils is just a
       collection of subroutines, not an object.

       Steve Chervitz

       Sendu Bala

	Usage	  : tile_hsps( $sbjct );
		  : This is called automatically by methods in Bio::Search::Hit::GenericHit
		  : that rely on having	tiled data.
		  : If you are interested in getting data about	the constructed	HSP contigs:
		  : my ($qcontigs, $scontigs) =	Bio::Search::SearchUtils::tile_hsps($hit);
		  : if (ref $qcontigs) {
		  :    print STDERR "Query contigs:\n";
		  :    foreach (@{$qcontigs}) {
		  :	    print "contig start	is $_->{'start'}\n";
		  :	    print "contig stop is $_->{'stop'}\n";
		  :    }
		  : }
		  : See	below for more information about the contig data structure.
	Purpose	  : Collect statistics about the aligned sequences in a	set of HSPs.
		  : Calculates the following data across all HSPs:
		  :    -- total	alignment length
		  :    -- total	identical residues
		  :    -- total	conserved residues
	Returns	  : If there was only a	single HSP (so no tiling was necessary)
		      tile_hsps() returns a list of two	non-zero integers.
		    If there were multiple HSPs,
		      tile_hsps() returns a list of two array references containing HSP contig data.
		    The	first array ref	contains a list	of HSP contigs on the query sequence.
		    The	second array ref contains a list of HSP	contigs	on the subject sequence.
		    Each contig	is a hash reference with the following data fields:
		      'start' => start coordinate of the contig
		      'stop'  => stop coordinate of the contig
		      'iden'  => number	of identical residues in the contig
		      'cons'  => number	of conserved residues in the contig
		      'strand'=> strand	of the contig
		      'frame' => frame of the contig
	Argument  : A Bio::Search::Hit::HitI object
	Throws	  : n/a
	Comments  :
		  : This method	performs more careful summing of data across
		  : all	HSPs in	the Sbjct object. Only HSPs that are in	the same strand
		  : and	frame are tiled. Simply	summing	the data from all HSPs
		  : in the same	strand and frame will overestimate the actual
		  : length of the alignment if there is	overlap	between	different HSPs
		  : (often the case).
		  : The	strategy is to tile the	HSPs and sum over the
		  : contigs, collecting	data separately	from overlapping and
		  : non-overlapping regions of each HSP. To facilitate this, the
		  : object now permits extraction of data from sub-sections
		  : of an HSP.
		  : Additional useful information is collected from the	results
		  : of the tiling. It is possible that sub-sequences in
		  : different HSPs will	overlap	significantly. In this case, it
		  : is impossible to create a single unambiguous alignment by
		  : concatenating the HSPs. The	ambiguity may indicate the
		  : presence of	multiple, similar domains in one or both of the
		  : aligned sequences. This ambiguity is recorded using	the
		  : ambiguous_aln() method.
		  : This method	does not attempt to discern biologically
		  : significant	vs. insignificant overlaps. The	allowable amount of
		  : overlap can	be set with the	overlap() method or with the -OVERLAP
		  : parameter used when	constructing the Hit object.
		  : For	a given	hit, both the query and	the sbjct sequences are
		  : tiled independently.
		  :    -- If only query	sequence HSPs overlap,
		  :	     this may suggest multiple domains in the sbjct.
		  :    -- If only sbjct	sequence HSPs overlap,
		  :	     this may suggest multiple domains in the query.
		  :    -- If both query	& sbjct	sequence HSPs overlap,
		  :	     this suggests multiple domains in both.
		  :    -- If neither query nor sbjct sequence HSPs overlap,
		  :	     this suggests either no multiple domains in either
		  :	     sequence OR that both sequences have the same
		  :	     distribution of multiple similar domains.
		  : This method	can deal with the special case of when multiple
		  : HSPs exactly overlap.
		  : Efficiency concerns:
		  :  Speed will	be an issue for	sequences with numerous	HSPs.
	Bugs	  : Currently, tile_hsps() does	not properly account for
		  : the	number of non-tiled but	overlapping HSPs, which	becomes	a problem
		  : as overlap() grows. Large values of overlap() may thus lead to
		  : incorrect statistics for some hits.	For best results, keep overlap()
		  : below 5 (DEFAULT IS	2). For	more about this, see the "HSP Tiling and
		  : Ambiguous Alignments" section in Bio::Search::Hit::GenericHit.

       See Also	  : _adjust_contigs(), Bio::Search::Hit::GenericHit
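
       The overestimate described in the Comments above can be illustrated
       with a minimal sketch. It is not the module's implementation: the
       helper name sketch_tile() is hypothetical, it ignores strand, frame,
       and the overlap() threshold, and it only merges overlapping
       start/stop ranges on one sequence into contigs.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Search::SearchUtils.
# Each HSP is a hash ref with 'start' and 'stop' (1-based, inclusive).
sub sketch_tile {
    my @hsps = sort { $a->{start} <=> $b->{start} } @_;
    my @contigs;
    for my $hsp (@hsps) {
        my $last = $contigs[-1];
        if ($last && $hsp->{start} <= $last->{stop}) {
            # Overlaps the current contig: extend it if this HSP reaches further.
            $last->{stop} = $hsp->{stop} if $hsp->{stop} > $last->{stop};
        }
        else {
            # No overlap: start a new contig (copy, so inputs are not modified).
            push @contigs, { start => $hsp->{start}, stop => $hsp->{stop} };
        }
    }
    return @contigs;
}

my @hsps = (
    { start => 1,   stop => 100 },
    { start => 80,  stop => 150 },
    { start => 300, stop => 350 },
);
my @contigs = sketch_tile(@hsps);

my ($naive, $tiled) = (0, 0);
$naive += $_->{stop} - $_->{start} + 1 for @hsps;
$tiled += $_->{stop} - $_->{start} + 1 for @contigs;

print "naive sum of HSP lengths: $naive\n";    # 222
print "length after tiling:      $tiled\n";    # 201
```

       The first two ranges overlap by 21 positions, so summing the HSPs
       directly counts those positions twice.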

	Usage	  : logical_length( $alg_name, $seq_type, $length );
	Purpose	  : Determine the logical length of an aligned sequence	based on
		  : algorithm name and sequence	type.
	Returns	  : integer representing the logical aligned length.
	Argument  : $alg_name = name of algorithm (e.g., blastx, tblastn)
		  : $seq_type =	type of	sequence (e.g.,	query or hit)
		  : $length = physical length of the sequence in the alignment.
	Throws	  : n/a
	Comments  : This function is used to account for the fact that the number of identities
		    and	conserved residues is reported in peptide space	while the query
		    length (in the case	of BLASTX and TBLASTX) and/or the hit length
		    (in	the case of TBLASTN and	TBLASTX) are in	nucleotide space.
		    The	adjustment affects the values reported by the various frac_XXX
		    methods in GenericHit and GenericHSP.
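
       The adjustment described above can be sketched as follows. This is
       an illustration under stated assumptions, not the module's code: the
       helper name sketch_logical_length() is hypothetical, it recognises
       only the algorithm names mentioned in the Comments, and it assumes
       the conversion from nucleotide space to peptide space is a division
       by 3 (three bases per residue).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Search::SearchUtils.
# Converts a nucleotide-space length to peptide space when the named
# sequence is the translated one for the given algorithm.
sub sketch_logical_length {
    my ($alg_name, $seq_type, $length) = @_;
    my $alg = uc $alg_name;

    # Per the Comments: the query is in nucleotide space for BLASTX and
    # TBLASTX; the hit is in nucleotide space for TBLASTN and TBLASTX.
    my $query_is_nt = $alg eq 'BLASTX'  || $alg eq 'TBLASTX';
    my $hit_is_nt   = $alg eq 'TBLASTN' || $alg eq 'TBLASTX';

    return $length / 3 if $seq_type eq 'query' && $query_is_nt;
    return $length / 3 if $seq_type eq 'hit'   && $hit_is_nt;
    return $length;
}

print sketch_logical_length('blastx',  'query', 300), "\n";    # 100
print sketch_logical_length('blastx',  'hit',   300), "\n";    # 300
print sketch_logical_length('tblastx', 'hit',   300), "\n";    # 100
```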

	Usage	  : &get_exponent( number );
	Purpose	  : Determines the power of 10 exponent	of an integer, float,
		  : or scientific notation number.
	Example	  : &get_exponent("4.0e-206");
		  : &get_exponent("0.00032");
		  : &get_exponent("10.");
		  : &get_exponent("1000.0");
		  : &get_exponent("e+83");
	Argument  : Float, Integer, or scientific notation number
	Returns	  : Integer representing the exponent part of the number (+ or -).
		  : If argument	== 0 (zero), return value is "-999".
	Comments  : Exponents are rounded up (less negative) if	the mantissa is	>= 5.
		  : Exponents are rounded down (more negative) if the mantissa is <= -5.
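
       A minimal sketch of the rules listed above, using a hypothetical
       helper named sketch_get_exponent(). It is not the module's
       implementation and may differ from it on edge cases: plain numbers
       are normalised with sprintf('%e') first, and the rounding rule is
       assumed to apply to the normalised mantissa as well.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Search::SearchUtils.
sub sketch_get_exponent {
    my ($str) = @_;
    my ($mant, $exp);

    if ($str =~ /^([-+]?[\d.]*)[eE]([-+]?\d+)$/) {
        # Scientific notation; a bare "e+83" is treated as 1e+83.
        ($mant, $exp) = ($1, $2);
        $mant = 1 if $mant eq '';
    }
    else {
        return -999 if $str == 0;
        # Normalise plain integers and floats, e.g. 1000.0 -> 1.000000e+03.
        ($mant, $exp) = split /e/i, sprintf('%e', $str);
    }

    return -999 if $mant == 0;
    $exp += 0;                   # "+03" -> 3
    $exp++ if $mant >= 5;        # round up (less negative)
    $exp-- if $mant <= -5;       # round down (more negative)
    return $exp;
}

print sketch_get_exponent($_), "\n"
    for qw(4.0e-206 6.0e-10 1000.0 e+83 0);
```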

	Usage	  : @cnums = collapse_nums( @numbers );
	Purpose	  : Collapses a	list of	numbers	into a set of ranges of	consecutive terms:
		  : Useful for condensing long lists of	consecutive numbers.
		  :  EXPANDED:
		  :	1 2 3 4	5 6 10 12 13 14	15 17 18 20 21 22 24 26	30 31 32
		  :  COLLAPSED:
		  :	1-6 10 12-15 17	18 20-22 24 26 30-32
	Argument  : List of numbers sorted numerically.
	Returns	  : List of numbers mixed with ranges of numbers (see above).
	Throws	  : n/a

       See Also	  : Bio::Search::Hit::BlastHit::seq_inds()
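
       The collapsing shown in the example can be sketched in a few lines.
       The helper name sketch_collapse_nums() is hypothetical and this is
       not the module's code. Judging from the example (17 18 stays
       uncollapsed), it assumes that only runs of three or more
       consecutive numbers become a range.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Search::SearchUtils.
# Expects a numerically sorted list, as the Argument line requires.
sub sketch_collapse_nums {
    my @nums = @_;
    my (@out, @run);

    # Emit the current run: a range if it has 3+ members, else its members.
    my $flush = sub {
        return unless @run;
        if (@run >= 3) { push @out, "$run[0]-$run[-1]" }
        else           { push @out, @run }
        @run = ();
    };

    for my $n (@nums) {
        next if @run && $n == $run[-1];               # skip duplicates
        $flush->() if @run && $n != $run[-1] + 1;     # run is broken
        push @run, $n;
    }
    $flush->();
    return @out;
}

my @expanded = qw(1 2 3 4 5 6 10 12 13 14 15 17 18 20 21 22 24 26 30 31 32);
print join(' ', sketch_collapse_nums(@expanded)), "\n";
# prints: 1-6 10 12-15 17 18 20-22 24 26 30-32
```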

	Usage	  : $boolean = &strip_blast_html( string_ref );
		  : This method	is exported.
	Purpose	  : Removes HTML formatting from a supplied string.
		  : Attempts to restore the Blast report to enable
		  : parsing.
	Returns	  : Boolean: true if string was	stripped, false	if not.
	Argument  : string_ref = reference to a	string containing the whole Blast
		  :		 report	containing HTML	formatting.
	Throws	  : Croaks if the argument is not a scalar reference.
	Comments  : Based on code originally written by Alex Dong Li.
		  : This method	does some Blast-specific stripping
		  : (adds back a '>' character in front	of each	HSP
		  : alignment listing).
		  : Removal of the HTML	tags and accurate reconstitution of the
		  : non-HTML-formatted report is highly	dependent on structure of
		  : the HTML-formatted version. For example, it assumes that the first
		  : line of each alignment section (HSP	listing) starts	with a
		  : <a name=..>	anchor tag. This permits the reconstruction of the
		  : original report in which these lines begin with a ">".
		  : This is required for parsing.
		  : If the structure of	the Blast report itself	is not intended	to
		  : be a standard, the structure of the	HTML-formatted version
		  : is even less so. Therefore,	the use	of this	method to
		  : reconstitute parsable Blast	reports	from HTML-format versions
		  : should be considered a temporary solution.
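
       The general approach described in the Comments can be sketched as
       below. This is a simplified illustration with a hypothetical helper
       named sketch_strip_html(), not the module's code: the regular
       expressions and the handful of decoded entities are assumptions,
       and real HTML-formatted Blast reports may need more handling.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper, not part of Bio::Search::SearchUtils.
# Edits the string in place; returns 1 if anything changed, 0 otherwise.
sub sketch_strip_html {
    my ($string_ref) = @_;
    croak "expected a scalar reference"
        unless ref($string_ref) eq 'SCALAR';

    my $before = $$string_ref;

    # Restore the '>' that starts each alignment listing, assuming the
    # line begins with an <a name=..> anchor tag.
    $$string_ref =~ s/^[ \t]*<a\s+name=[^>]*>/>/gim;

    # Drop all remaining tags.
    $$string_ref =~ s/<[^>]+>//g;

    # Decode a few common entities.
    $$string_ref =~ s/&lt;/</g;
    $$string_ref =~ s/&gt;/>/g;
    $$string_ref =~ s/&amp;/&/g;

    return $$string_ref ne $before ? 1 : 0;
}

my $report = "<b>Query=</b> test\n<a name=123></a>gi|123 some protein\n";
my $changed = sketch_strip_html(\$report);
print "stripped: $changed\n", $report;
```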

	Title	 : result2hash
	Usage	 : my %data = &Bio::Search::SearchUtils::result2hash($result)
	Function : converts ResultI data to simple hash
	Returns	 : hash
	Args	 : ResultI
	Note	 : used	mainly as a utility for	running	SearchIO tests

perl v5.32.1			  2019-12-07	   Bio::Search::SearchUtils(3)

