Bio::Tools::GuessSeqFormat(3)  User Contributed Perl Documentation  Bio::Tools::GuessSeqFormat(3)

NAME
       Bio::Tools::GuessSeqFormat - Module for determining the sequence format
       of the contents of a file, a string, or through a filehandle.

SYNOPSIS
           # To guess the format of a flat file, given a filename:
           my $guesser = Bio::Tools::GuessSeqFormat->new( -file => $filename );
           my $format  = $guesser->guess;

           # To guess the format from an already open filehandle:
           my $guesser = Bio::Tools::GuessSeqFormat->new( -fh => $filehandle );
           my $format  = $guesser->guess;
           # The filehandle will be returned to its original position. Note that this
           # filehandle can be STDIN.

           # To guess the format of one or several lines of text (with
           # embedded newlines):
           my $guesser = Bio::Tools::GuessSeqFormat->new( -text => $linesoftext );
           my $format  = $guesser->guess;

           # To create a Bio::Tools::GuessSeqFormat object and set the
           # filename, filehandle, or text to parse afterwards (each call
           # clears the other two, so use only one of them):
           my $guesser = Bio::Tools::GuessSeqFormat->new();
           $guesser->file($filename);
           $guesser->fh($filehandle);
           $guesser->text($linesoftext);

           # To guess in one go, given e.g. a filename:
           my $format = Bio::Tools::GuessSeqFormat->new( -file => $filename )->guess;

DESCRIPTION
       Bio::Tools::GuessSeqFormat tries to guess the format ("swiss", "pir",
       "fasta", etc.) of the sequence or multiple sequence alignment (MSA) in
       a file, in a scalar, or read through a filehandle.

       The guess() method of a Bio::Tools::GuessSeqFormat object examines the
       data, line by line, until it finds a line to which only one format can
       be assigned. If no conclusive guess can be made, undef is returned.
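       The line-by-line elimination described above can be sketched in plain
       Perl as follows. This is not the module's code: the two tests (a FASTA
       header check, and a PIR check on a ">P1;"-style header or a "*" ending
       a sequence line) are simplified, hypothetical stand-ins for the real
       _possibly_* helpers, and guess_format() is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical, simplified tests. Each takes a line and its line number
# and returns 1 if the line could come from that format, else 0.
my %possibly = (
    fasta => sub {
        my ($line, $lineno) = @_;
        return $line =~ /^>/ ? 1 : 0;
    },
    pir => sub {
        my ($line, $lineno) = @_;
        return 1 if $line =~ /^>(?:P1|F1|DL|DC|RL|RC|N3|N1|XX);/;
        return $line =~ /\*\s*$/ ? 1 : 0;    # PIR sequences end with '*'
    },
);

# Examine the text line by line; stop at the first line that exactly
# one test accepts. Returns undef if no line is conclusive.
sub guess_format {
    my ($text) = @_;
    my $lineno = 0;
    for my $line (split /\n/, $text) {
        $lineno++;
        my @hits = grep { $possibly{$_}->($line, $lineno) } sort keys %possibly;
        return $hits[0] if @hits == 1;
    }
    return undef;
}
```

       A PIR header line also looks like a FASTA header, so both tests accept
       line 1 and the loop keeps reading until the "*"-terminated sequence
       line leaves only one candidate.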

       If the Bio::Tools::GuessSeqFormat object is given a filehandle, e.g.
       STDIN, the filehandle is restored to its original position when the
       guess() method returns.
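       One common way to leave a filehandle where it was is to record its
       position with tell() and seek() back to it after reading. The sketch
       below shows that idea on an in-memory filehandle; peek_next_line() is a
       hypothetical helper, the handle is assumed to be seekable, and the
       module itself may restore the position differently.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Hypothetical helper: read the next line, then put the handle back where
# the caller left it. Assumes the handle is seekable.
sub peek_next_line {
    my ($fh) = @_;
    my $start = tell($fh);                 # caller's current position
    die "cannot determine position\n" if $start < 0;
    my $line = <$fh>;
    seek($fh, $start, SEEK_SET) or die "cannot restore position: $!\n";
    return $line;
}

my $data = ">seq1\nACGT\n";
open(my $fh, '<', \$data) or die "cannot open in-memory handle: $!\n";
my $header = <$fh>;                        # caller has already read one line
my $peeked = peek_next_line($fh);          # "ACGT\n"
my $next   = <$fh>;                        # still "ACGT\n": position was restored
```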

   Formats
       Tests are currently implemented for the following formats:

       o   ACeDB ("ace")

       o   Blast ("blast")

       o   ClustalW ("clustalw")

       o   Codata ("codata")

       o   EMBL ("embl")

       o   FastA sequence ("fasta")

       o   FastQ sequence ("fastq")

       o   FastXY/FastA	alignment ("fastxy")

       o   Game XML ("game")

       o   GCG ("gcg")

       o   GCG Blast ("gcgblast")

       o   GCG FastA ("gcgfasta")

       o   GDE ("gde")

       o   Genbank ("genbank")

       o   Genscan ("genscan")

       o   GFF ("gff")

       o   HMMER ("hmmer")

       o   PAUP/NEXUS ("nexus")

       o   Phrap assembly file ("phrap")

       o   NBRF/PIR ("pir")

       o   Mase ("mase")

       o   Mega ("mega")

       o   GCG/MSF ("msf")

       o   Pfam ("pfam")

       o   Phylip ("phylip")

       o   Prodom ("prodom")

       o   Raw ("raw")

       o   RSF ("rsf")

       o   Selex ("selex")

       o   Stockholm ("stockholm")

       o   Swissprot ("swiss")

       o   Tab ("tab")

       o   Variant Call Format ("vcf")

FEEDBACK
   Mailing Lists
       User feedback is an integral part of the evolution of this and other
       Bioperl modules. Send your comments and suggestions preferably to one
       of the Bioperl mailing lists. Your participation is much appreciated.

	 bioperl-l@bioperl.org			- General discussion
	 http://bioperl.org/wiki/Mailing_lists	- About	the mailing lists

   Support
       Please direct usage questions or support issues to the mailing list:

       bioperl-l@bioperl.org

       rather than to the module maintainer directly. Many experienced and
       responsive experts will be able to look at the problem and quickly
       address it. Please include a thorough description of the problem with
       code and data examples if at all possible.

   Reporting Bugs
       Report bugs to the Bioperl bug tracking system to help us keep track
       of the bugs and their resolution. Bug reports can be submitted via the
       web:

	 https://github.com/bioperl/bioperl-live/issues

AUTHOR
       Andreas Kaehaeri, andreas.kahari@ebi.ac.uk

CONTRIBUTORS
       Heikki Lehvaeslaiho, heikki-at-bioperl-dot-org
       Mark A. Jensen, maj-at-fortinbras-dot-us

METHODS
       Methods available to Bio::Tools::GuessSeqFormat objects are described
       below.  Methods with names beginning with an underscore are considered
       to be internal.

   new
	Title	   : new
	Usage	   : $guesser =	Bio::Tools::GuessSeqFormat->new( ... );
	Function   : Creates a new object.
	Example	   : See SYNOPSIS.
	Returns	   : A new object.
	Arguments  : -file The filename	of the file whose format is to
			   be guessed, e.g. STDIN, or
		     -fh   An already opened filehandle	from which a text
			   stream may be read, or
		     -text A scalar containing one or several lines of
			   text	with embedded newlines.

           If more than one of the above arguments is given, they
           are tested in the order -text, -file, -fh, and the first
           available argument is used.
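           The precedence rule can be illustrated with a small argument
           picker. pick_source() is hypothetical and not part of the
           module's API, and treating an argument as "available" when it is
           defined is an assumption.

```perl
use strict;
use warnings;

# Hypothetical: return the first defined source in the documented order
# -text, -file, -fh, as a (key, value) pair; empty list if none is given.
sub pick_source {
    my (%args) = @_;
    for my $key (qw(-text -file -fh)) {
        return ($key, $args{$key}) if defined $args{$key};
    }
    return;
}

# -text wins over -fh even though -fh is listed first in the call.
my ($kind) = pick_source(-fh => \*STDIN, -text => ">seq1\nACGT\n");
# -file wins over -fh when no -text is given.
my ($kind2) = pick_source(-fh => \*STDIN, -file => 'seqs.fa');
```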

   file
	Title	   : file
	Usage	   : $guesser->file($filename);
		     $filename = $guesser->file;
	Function   : Gets or sets the current filename associated with
		     an	object.
	Returns	   : The new filename.
	Arguments  : The filename of the file whose format is to be
		     guessed.

           A call to this method will clear the current filehandle and
           the current lines of text associated with the object.

   fh
	Title	   : fh
	Usage	   : $guesser->fh($filehandle);
		     $filehandle = $guesser->fh;
	Function   : Gets or sets the current filehandle associated with
		     an	object.
	Returns	   : The new filehandle.
	Arguments  : An	already	opened filehandle from which a text
		     stream may	be read.

           A call to this method will clear the current filename and
           the current lines of text associated with the object.

   text
	Title	   : text
	Usage	   : $guesser->text($linesoftext);
                     $linesoftext = $guesser->text;
	Function   : Gets or sets the current text associated with an
		     object.
        Returns    : The new lines of text.
	Arguments  : A scalar containing one or	several	lines of text,
		     including embedded	newlines.

           A call to this method will clear the current filename and
           the current filehandle associated with the object.
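           The mutually exclusive behaviour of file(), fh() and text() can
           be sketched with a minimal class. Hypothetical::Source is an
           illustrative name, and generating the accessors in a loop is just
           one way to write it; it is not how Bio::Tools::GuessSeqFormat is
           necessarily implemented.

```perl
use strict;
use warnings;

package Hypothetical::Source;

sub new { my ($class) = @_; return bless {}, $class }

# Generate file(), fh() and text() accessors. Setting any one of them
# clears the other two, as documented for the real methods.
for my $field (qw(file fh text)) {
    no strict 'refs';
    *{ __PACKAGE__ . "::$field" } = sub {
        my ($self, @value) = @_;
        if (@value) {
            delete @{$self}{qw(file fh text)};
            $self->{$field} = $value[0];
        }
        return $self->{$field};
    };
}

package main;

my $src = Hypothetical::Source->new;
$src->file('seqs.fa');
$src->text(">seq1\nACGT\n");    # clears the filename set above
```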

   guess
	Title	   : guess
	Usage	   : $format = $guesser->guess;
		     @format = $guesser->guess;	# if given a line of text
        Function   : Guesses the format of the data associated with the
                     object.
        Returns    : A format string such as "swiss" or "pir". If a
                     format cannot be found, undef is returned.
	Arguments  : None.

           If the object is associated with a filehandle, the filehandle
           is restored to its original position before the method
           returns.

HELPER SUBROUTINES
       All helper subroutines will, given a line of text and the line number
       of that line, return 1 if the line may come from a file of the type
       they test for.

       A zero return value does not mean that the line is not part of a
       certain type of file, only that the test did not find any
       characteristics of that type of file in the line.

   _possibly_ace
       From bioperl test data, and from
       "http://www.isrec.isb-sib.ch/DEA/module8/B_Stevenson/Practicals/transcriptome_recon/transcriptome_recon.html".

   _possibly_blast
       From various BLAST results.

   _possibly_bowtie
       Contributed by kortsch.

   _possibly_clustalw
       From "http://www.ebi.ac.uk/help/formats.html".

   _possibly_codata
       From "http://www.ebi.ac.uk/help/formats.html".

   _possibly_embl
       From
       "http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3.3".

   _possibly_fasta
       From "http://www.ebi.ac.uk/help/formats.html".

   _possibly_fastq
       From bioperl test data.

   _possibly_fastxy
       From bioperl test data.

   _possibly_game
       From bioperl test data.

   _possibly_gcg
       From bioperl, Bio::SeqIO::gcg.

   _possibly_gcgblast
       From bioperl test data.

   _possibly_gcgfasta
       From bioperl test data.

   _possibly_gde
       From "http://www.ebi.ac.uk/help/formats.html".

   _possibly_genbank
       From "http://www.ebi.ac.uk/help/formats.html". Format of the
       [apparently optional] file header from
       "http://www.umdnj.edu/rcompweb/PA/Notes/GenbankFF.htm". (TODO: dead
       link)

   _possibly_genscan
       From bioperl test data.

   _possibly_gff
       From bioperl test data.

   _possibly_hmmer
       From bioperl test data.

   _possibly_nexus
       From "http://paup.csit.fsu.edu/nfiles.html".

   _possibly_mase
       From bioperl test data. More detail from
       "http://www.umdnj.edu/rcompweb/PA/Notes/GenbankFF.htm" (TODO: dead
       link)

   _possibly_mega
       From the Ensembl browser (AlignView data export).

   _possibly_msf
       From "http://www.ebi.ac.uk/help/formats.html".

   _possibly_phrap
       From "http://biodata.ccgb.umn.edu/docs/contigimage.html" (TODO: dead
       link), from "http://genetics.gene.cwru.edu/gene508/Lec6.htm" (TODO:
       dead link), and from bioperl test data ("*.ace.1" files).

   _possibly_pir
       From "http://www.ebi.ac.uk/help/formats.html". The characters ".,()"
       were spotted in bioperl test data.

   _possibly_pfam
       From bioperl test data.

   _possibly_phylip
       From "http://www.ebi.ac.uk/help/formats.html". An initial space is
       allowed on the first line (spotted in Ensembl AlignView exported data).
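       A minimal, hypothetical version of such a first-line test follows. It
       assumes the PHYLIP header holds just two integers (the number of
       sequences and the number of characters), optionally preceded by
       whitespace. Real PHYLIP headers may carry extra option letters, which
       this sketch rejects, and the module's actual pattern may differ.

```perl
use strict;
use warnings;

# Hypothetical PHYLIP header test: only line 1 can be conclusive, and it
# must contain two integers, with leading whitespace allowed.
sub possibly_phylip {
    my ($line, $lineno) = @_;
    return 0 unless $lineno == 1;
    return $line =~ /^\s*\d+\s+\d+\s*$/ ? 1 : 0;
}
```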

   _possibly_prodom
       From "http://prodom.prabi.fr/prodom/current/documentation/data.php".

   _possibly_raw
       From "http://www.ebi.ac.uk/help/formats.html".

   _possibly_rsf
       From "http://www.ebi.ac.uk/help/formats.html".

   _possibly_selex
       From "http://www.ebc.ee/WWW/hmmer2-html/node27.html".

       Assumes the presence of a Selex file header. Data exported by Bioperl
       in Pfam and Selex formats are identical, but a Pfam file holds only
       one alignment.

   _possibly_stockholm
       From bioperl test data.

   _possibly_swiss
       From "http://ca.expasy.org/sprot/userman.html#entrystruc".

   _possibly_tab
       Contributed by Heikki.

   _possibly_vcf
       From "http://www.1000genomes.org/wiki/analysis/vcf4.0".

       Assumes sane input: the format and date lines are lines 1 and 2,
       respectively. This ordering is not specified in the format document.
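       The line-number assumption can be sketched as a hypothetical test that
       accepts the fileformat line only at line 1 and the fileDate line only
       at line 2. The patterns are modelled on the kind of header lines shown
       in the VCF 4.0 specification; the function name and regular
       expressions are illustrative, not the module's.

```perl
use strict;
use warnings;

# Hypothetical VCF test relying on fixed line positions for the
# "##fileformat" (line 1) and "##fileDate" (line 2) meta-information lines.
sub possibly_vcf {
    my ($line, $lineno) = @_;
    return 1 if $lineno == 1 && $line =~ /^##fileformat=VCFv\d/;
    return 1 if $lineno == 2 && $line =~ /^##fileDate=\d{8}/;
    return 0;
}
```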

perl v5.24.1			  2017-07-08	 Bio::Tools::GuessSeqFormat(3)
