Bio::SeqIO::msout(3)  User Contributed Perl Documentation Bio::SeqIO::msout(3)

NAME
       Bio::SeqIO::msout - input stream for output by Hudson's ms

SYNOPSIS
       Do not use this module directly. Use it via the Bio::SeqIO class.

DESCRIPTION
       ms (Hudson, R. R. (2002) Generating samples under a Wright-Fisher
       neutral model. Bioinformatics 18:337-8) can be found at
       http://home.uchicago.edu/~rhudson1/source/mksamples.html.

       Currently, this object can be used to read output from ms into seq
       objects. However, because bioperl has no support for haplotypes
       created using an infinite sites model (where '1' identifies a derived
       allele and '0' identifies an ancestral allele), the sequences returned
       by msout are coded using A, T, C and G. To decode the bases, use the
       sequence conversion table (a hash) returned by
       get_base_conversion_table(). In the table, 4 and 5 are used when the
       ancestry is unclear. This should never happen when creating files
       with ms, but it will be used when creating msOUT files from a
       collection of seq objects (to be added later). Alternatively, use
       get_next_hap() to get a string of 1's and 0's instead of a seq
       object.
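
       The decoding step described above can be sketched as follows. This is
       a minimal illustration, not part of the module: decode_haplotype() is
       a hypothetical helper, and the stream is opened through Bio::SeqIO
       with the format name 'msout' only when a file name is given on the
       command line.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::SeqIO::msout): translate a coded
# sequence string back into ms-style digits in a single pass.
# $table is the hash reference returned by get_base_conversion_table().
sub decode_haplotype {
    my ($coded, $table) = @_;
    return join '',
        map { exists $table->{$_} ? $table->{$_} : $_ } split //, $coded;
}

# Optional usage against a real ms output file (requires BioPerl).
if (@ARGV) {
    require Bio::SeqIO;
    my $stream = Bio::SeqIO->new(-file => $ARGV[0], -format => 'msout');
    my $table  = $stream->get_base_conversion_table();
    while (my $seq = $stream->next_seq()) {
        print decode_haplotype($seq->seq, $table), "\n";
    }
}
```

       Characters that are not keys of the table are passed through
       unchanged, so the helper does not fail on unexpected input.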

   Mapping to Finite Sites
       This object can now also be used to map haplotypes created using an
       infinite sites model to sequences of arbitrary finite length. See
       set_n_sites() for more detail. Thanks to Filipe G. Vieira
       <fgvieira@berkeley.edu> for the idea and code.

FEEDBACK
   Mailing Lists
       User feedback is an integral part of the evolution of this and other
       Bioperl modules. Send your comments and suggestions preferably to the
       Bioperl mailing list. Your participation is much appreciated.

         bioperl-l@bioperl.org                  - General discussion
         http://bioperl.org/wiki/Mailing_lists  - About the mailing lists

   Reporting Bugs
       Report bugs to the Bioperl bug tracking system to help us keep track of
       the bugs and their resolution. Bug reports can be submitted via the
       web:

	 https://github.com/bioperl/bioperl-live/issues

AUTHOR - Warren Kretzschmar
       This module was written by Warren Kretzschmar

       email: wkretzsch@gmail.com

       This module grew out of a parser written by Aida Andres.

COPYRIGHT
   Public Domain Notice
       This software/database is ``United States Government Work'' under the
       terms of the United States Copyright Act. It was written as part of the
       authors' official duties for the United States Government and thus
       cannot be copyrighted. This software/database is freely available to
       the public for use without a copyright notice. Restrictions cannot be
       placed on its present or future use.

       Although all reasonable efforts have been taken to ensure the accuracy
       and reliability of the software and data, the National Human Genome
       Research Institute (NHGRI) and the U.S. Government does not and cannot
       warrant the performance or results that may be obtained by using this
       software or data. NHGRI and the U.S. Government disclaims all
       warranties as to performance, merchantability or fitness for any
       particular purpose.

METHODS
   Methods for Internal Use
       _initialize

       Title   : _initialize
       Usage   : $stream = Bio::SeqIO::msout->new($infile)
       Function: extracts basic information about the file.
       Returns : Bio::SeqIO object
       Args    : no_og, gunzip, gzip, n_sites
       Details :
           - include the 'no_og' flag if the last population of an msout
             file contains only one haplotype and you don't want the last
             haplotype to be treated as the outgroup (suggested when reading
             data created by ms).
           - including 'n_sites' (positive integer) causes all output
             haplotypes to be mapped to a sequence of length 'n_sites'. See
             set_n_sites() for more details.

       _read_start

       Title   : _read_start
       Usage   : $stream->_read_start()
       Function: reads from the filehandle $stream->{_filehandle} all
                 information up to the first haplotype (sequence). Closes
                 the filehandle if all lines have been read.
       Returns : void
       Args    : none

   Methods to Access Data
       get_segsites

       Title   : get_segsites
       Usage   : $segsites = $stream->get_segsites()
       Function: returns the number of segsites in the msOUT file (according
                 to the msOUT header line's -s option), or the current run's
                 segsites if -s was not specified in the command line (in
                 this case the number of segsites varies from run to run).
       Returns : scalar
       Args    : NONE

       get_current_run_segsites

       Title   : get_current_run_segsites
       Usage   : $segsites = $stream->get_current_run_segsites()
       Function: returns the number of segsites in the run of the last read
                 haplotype (sequence).
       Returns : scalar
       Args    : NONE

       get_n_sites

       Title   : get_n_sites
       Usage   : $n_sites = $stream->get_n_sites()
       Function: Gets the number of total sites (variable or not) to be
                 output.
       Returns : scalar if n_sites option is defined at call time of new()
       Args    : NONE
       Note    :
                 WARNING: Final sequence length might not be equal to n_sites
                          if n_sites is too close to the number of
                          segregating sites in the msout file.

       set_n_sites

       Title   : set_n_sites
       Usage   : $n_sites = $stream->set_n_sites($value)
       Function: Sets the number of total sites (variable or not) to be
                 output.
       Returns : 1 on success; throws an error if $value is not a positive
                 integer or undef
       Args    : positive integer
       Note    :
                 WARNING: Final sequence length might not be equal to n_sites
                          if it is too close to the number of segregating
                          sites.
                 - n_sites needs to be at least as large as the number of
                   segsites of the next haplotype returned
                 - n_sites may also be set to undef, in which case haplotypes
                   are returned under the infinite sites model assumptions.
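
       To make the idea of mapping to finite sites concrete, the sketch below
       places each segregating site on a sequence of fixed length according
       to its position in the unit interval. It is an assumption-laden
       illustration and NOT the algorithm used by this module (which, per
       the warning above, may return a different length):
       map_to_finite_sites() is a hypothetical helper, positions are assumed
       to be sorted in ascending order, and invariant sites are written as
       '0'.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's implementation: spread a 0/1
# haplotype over $n_sites columns using ms-style positions in [0,1).
sub map_to_finite_sites {
    my ($hap, $positions, $n_sites) = @_;
    die "n_sites must be >= number of segregating sites\n"
        if $n_sites < @$positions;

    # Scale each position to a column index, clamped to the last column.
    my @idx = map {
        my $i = int($_ * $n_sites);
        $i > $n_sites - 1 ? $n_sites - 1 : $i;
    } @$positions;

    # Forward pass: resolve collisions by making indices strictly increasing.
    for my $k (1 .. $#idx) {
        $idx[$k] = $idx[$k - 1] + 1 if $idx[$k] <= $idx[$k - 1];
    }

    # Backward pass: pull any overflow back inside the sequence.
    $idx[-1] = $n_sites - 1 if @idx && $idx[-1] > $n_sites - 1;
    for my $k (reverse 0 .. $#idx - 1) {
        $idx[$k] = $idx[$k + 1] - 1 if $idx[$k] >= $idx[$k + 1];
    }

    # Invariant columns are '0'; segregating columns carry the allele.
    my @seq = ('0') x $n_sites;
    $seq[ $idx[$_] ] = substr($hap, $_, 1) for 0 .. $#idx;
    return join '', @seq;
}

print map_to_finite_sites('101', [0.10, 0.12, 0.90], 10), "\n";
```

       The sketch also shows why n_sites must be at least as large as the
       number of segsites: with fewer columns than segregating sites, two
       sites would have to share a column.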

       get_runs

       Title   : get_runs
       Usage   : $runs = $stream->get_runs()
       Function: returns the number of runs in the msOUT file (according to
                 the msinfo line)
       Returns : scalar
       Args    : NONE

       get_Seeds

       Title   : get_Seeds
       Usage   : @seeds = $stream->get_Seeds()
       Function: returns an array of the seeds used in the creation of the
                 msOUT file.
       Returns : array
       Args    : NONE
       Details : In older versions, ms used three seeds. Newer versions of ms
                 seem to use only one (longer) seed. This function will
                 return all the seeds found.

       get_Positions

       Title   : get_Positions
       Usage   : @positions = $stream->get_Positions()
       Function: returns an array of the names of each segsite of the run of
                 the last read hap.
       Returns : array
       Args    : NONE
       Details : The Positions may or may not vary from run to run depending
                 on the options used with ms.

       get_tot_run_haps

       Title   : get_tot_run_haps
       Usage   : $number_of_haps_per_run = $stream->get_tot_run_haps()
       Function: returns the number of haplotypes (sequences) in each run of
                 the msOUT file (according to the msinfo line).
       Returns : scalar >= 0
       Args    : NONE
       Details : This number should not vary from run to run.

       get_ms_info_line

       Title   : get_ms_info_line
       Usage   : $ms_info_line = $stream->get_ms_info_line()
       Function: returns the header line of the msOUT file.
       Returns : scalar
       Args    : NONE

       tot_haps

       Title   : tot_haps
       Usage   : $number_of_haplotypes_in_file = $stream->tot_haps()
       Function: returns the number of haplotypes (sequences) in the msOUT
                 file. Information gathered from msOUT header line.
       Returns : scalar
       Args    : NONE

       get_Pops

       Title   : get_Pops
       Usage   : @pops = $stream->get_Pops()
       Function: returns an array of population sizes (order taken from the
                 -I flag in the msOUT header line). This array will include
                 the last hap even if it looks like an outgroup.
       Returns : array of scalars > 0
       Args    : NONE
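
       Several of the accessors above report values taken from the msOUT
       header line, which is the ms command line. The sketch below shows how
       such values could be read from that line. It is an illustration under
       stated assumptions, not the module's parser: parse_ms_info_line() and
       its hash keys are hypothetical, and a file without -I is assumed to
       hold a single population.

```perl
use strict;
use warnings;

# Hypothetical helper: pull sample counts out of an ms command line of the
# form "ms nsam nreps [options]".
sub parse_ms_info_line {
    my ($line) = @_;
    my @f = split ' ', $line;
    die "not an ms command line: $line\n" unless @f >= 3;

    my %info = (
        haps_per_run => $f[1],      # nsam
        runs         => $f[2],      # nreps
        segsites     => undef,      # only set when -s is present
        pops         => [ $f[1] ],  # assumed: one population unless -I given
    );

    for my $i (3 .. $#f) {
        if ($f[$i] eq '-s') {
            $info{segsites} = $f[$i + 1];
        }
        elsif ($f[$i] eq '-I') {
            # -I npop n1 n2 ... : npop population sizes follow the count
            my $npop = $f[$i + 1];
            $info{pops} = [ @f[$i + 2 .. $i + 1 + $npop] ];
        }
    }
    $info{tot_haps} = $info{haps_per_run} * $info{runs};
    return \%info;
}

my $info = parse_ms_info_line('ms 6 2 -s 7 -I 2 3 3 1.0');
printf "%d haplotypes in %d runs, populations: %s\n",
    $info->{tot_haps}, $info->{runs}, join(',', @{ $info->{pops} });
```

       With a real stream, the corresponding values come from get_runs(),
       get_tot_run_haps(), tot_haps(), get_segsites() and get_Pops().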

       get_next_run_num

       Title   : get_next_run_num
       Usage   : $next_run_number = $stream->get_next_run_num()
       Function: returns the number of the ms run that the next haplotype
                 (sequence) will be taken from (starting at 1). Returns
                 undef if the complete file has been read.
       Returns : scalar > 0 or undef
       Args    : NONE

       get_last_haps_run_num

       Title   : get_last_haps_run_num
       Usage   : $last_haps_run_number = $stream->get_last_haps_run_num()
       Function: returns the number of the ms run that the last haplotype
                 (sequence) was taken from (starting at 1). Returns undef if
                 no hap has been read yet.
       Returns : scalar > 0 or undef
       Args    : NONE

       get_last_read_hap_num

       Title   : get_last_read_hap_num
       Usage   : $last_read_hap_num = $stream->get_last_read_hap_num()
       Function: returns the number (starting with 1) of the last haplotype
                 read from the ms file
       Returns : scalar >= 0
       Args    : NONE
       Details : 0 means that no haplotype has been read yet. Is reset to 0
                 every run.

       outgroup

       Title   : outgroup
       Usage   : $outgroup = $stream->outgroup()
       Function: returns '1' if the msOUT stream has an outgroup. Returns
                 '0' otherwise.
       Returns : '1' or '0'
       Args    : NONE
       Details : This method will return '1' only if the last population in
                 the msOUT file contains only one haplotype. If the last
                 population is not an outgroup then create the msOUT object
                 using 'no_og' as input flag. Also returns '0' if the run
                 has only one population.
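
       The outgroup rule described in the Details above can be written out
       as a small predicate. This is a sketch of the documented rule only:
       has_outgroup() is a hypothetical helper, its first argument is a
       list of population sizes such as the one returned by get_Pops(), and
       its second argument stands in for the 'no_og' flag.

```perl
use strict;
use warnings;

# Hypothetical helper restating the documented rule: an outgroup is assumed
# only when there are several populations, the last one holds exactly one
# haplotype, and 'no_og' was not requested.
sub has_outgroup {
    my ($pops, $no_og) = @_;
    return 0 if $no_og;        # caller opted out of outgroup detection
    return 0 if @$pops < 2;    # a single population never has an outgroup
    return $pops->[-1] == 1 ? 1 : 0;
}

print has_outgroup([5, 5, 1], 0), "\n";    # last population has one hap
print has_outgroup([5, 5, 1], 1), "\n";    # same sizes, but no_og is set
```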

       get_next_haps_pop_num

       Title   : get_next_haps_pop_num
       Usage   : ($next_haps_pop_num, $num_haps_left_in_pop) =
                     $stream->get_next_haps_pop_num()
       Function: First return value is the population number (starting with
                 1) the next hap will come from. The second return value is
                 the number of haps left to read in the population from
                 which the next hap will come.
       Returns : (scalar > 0, scalar > 0)
       Args    : NONE

       get_next_seq

       Title   : get_next_seq
       Usage   : $seq = $stream->get_next_seq()
       Function: reads and returns the next sequence (haplotype) in the
                 stream
       Returns : Bio::Seq object or void if end of file
       Args    : NONE
       Note    : This function is included only to conform to convention.
                 The returned Bio::Seq object holds a haplotype in coded
                 form. Use the hash returned by get_base_conversion_table()
                 to convert 'A', 'T', 'C', 'G' back into 0, 1, 4 and 5. Use
                 get_next_hap() to retrieve the haplotype as a string of 0,
                 1, 4 and 5s instead.

       next_seq

       Title   : next_seq
       Usage   : $seq = $stream->next_seq()
       Function: Alias to get_next_seq()
       Returns : Bio::Seq object or void if end of file
       Args    : NONE
       Note    : This function is only included for convention. It calls
                 get_next_seq(). See get_next_seq() for details.

       get_next_hap

       Title   : get_next_hap
       Usage   : $hap = $stream->get_next_hap()
       Function: reads and returns the next sequence (haplotype) in the
                 stream. Returns undef if all sequences in stream have been
                 read.
       Returns : Haplotype string (e.g. '110110000101101045454000101')
       Args    : NONE
       Note    : Use get_next_seq() if you want the haplotype returned as a
                 Bio::Seq object.

       get_next_pop

       Title   : get_next_pop
       Usage   : @seqs = $stream->get_next_pop()
       Function: reads and returns all the remaining sequences (haplotypes)
                 in the population of the next sequence. Returns an empty
                 list if no more haps remain to be read in the stream
       Returns : array of Bio::Seq objects
       Args    : NONE

       next_run

       Title   : next_run
       Usage   : @seqs = $stream->next_run()
       Function: reads and returns all the remaining sequences (haplotypes)
                 in the ms run of the next sequence. Returns an empty list
                 if all haps have been read from the stream.
       Returns : array of Bio::Seq objects
       Args    : NONE

   Methods to Retrieve Constants
       get_base_conversion_table

       Title   : get_base_conversion_table
       Usage   : $table_hash_ref = $stream->get_base_conversion_table()
       Function: returns a reference to a hash. The keys of the hash are the
                 letters 'A','T','G','C'. The value associated with each key
                 is the value that the letter should be translated to in the
                 sequence of a seq object returned by a Bio::SeqIO::msout
                 stream.
       Returns : reference to a hash
       Args    : NONE
       Synopsis:

               # fetch the conversion table from the stream
               my $rh_base_conversion_table = $stream->get_base_conversion_table();

               # retrieve the Bio::Seq object's sequence
               my $haplotype = $seq->seq;

               # convert all letters to their corresponding numbers
               foreach my $base (keys %{$rh_base_conversion_table}) {
                   $haplotype =~ s/$base/$rh_base_conversion_table->{$base}/g;
               }

               # $haplotype is now an ms style haplotype (e.g. '100101101455')

perl v5.32.1			  2019-12-07		  Bio::SeqIO::msout(3)
