Bio::DB::GFF::Adaptor::dbi::mysql(3)  User Contributed Perl Documentation  Bio::DB::GFF::Adaptor::dbi::mysql(3)

NAME
       Bio::DB::GFF::Adaptor::dbi::mysql -- Database adaptor for a specific
       mysql schema

SYNOPSIS
       See Bio::DB::GFF

DESCRIPTION
       This adaptor implements a specific mysql database schema that is
       compatible with Bio::DB::GFF.  It inherits from
       Bio::DB::GFF::Adaptor::dbi, which itself	inherits from Bio::DB::GFF.

       The schema uses several tables:

       fdata
           This is the feature data table.  Its columns are:

               fid            feature ID (integer)
               fref           reference sequence name (string)
               fstart         start position relative to reference (integer)
               fstop          stop position relative to reference (integer)
               ftypeid        feature type ID (integer)
               fscore         feature score (float); may be null
               fstrand        strand; one of "+" or "-"; may be null
               fphase         phase; one of 0, 1 or 2; may be null
               gid            group ID (integer)
               ftarget_start  for similarity features, the target start
                              position (integer)
               ftarget_stop   for similarity features, the target stop
                              position (integer)

	   Note	that it	would be desirable to normalize	the reference sequence
	   name, since there are usually many features that share the same
	   reference feature.  However,	in the current schema, query
	   performance suffers dramatically when this additional join is added.

       fgroup
           This is the group table.  There is one row for each group.  Columns:

	       gid	 the group ID (integer)
	       gclass	 the class of the group	(string)
	       gname	 the name of the group (string)

	   The group table serves multiple purposes.  As you might expect, it
	   is used to cluster features that logically belong together, such as
	   the multiple	exons of the same transcript.  It is also used to
	   assign a name and class to a	singleton feature.  Finally, the group
	   table is used to identify the target	of a similarity	hit.  This is
	   consistent with the way in which the	group field is used in the GFF
	   version 2 format.

           The fgroup.gid field joins with the fdata.gid field.

           Examples:

	     mysql> select * from fgroup where gname='sjj_2L52.1';
	     | gid   | gclass	   | gname	|
	     | 69736 | PCR_product | sjj_2L52.1	|
	     1 row in set (0.70	sec)

	     mysql> select fref,fstart,fstop from fdata,fgroup
		       where gclass='PCR_product' and gname = 'sjj_2L52.1'
			     and fdata.gid=fgroup.gid;
	     | fref	     | fstart |	fstop |
	     | CHROMOSOME_II |	 1586 |	 2355 |
	     1 row in set (0.03	sec)

       ftype
           This table contains the feature types, one per row.  Columns are:

	       ftypeid	    the	feature	type ID	(integer)
	       fmethod	    the	feature	type method name (string)
	       fsource	    the	feature	type source name (string)

           The ftype.ftypeid field joins with the fdata.ftypeid field.

           Example:

	     mysql> select fref,fstart,fstop,fmethod,fsource from fdata,fgroup,ftype
		    where gclass='PCR_product'
			  and gname = 'sjj_2L52.1'
			  and fdata.gid=fgroup.gid
			  and fdata.ftypeid=ftype.ftypeid;
	     | fref	     | fstart |	fstop |	fmethod	    | fsource	|
	     | CHROMOSOME_II |	 1586 |	 2355 |	PCR_product | GenePairs	|
	     1 row in set (0.08	sec)

       fdna
           This table holds the raw DNA of the reference sequences.  It has
	   three columns:

	       fref	     reference sequence	name (string)
	       foffset	     offset of this sequence
	       fdna	     the DNA sequence (longblob)

	   To overcome problems	loading	large blobs, DNA is automatically
	   fragmented into multiple segments when loading, and the position of
	   each	segment	is stored in foffset.  The fragment size is controlled
	   by the -clump_size argument during initialization.
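
           The fragmentation described above can be pictured with a small
           stand-alone sketch.  It is not the adaptor's own loading code:
           the helper name fragment_dna is hypothetical, and the 0-based
           offsets are an assumption made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split a sequence into fixed-size fragments and pair
# each fragment with the offset at which it starts. The 0-based offsets are
# an assumption of this sketch, not a statement about the real schema.
sub fragment_dna {
    my ($dna, $clump_size) = @_;
    die "clump size must be positive\n" unless $clump_size > 0;
    my @rows;
    for (my $offset = 0; $offset < length($dna); $offset += $clump_size) {
        push @rows, [ $offset, substr($dna, $offset, $clump_size) ];
    }
    return @rows;
}

# Each row stands for one (foffset, fdna) pair stored under a single fref.
for my $row (fragment_dna('ACGTACGTAC', 4)) {
    printf "%d\t%s\n", @$row;
}
```

           A range query then only needs the fragments whose offsets overlap
           the requested start and stop positions.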

       fattribute_to_feature
           This table holds "attributes", which are tag/value pairs stuffed
	   into	the GFF	line.  The first tag/value pair	is treated as the
	   group, and anything else is treated as an attribute (weird, huh?).

	    CHR_I assembly_tag Finished	    2032 2036 .	+ . Note "Right: cTel33B"
	    CHR_I assembly_tag Polymorphism 668	 668  .	+ . Note "A->C in cTel33B"

	   The columns of this table are:

	       fid		   feature ID (integer)
	       fattribute_id	   ID of the attribute (integer)
	       fattribute_value	   text	of the attribute (text)

	   The fdata.fid column	joins with fattribute_to_feature.fid.
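
           The "first pair is the group, the rest are attributes" rule can
           be sketched in a few lines of Perl.  This is an illustrative
           parser only; the helper name split_group_column and the sample
           column text are hypothetical, and the real loader may tokenize
           the column differently.

```perl
use strict;
use warnings;

# Hypothetical helper: split a GFF2 group column into [tag, value] pairs.
# Values may be double-quoted (and may then contain spaces or semicolons).
sub split_group_column {
    my ($column) = @_;
    my @pairs;
    while ($column =~ /\G\s*;?\s*(\w+)\s+(?:"([^"]*)"|([^;\s]+))/g) {
        push @pairs, [ $1, defined($2) ? $2 : $3 ];
    }
    my $group = shift @pairs;     # the first pair names the group
    return ($group, \@pairs);     # everything else is an attribute
}

my ($group, $attributes) =
    split_group_column('Transcript "B0273.1" ; Note "kinase" ; Alias B0273');

print "group: $group->[0] $group->[1]\n";
print "attribute: $_->[0] = $_->[1]\n" for @$attributes;
```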

       fattribute
           This table holds the normalized names of the attributes.  Fields
           are:

	     fattribute_id	ID of the attribute (integer)
	     fattribute_name	Name of	the attribute (varchar)

   Data	Loading	Methods
       In addition to implementing the abstract	SQL-generating methods of
       Bio::DB::GFF::Adaptor::dbi, this	module also implements the data
       loading functionality of	Bio::DB::GFF.

	Title	: new
	Usage	: $db =	Bio::DB::GFF->new(@args)
	Function: create a new adaptor
	Returns	: a Bio::DB::GFF object
	Args	: see below
	Status	: Public

       The new constructor is identical	to the "dbi" adaptor's new() method,
       except that the prefix "dbi:mysql" is added to the database DSN
       identifier automatically	if it is not there already.

	 Argument	Description
	 --------	-----------

	 -dsn		the DBI	data source, e.g. 'dbi:mysql:ens0040' or "ens0040"

	 -user		username for authentication

	 -pass		the password for authentication
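
       The DSN handling described above amounts to a simple prefix check.
       The following is a minimal sketch of that idea, not the module's
       actual code; the helper name normalize_dsn and the exact pattern are
       assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: prepend "dbi:mysql:" unless the DSN already carries
# that prefix. The pattern used here is an assumption of this sketch.
sub normalize_dsn {
    my ($dsn) = @_;
    return $dsn =~ /^dbi:mysql:/i ? $dsn : "dbi:mysql:$dsn";
}

print normalize_dsn('ens0040'), "\n";
print normalize_dsn('dbi:mysql:ens0040'), "\n";
```

       Under this sketch both calls yield the same full DSN, which is why the
       short and long forms in the argument table are interchangeable.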

	Title	: get_dna
	Usage	: $string = $db->get_dna($name,$start,$stop,$class)
	Function: get DNA string
	Returns	: a string
        Args    : name, start, stop and class of desired segment
	Status	: Public

       This method performs the	low-level fetch	of a DNA substring given its
       name, class and the desired range.  This	should probably	be moved to
       the parent class.

	Title	: search_notes
	Usage	: @search_results = $db->search_notes("full text search	string",$limit)
	Function: Search the notes for a text string, using mysql full-text search
	Returns	: array	of results
	Args	: full text search string, and an optional row limit
	Status	: public

       This is a mysql-specific	method.	 Given a search	string,	it performs a
       full-text search	of the notes table and returns an array	of results.
       Each row of the returned array is an arrayref containing the following
       fields:

	 column	1     A	Bio::DB::GFF::Featname object, suitable	for passing to segment()
	 column	2     The text of the note
	 column	3     A	relevance score.
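
       The shape of the returned rows can be shown with stand-in data.  In
       this sketch the rows are invented, column 1 holds a plain string
       rather than a real Bio::DB::GFF::Featname object, and the helper name
       by_relevance is hypothetical.

```perl
use strict;
use warnings;

# Stand-in rows shaped like the documented return value. Real rows hold a
# Bio::DB::GFF::Featname object in column 1; plain strings are used here,
# and the note texts and scores are invented for illustration.
my @search_results = (
    [ 'PCR_product:sjj_2L52.1', 'hypothetical note text A', 0.4 ],
    [ 'Transcript:B0273.1',     'hypothetical note text B', 1.7 ],
);

# Hypothetical helper: order rows from most to least relevant (column 3).
sub by_relevance {
    my @rows = @_;
    return sort { $b->[2] <=> $a->[2] } @rows;
}

for my $row (by_relevance(@search_results)) {
    my ($featname, $note, $relevance) = @$row;
    printf "%-24s %.2f  %s\n", $featname, $relevance, $note;
}
```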

	Title	: schema
	Usage	: $schema = $db->schema
	Function: return the CREATE script for the schema
        Returns : a list of CREATE statements
	Args	: none
	Status	: protected

       This method returns a list containing the various CREATE	statements
       needed to initialize the	database tables.

	Title	: make_classes_query
	Usage	: ($query,@args) = $db->make_classes_query
	Function: return query fragment	for generating list of reference classes
	Returns	: a query and args
	Args	: none
	Status	: public

	Title	: make_meta_set_query
	Usage	: $sql = $db->make_meta_set_query
	Function: return SQL fragment for setting a meta parameter
	Returns	: SQL fragment
	Args	: none
	Status	: public

       By default this does nothing; meta parameters are not stored or
       retrieved.

	Title	: setup_load
	Usage	: $db->setup_load
	Function: called before	load_gff_line()
	Returns	: void
	Args	: none
	Status	: protected

       This method performs schema-specific initialization prior to loading a
       set of GFF records.  It prepares a set of DBI statement handles to be
       used in loading the data.

	Title	: load_gff_line
	Usage	: $db->load_gff_line($fields)
	Function: called to load one parsed line of GFF
	Returns	: true if successfully inserted
	Args	: hashref containing GFF fields
	Status	: protected

       This method is called once per line of the GFF and passed a series of
       parsed data items that are stored into the hashref $fields.  The keys
       are:

	ref	     reference sequence
	source	     annotation	source
	method	     annotation	method
	start	     annotation	start
	stop	     annotation	stop
	score	     annotation	score (may be undef)
	strand	     annotation	strand (may be undef)
	phase	     annotation	phase (may be undef)
	group_class  class of annotation's group (may be undef)
	group_name   ID	of annotation's	group (may be undef)
	target_start start of target of	a similarity hit
	target_stop  stop of target of a similarity hit
	attributes   array reference of	attributes, each of which is a [tag=>value] array ref
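
       To make the hashref layout concrete, here is a sketch that builds one
       from a tab-separated GFF line.  It is not the parser used by
       Bio::DB::GFF; the helper name gff_line_to_fields is hypothetical, it
       reads only the first tag/value pair of the group column, and it
       leaves the target and attribute keys empty.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one tab-separated GFF2 line into a hashref that
# uses the keys listed above. A "." in score, strand or phase becomes undef.
sub gff_line_to_fields {
    my ($line) = @_;
    chomp $line;
    my ($ref, $source, $method, $start, $stop, $score, $strand, $phase, $group)
        = split /\t/, $line;
    for ($score, $strand, $phase) {
        $_ = undef if defined($_) && $_ eq '.';
    }
    # Simplification: only the first tag/value pair (the group) is read.
    my ($class, $name);
    if (defined($group) && $group =~ /^\s*(\S+)\s+(?:"([^"]*)"|([^;\s]+))/) {
        ($class, $name) = ($1, defined($2) ? $2 : $3);
    }
    return {
        ref          => $ref,    source       => $source,
        method       => $method, start        => $start,
        stop         => $stop,   score        => $score,
        strand       => $strand, phase        => $phase,
        group_class  => $class,  group_name   => $name,
        target_start => undef,   target_stop  => undef,
        attributes   => [],      # further [tag => value] pairs would go here
    };
}

my $fields = gff_line_to_fields(
    "CHROMOSOME_II\tGenePairs\tPCR_product\t1586\t2355\t.\t.\t.\tPCR_product \"sjj_2L52.1\""
);
printf "%s %s:%d..%d\n", @{$fields}{qw(group_name ref start stop)};
```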

	Title	: get_table_id
	Usage	: $integer = $db->get_table_id($table,@ids)
	Function: get the ID of	a group	or type
	Returns	: an integer ID	or undef
        Args    : table name and two string identifiers (see below)
	Status	: private

       This internal method is called by load_gff_line to look up the integer
       ID of an	existing feature type or group.	 The arguments are the name of
       the table, and two string identifiers.  For feature types, the
       identifiers are the method and source.  For groups, the identifiers are
       group name and class.

       This method requires that a statement handle named lookup_$table has
       been created previously by setup_load().  It is here to overcome
       deficiencies in mysql's INSERT syntax.

	Title	: get_feature_id
	Usage	: $integer = $db->get_feature_id($ref,$start,$stop,$typeid,$groupid)
	Function: get the ID of	a feature
	Returns	: an integer ID	or undef
        Args    : reference name, start, stop, type ID and group ID
	Status	: private

       This internal method is called by load_gff_line to look up the integer
       ID of an existing feature.  It is only needed when replacing a feature
       with new	information.

BUGS
       none ;-)

SEE ALSO
       Bio::DB::GFF, bioperl

AUTHOR
       Lincoln Stein.

       Copyright (c) 2002 Cold Spring Harbor Laboratory.

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

perl v5.24.1                      2017-07-  Bio::DB::GFF::Adaptor::dbi::mysql(3)

