Bio::DB::SoapEUtilities(3)  User Contributed Perl Documentation  Bio::DB::SoapEUtilities(3)

       Bio::DB::SoapEUtilities - Interface to the NCBI Entrez web service

	use Bio::DB::SoapEUtilities;

	# factory construction

	my $fac	= Bio::DB::SoapEUtilities->new();

	# executing a utility call

	# get an iterable adaptor
	my $links = $fac->elink(
		      -dbfrom => 'protein',
		      -db => 'taxonomy',
		      -id => \@protein_ids )->run(-auto_adapt => 1);

	# get a	Bio::DB::SoapEUtilities::Result	object
	my $result = $fac->esearch(
		      -db => 'gene',
		      -term => 'sonic and human')->run;

	# get the raw XML message
	my $xml	= $fac->efetch(
		    -db	=> 'gene',
		    -id	=> \@gids )->run( -raw_xml => 1	);

	# change parameters
	my $new_result = $fac->efetch(
			  -db => 'gene',
			  -id => \@more_gids)->run;
	# reset	parameters
	$fac->efetch->reset_parameters(	-db => 'nucleotide',
					-id => $nucid );
	$result	= $fac->efetch->run;

	# parsing and iterating	the results

	$count = $result->count;
	@ids = $result->ids;

	while ( my $linkset = $links->next_link ) {
	   $submitted = $linkset->submitted_id;
	}

	($taxid) = $links->id_map($submitted_prot_id);
	$species_io = $fac->efetch( -db	=> 'taxonomy',
				    -id	=> $taxid )->run( -auto_adapt => 1);
	$species = $species_io->next_species;
	$linnaeus = $species->binomial;

       This module allows the user to query the NCBI Entrez database via its
       SOAP (Simple Object Access Protocol) web service.
       The basic tools ("einfo, esearch, elink, efetch, espell, epost") are
       available as methods off	a "SoapEUtilities" factory object. Parameters
       for each	tool can be queried, set and reset for each method through the
       Bio::ParameterBaseI standard calls ("available_parameters(),
       set_parameters(), get_parameters(), reset_parameters()"). Returned data
       can be retrieved, accessed and parsed in	several	ways, according	to
       user preference.	Adaptors and object iterators are available for
       "efetch", "egquery", "elink", and "esummary" results.

       The "SoapEU" system has been designed to	be as easy (few	includes,
       available parameter facilities, reasonable defaults, intuitive aliases,
       built-in	pipelines) or as complex (accessors for	underlying low-level
       objects,	all parameters accessible, custom hooks	for builder objects,
       facilities for providing	local copies of	WSDLs) as the user requires or
       desires.	(To the	extent that it does not	succeed	in either direction,
       it is up	to the user to report to the mailing list ("FEEDBACK")!)

       To begin, make a	factory:

	my $fac	= Bio::DB::SoapEUtilities->new();

       From the	factory, utilities are called, parameters are set, and results
       or adaptors are retrieved.

       If you have your own copy of the WSDL, use

	my $fac	= Bio::DB::SoapEUtilities->new( -wsdl_file => $my_wsdl );

       otherwise, the correct one will be obtained over	the network (by
       Bio::DB::ESoap and friends).

   Utilities and parameters
       To run any of the standard NCBI EUtilities ("einfo, esearch, esummary,
       efetch, elink, egquery, epost, espell"), call the desired utility from the
       factory.	 To use	a utility, you must set	its parameters and run it to
       get a result.  TMTOWTDI:

	# verbose
	my $fetch = $fac->efetch();
	$fetch->set_parameters(	-db => 'gene', -id => [828392, 790]);
	my $result = $fetch->run;

	# compact
	my $result = $fac->efetch(-db =>'gene',-id => [828392,790])->run;

	# change ids
	$fac->efetch->set_parameters( -id => 470338 );
	$result	= $fac->run;

	# another util
	$result	= $fac->esearch(-db => 'protein', -term	=> 'BRCA and human')->run;

	# the utilities	are kept separate
	%search_params = $fac->esearch->get_parameters;
	%fetch_params =	$fac->efetch->get_parameters;
	$search_params{db}; # is 'protein'
	$fetch_params{db}; # is	'gene'

       The factory is Bio::ParameterBaseI compliant: that means	you can	find
       out what	you can	set with

	@available_search = $fac->esearch->available_parameters;
	@available_egquery = $fac->egquery->available_parameters;
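
       One use for the list of available parameters is to catch misspelled
       argument names before running a utility. The following is a minimal
       sketch, not part of the module: "unknown_params" is a hypothetical
       helper, and the "@available" list is an illustrative stand-in for
       whatever "available_parameters()" actually returns, so the snippet
       runs without contacting NCBI. Leading dashes are stripped before
       comparing because the sketch does not assume which form the module
       reports.

```perl
use strict;
use warnings;

# Hypothetical helper: given a reference to the list of names a utility
# accepts and the named arguments you intend to pass, return the argument
# names that are not accepted (sorted, as the caller wrote them).
sub unknown_params {
    my ($available, %wanted) = @_;

    # Index the accepted names without their leading dashes.
    my %accepted;
    for my $name (@$available) {
        ( my $bare = $name ) =~ s/^-+//;
        $accepted{$bare} = 1;
    }

    # Keep every requested name whose dash-less form is not accepted.
    my @unknown;
    for my $name ( sort keys %wanted ) {
        ( my $bare = $name ) =~ s/^-+//;
        push @unknown, $name unless $accepted{$bare};
    }
    return @unknown;
}

# Illustrative stand-in for $fac->esearch->available_parameters;
# these names are assumptions, not the module's actual output.
my @available = qw(db term usehistory WebEnv QueryKey RetMax);

my @unknown = unknown_params(
    \@available,
    -db   => 'protein',
    -term => 'BRCA and human',
    -trem => 'typo',
);
print "unknown parameters: @unknown\n";
```

       With a real factory, the first argument would presumably be built
       from "[ $fac->esearch->available_parameters ]".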

       For more	information on parameters, see

       The "intermediate" object for "SoapEU" query results is the
       Bio::DB::SoapEUtilities::Result.	This is	a BioPerly parsing of the SOAP
       message sent by NCBI when a query is "run()". This can be very useful
       on its own, but most users will likely want to proceed directly to
       "Adaptors", which take a	"Result" and turn it into more
       intuitive/familiar BioPerl objects. Go there if the following details
       are too gory.

       Results can be highly- or lowly-parsed, depending on the	parameters
       passed to the factory "run()" method. To	get the	raw XML	message	with
       no parsing, do

	my $xml	= $fac->$util->run(-raw_xml => 1); # $xml is a scalar string

       To retrieve a Bio::DB::SoapEUtilities::Result object with limited
       parsing,	but with accessors to the SOAP::SOM message (provided by
       SOAP::Lite), do

	my $result = $fac->$util->run(-no_parse	=> 1);
	my $som	= $result->som;
	my $method_hash	= $som->method;	# etc...

       To retrieve a "Result" object with message elements parsed into
       accessors, including "count()" and "ids()", run without arguments:

	my $result = $fac->esearch->run();
	my $count = $result->count;
	my @Count = $result->Count; # counts for each member of
				    # the translation stack
	my @ids	= $result->IdList_Id; #	from automatic message parsing
	@ids = $result->ids; # a convenient alias

       See Bio::DB::SoapEUtilities::Result for more, even gorier details.

       Adaptors	convert	EUtility "Result"s into	convenient objects, via	a
       handle that usually provides an iterator, in the	spirit of Bio::SeqIO.
       These are probably more useful than the "Result"	to the typical user,
       and so you can retrieve them automatically by setting the "run()"
       parameter "-auto_adapt => 1".

       In general, retrieve an adaptor like so:

	$adp = $fac->$util->run( -auto_adapt =>	1 );
	# iterate...
	while ( my $obj = $adp->next_obj ) {
	   # do stuff with $obj
	}

       The adaptor itself occasionally possesses useful	methods	besides	the
       iterator. The method "next_obj" always works, but a natural alias is
       also always available:

	$seqio = $fac->esearch->run( -auto_adapt => 1 );
	while ( my $seq = $seqio->next_seq ) {
	   # do stuff with $seq
	}

       In the above example, "-auto_adapt => 1" also instructs the factory to
       perform an "efetch" based on the	ids returned by	the "esearch" (if
       any), so	that the adaptor returned iterates over	Bio::SeqI objects.
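
       As a minimal sketch of the iteration pattern, the helper below drains
       any object that offers "next_obj" into a list. Both "collect_all" and
       the "MockAdaptor" class are hypothetical names introduced only for
       illustration, so the snippet runs without contacting NCBI; a real
       adaptor is assumed to behave the same way and return a false value
       once it is exhausted.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an adaptor: hands out queued items one at a
# time through next_obj(), then returns undef when nothing is left.
package MockAdaptor {
    sub new {
        my ($class, @items) = @_;
        return bless { queue => [@items] }, $class;
    }
    sub next_obj {
        my ($self) = @_;
        return shift @{ $self->{queue} };
    }
}

# Hypothetical helper: collect everything an adaptor yields into a list.
sub collect_all {
    my ($adaptor) = @_;
    my @objects;
    while ( defined( my $obj = $adaptor->next_obj ) ) {
        push @objects, $obj;
    }
    return @objects;
}

my @seen = collect_all( MockAdaptor->new(qw(seq1 seq2 seq3)) );
print scalar(@seen), " objects: @seen\n";
```

       Collecting everything at once trades memory for convenience; for
       large result sets the plain "while" loop shown earlier is likely the
       better choice.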

       Here is a rundown of the	different adaptor flavors:

       o   "efetch", Fetch Adaptors, and BioPerl object	iterators

	   The "FetchAdaptor" creates bona fide	BioPerl	objects. Currently,
	   there are FetchAdaptor subclasses for sequence data (both Genbank
	   and FASTA rettypes) and taxonomy data. The choice of	FetchAdaptor
	   is based on information in the result message, and should be
	   transparent to the user.

	    $seqio = $fac->efetch( -db =>'nucleotide',
				   -id => \@ids,
				   -rettype => 'gb' )->run( -auto_adapt	=> 1 );
	    while (my $seq = $seqio->next_seq) {
	       my $taxio = $fac->efetch(
		   -db => 'taxonomy',
		   -id => $seq->species->ncbi_taxid )->run(-auto_adapt => 1);
	       my $tax = $taxio->next_species;
	       unless ( $tax->TaxId == $seq->species->ncbi_taxid ) {
		 print "more work for MAJ";
	       }
	    }

	   See the pod for the FetchAdaptor subclasses (e.g.,
	   Bio::DB::SoapEUtilities::FetchAdaptor::seq) for more	detail.

       o   "elink", the	Link adaptor, and the "linkset"	iterator

	   The "LinkAdaptor" manages LinkSets. In "SoapEU", an "elink" call
	   always preserves the	correspondence between submitted and retrieved
	   ids.	The mapping between these can be accessed from the adaptor
	   object directly as "id_map()":

	    my $links =	$fac->elink( -db => 'protein',
				     -dbfrom =>	'nucleotide',
				     -id => \@nucids )->run( -auto_adapt => 1 );

	    # maybe more than one associated id...
	    my @prot_0 = $links->id_map( $nucids[0] );

	   Or iterate over the linksets:

	    while ( my $ls = $links->next_linkset ) {
	       @ids = $ls->ids;
	       @submitted_ids = $ls->submitted_ids;
	       # etc.
	    }

       o   "esummary", the DocSum adaptor, and the "docsum" iterator

	   The "DocSumAdaptor" manages docsums,	the "esummary" return type.
	   The objects returned	by iterating with a "DocSumAdaptor" have
	   accessors that let you obtain field information directly. Docsums
	   contain lots of easy-to-forget fields; use "item_names()" to remind
	   yourself:

	    my $docs = $fac->esummary( -db => 'taxonomy',
				       -id => 527031 )->run(-auto_adapt=>1);
	    # iterate over docsums
	    while (my $d = $docs->next_docsum) {
	       @available_items = $d->item_names;
	       # any available item can be called as an accessor
	       # from the docsum; watch your case...
	       $sci_name = $d->ScientificName;
	       $taxid = $d->TaxId;
	    }

       o   "egquery", the GQuery adaptor, and the "query" iterator

	   The "GQueryAdaptor" manages global query items returned by calls to
	   "egquery", which identifies all NCBI	databases containing hits for
	   your	query term. The	databases actually containing hits can be
	   retrieved directly from the adaptor with "found_in_dbs":

	    my $queries = $fac->egquery(
		-term => 'BRCA and human' )->run( -auto_adapt => 1 );
	    my @dbs = $queries->found_in_dbs;

	   Retrieve the	global query info returned for any database with

	    my $prot_q = $queries->query_by_db('protein');
	    if ($prot_q->count) {
	       # do something
	    }

	   Or iterate as usual:

	    while ( my $q = $queries->next_query ) {
	       if ($q->status eq 'Ok') {
		 # do something
	       }
	    }

   Web environments and	query keys
       To make large or	complex	requests for data, or to share queries,	it may
       be helpful to use the NCBI WebEnv system	to manage your queries.	Each
       EUtility accepts the following parameters:

	-usehistory
	-WebEnv
	-QueryKey

       for this purpose. These store the details of your queries server-side.

       "SoapEU" attempts to make using these relatively straightforward. Use
       "Result" objects to obtain the correct parameters, and don't forget
       "-usehistory":

	my $result1 = $fac->esearch(
	    -term => 'BRCA and human',
	    -db	=> 'nucleotide',
	    -usehistory	=> 1 )->run( -no_parse=>1 );

	my $result = $fac->esearch(
	    -term => 'AND early	onset',
	    -QueryKey => $result1->query_key,
	    -WebEnv => $result1->webenv	)->run(	-no_parse => 1 );

	$result = $fac->esearch(
	   -db => 'protein',
	   -term => 'sonic',
	   -usehistory => 1 )->run( -no_parse => 1 );

	# later	(but not more than 8 hours later) that day...

	$result	= $fac->esearch(
	   -WebEnv => $result->webenv,
	   -QueryKey =>	$result->query_key,
	   -RetMax => 800 # get	'em all
	   )->run; # note we're	parsing	the result...
	@all_ids = $result->ids;

   Error checking
       Two kinds of errors can ensue on	an Entrez SOAP run. One	is a SOAP
       fault, and the other is an error sent in a non-faulted SOAP message from
       the server. The distinction is probably systematic, and I would welcome
       an explanation of it. To	check for result errors, try something like:

	unless ( $result = $fac->$util->run ) {
	   die $fac->errstr; # this will catch a SOAP fault
	}
	# a valid result object was returned, but it may carry an error
	if ($result->count == 0) {
	   warn "No hits returned";
	   if ($result->ERROR) {
	     warn "Entrez error : ".$result->ERROR;
	   }
	}

       Error handling will be improved in the package eventually.
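
       The two-stage check above can be folded into a single helper. This is
       a sketch under stated assumptions, not part of the module:
       "classify_run", "MockFactory" and "MockResult" are hypothetical names,
       and the mocks imitate only the "errstr", "count" and "ERROR" accessors
       mentioned in this section so that the example runs offline. Unlike
       the snippet above, the helper reports an Entrez error even when the
       hit count is non-zero.

```perl
use strict;
use warnings;

# Hypothetical stand-ins imitating only the accessors used in this section.
package MockFactory {
    sub new    { my ($class, %args) = @_; return bless {%args}, $class }
    sub errstr { return $_[0]{errstr} }
}
package MockResult {
    sub new   { my ($class, %args) = @_; return bless {%args}, $class }
    sub count { return $_[0]{count} }
    sub ERROR { return $_[0]{ERROR} }
}

# Hypothetical helper: turn the outcome of a run into a (status, message) pair.
#   'fault' : run() returned false (a SOAP fault); message taken from errstr()
#   'error' : a result came back but carries an Entrez error
#   'empty' : a result came back with no hits and no error
#   'ok'    : a result came back with at least one hit
sub classify_run {
    my ($fac, $result) = @_;
    return ( 'fault', $fac->errstr )       unless $result;
    return ( 'error', $result->ERROR )     if $result->ERROR;
    return ( 'empty', 'No hits returned' ) if !$result->count;
    return ( 'ok', '' );
}

my $fac   = MockFactory->new( errstr => 'SOAP fault: timeout' );
my @cases = (
    undef,                                                      # faulted run
    MockResult->new( count => 0, ERROR => 'Invalid db name' ),  # Entrez error
    MockResult->new( count => 0 ),                              # no hits
    MockResult->new( count => 12 ),                             # success
);
for my $result (@cases) {
    my ( $status, $message ) = classify_run( $fac, $result );
    print "$status: $message\n";
}
```

       With a real factory the call would presumably look like
       "classify_run( $fac, scalar $fac->$util->run )".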

       Bio::DB::EUtilities, Bio::DB::SoapEUtilities::Result, Bio::DB::ESoap.

   Mailing Lists
       User feedback is	an integral part of the	evolution of this and other
       Bioperl modules.	Send your comments and suggestions preferably to the
       Bioperl mailing list.  Your participation is much appreciated.

       Please direct usage questions or support issues to the mailing list
       rather than to the module maintainer directly. Many experienced and
       responsive experts will be able to look at the problem and quickly address
       it. Please include a thorough description of the	problem	with code and
       data examples if	at all possible.

   Reporting Bugs
       Report bugs to the Bioperl bug tracking system to help us keep track of
       the bugs	and their resolution. Bug reports can be submitted via the

AUTHOR - Mark A. Jensen
       Email maj -at- fortinbras -dot- us

       The rest	of the documentation details each of the object	methods.
       Internal	methods	are usually preceded with a _

	Title	: new
	Usage	: my $eutil = Bio::DB::SoapEUtilities->new();
	Function: Builds a new Bio::DB::SoapEUtilities object
	Returns	: an instance of Bio::DB::SoapEUtilities
	Args	:

	Title	: run
	Usage	: $fac->$eutility->run(@args)
	Function: Execute the EUtility
	Returns	: true on success, false on fault or error
		  (reason in errstr(), for more	detail check the SOAP message
		   in last_result() )
	Args	: named	params appropriate to utility
		  -auto_adapt => boolean ( return an iterator over results as
					   appropriate to util if true)
		  -raw_xml => boolean (	return raw xml result; no processing )
		  Bio::DB::SoapEUtilities::Result constructor parms

   Useful Accessors
	Title	: response_message
	Aliases	: last_response, last_result
	Usage	: $som = $fac->response_message
	Function: get the last response	message
	Returns	: a SOAP::SOM object
	Args	: none

	Title	: webenv
	Usage	:
	Function: contains WebEnv key referencing the session
		  (set after run() )
	Returns	: scalar
	Args	: none

	Title	: errstr
	Usage	: $fac->errstr
	Function: get the last error, if any
	Example	:
	Returns	: value	of errstr (a scalar)
	Args	: none

   Bio::ParameterBaseI compliance
	Title	: available_parameters
	Usage	:
	Function: get available	request	parameters for calling
	Returns	:
	Args	: -util	=> $desired_utility [optional, default is
		  caller utility]

	Title	: set_parameters
	Usage	:
	Returns	: none
	Args	: -util	=> $desired_utility [optional, default is
		   caller utility],
		  named	utility	arguments

	Title	: get_parameters
	Usage	:
	Returns	: array	of named parameters
	Args	: utility (scalar string) [optional]
		  (default is caller utility)

	Title	: reset_parameters
	Usage	:
	Returns	: none
	Args	: -util	=> $desired_utility [optional, default is
		   caller utility],
		  named	utility	arguments

	Title	: parameters_changed
	Usage	:
	Returns	: boolean
	Args	: utility (scalar string) [optional]
		  (default is caller utility)

	Title	: _soap_facs
	Usage	: $self->_soap_facs($util, $fac)
	Function: caches Bio::DB::ESoap	factories for the
		  eutils in use	by this	instance
	Example	:
	Returns	: Bio::DB::ESoap object
	Args	: $eutility, [optional on set] $esoap_factory_object

	Title	: _caller_util
	Usage	: $self->_caller_util($newval)
	Function: the utility requested	off the	main SoapEUtilities
	Example	:
	Returns	: value	of _caller_util	(a scalar string, a valid eutility)
	Args	: on set, new value (a scalar string [optional])

perl v5.32.1			  2021-03-01	    Bio::DB::SoapEUtilities(3)

