URI::Fetch(3)	      User Contributed Perl Documentation	 URI::Fetch(3)

NAME
       URI::Fetch - Smart URI fetching/caching

SYNOPSIS
           use URI::Fetch;

	   ## Simple fetch.
	   my $res = URI::Fetch->fetch('')
	       or die URI::Fetch->errstr;
	   do_something($res->content) if $res->is_success;

	   ## Fetch using specified ETag and Last-Modified headers.
           $res = URI::Fetch->fetch('',
                   ETag => '123-ABC',
                   LastModified => time - 3600,
           )
               or die URI::Fetch->errstr;

	   ## Fetch using an on-disk cache that	URI::Fetch manages for you.
	   my $cache = Cache::File->new( cache_root => '/tmp/cache' );
           $res = URI::Fetch->fetch('',
                   Cache => $cache,
           )
               or die URI::Fetch->errstr;

DESCRIPTION
       URI::Fetch is a smart client for fetching HTTP pages, notably
       syndication feeds (RSS, Atom, and others), in an	intelligent,
       bandwidth- and time-saving way. That means:

       o   GZIP	support

	   If you have Compress::Zlib installed, URI::Fetch will automatically
	   try to download a compressed	version	of the content,	saving
	   bandwidth (and time).

       o   Last-Modified and ETag support

	   If you use a	local cache (see the Cache parameter to	fetch),
	   URI::Fetch will keep	track of the Last-Modified and ETag headers
	   from	the server, allowing you to only download pages	that have been
	   modified since the last time	you checked.

       o   Proper understanding	of HTTP	error codes

	   Certain HTTP	error codes are	special, particularly when fetching
	   syndication feeds, and well-written clients should pay special
	   attention to	them.  URI::Fetch can only do so much for you in this
	   regard, but it gives	you the	tools to be a well-written client.

	   The response	from fetch gives you the raw HTTP response code, along
	   with	special	handling of 4 codes:

	   o   200 (OK)

               Signals that the content of a page/feed was retrieved
               successfully.

	   o   301 (Moved Permanently)

	       Signals that a page/feed	has moved permanently, and that	your
	       database	of feeds should	be updated to reflect the new URI.

	   o   304 (Not	Modified)

               Signals that a page/feed has not changed since it was last
               fetched.

	   o   410 (Gone)

	       Signals that a page/feed	is gone	and will never be coming back,
	       so you should stop trying to fetch it.
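
       The four codes above could drive a feed database roughly as follows.
       This is a minimal sketch, not part of URI::Fetch: the function name
       action_for_status and the action labels are hypothetical, and the
       fallback for other codes is an assumption.

```perl
use strict;
use warnings;

# Map the four specially handled HTTP codes to a (hypothetical) action
# that a feed database might take. Any other code falls through to a
# generic "retry_later" action, which is an assumption of this sketch.
sub action_for_status {
    my ($code) = @_;
    my %action = (
        200 => 'store',        # OK: content was retrieved
        301 => 'update_uri',   # Moved Permanently: record the new URI
        304 => 'keep',         # Not Modified: the stored copy is current
        410 => 'unsubscribe',  # Gone: stop fetching this URI
    );
    return $action{$code} // 'retry_later';
}
```

       Given a response object $res, this might be called as
       action_for_status($res->http_status), assuming the response exposes
       the raw HTTP code through an http_status accessor.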

   Change from 0.09
       If you make a request using a cache and get back a 304 (Not Modified)
       response code, and the content was returned from the cache, then
       "is_success()" will return true and "$response->content" will contain
       the cached content.
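
       Because a cache-served 304 now counts as a success, a caller that
       cares about the difference has to look at the HTTP code as well.
       The sketch below is one hedged way to do that; content_source is a
       hypothetical helper, and it assumes the response object provides an
       http_status accessor returning the numeric code.

```perl
use strict;
use warnings;

# Classify where the content of a successful response came from.
# $res is expected to behave like a URI::Fetch::Response object.
sub content_source {
    my ($res) = @_;
    return 'none' unless $res && $res->is_success;

    # http_status is assumed to hold the numeric HTTP code, if any.
    my $code = $res->http_status;
    return 'unknown' unless defined $code;
    return $code == 304 ? 'cache' : 'network';
}
```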

       I think this is the right behaviour, given the philosophy of
       "URI::Fetch", but please	let me (NEILB) know if you disagree.

   URI::Fetch->fetch($uri, %param)
       Fetches a page identified by the	URI $uri.

       On success, returns a URI::Fetch::Response object; on failure, returns
       undef.

       %param can contain:

       o   LastModified

       o   ETag

	   LastModified	and ETag can be	supplied to force the server to	only
	   return the full page	if it's	changed	since the last request.	If
	   you're writing your own feed	client,	this is	recommended practice,
	   because it limits both your bandwidth use and the server's.

	   If you'd rather not have to store the LastModified time and ETag
	   yourself, see the Cache parameter below (and	the SYNOPSIS above).

       o   Cache

	   If you'd like URI::Fetch to cache responses between requests,
	   provide the Cache parameter with an object supporting the Cache API
	   (e.g.  Cache::File, Cache::Memory). Specifically, an	object that
           supports "$cache->get($key)" and "$cache->set($key, $value,
           $expiration)".

	   If supplied,	URI::Fetch will	store the page content,	ETag, and
	   last-modified time of the response in the cache, and	will pull the
	   content from	the cache on subsequent	requests if the	page returns a
	   Not-Modified	response.

       o   UserAgent

	   Optional.  You may provide your own LWP::UserAgent instance.	 Look
	   into	LWPx::ParanoidUserAgent	if you're fetching URLs	given to you
	   by possibly malicious parties.

       o   NoNetwork

	   Optional.  Controls the interaction between the cache and HTTP
	   requests with If-Modified-Since/If-None-Match headers.  Possible
	   behaviors are:

	   false (default)
	       If a page is in the cache, the origin HTTP server is always
	       checked for a fresher copy with an If-Modified-Since and/or If-
	       None-Match header.

           1   If set to 1, the origin HTTP server is never contacted, regardless
               of whether the page is in cache.  If the page is missing from
	       cache, the fetch	method will return undef.  If the page is in
	       cache, that page	will be	returned, no matter how	old it is.
	       Note that setting this option means the URI::Fetch::Response
	       object will never have the http_response	member set.

	   "N",	where N	> 1
	       The origin HTTP server is not contacted if the page is in cache
	       and the cached page was inserted	in the last N seconds.	If the
	       cached copy is older than N seconds, a normal HTTP request
	       (full or	cache check) is	done.

       o   ContentAlterHook

	   Optional.  A	subref that gets called	with a scalar reference	to
	   your	content	so you can modify the content before it's returned and
	   before it's put in cache.

	   For instance, you may want to only cache the	<head> section of an
	   HTML	document, or you may want to take a feed URL and cache only a
	   pre-parsed version of it.  If you modify the	scalarref given	to
	   your	hook and change	it into	a hashref, scalarref, or some blessed
	   object, that	same value will	be returned to you later on not-
	   modified responses.

       o   CacheEntryGrep

	   Optional.  A	subref that gets called	with the URI::Fetch::Response
	   object about	to be cached (with the contents	already	possibly
	   transformed by your "ContentAlterHook").  If	your subref returns
	   true, the page goes into the	cache.	If false, it doesn't.

       o   Freeze

       o   Thaw

	   Optional. Subrefs that get called to	serialize and deserialize,
	   respectively, the data that will be cached. The cached data should
	   be assumed to be an arbitrary Perl data structure, containing
	   (potentially) references to arrays, hashes, etc.

	   Freeze should serialize the structure into a	scalar;	Thaw should
	   deserialize the scalar into a data structure.

	   By default, Storable	will be	used for freezing and thawing the
	   cached data structure.

       o   ForceResponse

	   Optional. A boolean that indicates a	URI::Fetch::Response should be
	   returned regardless of the HTTP status. By default "undef" is
	   returned when a response is not a "success" (200 codes) or one of
	   the recognized HTTP status codes listed above. The HTTP status
           message can then be retrieved using the "errstr" method on the
           URI::Fetch class.
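
       Several of the parameters above can be combined in one call. The
       following is a sketch under stated assumptions rather than a
       reference implementation: My::MemoryCache, keep_head,
       cache_if_nonempty, freeze_entry, thaw_entry and fetch_head are all
       hypothetical names, the cache ignores any expiration argument, and
       Storable::nfreeze is used for Freeze purely as an illustration.

```perl
use strict;
use warnings;
use Storable ();

# Hypothetical in-memory cache: any object with get($key) and
# set($key, $value, ...) methods should satisfy the Cache parameter.
{
    package My::MemoryCache;
    sub new { return bless { store => {} }, shift }
    sub get { my ($self, $key) = @_; return $self->{store}{$key} }
    # Extra arguments (such as an expiration) are accepted but ignored.
    sub set { my ($self, $key, $value) = @_; $self->{store}{$key} = $value; return 1 }
}

# ContentAlterHook: receives a scalar reference to the content and
# rewrites it in place, keeping only the <head> section if one exists.
sub keep_head {
    my ($content_ref) = @_;
    if ($$content_ref =~ m{(<head\b.*?</head>)}is) {
        $$content_ref = $1;
    }
    return;
}

# CacheEntryGrep: receives the response about to be cached and returns
# true only when there is some content worth caching.
sub cache_if_nonempty {
    my ($res) = @_;
    my $content = $res->content;
    return (defined $content && length $content) ? 1 : 0;
}

# Freeze/Thaw: serialize the cached structure to a scalar and back.
sub freeze_entry { return Storable::nfreeze($_[0]) }
sub thaw_entry   { return Storable::thaw($_[0]) }

# Putting it together. URI::Fetch is loaded lazily so that the helpers
# above can be exercised without the module or network access.
sub fetch_head {
    my ($uri, $cache) = @_;
    require URI::Fetch;
    return URI::Fetch->fetch($uri,
        Cache            => $cache,
        NoNetwork        => 300,   # trust cache entries newer than 300 seconds
        ContentAlterHook => \&keep_head,
        CacheEntryGrep   => \&cache_if_nonempty,
        Freeze           => \&freeze_entry,
        Thaw             => \&thaw_entry,
    );
}
```

       A caller would keep one My::MemoryCache->new object around and pass
       it to fetch_head on every request, checking the return value for
       undef as with any other call to fetch.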


LICENSE
       URI::Fetch is free software; you may redistribute it and/or modify it
       under the same terms as Perl itself.

AUTHOR & COPYRIGHT
       Except where otherwise noted, URI::Fetch is Copyright 2004 Benjamin
       Trott, All rights reserved.

       Currently maintained by Neil Bowers.

CONTRIBUTORS
       o   Tim Appnel

       o   Mario Domgoergen

       o   Karen Etheridge

       o   Brad	Fitzpatrick

       o   Jason Hall

       o   Naoya Ito

       o   Tatsuhiko Miyagawa

perl v5.32.0			  2016-07-02			 URI::Fetch(3)

