Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages

  
 
  

home | help
URI::Find(3)	      User Contributed Perl Documentation	  URI::Find(3)

NAME
       URI::Find - Find	URIs in	arbitrary text

SYNOPSIS
	 require URI::Find;

	 my $finder = URI::Find->new(\&callback);

	 $how_many_found = $finder->find(\$text);

DESCRIPTION
       This module does	one thing: Finds URIs and URLs in plain	text.  It
       finds them quickly and it finds them all	(or what URI.pm	considers a
       URI to be.)  It only finds URIs which include a scheme (http:// or the
       like), for something a bit less strict have a look at
       URI::Find::Schemeless.

       For a command-line interface, urifind is	provided.

   Public Methods
       new
	     my	$finder	= URI::Find->new(\&callback);

	   Creates a new URI::Find object.

	   &callback is	a function which is called on each URI found.  It is
	   passed two arguments, the first is a	URI object representing	the
	   URI found.  The second is the original text of the URI found.  The
	   return value	of the callback	will replace the original URI in the
	   text.

       find
	     my	$how_many_found	= $finder->find(\$text);

	   $text is a string to	search and possibly modify with	your callback.

	   Alternatively, "find" can be	called with a replacement function for
	   the rest of the text:

	     use CGI qw(escapeHTML);
	     # ...
	     my	$how_many_found	= $finder->find(\$text,	\&escapeHTML);

	   will	not only call the callback function for	every URL found	(and
	   perform the replacement instructions	therein), but also run the
	   rest	of the text through "escapeHTML()". This makes it easier to
	   turn	plain text which contains URLs into HTML (see example below).

   Protected Methods
       I got a bunch of	mail from people asking	if I'd add certain features to
       URI::Find.  Most	wanted the search to be	less restrictive, do more
       heuristics, etc...  Since many of the requests were contradictory, I'm
       letting people create their own custom subclasses to do what they want.

       The following are methods internal to URI::Find which a subclass	can
       override	to change the way URI::Find acts.  They	are only to be called
       inside a	URI::Find subclass.  Users of this module are NOT to use these
       methods.

       uri_re
	     my	$uri_re	= $self->uri_re;

	   Returns the regex for finding absolute, schemed URIs
	   (http://www.foo.com and such).  This, combined with
	   schemeless_uri_re() is what finds candidate URIs.

	   Usually this	method does not	have to	be overridden.

       schemeless_uri_re
	     my	$schemeless_re = $self->schemeless_uri_re;

	   Returns the regex for finding schemeless URIs (www.foo.com and
	   such) and other things which	might be URIs.	By default this	will
	   match nothing (though it used to try	to find	schemeless URIs	which
	   started with	"www" and "ftp").

	   Many	people will want to override this method.  See
	   URI::Find::Schemeless for a subclass	does a reasonable job of
	   finding URIs	which might be missing the scheme.

       uric_set
	     my	$uric_set = $self->uric_set;

	   Returns a set matching the 'uric' set defined in RFC	2396 suitable
	   for putting into a character	set ([]) in a regex.

	   You almost never have to override this.

       cruft_set
	     my	$cruft_set = $self->cruft_set;

	   Returns a set of characters which are considered garbage.  Used by
	   decruft().

       decruft
	     my	$uri = $self->decruft($uri);

	   Sometimes garbage characters	like periods and parenthesis get
	   accidentally	matched	along with the URI.  In	order for the URI to
	   be properly identified, it must sometimes be	"decrufted", the
	   garbage characters stripped.

	   This	method takes a candidate URI and strips	off any	cruft it
	   finds.

       recruft
	     my	$uri = $self->recruft($uri);

	   This	method puts back the cruft taken off with decruft().  This is
	   necessary because the cruft is destructively	removed	from the
	   string before invoking the user's callback, so it has to be put
	   back	afterwards.

       schemeless_to_schemed
	     my	$schemed_uri = $self->schemeless_to_schemed($schemeless_uri);

	   This	takes a	schemeless URI and returns an absolute,	schemed	URI.
	   The standard	implementation supplies	ftp:// for URIs	which start
	   with	ftp., and http:// otherwise.

       is_schemed
	     $obj->is_schemed($uri);

	   Returns whether or not the given URI	is schemed or schemeless.
	   True	for schemed, false for schemeless.

       badinvo
	     __PACKAGE__->badinvo($extra_levels, $msg)

	   This	is used	to complain about bogus	subroutine/method invocations.
	   The args are	optional.

   Old Functions
       The old find_uri() function is still around and it works, but its
       deprecated.

EXAMPLES
       Store a list of all URIs	(normalized) in	the document.

	 my @uris;
	 my $finder = URI::Find->new(sub {
	     my($uri) =	shift;
	     push @uris, $uri;
	 });
	 $finder->find(\$text);

       Print the original URI text found and the normalized representation.

	 my $finder = URI::Find->new(sub {
	     my($uri, $orig_uri) = @_;
	     print "The	text '$orig_uri' represents '$uri'\n";
	     return $orig_uri;
	 });
	 $finder->find(\$text);

       Check each URI in document to see if it exists.

	 use LWP::Simple;

	 my $finder = URI::Find->new(sub {
	     my($uri, $orig_uri) = @_;
	     if( head $uri ) {
		 print "$orig_uri is okay\n";
	     }
	     else {
		 print "$orig_uri cannot be found\n";
	     }
	     return $orig_uri;
	 });
	 $finder->find(\$text);

       Turn plain text into HTML, with each URI	found wrapped in an HTML
       anchor.

	 use CGI qw(escapeHTML);
	 use URI::Find;

	 my $finder = URI::Find->new(sub {
	     my($uri, $orig_uri) = @_;
	     return qq|<a href="$uri">$orig_uri</a>|;
	 });
	 $finder->find(\$text, \&escapeHTML);
	 print "<pre>$text</pre>";

NOTES
       Will not	find URLs with Internationalized Domain	Names or pretty	much
       any non-ascii stuff in them.  See
       <http://rt.cpan.org/Ticket/Display.html?id=44226>

AUTHOR
       Michael G Schwern <schwern@pobox.com> with insight from Uri Gutman,
       Greg Bacon, Jeff	Pinyan,	Roderick Schertler and others.

       Roderick	Schertler <roderick@argon.org> maintained versions 0.11	to
       0.16.

       Darren Chamberlain wrote	urifind.

LICENSE
       Copyright 2000, 2009-2010, 2014,	2016 by	Michael	G Schwern
       <schwern@pobox.com>.

       This program is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

       See http://www.perlfoundation.org/artistic_license_1_0

SEE ALSO
       urifind,	URI::Find::Schemeless, URI, RFC	3986 Appendix C

perl v5.32.0			  2020-08-08			  URI::Find(3)

NAME | SYNOPSIS | DESCRIPTION | EXAMPLES | NOTES | AUTHOR | LICENSE | SEE ALSO

Want to link to this manual page? Use this URL:
<https://www.freebsd.org/cgi/man.cgi?query=URI::Find&sektion=3&manpath=FreeBSD+12.2-RELEASE+and+Ports>

home | help