Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages

  
 
  

home | help
HTML::Query(3)	      User Contributed Perl Documentation	HTML::Query(3)

NAME
       HTML::Query - jQuery-like selection queries for HTML::Element

SYNOPSIS
       Creating	an "HTML::Query" object	using the Query() constructor
       subroutine:

	   use HTML::Query 'Query';

	   # using named parameters
	   $q =	Query( text  =>	$text  );	   # HTML text
	   $q =	Query( file  =>	$file  );	   # HTML file
	   $q =	Query( tree  =>	$tree  );	   # HTML::Element object
	   $q =	Query( query =>	$query );	   # HTML::Query object
	   $q =	Query(
	       text  =>	$text1,			   # or	any combination
	       text  =>	$text2,			   # of	the above
	       file  =>	$file1,
	       file  =>	$file2,
	       tree  =>	$tree,
	       query =>	$query,
	   );

	   # passing elements as positional arguments
	   $q =	Query( $tree );			   # HTML::Element object(s)
	   $q =	Query( $tree1, $tree2, $tree3, ... );

	   # or	from one or more existing queries
	   $q =	Query( $query1 );		   # HTML::Query object(s)
	   $q =	Query( $query1,	$query2, $query3, ... );

	   # or	a mixture
	   $q =	Query( $tree1, $query1,	$tree2,	$query2	);

	   # the final argument	(in all	cases) can be a	selector
	   my $spec = 'ul.menu li a';		   # <ul class="menu">..<li>..<a>

	   $q =	Query( $tree, $spec );
	   $q =	Query( $query, $spec );
	   $q =	Query( $tree1, $tree2, $query1,	$query2, $spec );
	   $q =	Query( text  =>	$text,	$spec );
	   $q =	Query( file  =>	$file,	$spec );
	   $q =	Query( tree  =>	$tree,	$spec );
	   $q =	Query( query =>	$query,	$spec );
	   $q =	Query(
	       text => $text,
	       file => $file,
	       # ...etc...
	       $spec
	   );

       Or using	the OO new() constructor method	(which the Query() subroutine
       maps onto):

	   use HTML::Query;

	   $q =	HTML::Query->new(
	       # accepts the same arguments as Query()
	   )

       Or by monkey-patching a query() method into HTML::Element.

	   use HTML::Query 'query';		   # note lower	case 'q'
	   use HTML::TreeBuilder;

	   # build a tree
	   my $tree = HTML::TreeBuilder->new;
	   $tree->parse_file($filename);

	   # call the query() method on	any element
	   my $query = $tree->query($spec);

       Once you	have a query, you can start selecting elements:

	   @r =	$q->query('a')->get_elements();		   # all <a>...</a> elements
	   @r =	$q->query('a#menu')->get_elements();	   # all <a> with "menu" id
	   @r =	$q->query('#menu')->get_elements();	   # all elements with "menu" id
	   @r =	$q->query('a.menu')->get_elements();	   # all <a> with "menu" class
	   @r =	$q->query('.menu')->get_elements();	   # all elements with "menu" class
	   @r =	$q->query('a[href]')->get_elements();	   # all <a> with 'href' attr
	   @r =	$q->query('a[href=foo]')->get_elements();  # all <a> with 'href="foo"' attr

	   # you can specify elements within elements...
	   @r =	$q->query('ul.menu li a')->get_elements(); # <ul class="menu">...<li>...<a>

	   # and use commas to delimit multiple	path specs for different elements
	   @r =	$q->query('table tr td a, form input[type=submit]')->get_elements();

	   # query() in	scalar context returns a new query
	   $r =	$q->query('table')->get_elements();;	   # find all tables
	   $s =	$r->query('tr')->get_elements();	   # find all rows in all those	tables
	   $t =	$s->query('td')->get_elements();	   # and all cells in those rows...

       Inspecting query	elements:

	   # get number	of elements in query
	   my $size  = $q->size

	   # get first/last element in query
	   my $first = $q->first;
	   my $last  = $q->last;

	   # convert query to list or list ref of HTML::Element	objects
	   my $list = $q->list;		   # list ref in scalar	context
	   my @list = $q->list;		   # list in list context

       All other methods are mapped onto the HTML::Element objects in the
       query:

	   print $query->as_trimmed_text;  # print trimmed text	for each element
	   print $query->as_HTML;	   # print each	element	as HTML
	   $query->delete;		   # call delete() on each element

DESCRIPTION
       The "HTML::Query" module	is an add-on for the HTML::Tree	module set. It
       provides	a simple way to	select one or more elements from a tree	using
       a query syntax inspired by jQuery. This selector	syntax will be
       reassuringly familiar to	anyone who has ever written a CSS selector.

       "HTML::Query" is	not an attempt to provide a complete (or even near-
       complete) implementation	of jQuery in Perl (see Ingy's pQuery module
       for a more ambitious attempt at that). Rather, it borrows some of the
       tried and tested	selector syntax	from jQuery (and CSS) that can easily
       be mapped onto the "look_down()"	method provided	by the HTML::Element
       module.

   Creating a Query
       The easiest way to create a query is using the exportable Query()
       subroutine.

	   use HTML::Query 'Query';	   # note capital 'Q'

       It accepts a "text" or "file" named parameter and will create an
       "HTML::Query" object from the HTML source text or file, respectively.

	   my $query = Query( text => $text );
	   my $query = Query( file => $file );

       This delegates to HTML::TreeBuilder to parse the	HTML into a tree of
       HTML::Element objects.  The root	element	returned is then wrapped in an
       "HTML::Query" object.

       If you already have one or more HTML::Element objects that you want to
       query then you can pass them to the Query() subroutine as arguments.
       For example, you	can explicitly use HTML::TreeBuilder to	parse an HTML
       document	into a tree:

	   use HTML::TreeBuilder;
	   my $tree = HTML::TreeBuilder->new;
	   $tree->parse_file($filename);

       And then	create an "HTML::Query"	object for the tree either using an
       explicit	"tree" named parameter:

	   my $query = Query( tree => $tree );

       Or implicitly using positional arguments.

	   my $query = Query( $tree );

       If you want to query across multiple elements, then pass	each one as a
       positional argument.

	   my $query = Query( $tree1, $tree2, $tree3 );

       You can also create a new query from one	or more	existing queries,

	   my $query = Query( query => $query );   # named parameter
	   my $query = Query( $query1, $query2 );  # positional	arguments.

       You can mix and match these different parameters	and positional
       arguments to create a query across several different sources.

	   $q =	Query(
	       text  =>	$text1,
	       text  =>	$text2,
	       file  =>	$file1,
	       file  =>	$file2,
	       tree  =>	$tree,
	       query =>	$query,
	   );

       The Query() subroutine is a simple wrapper around the new() constructor
       method. You can instantiate your	objects	manually if you	prefer.	 The
       new() method accepts the	same arguments as for the Query() subroutine
       (in fact, the Query() subroutine	simply forwards	all arguments to the
       new() method).

	   use HTML::Query;

	   my $query = HTML::Query->new(
	       # same argument format as for Query()
	   );

       A final way to use "HTML::Query"	is to have it add a query() method to
       HTML::Element.  The "query" import hook (all lower case)	can be
       specified to make this so.

	   use HTML::Query 'query';		   # note lower	case 'q'
	   use HTML::TreeBuilder;

	   my $tree = HTML::TreeBuilder->new;
	   $tree->parse_file($filename);

	   # now all HTML::Elements have a query() method
	   my @items = $tree->query('ul	li')->get_elements();  # find all list items

       This approach, often referred to	as monkey-patching, should be used
       carefully and sparingly.	It involves a violation	of HTML::Element's
       namespace that could have unpredictable results with a future version
       of the module (e.g. one which defines its own "query()" method that
       does something different). Treat	it as something	that is	great to get a
       quick job done right now, but probably not something to be used in
       production code without careful consideration of	the implications.

   Selecting Elements
       Having created an "HTML::Query" object by one of	the methods outlined
       above, you can now fetch	descendant elements in the tree	using a	simple
       query syntax.  For example, to fetch all	the "<a>" elements in the
       tree, you can write:

	   @links = $query->query('a')->get_elements();

       Or, if you want the elements that have a	specific "class" attribute
       defined with a value of,	say "menu", you	can write:

	   @links = $query->query('a.menu')->get_elements();

       More generally, you can look for	the existence of any attribute and
       optionally provide a specific value for it.

	   @links = $query->query('a[href]')->get_elements();		 # any href attribute
	   @links = $query->query('a[href=index.html]')->get_elements(); # specific value

       You can also find an element (or	elements) by specifying	an id.

	   @links = $query->query('#menu')->get_elements();	    # any element with id="menu"
	   @links = $query->query('ul#menu')->get_elements();	    # ul element with id="menu"

       You can provide multiple	selection criteria to find elements within
       elements	within elements, and so	on.  For example, to find all links in
       a menu, you can write:

	   # matches: <ul class="menu">	<li> <a>
	   @links = $query->query('ul.menu li a')->get_elements();

       You can separate	different criteria using commas.  For example, to
       fetch all table rows and	"span" elements	with a "foo" class:

	   @elems = $query->('table tr,	span.foo')->get_elements();

   Query Results
       When called in list context, as shown in	the examples above, the
       query() method returns a	list of	HTML::Element objects matching the
       search criteria.	In scalar context, the query() method returns a	new
       "HTML::Query" object containing the HTML::Element objects found.	You
       can then	call the query() method	against	that object to further refine
       the query. The query() method applies the selection to all elements
       stored in the query.

	   my $tables =	$query->query('table');		    # query for	tables
	   my $rows   =	$tables->query('tr');		    # requery for all rows in those tables
	   my $cells  =	$rows->query('td')->get_elements(); # return back all the cells	in those rows

   Inspection Methods
       The size() method returns the number of elements	in the query. The
       first() and last() methods return the first and last items in the
       query, respectively.

	   if ($query->size) {
	       print "from ", $query->first->as_trimmed_text, "	to ", $query->last->as_trimmed_text;
	   }

       If you want to extract the HTML::Element	objects	from the query you can
       call the	list() method. This returns a list of HTML::Element objects in
       list context, or	a reference to a list in scalar	context.

	   @elems = $query->list;
	   $elems = $query->list;

   Element Methods
       Any other methods are automatically applied to each element in the
       list. For example, to call the "as_trimmed_text()" method on all	the
       HTML::Element objects in	the query, you can write:

	   print $query->as_trimmed_text;

       In list context,	this method returns a list of the return values	from
       calling the method on each element.  In scalar context it returns a
       reference to a list of return values.

	   @text_blocks	= $query->as_trimmed_text;
	   $text_blocks	= $query->as_trimmed_text;

       See HTML::Element for further information on the	methods	it provides.

QUERY SYNTAX
   Basic Selectors
       element

       Matches all elements of a particular type.

	   @elems = $query->query('table')->get_elements();	# <table>

       #id

       Matches all elements with a specific id attribute.

	   @elems = $query->query('#menu')->get_elements()     # <ANY id="menu">

       This can	be combined with an element type:

	   @elems = $query->query('ul#menu')->get_elements();  # <ul id="menu">

       .class

       Matches all elements with a specific class attribute.

	   @elems = $query->query('.info')->get_elements();	# <ANY class="info">

       This can	be combined with an element type and/or	element	id:

	   @elems = $query->query('p.info')->get_elements();	 # <p class="info">
	   @elems = $query->query('p#foo.info')->get_elements(); # <p id="foo" class="info">
	   @elems = $query->query('#foo.info')->get_elements();	 # <ANY	id="foo" class="info">

       The selectors listed above can be combined in a whitespace delimited
       sequence	to select down through a hierarchy of elements.	 Consider the
       following table:

	   <table class="search">
	     <tr class="result">
	       <td class="value">WE WANT THIS ELEMENT</td>
	     </tr>
	     <tr class="result">
	       <td class="value">AND THIS ONE</td>
	     </tr>
	     ...etc..
	   </table>

       To locate the cells that	we're interested in, we	can write:

	   @elems = $query->query('table.search	tr.result td.value')->get_elements();

   Attribute Selectors
       W3C CSS 2 specification defines new constructs through which to select
       based on	specific attributes within elements. See the following link
       for the spec:
       <http://www.w3.org/TR/css3-selectors/#attribute-selectors>

       [attr]

       Matches elements	that have the specified	attribute, including any where
       the attribute has no value.

	   @elems = $query->query('[href]')->get_elements();	    # <ANY href="...">

       This can	be combined with any of	the above selectors.  For example:

	   @elems = $query->query('a[href]')->get_elements();	    # <a href="...">
	   @elems = $query->query('a.menu[href]')->get_elements();  # <a class="menu" href="...">

       You can specify multiple	attribute selectors.  Only those elements that
       match all of them will be selected.

	   @elems = $query->query('a[href][rel]')->get_elements();  # <a href="..." rel="...">

       [attr=value]

       Matches elements	that have an attribute set to a	specific value.	 The
       value can be quoted in either single or double quotes, or left
       unquoted.

	   @elems = $query->query('[href=index.html]')->get_elements();
	   @elems = $query->query('[href="index.html"]')->get_elements();
	   @elems = $query->query("[href='index.html']")->get_elements();

       You can specify multiple	attribute selectors.  Only those elements that
       match all of them will be selected.

	   @elems = $query->query('a[href=index.html][rel=home]')->get_elements();

       [attr|=value]

       Matches any element X whose foo attribute has a hyphen-separated	list
       of values beginning (from the left) with	bar. The value can be quoted
       in either single	or double quotes, or left unquoted.

	   @elems = $query->query('[lang|=en]')->get_elements();
	   @elems = $query->query('p[class|="example"]')->get_elements();
	   @elems = $query->query("img[alt|='fig']")->get_elements();

       You can specify multiple	attribute selectors.  Only those elements that
       match all of them will be selected.

	   @elems = $query->query('p[class|="external"][lang|="en"]')->get_elements();

       [attr~=value]

       Matches any element X whose foo attribute value is a list of space-
       separated values, one of	which is exactly equal to bar. The value can
       be quoted in either single or double quotes, or left unquoted.

	   @elems = $query->query('[lang~=en]')->get_elements();
	   @elems = $query->query('p[class~="example"]')->get_elements();
	   @elems = $query->query("img[alt~='fig']")->get_elements();

       You can specify multiple	attribute selectors.  Only those elements that
       match all of them will be selected.

	   @elems = $query->query('p[class~="external"][lang~="en"]')->get_elements();

       KNOWN BUG: you can't have a "]" character in the	attribute value
       because it confuses the query parser.  Fixing this is TODO.

   Universal Selector
       W3C CSS 2 specification defines a new construct through which to	select
       any element within the document below a given hierarchy.

       http://www.w3.org/TR/css3-selectors/#universal-selector

	 @elems	= $query->query('*')->get_elements();

   Combinator Selectors
       W3C CSS 2 specification defines new constructs through which to select
       based on	heirarchy with the DOM.	See the	following link for the spec:
       <http://www.w3.org/TR/css3-selectors/#combinators>

       Immediate Descendents (children)

       When you	combine	selectors with whitespace elements are selected	if
       they are	descended from the parent in some way. But if you just want to
       select the children (and	not the	grandchildren, great-grandchildren,
       etc) then you can combine the selectors with the	">" character.

	@elems = $query->query('a > img')->get_elements();

       Non-Immediate Descendents

       If you just want	any descendents	that aren't children then you can
       combine selectors with the "*" character.

	@elems = $query->query('div * a')->get_elements();

       Immediate Siblings

       If you want to use a sibling relationship then you can can join
       selectors with the "+" character.

	@elems = $query->query('img + span')->get_elements();

   Pseudo-classes
       W3C CSS 2 and CSS 3 specifications define new concepts of pseudo-
       classes to permit formatting based on information that lies outside the
       document	tree.  See the following link for the most recent spec:
       <http://www.w3.org/TR/css3-selectors/#pseudo-classes>

       HTML::Query currently has limited support for CSS 2, and	no support for
       CSS 3.

       Patches are *highly* encouraged to help add support here.

       -child pseudo-classes

       If you want to return child elements within a certain position then
       -child pseudo-classes (:first-child, :last-child) are what you're
       looking for.

	@elems = $query->query('table td:first-child')->get_elements;

       Link pseudo-classes: :link and :visited

       Unsupported.

       The :link pseudo-class is to be implemented, currently unsupported.

       It is not possible to locate :visited outside of	a browser context due
       to it's dynamic nature.

       Dynamic pseudo-classes

       Unsupported.

       It is not possible to locate these classes(:hover, :active, :focus)
       outside of a browser context due	to their dynamic nature.

       Language	pseudo-class

       Unsupported.

       Functionality for the :lang pseudo-class	is largely replicated by using
       an attribute selector for lang combined with a universal	selector
       query.

       If this is insufficient I'd love	to see a patch adding support for it.

       Other pseudo-classes

       W3C CSS 3 added a number	of new behaviors that need support. At this
       time there is no	support	for them, but we should	work on	adding
       support.

       Patches are very	welcome.

   Pseudo-elements
       W3C CSS 2 and CSS 3 specification defines new concepts of pseudo-
       elements	to permit formatting based on information that lies outside
       the document tree.  See the following link for the most recent spec:
       <http://www.w3.org/TR/css3-selectors/#pseudo-elements>

       At this time there is no	support	for pseudo-elements, but we are
       working on adding support.

       Patches are very	welcome.

   Combining Selectors
       You can combine basic and hierarchical selectors	into a single query by
       separating each part with a comma.  The query will select all matching
       elements	for each of the	comma-delimited	selectors.  For	example, to
       find all	"a", "b" and "i" elements in a tree:

	   @elems = $query->query('a, b, i')->get_elements();

       Each of these selectors can be arbitrarily complex.

	   @elems = $query->query(
	       'table.search[width=100%] tr.result[valign=top] td.value,
		form.search input[type=submit],
		a[href=index.html]'
	   )->get_elements();

EXPORT HOOKS
   Query
       The "Query()" constructor subroutine (note the capital letter) can be
       exported	as a convenient	way to create "HTML::Query" objects. It	simply
       forwards	all arguments to the new() constructor method.

	   use HTML::Query 'Query';

	   my $query = Query( file => $file, 'ul.menu li a' );

   query
       The "query()" export hook can be	called to monkey-patch a query()
       method into the HTML::Element module.

       This is considered questionable behaviour in polite society which
       regards it as a violation of the	inner sanctity of the HTML::Element.

       But if you're the kind of person	that doesn't mind a bit	of occasional
       namespace abuse for the sake of getting the job done, then go right
       ahead.  Just don't blame	me if it all blows up later.

	   use HTML::Query 'query';		   # note lower	case 'q'
	   use HTML::TreeBuilder;

	   # build a tree
	   my $tree = HTML::TreeBuilder->new;
	   $tree->parse_file($filename);

	   # call the query() method on	any element
	   my $query = $tree->query('ul	li a');

METHODS
       The "HTML::Query" object	is a subclass of Badger::Base and inherits all
       of its method.

   new(@elements,$selector)
       This constructor	method is used to create a new "HTML::Query" object.
       It expects a list of any	number (including zero)	of HTML::Element or
       "HTML::Query" objects.

	   # single HTML::Element object
	   my $query = HTML::Query->new($elem);

	   # multiple element object
	   my $query = HTML::Query->new($elem1,	$elem2,	$elem3,	...);

	   # copy elements from	an existing query
	   my $query = HTML::Query->new($another_query);

	   # copy elements from	several	queries
	   my $query = HTML::Query->new($query1, $query2, $query3);

	   # or	a mixture
	   my $query = HTML::Query->new($elem1,	$query1, $elem2, $query3);

       You can also use	named parameters to specify an alternate source	for a
       element.

	   $query = HTML::Query->new( file => $file );
	   $query = HTML::Query->new( text => $text );

       In this case, the HTML::TreeBuilder module is used to parse the source
       file or text into a tree	of HTML::Element objects.

       For the sake of completeness, you can also specify element trees	and
       queries using named parameters:

	   $query = HTML::Query->new( tree  => $tree );
	   $query = HTML::Query->new( query => $query );

       You can freely mix and match elements, queries and named	sources.  The
       query will be constructed as an aggregate across	them all.

	   $q =	HTML::Query->new(
	       text  =>	$text1,
	       text  =>	$text2,
	       file  =>	$file1,
	       file  =>	$file2,
	       tree  =>	$tree,
	       query =>	$query1,
	   );

       The final, optional argument can	be a selector specification.  This is
       immediately passed to the query() method	which will return a new	query
       with only those elements	selected.

	   my $spec = 'ul.menu li a';		   # <ul class="menu">..<li>..<a>

	   my $query = HTML::Query->new( $tree,	$spec );
	   my $query = HTML::Query->new( text => $text,	$spec );
	   my $query = HTML::Query->new(
	       text => $text,
	       file => $file,
	       $spec
	   );

       The list	of arguments can also be passed	by reference to	a list.

	   my $query = HTML::Query->new(\@args);

   query($spec)
       This method locates the descendant elements identified by the $spec
       argument	for each element in the	query. It then interally stores	the
       results for requerying or return. See get_elements().

	   my $query = HTML::Query->new(\@args);
	   my $results = $query->query($spec);

       See "QUERY SYNTAX" for the permitted syntax of the $spec	argument.

   get_elements()
       This method returns the stored results from a query. In list context it
       returns a list of matching HTML::Element	objects. In scalar context it
       returns a reference to the results array.

	   my $query = HTML::Query->new(\@args);
	   my $results = $query->query($spec);

	   my @elements	 = $results->query($spec)->get_elements();
	   my $elements	 = $results->query($spec)->get_elements();

   get_specificity()
       Calculate the specificity for any given passed selector,	a critical
       factor in determining how best to apply the cascade

       A selector's specificity	is calculated as follows:

       * count the number of ID	attributes in the selector (= a) * count the
       number of other attributes and pseudo-classes in	the selector (=	b) *
       count the number	of element names in the	selector (= c) * ignore
       pseudo-elements.

       The specificity is based	only on	the form of the	selector. In
       particular, a selector of the form "[id=p33]" is	counted	as an
       attribute selector (a=0,	b=0, c=1, d=0),	even if	the id attribute is
       defined as an "ID" in the source	document's DTD.

       See the following spec for additional details:
       <http://www.w3.org/TR/CSS21/cascade.html#specificity>

   size()
       Returns the number of elements in the query.

   first()
       Returns the first element in the	query.

	   my $elem = $query->first;

       If the query is empty then an exception will be thrown. If you would
       rather have an undefined	value returned then you	can use	the "try"
       method inherited	from Badger::Base. This	effectively wraps the call to
       "first()" in an "eval" block to catch any exceptions thrown.

	   my $elem = $query->try('first') || warn "no first element\n";

   last()
       Similar to first(), but returning the last element in the query.

	   my $elem = $query->last;

   list()
       Returns a list of the HTML::Element object in the query in list
       context,	or a reference to a list in scalar context.

	   my @elems = $query->list;
	   my $elems = $query->list;

   AUTOLOAD
       The "AUTOLOAD" method maps any other method calls to the	HTML::Element
       objects in the list. When called	in list	context	it returns a list of
       the values returned from	calling	the method on each element. In scalar
       context it returns a reference to a list	of return values.

	   my @text_blocks = $query->as_trimmed_text;
	   my $text_blocks = $query->as_trimmed_text;

KNOWN BUGS
   Attribute Values
       It is not possible to use "]" in	an attribute value.  This is due to a
       limitation in the parser	which will be fixed RSN.

AUTHOR
       Andy Wardley <http://wardley.org>

MAINTAINER
       Kevin Kamel <kamelkev@mailermailer.com>

CONTRIBUTORS
       Vivek Khera <vivek@khera.org> Michael Peters <wonko@cpan.org> David
       Gray <cpan@doesntsuck.com>

COPYRIGHT
       Copyright (C) 2010 Andy Wardley.	 All Rights Reserved.

       This module is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

SEE ALSO
       HTML::Tree, HTML::Element, HTML::TreeBuilder, pQuery,
       <http://jQuery.com/>

perl v5.32.0			  2014-08-28			HTML::Query(3)

NAME | SYNOPSIS | DESCRIPTION | QUERY SYNTAX | EXPORT HOOKS | METHODS | KNOWN BUGS | AUTHOR | MAINTAINER | CONTRIBUTORS | COPYRIGHT | SEE ALSO

Want to link to this manual page? Use this URL:
<https://www.freebsd.org/cgi/man.cgi?query=HTML::Query&sektion=3&manpath=FreeBSD+12.2-RELEASE+and+Ports>

home | help