Mojo::DOM(3)	      User Contributed Perl Documentation	  Mojo::DOM(3)

NAME
       Mojo::DOM - Minimalistic	HTML/XML DOM parser with CSS selectors

SYNOPSIS
	 use Mojo::DOM;

	 # Parse
	 my $dom = Mojo::DOM->new('<div><p id="a">Test</p><p id="b">123</p></div>');

	 # Find
	 say $dom->at('#b')->text;
	 say $dom->find('p')->map('text')->join("\n");
	 say $dom->find('[id]')->map(attr => 'id')->join("\n");

	 # Iterate
	 $dom->find('p[id]')->reverse->each(sub	{ say $_->{id} });

	 # Loop
	 for my	$e ($dom->find('p[id]')->each) {
	   say $e->{id}, ':', $e->text;
	 }

	 # Modify
	 $dom->find('div p')->last->append('<p id="c">456</p>');
	 $dom->find(':not(p)')->map('strip');

	 # Render
	 say "$dom";

DESCRIPTION
       Mojo::DOM is a minimalistic and relaxed HTML/XML	DOM parser with	CSS
       selector	support. It will even try to interpret broken HTML and XML, so
       you should not use it for validation.

NODES AND ELEMENTS
       When we parse an	HTML/XML fragment, it gets turned into a tree of
       nodes.

	 <!DOCTYPE html>
	 <html>
	   <head><title>Hello</title></head>
	   <body>World!</body>
	 </html>

       There are currently eight different kinds of nodes, "cdata", "comment",
       "doctype", "pi",	"raw", "root", "tag" and "text". Elements are nodes of
       the type	"tag".

	 root
	 |- doctype (html)
	 +- tag	(html)
	    |- tag (head)
	    |  +- tag (title)
	    |	  +- raw (Hello)
	    +- tag (body)
	       +- text (World!)

       While all node types are	represented as Mojo::DOM objects, some methods
       like "attr" and "namespace" only	apply to elements.
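
       As a minimal sketch of the difference between nodes and elements, the
       following lists the type of every node in a small fragment and then the
       tag names of the elements only. The markup and variable names are
       illustrative assumptions, not part of the original page.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Illustrative fragment: two elements, two text nodes and one comment
my $dom = Mojo::DOM->new('<p>Hi<!-- note --><b>there</b></p>');

# Type of every node below the root
my @types = $dom->descendant_nodes->map('type')->each;

# Tag names of the elements only ("tag" nodes)
my @tags
  = $dom->descendant_nodes->grep(sub { $_->type eq 'tag' })->map('tag')->each;

print join(',', @types), "\n";
print join(',', @tags),  "\n";
```

       Filtering on "type" before calling element-only methods such as "tag"
       or "attr" keeps them away from text and comment nodes.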

CASE-SENSITIVITY
       Mojo::DOM defaults to HTML semantics, which means all tag and attribute
       names are lowercased and selectors need to be lowercase as well.

	 # HTML	semantics
	 my $dom = Mojo::DOM->new('<P ID="greeting">Hi!</P>');
	 say $dom->at('p[id]')->text;

       If an XML declaration is	found, the parser will automatically switch
       into XML	mode and everything becomes case-sensitive.

	 # XML semantics
	 my $dom = Mojo::DOM->new('<?xml version="1.0"?><P ID="greeting">Hi!</P>');
	 say $dom->at('P[ID]')->text;

       HTML or XML semantics can also be forced	with the "xml" method.

	 # Force HTML semantics
	 my $dom = Mojo::DOM->new->xml(0)->parse('<P ID="greeting">Hi!</P>');
	 say $dom->at('p[id]')->text;

	 # Force XML semantics
	 my $dom = Mojo::DOM->new->xml(1)->parse('<P ID="greeting">Hi!</P>');
	 say $dom->at('P[ID]')->text;
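
       To make the effect of the two modes visible side by side, here is a
       small sketch that parses the same markup once with each setting and
       renders the result. The variable names are illustrative.

```perl
use strict;
use warnings;
use Mojo::DOM;

my $markup = '<P ID="greeting">Hi!</P>';

# Same input, parsed once with each set of semantics
my $html = Mojo::DOM->new->xml(0)->parse($markup);
my $xml  = Mojo::DOM->new->xml(1)->parse($markup);

# HTML semantics lowercase names while parsing, XML semantics keep them
print "$html\n";
print "$xml\n";

# A selector in the wrong case finds nothing, so "at" returns undef
print defined $html->at('P[ID]') ? "found\n" : "not found\n";
print defined $xml->at('p[id]')  ? "found\n" : "not found\n";
```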

METHODS
       Mojo::DOM implements the	following methods.

   all_text
	 my $text = $dom->all_text;

       Extract text content from all descendant	nodes of this element.

	 # "foo\nbarbaz\n"
	 $dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->at('div')->all_text;

   ancestors
	 my $collection	= $dom->ancestors;
	 my $collection	= $dom->ancestors('div ~ p');

       Find all	ancestor elements of this node matching	the CSS	selector and
       return a	Mojo::Collection object	containing these elements as Mojo::DOM
       objects.	 All selectors from "SELECTORS"	in Mojo::DOM::CSS are
       supported.

	 # List	tag names of ancestor elements
	 say $dom->ancestors->map('tag')->join("\n");

   append
	 $dom = $dom->append('<p>I ♥ Mojolicious!</p>');

       Append HTML/XML fragment	to this	node (for all node types other than
       "root").

	 # "<div><h1>Test</h1><h2>123</h2></div>"
	 $dom->parse('<div><h1>Test</h1></div>')
	   ->at('h1')->append('<h2>123</h2>')->root;

	 # "<p>Test 123</p>"
	 $dom->parse('<p>Test</p>')->at('p')
	   ->child_nodes->first->append(' 123')->root;

   append_content
	 $dom = $dom->append_content('<p>I ♥ Mojolicious!</p>');

       Append HTML/XML fragment	(for "root" and	"tag" nodes) or	raw content to
       this node's content.

	 # "<div><h1>Test123</h1></div>"
	 $dom->parse('<div><h1>Test</h1></div>')
	   ->at('h1')->append_content('123')->root;

	 # "<!-- Test 123 --><br>"
	 $dom->parse('<!-- Test --><br>')
	   ->child_nodes->first->append_content('123 ')->root;

	 # "<p>Test<i>123</i></p>"
	 $dom->parse('<p>Test</p>')->at('p')->append_content('<i>123</i>')->root;

   at
	 my $result = $dom->at('div ~ p');

       Find first descendant element of	this element matching the CSS selector
       and return it as	a Mojo::DOM object, or "undef" if none could be	found.
       All selectors from "SELECTORS" in Mojo::DOM::CSS	are supported.

	 # Find	first element with "svg" namespace definition
	 my $namespace = $dom->at('[xmlns\:svg]')->{'xmlns:svg'};
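
       Because "at" returns "undef" when nothing matches, chaining a method
       directly onto a failed lookup dies. A minimal sketch of a guard
       follows; "text_or" is a hypothetical helper, not part of Mojo::DOM.

```perl
use strict;
use warnings;
use Mojo::DOM;

my $dom = Mojo::DOM->new('<div><p id="a">Test</p></div>');

# Hypothetical helper: text of the first match, or a fallback value
sub text_or {
  my ($node, $selector, $fallback) = @_;
  my $e = $node->at($selector);
  return defined $e ? $e->text : $fallback;
}

print text_or($dom, '#a',       'n/a'), "\n";
print text_or($dom, '#missing', 'n/a'), "\n";
```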

   attr
	 my $hash = $dom->attr;
	 my $foo  = $dom->attr('foo');
	 $dom	  = $dom->attr({foo => 'bar'});
	 $dom	  = $dom->attr(foo => 'bar');

       This element's attributes.

	 # Remove an attribute
	 delete	$dom->attr->{id};

	 # Attribute without value
	 $dom->attr(selected =>	undef);

	 # List	id attributes
	 say $dom->find('*')->map(attr => 'id')->compact->join("\n");

   child_nodes
	 my $collection	= $dom->child_nodes;

       Return a	Mojo::Collection object	containing all child nodes of this
       element as Mojo::DOM objects.

	 # "<p><b>123</b></p>"
	 $dom->parse('<p>Test<b>123</b></p>')->at('p')->child_nodes->first->remove;

	 # "<!DOCTYPE html>"
	 $dom->parse('<!DOCTYPE html><b>123</b>')->child_nodes->first;

	 # " Test "
	 $dom->parse('<b>123</b><!-- Test -->')->child_nodes->last->content;

   children
	 my $collection	= $dom->children;
	 my $collection	= $dom->children('div ~ p');

       Find all	child elements of this element matching	the CSS	selector and
       return a	Mojo::Collection object	containing these elements as Mojo::DOM
       objects.	 All selectors from "SELECTORS"	in Mojo::DOM::CSS are
       supported.

	 # Show	tag name of random child element
	 say $dom->children->shuffle->first->tag;

   content
	 my $str = $dom->content;
	 $dom	 = $dom->content('<p>I ♥ Mojolicious!</p>');

       Return this node's content or replace it	with HTML/XML fragment (for
       "root" and "tag"	nodes) or raw content.

	 # "<b>Test</b>"
	 $dom->parse('<div><b>Test</b></div>')->at('div')->content;

	 # "<div><h1>123</h1></div>"
	 $dom->parse('<div><h1>Test</h1></div>')->at('h1')->content('123')->root;

	 # "<p><i>123</i></p>"
	 $dom->parse('<p>Test</p>')->at('p')->content('<i>123</i>')->root;

	 # "<div><h1></h1></div>"
	 $dom->parse('<div><h1>Test</h1></div>')->at('h1')->content('')->root;

	 # " Test "
	 $dom->parse('<!-- Test --><br>')->child_nodes->first->content;

	 # "<div><!-- 123 -->456</div>"
	 $dom->parse('<div><!-- Test -->456</div>')
	   ->at('div')->child_nodes->first->content(' 123 ')->root;

   descendant_nodes
	 my $collection	= $dom->descendant_nodes;

       Return a	Mojo::Collection object	containing all descendant nodes	of
       this element as Mojo::DOM objects.

	 # "<p><b>123</b></p>"
	 $dom->parse('<p><!-- Test --><b>123<!-- 456 --></b></p>')
	   ->descendant_nodes->grep(sub	{ $_->type eq 'comment'	})
	   ->map('remove')->first;

	 # "<p><b>test</b>test</p>"
	 $dom->parse('<p><b>123</b>456</p>')
	   ->at('p')->descendant_nodes->grep(sub { $_->type eq 'text' })
	   ->map(content => 'test')->first->root;

   find
	 my $collection	= $dom->find('div ~ p');

       Find all	descendant elements of this element matching the CSS selector
       and return a Mojo::Collection object containing these elements as
       Mojo::DOM objects. All selectors	from "SELECTORS" in Mojo::DOM::CSS are
       supported.

	 # Find	a specific element and extract information
	 my $id	= $dom->find('div')->[23]{id};

	 # Extract information from multiple elements
	 my @headers = $dom->find('h1, h2, h3')->map('text')->each;

	 # Count all the different tags
	 my $hash = $dom->find('*')->reduce(sub	{ $a->{$b->tag}++; $a }, {});

	 # Find	elements with a	class that contains dots
	 my @divs = $dom->find('div.foo\.bar')->each;

   following
	 my $collection	= $dom->following;
	 my $collection	= $dom->following('div ~ p');

       Find all	sibling	elements after this node matching the CSS selector and
       return a	Mojo::Collection object	containing these elements as Mojo::DOM
       objects.	 All selectors from "SELECTORS"	in Mojo::DOM::CSS are
       supported.

	 # List	tags of	sibling	elements after this node
	 say $dom->following->map('tag')->join("\n");

   following_nodes
	 my $collection	= $dom->following_nodes;

       Return a	Mojo::Collection object	containing all sibling nodes after
       this node as Mojo::DOM objects.

	 # "C"
	 $dom->parse('<p>A</p><!-- B -->C')->at('p')->following_nodes->last->content;

   matches
	 my $bool = $dom->matches('div ~ p');

       Check if	this element matches the CSS selector. All selectors from
       "SELECTORS" in Mojo::DOM::CSS are supported.

	 # True
	 $dom->parse('<p class="a">A</p>')->at('p')->matches('.a');
	 $dom->parse('<p class="a">A</p>')->at('p')->matches('p[class]');

	 # False
	 $dom->parse('<p class="a">A</p>')->at('p')->matches('.b');
	 $dom->parse('<p class="a">A</p>')->at('p')->matches('p[id]');

   namespace
	 my $namespace = $dom->namespace;

       Find this element's namespace, or return	"undef"	if none	could be
       found.

	 # Find	namespace for an element with namespace	prefix
	 my $namespace = $dom->at('svg > svg\:circle')->namespace;

	 # Find	namespace for an element that may or may not have a namespace prefix
	 my $namespace = $dom->at('svg > circle')->namespace;
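
       The two lookups above assume an SVG document is already loaded. A
       self-contained sketch, with an illustrative fragment, might look like
       this; it also shows the "undef" result for an element without any
       namespace declaration.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Illustrative XML fragment with a default namespace on the outer element
my $dom = Mojo::DOM->new->xml(1)
  ->parse('<svg xmlns="http://www.w3.org/2000/svg"><circle r="1"/></svg>');

# The child element inherits the namespace declared on its ancestor
my $ns = $dom->at('svg > circle')->namespace;

# No declaration anywhere in the ancestry, so the result is undef
my $none = Mojo::DOM->new('<p>Test</p>')->at('p')->namespace;

print "$ns\n";
print defined $none ? "$none\n" : "undef\n";
```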

   new
	 my $dom = Mojo::DOM->new;
	 my $dom = Mojo::DOM->new('<foo bar="baz">I ♥ Mojolicious!</foo>');

       Construct a new scalar-based Mojo::DOM object and "parse" HTML/XML
       fragment	if necessary.

   next
	 my $sibling = $dom->next;

       Return Mojo::DOM	object for next	sibling	element, or "undef" if there
       are no more siblings.

	 # "<h2>123</h2>"
	 $dom->parse('<div><h1>Test</h1><h2>123</h2></div>')->at('h1')->next;

   next_node
	 my $sibling = $dom->next_node;

       Return Mojo::DOM	object for next	sibling	node, or "undef" if there are
       no more siblings.

	 # "456"
	 $dom->parse('<p><b>123</b><!-- Test -->456</p>')
	   ->at('b')->next_node->next_node;

	 # " Test "
	 $dom->parse('<p><b>123</b><!-- Test -->456</p>')
	   ->at('b')->next_node->content;

   parent
	 my $parent = $dom->parent;

       Return Mojo::DOM	object for parent of this node,	or "undef" if this
       node has	no parent.

	 # "<b><i>Test</i></b>"
	 $dom->parse('<p><b><i>Test</i></b></p>')->at('i')->parent;

   parse
	 $dom = $dom->parse('<foo bar="baz">I ♥ Mojolicious!</foo>');

       Parse HTML/XML fragment with Mojo::DOM::HTML.

	 # Parse XML
	 my $dom = Mojo::DOM->new->xml(1)->parse('<foo>I ♥ Mojolicious!</foo>');

   preceding
	 my $collection	= $dom->preceding;
	 my $collection	= $dom->preceding('div ~ p');

       Find all	sibling	elements before	this node matching the CSS selector
       and return a Mojo::Collection object containing these elements as
       Mojo::DOM objects.  All selectors from "SELECTORS" in Mojo::DOM::CSS
       are supported.

	 # List	tags of	sibling	elements before	this node
	 say $dom->preceding->map('tag')->join("\n");

   preceding_nodes
	 my $collection	= $dom->preceding_nodes;

       Return a	Mojo::Collection object	containing all sibling nodes before
       this node as Mojo::DOM objects.

	 # "A"
	 $dom->parse('A<!-- B --><p>C</p>')->at('p')->preceding_nodes->first->content;

   prepend
	 $dom = $dom->prepend('<p>I ♥ Mojolicious!</p>');

       Prepend HTML/XML	fragment to this node (for all node types other	than
       "root").

	 # "<div><h1>Test</h1><h2>123</h2></div>"
	 $dom->parse('<div><h2>123</h2></div>')
	   ->at('h2')->prepend('<h1>Test</h1>')->root;

	 # "<p>Test 123</p>"
	 $dom->parse('<p>123</p>')
	   ->at('p')->child_nodes->first->prepend('Test ')->root;

   prepend_content
	 $dom = $dom->prepend_content('<p>I ♥ Mojolicious!</p>');

       Prepend HTML/XML	fragment (for "root" and "tag" nodes) or raw content
       to this node's content.

	 # "<div><h2>Test123</h2></div>"
	 $dom->parse('<div><h2>123</h2></div>')
	   ->at('h2')->prepend_content('Test')->root;

	 # "<!-- Test 123 --><br>"
	 $dom->parse('<!-- 123 --><br>')
	   ->child_nodes->first->prepend_content(' Test')->root;

	 # "<p><i>123</i>Test</p>"
	 $dom->parse('<p>Test</p>')->at('p')->prepend_content('<i>123</i>')->root;

   previous
	 my $sibling = $dom->previous;

       Return Mojo::DOM	object for previous sibling element, or	"undef"	if
       there are no more siblings.

	 # "<h1>Test</h1>"
	 $dom->parse('<div><h1>Test</h1><h2>123</h2></div>')->at('h2')->previous;

   previous_node
	 my $sibling = $dom->previous_node;

       Return Mojo::DOM	object for previous sibling node, or "undef" if	there
       are no more siblings.

	 # "123"
	 $dom->parse('<p>123<!-- Test --><b>456</b></p>')
	   ->at('b')->previous_node->previous_node;

	 # " Test "
	 $dom->parse('<p>123<!-- Test --><b>456</b></p>')
	   ->at('b')->previous_node->content;

   remove
	 my $parent = $dom->remove;

       Remove this node	and return "root" (for "root" nodes) or	"parent".

	 # "<div></div>"
	 $dom->parse('<div><h1>Test</h1></div>')->at('h1')->remove;

	 # "<p><b>456</b></p>"
	 $dom->parse('<p>123<b>456</b></p>')
	   ->at('p')->child_nodes->first->remove->root;

   replace
	 my $parent = $dom->replace('<div>I ♥ Mojolicious!</div>');

       Replace this node with HTML/XML fragment	and return "root" (for "root"
       nodes) or "parent".

	 # "<div><h2>123</h2></div>"
	 $dom->parse('<div><h1>Test</h1></div>')->at('h1')->replace('<h2>123</h2>');

	 # "<p><b>123</b></p>"
	 $dom->parse('<p>Test</p>')
	   ->at('p')->child_nodes->[0]->replace('<b>123</b>')->root;

   root
	 my $root = $dom->root;

       Return Mojo::DOM	object for "root" node.
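
       As a short sketch with illustrative markup, "root" gets back to the
       whole document from any node reached through a selector:

```perl
use strict;
use warnings;
use Mojo::DOM;

my $dom = Mojo::DOM->new('<div><p>Test</p></div>');

# Start from a nested element and climb back to the top of the tree
my $p    = $dom->at('p');
my $root = $p->root;

print $root->type, "\n";
print "$root\n";
```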

   strip
	 my $parent = $dom->strip;

       Remove this element while preserving its	content	and return "parent".

	 # "<div>Test</div>"
	 $dom->parse('<div><h1>Test</h1></div>')->at('h1')->strip;

   tag
	 my $tag = $dom->tag;
	 $dom	 = $dom->tag('div');

       This element's tag name.

	 # List	tag names of child elements
	 say $dom->children->map('tag')->join("\n");

   tap
	 $dom =	$dom->tap(sub {...});

       Alias for "tap" in Mojo::Base.
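
       A minimal sketch of "tap" in a method chain follows. The callback
       receives the object in $_ and its return value is ignored, so the
       chain continues on the same node. The markup and class name are
       illustrative.

```perl
use strict;
use warnings;
use Mojo::DOM;

my $dom = Mojo::DOM->new('<p>Test</p>');

# Set an attribute mid-chain without breaking the chain
my $out = $dom->at('p')
  ->tap(sub { $_->attr(class => 'seen') })
  ->append_content('!')
  ->root->to_string;

print "$out\n";
```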

   text
	 my $text = $dom->text;

       Extract text content from this element only (not	including child
       elements).

	 # "bar"
	 $dom->parse("<div>foo<p>bar</p>baz</div>")->at('p')->text;

	 # "foo\nbaz\n"
	 $dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->at('div')->text;

   to_string
	 my $str = $dom->to_string;

       Render this node	and its	content	to HTML/XML.

	 # "<b>Test</b>"
	 $dom->parse('<div><b>Test</b></div>')->at('div b')->to_string;

   tree
	 my $tree = $dom->tree;
	 $dom	  = $dom->tree(['root']);

       Document	Object Model. Note that	this structure should only be used
       very carefully since it is very dynamic.
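
       The sketch below only touches the structure in two cautious ways:
       reading its first entry and replacing the whole tree with an empty
       root, as in the signature above. That the first entry holds the node
       type is an assumption about the current internal layout, which may
       change.

```perl
use strict;
use warnings;
use Mojo::DOM;

my $dom = Mojo::DOM->new('<p>Test</p>');

# Assumption: the first entry of the structure is the node type
my $first = $dom->tree->[0];

# Replace the whole document with an empty root node
$dom->tree(['root']);
my $rendered = "$dom";

print "$first\n";
print length($rendered), "\n";
```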

   type
	 my $type = $dom->type;

       This node's type, usually "cdata", "comment", "doctype",	"pi", "raw",
       "root", "tag" or	"text".

	 # "cdata"
	 $dom->parse('<![CDATA[Test]]>')->child_nodes->first->type;

	 # "comment"
	 $dom->parse('<!-- Test -->')->child_nodes->first->type;

	 # "doctype"
	 $dom->parse('<!DOCTYPE html>')->child_nodes->first->type;

	 # "pi"
	 $dom->parse('<?xml version="1.0"?>')->child_nodes->first->type;

	 # "raw"
	 $dom->parse('<title>Test</title>')->at('title')->child_nodes->first->type;

	 # "root"
	 $dom->parse('<p>Test</p>')->type;

	 # "tag"
	 $dom->parse('<p>Test</p>')->at('p')->type;

	 # "text"
	 $dom->parse('<p>Test</p>')->at('p')->child_nodes->first->type;

   val
	 my $value = $dom->val;

       Extract value from form element (such as	"button", "input", "option",
       "select"	and "textarea"), or return "undef" if this element has no
       value. In the case of "select" with "multiple" attribute, find "option"
       elements	with "selected"	attribute and return an	array reference	with
       all values, or "undef" if none could be found.

	 # "a"
	 $dom->parse('<input name=test value=a>')->at('input')->val;

	 # "b"
	 $dom->parse('<textarea>b</textarea>')->at('textarea')->val;

	 # "c"
	 $dom->parse('<option value="c">Test</option>')->at('option')->val;

	 # "d"
	 $dom->parse('<select><option selected>d</option></select>')
	   ->at('select')->val;

	 # "e"
	 $dom->parse('<select multiple><option selected>e</option></select>')
	   ->at('select')->val->[0];

	 # "on"
	 $dom->parse('<input name=test type=checkbox>')->at('input')->val;

   wrap
	 $dom =	$dom->wrap('<div></div>');

       Wrap HTML/XML fragment around this node (for all	node types other than
       "root"),	placing	it as the last child of	the first innermost element.

	 # "<p>123<b>Test</b></p>"
	 $dom->parse('<b>Test</b>')->at('b')->wrap('<p>123</p>')->root;

	 # "<div><p><b>Test</b></p>123</div>"
	 $dom->parse('<b>Test</b>')->at('b')->wrap('<div><p></p>123</div>')->root;

	 # "<p><b>Test</b></p><p>123</p>"
	 $dom->parse('<b>Test</b>')->at('b')->wrap('<p></p><p>123</p>')->root;

	 # "<p><b>Test</b></p>"
	 $dom->parse('<p>Test</p>')->at('p')->child_nodes->first->wrap('<b>')->root;

   wrap_content
	 $dom =	$dom->wrap_content('<div></div>');

       Wrap HTML/XML fragment around this node's content (for "root" and "tag"
       nodes), placing the content as the last children of the first innermost
       element.

	 # "<p><b>123Test</b></p>"
	 $dom->parse('<p>Test</p>')->at('p')->wrap_content('<b>123</b>')->root;

	 # "<p><b>Test</b></p><p>123</p>"
	 $dom->parse('<b>Test</b>')->wrap_content('<p></p><p>123</p>');

   xml
	 my $bool = $dom->xml;
	 $dom	  = $dom->xml($bool);

       Disable HTML semantics in parser	and activate case-sensitivity,
       defaults	to auto-detection based	on XML declarations.

OPERATORS
       Mojo::DOM overloads the following operators.

   array
	 my @nodes = @$dom;

       Alias for "child_nodes".

	 # "<!-- Test -->"
	 $dom->parse('<!-- Test --><b>123</b>')->[0];

   bool
	 my $bool = !!$dom;

       Always true.

   hash
	 my %attrs = %$dom;

       Alias for "attr".

	 # "test"
	 $dom->parse('<div id="test">Test</div>')->at('div')->{id};

   stringify
	 my $str = "$dom";

       Alias for "to_string".

SEE ALSO
       Mojolicious, Mojolicious::Guides, <http://mojolicious.org>.

perl v5.24.1			  2017-04-22			  Mojo::DOM(3)
