HTMLWriter(3)	      User Contributed Perl Documentation	 HTMLWriter(3)

NAME
       XML::Handler::HTMLWriter - SAX Handler for writing HTML 4.0

SYNOPSIS
         use XML::Handler::HTMLWriter;
	 use XML::SAX;

	 my $writer = XML::Handler::HTMLWriter->new(...);
	 my $parser = XML::SAX::ParserFactory->parser(Handler => $writer);

DESCRIPTION
       This module is based on the rules for outputting HTML according to the
       XSLT specification. It is a subclass of XML::SAX::Writer, and the usage
       is the same as that module.

Usage
   First create a new HTMLWriter object:
	 my $writer = XML::Handler::HTMLWriter->new(...);

       The ... indicates parameters to be passed in. These are all passed in
       using the hash syntax: Key => Value.

       All parameters are from XML::SAX::Writer, so please see its
       documentation for more details.

   Now pass $writer to a SAX chain:
       For example, a SAX parser:

	 my $parser = XML::SAX::ParserFactory->parser(Handler => $writer);

       Or a SAX	filter:

	 my $tolower = XML::Filter::ToLower->new(Handler => $writer);

       Or use in a SAX Machine:

         use XML::SAX::Machines qw(Pipeline);

         my $machine = Pipeline(
            XML::Filter::XSLT->new(Source => { SystemId => 'foo.xsl' })
            => $writer
         );

   Initiate processing
       XML::Handler::HTMLWriter never initiates processing itself, since it is
       just a receptacle for SAX events. So you have to start processing on
       one of the modules higher up the chain. For example, in the XML::SAX
       parser case:

	 $parser->parse(Source => { SystemId =>	"foo.xhtml" });

   Get the results
       Results work via	the consumer interface as defined in XML::SAX::Writer.

HTML Output Methodology
       Here is the relevant excerpt from http://www.w3.org/TR/xslt [note that
       some understanding of XSLT is necessary to read this, but don't worry -
       understanding isn't necessary to use this module :-)]:

       The html output method should not output an element differently from
       the xml output method unless the expanded-name of the element has a
       null namespace URI; an element whose expanded-name has a non-null
       namespace URI should be output as XML. If the expanded-name of the
       element has a null namespace URI, but the local part of the expanded-
       name is not recognized as the name of an HTML element, the element
       should output in the same way as a non-empty, inline element such as
       span.

       The html	output method should not output	an end-tag for empty elements.
       For HTML	4.0, the empty elements	are area, base,	basefont, br, col,
       frame, hr, img, input, isindex, link, meta and param. For example, an
       element written as <br/>	or <br></br> in	the stylesheet should be
       output as <br>.

       The html	output method should recognize the names of HTML elements
       regardless of case. For example,	elements named br, BR or Br should all
       be recognized as	the HTML br element and	output without an end-tag.
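
       The two rules above (no end-tag for empty elements, names matched
       regardless of case) can be illustrated with a minimal sketch. This is
       not the module's implementation; html_element() is a hypothetical
       helper that ignores attributes and escaping.

```perl
use strict;
use warnings;

# The HTML 4.0 empty elements listed in the excerpt above.
my %EMPTY = map { $_ => 1 } qw(
    area base basefont br col frame hr img input isindex link meta param
);

# Hypothetical helper: serialize an attribute-less element with text content.
sub html_element {
    my ($name, $content) = @_;
    $content = '' unless defined $content;

    # Names are compared in lower case, so br, BR and Br are all empty.
    return "<$name>" if $EMPTY{ lc $name };
    return "<$name>$content</$name>";
}

print html_element('BR'), "\n";           # <BR>
print html_element('p', 'text'), "\n";    # <p>text</p>
```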

       The html	output method should not perform escaping for the content of
       the script and style elements. For example, a literal result element
       written in the stylesheet as

         <script>if (a &lt; b) foo()</script>

       or

	 <script><![CDATA[if (a	< b) foo()]]></script>

       should be output	as

	 <script>if (a < b) foo()</script>

       The html	output method should not escape	< characters occurring in
       attribute values.

       If the indent attribute has the value yes, then the html	output method
       may add or remove whitespace as it outputs the result tree, so long as
       it does not change how an HTML user agent would render the output. The
       default value is	yes.

       The html	output method should escape non-ASCII characters in URI
       attribute values	using the method recommended in	Section	B.2.1 of the
       HTML 4.0	Recommendation.
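
       Section B.2.1 of the HTML 4.0 Recommendation suggests representing
       each non-ASCII character as UTF-8 bytes and escaping every byte as
       %HH. A minimal sketch of that idea follows; escape_uri_attribute() is
       a hypothetical name, not a function provided by this module.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: escape non-ASCII characters in a URI attribute value.
sub escape_uri_attribute {
    my ($uri) = @_;

    $uri =~ s{([^\x00-\x7F])}{
        my $char = $1;    # copy, so encode() never sees the read-only $1
        # One %HH escape per byte of the character's UTF-8 encoding.
        join '', map { sprintf '%%%02X', $_ } unpack 'C*', encode('UTF-8', $char);
    }ge;

    return $uri;
}

print escape_uri_attribute("http://example.com/caf\x{E9}"), "\n";
# http://example.com/caf%C3%A9
```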

       The html	output method may output a character using a character entity
       reference, if one is defined for	it in the version of HTML that the
       output method is	using.

       The html	output method should terminate processing instructions with >
       rather than ?>.

       The html	output method should output boolean attributes (that is
       attributes with only a single allowed value that	is equal to the	name
       of the attribute) in minimized form. For	example, a start-tag written
       in the stylesheet as

	 <OPTION selected="selected">

       should be output	as

	 <OPTION selected>
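
       As an illustration of attribute minimization, here is a small sketch.
       It is not taken from the module's source: the helper names are
       hypothetical, the list is the set of HTML 4.0 boolean attributes, and
       attribute values are not escaped.

```perl
use strict;
use warnings;

# Boolean attributes defined by HTML 4.0.
my %BOOLEAN = map { $_ => 1 } qw(
    checked compact declare defer disabled ismap multiple
    nohref noresize noshade nowrap readonly selected
);

# Hypothetical helper: serialize one attribute, minimizing booleans.
sub html_attribute {
    my ($name, $value) = @_;
    return " $name" if $BOOLEAN{ lc $name } && lc $value eq lc $name;
    return qq{ $name="$value"};
}

# Hypothetical helper: build a start-tag from a flat name/value list.
sub start_tag {
    my ($name, @attrs) = @_;
    my $tag = "<$name";
    while (my ($attr, $value) = splice(@attrs, 0, 2)) {
        $tag .= html_attribute($attr, $value);
    }
    return "$tag>";
}

print start_tag('OPTION', selected => 'selected', value => '1'), "\n";
# <OPTION selected value="1">
```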

       The html	output method should not escape	a & character occurring	in an
       attribute value immediately followed by a { character (see Section
       B.7.1 of	the HTML 4.0 Recommendation). For example, a start-tag written
       in the stylesheet as

	 <BODY bgcolor='&amp;{{randomrbg}};'>

       should be output	as

	 <BODY bgcolor='&{randomrbg};'>

       The encoding attribute specifies	the preferred encoding to be used. If
       there is	a HEAD element,	then the html output method should add a META
       element immediately after the start-tag of the HEAD element specifying
       the character encoding actually used. For example,

	 <META http-equiv="Content-Type" content="text/html; charset=EUC-JP">

       It is possible that the result tree will	contain	a character that
       cannot be represented in	the encoding that the XSLT processor is	using
       for output. In this case, if the	character occurs in a context where
       HTML recognizes character references, then the character	should be
       output as a character entity reference or decimal numeric character
       reference; otherwise (for example, in a script or style element or in a
       comment), the XSLT processor should signal an error.
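
       The fallback to numeric character references can be sketched as
       follows, assuming for simplicity that the output encoding is US-ASCII.
       The to_ascii_html() helper is hypothetical and does not handle the
       script, style or comment contexts, where an error should be signalled
       instead.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every character that US-ASCII cannot
# represent with a decimal numeric character reference.
sub to_ascii_html {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/ge;
    return $text;
}

print to_ascii_html("na\x{EF}ve, 5 \x{20AC}"), "\n";
# na&#239;ve, 5 &#8364;
```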

       If the doctype-public or	doctype-system attributes are specified, then
       the html	output method should output a document type declaration
       immediately before the first element. The name following	<!DOCTYPE
       should be HTML or html. If the doctype-public attribute is specified,
       then the	output method should output PUBLIC followed by the specified
       public identifier; if the doctype-system	attribute is also specified,
       it should also output the specified system identifier following the
       public identifier. If the doctype-system	attribute is specified but the
       doctype-public attribute	is not specified, then the output method
       should output SYSTEM followed by	the specified system identifier.
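
       The doctype rules above amount to a small decision table. The sketch
       below spells it out; doctype_declaration() and its public and system
       keys are hypothetical names chosen for illustration, not parameters
       of this module.

```perl
use strict;
use warnings;

# Hypothetical helper: build the document type declaration from the
# doctype-public and doctype-system values, either of which may be absent.
sub doctype_declaration {
    my (%opt) = @_;
    my ($public, $system) = @opt{qw(public system)};

    # No declaration at all unless at least one identifier is given.
    return '' unless defined $public || defined $system;

    my $decl = '<!DOCTYPE html';
    if (defined $public) {
        $decl .= qq{ PUBLIC "$public"};
        $decl .= qq{ "$system"} if defined $system;
    }
    else {
        $decl .= qq{ SYSTEM "$system"};
    }
    return "$decl>";
}

print doctype_declaration(
    public => '-//W3C//DTD HTML 4.0//EN',
    system => 'http://www.w3.org/TR/REC-html40/strict.dtd',
), "\n";
# <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">
```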

       The media-type attribute	is applicable for the html output method. The
       default value is	text/html.

Entities
       HTML characters are output using HTML::Entities. See HTML::Entities for
       more details. By default, XML::Handler::HTMLWriter uses the default
       parameters to HTML::Entities::encode(), but I would be willing to
       investigate the worth in passing more parameters in.

SAX1 or SAX2?
       Previous versions of this module worked with both SAX1 and SAX2, but
       actually implemented the translation in quite a broken manner. So now
       this module only	works with SAX 2. See for more

AUTHOR
       Matt Sergeant

SEE ALSO
       XML::SAX::Writer, XML::SAX::ParserFactory.

perl v5.32.0			  2003-03-30			 HTMLWriter(3)
