PrettyPrinter(3)      User Contributed Perl Documentation     PrettyPrinter(3)

       HTML::PrettyPrinter - generate nice HTML files from HTML syntax trees

         use HTML::TreeBuilder;
         # generate a HTML syntax tree
         my $tree = new HTML::TreeBuilder;
         $tree->parse_file($filename);    # $filename: path to your HTML file
         # modify the tree if you want

         use HTML::PrettyPrinter;
         my $hpp = new HTML::PrettyPrinter ('linelength' => 130,
                                            'quote_attr' => 1);
         # configure
         $tree->address("0.1.0")->attr('_hpp_indent',0);   # for an individual element
         $hpp->set_force_nl(1,qw(body head));              # for tags
         $hpp->set_force_nl(1,qw(@SECTIONS));              # as above
         $hpp->set_nl_inside(0,'default!');                # for all tags

         # format the source
         my $linearray_ref = $hpp->format($tree);
         print @$linearray_ref;

         # alternative: print directly to filehandle
         use FileHandle;
         my $fh = new FileHandle ">$filename2";
         if (defined $fh) {
           $hpp->select($fh);      # send the output to the file handle
           $hpp->format($tree);
           undef $fh;              # close the file
           $hpp->select(undef);    # switch back to the default behaviour
         }

       HTML::PrettyPrinter produces nicely formatted HTML code from an HTML
       syntax tree. It is especially useful if the produced HTML file is to
       be read or edited manually afterwards. Various parameters let you adapt
       the output to different styles and requirements.

       If you don't care what the HTML source looks like as long as it is valid
       and readable by browsers, you should use the as_HTML() method of
       HTML::Element instead of the pretty printer. It is about five times
       faster.

       The pretty printer handles line wrapping, indentation, and structuring,
       that is, the way the whitespace in the tree is represented in the
       output. Furthermore, upper/lowercase markup, markup minimization,
       quoting of attribute values, the encoding of entities, and the presence
       of optional end tags are configurable.

       There are two types of parameters that influence the output: individual
       parameters, which are set on a per-element and per-tag basis, and common
       parameters, which are set only once for each instance of a pretty
       printer.

       To facilitate configuration, a mechanism to handle tag groups is
       provided. Thus, it is possible to modify a parameter for a group of
       tags (e.g. all known block elements) without writing each tag name
       explicitly. Perhaps the code for tag groups will move to another
       Perl module in the future.

       For HTML::Elements that require a special treatment like	<PRE>, <XMP>,
       <SCRIPT>, comments and declarations, pretty printer will	fall back to
       the method "as_HTML()" of the HTML elements.

       The following individual parameters exist:

       indent n
           The indent of new lines inside the element is increased by n
           columns. Default is 2 for all tags.

       skip bool
           If true, the element and its content are omitted from the output.
           Default is false.

       nl_before n
	   Number of newlines before the start tag. Default is 0 for inline
	   elements and	1 for other elements.

       nl_inside n
	   Number of newlines between the tags and the contents	of an element.
	   Default is 0.

       nl_after	n
	   Number of newlines after an element.	Default	is 0 for inline
	   elements and	1 for other elements.

       force_nl	bool
	   Force linebreaks before and after an	element	even if	the HTML tree
	   does	not contain whitespace at this place. Default is false for
	   inline elements and true for	all other elements. This parameter is
	   superseded if the common parameter allow_forced_nl is set to	false.

       endtag bool
	   Print an optional endtag. Default is	true.

   Access Methods
       The following access methods exist for each individual parameter.
       Replace parameter by the respective name.

       $value = $hpp->parameter($element)
           Takes a reference to an HTML element as argument. Returns the value
           of the parameter for that element. The priority to retrieve the
           value is:

	   1.  The value of the	element's internal attribute "_hpp_parameter".

	   2.  The value specified inside the pretty printer for the tag of
	       the element.

	   3.  The value specified inside the pretty printer for 'default!'.
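
       The lookup order above can be sketched in plain Perl. This is only an
       illustration of the documented priority, not the module's
       implementation; the helper name lookup_param and the hash layout are
       assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: the element hash mimics an HTML::Element's
# internal attributes, the settings hash mimics the per-tag table that
# the pretty printer keeps for one parameter.
sub lookup_param {
    my ($param, $element, $settings) = @_;
    my $own = $element->{"_hpp_$param"};
    return $own if defined $own;                 # 1. element attribute
    my $by_tag = $settings->{ $element->{_tag} };
    return $by_tag if defined $by_tag;           # 2. value for the tag
    return $settings->{'default!'};              # 3. value for 'default!'
}

my %indent = ('default!' => 2, 'html' => 0);
print lookup_param('indent', { _tag => 'p' },    \%indent), "\n";  # default
print lookup_param('indent', { _tag => 'html' }, \%indent), "\n";  # per tag
print lookup_param('indent', { _tag => 'p', _hpp_indent => 8 }, \%indent), "\n";
```

       The last call shows why a value set directly on an element always
       wins over anything configured on the pretty printer.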

           Like "parameter($element)", except that only priorities 2 and 3 are
           evaluated.

       $hpp->set_parameter($value,'tag1','tag2',...)
           Sets the parameter for each tag in the list to $value.

           If $value is undefined, the entries for the tags are deleted.

           Besides individual tags, the list may include tag groups like
           '@BLOCK' (see below) and 'default!'. Individual tag names are
           written in lower case; the names of tag groups start with an '@'
           and are written in upper case letters. Tag groups are expanded
           during the call of "set_parameter()". 'default!' sets the
           default value, which is retrieved if no value is defined for the
           individual element or tag.

       $hpp->set_parameter($value,'all!')
           Deletes all existing settings for parameter inside the pretty
           printer and sets the default to $value.

       tabify n
           If non-zero, each n spaces at the beginning of a line are converted
           into one TAB. Default is 8.
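
       As a rough illustration of the rule above, the following sketch
       converts leading spaces to TABs. The function name tabify_line is
       hypothetical, and the handling of leftover spaces (kept as they are)
       is an assumption, not taken from the module.

```perl
use strict;
use warnings;

# Replace every full run of $n leading spaces with one TAB. Leftover
# spaces (fewer than $n) are kept. A value of 0 disables the conversion.
sub tabify_line {
    my ($line, $n) = @_;
    return $line unless $n;
    my ($lead, $rest) = $line =~ /^( *)(.*)$/s;
    my $tabs = int(length($lead) / $n);
    return ("\t" x $tabs) . (' ' x (length($lead) % $n)) . $rest;
}

# Ten leading spaces with n = 8 become one TAB plus two spaces.
print tabify_line((' ' x 10) . "<p>\n", 8);
```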

       linelength n
           The maximum number of characters a line should have. Default is 80.

	   The linelength may be exceeded if there is no proper	way to break a
	   line	without	modifying the content, e.g. inside <PRE> and other
	   special elements or if there	is no whitespace.
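
       The reason a line can exceed the limit is easiest to see in a small
       greedy wrapper that breaks at whitespace only. This is a simplified
       sketch under that one assumption; the name wrap_at_whitespace is
       hypothetical and the real pretty printer also considers tags and
       indentation.

```perl
use strict;
use warnings;

# Greedy wrap that breaks at whitespace only. A word longer than $max
# is kept whole, so the line holding it exceeds the limit.
sub wrap_at_whitespace {
    my ($text, $max) = @_;
    my @lines;
    my $line = '';
    for my $word (split ' ', $text) {
        if ($line ne '' && length($line) + 1 + length($word) > $max) {
            push @lines, $line;     # current line is full, start a new one
            $line = $word;
        }
        else {
            $line = $line eq '' ? $word : "$line $word";
        }
    }
    push @lines, $line if $line ne '';
    return @lines;
}

print "$_\n" for wrap_at_whitespace('abcdefghij x', 4);
```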

       min_bool_attr bool
	   Minimize boolean attributes,	e.g. print <UL COMPACT>	instead	of <UL
	   COMPACT=COMPACT>. Default is	true.

       quote_attr bool
           Always quote attribute values. If false, attribute values
           consisting entirely of letters, digits, periods and hyphens
           are not put into quotes. Default is false.
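
       The quoting rule can be expressed as a single pattern match. The
       sketch below only illustrates the rule as stated above; the function
       name format_attr_value is hypothetical, and escaping of quote
       characters inside the value is left out for brevity.

```perl
use strict;
use warnings;

# With $quote_attr false, only values made of letters, digits, periods
# and hyphens stay bare; anything else (or an empty value) is quoted.
sub format_attr_value {
    my ($value, $quote_attr) = @_;
    return $value if !$quote_attr && $value =~ /\A[A-Za-z0-9.\-]+\z/;
    return qq("$value");
}

print format_attr_value('LEFT', 0), "\n";          # stays bare
print format_attr_value('mailto:a@b', 0), "\n";    # ':' and '@' force quotes
print format_attr_value('LEFT', 1), "\n";          # quote_attr quotes everything
```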

       entities	string
           The string contains all characters that are escaped to their entity
           names. Default is the bare minimum of "&<>" plus the non-breaking
           space 'nbsp' (because otherwise it is difficult for the human eye
           to distinguish it from a normal space in most editors).

       wrap_at_tagend NEVER|AFTER_ATTR|ALWAYS
           May the pretty printer wrap lines before the closing angle bracket
           of a start tag? Supported values are the predefined constants NEVER
           (allow line wraps at white space only), AFTER_ATTR (allow line wraps
           at the end of tags that contain attributes only) and ALWAYS (allow
           line wraps at the end of every start tag). Default is AFTER_ATTR.

       allow_forced_nl bool
           Allow the addition of white space that is not in the HTML tree.
           If set to false (the default), the force_nl parameter is ignored.
           It is recommended to set this parameter to true if the HTML tree was
           generated with ignore_ignorable_whitespace set to true.

       uppercase bool
	   Use uppercase letters for markup. Default is	the value of
	   $HTML::Element::html_uc at the time the constructor is called.

   Access Method
       $value = $hpp->parameter([$value])
           Retrieves and optionally sets the parameter.

       $hpp = HTML::PrettyPrinter->new(%common_parameters)
           This class method creates a new HTML::PrettyPrinter and returns it.
           Key/value pair arguments may be provided to override the default
           settings of common parameters. There is currently no mechanism to
           override the default values for individual parameters at
           construction. Use the "$hpp->set_parameter()" methods instead.

       $hpp->select($fh)
           Select a FileHandle object for output.

           If a FileHandle is selected, the generated HTML is printed directly
           to that file. With $hpp->select(undef) you can switch back to the
           default behaviour.

       $line_array_ref = $hpp->format($tree,[$indent],[$line_array_ref])
	   Format the HTML syntax (sub-) tree.

	   $tree is not	restricted to the root of the HTML syntax tree.	A
	   reference to	any HTML::Element will do.

           The optional $indent indents the first element by n characters.

	   Return value	is the reference to an array with the generated	lines.
	   If such a reference is provided as third argument, the lines	will
	   be appended to that array. Otherwise	a new array will be created.

           If a FileHandle is selected by a previous call of the
           "$hpp->select($fh)" method, the lines are printed to the FileHandle
           object directly. The array of lines is not changed in this case.

       Tag groups are lists that contain the names of tags and other tag
       groups, which are considered as subsets. This reflects the way allowed
       content is specified in HTML DTDs, where e.g. %flow consists of all
       %block and %inline elements and %inline covers several subsets like
       %fontstyle and %phrase.

       If you add a tag name to a group A, it will be seen in any group that
       contains group A. Thus, it is easy to maintain groups of tags with
       similar properties (and to configure the HTML pretty printer for these
       tags).

       The names of tag	groups are written in upper case letters with a
       leading '@' (e.g. '@BLOCK'). The	names of simple	tags are written all
       lower case.

       All the functions to handle and modify tag groups are included in the
       @EXPORT_OK list of "HTML::PrettyPrinter".

       @tag_groups = list_groups()
           Returns a list with the names of all defined tag groups.

       @tags = group_expand('tag_or_tag_group0',['tag_or_tag_group1',...])
           Returns a list of every tag in the tag groups and their subgroups.
           Each tag is listed once only. The order of the list is not
           specified.
       @tag_groups = sub_group('tag_group0',['tag_group1',...])
	   Returns a list of every tag group and sub group in the list.	 Each
	   group is listed once	only. The order	of the list is not specified.

       @tags_and_groups = group_get('tag_group')
           Return the (unexpanded) contents of a tag group.

       group_set('tag_group','tag_or_tag_group0',...)
           Set a tag group.

       group_add('tag_group','tag_or_tag_group0',...)
           Add tags and tag groups to a group.

       group_remove('tag_group','tag_or_tag_group0',...)
           Remove tags or tag groups from a group. Subgroups are not expanded.
           Thus, "group_remove('@A','@B')" will remove '@B' from '@A' if it is
           included directly. Tags included in '@B' will not be removed from
           '@A'. Nor will '@A' be changed if '@B' is included in a subgroup
           of '@A' but not in '@A' directly.
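
       The expansion semantics described above (subgroups are followed, each
       tag is listed once) can be sketched without the module. The group
       table and the function name expand are hypothetical stand-ins, and
       the guard against groups that contain themselves is an assumption
       about sensible behaviour, not a statement about the module's code.

```perl
use strict;
use warnings;

# Hypothetical group table; names starting with '@' are groups.
my %groups = (
    '@B' => [qw(b1 b2)],
    '@D' => [qw(d1 @B d1)],
    '@E' => [qw(e1 @D)],
);

# Expand tags and groups recursively. Each tag is returned once, and a
# group that was already visited is skipped, so cyclic definitions end.
sub expand {
    my ($groups, @names) = @_;
    my (%seen_group, %seen_tag, @tags);
    my @todo = @names;
    while (@todo) {
        my $name = shift @todo;
        if ($name =~ /^\@/) {
            next if $seen_group{$name}++;
            unshift @todo, @{ $groups->{$name} || [] };
        }
        else {
            push @tags, $name unless $seen_tag{$name}++;
        }
    }
    return @tags;
}

print join(', ', expand(\%groups, '@E')), "\n";   # prints "e1, d1, b1, b2"
```

       The documentation leaves the order of the result unspecified, so code
       that uses the real group_expand() should not depend on the order this
       sketch happens to produce.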

   Predefined Tag Groups
       There are a couple of predefined tag groups. Use

         foreach my $tg (list_groups()) {
             print "'$tg' => qw(".join(',',group_get($tg)).")\n";
         }

       to get a list.

   Examples for tag groups
       1. create some groups
             group_set('@A',qw(a1 a2 a3));
             group_set('@B',qw(b1 b2));
             group_set('@C',qw(@A @B c1 @D));
             # @D needs to be defined when @C is expanded
             group_set('@D',qw(d1 @B));
             group_set('@E',qw(e1 @D));
             group_set('@F',qw(f1 @A));

       2. add tags
             group_add('@A',qw(a4 a5)); # @A contains (a1 a2 a3 a4 a5)
             group_add('@D',qw(d1));    # @D contains (d1 @B d1)
             # @F contains (f1 @A b1 b2 f1 @F)

       3. evaluate
             group_expand('@E');     # returns e1, d1, b1, b2
             sub_groups('@E');       # returns @B, @D
             sub_groups(qw(@E @F));  # returns @A, @B, @D
             group_get('@F');        # returns f1, @A, b1, b2, f1, @F

       4. remove tags
             group_remove('@E','@C');  # @E not changed, because it doesn't
                                       # contain @C
             group_remove('@E','@D');  # @D removed from @E
             group_remove('@D','d1');  # all d1's are removed. Now @D contains
                                       # @B only
             group_remove('@C','@B');  # @C now contains (@A c1 @D). Thus
             sub_groups('@C');         # still returns @A, @B, @D,
                                       # because @B is included in @D, too

       5. application
             # set the indent for tags b1, b2, e1, g1 to 0
             $hpp->set_indent(0,qw(@D @E g1));

	   If the groups @D or @E are modified afterwards, the configuration
	   of the pretty printer is not	affected, because "set_indent()" will
	   expand the tag groups.
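
       The point that groups are expanded at call time can be shown with a
       small stand-alone sketch. The names set_indent_sketch, %groups and
       %indent are hypothetical, and only one level of group nesting is
       expanded to keep the example short.

```perl
use strict;
use warnings;

my %groups = ('@D' => [qw(b1 b2)]);   # hypothetical group table
my %indent;                           # hypothetical per-tag settings

# Store the value under each tag name the group holds right now.
sub set_indent_sketch {
    my ($value, @names) = @_;
    for my $name (@names) {
        my @tags = $name =~ /^\@/ ? @{ $groups{$name} || [] } : ($name);
        $indent{$_} = $value for @tags;
    }
}

set_indent_sketch(0, '@D', 'g1');
push @{ $groups{'@D'} }, 'b3';        # the group changes afterwards

# b3 has no setting, because it joined the group after the call.
print join(', ', sort keys %indent), "\n";   # prints "b1, b2, g1"
```

       To apply a setting to tags added later, the set method has to be
       called again after the group was changed.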

       Consider the following HTML tree

           <html> @0
             <head> @0.0
               <title> @0.0.0
                 "Demonstrate HTML::PrettyPrinter"
             <body> @0.1
               <h1> @0.1.0
               <p align="JUSTIFY"> @0.1.1
                 "Some text in "
                 <b> @
                 " and "
                 <i> @
                 " and with 'ä' & 'ü'."
               <table align="LEFT" border=0> @0.1.2
                 <tr> @
                   <td align="RIGHT"> @
                     "top right"
                 <tr> @
                   <td align="LEFT"> @
                     "bottom left"
               <hr noshade="NOSHADE" size=5> @0.1.3
               <address> @0.1.4
                 <a href=""> @
                   "Claus Schotten"

       and

         $hpp = HTML::PrettyPrinter->new('uppercase' => 1);
         print @{$hpp->format($tree)};

       will print

               ALIGN=JUSTIFY>Some text in <B>bold</B> and
               <I>italics</I> and with 'ä' &amp; 'ü'.</P><TABLE
                   right</TD></TR><TR><TD ALIGN=LEFT>bottom
                   left</TD></TR></TABLE><HR NOSHADE SIZE=5
               ><ADDRESS><A HREF=""

       That doesn't look very nice. What went wrong? By default
       HTML::PrettyPrinter takes a conservative approach to whitespace. It
       will enlarge existing whitespace, but it will not introduce new
       whitespace outside of tags, because that might change the way a browser
       renders the HTML document. However, the HTML tree was constructed with
       "ignore_ignorable_whitespace" turned on. Thus, there is no whitespace
       between block elements that the pretty printer could format. So the
       pretty printer does line wrapping and indentation only. E.g. the title
       is in the third level of the tree. Thus, the second line is indented six
       characters. The table cells in the fifth level are indented by ten
       characters. Furthermore, you see that there is a whitespace inserted
       after the last attribute of the <A> tag.

       Let's set $hpp->allow_forced_nl(1). Now the force_nl parameters are
       enabled. By default, they are set for all non-inline tags. That creates

            <TITLE>Demonstrate HTML::PrettyPrinter</TITLE>
            <P ALIGN=JUSTIFY>Some text in <B>bold</B> and
              <I>italics</I> and with 'ä' &amp; 'ü'.</P>
                <TD ALIGN=RIGHT>top right</TD>
                <TD ALIGN=LEFT>bottom left</TD>
            <ADDRESS><A HREF=""

       Much better, isn't it? Now let's improve the structuring.

         $hpp->set_nl_before(2,qw(body table));
         $hpp->set_nl_after(2,qw(table));

       will require two new lines in front of <body> and <table> tags and
       after <table> tags.

            <TITLE>Demonstrate HTML::PrettyPrinter</TITLE>

            <P ALIGN=JUSTIFY>Some text in <B>bold</B> and
              <I>italics</I> and with 'ä' &amp; 'ü'.</P>

                <TD ALIGN=RIGHT>top right</TD>
                <TD ALIGN=LEFT>bottom left</TD>

            <ADDRESS><A HREF=""

       Currently the mail address is the only attribute value that is quoted.
       Here the quotes are required by the '@' character. For all other
       attribute values quotes are optional and thus omitted by default.
       $hpp->quote_attr(1); will turn the quotes on.

       $hpp->set_endtag(0,'all!') turns	all optional endtags off.  This
       affects the </p>	(and should affect </tr> and </td>, see	below).
       Alternatively, we could use $hpp->set_endtag(0,'default!'). That	would
       turn the	default	off, too. But it wouldn't delete settings for
       individual tags that supersede the default.

       $hpp->set_nl_after(3,'head') requires three new lines after the <head>
       element.	Because	there are already two new lines	required by the	start
       of <body> only one additional line is added.
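
       The behaviour described here suggests that the requests of two
       adjacent elements are not added up; the larger one wins. A minimal
       sketch of that reading follows. The function name newlines_between is
       hypothetical, and the rule is inferred from the example above rather
       than taken from the module's code.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Newlines between two adjacent elements: the larger of the two
# requests, not their sum.
sub newlines_between {
    my ($nl_after_prev, $nl_before_next) = @_;
    return max($nl_after_prev, $nl_before_next);
}

# </head> asks for 3 new lines, <body> for 2: the gap is 3, not 5.
print newlines_between(3, 2), "\n";
```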

       $hpp->set_force_nl(0,'td') will inhibit the introduction of whitespace
       around <td>. Thus, the table cells are now on the same line as the
       table rows.

             <TITLE>Demonstrate HTML::PrettyPrinter</TITLE>

             <P ALIGN="JUSTIFY">Some text in <B>bold</B> and
               <I>italics</I> and with 'ä' &amp; 'ü'.

               <TR><TD ALIGN="RIGHT">top right</TD></TR>
               <TR><TD ALIGN="LEFT">bottom left</TD></TR>

             <HR NOSHADE SIZE="5">
             <ADDRESS><A HREF=""

       The end tags </td> and </tr> are printed because HTML::Tagset says they
       are mandatory.

         map {$HTML::Tagset::optionalEndTag{$_}=1} qw(td tr th);

       will fix that.
       The additional new line after </head> doesn't look nice.	With
       $hpp->set_nl_after(undef,'head')	we will	reset the parameter for	the
       <head> tag.

       $hpp->entities($hpp->entities().'ä'); will enforce the entity encoding
       of 'ä'.

       $hpp->min_bool_attr(0); will inhibit the minimization of the NOSHADE
       attribute of <hr>.

       Let's fiddle with the indentation:

       New lines inside text blocks (here inside <h1>, <p> and <address>) will
       be indented by 8 characters instead of two, whereas the code directly
       under <html> will not be indented.

          <TITLE>Demonstrate HTML::PrettyPrinter</TITLE>

          <P ALIGN="JUSTIFY">Some text in <B>bold</B> and
                  <I>italics</I> and with '&auml;' &amp; 'ü'.

            <TR><TD ALIGN="RIGHT">top right
            <TR><TD ALIGN="LEFT">bottom left


       $hpp->wrap_at_tagend(HTML::PrettyPrinter::NEVER); will disable the line
       wrap between the attribute and the '>' of the <a> tag. The resulting
       line exceeds the target line length by far, but there is no point left
       where the pretty printer could legally break this line.

       $hpp->set_endtag(1,'tr') will override the default. Thus, the </tr>
       appears in the code whereas the other optional endtags are still
       omitted.
       Finally,	we customize some individual elements:

           will omit the <p> and its content from the output.

           will force new lines around the second <td>, but will not affect
           the first <td>.

	  <TITLE>Demonstrate HTML::PrettyPrinter</TITLE>


	    <TR><TD ALIGN="RIGHT">top right</TR>
	      <TD ALIGN="LEFT">bottom left


       o   This is early alpha code. The interfaces are subject to change.

       o   The module is tested	with perl 5.005_03 only. It should work	with
	   perl	5.004 though.

       o   The predefined tag groups are incomplete. Several tags need to be

       o   Attribute values from a fixed set given in the DTD (e.g.
	   ALIGN=LEFT|RIGHT etc.) should be converted to upper or lower	case
	   depending on	the value of the uppercase parameter. Currently, they
	   are printed as given	in the HTML tree.

       o   No optimization for performance was done.

       HTML::TreeBuilder, HTML::Element, HTML::Tagset

       Copyright 2000 Claus Schotten

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

       Claus Schotten <>


perl v5.32.1			  2000-09-15		      PrettyPrinter(3)

