HXPIPE(1)			HTML-XML-utils			     HXPIPE(1)

       hxpipe -	convert	XML file to a format easier to parse with Perl or AWK

       hxpipe [	-l ] [ -- ] [ file-or-URL ]

       hxpipe parses an	HTML or	XML file and outputs a line-oriented represen-
       tation of it that is well suited	to further processing with AWK or sim-
       ilar tools. The format is similar to the	ESIS (Element Structure	Infor-
       mation Set) that	is output by nsgmls/onsgmls.

       The reverse operation, converting back to mark-up, is performed by  the
       hxunpipe	program.

       The output format is as follows:

       <!--comment-->
                 Comments are output as

                     *comment

		 I.e., a single	line starting with "*" followed	by the text of
		 the comment. Line feeds, carriage returns  and	 tabs  in  the
		 text  are  written as "\n", "\r" and "\t", respectively. Text
		 that looks like a numerical character entity is written  with
		 the "&" replaced by "\".  The line ends with a	line feed.

                 Note that onsgmls outputs comments starting with a "_"
                 instead of a "*" and doesn't replace the "&" of numerical
                 character entities by "\" (and by default it omits comments
                 altogether).

       <?processing instruction>
		 Processing instructions are output as

		     ?processing instruction

		 I.e., a single	line starting with a "?" followed by the  text
		 of  the  processing  instruction.  The	text is	escaped	as for
		 comments (see above).

       <!DOCTYPE root PUBLIC "-//foo//DTD bar//EN" "system-identifier">
                 DOCTYPEs are output as one of the following:

                     !root "-//foo//DTD bar//EN" system-identifier
                     !root "-//foo//DTD bar//EN"
                     !root "" system-identifier
                     !root ""

		 for respectively: a DOCTYPE with (1) both a public and	a sys-
		 tem identifier, (2) only a public identifier, (3) only	a sys-
		 tem identifier, or (4)	neither	of the	two.  I.e.,  a	single
		 line  starting	with a "!", followed by	a space	and a possibly
		 empty quoted string, followed optionally by a space and arbi-
		 trary text. Note the quotes for the public identifier and the
		 absence of quotes for the system identifier.

       <elt att1="value1" att2="value2">
		 A start tag is	output as

                     Aatt1 CDATA value1
                     Aatt2 CDATA value2
                     (elt

		 I.e., as zero or more lines for the attributes	and  one  line
		 for  the element type.	Each line for an attribute starts with
		 "A" followed by the name of the attribute, a space, the  lit-
		 eral  string "CDATA", another space, and the attribute	value.
		 The text of the attribute value is escaped  as	 for  comments
		 (see  above).	The  line for the element type starts with "("
		 followed by the element type.

		 hxpipe	does not read DTDs and assumes that attributes are al-
		 ways  CDATA.  It never	generates other	types (IMPLIED,	TOKEN,
		 ID, etc.), unlike onsgmls.

       </elt>    End tags are output as

                     )elt

                 I.e., as a line starting with ")" followed by the element
                 type.

       <empty att1="val1" att2="val2"/>
		 Empty elements	(in XML) are output as

                     Aatt1 CDATA val1
                     Aatt2 CDATA val2
                     |empty

		 I.e.,	as  zero  or  more  lines  for attributes and one line
		 starting with "|" followed by the element type.

		 Note that onsgmls never outputs "|". (However,	it can option-
		 ally output a line consisting of a single "e" just before the
		 "(" line, to indicate that the	element	is empty.)

       text      Text is output as

                     -text

		 I.e., as a single line	starting with a	"-". The text  is  es-
		 caped as for comments (see above).

       line numbers
                 When the -l option is in effect, hxpipe will intersperse the
                 output with lines of the form

                     L12

		 where "12" is replaced	with the line  number  in  the	source
		 where the next	output came from.
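
       The line types described above can be read back with a few lines of
       code. The following is only a minimal sketch in Python, not part of
       hxpipe itself: the names unescape and parse_line and the sample lines
       are hypothetical, and it assumes that only the four escape sequences
       named above occur (how a literal backslash is written is not
       described here, so it is left untouched).

```python
import re

# Escape sequences named in the description above; anything else is kept as-is.
_ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "#": "&#"}


def unescape(text):
    """Undo the escaping described for comments, text and attribute values."""
    return re.sub(r"\\([nrt#])", lambda m: _ESCAPES[m.group(1)], text)


def parse_line(line):
    """Split one output line into (kind, payload); kind is the first character."""
    line = line.rstrip("\n")      # other trailing whitespace is data, keep it
    kind, rest = line[:1], line[1:]
    if kind == "A":
        # "Aname CDATA value": the value may contain spaces or be empty
        name, atype, value = (rest.split(" ", 2) + ["", ""])[:3]
        return kind, (name, atype, unescape(value))
    if kind in ("*", "?", "-"):
        return kind, unescape(rest)
    return kind, rest             # "(", ")", "|", "!" and "L" lines


# Hypothetical sample lines, written the way the format above describes them.
sample = [
    "L1\n",
    "Ahref CDATA a\\tb\n",
    "(a\n",
    "-caf\\#233;\\n\n",
    ")a\n",
]
events = [parse_line(line) for line in sample]
for event in events:
    print(event)
```

       Only the final line feed is stripped from each line, because trailing
       spaces can be part of a text or attribute value.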

       hxpipe does not normalize the input and does not add missing tags. It is
       thus possible that there	are unequal numbers of "(" and ")"  lines.  If
       it is important that every start	tag is matched by an end tag, pipe the
       input through hxnormalize -x first.
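
       Whether a given output has matching "(" and ")" lines can be checked
       with a small stack. This is a sketch under stated assumptions, not a
       feature of hxpipe: the function name unmatched_tags is hypothetical
       and the sample lines are made up to resemble output for markup with
       an omitted end tag.

```python
def unmatched_tags(lines):
    """Return (unclosed, stray): element types of "(" lines that were never
    closed, and of ")" lines that did not match the innermost open element."""
    stack, stray = [], []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("("):
            stack.append(line[1:])
        elif line.startswith(")"):
            if stack and stack[-1] == line[1:]:
                stack.pop()
            else:
                stray.append(line[1:])
    return stack, stray


# Hypothetical output for something like <p>one<p>two</p>
result = unmatched_tags(["(p\n", "-one\n", "(p\n", "-two\n", ")p\n"])
print(result)
```

       "|" lines are ignored here because an empty element needs no end tag.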

       The following options are supported:

       -l	 Add "L" lines to the output to	indicate the line  numbers  in
		 the source.
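
       With -l, each output line can be paired with a source line number. A
       minimal sketch follows; the function name with_line_numbers is
       hypothetical, and it assumes that an "L" line applies to all output
       up to the next "L" line.

```python
def with_line_numbers(lines):
    """Yield (source_line, output_line) pairs from output produced with -l.

    source_line is None for output that precedes the first "L" line.
    """
    current = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("L") and line[1:].isdecimal():
            current = int(line[1:])   # remember the number, emit nothing
        else:
            yield current, line


# Hypothetical sample output
pairs = list(with_line_numbers(["L1", "(p", "-hi", "L2", ")p"]))
print(pairs)
```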

       The following operand is	supported:

       file-or-URL
                 The name or URL of an HTML file. If absent, standard input is
                 read instead.

       The following exit values are returned:

       0	 Successful completion.

       > 0	 An error occurred in the parsing of the  HTML	file.	hxpipe
		 will try to correct the error and produce output anyway.
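
       Because output is still produced when the exit value is greater than
       0, a calling script may want to keep the output and only report the
       error. The following Python sketch shows one way to do that; the
       function name run_filter is hypothetical, and treating the streams as
       UTF-8 is an assumption, not something stated above.

```python
import subprocess
import sys


def run_filter(argv, data):
    """Run argv (for example ["hxpipe"]) with data on standard input.

    Returns (ok, lines); lines are kept even when the exit value is not 0.
    """
    proc = subprocess.run(
        argv,
        input=data,
        capture_output=True,
        text=True,
        encoding="utf-8",   # assumption: input and output are UTF-8
        errors="replace",
    )
    if proc.returncode != 0:
        print(f"warning: {argv[0]} exited with {proc.returncode}",
              file=sys.stderr)
    lines = proc.stdout.split("\n")
    if lines and lines[-1] == "":
        lines.pop()         # drop the empty piece after the final line feed
    return proc.returncode == 0, lines


# Possible use, if hxpipe is installed:
#   ok, lines = run_filter(["hxpipe"], "<p>hello</p>")
```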

       To  use a proxy to retrieve remote files, set the environment variables
       http_proxy and ftp_proxy.  E.g.,	http_proxy="http://localhost:8080/"
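
       When hxpipe is started from a script, the proxy variables can be set
       for that one child process only. A minimal sketch, in which the
       function name proxy_env and the PATH value are hypothetical:

```python
import os


def proxy_env(http_proxy, ftp_proxy=None, base=None):
    """Return a copy of the environment with the proxy variables set."""
    env = dict(os.environ if base is None else base)
    env["http_proxy"] = http_proxy
    if ftp_proxy is not None:
        env["ftp_proxy"] = ftp_proxy
    return env


env = proxy_env("http://localhost:8080/", base={"PATH": "/usr/bin"})
print(env["http_proxy"])
```

       The resulting dictionary can be passed as the env argument of
       subprocess.run.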

       The error recovery for incorrect HTML is primitive. hxnormalize can
       currently only retrieve remote files over HTTP. It doesn't handle
       password-protected files, nor files whose content depends on HTTP
       "cookies."

       hxunpipe(1), onsgmls(1).

7.x				  10 Jul 2011			     HXPIPE(1)

