Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages


home | help
HTML2WML(1)		    Html2Wml Documentation		   HTML2WML(1)

       Html2Wml	-- Program that	can convert HTML pages to WML pages

       Html2Wml	can be used as either a	shell command:

	 $ html2wml file.html

       or as a CGI:


       In both cases, the file can be either a local file or a URL.

       Html2Wml	converts HTML pages to WML decks, suitable for being viewed on
       a Wap device. The program can be	launched from a	shell to statically
       convert a set of	pages, or as a CGI to convert a	particular
       (potentially dynamic) HTML resource.

       Althought the result is not guarantied to be valid WML, it should be
       the case	for most pages.	Good HTML pages	will most probably produce
       valid WML decks.	To check and correct your pages, you can use W3C's
       softwares: the HTML Validator, available	online at and HTML	Tidy, written by Dave Raggett.

       Html2Wml	provides the following features:

       o   translation of the links

       o   limitation of the cards size	by splitting the result	into several

       o   inclusion of	files (similar to the SSI)

       o   compilation of the result (using the	WML Tools, see "LINKS")

       o   a debug mode	to check the result using validation functions

       Please note that	most of	these options are also available when calling
       Html2Wml	as a CGI. In this case,	boolean	options	are given the value
       "1" or "0", and other options simply receive the	value they expect. For
       example,	"--ascii" becomes "?ascii=1" or	"?a=1".	See the	file
       t/form.html for an example on how to call Html2Wml as a CGI.

   Conversion Options
       -a, --ascii
	   When	this option is on, named HTML entities and non-ASCII
	   characters are converted to US-ASCII	characters using the same 7
	   bit approximations as Lynx. For example, "©" is	translated to
	   "(c)", and "ß"	is translated to "ss". This option is off by

	   This	option tells Html2Wml to collapse redundant whitespaces,
	   tabulations,	carriage returns, lines	feeds and empty	paragraphs.
	   The aim is to reduce	the size of the	WML document as	much as
	   possible. Collapsing	empty paragraphs is necessary for two reasons.
	   First, this avoids empty screens (and on a device with only 4 lines
	   of display, an empty	screen can be quite ennoying). Second,
	   Html2wml creates many empty paragraphs when converting, because of
	   the way the syntax reconstructor is programmed.  Deleting these
	   empty paragraphs is necessary like cleaning the kitchen :-)

	   If this really bother you, you can desactivate this behaviour with
	   the --nocollapse option.

	   This	option tells Html2Wml to completly ignore all image links.

	   This	option tells Html2Wml to replace the image tags	with their
	   corresponding alternative text (as with a text mode web browser).
	   This	option is on by	default.

	   This	option is on by	default. This makes Html2Wml flattens the HTML
	   tables (they	are linearized), as Lynx does. I think this is better
	   than	trying to use the native WML tables. First, they have
	   extremely limited features and possibilities	compared to HTML
	   tables. In particular, they can't be	nested.	In fact	this is	normal
	   because Wap devices are not supposed	to have	a big CPU running at
	   some	zillions-hertz,	and the	calculations needed to render the
	   tables are the most complicated and CPU-hogger part of HTML.

	   Second, as they can't be nested, and	as typical HTML	pages heavily
	   use imbricated tables to create their layout, it's impossible to
	   decide which	one could be kept. So the best thing is	to keep	none
	   of them.

	   [Note] Although you can desactivate this behaviour, and although
	   there is internal support for tables, the unlinearized mode has not
	   been	heavily	tested with nested tables, and it may produce
	   unexpected results.

       -n, --numeric-non-ascii
	   This	option tells Html2wml to convert all non-ASCII characters to
	   numeric entities, i.e., "e" becomes "é", and "ss" becomes
	   "ß".  By default, this option is off.

       -p, --nopre
	   This	options	tells Html2Wml not to use the <pre> tag. This option
	   was added because the compiler from WML Tools 0.0.4 doesn't support
	   this	tag.

   Links Reconstruction	Options
	   This	options	sets the template that will be used to reconstruct the
	   "href"-type links. See "LINKS RECONSTRUCTION" for more information.

	   This	option sets the	template that will be used to reconstruct the
	   "src"-type links. See "LINKS	RECONSTRUCTION"	for more information.

   Splitting Options
       -s, --max-card-size=SIZE
	   This	option allows you to limit the size (in	bytes) of the
	   generated cards. Default is 1,500 bytes, which should be small
	   enought to be loaded	on most	Wap devices. See "DECK SLICING"	for
	   more	information.

       -t, --card-split-threshold=SIZE
	   This	option sets the	threshold of the split event, which can	occur
	   when	the size of the	current	card is	between	"max-card-size"	-
	   "card-split-threshold" and "max-card-size". Default value is	50.
	   See "DECK SLICING" for more information.

	   This	options	sets the label of the link that	points to the next
	   card.  Default is "[&gt;&gt;]", which whill be rendered as "[>>]".

	   This	options	sets the label of the link that	points to the previous
	   card.  Default is "[&lt;&lt;]", which whill be rendered as "[<<]".

   HTTP	Authentication
       -U, --http-user=USERNAME
	   Use this option to set the username for an authenticated request.

       -P, --http-passwd=PASSWORD
	   Use this option to set the password for an authenticated request.

   Proxy Support
       -[no]Y, --[no]proxy
	   Use this option to activate proxy support. By default, proxy
	   support is activated. See "PROXY SUPPORT".

   Output Options
       -k, --compile
	   Setting this	option tells Html2Wml to use the compiler from WML
	   Tools to compile the	WML deck. If you want to create	a real Wap
	   site, you should seriously use this option in order to reduce the
	   size	of the WML decks.  Remember that Wap devices have very little
	   amount of memory. If	this is	not enought, use the splitting

	   Take	a look in wml_compilation/ for more information	on how to use
	   a WML compiler with Html2Wml.

       -o, --output
	   Use this option (in shell mode) to specify an output	file.  By
	   default, Html2Wml prints the	result to standard output.

   Debugging Options
       -d, --debug[=LEVEL]
	   This	option activates the debug mode. This prints the output	result
	   with	line numbering and with	the result of the XML check. If	the
	   WML compiler	was called, the	result is also printed in hexadecimal
	   an ascii forms. When	called as a CGI, all of	this is	printed	as
	   HTML, so that can use any web browser for that purpose.

	   When	this option is on, it send the WML output to XML::Parser to
	   check its well-formedness.

       The deck	slicing	is a feature that Html2Wml provides in order to	match
       the low memory capabilities of most Wap devices.	Many can't handle
       cards larger than 2,000 bytes, therefore	the cards must be sufficiently
       small to	be viewed by all Wap devices. To achieve this, you should
       compile your WML	deck, which reduce the size of the deck	by 50%,	but
       even then your cards may	be too big. This is where Html2Wml comes with
       the deck	slicing	feature. This allows you to limit the size of the
       cards, currently	only before the	compilation stage.

   Slice by cards or by	decks
       On some Wap phones, slicing the deck is not sufficient: the WML browser
       still tries to download the whole deck instead of just picking one card
       at a time. A solution is	to slice the WML document by decks.  See the
       figure below.

	    _____________	   _____________
	   |	deck	 |	  |   deck #1	|
	   |  _________	 |	  |  _________	|
	   | | card #1 | |	  | |  card   |	|
	   | |_________| |	  | |_________|	|
	   |  _________	 |	  |_____________|
	   | | card #2 | |
	   | |_________| |	       . . .
	   |  _________	 |
	   | |	 ...   | |	   _____________
	   | |_________| |	  |   deck #n	|
	   |  _________	 |	  |  _________	|
	   | | card #n | |	  | |  card   |	|
	   | |_________| |	  | |_________|	|
	   |_____________|	  |_____________|

	     WML document	    WML	document
	   sliced by cards	  sliced by decks

       What this means is that Html2Wml	generates several WML documents.  In
       CGI mode, only the appropriate deck is sent, selected by	the id given
       in parameter. If	no id was given, the first deck	is sent.

   Note	on size	calculation
       Currently, Html2Wml estimates the size of the card on the fly, by
       summing the length of the strings that compose the WML output, texts
       and tags. I say "estimates" and not "calculates"	because	computing the
       exact size would	require	many more calculations than the	way it is done
       now.  One may objects that there	are only additions, which is correct,
       but knowing the exact size is not necessary. Indeed, if you compile the
       WML, most of the	strings	of the tags will be removed, but not all.

       For example, take an image tag: "<img src="images/dog.jpg" alt="Photo
       of a dog">".  When compiled, the	string "img" will be replaced by a one
       byte value.  Same thing for the strings "src" and "alt",	and the
       spaces, double quotes and equal signs will be stripped. Only the	text
       between double quote will be preserved... but not in every cases.
       Indeed, in order	to go a	step further, the compiler can also encode
       parts of	the arguments as binary. For example, the string "http://www."
       can be encoded as a single byte ("8F" in	this case). Or,	if the
       attribute is "href", the	string "href="http://" can become the byte

       As you see, it doesn't matter to	know exactly the size of the textual
       form of the WML,	as it will always be far superior to the size of the
       compiled	form. That's why I don't count all the characters that may be
       actually	written.

       Also, it's because I'm quite lazy ;-)

   Why compiling the WML deck?
       If you intent to	create real WML	pages, you should really consider to
       always compile them. If you're not convinced, here is an	illustration.

       Take the	following WML code snipet:

	   <a href=''>Yahoo!</a>

       It's the	basic and classical way	to code	an hyperlink. It takes 42
       bytes to	code this, because it is presented in a	human-readable form.

       The WAP Forum has defined a compact binary representation of WML	in its
       specification, which is called "compiled	WML". It's a binary format,
       therefore you, a	mere human, can't read that, but your computer can.
       And it's	much faster for	it to read a binary format than	to read	a
       textual format.

       The previous example would be, once compiled (and printed here as

	   1C 4A 8F 03 y a h o o 00 85 01 03 Y a h o o ! 00 01

       This only takes 21 bytes. Half the size of the human-readable form.
       For a Wap device, this means both less to download, and easier things
       to read.	Therefore the processing of the	document can be	achieved in a
       short time compared to the tectual version of the same document.

       There is	a last argument, and not the less important: many Wap devices
       only read binary	WML.

       Actions are a feature similar to	(but with far less functionalities!)
       the SSI (Server Side Includes) available	on good	servers	like Apache.
       In order	not to interfere with the real SSI, but	to keep	the syntax
       easy to learn, it differs in very few points.

       Basically, the syntax to	execute	an action is:

	   <!--	[action	param1="value" param2='value'] -->

       Note that the angle brackets are	part of	the syntax. Except for that
       point, Actions syntax is	very similar to	SSI syntax.

   Available actions
       Only few	actions	are currently available, but more can be implemented
       on request.

		   Includes a file in the document at the current point.
		   Please note that Html2Wml doesn't check nor parse the file,
		   and if the file cannot be found, will silently die (this is
		   the same behavior as	SSI).

		   "virtual=url" -- The	file is	get by http.

		   "file=path" -- The file is read from	the local disk.

		   Returns the size of a file at the current point of the

		   "virtual=url" -- The	file is	get by http.

		   "file=path" -- The file is read from	the local disk.

	   Notes   If you use the file parameter, an absolute path is

		   Skips everything until the first "end_skip" action.

   Generic parameters
       The following parameters	can be used for	any action.

       for=output format
	   This	paramater restricts the	action for the given output format.
	   Currently, the only available format	is ""wml"" (when using
	   "html2chtml"	the format is ""chtml"").

       If you want to share a navigation bar between several WML pages,	you
       can "include" it	this way:

	   <!--	[include virtual="nav.wml"] -->

       Of course, you have to write this navigation bar	first :-)

       If you want to use your current HTML pages for creating your WML	pages,
       but that	they contains complex tables, or unecessary navigation tables,
       etc, you	can simply "skip" the complex parts and	keep the rest.

	   <!--[skip for="wml"]-->
	   unecessary parts for	the WML	pages
	   useful parts	for the	WML pages

       The links reconstruction	engine is IMHO the most	important part of
       Html2Wml, because it's this engine that allows you to reconstruct the
       links of	the HTML document being	converted. It has two modes, depending
       upon whether Html2Wml was launched from the shell or as a CGI.

       When used as a CGI, this	engine will reconstructs the links of the HTML
       document	so that	all the	urls will be passed to Html2Wml	in order to
       convert the pointed files (pages	or images). This is completly
       automatic and can't be customized for now (but I	don't think it would
       be really useful).

       When used from the shell, this engine reconstructs the links with the
       given templates.	Note that absolute URLs	will be	left untouched.	The
       templates can be	customized using the following syntax.

       HREF Template
	   This	template controls the reconstruction of	the "href" attribute
	   of the "A" tag. Its value can be changed using the --hreftmpl
	   option.  Default value is "{FILEPATH}{FILENAME}{$FILETYPE =~
	   s/s?html?/wml/o; $FILETYPE}".

       Image Source Template
	   This	template controls the reconstruction of	the "src" attribute of
	   the "IMG" tag. Its value can	be changed using the --srctmpl option.
	   Default value is "{FILEPATH}{FILENAME}{$FILETYPE =~
	   s/gif|png|jpe?g/wbmp/o; $FILETYPE}"

       The template is a string	that contains the new URL. More	precisely,
       it's a Text::Template template. Parameters can be interpolated as a
       constant	or as a	variable. The template is embraced between curcly
       bracets,	and can	contain	any valid Perl code.

       The simplest form of a template is "{PARAM}" which just returns the
       value of	PARAM. If you want to do something more	complex, you can use
       the corresponding variable; for example "{"foo $PARAM bar"}", or	"{join
       "_", split " ", PARAM}".

       You may read Text::Template for more information	on what	is possible
       within a	template.

       If the original URL contained a query part or a fragment	part, then
       they will be appended to	the result of the template.

   Available parameters
       URL This	parameter contains the original	URL from the "href" or "src"

	   This	parameter contains the base name of the	file.

	   This	parameter contains the leading path of the file.

	   This	parameter contains the suffix of the file.

       This can	be resumed this	way:

	 URL =
				    ------------^^^^ ----
					|	 |     \
					|	 |	\

       Note that "FILETYPE" contains all the extensions	of the file, so	if its
       name is for example, "FILETYPE" contains """".

       To add a	path option:


       Using Apache, you can then add a	Rewrite	directive so that URL ending
       with $wap will be redirected to Html2Wml:

	   RewriteRule	^(/.*)\$wap$  /cgi-bin/html2wml.cgi?url=$1

       To change the extension of an image:


       Html2Wml	uses LWP built-in proxy	support. It is activated by default,
       and loads the proxy settings from the environment variables, using the
       same variables as many others programs. Each protocol (http, ftp, etc)
       can be mapped to	use a proxy server by setting a	variable of the	form
       "PROTOCOL_proxy".  Example: use "http_proxy" to define the proxy	for
       http access, "ftp_proxy"	for ftp	access.	In the shell, this is only a
       matter of defining the variable.

       For Bourne shell:

	   $ export http_proxy=""

       For C-shell:

	   % setenv http_proxy ""

       Under Apache, you can add this directive	to your	configuration file:

	   SetEnv http_proxy ""

       but this	has the	default	that another CGI, or another program, can use
       this to access external ressources. A better way	is to edit Html2Wml
       and fill	the option "proxy-server" with the appropriate value.

       Html2Wml	tries to make correct WML documents, but the well-formedness
       and the validity	of the document	are not	guarantied.

       Inverted	tags (like "<b>bold <i>italic</b></i>")	may produce unexpected
       results.	But only bad softwares do bad stuff like this.

	   This	is the web site	of the Html2Wml	project, hosted	by  All the stable releases can be downloaded from
	   this	site.

	   [ ]

	   This	is the web site	of the author, where you can find the archives
	   of all the releases of Html2Wml.

	   [ ]

       The WAP Forum
	   This	is the official	site of	the WAP	Forum. You can find some
	   technical information, as the specifications	of all the
	   technologies	associated with	the WAP.

	   [ ]
	   This	site has some useful information and links. In particular, it
	   has a quite well done FAQ.

	   [ ]

       The World Wide Web Consortium
	   Altough not directly	related	to the Wap stuff, you may find useful
	   to read the specifications of the XML (WML is an XML	application),
	   and the specifications of the different stylesheet languages	(CSS
	   and XSL), which include support for low-resolution devices.

	   [	]

	   This	web site is dedicated to Mobile	UniX systems. It leads you to
	   a lot of useful hands-on information	about installing and running
	   Linux and BSD on laptops, PDAs and other mobile computer devices.

	   [ ]

   Programmers utilities
       HTML Tidy
	   This	is a very handful utility which	corrects your HTML files so
	   that	they conform to	W3C standards.

	   [ ]

	   Kannel is an	open source Wap	and SMS	gateway.  A WML	compiler is
	   included in the distribution.

	   [ ]

       WML Tools
	   This	is a collection	of utilities for WML programmers. This include
	   a compiler, a decompiler, a viewer and a WBMP converter.

	   [ ]

   WML browsers	and Wap	emulators
	   Opera is originaly a	Web browser, but the version 5 has a good
	   support for XML and WML. Opera is available for free	for several

	   [ ]

	   wApua is an open source WML browser written in Perl/Tk.  It's easy
	   to intall and to use. Its support for WML is	incomplete, but
	   sufficient for testing purpose.

	   [ ]

	   Tofoa is an open source Wap emulator	written	in Python.  Its
	   installation	is quite difficult, and	its incomplete WML support
	   makes it produce strange results, even with valid WML documents.

	   [ ]

	   EzWAP, from EZOS, is	a commercial WML browser freely	available for
	   Windows 9x, NT, 2000	and CE.	Compared to others Windows WML
	   browsers, it	requires very few resources, and is quite stable. Its
	   support for the WML specs seems quite complete. A very good

	   [ ]

	   Deck-It is a	commercial Wap phone emulator, available for Windows
	   and Linux/Intel only. It's a	very good piece	of software which
	   really show how WML pages are rendered on a Wap phone, but one of
	   its major default is	that it	cannot read local files.

	   [ ]

       Klondike	WAP Browser
	   Klondike WAP	Browser	is a commercial	WAP browser available for
	   Windows and PocketPC.

	   [ ]

	   WinWAP is a commercial Wap browser, freely available	for Windows.

	   [ ]

	   WAPman from EdgeMatrix, is a	commercial WAP browser available for
	   Windows and PalmOS.


       Wireless	Companion
	   Wireless Companion, from, is a WAP emulator available
	   for Windows.

	   [ ]

	   Mobilizer is	a Wap emulator available for Windows and Unix.

	   [ ]

	   QWmlBrowser (formerly known as WML BRowser) is an open source WML
	   browser, written using the Qt toolkit.

	   [	]

	   Wapsody, developed by IBM, is a freely available simulation
	   environment that implements the WAP specification. It also features
	   a WML browser which can be run stand-alone.	As Wapsody is written
	   in Java/Swing, it should work on any	system.

	   [ ]

	   WAPreview is	a Wap emulator written in Java.	As it uses an HTML
	   based UI and	needs a	local web proxy, it runs quite slowly.

	   [ ]

	   PicoWap is a	small WML browser made by three	French students.

	   [ ]

       Werner Heuser, for his numerous ideas, advices and his help for the

       Igor Khristophorov, for his numerous suggestions	and patches

       And all the people that send me bug reports: Daniele Frijia, Axel
       Jerabek,	Ouyang

       Sebastien Aperghis-Tramoni <<gt>

       Copyright (C)2000, 2001,	2002 Sebastien Aperghis-Tramoni

       This program is free software. You can redistribute it and/or modify it
       under the terms of the GNU General Public License, version 2 or later.

0.4.0				  2003-10-26			   HTML2WML(1)


Want to link to this manual page? Use this URL:

home | help