XML::UM(3)	      User Contributed Perl Documentation	    XML::UM(3)

NAME
       XML::UM - Convert UTF-8 strings to any encoding supported by
       XML::Encoding

SYNOPSIS
	use XML::UM;

	# Set the directory with the .xml files that come with the
	# XML::Encoding distribution.
	# Always include the trailing slash!
	$XML::UM::ENCDIR = '/home1/enno/perlModules/XML-Encoding-1.01/maps/';

	# Create the encoding routine
	my $encode = XML::UM::get_encode (
	       Encoding	=> 'ISO-8859-2',
	       EncodeUnmapped => \&XML::UM::encode_unmapped_dec);

	# Convert a string from	UTF-8 to the specified Encoding
	my $encoded_str	= $encode->($utf8_str);

	# Remove circular references for garbage collection
	XML::UM::dispose_encoding ('ISO-8859-2');

DESCRIPTION
       This module provides methods to convert UTF-8 strings to any XML
       encoding that XML::Encoding supports. It creates mapping routines from
       the .xml files that can be found in the maps/ directory in the
       XML::Encoding distribution. Note that the XML::Encoding distribution
       does install the .enc files in your perl directory, but not the .xml
       files they were created from. That's why you have to specify $ENCDIR as
       in the SYNOPSIS.

       This implementation uses	the XML::Encoding class	to parse the .xml file
       and creates a hash that maps UTF-8 characters (each consisting of up to
       4 bytes)	to their equivalent byte sequence in the specified encoding.
       Note that large mappings	may consume a lot of memory!

       Future implementations may parse	the .enc files directly, or do the
       conversions entirely in XS (i.e.	C code.)

get_encode (Encoding =>	STRING,	EncodeUnmapped => SUB)
       The central entry point to this module is the XML::UM::get_encode()
       method.	It forwards the	call to	the global $XML::UM::FACTORY, which is
       defined as an instance of XML::UM::SlowMapperFactory by default.
       Override	this variable to plug in your own mapper factory.
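
       As a rough illustration of plugging in a custom factory, the sketch
       below defines a stand-in class. The package name My::AsciiOnlyFactory
       is hypothetical, and so is the assumption that a factory only needs
       get_encode() and dispose_encoding() methods taking the same arguments
       as the XML::UM functions; this page does not spell out the interface.

```perl
use strict;
use warnings;

# Hypothetical factory class; the name and behaviour are illustrative only.
package My::AsciiOnlyFactory;

sub new { return bless {}, shift }

# Assumed interface: the same name/value pairs as XML::UM::get_encode().
sub get_encode {
    my ($self, %args) = @_;
    my $unmapped = $args{EncodeUnmapped} || sub { '?' };
    return sub {
        my ($str) = @_;
        # Pass ASCII through; hand every multi-byte UTF-8 sequence
        # (a lead byte plus its continuation bytes) to the callback.
        $str =~ s/([\xC0-\xFF][\x80-\xBF]*)/$unmapped->($1)/ge;
        return $str;
    };
}

# Nothing is cached in this sketch, so there is nothing to free.
sub dispose_encoding { return }

package main;

my $factory = My::AsciiOnlyFactory->new;
my $encode  = $factory->get_encode(
    Encoding       => 'US-ASCII',
    EncodeUnmapped => sub { '?' },
);
print $encode->("caf\xC3\xA9"), "\n";   # prints "caf?"
```

       With XML::UM loaded, such an object would presumably be assigned to
       $XML::UM::FACTORY before the first call to XML::UM::get_encode().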

       The XML::UM::SlowMapperFactory creates an instance of
       XML::UM::SlowMapper (and	caches it for subsequent use) that reads in
       the .xml	encoding file and creates a hash that maps UTF-8 characters to
       encoded characters.

       Finally, the get_encode() method of XML::UM::SlowMapper is called,
       which generates an anonymous subroutine that uses the hash to convert
       multi-character UTF-8 blocks to the proper encoding.
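
       The hash-plus-closure approach described above might look roughly like
       the following sketch. It is not the module's actual code: the helper
       name make_encoder and the three-entry ISO-8859-2 excerpt are
       illustrative, and passing ASCII bytes through unchanged is a
       simplifying assumption.

```perl
use strict;
use warnings;

# Illustrative excerpt of an ISO-8859-2 table: UTF-8 bytes => encoded byte.
# A real table would be built from the .xml map file.
my %map = (
    "\xC5\x81" => "\xA3",   # U+0141 LATIN CAPITAL LETTER L WITH STROKE
    "\xC5\x82" => "\xB3",   # U+0142 LATIN SMALL LETTER L WITH STROKE
    "\xC3\xB3" => "\xF3",   # U+00F3 LATIN SMALL LETTER O WITH ACUTE
);

# Hypothetical helper: returns a closure over the table and the fallback.
sub make_encoder {
    my ($table, $unmapped) = @_;
    return sub {
        my ($str) = @_;
        # Match one multi-byte UTF-8 character at a time and look it up;
        # characters missing from the table go to the fallback callback.
        $str =~ s{([\xC0-\xF7][\x80-\xBF]{1,3})}{
            my $char = $1;
            exists $table->{$char} ? $table->{$char} : $unmapped->($char);
        }ge;
        return $str;
    };
}

my $to_latin2 = make_encoder(\%map, sub { '?' });

# "Lodz" with Polish letters; the last letter (U+017A) is not in the excerpt.
my $encoded = $to_latin2->("\xC5\x81\xC3\xB3d\xC5\xBA");
printf "%vX\n", $encoded;   # prints "A3.F3.64.3F"
```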

dispose_encoding ($encoding_name)
       Call this to free the memory used by the	SlowMapper for a specific
       encoding.  Note that in order to	free the big conversion	hash, the user
       should no longer have references to the subroutines generated by
       get_encode().

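       The reason the generated subroutines matter can be shown with a small
       experiment in plain Perl. This is only a sketch: make_encoder and the
       one-entry table are hypothetical stand-ins, and a weak reference from
       Scalar::Util is used purely to observe when the hash is freed.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Hypothetical generator: the returned sub closes over its own $table.
sub make_encoder {
    my ($table) = @_;
    return sub { return $table->{ $_[0] } };
}

# Stand-in for a large conversion hash (illustrative data only).
my $table  = { "\xC3\xB3" => "\xF3" };
my $encode = make_encoder($table);

# Keep only a weak reference so the hash's lifetime can be observed.
my $watch = $table;
weaken($watch);

undef $table;                                   # the owner lets go
my $alive_with_closure = defined $watch ? 1 : 0;

undef $encode;                                  # the caller lets go too
my $alive_after = defined $watch ? 1 : 0;

print "hash alive while the sub exists: $alive_with_closure\n";   # 1
print "hash alive after dropping it:    $alive_after\n";          # 0
```
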
       The parameters to the get_encode() method (defined as name/value pairs)
       are:

       o   Encoding

	   The name of the desired encoding, e.g. 'ISO-8859-2'

       o   EncodeUnmapped (Default: \&XML::UM::encode_unmapped_dec)

	   Defines how Unicode characters not found in the mapping file	(of
	   the specified encoding) are printed.	 By default, they are
	   converted to decimal entity references, like '&#123;'.

	   Use \&XML::UM::encode_unmapped_hex for hexadecimal entity
	   references instead.

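       A custom EncodeUnmapped callback could be written along the following
       lines. The sketch assumes that the callback receives the UTF-8 bytes
       of a single unmapped character as its only argument, which this page
       does not state explicitly; the helper name codepoint_of and the two
       handlers are hypothetical and only imitate the decimal and hexadecimal
       styles described above.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the UTF-8 bytes of one character into its
# Unicode code point, using the core utf8::decode function.
sub codepoint_of {
    my ($bytes) = @_;
    my $char = $bytes;                        # work on a copy
    utf8::decode($char) or die "not valid UTF-8";
    return ord $char;
}

# Handlers that render an unmapped character as a numeric entity.
my $as_dec = sub { return sprintf '&#%d;',  codepoint_of($_[0]) };
my $as_hex = sub { return sprintf '&#x%X;', codepoint_of($_[0]) };

# U+0141 encoded as UTF-8 is the byte pair C5 81.
print $as_dec->("\xC5\x81"), "\n";   # prints "&#321;"
print $as_hex->("\xC5\x81"), "\n";   # prints "&#x141;"
```

       Under that assumption, either code reference could be passed as the
       EncodeUnmapped value.
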
CAVEATS
       I'm not exactly sure about which Unicode characters in the range (0 ..
       127) should be mapped to	themselves. See	comments in XML/ near

       The encodings that expat supports by default (e.g. UTF-16, ISO-8859-1)
       are currently not supported, because there are no .enc files available
       for these encodings.  This module needs some more work. If you have
       the time, please help!

AUTHOR
       Send bug reports, hints, tips, suggestions to Enno Derksen at

perl v5.32.1			  2000-02-17			    XML::UM(3)
