Map8(3)		      User Contributed Perl Documentation	       Map8(3)

NAME
       Unicode::Map8 - Mapping table between 8-bit chars and Unicode

SYNOPSIS
	require	Unicode::Map8;
	my $no_map = Unicode::Map8->new("ISO646-NO") ||	die;
	my $l1_map = Unicode::Map8->new("latin1")    ||	die;

	my $ustr = $no_map->to16("V}re norske tegn b|r {res\n");
	my $lstr = $l1_map->to8($ustr);
	print $lstr;

	print $no_map->tou("V}re norske tegn b|r {res\n")->utf8;

DESCRIPTION
       The Unicode::Map8 class implements efficient mapping tables between
       8-bit character sets and 16-bit character sets like Unicode.  The
       tables are efficient both in terms of space allocated and translation
       speed.  The 16-bit strings are assumed to use network byte order.

       The following methods are available:

       $m = Unicode::Map8->new(	[$charset] )
	   The object constructor creates new instances of the Unicode::Map8
	   class.  It takes an optional argument that specifies the name of an
	   8-bit character set to initialize mappings from.  The argument can
	   also be the name of a mapping file.  If the charset/file cannot
	   be located, then the constructor returns undef.

	   If you omit the argument, then an empty mapping table is
	   constructed.	 You must then add mapping pairs to it using the
	   addpair() method described below.

       $m->addpair( $u8, $u16 );
	   Adds	a new mapping pair to the mapping object.  It takes two
	   arguments.  The first is the	code value in the 8-bit	character set
	   and the second is the corresponding code value in the 16-bit
	   character set.  The same codes can be used multiple times (but
	   using the same pair has no effect).	The first definition for a
	   code	is the one that	is used.

	   Consider the	following example:

	     $m->addpair(0x20, 0x0020);
	     $m->addpair(0x20, 0x00A0);
	     $m->addpair(0xA0, 0x00A0);

	   This means that the characters 0x20 and 0xA0 in the 8-bit charset
	   map to themselves in the 16-bit set, but 0x00A0 in the 16-bit
	   character set maps to 0x20 in the 8-bit set.
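
	   The first-definition rule can be checked with the single-code
	   lookups to_char16() and to_char8() documented further down.  The
	   following is a minimal sketch, assuming the module is installed;
	   the comments restate what the rule implies and are not captured
	   output.

```perl
use strict;
use warnings;
use Unicode::Map8;

# Start from an empty table and add the three pairs from the example.
my $m = Unicode::Map8->new;
$m->addpair(0x20, 0x0020);
$m->addpair(0x20, 0x00A0);   # 0x20 is already defined; only 0x00A0 -> 0x20 is new
$m->addpair(0xA0, 0x00A0);   # 0x00A0 is already defined; only 0xA0 -> 0x00A0 is new

# 8-bit to 16-bit lookups.
my $space16 = $m->to_char16(0x20);
my $nbsp16  = $m->to_char16(0xA0);

# 16-bit to 8-bit lookup.
my $nbsp8   = $m->to_char8(0x00A0);

printf "8-bit 0x20    -> 16-bit 0x%04X\n", $space16;
printf "8-bit 0xA0    -> 16-bit 0x%04X\n", $nbsp16;
printf "16-bit 0x00A0 -> 8-bit  0x%02X\n", $nbsp8;
```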

       $m->default_to8(	$u8 )
	   Set the code	of the default character to use	when mapping from
	   16-bit to 8-bit strings.  If	there is no mapping pair defined for a
	   character then this default is substituted by to8() and recode8().

       $m->default_to16( $u16 )
	   Set the code	of the default character to use	when mapping from
	   8-bit to 16-bit strings. If there is	no mapping pair	defined	for a
	   character then this default is used by to16(), tou()	and recode8().

       $m->nostrict;
	   All undefined mappings are replaced with the identity mapping.
	   Without this call, undefined characters are just removed (or
	   replaced with the default, if one is defined) when converting
	   between character sets.

       $m->to8(	$ustr );
	   Converts a 16-bit character string to the corresponding string in
	   the 8-bit character set.

       $m->to16( $str );
	   Converts an 8-bit character string to the corresponding string in
	   the 16-bit character	set.
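
	   As an illustration of to16(), to8() and default_to8() together,
	   the sketch below builds a one-character table by hand.  It assumes
	   the module is installed; the table contents and the choice of "?"
	   as the default are made up for the example.

```perl
use strict;
use warnings;
use Unicode::Map8;

# A table that knows a single character, "a".
my $m = Unicode::Map8->new;
$m->addpair(ord("a"), 0x0061);

# 8-bit to 16-bit: each mapped byte becomes two bytes, high byte first.
my $ustr = $m->to16("a");
print "to16 gives network byte order\n" if $ustr eq pack("n", 0x0061);

# 16-bit to 8-bit: "a" followed by U+263A, which this table cannot map.
my $mixed   = pack("n*", 0x0061, 0x263A);
my $dropped = $m->to8($mixed);      # no default set: unmapped code is removed

$m->default_to8(ord("?"));
my $replaced = $m->to8($mixed);     # default set: unmapped code becomes "?"

print "without default: $dropped\n";
print "with default:    $replaced\n";
```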

       $m->tou(	$str );
	   Same as to16() but returns a Unicode::String object instead of a
	   plain UCS2 string.

       $m->recode8($m2,	$str);
	   Map the string $str from one	8-bit character	set ($m) to another
	   one ($m2).  Since we	assume we know the mappings towards the	common
	   16-bit encoding we can use this to convert between any of the 8-bit
	   character sets.
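
	   A sketch of recode8() using the two charset names from the
	   SYNOPSIS.  It assumes both tables are installed under those names,
	   and it compares the result with the explicit two-step conversion
	   through the 16-bit encoding rather than asserting particular
	   output bytes.

```perl
use strict;
use warnings;
use Unicode::Map8;

my $no_map = Unicode::Map8->new("ISO646-NO") or die "ISO646-NO table not found\n";
my $l1_map = Unicode::Map8->new("latin1")    or die "latin1 table not found\n";

my $str = "V}re norske tegn b|r {res\n";

# One call: from the charset of $no_map to the charset of $l1_map.
my $direct = $no_map->recode8($l1_map, $str);

# The same conversion spelled out via the common 16-bit encoding.
my $two_step = $l1_map->to8($no_map->to16($str));

print $direct eq $two_step ? "same result\n" : "results differ\n";
```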

       $m->to_char16( $u8 )
	   Maps a single 8-bit character code to a 16-bit code.  If the 8-bit
	   character is	unmapped then the constant NOCHAR is returned.	The
	   default is not used and the callback	method is not invoked.

       $m->to_char8( $u16 )
	   Maps	a single 16-bit	character code to an 8-bit code. If the	16-bit
	   character is	unmapped then the constant NOCHAR is returned.	The
	   default is not used and the callback	method is not invoked.

       The following callback methods are available.  You can override these
       methods by creating a subclass of Unicode::Map8.

       $m->unmapped_to8
	   When mapping to an 8-bit character string and there is no mapping
	   defined (and	no default either), then this method is	called as the
	   last	resort.	 It is called with a single integer argument which is
	   the code of the unmapped 16-bit character.  It is expected to
	   return a string that	will be	incorporated in	the 8-bit string.  The
	   default version of this method always returns an empty string.

	   Example:

	    package MyMapper;
	    require Unicode::Map8;
	    our @ISA = qw(Unicode::Map8);

	    sub	unmapped_to8
	    {
	       my($self, $code)	= @_;
	       require Unicode::CharName;
	       "<" . Unicode::CharName::uname($code) . ">";
	    }

       $m->unmapped_to16
	   Likewise, when mapping to a 16-bit character string and no mapping is
	   defined then	this method is called.	It should return a 16-bit
	   string with the bytes in network byte order.	 The default version
	   of this method always returns an empty string.

FILES
       The Unicode::Map8 constructor can parse two different file formats; a
       binary format and a textual format.

       The binary format is simple.  It consists of a sequence of 16-bit
       integer pairs in network byte order.  The first pair should contain the
       magic value 0xFFFE, 0x0001.  Of each following pair, the first value is
       the code of an 8-bit character and the second is the code of the 16-bit
       character.  It follows from this that the first value should be less
       than 256.
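
       The layout described above can be sketched in plain Perl with pack()
       and unpack(), independent of the module.  The sample pairs are made
       up for the example, and the byte image is built in memory rather than
       written to a file.

```perl
use strict;
use warnings;

# Byte image of a binary mapping file: the magic pair, then
# (8-bit code, 16-bit code) pairs, every value a 16-bit integer
# in network (big-endian) byte order.
my @pairs = ([0x20, 0x0020], [0xA0, 0x00A0]);
my $image = pack("n*", 0xFFFE, 0x0001, map { @$_ } @pairs);

# Decode the image again and validate it.
my @values = unpack("n*", $image);
my ($magic_hi, $magic_lo) = splice(@values, 0, 2);
die "bad magic\n" unless $magic_hi == 0xFFFE && $magic_lo == 0x0001;

my @decoded;
while (my ($u8, $u16) = splice(@values, 0, 2)) {
    die "8-bit code out of range\n" if $u8 > 255;
    push @decoded, [$u8, $u16];
    printf "0x%02X 0x%04X\n", $u8, $u16;
}
```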

       The textual format consists of lines that are either a comment (first
       non-blank character is '#'), a completely blank line or a line with two
       hexadecimal numbers.  The hexadecimal numbers must be preceded by "0x"
       as in C and Perl.  This is the same format used by the Unicode mapping
       files available from <URL:ftp://ftp.unicode.org/Public>.
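
       A minimal reader for this textual format might look like the sketch
       below.  It is plain Perl and not the module's own parser; the sample
       text is made up, and ignoring anything after the second number on a
       line is an assumption of the sketch.

```perl
use strict;
use warnings;

# Hypothetical sample: comments, a blank line, and lines carrying
# two 0x-prefixed hexadecimal numbers.
my $text = <<'EOT';
# 8-bit code, 16-bit code
0x20 0x0020

  # indented comment
0xA0	0x00A0
EOT

my @pairs;
for my $line (split /\n/, $text) {
    next if $line =~ /^\s*(?:#|$)/;    # comment or blank line
    my ($u8, $u16) = $line =~ /^\s*0x([0-9A-Fa-f]+)\s+0x([0-9A-Fa-f]+)/
        or die "unparsable line: $line\n";
    push @pairs, [hex($u8), hex($u16)];
}

printf "0x%02X 0x%04X\n", @$_ for @pairs;
```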

       The mapping table files are installed in	the Unicode/Map8/maps
       directory somewhere in the Perl @INC path.  The variable
       $Unicode::Map8::MAPS_DIR	is the complete	path name to this directory.
       Binary mapping files are	stored within this directory with the suffix
       .bin.  Textual mapping files are	stored with the	suffix .txt.

       The scripts map8_bin2txt	and map8_txt2bin can translate between these
       mapping file formats.

       A special file called aliases within $MAPS_DIR specifies all the alias
       names that can be used to denote the various character sets.  The first
       name on each line is the real file name and the rest are alias names
       separated by spaces.
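
       Reading such a file into a lookup table can be sketched as follows.
       The sample lines are hypothetical and do not reproduce the installed
       aliases file; whether the module compares names case-sensitively is
       not stated here, so the sketch keeps names as written.

```perl
use strict;
use warnings;

# Hypothetical content: real file name first, aliases after it.
my $sample = <<'EOT';
ISO_8859-1 latin1 l1
NS_4551-1 ISO646-NO no
EOT

# Map every name on a line, including the real one, to the real file name.
my %real_name;
for my $line (split /\n/, $sample) {
    my ($real, @aliases) = split ' ', $line;
    next unless defined $real;          # skip empty lines
    $real_name{$_} = $real for ($real, @aliases);
}

print "latin1 is stored as $real_name{latin1}\n";
```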

       The "umap --list" command can be used to list the supported character
       sets.

BUGS
       Does not	handle Unicode surrogate pairs as a single character.

SEE ALSO
       umap(1),	Unicode::String

COPYRIGHT
       Copyright 1998 Gisle Aas.

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

perl v5.24.1			  2010-01-18			       Map8(3)
