I18N::Charset(3)      User Contributed Perl Documentation     I18N::Charset(3)

NAME
       I18N::Charset - IANA Character Set Registry names and Unicode::MapUTF8
       (et al.)	 conversion scheme names

SYNOPSIS
	 use I18N::Charset;

	 $sCharset = iana_charset_name('WinCyrillic');
	 # $sCharset is	now 'windows-1251'
	 $sCharset = umap_charset_name('Adobe DingBats');
	 # $sCharset is	now 'ADOBE-DINGBATS' which can be passed to Unicode::Map->new()
	 $sCharset = map8_charset_name('windows-1251');
	 # $sCharset is	now 'cp1251' which can be passed to Unicode::Map8->new()
	 $sCharset = umu8_charset_name('x-sjis');
	 # $sCharset is	now 'sjis' which can be	passed to Unicode::MapUTF8->new()
	 $sCharset = libi_charset_name('x-sjis');
	 # $sCharset is	now 'MS_KANJI' which can be passed to `iconv -f	$sCharset ...`
	 $sCharset = enco_charset_name('Shift-JIS');
	 # $sCharset is	now 'shiftjis' which can be passed to Encode::from_to()

	 I18N::Charset::add_iana_alias('my-japanese' =>	'iso-2022-jp');
	 I18N::Charset::add_map8_alias('my-arabic' => 'arabic7');
	 I18N::Charset::add_umap_alias('my-hebrew' => 'ISO-8859-8');
	 I18N::Charset::add_libi_alias('my-sjis' => 'x-sjis');
	 I18N::Charset::add_enco_alias('my-japanese' =>	'shiftjis');

DESCRIPTION
       The "I18N::Charset" module provides access to the IANA Character Set
       Registry names for identifying character encoding schemes.  It also
       provides a mapping to the character set names used by the
       Unicode::Map, Unicode::Map8, Unicode::MapUTF8, and Encode modules and
       by the iconv library.

       So, for example,	if you get an HTML document with a META	CHARSET="..."
       tag, you	can fairly quickly determine what Unicode::MapXXX module can
       be used to convert it to	Unicode.
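
       As a rough sketch of that workflow, the snippet below pulls the
       charset label out of a sample META tag and normalizes it with
       iana_charset_name().  The sample tag and the regular expression are
       illustrative assumptions, not part of this module, and the regex is
       not a substitute for a real HTML parser.

```perl
use strict;
use warnings;
use I18N::Charset;

# Sample input (an assumption for illustration); in practice this would
# come from the document being processed.
my $html = '<meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS">';

# Simplified extraction of the charset label from the tag.
my ($label) = $html =~ /charset\s*=\s*["']?([^\s"';>]+)/i;

# Normalize whatever label was found to its registered IANA name.
my $iana = defined $label ? iana_charset_name($label) : undef;

print 'Label found: ', (defined $label ? $label : '(none)'), "\n";
print 'IANA name:   ', (defined $iana ? $iana : '(unrecognized)'), "\n";
```

       The IANA name can then be handed to one of the map8_, umap_, umu8_,
       libi_, or enco_ functions to get a backend-specific name.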

       If you don't have the module Unicode::Map installed, the	umap_
       functions will always return undef.  If you don't have the module
       Unicode::Map8 installed,	the map8_ functions will always	return undef.
       If you don't have the module Unicode::MapUTF8 installed,	the umu8_
       functions will always return undef.  If you don't have the iconv
       library installed, the libi_ functions will always return undef.	 If
       you don't have the Encode module	installed, the enco_ functions will
       always return undef.
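
       Because any of these functions may return undef, callers will
       usually want to test the result before using it.  The following is a
       minimal sketch of one way to do that; pick_backend() is a
       hypothetical helper written for this example, not a function
       provided by the module, and the order in which backends are tried is
       an arbitrary choice.

```perl
use strict;
use warnings;
use I18N::Charset;

# Hypothetical helper: return (backend, name) for the first lookup that
# yields a defined name, or an empty list when none of them does.
sub pick_backend {
    my ($label) = @_;
    my @backends = (
        [ 'Encode'           => \&enco_charset_name ],
        [ 'Unicode::Map8'    => \&map8_charset_name ],
        [ 'Unicode::Map'     => \&umap_charset_name ],
        [ 'Unicode::MapUTF8' => \&umu8_charset_name ],
        [ 'iconv'            => \&libi_charset_name ],
    );
    for my $backend (@backends) {
        my ($module, $lookup) = @$backend;
        my $name = $lookup->($label);
        return ($module, $name) if defined $name;
    }
    return;
}

my ($module, $name) = pick_backend('windows-1251');
if (defined $module) {
    print "Use $module with the name '$name'\n";
} else {
    print "No installed conversion backend knows this character set\n";
}
```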

CONVERSION ROUTINES
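       There are four main conversion routines: "iana_charset_name()",
       "map8_charset_name()", "umap_charset_name()", and
       "umu8_charset_name()".  The remaining routines in this section map
       names to the MIME, Encode, and iconv naming schemes and convert
       between names and MIBenum values.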

       iana_charset_name()
	   This	function takes a string	containing the name of a character set
	   and returns a string	which contains the official IANA name of the
	   character set identified. If	no valid character set name can	be
	   identified, then "undef" will be returned.  The case	and
	   punctuation within the string are not important.

	       $sCharset = iana_charset_name('WinCyrillic');

       mime_charset_name()
	   This	function takes a string	containing the name of a character set
	   and returns a string	which contains the preferred MIME name of the
	   character set identified. If	no valid character set name can	be
	   identified, then "undef" will be returned.  The case	and
	   punctuation within the string are not important.

	       $sCharset = mime_charset_name('Extended_UNIX_Code_Packed_Format_for_Japanese');

       enco_charset_name()
	   This	function takes a string	containing the name of a character set
	   and returns a string	which contains a name of the character set
	   suitable to be passed to the	Encode module.	If no valid character
	   set name can	be identified, or if Encode is not installed, then
	   "undef" will	be returned.  The case and punctuation within the
	   string are not important.

	       $sCharset = enco_charset_name('Extended_UNIX_Code_Packed_Format_for_Japanese');
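
	   As a sketch of how the returned name might be used, the example
	   below passes it to Encode::from_to(), as suggested in the
	   SYNOPSIS.  The sample octets and the choice of UTF-8 as the target
	   are assumptions made for illustration.

```perl
use strict;
use warnings;
use Encode ();
use I18N::Charset;

# Two octets that encode HIRAGANA LETTER A (U+3042) in Shift_JIS.
my $octets = "\x82\xA0";

my $enc = enco_charset_name('Shift-JIS');
if (defined $enc) {
    # from_to() converts $octets in place and returns undef on failure.
    my $len = Encode::from_to($octets, $enc, 'UTF-8');
    printf "Converted via '%s' to %d UTF-8 octets\n", $enc, $len
        if defined $len;
} else {
    print "No Encode name is available for this character set\n";
}
```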

       libi_charset_name()
	   This	function takes a string	containing the name of a character set
	   and returns a string	which contains a name of the character set
	   suitable to be passed to iconv.  If no valid	character set name can
	   be identified, then "undef" will be returned.  The case and
	   punctuation within the string are not important.

	       $sCharset = libi_charset_name('Extended_UNIX_Code_Packed_Format_for_Korean');

       mib_to_charset_name
	   This	function takes a string	containing the MIBenum of a character
	   set and returns a string which contains a name for the character
	   set.	 If the	given MIBenum does not correspond to any character
	   set,	then "undef" will be returned.

	       $sCharset = mib_to_charset_name('3');

       mib_charset_name
	   This is a synonym for mib_to_charset_name.

       charset_name_to_mib
	   This	function takes a string	containing the name of a character set
	   in almost any format	and returns a MIBenum for the character	set.
	   For IANA-registered character sets, this is the IANA-registered
	   MIB.	 For non-IANA character	sets, this is an unambiguous unique
	   string whose	only use is to pass to other functions in this module.
	   If no valid character set name can be identified, then "undef" will
	   be returned.

	       $iMIB = charset_name_to_mib('US-ASCII');
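
	   The two MIBenum functions are inverses of each other, which the
	   following sketch illustrates with a simple round trip.  It makes no
	   assumption about which spelling mib_to_charset_name() returns; it
	   only checks that the returned name maps back to the same MIBenum.

```perl
use strict;
use warnings;
use I18N::Charset;

my $mib  = charset_name_to_mib('US-ASCII');
my $name = defined $mib ? mib_to_charset_name($mib) : undef;

my $same = 0;
if (defined $name) {
    print "MIBenum $mib is known to the module as '$name'\n";
    # Compare with eq rather than ==, because the MIB of a non-IANA
    # character set is a string, not a number.
    my $back = charset_name_to_mib($name);
    $same = (defined $back && $back eq $mib) ? 1 : 0;
    print 'Round trip ', ($same ? 'succeeded' : 'failed'), "\n";
}
```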

       map8_charset_name()
	   This	function takes a string	containing the name of a character set
	   (in almost any format) and returns a	string which contains a	name
	   for the character set that can be passed to Unicode::Map8::new().
	   Note: the returned string will be capitalized just like the name of
	   the .bin file in the	Unicode::Map8::MAPS_DIR	directory.  If no
	   valid character set name can	be identified, then "undef" will be
	   returned.  The case and punctuation within the argument string are
	   not important.

	       $sCharset = map8_charset_name('windows-1251');

       umap_charset_name()
	   This	function takes a string	containing the name of a character set
	   (in almost any format) and returns a	string which contains a	name
	   for the character set that can be passed to Unicode::Map::new(). If
	   no valid character set name can be identified, then "undef" will be
	   returned.  The case and punctuation within the argument string are
	   not important.

	       $sCharset = umap_charset_name('hebrew');

       umu8_charset_name()
	   This	function takes a string	containing the name of a character set
	   (in almost any format) and returns a	string which contains a	name
	   for the character set that can be passed to
	   Unicode::MapUTF8::new(). If no valid	character set name can be
	   identified, then "undef" will be returned.  The case	and
	   punctuation within the argument string are not important.

	       $sCharset = umu8_charset_name('windows-1251');

QUERY ROUTINES
       There is	one function which can be used to obtain a list	of all IANA-
       registered character set	names.

       "all_iana_charset_names()"
	   Returns a list of all registered IANA character set names.  The
	   names are not in any	particular order.
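
	   Since the order is unspecified, callers that need a stable listing
	   can sort the result themselves.  The sketch below calls the
	   function by its fully qualified name, which works whether or not it
	   is exported; the pattern used to pick out the ISO 8859 family is
	   only an illustration.

```perl
use strict;
use warnings;
use I18N::Charset;

my @names  = I18N::Charset::all_iana_charset_names();

# Sort case-insensitively to get a predictable order.
my @sorted = sort { lc($a) cmp lc($b) } @names;
printf "%d IANA character set names are registered\n", scalar @sorted;

# Show one family of names as a quick sanity check.
my @iso8859 = grep { /^ISO[-_]8859/i } @sorted;
print "  $_\n" for @iso8859;
```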

CHARACTER SET NAME ALIASING
       This module supports several semi-private routines for specifying
       character set name aliases.

       add_iana_alias()
	   This	function takes two strings: a new alias, and a target IANA
	   Character Set Name (or another alias).  It defines the new alias to
	   refer to that character set name (or	to the character set name to
	   which the second alias refers).

	   Returns the target character set name of the successfully installed
	   alias.  Returns 'undef' if the second argument is neither a
	   registered character set name nor an alias for one.

	     I18N::Charset::add_iana_alias('my-alias1' => 'Shift_JIS');

	   With	this code, "my-alias1" becomes an alias	for the	existing IANA
	   character set name 'Shift_JIS'.

	     I18N::Charset::add_iana_alias('my-alias2' => 'sjis');

	   With	this code, "my-alias2" becomes an alias	for the	IANA character
	   set name referred to	by the existing	alias 'sjis' (which happens to
	   be 'Shift_JIS').
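
	   Because the function returns 'undef' on failure, the return value
	   can be used to confirm that an alias was installed.  A minimal
	   sketch, in which both alias names and the unregistered target are
	   made up for illustration:

```perl
use strict;
use warnings;
use I18N::Charset;

# Install an alias for a registered character set and check the result.
my $target = I18N::Charset::add_iana_alias('my-japanese' => 'iso-2022-jp');
if (defined $target) {
    print "'my-japanese' now resolves to ",
        iana_charset_name('my-japanese'), "\n";
}

# An alias to a name that is not registered should be rejected.
my $bad = I18N::Charset::add_iana_alias('my-bogus' => 'no-such-charset-xyzzy');
print "Alias 'my-bogus' was rejected\n" unless defined $bad;
```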

       add_map8_alias()
	   This	function takes two strings: a new alias, and a target
	   Unicode::Map8 Character Set Name (or an existing alias to a Map8
	   name).  It defines the new alias to refer to	that mapping name (or
	   to the mapping name to which	the second alias refers).

	   If the first	argument is a registered IANA character	set name, then
	   all aliases of that IANA character set name will end	up pointing to
	   the target Map8 mapping name.

	   Returns the target mapping name of the successfully installed
	   alias.  Returns 'undef' if the second argument is neither a
	   registered mapping name nor an alias for one.

	     I18N::Charset::add_map8_alias('normal' => 'ANSI_X3.4-1968');

	   With	the above statement, "normal" becomes an alias for the
	   existing Unicode::Map8 mapping name 'ANSI_X3.4-1968'.

	     I18N::Charset::add_map8_alias('normal' => 'US-ASCII');

	   With the above statement, "normal" becomes an alias for the
	   existing Unicode::Map8 mapping name 'ANSI_X3.4-1968' (which is what
	   "US-ASCII" is an alias for).

	     I18N::Charset::add_map8_alias('IBM297' => 'EBCDIC-CA-FR');

	   With the above statement, "IBM297" becomes an alias for the
	   existing Unicode::Map8 mapping name 'EBCDIC-CA-FR'.  As a side
	   effect, all the aliases for 'IBM297' (i.e. 'cp297' and
	   'ebcdic-cp-fr') also become aliases for 'EBCDIC-CA-FR'.

       add_umap_alias()
	   This	function works identically to add_map8_alias() above, but
	   operates on Unicode::Map encoding tables.

       add_libi_alias()
	   This	function takes two strings: a new alias, and a target iconv
	   Character Set Name (or existing iconv alias).  It defines the new
	   alias to refer to that character set	name (or to the	character set
	   name	to which the existing alias refers).

	   Returns the target conversion scheme	name of	the successfully
	   installed alias.  Returns 'undef' if	there is no such target
	   conversion scheme or	alias.

	   Examples:

	     I18N::Charset::add_libi_alias('my-chinese1' => 'CN-GB');

	   With	this code, "my-chinese1" becomes an alias for the existing
	   iconv conversion scheme 'CN-GB'.

	     I18N::Charset::add_libi_alias('my-chinese2' => 'EUC-CN');

	   With	this code, "my-chinese2" becomes an alias for the iconv
	   conversion scheme referred to by the	existing alias 'EUC-CN'	(which
	   happens to be 'CN-GB').

       add_enco_alias()
	   This	function takes two strings: a new alias, and a target Encode
	   encoding Name (or existing Encode alias).  It defines the new alias
	   referring to	that encoding name (or to the encoding to which	the
	   existing alias refers).

	   Returns the target encoding name of the successfully	installed
	   alias.  Returns 'undef' if there is no such encoding	or alias.

	   Examples:

	     I18N::Charset::add_enco_alias('my-japanese1' => 'jis0201-raw');

	   With	this code, "my-japanese1" becomes an alias for the existing
	   encoding 'jis0201-raw'.

	     I18N::Charset::add_enco_alias('my-japanese2' => 'my-japanese1');

	   With	this code, "my-japanese2" becomes an alias for the encoding
	   referred to by the existing alias 'my-japanese1' (which happens to
	   be 'jis0201-raw' after the previous call).

KNOWN BUGS AND LIMITATIONS
       o   There could probably	be many	more aliases added (for	convenience)
	   to all the IANA names.  If you have some specific recommendations,
	   please email	the author!

       o   The only character set names	which have a corresponding mapping in
	   the Unicode::Map8 module are	the character sets that	Unicode::Map8
	   can convert.

	   Similarly, the only character set names which have a	corresponding
	   mapping in the Unicode::Map module are the character	sets that
	   Unicode::Map	can convert.

       o   In the current implementation, all tables are read in and
	   initialized when the	module is loaded, and then held	in memory
	   until the program exits.  A "lazy" implementation (or a less-
	   portable tied hash) might lead to a shorter startup time.
	   Suggestions,	patches, comments are always welcome!

SEE ALSO
       Unicode::Map
	   Convert strings from	various	multi-byte character encodings to and
	   from	Unicode.

       Unicode::Map8
	   Convert strings from	various	8-bit character	encodings to and from
	   Unicode.

       Jcode
	   Convert strings among various Japanese character encodings and
	   Unicode.

       Unicode::MapUTF8
	   A wrapper around all	three of these character set conversion
	   distributions.

AUTHOR
       Martin Thurn, "mthurn@cpan.org",	<http://tinyurl.com/nn67z>.

LICENSE
       This module is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

perl v5.24.1			  2008-07-12		      I18N::Charset(3)
