Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages

  
 
  

home | help
Unicode::MapUTF8(3)   User Contributed Perl Documentation  Unicode::MapUTF8(3)

NAME
       Unicode::MapUTF8	- Conversions to and from arbitrary character sets and
       UTF8

SYNOPSIS
	use Unicode::MapUTF8 qw(to_utf8	from_utf8 utf8_supported_charset);

	# Convert a string in 'ISO-8859-1' to 'UTF8'
	my $output = to_utf8({ -string => 'An example',	-charset => 'ISO-8859-1' });

	# Convert a string in 'UTF8' encoding to encoding 'ISO-8859-1'
	my $other  = from_utf8({ -string => 'Other text', -charset => 'ISO-8859-1' });

	# List available character set encodings
	my @character_sets = utf8_supported_charset;

	# Add a	character set alias
	utf8_charset_alias({ 'ms-japanese' => 'sjis' });

	# Convert between two arbitrary	(but largely compatible) charset encodings
	# (SJIS	to EUC-JP)
	my $utf8_string	  = to_utf8({ -string =>$sjis_string, -charset => 'sjis'});
	my $euc_jp_string = from_utf8({	-string	=> $utf8_string, -charset => 'euc-jp' })

	# Verify that a	specific character set is supported
	if (utf8_supported_charset('ISO-8859-1') {
	    # Yes
	}

DESCRIPTION
       Provides	an adapter layer between core routines for converting to and
       from UTF8 and other encodings. In essence, a way	to give	multiple
       existing	Unicode	modules	a single common	interface so you don't have to
       know the	underlaying implementations to do simple UTF8 to-from other
       character set encoding conversions. As such, it wraps the
       Unicode::String,	Unicode::Map8, Unicode::Map and	Jcode modules in a
       standardized and	simple API.

       This also provides general character set	conversion operation based on
       UTF8 - it is possible to	convert	between	any two	compatible and
       supported character sets	via a simple two step chaining of conversions.

       As with most things Perlish - if	you give it a few big chunks of	text
       to chew on instead of lots of small ones	it will	handle many more
       characters per second.

       By design, it can be easily extended to encompass any new charset
       encoding	conversion modules that	arrive on the scene.

       This module is intended to provide good Unicode support to versions of
       Perl prior to 5.8. If you are using Perl	5.8.0 or later,	you probably
       want to be using	the Encode module instead. This	module does work with
       Perl 5.8, but Encode is the preferred method in that environment.

CHANGES
       1.14 2020.09.27	 Fixing	POD breakage in	EUC-JP version of POD

       1.13 2020.09.27	 Fixing	MANIFEST.SKIP error

       1.12 2020.09.27	 Build tool updates. Maintainer	updates. POD error
       fixes.
			 Relicensed under MIT license.

       1.11 2005.10.10	 Documentation changes.	Addition of Build.PL support.
			 Added various build tests, LICENSE,
       Artistic_License.txt,
			 GPL_License.txt. Split	documentation into seperate
			 .pod file. Added Japanese translation of POD.

       1.10 2005.05.22 - Fixed bug in conversion of ISO-2022-JP	to UTF-8.
			 Problem and fix found by Masahiro HONMA
			 <masahiro.honma@tsutaya.co.jp>.

			 Similar bugs in conversions of	shift_jis and euc-jp
			 to UTF-8 corrected as well.

       1.09 2001.08.22 - Fixed multiple	typo occurances	of 'uft'
			 where 'utf' was meant in code.	Problem	affected
			 utf16 and utf7	encodings. Problem found
			 by devon smith	<devon@taller.PSCL.cwru.edu>

       1.08 2000.11.06 Added 'utf8_charset_alias' function to allow for
       runtime
		       setting of character set	aliases. Added several
       alternate
		       names for 'sjis'	(shiftjis, shift-jis, shift_jis,
       s-jis,
		       and s_jis).

		       Corrected 'croak' messages for 'from_utf8' functions to
		       appropriate function name.

		       Corrected fatal problem in jcode-unicode	internals. Problem
		       and fix found by	Brian Wisti <wbrian2@uswest.net>.

       1.07 2000.11.01 Added 'croak' to	use Carp declaration to	fix error
		       messages. Problem and fix found by
       <wbrian2@uswest.net>.

       1.06 2000.10.30 Fix to handle change in stringification of overloaded
		       objects between Perl 5.005 and 5.6.
		       Problem noticed by Brian	Wisti <wbrian2@uswest.net>.

       1.05 2000.10.23 Error in	conversions from UTF8 to multibyte encodings
       corrected

       1.04 2000.10.23 Additional diagnostic error messages added for
		       internal	errors

       1.03 2000.10.22 Bug fix for load	time Unicode::Map encoding
		       detection

       1.02 2000.10.22 Bug fix to 'from_utf8' method and load time
		       detection of Unicode::Map8 supported character
		       set encodings

       1.01 2000.10.02 Initial public release

FUNCTIONS
       utf8_charset_alias({ $alias => $charset });
	   Used	for runtime assignment of character set	aliases.

	   Called with no parameters, returns a	hash of	defined	aliases	and
	   the character sets they map to.

	   Example:

	     my	$aliases     = utf8_charset_alias;
	     my	@alias_names = keys %$aliases;

	   If called with ONE parameter, returns the name of the 'real'
	   charset if the alias	is defined. Returns undef if it	is not found
	   in the aliases.

	   Example:

	       if (! utf8_charset_alias('VISCII')) {
		   # No	alias for this
	       }

	   If called with a list of 'alias' => 'charset' pairs,	defines	those
	   aliases for use.

	   Example:

	       utf8_charset_alias({ 'japanese' => 'sjis', 'japan' => 'sjis' });

	   Note: It will croak if a passed pair	does not map to	a character
	   set defined in the predefined set of	character encoding. It is NOT
	   allowed to alias something to another alias.

	   Multiple character set aliases can be set with a single call.

	   To clear an alias, pass a character set mapping of undef.

	   Example:

	       utf8_charset_alias({ 'japanese' => undef	});

	   While an alias is set, the 'utf8_supported_charset' function	will
	   return the alias as if it were a predefined charset.

	   Overriding a	base defined character encoding	with an	alias will
	   generate a warning message to STDERR.

       utf8_supported_charset($charset_name);
	   Returns true	if the named charset is	supported (including user
	   defined aliases).

	   Returns false if it is not.

	   Example:

	       if (! utf8_supported_charset('VISCII')) {
		   # No	support	yet
	       }

	   If called in	a list context with no parameters, it will return a
	   list	of all supported character set names (including	user defined
	   aliases).

	   Example:

	       my @charsets = utf8_supported_charset;

       to_utf8({ -string => $string, -charset => $source_charset });
	   Returns the string converted	to UTF8	from the specified source
	   charset.

       from_utf8({ -string => $string, -charset	=> $target_charset});
	   Returns the string converted	from UTF8 to the specified target
	   charset.

VERSION
       1.14 2020.09.27

TODO
       Regression tests	for Jcode, 2-byte encodings and	encoding aliases

SEE ALSO
       Unicode::String Unicode::Map8 Unicode::Map Jcode	Encode

COPYRIGHT
       Copyright 2000-2020, Jerilyn Franz. All rights reserved.

AUTHOR
       Jerilyn Franz <cpan@jerilyn.info>

LICENSE
       MIT License

       Copyright (c) 2020 Jerilyn Franz

       Permission is hereby granted, free of charge, to	any person obtaining a
       copy of this software and associated documentation files	(the
       "Software"), to deal in the Software without restriction, including
       without limitation the rights to	use, copy, modify, merge, publish,
       distribute, sublicense, and/or sell copies of the Software, and to
       permit persons to whom the Software is furnished	to do so, subject to
       the following conditions:

       The above copyright notice and this permission notice shall be included
       in all copies or	substantial portions of	the Software.

       THE SOFTWARE IS PROVIDED	"AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
       OR IMPLIED, INCLUDING BUT NOT LIMITED TO	THE WARRANTIES OF
       MERCHANTABILITY,	FITNESS	FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
       IN NO EVENT SHALL THE AUTHORS OR	COPYRIGHT HOLDERS BE LIABLE FOR	ANY
       CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN	ACTION OF CONTRACT,
       TORT OR OTHERWISE, ARISING FROM,	OUT OF OR IN CONNECTION	WITH THE
       SOFTWARE	OR THE USE OR OTHER DEALINGS IN	THE SOFTWARE.

perl v5.32.1			  2021-02-28		   Unicode::MapUTF8(3)

NAME | SYNOPSIS | DESCRIPTION | CHANGES | FUNCTIONS | VERSION | TODO | SEE ALSO | COPYRIGHT | AUTHOR | LICENSE

Want to link to this manual page? Use this URL:
<https://www.freebsd.org/cgi/man.cgi?query=Unicode::MapUTF8&sektion=3&manpath=FreeBSD+13.0-RELEASE+and+Ports>

home | help