MIME::Charset(3)      User Contributed Perl Documentation     MIME::Charset(3)

NAME
       MIME::Charset - Charset Information for MIME

SYNOPSIS
	   use MIME::Charset;

	   $charset = MIME::Charset->new("euc-jp");

       Getting charset information:

	   $benc = $charset->body_encoding; # e.g. "Q"
	   $cset = $charset->as_string;	# e.g. "US-ASCII"
	   $henc = $charset->header_encoding; #	e.g. "S"
	   $cset = $charset->output_charset; # e.g. "ISO-2022-JP"

       Translating text	data:

	   ($text, $charset, $encoding)	=
	       $charset->header_encode(
		  "\xc9\xc2\xc5\xaa\xc0\xde\xc3\xef\xc5\xaa".
		  "\xc7\xd1\xca\xaa\xbd\xd0\xce\xcf\xb4\xef",
		  Charset => 'euc-jp');
	   # ...returns	e.g. (<converted>, "ISO-2022-JP", "B").

	   ($text, $charset, $encoding)	=
	       $charset->body_encode(
		   "Collectioneur path\xe9tiquement ".
		   "\xe9clectique de d\xe9chets",
		   Charset => 'latin1');
	   # ...returns	e.g. (<original>, "ISO-8859-1",	"QUOTED-PRINTABLE").

	   $len	= $charset->encoded_header_len(
	       "Perl\xe8\xa8\x80\xe8\xaa\x9e",
	       Charset => 'utf-8',
	       Encoding	=> "b");
	   # ...returns	e.g. 28.

       Manipulating module defaults:

	   MIME::Charset::alias("csEUCKR", "euc-kr");
	   MIME::Charset::default("iso-8859-1");
	   MIME::Charset::fallback("us-ascii");

       Non-OO functions (may be deprecated in the near future):

	   use MIME::Charset qw(:info);

	   $benc = body_encoding("iso-8859-2");	# "Q"
	   $cset = canonical_charset("ANSI X3.4-1968");	# "US-ASCII"
	   $henc = header_encoding("utf-8"); # "S"
	   $cset = output_charset("shift_jis");	# "ISO-2022-JP"

	   use MIME::Charset qw(:trans);

	   ($text, $charset, $encoding)	=
	       header_encode(
		  "\xc9\xc2\xc5\xaa\xc0\xde\xc3\xef\xc5\xaa".
		  "\xc7\xd1\xca\xaa\xbd\xd0\xce\xcf\xb4\xef",
		  "euc-jp");
	   # ...returns	(<converted>, "ISO-2022-JP", "B");

	   ($text, $charset, $encoding)	=
	       body_encode(
		   "Collectioneur path\xe9tiquement ".
		   "\xe9clectique de d\xe9chets",
		   "latin1");
	   # ...returns	(<original>, "ISO-8859-1", "QUOTED-PRINTABLE");

	   $len	= encoded_header_len(
	       "Perl\xe8\xa8\x80\xe8\xaa\x9e", "b", "utf-8"); #	28

DESCRIPTION
       MIME::Charset provides information about character sets used for MIME
       messages on the Internet.

   Definitions
       The charset is the ``character set'' used in MIME to refer to a method
       of converting a sequence of octets into a sequence of characters.  It
       covers both the ``coded character set'' (CCS) and the ``character
       encoding scheme'' (CES) concepts of ISO/IEC.

       The encoding is the term used in MIME to refer to a method of
       representing a body part or a header body as sequence(s) of printable
       US-ASCII characters.
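
       The distinction between the two terms can be illustrated with Perl's
       core Encode and MIME::Base64 modules.  This is a minimal sketch that
       does not use MIME::Charset itself; the sample string and the variable
       names are chosen for illustration only.

```perl
use strict;
use warnings;
use Encode qw(decode);
use MIME::Base64 qw(encode_base64);

# Ten octets as they might appear on the wire: "Perl" followed by two
# Japanese characters in UTF-8.
my $octets = "Perl\xe8\xa8\x80\xe8\xaa\x9e";

# Charset: maps the octet sequence to a character sequence.
my $chars = decode("UTF-8", $octets);

# Encoding: represents the octets as printable US-ASCII characters.
my $b64 = encode_base64($octets, "");

printf "%d octets, %d characters, encoded as %s\n",
    length($octets), length($chars), $b64;
```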

   Constructor
       $charset	= MIME::Charset->new([CHARSET [, OPTS]])
	   Create a charset object.

	   OPTS may contain the following key-value pair.  NOTE: When
	   Unicode/multibyte support is disabled (see "USE_ENCODE"),
	   conversion will not be performed, so this option has no effect.

	   Mapping => MAPTYPE
	       Whether to extend mappings actually used	for charset names or
	       not.  "EXTENDED"	uses extended mappings.	 "STANDARD" uses
	       standardized strict mappings.  Default is "EXTENDED".

   Getting Information of Charsets
       $charset->body_encoding
       body_encoding CHARSET
	   Get the recommended transfer-encoding of CHARSET for the message
	   body.

	   The returned value will be one of "B" (BASE64), "Q"
	   (QUOTED-PRINTABLE), "S" (the shorter of the two) or "undef" (may
	   be left unencoded; either 7BIT or 8BIT).  This may differ from the
	   encoding for the message header.

       $charset->as_string
       canonical_charset CHARSET
	   Get canonical name for charset.

       $charset->decoder
	   Get the "Encode::Encoding" object used to decode strings in this
	   charset to Unicode.  If the charset is not specified or not known
	   to this module, undef will be returned.
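
	   As an illustration of what such an object does, the following
	   sketch uses the core Encode module directly instead of
	   MIME::Charset.  It assumes that the decoder of a latin1 charset
	   object behaves like the object returned by
	   Encode::find_encoding("iso-8859-1"); that assumption and the
	   variable names are not taken from this manual.  Calling encode on
	   the same object corresponds to what the description of undecode
	   calls "$charset->decoder->encode()".

```perl
use strict;
use warnings;
use Encode ();

# Stand-in for $charset->decoder (an assumption made for this sketch).
my $decoder = Encode::find_encoding("iso-8859-1")
    or die "iso-8859-1 is not available";

my $bytes = "d\xe9chets";    # ISO-8859-1 octets

# Octets to Unicode characters, as decode() is described as doing.
my $unicode = $decoder->decode($bytes);

# Unicode characters back to octets of the input charset, as undecode()
# is described as doing.
my $again = $decoder->encode($unicode);

print $again eq $bytes ? "round trip ok\n" : "round trip failed\n";
```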

       $charset->dup
	   Get a copy of charset object.

       $charset->encoder([CHARSET])
	   Get the "Encode::Encoding" object used to encode Unicode strings
	   with the compatible charset recommended for messages on the
	   Internet.

	   If the optional CHARSET is specified, the encoder (and output
	   charset name) of the $charset object are replaced with those of
	   CHARSET, so that the $charset object becomes a converter between
	   the original charset and the new CHARSET.

       $charset->header_encoding
       header_encoding CHARSET
	   Get the recommended encoding scheme of CHARSET for the message
	   header.

	   The returned value will be one of "B", "Q", "S" (the shorter of
	   the two) or "undef" (may be left unencoded).  This may differ from
	   the encoding for the message body.

       $charset->output_charset
       output_charset CHARSET
	   Get a charset which is compatible with the given CHARSET and is
	   recommended for MIME messages on the Internet (if one is known to
	   this module).

	   When	Unicode/multibyte support is disabled (see "USE_ENCODE"), this
	   function will simply	return the result of "canonical_charset".

   Translating Text Data
       $charset->body_encode(STRING [, OPTS])
       body_encode STRING, CHARSET [, OPTS]
	   Get the converted (if needed) data of STRING and the recommended
	   transfer-encoding of that data for the message body.  CHARSET is
	   the charset in which STRING is encoded.

	   OPTS may contain the following key-value pairs.  NOTE: When
	   Unicode/multibyte support is disabled (see "USE_ENCODE"),
	   conversion will not be performed, so these options have no effect.

	   Detect7bit => YESNO
	       Try auto-detecting 7-bit	charset	when CHARSET is	not given.
	       Default is "YES".

	   Replacement => REPLACEMENT
	       Specifies error handling	scheme.	 See "Error Handling".

	   A 3-item list of (converted string, charset for output,
	   transfer-encoding) will be returned.  The transfer-encoding will
	   be one of "BASE64", "QUOTED-PRINTABLE", "7BIT" or "8BIT".  If the
	   charset for output could not be determined and the converted
	   string contains non-ASCII byte(s), the charset for output will be
	   "undef" and the transfer-encoding will be "BASE64".  The charset
	   for output will be "US-ASCII" if and only if the string does not
	   contain any non-ASCII bytes.

       $charset->decode(STRING [,CHECK])
	   Decode STRING to Unicode.

	   Note: When Unicode/multibyte	support	is disabled (see
	   "USE_ENCODE"), this function	will die.

       detect_7bit_charset STRING
	   Guess the 7-bit charset that may encode the string STRING.  If
	   STRING contains any 8-bit bytes, "undef" will be returned.
	   Otherwise, if the charset cannot be identified, the default
	   charset will be returned.

       $charset->encode(STRING [, CHECK])
	   Encode STRING (Unicode or non-Unicode) using the compatible
	   charset recommended for messages on the Internet (if this module
	   knows one).  Note that the string will be decoded to Unicode and
	   then encoded even if the compatible charset is the same as the
	   original charset.

	   Note: When Unicode/multibyte	support	is disabled (see
	   "USE_ENCODE"), this function	will die.

       $charset->encoded_header_len(STRING [, ENCODING])
       encoded_header_len STRING, ENCODING, CHARSET
	   Get the length of the encoded STRING for a message header (without
	   folding).

	   ENCODING may be one of "B", "Q" or "S" (the shorter of "B" or
	   "Q").

       $charset->header_encode(STRING [, OPTS])
       header_encode STRING, CHARSET [,	OPTS]
	   Get the converted (if needed) data of STRING and the recommended
	   encoding scheme of that data for message headers.  CHARSET is the
	   charset in which STRING is encoded.

	   OPTS may contain the following key-value pairs.  NOTE: When
	   Unicode/multibyte support is disabled (see "USE_ENCODE"),
	   conversion will not be performed, so these options have no effect.

	   Detect7bit => YESNO
	       Try auto-detecting 7-bit	charset	when CHARSET is	not given.
	       Default is "YES".

	   Replacement => REPLACEMENT
	       Specifies error handling	scheme.	 See "Error Handling".

	   A 3-item list of (converted string, charset for output, encoding
	   scheme) will be returned.  The encoding scheme will be one of "B",
	   "Q" or "undef" (may be left unencoded).  If the charset for output
	   could not be determined and the converted string contains
	   non-ASCII byte(s), the charset for output will be "8BIT" (this is
	   not a charset name but a special value representing unencodable
	   data) and the encoding scheme will be "undef" (should not be
	   encoded).  The charset for output will be "US-ASCII" if and only
	   if the string does not contain any non-ASCII bytes.

       $charset->undecode(STRING [,CHECK])
	   Encode the Unicode string STRING to a byte string using the input
	   charset of $charset.  This is equivalent to
	   "$charset->decoder->encode()".

	   Note: When Unicode/multibyte	support	is disabled (see
	   "USE_ENCODE"), this function	will die.

   Manipulating	Module Defaults
       alias ALIAS [, CHARSET]
	   Get/set a charset alias for the canonical names determined by
	   "canonical_charset".

	   If CHARSET is given and isn't false, ALIAS will be assigned as an
	   alias of CHARSET.  Otherwise, the alias won't be changed.  In both
	   cases, the charset name to which ALIAS is currently assigned will
	   be returned.

       default [CHARSET]
	   Get/set default charset.

	   The default charset is used by this module when the charset
	   context is unknown.  Modules using this module are encouraged to
	   use this charset when the charset context is unknown or an
	   implicit default is expected.  By default, it is "US-ASCII".

	   If CHARSET is given and isn't false, it will be set as the default
	   charset.  Otherwise, the default charset won't be changed.  In
	   both cases, the current default charset will be returned.

	   NOTE: The default charset should not be changed.

       fallback	[CHARSET]
	   Get/set fallback charset.

	   The fallback charset is used by this module when conversion with
	   the given charset fails and the "FALLBACK" error handling scheme
	   is specified.  Modules using this module may use this charset as a
	   last-resort charset for conversion.  By default, it is "UTF-8".

	   If CHARSET is given and isn't false, it will be set as the
	   fallback charset.  If CHARSET is "NONE", the fallback charset will
	   be undefined.  Otherwise, the fallback charset won't be changed.
	   In all cases, the current fallback charset will be returned.

	   NOTE: Specifying "US-ASCII" as the fallback charset is useful,
	   since the result of conversion will be readable without charset
	   information.
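
	   The get/set behaviour described above can be modelled in a few
	   lines.  The function sketch_fallback and its file-scoped variable
	   are hypothetical stand-ins written for this illustration; unlike
	   the real function, the sketch does not canonicalize charset names,
	   and treating only the exact string "NONE" as special is an
	   assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the module's fallback setting.
my $fallback = "UTF-8";

sub sketch_fallback {
    my ($charset) = @_;
    if ($charset) {
        # "NONE" undefines the fallback; any other true value replaces it.
        $fallback = $charset eq "NONE" ? undef : $charset;
    }
    return $fallback;    # the current value is returned in every case
}

print sketch_fallback(), "\n";              # get: nothing is changed
print sketch_fallback("us-ascii"), "\n";    # set, then return the new value
print defined sketch_fallback("NONE") ? "defined\n" : "undefined\n";
```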

       recommended CHARSET [, HEADERENC, BODYENC [, ENCCHARSET]]
	   Get/set charset profiles.

	   If optional arguments are given and any of them is not false, the
	   profiles for CHARSET will be set from those arguments.  Otherwise,
	   the profiles won't be changed.  In both cases, the current
	   profiles for CHARSET will be returned as a 3-item list of
	   (HEADERENC, BODYENC, ENCCHARSET).

	   HEADERENC is the recommended encoding scheme for the message
	   header.  It may be one of "B", "Q", "S" (the shorter of the two)
	   or "undef" (may be left unencoded).

	   BODYENC is the recommended transfer-encoding for the message body.
	   It may be one of "B", "Q", "S" (the shorter of the two) or "undef"
	   (may be left without transfer-encoding).

	   ENCCHARSET is a charset which is compatible with the given CHARSET
	   and is recommended for MIME messages on the Internet.  If
	   conversion is not needed (or this module doesn't know an
	   appropriate charset), ENCCHARSET is "undef".

	   NOTE: In future releases this function may accept more optional
	   arguments (for example, properties to handle character widths,
	   line folding behavior, ...), so the format of the returned value
	   may change.  Use "header_encoding", "body_encoding" or
	   "output_charset" to get a particular profile.

   Constants
       USE_ENCODE
	   Unicode/multibyte support flag.  It is set to a non-empty string
	   when Unicode and multibyte support is enabled.  Currently, this
	   flag is non-empty on Perl 5.7.3 or later and an empty string on
	   earlier versions of Perl.

   Error Handling
       "body_encode" and "header_encode" accept the following "Replacement"
       options:

       "DEFAULT"
	   Put a substitution character	in place of a malformed	character.
	   For UCM-based encodings, <subchar> will be used.

       "FALLBACK"
	   Try the "DEFAULT" scheme using the fallback charset (see
	   "fallback").  When the fallback charset is undefined and
	   conversion causes an error, the code will die with an error
	   message.

       "CROAK"
	   Code	will die on error immediately with an error message.
	   Therefore, you should trap the fatal	error with eval{} unless you
	   really want to let it die on	error.	Synonym	is "STRICT".

       "PERLQQ"
       "HTMLCREF"
       "XMLCREF"
	   Use "FB_PERLQQ", "FB_HTMLCREF" or "FB_XMLCREF" scheme defined by
	   Encode module.

       numeric values
	   Numeric values are also allowed.  For more details see "Handling
	   Malformed Data" in Encode.

       If no error handling scheme is specified or an unknown scheme is
       specified, "DEFAULT" will be assumed.
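
       Since "PERLQQ", "HTMLCREF" and "XMLCREF" refer to schemes defined by
       the Encode module, their effect can be observed with Encode alone.
       The sketch below does not call MIME::Charset; treating "DEFAULT" and
       "CROAK" as comparable to Encode's FB_DEFAULT and FB_CROAK is an
       assumption made for illustration, and the sample text and variable
       names are invented.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text = "d\x{e9}chets";    # contains U+00E9, which US-ASCII lacks

my %check = (
    DEFAULT  => Encode::FB_DEFAULT(),
    PERLQQ   => Encode::FB_PERLQQ(),
    HTMLCREF => Encode::FB_HTMLCREF(),
    XMLCREF  => Encode::FB_XMLCREF(),
);

# Each call gets its own copy because Encode may modify its input in
# place for some CHECK values.
my %result;
for my $scheme (sort keys %check) {
    my $copy = $text;
    $result{$scheme} = encode("ascii", $copy, $check{$scheme});
    print "$scheme: $result{$scheme}\n";
}

# FB_CROAK dies on the first unmappable character, so trap it with eval.
my $copy = $text;
my $ok = eval { encode("ascii", $copy, Encode::FB_CROAK()); 1 };
print "CROAK: ", ($ok ? "no error\n" : "died: $@");
```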

   Configuration File
       Built-in defaults for option parameters can be overridden by a
       configuration file, MIME/Charset/Defaults.pm.  For more details, read
       MIME/Charset/Defaults.pm.sample.

VERSION
       Consult $VERSION	variable.

       Development versions of this module may be found	at
       <http://hatuka.nezumi.nu/repos/MIME-Charset/>.

   Incompatible	Changes
       Release 1.001
	   o   new() method returns an object when CHARSET argument is not
	       specified.

       Release 1.005
	   o   Restrict	characters in encoded-word according to	RFC 2047
	       section 5 (3).  This also affects return	value of
	       encoded_header_len() method.

       Release 1.008.2
	   o   body_encoding() method may also return "S".

	   o   Return value of body_encode() method for	UTF-8 may include
	       "QUOTED-PRINTABLE" encoding item	that in	earlier	versions was
	       fixed to	"BASE64".

SEE ALSO
       Multipurpose Internet Mail Extensions (MIME).

AUTHOR
       Hatuka*nezumi - IKEDA Soji <hatuka(at)nezumi.nu>

COPYRIGHT
       Copyright (C) 2006-2017 Hatuka*nezumi - IKEDA Soji.  This program is
       free software; you can redistribute it and/or modify it under the same
       terms as	Perl itself.

perl v5.32.0			  2017-04-11		      MIME::Charset(3)
