Code Conversion(3m17n)		 Version 1.5.5		Code Conversion(3m17n)

NAME
       Code Conversion - Coding system objects and API for them.

   Data Structures
       struct MConverter
           Structure to be used in code conversion.
       struct MCodingInfoISO2022
           Structure for a coding system of type MCODING_TYPE_ISO_2022.
       struct MCodingInfoUTF
           Structure for extra information about a coding system of type
           MCODING_TYPE_UTF.

   Variables: Symbols representing coding systems
       MSymbol Mcoding_us_ascii
           Symbol for the coding system US-ASCII.
       MSymbol Mcoding_iso_8859_1
           Symbol for the coding system ISO-8859-1.
       MSymbol Mcoding_utf_8
           Symbol for the coding system UTF-8.
       MSymbol Mcoding_utf_8_full
           Symbol for the coding system UTF-8-FULL.
       MSymbol Mcoding_utf_16
           Symbol for the coding system UTF-16.
       MSymbol Mcoding_utf_16be
           Symbol for the coding system UTF-16BE.
       MSymbol Mcoding_utf_16le
           Symbol for the coding system UTF-16LE.
       MSymbol Mcoding_utf_32
           Symbol for the coding system UTF-32.
       MSymbol Mcoding_utf_32be
           Symbol for the coding system UTF-32BE.
       MSymbol Mcoding_utf_32le
           Symbol for the coding system UTF-32LE.
       MSymbol Mcoding_sjis
           Symbol for the coding system SJIS.

   Variables: Parameter keys for mconv_define_coding().
       MSymbol Mtype
       MSymbol Mcharsets
       MSymbol Mflags
       MSymbol Mdesignation
       MSymbol Minvocation
       MSymbol Mcode_unit
       MSymbol Mbom
       MSymbol Mlittle_endian

   Variables: Symbols representing coding system types.
       MSymbol Mutf
       MSymbol Miso_2022

   Variables: Symbols appearing in the value of Mflags parameter.
       Symbols that can be a value of the Mflags parameter of a coding system
       used in an argument to the mconv_define_coding() function (which see).
       MSymbol Mreset_at_eol
       MSymbol Mreset_at_cntl
       MSymbol Meight_bit
       MSymbol Mlong_form
       MSymbol Mdesignation_g0
       MSymbol Mdesignation_g1
       MSymbol Mdesignation_ctext
       MSymbol Mdesignation_ctext_ext
       MSymbol Mlocking_shift
       MSymbol Msingle_shift
       MSymbol Msingle_shift_7
       MSymbol Meuc_tw_shift
       MSymbol Miso_6429
       MSymbol Mrevision_number
       MSymbol Mfull_support

   Variables: Others
       Remaining variables.
       MSymbol Mmaybe
           Symbol whose name is 'maybe'.
       MSymbol Mcoding
	   The symbol Mcoding.

   Enumerations
       enum MConversionResult { MCONVERSION_RESULT_SUCCESS,
           MCONVERSION_RESULT_INVALID_BYTE, MCONVERSION_RESULT_INVALID_CHAR,
           MCONVERSION_RESULT_INSUFFICIENT_SRC,
           MCONVERSION_RESULT_INSUFFICIENT_DST, MCONVERSION_RESULT_IO_ERROR }
           Codes that represent the result of code conversion.
       enum MCodingType { MCODING_TYPE_CHARSET, MCODING_TYPE_UTF,
           MCODING_TYPE_ISO_2022, MCODING_TYPE_MISC }
           Types of coding system.
       enum MCodingFlagISO2022 { MCODING_ISO_RESET_AT_EOL = 0x1,
           MCODING_ISO_RESET_AT_CNTL = 0x2, MCODING_ISO_EIGHT_BIT = 0x4,
           MCODING_ISO_LONG_FORM = 0x8, MCODING_ISO_DESIGNATION_G0 = 0x10,
           MCODING_ISO_DESIGNATION_G1 = 0x20,
           MCODING_ISO_DESIGNATION_CTEXT = 0x40,
           MCODING_ISO_DESIGNATION_CTEXT_EXT = 0x80,
           MCODING_ISO_LOCKING_SHIFT = 0x100, MCODING_ISO_SINGLE_SHIFT = 0x200,
           MCODING_ISO_SINGLE_SHIFT_7 = 0x400,
           MCODING_ISO_EUC_TW_SHIFT = 0x800, MCODING_ISO_ISO6429 = 0x1000,
           MCODING_ISO_REVISION_NUMBER = 0x2000,
           MCODING_ISO_FULL_SUPPORT = 0x3000, MCODING_ISO_FLAG_MAX }
           Bit-masks to specify the detail of coding system whose type is
           MCODING_TYPE_ISO_2022.

   Functions
       MSymbol mconv_define_coding (const char *name, MPlist *plist,
           int (*resetter)(MConverter *), int (*decoder)(const unsigned char *,
           int, MText *, MConverter *), int (*encoder)(MText *, int, int,
           unsigned char *, int, MConverter *), void *extra_info)
           Define a coding system.
       MSymbol mconv_resolve_coding (MSymbol symbol)
           Resolve coding system name.
       int mconv_list_codings (MSymbol **symbols)
           List symbols representing coding systems.
       MConverter * mconv_buffer_converter (MSymbol name, const unsigned char
           *buf, int n)
           Create a code converter bound to a buffer.
       MConverter * mconv_stream_converter (MSymbol name, FILE *fp)
           Create a code converter bound to a stream.
       int mconv_reset_converter (MConverter *converter)
           Reset a code converter.
       void mconv_free_converter (MConverter *converter)
           Free a code converter.
       MConverter * mconv_rebind_buffer (MConverter *converter, const unsigned
           char *buf, int n)
           Bind a buffer to a code converter.
       MConverter * mconv_rebind_stream (MConverter *converter, FILE *fp)
           Bind a stream to a code converter.
       MText * mconv_decode (MConverter *converter, MText *mt)
           Decode a byte sequence into an M-text.
       MText * mconv_decode_buffer (MSymbol name, const unsigned char *buf,
           int n)
           Decode a buffer area based on a coding system.
       MText * mconv_decode_stream (MSymbol name, FILE *fp)
           Decode a stream input based on a coding system.
       int mconv_encode (MConverter *converter, MText *mt)
           Encode an M-text into a byte sequence.
       int mconv_encode_range (MConverter *converter, MText *mt, int from, int
           to)
           Encode a part of an M-text.
       int mconv_encode_buffer (MSymbol name, MText *mt, unsigned char *buf,
           int n)
           Encode an M-text into a buffer area.
       int mconv_encode_stream (MSymbol name, MText *mt, FILE *fp)
           Encode an M-text to write to a stream.
       int mconv_getc (MConverter *converter)
           Read a character via a code converter.
       int mconv_ungetc (MConverter *converter, int c)
           Push a character back to a code converter.
       int mconv_putc (MConverter *converter, int c)
           Write a character via a code converter.
       MText * mconv_gets (MConverter *converter, MText *mt)
           Read a line using a code converter.

Detailed Description
       Coding system objects and API for them.

       The m17n library represents a character encoding scheme (CES) of coded
       character sets (CCS) as an object called a coding system. Application
       programs can also define their own coding systems.

       To encode means converting character codes to code-points, and to
       decode means converting code-points back to character codes.

       Application programs can decode a byte sequence with a specified coding
       system into an M-text and, conversely, encode an M-text into a byte
       sequence.
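       The following self-contained sketch illustrates the two directions for
       a single-byte coding system modelled on ISO-8859-1, where a byte
       value, its code-point, and the character code coincide. It does not
       use libm17n; latin1_decode() and latin1_encode() are hypothetical
       names, and returning -1 only loosely mirrors the strict-mode
       MCONVERSION_RESULT_INVALID_CHAR behaviour.

```c
#include <stddef.h>

/* Decode: each ISO-8859-1 byte becomes the character with the same code.
   Returns the number of characters written to chars. */
size_t latin1_decode(const unsigned char *bytes, size_t n, int *chars)
{
  for (size_t i = 0; i < n; i++)
    chars[i] = bytes[i];
  return n;
}

/* Encode: characters above 0xFF have no ISO-8859-1 code-point.
   Returns the number of bytes written, or -1 at the first such character. */
long latin1_encode(const int *chars, size_t n, unsigned char *bytes)
{
  for (size_t i = 0; i < n; i++)
    {
      if (chars[i] < 0 || chars[i] > 0xFF)
        return -1;
      bytes[i] = (unsigned char) chars[i];
    }
  return (long) n;
}
```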

Data Structure Documentation
   MConverter
       Structure to be used in code conversion.

       FIELD DOCUMENTATION:

       int MConverter::lenient

       Set the value to nonzero if the conversion should be lenient. By
       default, the conversion is strict (i.e. not lenient).

       If the conversion is strict, the converter stops at the first invalid
       byte (on decoding) or at the first character not supported by the
       coding system (on encoding). In that case, MConverter->result is set
       to MCONVERSION_RESULT_INVALID_BYTE or MCONVERSION_RESULT_INVALID_CHAR,
       respectively.

       If the conversion is lenient, an invalid byte is kept as is on
       decoding, and on encoding an unsupported character is replaced with
       '<U+XXXX>' (if the character is a Unicode character) or with
       '<M+XXXXXX>' (otherwise).
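       To make the strict and lenient behaviour concrete, the sketch below
       models encoding into US-ASCII without linking against libm17n. The
       function ascii_encode() and the demo_result codes are hypothetical,
       and the exact placeholder formatting (uppercase hex, at least four
       digits for '<U+XXXX>' and six for '<M+XXXXXX>', with codes up to
       0x10FFFF treated as Unicode) is an assumption based on the patterns
       above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical result codes mirroring part of MConversionResult. */
enum demo_result { DEMO_SUCCESS, DEMO_INVALID_CHAR };

/* Encode non-negative character codes to US-ASCII into out, which must hold
   10 bytes per character plus a terminator (codes below 0x1000000 assumed).
   Strict mode stops at the first non-ASCII character; lenient mode writes a
   placeholder instead. *nchars receives the number of characters consumed. */
enum demo_result
ascii_encode(const int *chars, int n, int lenient, char *out, int *nchars)
{
  char *p = out;
  int i;

  for (i = 0; i < n; i++)
    {
      int c = chars[i];

      if (c < 0x80)
        *p++ = (char) c;
      else if (!lenient)
        break;                                      /* strict: stop here */
      else if (c <= 0x10FFFF)
        p += sprintf(p, "<U+%04X>", (unsigned) c);  /* Unicode character */
      else
        p += sprintf(p, "<M+%06X>", (unsigned) c);  /* non-Unicode code */
    }
  *p = '\0';
  *nchars = i;
  return i < n ? DEMO_INVALID_CHAR : DEMO_SUCCESS;
}
```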

       int MConverter::last_block

       Set the value to nonzero before decoding or encoding the last block of
       the byte sequence or the character sequence, respectively. The value
       influences the conversion as follows.

       On decoding, if the last few bytes are too few to form a valid byte
       sequence:

       If the value is nonzero, the conversion terminates with an error
       (MCONVERSION_RESULT_INVALID_BYTE) at the first byte of that sequence.

       If the value is zero, the conversion terminates successfully. Those
       bytes are stored in the converter as carryover and are prepended to the
       byte sequence of the next conversion.

       On encoding, if the coding system is context dependent:

       If the value is nonzero, the conversion may produce a byte sequence at
       the end that resets the context to the initial state, even if there
       are no source characters.

       If the value is zero, the conversion never produces such a byte
       sequence at the end.
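       The decoding rule for last_block can be modelled as follows for UTF-8
       input. This is a hypothetical sketch, not m17n's decoder:
       utf8_incomplete_tail() finds a sequence cut off at the end of a chunk,
       and end_of_chunk() either reports it as an error (last block) or keeps
       it as carryover. The lead-byte table is simplified and does not reject
       every malformed sequence.

```c
/* Expected length of a UTF-8 sequence starting with lead byte b,
   or 0 if b cannot start a sequence. */
static int utf8_seq_len(unsigned char b)
{
  if (b < 0x80) return 1;
  if (b >= 0xC2 && b <= 0xDF) return 2;
  if (b >= 0xE0 && b <= 0xEF) return 3;
  if (b >= 0xF0 && b <= 0xF4) return 4;
  return 0;
}

/* Number of trailing bytes of buf[0..n) that begin a UTF-8 sequence
   cut off by the end of the buffer (0 if the buffer ends cleanly). */
int utf8_incomplete_tail(const unsigned char *buf, int n)
{
  for (int back = 1; back <= 3 && back <= n; back++)
    {
      unsigned char b = buf[n - back];
      if ((b & 0xC0) == 0x80)
        continue;                       /* continuation byte: look further */
      int len = utf8_seq_len(b);
      return len > back ? back : 0;
    }
  return 0;
}

enum chunk_result { CHUNK_OK, CHUNK_INVALID_BYTE };

/* Model of the last_block rule: *carry receives the number of trailing
   bytes to keep and prepend to the next chunk. */
enum chunk_result
end_of_chunk(const unsigned char *buf, int n, int last_block, int *carry)
{
  int tail = utf8_incomplete_tail(buf, n);

  *carry = 0;
  if (tail == 0)
    return CHUNK_OK;
  if (last_block)
    return CHUNK_INVALID_BYTE;          /* error at the first byte of the tail */
  *carry = tail;                        /* keep as carryover */
  return CHUNK_OK;
}
```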

       unsigned MConverter::at_most

       If the value is nonzero, it specifies the maximum number of characters
       to convert.

       The following three members report the result of the conversion.

       int MConverter::nchars

       Number of characters most recently decoded or encoded.

       int MConverter::nbytes

       Number of bytes most recently decoded or encoded.
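       The following hypothetical sketch shows how at_most, nchars, and
       nbytes relate when decoding UTF-8: decoding stops after at_most
       characters, and the two counters report what the last call produced
       and consumed. struct demo_conv and demo_decode() are illustrative
       stand-ins rather than m17n types, and the decoder assumes well-formed
       input.

```c
/* Hypothetical stand-in for the relevant MConverter members. */
struct demo_conv {
  unsigned at_most;  /* 0 means "no limit" */
  int nchars;        /* characters produced by the last call */
  int nbytes;        /* bytes consumed by the last call */
};

/* Decode UTF-8 from buf[0..n) into chars, stopping after at_most characters
   (if nonzero) or before a sequence cut off by the end of the buffer.
   Validation is omitted for brevity. */
void demo_decode(struct demo_conv *cv, const unsigned char *buf, int n,
                 int *chars)
{
  int i = 0, k = 0;

  while (i < n && (cv->at_most == 0 || (unsigned) k < cv->at_most))
    {
      unsigned char b = buf[i];
      int len = b < 0x80 ? 1 : b < 0xE0 ? 2 : b < 0xF0 ? 3 : 4;
      int c;

      if (i + len > n)
        break;                              /* incomplete tail: stop here */
      c = len == 1 ? b : b & (0x7F >> len); /* payload bits of lead byte */
      for (int j = 1; j < len; j++)
        c = (c << 6) | (buf[i + j] & 0x3F);
      chars[k++] = c;
      i += len;
    }
  cv->nchars = k;
  cv->nbytes = i;
}
```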

       enum MConversionResult MConverter::result

       Result code of the conversion.

       void* MConverter::ptr

       double MConverter::dbl

       char MConverter::c[256]

       union { ... }   MConverter::status

       Various information about the status of code conversion. The contents
       depend on the type of coding system. The status union is guaranteed to
       be aligned so that casting it to any type is safe, and at least 256
       bytes of memory space can be used.
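       As a sketch of how a custom decoder or encoder might keep its own
       state in the status member, the code below uses a stand-in union with
       the same members and a hypothetical struct demo_state. It copies the
       state with memcpy() rather than casting the union's address, which
       avoids strict-aliasing concerns; the compile-time check confirms that
       the state fits in the guaranteed 256 bytes.

```c
#include <string.h>

/* Hypothetical stand-in for the MConverter::status union. */
typedef union {
  void *ptr;
  double dbl;
  char c[256];
} demo_status;

/* Hypothetical state a custom coding system might keep between calls. */
struct demo_state {
  int shift_state;
  int npending;
  unsigned char pending[8];
};

/* Fail at compile time if the state outgrows the guaranteed space. */
_Static_assert(sizeof(struct demo_state) <= 256, "state must fit in status");

void save_state(demo_status *st, const struct demo_state *s)
{
  memcpy(st->c, s, sizeof *s);
}

void load_state(const demo_status *st, struct demo_state *s)
{
  memcpy(s, st->c, sizeof *s);
}
```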

       void* MConverter::internal_info

       This member is for internal use only. An application program should
       never touch it.

   MCodingInfoISO2022
       Structure for a coding system of type MCODING_TYPE_ISO_2022.

       FIELD DOCUMENTATION:

       int MCodingInfoISO2022::initial_invocation[2]

       For each graphic plane (Graphic Left and Graphic Right), the number of
       the ISO 2022 code extension element initially invoked into that plane.
       -1 means that no code extension element is invoked into that plane.

       char MCodingInfoISO2022::designations[32]

       Table of code extension elements. The Nth element corresponds to the
       Nth charset in charset_names, which is an argument given to the
       mconv_define_coding() function.

       If an element value is 0..3, it specifies the number of the graphic
       register to which the corresponding charset is designated. In
       addition, the charset is initially designated to that graphic
       register.

       If the value is -4..-1, it specifies graphic register number 0..3,
       respectively, to which the corresponding charset is designated.
       Initially, the charset is not designated to any graphic register.
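       The two cases of the designations table can be modelled as below.
       designation_register() and initial_designations() are hypothetical
       helpers. The sketch uses signed char so that the negative entries
       behave the same on every platform, whereas the documented field is
       plain char.

```c
/* Graphic register (0..3) to which a charset is designated, given its
   entry in the designations table: 0..3 map to themselves, -4..-1 to 0..3. */
int designation_register(signed char d)
{
  return d >= 0 ? d : d + 4;
}

/* Fill regs[0..3] with the index of the charset initially designated to
   G0..G3, or -1 if none. Only entries 0..3 are initial designations;
   if several entries name the same register, the last one wins here. */
void initial_designations(const signed char *designations, int ncharsets,
                          int regs[4])
{
  for (int g = 0; g < 4; g++)
    regs[g] = -1;
  for (int i = 0; i < ncharsets; i++)
    if (designations[i] >= 0 && designations[i] <= 3)
      regs[designations[i]] = i;
}
```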

       unsigned MCodingInfoISO2022::flags

       Bitwise OR of enum MCodingFlagISO2022 values.
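       Flags are combined with the bitwise OR operator and tested with a
       mask. In the sketch below the enumerator values are copied from enum
       MCodingFlagISO2022 (real code would include <m17n.h> instead), and the
       flag combination is a hypothetical example rather than the flags of
       any predefined coding system.

```c
/* Subset of enum MCodingFlagISO2022, values copied from this page. */
enum {
  MCODING_ISO_RESET_AT_EOL = 0x1,
  MCODING_ISO_DESIGNATION_G0 = 0x10,
  MCODING_ISO_LOCKING_SHIFT = 0x100,
  MCODING_ISO_ISO6429 = 0x1000,
  MCODING_ISO_REVISION_NUMBER = 0x2000,
  MCODING_ISO_FULL_SUPPORT = 0x3000
};

/* True only if every bit of mask is set in flags. */
int has_all(unsigned flags, unsigned mask)
{
  return (flags & mask) == mask;
}
```

       Because MCODING_ISO_FULL_SUPPORT (0x3000) shares its bits with
       MCODING_ISO_ISO6429 (0x1000) and MCODING_ISO_REVISION_NUMBER (0x2000),
       a plain (flags & MCODING_ISO_FULL_SUPPORT) test is also true when only
       one of those bits is set; comparing the masked value with the mask
       avoids that.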

   MCodingInfoUTF
       Structure for extra information about a coding system of type
       MCODING_TYPE_UTF.

       FIELD DOCUMENTATION:

       int MCodingInfoUTF::code_unit_bits

       Number of bits in a code unit. The value must be 8, 16, or 32.

       int MCodingInfoUTF::bom

       Specify how to handle the leading BOM (byte order mark). The value must
       be 0, 1, or 2, with the following meanings:

       0: On decoding, check the first two bytes. If they are a BOM, determine
       the byte order from them; if not, determine it from the member endian.
       On encoding, produce a byte sequence in the byte order given by endian,
       with a leading BOM.

       1: On decoding, do not treat the first two bytes as a BOM, and
       determine the byte order from endian. On encoding, produce a byte
       sequence in the byte order given by endian, without a BOM.

       2: On decoding, treat the first two bytes as a BOM and determine the
       byte order from them. On encoding, produce a byte sequence in the byte
       order given by endian, with a leading BOM.

       If code_unit_bits is 8, the value has no meaning.

       int MCodingInfoUTF::endian

       Specify the byte order. The value must be 0 or 1: 0 means little
       endian, and 1 means big endian.

       If code_unit_bits is 8, the value has no meaning.
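       The decoding side of the bom and endian rules for 16-bit code units
       can be summarized in a small function. utf16_decode_endian() is
       hypothetical; the documentation does not say what happens in mode 2
       when the first two bytes are not a BOM, so this sketch returns -1 in
       that case as an assumption.

```c
/* Byte order used to decode UTF-16, following the bom/endian rules.
   bom is 0, 1 or 2; endian is 0 (little) or 1 (big).
   Returns 0 (little endian), 1 (big endian), or -1 when mode 2 finds no
   BOM (assumption). *skip receives the number of BOM bytes to skip. */
int utf16_decode_endian(int bom, int endian, const unsigned char *buf,
                        int n, int *skip)
{
  int be_bom = n >= 2 && buf[0] == 0xFE && buf[1] == 0xFF;
  int le_bom = n >= 2 && buf[0] == 0xFF && buf[1] == 0xFE;

  *skip = 0;
  if (bom == 1)                     /* never treat the first bytes as a BOM */
    return endian;
  if (be_bom || le_bom)
    {
      *skip = 2;
      return be_bom ? 1 : 0;
    }
  return bom == 0 ? endian : -1;    /* mode 0 falls back to endian */
}
```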

Enumeration Type Documentation
   enum MConversionResult
       Codes that represent the result of code conversion.

       One of these values is set in MConverter->result.

       Enumerator:

       MCONVERSION_RESULT_SUCCESS
	      Code conversion is successful.

       MCONVERSION_RESULT_INVALID_BYTE
	      On decoding, the source contains an invalid byte.

       MCONVERSION_RESULT_INVALID_CHAR
              On encoding, the source contains a character that cannot be
              encoded by the specified coding system.

       MCONVERSION_RESULT_INSUFFICIENT_SRC
              On decoding, the source ends with an incomplete byte sequence.

       MCONVERSION_RESULT_INSUFFICIENT_DST
              On encoding, the destination is too short to store the result.

       MCONVERSION_RESULT_IO_ERROR
	      An I/O error occurred in the conversion.
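       A caller that encodes into a fixed-size buffer can react to
       MCONVERSION_RESULT_INSUFFICIENT_DST by retrying with a larger buffer.
       The sketch below shows that pattern with a hypothetical Latin-1
       encoder and hypothetical demo_result codes that mirror the
       enumerators above; it does not call libm17n.

```c
#include <stdlib.h>

enum demo_result { DEMO_SUCCESS, DEMO_INVALID_CHAR, DEMO_INSUFFICIENT_DST };

/* Hypothetical Latin-1 encoder that reports a too-small destination. */
static enum demo_result
encode_latin1(const int *chars, int n, unsigned char *dst, int size,
              int *nbytes)
{
  int i;

  for (i = 0; i < n; i++)
    {
      if (chars[i] < 0 || chars[i] > 0xFF)
        { *nbytes = i; return DEMO_INVALID_CHAR; }
      if (i >= size)
        { *nbytes = i; return DEMO_INSUFFICIENT_DST; }
      dst[i] = (unsigned char) chars[i];
    }
  *nbytes = n;
  return DEMO_SUCCESS;
}

/* Retry with a doubled buffer while the result is INSUFFICIENT_DST.
   Returns a malloc'ed buffer (caller frees) or NULL on any other error. */
unsigned char *encode_growing(const int *chars, int n, int *nbytes)
{
  int size = 4;
  unsigned char *buf = NULL;

  for (;;)
    {
      unsigned char *tmp = realloc(buf, size);
      if (!tmp)
        { free(buf); return NULL; }
      buf = tmp;
      switch (encode_latin1(chars, n, buf, size, nbytes))
        {
        case DEMO_SUCCESS:
          return buf;
        case DEMO_INSUFFICIENT_DST:
          size *= 2;                /* grow and start over */
          break;
        default:
          free(buf);
          return NULL;
        }
    }
}
```

       For simplicity the sketch restarts the conversion from the beginning
       after each enlargement, which is adequate for an in-memory source;
       doubling keeps the number of retries logarithmic in the output size.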

   enum MCodingType
       Types of coding system.

       Enumerator:

       MCODING_TYPE_CHARSET
              A coding system of this type supports charsets directly. The
              dimension of each charset defines the number of bytes that
              represent a single character of the charset, and a byte
              sequence directly represents the code-point of a character. The
              m17n library provides the default decoding and encoding routines
              for this type.

       MCODING_TYPE_UTF
              A coding system of this type supports byte sequences with a
              UTF-like (UTF-8, UTF-16, UTF-32) structure. The m17n library
              provides the default decoding and encoding routines for this
              type.

       MCODING_TYPE_ISO_2022
              A coding system of this type supports byte sequences with an
              ISO-2022-like structure. The details of each structure are
              specified by MCodingInfoISO2022. The m17n library provides
              decoding and encoding routines for this type.

       MCODING_TYPE_MISC
              A coding system of this type is for byte sequences of
              miscellaneous structures. The m17n library does not provide
              decoding and encoding routines for this type; they must be
              provided by the application program.
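       For MCODING_TYPE_CHARSET, the relation between a charset's dimension
       and the byte sequence can be sketched as follows. The helper names are
       hypothetical, and treating the first byte as the most significant byte
       of the code-point is an assumption made for illustration only.

```c
/* Code-point of one character of a charset with the given dimension
   (1..4 bytes), first byte most significant (assumption). */
unsigned charset_code_point(const unsigned char *p, int dimension)
{
  unsigned code = 0;

  for (int i = 0; i < dimension; i++)
    code = (code << 8) | p[i];
  return code;
}

/* Split buf[0..n) into code-points. Returns their count, or -1 if the
   dimension is invalid or n is not a multiple of it. */
int charset_decode(const unsigned char *buf, int n, int dimension,
                   unsigned *codes)
{
  int k = 0;

  if (dimension < 1 || dimension > 4 || n % dimension != 0)
    return -1;
  for (int i = 0; i < n; i += dimension)
    codes[k++] = charset_code_point(buf + i, dimension);
  return k;
}
```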

   enum MCodingFlagISO2022
       Bit-masks to specify the details of a coding system whose type is
       MCODING_TYPE_ISO_2022.

       Enumerator:

       MCODING_ISO_RESET_AT_EOL
              On encoding, reset the invocation and designation status to the
              initial state at the end of each line.

       MCODING_ISO_RESET_AT_CNTL
              On encoding, reset the invocation and designation status to the
              initial state before any control code.

       MCODING_ISO_EIGHT_BIT
              Use the right graphic plane.

       MCODING_ISO_LONG_FORM
              Use the non-standard 4-byte format of the designation sequence
              for the charsets JISX0208-1978, GB2312, and JISX0208-1983.

       MCODING_ISO_DESIGNATION_G0
              On encoding, unless explicitly specified, designate charsets to
              G0.

       MCODING_ISO_DESIGNATION_G1
              On encoding, unless explicitly specified, designate charsets
              other than ASCII to G1.

       MCODING_ISO_DESIGNATION_CTEXT
              On encoding, unless explicitly specified, designate 94-character
              charsets to G0 and 96-character charsets to G1.

       MCODING_ISO_DESIGNATION_CTEXT_EXT
              On encoding, encode charsets that do not conform to ISO-2022
              with ESC % / ..., and encode unsupported Unicode characters with
              ESC % G ... ESC % @. On decoding, handle those escape sequences.

       MCODING_ISO_LOCKING_SHIFT
              Use locking shift.

       MCODING_ISO_SINGLE_SHIFT
              Use single shift (SS2 (0x8E or ESC N), SS3 (0x8F or ESC O)).

       MCODING_ISO_SINGLE_SHIFT_7
              Use 7-bit single shift 2 (SS2 (0x19)).

       MCODING_ISO_EUC_TW_SHIFT
              Use EUC-TW-like special shifting.

       MCODING_ISO_ISO6429
              Use ISO-6429 escape sequences to indicate direction. Not yet
              implemented.

       MCODING_ISO_REVISION_NUMBER
              On encoding, if a charset has a revision number, produce escape
              sequences that specify the number.

       MCODING_ISO_FULL_SUPPORT
              Support all ISO-2022 charsets.

       MCODING_ISO_FLAG_MAX
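       To illustrate what MCODING_ISO_LONG_FORM and the register-based
       designation flags control, the sketch below builds ISO 2022
       designation escape sequences. The intermediate bytes follow ISO 2022
       (ESC ( ) * + for 94-character sets and ESC - . / for 96-character
       sets, preceded by $ for multi-byte sets); the function
       designation_sequence() is hypothetical and is not taken from m17n.

```c
/* Build the ISO 2022 escape sequence that designates a charset with final
   byte final_byte to graphic register reg (0..3).
   chars: 94 or 96; multibyte: nonzero for 94^n / 96^n sets;
   long_form: nonzero to always use ESC $ ( F (cf. MCODING_ISO_LONG_FORM).
   Writes a NUL-terminated sequence to out (at least 5 bytes) and returns
   its length, or -1 if the combination is invalid. */
int designation_sequence(int chars, int multibyte, int reg, char final_byte,
                         int long_form, char *out)
{
  static const char inter94[4] = { '(', ')', '*', '+' };
  static const char inter96[4] = { 0, '-', '.', '/' }; /* no 96-set in G0 */
  int len = 0;
  int short_form;

  if (reg < 0 || reg > 3 || (chars != 94 && chars != 96)
      || (chars == 96 && reg == 0))
    return -1;

  /* The 3-byte form ESC $ F exists only for G0 with final bytes @, A, B. */
  short_form = multibyte && chars == 94 && reg == 0 && !long_form
               && (final_byte == '@' || final_byte == 'A' || final_byte == 'B');

  out[len++] = 0x1B;                    /* ESC */
  if (multibyte)
    out[len++] = '$';
  if (!short_form)
    out[len++] = chars == 94 ? inter94[reg] : inter96[reg];
  out[len++] = final_byte;
  out[len] = '\0';
  return len;
}
```

       For example, ESC ( B designates ASCII to G0 and ESC $ B designates
       JIS X 0208-1983 to G0; with the long form the latter becomes
       ESC $ ( B.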

Variable Documentation
   MSymbol Mcoding_us_ascii
       Symbol for the coding system US-ASCII.

       The symbol Mcoding_us_ascii has name 'us-ascii' and represents a coding
       system for the CES US-ASCII.

   MSymbol Mcoding_iso_8859_1
       Symbol for the coding system ISO-8859-1.

       The symbol Mcoding_iso_8859_1 has name 'iso-8859-1' and represents a
       coding system for the CES ISO-8859-1.

   MSymbol Mcoding_utf_8
       Symbol for the coding system UTF-8.

       The symbol Mcoding_utf_8 has name 'utf-8' and represents a coding
       system for the CES UTF-8.

   MSymbol Mcoding_utf_8_full
       Symbol for the coding system UTF-8-FULL.

       The symbol Mcoding_utf_8_full has name 'utf-8-full' and represents a
       coding system that is an extension of UTF-8. This coding system uses
       the same encoding algorithm as UTF-8 but is not limited to Unicode
       characters. It can encode all characters supported by the m17n library.
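       Assuming that UTF-8-FULL extends the UTF-8 bit pattern in the same way
       as the original RFC 2279 definition (5- and 6-byte forms for codes
       beyond U+10FFFF), an encoder could be sketched as below.
       utf8_full_encode() is hypothetical, and whether m17n uses exactly
       these forms is not stated on this page.

```c
/* Encode c with the RFC 2279 UTF-8 bit pattern, which also covers codes
   above U+10FFFF (up to 0x7FFFFFFF) with longer forms.
   Returns the number of bytes written to out (up to 6), or 0 if out of range. */
int utf8_full_encode(long c, unsigned char *out)
{
  static const unsigned char lead[7] = { 0, 0, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC };
  int len;

  if (c < 0 || c > 0x7FFFFFFF)
    return 0;
  if (c < 0x80)
    {
      out[0] = (unsigned char) c;
      return 1;
    }
  len = c < 0x800 ? 2 : c < 0x10000 ? 3 : c < 0x200000 ? 4
        : c < 0x4000000 ? 5 : 6;
  for (int i = len - 1; i > 0; i--)
    {
      out[i] = (unsigned char) (0x80 | (c & 0x3F)); /* 6 payload bits */
      c >>= 6;
    }
  out[0] = (unsigned char) (lead[len] | c);        /* remaining high bits */
  return len;
}
```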

   MSymbol Mcoding_utf_16
       Symbol for the coding system UTF-16.

       The symbol Mcoding_utf_16 has name 'utf-16' and represents a coding
       system for the CES UTF-16 (RFC 2781).

   MSymbol Mcoding_utf_16be
       Symbol for the coding system UTF-16BE.

       The symbol Mcoding_utf_16be has name 'utf-16be' and represents a coding
       system for the CES UTF-16BE (RFC 2781).

   MSymbol Mcoding_utf_16le
       Symbol for the coding system UTF-16LE.

       The symbol Mcoding_utf_16le has name 'utf-16le' and represents a coding
       system for the CES UTF-16LE (RFC 2781).
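       The byte-order difference between UTF-16BE and UTF-16LE, and the use
       of surrogate pairs above U+FFFF, can be illustrated with the following
       hypothetical helper, utf16_encode(). It is a sketch of the UTF-16
       scheme itself, not of m17n's encoder.

```c
/* Encode one Unicode scalar value as UTF-16 in the given byte order
   (big_endian nonzero for UTF-16BE, zero for UTF-16LE). Characters above
   U+FFFF become a surrogate pair. Returns bytes written (2 or 4), or 0 for
   values UTF-16 cannot represent. */
int utf16_encode(unsigned long c, int big_endian, unsigned char *out)
{
  unsigned units[2];
  int n = 0;

  if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
    return 0;
  if (c < 0x10000)
    units[n++] = (unsigned) c;
  else
    {
      c -= 0x10000;
      units[n++] = 0xD800 | (unsigned) (c >> 10);   /* high surrogate */
      units[n++] = 0xDC00 | (unsigned) (c & 0x3FF); /* low surrogate */
    }
  for (int i = 0; i < n; i++)
    {
      unsigned char hi = (unsigned char) (units[i] >> 8);
      unsigned char lo = (unsigned char) (units[i] & 0xFF);
      out[2 * i] = big_endian ? hi : lo;
      out[2 * i + 1] = big_endian ? lo : hi;
    }
  return 2 * n;
}
```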

   MSymbol Mcoding_utf_32
       Symbol for the coding system UTF-32.

       The symbol Mcoding_utf_32 has name 'utf-32' and represents a coding
       system for the CES UTF-32 (defined in the Unicode Standard).

   MSymbol Mcoding_utf_32be
       Symbol for the coding system UTF-32BE.

       The symbol Mcoding_utf_32be has name 'utf-32be' and represents a coding
       system for the CES UTF-32BE (defined in the Unicode Standard).

   MSymbol Mcoding_utf_32le
       Symbol for the coding system UTF-32LE.

       The symbol Mcoding_utf_32le has name 'utf-32le' and represents a coding
       system for the CES UTF-32LE (defined in the Unicode Standard).

   MSymbol Mcoding_sjis
       Symbol for the coding system SJIS.

       The symbol Mcoding_sjis has name 'sjis' and represents a coding system
       for the CES Shift-JIS.
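       As an illustration of what the Shift-JIS CES does with JIS X 0208
       characters, the sketch below applies the well-known arithmetic that
       maps a JIS X 0208 code-point to a Shift_JIS byte pair. jis_to_sjis()
       is a hypothetical name, and the function is not taken from m17n's
       implementation.

```c
/* Convert a JIS X 0208 code-point (two bytes, each 0x21..0x7E) to its
   Shift_JIS byte pair. Returns 0 on success, -1 if out of range. */
int jis_to_sjis(unsigned char j1, unsigned char j2, unsigned char out[2])
{
  if (j1 < 0x21 || j1 > 0x7E || j2 < 0x21 || j2 > 0x7E)
    return -1;
  /* Two JIS rows share one lead byte: 0x81..0x9F, then 0xE0..0xEF. */
  out[0] = (unsigned char) (((j1 + 1) >> 1) + (j1 <= 0x5E ? 0x70 : 0xB0));
  if (j1 & 1)                 /* odd row: 0x40..0x9E, skipping 0x7F */
    out[1] = (unsigned char) (j2 + (j2 < 0x60 ? 0x1F : 0x20));
  else                        /* even row: 0x9F..0xFC */
    out[1] = (unsigned char) (j2 + 0x7E);
  return 0;
}
```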

   MSymbol Mtype
       Parameter key for mconv_define_coding() (which see).

   MSymbol Mcharsets
   MSymbol Mflags
   MSymbol Mdesignation
   MSymbol Minvocation
   MSymbol Mcode_unit
   MSymbol Mbom
   MSymbol Mlittle_endian
   MSymbol Mutf
       Symbol that can be a value of the Mtype parameter of a coding system
       given as an argument to the mconv_define_coding() function (which see).

   MSymbol Miso_2022
   MSymbol Mreset_at_eol
   MSymbol Mreset_at_cntl
   MSymbol Meight_bit
   MSymbol Mlong_form
   MSymbol Mdesignation_g0
   MSymbol Mdesignation_g1
   MSymbol Mdesignation_ctext
   MSymbol Mdesignation_ctext_ext
   MSymbol Mlocking_shift
   MSymbol Msingle_shift
   MSymbol Msingle_shift_7
   MSymbol Meuc_tw_shift
   MSymbol Miso_6429
   MSymbol Mrevision_number
   MSymbol Mfull_support
   MSymbol Mmaybe
       Symbol whose name is 'maybe'.

       The variable Mmaybe is a symbol of name 'maybe'. It is used as a value
       of the Mbom parameter of the function mconv_define_coding() (which see).

   MSymbol Mcoding
       The symbol Mcoding.

       Any decoded M-text has a text property whose key is the predefined
       symbol Mcoding. The name of Mcoding is 'coding'.

COPYRIGHT
       Copyright (C) 2001 Information-technology Promotion Agency (IPA)
       Copyright (C) 2001-2009 National Institute of Advanced Industrial
       Science and Technology (AIST)
       Permission is granted to copy, distribute and/or modify this document
       under the terms of the GNU Free Documentation License
       <http://www.gnu.org/licenses/fdl.html>.

				  15 Oct 2009		Code Conversion(3m17n)
