UNICODE_CONVERT(3)	    Courier Unicode Library	    UNICODE_CONVERT(3)

NAME
       unicode_u_ucs4_native, unicode_u_ucs2_native, unicode_convert_init,
       unicode_convert,	unicode_convert_deinit,	unicode_convert_tocbuf_init,
       unicode_convert_tou_init, unicode_convert_fromu_init,
       unicode_convert_uc, unicode_convert_tocbuf_toutf8_init,
       unicode_convert_tocbuf_fromutf8_init, unicode_convert_toutf8,
       unicode_convert_fromutf8, unicode_convert_tobuf,
       unicode_convert_tou_tobuf, unicode_convert_fromu_tobuf -	unicode
       character set conversion

SYNOPSIS
       #include <courier-unicode.h>

       extern const char unicode_u_ucs4_native[];

       extern const char unicode_u_ucs2_native[];

       unicode_convert_handle_t unicode_convert_init(const char *src_chset,
                                                     const char *dst_chset,
                                                     int (*cb_func)(const char *, size_t, void *),
                                                     void *cb_arg);

       int unicode_convert(unicode_convert_handle_t handle, const char *text,
			   size_t cnt);

       int unicode_convert_deinit(unicode_convert_handle_t handle,
				  int *errptr);

       unicode_convert_handle_t
       unicode_convert_tocbuf_init(const char *src_chset,
                                   const char *dst_chset,
                                   char **cbufptr_ret,
                                   size_t *cbufsize_ret,
                                   int nullterminate);

       unicode_convert_handle_t
       unicode_convert_tocbuf_toutf8_init(const char *src_chset,
                                          char **cbufptr_ret,
                                          size_t *cbufsize_ret,
                                          int nullterminate);

       unicode_convert_handle_t
       unicode_convert_tocbuf_fromutf8_init(const char *dst_chset,
                                            char **cbufptr_ret,
                                            size_t *cbufsize_ret,
                                            int nullterminate);

       unicode_convert_handle_t
       unicode_convert_tou_init(const char *src_chset,
                                char32_t **ucptr_ret,
                                size_t *ucsize_ret,
                                int nullterminate);

       unicode_convert_handle_t
       unicode_convert_fromu_init(const char *dst_chset,
                                  char **cbufptr_ret,
                                  size_t *cbufsize_ret,
                                  int nullterminate);

       int unicode_convert_uc(unicode_convert_handle_t handle,
			      const char32_t *text, size_t cnt);

       char *unicode_convert_toutf8(const char *text, const char *charset,
				    int	*error);

       char *unicode_convert_fromutf8(const char *text,	const char *charset,
				      int *error);

       char *unicode_convert_tobuf(const char *text, const char	*charset,
				   const char *dstcharset, int *error);

       int unicode_convert_tou_tobuf(const char *text, size_t text_l,
                                     const char *charset, char32_t **uc,
                                     size_t *ucsize, int *error);

       int unicode_convert_fromu_tobuf(const char32_t *utext, size_t utext_l,
				       const char *charset, char **c,
				       size_t *csize, int *error);

DESCRIPTION
       unicode_u_ucs4_native[] contains the string "UCS-4BE" or "UCS-4LE",
       matching the native char32_t endianness.

       unicode_u_ucs2_native[] contains the string "UCS-2BE" or "UCS-2LE",
       matching the native byte order.
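
       As an illustration of what "native" means here, the following sketch
       derives the same kind of name at run time. native_ucs4_name() is a
       hypothetical helper, not part of the library, and it only
       distinguishes big-endian from little-endian byte order.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the Courier Unicode Library.
 * Returns "UCS-4LE" on a little-endian host and "UCS-4BE" otherwise.
 */
static const char *native_ucs4_name(void)
{
    const uint32_t probe = 1;
    unsigned char first;

    /* Look at the byte stored at the lowest address of the 32-bit value. */
    memcpy(&first, &probe, 1);
    return first == 1 ? "UCS-4LE" : "UCS-4BE";
}
```

       In real code the library's own unicode_u_ucs4_native[] string should
       be used instead of a run-time probe like this one.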

       unicode_convert_init(), unicode_convert(), and unicode_convert_deinit()
       are an adaptation of the iconv(3) API that uses the same calling
       convention as the other algorithms in this unicode library, with some
       value-added features. These functions use iconv(3) to effect the actual
       character set conversion.

       unicode_convert_init() returns a non-NULL handle for the requested
       conversion, or NULL if the requested conversion is not available.
       unicode_convert_init() takes a pointer to the output function that
       receives the converted character text. The output function receives a
       pointer to the converted character text and the number of characters
       in the converted text. The output function gets called repeatedly
       until it has received the entire converted text.

       The character text to convert gets passed, repeatedly, to
       unicode_convert(). Each call to unicode_convert() results in the	output
       function	getting	invoked, zero or more times, with each successive part
       of the converted	text. Finally, unicode_convert_deinit()	stops the
       conversion and deallocates the conversion handle.

       A call to unicode_convert_deinit() may result in some additional calls
       to the output function, passing the remaining, final parts of the
       converted text, before unicode_convert_deinit() deallocates the handle
       and returns.

       The output function should return 0 normally. A non-0 return indicates
       an error condition. unicode_convert_deinit() returns non-zero if any
       previous invocation of the output function returned non-zero (this
       includes any invocations of the output function resulting from this
       call, or prior unicode_convert() calls), or 0 if all invocations of the
       output function returned 0.

       If errptr is not NULL, *errptr gets set to non-zero if there were any
       conversion errors, that is, if any text could not be converted to the
       destination character set.

       unicode_convert() also returns non-zero if it calls the output function
       and the output function returns non-zero. The conversion handle remains
       allocated in that case, so unicode_convert_deinit() must still be
       called to clean it up.
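
       An output function of the kind described above might look like the
       following sketch. The exact signature used here, with the cb_arg
       pointer handed back as the last argument, is an assumption that should
       be checked against courier-unicode.h; struct out_stats and
       write_chunk() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical state that would be passed as cb_arg. */
struct out_stats {
    FILE  *fp;      /* where converted text is written */
    size_t total;   /* running count of characters received */
};

/*
 * Sketch of an output function: receives one part of the converted text.
 * Returns 0 normally and non-zero to report an error condition.
 */
static int write_chunk(const char *text, size_t cnt, void *arg)
{
    struct out_stats *st = arg;

    assert(st != NULL);
    if (cnt > 0 && fwrite(text, 1, cnt, st->fp) != cnt)
        return -1;              /* write failure: report an error */
    st->total += cnt;
    return 0;
}
```

       Because the function may be invoked zero or more times per
       unicode_convert() call, and again from unicode_convert_deinit(), it
       keeps all of its state in the structure it is handed rather than in
       local variables.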

   Collecting converted	text into a buffer
       Call unicode_convert_tocbuf_init() instead of unicode_convert_init(),
       then call unicode_convert() and unicode_convert_deinit() normally. The
       parameters to unicode_convert_tocbuf_init() specify the source and the
       destination character sets. unicode_convert_tocbuf_toutf8_init() is
       just an alias that specifies UTF-8 as the destination character set.
       unicode_convert_tocbuf_fromutf8_init() is just an alias that specifies
       UTF-8 as the source character set.

       These functions supply an output function that collects the converted
       text into a malloc()ed buffer. If unicode_convert_deinit() returns 0,
       *cbufptr_ret gets initialized to a malloc()ed buffer, and the number of
       converted characters (the size of the malloc()ed buffer) gets placed
       into *cbufsize_ret.

	   If the converted string is an empty string, *cbufsize_ret gets set
	   to 0, but *cbufptr_ret still gets initialized (to a dummy malloced
	   buffer).

       A non-zero nullterminate	places a trailing \0 character after the
       converted string	(this is included in *cbufsize_ret).
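
       The behavior described in this section can be sketched as a collecting
       output function plus a finishing step. This is not the library's
       implementation; struct collect_buf, collect_chunk(), and
       collect_finish() are hypothetical names, and the callback signature is
       an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer handed to the output function. */
struct collect_buf {
    char  *data;
    size_t len;     /* characters collected so far */
    size_t cap;     /* allocated size */
};

/* Output function: appends one part of the converted text. */
static int collect_chunk(const char *text, size_t cnt, void *arg)
{
    struct collect_buf *b = arg;

    assert(b != NULL);
    if (cnt > b->cap - b->len) {
        size_t newcap = b->cap ? b->cap : 64;
        char *p;

        while (newcap - b->len < cnt) {
            if (newcap > (size_t)-1 / 2)
                return -1;              /* size would overflow */
            newcap *= 2;
        }
        p = realloc(b->data, newcap);
        if (p == NULL)
            return -1;                  /* non-zero: error condition */
        b->data = p;
        b->cap = newcap;
    }
    if (cnt > 0)
        memcpy(b->data + b->len, text, cnt);
    b->len += cnt;
    return 0;
}

/* Finishing step: optional trailing '\0', never returns a NULL buffer. */
static int collect_finish(struct collect_buf *b, int nullterminate,
                          char **ptr_ret, size_t *size_ret)
{
    if (nullterminate && collect_chunk("", 1, b) != 0)
        return -1;
    if (b->data == NULL && (b->data = malloc(1)) == NULL)
        return -1;                      /* dummy buffer for empty output */
    *ptr_ret = b->data;
    *size_ret = b->len;                 /* includes the '\0' if requested */
    return 0;
}
```

       The caller owns the returned buffer and releases it with free(), even
       when the reported size is 0.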

   Converting between character	sets and unicode
       unicode_convert_tou_init() converts character text into a char32_t
       buffer. It works	just like unicode_convert_tocbuf_init(), except	that
       only the	source character set gets specified and	the output buffer is a
       char32_t	buffer.	 nullterminate terminates the converted	unicode
       characters with a U+0000.

       unicode_convert_fromu_init() converts char32_t values to the output
       character set, and also works like unicode_convert_tocbuf_init().
       Additionally, in this case, unicode_convert_uc() works just like
       unicode_convert() except that the input sequence is a char32_t
       sequence, and the count parameter is the number of unicode characters.

   One-shot conversions
       unicode_convert_toutf8() converts the specified text in the specified
       character set into a UTF-8 string, returning a malloced buffer. If
       error is not NULL, then even if unicode_convert_toutf8() returns a
       non-NULL value, *error gets set to a non-zero value if a character
       conversion error has occurred and some characters could not be
       converted.

       unicode_convert_fromutf8() does a similar conversion from UTF-8 text to
       the specified character set.

       unicode_convert_tobuf() does a similar conversion between two different
       character sets.
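
       Since these functions rely on iconv(3) for the actual conversion, the
       following sketch shows roughly what a one-shot conversion to UTF-8
       involves when written against iconv(3) directly. It is not the
       library's implementation: to_utf8_sketch() is a hypothetical helper,
       and skipping unconvertible bytes is an assumption made for the sketch.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, NOT part of the Courier Unicode Library.
 * Converts a NUL-terminated string from `charset` to UTF-8 and returns a
 * malloc()ed string, or NULL if the conversion is unavailable or memory
 * runs out. *error (if not NULL) is set to non-zero when some input could
 * not be converted. Assumption: unconvertible bytes are simply skipped.
 */
static char *to_utf8_sketch(const char *text, const char *charset, int *error)
{
    iconv_t cd = iconv_open("UTF-8", charset);
    char *in = (char *)text;            /* iconv() does not modify the input */
    size_t inleft = strlen(text);
    size_t cap = inleft + 16, used = 0;
    char *out;
    int bad = 0, flushing = 0;

    if (cd == (iconv_t)-1)
        return NULL;

    out = malloc(cap);
    while (out != NULL) {
        char *dst = out + used;
        size_t outleft = cap - used - 1;        /* keep room for the '\0' */
        size_t rc = flushing ? iconv(cd, NULL, NULL, &dst, &outleft)
                             : iconv(cd, &in, &inleft, &dst, &outleft);

        used = (size_t)(dst - out);
        if (rc != (size_t)-1) {
            if (flushing)
                break;                          /* input and state flushed */
            flushing = 1;                       /* next pass: final flush */
        } else if (errno == E2BIG) {
            char *bigger = realloc(out, cap *= 2);
            if (bigger == NULL)
                free(out);
            out = bigger;                       /* NULL ends the loop */
        } else {                                /* EILSEQ or EINVAL */
            bad = 1;
            if (flushing || inleft == 0)
                break;
            ++in;                               /* skip the offending byte */
            --inleft;
        }
    }
    iconv_close(cd);
    if (out == NULL)
        return NULL;
    out[used] = '\0';
    if (error != NULL)
        *error = bad;
    return out;
}
```

       As with unicode_convert_toutf8(), a non-NULL return value does not by
       itself mean the conversion was clean; the caller also checks *error.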

       unicode_convert_tou_tobuf() calls unicode_convert_tou_init(), feeds the
       character string through unicode_convert(), then calls
       unicode_convert_deinit(). If this function returns 0, *uc and *ucsize
       are set to a malloced buffer and its size, holding the unicode
       character array.

       unicode_convert_fromu_tobuf() calls unicode_convert_fromu_init(), feeds
       the unicode array through unicode_convert_uc(), then calls
       unicode_convert_deinit(). If this function returns 0, *c and *csize are
       set to a malloced buffer and its size, holding the char array.

SEE ALSO
       courier-unicode(7), unicode_convert_tocase(3)

AUTHOR
       Sam Varshavchik



Courier	Unicode	Library		  03/12/2021		    UNICODE_CONVERT(3)

