FreeBSD Manual Pages


UTF(3)			   Library Functions Manual			UTF(3)

       runetochar,  chartorune,	 runelen, fullrune, utflen, utfrune, utfrrune,
       utfutf -	Unicode	Text Format functionality

       #include	<utf.h>

       int runetochar(char *cp,	Rune *rp);

       int chartorune(Rune *rp,	char *cp);

       int runelen(long	r);

       int fullrune(char *cp, int n);

       int utflen(char *s);

       int utfbytes(char *s);

       char *utfrune(char *cp, long r);

       char *utfrrune(char *cp,	long r);

       char *utfutf(char *big, char *little);

       int utf_snprintf(char *buf, size_t size,	char *format, ...);

       int utfcmp(char *s1, char *s2);

       int utfncmp(char	*s1, char *s2, int rc);

       char *utfcpy(char *dst, char *src);

       char *utfncpy(char *dst,	char *src, int nbytes);

       char *utfcat(char *src, char *append);

       char *utfncat(char *src,	char *append, int nbytes);

       The UTF routines pack Unicode text into a standard character stream.
       To do that effectively, the 128 ASCII characters (values 0 to 127)
       form the lowest 128 characters of UTF-8.  These characters are
       interchangeable between the two character sets.  A Rune is a Unicode
       character; its type is defined in the header file utf.h.

       runetochar translates a single Rune to a	UTF sequence and  returns  the
       number  of  bytes produced. chartorune is the inverse of	this function,
       returning the number of bytes consumed.	runelen	returns	the number  of
       bytes  in  the  encoding	 of  a Rune.  fullrune checks that the first n
       bytes of	the UTF	string cp contain a complete UTF encoding.
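
       The encoding these four routines deal with can be sketched as
       follows.  This is a minimal illustration, not the library source: the
       sketch_ names and the SketchRune type are hypothetical, and it assumes
       a Rune holds code points up to 0xFFFF, so that an encoding is at most
       three bytes long.  Malformed input is decoded as U+FFFD here, which is
       also an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the Rune type from utf.h (assumed 16-bit). */
typedef unsigned short SketchRune;

/* Number of bytes needed to encode r, as runelen is described. */
static int sketch_runelen(long r)
{
    assert(r >= 0 && r <= 0xFFFF);
    if (r < 0x80)
        return 1;               /* ASCII: one byte, unchanged */
    if (r < 0x800)
        return 2;
    return 3;
}

/* Encode *rp at cp and return the number of bytes produced. */
static int sketch_runetochar(char *cp, const SketchRune *rp)
{
    unsigned long r;

    assert(cp != NULL && rp != NULL);
    r = *rp;
    if (r < 0x80) {
        cp[0] = (char)r;
        return 1;
    }
    if (r < 0x800) {
        cp[0] = (char)(0xC0 | (r >> 6));
        cp[1] = (char)(0x80 | (r & 0x3F));
        return 2;
    }
    cp[0] = (char)(0xE0 | (r >> 12));
    cp[1] = (char)(0x80 | ((r >> 6) & 0x3F));
    cp[2] = (char)(0x80 | (r & 0x3F));
    return 3;
}

/* Do the first n bytes at cp hold one complete encoding? */
static int sketch_fullrune(const char *cp, int n)
{
    unsigned char c;

    assert(cp != NULL);
    if (n <= 0)
        return 0;
    c = (unsigned char)cp[0];
    if (c < 0x80)
        return 1;
    if (c < 0xE0)
        return n >= 2;
    return n >= 3;
}

/* Decode one rune from cp into *rp and return the bytes consumed. */
static int sketch_chartorune(SketchRune *rp, const char *cp)
{
    const unsigned char *p = (const unsigned char *)cp;

    assert(rp != NULL && cp != NULL);
    if (p[0] < 0x80) {
        *rp = p[0];
        return 1;
    }
    if ((p[0] & 0xE0) == 0xC0 && (p[1] & 0xC0) == 0x80) {
        *rp = (SketchRune)(((p[0] & 0x1F) << 6) | (p[1] & 0x3F));
        return 2;
    }
    if ((p[0] & 0xF0) == 0xE0 && (p[1] & 0xC0) == 0x80 &&
        (p[2] & 0xC0) == 0x80) {
        *rp = (SketchRune)(((p[0] & 0x0F) << 12) |
                           ((p[1] & 0x3F) << 6) | (p[2] & 0x3F));
        return 3;
    }
    *rp = 0xFFFD;               /* malformed: replacement character */
    return 1;
}
```

       With these definitions, encoding the Rune 0x20AC yields the three
       bytes 0xE2 0x82 0xAC, and decoding those bytes gives 0x20AC back.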

       utflen returns the number of runes in a UTF string.  utfbytes returns
       the number of bytes in a UTF string.  utfrune returns a pointer to the
       first occurrence of a rune in a UTF string.  utfrrune returns a pointer
       to the last occurrence.  utfutf searches for the first occurrence of
       one UTF string in another UTF string.
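
       The difference between counting runes and counting bytes can be
       sketched as below.  The sketch_ functions are hypothetical stand-ins,
       not the library code, and they assume well-formed UTF-8 input.  Under
       that assumption a rune count is the number of bytes that are not
       continuation bytes, and a plain byte-wise search can only match at a
       rune boundary, because lead bytes and continuation bytes never share
       a value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count runes: every byte that is not of the form 10xxxxxx starts a rune. */
static int sketch_utflen(const char *s)
{
    int n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Count bytes, not including the terminating null byte. */
static int sketch_utfbytes(const char *s)
{
    assert(s != NULL);
    return (int)strlen(s);
}

/* Find the first occurrence of little in big with a byte-wise search. */
static const char *sketch_utfutf(const char *big, const char *little)
{
    assert(big != NULL && little != NULL);
    return strstr(big, little);
}
```

       For the six-byte string "na\xC3\xAFve" the rune count is 5 while the
       byte count is 6.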

       utf_snprintf is a particularly dumb implementation of snprintf for UTF
       strings - it only interprets %%, %s and %d sequences in the format
       string, and does no field width calculation on those.
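
       A formatter restricted to those three sequences might look like the
       following sketch.  It is not the library implementation: the names
       sketch_put and sketch_snprintf are hypothetical, and the return value
       (the length the full output would have had, as with the standard
       snprintf) is an assumption, since the text above does not state what
       utf_snprintf returns.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Append n bytes of s at position pos, storing only what fits in buf. */
static size_t sketch_put(char *buf, size_t size, size_t pos,
                         const char *s, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++, pos++)
        if (pos + 1 < size)
            buf[pos] = s[i];
    return pos;
}

/* Interpret only %%, %s and %d; field widths are not supported. */
static int sketch_snprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    size_t pos = 0;
    char num[32];

    assert(buf != NULL && fmt != NULL);
    va_start(ap, fmt);
    for (; *fmt != '\0'; fmt++) {
        if (*fmt != '%') {
            pos = sketch_put(buf, size, pos, fmt, 1);
            continue;
        }
        fmt++;
        if (*fmt == 's') {
            const char *s = va_arg(ap, const char *);
            pos = sketch_put(buf, size, pos, s, strlen(s));
        } else if (*fmt == 'd') {
            int n = snprintf(num, sizeof num, "%d", va_arg(ap, int));
            pos = sketch_put(buf, size, pos, num, (size_t)n);
        } else if (*fmt == '%') {
            pos = sketch_put(buf, size, pos, "%", 1);
        } else if (*fmt == '\0') {
            break;              /* lone '%' at the end of the format */
        }
        /* any other conversion is silently dropped in this sketch */
    }
    va_end(ap);
    if (size > 0)
        buf[pos < size ? pos : size - 1] = '\0';
    return (int)pos;
}
```

       Because %s copies bytes without inspecting them, multi-byte UTF
       sequences pass through unchanged, although truncation at the end of
       the buffer could split a sequence in this sketch.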

       utfcmp compares two strings lexicographically, Rune by Rune, and
       returns a value greater than zero, equal to zero, or less than zero
       depending on whether the first UTF string is greater than, the same as,
       or less than the second string.  utfncmp does the same comparison as
       utfcmp, but examines at most rc Runes.
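
       A Rune by Rune comparison of this kind can be sketched as follows.
       The sketch_ names are hypothetical and the decoder is a simplified
       one written for this example (sequences of up to three bytes,
       well-formed input assumed); it is not the library code.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Decode one code point of up to three bytes; return the bytes consumed.
 * A byte that does not start a well-formed sequence is returned as-is. */
static int sketch_decode(const unsigned char *p, unsigned long *out)
{
    if (p[0] < 0x80 || (p[1] & 0xC0) != 0x80) {
        *out = p[0];
        return 1;
    }
    if ((p[0] & 0xE0) == 0xC0) {
        *out = ((p[0] & 0x1FUL) << 6) | (p[1] & 0x3FUL);
        return 2;
    }
    if ((p[0] & 0xF0) == 0xE0 && (p[2] & 0xC0) == 0x80) {
        *out = ((p[0] & 0x0FUL) << 12) | ((p[1] & 0x3FUL) << 6) |
               (p[2] & 0x3FUL);
        return 3;
    }
    *out = p[0];
    return 1;
}

/* Compare at most rc runes of s1 and s2. */
static int sketch_utfncmp(const char *s1, const char *s2, int rc)
{
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;

    assert(s1 != NULL && s2 != NULL);
    while (rc-- > 0) {
        unsigned long ra, rb;

        a += sketch_decode(a, &ra);
        b += sketch_decode(b, &rb);
        if (ra != rb)
            return ra < rb ? -1 : 1;
        if (ra == 0)
            return 0;           /* both strings ended together */
    }
    return 0;
}

/* Compare whole strings by using the largest possible rune bound. */
static int sketch_utfcmp(const char *s1, const char *s2)
{
    return sketch_utfncmp(s1, s2, INT_MAX);
}
```

       Note that rc counts runes, not bytes: with rc set to 1, the strings
       "\xC3\xA9x" and "\xC3\xA9y" compare equal even though two bytes of
       each were examined.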

       utfcpy copies from source to destination, Rune by Rune, and returns
       the dst argument.  No bounds checking is done on the number of Runes
       copied, or their individual sizes.  utfncpy copies at most nbytes
       bytes from source to destination, terminating when a null Rune is found
       in the source.  If the number of bytes copied is less than nbytes, then
       the destination string is padded with null (0) bytes.  If it is equal
       to or greater than nbytes, no null byte is added.  The dst argument is
       returned.  utfcat appends the UTF string append onto the UTF string
       src.  utfncat appends the UTF string append onto the UTF string src,
       bearing in mind that the buffer src is only nbytes long.
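
       The bounded copy described above can be sketched as follows.  The
       name sketch_utfncpy is hypothetical, and one detail is an assumption
       not stated in the text: this sketch only copies whole runes, so a
       multi-byte sequence that would not fit in the remaining space is left
       out rather than split.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy at most nbytes bytes of whole runes from src to dst, then pad the
 * rest of the nbytes region with null bytes.  If exactly nbytes bytes are
 * copied, no null byte is written. */
static char *sketch_utfncpy(char *dst, const char *src, int nbytes)
{
    int used = 0;

    assert(dst != NULL && src != NULL && nbytes >= 0);
    while (src[used] != '\0') {
        int len = 1;

        /* a rune is its first byte plus any continuation bytes (10xxxxxx) */
        while (((unsigned char)src[used + len] & 0xC0) == 0x80)
            len++;
        if (used + len > nbytes)
            break;              /* this rune would not fit */
        memcpy(dst + used, src + used, (size_t)len);
        used += len;
    }
    memset(dst + used, 0, (size_t)(nbytes - used));
    return dst;
}
```

       As with strncpy, a caller that needs a terminated string has to
       reserve one byte beyond nbytes and set it to zero itself.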

       This implementation of UTF, nominally UTF-8, can encode a null Unicode
       character using either a one-byte or a two-byte encoding.  Typically,
       Plan 9 uses a one-byte encoding, whilst Java uses a two-byte encoding.
       The Plan 9 style of encoding makes backwards compatibility much easier,
       and loses nothing: all the Java functionality is there, there are no
       embedded null bytes in a UTF string (due to the encoding of the second
       and third bytes of a sequence), and ordinary C strings are recognised
       as well, which is not the case in Java.  By default, the one-byte null
       encoding is used.
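
       The two encodings of the null character can be sketched as follows.
       The function names are hypothetical and the code only illustrates the
       byte patterns involved; it says nothing about how the library selects
       between them.  The two-byte form is the pair 0xC0 0x80, whose payload
       bits are all zero.

```c
#include <assert.h>
#include <stddef.h>

/* Write the null character at cp, in one-byte (Plan 9 style) or two-byte
 * (Java style) form, and return the number of bytes produced. */
static int sketch_put_null(char *cp, int two_byte)
{
    assert(cp != NULL);
    if (two_byte) {
        cp[0] = (char)0xC0;     /* 110 00000: lead byte, payload 0 */
        cp[1] = (char)0x80;     /* 10 000000: continuation, payload 0 */
        return 2;
    }
    cp[0] = '\0';
    return 1;
}

/* Extract the payload of a two-byte sequence: 5 bits, then 6 bits. */
static unsigned sketch_decode2(const unsigned char *p)
{
    assert(p != NULL);
    return ((p[0] & 0x1Fu) << 6) | (p[1] & 0x3Fu);
}
```

       Neither byte of the two-byte form is zero, so a string containing it
       is not cut short by C string functions, whereas the one-byte form is
       indistinguishable from an ordinary C string terminator.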

       UTF-8  is  defined in X/Open Company Ltd., "File	System Safe UCS	Trans-
       formation Format	(FSS_UTF)", X/Open Preliminary Specification, Document
       Number: P316, which also	appears	in ISO/IEC 10646, Annex	P.

       Undoubtedly, these are many, and legion.

       Written by Alistair Crooks (, or agc@west-, from a draft document
       written by Rob Pike and Ken Thompson, detailing the implementation of
       UTF in the Plan 9 operating


