
FreeBSD Manual Pages


TICKIT_UTF8_COUNT(3)	   Library Functions Manual	  TICKIT_UTF8_COUNT(3)

       tickit_utf8_count, tickit_utf8_countmore - count characters in Unicode
       strings

       #include	<tickit.h>

       typedef struct {
	   size_t bytes;
	   int	  codepoints;
	   int	  graphemes;
	   int	  columns;
       } TickitStringPos;

       size_t tickit_utf8_count(const char *str, TickitStringPos *pos,
	   const TickitStringPos *limit);
       size_t tickit_utf8_countmore(const char *str, TickitStringPos *pos,
	   const TickitStringPos *limit);

       size_t tickit_utf8_ncount(const char *str, size_t len,
	   TickitStringPos *pos, const TickitStringPos *limit);
       size_t tickit_utf8_ncountmore(const char	*str, size_t len,
	   TickitStringPos *pos, const TickitStringPos *limit);

       Link with -ltickit.

       tickit_utf8_count() counts characters  in  the  given  Unicode  string,
       which  must  be	in  UTF-8  encoding. It	starts at the beginning	of the
       string and counts forward over codepoints and  graphemes,  incrementing
       the  counters  in  pos until it reaches a limit.	It will	not go further
       than any of the limits given by the limit structure (where the value
       -1 indicates no limit of	that type). It will never split	a codepoint in
       the middle of a UTF-8 sequence, nor will	it split  a  grapheme  between
       its  codepoints;	it is therefore	possible that the function returns be-
       fore any	of the limits have been	reached, if the	 next  whole  grapheme
       would  involve  going  past  at	least one of the specified limits. The
       function	will also stop when it reaches the end of str. It returns  the
       total number of bytes it	has counted over.

       The bytes member	counts UTF-8 bytes which encode	individual codepoints.
       For example the Unicode character U+00E9	is encoded by two bytes	 0xc3,
       0xa9;  it  would	 increment  the	 bytes counter by 2 and	the codepoints
       counter by 1.
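
       The relationship between the bytes and codepoints counters can be
       sketched without the library. The helper below is hypothetical and is
       not part of libtickit; it assumes well-formed UTF-8 input and only
       illustrates that continuation bytes add to the byte count but not to
       the codepoint count.

```c
#include <stddef.h>

/* Hypothetical illustration type, not part of libtickit. */
struct utf8_tally {
    size_t bytes;       /* UTF-8 bytes seen */
    int    codepoints;  /* codepoints seen */
};

/* Tally bytes and codepoints in a NUL-terminated string.
 * Assumes the input is well-formed UTF-8. */
static struct utf8_tally utf8_tally_string(const char *str)
{
    struct utf8_tally t = { 0, 0 };
    const unsigned char *p;

    for (p = (const unsigned char *)str; *p; p++) {
        t.bytes++;
        /* Continuation bytes have the form 10xxxxxx; any other byte
         * begins a new codepoint. */
        if ((*p & 0xC0) != 0x80)
            t.codepoints++;
    }
    return t;
}
```

       Called on the two-byte string "\xc3\xa9" (U+00E9), this sketch reports
       2 bytes and 1 codepoint, matching the example above.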

       The codepoints member counts individual Unicode codepoints.

       The graphemes member counts whole composed graphical clusters of	 code-
       points, where combining accents which count as individual codepoints do
       not count as separate graphemes.	For example,  the  codepoint  sequence
       U+0065  U+0301  would  increment	 the  codepoint	 counter  by 2 and the
       graphemes counter by 1.
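
       The difference between codepoints and graphemes can be sketched as
       follows. This is a deliberately simplified, hypothetical model and not
       the library's algorithm: it assumes well-formed UTF-8 and treats only
       the Combining Diacritical Marks block (U+0300 to U+036F) as combining,
       whereas the real rules cover many more codepoints.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one codepoint from well-formed UTF-8 and store its encoded
 * length in *len. No validation is performed. */
static uint32_t decode_utf8(const unsigned char *p, size_t *len)
{
    if (p[0] < 0x80) {
        *len = 1;
        return p[0];
    }
    if ((p[0] & 0xE0) == 0xC0) {
        *len = 2;
        return ((uint32_t)(p[0] & 0x1F) << 6) | (p[1] & 0x3F);
    }
    if ((p[0] & 0xF0) == 0xE0) {
        *len = 3;
        return ((uint32_t)(p[0] & 0x0F) << 12) |
               ((uint32_t)(p[1] & 0x3F) << 6) | (p[2] & 0x3F);
    }
    *len = 4;
    return ((uint32_t)(p[0] & 0x07) << 18) |
           ((uint32_t)(p[1] & 0x3F) << 12) |
           ((uint32_t)(p[2] & 0x3F) << 6) | (p[3] & 0x3F);
}

/* Hypothetical helper: count codepoints and grapheme clusters.
 * Simplification: only U+0300..U+036F is treated as combining. */
static void count_clusters(const char *str, int *codepoints, int *graphemes)
{
    const unsigned char *p = (const unsigned char *)str;

    *codepoints = 0;
    *graphemes = 0;
    while (*p) {
        size_t len;
        uint32_t cp = decode_utf8(p, &len);

        (*codepoints)++;
        /* A combining mark extends the previous cluster; anything else
         * (or a mark with nothing before it) starts a new one. */
        if (*graphemes == 0 || !(cp >= 0x0300 && cp <= 0x036F))
            (*graphemes)++;
        p += len;
    }
}
```

       For the sequence U+0065 U+0301, encoded as "e\xcc\x81", this sketch
       reports 2 codepoints and 1 grapheme.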

       The columns member counts the number of screen columns consumed by  the
       graphemes.  Most	 graphemes consume only	1 column, but some are defined
       in Unicode to consume 2.

       tickit_utf8_countmore() is similar  to  tickit_utf8_count()  except  it
       will  not  zero	any  of	the counters before it starts. It can continue
       counting	where a	previous call finished.	In particular, it will	assume
       that  it	is starting at the beginning of	a UTF-8	sequence that begins a
       new grapheme; it	will not check these facts and the behavior  is	 unde-
       fined  if  these	 assumptions  do not hold. It will begin at the	offset
       given by	pos.bytes.

       The tickit_utf8_ncount()	and tickit_utf8_ncountmore() variants are sim-
       ilar  except  that they read no more than len bytes from	the string and
       do not require it to be NUL terminated. They will still stop at	a  NUL
       byte if one is found before len bytes have been read.

       These functions will all immediately abort if any C0 or C1 control byte
       other than NUL is encountered, returning the value -1. In this
       circumstance, the pos structure will still be updated with the progress
       so far.

       Typically, these functions would be used in either of two ways.

       When given a value in limit.bytes (or no limit and simply using string
       termination), tickit_utf8_count() will yield the width of the given
       string in terminal columns, in the pos.columns field.

       When given a value in limit.columns, tickit_utf8_count() will yield the
       number of bytes of that string that will consume the given space on the
       terminal.

       tickit_utf8_count() and tickit_utf8_countmore() return the number of
       bytes they have skipped over in this call, or -1 (which, since the
       return type is size_t, compares equal to (size_t)-1) if they encounter
       a C0 or C1 control byte other than NUL.

       tickit_stringpos_zero(3),	      tickit_stringpos_limit_bytes(3),
       tickit_utf8_mbswidth(3),	tickit(7)


