UNICODE_WORD_BREAK(3)	    Courier Unicode Library	 UNICODE_WORD_BREAK(3)

       unicode_wb_init,	unicode_wb_next, unicode_wb_next_cnt, unicode_wb_end,
       unicode_wbscan_init, unicode_wbscan_next, unicode_wbscan_end -
       calculate word breaks

       #include	<courier-unicode.h>

       unicode_wb_info_t unicode_wb_init(int (*cb_func)(int, void *),
					 void *cb_arg);

       int unicode_wb_next(unicode_wb_info_t wb, char32_t c);

       int unicode_wb_next_cnt(unicode_wb_info_t wb, const char32_t *cptr,
			       size_t cnt);

       int unicode_wb_end(unicode_wb_info_t wb);

       unicode_wbscan_info_t unicode_wbscan_init(void);

       int unicode_wbscan_next(unicode_wbscan_info_t wbs, char32_t c);

       size_t unicode_wbscan_end(unicode_wbscan_info_t wbs);

       These functions implement the Unicode word breaking algorithm. Invoke
       unicode_wb_init() to initialize the algorithm. Its first parameter is
       a callback function and its second parameter is an opaque pointer. The
       callback function gets invoked with two parameters: the first is the
       word breaking status described below, and the second is the opaque
       pointer that was given to unicode_wb_init(). These functions pass the
       opaque pointer through without interpreting it.

       unicode_wb_init() returns an opaque handle. Repeated invocations of
       unicode_wb_next(), each passing the handle and one Unicode character,
       define the sequence of Unicode characters over which the word breaking
       calculation takes place. unicode_wb_next_cnt() is a shortcut for
       invoking unicode_wb_next() repeatedly over an array cptr containing
       cnt Unicode characters.

       unicode_wb_end() denotes the end of the Unicode character sequence.
       After the call to unicode_wb_end() the unicode_wb_info_t word breaking
       handle is no longer valid.

       Between the calls to unicode_wb_init() and unicode_wb_end(), the
       callback function gets invoked exactly once for each Unicode character
       given to unicode_wb_next() or unicode_wb_next_cnt(). Usually each call
       to unicode_wb_next() invokes the callback function immediately, but
       this is not guaranteed. A call to unicode_wb_next() may return without
       invoking the callback function, and a subsequent call to
       unicode_wb_next() (or unicode_wb_end()) may then invoke it more than
       once to catch up. The contract is that, unless an error occurs, by the
       time unicode_wb_end() returns the callback function has been invoked
       exactly as many times as there are characters in the Unicode sequence
       defined by the intervening calls to unicode_wb_next() and
       unicode_wb_next_cnt().
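
       Because invocations of the callback function may be delayed, a
       callback cannot rely on the caller's loop index to know which
       character is being reported. The following is a minimal sketch of one
       way to handle this, not a definitive implementation: the state keeps
       its own character index. struct wb_state, track_break() and
       count_breaks() are hypothetical names that are not part of the
       library. The driver is compiled only when the hypothetical
       HAVE_COURIER_UNICODE macro is defined, and it assumes that
       unicode_wb_init() returns a null handle on failure, which this page
       does not state.

```c
#include <stddef.h>

/* Hypothetical state reached through the opaque cb_arg pointer. */
struct wb_state {
    size_t pos;        /* index of the character the next callback describes */
    size_t nbreaks;    /* characters that had a break permitted before them */
    size_t last_break; /* index of the most recent such character */
};

/* Callback: one invocation per character, possibly delivered late. */
int track_break(int brk, void *arg)
{
    struct wb_state *st = arg;

    if (brk) {
        st->nbreaks++;
        st->last_break = st->pos;
    }
    st->pos++;          /* advance here, not in the caller's loop */
    return 0;           /* zero: no error */
}

#ifdef HAVE_COURIER_UNICODE /* hypothetical build guard */
#include <courier-unicode.h>

/* Runs a whole sequence; returns 0 or the first non-zero status. */
int count_breaks(const char32_t *text, size_t len, struct wb_state *st)
{
    unicode_wb_info_t wb;
    int rc, rc_end;

    st->pos = st->nbreaks = st->last_break = 0;

    wb = unicode_wb_init(track_break, st);
    if (!wb)            /* assumption: a null handle means failure */
        return -1;

    rc = unicode_wb_next_cnt(wb, text, len);
    rc_end = unicode_wb_end(wb);    /* required even after an error */
    return rc ? rc : rc_end;
}
#endif
```

       When count_breaks() returns 0, the contract described above implies
       that st->pos equals len, whether or not any callbacks were delayed.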

       Each call to the callback function reports the calculated word
       breaking status of the corresponding character in the Unicode
       character sequence. If the first parameter to the callback function is
       non-zero, a word break is permitted before the corresponding
       character. A zero value indicates that a word break is prohibited
       before the corresponding character.
       The callback function should return 0. A non-zero value indicates to
       the word breaking algorithm that an error has occurred.
       unicode_wb_next() and unicode_wb_next_cnt() return zero either if they
       never invoked the callback function, or if each call to the callback
       function returned zero. A non-zero return from the callback function
       results in unicode_wb_next() and unicode_wb_next_cnt() immediately
       returning the same value.
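
       As a sketch of the error path, a callback might store break offsets
       in a fixed-size buffer and return a non-zero code when the buffer is
       full. WB_ERR_FULL, struct wb_offsets and store_break() are
       hypothetical, application-defined names and are not part of the
       library.

```c
#include <stddef.h>

#define WB_ERR_FULL 1   /* hypothetical application-defined error code */

/* Hypothetical state: a caller-supplied buffer of break offsets. */
struct wb_offsets {
    size_t *buf;   /* where to store offsets */
    size_t cap;    /* capacity of buf */
    size_t used;   /* entries stored so far */
    size_t pos;    /* index of the character being reported */
};

/* Callback: stores each permitted break, fails when buf is full. */
int store_break(int brk, void *arg)
{
    struct wb_offsets *o = arg;

    if (brk) {
        if (o->used == o->cap)
            return WB_ERR_FULL;   /* non-zero: report an error */
        o->buf[o->used++] = o->pos;
    }
    o->pos++;
    return 0;
}
```

       Following the description above, a call to unicode_wb_next() during
       which store_break() returns WB_ERR_FULL would itself return
       WB_ERR_FULL, so the caller can tell a full buffer apart from success.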

       unicode_wb_end() must be invoked to destroy the word breaking handle
       even if unicode_wb_next() and unicode_wb_next_cnt() returned an error
       indication. Under normal circumstances unicode_wb_end() may itself
       invoke the callback function one or more times. The return value from
       unicode_wb_end() has the same meaning as from unicode_wb_next() and
       unicode_wb_next_cnt(); however, in all cases the word breaking handle
       is no longer valid after unicode_wb_end() returns.

   Word scan
       unicode_wbscan_init(), unicode_wbscan_next() and unicode_wbscan_end()
       scan for the next word boundary in a Unicode character sequence.
       unicode_wbscan_init() obtains a handle, then unicode_wbscan_next() gets
       repeatedly invoked to define the Unicode character sequence.
       unicode_wbscan_end() deallocates the handle and returns the number of
       leading characters in the Unicode character sequence up to the first
       word break.

       A non-zero return value from unicode_wbscan_next() indicates that the
       word boundary is already known, and any further calls to
       unicode_wbscan_next() will be ignored. unicode_wbscan_end() must still
       be called to obtain the Unicode character count.

       TR-29[1], courier-unicode(7), unicode::wordbreak(3),
       unicode_convert_tocase(3), unicode_line_break(3)

       Sam Varshavchik

	1. TR-29

Courier	Unicode	Library		  03/11/2017		 UNICODE_WORD_BREAK(3)

