Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages

  
 
  

home | help
Unicode::GCString(3)  User Contributed Perl Documentation Unicode::GCString(3)

NAME
       Unicode::GCString - String as Sequence of UAX #29 Grapheme Clusters

SYNOPSIS
	   use Unicode::GCString;
	   $gcstring = Unicode::GCString->new($string);

DESCRIPTION
       Unicode::GCString treats	Unicode	string as a sequence of	extended
       grapheme	clusters defined by Unicode Standard Annex #29 [UAX #29].

       Grapheme	cluster	is a sequence of Unicode character(s) that consists of
       one grapheme base and optional grapheme extender	and/or ^aprepend^a
       character.  It is close in that people consider as character.

   Public Interface
       Constructors

       new (STRING, [KEY => VALUE, ...])
       new (STRING, [LINEBREAK])
	   Constructor.	 Create	new grapheme cluster string (Unicode::GCString
	   object) from	Unicode	string STRING.

	   About optional KEY => VALUE pairs see "Options" in
	   Unicode::LineBreak.	On second form,	Unicode::LineBreak object
	   LINEBREAK controls breaking features.

	   Note: The first form	was introduced by release 2012.10.

       copy
	   Copy	constructor.  Create a copy of grapheme	cluster	string.	 Next
	   position of new string is set at beginning.

       Sizes

       chars
	   Instance method.  Returns number of Unicode characters grapheme
	   cluster string includes, i.e. length	as Unicode string.

       columns
	   Instance method.  Returns total number of columns of	grapheme
	   clusters defined by built-in	character database.  For more details
	   see "DESCRIPTION" in	Unicode::LineBreak.

       length
	   Instance method.  Returns number of grapheme	clusters contained in
	   grapheme cluster string.

       Operations as String

       as_string
       """OBJECT"""
	   Instance method.  Convert grapheme cluster string to	Unicode	string
	   explicitly.

       cmp (STRING)
       STRING "cmp" STRING
	   Instance method.  Compare strings.  There are no oddities.  One of
	   each	STRING may be Unicode string.

       concat (STRING)
       STRING "." STRING
	   Instance method.  Concatenate STRINGs.  One of each STRING may be
	   Unicode string.  Note that number of	columns	(see columns())	or
	   grapheme clusters (see length()) of resulting string	is not always
	   equal to sum	of both	strings.  Next position	of new string is that
	   set on the left value.

       join ([STRING, ...])
	   Instance method.  Join STRINGs inserting grapheme cluster string.
	   Any of STRINGs may be Unicode string.

       substr (OFFSET, [LENGTH,	[REPLACEMENT]])
	   Instance method.  Returns substring of grapheme cluster string.
	   OFFSET and LENGTH are based on grapheme clusters.  If REPLACEMENT
	   is specified, substring is replaced by it.  REPLACEMENT may be
	   Unicode string.

	   Note: This method cannot return the lvalue, unlike built-in
	   substr().

       Operations as Sequence of Grapheme Clusters

       as_array
       "@{"OBJECT"}"
       as_arrayref
	   Instance method.  Convert grapheme cluster string to	an array of
	   grapheme clusters.

       eos Instance method.  Test if current position is at end	of grapheme
	   cluster string.

       item ([OFFSET])
	   Instance method.  Returns OFFSET-th grapheme	cluster.  If OFFSET
	   was not specified, returns next grapheme cluster.

       next
       "<"OBJECT">"
	   Instance method, iterative.	Returns	next grapheme cluster and
	   increment next position.

       pos ([OFFSET])
	   Instance method.  If	optional OFFSET	is specified, set next
	   position by it.  Returns next position of grapheme cluster string.

       Miscelaneous

       lbc Instance method.  Returns Line Breaking Class (See
	   Unicode::LineBreak) of the first character of first grapheme
	   cluster.

       lbcext
	   Instance method.  Returns Line Breaking Class (See
	   Unicode::LineBreak) of the last grapheme extender of	last grapheme
	   cluster.  If	there are no grapheme extenders	or its class is	CM,
	   value of last grapheme base will be returned.

CAVEATS
       o   The grapheme	cluster	should not be referred to as "grapheme"	even
	   though Larry	does.

       o   On Perl around 5.10.1, implicit conversion from Unicode::GCString
	   object to Unicode string sometimes let "utf8_mg_pos_cache_update"
	   cache be confused.

	   For example,	instead	of doing

	       $sub = substr($gcstring,	$i, $j);

	   do

	       $sub = substr("$gcstring", $i, $j);

	       $sub = substr($gcstring->as_string, $i, $j);

       o   This	module implements default algorithm for	determining grapheme
	   cluster boundaries.	Tailoring mechanism has	not been supported
	   yet.

VERSION
       Consult $VERSION	variable.

   Incompatible	Changes
       Release 2013.10
	   o   The new() method	can take non-Unicode string argument.  In this
	       case it will be decoded by iso-8859-1 (Latin 1) character set.
	       That method of former releases would die	with non-Unicode
	       inputs.

SEE ALSO
       [UAX #29] Mark Davis (ed.) (2009-2013).	Unicode	Standard Annex #29:
       Unicode Text Segmentation, Revisions 15-23.
       <http://www.unicode.org/reports/tr29/>.

AUTHOR
       Hatuka*nezumi - IKEDA Soji <hatuka(at)nezumi.nu>

COPYRIGHT
       Copyright (C) 2009-2013 Hatuka*nezumi - IKEDA Soji.

       This program is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

perl v5.32.0			  2017-04-11		  Unicode::GCString(3)

NAME | SYNOPSIS | DESCRIPTION | CAVEATS | VERSION | SEE ALSO | AUTHOR | COPYRIGHT

Want to link to this manual page? Use this URL:
<https://www.freebsd.org/cgi/man.cgi?query=Unicode::GCString&sektion=3&manpath=FreeBSD+12.2-RELEASE+and+Ports>

home | help