FreeBSD Manual Pages

UNICHARS(1)	      User Contributed Perl Documentation	   UNICHARS(1)

NAME
       unichars - list characters for one or more properties

SYNOPSIS
       unichars [options] criterion ...

       Each criterion is either a square-bracketed character class, a regex
       starting with a backslash, or an arbitrary Perl expression.  See the
       EXAMPLES section below.

       OPTIONS:

        Selection Options:

           --bmp           include the Basic Multilingual Plane (plane 0) [DEFAULT]
           --smp           include the Supplementary Multilingual Plane (plane 1)
           --astral    -a  include planes above the BMP (planes 1-15)
           --unnamed   -u  include various unnamed characters (see DESCRIPTION)
           --locale    -l  specify the locale used for UCA functions

        Display Options:

           --category  -c  include the general category (GC=)
           --script    -s  include the script name (SC=)
           --block     -b  include the block name (BLK=)
           --bidi      -B  include the bidi class (BC=)
           --combining -C  include the canonical combining class (CCC=)
           --numeric   -n  include the numeric value (NV=)
           --casefold  -f  include the casefold status
           --decimal   -d  include the decimal representation of the code point

        Miscellaneous Options:

           --version   -v  print version information and exit
           --help      -h  this message
           --man       -m  full manpage
           --debug     -d  show debugging of criteria and examined code point span

        Special Functions:

            $_    is the current code point
            ord   is the current code point's ordinal

            NAME is charnames::viacode(ord)
            NUM  is Unicode::UCD::num(ord), not the code point number
            CF   is casefold->{status}
            NFD, NFC, NFKD, NFKC, FCD, FCC  (normalization)
            UCA, UCA1, UCA2, UCA3, UCA4  (binary sort keys)

            Singleton, Exclusion, NonStDecomp, Comp_Ex
            checkNFD, checkNFC, checkNFKD, checkNFKC, checkFCD, checkFCC
            NFD_NO, NFC_NO, NFC_MAYBE, NFKD_NO, NFKC_NO, NFKC_MAYBE

DESCRIPTION
       The unichars program reports the characters that match every
       selection criterion: the criteria are ANDed together.

       A criterion beginning with a square bracket or a backslash is assumed
       to be a regular expression.  Anything else is a Perl expression such as
       you might pass to the Perl "grep" function.  The $_ variable is set to
       each successive Unicode character, and if all criteria match, that
       character is displayed.
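       As a rough analogy only, the selection rule can be sketched in
       Python.  The names "Criterion", "is_regex_criterion", "matches_all"
       and "select_chars" are hypothetical; Python's "re" module stands in
       for Perl's regex engine (so Perl-only syntax such as "\p{Greek}" is
       not available), and Perl-expression criteria are replaced by Python
       callables.

```python
import re
from typing import Callable, List, Union

# In this sketch a criterion is either a regex string or a Python predicate.
# (unichars itself takes Perl regexes and Perl expressions; this is an analogy.)
Criterion = Union[str, Callable[[str], bool]]


def is_regex_criterion(text: str) -> bool:
    """Apply the documented rule: a leading '[' or backslash marks a regex."""
    return text.startswith(("[", "\\"))


def matches_all(ch: str, criteria: List[Criterion]) -> bool:
    """True only if every criterion accepts ch, i.e. the criteria are ANDed.

    This sketch tries criteria left to right and stops at the first failure.
    """
    for crit in criteria:
        if isinstance(crit, str):
            if not is_regex_criterion(crit):
                # Expression criteria would need Perl; here they must be callables.
                raise ValueError(f"not a regex criterion: {crit!r}")
            if re.search(crit, ch) is None:
                return False
        elif not crit(ch):
            return False
    return True


def select_chars(criteria: List[Criterion], start: int = 0, stop: int = 0x80) -> List[str]:
    """Scan code points in [start, stop) and keep those matching all criteria."""
    return [chr(cp) for cp in range(start, stop) if matches_all(chr(cp), criteria)]


if __name__ == "__main__":
    # ASCII digits that are also odd: one regex criterion AND one predicate.
    print(select_chars([r"\d", lambda c: int(c) % 2 == 1]))
```

       Because the regex runs first, the predicate only ever sees digits, so
       "int(c)" cannot fail; in this sketch the order of criteria matters
       for safety even though it does not change the result.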

       Because Perl's "ord" function operates on $_ when given no argument,
       the numeric code point of the current character is accessible simply
       as "ord".

       The special token "NAME" is set to the full name of the current code
       point.  Also, the tokens "NFD", "NFKD", "NFC", and "NFKC" are set to
       the corresponding normalization form.
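       For readers more familiar with Python, roughly equivalent values can
       be computed with the standard "unicodedata" module.  This is an
       illustration, not how unichars computes its tokens: the helper
       "tokens_for" is hypothetical, names come from whatever Unicode
       version Python bundles, and "unicodedata.numeric" is only an
       approximate stand-in for Unicode::UCD::num.

```python
import unicodedata
from typing import Dict, Optional, Union

Token = Union[int, str, Optional[float]]


def tokens_for(cp: int) -> Dict[str, Token]:
    """Approximate a few of the unichars tokens for one code point (hypothetical helper)."""
    ch = chr(cp)
    return {
        "ord": cp,
        # Empty string if this Python build has no name for the code point.
        "NAME": unicodedata.name(ch, ""),
        # Rough analogue of NUM: the Unicode numeric value, or None if there is none.
        "NUM": unicodedata.numeric(ch, None),
        "NFD": unicodedata.normalize("NFD", ch),
        "NFC": unicodedata.normalize("NFC", ch),
        "NFKD": unicodedata.normalize("NFKD", ch),
        "NFKC": unicodedata.normalize("NFKC", ch),
    }


if __name__ == "__main__":
    t = tokens_for(0xFB01)  # LATIN SMALL LIGATURE FI
    # The canonical form keeps the ligature; the compatibility form splits it.
    print(t["NAME"], repr(t["NFD"]), repr(t["NFKD"]))
```

       The ligature shows why a criterion such as "NFD ne NFKD" is useful:
       it selects characters whose compatibility decomposition differs from
       their canonical one.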

       By default only plane 0, the Basic Multilingual Plane, is examined.
       For plane 1, the Supplementary Multilingual Plane, use --smp.  To
       examine both planes, specify both the --bmp and --smp options, or -bs.
       To include all valid code points, use the -a or --astral option.
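       The plane arithmetic behind these options is simple: each plane holds
       0x10000 code points, so a code point's plane number is its value
       shifted right by 16 bits.  A minimal Python sketch follows; the
       helper names are hypothetical, and it does not model how unichars
       combines the plane options.

```python
from typing import Tuple


def plane_of(cp: int) -> int:
    """A code point's plane is the value of its bits above the low 16."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    return cp >> 16


def plane_span(plane: int) -> Tuple[int, int]:
    """Inclusive first and last code point of a plane (0 through 16)."""
    if not 0 <= plane <= 16:
        raise ValueError("Unicode planes are numbered 0 through 16")
    first = plane << 16
    return first, first + 0xFFFF


if __name__ == "__main__":
    for p in (0, 1):  # the BMP (--bmp) and the SMP (--smp)
        first, last = plane_span(p)
        print(f"plane {p}: U+{first:04X}..U+{last:04X}")
```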

       Unless the --unnamed option is given, characters with any of the
       properties Unassigned, PrivateUse, Han, or InHangulSyllables will be
       excluded.
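       The default exclusions can be approximated in Python as below.  This
       is a sketch under stated assumptions, not the unichars test: the
       helper "excluded_by_default" is hypothetical, Unassigned and
       PrivateUse are mapped to general categories Cn and Co, and because
       Python's "unicodedata" has no Script property, Han is approximated
       by the CJK ideograph name prefixes (which misses Han characters such
       as the CJK radicals).

```python
import unicodedata

# The Hangul Syllables block, U+AC00..U+D7AF.
HANGUL_SYLLABLES = range(0xAC00, 0xD7B0)


def excluded_by_default(cp: int) -> bool:
    """Rough stand-in for the exclusions lifted by --unnamed (hypothetical helper)."""
    ch = chr(cp)
    # Unassigned -> Cn, PrivateUse -> Co.
    if unicodedata.category(ch) in ("Cn", "Co"):
        return True
    # InHangulSyllables -> membership in the block range.
    if cp in HANGUL_SYLLABLES:
        return True
    # Han approximated by name prefix; an incomplete stand-in for Script=Han.
    name = unicodedata.name(ch, "")
    return name.startswith(("CJK UNIFIED IDEOGRAPH-", "CJK COMPATIBILITY IDEOGRAPH-"))


if __name__ == "__main__":
    # How many BMP code points this approximation would skip.
    print(sum(excluded_by_default(cp) for cp in range(0x10000)))
```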

EXAMPLES
       Count all non-ASCII digits:

           $ unichars -a '\d' '\P{ASCII}' | wc -l
            401
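       In Perl, "\d" (without the /a modifier) matches any character whose
       general category is Nd, decimal number.  The same count can be
       sketched in Python by testing the category directly; the helper name
       is hypothetical, and the total depends on the Unicode version that
       Python's "unicodedata" bundles, so expect it to differ from the
       figure shown.

```python
import sys
import unicodedata
from typing import List


def non_ascii_digits(limit: int = sys.maxunicode + 1) -> List[int]:
    """Code points at or above U+0080 whose general category is Nd."""
    return [cp for cp in range(0x80, limit) if unicodedata.category(chr(cp)) == "Nd"]


if __name__ == "__main__":
    digits = non_ascii_digits()
    print(len(digits), "non-ASCII digits in Unicode", unicodedata.unidata_version)
```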

       Find all line terminators:

           $ unichars '\R'
            --     10  0000A  LINE FEED (LF)
            --     11  0000B  LINE TABULATION
            --     12  0000C  FORM FEED (FF)
            --     13  0000D  CARRIAGE RETURN (CR)
            --    133  00085  NEXT LINE (NEL)
            --   8232  02028  LINE SEPARATOR
            --   8233  02029  PARAGRAPH SEPARATOR
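       Perl's "\R" matches a CR LF pair or any single vertical-space
       character, which gives the seven code points listed.  Python's "re"
       has no "\R"; the nearest built-in, "str.splitlines", also breaks on
       the three information separators U+001C, U+001D and U+001E.  The
       sketch below (helper names hypothetical) finds that difference by
       scanning the BMP.

```python
from typing import Set

# Single-character members of Perl's \R, as listed above (\R also matches CR LF).
PERL_R = {0x0A, 0x0B, 0x0C, 0x0D, 0x85, 0x2028, 0x2029}


def splitlines_breaks(limit: int = 0x10000) -> Set[int]:
    """Code points below limit at which str.splitlines() starts a new line."""
    return {cp for cp in range(limit) if len(("a" + chr(cp) + "b").splitlines()) == 2}


if __name__ == "__main__":
    extra = splitlines_breaks() - PERL_R
    print(sorted(f"U+{cp:04X}" for cp in extra))
```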

       Find what is not "\s" but is "[\h\v]":

	   $ unichars '\S' '[\h\v]'
	    --	     11	 0000B	LINE TABULATION

       Count how many code points in the Basic Multilingual Plane are not
       marks but are diacritics:

	   $ unichars '\PM' '\p{Diacritic}' | wc -l
		209

       Count how many code points in the Basic Multilingual Plane are marks
       but are not diacritics:

	   $ unichars '\pM' '\P{Diacritic}' | wc -l
		750

       Find all code points that are Letters, are in the Greek script, have
       differing canonical and compatibility decompositions, and whose name
       contains "SYMBOL":

           $ unichars -a '\pL' '\p{Greek}' 'NFD ne NFKD' 'NAME =~ /SYMBOL/'
            ϐ    976  003D0  GREEK BETA SYMBOL
            ϑ    977  003D1  GREEK THETA SYMBOL
            ϒ    978  003D2  GREEK UPSILON WITH HOOK SYMBOL
            ϓ    979  003D3  GREEK UPSILON WITH ACUTE AND HOOK SYMBOL
            ϔ    980  003D4  GREEK UPSILON WITH DIAERESIS AND HOOK SYMBOL
            ϕ    981  003D5  GREEK PHI SYMBOL
            ϖ    982  003D6  GREEK PI SYMBOL
            ϰ   1008  003F0  GREEK KAPPA SYMBOL
            ϱ   1009  003F1  GREEK RHO SYMBOL
            ϲ   1010  003F2  GREEK LUNATE SIGMA SYMBOL
            ϴ   1012  003F4  GREEK CAPITAL THETA SYMBOL
            ϵ   1013  003F5  GREEK LUNATE EPSILON SYMBOL
            Ϲ   1017  003F9  GREEK CAPITAL LUNATE SIGMA SYMBOL
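       The same four-way conjunction can be approximated in Python.  Two
       assumptions apply: Python's "unicodedata" has no Script property, so
       a "GREEK " name prefix stands in for "\p{Greek}", and the scan is
       limited to the BMP, where all of the results listed fall.  The helper
       name is hypothetical.

```python
import unicodedata
from typing import List


def greek_symbol_letters(limit: int = 0x10000) -> List[int]:
    """Letters named GREEK ... SYMBOL whose NFD and NFKD forms differ."""
    found = []
    for cp in range(limit):
        ch = chr(cp)
        if not unicodedata.category(ch).startswith("L"):  # '\pL'
            continue
        name = unicodedata.name(ch, "")
        if not name.startswith("GREEK "):  # rough stand-in for '\p{Greek}'
            continue
        if unicodedata.normalize("NFD", ch) == unicodedata.normalize("NFKD", ch):
            continue  # 'NFD ne NFKD' fails
        if "SYMBOL" in name:  # 'NAME =~ /SYMBOL/'
            found.append(cp)
    return found


if __name__ == "__main__":
    for cp in greek_symbol_letters():
        print(f"{chr(cp)} {cp:6d}  {cp:05X}  {unicodedata.name(chr(cp))}")
```

       The cheap category test runs first so that normalization is only
       computed for the few code points that survive it.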

       Find all numeric nondigits in the Latin script (within the BMP):

           $ unichars '\pN' '\D' '\p{Latin}'
            Ⅰ   8544  02160  ROMAN NUMERAL ONE
            Ⅱ   8545  02161  ROMAN NUMERAL TWO
            Ⅲ   8546  02162  ROMAN NUMERAL THREE
            Ⅳ   8547  02163  ROMAN NUMERAL FOUR
            Ⅴ   8548  02164  ROMAN NUMERAL FIVE
            Ⅵ   8549  02165  ROMAN NUMERAL SIX
            Ⅶ   8550  02166  ROMAN NUMERAL SEVEN
            Ⅷ   8551  02167  ROMAN NUMERAL EIGHT
            (etc)

       Find the first three word-character ("\w") code points that have no
       assigned name:

           $ unichars -au '\w' '!length NAME' | head -3
            㐀  13312  003400  <unnamed codepoint>
            㐁  13313  003401  <unnamed codepoint>
            㐂  13314  003402  <unnamed codepoint>

       Count the combining characters in the Supplementary Multilingual Plane:

	   $ unichars -s '\pM' | wc -l
		 61
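       "\pM" matches general category Mark (Mn, Mc, or Me).  A Python sketch
       of the same count over plane 1 is below; the helper name is
       hypothetical, and since each Unicode release adds marks, the total
       reported by Python's "unicodedata" will generally not match the
       count shown.

```python
import unicodedata
from typing import List


def smp_marks() -> List[int]:
    """Code points in plane 1 (U+10000..U+1FFFF) whose general category is M*."""
    return [cp for cp in range(0x10000, 0x20000)
            if unicodedata.category(chr(cp)).startswith("M")]


if __name__ == "__main__":
    print(len(smp_marks()), "marks in the SMP, Unicode", unicodedata.unidata_version)
```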

ENVIRONMENT
       If your environment appears to use a Unicode encoding, program
       arguments are treated as UTF-8.

BUGS
       The --man option does not correctly process the page for UTF-8, because
       it does not pass the necessary --utf8 option to pod2man.

SEE ALSO
       uniprops, uninames, perluniprops, perlunicode, perlrecharclass, perlre

AUTHOR
       Tom Christiansen <tchrist@perl.com>

COPYRIGHT AND LICENCE
       Copyright 2010 Tom Christiansen.

       This program is free software; you may redistribute it and/or modify it
       under the same terms as Perl itself.

perl v5.32.1			  2021-11-05			   UNICHARS(1)
