UNIPROPS(1)	      User Contributed Perl Documentation	   UNIPROPS(1)

NAME
       uniprops - list Unicode properties for one or more characters

SYNOPSIS
       uniprops	[options] character | U+codepoint | "name" ...

	Options:

	   --version   print version information
	   --help      this message
	   --man       full manpage

	   --unicode   list simple Unicode properties (DEFAULT)
	   --general   include even the	long form of general properties

	   --perl      list lowercase Perl short-cuts, plus \R (DEFAULT)
	   --negated   list uppercase Perl short-cuts

	   --all       list all	Unicode	categories, not	just one-parters
	   --list      list all	known Unicode properties, then exit

	   --reorder   sort Unicode property lists shortest first
	   --single    output each property one	per line

	   --verbose   wrap Unicode properties in \p{xxx}
	   --width N   set column width

	   --debug     noisy internal processing

	 options may be	bundled	if used	in the short form; e.g., -va

DESCRIPTION
       Each argument to	uniprops specifies a character in one of three forms:

       1.  a one-character literal, such as "#"	or "A".

       2.  a code point number in hex, optionally prefixed by "0x", "U+",
	   "\x", or "\u"; the backslash prefixes admit but do not require
	   enclosing curly braces.  Examples: "0x23", "U+394", "\x{0394}",
	   "0394".

       3.  a case-sensitive character name, such as "COMMA" or "GREEK CAPITAL
	   LETTER DELTA".  Names may be given in full or as short names per
	   the charnames pragma; a short name with no script prefix is looked
	   up first as Latin, then as Greek.  See the EXAMPLES.
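
       The three argument forms can be told apart mechanically.  The following
       is a minimal Python sketch, not the Perl code uniprops actually uses:
       the pattern _HEX and the function parse_character are hypothetical
       names, hex is tried before names as an assumption, and only full
       character names are handled, because Python's unicodedata.lookup does
       not know the charnames short forms.

```python
import re
import unicodedata

# Illustrative pattern for the hex forms listed above; the real parser
# inside uniprops may accept a different set of spellings.
_HEX = re.compile(
    r"(?:0x|U\+)?([0-9A-Fa-f]+)"                      # 0x23, U+394, 0394
    r"|\\[xu](?:\{([0-9A-Fa-f]+)\}|([0-9A-Fa-f]+))"   # \x{0394}, \u0394
)


def parse_character(arg: str) -> int:
    """Return the code point named by one command-line argument."""
    if len(arg) == 1:                 # form 1: a one-character literal
        return ord(arg)
    match = _HEX.fullmatch(arg)       # form 2: hex, with optional prefix
    if match:
        digits = next(g for g in match.groups() if g is not None)
        return int(digits, 16)
    # form 3: a full character name (raises KeyError if unknown)
    return ord(unicodedata.lookup(arg))


for arg in ["#", "0x23", "U+394", r"\x{0394}", "0394", "COMMA"]:
    print(f"{arg!r:12} -> U+{parse_character(arg):04X}")
```

       Checking the one-character case first matters: "A" is a literal
       letter here, not the hex number 0xA.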

       The uniprops program reports the	properties that	apply to a given
       character for use in regular expressions.  By default, the Perl
       character class short-cuts and the one-part Unicode properties are
       listed, which are mostly	those from the general category.
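
       As an independent cross-check of the general category that most of the
       one-part properties derive from, Python's standard unicodedata module
       can be queried.  This is only a sketch with a hypothetical helper
       named describe; it reports the two-letter general category and
       nothing else that uniprops lists.

```python
import unicodedata


def describe(ch: str) -> str:
    """Format one character's code point, name, and general category."""
    name = unicodedata.name(ch, "<no name>")
    return f"U+{ord(ch):04X} {name} gc={unicodedata.category(ch)}"


# Characters taken from the EXAMPLES section, written as escapes.
for ch in ["#", "\u03c0", "\u0394", "\u05d0", "\u221e"]:
    print(describe(ch))
```

       The categories it prints (Po, Ll, Lu, Lo, Sm) should agree with the
       \p{..} short forms shown for the same characters in the EXAMPLES.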

       The --all option	adds all the two-part Unicode properties from the non-
       general categories.

       Long, two-part forms of general category	properties are not listed
       unless the --general option is given.

       The --negated option adds the Perl shortcuts that are in	capitals.  The
       --verbose option	encloses Unicode properties with "\p{PROPNAME}".

       To simply list out all available	Unicode	properties, use	the --list
       option, which then exits	without	processing further arguments.

       Lines will be wrapped before the	edge of	your screen.  You can override
       the window width	with the --width NN option.  To	get only one property
       per line	without	any indentation, use the --single or -1	option.
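
       The two layouts described above can be approximated in Python with
       textwrap.  This is a sketch only: the function name layout and the
       indent widths of 4 and 7 columns are assumptions made for
       illustration, not values taken from uniprops.

```python
import textwrap

# Property names copied from the MINUS SIGN example in EXAMPLES.
PROPS = ("S Sm All Any Dash Math Zyyy Graph Print Common GrBase PatSyn "
         "Symbol Gr_Base Pat_Syn Assigned Math_Symbol Grapheme_Base "
         "Pattern_Syntax InMathematicalOperators").split()


def layout(props, width=80, single=False):
    """Wrap names to `width` columns, or emit one per line if `single`."""
    if single:
        return "\n".join(props)          # no indentation, like --single
    return textwrap.fill(
        " ".join(props),
        width=width,
        initial_indent=" " * 4,          # indent widths are assumptions
        subsequent_indent=" " * 7,
        break_long_words=False,          # never split a property name
        break_on_hyphens=False,
    )


print(layout(PROPS, width=50))
print(layout(PROPS[:3], single=True))
```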

       Unicode properties are by default listed in the same order in which
       they occur in perluniprops(1), but the --reorder option will sort them
       shortest to longest.
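
       Sorting by name length is a one-line operation in most languages.  The
       Python sketch below uses a hypothetical function reorder; breaking
       ties alphabetically is a guess that happens to match the ordering in
       the MINUS SIGN example, not documented behavior.

```python
# Property names in the default order shown for INFINITY in EXAMPLES.
DEFAULT_ORDER = ("All Any Assigned InMathematicalOperators Common Zyyy "
                 "Sm S Gr_Base Grapheme_Base Graph GrBase Math Math_Symbol "
                 "Pat_Syn Pattern_Syntax PatSyn Print Symbol").split()


def reorder(props):
    """Sort names shortest first; equal lengths fall back to name order."""
    return sorted(props, key=lambda p: (len(p), p.lower()))


print(" ".join(reorder(DEFAULT_ORDER)))
```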

       Unicode properties designated as	deprecated, obsolete, or discouraged,
       or which	begin with an underscore, are ignored.

       Loading and testing all the Unicode properties takes quite some time.
       If you only need to confirm which character you have, ask for just the
       Perl properties (-p), not the Unicode ones, and it will run at least
       six times faster.

EXAMPLES
       Count known Unicode properties:

	   $ uniprops -l | wc -l
	   2478

       List all	known Unicode properties, sorted by length:

	   $ uniprops -lr

       List all	known Unicode properties, sorted by name:

	   $ uniprops -l | sort	-df | more

       List Greek-related Unicode properties:

	   $ uniprops -l | grep	Greek |	sort -dfu
	   Blk=Greek
	   Block:Ancient_Greek_Musical_Notation
	   Block:Ancient_Greek_Numbers
	   Block:Greek
	   Block=Greek_And_Coptic
	   Block:Greek_Extended
	   Greek
	   Greek_And_Coptic
	   InAncientGreekMusicalNotation
	   InAncientGreekNumbers
	   InGreek
	   InGreekExtended
	   Is_Greek
	   Script=Greek

       List just Perl properties for three named characters:

	   $ uniprops -p delta greek:delta Greek:Delta
	   U+1E9F ‹ẟ› \N{ LATIN SMALL LETTER DELTA }:
	       \w \pL \p{LC} \p{L_} \p{L&} \p{Ll}
	   U+03B4 ‹δ› \N{ GREEK SMALL LETTER DELTA }:
	       \w \pL \p{LC} \p{L_} \p{L&} \p{Ll}
	   U+0394 ‹Δ› \N{ GREEK CAPITAL LETTER DELTA }:
	       \w \pL \p{LC} \p{L_} \p{L&} \p{Lu}

       List just Perl properties for four named characters:

	   $ uniprops -p Thorn pi hebrew:alef cyrillic:be
	   U+00DE ‹Þ› \N{ LATIN CAPITAL LETTER THORN }:
	       \w \pL \p{LC} \p{L_} \p{L&} \p{Lu}
	   U+03C0 ‹π› \N{ GREEK SMALL LETTER PI }:
	       \w \pL \p{LC} \p{L_} \p{L&} \p{Ll}
	   U+05D0 ‹א› \N{ HEBREW LETTER ALEF }:
	       \w \pL \p{L_} \p{Lo}
	   U+0431 ‹б› \N{ CYRILLIC SMALL LETTER BE }:
	       \w \pL \p{LC} \p{L_} \p{L&} \p{Ll}

       List Perl and Unicode properties	for three different literal
       characters:

	   $ uniprops \# ç π
	   U+0023 ‹#› \N{ NUMBER SIGN }:
	       \pP \p{Po}
	       All Any ASCII Assigned Common Zyyy Po P Gr_Base
		  Grapheme_Base	Graph GrBase Other_Punctuation Punct Pat_Syn
		  Pattern_Syntax PatSyn	PosixGraph PosixPrint PosixPunct
		  Print	Punctuation
	   U+00E7 ‹ç› \N{ LATIN SMALL LETTER C WITH CEDILLA }:
	       \w \pL \p{LC} \p{L_} \p{L&} \p{Ll}
	       All Any Alnum Alpha Alphabetic Assigned InLatin1	Cased
		  Cased_Letter LC Changes_When_Casemapped CWCM
		  Changes_When_Titlecased CWT Changes_When_Uppercased CWU Ll
		  L Gr_Base Grapheme_Base Graph	GrBase ID_Continue IDC
		  ID_Start IDS Letter L_ Latin Latn Lowercase_Letter Lower
		  Lowercase Print Word XID_Continue XIDC XID_Start XIDS
	   U+03C0 ‹π› \N{ GREEK SMALL LETTER PI }:
	       \w \pL \p{LC} \p{L_} \p{L&} \p{Ll}
	       All Any Alnum Alpha Alphabetic Assigned Greek Is_Greek
		  InGreek Cased	Cased_Letter LC	Changes_When_Casemapped	CWCM
		  Changes_When_Titlecased CWT Changes_When_Uppercased CWU Ll
		  L Gr_Base Grapheme_Base Graph	GrBase Grek Greek_And_Coptic
		  ID_Continue IDC ID_Start IDS Letter L_ Lowercase_Letter
		  Lower	Lowercase Print	Word XID_Continue XIDC XID_Start XIDS

       Just list Perl shortcuts, including negated ones, for a named
       character:

	   $ uniprops -pn LF
	   U+000A ‹U+000A› \N{ LINE FEED (LF) }:
	       \s \v \R	\pC \p{Cc}
	       \W \D \H

       For the Greek final sigma character, list Unicode properties that are
       either one-parters or else two-part general categories:

	   $ uniprops -ug "greek:final sigma"
	   U+03C2 ‹ς› \N{ GREEK SMALL LETTER FINAL SIGMA }:
	       All Any Alnum Alpha Alphabetic Assigned Greek Is_Greek InGreek
		  Cased	Cased_Letter LC	Changes_When_Casefolded	CWCF
		  Changes_When_Casemapped CWCM Changes_When_NFKC_Casefolded CWKCF
		  Changes_When_Titlecased CWT Changes_When_Uppercased CWU Ll L
		  Gr_Base Grapheme_Base	Graph GrBase Grek Greek_And_Coptic
		  ID_Continue IDC ID_Start IDS Letter L_ Lowercase_Letter Lower
		  Lowercase Print Word XID_Continue XIDC XID_Start XIDS
	       General_Category=Cased_Letter General_Category:Cased_Letter Gc=LC
		  General_Category:L General_Category=Letter General_Category:LC
		  General_Category:Letter Gc=L General_Category:Ll
		  General_Category=Lowercase_Letter
		  General_Category:Lowercase_Letter Gc=Ll

       List just Unicode properties for	a code point, given in hex:

	   $ uniprops -u 0xDF
	   U+00DF ‹ß› \N{ LATIN SMALL LETTER SHARP S }:
	       All Any Alnum Alpha Alphabetic Assigned InLatin1	Cased
		  Cased_Letter LC Changes_When_Casefolded CWCF
		  Changes_When_Casemapped CWCM Changes_When_NFKC_Casefolded
		  CWKCF	Changes_When_Titlecased	CWT Changes_When_Uppercased
		  CWU Ll L Gr_Base Grapheme_Base Graph GrBase ID_Continue
		  IDC ID_Start IDS Letter L_ Latin Latn	Lowercase_Letter
		  Lower	Lowercase Print	Word XID_Continue XIDC XID_Start XIDS

       List Perl and Unicode properties	for a named character, verbosely:

	   $ uniprops -v "ALEF SYMBOL"
	   U+2135 ‹ℵ› \N{ ALEF SYMBOL }:
	       \w \pL \p{L_} \p{Lo}
	       \p{All} \p{Any} \p{Alnum} \p{Alpha} \p{Alphabetic} \p{Assigned}
		  \p{InLetterlikeSymbols} \p{Changes_When_NFKC_Casefolded}
		  \p{CWKCF} \p{Common} \p{Zyyy}	\p{L} \p{Lo} \p{Gr_Base}
		  \p{Grapheme_Base} \p{Graph} \p{GrBase} \p{ID_Continue} \p{IDC}
		  \p{ID_Start} \p{IDS} \p{Letter} \p{L_} \p{Other_Letter}
		  \p{Math} \p{Print} \p{Word} \p{XID_Continue} \p{XIDC}
		  \p{XID_Start}	\p{XIDS}

       List Unicode properties in all categories except	for two-part general
       categories:

	   $ uniprops -au INFINITY
	   U+221E ‹∞› \N{ INFINITY }:
	       All Any Assigned	InMathematicalOperators	Common Zyyy Sm S
		  Gr_Base Grapheme_Base	Graph GrBase Math Math_Symbol
		  Pat_Syn Pattern_Syntax PatSyn	Print Symbol
	       Age:1.1 Bidi_Class:ON Bidi_Class=Other_Neutral
		  Bidi_Class:Other_Neutral Bc=ON Block:Mathematical_Operators
		  Canonical_Combining_Class:0
		  Canonical_Combining_Class=Not_Reordered
		  Canonical_Combining_Class:Not_Reordered Ccc=NR
		  Canonical_Combining_Class:NR Script=Common
		  Decomposition_Type:None Dt=None East_Asian_Width:A
		  East_Asian_Width=Ambiguous East_Asian_Width:Ambiguous	Ea=A
		  Grapheme_Cluster_Break:Other GCB=XX Grapheme_Cluster_Break:XX
		  Grapheme_Cluster_Break=Other Hangul_Syllable_Type:NA
		  Hangul_Syllable_Type=Not_Applicable
		  Hangul_Syllable_Type:Not_Applicable Hst=NA
		  Joining_Group:No_Joining_Group Jg=NoJoiningGroup
		  Joining_Type:Non_Joining Jt=U	Joining_Type:U
		  Joining_Type=Non_Joining Line_Break:AI Line_Break=Ambiguous
		  Line_Break:Ambiguous Lb=AI Numeric_Type:None Nt=None
		  Numeric_Value:NaN Nv=NaN Present_In:1.1 Age=1.1 In=1.1
		  Present_In:2.0 In=2.0	Present_In:2.1 In=2.1 Present_In:3.0
		  In=3.0 Present_In:3.1	In=3.1 Present_In:3.2 In=3.2
		  Present_In:4.0 In=4.0	Present_In:4.1 In=4.1 Present_In:5.0
		  In=5.0 Present_In:5.1	In=5.1 Present_In:5.2 In=5.2
		  Script:Common	Sc=Zyyy	Script:Zyyy Sentence_Break:Other SB=XX
		  Sentence_Break:XX Sentence_Break=Other Word_Break:Other WB=XX
		  Word_Break:XX	Word_Break=Other

       For the HYPHEN character, verbosely list	all Unicode properties
       including the two-part general categories, one per line,	and sort them:

	   $ uniprops -1vgau HYPHEN | sort

       List Perl and Unicode properties	for code point U+2212, reordered by
       length and with width set to 50:

	   $ uniprops -r -w 50 U+2212
	   U+2212 ‹−› \N{ MINUS SIGN }:
	       \pS \p{Sm}
	       S Sm All	Any Dash Math Zyyy Graph Print
		  Common GrBase	PatSyn Symbol Gr_Base Pat_Syn
		  Assigned Math_Symbol Grapheme_Base
		  Pattern_Syntax InMathematicalOperators

       Ask for a (currently) unassigned	code point:

	   $ uniprops 1F12F
	   U+1F12F ‹U+1F12F› \N{ U+1F12F }:
	       \pC \p{Cn}
	       All Any InEnclosedAlphanumericSupplement	C Other	Cn
		   Unassigned Zzzz Unknown

ERRORS
       It is an	error to ask for properties of code points representing	a
       UTF-16 surrogate.

       Characters not legal for	interchange are	flagged	as errors.
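
       Both error classes can be recognized from the code point number alone.
       The Python sketch below assumes that "not legal for interchange"
       means the Unicode noncharacters (U+FDD0..U+FDEF plus the last two
       code points of every plane); that reading is an assumption, and
       classify is a hypothetical name.

```python
def classify(cp: int) -> str:
    """Label a code point as usable or as one of the error classes."""
    if not 0 <= cp <= 0x10FFFF:
        return "out of range"
    if 0xD800 <= cp <= 0xDFFF:            # UTF-16 surrogate halves
        return "surrogate"
    # Noncharacters: U+FDD0..U+FDEF, and U+nFFFE / U+nFFFF in each plane.
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return "noncharacter"
    return "ok"


for cp in (0x0394, 0xD800, 0xFDD0, 0xFFFE, 0x10FFFF):
    print(f"U+{cp:04X}: {classify(cp)}")
```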

ENVIRONMENT
       If your environment smells like it's in a Unicode encoding, program
       arguments and output will be in UTF-8.  This allows you to enter	a
       single, literal UTF-8 character as a program argument.
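
       The exact test uniprops applies is not spelled out here.  One common
       heuristic, sketched in Python under that caveat, is to consult the
       POSIX locale variables in their usual precedence order and look for
       a UTF-8 codeset; looks_utf8 is a hypothetical name.

```python
import os


def looks_utf8(env) -> bool:
    """Guess whether the locale in `env` uses a UTF-8 encoding."""
    # POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:                          # first non-empty setting wins
            lowered = value.lower()
            return "utf-8" in lowered or "utf8" in lowered
    return False


print(looks_utf8(os.environ))
```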

       The PAGER environment variable is used for the --list option.

FILES
       The pod source for the perluniprops(1) manpage is parsed	to determine
       Unicode properties.  This is expected to	be found in the	Config
       module's	$installprivlib/pods directory.

PROGRAMS
       The stty(1) program is called on	Unix systems to	determine the window
       size.

       If the standard output is to a tty when the --list option is requested,
       the user's pager	is used, defaulting to more(1).
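
       The pager rule can be stated as a small decision function.  This is a
       Python sketch of the behavior described above, with the hypothetical
       name choose_pager; how uniprops actually launches the pager is not
       shown.

```python
import os
import sys


def choose_pager(env, stdout_is_tty: bool):
    """Return the pager command to use, or None to write directly."""
    if not stdout_is_tty:
        return None                        # piped or redirected output
    return env.get("PAGER") or "more"      # fall back to more(1)


print(choose_pager(os.environ, sys.stdout.isatty()))
```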

BUGS
       The --man option	does not correctly process the page for	UTF-8;
       pod2text(1) works fine, though.

SEE ALSO
       unichars, uninames, perluniprops, perlunicode, perlrecharclass, perlre

AUTHOR
       Tom Christiansen	<tchrist@perl.com>

COPYRIGHT AND LICENCE
       Copyright 2011 Tom Christiansen.

       This program is free software; you may redistribute it and/or modify it
       under the same terms as Perl itself.

perl v5.32.1			  2021-11-05			   UNIPROPS(1)
