uniname(1)		    General Commands Manual		    uniname(1)

       uniname - Name the characters in	a Unicode text file

       uniname ([option	flags])	(<file name>)

       If no input file name is supplied, uniname reads from the standard input.

       uniname names the characters in a Unicode text file.  For each  charac-
       ter,  uniname  defaults to printing the character offset, the byte off-
       set, the	hexadecimal UTF-32 character code, the encoding	as a  sequence
       of  hex	byte values, the glyph,	and the	character's Unicode name. Com-
       mand line flags allow undesired information to be  suppressed.	Glyphs
       that  do	not display nicely, such as control characters and spaces, are
       not displayed.  For the Latin-1 control characters, whose official Uni-
       code name is "control", the real	name is	given. Character and byte off-
       sets both start from 0.
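
       The per-character fields described above can be sketched in Python.
       This is only an illustration, not uniname's implementation: the
       function name describe is hypothetical, the column layout is an
       assumption, and Python's unicodedata module has no names for control
       characters, whereas uniname reports their real names.

```python
import unicodedata

def describe(text):
    """Yield (char offset, byte offset, UTF-32 code, UTF-8 bytes, glyph, name)."""
    byte_offset = 0
    for char_offset, ch in enumerate(text):      # offsets start from 0
        encoded = ch.encode("utf-8")
        # Hide glyphs that do not display nicely (control characters, spaces).
        glyph = ch if ch.isprintable() and not ch.isspace() else ""
        # Placeholder text is an assumption; uniname prints something else here.
        name = unicodedata.name(ch, "(no name in this Unicode database)")
        yield (char_offset, byte_offset, f"{ord(ch):06X}",
               " ".join(f"{b:02X}" for b in encoded), glyph, name)
        byte_offset += len(encoded)

rows = list(describe("a é€"))
for row in rows:
    print(*row, sep="\t")
```

       Once a multi-byte character has been read, the character offset and
       the byte offset diverge, which is why both are reported.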

       Where a character does not have a unique	Unicode	name, as is  the  case
       with  Chinese  characters, the character	is identified as "character in
       such-and-such a range".	However, if the	character is a Chinese charac-
       ter listed in Nelson's dictionary, the Nelson number is supplied.

       By default, input is expected to be UTF-8. Native order UTF-32 may be
       specified via the command line flag -q. If invalid UTF-8 is encountered,
       an explanation is printed as to why it is invalid.

       -A     Skip ASCII whitespace characters.

       -a     Skip ASCII characters.

       -B     Skip characters within the Basic Multilingual Plane.

       -b     Suppress printing	of byte	offset.

       -c     Suppress printing	of character offset.

       -e     Suppress printing	of encoding.

       -g     Suppress printing	of glyph.

       -h     Print usage information.

       -l     Print line number.

       -n     Suppress printing	of Unicode name.

       -p     Suppress printing	of headers every screenful.

       -q     Input is native order UTF-32.

       -r     Print  Unicode range.  The ranges	reported include both official
	      Unicode ranges and the constructed language  ranges  within  the
	      Private Use Areas	registered with	the Conscript Unicode Registry.

       -s <character offset>
	      Skip to specified	character offset.

       -S <byte	offset>
	      Skip to specified	byte offset. Note that even if the  file  con-
	      sists of well-formed Unicode there is no guarantee that the byte
	      sequence beginning at an arbitrary byte will be  valid  Unicode.
	      This  option  is	provided for use where other programs generate
	      only byte	offsets	or where it is necessary to skip over  damaged
	      Unicode. In most circumstances use of a character	offset will be
	      more appropriate. If a byte offset	is used, the character offsets
	      shown  are  with	respect	to the beginning of the	section	of the
	      file examined rather than	the beginning of the file.

       -u     Suppress printing	of UTF-32 code.

       -V     Validate the input. In this case,	nothing	is done	other than de-
	      termine  whether	the input is valid UTF-8 Unicode. If it	is, no
	      output is	produced and the program exits with status 0.  If  in-
	      valid  UTF-8 is encountered, the program reports the location of
	      the first	invalid	UTF-8 encountered, explains why	it is invalid,
	      and exits	with status 1.

       -v     Print version information.
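
       The behaviour described for -V and the caveat attached to -S can be
       illustrated with a short Python sketch. It is not uniname's code: the
       helper names first_invalid, exit_status and byte_offset_of are
       hypothetical, and the wording of the report is an assumption.

```python
def first_invalid(data: bytes):
    """Return None if data is valid UTF-8, else (byte offset, reason)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, err.reason
    return None

def exit_status(data: bytes) -> int:
    """0 and no output for valid input; 1 plus a report otherwise."""
    problem = first_invalid(data)
    if problem is None:
        return 0
    print(f"invalid UTF-8 at byte offset {problem[0]}: {problem[1]}")
    return 1

def byte_offset_of(data: bytes, char_offset: int) -> int:
    """Byte offset at which the character with the given offset starts."""
    return len(data.decode("utf-8")[:char_offset].encode("utf-8"))

sample = "aé€z".encode("utf-8")      # bytes: 61 | C3 A9 | E2 82 AC | 7A
print(exit_status(sample))             # the whole sample is well-formed
print(first_invalid(sample[2:]))       # slice begins on a continuation byte
print(byte_offset_of(sample, 2))       # where the third character starts
```

       The sample as a whole is well-formed, yet the slice starting at byte 2
       is not, because it begins in the middle of a two-byte character. A
       character offset avoids the problem, since it always maps to the start
       of a complete encoding.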


       Unicode Standard, version 5.1

       Bill Poser

       GNU General Public License

				February, 2009			    uniname(1)

