UNIQUOTE(1)	      User Contributed Perl Documentation	   UNIQUOTE(1)

NAME
       uniquote	- escape special characters using various quoting conventions

SYNOPSIS
       uniquote	[options] [ textfile ... ]

	Standard options:

	   --version	   print version information and exit
	   --help	   this	message
	   --man	   full	manpage
	   --debug	   add some debugging output

	Character mode options:

		   Without a specified encoding, utf8 is assumed
		   unless file has encoding extension.

	   --verbose   -v  show	full character names like \N{EN	DASH}

	   --hex       -x  use singleton \x{...} escapes instead of \N{U+XXX}

	   --encoding  -E  specify encoding for	all input files

	   --html      -H  show	HTML entities (add --verbose for names)
	   --xml       -X  show	XML entities

	 Binary	mode options:

	   --bytes     -b  binary file in hex
	   --octal     -0  binary file in octal

	 Other options:

	   --endings	   -e	place $	at EOL so trailing spaces visible
	   --backslash	   -t	use backslash escapes for unprintable ASCII
	   --fix-newlines  -l	consider any Unicode linebreak sequence	as EOL
	   --unbuffer	   -u	flush each output line

DESCRIPTION
       The uniquote program is meant as a Unicode-aware replacement for
       programs like od(1) and "cat -v".  It converts ASCII control codes and
       all non-ASCII code points into a quoted form such as one might use in a
       Perl literal.

       Use --endings or "-e" to act like "cat -e" and add a dollar sign at the
       end of each line so trailing spaces become apparent.

       Use --backslash or "-t" to show tabs and	other ASCII control codes as
       backslash escapes.

       By default, uniquote converts each such code point into the form
       "\N{U+hex}", making code point 962 appear as "\N{U+3C2}".  The --hex
       option instead shows eligible code points in backslash-X notation, so
       code point 962 would be displayed as "\x{3C2}".

       The --verbose option instead displays eligible code points by name.
       Code point 962 would then be shown as "\N{GREEK SMALL LETTER FINAL
       SIGMA}".
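
       The three character-mode notations described above can be illustrated
       with a minimal Python sketch.  This is not uniquote's own code (uniquote
       is written in Perl); the function name quote and its style parameter are
       hypothetical.  The two-digit padding of the default form follows the
       sample output in EXAMPLES.  Python's unicodedata has no names for
       control characters, so the sketch falls back to the numeric form for
       them, whereas uniquote prints names such as CHARACTER TABULATION.

```python
import unicodedata

def quote(text, style="default"):
    """Escape ASCII control codes and all non-ASCII code points.

    Illustrative only: expects one line of text without its line terminator.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x20 <= cp < 0x7F:
            # Printable ASCII passes through unchanged.
            out.append(ch)
        elif style == "hex":
            # Backslash-X notation, as with --hex.
            out.append("\\x{%X}" % cp)
        elif style == "verbose":
            # Named form, as with --verbose; numeric fallback when
            # Python knows no name for the code point.
            name = unicodedata.name(ch, None)
            out.append("\\N{%s}" % name if name else "\\N{U+%02X}" % cp)
        else:
            # Default \N{U+hex} form.
            out.append("\\N{U+%02X}" % cp)
    return "".join(out)

print(quote(chr(962)))             # \N{U+3C2}
print(quote(chr(962), "hex"))      # \x{3C2}
print(quote(chr(962), "verbose"))  # \N{GREEK SMALL LETTER FINAL SIGMA}
```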

       The --xml and --html options show code points using numeric entities.
       Adding --verbose to --html will use named HTML entities where
       available.
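
       As a rough illustration of the entity notations, the following Python
       sketch reproduces the hexadecimal, decimal, and named forms seen in
       EXAMPLES.  The function name entity_quote and its parameters are
       hypothetical, and the named-entity table comes from Python's
       html.entities module rather than from whatever table uniquote uses.

```python
from html.entities import codepoint2name

def entity_quote(text, style="xml", verbose=False):
    """Show control codes and non-ASCII code points as entities (sketch)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x20 <= cp < 0x7F:
            # Printable ASCII is left alone in this sketch.
            out.append(ch)
        elif style == "xml":
            # Hexadecimal numeric entity, as in the --xml sample output.
            out.append("&#x%x;" % cp)
        elif verbose and cp in codepoint2name:
            # Named HTML entity where one is available.
            out.append("&%s;" % codepoint2name[cp])
        else:
            # Decimal numeric entity, as in the --html sample output.
            out.append("&#%d;" % cp)
    return "".join(out)

print(entity_quote("na\u00efve fa\u00e7ade", "xml"))
print(entity_quote("na\u00efve fa\u00e7ade", "html"))
print(entity_quote("na\u00efve fa\u00e7ade", "html", verbose=True))
```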

   Character Modes vs Binary Mode
       To treat the file as a sequence of bytes, use --binary.  This displays
       every byte outside printable ASCII escaped in the form "\xXX".  The
       other way to specify binary input uses the --octal option.
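
       The hexadecimal byte notation can be sketched in Python as follows.
       The function name quote_bytes is hypothetical, and only the "\xXX" form
       is modeled, because this page does not show what --octal output looks
       like.

```python
def quote_bytes(data):
    """Escape every byte outside printable ASCII as \\xXX (sketch)."""
    out = []
    for b in data:
        if 0x20 <= b < 0x7F:
            out.append(chr(b))           # printable ASCII byte, kept as-is
        else:
            out.append("\\x%02X" % b)    # anything else, two hex digits
    return "".join(out)

# The same text produces different bytes under different encodings.
print(quote_bytes("na\u00efve".encode("utf-8")))      # na\xC3\xAFve
print(quote_bytes("na\u00efve".encode("mac_roman")))  # na\x95ve
```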

       If you have not specified binary	mode, then you are in character	mode.
       The default encoding in character mode is not ASCII but UTF-8.  If you
       have not	specified an optional encoding with --encoding,	but the
       filename	ends with the name of an encoding that Perl recognizes,	that
       encoding	will be	assumed.

       Note that no matter what the actual input character encoding is, the
       escapes shown always reflect the Unicode number of each code point.  You
       can use this property to normalize input, or to check that you actually
       know a file's encoding.  For example, you can test the same file with
       various 8-bit encodings like Latin1, MacRoman, and CP1252.
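
       The idea of testing one file against several candidate encodings can be
       sketched in Python.  The helper name try_encodings is hypothetical, and
       the codec names are Python's, not Perl's.  A wrong guess either fails
       to decode or yields unexpected code points, which is the same signal
       uniquote gives when run with different --encoding values.

```python
def try_encodings(data, encodings=("latin-1", "mac_roman", "cp1252")):
    """Decode the same bytes under each candidate encoding (sketch).

    Returns a dict mapping encoding name to the decoded text, or to None
    when the bytes are not valid in that encoding.
    """
    results = {}
    for enc in encodings:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results

# Bytes that were written as MacRoman.
data = "na\u00efve fa\u00e7ade".encode("mac_roman")
for enc, text in try_encodings(data).items():
    if text is None:
        print(enc, "does not map to Unicode")
    else:
        print(enc, [hex(ord(c)) for c in text if ord(c) > 0x7E])
```

       Only the MacRoman reading yields U+EF and U+E7.  Latin1 yields the C1
       control codes U+95 and U+8D, and CP1252 rejects byte 0x8D outright.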

       The default input encoding is actually "utf8"; that is, Perl's
       permissive version of UTF-8.  If	you want strict	UTF-8, override	it.

EXAMPLES
	 $ perl	-E 'say	"ascii:\tnayeeve fassodd"'						       > /tmp/nf.ascii
	 $ perl	-E 'binmode(STDOUT, "encoding(macroman)")||die;	say "macroman:\tna\xEFve fa\xE7ade"'   > /tmp/nf.macroman
	 $ perl	-E 'binmode(STDOUT, "encoding(utf8)")||die;	say "utf8:\tna\xEFve fa\xE7ade"'       > /tmp/nf.utf8
	 $ perl	-E 'binmode(STDOUT, "encoding(utf16)")||die;	say "utf16:\tna\xEFve fa\xE7ade"'      > /tmp/nf.utf16
	 $ perl	-E 'binmode(STDOUT, "encoding(utf32)")||die;	say "utf32:\tna\xEFve fa\xE7ade"'      > /tmp/nf.utf32
	 $ perl	-E 'binmode(STDOUT, "encoding(latin1)")||die;	say "latin1:\tna\xEFve fa\xE7ade"'     > /tmp/nf.latin1
	 $ perl	-E 'binmode(STDOUT, "encoding(cp1252)")||die;	say "cp1252:\tna\xEFve fa\xE7ade"'     > /tmp/nf.cp1252

	 $ wc -c /tmp/nf*
	     23	/tmp/nf.ascii
	     21	/tmp/nf.cp1252
	     21	/tmp/nf.latin1
	     23	/tmp/nf.macroman
	     42	/tmp/nf.utf16
	     84	/tmp/nf.utf32
	     21	/tmp/nf.utf8
	    235	total
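
       The byte counts above follow from how each encoding stores the two
       non-ASCII letters and whether it writes a byte order mark.  A small
       Python check of that arithmetic, assuming Python's codec names and a
       hypothetical helper encoded_size:

```python
def encoded_size(label, encoding):
    """Byte length of 'label:<TAB>naive facade<NEWLINE>' (with accents)."""
    line = label + ":\t" + "na\u00efve fa\u00e7ade" + "\n"
    return len(line.encode(encoding))

# One byte per character in the 8-bit encodings.
print(encoded_size("latin1", "latin-1"))      # 21
print(encoded_size("macroman", "mac_roman"))  # 23
# Two bytes each for the two accented letters in UTF-8.
print(encoded_size("utf8", "utf-8"))          # 21
# Fixed-width units plus a byte order mark.
print(encoded_size("utf16", "utf-16"))        # 42 = 2 + 2 * 20
print(encoded_size("utf32", "utf-32"))        # 84 = 4 + 4 * 20
```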

	 $ uniquote /tmp/nf.*
       ascii:\N{U+09}nayeeve fassodd
       cp1252:\N{U+09}na\N{U+EF}ve fa\N{U+E7}ade
       latin1:\N{U+09}na\N{U+EF}ve fa\N{U+E7}ade
       macroman:\N{U+09}na\N{U+EF}ve fa\N{U+E7}ade
       utf16:\N{U+09}na\N{U+EF}ve fa\N{U+E7}ade
       utf32:\N{U+09}na\N{U+EF}ve fa\N{U+E7}ade
       utf8:\N{U+09}na\N{U+EF}ve fa\N{U+E7}ade

	 $ uniquote --backslash	--endings /tmp/nf.*
       ascii:\tnayeeve fassodd$
       cp1252:\tna\N{U+EF}ve fa\N{U+E7}ade$
       latin1:\tna\N{U+EF}ve fa\N{U+E7}ade$
       macroman:\tna\N{U+EF}ve fa\N{U+E7}ade$
       utf16:\tna\N{U+EF}ve fa\N{U+E7}ade$
       utf32:\tna\N{U+EF}ve fa\N{U+E7}ade$
       utf8:\tna\N{U+EF}ve fa\N{U+E7}ade$

	 $ uniquote --verbose /tmp/nf.*
       ascii:\N{CHARACTER TABULATION}nayeeve fassodd
       cp1252:\N{CHARACTER TABULATION}na\N{LATIN SMALL LETTER I WITH DIAERESIS}ve fa\N{LATIN SMALL LETTER C WITH CEDILLA}ade
       latin1:\N{CHARACTER TABULATION}na\N{LATIN SMALL LETTER I WITH DIAERESIS}ve fa\N{LATIN SMALL LETTER C WITH CEDILLA}ade
       macroman:\N{CHARACTER TABULATION}na\N{LATIN SMALL LETTER I WITH DIAERESIS}ve fa\N{LATIN SMALL LETTER C WITH CEDILLA}ade
       utf16:\N{CHARACTER TABULATION}na\N{LATIN SMALL LETTER I WITH DIAERESIS}ve fa\N{LATIN SMALL LETTER C WITH CEDILLA}ade
       utf32:\N{CHARACTER TABULATION}na\N{LATIN SMALL LETTER I WITH DIAERESIS}ve fa\N{LATIN SMALL LETTER C WITH CEDILLA}ade
       utf8:\N{CHARACTER TABULATION}na\N{LATIN SMALL LETTER I WITH DIAERESIS}ve fa\N{LATIN SMALL LETTER C WITH CEDILLA}ade

	 $ uniquote --binary /tmp/nf.*
       ascii:\x09nayeeve fassodd
       cp1252:\x09na\xEFve fa\xE7ade
       latin1:\x09na\xEFve fa\xE7ade
       macroman:\x09na\x95ve fa\x8Dade
       \xFE\xFF\x00u\x00t\x00f\x001\x006\x00:\x00\x09\x00n\x00a\x00\xEF\x00v\x00e\x00 \x00f\x00a\x00\xE7\x00a\x00d\x00e\x00
       \x00\x00\xFE\xFF\x00\x00\x00u\x00\x00\x00t\x00\x00\x00f\x00\x00\x003\x00\x00\x002\x00\x00\x00:\x00\x00\x00\x09\x00\x00\x00n\x00\x00\x00a\x00\x00\x00\xEF\x00\x00\x00v\x00\x00\x00e\x00\x00\x00 \x00\x00\x00f\x00\x00\x00a\x00\x00\x00\xE7\x00\x00\x00a\x00\x00\x00d\x00\x00\x00e\x00\x00\x00
       utf8:\x09na\xC3\xAFve fa\xC3\xA7ade

	 $ uniquote --xml /tmp/nf.*
       ascii:&#x9;nayeeve fassodd
       cp1252:&#x9;na&#xef;ve fa&#xe7;ade
       latin1:&#x9;na&#xef;ve fa&#xe7;ade
       macroman:&#x9;na&#xef;ve fa&#xe7;ade
       utf16:&#x9;na&#xef;ve fa&#xe7;ade
       utf32:&#x9;na&#xef;ve fa&#xe7;ade
       utf8:&#x9;na&#xef;ve fa&#xe7;ade

	 $ uniquote --html /tmp/nf.*
       ascii:&#9;nayeeve fassodd
       cp1252:&#9;na&#239;ve fa&#231;ade
       latin1:&#9;na&#239;ve fa&#231;ade
       macroman:&#9;na&#239;ve fa&#231;ade
       utf16:&#9;na&#239;ve fa&#231;ade
       utf32:&#9;na&#239;ve fa&#231;ade
       utf8:&#9;na&#239;ve fa&#231;ade

	 $ uniquote --html --verbose /tmp/nf.*
       ascii:&#9;nayeeve fassodd
       cp1252:&#9;na&iuml;ve fa&ccedil;ade
       latin1:&#9;na&iuml;ve fa&ccedil;ade
       macroman:&#9;na&iuml;ve fa&ccedil;ade
       utf16:&#9;na&iuml;ve fa&ccedil;ade
       utf32:&#9;na&iuml;ve fa&ccedil;ade
       utf8:&#9;na&iuml;ve fa&ccedil;ade

	 $ uniquote --backslash	--encoding latin1   --verbose /tmp/nf.*
       ascii:\tnayeeve fassodd
       cp1252:\tna\N{LATIN SMALL LETTER I WITH DIAERESIS}ve fa\N{LATIN SMALL LETTER C WITH CEDILLA}ade
       latin1:\tna\N{LATIN SMALL LETTER I WITH DIAERESIS}ve fa\N{LATIN SMALL LETTER C WITH CEDILLA}ade
       macroman:\tna\N{MESSAGE WAITING}ve fa\N{REVERSE LINE FEED}ade
       \N{LATIN SMALL LETTER THORN}\N{LATIN SMALL LETTER Y WITH DIAERESIS}\0u\0t\0f\01\06\0:\0\t\0n\0a\0\N{LATIN SMALL LETTER I WITH DIAERESIS}\0v\0e\0 \0f\0a\0\N{LATIN SMALL LETTER C WITH CEDILLA}\0a\0d\0e\0
       \0\0\N{LATIN SMALL LETTER THORN}\N{LATIN SMALL LETTER Y WITH DIAERESIS}\0\0\0u\0\0\0t\0\0\0f\0\0\03\0\0\02\0\0\0:\0\0\0\t\0\0\0n\0\0\0a\0\0\0\N{LATIN SMALL LETTER I WITH DIAERESIS}\0\0\0v\0\0\0e\0\0\0 \0\0\0f\0\0\0a\0\0\0\N{LATIN SMALL LETTER C WITH CEDILLA}\0\0\0a\0\0\0d\0\0\0e\0\0\0
       utf8:\tna\N{LATIN CAPITAL LETTER A WITH TILDE}\N{MACRON}ve fa\N{LATIN CAPITAL LETTER A WITH TILDE}\N{SECTION SIGN}ade

	 $ uniquote --backslash	--encoding cp1252   --verbose /tmp/nf.*
       ascii:\tnayeeve fassodd
       uniquote: cp1252 "\x8D" does not map to Unicode at /tmp/nf.macroman line 0
       cp1252:\tna\N{LATIN SMALL LETTER I WITH DIAERESIS}ve fa\N{LATIN SMALL LETTER C WITH CEDILLA}ade
       latin1:\tna\N{LATIN SMALL LETTER I WITH DIAERESIS}ve fa\N{LATIN SMALL LETTER C WITH CEDILLA}ade
       \N{LATIN SMALL LETTER THORN}\N{LATIN SMALL LETTER Y WITH DIAERESIS}\0u\0t\0f\01\06\0:\0\t\0n\0a\0\N{LATIN SMALL LETTER I WITH DIAERESIS}\0v\0e\0 \0f\0a\0\N{LATIN SMALL LETTER C WITH CEDILLA}\0a\0d\0e\0
       \0\0\N{LATIN SMALL LETTER THORN}\N{LATIN SMALL LETTER Y WITH DIAERESIS}\0\0\0u\0\0\0t\0\0\0f\0\0\03\0\0\02\0\0\0:\0\0\0\t\0\0\0n\0\0\0a\0\0\0\N{LATIN SMALL LETTER I WITH DIAERESIS}\0\0\0v\0\0\0e\0\0\0 \0\0\0f\0\0\0a\0\0\0\N{LATIN SMALL LETTER C WITH CEDILLA}\0\0\0a\0\0\0d\0\0\0e\0\0\0
       utf8:\tna\N{LATIN CAPITAL LETTER A WITH TILDE}\N{MACRON}ve fa\N{LATIN CAPITAL LETTER A WITH TILDE}\N{SECTION SIGN}ade

	 $ uniquote --backslash	--encoding macroman --verbose /tmp/nf.*
       ascii:\tnayeeve fassodd
       cp1252:\tna\N{LATIN CAPITAL LETTER O WITH CIRCUMFLEX}ve fa\N{LATIN CAPITAL LETTER A WITH ACUTE}ade
       latin1:\tna\N{LATIN CAPITAL LETTER O WITH CIRCUMFLEX}ve fa\N{LATIN CAPITAL LETTER A WITH ACUTE}ade
       macroman:\tna\N{LATIN SMALL LETTER I WITH DIAERESIS}ve fa\N{LATIN SMALL LETTER C WITH CEDILLA}ade
       \N{OGONEK}\N{CARON}\0u\0t\0f\01\06\0:\0\t\0n\0a\0\N{LATIN CAPITAL LETTER O WITH CIRCUMFLEX}\0v\0e\0 \0f\0a\0\N{LATIN CAPITAL LETTER A WITH ACUTE}\0a\0d\0e\0
       \0\0\N{OGONEK}\N{CARON}\0\0\0u\0\0\0t\0\0\0f\0\0\03\0\0\02\0\0\0:\0\0\0\t\0\0\0n\0\0\0a\0\0\0\N{LATIN CAPITAL LETTER O WITH CIRCUMFLEX}\0\0\0v\0\0\0e\0\0\0 \0\0\0f\0\0\0a\0\0\0\N{LATIN CAPITAL LETTER A WITH ACUTE}\0\0\0a\0\0\0d\0\0\0e\0\0\0
       utf8:\tna\N{SQUARE ROOT}\N{LATIN CAPITAL LETTER O WITH STROKE}ve fa\N{SQUARE ROOT}\N{LATIN SMALL LETTER SHARP S}ade

ERRORS
       Exits 0 if all is well, 1 otherwise.

       Errors include inaccessible files, bogus	encodings, and contents	that
       do not match a specified	encoding.

BUGS
       Good question.

SEE ALSO
       od(1), cat(1), Encode(3)

HISTORY
       First public release February 27, 2011.

AUTHOR
       Tom Christiansen	"<tchrist@perl.com>"

COPYRIGHT AND LICENCE
       Copyright 2010 Tom Christiansen.

       This program is free software; you may redistribute it and/or modify it
       under the same terms as Perl itself.

perl v5.32.1			  2021-11-05			   UNIQUOTE(1)
