UNICONV(1)			LINUX COMMANDS			    UNICONV(1)

NAME
       uniconv - convert text to native	formats	through	unicode

SYNOPSIS
       uniconv	-out  output-file [ -decode input-encoding ] [ -encode output-
       encoding	] [ input-file ] [ -todos ] [ -fromdos ] [ -tomac ] [ -frommac
       ]

DESCRIPTION
       The uniconv program decodes text in one encoding and encodes it in
       another.  The text is a 16-, 8- or 7-bit byte stream.  The converted
       text is sent to the standard output, even for 16-bit encodings, unless
       an output file is specified with the -out option.

       The -decode and -encode options are optional; the default converter is
       utf-8.  The program reads the Unicode map helper files (*.my) from the
       default directory /usr/local/share/data.  Simple 1-to-1 encodings can
       be added on the fly by adding a my-file, or by setting the
       yudit.datapath property in ~/.yudit/yudit.properties or
       /usr/local/share/yudit/config/yudit.properties.  By default
       /usr/local/share/yudit/data is searched.

       My-files can be created by a program called makeumap.  The files can be
       converted between dos/unix/mac line-ending variants with the -fromdos,
       -frommac, -todos and -tomac options.  The default (when none is
       specified) is Unix.
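
       The line-ending conversion described above can be sketched in Python.
       This is only an illustration, not uniconv's implementation; the
       function name is hypothetical, and it assumes that "mac" means the
       classic Mac OS convention of a lone carriage return.

```python
def convert_line_endings(text: str, target: str = "unix") -> str:
    """Rewrite all line endings in text to one variant (hypothetical helper)."""
    # Assumption: "mac" is the classic Mac OS lone-CR convention.
    endings = {"unix": "\n", "dos": "\r\n", "mac": "\r"}
    # Normalize every variant to "\n" first; CRLF must be handled before CR.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.replace("\n", endings[target])


mixed = "a\r\nb\rc\n"
print(repr(convert_line_endings(mixed, "dos")))
```

       Normalizing to a single form first means the input may mix all three
       variants without lines being doubled or lost.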

ENCODING
       If you received this program through the	Yudit distribution, then as of
       today you can convert between the encodings below.

       utf-8  Yudit  recommends	 this format for international information ex-
	      change.  ASCII text  will	 get through  intact, while other uni-
	      code  characters	will  get their	8th bit	set and	the length  of
	      the  code	 will depend on	how far	away they are in  the  Unicode
	      space.   This  is	the only transformation	format that can	encode
	      both 16-bit (ucs-2) and 31-bit (ucs-4) unicode.
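
              The dependence of code length on position in the Unicode space
              can be checked with Python's built-in utf-8 codec.  This is a
              small sketch using Python rather than uniconv; the helper name
              is hypothetical.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes the utf-8 codec uses for one character."""
    return len(ch.encode("utf-8"))


# ASCII passes through unchanged, one byte per character.
assert "plain text".encode("utf-8") == b"plain text"

# Code points further from U+0000 need more bytes.
for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
    print("U+%04X -> %d byte(s)" % (ord(ch), utf8_length(ch)))

# Every byte of a non-ASCII character has its 8th bit set.
assert all(b >= 0x80 for b in "\u00e9".encode("utf-8"))
```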

       utf-8-s
              Hacker's utf-8 format: it does not give an error message when a
              surrogate pair is decoded, and it can encode a surrogate pair
              'as is'.  This is not a recommended encoding format, although it
              is used to encode/decode clipboard data in order to preserve
              input.
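
              Python offers a comparable distinction through the
              "surrogatepass" error handler, which can serve as an analogy.
              The sketch below is not how uniconv implements utf-8-s; the
              function names are hypothetical.

```python
from typing import Optional


def encode_strict(text: str) -> Optional[bytes]:
    """Strict utf-8: a lone surrogate is an error, reported here as None."""
    try:
        return text.encode("utf-8")
    except UnicodeEncodeError:
        return None


def encode_lenient(text: str) -> bytes:
    """Lenient utf-8: surrogates are encoded 'as is' as three-byte sequences."""
    return text.encode("utf-8", "surrogatepass")


lone_surrogate = "\ud800"
print(encode_strict(lone_surrogate))   # strict encoding refuses
print(encode_lenient(lone_surrogate))  # lenient encoding preserves the input
```

              The lenient form round-trips any input, which matches the stated
              purpose of preserving clipboard data, at the cost of producing
              byte streams that strict decoders reject.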

       utf-16 Although 16 is bigger than 8, this is still a compromise
              required by OSes like Windows that cannot handle ucs-4.  This
              encoding produces 16-bit unicode streams.  In addition to the
              BMP it can convert 16 planes using the Unicode Surrogate Area.
              This encoding cannot convert anything above U+10FFFF (Plane 16).
              The input byte order is recognized from the first two bytes, the
              BOM (byte order mark) U+FEFF.  This format is used in Windows NT
              for documents like notepad .txt files.
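
              The surrogate mechanism follows the standard UTF-16 arithmetic,
              sketched below in Python.  The function names are hypothetical
              and the code illustrates the encoding scheme, not uniconv's
              source.

```python
def to_surrogates(cp: int) -> tuple:
    """Split a code point in U+10000..U+10FFFF into a (high, low) pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is outside the surrogate-encodable range")
    offset = cp - 0x10000            # 20 significant bits remain
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def from_surrogates(high: int, low: int) -> int:
    """Recombine a (high, low) surrogate pair into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(0x1F600)
print("%04X %04X" % (high, low))
```

              Ten bits go into each half of the pair, so 20 bits are
              available, which is exactly the 16 planes above the BMP and
              explains the U+10FFFF ceiling.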

       utf-16-be
	      Big endian utf-16	converter.

       utf-16-le
	      Little endian utf-16 converter.

       utf-7  This is the recommended format for international information
              exchange when only 7-bit data can be used.  It can only handle
              16-bit (utf-16) unicode; for ucs-4 (above U+10FFFF) you should
              use utf-8 encoding.
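
              The 7-bit property can be checked with Python's own utf-7 codec,
              used here as a stand-in for uniconv.  The helper name is
              hypothetical.

```python
def is_seven_bit(data: bytes) -> bool:
    """True when no byte has its 8th bit set."""
    return all(b < 0x80 for b in data)


text = "Gr\u00fc\u00dfe"              # contains two non-ASCII characters
encoded = text.encode("utf-7")

print(encoded)
assert is_seven_bit(encoded)                 # safe for 7-bit transports
assert encoded.decode("utf-7") == text       # and fully reversible
assert not is_seven_bit(text.encode("utf-8"))
```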

       iso-8859-1
	      This is the ISO 8859-1 character	encoding format.  It  is  also
	      known as "Latin-1" encoding.

       iso-8859-2
	      This   is	  the ISO 8859-2 character encoding format. It is also
	      known as "Central	European" encoding.

       iso-8859-5
	      This is the ISO 8859-5 character encoding	 format.  It  is  also
	      known as "Cyrillic" encoding.

       iso-8859-7
	      This  is	the  ISO  8859-7 character encoding format. It is also
	      known as "Greek" encoding.

       iso-8859-9
	      This is the ISO 8859-9 character encoding	 format.  It  is  also
	      known as "Turkish" encoding.

       koi8-r This  is the KOI8-R character encoding format. It	is mainly used
	      in Russia.

       cp-1251
	      This is the CP1251 Cyrillic character  encoding  format.	It  is
	      mainly used in Microsoft Windows and some	web sites.

       iso-2022-jp
	      This  is a Japanese character encoding format. It	is a 7-bit en-
	      coding format.

       iso-2022-jp-3
              This is a Japanese character encoding format.  It is a 7-bit
              encoding format.  It is based upon the JIS X 0213 standard.

       euc-jp This is a	Japanese character encoding format. It is an 8-bit en-
	      coding format.  Mainly used in UNIX systems.

       euc-jp-3
              The official name is EUC-JISX0213 - I just could not read this.
              This is a Japanese character encoding format.  It is an 8-bit
              encoding format.  It is based upon the JIS X 0213 standard.

       shift-jis
	      This is a	Japanese character encoding format.  It	 is  an	 8-bit
	      encoding format. Mainly used in MSDOS/Windows.

       shift-jis-3
	      The  official  name  is  Shift_JISX0213  - I just	could not read
	      this.  This is a Japanese	character encoding format.  It	is  an
	      8-bit encoding format. Mainly used in MSDOS/Windows.

       iso-2022-jp
              This is a Japanese 7-bit character encoding format.  Email
              messages in iso-2022-jp can be decoded and encoded in this
              format.

       iso-2022-x11
	      This  is a Japanese character encoding format.  It is also known
	      as "COMPOUND_TEXT" encoding for the X  Window System. This is  a
	      7-bit  encoding  format.	It can be derived from the ISO 2022-JP
	      format with some differences.

       ksc-5601-x11
              This is a Korean character encoding format used by the X Window
              System (COMPOUND_TEXT encoding) to encode Korean (KS X 1001) and
              US-ASCII.  This is a 7-bit encoding format compliant with the
              ISO-2022 specification for encoding of multiple character sets.
              Please note that this is DIFFERENT from ISO-2022-KR (defined in
              IETF RFC 1557).

       euc-kr This is an 8-bit multibyte encoding for Korean.  It encodes
              US-ASCII (7-bit) in the single-byte range and characters in KS X
              1001 (formerly KS C 5601) in the double-byte range with the MSB
              on (8-bit).  It is used in Unix and on the Internet.  Korean
              versions of MS-DOS, MacOS and MS-Windows use a compatible (in
              most cases identical) variant of this encoding.
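
              The single-byte and double-byte ranges can be observed with
              Python's euc_kr codec, used here in place of uniconv.  The
              helper name is hypothetical.

```python
def classify_bytes(data: bytes) -> list:
    """Label each byte 'ascii' (MSB off) or 'msb' (MSB on)."""
    return ["msb" if b & 0x80 else "ascii" for b in data]


text = "A\ud55c"                      # Latin 'A' plus one Hangul syllable
encoded = text.encode("euc_kr")

# 'A' stays a single 7-bit byte; the syllable becomes two bytes with MSB on.
print(len(encoded), classify_bytes(encoded))
assert encoded.decode("euc_kr") == text
```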

       johab  This is a Korean encoding specified in KS X 1001 (KS C
              5601-1992), Annex 3, as a supplementary encoding.  It was widely
              used in Korean MS-DOS until the mid-1990s.  It can encode all
              Hangul syllables (11,172) of modern Korean as well as all the
              special symbols and Hanja (Chinese ideograms used in Korea)
              defined in KS X 1001.

       uhc    A variant of EUC-KR used in Korean MS-Windows 95/98 (a
              proprietary Microsoft encoding, CP949).  Its character
              repertoire includes all modern syllables of Hangul, the Korean
              script, as well as all the special symbols and Hanja (Chinese
              ideograms used in Korea) defined in KS X 1001.

       gb-18030
	      This is a	Chinese	character encoding format based	upon GB	18030.
	      It encodes the whole U+0000..U+10FFFF range, while being compat-
	      ible with	gb-2312.
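
              Both properties, full-range coverage and gb-2312 compatibility,
              can be spot-checked with Python's gb18030 and gb2312 codecs.
              This is a sketch using Python's codecs, not uniconv; the helper
              name is hypothetical.

```python
def roundtrips(cp: int) -> bool:
    """True when a code point survives encoding to gb18030 and back."""
    ch = chr(cp)
    return ch.encode("gb18030").decode("gb18030") == ch


# The ends of the Unicode range are both encodable.
assert roundtrips(0x0000)
assert roundtrips(0x10FFFF)

# A character present in GB 2312 gets the same bytes in both encodings.
zhong = "\u4e2d"
assert zhong.encode("gb18030") == zhong.encode("gb2312")

# ASCII is one byte, GB 2312 characters two, supplementary planes four.
print([len(c.encode("gb18030")) for c in ("A", zhong, "\U0010FFFF")])
```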

       gb-2312-x11
	      This  is a Chinese character encoding format based upon GB 2312.
	      It is a 7-bit encoding format.

       gb-2312
	      This is a	Chinese	character encoding format based	upon GB	 2312.
	      It is an 8-bit encoding format.

       big-5  This  is a Chinese character encoding format based upon BIG5 en-
	      coding.  It is an	8-bit encoding format.

       hz     This is a	Chinese	character encoding format based	 upon  "Hanzi"
	      encoding.	 It is a 7-bit encoding	format.

       viscii This is a	Vietnamese character encoding format.

       ucs-2-be
              This converts 16-bit unicode (ucs-2) streams.  The format
              handles the big-endian variant.  Yudit does not recommend this
              format.

       ucs-2-le
              This converts 16-bit unicode (ucs-2) streams.  The format
              handles the little-endian variant.  Yudit does not recommend
              this format.

       ucs-2  This converts 16-bit unicode (ucs-2) streams.  The input byte
              order is recognized from the first two bytes, the BOM (byte
              order mark) U+FEFF.  Yudit does not recommend this format.
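
              Byte-order detection from U+FEFF can be sketched as follows.
              The function name is hypothetical, and what uniconv does when
              the mark is missing is not stated here, so the sketch simply
              returns None in that case.

```python
import codecs
from typing import Optional


def detect_byte_order(data: bytes) -> Optional[str]:
    """Return 'big' or 'little' from a leading U+FEFF, else None."""
    if data[:2] == codecs.BOM_UTF16_BE:    # bytes FE FF
        return "big"
    if data[:2] == codecs.BOM_UTF16_LE:    # bytes FF FE
        return "little"
    return None                            # no mark: order is unknown


print(detect_byte_order(b"\xfe\xff\x00A"))
print(detect_byte_order(b"\xff\xfeA\x00"))
```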

       java   This converts \uxxxx character escapes.  When encoding, all
              characters above U+0080 will be escaped with a string like
              '\u0080'.  When decoding, the same format is decoded but, in
              addition, utf-8 format is also recognized, so it can also be
              used to recover data accidentally saved with the wrong encoding.
              The U+10000..U+10FFFF area is converted to surrogates and vice
              versa.
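
              A minimal Python sketch of this escape scheme follows.  It is
              an illustration under stated assumptions, not uniconv's code:
              the function names are hypothetical, U+0080 itself is assumed
              to be escaped (as the '\u0080' example suggests), lowercase hex
              digits are an arbitrary choice, and the extra utf-8 recognition
              on decoding is left out.

```python
import re


def java_escape(text: str) -> str:
    """Escape non-ASCII characters as \\uxxxx, using surrogates above U+FFFF."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:                      # assumption: U+0080 and up escaped
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append("\\u%04x" % cp)
        else:                              # U+10000..U+10FFFF -> two escapes
            offset = cp - 0x10000
            out.append("\\u%04x\\u%04x" % (0xD800 + (offset >> 10),
                                           0xDC00 + (offset & 0x3FF)))
    return "".join(out)


def java_unescape(text: str) -> str:
    """Decode \\uxxxx escapes and merge surrogate pairs into one character."""
    raw = re.sub(r"\\u([0-9a-fA-F]{4})",
                 lambda m: chr(int(m.group(1), 16)), text)
    # A trip through utf-16 joins adjacent high/low surrogates.
    return raw.encode("utf-16", "surrogatepass").decode("utf-16")


print(java_escape("caf\u00e9 \U0001F600"))
```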

       java-s This converts \uxxxx character escapes.  When encoding, all
              characters above U+0080 will be escaped with a string like
              '\u0080'.  When decoding, the same format is decoded but, in
              addition, utf-8 format is also recognized, so it can also be
              used to recover data accidentally saved with the wrong encoding.
              Surrogates are not treated specially during conversion - this is
              why it is not a recommended conversion.

FILES
       ~/.yudit/yudit.properties or
       /usr/local/share/yudit/config/yudit.properties
              can have the yudit.datapath property.  This is where the map
              files are kept.  By default /usr/local/share/yudit/data is
              searched.

SEE ALSO
	makeumap

AUTHOR
       This  program  was written by gsinai@yudit.org (Gaspar Sinai), Tokyo, 2
       January,	2001.

LINUX COMMANDS			  Nov 5	1997			    UNICONV(1)
