encoding(n)		     Tcl Built-In Commands		   encoding(n)


       encoding	- Manipulate encodings

       encoding	option ?arg arg	...?

       Strings	in  Tcl	are logically a	sequence of 16-bit Unicode characters.
       These strings are represented in	memory as a sequence of	bytes that may
       be in one of several encodings: modified	UTF-8 (which uses 1 to 3 bytes
       per character), 16-bit "Unicode"	(which uses  2	bytes  per  character,
       with an endianness that is dependent on the host	architecture), and bi-
       nary (which uses	a single byte per character but	 only  handles	a  re-
       stricted	 range	of  characters).  Tcl does not guarantee to always use
       the same	encoding for the same string.

       Different operating system  interfaces  or  applications	 may  generate
       strings	in  other  encodings  such as Shift-JIS.  The encoding command
       helps to	bridge the gap between Unicode and these other formats.

       Performs one of several encoding-related operations, depending on
       option.  The legal options are:

       encoding	convertfrom ?encoding? data
	      Convert data to Unicode from the specified encoding.  The
	      characters in data are treated as binary data where the lower 8
	      bits of each character are taken as a single byte.  The resulting
	      sequence of bytes is treated as a string in the specified
	      encoding.  If encoding is not specified, the current system
	      encoding is used.

       encoding	convertto ?encoding? string
	      Convert string from Unicode to the specified encoding.  The
	      result is a sequence of bytes that represents the converted
	      string.  Each byte is stored in the lower 8 bits of a Unicode
	      character (indeed, the resulting string is a binary string as
	      far as Tcl is concerned, at least initially).  If encoding is
	      not specified, the current system encoding is used.

       encoding	dirs ?directoryList?
	      Tcl  can	load encoding data files from the file system that de-
	      scribe additional	encodings for it to work  with.	 This  command
	      sets  the	 search	path for *.enc encoding	data files to the list
	      of directories directoryList. If directoryList is	 omitted  then
	      the command returns the current list of directories that make up
	      the search path. It is an error if directoryList is not a valid
	      list. During a search for an encoding data file, any element of
	      directoryList that does not refer to a readable, searchable
	      directory is ignored.

       encoding	names
	      Returns a	list containing	the names of all of the	encodings that
	      are currently available.	The encodings "utf-8" and  "iso8859-1"
	      are guaranteed to	be present in the list.

       encoding	system ?encoding?
	      Set the system encoding to encoding. If encoding is omitted then
	      the command returns the current system encoding.	The system en-
	      coding is	used whenever Tcl passes strings to system calls.
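
       The conversion and query options above can be sketched as follows.
       This is only an illustration: the sample string and the variable
       names (text, bytes, back) are assumptions chosen for the example,
       and utf-8 is used because it is one of the encodings guaranteed to
       be present.

```tcl
# Sample text (an assumption for this sketch): c, a, f, e-acute.
set text "caf\u00E9"

# convertto yields a byte string, one byte per character of the result.
set bytes [encoding convertto utf-8 $text]
puts [string length $bytes]    ;# 5, because e-acute takes two bytes in UTF-8

# convertfrom reverses the conversion.
set back [encoding convertfrom utf-8 $bytes]
puts [string equal $back $text]    ;# 1

# With no argument, "encoding system" only reports the current setting.
puts [encoding system]

# utf-8 should always appear in the list of available encodings.
puts [expr {[lsearch -exact [encoding names] utf-8] >= 0}]    ;# 1
```

       Calling encoding system and encoding dirs without an argument only
       reads the current setting, so the sketch leaves the interpreter's
       configuration unchanged.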

       It  is  common  practice	to write script	files using a text editor that
       produces	output in the euc-jp  encoding,	 which	represents  the	 ASCII
       characters as single bytes and Japanese characters as two bytes.  This
       makes it	easy to	embed literal strings  that  correspond	 to  non-ASCII
       characters  by  simply typing the strings in place in the script.  How-
       ever, because the source	command	always reads files using  the  current
       system encoding,	Tcl will only source such files	correctly when the en-
       coding used to write the	file is	the same.  This	tends not to  be  true
       in an internationalized setting.  For example, if such a file were
       sourced in North America (where ISO8859-1 is normally used), each
       byte  in	the file would be treated as a separate	character that maps to
       the 00 page in Unicode.	The resulting Tcl strings will not contain the
       expected	Japanese characters.  Instead, they will contain a sequence of
       Latin-1 characters that correspond to the bytes of the original string.
       The encoding command can	be used	to convert this	string to the expected
       Japanese	Unicode	characters.  For example,

	      set s [encoding convertfrom euc-jp "\xA4\xCF"]

       would return the Unicode string "\u306F", which is the Hiragana letter
       HA.
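
       The misreading described above, and its repair, can be reproduced
       in a short sketch.  The variable names misread and fixed are
       assumptions for illustration, and the example assumes the euc-jp
       encoding is available (see encoding names).

```tcl
# Simulate reading the two euc-jp bytes as ISO8859-1: each byte becomes
# one character in the 00 page of Unicode.
set misread [encoding convertfrom iso8859-1 "\xA4\xCF"]
puts [string length $misread]    ;# 2

# The low 8 bits of each character are still the original bytes, so
# decoding them as euc-jp recovers the intended character.
set fixed [encoding convertfrom euc-jp $misread]
puts [string length $fixed]    ;# 1
puts [string equal $fixed "\u306F"]    ;# 1
```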


       encoding, unicode

Tcl				      8.1			   encoding(n)

