charmap(5)	      Standards, Environments, and Macros	    charmap(5)

NAME
       charmap - character set description file

DESCRIPTION
       A character set description file	or charmap defines characteristics for
       a coded character set. Other information	about the coded	character  set
       may  also  be in	the file. Coded	character set character	values are de-
       fined using symbolic character names  followed  by  character  encoding
       values.

       The character set description file provides:

	 o  The	 capability to describe	character set attributes (such as col-
	    lation order or character classes) independent  of	character  set
	    encoding,  and using only the characters in	the portable character
	    set. This makes it possible	to create generic localedef(1)	source
	    files for all codesets that	share the portable character set.

	 o  Standardized  symbolic  names  for	all characters in the portable
	    character set, making it possible to refer to any  such  character
	    regardless of encoding.

   Symbolic Names
       Each  symbolic  name  is	included in the	file and is mapped to a	unique
       encoding	value (except for those	symbolic names	that  are  shown  with
       identical  glyphs).  If the control characters commonly associated with
       the symbolic names in the following table are supported by  the	imple-
       mentation,  the	symbolic names and their corresponding encoding	values
       are included in the file. Some of the  encodings	 associated  with  the
       symbolic	 names in this table may be the	same as	characters in the por-
       table character set table.

       +-----------------------------------------------------------------------+
       |  <ACK>	      <DC2>	  <ENQ>	       <FS>	   <IS4>       <SOH>   |
       |  <BEL>	      <DC3>	  <EOT>	       <GS>	   <LF>	       <STX>   |
       |  <BS>	      <DC4>	  <ESC>	       <HT>	   <NAK>       <SUB>   |
       |  <CAN>	      <DEL>	  <ETB>	      <IS1>	   <RS>	       <SYN>   |
       |  <CR>	      <DLE>	  <ETX>	      <IS2>	   <SI>	       <US>    |
       |  <DC1>	      <EM>	  <FF>	      <IS3>	   <SO>	       <VT>    |
       +-----------------------------------------------------------------------+

   Declarations
       The following declarations can precede the character definitions.  Each
       must  consist  of  the  symbol shown in the following list, starting in
       column 1, including the surrounding brackets, followed by one  or  more
       blank characters, followed by the value to be assigned to the symbol.

       <code_set_name>	       The  name  of the coded character set for which
			       the character set description file is defined.

       <mb_cur_max>	       The maximum number of  bytes  in	 a  multi-byte
			       character. This defaults	to 1.

       <mb_cur_min>	       An unsigned positive integer value that defines
			       the minimum number of bytes in a	character  for
			       the encoded character set.

       <escape_char>	       The  escape character used to indicate that the
			       characters following will be interpreted	 in  a
			       special	way, as	defined	later in this section.
			       This defaults to	backslash ('\'), which is  the
			       character  glyph	used in	all the	following text
			       and examples, unless otherwise noted.

       <comment_char>	       The character that when placed in column	1 of a
			       charmap line, is	used to	indicate that the line
			       is to be	ignored. The default character is  the
			       number sign (#).
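
       The declaration rules above can be sketched in Python as follows.
       This is a minimal illustration, not how localedef(1) is implemented.
       The function name parse_declarations and the DEFAULTS table are
       hypothetical; the text states defaults only for <mb_cur_max>,
       <escape_char>, and <comment_char>, so the other two symbols are
       assumed to start out unset.

```python
import re

# Defaults stated in the text; <code_set_name> and <mb_cur_min> have no
# default given there, so this sketch assumes they start out as None.
DEFAULTS = {
    "code_set_name": None,
    "mb_cur_max": 1,
    "mb_cur_min": None,
    "escape_char": "\\",
    "comment_char": "#",
}

# "<symbol>" starting in column 1, one or more blanks, then the value.
DECLARATION = re.compile(r"^<(\w+)>[ \t]+(\S.*?)\s*$")


def parse_declarations(lines):
    """Collect the declarations that precede the CHARMAP line."""
    settings = dict(DEFAULTS)
    for line in lines:
        if line.startswith("CHARMAP"):
            break  # the character definitions start here
        if not line.strip() or line.startswith(settings["comment_char"]):
            continue  # empty line or comment line
        match = DECLARATION.match(line)
        if match and match.group(1) in settings:
            name, value = match.groups()
            # The two byte-count symbols are integers; the rest stay text.
            settings[name] = int(value) if name.startswith("mb_cur_") else value
    return settings
```

       Because the comment character is looked up on every line, a
       <comment_char> declaration takes effect for the lines that follow
       it.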

   Format
       The character set mapping definitions will be all the lines immediately
       following an identifier line containing the string CHARMAP starting  in
       column  1,  and	preceding  a  trailer  line  containing	the string END
       CHARMAP starting in column 1. Empty lines and lines containing a
       <comment_char> in the first column will be ignored. Each non-comment
       line of the character set mapping definition (that is, between the
       CHARMAP and END CHARMAP lines of the file) must be in either of two
       forms:

       "%s %s %s\n",<symbolic-name>,<encoding>,<comments>

       or

       "%s...%s	%s %s\n",<symbolic-name>,<symbolic-name>, <encoding>,<comments>

       In  the	first format, the line in the character	set mapping definition
       defines a single symbolic name and a corresponding encoding. A
       character following an escape character is interpreted as itself; for
       example, the sequence "<\\\>>" represents the symbolic name "\>"
       enclosed between angle brackets.

       In  the second format, the line in the character	set mapping definition
       defines a range of one or more symbolic names. In this form,  the  sym-
       bolic  names must consist of zero or more non-numeric characters,  fol-
       lowed by	an integer formed by one or more decimal digits.  The  charac-
       ters preceding the integer must be identical in the two symbolic	names,
       and the integer formed by the digits in the second symbolic  name  must
       be  equal  to  or  greater than the integer formed by the digits	in the
       first name.  This is interpreted	as a series of symbolic	 names	formed
       from the	common part and	each of	the integers between the first and the
       second integer, inclusive. As an	example, <j0101>...<j0104>  is	inter-
       preted as the symbolic names <j0101>, <j0102>, <j0103>, and <j0104>, in
       that order.
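
       The range expansion described above can be sketched in Python. The
       function name expand_symbolic_range is hypothetical. The text does
       not say how leading zeros are handled, so this sketch assumes the
       digit width of the first name is kept, which matches the <j0101>
       example.

```python
import re

# Zero or more non-digit characters followed by one or more decimal digits,
# all enclosed in angle brackets.
NAME = re.compile(r"^<([^0-9]*)([0-9]+)>$")


def expand_symbolic_range(first, last):
    """Expand "<first>...<last>" into the list of symbolic names it denotes."""
    m_first, m_last = NAME.match(first), NAME.match(last)
    if not (m_first and m_last):
        raise ValueError("names must end in a decimal integer")
    prefix_first, digits_first = m_first.groups()
    prefix_last, digits_last = m_last.groups()
    if prefix_first != prefix_last:
        raise ValueError("the parts preceding the integers must be identical")
    start, end = int(digits_first), int(digits_last)
    if end < start:
        raise ValueError("second integer must not be smaller than the first")
    # Assumption: keep the zero padding of the first name.
    width = len(digits_first)
    return [f"<{prefix_first}{n:0{width}d}>" for n in range(start, end + 1)]


print(expand_symbolic_range("<j0101>", "<j0104>"))
# ['<j0101>', '<j0102>', '<j0103>', '<j0104>']
```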

       A character set mapping definition line must  exist  for	 all  symbolic
       names and must define the coded character value that corresponds	to the
       character glyph indicated in the	table, or the  coded  character	 value
       that  corresponds with the control character symbolic name. If the con-
       trol characters commonly	associated with	the symbolic names   are  sup-
       ported  by  the implementation, the symbolic name and the corresponding
       encoding	value must be included in the file. Additional unique symbolic
       names  may  be  included. A coded character value can be	represented by
       more than one symbolic name.

       The encoding part is expressed as one (for single-byte  character  val-
       ues)  or	 more  concatenated decimal, octal or hexadecimal constants in
       the following formats:

       "%cd%d",<escape_char>,<decimal byte value>

       "%cx%x",<escape_char>,<hexadecimal byte value>

       "%c%o",<escape_char>,<octal byte	value>

   Decimal, Hexadecimal, and Octal Constants
       Decimal constants must be represented by	two or three  decimal  digits,
       preceded	by the escape character	and the	lower-case letter d; for exam-
       ple, \d05, \d97,	or \d143. Hexadecimal constants	must be	represented by
       two hexadecimal digits, preceded	by the escape character	and the	lower-
       case letter x; for example, \x05, \x61, or \x8f.	Octal  constants  must
       be  represented	by  two	 or three octal	digits,	preceded by the	escape
       character; for example, \05, \141, or \217. In a	portable charmap file,
       each constant must represent an 8-bit byte. Implementations supporting
       other byte sizes may allow constants to represent values larger than
       those that can be represented in 8-bit bytes, and may allow additional
       digits in constants. When constants are concatenated for multi-byte
       character  values,  they	 must  be of the same type, and	interpreted in
       byte order from first to	last with the least significant	 byte  of  the
       multi-byte character specified by the last constant.
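
       As an illustration of the three constant formats, the following
       Python sketch converts the encoding part of a portable charmap line
       into bytes. The function name parse_encoding is hypothetical. The
       sketch assumes 8-bit bytes and rejects mixed constant types, as
       required for concatenated constants.

```python
import re


def parse_encoding(text, escape_char="\\"):
    """Convert concatenated charmap constants into a bytes object."""
    esc = re.escape(escape_char)
    token = re.compile(
        rf"{esc}(?:d(?P<dec>[0-9]{{2,3}})"      # \dNN or \dNNN
        rf"|x(?P<hex>[0-9A-Fa-f]{{2}})"         # \xNN
        rf"|(?P<oct>[0-7]{{2,3}}))"             # \NN or \NNN
    )
    bases = {"dec": 10, "hex": 16, "oct": 8}
    values, kinds, pos = [], set(), 0
    while pos < len(text):
        match = token.match(text, pos)
        if match is None:
            raise ValueError(f"bad constant at offset {pos}")
        kind = match.lastgroup  # which of the three formats matched
        value = int(match.group(kind), bases[kind])
        if value > 255:
            raise ValueError("constant does not fit in an 8-bit byte")
        values.append(value)
        kinds.add(kind)
        pos = match.end()
    if not values:
        raise ValueError("empty encoding")
    if len(kinds) > 1:
        raise ValueError("concatenated constants must be of the same type")
    # The first constant is the most significant byte, the last the least.
    return bytes(values)
```

       With this sketch, \d97, \x61, and \141 all yield the same single
       byte, and \d129\d254 yields the two bytes 129 and 254 in that
       order.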

   Ranges of Symbolic Names
       In  lines  defining  ranges of symbolic names, the encoded value	is the
       value for the first symbolic name in the	range (the symbolic name  pre-
       ceding  the  ellipsis).	Subsequent symbolic names defined by the range
       will have encoding values in increasing order. Bytes are treated as
       unsigned octets and carry is propagated between the bytes as necessary
       to represent the range. However, a range in which the carry produces a
       null byte in the second or subsequent bytes of a character should not
       be specified. For example, the line

       <j0101>...<j0104>     \d129\d254

       is interpreted as:

       <j0101>		      \d129\d254
       <j0102>		      \d129\d255
       <j0103>		      \d130\d00
       <j0104>		      \d130\d01

       The expanded declaration	of the symbol <j0103> in the above example  is
       an invalid specification, because it contains a null byte in the	second
       byte of a character.
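
       The carry behaviour in this example can be reproduced with a short
       Python sketch. The function name range_encodings is hypothetical.
       It treats the encoding as one unsigned big-endian number, which is
       one way to model carry propagating between unsigned octets, and it
       flags any result with a null byte after the first byte.

```python
def range_encodings(first_encoding, count):
    """Return (encoding, has_null) pairs for a range of `count` names."""
    size = len(first_encoding)
    # The last byte is the least significant, so read the bytes big-endian.
    start = int.from_bytes(first_encoding, "big")
    result = []
    for offset in range(count):
        encoded = (start + offset).to_bytes(size, "big")
        # A null byte in the second or a subsequent byte makes the entry invalid.
        has_null = 0 in encoded[1:]
        result.append((encoded, has_null))
    return result


for encoded, has_null in range_encodings(bytes([129, 254]), 4):
    print(list(encoded), "invalid" if has_null else "ok")
# [129, 254] ok
# [129, 255] ok
# [130, 0] invalid
# [130, 1] ok
```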

       The comment is optional.

   Width Specification
       The following declarations can follow the character set mapping defini-
       tions (after the	"END CHARMAP" statement). Each consists	of the keyword
       shown in	the following list, starting in	 column	 1,  followed  by  the
       value(s)	to be associated to the	keyword, as defined below.

       WIDTH	       A  non-negative integer value defining the column width
		       for the printable character in the coded	character  set
		       mapping definitions. Coded character set	character val-
		       ues are defined using symbolic character	names followed
		       by  column width	values.	Defining a character with more
		       than one	WIDTH  produces	 undefined  results.  The  END
		       WIDTH  keyword  is  used	to terminate the WIDTH defini-
		       tions.  Specifying the width of a non-printable charac-
		       ter in a	WIDTH declaration produces undefined results.

       WIDTH_DEFAULT   A  non-negative integer value defining the default col-
		       umn width for any printable character not listed	by one
		       of  the	WIDTH keywords.	If no WIDTH_DEFAULT keyword is
		       included	in the charmap,	the default character width is
		       1.

       Example:

       After  the  "END	 CHARMAP"  statement,  a syntax	for a width definition
       would be:

       WIDTH
       <A>	       1
       <B>	       1
       <C>...<Z>       1
       ...
       <fool>...<foon> 2
       ...
       END WIDTH

       In this example, the numerical code point values represented by the
       symbols <A> and <B> are assigned a width of 1. The code point values
       <C> to <Z> inclusive, that is, <C>, <D>, <E>, and so on, are also
       assigned a width of 1. Using <A>...<Z> would have required fewer lines,
       but the alternative was shown to demonstrate flexibility. The keyword
       WIDTH_DEFAULT could have been added as appropriate.
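
       A reader of such a width section could be sketched in Python as
       follows. The function name parse_width_section is hypothetical.
       The sketch keeps range entries such as <C>...<Z> as single
       unexpanded keys and skips any line inside the section that does
       not start with "<", such as the "..." placeholder lines in the
       example.

```python
def parse_width_section(lines, default_width=1):
    """Return (widths, default_width) from the lines after END CHARMAP."""
    widths = {}
    in_width = False
    for raw in lines:
        line = raw.rstrip()  # keywords must start in column 1
        if line == "WIDTH":
            in_width = True
        elif line == "END WIDTH":
            in_width = False
        elif line.startswith("WIDTH_DEFAULT"):
            default_width = int(line.split()[1])
        elif in_width and line.startswith("<"):
            # Symbolic name (or range) followed by its column width.
            name, value = line.rsplit(None, 1)
            widths[name] = int(value)
    return widths, default_width


example = [
    "WIDTH",
    "<A>             1",
    "<B>             1",
    "<C>...<Z>       1",
    "...",
    "END WIDTH",
]
print(parse_width_section(example))
# ({'<A>': 1, '<B>': 1, '<C>...<Z>': 1}, 1)
```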

SEE ALSO
       locale(1), localedef(1),	nl_langinfo(3C), extensions(5),	locale(5)

SunOS 5.10			  1 Dec	2003			    charmap(5)
