NLS(7)		     BSD Miscellaneous Information Manual		NLS(7)

NAME
     NLS -- Native Language Support Overview

DESCRIPTION
     Native Language Support (NLS) provides commands for a single worldwide
     operating system base.  An	internationalized system has no	built-in as-
     sumptions or dependencies on language-specific or cultural-specific con-
     ventions such as:

           o   Character classifications
           o   Character comparison rules
           o   Character collation order
           o   Numeric and monetary formatting
           o   Date and time formatting
           o   Message-text language
           o   Character sets

     All information pertaining to cultural conventions and language is
     obtained at program run time.

     "Internationalization" (often abbreviated "i18n") refers to the process
     by which system software is developed to support multiple cultural-
     specific and language-specific conventions.  This is a generalization
     process, by which the system is freed from relying only on English
     strings or other English-specific conventions.  "Localization" (often
     abbreviated "l10n") refers to the operations by which the user
     environment is customized to handle its input and output appropriately
     for specific language and cultural conventions.  This is a
     specialization process, by which generic methods already implemented in
     an internationalized system are used in specific ways.  The formal
     description of the cultural conventions for a country, together with all
     associated translations targeted to the native language, is called the
     "locale".

     NetBSD provides extensive support to programmers and system developers to
     enable internationalized software to be developed.	 NetBSD	also supplies
     a large variety of	locales	for system localization.

   Localization	of Information
     All locale	information is accessible to programs at run time so that data
     is	processed and displayed	correctly for specific cultural	conventions
     and language.

     A locale is divided into categories.  A category is a group of language-
     specific and culture-specific conventions as outlined in the list above.
     ISO C, extended by POSIX with LC_MESSAGES, specifies the following six
     standard categories supported by NetBSD:

     LC_COLLATE	    string-collation order information
     LC_CTYPE	    character classification, case conversion, and other char-
		    acter attributes
     LC_MESSAGES    the	format for affirmative and negative responses
     LC_MONETARY    rules and symbols for formatting monetary numeric informa-
		    tion
     LC_NUMERIC	    rules and symbols for formatting nonmonetary numeric in-
		    formation
     LC_TIME	    rules and symbols for formatting time and date information
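
     As a minimal sketch of how a program can inspect these categories, the
     following C fragment queries the locale currently in effect for each one
     by passing a NULL locale argument to setlocale(3).  The helper name
     print_categories() and the table layout are illustrative assumptions,
     not part of any system interface.

```c
#include <locale.h>
#include <stdio.h>

/* Pairs each category constant with its name for display. */
struct category {
    int         id;
    const char *name;
};

static const struct category categories[] = {
    { LC_COLLATE,  "LC_COLLATE"  },
    { LC_CTYPE,    "LC_CTYPE"    },
    { LC_MESSAGES, "LC_MESSAGES" },
    { LC_MONETARY, "LC_MONETARY" },
    { LC_NUMERIC,  "LC_NUMERIC"  },
    { LC_TIME,     "LC_TIME"     },
};

/*
 * Hypothetical helper: print the locale in effect for each category.
 * Returns the number of categories that could be queried.
 */
int print_categories(FILE *out)
{
    int n = 0;

    for (size_t i = 0; i < sizeof categories / sizeof categories[0]; i++) {
        /* A NULL locale argument queries the setting without changing it. */
        const char *cur = setlocale(categories[i].id, NULL);

        if (cur != NULL) {
            fprintf(out, "%-12s %s\n", categories[i].name, cur);
            n++;
        }
    }
    return n;
}
```

     A program would normally call setlocale(LC_ALL, "") once at startup
     before calling such a helper; until then every category reports the C
     locale.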

     Localization of the system	is achieved by setting appropriate values in
     environment variables to identify which locale should be used.  The envi-
     ronment variables have the	same names as their respective locale cate-
     gories.  Additionally, the	LANG, LC_ALL, and NLSPATH environment vari-
     ables are used.  The NLSPATH environment variable specifies a colon-sepa-
     rated list	of directory names where the message catalog files of the NLS
     database are located.  The	LC_ALL and LANG	environment variables also de-
     termine the current locale.

     The values of these environment variables are strings of the form:

	     language[_territory][.codeset][@modifier]

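     Valid values for the language field come from the ISO 639 standard, which
     defines two-letter codes for many languages.  The codes are written in
     lower case in locale names (for example da), although the table below
     lists them in upper case.  Some common language codes are: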

     Language Name	Code	   Language Family
     ABKHAZIAN		AB	   IBERO-CAUCASIAN
     AFAN (OROMO)	OM	   HAMITIC
     AFAR		AA	   HAMITIC
     AFRIKAANS		AF	   GERMANIC
     ALBANIAN		SQ	   INDO-EUROPEAN (OTHER)
     AMHARIC		AM	   SEMITIC
     ARABIC		AR	   SEMITIC
     ARMENIAN		HY	   INDO-EUROPEAN (OTHER)
     ASSAMESE		AS	   INDIAN
     AYMARA		AY	   AMERINDIAN
     AZERBAIJANI	AZ	   TURKIC/ALTAIC
     BASHKIR		BA	   TURKIC/ALTAIC
     BASQUE		EU	   BASQUE
     BENGALI		BN	   INDIAN
     BHUTANI		DZ	   ASIAN
     BIHARI		BH	   INDIAN
     BISLAMA		BI
     BRETON		BR	   CELTIC
     BULGARIAN		BG	   SLAVIC
     BURMESE		MY	   ASIAN
     BYELORUSSIAN	BE	   SLAVIC
     CAMBODIAN		KM	   ASIAN
     CATALAN		CA	   ROMANCE
     CHINESE		ZH	   ASIAN
     CORSICAN		CO	   ROMANCE
     CROATIAN		HR	   SLAVIC
     CZECH		CS	   SLAVIC
     DANISH		DA	   GERMANIC
     DUTCH		NL	   GERMANIC
     ENGLISH		EN	   GERMANIC
     ESPERANTO		EO	   INTERNATIONAL AUX.
     ESTONIAN		ET	   FINNO-UGRIC
     FAROESE		FO	   GERMANIC
     FIJI		FJ	   OCEANIC/INDONESIAN
     FINNISH		FI	   FINNO-UGRIC
     FRENCH		FR	   ROMANCE
     FRISIAN		FY	   GERMANIC
     GALICIAN		GL	   ROMANCE
     GEORGIAN		KA	   IBERO-CAUCASIAN
     GERMAN		DE	   GERMANIC
     GREEK		EL	   LATIN/GREEK
     GREENLANDIC	KL	   ESKIMO
     GUARANI		GN	   AMERINDIAN
     GUJARATI		GU	   INDIAN
     HAUSA		HA	   NEGRO-AFRICAN
     HEBREW		HE	   SEMITIC
     HINDI		HI	   INDIAN
     HUNGARIAN		HU	   FINNO-UGRIC
     ICELANDIC		IS	   GERMANIC
     INDONESIAN		ID	   OCEANIC/INDONESIAN
     INTERLINGUA	IA	   INTERNATIONAL AUX.
     INTERLINGUE	IE	   INTERNATIONAL AUX.
     INUKTITUT		IU
     INUPIAK		IK	   ESKIMO
     IRISH		GA	   CELTIC
     ITALIAN		IT	   ROMANCE
     JAPANESE		JA	   ASIAN
     JAVANESE		JV	   OCEANIC/INDONESIAN
     KANNADA		KN	   DRAVIDIAN
     KASHMIRI		KS	   INDIAN
     KAZAKH		KK	   TURKIC/ALTAIC
     KINYARWANDA	RW	   NEGRO-AFRICAN
     KIRGHIZ		KY	   TURKIC/ALTAIC
     KIRUNDI		RN	   NEGRO-AFRICAN
     KOREAN		KO	   ASIAN
     KURDISH		KU	   IRANIAN
     LAOTHIAN		LO	   ASIAN
     LATIN		LA	   LATIN/GREEK
     LATVIAN		LV	   BALTIC
     LINGALA		LN	   NEGRO-AFRICAN
     LITHUANIAN		LT	   BALTIC
     MACEDONIAN		MK	   SLAVIC
     MALAGASY		MG	   OCEANIC/INDONESIAN
     MALAY		MS	   OCEANIC/INDONESIAN
     MALAYALAM		ML	   DRAVIDIAN
     MALTESE		MT	   SEMITIC
     MAORI		MI	   OCEANIC/INDONESIAN
     MARATHI		MR	   INDIAN
     MOLDAVIAN		MO	   ROMANCE
     MONGOLIAN		MN
     NAURU		NA
     NEPALI		NE	   INDIAN
     NORWEGIAN		NO	   GERMANIC
     OCCITAN		OC	   ROMANCE
     ORIYA		OR	   INDIAN
     PASHTO		PS	   IRANIAN
     PERSIAN (farsi)	FA	   IRANIAN
     POLISH		PL	   SLAVIC
     PORTUGUESE		PT	   ROMANCE
     PUNJABI		PA	   INDIAN
     QUECHUA		QU	   AMERINDIAN
     RHAETO-ROMANCE	RM	   ROMANCE
     ROMANIAN		RO	   ROMANCE
     RUSSIAN		RU	   SLAVIC
     SAMOAN		SM	   OCEANIC/INDONESIAN
     SANGHO		SG	   NEGRO-AFRICAN
     SANSKRIT		SA	   INDIAN
     SCOTS GAELIC	GD	   CELTIC
     SERBIAN		SR	   SLAVIC
     SERBO-CROATIAN	SH	   SLAVIC
     SESOTHO		ST	   NEGRO-AFRICAN
     SETSWANA		TN	   NEGRO-AFRICAN
     SHONA		SN	   NEGRO-AFRICAN
     SINDHI		SD	   INDIAN
     SINGHALESE		SI	   INDIAN
     SISWATI		SS	   NEGRO-AFRICAN
     SLOVAK		SK	   SLAVIC
     SLOVENIAN		SL	   SLAVIC
     SOMALI		SO	   HAMITIC
     SPANISH		ES	   ROMANCE
     SUNDANESE		SU	   OCEANIC/INDONESIAN
     SWAHILI		SW	   NEGRO-AFRICAN
     SWEDISH		SV	   GERMANIC
     TAGALOG		TL	   OCEANIC/INDONESIAN
     TAJIK		TG	   IRANIAN
     TAMIL		TA	   DRAVIDIAN
     TATAR		TT	   TURKIC/ALTAIC
     TELUGU		TE	   DRAVIDIAN
     THAI		TH	   ASIAN
     TIBETAN		BO	   ASIAN
     TIGRINYA		TI	   SEMITIC
     TONGA		TO	   OCEANIC/INDONESIAN
     TSONGA		TS	   NEGRO-AFRICAN
     TURKISH		TR	   TURKIC/ALTAIC
     TURKMEN		TK	   TURKIC/ALTAIC
     TWI		TW	   NEGRO-AFRICAN
     UIGUR		UG
     UKRAINIAN		UK	   SLAVIC
     URDU		UR	   INDIAN
     UZBEK		UZ	   TURKIC/ALTAIC
     VIETNAMESE		VI	   ASIAN
     VOLAPUK		VO	   INTERNATIONAL AUX.
     WELSH		CY	   CELTIC
     WOLOF		WO	   NEGRO-AFRICAN
     XHOSA		XH	   NEGRO-AFRICAN
     YIDDISH		YI	   GERMANIC
     YORUBA		YO	   NEGRO-AFRICAN
     ZHUANG		ZA
     ZULU		ZU	   NEGRO-AFRICAN

     For example, the locale for the Danish language spoken in Denmark using
     the ISO 8859-1 character set is da_DK.ISO8859-1.  The da stands for the
     Danish language and the DK	stands for Denmark.  The short form of da_DK
     is	sufficient to indicate this locale.
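
     The string format above can be taken apart with a few calls to
     strcspn(3).  The following is only a sketch: the structure locale_name,
     the helper parse_locale_name(), and the field sizes are assumptions
     chosen for illustration, and no check is made that the parts name a
     locale that actually exists on the system.

```c
#include <string.h>

/* Illustrative container for the four parts of a locale name. */
struct locale_name {
    char language[16];
    char territory[16];
    char codeset[32];
    char modifier[32];
};

/* Copy at most dstsize-1 bytes from src into dst and terminate it. */
static void copy_part(char *dst, size_t dstsize, const char *src, size_t len)
{
    if (len >= dstsize)
        len = dstsize - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/*
 * Hypothetical helper: split "language[_territory][.codeset][@modifier]".
 * Missing optional parts are left as empty strings.
 * Returns 0 on success, -1 if the language part is empty.
 */
int parse_locale_name(const char *name, struct locale_name *out)
{
    const char *p = name;
    size_t n;

    memset(out, 0, sizeof *out);

    n = strcspn(p, "_.@");              /* language ends at first separator */
    if (n == 0)
        return -1;
    copy_part(out->language, sizeof out->language, p, n);
    p += n;

    if (*p == '_') {                    /* optional territory */
        p++;
        n = strcspn(p, ".@");
        copy_part(out->territory, sizeof out->territory, p, n);
        p += n;
    }
    if (*p == '.') {                    /* optional codeset */
        p++;
        n = strcspn(p, "@");
        copy_part(out->codeset, sizeof out->codeset, p, n);
        p += n;
    }
    if (*p == '@') {                    /* optional modifier */
        p++;
        copy_part(out->modifier, sizeof out->modifier, p, strlen(p));
    }
    return 0;
}
```

     Given da_DK.ISO8859-1, the sketch yields the language da, the territory
     DK, the codeset ISO8859-1, and an empty modifier.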

     The environment variable settings are consulted in the following order
     of priority:

     o   If the LC_ALL environment variable is set, all six categories use the
         locale it specifies.

     o   If the LC_ALL environment variable is not set, each individual
         category uses the locale specified by its corresponding environment
         variable.

     o   If the LC_ALL environment variable is not set, and a value for a
         particular LC_* environment variable is not set, the value of the
         LANG environment variable specifies the default locale for all
         categories.  Only the LANG environment variable should be set in
         /etc/profile, since this makes it easiest for the user to override
         the system default using the individual LC_* variables.

     o   If the LC_ALL environment variable is not set, a value for a
         particular LC_* environment variable is not set, and the value of the
         LANG environment variable is not set, the locale for that specific
         category defaults to the C locale.  The C or POSIX locale assumes the
         ASCII character set and defines information for the six categories.
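
     The priority rules above can be sketched in C as a simple chain of
     getenv(3) lookups.  This is an illustration of the lookup order only;
     setlocale(3) performs the real resolution when given an empty locale
     string.  The helper name effective_locale() is hypothetical, and
     treating an empty value as unset is an assumption taken from POSIX
     rather than from the text above.

```c
#include <stdlib.h>

/* Return the value of an environment variable, or NULL if it is unset.
 * Assumption: an empty value counts as unset, as POSIX specifies. */
static const char *env_value(const char *name)
{
    const char *v = getenv(name);

    return (v != NULL && v[0] != '\0') ? v : NULL;
}

/*
 * Hypothetical helper: name of the locale that would apply to the category
 * whose environment variable is category_var (for example "LC_TIME").
 */
const char *effective_locale(const char *category_var)
{
    const char *v;

    if ((v = env_value("LC_ALL")) != NULL)      /* overrides everything */
        return v;
    if ((v = env_value(category_var)) != NULL)  /* per-category setting */
        return v;
    if ((v = env_value("LANG")) != NULL)        /* default for all categories */
        return v;
    return "C";                                 /* nothing set at all */
}
```

     For instance, with LANG set to da_DK.ISO8859-1 and LC_TIME set to
     another locale, effective_locale("LC_TIME") returns the LC_TIME value
     while every other category falls back to the LANG value.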

   Character Sets
     A character is any symbol used for the organization, control, or
     representation of data.  A group of such symbols used to describe a
     particular language makes up a character set.  The encoding values in a
     character set provide the interface between the system and its input and
     output devices.

     The following character sets are supported	in NetBSD:

     ASCII            The American Standard Code for Information Interchange
                      (ASCII) standard specifies 128 Roman characters and
                      control codes, encoded in a 7-bit character encoding
                      scheme.

     ISO 8859 family  Industry-standard	character sets specified by the
		      ISO/IEC 8859 standard.  The standard is divided into 15
		      numbered parts, with each	part specifying	broad script
		      similarities.  Examples include Western European,	Cen-
		      tral European, Arabic, Cyrillic, Hebrew, Greek, and
		      Turkish.	The character sets use an 8-bit	character en-
		      coding scheme which is compatible	with the ASCII charac-
		      ter set.

     Unicode          The Unicode character set is the full set of known
                      abstract characters of all real-world scripts.  It can
                      be used in environments where multiple scripts must be
                      processed simultaneously.  Unicode is compatible with
                      ISO 8859-1 (Western European) and ASCII.  Many character
                      encoding schemes are available for Unicode, including
                      UTF-8, UTF-16 and UTF-32.  These encoding schemes are
                      multi-byte encodings.  The UTF-8 encoding scheme uses
                      8-bit units in a variable-width encoding that is
                      compatible with ASCII.  The UTF-16 encoding scheme uses
                      16-bit units in a variable-width encoding.  The UTF-32
                      encoding scheme uses a 32-bit, fixed-width encoding.
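
     To make the variable-width property of UTF-8 concrete, the following C
     sketch encodes a single Unicode code point into one to four bytes.  The
     function name utf8_encode() is hypothetical; a real program would
     normally rely on the locale-aware conversion functions instead of
     encoding by hand.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: encode code point cp as UTF-8 into out, which must
 * have room for 4 bytes.  Returns the number of bytes written, or 0 if cp
 * is a surrogate or lies beyond U+10FFFF.
 */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                    /* 1 byte: identical to ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                 /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                   /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {               /* 4 bytes: 11110xxx then 3 trailers */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

     Code points below U+0080 come out as a single byte equal to their ASCII
     value, which is why UTF-8 text containing only ASCII characters is also
     valid ASCII.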

   Font	Sets
     A font set	contains the glyphs to be displayed on the screen for a	corre-
     sponding character	in a character set.  A display must support a suitable
     font to display a character set.  If suitable fonts are available to the
     X server, then X clients can include support for different	character
     sets.  xterm(1) includes support for Unicode with UTF-8 encoding.	xfd(1)
     is	useful for displaying all the characters in an X font.

     The NetBSD	wscons(4) console provides support for loading fonts using the
     wsfontload(8) utility.  Currently,	only fonts for the ISO8859-1 family of
     character sets are	supported.

   Internationalization	for Programmers
     To	facilitate translations	of messages into various languages and to make
     the translated messages available to the program based on a user's	lo-
     cale, it is necessary to keep messages separate from the programs and
     provide them in the form of message catalogs that a program can access at
     run time.

     Access to locale information is provided through the setlocale(3) and
     nl_langinfo(3) interfaces.	 See their respective man pages	for further
     information.
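
     A typical startup sequence might look like the following sketch, which
     adopts the locale named by the environment and then asks nl_langinfo(3)
     for the name of the character encoding in use.  The helper name
     init_locale_and_codeset() and the fallback to the C locale are
     illustrative choices, not requirements of either interface.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/*
 * Hypothetical helper: select the locale from the environment and return
 * the name of its codeset.  If the environment names a locale that is not
 * available, fall back to the C locale.
 */
const char *init_locale_and_codeset(void)
{
    /* An empty string tells setlocale() to consult LC_ALL, LC_*, and LANG. */
    if (setlocale(LC_ALL, "") == NULL) {
        fprintf(stderr, "warning: requested locale unavailable, using C\n");
        setlocale(LC_ALL, "C");
    }
    /* The returned string may be overwritten by later calls to setlocale()
     * or nl_langinfo(), so copy it if it must be kept. */
    return nl_langinfo(CODESET);
}
```

     Other nl_langinfo(3) items, such as D_FMT for the date format, can be
     queried in the same way once the locale has been selected.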

     Message source files containing application messages are created by the
     programmer	and converted to message catalogs.  These catalogs are used by
     the application to	retrieve and display messages, as needed.

     NetBSD supports two message catalog interfaces: the X/Open catgets(3)
     interface and the Uniforum gettext(3) interface.  The catgets(3)
     interface has the advantage that it belongs to a well-supported
     standard.  Unfortunately, the interface is complicated to use and
     maintenance of the catalogs is difficult.  The implementation also does
     not support different character sets.  The gettext(3) interface has not
     been standardized yet; however, it is supported by an increasing number
     of systems.  It also provides many additional tools which make
     programming and catalog maintenance much easier.
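
     As a rough illustration of the catgets(3) interface, the sketch below
     opens a catalog, retrieves one message, and falls back to a built-in
     string when the catalog or the message is missing.  The helper name
     print_message() and its parameters are hypothetical; the set and
     message numbers must match those used when the catalog was built with
     gencat(1).

```c
#include <nl_types.h>
#include <stdio.h>

/*
 * Hypothetical helper: print message msg_id from set set_id of the named
 * catalog, or the fallback text if it cannot be found.
 * Returns 1 if the catalog could be opened, 0 otherwise.
 */
int print_message(FILE *out, const char *catalog, int set_id, int msg_id,
                  const char *fallback)
{
    /* NL_CAT_LOCALE selects the catalog according to LC_MESSAGES. */
    nl_catd catd = catopen(catalog, NL_CAT_LOCALE);
    int opened = (catd != (nl_catd)-1);

    /* catgets() returns the fallback pointer when the lookup fails. */
    const char *msg = opened ? catgets(catd, set_id, msg_id, fallback)
                             : fallback;

    /* Use the message before closing: it may point into the catalog. */
    fprintf(out, "%s\n", msg);

    if (opened)
        catclose(catd);
    return opened;
}
```

     A long-running program would more likely open the catalog once at
     startup and keep the descriptor, rather than reopening it for every
     message as this sketch does.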

   Support for Multi-byte Encodings
     Some character sets with multi-byte encodings may be difficult to decode,
     or	may contain state (i.e., adjacent characters are dependent).  ISO C
     specifies a set of	functions using	'wide characters' which	can handle
     multi-byte	encodings properly.  The behaviour of these functions is af-
     fected by the LC_CTYPE category of	the current locale.

     A wide character is specified in ISO C as being a fixed number of bits
     wide and is stateless.  There are two types for wide characters: wchar_t
     and wint_t.  wchar_t is a type which can contain one wide character and
     behaves as the 'char' type does for one character.  wint_t can contain
     one wide character or WEOF (wide EOF).

     There are functions that operate on wchar_t, and substitute for functions
     operating on 'char'.  See wmemchr(3) and towlower(3) for details.	There
     are some additional functions that	operate	on wchar_t.  See wctype(3) and
     wctrans(3)	for details.
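
     The following sketch shows the wide-character counterparts in use: it
     lowercases a wide string in place with towlower(3), passing each
     character through wint_t as the interface expects.  The helper name
     wcs_lower() is hypothetical.

```c
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper: convert a wide string to lower case in place,
 * according to the LC_CTYPE category of the current locale.
 * Returns the number of characters that were changed.
 */
size_t wcs_lower(wchar_t *s)
{
    size_t changed = 0;

    for (; *s != L'\0'; s++) {
        /* towlower() takes and returns wint_t, like tolower() with int. */
        wint_t lower = towlower((wint_t)*s);

        if (lower != (wint_t)*s) {
            *s = (wchar_t)lower;
            changed++;
        }
    }
    return changed;
}
```

     The resulting string can then be searched with wmemchr(3) in the same
     way that memchr(3) is used on an array of 'char'.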

     Wide characters should be used for all I/O processing which may rely on
     locale-specific strings.  There are two primary ways of using wide
     characters for this purpose:

           o   All I/O is performed using multibyte characters.  Input data is
               converted into wide characters immediately after reading, and
               data for output is converted from wide characters to multibyte
               encoding immediately before writing.  Conversion is performed
               by mbstowcs(3), mbsrtowcs(3), wcstombs(3), wcsrtombs(3),
               mblen(3), mbrlen(3), and mbsinit(3).

           o   Wide characters are used directly for I/O, using getwchar(3),
               fgetwc(3), getwc(3), ungetwc(3), fgetws(3), putwchar(3),
               fputwc(3), putwc(3), and fputws(3).  They are also used with
               the formatted I/O functions for wide characters, such as
               fwscanf(3), wscanf(3), swscanf(3), fwprintf(3), wprintf(3),
               swprintf(3), vfwprintf(3), vwprintf(3), and vswprintf(3), and
               with the wide-character conversion specifiers %lc, %C, %ls,
               and %S of the conventional formatted I/O functions.
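
     The first approach can be sketched as follows: a multibyte string is
     converted to wide characters with mbstowcs(3), processed, and converted
     back with wcstombs(3).  The helper name mb_upper() is hypothetical.
     Calling the conversion functions with a NULL destination to obtain the
     required length is assumed to be supported; POSIX systems provide this
     behaviour, but it is not guaranteed by ISO C alone.

```c
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper: convert the multibyte string in to upper case and
 * store the result in out.  Returns 0 on success, or -1 if the input is
 * not valid in the current locale, memory runs out, or out is too small.
 */
int mb_upper(const char *in, char *out, size_t outsize)
{
    /* Assumption: a NULL destination only counts the wide characters. */
    size_t n = mbstowcs(NULL, in, 0);
    if (n == (size_t)-1)
        return -1;

    wchar_t *w = malloc((n + 1) * sizeof *w);
    if (w == NULL)
        return -1;
    mbstowcs(w, in, n + 1);             /* convert right after "reading" */

    for (size_t i = 0; i < n; i++)      /* process as wide characters */
        w[i] = (wchar_t)towupper((wint_t)w[i]);

    int rc = -1;
    size_t need = wcstombs(NULL, w, 0); /* bytes needed, without terminator */
    if (need != (size_t)-1 && need < outsize) {
        wcstombs(out, w, outsize);      /* convert right before "writing" */
        rc = 0;
    }
    free(w);
    return rc;
}
```

     The restartable variants mbsrtowcs(3) and wcsrtombs(3) are preferable
     when the input arrives in pieces or the encoding carries shift state,
     since they keep the conversion state in a caller-supplied object.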

SEE ALSO
     gencat(1),	xfd(1),	xterm(1), catgets(3), gettext(3), nl_langinfo(3),
     setlocale(3), wsfontload(8)

BUGS
     This man page is incomplete.

BSD			       February	21, 2007			   BSD
