Gettext/Locale-Names
2.3.1 Locale Names
A locale name usually has the form ‘ll_CC’. Here ‘ll’ is an ISO 639
two-letter language code, and ‘CC’ is an ISO 3166 two-letter country code.
For example, for German in Germany, ll is de, and CC is DE.
You can find a list of the language codes in appendix Language Codes and
a list of the country codes in appendix Country Codes.
You might think that the country code specification is redundant. But in
fact, some languages have dialects in different countries. For example,
‘de_AT’ is used for Austria, and ‘pt_BR’ for Brazil. The country code
serves to distinguish the dialects.
Many locale names have an extended syntax ‘ll_CC.encoding’ that also
specifies the character encoding. These are in use because, between 2000 and
2005, most users switched to locales in UTF-8 encoding. For example, the
German locale on glibc systems is nowadays ‘de_DE.UTF-8’. The older name
‘de_DE’ still refers to the German locale as of 2000, which stores characters
in ISO-8859-1 encoding, a text encoding that cannot even accommodate the Euro
currency sign.
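The difference between the two encodings can be checked directly. The
following is a minimal Python sketch, not part of gettext; the helper name
can_encode is made up for this illustration. It tests whether the Euro sign
(U+20AC) can be represented in a given encoding.

```python
# Illustrative sketch: can_encode is a hypothetical helper, not a gettext API.
EURO = "\u20ac"  # the Euro currency sign


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


print(can_encode(EURO, "iso-8859-1"))  # False: ISO-8859-1 has no Euro sign
print(can_encode(EURO, "utf-8"))       # True
print(EURO.encode("utf-8"))            # b'\xe2\x82\xac'
```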
Some locale names use ‘ll_CC@variant’ instead of ‘ll_CC’. The ‘@variant’ can
denote any kind of characteristic that is not already implied by the
language ll and the country CC. It can denote a particular monetary unit.
For example, on glibc systems, ‘de_DE@euro’ denotes the locale that uses the
Euro currency, in contrast to the older locale ‘de_DE’, which implies the use
of the currency used before 2002. It can also denote a dialect of the
language, or the script used to write text (for example, ‘sr_RS@latin’ uses
the Latin script, whereas ‘sr_RS’ uses the Cyrillic script to write Serbian),
or the orthography rules, or similar.
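The naming scheme described so far can be summarized as a small parser. This
is only a sketch in Python under stated assumptions: the names LocaleName and
parse_locale_name are invented for this illustration, only two-letter
language and country codes are accepted, and the combined form
‘ll_CC.encoding@variant’ (encoding before variant) is assumed to follow the
glibc convention. Names outside this scheme, such as ‘C’, are rejected.

```python
import re
from typing import NamedTuple, Optional


class LocaleName(NamedTuple):
    """Parts of a locale name (hypothetical type for this illustration)."""
    language: str
    country: Optional[str]
    encoding: Optional[str]
    variant: Optional[str]


# ll, optionally followed by _CC, .encoding and @variant, in that order.
_LOCALE_RE = re.compile(
    r"(?P<language>[a-z]{2})"
    r"(?:_(?P<country>[A-Z]{2}))?"
    r"(?:\.(?P<encoding>[^@]+))?"
    r"(?:@(?P<variant>.+))?"
)


def parse_locale_name(name: str) -> LocaleName:
    """Split a locale name into its parts; raise ValueError if it does not fit."""
    match = _LOCALE_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not of the form ll_CC[.encoding][@variant]: {name!r}")
    return LocaleName(
        match.group("language"),
        match.group("country"),
        match.group("encoding"),
        match.group("variant"),
    )


print(parse_locale_name("de_DE.UTF-8"))
print(parse_locale_name("sr_RS@latin"))
```

A real application would normally pass the locale name to the system
unchanged; splitting it as above is mainly useful for inspection or logging.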
On other systems, some variations of this scheme are used, such as ‘ll’.
You can get the list of locales supported by your system for your language
by running the command ‘locale -a | grep '^ll'’.
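The same query can be made from a program. The sketch below, in Python,
mirrors the shell pipeline above; the function names system_locales and
filter_by_language are invented for this illustration, and it assumes that a
‘locale’ command is available on the search path.

```python
import subprocess
from typing import Iterable, List


def system_locales() -> List[str]:
    """Return the names printed by 'locale -a', one per line."""
    result = subprocess.run(
        ["locale", "-a"],
        capture_output=True,
        text=True,
        errors="replace",  # tolerate names that are not valid in the current encoding
        check=True,
    )
    return result.stdout.splitlines()


def filter_by_language(names: Iterable[str], language: str) -> List[str]:
    """Keep the names that start with the language code, like grep '^ll'."""
    return [name for name in names if name.startswith(language)]


# A made-up sample list; the real output differs from system to system.
sample = ["C", "POSIX", "de_AT.utf8", "de_DE", "de_DE.utf8", "de_DE@euro", "pt_BR.utf8"]
print(filter_by_language(sample, "de"))
```

On a system that provides the ‘locale’ command, calling
filter_by_language(system_locales(), "de") would list the installed German
locales.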
There is also a special locale, called ‘C’. When it is used, it disables all
localization: in this locale, all programs standardized by POSIX use English
messages and an unspecified character encoding (often US-ASCII, but sometimes
also ISO-8859-1 or UTF-8, depending on the operating system).