Locales (The GNU Awk User’s Guide)
6.6 Where You Are Makes a Difference
Modern systems support the notion of locales: a way to tell the system about the local character set and language. The ISO C standard defines a default "C" locale, which is an environment that is typical of what many C programmers are used to.
Once upon a time, the locale setting used to affect regexp matching, but this is no longer true (see section Regexp Ranges and Locales: A Long Sad Story).
Locales can affect record splitting. For the normal case of ‘RS = "\n"’, the locale is largely irrelevant. For other single-character record separators, setting ‘LC_ALL=C’ in the environment will give you much better performance when reading records. Otherwise, gawk has to make several function calls per input character to find the record terminator.
Locales can affect how dates and times are formatted (see section Time Functions). For example, a common way to abbreviate the date September 4, 2015, in the United States is “9/4/15.” In many countries in Europe, however, it is abbreviated “4.9.15.” Thus, the ‘%x’ specification in a "US" locale might produce ‘9/4/15’, while in a "EUROPE" locale, it might produce ‘4.9.15’.
According to POSIX, string comparison is also affected by locales (similar to regular expressions). The details are presented in String Comparison Based on Locale Collating Order.
Finally, the locale affects the value of the decimal point character used when gawk parses input data. This is discussed in detail in Conversion of Strings and Numbers.