Locale influences conversions (The GNU Awk User’s Guide)
6.1.4.2 Locales Can Influence Conversion
Where you are can matter when it comes to converting between numbers and strings. The local character set and language (the locale) can affect numeric formats. In particular, for awk programs, it affects the decimal point character and the thousands-separator character. The "C" locale, and most English-language locales, use the period character (‘.’) as the decimal point and don’t have a thousands separator. However, many (if not most) European and non-English locales use the comma (‘,’) as the decimal point character. European locales often use either a space or a period as the thousands separator, if they have one.
The POSIX standard says that awk always uses the period as the decimal point when reading the awk program source code, and for command-line variable assignments (see section Other Command-Line Arguments). However, when interpreting input data, for print and printf output, and for number-to-string conversion, the local decimal point character is used. (d.c.) In all cases, numbers in source code and in input data cannot have a thousands separator. Here are some examples indicating the difference in behavior, on a GNU/Linux system:
    $ export POSIXLY_CORRECT=1                        Force POSIX behavior
    $ gawk 'BEGIN { printf "%g\n", 3.1415927 }'
    -| 3.14159
    $ LC_ALL=en_DK.utf-8 gawk 'BEGIN { printf "%g\n", 3.1415927 }'
    -| 3,14159
    $ echo 4,321 | gawk '{ print $1 + 1 }'
    -| 5
    $ echo 4,321 | LC_ALL=en_DK.utf-8 gawk '{ print $1 + 1 }'
    -| 5,321
The en_DK.utf-8 locale is for English in Denmark, where the comma acts as the decimal point separator. In the normal "C" locale, gawk treats ‘4,321’ as 4, while in the Danish locale, it’s treated as the full number including the fractional part, 4.321.
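The POSIX rule that source code and command-line variable assignments are always read with a period can be sketched as follows. This is only an illustration: the helper name add_one is hypothetical, and the last call assumes that the en_DK.utf-8 locale is installed (where it is not, gawk typically behaves as in the "C" locale).

```shell
#!/bin/sh
# add_one is a hypothetical helper: it assigns its second argument to x
# with -v and prints x + 1 under the locale named by its first argument.
# Any remaining arguments (such as --posix) are passed through to gawk.
add_one() {
    loc=$1; val=$2; shift 2
    LC_ALL=$loc gawk "$@" -v x="$val" 'BEGIN { print x + 1 }'
}

# In the "C" locale, a period in the assignment is the decimal point.
add_one C 1.5

# A comma is not: the numeric value of "1,5" stops at the comma.
add_one C 1,5

# Assumption: en_DK.utf-8 is installed. In POSIX mode the assignment is
# still read with a period; only the printed result follows the locale.
add_one en_DK.utf-8 1.5 --posix
```

Taking the locale as an argument, rather than prefixing the function call with LC_ALL=..., avoids relying on how a particular shell exports assignments placed before a function call.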
Some earlier versions of gawk fully complied with this aspect of the standard. However, many users in non-English locales complained about this behavior, because their data used a period as the decimal point, so the default behavior was restored to use a period as the decimal point character. You can use the --use-lc-numeric option (see section Command-Line Options) to force gawk to use the locale’s decimal point character. (gawk also uses the locale’s decimal point character when in POSIX mode, either via --posix or the POSIXLY_CORRECT environment variable, as shown previously.)
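As a minimal sketch of the option just described, the following script formats the same value in gawk's default mode and with --use-lc-numeric. The helper name show_pi is hypothetical, and the script assumes that the en_DK.utf-8 locale is installed; where it is not, both calls would most likely print a period.

```shell
#!/bin/sh
# show_pi is a hypothetical helper: it formats one value with %g under
# the locale named by its first argument, passing any further arguments
# (such as --use-lc-numeric) through to gawk.
show_pi() {
    loc=$1; shift
    LC_ALL=$loc gawk "$@" 'BEGIN { printf "%g\n", 3.1415927 }'
}

# Default mode: the period is used regardless of the locale.
show_pi en_DK.utf-8

# With --use-lc-numeric: the locale's decimal point character is used.
# Assumption: en_DK.utf-8 is installed (check with: locale -a).
show_pi en_DK.utf-8 --use-lc-numeric
```

Unlike setting POSIXLY_CORRECT, this changes only the numeric conversions and leaves the other gawk extensions available.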
Table 6.1 describes the cases in which the locale’s decimal point character is used and when a period is used. Some of these features have not been described yet.
Feature | Default | --posix or --use-lc-numeric |
---|---|---|
%'g | Use locale | Use locale |
%g | Use period | Use locale |
Input | Use period | Use locale |
strtonum() | Use period | Use locale |
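One way to check the rows of the table on a given system is a small script that exercises each feature, first in default mode and then with --use-lc-numeric. This is a sketch under stated assumptions: the helper name table_demo and the sample values are hypothetical, and the en_DK.utf-8 locale is assumed to be installed. Because strtonum() is a gawk extension, the sketch is meant for --use-lc-numeric rather than --posix.

```shell
#!/bin/sh
# table_demo is a hypothetical helper: it exercises each row of the
# table under the locale named by its first argument. Any remaining
# arguments (for example --use-lc-numeric) are passed through to gawk.
table_demo() {
    loc=$1; shift
    # The %'g format is supplied with -v so that its single quote does
    # not have to be escaped inside the single-quoted awk program.
    echo '1,5' | LC_ALL=$loc gawk "$@" -v q="%'g" '{
        printf "quote-flag: " q "\n", 1234.5
        printf "plain: %g\n", 1234.5
        print "input: " ($1 + 1)
        print "strtonum: " strtonum("1,5")
    }'
}

echo "default mode:"
table_demo en_DK.utf-8

echo "with --use-lc-numeric:"
table_demo en_DK.utf-8 --use-lc-numeric
```

Comparing the two runs line by line shows which features follow the locale in each mode on the system at hand.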
Finally, modern-day formal standards and the IEEE standard floating-point representation can have an unusual but important effect on the way gawk converts some special string values to numbers. The details are presented in Standards Versus Existing Practice.