Explaining gettext (The GNU Awk User’s Guide)
13.2 GNU gettext
gawk uses GNU gettext to provide its internationalization features. The facilities in GNU gettext focus on messages: strings printed by a program, either directly or via formatting with printf or sprintf().(89)
When using GNU gettext, each application has its own text domain. This is a unique name, such as ‘kpilot’ or ‘gawk’, that identifies the application. A complete application may have multiple components: programs written in C or C++, as well as scripts written in awk. All of the components use the same text domain.
To make the discussion concrete, assume we’re writing an application named guide. Internationalization consists of the following steps, in this order:
- The programmer reviews the source for all of guide’s components and marks each string that is a candidate for translation. For example, "`-F': option required" is a good candidate for translation. A table with strings of option names is not (e.g., gawk’s --profile option should remain the same, no matter what the local language).
- The programmer indicates the application’s text domain ("guide") to the gettext library, by calling the textdomain() function.
- Messages from the application are extracted from the source code and collected into a portable object template file (guide.pot), which lists the strings and their translations. The translations are initially empty. The original (usually English) messages serve as the key for lookup of the translations.
- For each language with a translator, guide.pot is copied to a portable object file (.po) and translations are created and shipped with the application. For example, there might be a fr.po for a French translation.
- Each language’s .po file is converted into a binary message object (.gmo) file. A message object file contains the original messages and their translations in a binary format that allows fast lookup of translations at runtime.
- When guide is built and installed, the binary translation files are installed in a standard place.
- For testing and development, it is possible to tell gettext to use .gmo files in a different directory than the standard one by using the bindtextdomain() function.
- At runtime, guide looks up each string via a call to gettext(). The returned string is the translated string if available, or the original string if not.
- If necessary, it is possible to access messages from a different text domain than the one belonging to the application, without having to switch the application’s default text domain back and forth.
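The runtime side of these steps (setting the text domain, optionally redirecting the catalog directory, looking up strings, and reaching into another domain) can be sketched in C roughly as follows. This is a minimal sketch, not guide's actual code: the function names, the directory argument, and the second domain name are hypothetical, and only the "guide" domain comes from the discussion above. On systems where gettext is not part of the C library, linking with -lintl may be needed.

```c
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: set up translations for the "guide" text domain.
   localedir may be NULL to keep the standard catalog directory. */
static void guide_init_i18n(const char *localedir)
{
    /* Adopt the user's locale from the environment (LC_ALL, LANG, ...). */
    setlocale(LC_ALL, "");

    /* Testing/development: look for .gmo files somewhere else. */
    if (localedir != NULL)
        bindtextdomain("guide", localedir);

    /* Tell the gettext library which text domain this program uses. */
    textdomain("guide");
}

/* Runtime lookup in the default domain: returns the translation if one
   is available, or the original string if not. */
static const char *guide_message(void)
{
    return gettext("Don't Panic!\n");
}

/* Lookup in a different text domain without switching the default one.
   The domain name is supplied by the caller (an assumption of this sketch). */
static const char *other_domain_message(const char *domain)
{
    return dgettext(domain, "Don't Panic!\n");
}
```

A program would typically call guide_init_i18n() once at startup, passing NULL in production and a build-tree directory while testing, and then use guide_message()-style lookups wherever a string is printed.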
In C (or C++), the string marking and dynamic translation lookup are accomplished by wrapping each string in a call to gettext():
printf("%s", gettext("Don't Panic!\n"));
The tools that extract messages from source code pull out all strings enclosed in calls to gettext().
The GNU gettext developers, recognizing that typing ‘gettext(…)’ over and over again is both painful and ugly to look at, use the macro ‘_’ (an underscore) to make things easier:
/* In the standard header file: */
#define _(str) gettext(str)

/* In the program text: */
printf("%s", _("Don't Panic!\n"));
This reduces the typing overhead to just three extra characters per string and is considerably easier to read as well.
There are locale categories for different types of locale-related information. The defined locale categories that gettext knows about are:
LC_MESSAGES
Text messages. This is the default category for gettext operations, but it is possible to supply a different one explicitly, if necessary. (It is almost never necessary to supply a different category.)
LC_COLLATE
Text-collation information (i.e., how different characters and/or groups of characters sort in a given language).
LC_CTYPE
Character-type information (alphabetic, digit, upper- or lowercase, and so on) as well as character encoding. This information is accessed via the POSIX character classes in regular expressions, such as /[[:alnum:]]/ (see section Using Bracket Expressions).
LC_MONETARY
Monetary information, such as the currency symbol, and whether the symbol goes before or after a number.
LC_NUMERIC
Numeric information, such as which characters to use for the decimal point and the thousands separator.(90)
LC_TIME
Time- and date-related information, such as 12- or 24-hour clock, month printed before or after the day in a date, local month abbreviations, and so on.
LC_ALL
All of the above. (Not too useful in the context of gettext.)
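To illustrate how the categories are independent of one another, the following C sketch sets only the numeric category and reads back its decimal point, and separately performs a message lookup with the category given explicitly. The function names are hypothetical, and "guide" is the example text domain used earlier; this is an illustration under those assumptions rather than anything gawk itself does.

```c
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: switch only the numeric category to the named
   locale and return its decimal-point string, or NULL if that locale
   is not available on this system. */
static const char *decimal_point_for(const char *locale_name)
{
    if (setlocale(LC_NUMERIC, locale_name) == NULL)
        return NULL;
    return localeconv()->decimal_point;
}

/* Hypothetical helper: look up a message while naming the category
   explicitly. LC_MESSAGES is what gettext() uses by default, so this
   behaves like dgettext("guide", msgid). */
static const char *lookup_in_messages_category(const char *msgid)
{
    return dcgettext("guide", msgid, LC_MESSAGES);
}
```

Passing a category other than LC_MESSAGES to dcgettext() is possible but, as noted above, almost never necessary.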
NOTE: As described in Where You Are Makes a Difference, environment variables with the same name as the locale categories (LC_ALL, etc.) influence gawk’s behavior (and that of other utilities).
Normally, these variables also affect how the gettext library finds translations. However, the LANGUAGE environment variable overrides the LC_xxx variables. Many GNU/Linux systems may define this variable without your knowledge, causing gawk to not find the correct translations. If this happens to you, look to see if LANGUAGE is defined, and if so, use the shell’s unset command to remove it.
For testing translations of
gawk itself, you can set the
GAWK_LOCALE_DIR environment variable. See the documentation for the C
bindtextdomain() function and also see Other Environment Variables.
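When debugging a missing translation from inside a C program, the check for LANGUAGE described in the note can be sketched as below. The helper names are hypothetical, and treating an empty value the same as an unset one is an assumption of this sketch.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: does this environment value count as "set"?
   Assumption: an empty string is treated like an unset variable. */
static int value_is_set(const char *value)
{
    return value != NULL && value[0] != '\0';
}

/* Report whether LANGUAGE is defined in the current environment, since
   it overrides the LC_xxx variables when translations are looked up. */
static int report_language(FILE *out)
{
    const char *language = getenv("LANGUAGE");

    if (value_is_set(language)) {
        fprintf(out, "LANGUAGE is set to '%s'; consider unsetting it\n",
                language);
        return 1;
    }
    fprintf(out, "LANGUAGE is not set\n");
    return 0;
}
```

Calling report_language(stderr) early in a test run gives a quick hint when translations come out in an unexpected language.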
(89) For some operating systems, the gawk port doesn’t support GNU gettext. Therefore, these features are not available if you are using one of those operating systems. Sorry.
(90) Americans use a comma every three decimal places and a period for the decimal point, while many Europeans do exactly the opposite: 1,234.56 versus 1.234,56.