15.1 The Language Implementor’s View

Any programming or scripting language that has the notion of strings can support gettext. Supporting gettext means the following:

  1. You should add to the language a syntax for translatable strings. In principle, a function call of gettext would do, but a shorthand syntax helps keep internationalized programs legible. For example, in C we use the syntax _("string"), and in GNU awk we use the shorthand _"string".
  2. You should arrange that evaluation of such a translatable string at runtime calls the gettext function, or performs equivalent processing.
  3. Similarly, you should make the functions ngettext, dcgettext, and dcngettext available from within the language. These functions are used less often, but are nevertheless necessary for particular purposes: ngettext for correct plural handling, and dcgettext and dcngettext for obeying locale-related environment variables other than LC_MESSAGES, such as LC_TIME or LC_MONETARY. For these latter functions, you need to make the LC_* constants, available in the C header <locale.h>, referenceable from within the language, usually either as enumeration values or as strings.
  4. You should allow the programmer to designate a message domain, either by making the textdomain function available from within the language, or by introducing a magic variable called TEXTDOMAIN. Similarly, you should allow the programmer to designate where to search for message catalogs, by providing access to the bindtextdomain function or — on native Windows platforms — to the wbindtextdomain function.
  5. You should either perform a setlocale (LC_ALL, "") call during the startup of your language runtime, or allow the programmer to do so. Remember that gettext will act as a no-op if the LC_MESSAGES and LC_CTYPE locale categories are not both set.
  6. A programmer should have a way to extract translatable strings from a program into a PO file. The GNU xgettext program is being extended to support very different programming languages. Please contact the GNU gettext maintainers to help them do this. They will need from you a formal description of the lexical structure of source files. It should answer the following questions:

    • What does a token look like?
    • What does a string literal look like? What escape characters exist inside a string?
    • What escape characters exist outside of strings? If Unicode escapes are supported, are they applied before or after tokenization?
    • What is the syntax for function calls? How are consecutive arguments in the same function call separated?
    • What is the syntax for comments?

    Based on this description, the GNU gettext maintainers can add support to xgettext.

    If the string extractor is best integrated into your language’s parser, GNU xgettext can function as a front end to your string extractor.

  7. The language’s library should have a string formatting facility. Additionally:

    1. There must be a way, in the format string, to denote the arguments by a positional number or a name. This is needed because for some languages and some messages with more than one substitutable argument, the translation will need to output the substituted arguments in a different order. See c-format Flag.
    2. The syntax of format strings must be documented in a way that translators can understand. The GNU gettext manual will be extended to include a pointer to this documentation.

    Based on this, the GNU gettext maintainers can add a format string equivalence checker to msgfmt, so that translators get told immediately when they have made a mistake during the translation of a format string.

  8. If the language has more than one implementation, and not all of the implementations use gettext, but programs should be portable across implementations, you should provide a no-i18n emulation that makes the other implementations accept programs written for yours, without actually translating the strings.
  9. To help the programmer in the task of marking translatable strings, which is sometimes performed using the Emacs PO mode (see Marking), you are welcome to contact the GNU gettext maintainers, so they can add support for your language to po-mode.el.

On the implementation side, two approaches are possible, with different effects on portability and copyright:

  • You may link against GNU gettext functions if they are found in the C library. For example, an autoconf test for gettext() and ngettext() will detect this situation. For the moment, this test will succeed on GNU systems and on Solaris 11 platforms. No severe copyright restrictions apply, except if you want to distribute statically linked binaries.
  • You may emulate or reimplement the GNU gettext functionality. This has the advantage of full portability and no copyright restrictions, but also the drawback that you have to reimplement the GNU gettext features (such as the LANGUAGE environment variable, the locale aliases database, the automatic charset conversion, and plural handling).
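
As an illustration of what the second approach involves, the following sketch reimplements one small piece of it: deciding which languages to try, in order, from the LANGUAGE, LC_ALL, LC_MESSAGES and LANG environment variables. It is a simplified approximation under stated assumptions, not the GNU gettext algorithm: the function name candidate_languages is hypothetical, the codeset and modifier parts of a locale name are simply dropped, and the locale aliases database is not consulted.

```python
def candidate_languages(env):
    """Return language codes to try, most preferred first.

    env is a mapping of environment variable names to values,
    for example os.environ.
    """
    # The locale for messages is the first non-empty variable among
    # LC_ALL, LC_MESSAGES and LANG; default to "C" if none is set.
    locale_name = next(
        (env[var] for var in ("LC_ALL", "LC_MESSAGES", "LANG") if env.get(var)),
        "C",
    )
    # LANGUAGE is ignored when the locale is the untranslated "C" locale.
    if locale_name in ("C", "POSIX"):
        return []

    # LANGUAGE holds a colon-separated priority list; it takes
    # precedence over the locale name when present.
    names = [part for part in env.get("LANGUAGE", "").split(":") if part]
    if not names:
        names = [locale_name]

    result = []
    for name in names:
        # Simplification: strip modifier and codeset,
        # e.g. "de_AT.UTF-8@euro" becomes "de_AT".
        base = name.split("@")[0].split(".")[0]
        # Try the full "ll_CC" form first, then the bare language "ll".
        for candidate in (base, base.split("_")[0]):
            if candidate not in result:
                result.append(candidate)
    return result
```

For example, with LANG set to en_US.UTF-8 and LANGUAGE set to sv:de_DE, this sketch yields sv, de_DE, de. A complete reimplementation would additionally need the charset conversion and plural-form evaluation mentioned above.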