Gettext/Perl-Pitfalls




15.5.21.9 Bugs, Pitfalls, And Things That Do Not Work

The foregoing sections should have shown that xgettext is quite smart in extracting translatable strings from Perl sources. Yet some more or less exotic constructs that could be expected to work actually do not.

One of the more relevant limitations can be found in the implementation of variable interpolation inside quoted strings. Only simple hash lookups can be used there:

print <<EOF;
$gettext{"The dot operator"
          . " does not work"
          . " here!"}
Likewise, you cannot @{[ gettext ("interpolate function calls") ]}
inside quoted strings or quote-like expressions.
EOF

This is valid Perl code and will actually trigger invocations of the gettext function at runtime. Yet, the Perl parser in xgettext will fail to recognize the strings. A less obvious example can be found in the interpolation of regular expressions:

s/<!--START_OF_WEEK-->/gettext ("Sunday")/e;

The modifier e causes the replacement part of the substitution to be evaluated as a Perl expression. Consequently, the function gettext() is called at runtime, but again the parser fails to extract the string “Sunday”. If you really need this feature, a temporary variable is a simple workaround:

my $sunday = gettext "Sunday";
s/<!--START_OF_WEEK-->/$sunday/;
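
The same temporary-variable idea can also be applied to the here-document limitation shown earlier. What follows is only a minimal sketch: it assumes a stand-in gettext() that returns its argument unchanged, so that it runs without any localization module installed, and the variable names are purely illustrative. In real code the function would be imported from a module such as Locale::Messages.

```perl
use strict;
use warnings;

# Stand-in for illustration only: returns the msgid untranslated.
# Real code would import gettext from a module such as Locale::Messages.
sub gettext { return $_[0] }

# Plain function calls with literal string arguments, placed outside of
# any quoted string.
my $dot_note  = gettext ("The dot operator does not work here!");
my $call_note = gettext ("interpolate function calls");

# Only simple scalar variables are interpolated into the here-document.
my $text = <<EOF;
$dot_note
Likewise, you cannot $call_note
inside quoted strings or quote-like expressions.
EOF

print $text;
```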

Hash slices would also be handy but are not recognized:

my @weekdays = @gettext{'Sunday', 'Monday', 'Tuesday', 'Wednesday',
                        'Thursday', 'Friday', 'Saturday'};
# Or even:
@weekdays = @gettext{qw (Sunday Monday Tuesday Wednesday Thursday
                         Friday Saturday) };

This is perfectly valid usage of the tied hash %gettext but the strings are not recognized and therefore will not be extracted.
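
One way to make do without hash slices is to write one explicit function call per string. The following is a sketch under the same assumption as before: gettext() is a local stand-in that returns its argument, used here only so that the snippet runs on its own.

```perl
use strict;
use warnings;

# Stand-in for illustration only: returns the msgid untranslated.
# Real code would import gettext from a module such as Locale::Messages.
sub gettext { return $_[0] }

# One explicit call per literal string instead of a hash slice.
my @weekdays = (
    gettext ("Sunday"),
    gettext ("Monday"),
    gettext ("Tuesday"),
    gettext ("Wednesday"),
    gettext ("Thursday"),
    gettext ("Friday"),
    gettext ("Saturday"),
);

print join(", ", @weekdays), "\n";
```

This is more verbose than the slice, but every string appears as a literal argument of a plain function call.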

Another caveat of the current version is its rudimentary support for non-ASCII characters in identifiers. You may encounter serious problems if you use identifiers with characters outside the range of ‘A’-‘Z’, ‘a’-‘z’, ‘0’-‘9’ and the underscore ‘_’.

Maybe some of these missing features will be implemented in future versions, but since you can always make do without them at minimal effort, these todos have very low priority.

Brace format strings that already contain braces as part of the normal text are a nasty problem. A typical example is the usage string found in many programs:

die "usage: $0 {OPTIONS} FILENAME...\n";

If you want to internationalize this code with Perl brace format strings, you will run into a problem:

die __x ("usage: {program} {OPTIONS} FILENAME...\n", program => $0);

Whereas ‘{program}’ is a placeholder, ‘{OPTIONS}’ is not and should probably be translated. Yet, there is no way to teach the Perl parser in xgettext to recognize the first one, and leave the other one alone.

There are two possible workarounds for this problem. The first is to mark the string as no-perl-brace-format and use printf() or sprintf() instead. This is an option if you are sure that your program will run under Perl 5.8.0 or newer (these Perl versions handle positional parameters in printf()), or if you are sure that the translator will not have to reorder the arguments in her translation, for example because the string contains only one placeholder or because it describes a syntax, as this one does:

# xgettext: no-perl-brace-format
die sprintf (__ ("usage: %s {OPTIONS} FILENAME...\n"), $0);
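
Positional parameters let a format string consume its arguments in a different order than they are passed, which is what a translator needs when the word order changes. A small sketch of the mechanism follows. The format strings and argument values are made up for illustration, and no gettext lookup is involved.

```perl
use strict;
use warnings;

my @args = ('report.txt', 3);

# Original order: file name first, then the line number.
my $original = sprintf ('%s: error in line %d', @args);

# A hypothetical translation that reorders the arguments with explicit
# indexes: %2$d refers to the second argument, %1$s to the first.
my $reordered = sprintf ('line %2$d of %1$s is wrong', @args);

print "$original\n$reordered\n";
```

The format strings are single-quoted so that Perl does not try to interpolate `$s` or `$d` as variables.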

If you want to use the more portable Perl brace format, you will have to put placeholders in place of the literal braces:

die __x ("usage: {program} {[}OPTIONS{]} FILENAME...\n",
         program => $0, '[' => '{', ']' => '}');
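
To see why this trick yields literal braces, it helps to trace the substitution by hand. The sketch below uses a hypothetical helper, expand_braces(), that mimics brace-format expansion in a simplified way. It is not the actual implementation behind __x, and the program name is a made-up value.

```perl
use strict;
use warnings;

# Hypothetical helper that mimics brace-format expansion for
# illustration; it is not the implementation used by __x.
sub expand_braces {
    my ($format, %args) = @_;
    return $format unless %args;

    # Match only placeholders whose name was passed as an argument.
    my $names = join '|', map { quotemeta $_ } keys %args;
    $format =~ s/\{($names)\}/$args{$1}/g;
    return $format;
}

my $usage = expand_braces (
    "usage: {program} {[}OPTIONS{]} FILENAME...\n",
    program => 'frobnicate', '[' => '{', ']' => '}',
);

print $usage;    # usage: frobnicate {OPTIONS} FILENAME...
```

The placeholders ‘{[}’ and ‘{]}’ are replaced by a single opening and closing brace, so the word OPTIONS ends up enclosed in literal braces without ever being treated as a placeholder itself.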

Perl brace format strings have no escaping mechanism. No matter what such a mechanism looked like, it would either give the programmer a hard time, make translating Perl brace format strings heavy going, or incur a performance penalty at runtime, when the format directives are expanded. Most of the time you will get along happily with printf() for this special case.
