GNU gettext utilities: Preparing ITS Rules
16.1.6 Preparing Rules for XML Internationalization
Marking translatable strings in an XML file is done through a separate "rule" file, making use of the Internationalization Tag Set standard (ITS, https://www.w3.org/TR/its20/). The currently supported ITS data categories are: ‘Translate’, ‘Localization Note’, ‘Elements Within Text’, and ‘Preserve Space’. In addition to them, xgettext also recognizes the following extended data categories:
- ‘Context’
  This data category associates msgctxt to the extracted text. In the global rule, the contextRule element contains the following:
  - A required selector attribute. It contains an absolute selector that selects the nodes to which this rule applies.
  - A required contextPointer attribute that contains a relative selector pointing to a node that holds the msgctxt value.
  - An optional textPointer attribute that contains a relative selector pointing to a node that holds the msgid value.
- ‘Escape Special Characters’
  This data category indicates whether the special XML characters (<, >, &, ") are escaped with entity references. In the global rule, the escapeRule element contains the following:
  - A required selector attribute. It contains an absolute selector that selects the nodes to which this rule applies.
  - A required escape attribute with the value yes or no.
- ‘Extended Preserve Space’
  This data category extends the standard ‘Preserve Space’ data category with the additional values ‘trim’ and ‘paragraph’. ‘trim’ means to remove the leading and trailing whitespace of the content, but not to normalize whitespace in the middle. ‘paragraph’ means to normalize the content but keep the paragraph boundaries. In the global rule, the preserveSpaceRule element contains the following:
  - A required selector attribute. It contains an absolute selector that selects the nodes to which this rule applies.
  - A required space attribute with the value default, preserve, trim, or paragraph.
All of these extended data categories can only be expressed with global rules, and the rule elements must be in the https://www.gnu.org/s/gettext/ns/its/extensions/1.0 namespace.
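As an illustration, the three extension rules might be combined in one ITS rule file as sketched below. This is only a sketch under assumptions: the gt prefix is an arbitrary choice (any prefix bound to the extension namespace should do), and the message and p elements and the context attribute are hypothetical names for an imagined input document. The version attribute mirrors the ITS example used in this section.

```xml
<?xml version="1.0"?>
<!-- Sketch only: element and attribute names of the input document
     (message, p, context) are hypothetical. -->
<its:rules xmlns:its="http://www.w3.org/2005/11/its"
           xmlns:gt="https://www.gnu.org/s/gettext/ns/its/extensions/1.0"
           version="1.0">
  <its:translateRule selector="//message/p" translate="yes"/>

  <!-- Context: take msgctxt from the 'context' attribute of the selected node. -->
  <gt:contextRule selector="//message/p[@context]" contextPointer="@context"/>

  <!-- Escape Special Characters: extract the text without entity references. -->
  <gt:escapeRule selector="//message/p" escape="no"/>

  <!-- Extended Preserve Space: strip leading and trailing whitespace only. -->
  <gt:preserveSpaceRule selector="//message/p" space="trim"/>
</its:rules>
```

Here the contextPointer value is a selector relative to each node matched by selector, so "@context" refers to an attribute of the matched p element itself.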
Given the following XML document in a file messages.xml:

    <?xml version="1.0"?>
    <messages>
      <message>
        <p>A translatable string</p>
      </message>
      <message>
        <p translatable="no">A non-translatable string</p>
      </message>
    </messages>

To extract the first text content ("A translatable string"), but not the second ("A non-translatable string"), the following ITS rules can be used:

    <?xml version="1.0"?>
    <its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
      <its:translateRule selector="/messages" translate="no"/>
      <its:translateRule selector="//message/p" translate="yes"/>

      <!-- If 'p' has an attribute 'translatable' with the value 'no', then
           the content is not translatable.  -->
      <its:translateRule selector="//message/p[@translatable = 'no']"
                         translate="no"/>
    </its:rules>

‘xgettext’ needs another file, called a "locating rule" file, to associate an ITS rule file with an XML file. If the above ITS file is saved as messages.its, the locating rule would look like:

    <?xml version="1.0"?>
    <locatingRules>
      <locatingRule name="Messages" pattern="*.xml">
        <documentRule localName="messages" target="messages.its"/>
      </locatingRule>
      <locatingRule name="Messages" pattern="*.msg" target="messages.its"/>
    </locatingRules>

The locatingRule element must have a pattern attribute, which denotes either a literal file name or a wildcard pattern of the XML file (7). The locatingRule element can have a child documentRule element, which adds checks on the content of the XML file.
The first rule matches any file with the .xml file extension, but it only applies to XML files whose root element is ‘<messages>’.
The second rule indicates that the same ITS rule file is also applicable to any file with the .msg file extension. The optional name attribute of locatingRule allows choosing rules by name, typically with xgettext’s -L option.
The associated ITS rule file is indicated by the target attribute of locatingRule or documentRule. If it is specified in a documentRule element, the parent locatingRule shouldn’t have the target attribute.
Locating rule files must have the .loc file extension. Both ITS rule files and locating rule files must be installed in the $prefix/share/gettext/its directory. Once those files are properly installed, xgettext can extract translatable strings from the matching XML files.
16.1.6.1 Two Use-cases of Translated Strings in XML
For XML, there are two use-cases of translated strings. One is the case where the translated strings are directly consumed by programs, and the other is the case where the translated strings are merged back to the original XML document. In the former case, special characters in the extracted strings shouldn’t be escaped, while they should in the latter case. To control whether to escape special characters, the ‘Escape Special Characters’ data category can be used.
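The choice between the two use-cases could be expressed roughly as follows. This is a sketch, not a rule file shipped with gettext: the gt prefix and the //message/p selector are assumptions carried over from the messages.xml example, and only one of the two escapeRule elements would normally be kept for a given set of nodes.

```xml
<?xml version="1.0"?>
<!-- Sketch only: keep the one escapeRule that matches the use-case. -->
<its:rules xmlns:its="http://www.w3.org/2005/11/its"
           xmlns:gt="https://www.gnu.org/s/gettext/ns/its/extensions/1.0"
           version="1.0">
  <its:translateRule selector="//message/p" translate="yes"/>

  <!-- Use-case 1: translations are consumed directly by programs,
       so the extracted strings are not escaped. -->
  <gt:escapeRule selector="//message/p" escape="no"/>

  <!-- Use-case 2: translations are merged back into the XML document,
       so the extracted strings are escaped. Shown commented out here.
  <gt:escapeRule selector="//message/p" escape="yes"/>
  -->
</its:rules>
```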
To merge the translations, the ‘msgfmt’ program can be used with the option --xml. See msgfmt Invocation, for more details about how one calls the ‘msgfmt’ program. The --xml option of ‘msgfmt’ doesn’t perform character escaping, so translated strings can have arbitrary XML constructs, such as elements for markup.
Footnotes
(7)
Note that the file name matching is done after removing any .in suffix from the input file name. Thus the pattern attribute must not include a pattern matching .in. For example, if the input file name is foo.msg.in, the pattern should be either *.msg or just *, rather than *.in.
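For instance, a locating rule covering the foo.msg.in case above might be written as in this sketch, which reuses the Messages name and the messages.its target from the earlier example:

```xml
<?xml version="1.0"?>
<locatingRules>
  <!-- The .in suffix is removed before matching, so this pattern
       covers both foo.msg and foo.msg.in. -->
  <locatingRule name="Messages" pattern="*.msg" target="messages.its"/>
</locatingRules>
```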