From the translator: This is the tenth article in the Node.js series by the Mozilla Identity team, which works on the Persona project.
All articles of the cycle:
In the previous article, on localizing Node.js applications, we learned how to use the i18n-abide module in our code. As programmers, our job was essentially done once we had wrapped the strings in templates and application code in gettext() calls. But the work of localizing and translating the application is only beginning.
Tools
The Mozilla Persona team localization toolkit is compatible with the tools that are used in the rest of the Mozilla community, while retaining the friendliness and flexibility inherent to Node.
The Mozilla project is almost 15 years old, and our community of localizers and translators is one of the largest (and coolest) in the open source world. That is why we rely on long-established tools, familiar and, one might even say, old-fashioned.
Gettext
GNU Gettext is a toolkit for localizing desktop and web applications. When you write code and templates for Node, you use English phrases everywhere, but you wrap each one in a gettext() call.
gettext does two things:
- at build time, it compiles a catalog of all the strings found in the application
- at runtime, it replaces them with their localized variants.
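The runtime half of this can be sketched in a few lines. This is only an illustration of the idea, not how i18n-abide implements it: the `catalogs` object and the `makeGettext` helper are hypothetical names invented for this example, and a real application would load its catalogs from translation files instead of hard-coding them.

```javascript
// Hypothetical in-memory catalogs, keyed by locale. Each catalog maps an
// English msgid to its translation.
const catalogs = {
  es: new Map([
    ['Hello', 'Hola'],
    ['Goodbye', 'Adiós'],
  ]),
};

// Returns a gettext-style lookup function for one locale.
// Unknown locales and untranslated strings fall back to the English msgid.
function makeGettext(locale) {
  const catalog = catalogs[locale] || new Map();
  return function gettext(msgid) {
    const translated = catalog.get(msgid);
    return translated ? translated : msgid;
  };
}

const gettext = makeGettext('es');
console.log(gettext('Hello'));    // Hola
console.log(gettext('Sign out')); // Sign out (no translation, so the msgid is returned)
```

Falling back to the msgid is what makes it safe to ship a release before every string has been translated.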
All extracted strings are stored in text files with the .po extension. From here on we will call them po-files.
Po files
Po files are text files of a specific format that gettext can read, write, and merge.
Here is an example of the contents of the po-file zh_TW/LC_MESSAGES/messages.po:
We will discuss it in more detail later, but for now it is enough to understand that msgid is the English string and msgstr is its translation into Chinese. Everything that begins with a # is a comment. The comment in this example indicates where the string appears in the code.
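To make the format concrete, here is a minimal sketch that reads msgid/msgstr pairs from po text. The sample entries, file names, and the `parsePo` function are hypothetical and written only for this illustration; the parser handles single-line strings only and is no substitute for the gettext tools.

```javascript
// Hypothetical po-file content, invented for this sketch.
const samplePo = [
  '#: views/index.ejs:12',
  'msgid "Hello"',
  'msgstr "你好"',
  '',
  '#: views/index.ejs:20',
  'msgid "Sign in"',
  'msgstr ""',
].join('\n');

// Parses single-line msgid/msgstr pairs. Multi-line strings, plural forms,
// msgctxt and escape sequences are deliberately not handled.
function parsePo(text) {
  const entries = [];
  let current = null;
  for (const line of text.split('\n')) {
    let match;
    if (line.startsWith('#')) {
      continue; // comment line, e.g. a source reference
    } else if ((match = /^msgid "(.*)"$/.exec(line))) {
      current = { msgid: match[1], msgstr: '' };
      entries.push(current);
    } else if ((match = /^msgstr "(.*)"$/.exec(line)) && current) {
      current.msgstr = match[1];
    }
  }
  return entries;
}

console.log(parsePo(samplePo));
```

An empty msgstr, as in the second entry, marks a string that has not been translated yet.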
Gettext provides many other tools for working with strings and po-files. We will touch on some of them below.
Why this particular toolkit?
Before we dive deeper into the Node.js modules for working with gettext, we should ask why we chose this particular set of tools.
A year ago, I thoroughly researched the existing Node.js modules for internationalization and localization. Most of them reinvented the wheel with their own JSON-based formats for storing strings.
Mozilla, on the other hand, has long and successfully used tools such as POEdit, Verbatim, Translate Toolkit and Pootle. Instead of forcing people to retrain, we decided to build tools for them that are compatible with familiar standards and processes.
Po-files are the common format our translators use for exchange and collaboration. It is in this format that they expect to receive strings from us for translation and to hand back the finished text.
Having done a lot of PHP and Python development at Mozilla, I find Gettext very convenient. As a web application grows and accumulates more and more text, more and more subtleties come up that call for well-tested tools and the Gettext API.
Create po-files for translators
So we have marked up our code with gettext calls. What's next? This is where the person we call the "string wrangler" comes in. It can be you, a translator, or an administrator. What does the string wrangler do?
- Extracts the strings when the application is first localized.
- Finds new, changed, and deleted strings in subsequent releases.
- Prepares po-files for each translation team.
- Resolves conflicts and marks changed or deleted translations.
This may sound a bit daunting, but fortunately most of these tasks are well automated. The string wrangler only has to step in when problems arise.
msginit, xgettext, msgfmt, and the other GNU Gettext tools are a powerful set for working with string catalogs. Only the string wrangler works with these tools. Most developers can remain blissfully unaware of them.
Creating a file tree for a locale:
$ mkdir -p locale/templates/LC_MESSAGES
This directory holds the .pot files, which are the templates for po-files. Gettext will use them later on.
Extracting strings
In the last article we installed i18n-abide:
$ npm install i18n-abide
Among other command line tools, abide provides extract-pot. This command is used to extract strings to the locale directory:
$ mkdir -p locale/templates/LC_MESSAGES
$ ./node_modules/.bin/extract-pot --locale locale
The script goes through the entire source code of the application, finds the strings, and writes them to the po template file.
We could have used the traditional gettext utilities to create the pot-files, but we wrote a dedicated module, jsxgettext, which is convenient and cross-platform. extract-pot uses it under the hood.
Jsxgettext searches the code for gettext() calls and extracts their string arguments, then writes the strings out in a format that is compatible with the gettext toolkit. Here is an excerpt of such a pot file:
Later, based on this template, po-files with translation will be created. They will look like this:
To get a better feel for the topic, you can take a look at the full version of the po-file for Chinese.
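The extraction step described above can be approximated as follows. This is a rough sketch and not jsxgettext's actual implementation: a regular expression stands in for real source parsing, and `extractMsgids`, `toPot`, and the sample `app.js` source are hypothetical names made up for the example.

```javascript
// Finds gettext('...') and gettext("...") calls whose argument is a plain
// string literal. The regex is only a stand-in for real parsing: it misses
// concatenated strings and escaped quotes, and it would also match calls
// that appear inside comments.
function extractMsgids(source) {
  const pattern = /\bgettext\(\s*(['"])(.*?)\1\s*\)/g;
  const found = new Map(); // msgid -> line numbers where it occurs
  source.split('\n').forEach((line, index) => {
    for (const match of line.matchAll(pattern)) {
      const msgid = match[2];
      if (!found.has(msgid)) found.set(msgid, []);
      found.get(msgid).push(index + 1);
    }
  });
  return found;
}

// Formats the extracted strings as template entries with empty translations.
function toPot(found, filename) {
  const entries = [];
  for (const [msgid, lineNumbers] of found) {
    const refs = lineNumbers.map((n) => `#: ${filename}:${n}`).join('\n');
    entries.push(`${refs}\nmsgid "${msgid}"\nmsgstr ""`);
  }
  return entries.join('\n\n') + '\n';
}

// Hypothetical application code used as input.
const source = [
  "res.send(gettext('Hello'));",
  'var label = gettext("Sign in");',
  "log(gettext('Hello'));",
].join('\n');

console.log(toPot(extractMsgids(source), 'app.js'));
```

Each distinct string appears once in the template, with one reference comment per place it is used, so translators only translate it once.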
Creating a locale
The msginit command from the gettext suite creates a po-file for a specific locale from a template file:
$ for l in en_US de es; do
    mkdir -p locale/${l}/LC_MESSAGES/
    msginit --input=./locale/templates/LC_MESSAGES/messages.pot \
            --output-file=./locale/${l}/LC_MESSAGES/messages.po \
            -l ${l}
  done
We have just created po-files for American English, German and Spanish.
Po files
So we have extracted the strings and created the locale folders. This is what our file tree looks like:
locale/
  de/
    LC_MESSAGES/
      messages.po
  en_US/
    LC_MESSAGES/
      messages.po
  es/
    LC_MESSAGES/
      messages.po
  templates/
    LC_MESSAGES/
      messages.pot
You can give translators access to these parts of your application. For example, the Spanish team would have access to locale/es/LC_MESSAGES/messages.po. If you have a very large project, there may even be separate locales for regional variants, for example Spanish as used in Spain and in Argentina: es-ES and es-AR.
Over time, new locales may be added.
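Note that language tags such as es-AR use a hyphen, while the directory names in the tree above (en_US) use an underscore. A small sketch of that mapping is shown here; the `localeFromLanguage` and `poPath` helpers are hypothetical and written for this example only, and they assume the `<root>/<locale>/LC_MESSAGES/messages.po` layout used in this article.

```javascript
// Converts a language tag such as "es-AR" into a gettext-style locale
// name such as "es_AR": lowercase language, uppercase region.
function localeFromLanguage(tag) {
  const [language, region] = tag.split(/[-_]/);
  return region
    ? `${language.toLowerCase()}_${region.toUpperCase()}`
    : language.toLowerCase();
}

// Builds the path to the po-file for a language tag under a given root.
function poPath(root, tag) {
  return `${root}/${localeFromLanguage(tag)}/LC_MESSAGES/messages.po`;
}

console.log(poPath('locale', 'es-AR')); // locale/es_AR/LC_MESSAGES/messages.po
console.log(poPath('locale', 'de'));    // locale/de/LC_MESSAGES/messages.po
```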
Merging string changes
Release after release, you will add new strings and change or delete old ones. All po-files then need to be updated to reflect these changes. Gettext has powerful tools for this. For our own use we wrote a wrapper script, merge-po.sh, which uses the msgmerge command from the GNU Gettext package.
Add the i18n-abide tools to the system paths:
$ export PATH=$PATH:node_modules/i18n-abide/bin
and run the process of merging lines:
$ ./node_modules/.bin/extract-pot --locale locale .
$ merge-po.sh ./locale
As before, extract-pot collects all the strings and creates a template. Then merge-po.sh updates all the po-files so that they match the current version of the application. After that, the translation teams can get back to work.
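Conceptually, the merge keeps existing translations for strings that are still in the template, adds new strings with an empty translation, and sets aside translations whose strings have disappeared. The sketch below shows only that core idea under simplifying assumptions: `mergeCatalog` is a hypothetical function, matching is exact, and msgmerge's fuzzy matching of changed strings is left out.

```javascript
// Merges a fresh template (array of msgids) with an existing catalog
// (Map of msgid -> msgstr). Exact matches only, no fuzzy matching.
function mergeCatalog(templateMsgids, existing) {
  const merged = new Map();
  const obsolete = new Map();
  for (const msgid of templateMsgids) {
    // Keep the old translation if there is one, otherwise start empty.
    merged.set(msgid, existing.has(msgid) ? existing.get(msgid) : '');
  }
  const current = new Set(templateMsgids);
  for (const [msgid, msgstr] of existing) {
    // Strings no longer used by the application are set aside, not lost.
    if (!current.has(msgid)) obsolete.set(msgid, msgstr);
  }
  return { merged, obsolete };
}

// Hypothetical Spanish catalog from the previous release.
const existing = new Map([
  ['Hello', 'Hola'],
  ['Goodbye', 'Adiós'],
]);

// The new release still uses "Hello", adds "Sign in", and drops "Goodbye".
const result = mergeCatalog(['Hello', 'Sign in'], existing);
console.log([...result.merged]);   // "Hello" keeps "Hola"; "Sign in" is empty
console.log([...result.obsolete]); // "Goodbye" is set aside as obsolete
```

Doing this by hand for every locale on every release is exactly the chore the wrapper script removes.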
Gettext against "not invented here" syndrome
There is nothing hard about reinventing the wheel with a JSON-based format instead of gettext, and most authors of Node modules went that way. But as the application grows and more languages are added, minor annoyances snowball. For example, without merge-po.sh you will sooner or later have to write and debug your own merge tools. Manually updating 30 files for 30 locales without losing or mixing anything up is quite a hassle.
Gettext already has everything you need, which saves us a lot of time and nerves.
Conclusion
Now that we have figured out how to create and update po-files, the translators can get on with their work. In general, it is best to talk to them in advance and agree on when translation can start, how much text to expect, and when it should be finished. It is also worth studying the gettext documentation.
So, the strings are translated, and in the next article we will learn how localization works at runtime.