Despite the fact that world culture, as represented by Wikipedia and Paul McCartney, assures us that Mary had a little lamb, across one-eighth of the world's land (that is, in Russia) people still believe that in fact “Mary had a lamb”. Who was really with Mary, and how do you say it in the different languages of the world? Let's try to figure it out (and also find out what the Japanese think about it) with the help of our favorite Python and the gettext multilingual translation module built into it.
Let's get started
To begin with, recall that the gettext library is used to translate programs written not only in Python but in many other languages as well. It lets a program use message templates that are translated through separate, independent translation files. In the program itself we output text to the screen, to disk, to logs, or anywhere else just as before, merely marking the translatable strings in a special way. The gettext library then takes these marked strings and the set of translation files and, if a suitable translation file exists for the current language, substitutes the translated string.
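To make the idea concrete, here is a toy Python 3 sketch of the lookup just described. It is not how the gettext module is implemented. Marked strings pass through _(), which substitutes a translation if the current language has one and otherwise returns the original. The in-memory CATALOGS dictionary and the translate/render helpers are hypothetical. The Japanese strings are the ones from the Bonus section of this article.

```python
# Toy model of gettext-style lookup (hypothetical helpers, not the real module).
# Real gettext loads catalogs from compiled .mo files instead of a dict.
CATALOGS = {
    "ja": {
        "Mary": "メアリー",
        "lamb": "子羊",
        "%s had a little %s": "%sの%sいた",
    },
}


def translate(message, lang):
    """Return the translation of message for lang, or message itself if none exists."""
    return CATALOGS.get(lang, {}).get(message, message)


def render(lang):
    """Build the sentence the way mary.py does, using a language-bound _()."""
    def _(message):
        return translate(message, lang)

    return _("%s had a little %s") % (_("Mary"), _("lamb"))


print(render("ja"))  # メアリーの子羊いた
print(render("fr"))  # no French catalog: Mary had a little lamb
```

The fallback to the original string is what keeps an untranslated program usable. A missing catalog degrades to the source language instead of failing.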
In Python, the gettext machinery is accessed through the gettext module that ships with Python. So let's not confuse the GNU gettext system as such with Python's built-in gettext module. GNU gettext is external to Python and is not needed at all for the module to work, although the GNU gettext package does include convenient utilities for working with translation files.
First, we write a basic program (let's call it mary.py), which we will try to translate into various languages:
#!/usr/bin/python
name = _("Mary")
animal = _("lamb")
print _("%s had a little %s") % (name, animal)
When using the gettext module, it is customary to mark translatable strings by wrapping them in a call to the _() function. This function is not defined yet, so the program will not even start. Nothing prevents us from temporarily defining a stub such as _ = lambda x: x, but we do not need to run the program yet.
You are probably thinking that we will now create a new text file of string mappings, in which we must remember to list every translatable string in the program. In our case there are only three such strings, but a serious program may have many more...
Translation template: .pot
... and you almost guessed it. We will create such a file. But along the way we will use a convenient feature of the gettext system: scanning source files for translatable strings. We wisely marked them with calls to _() before gettext was actually put to use, so a parser can now quickly collect them.
The gettext system is designed for use with any programming language, so it includes the xgettext program. It can generate a translation template from source code in a fairly large number of languages: C, C++, Objective-C, C#, Java, Perl, Python, PHP, Lisp... That is, if you are willing to install the gettext software package itself (“aptitude install gettext”, or whatever the equivalent is in your distribution). But we are writing in Python, which is self-sufficient when it comes to translating programs. So we will use the pygettext.py script (or simply pygettext on Unix) that comes with Python.
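The idea behind such extraction can be sketched in a few lines of Python 3 with the standard ast module. The sketch parses the source, finds every call to _ whose single argument is a string literal, and emits .pot-style entries with file:line references. This is only an illustration under simplifying assumptions (no plural forms, minimal escaping), not pygettext's implementation. extract_messages and to_pot are hypothetical names, and the sample source is a Python 3 rendering of mary.py.

```python
import ast

# Python 3 source of mary.py (print() as a function); it is only parsed, never run.
SOURCE = '''name = _("Mary")
animal = _("lamb")
print(_("%s had a little %s") % (name, animal))
'''


def extract_messages(source, filename="mary.py"):
    """Collect (msgid, "file:line") for every _("literal") call, in source order."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "_"
            and len(node.args) == 1
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            found.append((node.lineno, node.args[0].value))
    found.sort()  # ast.walk is breadth-first, so restore line order
    return [(msgid, f"{filename}:{line}") for line, msgid in found]


def to_pot(entries):
    """Render entries in the .pot layout: reference comment, msgid, empty msgstr."""
    lines = []
    for msgid, ref in entries:
        escaped = msgid.replace("\\", "\\\\").replace('"', '\\"')
        lines += [f"#: {ref}", f'msgid "{escaped}"', 'msgstr ""', ""]
    return "\n".join(lines)


print(to_pot(extract_messages(SOURCE)))
```

Because only literal arguments are collected, a call like _(some_variable) is skipped. Real extractors likewise cannot know the value of a variable, which is one more reason to pass string literals to _().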
Run pygettext:
pygettext mary.py
A file called messages.pot appears in the same directory as our program, containing the following:
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR ORGANIZATION
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: 2009-10-28 01:12+MSK\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: ENCODING\n"
"Generated-By: pygettext.py 1.5\n"
#: mary.py:6
msgid "Mary"
msgstr ""
#: mary.py:7
msgid "lamb"
msgstr ""
#: mary.py:10
msgid "%s had a little %s"
msgstr ""
What is it? This is a translation template for our entire program. If we have a large translation team, we can hand this template to the translator for each target language, and each translator returns the template filled in for their language. Templates usually have the .pot extension, and filled-in files have the .po extension.
The file syntax is fairly transparent: comments, translation copyright notes, and pairs of original strings and translations. Let's remove everything superfluous from the file except the “Content-Type:” line and the entries needed to translate our strings. Then we set the encoding to UTF-8, write the translations, and save the result as mary.po:
Translation file: .po
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"
msgid "Mary"
msgstr "Мэри"
msgid "lamb"
msgstr "барашек"
msgid "%s had a little %s"
msgstr "У %s был %s"
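To show how transparent the syntax is, here is a minimal Python 3 parser sketch for .po files like this one. It makes simplifying assumptions:

- It handles only msgid/msgstr pairs and continuation lines, and ignores comments.
- It uses ast.literal_eval to unescape strings, which follows Python escape rules rather than strictly the PO format.

The sample uses the Japanese strings from the Bonus section of this article, and parse_po is a hypothetical helper.

```python
import ast

# Sample .po text (Japanese strings from the Bonus section of this article).
PO_TEXT = r'''msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

msgid "Mary"
msgstr "メアリー"

msgid "lamb"
msgstr "子羊"

msgid "%s had a little %s"
msgstr "%sの%sいた"
'''


def parse_po(text):
    """Parse simple msgid/msgstr pairs into a dict; plurals and contexts are ignored."""
    catalog = {}
    msgid = msgstr = None
    target = None  # which field continuation lines ("...") extend
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no translation data
        if line.startswith("msgid "):
            if msgid is not None:
                catalog[msgid] = msgstr  # store the previous entry
            msgid, msgstr, target = ast.literal_eval(line[6:]), "", "msgid"
        elif line.startswith("msgstr "):
            msgstr, target = ast.literal_eval(line[7:]), "msgstr"
        elif line.startswith('"'):
            piece = ast.literal_eval(line)  # assumes Python-compatible escapes
            if target == "msgid":
                msgid += piece
            elif target == "msgstr":
                msgstr += piece
    if msgid is not None:
        catalog[msgid] = msgstr
    return catalog


catalog = parse_po(PO_TEXT)
print(catalog["lamb"])  # 子羊
```

Note that the header is itself an ordinary entry whose msgid is the empty string. This is why the “Content-Type:” line must survive the cleanup.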
In our case the file is small and simple. If it were more complex, it would be more convenient to use a specialized .po editor such as Poedit, or that “specialized editor for everything”, Emacs.
Compiled translation file: .mo
So, we have translated the strings in our program. In vain, by the way. gettext is meant for translating complete, finished sentences, and using it to translate individual words and sentence templates is risky. For example, gettext has no notion of grammatical case or gender at all, and supports, after a fashion, only the distinction between singular and plural. If Mary were replaced with a declinable name such as “Tanya” or “Sveta”, the translation would have to handle the name's case at every place where it is used. Well, okay: in our case it does not matter. Now we have another task: preparing the translation file for use.
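The singular/plural distinction that gettext does support is exposed in Python through ngettext(). Here is a Python 3 sketch. The “Mary had %d lamb(s)” messages are made up for illustration, and NullTranslations stands in for a real catalog. With a real catalog, the Plural-Forms header of the .po file decides which form a given language uses for a given number.

```python
import gettext

# NullTranslations has no catalog, so ngettext() falls back to the English rule:
# the first form when n == 1, the second form otherwise.
translations = gettext.NullTranslations()


def describe(count):
    """Choose the singular or plural message template for count, then fill it in."""
    template = translations.ngettext("Mary had %d lamb", "Mary had %d lambs", count)
    return template % count


for n in (1, 3):
    print(describe(n))
```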
Using the source text file directly would be inefficient for programs with a lot of translatable text, so the gettext system works with files compiled into a special format. To compile, we can use either the msgfmt tool from the gettext package or the msgfmt.py script from the Python distribution (in Debian-like distributions it is part of the python2.5-examples package). We will use the latter:
msgfmt.py mary.po
There it is: the file mary.mo. Unlike mary.po, it is clearly not meant to be edited by hand.
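For the curious, here is a simplified Python 3 sketch of what a compiler such as msgfmt.py produces. The GNU .mo layout consists of a magic number, a header, two tables of (length, offset) pairs, and the NUL-terminated strings themselves. The sketch writes little-endian, omits the optional hash table, and does not handle plural forms or contexts. build_mo is a hypothetical name, and the catalog reuses the Japanese strings from the Bonus section. Loading the result back with gettext.GNUTranslations confirms that the standard module accepts it.

```python
import gettext
import io
import struct

MAGIC = 0x950412DE  # GNU .mo magic number


def build_mo(catalog):
    """Serialize a {msgid: msgstr} dict of str into .mo bytes (little-endian, no hash table)."""
    keys = sorted(catalog)  # originals are stored sorted; "" (the header) comes first
    ids = b""
    strs = b""
    entries = []
    for key in keys:
        k = key.encode("utf-8")
        v = catalog[key].encode("utf-8")
        entries.append((len(k), len(ids), len(v), len(strs)))
        ids += k + b"\0"  # every string is NUL-terminated
        strs += v + b"\0"
    n = len(keys)
    header_size = 7 * 4  # magic, revision, count, 2 table offsets, hash size/offset
    key_table = header_size
    value_table = key_table + n * 8
    key_start = value_table + n * 8
    value_start = key_start + len(ids)
    header = struct.pack("<7I", MAGIC, 0, n, key_table, value_table, 0, 0)
    key_index = b"".join(
        struct.pack("<2I", klen, key_start + koff) for klen, koff, vlen, voff in entries
    )
    value_index = b"".join(
        struct.pack("<2I", vlen, value_start + voff) for klen, koff, vlen, voff in entries
    )
    return header + key_index + value_index + ids + strs


catalog = {
    "": "Content-Type: text/plain; charset=UTF-8\n",
    "Mary": "メアリー",
    "lamb": "子羊",
    "%s had a little %s": "%sの%sいた",
}
translations = gettext.GNUTranslations(io.BytesIO(build_mo(catalog)))
t = translations.gettext
print(t("%s had a little %s") % (t("Mary"), t("lamb")))  # メアリーの子羊いた
```

The header entry carries the charset, which the reader uses to decode all the other strings. This is why the .po file must keep its “Content-Type:” line.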
Directory structure and program launch
If we were preparing the program for installation into system directories, we would create a hierarchy like this (in the case of Debian Linux):

- The system directory /usr/share/locale contains subdirectories for the different languages: ru, en, and so on.
- Each of them contains an LC_MESSAGES directory.
- That is where a file like mary.mo goes. Give it as unique a name as possible, so it does not clash with other programs.

In our example, though, we simply create a locale subdirectory in our program's directory, create ru/LC_MESSAGES inside it, and put mary.mo in the latter.
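The lookup over this layout can be checked with gettext.find(), which returns the path of the first matching .mo file, or None. Here is a Python 3 sketch. It builds the locale/ru/LC_MESSAGES hierarchy in a temporary directory, with an empty placeholder mary.mo, since find() only checks that the file exists. make_layout is a hypothetical helper. Note how a full locale name such as ru_RU.UTF-8 falls back to the plain ru directory.

```python
import gettext
import os
import tempfile


def make_layout(base, lang, domain):
    """Create base/locale/<lang>/LC_MESSAGES/<domain>.mo (hypothetical helper)."""
    msgdir = os.path.join(base, "locale", lang, "LC_MESSAGES")
    os.makedirs(msgdir, exist_ok=True)
    # Empty placeholder: find() only checks existence; a real file comes from msgfmt.
    open(os.path.join(msgdir, domain + ".mo"), "wb").close()
    return os.path.join(base, "locale")


with tempfile.TemporaryDirectory() as base:
    localedir = make_layout(base, "ru", "mary")
    # A full locale name falls back to the bare language directory "ru".
    found_ru = gettext.find("mary", localedir, languages=["ru_RU.UTF-8"])
    # No "ja" directory exists, so nothing is found.
    found_ja = gettext.find("mary", localedir, languages=["ja"])
    relative_ru = os.path.relpath(found_ru, localedir)

print(relative_ru)  # ru/LC_MESSAGES/mary.mo on POSIX systems
print(found_ja)  # None
```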
Now, finally, let's add gettext support to our program:
#!/usr/bin/python
import gettext
gettext.install('mary', './locale', unicode=True)
name = _("Mary")
animal = _("lamb")
print _("%s had a little %s") % (name, animal)
What changed? We imported the gettext module (that much is obvious). We also installed the _() function into the program's global namespace. To translate strings, it looks in the ./locale subdirectory (second argument) for the directory of our current locale (that same ru directory). In that directory's LC_MESSAGES subdirectory, it looks for the mary.mo translation file of the mary domain (first argument). The third argument, unicode=True, makes _() return Unicode strings.
What does “installed” mean here? It means that _() is placed in Python's built-in namespace. After this call we can import the other modules of our program, and _() will already be defined in them.
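Here is a Python 3 sketch of what “installed” means. In Python 3 the unicode argument no longer exists, so the call in our program would be gettext.install('mary', './locale'). To stay self-contained, the sketch installs a NullTranslations object instead of loading a real catalog, so _() returns its input unchanged. The function code_in_another_module is hypothetical. It stands in for a module that uses _ without defining or importing it.

```python
import builtins
import gettext

# Put _ into builtins, as gettext.install() does; no .mo file is needed here.
gettext.NullTranslations().install()


def code_in_another_module():
    # This function never defines or imports _; the name is resolved in builtins.
    return _("%s had a little %s") % (_("Mary"), _("lamb"))


result = code_in_another_module()
print(result)  # Mary had a little lamb
print("_" in vars(builtins))  # True
```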
Let's run our program...
1:/tmp/mary> ./mary.py
У Мэри был барашек
Yep. Something like that.
Bonus
According to Google Translate, the .po file for Japanese will look something like this:
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"
msgid "Mary"
msgstr "メアリー"
msgid "lamb"
msgstr "子羊"
msgid "%s had a little %s"
msgstr "%sの%sいた"
For Japanese to be displayed properly (in addition to Russian), we will have to change the last line of the program to:
print (_("%s had a little %s") % (name, animal)).encode('UTF-8')
Let's check that it works:
1:/tmp/mary> LANG=ja_JP.UTF-8 ./mary.py
メアリーの子羊いた
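The LANG=ja_JP.UTF-8 prefix works because, when no languages are passed explicitly, gettext reads the LANGUAGE, LC_ALL, LC_MESSAGES and LANG environment variables, in that order, and uses the first one that is set. Here is a Python 3 sketch of that selection with gettext.find(). It assumes a temporary locale directory containing ru and ja catalogs (empty placeholders, since find() only checks existence). It clears the three higher-priority variables so that LANG decides, and restores the environment afterwards.

```python
import gettext
import os
import tempfile

saved_env = dict(os.environ)
try:
    with tempfile.TemporaryDirectory() as localedir:
        # Two languages available, each with an empty placeholder mary.mo.
        for lang in ("ru", "ja"):
            msgdir = os.path.join(localedir, lang, "LC_MESSAGES")
            os.makedirs(msgdir)
            open(os.path.join(msgdir, "mary.mo"), "wb").close()

        # Clear the higher-priority variables so that LANG decides, as in the shell example.
        for var in ("LANGUAGE", "LC_ALL", "LC_MESSAGES"):
            os.environ.pop(var, None)
        os.environ["LANG"] = "ja_JP.UTF-8"

        found = gettext.find("mary", localedir)  # languages=None: read the environment
        chosen_lang = os.path.relpath(found, localedir).split(os.sep)[0]
finally:
    os.environ.clear()
    os.environ.update(saved_env)

print(chosen_lang)  # ja
```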