An i18n recipe. Base: Babel, json with coffee and grunt, hbs to taste

In my previous post I wrote about why and how pybabel-hbs, a gettext string extractor for handlebars templates, had to be written.

A little later it became necessary to extract strings from json as well.
That is how pybabel-json appeared.
pip install pybabel-json, or get it from github

It uses the javascript lexer built into babel, with a few nuances of its own, but this post is not about that: what is written there is less interesting than the hbs plugin and hardly deserves attention.
This post is about what the whole localization setup looks like end to end, including what to do with data that comes from a database or from some other not quite static place.
End to end includes the following
(I must note that none of the items is mandatory; each is easy to plug into any application separately, only as needed):

- Babel. A set of tools for localizing applications.
- Grunt. A task runner
- coffeescript. Needs no introduction; all the client code is written in coffee, and strings have to be extracted from it too.
- handlebars - templates
- json - string storage
- Jed. A gettext client for js
- po2json. A utility for converting .po files to the .json format supported by Jed

A little bit about gettext and myths

gettext started out as a set of utilities for localizing applications; today I would also call gettext a generally accepted format (which is not the same as the only one).
In a nutshell: there are strings in English that pass through a gettext function and come out as strings in the target language, following that language's rules for plural forms, plus the ability to specify a context and a domain.
It is important that the strings themselves are the keys, rather than some USER_WELCOME_MESSAGE constant that turns into text somewhere.

Not everyone needs context, and I have not implemented it in my babel plugins yet because I had no need for it; pull requests are welcome.
There will be a couple of words about domains later.
But ngettext is absolutely necessary for many, if not for all.
And this is where the myths come in.

 Ноль яблок.  Zero apples
 Одно яблоко.  One apple
 Два яблока.  Two apples
 Пять яблок.  Five apples

This simple example should show all fans of language constants a la "USER_WELCOME_MESSAGE", which are then handed over for translation, that things are not as simple as they seem at first glance.

Which string gets chosen is decided by rules that are predefined and described in babel.
For example, this is the one for English:

"Plural-Forms: nplurals=2; plural=(n != 1)\n"

And this is for Russian:

 "Plural-Forms: nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 : n%10>=2 && "
 "n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2)\n"

Great and Mighty :)
Don't worry, you will not have to write this by hand for, say, Japanese.
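To see what these headers do, the two expressions can be transcribed by hand into Python. This is only an illustrative sketch: the function names plural_index_en and plural_index_ru are made up here, and real gettext implementations evaluate the Plural-Forms expression from the catalog themselves.

```python
def plural_index_en(n):
    # plural=(n != 1): index 0 is the singular, index 1 the plural
    return int(n != 1)


def plural_index_ru(n):
    # Hand transcription of the Russian Plural-Forms expression
    if n % 10 == 1 and n % 100 != 11:
        return 0  # 1, 21, 31, ...
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1  # 2-4, 22-24, ...
    return 2      # 0, 5-20, 25-30, ...


if __name__ == "__main__":
    for n in (0, 1, 2, 5, 11, 21, 22):
        print(n, plural_index_en(n), plural_index_ru(n))
```

The returned index is the number in msgstr[N] that a gettext implementation would pick for the given count.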

So, about the myths.
Several times I have heard the opinion that you can build the main site in Russian, wrap those Russian strings in gettext calls, and add English later.
If you have your own crutches built on those language constants, no sentences anywhere that inflect with numbers, and you use the ugly “You have apples: 1” style, then of course you can make Russian the base language.
If you want to show the user slightly nicer messages, such as “You have 1 apple” and “You have 7 apples”, then English should be the base language.

Why? It's all about the apples.
The plural does not always come in a single form, and the singular form is not always reserved for one.
English is simple in this respect; Russian is not.

By default ngettext expects the key to be in English. Moreover, ngettext accepts only two strings as input, the singular and the plural, not an array of plural forms.

So if you still want Russian as the default, you will at least have to maintain a Russian-to-Russian translation file, in which the string “You have %s apples” is turned into the correct form. Yes, you can, but it is crooked.
On every change you will have to remember that you changed only the key, not the displayed Russian string, and go edit the Russian language file in parallel. In general, don't do this. ngettext is built around English as the source language.
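A quick way to see the problem is Python's standard gettext module with no catalog loaded. In that case ngettext falls back to the English-style rule (singular only when n == 1), so Russian keys come out wrong for numbers like 2 or 21. The Russian strings and the function name below are just sample values for illustration.

```python
import gettext

# NullTranslations is the stdlib fallback used when no catalog is available
fallback = gettext.NullTranslations()


def apples_ru_as_key(n):
    # Russian strings used as keys: only two slots are available
    return fallback.ngettext("%(n)d яблоко", "%(n)d яблок", n) % {"n": n}


if __name__ == "__main__":
    for n in (1, 2, 5, 21):
        print(apples_ru_as_key(n))
```

With Russian keys, 2 and 21 get the form meant for 5, which is exactly why a Russian-to-Russian catalog would be needed.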

By the way, here is what the .po entries look like, first for Russian and then for English.

 msgid "You have %(apples_count)d apple"
 msgid_plural "You have %(apples_count)d apples"
 msgstr[0] "У вас %(apples_count)d яблоко"
 msgstr[1] "У вас %(apples_count)d яблока"
 msgstr[2] "У вас %(apples_count)d яблок"

 msgid "You have %(apples_count)d apple"
 msgid_plural "You have %(apples_count)d apples"
 msgstr[0] ""
 msgstr[1] ""

I.e. the number of resulting strings depends on the language configuration. Maybe there is a language with a dozen plural forms...

OK, so where do I start?

Everyone whose site still says “3 apple” should now be motivated to start with:

pip install babel

The hard part is behind us.

What is left:
- Wrap all text in the code in gettext calls
- Run babel over the code
- From the resulting .pot file, create a .po file for each target language.

What actually needs translating?

The question is not as simple as it seems at first glance.

The simple part is templates and code.
Django and flask: extractors for their templates exist
Python and javascript are supported by babel out of the box
handlebars and json: I had to write the extractors, links are at the beginning of the post
For coffeescript there is a recipe further down
For everything else, google will help

Once again, the simple part is the code: you need to wrap all strings in gettext / ngettext calls in the format each extractor requires. As a rule, extractors also let you override which function names to look for.
For example, I have this:

 pybabel extract -F babel.cfg -o messages.pot -k "trans" -k "ntrans:1,2" -k "__" .

trans and ntrans are specified for javascript, and __ for python, where this function passes the string through unchanged (more on this later)

That is, every
print("apple") needs to be changed to print(gettext("apple"))
And every
print("I have %s apples") to print(ngettext("I have %s apple", "I have %s apples", num_of_apples) % num_of_apples)

I must note here, and I wish the same for everyone, that I never use unnamed parameters and do not recommend them.
In my case parameters are always named, so it should look like this:

Python:

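 from gettext import gettext, ngettext

 print(gettext("I have an apple!"))
 # %(name)d placeholders are filled with the % operator, not str.format()
 print(ngettext(
     "I have %(apples_count)d apple",
     "I have %(apples_count)d apples",
     num_of_apples
 ) % {"apples_count": num_of_apples})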

The standard gettext module is used here; for flask and django there are wrappers.

Javascript:

 console.log(i18n.trans("I have an apple!"));
 console.log(i18n.ntrans(
     "I have %(apples_count)d apple",
     "I have %(apples_count)d apples",
     num_of_apples,
     {apples_count: num_of_apples}
 ));

Here and in coffee I use proxies for the Jed methods, taken from here:
github.com/tigrawap/pybabel-hbs/blob/master/client_side_usage/i18n.coffee
Parameters get into the string thanks to the sprintf built into Jed

Coffeescript:

 console.log i18n.trans "I have an apple!"
 console.log i18n.ntrans "I have %(apples_count)d apple", "I have %(apples_count)d apples", num_of_apples, apples_count: num_of_apples

Handlebars:

 {{#trans}}
     I have an apple!
 {{/trans}}
 {{#ntrans num_of_apples apples_count=num_of_apples}}
     I have %(apples_count)d apple
 {{else}}
     I have %(apples_count)d apples
 {{/ntrans}}

JSON string storage:

 {
     "anykey": "I have an apple!",
     "another_any_key": {
         "type": "gettext_string",
         "funcname": "ngettext",
         "content": "I have %(apples_count)d apple",
         "alt_content": "I have %(apples_count)d apples"
     }
 }

Offtopic: this format is explained in the pybabel-json documentation

It is probably easy to notice that num_of_apples is repeated twice in every call.
The reason is that it is passed once as the ngettext argument that determines which string is chosen, and a second time as a parameter substituted into that string, alongside any other parameters.
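A minimal sketch of the two roles of the number, using the standard library fallback (no catalog loaded, so the English keys are returned as is). The owner parameter and the describe_apples name are invented for the example.

```python
import gettext

translations = gettext.NullTranslations()  # stand-in for a real catalog


def describe_apples(owner, num_of_apples):
    template = translations.ngettext(
        "%(owner)s has %(apples_count)d apple",
        "%(owner)s has %(apples_count)d apples",
        num_of_apples,  # 1st use: selects the plural form
    )
    # 2nd use: substituted into the chosen string with the other parameters
    return template % {"owner": owner, "apples_count": num_of_apples}


if __name__ == "__main__":
    print(describe_apples("Anna", 1))
    print(describe_apples("Anna", 7))
```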

- As I said before, wrapping the existing text is the easy part.
Next you need to:

1) Replace every button whose label is an image with a button with text. Everyone knows that text baked into button images is bad. But often you have to put up with it, because it is faster and the designer wants it that way :)
- With this item everything should be clear: tedious, but necessary

2)
A more interesting point: what do you do with strings that look constant but are not exactly constant?
As an example I will give our case: genres for songs. They look dynamic, since they are stored in the database, but in fact they are rarely changing static data, which would be nice to pull out and send for translation.

This is what gave rise to pybabel-json.
The same approach solves any similar translation problem, such as error messages returned by a third-party server. You could call them static, but it is static you do not control, and it should be wrapped nicely for translation.
All you need to do is create a .json file, say
errors.json
with content

 {
     "from_F_service": [
         "Connection error",
         "Access denied"
     ],
     "from_T_service": [
         "Oops, it is too long"
     ]
 }

No keys for the strings themselves, just plain arrays of strings.
The worst that can happen if a service changes its message is that the user gets the untranslated version. As a rule, that is a trifle.
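A sketch of how such strings might be used at runtime, assuming a catalog that already contains translations for the known messages. The DictTranslations class and the sample Spanish translation are hypothetical stand-ins for a real compiled catalog; the point is only the fallback behaviour.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory catalog, standing in for a compiled .mo file."""

    def __init__(self, messages):
        super().__init__()
        self.messages = messages

    def gettext(self, message):
        # Unknown messages fall through untranslated
        return self.messages.get(message, message)


catalog = DictTranslations({"Access denied": "Acceso denegado"})


def show_service_error(raw_message):
    # raw_message comes from the third-party service as is
    return catalog.gettext(raw_message)


if __name__ == "__main__":
    print(show_service_error("Access denied"))
    print(show_service_error("A message the service invented yesterday"))
```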

With data in the database the situation is similar. In your build-push-deployment system, whatever it is (you do have one, don't you?), at the same level as the commands that build everything and run babel, add a script that runs before those commands, extracts all the necessary data from the database, and assembles a similar json, from which babel will then collect the strings.
Needless to say, such files should be added to .gitignore or its equivalent, so that they do not end up in source control.
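Such a pre-extraction script could look roughly like this. It is a sketch under assumptions: sqlite3 stands in for whatever database you use, and the genres table, its name column, and the output file name are invented for the example.

```python
import json
import sqlite3


def collect_genres(connection):
    # Hypothetical schema: a `genres` table with a `name` column
    rows = connection.execute("SELECT name FROM genres ORDER BY name")
    return {"genres": [name for (name,) in rows]}


def write_translation_json(connection, path):
    # The resulting file is what the json extractor is pointed at
    with open(path, "w", encoding="utf-8") as out:
        json.dump(collect_genres(connection), out, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")  # stand-in for the real database
    db.execute("CREATE TABLE genres (name TEXT)")
    db.executemany("INSERT INTO genres VALUES (?)", [("Rock",), ("Jazz",)])
    write_translation_json(db, "genres.json")
```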

All strings obtained this way must pass through a gettext call when displayed.
I.e. gettext() in python, and Jed or the proxy methods mentioned earlier in js.

It should also be noted that sometimes you want, or need, to do things in the reverse order.
I.e. mark in the code that a string should be translated, while the translation itself runs somewhere else.
Here is an example in python:

 class SomeView(MainView):
     title = gettext("This view title")

If you write code like this, you risk getting the class created once with the English title, if the class is created at server start, or, say, with the Chinese one, if it is created dynamically and cached on the first call.

In such cases you want to mark the string for translation here, but translate it in the right place.
The right place is the creation of the object, not of the class.
I.e.:

 def __(string, *k, **kwargs):
     return string

 class MainView(SomeParent):
     def __init__(self):
         #....
         self.title = gettext(self._title)
         #....

 class SomeView(MainView):
     _title = __("This view title")

I.e. the string extractor recognizes __ as a marker of a string to translate, the function itself does nothing, and the translation runs at the right moment.
This way everything stays in one place and looks nice.

This applies to many languages, including coffeescript and javascript if you write for node.js.
For the browser it is less relevant, since by the time the class is created the target language should already be known.

But in any case it is more correct to translate in the constructor, not at class creation.
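The difference between the two approaches can be seen in a small self-contained sketch. The gettext function here is a toy that looks strings up in an in-memory dictionary for the currently active language; CATALOGS, set_language, EagerView and the Spanish string are invented for the example, and the parent class is dropped for brevity.

```python
# Toy catalogs, standing in for real per-language translations
CATALOGS = {"es": {"This view title": "Título de esta vista"}}
_active = {"lang": "en"}


def set_language(lang):
    _active["lang"] = lang


def gettext(message):
    return CATALOGS.get(_active["lang"], {}).get(message, message)


def __(string, *args, **kwargs):
    # Marker only: lets the extractor find the string, translates nothing
    return string


class EagerView:
    # Translated once, when the class body runs
    title = gettext("This view title")


class MainView:
    _title = ""

    def __init__(self):
        # Translated per object, in the language active right now
        self.title = gettext(self._title)


class SomeView(MainView):
    _title = __("This view title")


if __name__ == "__main__":
    set_language("es")
    print(EagerView.title)   # frozen in the language active at import time
    print(SomeView().title)  # follows the current language
```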

That seems to cover every direction of translation known to me; let's say all of it is done.

Glue it all together

Now you can try to put it all together; there are a few simple steps:
0) Create an empty catalog of source strings, so that nothing complains later about a missing file

 touch messages.pot

1) Create the .po files for the target languages. This is done once and should not be part of the build. .po files contain both the source strings and their translations, one file per language.

 pybabel init -i messages.pot -d path/i18n -l es
 # creates the .po file under path/i18n/es
 # or, for several languages at once:
 echo {es,en,fr,de,ja} | xargs -n1 pybabel init -i messages.pot -d path/i18n -l

2) Create / update the .pot file, the main string storage. This should not be part of the build either; run it when you need fresh .po files to send for translation.

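 python/node/your_language update_translation_jsons  # regenerate the .json files built from dynamic data
 pybabel extract -F babel.cfg -o messages.pot -k "trans" -k "ntrans:1,2" -k "__" .
 # -k lists the function names to look for:
 # trans is the gettext-style call, ntrans the ngettext-style one
 # __ only marks a string for later translation
 # babel.cfg is the babel mapping file
 pybabel update -i messages.pot -d path/i18n/
 # the .po files of every language now contain the new strings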

It is worth showing an example babel.cfg here; it is the mapping file that says which strings to extract from which files:

 [python: path/backend/notifier.py]
 [hbs: path/static/**.hbs]
 [json: path/static/i18n/src/**.json]
 [javascript: path/static/**.coffee_js]
 encoding = utf-8

3) Run all .po files through po2json to get the .json that Jed accepts.
This step can and should be part of the build.
What you should not do is commit the results to git; they have no place there.

Exactly how to feed in all the .po files and where to put the output is up to you.
I run them through grunt, like the rest of the build.
The grunt-po2json that is on github and in the grunt plugin repository is broken for my purposes: it does not support rename, which I need because I find it more convenient when all the resulting .json files go into one directory. I fixed it locally, but I still need to send a pull request at some point...

Of course, it can be much simpler: after installing po2json ( npm install po2json ), include something like this in the build script:

 printf '%s\n' es en fr de ja | xargs -I {} po2json /path/i18n/{}/LC_MESSAGES/messages.po /path/to/build/i18n/{}.json

Thoughts that did not fit the flow but are worth dwelling on

Throughout the post I promised "more on this later" several times, but there was no suitable place for "later".

Such as:
coffeescript does not have its own extractor, because coffeescript is compiled (or translated) into javascript when the static files are built.
So it is enough to extract strings from the .js files after compilation to javascript.
In my case it is slightly different: next to every coffee file there is a coffee_js file, created by grunt watch whenever the source is edited (which also restarts the dev static server, but that is a topic for a separate post :)); these files are kept out of git. The strings are extracted from them.

- Domains were also mentioned.
In the end, domains are just different files: messages.pot / messages.po = the messages domain
You can create several domains and either bind them all to one Jed instance or create several Jed instances and route to them.
But for that you would need to extend the handlebars helpers or whatever other wrapper you use... I have never had such a need, and as a rule I prefer not to build anything extra in advance :)

- A small footnote to this passage from the introductory part:

If you want to show the user slightly nicer messages, such as “You have 1 apple” and “You have 7 apples”, then English should be the base language.

It should be understood that in the ngettext call you must write exactly “You have %(apples_count)d apple”, not “You have one apple”.
Because both for 1 and for 21 the resulting Russian string must use the first form, so the singular key has to carry the number, i.e. “You have %d apple”.

- It is also worth dwelling on one problem that I have not yet had time to solve automatically:
babel creates the “empty string” entry (the .po file header, which defines the language and the plural rule) in a format incompatible with Jed
Jed expects “plural_forms”, babel outputs Plural-Forms
You will need to patch either babel's output, or Jed's input, or something in between.
But first look at the configuration options of both.
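One possible in-between fix is a small post-processing step over the generated .json. The sketch below assumes the po2json output has the shape locale_data -> domain -> "" -> header dict; check this against your actual files, since that layout is an assumption here, as are the function name and the sample data.

```python
def fix_plural_header(jed_data, domain="messages"):
    """Rename the babel-style header key to the one Jed expects, in place."""
    header = jed_data.get("locale_data", {}).get(domain, {}).get("", {})
    if "Plural-Forms" in header and "plural_forms" not in header:
        header["plural_forms"] = header.pop("Plural-Forms")
    return jed_data


if __name__ == "__main__":
    # Assumed layout of a converted catalog, for illustration only
    sample = {
        "domain": "messages",
        "locale_data": {
            "messages": {"": {"Plural-Forms": "nplurals=2; plural=(n != 1)"}}
        },
    }
    print(fix_plural_header(sample))
```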

If I missed something or left something undescribed, write in the comments and I will add it.
The goal was not to dissect every utility in detail, but to tell you that they exist and how and why they work together.
The rest belongs in the comments.

Source: https://habr.com/ru/post/199226/
