
BabelFish - a polyglot in the world of JavaScript



The Internet brings globalization into our lives, and many web resources are no longer limited to an audience that lives in one country and speaks one language. However, maintaining several language versions of a site by hand is no fun, and beyond a certain scale it is hardly feasible.

For example, REG.RU's dictionaries currently contain more than 15,000 phrases. About 200 of them use declension and more than 2,000 use variable substitution. At least 10 phrases are added every day, even though we have only just begun localizing the site and plan to add new languages.
Software internationalization and localization (including on the web) are neither new nor unusual tasks, yet there are surprisingly few good universal tools for them. Choosing one for a specific stack of client and server technologies is not always easy, especially if you want the same tool on both the client and the server.

DON'T PANIC.

BabelFish 1.0, a package for the internationalization of JavaScript applications, was recently published.

We liked the ideas behind it so much that we ported them to Perl as the Locale::Babelfish CPAN module, which we use in our Perl applications. But back to the JavaScript implementation.

Overview

What are the features of this library?


Let's look at the module's capabilities through examples. A typical phrase looks like this:

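 #{cachalotes_count} ((кашалот|кашалота|кашалотов)):cachalotes_count

Here кашалот, кашалота and кашалотов are the three Russian plural forms of “sperm whale”. The trailing :cachalotes_count names the variable that selects the form.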

Exact-value matches are also supported, as is variable substitution inside the plural forms. A typical case: instead of “0 sperm whales” we want to write “no sperm whales”, and instead of “1 sperm whale” just “sperm whale”. At the same time we still want “21 sperm whale”, because Russian uses the singular form after 21:

 ((=0 нет кашалотов|=1 кашалот|#{count} кашалот|#{count} кашалота|#{count} кашалотов))

Note that if the variable is named count, you can omit the colon and variable name after the closing brackets.
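To make the selection rules concrete, here is a minimal standalone sketch of how such a construct could be evaluated for Russian. It is not Babelfish's implementation. The names pluralize, russianPluralIndex and interpolate are hypothetical. The rule used is the CLDR cardinal plural rule for Russian, restricted to integers, and any `=N` forms are checked before the regular forms.

```javascript
// Hypothetical sketch, not Babelfish internals.
// CLDR cardinal plural rule for Russian, integers only:
// one (1, 21, 101...), few (2-4, 22-24...), many (0, 5-20, 25-30...).
function russianPluralIndex(n) {
  const mod10 = n % 10;
  const mod100 = n % 100;
  if (mod10 === 1 && mod100 !== 11) return 0;
  if (mod10 >= 2 && mod10 <= 4 && (mod100 < 12 || mod100 > 14)) return 1;
  return 2;
}

// Substitute #{count} inside the chosen form.
function interpolate(text, count) {
  return text.replace(/#\{count\}/g, String(count));
}

// formsSource is the body of ((...)), e.g. "=0 ...|#{count} ...|...".
function pluralize(formsSource, count) {
  const regular = [];
  for (const form of formsSource.split('|')) {
    const exact = /^=(\d+) (.*)$/.exec(form);
    if (exact) {
      // An "=N text" form wins outright when count equals N.
      if (Number(exact[1]) === count) return interpolate(exact[2], count);
    } else {
      regular.push(form);
    }
  }
  // If fewer forms than categories are given, fall back to the last one.
  const index = Math.min(russianPluralIndex(count), regular.length - 1);
  return interpolate(regular[index], count);
}

const whales =
  '=0 нет кашалотов|=1 кашалот|#{count} кашалот|#{count} кашалота|#{count} кашалотов';
console.log(pluralize(whales, 0));  // нет кашалотов
console.log(pluralize(whales, 21)); // 21 кашалот
console.log(pluralize(whales, 3));  // 3 кашалота
console.log(pluralize(whales, 11)); // 11 кашалотов
```

The exact-match pass runs first so that `=0` and `=1` override whatever category the numeric rule would pick.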

The Babelfish API provides a t(locale, key, params) method that resolves a key in a given locale into finished text or a data structure. A call looks like this:

    babelfish.t( 'ru-RU', 'some.complex.key', { a: "test" } );
    babelfish.t( 'ru-RU', 'some.complex.key', 17 ); // count and value are both set to 17
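The second call passes a bare number, which Babelfish treats as both count and value. Here is a minimal sketch of that shorthand combined with #{name} substitution. The functions normalizeParams and substitute are hypothetical. In this sketch, unknown placeholders are left untouched; that choice is an assumption, not Babelfish's documented behaviour.

```javascript
// Hypothetical sketch of #{name} substitution; names are illustrative.
// A bare number is expanded into { count: n, value: n }.
function normalizeParams(params) {
  if (typeof params === 'number') return { count: params, value: params };
  return params || {};
}

// Replace every #{name} that has a matching parameter; placeholders
// without a matching parameter are kept as-is in this sketch.
function substitute(template, params) {
  const values = normalizeParams(params);
  return template.replace(/#\{(\w+)\}/g, (placeholder, name) =>
    Object.prototype.hasOwnProperty.call(values, name)
      ? String(values[name])
      : placeholder
  );
}

console.log(substitute('Value: #{a}', { a: 'test' })); // Value: test
console.log(substitute('#{count} of #{value}', 17));   // 17 of 17
```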

To improve readability and reduce typing, a wrapper like this one (CoffeeScript) is usually created:

    # _locale holds the application's current locale
    window.t = t = (key, params, locale) ->
      locale = _locale unless locale?
      babelfish.t.call babelfish, locale, key, params

The locale moves to the end of the argument list and becomes optional. Now you can simply write:

    t( 'some.complex.key', { a: "test" } );
    // the following two calls are equivalent:
    t( 'some.complex.key', 17 );
    t( 'some.complex.key', { count: 17, value: 17 } );
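For readers who do not use CoffeeScript, the same wrapper in plain JavaScript might look like this. makeT and the stub translator are hypothetical. Any object with a t(locale, key, params) method, such as a BabelFish instance, could be passed in.

```javascript
// Hypothetical JavaScript equivalent of the CoffeeScript wrapper.
// `translator` is anything with a t(locale, key, params) method.
function makeT(translator, defaultLocale) {
  return function t(key, params, locale) {
    // Same test as CoffeeScript's `unless locale?`: null or undefined.
    const resolved =
      locale === undefined || locale === null ? defaultLocale : locale;
    // Calling it as a method keeps `this` bound to the translator.
    return translator.t(resolved, key, params);
  };
}

// Stub standing in for a real BabelFish instance.
const stub = {
  t(locale, key, params) {
    return `${locale}:${key}:${JSON.stringify(params)}`;
  },
};
const t = makeT(stub, 'ru-RU');
console.log(t('some.complex.key', 17));          // ru-RU:some.complex.key:17
console.log(t('some.complex.key', 17, 'en-US')); // en-US:some.complex.key:17
```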

The flip side of this concise syntax is that translators (the staff who work with dictionaries and templates) need to get used to it, simple as it is.

The solution is to give translators an interface that shows several things alongside each phrase: its context, fixtures with typical data used to build it, and a preview of the result.

It also helps to provide snippets that insert ready-made constructions for declension and variable substitution.

Let's look at how to integrate Babelfish into an application on the browser side.

Installation


Babelfish is available both as an npm package and as a bower package. If you need to work with both Node.js and browsers, we recommend the npm package plus browserify (see the babelfish demo for an example). Most developers, however, will find bower easier.

Here we assume that the current language code is stored in window.lang:

    # assets/coffee/babelfish-init.coffee
    do (window) ->
      "use strict"

      BabelFish = require 'babelfish'

      # map the short language code to a full locale name
      locale = switch window.lang
        when 'ru' then 'ru-RU'
        when 'en' then 'en-US'
        else window.lang

      window.l10n = l10n = BabelFish()

      l10n.setFallback 'be-BY', [ 'ru-RU', 'en-US' ]

      # t(key, params) with the current locale prepended
      window.t = t = (args...) ->
        l10n.t.apply l10n, [ locale ].concat(args)

      null


Storage and compilation of dictionaries


Internal format


Dictionaries are built in Babelfish's internal format, which lets you bind not only text but also other data structures to a key. JSON serialization and deserialization of dictionaries is built in (stringify / load).

Phrases can be added to dictionaries like this:

    babelfish.addPhrase( 'ru-RU', 'some.complex.key', 'Phrase text' );
    babelfish.addPhrase( 'ru-RU', 'some.complex.anotherkey', 'Another phrase text' );

Or so:

    babelfish.addPhrase( 'ru-RU', 'some', {
      complex: {
        key: 'Phrase text',
        anotherkey: 'Another phrase text'
      }
    });

When adding complex data structures, you can pass the flattenLevel argument (false or 0) so that the structure is stored as-is instead of being split into separate keys:

    babelfish.addPhrase( 'ru-RU', 'myhash', { key: 'Phrase text', anotherkey: 'Another phrase text' }, false );

Calling t('myhash') then returns an object with the keys key and anotherkey. This is very useful for localizing third-party libraries, for example to supply configuration for jQuery UI plugins.

The only requirement for serializing such data is that it can be represented as JSON.

Note that Babelfish compiles phrase syntax lazily. For a phrase with parameters, a function is generated on first use, and subsequent calls reuse it and run fast. On the one hand, this greatly simplifies serialization. On the other, it can be a problem under strict CSP policies that forbid eval and Function() in the browser. The package author is open to implementing a compatibility mode, so if you really need one, just open a ticket in the project tracker.
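The idea of lazy compilation can be illustrated with a small, hypothetical sketch. Each phrase is compiled on first use and cached, and only raw strings are serialized. To show what a CSP-friendly approach might look like, it builds closures rather than calling eval or new Function. It handles only #{name} and is not how Babelfish itself generates code. LazyDictionary and compile are illustrative names.

```javascript
// Hypothetical sketch of lazy compilation with a cache.
// Builds closures instead of using eval/new Function; handles #{name} only.
function compile(source) {
  // split() with a capture group puts variable names at odd indices.
  const parts = source.split(/#\{(\w+)\}/);
  return (params) =>
    parts.map((part, i) => (i % 2 === 1 ? String(params[part]) : part)).join('');
}

class LazyDictionary {
  constructor(phrases) {
    this.phrases = phrases;    // key -> raw source string (JSON-friendly)
    this.compiled = new Map(); // key -> compiled function (never serialized)
    this.compileCount = 0;
  }

  t(key, params) {
    let fn = this.compiled.get(key);
    if (!fn) {
      fn = compile(this.phrases[key]); // compiled on first use only
      this.compiled.set(key, fn);
      this.compileCount += 1;
    }
    return fn(params || {});
  }

  // Only the raw strings are dumped, so the output stays plain JSON.
  stringify() {
    return JSON.stringify(this.phrases);
  }
}

const dict = new LazyDictionary({ greeting: 'Hello, #{name}!' });
console.log(dict.t('greeting', { name: 'Ann' })); // Hello, Ann!
console.log(dict.t('greeting', { name: 'Bob' })); // Hello, Bob!
console.log(dict.compileCount);                   // 1
```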

YAML format


For most applications, YAML is a better fit, and it is also supported out of the box. I recommend storing dictionaries in this format and compiling them into the internal format before use. Among other things, this lets you merge dictionaries and serve them to the client as plain JavaScript.

In this case, the nested YAML keys are converted to a flat structure:

    some:
      complex:
        key: "Some text with #{count}"

is converted to the key some.complex.key.

Babelfish can also recognize lists in dictionaries automatically, without explicit instructions, and store them as complex data structures rather than as phrases. So if you specify

 mylist:
     - british
     - irish

then t('mylist') returns [ 'british', 'irish' ]. This will come in handy later.
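The key flattening described in this section can be sketched as a short recursive function. Nested plain objects become dotted keys, while arrays (lists) are kept whole as values. The function flattenKeys is hypothetical, and the flattenLevel behaviour is simplified here to a boolean switch: with false, objects are stored as-is.

```javascript
// Hypothetical sketch of nested-keys-to-flat-keys conversion.
// Plain objects are flattened into dotted keys; arrays and strings are
// stored as leaf values. flatten=false keeps objects whole.
function flattenKeys(tree, prefix = '', flatten = true, out = {}) {
  for (const [name, value] of Object.entries(tree)) {
    const key = prefix ? `${prefix}.${name}` : name;
    const isPlainObject =
      value !== null && typeof value === 'object' && !Array.isArray(value);
    if (flatten && isPlainObject) {
      flattenKeys(value, key, flatten, out);
    } else {
      out[key] = value;
    }
  }
  return out;
}

// Roughly the data from the YAML examples, already parsed into an object.
const parsed = {
  some: { complex: { key: 'Some text with #{count}' } },
  mylist: ['british', 'irish'],
};
const flat = flattenKeys(parsed);
console.log(Object.keys(flat)); // [ 'some.complex.key', 'mylist' ]
```

Treating arrays as leaves is what allows a call like t('mylist') to hand back the whole list.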


Phrase Localization Transformations


Before compiling phrases, we usually need to apply additional transformations to them, such as automatic typography and Markdown rendering.


Automatic typography benefits everyone, and Markdown keeps texts easy to read and simplifies work with translators.

We keep the source dictionaries in the assets/locales directory and transform them into ready-to-use dictionaries in config/locales.

Of course, your transformation stack will probably differ from ours.
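Whatever the stack, one simple way to model it is as an ordered list of string-to-string functions applied to every string leaf of a dictionary. The sketch below assumes that model. applyTransforms is hypothetical, and the two regex transforms are toy stand-ins for a real typographer and Markdown renderer.

```javascript
// Hypothetical sketch: apply an ordered list of string -> string
// transforms to every string leaf (inside arrays and nested objects too).
function applyTransforms(node, transforms) {
  if (typeof node === 'string') {
    return transforms.reduce((text, transform) => transform(text), node);
  }
  if (Array.isArray(node)) {
    return node.map((item) => applyTransforms(item, transforms));
  }
  if (node !== null && typeof node === 'object') {
    const result = {};
    for (const [key, value] of Object.entries(node)) {
      result[key] = applyTransforms(value, transforms);
    }
    return result;
  }
  return node; // numbers, booleans and null pass through unchanged
}

// Toy stand-ins for real tools.
const russianQuotes = (s) => s.replace(/"([^"]*)"/g, '«$1»');
const boldMarkdown = (s) => s.replace(/\*\*([^*]+)\*\*/g, '<b>$1</b>');

const source = { hint: 'Press "OK"', list: ['**bold** item'], limit: 3 };
const result = applyTransforms(source, [russianQuotes, boldMarkdown]);
console.log(result.hint);    // Press «OK»
console.log(result.list[0]); // <b>bold</b> item
```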

Here is an example Grunt task that compiles YAML dictionaries into the internal Babelfish format and runs every string through a Markdown processor:

    # Gruntfile.coffee
    # requires the glob, marked and traverse packages
    marked = require 'marked'
    traverse = require 'traverse'

    grunt.registerTask 'babelfish', 'Compile config/locales/*.<locale>.yaml to Babelfish assets', ->
      Babelfish = require 'babelfish'
      glob = require 'glob'
      files = glob.sync '**/*.yaml', { cwd: 'config/locales' }
      reFile = /(^|.+\/)(.+)\.([^\.]+)\.yaml$/

      # do not wrap each line with <p>
      renderer = new marked.Renderer()
      renderer.paragraph = (text) -> text

      for file in files
        m = reFile.exec(file)
        continue unless m
        [folder, dict, locale] = [m[1], m[2], m[3]]
        b = Babelfish locale
        # `file` is already relative to config/locales
        translations = grunt.file.readYAML "config/locales/#{file}"

        # render Markdown in every string value
        traverse(translations).forEach (value) ->
          if typeof value is 'string'
            @update marked( value, { renderer: renderer } )

        b.addPhrase locale, dict, translations

        res = "// #{file} translation\n"
        res += "window.l10n.load("
        res += b.stringify locale
        res += ");\n"

        resPath = "assets/javascripts/l10n/#{folder}#{dict}.#{locale}.js"
        grunt.file.write resPath, res
        grunt.log.writeln "#{resPath} compiled."

The resulting scripts can then be concatenated and included in your application however you like.


Select locale


On the server side, the most reliable way to select a locale is to parse the Accept-Language header. The npm module locale helps with this, and you can also look at the source code of nodeca.core.
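To show roughly what such parsing involves, here is a simplified, hypothetical sketch. It is not the locale module's API. It ranks language tags by their q-value and prefers an exact match. Failing that, it matches on the primary subtag (“ru” in “ru-RU”). Real negotiation has more edge cases (wildcards, script subtags), which is why a library is the better choice in production.

```javascript
// Hypothetical, simplified Accept-Language negotiation.
function pickLocale(header, supported, fallback) {
  const requested = String(header || '')
    .split(',')
    .map((entry, order) => {
      const [tag, ...attrs] = entry.trim().split(';');
      const qAttr = attrs.find((a) => a.trim().startsWith('q='));
      const q = qAttr ? parseFloat(qAttr.trim().slice(2)) : 1;
      return { tag: tag.trim().toLowerCase(), q: Number.isNaN(q) ? 0 : q, order };
    })
    .filter((r) => r.tag && r.q > 0)
    // Highest q first; ties keep the header order.
    .sort((a, b) => b.q - a.q || a.order - b.order);

  for (const { tag } of requested) {
    const exact = supported.find((s) => s.toLowerCase() === tag);
    if (exact) return exact;
    const primary = tag.split('-')[0];
    const partial = supported.find((s) => s.toLowerCase().split('-')[0] === primary);
    if (partial) return partial;
  }
  return fallback;
}

const supported = ['ru-RU', 'en-US', 'be-BY'];
console.log(pickLocale('ru,en-US;q=0.8', supported, 'en-US'));  // ru-RU
console.log(pickLocale('de-DE, en;q=0.5', supported, 'ru-RU')); // en-US
console.log(pickLocale('', supported, 'ru-RU'));                // ru-RU
```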

Falling back to another locale


Babelfish maintains a list of fallback rules that specify which other locales to consult when a phrase is missing from the current one.

For example, we want data for the Belarusian locale to be taken in order of priority from the Belarusian, Russian and English locales:

 babelfish.setFallback( 'be-BY', [ 'ru-RU', 'en-US' ] ); 
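Conceptually, a lookup with fallbacks walks a chain of locales until it finds the key. Here is a minimal, hypothetical sketch of that resolution order: the requested locale first, then its fallback list, then a default locale. lookup and the dictionary layout are illustrative and not Babelfish's internal structures. Trying the default locale last is an assumption made for this sketch.

```javascript
// Hypothetical sketch of fallback resolution.
function lookup(dictionaries, fallbacks, defaultLocale, locale, key) {
  const chain = [locale, ...(fallbacks[locale] || []), defaultLocale];
  for (const candidate of chain) {
    const dict = dictionaries[candidate];
    if (dict && Object.prototype.hasOwnProperty.call(dict, key)) {
      return dict[key];
    }
  }
  return undefined; // not found anywhere in the chain
}

const dictionaries = {
  'be-BY': { hello: 'Прывітанне' },
  'ru-RU': { hello: 'Привет', bye: 'Пока' },
  'en-US': { hello: 'Hello', bye: 'Bye', thanks: 'Thanks' },
};
const fallbacks = { 'be-BY': ['ru-RU', 'en-US'] };

console.log(lookup(dictionaries, fallbacks, 'en-US', 'be-BY', 'hello'));  // Прывітанне
console.log(lookup(dictionaries, fallbacks, 'en-US', 'be-BY', 'bye'));    // Пока
console.log(lookup(dictionaries, fallbacks, 'en-US', 'be-BY', 'thanks')); // Thanks
```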


Localization


Besides internationalization, we also need to localize the application: for example, to format currencies, dates and time ranges according to the locale.

Localization of dates


To format dates, we use slightly modified data from Rails:

    # config/locales/formatting.ru-RU.yaml
    date:
      abbr_day_names: [Вс, Пн, Вт, Ср, Чт, Пт, Сб]
      abbr_month_names: [~, янв., февр., марта, апр., мая, июня, июля, авг., сент., окт., нояб., дек.]
      day_names: [воскресенье, понедельник, вторник, среда, четверг, пятница, суббота]
      formats:
        default: '%d.%m.%Y'
        long: '%-d %B %Y'
        short: '%-d %b'
      month_names: [~, января, февраля, марта, апреля, мая, июня, июля, августа, сентября, октября, ноября, декабря]
      order: [day, month, year]
    time:
      am: утра
      formats:
        default: '%a, %d %b %Y, %H:%M:%S %z'
        long: '%d %B %Y, %H:%M'
        short: '%d %b, %H:%M'
      pm: вечера


    # assets/coffee/babelfish-init.coffee
    strftime = require 'strftime'

    l10n.datetime = ( dt, format, options ) ->
      return null unless dt && format
      # numbers are treated as Unix timestamps (seconds)
      dt = new Date(dt * 1000) if 'number' == typeof dt
      # "date.long" style config keys are resolved from the dictionary
      m = /^([^\.%]+)\.([^\.%]+)$/.exec format
      format = t("formatting.#{m[1]}.formats.#{m[2]}", options) if m
      # replace name directives with localized names before strftime runs
      format = format.replace /(%[aAbBpP])/g, (id) ->
        switch id
          when '%a'
            t("formatting.date.abbr_day_names", { format: format })[dt.getDay()] # wday
          when '%A'
            t("formatting.date.day_names", { format: format })[dt.getDay()] # wday
          when '%b'
            t("formatting.date.abbr_month_names", { format: format })[dt.getMonth() + 1] # mon
          when '%B'
            t("formatting.date.month_names", { format: format })[dt.getMonth() + 1] # mon
          when '%p'
            t((if dt.getHours() < 12 then "formatting.time.am" else "formatting.time.pm"), { format: format }).toUpperCase()
          when '%P'
            t((if dt.getHours() < 12 then "formatting.time.am" else "formatting.time.pm"), { format: format }).toLowerCase()
      strftime.strftime format, dt

Now we have a helper:

    window.l10n.datetime( <Unix timestamp or Date object>, <format string or config key such as 'date.long'> )
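For readers without CoffeeScript or the strftime package, here is a dependency-free JavaScript sketch of the same idea. formatDateTime and the shape of the locale object are hypothetical; the object mirrors the YAML keys shown earlier. Only %a %A %b %B %d %m %Y %H %M and the "-" no-padding flag are supported. As in the Rails data, month arrays are 1-based, with null at index 0.

```javascript
// Hypothetical sketch; supports a subset of strftime directives.
function formatDateTime(dt, format, locale) {
  if (!dt || !format) return null;
  // Numbers are treated as Unix timestamps in seconds.
  const date = typeof dt === 'number' ? new Date(dt * 1000) : dt;

  // "date.long" -> locale.date.formats.long (config-key lookup).
  const key = /^([^.%]+)\.([^.%]+)$/.exec(format);
  const pattern = key ? locale[key[1]].formats[key[2]] : format;

  const pad = (n) => String(n).padStart(2, '0');
  return pattern.replace(/%(-?)([aAbBdmYHM%])/g, (match, noPad, directive) => {
    const num = noPad ? String : pad; // "%-d" drops the leading zero
    switch (directive) {
      case 'a': return locale.date.abbr_day_names[date.getDay()];
      case 'A': return locale.date.day_names[date.getDay()];
      case 'b': return locale.date.abbr_month_names[date.getMonth() + 1];
      case 'B': return locale.date.month_names[date.getMonth() + 1];
      case 'd': return num(date.getDate());
      case 'm': return num(date.getMonth() + 1);
      case 'Y': return String(date.getFullYear());
      case 'H': return num(date.getHours());
      case 'M': return num(date.getMinutes());
      default: return '%'; // "%%"
    }
  });
}

const ruRU = {
  date: {
    abbr_day_names: ['Вс', 'Пн', 'Вт', 'Ср', 'Чт', 'Пт', 'Сб'],
    day_names: ['воскресенье', 'понедельник', 'вторник', 'среда', 'четверг', 'пятница', 'суббота'],
    abbr_month_names: [null, 'янв.', 'февр.', 'марта', 'апр.', 'мая', 'июня',
      'июля', 'авг.', 'сент.', 'окт.', 'нояб.', 'дек.'],
    month_names: [null, 'января', 'февраля', 'марта', 'апреля', 'мая', 'июня',
      'июля', 'августа', 'сентября', 'октября', 'ноября', 'декабря'],
    formats: { default: '%d.%m.%Y', long: '%-d %B %Y', short: '%-d %b' },
  },
  time: { formats: { long: '%d %B %Y, %H:%M', short: '%d %b, %H:%M' } },
};

const d = new Date(2014, 4, 30, 9, 5); // local time, a Friday
console.log(formatDateTime(d, 'date.long', ruRU));             // 30 мая 2014
console.log(formatDateTime(d, '%a, %d %b %Y, %H:%M', ruRU));   // Пт, 30 мая 2014, 09:05
```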

Similarly, helpers can be built for currencies and other localized values.
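As one hypothetical example of such a helper, here is a currency formatter driven by dictionary data. The option names (unit, separator, delimiter, precision, format) are assumed to mirror the number.currency.format section of Rails locale files. They are not taken from the original article, and the sample values are illustrative.

```javascript
// Hypothetical currency helper driven by locale data.
function formatCurrency(amount, options) {
  const { unit, separator, delimiter, precision, format } = options;
  const [intPart, fracPart] = Math.abs(amount).toFixed(precision).split('.');
  // Insert the delimiter between groups of three digits.
  const grouped = intPart.replace(/\B(?=(\d{3})+(?!\d))/g, delimiter);
  const number =
    (amount < 0 ? '-' : '') + (fracPart ? grouped + separator + fracPart : grouped);
  // Function replacers avoid special "$" handling in replacement strings.
  return format.replace('%n', () => number).replace('%u', () => unit);
}

const rub = { unit: 'руб.', separator: ',', delimiter: ' ', precision: 2, format: '%n %u' };
const usd = { unit: '$', separator: '.', delimiter: ',', precision: 2, format: '%u%n' };
console.log(formatCurrency(1234567.891, rub)); // 1 234 567,89 руб.
console.log(formatCurrency(1234.5, usd));      // $1,234.50
```

Keeping the separators and the unit position in the dictionary means that supporting a new locale is a data change rather than a code change.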

Other implementations


The Babelfish parser is built on PEG.js, and with some modifications its grammar can be reused in other PEG parsers. Since the syntax is not tied to JavaScript and is easy to use, Babelfish implementations for other platforms may well appear.

As I mentioned above, we have implemented the Babelfish 1.0 dialect for Perl.

Conclusion


To illustrate what Babelfish can do, we published a small demo project that uses marked and jade.

While we were using Babelfish in our project, some of its capabilities were significantly extended at our request. For example, support for storing complex data structures actually migrated to Babelfish from our Perl project.

As usual with nodeca, they have released a thoughtful, high-quality and promising library. As a reminder, they also developed such popular projects as js-yaml, mincer, argparse, pako and markdown-it.

Special thanks to the module's author, Vitaly Puzrin (@puzrin). The article was prepared with the active participation of REG.RU's development department, in particular IgorMironov, dreamworker, nugged and TimurN.

Source: https://habr.com/ru/post/224919/

