Idea: formatting function for easy localization of strings

Problem: when applications are translated into other languages (most often we deal with the Russification of English-language products), support for plural forms is what usually suffers. Examples are “1 note, 2 notes, 5 notes” (three different noun forms in Russian) or the supposedly universal “1 file(s)”. The reason is that most programming languages offer nothing beyond some version of sprintf() or a template engine, so plural forms have to be programmed by hand each time: if N = 1, then “1 note”, otherwise “N notes”. Doing this every time is tedious. The gettext framework partially solves the problem by allowing several variants of a localized string, but that does not make life much easier: a string may contain several number-dependent parts (“Found 23 files in 3 folders”), and the pieces of the string still have to be glued together in code.
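
To make the gluing problem concrete, here is a minimal sketch using Python's standard gettext module. The helper name found_message is hypothetical, and NullTranslations is used only so that the example runs without a message catalog (it falls back to the English singular/plural rule).

```python
import gettext

# NullTranslations has no catalog, so ngettext() falls back to the
# English rule: the singular string when n == 1, the plural otherwise.
t = gettext.NullTranslations()

def found_message(files: int, folders: int) -> str:
    # Each counted noun needs its own ngettext() call...
    files_part = t.ngettext("%d file", "%d files", files) % files
    folders_part = t.ngettext("%d folder", "%d folders", folders) % folders
    # ...and the fragments are then glued together in code, so the
    # translator never sees the sentence as a whole.
    return "Found " + files_part + " in " + folders_part

print(found_message(23, 3))  # Found 23 files in 3 folders
print(found_message(1, 1))   # Found 1 file in 1 folder
```

The word order "Found ... in ..." is fixed in the code here, which is exactly what a translator cannot change.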

So I had the following idea: why not define a common format, a micro-language (by analogy with the well-established format strings of format() or sprintf()), that addresses this problem and simplifies writing localized code in the future?

As is well known, different languages have different numbers of plural forms. English has two (“1 file”, “many files”). Russian has three (“1 файл”, “2 файла”, “много файлов”: one file, two files, many files). Arabic, as Pootle tells us, has as many as six. We therefore need a way to specify, directly in the string, a set of alternative substrings and the parameter that determines which of them is chosen.
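
Choosing a plural form from a number can be sketched as a small function per language. The function names below are hypothetical; the Russian rule follows the plural rule commonly used in gettext catalogs for that language.

```python
def plural_index_en(n: int) -> int:
    """English: form 0 for exactly one item, form 1 for everything else."""
    return 0 if n == 1 else 1

def plural_index_ru(n: int) -> int:
    """Russian: three forms, chosen by the last digits of the number."""
    if n % 10 == 1 and n % 100 != 11:
        return 0  # 1, 21, 31, ... ("файл")
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return 1  # 2-4, 22-24, ... ("файла")
    return 2      # 0, 5-20, 25-30, ... ("файлов")

# Print the form index each language picks for a few sample numbers.
for n in (1, 2, 5, 11, 21, 22, 112):
    print(n, plural_index_en(n), plural_index_ru(n))
```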
The proposed format for a substring with several variants:
{%COUNTER%|FORM0|FORM1|FORM2[|FORM3][|FORM4][|...]}

Where
%COUNTER% is a variable name whose value is a non-negative integer (0, 1, 2, 3, ...)
FORM0 is the variant of the string for COUNTER = 0 (a special case that usually requires a separate message)
FORM1, FORM2, etc. are the alternative texts for each plural form of the given language: two for English, three for Russian, and so on.
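
As a rough sketch of how this format could be read, the snippet below extracts the counter name and the forms from a template. The names PLACEHOLDER, Placeholder and parse_placeholders are hypothetical, and the regular expression assumes that counter names consist of word characters and that forms contain no '{', '}' or '|'.

```python
import re
from typing import NamedTuple

# One placeholder: {%NAME%|FORM0|FORM1|...}. Assumes counter names are
# word characters and that forms contain no '{', '}' or '|'.
PLACEHOLDER = re.compile(r"\{%(\w+)%\|([^{}]*)\}")

class Placeholder(NamedTuple):
    counter: str        # variable name without the % signs
    zero_form: str      # FORM0, used when the counter is 0
    plural_forms: list  # FORM1, FORM2, ... one per plural form

def parse_placeholders(template: str) -> list:
    """Return every placeholder found in the template, in order."""
    found = []
    for match in PLACEHOLDER.finditer(template):
        forms = match.group(2).split("|")
        found.append(Placeholder(match.group(1), forms[0], forms[1:]))
    return found

for item in parse_placeholders("{%N%|No notes|1 note|%N% notes} saved."):
    print(item.counter, item.zero_form, item.plural_forms)
```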

English example:
{%F%|No files|1 file|%F% files} found in {%D%|0 folders|1 folder|%D% folders}.

This gives the following output strings for different values of %F% and %D%:
%F% = 0, %D% = 1 => No files found in 1 folder.
%F% = 1, %D% = 2 => 1 file found in 2 folders.
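
A formatting function for this example might look roughly like the sketch below. The name format_plural and its signature are assumptions made for illustration, not an existing API; the English plural rule is passed in as a default so that another language could supply its own.

```python
import re

# Same assumptions as the format description: {%NAME%|FORM0|FORM1|...},
# with no '{', '}' or '|' inside the forms themselves.
PLACEHOLDER = re.compile(r"\{%(\w+)%\|([^{}]*)\}")

def plural_index_en(n):
    # English: index 0 for exactly one item, index 1 otherwise.
    return 0 if n == 1 else 1

def format_plural(template, values, plural_index=plural_index_en):
    def substitute(match):
        name = match.group(1)
        forms = match.group(2).split("|")
        n = values[name]
        if n == 0:
            chosen = forms[0]                    # FORM0: the zero case
        else:
            chosen = forms[1 + plural_index(n)]  # FORM1, FORM2, ...
        # Replace the counter inside the chosen form with its value.
        return chosen.replace("%" + name + "%", str(n))
    return PLACEHOLDER.sub(substitute, template)

template = ("{%F%|No files|1 file|%F% files} found in "
            "{%D%|0 folders|1 folder|%D% folders}.")
print(format_plural(template, {"F": 0, "D": 1}))  # No files found in 1 folder.
print(format_plural(template, {"F": 1, "D": 2}))  # 1 file found in 2 folders.
```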

The same example of a string translated into Russian:
{%F%| |1 |%F% |%F% } {%D%|0 |1 |%D% |%D% }.

This gives the following output strings for different values of %F% and %D%:
%F% = 0, %D% = 1 => No files were found in 1 folder.
%F% = 1, %D% = 2 => 1 file was found in 2 folders.

Note that a translator who receives the whole string, with all of its variant groups, a) can more easily understand the context in which each part of the string is used, and b) has room to maneuver so that the final string sounds better.

It would be great to have implementations of such a micro-language in different programming languages.

I hope someone finds the idea useful.

UPD: The result was not long in coming: in this topic, Habr user webdew shared an implementation of the function in C#, for which many thanks.

Source: https://habr.com/ru/post/55286/
