
Continuing the discussion of Fluent's advantages over the familiar gettext, here is a translation of the official position of Fluent's creators.
Gettext is a localization system that is deeply rooted in the GNU project and its associated architectural solutions.
Project Fluent sees gettext as a good example of a complete, low-level, platform-independent ecosystem of libraries and tools for managing a product's full release cycle, with localization files in a human-readable format. At the same time, Fluent's paradigm leads to different architectural choices in several important aspects of localization, which in turn lead to quite different APIs and life cycles.
In other words, gettext is an excellent project, but we do not share its views on the approach to localization.
Here are the main differences between gettext and Fluent:
Social contract
The most important difference between gettext and Fluent is the message identifier. Gettext uses the source string (usually in English) as the identifier. This choice seems simple, but it imposes many restrictions down the line.
First, with this approach any change to the source string invalidates all translations associated with it. This raises the cost of editing for developers, pushing them to never touch source messages once they are added, because every change requires updating all translations.
Second, it makes it harder to introduce several messages that share the same text in the source language but must be translated differently. For example, the text of an “Open” button and an “Open” label may be translated differently, because the first is a command and the second is a description. Gettext has an optional msgctxt context string for distinguishing between strings with the same source text. This approach makes developers responsible for spotting such cases, which goes against the principle of separation of concerns.
Fluent discourages reusing strings for exactly this reason. Decoupling the source string from its translations also makes it possible to introduce compound messages (which store several strings for a single translation unit tied to one user interface widget) and identifier-based references to messages.
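The two problems described above can be sketched in a few lines of Python. This is a toy model only: the catalog is a plain dictionary, and the `translate` helper, the identifiers, and the German strings are hypothetical illustrations rather than the gettext or Fluent API.

```python
# Toy model of a catalog keyed by the source string (not the real gettext API).
catalog_de = {"Open": "Öffnen"}

def translate(catalog, source):
    # Fall back to the source string when no translation exists.
    return catalog.get(source, source)

# Problem 1: editing the source string loses the existing translation.
before = translate(catalog_de, "Open")       # translation found
after = translate(catalog_de, "Open file")   # no match, falls back to English

# Problem 2: a button (command) and a label (description) share the key "Open",
# so one dictionary entry cannot hold two different translations.
# Keying by a unique identifier (hypothetical names) avoids both problems.
catalog_by_id_de = {
    "open-button": "Öffnen",  # command
    "open-label": "Offen",    # description of a state
}
```

With identifiers as keys, the English text can be reworded freely and both "Open" strings keep their own translations.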
Fluent establishes a social contract between developers and localizers. The developer introduces a unique identifier and a set of variables (the number of unread messages, the user name, etc.), and the localizer uses the Fluent syntax to decide how to build the message text for that identifier.
The developer does not need to worry about how such messages are translated. All the developer needs is to request a string by its identifier and receive a single string suitable for a specific place in the UI.
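As a rough illustration of this contract, the sketch below models each locale's message as a Python function that receives the variables and returns the final string. The names `MESSAGES`, `format_message`, and `unread-messages` are assumptions made for the example; real Fluent messages are written in FTL files, not in Python.

```python
# Each localizer decides how to build the text from the agreed variables.
def unread_en(args):
    count = args["count"]
    noun = "message" if count == 1 else "messages"
    return f"{args['user']} has {count} unread {noun}."

def unread_fr(args):
    count = args["count"]
    # French uses the singular form for both 0 and 1.
    phrase = "message non lu" if count < 2 else "messages non lus"
    return f"{args['user']} a {count} {phrase}."

# Hypothetical registry: locale -> identifier -> message builder.
MESSAGES = {
    "en": {"unread-messages": unread_en},
    "fr": {"unread-messages": unread_fr},
}

def format_message(locale, message_id, args):
    # The developer only supplies the identifier and the variables.
    return MESSAGES[locale][message_id](args)
```

The call site, for example `format_message("fr", "unread-messages", {"user": "Anna", "count": 0})`, is identical for every locale; only the localizer's side differs.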
Message variants
Gettext supports a small set of internationalization features, plurals in particular. But its plural syntax is a special case bolted onto the standard gettext syntax, and it is hard to extend to other cases that require variants.
Fluent supports the general concept of message variants, which are chosen with selectors. Usually the selector is a plural category, but depending on the grammatical features of the language it can be something else, such as gender, declension, or even an environment variable like the time of day or the operating system. Fluent syntax lets localizers account for all of these and craft text that fits the situation exactly.
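The idea of one selection mechanism serving many kinds of selectors can be sketched as follows. This is a simplified Python model, not Fluent's implementation: `select_variant`, `english_plural_category`, and the sample messages are hypothetical, and the plural helper covers only English.

```python
def select_variant(variants, default_key, selector_value):
    """Return the variant matching the selector value, else the default variant."""
    return variants.get(selector_value, variants[default_key])

# Variants keyed by operating system (an environment selector).
settings_label = {
    "windows": "Options",
    "macos": "Preferences",
    "other": "Settings",
}

def english_plural_category(n):
    # English distinguishes only the "one" and "other" plural categories.
    return "one" if n == 1 else "other"

# Variants keyed by plural category, using the same selection mechanism.
tabs_closed = {
    "one": "Closed {n} tab.",
    "other": "Closed {n} tabs.",
}

label = select_variant(settings_label, "other", "linux")
message = select_variant(tabs_closed, "other", english_plural_category(3)).format(n=3)
```

The same `select_variant` call handles both cases; only the selector value changes.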
External arguments
Gettext does not support external arguments. In other words, it gives you no way to control the formatting of parameters such as numbers and dates. To format parameters with gettext, the expected approach is to return a string that is later passed to printf, or to run String.prototype.replace on the resulting string.
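A minimal sketch of that pattern, using Python's standard `gettext` module with `NullTranslations` (the fallback catalog that returns the source string unchanged). The message text and values are made up for the example.

```python
import gettext

# NullTranslations returns the source string as-is when no catalog is loaded.
catalog = gettext.NullTranslations()

# Step 1: look up the template. The catalog knows nothing about the arguments.
template = catalog.gettext("Last backup: %(date)s, size: %(size)d MB")

# Step 2: format the arguments afterwards, in application code.
# How the date and number are rendered is decided here, not by the localizer.
text = template % {"date": "2019-03-01", "size": 1536}
```

Because formatting happens after the lookup, a translator can move the placeholders around but cannot choose a locale-appropriate date or number format.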
Support for external arguments is at the very core of Fluent's syntax. External arguments are not only interpolated; they can also serve as arguments to selectors and be passed to built-in functions. This lets localizers craft much more precise text for specific cases. On top of that, Fluent places FSI/PDI marks around interpolated values to isolate their directionality in bidirectional text, and forbids any manipulation of the final string, reducing the burden on developers.
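The isolation step can be illustrated with a short Python sketch. U+2068 (FIRST STRONG ISOLATE) and U+2069 (POP DIRECTIONAL ISOLATE) are the Unicode characters in question; the `isolate` and `interpolate` helpers are hypothetical and only approximate what a Fluent implementation does internally.

```python
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

def isolate(value):
    # Wrap the value so its direction cannot affect the surrounding text.
    return f"{FSI}{value}{PDI}"

def interpolate(pattern, **args):
    # Isolate every argument before inserting it into the pattern.
    isolated = {name: isolate(value) for name, value in args.items()}
    return pattern.format(**isolated)

text = interpolate("Welcome, {user}!", user="سارة")
```

The marks are invisible when rendered, but they keep a right-to-left name from reordering the punctuation of the left-to-right sentence around it.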
Isolation of responsibility
In addition, the way gettext handles plural rules requires the developer to decide whether a message will be a multi-variant message or a single string. From Fluent's point of view, the developer should not have to deal with such questions: in many cases where a single variant is enough in English, other languages need plural variants.
Fluent assumes that developers should not need this kind of linguistic knowledge when building software for many locales, and that each language must have some leeway in how it is localized.
As a result, Fluent keeps each translation isolated, without “leaking” the requirements of one language into another, and keeps all translations “opaque” to the developer, who does not need to worry about which localization features a given string may need.
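The choice gettext forces on the developer is visible at the call site. The sketch below uses Python's standard `gettext` module with `NullTranslations`, which returns the source strings unchanged; the message texts are invented for illustration.

```python
import gettext

catalog = gettext.NullTranslations()

def items_selected(n):
    # Single-string call: English needs only one form here, so the developer
    # picks gettext(). A language that must inflect this message by number
    # gets no plural variants for it.
    return catalog.gettext("Items selected: %d") % n

def files_removed(n):
    # Multi-variant call: the developer picks ngettext() because English
    # happens to have distinct singular and plural forms.
    return catalog.ngettext("%d file removed", "%d files removed", n) % n
```

In both functions the decision is driven by English grammar and fixed in code, which is exactly the kind of leakage between languages that Fluent tries to avoid.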
Translation invalidation
During the development cycle, there are three situations in which a translation becomes invalid with respect to its source:
- Minor change: does not affect the translation (a punctuation or spelling fix).
- Medium change: affects how the message is constructed, but does not make the associated translation wrong (for example, Show All Bookmarks -> Show Bookmarks Manager).
- Major change: the sentence takes on a new meaning (Click to save -> Click to open).
For architectural reasons, gettext merges all three levels into a single state called fuzzy. Any change to the source string, whether large or small, invalidates its translations.
In Fluent, unique identifiers make it possible to keep two of these levels separate from the third: if the developer makes a minor change to the source text of a message and keeps the identifier, the translations remain valid. If, on the other hand, the developer changes the identifier, all translations are invalidated and need updating.
We believe this architectural choice is more beneficial for most release cycles, although we recognize that for a medium change the developer has to choose between keeping and changing the identifier (that is, to treat it as either a minor or a major change).
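The identifier-based rule can be modeled in a few lines of Python. This is a toy sketch, not Fluent tooling: the `reconcile` function, the identifiers, and the French strings are all hypothetical.

```python
def reconcile(new_source, translations):
    """Split messages by whether their identifier survived the source update."""
    # A translation stays valid as long as its identifier still exists,
    # even if the source text behind that identifier was reworded.
    still_valid = {mid: text for mid, text in translations.items() if mid in new_source}
    # Identifiers without a translation need the localizer's attention.
    untranslated = [mid for mid in new_source if mid not in translations]
    return still_valid, untranslated

translations_fr = {
    "bookmarks-show": "Afficher tous les marque-pages",
    "save-hint": "Cliquez pour enregistrer",
}

# Medium change: text reworded, identifier kept -> translation stays valid.
# Major change: "save-hint" replaced by a new identifier -> needs translating.
new_source = {
    "bookmarks-show": "Show Bookmarks Manager",
    "open-hint": "Click to open",
}

still_valid, untranslated = reconcile(new_source, translations_fr)
```

Here the developer signals the severity of each change purely through the identifier: keeping it preserves the translation, replacing it discards the old one.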
We are also considering the idea of message versioning, so that the developer can mark a message as updated without fully invalidating its content. Such a state would let the translation remain valid, on the premise that an older version of the translation is still better than an untranslated string, while allowing tools to notify the localizer that the translation needs updating.
Data format
Gettext uses three file formats: *.po, *.pot, and *.mo. This affects how gettext fits into the production cycle, adding steps such as message extraction and compilation.
Fluent uses a single file format, *.ftl, which simplifies implementation and requires no additional steps that could lead to discrepancies in the data.
Unicode support
Gettext files can be encoded in UTF-8, but that is largely where its Unicode support ends. It uses its own data set for plurals, cannot format dates and numbers, and offers no help with bidirectional text.
Fluent makes heavy use of standardized Unicode libraries and algorithms such as CLDR, ICU, and ECMA-402, making it easy to combine localization and internationalization.
Conclusion
We believe that the Fluent API and syntax are a significant improvement over gettext, and we recommend them for software localization.
More about Fluent