
Localization and internationalization of applications have been discussed on Habr many times. We at ABBYY Language Services have long worked in linguistic services and technologies, and we localize software all the time. We have built up considerable experience in this area and decided to share it, with an emphasis on how to organize the whole process. Localizing an application is harder than is commonly believed, and there are various ways to approach it: you can write simple and clear source text from the start, you can invest in top-notch translators who can extract meaning from any text, or you can prepare and translate the text "somehow" and then rely on a user community or testers to check the final result. Just keep in mind that the source text is checked in one language, while the result has to be checked in every target language, i.e. N times the effort.
In effect, localization means opening another market, and naturally management expects it to bring additional profit. At the same time, localization often receives only a small share of the total development budget (say, about 1-2%). In other words, the expectation is that adding 1% to the budget will bring +50% in revenue. How realistic is that?
Goals and scope

The first thing to decide is why localization is needed, into which languages, how much content, and at what quality level. Usually the goal is a business one: launching the product in new markets and expanding the audience. There are also cases when the product will actually be sold in the original language, but local law requires a translated version (for example, of various user guides).
Overall, localization requires active interaction between the client and the translators. Therefore, if a language is translated without real support in the local market (a local office, a user community), and none is expected in the near future, there is a chance that all the work will have to be redone.
Speaking about the translation of software, we can distinguish the following main types of content:
- The strings of the application itself;
- User's manual;
- Product Help.
Each subsequent item involves more text to translate. The application itself may contain 10–100 thousand words, various guides and training courses another 200–500 thousand, and full product help up to 1-3 million words (all of this, of course, depends on the project).
If the budget only covers the application itself, everything else can be left out, although a large product without a translated user manual may be practically useless.
If everything is localized, it is important to remember that both the user manual and the product help are based on the application strings. Therefore, vague terminology, poorly chosen names for interface elements, and errors and typos in the application strings are reproduced in all the other materials, often multiplied. On top of that, the software and the documentation may become inconsistent, with all the ensuing consequences for quality, timing, and cost.
Organization of the process
So, let's say the decision has been made: localization is needed.
Right away, you need to think about the technical implementation. That is, will the strings be extracted from the product, or will the translation be done directly in the source materials, etc.
Possible options for software (not an exhaustive list):
- The strings are embedded directly in the program's source code. Implementation options:
- The source files are localized and maintained as separate copies, and the localized version is compiled in parallel with the main code. Disadvantages: supporting the builds and the product itself can be costly, since bugs have to be fixed in several versions: the original one and each translated one.
- A dedicated translation mechanism takes the source string (possibly together with an additional identifier, such as the program module) and returns its translation at runtime or when the local version is built. This also requires a mechanism for extracting the strings (parsing). Translations can be stored either in separate files or in a database. Pros: translations can be corrected quickly.
- Strings are separated from the code.
Strings can be placed in separate files (for example, resx, po, txt) or in a database. The code then uses a string identifier (not necessarily unique) and a localization mechanism that returns the translation. Again, there are two ways to go: add the translated strings at the build stage, or implement a separate runtime mechanism. The essential difference from the first option is that bugs only need to be fixed in the code itself.
Once the text resources have been separated out, you need to decide how they will be stored: in a specialized database, as text files on a server, etc.
The next question to ask yourself is: "How will updates be translated and new versions released?" Usually all of this can be automated, but it is better to think about it from the very beginning.
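A minimal sketch of the runtime lookup described in both options, assuming an in-memory catalog keyed by (module, key), where the key is either the source string itself (first option) or a string identifier (second option). The names `CATALOGS` and `translate` and the sample entries are hypothetical; a real project would load the catalogs from .po/.resx files or a database.

```python
# Hypothetical catalogs: language -> {(module, key): translation}.
# A real system would load these from .po/.resx files or a database.
CATALOGS = {
    "de": {
        # First option: the source string itself is the lookup key.
        ("invoices", "Invoice not found."): "Rechnung nicht gefunden.",
        # Second option: a string identifier is the lookup key.
        ("invoices", "error.invoice_not_found"): "Rechnung nicht gefunden.",
    },
}


def translate(key, module="", lang="en", default=None):
    """Return the translation of `key` for `lang`, or a fallback.

    The fallback is `default` if given, otherwise the key itself
    (which is the source text when source strings are used as keys).
    """
    fallback = key if default is None else default
    return CATALOGS.get(lang, {}).get((module, key), fallback)


if __name__ == "__main__":
    print(translate("Invoice not found.", module="invoices", lang="de"))
    print(translate("error.invoice_not_found", module="invoices", lang="fr",
                    default="Invoice not found."))
```

Including the module in the key lets the same source string receive different translations in different parts of the product, and falling back to the source text keeps the UI usable when a translation is missing.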
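One common piece of such automation is comparing the string tables of two consecutive releases to find what needs translating. A minimal sketch, assuming each release's strings are available as a `{key: source_text}` dictionary; the function name and the sample keys are hypothetical.

```python
def diff_catalogs(old, new):
    """Compare two {key: source_text} tables from consecutive releases.

    Returns (added, changed, removed) as sorted key lists:
    added and changed keys need (re)translation, removed keys
    can be retired from the translated catalogs.
    """
    added = sorted(k for k in new if k not in old)
    changed = sorted(k for k in new if k in old and new[k] != old[k])
    removed = sorted(k for k in old if k not in new)
    return added, changed, removed


if __name__ == "__main__":
    old = {"error.invoice_not_found": "Invoice cannot be found.",
           "label.on_hand": "Qty on Hand"}
    new = {"error.invoice_not_found": "Invoice not found.",
           "label.start_date": "Start Date"}
    print(diff_catalogs(old, new))
```

Only the added and changed strings are sent to translators, which keeps the cost of each update proportional to what actually changed.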
Content

The key point at this stage is that any mistake made while preparing the content is automatically multiplied by the number of target languages (N). Take any poorly worded phrase.
At best, N translators will ask the translation project manager (TPM) to clarify the phrase. The TPM is not necessarily a product specialist: this is the person who organizes the process. They will inevitably need more time to understand the situation than the people who actually create the content.
In the worst case, N translators, so as not to waste time, will translate the phrase however they see fit. As a result, the translation will have to be corrected in N languages, involving not only the TPM but also testers to confirm that each edit is correct. After that, N × X customers (X for each of the N languages) will spend time installing the update.
Terminology
Content can be roughly divided into two components: terminology (the foundation) and the rest of the strings (the building itself). Terminology is where you need to put in maximum effort to get a quality translation. You need to compile a list of the most frequently used terms, add the most complex concepts to it, and make sure that all terms are one-to-one: one concept, one term. Any ambiguity is a headache for translators.
After the list of terms has been created in the source language, it must be translated into all target languages and verified by local specialists (this is a very important point). This step, including the verification, must never be skipped. While it is in progress, you can work on the product strings themselves.
Product strings
There are many ways to create application content. The specific implementation depends on the development process, how the strings are stored, and other factors.
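The one-to-one rule can be checked mechanically once terms are tagged with the concept they denote. A minimal sketch, assuming the glossary (or the UI strings) can be exported as (concept_id, term) pairs; the function name and the concept IDs below are hypothetical. It reports synonyms (one concept, several terms) and homonyms (one term, several concepts).

```python
from collections import defaultdict


def check_one_to_one(entries):
    """Check that each concept has exactly one term and vice versa.

    `entries` is an iterable of (concept_id, term) pairs.
    Returns (synonyms, homonyms):
      synonyms: concept_id -> sorted list of the different terms used for it
      homonyms: term -> sorted list of the different concepts it names
    Terms are compared case-insensitively.
    """
    terms_by_concept = defaultdict(set)
    concepts_by_term = defaultdict(set)
    for concept, term in entries:
        key = term.strip().lower()
        terms_by_concept[concept].add(key)
        concepts_by_term[key].add(concept)
    synonyms = {c: sorted(t) for c, t in terms_by_concept.items() if len(t) > 1}
    homonyms = {t: sorted(c) for t, c in concepts_by_term.items() if len(c) > 1}
    return synonyms, homonyms


if __name__ == "__main__":
    glossary = [
        ("stock_level", "On Hand Quantity"),
        ("stock_level", "Qty on Hand"),
        ("sales_doc", "Order"),
        ("purchase_doc", "Order"),
    ]
    print(check_one_to_one(glossary))
```

Both kinds of findings need a human decision; the check only makes them visible before the term list goes to translators.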
A common scenario: the designer works with the client (or product marketing) and writes the initial product specification.
Next, the programmer writes the code, sometimes copying the names of interface elements from the specification and sometimes coming up with their own, errors included.
Then the tester checks how the program works, finds functional bugs, and makes recommendations, which leads to interface elements and messages that were not in the original design specification. There is usually no time or resources to bring the specification in line with the new version of the code.
The whole cycle can be repeated several times. The problem gets worse in large international companies, where the main language of the application and of development, which is also the basis for localization, is not the native language of one or more participants in the chain. Moreover, all the participants may speak different languages: the product is created in English, the designer is Russian, the programmer is Chinese, and the tester is Mexican.
Add two more problems to this:
- Not every programmer (indeed, not every person) can formulate their thoughts clearly and competently, even in their native language (alas, that's how it is).
- The development team is immersed in its own specialized domain, where everything is clear to them, and uses its own slang.

The result of such a process can be a set of strings with incomprehensible terminology and a certain number of errors. Sometimes the company has a terminologist and a technical writing department that review the content. Ideally, this review happens at the specification stage, before the strings get into the code; otherwise, programmers have to redo their work. Sometimes technical writers only describe the finished application, and the quality of the strings is left to the developers' conscience.
A well-developed style guide (example) plays a big role. And, by the way, having a technical writing department does not guarantee that a style guide exists.
Consider the case where the strings do undergo some kind of review.
- Strings are separated from the code into text files, a database, or another specialized internal tool. Technical writers can then quickly fix errors directly in the source storage, and their edits will most likely not break the build.
- Strings are collected into a single block by a special parsing procedure and only then sent for review. In this case, after the strings are corrected, extra work is needed to update the source code.
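A minimal sketch of such a parsing procedure, assuming the code wraps every user-visible string in a marker call `tr("...")` written on a single line; the marker name and the helper are hypothetical. Recording line numbers makes it easier to carry reviewed strings back into the source. Dedicated tools such as GNU xgettext do this job properly for real projects (multi-line calls, escapes, plural forms).

```python
import re

# Hypothetical marker: UI strings appear in the code as tr("...").
# The group captures the literal's contents; escape sequences such as \"
# are returned as written, not unescaped.
TR_CALL = re.compile(r'\btr\(\s*"((?:[^"\\]|\\.)*)"\s*\)')


def extract_strings(source_code):
    """Return (line_number, string) pairs for every tr("...") call, in order.

    Only calls that fit on one line are found; this is a sketch,
    not a full parser of the host language.
    """
    found = []
    for lineno, line in enumerate(source_code.splitlines(), start=1):
        for match in TR_CALL.finditer(line):
            found.append((lineno, match.group(1)))
    return found


if __name__ == "__main__":
    code = ('msg = tr("Invoice not found.")\n'
            'x = 1\n'
            'show(tr("Start date cannot be later than end date."))\n')
    for lineno, text in extract_strings(code):
        print(lineno, text)
```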
Aspects of source content verification
- Context is needed. If strings are separated from the code, technical writers do not always see the context in which they are used, and they do not always make the effort to fix elements they do not understand, limiting themselves to formally following the rules of the language.
- It is highly desirable to compare new strings with each other and with existing ones, so that the same terminology and similar sentence structures are used. With a small product you can still keep track of all the details, especially if you have been working on the content from the very beginning. In large products with several development teams (especially multilingual ones), such manual comparison is ineffective, and no single technical writer can cover the entire product.
There are many possible errors. For example, within one functional area responsible for a specific feature (especially a feature specific to a country whose language differs from the development language), terms may be used that mean something completely different in another module. Or the other way round: different entities in different functional parts of the application may have the same name. This confuses customers as well as the developers and technical writers who will eventually describe the system.
Real examples of non-standardized content:
• Qty. On hand
• OnHand Qty
• On-hand Quantity
• On Hand Qty.
• Qty on Hand
• Quantity On Hand
• On Hand Qty
• On Hand Quantity
• Qty On Hand
• Start Date cannot be greater than the End Date.
• Due Date cannot be before Start Date.
• End Date Cannot Be Before The Start Date.
• Start date may not be greater than the End Date.
• From Date cannot be greater than To Date.
• The Star date cannot be greater than the End date.
• can not be greater than the end date.
• From Date cannont be greater than To Date.
• From Date cannot be later than To Date.
• Start date cannot be greater then End date.
• The Start date cannot be greater than the End date.
• Begin Date may not be greater than the End Date.
• Invoice not found.
• Invoice cannot be found.
• Unable to find invoice.
• Invoice was not found.
The strings in each group above mean the same thing, but they were written by different programmers in different departments. Instead of one wording used everywhere, we have about ten variants, and each of them has to be paid for in translation.
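Some of these duplicates can be found automatically. A minimal sketch, assuming a hypothetical normalization that lowercases text, drops punctuation, expands a few abbreviations taken from a style guide, and ignores word order; strings with the same normalized form are reported as candidate duplicates. Ignoring word order catches "Qty on Hand" versus "On Hand Qty", but it can also merge strings whose meaning depends on order, and it will not catch genuine rephrasings such as "Invoice not found." versus "Unable to find invoice.", so the output is a list for human review.

```python
import re
from collections import defaultdict

# Hypothetical normalization rules; a real project would derive them
# from its style guide and terminology list.
EXPANSIONS = {"qty": "quantity", "onhand": "on hand"}


def normalize(text):
    """Reduce a string to a sorted tuple of lowercase, expanded words."""
    words = re.sub(r"[^a-z0-9]+", " ", text.lower()).split()
    expanded = " ".join(EXPANSIONS.get(w, w) for w in words).split()
    return tuple(sorted(expanded))


def group_variants(strings):
    """Group strings with identical normalized forms; keep groups of 2+."""
    groups = defaultdict(list)
    for s in strings:
        groups[normalize(s)].append(s)
    return [g for g in groups.values() if len(g) > 1]


if __name__ == "__main__":
    samples = ["Qty. On hand", "OnHand Qty", "On-hand Quantity", "Qty on Hand",
               "Invoice not found.", "Unable to find invoice."]
    for group in group_variants(samples):
        print(group)
```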
Controlled Language System
One way to simplify content is to implement a Controlled Language (CL) system. Its main idea is to unify the language and terminology of the application, align them with the company's style guide, and automate the verification. It imposes specific constraints: for example, a restricted vocabulary, a limited set of grammatical structures, a maximum sentence length, etc.
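To illustrate what such automated verification might look like, here is a minimal sketch of a rule checker. The rules (a word limit and a table of discouraged wordings with their preferred replacements) and all names are hypothetical examples, not taken from any real CL product or style guide.

```python
import re

# Hypothetical Controlled Language rules, for illustration only;
# real rules would come from the company's style guide.
MAX_WORDS = 12
PREFERRED = {
    "qty": "quantity",
    "can not": "cannot",
    "may not": "cannot",
    "greater than": "later than",  # intended for date comparisons
}


def check_string(text):
    """Return a list of rule violations found in one UI string."""
    issues = []
    words = re.findall(r"[A-Za-z']+", text)
    if len(words) > MAX_WORDS:
        issues.append(f"too long: {len(words)} words (max {MAX_WORDS})")
    lowered = text.lower()
    for discouraged, preferred in PREFERRED.items():
        # Whole-word match, so "cannot" does not trigger the "can not" rule.
        if re.search(r"\b" + re.escape(discouraged) + r"\b", lowered):
            issues.append(f'use "{preferred}" instead of "{discouraged}"')
    return issues


if __name__ == "__main__":
    for s in ["Begin Date may not be greater than the End Date.",
              "Start date cannot be later than the end date."]:
        print(s, check_string(s))
```

Running such a check when strings are committed catches deviations before they reach translators, instead of paying for them N times.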
The system provides the following benefits:
- The text is easier to understand and the product is easier to use, which benefits customers who use the program in its original language.
- Translation quality improves: translators make fewer errors with plain text.
- Translation costs go down: with a limited set of words, phrases, and constructions, the share of partial (fuzzy) matches, as well as 100% matches and repetitions, increases.
- The quality of supporting content improves, both because it rests on a better "foundation" of software strings and because the CL system is applied to it directly.
- Machine translation quality improves: CL removes ambiguity, which is one of the main problems affecting MT quality.
A Controlled Language can be deployed in-house or handed over to contractors, for example simply as a task of rewriting English into Simplified English.
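To see why a restricted vocabulary raises the share of partial matches, here is a minimal sketch that scores a new string against a translation memory (TM). It uses the word-level similarity ratio from Python's `difflib.SequenceMatcher` as a stand-in; commercial CAT tools compute match percentages with their own algorithms, so the numbers are only indicative. The function names are hypothetical.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase word tokens; punctuation is ignored."""
    return re.findall(r"\w+", text.lower())


def best_match(segment, memory):
    """Return (score_percent, tm_entry) for the closest entry in `memory`.

    The score is difflib's word-level similarity ratio scaled to 0-100.
    Returns (0, None) if nothing in `memory` shares any words.
    """
    best = (0, None)
    seg = tokenize(segment)
    for entry in memory:
        score = round(100 * SequenceMatcher(None, seg, tokenize(entry)).ratio())
        if score > best[0]:
            best = (score, entry)
    return best


if __name__ == "__main__":
    tm = ["Invoice not found.", "Start date cannot be later than end date."]
    print(best_match("Invoice not found.", tm))
    print(best_match("Invoice was not found.", tm))
```

The more consistently strings are worded, the more new strings land at or near 100% against the memory, and such matches are usually billed at a reduced rate.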
Summary

Thus, the application content (the software strings) must be reviewed thoroughly and checked against a specific set of rules for the industry in general and the company in particular. Content should be understandable to as many people as possible, including those without in-depth knowledge of the product.
Translation and quality control deserve a separate article, as they raise a whole range of questions: from whether to trust automated checks or always involve local users, to choosing a translation management system and maintaining a pool of translators.
Posted by fridge.
From the life of blog editors
Just before publication, we received the following review from one of the authors:
"The programmer said there could have been a few more pictures, and at least one of them with nudity, but I suppose that's not an option :). Otherwise, no major comments."
You can't say no to a programmer, but nudity is too much. So here you go:
