Dialectics of Neural Machine Translation

or, does quantity turn into quality?

An article based on a talk given at the RIF+KIB 2017 conference.

Neural Machine Translation: why only now?

Neural networks have been discussed for a long time, and machine translation, one of the classic problems of artificial intelligence, would seem to be an obvious candidate for this technology.
Nevertheless, here is how search interest in neural networks in general, and in neural machine translation in particular, has changed over time:

It is clear that until recently neural machine translation was not on the radar at all. Then, at the end of 2016, several companies, including Google, Microsoft and SYSTRAN, demonstrated new machine translation technologies and systems based on neural networks. They appeared almost simultaneously, within weeks or even days of each other. Why is that?

To answer this question, we need to understand what neural machine translation is and how it fundamentally differs from the classical statistical and analytical systems used for machine translation today.

A neural translator is based on bidirectional recurrent neural networks built on matrix computations, which makes it possible to build significantly more complex probabilistic models than statistical machine translators can.

Like statistical translation, neural translation needs parallel corpora for training, which make it possible to compare the automatic translation with a reference “human” one. The difference is that during training it operates not on individual words and phrases but on whole sentences. The main problem is that training such a system requires significantly more computing power.

To speed up the process, developers use GPUs from NVIDIA, and Google also uses the Tensor Processing Unit (TPU), a proprietary chip designed specifically for machine learning. Graphics chips are optimized for matrix computations from the outset, so the performance gain over a CPU is 7 to 15 times.

Even so, training a single neural model takes 1 to 3 weeks, whereas a statistical model of roughly the same size is trained in 1 to 3 days, and the gap widens as the size grows.

However, technological problems were not the only brake on the development of neural networks for machine translation. After all, it was possible to train language models earlier too, albeit more slowly; there were no fundamental obstacles.

The fashion for neural networks also played its part. Many companies had in-house developments but were in no hurry to announce them, fearing that they would not deliver the quality gain that the public expects from the phrase “neural networks”. This may explain why several neural translators were announced one right after another.

Translation quality: whose BLEU score is bigger?

Let us try to find out whether the gain in translation quality matches the accumulated expectations and the higher costs of developing and maintaining neural networks for translation.
In its research, Google shows that neural machine translation gives a Relative Improvement of 58% to 87%, depending on the language pair, over the classical statistical approach (also called Phrase-Based Machine Translation, PBMT).

SYSTRAN runs a study in which translation quality is assessed by choosing among several presented options produced by various systems, as well as a “human” translation. The company states that its neural translation is preferred to the human translation in 46% of cases.

Translation quality: is there a breakthrough?

Although Google claims an improvement of 60% or even more, there is a small catch in this figure. The company's representatives speak of “Relative Improvement”, that is, how much closer the neural approach came to the quality of Human Translation compared with where the classical statistical translator stood.
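
To see why a relative figure can look much larger than the underlying gain, here is a minimal sketch. It assumes that relative improvement is the share of the gap between the statistical system and human translation that the neural system closes. The function names and the scores are invented for illustration and are not Google's numbers.

```python
def relative_improvement(pbmt: float, nmt: float, human: float) -> float:
    """Share of the PBMT-to-human quality gap closed by NMT (assumed definition)."""
    gap = human - pbmt
    if gap <= 0:
        raise ValueError("human score must be higher than the PBMT score")
    return (nmt - pbmt) / gap


def plain_improvement(pbmt: float, nmt: float) -> float:
    """Ordinary gain of NMT over PBMT, as a fraction of the PBMT score."""
    return (nmt - pbmt) / pbmt


# Hypothetical quality ratings, not measured data.
pbmt_score, nmt_score, human_score = 4.0, 4.6, 5.0

rel = relative_improvement(pbmt_score, nmt_score, human_score)
plain = plain_improvement(pbmt_score, nmt_score)
print(f"relative improvement: {rel:.0%}")
print(f"plain improvement:    {plain:.0%}")
```

With these made-up scores, the same change reads as 60% in relative terms and as 15% when measured as a plain gain over the statistical system.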

Industry experts who have analyzed Google's paper “Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation” are quite skeptical of the results presented. They say that the BLEU score actually improved by only 10%, and that significant progress is visible only on fairly simple tests from Wikipedia, which were most likely used in training the network.

At PROMT, we regularly compare our systems' translations of various texts with those of competitors, so we always have examples at hand for checking whether neural translation really surpasses the previous generation, as the vendors claim.

Original text (EN): Worrying never did anyone any good.
PBMT: Do not worry, did not do anything good to anyone.
NMT: Anxiety never helped anyone.

Incidentally, Translate.Ru renders the same phrase as “Excitement never did anyone good”, and you can see that this translation was and remains the same without any use of neural networks.

Microsoft Translator is not lagging behind here either. Unlike their colleagues at Google, they even built a website where you can translate a text and compare two results, neural and pre-neural, to see for yourself that the claims of improvement are not unfounded.

In this example we see that there is progress, and it is really noticeable. At first glance, the developers' claim that machine translation has almost caught up with “human” translation seems true. But is it really so, and what does it mean for the practical application of the technology in business?

In general, translation using neural networks is superior to statistical translation, and the technology has great potential for development. But a careful look shows that the progress is not across the board, and that neural networks cannot be applied to every task without regard for the task itself.

Machine translation: what are the tasks?

Throughout its entire history, which already spans more than 60 years, people have expected some kind of magic from the automatic translator, picturing it as a device from science fiction films that instantly translates any speech into alien whistling and back.

In fact, the tasks come at different levels. One of them is “universal” or, so to speak, “everyday” translation for daily needs and general understanding. Online translation services and many mobile products handle tasks of this level perfectly well.

These tasks include:

• quick translation of words and short texts for various purposes;
• automatic translation while communicating on forums, on social networks and in instant messengers;
• automatic translation when reading news and Wikipedia articles;
• a travel translator (mobile).

All the examples of improved translation quality with neural networks that we looked at above relate to precisely these tasks.

However, the goals and objectives of business with regard to machine translation are somewhat different. For example, here are some of the requirements placed on corporate machine translation systems:

• translation of business correspondence with customers, partners, investors and foreign employees;
• localization of websites, online stores, product descriptions and instructions;
• translation of user-generated content (reviews, forums, blogs);
• the ability to integrate translation into business processes, software products and services;
• translation accuracy, with adherence to terminology, confidentiality and security.

Let's use examples to see whether any business translation task can be solved with neural networks, and how.

Case: Amadeus

Amadeus is one of the world's largest global distribution systems for airline tickets. Air carriers are connected to it on one side, and on the other side are agencies, which must receive all information about changes in real time and pass it on to their clients.

The task is to localize the fare rules (Fare Rules), which the booking system generates automatically from various sources. These rules are always in English. Manual translation is practically impossible here, because there is a great deal of information and it changes often. A ticket sales agent would like to read the Fare Rules in Russian in order to advise clients promptly and competently.

What is required is an understandable translation that conveys the meaning of the fare rules and takes standard terms and abbreviations into account. The automatic translation must also be integrated directly into the Amadeus booking system.

→ The task and implementation of the project are detailed in the document.

Let's compare the translation produced through the PROMT Cloud API, integrated into Amadeus Fare Rules Translator, with the “neural” translation from Google.

Original: ROUND TRIP INSTANT PURCHASE FARES

PROMT (Analytical Approach): TARIFFS FOR INSTANT PURCHASE OF THE ROUTE OF THERE AND BACK

GNMT: ROUND SHOPPING

Clearly, the neural translator cannot cope here, and the reason will become apparent a little further on.

Case: TripAdvisor

TripAdvisor is one of the world's largest travel services and needs no introduction. According to an article in The Telegraph, 165,600 new reviews of various tourist sites appear on the site every day, in different languages.

The task is to translate tourists' reviews from English into Russian with quality sufficient to understand the meaning of each review. The main difficulty lies in the typical features of user-generated content: texts with errors, misspellings and omitted words.

Part of the task was also an automatic assessment of translation quality before publication on TripAdvisor. Since manually evaluating all translated content is impossible, the machine translation solution has to provide an automatic quality estimate for each translated text, a confidence score, so that TripAdvisor can publish only high-quality translated reviews.
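
The gating step described here can be sketched as a simple threshold filter. This is only an illustration: how the confidence score is actually computed is not covered in this article, and the `TranslatedReview` type, the 0 to 1 score range, the 0.7 threshold and the sample data are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class TranslatedReview:
    source: str
    translation: str
    confidence: float  # assumed to lie in [0, 1]; higher means more reliable


def select_publishable(reviews, threshold=0.7):
    """Keep only translations whose confidence reaches the threshold."""
    return [r for r in reviews if r.confidence >= threshold]


# Hypothetical reviews and scores, for illustration only.
reviews = [
    TranslatedReview("Great food, friendly staff.", "<translation 1>", 0.92),
    TranslatedReview("gr8 plce wud def go agn!!", "<translation 2>", 0.41),
]

for review in select_publishable(reviews):
    print(review.source)
```

Reviews that fall below the threshold are simply left unpublished in translation, which is how a system can trade coverage for quality without manual review.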

→ Read more about the project on the company's website.

The solution uses the PROMT DeepHybrid technology, which produces a better and more understandable translation for the end reader, in part through statistical post-editing of the translation results.

Let's look at examples:

Original: We have been there for a night. The service was attentive without being over bearing.

PROMT (Hybrid Translation): We ate there on the last evening by chance, and it was a wonderful meal. The staff was attentive but not overbearing.

GNMT: We ate there on a whim last night, and it was a wonderful meal. The service was attentive, not being more bearings.

Here the quality is not as dismal as in the previous example. In general, given its parameters, this task can potentially be solved with neural networks, which could improve translation quality further.

Problems of using NMT for business

As mentioned earlier, a “universal” translator does not always deliver acceptable quality and cannot support specific terminology. To integrate neural networks for translation into your processes, several basic requirements must be met:

• Sufficient volumes of parallel texts to train the neural network. Often the customer simply has too few of them, or texts on the subject do not exist at all. They may also be classified, or in a state poorly suited to automatic processing.

Building a model requires a corpus of at least 100 million tokens (word occurrences), and a translation of more or less acceptable quality requires 500 million tokens. Not every company has that much material.

• A mechanism or algorithms for automatically assessing the quality of the result.

• Sufficient computing power.
A “universal” neural translator often falls short on quality, and deploying a private neural network that delivers acceptable quality and speed requires a “small cloud”.

• Privacy: it is not clear how to handle it.
Not every customer is willing to send content to the cloud for translation, for security reasons, and NMT is primarily a cloud story.
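
The corpus-size requirement in the first item can be checked roughly before any training starts. The sketch below is an assumption-laden illustration: it counts whitespace-separated tokens on one side of a parallel corpus, which is a crude tokenization, and the function names are hypothetical. Only the 100 million and 500 million figures come from the text.

```python
MIN_TOKENS = 100_000_000     # minimum named in the text for building a model
TARGET_TOKENS = 500_000_000  # volume named for more or less acceptable quality


def count_tokens(lines):
    """Count whitespace-separated tokens (a crude, assumed tokenization)."""
    return sum(len(line.split()) for line in lines)


def corpus_readiness(token_count):
    """Classify a corpus size against the two thresholds from the text."""
    if token_count < MIN_TOKENS:
        return "too small to build a model"
    if token_count < TARGET_TOKENS:
        return "enough to build a model, quality likely below acceptable"
    return "enough for more or less acceptable quality"


# Tiny stand-in for one side of a parallel corpus; any iterable of
# strings, such as an open text file, could be passed instead.
sample = [
    "ROUND TRIP INSTANT PURCHASE FARES",
    "These rules are always in English.",
]
tokens = count_tokens(sample)
print(tokens, "->", corpus_readiness(tokens))
```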

Conclusions

• In general, neural machine translation gives a higher-quality result than the “pure” statistical approach;
• Machine translation through a neural network is better suited to the “universal translation” task;
• No approach to MT is in itself an ideal universal tool for every translation task;
• For business translation tasks, only specialized solutions can guarantee that all requirements are met.

We arrive at an entirely obvious and logical conclusion: for your translation tasks you should use the translator that suits them best. Whether there is a neural network inside does not matter. Understanding the task itself is more important.

Source: https://habr.com/ru/post/330654/
