After Google closed its translation API, finding an online machine translation service became a pressing problem.
Plenty of well-known translation services exist on the Internet: Promt, Pragma, and so on. Emulating calls to their pages from PHP and fetching the translation result is not hard. The catch is that almost all of them answer a simple GET or POST request not with the translation itself but with the entire page in all its glory, starting from the DTD. As we say in Ukraine, “bad nema”.
An analysis showed that only two services return just the translation in response to a request: Yandex and Microsoft's Bing.
To jump ahead, here are their areas of application and their quirks:
Yandex is easier to use and translates from and into Russian very well, but it has a drawback: it translates only from Russian or only into Russian. Yandex cannot translate from Ukrainian into English in one operation.
Bing does not have this limitation, but:
- translations that involve Russian or Ukrainian have a strong “accent” and always need editing
- using Bing in free mode has some limitations
- Bing requires a web application identifier, the appID. Getting one involves no legal difficulties, since it is really just a registration, but it is a long and fascinating quest.
So, what tasks should a translation library or class solve?
1. Getting the languages that can be translated from and into, and their valid combinations
2. The translation itself
One remark right away. Common sense says that translating "War and Peace" in one go will not work. At the technical level the limit is more concrete: the Yandex translator uses GET requests, which means, very roughly, about 2000 characters at a time and no more. That is quite little, about two short paragraphs of text; even a small publication on a site will exceed it.
Hence the following task:
3. Translation of large text fragments.
Now imagine a multilingual site. Calling the translator every time to translate interface elements and other texts on the site is, to put it mildly, unwise. Hence the task:
4. Caching.
Caching serves one more purpose: the Yandex translator is good but not perfect, especially given the richness of the Russian language. You will often want to correct a translation result, and for that it has to be stored somewhere.
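To make the caching idea concrete, here is a minimal in-memory sketch. It is not the caching code from the library: the class name Translation_Cache_Sketch and its methods are hypothetical, persistent storage is left out, and a stub function stands in for the real service call.

```php
<?php
// Hypothetical sketch: an in-memory cache for translation results.
class Translation_Cache_Sketch
{
    private $items = array();

    // Build a key from the language pair and the source text.
    private function key($from, $to, $text)
    {
        return md5($from . '|' . $to . '|' . $text);
    }

    // Return the cached translation, or call $translate and remember the result.
    public function get($from, $to, $text, $translate)
    {
        $key = $this->key($from, $to, $text);
        if (!isset($this->items[$key])) {
            $this->items[$key] = call_user_func($translate, $from, $to, $text);
        }
        return $this->items[$key];
    }

    // Overwrite a cached translation with a manually corrected one.
    public function correct($from, $to, $text, $betterTranslation)
    {
        $this->items[$this->key($from, $to, $text)] = $betterTranslation;
    }
}

// Stub translator (an assumption for the demo): counts calls and uppercases the text.
$calls = 0;
$stubTranslate = function ($from, $to, $text) use (&$calls) {
    $calls++;
    return strtoupper($text);
};

$cache = new Translation_Cache_Sketch();
$first = $cache->get('ru', 'uk', 'hello', $stubTranslate);
$second = $cache->get('ru', 'uk', 'hello', $stubTranslate); // served from the cache
$cache->correct('ru', 'uk', 'hello', 'Hello!');
$third = $cache->get('ru', 'uk', 'hello', $stubTranslate);  // the corrected version
```

With the real class, the callable could presumably be array($translator, 'yandexTranslate'), since that method takes the same three arguments.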
So, Yandex.Translate
The sources are available in the Google repository and are documented in Russian.
1. Translation languages
The Yandex_Translate class contains three methods with self-explanatory names:
yandexGetLangsPairs() - returns the available FROM->TO language pairs
yandexGet_FROM_Langs()
yandexGet_TO_Langs()
An example (this one is complete; in the later examples the file include, the creation of the class instance, output formatting and so on are omitted):
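<?php
// Load the class from the repository.
include_once 'Yandex_Translate.php';
// Create the translator instance (assumes the constructor needs no arguments).
$translator = new Yandex_Translate();
// Fetch the available FROM-TO language pairs and print them.
$pairs = $translator->yandexGetLangsPairs();
print_r($pairs);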
We get these combinations (by the way, they change from time to time):
[0] => en-ru
[1] => ru-en
[2] => ru-uk
[3] => uk-ru
[4] => pl-ru
[5] => ru-pl
[6] => tr-ru
[7] => ru-tr
[8] => de-ru
[9] => ru-de
[10] => fr-ru
[11] => ru-fr
[12] => it-ru
[13] => es-ru
[14] => ru-es
Note that every pair includes the ru language, as mentioned above.
The other two methods return the languages separately and can be used, for example, to build select elements or other selection controls.
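As an illustration of building selection controls, the sketch below derives FROM and TO lists from pair codes like those printed above and renders them as HTML selects. The helper names splitLangPairs and buildLangSelect are hypothetical and not part of Yandex_Translate; the return format of yandexGet_FROM_Langs() and yandexGet_TO_Langs() is not shown here, so the sketch works from the pairs array only.

```php
<?php
// Hypothetical helper: split pair codes such as "en-ru" into unique FROM and TO lists.
function splitLangPairs(array $pairs)
{
    $from = array();
    $to = array();
    foreach ($pairs as $pair) {
        // Each pair is assumed to have the form "from-to".
        list($src, $dst) = explode('-', $pair, 2);
        $from[$src] = true;
        $to[$dst] = true;
    }
    return array(array_keys($from), array_keys($to));
}

// Hypothetical helper: render a list of language codes as an HTML <select>.
function buildLangSelect($name, array $langs)
{
    $html = '<select name="' . htmlspecialchars($name) . '">';
    foreach ($langs as $lang) {
        $code = htmlspecialchars($lang);
        $html .= '<option value="' . $code . '">' . $code . '</option>';
    }
    return $html . '</select>';
}

// A few pairs taken from the output above.
$pairs = array('en-ru', 'ru-en', 'ru-uk', 'uk-ru');
list($fromLangs, $toLangs) = splitLangPairs($pairs);
echo buildLangSelect('from', $fromLangs), "\n";
echo buildLangSelect('to', $toLangs), "\n";
```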
2. Translation
There is one method with three arguments: the source language, the target language and the text to translate.
Also pay attention to the important eolSymbol property, the end-of-line symbol. If it is set incorrectly, the output text will lose its formatting (see the comments in the source).
Example:
$text = file_get_contents('text.txt');
$translatedText = $translator->yandexTranslate('ru', 'uk', $text);
echo $translatedText;
Beginning of the text.txt file:
Mario Puzo Godfather
Dedicated to Anthony Cleary
BOOK ONE
Behind every great fortune lies a crime.
Script execution result:
Mario p'yuzo of the Epiphany
Attach to Entrance Clips
FIRST BOOK
Behind a great country of skin, curse evil-doers.
Note right away: the translation is good, but editing is required.
3. Translation of large texts
Large texts are handled by the abstract class Big_Text_Translate.
The principle is as follows.
First, the text is split into sentences using the sentendersDelimiter separator, which defaults to a period.
Strictly speaking, a period followed by a space would be more correct, but in real text, in comments for example, the space after a period is often missing. In practice the default causes no problems, and the property can be overridden.
Then the sentences are collected into text fragments whose size does not exceed the symbolLimit value, 2000 by default.
The fragments are ready for translation, and the meaning and formatting are preserved. The fragments are created by the static toBigPieces method, which returns an array.
The fromBigPieces method stitches the translated fragments back into a single text.
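The principle described above can be sketched in a few lines. This is not the code of Big_Text_Translate: the functions toPiecesSketch and fromPiecesSketch are hypothetical stand-ins written only to show the idea of splitting on a delimiter and packing sentences up to a size limit.

```php
<?php
// Hypothetical sketch: pack sentences into pieces no longer than $limit.
// strlen() counts bytes, so for UTF-8 text this is a conservative limit.
// A single sentence longer than $limit is kept whole as its own piece.
function toPiecesSketch($text, $delimiter = '.', $limit = 2000)
{
    $pieces = array();
    $current = '';
    $sentences = explode($delimiter, $text);
    $last = count($sentences) - 1;
    foreach ($sentences as $i => $sentence) {
        // Put the delimiter back on every sentence except the trailing remainder.
        if ($i < $last) {
            $sentence .= $delimiter;
        }
        // Start a new piece when adding this sentence would exceed the limit.
        if ($current !== '' && strlen($current . $sentence) > $limit) {
            $pieces[] = $current;
            $current = '';
        }
        $current .= $sentence;
    }
    if ($current !== '') {
        $pieces[] = $current;
    }
    return $pieces;
}

// Hypothetical sketch: join the (translated) pieces back into one text.
function fromPiecesSketch(array $pieces)
{
    return implode('', $pieces);
}

// A tiny limit is used here only to make the splitting visible.
$text = 'One. Two. Three.';
$pieces = toPiecesSketch($text, '.', 10);
print_r($pieces);
echo fromPiecesSketch($pieces), "\n";
```

Because every delimiter is put back before packing, joining the untranslated pieces reproduces the original text exactly.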
Example
$bigText = file_get_contents('text_big.txt');
// Split the text into fragments that fit the request size limit.
$textArray = Big_Text_Translate::toBigPieces($bigText);
$numberOfTextItems = count($textArray);
$translatedArray = array();
foreach ($textArray as $key => $textItem) {
    // Show progress: fragment index and total number of fragments.
    echo ' '.$key.' '.$numberOfTextItems;
    flush();
    $translatedItem = $translator->yandexTranslate('ru', 'uk', $textItem);
    $translatedArray[$key] = $translatedItem;
}
// Join the translated fragments back into one text.
$translatedBigText = Big_Text_Translate::fromBigPieces($translatedArray);
echo $translatedBigText;
Try this example yourself - everything is in the repository.
Dear Habr readers! If this material is of interest, a continuation is in preparation, with sections on:
- caching translation results in several levels
- work with the Bing service
- full demo: building a multilingual site.