Lingvo API: ABBYY Dictionaries in the Windows Azure Cloud

We doubt that readers of our blog need a detailed introduction to the ABBYY Lingvo dictionary. ABBYY launched the product 27 years ago. At first the dictionary could only be used on computers; mobile applications and online services came later. Recently we opened access to the Lingvo dictionaries for third-party developers at https://developers.lingvolive.com, for now as a free beta.

Below the cut, we describe how we built this service and how it can be used.

The cloud Lingvo API fits many scenarios. For example, it can add word translation to an application where translation is not the core feature: a mobile app for travelers, a game, an e-book, or a reading app for smartphones. It can also add translation to a web application for learning foreign languages.
The service lets you integrate the following into your products: full or short translations of words, definitions, spelling correction and suggestions, full-text search across the entries of the available dictionaries, display of an entry from a specific dictionary, word forms, and pronunciation. Developers get access to 140 dictionaries on various subjects for 15 languages; morphology works for all of them. Russian and Ukrainian are paired with European languages (English, German, French, Spanish, Italian, Portuguese, Latin, Polish, Greek, Danish), as well as with Chinese, Kazakh, and Tatar. General-vocabulary and explanatory dictionaries are available, along with thematic dictionaries in the following areas: accounting and auditing, law, programming, and management.

Translation happens entirely on the service side; a client program only needs to be able to send HTTP requests.

The Lingvo API is open for integration into mobile (iOS, Android), desktop (Windows, Linux, Mac, etc.), and web applications, free of charge with a limit of 50,000 characters per day. In the future the service will be billed by usage. The licensing model is still being worked out, so we cannot yet say more about what it will look like.

Service API

To start using the Lingvo API, register at https://developers.lingvolive.com with your email address or your Facebook, VKontakte, or Google account, or log in to the site with your Lingvo Live account. Then add your application on the "My Applications" tab. During the beta you can register three applications for free, with a limit of 50,000 characters per day.

A new application automatically receives a standard set of dictionaries and a key for API access. The application authenticates with this key and obtains an access token, which must be attached to the header of every API request. How to do this is described in the help.
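The key-for-token exchange described above can be sketched roughly as follows. This is only an illustration: the endpoint path and the `Basic` / `Bearer` header schemes are assumptions not stated in this article, so check the service help for the actual values. The sketch only builds the requests and does not send them.

```python
import urllib.request

BASE_URL = "https://developers.lingvolive.com"  # from the article
AUTH_PATH = "/api/v1.1/authenticate"  # assumed path, not given in the article


def build_auth_request(api_key: str) -> urllib.request.Request:
    """Build the request that exchanges the application key for a token."""
    return urllib.request.Request(
        BASE_URL + AUTH_PATH,
        method="POST",
        # Assumption: the key is passed as a Basic credential.
        headers={"Authorization": f"Basic {api_key}"},
    )


def attach_token(request: urllib.request.Request, token: str) -> urllib.request.Request:
    """Attach the access token to the header of an API request."""
    # Assumption: the token is passed as a Bearer credential.
    request.add_header("Authorization", f"Bearer {token}")
    return request
```

To actually send a request built this way, pass it to `urllib.request.urlopen` and read the response body.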

Client applications interact with the service through HTTP requests of several types (methods):

1. Send a word or phrase with a translation direction and get all matching entries from the available dictionaries, or name a specific dictionary and get the entry from it alone.
2. Get the part of the word list that matches the entered word, phrase, or word fragment.
3. Get a short translation of a word or phrase, that is, its most frequent translation in the available dictionaries.
4. Search not only the headwords of dictionary entries but also the text of the entries themselves, i.e. translations and examples.
5. Get a specific dictionary entry by dictionary identifier and headword.
6. Get spelling-correction suggestions for a word.
7. Get a table of word forms (conjugation, tenses, plural, and so on).
8. Get the audio pronunciation of a word.
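As a rough illustration of how a client might address one of these methods, the sketch below builds a request URL for a short translation (item 3). The `/api/v1/` path layout, the method name `Minicard`, the parameter names `text`, `srcLang`, and `dstLang`, and the numeric language codes are all assumptions made for this example; the sample requests on the developer site are the authority.

```python
from urllib.parse import urlencode

BASE_URL = "https://developers.lingvolive.com"  # from the article


def build_query_url(method: str, **params) -> str:
    """Return the URL for one API method call.

    The "/api/v1/<method>" layout is an assumption for illustration.
    """
    return f"{BASE_URL}/api/v1/{method}?{urlencode(params)}"


# Hypothetical call: short translation of "mother" between two languages
# identified by assumed numeric codes.
url = build_query_url("Minicard", text="mother", srcLang=1033, dstLang=1049)
```

Using `urlencode` keeps phrases and non-Latin words safe to pass in the query string.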

In essence, the API methods mirror the methods of the Lingvo engine, simplifying access and filling in some parameters automatically. Each method filters its results by the dictionaries available to the client application. Because the engine produces all of its output as XML, each method converts the entire output to JSON.
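The XML-to-JSON step could look something like the sketch below. The element names in the usage example are invented, since the engine's real XML schema is not shown in this article, and the generic tag/attributes/text/children mapping is just one possible choice rather than the format the service actually returns.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem: ET.Element) -> dict:
    """Convert one XML element and its subtree to a plain dict.

    Text that follows a child element (its "tail") is ignored here.
    """
    return {
        "tag": elem.tag,
        "attributes": dict(elem.attrib),
        "text": (elem.text or "").strip(),
        "children": [element_to_dict(child) for child in elem],
    }


def xml_to_json(xml_text: str) -> str:
    """Parse an XML document and serialize it as a JSON string."""
    root = ET.fromstring(xml_text)
    return json.dumps(element_to_dict(root), ensure_ascii=False)


# Hypothetical engine output, for illustration only.
sample = '<card dictionary="Universal"><heading>mother</heading></card>'
converted = json.loads(xml_to_json(sample))
```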

Sample requests are posted here.

How it all works inside

Our task was to make the engine accessible over HTTP from any device with Internet access. We also wanted the service to scale horizontally from the start. For this we chose Microsoft Azure Cloud Services, which combines configurable automatic scaling with flexible virtual machine configuration.

To set up virtual machines in the Cloud Service, we use an installation script written in PowerShell. It is called from a CMD script (an Azure Cloud Service can run a CMD script named in the configuration file for initial machine setup) and performs the following operations:

• enables execution of 32-bit applications (needed for interaction with the engine);
• installs the PowerShell module for working with Openwork, if it has not yet been installed;
• checks whether LingvoServer is installed and, if not, downloads the ZIP archive from Azure Blob Storage, unpacks it, and launches the LingvoServer installer.

Schematically, the solution architecture looks like this:

To do its work, the service needs some basic information about the client application: its identifier, the user it belongs to, and how many characters it may consume per day. This information is stored in a token, namely a JWT signed with a key that only we hold. Why keep it in a token? So that we do not have to query the database (Azure SQL in the diagram) on every request. There is only one database, it does not scale horizontally, and we do not want to put much load on it.
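To make the idea concrete, here is a minimal sketch of signing and verifying such a token with only the Python standard library. It assumes a symmetric HS256 signature and uses invented claim names (`app_id`, `user_id`, `daily_limit`); the article does not say which algorithm or claims the service actually uses, and a production service would normally rely on an established JWT library.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(claims: dict, secret: bytes) -> str:
    """Create an HS256-signed JWT carrying the given claims."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(token: str, secret: bytes) -> dict:
    """Return the claims if the signature is valid, else raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    signing_input = f"{parts[0]}.{parts[1]}"
    expected = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(parts[2])):
        raise ValueError("bad signature")
    return json.loads(_b64url_decode(parts[1]))
```

Because the claims are verified with the key alone, every web role instance can check a token without a database round trip, which is the point made above.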

We also track how many characters the application has already consumed. Putting this information into the token would be wrong, because the token would then have to be updated on every request. That is an extra burden on users of our service, who would have to write more complex code. Instead, the information is stored in a Redis cache: a fast cache that scales horizontally and can therefore handle far more load than the database.

A client application request carrying a token arrives at a virtual machine (shown in the diagram as a web role). This is a virtual machine instance in the MS Azure Cloud Service; both our API and the Lingvo engine that does the actual work are deployed on it. The number of virtual machines varies according to the Cloud Service auto-scaling settings, based on average CPU load. The engine installer, which is quite large, is stored in Blob Storage. It is downloaded only when the engine is not yet installed on a given virtual machine, so that it is not fetched again on every service update.

A user's request passes through several stages. First the token is validated; it identifies the application and states how many characters the application may consume in total. If the token is valid, we look in the Redis cache to see how much the application has already consumed. The character limit is enforced with a leaky bucket algorithm, which guarantees the client application its promised volume of consumption while capping the possible peak load. The size of the allowed peak is set in the configuration. The number of characters available to the application is stored in the Redis cache and is checked after authorization, immediately before the method itself runs. If the check passes, we call the Lingvo engine inside the virtual machine and return its answer to the client.
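A leaky bucket used as a meter can be sketched as below. This is not the service's actual code: the class and parameter names are made up, the state lives in memory where the article uses Redis, and the caller passes the time in explicitly so the behavior is easy to follow. The `capacity` plays the role of the configurable peak, and `leak_rate` would be derived from the daily quota (for example, 50,000 characters spread over 86,400 seconds).

```python
from dataclasses import dataclass


@dataclass
class LeakyBucket:
    """Leaky bucket meter: usage fills the bucket, time drains it."""

    capacity: float          # largest burst of characters allowed at once
    leak_rate: float         # characters drained per second
    level: float = 0.0       # characters currently counted against the app
    updated_at: float = 0.0  # time of the last update, in seconds

    def try_consume(self, amount: float, now: float) -> bool:
        """Charge `amount` characters at time `now` if they fit in the bucket."""
        # Drain whatever has leaked out since the last update.
        elapsed = max(0.0, now - self.updated_at)
        self.level = max(0.0, self.level - elapsed * self.leak_rate)
        self.updated_at = max(self.updated_at, now)
        # Reject the request if it would overflow the bucket.
        if self.level + amount > self.capacity:
            return False
        self.level += amount
        return True


# Illustrative numbers only: 100-character burst, 10 characters drained per second.
bucket = LeakyBucket(capacity=100, leak_rate=10)
first = bucket.try_consume(80, now=0.0)       # fits into the empty bucket
second = bucket.try_consume(30, now=0.0)      # would overflow, so it is rejected
third = bucket.try_consume(30, now=1.0)       # 10 characters have drained by now
```

With state shared in a cache such as Redis, the read-drain-write sequence would need to be atomic across web role instances; the sketch leaves that out.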

Future plans

Our biggest and most important plan is to make the Lingvo engine faster and easier to scale, and to improve the quality of search and the format of search results. At the moment, as you can see, the output lacks a clear structure. We are still thinking about how to do this properly. And, of course, we will add new useful methods to the API.

We also plan to add learning content to the API and to build adaptive algorithms on top of it that use artificial intelligence to construct an individual learning model based on the user's needs and level of knowledge.

We are very interested in your feedback and suggestions! Leave them in the comments on this post or send them to lingvo.api@abbyy.com.

Examples of integration of previous versions of Lingvo API

To conclude, here are several applications with an integrated Lingvo API. The service we have just released is our first attempt at offering our dictionary API in the cloud. A few years ago, however, we released a similar product that was installed directly on devices and ran only on Android. It was built into several book-reading applications: Yota Reader, Moon+ Reader, Moon+ Reader Pro, and others.

The Yota developers integrated the Lingvo API into the Yota Reader application, letting users quickly get a translation of a word highlighted in the text without interrupting their reading.

The developer of the Android e-book readers Moon+ Reader and Moon+ Reader Pro used the Lingvo API to add a card with a short translation. While reading a book in a foreign language, you can select a word, choose “Dictionary” from the menu, and see its translation within the selected language pair.

Here is the translation in the Moon+ Reader application.

Kirill Kryuchkov,
Mobile & Lingvo Live

Source: https://habr.com/ru/post/317102/
