
As our regular readers know, ABBYY not only makes software products but has also been doing research in computational linguistics for many years; without that research, these products could not have appeared. We also hold the annual international conference "Dialogue" (details about it are here). And recently our company opened departments of computational linguistics at two Moscow universities: at the Institute of Linguistics of the RSUH (together with IBM) and at the Faculty of Innovations and High Technologies at MIPT.
ABBYY already has positive experience in teaching students: our department of image recognition and text processing has been working at the Moscow Institute of Physics and Technology for six years, and many of its graduates have built a good career in the company. So we hope that we will succeed in training computational linguists too.
Like the image processing department at MIPT, the departments of computational linguistics will work closely with the company. During their studies, students will take part in real commercial projects, and master's students, if they wish, will be able to get a job with us.
Why were departments of computational linguistics needed? First, of course, to train specialists of a new profile, because until recently there was no such university specialty in Russia. Over the past ten years computational linguistics has been developing very rapidly, and our universities have also understood the need to keep up with the times: this year, master's programs in computational linguistics opened at the Higher School of Economics and St. Petersburg State University.
Second, besides training staff for companies, university departments of computational linguistics have another important goal: to contribute to the development of the field as a science in Russia. The situation here is not a happy one. On the one hand, we have traditions in the computer processing of texts, and there are companies running real and successful projects on the world language technology market. On the other hand, since specialists of this profile were not trained in Russia for a long time, we hardly take part in the world's scientific life today and, even more sadly, our specialists often do not know how to conduct research according to international standards. Russian is not even among the twenty languages most commonly used as research material.
As a result, we have commercial computational linguistics projects, yet Russian participants are not represented at international conferences (or appear there as master's and graduate students of foreign universities). Why does this happen? The fact is that companies most often cannot publish the results of their research: they are constrained by corporate policy, patent trolls, and competition. It is not surprising that, worldwide, research in computational linguistics is carried out primarily in universities, not in corporations.
A reasonable question is why a department of linguistics, even a computational one, is opening not only at a humanities university but also at a technical one. The fact is that it is hardly possible to produce a universal specialist, a linguist and a programmer "in one bottle", with equally deep knowledge in both areas. Real projects need both kinds of people. But for a project to succeed, the engineer who develops language processing technologies must have a clear understanding of how language is structured and which results of linguistics can be used. The linguist, in turn, must understand the requirements that modern computer analysis technologies impose on linguistic models. Therefore, additional knowledge will be layered on top of a fundamental basic education (as a linguist or as an engineer), helping humanities people and "techies" find a common language when solving applied problems.

So, linguists will gain knowledge of statistics, formal grammars, machine learning methods, heuristic methods of artificial intelligence, expert systems, and knowledge representation systems. They will be taught to work with specialized languages and development environments for linguists (such as the Natural Language Toolkit, R, etc.), specialized linguistic databases, and open linguistic resources (from grammars and parsers to ontologies).
Engineers will take courses on the grammatical system of natural language (morphology and syntax), semantics and discourse, general and computational lexicography, and corpus linguistics (methods of creating and using text corpora).
From the very beginning of the training, we plan to bring together master's students from the Moscow Institute of Physics and Technology and the Russian State Humanitarian University to work on serious projects. In the fall, for example, work will begin on an ambitious project, the General Corpus of the Russian Language (a large PDF about it), in which well-known linguists will take part alongside our students.
As for the teaching staff, in addition to professors and lecturers of the RSUH and MIPT, ABBYY's best specialists, both linguists and programmers, will teach special courses for the departments' students and master's students. Of course, it is hard for us, and a pity, to pull them away from their main work, but teaching at the departments is important too. First, only we ourselves can teach students the approach to solving problems of computational linguistics that we have adopted, so that graduates (if they wish) can then easily join the company's work. Second, teaching is also useful for the teachers themselves: to develop a course, you need to sort through your own knowledge and keep up with the latest international achievements in the field.
We will also invite well-known Russian and international specialists in computational linguistics to give lectures (as we already do as part of ABBYY Open).
The educational process will be organized slightly differently at the two departments. The department at MIPT is ABBYY's own department and is physically located in our office, whereas the department at the RSUH is run with our participation and is located at the university itself. At the RSUH, the department teaches courses to all students of the Institute of Linguistics, not just "its own" bachelor's and master's students. Both departments are admitting students for the first time this year.
We invite everyone interested in the computer analysis of natural language to join our departments!
Tatyana Panferova
with the participation of the research and development department