
Carrot models, bottle necks and speech recognition: the lack of dictionaries in artificial intelligence

A language quest in the realm of speech recognition


Six months ago, I joined the MDG Research Department as a technical writer. At the time, I did not yet know how deep a theoretical sea I would have to plunge into without a life buoy in the form of even a basic terminological dictionary.



During my first call with MDG's HR, I was asked a question that struck me as rather strange: “Doesn't it scare you that you will have to translate articles from Russian into English and from English into Russian?” Of course it didn't: what could be frightening about articles? I have been writing and translating them all my life, and I even enjoy it. So I completed all the test tasks without any worries, went through a series of interviews and eventually got the job at MDG.


When I received my first assignment at the new job, translating three articles from English into Russian, I asked:


"Who wrote these texts?"
"The guys in the next room," my colleagues replied.
"Do they speak Russian?" I asked.
"Of course! Look at their names!"


Ivan, Alexey, Yuri: the authors' names were indeed Russian, so I started asking them for drafts of the articles in "the great and mighty" Russian language. There were no outlines, glossaries or other written materials related to the articles, so I simply started translating. I was impressed that my colleagues knew English so well that they did not need Russian-language drafts.

But the joy did not last long. From the second paragraph of the first text onward, the real fun began: a dive into the terminological abyss of speech recognition. Whenever I came across an unfamiliar term, I naturally looked it up in dictionaries. But none of the dictionaries I knew contained it. Even Multitran, perhaps the most comprehensive online dictionary of professional (and not only professional) terms, was either silent or returned the wrong thing. Google Translate turned out to be completely useless here, although it amused me a couple of times with nonsensical renderings such as “carrot models” (for Markov models) or “bottle neck” (for bottleneck).






Having collected a couple of dozen examples of such untranslatable wordplay, I went to one of the authors for an explanation. Reading the question “Guys, what are you even talking about?” in my eyes, my colleague began helping me work out correct translations of the concepts. He also explained that MDG researchers do not write scientific articles in Russian because nobody needs them: the journals worth publishing in are entirely English-language, and their readers know English well enough to share knowledge and advance science. My translation was needed to report on the project to the customer and to be stored in MDG's knowledge base.


The situation remained tense. Enlisting one of MDG's bright minds did not end the language quest. First, there were too many untranslated terms; second, some phrases could only be translated with a passage the size of a dictionary entry.





Then the heavy artillery came in: a colleague's Candidate of Sciences dissertation, in which he, too, had to render speech recognition terms in Russian, and in a way that the audience at his defense could understand. Things got easier, the work picked up pace, and soon all three articles were translated into Russian.


Over several months, I compiled a personal glossary of almost 400 terms, which helps me translate any text by MDG researchers into Russian or English. Embeddings, MFCCs, MLPs, bottleneck features and the like no longer scare me.


A book quest across Russia


The translation task (more than one, in fact) was successfully completed, but the language quest left an aftertaste. The point is not that it was hard for me at first. It is that Russia simply has no terminological dictionaries for speech recognition, or for artificial intelligence in general. This void is a huge obstacle for anyone starting out in information technology, and the fact that English is the international language of science does not change that. Without even basic support, newcomers to IT feel unsure of themselves and spend a lot of time learning to discuss artificial intelligence with colleagues and to read about it.


Yet artificial intelligence is not a new field. Monographs are written and dissertations are defended on it. And each researcher compiles a glossary for their own work independently, or sometimes does without one.


And what about philologists? What are lexicographers and other humanities scholars doing to help people make sense of IT terminology? For many years I have used bilingual paper and online dictionaries, including user-edited ones. For almost ten years they served me well (I worked in system integration). Then I came to MDG and realized that ABBYY Lingvo kills any hope of finding an adequate translation for an IT term, and Multitran pleases me all too rarely. Multitran has a very modest collection of artificial intelligence terms: exactly 3,400. Roughly the same numbers appear under “oceanology” (3,267 terms) and “zoology” (3,625 terms), fields that are well studied and have long been covered by literature, including terminological dictionaries. For comparison, applied IT topics are better developed on Multitran: “robotics” contains 9,802 terms, “microelectronics” almost 12,000, and “electronics” 47,640.


Russian philologists are not idle: they study the “semantic field of information technology”. But that field seems stuck at the turn of the century, since the papers still discuss common, long-Russified words like “software”, “user” and “clickability”.


As for the book industry, it stays on the sidelines of this problem, as the search results for artificial intelligence dictionaries show.





Ozon (an active seller of both new and used books) shows that the “Explanatory Dictionary of Artificial Intelligence”, compiled by A.N. Averkin, M.G. Haase-Rapoport and D.A. Pospelov, was published in 1992. It collects translations of 550 terms from five European languages into Russian. And that's it: not a single other dictionary among the 2,000 titles in the store's “Artificial Intelligence” section. In other bookstores, things are even sadder: there is nothing at all.


Here are the search results from the electronic catalogs of the country's three largest scientific libraries, which receive legal deposit copies of books and acquire publications in all fields of knowledge:



State Public Scientific and Technical Library of Russia
Documents found for the query “Artificial Intelligence”: 1136
Of which dictionaries: only 1, on a related topic:
Internet.ru Language Dictionary [Text] / M. A. Krongauz [et al.]; ed. by M. A. Krongauz. - Moscow: Slovari XXI veka, 2018. - 288 p.

National Library of Russia
Documents found for the query “Artificial Intelligence”: 890
Of which dictionaries: 3

  1. Vinokurov, T. N. English-Russian Dictionary of Artificial Intelligence Terms: [about 2729 terminological units] / T. N. Vinokurov; Federal Agency for Education, GOU VPO “Omsk State Technical University”, Omsk Terminological Center. - Omsk: KAN Printing Center, 2012. - 403 p.
  2. Pankin, A. V. German-Russian Dictionary of High-Tech Terms and Concepts = Deutsch-russisches Wörterbuch der Hightech-Begriffe: [broadcast, cable and satellite television, video and audio equipment, nanotechnology, electronics and electrical engineering, telecommunications and communications, computer equipment, computer networks and the Internet, programming and computer science, automatic regulation and control, robotics and artificial intelligence, digital photography and digital cinema, and other]: 35000 terms / A. V. Pankin. - Moscow: Book House LIBROCOM: URSS, 2009. - 745, [1] p.
  3. Explanatory Dictionary of Artificial Intelligence / comp. by A. N. Averkin et al. - Moscow: Radio i Svyaz, 1992. - 254, [1] p.; 20 cm. - Bibliogr.: p. 254 (the same dictionary found on Ozon)

Russian State Library
Documents found for the query “Artificial Intelligence”: 1524
Of which dictionaries: 4, the same three as in the National Library of Russia plus 1 in Bulgarian:
Sistemi s izkustven intelekt: terminol. rechnik [Systems with Artificial Intelligence: A Terminological Dictionary] / Georgi S. Todorov



Overall, the dictionary situation is grim. Of the three artificial intelligence dictionaries found, one is German-Russian, another was published a quarter of a century ago (it has, incidentally, been posted online), and the third looks interesting at first glance but is very rare: it cannot be bought and can only be read in a scientific library, and not in every one.


I will continue my search for dictionaries, this time among English-language sources.

Source: https://habr.com/ru/post/358512/

