
Dear readers and writers of Habr!
I want to thank everyone taking part in this discussion for their sincere desire to make our country better. That desire is evident in all the questions asked here, even when they sound harsh and biased. I felt it was important to answer them, because for too long ABBYY has said nothing about what we do or how we do it, and a lack of information breeds many ridiculous conjectures. So, it is time to answer.
In these answers I will sometimes digress into general observations and background information, not to advertise anything, but to better explain the reasons behind our actions.
To begin with, ABBYY is already 21 years old. All this time we have been making interesting products and technologies that are known throughout the world. Today these are products for recognizing printed and handwritten text, and dictionaries. In the future they will be products based on computational linguistics. Our interests do not lie in consumption. Our managers and shareholders do not drive luxury cars or buy yachts, helicopters and palaces, and we have no country houses on Rublevka. We are interested in making new, amazing products, not in shopping.
More than 70% of our income is earned outside Russia. According to various estimates, 30% to 50% of the world's scanners and MFPs ship with the Russian software product ABBYY FineReader. Our sources of income are fairly well diversified. We do not depend on the Russian market, still less on its public sector.
We do only what interests us, only what we are willing to spend our lives on. We have no interest in embezzling budgets, politics, corruption schemes or other murky activities.
We live in Russia, we work in Russia, and we pay all our taxes. Once we even received a certificate from the tax inspectorate naming us the best taxpayer in the district :) (I don’t know whether to be glad about this or to worry). Our children study here, and no one is planning to leave. Everything we have is here! We care what happens to Russia, and we are ready to take part in things that are useful for the country. We spend our time and money on developing education and on other useful initiatives that advance the IT industry in Russia.
All of this may sound too lofty to some, but we really do have something of a preserve here of good people with good, shared interests.
Now let's turn to the substantive questions. There were a great many questions and answers, so only the first block appears here today. In it, I answer questions about the ABBYY Compreno technology, for which ABBYY received a grant. The next blocks will be published early next week.
"... Money is allocated for the creation of Compreno, a technology for automatic text processing ... Compreno technology is designed for building systems that analyze, translate and search texts in various languages. With this technology it will be possible, for example, to automatically create a context database for improved text analysis, automatic summarization of texts, etc. ..."
I would like to hear more specifics, explained with examples. I assume that Compreno technology can be used in machine translation, for example to correctly determine the structure of sentences in the source language, that is, the connections and relationships between the words of a sentence. Sentences in the target language would then be generated with the entire structure of the original sentence taken into account. As far as I can see, in modern automatic translators such analysis is in its infancy, roughly at the level of identifying a noun (or pronoun) and the verb that belongs to it. That is why the sentences modern translators produce are rather clumsy.
How much will translation quality improve? Might it turn out that the generated sentences are grammatically coherent, with correctly inflected words, so that there are no complaints at the level of structure, while at the level of meaning the automatic translator keeps producing rubbish and pseudo-meaningful text?
In answering these first two questions, I would like to describe the ABBYY Compreno technology in more detail.
Although we closely follow developments in this field around the world, we are not aware of any analogs of ABBYY Compreno, the new-generation linguistic technology we are creating.
The core of the technology is a universal hierarchy of concepts and a model of the relations between those concepts (for specialists: a hierarchy of universal semantic values and the relations between them). Although people around the world speak in different words, they use a very similar system of concepts. People in different countries go to work, stay at home, work on computers, draw up contracts, fly on airplanes, and negotiate. Similar business centers are built for them, and they sit in similar rooms and use similar furniture. In different developed civilizations, all these concepts and their interrelations have far more in common than not. From here on, I will refer to this semantic tree of concepts by the English abbreviation USH (Universal Semantic Hierarchy).
USH is a tree of concepts that is universal for all languages. Its thick branches are the more general concepts (for example, “travel”), and its thin branches are more specific but still universal concepts (for example, “business trip”). The tree structure lets descendants inherit properties from their ancestors, which makes new concepts faster to describe: to describe the concept “order”, it is no longer necessary to list all the characteristics of the concept “document”. The words of a particular language are leaves on the USH tree. This gives us the ability to resolve ambiguity. For example, in Russian the different meanings of the word “management” correspond to concepts on different branches of the tree, since there is “management” in the sense of a department and “management” in the sense of an action.
The semantic description of a particular language thus comes down to attaching “leaves”, the words of that language, to the branches of the USH.
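To make the tree idea more tangible, here is a minimal sketch in Python. It is not ABBYY's implementation: the `Concept` class, the property names, and the tiny lexicon are all hypothetical and exist only to illustrate inheritance from ancestors and the attachment of one ambiguous word to two different branches.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """A node of a toy concept tree (hypothetical, for illustration only)."""
    name: str
    parent: Optional["Concept"] = None
    own_properties: dict = field(default_factory=dict)

    def properties(self) -> dict:
        # Properties of ancestors are inherited; own properties override them.
        inherited = self.parent.properties() if self.parent else {}
        return {**inherited, **self.own_properties}

    def path(self) -> list:
        # Names from the root of the tree down to this concept.
        prefix = self.parent.path() if self.parent else []
        return prefix + [self.name]


# "order" only states what is new; the rest comes from "document".
document = Concept("document", own_properties={"has_author": True, "has_date": True})
order = Concept("order", parent=document, own_properties={"is_binding": True})

# Two unrelated branches that one ambiguous word can be attached to.
organization = Concept("organization")
department = Concept("department", parent=organization)
action = Concept("action")
managing = Concept("managing", parent=action)

# Words of a language are "leaves": (language, word) -> concepts they may denote.
# The English gloss "management" stands in for the Russian word from the text.
lexicon = {("en", "management"): [department, managing]}


def branches(lang: str, word: str) -> list:
    """Return the tree path of every concept the word can denote."""
    return [concept.path() for concept in lexicon.get((lang, word), [])]


print(order.properties())
print(branches("en", "management"))
```

In this sketch, describing a new language would mean adding more `(language, word)` entries to the lexicon while the tree itself stays unchanged.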
The second, no less important part of the technology is full syntactic analysis of the text. Syntax is a way of “encoding” meaning (for specialists: semantic relations) in a particular language. The semantic relations themselves are universal, but each language realizes them in its own way. Some languages rely on a fixed linear word order; others use cases, prepositions or special function words; and some use all of these at once. The syntactic description is written anew for each language, but the means that languages use to encode meaning are finite and can be enumerated. A new language is therefore described with a set of building blocks (the same linear order, different types of syntactic transformations, grammatical meanings, prepositions, special constructions).
Compreno also successfully identifies more complex syntactic links, such as the replacement of the word “boy” by the word “he” (for specialists: anaphora) in the sentence “Although the boy wanted to play, he understood that he had little time.” It also handles omissions of whole fragments in compound sentences (for specialists: ellipsis), for example “he likes red wine, and she white”. The connections between concepts that the system identifies are likewise expressed as a tree structure; they convey the actual meaning of what is written and carry information that is important for search or translation. In this way the system seeks to determine the meaning of a text written in ordinary language, allowing the machine to “understand” the text and transform it into a universal, language-independent representation.
Using the USH, a syntactic description of the language, and statistics on the relationships between words, Compreno technology performs a complete analysis of the text. When translating into another language, it uses the words that correspond to the correct branches of the USH tree and preserves the relations found during analysis of the original sentence.
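The idea of translating through a shared concept rather than word for word can be sketched as follows. This is a toy under stated assumptions, not Compreno itself: the concept names, the `SENSES`, `LEAVES` and `CUES` tables, and the cue-word overlap used for disambiguation are all hypothetical, and the overlap count is only a crude stand-in for real syntactic analysis and word statistics.

```python
# Hypothetical data: which concepts a source word may denote.
SENSES = {
    ("en", "bank"): ["FINANCIAL_INSTITUTION", "RIVER_BANK"],
}

# Hypothetical data: the word attached to each concept in the target language.
LEAVES = {
    ("FINANCIAL_INSTITUTION", "de"): "Bank",
    ("RIVER_BANK", "de"): "Ufer",
}

# Hypothetical context words that tend to co-occur with each concept.
CUES = {
    "FINANCIAL_INSTITUTION": {"money", "loan", "account"},
    "RIVER_BANK": {"river", "water", "shore"},
}


def pick_concept(lang: str, word: str, context: list) -> str:
    """Choose the candidate concept whose cue words overlap the context most.

    On a tie, the first candidate listed in SENSES wins.
    """
    context_words = set(context)
    candidates = SENSES[(lang, word)]
    return max(candidates, key=lambda concept: len(CUES[concept] & context_words))


def translate_word(word: str, context: list, src: str, dst: str) -> str:
    """Map a source word to a concept, then to the target-language leaf."""
    concept = pick_concept(src, word, context)
    return LEAVES[(concept, dst)]


print(translate_word("bank", ["the", "river", "bank"], "en", "de"))
print(translate_word("bank", ["money", "in", "the", "bank"], "en", "de"))
```

The point of the design is that the target word is chosen from the concept, so the same source word can yield different translations depending on which branch it is resolved to.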
As you can see, if we have managed to bring the computer closer to understanding the meaning of a text for translation, then this understanding can be used not only for translation but also for many other applications in high demand. Obtaining a universal representation (the meaning) brings us close to better speech recognition and to smart information retrieval, where the query is written in natural language and the response may be a document that does not necessarily contain the words of the query but contains their equivalents and the correct relationships between concepts. You can determine the authorship of a document, or produce a summary (a digest of a large document). There is much more you can do once you have the universal, basic linguistic technology ABBYY Compreno.
What problems does the product for which the money is allocated in Skolkovo solve? What are its uses? What new opportunities will it give ordinary people?
Please specify in which kinds of programs you see a need for Compreno today, if it were suddenly completely ready.
Again, I will answer two questions at once.
Compreno technology is a universal linguistic platform for applications that solve many applied tasks in natural language processing, such as:
- Translation and interpretation from one language to another;
- Intelligent search, in particular:
  - Search by meaning, not by keyword;
  - Extraction of facts and of links between the objects of a search (including for competitive intelligence);
  - Monitoring of companies and individuals and building analytical reports based on parameters of various types. For example, when preparing a report on “Which mobile operator's tariff is the most popular?”, it is important not only to correctly identify all the tariffs discussed in the media but also to compare how often each is mentioned;
  - The ability to get answers to queries phrased in ordinary language (for example, “What does Ivan Ivanovich Bobrov own?”);
- Multilingual search, i.e. when a question asked in one language returns answers in all the languages supported by the system;
- Classification and filtering of documents;
- Protection against unauthorized use of information;
- Automatic summarization and annotation of documents;
- Speech recognition;
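As a rough illustration of “search by meaning” and multilingual search from the list above, the sketch below indexes documents by concept identifiers instead of by words. Everything in it is hypothetical (the `WORD_TO_CONCEPT` table, the concept names, the sample documents); it only shows why a query can match a document that shares none of its words, including a document in another language.

```python
# Hypothetical word-to-concept table for two languages (lowercased words).
WORD_TO_CONCEPT = {
    ("en", "car"): "CAR",
    ("en", "automobile"): "CAR",
    ("de", "auto"): "CAR",
    ("en", "buy"): "BUY",
    ("en", "purchase"): "BUY",
    ("de", "kaufen"): "BUY",
    ("en", "sell"): "SELL",
}

# Hypothetical document collection: id -> (language, text).
DOCS = {
    "d1": ("en", "purchase an automobile"),
    "d2": ("de", "Auto kaufen"),
    "d3": ("en", "sell a car"),
}


def concepts(text: str, lang: str) -> set:
    """Map the known words of a text to language-independent concept ids."""
    return {
        WORD_TO_CONCEPT[(lang, word)]
        for word in text.lower().split()
        if (lang, word) in WORD_TO_CONCEPT
    }


def search(query: str, lang: str) -> list:
    """Return ids of documents that contain every concept of the query."""
    wanted = concepts(query, lang)
    if not wanted:
        return []
    return sorted(
        doc_id
        for doc_id, (doc_lang, text) in DOCS.items()
        if wanted <= concepts(text, doc_lang)
    )


print(search("buy car", "en"))
```

Here the English query “buy car” finds “purchase an automobile” and the German “Auto kaufen”, but not “sell a car”, because matching happens on concepts rather than on surface words. A real system would also have to compare the relations between concepts, which this sketch ignores.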
No universal technology exists in the world today that can solve so many applied problems requiring high-quality linguistic analysis of texts. And we are talking not only about traditional tasks such as translation, but also about problems that until now could not be solved at a qualitatively new level (for example, automatic search for facts and connections in large bodies of information).
What makes ABBYY Compreno revolutionary is its fundamental approach. Many people have thought about a universal system of concepts and about technologies for complete syntactic and semantic analysis, and in our work we rely on the research of leading Russian scientists in this field and on classical linguistic education. However, many experts retreated before the colossal engineering and linguistic complexity of implementing this idea for real, practical tasks. Russia's advanced linguistic education and science gave us a very good foundation for starting and developing this large undertaking.
What will the money be spent on? What results are expected? Will the result be a new product, or will it dissolve into the existing ones?
The project has been under development for about 10 years. I do not know of any public results. Has it turned into a never-ending project with unclear prospects? What guarantees are there that this grant will be the last one needed to release the product?
How long is the project in Skolkovo planned to take?
I will answer these three questions together.
It is well known in psychology that a child differs from an adult in being unable to control his impulses, and this is one of the reasons why a child can never replace an adult in many kinds of adult activity. Similarly, a mature company built to last differs from a fly-by-night firm in its ability to invest in projects that will bear fruit only years later. Read more here.
I should add that we decided long ago to do only those things in which we are sure we can achieve the best results in the world. If you have no reasonable grounds to believe that you will become the best in the world at what you do, then it is a bad business, because it will eventually come down to price competition. Unique things are not made quickly; otherwise they would be easy to copy.
Now, to the substance of what we are doing.
Work on Compreno technology has been going on for 15 years (you can look at the people who work on this and our other projects here). The point is that this project requires a serious scientific foundation, without which it is impossible to create a technology that really works well. Fundamental science, as we know, takes money and time. Fifteen years ago we began thinking through the basic concepts of the new technology. About 10 years ago we started designing the architecture, about 6 years ago we began serious programming of the basic modules, and about 2 years ago we got past the most serious technological risks. The project has entered a phase in which we can say with a high degree of confidence that our goals are attainable.
I described the result we are striving for in one of the previous answers.
We expect commercial products for a mass audience based on the Compreno platform to appear within 2-3 years. But we are already demonstrating Compreno technology to large customers, and experts who have seen how the technology works today are discussing pilot projects with us. In addition, we are preparing another product: a library of functions available to other application developers. That is, any developer will be able to license the core of the system and embed its functions in their own software products.
Finally, to dispel doubts about the prospects of this long-running project, I want to add that all these years the company's shareholders (who largely coincide with its management), instead of enriching themselves and buying yachts, villas and other junk, have been investing the company's profits in a completely new, breakthrough direction. We have managed to build a high-quality, high-tech international business, and we consider ourselves competent in what we do. Who knows better than we do where to invest our money? Can anyone assess the soundness of this investment more reliably than we can? If someone thinks that we are mistaken and knows how to use our money better, then I can congratulate us all on the emergence of new entrepreneurs who will bring glory to our homeland with their deeds. Dear young people! We look forward to posts about your success!
I would like to know more about the recognition system. Will it be “language independent”, or will it also rely on the morphology of each language, like the current system in FineReader? In the latter case, do you plan to use an open format or technology that lets users add morphology rules for new languages themselves, like, say, the hunspell / aspell dictionaries in browsers?
I hope the previous answers have clarified this topic. If not, please be more specific.
What do the state and its citizens gain from this technology? I am interested in its use in something other than your own products.
Few people know that selling ABBYY FineReader function libraries to developers brings us more revenue than selling boxed copies of ABBYY FineReader. We plan to make ABBYY Compreno technology available to other developers as well, so that it can be built into other products that require high-quality processing of natural-language text. This will spur the development of a variety of Russian businesses that use Compreno to create programs or provide services. We believe that some of these high-tech products or services will be sold abroad, which, given that Russia's exports today are mainly oil and gas, can only be welcomed.
In addition, we must not forget that creating such technology here, in Russia, raises the country's overall level of expertise in this field. The people who are so well versed in applied linguistics live, work, pay taxes and spend their money in Russia, not in California, Munich, Calcutta or Guangzhou. The conferences, lectures and seminars on artificial intelligence and applied linguistics that ABBYY runs are held in Russia. ABBYY supports education in this field at our universities, not at MIT or Peking University. And in general, it is Russia, not America, Israel or India, that has the chance to gain world leadership in this particular area of knowledge.
The immediate PROFIT from all this for the state and its citizens is hard to calculate, but the BENEFIT is very large, especially over a long time horizon.
What artificial intelligence technologies are used in the work?
Syntactic and semantic analysis technologies are used to build the language-independent semantic structure. Classification and machine learning technologies are used to resolve homonymy.
How much will your new technology cost?
We will try to make sure that everyone benefits from its use: that translators can earn more by becoming more productive, and that customers can translate more and pay less. There is no other way to do business in the modern world. You will be successful only if all your partners and clients are successful. Something will surely be available for free on the Internet.
Can you (and will you) use Compreno technology to improve the quality of text recognition? I mean, will Compreno help resolve ambiguous cases when the image being recognized has defects? Can Compreno work with incomplete data and help make assumptions about what should be in a poorly legible place?
It is likely that some elements of Compreno technology will be used in the FineReader recognition system in the future.
How can your system help me, a simple builder?
Great question! You can photograph a bag of dry Chinese glue or a can of German paint with your mobile phone and press the “Translate” button; the program will recognize the Chinese and German instructions and translate them into Russian, which will help you mix the glue or the paint properly. You will do your work well, and that will bring joy to the people who live in the house you have built. This is what we call “helping people understand each other better.” We create artificial intelligence technologies that improve the quality of life. In this sense we are colleagues: you, too, create something that improves the quality of life.
To be continued next week.
Part 2. Grant
Part 3. Skolkovo, Linux and other questions