Recently, a post appeared on Habré in which the author confidently declares that a computer will never be able to understand a text the way a person understands it. As proof, he cites a number of tasks supposedly impossible for a machine, emphasizing the absence of efficient algorithms and the impossibility of modeling a complete system that would take into account all possible ways of constructing a text. But is it really that bad? Is it true that solving such problems requires incredible computing power? And what is the general state of natural language text processing?
And what does “understanding” mean in general?
The first thing that puzzled me was the question itself: could a computer ever understand a text the way a person understands it? And what does "to understand the way a person does" actually mean? Or, for that matter, what does "understanding" mean at all? In the book Data Mining: Practical Machine Learning Tools and Techniques, the authors ask themselves a similar question: what does it mean to be "trained"? Suppose we have applied some learning technique to an "interpreter." How do we check whether it has learned anything or not? If a student attended all the lectures on a subject, that does not mean he learned and understood it. To check, we hold exams, where the student is asked to solve specific problems on the subject. It is the same with a computer: you can find out whether it has learned (understood the text) only by checking how it solves specific applied problems: translating the text, extracting facts, picking the right meaning of a polysemous word, and so on. From this perspective, the concept of meaning loses its primary importance: meaning can be regarded simply as some state of the interpreter, in accordance with which it processes the text.
Ambiguous words
Next, the author of the original article gives an example of translating the sentence "First, Nikolai printed out a letter from Sonya," pointing out several possible translations of the word "printed out" with completely different meanings. A person can easily understand what is going on, but can a machine?
To answer this question, let us consider how a person decides which sense of the word is being used. I think everyone will agree that, first of all, when solving such tasks we rely on the context. The context can be given explicitly, in the form of the sentences surrounding the one in question, or implicitly, in the form of a body of knowledge about the sentence (in our case, the knowledge that the sentence is taken from the novel War and Peace, knowledge of the era in which the plot takes place, and knowledge of the state of technology at that time).
Let us start with the first option, the use of context sentences. Suppose we have two pairs of sentences: "First, Nikolai printed out a letter from Sonya. By the light of a splinter, it was difficult to read it" and "First, Nikolai printed out a letter from Sonya. The printer was junk, so in places not all the characters were legible." The second sentence of each pair contains key words that allow us to unambiguously identify the meaning of the word "printed out" in the previous sentence: in the first case it is "splinter," in the second "printer." Question: what prevents a computer from performing the same maneuver to find out the real meaning of the word in question? Nothing. In fact, systems for determining the meaning of a word have long been used in practice. For example, the tf-idf measure is widely used in search engines when calculating relevance. As a rule, statistics are collected on the co-occurrence of words ("printed out" and "splinter," "printed out" and "printer"), and on that basis a more relevant document or a more accurate translation of the word is selected.
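As a rough sketch of this idea in Python (the senses, context words, and co-occurrence counts below are invented purely for illustration), sense selection by co-occurrence might look like this:

```python
from collections import Counter

# Toy co-occurrence counts, collected in advance from a corpus:
# how often each sense of "printed out" appears near a given context word.
# All numbers here are made up for illustration.
cooccurrence = {
    "unsealed": Counter({"splinter": 12, "letter": 30, "wax": 7}),
    "printed":  Counter({"printer": 25, "characters": 9, "letter": 18}),
}

def pick_sense(context_words):
    """Choose the sense whose co-occurrence profile best matches the context."""
    scores = {
        sense: sum(counts[w] for w in context_words)
        for sense, counts in cooccurrence.items()
    }
    return max(scores, key=scores.get)

# "By the light of a splinter, it was difficult to read it."
print(pick_sense(["splinter", "letter"]))      # -> unsealed
# "The printer was junk, so not all the characters were legible."
print(pick_sense(["printer", "characters"]))   # -> printed
```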
With an implicit context, that is, with a body of knowledge about the circumstances, things are somewhat more complicated. Simply collecting statistics will not do here; knowledge is needed. And what is knowledge in general, and how can it be represented? One way of representing it is an ontology. In the simplest case, an ontology is a set of facts of the form <Subject, Predicate, Object>, for example <Nikolai, is, Person>. Building an ontology, even for a specific domain, is, to put it mildly, labor-intensive. Labor-intensive, but not impossible. There are a number of initiatives, such as
Linked Data, in which people gather together and build a web of interconnected concepts. Moreover, there are a number of quite successful developments in the automatic extraction of facts from text. That is, from the sentence "First, Nikolai printed out a letter from Sonya" one can automatically derive the facts <Nikolai, printed out, letter>, <Letter, from, Sonya>, and so on. An open-source example of such work is the Stanford Parser, which understands the structure of English sentences quite well. And some companies, such as InventionMachine (I will not insert a link, that would be too much advertising), build their entire business on fact-extraction systems.
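The Stanford Parser itself is a Java library, so purely as an illustration, here is a rough Python sketch of the same idea using spaCy as a stand-in: walk the dependency tree and pull out <subject, predicate, object> triples. It assumes the en_core_web_sm model is installed and is by no means a complete fact extractor:

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def extract_triples(text):
    """Pull naive <subject, predicate, object> triples from a dependency parse."""
    doc = nlp(text)
    triples = []
    for token in doc:
        if token.pos_ == "VERB":
            subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
            objects = [c for c in token.children if c.dep_ in ("dobj", "obj", "attr")]
            for s in subjects:
                for o in objects:
                    triples.append((s.text, token.lemma_, o.text))
    return triples

print(extract_triples("First, Nikolai printed out a letter from Sonya."))
# e.g. [('Nikolai', 'print', 'letter')]
```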
However, I digress. So, let us assume that we already have a more or less complete ontology for our domain. For simplicity, let us also assume that it was compiled by people, so the word "printed out" appears in it several times, once for each of its meanings. In the sense of "unsealed," the word can form facts such as <[Someone], printed out, parcel>. In the sense of "printed," it can appear in facts such as <[Someone], printed out, [Something], on a printer>. Finally, suppose that knowledge about the circumstances is already present in our ontology. In this case, the task of determining the correct meaning of the word reduces to mapping all the facts of the sentence onto the ontology for every possible meaning of "printed out," and choosing the meaning whose neighborhood contains the most known facts (both facts about the circumstances and facts extracted directly from the sentences).
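A minimal sketch of that last step (the ontology facts and extracted facts below are invented for illustration):

```python
# Toy ontology: a set of <subject, predicate, object> facts for each sense
# of "printed out". All facts are made up for illustration.
ontology = {
    "unsealed": {
        ("someone", "unsealed", "letter"),
        ("letter", "written_by", "Sonya"),
        ("plot", "set_in", "1800s"),          # circumstance: no printers yet
    },
    "printed": {
        ("someone", "printed", "document"),
        ("document", "printed_on", "printer"),
    },
}

def choose_sense(sentence_facts):
    """Pick the sense whose ontology neighborhood matches the most known facts."""
    scores = {
        sense: len(facts & sentence_facts)
        for sense, facts in ontology.items()
    }
    return max(scores, key=scores.get)

facts = {("someone", "unsealed", "letter"),
         ("letter", "written_by", "Sonya"),
         ("plot", "set_in", "1800s")}
print(choose_sense(facts))  # -> unsealed
```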
Before going further, I will draw several conclusions:
1. Statistics is a powerful text analysis tool.
2. The extraction of knowledge (facts) from the text is a reality.
3. Creating a knowledge base about the subject area is a difficult but manageable task.
Other tasks
Next, the author of the article lists, rather haphazardly in my opinion, a number of specific tasks that are supposedly completely beyond a computer's reach. I will not argue; some of the tasks are indeed quite complex. Complex, but not insurmountable. Below, the mentioned problems and possible solutions are given in no particular order, but first a few more words about the discipline of natural language processing itself.
From the NLP point of view, a text is a set of features. These features can be words (word roots and word forms, grammatical case, letter case, part of speech), punctuation marks (especially those at the end of a sentence), emoticons, or whole sentences. On the basis of these, more complex features can be built: n-grams (sequences of words), appraisal groups (the same sequences, but carrying an evaluation, for example "very good"), and words from given dictionaries. More complex still are alliteration, antonymy and synonymy, homophones, and so on. All of this can be used against you in court as features when solving various text-processing tasks.
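To make this concrete, here is a small sketch of turning a text into a few of the features listed above; the particular feature set and regular expressions are chosen arbitrarily:

```python
import re

def extract_features(text):
    """Build a simple feature dictionary: tokens, bigrams, punctuation, emoticons, case."""
    tokens = re.findall(r"\w+", text.lower())
    return {
        "tokens": tokens,
        "bigrams": list(zip(tokens, tokens[1:])),            # n-grams with n = 2
        "ending_punct": re.findall(r"[.!?]+$", text.strip()),
        "emoticons": re.findall(r"[:;]-?[)(D]", text),
        "has_uppercase_words": any(w.isupper() and len(w) > 1 for w in text.split()),
    }

print(extract_features("The new camera is AWESOME, I love it :)"))
```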
So, the tasks themselves.
Determining the mood of a text
The division proposed by the author is, frankly, not very clear: funny texts versus sad texts. Three variants of the classification come to mind:
1. The text is optimistic / pessimistic.
2. Positive / negative (for example, reviews).
3. Humorous / serious.
One way or another, this is a classification task, which means that standard algorithms such as Naïve Bayes or SVM can be applied to it. The only question is which specific features to extract from the text to achieve the best classification results.
I have never dealt with classifying texts as optimistic or pessimistic, but I would wager that using the roots of all words as features is already enough. The results can be further improved by compiling a dictionary for each class. For example, words such as "sad," "loneliness," and "sorrow" might go into the "pessimistic" dictionary, while "cool," "yay," and "fun" go into the "optimistic" one.
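A minimal sketch of such a classifier with scikit-learn, using plain word counts instead of word roots and a tiny invented training set:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny invented training set; in practice you would also stem the words
# and/or add dictionary-based features.
texts  = ["what a sad and lonely evening",
          "sorrow and loneliness everywhere",
          "this is so cool and fun",
          "yay, what a fun day"]
labels = ["pessimistic", "pessimistic", "optimistic", "optimistic"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["such a lonely and sad day"]))  # -> ['pessimistic']
```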
The classification of reviews and other user-generated content that expresses the speaker's attitude toward some object (a new camera, government actions, Microsoft) has recently become so widespread that it has been split off into a separate field: opinion mining, also known as sentiment analysis ([1], [2]). There are many approaches to extracting opinions. For texts of at least 5-6 sentences, appraisal groups [3] have performed well (up to 90.2% of opinions determined correctly). For shorter texts (for example, tweets) other features have to be used: words from predefined dictionaries, letter case, emoticons, and so on.
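For short texts, a dictionary-and-emoticon scorer can be sketched in a few lines; the dictionaries below are tiny and invented, and a real system would use much larger ones:

```python
POSITIVE = {"good", "great", "love", "awesome"}
NEGATIVE = {"bad", "awful", "hate", "broken"}
POS_EMOTICONS = {":)", ":-)", ":D"}
NEG_EMOTICONS = {":(", ":-("}

def tweet_sentiment(tweet):
    """Score a short text by dictionary hits, emoticons, and shouting in caps."""
    words = tweet.lower().split()
    score = sum(w.strip(".,!?") in POSITIVE for w in words)
    score -= sum(w.strip(".,!?") in NEGATIVE for w in words)
    score += sum(tweet.count(e) for e in POS_EMOTICONS)
    score -= sum(tweet.count(e) for e in NEG_EMOTICONS)
    if any(w.isupper() and len(w) > 2 for w in tweet.split()):
        score *= 2  # all-caps words amplify whatever sentiment is already there
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(tweet_sentiment("I LOVE the new camera :)"))   # -> positive
print(tweet_sentiment("battery is awful :("))        # -> negative
```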
The task of detecting humorous text is not as popular, but there are certain achievements there as well ([4]). As a rule, antonymy, alliteration, and "adult slang" are used as features for detecting humor.
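As an illustration of one such feature, here is a crude alliteration score; defining alliteration as matching first letters of adjacent words is a simplification made for the sketch:

```python
def alliteration_score(text):
    """Fraction of adjacent word pairs that start with the same letter,
    a crude stand-in for the alliteration feature used in humor recognition."""
    words = [w for w in text.lower().split() if w.isalpha()]
    if len(words) < 2:
        return 0.0
    matches = sum(a[0] == b[0] for a, b in zip(words, words[1:]))
    return matches / (len(words) - 1)

print(alliteration_score("Peter Piper picked a peck of pickled peppers"))
```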
It is also worth noting that computers can already recognize not only humor but also sarcasm and irony quite successfully. Better than Sheldon Cooper, anyway.
The author's ideology
As well as his competence, attitude to work, attitude toward family, and hidden complexes. Anything that is reflected in the text can be detected, even things the average reader does not see. It is enough to choose the appropriate features and train the classifier properly. Yes, the results may not be very accurate, but then, Wikipedia asserts that even humans identify such things correctly only about 70% of the time, and 70% is below the average for such classifiers.
Metaphors, sayings and omissions
All of these tasks require additional information. If you have a ready-made ontology for the domain, finding objects with similar properties is not difficult: a proximity measure is introduced for this, computed from statistical data, and the most "relevant" object is searched for.
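One common choice for such a proximity measure is cosine similarity between co-occurrence profiles; the profiles below are invented for illustration:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two sparse feature vectors (dicts of co-occurrence counts)."""
    common = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Invented profiles: which property of "stone" does "heart of stone" refer to?
stone = {"hard": 8, "cold": 5, "grey": 3}
heart = {"hard": 2, "cold": 4, "warm": 6}
print(cosine_similarity(stone, heart))  # shared "hard"/"cold" give a non-zero overlap
```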
Automatic translation
As I have already noted above, the problem of determining the specific meaning of a polysemous word during automatic translation can be solved with the help of statistical analysis. So the only real problem is generating well-formed text. There are two subtasks here:
1. Correctly determining the relations between words.
2. Correctly mapping the structures found onto the target language.
Determining the relations between words is essentially the same classification task, where the classes are all possible relations between words. Libraries such as the Stanford Parser use probabilistic classifiers and fuzzy set theory to determine the most "correct" variant of the relations between words.
But mapping the structures found onto the target language really is a problem, and it is mostly not a computer problem but a translation problem. Professional translators never list the languages they know; instead they list translation directions. For example, a translator may be able to translate from Italian into Russian, but not from Russian into Italian. Of course, they can manage a reverse translation somehow, but the result is far from ideal. The problem is precisely the mapping of constructions of one language onto another language in which there may be no direct analogue. What to do in such cases is not clear. This is why not only computational but also ordinary theoretical linguistics keeps evolving, deriving more and more new rules. At the same time, from the point of view of a computer implementation, there is nothing difficult about encoding the rules, once created, in an automatic translation program.
A big problem
So, computers can already extract facts from text, detect the author's mood, recognize sarcasm, and much more. Then what is the problem? Why is there still no universal "reader" that could take a text and solve all the tasks a person can? After several years of practice in NLP, I have come to the conclusion that intelligent text-processing systems are difficult to combine. Building a system out of several components not only causes a combinatorial growth in the links between them, but also requires accounting for all the dependencies along with their probabilistic parameters. For example, either machine learning or manually created rules can be used to extract opinions. But if we combine both approaches, the question arises of how much each of them should influence the result: what this depends on, what the nature of these dependencies is, how to compute the numerical parameters, and so on. The field of natural language processing is still in its adolescence, so for now humanity is only able to create systems for solving local problems. What will happen when all the local tasks are solved, and whether a person has enough capacity (memory, habits of thought) to combine everything that has been worked out, is difficult to predict.
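To illustrate the weighting problem, here is a trivial sketch of blending two opinion scores; the weight is exactly the kind of parameter whose value is hard to justify, and is simply picked by hand here:

```python
def combine(ml_score, rule_score, ml_weight=0.7):
    """Blend a machine-learning score and a rule-based score for the same text.
    Both scores are assumed to lie in [-1, 1] (negative .. positive opinion);
    the weight is chosen arbitrarily for illustration."""
    return ml_weight * ml_score + (1 - ml_weight) * rule_score

print(combine(ml_score=0.4, rule_score=-0.8))  # -> 0.04: the components disagree
```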
Links to resources
[1] Bo Pang, Lillian Lee. Opinion Mining and Sentiment Analysis
[2] Bing Liu. Opinion Mining
[3] Casey Whitelaw. Using Appraisal Groups for Sentiment Analysis
[4] Rada Mihalcea. Making Computers Laugh: Investigations in Automatic Humor Recognition