How to solve the problem of machine understanding of natural language

Many programmers have tried, and are still trying, to write some kind of interactive program for communicating with a machine in natural language (NL). Homemade bots and similar hobby projects are beyond counting.

In addition, there is a huge number of commercial programs that solve the problem of machine understanding of NL only approximately. The examples are familiar to everyone: search engines, so-called machine translation systems, sentiment analysis systems, help systems, and FAQs. All of them are far from a satisfactory solution to the problem of communicating with a machine in NL.

The reason is visible to the naked eye: approximate, superficial, simplified methods of processing natural language sentences are used, such as keyword search and statistics on how often particular syntactic structures occur in the language. The implicit assumption is that NL is too complicated for complete machine understanding, so the task has to be simplified.

What would a complete, uncompromising solution to the problem look like? Obviously, the machine must do the same work with natural language that we humans do when we read, listen, speak, write, and think. How do we differ from current computer programs in this respect? A person works with the semantic content of sentences, aware that one and the same thought can be expressed in many ways, though not in perfectly equivalent ones. So we need to teach the machine to process natural language sentences in such a way as to extract the thought, the semantic content, that they contain. The machine must work with the thought, not with the letter.

Two interrelated questions arise here:

- how to build a mechanism for extracting semantic content from text?
- how to represent this semantic content formally?

Of course, the main problem here is the second, because it provides the necessary initial formalization of the task. Solutions to it have been known for quite some time. Let us take a brief look at some of them.

Back in 1980, a Russian translation of R. Schank's book “Conceptual Information Processing” was published, in which he described the work he and his graduate students had done on modeling machine understanding of natural language. He developed a method for formally representing the semantic content of an NL sentence, and his graduate students implemented the three main necessary functions as LISP programs:

- semantic translation - transformation of a natural language sentence into the corresponding conceptual model;
- conceptual memory - manipulation of conceptual structures, corresponding to "human" mental operations;
- conceptual generation - transformation of a conceptual structure into natural language text.
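
To make the division of labor between these three functions more tangible, here is a minimal sketch in Python. It is not Schank's LISP code: the dict-based structure, the tiny lexicon, the single sentence pattern, and the names `analyze`, `Memory`, and `generate` are all assumptions made for illustration.

```python
import re

# Toy lexicon (an assumption for this sketch): surface verbs -> concept names.
VERBS = {"ate": "EAT", "devoured": "EAT", "saw": "SEE"}
# Canonical verb used when turning a concept back into text.
CANONICAL = {"EAT": "ate", "SEE": "saw"}


def analyze(sentence: str) -> dict:
    """Semantic translation: NL sentence -> conceptual structure (a dict here)."""
    match = re.fullmatch(r"(\w+) (\w+) a (\w+)\.", sentence.strip())
    if match is None or match.group(2) not in VERBS:
        raise ValueError(f"cannot analyze: {sentence!r}")
    actor, verb, obj = match.groups()
    return {"actor": actor, "act": VERBS[verb], "object": obj}


class Memory:
    """Conceptual memory: stores structures and answers questions about them."""

    def __init__(self) -> None:
        self.facts = []

    def remember(self, concept: dict) -> None:
        if concept not in self.facts:  # paraphrases collapse into one fact
            self.facts.append(concept)

    def who(self, act: str, obj: str) -> list:
        return [f["actor"] for f in self.facts
                if f["act"] == act and f["object"] == obj]


def generate(concept: dict) -> str:
    """Conceptual generation: conceptual structure -> NL sentence."""
    return f'{concept["actor"]} {CANONICAL[concept["act"]]} a {concept["object"]}.'


memory = Memory()
for text in ["John ate a frog.", "John devoured a frog."]:
    memory.remember(analyze(text))
```

Both input sentences are reduced to the same structure, so the memory holds a single fact. That is the point of working with the thought rather than the letter, even if a real analyzer is incomparably more complex than one regular expression.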

[Figure: an example of a conceptual representation of the sentence "John ate a frog."]

Schank's approach is based on a special language he developed for describing mental (conceptual) operations and objects. He called his approach conceptual dependency (CD) theory.

To give an initial idea of CD theory, here is some minimal information about it. A conceptualization is the basic unit of the conceptual level, from which thoughts are constructed. A conceptualization is built from the following elements:

- ACTOR - the performer of the ACT;
- ACT - the action performed on the OBJECT;
- OBJECT - the thing on which the action is performed;
- RECIPIENT - the recipient of the OBJECT as a result of the ACT;
- DIRECTION - the location toward which the ACT is directed;
- STATE - the state of the OBJECT.
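
The list above maps naturally onto a record with optional fields. The sketch below is only one possible encoding: the class name `Conceptualization`, the field names, and the choice of INGEST as the act behind "ate" are assumptions made for illustration, not Schank's own notation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Conceptualization:
    """One conceptualization; roles that a sentence does not fill stay None."""

    actor: str                       # ACTOR: who performs the ACT
    act: str                         # ACT: the action itself
    obj: Optional[str] = None        # OBJECT: what the action is performed on
    recipient: Optional[str] = None  # RECIPIENT: who receives the OBJECT
    direction: Optional[str] = None  # DIRECTION: where the ACT is directed
    state: Optional[str] = None      # the state (condition) of the OBJECT

    def roles(self) -> dict:
        """Return only the roles that are actually filled."""
        return {name: value for name, value in vars(self).items()
                if value is not None}


# "John ate a frog." with only the directly stated roles filled in.
john_ate_frog = Conceptualization(actor="John", act="INGEST", obj="frog")
```

Calling `john_ate_frog.roles()` returns the three filled roles. A fuller analysis would also try to infer the unstated ones, which is where most of the real work lies.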

Actions, objects, relationships, and states are the main elements of the language he created (for which he did not invent a name).
The main types of conceptual actions in conceptual dependency theory are as follows:

- PROPEL, MOVE, INGEST, EXPEL, GRASP (physical actions performed by a person);
- PTRANS - “move a physical object”;
- ATRANS - “change an abstract relationship for an object”;
- SPEAK - “produce a sound”;
- ATTEND - “direct a sense organ toward a specific stimulus”;
- MTRANS - “transmit information (between people or within one person)”;
- MBUILD - “create or combine thoughts.”
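
A small closed set of actions like this can be written down as an enumeration, with ordinary verbs mapped onto it. In the sketch below the enumeration follows the list above, while the verb lexicon and the helper `primitive_for` are hypothetical and cover only a few textbook cases.

```python
from enum import Enum, auto


class Act(Enum):
    """The conceptual actions listed above."""

    PROPEL = auto()
    MOVE = auto()
    INGEST = auto()
    EXPEL = auto()
    GRASP = auto()
    PTRANS = auto()   # move a physical object
    ATRANS = auto()   # change an abstract relationship for an object
    SPEAK = auto()    # produce a sound
    ATTEND = auto()   # direct a sense organ toward a stimulus
    MTRANS = auto()   # transmit information
    MBUILD = auto()   # create or combine thoughts


# Physical actions performed by a person.
PHYSICAL = {Act.PROPEL, Act.MOVE, Act.INGEST, Act.EXPEL, Act.GRASP}

# Hypothetical toy lexicon: many surface verbs share one underlying action.
LEXICON = {
    "ate": Act.INGEST,
    "swallowed": Act.INGEST,
    "gave": Act.ATRANS,
    "told": Act.MTRANS,
    "went": Act.PTRANS,
}


def primitive_for(verb: str) -> Act:
    """Look up the conceptual action behind a verb; KeyError if unknown."""
    return LEXICON[verb.lower()]
```

Here "ate" and "swallowed" resolve to the same `Act.INGEST`, which is how different wordings of one thought end up with the same representation.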

We will not describe the language of conceptual dependency (CD) theory here, or even introduce it, since that is not the purpose of this text. R. Schank's book describes it in detail.

Schank's theory is aimed at describing the behavior and thinking of human subjects, which is very interesting and relevant for personality modeling. On the basis of CD theory, one can create programs that model a thinking individual in dialogue, so that a dialogue with the machine is indistinguishable from a dialogue with a person.

At the same time, machine understanding of NL text does not always require an accurate simulation of an individual's thought processes. A more utilitarian approach to modeling text semantics is the theory of conceptual graphs (CG). The first author to describe CGs in detail and consider their application was J. Sowa, whose book “Conceptual Structures: Information Processing in Mind and Machine” has not been translated into Russian.

A conceptual graph is a connected network of binary relations describing the semantic links of the corresponding sentence. This approach has grown into a whole research field with various branches, many experimental systems, and regular scientific conferences.

CG theory also has abstract concepts and relations, but a CG description of a sentence includes only the directly expressed semantic assertions and conceptual objects, so a specific conceptualization looks much simpler.
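
Since a conceptual graph is described here as a connected network of binary relations, it can be sketched as a list of triples plus a connectivity check. The relation labels `AGNT` and `PTNT`, the concept labels, and the function names are assumptions chosen for illustration, not a reproduction of any particular CG notation.

```python
from collections import defaultdict

# Each triple is (source concept, relation, target concept).
# A possible graph for "John ate a frog."; the labels are illustrative.
GRAPH = [
    ("Eat", "AGNT", "Person: John"),  # the agent of eating is John
    ("Eat", "PTNT", "Frog"),          # the thing eaten is a frog
]


def concepts(triples) -> set:
    """Collect every concept node mentioned in the triples."""
    return {node for src, _, dst in triples for node in (src, dst)}


def is_connected(triples) -> bool:
    """Check that all concepts form one network, ignoring arc direction."""
    neighbours = defaultdict(set)
    for src, _, dst in triples:
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    nodes = concepts(triples)
    if not nodes:
        return True
    seen, stack = set(), [next(iter(nodes))]
    while stack:  # plain depth-first traversal
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(neighbours[node] - seen)
    return seen == nodes
```

The sentence yields three concepts joined by two relations. Adding an unrelated triple such as `("Cat", "ON", "Mat")` would make `is_connected` return `False`, because the result would be two separate graphs rather than one.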

One practical implementation of CG theory is UNL, the Universal Networking Language, created and developed at the United Nations University's Institute of Advanced Studies. UNL is designed to solve the problem of machine translation on the Internet: the plan is that for each existing natural language there will be a translator into UNL and a generator from UNL text into that language, which will let people communicate freely on the Internet regardless of the language they use. Despite the clear concept set out in the relevant standards, UNL is still not developed well enough to solve the problem of machine translation.
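
The appeal of one translator and one generator per language is easy to see with a little arithmetic. The comparison with direct pairwise translators below is our own illustration, not something taken from the UNL documents, and the function names are hypothetical.

```python
def direct_translators(n: int) -> int:
    """Translators needed if each language is translated directly into every other."""
    return n * (n - 1)  # one per ordered pair (source, target)


def pivot_components(n: int) -> int:
    """Components needed with an intermediate language such as UNL."""
    return 2 * n  # one translator into the pivot and one generator out, per language


for n in (2, 10, 100):
    print(n, direct_translators(n), pivot_components(n))
```

For two languages the counts are 2 and 4, so the intermediate language costs more. From four languages upward the direct scheme needs more components, and the gap grows quadratically.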

The CONST semantic processor, under development at NPF Semantics Research (Kazan), is intended to solve the problem of machine understanding of natural language. It will give programmers convenient tools for building intelligent applications, based on mechanisms that cover all the major types of tasks requiring machine understanding of NL: machine translation, knowledge bases, natural language dialogue with a machine, communication with robots, and so on.

The CONST language is one implementation of CG theory and is designed for building all types of intelligent systems that involve understanding NL texts and NL dialogue. The structure of the semantic processor is similar to that of the MARGIE system, but it is intended for commercial use.

References

1. Schank R. Conceptual Information Processing (Russian translation). Moscow: Energiya, 1980. 360 pp.
2. Sowa J. F. Conceptual Structures: Information Processing in Mind and Machine. Reading, MA: Addison-Wesley, 1984.
3. www.undlfoundation.org
4. Ihsanov N. CONST: a tool for creating applied intelligent systems. Heuristic Algorithms and Distributed Computing, Samara, 2015, vol. 2, no. 2, pp. 69–78.

Source: https://habr.com/ru/post/271321/

