
Extracting Facts. Synonymy and Homonymy

This post grew out of a conversation with one naive person and my own reflections on a subject as complex and ambiguous as language (in this case, Russian).

About the conversation: Someone (let's call him that) claimed that extracting facts from natural-language text is a fairly simple thing to implement. Supposedly, you look for verbs (words ending in “em / u / e / /…”) and the adjacent nouns (words longer than 4 letters), compose triples, and load them into an ontology database, and there is your fact-extraction engine.

By my own system for classifying intelligence, this person immediately earned one of the lowest scores, but the exchange made me think about some aspects of how information is represented in natural language and the difficulties that arise when extracting information from it.
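To make the objection concrete, here is a minimal sketch of the heuristic Someone described. The ending list is a hypothetical placeholder (the endings he named are only partly known), and the two test sentences are shortened illustrations, not a real corpus.

```python
import re

# Hypothetical verb endings, chosen only for illustration;
# this is not a real morphological rule for Russian.
VERB_ENDINGS = ("ет", "ит", "ут", "ли")
MIN_NOUN_LENGTH = 5  # "words longer than 4 letters"


def naive_triples(sentence):
    """Return (left word, 'verb', right word) triples using surface rules only."""
    words = re.findall(r"\w+", sentence.lower())
    triples = []
    for i in range(1, len(words) - 1):
        word = words[i]
        if not word.endswith(VERB_ENDINGS):
            continue
        left, right = words[i - 1], words[i + 1]
        if len(left) >= MIN_NOUN_LENGTH and len(right) >= MIN_NOUN_LENGTH:
            triples.append((left, word, right))
    return triples


# "Workers smelted a lot of steel": the quantifier "много" (a lot) is taken for a noun.
print(naive_triples("Рабочие выплавили много стали"))
# [('рабочие', 'выплавили', 'много')]

# "Children became stronger": "дети" has only 4 letters, so the fact is lost entirely.
print(naive_triples("Дети стали сильнее"))
# []
```

Even on two short sentences the rules produce one bogus triple and miss one real fact, and that is before synonymy and homonymy enter the picture.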



Today we will talk about synonymy and homonymy.



Synonymy



Synonymy is the property of a language (here, Russian) whereby the same meaning can be expressed in different ways. For example, the Russian words “конница” and “кавалерия” both mean “cavalry” (morphological synonymy), and the meaning of the sentence “Smith did not manage to translate this text only due to the fact that it contained many special terms” can be expressed by more than a million synonymous paraphrases (syntactic synonymy)! Indeed, “did not manage = failed = could not = was unable to ...”, “only = solely = exclusively = ...”, “because = since = due to the fact that ...”, and so on. Each of these choices multiplies the number of ways to convey the meaning, and their direct (Cartesian) product is a huge n-dimensional set of variants.
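The arithmetic behind that explosion is easy to check. The sketch below is only an illustration: the slot contents are assumed English stand-ins for the Russian variants, and the ten-slot case is a made-up example of how quickly the count passes a million.

```python
from itertools import product
from math import prod

# Illustrative slots: each list holds interchangeable wordings
# for one part of the sentence (assumed, not an exhaustive list).
SLOTS = [
    ["did not manage to", "failed to", "could not", "was unable to"],
    ["only", "solely", "exclusively"],
    ["because", "since", "due to the fact that"],
]


def count_paraphrases(slots):
    """Size of the Cartesian product: the product of the slot sizes."""
    return prod(len(options) for options in slots)


def paraphrases(slots):
    """Lazily enumerate every combination, one choice per slot."""
    return product(*slots)


print(count_paraphrases(SLOTS))                        # 4 * 3 * 3 = 36
print(count_paraphrases([["a", "b", "c", "d"]] * 10))  # 4 ** 10 = 1048576
```

Three small slots already give 36 variants, and ten slots with four options each give more than a million, so an extractor cannot simply list the paraphrases it expects to see.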



Homonymy



Homonymy, in contrast to synonymy, hides several meanings, sometimes opposite ones, behind one and the same word (morphological homonymy) or expression (syntactic homonymy). For example, the Russian word form “стали” appears both in “Workers smelted a lot of steel per shift” (a form of the noun “сталь”, steel) and in “Children have become stronger over the summer” (the past tense of the verb “стать”, to become), with completely different meanings and roles in the sentence. Syntactic homonymy is easy to demonstrate with the sentence “Мужу изменять нельзя”, which can be read either as “one must not cheat on one's husband” or as “a husband must not cheat”. A more complex example, which everyone went through at school, is “From foggy Germany he brought the fruits of learning” (A. Pushkin). In the Russian original the adjective can refer to “foggy Germany” (this is how most people read it, but is Germany really considered a foggy country?) or to “foggy learning” (and the fogginess of Lensky's learning is in no particular doubt).
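In Russian, the “steel” example rests on the word form “стали”, which is shared by the noun “сталь” (steel) and the verb “стать” (to become). A minimal sketch of how a dictionary lookup exposes the problem follows; the tiny lexicon and the field names are assumptions made for illustration, not the format of any real morphological analyzer.

```python
# Toy lexicon: word form -> every analysis that form can have.
# Entries and field names are hypothetical.
LEXICON = {
    "стали": [
        {"lemma": "сталь", "pos": "NOUN", "gloss": "steel"},
        {"lemma": "стать", "pos": "VERB", "gloss": "to become"},
    ],
    "смену": [
        {"lemma": "смена", "pos": "NOUN", "gloss": "shift"},
    ],
}


def analyses(word_form):
    """Return all known analyses of a word form (empty list if unknown)."""
    return LEXICON.get(word_form.lower(), [])


def is_homonymous(word_form):
    """A form is homonymous here if it maps to more than one lemma."""
    lemmas = {analysis["lemma"] for analysis in analyses(word_form)}
    return len(lemmas) > 1


print(is_homonymous("стали"))  # True
print(is_homonymous("смену"))  # False
```

The lookup can only report that two readings exist. Choosing between them requires looking at the rest of the sentence.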


We must not forget another subtype of homonymy: polysemy. This is when one and the same word (not just one of its word forms that happens to coincide in spelling and pronunciation with a form of another word, as in the “steel” example) has several meanings. Take the word “nose”: “the boat's nose stuck into the sandy beach” and “Bob's nose”. A person easily understands which meaning to take, but what about a computer?



Methods of dealing with homonymy have been developed and refined for a very long time, and each has its pros and cons. They include hidden Markov models, dependency (subordination) trees, context analysis, revision directories, compatibility dictionaries, and more. Unfortunately, a detailed (or even rough) description of them is beyond the scope of this article, so I will postpone it until next time.
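As a small taste of the context-analysis idea only, here is a toy sketch that picks a sense of “nose” by counting how many of the sense's typical companion words occur in the sentence. The compatibility dictionary is invented for the example, and the scoring is a deliberately crude overlap count, not any of the established methods listed above.

```python
import re

# Hypothetical compatibility dictionary: each sense of a word
# lists words it tends to occur with.
SENSES = {
    "nose": {
        "front of a boat": {"boat", "ship", "beach", "shore", "water"},
        "part of the face": {"face", "cold", "sneeze", "handkerchief"},
    }
}


def pick_sense(word, sentence):
    """Return the sense with the largest context overlap, or None if nothing matches."""
    context = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {
        sense: len(companions & context)
        for sense, companions in SENSES[word].items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(pick_sense("nose", "The boat's nose stuck into the sandy beach"))
# front of a boat
print(pick_sense("nose", "Bob wiped his nose with a handkerchief"))
# part of the face
print(pick_sense("nose", "Bob's nose"))
# None
```

The last call shows the limit of the approach: with no helpful context the function has nothing to go on, which is exactly where a human reader still copes and a simple program does not.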



Literature:



  1. Gladky A. V. Syntactic Structures of Natural Language in Automated Communication Systems. Moscow: Nauka, 1985.





Source: https://habr.com/ru/post/95324/


