
Morphology and computational linguistics for complete beginners

On Habr there has already been a post about Technopark, and even write-ups of the courses (1, 2) taught there. Today we are publishing the first part of a master class that Andrei Andrianov of ABBYY gave for Technopark students.

To begin with, it is worth recalling what morphology is and how it relates to linguistics. That is the subject of this first post in the series; read on for the details.

Many of you are familiar with the sentence "Glokaya kuzdra shteko budlanula bokra i kurdyachit bokryonka." Although we do not know what any of the words in this sentence mean (with the exception of the conjunction "i", meaning "and"), we can assume that the main actor is a kuzdra. And not just any kuzdra, but a glokaya one. What did she do? Budlanula. How did she do it? Shteko. To whom? To the bokr. In addition, she is doing something to the bokryonok.

This phrase was invented by academician Lev Shcherba, who used it to show his students how part of the semantics can be extracted from the morphology of a word: from its inflection, from its endings. We do not know the lexical meaning of the words, that is, we do not understand which objects are named, but we can grasp their grammatical meaning. It is grammatical meaning that I would like to talk about in this article.

Morphology is the branch of linguistics that studies four things.

Parts of speech

As soon as you read the sentence "Glokaya kuzdra shteko budlanula bokra i kurdyachit bokryonka", you immediately identified the subject and two predicates, "budlanula" and "kurdyachit". Different parts of speech may form sentences in different ways in different languages.
Inflection

Having seen the word "budlanula", without even knowing what it means, you can already inflect it. You understand that its infinitive is "budlanut". You can change the gender (budlanul, budlanulo), and you can change the tense (budlanyot, budlanyosh). How words change, and which forms express which grammatical meanings, is studied by the second subsection of morphology, inflection.
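The ability to inflect an unknown word can be sketched in a few lines of Python. This is only a toy illustration on transliterated forms: the ending table `PAST_ENDINGS` and the function `past_tense_forms` are assumptions made for this example, not a real morphological model.

```python
# Toy past-tense endings for transliterated Russian verbs.
# An illustrative assumption; real paradigms are far richer.
PAST_ENDINGS = {
    "masculine": "l",
    "feminine": "la",
    "neuter": "lo",
    "plural": "li",
}


def past_tense_forms(feminine_past):
    """Derive the infinitive and all past-tense forms from a feminine past form."""
    if not feminine_past.endswith("la"):
        raise ValueError("expected a feminine past-tense form ending in 'la'")
    stem = feminine_past[:-2]  # "budlanula" -> "budlanu"
    forms = {name: stem + ending for name, ending in PAST_ENDINGS.items()}
    forms["infinitive"] = stem + "t"  # "budlanu" + "t" -> "budlanut"
    return forms


forms = past_tense_forms("budlanula")
print(forms["infinitive"], forms["masculine"], forms["neuter"])
```

The function never looks the word up anywhere: the endings alone are enough to produce the other forms, which is exactly what a native speaker does with "budlanula".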

Word formation


Having met bokr and bokryonok in one sentence, you immediately imagined that a bokryonok is the cub of a bokr, like an elephant and a baby elephant. Or maybe it is just a small copy of a large bokr, one that, for example, never grew to full size.

We often form new words with suffixes (diminutive ones, for example) in order to change some property of an object; we can even change the part of speech. For example, take the noun "shovel". If you wish, you can form a verb from it: "to shovel". Native speakers will quickly grasp its meaning, but those studying Russian as a foreign language will spend a long time figuring out what the word is and why it is not in the dictionary. Quite often we also form verbs from the names of animals, giving the verb some property associated with the animal.

The grammatical meaning of the word

I have already mentioned that a word has two meanings: lexical (what the word means in the dictionary) and grammatical (what the word means in the sentence). Some semantics can be extracted from the grammatical meaning. Take the word "budlanula". Obviously, it is a verb, so it denotes an action. In addition, we can say that it is a verb in the past tense, singular, feminine, perfective aspect. All this gives you additional information. For example, in Russian the feminine gender is often associated with the female sex. We cannot explain why "fork" is feminine and "glass" is masculine, but we do understand why the past-tense verb takes one form when a girl stood up and another when a boy did. And it grates on us when someone picks the wrong gender.

Since school we have been used to describing grammatical meaning as a set of grammemes. Genitive, past tense, singular: these are all different grammemes. Grammemes are grouped into categories. Nominative, genitive, dative, accusative, instrumental and prepositional make up the category of case. A single form cannot carry two grammemes of the same category. If we say "budlanula", we mean only the singular grammeme. Nor can the same form "budlanula" encode two aspects of the verb at once. A noun cannot be in the nominative and the dative case at the same time. Forms may coincide, as they often do in the nominative and accusative, but they still have to be told apart. This is another task of morphology.
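The rule "one grammeme per category" is easy to express in code. The sketch below is a minimal illustration in Python: the category table is a small subset, and the names `CATEGORIES`, `is_consistent` and `ANALYSES`, as well as the sample analyses of the transliterated noun "stol" ("table"), are assumptions made for this example.

```python
# Grammatical categories and their grammemes (a small illustrative subset).
CATEGORIES = {
    "case": {"nominative", "genitive", "dative",
             "accusative", "instrumental", "prepositional"},
    "number": {"singular", "plural"},
    "gender": {"masculine", "feminine", "neuter"},
    "tense": {"past", "present", "future"},
}

# Reverse index: grammeme -> the category it belongs to.
CATEGORY_OF = {g: cat for cat, grammemes in CATEGORIES.items() for g in grammemes}


def is_consistent(grammemes):
    """Return True if no category is represented by more than one grammeme."""
    seen = set()
    for g in grammemes:
        cat = CATEGORY_OF[g]
        if cat in seen:
            return False
        seen.add(cat)
    return True


# One surface form may have several analyses (nominative and accusative
# coincide here), but each single analysis must still be consistent.
ANALYSES = {
    "stol": [
        {"nominative", "singular", "masculine"},
        {"accusative", "singular", "masculine"},
    ],
}

print(is_consistent({"past", "singular", "feminine"}))      # True
print(is_consistent({"nominative", "dative", "singular"}))  # False
print(all(is_consistent(a) for a in ANALYSES["stol"]))      # True
```

Keeping coinciding forms as separate analyses, rather than merging them into one set, is what lets a later stage pick the right case from context.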

Applied linguistic tasks

Computational linguistics is a part of artificial intelligence. Its goal is to create algorithms that let a machine understand the meaning of text or words arriving from various input sources: sound, images, text.

Applications of computational linguistics:

Natural language processing

Computational linguistics is most widely used in natural language processing, which covers a variety of tasks, including dictionary compilation and machine translation.
Other natural language processing technologies are also interesting from both a theoretical and a practical point of view. Fact extraction and automatic summarization make it possible to categorize large amounts of text automatically, with greater accuracy than machine learning methods. Knowledge management systems, expert systems and question-answering systems are largely built on extracting knowledge from text.

Text Recognition (OCR)

Text recognition relies on other technologies, and here what matters is whether a word is in the dictionary or not. When recognizing text we often deal with blurred images, and the binarization algorithms that run before recognition are never 100% accurate. As a result, a mass of hypotheses is generated about what is actually written. Sometimes it is impossible to distinguish the letter "n" from "m" or "n" from "k", and this is where computational linguistics, or more precisely morphology, comes into play. Morphology tells us whether such a word exists in the language or not.
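The dictionary check can be sketched as follows. This is a simplified Python illustration, not how any particular OCR engine works: the tiny `LEXICON`, the `CONFUSABLE` table (built from the letter pairs mentioned above) and both function names are assumptions made for this example.

```python
from itertools import product

# Toy lexicon and sets of visually confusable letters (illustrative assumptions).
LEXICON = {"mat", "kit", "note"}
CONFUSABLE = {"n": "nmk", "m": "mn", "k": "kn"}


def candidates(recognized):
    """Yield every string obtained by swapping visually confusable letters."""
    # For each position, list the letters that the image could really contain.
    options = [CONFUSABLE.get(ch, ch) for ch in recognized]
    for combo in product(*options):
        yield "".join(combo)


def dictionary_hypotheses(recognized):
    """Keep only the hypotheses that are words of the lexicon."""
    return sorted({c for c in candidates(recognized) if c in LEXICON})


print(dictionary_hypotheses("nat"))  # ['mat']
print(dictionary_hypotheses("nit"))  # ['kit']
```

A flat word list stands in for the dictionary here. A morphological model plays the same role but also accepts inflected forms that no list would enumerate.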

Speech Recognition (ASR)

Speech recognition works in a similar way. From a sequence of sounds, hypotheses are built about the specific letters a person is pronouncing. Take the Russian word for "cow". We say "karova" but write "korova". Here it is important to understand whether the word "karova" exists in Russian or not.

Speech synthesis

Speech synthesis is another interesting technology that can be used both on its own and as part of machine translation. The latter is already a task of synthesis: we need to analyze a text in one natural language, determine its meaning and, based on the result, generate a text in another natural language.

This concludes the introductory part. In the next post we will talk about the role of morphology in computational linguistics.

Source: https://habr.com/ru/post/188026/

