
Using hidden Markov models to resolve morphological homonymy

In a previous post, I described what morphological homonymy is (the example with the Russian word form «стали», which can be read either as "steel" or as "became") and mentioned that hidden Markov models (HMM) are used to resolve it.
First, a few words about text markup (in the English literature this process is called part-of-speech tagging, POST): it is a manual or automatic process that assigns a tag to each word of a text, indicating which part of speech the word is: noun, verb, adjective, adverb, pronoun, particle, conjunction, interjection, and so on. This is exactly where we run into the «стали» problem.

Without going into the details of the algorithm, the main question it answers is: "Which tag is the most likely one for this word?"
And if we do go into the details: for a given sequence of words, an HMM tagger (a program implementing this algorithm) selects the tag that maximizes the expression
[1] P(word | tag) * P(tag | previous n tags),
where P(word | tag) is the conditional probability of the given word appearing in this position, provided it carries the given tag, and P(tag | previous n tags) is the conditional probability of this tag appearing, given that it is preceded by the specified sequence of n tags.
A real tagger determines a sequence of tags for a whole sequence of words, but for pedagogical purposes we will look at how a tagger assigns a tag to a single word.
First we introduce the basic formula, then walk through an example; those who are interested in how the "production" formula is derived and who want all the mathematical details can consult the literature listed at the end of the article.
A bigram tagger (one that uses information only about the previous word) selects the tag t_j for the word w_i, given that the previous tag was t_(i-1):
[2] t_i = argmax_j P(t_j | t_(i-1), w_i)
Using some simplifying Markov assumptions (also described in the literature), formula [2] can be rewritten as:
[3] t_i = argmax_j P(t_j | t_(i-1)) * P(w_i | t_j)
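
To make formula [3] concrete, here is a minimal Python sketch of the decision rule. It is not code from the article; the representation of the transition and emission tables as plain dictionaries, and the function name choose_tag, are assumptions made for illustration.

```python
def choose_tag(prev_tag, word, transition, emission, tags):
    """Pick the tag t_j that maximizes P(t_j | t_(i-1)) * P(w_i | t_j), as in formula [3].

    transition: dict mapping (prev_tag, tag) -> P(tag | prev_tag)
    emission:   dict mapping (word, tag)     -> P(word | tag)
    tags:       the set of candidate tags to compare
    """
    def score(tag):
        # Missing entries are treated as zero probability.
        return transition.get((prev_tag, tag), 0.0) * emission.get((word, tag), 0.0)
    return max(tags, key=score)
```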

Example


As an example, let us use the phrases from the previous post. In Russian the ambiguous word form is «стали»: it is either the genitive of the noun «сталь» ("steel") or the past-tense plural of the verb «стать» ("to become"):
"The workers smelted a lot of steel («много стали») during their shift"
"Over the summer the children became («стали») stronger"
In the first example «стали» is a noun, in the second a verb. For the purposes of the example, suppose that all the other words in both sentences have already been tagged by some better method, and only the word «стали» was left untagged in each of them.
Now the goal of our tagger is to assign the most likely tag in the following sequences:
a lot of {ADV} стали {???}
summer {NOUN} стали {???}

Let's see how formula [3] applies to our example. It says that for the sequence «много стали» ("a lot of steel") we need to choose whichever tag gives the larger of two values:
[4] P(VERB | ADV) * P(стали | VERB)
and
[5] P(NOUN | ADV) * P(стали | NOUN)
Formula [3] and its instantiations [4] and [5] consist of two parts: the probability of the tag sequence and the probability of the word given the corresponding tag. For «стали», the probabilities P(VERB | ADV) and P(NOUN | ADV) answer the question: "What is the probability of meeting a verb (noun) after an adverb?" We can easily answer this question using a tagged corpus (for example, ruscorpora.ru). The probability of a noun following an adverb is higher than that of a verb (although this is not without exceptions):
P(VERB | ADV) = 0.021
P(NOUN | ADV) = 0.34
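
How such transition probabilities could be estimated from a tagged corpus is sketched below. This is not code from the article; the corpus format (a list of sentences, each a list of (word, tag) pairs) and the function name transition_probs are assumptions made for illustration.

```python
from collections import Counter

def transition_probs(tagged_sentences):
    """Estimate P(tag | previous tag) by counting tag bigrams in a tagged corpus.

    tagged_sentences: list of sentences, each a list of (word, tag) pairs,
    e.g. [[("много", "ADV"), ("стали", "NOUN")], ...].
    """
    bigram_counts = Counter()
    prev_counts = Counter()
    for sentence in tagged_sentences:
        tags = [tag for _, tag in sentence]
        for prev_tag, tag in zip(tags, tags[1:]):
            bigram_counts[(prev_tag, tag)] += 1
            prev_counts[prev_tag] += 1
    # Normalize each bigram count by the count of its left-hand tag.
    return {pair: count / prev_counts[pair[0]]
            for pair, count in bigram_counts.items()}
```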

Note that the second part of formulas [3], [4] and [5] does not answer the question "What is the probability that the word «стали» carries the tag VERB (NOUN)?", but rather "If the tag is VERB (NOUN), what is the probability that the word will be «стали»?":
P(стали | VERB) = 0.00003
P(стали | NOUN) = 0.00041
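
These word-given-tag probabilities could be estimated from the same tagged corpus in a similar way; again, the data format and the function name emission_probs are illustrative assumptions, not part of the article.

```python
from collections import Counter

def emission_probs(tagged_sentences):
    """Estimate P(word | tag) by counting how often each word carries each tag."""
    pair_counts = Counter()
    tag_counts = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            pair_counts[(word, tag)] += 1
            tag_counts[tag] += 1
    # Normalize each (word, tag) count by the total count of that tag.
    return {(word, tag): count / tag_counts[tag]
            for (word, tag), count in pair_counts.items()}
```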

Multiplying the two parts together for each option gives:
[4] P(VERB | ADV) * P(стали | VERB) = 0.021 * 0.00003 ≈ 0.00000063
[5] P(NOUN | ADV) * P(стали | NOUN) = 0.34 * 0.00041 ≈ 0.00014
As a result, we find that in the sequence «много стали» the word «стали» should be marked with the tag NOUN.
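
For completeness, the same comparison can be written out as plain arithmetic using the figures above (the tag names ADV, NOUN and VERB are this translation's rendering of the original Russian tags):

```python
# Plugging the article's figures into formula [3] for the context "a lot of {ADV} стали {???}":
p_verb = 0.021 * 0.00003   # P(VERB | ADV) * P("стали" | VERB)
p_noun = 0.34 * 0.00041    # P(NOUN | ADV) * P("стали" | NOUN)
print("NOUN" if p_noun > p_verb else "VERB")  # prints "NOUN"
```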


Literature


  1. D. Jurafsky, J. H. Martin. Speech and Language Processing, 1999

Source: https://habr.com/ru/post/95334/

