
Markov random fields

This article describes the CRF (Conditional Random Fields) method, a variant of the Markov random field approach. The method has found wide application in various fields of AI; in particular, it is successfully used for speech and image recognition and text processing, as well as in other subject areas such as bioinformatics and computer graphics.

Brief description of the method
(Readers who dislike mathematics and are wary of formulas can skip straight to the "Method comparison" section.)
A Markov random field, or Markov network (not to be confused with Markov chains), is a graphical model used to represent the joint distribution of a set of random variables. Formally, a Markov random field consists of an undirected graph whose vertices correspond to random variables, together with a set of potential functions, one for each clique of the graph.

Vertices that are not adjacent correspond to conditionally independent random variables. A group of mutually adjacent vertices forms a clique; the set of states of its vertices is the argument of the corresponding potential function.

The joint distribution of the set of random variables in a Markov random field is calculated by the formula

$$P(x) = \frac{1}{Z} \prod_{k} \varphi_k(x_{\{k\}}), \qquad (1)$$

where $\varphi_k$ is the potential function describing the state of the random variables in the $k$-th clique $x_{\{k\}}$, and $Z$ is the normalization factor, calculated by the formula

$$Z = \sum_{x} \prod_{k} \varphi_k(x_{\{k\}}). \qquad (2)$$
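To make formulas (1) and (2) concrete, here is a minimal brute-force sketch in Python, assuming a toy field over three binary variables with two invented clique potentials (all numbers are hypothetical):

```python
import itertools

# Toy Markov random field over three binary variables x1, x2, x3
# with two cliques: (x1, x2) and (x2, x3).  Each clique has an
# arbitrary positive potential function phi_k.

def phi_12(x1, x2):
    # hypothetical potential: favors x1 == x2
    return 2.0 if x1 == x2 else 1.0

def phi_23(x2, x3):
    # hypothetical potential: favors x2 != x3
    return 3.0 if x2 != x3 else 1.0

states = list(itertools.product([0, 1], repeat=3))

# Normalization factor Z (formula (2)): sum over every joint
# configuration of the product of all clique potentials.
Z = sum(phi_12(x1, x2) * phi_23(x2, x3) for x1, x2, x3 in states)

# Joint distribution (formula (1)): product of potentials divided by Z.
def p(x1, x2, x3):
    return phi_12(x1, x2) * phi_23(x2, x3) / Z

# Sanity check: the probabilities sum to 1.
assert abs(sum(p(*s) for s in states) - 1.0) < 1e-12
```

In practice $Z$ is intractable to compute by enumeration; the brute force here is only to show what formulas (1) and (2) mean.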

The set of input tokens $x = (x_1, \dots, x_n)$ and the set of corresponding labels $y = (y_1, \dots, y_n)$ together form the set of random variables. To solve the problem of extracting information from text, it suffices to determine the conditional probability $P(y \mid x)$. The potential function takes the form

$$\varphi_i(y_i, y_{i-1}, x) = \exp\Big(\sum_{j} \lambda_j f_j(y_i, y_{i-1}, x, i)\Big), \qquad (3)$$

where $\lambda = (\lambda_1, \dots, \lambda_m)$ is a real-valued parameter vector and $\{f_j\}$ is a set of feature functions. A linear-chain conditional random field is then the probability distribution of the form

$$p(y \mid x) = \frac{1}{Z(x)} \prod_{i=1}^{n} \exp\Big(\sum_{j} \lambda_j f_j(y_i, y_{i-1}, x, i)\Big). \qquad (4)$$

The normalization factor $Z(x)$ is calculated by the formula

$$Z(x) = \sum_{y'} \prod_{i=1}^{n} \exp\Big(\sum_{j} \lambda_j f_j(y'_i, y'_{i-1}, x, i)\Big). \qquad (5)$$
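A brute-force sketch of this conditional distribution, assuming a toy two-token sequence with two hand-picked feature functions and arbitrary weights (all names and values here are hypothetical):

```python
import itertools
import math

LABELS = ["O", "NAME"]
x = ["john", "runs"]          # observed token sequence

# Feature functions f_j(y_i, y_prev, x, i) and their weights lambda_j.
def f_known_name(y_i, y_prev, x, i):
    return 1.0 if y_i == "NAME" and x[i] == "john" else 0.0

def f_name_to_o(y_i, y_prev, x, i):
    return 1.0 if y_prev == "NAME" and y_i == "O" else 0.0

features = [f_known_name, f_name_to_o]
weights  = [2.0, 0.5]         # lambda vector, chosen arbitrarily

def score(y, x):
    """Unnormalized log-score: sum over positions and features."""
    s = 0.0
    for i in range(len(x)):
        y_prev = y[i - 1] if i > 0 else "START"
        s += sum(w * f(y[i], y_prev, x, i) for w, f in zip(weights, features))
    return s

# Z(x): sum of exp(score) over all label sequences (brute force).
Z = sum(math.exp(score(y, x))
        for y in itertools.product(LABELS, repeat=len(x)))

def p(y, x):
    return math.exp(score(y, x)) / Z

# Most probable labeling of the sequence.
best = max(itertools.product(LABELS, repeat=len(x)), key=lambda y: p(y, x))
```

Real implementations compute $Z(x)$ with dynamic programming (the forward-backward algorithm) rather than enumerating all label sequences, which would be exponential in the sequence length.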

Method comparison

The CRF method, like the MEMM (Maximum Entropy Markov Models) method, refers to discriminative probabilistic methods, in contrast to generative methods such as HMM (Hidden Markov Models) or the Naïve Bayes method.

As with MEMM, the choice of feature functions that define the probability of transitions between states given the observations depends on the specifics of the data; but unlike MEMM, CRF can take into account arbitrary features and interdependencies in the source data. The parameter vector $\lambda$ is calculated from the training sample and determines the weight of each potential function. Algorithms similar to those used for HMMs are applied to train and use the model: the Viterbi algorithm and the forward-backward algorithm.
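As a sketch of the decoding step, here is a minimal Viterbi implementation for a linear-chain model, assuming the log-potentials for each position have already been computed (all names in this sketch are hypothetical):

```python
import math

def viterbi(labels, n, log_psi, log_init):
    """Return the highest-scoring label sequence of length n.

    log_init[y]        -- log-potential of starting in label y
    log_psi(i, yp, y)  -- log-potential of transition yp -> y at position i
    """
    # delta[y] = best log-score of any prefix ending in label y
    delta = {y: log_init[y] for y in labels}
    back = []                     # backpointers for path recovery
    for i in range(1, n):
        new_delta, ptr = {}, {}
        for y in labels:
            best_prev = max(labels, key=lambda yp: delta[yp] + log_psi(i, yp, y))
            new_delta[y] = delta[best_prev] + log_psi(i, best_prev, y)
            ptr[y] = best_prev
        delta = new_delta
        back.append(ptr)
    # Trace back from the best final label.
    y = max(labels, key=lambda l: delta[l])
    path = [y]
    for ptr in reversed(back):
        y = ptr[y]
        path.append(y)
    return list(reversed(path))

# Toy usage: two labels with "sticky" transitions; the best path
# starts in the likelier label "A" and stays there.
labels = ["A", "B"]
log_init = {"A": math.log(0.9), "B": math.log(0.1)}
def sticky(i, yp, y):
    return math.log(0.8) if yp == y else math.log(0.2)
best_path = viterbi(labels, 3, sticky, log_init)
```

The same dynamic-programming recursion, with max replaced by summation, yields the forward pass used to compute $Z(x)$ during training.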

The hidden Markov model can be considered as a special case of a linear conditional random field (CRF). In turn, the conditional random field can be regarded as a kind of Markov random field (see figure).



Fig. Graphical models of the HMM, MEMM, and CRF methods. Open circles indicate that the distribution of the corresponding random variable is not modeled. Arrows indicate dependencies between nodes.

Conditional random fields do not suffer from the so-called label bias problem, a situation in which states with fewer outgoing transitions are favored: the probability distribution is constructed over the whole sequence, and the normalization (the factor Z(x) from formula (5)) is performed globally rather than within each individual state. This is a definite advantage of the method; furthermore, the algorithm does not require the assumption that the observed variables are independent. In addition, the use of arbitrary feature functions makes it possible to describe various properties of the objects being recognized, which relaxes the requirements on the completeness and size of the training sample. Depending on the problem being solved, a volume of several hundred thousand to a few million tokens is sufficient in practice. At the same time, accuracy is determined not only by the sample size but also by the chosen features.
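The label bias effect can be illustrated with a toy calculation, assuming MEMM-style per-state normalization (all scores here are invented):

```python
# In a locally normalized model (MEMM-style), each state's outgoing
# transition scores are renormalized to sum to 1.  A state with a
# single successor therefore passes probability 1 to it, no matter
# how poorly the observation fits that transition.

def local_normalize(scores):
    z = sum(scores.values())
    return {k: v / z for k, v in scores.items()}

# State "A" has one successor; state "B" has two.
obs_score = {("A", "A2"): 0.01,            # observation fits A->A2 badly
             ("B", "B2"): 0.7, ("B", "B3"): 0.3}

p_from_A = local_normalize({"A2": obs_score[("A", "A2")]})
p_from_B = local_normalize({"B2": obs_score[("B", "B2")],
                            "B3": obs_score[("B", "B3")]})

# Despite the terrible observation score, A -> A2 gets probability 1.
assert p_from_A["A2"] == 1.0
```

A globally normalized model such as a CRF keeps the raw score 0.01 and normalizes only once over whole sequences, so the poorly fitting path remains penalized.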

The disadvantage of the CRF approach is the computational complexity of training on the sample, which makes it difficult to continually update the model as new training data arrives.

A comparison of the CRF method with some other methods used in computational linguistics can be found in [3]. The paper shows that CRF performs better than the alternatives (in terms of the F1 measure) on a variety of linguistic problems. For example, in the task of automatically finding named entities in text, CRF demonstrates somewhat lower precision than MEMM but wins significantly in recall.

It should be added that implementations of the CRF algorithm are fast, which is very important when processing large volumes of information.

Today the CRF method is the most popular and accurate way to extract entities from text. For example, it is implemented in Stanford University's Stanford Named Entity Recognizer [4]. The method has also been successfully applied to other kinds of linguistic text processing.

Bibliography

  1. Lafferty J., McCallum A., Pereira F. "Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data". Proceedings of the 18th International Conference on Machine Learning, pp. 282-289, 2001.
  2. Klinger R., Tomanek K. Classical Probabilistic Models and Conditional Random Fields. Algorithm Engineering Report TR07-2-013, Department of Computer Science, Dortmund University of Technology, December 2007. ISSN 1864-4503.
  3. Antonova A.Yu., Soloviev A.N. The method of conditional random fields in problems of Russian text processing. "Information Technologies and Systems - 2013", Kaliningrad, 2013. http://itas2013.iitp.ru/pdf/1569759547.pdf
  4. Stanford Named Entity Recognizer // http://www-nlp.stanford.edu/software/CRF-NER.shtml

Source: https://habr.com/ru/post/241317/

