Synthesis of an optimal facial recognition algorithm

Content:

1. Search and analysis of the optimal color space for the construction of salient objects on a given class of images
2. Determination of the dominant signs of classification and the development of a mathematical model of facial expressions
3. Synthesis of an optimal facial recognition algorithm
4. Implementation and testing of the facial recognition algorithm
5. Creating a test database of images of users' lips in various states to increase the accuracy of the system
6. Search for the best open source audio speech recognition system
7. Search for the optimal audio system of speech recognition with closed source code, but having open API, for the possibility of integration
8. Experiment for integrating video extensions into audio speech recognition system with test report

Goals

Determine the optimal algorithm for subsequent implementation and testing in the recognition of facial expressions.

Tasks

Analyze the existing algorithms for video recognition of a human face and its characteristics, taking into account the dominant signs of classification and the mathematical model we defined earlier. On the basis of the data obtained, select the visual recognition algorithm best suited to our task: implementing facial expression recognition for mobile devices and computers.

Theme

Since our task is to implement a high-performance facial expression recognition system for mobile devices, the choice of the optimal algorithm must proceed from the following conditions:

• Low resolution and high noise level (typical of most front-facing VGA cameras in smartphones and PCs);
• Low demands on the computing power of mobile devices and computers when processing data at 25 frames per second;
• High processing speed (for real-time video processing).

Given these conditions, we need a reliable algorithm for facial expression recognition that has minimal system requirements and is highly efficient. In synthesizing the optimal algorithm we must also take into account the experience accumulated in the previous stages of the study.

Let us present the scheme of image processing and subsequent analysis as a table (Fig. 1). At this stage of the study we need to fill in the column highlighted in blue, that is, choose the optimal matrix recognition algorithm.

Before choosing the optimal algorithm for our facial expression recognition problem, however, we should explain how the feature vector is captured.

Capturing the feature vector

Once the image has been binarized and the lip contour extracted in the previous stages, n points are placed on the contour and numbered clockwise from p_1 to p_n. The numbering starts at the intersection of the lip contour with the left larger radius of the ellipse. The point coordinates are normalized: the center of the ellipse is taken as the origin, the x axis points along the larger radius of the ellipse, and the larger radius is taken as the unit of length. In addition to the point coordinates, contour extraction yields the parameters of the ellipse that describes the lip area in the original image. These parameters allow conclusions about general properties of the mouth area, such as whether the mouth is open or closed.
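
The normalization described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name normalize_contour and its parameters (ellipse center, larger radius in pixels, rotation angle of the larger radius in radians) are assumptions introduced here.

```python
import math

def normalize_contour(points, center, major_radius, angle):
    """Map image-space contour points into the ellipse-based frame.

    points: list of (x, y) pixel coordinates on the lip contour.
    center: (cx, cy) of the ellipse, which becomes the origin.
    major_radius: length of the larger radius in pixels (becomes 1).
    angle: rotation of the larger radius against the image x axis, radians.
    All parameter names are hypothetical.
    """
    cx, cy = center
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    normalized = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        # Rotate by -angle so the larger radius lies along the new x axis.
        u = dx * cos_a + dy * sin_a
        v = -dx * sin_a + dy * cos_a
        # Scale so that the larger radius has length one.
        normalized.append((u / major_radius, v / major_radius))
    return normalized
```

After this step the contour no longer depends on where the mouth sits in the frame, how large it appears, or how the head is tilted.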

Then we search for the corners (Fig. 2). Among the points obtained, the right and left corners must be identified. Despite the numbering of the points, these are not always the points p_1 and p_{n/2}. The right corner is the point in the right half of the contour (between p_{n/4} and p_{3n/4}) whose angle α is the smallest. The angle α is the angle between the averaged neighbors q_next and q_prev, where q_next = (p_{i+1} + ... + p_{i+k}) / k, q_prev = (p_{i-1} + ... + p_{i-k}) / k, and k = n / 5. A similar rule is used for the left corner [1].
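
A possible sketch of the right-corner search is given below. It makes several assumptions that the text does not state: indices are 0-based, the contour is closed (indices wrap around), k is rounded down, and α is measured at p_i between the directions to q_next and q_prev. The function names are hypothetical.

```python
import numpy as np

def corner_angle(points, i, k):
    """Angle at points[i] between the averaged next and previous neighbors."""
    n = len(points)
    # q_next and q_prev: means of the k following and k preceding points.
    q_next = points[[(i + j) % n for j in range(1, k + 1)]].mean(axis=0)
    q_prev = points[[(i - j) % n for j in range(1, k + 1)]].mean(axis=0)
    a, b = q_next - points[i], q_prev - points[i]
    cos_alpha = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(cos_alpha, -1.0, 1.0))

def find_right_corner(points):
    """Index of the sharpest point in the right half of the contour."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    k = max(1, n // 5)  # assumption: k = n / 5 rounded down
    candidates = range(n // 4, 3 * n // 4 + 1)
    return min(candidates, key=lambda i: corner_angle(points, i, k))
```

Averaging over k neighbors rather than using the two adjacent points makes the angle estimate less sensitive to noise in individual contour points.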

The next step after finding the corners is to convert the source data into a set of feature vectors. The first elements of a feature vector are features obtained independently of the point coordinates, namely the ratio of the height of the lip-area ellipse to its width. The following elements are the coordinates of the left and right corners of the contour, the coordinates of its upper and lower points, and the coordinates of the remaining contour points. Let us now consider analyzing these data with the principal component method. Choosing a basis by principal components reveals the main directions along which the feature vectors vary, which makes it possible to reduce the dimension of the feature vectors significantly. The method is applied to a set of feature vectors obtained from a data set that covers most of the possible lip states.
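
The dimension reduction can be illustrated with a short NumPy sketch. It computes the principal directions through a singular value decomposition of the centered data; the function names fit_pca and project are assumptions made for this example, and the number of components to keep is left to the caller.

```python
import numpy as np

def fit_pca(features, n_components):
    """Return the mean vector and the leading principal directions."""
    X = np.asarray(features, dtype=float)
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(features, mean, basis):
    """Express feature vectors in the reduced principal-component basis."""
    return (np.asarray(features, dtype=float) - mean) @ basis.T
```

The basis is fitted once on the training set of lip states; new feature vectors are then only projected with the stored mean and basis.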

Now consider the most common algorithms for recognizing a human face and its characteristics:

Algorithms based on hidden Markov models (HMM)

A hidden Markov model (HMM) is a statistical model of a process that behaves like a Markov process with unknown parameters; the task is to estimate the unknown parameters from the observed data.

Each feature vector must be mapped to a symbol of the hidden Markov model (Fig. 3). For this we use vector quantization. With this method the space of feature vectors is divided into clusters according to proximity to the cluster centers, which are called code words. The set of code words is called a code book. The main difficulty of the method is building the code book. The size of the code book is determined by the number of lip states in the source data. A code book of known size k is constructed with the k-means algorithm [2].

At the first step of the algorithm, k vectors are selected at random and taken as the code words (cluster centers). At the second step, each input vector is assigned to the cluster whose code word is closest to it. At the third step, the code word of each cluster is recalculated as the arithmetic mean of all vectors in the cluster. The second and third steps are repeated until the changes in the code words become small enough.
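
The three steps can be sketched in NumPy as follows. The function name build_codebook, the stopping tolerance tol, the iteration cap max_iter and the fixed random seed are assumptions of this example, and the handling of empty clusters (keeping the old code word) is one possible choice that the text does not specify.

```python
import numpy as np

def build_codebook(vectors, k, tol=1e-6, max_iter=100, seed=0):
    """Build a code book of k code words with the k-means procedure."""
    X = np.asarray(vectors, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct input vectors at random as initial code words.
    codebook = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(max_iter):
        # Step 2: assign every vector to its nearest code word.
        dists = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each code word to the mean of its cluster.
        new_codebook = codebook.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its old code word
                new_codebook[j] = members.mean(axis=0)
        shift = np.linalg.norm(new_codebook - codebook)
        codebook = new_codebook
        if shift < tol:  # code words have stopped moving
            break
    return codebook
```

Because the start is random, the procedure may settle in a local optimum; running it with several seeds and keeping the code book with the lowest total distortion is a common remedy.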

This algorithm is slow, but applying principal component analysis before quantization lowers the dimension and thus significantly speeds up the construction of the code book [3]. New source data are quantized before they are used for recognition: each vector is assigned the closest vector from the code book, and from then on its index in the code book is used in place of the vector as a symbol of the hidden Markov model.
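
Quantizing new data against an existing code book takes only a few lines. In this sketch the function name quantize is an assumption, and Euclidean distance is assumed as the measure of closeness.

```python
import numpy as np

def quantize(vectors, codebook):
    """Replace each feature vector by the index of its nearest code word."""
    X = np.asarray(vectors, dtype=float)
    C = np.asarray(codebook, dtype=float)
    # Distance from every vector to every code word, shape (len(X), len(C)).
    dists = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
    return dists.argmin(axis=1)
```

The resulting integer sequence is what the hidden Markov models consume as observation symbols.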

Recognition cannot work at the level of individual visemes, since the visemes of different phonemes are quite similar. Recognition based on sequences of visemes (diphones, triphones) is much more reliable [4]. A system of ergodic hidden Markov models is used for recognition [5]. Each diphone has its own HMM. Each HMM is initialized with equal probabilities for the symbols and for the transitions between states. However, because of its many degrees of freedom, such a model fits the training data poorly, which degrades the quality of recognition [6].

The HMM system is trained on sequences of quantized feature vectors. The source data are manually segmented into training diphones, after which the corresponding HMM is updated with the Baum-Welch algorithm [7]. The resulting HMM gives the highest probability values for sequences close to the training set of its own diphone.
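
The initialization with equal probabilities and the selection of the most probable diphone model can be sketched as below. The sketch uses the scaled forward algorithm to score a symbol sequence; Baum-Welch training itself is not shown. The function names and the dictionary of models keyed by diphone name are assumptions made for this example.

```python
import numpy as np

def uniform_hmm(n_states, n_symbols):
    """Ergodic HMM with equal start, transition and symbol probabilities."""
    start = np.full(n_states, 1.0 / n_states)
    trans = np.full((n_states, n_states), 1.0 / n_states)
    emit = np.full((n_states, n_symbols), 1.0 / n_symbols)
    return start, trans, emit

def log_likelihood(symbols, start, trans, emit):
    """Scaled forward algorithm: log P(symbols | model)."""
    alpha = start * emit[:, symbols[0]]
    log_prob = 0.0
    for s in symbols[1:]:
        # Rescale at every step to avoid underflow on long sequences.
        scale = alpha.sum()
        log_prob += np.log(scale)
        alpha = (alpha / scale) @ trans * emit[:, s]
    return log_prob + np.log(alpha.sum())

def recognize(symbols, models):
    """Return the name of the model that scores the sequence highest."""
    return max(models, key=lambda name: log_likelihood(symbols, *models[name]))
```

Here symbols is a sequence of code book indices, and models maps each diphone name to its (start, trans, emit) triple after training.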

As a result, an effective algorithm for constructing lip feature vectors for the speech recognition problem has been built. The algorithm converts lip contour data into sets of features suitable for recognition. It is reliable and stable and integrates easily with a speech recognition system based on hidden Markov models. However, its weaknesses should also be noted: in particular, it has weak discriminating ability and is hard to train.

Algorithms based on artificial neural networks (ANN)

Neural network methods are based on the use of various types of artificial neural networks (ANN). An ANN consists of elements called formal neurons, which are very simple in themselves and are connected to other neurons. Each neuron converts the set of signals arriving at its input into an output signal. It is the connections between neurons, encoded by weights, that play the key role. One advantage of ANNs (and at the same time a drawback when they are implemented on a sequential architecture) is that all elements can operate in parallel (Fig. 4), which significantly increases the efficiency of solving the problem, especially in image processing. Besides solving many problems efficiently, ANNs provide powerful, flexible and versatile learning mechanisms, which is their main advantage over other methods. Another advantage is the possibility of obtaining a classifier that models the complex distribution function of face images well. The disadvantage is the need for careful and painstaking tuning of the network to obtain a satisfactory classification result [8].
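
A formal neuron and a layer of such neurons can be written in a few lines. The sigmoid activation is an assumption of this sketch (the text does not name an activation function), as are the function names.

```python
import numpy as np

def neuron(inputs, weights, bias):
    """Formal neuron: weighted sum of the inputs passed through a sigmoid."""
    z = np.dot(weights, inputs) + bias
    return 1.0 / (1.0 + np.exp(-z))

def layer(inputs, weight_matrix, biases):
    """A layer evaluates all of its neurons at once as one matrix product."""
    z = weight_matrix @ inputs + biases
    return 1.0 / (1.0 + np.exp(-z))
```

The layer function shows why parallel hardware helps: every row of weight_matrix is an independent neuron, so all outputs can be computed simultaneously.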

Algorithms based on combined methods

Given the advantages and disadvantages of algorithms based on hidden Markov models and on neural networks, hybrid algorithms have recently become popular in research on recognition for a given class of images. According to research data, hybrid ANN/HMM recognizers improve on the accuracy of traditional HMMs by modeling correlations between simultaneous signal parameters and between current and following parameters [10]. That is, the HMM provides the ability to model long-term dependencies, while the ANN provides non-parametric universal approximation, probability estimation, discriminative learning algorithms, and a reduction in the number of parameters that standard HMMs usually require to be estimated [11]. However, when choosing a combined algorithm, it must be borne in mind that an overly large and complex hybrid architecture increases the processing time on the device's processor.
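
One common way of coupling the two models, shown here only as an illustration and not necessarily the scheme used in [10] or [11], is to let the network estimate state posteriors for each frame and divide them by the state priors to obtain scaled likelihoods that stand in for the HMM emission probabilities. The function name and argument layout are assumptions.

```python
import numpy as np

def scaled_likelihoods(posteriors, priors):
    """Turn network outputs P(state | frame) into HMM emission scores.

    By Bayes' rule p(frame | state) is proportional to
    P(state | frame) / P(state), so dividing by the state priors gives
    scores that can replace the emission probabilities of the HMM.
    posteriors: array of shape (n_frames, n_states); priors: (n_states,).
    """
    return np.asarray(posteriors, dtype=float) / np.asarray(priors, dtype=float)
```

The priors are typically estimated as the relative frequency of each state in the training alignment.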

Conclusion

To determine the optimal algorithm for the problem of facial expression recognition, we first considered in detail the simplest and most reliable mechanism for capturing the feature vector for subsequent analysis by matrix algorithms. We then reviewed the advantages and disadvantages of the best-known modeling approaches: hidden Markov models, artificial neural networks and hybrid algorithms. Having examined the existing approaches to feature vector processing, we settled on the combined methods, which we believe are the most effective for our goal: a reliable and fast facial expression recognition system for mobile devices and computers.

List of used sources

1. Soldatov S. Lip reading: preparing feature vectors. Graphics & Media Laboratory, MSU, 2003.
2. Linde Y., Buzo A., Gray R. An algorithm for vector quantizer design // IEEE Transactions on Communications, COM-28, 1980.
3. Soldatov S. Lip reading: preparing feature vectors. Graphics & Media Laboratory, MSU, 2003.
4. Ibid.
5. Gultyaeva T.A., Popov A.A. Modifications of one-dimensional hidden Markov models for the problem of face recognition // 16th International Conference on Computer Graphics and Computer Vision - GRAPHICON, 2006.
6. Sobottka K., Pitas I. A novel method for automatic face segmentation, facial feature extraction and tracking // Signal Processing: Image Communication, Vol. 12, No. 3, pp. 263-281, June 1998.
7. Gultyaeva T.A., Popov A.A. Modifications of one-dimensional hidden Markov models for the problem of face recognition // 16th International Conference on Computer Graphics and Computer Vision - GRAPHICON, 2006.
8. Makarenko A.A. Image classification with a convolutional neural network // Scientific session TUSUR-2006. Proceedings of the All-Russian scientific and technical conference of students, graduate students and young specialists. Part 1. Tomsk, 2005.
9. Wasserman P. Neural Computing: Theory and Practice (Russian translation by Yu. A. Zuev and V. A. Tochenov), 1992.
10. Osetrov V.P. Audiovisual speech recognizer // Staffing for the development of innovative activities in Russia. Ershovo, Moscow, 2010.
11. Makovkin K.A. Hybrid models: hidden Markov models and neural networks, their application in speech recognition systems // Models, methods, algorithms and architectures of speech recognition systems. Dorodnitsyn Computing Center, Moscow, 2006.

To be continued.

Source: https://habr.com/ru/post/229895/
