Implementation and testing of facial recognition algorithm

Content:

1. Search and analysis of the optimal color space for extracting salient objects on a given class of images
2. Determination of the dominant classification features and development of a mathematical model of facial expressions
3. Synthesis of optimal facial recognition algorithm
4. Implementation and testing of facial recognition algorithm
5. Creating a test database of images of users' lips in various states to increase the accuracy of the system
6. Search for the best open source audio speech recognition system
7. Search for the optimal audio system of speech recognition with closed source code, but having open API, for the possibility of integration
8. Experiment for integrating video extensions into audio speech recognition system with test report

Goals:

Determine the optimal algorithm for human face recognition tasks and consider ways to implement it.

Tasks:

Analyze the existing algorithms for recognizing facial expressions, taking into account the dominant classification features and the mathematical model identified earlier. Based on the results, select the optimal algorithm for subsequent implementation and testing.

Introduction

In previous scientific reports, a mathematical model of facial expression recognition was developed and a facial recognition algorithm was synthesized. There are two approaches to recognizing facial expressions: fitting a deformable model to the lip area, and extracting feature vectors from the lip area and then analyzing them with algorithms based on Gaussian mixtures. To implement facial recognition, the optimal algorithm must be chosen.

1. Algorithms of recognition of a human face:

1.1 Algorithms based on a deformable model.

A deformable model is a template of some shape (an open or closed curve in the two-dimensional case, a surface in the three-dimensional case). When superimposed on an image, the template deforms under the influence of various forces, internal (defined for each specific template) and external (defined by the image on which the template is placed): the model changes its shape to fit the input data [1]. The initial coarse lip model is deformed by the force fields derived from the input image (Fig. 1).

The main advantage over traditional search methods such as the Hough transform [2], in which the search pattern is rigidly defined, is that deformable models can change their shape during operation, allowing a more flexible search for the object [3].

The main disadvantage of deformable models [4] is the need to run a large number of iterations over a large number of frames, which puts a significant load on the system; offloading the main computations to the cloud can relieve it.

Deformable models can be classified according to the type of restrictions imposed on their shape into two types: deformable free-form models and parametric deformable models.

1.1.1 Deformable free-form models

Deformable free-form models are deformable models subject only to general requirements of contour smoothness and continuity. An example of such a model is the "snake" [5]. The classic "snake" is a deformable model defined by a spline. The model is deformed by moving the control points of the spline over the lip image. Its energy is a weighted sum of two components: internal energy (determined by the continuity and smoothness of the contour) and external energy (determined by the image details that attract the "snake"). A third term is also possible: additional energy that encodes extra user-defined constraints. Snakes are widely used in medical image processing [6], motion tracking and segmentation tasks [7].

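Formally, a "snake" is a parametrically defined contour c(s) = (x(s), y(s)), s from 0 to 1. The energy of the "snake" is expressed by the sum

$$E(c) = \int_0^1 \left( \alpha\,|c'(s)|^2 + \beta\,|c''(s)|^2 \right) ds + \int_0^1 P(c(s))\,ds$$

Here the first integral is the internal energy, with weights α and β for the continuity and smoothness terms, and P is the potential associated with the image and rigidly tied to the lips. For a "snake" configured to search for boundaries, one possible choice is

$$P(x, y) = -|\nabla I(x, y)|^2$$

where I is the brightness of the image.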

The desired parameters are obtained by minimizing the energy. Minimization can be carried out, for example, by the branch and bound method. To do this, the coordinates of each control point are changed iteratively, and the change that yields the lowest energy is carried into the next iteration. The procedure stops when no change at the next step can reduce the energy of the "snake" (Fig. 2).
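The iterative procedure described above can be sketched as a greedy local search. This is only an illustration under stated assumptions, not the report's implementation and not a branch and bound solver: the contour is a closed list of integer pixel coordinates, the energy is a discrete weighted sum of a continuity term, a smoothness term and the potential P = -|grad I|^2, and the names `gradient_potential`, `snake_energy`, `greedy_snake` and the weights `alpha`, `beta` are hypothetical.

```python
def gradient_potential(image):
    """Assumed edge potential P(x, y) = -|grad I|^2 (central differences, clamped borders)."""
    h, w = len(image), len(image[0])
    potential = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = (image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]) / 2.0
            gy = (image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]) / 2.0
            potential[y][x] = -(gx * gx + gy * gy)
    return potential


def snake_energy(points, potential, alpha, beta):
    """Discrete energy of a closed contour given as a list of (x, y) pixel points."""
    n = len(points)
    total = 0.0
    for i in range(n):
        px, py = points[i - 1]
        cx, cy = points[i]
        nx, ny = points[(i + 1) % n]
        continuity = (cx - px) ** 2 + (cy - py) ** 2
        smoothness = (px - 2 * cx + nx) ** 2 + (py - 2 * cy + ny) ** 2
        total += alpha * continuity + beta * smoothness + potential[cy][cx]
    return total


def greedy_snake(points, potential, alpha=0.1, beta=0.1, max_iters=100):
    """Move each control point to the 8-neighbour that lowers the energy most."""
    h, w = len(potential), len(potential[0])
    points = list(points)
    moves = [(dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    for _ in range(max_iters):
        changed = False
        for i in range(len(points)):
            x, y = points[i]
            best = (x, y)
            best_energy = snake_energy(points, potential, alpha, beta)
            for dx, dy in moves:
                cx, cy = x + dx, y + dy
                if 0 <= cx < w and 0 <= cy < h:
                    points[i] = (cx, cy)  # try the candidate position
                    energy = snake_energy(points, potential, alpha, beta)
                    if energy < best_energy:
                        best, best_energy = (cx, cy), energy
            points[i] = best  # keep the best position found for this point
            if best != (x, y):
                changed = True
        if not changed:  # no single move reduces the energy any further
            break
    return points


# Usage on a synthetic 9x9 image with a bright 3x3 square in the middle.
image = [[0.0] * 9 for _ in range(9)]
for row in range(3, 6):
    for col in range(3, 6):
        image[row][col] = 1.0
start = [(1, 1), (4, 1), (7, 1), (7, 4), (7, 7), (4, 7), (1, 7), (1, 4)]
print(greedy_snake(start, gradient_potential(image)))
```

Because a move is accepted only when it strictly lowers the energy, the loop cannot cycle; `max_iters` is just a safety cap. Recomputing the full energy for every candidate keeps the sketch short at the cost of speed.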

The main drawback of the "snake" is that, if there are no clearly defined details near the initialization position, the internal energy, which controls the smoothness of the sought object, tends to over-tighten the model and degenerate it into a straight line, since a straight line (whose second derivative is zero) minimizes the smoothness term of the energy. To avoid this effect, a special kind of deformable model is used, the so-called balloon [8].
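As a reference point, and assuming the formulation given in Cohen's paper [8], the balloon adds an inflation force along the contour normal to the normalized image force. The symbols k_1, k and n(s) below follow that paper and are not defined in this report; c(s) is the contour and P the image potential.

```latex
F(s) = k_1\,\vec{n}(s) \;-\; k\,\frac{\nabla P\bigl(c(s)\bigr)}{\bigl\lVert \nabla P\bigl(c(s)\bigr) \bigr\rVert}
```

Here n(s) is the unit normal to the contour at c(s), k_1 sets the strength of the inflation, and k scales the normalized image force. The constant normal term keeps pushing the contour outward even where the image provides no nearby details, which is what prevents the collapse described above.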

The main advantages of the "snake" are its relative ease of implementation (when numerical optimization procedures are not involved) and its robustness to variability in the input data.

As a result, we can conclude that the use of the "snake" for the recognition of facial expressions will be associated with the need to use numerical optimization methods, which will complicate the already cumbersome algorithm.

1.1.2 Parametric deformable models

Parametric deformable models are models with more stringent restrictions on shape. The model is initialized with a template of a strictly defined form, and during further deformation the internal energy of the model controls its compliance with the shape constraints [9]. Such models are widely used for recognizing faces [10], gestures and human figures in images.

The energy E of the model, whose contour is parameterized by v(s), has the same form as for an ordinary "snake":

The first two terms describe the regularity energy of the contour. In our polar coordinate system, v(s) = [r(s), θ(s)], with s from 0 to 1. The third term is the energy related to the external force obtained from the image, and the fourth is related to the pressure force.
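To make the polar parameterization v(s) = [r(s), θ(s)] concrete, the sketch below converts a sampled polar contour into Cartesian image points. It rests on assumptions that the report does not state: θ(s) is sampled uniformly over a full turn, s = k / n for sample k, and the pole is a hypothetical `center` (for instance, an estimated mouth center). The function name is illustrative.

```python
import math


def polar_contour_to_points(radii, center=(0.0, 0.0)):
    """Convert radii r(s) sampled at s = k / n into (x, y) points around `center`."""
    n = len(radii)
    cx, cy = center
    points = []
    for k, r in enumerate(radii):
        theta = 2.0 * math.pi * k / n  # assumed uniform sampling of theta(s)
        points.append((cx + r * math.cos(theta), cy + r * math.sin(theta)))
    return points


# Usage: an ellipse-like contour, wider than it is tall, around pixel (50, 40).
radii = [20.0, 14.0, 10.0, 14.0, 20.0, 14.0, 10.0, 14.0]
for point in polar_contour_to_points(radii, center=(50.0, 40.0)):
    print(point)
```

With this representation each control point has a single free parameter, its radius, so a deformation step only has to adjust r(s) along a fixed direction.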

External force is determined based on the above characteristics. It is able to shift control points to a certain intensity value. It is calculated as:

Analytically defined parametric deformable models are described by a set of primitives, which are interconnected in some way [10]. Primitives and the links between them are involved in the calculation of internal energy, so that the shape of the deformable model cannot significantly deviate from the initializing shape (Fig. 3).

Another variant of parametric deformable models is based on a prototype model (prototype based deformable models [9]). The initialization position and shape of the model based on the prototype is established by machine learning methods or high-level image processing.

The advantage of the parametric model over the "snake" is that, in general, there is no need to apply constraining parameters, since the shape of the model varies only within specified limits. The need for numerical optimization also often disappears. According to experimental data [9], the approach works quite well.

The main disadvantage of the parametric model is the need to compile training samples and to rebuild them whenever the model parameters change.

Comparing the "snake" with the parametric model led to the conclusion that the parametric model is the better choice, since changing the training samples is more convenient than changing and tuning the parameters that optimize the results of the "snake".

1.2 Algorithms based on Gaussian mixtures.

A Gaussian mixture is a weighted combination of normal distributions [12]. The normal distribution with a mean of 0 and a standard deviation of 1 is called the standard normal distribution. A weighted sum of normal densities gives a Gaussian mixture (Fig. 4).

The Gaussian mixture model is a weighted sum of M components and can be written as

$$p(x \mid \lambda) = \sum_{i=1}^{M} p_i\, b_i(x) \qquad (1)$$

where x is a D-dimensional vector of random variables; b_i(x), i = 1, ..., M, are the density functions of the model components; and p_i, i = 1, ..., M, are the weights of the model components. Each component is a D-dimensional Gaussian density of the form

$$b_i(x) = \frac{1}{(2\pi)^{D/2}\,|\Sigma_i|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu_i)^{T}\, \Sigma_i^{-1}\, (x - \mu_i) \right) \qquad (2)$$

where μ_i is the expectation vector and Σ_i is the covariance matrix of component i. The complete Gaussian mixture model is determined by the expectation vectors, the covariance matrices and the mixture weights of all components. These parameters are collectively written as

$$\lambda = \{\, p_i, \mu_i, \Sigma_i \,\}, \quad i = 1, \ldots, M$$
In the problem of recognition of facial expressions, each image of the lips is represented by a model of Gaussian mixtures and is assigned to its own model, λ. The model of a Gaussian mixture can have several different forms, depending on the type of covariance matrix.
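A minimal sketch of how such models could be used for classification: each lip state gets its own mixture, and a sequence of feature vectors is assigned to the model with the highest total log-likelihood. This is an illustration, not the report's implementation. It assumes diagonal covariance matrices (one of the possible covariance forms), already trained parameters, and hypothetical labels "closed" and "open"; all function names are invented for the example.

```python
import math


def gaussian_diag(x, mean, variances):
    """D-dimensional Gaussian density with a diagonal covariance matrix."""
    log_density = 0.0
    for xi, mi, vi in zip(x, mean, variances):
        log_density -= 0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_density)


def gmm_density(x, weights, means, variances):
    """Mixture density: weighted sum of the M component densities."""
    return sum(p * gaussian_diag(x, m, v)
               for p, m, v in zip(weights, means, variances))


def classify(features, models):
    """Return the label of the model with the highest total log-likelihood."""
    def log_likelihood(model):
        weights, means, variances = model
        # The floor avoids log(0) for vectors far from every component.
        return sum(math.log(max(gmm_density(x, weights, means, variances), 1e-300))
                   for x in features)
    return max(models, key=lambda label: log_likelihood(models[label]))


# Usage with two hand-made 2-D models (weights, means, variances per component).
models = {
    "closed": ([1.0], [[0.0, 0.0]], [[1.0, 1.0]]),
    "open": ([0.5, 0.5], [[4.0, 4.0], [6.0, 6.0]], [[1.0, 1.0], [1.0, 1.0]]),
}
print(classify([[4.2, 3.9], [5.8, 6.1]], models))
```

Summing log-likelihoods over frames treats the feature vectors as independent, which is a simplifying assumption of this sketch.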

The main advantage of using Gaussian mixtures is the intuitive assumption that the individual components of the model can simulate some set of acoustic features / events [12].

The second advantage of using Gaussian mixture models to identify facial expressions is the empirical observation that a linear combination of Gaussian distributions can represent a large number of lip image classes. One of the strengths of Gaussian mixture models is that they can approximate arbitrary distributions very accurately.

The disadvantage of using the model of Gaussian mixtures lies in the difficulty of extracting the feature vector from each frame, as well as in the analysis of the obtained data itself, since it is difficult to divide them into classes.

Due to the complexity of the implementation and the large number of calculations, it is time-consuming to use Gaussian mixtures in the problem of recognizing facial expressions, and because of the ambiguity of the data obtained, it is difficult to get rid of the errors that occur.

Approbation and conclusion:

In this report, we considered facial expression recognition algorithms based on the properties of a deformable model (the free-form deformable model and the parametric model) and on statistical characteristics (algorithms based on Gaussian mixtures). Deformable models rely on changes in the properties of the original template, while Gaussian mixtures rely on the statistical characteristics of the region of interest.

The use of Gaussian mixtures involves processing large amounts of data [13], which is resource-intensive, and a parametric deformable model requires preprocessing procedures that take at least 4 minutes of processor time [5]. For this reason, the ordinary "snake" was chosen, since it is the fastest of the options considered [3].

Fig. 5. Operation of a deformable model on a previously binarized image of a person's lips.

According to our research, preliminary binarization of the lip area improves the quality of the active contour algorithm several times over (Fig. 5). It also removes the need for the system pre-tuning and reference image analysis that a parametric deformable model would require.
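The binarization step can be illustrated with a simple global threshold. The report does not say which binarization method was used, so the mean-brightness threshold below is an assumption made purely for illustration, and `binarize` is a hypothetical helper.

```python
def binarize(image, threshold=None):
    """Map a grayscale image (list of rows) to 0/1 using a global threshold."""
    if threshold is None:
        pixels = [p for row in image for p in row]
        threshold = sum(pixels) / len(pixels)  # assumption: mean brightness
    return [[1 if p > threshold else 0 for p in row] for row in image]


# Usage: a tiny grayscale patch with a dark left half and a bright right half.
patch = [
    [10, 20, 200, 210],
    [15, 30, 220, 205],
]
for row in binarize(patch):
    print(row)
```

On a binary image the brightness gradient is nonzero only along the boundary of the segmented region, so an edge-seeking contour is attracted to that boundary alone rather than to texture inside the lips or on the surrounding skin.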

Reliable detection of the lip contour will make it possible to move on to the direct analysis of micropauses, which will improve existing audio speech recognition systems.

Bibliography

1) Demetri Terzopoulos, John Platt, Alan Barr, Kurt Fleischer. Elastically Deformable Models. Computer Graphics (Proceedings of ACM SIGGRAPH), Vol. 21, No. 4, pp. 205-214, July 1987.
2) Linda Shapiro, George Stockman. Computer Vision. Moscow: BINOM. Knowledge Laboratory, 2006.
3) Michael Kass, Andrew Witkin and Demetri Terzopoulos. Snakes: Active contour models. Int. Journal of Computer Vision, Vol. 1, No. 4, pp. 321-331, January 1988.
4) Shu-Fai Wong, Kwan-Yee Kenneth Wong. Robust Image Segmentation by Texture Sensitive Snake Under Low Contrast Environment. In Proc. Int. Conference on Informatics in Control, Automation and Robotics, pp. 430-434, August 2004.
5) Michael Kass, Andrew Witkin and Demetri Terzopoulos. Snakes: Active contour models. Int. Journal of Computer Vision, Vol. 1, No. 4, pp. 321-331, January 1988.
6) Tim McInerney, Demetri Terzopoulos. Deformable Models in Medical Image Analysis: A Survey. Medical Image Analysis, 1 (2): pp. 91-108, 1996.
7) Doug P. Perrin, Christopher E. Smith. Rethinking Classical Internal Forces for Active Contour Models. Computer Vision and Pattern Recognition, Vol. 2, pp. 615-620, 2001.
8) Laurent D. Cohen. On Active Contour Models and Balloons. Computer Vision, Graphics and Image Processing: Image Understanding, Vol. 53, No. 2, pp. 211-218, March 1991.
9) Anil K. Jain, Yu Zhong, Sridhar Lakshmanan. Object Matching Using Deformable Templates. IEEE Trans. on Pattern Anal. and Machine Intel., Vol. 18, No. 3, pp. 267-278, March 1996.
10) Alan L. Yuille, Peter W. Hallinan, David S. Cohen. Feature extraction from faces using deformable templates. Int. Journal of Computer Vision, Vol. 8, No. 2, pp. 99-111, August 1992.
11) J. D. Markel, B. T. Oshika, A. H. Gray. IEEE Trans. on Acoustics, Speech, and Signal Processing, Vol. 25, pp. 330-337, 1977.
12) Ognjen Arandjelović, Roberto Cipolla. Incremental Learning of Temporally-Coherent Gaussian Mixture Models. Department of Engineering, University of Cambridge.

To be continued

Source: https://habr.com/ru/post/229949/
