
Search and analysis of the optimal color space for the construction of eye-catching objects on a given class of images

Content:


1. Search and analysis of the optimal color space for the construction of eye-catching objects on a given class of images
2. Determination of the dominant classification features and development of a mathematical model of facial expressions
3. Synthesis of optimal facial recognition algorithm
4. Implementation and testing of facial recognition algorithm
5. Creating a test database of images of users' lips in various states to increase the accuracy of the system
6. Search for the best open source audio speech recognition system
7. Search for the best closed-source audio speech recognition system that has an open API for integration
8. Experiment for integrating video extensions into audio speech recognition system with test report

Automatic face detection and recognition technologies are used in many computer vision systems: biometric identification, human-machine interfaces, robot vision, computer animation, face detection in photo and video cameras, and so on. These applications differ mainly in their target classes, that is, in the objects to be recognized. A target class can be a partially occluded person, an image of a person's face, the face of a living person, a facial expression, facial features, gender, race, age, identity, and other characteristics. Whatever the target class, an attempt to build an automatic face detection system runs into the following difficulties:

- Strongly varying appearance of the face in different people;
- Even a relatively small change in the orientation of the face relative to the camera entails a serious change in the image of the face;
- The possible presence of individual characteristics (mustache, beard, glasses, wrinkles, and so on), which significantly complicate automatic recognition;
- A change in facial expression can greatly affect how the face looks on the image;
- Shooting conditions (lighting, color balance of the camera, image distortions introduced by the optics of the system, image quality) greatly influence the resulting face image [1].
Detecting a face in the image is the first, preprocessing step in solving a "higher-level" problem (for example, face recognition or facial expression recognition). Existing face detection algorithms fall into two categories: empirical recognition methods and face image modeling methods. The first category includes methods that start from human experience in recognizing faces and attempt to formalize that experience as an algorithm. The second category relies on pattern recognition tools and treats face detection as a special case of the general recognition problem: a face image model is built from a set of training images, and detection is reduced to checking whether the input image satisfies the resulting model.



Among the empirical face detection methods, one family uses skin color as a cue for the presence of a face. In general, color is a mixture of light waves in which certain frequencies predominate. To describe color information, it must first be converted into a form that can be measured directly, namely a set of brightness values. Each color filter passes light of a single color and so produces what is essentially a grayscale (tone) image, which is easy to record and encode and from which the color can later be restored. The filters are what make it possible to capture these tone images.

To solve this problem, we need to determine how many filters, and which ones, are sufficient to analyze color information. Practice shows that just three filters are enough: red, green, and blue.


Fig. 1 Example of using color filters: red, green and blue
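
The filtering idea above can be sketched in a few lines of Python. This is only an illustration, assuming the image is stored as rows of (r, g, b) tuples; the function names `split_channels` and `merge_channels` are hypothetical and do not come from the article.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue) intensities


def split_channels(image: List[List[Pixel]]):
    """Return three tone (grayscale) planes, one per color filter."""
    red = [[p[0] for p in row] for row in image]
    green = [[p[1] for p in row] for row in image]
    blue = [[p[2] for p in row] for row in image]
    return red, green, blue


def merge_channels(red, green, blue) -> List[List[Pixel]]:
    """Rebuild the color image from the three tone planes."""
    return [list(zip(r, g, b)) for r, g, b in zip(red, green, blue)]


# A 1x2 test image: one pure red pixel and one dark bluish pixel.
image = [[(255, 0, 0), (10, 20, 30)]]
red, green, blue = split_channels(image)
restored = merge_channels(red, green, blue)
```

Each plane holds one brightness value per pixel, and merging the three planes restores the original image without loss.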

Methods based on color space analysis are widespread because they combine several important advantages: low computational complexity; high processing speed; ease of implementation; resistance to changes in the orientation and scale of the face; resistance to changes in lighting (except for changes in its color); resistance to changes in facial expression and to partial occlusion of the face by another object in the scene [1].

Since the color of human skin and lips is quite stable, their chromatic characteristics are almost independent of illumination. Therefore, the color space in which the search is carried out should not depend on the lighting. This condition is satisfied by the RGB color space (red, green, blue) with its components normalized by their sum, which is used to construct the color classes.

When detecting skin-colored areas, an additional color representation is used alongside the usual RGB representation (the intensities of the red, green, and blue components): the HSL representation (hue, saturation, luminosity):


Fig. 2 HSL representation

$$H = \frac{\arctan(y / k)}{2\pi}, \qquad S = \sqrt{k^2 + y^2}, \qquad L = \frac{R + G + B}{3},$$

where

$$k = R - 0.5\,(G + B), \qquad y = \frac{\sqrt{3}}{2}\,(G - B)$$

[2]
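
As an illustration, the conversion above might be coded as follows. This is a sketch under several assumptions: the inputs are in the range [0, 1], the hue is normalized by 2π and wrapped into [0, 1), and `math.atan2` is used instead of a plain arctangent of y/k so that k = 0 does not cause a division by zero. The function name `rgb_to_hsl_projection` is hypothetical.

```python
import math


def rgb_to_hsl_projection(r: float, g: float, b: float):
    """Convert RGB values in [0, 1] to (H, S, L) using the formulas above."""
    k = r - 0.5 * (g + b)              # horizontal chroma coordinate
    y = math.sqrt(3.0) / 2.0 * (g - b)  # vertical chroma coordinate
    # atan2 is a substitution for arctan(y / k): it also handles k == 0
    # and picks the correct quadrant. The result is wrapped into [0, 1).
    h = (math.atan2(y, k) / (2.0 * math.pi)) % 1.0
    s = math.hypot(k, y)                # sqrt(k^2 + y^2)
    l = (r + g + b) / 3.0
    return h, s, l


h_red, s_red, l_red = rgb_to_hsl_projection(1.0, 0.0, 0.0)
h_green, s_green, l_green = rgb_to_hsl_projection(0.0, 1.0, 0.0)
```

With these conventions pure red has hue 0 and pure green has hue 1/3. Every pixel costs one arctangent and one square root.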

The RGB color space has the advantage that its components are native to the computer, so using them gives the highest processing speed. The components are usually normalized by their sum [3,4]. Of the components of this space, red and green are used most often. Sometimes the color differences are used instead of the components themselves [4].

The HSL color space is better suited to color analysis, since its components are directly related to color. However, its use is limited by the need to compute arctangents and square roots, which takes time.

Recently, as computers have become faster, the HSL color space has been used more often [5], but using it for our task, a mobile application that recognizes facial expressions, is not entirely reasonable: the capabilities of mobile devices (performance, multitasking, the image quality of VGA cameras, and so on) do not fully meet the requirements of an implementation based on the HSL color space.

Building the face recognition technology directly on the RGB space is not the optimal solution either, since this space has drawbacks related to its limited color gamut.


Fig. 3 Example of color limitation of RGB space

Therefore, given the shortcomings of the RGB color space (limited color gamut) and of the HSL color space (excessive processing requirements), we propose a "compromise" solution as the basis: the YCbCr (or YUV) color model, which is essentially an encoding of RGB information. In the YCbCr model a color consists of three components: brightness (Y) and two color differences (Cb and Cr, called U and V in YUV).


Fig. 4 Example of a color model YCbCr, where Y = 0.5
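
The article does not say which variant of the RGB to YCbCr conversion is meant. The sketch below assumes the full-range BT.601 coefficients used in JPEG/JFIF, with 8-bit inputs and Cb and Cr centered at 128; the function name `rgb_to_ycbcr` is hypothetical.

```python
def rgb_to_ycbcr(r: float, g: float, b: float):
    """Convert 8-bit RGB to full-range YCbCr (BT.601 / JFIF coefficients).

    Assumption: this variant is chosen for illustration only. Results are
    floats; round and clamp them to 0..255 before storing them as bytes.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b               # brightness
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b   # blue difference
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b   # red difference
    return y, cb, cr


white = rgb_to_ycbcr(255, 255, 255)
pure_red = rgb_to_ycbcr(255, 0, 0)
```

The conversion takes only a few multiplications and additions per pixel, with no arctangent or square root, which is the property that matters for a mobile implementation.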

Thus, the following color space was chosen for the system under consideration: {R, G, Cb, Cr}, where
R = r / (r + g + b);
G = g / (r + g + b);
and Cb and Cr are the corresponding components of the YCbCr color space.
As mentioned earlier, the red (R) and green (G) components of the RGB space are the most popular choice. Combined with the color-difference components Cb and Cr, they reduce the influence of light intensity as far as possible, which allows a clear separation of the skin of the face from the skin of the lips. Another advantage of the YCbCr color model is that the conversion from the RGB format to the new color space {R, G, Cb, Cr} is fast.


Fig. 5 Example of changing the color space of the analyzed image
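
A per-pixel feature vector in the {R, G, Cb, Cr} space could be computed roughly as shown below. This is a sketch, not the authors' implementation: it assumes full-range BT.601 coefficients for Cb and Cr and, as a convention of this example only, maps a black pixel (r + g + b = 0) to R = G = 1/3. The function name `skin_features` is hypothetical.

```python
def skin_features(r: float, g: float, b: float):
    """Return (R, G, Cb, Cr) for one 8-bit RGB pixel."""
    total = r + g + b
    if total == 0:
        # Assumed convention: a black pixel carries no chromatic information.
        norm_r = norm_g = 1.0 / 3.0
    else:
        norm_r = r / total  # R = r / (r + g + b)
        norm_g = g / total  # G = g / (r + g + b)
    # Color differences, assuming full-range BT.601 coefficients.
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return norm_r, norm_g, cb, cr


dim = skin_features(100, 50, 50)
bright = skin_features(200, 100, 100)  # same color, twice the intensity
```

Doubling the intensity leaves the normalized R and G unchanged (0.5 and 0.25 for both pixels), which illustrates why these components are insensitive to light intensity.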

Conclusion:



Our task is to develop a method for identifying candidate face regions that has the following properties: robustness to the inevitable inaccuracies of color segmentation, accuracy in localizing regions, and high speed. It is important to keep the methods fast while making them more robust, so as to retain the main advantage of using skin color for face detection: speed. For this reason, priority is traditionally given to the RGB color space, which has the following advantages:
- Low computational complexity;
- High processing speed;
- Ease of implementation;
- Resistance to changes in the orientation and scale of the face;
- Resistance to changes in lighting;
- Resistance to changes in facial expression and partial overlapping of the face with another object of the scene.
However, this color space also has a disadvantage that must be noted:
- Low resistance to the inevitable inaccuracies of color segmentation (objects of uniform color merge with human skin of a similar color into a single background).
To overcome this drawback, we propose converting from the RGB color space to the YCbCr color model and building the new space {R, G, Cb, Cr}. This space separates the skin of the face from the lips more reliably and clearly and avoids the effect of illumination as far as possible. Binarization in it is also much less demanding for the system than in the HSL model. The latter point is fundamental, since our task is to implement face recognition technology for mobile devices.
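
To give a rough idea of how cheap binarization is in this representation, here is a minimal sketch that thresholds only the Cb and Cr components. The article gives no thresholds: the ranges below are commonly quoted skin-tone values used purely as placeholders, and the BT.601 coefficients and the names `cb_cr` and `binarize` are likewise assumptions of this example.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]

# Placeholder thresholds (NOT from the article); tune them on real data.
CB_RANGE = (77.0, 127.0)
CR_RANGE = (133.0, 173.0)


def cb_cr(r: float, g: float, b: float):
    """Color differences of an 8-bit RGB pixel (full-range BT.601, assumed)."""
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def binarize(image: List[List[Pixel]]) -> List[List[int]]:
    """Return a mask with 1 for skin-colored pixels and 0 for all others."""
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            cb, cr = cb_cr(r, g, b)
            is_skin = (CB_RANGE[0] <= cb <= CB_RANGE[1]
                       and CR_RANGE[0] <= cr <= CR_RANGE[1])
            mask_row.append(1 if is_skin else 0)
        mask.append(mask_row)
    return mask


# One skin-like pixel, one blue pixel, one white pixel.
mask = binarize([[(220, 170, 140), (0, 0, 255), (255, 255, 255)]])
```

Each pixel needs a handful of multiplications and four comparisons. A fuller version would also test the normalized R and G components.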

Bibliographic list:



1. Vezhnevets V., Dyagtereva A. Detection and localization of the face in the image. CGM Journal, 2003.
2. Gupta D. Computer Gesture Recognition: Using the Constellation method. Caltech Undergraduate Research Journal, 2001, vol. 1, no. 1, pp. 26-31.
3. Graf H.P., Cosatto E., Gibbon D., Kosheisen M., Patajan E. Multi-modal system. AT&T lab technical report 95.5.1, 1996.
4. Vezhnevets V. Human-Computer Interface. GraphiCon, 2002.
5. Vizilter Yu.V., Zheltov S.Yu., Ososkov M.V. The system of recognition and visualization of the characteristic features of a human face in real time on personal computers using a web-camera. GraphiCon, 2002.

To be continued

Source: https://habr.com/ru/post/229757/

