
Automatic Age Estimation System for Face Images

Abstract

People are the most important objects tracked by video surveillance systems. However, tracking a person does not by itself reveal enough about his or her motives, intentions, or desires. In this paper, we present a new and reliable system for automatic age estimation based on computer vision techniques. It uses global facial features obtained by combining Gabor wavelets with Orthogonal Locality Preserving Projections (OLPP). In addition, the system can estimate age from images captured in real time, which gives it greater potential than existing semi-automatic systems. The results obtained with the proposed approach may offer a clearer understanding of the age estimation algorithms needed to build practical, real-world applications.

Keywords: Gabor wavelets, face image, age estimation, Support Vector Machine (SVM).



1. Introduction

The image of a human face carries abundant information about a person, including facial features, emotions, gender, and age. In general, a face image can be viewed as a complex signal composed of many facial attributes, such as skin color and the geometry of the facial features. These attributes play an important role in real-world face image analysis applications, where the attributes estimated from the captured face image can determine the system's subsequent response. Age, in particular, is one of the most important attributes. For example, users may need an interactive computer system that adapts to their age, a system that estimates age for access control, or a system that gathers intelligence. Automatic age estimation by facial image analysis therefore has a wide range of real-world applications.

An automatic age estimation system consists of two parts: detecting the face in an image and estimating the age. Face detection is difficult because the results depend strongly on many conditions: environment, motion, lighting, head pose, and facial expression. These factors can distort the color, brightness, shadows, and contours in the image. To cope with these difficulties, Viola and Jones proposed their well-known face detection system in 2004 [1]. The Viola-Jones detector uses the AdaBoost algorithm at each node of a classifier cascade to learn a classifier with a high detection rate and a low rate of missed faces over the entire cascade. The algorithm has the following features: 1) it uses Haar-like features, which compare the difference between the pixel-intensity sums of two rectangular regions against a threshold; 2) it uses an integral image to speed up the computation of pixel sums over upright rectangles or rectangles rotated by 45 degrees; 3) AdaBoost uses statistical boosting to build binary (face/non-face) classification nodes with a high probability of detecting a face and a low probability of missing one; 4) the weak-classifier nodes are organized into a cascade so that non-face windows are rejected early (the first levels of the cascade tolerate more misclassifications, but they run faster than the later levels). A window is classified as a face only if it passes all levels of the cascade.

Although automatic face detection is a mature technique with many applications, estimating age from a face image is still difficult. Aging manifests differently not only across ethnic groups but also among individuals within the same group; the process is largely individual. It is also affected by external factors such as lifestyle (nutrition, exercise), place of residence, and climate. Robust age estimation therefore remains an open problem.

In general, the literature offers three categories of feature extraction methods for age estimation. The first category is statistical approaches. Xin Geng et al. [2, 3] proposed AGing pattErn Subspace (AGES), a method for automatic age estimation. The idea is to model the aging pattern, defined as the sequence of one person's face images ordered by age. The model is built by learning a subspace with an EM-like iterative version of Principal Component Analysis (PCA). In other papers [4, 5], Guodong Guo et al. compare three typical dimensionality reduction and embedding methods: PCA, Locally Linear Embedding (LLE), and Orthogonal Locality Preserving Projections (OLPP). Based on the data distribution in the OLPP subspace, they propose Locally Adjusted Robust Regression (LARR) for learning and predicting a person's age. LARR uses support vector regression (SVR) for a coarse prediction and then refines it locally, with a support vector machine (SVM), within a small age range centered on the coarse result.

The second category consists of approaches based on the Active Appearance Model (AAM). Appearance models are the most intuitive of all facial image analysis methods.

Young H. Kwon et al. [6] used visual age features to build an anthropometric model. The primary features are the eyes, nose, mouth, and chin, and the relationships among them are computed to distinguish between age categories. For the secondary features, a wrinkle map guides the detection and measurement of wrinkles. Jun-Da Txia et al. [7] proposed an age estimation method that uses the AAM to extract age-specific feature regions: 28 landmark points are located on each face, which is then divided into 10 wrinkle regions. Shuicheng Yan et al. [8] used a patch-based appearance model called Patch-Kernel. This method computes the Kullback-Leibler divergence, for any two images, between models derived from a global Gaussian mixture model (GMM) by Maximum a Posteriori (MAP) adaptation. Its discriminative power is then enhanced by a weak learning process called inter-modality similarity synchronization. Finally, kernel regression is used to estimate age.

The third category uses frequency-based approaches. In image processing and pattern recognition, frequency-domain analysis is one of the most popular ways to extract image features. Guodong Guo et al. [9] investigated biologically inspired features (BIF) for estimating the age of people in images. Unlike their previous work [4, 5], Guo et al. modeled the face using Gabor filters [10]. Gabor filters are linear filters used in image processing for edge detection. Their frequency and orientation representations are similar to those of the human visual system, which makes them well suited to texture representation and discrimination.

Our proposed system uses a cascade of AdaBoost-trained classifiers to detect faces and estimates age using Gabor wavelets and OLPP. The face detection part covers histogram equalization, feature selection, the AdaBoost-trained cascade classifier, and an algorithm for clustering face regions. The age estimation part covers feature extraction with Gabor wavelets, feature reduction and selection, and age classification. Simulation results and conclusions are presented at the end of the article.

This article proposes a fully automatic age estimation system that uses Gabor wavelets to represent the aging process. The system has four main modules: 1) face detection; 2) Gabor wavelet analysis; 3) OLPP dimensionality reduction; 4) support vector machine classification. The input image can come from a camera or be read from a file. The face region is extracted from the original image by a face detector following the approach of [12]. The face image is then scaled to 64×64 pixels, features are extracted with 40 Gabor wavelet kernels, and OLPP reduction is applied to them. Finally, the age is estimated with a trained SVM classifier.
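The 40 Gabor kernels mentioned above are commonly built as 5 scales × 8 orientations of the Wiskott-style kernel [16]. The sketch below is a minimal pure-Python version under that assumption; the parameter values k_max = π/2, f = √2, σ = 2π and the 31×31 kernel size are common choices in the Gabor face-recognition literature, not values confirmed by this article, and the function names are hypothetical. It also shows why a reduction step is needed: if every response of every kernel on a 64×64 face were kept, the feature vector would have 64·64·40 = 163,840 entries.

```python
import cmath
import math


def gabor_kernel(u, v, size=31, k_max=math.pi / 2, f=math.sqrt(2), sigma=2 * math.pi):
    """Complex Gabor kernel psi_{u,v} sampled on a size x size grid.

    u: orientation index (0..7), v: scale index (0..4).
    The default parameters are assumed common literature values.
    """
    k = k_max / f ** v                    # wave-vector magnitude for scale v
    phi = math.pi * u / 8                 # orientation angle for index u
    kx, ky = k * math.cos(phi), k * math.sin(phi)
    half = size // 2
    dc = math.exp(-sigma ** 2 / 2)        # DC-compensation term
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Gaussian envelope scaled by ||k||^2 / sigma^2
            envelope = (k ** 2 / sigma ** 2) * math.exp(-k ** 2 * (x * x + y * y) / (2 * sigma ** 2))
            # complex plane wave minus its DC component
            carrier = cmath.exp(1j * (kx * x + ky * y)) - dc
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def gabor_bank(scales=5, orientations=8, size=31):
    """All scale/orientation combinations: 5 x 8 = 40 kernels by default."""
    return [gabor_kernel(u, v, size) for v in range(scales) for u in range(orientations)]


bank = gabor_bank()
print(len(bank), len(bank[0]), len(bank[0][0]))  # 40 31 31
```

Each face image would then be convolved with every kernel and the response magnitudes collected into one feature vector before OLPP.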

The rest of the article is organized as follows: Section 2 describes the face detection subsystem using AdaBoost. Section 3 describes the algorithm for estimating age and includes: texture analysis of Gabor wavelets, OLPP reduction, and SVM classification. Section 4 presents the experimental results. Section 5 draws conclusions on the proposed system.



Figure 1. System Overview



2. Face Detection

Figure 1 shows the architecture of the proposed automatic age estimation system. The system consists of a face detection subsystem, which locates the face regions in the image, and an age estimation subsystem. Scanning windows of various sizes are used to search for faces, since people may be captured at different distances from the camera. There are 12 scale levels in total: the window size starts at 24×24 and grows by a factor of 1.25 per level. Depending on the lighting conditions during capture, image brightness can vary considerably, and faces can be detected more accurately once the brightness has been normalized.
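The scale pyramid described above is simple arithmetic: level k uses a window of side 24 · 1.25^k. A minimal sketch (the function name is hypothetical, and rounding to whole pixels is our assumption) lists the 12 window sizes:

```python
def scan_window_sizes(base=24, factor=1.25, levels=12):
    """Side length (in pixels) of the square scanning window at each scale level."""
    return [round(base * factor ** k) for k in range(levels)]


sizes = scan_window_sizes()
print(sizes)  # [24, 30, 38, 47, 59, 73, 92, 114, 143, 179, 224, 279]
```

So the largest window is roughly 11.6 times the base size, which covers faces from small, distant ones up to close-ups in a typical camera frame.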



2.1. Illumination normalization

Illumination normalization is based on histogram equalization, more precisely on histogram specification: the task is to transform the original histogram H(l) into a target histogram G(l). The target histogram G(l) is taken from an image whose histogram is close to the average histogram of the face database. The selected target image and its histogram G(l) are shown in Figure 2(a); images before and after normalization are shown in Figures 2(b)-(c).



Figure 2. Illumination normalization. (a) Target Image. (b) Input Images. (c) Normalized images


Input images that are too dark or too light are normalized according to the histogram of the target image. Each gray level l of an input image with histogram H(l) is mapped to a level l' of the target histogram G(l) as follows:

$$ l' = T_G^{-1}\big(T_H(l)\big) \tag{1} $$

where $T_H$ and $T_G^{-1}$ are, respectively, the forward mapping of H(l) onto a uniform distribution and the inverse mapping from the uniform distribution onto G(l).
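The mapping from H(l) to G(l) can be sketched with normalized cumulative histograms, which map each histogram onto a uniform distribution; the inverse mapping is found by searching the target cumulative histogram. This is a minimal pure-Python illustration of standard histogram specification, assuming 8-bit gray levels stored in a flat list; the function names are hypothetical and not taken from the authors' implementation.

```python
from bisect import bisect_left


def cumulative(hist):
    """Normalized cumulative histogram: maps gray level l to a value in [0, 1]."""
    total = sum(hist)
    acc, cdf = 0, []
    for count in hist:
        acc += count
        cdf.append(acc / total)
    return cdf


def match_histogram(pixels, target_hist, levels=256):
    """Remap gray levels so the histogram of `pixels` approximates `target_hist`."""
    source_hist = [0] * levels
    for p in pixels:
        source_hist[p] += 1
    t_h = cumulative(source_hist)   # forward mapping of H(l) to the uniform distribution
    t_g = cumulative(target_hist)   # forward mapping of G(l) to the uniform distribution
    # inverse mapping: smallest target level whose cumulative value reaches u
    lut = [min(bisect_left(t_g, u - 1e-12), levels - 1) for u in t_h]
    return [lut[p] for p in pixels]


dark = [0, 0, 1, 1]                                # a tiny, very dark "image"
print(match_histogram(dark, [0, 0, 2, 2], levels=4))  # [2, 2, 3, 3]
```

Because the lookup table is computed once per image (256 entries), the per-pixel cost is a single table lookup.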



2.2 Feature selection

We chose four types of rectangular Haar-like features, shown in Figure 3 [13].



Figure 3. Four types of rectangular features



Combinations of rectangles of different brightness can represent the light and dark regions of the image. Each feature is described by its position, size, type, and value:

where (x, y) denotes the origin of the rectangular feature in the relative coordinate system of the scanning window, w and h denote the relative width and height of the feature, Type denotes the type of the rectangular feature, and the feature value is the difference between the pixel sums of the light and dark areas.
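Computing the difference of pixel sums quickly relies on an integral image, so that any rectangle sum costs four lookups regardless of its size. The sketch below is a minimal illustration assuming a horizontal two-rectangle feature with the light half on the left; the exact four feature types of Figure 3 are not reproduced, and the function names are hypothetical.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x (padded with a zero row/column)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the w x h rectangle with top-left corner (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Assumed horizontal two-rectangle feature: light (left) sum minus dark (right) sum."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


img = [[9, 9, 1, 1],
       [9, 9, 1, 1]]
ii = integral_image(img)
print(two_rect_feature(ii, 0, 0, 4, 2))  # 36 - 4 = 32
```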

A rectangular feature that can effectively separate faces from non-faces is used as a weak classifier:

$$ h(x) = \begin{cases} 1, & p\,f(x) < p\,\theta \\ 0, & \text{otherwise} \end{cases} \tag{3} $$

The weak classifier h(x) decides whether the current image window x is a face (1) or not (0) from the value f(x) of the rectangular feature, a threshold θ, and a polarity p (the direction of the inequality). For each weak classifier, the optimal threshold is chosen to minimize the misclassification error. Thresholds are learned from a training set of 4,000 face images and 59,000 non-face images; Figures 4(a)-(b) show examples from the face and non-face databases. In this procedure, we compute the distribution of each feature value over all images in the database and select the threshold with the greatest discriminative power (i.e., the one that best separates the two classes).
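The threshold selection described above can be sketched as an exhaustive search over candidate thresholds and both polarities. This is a minimal, brute-force illustration (quadratic in the number of samples, for clarity); the candidate set of midpoints between distinct feature values is our assumption, and the optional weights anticipate their use inside AdaBoost. Function names are hypothetical.

```python
def weak_classify(f_value, theta, p):
    """h(x) = 1 (face) if p * f(x) < p * theta, else 0 (non-face)."""
    return 1 if p * f_value < p * theta else 0


def best_threshold(values, labels, weights=None):
    """Return (error, theta, p) minimizing the (weighted) misclassification error.

    values: feature value f(x) for each training window
    labels: 1 for face, 0 for non-face
    """
    n = len(values)
    weights = weights or [1.0 / n] * n
    distinct = sorted(set(values))
    # candidate thresholds: below all values, midpoints between neighbours, above all values
    candidates = ([distinct[0] - 1]
                  + [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
                  + [distinct[-1] + 1])
    best = (float("inf"), None, None)
    for theta in candidates:
        for p in (1, -1):
            err = sum(w for v, y, w in zip(values, labels, weights)
                      if weak_classify(v, theta, p) != y)
            if err < best[0]:
                best = (err, theta, p)
    return best


# toy data: faces have small feature values here, non-faces large ones
print(best_threshold([1, 2, 3, 10, 11, 12], [1, 1, 1, 0, 0, 0]))  # (0, 6.5, 1)
```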



Figure 4. Examples from the face (a) and non-face (b) databases



Although each rectangular feature can be computed very efficiently, evaluating every possible feature is computationally very expensive. For example, for the smallest (24×24) sliding window, the full feature set contains 160,000 features.

The AdaBoost algorithm combines a set of weak classifiers into a strong classifier. Although a strong classifier is effective for face detection, it is slow. Viola and Jones [14] proposed a cascade structure that improves detection performance while reducing computation time. Following this idea, our cascade AdaBoost builds strong classifiers in stages: if the window is classified as a face at stage 1, it proceeds to stage 2; otherwise it is discarded. The same process is repeated at every stage. The number of stages should be large enough to achieve a good detection rate while keeping computation time low. For example, if each stage detects faces with probability 0.99, a 10-stage cascade reaches an overall detection rate of about 0.9 (since 0.99^10 ≈ 0.904). Although this may seem hard to achieve, it is in fact easy, because each stage needs a false-positive rate of only about 30%; over 10 stages the overall false-positive rate then falls to 0.3^10 ≈ 6×10^-6.
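The cascade arithmetic and the early-rejection logic can be checked with a short sketch. It assumes every stage has the same detection and false-positive rates and that these rates multiply across stages (i.e., each stage's rates are measured on windows that passed the earlier stages); the stage functions in the usage example are hypothetical toys, not trained classifiers.

```python
def cascade_rates(stage_detection, stage_false_positive, stages):
    """Overall (detection, false-positive) rates of `stages` identical stages in sequence."""
    return stage_detection ** stages, stage_false_positive ** stages


def run_cascade(window, stages):
    """Accept `window` as a face only if every stage accepts it; stop at the first rejection."""
    for stage in stages:
        if not stage(window):
            return False  # rejected early: the remaining, costlier stages never run
    return True


detection, false_positive = cascade_rates(0.99, 0.30, 10)
print(f"{detection:.3f} {false_positive:.1e}")  # 0.904 5.9e-06

# toy stages acting on a scalar "window score" (hypothetical, for illustration only)
stages = [lambda s: s > 0.2, lambda s: s > 0.5, lambda s: s > 0.8]
print(run_cascade(0.9, stages), run_cascade(0.3, stages))  # True False
```

Most background windows fail one of the first, cheap stages, which is where the speed-up of the cascade comes from.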

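The AdaBoost procedure for each cascade stage can be described as follows. Let m and l be the numbers of face and non-face training images, respectively, and let n = m + l be the total number of training samples $(x_j, y_j)$, where $y_j = 1$ for faces and $y_j = 0$ for non-faces. The initial weights are $w_{1,j} = 1/(2m)$ for faces and $w_{1,j} = 1/(2l)$ for non-faces. In each round t, the weights are first normalized so that they sum to one; the weighted error of a weak classifier $h_t$ is then

$$ \varepsilon_t = \sum_{j=1}^{n} w_{t,j}\,\lvert h_t(x_j) - y_j \rvert \tag{4} $$

and the weak classifier with the lowest error is selected. The weights are updated by formula (5) in each round, where $e_j = 0$ if sample $x_j$ is classified correctly and $e_j = 1$ otherwise:

$$ w_{t+1,j} = w_{t,j}\,\beta_t^{\,1-e_j}, \qquad \beta_t = \frac{\varepsilon_t}{1-\varepsilon_t} \tag{5} $$

The final (strong) classifier of a stage after T rounds is defined as:

$$ C(x) = \begin{cases} 1, & \sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t \\ 0, & \text{otherwise} \end{cases} \tag{6} $$

where $\alpha_t = \log(1/\beta_t)$.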
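The boosting loop for one cascade stage can be sketched as follows. To stay short, this sketch assumes a precomputed pool of candidate weak classifiers, each given as its list of 0/1 outputs on the training samples, instead of searching over features and thresholds in every round; the function names and the toy data are hypothetical. Correctly classified samples have their weights multiplied by β, so misclassified samples gain relative weight after normalization.

```python
import math


def adaboost(predictions, labels, rounds):
    """Discrete AdaBoost in the Viola-Jones form.

    predictions: pool of weak classifiers, each a list of 0/1 outputs on the samples
    labels:      1 for face, 0 for non-face
    Returns a list of (classifier_index, alpha) pairs.
    """
    m = sum(labels)                  # number of faces
    l = len(labels) - m              # number of non-faces
    w = [1 / (2 * m) if y == 1 else 1 / (2 * l) for y in labels]
    chosen = []
    for _ in range(rounds):
        total = sum(w)
        w = [wi / total for wi in w]                      # normalize weights
        errors = [sum(wi for wi, h, y in zip(w, preds, labels) if h != y)
                  for preds in predictions]               # weighted error of each candidate
        best = min(range(len(predictions)), key=errors.__getitem__)
        eps = min(max(errors[best], 1e-10), 1 - 1e-10)    # guard against log(0)
        beta = eps / (1 - eps)
        chosen.append((best, math.log(1 / beta)))         # alpha_t = log(1 / beta_t)
        # correct samples (e_j = 0) are multiplied by beta; wrong ones (e_j = 1) keep their weight
        w = [wi * (beta if h == y else 1.0)
             for wi, h, y in zip(w, predictions[best], labels)]
    return chosen


def strong_classify(chosen, outputs):
    """outputs[i] is the 0/1 output of weak classifier i on one window."""
    score = sum(alpha * outputs[i] for i, alpha in chosen)
    return 1 if score >= 0.5 * sum(alpha for _, alpha in chosen) else 0


labels = [1, 1, 0, 0]
predictions = [[1, 1, 0, 1], [1, 0, 0, 0], [0, 1, 1, 0]]
chosen = adaboost(predictions, labels, rounds=2)
print([i for i, _ in chosen])  # [0, 1]
```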



2.3 Region-based clustering

A face detector typically reports more than one detection per face, even when the image contains only one face (as shown in Figure 5).



Figure 5. The results of the face detector



Region-based clustering is therefore used to solve this problem. The proposed method has two levels of clustering: local and global. Local clustering groups the detected blocks found at a single scale and acts as a simple filter on the number of blocks in each cluster: if a cluster contains more than one block, it is marked as probably containing a face; otherwise the cluster is rejected. Local clustering uses the following rule to decide whether two blocks belong to the same cluster:



In formula (7), the overlap between two detected candidate regions x and y is measured by the distance between their centers. When (7) is satisfied, x and y belong to the same cluster, meaning that the two regions almost completely overlap.

Figure 6 shows several possible cases of overlapping areas.



Figure 6. Overlapping regions and block center distances



In Figure 6(a), the two blocks fall into one cluster. In Figure 6(b), the two blocks fall into different clusters, since the distance between their centers exceeds the threshold. In special cases such as Figure 6(c), all blocks are considered candidates, although most of them are false faces. Therefore, for practical applications, we select only one block that satisfies equation (7) rather than several. Finally, global clustering uses the blocks obtained by local clustering, and the labeled face region takes the average size of all the remaining blocks. Some results of the whole region-based clustering process at the local and global levels are shown in Figure 7. As the right image in Figure 7 shows, after local and global clustering exactly one block per face is classified as a face region, even though more than 5 face candidates were obtained for an image containing only 5 people.
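The exact rule in formula (7) is not recoverable here, so the following is only a minimal sketch of the two-level idea under stated assumptions: blocks are (x, y, w, h) tuples, two blocks are grouped when their centers lie within a hypothetical distance threshold `max_center_dist` of the cluster's first block (greedy grouping), clusters with a single block are discarded, and the global step represents each surviving cluster by the average of its blocks.

```python
import math


def center(box):
    """Center of an (x, y, w, h) box."""
    x, y, w, h = box
    return x + w / 2, y + h / 2


def local_clusters(boxes, max_center_dist):
    """Greedily group boxes whose centers lie within max_center_dist (assumed rule)."""
    clusters = []
    for box in boxes:
        cx, cy = center(box)
        for cluster in clusters:
            ux, uy = center(cluster[0])          # compare with the cluster's first box
            if math.hypot(cx - ux, cy - uy) <= max_center_dist:
                cluster.append(box)
                break
        else:
            clusters.append([box])
    # keep only clusters supported by more than one detection
    return [c for c in clusters if len(c) > 1]


def merge_cluster(cluster):
    """Global step: represent a cluster by the average position and size of its boxes."""
    n = len(cluster)
    return tuple(sum(b[i] for b in cluster) / n for i in range(4))


detections = [(10, 10, 24, 24), (12, 11, 24, 24), (11, 9, 24, 24), (200, 50, 24, 24)]
faces = [merge_cluster(c) for c in local_clusters(detections, max_center_dist=6)]
print(faces)  # [(11.0, 10.0, 24.0, 24.0)]
```

The isolated detection at (200, 50) is dropped because no other block supports it, which mirrors the "more than one block" filter described above.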



Figure 7. Clustering results. (a) Results of clustering at the local level. (b) Clustering results at the global level



3. Age estimation

, : , . , - . 2D , - . , , , . , Donato [15] , . , , .



3.1 Gabor wavelets

[16]:



Where and , , :



Where and — , f — . , (8) , — , — , (9) , . — .

, , , 8 5 8 , :



8.



— , (8). Let be — . I :



Where * ().

, () . (11) (12) — .



Where and .



9. 40



9 . 9, . , . , .



3.2

, [19, 20]. , , , — . 3 : () (, Parallel Dimension Reduction Scheme, PDRS): 10. , . (b) (, Ensemble Dimension Reduction Scheme , EDRS): — , . 11, , . () (, Multi-channel Dimension Reduction , MDRS). Xiaodong Li [21] 2009. 12, . [21] Xiaodong Li . , , .



10.





11.





12.



k- (KNN). , « », 40 . KNN 40 . . — . . FG-NET [22] . 1002 ( ) , . 82 ( ) 0 69 . (, mean absolute error , MAE) . . :



Where — k, — . N — . 1 . , .
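The mean absolute error (MAE) named above is conventionally computed as MAE = (1/N) Σ_k |â_k − a_k|, where â_k is the estimated age and a_k the true age of the k-th test image, and N is the number of test images (this notation is ours). A minimal sketch, with a hypothetical function name:

```python
def mean_absolute_error(predicted, actual):
    """MAE = (1/N) * sum_k |predicted_k - actual_k| over N test images."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


print(mean_absolute_error([25, 30, 42], [23, 35, 40]))  # (2 + 5 + 2) / 3 = 3.0
```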

1.





3.3

, . . : () () , , , [23]. (b) (LPP) , , [24]. OLPP LPP [25]. , KNN . LPP OLPP . 2 . OLPP .



2.





3.4

- . . . ., . [25-27]. 1 11 . . 1002 ( ) , . 82 ( ) 0 69 . 43 ( 2). , KNN.



4. Experimental results

FG-NET [20]. 1002 ( ) , . 82 ( ) 0 69 . 13 .



13. FG-NET



, , 2. , , , . , .

64*64 , 256 . ( Radial basis function kernel , RBF) , c = 0,5 g = 0.0078125. , .
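The RBF kernel mentioned above has the form K(x, y) = exp(-γ‖x − y‖²). Reading c and g as the SVM penalty C = 0.5 and the kernel parameter γ = 0.0078125 (= 2^-7) follows the naming of LIBSVM's `-c` and `-g` training options; that reading is our assumption. A minimal sketch of the kernel with the reported γ:

```python
import math

GAMMA = 0.0078125  # reported g value, equal to 2 ** -7


def rbf_kernel(x, y, gamma=GAMMA):
    """K(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


print(rbf_kernel([0.0, 0.0], [0.0, 0.0]))  # 1.0
print(rbf_kernel([0.0, 0.0], [8.0, 8.0]))  # exp(-1), about 0.3679
```

With this γ, two feature vectors at squared distance 128 already have similarity e^-1, so the scale of the (reduced) feature vectors matters when choosing γ.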

: () (). . [2-10]. :



Where — , j.

3 . , FG-NET. -OLSS, 8.43 5.71 KNN , , . 16% AGES [2]. 3, , LARR [4] BIF [9] : 5.07 4.77, .



3.





, — . LARR AAM FG-NET , . , . , LARR . BIF , , . , BIF . , 10.32. , BIF . , BIF . 12-15 .

14. Gabor-OLPP , WAS , . AGES GAbor-OLPP , Gabor-OLPP, .



14.



5. Conclusions

. , , . , .

. , . , , . , . . OLPP .



6. Acknowledgments

: 100‐EC-17‐A‐02‐S1‐032, , , : NSC‐100‐2218-E‐009‐023.



References

[1] Viola P, Jones MJ (2004) Robust real-time face detection. International Journal of Computer Vision 57(2): 137-154

[2] Geng X, Zhou Z-H, Zhang Y, Li G, Dai H (2006) Learning from facial aging patterns for automatic age estimation. In ACM Conference on Multimedia, pp. 307-316

[3] Geng X, Zhou Z-H, Smith-Miles K (2007) Automatic age estimation based on facial aging patterns. IEEE Trans. on PAMI 29(12): 2234-2240

[4] Guo G, Fu Y, Dyer CR, Huang TS (2008) Image-based human age estimation by manifold learning and locally adjusted robust regression. IEEE Trans. on Image Processing 17(7): 1178-1188

[5] Guo G, Fu Y, Huang TS, Dyer CR (2008) Locally adjusted robust regression for human age estimation. IEEE Workshop on Applications of Computer Vision, pp. 1-6

[6] Kwon Y, Lobo N (1999) Age classification from facial images. Computer Vision and Image Understanding 74(1): 1-21

[7] Txia J-D, Huang C-L (2009) Age estimation using AAM and local facial features. Fifth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, pp. 885-888

[8] Yan S-C, Zhou X, Liu M, Hasegawa-Johnson M, Huang TS (2008) Regression from patch-kernel. IEEE Conference on CVPR, pp. 1-8

[9] Guo G, Mu G, Fu Y, Huang TS (2009) Human age estimation using bio-inspired features. IEEE Conference on CVPR, pp. 112-119

[10] Serre T, Wolf L, Bileschi S, Riesenhuber M, Poggio T (2007) Robust object recognition with cortex-like mechanisms. IEEE Trans. on PAMI 29(3): 411-426

[11] Lin C-T, Siana L, Shou Y-W, Yang C-T (2010) Multiclient identification system using adaptive probabilistic model. EURASIP Journal on Advances in Signal Processing, vol. 2010

[12] Viola P, Jones MJ (2004) Robust real-time face detection. International Journal of Computer Vision 57(2): 137-154

[13] Papageorgiou CP, Oren M, Poggio T (1998) A general framework for object detection. Proceedings of the 6th IEEE International Conference on Computer Vision, pp. 555-562

[14] Viola P, Jones MJ (2004) Robust real-time face detection. International Journal of Computer Vision 57(2): 137-154

[15] Donato G, Bartlett MS, Hager JC, Ekman P, Sejnowski TJ (1999) Classifying facial actions. IEEE Trans. on PAMI 21: 974-989

[16] Wiskott L, Fellous J, Kruger N, Malsburg C (1997) Face recognition by elastic bunch graph matching. IEEE Trans. on PAMI 19: 775-779

[17] Liu C, Wechsler H (2002) Gabor feature based classification using enhanced Fisher linear discriminant model for face recognition. IEEE Trans. on Image Processing 11: 467-476

[18] Liu C (2004) Gabor-based kernel PCA with fractional power polynomial models for face recognition. IEEE Trans. on PAMI 26: 572-581

[19] Belhumeur PN, Hespanha JP, Kriegman DJ (1997) Eigenfaces vs. Fisherfaces: recognition using class specific linear projection. IEEE Trans. on PAMI 19(7): 711-720

[20] Duda RO, Hart PE, Stork DG (2000) Pattern Classification, 2nd ed. New York: Wiley-Interscience

[21] Li X, Fei S, Zhang T (2009) Novel dimension reduction method of Gabor feature and its application to face recognition. 2nd International Congress on Image and Signal Processing (CISP '09), pp. 1-5

[22] The FG-NET Aging Database [Online]. Available: www.fgnet.rsunit.com

[23] He X-F, Yan S-C, Hu Y-X, Niyogi P, Zhang H-J (2005) Face recognition using Laplacianfaces. IEEE Trans. on PAMI 27(3): 328-340

[24] Cai D, He X-F, Han J-W, Zhang H-J (2006) Orthogonal Laplacianfaces for face recognition. IEEE Trans. on Image Processing 15(11): 3608-3614

[25] Mercier G, Lennon M (2003) Support vector machines for hyperspectral image classification with spectral-based kernels. Proc. IGARSS, Toulouse, France, July 21-25

[26] Abe S (2005) Support Vector Machines for Pattern Classification. London: Springer-Verlag London Limited

[27] Wang L (2005) Support Vector Machines: Theory and Applications. Berlin: Springer

[28] Lanitis A, Draganova C, Christodoulou C (2004) Comparing different classifiers for automatic age estimation. IEEE Trans. on Systems, Man, and Cybernetics, Part B 34(1): 621-628

Source: https://habr.com/ru/post/248991/


