
Familiar faces: algorithms for creating a "typical" portrait


Author: Andrey Sorokin, Senior Developer DataArt

At the end of last year, we completed an R&D project dedicated to computer vision techniques in image processing. As a result, we created a number of averaged portraits of IT specialists working with different technologies. In this article, I will talk about images of “typical” Java and .NET programmers, the frameworks suitable for the task, and how we optimized the processing pipeline.

I have been interested in machine vision since graduate school: my PhD thesis was devoted to the recognition of handwritten text. Over the past few years, there have been significant changes in machine vision methodology and software, and new tools and frameworks have appeared that I wanted to try. In this project we did not claim to invent a unique solution; our main contribution was to the optimization of image processing.
The idea of creating a portrait of an “average representative” is, of course, not unique. For example, in October, Reddit user osmutiar posted on Medium averaged portraits of a professional baseball player, basketball player, golfer, and so on.


This picture shows the faces osmutiar created on the basis of 1,800 portraits of American MLB players and of the 500 best players in the world.

Four years ago, a study of female and male attractiveness was widely discussed, in which scientists modeled averaged faces of different nationalities.


Note that the portraits in this illustration underwent additional artistic post-processing.

The raw material for our research was photos of colleagues, which we could group both by features of appearance and by formal criteria related to their professional competencies.


The resulting portrait of the "average" male colleague at DataArt.

In total we analyzed photos of 1,541 men and 512 women, taken from our internal time-tracking system. The first problems we encountered were the small size of the photos (only 80 by 120 pixels) and the lack of any shooting standard: the rotation and tilt of the head differed from photo to photo. Initially, the program detected faces in only 927 male and 85 female portraits. Therefore, the first step was to normalize the position of the faces.


Photos before and after head alignment.

After upscaling the images and interpolating the pixels, a detector based on the Histogram of Oriented Gradients (HOG) method started working.
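To give a feel for what such a detector computes, here is a minimal sketch of the core HOG step: a gradient-orientation histogram for a single cell, weighted by gradient magnitude. This is an illustration only; dlib's fHOG detector adds block normalization and a trained linear classifier on top of such histograms.

```python
import numpy as np

def hog_cell_histogram(cell, n_bins=9):
    """Unsigned-gradient orientation histogram for one image cell.

    This is the building block of a HOG descriptor: each pixel votes
    for the bin of its gradient direction, weighted by gradient magnitude."""
    gy, gx = np.gradient(cell.astype(float))          # image gradients
    mag = np.hypot(gx, gy)                            # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0      # unsigned orientation
    bins = (ang / (180.0 / n_bins)).astype(int) % n_bins
    hist = np.zeros(n_bins)
    for b, m in zip(bins.ravel(), mag.ravel()):
        hist[b] += m
    return hist / (np.linalg.norm(hist) + 1e-6)       # L2 normalization
```

For a cell containing a vertical edge, nearly all the gradient energy falls into the horizontal-gradient bin, which is exactly the structure the detector learns to recognize.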

To merge the faces pre-processed by our algorithm, we used the method proposed by Satya Mallick, a researcher of Indian origin working at the University of California, San Diego. We identified 68 key points (landmarks) on each face in the sample: the coordinates of the corners of the eyes, eyebrows, lips, and nose. Each face was then triangulated by these key points, that is, divided into triangles.
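The triangulation step is typically done with a Delaunay triangulation of the landmark coordinates. A minimal sketch (the five points below are hypothetical; the real pipeline triangulates all 68 landmarks):

```python
import numpy as np
from scipy.spatial import Delaunay

# A hypothetical handful of landmark coordinates (stand-ins for the 68 points)
landmarks = np.array([[0, 0], [100, 0], [100, 120], [0, 120], [50, 60]],
                     dtype=float)
tri = Delaunay(landmarks)
# tri.simplices holds index triples into `landmarks`, one row per triangle;
# using the same index triples on every face yields corresponding patches.
```

Because the triangle list is expressed as indices into the landmark array, the same mesh topology can be applied to every face in the sample, which is what makes the per-triangle merging possible.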


This is how the face meshes look after triangulation.


A real portrait after transformation to the average face model.

And finally, the colors of the pixels within the corresponding triangles were averaged across all faces in the sample.
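Each source triangle is mapped onto the corresponding triangle of the average shape by an affine transform, which can be recovered from the three vertex pairs by solving a small linear system. A sketch assuming 2D vertex tuples (in practice this is what OpenCV's affine-transform routines compute internally):

```python
import numpy as np

def affine_from_triangles(src_tri, dst_tri):
    """Solve for the 2x3 affine matrix mapping src triangle vertices to dst.

    Three vertex pairs give six equations, exactly determining the
    six parameters of [x', y']^T = A @ [x, y, 1]^T."""
    src = np.hstack([np.asarray(src_tri, float), np.ones((3, 1))])  # 3x3
    dst = np.asarray(dst_tri, float)                                # 3x2
    return np.linalg.solve(src, dst).T                              # 2x3
```

For example, a triangle shifted by (5, 5) yields the identity matrix with a translation column, confirming the solver recovers the expected map.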

Additionally, it was interesting to look at clustering of the original images. To isolate groups of faces, we used spectral analysis of image descriptors with a selection of the N principal components. The feature matrix (M x N, where M is the number of samples and N is the number of components of the feature vectors) is subjected to SVD decomposition. The largest singular values and their corresponding singular vectors are selected, and the samples are divided among the resulting top N clusters (each sample is assigned to the "closest" of the cluster centers defined by those vectors). In other words, the five most dissimilar groups among the samples are selected, and then three images are taken from each group. This gives a more contrasting averaged face, because fewer samples are merged, yet every cluster is still represented in the resulting image. As a result, we obtained a number of typical portraits selected by the algorithm. The images did not turn out "faceless" or too similar to each other, which would have been almost inevitable with a simple merge of a sufficiently large number of images.
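A rough sketch of this grouping idea (the function name and the toy matrix are illustrative, not our production code): take the leading right singular vectors of the descriptor matrix as cluster directions and assign each sample to the direction it projects onto most strongly.

```python
import numpy as np

def spectral_groups(X, k=5):
    """Split the rows of descriptor matrix X (samples x features) into k groups.

    SVD of X gives right singular vectors ordered by singular value;
    the top k act as 'cluster directions', and each sample is labeled
    by the direction carrying most of its energy."""
    U, S, Vt = np.linalg.svd(np.asarray(X, float), full_matrices=False)
    proj = X @ Vt[:k].T                  # projections onto top-k directions
    return np.abs(proj).argmax(axis=1)   # index of the dominant direction
```

On a toy matrix whose rows lie along three orthogonal axes, the function separates the rows into three groups, one per axis.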

Results



Java girl, Java boy, .NET girl, .NET boy.
Portraits obtained using cluster analysis (merging the top 3 faces in each of the five clusters).
Images obtained by simply merging portraits of all .NET and Java programmers.


Most of the image manipulations (working with image points, affine transformations, and color) use algorithms implemented in the OpenCV framework. To detect the faces themselves and to extract their key points, we use the dlib framework and a pre-trained face landmark model (here we rely on Davis King's work), shape_predictor_68_face_landmarks.dat, trained on the iBUG 300-W image collection. That is, we simply feed the model a 150 x 150 pixel image and get 68 points at the output: a vector of fixed length.

Satya Mallick implemented the final merging part in Python; we essentially rewrote it in C++. This allowed us to increase processing speed, reduce memory consumption, and keep the solution self-contained.

Another problem was high memory consumption (over 4 GB) when merging as few as 300 images. Analyzing the merging code proposed by Mallick, we found that all source images were read into memory before merging. That did not suit us: in our case we needed to read 1,541 files, and if the images were slightly larger, even 32 GB would not have been enough. We solved the problem by rewriting that piece of code to merge each image incrementally as it is read. Now memory usage does not exceed 100 MB: only the averaged coordinates of the facial key points, one processed image, and the loaded classifiers (fHOG) and models are kept in memory.
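The incremental merge can be sketched as a running mean, so that only the accumulator and the current image are in memory at any moment (`load` is a hypothetical callback standing in for reading and warping one photo; our actual rewrite was in C++, but the idea is identical):

```python
import numpy as np

def incremental_average(image_paths, load):
    """Running mean over images, holding one image plus the accumulator.

    Instead of reading all N images and dividing their sum at the end,
    fold each image into the mean as it arrives: O(1) images in memory."""
    acc, n = None, 0
    for path in image_paths:
        img = np.asarray(load(path), dtype=np.float64)
        n += 1
        if acc is None:
            acc = img.copy()
        else:
            acc += (img - acc) / n   # numerically stable running-mean update
    return acc
```

The update `acc += (img - acc) / n` is preferred over accumulating a raw sum because it keeps values in the range of the pixel data and avoids overflow for large sample counts.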

Source: https://habr.com/ru/post/347454/

