Did anyone know? All the people above do not exist: this is not a photograph, but the result of the operation of a simple algorithm averaging thousands of different faces. In this article we will talk about how easy it is to quickly outline such an algorithm and get interesting results.
Preamble
So, once spring Saturday evening I had a desire to select some basic set of facial features for subsequent identification. It was not related to my work, just a hobby. I decided to start with building an average face, in relation to which it would be easier to choose deviations and track the influence of individual features on the overall structure of the face. In general, this problem is quite successfully solved, usually with the use of Image Warp or other non-linear transformations of the coordinate system, which by and large can not always be called averaging, but we will go the other way.
I hasten to assure you that this article is not:
Perilous mathematics
Finished pieces of code
Optimal solutions
Since This project started for me as a sideline in the course of solving a narrow problem, most of the actions here are done by Just For Fun, without looking at Best Practices with the idea of a quick and easy development. But there is a fair averaging, without changing the proportions and shape of the face. I will use Matlab in my work, but if you decide to repeat my experience, you can choose almost any programming language and environment, since it does not use any too complex structures that could not be written on anything in a day.
Database of images
Critical for all subsequent venture stage. We need a fairly large set of source images. ')
What we will pay attention to when selecting images for our database:
Base volume. I often saw when a similar problem was solved on a base with the number of images of the order of a hundred. This is clearly not enough for the subsequent filtering of the database in accordance with various characteristics.
The quality of the pictures. For a detailed result, it is desirable to have high-quality images (with an eye on the possibility of iron on the processing of the whole thing)
Lighting. There is only one strict rule: the lighting should not change significantly from snapshot to snapshot. I would also like to avoid the presence of shaded areas as much as possible, but even with them our algorithm will work.
The position of the face in the picture. It would be ideal to have position markers for basic facial features, but if they are not there, you can also do without it.
Emotions, hats, body modifications. All this will greatly complicate the work, so it is advisable to remove the most blatantly inappropriate pictures from the database in advance.
According to the above parameters, I really liked the Humanae base, which was covered at TED, where I actually saw it. It contains about 3000 high-quality images with quite uniform lighting, a uniform background and a fairly stable position and facial expression:
Before we begin, we should download the base and bring all the images to the same size. You can use pictures of different sizes, but with so-called. Programming was more convenient for me. Then you need to divide all 3000 images into classes. It will be necessary to filter the sources and get averaged people from different groups. This is the longest and dreary stage of the whole work.
I used to classify the following parameters:
Gender: Male, Female, Unobvious.
Skin color: 0-255. Can be taken from the background color.
Age: Children, Adults, Elderly.
Subjective beauty: 0, 1 (normal, no obvious defects), 2 (attractive).
We will use them as filters to get different results. You can do without quite a selection, you get some kind of averaged face, the features of which will depend on the number of people represented in the database with certain features. Separation by gender we will necessarily, because externally the features of men and women differ the most.
If you choose the same base as me, you can be helped by eliminating children (there are several dozen), blacks (of the order of several hundred), and people with obvious physical defects in order to get a less blurred result. This is not related to any kind of chauvinism or political statements, just the number of these groups in the base is often too low, and the features are too different from the average. In any case, you can always invert the filter and build averaged faces for these groups, although they are quite inaccurate.
Figure 1. Example of averaged face on a very limited set of shots (black adults)
Algorithms and methods
Now that we have a formed base, it's time to go directly to thinking about how to achieve the desired result. First, I will immediately exclude the various ready-made libraries of Pattern Matching, Image Warping, as well as any complex math. After all, we just want to quickly wrap up the working code and have fun, right?
Imperfection naive method
Life would be simple and pleasant if you could just lay down the entire set of photos and get the result. Alas, the brutal reality responds to our fantasies by a rather mediocre result for a male averaged person. It is possible to significantly improve the image obtained by the naive method, if for each subsequent photo to look for the most appropriate transformation of the coordinate system (position, scale, etc.). How to do it? You can use different approaches:
The above-mentioned ready-made pattern matching algorithms. We are looking for a set of common points, build a transformation on them. In matlab, there are ready-made algorithms, but I never managed to get them to work well on averaged faces, so we will not use this method.
Cross correlation A fairly easy-to-use method is the search for a suitable position, which is also well considered on the GPU. In theory, it is somewhat more primitive, since the transformation does not take into account the rotation and scale of the image, but it is quite sufficient for our purposes. In matlab, you can use the normxcorr2 function.
The first pass naive and correlation methods for comparison:
Figure 2. Comparison of the naive (left) and correlation (right) methods
Focus and Refine
The picture above still looks quite blurry, now you should focus on highlighting the main features of the face. Therefore, we will search for the pictures and center not the whole face, but the individual details obtained at the previous stage:
Figure 3. Correlation Plots
Naturally, it is advisable to somehow select the images by the value of the correlation coefficient, so as not to miss the result completely unsuitable. I implemented it simply as a threshold function with auto-tuning. Let's say, let the threshold t = 0.5, sort through the pictures one by one in random order. All new pictures are rejected, the correlation of which is less than 0.5 for this segment or with the current result. If the picture is not rejected, the threshold t increases by 0.003, otherwise it decreases by 0.001. As a result, the proportion of images falling into the result is regulated. The system independently balances at a suitable value of t. It was possible to implement something else, here the selection algorithm is on your conscience. As a result, we obtain a set of images with different areas of focus. It remains to combine them in manual or automatic mode to get a fairly accurate image.
Fig.4. Gradual refinement and focusing of the main features
Additional features
To date, we already have fairly pronounced main features and shape of the face, but we notice that the averaging "ate" all the features: hair, skin texture and eyebrows, making the face somewhat synthetic. This can be corrected by setting a high threshold so that only one or a few of the most similar persons will be selected. You can mix additional features with the main ones automatically, but I did it manually. Naturally, it is desirable to transfer only those small parts that have been lost.
Fig.5. Male averaged face, more than 500 shots.
results
Before the publication of this article, I expected to get rid of the first person received through FindFace, but he zaglyuchil, and in the end it was not needed. I already knew the person whom I clearly resembled the drawing above. At the end of the screen I looked at ... I myself. I understand that the average person should combine the features of all people in such a way as to be somewhat similar to absolutely everyone, but the coincidence was too significant to be noted. It only remained to make a more or less similar picture in terms of angle and light, and then combine it with the result for clarity:
Minute of de-anonymization
I found my photo in the article unnecessary for reasons of minimal privacy and relevance, so please use this text link drive.google.com/file/d/0B3QDURLCBJrBOGpONmJMbTZPanc/view?usp=sharing Thank you for your understanding.
After obtaining an averaged face, it was easy to find all sorts of typical classes for eyes, noses, mouths and visually assess their impact on the shape of the face. For the eyes, I identified 35 different classes, of which fourteen are mostly female:
Fig.6. Fourteen types of female eyes and their influence on the shape of the face.
A short illustration of the work of the first stage of the algorithm:
It may also be interesting:
Select images with the largest deviations from the average and average them. Most likely, there are other popular combinations of features that the algorithm missed during mass averaging. In fact, the right-wing character on KDPV is obtained in this way, and I am sure that he is not the only one.
To search and average pictures, similar to a certain real-life person.
From the obtained classes of facial features find the most popular combination and average. A face obtained in this way may differ slightly from those given above.