Hi, Habr! I present to your attention the translation of the article
"Facial Surface and Texture Synthesis via GAN" .
When researchers have a lack of real data, they often resort to data augmentation as a way to expand the existing data. The idea is to modify the existing training data in such a way as to leave the semantic properties intact.
Not such a trivial task when it comes to human faces.The method of generating faces should take into account such complex data transformations as
- pose,
- lighting,
- non-rigid deformations
while creating realistic images that correlate with the statistics of real data.
')
Consider how state-of-art methods attempt to solve this problem.
Modern approaches to the generation of individuals
Generative-competitive neural networks (GAN) show their effectiveness in making synthetic data more realistic. Taking the input to the synthesized data, the
GAN produces patterns that are more like real data . However, the semantic properties can be changed, and even the function of loss, punishing changing the parameters, does not completely solve the problem.
3D Morphable Model (3DMM) is the most common method for representing and synthesizing geometry and textures and was originally introduced in the context of generating three-dimensional human faces. According to this model, the geometric structure and textures of a human face can be linearly approximated, as a combination of root vectors.
Recently, the
3DMM model was combined with convolutional neural networks for augmentation of data. However, the samples obtained are obtained too smooth and unrealistic, as can be seen in the picture below:

3DMM Persons
Moreover, 3DMM generates data based on Gaussian distribution, which is rarely reflected in actual data distribution. For example, below are two PCA (principal component analysis) coefficients plotted by real persons and synthesized using 3DMM. The difference between synthetic and real distribution can easily lead to the generation of incorrect data.

The first two PCA coefficients for real (left) and 3DMM generated (right) faces
State-of-art idea
Slossberg, Shamay and Kimmel from the Technion - Israel Institute of Technology
offer a new approach to the synthesis of realistic human faces , using a combination of 3DMM and GAN.
In particular, researchers use GAN to simulate the space of parametrized human textures and create corresponding face geometries, calculating the best 3DMM coefficients for each texture. The generated textures are mapped to the appropriate geometry for new high resolution 3D faces.
This architecture generates realistic images, with:
- does not suffer from control of attributes such as posture and lighting;
- not quantitatively limited in the generation of new individuals.
Let's take a closer look at the data generation process.
Data generation process

Data preparation
Pipeline data generation consists of four main steps:
- Data collection : researchers collected more than 5,000 scans (face scans) from different ethnic, gender, and age groups. Each participant had to portray 5 different facial expressions including a neutral one.
- Markup : 43 key points are added to the meshes semi-automatically, by rendering a face and using a pre-trained face mark detector.
- Alignment of meshes : implemented due to deformation of the patterned face mesh according to the geometry of each scan, focusing on the affixed markup.
- Texture transfer : the texture is transferred from the scan to the template using the ray casting technique built into the Blender toolbox. After that, the texture is converted from the template into a two-dimensional plane using a predefined universal transformation.

Flat Lined Facial Textures
The next step is to teach GAN how to create imitations of aligned textures. For this task, the researchers used a progressive GAN with a generator and a discriminator, organized as a symmetrical neural network. In such an implementation, the generator progressively increases the size of the feature map until it reaches the size of the output image, while the discriminator gradually reduces the size back to a single output.
GAN Face TexturesThe last step is the creation of face geometry. The researchers tried different approaches to find the correct geometry coefficients for the texture. Qualitative and quantitative comparison of various methods below (L2 geometric error):

Two synthesized textures superimposed on different geometries.
Unexpectedly, the smallest squares method shows the best results. Considering the simplicity of the method, it was chosen for all experiments.
results
The proposed method can generate many new faces, and each of them can be represented in various poses, with different expressions and lighting. Various facial expressions are added to neutral geometry using the Blend Shape model. The resulting images are shown below:



For quantitative assessments, the researchers used
the Wasserstein truncated metric (SWD) to measure the distance between the training and generated image distributions.

The table shows that the resulting textures are statistically closer to real data than those obtained using 3DMM.
The following experiment evaluates the ability to synthesize images, which are significantly different from the training dataset, and to obtain previously unseen images. Thus, 5% of individuals were not included in the assessment. The researchers measured L2 distance between each real person from the training data and the most similar of the generated ones, and similarly for the real one from the training dataset.

The distance between synthesized and real persons
As can be seen from the graphs, the test data is closer to the generated images than to the training. Moreover, the “Test to fake” distance is not too different from “Fake to real”. It follows from this that the obtained samples are not just synthesized faces, similar to a training sample, but completely new faces.
Finally, to check the possibility of generating the initial dataset, a qualitative assessment was made: the face textures obtained by this model were compared with their closest neighbor in the L2 metric.

Synthesized textures (above) versus the nearest real “neighbors” (below)
As you can see, the next real textures are quite different from the original ones, which allows us to conclude about the ability to generate
new faces.
Results
The proposed model is probably the first one that can realistically synthesize both the texture and the geometry of human faces. This can be useful for the detection and recognition of faces or models of facial reconstruction. In addition, it can be used in cases where many different realistic faces are required, for example, in the film industry or computer games. Moreover, this structure is not limited to the synthesis of human faces, but can actually be used for other classes of objects where augmentation of data is possible.
OriginalTranslated - Stanislav Litvinov.