Generative Adversarial Networks (GANs) are a class of deep generative models with remarkable capabilities. The core idea is to train two neural networks: a generator that learns to synthesize data (for example, images), and a discriminator that learns to distinguish real data from data synthesized by the generator. This approach has been used successfully for
high-quality image synthesis,
improved image compression, and more.
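The adversarial setup above can be sketched with a toy numpy example; the names `toy_generator` and `toy_discriminator` and the tiny linear networks are illustrative stand-ins, not the architectures used in the paper:

```python
import numpy as np

# Toy sketch of the GAN objective: the discriminator outputs a probability
# that its input is real; the generator maps noise to data. The networks
# here are single linear layers, purely for illustration.
rng = np.random.default_rng(0)
W_g = rng.normal(size=(8, 4))   # toy generator weights: noise (8-d) -> data (4-d)
W_d = rng.normal(size=(4, 1))   # toy discriminator weights: data -> logit

def toy_generator(z):
    return np.tanh(z @ W_g)

def toy_discriminator(x):
    logit = x @ W_d
    return 1.0 / (1.0 + np.exp(-logit))   # sigmoid -> P(input is real)

real = rng.normal(size=(16, 4))
fake = toy_generator(rng.normal(size=(16, 8)))

# Standard (non-saturating) GAN losses: the discriminator is rewarded for
# scoring real data high and fake data low; the generator for fooling it.
eps = 1e-8
d_loss = -np.mean(np.log(toy_discriminator(real) + eps)
                  + np.log(1.0 - toy_discriminator(fake) + eps))
g_loss = -np.mean(np.log(toy_discriminator(fake) + eps))
```

In practice both networks are deep convolutional models trained jointly by gradient descent, each on its own loss.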
The evolution of generated samples during training on ImageNet. The generator is conditioned on the image class (for example, "great grey owl" or "golden retriever").
In natural image synthesis, the best results are achieved by conditional GANs, which, unlike unconditional ones, use class labels ("car", "dog", etc.) during training. Although this simplifies the task and yields a significant improvement in quality, the approach requires a large amount of labeled data, which is rarely available in practice.
In our
work "High-Fidelity Image Generation With Fewer Labels", we propose a new approach that reduces the amount of labeled data needed to train state-of-the-art conditional GANs. Combining this approach with recent breakthroughs in large-scale GANs, we produce natural images of comparable quality using 10 times fewer labels. Based on this research, we are also releasing a major update to
the Compare GAN library, which contains all the components needed to train and evaluate modern GANs.
Improvements through semi-supervision and self-supervision
In conditional GANs, the generator and discriminator are usually conditioned on class labels. In our work, we propose to replace hand-annotated labels with inferred ones. To infer good labels for a large, mostly unlabeled dataset, we take a two-step approach. First, we learn a feature representation of the images using only the unlabeled part of the dataset. To learn this representation, we use self-supervision in the form of a
recently proposed approach, in which unlabeled images are randomly rotated and a deep convolutional neural network is trained to predict the rotation angle. The idea is that, to succeed at this task, the model must be able to recognize the main objects and their shapes:
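The rotation pretext task can be sketched as follows; `make_rotation_batch` is an illustrative helper name, not code from the paper:

```python
import numpy as np

# Sketch of the rotation pretext task: each unlabeled image is rotated by
# 0, 90, 180 or 270 degrees, and the network is trained to predict which
# rotation was applied -- a 4-way classification problem that requires no
# human labels.
def make_rotation_batch(images, rng):
    """images: (N, H, W, C) array. Returns rotated images and rotation labels 0..3."""
    labels = rng.integers(0, 4, size=len(images))
    rotated = np.stack([np.rot90(img, k=int(k), axes=(0, 1))
                        for img, k in zip(images, labels)])
    return rotated, labels

rng = np.random.default_rng(0)
batch = rng.normal(size=(8, 32, 32, 3))      # stand-in for unlabeled images
rotated, labels = make_rotation_batch(batch, rng)
```

The rotation labels come for free, so this stage can use the entire unlabeled dataset.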
We then take the activations of one of the intermediate layers of the trained network as a new feature representation of the input data, and train a classifier to predict the labels of these inputs using the labeled part of the dataset. Because the network was pretrained to extract semantically meaningful features (on the rotation-prediction task), training this classifier is far more sample-efficient than training a whole network from scratch. Finally, we use this classifier to label the unlabeled data.
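The pseudo-labeling step can be sketched as below. A nearest-centroid classifier stands in for the small classifier trained on labeled features, and `extract_features` is a placeholder for the pretrained network's intermediate layer; all names here are illustrative assumptions:

```python
import numpy as np

def extract_features(x):
    # Placeholder for intermediate-layer activations of the self-supervised
    # network; here it is just the identity map.
    return x

def fit_centroids(features, labels, n_classes):
    # "Train" a simple classifier on the labeled subset: one centroid per class.
    return np.stack([features[labels == c].mean(axis=0) for c in range(n_classes)])

def pseudo_label(features, centroids):
    # Assign each unlabeled example the class of its nearest centroid.
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=-1)
    return dists.argmin(axis=1)

rng = np.random.default_rng(1)
# Two well-separated synthetic classes stand in for labeled feature vectors.
labeled_x = np.concatenate([rng.normal(0, 0.1, (20, 5)), rng.normal(3, 0.1, (20, 5))])
labeled_y = np.array([0] * 20 + [1] * 20)
unlabeled_x = rng.normal(3, 0.1, (10, 5))    # unlabeled data drawn near class 1

centroids = fit_centroids(extract_features(labeled_x), labeled_y, n_classes=2)
guessed = pseudo_label(extract_features(unlabeled_x), centroids)
```

The inferred labels `guessed` then play the role of class labels when training the conditional GAN.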
To further improve the quality of the model and the stability of training, we encourage the discriminator network to learn meaningful feature representations that are not forgotten during training, using the auxiliary losses we introduced
earlier. These two advances, combined with large-scale training, yield state-of-the-art conditional GANs for image synthesis on ImageNet, as measured by the
Fréchet Inception Distance (FID).
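FID measures the Fréchet distance between two Gaussians fitted to feature statistics of real and generated images. As a hedged sketch, the version below assumes diagonal covariances so the matrix square root reduces to an elementwise square root; the real metric uses Inception features and full covariance matrices:

```python
import numpy as np

def frechet_distance_diag(mu1, var1, mu2, var2):
    """Fréchet distance between N(mu1, diag(var1)) and N(mu2, diag(var2)).

    d^2 = ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 (C1 C2)^(1/2)),
    which for diagonal covariances simplifies to elementwise operations.
    """
    mean_term = np.sum((mu1 - mu2) ** 2)
    cov_term = np.sum(var1 + var2 - 2.0 * np.sqrt(var1 * var2))
    return mean_term + cov_term

mu = np.zeros(4)
var = np.ones(4)
same = frechet_distance_diag(mu, var, mu, var)            # identical distributions
shifted = frechet_distance_diag(mu, var, mu + 1.0, var)   # mean shifted by 1 per dim
```

Lower FID means the generated feature distribution is closer to the real one; identical distributions give a distance of zero.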
The generator network produces an image from its latent vector. In each row, linear interpolation between the latent codes of the leftmost and rightmost images leads to semantic interpolation in image space.
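The interpolation shown in each row can be sketched in a few lines; `interpolate_latents` is an illustrative helper, and decoding each blended code through the generator is omitted:

```python
import numpy as np

def interpolate_latents(z_left, z_right, steps):
    """Linearly blend two latent codes; each blend would be decoded to an image."""
    alphas = np.linspace(0.0, 1.0, steps)
    return np.stack([(1.0 - a) * z_left + a * z_right for a in alphas])

rng = np.random.default_rng(2)
z_a, z_b = rng.normal(size=128), rng.normal(size=128)
row = interpolate_latents(z_a, z_b, steps=8)   # 8 latent codes for one figure row
```

That straight lines in latent space decode to smooth semantic changes in image space is a sign the generator has learned a well-structured representation.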
Compare GAN library for training and evaluating GANs
State-of-the-art research on GANs depends heavily on well-engineered, well-tested code, since even reproducing previous results and techniques requires considerable effort. To support open science and allow the research community to build on recent breakthroughs, we are releasing a major update to the Compare GAN library. It includes loss functions, regularization and normalization schemes, neural network architectures, and numerical metrics commonly used in modern GANs. It also supports:
- Training on GPUs and TPUs.
- Simple configuration using Gin (examples).
- A huge number of datasets through the TensorFlow Datasets library.
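As a sketch of what Gin-based configuration looks like, a config file binds values to function parameters by name; the parameter names below are hypothetical illustrations, not actual Compare GAN configurables:

```
# Hypothetical Gin bindings; names are illustrative only.
train.batch_size = 64
train.training_steps = 100000
loss.fn = @hinge_loss
```

This keeps experiment settings out of the code, so a run is fully described by its config file.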
Conclusion and plans for the future
Given the gap between the amounts of labeled and unlabeled data available,
it is becoming increasingly important to be able to learn from only partially labeled data. We have shown that a simple yet powerful combination of self-supervision and semi-supervision can help close this gap for GANs. We believe self-supervision is a promising idea that deserves exploration in other areas of generative modeling.