
AI interfaces and where they live

Recently we published a column on Habr about the release of our own online game. One of the features we put serious effort into was AI generation of avatars for your character "on the fly" from a photo (for now it works only in the prototype and is not yet part of the game). The technology itself is interesting and may be applicable beyond this project. As promised, here is a more detailed account, along with a live prototype you can try!


Below you will also find: why we chose an atypical approach to training the AI, without labeling the data, and why we consider it a scientific novelty; the failures we hit while creating AI avatars, so you don't repeat them; and how and where domain adaptation is used today.


image

If you have no time to read the post


Follow the link to see the AI at work right now.


Note:
1. The converter is not fast, so please be patient; it runs on non-production hardware.
2. The system was trained only on male photos, so female photos will yield men, probably somewhat effeminate ones. Don't expect portrait-level likeness: the number of available elements is deliberately limited, as we explain below.

Where did the idea to create AI avatars come from?


It did not appear out of the blue. Machine learning work has been going on at uKit Group for several years. For instance, the WebScore AI project, which assesses a site's visual appeal in real time, was opened to the public about a year ago and is successfully used for the company's internal purposes.


Using neural networks in game development is a topic that periodically generates considerable hype. Recall No Man's Sky or RimWorld, which drew plenty of attention, and not only from the gaming community, thanks to a fully generated universe with, according to the authors themselves, nearly infinite variability. In reality, though, the world generation was procedural and had nothing to do with neural networks. Still, the trend is obvious: the market is ready and waiting, rubbing its hands!


We thought that being able to upload your photo into the game and immediately get a personal avatar that resembles you and that no one else has would be fun and could make an enticing feature. Besides, the technology can clearly find uses outside the game.


It is worth noting that the character's resemblance to a real person in Web Tycoon will be approximate. This is a deliberate choice, because we assemble avatars from elements drawn by our designers, and there are several reasons for it. First, we want the output to be game avatars in the game's own style, preserving the flat look we love. Second, we are of course playing it safe: this guarantees we always get a portrait-like image, no matter what the user uploads.


image

After all, not everyone wants to see, say, a photorealistic cucumber among their rivals.


Training models without labeled data


This approach to machine learning can be called innovative because the AI is trained without any manually labeled data at all. How is that possible? The key to success is having tools for generating synthetic data. To pre-empt the obvious question: no, we cannot share ours.


Why did we decide to train our AI this way? Data labeling is monotonous, high-volume assessor work. And if the designers suddenly decide to add, say, a third type of glasses to the game (there were two when the system was written), everything would have to be labeled again, because every example labeled now becomes potentially suboptimal.
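The core trick can be sketched in a few lines. The element names and counts below are illustrative, not the game's actual assets: because every avatar is assembled from a known combination of designer-drawn parts, the combination itself is the ground-truth label, and rendering it yields a training image that is labeled for free.

```python
import random

# Hypothetical element catalogue; names and counts are illustrative.
ELEMENTS = {
    "face_shape": 4,
    "hairstyle": 6,
    "glasses": 3,      # adding a new glasses type just bumps this count
    "facial_hair": 5,
}

def sample_avatar(rng: random.Random) -> dict:
    """Draw a random combination of element indices.

    The combination *is* the label: rendering it produces a synthetic
    training image whose ground truth is known for free, so no human
    annotation is needed, and new elements require no relabeling.
    """
    return {part: rng.randrange(count) for part, count in ELEMENTS.items()}

rng = random.Random(42)
dataset = [sample_avatar(rng) for _ in range(10_000)]
```

Adding a third type of glasses then means bumping one count and regenerating the dataset, rather than re-running human annotation.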


In addition, in our case we had to account for subjectivity: give the same photo to 10 people and ask each to assemble an avatar, and you get 10 different avatars. We checked.


Original photo:


image

And the results from two different designers of our company:


image

Man vs Machine


Training was not easy. At first, our AI lost on every front:


image

In case anyone missed it: the original photo is the same in both rows. The AI turned bags under the eyes into sunglasses. A hussar mustache came as a bonus. True workaholics may regard this as a feature, not a bug.


Here are a few more illustrative results.


image

image

The AI doesn't just omit glasses; it can even put them on unprompted! There are also certain problems with color reproduction.


About the development process itself


As a starting point, we took a few ready-made Style Transfer solutions, but they quickly had to be abandoned: in their pure form they did not fit our needs. We also tried using generative models on their own, but quickly hit the wall that most of the solutions we came across either had no reference implementation or produced no usable result.


In the end, CycleGAN became the first successful generative model; we took it as a basis and adapted it to our needs. Perceptual loss was brought in to help the standard CycleGAN: it very noticeably improves the beauty of the resulting images.
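The idea behind a perceptual loss is to compare images in a feature space rather than pixel by pixel. A minimal sketch, with one loud assumption: the "feature extractor" here is a fixed random linear map standing in for the pretrained-network activations (e.g. VGG) that are normally used.

```python
import numpy as np

def perceptual_loss(img_a: np.ndarray, img_b: np.ndarray, feature_fn) -> float:
    """Mean squared distance between images in feature space.

    Unlike a pixel loss, small spatial shifts that leave high-level
    features intact are penalised only weakly.
    """
    fa, fb = feature_fn(img_a), feature_fn(img_b)
    return float(np.mean((fa - fb) ** 2))

# Stand-in feature extractor: a fixed random projection of a flattened
# 32x32 RGB image to 64 "features".  A real pipeline would use
# activations of a pretrained convolutional network instead.
_rng = np.random.default_rng(0)
_W = _rng.standard_normal((64, 3 * 32 * 32)) / np.sqrt(3 * 32 * 32)

def features(img: np.ndarray) -> np.ndarray:
    return _W @ img.reshape(-1)
```

The extractor is frozen during training; only the generator is optimized against this loss, alongside the usual adversarial and cycle-consistency terms.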


Below you can see CycleGAN in action:


image

Or another example, clear and familiar to anyone who has ever used the Prisma app:


image

The main difficulty, traditionally, was getting the generative model to train properly. The whole family of such models suffers from a set of characteristic ailments that everyone has been trying to cure in recent years: long training times, mode collapse, and sensitivity to initialization.


There were also purely engineering problems that, in theory, many people should run into, yet for some reason few write about. For example, we needed fast parallel data loading with augmentation, and the standard set of augmentations shipped with keras / tf / pytorch was not enough for us. Also, we wanted augmentation to run on the CPU from the start. CPU augmentation has undeniable advantages, chief among them, in our view, the ability to offload the GPU, splitting the duties between the two processors.


How we solved these problems


For GAN training, we mainly worked with the loss functions: we added an identity loss and a color loss. At the same time we experimented with the generator architecture inside CycleGAN and ended up with a 12-block ResNet (the original was a bit too small for us).
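The two extra loss terms can be illustrated with a small sketch. The identity loss follows the usual CycleGAN formulation; the color loss shown here is our guess at a minimal color-statistics penalty (the post does not give its exact form), comparing per-channel means.

```python
import numpy as np

def identity_loss(generator, target_img: np.ndarray) -> float:
    """Feeding the generator an image already in the target domain
    should change it as little as possible (L1 distance)."""
    return float(np.mean(np.abs(generator(target_img) - target_img)))

def color_loss(fake_img: np.ndarray, real_img: np.ndarray) -> float:
    """Penalise drift in per-channel colour statistics.

    Illustrative form only: squared difference of per-channel means,
    discouraging the generator from shifting the overall palette.
    """
    fake_mean = fake_img.mean(axis=(0, 1))   # one mean per RGB channel
    real_mean = real_img.mean(axis=(0, 1))
    return float(np.mean((fake_mean - real_mean) ** 2))
```

Both terms are added, with weights, to the standard adversarial and cycle-consistency losses.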


To deal with duplicated code, we wrote higher-level wrappers around the models, letting us reuse the same code across several models at once. The code for data generators was generalized in a similar way, for the same reason.
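A wrapper of this kind might look like the sketch below. The class names and the constant loss values are hypothetical, purely to show the shape of the idea: shared training scaffolding lives in a base class, and each model variant only defines its own loss terms.

```python
class GANBase:
    """Shared scaffolding reused across several GAN variants."""

    def __init__(self, name: str):
        self.name = name
        self.history = []          # per-step total loss, for monitoring

    def losses(self, batch) -> dict:
        """Each subclass returns its named loss terms for one batch."""
        raise NotImplementedError

    def train_step(self, batch) -> float:
        # Common logic (logging, summing terms) written once.
        total = sum(self.losses(batch).values())
        self.history.append(total)
        return total

class CycleGANLike(GANBase):
    def losses(self, batch) -> dict:
        # Constants stand in for real loss computations.
        return {"adversarial": 1.0, "cycle": 0.5, "identity": 0.25}
```

Swapping architectures or loss sets then touches only the subclass, not the training loop.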


To solve the third problem (data loading with CPU-side augmentation) we had to invent and write our own tool, but that is a story for another time. We will only say that it is thanks to this tool that the technology can be successfully reused in other projects unrelated to avatars.


Failures when building AI interfaces


The main mistake is misjudging the complexity of the task. At first glance it often seems there is a huge number of ready-made solutions that are 99% right for you, and all you need to do is take one and safely transfer it to your case. Well, it isn't so. It deserves a special mention what it feels like to watch a GAN that has been training successfully for several days suddenly, quite literally, blow up and start generating complete garbage.


Like this:


image

Another serious problem: in the early stages we forgot to fix the random seed, and remember that GANs are extremely sensitive to initialization. A rather embarrassing failure on our part; give us a like for the honesty.
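The fix itself is cheap; forgetting it is what's expensive. A minimal dependency-free sketch: pin every RNG the pipeline touches before each run. (With PyTorch one would also call `torch.manual_seed` and enable deterministic cuDNN; that is omitted here to keep the example free of framework dependencies.)

```python
import random
import numpy as np

def fix_seeds(seed: int) -> None:
    """Pin every random source the pipeline touches, so that a GAN run
    can be reproduced exactly and initialization stops being a hidden
    experimental variable."""
    random.seed(seed)
    np.random.seed(seed)

# Two runs from the same seed produce identical "initializations":
fix_seeds(1234)
first = np.random.rand(3)
fix_seeds(1234)
second = np.random.rand(3)
```

With seeds fixed, a run that blows up can at least be replayed and debugged instead of shrugged at.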


Where is Domain Adaptation used today?


Domain adaptation is slowly but surely penetrating AI tasks. This steady trend exists because, despite the rapid growth of available data in the modern world, labeling remains a long and expensive exercise. The development of transfer learning and its generalization, domain adaptation, addresses this problem.


An example of practical domain adaptation is Apple's work on expanding a dataset of photographs of human eyes by adapting synthetically generated images. In their study they showed that an effective approach is to generate artificial data that is labeled from the start, then bring it closer to real data using domain adaptation methods.


image

Or another interesting example. In 2017, a group of scientists proposed an unusual approach to collecting the data about streets, roads, pedestrians, and the rest of the environment that self-driving cars need for training.


They proposed taking this information from GTA V.


image

To this end, more than 480,000 labeled images of ordinary highway driving were generated in the virtual environment of Grand Theft Auto V. Using these images, the system was trained to read the main variables needed for basic autonomous driving: the distance to cars and other objects ahead, the lane markings, and the driving angle (the heading relative to the lane's centerline). The fatal Tesla accident in Florida was also analyzed.


A future for NST and GANs


Can we speak about it with confidence? Probably yes. Prisma uses Neural Style Transfer, and new applications are being built along the same lines, not only for entertainment. GANs, too, can serve as a tool for a wide variety of tasks: colorizing images, generating images from noise, even generating images from text.


Back to game development. Here the possibilities of domain adaptation are potentially limitless: if, for training self-driving cars, textures of the GTA V game world were turned into something very close to photos of the real world, then absolutely nothing prevents doing the opposite: generating game textures from panoramic images of real cities.


Another advantage of the machine is that it does not get tired. A computer can now generate a huge number of distinct views in a split second. Our task is to learn to do this effectively and efficiently; after that, all that remains is to separate the wheat from the chaff and enjoy the result.


Any questions? We will be glad to answer them in the comments.

Source: https://habr.com/ru/post/449494/

