
AI has learned to create video from a single frame. Old paintings can now be brought to life



Technology straight out of Harry Potter has become reality. Now, to create a full-fledged video of a person, a single picture or photo of them is enough. Machine learning researchers from Skolkovo and the Samsung AI Center in Moscow have published their work on such a system, along with a number of videos of celebrities and works of art that have been given new life.




The text of the paper can be read here. It is quite interesting, with plenty of formulas, but the idea is simple: the system is guided by "landmarks", the key points of a face such as the nose, the two eyes, the eyebrows, and the line of the chin. From these it instantly captures what a person looks like. It can then transfer everything else (skin color, facial texture, mustache, stubble, and so on) onto any other video of a person, adapting the old face to new situations.
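To make the landmark step concrete, here is a minimal sketch of how such key points can be extracted from a portrait. It uses the widely available dlib 68-point predictor purely for illustration; the article does not say which tracker the authors used, and the image and model file names are placeholders.

```python
# Sketch: extract facial landmarks (eyes, eyebrows, nose, chin line) from a portrait.
# Uses dlib's standard 68-point predictor as an assumed, illustrative choice.
import dlib
import cv2

detector = dlib.get_frontal_face_detector()
# Pretrained model file, downloadable separately from dlib's site.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def extract_landmarks(image):
    """Return a list of (x, y) landmark coordinates for the first detected face."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    return [(shape.part(i).x, shape.part(i).y) for i in range(68)]

landmarks = extract_landmarks(cv2.imread("mona_lisa.jpg"))  # hypothetical input
```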




Of course, this still works only on portraits. The model needs just one person, with the face turned toward the camera so that at least both eyes are visible. After that, the system can do anything with it and transfer any facial expressions onto it. It is enough to give it a suitable driving video (of another person, with the head in roughly the same position).
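Roughly, the transfer works as a frame-by-frame loop: take the landmarks of the driving video and render the source identity in that pose. The sketch below shows only this data flow; the Generator class is a hypothetical stand-in for the authors' trained network, not their actual code.

```python
# Sketch: animate one source portrait using the pose/expressions of a driving video.
import cv2
import numpy as np

class Generator:
    """Placeholder for a trained talking-head generator (illustrative only)."""
    def synthesize(self, source_embedding, driving_landmarks):
        # A real model would render the source identity in the driving pose.
        return np.zeros((256, 256, 3), dtype=np.uint8)

def animate(source_embedding, driving_video_path, generator, landmark_fn):
    """Yield one generated frame per frame of the driving video."""
    capture = cv2.VideoCapture(driving_video_path)
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        landmarks = landmark_fn(frame)   # pose and expression of the driver
        if landmarks is None:
            continue                     # skip frames with no detected face
        yield generator.synthesize(source_embedding, landmarks)
    capture.release()
```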


Earlier, AI had already learned to make deepfakes, and Internet users famously mocked celebrities by inserting their faces into porn and making memes with Nicolas Cage. But for that, the algorithms had to be trained on megabytes (or, better, gigabytes) of data: as many images and videos of a celebrity's face as possible had to be collected to get more or less decent results. The creator of Deepfakes himself said that compiling one short clip took 8-12 hours. The new system generates the result almost instantly, and at the input it needs only one picture.


With the previous systems, we would never have been able to see a living Mona Lisa: we have only one view of her. With the new algorithms, this becomes possible. The result is not yet perfect, but it is already close.



The Moscow researchers also use a generative adversarial network. Two models compete with each other: each tries to deceive its opponent and prove that the video it creates is real. This is how a certain level of realism is achieved: a generated picture of a human face is not released "into the world" unless the critic model is more than 90% sure of its authenticity. As the authors say in their paper, the images are governed by tens of millions of parameters, yet thanks to this setup the system works very quickly.
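To illustrate the adversarial setup, here is a toy sketch of a generator/critic training step and a realism check against the 90% threshold mentioned above. The tiny networks and image size are placeholders for illustration, not the authors' architecture.

```python
# Toy GAN sketch: a generator tries to fool a critic that scores realism,
# and an image is only "released" once the critic's score clears a threshold.
import torch
import torch.nn as nn

generator = nn.Sequential(nn.Linear(64, 256), nn.ReLU(),
                          nn.Linear(256, 3 * 32 * 32), nn.Tanh())
critic = nn.Sequential(nn.Linear(3 * 32 * 32, 256), nn.ReLU(),
                       nn.Linear(256, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(critic.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real_faces):
    """One adversarial step on a batch of flattened real face images."""
    batch = real_faces.size(0)
    fake_faces = generator(torch.randn(batch, 64))

    # Critic: learn to tell real faces from generated ones.
    d_opt.zero_grad()
    d_loss = bce(critic(real_faces), torch.ones(batch, 1)) + \
             bce(critic(fake_faces.detach()), torch.zeros(batch, 1))
    d_loss.backward()
    d_opt.step()

    # Generator: try to convince the critic its faces are real.
    g_opt.zero_grad()
    g_loss = bce(critic(fake_faces), torch.ones(batch, 1))
    g_loss.backward()
    g_opt.step()

def accept(face, threshold=0.9):
    """Release an image only if the critic rates it as more than 90% likely real."""
    with torch.no_grad():
        return critic(face.flatten(1)).item() > threshold
```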




If several pictures are available, the result improves. Again, the easiest subjects are celebrities, who have already been photographed from every possible angle. To reach "perfect realism", 32 pictures are needed. In that case, the AI-generated photos at low resolution are indistinguishable from real photos of the person. Untrained people at that point can no longer spot the fake; perhaps the odds remain with experts, or with close relatives of the "subject" of all these images.
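One plausible way to use several pictures is to encode each one into an identity embedding and average them, giving the generator a fuller picture of the person. The sketch below shows this idea; the averaging scheme and the Embedder class are assumptions for illustration, not details taken from the article.

```python
# Sketch: combine K source photos (K=1 for a single portrait, K=32 for the
# "perfect realism" setting mentioned above) into one identity embedding.
import torch
import torch.nn as nn

class Embedder(nn.Module):
    """Placeholder identity embedder: image tensor -> 512-d identity vector."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 512))

    def forward(self, image):
        return self.net(image)

def identity_embedding(embedder, source_frames):
    """Average the embeddings of all available source frames."""
    with torch.no_grad():
        embeddings = torch.stack([embedder(f.unsqueeze(0)) for f in source_frames])
    return embeddings.mean(dim=0)

embedder = Embedder()
frames = [torch.rand(3, 64, 64) for _ in range(32)]  # stand-in for 32 photos
e_hat = identity_embedding(embedder, frames)
```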


If there is only one photo or painting, the result is not always the best. Artifacts are easy to spot in the video when the head is in motion. The researchers themselves say that their weakest point is the gaze: a model built on facial landmarks does not always understand how and where a person should be looking.


Source: https://habr.com/ru/post/453058/

