A new algorithm created by scientists makes it possible to produce nearly perfect "talking heads" of real people.

The researchers have learned to edit video clips so that a person in the video appears to say any words and sentences. The technology processes the video so that the result looks natural and organic; a fake is noticeable only if the viewer already suspects that the video was edited.

The new algorithm was created by a joint team of researchers from Stanford, the Max Planck Institute, Princeton, and Adobe. The only manual editing required is writing the text that the person in the video should say. The neural network does the rest. A fake is hard to spot because the facial expressions and movement patterns of the “speaker” are preserved, and the technology masks the traces of tampering.

To achieve this, the creators of the algorithm taught it to analyze video. The neural network selects the necessary gestures, facial expressions, and articulated words, and then combines the individual frames so that the modified video looks intact. The result is essentially a computer model of the person that performs whatever actions the operator of the technology requires.
The movements of the lips, the tongue, and all other elements of articulation are original: the neural network "cuts" them out of the source video. At that stage the video does not look very natural, because it is assembled from a large number of separate frames and contains pauses. The technology therefore “smooths out” the resulting version so that it looks as natural as possible.
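The select-and-smooth idea described above can be illustrated with a toy sketch. This is not the researchers' implementation: the `Frame` type (a short list of numbers standing in for mouth-shape parameters), the phoneme-keyed `index`, and the linear blending in `stitch` are all assumptions made for illustration only. The sketch looks up a stored run of frames for each requested phoneme and then inserts interpolated frames at every cut so the joins are less abrupt.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for one video frame: a few mouth-shape parameters.
Frame = List[float]


def select_segments(target: Sequence[str],
                    index: Dict[str, List[Frame]]) -> List[List[Frame]]:
    """Return the stored run of frames for each phoneme in `target`."""
    missing = [p for p in target if p not in index]
    if missing:
        raise KeyError(f"no source footage for phonemes: {missing}")
    return [index[p] for p in target]


def blend(a: Frame, b: Frame, t: float) -> Frame:
    """Linearly interpolate between two frames (t=0 gives a, t=1 gives b)."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def stitch(segments: List[List[Frame]], transition: int = 2) -> List[Frame]:
    """Concatenate segments, adding `transition` blended frames at each cut."""
    out: List[Frame] = []
    for seg in segments:
        if out and seg:
            last, first = out[-1], seg[0]
            for k in range(1, transition + 1):
                out.append(blend(last, first, k / (transition + 1)))
        out.extend(seg)
    return out


# Usage: two phonemes, each backed by two toy frames from the source video.
index = {"HH": [[0.2], [0.0]], "AY": [[1.0], [0.8]]}
edited = stitch(select_segments(["HH", "AY"], index), transition=1)
```

A real system would work on rendered face images rather than a handful of numbers, but the structure is the same: retrieve original material per sound, then smooth the seams.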

Before it can be used, the neural network has to be trained: it must be “fed” at least 40 minutes of video of the person or people whose speech will be replaced. For now this applies only to English-language videos: English has only 44 phonemes, so the neural network learns far more easily on English than on Russian or Japanese. Over time, however, the technology could also be used to edit videos of people speaking other languages.

Of course, this work raises a number of questions. One of them concerns information and media security. If any words can be put into the mouth of any person and the result looks natural, isn’t the technology dangerous? The authors acknowledge that yes, attackers can use it. But image editors, for example, have existed for a very long time and can also be used to fake almost anything, and the world has continued to function alongside them.

The authors also say they understand that the same technology could be used by unscrupulous politicians, who would no longer need to deliver speeches in front of a camera if they could be replaced by “talking heads” generated from earlier recorded speeches.

To detect fakes, the authors suggest using specialized watermarks and other techniques that make a forgery recognizable.

Of course, it is easy to prove that a video has been modified if the original is available. In addition, the authors plan to develop methods of protecting media content by adding “digital fingerprints” to the original version; these would be easy to detect and would make it possible to tell whether a video is the original or a fake.
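The article does not say how the authors' “digital fingerprints” would work. As a rough illustration of the general idea of checking a video against a registered original, the sketch below uses a plain SHA-256 hash of the file bytes. This is a swapped-in, much simpler technique and an assumption of this example: unlike a robust watermark, it flags any byte-level change, including harmless re-encoding. The function names `fingerprint` and `is_original` are hypothetical.

```python
import hashlib
import hmac


def fingerprint(data: bytes) -> str:
    """Return a hex SHA-256 digest of the raw video bytes."""
    return hashlib.sha256(data).hexdigest()


def is_original(data: bytes, registered: str) -> bool:
    """Check whether `data` matches a previously registered fingerprint."""
    return hmac.compare_digest(fingerprint(data), registered)


# Usage: register the original once, then verify copies against it.
original = b"original-video-bytes"   # placeholder for real file contents
registered = fingerprint(original)
tampered = original + b"-edited"
```

Here `is_original(original, registered)` is true and `is_original(tampered, registered)` is false, which mirrors the point above: with a trusted reference to the original, modification is easy to demonstrate.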

The full text of the study can be found here.

Source: https://habr.com/ru/post/455439/
