The first machine-learning video codec dramatically outperforms all existing codecs, including H.265 and VP9

Reconstructions of a video fragment compressed by different codecs at approximately the same BPP (bits per pixel). See the benchmark results below.

Researchers at WaveOne claim they are close to a revolution in video compression. On high-resolution 1080p video, their new machine-learning codec compresses about 20% better than the most modern traditional video codecs, such as H.265 and VP9. On standard-definition video (SD/VGA, 640 × 480), the difference reaches 60%.
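To make the units concrete, here is a small sketch of how bits per pixel (BPP) relates to stream size, and what the quoted savings would mean if "X% better" is read as the traditional codec producing a stream X% larger at equal quality. That reading, the function names and the sample numbers are assumptions for illustration, not figures from the paper.

```python
def bits_per_pixel(total_bits: int, width: int, height: int, frames: int) -> float:
    """Average number of bits spent on each pixel of the video."""
    return total_bits / (width * height * frames)


def relative_size(baseline_bits: float, percent_larger: float) -> float:
    """Size of the smaller stream if the baseline is `percent_larger` % bigger."""
    return baseline_bits / (1 + percent_larger / 100)


# Hypothetical clip: 10 seconds of VGA video at 30 fps, encoded to 1,500,000 bits.
frames = 10 * 30
bpp = bits_per_pixel(1_500_000, 640, 480, frames)
print(f"BPP: {bpp:.4f}")

# If the traditional codec's stream is 60% larger, the new stream is 62.5% of it.
print(relative_size(1_500_000, 60))
```

Under this reading, a 60% gap does not mean the new stream is 60% smaller: a baseline that is 60% larger corresponds to a saving of 37.5% relative to that baseline.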

The developers call the video compression methods implemented in H.265 and VP9 "ancient" by the standards of modern technology: "Over the past 20 years, the fundamentals of existing video compression algorithms have not changed significantly," the authors write in the introduction to their paper. "Although they are very well engineered and thoroughly tuned, they remain hard-coded and, as such, cannot adapt to the growing demand and increasingly versatile range of video use cases, such as social media sharing, object detection, virtual reality streaming and so on."

Machine learning should finally bring video compression technology into the 21st century. According to the authors, the new compression algorithm significantly outperforms existing video codecs. "To our knowledge, this is the first machine learning method to show such a result," they say.
The basic idea of video compression is to remove redundant data and replace it with a shorter description from which the video can later be reconstructed. Most video compression happens in two stages.

The first stage is motion compensation: the codec looks for moving objects and tries to predict where they will be in the next frame. Then, instead of storing the pixels of the moving object in every frame, the algorithm encodes only the shape of the object together with the direction of its motion. Some algorithms also look at future frames to determine the motion even more accurately, although this obviously does not work for live broadcasts.
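As an illustration of motion search, here is a minimal sketch of classic block matching: for a block in the current frame, it looks for the best-matching block in the previous frame and reports the offset. This is a simplified textbook technique, not the learned approach from the paper; the function names, block size and search range are all hypothetical.

```python
def sad(frame_a, frame_b, ay, ax, by, bx, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(frame_a[ay + dy][ax + dx] - frame_b[by + dy][bx + dx])
        for dy in range(size)
        for dx in range(size)
    )


def find_motion_vector(prev, curr, y, x, size=2, search=2):
    """Find where the block at (y, x) in `curr` came from in `prev`.

    Returns (dy, dx): the offset into `prev` with the lowest SAD.
    """
    height, width = len(prev), len(prev[0])
    best = (0, 0)
    best_cost = sad(curr, prev, y, x, y, x, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            py, px = y + dy, x + dx
            # Skip candidate blocks that would fall outside the frame.
            if 0 <= py <= height - size and 0 <= px <= width - size:
                cost = sad(curr, prev, y, x, py, px, size)
                if cost < best_cost:
                    best, best_cost = (dy, dx), cost
    return best


# A bright 2x2 object moves one pixel to the right between frames.
prev = [
    [0, 0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0, 0],
    [0, 9, 9, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
curr = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 9, 9, 0, 0],
    [0, 0, 9, 9, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

# The block at row 1, column 2 of `curr` is found one pixel to the left in `prev`.
print(find_motion_vector(prev, curr, 1, 2))
```

Storing the offset instead of the four pixel values is the saving that motion compensation relies on; the exhaustive search here is chosen for clarity, not speed.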

The second stage removes the remaining redundancy between one frame and the next. Instead of storing the color of every pixel in a blue sky, the compression algorithm can identify the region of that color and indicate that it does not change over the next few frames. Those pixels keep the same color until they are told to change. This is called residual compression.
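The idea of encoding only what changed can be sketched as follows. The residual is the per-pixel difference between the actual frame and its prediction; where nothing changes it is zero, and long runs of zeros are cheap to store. Run-length encoding is used here only as a simple stand-in (real codecs typically transform and quantize the residual), and all names and sample values are hypothetical.

```python
def residual(predicted, actual):
    """Per-pixel difference between the actual frame and its prediction."""
    return [
        [a - p for p, a in zip(pred_row, act_row)]
        for pred_row, act_row in zip(predicted, actual)
    ]


def run_length_encode(values):
    """Collapse runs of equal values into [value, count] pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs


def reconstruct(predicted, res):
    """Decoder side: prediction plus residual gives back the actual frame."""
    return [
        [p + r for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(predicted, res)
    ]


# Hypothetical 2x6 frames: a uniform "sky" where only one pixel changes.
predicted = [[200] * 6, [200] * 6]
actual = [[200] * 6, [200, 200, 200, 180, 200, 200]]

res = residual(predicted, actual)
flat = [value for row in res for value in row]
print(run_length_encode(flat))  # long runs of zeros compress well
assert reconstruct(predicted, res) == actual
```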

According to the researchers, the new approach is the first to use machine learning to improve both of these stages. In motion compression, the machine learning methods found new motion-based redundancies that conventional codecs have never been able to detect, let alone exploit. For example, a person's head turning from a frontal view to a profile always produces a similar result: "Traditional codecs will not be able to predict a profile view of a face from the frontal view," the authors write. The new codec, by contrast, learns such spatio-temporal patterns and uses them to predict future frames.

Another problem is how to allocate the available bandwidth between motion and residual compression. In some scenes motion compression matters more, while in others residual compression provides the greatest gain. The best trade-off between the two differs from frame to frame.

Traditional algorithms handle the two processes separately. This means there is no easy way to trade one off against the other.

The authors get around this by compressing both signals simultaneously and, based on the complexity of the frame, deciding how to distribute the bandwidth between the two signals most efficiently.
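A toy sketch can show why the best split changes from frame to frame. It tries several ways of dividing a fixed bit budget between motion and residual data and keeps the one with the lowest distortion. The distortion model (weight / (1 + bits)), the weights and the function name are assumptions made up for illustration; the paper's codec learns this trade-off rather than searching a hand-written formula.

```python
def best_split(budget, motion_weight, residual_weight, steps=10):
    """Pick the share of `budget` for motion that minimizes a toy distortion.

    Toy model (an assumption, not from the paper): each signal's distortion
    falls off as weight / (1 + bits), so a "harder" signal has a larger weight.
    """
    best_share, best_distortion = None, float("inf")
    for i in range(steps + 1):
        share = i / steps
        motion_bits = budget * share
        residual_bits = budget - motion_bits
        distortion = (
            motion_weight / (1 + motion_bits)
            + residual_weight / (1 + residual_bits)
        )
        if distortion < best_distortion:
            best_share, best_distortion = share, distortion
    return best_share


# Hypothetical frames: one dominated by motion, one by fine texture.
print(best_split(100, motion_weight=9, residual_weight=1))
print(best_split(100, motion_weight=1, residual_weight=9))
```

With the same budget, the motion-heavy frame gives most of its bits to motion and the texture-heavy frame gives most of them to the residual, which is the per-frame trade-off the text describes.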

These and other improvements allowed the researchers to create a compression algorithm that significantly outperforms traditional codecs (see the benchmarks below).

Reconstructions of a fragment compressed with different codecs at approximately the same BPP show a significant advantage for the WaveOne codec.

Optical flow maps for H.265 (left) and the WaveOne codec (right) at the same bit rate

However, the new approach is not without flaws, MIT Technology Review notes. Perhaps the main drawback is low computational efficiency, that is, the time required to encode and decode video. On an Nvidia Tesla V100 with VGA-size video, the new decoder runs at an average of about 10 frames per second, and the encoder at only about 2 frames per second. Such speeds are simply unusable for live video broadcasting, and even for offline encoding the new encoder will have a very limited range of applications.
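A quick calculation shows how far these speeds are from real time. The 10 and 2 frames per second come from the text above; the 30 fps playback rate and the helper name are assumptions for illustration.

```python
def processing_time_seconds(duration_s: float, video_fps: float, codec_fps: float) -> float:
    """Time needed to process a clip at a given codec throughput."""
    return duration_s * video_fps / codec_fps


VIDEO_FPS = 30   # assumed playback rate, not stated in the article
DECODE_FPS = 10  # reported decoder speed on VGA video
ENCODE_FPS = 2   # reported encoder speed on VGA video

one_minute = 60
print(processing_time_seconds(one_minute, VIDEO_FPS, DECODE_FPS))  # 180.0 seconds to decode
print(processing_time_seconds(one_minute, VIDEO_FPS, ENCODE_FPS))  # 900.0 seconds to encode
```

At the assumed playback rate, one minute of VGA video takes three minutes to decode and fifteen minutes to encode on the hardware mentioned.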

Moreover, the decoder is not fast enough even to watch video compressed with this codec on an ordinary personal computer. In other words, watching such video, even at minimal SD quality, currently requires an entire computing cluster with several GPUs. And watching HD video (1080p) would take a whole server farm.

One can only hope for more powerful GPUs in the future and for improvements to the technology: "The current speed is not sufficient for real-time deployment, but it should be significantly improved in future work," they write.

Benchmarks

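The comparison included HEVC/H.265, AVC/H.264, VP9 and the HEVC HM 16.0 reference implementation. The tests used FFmpeg. B-frames were disabled: bframes=0 for H.264/5, and -auto-alt-ref 0 -lag-in-frames 0 for VP9. The quality metric was MS-SSIM, and the -ssim option was used to tune the codecs for it.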

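Tests were run on SD and HD video. The SD set consists of VGA videos from the Consumer Digital Video Library (CDVL): 34 videos with 15,650 frames in total. The HD set is the Xiph 1080p collection: 22 videos with 11,680 frames in total. The 1080p videos were cropped to a height of 1024 (each dimension must be divisible by 32).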
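Assuming the 1080p frames are center-cropped to a height of 1024 so that both dimensions are divisible by 32 (an interpretation of the numbers above), the arithmetic can be checked with a short sketch. The helper names are hypothetical, and the choice of a center crop is itself an assumption.

```python
def center_crop_rows(height: int, target: int) -> tuple:
    """Return (first_row, last_row_exclusive) of a vertical center crop."""
    if target > height:
        raise ValueError("target height exceeds frame height")
    top = (height - target) // 2
    return top, top + target


def divisible_by(value: int, factor: int = 32) -> bool:
    """True if `value` is an exact multiple of `factor`."""
    return value % factor == 0


print(divisible_by(1080))            # False: 1080 = 33 * 32 + 24
print(divisible_by(1024))            # True: 1024 = 32 * 32
print(divisible_by(1920))            # True: 1920 = 60 * 32
print(center_crop_rows(1080, 1024))  # (28, 1052)
```

Note that 1024 is not the largest multiple of 32 below 1080 (that would be 1056), so the divisibility requirement alone does not explain the exact crop height.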

Benchmark graphs (MS-SSIM) for the SD and HD test sets, including the WaveOne codec.

Previous work on image compression with machine learning includes: G. Toderici, S. M. O'Malley, S. J. Hwang, D. Vincent, D. Minnen, S. Baluja, M. Covell, R. Sukthankar. Variable rate image compression with recurrent neural networks, 2015; G. Toderici, D. Vincent, N. Johnston, S. J. Hwang, D. Minnen, J. Shor, M. Covell. Full resolution image compression with recurrent neural networks, 2016; J. Balle, V. Laparra, E. P. Simoncelli. End-to-end optimized image compression, 2016; N. Johnston, D. Vincent, D. Minnen, M. Covell, S. Singh, T. Chinen, S. J. Hwang, J. Shor, G. Toderici. Improved lossy image compression with priming and spatially adaptive bit rates for recurrent networks, 2017.

Among ML-based video compression methods, the article mentions C.-Y. Wu, N. Singhal, P. Krahenbuhl. Video compression through image interpolation, ECCV, 2018, in connection with AVC/H.264.

The paper was published on arXiv.org on November 16, 2018 (arXiv:1811.06981). The authors are Oren Rippel, Sanjay Nair, Carissa Lew, Steve Branson, Alexander G. Anderson and Lubomir Bourdev.

Source: https://habr.com/ru/post/431354/
