
Video processing on the CPU and GPU. Expert answers





In this post, we publish the answers of Intel expert Dmitry Serkin to the questions you asked earlier about video processing on the CPU and GPU. We apologize for the delay; it is due to the large time difference between us and Dmitry.

As usual, to make searching easier, each question is labeled with its author's Habr username.



Question from Maratyszcza

Will Intel processors get hardware blocks for other (non-video) compression algorithms, such as deflate?
I do not think so. There are optimizations for specific processors, though: Intel Integrated Performance Primitives contains optimized implementations of the ZLIB, DEFLATE, and GZIP families of functions at the algorithm and instruction level.
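For readers unfamiliar with the function family mentioned above, here is a minimal sketch of a DEFLATE round trip using the zlib module from the Python standard library. Nothing in it is Intel-specific; the helper name `deflate_roundtrip` and the sample payload are made up for illustration, and the example only shows the kind of software interface that such optimized libraries speed up.

```python
import zlib


def deflate_roundtrip(payload: bytes, level: int = 6):
    """Compress with zlib (DEFLATE inside a zlib container) and decompress again."""
    compressed = zlib.compress(payload, level)
    restored = zlib.decompress(compressed)
    return compressed, restored


# Hypothetical, highly repetitive sample data so that compression is visible.
sample = b"video processing on CPU and GPU " * 64
compressed, restored = deflate_roundtrip(sample)
print(len(sample), "->", len(compressed), "bytes")
```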



Question from lifestar

Which codecs are supported for hardware compression on the CPU?
If we are talking only about encoding, then H.264, MPEG-2, MJPEG, and MVC for stereoscopic 3D. A few more widely known ones are on the way.


Question from jdima

Can we expect QuickSync to match x264 in the quality of the resulting image?
If we are talking about the quality-oriented presets (encoding settings), it will never catch up. With each new platform the encoding quality improves, because more resources become available on the hardware side and, with them, the ability to improve algorithms such as motion estimation and bitstream packing. x264 uses very good algorithms (not fast, but they affect quality), including RDO. All of this maps extremely poorly onto a pipelined hardware architecture. If we talk about the medium presets, QuickSync beats x264 comfortably. Everything, of course, comes down to the final codec settings, of which there are many. It must be understood that quality and speed do not go hand in hand. The goal of QuickSync is to encode quickly with good quality for 99% of users, and the technology does that. In the meantime, work on gaining more dB goes on every day.
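The answer does not say which metric the "dB" refers to. A common choice for codec quality comparisons is PSNR, so the following minimal sketch assumes PSNR over 8-bit samples. The function name `psnr_db` and the flat-sequence input format are illustrative only.

```python
import math


def psnr_db(reference, encoded, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized sample sequences."""
    if len(reference) != len(encoded) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the original and the decoded samples.
    mse = sum((a - b) ** 2 for a, b in zip(reference, encoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(peak ** 2 / mse)


# Every sample is off by one, so MSE = 1 and PSNR = 10 * log10(255^2).
print(round(psnr_db([100, 100, 100, 100], [101, 101, 101, 101]), 2))
```

Higher values mean the encoded picture is closer to the source, which is why encoder improvements are often quoted as fractions of a dB at the same bitrate.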



Question from Weatherman

How much does the new HD 5000 differ in performance from the HD 4000? Can you give any examples with modern games?
According to recent press releases, speed has increased up to 3 times and power consumption has been halved. I have not seen any public gaming benchmarks. They should appear a few weeks before the Haswell launch, which, as I recall, will take place in June. Unfortunately, I can't give examples, since that is not my area; I work on codecs.



Questions from Tp7

1. Are there plans to support hardware decoding of high-bit-depth video, for example Hi10P in H.264 or the "high" HEVC profiles?
I do not have such information. Plans are a changeable thing. If these profiles become widely used, they are very likely to be supported.



2. I remember that some time ago there were attempts at a dialogue with the developers of free codecs about what they would like from new Intel processors. What is the situation in this area now? Do open source developers influence Intel, and does Intel provide any support?
This happens more at the application level than at the level of codec developers. The recent announcement that HandBrake supports QuickSync is one such event. This is Intel's contribution to a free product. Such activity will become more and more frequent, as the development of QuickSync for Linux and its derivatives (Android) is in full swing.

As for giving direct access to the driver and hardware, I have not heard of such activity. Besides, I consider it pointless, since this work is rather nontrivial. There is also the Media SDK, which provides higher-level primitives.



3. At the moment there are essentially no good implementations of encoding on the GPU (there are only a few, and none stands out in quality or offers a particular speed advantage). Why is this, and are there any positive developments in this area?
I find QuickSync a very successful solution that has both speed and good quality (relative to that speed). As for the solutions from AMD or Nvidia, their failure can be explained by an architecture that differs from Intel's. All their solutions are based on execution units and multithreading, which is difficult to use in codecs (some key algorithms do not map well onto multithreading). QuickSync is a combination of EUs and fixed-function units (algorithmic blocks "baked" into the hardware). This combination gives an excellent gain in performance and quality.



4. It is no secret that the computational cost of the recently released HEVC and VP9 is currently beyond reasonable limits. In your assessment, how soon will processors / software appear that can process (or at least decode) HD video in these formats in real time?
I suppose such a capability will appear within a couple of years.



5. How widely is handwritten asm used in Intel's multimedia products, or do you rely more on compiler optimization? Do you use C++, or only good old C? How much time goes into optimizing performance versus implementing new functionality itself?
All's fair in war :) We use all of the above at the driver level and below. The specific asm is, of course, generated from C-like code and then optimized by hand. Everything takes a lot of time. There is a lot of research into quality and performance, but everything has a deadline. I will not give an exact proportion, but research certainly consumes more time.



6. How big is Intel's multimedia team? How hard is it to join you? :)
From hardware drivers to the various SDKs, there are thousands of people. It depends on which position you are aiming for ;) In Russia (Moscow and Nizhny Novgorod) there is a large team that works on the Intel Media SDK. They have vacancies from time to time.



Question from RussianNeuroMancer

Is there a problem in the hardware or in the driver?
Most likely in the driver. On Windows this is a known issue caused by certain OS-level restrictions, but it is solvable. I wrote about it in more detail and more accessibly here.



Question from Ilya_Smelykh

Will there be hardware colorspace conversion for the most popular formats? What about hardware deinterlacing?
All of this is already there, for both planar and packed formats. There will be more. Deinterlacing is also supported.



Question from Aingis

As you know, last fall Apple released a 13-inch MacBook Pro with a Retina display. It has no discrete graphics card, and all graphics run on the Intel HD 4000. Some reviews say that this platform is simply not powerful enough to fully support it. What is Intel planning so as not to fall behind, in terms of graphics, at least the iPad with Retina display?
I think that graphics is developing quite quickly and vigorously. Intel Iris should settle the question.



Question from diger

Give us an example of a home use case for video encoding on a GPU.
The most common example is encoding for mobile devices. If you want to transcode a TV series into a format supported by your mobile device in a few minutes, rather than waiting half an hour, QuickSync will help you.



Question from russelll

Will there be 64-bit drivers for the Intel 3650?
I apologize, but I do not have that information. Judging by the forums, though, it is a hot topic.



Questions from sancho2222

1. Is there something similar to KUDA in Intel processors?
You mean Nvidia CUDA? The answer is Intel OpenCL.



2. What libraries are needed to use the graphics capabilities of an Intel processor, in particular H.264 encoding / decoding?
All you need is the Intel Media SDK.



3. Is the Intel i7-3517UE processor fast enough to decode and encode 960 * 720 video in H.264 at the same time?
Yes, definitely. Even several streams at once.



4. I have a problem with the Intel Atom(TM) N2800 processor; maybe you can help me. I use ffmpeg to decode H.264 from a Logitech C920 camera at a resolution of 960 * 720. After decoding, I get frames in the YUYJ420 format. At this resolution I can decode 2 streams at 24 frames per second, but if I rotate the video by 270 degrees after decoding, I hit the limits of the cache (as I understand it), and in the end I can only manage one stream at 20 frames per second. If I increase the frame rate, the video breaks up into blocks and becomes terribly slow. Please tell me what the problem could be. Is it really the cache?
Most likely you are hitting the overall performance limit of the system. All operations run on the CPU, and it cannot cope with two streams plus post-processing. To make up for the delay, ffmpeg starts skipping frames, which is why you see artifacts. What is the CPU usage while this happens?

I did not quite understand what the output format is. YUV420? Depending on the format, rotation requires a different set of operations. And yes, the cache is small, which, as you know, affects speed.
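To illustrate why rotation depends on the pixel format and why it stresses the cache, here is a minimal pure-Python sketch. It assumes a planar YUV 4:2:0 frame (Y plane, then U, then V, with chroma at half resolution in both directions), even frame dimensions, and that "270 degrees" means 270 degrees clockwise, that is, 90 degrees counter-clockwise. The function names are hypothetical, and this is not how ffmpeg implements rotation.

```python
def rotate_plane_270(plane: bytes, width: int, height: int) -> bytes:
    """Rotate one 8-bit plane 270 degrees clockwise; output is height wide, width tall."""
    out = bytearray(width * height)
    for r in range(width):              # output rows
        src_col = width - 1 - r         # each output row is one input column
        for c in range(height):         # output columns
            # The read jumps by `width` bytes per step: a strided,
            # cache-unfriendly access pattern on large planes.
            out[r * height + c] = plane[c * width + src_col]
    return bytes(out)


def rotate_yuv420p_270(frame: bytes, width: int, height: int):
    """Rotate a planar YUV 4:2:0 frame; returns (rotated_frame, new_width, new_height)."""
    if width % 2 or height % 2:
        raise ValueError("this sketch assumes even dimensions")
    y_size = width * height
    cw, ch = width // 2, height // 2    # chroma planes are subsampled 2x2
    c_size = cw * ch
    y = frame[:y_size]
    u = frame[y_size:y_size + c_size]
    v = frame[y_size + c_size:y_size + 2 * c_size]
    rotated = (rotate_plane_270(y, width, height)
               + rotate_plane_270(u, cw, ch)
               + rotate_plane_270(v, cw, ch))
    return rotated, height, width
```

Each plane has to be rotated separately, and a packed or semi-planar format would need different index arithmetic, which is the "different set of operations" mentioned above.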



Question from yurasek

I'm wondering what the potential is of the hardware H.264 decoding logic built into 2nd and 3rd generation Intel Core processors. That is, how many real-time H.264 streams at 1280 x 720 (1920 x 1080) and 25 frames per second could an Intel i7-3770 process using hardware decoding (assuming ideally optimized program code) for subsequent output to the screen? How heavily would the processor's other units be loaded?
Good question. The number of streams is physically limited only by graphics memory. As long as there is enough memory to allocate surfaces, everything should work. Performance is another matter. It depends on the content you are going to decode; in other words, depending on how the streams were encoded, decoding takes a different amount of time and resources. Taking all these factors (and many others) into account, my rough estimate is up to 20 simultaneous real-time sessions.
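The memory side of this answer can be estimated with simple arithmetic. The sketch below assumes NV12 surfaces (12 bits per pixel, that is, width * height * 3 / 2 bytes), ignores any driver alignment or padding, and uses a made-up figure of 10 surfaces per decode session. None of these numbers come from the answer itself.

```python
def nv12_surface_bytes(width: int, height: int) -> int:
    """Size of one NV12 surface: full-resolution luma plus half-size interleaved chroma."""
    return width * height * 3 // 2


def decode_pool_bytes(width: int, height: int,
                      surfaces_per_session: int, sessions: int) -> int:
    """Total surface memory for several concurrent decode sessions (no alignment)."""
    return nv12_surface_bytes(width, height) * surfaces_per_session * sessions


# Hypothetical scenario: 20 sessions of 1920 x 1080 with 10 surfaces each.
total = decode_pool_bytes(1920, 1080, surfaces_per_session=10, sessions=20)
print(f"{total / 2**20:.1f} MiB")
```

Under these assumptions the surfaces alone take several hundred megabytes, which shows how graphics memory can become the limiting factor before raw decoding throughput does.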

Source: https://habr.com/ru/post/181902/


