
Improve multimedia application performance with hardware acceleration


Intel's processor architecture is becoming increasingly GPU-oriented, which opens up opportunities for dramatic performance improvements simply by offloading media processing from the CPU to the GPU. Many tools are available to help developers improve the performance of multimedia applications, and some of them are free and easy to use.
In this publication you will find:

If you feel the need to improve multimedia processing performance, but do not know where to start, start with FFmpeg. Measure the performance during software processing, then just turn on hardware acceleration and check how much the performance has changed. Then add the use of the Intel Media SDK and compare again when using different codecs and in different configurations.

Computational architecture: from superscalar to heterogeneous


To appreciate the importance of the GPU, let's start with the history of improvements in CPU architecture.
Let's go back to the 1990s. The first major stage of development was the emergence of superscalar architecture, in which high throughput was achieved through instruction-level parallelism within a single processor.


Figure 1. Superscalar architecture
Then, in the early 2000s, multi-core architecture appeared, with more than one computational core within a single processor. Homogeneous cores (all completely identical) made it possible to run several threads simultaneously (thread-level parallelism).
However, the performance of multi-core architecture was limited by a number of obstacles.



Figure 2. Multicore architecture

Modern heterogeneous architecture


In a heterogeneous architecture, several processors can share a common data pipeline and can be optimized for individual functions such as encoding, decoding, transformation, scaling, deinterlacing, and so on.

In other words, this architecture has brought tangible benefits in both performance and power consumption that were previously unavailable. Figure 3 shows the evolution of the GPU over the last five generations: graphics processors are becoming increasingly important. With both H.264 and the newer H.265 codecs, graphics processors provide significant computational power, which makes processing video at 4K and even higher resolutions not only possible but also fast.


Figure 3. Development of heterogeneous architecture

GPU performance by generation


Figure 4 shows a dramatic increase in computational power over just a few generations in which the graphics processor was placed on the same chip as the CPU. If your application processes multimedia, you should offload that work to the GPU to achieve a speedup of 5 times or more (depending on the age and configuration of the system).


Figure 4. Improved graphics processing for each generation of Intel processors.

Getting started with GPU programming


In step 1, H.264 performance is usually measured to establish a baseline, so that you can evaluate how performance changes as the code is refined. FFmpeg is often used to measure performance and to compare speed with and without hardware acceleration. FFmpeg is a very powerful yet easy-to-use tool.

In step 2, testing is carried out with different codecs and in different configurations. You can enable hardware acceleration with Intel Quick Sync Video simply by changing the codec (replace libx264 with h264_qsv).

In step 3, the use of the Intel Media SDK is added.

Note. This publication discusses the use of these tools on the Windows* operating system. If you are interested in a Linux* implementation, see Accessing Intel Media Server Studio for Linux codecs using FFmpeg.

Encoding and decoding with FFmpeg


Start with H.264 (AVC), since libx264 is the default software H.264 encoder in FFmpeg and produces high-quality output using the CPU only. Create your own test, then measure the performance again after changing the codec from libx264 to h264_qsv. H.265 codecs are discussed later.
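
The measure-then-swap workflow described above can be sketched as a small timing script. This is a minimal illustration rather than a rigorous benchmark: it assumes that ffmpeg is on the PATH and was built with both libx264 and h264_qsv, and the function names and output file names are hypothetical placeholders, not part of FFmpeg.

```python
import subprocess
import time


def build_encode_cmd(src, dst, codec, preset=None):
    """Build an ffmpeg argument list that re-encodes the video of src with codec."""
    cmd = ["ffmpeg", "-y", "-hide_banner", "-i", src, "-c:v", codec]
    if preset is not None:
        cmd += ["-preset:v", preset]
    # -an drops the audio stream so that only video encoding is timed.
    cmd += ["-an", dst]
    return cmd


def time_command(cmd):
    """Run cmd and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def speedup(software_seconds, hardware_seconds):
    """How many times faster the hardware run was than the software run."""
    return software_seconds / hardware_seconds


def compare_codecs(src):
    """Time a software and a hardware encode of the same clip.

    src is a placeholder path to your own test clip; the output names
    are arbitrary.
    """
    sw = time_command(build_encode_cmd(src, "out.x264.mp4", "libx264"))
    hw = time_command(build_encode_cmd(src, "out.qsv.mp4", "h264_qsv"))
    return sw, hw, speedup(sw, hw)
```

Calling compare_codecs("input.mp4") on your own clip returns both timings and their ratio. The wall-clock time also includes demuxing and decoding, so treat the ratio as a rough indicator and repeat the run several times before drawing conclusions.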

It should be noted that when working with video streams, you have to choose between quality and speed. Faster processing almost always reduces quality and increases file size. You will have to find an acceptable level of quality for the amount of time you can spend on encoding. There are 11 presets for selecting a specific combination of quality and speed, from “Fastest” to “Slowest”. There are several data rate control algorithms:

Intel Quick Sync Video supports decoding and encoding using Intel CPUs and integrated GPUs. Note that the Intel processor must be compatible with Quick Sync Video and with OpenCL*. For more information, see the release notes for the Intel SDK for OpenCL* Applications. Decoding and encoding support is built into FFmpeg through codecs with the _qsv suffix. Currently, Quick Sync Video is supported by the following codecs: MPEG2 video, VC1 (decoding only), H.264 and H.265.
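
To see which _qsv codecs a particular FFmpeg build actually exposes, one option is to parse the output of ffmpeg -encoders. The sketch below assumes the usual listing layout, in which each codec row starts with a flags column followed by the codec name; the helper names are hypothetical, and the layout may differ between FFmpeg versions.

```python
import subprocess


def parse_qsv_codecs(listing):
    """Extract codec names ending in _qsv from `ffmpeg -encoders` style output."""
    names = []
    for line in listing.splitlines():
        fields = line.split()
        # Assumed row layout: "<flags> <name> <description...>".
        if len(fields) >= 2 and fields[1].endswith("_qsv"):
            names.append(fields[1])
    return names


def list_qsv_encoders(ffmpeg="ffmpeg"):
    """Ask the given ffmpeg binary for its encoders and keep the _qsv ones."""
    result = subprocess.run([ffmpeg, "-hide_banner", "-encoders"],
                            capture_output=True, text=True, check=True)
    return parse_qsv_codecs(result.stdout)
```

An empty result from list_qsv_encoders() suggests that the build was compiled without Quick Sync Video support.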

If you want to experiment with Quick Sync Video in FFmpeg, you need to add libmfx. The easiest way to install this library is to use the version of libmfx packaged by the developer lu_zero.
Example of hardware-accelerated encoding with Quick Sync Video:

ffmpeg -i INPUT -c:v h264_qsv -preset:v faster out.qsv.mp4

FFmpeg can also use hardware acceleration when decoding with the -hwaccel parameter.
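
One detail that is easy to get wrong is option placement: in FFmpeg, options apply to the next input or output file named after them, so -hwaccel and the decoder selection must come before -i. The sketch below only assembles such a command line. The function name is hypothetical, and depending on the FFmpeg version and driver, additional options may be needed for a full hardware decode and encode pipeline.

```python
def build_qsv_transcode_cmd(src, dst, decoder="h264_qsv",
                            encoder="h264_qsv", preset="faster"):
    """Assemble an ffmpeg command that decodes and encodes through Quick Sync Video."""
    # Input options: they affect the file named by the following -i.
    input_opts = ["-hwaccel", "qsv", "-c:v", decoder]
    # Output options: they affect the output file that follows them.
    output_opts = ["-c:v", encoder, "-preset:v", preset]
    return ["ffmpeg", *input_opts, "-i", src, *output_opts, dst]


# Print the command for inspection instead of running it.
print(" ".join(build_qsv_transcode_cmd("INPUT", "out.qsv.mp4")))
```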

The h264_qsv codec is very fast: even its slowest hardware-accelerated mode is much faster than software encoding at the lowest quality and the highest speed.
To test H.265 codecs, you will need either a build with libx265 support or your own build made according to the instructions in the FFmpeg and H.265 Encoding Guide or in the x265 documentation.
H.265 example:

ffmpeg -i input -c:v libx265 -preset medium -x265-params crf=28 -c:a aac -strict experimental -b:a 128k output.mp4

For more information about using FFmpeg and Quick Sync Video, see Cloud Computing Intel QuickSync Video and FFmpeg.

Using the Intel Media SDK (sample_multi_transcode)


To improve performance beyond what FFmpeg alone provides, optimize the application using the Intel Media SDK. The Media SDK is a cross-platform API for developing and optimizing multimedia applications so that they use Intel fixed-function hardware acceleration.

To get started with the Intel Media SDK, just follow a few simple steps:
  1. Download the Intel Media SDK for the target device.
  2. Download tutorials and read them to understand how to customize the software using the SDK.
  3. Install the Intel Media SDK. If you are using Linux, see the installation guide for Linux.
  4. Download the sample SDK code to experiment with already compiled sample applications.
  5. Build and run the Video Transcoding application: sample_multi_transcode

Commands are similar to FFmpeg commands. Examples:

VideoTranscoding_folder\_bin\x64>\sample_multi_transcode.exe -hw -i::h264 in.mpeg2 -o::h264 out.h264
VideoTranscoding_folder\_bin\x64>\sample_multi_transcode.exe -hw -i::h265 in.mpeg2 -o::h265 out.h265

Note that to use hardware acceleration, you must specify the -hw parameter in the argument list.
This example also works with the HEVC (H.265) decoder and encoder, but these must be installed from the Intel Media Server Studio Pro release.
Many options can be specified on the command line. The -u <quality, speed, balanced> parameter sets the target usage (TU), which plays a role similar to the FFmpeg presets. TU = 4 is used by default. Figure 5 shows performance metrics for different TU settings.
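
To compare target usage settings on your own material, the command lines can be generated in a loop. The sketch below is an assumption-laden illustration: it reuses the arguments from the examples above, the function name is hypothetical, and the numeric values are the Media SDK target usage constants (1 for best quality, 4 for balanced, 7 for best speed), shown only to relate the names to the default TU = 4.

```python
# Media SDK target usage values that correspond to the -u names.
TARGET_USAGE = {"quality": 1, "balanced": 4, "speed": 7}


def build_transcode_cmd(exe, src, src_codec, dst, dst_codec, usage="balanced"):
    """Assemble a sample_multi_transcode command line for one target usage."""
    if usage not in TARGET_USAGE:
        raise ValueError(f"unknown target usage: {usage!r}")
    return [exe, "-hw",
            f"-i::{src_codec}", src,
            f"-o::{dst_codec}", dst,
            "-u", usage]


# Print one command per target usage so they can be run and timed in turn.
for name in TARGET_USAGE:
    cmd = build_transcode_cmd("sample_multi_transcode.exe",
                              "in.mpeg2", "h264",
                              f"out.{name}.h264", "h264", usage=name)
    print(" ".join(cmd))
```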


Figure 5. Examples of H.264 performance characteristics relative to target usage

Using other Intel software tools

To further refine the code, you can use Intel optimization and profiling tools, including the Intel Graphics Performance Analyzers (GPA) and Intel VTune Amplifier. In addition, the Intel Video Pro Analyzer and Intel Stress Bitstreams and Encoder tools can help you achieve high video and streaming quality, improve the performance of encoders and decoders, and speed up testing so that solutions reach the market faster.

Conclusion


Computer architecture has changed significantly over the past 20 years, and its development in the last five years alone has delivered a significant increase in performance. Intel processors can now handle multimedia directly on the GPU, which makes new usage models available to both end users and companies.

You can independently measure performance improvements with FFmpeg, as well as further optimize the code using the free Intel Media SDK APIs. The transition from software processing to hardware acceleration improves system performance and reduces power consumption (and costs), and also provides additional computing resources sufficient to switch to the H.265 codec family over time.

Additional resources


  1. Install and run the Intel Media SDK on Windows
  2. FFMPEG.ORG
  3. Intel Media SDK Integration with FFMPEG for multiplexing, demultiplexing, audio encoding, and decoding
  4. Intel Media SDK Tutorials for Clients and Servers
  5. Intel Graphics Performance Analyzers
  6. Intel VTune Amplifier
  7. Intel Media Server Studio
  8. Accelerate FFmpeg-based Applications with Intel Quick Sync Video
  9. Intel QuickSync Video and FFmpeg*
  10. Intel QuickSync Video and FFmpeg: Installation and Verification
  11. Access to Intel Media Server Studio for Linux codecs using FFmpeg
  12. HEVC Codec (H.265) Value

Source: https://habr.com/ru/post/301698/

