
Server-based video encoding solution using integrated Intel HD Graphics


In the previous article, we talked about video encoding with Intel Quick Sync technology on modern Intel processors and about the experience we gained integrating it into our software. This time I will describe how we built the server solution, the problems we ran into, and how the solution performs on Intel server processors. I also want to take this opportunity to thank our colleagues at Intel for their prompt help with integrating Intel Quick Sync into our software.

Testing

To test our software, we chose a 1U server with the following configuration:
Motherboard: Supermicro X10SLH-F
CPU: Intel® Xeon® CPU E3-1225 v3 @ 3.20GHz
Memory: 16 GB
OS: Ubuntu Server 12.04.4 LTS, kernel 3.8.0-23-generic

The key requirement for Quick Sync is a chipset whose specification lists C226: only boards with this chipset support hardware video encoding. It is also preferable that the motherboard has no onboard video; otherwise the Intel Media SDK may fail to detect, and therefore to use, the Intel GPU.
The motherboard described above does have onboard video, and we had to put in some effort to get the SDK working on this hardware. When we installed the SDK on the new server, the Media SDK installation script did not detect the device ID. We also could not enable the processor's integrated graphics in the BIOS. The fix turned out to be a BIOS update, after which the required option appeared. However, we also had to disable the onboard video by moving a jumper. In this configuration, IPMI and monitor output do not work, but we manage the server over SSH, so this is not critical.
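When the installer cannot see the device ID, a first check is whether Linux sees the integrated GPU at all. The sketch below is a minimal, hypothetical helper (not part of the Media SDK). It assumes the standard Linux sysfs layout, where each PCI device directory under /sys/bus/pci/devices exposes `vendor`, `device`, and `class` files. It lists Intel (vendor 0x8086) display controllers (PCI base class 0x03), roughly what `lspci -nn` would show.

```python
from pathlib import Path

INTEL_VENDOR_ID = "0x8086"
DISPLAY_CLASS_PREFIX = "0x03"  # PCI base class 0x03 = display controller


def find_intel_display_devices(pci_root="/sys/bus/pci/devices"):
    """Return (pci_address, device_id) pairs for Intel display controllers."""
    found = []
    root = Path(pci_root)
    if not root.is_dir():
        return found
    for dev in sorted(root.iterdir()):
        try:
            vendor = (dev / "vendor").read_text().strip().lower()
            pci_class = (dev / "class").read_text().strip().lower()
            device_id = (dev / "device").read_text().strip().lower()
        except OSError:
            continue  # entry without the expected attribute files
        if vendor == INTEL_VENDOR_ID and pci_class.startswith(DISPLAY_CLASS_PREFIX):
            found.append((dev.name, device_id))
    return found


if __name__ == "__main__":
    for address, device_id in find_intel_display_devices():
        print(f"{address}: Intel display controller, device ID {device_id}")
```

On a correctly configured server, the output should include the integrated GPU with its device ID, presumably the value the installation script is looking for. If the list stays empty, the integrated graphics are not yet visible to the OS.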
There are also restrictions on the Linux kernel. For servers, the supported systems are Ubuntu 12.04 LTS with kernels 3.2.0-41 and 3.8.0-23, or SUSE Linux Enterprise Server 11 SP3 with kernel 3.0.76-11.
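You can check the kernel restriction before installing by comparing `uname -r` with the versions listed above. This is a hypothetical helper: the version list is copied from the text. It also assumes that real release strings only add a dash-separated suffix such as `-generic`. That may not hold for every distribution's naming scheme.

```python
import platform

# Kernel versions named in the article as supported for servers.
SUPPORTED_KERNELS = ("3.2.0-41", "3.8.0-23", "3.0.76-11")


def is_supported_kernel(release=None):
    """Return True if `release` (default: the running kernel) is on the list.

    Assumes the release string is either exactly a listed version or that
    version followed by a dash-separated suffix such as "-generic".
    """
    if release is None:
        release = platform.release()  # same string as `uname -r` on Linux
    return any(
        release == version or release.startswith(version + "-")
        for version in SUPPORTED_KERNELS
    )
```

Matching on the version plus a trailing dash, rather than on a bare prefix, keeps a hypothetical "3.8.0-230" from being mistaken for "3.8.0-23".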

We also optimized how raw frames are passed through our pipeline by using the SDK's native memory type. This increased performance and let us get the maximum speed out of the hardware. Only a pointer to the surface is passed along the pipeline; the frame memory itself is never physically copied.
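The idea of passing surfaces by reference can be modeled in a few lines. In the Media SDK itself, surfaces are `mfxFrameSurface1` structures whose `Data.Locked` counter tells the application when a surface is free again. The Python sketch below only mimics that pattern, using hypothetical `Surface`/`SurfacePool` classes and two stand-in stages. It shows that frames are allocated once, after which only references move through the pipeline.

```python
class Surface:
    """A preallocated frame buffer with a lock counter (hypothetical)."""

    def __init__(self, width, height):
        # Assuming NV12: a full-size luma plane plus half-size interleaved chroma.
        self.data = bytearray(width * height * 3 // 2)
        self.locked = 0  # > 0 while some pipeline stage still holds the frame


class SurfacePool:
    def __init__(self, count, width, height):
        # All frame memory is allocated once, up front.
        self.surfaces = [Surface(width, height) for _ in range(count)]

    def acquire(self):
        """Return a free surface and lock it; no memory is allocated here."""
        for surface in self.surfaces:
            if surface.locked == 0:
                surface.locked += 1
                return surface
        raise RuntimeError("no free surface; enlarge the pool")


def decode_stage(pool):
    surface = pool.acquire()
    surface.data[0] = 0x10  # stand-in for the decoder writing pixel data
    return surface  # only the reference travels downstream


def encode_stage(surface):
    first_byte = surface.data[0]  # stand-in for the encoder reading pixels
    surface.locked -= 1  # release: the surface can now be reused
    return first_byte
```

The pool size bounds memory use: once every surface is locked, the producer has to wait (here it simply raises) instead of allocating and copying new buffers.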

The test input was a 1920x800 H.264 video, 12 minutes long. The output was 1920x800, H.264 High profile, 8 Mbit/s. For ffmpeg, default parameters were used (High profile). The Intel Media SDK test utility sample_full_transcode also encoded with default parameters (High profile). The Quick Sync-enabled streambuilder used the following parameters: profile high, RateControlMethod cbr, AVC level 4.2. The target usage parameter, which trades encoding quality against encoding speed, was set to balanced in all cases.
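For reference, the streambuilder test settings can be written down as a small config object. The nominal bitrate also gives a quick sanity check on output size: at a constant 8 Mbit/s, a 12-minute clip should produce about 8,000 kbit/s × 720 s / 8 = 720 MB of video stream, excluding audio and container overhead. The field names here are hypothetical. In the Media SDK, the corresponding settings live in `mfxInfoMFX` (`CodecProfile`, `CodecLevel`, `RateControlMethod`, `TargetUsage`, `TargetKbps`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderSettings:
    # Values from the test description; field names are illustrative only.
    width: int = 1920
    height: int = 800
    profile: str = "high"
    level: str = "4.2"
    rate_control: str = "cbr"
    target_usage: str = "balanced"
    bitrate_kbps: int = 8000  # assuming "8 Mb/s" means 8,000 kbit/s


def expected_output_bytes(settings, duration_s):
    """Approximate video-stream size for constant-bitrate encoding."""
    return settings.bitrate_kbps * 1000 * duration_s // 8
```

With CBR, the actual stream size should land close to this estimate, which makes a far-off result an easy signal that the rate control settings were not applied.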
The test results are shown in the following tables.
Processor: E3-1225 v3, 16 GB RAM, Intel® HD Graphics P4600
Metric | ffmpeg | sample_full_transcode | streambuilder (no optimization) | streambuilder (optimization)
Time | 8 min 42 s | 1 min 19 s | 2 min 19 s | 1 min 40 s
CPU (max) | 750% | 55% | 125% | 50%
Memory (max) | 3.3% | 4.6% | 0.5% | 0.4%
PSNR | 48.107 | 46.68 | n/a | n/a
Average PSNR | 51.204 | 49.52 | n/a | n/a
SSIM | 0.99934 | 0.9956 | n/a | n/a
MSE | 1.623 | 2.969 | n/a | n/a

Processor: i7-3770, 3 GB RAM, Intel® HD Graphics 4000
Metric | ffmpeg | sample_full_transcode | streambuilder (no optimization) | streambuilder (optimization)
Time | 8 min 48 s | 1 min 24 s | 2 min 31 s | 1 min 23 s
CPU (max) | 750% | 19% | 150% | 45%
Memory (max) | 18% | 20% | 2.8% | 2.3%
PSNR | 48.107 | 46.495 | n/a | n/a
Average PSNR | 51.204 | 49.27 | n/a | n/a
SSIM | 0.99934 | 0.991 | n/a | n/a
MSE | 1.623 | 3.036 | n/a | n/a

Processor: E3-1285 v3, 16 GB RAM, Intel® HD Graphics P4700
Metric | ffmpeg | sample_full_transcode | streambuilder (no optimization) | streambuilder (optimization)
Time | 8 min 1 s | 1 min 11 s | 2 min 11 s | 1 min 34 s
CPU (max) | 750% | 55% | 130% | 55%
Memory (max) | 3.3% | 4.6% | 0.5% | 0.4%
PSNR | 48.107 | 46.68 | n/a | n/a
Average PSNR | 51.204 | 49.52 | n/a | n/a
SSIM | 0.99934 | 0.9956 | n/a | n/a
MSE | 1.623 | 2.969 | n/a | n/a
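To put the times in perspective, they can be converted into a real-time factor (source duration divided by encoding time) and a speed-up over ffmpeg. The sketch uses the E3-1225 v3 times from the first table; the helper names are hypothetical.

```python
def to_seconds(minutes, seconds):
    return minutes * 60 + seconds


SOURCE_DURATION_S = 12 * 60  # the 12-minute test clip

# Encoding times on the E3-1225 v3, copied from the first table.
times_e3_1225 = {
    "ffmpeg": to_seconds(8, 42),
    "sample_full_transcode": to_seconds(1, 19),
    "streambuilder (no optimization)": to_seconds(2, 19),
    "streambuilder (optimization)": to_seconds(1, 40),
}


def realtime_factor(encode_s, source_s=SOURCE_DURATION_S):
    """How many times faster than real time the clip was encoded."""
    return source_s / encode_s


def speedup_vs(baseline_s, encode_s):
    """How many times faster than the baseline encoder."""
    return baseline_s / encode_s
```

With these numbers, sample_full_transcode encodes the clip at about 9.1× real time, roughly 6.6× faster than ffmpeg, which manages about 1.4× real time. The optimized streambuilder reaches about 7.2× real time.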

Results analysis

The quality metrics for streambuilder match those of the sample_full_transcode test utility, so I omitted them from the tables.
The tables show that, in this experiment, the server processors with Intel® HD Graphics P4700/P4600 encode faster and with better quality than the i7-3770 with Intel® HD Graphics 4000. This does not always hold, however: Intel improves encoding quality with each new generation of chips and each SDK release, and encoding speed on newer chips may be lower. At the same time, CPU load on the server processors is somewhat higher; the reason is not yet clear.
In addition, optimizing memory handling made streambuilder 1.4 to 1.8 times faster, depending on the machine, and more than halved its peak CPU load.
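The effect of the memory optimization can be read directly off the three tables by dividing the unoptimized figures by the optimized ones. Below is a small sketch with the numbers copied from the tables. The dictionary layout and function name are just for illustration.

```python
# (unoptimized, optimized) streambuilder encoding times in seconds and peak
# CPU load in percent, copied from the three result tables.
RESULTS = {
    "E3-1225 v3": {"time": (2 * 60 + 19, 1 * 60 + 40), "cpu": (125, 50)},
    "i7-3770": {"time": (2 * 60 + 31, 1 * 60 + 23), "cpu": (150, 45)},
    "E3-1285 v3": {"time": (2 * 60 + 11, 1 * 60 + 34), "cpu": (130, 55)},
}


def optimization_gain(machine):
    """Return (speed-up factor, peak-CPU reduction factor) for one machine."""
    t_before, t_after = RESULTS[machine]["time"]
    cpu_before, cpu_after = RESULTS[machine]["cpu"]
    return t_before / t_after, cpu_before / cpu_after


for machine in RESULTS:
    speedup, cpu_cut = optimization_gain(machine)
    print(f"{machine}: {speedup:.2f}x faster, peak CPU {cpu_cut:.1f}x lower")
```

The wall-clock speed-up ranges from about 1.4× on both E3 servers to about 1.8× on the i7-3770, while peak CPU load drops by 2.4× to 3.3×.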

The encoding quality on Intel® HD Graphics P4700 is the same as on Intel® HD Graphics P4600, but the E3-1285 v3 is about 14% faster with the same resource load. The E3-1285 v3 is also about 10% faster than the E3-1225 v3 when encoding with ffmpeg.
A server running streambuilder with Quick Sync support can encode a single source into 12 Full HD (1080p) renditions, 24 HD (720p) renditions, or 46 SD (480p) renditions, segmented for HLS. If the source is a raw SDI signal, the number of renditions encoded simultaneously is slightly higher.
You can experiment with streambuilder (for now, only the libavcodec-based version) by downloading it here. It comes with a standard config that lets you record any source in HLS format.

Results

Intel Quick Sync technology makes it possible to build a relatively inexpensive, high-performance video encoding server with acceptable quality. While adopting it, we ran into some technical problems caused by the video integrated into the motherboard, but they are entirely solvable. (To recap, the main things to look for when choosing hardware for this purpose are a C226 chipset and a motherboard without onboard video: if the board has it, the onboard video must be disabled, and IPMI and VGA output then stop working.)
In my opinion, the main advantages of this solution are that the CPU is barely used and memory consumption is low. The freed resources can be used for other tasks or for additional CPU-based encoding.

In the near future, we plan to experiment with VPP (video post-processing) using the Intel Media SDK functions (denoise, crop, resize, frame rate conversion, deinterlacing, etc.). So far, we have implemented crop, resize, and deinterlacing, and these operations run as fast as their purely software counterparts. The Intel Media SDK has quite a few encoding parameters, and we are continuing to test them and compare them with our profiles. We will probably write again about the results of our VPP experiments, the performance/quality trade-offs, and a comparison of ffmpeg/H.264 encoding with the LookAhead technology of Intel HD Graphics.
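One practical detail when setting up crop and resize in VPP concerns surface sizes. As far as we know, the Media SDK expects surface width to be a multiple of 16, and height a multiple of 16 for progressive or 32 for interlaced content. The visible picture is then described by a crop rectangle (`CropX`, `CropY`, `CropW`, `CropH` in `mfxFrameInfo`). The sketch below computes these values under that assumption; the helper names are hypothetical.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def vpp_frame_geometry(width, height, interlaced=False):
    """Aligned surface size plus the crop window covering the visible picture.

    Assumes width aligned to 16, and height aligned to 16 (progressive)
    or 32 (interlaced), as we understand the Media SDK to require.
    """
    aligned_w = align_up(width, 16)
    aligned_h = align_up(height, 32 if interlaced else 16)
    crop = (0, 0, width, height)  # (CropX, CropY, CropW, CropH)
    return (aligned_w, aligned_h), crop
```

For example, a 1920x1080 output needs a 1920x1088 surface with a 1920x1080 crop window, whereas the 1920x800 test video already satisfies the alignment.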

Source: https://habr.com/ru/post/228713/

