
This article is based on the article "Effective video encoding in Linux with Nvidia NVENC: part 1, general", but it has its own specifics. Unlike the original article, which was written before this patch existed, I applied a revised Nvidia Acceleration patch to FFmpeg 3.0.2 and got, in addition to the nvenc encoder, a fast resize filter: nvresize.
As a result, I was able to hardware-encode video in H.264 and HEVC with an Nvidia GTX 960 on a fairly weak computer (Xeon L5420), up to 10 times faster (for H.264) than that processor manages on its own, and 3 times faster than a Core i7! All on my favorite Debian 8 Jessie.
So, let's begin!
Technology
Nvidia NVENC is a technology for encoding video in H.264 and HEVC on the GPU. Important note: at the time of writing (May 2016), only second-generation Maxwell cards provide high-quality, fast encoding; on the desktop these are GM206 (GTX 950, GTX 960) and GM204 (GTX 970, GTX 980). (For reference: there are also the more expensive professional Nvidia Tesla / Quadro / GRID lines; the full list can be found here.) Moreover, the NVENC module runs at the same speed on all of these cards, and, strange as it may sound, the number of CUDA cores does not affect its performance. So a higher-end model only makes sense if the machine is also used for games (besides encoding). GM206 cards are the better fit anyway, because in addition to the encoder they also received a hardware HEVC decoder. Important note: the entire GeForce line has a license limit of 2 simultaneous streams. You can try to get around it using the method described in the second part of the article by YourChief.
Implementation (FFmpeg)
The various implementations built on Nvidia CUDA can be found at this link. For my part, I'll say that for video the most common tool is the popular FFmpeg, and that is what we will use.
Hardware
It may be obvious, but I will list the minimum hardware requirements (based on real experience):
- Motherboard: with PCI-E support. I recommend the Intel platform because of the close Intel-Nvidia relationship and personal preference. AMD fans, forgive me!
- Processor: dual-core, no weaker than a Core2 Duo. I recommend a Core i3 or some cheap Xeon (yes!).
- Memory: DDR2/3/4, at least 2 GB. An interesting fact: with two encoding streams, my total memory consumption is ~0.7 GB.
- Video card: any Nvidia 900 series card (I recommend GM206: GTX 950, GTX 960 or GM204: GTX 970, GTX 980). 2 GB or 4 GB, gaming edition or not, it does not matter! The main thing is to check the physical dimensions and keep in mind that on every GTX 960 model I have seen, the additional power connector is on the top edge.
- ATX PSU with an additional power connector (a 6-pin connector is enough for our task). I recommend at least 400 W.
Software
The rest of the article assumes a specific set of systems and programs in specific versions. I should note that I deliberately used Debian rather than Ubuntu, for which there are even official SDKs: I wanted to set everything up on my favorite distribution.
- Operating system: Debian 8 Jessie with an optional Deb Multimedia repository
- Nvidia CUDA 7.5.18
- Nvidia SDK 6.0.1
- FFmpeg 3.0.2
- The Nvidia Acceleration patch, modified by me for compatibility with FFmpeg 3.0.2
Build
The most important of the files is the first one, which contains the driver. It is the only one you need if you want to install my ready-built package right away.
Download the files we need for further work (about 1.3 GB in total):
cd /usr/src
wget 'http://developer.download.nvidia.com/compute/cuda/7.5/Prod/local_installers/cuda_7.5.18_linux.run'
wget --no-check-certificate 'https://developer.nvidia.com/video-sdk-601' -O 'video-sdk-601.zip'
wget 'http://developer.download.nvidia.com/compute/redist/ffmpeg/1511-patch/cudautils.zip'
wget 'http://ffmpeg.org/releases/ffmpeg-3.0.2.tar.bz2'
wget 'http://kuzko.com/dl/ffmpeg_NVIDIA_gpu_acceleration.3.0.2.patch'
MD5 checksums:
4b3bcecf0dfc35928a0898793cf3e4c6  cuda_7.5.18_linux.run
24af45272ed2881f88ed534d3211b584  video-sdk-601.zip
f3f890bd314a568c47191810453cad2c  cudautils.zip
7db5efb1070872823143e1365fdfcd53  ffmpeg-3.0.2.tar.bz2
a4f59f92675e02a0fa5c6cd124eda64e  ffmpeg_NVIDIA_gpu_acceleration.3.0.2.patch
SHA1 checksums:
0f366a88968b9eee01044de197e27764bc1567d6  cuda_7.5.18_linux.run
e57c7b4cfb298d4c725a0bb4477928e228dabb1c  video-sdk-601.zip
edc818bef432d708466c5454974b9851523a86ba  cudautils.zip
c40731a221fbfaa50671d69fe894bedd664f91e2  ffmpeg-3.0.2.tar.bz2
f305832ed42beeff7d7c26a00f79668b63b322ec  ffmpeg_NVIDIA_gpu_acceleration.3.0.2.patch
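Checking the downloads against the lists above before going further saves a lot of confusion later. A minimal sketch, assuming GNU coreutils md5sum/sha1sum are available (they are on Debian); verify_hash is a hypothetical helper name, not something the article or any package provides.

```shell
# Compare one downloaded file against an expected hash.
# Usage: verify_hash md5|sha1 EXPECTED_HASH FILE
verify_hash() {
    algo=$1
    expected=$2
    file=$3
    case "$algo" in
        md5)  actual=$(md5sum "$file" | cut -d' ' -f1) ;;
        sha1) actual=$(sha1sum "$file" | cut -d' ' -f1) ;;
        *)    echo "unknown algorithm: $algo" >&2; return 2 ;;
    esac
    if [ "$actual" = "$expected" ]; then
        echo "OK   $file"
    else
        echo "FAIL $file (got $actual)" >&2
        return 1
    fi
}

# Example with a value from the MD5 list above (run inside /usr/src):
# verify_hash md5 7db5efb1070872823143e1365fdfcd53 ffmpeg-3.0.2.tar.bz2
```

Since the lists use the "hash, two spaces, file name" layout, they can also be pasted into a file and checked in one go with md5sum -c or sha1sum -c.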
Add the Deb Multimedia repository (you need it if you want an FFmpeg as close as possible to the one Deb Multimedia ships). This repository is required for my build!
For the lazy: create the repository file with one command:
echo -e 'deb http://www.deb-multimedia.org jessie main non-free\ndeb http://www.deb-multimedia.org jessie-backports main' > /etc/apt/sources.list.d/deb-multimedia.list
Next, run
apt-get update
And install the keyring:
apt-get install deb-multimedia-keyring
Now update the system and install the packages required to build the driver. This step is also required!
Necessary actions:
apt-get update
apt-get -y dist-upgrade
apt-get -y install build-essential checkinstall dkms ccache pkg-config libglu1-mesa-dev libx11-dev libxi-dev libxmu-dev libxcb-shm0
Install the Nvidia driver: accept the license agreement (accept) and leave all other answers at their defaults. If you are warned that the system is not supported, do not worry, this is normal! You can pass the -silent switch for a quick installation, but it is better to go through the steps yourself.
Necessary actions:
cd /usr/src
chmod +x cuda_7.5.18_linux.run
./cuda_7.5.18_linux.run
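Once the installer finishes, a quick sanity check is whether the driver actually sees the card. A minimal sketch, assuming nvidia-smi (installed together with the driver) prints one line of the form "GPU <n>: <name> (UUID: ...)" per card for nvidia-smi -L; count_gpus is a hypothetical helper, not part of the Nvidia tools.

```shell
# Count the cards listed by "nvidia-smi -L" (read on stdin).
# Assumes one "GPU <index>: <name> ..." line per card.
count_gpus() {
    grep -c '^GPU [0-9][0-9]*:'
}

# Usage:
# nvidia-smi -L | count_gpus
```

A result of 0, or an error from nvidia-smi itself, suggests the driver did not load and is worth fixing before building anything.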
The choice of what to do next is up to you. You can either trust me, download the ready-made deb file (created with checkinstall) and skip the next steps, or build manually by installing the necessary libraries and building the package yourself with checkinstall.
I am a Jedi, why should I be afraid?
cd /usr/src
wget 'http://kuzko.com/dl/ffmpeg_3.0.2-nvenc-7.5.18-nvresize-cudautils-20160523-1_amd64.deb'
dpkg -i ffmpeg_3.0.2-nvenc-7.5.18-nvresize-cudautils-20160523-1_amd64.deb
apt-get -f install
If you chose the light side of the Force, let's continue: install the libraries required for the build (this, among other things, is what Deb Multimedia was needed for), prepare the header files and cudautils, and apply the patch:
Necessary actions:
apt-get install -y --force-yes unzip libfdk-aac-dev libopencv-dev libiec61883-dev libavc1394-dev libass-dev libbluray-dev libbs2b-dev libkvazaar-dev libilbc2 libilbc-dev libopenh264-dev libsnappy-dev libsoxr-dev libxv1 libxcb-shape0 libxcb-shm0 yasm frei0r-plugins-dev libgnutls28-dev libopenjpeg-dev libopus-dev libpulse-dev librtmp-dev libspeex-dev libutvideo-dev libvidstab-dev libvo-amrwbenc-dev libvpx-dev libx265-dev libzvbi-dev libssl-dev libcdio-dev libcdio-paranoia-dev
cd /usr/src
unzip video-sdk-601.zip
/bin/cp -prf nvidia_video_sdk_6.0.1/Samples/common/inc/nvCPUOPSys.h /usr/include
/bin/cp -prf nvidia_video_sdk_6.0.1/Samples/common/inc/nvEncodeAPI.h /usr/include
/bin/cp -prf nvidia_video_sdk_6.0.1/Samples/common/inc/nvFileIO.h /usr/include
/bin/cp -prf nvidia_video_sdk_6.0.1/Samples/common/inc/NvHWEncoder.h /usr/include
/bin/cp -prf nvidia_video_sdk_6.0.1/Samples/common/inc/nvUtils.h /usr/include
unzip cudautils.zip
cd cudautils
make
cd /usr/src
tar jxf ffmpeg-3.0.2.tar.bz2
cd ffmpeg-3.0.2
sed -i 's/ctx->outputs\[i\]->closed/ctx->outputs\[i\]->status/g' ../ffmpeg_NVIDIA_gpu_acceleration.3.0.2.patch
patch -Np1 -i ../ffmpeg_NVIDIA_gpu_acceleration.3.0.2.patch
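The sed step renames ctx->outputs[i]->closed to ctx->outputs[i]->status in the patch before it is applied, and it is easy to get wrong by hand. A minimal sketch that performs the same substitution and then confirms no reference to the old field remains; fix_patch is a hypothetical helper name, and GNU sed -i is assumed (standard on Debian).

```shell
# Apply the closed -> status rename to a patch file, then verify it.
# Usage: fix_patch PATCH_FILE
fix_patch() {
    patch_file=$1
    # \[ makes the bracket literal, so "[i]" is matched as plain text
    sed -i 's/ctx->outputs\[i]->closed/ctx->outputs[i]->status/g' "$patch_file"
    # -F: search for a fixed string, no regex escaping needed
    if grep -qF 'ctx->outputs[i]->closed' "$patch_file"; then
        echo "still references ->closed: $patch_file" >&2
        return 1
    fi
    echo "patched: $patch_file"
}

# Usage, from /usr/src/ffmpeg-3.0.2 (before running patch):
# fix_patch ../ffmpeg_NVIDIA_gpu_acceleration.3.0.2.patch
```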
Now we are ready to execute the three basic package building commands (configure / make / checkinstall).
A few points worth noting, which you can adjust for yourself:
- --enable-nvenc and --enable-nvresize have been added (of course!)
- --extra-cflags and --extra-ldflags were modified to include cudautils
- --enable-opencl was removed (I could not find a library for it), as was --enable-libtesseract (an unnecessary module, in my opinion)
- FFmpeg is built without --enable-shared, although nothing prevents you from adding it back
- make -j10 can be adjusted for your system (I used a build machine where threads + 2 = 10)
- I listed the dependencies in checkinstall so that, when installing from scratch, running apt-get -f install is enough to pull in the required packages, without hunting for missing libraries via ldd
- Also, because of checkinstall's shortcomings, I had to perform a couple of extra actions (removing conflicting dev packages and creating the /usr/share/ffmpeg directory)
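The make -j10 value in the list above comes from a simple rule on the author's build machine: hardware threads + 2. A minimal sketch that derives the same number on any machine, assuming nproc (GNU coreutils) reports the thread count; make_jobs is a hypothetical helper name.

```shell
# The "threads + 2" rule for make -j.
# With no argument, the thread count comes from nproc.
make_jobs() {
    threads=${1:-$(nproc)}
    echo $((threads + 2))
}

# Usage, inside /usr/src/ffmpeg-3.0.2 after configure:
# make -j"$(make_jobs)"
```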
So, we go for coffee, but only after configure has finished:
cd /usr/src
cd ffmpeg-3.0.2
./configure --prefix=/usr --extra-cflags='-g -O2 -fstack-protector-strong -Wformat -Werror=format-security -I../cudautils ' --extra-ldflags='-Wl,-z,relro -L../cudautils ' --cc='ccache cc' --enable-libmp3lame --enable-gpl --enable-nonfree --enable-libvorbis --enable-pthreads --enable-libfaac --enable-libxvid --enable-postproc --enable-x11grab --enable-libgsm --enable-libtheora --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libx264 --enable-libspeex --enable-nonfree --enable-libvpx --enable-libschroedinger --disable-encoder=libschroedinger --enable-version3 --enable-libopenjpeg --enable-librtmp --enable-avfilter --enable-libfreetype --disable-decoder=amrnb --enable-libvo-amrwbenc --libdir=/usr/lib/x86_64-linux-gnu --disable-vda --enable-libbluray --enable-libcdio --enable-gnutls --enable-frei0r --enable-openssl --enable-libass --enable-libopus --enable-fontconfig --enable-libpulse --disable-mipsdsp --disable-mips32r2 --disable-msa --disable-mipsfpu --disable-mipsdspr2 --enable-libvidstab --enable-libzvbi --enable-avresample --enable-libutvideo --enable-libfdk-aac --enable-libx265 --enable-libbs2b --enable-libilbc --enable-libopenh264 --enable-libkvazaar --enable-libsnappy --enable-libsoxr --enable-libiec61883 --enable-vaapi --enable-libdc1394 --disable-altivec --shlibdir=/usr/lib/x86_64-linux-gnu --enable-nvenc --enable-nvresize
make -j10
apt-get -y remove libswscale-dev libavcodec-dev libswresample-dev libavutil-dev
mkdir -p /usr/share/ffmpeg
checkinstall --pkgname=ffmpeg --pkgversion "10:3.0.2-nvenc-7.5.18-nvresize-cudautils-`date +%Y%m%d`" --backup=no \
  --requires='libcdio-paranoia1,libjack0,libasound2,libsdl1.2debian,libdc1394-22,libavc1394-0,libiec61883-0,libvidstab1.0,libbs2b0,libva1,libzvbi0,libx265-79,libx264-148,libvpx1,libvo-amrwbenc0,libutvideo15,libtheora0,libspeex1,libsoxr0,libsnappy1,libschroedinger-1.0-0,libopus0,libopenjpeg5,libopenh264-1,libopencore-amrwb0,libopencore-amrnb0,libmp3lame0,libkvazaar3,libilbc2,libgsm1,libfdk-aac1,libfaac0,libbluray1,libass5,libxcb-xfixes0,libcrystalhd3,libxvidcore4,libxv1,libxcb-shape0,libxcb-shm0' --default
If you end up with a deb file, congratulations!
By the way, checkinstall may complain while installing the package. Don't worry. If the package did not get installed, install it manually with
dpkg -i ffmpeg_3.0.2-nvenc-7.5.18-nvresize-cudautils-20160523-1_amd64.deb
and then run
apt-get -f install
to install the dependencies.
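To confirm that the installed ffmpeg really contains the NVENC encoders, you can look them up in its encoder list. A minimal sketch, assuming the usual layout of ffmpeg -encoders output, where each encoder line is a 6-character flag field followed by the encoder name; has_encoder is a hypothetical helper name.

```shell
# Check whether "ffmpeg -encoders" output (read on stdin) lists an
# encoder by name. The name is the 2nd whitespace-separated column.
has_encoder() {
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Usage after installing the package:
# ffmpeg -hide_banner -encoders | has_encoder nvenc_h264 && echo "nvenc_h264 OK"
# ffmpeg -hide_banner -encoders | has_encoder nvenc_hevc && echo "nvenc_hevc OK"
```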
Nvenc and nvresize options
For some reason, not everyone knows which parameters nvenc and nvresize accept, although all you need to do is run ffmpeg with the -h switch:
ffmpeg -h encoder=nvenc_h264
Encoder nvenc_h264 [NVIDIA NVENC h264 encoder]:
    General capabilities: delay
    Threading capabilities: none
    Supported pixel formats: yuv420p nv12
nvenc_h264 AVOptions:
  -preset    <string>  E..V.... Set the encoding preset (one of slow = hq 2pass, medium = hq, fast = hp, hq, hp, bd, ll, llhq, llhp, default) (default "hq")
  -profile   <string>  E..V.... Set the encoding profile (high, main, baseline)
  -level     <string>  E..V.... Set the encoding level restriction (auto, 1.0, 1.0b, 1.1, 1.2, ..., 4.2, 5.0, 5.1)
  -tier      <string>  E..V.... Set the encoding tier (main or high)
  -cbr       <boolean> E..V.... Use cbr encoding mode (default false)
  -2pass     <boolean> E..V.... Use 2pass encoding mode (default auto)
  -gpu       <int>     E..V.... Selects which NVENC capable GPU to use. First GPU is 0, second is 1, and so on. (from 0 to INT_MAX) (default 0)
  -delay     <int>     E..V.... Delays frame output by the given amount of frames. (from 0 to INT_MAX) (default INT_MAX)
  -enableaq  <boolean> E..V.... set to 1 to enable AQ (default false)
ffmpeg -h encoder=nvenc_hevc
Encoder nvenc_hevc [NVIDIA NVENC hevc encoder]:
    General capabilities: delay
    Threading capabilities: none
    Supported pixel formats: yuv420p nv12
nvenc_hevc AVOptions:
  -preset    <string>  E..V.... Set the encoding preset (one of slow = hq 2pass, medium = hq, fast = hp, hq, hp, bd, ll, llhq, llhp, default) (default "hq")
  -profile   <string>  E..V.... Set the encoding profile (high, main, baseline)
  -level     <string>  E..V.... Set the encoding level restriction (auto, 1.0, 1.0b, 1.1, 1.2, ..., 4.2, 5.0, 5.1)
  -tier      <string>  E..V.... Set the encoding tier (main or high)
  -cbr       <boolean> E..V.... Use cbr encoding mode (default false)
  -2pass     <boolean> E..V.... Use 2pass encoding mode (default auto)
  -gpu       <int>     E..V.... Selects which NVENC capable GPU to use. First GPU is 0, second is 1, and so on. (from 0 to INT_MAX) (default 0)
  -delay     <int>     E..V.... Delays frame output by the given amount of frames. (from 0 to INT_MAX) (default INT_MAX)
  -enableaq  <boolean> E..V.... set to 1 to enable AQ (default false)
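To see how these options fit together on a command line, here is a minimal sketch of an nvenc_h264 encode that uses only options listed in the help above (-preset, -profile, -gpu) plus FFmpeg's generic -i, -c:v, -b:v and -c:a. The helper name nvenc_encode, the file names and the 3000k default bitrate are placeholders for illustration; setting DRY_RUN=1 prints the command instead of running it.

```shell
# Assemble (and optionally run) an nvenc_h264 encode.
# Usage: nvenc_encode INPUT OUTPUT [BITRATE] [GPU_INDEX]
nvenc_encode() {
    input=$1
    output=$2
    bitrate=${3:-3000k}   # placeholder target bitrate
    gpu=${4:-0}           # first GPU is 0, as the help says
    set -- ffmpeg -i "$input" -c:v nvenc_h264 -preset hq -profile high \
        -b:v "$bitrate" -gpu "$gpu" -c:a copy "$output"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"         # show the command only
    else
        "$@"              # run ffmpeg
    fi
}

# Usage:
# DRY_RUN=1 nvenc_encode input.ts output.mp4
# nvenc_encode input.ts output.mp4 5000k 1   # second GPU
```

Swapping nvenc_h264 for nvenc_hevc should work the same way, since both encoders list the same options.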
Important: s and size are not listed in the help, although vf_nvresize.c does define them! I'll also add that s and size take string abbreviations rather than WxH, so you need to pass hd1080, hd720, pal, ntsc, or another named resolution. The full list can be found here.
ffmpeg -h filter=nvresize
Filter nvresize
  GPU accelerated video resizer.
    Inputs:
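Because s/size takes named resolutions rather than WxH, a small lookup table helps when picking a value. A minimal sketch covering only a handful of FFmpeg's size abbreviations (the values follow FFmpeg's documented video size list); size_to_wxh is a hypothetical helper and is not part of nvresize.

```shell
# Translate a few FFmpeg size abbreviations to WxH.
# Only a subset of FFmpeg's list is covered here.
size_to_wxh() {
    case "$1" in
        ntsc)   echo 720x480 ;;
        pal)    echo 720x576 ;;
        vga)    echo 640x480 ;;
        hd480)  echo 852x480 ;;
        hd720)  echo 1280x720 ;;
        hd1080) echo 1920x1080 ;;
        *)      echo "unknown abbreviation: $1" >&2; return 1 ;;
    esac
}

# Usage:
# size_to_wxh hd720
```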
Practicality. What is all this for?
I won't give detailed benchmarks here (unless the community asks me to run specific tests): there are plenty of them on the Internet, and the encoding speed of the card itself is about the same everywhere; it depends more on the source video, the resolution/bitrate of the output, and any additional processing (audio encoding, overlay filters).
Here are some numbers for a test system (cheap and ancient!) on a Xeon L5420 (4 cores at 2.5 GHz).
576p source, x264, bitrate set to 3000k:
frame= 7500 fps= 58 q=-1.0 Lsize=  107951kB time=00:05:00.00 bitrate=2947.8kbits/s speed=2.32x
576p source, nvenc_h264, bitrate set to 3000k:
frame= 7500 fps=804 q=-0.0 Lsize=  116997kB time=00:05:00.00 bitrate=3194.8kbits/s speed=32.2x
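The two log lines above are internally consistent, and a quick worked check makes the numbers easier to read: 7500 frames in 300 s means a 25 fps source, so speed is simply encoding fps / 25. A minimal sketch, assuming FFmpeg's convention of kB = 1024 bytes and kbit = 1000 bits when deriving the bitrate from Lsize; bench and ratio are hypothetical helper names.

```shell
# Derive speed and average bitrate from an FFmpeg progress line.
# Usage: bench FRAMES ENCODE_FPS LSIZE_KB DURATION_SECONDS
bench() {
    awk -v f="$1" -v fps="$2" -v kb="$3" -v s="$4" 'BEGIN {
        src_fps = f / s   # frame rate of the source
        printf "speed=%.2fx bitrate=%.1fkbit/s\n", fps / src_fps, kb * 1024 * 8 / s / 1000
    }'
}

# Ratio of two speeds, e.g. nvenc_h264 vs x264.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# bench 7500 58 107951 300   -> speed=2.32x bitrate=2947.8kbit/s
# bench 7500 804 116997 300  -> speed=32.16x bitrate=3194.8kbit/s
# ratio 32.2 2.32            -> 13.9
```

The same arithmetic shows that at 576p the card is about 14 times faster than x264 here, and that nvenc_h264 overshot the 3000k target by roughly 6.5% while x264 stayed slightly below it.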
With a 1080p source (37 Mbit/s source, 10-15 Mbit/s output), the speeds are 0.42x and 4.2x respectively: once again Nvidia is an order of magnitude faster, about 10 times.
When encoding 1080p to HEVC, the processor is painful to watch, with figures of 0.08x and below, while the card delivers about 3x.
On a Core i7 system the difference is smaller, but the card is still 3-4 times faster than x265 for HEVC 1080p (15000k) and 13 times faster for HEVC 576p (3000k). And this is despite the fact that an i7 processor by itself usually costs as much as a GTX 960, or more.
As for quality: I am a perfectionist, but I think hardware encoding has many uses, because the drop in quality is actually barely noticeable (I watched on both a 24" Dell monitor and a 60" TV, all at close range), and only when pixel-hunting. Meanwhile, the speed makes it possible to solve tasks that usually cannot run in real time on the processor.
I deliberately left out details on the speed of nvresize, apart from its parameters: although it is a very interesting and fast filter, its use is beyond the scope of this article.
I hope my work was not in vain and will help more people adopt hardware encoding while avoiding some of the pitfalls I spent a lot of time solving.
But it was worth it!
Article updates
- TODO: Apply the patch to FFmpeg 3.1, which added a hardware decoder (cuvid) that helps a lot with heavy encoding workloads.
- 08/18/2016: Added a fix to the patch before applying it (ctx->outputs[i]->closed to ctx->outputs[i]->status); I decided not to change the file itself for now so that the checksums still match.
- 05/25/2016: Added the GTX 950 to the list of supported cards (it is the newest GM206 card, like the GTX 960) and removed an inaccurate remark about the Quadro line, which was also released on GM204 and GM206.