
Preparing adaptive video for HTTP Live Streaming

Almost everyone has used, and certainly many have heard about, dynamic adaptation of a video stream to network bandwidth. These days it is practically a mandatory requirement for online video on the Internet. The advantages of an adaptive video stream are obvious: if the network "sags" from time to time, the video keeps playing without visible stuttering and rebuffering, and picture quality is automatically matched to the available network bandwidth.

Although dynamic stream adaptation is by now a relatively "old" technology, there are many small details in getting the best result: keeping the server side simple and cheap, while making the video compatible with as many clients as possible (Web, iOS, Android, and let's not forget Smart TV).


There have already been several articles about adaptive video on Habr, so I will not repeat them and will instead focus on how it works and how to build it.

A lyrical digression (adaptive video as briefly and simply as possible):

Obviously, the easiest way to distribute video on the Internet is to take an mp4 file and put it on an HTTP server. This option is bad because we have only one file, while we have a great many clients with Internet connections of wildly varying quality. If we upload 1080p video with a bitrate of 20 Mbit/s (Blu-ray quality), smartphones will not be able to play it, and if we upload video sized for smartphones (say, 1 Mbit/s and 320x240), it will look awful on a 55-inch TV.

Well, since one file is bad, let's lay out a dozen files "cut" from the same source, covering every bitrate and frame size from mobile up to 1080p, with bitrates from 1 to 20 Mbit/s. Great. But there is a problem. The same smartphone can be on home Wi-Fi (i.e. fast) or in a restaurant (i.e. slow). The same TV model may sit in Moscow or out on Sakhalin, with very different connectivity.

Then let the player somehow measure the network bandwidth and, based on the measurement, select the right file for viewing. One problem still remains: how to account for the fact that a film runs 2-3 hours while the bandwidth is first "plenty", then "not enough". Re-measure the bandwidth periodically? That will work, but what do we do when we need to switch to a higher- or lower-quality file during a "dip" or "recovery" of the network? To switch quickly (and without discarding what has already been downloaded), you need to know in advance from which place in the new file to start downloading. Unfortunately, the mapping from file offset to playback time is very often non-linear because of variable bitrate. In fast scenes, when, say, James Bond is chasing yet another villain, the picture changes rapidly and the bitrate is high, while on a smooth pan across a cloudless sky the picture barely changes and the bitrate is low.

To cope with this, all the files must be indexed in advance (building pairs of "playback time in the film / offset from the beginning of the file"). The chunks between these index points are called segments. Knowing the current playback time, the client can then determine from which place in another file to download the next segment. For seamless switching, segments of different bitrates are aligned in time.
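The lookup the paragraph above describes can be sketched in a few lines. This is an illustrative sketch, not code from any real player: the segment names match the example later in the article, and the start times are hypothetical values derived from roughly 10-second segments.

```python
import bisect

# Hypothetical index for one variant: the start time (in seconds) of each
# segment. Segment boundaries are aligned across all bitrates, so the same
# index works for switching between variants.
segment_starts = [0.0, 9.88, 19.81]
segment_uris = ["segment0.ts", "segment1.ts", "segment2.ts"]

def segment_for_time(t):
    """Return the segment that contains playback time t (seconds)."""
    # bisect_right finds the first start time greater than t;
    # the segment containing t is the one just before it.
    i = bisect.bisect_right(segment_starts, t) - 1
    return segment_uris[max(i, 0)]
```

For example, `segment_for_time(12.0)` returns `"segment1.ts"`, so a player switching bitrates at the 12-second mark knows exactly which segment of the new variant to fetch.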


Of course, all of this has long been implemented in virtually all modern devices. Several bitrates and frame sizes of the same movie are packaged in a specific format, and information about which stream lives in which file is described in a special descriptor file (often called a manifest). Before playback, the client downloads the manifest and "understands" what to download and from where, and which frame sizes and bitrates are available on the server.

The bad news is that in the modern world this simple approach has been implemented by different companies at different times and in different ways. The best-known and most widespread schemes of adaptive video delivery over HTTP are Apple HTTP Live Streaming (HLS), Adobe HTTP Dynamic Streaming (HDS), Microsoft Smooth Streaming and MPEG-DASH.
It is also worth noting that sometimes (in the HDS and Smooth Streaming formats) a special segment-addressing scheme is used in the manifests instead of direct links: the server "computes" from this scheme which file the client is requesting, so to support it the server itself must understand the manifest.

Let's take a closer look at preparing adaptive video using HLS as an example, since it is the most simply structured format and the one supported by the widest range of devices.

The manifest in HLS is a group of playlists: one "master playlist" and several "stream playlists". The easiest way is to show this with an example. Suppose we have a very short film (only 3 segments of 10 seconds each, for simplicity) for which we made 3 video bitrates: 500 kbps, 1000 kbps and 2000 kbps. In the server's file system it could be laid out, for example, like this:

 /master-playlist.m3u8
 /500K/
 /500K/playlist-500K.m3u8
 /500K/segment0.ts
 /500K/segment1.ts
 /500K/segment2.ts
 /1000K/
 /1000K/playlist-1000K.m3u8
 /1000K/segment0.ts
 /1000K/segment1.ts
 /1000K/segment2.ts
 /2000K/
 /2000K/playlist-2000K.m3u8
 /2000K/segment0.ts
 /2000K/segment1.ts
 /2000K/segment2.ts


The master-playlist.m3u8 file inside looks like this (I’ve removed some information for simplicity):

 #EXTM3U
 #EXT-X-STREAM-INF:BANDWIDTH=500000,CODECS="mp4a.40.2, avc1.640028",RESOLUTION=640x480
 500K/playlist-500K.m3u8
 #EXT-X-STREAM-INF:BANDWIDTH=1000000,CODECS="mp4a.40.2, avc1.640028",RESOLUTION=640x480
 1000K/playlist-1000K.m3u8
 #EXT-X-STREAM-INF:BANDWIDTH=2000000,CODECS="mp4a.40.2, avc1.640028",RESOLUTION=640x480
 2000K/playlist-2000K.m3u8


Those who are familiar with the m3u format will easily understand what's what. The file contains three links to other m3u8 playlists, and the '#'-prefixed line above each link describes the corresponding stream. BANDWIDTH, CODECS and RESOLUTION largely speak for themselves. It is easy to see that here only BANDWIDTH differs, although in reality all the parameters may differ. The client's task is to decide from these parameters which playlist suits it at the moment.
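To make the structure concrete, here is a deliberately naive sketch of extracting the (bandwidth, URI) pairs a client needs from a master playlist. It is illustrative only: it pulls out just BANDWIDTH and ignores quoted commas inside CODECS, which a real parser would have to handle.

```python
def parse_master(text):
    """Extract (bandwidth, uri) pairs from an HLS master playlist.
    Naive: splits attributes on ',' and only reads BANDWIDTH, so the
    quoted CODECS value is mangled but harmlessly ignored."""
    variants, attrs = [], None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#EXT-X-STREAM-INF:"):
            attrs = dict(
                kv.split("=", 1)
                for kv in line[len("#EXT-X-STREAM-INF:"):].split(",")
                if "=" in kv
            )
        elif line and not line.startswith("#") and attrs is not None:
            # A non-comment line right after EXT-X-STREAM-INF is the URI.
            variants.append((int(attrs["BANDWIDTH"]), line))
            attrs = None
    return variants

# Hypothetical two-variant manifest, abridged from the example above.
sample = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=500000,RESOLUTION=640x480
500K/playlist-500K.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2000000,RESOLUTION=640x480
2000K/playlist-2000K.m3u8
"""
```

Running `parse_master(sample)` yields `[(500000, '500K/playlist-500K.m3u8'), (2000000, '2000K/playlist-2000K.m3u8')]`, which is exactly the decision table the client chooses from.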

Suppose the client knows that it currently has a "good" connection and prefers the high bitrate (2000K). The client downloads the 2000K/playlist-2000K.m3u8 playlist, which looks like this inside:

 #EXTM3U
 #EXT-X-TARGETDURATION:10
 #EXT-X-VERSION:3
 #EXT-X-MEDIA-SEQUENCE:0
 #EXT-X-PLAYLIST-TYPE:VOD
 #EXTINF:9.8849,
 segment0.ts
 #EXTINF:9.9266,
 segment1.ts
 #EXTINF:9.9266,
 segment2.ts
 #EXT-X-ENDLIST
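One small thing a client computes from such a stream playlist is the total duration, by summing the EXTINF values. A minimal sketch (the sample text mirrors the playlist above):

```python
def playlist_duration(text):
    """Sum the #EXTINF durations of a media playlist, in seconds."""
    return sum(
        float(line.split(":", 1)[1].rstrip(","))   # "#EXTINF:9.8849," -> 9.8849
        for line in text.splitlines()
        if line.startswith("#EXTINF:")
    )

sample_media = """#EXTM3U
#EXTINF:9.8849,
segment0.ts
#EXTINF:9.9266,
segment1.ts
#EXTINF:9.9266,
segment2.ts
#EXT-X-ENDLIST
"""
```

Here `playlist_duration(sample_media)` gives about 29.74 seconds, which also tells the client where each segment starts on the timeline.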


The links to the individual segments are visible; the duration of each in seconds is given on the line above it, for example "#EXTINF:9.8849," for segment zero. After downloading this playlist, the client plays the segments from first to last. While one segment is playing, the next one is usually downloading, and so on. If the client senses that the next segment is downloading too slowly, it can abort the download and start fetching the same segment (for the same place in the film) from another playlist, for example 500K/playlist-500K.m3u8. When the connection speed recovers, the client can switch back to downloading segments from the 1000K or 2000K playlist.
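The switching decision described above can be sketched as a simple rule: pick the highest-bandwidth variant that fits comfortably within the measured throughput, otherwise fall back to the lowest. Real players use far more elaborate heuristics; this is only an illustration, and the 0.8 safety margin is an assumed value.

```python
def pick_variant(variants, measured_bps, safety=0.8):
    """Choose a (bandwidth, uri) pair from a master playlist.
    Takes the highest bandwidth not exceeding a safety margin of the
    measured throughput; falls back to the lowest variant otherwise."""
    usable = [v for v in variants if v[0] <= measured_bps * safety]
    return max(usable) if usable else min(variants)

variants = [
    (500000, "500K/playlist-500K.m3u8"),
    (1000000, "1000K/playlist-1000K.m3u8"),
    (2000000, "2000K/playlist-2000K.m3u8"),
]
```

With 3 Mbit/s measured, the rule picks the 2000K playlist; when the network sags to 600 kbit/s, nothing fits under the margin and it falls back to 500K, matching the behavior described above.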

The simplicity of HLS allows it to be served from almost any server platform (no server-side "logic" is actually required for distribution). Individual segments, if necessary, are easily cached by any available means.

Now let's see what tools are available for creating and packaging video in HLS. This process consists of three main steps:

Step 1. Preparing the source files (.mov or .mp4 with H.264) with the desired bitrates and frame sizes.

There are many options, but the most obvious one is to use the free ffmpeg; there are also plenty of GUI video editors for OS X and Windows. For the example described above, at this stage you need to produce three .mp4 or .mov files with average bitrates of 500K, 1000K and 2000K. It is important to encode them all from the same source, so that at the end of the whole process the segments line up identically in time.

For example, if you have a source file movie.mp4 (we assume its bitrate is no lower than 2000K), it is enough to run ffmpeg like this (the flags mean that the audio track is copied as-is and only the video bitrate is changed):

 $ ffmpeg -i movie.mp4 -acodec copy -vb 500K movie-500K.mp4
 $ ffmpeg -i movie.mp4 -acodec copy -vb 1000K movie-1000K.mp4
 $ ffmpeg -i movie.mp4 -acodec copy -vb 2000K movie-2000K.mp4
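If you have many bitrates, typing these by hand gets tedious; a few lines of Python can generate the same argument lists for any set of bitrates. This is a sketch around the exact flags shown above; the output naming convention (movie-500K.mp4 etc.) follows the example.

```python
def ffmpeg_cmd(src, bitrate):
    """Build the ffmpeg argument list for one target bitrate:
    audio copied as-is, video re-encoded at the given bitrate."""
    out = src.rsplit(".", 1)[0] + "-" + bitrate + ".mp4"
    return ["ffmpeg", "-i", src, "-acodec", "copy", "-vb", bitrate, out]

cmds = [ffmpeg_cmd("movie.mp4", b) for b in ("500K", "1000K", "2000K")]
# Each entry can then be executed with subprocess.run(cmd, check=True).
```

The first generated command is exactly `ffmpeg -i movie.mp4 -acodec copy -vb 500K movie-500K.mp4`.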


Step 2. Creating single-bitrate playlists.

Next, from each movie-*K.mp4 we need to make a set of one m3u8 playlist plus *.ts segments. It is important that the segments are synchronized across bitrates. I will say right away that ffmpeg can "cut" mp4 into m3u8 + segments, but only within one bitrate; the master playlist would then have to be created by hand. This is not very difficult (in the minimal case any text editor will do), but if you happen to be an Apple iOS or OS X developer, I can recommend the HTTP Live Streaming Tools package (for OS X). It includes several programs, two of which will be useful to us: mediafilesegmenter and variantplaylistcreator. The first turns an mp4 file into an m3u8 playlist and "slices" the segments; the second collects several single-bitrate playlists into a master playlist.
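If you do go the hand-written route, generating a minimal master playlist is a few lines of text rendering. This sketch emits only the mandatory BANDWIDTH attribute (in bits per second, per the HLS spec); the variant list is whatever pairs you pass in, and real playlists would also carry CODECS and RESOLUTION.

```python
def write_master(variants):
    """Render a minimal HLS master playlist from (bandwidth_bps, uri)
    pairs. Only the mandatory BANDWIDTH attribute is emitted."""
    lines = ["#EXTM3U"]
    for bw, uri in variants:
        lines.append("#EXT-X-STREAM-INF:BANDWIDTH=%d" % bw)
        lines.append(uri)
    return "\n".join(lines) + "\n"

master = write_master([
    (500000, "500K/prog_index.m3u8"),
    (1000000, "1000K/prog_index.m3u8"),
    (2000000, "2000K/prog_index.m3u8"),
])
# Write `master` to master-playlist.m3u8 next to the variant folders.
```

The resulting text has the same shape as the master-playlist.m3u8 example shown earlier.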

So, we create three playlists from the three files obtained in the previous step (it is assumed that the movie-*.mp4 files are in the current folder):

 $ mkdir 500K
 $ mediafilesegmenter -I -f 500K -B segment movie-500K.mp4
 $ mkdir 1000K
 $ mediafilesegmenter -I -f 1000K -B segment movie-1000K.mp4
 $ mkdir 2000K
 $ mediafilesegmenter -I -f 2000K -B segment movie-2000K.mp4


A few explanations:

The -I switch tells mediafilesegmenter to create a special variant plist file, which will be needed later by variantplaylistcreator.

The -f 500K switch specifies the directory (500K) into which the "sliced" segments should be placed.

The -B segment switch sets how the segment files are named (a prefix that is extended with a number and the .ts extension).

At this stage you should have three folders filled with segment files, three playlist files (one in each folder) and three files with the .plist extension. Single-bitrate playlists are named prog_index.m3u8 by default. You can already check how they play on clients: put them on an HTTP server and point a client at any of the prog_index.m3u8 files.

Step 3. Assembling the three separate playlists into a single master playlist.

For this, variantplaylistcreator is used. For our example it is launched like this:

 $ variantplaylistcreator -r -o movie.m3u8 \
     500K/prog_index.m3u8 movie-500K.plist \
     1000K/prog_index.m3u8 movie-1000K.plist \
     2000K/prog_index.m3u8 movie-2000K.plist


The -r switch asks for so-called resolution tags (video frame size) to be added. In general it is not mandatory, but if you pack several videos with different resolutions into one master playlist (say, mobile quality, 480p and 1080p), you should specify it. The client will then know from the master playlist which resolutions are available.

The -o movie.m3u8 switch sets the name of the output file (the master playlist).

The remaining command-line arguments are the playlist/plist pairs for each bitrate obtained in the previous step.

Now we have the movie.m3u8 playlist. You can upload the current directory and its subdirectories to an HTTP server and start playback of movie.m3u8 on the client. By the way, the intermediate movie-*K files are no longer needed; to save the space occupied by the content, they can be removed from the folder.

That is all for now, but if this topic interests the respected community, I can continue in future posts with how to add alternative audio tracks and subtitles to HLS, and how to produce Smart-TV-compatible MPEG-DASH from HLS. Thanks for your attention.

Source: https://habr.com/ru/post/178267/

