The most important of the arts: how we implemented video playback in Mail.Ru Cloud

Some time ago, Mail.Ru Cloud gained the ability to play video files. From the very start of work on this feature, we decided to build a kind of Swiss Army knife: it had to play any video format and work on every device where Cloud is available. Video files uploaded to Cloud fall into two categories: "movies / series" and "user videos" that people shoot on phones and video cameras. The second category is especially notable for its variety of formats and codecs. Without preprocessing, such files cannot be viewed on every device, for example because the required codec is missing or because the file is too large.

In this article I will describe how video playback works in Mail.Ru Cloud and how we made playback "omnivorous" on the input side while supporting as many devices as possible on the output side.

Storage and caching: two approaches

A number of services (for example, YouTube, social networks, and others) convert user-uploaded video into playable formats immediately after upload, and the video becomes available for viewing only once conversion is complete. Mail.Ru Cloud takes a different approach: the original file is converted on the fly, during playback. Unlike specialized video hosting sites, we cannot modify the original. Why did we choose this option? Mail.Ru Cloud is first and foremost a cloud storage service, and a user would be unpleasantly surprised to download his video file and find that its quality had degraded or its size had changed by even a single byte. On the other hand, we cannot afford to store pre-converted copies of every video file: that would significantly increase the storage space used. We would also be doing a lot of unnecessary work, since not every stored video file is ever viewed even once.
Another advantage of on-the-fly conversion is that if we want to change the conversion settings or, for example, add another quality level, we do not have to reconvert old videos (which would not always be possible, because in that case the original would no longer exist): everything works automatically.

How it works

We use HLS (HTTP Live Streaming), a format developed by Apple specifically for streaming video over a network. The idea is that each video file is split into fragments of arbitrary length, from which a playlist is built that gives each fragment's name and its duration in seconds. For example, if we split a two-hour movie into ten-second fragments, we get 720 small separate files. Depending on the point from which the user wants to start watching, the player requests the corresponding fragment listed in the playlist it has received. One of the advantages of HLS is that the user does not have to wait for playback to start while the player reads the header of the entire video file (for a full-length movie over mobile Internet, this wait could be considerable).
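To make the playlist structure concrete, here is a minimal sketch, assuming fixed-length fragments and a hypothetical `part_<N>_<quality>.ts` naming scheme (the real Cloud URLs are not described in the article). It emits a standard HLS media playlist in which every `#EXTINF` line carries a fragment's duration and the following line its name.

```lua
-- Sketch: build an HLS media playlist for a video of `duration` seconds
-- split into fragments of `seg_len` seconds. The fragment naming scheme
-- is a hypothetical example, not the actual Mail.Ru Cloud URL format.
local function media_playlist(duration, seg_len, quality)
  local lines = {
    "#EXTM3U",
    "#EXT-X-VERSION:3",
    "#EXT-X-TARGETDURATION:" .. math.ceil(seg_len),
    "#EXT-X-MEDIA-SEQUENCE:0",
  }
  local count = math.ceil(duration / seg_len)
  for i = 0, count - 1 do
    -- The last fragment may be shorter than seg_len.
    local len = math.min(seg_len, duration - i * seg_len)
    lines[#lines + 1] = string.format("#EXTINF:%.3f,", len)
    lines[#lines + 1] = string.format("part_%d_%s.ts", i, quality)
  end
  -- A finished (video-on-demand) playlist ends with this tag.
  lines[#lines + 1] = "#EXT-X-ENDLIST"
  return table.concat(lines, "\n") .. "\n", count
end
```

For a two-hour movie, `media_playlist(7200, 10, "720p")` yields 720 entries; to start at minute 30, the player can go straight to fragment 180 without touching the earlier ones.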

Another equally important capability of this format is adaptive streaming, which lets playback quality change on the fly depending on the speed of the user's Internet connection. For example, viewing starts at 360p over 3G and, once the user enters LTE coverage, continues at 720p or 1080p. In HLS this is implemented quite simply: the player receives a "master playlist" that lists the fragment playlists for each quality, each annotated with the minimum channel bandwidth it requires. After downloading a fragment, the player calculates the current speed and, based on it, decides in which quality to load the next fragment: the same, a lower, or a higher one. We currently serve video at 240p, 360p, 480p, 720p, and 1080p.
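The following sketch shows both halves of that mechanism: a master playlist built from `#EXT-X-STREAM-INF` entries, and a naive player-side rule that picks the highest quality whose declared bandwidth fits the measured throughput. The bandwidth figures and playlist names are illustrative assumptions, not the values Cloud actually uses.

```lua
-- Sketch: HLS master playlist plus a naive variant-selection rule.
-- Bandwidth values and playlist names are illustrative assumptions.
local VARIANTS = {
  { name = "240p",  bandwidth = 400000,  resolution = "426x240" },
  { name = "360p",  bandwidth = 800000,  resolution = "640x360" },
  { name = "480p",  bandwidth = 1400000, resolution = "854x480" },
  { name = "720p",  bandwidth = 2800000, resolution = "1280x720" },
  { name = "1080p", bandwidth = 5000000, resolution = "1920x1080" },
}

local function master_playlist(variants)
  local lines = { "#EXTM3U" }
  for _, v in ipairs(variants) do
    -- BANDWIDTH is the minimum channel capacity this variant needs.
    lines[#lines + 1] = string.format(
      "#EXT-X-STREAM-INF:BANDWIDTH=%d,RESOLUTION=%s", v.bandwidth, v.resolution)
    lines[#lines + 1] = v.name .. ".m3u8"
  end
  return table.concat(lines, "\n") .. "\n"
end

-- Throughput estimate (bits per second) from one downloaded fragment.
local function throughput_bps(bytes, seconds)
  return bytes * 8 / seconds
end

-- Pick the highest variant whose bandwidth fits the measured throughput.
-- Assumes `variants` is sorted by ascending bandwidth.
local function pick_variant(variants, measured_bps)
  local chosen = variants[1] -- fall back to the lowest quality
  for _, v in ipairs(variants) do
    if v.bandwidth <= measured_bps then
      chosen = v
    end
  end
  return chosen
end
```

If a 1,000,000-byte fragment takes 2 seconds to download, the estimate is 4 Mbit/s and the sketch chooses 720p. Real players usually keep a safety margin below the measured rate and smooth the estimate over several fragments to avoid oscillating between qualities.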

Backend

The backend consists of three groups of servers. The first group handles playback requests: it generates and returns HLS playlists, serves ready fragments, and creates conversion tasks. The second group is a database with built-in logic (Tarantool). The third group consists of converters, which take tasks from the database and report back to it once they are done. When a request arrives for any fragment of a video file, the first thing we do is check the database for an already converted fragment of the requested quality on any of our servers. There are two possibilities.

First: the fragment exists. In this case we serve it immediately. It will already have been converted if you or someone else requested it within the last N minutes. This is the first level of caching, and it applies to all converted files. It is worth mentioning that we also use a second kind of caching: files that have been requested frequently of late are replicated to several servers and served from them, so that no single network interface gets overloaded.
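The first caching level behaves like a cache with sliding expiration: a converted fragment stays available as long as it keeps being requested within N minutes. Below is a minimal in-memory sketch, assuming a string key built from file hash, fragment number and quality, and an injectable clock for testing; in Cloud this information lives in Tarantool, and the server names here are hypothetical.

```lua
-- Sketch of the first caching level: a converted fragment stays available
-- for `ttl` seconds after it was last requested (sliding expiration).
-- Key format, TTL and the injectable clock are assumptions for illustration.
local FragmentCache = {}
FragmentCache.__index = FragmentCache

function FragmentCache.new(ttl, clock)
  return setmetatable({ ttl = ttl, clock = clock or os.time, entries = {} },
                      FragmentCache)
end

local function key(file_hash, part, quality)
  return file_hash .. ":" .. part .. ":" .. quality
end

-- Record which server holds a freshly converted fragment.
function FragmentCache:store(file_hash, part, quality, server)
  self.entries[key(file_hash, part, quality)] =
    { server = server, last_access = self.clock() }
end

-- Return the server holding the fragment, or nil if it must be converted.
function FragmentCache:lookup(file_hash, part, quality)
  local k = key(file_hash, part, quality)
  local e = self.entries[k]
  if e == nil then return nil end
  local now = self.clock()
  if now - e.last_access > self.ttl then
    self.entries[k] = nil -- expired: treat as a miss
    return nil
  end
  e.last_access = now -- every hit extends the fragment's lifetime
  return e.server
end
```

With a 10-minute TTL, a fragment that is watched every few minutes stays cached indefinitely, while one nobody has asked for in 10 minutes is dropped and reconverted on the next request.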

Second: there is no converted fragment yet. In this case, a conversion task is placed in the database and we wait for it to complete. Tarantool, a very fast open-source NoSQL database that supports stored procedures in Lua, stores information about videos and manages the conversion queue. The front server described above communicates with the database as follows. It sends a request of the form "I want fragment N of file M in quality K, and I am willing to wait no more than T seconds", and within T seconds it receives either the location from which the finished file can be fetched or a description of the error that occurred. As a result, the database client does not care how its task gets done, whether immediately or through a chain of complex actions: it is given the simplest possible interface, send a request and receive the result.

Database fault tolerance works as follows: clients talk only to the master server. If the master runs into problems, a replica is promoted to master and clients switch over to it. From the client's point of view nothing changes: it is still talking to the master.

The other type of database client is the converter, which accepts an HTTP link to a file plus some parameters as input and produces a converted fragment from it. Converters communicate with the database in a similar way: they send a request "Give me a task, I am willing to wait N seconds", and if a task appears during those N seconds, it is immediately handed to one of the waiting converters. The mechanism for passing tasks from the client to the converter was quite easy to implement with IPC channels in Lua inside Tarantool, which let different requests interact with each other. Here is simplified code for getting a converted fragment:
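The converter side of this handoff can be modelled outside Tarantool with plain Lua coroutines standing in for fibers and IPC channels. This is a simplified sketch, not the production code: timeouts are omitted, and all names (`queue`, `converter`) are hypothetical. It shows the key property: a task created while a converter is already waiting goes straight to that converter, with no polling interval.

```lua
-- Sketch: task handoff between clients and long-polling converters,
-- using plain Lua coroutines instead of Tarantool fibers and box.ipc
-- channels. Timeouts are omitted; all names are hypothetical.
local queue = { tasks = {}, waiting = {} }

-- Called by a client: hand the task directly to a waiting converter
-- if there is one, otherwise leave it queued.
function queue.put(task)
  local co = table.remove(queue.waiting, 1)
  if co then
    assert(coroutine.resume(co, task))
  else
    table.insert(queue.tasks, task)
  end
end

-- Called inside a converter coroutine: take a queued task, or park
-- until queue.put resumes this coroutine with one.
function queue.take()
  local task = table.remove(queue.tasks, 1)
  if task then return task end
  local co = coroutine.running()
  table.insert(queue.waiting, co)
  return coroutine.yield()
end

local done = {}

-- A converter loops forever: wait for a task, "convert" it, repeat.
local function converter(name)
  return coroutine.create(function()
    while true do
      local task = queue.take()
      done[#done + 1] = name .. " converted " .. task
    end
  end)
end
```

Starting two converters parks both in `queue.waiting`; each `queue.put` then wakes the converter that has been waiting longest, which processes the task and parks again at the back of the line.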

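```lua
-- v, s, STATUS_QUEUED and RET_ERROR are module-level values defined
-- elsewhere (spaces, indexes, the channel table, the converters' wait channel).
function get_part(file_hash, part_number, quality, timeout)
    -- Look up the fragment in the database
    local t = box.select(v.SPACE, v.INDEX_MAIN, file_hash, part_number, quality)
    -- If it already exists, return it right away
    if t ~= nil then
        return t
    end
    -- Otherwise create an IPC channel and register it in a table,
    -- so that whoever finishes the conversion can notify us
    local table_key = box.pack('ppp', file_hash, part_number, quality)
    local ch = box.ipc.channel(1)
    v.ctable[table_key] = ch
    -- Insert a task with the "queued" status
    box.insert(v.SPACE, file_hash, part_number, quality, STATUS_QUEUED)
    -- If any converters are waiting for tasks, wake one of them up
    if s.waitch:has_readers() then
        s.waitch:put(true, 0)
    end
    -- Wait on the channel for at most `timeout` seconds
    local body = ch:get(timeout)
    -- Unregister the channel whatever the outcome, so it does not leak
    v.ctable[table_key] = nil
    if body ~= nil then
        if body == false then
            -- The conversion failed: return an error
            return box.tuple.new({RET_ERROR})
        else
            -- The conversion succeeded: return the new record
            local new_tuple = box.select(v.SPACE, v.INDEX_MAIN, file_hash, part_number, quality)
            return new_tuple
        end
    else
        -- Timed out: return an error
        return box.tuple.new({RET_ERROR})
    end
end

-- When a conversion finishes, the waiting client (if still there)
-- is notified through its channel:
local table_key = box.pack('ppp', file_hash, part_number, quality)
local ch = v.ctable[table_key]
if ch ~= nil then
    ch:put(true, 0)
end
```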

The real code is somewhat more complicated: for example, it handles the case where the fragment is already in the "being converted" status at the time of the request. Thanks to this scheme, the converter learns about a new task instantly, and the client learns instantly that its task is done. This matters a great deal: the longer the user stares at the video loading spinner, the more likely he is to leave the page before playback starts.
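Handling a fragment that is already queued or being converted amounts to request deduplication: a second request for the same fragment should subscribe to the running job rather than create another task. Here is a minimal sketch of that idea, assuming callbacks in place of the per-request IPC channels; the status names and functions are hypothetical, and a real implementation would also drop waiters whose timeout has expired.

```lua
-- Sketch: deduplicating requests for the same fragment. If the fragment
-- is already queued or being converted, a new request subscribes to the
-- existing job instead of creating another task. Callbacks stand in for
-- per-request IPC channels; statuses and names are hypothetical.
local READY, QUEUED, CONVERTING = "ready", "queued", "converting"

local jobs = {}     -- key -> { status = ..., waiters = { callback, ... } }
local enqueued = {} -- tasks actually handed to converters

local function request(key, on_done)
  local job = jobs[key]
  if job and job.status == READY then
    on_done(true)
    return "hit"
  elseif job then
    -- QUEUED or CONVERTING: wait for the job that is already running
    table.insert(job.waiters, on_done)
    return "attached"
  end
  jobs[key] = { status = QUEUED, waiters = { on_done } }
  table.insert(enqueued, key)
  return "enqueued"
end

-- Called when a converter picks up the task.
local function start(key)
  jobs[key].status = CONVERTING
end

-- Called when a converter finishes; notifies every waiting request.
local function finish(key, ok)
  local job = jobs[key]
  local waiters = job.waiters
  if ok then
    job.status, job.waiters = READY, {}
  else
    jobs[key] = nil -- a failed job can be retried by the next request
  end
  for _, cb in ipairs(waiters) do cb(ok) end
end
```

However many users open the same video at the same moment, each fragment is converted only once, and all of them are notified as soon as it is ready.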

As the graph below shows, most conversions, and therefore the user's wait for playback to start, take no more than a couple of seconds.

Conversion

For conversion we use our own modified build of FFmpeg. Initially we wanted to use FFmpeg's built-in tools for converting to HLS, but on closer inspection they turned out to have problems in our case. If you ask FFmpeg to convert a 20-second file to HLS with 10-second fragments, the output is two files and a playlist that plays back without problems. But if you convert the same file first from 0 to 10 seconds and then, in a separate FFmpeg run, from 10 to 20 seconds, and build a correct playlist, you hear a noticeable click when playback switches from one file to the other (at around the 10-second mark). We spent several days trying various FFmpeg launch parameters, without result. I had to dig into the code and write a small patch which, when a certain command-line parameter is passed, fixes this flaw, which stems from the way the audio and video tracks are encoded.
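Since each fragment is produced by a separate FFmpeg run, the converter needs to build a command line per fragment. The sketch below uses only standard FFmpeg options (`-ss`, `-i`, `-t`, `-vf`, `-c:v`, `-c:a`, `-output_ts_offset`, `-f mpegts`); the codec choices and the function itself are illustrative assumptions. The authors' audio patch and its command-line flag are not public in this article, so this command would still exhibit the click described above.

```lua
-- Sketch: FFmpeg argument list that converts one fragment of the source
-- into an MPEG-TS file for HLS. Standard FFmpeg options only; codec
-- settings are assumptions, and the authors' audio patch is NOT included.
local function ffmpeg_args(source_url, part_number, seg_len, height, out_path)
  local start = part_number * seg_len
  return {
    "ffmpeg",
    "-ss", tostring(start),              -- seek to the fragment start
    "-i", source_url,                    -- original file, read over HTTP
    "-t", tostring(seg_len),             -- fragment duration
    "-vf", "scale=-2:" .. height,        -- target height, width keeps aspect ratio
    "-c:v", "libx264",
    "-c:a", "aac",
    "-output_ts_offset", tostring(start), -- timestamps continue where the previous fragment ended
    "-f", "mpegts",
    out_path,
  }
end
```

Returning an argument list rather than a single string leaves quoting to whatever launches the process; in Cloud that role is played by the Perl daemon described below.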

In addition, we applied some other publicly available patches that had not yet been merged into FFmpeg at the time, for example a patch fixing a known problem with very slow conversion of MOV files (videos shot on an iPhone). Fetching tasks from the database and running FFmpeg is handled by a daemon called Aurora, which, like the daemon on the other side of the database, is written in Perl and works asynchronously using the EV event loop and various useful modules such as EV-Tarantool and Async::Chain.

An interesting aspect of launching video in Mail.Ru Cloud is that we did not add a single server for it: the most resource-intensive part (conversion) runs on our storage nodes in a special isolated environment. Logs and graphs show that we can easily handle a load several times higher than the current one. For reference: since the launch at the end of June 2015, more than 5 million unique videos have been requested, and 500-600 unique files are viewed per minute.

Frontend

Nowadays almost everyone has a smartphone, or even two. Shooting short videos to show friends and family has long been commonplace. So we designed for a scenario in which a person uploads video from a smartphone or tablet to Cloud and then immediately deletes it from the device to free up memory. If the user wants to show the video to someone, he can simply open it in the Mail.Ru Cloud mobile app or launch the player in the Cloud web version on a desktop. As a result, there is no need to keep lots of short videos on your smartphone, while you still have access to them from any device. On mobile Internet connections the bitrate drops, and with it the amount of data transferred.

In addition, on mobile platforms we use the native libraries of Android and iOS. As a result, video plays out of the box on smartphones and tablets, including in mobile browsers: we did not need to develop additional players for the format we use. As on desktop operating systems, the adaptive mechanism kicks in when needed: picture quality is dynamically adjusted to the current channel bandwidth.

One of the main differences between our player and those of "competitors" is its independence from the underlying playback technology. In most cases, developers build two different players at once: the first with a Flash interface, the second (for browsers that natively support HLS, such as Safari) exactly the same but in HTML5, loading the corresponding interface. We have a single player. While building it, we made sure the interface could be changed without special effort. As a result, the video and audio players are almost identical: all icons, layout, and so on are written entirely in HTML5. The player does not depend on the technology used to display the video.

We use Flash purely as a rendering tool that only displays the video, while the entire interface is built in HTML, so we never run into out-of-sync versions: there is no separate Flash version to maintain. An open-source library was enough to play HLS. To make it work, we wrote from scratch an implementation of the element interface (matching the standard HTML5 video element interface) whose method calls are simply "translated" into calls to the Flash library. So we write the entire interface on the assumption that we always work with an HTML5 video element and follow its standard. If the browser does not support this format, we simply replace the native video element with our own, which implements the same interface.
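The real player is JavaScript running in the browser, but the adapter idea itself is language-neutral, so here it is sketched in Lua for consistency with the rest of the article. The UI code is written once against a video-element-like interface (`play`, `pause`, `seek`, `current_time`), and a Flash-backed object implements the same methods by forwarding each call to a bridge. The bridge and its method names (`playMedia`, `seekMedia`, and so on) are hypothetical stand-ins for the Flash library's API.

```lua
-- Sketch of the adapter idea (the real player is browser JavaScript).
-- UI code talks only to objects with video-element-like methods; this
-- Flash-backed object forwards each call to a hypothetical bridge.
local FlashVideo = {}
FlashVideo.__index = FlashVideo

function FlashVideo.new(bridge)
  return setmetatable({ bridge = bridge }, FlashVideo)
end

-- Each standard-looking method is simply translated into a bridge call.
function FlashVideo:play()         self.bridge.call("playMedia") end
function FlashVideo:pause()        self.bridge.call("pauseMedia") end
function FlashVideo:seek(t)        self.bridge.call("seekMedia", t) end
function FlashVideo:current_time() return self.bridge.call("getPosition") end

-- UI code written once against the common interface: it works the same
-- whether `video` is a native element wrapper or a FlashVideo.
local function toggle(video, playing)
  if playing then video:pause() else video:play() end
  return not playing
end
```

Because the UI never checks which implementation it holds, swapping Flash for native HLS changes only which object is constructed, not a single line of interface code.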

If the user's browser does not support Flash, the video is played via HTML5 with HLS support (so far this is implemented only in Safari). On Android 4.2+ and iOS, HLS is played natively. If neither Flash nor native HLS is available, we offer the user to download the file.
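The fallback order above can be written as a small decision function. This is a sketch under stated assumptions: the capability flags (`os`, `os_version`, `flash`, `html5_hls`) are hypothetical inputs, since real detection happens in the browser or the app, and only the Android 4.2 threshold and the ordering come from the text.

```lua
-- Sketch of the playback fallback order. The `env` fields are
-- hypothetical capability flags; real detection happens client-side.
local function android_supports_hls(version)
  local major, minor = version:match("^(%d+)%.?(%d*)")
  if not major then return false end
  major, minor = tonumber(major), tonumber(minor) or 0
  return major > 4 or (major == 4 and minor >= 2)
end

local function playback_mode(env)
  if env.os == "ios"
     or (env.os == "android" and android_supports_hls(env.os_version)) then
    return "native-hls" -- system player on mobile
  elseif env.flash then
    return "flash-hls"  -- HTML interface, Flash only renders video
  elseif env.html5_hls then
    return "html5-hls"  -- e.g. Safari
  end
  return "download"     -- no supported way to stream: offer the file
end
```

For example, a desktop browser with Flash gets `"flash-hls"`, Safari without Flash gets `"html5-hls"`, and an Android 4.1 device falls through to `"download"`.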

If you have experience implementing video playback, please share it in the comments: I would be curious to know how you solved the problem of splitting video into fragments, how you chose between storage and caching, and what other challenges you faced. In short, let's share our experience.

Source: https://habr.com/ru/post/272769/
