Video is one of the most popular services on Odnoklassniki. Our users upload everything imaginable: from cute scenes at a children's matinee to crashes recorded by a dashcam. Fast and stable video upload is therefore important to us, not only as one of the most heavily used features, but also as a prerequisite for generating content.
What's the problem, you ask? Set up servers with large disks, configure the balancer, and off you go. However, an experienced video ninja knows that there is a whole host of problems here:
- during upload, the user may lose the connection to our portal (closed the laptop lid, carried the tablet into an elevator, ran out of phone battery, etc.);
- older devices do not support modern upload technologies (and we have millions of users with weak smartphones or ancient browsers);
- with as many users as we have, the task of stable video upload turns into the task of stable video upload at huge volumes.
In this article we will talk about how we overcame all these problems, describe the architecture of our solution, and explain why it turned out the way it did.
Numbers decide everything
Every day, Odnoklassniki users upload almost two hundred thousand videos, and together with audio messages (yes, we host audio messages on the same cluster as video), the number of daily uploads exceeds one million.
Every day, 15-20 terabytes of new video arrive on our servers; peak incoming traffic reaches 5 Gbit/s, and outgoing traffic reaches 500 Gbit/s. Most uploaded videos are short clips shot by users on smartphones. At the same time, the web version still leads in total upload volume in terabytes.
The quality of video has also changed. A Full HD camera has long been standard equipment on any mobile device. As old smartphones go out of circulation and new ones are purchased, the share of videos uploaded in high and ultra-high resolution keeps growing. Over the past year, the number of uploaded videos at quality higher than Full HD has more than doubled.
Of course, along with the increase in quality, the size of uploaded video files also grows: with each step up in quality (720p -> 1080p, 1080p -> 1440p, etc.), the amount of disk space occupied by a video increases roughly two and a half times.
Let's look at how video upload is implemented in our clients: in the browser (web version), in the mobile applications (iOS/Android/WinPhone), and on the mobile portal.
Web version
Tens of millions of users. Besides high loads, these three words mean that we have to support the entire browser zoo in existence, from IE 8 to Firefox nightly builds that have not even been released yet.
New browser versions come out more than once a day, and it is clear that not all of our users are in a hurry to update.
Even automatic updates can cause problems. Firefox 40 came out a week ago, and for the millions of our users who received it automatically, photo uploads broke. We quickly fixed everything (this time an incompatibility with the Content Security Policy was to blame), but sometimes a fix takes quite a while.
How did we manage to achieve stable operation of the portal in such a zoo of browsers? Fallback, fallback and fallback again!
The client part of our video uploader is written in JavaScript using RequireJS, jQuery, and the wonderful FileAPI library by RubaXa from Mail.Ru.
FileAPI detects whether the user's browser supports HTML5. If it does, everything is done with HTML5; if not, FileAPI falls back to Flash on its own. The size of the chunk being sent also depends on the user's browser.
For each browser, we experimentally selected a chunk size. On average it is 2 MB, with the range spanning from 100 KB to 10 MB.
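Roughly, the setup can be sketched like this. The browser-detection thresholds, the URL, and the option values are illustrative assumptions rather than our production configuration; chunkSize and chunkUploadRetry are FileAPI's chunked-upload options as we use them here for illustration.

```javascript
// A minimal sketch, not production code: pick a chunk size from the user agent
// and hand it to FileAPI's chunked upload. The thresholds are made up for illustration.
function pickChunkSize(ua) {
  if (/MSIE [89]\./.test(ua))    return 100 * 1024;        // ancient IE: tiny chunks
  if (/Android [23]\./.test(ua)) return 500 * 1024;        // weak mobile browsers
  if (/Chrome|Firefox/.test(ua)) return 10 * 1024 * 1024;  // modern desktop browsers
  return 2 * 1024 * 1024;                                  // default: 2 MB
}

FileAPI.upload({
  url: '/upload.do',
  files: { video: file },                        // `file` comes from drag & drop or an <input>
  chunkSize: pickChunkSize(navigator.userAgent), // split the file into chunks of this size
  chunkUploadRetry: 3,                           // retry a failed chunk a few times
  complete: function (err, xhr) { /* ... */ }
});
```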
File upload is made as convenient as possible for the user. The gentleman's set of features is supported: drag & drop, simultaneous upload of multiple files, automatic resume, a pause button, a progress indicator, and so on.
From the user's point of view, all this wealth looks like this: while the upload is in progress, the user can enter a title for the video, fill in a description, select a video channel, set tags, and pick a cover. And, of course, browse the portal in parallel: listen to music, share pictures, and chat with friends in private messages.
On top of that, FileAPI imposes no restrictions on the size of uploaded files. If you like, you can even upload a terabyte file, say an hours-long video in 4K resolution. The only catch is that the user will have to wait quite a while for the upload to finish, even on a wide channel.
We paid a lot of attention to automatic upload resumption. Of course, for users with a reliable broadband connection this is not so relevant, but far from everyone is in such comfortable conditions, as the distribution of video uploads across the regions of Russia shows.
The logic of the process is very simple: FileAPI sends one chunk to the upload server and waits for a response. If no answer arrives within a certain time, the client sends the chunk again, and again, until the server confirms successful receipt. In its time, switching to FileAPI let us cut the number of upload errors several times over.
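A minimal sketch of this "resend until acknowledged" loop, written with a raw XMLHttpRequest for clarity; the real protocol lives inside FileAPI, and the URL, timeout, and Content-Range usage here are assumptions.

```javascript
// Resend the same chunk until the server confirms it. Everything except the
// "retry on silence" idea is illustrative.
function uploadChunkReliably(url, chunk, offset, onDone) {
  var xhr = new XMLHttpRequest();
  xhr.open('POST', url, true);
  xhr.timeout = 30000;                                     // assumed wait before retrying
  xhr.setRequestHeader('Content-Range',
    'bytes ' + offset + '-' + (offset + chunk.size - 1) + '/*');
  xhr.onload = function () {
    if (xhr.status === 200) onDone();                      // server confirmed receipt
    else uploadChunkReliably(url, chunk, offset, onDone);  // resend the same chunk
  };
  xhr.ontimeout = xhr.onerror = function () {
    uploadChunkReliably(url, chunk, offset, onDone);       // no answer: resend the same chunk
  };
  xhr.send(chunk);
}
```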
Mobile devices
If we talk about the number of video uploads by platform, the picture is as follows:
Mobile users can upload videos in two ways: from the browser and from the app. In mobile browsers, the upload resumes after a network failure; in the mobile apps it resumes even after the phone reboots (for example, when the battery dies).
The client part of the mobile uploader first tries to send the entire file from start to finish in one go using XMLHttpRequest.send(). If the connection breaks, the uploader attempts to reconnect to the server, polling the upload status. Once it gets a response from the server, the uploader uses Blob.slice() to obtain the block of data that follows the last successfully uploaded byte.
However, some devices have trouble sending a block sliced straight from the file, so FileReader.readAsArrayBuffer() is used instead: the data block is read into an array, which is then sent to the server. We use blocks of no more than one megabyte so as not to crash the browser.
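Under those assumptions, the resume path looks roughly like the sketch below. The status endpoint, its plain-text byte-count response, the Content-Range usage, and the function names are illustrative, not our actual mobile code.

```javascript
// Ask a (hypothetical) status endpoint for the last byte the server has,
// slice the file from there, read the slice into memory, and send it.
function resumeUpload(file, uploadUrl, statusUrl) {
  var status = new XMLHttpRequest();
  status.open('GET', statusUrl, true);                 // poll how many bytes the server already has
  status.onload = function () {
    var lastByte = parseInt(status.responseText, 10) || 0;
    var chunk = file.slice(lastByte, lastByte + 1024 * 1024); // Blob.slice(), blocks of at most 1 MB

    var reader = new FileReader();
    reader.onload = function () {                      // readAsArrayBuffer for devices that cannot send the Blob directly
      var send = new XMLHttpRequest();
      send.open('POST', uploadUrl, true);
      send.setRequestHeader('Content-Range',
        'bytes ' + lastByte + '-' + (lastByte + chunk.size - 1) + '/' + file.size);
      send.send(reader.result);                        // the ArrayBuffer goes to the server
    };
    reader.readAsArrayBuffer(chunk);
  };
  status.send();
}
```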
If the browser supports the HTML5 Web Worker API, the upload runs in a separate thread, which helps keep the interface responsive.
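For illustration only, the hand-off to a worker might look like this; worker-upload.js, the message format, and the helper functions are hypothetical, not our production code.

```javascript
// Offload the upload loop to a Web Worker so the UI thread stays free.
if (window.Worker) {
  var worker = new Worker('worker-upload.js');          // assumed worker script
  worker.postMessage({ file: file, url: '/upload.do' }); // File objects can be passed to workers
  worker.onmessage = function (e) {
    updateProgressBar(e.data.loaded / e.data.total);     // hypothetical progress callback
  };
} else {
  uploadInMainThread(file);                              // fallback for browsers without workers
}
```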
To show upload progress to the user, we use the XMLHttpRequest.onprogress event. Across browsers this event is implemented horribly: some fire thousands of events per second, others do not fire it at all. For the former, throttling is implemented and most of the events are ignored; for the latter, the server is polled periodically (every 5 seconds).
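Both workarounds can be sketched like this (for uploads the progress events arrive on xhr.upload); the 200 ms throttle window, the status URL, and the helper functions are assumptions.

```javascript
// `xhr` is the request carrying the upload; statusUrl, fetchUploadedBytes()
// and updateProgressBar() are assumed helpers.
var xhr = new XMLHttpRequest();
var lastShown = 0;

xhr.upload.onprogress = function (e) {        // upload progress events arrive on xhr.upload
  var now = Date.now();
  if (now - lastShown < 200) return;          // throttle: ignore most of the events
  lastShown = now;
  updateProgressBar(e.loaded / e.total);
};

// Fallback for browsers that never fire progress events: poll the server every 5 seconds.
var poller = setInterval(function () {
  fetchUploadedBytes(statusUrl, function (loaded) {
    updateProgressBar(loaded / file.size);
  });
}, 5000);
```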
Video upload backend. Architecture
Due to the constant growth in load, we are continuously expanding and modernizing the infrastructure of our video service. Today the video service comprises:
- 240 servers for storing user videos, carrying more than 7,000 disks with a total capacity of about 30 petabytes;
- another 36 machines for metadata and cache;
- another 150 servers responsible for transcoding videos into our internal formats;
- and finally, another 36 machines for video delivery and upload.
Thus, our video service consists of more than 400 machines. The hardware is mostly quite ordinary, something like two E5-2620 processors with 64 GB of RAM, although the transcoding servers have more powerful CPUs, and the delivery servers carry 256 GB of RAM and 10 TB of SSD. The architecture scales well and lets us get by with these cheap and, frankly, far from new processors, which saves us a lot on hardware. The economics are simple: every $1,000 saved on the price of a server translates into half a million dollars saved across the entire video service. And if we talk about the portal as a whole, where we now have about 7,000 servers, the picture gets even nicer :)
The architecture of the video upload backend is built around the following basic requirements:
- uploads can be resumed within a few days after a lost connection;
- the service keeps working through software errors and hardware failures;
- uploaded data is guaranteed to survive a server failure or the loss of an entire data center;
- high performance.
Video upload in Odnoklassniki is handled by a subsystem of 6 servers distributed across three data centers. Each pair of upload servers in a DC sits behind a cluster of LVS servers (Linux Virtual Server, a Linux kernel module that distributes IP traffic across any number of physical servers).
Requests are balanced between the DCs by DNS-GSLB (Global Server Load Balancing): servers that resolve a domain name to the IP of the least loaded / available data center. If one of the data centers fails, DNS-GSLB evenly redistributes the load over the remaining ones.
Consider the video upload process:
- the user initiates an upload, and the server gives them the upload URL: http(s)://vu.mycdn.me/upload.do?...;
- DNS-GSLB then resolves the domain name vu.mycdn.me to the IP of one of the LVS clusters (ip1 in the figure);
- DNS-GSLB partitions users by the first three octets of their IP addresses, so on repeated DNS resolutions the same user receives the IP of the same data center. The IP of another DC is handed out only if that DC fails or the user's IP changes, which rarely happens;
- inside the DC, the LVS in turn directs requests to a less loaded, available server;
- IP affinity is configured on the LVS, which ensures that all of a user's requests land on a single server.
Thus we get a stable route over which the user uploads data to a specific upload server.
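As a toy model (not the real GSLB/LVS configuration), the two levels of stickiness can be expressed like this; the hashing scheme and the data-center names are assumptions made for illustration.

```javascript
// DC is chosen from the /24 prefix of the client's IP; within the DC,
// the upload server is pinned per client IP (IP affinity).
const DATACENTERS = ['dc1', 'dc2', 'dc3'];                   // assumed names

function hashString(s) {
  return [...s].reduce((h, c) => (h * 31 + c.charCodeAt(0)) >>> 0, 0);
}

function pickDatacenter(clientIp, liveDcs = DATACENTERS) {
  const prefix = clientIp.split('.').slice(0, 3).join('.');  // first three octets
  return liveDcs[hashString(prefix) % liveDcs.length];       // same prefix -> same DC while it is alive
}

function pickUploadServer(clientIp, serversInDc) {
  return serversInDc[hashString(clientIp) % serversInDc.length]; // IP affinity: same client -> same server
}
```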
In parallel with the video being uploaded to the upload server, the received data is sent to distributed storage. For each user, the server opens a session and a buffer for incoming data, 10 MB in size. As soon as the current buffer fills up, two things happen at once (see the sketch after this list):
- a new buffer is opened;
- the old buffer is asynchronously flushed to the intermediate distributed storage.
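A minimal Node-style sketch of the double-buffering idea; the real backend is not written in JavaScript, and flushToStorage() is a stand-in for the actual asynchronous write to intermediate storage.

```javascript
const BUFFER_SIZE = 10 * 1024 * 1024;    // 10 MB per session buffer

class UploadSession {
  constructor(flushToStorage) {
    this.flush = flushToStorage;         // assumed async writer to intermediate distributed storage
    this.buffer = Buffer.alloc(0);
  }

  receive(chunk) {
    this.buffer = Buffer.concat([this.buffer, chunk]);
    if (this.buffer.length >= BUFFER_SIZE) {
      const full = this.buffer;
      this.buffer = Buffer.alloc(0);     // a new buffer starts accepting data immediately
      this.flush(full);                  // the old one is flushed to storage asynchronously
    }
  }
}
```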
The video sits in intermediate storage until the user completes the upload and the video has been fully processed. A partially uploaded video is kept there for several days; a fully uploaded video is passed on for processing and lands in permanent storage, which is, of course, also distributed.
If one of the upload servers inside a DC drops out of rotation, LVS transfers all requests to the available upload server within that DC. The available upload server restores the state of the client session from the data in distributed storage, and the user continues the upload right away, in the worst case re-sending the last 10 MB.
When a user is switched to another server, a "hole" may appear in the sequence of uploaded bytes. In that case the server returns a special error code 416 ("Range Not Satisfiable", a recoverable error) with the X-Last-Known-Byte header. If the client supports this header, it resumes the upload from the position the header indicates; if not, it steps back one chunk.
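On the client side, handling this recoverable 416 might look like the sketch below; only the status code and the X-Last-Known-Byte header come from the protocol described above, while resendFrom(), currentOffset, and chunkSize are hypothetical names.

```javascript
// Hypothetical handler for the response to a chunk upload.
function handleUploadResponse(xhr, currentOffset, chunkSize) {
  if (xhr.status === 416) {                                  // the server detected a "hole"
    var lastKnown = xhr.getResponseHeader('X-Last-Known-Byte');
    if (lastKnown !== null) {
      resendFrom(parseInt(lastKnown, 10) + 1);               // resume right after the last byte the server has
    } else {
      resendFrom(currentOffset - chunkSize);                 // a client unaware of the header steps back one chunk
    }
  }
}
```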
If an entire DC fails (a rarer event than a single server crashing), the client upload library (for example, FileAPI) will keep retrying against the IP of the LVS in the failed DC for the duration of its retry window. All new uploads will go through the remaining available DCs.
Backend fault tolerance
We check the resilience of our service against three basic scenarios:
- a data center failure;
- disk overload;
- traffic overload.
Strictly speaking, we should also test a CPU overload scenario, but in our case there is plenty of CPU headroom, so we do not seriously consider it at the moment.
The incoming-traffic overload scenario is also unrealistic for us. The channels we lease in the data centers are symmetric, and since our outgoing (viewing) traffic exceeds our upload traffic by roughly 100 times, we do not seriously treat upload overload as a threat either. Our main protection here is a kill switch that cuts off portions of users from uploading videos. We have 256 such portions (partitions), so we can regulate the number of users who can upload (or, conversely, view) video in increments of less than 0.4%.
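An illustrative sketch of such a partition-based switch, assuming users are mapped to 256 partitions by user ID and a set of enabled partitions lives in configuration; the real mapping and storage are not described in the article.

```javascript
const PARTITIONS = 256;

function userPartition(userId) {
  return userId % PARTITIONS;                   // 1/256, roughly 0.4% of users per partition
}

function uploadAllowed(userId, enabledPartitions /* Set of partition numbers */) {
  return enabledPartitions.has(userPartition(userId));
}

// Disabling one partition cuts off roughly 0.4% of users from uploading.
```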
System behavior during the failure of any data center is regularly tested by simulation. In the simplest case, we simply stop our applications on the servers in one of the DCs and watch how the service performs on the two DCs left in the fight.
We also get bursts of user activity, when people start uploading videos en masse. This usually happens on holidays and during sports and cultural events.
Trouble with spindles
This year, we had a chance to evaluate the degree of resiliency of our service.
After the Victory Day parade on May 9, uploads of parade and fireworks videos spiked sharply. We had not expected a three-fold increase in peak traffic, so the disk subsystem of the temporary storage very quickly hit 100% load. The upload servers began to get errors when trying to flush chunks to temporary storage: the storage simply did not respond. Each session has a buffer that receives incoming data, and buffers eat memory, so the RAM reserved for sessions on the upload servers also quickly ran out...
In other words:
- the client uploader wanted to send a chunk;
- there was no room in the session buffer;
- the server returned a recoverable error to the client.
The client uploader kept retrying until the upload server had a free buffer again, that is, until it finally managed to flush a buffer to distributed storage.
For about two hours, the upload servers dumped the received data at the maximum sequential write speed of the temporary storage disks. The problem was cushioned by the session buffers on the upload servers; from time to time client uploaders got an error and resumed the upload after a pause.
The outcome: overall upload speed on the portal dropped by roughly half, but the service did not fall over, and all users eventually got their videos uploaded.
Conclusions
- if you need to work in many browsers, use ready-made solutions with built-in fallbacks: FileAPI for file uploads, Atmosphere or Pusher for push notifications, and so on;
- for a better balance between performance and cross-browser compatibility, it makes sense to vary the chunk size depending on the client browser. Our default chunk size is 2 MB, but there are problematic browsers for which it has to be reduced to 500 KB or even 100 KB;
- it makes sense to test virtually any high-load system for fault tolerance under several different scenarios, including simulated outages;
- and do not forget about mobile devices: mobile traffic on major portals long ago overtook desktop traffic.