How Yandex.Disk works: the uploader

We have already described why we chose the WebDAV protocol, along with the problems that arose on the server side and how we solved them.

Today's topic is how the service's file uploader is organized, and what to keep in mind when writing one for a service of Yandex.Disk's scale.

First, consider the architecture of the service as a whole. At its core is mpfs, the Magic Proxy File System. This backend contains all the business logic for working with files and folders: every operation that copies or creates files goes through it. The same system is also responsible for storing metadata.

Synchronization with clients happens over the WebDAV protocol. We built clients for Windows, OS X and mobile platforms on top of it, but any application that supports the protocol can be configured to synchronize over WebDAV.

Through the web interface, users have access to the same functions as in the clients: uploading, downloading and viewing files. Other Yandex services interact with Disk through our internal API; at the moment these are Narod, Mail, Music and Browser. For example, the Yandex.Music mobile client can access audio files uploaded to Disk.

Kladun

But back to our main topic, the uploader. Internally it goes by the code name "Kladun". For a long time we could not find a suitable Russian counterpart for the uploader/downloader pair, but in the end we settled on Kladun and Zaburun.

As you might guess, Kladun's main task is to upload files and place them, together with their metadata, in storage. It all starts when a user adds a file to a Disk folder through a client or the web interface. A request goes to the server, which passes it to mpfs. mpfs returns a link to a specific machine in the uploader cluster, and the file is uploaded to that machine. From there the file goes to storage, the upload status is reported to mpfs, and mpfs saves all the metadata. Each uploader machine has its own local queue, which makes the upload process more reliable: if a data center or any single machine in the cluster goes offline, files keep uploading on all the other machines, and the machines that went down resume their uploads as soon as they restart.

Besides its most obvious function, taking a file from the user and putting it in storage, the uploader can also transfer files between Yandex services.
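
To illustrate why a per-machine local queue lets uploads continue after a restart, here is a minimal sketch in Python. The article does not describe how Kladun stores its queue, so the `LocalQueue` class, the JSON file format and the task names are all assumptions made for illustration.

```python
import json
import os
import tempfile


class LocalQueue:
    """Hypothetical file-backed task queue kept on a single uploader machine."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.tasks: list[str] = []
        if os.path.exists(path):
            # After a restart, reload whatever was still pending.
            with open(path, encoding="utf-8") as f:
                self.tasks = json.load(f)

    def _save(self) -> None:
        # Write to a temporary file and rename it, so an interrupted
        # write does not corrupt the existing queue file.
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.tasks, f)
        os.replace(tmp, self.path)

    def push(self, task: str) -> None:
        self.tasks.append(task)
        self._save()

    def done(self, task: str) -> None:
        self.tasks.remove(task)
        self._save()


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "queue.json")
    queue = LocalQueue(path)
    queue.push("upload:report.pdf")
    queue.push("upload:photo.jpg")
    queue.done("upload:report.pdf")

    # Simulate a restart: a fresh object reads the same file.
    restarted = LocalQueue(path)
    print(restarted.tasks)  # ['upload:photo.jpg']
```

Because the queue lives on the machine itself, losing one machine affects only its own pending tasks, and those tasks are still there when it comes back.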

Task stages

Each task the uploader performs is divided into several stages. First, once the file has been received from the user or a service, its checksums are calculated. They are used later, during synchronization, to track changes. Because Kladun supports uploading changes as patches, each file is split into blocks, and a separate checksum is generated for each block. If the file is edited slightly after upload, it is not re-uploaded in full during synchronization: only the blocks whose checksums no longer match are transferred. Version control systems use something similar.
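
The block-checksum idea can be sketched as follows. This is an illustration only: the article does not say which hash function or block size Kladun uses, so SHA-256, the tiny `BLOCK_SIZE` and the function names are assumptions. Fixed-size blocks as used here also only detect in-place edits; an insertion that shifts the rest of the file would change every later block.

```python
import hashlib

BLOCK_SIZE = 4  # tiny on purpose, for illustration; a real service would use far larger blocks


def block_checksums(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Return one SHA-256 hex digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: list[str], new: list[str]) -> list[int]:
    """Indexes of blocks in the new version that differ from the old one."""
    return [
        i for i, digest in enumerate(new)
        if i >= len(old) or old[i] != digest
    ]


old = block_checksums(b"aaaabbbbccccdddd")
new = block_checksums(b"aaaaXXXXccccdddd")
print(changed_blocks(old, new))  # [1]
```

Only block 1 has a different checksum, so only that block would need to be sent again.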

Checksums also make it possible to avoid re-uploading files that other users have already uploaded. For example, if you upload a popular video or installer and Disk storage already holds a file with the same checksum, that file is used instead. Even very large files can therefore land in storage within a couple of seconds. Both sides benefit: the user gets a very fast upload, and we save resources. In total, duplicates account for about 12 percent of all uploaded files.
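
A minimal sketch of checksum-based deduplication, assuming the checksum is known before the file body is transferred. The `Storage` class, its fields and the use of SHA-256 are hypothetical; the article does not describe the real storage interface.

```python
import hashlib
from typing import Callable


class Storage:
    """Hypothetical content-addressed storage: one stored copy per checksum."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}      # checksum -> file content
        self.owners: dict[str, set[str]] = {}  # checksum -> users who have the file

    def upload(self, user: str, checksum: str, read_body: Callable[[], bytes]) -> bool:
        """Register a file for user; return True only if the body was transferred."""
        if checksum in self.blobs:
            # Already stored: link the existing copy and skip the transfer.
            self.owners[checksum].add(user)
            return False
        data = read_body()
        if hashlib.sha256(data).hexdigest() != checksum:
            raise ValueError("checksum mismatch")
        self.blobs[checksum] = data
        self.owners[checksum] = {user}
        return True


video = b"popular video bytes"
digest = hashlib.sha256(video).hexdigest()
storage = Storage()
print(storage.upload("alice", digest, lambda: video))  # True: first copy is transferred
print(storage.upload("bob", digest, lambda: video))    # False: existing copy is reused
print(len(storage.blobs))                              # 1
```

The second upload never reads the body, which is why a duplicate of a very large file can appear in storage almost instantly.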

Once the file has been uploaded to the uploader machine and its checksums calculated, the file has to be moved to storage and the upload status sent to mpfs. From this point on the file is available to the user: it appears in the web interface and can be synchronized by the clients. But the uploader's work does not end there. Two post-processing stages follow: preview generation and a virus check.

Each stage passes through a specific set of states. All stages start in the initial state. Some stages take very little time and move almost immediately to the successfully completed state (success). But failures can happen on the road to success, and they come in two kinds: temporary and final. For example, when the network fails while data is being sent to mpfs, or one of the data centers stops responding, the stage goes into the temporary failure state (temp fail) and the request is repeated after some time. If the stage still has not completed after a set number of retries, it goes into the final failure state (fail).
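
The state transitions above can be sketched as a small retry loop. The state names follow the article; the `TemporaryError` exception, the `run_stage` function and the retry limit are assumptions for illustration, and a real uploader would wait between attempts rather than retry immediately.

```python
from enum import Enum
from typing import Callable


class State(Enum):
    INITIAL = "initial"
    SUCCESS = "success"
    TEMP_FAIL = "temp fail"
    FAIL = "fail"


class TemporaryError(Exception):
    """Hypothetical marker for failures worth retrying (network, data center)."""


def run_stage(action: Callable[[], None], max_attempts: int) -> list[State]:
    """Run one stage and return the sequence of states it passed through."""
    history = [State.INITIAL]
    for _ in range(max_attempts):
        try:
            action()
        except TemporaryError:
            history.append(State.TEMP_FAIL)  # retry on the next loop iteration
            continue
        except Exception:
            history.append(State.FAIL)       # any other error is treated as final
            return history
        history.append(State.SUCCESS)
        return history
    history.append(State.FAIL)               # retries exhausted
    return history


calls = {"n": 0}


def flaky_send() -> None:
    # Fails twice with a temporary error, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise TemporaryError("mpfs unreachable")


print([s.value for s in run_stage(flaky_send, max_attempts=5)])
# ['initial', 'temp fail', 'temp fail', 'success']
```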

If a failure occurs in a stage that does not touch the network and runs locally, the stage goes straight to fail, since running it again will most likely change nothing. Preview generation is such a case. It is a fairly simple and well-understood process, so if something went wrong, we most likely have not yet learned to build previews for that file type, and repeating the stage is pointless. Whether a stage is retried, and after how many retries it finally moves to fail, is configured for each stage separately.

Whether the failure of a particular stage fails the entire task depends on whether that stage is marked as mandatory. As mentioned above, preview generation does not always succeed. But a preview in the web interface is not critical for users; what matters is that they can access their files, which means the main task is done. If, however, the failure occurs at a more important stage, such as moving the file to storage, the entire task fails. In that case the desktop client retries on its own, while a user uploading through the web interface sees a failure message and can try to upload the file again.
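
The per-stage settings and the rule for failing a task could be sketched like this. The `StageConfig` fields, the stage names and the numeric values are illustrative assumptions; the article gives only the general principle, not the real configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    name: str
    max_attempts: int  # 1 means the stage is never retried
    mandatory: bool    # True if failure of this stage fails the whole task


# Illustrative values only; the article does not give real settings.
STAGES = [
    StageConfig("store_file", max_attempts=5, mandatory=True),
    StageConfig("preview", max_attempts=1, mandatory=False),
]


def task_succeeded(stages: list[StageConfig], failed: set[str]) -> bool:
    """A task fails only if at least one mandatory stage ended in fail."""
    return not any(s.mandatory and s.name in failed for s in stages)


print(task_succeeded(STAGES, failed={"preview"}))     # True
print(task_succeeded(STAGES, failed={"store_file"}))  # False
```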

In addition, each stage has a maximum execution time. A stage that takes a long time to complete may go into the “in progress” state. This usually happens while a file is being received from the user, since uploaded files can be quite large (the maximum file size on Disk is 10 GB) and the connection may be slow.

Just do not disconnect!

Anything can happen while a file is being transferred to the uploader machine. For example, the user's connection may drop. In that case the stage status is set to temp fail, and once the connection is restored, the upload resumes.
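
One way resuming could work is for the server to remember how many bytes it has received, so the client continues from that offset instead of starting over. The article does not describe the actual mechanism; the `PartialUpload` class, the `send` function and the simulated connection drop below are assumptions made purely for illustration.

```python
from typing import Optional


class PartialUpload:
    """Hypothetical server-side record of the bytes received so far for one file."""

    def __init__(self, total_size: int) -> None:
        self.total_size = total_size
        self.received = bytearray()

    @property
    def offset(self) -> int:
        return len(self.received)

    @property
    def complete(self) -> bool:
        return self.offset == self.total_size

    def append(self, offset: int, chunk: bytes) -> None:
        if offset != self.offset:
            raise ValueError("client must resume from the server's offset")
        self.received += chunk


def send(upload: PartialUpload, data: bytes, chunk_size: int,
         drop_after: Optional[int] = None) -> None:
    """Send data starting at the server's offset; optionally simulate a lost connection."""
    sent_chunks = 0
    while not upload.complete:
        if drop_after is not None and sent_chunks >= drop_after:
            raise ConnectionError("connection lost")
        start = upload.offset
        upload.append(start, data[start:start + chunk_size])
        sent_chunks += 1


data = b"0123456789"
upload = PartialUpload(len(data))
try:
    send(upload, data, chunk_size=4, drop_after=1)  # connection drops after one chunk
except ConnectionError:
    pass
print(upload.offset)                   # 4
send(upload, data, chunk_size=4)       # resume from byte 4
print(bytes(upload.received) == data)  # True
```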

Source: https://habr.com/ru/post/186752/
