
Last week, Yandex.Disk turned one year old, and in that first year more than 8,000,000 users have tried the service.
And now we continue the story of how much effort it took to make all this possible. Recently, we wrote about
how and why the Yandex.Disk team chose WebDAV for synchronizing desktop clients with the server and began work on a prototype Yandex.Disk client. Today, as promised, we will look at how everything works on the server side.

For proper synchronization, you need not only the ability to upload files, but also to resume an upload after an interrupted connection, and to teach the client to take changes in files into account.
Obviously, when the connection to the server is interrupted and then restored, the client should be able to resume uploading the file. Two parameters clearly have to be taken into account here: the file name and its size. But for us they were not enough: several clients can work with the storage at the same time, and a file can be updated concurrently. So we needed one more parameter.
By that time, we had already started developing a synchronization module that computed the md5 hash of a file's contents as part of its work, and we decided to use it as that extra distinguishing parameter. First, the client always had this information anyway, so using the md5 hash added no extra load on it. Second, it is better than any parameter that does not depend on the file's contents: it makes it possible to verify that the sent and received files are identical.
Before sending a file to the server, the client computes its hash. It then uploads the file with the PUT method, passing the hash to the server in the Etag HTTP header. On receiving such a request, the server records the size of the uploaded file and its md5 in a special table of incomplete uploads. If all the content is uploaded normally, the server computes the md5 of the received file and compares it with the one sent by the client: if they match, the file was received correctly and can be saved.
If there were problems with the connection (it was closed, or timed out), the server had to record the number of bytes actually received in that table and log the failed request in the access.log. We used the
mochiweb web server as a framework, and while handling broken connections we ran into one of its quirks. The library reacted to any error by calling `exit(normal)`, i.e. by terminating the process "silently". That is fine if nginx sits in front of it to log requests and nothing else needs to be done about such a disconnect. You can, of course, catch this exception, but then the only way to figure out which of the possible problems occurred is, perhaps, by looking for known functions in the stack trace. That can hardly be called a normal approach, so we had to patch the library to report more meaningful errors.

When the connection breaks, the client cannot rely on its own count of how many bytes of the file actually reached the server. So we had to make another revision of the protocol: we extended the HEAD method so that the client can request this information, passing the server the path the file was being uploaded to, its size, and its md5. The server searches the user's incomplete uploads for an entry with the same parameters and tells the client how many bytes were actually received. After that, the client resumes the upload from the position reported by the server, using a special request, a new extension of the PUT method.
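The server-side lookup behind this extended HEAD can be sketched like this. The table layout and function names are assumptions for illustration; the text above only fixes the key (path, size, md5) and the stored value (bytes actually received).

```python
# Table of incomplete uploads: keyed by the path, the declared file
# size, and the client-declared md5; the value is how many bytes of
# the body actually reached the server before the connection broke.
incomplete_uploads = {}  # (path, size, md5) -> bytes_received

def record_partial(path, size, md5, bytes_received):
    """Called when an upload is interrupted: remember how far it got."""
    incomplete_uploads[(path, size, md5)] = bytes_received

def resume_offset(path, size, md5):
    """Answer the client's extended HEAD request: the byte offset to
    resume from. 0 means no matching partial upload was found, so the
    client starts over from the beginning."""
    return incomplete_uploads.get((path, size, md5), 0)
```

Because the md5 is part of the key, a partial upload left by one client cannot be accidentally continued with different content by another.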
In addition to simply resuming uploads, we wanted to apply binary file patches (delta updates) the way rsync does, while minimizing the load these operations put on the server. We split a file into blocks, for each of which a fast and a strong signature are computed. The rolling-checksum method for computing the fast signatures was borrowed from
rsync. Block signatures are used to find matching parts of a file that do not need to be sent over the network. The combination of the block size, the block signatures, and the file's md5 is called the file digest. For the client to determine which parts of an updated file it needs to download from or send to the server, it needs the digest of the file stored on the server. To do this, we again had to extend the protocol, this time with a digest method.
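The point of the rolling checksum is that sliding the block window one byte forward costs O(1) instead of rehashing the whole block. A sketch of the rsync-style weak signature, with illustrative function names (the real signature layout in Yandex.Disk is not specified in the text):

```python
M = 1 << 16  # modulus used by the rsync weak checksum

def weak_signature(block):
    """rsync-style weak checksum of a block: two 16-bit sums packed
    into one 32-bit value."""
    n = len(block)
    a = sum(block) % M
    b = 0
    for i, x in enumerate(block):
        b = (b + (n - i) * x) % M
    return (b << 16) | a

def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte forward: drop out_byte on the left,
    add in_byte on the right, updating both sums in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b
```

The client slides this cheap checksum over its local file; only on a weak-signature match does it compute the strong signature to confirm that a block really is identical and can be skipped over the network.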
As for the digests themselves returned by the server, we did not want to slow down synchronization by computing them on request, so we decided to precompute them and store them on the server.
To begin with, we tried computing digests while streaming files in Erlang. This looked like it would reduce overhead: a portion of the data was already in memory, and handing it to the digest-computing module seemed cheap. Unfortunately, because of how Erlang manages memory, this turned out to be wrong: the data was copied to the driver that computed the hashes, intermediate results were copied back to the handler process, and then everything was sent to the driver again. That was too resource-intensive, and we did not want to develop a specialized driver that would keep all intermediate state inside and never pass it back to Erlang. The alternative was to write the file to disk as usual and, once it was fully received, compute the digest with a separate program written in C and launched from Erlang as a port. We took this approach and cut the digest computation time by a factor of 10.
For delta updates, the standard PUT method on the server was extended to accept a binary diff and apply it to the source file. Only two commands are defined in this diff: copy a part of the source file, and insert a part that came from the client. The server performs only these simple operations; all the heavy analysis of file changes happens on the client side.
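Applying such a two-command diff is a simple linear pass. The command names and tuple layout below are illustrative, not the actual wire format, but they show why the server's share of the work is so cheap:

```python
def apply_diff(source, commands):
    """Rebuild the new file from the old one plus a two-command diff.

    commands is a sequence of either
      ("copy", offset, length) -- take bytes from the source file, or
      ("insert", data)         -- paste bytes sent by the client.
    """
    out = bytearray()
    for cmd in commands:
        if cmd[0] == "copy":
            _, offset, length = cmd
            out += source[offset:offset + length]
        elif cmd[0] == "insert":
            out += cmd[1]
        else:
            raise ValueError("unknown diff command: %r" % (cmd[0],))
    return bytes(out)
```

The client decides which regions to copy and which to send; the server never diffs anything, it only splices.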

When a file is updated on the server, the same algorithm for finding identical parts is used. The client may need several parts of the same file, so we added support for Range requests that specify a set of byte ranges, with the response returned as multipart/byteranges, in order to reduce the number of accesses to the file's metadata.
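A multi-range request carries all the needed regions in one Range header, e.g. `Range: bytes=0-99,200-299`, and the server answers once with a multipart/byteranges body. A minimal sketch of parsing such a header (simplified: the open-ended forms `500-` and `-100` allowed by HTTP are omitted here for brevity):

```python
def parse_ranges(header):
    """Parse a multi-range Range header like 'bytes=0-99,200-299'
    into a list of (start, end) pairs, inclusive on both ends as in
    HTTP byte ranges."""
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes":
        raise ValueError("unsupported range unit: %r" % unit)
    ranges = []
    for part in spec.split(","):
        start, _, end = part.strip().partition("-")
        ranges.append((int(start), int(end)))
    return ranges
```

Fetching all ranges in one request means the server resolves the file's metadata once instead of once per fragment.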
Another method needed for synchronization is getting a diff of the file tree, so the client can determine which files have changed on the server. This task differs from ordinary versioning, so the methods proposed by the standard did not suit us, and we had to extend the protocol once more. When a client wants to update its files, it calls this new method, passing the identifier of the version it has synchronized to. The server responds with the identifier of the latest version and the list of changes that have occurred in the file structure (not in the files themselves) since the client's last update. To make this possible, we keep a history of all changes to the file structure for each user.
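The shape of that per-user history can be sketched as an append-only change log keyed by a monotonically growing version number. Everything here (integer versions, string-valued changes, function names) is an assumption for illustration; the text only specifies the request/response contract.

```python
# Append-only history of structural changes for one user: each entry
# pairs a strictly increasing version number with a description of one
# change to the file tree (not to file contents).
history = []  # list of (version, change)

def record_change(change):
    """Append a structural change and return its version number."""
    version = len(history) + 1
    history.append((version, change))
    return version

def changes_since(client_version):
    """Answer the tree-diff method: the latest version identifier plus
    every change the client has not seen yet."""
    latest = len(history)
    missed = [change for version, change in history if version > client_version]
    return latest, missed
```

A client that is already at the latest version gets an empty list back and knows its tree is up to date without walking it.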
Apart from a few minor details, that is everything the WebDAV server in Yandex.Disk does. We are glad we chose this protocol: on the one hand, it met our needs practically out of the box and did not require significant changes; on the other, thanks to it, many utilities and applications were easy to integrate with Yandex.Disk.