Today we are introducing the long-awaited Yandex.Disk client for Linux. One could even say it is "specifically for Habrahabr," since hardly a single post about the Disk here went by without questions about a Linux client.
It has all the basic functionality of the OS X and Windows clients, plus a little more (symlinks!), and one distinctive feature: it is a console application.

Below we describe how to set it up, what exactly it can do, how it works, and what was difficult to build.
You can install it from repo.yandex.ru/yandex-disk. Once the package is installed, the yandex-disk command becomes available in the terminal, and you will use it for all further interaction with the Yandex cloud. Next, run the setup command manually.
The setup wizard lets you interactively choose the folder to synchronize, enable autostart at system boot, configure a proxy server (if you use one) and log in to Yandex.Disk. If you configure the client manually instead, the first step is to log in. A config file is then created in the .config folder in your home directory. In it you can set the path to the sync folder (it can also be specified manually on the command line), set the path to the token file, list the folders that should or should not be synchronized, and enter proxy settings.
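For illustration, here is a minimal C++ sketch of reading such a config file. It assumes a simple key="value" line format with # comments and key names such as dir, exclude-dirs and proxy. Treat the format, the key names and the helpers parseConfig and splitList as hypothetical, since the real file layout may differ.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Parses lines of the assumed form key="value"; blank lines and lines
// starting with '#' are skipped, malformed lines are ignored.
std::map<std::string, std::string> parseConfig(std::istream& in) {
    std::map<std::string, std::string> cfg;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;
        const auto eq = line.find('=');
        if (eq == std::string::npos) continue;
        std::string key = line.substr(0, eq);
        std::string value = line.substr(eq + 1);
        // Drop the surrounding double quotes, if present.
        if (value.size() >= 2 && value.front() == '"' && value.back() == '"')
            value = value.substr(1, value.size() - 2);
        cfg[key] = value;
    }
    return cfg;
}

// Splits a comma-separated value, e.g. a (hypothetical) exclude-dirs list.
std::vector<std::string> splitList(const std::string& s) {
    std::vector<std::string> items;
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, ','))
        if (!item.empty()) items.push_back(item);
    return items;
}
```

A real client would also trim whitespace and validate paths. This sketch only shows how the settings listed above could map onto key/value pairs.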
Commands
With the preparation done, all that remains is to start the daemon with one of the commands. They let you synchronize files and folders and use them anywhere you have Internet access.
- sync starts the daemon, synchronizes everything in the Disk folder, and stops the daemon.
- start does the same but leaves the daemon running after synchronization completes, so any further changes in the Disk folder are synchronized automatically.
- stop stops the running daemon at any time, should it get in your way.
- status reports the state of the sync core.
You can work with the Disk folder both from the terminal and from Nautilus.
What it can do
The console client lets you share a file or folder with the publish command (if the file is not in the Disk folder, it is copied there before publication). The link is printed in the terminal, and anyone who follows it can view or save the file or folder you published. If you published the wrong file by mistake, you can revoke public access to it with the unpublish command.
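The "copy it there first" rule depends on knowing whether a path already lies inside the Disk folder. Below is a minimal C++17 sketch using std::filesystem's lexical path operations. isInside and publishTarget are hypothetical helpers, and placing the copy at the root of the Disk folder is an assumption.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// True if `path` is `root` itself or lies below it. The comparison is purely
// lexical (no symlink resolution) and component-wise, so
// "/home/u/Yandex.Disk2" is NOT treated as inside "/home/u/Yandex.Disk".
bool isInside(fs::path root, const fs::path& path) {
    root = root.lexically_normal();
    if (root.has_relative_path() && root.filename().empty())
        root = root.parent_path();                       // drop a trailing '/'
    const fs::path rel = path.lexically_normal().lexically_relative(root);
    return !rel.empty() && *rel.begin() != fs::path("..");
}

// Where the file would be published from: in place if it is already inside
// the Disk folder, otherwise a copy at the folder root (assumed location).
fs::path publishTarget(const fs::path& syncRoot, const fs::path& file) {
    return isInside(syncRoot, file) ? file : syncRoot / file.filename();
}
```

Comparing whole path components instead of raw string prefixes avoids the classic bug where a sibling folder with a longer name is treated as being inside the sync folder.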
Yandex.Disk supports selective synchronization. The exclude command excludes a folder from synchronization: any changes made in it afterwards are not sent to the cloud. The read-only option lets you change files locally without uploading them to the cloud. If local changes conflict with changes from the cloud, the local versions are kept in renamed files and the cloud changes are synchronized. The overwrite option makes read-only mode overwrite locally modified files rather than keep them.
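An exclusion check like the one the exclude command implies can be sketched in a few lines of C++17. This sketch assumes excluded folders are stored as paths relative to the Disk folder. The helper isExcluded is hypothetical, not the client's actual code.

```cpp
#include <cassert>
#include <filesystem>
#include <vector>

namespace fs = std::filesystem;

// True if `relPath` (relative to the Disk folder) is an excluded directory or
// lies inside one. Paths are compared component by component, so excluding
// "Photos" does not also exclude "PhotosBackup".
bool isExcluded(const fs::path& relPath, const std::vector<fs::path>& excluded) {
    const fs::path p = relPath.lexically_normal();
    for (const auto& dir : excluded) {
        fs::path d = dir.lexically_normal();
        if (d.has_relative_path() && d.filename().empty()) d = d.parent_path();  // "Photos/" -> "Photos"
        if (d.empty()) continue;
        auto di = d.begin();
        auto pi = p.begin();
        while (di != d.end() && pi != p.end() && *di == *pi) { ++di; ++pi; }
        if (di == d.end()) return true;   // every component of the excluded dir matched
    }
    return false;
}
```

The sync core would consult such a check before queuing an upload, so changes under an excluded folder never reach the cloud.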
We can't help boasting about the most interesting new feature of the sync core: it now supports synchronizing symlinks! If you run into difficulties or questions while using the console client, the man page and the help command will help you sort them out quickly and easily.
How it was made
To be able to reuse the code for clients on other operating systems, we decided to write it in C++. We moved platform-specific code into separate functions or classes and wrote a dedicated implementation of them for each platform. Our main cross-platform libraries are Boost, OpenSSL and JsonCpp, and git serves as the version control system. The Linux client is built with autoconf. The code was written and debugged either with KDevelop plus command-line gdb or in Qt Creator, depending on each developer's preference.
Communication with the cloud and synchronization itself are handled by the Yandex.Disk core library, which the service's desktop clients also use.
How it works
The console client consists of two parts: a daemon and a client. They communicate by exchanging text packets containing JSON messages over sockets (unix-domain sockets on Linux and Mac OS X). Asynchronous operation is implemented with the Boost.Asio library. Access to shared data is serialized through boost::asio::io_service::strand, so we don't have to worry about several threads accessing the same data at once, and it also rules out deadlocks.
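A stream socket delivers bytes in arbitrary chunks, so whichever side reads the packets has to reassemble complete messages. The C++ sketch below assumes each JSON packet is terminated by '\n' and carries a "command" field. Both the framing and the field name are hypothetical, and a real implementation would build the JSON with JsonCpp rather than by hand.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical wire format: one JSON object per packet, terminated by '\n'.
// Quotes, backslashes and newlines are escaped so the payload cannot break framing.
std::string encodeCommand(const std::string& name) {
    std::string out = "{\"command\":\"";
    for (char c : name) {
        if (c == '"' || c == '\\') { out += '\\'; out += c; }
        else if (c == '\n') out += "\\n";
        else out += c;
    }
    out += "\"}\n";
    return out;
}

// Accumulates bytes read from the socket and returns every complete packet.
class PacketReader {
public:
    std::vector<std::string> feed(const std::string& chunk) {
        buffer_ += chunk;
        std::vector<std::string> packets;
        std::string::size_type nl;
        while ((nl = buffer_.find('\n')) != std::string::npos) {
            packets.push_back(buffer_.substr(0, nl));  // packet without the delimiter
            buffer_.erase(0, nl + 1);
        }
        return packets;                               // partial data stays buffered
    }
private:
    std::string buffer_;
};
```

With Boost.Asio, the same delimiter-based framing could instead rely on boost::asio::async_read_until with '\n' as the delimiter.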
For localization we use the Boost.Locale library. Text inside the client is encoded in UTF-8 and, when necessary, converted to the encoding specific to each operating system. On Linux, file system monitoring uses inotify, which fits perfectly with Boost.Asio's asynchronous model.
How synchronization works
Synchronization is the heart of Yandex.Disk, its key feature. The task of synchronizing a file tree with the cloud is divided into several independent parts.
1. File system monitoring. The Yandex.Disk sync engine was designed and built as a portable abstraction able to run on all supported platforms. But file system monitoring is provided neither by the C++ standard library nor even by heavyweights like Boost. Moreover, even the operating systems' native APIs deliver a different set of events on each platform.
To monitor the file system, we designed an "observer" interface that watches a specific directory and returns the list of events that occurred in it. The set of these events differs from platform to platform. For example, Mac OS X can only report that something changed in one of the subdirectories, without any details. Windows and Linux, by contrast, report the full set, including creation, deletion, modification and moving of objects. However, experience has shown that events on Windows cannot be trusted, and the most reliable approach there is to list the directory after receiving a notification.
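Here is a minimal sketch of such an observer interface and the Linux side of it. On Linux, events come from inotify (inotify_init1, inotify_add_watch, then read() returning struct inotify_event records with a mask field). The sketch only shows how that mask could be translated into platform-neutral events. The names FsEventKind, FsEvent, DirectoryObserver and fromInotifyMask are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <sys/inotify.h>
#include <vector>

// Platform-neutral event kinds (hypothetical). Rescan means "something changed,
// list the directory yourself", which is all a coarse or unreliable source can offer.
enum class FsEventKind { Created, Deleted, Modified, MovedFrom, MovedTo, Rescan };

struct FsEvent {
    FsEventKind kind;
    std::string name;   // entry name relative to the watched directory
};

// Abstract observer; each platform supplies its own implementation.
class DirectoryObserver {
public:
    virtual ~DirectoryObserver() = default;
    virtual std::vector<FsEvent> poll() = 0;   // events since the previous call
};

// Linux: translate one inotify event mask into neutral event kinds.
std::vector<FsEventKind> fromInotifyMask(std::uint32_t mask) {
    if (mask & IN_Q_OVERFLOW) return {FsEventKind::Rescan};   // kernel queue overflowed, events lost
    std::vector<FsEventKind> kinds;
    if (mask & IN_CREATE)     kinds.push_back(FsEventKind::Created);
    if (mask & IN_DELETE)     kinds.push_back(FsEventKind::Deleted);
    if (mask & (IN_MODIFY | IN_CLOSE_WRITE)) kinds.push_back(FsEventKind::Modified);
    if (mask & IN_MOVED_FROM) kinds.push_back(FsEventKind::MovedFrom);
    if (mask & IN_MOVED_TO)   kinds.push_back(FsEventKind::MovedTo);
    return kinds;             // masks we don't care about (e.g. IN_ACCESS) yield nothing
}
```

A Mac OS X or Windows implementation of DirectoryObserver would mostly emit Rescan, leaving the sync core to list the directory itself, as described above.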
2. Indexing local files and directories. To verify integrity and to support delta updates of files, the Yandex.Disk sync core uses digests: sets of checksums of the whole file and of its individual parts. For the entire file we compute a cryptographically strong SHA-256 hash, plus a set of weaker checksums for individual blocks. Every file in the Yandex.Disk folder that is not on the exclusion list has to be indexed. But computing a SHA-256 hash is a fairly expensive operation, and recomputing the hashes on every launch would be an unforgivable waste of resources. So once a file has been indexed, the sync core stores its digest in a "bank", a special storage located in the Yandex.Disk service directory. Digests are looked up in the bank by a unique file identifier made up of the inode number, the size and the time of the last modification. Unfortunately, this approach is not without flaws: for example, many encrypted container files keep their last modification time unchanged even after being written to.
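The sketch below shows how a digest bank keyed by (inode, size, mtime) could look. Adler-32 stands in for the "weaker" per-block checksum, which is an assumption since the algorithm is not named here. The SHA-256 field is left to be filled elsewhere (for example with OpenSSL). The names Digest, DigestBank, blockChecksums and keyFor are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <sys/stat.h>
#include <sys/types.h>
#include <tuple>
#include <utility>
#include <vector>

// Adler-32, used here as a stand-in weak block checksum.
std::uint32_t adler32(const unsigned char* data, std::size_t len) {
    const std::uint32_t MOD = 65521;
    std::uint32_t a = 1, b = 0;
    for (std::size_t i = 0; i < len; ++i) {
        a = (a + data[i]) % MOD;
        b = (b + a) % MOD;
    }
    return (b << 16) | a;
}

// One weak checksum per fixed-size block; the last block may be shorter.
std::vector<std::uint32_t> blockChecksums(const std::string& content, std::size_t blockSize) {
    std::vector<std::uint32_t> sums;
    const auto* bytes = reinterpret_cast<const unsigned char*>(content.data());
    for (std::size_t off = 0; off < content.size(); off += blockSize)
        sums.push_back(adler32(bytes + off, std::min(blockSize, content.size() - off)));
    return sums;
}

struct Digest {
    std::string sha256;                    // whole-file hash, computed elsewhere
    std::vector<std::uint32_t> blockSums;  // per-block weak checksums
};

// Bank key: inode number, size and last modification time.
using FileKey = std::tuple<ino_t, off_t, time_t>;
FileKey keyFor(const struct stat& st) { return FileKey{st.st_ino, st.st_size, st.st_mtime}; }

class DigestBank {
public:
    const Digest* find(const FileKey& k) const {
        auto it = bank_.find(k);
        return it == bank_.end() ? nullptr : &it->second;   // nullptr: must re-index
    }
    void store(const FileKey& k, Digest d) { bank_[k] = std::move(d); }
private:
    std::map<FileKey, Digest> bank_;
};
```

The weakness mentioned above is visible here: if a program rewrites a file but restores its mtime (and the size stays the same), keyFor produces the same key and the stale digest is returned.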
Apart from the subtleties of handling symbolic links, there is probably nothing of particular interest in listing directories. For synchronization to complete successfully, the core must detect cyclic branches (for example, a symlink pointing to one of its own parent directories) and exclude them from synchronization.
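One common way to detect such cycles is to remember the (device, inode) pairs of the directories on the current path and stop when an entry resolves to one of them. The C++17/POSIX sketch below uses that technique, which is an assumption since the core's actual method is not described. The function walk is hypothetical.

```cpp
#include <cassert>
#include <filesystem>
#include <set>
#include <sys/stat.h>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// A directory is identified by its (device, inode) pair, as returned by stat().
using DirId = std::pair<dev_t, ino_t>;

// Recursively lists `dir`, following symlinks. `ancestors` holds the ids of the
// directories on the current path; an entry that resolves to one of them would
// loop forever, so it is recorded in `cycles` and not descended into.
void walk(const fs::path& dir, std::set<DirId>& ancestors,
          std::vector<fs::path>& files, std::vector<fs::path>& cycles) {
    struct stat st;
    if (::stat(dir.c_str(), &st) != 0) return;           // vanished or inaccessible
    const DirId id{st.st_dev, st.st_ino};
    if (!ancestors.insert(id).second) {                  // already on the current path
        cycles.push_back(dir);
        return;
    }
    for (const auto& entry : fs::directory_iterator(dir, fs::directory_options::skip_permission_denied)) {
        struct stat es;
        if (::stat(entry.path().c_str(), &es) != 0) continue;  // stat() follows symlinks; skips dangling ones
        if (S_ISDIR(es.st_mode))
            walk(entry.path(), ancestors, files, cycles);
        else
            files.push_back(entry.path());
    }
    ancestors.erase(id);                                 // leaving this directory
}
```

Only ancestors count as a cycle. A directory reachable through two different, non-looping links is simply visited twice, which is a policy question rather than an infinite loop.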
In general, symbolic links are a real headache for the sync core. They can point to arbitrary locations in the file system, and the same sync rules cannot be applied to all of them. For example, Mac OS X application bundles very often contain symbolic links to system library directories, and synchronizing those to the cloud would be undesirable, especially between different OS versions. At the same time, being able to synchronize additional directories via symbolic links is a very tempting capability that we did not want to give up.
That is why we introduced a special policy for symbolic links, which lets the core choose a specific sync option for each link depending on where the object it points to is located.
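Such a policy could be expressed as a function from the link's resolved target to an action. The three actions and the rules in this C++17 sketch are hypothetical illustrations of "depending on the location of the target", not the core's actual rules. The target is assumed to be already resolved, for example with std::filesystem::canonical.

```cpp
#include <cassert>
#include <filesystem>
#include <vector>

namespace fs = std::filesystem;

enum class LinkAction { KeepAsLink, FollowTarget, Skip };

// Lexical "is `path` at or below `root`" check; both paths absolute, resolved,
// and without trailing '/'.
bool isUnder(const fs::path& root, const fs::path& path) {
    const fs::path rel = path.lexically_normal().lexically_relative(root.lexically_normal());
    return !rel.empty() && *rel.begin() != fs::path("..");
}

// Hypothetical policy: decide by where the symlink's resolved target lives.
LinkAction chooseLinkAction(const fs::path& syncRoot, const fs::path& target,
                            const std::vector<fs::path>& systemDirs) {
    if (isUnder(syncRoot, target))
        return LinkAction::KeepAsLink;        // target is synced anyway: store the link itself
    for (const auto& sys : systemDirs)
        if (isUnder(sys, target))
            return LinkAction::Skip;          // e.g. system libraries inside app bundles
    return LinkAction::FollowTarget;          // an extra directory the user wants synced
}
```

Keeping the rules in one small function makes it easy to add per-platform system directories without touching the rest of the sync logic.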
3. Obtaining the cloud file system tree. A local file structure and file digests are not enough to solve the synchronization problem: the core also needs the current state of the file system in the cloud. If the sync core had to traverse the whole tree with the WebDAV PROPFIND method every time, each synchronization cycle would take unreasonably long and put unnecessary load on the network. Instead, Yandex.Disk software uses a special API that returns the current state of the cloud file tree, as well as the changes made to it since a given known point identified by the tree's version.
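The idea of fetching only changes since a known tree version can be sketched as applying a delta to a cached snapshot. The shapes RemoteTree and RemoteChange and the function applyDelta are hypothetical, since the actual API is not described here. The key point is that a delta is only valid if it starts from the version the client already has.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical shapes for illustration only.
struct RemoteChange {
    enum class Kind { Upsert, Remove } kind;
    std::string path;
    std::string hash;   // content hash for Upsert, unused for Remove
};

struct RemoteTree {
    std::int64_t version = 0;                   // tree version the snapshot corresponds to
    std::map<std::string, std::string> files;   // path -> content hash
};

// Applies changes that moved the cloud tree from `fromVersion` to `newVersion`.
// Returns false if the delta does not start at our cached version; the caller
// would then fall back to fetching the full tree.
bool applyDelta(RemoteTree& tree, std::int64_t fromVersion, std::int64_t newVersion,
                const std::vector<RemoteChange>& changes) {
    if (fromVersion != tree.version) return false;
    for (const auto& c : changes) {
        if (c.kind == RemoteChange::Kind::Upsert) {
            tree.files[c.path] = c.hash;
            continue;
        }
        // Remove the entry itself and, if it was a directory, everything below it.
        const std::string prefix = c.path + "/";
        for (auto it = tree.files.begin(); it != tree.files.end();) {
            if (it->first == c.path || it->first.compare(0, prefix.size(), prefix) == 0)
                it = tree.files.erase(it);
            else
                ++it;
        }
    }
    tree.version = newVersion;
    return true;
}
```

Each sync cycle then transfers only the changed entries instead of the whole listing that repeated PROPFIND traversals would require.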
4. Receiving notifications about changes in the cloud file system. Real-time synchronization requires timely notification of changes to files in the cloud. Clients could simply poll the server periodically, but after estimating the potential number of clients we concluded that this approach would scale poorly and quickly overload the service infrastructure. After a short search we settled on the XMPP protocol. Yandex has long run its own implementation of it, developed by the team that later built the WebDAV server for Yandex.Disk, so integrating the protocol posed no difficulties.
Today the push notifications processed by the sync core include not only events affecting files and folders in the Yandex.Disk cloud, but also various service messages, for example about extra storage space being granted or about other users' actions in shared folders. Adding these events to the existing protocol was easy thanks to XMPP's extensibility, which once again confirmed that we had made the right choice.
5. Building the list of synchronization operations. Once the sync core has both file trees, local and remote, it can proceed to the synchronization itself. For this it uses a special tree comparison algorithm that takes as input, in addition to these two trees, a third one: the tree as of the last synchronization. The algorithm outputs a list of operations to perform on local and remote files and directories in order to bring the trees to a common state.
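The role of the third tree is to tell who changed what. Without it, a file missing from the cloud could mean either "deleted in the cloud" or "created locally". Below is a simplified C++17 sketch of such a three-way comparison over flat path-to-hash maps. Tree, Change, Operation and planOperations are hypothetical names, and the real algorithm also handles directories and moves. A path changed on both sides yields two operations here, which is exactly the kind of conflict resolved in the next step.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Tree = std::map<std::string, std::string>;  // path -> content hash
enum class Change { Added, Modified, Deleted };

// What changed on one side (local or cloud) relative to the last synced tree.
std::map<std::string, Change> diffAgainstBase(const Tree& base, const Tree& side) {
    std::map<std::string, Change> changes;
    for (const auto& [path, hash] : side) {
        auto it = base.find(path);
        if (it == base.end()) changes[path] = Change::Added;
        else if (it->second != hash) changes[path] = Change::Modified;
    }
    for (const auto& entry : base)
        if (!side.count(entry.first)) changes[entry.first] = Change::Deleted;
    return changes;
}

struct Operation {
    enum class Kind { Upload, Download, DeleteLocal, DeleteRemote } kind;
    std::string path;
};

// Local changes become cloud-side operations; cloud changes become local ones.
std::vector<Operation> planOperations(const Tree& base, const Tree& local, const Tree& remote) {
    std::vector<Operation> ops;
    for (const auto& [path, ch] : diffAgainstBase(base, local))
        ops.push_back({ch == Change::Deleted ? Operation::Kind::DeleteRemote : Operation::Kind::Upload, path});
    for (const auto& [path, ch] : diffAgainstBase(base, remote))
        ops.push_back({ch == Change::Deleted ? Operation::Kind::DeleteLocal : Operation::Kind::Download, path});
    return ops;
}
```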
6. Processing the queue of synchronization operations. The operation lists for the local and remote trees are built independently, so conflicting operations may arise: for example, deleting a file in the cloud that has been modified locally but not yet synchronized, or a file being changed both locally and in the cloud. The core always resolves modification/deletion conflicts in favor of the modification, and resolves double-modification conflicts by renaming one of the file versions. This way we can guarantee that no data is lost and, once synchronization is complete, let the user decide which change suits them better in each particular case.
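The two conflict rules above fit into a small decision function. In this C++ sketch, renaming the local copy (mirroring the read-only behavior described earlier) and the " (tag)" naming scheme are both assumptions. resolve and conflictName are hypothetical names.

```cpp
#include <cassert>
#include <string>

enum class Change { None, Modified, Deleted };   // per-side change since last sync
enum class Resolution { NoConflict, KeepLocal, KeepRemote, KeepBothRenameLocal };

// Modify/delete: the modification wins. Modify/modify: keep both versions,
// renaming one of them (here, by assumption, the local one).
Resolution resolve(Change local, Change remote) {
    if (local == Change::None || remote == Change::None) return Resolution::NoConflict;
    if (local == Change::Modified && remote == Change::Deleted) return Resolution::KeepLocal;
    if (local == Change::Deleted && remote == Change::Modified) return Resolution::KeepRemote;
    if (local == Change::Modified && remote == Change::Modified) return Resolution::KeepBothRenameLocal;
    return Resolution::NoConflict;   // deleted on both sides: nothing to reconcile
}

// Hypothetical naming for the renamed copy; `name` is a bare file name.
// The tag goes before the extension; dot-files and extension-less names get it appended.
std::string conflictName(const std::string& name, const std::string& tag) {
    const auto dot = name.find_last_of('.');
    if (dot == std::string::npos || dot == 0) return name + " (" + tag + ")";
    return name.substr(0, dot) + " (" + tag + ")" + name.substr(dot);
}
```

A real implementation would likely first compare content hashes, since two identical edits on both sides are not a conflict at all.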
Synchronization operations must follow a strict order: a file cannot be transferred until its parent directory has been created, and a directory cannot be deleted while it still contains files that need to be moved elsewhere. The tree comparison algorithm already generates operations in the correct order, but errors can disrupt that order. To guard against this, each operation has a list of dependencies: the set of operations that must complete before it starts, and the set of operations that must not start until it has finished.
Besides dependencies, the order of operations is also affected by their priority. For example, file transfers are ordered by file size, from smallest to largest.
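Dependencies plus a size-based priority amount to a topological sort in which the smallest ready operation always goes next. The C++ sketch below implements that with Kahn's algorithm and a min-heap. The Op structure and scheduleOrder are hypothetical, and the real core runs operations concurrently rather than producing a single sequence.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

struct Op {
    std::string name;
    std::uint64_t size;        // bytes to transfer; 0 for e.g. creating a directory
    std::vector<int> deps;     // indices of ops that must finish first
};

// Kahn's topological sort; among ready operations the smallest goes first
// (ties broken by index). Returns an empty vector if the dependencies form a cycle.
std::vector<int> scheduleOrder(const std::vector<Op>& ops) {
    const int n = static_cast<int>(ops.size());
    std::vector<int> pending(n, 0);                 // unfinished dependencies per op
    std::vector<std::vector<int>> dependents(n);    // ops that wait for op i
    for (int i = 0; i < n; ++i)
        for (int d : ops[i].deps) { ++pending[i]; dependents[d].push_back(i); }

    using Item = std::pair<std::uint64_t, int>;     // (size, index)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> ready;
    for (int i = 0; i < n; ++i)
        if (pending[i] == 0) ready.push({ops[i].size, i});

    std::vector<int> order;
    while (!ready.empty()) {
        const int i = ready.top().second;
        ready.pop();
        order.push_back(i);
        for (int j : dependents[i])
            if (--pending[j] == 0) ready.push({ops[j].size, j});
    }
    if (static_cast<int>(order.size()) != n) return {};   // cycle detected
    return order;
}
```

For example, with a directory creation, two uploads into that directory and one unrelated small upload, the directory is created first and the uploads then follow in ascending size order.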
All these tasks run concurrently, which places additional demands on how parallel processes are synchronized and how resources are distributed within the Yandex.Disk sync core. If you don't have Yandex.Disk yet, you can get it here, and the Linux client is available at repo.yandex.ru/yandex-disk.