
DataSync API from Yandex.Disk: a cloud for applications and structured data

Ordinary “file” cloud storage is a poor fit for synchronizing application data: application authors end up solving too many data-consistency problems themselves. That is why today we are opening up to everyone the DataSync API, a technology the Yandex.Disk team developed for Yandex's own services. It lets you synchronize structured data between cloud storage and devices. The API uses a Yandex login, which almost every Internet user in Russia, and many in other countries, already has. DataSync is multi-platform and is not tied to Android or iOS alone.



We are genuinely happy about this, because three years ago, when Yandex.Disk launched, we wanted to synchronize not just files between computers but any data at all between all of a person's devices. Our digital life is not only files, but also points on maps, routes, browser bookmarks, a high-score list in a computer game, and much more.

Yandex.Browser has been running on Yandex.Disk's synchronization technology for more than two years. In the near future, other large Yandex services will begin moving their platforms to DataSync. Below are more details about how it works and why it is needed, along with examples where you can see it and try it for yourself.

At one time the web server became the basic infrastructure element of the Internet, and today the worldwide network is impossible to imagine without it. Times have changed, and we are confident that a self-synchronizing cloud database will be an equally important element of the modern multiplatform Internet.

Previously, the user had only one device and only one way to interact with an Internet service: go to the site. For the developer, everything was simple: database, code, web interface. Somewhere in the depths of the site ran a database where the developer stored the data he needed and from which he read it back to display on the user's single device, or rather, in the browser on his computer.

Times have changed a lot, however. A person can now have many devices, and one service can be represented by a whole set of applications. This has greatly complicated the developer's life. He has more of everything: interfaces, databases, logic, even programming languages. But the worst part is the data. Some devices hold local copies of data that also lives on the server and on other devices, and all of these copies need to be synchronized with each other. This is not an easy task at all: there can be conflicts, data loss, and dropped connections, every device is expected to run at normal speed, and much more.

Obviously, a developer who solves the problem of synchronizing applications and cloud storage from scratch will most likely forget what he originally set out to build, or will not get around to it for a long time.

Naturally, Yandex faced this problem too. We started building a “cloud database” that would free developers from the routine work of synchronization and let them focus on creating applications.

A regular database leads a dull life on a single computer, so we made a database that lives simultaneously in the cloud and on all of the user's devices. As a ready-made solution, this fundamentally simplifies the development of multiplatform applications. No matter where data is written to such a database (on the phone, in the browser, on the server), nothing is lost and up-to-date data appears everywhere.

The internal project documentation puts it this way: “The DataSync API is a store of arbitrary structured data associated with a user or with a user + application pair. It is designed for synchronization between a user's devices and for normal operation over a poor mobile network.”

To an external developer, our database looks very much like the popular NoSQL database MongoDB; the idea is the same. A database consists of collections, and collections consist of objects. An object is a set of key-value fields. But our goal was for the developer not to think about how to tie together data on different devices and in the cloud, and simply to work with our API while all the synchronization happens as if by magic. Of course, this “magic” is only possible when you use both the cloud and the client part of our “synchronizing” SDK.

Here I want to say a little more about the problems we ran into and how we solved them to make this happen.

Conflicts


In the model we are proposing, there are many copies of the database, and all of them change locally. In this situation conflicts are inevitable when data changes in different places at the same time, and losing data is the last thing anyone wants. The algorithm for solving this problem is quite traditional.


Transparent operation without a network


A mobile application (like many others) cannot count on a stable connection, and the service's data may change at any moment. This should not be a problem for anyone using our cloud database, so it was very important to account for it in our SDK as well.


The DataSync API documentation describes all the methods that our own SDKs use. You do not need to write your own library that is protected against conflicts and dropped connections for each platform.

At first we made each “application” synchronize with its counterparts on other platforms and on the server. But then cases came up where it was important to synchronize data between different applications on the same device, roughly speaking, to share a common database. So we implemented that as well. Now, for example, if you have a lite application and a full-featured one, you can easily carry over the data of a user who decides to upgrade from one to the other. You can also send data to or receive data from a partner application.

How the API works


When working with the API, the client mainly deals with a handful of concepts, described below.

The database is the unit of synchronization for the API. It contains objects (records) grouped into collections. All operations, however, are essentially performed on individual objects. To identify an object uniquely, you must specify both its identifier and its collection.
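As a rough illustration of this data model, here is a minimal in-memory sketch in Python. It is not the DataSync SDK: the `Database` class and its `put`, `get`, and `collection` methods are hypothetical names chosen for this example. It only shows why a record is addressed by the pair of collection and identifier.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple

Fields = Dict[str, Any]  # an object is a set of key-value fields


@dataclass
class Database:
    """Hypothetical in-memory model of a database; not the real SDK."""

    # Records are keyed by (collection_id, record_id), so the same
    # record_id can safely appear in two different collections.
    records: Dict[Tuple[str, str], Fields] = field(default_factory=dict)

    def put(self, collection_id: str, record_id: str, fields: Fields) -> None:
        # Store a copy so later changes to the caller's dict do not leak in.
        self.records[(collection_id, record_id)] = dict(fields)

    def get(self, collection_id: str, record_id: str) -> Fields:
        return self.records[(collection_id, record_id)]

    def collection(self, collection_id: str) -> Dict[str, Fields]:
        # A collection is just the set of records sharing a collection_id.
        return {
            rid: fields
            for (cid, rid), fields in self.records.items()
            if cid == collection_id
        }


db = Database()
db.put("contacts", "alice", {"phone": "+7 000 000-00-00"})
db.put("bookmarks", "alice", {"url": "https://example.com"})
```

Both records share the identifier `alice`, but they do not collide because the collection is part of the key.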

The identifier of the database state is the revision number. The API implements a revision control scheme that makes it possible to track and resolve conflicts. When a database is created, its revision is zero; each subsequent set of changes accepted for the database increases the revision number by one. All changes are made by sending a delta update, which can create, delete, and edit records.
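The revision scheme can be sketched as follows. This is a simplified model under assumptions: the `LocalDatabase` class, the `apply_delta` method, the operation names `insert`, `update`, and `delete`, and the dictionary layout of a change are all hypothetical and are not the actual DataSync wire format.

```python
from typing import Any, Dict, List, Tuple

Key = Tuple[str, str]  # (collection_id, record_id)


class LocalDatabase:
    """Hypothetical model of one database copy with a revision counter."""

    def __init__(self) -> None:
        self.revision = 0  # a newly created database starts at revision zero
        self.records: Dict[Key, Dict[str, Any]] = {}

    def apply_delta(self, changes: List[dict]) -> int:
        # Work on a copy so a bad change leaves the database untouched.
        working = {key: dict(fields) for key, fields in self.records.items()}
        for change in changes:
            key = (change["collection_id"], change["record_id"])
            op = change["op"]  # assumed operation names, see the lead-in
            if op == "insert":
                working[key] = dict(change["fields"])
            elif op == "update":
                working[key].update(change["fields"])
            elif op == "delete":
                del working[key]
            else:
                raise ValueError(f"unknown operation: {op}")
        # The whole delta is committed at once and bumps the revision by one,
        # no matter how many individual changes it contains.
        self.records = working
        self.revision += 1
        return self.revision


local = LocalDatabase()
local.apply_delta([
    {"op": "insert", "collection_id": "contacts", "record_id": "alice",
     "fields": {"phone": "111"}},
    {"op": "insert", "collection_id": "contacts", "record_id": "bob",
     "fields": {"phone": "222"}},
])
local.apply_delta([
    {"op": "update", "collection_id": "contacts", "record_id": "alice",
     "fields": {"phone": "333"}},
    {"op": "delete", "collection_id": "contacts", "record_id": "bob"},
])
```

In this sketch a delta is treated as one unit: two deltas were applied, so the revision ends at two even though four individual changes were made.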

A conflict is a situation in which the client sends the server changes based on a revision older than the one the server currently holds. In this case, the server looks at the incoming revision number and sends back the set of changes needed to bring that revision up to the current one.



If the revision on the client is so outdated that the server can no longer provide the list of necessary changes, the client has to fetch a snapshot of the database and bring itself up to the current revision on its own. The conflict resolution scheme is left to the discretion of the developers.

A database snapshot reflects the state of the current database revision on the server. It must be requested when creating a local database on a device or when resolving conflicts.
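The server-side revision check described above can be sketched like this. It is an illustration under assumptions, not the real service: the `SyncServer` class, the `submit` method, the status strings, and the `max_history` limit on stored deltas are all hypothetical.

```python
from typing import Dict


class SyncServer:
    """Hypothetical model of the server-side revision check."""

    def __init__(self, max_history: int = 2) -> None:
        self.revision = 0
        self.max_history = max_history  # assumed cap on remembered deltas
        self.history: Dict[int, list] = {}  # revision number -> delta

    def submit(self, base_revision: int, delta: list) -> dict:
        if base_revision > self.revision:
            raise ValueError("client revision is ahead of the server")
        if base_revision < self.revision:
            # Conflict: the client built its delta on an older revision.
            missing = range(base_revision + 1, self.revision + 1)
            if all(rev in self.history for rev in missing):
                # The server can still replay everything the client missed.
                return {
                    "status": "conflict",
                    "revision": self.revision,
                    "deltas": [self.history[rev] for rev in missing],
                }
            # Too old: the client must fetch a snapshot and start from it.
            return {"status": "snapshot_required", "revision": self.revision}
        # Revisions match: accept the delta and advance by one.
        self.revision += 1
        self.history[self.revision] = delta
        self.history.pop(self.revision - self.max_history, None)
        return {"status": "accepted", "revision": self.revision}


server = SyncServer(max_history=2)
first = server.submit(0, ["add alice"])
second = server.submit(1, ["add bob"])
stale = server.submit(1, ["rename alice"])   # based on revision 1, server is at 2
third = server.submit(2, ["add carol"])
too_old = server.submit(0, ["add dave"])     # history no longer reaches back to 0
```

How the client merges the returned deltas with its own pending changes is deliberately left out, since the text leaves the resolution scheme to the developer.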

To use data only within your own application, request the access right to store application data; to share data between applications, request access to the user's shared data.

Conclusion


You can start using the HTTP API and JS SDK now. In the near future we will release an SDK for Android and iOS.

You can see how everything works in a test browser application, an address book. It demonstrates how a user's phone numbers can be synchronized at 20-second intervals. Data is synchronized with the cloud and with every instance of the web application. The easiest way to see this is to open two tabs in the browser.

Source: https://habr.com/ru/post/255357/

