
Amelisa. Offline and realtime engine for React and Mongo


I recently wrote a data synchronization engine with first-class offline support. For example, you can go offline, change data, close the browser, reopen it, return to the site (go online), and the data is synchronized without loss. While online, data between the client and the server is also synchronized in real time. I want to explain what the idea was, what solutions and technologies already exist, and who might find it useful.





Problem


Requirements for web applications keep rising. Previously, everyone was happy with static pages, but now users want the comments under cat pictures to update right away, likes to tick up before their eyes, and event notifications to arrive without waiting for a page reload. With the growing popularity of mobile devices, the concept of Offline First has emerged: the idea that an application should take an unstable or missing network into account.
The web is a distributed environment, and synchronizing data in a distributed asynchronous environment is not an easy task. Things are made easier by the fact that most applications have no strict requirements for the accuracy of the displayed data. It is not so important if some comment on a cat picture never arrives from the server, or if two users see a slightly different number of likes.
Usually, in such cases, realtime is bolted onto an existing REST API: the server somehow knows which data the client is interested in at a given moment (a subscription), and when that data changes in the database it sends patches with updates to the client over a WebSocket connection. Alternatively, some ready-made solution is used. Let's take a closer look at what is currently popular for working with data on the web.
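The subscription-and-patch pattern described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the API of any library mentioned in this article: `SubscriptionHub`, `Patch` and the `send` callback are hypothetical names, and the callback stands in for writing to a WebSocket connection.

```typescript
// Hypothetical in-memory sketch: no real WebSocket or database is involved.
type Doc = Record<string, unknown>;
type Patch = { collection: string; id: string; changes: Doc };
type Send = (patch: Patch) => void;

class SubscriptionHub {
  // collection name -> send functions of the subscribed clients
  private subscribers = new Map<string, Set<Send>>();
  // collection name -> (document id -> document)
  private data = new Map<string, Map<string, Doc>>();

  // Register interest in a collection; returns an unsubscribe function.
  subscribe(collection: string, send: Send): () => void {
    let set = this.subscribers.get(collection);
    if (!set) {
      set = new Set<Send>();
      this.subscribers.set(collection, set);
    }
    set.add(send);
    const subscribed = set;
    return () => {
      subscribed.delete(send);
    };
  }

  // Store the change, then push a patch to every subscriber of the collection.
  update(collection: string, id: string, changes: Doc): void {
    let docs = this.data.get(collection);
    if (!docs) {
      docs = new Map<string, Doc>();
      this.data.set(collection, docs);
    }
    docs.set(id, { ...(docs.get(id) ?? {}), ...changes });
    const subs = this.subscribers.get(collection);
    if (!subs) return;
    subs.forEach((send) => send({ collection, id, changes }));
  }
}

// Usage: one client subscribed to "comments" only.
const hub = new SubscriptionHub();
const received: Patch[] = [];
const unsubscribe = hub.subscribe("comments", (p) => received.push(p));
hub.update("comments", "c1", { text: "Nice cat" }); // delivered
hub.update("likes", "l1", { count: 1 });            // ignored: no subscription
unsubscribe();
hub.update("comments", "c1", { text: "edited" });   // ignored: unsubscribed
```

Note that a sketch like this gives no delivery guarantees: a patch sent while the client is disconnected is simply lost, which is exactly the kind of inaccuracy most applications tolerate.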

Solutions


Flux is a methodology (and an implementation) from Facebook for working with data. The main idea is unidirectional data flow. How exactly the data is obtained from the server is outside the scope of Flux. If you wish, you can make a Flux store realtime.

Redux is the newest and most popular Flux-like library. It differs from Flux in its simplicity (there is no dispatcher, there is a single store, and so on).

Relay is a new framework from Facebook that replaces the Flux methodology. Each React component can request the data it needs from the server, using the GraphQL language. Such an abstract query language can be a good solution if you have many different data sources (databases), but you also have to describe, for the entire data schema, how queries are translated into the query languages of the underlying databases. Relay lets you request only part of a document and handles the case where several components request the same data, which is useful when you have a large number of components. Subscriptions should appear soon.

Falcor is an engine from Netflix. It has a single interface for working with local and remote data. Its concept of paths is also interesting.

Meteor was revolutionary in its own way and strongly promoted the idea of an isomorphic API for working with data and of realtime. Subscriptions are made directly to Mongo queries, and the patches exchanged are operations from the Mongo oplog. Meteor is not even a framework but rather a platform, with its own package manager.

Firebase is a realtime BaaS: an interesting and quite popular paid service that solves the problem of realtime in web applications.

Diffsync uses a diff algorithm for JSON objects, similar to what Git does for text. The client and the server then exchange these diffs. This can work well if your application is not highly collaborative.

To fully insure yourself against inaccuracy and data loss, exchanging patches is not enough; a more serious approach is needed. There are two conflict resolution techniques for distributed asynchronous systems: OT and CRDT. Data is represented as a state and a log of operations. The state is the result of sequentially applying all operations from the log, and it has its own version. Typically, the minimal unit of state is a document, and document-oriented databases are used as storage. The operation log can be stored in the same storage or in a different one. In addition, certain metadata is stored along with the state: the state version, a timestamp, the data type, and so on. CRDT also has a state-based variant, which is used, for example, in Riak. But transferring the entire document (state) on each change is less efficient than transferring a single operation, so op-based CRDT is usually used on the web.
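The relationship between the operation log, the state, and its version can be illustrated with a short sketch. The operation format (`set` / `del` on a flat document) and the names `Op`, `Snapshot`, `applyOp` and `replay` are assumptions made for this example, not the format used by any particular OT or CRDT library.

```typescript
// Hypothetical op format, for illustration only.
type Op =
  | { type: "set"; field: string; value: unknown }
  | { type: "del"; field: string };

interface Snapshot {
  version: number;               // number of ops applied so far
  data: Record<string, unknown>; // current document state
}

// Apply a single operation, producing the next snapshot without mutating the old one.
function applyOp(snapshot: Snapshot, op: Op): Snapshot {
  const data = { ...snapshot.data };
  if (op.type === "set") {
    data[op.field] = op.value;
  } else {
    delete data[op.field];
  }
  return { version: snapshot.version + 1, data };
}

// The state is the fold of the whole log over an empty document.
function replay(log: Op[]): Snapshot {
  const empty: Snapshot = { version: 0, data: {} };
  return log.reduce(applyOp, empty);
}

// Usage: four operations on one todo item.
const log: Op[] = [
  { type: "set", field: "title", value: "Buy milk" },
  { type: "set", field: "done", value: false },
  { type: "set", field: "done", value: true },
  { type: "del", field: "title" },
];
const state = replay(log);
```

Sending a single `Op` over the wire is much cheaper than sending the whole `Snapshot`, which is the efficiency argument for op-based approaches made above.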

ShareJS is the most popular OT implementation. I have already written about it. I can add that the main advantages of ShareJS are operations on strings, arrays and numbers (useful if you are building a collaborative editor, for example), as well as a generic JSON data type implementation and the ability to subscribe to Mongo queries. OT needs a source of truth where operations are actually transformed; this is usually the server. Implementing full-fledged offline for OT is an extremely difficult task (I don't know of any implementation). It should be said that OT is actively used by Google in services such as Docs and Wave. At the company where I work, we have been using ShareJS (as part of the DerbyJS framework) for more than a year, and everything is going smoothly.

None of the engines and libraries described above fully support offline, because that requires equal distributed replicas of the data, as in Git, global state versions, and so on. There are different approaches here, but the most interesting one, in my opinion, is CRDT. Solutions in this area look like this:

Hoodie is an Offline First framework tied to CouchDB. The new version will use PouchDB on the client. CouchDB stores the entire history of document states, which is what makes offline possible. You can draw an analogy with Git: while offline, the history of states splits into two branches, server and client, and when the client comes back online they are merged. This is somewhat similar to state-based CRDT. CouchDB is by and large a key-value store; it also has a basic query implementation, but not as rich as in Mongo, for example.

Swarm is a CRDT engine developed by our Siberian scientists. It has many data types: key-value, strings, arrays, and so on. Swarm is not tied to any database and, accordingly, does not support subscriptions to queries. Implementing query subscriptions in a general way (without being tied to a specific database) is not a trivial matter.
Its author, Victor, has given very interesting talks and interviews.

What all these solutions have in common is that if there is a server part, it is written in JavaScript and requires NodeJS. This is because a significant part of the code is shared between the client and the server.

Surely there are other interesting solutions that I don't know about or have forgotten. Share them in the comments and we will discuss.

Amelisa


The idea was to combine CRDT offline capabilities with subscriptions to Mongo queries, wrap this in an isomorphic Racer-like API, and integrate it with React, while borrowing ideas from ShareJS for access control, scaling and so on.
Compared to SwarmJS, I had to sacrifice the variety of data types. Unlike ShareJS with its generic JSON data type (including operations on objects, arrays, strings, and numbers), in Amelisa each document is an ordinary key-value object (only operations on objects).
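To make the idea of an op-based CRDT over a key-value document more concrete, here is a minimal sketch of a last-writer-wins document. It is an illustration under assumptions, not Amelisa's actual implementation: the `SetOp` shape, the `LwwDoc` class and the tie-breaking rule by replica id are all hypothetical.

```typescript
// Hypothetical op-based last-writer-wins document; not Amelisa's actual code.
interface SetOp {
  field: string;
  value: unknown;
  timestamp: number; // logical or wall-clock time of the write
  replica: string;   // unique replica id, used to break timestamp ties
}

// Total order on ops: later timestamp wins, replica id breaks ties.
function isNewer(a: SetOp, b: SetOp): boolean {
  if (a.timestamp !== b.timestamp) return a.timestamp > b.timestamp;
  return a.replica > b.replica;
}

class LwwDoc {
  // field name -> the winning op seen so far for that field
  private winners = new Map<string, SetOp>();

  // Apply one op; keep it only if it beats the current winner for its field.
  apply(op: SetOp): void {
    const current = this.winners.get(op.field);
    if (!current || isNewer(op, current)) this.winners.set(op.field, op);
  }

  toJSON(): Record<string, unknown> {
    const out: Record<string, unknown> = {};
    this.winners.forEach((op, field) => {
      out[field] = op.value;
    });
    return out;
  }
}

// Usage: a phone and a desktop edit the same todo item while offline.
const phoneOps: SetOp[] = [
  { field: "done", value: true, timestamp: 5, replica: "phone" },
];
const desktopOps: SetOp[] = [
  { field: "title", value: "Buy oat milk", timestamp: 3, replica: "desktop" },
  { field: "done", value: false, timestamp: 4, replica: "desktop" },
];

// The two replicas receive the same ops in different orders...
const a = new LwwDoc();
phoneOps.concat(desktopOps).forEach((op) => a.apply(op));
const b = new LwwDoc();
desktopOps.concat(phoneOps).forEach((op) => b.apply(op));
// ...and still converge to the same field values.
```

Because applying ops is order-independent here, neither replica needs the server as a source of truth to resolve the conflict on `done`, which is what makes this family of techniques attractive for offline use.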
From Transmit I took the idea of how to implement server-side rendering for a tree of components, each of which subscribes to data in isolation. The difficulty is that the input data for child components is not known until the parent components have received their data from the database and rendered.
There is also something like joins, and the ability to mix subscriptions to Mongo queries with data fetched from regular URLs (a REST API, third-party services, etc.).
Read more about the features in the documentation. At the moment it leaves much to be desired, though, so it may be easier to look at the example CRUD application, which has authorization based on the Amelisa Auth module and basic access control. For the most daring, there is the source code.

Most likely, Amelisa will be of interest for projects that need offline support and data synchronization but do not need collaborative editors and the like. A good use case is a todo-list application that runs on the phone as a native or web application and on the desktop as a web application. The user does not want to think about whether they are online or offline in order to view today's task list, mark tasks as completed, or add new ones. As soon as the device is online, the data is synchronized between all devices.
Amelisa's requirements also include React, Mongo and NodeJS.

I am currently working on stabilization and on integration with React Native. We are also writing a mobile application that uses Amelisa, and it should go into production very soon. The best way to keep up with the news is to follow Amelisa on Twitter.

Source: https://habr.com/ru/post/277645/

