
Holy Grail on steroids: total synchronization and isomorphic JavaScript on Swarm.js

Today on Habr we present a replicated-model technology that makes building collaborative, real-time web applications as easy as building local desktop applications. We believe that real-time data synchronization should be available to application developers the way a TCP stream, an HTTP request, or power from a wall outlet is: immediately and without question. In terms of autonomy, locality, and load speed, HTML5 applications written with Swarm are not inferior to native ones.
With the Swarm library, we get more done over a weekend than we used to in a month without it. More importantly, we can now do things that were simply impossible for us before. We offer this synchronization library for free.



Today we publish TodoMVC++, a reactive Holy-Grail-on-steroids application written in Swarm + React. Among the features it demonstrates: real-time synchronization between clients, offline operation, isomorphic server-side rendering, and application state reflected in the URL for easy sharing.


In essence, the application is written with no awareness of the network at all, as a simple (local) MVC application. Synchronization and caching happen entirely at the Swarm library level, while the application works with local Backbone-like model objects.
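
For illustration only, here is a minimal sketch of what working with such a model object might look like. The style mimics Swarm's Backbone-like models, but treat the names and signatures (`Model.extend`, `set`, `on`) as an approximation rather than the library's definitive API:

```js
// Approximate sketch; assumes the library is importable as `swarm`.
var Swarm = require('swarm');

var Mouse = Swarm.Model.extend('Mouse', {
    defaults: { x: 0, y: 0 }
});

var mickey = new Mouse('mickey');  // same id on every client -> same object
mickey.on('change', function () {  // fires for local and remote changes alike
    console.log('mouse is at', mickey.x, mickey.y);
});
mickey.set({ x: 10, y: 20 });      // applied locally at once; replicas sync
                                   // in the background
```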
So, here is the application itself: ppyr.us.
Here is the code: github.com/gritzko/todomvc-swarm

A detailed walkthrough of the application and the library will follow in the next article; for now, brace for a wall of text about replicas, CRDTs, and the theory behind them.

Today synchronization is in high demand on both the client and the server. In the server farms of the Internet giants, ever more storage and data-processing systems appear that need to be kept in sync. The principle of one big database as the single source of truth is hard to scale. But we are talking first of all about the client. An ordinary user now owns several devices, and when the data on the laptop screen does not match the data on the iPhone screen, the user gets upset. Replica synchronization is needed everywhere, and we believe that a data synchronization system will soon be as commonplace a piece of infrastructure as a database.

The peculiarity of the current moment is that even the industry leaders have only polished their synchronization solutions to the "sort of works" stage. Google Docs does not quite work offline, GTalk and Skype systematically lose messages from chat history, and Evernote is famous for a wide variety of bugs. And these are the leaders. The synchronization problem is surprisingly complex and multifaceted. Take Evernote: if it were a purely local application, a student could write an 80/20 subset of it, just as Zuckerberg wrote Facebook once he had MySQL and PHP.

What makes synchronization fundamentally hard? Let's look at how classic replication and synchronization technologies keep replicas identical. The simplest approach is to route everything through the center: all write operations flow to the central database, and fresh read results flow back from it. This seems safe and simple, but difficulties soon arrive from three sides:
  1. concurrency: while the response to the previous operation was in flight, the client managed to do something else, and how to merge the two is not quite clear (a toy illustration follows this list);
  2. scaling: every operation goes through a single point;
  3. operation over a bad network, when the center responds to clients slowly or not at all.
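
The concurrency problem is the nastiest of the three. Here is the toy sketch promised above, the classic lost update (the `server` object and helpers are hypothetical stand-ins, not any particular database):

```js
// Two clients race against a naive centralized store.
const server = { likes: 0 };
const read  = ()  => server.likes;
const write = (v) => { server.likes = v; };

// Clients A and B both read 0, increment locally, and write back.
const a = read() + 1; // a === 1
const b = read() + 1; // b === 1; B never saw A's increment
write(a);
write(b);
console.log(server.likes); // 1, although two users clicked "like"
```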


The first typical step in scaling this scheme is master-slave replication, as implemented in a typical database. The master puts operations into a linear log and streams that log to the slaves, which apply it to their data replicas in the same linear order and arrive at the same result. This helps scale reads, but adds an element of eventual consistency, since the slaves update with some lag. The write problem remains: all writes still go through the same center. Linearization can be stretched across distributed replicas with a consensus algorithm such as Paxos or Raft, but there, too, a "leader" performs the linearization, i.e. it is still a center. When the center finally chokes, people resort to horizontal scaling and cut the database into "shards"; at that point linearization is gone, and with it the whole ACID shatters into a thousand little ACIDs.
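
The mechanism rests on a simple invariant: replicas that apply the same operation log in the same order end up in the same state. A minimal sketch, with illustrative operation types:

```js
// Deterministic replay of a linear operation log.
function applyLog(state, log) {
    for (const op of log) {
        switch (op.type) {
            case 'set':  state[op.key] = op.value; break;
            case 'incr': state[op.key] = (state[op.key] || 0) + op.by; break;
        }
    }
    return state;
}

const log = [
    { type: 'set',  key: 'title', value: 'TodoMVC' },
    { type: 'incr', key: 'done',  by: 1 },
];
const replicaA = applyLog({}, log);
const replicaB = applyLog({}, log);
// replicaA and replicaB are identical; the hard part is agreeing on the order.
```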

And, of course, a center and linearization are incompatible with offline work. You can argue that offline is a corner case, but the fact is that offline happens, and happens regularly. If all we lose is a tweet or a like, we can live with it; anything more serious and we cannot. If, say, a waiter snags a network cable with his foot and the Internet disappears, we cannot kick the customers out of the restaurant until an admin arrives with flashing lights, nor can we serve them for free (an example from Max Nalsky, co-founder of IIKO).

Moreover, all these server-side adventures do nothing for the client side: the client simply waits until the servers agree among themselves and report the result. The much-hyped Meteor project tried to make clients synchronize in real time by effectively caching MongoDB on the client. To keep everything snappy, the wait for the server's response is masked by the latency compensation trick: the client applies an operation to its local cache and sends it to the server; the server reports whether it succeeded, and if not, sends a correction. The approach is more than dubious. "Honey, did you park the car in the garage?" "Yes, partially!"
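
A sketch of the latency compensation pattern as just described; the names are illustrative, and this is the general idea rather than Meteor's actual code:

```js
// Optimistic local apply with server correction.
function apply(cache, op) { cache[op.key] = op.value; }

async function submit(cache, op, sendToServer) {
    const before = { ...cache };            // snapshot for rollback
    apply(cache, op);                       // 1. apply locally; UI feels instant
    const result = await sendToServer(op);  // 2. ask the actual authority
    if (!result.ok) {                       // 3. server disagreed: roll back,
        for (const k of Object.keys(cache)) delete cache[k];
        Object.assign(cache, before, result.correction); //    then apply its fix
    }
}
```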

Such is the complicated story of linearization, which makes it all the more interesting to look at popular solutions that skip linearization altogether. There are two good examples: Git and CouchDB. Git was written by Linus Torvalds, who was himself the "center" among Linux developers; that is probably why he felt keenly that the center is slow and the center does not scale. In Git, synchronization is master-master: the data is represented as a directed graph of versions, and parallel versions occasionally have to be merged. Scaling is perfect, offline is no problem. CouchDB works roughly the same way, and there are attempts to bring CouchDB-like logic to the client: PouchDB and hood.ie.
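
The key primitive in any master-master scheme is deciding whether one version descends from another or the two are concurrent. Below is a minimal sketch using version vectors; that choice is our assumption for illustration, since Git itself compares commit ancestry in its version graph:

```js
// Compare two version vectors of the form { replicaId: counter, ... }.
// Returns 'ancestor', 'descendant', 'equal', or 'concurrent'.
function compare(a, b) {
    let aAhead = false, bAhead = false;
    for (const id of new Set([...Object.keys(a), ...Object.keys(b)])) {
        const av = a[id] || 0, bv = b[id] || 0;
        if (av > bv) aAhead = true;
        if (bv > av) bAhead = true;
    }
    if (aAhead && bAhead) return 'concurrent'; // parallel versions: must merge
    if (aAhead) return 'descendant';
    if (bAhead) return 'ancestor';
    return 'equal';
}

compare({ alice: 2, bob: 1 }, { alice: 1, bob: 2 }); // 'concurrent'
```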

The genuinely new thing in this area is CRDTs, and that is what we are here to talk about today, sorry for the long introduction. CRDT stands for Convergent / Commutative / Conflict-free Replicated Data Types. The general idea of CRDT is to use a partial order instead of linearization. Operations may happen in parallel on many replicas, and some operations are concurrent: they happened on different replicas without knowing about each other, neither of them is "first", and different replicas apply them in different orders. If the data structures can tolerate such mild reordering of operations, reordering that never violates cause-and-effect relationships, then all the problems associated with the center simply evaporate.
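
To make this concrete, here is the textbook state-based CRDT, a grow-only counter (an illustrative example, not Swarm's internal representation). Its merge is commutative, associative, and idempotent, so replicas can exchange state in any order and still converge:

```js
// G-Counter: each replica increments only its own slot.
function increment(counter, replicaId) {
    counter[replicaId] = (counter[replicaId] || 0) + 1;
}
function value(counter) {
    return Object.values(counter).reduce((sum, n) => sum + n, 0);
}
// Merge takes the per-replica maximum; the order of merges does not matter.
function merge(a, b) {
    const out = { ...a };
    for (const [id, n] of Object.entries(b)) out[id] = Math.max(out[id] || 0, n);
    return out;
}

const a = {}, b = {};
increment(a, 'alice'); increment(a, 'alice'); // concurrent with...
increment(b, 'bob');                          // ...this increment
value(merge(a, b)); // 3 on every replica; merge(b, a) gives the same
```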

Another question: are there many such CRDT data structures? As it turns out, all the everyday computational staples (variables, arrays, associative arrays) can be fully realized as CRDTs. But what about counting money? Surely that is where linearization and ACID guarantees are required? Alas and alack, it turns out the new is just the well-forgotten old: the data structures used in bookkeeping, accounts and balances, are perfectly good CRDTs. Indeed, back in the Renaissance, when accounting traditions took shape, there was no Internet, so bookkeepers managed without linearization.
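
A ledger-flavored sketch of why bookkeeping converges: if an account is a grow-only set of uniquely identified entries, the balance is just their sum, and sums do not care about merge order (illustrative code, not a real accounting system):

```js
// An account as a grow-only set of uniquely identified ledger entries.
function post(ledger, id, amount) {
    ledger.set(id, amount); // credit if positive, debit if negative
}
function balance(ledger) {
    let sum = 0;
    for (const amount of ledger.values()) sum += amount;
    return sum;
}
function merge(a, b) {
    return new Map([...a, ...b]); // set union; identical ids collapse (idempotent)
}

const branchA = new Map(), branchB = new Map();
post(branchA, 'tx1', +100);  // recorded offline at branch A
post(branchB, 'tx2', -30);   // recorded offline at branch B
balance(merge(branchA, branchB)); // 70, in either merge order
```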

The big, shining promise of CRDTs is live replicas that remain fully functional even with no connection to the center, plus immediate application of every operation, with no round trip to the center. Such autonomy and speed matter especially in two cases. First, mobile devices: they are used on the go, over an unreliable Internet. CRDTs let you store data locally for future use and work calmly offline, with synchronization in the background. Second, applications with collaboration features, especially real-time ones (think Google Docs or Apple iCloud). In such applications the "state" is large and changes rapidly, and every round trip to the server and back is a nail in the coffin.

There are also non-CRDT technologies for working with data offline. Dropbox offers its sync API, there are StrongLoop, Firebase, and so on, a whole sea of them. All these solutions work on the Last-Write-Wins (LWW) principle: each record is assigned a timestamp, and a record with a larger timestamp overwrites older ones. Cassandra is built on the same principle, and in our Swarm library the most popular primitive is likewise the LWW object. Swarm's advantage lies in the data structures that LWW cannot solve, for example, text under concurrent editing.
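
For completeness, a minimal sketch of an LWW register (our illustration, not Dropbox's or Cassandra's actual code); real systems break timestamp ties deterministically, for example by replica id:

```js
// Last-Write-Wins register: keep the (value, timestamp) pair with the
// largest timestamp; break ties by replica id to stay deterministic.
function lwwMerge(a, b) {
    if (a.ts !== b.ts) return a.ts > b.ts ? a : b;
    return a.replica > b.replica ? a : b;
}

const fromPhone  = { value: 'Buy milk',  ts: 1700000001, replica: 'phone' };
const fromLaptop = { value: 'Buy bread', ts: 1700000005, replica: 'laptop' };
lwwMerge(fromPhone, fromLaptop).value; // 'Buy bread': the later write wins,
                                       // and the phone's concurrent edit is lost
```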

In general, through the looking glass of distributed systems everything is backwards. In ordinary programming languages the simplest operation is incrementing a variable, ++; working with arrays is a bit harder, and objects and associative collections are harder still. In distributed systems it is exactly the other way around: LWW objects and associative containers are easy, linear structures (arrays, text) are very hard, and counters are extremely hard. Cassandra illustrates this well: LWW objects were implemented there first and foremost, while the counters, it seems, are still being patched up.



Now, to the point. We decided to write TodoMVC in Swarm + React to show the library in action. Actually, the first TodoMVC on Swarm + React was written back in July by Andrey Popp in less than a day, but that code was not "idiomatic". This time we added linear collections (Vector), server rendering, and a bunch of other goodies. Besides, the usual TodoMVC seemed boring and useless: looking at the React + Flux TodoMVC, it is very hard to understand why the authors piled all those tricks into the simplest of applications. So we added one feature, recursion: pressing Tab takes the user into a nested "child" list. We also adapted the interface for real-time synchronization, so the application is now of at least some practical use, and we began reflecting the application state in the URL, for easy sharing between users. Frankly, it was hard to stop. Compared to our past experience building real-time projects, with Swarm in hand we felt as if we were wielding a magic sword and itching to slay something.

A detailed analysis of the application and the library will follow in the next article.

Follow updates on the project's Twitter: @swarm_js.

Source: https://habr.com/ru/post/238785/

