ShareJS or how to make your Google Wave with OT and NodeJS

After two years of working on OT (a shared data access conflict resolution technique) for Google Wave, Joseph Gentle got the idea that those who want to make a similar product will need a little less time. To somehow help these people and share knowledge, the ShareJS library was written, which is a NodeJS-based OT implementation. There is also a C-implementation .

')

What is OT?

OT is Operational Transformation or Operational Transformation.
We have data and operations on this data. Operations come to us in turn. The whole point is that before you perform yourself on the data, the operation itself changes according to all previous operations on the current data.

A good example from Wikipedia:

At the initial time, both users have the string 'abc', for which two operations are simultaneously created:
O1 = Insert [0, "x"] (insert the character "x" at position "0")
O2 = Delete [2, “c”] (delete the character “c” in position “2”)

Depending on the sequence of these operations, the result will be different.
If O2, O1, then 'abc' -> 'ab' -> 'xab'
If O1, O2, then 'abc' -> 'xabc' -> 'xac'

How would we in both cases get the same 'xab' result? Transform operations, considering previous changes? What would it look like with OT?

O2: 'abc' -> 'ab'
OT: O1 Insert [0, "x"] -> O1` Insert [0, "x"]
O1`: 'ab' -> 'xab'
In this case, the operations O1 and O1 `are identical (Insert [0," x "]) because the previous operation O2 was above characters in a position larger than the position for O1. Thus, O2 does not affect or alter O1.

O1: 'abc' -> 'xabc'
OT: O2 Delete [2, “c”] -> O2` Delete [3, “c”]
O2`: 'xabc' -> 'xab'
In this case, OT took into account that before O2 there was already an O1 operation, which inserted one character before the position for O2 and for this O2, you should delete the character not from position 2, but from position 3. That's all magic.

Read more about OT: wiki , FAQ

Data types

In OT, a lot depends on the type of data. After all, operations (as well as operations on o_O operations) are different for different data types. For example, all operations with a string can ultimately be reduced to only the 3rd:
- Move the carriage to position n
- Insert the line n in the current position
- Delete n characters starting from the current position
Or even to two:
- Insert a line at position n
- Delete m characters starting at position n

We are primarily interested in the json data type and there everything is somewhat more complicated, since json can consist of objects, arrays, strings, and numbers. These are all different data types. We have already seen how this works for strings. For arrays a similar situation. Also, OT can deal with the increment of a number. More difficult with objects: if two users simultaneously overwrite a field of a json object, then it is impossible to take into account the changes from both users for the general data type json. One of the two operations will be lost. If this is critical, you can create your own data types specific to your application.

Data types for ShareJS are rendered by a separate OTTypes project. There are currently several implementations for string and json. The plans - rich-text data type.

Arihture

ShareJS consists of server and client parts. A part of the code (for example, conversion of operations) is isomorphic, that is, it is executed both on the server and on the client. The role of the operations repository is LiveDB , consisting of Mongo (data) and Redis (operations cache). The data model is document-oriented - collections and documents. Each document has a data type and a version, which is incremented with each change of the document. Operations are atomic at the document level.
The client performs several synchronous operations. They are grouped (setTimeout (sendToServer, 0);), compressed (the same successive ones are combined, those that neutralize each other’s actions are removed) and sent to the server in a crowd.
If the data version of the operation corresponds to the current version of the data in the database, then the operation is applied, if not, the server tries to find the history of intermediate operations first in Redis, then in Mongo (by default also the repository of all operations). The operation is converted according to intermediate operations and applied to the data. The operation itself is performed sequentially (transaction) in Redis (Lua script), where the relevance of the data version is checked once again; if the version is no longer relevant, the circle is repeated anew.
With the help of Redis PubSub, data change events are caught and intermediate operations are sent to all subscribed clients, where they are applied accordingly.
This model allows you to have consistent data on the one hand, and high speed and scalability on the other.

LiveDB can be implemented in another form. The main thing is that the repository supports transactions and events (PubSub). For example, Foundation DB is perfect for this.
In the current implementation, communication with Mongo is provided by means of an adapter , which can be rewritten for any other database.
By default, the entire operation history is stored in the same Mongo database as the data. You can not store it at all or store it in another database.

Application

ShareJS is great for creating real-time data sharing applications like Google Wave, Google Docs, etc.
There is a wrapper over ShareJS - RacerJS . This is essentially a beautiful interface for working with data. You can use RacerJS with client frameworks (AngularJS, Backbone, etc.).
For those who want everything and immediately there is DerbyJS . This is a full-stack isomorphic framework using RacerJS as a model for working with data.

Questions about ShareJS, RacerJS, DerbyJS can be asked in Stackoverflow Chat (Need a reputation on Stackoverflow> 20)

DerbyJS materials

Source: https://habr.com/ru/post/206324/

All Articles