Conflict-Free Replication: CRDT in Theory and Practice

In distributed repositories or editors of any data, support for making changes offline, without blocking or conflicts, is often needed. Different approaches are used for this, one of which is the conflict-free replicated data type (CRDT) algorithms and data types.

What for?

Make readonly-replica easy and everyone understands how. The ability to write to the replica is a complicated thing. There is such a CAP theorem : data consistency ( with onsustency), accessibility ( a vailability), resistance to separation ( p artition tolerance) - select any two. In CRDT, this problem is solved by providing strong eventual consistency ( SEC ) and state monotony.

Strong eventual consistency

Suppose that you can make changes to the replicas that they somehow exchange. Eventual consistency is called the organization of interaction and storage of data, in which all replicas, after the termination of making changes to them, will eventually eventually come to an equivalent state.
Strong Eventual Consistency imposes another restriction: replicas that receive the same updates (no matter in what order), come to an equivalent state immediately after receiving updates.
Do not confuse SEC and sequential consistency: suppose that a list in which you can add and remove elements does not have to guarantee an add-delete sequence in order to satisfy the SEC (for example, the removal priority must be added).
')

Conflict and monotony

There may be conflicts in the EC system. Conflict - these are the changes that, when applied to different replicas, bring them to a consistent state, but when combining the states of the replicas, a certain system invariant is violated. Conflicts are resolved by state rollbacks or in other ways that may even involve user interaction.
In CRDT, it is assumed that the system provides the SEC and its state monotonously progresses, without leading to conflicts. Monotony in this sense means the absence of kickbacks: operations cannot be canceled by returning the system to an earlier state. The states of such a system are connected by a partial order relation; in mathematics, such a system with a join operation defined on it is called a semilattice.

Lattices and semilattices

A semilattice is a semigroup whose binary operation is commutative and idempotent (there is a formal explanation on Wikipedia, in Russian and in more detail in English , and here with mathematics ).
A semilattice is a partially ordered set: the elements (not necessarily all) are related by the “follows” relation.
The operation on the semilattice is:

idempotent: a ⋁ a = a
associative: (a ⋁ b) ⋁ c = a (b c)
commutative: a ⋁ b = b ⋁ a

This operation can be either the exact upper bound (least upper bound, LUB) for the upper (join) semilattice, or the exact lower bound (the greatest lower bound) for the lower (meet) semilattice.
Difference of a tree from a semilattice:

And this is a complete lattice, on it both LUB and GLB are defined simultaneously:

An example of an upper semilattice is depicted on the CDRV. The difference is that a common ancestor is not necessary for combining elements. These are the algebraic structures that interest us now.

Many states in version control systems, such as git or svn, are in a sense a complete lattice: they have a partial order relation defined, revisions have a common ancestor and can be reduced. With elements of the upper semilattice, you can perform the join operation, making their two elements from two elements.

CvRDT vs CmRDT

For different tasks, different ways of organizing interaction and data storage are convenient. CRDT can be divided into two classes:

Commutative ( c o m mutative, CmRDT, op-based): suppose you have a list. When you add an item, you send to all the replicas only this changed state (the added item). Operations must be commutative so that the state of the replica does not depend on the order of receiving updates.
Storage based states ( c on v ergent: CvRDT, state-based): in this case, it is not individual updates that are sent, but the entire data structure (that is, the entire list, in the case of a list). The data structure must support operations:
- query - read something without changing its state (for example: is there an item in the list?)
- update - change the structure (for example: add an item to the list)
- merge - to stop the state, which came from another replica. This operation must be commutative, associative, and idemonet: all states in any direction in any direction do not “return” the system to an earlier state, but monotonously increase the state of the system. If you combine all the replicas, they must come to an equivalent state, combining all the changes made in the replicas.

Visual comparison:

The classes CvRDT and CmRDT are equivalent (the proof of this theorem is in the work of Marc Shapiro in the "references" below). Any CRDT structure can be represented as both CvRDT and CmRDT.
However, from the point of view of the requirements for the data transfer method, they are different: CmRDT requires some way of delivering notifications to replicas, while all CvRDT needs is to write and read the state (but this state can be quite large).

CRDT Example and Record

Let's make a list in which you can add and remove items. It is called 2P-Set.
On the cue there are two sets: the added element (A) and the removed (R; this set is often called the tombstone set), initially empty. When adding an element, we add it to A, when deleting it - to R. Checking the inclusion in the set consists in checking whether there is an element in R and whether it is in A. It turns out that deletion is more important than adding: once you delete an element, you cannot add it back: elements are not deleted from R and A.
The math notation looks like this:

This is a CvRDT structure that, for each operation, passes the entire list. In an obvious way, you can convert it to a CmRDT equivalent. For simplicity, let's assume a very simple case when add is always delivered to remove. The mathematical notation for CmRDT looks like this:

How to understand this

payload - what is located on the replica
initial - its initial state
query - a system status query that does not change it, is executed locally on the replica
update - an operation that changes the state of the system
atSource is the part of the update that runs on the initiator replica
downstream - updates that will be performed on replicas for this operation

Other structures

Of the known, described in the works and implemented in libraries, there are such structures:

G-Counter: (grow-only) monotonously increasing counter
PN-Counter: (positive-negative) counter that can be reduced
LWW-Register: (last-writer-wins) register with the principle of the last entry overriding
MV-Register: (multi-value) multi-value register
G-Set: many elements without deletion
2P-Set: with priority deletion
PN-Set: uses on-delete operation counter
LWW-Set: priority time of operation
OR-Set: (observed-remove) list with identifiers

There are also structures that describe vertices and links in columns, based on lists.

CRDT Optimization

The usual way to make a CRDT structure is to assemble it from others: for example, 2P-set consists of two lists: added and deleted. In practice, there may be restrictions imposed on the operation or amount of data transfer. In this case, you can make some concessions. For example, if items are frequently added and removed to the list, the list of deleted items will grow dramatically over time. You can make a garbage collector, which would delete deleted on some reasonable condition. For example, those that were deleted more than a year ago, or those about which all replicas received notifications.

CRDT vs OT

Operational Transformation is an approach commonly used in text editing. Both approaches provide consistency (EC). OT has higher server requirements, more complex and less stable algorithms. Also, some studies have shown that OT algorithms sometimes do not converge (converge), as stated in the implementation. However, it is impossible to clearly make a choice and say “what’s better”. Of the known CRDT text editors, there are, for example, WOOT (WithOut Operational Transformation), Treedoc and Logoot.

CRDT at KeePass

In KeePass, there is a merge function with a file, in my web version I had to devote special attention to making human offline synchronized. KeePass assigns unique identifiers to all objects and stores the ID of everything deleted in DeletedObjects. To synchronize the directory graph, they store the date of the last position change (location change date).
But there is a nuance. Records still have a change history, from where you can delete items. Entries in the history are identified only by time. Deleted entries from history are not saved anywhere even during synchronization. What does this work out? If you delete an entry from history and zinkkat with the previous state, it will magically return. It turns out that in terms of synchronization as the main scenario of the application, the story will often appear back.
What to do?
Since other clients can not be affected, you can enter restrictions. Consider the assumptions that we can make:

files are synchronized via central storage, not p2p
together with the file you can always store any metadata on the replica

Based on this, it was decided to keep a list of deleted and added history states (local tombstone set) from the moment of making changes to the file before sending changes to the repository. At the same time, of course, synchronization with arbitrary p2p files from the past will still be strange, but this is no longer the main scenario.
To submit changes to the repository, the client must:

compare file revision (versionTag) with last received
if it has changed, load and freeze
if there are local changes, fill in with versionTag check
in case of conflict goto 2
clear local state (tombstone set)

Links

CRDT - Wikipedia
Readings in CRDT - many useful publications about CRDT
CRDT - A digestible explanation with less math (GitHub) - a small description of CRDT without mathematics
crdt_notes (GitHub) - CRDT algorithms collected on one page
Conflict-free replicated data types (paper) - introduction to CRDT
A comprehensive study of the Convergent and Commutative Replicated Data Types (paper) - detailed explanation in detail, with mathematics and pictures, 40 pages.
An optimized conflict-free replicated set - an example of CRDT structure optimization
kdbxweb (GitHub) - js-lib to work with the KeePass format with the support of merge, which was discussed

Source: https://habr.com/ru/post/272987/

All Articles