
Dissecting ACID in NoSQL

Motivation


It is no secret that, according to the heuristic rule known as the CAP theorem, NoSQL solutions, unlike conventional RDBMSs, cannot provide full ACID support. It must be said that for a number of tasks this is not needed, and supporting one of the elements comes at the cost of compromising the others; hence the great variety of existing solutions. In this article I would like to look at various architectural approaches to partially satisfying the requirements of a transactional system.

"A" Atomicity


Atomicity guarantees that no transaction will be partially committed in the system: either all of its sub-operations are executed, or none of them are.

NoSQL systems usually choose high performance over transactional semantics, since honoring them adds processing overhead. Many systems still provide guarantees at the key or row level (Google BigTable) or expose an API for atomic operations (Amazon DynamoDB), in which only one thread at a time can modify a record — useful if, for example, you want a user-visit counter distributed across the cluster. Most systems rely on non-blocking read-modify-write cycles. The cycle consists of three stages: read the value, modify it, write it back. As you can see, in a multithreaded environment many things can go wrong — for example, someone may change the record between the read and write phases. The main mechanism for resolving such conflicts is the compare-and-swap (CAS) algorithm: if someone changed the record during the cycle, we detect that the record has changed and repeat the cycle until our value is established. This approach looks preferable to a fully write-blocking mechanism. The number of such cycles can be very large, so the operation needs some timeout, after which it is rejected.
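The read-modify-write cycle with CAS can be sketched in a few lines of Python. Note that `CasStore` here is a toy in-memory stand-in for illustration, not any particular database's API; real systems expose CAS as a conditional-write operation.

```python
import threading

class CasStore:
    """Toy in-memory store with a compare-and-swap primitive."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()  # models the store's internal atomicity

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def compare_and_swap(self, key, expected, new):
        """Write `new` only if the current value is still `expected`."""
        with self._lock:
            if self._data.get(key) != expected:
                return False
            self._data[key] = new
            return True

def increment_counter(store, key, retries=1000):
    """Non-blocking read-modify-write cycle: retry until CAS succeeds
    or the retry budget (our stand-in for a timeout) is exhausted."""
    for _ in range(retries):
        current = store.get(key)            # read
        new = (current or 0) + 1            # modify
        if store.compare_and_swap(key, current, new):  # write if unchanged
            return new
    raise TimeoutError("too much contention on key %r" % key)
```

If another thread slips in between `get` and `compare_and_swap`, the CAS fails and the loop simply retries with the fresh value — no writer ever blocks another.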

"C" consistency


A transaction that reaches its normal completion and thereby commits its results preserves the consistency of the database. Given that NoSQL distributes information across servers, this boils down to whether all replicas holding a copy of the data always contain the same version of it.
By their nature, modern NoSQL systems have to choose high availability and the ability to scale the cluster horizontally; as a result, the system cannot ensure complete data consistency and makes certain concessions in how consistency is defined. There are two approaches:

Strict consistency

Such systems ensure that the replicas can always agree on a single version of the data returned to the user. Some replicas may not hold that value, but when the system processes a request for a value by key, the machines can always decide which value to return — it just will not always be the latest one. How it works: suppose we have N replicas of the same key. When a request to update the key's value arrives, the system does not acknowledge it to the user until W replicas confirm they received the update. When a user requests a value, the system returns a response once at least R replicas return the same value. The system is considered strictly consistent if the condition R + W > N is met. The choice of R and W affects how many machines must answer before a response is returned to the user; usually R + W = N + 1 is chosen — the minimum condition needed to ensure strict consistency.
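The quorum condition above is just arithmetic: R + W > N forces the read set and write set to overlap in at least one replica, so a quorum read always intersects the last acknowledged write. A minimal sketch:

```python
def is_strictly_consistent(n, w, r):
    """R + W > N guarantees the write quorum and read quorum overlap
    in at least one replica, so a read always sees the latest
    acknowledged write."""
    return r + w > n

def minimal_quorums(n):
    """All (W, R) pairs satisfying the minimal condition R + W = N + 1."""
    return [(w, n + 1 - w) for w in range(1, n + 1)]

# With N = 3 replicas:
assert is_strictly_consistent(3, 2, 2)      # classic majority quorums
assert not is_strictly_consistent(3, 1, 1)  # eventual consistency only
```

Note the trade-off encoded here: W = 1, R = N makes writes fast and reads expensive, W = N, R = 1 the reverse, and W = R = majority balances the two.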

Eventual consistency

Some systems (Voldemort, Cassandra, Riak) allow you to choose R and W such that R + W < N. When a user requests information, there may be times when the system cannot resolve the conflict between versions of a key's value. To resolve conflicts, a form of versioning called vector clocks is used. This is a vector associated with each key that contains a change counter for each replica. Let servers A, B, and C be replicas of the same key; the vector contains three values (N_A, N_B, N_C), initially (0, 0, 0). Each time a replica changes the key's value, it increments its own counter in the vector. If B changes the value of a key whose version was (39, 1, 5), the vector becomes (39, 2, 5). When another replica, say C, receives an update from replica B, it compares the incoming vector with its own. As long as all of its own counters are less than or equal to those that came from B, the incoming value is a stable descendant and it can overwrite its own copy. If the vectors on B and C each have some counters larger and some smaller — for example, (39, 2, 5) and (39, 1, 6) — the system identifies a conflict.
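The comparison rule can be stated compactly: one vector dominates another only if it is greater than or equal in every component; if each vector wins in some component, the updates were concurrent. A minimal sketch of that check, using the vectors from the text:

```python
def compare_vector_clocks(a, b):
    """Compare two per-replica counter vectors of equal length.
    Returns which side (if either) is an ancestor, or 'conflict'
    when the updates were concurrent."""
    a_le_b = all(x <= y for x, y in zip(a, b))
    b_le_a = all(y <= x for x, y in zip(a, b))
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "a<=b"      # b descends from a; safe to overwrite a's copy
    if b_le_a:
        return "b<=a"      # a descends from b
    return "conflict"      # concurrent updates; must be reconciled

# B's update dominates the old version, so it can be applied:
assert compare_vector_clocks((39, 1, 5), (39, 2, 5)) == "a<=b"
# Two divergent versions conflict:
assert compare_vector_clocks((39, 2, 5), (39, 1, 6)) == "conflict"
```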

How this conflict is resolved varies across systems. Voldemort returns multiple copies of the value and leaves the resolution to the user: two versions of a shopping cart on a site can be merged without losing information, whereas merging two versions of one edited document requires user intervention. Cassandra, which stores a timestamp for each record, returns the latest version if a conflict is detected; this approach cannot merge two versions without losing information, but it simplifies the client side.

"I" Isolation


During the execution of a transaction, concurrent transactions must not affect its result. The concept of transaction isolation levels matters here as well.

Cassandra, starting with version 1.1, guarantees that if you perform an update:

UPDATE Users
SET login = 'login', password = 'password'
WHERE key = 'key';


then no concurrent read will see a partial update of the data (login changed but password not yet) — and this holds only at the level of rows that are within a single column family and share a common key. This roughly corresponds to the read uncommitted isolation level, at which lost-update conflicts are resolved. But Cassandra provides no rollback mechanism at the cluster level: it is possible, for example, that the login and password are saved on some number of nodes, yet fewer than the W needed to return a correct result to the user, and the user is forced to resolve this conflict. The isolation mechanism itself is that for each record being changed an invisible, client-isolated version is created, which then automatically replaces the old version via the compare-and-swap mechanism described above.

"D" Reliability


Regardless of problems at lower levels (such as a power failure or hardware faults), the changes made by a successfully completed transaction must remain saved after the system comes back up. In other words, if the user has received confirmation from the system that the transaction completed, they can be sure that their changes will not be undone by some kind of failure.

The most predictable failure scenario is a power outage or server restart. A fully durable system in this case should not return a response to the user until it has written all changes from memory to the hard disk. But writing to disk takes too long, and many NoSQL systems trade durability for performance.

Ensuring durability on a single server

A standard hard disk can handle 70-150 operations per second, giving a bandwidth of up to 150 MB/s; an SSD reaches 700 MB/s, and DDR memory 6000-17000 MB/s. Therefore, ensuring durability on a single server without sacrificing performance comes down to reducing the number of random-access writes and increasing sequential writes. Ideally, the system should minimize the number of writes between calls to fsync (which synchronizes data in memory with the disk). Several techniques are used for this.

Fsync frequency control

Redis offers several ways to configure when fsync is called. You can configure it to be called after every record change — the slowest and safest choice. To improve performance, you can flush to disk every N seconds; in the worst case you lose the last N seconds of data, which may be quite acceptable for some users. If durability is not critical at all, you can disable fsync and rely on the operating system to sync memory to disk at some point on its own.

Increasing sequential writes through logging

For efficient data retrieval, NoSQL systems often use additional structures — for example, B-trees for building indexes — and working with them causes many random disk accesses. To reduce these, some systems (Cassandra, HBase, Riak) append update operations to a sequentially written file called a redo log. While the main structures are written to disk rarely, the log is written often. After a crash, the missing entries can be restored using the log.
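The idea boils down to two operations: append each update sequentially, and replay the log on restart to rebuild what was lost. A minimal sketch (the JSON-lines format here is an arbitrary choice for illustration, not what any of the systems above actually write):

```python
import json

def append_update(log_path, key, value):
    """Append an update record sequentially instead of randomly
    rewriting an on-disk index structure."""
    with open(log_path, "a") as f:
        f.write(json.dumps({"key": key, "value": value}) + "\n")

def replay(log_path):
    """After a crash, rebuild the in-memory table by replaying
    the log from the start; later records win."""
    state = {}
    try:
        with open(log_path) as f:
            for line in f:
                state_update = json.loads(line)
                state[state_update["key"]] = state_update["value"]
    except FileNotFoundError:
        pass  # no log yet: start from an empty state
    return state
```

Periodically the in-memory state is flushed to the main structures and the log can be truncated, so replay stays cheap.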

Increasing throughput by grouping writes

Cassandra groups several concurrent changes arriving within a short window so that they can be combined into a single fsync. This approach, called group commit, increases the response time for an individual user, who has to wait for several other transactions to commit along with their own. The benefit is higher overall throughput, since multiple random writes are combined.
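The throughput gain is easy to see by contrasting one fsync per record with one fsync per batch — a simplified sketch of the trade-off, not Cassandra's actual commit-log code:

```python
import os

def commit_each(path, records):
    """One fsync per record: every writer's data is durable before
    the next write starts, at the cost of one disk sync each."""
    with open(path, "ab") as f:
        for r in records:
            f.write(r + b"\n")
            f.flush()
            os.fsync(f.fileno())

def group_commit(path, records):
    """Group commit: write the whole batch, then a single fsync.
    Each writer in the batch waits a little longer, but the number
    of fsync calls drops from len(records) to 1."""
    with open(path, "ab") as f:
        for r in records:
            f.write(r + b"\n")
        f.flush()
        os.fsync(f.fileno())
```

With a disk capable of roughly a hundred syncs per second, batching ten writes per fsync raises the commit rate by about an order of magnitude, which is exactly the bandwidth argument made above.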

Ensuring durability within a server cluster

Because disks and servers can fail unexpectedly, information must be distributed across several machines.
Redis uses a classic master-slave architecture for data replication: all operations applied to the master are propagated to the replicas in the form of a log.
MongoDB stores each document on a given number of servers, and you can specify the number of servers W < N, as described above, that must acknowledge a write before control is returned to the user.
HBase achieves multi-server durability through the HDFS distributed file system.

In general, one can notice a certain tendency of modern NoSQL tools toward providing stronger data consistency. Still, for now SQL and NoSQL tools exist and develop in parallel, solving quite different tasks.

Source: https://habr.com/ru/post/228327/
