📜 ⬆️ ⬇️

Some facts about the CAP- “theorem”

In any discovery of NoSQL databases, someone will definitely remember the CAP- “theorem”. I do not accidentally write the word "theorem" in quotes. CAP- "theorem" is not at all a theorem in the mathematical understanding of this word. This is an informal statement made by Eric Brewer in a report at the Principles of Distributed Computing (PODC) conference in 2000. Eric argued that it is impossible to create a distributed (consisting of several equivalent instances - links) web application that will simultaneously have three properties: consistency , availability and separation tolerance , or CAP. The informality of the statement is that Brewer did not define these three concepts.

Two years later, Seth Gilbert and Nancy Lynch published a study where they defined the concepts of CAP and also formalized “ Delayed Consistency ”, which was later called “ Eventual Consistency ” and proved the CAP “theorem” in terms of specified definitions. If you have not read the study, then it is definitely worth doing - lpd.epfl.ch/sgilbert/pubs/BrewersConjecture-SigAct.pdf

This “theorem” would not have been necessary to anyone, if it had not been adopted by the marketers of NoSQL.

Fact №1


Consistency in CAP does not mean that consistency in ACID . Gilbert and Lynch define coherence as the existence of a single order of read / write operations throughout the system. That is, if at time T1 the value X was written, then at any time T2> T1 the user can read X until another write operation occurs. A more human formulation of consistency in CAP - the user will always read what he wrote, if no one else had time to rewrite the data. In ACID, this concept coincides with atomicity.
The consistency in ACID gives much stronger guarantees - the integrity (uniqueness) of the keys and the integrity of the foreign keys (the foreign key always refers to the existing record), as well as the fulfillment of all the limitations described in the data schema. In addition, ACID guarantees not only one transaction, but transactions that may consist of many operations. Some scenarios, such as ticket sales, are generally impossible without ACID guarantees.
')

Fact number 2


Availability in CAP, means at all that you think . Gilbert and Lynch define accessibility as the ability of any link whose performance is not impaired to respond to a request. What is a malfunction is not specified.

From the client’s point of view, the response is “server unavailable” and a direct lack of communication with the server may be equivalent. Because of this logical flaw, any server inability to respond can be perceived as a link failure and the need to repeat the request to another node.
That is, a web service, all instances of which access the same database, in the event of a loss of communication between the instance and the database begins to generate an error, is available from the point of view of the CAP.

There is another logical flaw in the definition of accessibility - the time required for the answer is unlimited, the main thing is that it is finite. That is, the service may try to access the common base for as long as it takes until the connection is restored. “And what if the connection is not restored at all?” - you ask, about this just a little further.

Thus, accessibility in CAP is not the same accessibility in the philistine sense of the meaning of the word.

Fact number 3


CAP assumes that network separations occur and can last indefinitely . Splitting a network means losing any number of messages between servers.
There are two misconceptions.
  1. With unlimited network sharing, it is impossible to fulfill the Consistency property and even the Consistency property in the long run . In fact, with unlimited long sharing, the web service ceases to be one from the user's point of view. Gilbert and Lynch in their study do not consider the case of unlimited long separation, but only a predetermined number of lost messages.
  2. Network sharing is extremely rare in our time. Gilbert and Lynch expressly declare that systems operating on a local network can be considered not susceptible to separation.

Thus, most information systems and websites do not fall under the CAP- “theorem”.

Fact number 4


CAP does not describe client behavior . Loss of communication between the client and the web application is much more common than network separation or even server failure. The mechanisms for re-requesting and resuming transmission are embedded in many clients. Roy Fielding's thesis, which appeared about the same time as CAP, describes how to build web services and clients for them based on the HTTP protocol (REST services), with maximum fault tolerance and consistency (in the sense of ACID).
In most cases, on the principles of REST, you can build a system (client + service) that can overcome the lack of accessibility from the point of view of CAP, and in some cases the lack of consistency.

Fact number 5


AP systems are useless in their pure form . Gilbert and Lynch write that to achieve AP characteristics it is enough that the service always returns one value. In principle, in its pure form, the AP system (on which no additional restrictions are imposed) can return a random value.

Fact №6


Delayed consistency or consistency can ultimately be achieved only if there is no recording in the interval to achieve consistency . In other words, if the service takes time T to achieve a consistent state, then it is necessary that the recording take place no more than once every T interval, otherwise the delayed consistency becomes a permanent inconsistency .

Of course, when recording unrelated data occurs, for example, different zones are updated in the DNS, then this is not a problem. But for data with a bunch of connections, there is a great opportunity to get a permanently inconsistent state with a large write load.

Fact №7


Deferred consistency or consistency ultimately gives no guarantees . If the system supports only consistency in the long run, then it is impossible to guarantee its correctness. Any link may rely on outdated data and not be aware of this. Moreover, it is impossible to introduce additional restrictions and control them at the level of the program code in order to obtain guarantees, since it is necessary to rely on a consistent state to check the restrictions.

Conclusion


The CAP- “theorem” has a rather difficult fate. Initially, it was a statement based on practical considerations. But for the proof, a formal model was invented, which is quite far from the real world. And all would be fine if CAP- “theorem” did not become a way of promoting NoSQL databases, causing a wave of disputes, resentments and delusions. I hope that by understanding CAP, you can make decisions based on practical considerations, not marketing statements.

Source: https://habr.com/ru/post/231703/


All Articles