📜 ⬆️ ⬇️

Kronos: no time travel even in distributed systems

In distributed systems there are a number of fundamental problems: efficient distributed transactions, exactly-once data processing, accurate synchronization of physical clocks. To solve this problem , different types of logical clocks were invented .


However, vector clocks have unpleasant properties: they introduce a conditional relationship between events where there is none, and lose it where it actually is.


However, you can think of something more reliable - Kronos. In this article, we will look at the causal relationship accounting algorithm and its application for building a Key-Value repository with distributed transactions.


image


Problems


As already mentioned, there are a number of problems with logical clocks:



Decision


In a 2014 article, Kronos: The Design and Implementation Ordering Service proposes a solution - a stand-alone service that will deal with causal relationships in events.


The main abstraction inside Kronos is the event on which partial order is introduced. The causal relationship is transitive — that is, if, for example, we know that the creation of the file precedes its change, and the change is preceded by the deletion, we can make a logical conclusion that the creation occurred before the deletion.


The minimum API can be defined by the following set of methods:


MethodResultComment
create_event()eCreates a new event in Kronos.
query_order([(e_1, e_2), ...])[<-, concurrent, ->, ...]For each pair of requests, returns the direction of causation, or simultaneity of events.
assign_order([(e_1, e_2, must), (e_3, e_4, prefer), ...])OK/FAILFor each pair of the query sets the direction of causation
acquire_ref(e)OKIncreases the reference count for this event.
release_ref(e)OKReduces the reference count for this event.

Implementation


It is quite logical that the system is based on an event oriented graph, with an effective wide search for checking the relationship between events, a fault tolerance mechanism and garbage collection.


As can be seen from the API, the assign_order request assign_order takes a type of causal relationship: must or prefer . must correspond to strict invariants — for example, _->_ prefer_->_ prefer while prefer may not be used if it conflicts with must connections. An example of using prefer - requests that came earlier, it is better to call earlier, but this does not affect the correctness.


Effective BFS


In our case, the graph may be large, but the events for which verification requests will be performed, as a rule, will be located close. Therefore, it is necessary to perform BFS faster for such cases.


In the standard implementation, the longest place is the initialization of the array of visited vertices, which always takes time equal to the number of vertices in the graph. Instead, you can use a hash table or use other tricks.


Garbage collection


As you can see from the table, there are two more methods: acquire_ref and release_ref .


Inside Kronos for each event is stored reference count. While some service handles the event, or reserves the ability to add new events that occur after the current one, it stores the link. When this need disappears, the service calls release_ref .


Kronos will delete the event when all conditions are met:


  1. Number of links reached zero
  2. All events preceding this one have already been removed from the graph.

This approach does not limit possible requests, but saves memory inside Kronos.


Applications


Consider the use of the system on the example of Key-value store with distributed transactions.


Let there are several servers, each server is responsible for a range of keys.


Each transaction corresponds to an event in Kronos. For each key, the server must store the last transaction number in which the key participated. The client creates an event and sends its number to all servers whose keys are affected by this transaction. The server tries to create a dependency in Kronos between the current transaction number and the previous event that is stored for this key. If it is not possible to create a dependency, then the transaction is considered unsuccessful (note that there is no data interaction yet).


If all the operations of adding dependencies have completed successfully - this means that the transaction will take place and it can be performed. Servers learn about this from the client and begin to perform parts of the transaction.


Note that such transactions will be ACID :



Performance


Implementing such a KV storage can indeed be effective. The original article provides data that the described implementation of KV-storage surpasses the transaction-based implementation 4 times as fast as transactions.


Moreover, in comparison with MongoDB, the system over Kronos is inferior by only 6%, despite the fact that MongoDB does not use distributed transactions.


Analysis


However, the operation of the Kronos has several disadvantages.



However, the described system allows for the flexible management of a causal relationship between events, ensuring predictable adherence to the necessary invariants.


Conclusion


This is what we at GoTo School teach students and schoolchildren in the direction of Distributed Systems.


And then there are Algorithms and Applications , Application Programming, Bioinformatics and Data Analysis


Come to our autumn school on October 27 - November 4 or winter school in early January.


And if you are not a student and not a schoolboy, come to teach .


')

Source: https://habr.com/ru/post/426399/


All Articles