In my day-to-day work I deal with the design and development of distributed applications. Such applications often use various interprocess communication tools to organize the interaction of their components. Particular difficulties arise when implementing algorithms that process shared data in a distributed way. Specialized distributed coordination systems exist to support such tasks, and the most popular and widely used of them is Apache Zookeeper.
Zookeeper is a complex product. Despite its age, bugs are still found in it from time to time. However, that is only a consequence of the capabilities that make life easier for many developers of distributed systems. Below I look at some of Zookeeper's features to help you better understand its capabilities, and then move on to the Apache Curator library (originally from Netflix), which makes the life of distributed software developers pleasant and offers many ready-made recipes for implementing distributed coordination objects.
As noted above, Zookeeper is a vital component of many distributed systems. The easiest way to think of the Zookeeper database is as a tree similar to a file system, where each element of the tree is identified by a path (/a/path/to/node) and can store arbitrary data. Thus, with Zookeeper it is quite possible to organize a hierarchical distributed data store, as well as other interesting constructions. The usefulness and prevalence of Zookeeper comes from a number of important properties, which are listed below.
Consensus is the ability of a distributed system to agree on its current state. Zookeeper relies on the ZAB algorithm; other systems often use different consensus algorithms, such as Raft. ZAB provides the C (consistency) and P (partition tolerance) properties of the CAP theorem, that is, consistency and tolerance to network partitions, at the cost of availability: in practice, if the cluster loses its quorum, it stops serving requests.
The client, establishing a connection with the Zookeeper cluster, creates a session. Within the session it can create nodes that are visible to other clients but whose lifetime equals the lifetime of the session: when the session ends, these nodes are deleted. Such nodes have a limitation: they can only be leaves and cannot have children, so you cannot build ephemeral subtrees. Ephemeral nodes are often used to implement service discovery systems.
Imagine that we have several instances of a service with load balanced between them. When an instance starts, an ephemeral node containing the service address is created for it; if the instance fails, the node is deleted and is no longer used for balancing. Ephemeral nodes are used very often.
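A minimal sketch of this registration pattern with the plain ZooKeeper Java client (the connect string, paths and address are assumptions for illustration; the parent path is assumed to exist already):

import org.apache.zookeeper.{CreateMode, WatchedEvent, Watcher, ZooDefs, ZooKeeper}

// Connect to the ensemble (connect string and session timeout are illustrative).
val zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 30000,
  new Watcher { override def process(event: WatchedEvent): Unit = () })

// Register this instance; the node disappears automatically when the session ends.
val address = "10.0.0.12:8080".getBytes("UTF-8")
zk.create("/services/my-service/instance-1", address,
  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL)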
The client can subscribe (set a watch) to events on nodes and be notified when an event associated with a node occurs. However, there is a limitation here as well: after an event fires on a node, the subscription is removed and has to be set again, and there is obviously a chance of missing other events that happen on the node in the meantime. Because of this, the usefulness of this feature is rather limited.
For example, within service discovery it can be used to react to a configuration change, but you must remember that after setting the watch you need to re-read the state "manually" to make sure no change was missed; a sketch of this pattern follows below.
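A sketch of that pattern with the plain ZooKeeper client (the function name and path are illustrative): re-register the watch and read the current data in the same call, so a change that happened between the event and the new subscription is not lost:

import org.apache.zookeeper.{WatchedEvent, Watcher, ZooKeeper}

def watchConfig(zk: ZooKeeper, path: String)(onChange: Array[Byte] => Unit): Unit = {
  val watcher = new Watcher {
    override def process(event: WatchedEvent): Unit =
      // The watch has just fired and is gone; re-subscribe and re-read.
      watchConfig(zk, path)(onChange)
  }
  // getData sets the new watch and returns the current value atomically.
  val data = zk.getData(path, watcher, null)
  onChange(data)
}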
Zookeeper allows you to create sequential nodes, whose names are formed by appending successively increasing numbers; such nodes can also be ephemeral. This feature is widely used both for applied tasks (for example, all services of the same type registering themselves as ephemeral sequential nodes) and for implementing Zookeeper "recipes", for example fair distributed locking.
Node versions allow you to determine whether a node has changed between a read and a write: in the set operation you can specify the expected version of the node, and if it does not match, the node has been changed by another client and its state needs to be re-read. This mechanism allows you to implement ordered changes to the data, for example when implementing the distributed counter "recipe"; a compare-and-set sketch follows below.
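A compare-and-set sketch with the plain ZooKeeper client (the helper name is illustrative): read the value together with its version, compute the new value, and write it back only if the version is unchanged, retrying otherwise:

import org.apache.zookeeper.KeeperException.BadVersionException
import org.apache.zookeeper.ZooKeeper
import org.apache.zookeeper.data.Stat

def update(zk: ZooKeeper, path: String)(f: Array[Byte] => Array[Byte]): Unit = {
  val stat = new Stat()
  val current = zk.getData(path, false, stat)          // read the value and its version
  try zk.setData(path, f(current), stat.getVersion)    // write only if the version still matches
  catch {
    case _: BadVersionException => update(zk, path)(f) // concurrent change: re-read and retry
  }
}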
It is possible to set access restrictions on nodes via ACLs, which is designed to protect data from untrusted applications. It is worth noting that ACLs do not, of course, protect against the load a malicious client can create; they only provide a mechanism for restricting access to content.
Zookeeper allows you to create TTL nodes, which are deleted after the given time if they receive no updates. This functionality appeared relatively recently.
It is possible to connect servers to a cluster in observer mode; observers can serve read operations, which is very useful when the cluster is heavily loaded and you need to scale reads without affecting writes. The question may arise: why not simply add regular nodes to the cluster? The answer lies in the consensus algorithm: the more nodes that participate in writes, the longer it takes to reach consensus and the lower the write throughput. Observer servers do not participate in consensus and therefore do not affect the performance of write operations.
Zookeeper does not rely on external wall-clock time to synchronize nodes. This is quite a useful property: systems that depend on exact time are more susceptible to errors caused by clock skew.
Of course, there is a fly in the ointment, and Zookeeper does have properties that can limit its use. There is even an expression that rather ironically describes the difficulties of working with Zookeeper - "Single Cluster of Failure" © Pinterest - which sarcastically points out that, in trying to get rid of a single point of failure with a distributed system built on Zookeeper, you can end up with the Zookeeper cluster itself becoming that point of failure.
Zookeeper loads the database into memory and keeps it there. If the database does not fit in RAM, it is paged out to swap, which leads to significant performance degradation. If the database is large, a server with a sufficiently large amount of RAM is required (which, however, is hardly a problem these days, when 1TB of RAM in a server is far from the limit).
If the session timeout is configured incorrectly on the client, it can lead to unpredictable consequences, which get worse as the load on the cluster grows or when cluster nodes fail. Users tend to reduce the session timeout (30 seconds by default) to make the system converge faster, since ephemeral nodes are removed sooner, but this makes the system less stable under load.
A cluster usually uses 3 nodes involved in reaching consensus; adding more nodes noticeably reduces write performance. The number of nodes must be odd (a requirement of the ZAB algorithm), so expanding the cluster to 5, 7 or 9 nodes will adversely affect write performance. If the problem lies specifically in read operations, use observer nodes.
The maximum amount of data in a node is limited to 1MB. If you need to store large volumes of data, Zookeeper is not a good fit.
Zookeeper does not limit how many children a node may have; however, the maximum size of a packet that the server can send to the client is 4MB (jute.maxbuffer). If a node has so many children that their list does not fit into one packet, then, unfortunately, there is no way to list them. This restriction is worked around by organizing hierarchical pseudo-flat lists in the same way caches are built in a file system: the names or digests of objects are split into parts and organized into a hierarchical structure, as in the sketch below.
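A small, purely illustrative sketch of that trick: split an object digest into prefixes and use them as intermediate levels, so that no single node accumulates an enormous number of children:

// "9f86d081884c..." -> "/store/9f/86/9f86d081884c..."
def hierarchicalPath(root: String, digest: String): String =
  s"$root/${digest.take(2)}/${digest.slice(2, 4)}/$digest"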
Despite these shortcomings, the advantages outweigh them, which makes Zookeeper an essential component of many distributed ecosystems, for example Cloudera CDH5, DC/OS, Apache Kafka, and others.
Since Zookeeper is implemented in Java, its use in JVM environments is natural: for example, it is quite easy to start a server or even a cluster of servers from Java and use it for integration or smoke tests of an application without deploying a separate server. However, the Zookeeper client API is quite low-level; it allows all the operations, but working with it feels like swimming against the current. In addition, a deep understanding of the fundamentals of Zookeeper is required to implement exception handling correctly. For example, when I used the basic interface for working with Zookeeper, debugging and hunting for errors in the distributed coordination and discovery code was a rather large problem and took considerable time.
However, a solution exists and was presented to the community by Netflix developer Jordan Zimmerman. Meet Apache Curator.
The main page of the project carries a quote that, in my opinion, reflects the essence of Curator perfectly. Once I started using this library, I found that the code for working with Zookeeper became simple and straightforward, and the number of errors and the time needed to eliminate them dropped several-fold. If, as said earlier, the standard client feels like swimming against the current, with Curator the situation turns 180 degrees. In addition, Curator ships a large number of ready-made recipes, which I review further below.
The API is designed as an extremely convenient fluent interface, which lets you define the required actions simply and concisely. For example (from here on, the examples are given in Scala):
client
  .create()
  .orSetData()
  .forPath("/object/path", byteArray)
which can be read as "create the node at the path "/object/path" or, if it already exists, just set its data, writing byteArray into it".
Or, for example:
client
  .create()
  .withMode(CreateMode.EPHEMERAL_SEQUENTIAL)
  .forPath("/head/child", byteArray)
"create a node of the type serial and ephemeral for the path" / head / child000000XXXX "and write into it byteArray" . A few more examples can be found on this manual page .
Curator supports both synchronous and asynchronous modes of performing operations. In the asynchronous case the client has the type AsyncCuratorFramework, in contrast to the synchronous CuratorFramework, and each call chain returns a stage whose thenAccept method registers the callback that is invoked when the operation completes. More details about the asynchronous interface can be found on its dedicated manual page.
val async = AsyncCuratorFramework.wrap(client)
async.checkExists()
  .forPath(somePath)
  .thenAccept(stat => mySuccessOperation(stat))
When using Scala, the asynchronous interface does not seem particularly necessary, since the same functionality is easy to express with Scala Futures, which keeps the code in the usual scala-way style; a sketch follows below. In the case of Java and other JVM languages, however, this interface can be useful.
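A sketch of what such "scala-way" usage could look like, assuming Scala 2.13 and its scala.jdk.FutureConverters (the path is illustrative): the CompletionStage returned by the async client converts directly into a scala.concurrent.Future:

import org.apache.curator.x.async.AsyncCuratorFramework

import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future
import scala.jdk.FutureConverters._

val async = AsyncCuratorFramework.wrap(client)

// Convert the CompletionStage into a Scala Future and continue in the usual Scala style.
val exists: Future[Boolean] =
  async.checkExists().forPath("/object/path").asScala.map(stat => stat != null)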
Zookeeper does not impose any semantics on the stored data. This means that developers are solely responsible for the formats in which data is stored and the paths at which it lives. This can be inconvenient in many cases, for example when new developers join the project. To address this, Curator supports data schemas that allow you to set constraints on paths and on the kinds of nodes allowed at those paths. A schema created from configuration can be expressed in JSON:
[ { "name": "test", "path": "/a/b/c", "ephemeral": "must", "sequential": "cannot", "metadata": { "origin": "outside", "type": "large" } } ]
Curator migrations are a bit like Liquibase, only for Zookeeper. With their help you can capture the evolution of the database across new versions of the product. A migration consists of a set of sequentially performed operations, each of which is a transformation of the Zookeeper database. Curator itself tracks which migrations have been applied, using Zookeeper for that. This feature can be used when deploying a new version of the application. The details of migrations are described on the corresponding manual page.
To simplify testing, Curator allows you to embed a Zookeeper server, or even a cluster of servers, into an application. This can also be done without Curator, with Zookeeper alone, but Curator provides a more concise interface. For example, in the case of Zookeeper without Curator:
import java.util.Properties

import org.apache.zookeeper.server.quorum.QuorumPeerConfig
import org.apache.zookeeper.server.{ServerConfig, ZooKeeperServerMain}

class ZookeeperTestServer(zookeperPort: Int, tmp: String) {
  val properties = new Properties()
  properties.setProperty("tickTime", "2000")
  properties.setProperty("initLimit", "10")
  properties.setProperty("syncLimit", "5")
  properties.setProperty("dataDir", s"$tmp")
  properties.setProperty("clientPort", s"$zookeperPort")

  val zooKeeperServer = new ZooKeeperServerMain
  val quorumConfiguration = new QuorumPeerConfig()
  quorumConfiguration.parseProperties(properties)

  val configuration = new ServerConfig()
  configuration.readFrom(quorumConfiguration)

  private val thread = new Thread() {
    override def run() = {
      zooKeeperServer.runFromConfig(configuration)
    }
  }

  def start = {
    thread.start()
  }

  def stop = {
    thread.interrupt()
  }
}

...

val s = new ZookeeperTestServer(port, tmp)
s.start
...
s.stop
In the case of Curator:
val s = new TestingServer(port)
s.start()
...
s.stop()
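For tests that need a whole ensemble rather than a single server, curator-test also provides TestingCluster; a minimal sketch (the instance count is an assumption for illustration):

import org.apache.curator.test.TestingCluster

// Start a local ensemble of three ZooKeeper servers for the duration of the test.
val cluster = new TestingCluster(3)
cluster.start()
val connectString = cluster.getConnectString // pass this to the client under test
// ...
cluster.stop()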
Curator recipes are the main reason to use this library for implementing distributed mechanisms of interprocess interaction. Below I list the basic recipes that Curator supports and how they can be applied. Some recipes I have not used in practice; for those, the description follows the manual as closely as possible.
These recipes implement a fault-tolerant process execution model in which there is a current leader and several processes stand in hot standby. As soon as the leader stops performing its functions, another process takes over the leadership. There are two suitable recipes, LeaderSelector and LeaderLatch; a sketch of the latter follows below.
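A minimal LeaderLatch sketch (the path and participant id are assumptions for illustration): every instance starts a latch on the same path, and await() blocks until this instance becomes the leader:

import org.apache.curator.framework.recipes.leader.LeaderLatch

val latch = new LeaderLatch(client, "/leader/my-service", "instance-1")
latch.start()                 // join the election
latch.await()                 // block until this process becomes the leader
if (latch.hasLeadership) {
  // do the work that only the leader is allowed to do
}
latch.close()                 // give up leadership and leave the election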
Locking is one of the most important mechanisms of distributed interprocess synchronization. Curator provides a wide range of lock objects: reentrant and non-reentrant mutexes, a read/write lock, semaphores and a multi-lock that combines several locks into one; a mutex sketch follows below.
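A minimal sketch with InterProcessMutex, Curator's reentrant distributed lock (the lock path is an assumption for illustration):

import org.apache.curator.framework.recipes.locks.InterProcessMutex

val lock = new InterProcessMutex(client, "/locks/orders")
lock.acquire()                    // blocks until the lock is obtained
try {
  // critical section: only one process across the cluster gets here at a time
} finally {
  lock.release()                  // always release, even if the critical section fails
}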
I would like to note that Zookeeper is not the best candidate for organizing intensive distributed queues; if you need to push through a large number of messages, I recommend a purpose-built solution such as Apache Kafka, RabbitMQ or others. Nevertheless, Curator provides a set of queue recipes, including a plain distributed queue as well as id, priority and delay variants; a short sketch follows below.
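A minimal sketch of the distributed queue recipe (the path, consumer and serializer are illustrative; client is the already-created CuratorFramework):

import java.nio.charset.StandardCharsets

import org.apache.curator.framework.CuratorFramework
import org.apache.curator.framework.recipes.queue.{DistributedQueue, QueueBuilder, QueueConsumer, QueueSerializer}
import org.apache.curator.framework.state.ConnectionState

// How items are converted to and from bytes stored in Zookeeper.
val serializer = new QueueSerializer[String] {
  override def serialize(item: String): Array[Byte] = item.getBytes(StandardCharsets.UTF_8)
  override def deserialize(bytes: Array[Byte]): String = new String(bytes, StandardCharsets.UTF_8)
}

// Called for every message taken from the queue.
val consumer = new QueueConsumer[String] {
  override def consumeMessage(message: String): Unit = println(s"got: $message")
  override def stateChanged(client: CuratorFramework, newState: ConnectionState): Unit = ()
}

val queue: DistributedQueue[String] =
  QueueBuilder.builder(client, consumer, serializer, "/queues/demo").buildQueue()
queue.start()
queue.put("hello")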
The Apache Curator library is definitely worth considering: it is an outstanding example of engineering work and greatly simplifies interaction with Apache Zookeeper. Its disadvantages include the rather small amount of documentation, which raises the entry barrier for novice developers. In my practice I have repeatedly had to study the library's source code to understand exactly how this or that recipe works. However, this also has a positive effect: a deep understanding of the implementation lets you make fewer logical errors based on assumptions.
It should be noted that the Curator developers recommend studying the Zookeeper documentation before starting to use the library. This is very sensible advice, because Zookeeper is a product whose effective use requires understanding exactly how it functions, not just knowing its API. These costs will certainly pay off, and in the hands of an experienced engineer the capabilities of Zookeeper allow you to build reliable and performant distributed systems.
Source: https://habr.com/ru/post/334680/