ZeroMQ: sockets in a new way

In any medium or large application, whether desktop or web, business or personal, the programmer has to solve an important architectural problem: how will the threads, processes, modules, nodes, clusters, and other parts of the application's ecosystem communicate with each other?

Many developers take the path of least resistance and hand this task to, say, a DBMS: one process writes data to the database, a second reads it, processes it, writes the result back, and so on.
Exchanging data through files is embarrassing to even mention these days, but it still happens.
Other programmers try to build their own specialized solution and, as a rule, choose sockets.

Designing and developing an application architecture is an extremely interesting task, but it is a separate topic. In this post I would like to share my first impressions of the ZeroMQ library.
ZeroMQ gives the developer a higher level of abstraction over sockets. The library takes over part of the work of buffering data, maintaining queues, establishing and restoring connections, and so on. Instead of dealing with such chores, you can focus on the main thing: the architecture and logic of the application.

However, there is no such thing as a free lunch. So I tried, as far as I could, to work out what this convenience costs and which pros and cons I found when using the library.

A full description of ZeroMQ, its API, and plenty of other useful information can be found on the official ZeroMQ website.

In addition, I highly recommend reading the entire Guide on the official website even if you do not use the library: it is full of sound ideas and is generally useful for studying various types of network architectures.

We will solve a typical problem and compare a solution based on traditional sockets with one based on ZeroMQ sockets.

So, the task

Suppose we have a service that accepts a client connection on a socket, receives requests from it, and sends back responses.
For simplicity, let it be an echo service: whatever it receives, it sends back.

Next we need to decide on the exchange format.
A traditional socket works with a stream of bytes, which is inconvenient for an application that exchanges structured information. So we have to define some kind of “packet” of data; for simplicity, the packet will have a single attribute, its length. That is, we first transmit the packet length and then the data itself, of exactly that length. On the receiving side we buffer the incoming bytes and parse them into packets.
Inside a packet we can put anything: a binary structure, text, JSON, BSON, XML, and so on.
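The framing described above can be sketched in a few lines of Python. This is only an illustration, not code from the test sources: the 4-byte big-endian length header and the names pack_packet and PacketParser are assumptions made for the example.

```python
import struct

# Assumed header layout: one unsigned 32-bit length in network byte order.
HEADER = struct.Struct("!I")


def pack_packet(payload: bytes) -> bytes:
    """Prefix the payload with its length."""
    return HEADER.pack(len(payload)) + payload


class PacketParser:
    """Buffers incoming bytes and cuts them into complete packets."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> list:
        """Add bytes read from the socket; return the packets now complete."""
        self._buf += chunk
        packets = []
        while len(self._buf) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buf)
            end = HEADER.size + length
            if len(self._buf) < end:
                break  # the body has not fully arrived yet
            packets.append(bytes(self._buf[HEADER.size:end]))
            del self._buf[:end]
        return packets


if __name__ == "__main__":
    parser = PacketParser()
    stream = pack_packet(b"hello") + pack_packet(b"world")
    # A TCP read may return any slice of the stream, so feed it in pieces.
    print(parser.feed(stream[:7]), parser.feed(stream[7:]))
```

The parser keeps a partial packet in its buffer between calls, which is exactly the bookkeeping that ZeroMQ later takes off our hands.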

For simplicity, our server will receive and transmit data in a single thread.
But the data processing on the server must happen in several threads (we will call them workers).

Solution

As a solution, I wrote two source files, one using ordinary sockets and the other using ZeroMQ.
I will not include the source code in the post itself; to view it, follow the links:
1) Traditional sockets (19 KB)
2) ZeroMQ sockets (11.74 KB)

More about the tests

Each source file is a ready-made test which, when run, starts both the server and the clients (in the same process, but in different threads).
The test runs for several seconds and reports the results for each client: how many packets and bytes it received, and the average packet receive rate.
When a client thread starts, it sends one or more data packets; each time a packet comes back, the client sends it again.
The test parameters can be changed; they are set via #define in each file.

As you can see, ZeroMQ cut the code roughly in half, and readability improved.
Now let's see how much we paid for it.

On my machine, with the initial parameters, the test produced approximately the following results:

1) 400 packets per second (traditional sockets);
2) 500 packets per second (ZeroMQ).
* Note: by default the test uses 10 client threads and 2 workers, the packet size is 1 KB, and the server's “processing” time for one packet (imitated with usleep) is 2 ms.

Let me say right away that if we processed the data in the same thread that does the receiving and sending, ZeroMQ would lose to ordinary sockets by a factor of 2 to 4. I checked this with a similar test, but I will not publish it for now, because a single-threaded server that handles only one request at a time while the other clients wait is not our case.

Let's see why ZeroMQ showed better results than ordinary sockets, despite some overhead from the extra level of abstraction.

The main reason, of course, lies in the source code of the test itself. Processing data in several threads on top of ordinary sockets is a rather complicated task. In my test it is implemented in a far from optimal way:

1) there is no queue of tasks or received packets; we simply do not accept data if we cannot process it;
2) when a worker has finished processing a request, it sleeps idly until the main thread writes the next task into its buffer;
3) when all workers are busy, the main thread spins idly through the main loop until a worker becomes free (or an I/O event arrives);
4) while a worker writes the result of a request to the client's send buffer, the main thread is blocked (or the worker waits until the main thread finishes an iteration of the main loop).

Eliminating these shortcomings would significantly increase the amount of code and the complexity of the task, and with them the probability of errors.

Now let's turn to the ZeroMQ option.

The source code is more readable and, most importantly, it contains no locks at all (no mutexes, unlike the version with ordinary sockets). This is the main advantage of ZeroMQ.

In traditional asynchronous programming, locks are inevitable. As the code grows, you will sooner or later put an unnecessary lock in one place and forget a necessary one in another. Then nested locks appear, which eventually lead to deadlocks and various race conditions. If the errors show up only rarely, in a production application, you will have a miserable time tracking them down. And the effect will be spectacular: your service hangs for good, unsaved data is lost, and clients are disconnected.

ZeroMQ solves this problem simply: processes and threads only exchange messages. I should add a caveat: sharing any common data between threads and using locks is not recommended. ZeroMQ lets you avoid sharing sockets and their buffers between threads, but the application's own data remains the developer's headache.
Within a process, threads can also exchange messages, and not necessarily over TCP. It is enough to pass something like "ipc://mysock" to zmq_bind / zmq_connect instead of "tcp://127.0.0.1:1010", and the exchange goes through UNIX sockets; pass "inproc://mysock", and the exchange goes through the process's own memory. That is much faster and cheaper than sockets.
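To show how little changes between transports, here is a minimal sketch using the pyzmq binding. pyzmq is chosen only for brevity; its bind and connect methods wrap the C functions zmq_bind and zmq_connect named above. The function name echo_once and the 2-second receive timeout are assumptions made for the example.

```python
import zmq


def echo_once(endpoint: str, payload: bytes) -> bytes:
    """Send one request over `endpoint` and return the echoed reply."""
    ctx = zmq.Context()
    server = ctx.socket(zmq.REP)
    client = ctx.socket(zmq.REQ)
    # Safety net for the sketch: give up after 2 s instead of blocking forever.
    server.setsockopt(zmq.RCVTIMEO, 2000)
    client.setsockopt(zmq.RCVTIMEO, 2000)
    try:
        server.bind(endpoint)        # zmq_bind in the C API
        client.connect(endpoint)     # zmq_connect in the C API
        client.send(payload)
        server.send(server.recv())   # echo the request back
        return client.recv()
    finally:
        client.close(linger=0)
        server.close(linger=0)
        ctx.term()


if __name__ == "__main__":
    # Only the endpoint string differs between transports.
    print(echo_once("inproc://mysock", b"ping"))
```

Calling the same function with an "ipc://..." or "tcp://127.0.0.1:<port>" endpoint should work without touching the rest of the code, provided the path or port is free. Note that inproc only works between sockets created from the same context.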
Take the test source as an example.
The thread that processes the data (the worker) is just another client, only an internal one. It connects to the main thread through the specified socket (most efficiently via inproc://), receives a task, executes it, and sends the result back to the main thread. The main thread then forwards the result to the external client.
ZeroMQ lets you not worry about distributing tasks and finding a free worker. In this example, it automatically queues each packet for processing (for sending to a worker).
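The worker arrangement can be sketched roughly as follows. This is not the test source: it uses pyzmq, leaves out the external clients, and the names worker, run_tasks and the endpoint "inproc://workers" are made up for the illustration. The main thread owns a DEALER socket, each worker is a REP socket connected over inproc, and ZeroMQ itself spreads the tasks over the workers.

```python
import threading
import time

import zmq

WORKERS_ENDPOINT = "inproc://workers"  # hypothetical endpoint name


def worker(ctx, ready):
    """Internal client: receive a task, 'process' it, send the result back."""
    sock = ctx.socket(zmq.REP)
    sock.connect(WORKERS_ENDPOINT)
    ready.wait()  # startup only: tell the main thread we are connected
    try:
        while True:
            task = sock.recv()
            time.sleep(0.002)  # imitate 2 ms of processing
            sock.send(task)    # echo service: the result is the task itself
    except zmq.ContextTerminated:
        pass  # the main thread terminated the context: time to exit
    finally:
        sock.close(linger=0)


def run_tasks(tasks, n_workers=2):
    """Hand tasks to the workers and collect the results, without mutexes."""
    ctx = zmq.Context()
    backend = ctx.socket(zmq.DEALER)
    backend.bind(WORKERS_ENDPOINT)
    ready = threading.Barrier(n_workers + 1)
    threads = [threading.Thread(target=worker, args=(ctx, ready))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    ready.wait()
    for task in tasks:
        # The empty first frame is the delimiter a REP socket expects.
        backend.send_multipart([b"", task])
    results = [backend.recv_multipart()[-1] for _ in tasks]
    backend.close(linger=0)
    ctx.term()  # wakes the workers blocked in recv()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_tasks([b"a", b"b", b"c", b"d"]))
```

The results may come back in a different order than the tasks were sent, because the workers run in parallel. The only synchronization primitive is the startup barrier; the task hand-off itself goes entirely through messages.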

Undoubtedly, ZeroMQ also has significant drawbacks. Although the library takes on a lot of worries, it does not guarantee the delivery or the safety of your messages. That is left to the developer, which in my opinion is absolutely right.

Let's go through several of the most important aspects of working with ZeroMQ.

Connections

Pros:
+ ZeroMQ automatically restores outgoing connections. The application may not even notice a disconnection unless, of course, you specifically track this event (see zmq_socket_monitor()).

Cons:
- I have not yet figured out how to find out the real IP address, host name, or at least the descriptor of the client a message came from. The most ZeroMQ gives you is a client identifier (for a socket of type ZMQ_ROUTER), which is either assigned automatically by ZeroMQ or set by the client itself before it establishes the connection.
- Likewise, I have not yet figured out how to forcibly disconnect a client (for example, one that did not log in in time). This can lead to an accumulation of unneeded connections.

Queues

Pros:
+ Messages passed to ZeroMQ are placed in an internal queue, so you do not have to wait for the send to complete, and for an outgoing connection it does not matter whether it has been established yet. The queue size can be configured.
+ There is also a receive queue with so-called "fair queuing": for incoming connections, you read messages from a single receive queue shared by all clients.

Cons:
- As far as I know, you cannot manage the queues: clear them, query their actual size, and so on.
- When a queue overflows, new messages are discarded (or, for some socket types, the send blocks instead).
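The queue behaviour can be observed with a small pyzmq sketch. The function name fill_queue and port 5599 are assumptions for the example, and it presumes nothing is listening on that port. A PUSH socket is used here; it refuses further sends once its queue is full rather than dropping messages, and what happens on overflow depends on the socket type.

```python
import zmq


def fill_queue(hwm: int, endpoint: str = "tcp://127.0.0.1:5599",
               limit: int = 10000) -> int:
    """Queue messages towards a peer that is not there; return how many fit."""
    ctx = zmq.Context()
    sock = ctx.socket(zmq.PUSH)
    sock.setsockopt(zmq.SNDHWM, hwm)  # queue size; set before connect
    sock.connect(endpoint)            # returns at once, connected or not
    accepted = 0
    try:
        while accepted < limit:
            # NOBLOCK turns "the queue is full" into zmq.Again
            # instead of a blocking call.
            sock.send(b"x" * 1024, flags=zmq.NOBLOCK)
            accepted += 1
    except zmq.Again:
        pass
    finally:
        sock.close(linger=0)  # discard what was never delivered
        ctx.term()
    return accepted


if __name__ == "__main__":
    print("messages queued before the limit was hit:", fill_queue(5))
```

The sends succeed although no connection exists yet, and they stop once the configured limit is reached. Closing with linger=0 matters here: without it, terminating the context could wait indefinitely for the undelivered messages.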

Messages

Pros:
+ In ZeroMQ you work not with a stream of bytes but with separate messages whose length is known.
+ A ZeroMQ message consists of one or more so-called “frames”, which is quite convenient: you can add or remove frames carrying meta-information as the message passes through nodes, without touching the frame with the data. This approach is used, in particular, by the ZMQ_ROUTER socket type: when it receives a message, ZeroMQ automatically prepends a first frame with the identifier of the client it came from.
+ Each message is atomic, i.e. it is always received or transmitted in full, including all frames.

Cons:
- Each message must fit in memory, i.e. if you need to send a large amount of data, you will have to split it into parts yourself (into messages, not frames). The maximum message size, however, can be configured.
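The frame handling of a ZMQ_ROUTER socket is easy to see in a short pyzmq sketch. The names router_roundtrip and "inproc://frames" and the b"meta" frame are invented for the illustration; the client sets its own identifier before connecting, as described in the section on connections.

```python
import zmq


def router_roundtrip(identity: bytes, payload: bytes):
    """Return (frames seen by the ROUTER, frames of the reply at the client)."""
    ctx = zmq.Context()
    router = ctx.socket(zmq.ROUTER)
    client = ctx.socket(zmq.DEALER)
    for sock in (router, client):
        sock.setsockopt(zmq.RCVTIMEO, 2000)  # safety net for the sketch
    # The client chooses its own identifier; this must happen before connect.
    client.setsockopt(zmq.IDENTITY, identity)
    try:
        router.bind("inproc://frames")
        client.connect("inproc://frames")
        # One atomic message made of two frames: meta-information and data.
        client.send_multipart([b"meta", payload])
        # ROUTER prepends the sender's identifier as the first frame.
        received = router.recv_multipart()
        # On send, ROUTER uses the first frame for routing and strips it.
        router.send_multipart(received)
        reply = client.recv_multipart()
        return received, reply
    finally:
        client.close(linger=0)
        router.close(linger=0)
        ctx.term()


if __name__ == "__main__":
    received, reply = router_roundtrip(b"client-1", b"data")
    print("router saw:", received)
    print("client got:", reply)
```

The data frame is never touched: the identifier frame is added on the way in and removed on the way out, and all frames of a message arrive together or not at all.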

Lyrical digression

In ZeroMQ, in addition to several transports (tcp, ipc, inproc, etc.), there are several types of sockets: REQ, REP, ROUTER, DEALER, PUB, SUB, etc.
I advise you to read about them carefully in the documentation. A socket's behaviour depends on the socket types at both ends. Some socket types use additional service frames.
The Guide mentioned above introduces the main socket types quite well, with examples.

Conclusion

If you are just starting to design your application, or some of its simpler parts, modules, and subtasks, I highly recommend taking a look at ZeroMQ.
In a real-world application with asynchronous data processing, ZeroMQ not only reduces the amount of code but can also give a slight performance gain.
Bindings for the library exist for a wide variety of programming languages: C++, C#, CL, Delphi, Erlang, F#, Felix, Haskell, Java, Objective-C, Ruby, Ada, Basic, Clojure, Go, Haxe, Node.js, ooc, Perl, Scala.
The library is cross-platform, i.e. it can be used on both Linux and Windows. Unfortunately, I have not yet found an official build for MinGW.
But the project is developing rapidly and is already used in many places, so let's hope and believe.

Feedback in the comments is welcome!

Source: https://habr.com/ru/post/242359/
