Decentralized messaging system

The world of IT development moves in a spiral. The founders of UNIX believed there should be many programs, each doing one task and doing it well. In the early 2000s the dominant trend was all-in-one programs that did everything possible and more. Now development has begun to move in the opposite direction. Data used to be exchanged through standard input/output streams. Today, as systems become more and more distributed, data is passed between nodes by specialized integration platforms (an Enterprise Service Bus or a message broker).

To improve fault tolerance and reduce the load on the system as a whole, there is a separate approach: exchanging data without a central server.

I would like to present an example implementation.

A bit of terminology: message bus and message broker are similar (but far from identical) concepts. Both denote a software system that receives, processes and forwards data from one node to another.
A subscriber is an application that sends and/or receives messages using an agreed protocol.
First, a brief look at systems with a central node (including those with master-slave or master-master redundancy).
Typical enterprise systems include TIBCO EMS, IBM MQ, JBoss and others. Open-source options include RabbitMQ, Apache ActiveMQ, Apollo and Redis. There are even cloud services, such as IronMQ. The most commonly used protocols are AMQP and STOMP.
The basic idea is that subscribers connect to a common server (server cluster) that routes messages between connected clients.

Benefits:

Centralized configuration;
Guaranteed delivery is easy to implement;
Availability of libraries in almost all programming languages;
A wide selection of specific implementations;

Nevertheless, there are a number of drawbacks:

Under heavy load, all processing falls on the central server, which therefore needs high-performance hardware;
Failure of the central node interrupts service for all subscribers;
Redundant setups (such as master-slave) can suffer from various data synchronization problems;
For some systems, such as embedded ones, a broker is overkill.

Despite these drawbacks, large businesses use this type of message broker, because the cost of losing data is far higher than the cost of buying more powerful hardware.

However, many tasks do not require guaranteed delivery: the Internet of Things, systems that ensure the reliability of data transmission on their own, and high-load systems where some data loss is acceptable. In such cases the functionality of the solutions above is excessive and does not help solve performance problems.

Another approach is to exchange data without a broker (brokerless). Such an architecture typically requires a specialized library and/or additional software on the subscriber node.
In the enterprise segment, as far as I know, there is only one such product: TIBCO Rendezvous (if anyone can suggest alternatives, I would be very grateful).
Among non-commercial systems there is ZeroMQ, which does not require a central server. However, this library provides no abstraction over the network, and it often leads developers to write their own centralized systems, defeating the whole idea of decentralization.
The basic idea of a decentralized architecture is similar to P2P: subscribers send data to other subscribers without a common coordinating server. (I do not consider DHCP, DNS, etc., since they work at a different OSI layer.)

This approach has the following advantages:

Load is distributed across multiple nodes;
Fault tolerance: the system keeps working as long as at least one sender and one recipient remain;
Potentially higher performance.

Its drawbacks include:

No centralized management;
Guaranteed delivery is nearly impossible to provide;
Such systems are rare in the IT industry, and there are no standards for them.

UDP is often used for the implementation because it does not require a connection. In addition, UDP multicast (hereinafter simply multicast) makes it very easy to implement the PUB/SUB pattern, in which a publishing node (PUB) sends data on a given topic to subscriber nodes (SUB). MICEX uses this technology to distribute exchange data (FIX FAST), as do many other systems.

Let us consider an implementation of such a system. The requirements are:

Implement the PUB/SUB pattern;
Main purpose: notification systems with small messages (up to 1 KB);
The system must work without a central server and independently of the recipients;
Target OS: Linux 2.6 or later.

Let us start with the simplest option: using a single multicast address, we send every message, tagged with its topic name, to all subscribers. Each subscriber filters the data against its own set of subscriptions.

Define the contents of the UDP packet:

The name of the topic;
Data.
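The article does not fix a wire format for these two fields. One simple option, assumed here purely for illustration, is to put the topic name at the start of the datagram as a NUL-terminated string, followed directly by the payload. A minimal sketch of building and parsing such a packet (`pack_message`, `unpack_message` and `MAX_PACKET` are hypothetical names):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer size: room for a topic plus a payload of up to 1 KB. */
#define MAX_PACKET 1500

/* Build "topic\0payload" in buf. Returns the packet length, or 0 if it does not fit. */
size_t pack_message(char *buf, size_t cap, const char *topic,
                    const void *data, size_t len)
{
    size_t tlen = strlen(topic) + 1;          /* include the terminating NUL */
    if (tlen + len > cap)
        return 0;
    memcpy(buf, topic, tlen);
    memcpy(buf + tlen, data, len);
    return tlen + len;
}

/* Split a received packet into topic and payload. Returns 0 on success,
   or -1 if the packet has no NUL, i.e. no complete topic name. */
int unpack_message(const char *buf, size_t n, const char **topic,
                   const char **data, size_t *len)
{
    const char *nul = memchr(buf, '\0', n);
    if (nul == NULL)
        return -1;
    *topic = buf;
    *data = nul + 1;
    *len = n - (size_t)(nul + 1 - buf);
    return 0;
}
```

Because the topic is NUL-terminated, the receiver can compare it with `strcmp` directly in the receive buffer without copying.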

The subscriber algorithm can be described as follows:

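1. Join the multicast group:

  #include <stdio.h>
  #include <string.h>
  #include <sys/socket.h>
  #include <netinet/in.h>

  /* fd: a UDP socket from socket(AF_INET, SOCK_DGRAM, 0).
     PORT and ADDR (group address, network byte order) are defined elsewhere. */
  int join_group(int fd)
  {
      struct ip_mreq mreq;
      struct sockaddr_in sin;
      int optval = 1;

      memset(&sin, 0, sizeof(sin));
      sin.sin_family = AF_INET;
      sin.sin_port = htons(PORT);
      sin.sin_addr.s_addr = ADDR;
      mreq.imr_multiaddr = sin.sin_addr;
      mreq.imr_interface.s_addr = htonl(INADDR_ANY);

      /* Let several processes share the group/port. SO_REUSEPORT needs
         Linux 3.9+, while SO_REUSEADDR also works on 2.6 kernels. */
      setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval));
      if (bind(fd, (struct sockaddr *) &sin, sizeof(struct sockaddr_in)) < 0) {
          perror("Bind");
          return -1;
      }
      if (setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq)) < 0) {
          perror("Join");
          return -2;
      }
      return 0;
  }

2. Receive a message;
3. If the message's topic is not in the list of interest, go to step 2;
4. Process the message;
5. Go back to step 2.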
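The receive-and-filter part of the subscriber loop can be sketched as follows. It assumes a hypothetical "topic\0payload" packet layout (the article does not specify one) and a plain array of subscribed topic names; `receive_next` and `is_subscribed` are illustrative names. The function returns the first datagram whose topic is of interest and silently drops the rest.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Return 1 if topic is in the subscription list, 0 otherwise. */
static int is_subscribed(const char *topic, const char *const subs[], size_t nsubs)
{
    for (size_t i = 0; i < nsubs; i++)
        if (strcmp(topic, subs[i]) == 0)
            return 1;
    return 0;
}

/* Block until a datagram with a subscribed topic arrives on fd.
   buf receives the whole packet ("topic\0payload", assumed layout).
   Returns the packet length, or -1 on a socket error. */
ssize_t receive_next(int fd, char *buf, size_t cap,
                     const char *const subs[], size_t nsubs)
{
    for (;;) {
        ssize_t n = recv(fd, buf, cap, 0);          /* receive a message */
        if (n < 0)
            return -1;
        if (memchr(buf, '\0', (size_t)n) == NULL)
            continue;                               /* malformed: no topic terminator */
        if (!is_subscribed(buf, subs, nsubs))
            continue;                               /* not of interest: drop it */
        return n;                                   /* caller processes the message */
    }
}
```

A linear scan is enough for a handful of subscriptions; a subscriber with many topics would want a hash set instead.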

The publisher's job is even simpler:

Add the topic name to the message;
Send the message to the multicast address.
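A minimal publisher sketch, again assuming the hypothetical "topic\0payload" layout; `publish` is an illustrative name, and `dest` would normally hold the multicast group address and port. Nothing in the function is specific to multicast, so it works for any UDP destination.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Prepend the topic (with its NUL terminator) to the payload and send the
   result as one datagram to dest. Returns bytes sent, or -1 on error or if
   the packet would not fit in the local buffer. */
ssize_t publish(int fd, const struct sockaddr *dest, socklen_t destlen,
                const char *topic, const void *data, size_t len)
{
    char pkt[1500];                           /* assumed upper bound for one datagram */
    size_t tlen = strlen(topic) + 1;
    if (tlen + len > sizeof pkt)
        return -1;
    memcpy(pkt, topic, tlen);                 /* add the topic name */
    memcpy(pkt + tlen, data, len);
    return sendto(fd, pkt, tlen + len, 0, dest, destlen);  /* send the datagram */
}
```

Keeping the whole message in a single datagram means a receiver never has to reassemble anything: either the packet arrives intact or it is lost.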

This algorithm is simple and works, but it has one unpleasant drawback: under heavy traffic, nodes receive a lot of irrelevant data that they must process before discarding it.

To reduce the load on recipients, assign different multicast addresses to different topics. To compute the group, apply any hash function, for example CRC-32, and derive the IP address from the hash value.
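The article does not show its `crc32_hash` function. A sketch assuming the standard CRC-32 (the reflected polynomial 0xEDB88320 used by zlib and Ethernet), combined with the same mapping as the `addr_value` formula in the subscriber algorithm; `topic_group` is a hypothetical name.

```c
#include <stdint.h>

/* Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
   Table-free and slow; a lookup table or zlib's crc32() is faster. */
uint32_t crc32_hash(const char *s)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (*s) {
        crc ^= (unsigned char)*s++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Map a topic to a group in 239.0.0.1 .. 239.255.255.255, returned in
   host byte order. 4009754625 == 239.0.0.1. */
uint32_t topic_group(const char *topic)
{
    return 4009754625u + crc32_hash(topic) % 16777215u;
}
```

For the string "123456789" this CRC-32 should return 0xCBF43926, the standard check value, which is a quick way to confirm that publishers and subscribers written in different languages hash topics identically.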

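Subscriber algorithm:

1. Compute hash values for the topics of interest:

  /* 4009754625 is 239.0.0.1 in host byte order, so the result falls in
     239.0.0.1 .. 239.255.255.255; convert it with htonl() before use. */
  unsigned int addr_value = 4009754625u + (crc32_hash(subject) % 16777215);

2. Join the resulting multicast groups. The specifics of working with them are well described in this post;
3. Receive a message;
4. If the message's topic is not in our list of interest, go to step 3 (because of hash collisions, one group can carry several topics);
5. Process the message;
6. Go back to step 3.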
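Since several subscribed topics can hash to the same group, a subscriber needs only one membership (and, on Linux, one socket) per distinct group. A small sketch that sorts the group addresses computed from the topics of interest and removes duplicates in place; `unique_groups` is a hypothetical helper.

```c
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for uint32_t values, written to avoid overflow. */
static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Sort groups[0..n) and compact it so each address appears once.
   Returns the number of distinct groups. */
size_t unique_groups(uint32_t *groups, size_t n)
{
    if (n == 0)
        return 0;
    qsort(groups, n, sizeof groups[0], cmp_u32);
    size_t out = 1;
    for (size_t i = 1; i < n; i++)
        if (groups[i] != groups[out - 1])
            groups[out++] = groups[i];
    return out;
}
```

The subscriber then opens and joins one socket per entry in the first `out` slots, and relies on the topic check after receiving to separate topics that share a group.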

Publisher:

Add the topic name to the message;
Compute the hash of the topic;
Send the message to the resulting multicast address.
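One detail the steps leave implicit: the group value computed from the hash (`addr_value` in the subscriber algorithm) is in host byte order and must go through `htonl()` before it is stored in a `sockaddr_in`. A sketch, with `make_group_dest` as a hypothetical name:

```c
#include <stdint.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Build the sendto() destination from a group address in host byte order. */
struct sockaddr_in make_group_dest(uint32_t group_host_order, uint16_t port)
{
    struct sockaddr_in dst;
    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    dst.sin_addr.s_addr = htonl(group_host_order);
    return dst;
}
```

By default multicast datagrams are sent with a TTL of 1, so they do not leave the local network; a publisher that must reach other subnets can raise it with the `IP_MULTICAST_TTL` socket option.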

Since 16,777,215 multicast groups are available, a well-chosen hash function will place on average only about two topics in each group, even with 33 million distinct topics.
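The `addr_value` formula yields $2^{24} - 1$ possible groups. Assuming the hash spreads topics uniformly, the average load per group works out as:

```latex
\begin{aligned}
m &= 2^{24} - 1 = 16\,777\,215 \quad \text{groups}, \qquad n = 33 \times 10^{6} \quad \text{topics},\\
\frac{n}{m} &= \frac{33 \times 10^{6}}{16\,777\,215} \approx 1.97 \quad \text{topics per group on average.}
\end{aligned}
```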
On Linux, only one socket per multicast group can be used correctly, so epoll is recommended for receiving data.
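A sketch of the epoll side, assuming each group already has its own joined socket; `watch_sockets` and `next_ready` are hypothetical helpers. The first registers all sockets once, and the second returns whichever one becomes readable.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Create an epoll instance watching every descriptor in fds for input.
   epoll_create1() needs Linux 2.6.27+; epoll_create(1) works on older 2.6.
   Returns the epoll fd, or -1 on error. */
int watch_sockets(const int *fds, int n)
{
    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;
    for (int i = 0; i < n; i++) {
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = fds[i] };
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[i], &ev) < 0) {
            close(epfd);
            return -1;
        }
    }
    return epfd;
}

/* Wait up to timeout_ms for any watched descriptor to become readable.
   Returns that descriptor, or -1 on timeout or error. */
int next_ready(int epfd, int timeout_ms)
{
    struct epoll_event ev;
    if (epoll_wait(epfd, &ev, 1, timeout_ms) != 1)
        return -1;
    return ev.data.fd;
}
```

With the default level-triggered mode a socket stays ready until its datagrams are read, so a loop of `next_ready` followed by one `recv` on the returned socket will not lose messages.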

The result is a distributed messaging system that sends data by topic name without caring about the recipients' specific addresses, and that has plenty of room for further expansion. An additional advantage is that applications need no specialized libraries, and for devices that only send messages the library can be ported even to microcontrollers (if they have a network stack).

The implementation and source code are available here.

PS

I love my native Russian language very much, but because I constantly use English, the text may contain mistakes. If you notice an error, I would be very grateful if you let me know in a private message.

Source: https://habr.com/ru/post/240053/
