The basic idea behind Pub-Sub is quite simple. In a free translation of the usual definition, it sounds roughly like this:
"Publisher-subscriber (pub/sub) is a behavioral messaging design pattern in which message senders, called publishers, are not directly tied by their message-sending code to the message receivers, called subscribers. Instead, messages are divided into classes and do not contain information about their subscribers, if any. Similarly, subscribers deal with one or more classes of messages, abstracting away from specific publishers."
Pub-sub model
Let's use our imagination a little and try to design a pub-sub system from scratch ourselves. The result will be the most obvious model, which will later help us understand the features and possible problems of specific implementations.
The main business process in our system may look like this:
- Subscribers subscribe to messages.
- A publisher creates a message.
- The publisher publishes the message in our system.
- The system stores the message for some time.
- The system sends the message to all subscribers.
- Subscribers receive the message.
- Subscribers process the message.
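The steps above can be sketched in a few lines of Python. This is a minimal illustration, not any particular product: `Broker`, `subscribe` and `publish` are hypothetical names, and delivery happens synchronously inside `publish`, with no queues yet.

```python
from collections import defaultdict
from typing import Callable

# A handler is the subscriber's "receive and process the message" step.
Handler = Callable[[str, bytes], None]


class Broker:
    """Hypothetical minimal pub-sub system: no queues, synchronous delivery."""

    def __init__(self) -> None:
        self._handlers: defaultdict[str, list[Handler]] = defaultdict(list)
        self.stored: list[tuple[str, bytes]] = []  # "stored for some time"

    def subscribe(self, message_class: str, handler: Handler) -> None:
        # Subscribers subscribe to a class of messages, not to a publisher.
        self._handlers[message_class].append(handler)

    def publish(self, message_class: str, body: bytes) -> int:
        # The publisher hands the message to the system...
        self.stored.append((message_class, body))
        handlers = self._handlers.get(message_class, [])
        # ...which sends it to every subscriber; each one receives and processes it.
        for handler in handlers:
            handler(message_class, body)
        return len(handlers)


received: list[bytes] = []
broker = Broker()
broker.subscribe("orders", lambda cls, body: received.append(body))
count = broker.publish("orders", b"order #1")  # the publisher created b"order #1"
```

Note that `publish` does not return until every subscriber has processed the message, which is exactly the behavior the discussion of queues below calls into question.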
The participants in this business process are:
- Publisher. There may be several.
- Subscriber. There are usually several, but there may be none at all.
- System. This is our pub-sub system.
All participants operate on messages. A message is the data being transmitted, usually just an array of bytes into which arbitrary data is serialized.
The process above involves the following operations on messages:
- Subscribe
- Create
- Publish
- Store
- Send
- Receive
- Process
Queues and asynchrony
Nowhere in our model have we come across queues. They are not mentioned in the definition of pub-sub. Asynchrony was not mentioned anywhere either.
Yet a pub-sub implementation is hardly possible without queues. If we try to send messages immediately after they are published, many problems arise. For example, how quickly can a message be sent to thousands of subscribers? What happens if a subscriber has not yet finished processing the previous message? What if new messages are published while earlier ones have not yet been sent?
I have not seen pub-sub implementations without queues, although in theory they are possible.
So, queues.
A queue is temporary message storage that works on the "first in, first out" (FIFO) principle. Again, the definition of pub-sub does not require preserving message order, but it is usually implied.
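A minimal sketch of how queues change the picture, assuming one in-memory FIFO queue per subscriber (`QueuedBroker`, `publish` and `deliver` are hypothetical names). Publishing only appends to the matching queues and returns at once; each subscriber drains its own queue later, at its own pace, oldest message first.

```python
from collections import defaultdict, deque


class QueuedBroker:
    """Hypothetical sketch: publish() only enqueues; delivery happens later."""

    def __init__(self) -> None:
        self.queues: dict[str, deque] = {}  # one FIFO queue per subscriber
        self.interests: defaultdict[str, set[str]] = defaultdict(set)  # class -> subscribers

    def subscribe(self, subscriber: str, message_class: str) -> None:
        self.queues.setdefault(subscriber, deque())
        self.interests[message_class].add(subscriber)

    def publish(self, message_class: str, body: bytes) -> None:
        # Cheap for the publisher: append to each matching queue and return.
        for subscriber in self.interests.get(message_class, ()):
            self.queues[subscriber].append(body)

    def deliver(self, subscriber: str, max_messages: int = 10) -> list[bytes]:
        # Called whenever the subscriber is ready; first in, first out.
        queue = self.queues[subscriber]
        batch = []
        while queue and len(batch) < max_messages:
            batch.append(queue.popleft())
        return batch


broker = QueuedBroker()
broker.subscribe("fast", "prices")
broker.subscribe("slow", "prices")
for i in range(3):
    broker.publish("prices", f"tick {i}".encode())
fast_batch = broker.deliver("fast")                 # takes everything
slow_batch = broker.deliver("slow", max_messages=1)  # takes one; two stay queued
```

A slow subscriber no longer holds up the publisher or the fast subscriber; its backlog simply waits in its own queue.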
Where should the queues be placed? This question will run like a common thread through the rest of our discussion.
Implementation issues
Now let's start formulating questions about our business process:
- Where and how are publisher and subscriber addresses stored?
In principle, any participant in the process can store addresses. But usually the system acts as a broker and stores all the addresses itself.
- What does a subscription look like?
A subscription will certainly include the subscriber's address. In addition, it specifies a "message class", a kind of filter that determines whether a message matches the subscription. The publisher's address is not needed here.
- Can we add or remove a subscription at any time, or do we have to stop the system to do so?
If we add or remove a subscription, how will this affect messages that have already been published and are waiting to be sent?
- Is the message format important?
Most likely the message will be serialized and placed in a packet. Additional parameters can be passed along with it in the packet headers.
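To make "body plus headers" concrete, here is a sketch of one possible packet layout. The format is an assumption chosen for illustration (a 4-byte big-endian header length, UTF-8 JSON headers, then the opaque serialized body), and the header names are hypothetical; real systems each define their own framing.

```python
import json
import struct


def pack(headers: dict[str, str], body: bytes) -> bytes:
    # Hypothetical wire format: header length, JSON headers, then the raw body.
    header_bytes = json.dumps(headers).encode("utf-8")
    return struct.pack(">I", len(header_bytes)) + header_bytes + body


def unpack(packet: bytes) -> tuple[dict[str, str], bytes]:
    (header_len,) = struct.unpack_from(">I", packet, 0)
    headers = json.loads(packet[4:4 + header_len].decode("utf-8"))
    return headers, packet[4 + header_len:]


order = {"id": 42, "amount": 9.99}
packet = pack(
    {"message-class": "orders.created", "content-type": "application/json"},
    json.dumps(order).encode("utf-8"),  # the body is just serialized bytes
)
headers, body = unpack(packet)
```

Keeping routing information in headers lets the system filter a message without deserializing its body.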
- How are messages filtered by subscriptions?
Usually a message is filtered by subscriptions right at the publishing stage. Sometimes it is filtered only when it is being sent to the subscriber. And sometimes the subscriber itself does the filtering.
- What do subscription filters look like?
Subscription filters can be hierarchies (like namespaces), key sets, or regular expressions (RegEx).
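The three kinds of filters can be sketched as predicates over a message. The `Message` type, the dotted topic names and the header keys are hypothetical; the point is only how a namespace hierarchy, a key set and a regular expression each decide whether a message matches.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Message:
    topic: str  # e.g. "sales.europe.orders"
    headers: dict[str, str] = field(default_factory=dict)


def namespace_filter(prefix: str):
    # Hierarchy: "sales" matches "sales" and "sales.europe.orders", not "salesforce".
    return lambda m: m.topic == prefix or m.topic.startswith(prefix + ".")


def keyset_filter(required: dict[str, str]):
    # Key set: every required header must be present with the given value.
    return lambda m: all(m.headers.get(k) == v for k, v in required.items())


def regex_filter(pattern: str):
    # Regular expression over the whole topic name.
    compiled = re.compile(pattern)
    return lambda m: compiled.fullmatch(m.topic) is not None


msg = Message("sales.europe.orders", {"priority": "high", "region": "eu"})
results = {
    "namespace": namespace_filter("sales")(msg),
    "keyset": keyset_filter({"priority": "high"})(msg),
    "regex": regex_filter(r"sales\.[a-z]+\.refunds")(msg),
}
```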
- What happens if the system or the subscriber cannot accept messages?
The system or the subscriber may refuse to accept a message, may delete it without any warning, or may simply hang or crash.
- Is delivery retried after a failure?
Retrying is the standard solution when the receiving party is temporarily unavailable. The key word here is "temporarily". If the receiving party or the transmission channel has stopped working for good, resending will not help at all and will only overload the system or the communication channel.
- Can the retry algorithm be customized?
With unreliable communication channels, this can be one of the main questions. Can we change the number of retries and the interval between attempts? What algorithm determines that interval?
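One common answer to the last question is exponential backoff. The sketch below assumes a hypothetical `TransientError` that the sender raises when the receiver is only temporarily unavailable; the number of attempts, the initial delay, the growth factor and the cap are all parameters, and `sleep` is injectable so the schedule can be inspected without waiting.

```python
import time
from typing import Callable


class TransientError(Exception):
    """Hypothetical: raised when the receiver is only temporarily unavailable."""


def send_with_retry(send: Callable[[], None], max_attempts: int = 5,
                    base_delay: float = 0.1, factor: float = 2.0,
                    max_delay: float = 5.0,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Call send() until it succeeds and return the number of attempts used.

    Delays grow geometrically (base_delay, base_delay * factor, ...) up to
    max_delay. Errors other than TransientError are not retried."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            send()
            return attempt
        except TransientError:
            if attempt == max_attempts:
                raise  # give up: the failure is probably not temporary
            sleep(delay)
            delay = min(delay * factor, max_delay)
    raise ValueError("max_attempts must be >= 1")


failures = iter([True, True, False])  # fail twice, then succeed


def flaky_send() -> None:
    if next(failures):
        raise TransientError("receiver busy")


delays: list[float] = []
attempts = send_with_retry(flaky_send, sleep=delays.append)
```

With the defaults, a send that fails twice and then succeeds waits 0.1 s and then 0.2 s. Once `max_attempts` is exhausted the error is re-raised instead of retrying forever, which covers the "not temporary" case.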
- How long are messages stored?
If a subscriber does not pick up a message for a long time, what should the system do with it? Should it be stored indefinitely or deleted after a certain interval?
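One simple answer to "how long" is a per-message time-to-live (TTL). A sketch, assuming a fixed TTL for the whole queue and an injectable clock (`ExpiringQueue` is a hypothetical name): expired messages are dropped and counted when the subscriber finally asks for the next one.

```python
import time
from collections import deque
from typing import Callable, Optional


class ExpiringQueue:
    """Hypothetical FIFO queue whose messages expire ttl seconds after arrival."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self._items: deque[tuple[float, bytes]] = deque()  # (deadline, body)
        self.expired = 0  # messages dropped without ever being delivered

    def put(self, body: bytes) -> None:
        self._items.append((self.clock() + self.ttl, body))

    def get(self) -> Optional[bytes]:
        now = self.clock()
        # With FIFO order and one fixed TTL, the oldest message expires first,
        # so it is enough to look at the head of the queue.
        while self._items and self._items[0][0] <= now:
            self._items.popleft()
            self.expired += 1
        return self._items.popleft()[1] if self._items else None


now = [0.0]
q = ExpiringQueue(ttl=60.0, clock=lambda: now[0])
q.put(b"old")
now[0] = 30.0
q.put(b"new")
now[0] = 61.0  # "old" expired at t=60, "new" expires at t=90
first = q.get()
```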
- Who initiates message delivery?
A subscriber may periodically poll the system for new messages (poll mode), or the system itself may send each new message to the subscriber (push mode). In the first case the subscriber must know the address of the system; in the second, the system must know the address of the subscriber. In push mode the system uses both its own resources and the data channel more economically, since no requests are wasted on checking for messages that have not arrived yet.
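The difference between the two modes can be sketched with a toy topic that supports both (`Topic`, `register_push` and `poll` are hypothetical names, and a single polling subscriber is assumed). A push subscriber is represented by a callback, which stands in for "the system knows the subscriber's address"; a poller has to keep asking, and every empty poll is a request that carried no data.

```python
from collections import deque
from typing import Callable, Optional


class Topic:
    """Hypothetical topic with push subscribers and one polling subscriber."""

    def __init__(self) -> None:
        self.pending: deque[bytes] = deque()  # waiting for the polling subscriber
        self.push_targets: list[Callable[[bytes], None]] = []
        self.poll_requests = 0

    def register_push(self, callback: Callable[[bytes], None]) -> None:
        # Push mode: the system must know the subscriber's "address".
        self.push_targets.append(callback)

    def publish(self, body: bytes) -> None:
        for callback in self.push_targets:
            callback(body)  # delivered at once, no requests from the subscriber
        self.pending.append(body)  # kept until the poller asks for it

    def poll(self) -> Optional[bytes]:
        # Poll mode: the subscriber must know the system's address and keep asking.
        self.poll_requests += 1
        return self.pending.popleft() if self.pending else None


topic = Topic()
pushed: list[bytes] = []
topic.register_push(pushed.append)
polled = [topic.poll() for _ in range(3)]  # nothing published yet: empty polls
topic.publish(b"hello")
polled.append(topic.poll())
```

Here the push subscriber received the message without sending a single request, while the poller spent four requests to get one message.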
- What happens if the subscriber is overloaded with messages?
Should it stop accepting new messages and warn the sending side, or silently ignore messages? Or can the subscriber increase (scale) its resources?
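The options listed above correspond to different overflow policies for a bounded inbox. A sketch with hypothetical names (`BoundedInbox`, `Overflow`): `REJECT` signals back-pressure to the sender by returning `False`, `DROP_NEWEST` silently ignores the incoming message, and `DROP_OLDEST` makes room by discarding the oldest one. Scaling out, the last option, is outside what a single inbox can show.

```python
from collections import deque
from enum import Enum


class Overflow(Enum):
    REJECT = "reject"            # refuse and tell the sender (back-pressure)
    DROP_NEWEST = "drop_newest"  # silently ignore the incoming message
    DROP_OLDEST = "drop_oldest"  # discard the oldest message to make room


class BoundedInbox:
    """Hypothetical subscriber inbox with a fixed capacity."""

    def __init__(self, capacity: int, policy: Overflow) -> None:
        self.capacity = capacity
        self.policy = policy
        self.items: deque[bytes] = deque()

    def offer(self, body: bytes) -> bool:
        """Return False only when the sender should back off (REJECT policy)."""
        if len(self.items) < self.capacity:
            self.items.append(body)
            return True
        if self.policy is Overflow.REJECT:
            return False
        if self.policy is Overflow.DROP_OLDEST:
            self.items.popleft()
            self.items.append(body)
            return True
        return True  # DROP_NEWEST: the sender sees success, the message is lost


outcomes = {}
for policy in Overflow:
    inbox = BoundedInbox(capacity=2, policy=policy)
    accepted = [inbox.offer(m) for m in (b"a", b"b", b"c")]
    outcomes[policy] = (accepted, list(inbox.items))
```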
If you are going to use pub-sub in your project and are choosing one of the available pub-sub systems, go through these questions. If any of them are critical for you, try to find the answers.
Implementation details
I want to look at how pub-sub is implemented in the most popular programs of this class. If you think something in this list should be changed, I will be glad to discuss it. Why am I doing this instead of continuing the theoretical discussion of the pattern? I feel that implementation examples are no less important for understanding than bare theory.
So, here is what is on the list now:
Azure ServiceBus Topics
The entire system is hosted in Microsoft Azure. The implementation can be called classic, based on the message broker model.
The pub-sub system is a centralized paid service, a broker through which all messages pass. All publishers and subscribers are registered here, and all their credentials are stored here too. The subscriber registers its subscription here. All messages arrive here and go into a queue implemented on top of SQL Server. For each subscription a virtual queue is created, a kind of cursor that holds pointers to the real messages stored in the main queue. Messages selected by the subscription's filter land in its virtual queue. The subscriber either requests the next message itself (poll), or the system notifies the subscriber of a new message (push). After the subscriber receives a message, the message pointer is deleted from the virtual queue. When a message has been deleted from all virtual queues, it is removed from the main queue and from the system.
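The virtual-queue mechanism described above can be modeled in a few lines. This is a toy model of the description, not the Azure Service Bus SDK: the class and method names are hypothetical, subscription rules are plain predicates over message properties, and dropping a message that matches no subscription is an assumption of the sketch.

```python
from collections import deque
from itertools import count
from typing import Callable, Optional

Rule = Callable[[dict], bool]


class TopicModel:
    """Toy model: one main store plus a virtual queue of pointers per subscription."""

    def __init__(self) -> None:
        self.main_store: dict[int, bytes] = {}
        self.refcount: dict[int, int] = {}  # how many virtual queues point at a message
        self.subscriptions: dict[str, tuple[Rule, deque[int]]] = {}
        self._ids = count(1)

    def add_subscription(self, name: str, rule: Rule) -> None:
        self.subscriptions[name] = (rule, deque())

    def publish(self, properties: dict, body: bytes) -> int:
        msg_id = next(self._ids)
        matched = [q for rule, q in self.subscriptions.values() if rule(properties)]
        if matched:  # assumption: a message no subscription selects is not stored
            self.main_store[msg_id] = body
            self.refcount[msg_id] = len(matched)
            for q in matched:
                q.append(msg_id)  # a pointer, not a copy of the message
        return msg_id

    def receive(self, name: str) -> Optional[bytes]:
        _, q = self.subscriptions[name]
        if not q:
            return None
        msg_id = q.popleft()  # delete the pointer from this virtual queue
        body = self.main_store[msg_id]
        self.refcount[msg_id] -= 1
        if self.refcount[msg_id] == 0:  # gone from all virtual queues
            del self.main_store[msg_id], self.refcount[msg_id]
        return body


topic = TopicModel()
topic.add_subscription("all", lambda p: True)
topic.add_subscription("eu-only", lambda p: p.get("region") == "eu")
topic.publish({"region": "eu"}, b"order-1")
topic.publish({"region": "us"}, b"order-2")
stored_before = len(topic.main_store)
eu = topic.receive("eu-only")    # order-1 is still referenced by "all"
all_first = topic.receive("all")  # last pointer to order-1 removed
stored_after = len(topic.main_store)
```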
Microsoft Azure EventHub
EventHub is very different from the classic implementation. The main difference is that EventHub does not register or store subscriptions, and it does not create or store virtual queues. Received messages are simply stored in the system as a linear list. They are kept for a certain time, after which the system deletes them. Subscribers have access to all messages. Each subscriber filters messages itself and keeps its own cursor into the list. On the one hand, this places an additional responsibility on the subscriber. On the other hand, EventHub is relieved of subscription support, which results in excellent performance. It also allows a different approach to reliability, since a subscriber can move back through the list at any time and re-read messages. Subscribers become almost completely independent of publishers.
Subscribing in EventHub is a no-op: any client can start reading messages at any time.
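The log-based model can be sketched as follows. This is a toy illustration of the description above, not the Event Hubs SDK: `EventLog` and `Reader` are hypothetical names, time is passed in explicitly, and retention is a plain number of seconds.

```python
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Toy append-only log: no subscriptions, time-based retention."""
    retention: float
    entries: list[tuple[int, float, bytes]] = field(default_factory=list)  # (offset, time, body)
    next_offset: int = 0

    def append(self, body: bytes, now: float) -> int:
        self.entries.append((self.next_offset, now, body))
        self.next_offset += 1
        return self.next_offset - 1

    def expire(self, now: float) -> None:
        # Old events are deleted whether or not anyone has read them.
        self.entries = [e for e in self.entries if now - e[1] < self.retention]

    def read_from(self, offset: int, limit: int = 100) -> list[tuple[int, bytes]]:
        return [(o, b) for o, _, b in self.entries if o >= offset][:limit]


class Reader:
    """Each subscriber keeps its own cursor and does its own filtering."""

    def __init__(self, log: EventLog, offset: int = 0) -> None:
        self.log, self.offset = log, offset

    def poll(self, wanted=lambda body: True) -> list[bytes]:
        batch = self.log.read_from(self.offset)
        if batch:
            self.offset = batch[-1][0] + 1
        return [body for _, body in batch if wanted(body)]

    def rewind(self, offset: int) -> None:
        self.offset = offset  # re-read history, e.g. after a processing bug


log = EventLog(retention=3600.0)
for t, body in enumerate([b"temp=20", b"temp=21", b"alarm"]):
    log.append(body, now=float(t))
reader = Reader(log)  # "subscribing" is just creating a cursor
alarms = reader.poll(wanted=lambda b: b.startswith(b"alarm"))
reader.rewind(0)
replayed = reader.poll()
log.expire(now=3601.5)  # events written at t=0 and t=1 are past retention
remaining = [b for _, b in log.read_from(0)]
```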
Microsoft BizTalk Server
BizTalk is already in its second decade, so you should not expect innovative approaches from it. In essence, its pub-sub mechanism, called MessageBox, is a precursor of Azure Topics. The difference is that it is accessed through adapters: we cannot simply call an API function to publish or receive a message, we have to use adapters. Adapters act as both publishers and receivers of messages.
RabbitMQ
RabbitMQ is based on the ideas of the AMQP protocol, though not its latest version (1.0) but the earlier versions 0-8 and 0-9-1. These versions support so-called exchanges, which the AMQP developers abandoned in version 1.0. The exchange model elegantly implements pub-sub inside a broker, which is exactly what RabbitMQ is.
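In AMQP 0-9-1 a publisher sends a message to an exchange together with a routing key, and the exchange copies it into every queue whose binding matches. For a topic exchange, binding keys are dot-separated words in which `*` matches exactly one word and `#` matches zero or more words. The sketch below reimplements those matching rules to show how one published message fans out to several queues; it is an illustration, not RabbitMQ code, and the queue names and keys are hypothetical.

```python
from functools import lru_cache


def topic_matches(binding_key: str, routing_key: str) -> bool:
    """Topic-exchange matching: '*' = exactly one word, '#' = zero or more words."""
    pattern = binding_key.split(".")
    words = routing_key.split(".") if routing_key else []

    @lru_cache(maxsize=None)
    def match(i: int, j: int) -> bool:
        if i == len(pattern):
            return j == len(words)
        if pattern[i] == "#":
            # '#' either matches nothing, or consumes one word and stays active.
            return match(i + 1, j) or (j < len(words) and match(i, j + 1))
        if j == len(words):
            return False
        return (pattern[i] == "*" or pattern[i] == words[j]) and match(i + 1, j + 1)

    return match(0, 0)


def route(bindings: dict[str, list[str]], routing_key: str) -> set[str]:
    """Return every queue with at least one matching binding (fan-out)."""
    return {queue for queue, keys in bindings.items()
            if any(topic_matches(k, routing_key) for k in keys)}


bindings = {
    "audit": ["#"],
    "eu-orders": ["orders.eu.*"],
    "all-orders": ["orders.#"],
}
```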
ZeroMQ
ZeroMQ is an heir to the ideas of AMQP: one of the creators of AMQP built ZeroMQ when the discussion of AMQP version 1.0 came to a standstill. ZeroMQ is one of the most interesting attempts to build an ultra-high-speed messaging system, including pub-sub.
ZeroMQ has no centralized service. The entire pub-sub system is implemented as part of the publisher and subscriber code; ZeroMQ is just a library that supports queues. In the simplest model, the subscriber and publisher programs create local queues. The subscriber connects explicitly to the publisher's address, after which messages begin to flow from the publisher's queue into the subscriber's queue. The queues live in the address spaces of the publisher and subscriber programs, with no intermediate broker. The system is not a separate service; rather, it is spread between the publisher and the subscriber.
On the one hand, if the publisher or subscriber shuts down, the messages in its queue simply disappear. On the other hand, such an implementation provides the maximum message transfer rate.
Queues in ZeroMQ are kept in memory, which provides maximum performance but reduces reliability.
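The brokerless model can be sketched without the library itself. This is a toy model, not ZeroMQ code: `Publisher`, `Subscriber`, `accept`, `flush` and `crash` are hypothetical names. It mirrors two properties described above: ZeroMQ-style subscriptions are byte prefixes (in real ZeroMQ a SUB socket sets the `SUBSCRIBE` option to a prefix), and queues exist only in process memory, so unsent messages vanish when a process stops.

```python
from collections import deque


class Subscriber:
    def __init__(self) -> None:
        self.prefixes: list[bytes] = []
        self.inbox: deque[bytes] = deque()  # lives in the subscriber's process

    def subscribe(self, prefix: bytes) -> None:
        self.prefixes.append(prefix)  # b"" would match every message


class Publisher:
    """Toy brokerless publisher with an in-memory queue per connected subscriber."""

    def __init__(self) -> None:
        self.outgoing: dict[int, tuple[Subscriber, deque[bytes]]] = {}

    def accept(self, sub: Subscriber) -> None:
        # The subscriber connects directly to the publisher; there is no broker.
        self.outgoing[id(sub)] = (sub, deque())

    def send(self, message: bytes) -> None:
        for sub, queue in self.outgoing.values():
            # Prefix filtering: the message must start with a subscribed prefix.
            if any(message.startswith(p) for p in sub.prefixes):
                queue.append(message)

    def flush(self) -> None:
        # Move queued messages into each subscriber's own in-memory queue.
        for sub, queue in self.outgoing.values():
            while queue:
                sub.inbox.append(queue.popleft())

    def crash(self) -> int:
        # Nothing is on disk: whatever was not yet transferred is simply lost.
        lost = sum(len(q) for _, q in self.outgoing.values())
        self.outgoing.clear()
        return lost


pub, weather = Publisher(), Subscriber()
weather.subscribe(b"weather.")
pub.accept(weather)
pub.send(b"weather.paris 18C")
pub.send(b"stocks.ACME 42")  # filtered out: no matching prefix
pub.flush()
pub.send(b"weather.oslo 3C")  # queued, but the publisher stops before sending
lost = pub.crash()
```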
ZeroMQ also supports more complex configurations. For example, you can build pub-sub with a broker. Moreover, ZeroMQ lets you create a variety of architectures that have no analogues in classic pub-sub systems, for example various kinds of distributed brokers. I recommend reading the description of pub-sub on the ZeroMQ website, one of the best available, starting with the pub-sub basics and ending with more complex models. It takes a lot of time, but it opened my eyes to many important aspects of pub-sub.
microServiceBus
One of the newer pub-sub implementations. I included it in the list because of several interesting design decisions. microServiceBus is based on Azure ServiceBus Topics.
Interestingly, the publisher and subscriber code is also stored in Azure. Moreover, this code can be loaded automatically. The code runs on Node.js, which gives good platform independence: Node.js runs on virtually any operating system and device. Any device that can run Node.js can be a publisher, a subscriber, or both.
Following the example of BizTalk Server, microServiceBus implements adapters for many systems and protocols. But unlike in BizTalk, the adapters are just a useful addition to the API.
microServiceBus is similar to ZeroMQ in that it is focused on the developer.
Like ZeroMQ, it is cross-platform, but for a different reason. ZeroMQ is cross-platform because its library has been ported to many languages and its core library is written in C++ and exposes a C API. microServiceBus is cross-platform because Node.js has been ported to many operating systems.
microServiceBus targets both system integration and device integration (IoT).
Redis
Redis is actually not a pub-sub system but a key-value data store. Yet it implements classic pub-sub surprisingly well and shows remarkable performance.
You can install Redis locally or run it as a distributed service. You can also enable saving data to disk.
System selection
If we look at pub-sub systems from a more practical angle, we will inevitably face the question of choosing a system for our projects and work. I am by no means trying to answer this question, only to sketch an outline for finding the answer. First, there are far more pub-sub systems than those collected in this article. Second, I do not claim to have placed the emphasis correctly.
I will focus on systems that stand out from the general list in some special way.
The comparison table is here.
Development environment
Development for BizTalk Server is possible only in Visual Studio, and numerous separate editors are used to create different parts of a solution.
Development for other systems basically comes down to the usual work with code and system API.
Development languages
This refers to developing the publisher and subscriber programs, which I will call client code. The system core is usually provided as a ready-made service, and we need to add the publisher and subscriber programs to get a working system.
Client code for Redis, ZeroMQ and RabbitMQ can be developed in a large number of languages, including all the languages commonly used for industrial systems: C#, Java, Python, PHP, JavaScript/Node.js, Scala, Ruby.
In ZeroMQ, most languages are supported through bindings (wrappers) around the original core library, which is written in C++ and exposes a C API. There are also native implementations for C#, Java, Erlang, and JavaScript, but they lag somewhat behind the original library in functionality.
Azure ServiceBus Topics and Azure EventHub offer C# and REST APIs.
The microServiceBus client code is written only in Node.js, but this is a strength rather than a weakness.
System core implementation
Azure ServiceBus Topics and Azure EventHub systems are implemented as SaaS services hosted in the Microsoft Azure cloud. microServiceBus runs on top of Azure ServiceBus Topics.
ZeroMQ does not have a ready-made broker, but one can be built quite easily, and it can be a broker of any complexity with any functionality. Good programmers often choose ZeroMQ for this reason.
The RabbitMQ and Redis services are distributed both as binaries and as source code and can be installed on any server platform: Windows, Linux, Mac, Unix. ZeroMQ can also be used on any platform.
BizTalk Server is implemented as a massive body of SQL and .NET code running in Windows services, and it runs only on Windows. You will need not only a BizTalk license but also Windows and SQL Server licenses.
Reliability and Maturity
BizTalk Server stands out in this category. It is known for systems built on it running for years without any maintenance. The code of its core hardly changes, and its bugs have been fixed over many years.
ZeroMQ and Redis are fairly new systems and are constantly being upgraded, which has its own implications for reliability. Large systems have been built on them, but building such systems certainly requires skilled programmers.
Azure Topics and EventHub are new systems, but Microsoft stands behind them, so there are no complaints about reliability.
There is no information on microServiceBus yet, since the system has only just appeared.
Note that systems that store message queues on disk are more reliable than systems that hold messages only in memory.
Performance
ZeroMQ stands out here. In some cases, it sends messages even faster than TCP, thanks to batching.
ZeroMQ is followed by Redis, then EventHub.
Scalability
ZeroMQ also scales best because of its architecture and low resource requirements. But you will have to pay close attention to the design of the system. Although the ZeroMQ documentation provides many examples of scalable systems, you are unlikely to find a ready-made design that can be implemented without significant refinement.
Price
BizTalk Server is the most expensive system on our list. But you should understand that pub-sub is only a small part of its functionality, and BizTalk is chosen on completely different criteria.
Some of the listed systems are open source. At worst, you will pay for commercial support.
Simplicity
You need only a few minutes to start working with ZeroMQ. Redis takes a little longer. Azure ServiceBus Topics, microServiceBus, RabbitMQ and Azure EventHub are also quite easy to start developing with.
BizTalk Server is difficult in every respect, but thanks to its maturity you are unlikely to be left alone with it: you will always find good professionals with extensive experience.
A production system on Azure or RabbitMQ is not as easy to set up and scale as it may seem. Any broker deployed in a cluster configuration will require not only knowledge but also experience.
The complexity of the system will grow much faster than the number of nodes in the system.