
I really want to show that in multithreaded C++, "very fast" does not rule out "very safe". That is, you can write efficient and stable programs with as many threads as you like, and avoid spending endless hours debugging multithreading issues. If you are curious how I manage not to shoot myself in the foot, and what it costs me, welcome.
When, about 7-8 years ago, I found myself writing more and more multi-threaded programs, my friend Captain Obvious drew my attention to the following fact: the more threads there are, the more they interact, the more synchronization objects are required, and the more sleepless nights I spend in the testing phase. The situation is complicated by the fact that multithreading bugs are like beautiful young ladies: they keep appearing in your field of view, but meeting any particular one a second time is much harder.
In general, when the number of threads in my programs consistently passed 5-6, I realized that something had to be done about it, and done quickly, before there were 10 or more of them.
At the same time, while the Win API offers goodies like WaitForMultipleObjects, in a cross-platform environment we are left with only mutexes, critical sections and basic signal waiting (this was when I switched to the wonderful U++ cross-platform framework).
The search for a solution took several months, until my eye fell on a description of the wonderful Erlang programming language. It offered a far from new, but very elegant model of inter-thread interaction that greatly improves program stability.
Briefly and in my own words: each subtask works with its own "address space" (more precisely, its own data set), i.e. it is isolated from the others, and the only way subtasks interact is by exchanging asynchronous messages. There is serious theory behind all this, with plenty of clever names and properties, but my friend Captain Obvious dislikes such boring details and asks me to get straight to the point.

We represent each of our threads as a pipeline. That means the thread spins in a quasi-infinite loop, waiting for a new message to arrive. As soon as a message is processed, the pipeline thread either handles the next one or goes to sleep waiting for a new one.
Let's forget for a second that the incoming messages with their arguments are sent to us from a multi-threaded environment. Inside the pipeline we have the simplest single-threaded loop, processing already thread-safe incoming arguments against the class's internal data. Clearly, no synchronization objects are needed for such work. This is ordinary code that knows nothing about multithreading. Remember, our class is a misanthrope: it shares its data with no one, so no object from another thread can touch it (which, by the way, fits nicely with the requirements of encapsulation).
It now remains to decide how and in what form we will pass these asynchronous messages. At this point my friend CO stubbornly fell silent, so I had to decide for myself. It quickly became clear that any enumeration of message types for each such class
enum { MESSAGE_....., MESSAGE_....., MESSAGE_..... };
will look cruel, and processing them
switch (messageType) {
case MESSAGE_.....: ....... break;
case MESSAGE_.....: ....... break;
case MESSAGE_.....: ....... break;
case MESSAGE_.....: ....... break;
}
will draw well-deserved rays of hatred from grateful descendants.
So instead, I began to put callbacks with their arguments into a queue for processing. That got rid of the unnecessary enumerations and made the code quite readable. It looks like this:
This code looks slightly unusual, but only at first. At the same time it is quite expressive and is not cluttered with synchronization objects, loops, and so on.
In essence, I gather the thread's data into a single class, turn that class into a pipeline thread by inheritance, and write handlers in it consisting of plain single-threaded code. And that's it!
This slightly changes how you structure a multi-threaded application, but believe me, it's worth it.

Here my friend CO, squinting affectionately, remarks that this thread safety does not come for free:
- the message processing loop itself
- synchronizing the thread with its queue
- the need to copy arguments
- the memory eaten up by the thread's queue
In fact, here is what happens. The processing loop uses an internal synchronization object and "sleeps" the whole time there are no messages at the input. That not only cleanses the developer's karma, but also makes a modest contribution to the fight for processor cores and the ecology of the planet. In my implementation, invoking a callback costs one virtual function call plus one non-virtual call with arguments, which is not much.
The queue is synchronized with the thread in the simplest way, through a synchronization object whose use is completely transparent to user code: lock, pull a pointer out of the queue, unlock. That does not take many resources either. Besides, non-blocking (lock-free) queues can be used instead.
As for copying arguments: copying POD types is a fast operation, generally imperceptible in the vast majority of applications. For more complex arguments it makes sense to use destructive copying. For example, a string can be passed by destructively transferring the pointer to its internal data (this "pick" mechanism is actively used in U++). Therefore passing even a list or an associative array as an argument is a very cheap operation. In practice, transferring arbitrarily complex parameters in real programs boils down, at most, to copying a few POD variables.
Finally, memory usage. Each queue element is a small structure containing, besides the arguments, one pointer and a vtable of two pointers (total == (1 + 1 + 2) * sizeof(void *)), which is very little.
And finally, we must understand that no approach is a panacea for all occasions. For example, the main loop of a high-performance web server is a task from a different domain. But almost any multi-threaded work in a desktop application fits this approach like a glove. In real life, a thread queue holding more than a thousand or two callbacks is the result of a design or coding error, and both are caught by a queue-length limit and a debug assertion. This means each queue occupies less than 64 KB of memory, which by today's standards is almost imperceptible.
Moreover, I found that the pipeline approach leads to a noticeable reduction in the total number of synchronization objects and, consequently, to fewer locks and, consequently, to higher program performance along with greater stability.
What else can I say... queues are cool, don't be afraid of queues!
P.S. In this article I have only described the principle and outlined the approach. There are many details concerning the design of projects that use pipeline threads: reverse notification, building a pool of pipeline threads for high-load applications, and so on. If the article interests the community, I will cover these points in more detail.