Bumps and bruises from 15 years of using actors in C++. Part II

We conclude the story begun in the first part. Today we look at a few more pitfalls we ran into over the years of using SObjectizer in everyday work.

Continuing the list of pitfalls

People want synchrony...

Actors in the Actor Model, and agents in our SObjectizer, communicate through asynchronous messages. That is one of the reasons the Actor Model is attractive for certain kinds of tasks. You would think that asynchrony is one of the cornerstones, one of the bonuses, so go ahead and enjoy it.

But no. In practice, people soon started asking for a way for agents in SObjectizer to interact synchronously. I resisted these requests for a long time, but in the end I gave up. I had to add to SObjectizer the ability to perform a synchronous request from one agent to another.

It looks like this in the code:

    // Request type.
    struct get_messages final : public so_5::signal_t {};
    ...
    // Issue the synchronous request...
    auto msgs = request_value<std::vector<message>, get_messages>(mbox, so_5::infinite_wait);
    // ...and process its result.
    for(const auto & m : msgs) ...

This shows a call to the request_value function, which performs a synchronous request: it suspends the current thread until the result of the request is received.

In this case we send a request of type get_messages and expect a vector of message objects in response. And we wait for the answer with no time limit.

However, in SObjectizer this is still implemented through messages. Inside request_value, a message is sent to the target agent, which receives and processes it in the usual way. That is, the recipient does not even know that a synchronous request has arrived; to it, everything looks like an ordinary asynchronous message.

    class collector : public so_5::agent_t {
    public :
        ...
        virtual void so_define_agent() override {
            // Subscribe to the request.
            so_subscribe(mbox).event<get_messages>(&collector::on_get_messages);
            ...
        }
    private :
        std::vector<message> collected_messages_;

        // Handler that is called when get_messages arrives.
        std::vector<message> on_get_messages() {
            std::vector<message> r;
            std::swap(r, collected_messages_);
            return r;
        }
    };

That is, inside collector::on_get_messages the receiving agent cannot tell whether get_messages arrived as a regular asynchronous message or as part of a synchronous request.

Under the hood there is some not very complicated machinery built on std::promise and std::future from the C++11 standard library.

First, when a synchronous request is sent, the recipient gets not a regular message but a special one that carries a std::promise object inside:

    struct special_message : public so_5::message_t {
        std::promise<std::vector<message>> promise_;
        ...
    };

This message goes to a special handler that SObjectizer generates automatically at subscription time:

    collector * collector_agent = ...;
    auto actual_message_handler = [collector_agent](special_message & cmd) {
        try {
            cmd.promise_.set_value(collector_agent->on_get_messages());
        }
        catch(...) {
            cmd.promise_.set_exception(std::current_exception());
        }
    };
    do_special_subscribe<get_messages, special_message>(mbox, actual_message_handler);

This special handler calls the user-defined message handler and then stores the returned value (or the exception that escaped from it) in the std::promise object carried by the special message. That makes the std::future on which the sender of the request is sleeping ready. Accordingly, request_value returns.
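The same promise/future trick can be reproduced with nothing but the standard library. What follows is a minimal sketch, not SObjectizer's actual implementation: tiny_agent, post and request are hypothetical names invented for this illustration, and a single worker thread with a task queue stands in for an agent and its message queue.

```cpp
#include <cassert>
#include <condition_variable>
#include <exception>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-in for an agent: one worker thread that drains a
// queue of tasks. This is an illustration, not SObjectizer code.
class tiny_agent {
public:
    tiny_agent() : worker_([this] { run(); }) {}

    ~tiny_agent() {
        post(nullptr);  // an empty task tells the worker to stop
        worker_.join();
    }

    // Ordinary asynchronous send: enqueue the task and return at once.
    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        ready_.notify_one();
    }

    // Synchronous request layered on top of post(): the promise travels
    // inside the task, the caller sleeps on the matching future.
    template <typename Result>
    Result request(std::function<Result()> handler) {
        // A request issued from the agent's own thread would never return.
        assert(std::this_thread::get_id() != worker_.get_id());
        auto promise = std::make_shared<std::promise<Result>>();
        std::future<Result> answer = promise->get_future();
        post([promise, handler] {
            try {
                promise->set_value(handler());
            } catch (...) {
                promise->set_exception(std::current_exception());
            }
        });
        return answer.get();  // blocks; rethrows the handler's exception
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return !tasks_.empty(); });
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            if (!task) return;  // stop marker
            task();
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> tasks_;
    std::thread worker_;  // declared last: the queue must exist before it starts
};
```

A call such as agent.request<int>([] { return 42; }) blocks the calling thread until the worker has run the handler, while the handler itself has no idea that somebody is waiting for it. The assert in request marks the simplest way to hang forever: issuing a synchronous request to yourself.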

Obviously, synchronous interaction between agents is a direct road to deadlocks. So request_value exists in SObjectizer, but we recommend using it with great care.

The funny thing for me personally was that a good use for request_value turned up very quickly, and precisely in the mechanisms that protect agents from overload. If this protection is built as a collector/performer pair, it is convenient for the performer to ask for the next batch of messages through request_value. And since the collector and the performer should, by design, work on different threads, the danger of a deadlock here is minimal.
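To make the collector side of such a pair more tangible, here is a rough standard-library sketch. It is not SObjectizer code: batch_collector, push and take_batch are hypothetical names, and the policy of rejecting new messages once a fixed capacity is reached is an assumption made for the example. The take_batch method plays the role of the on_get_messages handler shown earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical collector: accumulates messages for a slower performer.
template <typename Message>
class batch_collector {
public:
    explicit batch_collector(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Called by producers. Returns false when the collector is full;
    // dropping the message in that case is an assumed overload policy.
    bool push(Message m) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (pending_.size() >= capacity_) return false;
        pending_.push_back(std::move(m));
        return true;
    }

    // Called by the performer when it is ready for more work: hands over
    // everything collected so far and leaves the collector empty.
    std::vector<Message> take_batch() {
        std::vector<Message> batch;
        std::lock_guard<std::mutex> lock(mutex_);
        std::swap(batch, pending_);
        return batch;
    }

private:
    const std::size_t capacity_;
    std::mutex mutex_;
    std::vector<Message> pending_;
};
```

The performer decides when to call take_batch, so it never receives more work than it asked for, and the fixed capacity keeps the collector's memory bounded while the performer is busy.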

The moral of this story: strict adherence to the principles of a theoretical model is good. But if in practice people keep pushing you toward something that conflicts with those very principles, it makes sense to listen. Something useful may come of it.

Distribution out of the box: not everything is so rosy

In SObjectizer-4 the developer could build distributed applications out of the box. We had our own protocol on top of TCP/IP and our own way of serializing C++ data structures.

On the one hand, it was very cool. With a few simple steps, messages could be made to fly automatically between the nodes running the parts of a distributed application. SObjectizer took care of serializing and deserializing data, managing transport channels, reconnecting after failures, and so on.

In short, at first everything was great.

But over time, as the range of tasks solved with SObjectizer widened and the load on applications grew, we ran into a lot of trouble:

first, each type of task ideally needs its own protocol. Distributing telemetry, that is, exchanging a large number of small messages whose loss is not a disaster, is very different from exchanging large binary files. For example, an application that has to transfer large archives or chunks of video files should use a different protocol than an application that carries thousands of messages with current readings from air temperature sensors;
secondly, implementing back-pressure for asynchronous agents is not simple in itself, and when network communication is mixed in, the situation gets much worse. Any network delay or slowdown on one node leads to large volumes of undelivered messages piling up on the other nodes, and that makes life miserable;
thirdly, the days when a large distributed system could be written in C++ alone ended long ago. Today some components will certainly be written in other programming languages, which means interoperability is required. And that automatically means that our own protocol, tailored to C++ and SObjectizer, does not help but hinders the development of distributed applications.

That is why SObjectizer-5 has no tools for distribution. We are looking more toward making it easier for agents to communicate with the outside world through de facto standard protocols. That is better than reinventing the wheel.

Lots of agents are a problem, not a solution. SEDA way forever!

Personally, I like this topic very much, because it shows once again that marketing and common sense can contradict each other :)

Almost every actor framework says in its marketing materials that actors are lightweight entities and that you can create a hundred thousand actors in an application, or a million, or even ten million.

When an unprepared programmer discovers that a program can contain a million actors, it can go to his head a little. It is so tempting to represent every activity inside the application as an actor.

The programmer gives in to this temptation, starts creating actors for every trifle, and soon discovers that tens or even hundreds of thousands of actors are running in his program at the same time... Which can lead to at least one of two problems.

What is going on inside an application with a million actors?

The first problem you can run into when creating a large number of actors is not understanding what is happening in the program, why it behaves the way it does, and how it will behave next.

What I call the bird flock effect kicks in: the behavior of an individual bird in a flock can be described by a few simple rules, yet the configuration of the whole flock turns out to be complex and practically unpredictable.

It is the same in an application with a large number of agents. Each agent may follow simple and understandable rules, but the behavior of the whole application can be hard to predict.

For example, some agents suddenly stop showing signs of life. They seem to exist, but no work from them is visible. And then they suddenly “wake up” and start working so actively that there are not enough resources left for the other agents.

In general, keeping track of what happens inside an application with ten thousand agents is much harder than in an application with only a hundred. Imagine that you have ten thousand agents and you want to know how heavily loaded one of them is. I think that will be a problem.

By the way, one of Erlang's killer features is its tools for introspection. The developer can at least see what is happening inside the Erlang virtual machine: how many processes there are, how much each process consumes, what the queue sizes are, and so on. But Erlang has its own virtual machine, which makes this possible.

As for C++, the C++ frameworks, as far as I know, are far behind Erlang in this area. On the one hand, there are objective reasons: C++ is compiled to native code, and monitoring pieces of native code is much harder. On the other hand, implementing such monitoring is a non-trivial task that requires considerable effort and investment. So it is hard to expect advanced features from open-source frameworks that are developed on pure enthusiasm.

So if you create a large number of agents in a C++ application without monitoring tools as advanced as Erlang's, it is hard to keep an eye on the application and understand how it is working.

Sudden bursts of activity

The second possible problem is a sudden burst of activity, when some of your actors suddenly start consuming all available resources.

Imagine that you have 100 thousand agents in the application. Each of them starts an operation and arms a timer to control the operation's timeout.

Suppose some part of the application starts to slow down, previously started operations begin to fail by timeout, and the delayed messages about expired timeouts start arriving in batches. For example, 10 thousand timers fire within 2 seconds. That means 10,000 delayed-message handlers have to be called.

And here it may turn out that each such handler, for some reason, takes 10 ms. Then processing all 10 thousand delayed messages takes 100 seconds. Even if the messages are processed on four parallel threads, that is still 25 seconds.

It turns out that part of our application simply freezes for those 25 seconds. Until it has processed those 10 thousand delayed messages, it will not react to anything else.
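The arithmetic behind these numbers fits into one small function. This is only a back-of-the-envelope sketch: drain_time is a hypothetical helper, and it assumes that the handlers are spread evenly over the worker threads and do not block one another.

```cpp
#include <cassert>
#include <chrono>

// Time needed to work through a burst of `handlers` calls, each taking
// `per_handler`, when they are spread evenly over `worker_threads`.
inline std::chrono::milliseconds drain_time(long handlers,
                                            std::chrono::milliseconds per_handler,
                                            long worker_threads) {
    assert(handlers >= 0 && worker_threads > 0);
    // The busiest thread gets ceil(handlers / worker_threads) calls.
    const long per_thread = (handlers + worker_threads - 1) / worker_threads;
    return per_handler * per_thread;
}
```

With the figures from the text, drain_time(10000, std::chrono::milliseconds(10), 1) gives 100 seconds and drain_time(10000, std::chrono::milliseconds(10), 4) gives 25 seconds.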

Misfortune never comes alone...

The saddest thing is that the two problems above combine perfectly. A sudden burst of activity causes unplanned behavior of the application, and because of the bird flock effect we cannot understand what is going on. The application seems to be working, but not quite properly, and it is unclear what to do about it. You can, of course, simply kill the application and restart it. But that means re-creating 100 thousand agents, restoring their state, re-establishing connections to external services, and so on. Unfortunately, such a restart will not be painless.

So the ability to create a huge number of agents in your application should be seen not as a way to solve your problems but as a way to make even more problems for yourself.

The way out, of course, is simple: get by with fewer agents. But how?

SEDA approach

Getting acquainted with the SEDA (Staged Event-Driven Architecture) approach puts your head straight very well. In the early 2000s, a small group of researchers developed a Java framework of the same name and used it to prove the viability of the underlying idea: split the implementation of complex operations into stages, give each stage its own thread (or group of threads), and organize the interaction between stages through asynchronous message queues.

Imagine that we need to handle a payment request. We receive the request, check its parameters, then check whether the payment is allowed for the given client (for example, whether he has exceeded his daily payment limits), then assess the riskiness of the payment (for example, if the client is from Belarus and the payment is for some reason initiated from Bangladesh, that is suspicious), then we debit the funds and form the result of the payment. Several stages of processing a single operation are clearly visible here.

The ability to create a million agents in an application pushes us toward creating one agent per payment that performs all the stages itself, one after another. That is, it validates the payment parameters itself, checks the daily limits and whether they are exceeded itself, queries the fraud monitoring system itself, and so on. Schematically it might look like this:

In the case of the SEDA approach, we could have one agent for each stage. One agent accepts payment requests from customers and forwards them to the second agent. The second agent checks the request parameters and sends valid requests to the third agent. The third agent checks the limits, etc. Schematically, it looks like this:
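In code, the one-agent-per-stage layout might be sketched roughly as follows. This is a standard-library illustration rather than SObjectizer code, and every name in it (payment, stage, run_pipeline, daily_limit) is hypothetical. Each stage owns a queue and a single worker thread; the risk-assessment stage is left out to keep the sketch short.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical payment request.
struct payment { int client_id; long amount; };

// One SEDA stage: a message queue plus one worker thread running `handler`.
class stage {
public:
    using handler_t = std::function<void(const payment &)>;

    explicit stage(handler_t handler)
        : handler_(std::move(handler)), worker_([this] { run(); }) {}

    ~stage() { close_and_join(); }

    // Asynchronous hand-off from the previous stage.
    void push(const payment & p) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!closed_);  // nothing may arrive after the stage is closed
            queue_.push(p);
        }
        ready_.notify_one();
    }

    // Lets the worker finish what is already queued, then stops it.
    void close_and_join() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_one();
        if (worker_.joinable()) worker_.join();
    }

private:
    void run() {
        for (;;) {
            payment p{};
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return closed_ || !queue_.empty(); });
                if (queue_.empty()) return;  // closed and fully drained
                p = queue_.front();
                queue_.pop();
            }
            handler_(p);
        }
    }

    handler_t handler_;
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<payment> queue_;
    bool closed_ = false;
    std::thread worker_;  // declared last so it starts after the other members
};

// Three stages: validate parameters -> check daily limit -> debit.
std::vector<payment> run_pipeline(const std::vector<payment> & requests,
                                  long daily_limit) {
    std::vector<payment> debited;  // written only by the debit stage's thread
    stage debit([&](const payment & p) { debited.push_back(p); });
    stage limits([&](const payment & p) {
        if (p.amount <= daily_limit) debit.push(p);
    });
    stage validate([&](const payment & p) {
        if (p.amount > 0) limits.push(p);
    });
    for (const auto & r : requests) validate.push(r);
    // Shut down from the head of the pipeline so nothing is lost in flight.
    validate.close_and_join();
    limits.close_and_join();
    debit.close_and_join();
    return debited;
}
```

However many payments go through run_pipeline, the number of threads and queues stays at three, one per stage, which is exactly the point of the approach.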

The number of agents drops by orders of magnitude. Such agents are much easier to control, and protecting them from overload becomes much simpler. If these agents work with a DBMS, they can use bulk operations: the agent accumulates, say, 1000 messages and then serves them all with 2-3 bulk calls to the database. We can also meter the activity of agents. For example, if the external fraud monitoring system suddenly goes down and we need to generate 10 thousand negative replies, we do not have to send all 10 thousand at once; we can spread them evenly over, say, ten seconds. That protects other parts of the system from overload.
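Spreading replies over a time window comes down to computing a send offset for each reply. A minimal sketch under the assumption that replies are numbered from zero and are sent as delayed messages; send_offset is a hypothetical helper, not part of any library.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Offset from "now" at which reply number `index` (0-based) out of
// `total` should be sent so that all replies fill `window` evenly.
inline std::chrono::milliseconds send_offset(std::int64_t index,
                                             std::int64_t total,
                                             std::chrono::milliseconds window) {
    assert(total > 0 && index >= 0 && index < total);
    return std::chrono::milliseconds(window.count() * index / total);
}
```

For 10 thousand replies and a ten-second window this yields one reply per millisecond: send_offset(5000, 10000, std::chrono::seconds(10)) is 5000 ms. The agent that owns the stage would schedule each reply with its own offset instead of sending them all in one burst.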

An additional bonus: if a stage is served by a single agent, prioritizing transaction processing at that stage becomes much simpler. For example, transactions from online clients may need to be processed with higher priority than scheduled transactions. With the SEDA approach this is easier to implement than when a separate agent is responsible for each transaction.
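When one agent owns the whole stage, prioritization can live in that agent's inbox. Here is a small sketch built on std::priority_queue; the names origin, transaction and prioritized_inbox are hypothetical, and keeping arrival order within the same priority is an assumption about the desired behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <vector>

enum class origin { scheduled = 0, online = 1 };

// Hypothetical transaction record.
struct transaction { int id; origin from; };

// Inbox of a single stage: online transactions come out first,
// arrival order is preserved within the same priority.
class prioritized_inbox {
public:
    void push(transaction t) { heap_.push(entry{t, next_seq_++}); }

    bool empty() const { return heap_.empty(); }

    transaction pop() {
        assert(!heap_.empty());
        transaction t = heap_.top().tx;
        heap_.pop();
        return t;
    }

private:
    struct entry { transaction tx; std::uint64_t seq; };

    // Returns true when `a` must be served after `b`.
    struct served_later {
        bool operator()(const entry & a, const entry & b) const {
            if (a.tx.from != b.tx.from) return a.tx.from < b.tx.from;
            return a.seq > b.seq;  // same priority: older entries go first
        }
    };

    std::priority_queue<entry, std::vector<entry>, served_later> heap_;
    std::uint64_t next_seq_ = 0;
};
```

With one agent per transaction there is no single place to put such a queue: the ordering would have to be negotiated between thousands of independent agents.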

At the same time, even within the SEDA approach we still enjoy the benefits that the Actor Model gives us. We just limit ourselves to a few dozen actors instead of tens of thousands.

Conclusion

In conclusion, I would like to say that the Actor Model is a cool thing, but not a silver bullet at all. For some tasks the Actor Model works well, for some it works poorly, and for some it does not work at all.

But even when the Actor Model fits the task, two more things are very helpful:

first, the developer must have a head on his shoulders. If the developer mindlessly creates hundreds of thousands of actors in his application, does not think about overload, has no idea what a spontaneous burst of activity is, and so on, then with the Actor Model he can get into no less trouble than with “bare” threads;
secondly, it would be good if the actor framework gave the developer all possible help, in particular with protecting actors from overload, error handling, and introspection of what is happening inside the application. That is why we are gradually extending SObjectizer in this direction. We have already added message limits, exception handling, collection of statistics and monitoring information, and tools for tracing the message delivery mechanism.

By the way, it is exactly this set of auxiliary tools that, in my opinion, indicates the maturity of an actor framework. It is not that hard to implement an idea in your framework and show that it works. You can spend a few months and get a quite workable and interesting tool. All of that is done on pure enthusiasm. Literally: I liked the idea, I wanted to do it, I did it.

But equipping the result with all sorts of auxiliary tools, such as statistics collection or message tracing, is a boring routine for which it is not so easy to find the time and the desire.

So my advice to those who are looking for a ready-made actor framework: pay attention not only to the originality of the ideas and the beauty of the examples. Look also at the auxiliary things that will help you figure out what is happening in your application: for example, how many actors there are right now, what their queue sizes are, and, if a message does not reach its recipient, where it goes... If the framework provides something like this, things will be easier for you. If it does not, you will have more work.

And one more thing from me: if you feel like building your own actor framework from scratch, one that would protect the developer from the pitfalls discussed above, it is not a good idea. It is a completely thankless job, and hardly a profitable one. This has already been tested. On humans.

Source: https://habr.com/ru/post/324978/
