Asynchronous state machine: ideology and technology

Introduction

It is good when your subordinates never get sick, do not die, are always present at work and carry out your orders without preliminary preparations: “Called - stand up”. Such are, for example, web services that comply with the REST model (which, if we omit the special HTTP terminology, means that the service interface is actually a data container interface).

In real life, subordinates have runny nose and maternity leave, network connections have timeouts, flights have weather, and car engines have a warm-up idle time in cold weather.

An asynchronous state machine is a convenient top-level abstraction for managing entities with a rich and not always predictable inner world. Such an entity can be a hardware device, a network protocol session or just a parallel running process, the code of which you do not control.
')
The architecture of the asynchronous finite-state machine described below solves a number of standard problems arising from the "frontal" integration of subsystems with regard to their internal state. The most noticeable of such problems is the lack of separation (I would even say, insufficient “galvanic isolation”) of the signal entities and the transition between states, which is why the machine becomes unstable to DoS attacks. There are other, less obvious ones - for example, an “insufficiently atomic” replacement of a node of a subsystem or a resource used by it.

Anatomy (object decomposition)

The state machine model includes the following basic entities:

The state is the mode of operation of the controlled system, which is different from the others according to the opportunities provided. Thus, snapshots of caches and buffers, options “from the fence to lunch” and other accidents of the controlled system are not included in the concept of “state”. Normally, there should be a few units; if the score went to the second decade, then most likely, the managed system should be split up or hierarchized.
A condition is a logical value (true or false) on one of the “inputs” of the system. The superposition of the states of all inputs of the automaton uniquely determines the target state of the automaton. Thus, any input signal that is significant to the state of the automaton ultimately boils down to setting the value of one or several conditions.
Reaction is the response of the automaton to the difference between the current state and the target state. We counted two and a half fundamentally different types of reaction: a direct transition between states, a route and a stop route (“brick”). A direct transition can also be an empty operation (NOP) - for example, if the change in inputs is caused by notification of the completion of an asynchronous operation.

Once again: the state is determined by a set of conditions, the reaction is determined by a pair of states . Any state of inputs can be associated with some state — that is, with no set of inputs, the desired state of the system should not be undefined.

Routes are set “lazily” (lazily) - for each pair of states, between which no direct transition is set, an intermediate target state is set. It is similar to a gateway in network packet routing — with one difference: routes can be nested.

The desired state is recalculated after each transition made. Thus, there is no guarantee that the route will be completed to the end. This reduces the “dreaminess” of the finite state machine, that is, it reduces the response time to an unexpected change in the situation.

A “stop route” is a route in which the initial and intermediate states coincide. It is used to mark pairs of states (real and desirable) in which nothing should be done, but one should wait until the desired state changes.

If you are writing in Java, set the condition and state sets as enum, and in the abstract automaton implementation, use parameterization, EnumSet for the “state word” and EnumMap for the route and transition tables. This is not a dogma. Probably, it can be somehow different. But, as the Armenian radio says, it's a pity.

Physiology (threading and alarm)

Asynchronous state machine is convenient to build over the message queue. On Android (the android.os package), there is a MessageQueue + Looper + Handler infrastructure that perfectly meets our requirements; in adult Java, you can use the ThreadPoolExecutor from a single thread or just a loop that parses LinkedBlockingQueue. If you do not have blockade resource limitations, do not regret the welding and highlight each managed system downstream.

Response messages to the client code (by default, these are the IDs of the states to which the managed system goes, but there may be some useful results - milk, calves ...) are sent via the thread-safe list of subscribers. (In C #, the delegates' subscription to events is implemented out of the box.) In the method of sending a message, it is convenient to return the Boolean flag “enough for me”. The simplest thing to do with such a flag is to implement a one-time “latch” (latch), on which the client can wait for the state he needs; but you can write a rather complex scenario (“after A do it, after B do it and unhook it”).

Messages about the change of the input state are simply taken into account by setting or clearing the corresponding flag. No other action on these signals occurs. The real transition between states is processed when there is no queue (or “presumably not” - no thread-safe guarantees are required here, because we do not prevent the race condition, but only reduce body movements) of other messages, that is, when the “incoming mail” is completely disassembled.

If you specify any inputs yourself - in the state transition code - this can be done either synchronously (by direct assignment) or asynchronously (by queuing the signals). Complete asynchrony eliminates the loop from the queue exhaustion handler, but requires an explicit “ping” (sending an empty message) if the state of the system has changed (that is, we have not stopped in the loop of the stop route), but the route is not passed to the end. You can do without pinging by organizing a cycle with the condition “the desired state is not equal to the real one, and at the same time we did not run into a stop route”. In both cases, the stop route has to be clearly distinguished.

Pedagogy (runlevels and panic)

An easy-to-understand state table is made on the basis of successively applied containsAll () - that is, the system is considered to be in the A (n) state, if all the necessary conditions of this state are present in the “state word” of the inputs, and the same is not true for all A (m | m <n). In most practical cases, states can be strictly ranked by the accumulation of opportunities (from “off and not used” to “on and ready for anything”) and achievements (from “willingness to go” to “go” and “drive”). In this case, each subsequent state differs from the previous one by a strict weakening of the set of necessary conditions.

There may be exceptions. For the description of exceptional cases, the “panic” flag is used, which is both a necessary and a sufficient condition for being in the appropriate state. Transition to this state gives up all the resources used or somehow “cleans” the managed system.

Probably, one can start more than one flag of panic, more than one exceptional state and rank them in descending order of “seriousness” - but in our practice there is no need for this yet.

Economy (resources)

Resources that you get yourself do not require special treatment. In particular, they can be stored in ordinary non-final non-volatile private-fields. Information about whether you have this resource flows entirely from your current state.

A resource requiring special treatment is a dependency: what the managed system uses, without producing it itself. Resources are taken from the outside. Dependency injection in a state machine is implemented by a separate pattern, which I like to call “equipment”.

Each equipment “slot” has one permanent attribute: a condition that signals the availability of this type of equipment — and two values: suggested and effective. Suggested is what the client has suggested. Effective is what is actually used. The state of the equipment is checked at the beginning of the queue exhaustion handler, as well as after each transition between states.

Checking the condition of the equipment is performed as follows (we analyze the most difficult case when the non-null effective is replaced with the non-null suggested — the rest are obtained by removing the irrelevant "if-then"):

If suggested! = Effective, and effective! = Null, then the “hardware available” flag is cleared.
If the equipment is “required” in the current state (i.e., the condition we just dropped is necessary for being in the current state — the runlevel rule helps a lot), then we stop at that — wait for the controlled system to go to where old equipment is not needed.
If old (suggested! = Effective) equipment is not required in the current state, we reset it to null.
If the current state is suggested! = Null && effective == null, then assign effective = suggested and set the equipment availability flag.

The get () method of the object - the equipment holder is batch-private and returns effective.

If you group automata into composites that solve the same problem in different ways, it is good practice to deliver the same resources to all performers, not just the active one. (Client orders, on the contrary, are delivered only to the active executor.)

Ethics and psychology of family life (integration)

Strictly separate the input signals according to their origin:

client code orders (“let's go!”);
callbacks from the managed system ("knocking left engine");
state of resources (“there is an ignition key in the lock”).

While some messages from a managed system may “imply” or “devalue” others (for example, the lack of a key in the lock means that our information about the gained speed is outdated), no signals from the controlled system cancel client instructions, and no client instructions Do not change our knowledge of the managed system. A logical mixture (addition, multiplication) of these conditions occurs already in the reducer “words of inputs” to the actual state.

Hygiene (best practices)

It is fundamentally important that all manipulations with the subsystem are performed in the order of transition between states. Strictly all. So strictly that, ideally, there should be no separate private methods in the class of a specific implementation of the machine — only delegates (in C # sense; in Java, pseudo-delegates such as Callable or Runnable) in the jump table. To some extent, for the sake of this, it is even possible to suffer duplication of the code - since the standard ways of its elimination do not eliminate the semantics redundancy issued to them. (Group filling one delegate of several cells of the transition table does not lead to erosion of semantics and is generally kosher; in our skeletal implementation there are such facility-methods.)

Hooks following the pattern of onStateEnter () or onStateExit (), respectively, should also not be. They will either have to know in which "direction" the subsystem passes the state corresponding to them, or ignore it.

Make sure that the controlled system (“physics”, remaining behind the facade of your finite automaton) does not manifest a stupid initiative — that is, does not monitor its state itself (except for error diagnostics) and does nothing “at the same time” or “just in case”. This includes "lazy initialization", and attempts to predict and preempt the behavior of the client. All this is a source of error.

After initialization of the gearbox and route tables and transitions, it is recommended to explicitly (by scholastic enumeration of all possible sets of conditions and all possible pairs of states) to check their completeness.

With automatic testing of the machine (sorry for a non-bullying self-repeat), “load” it with messages with the order of the frequency changing during the test: very quickly (much faster than they would be processed alone - to make sure that the filling of the queue does not threaten anything per se), just quickly (with a frequency comparable to the frequency of reactions — to check for race conditions) slower than the frequency of reactions (to observe the independent “growth” of the runlevel as the “physics” of the asynchronous tasks are worked out and the machine makes independent decisions about following actions).

PS Before you sign the client on the changes of the parameter A, simultaneously send him the current value A. (“Be racially correct - prevent race conditions”.)

Conclusion

I wrote some to Trivius, but judging by the code that I regularly read, Trivia probably turned out to be useful.

Source: https://habr.com/ru/post/141142/

All Articles