Fine Rust state machines

Translation of the article by Andrew Hobden "Pretty State Machine Patterns in Rust". Link to the original at the end.

Recently, I have been thinking a lot about design patterns and techniques that we use in programming. It’s really great to start exploring a project and seeing familiar patterns and styles that you’ve already seen more than once. This facilitates understanding of the project and makes it possible to speed up the work.

Sometimes you work on a new project and you realize that you need to do something the same way you did in the past project. It may not be part of the functionality or library, it may be something that cannot be wrapped in an elegant macro or a small container. It may be just a design pattern or structural concept that solves a problem well.

One interesting pattern that is often applied to such problems is the "State Machine". I propose to spend a little time to understand what exactly is meant by this phrase, and why is it so interesting.

Throughout the article you can run all the examples on the Rust Playground , I usually use the Nightly version out of habit.

Justify our ideas

On the Internet there are a huge amount of resources and thematic articles about state machines. Moreover, there are many of their implementations.

You used one of them, just to get to this page. You can simulate the TCP protocol using a state machine. You can also simulate HTTP requests with it. You can model any regular language as a finite state machine, for example, a regular expression language (REGEX). They are everywhere, hidden inside the things that we use every day.

So, a finite state machine is any “automaton” that has a set of “states” and “transitions” between them.

When we speak of an automaton, we mean the abstract concept of something doing something . For example, your function "Hello, world!" - automatic. It turns on and ultimately produces what we expect. Also behave and the model with which you interact with your database. We will consider our basic automaton as an ordinary structure that can be created and destroyed.

struct Machine; fn main() { let my_machine = Machine; // . // `my_machine`  ,      . }

States are a way to explain where the state machine is located. For example, we can imagine an automat filling a bottle. This machine is in the "waiting" state when waiting for a new bottle. As soon as he discovers the bottle, then goes into the "filling". Immediately after filling the bottle with the right amount of water, the machine goes into the "completed" state. He returns to the "waiting" state as soon as the bottle is taken.

The main conclusion from this is that no state has any information that relates to other states. The "fill" state does not care how long the machine has been in the "waiting" state. The "completed" state does not care about the degree of filling of the bottles. Each state has strictly defined duties and problems . The natural way to consider these options is enum .

 enum BottleFillerState { Waiting { waiting_time: std::time::Duration }, Filling { rate: usize }, Done } struct BottleFiller { state: BottleFillerState }

Using enum in this way means that the states are mutually exclusive, you can only be in one state at a specific point in time. Fat enums in Rust allow each state to store the necessary information in itself. Until our definition is declared this way, everything is in order.

But there is one small problem. When we described our automaton above, we described three transitions between states: → , → and → . We did not take into account → or → , they simply do not make sense!

This brings us to the idea of transitions. One of the most enjoyable features of a true state machine is that we never have to take care of transitions such as -> . The state machine design pattern must ensure that such a transition is impossible. Ideally, this will happen even before we launch our machine gun - at the time of compiling the program.

Let's look at our transitions in the diagram again:

  +----------+ +------------+ +-----------+ | | | | | | |  +-->+  +-->+  | | | | | | | +----+-----+ +------------+ +--+--------+ ^ | +-----------------------------+

As we can see, there are a finite number of states, and a finite number of transitions between states. At the moment, we can perfectly legally make transitions between each state to any other state, but in most cases this is not true.

This means that the transition between the "Waiting" state and the "Filling" state must satisfy a certain rule. In our example, this rule might look like "Bottle is in place." In the case of a TCP stream, this will be "We received a FIN packet", which means that we need to complete the transfer by closing the stream.

Determine what we want

Now that we know what a state machine is, how do we implement it in Rust? To begin with, let's think about what we want .

Ideally, we would like to see the following characteristics:

It can only be in one state at a time.
Each state must have its own data, if necessary.
The transition between states must have certain semantics.
It should be possible to have some general condition.
Only explicitly defined transitions should be allowed.
Changing one state to another should absorb the state if it can no longer be used.
We should not allocate memory for all states. Memory consumption must be at least no more than the size of the largest state.
Every error message should be easy to understand.
We should not resort to using heaps. Everything should be placed on the stack.
Type system should be considered as our strongest advantage.
As many errors as possible should be detected at compile time.

So, if we had a pattern that satisfies all these requirements, it would be truly fantastic. Well, a template that fits only part of the requirements, will also be good.

Explore possible implementations

With such a powerful and flexible type system as in Rust, we should be able to implement this. The truth is this: there are several ways, each of which offers us certain advantages and teaches us a lesson.

The second attempt with Enum

As we already know, the most natural way is enum , but we have already noticed that we cannot prohibit transitions in this case. But can we just wrap them in a structure? Of course we can! Take a look:

 enum State { Waiting { waiting_time: std::time::Duration }, Filling { rate: usize }, Done } struct StateMachine { state: State } impl StateMachine { fn new() -> Self { StateMachine { state: State::Waiting { waiting_time: std::time::Duration::new(0, 0)} } } fn to_filling(&mut self) { self.state = match self.state { //   "" -> ""  State::Waiting { .. } => State::Filling { rate: 1}, //    _ => panic!("Invalid state transition!") } } // ... } fn main() { let mut state_machine = StateMachine::new(); state_machine.to_filling(); }

At first glance, everything is in order. But do you notice some problems?

An error due to a forbidden transition will occur at runtime, which is terrible!
This prevents only incorrect transitions from outside the module, because private fields can be freely modified from within the module. For example, state_machine.state = State::Done unconditionally operates inside a module.
Every our function that works with states must have a match expression.

However, this approach has some advantages:

The memory required to represent the state machine is the size of the largest state. This is the result of using fat enum , the size of which corresponds to the size of the largest variant.
All memory is allocated in the stack, the heap is not involved.
Transitions between states have certain rules. This either works or gives an error!

Now you might think: "Allow, Hoverbear, you can wrap the output to_filling() in Result<T,E> or add the InvalidState option to the enum !". But let's face it: it doesn't improve the situation much, if it improves at all. Even if we get rid of runtime crashes, we still have to deal with awkward pattern matching expressions, and our errors will still be detected only after the program has started! Ugh! We can do better, I promise.

So continue the search!

Structures with transitions

What if we just use a set of structures? We can define for each of them a set of types common to each state. We can use special functions that turn one type into another! How will it look like?

 //      trait SharedFunctionality { fn get_shared_value(&self) -> usize; } struct Waiting { waiting_time: std::time::Duration, // ,     shared_value: usize } impl Waiting { fn new() -> Self { Waiting { waiting_time: std::time::Duration::new(0,0), shared_value: 0 } } //  ! fn to_filling(self) -> Filling { Filling { rate: 1, shared_value: 0 } } } impl SharedFunctionality for Waiting { fn get_shared_value(&self) -> usize { self.shared_value } } struct Filling { rate: usize, //      shared_value: usize, } impl SharedFunctionality for Filling { fn get_shared_value(&self) -> usize { self.shared_value } } // ... fn main() { let in_waiting_state = Waiting::new(); let in_filling_state = in_waiting_state.to_filling(); }

Damn it, how much code! Thus, the idea was that all states have both data common to all states and their own. As you can see, the to_filling() function to_filling() absorb the "Waiting" state and make the transition to the "Filling" state. Let's summarize everything:

Conversion errors are determined at compile time! For example, you cannot even create a "Fill" state by accident without first creating a "Waiting" state. (Actually you can, but this is irrelevant)
A transition between states is required.
During the transition between states, the old value is absorbed instead of a simple change.
True, we could do the same with enum from our first attempt.
We do not need constant match .
Memory consumption is still not satisfactory. We only need the size of the current
states.

There are some disadvantages:

Lots of duplicate code. You must define the same functions and types for each structure.
It is not always clear which values are common for states and which belong to only one. Updating the code in the future can cost you dearly.
Since the size of the state is not constant, we have to wrap it in enum as before, so that we can use the state machine as one of the components of a more complex system. Here is what it might look like:

 enum State { Waiting(Waiting), Filling(Filling), Done(Done) } fn main() { let in_waiting_state = State::Waiting(Waiting::new()); //    ,   `Waiting`   `enum`! //    `match`    let in_filling_state = State::Filling(in_waiting_state.to_filling()); }

As you can see, this is not very convenient. We are getting closer to what we want. The idea of transition between certain types seems like a big step forward! Before we try something completely different, let's talk about how to change our example, which can simplify further reflections.

The standard Rust library provides two very important types: From and Into , which are extremely useful and deserve mention. It is important to note that the implementation of one of them automatically implements the other. In general, the From implementation is preferable, since it is a bit more flexible. We can implement them very easily for our previous example:

 // ... impl From<Waiting> for Filling { fn from(val: Waiting) -> Filling { Filling { rate: 1, shared_value: val.shared_value, } } } // ...

This not only gives us a general transition function, but is also much easier to read when you see this in the source code! This reduces the psychological burden and facilitates the perception of readers. Instead of implementing our own functions, we use an existing template. The foundation of our templates on the basis of existing ones is an excellent solution.

So it's cool, how do we cope with annoying repetition of code and shared_value everywhere? Let's explore some more!

Almost perfect

Now we will bring together the lessons and ideas from the first two ways, add some new ideas, and get something more pleasant. The essence of this method is to use the power of generalized types. Let's look at a fairly basic structure:

 struct BottleFillingMachine<S> { shared_value: usize, state: S } //     `S`  StateMachine<S> struct Waiting { waiting_time: std::time::Duration } struct Filling { rate: usize } struct Done;

So, we actually embed the state of the finite state machine in the signature BottleFillingMachine . The state machine is in the “Fill” state with BottleStateMachine<Filling> , which is great because when we see this type as part of an error message or something like that, we immediately know the current state of the machine.

We can continue and implement From<T> for some specific options, like this:

 impl From<BottleFillingMachine<Waiting>> for BottleFillingMachine<Filling> { fn from(val: BottleFillingMachine<Waiting>) -> BottleFillingMachine<Filling> { BottleFillingMachine { shared_value: val.shared_value, state: Filling { rate: 1 } } } } impl From<BottleFillingMachine<Filling>> for BottleFillingMachine<Done> { fn from(val: BottleFillingMachine<Filling>) -> BottleFillingMachine<Done> { BottleFillingMachine { shared_value: val.shared_value, state: Done } } }

The definition of the initial state of the machine looks like this:

 impl BottleFillingMachine<Waiting> { fn new(shared_value: usize) -> Self { BottleFillingMachine { shared_value: shared_value, state: Waiting { waiting_time: std::time::Duration::new(0, 0) } } } }

And what does the change of states look like? Like this:

 fn main() { let in_waiting = BottleFillingMachine::<Waiting>::new(0); let in_filling = BottleFillingMachine::<Filling>::from(in_waiting); }

In case you do this inside a function whose signature limits the output type:

 fn transition_the_states(val: BottleFillingMachine<Waiting>) -> BottleFillingMachine<Filling> { val.into() // ,   ? }

What about the type of error messages at the compilation stage ?

 error[E0277]: the trait bound `BottleFillingMachine<Done>: std::convert::From<BottleFillingMachine<Waiting>>` is not satisfied --> <anon>:50:22 | 50 | let in_filling = BottleFillingMachine::<Done>::from(in_waiting); | ^^^^^^^^^^^^^^^^^^^^^^^^^^ | = help: the following implementations were found: = help: <BottleFillingMachine<Filling> as std::convert::From<BottleFillingMachine<Waiting>>> = help: <BottleFillingMachine<Done> as std::convert::From<BottleFillingMachine<Filling>>> = note: required by `std::convert::From::from`

It is quite clear what is wrong here. The error message even tells us the right transitions!

So what does this approach give us?

The correctness of the transitions is confirmed at compile time.
Error messages are very understandable and even offer a list of possible fixes.
We have a "parent" structure, which may have associated types and data that need not be repeated in child types.
As soon as the transition was made, the old state no longer exists, it was "absorbed." In fact, the whole structure disappears, so that we cannot get old values if there are side effects during the transition (changing average waiting time, for example).
Low memory consumption, only stack involved.

There are still disadvantages:

Our implementations From<T> suffer from some "clutter types". However, this is quite a minor inconvenience.
Each BottleFillingMachine<S> has a different size, so we still have to use enum . Yet this is not a serious disadvantage due to our structure.

You can play with this example here.

Dirty relationships with parents

Translator’s Note: The translation of this title, courtesy of Google Translator, is so great that I chose to leave it that way.

How can we organize the parent structure to store the state of the finite state machine without terrible problems with the interaction? Well, this will roll us back to the first idea with enum .

If you remember, the main problem with our first approach was that we did not have the ability to provide transitions, and all the errors showed themselves at run time.

 enum BottleFillingMachineWrapper { Waiting(BottleFillingMachine<Waiting>), Filling(BottleFillingMachine<Filling>), Done(BottleFillingMachine<Done>) } struct Factory { bottle_filling_machine: BottleFillingMachineWrapper } impl Factory { fn new() -> Self { Factory { bottle_filling_machine: BottleFillingMachineWrapper::Waiting(BottleFillingMachine::new(0)) } } }

At the moment, your first reaction is most likely "Damn, Hoverbear, look at these long, horrible type declarations." You are absolutely right! Honestly, they are really long, but I chose the most understandable type names! You can use all your favorite abbreviations and aliases in your code.
Look!

 impl BottleFillingMachineWrapper { fn step(&mut self) -> Self { match self { BottleFillingMachineWrapper::Waiting(val) => BottleFillingMachineWrapper::Filling(val.into()), BottleFillingMachineWrapper::Filling(val) => BottleFillingMachineWrapper::Done(val.into()), BottleFillingMachineWrapper::Done(val) => BottleFillingMachineWrapper::Waiting(val.into()) } } } fn main() { let mut the_factory = Factory::new(); the_factory.bottle_filling_machine = the_factory.bottle_filling_machine.step(); }

Again, you may notice that this works through absorption , not change. Using match , we move val and, thus, allow .into() use it and absorb the previous state. But if you prefer to change values, you can implement #[derive(Clone)] or even Copy for your states.

Despite the fact that it is somewhat less convenient and pleasant to work with, we still have the transitions provided by the system of types and all the guarantees that come with them.

You may notice that this method forces you to process all possible states during the manipulation of the state machine, and this makes sense. If you own and control a structure with a state machine, you must determine the actions for each state in which there can be a state machine.

Or you can just call panic!() If you really want to. But if you just want to panic , then why not use the very first approach?

You can see a fully working example here.

Working examples

This is the case when examples are not superfluous. So I collected a couple of working examples below and provided them with comments.

Three states, two transitions

This example is very similar to our bottle-filling machine, but it actually works, albeit rather trivially. This state machine receives the string and returns the number of words in it.
Link to Rust Playground

 fn main() { //   <StateA>.       ! let in_state_a = StateMachine::new("  ".into()); //   ,         in_state_a.some_unrelated_value; println!(" : {}", in_state_a.state.start_value); //    .    //      let in_state_b = StateMachine::<StateB>::from(in_state_a); //   !     ! // in_state_a.some_unrelated_value; //           in_state_b.some_unrelated_value; println!(" : {:?}", in_state_b.state.interm_value); //     let in_state_c = StateMachine::<StateC>::from(in_state_b); //     !      ! // in_state_c.state.start_value; println!(" : {}", in_state_c.state.final_value); } //     struct StateMachine<S> { some_unrelated_value: usize, state: S } //       impl StateMachine<StateA> { fn new(val: String) -> Self { StateMachine { some_unrelated_value: 0, state: StateA::new(val) } } } //        struct StateA { start_value: String } impl StateA { fn new(start_value: String) -> Self { StateA { start_value: start_value, } } } //  B     struct StateB { interm_value: Vec<String>, } impl From<StateMachine<StateA>> for StateMachine<StateB> { fn from(val: StateMachine<StateA>) -> StateMachine<StateB> { StateMachine { some_unrelated_value: val.some_unrelated_value, state: StateB { interm_value: val.state.start_value.split(" ").map(|x| x.into()).collect(), } } } } // ,      ,       struct StateC { final_value: usize, } impl From<StateMachine<StateB>> for StateMachine<StateC> { fn from(val: StateMachine<StateB>) -> StateMachine<StateC> { StateMachine { some_unrelated_value: val.some_unrelated_value, state: StateC { final_value: val.state.interm_value.len(), } } } }

Raft

If you’ve been following my blog recently, you may know that I prefer to write about Raft. It was Raft, as well as communication with @argorak that pushed me to conduct this research.

Raft is somewhat more complicated than the previous examples, because state transitions are not linear, as A->B->C Here is a state and transition diagram for this state machine.

 +----------+ +-----------+ +--------+ | +----> | | | | Follower | | Candidate +----> Leader | | <----+ | | | +--------^-+ +-----------+ +-+------+ | | +-------------------------+

→ Link to Rust Playground

 //      fn main() { let is_follower = Raft::new(/* ... */); //  ,  3, 5  7  Raft.      :) //      let is_candidate = Raft::<Candidate>::from(is_follower); //  !   let is_leader = Raft::<Leader>::from(is_candidate); //        Follower let is_follower_again = Raft::<Follower>::from(is_leader); //    ... let is_candidate_again = Raft::<Candidate>::from(is_follower_again); //     ! let is_follower_another_time = Raft::<Follower>::from(is_candidate_again); } //     struct Raft<S> { // ...   state: S } //  ,      Raft //     ,     struct Leader { // ...    } //   ,            struct Candidate { // ...    } //    ,   struct Follower { // ...    } // Raft    Follower impl Raft<Follower> { fn new(/* ... */) -> Self { // ... Raft { // ... state: Follower { /* ... */ } } } } //       //     ,     impl From<Raft<Follower>> for Raft<Candidate> { fn from(val: Raft<Follower>) -> Raft<Candidate> { // ...      Raft { // ... attr: val.attr state: Candidate { /* ... */ } } } } //       ,      impl From<Raft<Candidate>> for Raft<Follower> { fn from(val: Raft<Candidate>) -> Raft<Follower> { // ...      Raft { // ... attr: val.attr state: Follower { /* ... */ } } } } //       impl From<Raft<Candidate>> for Raft<Leader> { fn from(val: Raft<Candidate>) -> Raft<Leader> { // ...      Raft { // ... attr: val.attr state: Leader { /* ... */ } } } } //   ,     ,    impl From<Raft<Leader>> for Raft<Follower> { fn from(val: Raft<Leader>) -> Raft<Follower> { // ...      Raft { // ... attr: val.attr state: Follower { /* ... */ } } } }

I-impv Reddit , , . :

. -.
, :
. (, ), .
"", .

Rust . enum , . , , .

, . IRC Mozilla Hoverbear.

:
: Andrew Hobden

Source: https://habr.com/ru/post/324382/

All Articles