In this article we will model and explore the two-phase commit protocol using TLA+.
The two-phase commit protocol is practical and is used in many distributed systems today. At the same time it is short, so we can model it quickly and learn a lot from it. In fact, this example will let us illustrate a fundamental impossibility result in distributed systems.
The problem of two-phase commit
A transaction is carried out over resource managers (RMs). All RMs must agree on whether the transaction will be committed or aborted.
The transaction manager (TM) makes the final decision: commit or abort. A commit is possible only if all RMs are prepared to commit; otherwise the transaction must be aborted.
Some notes on modeling
For simplicity, we model the protocol in a shared-memory setting rather than a message-passing one; this also keeps model checking fast. To still capture the interesting behaviors of message passing, we make the "read the neighboring node's state and update your own state" actions non-atomic.
An RM can only read the TM's state and read/update its own state. It cannot read the state of another resource manager. The TM can read the state of all RM nodes and read/update its own state.
Definitions

The initial rmState of each RM is set to "working", and tmState is set to "init".
The canCommit predicate is true if all RMs are "prepared" (ready to commit). If some RM has already aborted, the canAbort predicate becomes true.
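The listing itself appears only as an image in the source, so here is a minimal PlusCal-style sketch of what it describes. The names rmState, tmState, canCommit and canAbort come from the text; everything else is an assumption of this sketch, and the fragments in this and the following sections would have to be assembled into one algorithm before they could be checked.

    CONSTANT RM          \* the set of participating resource managers, e.g. 1..3

    variables rmState = [rm \in RM |-> "working"],   \* every RM starts in "working"
              tmState = "init";                      \* the TM starts in "init"

    define {
      \* the transaction may be committed only when every RM is prepared
      canCommit == \A rm \in RM : rmState[rm] = "prepared"
      \* the transaction may be aborted as soon as some RM has aborted
      canAbort  == \E rm \in RM : rmState[rm] = "aborted"
    }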

Modeling the TM is simple. The transaction manager checks whether it can commit or abort, and updates tmState accordingly.
The TM may also fail, setting tmState to "unavailable", but only if the constant TMMAYFAIL is set to TRUE before model checking starts. In TLA+, labels define the grain of atomicity. The labels F1 and F2 indicate that the corresponding statements are executed non-atomically (after some unspecified delay) with respect to the preceding statements.
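A rough sketch of the TM process along these lines; the labels F1 and F2 and the constant TMMAYFAIL come from the text, while the process structure and the string "unavailable" are assumptions of this sketch.

    CONSTANT TMMAYFAIL   \* whether the TM is allowed to fail

    process (TManager = 0)
    {
    TS: either { await canCommit;                            \* every RM is prepared
        TC:      tmState := "commit";
        F1:      if (TMMAYFAIL) tmState := "unavailable"; }  \* TM may fail, non-atomically
        or     { await canAbort;                             \* some RM has aborted
        TA:      tmState := "abort";
        F2:      if (TMMAYFAIL) tmState := "unavailable"; }
    }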
RM Model

The RM model is also simple. Since "working" and "prepared" are non-terminal states, the RM nondeterministically chooses among the available actions until it reaches a final state. A "working" RM can move to the "aborted" or "prepared" state. A "prepared" RM waits for the commit/abort decision from the TM and acts accordingly. The figure below shows the possible state transitions for one RM. Keep in mind, though, that we have several RMs, each going through its states at its own pace, without knowing the state of the other RMs.
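As a hedged sketch, one RM process could be written like this. The state names follow the prose; the label RC, which appears later in the error trace, makes the "read the TM's decision, then update my own state" step non-atomic, as announced in the modeling notes. The remaining names and the branch structure are assumptions.

    process (RManager \in RM)
    {
    RS: while (rmState[self] \in {"working", "prepared"}) {
          either { await rmState[self] = "working";      \* prepare
                   rmState[self] := "prepared"; }
          or     { await rmState[self] = "working";      \* abort unilaterally
                   rmState[self] := "aborted"; }
          or     { await rmState[self] = "prepared" /\ tmState = "commit";
          RC:      rmState[self] := "committed"; }       \* commit, non-atomically
          or     { await rmState[self] = "prepared" /\ tmState = "abort";
          RA:      rmState[self] := "aborted"; }         \* obey an abort decision
        }
    }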

Two Phase Commit Model

We need to check the consistency of our two-phase commit: no two RMs may disagree, with one saying "committed" and another "aborted".
The Completed predicate checks that the protocol does not hang forever: eventually every RM reaches one of the final states, committed or aborted.
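In TLA+ the two properties could be stated roughly as follows; Consistency is a state invariant, while Completed is a temporal (liveness) property and only holds if the algorithm is declared fair (e.g. a --fair algorithm in PlusCal). The exact formulas are a sketch based on the description above.

    \* no two RMs ever disagree on the outcome of the transaction
    Consistency ==
      \A rm1, rm2 \in RM :
        ~(rmState[rm1] = "aborted" /\ rmState[rm2] = "committed")

    \* eventually every RM ends up in one of the two final states
    Completed ==
      <>(\A rm \in RM : rmState[rm] \in {"committed", "aborted"})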
Now we are ready to model check the protocol. First we set TMMAYFAIL=FALSE, RM=1..3 to run the protocol with three RMs and one TM, that is, in a failure-free configuration. Model checking takes 15 seconds and reports no errors. Both Consistency and Completed are satisfied by every possible execution of the protocol, under any interleaving of RM and TM actions.
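For reference, when TLC is run from the command line rather than from the Toolbox, the configuration might look roughly like this; the name Spec and the use of three model values in place of RM = 1..3 are assumptions about the setup, not part of the original article.

    SPECIFICATION Spec
    CONSTANTS
      RM = {r1, r2, r3}     \* three model values standing in for RM = 1..3
      TMMAYFAIL = FALSE     \* the reliable configuration: the TM never fails
    INVARIANT Consistency
    PROPERTY Completed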

Now we set TMMAYFAIL=TRUE and rerun the check. The model checker quickly produces a counterexample in which an RM is stuck waiting for a decision from the failed TM.

We see that RM2 aborts at State=4, RM3 aborts at State=7, the TM decides to abort at State=8 and then fails (tmState becomes "unavailable") at State=9. At State=10 the system is stuck: RM1 remains in the "prepared" state forever, waiting for a decision from the failed TM.
BTM Model
To avoid the transaction getting stuck, we add a backup TM (BTM) that promptly takes over if the primary TM becomes unavailable. The BTM uses the same decision logic as the TM. For simplicity, we assume that the BTM never fails.
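A minimal sketch of such a BTM process, reusing the canCommit/canAbort predicates and the "unavailable" value assumed above for a failed TM (the process id is arbitrary):

    process (BTManager = 10)
    {
    BTS: either { \* take over and commit once the primary TM has failed
           await tmState = "unavailable" /\ canCommit;
           tmState := "commit"; }
         or     { \* or take over and abort
           await tmState = "unavailable" /\ canAbort;
           tmState := "abort"; }
    }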

When we check the model with the BTM process added, we get a new error message.

The BTM cannot decide to commit, because our original canCommit condition requires all rmStates to be "prepared" and does not account for the situation where some RMs have already learned the commit decision from the original TM before the BTM took over. We need to rewrite the canCommit condition with this situation in mind.
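A sketch of the revised predicate; the second disjunct covers RMs that have already committed on the original TM's decision:

    canCommit ==  \A rm \in RM : rmState[rm] = "prepared"     \* everyone is ready
               \/ \E rm \in RM : rmState[rm] = "committed"    \* or the commit decision already reached some RM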

Success! When we check the model, both Consistency and Completed hold, since the BTM takes over and finishes the transaction if the TM fails.
Here is the 2PCwithBTM model in TLA+ (with the BTM and the second line of canCommit initially commented out) and the corresponding pdf.
What if an RM fails too?
We had assumed the RMs are reliable. Let's now drop that assumption and see how the protocol behaves when RMs can fail. We add a "failed" (unavailable) state to the model. To explore the behavior and model intermittent unavailability, we let a failed RM recover and continue working by reading its state back from its log. Here is the RM state transition diagram again, with the "failed" state added and the new transitions shown in red. Below is the revised RM model.
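Here is a sketch of how the revised RM process could look. The "failed" state follows the prose; the per-process variable savedState, standing in for the RM's log, is an assumption of this sketch rather than part of the original model.

    process (RManager \in RM)
    variables savedState = "working";   \* stands in for the RM's persistent log
    {
    RS: while (rmState[self] \in {"working", "prepared", "failed"}) {
          either { await rmState[self] = "working";      \* prepare
                   rmState[self] := "prepared"; }
          or     { await rmState[self] = "working";      \* abort unilaterally
                   rmState[self] := "aborted"; }
          or     { await rmState[self] = "prepared" /\ tmState = "commit";
          RC:      rmState[self] := "committed"; }       \* commit, non-atomically
          or     { await rmState[self] = "prepared" /\ tmState = "abort";
          RA:      rmState[self] := "aborted"; }
          or     { \* the RM crashes; its current state survives in the log
                   await rmState[self] \in {"working", "prepared"};
                   savedState := rmState[self];
                   rmState[self] := "failed"; }
          or     { \* the RM recovers, reading its state back from the log
                   await rmState[self] = "failed";
                   rmState[self] := savedState; }
        }
    }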


We also need to modify canAbort to account for the failed state. The TM may decide to abort if any RM is in the "aborted" or "failed" state. If we omitted this condition, an RM that fails and never recovers would block the progress of the transaction. And, of course, we again have to account for RMs that have already learned the commit decision from the original TM.
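A corresponding sketch of the revised canAbort: abort is allowed if some RM has aborted or failed, but never once some RM has already committed.

    canAbort ==  (\E rm \in RM : rmState[rm] \in {"aborted", "failed"})
              /\ ~(\E rm \in RM : rmState[rm] = "committed")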
Model checking

When we check the model, an inconsistency shows up! How could that happen? Let's follow the error trace.
At State=6 all RMs are in the "prepared" state, the TM has decided to commit, and RM1 has seen that decision and reached the RC label, meaning it is about to change its state to "committed". (Remember RM1: this gun will fire in the last act.) Unfortunately, the TM fails at State=7, and RM2 fails at State=8. At the ninth step the backup BTM takes over, reads the states of the three RMs as "prepared, failed, prepared", and at the tenth step decides to abort the transaction. Remember RM1? It decides to commit, because that is the decision it received from the original TM, and enters the "committed" state at step 11. At State=13 RM3 carries out the BTM's abort decision and enters the "aborted" state, and consistency with RM1 is now broken.
Here the BTM made a decision that broke consistency. On the other hand, if we forced the BTM to wait for the RM to come back from the failed state, it could wait forever when the node has crashed for good, and that would violate the Completed (progress) condition.
The updated TLA+ model file is available here, as well as the corresponding pdf.
The FLP impossibility result
So what happened? We have run into the Fischer, Lynch, Paterson (FLP) impossibility result: consensus cannot be solved in an asynchronous system in the presence of failures.
In our example, the BTM cannot reliably tell whether RM2 has failed or not, and it wrongly decides to abort the transaction. If only the original TM made decisions, such inaccurate failure detection would not be a problem: the RMs would follow whatever the TM decides, and both consistency and progress would be preserved.
The trouble is that we now have two entities making decisions, the TM and the BTM; they look at the RM states at different times and arrive at different decisions. Such asymmetry of information is the root of all evil in distributed systems.
The problem does not go away even if we extend the protocol to a three-phase commit.
Here is a three-phase commit modeled in TLA+ (pdf version), and the error trace below shows that this time progress is violated (the Wikipedia page on three-phase commit describes a scenario in which RM1 fails after receiving the pre-commit decision while RM2 and RM3 go on to commit, which violates consistency).

Paxos is trying to make the world better

But all is not lost; hope is not dead. We have Paxos. It works carefully within the bounds of the FLP result. The innovation of Paxos is that it is always safe (even with inaccurate failure detectors, asynchronous execution, and failures), and it eventually concludes the transaction once consensus becomes possible.
You can emulate the TM with a cluster of three nodes running Paxos, and this solves the TM/BTM inconsistency problem. Or, as Gray and Lamport showed in their paper on consensus on transaction commit, if the RMs use a Paxos box to store their decisions at the same time as they reply to the TM, this eliminates an extra step from the straightforward protocol.