
Lock-free algorithms: the try/commit/(hand off) model

(Translator's note: following the advice of Habr commenters, I am trying a new translation for the term "lock-free".)

Last time we looked at a lock-free algorithm in which locking was arranged so that a thread accessing the locked data does not wait for it to be released, but takes a detour (computes the required result without using the cache). In his next post, Raymond explains how this algorithm can be improved when no such detour exists. The algorithm still remains lock-free: every thread keeps making progress without waiting for the locked data to be released.

Two service bits are now needed in the shared variable: in addition to the lock flag from the previous example, a "new work assigned" flag; and if the assigned work is non-trivial, its parameters must also be stored somewhere. For example, the shared variable can hold a pointer to a (memory-aligned) object carrying the parameters, and the two flags can live in the free low-order bits of that pointer.
Before performing an action on the object, we first lock it by atomically setting the corresponding flag. If it turns out that the object is already locked, we delegate our action to the thread holding the lock by setting the second flag.

If we did manage to lock the object, then when we finish working with it we clear the lock flag and at the same time check whether new work has been assigned to us (i.e. whether anyone touched the object while we held it locked). If there is work, we do it, and so on, until one day, upon unlocking, no deferred work remains. We have no right to leave the object in the state "not locked, but work pending."

The resulting try/commit/(hand off) model is woven from a multitude of loops handling all possible failures. Every atomic operation is a loop; deferred work is executed in a loop; and every time InterlockedCompareExchange reveals that the data we used has been overwritten, all the work done must be rolled back and performed again from scratch. The model is very complicated. Raymond describes it like this: "In the whole world there are perhaps only five people who can implement it correctly, and I am not one of those five." Nevertheless, he gives an example of an object called GroupWait that supports two operations: AddWait, which adds an event handle to the object, and SignalAll, which sets every event added so far and empties the object.

The object is implemented simply as a list of handles. Clearly, it could be implemented without this intricate, touchy model: for example, using the interlocked singly-linked list functions available since Windows XP. The task was chosen only so that the verbal description could be accompanied by sample code. Do not take GroupWait as a model implementation of a thread-safe list.

So, we use the two low-order bits of the pointer to store two flags: a lock flag, indicating that some thread is performing a list operation, and a work flag, indicating that someone requested the events be set while the list was locked.

 // WARNING!  IF YOU USE THIS CODE YOU ARE AN IDIOT - READ THE TEXT ABOVE

 struct NODE;
 NODE *Node(LONG_PTR key) { return reinterpret_cast<NODE *>(key); }

 enum {
  Locked = 1,
  Signaled = 2,
 };

 struct NODE {
  NODE *pnNext;
  HANDLE hEvent;

  LONG_PTR Key() { return reinterpret_cast<LONG_PTR>(this); }
  NODE *Ptr() { return Node(Key() & ~(Locked | Signaled)); }
 };

 #define NODE_INVALID Node(-1)

 class GroupWait {
 public:
  GroupWait() : m_pnRoot(NULL) {}
  ~GroupWait();
  BOOL AddWait(HANDLE hEvent);
  void SignalAll();
 private:
  NODE *m_pnRoot;
 };

Since we combine a pointer to a list element and a pair of flags in one value, it is convenient to define methods that convert between the types involved: Node turns a numeric value into a pointer, Key turns a pointer into a numeric value, and Ptr yields a usable pointer (with the flags in the low bits stripped).

We will draw our values as bit fields of the form p|S|L: p is the pointer to the next list element, S is the work flag, and L is the lock flag. A set S flag means that all elements of the list starting from the next one must be processed; think of it as attached not to the element itself but to its outgoing arrow.

For example:
    m_pnRoot
   +--------+---+---+
   |   *    | 0 | 1 |
   +---|----+---+---+
       |
       v
   +--------+---+---+----------+
 A |   *    | 1 | ? | hEvent1  |
   +---|----+---+---+----------+
       |
       v
   +--------+---+---+----------+
 B |   *    | ? | ? | hEvent2  |
   +---|----+---+---+----------+
       |
       v
   +--------+---+---+----------+
 C |  NULL  | ? | ? | hEvent3  |
   +--------+---+---+----------+

Shown here is a GroupWait object containing three handles. The cleared S flag in the head of the list means that no one has asked to set hEvent1. In contrast, the set S flag in element A means that all elements after A must be traversed and their events set — namely hEvent2 and hEvent3. In particular, the value of the S flag in elements B and C does not matter: these elements will be processed anyway, because the S flag in element A requires it. Thus the value of the S flag in the last element of the list never matters at all.

The L flag is used only in the head of the list; in all other elements its value does not matter.

With the preparations complete, let's add a new handle to the list.

 BOOL GroupWait::AddWait(HANDLE hEvent)
 {
  NODE *pnInsert = new(nothrow) NODE;
  if (pnInsert == NULL) return FALSE;
  pnInsert->hEvent = hEvent;

  NODE *pn;
  NODE *pnNew;
  do {
   pn = InterlockedReadAcquire(&m_pnRoot, NODE_INVALID);
   pnInsert->pnNext = pn;
   pnNew = Node(pnInsert->Key() | (pn->Key() & Locked));
  } while (InterlockedCompareExchangeRelease(&m_pnRoot, pnNew, pn) != pn);
  return TRUE;
 }

We add the new element at the head of the list, making sure the L flag is carried over from the old head to the new one: otherwise it could happen that another thread had already locked the list and we inadvertently "released" it. The S flag in the added element is cleared: no one has yet asked to set the new event. To add the element at the head of the list we use the familiar try/commit/(try again) model, checking that nobody has changed the list behind our back. Note that the "ABA problem" does not arise here: even if an unchanged m_pnRoot value now points to a different object, we use only the pointer itself, not the contents of the object it points to.

Such simplicity in the AddWait method is unusual for the try/commit/(hand off) model: on failure we have nothing to delegate, since all the work consists of a single write to memory. The price of this simplicity is paid by the other methods, which must handle elements added to the list while it was locked.

The SignalAll method is so involved that it is best read in parts.
 void GroupWait::SignalAll()
 {
  NODE *pnCapture;
  NODE *pnNew;
  do {
   pnCapture = InterlockedReadAcquire(&m_pnRoot, NODE_INVALID);

   if (pnCapture->Key() & Locked) {
    pnNew = Node(pnCapture->Key() | Signaled);
   } else {
    pnNew = Node(Locked);
   }
  } while (InterlockedCompareExchangeAcquire(&m_pnRoot,
                               pnNew, pnCapture) != pnCapture);

  if (pnCapture->Key() & Locked) return;

  ...

If the list is locked, all that is required of us is to set the S flag in its head. If the list is free, we try to lock it and at the same time "steal" all of its elements, writing the stub NULL|0|1 in place of the head. In both cases the head of the list is rewritten using the try/commit/(try again) model: we retry until the write succeeds.

By setting the S flag we have handed our work off to the thread that holds the list locked. A set S flag in the head of the list means that every element after the head must be processed — that is, in general, every element of the list; exactly what we need. The thread that locked the list will check the flag when it releases the list, and will do the work for us.

If the list was not locked, our action just locked it. The "stolen" elements are not visible to other threads, so simultaneous calls to SignalAll from different threads will not cause the events to be set more than once.
  ...
  NODE *pnNext;
  NODE *pn;
  for (pn = pnCapture->Ptr(); pn; pn = pnNext) {
   SetEvent(pn->hEvent);
   pnNext = pn->pnNext->Ptr();
   delete pn;
  }
  ...

The traversal of the "stolen" list is implemented quite straightforwardly, with no concern for thread safety: we simply take the elements one by one, set the event, and delete the element. The only peculiarity is the call to Ptr to strip the flags from the pointer to the next element.

Now the list must be unlocked. To begin with:
  ...
  pnCapture = pnNew;
  ...

At the start of the method we wrote pnNew into m_pnRoot, and if that value is still in the head of the list, it means we got off easy: no one touched the list while we were working with it.
  ...
  for (;;) {
   pnNew = Node(pnCapture->Key() & ~Locked);
   if (InterlockedCompareExchangeRelease(&m_pnRoot,
                       pnNew, pnCapture) == pnCapture) {
    return;
   }
  ...

First we check whether the list has changed: if not, it is enough simply to unlock it, and we are done. Otherwise a new round begins: all the deferred work that accumulated while the list was locked must be performed.
  ...
   pnCapture = InterlockedReadAcquire(&m_pnRoot, NODE_INVALID);

   NODE *pnNew = Node(pnCapture->Key() & ~(Locked | Signaled));
   NODE **ppn = &pnNew;
   NODE *pn;
   NODE *pnNext;

   BOOL fSignalSeen = FALSE;
   for (pn = pnNew; pn->Ptr(); pn = pnNext) {
    pnNext = pn->Ptr()->pnNext;
    if (fSignalSeen) {
     SetEvent(pn->Ptr()->hEvent);
     delete pn->Ptr();
    } else if (pn->Key() & Signaled) {
     fSignalSeen = TRUE;
     (*ppn) = Node(Locked);  // steal the rest, leaving the list locked
     SetEvent(pn->Ptr()->hEvent);
     delete pn->Ptr();
    } else {
     ppn = &pn->Ptr()->pnNext;
    }
   }
  } // try to unlock again
 } // end of function

To perform the deferred work, we walk the list until we find a set S bit. In the first element where S is set, we null out the outgoing pointer, "stealing" the rest of the list; then, traversing that remainder, we set each event and delete each element. As before, we "steal" the list so that simultaneous calls from several threads do not cause an event to be set more than once. Finally, when the work is done, we again try to unlock the list, expecting that one day there will be no new work and we will be able to return from the function.

As you can see, the basic idea of the try/commit/(hand off) model is simple enough, but implementing it with all its subtleties can cause a headache. It is best to leave such things to system programmers who have the time, the patience, and the skill to cope with it all. For example, in the interview "Arun Kishan: Inside Windows 7 — Farewell to the Windows Kernel Dispatcher Lock," an architect who worked on the Windows kernel talks about using this very model.

Source: https://habr.com/ru/post/118850/
