
The advantages of a non-blocking algorithm are not only, and not even primarily, about performance.

I hope that the final post of the series, unlike the three previous ones, which turned out to be rather hardcore, will be of more than just philological interest to the Habr audience.

A commenter on Chen's series of posts about lock-free algorithms asked under what conditions these more complex algorithms significantly outperform simple blocking primitives such as critical sections.

He is absolutely right that the move from a simple algorithm to a complex one should be justified by performance measurements: if the simple algorithm copes with its task satisfactorily, there is no point in fixing what is not broken.
But the advantages of non-blocking synchronization are not limited to better performance than the usual blocking primitives. (Later in this post we will see how to get these less obvious advantages without switching to fully lock-free synchronization.)

Suppose that, to guard the initialization of a singleton, we decide to use an ordinary critical section instead of a tricky non-blocking algorithm:

CRITICAL_SECTION g_csSingletonX;
X * g_px = NULL;

X * GetSingletonX()
{
    EnterCriticalSection(&g_csSingletonX);
    if (g_px == NULL)
    {
        g_px = new(std::nothrow) X();
    }
    LeaveCriticalSection(&g_csSingletonX);
    return g_px;
}

Problems begin if the X() constructor itself takes locks: then, to prevent deadlocks, the program must define a lock hierarchy and acquire locks strictly in that order. (And if the constructor code is not yours and you cannot influence the order in which it takes locks, you may have to admit defeat at this point.)
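To make the hazard concrete, here is a hypothetical sketch (g_csGeometry and the constructor body are invented purely for illustration) of how two threads that take the same pair of locks in opposite order can deadlock with the GetSingletonX() from the listing above:

CRITICAL_SECTION g_csGeometry;   // hypothetical lock used inside X's constructor

X::X()
{
    EnterCriticalSection(&g_csGeometry);
    // ... initialize geometric data ...
    LeaveCriticalSection(&g_csGeometry);
}

// Thread 1: takes g_csSingletonX (inside GetSingletonX), then g_csGeometry.
void Thread1() { GetSingletonX(); }

// Thread 2: takes g_csGeometry first, then g_csSingletonX (inside GetSingletonX).
void Thread2()
{
    EnterCriticalSection(&g_csGeometry);
    GetSingletonX();
    LeaveCriticalSection(&g_csGeometry);
}

// If Thread1 already holds g_csSingletonX and Thread2 already holds g_csGeometry,
// each now waits forever for the lock the other one holds.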

When building a lock hierarchy, it may turn out that some locks cannot always be taken in the same order, and therefore have to be merged. The synchronization becomes coarser. For example, it may turn out that g_csSingletonX has to be combined with the locks of all the singleton constructors, as well as with the locks of all methods of class X:

CRITICAL_SECTION g_csCommon;
Y * g_py = NULL;

X * GetSingletonX()
{
    EnterCriticalSection(&g_csCommon);
    if (g_px == NULL)
    {
        g_px = new(std::nothrow) X();
    }
    LeaveCriticalSection(&g_csCommon);
    return g_px;
}

Y * GetSingletonY()
{
    EnterCriticalSection(&g_csCommon);
    if (g_py == NULL)
    {
        g_py = new(std::nothrow) Y();
    }
    LeaveCriticalSection(&g_csCommon);
    return g_py;
}

void X::DoSomething()
{
    EnterCriticalSection(&g_csCommon);
    // ... something ...
    LeaveCriticalSection(&g_csCommon);
}

Ouch! The small, unobtrusive critical section has turned into a global lock that a lot of threads constantly compete for.

The essential advantage of a non-blocking algorithm is that if there are no locks, there can be no deadlocks, and no lock hierarchy is needed. (Unless you build a non-blocking lock out of non-blocking operations, as in the unstable cache example.) Another nice property is that with no locks, none can be left "abandoned" when the thread that held it crashes; you will not have to puzzle over how to keep working correctly after WAIT_ABANDONED. The protected data stays correct at all times and moves atomically from one correct state to the next.
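For comparison, here is a minimal sketch of what lock-free initialization of the same singleton might look like, in the spirit of the interlocked pattern from the earlier posts; it reuses g_px and class X from the first listing and is an illustration rather than production code:

// Requires <windows.h> and <new>; g_px is declared as in the first listing.
X * GetSingletonX()
{
    if (g_px == NULL)
    {
        X * px = new(std::nothrow) X();
        if (px != NULL &&
            InterlockedCompareExchangePointerRelease(
                (PVOID *)&g_px, px, NULL) != NULL)
        {
            // Another thread published its instance first; discard ours.
            delete px;
        }
    }
    return g_px;
}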

Have you ever run into products where an error in a single component can only be cured by rebooting the computer? Going lock-free makes it easier to write programs that do not behave this way.



Even if you clearly prefer traditional blocking primitives (they are, after all, so convenient to use), non-blocking algorithms are still worth a closer look. Suppose we have the following code:

CRITICAL_SECTION g_cs;
GORILLADATA g_data;

void PokeGorilla(double intensity)
{
    EnterCriticalSection(&g_cs);
    DeformGorilla(intensity, &g_data);
    Reticulate(&g_data.spline);
    int stress = CalculateTension(&g_data.spline);
    if (stress < 25) g_data.mood = RELAXED;
    else if (stress < 50) g_data.mood = ANNOYED;
    else g_data.mood = ANGRY;
    DeleteObject(g_data.hbmGorilla);
    g_data.hbmGorilla = RenderGorilla(&g_data);
    LeaveCriticalSection(&g_cs);
}

There are a few questionable spots in it. First of all, is the lock hierarchy respected? For example, the Reticulate() function may need a lock that protects geometric operations, and it may turn out that, according to the hierarchy, that lock has to be acquired before g_cs.

Second, if many threads compete for g_cs, we would like to reduce the time PokeGorilla() holds the lock. RenderGorilla() is probably a heavy, slow operation. (You know how hard it is to render realistic fur.) That means g_cs is held for no good reason for the whole duration of the RenderGorilla() call.

A possible solution to both problems is to switch to lock-free synchronization; but that is so hard! (Almost as hard as realistic fur.) Maybe we can spend only 20% of the effort and still get 80% of the benefits of going lock-free?

void PokeGorilla(double intensity)
{
    // Capture: copy the shared state into a local variable under the lock.
    EnterCriticalSection(&g_cs);
    GORILLADATA data = g_data;
    LeaveCriticalSection(&g_cs);

    // Work on the local copy without holding the lock.
    DeformGorilla(intensity, &data);
    Reticulate(&data.spline);
    int stress = CalculateTension(&data.spline);
    if (stress < 25) data.mood = RELAXED;
    else if (stress < 50) data.mood = ANNOYED;
    else data.mood = ANGRY;
    data.hbmGorilla = RenderGorilla(&data);

    // Commit: write the result back under the lock.
    EnterCriticalSection(&g_cs);
    HBITMAP hbmToDelete = g_data.hbmGorilla;
    g_data = data;
    LeaveCriticalSection(&g_cs);

    DeleteObject(hbmToDelete);
}

In accordance with the "do the work, then commit" model, we copy the gorilla's state into a local variable and perform all operations on that local copy: during the "reticulation" the lock is released, so there are no problems with the lock hierarchy, and during the rendering of the gorilla the lock is also released, so the other threads spend less time waiting for nothing. Once the gorilla's state has been processed, we take the lock again and commit the changes.

With this approach, whoever poked the gorilla last wins: the state they write back overwrites any changes made while the lock was released. If the changes are independent of each other, this behavior is acceptable; but suppose the gorilla's emotional stress accumulates with every poke. Then we need to find out whether the gorilla was poked while we were processing the previous poke, and if it was, redo our work so that the new data is taken into account.

Then we arrive at the "do the work, commit, (try again)" model with a version counter:

LONG g_lCounter;

void PokeGorilla(double intensity)
{
    BOOL fSuccess;

    do {
        // Capture: copy the shared state and remember the version number.
        EnterCriticalSection(&g_cs);
        GORILLADATA data = g_data;
        LONG lCounter = g_lCounter;
        LeaveCriticalSection(&g_cs);

        // Work on the local copy without holding the lock.
        DeformGorilla(intensity, &data);
        Reticulate(&data.spline);
        int stress = CalculateTension(&data.spline);
        if (stress < 25) data.mood = RELAXED;
        else if (stress < 50) data.mood = ANNOYED;
        else data.mood = ANGRY;
        data.hbmGorilla = RenderGorilla(&data);

        // Commit: accept the result only if nobody poked the gorilla meanwhile.
        EnterCriticalSection(&g_cs);
        HBITMAP hbmToDelete;
        if (lCounter == g_lCounter)
        {
            // Nobody interfered: publish the new state, retire the old bitmap.
            hbmToDelete = g_data.hbmGorilla;
            g_data = data;
            g_lCounter++;
            fSuccess = TRUE;
        } else {
            // The gorilla was poked behind our back: discard our bitmap and retry.
            hbmToDelete = data.hbmGorilla;
            fSuccess = FALSE;
        }
        LeaveCriticalSection(&g_cs);

        DeleteObject(hbmToDelete);
    } while (!fSuccess);
}

In addition to the gorilla data itself, we now keep a version number and increment it with every poke. Strictly speaking, this number would be better kept inside the GORILLADATA structure. (No, strictly speaking, it would be better not to poke the gorilla at all!) In a fully non-blocking algorithm we would check the version number with InterlockedCompareExchangeRelease; but a whole GORILLADATA structure cannot be updated atomically, so we use the critical section to check the counter and update the data in one step. Still, the model is the same as before: if the gorilla was poked behind our back, we throw away everything we computed and redo the calculation so that both pokes are reflected in the final result.
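To make the contrast concrete, here is a hypothetical sketch of how the commit step could be made entirely lock-free if the shared state were a pointer to an immutable snapshot rather than the structure itself. The structure indeed cannot be swapped atomically, but a pointer to it can; safely reclaiming the old snapshot, which is the genuinely hard part of lock-free code, is deliberately left out:

GORILLADATA * g_pData;   // points to the current snapshot; never modified in place

void PokeGorilla(double intensity)
{
    GORILLADATA * pNew = new(std::nothrow) GORILLADATA;
    if (pNew == NULL) return;

    GORILLADATA * pOld;
    do {
        pOld = g_pData;          // capture the current snapshot
        *pNew = *pOld;           // work on a private copy
        DeformGorilla(intensity, pNew);
        Reticulate(&pNew->spline);
        // ... mood calculation and rendering as in the versions above ...
    } while (InterlockedCompareExchangePointerRelease(
                 (PVOID *)&g_pData, pNew, pOld) != pOld);

    // pOld and its bitmap still have to be reclaimed safely at some point;
    // that reclamation problem is what the critical-section version above avoids.
}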

Source: https://habr.com/ru/post/118917/

