Raymond Chen wrote an interesting series of blog posts about lockless synchronization, and I would like to share these notes with Habr readers. This post is an introduction to the series, compiled from three of Chen's older posts:
- Lazy initialization with built-in C++ tools
- Lockless synchronization
- Lock-free thread-safe lazy initialization
Initializing static local variables in C++ is not thread-safe, and deliberately so!
The specification states that static local variables (unlike global ones) are initialized the first time execution enters the block in which they are declared.
Try to spot the race condition yourself in this code fragment, which computes the result of some function “lazily”: on the first call the result is computed, and then it is saved for all subsequent calls.
int ComputeSomething()
{
static int cachedResult = ComputeSomethingSlowly();
return cachedResult;
}
(Approximately this code is recommended in the popular C++ FAQ, to avoid depending on the order in which the compiler initializes global static variables.)
The trouble is that the compiler turns the initialization of a static local variable into something like this:
int ComputeSomething()
{
static bool cachedResult_computed = false;
static int cachedResult;
if (!cachedResult_computed) {
cachedResult_computed = true;
cachedResult = ComputeSomethingSlowly();
}
return cachedResult;
}
Now the race condition is visible to the naked eye. Imagine two threads calling ComputeSomething() at the same time. The first thread gets only as far as cachedResult_computed = true before the system switches to the second thread. The second thread now sees the cachedResult_computed flag set, skips the initialization, and reads the uninitialized value.
And this is not a compiler bug: this behavior was prescribed by the C++ standard. (Later, in the TC1 revision, that requirement was removed, leaving the behavior undefined in the case of concurrent initialization.)
Multithreaded initialization of static local variables can lead to even more serious problems:
class Something { ... };
int ComputeSomething()
{
static Something s;
return s.ComputeIt();
}
This code unfolds into:
class Something { ... };
int ComputeSomething()
{
static bool s_constructed = false;
static uninitialized Something s; // pseudocode: raw, unconstructed storage
if (!s_constructed) {
s_constructed = true;
new(&s) Something; // construct s in place
atexit(DestructS);
}
return s.ComputeIt();
}
// destroys s at program exit
void DestructS()
{
ComputeSomething::s.~Something(); // pseudocode: refers to the function-local s
}
Now there are several possible race conditions. As before, one thread can get ahead of the other and grab the value of s before it is initialized. But besides that, if the first thread managed to check s_constructed but had not yet set it to true, the object s will be constructed twice and destroyed twice. (More precisely, one object will “leak”, and the other will be destroyed twice.) No joke!
But that's not all. Look what happens when there are two static local variables with initializers:
int ComputeSomething()
{
static Something s(0);
static Something t(1);
return s.ComputeIt() + t.ComputeIt();
}
The compiler, to save memory, packs both flags into a single variable:
class Something { ... };
int ComputeSomething()
{
static char constructed = 0;
static uninitialized Something s;
if (!(constructed & 1)) {
constructed |= 1;
new(&s) Something; // construct s in place
atexit(DestructS);
}
static uninitialized Something t;
if (!(constructed & 2)) {
constructed |= 2;
new(&t) Something; // construct t in place
atexit(DestructT);
}
return s.ComputeIt() + t.ComputeIt();
}
Great: now several threads perform logical operations on a shared variable without any synchronization. Let's see what happens when one thread attempts constructed |= 1 at the same time as another thread performs constructed |= 2.
x86/x64 processors have instructions that operate directly on an operand in memory. The compiler will generate machine code like
or constructed, 1
...
or constructed, 2
without lock prefixes, so on a multiprocessor machine both operations may manage to read the same old value, and one of the set bits will be “lost”.
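For illustration, here is a minimal sketch of how such a flag update could be made atomic with the Win32 InterlockedOr function (my illustration; the compiler does not, of course, generate this by itself):
#include <windows.h>

static volatile LONG constructed = 0;

void SetConstructedBit(LONG bit)
{
    // Atomic read-modify-write: InterlockedOr compiles to a lock-prefixed
    // OR on x86/x64, so two concurrent updates cannot lose a bit.
    InterlockedOr(&constructed, bit);
}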
On ia64/Alpha processors, logical instructions operate only on registers, so this can happen even on a single-processor machine. The compiler splits each |= operation into three machine instructions:
ldl t1,0(a0) ; load the old value
bis t1,1,t1 ; set the bit
stl t1,0(a0) ; store the new value
If the thread is interrupted between reading the old value and writing the new one, the new value can “wipe out” changes made by other threads in the meantime.
Perhaps the initialization of static variables should be protected with a critical section:
int ComputeSomething()
{
EnterCriticalSection(...);
static int cachedResult = ComputeSomethingSlowly();
LeaveCriticalSection(...);
return cachedResult;
}
Now initialization can be performed only by one thread at a time.
But what if ComputeSomething() is called again from the same thread? (“We've traced the call; it's coming from inside the thread!”) For example, what if ComputeSomethingSlowly() explicitly or implicitly calls ComputeSomething()? (Sounds far-fetched? Then imagine that ComputeSomethingSlowly() shows a dialog box, and the message loop, via DispatchMessage(), calls an event handler, and that handler uses ComputeSomething().)
In that case the calling thread already owns the critical section, enters it again without any difficulty, and reads the same uninitialized value.
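Roughly speaking, if the compiler sets the flag before running the initializer (as in the expansion shown earlier), the protected version behaves like this sketch (g_cs is a hypothetical critical section; this is not actual compiler output):
int ComputeSomething()
{
    EnterCriticalSection(&g_cs); // re-entry by the owning thread succeeds
    static bool cachedResult_computed = false;
    static int cachedResult;
    if (!cachedResult_computed) {
        cachedResult_computed = true;            // the flag is set first...
        cachedResult = ComputeSomethingSlowly(); // ...and this call may re-enter
                                                 // ComputeSomething(), see the flag
                                                 // already set, and return the
                                                 // still-uninitialized cachedResult
    }
    LeaveCriticalSection(&g_cs);
    return cachedResult;
}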
So, when you see a static local variable initialized at run time, be on your guard.
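One conventional way out of this whole class of problems, sketched here as an aside (Chen describes a similar pattern elsewhere in the series; memory-ordering subtleties, discussed at the end of this post, are glossed over), is to construct the object explicitly and publish the pointer with an interlocked operation:
#include <windows.h>

Something* g_pSomething = nullptr; // nullptr means "not yet constructed"

Something* GetSomething()
{
    if (g_pSomething == nullptr) {
        Something* candidate = new Something; // may run in several threads at once
        // Publish our candidate only if nobody beat us to it;
        // otherwise discard ours and use the winner's object.
        if (InterlockedCompareExchangePointer(
                reinterpret_cast<PVOID volatile*>(&g_pSomething),
                candidate, nullptr) != nullptr) {
            delete candidate;
        }
    }
    return g_pSomething;
}
The occasional extra Something is constructed and immediately thrown away, so this only works when the constructor has no side effects.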
InterlockedXxx operations are an efficient implementation of atomic operations on 32-bit values and pointers. But atomicity alone does not guarantee thread safety.
Suppose you have a shared variable protected by a critical section, and somewhere in another function you want to increment that variable. “Great, I can do without the critical section and just call InterlockedIncrement.”
But wasn't the whole point of the critical section to guarantee that nobody changes the variable while the protected code is executing? And you have just gone and “quietly” changed it.
Another misconception is implementing compound atomic operations with a critical section. For example, here is how people try to implement InterlockedMultiply:
// Wrong!
LONG InterlockedMultiply(volatile LONG *plMultiplicand, LONG lMultiplier)
{
EnterCriticalSection(&SomeCriticalSection);
LONG lResult = *plMultiplicand *= lMultiplier;
LeaveCriticalSection(&SomeCriticalSection);
return lResult;
}
Yes, this code prevents two InterlockedMultiply calls from conflicting with each other, but it does not guarantee atomicity with respect to other operations on the variable. Take this example:
int x = 2;
Thread1()
{
InterlockedIncrement(&x);
}
Thread2()
{
InterlockedMultiply(&x, 5);
}
If the multiplication were atomic, the two operations together could produce x = 15 or x = 11, depending on which operation manages to execute first. But our “atomic” multiplication allows other, rather unexpected, values:
| Thread 1 | Thread 2 |
|---|---|
| initial: x = 2 | |
| | InterlockedMultiply(&x, 5) |
| | EnterCriticalSection |
| | read x (reads 2) |
| InterlockedIncrement(&x); now x = 3 | |
| | multiply by 5 (result: 10) |
| | write x (writes 10) |
| | LeaveCriticalSection |
| result: x = 10 | |
Not so atomic after all! How do we fix it?
If the operation being performed is a “pure function” that does not depend on global data, you can implement it atomically with InterlockedCompareExchange:
LONG InterlockedMultiply(volatile LONG *plMultiplicand, LONG lMultiplier)
{
LONG lOriginal, lResult;
do {
lOriginal = *plMultiplicand;
lResult = lOriginal * lMultiplier;
} while (InterlockedCompareExchange(plMultiplicand,
lResult, lOriginal) != lOriginal);
return lResult;
}
The operation consists of three steps.
The first is the “capture” of parameter values:
lOriginal = *plMultiplicand;
The second is to perform an operation on the captured values:
lResult = lOriginal * lMultiplier;
The third is to store the result, but only if the parameter values have not changed:
InterlockedCompareExchange(plMultiplicand, lResult, lOriginal)
If they have changed, some other operation got ahead of ours, and our operation must be performed again.
Returning to the example above: after an intervening InterlockedIncrement call, the multiplication will notice that the value of x has changed and will retry with the new value. The atomicity of the compare in InterlockedCompareExchange guarantees that the result is stored only if the parameters have not changed.
Do not forget that the operation must not use any other data! InterlockedCompareExchange cannot check whether the global variables used by the operation have changed. Otherwise a situation known as the “ABA problem” becomes possible: the first time the operation ran, the parameter's value was A; then, unnoticed by the running thread, it changed to B, and then back to A. InterlockedCompareExchange will not notice that anything changed and will store the value computed the first time around. If other data affecting the result changed along with the parameter, the stored result will be wrong.
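A classic illustration of the ABA problem (my own example, not from Chen's posts; Win32 declarations from windows.h assumed, as before) is popping from a naive lock-free stack:
struct Node { Node* next; int value; };

Node* volatile g_top; // top of a naive lock-free stack

int UnsafePop()
{
    for (;;) {
        Node* top = g_top;      // capture: suppose we read node A
        Node* next = top->next; // at this moment A->next is node B
        // Suppose another thread now pops A, pops and frees B,
        // then pushes A back: g_top equals A again...
        if (InterlockedCompareExchangePointer(
                reinterpret_cast<PVOID volatile*>(&g_top),
                next, top) == top) {
            // ...so the compare sees "A == A" and succeeds, yet 'next'
            // is stale: the freed node B has become the top of the stack.
            int value = top->value;
            delete top;
            return value;
        }
    }
}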
If all operations on a shared variable can be implemented atomically, it does not need to be protected by a critical section at all; the gain is in performance, in scalability, and in the consumption of system resources. This approach to designing multithreaded programs is called “lockless synchronization.” Its main advantage is this: if one thread acquires a lock (critical section, mutex, semaphore, etc.) and then “gets stuck” for one reason or another, all the threads using that lock get stuck behind it. With lockless synchronization, no “stuck” thread can stop the entire system: no matter how many threads get stuck, at least one of the remaining ones will keep making progress. Lockless synchronization is required in real-time systems, where task completion time must not depend on the vagaries of how the operating system allots processor time to each thread.
Suppose we decide to lazily initialize a pair of variables. In a single-threaded program this poses no difficulty:
// invariants for a and b: a ≥ 0 and b ≥ a.
// the value -1 means
// "not yet initialized"
int a = -1, b = -1;
void LazyInitialize()
{
if (a != -1) return; // already initialized
a = calculate_nominal_a();
b = calculate_nominal_b();
// enforce the invariants
a = max(0, a);
b = max(a, b);
}
As in the examples above, in a multithreaded program such initialization can lead to a race condition:
| Thread 1 | Thread 2 |
|---|---|
| if (a != -1) [not taken] | |
| a = calculate_nominal_a(); // returns 2 | |
| | if (a != -1) return; // too early! |
The first thread was interrupted before it could compute b, so the second thread picks up its uninitialized value.
“Ah, that's easy to fix! To find out whether initialization is done, we'll check b instead of a.”
void LazyInitialize()
{
if (b != -1) return; // already initialized
a = calculate_nominal_a();
b = calculate_nominal_b();
// enforce the invariants
a = max(0, a);
b = max(a, b);
}
A race condition is still possible:
| Thread 1 | Thread 2 |
|---|---|
| if (b != -1) [not taken] | |
| a = calculate_nominal_a(); // returns 2 | |
| b = calculate_nominal_b(); // returns 1 | |
| | if (b != -1) return; // too early! |
The second thread picks up values of a and b that do not satisfy the stated invariants.
“You can guard against that, too! I'll compute the values of a and b in local variables and write them to the globals only once the computation is complete, so the second thread can't see “half-ready” values.”
void LazyInitialize()
{
if (b != -1) return; // already initialized
// compute in local variables
int temp_a = calculate_nominal_a();
int temp_b = calculate_nominal_b();
// enforce the invariants
temp_a = max(0, temp_a);
temp_b = max(temp_a, temp_b);
// publish the results
a = temp_a;
b = temp_b;
}
Almost perfect, but a race condition is still possible:
| Thread 1 | Thread 2 |
|---|---|
| if (b != -1) [not taken] | |
| temp_a = calculate_nominal_a(); // returns 2 | |
| temp_b = calculate_nominal_b(); // returns 1 | |
| temp_a = max(0, temp_a); // temp_a = 2 | |
| temp_b = max(temp_a, temp_b); // temp_b = 2 | |
| a = temp_a; // written to the processor cache | |
| b = temp_b; // written to the processor cache | |
| the value of b reaches memory | |
| | if (b != -1) return; // too early! |
| the value of a reaches memory | |
The compiler does not guarantee that the new value of b becomes visible to other processors before the new value of a. Even though the processor performs the assignments in that order, the memory subsystem of a multiprocessor machine may commit them in the reverse order, so another processor can see a = -1 and b = 2.
The write of b to memory must be issued as a release operation: then all preceding writes are guaranteed to be reflected in memory before other processors see the new value of b.
void LazyInitialize()
{
if (b != -1) return; // already initialized
// compute in local variables
int temp_a = calculate_nominal_a();
int temp_b = calculate_nominal_b();
// enforce the invariants
temp_a = max(0, temp_a);
temp_b = max(temp_a, temp_b);
// publish the results
a = temp_a;
// the write of b must happen only if b was still uninitialized,
// and must be performed with Release semantics
InterlockedCompareExchangeRelease(
 reinterpret_cast<LONG*>(&b), temp_b, -1);
}
The resulting code resembles a lock-free atomic operation, only without input parameters: first all computations are performed in local variables, and then the result is committed to memory in a single InterlockedCompareExchangeRelease operation. Assuming that calculate_nominal_a() and calculate_nominal_b() always return the same values, we do not even need to retry on failure: if the variables were not initialized by us, the other thread has already done our work for us. (Note: even if InterlockedCompareExchangeRelease fails, the variable a will still have been overwritten by our thread!)
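As a modern aside (my addition, not part of the original posts): C++11 atomics let you state the required ordering explicitly on both the writer and the reader side. A minimal sketch of the same idea, assuming calculate_nominal_a() and calculate_nominal_b() are defined elsewhere:
#include <algorithm>
#include <atomic>

int calculate_nominal_a(); // assumed defined elsewhere
int calculate_nominal_b(); // assumed defined elsewhere

int a = -1;
std::atomic<int> b{-1}; // -1 still means "not yet initialized"

void LazyInitialize()
{
    // acquire: if we see b already published, the write to 'a' is visible too
    if (b.load(std::memory_order_acquire) != -1) return;

    int temp_a = std::max(0, calculate_nominal_a());
    int temp_b = std::max(temp_a, calculate_nominal_b());

    a = temp_a; // as in Chen's version, several threads may write the same value
    // release: the write to 'a' becomes visible before any thread observes
    // the new value of b; publish only if b was still -1
    int expected = -1;
    b.compare_exchange_strong(expected, temp_b,
                              std::memory_order_release,
                              std::memory_order_relaxed);
}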
Finally, Raymond posed a riddle: can the temp_a variable be dispensed with? After all, the value of a is only used after a thread has made sure that initialization is complete.