
Memory barriers and non-blocking synchronization in .NET

Introduction


In this article I want to look at some of the constructs used to implement non-blocking synchronization: the volatile keyword and the VolatileRead, VolatileWrite, and MemoryBarrier methods. We will consider the problems that force us to reach for these language constructs and how they solve them. Along the way, when discussing memory barriers, we will briefly review the .NET memory model.

Compiler optimizations


The main problems a programmer encounters when using non-blocking synchronization are compiler optimizations and instruction reordering by the processor.
Let's start with an example where the compiler introduces a problem into a multi-threaded program:

    class ReorderTest
    {
        private int _a;

        public void Foo()
        {
            var task = new Task(Bar);
            task.Start();
            Thread.Sleep(1000);
            _a = 0;
            task.Wait();
        }

        public void Bar()
        {
            _a = 1;
            while (_a == 1) { }
        }
    }

By running this example, you can see that the program hangs. The reason is that the compiler caches the _a variable in a processor register.
To solve such problems, C# provides the volatile keyword. Applying this keyword to a field forbids the compiler from optimizing accesses to it in any way.

This is what the revised declaration for _a will look like.
 private volatile int _a; 

Disabling compiler optimizations is not the only effect of using this keyword. Other effects will be discussed later.

Rearrangement of instructions


Consider now the case where the source of the problem is instruction reordering by the processor.
Suppose we have the following code:

    class ReorderTest2
    {
        private int _a;
        private int _b;

        public void Foo()
        {
            _a = 1;
            _b = 1;
        }

        public void Bar()
        {
            if (_b == 1)
            {
                Console.WriteLine(_a);
            }
        }
    }

The Foo and Bar methods run concurrently on different threads.
Is this code correct; that is, can we say with confidence that the program will never output zero? If we were talking about a single-threaded program, running it once would be enough to check. But since we are dealing with multithreading, this is not enough. Instead, we need to understand whether we have guarantees that the program will always work correctly.

.NET memory model

As already mentioned, incorrect behavior of a multithreaded program can be caused by instruction reordering on the processor. Let's consider this problem in more detail.
Any modern processor may swap memory read and write instructions as an optimization. Let me explain this with an example:
    int a = _a;
    _b = 10;

In this code, the variable _a is first read, then _b is written. But when executing this program, the processor may reorder the read and the write: _b will be written first, and only then will _a be read. For a single-threaded program such a reordering is invisible, but for a multi-threaded program it can become a problem. What we have just looked at is a read-write reordering; similar reorderings are possible for the other combinations of instructions.

The set of rules describing which reorderings are allowed is called a memory model. The .NET platform has its own memory model, which abstracts us away from the memory model of any particular processor.
Here is the .NET memory model:

    Reordering type    Allowed
    Read-read          Yes
    Read-write         Yes
    Write-read         Yes
    Write-write        No

Now we can consider our example from the point of view of the .NET memory model. Since write-write reordering is prohibited, the write to _a always happens before the write to _b, and here the program works correctly. The problem is in the Bar method: since reordering of reads is not prohibited, _a may be read before _b.
After such a reordering, the code executes as if it were written like this:
    var tmp = _a;
    if (_b == 1)
    {
        Console.WriteLine(tmp);
    }

When we talk about instruction reordering, we mean reordering instructions of the same thread that read/write different variables. If the same variable is written from different threads, their relative order is unspecified in any case. And if we are talking about reading and writing the same variable within one thread, for example, like this:
    var a = GetA();
    UseA(a);

it is clear that no reordering is possible here: the second instruction depends on the result of the first.

Memory barriers

To solve this problem there is a universal tool: the memory barrier (also called a memory fence).
There are several kinds of memory barriers: the full barrier, the release fence, and the acquire fence.
A full barrier guarantees that all reads and writes located before/after the barrier are actually executed before/after it; that is, no memory access instruction can jump over the barrier.
Now let's deal with the other two kinds:
An acquire fence guarantees that instructions after the barrier will not be moved before it.
A release fence guarantees that instructions before the barrier will not be moved after it.
Just a couple of words about terminology. The term volatile write means a memory write combined with a release fence. The term volatile read means a memory read combined with an acquire fence.
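These semantics map directly onto API calls. In later versions of .NET (4.5+), the System.Threading.Volatile class exposes exactly these two operations. Below is a minimal, illustrative sketch (the Handoff class and its field names are made up for this example) of the hand-off pattern that a volatile write/volatile read pair enables:

```csharp
using System.Threading;

class Handoff
{
    private int _data;
    private bool _ready;

    public void Producer()
    {
        _data = 42;                       // ordinary write
        Volatile.Write(ref _ready, true); // volatile write: release fence, so the
                                          // write to _data cannot move below it
    }

    public void Consumer()
    {
        if (Volatile.Read(ref _ready))    // volatile read: acquire fence, so the
        {                                 // read of _data cannot move above it
            int d = _data;                // if _ready was true, d is guaranteed to be 42
        }
    }
}
```

The release fence on the write and the acquire fence on the read together guarantee that a consumer that observes _ready == true also observes the preceding write to _data.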

.NET provides the following ways of putting up memory barriers:

  - Thread.MemoryBarrier() — a full barrier;
  - Thread.VolatileWrite() — a volatile write (release fence);
  - Thread.VolatileRead() — a volatile read (acquire fence);
  - the volatile keyword — makes every write of the field a volatile write and every read a volatile read.

Let's return to our example. As we have seen, the problem can arise from reordering of the read instructions. To solve it, we add a memory barrier between the reads of _b and _a. After that, we have a guarantee that the thread executing the Bar method sees the writes in the correct order.

    class ReorderTest2
    {
        private int _a;
        private int _b;

        public void Foo()
        {
            _a = 1;
            _b = 1;
        }

        public void Bar()
        {
            if (_b == 1)
            {
                Thread.MemoryBarrier();
                Console.WriteLine(_a);
            }
        }
    }

Using a full memory barrier here is redundant. To rule out the reordering of the reads, it is enough to make the read of _b a volatile read. This can be achieved with the Thread.VolatileRead method or with the volatile keyword.
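For illustration, this is how the fix based on the volatile keyword might look (a sketch: marking _b volatile makes its write a volatile write and its read a volatile read, which is enough to keep the read of _a from moving above the read of _b):

```csharp
using System;

class ReorderTest2
{
    private int _a;
    private volatile int _b; // every read of _b is now a volatile read (acquire fence),
                             // every write a volatile write (release fence)

    public void Foo()
    {
        _a = 1;
        _b = 1; // release fence: the write to _a cannot move below this write
    }

    public void Bar()
    {
        if (_b == 1) // acquire fence: the read of _a below cannot move above this read
        {
            Console.WriteLine(_a); // if we got here, _a is guaranteed to be 1
        }
    }
}
```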

The Thread.VolatileWrite and Thread.VolatileRead methods


Let's explore the Thread.VolatileWrite and Thread.VolatileRead methods in more detail.
MSDN says about VolatileWrite: "Writes a value directly to a field, so that it becomes visible to all processors on the computer."
In fact, this description is not entirely accurate. These methods guarantee two things: the absence of compiler optimizations [1] and the absence of instruction reordering, in accordance with the volatile read/write semantics. Strictly speaking, VolatileWrite does not guarantee that the value immediately becomes visible to other processors, and VolatileRead does not guarantee that the value will not be read from a cache [2]. But thanks to the absence of compiler optimizations and to the coherence of processor caches, we can treat the MSDN description as correct.

Consider how these methods are implemented:

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int VolatileRead(ref int address)
    {
        int num = address;
        Thread.MemoryBarrier();
        return num;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void VolatileWrite(ref int address, int value)
    {
        Thread.MemoryBarrier();
        address = value;
    }

What else can we see here?
Both methods use a full memory barrier. As we said, a volatile write only needs to create a release fence. Since a full barrier subsumes a release fence, this implementation is correct but redundant: with just a release fence, the processor and compiler would have more room for optimization. Why the .NET team implemented these functions through a full barrier is hard to say. But it is important to remember that these are details of the current implementation, and no one guarantees that they will not change in the future.

Compiler and CPU Optimization

I want to stress once again: the volatile keyword and all three memory barrier functions we have considered suppress both processor and compiler optimizations.
So, for example, this code is a perfectly correct solution to the problem from the first example:

    public void Bar()
    {
        _a = 1;
        while (_a == 1)
        {
            Thread.MemoryBarrier();
        }
    }


The danger of volatile


Looking at the implementation of the VolatileWrite and VolatileRead methods, it becomes clear why such a pair of instructions can be reordered: the write happens after its barrier and the read happens before its barrier, so there is no barrier between the two memory operations themselves:
    Thread.VolatileWrite(b)
    Thread.VolatileRead(a)

Since this behavior follows from the very definitions of volatile read and volatile write, it is not a bug, and operations on variables marked with the volatile keyword behave the same way.
But in practice, this behavior may be unexpected.
Consider an example:

    class Program
    {
        volatile int _firstBool;
        volatile int _secondBool;
        volatile string _firstString;
        volatile string _secondString;
        int _okCount;
        int _failCount;

        static void Main(string[] args)
        {
            new Program().Go();
        }

        private void Go()
        {
            while (true)
            {
                Parallel.Invoke(DoThreadA, DoThreadB);
                if (_firstString == null && _secondString == null)
                {
                    _failCount++;
                }
                else
                {
                    _okCount++;
                }
                Console.WriteLine("ok - {0}, fail - {1}, fail percent - {2}",
                    _okCount, _failCount, GetFailPercent());
                Clear();
            }
        }

        private float GetFailPercent()
        {
            return (float)_failCount / (_okCount + _failCount) * 100;
        }

        private void Clear()
        {
            _firstBool = 0;
            _secondBool = 0;
            _firstString = null;
            _secondString = null;
        }

        private void DoThreadA()
        {
            _firstBool = 1;
            //Thread.MemoryBarrier();
            if (_secondBool == 1)
            {
                _firstString = "a";
            }
        }

        private void DoThreadB()
        {
            _secondBool = 1;
            //Thread.MemoryBarrier();
            if (_firstBool == 1)
            {
                _secondString = "a";
            }
        }
    }

If the program's instructions were executed exactly in the order in which they are written, at least one of the strings would always end up equal to "a". In reality, due to instruction reordering, this is not always the case. Replacing the volatile keyword with the corresponding methods, as expected, does not change the result.
To correct the program's behavior, it is enough to uncomment the lines with the full memory barriers.

Performance of Thread.Volatile* and the volatile keyword


On most platforms (more precisely, on all platforms Windows runs on, except the dying IA64), every write and read is already a volatile write and a volatile read, respectively. Thus, at run time the volatile keyword has no effect on performance. In contrast, the Thread.Volatile* methods, first, carry the overhead of the method call itself (they are marked MethodImplOptions.NoInlining) and, second, in the current implementation, create a full memory barrier. So in terms of performance, in most cases the keyword is preferable.
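As a rough illustration (this micro-benchmark is my own sketch, not part of the original text; the absolute numbers are machine- and JIT-dependent), the two approaches can be compared like this:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class VolatileBench
{
    private volatile int _v; // plain volatile field: on x86 a read costs about the same as a normal read
    private int _p;          // field read via Thread.VolatileRead: call overhead + full barrier

    public void Run()
    {
        const int N = 100000000;
        int sum = 0;

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < N; i++) sum += _v;
        Console.WriteLine("volatile field:      {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        for (int i = 0; i < N; i++) sum += Thread.VolatileRead(ref _p);
        Console.WriteLine("Thread.VolatileRead: {0} ms", sw.ElapsedMilliseconds);

        Console.WriteLine(sum); // keep sum alive so the loops are not optimized away
    }

    static void Main() { new VolatileBench().Run(); }
}
```

On a typical x86 machine the second loop is noticeably slower, for exactly the two reasons named above: the non-inlined call and the full barrier it contains.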


Links


[1] See Joe Duffy. Concurrent Programming on Windows, p. 514.
[2] See "VolatileWrite implemented incorrectly" (MS Connect).

Bibliography:


  1. Joseph Albahari. Threading in C#
  2. Vance Morrison. Understand the Impact of Low-Lock Techniques in Multithreaded Apps
  3. Pedram Rezaei. CLR 2.0 memory model
  4. MS Connect: VolatileWrite implemented incorrectly
  5. ECMA-335 Common Language Infrastructure (CLI)
  6. C# Language Specification
  7. Jeffrey Richter. CLR via C#, Third Edition
  8. Joe Duffy. Concurrent Programming on Windows
  9. Joseph Albahari. C# 4.0 in a Nutshell

Source: https://habr.com/ru/post/130318/

