
.NET: Tools for working with multithreading and asynchrony. Part 2

This is the original article, which I publish on Habr; its translation is posted on the Codingsight blog.

I continue to turn my talk on multithreading into text. The first part can be found here or here; it was more about the basic set of tools for starting a thread or a Task, about ways to inspect their state, and about some sweet little things like PLinq. In this article, I want to focus on the problems that can arise in a multithreaded environment and on some of the ways to solve them.

About shared resources


It is impossible to write a program that works in several threads and yet has not a single shared resource: even if you manage this at your level of abstraction, going one or more levels down reveals that there still is a shared resource. A few examples:
Example #1:

Fearing possible problems, you make the threads work with different files — one file per thread. It seems to you that the program has not a single shared resource.

Going several levels down, we realize that there is only one hard disk, and the problem of providing access to it has to be solved by its driver or by the operating system.

Example #2:

After reading example #1, you decide to place the files on two different remote machines with physically different hardware and operating systems, keeping two separate connections over FTP or NFS.

Going several levels down, we realize that nothing has changed: the problem of concurrent access now has to be solved by the network card driver or by the operating system of the machine the program runs on.

Example #3:

Having lost a good deal of hair trying to prove that a multithreaded program without shared resources can be written, you drop files altogether and split the computation between two different objects, references to each of which are available to only one thread.

Let me hammer the final dozen nails into the coffin of this idea: one runtime and garbage collector, one thread scheduler, physically one RAM, and one processor are still shared resources.

So, we have found out that it is impossible to write a multithreaded program without a single shared resource at all levels of abstraction across the entire technology stack. Fortunately, each level of abstraction, as a rule, partially or completely solves the problems of concurrent access or simply prohibits it (for example, any UI framework prohibits working with UI elements from different threads), so problems with shared resources usually arise at your own level of abstraction. To solve them, the concept of synchronization is introduced.

Possible problems when working in a multithreaded environment


Errors in software can be divided into several groups:

  1. The program produces no result: it crashes or freezes.
  2. The program produces an incorrect result.
  3. The program produces the correct result but fails to satisfy some non-functional requirement: it works too long or consumes too many resources.

In a multithreaded environment, the two main problems that cause errors 1 and 2 are deadlock and race condition.

Deadlock


A deadlock is a mutual lock. It has many variations; the most frequent one is the following:



While Thread #1 was doing something, Thread #2 locked resource B. A little later, Thread #1 locked resource A and is now trying to lock resource B. Unfortunately, this will never happen, because Thread #2 will release resource B only after it has locked resource A.
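Roughly, this scenario can be reproduced with the following sketch (the thread and resource names are illustrative):

    using System.Threading;

    var resourceA = new object();
    var resourceB = new object();

    var thread1 = new Thread(() =>
    {
        lock (resourceA)
        {
            Thread.Sleep(100);       // give Thread #2 time to take resource B
            lock (resourceB) { }     // never acquired: B is held by Thread #2
        }
    });

    var thread2 = new Thread(() =>
    {
        lock (resourceB)
        {
            Thread.Sleep(100);       // give Thread #1 time to take resource A
            lock (resourceA) { }     // never acquired: A is held by Thread #1
        }
    });

    thread1.Start();
    thread2.Start();
    thread1.Join();                  // hangs forever: a classic deadlock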

Race-Condition


A race condition is a situation in which the behavior and the result of the program's computation depend on the work of the runtime's thread scheduler.
The trouble with this situation is that your program may fail only once in a hundred or even once in a million runs.

The situation is aggravated by the fact that these problems can come together: for example, a deadlock may occur only under a particular behavior of the thread scheduler.

Besides these two problems, which lead to obvious errors in the program, there are also problems that may not lead to an incorrect computation result but make it cost more time or computing power to obtain. Two such problems are busy wait and thread starvation.

Busy wait


Busy wait is a problem in which the program consumes processor resources for waiting rather than for computation.

In code, the problem often looks like this:

while(!hasSomethingHappened) ; 

This is an example of extremely bad code: it fully occupies one core of your processor while doing nothing useful at all. It can be justified if and only if it is critically important to handle a change of a value in another thread fast — and by fast I mean a case where you cannot afford to wait even a few nanoseconds. In all other cases, that is, in everything a healthy mind can come up with, it is wiser to use the ResetEvent varieties and their Slim versions. More on them below.

Perhaps some readers will suggest solving the problem of one core being fully occupied by useless waiting by adding a construct like Thread.Sleep(1) to the loop. This does solve that problem, but it creates another: the reaction time to the change becomes half a millisecond on average, which may not sound like much but is catastrophically more than what the synchronization primitives of the ResetEvent family can achieve.
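For comparison, here is a sketch of the same waiting done with ManualResetEventSlim instead of the busy loop (somethingHappened is an illustrative name):

    using System;
    using System.Threading;

    var somethingHappened = new ManualResetEventSlim(false);

    // The waiting thread blocks without burning a CPU core
    // and wakes up almost immediately after Set is called.
    new Thread(() =>
    {
        somethingHappened.Wait();
        Console.WriteLine("handled");
    }).Start();

    // The signaling thread:
    somethingHappened.Set();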

Thread starvation


Thread starvation is a problem in which the program has too many simultaneously running threads — specifically, threads busy with computation rather than just waiting for some IO to respond. With this problem, all the performance gains from using threads are lost, because the processor spends a lot of time switching contexts.
It is convenient to look for such problems with various profilers; below is a screenshot from the dotTrace profiler running in Timeline mode.


(dotTrace Timeline screenshot)

In a program that does not suffer from thread starvation, there are no pink-colored (waiting) threads on the graphs. Besides, the Subsystems category shows that this program spent 30.6% of its time waiting for the CPU.

Once this problem is diagnosed, it is solved quite simply: you are starting too many threads at once — start fewer, or not all at the same time.

Synchronization tools



Interlocked


This is probably the most lightweight way to synchronize. Interlocked is a set of simple atomic operations. An operation is called atomic if nothing else can happen while it executes. In .NET, Interlocked is represented by a static class of the same name with a number of methods, each implementing one atomic operation.

To realize the horror of non-atomic operations, try writing a program that starts 10 threads, each of which increments the same variable a million times, and prints the value of this variable when they finish. Unfortunately, it will differ greatly from 10 million; moreover, it will be different every time you run the program. This happens because even an operation as simple as an increment is not atomic: it involves reading the value from memory, computing a new one, and writing it back. So two threads can perform each of these steps simultaneously, in which case one increment is lost.
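A sketch of this experiment (assuming a modern C# console application with top-level statements):

    using System;
    using System.Linq;
    using System.Threading;

    int counter = 0;

    var threads = Enumerable.Range(0, 10)
        .Select(_ => new Thread(() =>
        {
            for (int i = 0; i < 1_000_000; i++)
                counter++;           // not atomic: read, add, write back
        }))
        .ToList();

    threads.ForEach(t => t.Start());
    threads.ForEach(t => t.Join());

    Console.WriteLine(counter);      // e.g. 3672581 — and different on every run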

The Interlocked class provides the Increment/Decrement methods — it is not hard to guess what they do. They are convenient if you process data in several threads and count something. Such code works much faster than the classic lock. If Interlocked is used in the situation described in the previous paragraph, the program will reliably produce the value of 10 million in every situation.
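In the sketch above, it is enough to replace the loop body with an atomic increment:

    for (int i = 0; i < 1_000_000; i++)
        Interlocked.Increment(ref counter);   // now the result is always 10,000,000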

The CompareExchange method performs what at first glance looks like a rather unobvious function, but its whole existence is what enables many interesting algorithms, first of all the lock-free family.

    public static int CompareExchange(ref int location1, int value, int comparand);

The method takes three values. The first is passed by reference and will be replaced by the second if, at the moment of comparison, location1 equals comparand; the original value of location1 is always returned. This sounds rather confusing, so it is easier to write code that performs the same operations as CompareExchange does:

    var original = location1;
    if (location1 == comparand)
        location1 = value;
    return original;

Except that the implementation in the Interlocked class is atomic. That is, if we wrote this code ourselves, a situation could occur where the condition location1 == comparand is already satisfied, but by the time the statement location1 = value executes, another thread has changed the value of location1, and that change would be lost.

A good example of the use of this method can be found in the code that the compiler generates for any C# event.

Let's write a simple class with one MyEvent event:

    class MyClass
    {
        public event EventHandler MyEvent;
    }

Let's build the project in the Release configuration and open the assembly in dotPeek with the Show Compiler Generated Code option enabled:

    [CompilerGenerated]
    private EventHandler MyEvent;

    public event EventHandler MyEvent
    {
        [CompilerGenerated]
        add
        {
            EventHandler eventHandler = this.MyEvent;
            EventHandler comparand;
            do
            {
                comparand = eventHandler;
                eventHandler = Interlocked.CompareExchange<EventHandler>(
                    ref this.MyEvent,
                    (EventHandler) Delegate.Combine((Delegate) comparand, (Delegate) value),
                    comparand);
            }
            while (eventHandler != comparand);
        }
        [CompilerGenerated]
        remove
        {
            // The same algorithm, but with Delegate.Remove
        }
    }

Here you can see that behind the scenes the compiler generated a rather sophisticated algorithm. This algorithm protects against losing an event subscription when several threads subscribe to the event simultaneously. Let's write the add method out in more detail, remembering what the CompareExchange method does behind the scenes:

    EventHandler eventHandler = this.MyEvent;
    EventHandler comparand;
    do
    {
        comparand = eventHandler;
        // Begin atomic operation (this is what CompareExchange does)
        eventHandler = MyEvent;       // the original value is always returned
        if (MyEvent == comparand)
            MyEvent = Delegate.Combine(MyEvent, value);
        // End atomic operation
    }
    while (eventHandler != comparand);

This makes it a little clearer, though it probably still needs an explanation. In words, I would describe this algorithm as follows:

If MyEvent is still the same as it was at the moment we started executing Delegate.Combine, then write into it what Delegate.Combine returned; if not, never mind, try again and repeat until it works.


This way, no event subscription is ever lost. You will have to solve a similar problem if you ever want to implement a dynamic thread-safe lock-free array: if several threads rush to add elements to it, it is important that all of them end up added.

Monitor.Enter, Monitor.Exit, lock


These are the most commonly used constructs for thread synchronization. They implement the idea of a critical section: code written between calls to Monitor.Enter and Monitor.Exit on the same resource can be executed by only one thread at a time. The lock statement is syntactic sugar around Enter/Exit calls wrapped in try-finally. A pleasant feature of the critical section implementation in .NET is that it is reentrant for the same thread. This means that the following code runs without problems:

    lock (a)
    {
        lock (a)
        {
            // ...
        }
    }

It is unlikely, of course, that anyone would write it exactly this way, but if you spread this code across several methods deep down the call stack, this feature can save you a few ifs. To make this trick possible, the .NET developers had to add a restriction — only an instance of a reference type can be used as a synchronization object — and to implicitly add a few bytes to every object, where the thread identifier is written.
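For reference, a sketch of roughly what the compiler expands a lock statement into (the exact pattern has varied between compiler versions; since C# 4 it uses the Monitor.Enter overload with a ref bool flag):

    bool lockTaken = false;
    try
    {
        Monitor.Enter(syncObject, ref lockTaken);
        // ... body of the lock statement ...
    }
    finally
    {
        if (lockTaken)
            Monitor.Exit(syncObject);
    }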

This behavior of the critical section in C# imposes one interesting limitation on the lock statement: you cannot use the await operator inside a lock statement. At first this surprised me, because a similar try-finally construct with Monitor.Enter/Exit does compile. What is the matter? Here you need to carefully reread the previous paragraph and add to it some knowledge of how async/await works: the code after an await is not necessarily executed on the same thread as the code before it — that depends on the synchronization context and on whether ConfigureAwait is called. It follows that Monitor.Exit may run on a different thread than Monitor.Enter, which causes a SynchronizationLockException to be thrown. If you do not believe me, run the following code in a console application: it throws a SynchronizationLockException.

    var syncObject = new Object();
    Monitor.Enter(syncObject);
    Console.WriteLine(Thread.CurrentThread.ManagedThreadId);

    await Task.Delay(1000);

    Monitor.Exit(syncObject);
    Console.WriteLine(Thread.CurrentThread.ManagedThreadId);

Remarkably, in a WinForms or WPF application this code works correctly if called from the main thread, because there the synchronization context returns execution to the UI thread after the await completes. In any case, you should not play with the critical section in code that contains the await operator. In such cases, it is better to use synchronization primitives, which will be discussed later.
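One such primitive is SemaphoreSlim, which also appears later in this article: initialized with (1, 1), it behaves like an asynchronous mutex, and its WaitAsync method is await-friendly. A sketch:

    using System.Threading;
    using System.Threading.Tasks;

    var mutex = new SemaphoreSlim(1, 1);

    await mutex.WaitAsync();
    try
    {
        // Awaiting while holding the semaphore is safe here:
        // Release, unlike Monitor.Exit, is not tied to a particular thread.
        await Task.Delay(1000);
    }
    finally
    {
        mutex.Release();
    }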

Speaking of how the critical section works in .NET, it is worth mentioning one more detail of its implementation. A critical section in .NET works in two modes: spin-wait mode and kernel mode. The spin-wait algorithm is conveniently represented by the following pseudocode:

 while(!TryEnter(syncObject)) ; 

This optimization is aimed at capturing the critical section as quickly as possible within a short time, on the assumption that if the resource is occupied right now, it is about to be released. If that does not happen within a short period, the thread goes to wait in kernel mode, which takes time, as does returning from it. The .NET developers optimized the short-lock scenario as much as possible; unfortunately, if many threads start tugging the critical section between themselves, this can lead to a sudden high CPU load.

SpinLock, SpinWait


Since I have already mentioned the spin-wait algorithm, the SpinLock and SpinWait structures from the BCL are worth mentioning too. They should be used if there is reason to believe that it will almost always be possible to take the lock very quickly. On the other hand, you should hardly think about them before profiling results show that it is the use of other synchronization primitives that is your program's bottleneck.
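A minimal SpinLock usage sketch (note that SpinLock is a mutable struct: storing it in a readonly field would silently copy it on every access and break the locking):

    var spinLock = new SpinLock();

    bool lockTaken = false;
    try
    {
        spinLock.Enter(ref lockTaken);
        // ... a very short critical section ...
    }
    finally
    {
        if (lockTaken)
            spinLock.Exit();
    }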

Monitor.Wait, Monitor.Pulse[All]


This pair of methods should be considered together. They can be used to implement various Producer-Consumer scenarios.

Producer-Consumer is a multiprocess/multithreaded design pattern with one or several threads/processes producing data and one or several processes/threads consuming it. Usually a shared collection is used.

Both of these methods may be called only by a thread that currently holds the lock. The Wait method releases the lock and hangs until another thread calls Pulse.

To demonstrate the work, I wrote a small example:

    object syncObject = new object();

    Thread t1 = new Thread(T1);
    t1.Start();

    Thread.Sleep(100);

    Thread t2 = new Thread(T2);
    t2.Start();

(In the original, an image rather than text was used here, to visually show the order in which the instructions execute.)
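Since that image is not reproduced here, below is a rough sketch of what the bodies of the two threads looked like (the line numbers in the walkthrough below refer to the original image, not to this sketch):

    static readonly object syncObject = new object();

    static void T1()
    {
        lock (syncObject)                // enter the critical section
        {
            Thread.Sleep(500);           // "do something"
            Monitor.Wait(syncObject);    // release the lock and wait for a Pulse
        }                                // the lock is reacquired before Wait returns
    }

    static void T2()
    {
        lock (syncObject)                // enters only once T1 has called Wait
        {
            Monitor.Pulse(syncObject);   // notify T1
        }                                // only after this exit can T1 continue
    }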

Walkthrough: the 100 ms delay before starting the second thread is there specifically to guarantee that it starts later.
- T1: Line #2 the thread starts
- T1: Line #3 the thread enters the critical section
- T1: Line #6 the thread falls asleep
- T2: Line #3 the thread starts
- T2: Line #4 it hangs, waiting for the critical section
- T1: Line #7 it releases the critical section and hangs, waiting for a Pulse
- T2: Line #8 it enters the critical section
- T2: Line #11 it notifies T1 using the Pulse method
- T2: Line #14 it exits the critical section; until then, T1 cannot continue execution
- T1: Line #15 it wakes from the wait
- T1: Line #16 it exits the critical section

There is an important remark on MSDN regarding the use of the Pulse/Wait methods: the Monitor does not store state information, which means that calling Pulse before Wait can lead to a deadlock. If such a situation is possible, it is better to use one of the classes of the ResetEvent family.

The previous example clearly demonstrates how the Wait/Pulse methods of the Monitor class work, but it still leaves questions about when they should be used. A good example is this implementation of BlockingQueue<T>; on the other hand, the implementation of BlockingCollection<T> from System.Collections.Concurrent uses SemaphoreSlim for synchronization.
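For illustration, here is a sketch of what such a BlockingQueue<T> could look like on top of Monitor.Wait/Pulse — the classic pattern, not necessarily the implementation linked above:

    using System.Collections.Generic;
    using System.Threading;

    class BlockingQueue<T>
    {
        private readonly Queue<T> queue = new Queue<T>();

        public void Enqueue(T item)
        {
            lock (queue)
            {
                queue.Enqueue(item);
                Monitor.Pulse(queue);    // wake one waiting consumer
            }
        }

        public T Dequeue()
        {
            lock (queue)
            {
                while (queue.Count == 0) // re-check: the queue may still be empty on wake-up
                    Monitor.Wait(queue); // releases the lock while waiting
                return queue.Dequeue();
            }
        }
    }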

ReaderWriterLockSlim


This is my favorite synchronization primitive, represented by the class of the same name in the System.Threading namespace. It seems to me that many programs would work better if their developers used this class instead of the usual lock.

The idea: many threads can read, only one can write. As soon as a thread declares its desire to write, no new reads can start; they wait for the write to complete. There is also the concept of an upgradeable read lock, which can be used if, in the course of reading, you realize you need to write something; such a lock is converted into a write lock in one atomic operation.
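A sketch of the upgradeable read lock in action (the cache and the ComputeValue helper are illustrative assumptions):

    using System.Collections.Generic;
    using System.Threading;

    var rwLock = new ReaderWriterLockSlim();
    var cache = new Dictionary<string, string>();

    string ComputeValue(string key) => key.ToUpperInvariant(); // illustrative stub

    string GetOrAdd(string key)
    {
        rwLock.EnterUpgradeableReadLock();
        try
        {
            if (cache.TryGetValue(key, out var value))
                return value;            // the common case: read only

            rwLock.EnterWriteLock();     // the upgrade waits for current readers to leave
            try
            {
                return cache[key] = ComputeValue(key);
            }
            finally
            {
                rwLock.ExitWriteLock();
            }
        }
        finally
        {
            rwLock.ExitUpgradeableReadLock();
        }
    }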

The System.Threading namespace also contains the ReaderWriterLock class, but it is strongly discouraged for new development. The Slim version avoids a number of cases that lead to deadlocks; besides, it allows the lock to be taken quickly, since it supports spin-wait synchronization before falling back to kernel mode.

If you did not know about this class before reading this article, I think you have by now recalled quite a few examples from recently written code where this approach to locking would have let the program work more efficiently.

The interface of the ReaderWriterLockSlim class is simple and straightforward, but its use can hardly be called convenient:

    var @lock = new ReaderWriterLockSlim();

    @lock.EnterReadLock();
    try
    {
        // ...
    }
    finally
    {
        @lock.ExitReadLock();
    }

I like to wrap it in a class, which makes it much more convenient to use.
The idea: make ReadLock/WriteLock methods that return an object with a Dispose method; they can then be used in a using statement, and the result hardly differs from the usual lock in the number of lines.

    class RWLock : IDisposable
    {
        public struct WriteLockToken : IDisposable
        {
            private readonly ReaderWriterLockSlim @lock;
            public WriteLockToken(ReaderWriterLockSlim @lock)
            {
                this.@lock = @lock;
                @lock.EnterWriteLock();
            }
            public void Dispose() => @lock.ExitWriteLock();
        }

        public struct ReadLockToken : IDisposable
        {
            private readonly ReaderWriterLockSlim @lock;
            public ReadLockToken(ReaderWriterLockSlim @lock)
            {
                this.@lock = @lock;
                @lock.EnterReadLock();
            }
            public void Dispose() => @lock.ExitReadLock();
        }

        private readonly ReaderWriterLockSlim @lock = new ReaderWriterLockSlim();

        public ReadLockToken ReadLock() => new ReadLockToken(@lock);
        public WriteLockToken WriteLock() => new WriteLockToken(@lock);

        public void Dispose() => @lock.Dispose();
    }

This trick then lets you simply write:

    var rwLock = new RWLock();
    // ...
    using (rwLock.ReadLock())
    {
        // ...
    }


ResetEvent Family


To this family I assign the classes ManualResetEvent, ManualResetEventSlim, and AutoResetEvent.
ManualResetEvent, its Slim version, and AutoResetEvent can each be in one of two states:
- Non-signaled: all threads that call WaitOne hang until the event transitions to the signaled state.
- Signaled: all threads hanging on a WaitOne call are released, and all new WaitOne calls on a signaled event return essentially instantly.

AutoResetEvent differs from ManualResetEvent in that it automatically returns to the non-signaled state after releasing exactly one thread. If several threads hang waiting on an AutoResetEvent, calling Set releases only one arbitrary thread, unlike ManualResetEvent, which releases them all.

Consider an example of how AutoResetEvent works:

    AutoResetEvent evt = new AutoResetEvent(false);

    Thread t1 = new Thread(T1);
    t1.Start();

    Thread.Sleep(100);

    Thread t2 = new Thread(T2);
    t2.Start();
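As in the previous example, the bodies of T1 and T2 were shown as an image in the original; below is a rough reconstruction:

    static readonly AutoResetEvent evt = new AutoResetEvent(false);

    static void T1()
    {
        evt.WaitOne();                   // hangs until T2 calls Set
        Console.WriteLine("T1 released; the event is non-signaled again");
    }

    static void T2()
    {
        evt.Set();                       // releases exactly one waiting thread,
                                         // then the event auto-resets to non-signaled
    }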


In the example you can see that the event returns to the non-signaled state automatically, right after releasing a thread that was hanging on a WaitOne call.

The ManualResetEvent class, unlike ReaderWriterLock, was not marked obsolete or discouraged after its Slim version appeared. The Slim version of this class is effective for short waits, since it happens in spin-wait mode; the regular version is suitable for long waits.

Besides the ManualResetEvent and AutoResetEvent classes, there is also the CountdownEvent class. It is convenient for implementing algorithms in which the part that could be parallelized is followed by a part that merges the results. This approach is known as fork-join. A great article is devoted to this class, so I will not analyze it in detail here.
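Still, a minimal fork-join sketch with CountdownEvent may be useful:

    using System.Threading;

    const int workerCount = 4;
    var done = new CountdownEvent(workerCount);

    for (int i = 0; i < workerCount; i++)
    {
        new Thread(() =>
        {
            // ... the parallelized part of the computation ...
            done.Signal();               // decrement the counter
        }).Start();
    }

    done.Wait();                         // unblocks once the counter reaches zero
    // ... merge the partial results here ...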

Conclusion


Source: https://habr.com/ru/post/459514/

