
How the GIL Works in Ruby. Part 3: Does the GIL Make Your Code Thread-Safe?



Translations of the previous two parts:
Part one
Part two

This is an article by Jesse Storimer. He teaches the Unix fu workshop, an online class for Ruby developers who want to learn impressive Ruby hacks and improve their server-stack development skills. The number of participants is limited, so hurry while there are still seats. He is also the author of the books "Working with Unix Processes", "Working with TCP Sockets", and "Working with Ruby Threads".
There are some misconceptions in the Ruby community about the GIL in the MRI implementation of the interpreter. If you want the answer to this article's main question without reading the rest, here it is: the GIL does not make your Ruby code thread-safe.

But you shouldn't just take my word for it.

This series of articles began as an attempt to understand the GIL at a technical level. The first part explained where the conditions for race conditions come from in the C code used in the MRI implementation. However, the GIL seemed to avoid them, at least for the Array#<< method we looked at.

The second part confirmed that the GIL does, in fact, make the implementations of MRI's built-in methods atomic. In other words, it prevents race conditions there. However, this applies only to MRI's own built-ins, not to your Ruby code. So we were still left with the question: "Does the GIL provide any guarantee that your Ruby code will be thread-safe?"

I have already answered this question above. Now I want the misconceptions around it to stop.

Race conditions, once again


A race condition can occur when some data is shared between multiple threads and those threads try to operate on it at the same time. When this happens without any synchronization, such as locking, your program may start doing unexpected things, and data may be lost.

Let's take a step back and recall how a race condition can arise. We will use the following Ruby code for this part of the article:

    class Sheep
      def initialize
        @shorn = false
      end

      def shorn?
        @shorn
      end

      def shear!
        puts "shearing..."
        @shorn = true
      end
    end


There is nothing new in this class. A sheep is not shorn at birth. The shear! method performs the shearing and marks the sheep as shorn.



    sheep = Sheep.new

    5.times.map do
      Thread.new do
        unless sheep.shorn?
          sheep.shear!
        end
      end
    end.each(&:join)


This code creates a new sheep and spawns 5 threads. Each thread checks whether the sheep has been shorn and, if not, calls the shear! method.

Here is the result I get from running this code several times on MRI 2.0:

    $ ruby check_then_set.rb
    shearing...

    $ ruby check_then_set.rb
    shearing...
    shearing...

    $ ruby check_then_set.rb
    shearing...
    shearing...


Sometimes the same sheep is shorn twice!

If you were counting on the GIL to make your code "just work" in multiple threads, that expectation should be gone now. The GIL makes no such guarantee. Notice that the first run of the script produced the expected result, while the subsequent runs did not. If you keep running this example, you will see still other variations.

These unexpected results come from a race condition in your Ruby code. In fact, this is such a common pattern of mistake that it has its own name: a "check-then-set race condition". Two or more threads check some value, then set another value based on it. With nothing to guarantee atomicity, it is entirely possible for two threads to pass the "check the value" phase and then both perform the "set the new value" phase.

Recognizing race conditions


Before we look at how to fix this, I want you to understand how to recognize it. I'm indebted to @brixen for explaining the terminology of interleavings in the context of concurrency. It is genuinely useful.

Remember that a context switch can occur on any line of your code. When switching from one thread to another, imagine your program being divided into a set of discrete blocks. This sequential set of blocks is a set of interleavings.

At one extreme, a context switch could happen after every single line of code! That set of interleavings would contain one line of code in each block. At the other extreme, it is quite possible that no context switch occurs in the thread body at all; then each block would contain a thread's entire code. Between these extremes there are many ways your program could be sliced into interleaved blocks.

Some of these interleavings are fine. Not every line of code leads to a race condition. But picturing your programs as sets of possible interleavings can help you understand when race conditions occur. I will use a series of diagrams to show how this code might be executed by two threads.


Just to keep the diagrams simpler, I replaced the call to the shear! method with its body.

Consider this diagram. Thread A's interleaved blocks are highlighted in red, thread B's in blue.

Now let's see how this code can be interleaved by simulating context switches. In the simplest case, neither thread is interrupted during execution, no race condition occurs, and we get the expected result. It might look like this:



I have laid out the diagram so you can see the sequence of events. Remember that the GIL locks around the executing code, so two threads cannot truly run in parallel. The events here are sequential, top to bottom.

In this interleaving, thread A did all of its work, and then the scheduler switched to thread B. Since thread A had already shorn the sheep and updated the state variable, thread B had nothing to do.

But it won't always be this simple. Remember that the scheduler can switch context at any time. This time we just got lucky.

Let's look at a nastier example that produces an unexpected result.



In this case, the context switch happens exactly where it causes a problem. Thread A checks the state and starts shearing. Then the scheduler switches context and thread B begins. Although thread A has already shorn the sheep, it hasn't yet had time to update the state flag, so thread B knows nothing about it.

Thread B checks the state itself, decides the sheep is unshorn, and shears it again. Then the context switches back to thread A, which finishes its execution. Even though thread B has already set the state flag, thread A sets it again, because it only remembers the state as it was at the moment it was interrupted.
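Since the diagrams are hard to reproduce in text, here is a sketch that forces this exact bad interleaving deterministically. The queues used as "pause points" are my own illustrative device, not part of the original example; they stand in for the scheduler's context switches:

```ruby
require 'thread'

shorn = false
shear_count = 0

a_checked  = Queue.new  # signals that A has checked the flag and "sheared"
b_finished = Queue.new  # signals that B has run to completion

thread_a = Thread.new do
  unless shorn            # A checks the flag: false
    shear_count += 1      # A shears the sheep
    a_checked << true
    b_finished.pop        # simulated context switch: B runs to completion here
    shorn = true          # only now does A set the flag
  end
end

thread_b = Thread.new do
  a_checked.pop           # B starts after A sheared but before A set the flag
  unless shorn            # B checks the flag: still false!
    shear_count += 1      # B shears the same sheep again
    shorn = true
  end
  b_finished << true
end

[thread_a, thread_b].each(&:join)
puts shear_count   # => 2: the sheep was sheared twice
```

The queues pin down one specific interleaving; in the real program the scheduler picks one for you, which is why the output varies between runs.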

Shearing a sheep twice may not seem like a big problem to care about, but replace the sheep with an invoice, and charge a fee for each shearing, and you will soon have some unhappy customers!

I'll share one more example to show the non-deterministic nature of these things.



We've just added more context switches, so each thread runs a little bit at a time. Remember that a context switch is possible on any line of the program. The switches can occur at different points each time the code runs, so you may get the expected result on one run and an unexpected one on the next.

This is a genuinely useful way to think about race conditions. When you write multithreaded code, you should think about how the program could be chopped into blocks, and consider the effects of the various interleavings. If it looks like some interleaving could lead to an incorrect result, you should rethink your approach or introduce synchronization with a mutex.

This is terrible!


At this point it seems fitting to tell you that you could make this code thread-safe simply by adding a mutex. That's true, but I prepared this example specifically to support my point: this is a terrible approach. You should not write code like this for multithreaded execution.

Any time two or more threads hold a reference to the same object and make changes to it, and you don't have a lock in the right place to prevent a context switch in the middle of those changes, you have a problem.
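For completeness, here is roughly what that mutex version might look like. This is my own minimal sketch, not code from the original article; the point is only that Mutex#synchronize makes the check-then-set sequence atomic:

```ruby
require 'thread'

# The Sheep class from earlier in the article
class Sheep
  def initialize
    @shorn = false
  end

  def shorn?
    @shorn
  end

  def shear!
    puts "shearing..."
    @shorn = true
  end
end

sheep = Sheep.new
mutex = Mutex.new

5.times.map do
  Thread.new do
    # The lock makes check-then-set atomic: no thread can enter
    # this block while another thread is inside it.
    mutex.synchronize do
      sheep.shear! unless sheep.shorn?
    end
  end
end.each(&:join)
```

This version prints "shearing..." exactly once on every run, but it only works because the lock is in exactly the right place, which is precisely what's easy to get wrong.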

However, the race condition can be avoided here without a lock in your code. Here is one solution, using a queue:

    require 'thread'

    class Sheep
      # ...
    end

    sheep = Sheep.new
    sheep_queue = Queue.new
    sheep_queue << sheep

    5.times.map do
      Thread.new do
        begin
          sheep = sheep_queue.pop(true)
          sheep.shear!
        rescue ThreadError
          # raised by Queue#pop in the threads
          # that don't pop the sheep
        end
      end
    end.each(&:join)


I removed the implementation of the Sheep class, since it is exactly the same as before. Now, instead of multiple threads operating on one shared sheep and racing to shear it, a queue provides the synchronization.

If you run this code on MRI, or on any of the truly parallel Ruby implementations, it will produce the expected result every time. We have eliminated the race condition in this code. Even though all the threads may call Queue#pop at more or less the same time, the queue uses an internal mutex so that only one thread at a time can receive the sheep.

Once that single thread has the sheep, the race condition disappears. With just one thread, there's no one left to race with!

The reason I suggest using a queue instead of a lock is that a queue is harder to use incorrectly. Locks are notoriously easy to get wrong. Used improperly, they introduce new problems, such as deadlock and performance degradation. Using a data structure is like using an abstraction: it wraps the tricky parts behind a more restricted but simpler API.

Lazy initialization


A quick note: lazy initialization is another form of the check-then-set race condition. The ||= operator expands to:

    @logger ||= Logger.new
    # expands to:
    if @logger == nil
      @logger = Logger.new
    end
    @logger


Look at the expanded version and think about where a problem could arise. With multiple threads and no synchronization, it is quite possible for @logger to be initialized twice. Initializing @logger twice may not be a problem in this case, but I have seen bugs like this in real code that did cause problems.
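As a sketch of one common fix (my own illustration; the Service class name is hypothetical), you can guard the check-then-set with a mutex so that only one thread ever performs the initialization:

```ruby
require 'logger'
require 'thread'

class Service
  LOGGER_MUTEX = Mutex.new

  # Thread-safe lazy initialization: the mutex ensures only one
  # thread at a time runs the check-then-set sequence, so @logger
  # is created exactly once no matter how many threads call this.
  def logger
    LOGGER_MUTEX.synchronize do
      @logger ||= Logger.new($stdout)
    end
  end
end
```

Every call pays the cost of taking the lock, even after initialization; there are cleverer schemes, but they are subtle, which is rather the point of this article.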

Reflections


In the end, I want you to take away a lesson from all this.

4 out of 5 programmers agree that multithreaded programming is hard to get right.

In the end, all the GIL guarantees you is that the implementations of MRI's built-in methods will execute atomically (and even that has caveats). This behavior can sometimes help us, but the GIL is really there to protect MRI's own internals, not to serve as a dependable API for Ruby developers.

So the GIL does not solve thread-safety problems. As I've said, it is hard to write multithreaded programs correctly, but we solve hard problems every day. One of the ways we deal with a hard problem is abstraction.

For example, when I need to make an HTTP request in my code, I have to use a socket. But I usually don't use the socket directly, since that is cumbersome and error-prone. Instead, I use an abstraction. An HTTP client provides a more restricted, simpler API that hides the socket handling and spares me the edge cases.

If getting multithreading right is hard, then maybe you shouldn't be using it directly.

"If you add a thread to your program, you've probably added five new bugs." Mike Perham


We are seeing more and more abstractions built on top of threads. One approach gaining traction in the Ruby community is the actor model, whose most popular implementation is Celluloid. It gives us a wonderful abstraction that ties concurrency primitives into Ruby's object model. Celluloid cannot guarantee that your code will be thread-safe or free of race conditions, but it wraps up best practices. I encourage you to give it a try.

The problems we are talking about are not specific to Ruby or MRI. They are the reality of programming in a multi-core world. The number of cores in our devices keeps growing, and MRI is still working out its answer to that. Despite its guarantees, relying on the GIL for multithreaded programming seems wrong. This is part of MRI's growing pains. Other implementations, such as JRuby and Rubinius, are truly parallel and have no GIL.

Many new languages have concurrency abstractions built in. Ruby doesn't, at least not yet. Another benefit of abstraction is that its implementation can improve while your code stays unchanged. For example, if the Queue implementation stopped using locks, your code would reap the benefit without any changes.

For now, Ruby programmers should educate themselves about these problems! Learn about concurrency. Understand what causes race conditions. Picturing your code as interleaved blocks will help you reason about them.

I'll close with a quote that captures most of today's thinking on concurrency:

"Don't communicate by sharing state; share state by communicating."


Using a data structure for synchronization embodies this idea. The actor model embodies it. It underlies concurrency in languages such as Go, Erlang, and others.

Ruby should look at what works in other languages, and how, and adopt it. As a Ruby developer, you can start doing something today: just try these approaches and support one of them. With more people on board, they could become the new standard for Ruby.

Thanks to Brian Shirai for reviewing a draft of this article.

Source: https://habr.com/ru/post/230809/

