
Multithreading in Rust

Rust began as a project aimed at solving two difficult problems:

  1. How do you do safe systems programming?
  2. How do you make multithreading painless?

At first these problems seemed unrelated to each other, but to our surprise the solution turned out to be one and the same: the very tools that make Rust safe also solve the problems of multithreading.

Memory errors and multithreading errors often come down to the same thing: code accessing data when it should not. Rust's secret weapon against both is the concept of data ownership, a discipline for controlling access to data that systems programmers try to follow on their own, but which Rust checks statically.
From the point of view of memory safety, this means you can do without a garbage collector and still not fear segfaults, because Rust will not let you make such a mistake.

In terms of multithreading, this means you can use different paradigms (message passing, shared state, lock-free data structures, pure functional programming), and Rust will help you avoid the most common pitfalls.

Here are some features of multithreaded programming in Rust:

  1. A channel transfers ownership of the messages sent along it, so you can send a pointer from one thread to another without fear of the threads later racing for access through that pointer.
  2. A lock knows what data it protects, and Rust guarantees that the data can be accessed only while the lock is held.
  3. Every data type knows whether it can safely be sent between threads, and Rust enforces this; there are no data races, even for lock-free data structures.
All of these advantages derive from the data ownership model, and all of the locks, channels, lock-free data structures, and so on described above are defined in libraries, not in the language itself. This means that Rust's approach to multithreading is very extensible: new libraries can implement other paradigms and help prevent new classes of errors simply by providing a new API built on Rust's ownership features.

The purpose of this post is to show how this is done.

Basics: Data Ownership


We'll start by reviewing the ownership and borrowing systems in Rust. If you are already familiar with them, you can skip both "basics" sections and go directly to multithreading. If you want to understand these concepts more deeply, I highly recommend this article by Yehuda Katz. The official Rust book has even more detailed explanations.

In Rust, every value has an "owning scope", and passing or returning a value means transferring ownership ("moving" it) to a new scope. When a scope ends, all values it still owns at that moment are destroyed.

Consider a few simple examples. Suppose we create a vector and put several elements into it:

fn make_vec() {
    let mut vec = Vec::new(); // owned by make_vec
    vec.push(0);
    vec.push(1);
    // the scope ends, `vec` is destroyed
}

The scope in which a value is created becomes its owner. In this case, the scope that owns vec is the body of make_vec. The owner can do anything with vec, in particular mutate it by pushing elements. At the end of the scope it still owns vec, so vec is automatically destroyed.

It becomes more interesting if the vector is transferred to another function or returned from a function:

fn make_vec() -> Vec<i32> {
    let mut vec = Vec::new();
    vec.push(0);
    vec.push(1);
    vec // ownership is transferred on return
}

fn print_vec(vec: Vec<i32>) {
    // the `vec` parameter is part of this scope,
    // so it is owned by `print_vec`
    for i in vec.iter() {
        println!("{}", i)
    }
    // now `vec` is destroyed
}

fn use_vec() {
    let vec = make_vec(); // take ownership of the vector
    print_vec(vec);       // pass ownership to `print_vec`
}

Now, just before make_vec's scope ends, vec is moved out as the return value; it is not destroyed. The caller, such as use_vec, acquires ownership of the vector.

On the other hand, the print_vec function takes a vec parameter, and ownership is passed to it by its caller. Since print_vec does not transfer ownership of vec any further, the vector is destroyed when that scope ends.

Once ownership of the value has been transferred somewhere else, it can no longer be used. For example, consider the use_vec function:

fn use_vec() {
    let vec = make_vec();  // take ownership of the vector
    print_vec(vec);        // pass ownership to `print_vec`
    for i in vec.iter() {  // continue using `vec`
        println!("{}", i * 2)
    }
}

If you try to compile this option, the compiler will give an error:

error: use of moved value: `vec`
for i in vec.iter() {
         ^~~

The compiler reports that vec is no longer available: ownership has been transferred elsewhere. And that is a very good thing, because by that point the vector has already been destroyed!

The catastrophe is prevented.
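As an aside, when you genuinely need the data in both places, the standard escape hatch is to clone it, moving a fresh copy while keeping the original. A minimal sketch (the clone call and the doubling sum are my illustrative additions, not part of the original example):

```rust
fn print_vec(vec: Vec<i32>) {
    // takes ownership of `vec`; it is destroyed at the end of this scope
    for i in vec.iter() {
        println!("{}", i)
    }
}

fn use_vec() -> i32 {
    let vec = vec![0, 1];
    print_vec(vec.clone()); // a fresh copy is moved; `vec` itself stays here
    vec.iter().map(|i| i * 2).sum() // still usable: 0*2 + 1*2 = 2
}

fn main() {
    assert_eq!(use_vec(), 2);
}
```

Cloning has a runtime cost, of course, which is exactly why the next section introduces borrowing as the cheaper alternative.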

Basics: Borrowing


So far, the code is not very convenient, because we don't want print_vec to destroy the vector passed to it. What we would really like is to give print_vec temporary access to the vector and be able to keep using it afterwards.

This is where borrowing comes in. In Rust, if you have a value, you can give the functions you call temporary access to it. Rust automatically verifies that these "loans" do not outlive the object being borrowed.

To borrow a value, you create a reference to it (a reference is one kind of pointer) using the & operator:

fn print_vec(vec: &Vec<i32>) {
    // the `vec` parameter is borrowed for the duration of this scope
    for i in vec.iter() {
        println!("{}", i)
    }
    // the borrow ends here
}

fn use_vec() {
    let vec = make_vec();  // take ownership of the vector
    print_vec(&vec);       // lend the vector to `print_vec`
    for i in vec.iter() {  // continue using `vec`
        println!("{}", i * 2)
    }
    // `vec` is destroyed here
}

Now print_vec takes a reference to the vector, and use_vec lends it the vector with &vec. Since the borrow is temporary, use_vec retains ownership of the vector and can continue using it after print_vec returns (and its loan of vec has expired).

Each reference is valid only within a certain scope, which the compiler automatically determines. References come in two forms:

  1. Immutable references &T, which allow reading the data they refer to but not modifying it; several such references to the same value may exist at once.
  2. Mutable references &mut T, which allow both reading and modifying the data; while one exists, no other reference to the same value may exist.

Rust checks these rules at compile time; borrowing has no overhead at run time.
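Both rules can be seen in a tiny hypothetical example (mine, not from the original; note that it relies on the modern, non-lexical borrow checker, which ends a borrow at its last use):

```rust
fn demo() -> usize {
    let mut vec = vec![1, 2, 3];

    let a = &vec; // first shared borrow
    let b = &vec; // second shared borrow of the same value: allowed
    let len_twice = a.len() + b.len();

    // the shared borrows are no longer used, so a mutable borrow is now allowed
    let m = &mut vec;
    m.push(4);

    len_twice + vec.len() // 3 + 3 + 4 = 10
}

fn main() {
    assert_eq!(demo(), 10);
}
```

Moving `m.push(4)` above the last use of `a` or `b` makes the program fail to compile, which is precisely the exclusivity guarantee described above.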

Why do we need two types of links? Consider a function of the following form:

fn push_all(from: &Vec<i32>, to: &mut Vec<i32>) {
    for i in from.iter() {
        to.push(*i);
    }
}

This function iterates over each element of one vector, pushing them all onto another. The iterator (created by the iter() method) holds pointers into the vector at the current and final positions, and the current position "moves" toward the end.

What happens if we call this function with the same vector in both arguments?

 push_all(&vec, &mut vec) 

That would spell disaster! As we push new elements onto the vector, it will sometimes need to resize, allocating new memory and copying all its elements into it. The iterator would then hold a dangling pointer into the old memory, leading to memory unsafety, i.e. a segfault or something even worse.

Fortunately, Rust guarantees that while a mutable borrow is active, no other reference to the object can exist, so the code above produces a compilation error:

error: cannot borrow `vec` as mutable because it is also borrowed as immutable
push_all(&vec, &mut vec);
                    ^~~

The catastrophe is prevented.

Message Passing


Now that we have briefly reviewed ownership and borrowing, let's see how these concepts come in handy in multithreaded programming.

There are many approaches to writing multithreaded programs, but one of the simplest is message passing, where threads or actors communicate by sending each other messages. Proponents of this style emphasize that it ties data sharing to communication:

Do not communicate by sharing memory; instead, share memory by communicating.
- Effective Go

Data ownership in Rust makes it very easy to turn this advice into a rule checked by the compiler. Consider the following channel API (the channels in Rust's standard library are slightly different):

fn send<T: Send>(chan: &Channel<T>, t: T);
fn recv<T: Send>(chan: &Channel<T>) -> T;

Channels are generic over the type of data they transport (that is what <T: Send> says). The Send bound on T means that T can be safely transferred between threads. We will come back to this later; for now it is enough to know that Vec<i32> is Send.

As always, passing a T to the send function means transferring ownership of it. It follows that code like this will not compile:

// suppose chan: Channel<Vec<i32>>
let mut vec = Vec::new();
// fill the vector
send(&chan, vec);
print_vec(&vec);

Here, the thread creates a vector, sends it to another thread, and then keeps using it. The thread that received the vector could mutate it while the first thread is still running, so the call to print_vec could lead to a race or, say, a use-after-free error.

Instead, the Rust compiler produces an error at the call to print_vec:

 Error: use of moved value `vec` 

The catastrophe is prevented.
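For comparison, here is roughly how the safe version looks with the standard library's actual channels. Note that std::sync::mpsc splits the hypothetical Channel<T> above into a Sender half and a Receiver half; the demo function is my own sketch:

```rust
use std::sync::mpsc;
use std::thread;

fn send_across_threads() -> Vec<i32> {
    let (tx, rx) = mpsc::channel();

    let child = thread::spawn(move || {
        let vec = vec![0, 1, 2];
        tx.send(vec).unwrap(); // ownership of `vec` moves into the channel
        // any use of `vec` here would be a "use of moved value" error
    });

    let received = rx.recv().unwrap(); // ownership arrives in this thread
    child.join().unwrap();
    received
}

fn main() {
    assert_eq!(send_across_threads(), vec![0, 1, 2]);
}
```

Ownership travels with the message: after send, the child thread can no longer touch the vector, so there is nothing left to race on.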

Locks


Another way to work with many threads is to have them communicate through passive shared state.

Shared-state multithreading is notorious. It is terribly easy to forget to take a lock, or otherwise mutate the wrong data at the wrong time, with catastrophic results; so easy, in fact, that many programmers swear off shared-state multithreading entirely.

Rust's approach is twofold:

  1. Like it or not, shared-state multithreading is a fundamental programming style, needed for systems code, for maximal performance, and for implementing other styles of multithreaded programming.
  2. The real problem is accidentally shared state.

Rust's goal is to give you the tools to conquer shared state directly, whether you are using locks or lock-free data structures.

Threads in Rust are automatically "isolated" from each other by data ownership. A write can happen only when the thread has mutable access to the data: either because it owns the data, or because it holds a mutable reference to it. Either way, the thread is guaranteed to be the only one that can access the data at that moment. Let's look at how locks are implemented in Rust to see how this works.

Here is a simplified version of their API (the version in the standard library is more ergonomic):

// create a new mutex
fn mutex<T: Send>(t: T) -> Mutex<T>;

// acquire the lock
fn lock<T: Send>(mutex: &Mutex<T>) -> MutexGuard<T>;

// access the data protected by the lock
fn access<T: Send>(guard: &mut MutexGuard<T>) -> &mut T;

This interface is rather unusual in several aspects.

First, the Mutex type is generic over the type T of the data it protects. When you create a mutex, you transfer ownership of that data into it, immediately losing access to it. (The newly created lock starts out unlocked.)

Next, the lock function blocks the thread until the lock is acquired. A notable feature of this function is that it returns a guard value, MutexGuard<T>. The guard automatically releases the lock when it is destroyed; there is no separate unlock function.

The only way to access the data is through the access function, which turns a mutable reference to the guard into a mutable reference to the data (with a shorter lifetime):

fn use_lock(mutex: &Mutex<Vec<i32>>) {
    // acquire the lock, blocking until it is available;
    // the lock is held until the end of this scope
    let mut guard = lock(mutex);

    // access the data via the guard
    let vec = access(&mut guard);

    // `vec` has type `&mut Vec<i32>`
    vec.push(3);

    // the lock is released here (`guard` is destroyed)
}

Here we can note two key points:

  1. The mutable reference returned by access cannot outlive the guard it was created from.
  2. The lock is released only when the guard is destroyed.
So Rust enforces the discipline of working with locks: it will not let you access lock-protected data unless you have acquired the lock first. Any attempt to do otherwise results in a compilation error. For example, consider the following buggy "refactoring":

fn use_lock(mutex: &Mutex<Vec<i32>>) {
    let vec = {
        // acquire the lock
        let mut guard = lock(mutex);

        // attempt to return a borrow of the data
        access(&mut guard)

        // `guard` is destroyed here, releasing the lock
    };

    // attempt to access the data with the lock released
    vec.push(3);
}

The Rust compiler generates an error that indicates exactly the problem:

error: `guard` does not live long enough
access(&mut guard)
             ^~~~~

The catastrophe is prevented.
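The real std::sync::Mutex follows the same design, except that the guard returned by lock dereferences directly to the data, so no separate access function is needed. A sketch under that API (the demo function is my own):

```rust
use std::sync::Mutex;

fn use_lock(mutex: &Mutex<Vec<i32>>) {
    // blocks until the lock is acquired; `lock` returns a guard
    let mut guard = mutex.lock().unwrap();
    guard.push(3); // the guard derefs to `&mut Vec<i32>`
    // the lock is released here, when `guard` is destroyed
}

fn demo() -> Vec<i32> {
    let mutex = Mutex::new(vec![1, 2]); // ownership of the data moves into the mutex
    use_lock(&mutex);
    mutex.into_inner().unwrap() // take the data back out of the mutex
}

fn main() {
    assert_eq!(demo(), vec![1, 2, 3]);
}
```

The guard-based design means the "forgot to unlock" bug is impossible by construction, and the "forgot to lock" bug is a type error.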

Thread Safety and the Send Trait


It makes sense to divide data types into those that are "thread-safe" and those that are not. Thread-safe data structures use enough internal synchronization to be safely usable from multiple threads at once.

For example, Rust ships with two kinds of reference-counted smart pointers:

  1. Rc<T> uses ordinary (non-atomic) operations to maintain its reference count and is not thread-safe.
  2. Arc<T> uses atomic operations to maintain its reference count and is thread-safe.
The atomic hardware operations used by Arc are more expensive than the plain operations used by Rc, so Rc is preferable in single-threaded situations. On the other hand, it is critical that an Rc<T> never be transferred between threads, because that could lead to races corrupting the reference count.

The usual approach is careful documentation; most languages make no semantic distinction between thread-safe and thread-unsafe types.

In Rust, the world of types is split in two: those that implement the Send trait, meaning they can be safely moved between threads, and those that do not (!Send), meaning the opposite. If all of a type's components are Send, then the type itself is Send, which covers most types. Some base types are not inherently thread-safe, however, so a type like Arc can be explicitly marked Send, which tells the compiler: "Trust me, I have provided all the necessary synchronization here."

Naturally, Arc is Send , and Rc is not.

We have already seen that Channel and Mutex work only with Send data. Since they are precisely the bridges over which data crosses between threads, they are also the gatekeepers that enforce the Send guarantee.

Thus, Rust programmers can reap the benefits of Rc and other thread-unsafe types, secure in the knowledge that if they ever accidentally try to move one to another thread, the Rust compiler will say:

 `Rc<Vec<i32>>` cannot be sent between threads safely 

The catastrophe is prevented.
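Here is a brief sketch of the thread-safe side of this division (the demo function is my own illustration): an Arc clone can move into a child thread, while substituting Rc produces exactly the error quoted above.

```rust
use std::sync::Arc;
use std::thread;

fn demo() -> usize {
    let shared = Arc::new(vec![1, 2, 3]);

    let child = {
        let shared = Arc::clone(&shared); // bumps the count atomically
        thread::spawn(move || shared.len()) // compiles: Arc<Vec<i32>> is Send
    };

    // swapping Arc for Rc above fails to compile:
    // `Rc<Vec<i32>>` cannot be sent between threads safely
    child.join().unwrap() + shared.len() // 3 + 3 = 6
}

fn main() {
    assert_eq!(demo(), 6);
}
```

The Send check costs nothing at run time; it is settled entirely at compile time by the trait system.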

Sharing the Stack: scoped


So far, all the data structures we have shared were allocated on the heap and then used from several threads. But what if we want to start a thread that uses data living on the current thread's stack? That could be dangerous:

fn parent() {
    let mut vec = Vec::new();
    // fill the vector
    thread::spawn(|| {
        print_vec(&vec)
    })
}

The child thread takes a reference to vec, which in turn lives on parent's stack. When parent returns, its stack frame is popped, but the child thread is none the wiser. Uh-oh!

To avoid such memory problems, the main API for running threads in Rust looks like this:

 fn spawn<F>(f: F) where F: 'static, ... 

The 'static bound means, roughly, that the closure must not contain any borrowed data. In particular, it means that code like parent above will not compile:

 error: `vec` does not live long enough 

In essence, this rules out the possibility of parent's stack frame being popped while other threads still hold references into it. The catastrophe is prevented.
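One way to satisfy the 'static bound is to move ownership of the data into the closure, so that nothing is borrowed at all. A sketch (the move keyword and the summing closure are my additions to the article's parent example):

```rust
use std::thread;

fn parent() -> i32 {
    let vec = vec![0, 1];

    // `move` transfers ownership of `vec` into the closure, so the closure
    // contains no borrowed data and satisfies the `'static` bound on `spawn`
    let handle = thread::spawn(move || vec.iter().sum::<i32>());

    // `vec` is no longer usable here: its ownership went to the child thread
    handle.join().unwrap() // 0 + 1 = 1
}

fn main() {
    assert_eq!(parent(), 1);
}
```

This is safe but gives up the data; the scoped API described next lets the parent keep ownership instead.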

But there is another way to guarantee safety: ensure that the parent's stack stays intact until the child thread is done. This is the pattern of fork-join programming, often used for parallel divide-and-conquer algorithms. Rust supports it with a dedicated function for spawning a child thread:

 fn scoped<'a, F>(f: F) -> JoinGuard<'a> where F: 'a, ... 

This API differs from spawn, described above, in two key ways:

  1. The closure may have any lifetime 'a rather than 'static, so it can borrow data from the parent's stack, as long as that data outlives 'a.
  2. The function returns a JoinGuard<'a>; when the JoinGuard is destroyed, it joins the child thread, i.e. blocks until it finishes.
Because of the 'a parameter, the JoinGuard cannot escape the scope of any data borrowed by the closure f. In other words, Rust guarantees that the parent thread waits for the child to finish before popping any stack frames the child might have access to.

Therefore, the above example can be corrected as follows:

fn parent() {
    let mut vec = Vec::new();
    // fill the vector
    let guard = thread::scoped(|| {
        print_vec(&vec)
    });
    // `guard` is destroyed here, joining the child thread,
    // after which the stack can safely be popped
}

Thus, in Rust you can freely lend stack-allocated data to child threads, confident that the compiler will check for all the necessary synchronization.

Translator's note. On the very day this article was published, a way was found to violate the guarantees provided by scoped from safe code. As a result, thread::scoped was hastily destabilized: it cannot be used with the beta compiler, only with nightly. A fix for this problem is planned for the 1.0 release.
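For readers on modern Rust: the same fork-join guarantee eventually returned to the standard library as std::thread::scope (stable since Rust 1.63), which joins every thread spawned in the scope before returning, instead of relying on a guard value. A sketch (my own, not from the original article):

```rust
use std::thread;

fn parent() -> usize {
    let mut vec = vec![0, 1, 2];

    // every thread spawned inside the scope is joined before `scope` returns,
    // so borrowing `vec` from the parent's stack is safe
    thread::scope(|s| {
        s.spawn(|| {
            for i in vec.iter() {
                println!("{}", i)
            }
        });
    }); // implicit join of all scoped threads here

    vec.push(3); // the parent regains full access afterwards
    vec.len()
}

fn main() {
    assert_eq!(parent(), 4);
}
```

Tying the join to the end of a closure rather than to a destructible guard value is what closes the soundness hole mentioned in the translator's note.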

Data Races


We have now seen enough examples to make a fairly bold claim about Rust's approach to multithreading: the compiler prevents all data races.

A data race occurs when two or more threads access data without synchronization and at least one of those accesses is a write.

Synchronization here includes tools as low-level as atomic operations. Saying that Rust prevents all data races is really another way of saying that you cannot accidentally "share state" between threads: all (mutating) access to shared state must be mediated by some form of synchronization.
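As an illustration of synchronization via low-level atomics, here is a sketch (entirely my own example) in which several threads increment a shared counter through AtomicUsize; because every access is an atomic operation, there is no data race and no lock:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

fn count(n_threads: usize, increments: usize) -> usize {
    let counter = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..n_threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments {
                    // an atomic read-modify-write: synchronized by definition
                    counter.fetch_add(1, Ordering::SeqCst);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    counter.load(Ordering::SeqCst)
}

fn main() {
    assert_eq!(count(4, 1000), 4000);
}
```

Replacing the AtomicUsize with a plain usize behind a raw pointer would be a data race, and safe Rust simply provides no way to write it.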

Data races are just one (if very important) kind of race condition, but by preventing them Rust often helps you avoid other, subtler forms of racing as well. For example, it is often important that updates to several memory locations appear atomic: other threads "see" either all of the updates or none of them. In Rust, holding a &mut reference to all the relevant memory locations at once guarantees that their updates are atomic in this sense, because no other thread can possibly access them even for reading.

It is worth pausing a moment to consider this guarantee in the context of the broader landscape of programming languages. Many languages provide memory safety through garbage collection, but garbage collection does nothing to prevent data races.

Instead, Rust uses data ownership and borrowing to deliver its two key value propositions:

  1. Memory safety without garbage collection.
  2. Multithreading without data races.
Future


When Rust first began, channels were baked directly into the language, and its approach to multithreading was quite opinionated.

In today's Rust, multithreading is entirely a matter of libraries: everything described in this post, including Send, is defined in the standard library and could equally well be defined in an external library.

And that is exciting, because it means Rust's multithreading story can keep evolving, embracing new paradigms and catching new classes of errors. Libraries such as syncbox and simple_parallel are taking some of the first steps, and we expect to invest heavily in this area in the coming months. Stay with us!

Source: https://habr.com/ru/post/256211/

