
What are Tokio and async I/O, and why are they needed?

The Rust community has recently concentrated a lot of its efforts on asynchronous I/O, implemented in the Tokio library. And that's great.


For many community members, those who have not worked with web servers and related things, it is not clear what we are trying to achieve. When these things were discussed around version 1.0, I too had only a vague idea about it, never having worked with any of it before.



Let's consider multithreading models, using Rust and Go as examples.




What problem are we trying to solve?


One of the key features of Rust is "fearless concurrency". However, the kind of concurrency needed to handle a large number of I/O-bound tasks (the kind available in Go, Elixir, and Erlang) is absent from Rust.


Let's assume that you want to build something like a web server. It will handle thousands of requests at a time (the C10K problem). Generally speaking, the problem we are considering consists of many tasks that mostly perform I/O (especially network I/O).


"Simultaneous processing of N problems" - this problem is best solved using threads. However ... Thousands of threads? This is probably too much. Working with threads can be quite resource-intensive: each thread must allocate a large stack ( stack ), adjust the thread using a set of system calls. On top of that, context switching is also expensive.


Of course, thousands of threads will not all be working at the same time: you have a limited number of cores, and at any given time only one thread runs on a given core.


But in examples like this web server, most of these threads will not be doing any work; they will be waiting for either a request or a response.


With regular threads, when you perform a blocking I/O operation, the system call hands control to the kernel, which will not hand it back right away, since the I/O operation has probably not completed yet. Instead, the kernel uses this as an opportunity to swap in a different thread, and it resumes the original thread (the one that started the I/O operation) once the operation completes, that is, once the thread is unblocked. That is how you solve such problems in Rust when you are not using Tokio and similar libraries: spawn a million threads and let the OS schedule them based on I/O.
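
To make this concrete, here is a minimal sketch of that thread-per-connection model, using only the Rust standard library (the address and the request handling are placeholders):

    use std::io::Read;
    use std::net::TcpListener;
    use std::thread;

    fn main() {
        let listener = TcpListener::bind("127.0.0.1:8080").unwrap();
        for stream in listener.incoming() {
            let mut stream = stream.unwrap();
            // One OS thread per connection; the OS swaps threads in and out
            // as their I/O blocks and unblocks.
            thread::spawn(move || {
                let mut buf = [0u8; 1024];
                let _ = stream.read(&mut buf); // blocks until data arrives
                // ... handle the request and write a response ...
            });
        }
    }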


But, as we have already found out, threads do not scale well for tasks like this. [1]


We need "lighter" threads.




Lightweight threads


To better understand this model, I think we need to take a break from Rust for a while and look at a language that handles this well: Go.


So, Go has lightweight threads called goroutines. You launch them with the go keyword. A web server might run code like this:


    listener, err := net.Listen(...)
    // handle err
    for {
        conn, err := listener.Accept()
        // handle err

        // spawn a goroutine to handle the connection
        go handler(conn)
    }

This is a loop that waits for new TCP connections and launches a goroutine running the handler function for each one. Each connection is handled by a separate goroutine, which is destroyed when the handler function finishes. Meanwhile, the main loop keeps executing, because it runs in its own goroutine.


So if these are not real (OS-level) threads, what is going on?


Goroutines are an example of lightweight threads. The OS knows nothing about them; it sees N threads at the disposal of the Go runtime.


The Go runtime maps M goroutines onto those threads [2], swapping goroutines in and out much like the OS scheduler does. It can do this because Go code is interruptible so that the garbage collector (GC) can do its work, which lets the scheduler deliberately stop a goroutine.


The scheduler is aware of the I/O system, so when a goroutine is waiting for an I/O operation to complete, it hands execution back to the scheduler.


In essence, a compiled Go function has calls scattered throughout it that tell the scheduler and the GC "take over if you want" (and also "I am waiting for such-and-such, please take over").


When a goroutine is swapped out on an OS thread, some registers are saved, and the program counter (PC) switches to the new goroutine.


But what happens to the stack? OS threads carry a large stack around with them; it is needed for functions to work at all.


Go used segmented stacks. The reason a thread needs a large stack is that most programming languages, including C, expect the stack to be contiguous, and stacks cannot be reallocated the way we reallocate growing memory buffers, because we expect the data on the stack to stay in place so that pointers into the stack keep working. So we prudently reserve the entire stack we think we might ever need (about 8 MB) and hope that is enough.


But the expectation that stacks must be contiguous is, strictly speaking, unnecessary. In Go, stacks were made up of small chunks. When a function is called, it checks whether there is enough stack space for it to run; if not, it allocates a new chunk of stack and runs on that. So if you want thousands of threads each doing a small amount of work, they all get thousands of small stacks, and everything is fine.


In fact, Go does something slightly different today: it copies stacks. I mentioned that stacks cannot simply be reallocated because we expect the data on them to stay in place. But that is not a given in Go: since Go has a GC, it knows exactly which pointers point where, and it can rewrite pointers into the stack if necessary.


In any case, the Go runtime handles all of this well. Goroutines are very cheap, and you can run thousands of them without any problems.


Rust once supported lightweight/green threads (I believe it used segmented stacks). However, Rust takes care that you do not pay for things you do not use, and green threads imposed a cost on all your code, even code that did not use them.


Lightweight threads were removed from Rust before version 1.0 was released.




Asynchronous I/O


The main building block here is asynchronous I/O. As mentioned in the previous section, with regular blocking I/O, as soon as you start an I/O operation, your thread is not allowed to continue executing (it is blocked) until the operation completes. This is fine when you work with OS threads (the OS scheduler does all the work for you!), but if you have lightweight threads, you instead want to swap in another lightweight thread on the same OS thread.


Instead, you use non-blocking I/O, where the thread submits an I/O request to the OS and continues executing. The kernel carries out the I/O request some time later. Before using the result, the thread then has to ask the OS, "has my I/O operation completed?"
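
A minimal sketch of this pattern, using the standard library's non-blocking mode (the address and buffer size are arbitrary; a real program would do useful work instead of spinning in the loop):

    use std::io::{ErrorKind, Read};
    use std::net::TcpStream;

    fn main() -> std::io::Result<()> {
        let mut stream = TcpStream::connect("127.0.0.1:8080")?;
        stream.set_nonblocking(true)?;

        let mut buf = [0u8; 1024];
        loop {
            match stream.read(&mut buf) {
                // The data is ready; use it.
                Ok(_n) => break,
                // The OS says "not done yet": we are free to do other
                // work and ask again later.
                Err(ref e) if e.kind() == ErrorKind::WouldBlock => continue,
                Err(e) => return Err(e),
            }
        }
        Ok(())
    }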


Of course, repeatedly asking the OS whether an operation has completed costs resources. That is why system calls like epoll exist. With epoll, you can bundle together a set of outstanding I/O operations and ask the OS to wake your thread when any of them completes. So you can have a scheduler thread (a real thread) that swaps out lightweight threads waiting on I/O. When nothing is happening, the scheduler thread can go to sleep inside an epoll call until the OS wakes it (when one of the I/O operations completes).


The internal mechanism involved here is probably more complicated.


Let's go back to Rust. Rust has the mio library, a platform-independent wrapper over non-blocking I/O and facilities like epoll (GNU/Linux), kqueue (FreeBSD), and so on. It is a building block, and while it may feel convenient to those used to working with epoll directly in C, it does not provide the nice model that Go does. But we can get there.
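
As a rough sketch of this building block (written against the mio 0.6-era API; exact names may differ between versions):

    extern crate mio;

    use mio::net::TcpListener;
    use mio::{Events, Poll, PollOpt, Ready, Token};

    fn main() {
        let addr = "127.0.0.1:8080".parse().unwrap();
        let listener = TcpListener::bind(&addr).unwrap();

        // Poll wraps epoll/kqueue/etc. behind one interface.
        let poll = Poll::new().unwrap();
        poll.register(&listener, Token(0), Ready::readable(), PollOpt::edge())
            .unwrap();

        let mut events = Events::with_capacity(1024);
        loop {
            // Sleeps until the OS reports that a registered source is ready.
            poll.poll(&mut events, None).unwrap();
            for event in events.iter() {
                if event.token() == Token(0) {
                    let (_socket, peer) = listener.accept().unwrap();
                    // ... hand the connection to a lightweight task ...
                    println!("connection from {}", peer);
                }
            }
        }
    }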




Futures


They are another building block. A future is a promise that a value will eventually become available (in JavaScript these are called Promises).


For example, you can ask to be notified when a request arrives on a network socket and get a Future back (actually a Stream, which is like a future but used for a sequence of values). The Future will not contain the response yet, but it will know when the response arrives. You can call wait() on a Future, which blocks until the value is ready, or you can call poll() on it, asking whether the Future is done yet (it will give you the result if there is one).


Futures can be chained, so you can write code like future.then(|result| process(result)). The closure passed in can itself produce another future, so you can chain things like I/O operations one after another. With chained futures, poll() is how you make progress; each time you call it, the chain moves on to the next future, provided the current one is ready (contains its result).


This is a good explanation of how non-blocking I/O works.


Chaining futures works like chaining iterators. Each and_then call (or any other combinator) returns a structure that wraps the inner future and may also contain a closure. The closures carry their references and data with them, so the whole thing ends up looking a lot like a small stack!
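
A toy illustration with futures 0.1 (the type names in the comments are approximate):

    extern crate futures;

    use futures::future::{self, Future};

    fn main() {
        // f1 is a simple ready-made future: FutureResult<i32, ()>.
        let f1 = future::ok::<i32, ()>(41);
        // and_then wraps f1 and the closure in a new struct, roughly
        // AndThen<FutureResult<i32, ()>, ..., [closure]>; each layer
        // carries the inner future plus the closure's captured data.
        let f2 = f1.and_then(|n| future::ok::<i32, ()>(n + 1));
        // wait() drives the whole chain to completion by polling it.
        assert_eq!(f2.wait(), Ok(42));
    }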




Tokio


Tokio is essentially a nice wrapper around mio that uses futures. Tokio has a main event loop, and you feed it closures that return futures. The loop runs all the closures you pass to it, using mio to figure out which futures can make progress, and drives them forward (by calling poll()).
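
Here is a minimal sketch of that setup, written against the tokio-core API of the time (details may vary between versions):

    extern crate futures;
    extern crate tokio_core;

    use futures::Stream;
    use tokio_core::net::TcpListener;
    use tokio_core::reactor::Core;

    fn main() {
        // The main event loop that drives futures, with mio underneath.
        let mut core = Core::new().unwrap();
        let handle = core.handle();

        let addr = "127.0.0.1:8080".parse().unwrap();
        let listener = TcpListener::bind(&addr, &handle).unwrap();

        // A future that processes each incoming connection in turn.
        let server = listener.incoming().for_each(|(_socket, peer)| {
            println!("connection from {}", peer);
            Ok(())
        });

        // Run the event loop until the server future completes.
        core.run(server).unwrap();
    }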


At a conceptual level this is quite similar to what Go does. You have to set up the main event loop (the scheduler) manually, and once you have done that, you can feed it tasks that periodically do I/O. Every time one of the current tasks blocks on an I/O operation, the loop swaps in a new task. The key difference is that Tokio runs on a single thread, whereas the Go scheduler can use multiple OS threads. However, you can offload CPU-heavy tasks to other OS threads and use channels to coordinate them, which is not hard.
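
For instance, a CPU-heavy computation can run on a plain OS thread, with a oneshot channel handing the result back as a future. A sketch with futures 0.1 (in a real program you would chain rx with combinators on the event loop instead of blocking with wait()):

    extern crate futures;

    use futures::sync::oneshot;
    use futures::Future;
    use std::thread;

    fn main() {
        let (tx, rx) = oneshot::channel::<u64>();

        thread::spawn(move || {
            // The expensive work runs off the event-loop thread.
            let result: u64 = (0..1_000_000u64).sum();
            let _ = tx.send(result);
        });

        // rx is itself a future that resolves when the thread sends.
        let result = rx.wait().unwrap();
        println!("sum = {}", result);
    }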


But while this is conceptually similar to what we have in Go, the code does not look very nice. Take the following Go code:


    // a function that does some I/O and some computation
    func foo(...) ReturnType {
        data := doIo()
        result := compute(data)
        moreData := doMoreIo(result)
        moreResult := moreCompute(moreData)
        // ...
        return someFinalResult
    }

In Rust it looks like this:


    // a function that does some I/O and some computation
    fn foo(...) -> Future<ReturnType, ErrorType> {
        do_io().and_then(|data| do_more_io(compute(data)))
               .and_then(|more_data| do_even_more_io(more_compute(more_data)))
        // ...
    }

Not pretty. The code gets even worse once you introduce branching and loops. The problem is that in Go we got the interruption points for free, whereas in Rust we have to encode them manually by chaining combinators, ending up with a kind of state machine.




Generators and async/await


This is where generators (also called coroutines) come in.
Generators are an experimental feature in Rust. Here is an example:


    let mut generator = || {
        let mut i = 0;
        loop {
            yield i;
            i += 1;
        }
    };
    assert_eq!(generator.resume(), GeneratorState::Yielded(0));
    assert_eq!(generator.resume(), GeneratorState::Yielded(1));
    assert_eq!(generator.resume(), GeneratorState::Yielded(2));

Functions are things that run a task and return a result to the caller exactly once. Generators, by contrast, can yield several times: they pause execution to hand back some data, and can be resumed, at which point they run until the next yield. Although my example does not show it, generators can also finish executing just like regular functions.


Closures in Rust are sugar for a struct containing the captured data, plus an implementation of one of the Fn traits that makes the struct callable.
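
Roughly speaking, it works like the following hand-written sketch (not actual compiler output; the real desugaring implements one of the Fn traits rather than a plain method):

    // The closure `|n| n + x`, capturing `x`, behaves like this struct:
    struct AddX {
        x: i32, // captured data
    }

    impl AddX {
        // A plain method standing in for the Fn-trait implementation.
        fn call(&self, n: i32) -> i32 {
            n + self.x
        }
    }

    fn main() {
        let x = 10;
        let add_x = |n: i32| n + x;
        let desugared = AddX { x: 10 };
        assert_eq!(add_x(5), desugared.call(5));
    }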


Generators are similar, except that they implement the Generator trait and usually store an enum representing the different states.


The "unstable" book has some examples of how this state machine looks.


This is much closer to what we were looking for! Now our code looks like this:


    fn foo(...) -> Future<ReturnType, ErrorType> {
        let generator = || {
            let mut future = do_io();
            let data;
            loop {
                // poll the future, exiting the loop when it is ready
                // and yielding each time it is not
                match future.poll() {
                    Ok(Async::Ready(d)) => { data = d; break },
                    Ok(Async::NotReady) => (),
                    Err(..) => ...
                };
                yield future.polling_info();
            }
            let result = compute(data);
            // do the same thing for `do_more_io()`, etc.
        };
        futurify(generator)
    }

futurify is a function that takes a generator and returns a future; every time the future is polled, it calls resume() on the generator and returns NotReady until the generator finishes executing.
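
A hypothetical sketch of what futurify could look like, using the unstable Generator trait and futures 0.1 of that era (ReturnType and ErrorType are the article's placeholder types, here pinned to concrete aliases, and the generator is simplified to yield ()):

    #![feature(generators, generator_trait)]
    extern crate futures;

    use futures::{Async, Future, Poll};
    use std::ops::{Generator, GeneratorState};

    type ReturnType = u32; // stand-ins for the article's pseudocode types
    type ErrorType = ();

    struct Futurified<G> {
        gen: G,
    }

    impl<G> Future for Futurified<G>
    where
        G: Generator<Yield = (), Return = Result<ReturnType, ErrorType>>,
    {
        type Item = ReturnType;
        type Error = ErrorType;

        fn poll(&mut self) -> Poll<ReturnType, ErrorType> {
            // Resume the generator: each yield maps to NotReady, and the
            // generator's final return value resolves the future.
            match self.gen.resume() {
                GeneratorState::Yielded(()) => Ok(Async::NotReady),
                GeneratorState::Complete(Ok(v)) => Ok(Async::Ready(v)),
                GeneratorState::Complete(Err(e)) => Err(e),
            }
        }
    }

    fn futurify<G>(gen: G) -> Futurified<G> {
        Futurified { gen }
    }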


But wait, this is even uglier! What was the point of turning our relatively clean chain of callbacks into this mess?


If you look carefully, though, you will see that this code is sequential: we have converted our callback code into the same linear flow as the Go code. However, it carries that odd yield-in-a-loop boilerplate, and futurify does not look nice either.


This is where futures-await comes to the rescue. futures-await is a procedural macro that strips away this boilerplate. In essence, it lets the function above be written as


    #[async]
    fn foo(...) -> Result<ReturnType, ErrorType> {
        let data = await!(do_io());
        let result = compute(data);
        let more_data = await!(do_more_io());
        // ...
    }

Neat and clean. Almost as clean as the Go code; the difference is that we call await!() explicitly. These await calls provide the same function as the interruption points that Go inserts automatically.


And, of course, since the code uses generators, you can loop, branch, and do whatever you like, and it will still stay clean.




Summing up


Futures in Rust can be chained together to provide a lightweight, stack-like system. With async/await you can write chains of futures elegantly, and await provides explicit interruption points at each I/O operation. Tokio provides the "scheduler": the main event loop to which you can submit async functions, and under the hood it uses mio to abstract over the low-level non-blocking I/O primitives.


These components can be used separately: you can use Tokio with futures without async/await, and you can use async/await without Tokio. For example, the latter might suit the Servo networking stack. The stack does not need to do a lot of parallel I/O (at least not on the order of thousands of threads), so plain multiplexed OS threads will do. Still, we want to pool threads and pipeline data processing, and async/await helps a lot there.


To summarize: all these components, despite a small amount of boilerplate, give us something almost as clean as goroutines in Go. And since generators (and therefore async/await) play well with the borrow checker (borrowck), Rust's safety invariants still hold, and we get "fearless concurrency" for programs that do a large amount of I/O.


Many thanks to everyone from the Rustycrate community who participated in translating, proofreading, and editing this article: vitvakatu and ozkriff.




[1]: This is not necessarily true of all server applications. For example, Apache HTTP Server uses OS threads. Often OS threads are the best tool for the job.


[2]: Lightweight threads are said to implement M:N threading, also called "green" threading.



Source: https://habr.com/ru/post/347726/

