Write in Rust, run everywhere: interoperability between Rust and C

I offer the readers of Habrahabr a translation of the post "Rust Once, Run Everywhere" by Alex Crichton from the Rust blog. I have been interested in this language for some time, and with the release of version 1.0 approaching I would like to promote it as best I can. Unfortunately, I am not writing anything of my own in Rust right now, but I used to do translations, so I decided to return to an old occupation. I did not find a translation of this post on Habr, so I made my own.
For some terms that denote concepts unique to Rust (ownership, borrowing, lifetime parameter) I was not sure of the best Russian translation, so I tried to pick the most fitting words that would be reasonably clear to a Russian-speaking audience. Suggestions for improvement are welcome.

Rust was never expected to achieve world domination overnight, so it has to interoperate with existing code as easily as it works with native Rust code. That is why Rust makes it easy to use C APIs without overhead, while its ownership and borrowing system provides strong memory safety guarantees for those APIs.

To interface with other languages, Rust provides a foreign function interface (FFI). In keeping with Rust's core principles, the FFI is a zero-cost abstraction: a function call between Rust and C costs the same as a C function call made from C code. Language features such as ownership and borrowing can also be used to enforce a safe protocol for handling pointers and other resources. Such protocols usually exist only as documentation for the C API (at best), whereas Rust makes these requirements explicit (and the compiler itself guarantees that they are met; translator's note).

In this post we will look at how to encapsulate an unsafe C function call interface in a safe abstraction.
Although the discussion focuses on C, integrating Rust with other languages (such as Ruby and Python) is just as simple.

Rust works with C

Let's start with a simple example: calling C code from Rust. We will then show that Rust imposes no additional overhead. Here is a simple C function that doubles its input:

int double_input(int input) { return input * 2; }

To call this function from Rust, you can write this code:

    extern crate libc;

    extern {
        fn double_input(input: libc::c_int) -> libc::c_int;
    }

    fn main() {
        let input = 4;
        let output = unsafe { double_input(input) };
        println!("{} * 2 = {}", input, output);
    }

And that's it! You can try this example yourself with the code from GitHub: check it out and execute cargo run from the project directory. The code shows that nothing extra is needed to call a C function beyond declaring its signature. We will soon see that the generated machine code contains no overhead either. There are, however, several small (but subtle; translator's note) details in this Rust program, so let's first go through each part.

First, we see extern crate libc. The libc crate provides many type definitions that are useful for FFI when working with C, and it ensures that the types on the call boundary between Rust and C agree with each other.

Moving on:

    extern {
        fn double_input(input: libc::c_int) -> libc::c_int;
    }

In Rust this is a declaration of an externally available function. You can think of it as the analog of a C header file. It is where the compiler learns the function's parameter and return types. As you can see, the signature matches the C definition of the function.

Next we have the main program code:

    fn main() {
        let input = 4;
        let output = unsafe { double_input(input) };
        println!("{} * 2 = {}", input, output);
    }

Here we see one of the most important aspects of FFI in Rust: the unsafe block. The compiler knows nothing about the implementation of double_input(), so it must assume that calling any foreign function may violate memory safety. The unsafe block lets the programmer take responsibility for memory safety: by writing it, you promise that this call will not violate memory safety, so Rust's basic guarantees still hold. These restrictions may seem too strict, but Rust provides the tools that let API users avoid unsafe blocks altogether (more on this a bit later).
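To make the promise behind unsafe more concrete, here is a minimal sketch that is not part of the original post. It declares strlen from the C standard library by hand (assuming, as on common platforms, that the C library is linked automatically and that C's size_t matches usize) and hides the unsafe call behind a safe function. The name c_string_len is hypothetical, and the unsafe extern spelling assumes a recent toolchain (Rust 1.82 or later); older compilers use the plain extern block from the example in the post.

```rust
use std::ffi::{c_char, CStr};

// Hand-written declaration that plays the role of a C header.
// Assumption: `strlen` is provided by the C standard library that the
// program links against, and `size_t` has the same width as `usize`.
unsafe extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

/// Safe wrapper (hypothetical name). `&CStr` guarantees a valid,
/// NUL-terminated buffer, which is the precondition `strlen` documents,
/// so callers of this function never need to write `unsafe` themselves.
pub fn c_string_len(s: &CStr) -> usize {
    // SAFETY: `s.as_ptr()` points to a NUL-terminated string that stays
    // alive for the duration of the borrow `s`.
    unsafe { strlen(s.as_ptr()) }
}

fn main() {
    let greeting = CStr::from_bytes_with_nul(b"hello\0").expect("valid C string");
    println!("length = {}", c_string_len(greeting));
}
```

The unsafe block is confined to one line whose precondition is enforced by the argument type, which is the general shape of a safe binding.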

Now that we have seen how to call a C function from Rust, let's check that Rust really imposes no overhead on the call. Almost every programming language can call C code one way or another, but this often comes with run-time costs: type conversions at the very least, and sometimes more complex operations. To see what Rust actually does, let's look at the assembly that the Rust compiler generates for the call to double_input():

    mov    $0x4,%edi
    callq  3bc30 <double_input>

And that's all! As you can see, calling a C function from Rust requires only setting up the arguments and a single call instruction, exactly as it would from C code.

Safe abstractions

Most of Rust's features are tied to the concept of ownership, and FFI is no exception. When you write bindings for a C library in Rust, you not only get zero overhead, you can also make the library safer than it is in C! Bindings can use Rust's ownership and borrowing rules to enforce the API usage rules that are usually spelled out only as comments in the C header files.

For example, imagine a library for working with tar archives. It provides a function for reading the contents of each file in the archive, something like this:

    // Gets the data for a file in the tarball at the given index, returning NULL if
    // it does not exist. The `size` pointer is filled in with the size of the file
    // if successful.
    const char *tarball_file_data(tarball_t *tarball, unsigned index, size_t *size);

This function makes an implicit assumption about how it will be used: the returned char* pointer must not outlive the tarball_t *tarball argument. Rust bindings for this API might look like this:

    pub struct Tarball { raw: *mut tarball_t }

    impl Tarball {
        pub fn file(&self, index: u32) -> Option<&[u8]> {
            unsafe {
                let mut size = 0;
                let data = tarball_file_data(self.raw, index as libc::c_uint,
                                             &mut size);
                if data.is_null() {
                    None
                } else {
                    Some(slice::from_raw_parts(data as *const u8, size as usize))
                }
            }
        }
    }

Here the Tarball structure owns the *mut tarball_t pointer and is responsible for releasing the memory and resources, so we already know everything about the lifetime of the memory allocated for the tar archive. In addition, the file() method returns a borrowed slice whose lifetime is implicitly tied to the lifetime of the Tarball structure (the &self argument). This is how Rust expresses that the returned slice can be used only while the archive structure is alive, statically ruling out dangling pointer bugs (which are easy to introduce in C itself). (If you are not familiar with borrowing in Rust, I recommend Yehuda Katz's post on ownership.)
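Since the tar library here is imaginary, the same ownership pattern can be tried out with something that actually links. The sketch below is an illustration under assumptions, not code from the original post: it uses malloc and free from the C standard library in place of the tarball functions, and the CBuffer type and its methods are hypothetical names. The unsafe extern spelling assumes Rust 1.82 or later.

```rust
use std::ffi::c_void;
use std::slice;

// Assumption: `malloc` and `free` come from the C standard library that the
// program links against, and `size_t` has the same width as `usize`.
unsafe extern "C" {
    fn malloc(size: usize) -> *mut c_void;
    fn free(ptr: *mut c_void);
}

/// Hypothetical owner of a C allocation, playing the role of `Tarball`.
pub struct CBuffer {
    ptr: *mut u8,
    len: usize,
}

impl CBuffer {
    /// Allocates `len` bytes with `malloc` and fills them with `byte`.
    /// Returns `None` for an empty or oversized request or a failed allocation.
    pub fn filled(len: usize, byte: u8) -> Option<CBuffer> {
        if len == 0 || len > isize::MAX as usize {
            return None;
        }
        // SAFETY: `malloc` has no preconditions; the result is checked below.
        let ptr = unsafe { malloc(len) as *mut u8 };
        if ptr.is_null() {
            return None;
        }
        // SAFETY: `ptr` is non-null and valid for writes of `len` bytes.
        unsafe { ptr.write_bytes(byte, len) };
        Some(CBuffer { ptr, len })
    }

    /// The returned slice borrows from `&self`, so it cannot outlive the buffer.
    pub fn as_slice(&self) -> &[u8] {
        // SAFETY: `ptr` is valid for `len` initialized bytes while `self` lives.
        unsafe { slice::from_raw_parts(self.ptr, self.len) }
    }
}

impl Drop for CBuffer {
    fn drop(&mut self) {
        // SAFETY: `ptr` came from `malloc` and is freed exactly once.
        unsafe { free(self.ptr as *mut c_void) }
    }
}

fn main() {
    let buffer = CBuffer::filled(4, 0xAB).expect("allocation failed");
    let bytes = buffer.as_slice();
    println!("{:?}", bytes);
}
```

If main dropped buffer while bytes was still in use, the borrow checker would reject the program, which is the compile-time version of the rule that the C header can only state in a comment.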

The key property of these Rust bindings is that they are safe: a Rust user of this API does not need an unsafe block to call them! Although the implementation itself is unsafe (because it uses FFI), the interface, thanks to borrowed pointers, guarantees memory safety for any Rust code that uses it. In other words, the Rust compiler statically guarantees that Rust code using this API cannot cause a segfault. And do not forget: all of this comes with no additional overhead! C types are represented in Rust without any extra memory.
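The claim about memory can be checked directly. The following sketch is an addition for illustration, with a hypothetical Point struct: it compares the size of a #[repr(C)] struct with the size of its fields, and shows that wrapping a reference in Option, as the file() method does, does not make the value any larger.

```rust
use std::ffi::c_int;
use std::mem::size_of;

// Hypothetical struct laid out the way a C compiler would lay out
// `struct point { int x; int y; };`.
#[repr(C)]
pub struct Point {
    pub x: c_int,
    pub y: c_int,
}

fn main() {
    // Two `int` fields with the same alignment: no padding, no hidden header.
    assert_eq!(size_of::<Point>(), 2 * size_of::<c_int>());

    // `None` is represented by the null pointer, so `Option` around a
    // reference adds no space compared with the bare pointer or reference.
    assert_eq!(size_of::<Option<&u8>>(), size_of::<*const u8>());
    assert_eq!(size_of::<Option<&[u8]>>(), size_of::<&[u8]>());

    println!("Point occupies {} bytes", size_of::<Point>());
}
```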

The Rust community has already built a sizable set of safe bindings for existing C libraries, including OpenSSL, libgit2, libdispatch, libcurl, sdl2, Unix APIs, and libsodium. The list on crates.io is growing quickly, so your favorite C library may well already have Rust bindings, or will get them soon.

C works with Rust

Despite its memory safety guarantees, Rust has no garbage collector and no runtime, so Rust code can be called from C without any special setup. That is, there is no overhead not only for calling C from Rust, but also for calling Rust from C!

Let's take the reverse of the previous example. As before, all the code is available on GitHub. First, the Rust code:

    #[no_mangle]
    pub extern fn double_input(input: i32) -> i32 {
        input * 2
    }

As before, there is nothing complicated here, but a few subtle points are worth noting. First, we marked the function with the #[no_mangle] attribute. It tells the compiler not to mangle the name of the double_input function. Rust uses name mangling similar to C++ to guarantee that names are unique across libraries, and this attribute means we do not have to guess what name the compiler gave the function when calling it from C (a mangled name might look like double_input::h485dee7f568bebafeaa).

Next comes the function definition, and the most interesting part is the extern keyword. It is a special form for specifying the function's ABI, which makes the function compatible with C function calls.
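For readers on a current toolchain, here is a small sketch of the same two ingredients, added for illustration and not taken from the original post. It assumes Rust 1.82 or later, where the attribute can be spelled #[unsafe(no_mangle)] (the 2024 edition requires this spelling), and it writes the ABI out explicitly as "C". The function name triple_input is hypothetical.

```rust
// Exported under the unmangled symbol name `triple_input` with the C calling
// convention, so a C caller could declare it as
// `int32_t triple_input(int32_t input);`.
#[unsafe(no_mangle)]
pub extern "C" fn triple_input(input: i32) -> i32 {
    // Wrapping arithmetic avoids an overflow panic in debug builds; a panic
    // must not be allowed to unwind into C code.
    input.wrapping_mul(3)
}

fn main() {
    // The function has an ordinary C-compatible function pointer type,
    // which is what a C caller effectively sees.
    let callback: extern "C" fn(i32) -> i32 = triple_input;
    println!("{} * 3 = {}", 4, callback(4));
}
```

The function stays callable from Rust as usual; the attribute and the ABI string only affect how it looks from outside.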

Finally, if you look at Cargo.toml, you will see that this library is built not as a regular Rust library (rlib) but as a static library, which Rust calls "staticlib". All this makes it possible to link the Rust code statically into a C program.

Now that we have gone through the Rust library, let's write a C program that uses it.

    #include <stdint.h>
    #include <stdio.h>

    extern int32_t double_input(int32_t input);

    int main() {
        int input = 4;
        int output = double_input(input);
        printf("%d * 2 = %d\n", input, output);
        return 0;
    }

Here you can see that C, like Rust, needs a declaration of the external function double_input(), which is now written in Rust.

Apart from this detail, everything just works! If you run make in the directory from GitHub, the example is compiled and linked into a single static executable that prints 4 * 2 = 8 to the console.

The absence of a garbage collector and a runtime makes integrating C with Rust very easy. The external C code does not have to do anything to set up a Rust environment, so transitions between Rust code and C are very cheap.

Beyond C

As we have seen, FFI in Rust adds almost no overhead, and the ownership system lets you write memory-safe Rust bindings to C libraries. But even if you do not use C, you are still in luck! The same principles let you call Rust code from Python, Ruby, JavaScript, and many other languages.

When programming in these languages, you sometimes need to speed up critical components, but until now that meant dropping down to C and giving up the memory safety, high-level abstractions, and ergonomics these languages provide.

Because Rust integrates easily with C, it is also well suited to this role. One of the first companies to use Rust in production, Skylight, was able to improve performance and reduce the memory usage of its data collection almost immediately simply by switching to Rust, and all of the Rust code is published as a Ruby gem.

Moving from a language like Python or Ruby to C to optimize performance is often laborious, and it is hard to guarantee that the program will not crash in a way that is difficult to debug. Rust not only provides zero-cost FFI, it also makes it possible to keep the same safety guarantees as the original language. In the long run, this should let programmers in these languages get closer to the hardware and add a bit of systems programming to improve performance where it is needed.
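As a rough sketch of what the Rust side of such an extension could look like (an illustration, not code from the original post), the function below takes a raw pointer and a length, which is the kind of signature that the C FFI layers of scripting languages can typically call. The name sum_bytes is hypothetical, and the #[unsafe(no_mangle)] spelling assumes Rust 1.82 or later. In a real project the crate would be built as a shared or static library and loaded through the host language's own FFI mechanism.

```rust
use std::slice;

/// Sums the bytes in a caller-provided buffer.
///
/// # Safety
/// `data` must be null or point to `len` readable bytes that stay valid for
/// the duration of the call.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn sum_bytes(data: *const u8, len: usize) -> u64 {
    // Treat a null pointer or an empty buffer as an empty input instead of
    // dereferencing it.
    if data.is_null() || len == 0 {
        return 0;
    }
    // SAFETY: the caller promises that `data` points to `len` readable bytes.
    let bytes = unsafe { slice::from_raw_parts(data, len) };
    // From here on, everything is ordinary safe Rust.
    bytes.iter().map(|&b| u64::from(b)).sum()
}

fn main() {
    // Stand-in for the foreign caller: pass a pointer and a length.
    let payload = [1u8, 2, 3, 250];
    let total = unsafe { sum_bytes(payload.as_ptr(), payload.len()) };
    println!("sum = {}", total);
}
```

The unsafe part is limited to turning the raw pointer into a slice at the boundary; the computation itself keeps Rust's usual guarantees.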

FFI is just one of many tools in Rust's toolbox, but it is a key component of adopting Rust, because it makes integration with existing code very easy. I personally will be very happy to see the benefits of Rust reach as many projects as possible!

Source: https://habr.com/ru/post/257687/
