The unsafe keyword is an integral part of the design of the Rust language. For those who are not familiar with it: unsafe is a keyword that, in simple terms, provides a way to bypass Rust's type checking.
The existence of the unsafe keyword surprises many people at first. After all, isn't the fact that programs do not crash from memory errors a defining feature of Rust? If so, why is there an easy way to get around the type system? This may look like a defect in the language.
But things are not that simple; the details are below.
This note introduces the unsafe keyword and the idea of bounded unsafety. In fact, it is a precursor to a note that I hope to write a little later. That note will discuss the Rust memory model, which specifies what can and cannot be done in unsafe code.
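unsafe code adds 3 capabilities: reading and modifying mutable static variables (static mut), dereferencing raw pointers, and calling unsafe functions, including those declared extern.
Safe Rust does not allow a static mut to be read or written; unsafe code is used to get around this restriction: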
code is used to get around this restriction. static mut N: i32 = 1; fn add_one(n: i32) -> i32 { n + 1 } fn main() { unsafe { N = add_one(N); // } // - unsafe { println!("{}", N); // } }
Dereferencing a raw pointer also requires an unsafe block:

    fn add_one_ptr(n: *mut i32) {
        unsafe {
            *n = *n + 1;
        }
    }

    fn main() {
        let mut n = 5;
        add_one_ptr(&mut n as *mut i32);
        // Reading `n` here is safe: it is a local variable, not a `static mut`.
        println!("{}", n);
    }
This code will most likely cause a segmentation fault, because it writes through a null pointer:

    unsafe {
        let ptr = 0 as *mut i32;
        *ptr = 1;
    }
Calling unsafe code: a call to unsafe code must be wrapped in an unsafe block. If a function is declared with the unsafe specifier, its entire body is treated as an unsafe block. Like this:
    unsafe fn do_dangerous_thing() {
        println!("{}", "in `unsafe` code");
    }

    fn main() {
        unsafe {
            do_dangerous_thing();
        }
    }
Yet, in my opinion, unsafe is not a flaw. In fact, it is an important part of the language. unsafe plays the role of a kind of escape valve: it means that we can keep the type system relatively simple while still letting you use whatever tricks you want in your code. We only require that you hide these tricks (unsafe code) behind safe external abstractions.
I think that the way interpreted languages like Ruby (or Python) use C code is a good comparison for how unsafe works in Rust. Take, say, the JSON module in Ruby. It includes both a pure Ruby implementation (JSON::Pure) and an alternative C implementation (JSON::Ext). Usually, when you use the JSON module, you are running C code, but your Ruby code cannot tell the difference. From the outside, this code looks the same as any other Ruby module, but inside it can use various clever tricks and perform optimizations that could not be written in Ruby itself. (You can read this excellent article on Helix to learn more; it also explains how to write Ruby plugins in Rust.)
Well, the same thing can happen in Rust, though on a slightly different scale. For example, you can write an efficient hash table implementation in pure safe Rust. Adding unsafe code can make it even faster. If the data structure will be used by many people, or its performance is critical to your program, then the effort may be worth it (which is why we use unsafe code in the implementation of the standard library). In either case, the calling Rust code treats unsafe code in the same way as safe code: the layers of abstraction provide a uniform external API.
Of course, the fact that unsafe code can make a program faster does not mean that you should use it often. Just as most Ruby code is written in Ruby, most Rust code is written in safe Rust. This is all the more true because safe Rust code is very efficient, so the performance gained by switching to unsafe code is rarely worth the effort.
The most frequent use of unsafe code in Rust appears to be calling libraries written in other languages through the FFI (Foreign Function Interface). Every call to a C function from Rust is unsafe, because the compiler cannot judge the safety of the C code.
Extending the language with unsafe code

I think the most interesting reason to write unsafe code in Rust (or a C module in Ruby) is to extend the capabilities of the language. Probably the most frequently cited example is the Vec type in the standard library, which uses unsafe code to manipulate uninitialized memory. Rc and Arc, which provide reference counting, are also a case in point. There are much more interesting examples, though: Crossbeam and deque use unsafe code to implement lock-free data structures, and Jobsteal and Rayon use unsafe code to implement thread pools.
In this article we will look at one simple example: the split_at_mut method, which is available in the standard library. This method works on mutable slices. It takes an index (mid) and divides the slice into two parts at that index. It then returns two smaller slices: one covering the range 0..mid, the other covering mid..
For convenience, you can imagine split_at_mut implemented as:
    impl [T] {
        pub fn split_at_mut(&mut self, mid: usize) -> (&mut [T], &mut [T]) {
            (&mut self[0..mid], &mut self[mid..])
        }
    }
This code will not compile, for two reasons:

- In general, the compiler does not try to reason precisely about indices. When it sees an indexing expression such as foo[i], it ignores the index and treats the array as a single whole (foo[_]). This means that it cannot tell that &mut self[0..mid] refers to a different region of memory than &mut self[mid..]. Performing this kind of analysis would require a much more complex type system.
- The [] operator applied to a range is not part of the language; it is implemented entirely in the standard library. Therefore, even if the compiler knew that 0..mid and mid.. do not overlap, it would not follow that &mut self[0..mid] and &mut self[mid..] refer to non-overlapping regions of memory.

One can imagine changing the compiler so that this code sample compiles, and perhaps we will do that someday. For now, though, we prefer to implement methods like split_at_mut using unsafe code. This lets us keep the type system simple while still being able to offer an API like split_at_mut.
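As a small illustration of what the standard method makes possible, the sketch below borrows both halves of a slice mutably at the same time, something plain indexing would not allow. The helper name add_left_into_right is hypothetical and chosen only for this example; split_at_mut itself is the real standard library method.

```rust
/// Hypothetical helper: adds each element of the left half of `values`
/// to the element at the same position in the right half.
pub fn add_left_into_right(values: &mut [i32]) {
    let mid = values.len() / 2;

    // `split_at_mut` hands back two disjoint mutable sub-slices,
    // so both can be used at the same time.
    let (left, right) = values.split_at_mut(mid);

    for (l, r) in left.iter().zip(right.iter_mut()) {
        *r += *l;
    }
}
```

Calling it on the array [1, 2, 3, 4] leaves the left half untouched and turns the right half into [4, 6].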
Thinking of unsafe code as a plugin makes it easier to express the idea of an abstraction boundary. When you write a plugin for Ruby, you expect that when Ruby code calls your functions, it will pass you ordinary Ruby values.
Inside, you can do whatever you want, for example, use a C array instead of a Ruby vector. But when you return to Ruby code, you must convert the values you return back into standard Ruby values.
The same is true of unsafe code in Rust. To client code, your code appears to be safe. This means that you may assume the caller passes valid values as input. It also means that every value you return must satisfy the requirements of the Rust type system. Inside the unsafe boundary, you can bend the rules as you see fit (of course, exactly how much extra freedom you get is a matter for discussion; I hope to cover this in a later note).
Let's look at the split_at_mut method that we saw in the previous section. To keep things simple, we will consider only the external interface of the function, given by its signature:
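    impl [T] {
        pub fn split_at_mut(&mut self, mid: usize) -> (&mut [T], &mut [T]) {
            // The body is omitted so that we can focus on the interface;
            // safe code should not care what goes in here anyway.
        }
    }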
What can we understand from this signature?
To begin with, split_at_mut may rely on all of its inputs being valid (in safe code, the compiler checks that this is indeed the case). For split_at_mut, this part of the unsafe semantics can be expressed as the following rules:

- The self argument is of type &mut [T]. It follows that we receive a reference to some number (N) of elements of type T. This is a mutable reference, so we know that no one else can access the memory addressed by self (until the mutable reference ceases to exist). We also know that the memory is initialized.
- The mid argument is of type usize. All we know is that it is a non-negative integer.

There is one more point that has not been mentioned. Nothing guarantees that mid is a valid index into self. It follows that the unsafe code we are going to write will have to check this itself.
When split_at_mut returns, it must make sure that its return value matches the signature. Simply put, this means that the function must return two valid slices (pointing to allocated memory). It is also important that these slices do not overlap, that is, that they refer to two disjoint regions of memory.
Let's look at several possible implementations of split_at_mut and determine whether they are correct. We have already seen that the implementation written in "pure" Rust does not work (it does not compile). Let's try to implement the function using raw pointers:
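    impl [T] {
        pub fn split_at_mut(&mut self, mid: usize) -> (&mut [T], &mut [T]) {
            use std::slice::from_raw_parts_mut;

            // The `unsafe` block gives access to raw pointer operations.
            // By writing `unsafe`, we claim that nothing below triggers
            // UB (undefined behaviour).
            unsafe {
                // raw pointer to the first element
                // (`as_mut_ptr` also works for an empty slice, unlike `&mut self[0]`)
                let p: *mut T = self.as_mut_ptr();

                // pointer to the element at `mid`
                let q: *mut T = p.offset(mid as isize);

                // number of elements from `mid` onwards
                let remainder = self.len() - mid;

                // assemble the slice for `0..mid`
                let left: &mut [T] = from_raw_parts_mut(p, mid);

                // assemble the slice for `mid..`
                let right: &mut [T] = from_raw_parts_mut(q, remainder);

                (left, right)
            }
        }
    }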
This version is close to the one implemented in the standard library.
However, this code relies on an assumption that the inputs do not justify: it assumes that mid is within the bounds of the slice. Nowhere is it verified that mid <= len. This means that q may point outside the slice, and that computing remainder may overflow and wrap around.
This is an incorrect implementation, because it requires more guarantees than the calling code is obliged to provide.
We can fix this implementation by adding an assertion that mid is a valid index (note that assert! in Rust is always executed, even in optimized builds):
    impl [T] {
        pub fn split_at_mut(&mut self, mid: usize) -> (&mut [T], &mut [T]) {
            use std::slice::from_raw_parts_mut;

            // check that `mid` is in bounds:
            assert!(mid <= self.len());

            // as before, with fewer comments
            unsafe {
                let p: *mut T = self.as_mut_ptr();
                let q: *mut T = p.offset(mid as isize);
                let remainder = self.len() - mid;
                let left: &mut [T] = from_raw_parts_mut(p, mid);
                let right: &mut [T] = from_raw_parts_mut(q, remainder);
                (left, right)
            }
        }
    }
Well, here we have practically reproduced the implementation of this function in the standard library (the standard library uses some different helper functions, but the idea is essentially the same).
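The impl [T] form used in this article can only be written inside the standard library itself. To experiment with the idea in an ordinary crate, a free function can stand in for the method. The following is a minimal sketch under that assumption; the name split_at_mut_checked is made up for this example and is not a standard library API.

```rust
use std::slice::from_raw_parts_mut;

/// Hypothetical free-function counterpart of `split_at_mut`.
/// Panics if `mid > slice.len()`.
pub fn split_at_mut_checked<T>(slice: &mut [T], mid: usize) -> (&mut [T], &mut [T]) {
    let len = slice.len();

    // This check runs in optimized builds too, so out-of-range input
    // leads to a panic instead of undefined behaviour.
    assert!(mid <= len, "mid ({}) must not exceed the slice length ({})", mid, len);

    let p: *mut T = slice.as_mut_ptr();

    // SAFETY: `mid <= len`, so `p..p+mid` and `p+mid..p+len` both lie
    // inside the original slice and do not overlap.
    unsafe {
        (
            from_raw_parts_mut(p, mid),
            from_raw_parts_mut(p.add(mid), len - mid),
        )
    }
}
```

From the caller's point of view this is an ordinary safe function: valid input yields two disjoint slices, and an invalid mid yields a panic.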
Of course, it could be that we actually wanted to assume that mid is in bounds and do without this check. We cannot do this, because split_at_mut is part of the standard library. However, you can imagine a helper method for callers that are able to guarantee this themselves, so that they avoid the run-time cost of a bounds check. In this case, split_at_mut relies on the caller to guarantee that mid is within the bounds of the slice. This means that split_at_mut is no longer safe code, because it has additional requirements on its inputs that must be met to ensure memory safety.
Rust lets you express that an entire function is unsafe by placing the unsafe keyword in the function signature. Once you do this, the unsafety of the code is no longer an internal detail of the implementation; it becomes part of the function's interface. So we can create a variant of split_at_mut, split_at_mut_unchecked, which does not check that mid is within bounds:
    impl [T] {
        // This method is declared `unsafe`.
        // `unsafe` here means that the caller must uphold
        // an extra condition: `mid <= self.len()`.
        pub unsafe fn split_at_mut_unchecked(&mut self, mid: usize) -> (&mut [T], &mut [T]) {
            use std::slice::from_raw_parts_mut;
            let p: *mut T = self.as_mut_ptr();
            let q: *mut T = p.offset(mid as isize);
            let remainder = self.len() - mid;
            let left: &mut [T] = from_raw_parts_mut(p, mid);
            let right: &mut [T] = from_raw_parts_mut(q, remainder);
            (left, right)
        }
    }
When a fn is declared unsafe, as above, calling it also becomes unsafe. This means that whoever writes the calling code must read the function's documentation and make sure that all of its conditions are met.
In this particular case, the calling code must make sure that mid <= self.len().
If you think about abstraction boundaries, declaring a function unsafe means that it is not part of the "safe" area of Rust, where the compiler itself catches errors through static analysis at compile time. Instead, the function becomes part of the unsafe abstraction of the code that calls it.
Using split_at_mut_unchecked, we can change the implementation of split_at_mut so that it performs the necessary checks itself and then calls split_at_mut_unchecked:
    impl [T] {
        pub fn split_at_mut(&mut self, mid: usize) -> (&mut [T], &mut [T]) {
            assert!(mid <= self.len());

            // The `unsafe` block states that we have checked
            // the conditions required by `split_at_mut_unchecked`.
            unsafe {
                self.split_at_mut_unchecked(mid)
            }
        }

        // **NB:** requires that `mid <= self.len()`.
        pub unsafe fn split_at_mut_unchecked(&mut self, mid: usize) -> (&mut [T], &mut [T]) {
            ... // as above
        }
    }
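The unchecked variant pays off when the caller can prove the bound by construction. The sketch below shows one such case, written with free functions so that it compiles outside the standard library. Both function names are used here purely for illustration, and split_in_half is a hypothetical caller that does not appear in the article.

```rust
use std::slice::from_raw_parts_mut;

/// Illustrative free-function version of the unchecked split.
///
/// # Safety
/// The caller must guarantee that `mid <= slice.len()`.
pub unsafe fn split_at_mut_unchecked<T>(slice: &mut [T], mid: usize) -> (&mut [T], &mut [T]) {
    let len = slice.len();
    let p: *mut T = slice.as_mut_ptr();

    // SAFETY: the caller promises `mid <= len`, so both ranges are in
    // bounds and disjoint.
    unsafe {
        (
            from_raw_parts_mut(p, mid),
            from_raw_parts_mut(p.add(mid), len - mid),
        )
    }
}

/// Hypothetical safe wrapper that needs no run-time check.
pub fn split_in_half<T>(slice: &mut [T]) -> (&mut [T], &mut [T]) {
    let mid = slice.len() / 2;

    // SAFETY: `len / 2 <= len` holds for every `usize`, so the
    // precondition of `split_at_mut_unchecked` is met.
    unsafe { split_at_mut_unchecked(slice, mid) }
}
```

Here the safe abstraction boundary is split_in_half: its callers cannot supply a bad mid, so the unsafe precondition never leaks out of the wrapper.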
Although nothing in the language explicitly links the privacy rules to the boundaries of unsafe abstractions, the two are naturally related. This is because privacy lets you control which code can modify the fields of your data, and that is the basic building block for unsafe abstractions.
Earlier, we noted that the Vec type in the standard library is implemented using unsafe code. This would not be possible without privacy. If you look at the definition of Vec, you will see that it looks something like this:
    pub struct Vec<T> {
        pointer: *mut T, // pointer to the buffer
        capacity: usize, // number of elements the buffer can hold
        length: usize,   // number of initialized elements
    }
The Vec implementation carefully maintains the invariant that pointer and the first length elements it refers to are always valid. You can see that if length were a public (pub) field, this invariant could not be maintained: any external code could set the length of the Vec to an arbitrary value.
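The same dependence on privacy can be shown with a much smaller type. The sketch below is an assumption-laden toy, not how Vec is written: FixedStack and its module are invented for this example. Its unsafe code is sound only because len is private and every method keeps it within bounds.

```rust
mod fixed_stack {
    /// Hypothetical fixed-capacity stack used only for illustration.
    pub struct FixedStack {
        items: [u32; 4],
        // Invariant: `len <= items.len()`. Because the field is private,
        // only the methods below can change it.
        len: usize,
    }

    impl FixedStack {
        pub fn new() -> Self {
            FixedStack { items: [0; 4], len: 0 }
        }

        /// Pushes a value; returns `false` if the stack is full.
        pub fn push(&mut self, value: u32) -> bool {
            if self.len == self.items.len() {
                return false;
            }
            self.items[self.len] = value;
            self.len += 1;
            true
        }

        /// Returns the most recently pushed value, if any.
        pub fn last(&self) -> Option<u32> {
            if self.len == 0 {
                return None;
            }
            // SAFETY: `0 < len <= items.len()` by the invariant,
            // so `len - 1` is a valid index.
            Some(unsafe { *self.items.get_unchecked(self.len - 1) })
        }
    }
}
```

If len were declared pub, safe code outside the module could set it to 100, and last would then read out of bounds. The unsafe block is only as trustworthy as the privacy boundary around the field.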
For this reason, the boundaries of unsafety tend to fall into one of two categories:

- a single function, such as split_at_mut;
- an entire module, such as the one that defines Vec, whose private fields are what the unsafe code relies on.

Types with unsafe interfaces

As we saw earlier, it can sometimes be useful to create unsafe functions like split_at_mut_unchecked, which can serve as building blocks for safe abstractions. The same is true of types. If you look at the actual Vec implementation in the standard library, you will see that it looks like this:
    pub struct Vec<T> {
        buf: RawVec<T>,
        len: usize,
    }
What is this type, RawVec? It turns out to be an unsafe helper type that holds a pointer and a capacity:
    pub struct RawVec<T> {
        // `Unique` is another `unsafe` helper type; it marks
        // a raw pointer as uniquely owned.
        ptr: Unique<T>,
        cap: usize,
    }
What makes RawVec an unsafe helper type? Unlike with functions, the notion of an "unsafe type" is rather vague. I define it as a type that does not let you do anything useful without unsafe code. Safe code can construct a RawVec, and can even resize the buffer that backs a Vec, but if you want to access a value in that buffer, you can only do so through ptr, which returns *mut T. This is a raw pointer, so dereferencing it is an unsafe operation. This means that, to provide useful functionality, RawVec must be wrapped in another unsafe abstraction (such as Vec) that tracks initialization.
unsafe abstractions are quite powerful tools. Core types such as Vec and Rc are built on them, while the unsafe code stays hidden behind their API.
, , , , unsafe
. , unsafe
, , ? , . , . RFC, , , , , .
RFC , , . , , , . , unsafe
, ,
, .
. unsafe
, . aliasing (statements reordering).
, unsafe
. , , safe- , , unsafe
.
Many thanks to everyone from the Rustycrate community who participated in the translation, proofreading and editing of this article, in particular: born2lose, ozkriff, vitvakatu.
UPD : 3 unsafe
.
Source: https://habr.com/ru/post/346336/