What does unsafe mean in Rust?

Hi, Habr! I present to you the translation of the article "What Is Rust's unsafe?" by Nora Codes.

I have seen many misunderstandings about what the unsafe keyword means for the usefulness and correctness of the Rust language and for its promotion as a "safe systems programming language." Unfortunately, the truth is more complicated than a short tweet can convey. Here is how I see it.

In general, the unsafe keyword does not turn off the type system that keeps Rust code correct. It only allows you to use a few "superpowers", such as dereferencing raw pointers. unsafe is used to build safe abstractions on top of a fundamentally unsafe world, so that most Rust code can use these abstractions and avoid unsafe memory access.

Safety guarantee

Rust guarantees safety as one of its core principles. You could say this is the language's reason for existing. It does not, however, provide safety in the traditional way, with a runtime and a garbage collector. Instead, Rust uses a very advanced type system to keep track of when each value may be accessed and by whom. The compiler then statically analyzes every Rust program to make sure that it is always in a valid state.

Safety in Python

Take Python as an example. Pure Python code cannot corrupt memory. Access to list items is bounds-checked; references returned by functions are reference-counted to avoid dangling references; there is no way to perform arbitrary pointer arithmetic.

This has two implications. First, many types must be "special." For example, it is not possible to implement an efficient list or dictionary in pure Python. Instead, the CPython interpreter implements them internally in C. Second, access to external functions (functions not implemented in Python), known as a foreign function interface (FFI), requires the special ctypes module and breaks the language's safety guarantees.

In a sense, this means that nothing written in Python is guaranteed to access memory safely.

Safety in Rust

Rust also provides safety, but instead of implementing the unsafe structures in C, it provides an escape hatch: the unsafe keyword. This means that the fundamental data structures in Rust, such as Vec, VecDeque, BTreeMap and String, are implemented in Rust itself.

You may ask: "But if Rust provides an escape hatch from its own safety guarantees, and the standard library is implemented using that escape hatch, wouldn't everything in Rust be considered unsafe?"

In a word, dear reader: yes, exactly as in Python. Let's break it down.

What is prohibited in safe Rust?

Safety in Rust is well defined: we think about it a lot. In short, safe Rust programs cannot:

Dereference a pointer that points to a type other than the one the compiler knows about. This means there are no null pointers (because they point nowhere), no out-of-bounds errors or segmentation faults, and no buffer overflows. It also means there is no use after free and no double free (because freeing memory counts as dereferencing a pointer), and no type punning.
Have multiple mutable references to an object, or mutable and immutable references to an object at the same time. That is, if you hold a mutable reference to an object, it is the only reference to it, and if you hold an immutable reference to an object, the object will not change for as long as you hold it. This means that it is impossible to cause a data race in safe Rust, a guarantee that most other safe languages cannot provide.

Rust encodes this information in the type system, either with algebraic data types, such as Option<T> to indicate the presence or absence of a value and Result<T, E> to indicate success or failure, or with references and their lifetimes, for example &T vs &mut T to denote a shared (immutable) reference and an exclusive (mutable) reference, and &'a T vs &'b T to distinguish references that are valid in different contexts (lifetimes are usually omitted because the compiler is smart enough to infer them).
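As a rough illustration of the paragraph above, here is a minimal sketch of how these types appear in ordinary signatures. All function names are hypothetical and exist only for this example.

```rust
// Hypothetical helper: the value may be absent, so the return type is Option.
fn first_even(values: &[u64]) -> Option<u64> {
    values.iter().copied().find(|v| v % 2 == 0)
}

// Hypothetical helper: parsing can fail, so the return type is Result.
fn parse_port(text: &str) -> Result<u16, std::num::ParseIntError> {
    text.trim().parse::<u16>()
}

// A shared reference (&T) only permits reading.
fn total(values: &[u64]) -> u64 {
    values.iter().sum()
}

// An exclusive reference (&mut T) permits modification.
fn double_all(values: &mut [u64]) {
    for v in values.iter_mut() {
        *v *= 2;
    }
}

// An explicit lifetime: the result is valid only while both inputs are.
fn longer<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}

fn main() {
    let mut data: Vec<u64> = vec![1, 3, 4];
    println!("{:?}", first_even(&data));
    println!("{:?}", parse_port("8080"));
    double_all(&mut data);
    println!("{}", total(&data));
    println!("{}", longer("ab", "c"));
}
```

The caller cannot forget the "absent" or "error" case, because Option and Result must be unpacked before the inner value can be used.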

Examples

For example, the following code will not compile because it contains a dangling reference. More specifically, my_struct does not live long enough. In other words, the function would return a reference to something that no longer exists, and therefore the compiler cannot (and, in fact, does not even know how to) compile it.

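    // Assumes a definition such as: struct MyStruct { value: u64 }
    fn dangling_reference(v: &u64) -> &MyStruct {
        // Create a new MyStruct whose value field is copied from v.
        let my_struct = MyStruct { value: *v };
        // Return a reference to the local variable my_struct.
        return &my_struct;
        // Does not compile: my_struct is dropped here (popped off the stack).
    }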

This code does the same thing, but it tries to get around this problem by placing the value on the heap (Box is the name of the basic smart pointer in Rust).

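    // Assumes a definition such as: struct MyStruct { value: u64 }
    fn dangling_heap_reference(v: &u64) -> &Box<MyStruct> {
        let my_struct = MyStruct { value: *v };
        // Allocate a Box on the heap and move my_struct into it.
        let my_box = Box::new(my_struct);
        // Return a reference to the local variable my_box.
        return &my_box;
        // Does not compile: my_box is dropped here. It owns my_struct and frees
        // its heap memory, so the MyStruct no longer exists.
    }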

The correct code returns the Box itself instead of a reference to it. This encodes a transfer of ownership (responsibility for freeing the memory) in the function signature. The signature makes it clear that the calling code is responsible for what happens to the Box, and, indeed, the compiler handles this automatically.

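    // Assumes a definition such as: struct MyStruct { value: u64 }
    fn no_dangling_reference(v: &u64) -> Box<MyStruct> {
        let my_struct = MyStruct { value: *v };
        let my_box = Box::new(my_struct);
        // my_box is returned by value.
        return my_box;
        // Ownership moves to the caller. Nothing is freed here; the caller now
        // owns the Box<MyStruct>, and the memory is freed when the caller lets
        // it go out of scope.
    }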

Some bad things are not forbidden in safe Rust. For example, the compiler allows you to:
cause a deadlock in the program
leak an arbitrary amount of memory
fail to close file handles, database connections or missile silo covers
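To make the point about leaks concrete, here is a minimal sketch that leaks memory using only safe code. The function names and the Node type are hypothetical; Box::leak and Rc are standard library APIs.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Leak a heap allocation on purpose. Box::leak is a safe function: it hands
// back a 'static reference and the memory is simply never freed.
fn leak_value(v: u64) -> &'static mut u64 {
    Box::leak(Box::new(v))
}

// Hypothetical node type used to build a reference cycle.
struct Node {
    next: RefCell<Option<Rc<Node>>>,
}

// Build two nodes that point at each other. When the function returns, each
// node is still kept alive by the other, so neither is ever freed.
// Returns the strong count of the first node just before the leak happens.
fn make_cycle() -> usize {
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(Rc::clone(&a))) });
    *a.next.borrow_mut() = Some(Rc::clone(&b));
    Rc::strong_count(&a)
}

fn main() {
    let leaked = leak_value(41);
    *leaked += 1;
    println!("leaked value: {}", leaked);
    println!("strong count inside the cycle: {}", make_cycle());
}
```

Neither function needs the unsafe keyword, because leaking memory cannot cause an invalid memory access.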

The strength of the Rust ecosystem is that many projects choose to use the type system to enforce as much correctness as possible, but the compiler does not require this, except where memory safety is concerned.

What is allowed in unsafe Rust?

Unsafe Rust code is Rust code that uses the unsafe keyword. unsafe can be applied to a function or to a code block. Applied to a function, it means "this function requires the calling code to manually uphold an invariant that the compiler would normally uphold." Applied to a block of code, it means "this block manually upholds the invariants needed to prevent unsafe memory access, and is therefore allowed to do unsafe things."

In other words, unsafe on a function means "you have to check everything", and unsafe on a code block means "I have already checked everything."
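The two meanings can be sketched side by side. The function names below are hypothetical; slice::get_unchecked is a standard library method that skips the bounds check.

```rust
/// Reads the element at `index` without a bounds check.
///
/// # Safety
/// The caller must guarantee that `index < values.len()`.
unsafe fn get_unchecked_at(values: &[u64], index: usize) -> u64 {
    // SAFETY: the caller promised that index is in bounds.
    unsafe { *values.get_unchecked(index) }
}

/// Safe wrapper: it checks the invariant itself, so callers do not have to.
fn get_or_zero(values: &[u64], index: usize) -> u64 {
    if index < values.len() {
        // SAFETY: index was checked against the length just above.
        unsafe { get_unchecked_at(values, index) }
    } else {
        0
    }
}

fn main() {
    let values = [10, 20, 30];
    println!("{}", get_or_zero(&values, 1));
    println!("{}", get_or_zero(&values, 3));
}
```

The unsafe fn pushes the obligation onto its caller; the unsafe block inside get_or_zero is where that obligation is discharged.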

As noted in The Rust Programming Language, the code in a block marked with the unsafe keyword can:

Dereference a raw pointer. This is the key "superpower" that makes it possible to implement doubly linked lists, hash maps, and other fundamental data structures.
Call an unsafe function or method. More about this below.
Access or modify a mutable static variable. Static variables whose access is not controlled cannot be checked statically, so using them is unsafe.
Implement an unsafe trait. Unsafe traits are used to mark whether particular types guarantee certain invariants. For example, Send and Sync determine whether a type can be sent across thread boundaries or be used by several threads simultaneously.
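A minimal sketch of three of these superpowers follows (the fourth, calling an unsafe function, appears in the zeroed helper). The names COUNTER, ZeroValid and the helper functions are hypothetical and exist only for illustration.

```rust
// Dereferencing a raw pointer. Creating the pointer is safe; reading through
// it is not.
fn read_through_raw(value: &u64) -> u64 {
    let ptr: *const u64 = value;
    // SAFETY: ptr was derived from a live reference, so it is valid to read.
    unsafe { *ptr }
}

// A mutable static variable. Every access requires unsafe.
static mut COUNTER: u32 = 0;

fn bump_counter() -> u32 {
    // SAFETY: this sketch only ever calls bump_counter from a single thread.
    unsafe {
        COUNTER = COUNTER + 1;
        COUNTER
    }
}

// A hypothetical unsafe trait: implementors promise that the all-zero bit
// pattern is a valid value of the type.
unsafe trait ZeroValid {}

// SAFETY: zero is a valid u32.
unsafe impl ZeroValid for u32 {}

fn zeroed<T: ZeroValid>() -> T {
    // SAFETY: guaranteed by the ZeroValid contract. std::mem::zeroed is itself
    // an unsafe function, so this is also an unsafe function call.
    unsafe { std::mem::zeroed() }
}

fn main() {
    println!("{}", read_through_raw(&1337));
    println!("{}", bump_counter());
    println!("{}", zeroed::<u32>());
}
```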

Remember the examples with dangling references above? Add the word unsafe, and the compiler will complain twice as much, because it does not like unsafe being used where it is not needed.

Instead, the unsafe keyword is used to implement safe abstractions on top of arbitrary pointer operations. For example, the Vec type is implemented using unsafe, but it is safe to use, because it checks every attempt to access its elements and does not allow out-of-bounds access. It does provide operations such as set_len that can cause unsafe memory access, but they are marked as unsafe.
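As a rough sketch of both sides of Vec: the safe get method returns None for an out-of-bounds index, while set_len requires an unsafe block and a manual argument for why it is sound. The filled function is a hypothetical helper written for this example.

```rust
// Build a Vec<u8> of `len` copies of `byte` by writing into spare capacity
// and then adjusting the length by hand.
fn filled(len: usize, byte: u8) -> Vec<u8> {
    let mut buf: Vec<u8> = Vec::with_capacity(len);
    // SAFETY: the capacity is at least `len`, and write_bytes initializes the
    // first `len` bytes before set_len exposes them as elements.
    unsafe {
        std::ptr::write_bytes(buf.as_mut_ptr(), byte, len);
        buf.set_len(len);
    }
    buf
}

fn main() {
    let v = vec![10u64, 20, 30];
    // Safe access is bounds-checked: an invalid index yields None.
    assert_eq!(v.get(1), Some(&20));
    assert_eq!(v.get(99), None);

    println!("{:?}", filled(4, 0xAB));
}
```

In everyday code vec![byte; len] does the same job safely; the unsafe version is shown only to illustrate what set_len demands of its caller.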

For example, we could do the same thing as in the no_dangling_reference example, but with raw pointers, which will soon require a gratuitous use of unsafe:

    fn manual_heap_reference(v: u64) -> *mut MyStruct {
        let my_struct = MyStruct { value: v };
        let my_box = Box::new(my_struct);
        // Convert the Box into a raw pointer.
        let struct_pointer = Box::into_raw(my_box);
        return struct_pointer;
        // Only the pointer is returned; nothing is freed.
        // The MyStruct stays on the heap.
    }

Notice the absence of the word unsafe. Creating raw pointers is perfectly safe. As written, this function risks a memory leak, but nothing more, and memory leaks are safe. Calling this function is also safe. unsafe is required only when something tries to use the pointer, for example by dereferencing it. As an added bonus, converting the pointer back into a Box, as below, means the allocated memory is released automatically.

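    fn main() {
        let my_pointer = manual_heap_reference(1337);
        // Take ownership of the allocation back from the raw pointer.
        let my_boxed_struct = unsafe { Box::from_raw(my_pointer) };
        // Prints "Value: 1337"
        println!("Value: {}", my_boxed_struct.value);
        // my_boxed_struct goes out of scope here. It owns the heap memory,
        // so the memory holding the MyStruct is freed.
    }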

After optimization, this code is equivalent to simply returning the Box. Box is a safe pointer-based abstraction because it prevents pointers from being handed out all over the place. For example, the next version of main leads to a double free.

    fn main() {
        let my_pointer = manual_heap_reference(1337);
        let my_boxed_struct_1 = unsafe { Box::from_raw(my_pointer) };
        // DOUBLE FREE BUG!
        let my_boxed_struct_2 = unsafe { Box::from_raw(my_pointer) };
        // Prints "Value: 1337" twice.
        println!("Value: {}", my_boxed_struct_1.value);
        println!("Value: {}", my_boxed_struct_2.value);
        // my_boxed_struct_2 goes out of scope here. It owns the heap memory,
        // so the memory holding the MyStruct is freed.
        // Then my_boxed_struct_1 goes out of scope. It also believes it owns
        // that memory, so it frees the MyStruct again. This is the double-free bug.
    }

So what is a safe abstraction?

A safe abstraction is an abstraction that uses the type system to provide an API that cannot be used to violate the safety guarantees described above. Box is safer than *mut T because it cannot lead to the double free illustrated above.

Another example is the Rc type in Rust. It is a reference-counting pointer: a read-only reference to data located on the heap. Since it allows several simultaneous accesses to a single memory area, it must prevent mutation in order to be considered safe.

In addition, it is not thread safe. If you need thread safety, you have to use the Arc type (Atomic Reference Counting), which carries a performance penalty because it uses atomic values to count the references and prevent possible data races in multi-threaded environments.

The compiler will not allow you to use Rc where you need Arc, because the authors of the Rc type did not mark it as thread-safe. If they had, it would have been unsound: a false promise of safety.
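A small sketch of the difference: the function below shares a vector between threads through Arc. The name parallel_sum and the way the work is split are hypothetical choices for this example.

```rust
use std::sync::Arc;
use std::thread;

// Sum a vector using `workers` threads that all read the same shared data.
fn parallel_sum(values: Vec<u64>, workers: usize) -> u64 {
    let shared = Arc::new(values);
    let handles: Vec<_> = (0..workers)
        .map(|w| {
            // Each thread gets its own handle; the count is updated atomically.
            let data = Arc::clone(&shared);
            // Worker w sums the elements at positions w, w + workers, ...
            thread::spawn(move || data.iter().skip(w).step_by(workers).sum::<u64>())
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .sum()
}

fn main() {
    println!("{}", parallel_sum((1..=100).collect(), 4));
}
```

Replacing Arc with Rc here is rejected at compile time, because Rc does not implement Send and thread::spawn requires everything its closure captures to be Send.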

When is unsafe Rust needed?

Unsafe Rust is needed whenever you have to perform an operation that violates one of the two rules described above. For example, a doubly linked list is useless without multiple mutable references to the same data (one from the next element and one from the previous element). With unsafe, the implementer of a doubly linked list can write the code using *mut Node pointers and then encapsulate it in a safe abstraction.
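The following is a deliberately small sketch of that idea, not a production-quality list: the nodes are linked with *mut Node pointers, while the public API (new, push_back, pop_front, pop_back) is entirely safe. The type name Deque and its methods are assumptions made for this example.

```rust
use std::ptr;

struct Node {
    value: u64,
    prev: *mut Node,
    next: *mut Node,
}

/// A minimal doubly linked list of u64 values with a safe public API.
pub struct Deque {
    head: *mut Node,
    tail: *mut Node,
}

impl Deque {
    pub fn new() -> Self {
        Deque { head: ptr::null_mut(), tail: ptr::null_mut() }
    }

    pub fn push_back(&mut self, value: u64) {
        let node = Box::into_raw(Box::new(Node {
            value,
            prev: self.tail,
            next: ptr::null_mut(),
        }));
        if self.tail.is_null() {
            self.head = node;
        } else {
            // SAFETY: tail is non-null, came from Box::into_raw, and has not
            // been freed yet.
            unsafe { (*self.tail).next = node };
        }
        self.tail = node;
    }

    pub fn pop_front(&mut self) -> Option<u64> {
        if self.head.is_null() {
            return None;
        }
        // SAFETY: head is non-null and is reclaimed exactly once, here.
        let node = unsafe { Box::from_raw(self.head) };
        self.head = node.next;
        if self.head.is_null() {
            self.tail = ptr::null_mut();
        } else {
            // SAFETY: the new head is a live node owned by this list.
            unsafe { (*self.head).prev = ptr::null_mut() };
        }
        Some(node.value)
    }

    pub fn pop_back(&mut self) -> Option<u64> {
        if self.tail.is_null() {
            return None;
        }
        // SAFETY: tail is non-null and is reclaimed exactly once, here.
        let node = unsafe { Box::from_raw(self.tail) };
        self.tail = node.prev;
        if self.tail.is_null() {
            self.head = ptr::null_mut();
        } else {
            // SAFETY: the new tail is a live node owned by this list.
            unsafe { (*self.tail).next = ptr::null_mut() };
        }
        Some(node.value)
    }
}

impl Drop for Deque {
    // Free any nodes that are still linked when the list goes away.
    fn drop(&mut self) {
        while self.pop_front().is_some() {}
    }
}

fn main() {
    let mut list = Deque::new();
    list.push_back(1);
    list.push_back(2);
    list.push_back(3);
    println!("{:?} {:?}", list.pop_front(), list.pop_back());
}
```

Users of Deque never see a raw pointer, so they cannot trigger a double free or a dangling access through this API; the invariants are maintained inside the unsafe blocks.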

Another example is working with embedded systems. Microcontrollers often expose a set of registers whose values are determined by the physical state of the device. The world cannot stop while you hold a &mut u8 to such a register, so the crates that support these devices need unsafe. As a rule, such crates encapsulate the state in transparent safe wrappers that copy data wherever possible, or use other techniques that give the compiler the guarantees it needs.
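A heavily simplified sketch of such a wrapper is shown below. The Register type is hypothetical and does not mirror any particular device crate; on a real microcontroller the address would come from the chip's memory map, whereas here an ordinary local variable stands in for the hardware so the example can run anywhere. ptr::read_volatile and ptr::write_volatile are standard library functions.

```rust
use std::ptr;

/// Hypothetical wrapper around a single 8-bit memory-mapped register.
pub struct Register {
    addr: *mut u8,
}

impl Register {
    /// # Safety
    /// `addr` must be valid for volatile reads and writes for as long as the
    /// returned Register is in use.
    pub unsafe fn new(addr: *mut u8) -> Self {
        Register { addr }
    }

    /// Copy the current value out of the register.
    pub fn read(&self) -> u8 {
        // SAFETY: validity of addr was promised when the Register was created.
        unsafe { ptr::read_volatile(self.addr) }
    }

    /// Overwrite the register with a new value.
    pub fn write(&mut self, value: u8) {
        // SAFETY: validity of addr was promised when the Register was created.
        unsafe { ptr::write_volatile(self.addr, value) }
    }

    /// Set the bits in `mask`, leaving the other bits unchanged.
    pub fn set_bits(&mut self, mask: u8) {
        let current = self.read();
        self.write(current | mask);
    }
}

fn main() {
    // Stand-in for real hardware: an ordinary byte on the stack.
    let mut fake_hardware: u8 = 0b0000_0001;
    // SAFETY: fake_hardware outlives reg and is not accessed directly while
    // reg is in use.
    let mut reg = unsafe { Register::new(&mut fake_hardware) };
    reg.set_bits(0b1000_0000);
    println!("{:#010b}", reg.read());
}
```

The single unsafe obligation sits in the constructor; after that, read, write and set_bits are safe to call and only ever copy values in and out.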

Sometimes it is necessary to perform an operation that could lead to simultaneous reading and writing, or to unsafe memory access, and this is where unsafe is needed. But as long as it is possible to make sure that the safety invariants are upheld before the user of the safe code (that is, code not marked unsafe) touches anything, everything is fine.

On whose shoulders does this responsibility lie?

We come back to the statement made earlier: yes, the usefulness of Rust code rests on unsafe code. Although this is done somewhat differently from the unsafe implementation of the underlying data structures in Python, the implementations of Vec, HashMap, and so on have to use pointer manipulation to some degree.

We say that Rust is safe under the fundamental assumption that the unsafe code we use through our dependencies, whether the standard library or other libraries, is correctly written and encapsulated. The fundamental advantage of Rust is that unsafe code is confined to unsafe blocks, which must be carefully checked by their authors.

In Python, the burden of checking the safety of memory manipulation rests solely with the developers of interpreters and the users of foreign function interfaces. In C, this burden is on every programmer.

In Rust, it lies with the users of the unsafe keyword. This makes sense, since inside such code the invariants must be maintained by hand, and so one should strive for as little of it as possible in a library or application. Unsafety is identified, isolated, and marked. Therefore, if your Rust code segfaults, you have found either a bug in the compiler or a bug in the few lines of your unsafe code.

This is not a perfect system, but if you need speed, safety and multithreading at the same time, it is the only option.

Source: https://habr.com/ru/post/460295/
