
Leakpocalypse: Rust can surprise unpleasantly

Translator's note: someone had to translate this article, even though it is fairly old (2015), because it demonstrates a very important aspect of memory handling in Rust: using only safe code (nothing marked unsafe), you can create memory leaks. This should sober up those who believe in an all-powerful borrow checker.
Spoiler: inside you will find the impossibility of tracking cyclic references, as well as old ailments of some std types that had been safely cured by the time of translation.
Despite the Book already having a chapter touching on this (thanks to ozkriff for the reminder), as well as an explanatory article by the Russian-speaking community (thanks to mkpankov), I decided to translate this piece to clearly show how serious a misunderstanding of Rust's memory management can be.
Most likely this article was not translated earlier because of the author's very particular wording, which the UFO (Habr's moderation) would not let through unchanged. For this reason the translation is not entirely literal.


The leak


Like a bolt from the blue, a bug report about circular references to std::thread::JoinGuard plunged the community into an abyss of existential horror. If you follow the news and already know about the leak, feel free to skip this section and move on to the next one. If you follow my comments in the community, then you have nothing to read here at all; you can safely close this window and return to studying my comments.


So, the bug:


Using circular references, you can leak a JoinGuard, after which the scoped thread can access an already freed region of memory.

An extremely serious claim, since every API involved is marked safe, which should rule out such behavior of safe code in principle.


The spotlight is on thread::scoped, which creates a thread with access to the stack frame of another thread, with safety statically guaranteed by the compiler. The safety rests on the JoinGuard returned by thread::scoped: its destructor joins (blocks on) the spawned thread, and it lives in the stack frame that owns the borrowed data; accordingly, nothing borrowed by the spawned thread can outlive the JoinGuard. This allows you to implement quite useful things like:


    use std::thread;

    fn increment_elements(slice: &mut [u32]) {
        for a in slice.iter_mut() {
            *a += 1;
        }
    }

    fn main() {
        let mut v = Vec::new();
        for i in 0..100 {
            v.push(i);
        }

        let mut threads = Vec::new();
        for slice in v.chunks_mut(10) {
            threads.push(thread::scoped(move || {
                increment_elements(slice);
            }));
        }
        // the JoinGuards stored in `threads` are dropped here, so `main`
        // blocks until every spawned thread has finished
    }

Here each thread does some useful work on its own chunk of ten elements of the array (well, sort of: this is just an example, imagine a real payload yourself). Magically, Rust is able to statically guarantee that this data access is safe without even knowing what exactly happens in the child threads! All it needs to see is that the JoinGuards of the threads borrow v; consequently v cannot die before the threads finish. (More precisely, Rust does not even care about the `threads` array itself: it is enough that the guards borrow v, even if `threads` stayed empty.)


Which is very cool. And, alas, wrong.


It was assumed that destructors (hereinafter meaning implementations of Drop — translator's note) are guaranteed to run in safe code, i.e. code not marked unsafe. What can I say: your humble servant, like many in the community, grew up believing in this postulate. After all, we even have a dedicated function that deliberately detaches a value without calling its destructor, mem::forget, and in the name of this very postulate it was intentionally marked unsafe!


As it turned out, that marking was just an echo of older API designs. In reality, mem::forget is already used from safe code in several places in the standard library. And while almost all of those places were later written off as implementation errors, the one that remains is quite fundamental: rc::Rc.


Rc is a reference-counted smart pointer. It is very simple: put the data you want to share between multiple handles into Rc::new and use it. clone() an Rc and the counter goes up; drop() a clone and the counter goes down. The borrow checker keeps the rest honest: tracking lifetimes ensures that references to the data are released before the Rc through which they were obtained is dropped.
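A quick sketch from the translator (not in the original) of this counting in action; Rc::strong_count just makes the counter visible:

    use std::rc::Rc;

    fn main() {
        let a = Rc::new(vec![1, 2, 3]);        // counter == 1
        assert_eq!(Rc::strong_count(&a), 1);

        let b = a.clone();                     // counter == 2, the Vec itself is not copied
        assert_eq!(Rc::strong_count(&a), 2);
        assert_eq!(*b, vec![1, 2, 3]);         // both handles see the same data

        drop(b);                               // counter == 1 again
        assert_eq!(Rc::strong_count(&a), 1);
    }   // `a` is dropped here, the counter hits 0, and the Vec's destructor finally runs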


On its own, Rc is fairly tame: thanks to the counting scheme we only ever hand out references to the contents, the data itself stays put until the last handle is gone, and nothing can be read outside of that memory... except in the case of interior mutability. Although handing out copies of references to data in Rust implies immutability (of the data, not of the references), the data can still be changed as an exception to the rule. That is what Cell and RefCell are for: their insides can be changed even while shared. This is also exactly why the Cell types are marked as not thread-safe; across threads you use sync::Mutex instead.
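Another small translator's sketch (not in the original): interior mutability through RefCell behind shared Rc handles, with no unsafe anywhere:

    use std::cell::RefCell;
    use std::rc::Rc;

    fn main() {
        // two owners of the same mutable counter
        let shared = Rc::new(RefCell::new(0));
        let alias = shared.clone();

        *shared.borrow_mut() += 1;   // mutate through one handle...
        *alias.borrow_mut() += 1;    // ...and through the other

        assert_eq!(*shared.borrow(), 2);
        // RefCell enforces the borrow rules at runtime instead of compile time:
        // overlapping borrow_mut() calls would panic. And Rc<RefCell<T>> is not
        // Send, so none of this is allowed to cross a thread boundary.
    }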


Now let's mix Rc with RefCell and write a safe forget:


    fn safe_forget<T>(data: T) {
        use std::rc::Rc;
        use std::cell::RefCell;

        struct Leak<T> {
            cycle: RefCell<Option<Rc<Rc<Leak<T>>>>>,
            data: T,
        }

        let e = Rc::new(Leak {
            cycle: RefCell::new(None),
            data: data,
        });
        *e.cycle.borrow_mut() = Some(Rc::new(e.clone())); // close the cycle; `data` is now leaked
    }

This looks like utter heresy. In short, with Rc and RefCell you can build cyclically dependent references. The upshot is that the destructor of the T stored inside Leak<T> will never be called, even though we no longer hold any reference to the Rc. Skipping a destructor is not terrible in itself: the program could also be killed from outside, or spin in an infinite loop. But this case is different: from the borrow checker's point of view the value is gone, as if its destructor had already run. We can even check:


    fn main() {
        struct Foo<'a>(&'a mut i32);

        impl<'a> Drop for Foo<'a> {
            fn drop(&mut self) {
                *self.0 += 1;
            }
        }

        let mut data = 0;
        {
            let foo = Foo(&mut data);
            safe_forget(foo);
        }
        // the borrow of `data` has ended here: as far as the compiler is concerned
        // `foo` is gone, even though its destructor never actually ran
        data += 1;
        println!("{:?}", data); // prints 1; it would be 2 if the destructor had run
    }

It works. Now let's understand the problem from the original ticket that caused the panic. The ticket's example reduces to roughly the following:


    use std::thread;

    fn main() {
        let mut v = Ok(4);
        if let &mut Ok(ref v) = &mut v {
            let jg = thread::scoped(move || {
                println!("{}", v); // read the borrowed data from another thread
            });
            safe_forget(jg); // the JoinGuard is leaked: no destructor, no join
        }
        v = Err("foo"); // mutate the data while the other thread may still be reading it - data race
    }

We have achieved undefined behavior.
And this is undoubtedly a hole in the standard Rust library, since the absence of use-after-free and data races is exactly what is supposed to be promised and statically guaranteed, and we got both with a trivial example. The question is what exactly went wrong. The original ticket, predictably, points the finger at thread::scoped, since it was filed by a compiler developer fully aware that a destructor can be skipped from "safe" code.


This immediately set off a wave of frustration in the community: until that moment leaks had only ever been seen coming out of unsafe code, thread::scoped had already been stabilized, and mem::forget had been marked unsafe precisely because leaks were supposed to be utterly, absolutely impossible outside of unsafe!
After the dust settled, the bug turned out to require a confluence of circumstances:

  1. shared ownership through Rc;
  2. interior mutability through RefCell, which lets you close a reference cycle;
  3. an API (thread::scoped with its JoinGuard) whose memory safety relies on a destructor being guaranteed to run;

and only in this combination, no less, does the bug manifest itself. This immediately brought out people (alas, myself included) demanding, in one form or another, a compiler-level ban on any combination of these circumstances. To give you an idea of the level of despair, there was even an insistent request for a built-in Leak marker type that would make "a destructor may be skipped" part of the safe interface.


Bear in mind: release 1.0 is three weeks away; there is simply no time to implement any of the above. The only workable conclusions are to take mem::forget out of unsafe (since it does nothing you cannot already do without it) and to redesign thread::scoped. The relevant RFCs:



Please note: Rust never promised the absence of leaks in principle. Leaks, much like race conditions in general (as opposed to data races), are too fuzzy a concept to have clear boundaries, and are therefore practically impossible to rule out. Put something into a HashMap and never look it up again — that, too, is a kind of leak. Allocate something on the stack and then sit in an infinite loop — same thing. Any case of piling up data and then forgetting it exists is exactly the kind of leak that, for some reason, sends people into a panic when presented in this form. What can reasonably be called a bug is data whose destructor was never called during normal "safe" execution, even though, from the point of view of static analysis, that data no longer exists.
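To make that concrete, a translator's sketch (not in the original) of "leaks" that plain safe Rust has always allowed:

    use std::collections::HashMap;

    fn main() {
        // 1. Stash data in a long-lived map and never look it up again:
        //    morally a leak, yet perfectly safe.
        let mut cache: HashMap<u64, Vec<u8>> = HashMap::new();
        cache.insert(42, vec![0u8; 1024]);

        // 2. Allocate something and then leave the program without ever
        //    running destructors, e.g. by exiting the process directly.
        let buffer = vec![0u8; 1024];
        println!("parked {} bytes", cache[&42].len() + buffer.len());
        std::process::exit(0); // neither `cache` nor `buffer` is ever dropped
    }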


What to do?


So there I was, dejected and confused. In my mind, working with collections had always implied guaranteed destructors. They opened up wonderful possibilities: you could hand out a handle into an arbitrarily complex, interdependent type — a handle that behaves transparently but hides all the guts — and have it put everything back in order when it is dropped. Combined with Rust's lifetimes, this seemed to guarantee that the temporarily inconsistent state of the data behind such a handle is statically unobservable from the outside! That lets Rust get away with tricks even C would not dare!
A typical example of this is Vec::drain_range :


    // A handle for lazily removing the sub-range `[a, b)` out of a Vec,
    // one element at a time.
    struct DrainRange<'a, T> {
        vec: &'a mut Vec<T>,
        num_to_drain: usize,
        start_pos: usize,
        left: *mut T,
        right: *mut T,
    }

    impl<T> Vec<T> {
        // Produces an iterator that removes the elements in `self[a..b]`.
        // Note: `b` is exclusive, as is conventional for Rust ranges.
        fn drain_range(&mut self, a: usize, b: usize) -> DrainRange<T> {
            assert!(a <= b, "invalid range");
            assert!(b <= self.len(), "index out of bounds");

            DrainRange {
                left: self.ptr().offset(a as isize),
                right: self.ptr().offset(b as isize),
                start_pos: a,
                num_to_drain: b - a,
                vec: self,
            }
        }
    }

    impl<'a, T> Drop for DrainRange<'a, T> {
        fn drop(&mut self) {
            // finish draining: read and drop any elements not yet consumed
            for _ in self.by_ref() { }

            let ptr = self.vec.ptr();
            let backshift_src = self.start_pos + self.num_to_drain;
            let backshift_dst = self.start_pos;
            let old_len = self.vec.len();
            let new_len = old_len - self.num_to_drain;
            let to_move = new_len - self.start_pos;

            unsafe {
                // close the gap left by the drained elements!
                ptr::copy(
                    ptr.offset(backshift_src as isize),
                    ptr.offset(backshift_dst as isize),
                    to_move,
                );
                // fix up the Vec so its length matches what is actually left
                self.vec.set_len(new_len);
            }
        }
    }

    // iterate over the drained elements
    impl<'a, T> Iterator for DrainRange<'a, T> {
        type Item = T;
        fn next(&mut self) -> Option<T> {
            if self.left == self.right {
                None
            } else {
                unsafe {
                    let result = Some(ptr::read(self.left));
                    // (the size_of::<T>() == 0 case is ignored here)
                    self.left = self.left.offset(1);
                    result
                }
            }
        }
    }

    impl<'a, T> DoubleEndedIterator for DrainRange<'a, T> {
        fn next_back(&mut self) -> Option<T> {
            if self.left == self.right {
                None
            } else {
                unsafe {
                    // (the size_of::<T>() == 0 case is ignored here)
                    self.right = self.right.offset(-1);
                    Some(ptr::read(self.right))
                }
            }
        }
    }

It even handles unwinding correctly — class! But what if the destructor never runs? Look:


    fn main() {
        let mut vec = vec![Box::new(1)];
        {
            let mut drainer = vec.drain_range(0, 1);

            // pull out and destroy `box 1`, freeing the memory it points to
            drainer.next();

            safe_forget(drainer);
        }
        println!("{:?}", vec); // use-after-free: reads the freed `box 1`
    }

Not cool. I dug through the standard library's collection code looking for anything that relies on a destructor for safety and, fortunately, found nothing (which of course does not mean it is not there). More precisely, I was sorting everything into the following hierarchy of how nasty a skipped destructor is:


  1. No destructor — nothing to leak: primitives, raw pointers.
  2. Only system resources (memory) leak — not really a problem: most collections and smart pointers over primitives, where skipping the destructor merely leaks their allocation.
  3. Other destructors leak — unpleasant, but not fatal: collections and smart pointers over arbitrary structures.
  4. The world is left in an unexpected but safe state — a guaranteed shot in the foot: RingBuf::drain is supposed to empty the collection once the borrow held by its drain iterator expires, but today that is guaranteed only by the destructor. The collection itself stays consistent.
  5. The world is broken — unacceptable: the Vec::drain_range shown above.

In general, everything should be pushed as far up this hierarchy as it can go, and code from the last category must be excluded from any API not marked unsafe.
This hierarchy rests on the assumption that a skipped destructor during normal "safe" execution is a bug — no more, no less. Application code may treat leaks as something that does not happen at arbitrary points under sane conditions, while libraries, as is common practice in Rust, carry the responsibility of making sure a leak can never break safety.


The mere presence of code that can cause a leak in an API such as Rc should not be treated as a bug in the API or its implementation — all the more so if the leak is only possible under specific conditions created by the end user. And even though mem::forget has been removed from unsafe and can now be called from safe code, in the general case you must still treat the consequences of calling it with the same care as unsafe code.
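Indeed, in today's Rust mem::forget is a plain safe function. A tiny translator's sketch (not in the original) of what "safe but leaky" means:

    use std::mem;

    fn main() {
        let s = String::from("this heap allocation is about to be abandoned");
        mem::forget(s); // compiles without `unsafe`: the String's destructor never
                        // runs and its buffer leaks, but nothing is freed early,
                        // so nothing dangles and no memory safety is violated
        // The burden is on the caller: if some invariant elsewhere relied on that
        // destructor running, forgetting the value has just quietly broken it.
    }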


Rust can make you crap your pants


(most likely this heading is the reason the article is so rarely cited — translator's note)


So how do we protect Vec::drain_range and move it up the hierarchy? By adding the following to the drain_range constructor. It really is that simple!


    // Tell the Vec up front that everything from `a` onwards is already gone.
    // If the destructor never runs, we only leak elements; nothing dangles.
    unsafe { self.set_len(a); }

Now, if the DrainRange is leaked, we leak the destructors of the elements that were supposed to be drained, and we also completely lose the elements that should have been shifted back towards the start of the vector. This lands us squarely in the "unexpected but safe state" category — with, admittedly, a pile of extra leaks thrown in. Still bad, but a huge improvement over the use-after-free we had before!
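For what it's worth, the Drain iterator that eventually shipped in std appears to follow the same idea (sometimes called "leak amplification"): the vector's length is cut down up front, so leaking the iterator loses elements but never exposes freed memory. A translator's sketch of the observable behavior — note that exactly how many elements survive a leaked Drain is left unspecified by the documentation:

    use std::mem;

    fn main() {
        let mut v = vec![Box::new(1), Box::new(2), Box::new(3), Box::new(4)];

        let mut drain = v.drain(1..3); // ask to remove the middle two elements
        drain.next();                  // pull out `Box::new(2)`
        mem::forget(drain);            // leak the iterator: its destructor never runs

        // The vector is still in a safe, consistent state. We may have lost more
        // elements than we asked to remove (how many is unspecified), but every
        // element we can still reach is valid - no use-after-free.
        println!("{} element(s) left: {:?}", v.len(), v);
    }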


This is what I call the "Pre-Poop Your Pants" (PPYP) pattern: salvage the constructor's work, at least partially, even if the destructor never kicks in. Why such a dumb name? Well, I like to picture the life cycle between constructor and destructor as digestion. You eat something (constructor), process it, and make it to the toilet (destructor). If the destructor goes missing, the toilet is never reached. Depending on what exactly is stuck inside, the consequences vary:


  1. The type is broken at a structural level — that's it, nothing is going anywhere any more. Where's the surgeon?
  2. The type looks intact, but the system is already damaged and will break at any moment if the constipation drags on.
  3. The type is big and complex — with luck, there is nothing particularly unpleasant inside.
  4. The type is a safe drug — ignore the safety instructions and it can give you the spins and lead to worse, but under the right conditions you survive and return to service.
  5. The type is a dangerous drug — overdose and death are possible; it must be kept under the strictest possible control.

The "better to poop early" pattern, then, is the following deviation from the normal digestive process: if you have eaten something dubious, prepare for "premature trouble" in advance. Usually the trouble never comes, and you make it to the toilet on schedule. But if it does come, it is better to soil yourself and be embarrassed than to black out and wake up on the surgeon's table. The pattern boils down to the following:

  1. in the constructor (or as early as possible), put the data into a state that remains safe even if the destructor never runs — at worst a leak or a loss of data, never dangling memory;
  2. in the destructor, perform the full, proper cleanup and restore the nice state the rest of the program expects.

By the way, Vec::drain_range did manage to get saved this way!
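Speaking of the pattern itself, here is a toy sketch from the translator (a hypothetical SlotGuard type, not from the original article) of the same shape in ordinary safe code: the constructor immediately puts the shared state into a lossy-but-safe configuration, and only the destructor restores the nice state:

    // A guard that borrows a slot's value for temporary use and puts it back when done.
    struct SlotGuard<'a> {
        slot: &'a mut Option<String>,
        value: Option<String>,
    }

    impl<'a> SlotGuard<'a> {
        fn new(slot: &'a mut Option<String>) -> SlotGuard<'a> {
            // Pre-poop: take the value out immediately. If this guard is ever
            // leaked, the slot is simply left empty - data is lost, but nothing
            // is dangling and nothing lies about its state.
            let value = slot.take();
            SlotGuard { slot, value }
        }

        fn value_mut(&mut self) -> Option<&mut String> {
            self.value.as_mut()
        }
    }

    impl<'a> Drop for SlotGuard<'a> {
        fn drop(&mut self) {
            // Full cleanup on the normal path: put the value back into the slot.
            *self.slot = self.value.take();
        }
    }

    fn main() {
        let mut slot = Some(String::from("state"));
        {
            let mut guard = SlotGuard::new(&mut slot);
            if let Some(s) = guard.value_mut() {
                s.push_str(" (updated)");
            }
        } // guard dropped here: the (updated) value is returned to the slot
        assert_eq!(slot.as_deref(), Some("state (updated)"));
    }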



Source: https://habr.com/ru/post/348344/

