📜 ⬆️ ⬇️

Ownership and borrowing in D

Almost all non-trivial programs allocate and use dynamic memory. Doing this correctly is becoming more and more important as programs become more and more complex and errors even more expensive.

Typical problems are:

  1. memory leaks (no more free memory is used)
  2. double release (memory release more than once)
  3. use after release (use a pointer to a memory previously released)

The task is to track the pointers responsible for freeing the memory (i.e., owning memory), and to distinguish pointers that simply point to the memory area, control where they are and which of them are active (in scope).

Typical solutions are as follows:
')
  1. Garbage Collection (GC) - GC owns the memory blocks and periodically scans them for the presence of pointers to these blocks. If the pointers are not found, the memory is released. This scheme is reliable and is used in languages ​​such as Go and Java. But GC has a tendency to use much more memory than necessary, it has pauses and slows down the code due to repacking (orig.inserted write gates).
  2. Reference Counting (RC) - An RC object owns the memory and stores a counter of pointers to itself. When this counter decreases to zero, the memory is released. It is also a reliable mechanism and is adopted in languages ​​like C ++ and ObjectiveC. RC is effective in memory, requiring additionally only a place under the counter. The negative sides of RC are the overhead of maintaining the counter, embedding the exception handler to ensure its reduction, and the locks necessary for objects shared between the threads of the program. To improve performance, programmers sometimes tricked, temporarily referring to an RC object to bypass the counter, causing the risk of doing this incorrectly.
  3. Manual Management - Manual memory management is Sishnye malloc and free. It is fast and efficient in using memory, but the language does not help at all to do everything correctly, relying entirely on the experience and diligence of the programmer. I have been using malloc and free for 35 years, and with the help of a bitter and endless experience I rarely make mistakes. But this is not the way that programming technology can rely on, and notice that I said “rarely” and not “never.”

Solutions 2 and 3 in one way or another rely on the faith in the programmer to do everything correctly. Faith-based systems do not scale well, and memory management errors have been proven to be very difficult to recheck (so bad that the use of dynamic memory is prohibited by some coding standards).

But there is also a fourth way - ownership and borrowing (Ownership and Borrowing, OB). It is memory efficient, as fast as manual control, and is subject to automated rechecking. The method is recently popularized by the Rust programming language. It also has its drawbacks, in particular the need to rethink the planning of algorithms and data structures.

Negative sides can be dealt with and the rest of this article is a schematic description of how the OB system works and how we suggest writing it in the D language. I initially thought this was impossible, but after spending a lot of time thinking, I found a way. It is similar to what we did with functional programming — with transitive immunity and “pure” functions.

Possession


The decision, who owns the object in memory, is ridiculously simple - there is a single pointer to the object and it is the owner. He is responsible for the release of memory, after which it becomes invalid. Due to the fact that the pointer to an object in memory is the owner, there are no other pointers inside this data structure, and therefore the data structure forms a tree.

The second consequence is that pointers use the semantics of moving rather than copying:

T* f(); void g(T*); T* p = f(); T* q = p; //  p   q,    g(p); // , p   

Indexing from inside the data structure is prohibited:
 struct S { T* p; } S* f(); S* s = f(); T* q = sp; // ,      sp 

Why not just mark sp as an invalid value? The problem is that it will require the label to be set at runtime, and must be solved at the compilation stage, because it is simply considered a compilation error.

The output of the owner pointer over the scope is also an error:

 void h() { T* p = f(); } // ,   p? 

You must move the pointer value differently:
 void g(T*); void h() { T* p = f(); g(p); //   g(),    g() } 

This perfectly solves the problem of memory leaks and use after release (Hint: to make it clearer, replace f () with malloc (), and g () with free ().)

All of this can be verified at compile time using the Data Flow Analysis (DFA) technique, just like it is used to remove general subexpressions . DFA can unwind any rat tangle from program transitions that may occur.

Borrowing


The tenure system described above is reliable, but it is too restrictive.
Consider:

 struct S { void car(); void bar(); } struct S* f(); S* s = f(); s.car(); // s   car() s.bar(); // , s  

For this to work, s.car () must have a way to get the pointer back on exit.

This is how borrowing works. s.car () borrows a copy of s for the execution time of s.car (). s is invalid for execution time, and becomes valid again when s.car () is completed.

In the D language, the member functions of a struct receive a this pointer by reference, so we can adjust the borrowing with a small extension: getting the argument by reference borrows it.

D also supports the scope for pointers, so borrowing is natural:

 void g(scope T*); T* f(); T* p = f(); g(p); // g()  p g(p); //    p     g() 

(When functions get arguments by reference, or scopes are used, they are not allowed to extend beyond the bounds of the function or scope. This corresponds to the borrowing semantics.)

Borrowing in this way guarantees the uniqueness of a pointer to an object in memory at any given moment.

Borrowing can be extended further with the understanding that the ownership system is also reliable, even if the object is indicated additionally by several constant pointers (but only one is mutable). A constant pointer can neither change memory nor free it. This means that several constant pointers can be borrowed from the mutable owner, but he has no right to be used while these constant pointers are alive.

For example:

 T* f(); void g(T*); T* p = f(); // p   { scope const T* q = p; //    scope const T* r = p; //    g(p); // , p   q  r    } g(p); // ok 

Principles


The above can be reduced to the following understanding that an object in memory behaves as if it were in one of two states:

  1. there is exactly one mutable pointer to it
  2. there are one or more additional constant pointers

The attentive reader will notice something strange in what I wrote: "as if." What did I want to hint at? What the hell is going on? Yes, there is. Computer programming languages ​​are full of such "as if" under the hood, just like the money in your bank account is actually not there (I apologize if it was a rough shock for someone), and it is no different from that. Read on!

But first, a little deeper into the topic.

Integration of Ownership / Borrowing Technology in D


Are not these techniques incompatible with how people usually write in D, and will not almost all existing programs in D break? And it’s not that easy to fix, but so much so that all algorithms will have to be redesigned from scratch?

Yes indeed. Is that D has a (almost) secret weapon: the attributes of functions. It turns out that the semantics of ownership / borrowing (OB) can be implemented for each function separately after the usual semantic analysis. The attentive reader may have noticed that no new syntax has been added, only restrictions have been placed on the existing code. In D, there is already a history of using function attributes to change their semantics, for example, the pure attribute to create "pure" functions. To enable the OB semantics, the @ live attribute is added.

This means that OB can be added to the D code gradually, as needed and free resources. This makes it possible to add OBs, and this is critical, constantly maintaining the project in a fully working, tested and ready state in the release. It also allows you to automate the control process, what percentage of the project has already been transferred to the OB. This technique is added to the list of other guarantees of D for reliability of working with memory (such as monitoring the non-proliferation of pointers to temporary variables on the stack).

As if


Some necessary things can not be implemented with strict obeying OB, such as objects with reference counting. In the end, RC objects are designed to have many pointers to them. Since RC objects are safe when working with memory (if correctly implemented), they can be used with an OB without negatively affecting reliability. They just can not be created with the OB technique. The solution is that there are other function attributes in D, for example @ system . @ system is a feature where many reliability checks are disabled. Naturally, OB will also be disabled in the code with @ system . This is where the implementation of the RC technology is hidden from the OB control.

But in the code with an OB, the RC object looks as if it observes all the rules, so there are problems!

It takes a certain number of such library types to successfully work with OBs.

Conclusion


This article is a basic overview of the OB technique. I am working on a much more detailed specification. It is possible that I missed something and somewhere there is a hole below the waterline, but so far everything looks good. This is a very exciting development for D and I look forward to its implementation.

For further discussions and comments from Walter, refer to the topics on / r / programming subreddit and on Hacker News .

Source: https://habr.com/ru/post/460671/


All Articles