The stack is an implementation detail, part one

Some time ago I wrote that “references” should not be equated with “addresses” when describing C# and how its objects are placed in memory. References may well be implemented as addresses, but that is just an implementation detail, not what a “reference” means. Another implementation detail that is often mistaken for the essence is the claim that “memory for value types is allocated on the stack.” I see this often because it is what our documentation says.

Virtually every article I see on the difference between value types and reference types describes in detail (often incorrectly) what a stack is, and claims that the main difference is that value types are located on the stack. I am sure you can find many examples of such articles on the web.

I believe that a definition of value types based on implementation details, rather than on their observable behavior, is both confusing and not entirely correct. The most important characteristic of a value type is not how its instances are laid out in memory but how they behave semantically: instances of value types are always passed “by value”, that is, they are copied. If the main difference between reference types and value types were where they live in memory, we would have called them “heap types” and “stack types”. But most of the time that has nothing to do with the essence. Most of the time what matters is how instances of value types are copied and compared.
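The copy-and-compare behavior described above can be observed directly, without any reference to where the data lives. The following is a minimal sketch; `PointValue`, `PointReference`, and `CopySemanticsDemo` are hypothetical names introduced only for illustration and do not come from the original article.

```csharp
using System;

// Hypothetical types for illustration only.
struct PointValue { public int X; }      // value type
class PointReference { public int X; }   // reference type

static class CopySemanticsDemo
{
    // True if changing a copy of a struct leaves the original untouched.
    public static bool StructCopyIsIndependent()
    {
        var original = new PointValue { X = 1 };
        var copy = original;   // assignment copies the whole value
        copy.X = 2;
        return original.X == 1;
    }

    // True if a change made through a second reference is seen through the first.
    public static bool ReferenceCopyIsShared()
    {
        var original = new PointReference { X = 1 };
        var alias = original;  // assignment copies only the reference
        alias.X = 2;
        return original.X == 2;
    }

    // By default, structs are compared field by field.
    public static bool StructsCompareByValue()
    {
        var a = new PointValue { X = 5 };
        var b = new PointValue { X = 5 };
        return a.Equals(b);
    }

    // By default, class instances are compared by identity.
    public static bool ClassesCompareByIdentity()
    {
        var a = new PointReference { X = 5 };
        var b = new PointReference { X = 5 };
        return !a.Equals(b);
    }

    static void Main()
    {
        Console.WriteLine(StructCopyIsIndependent());
        Console.WriteLine(ReferenceCopyIsShared());
        Console.WriteLine(StructsCompareByValue());
        Console.WriteLine(ClassesCompareByIdentity());
    }
}
```

None of these results depends on whether the runtime happens to put a given instance on the stack or on the heap.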
Unfortunately, the documentation focuses on implementation details rather than on the most important characteristics, and so it misses the essence of value types. I would very much like all those articles that explain “what a stack is” to explain instead what “copy by value” means and how misunderstanding it can cause bugs.
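One common bug of this kind is modifying a copy of a struct while believing that the stored value has changed. The sketch below assumes a hypothetical mutable struct `Counter` kept in a `List<T>`; the names are illustrative and not taken from the original article.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mutable struct for illustration only.
struct Counter { public int Count; }

static class CopyBugDemo
{
    // The list indexer returns a copy, so the stored element is never updated.
    public static int IncrementViaCopy()
    {
        var counters = new List<Counter> { new Counter() };
        var c = counters[0];   // c is a copy of the stored value
        c.Count++;             // changes only the local copy
        return counters[0].Count;
    }

    // Writing the modified copy back is what actually updates the list.
    public static int IncrementAndStoreBack()
    {
        var counters = new List<Counter> { new Counter() };
        var c = counters[0];
        c.Count++;
        counters[0] = c;       // store the modified copy
        return counters[0].Count;
    }

    static void Main()
    {
        Console.WriteLine(IncrementViaCopy());       // 0
        Console.WriteLine(IncrementAndStoreBack());  // 1
    }
}
```

Knowing what a stack is does not help to predict either result; knowing that the value is copied does.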

The statement that value types are placed on the stack is not true in general. The MSDN documentation correctly notes that value types are placed on the stack sometimes. For example, an int field of a reference type is part of an object of that type and, like the rest of the object, lives on the heap. The same goes for local variables captured by the closure of an anonymous method (*): they effectively become fields of a hidden class and are also located on the heap. So local variables can live on the heap even if they are of a value type.
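Both cases can be sketched in a few lines. `Holder` and `ClosureDemo` are hypothetical names used only for illustration. The code does not show where the memory is, but it does show that the captured local outlives the call to `MakeCounter`, so it cannot live in that method's stack frame.

```csharp
using System;

// Hypothetical reference type: its int field is part of the object itself.
class Holder { public int Value; }

static class ClosureDemo
{
    // Returns a delegate that captures the local variable 'count'.
    public static Func<int> MakeCounter()
    {
        int count = 0;          // a local of a value type
        return () => ++count;   // captured, so it must survive this method's return
    }

    static void Main()
    {
        var holder = new Holder { Value = 42 };  // the int is stored inside the Holder object
        Console.WriteLine(holder.Value);         // 42

        var next = MakeCounter();
        next();
        next();
        Console.WriteLine(next());               // 3: 'count' kept its state between calls
    }
}
```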

In short, we have an explanation that explains nothing. Setting performance considerations aside, what else, in terms of developer expectations, could force the CLR jitter (the JIT compiler) to place an int variable on the stack rather than on the heap? Nothing. As long as the specification is not violated, the system can choose the most efficient code generation strategy.

Besides, no one promised that the operating system on which the CLI is implemented provides a one-megabyte array called “the stack”. Windows typically does, and that one-megabyte array is a great place to efficiently store small, short-lived objects, but there is no requirement or guarantee that the operating system provides such a structure or that the jitter will use it. In principle, the jitter could decide to create all local variables on the heap. Despite the performance cost, this would work as long as the semantic requirements of the types involved are met.

It is even worse to think that value types are “small and fast” and reference types are “big and slow”. It is true that the jitter can generate code that places value types on the stack, where allocating and freeing memory is very fast. But large structures created on the heap, such as an array of elements of a value type, are also created very quickly, provided they are initialized with default values. It is also true that reference types take up additional space in memory. And of course there are situations in which value types give a big performance win. But in the vast majority of programs, the way local variables are created and destroyed is not the performance bottleneck.

Nano-optimizations that turn reference types into value types to save a few nanoseconds are not worth it. This should be done only if profiler data shows that there is a real problem that can be solved by using value types instead of reference types. I have no such data. When choosing between a reference type and a value type, I am always guided by how the type I am creating should behave semantically.

(*) The same holds for iterator blocks.

From the translator: I recently had the chance to conduct several interviews again, and in answer to the question “what are value types and reference types?” I often heard “value types are types whose instances are located on the stack, and reference types are types whose instances are located on the heap.” So here is a translation of an article by Eric Lippert that is very old but has not lost its relevance. I tried to make the translation as readable as possible in Russian, so it differs noticeably from the original in form but not in meaning.

Source: https://habr.com/ru/post/221861/
