[DotNetBook] Reference types and value types: choosing between them

With this article I continue publishing a series of articles that will eventually become a book on how the .NET CLR, and .NET as a whole, work. The IDisposable topic served as a trial run; now let's look at the difference between the kinds of types. The whole book will be available on GitHub: DotNetBook. Issues and Pull Requests are welcome :)

This is an excerpt from the chapter on structs and classes and the differences between them.

Choosing between class and struct

Let's consider the features of both kinds of types, their strengths and weaknesses, and decide where each fits better. Here, of course, it is worth remembering the classic advice: choose a value type if the type is not meant to be inherited from, will not change during its lifetime, and is no larger than 16 bytes. But things are not that obvious. To make a full comparison, we need to look at the choice from several angles, mentally walking through the scenarios in which the type will be used.
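To make the classic rule of thumb concrete, here is a minimal sketch of a type that satisfies all three criteria. The Point3 type is hypothetical, and Unsafe.SizeOf<T>() from System.Runtime.CompilerServices is used only to check the size of the structure.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical type following the classic advice for value types:
// it cannot be inherited from (structs are implicitly sealed),
// it is immutable (readonly struct), and it is small (3 * 4 = 12 bytes).
public readonly struct Point3
{
    public readonly int X;
    public readonly int Y;
    public readonly int Z;

    public Point3(int x, int y, int z) { X = x; Y = y; Z = z; }

    // "Changing" the value produces a new instance; the original stays intact.
    public Point3 Offset(int dx, int dy, int dz) => new Point3(X + dx, Y + dy, Z + dz);
}

public static class ClassicRuleDemo
{
    public static int SizeOfPoint() => Unsafe.SizeOf<Point3>();

    public static void Main()
    {
        var a = new Point3(1, 2, 3);
        var b = a.Offset(1, 0, 0);
        Console.WriteLine(SizeOfPoint());           // 12
        Console.WriteLine(typeof(Point3).IsSealed); // True
        Console.WriteLine($"{a.X} -> {b.X}");       // 1 -> 2
    }
}
```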

Note

The chapter published on Habr is not being updated and may already be somewhat outdated. So please refer to the original for a more recent text:

CLR Book: GitHub, table of contents
CLR Book: GitHub, chapter
Release 0.5.2 of the book, PDF: GitHub Release

I propose to divide the selection criteria into three groups:

in terms of the architecture of the type system your type will interact with;
from your point of view as a systems programmer: which choice is optimal for performance;
when there is simply no other option.

Every entity you design must fully reflect its purpose. This concerns not only its name or its interface (methods, properties): even the choice between a value type and a reference type can be made on architectural grounds. Let's consider why, from the point of view of the type system architecture, a structure might be chosen instead of a class:

If the type we are designing is invariant with respect to the meaning of its state, its state fully describes some process or is the value of something. In other words, an instance of the type is constant in its essence and cannot be changed. We can create another instance based on it by specifying an offset, or create one from scratch by specifying its properties, but we have no right to change it. Note that I do not mean that a structure is an immutable type: you can change its fields as you like. Moreover, you can pass the structure to a method by reference via a ref parameter and get the modified fields back when the method returns. What I mean is the architectural point of view. Let me explain with examples:
- DateTime is a structure that encapsulates the notion of a moment in time. It stores this data as a UInt64, but provides access to specific characteristics of that moment: year, month, day, hour, minute, second, millisecond, and even ticks (100-nanosecond intervals). However, given what it encapsulates, it cannot be mutable by nature. We cannot change a specific point in time so that it becomes a different one. I cannot live the next minute of my life on the best birthday of my childhood. Time is immutable. That is why the type can be either a class with a read-only interface (which returns a new instance for every property change) or a structure which, although the fields of its instances could be changed, should not change them: the description of a moment in time is a *value*, like a number. You cannot get inside a number and change it, can you? If you want a time that is one day later than the original, you simply get a new copy of the structure;
- KeyValuePair<TKey, TValue> is a structure that encapsulates the concept of a linked key-value pair. Note that this structure is used to hand data out to the user when enumerating the contents of a dictionary. Why was a structure chosen from the architectural point of view? The answer is simple: within Dictionary<TKey, TValue>, the key and the value are inseparable concepts. Yes, internally things are different: there is a complex structure in which keys are stored separately from values. However, for an external user, from the point of view of the interface and the meaning of the data structure itself, a key-value pair is an inseparable concept. It is a *value* as a whole. If we put a different value under this key, the whole pair has changed. For an external observer there are no separate keys and separate values; they form a single whole. That is why a structure is the ideal option here.
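The distinction between "fields can be changed" and "the value should not be changed" can be sketched as follows. The Counter struct is hypothetical and exists only to show mutation through a ref parameter; DateTime.AddDays is the real BCL method and returns a new instance instead of modifying the original.

```csharp
using System;

// Hypothetical mutable struct: its fields *can* be changed,
// for example through a ref parameter.
public struct Counter
{
    public int Value;
}

public static class ValueSemanticsDemo
{
    // Receives a copy: the caller's instance is not affected.
    public static void IncrementCopy(Counter c) => c.Value++;

    // Receives a reference to the caller's instance and changes it in place.
    public static void IncrementByRef(ref Counter c) => c.Value++;

    public static void Main()
    {
        var counter = new Counter { Value = 1 };
        IncrementCopy(counter);
        Console.WriteLine(counter.Value); // 1
        IncrementByRef(ref counter);
        Console.WriteLine(counter.Value); // 2

        // DateTime is also a struct, but its API treats it as a value:
        // AddDays returns a new instance and leaves the original untouched.
        var birthday = new DateTime(1990, 5, 17);
        var nextDay = birthday.AddDays(1);
        Console.WriteLine(birthday.Day); // 17
        Console.WriteLine(nextDay.Day);  // 18
    }
}
```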
If the type we are designing is a structurally integral part of an enclosing type. That is, it would be wrong to say that the outer type refers to an instance of the encapsulated type; it is quite correct to say that the encapsulated type is a full part of the outer one, together with all its properties. Typically this is the case when designing structures that are part of another structure.
- For example, take a file header. It would be wrong to model it as a link from one file to another, as in "the header is in the file header.txt". That would be appropriate when embedding one document into another not by inserting the file itself but via a relative link in the file system; the Windows shortcut file is a good example. However, if we are talking about a real file header (for example, the JPEG file header, which specifies the image size, compression method, shooting parameters, GPS coordinates and other metadata), then it is extremely useful to use structures for the types that will parse the header. After all, once all the headers are described as structures, their fields will be laid out in memory exactly as they are in the file, and a simple unsafe conversion *(Header *)readedBuffer, without any deserialization, gives you fully populated data structures.
Note that all of these examples share one property: none of them inherits the behavior of anything. Moreover, they all show that inheriting the behavior of these entities would make no sense at all. They are completely self-sufficient units.
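The file-header idea can be sketched without unsafe code. This is a minimal sketch, assuming a simplified BMP file header (the 14-byte BITMAPFILEHEADER layout) instead of the JPEG header mentioned above, a little-endian machine, and MemoryMarshal.Read<T> as the safe counterpart of the pointer cast *(Header *)readedBuffer.

```csharp
using System;
using System.Runtime.InteropServices;

// Simplified BMP file header: 14 bytes, no padding.
// Pack = 1 makes the in-memory layout match the on-disk layout exactly.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct BmpFileHeader
{
    public ushort Signature;  // "BM", i.e. 0x4D42 when read little-endian
    public uint FileSize;     // total file size in bytes
    public ushort Reserved1;
    public ushort Reserved2;
    public uint PixelOffset;  // offset of the pixel data from the start of the file
}

public static class HeaderDemo
{
    // Reinterprets the first bytes of the buffer as the header struct,
    // with no field-by-field deserialization. Assumes a little-endian machine.
    public static BmpFileHeader Parse(ReadOnlySpan<byte> buffer) =>
        MemoryMarshal.Read<BmpFileHeader>(buffer);

    public static void Main()
    {
        byte[] file =
        {
            0x42, 0x4D,             // "BM"
            0x46, 0x00, 0x00, 0x00, // file size = 70
            0x00, 0x00, 0x00, 0x00, // reserved
            0x36, 0x00, 0x00, 0x00  // pixel data offset = 54
        };
        var header = Parse(file);
        Console.WriteLine(Marshal.SizeOf<BmpFileHeader>()); // 14
        Console.WriteLine(header.FileSize);                 // 70
        Console.WriteLine(header.PixelOffset);              // 54
    }
}
```

On a real file, the same call would be applied to the first bytes read from the stream; the structure definition is all the "parser" there is.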
If we look at the problem from the point of view of code efficiency, the choice looks different:
1. Structures must be chosen when you need to receive structured data from unmanaged code, or to pass a data structure to an unsafe method. A reference type is not suitable for this at all;
2. If the type will often be used to pass data in method calls (as return values or as method parameters), but there is no need to refer to the same value from different places, then your choice is a structure. Tuples are a good example: if a method returns several values to you as a tuple, it returns a ValueTuple, which is declared as a structure. That is, when returning, the method does not allocate memory on the heap but uses the thread's stack, so no heap memory is spent;
3. If you are designing a system that generates heavy traffic of instances of the type, while the instances themselves are small and very short-lived, then using reference types will lead either to an object pool or, without a pool, to uncontrolled littering of the heap. Some of the objects will also be promoted to older generations, causing GC stalls. Using value types in such places (where possible) gives a performance boost simply because nothing goes to the SOH (Small Object Heap): this unloads the GC, and the algorithm runs faster.
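The difference between returning a ValueTuple and returning a class can be observed directly. This is a rough sketch, assuming .NET Core 3.0 or later for GC.GetAllocatedBytesForCurrentThread(); the static sink fields are there only so the results escape and the JIT cannot drop the allocations. Exact byte counts vary by runtime, so only the comparison matters.

```csharp
using System;

public static class TupleDemo
{
    // Sinks keep results reachable so the allocations cannot be optimized away.
    public static (int Min, int Max) ValueSink;
    public static Tuple<int, int>? ReferenceSink;

    // Returns two values in a ValueTuple, which is a struct: no heap allocation.
    public static (int Min, int Max) MinMaxValue(int a, int b) =>
        a < b ? (a, b) : (b, a);

    // Same result via System.Tuple, which is a class: every call allocates.
    public static Tuple<int, int> MinMaxReference(int a, int b) =>
        a < b ? Tuple.Create(a, b) : Tuple.Create(b, a);

    // Bytes allocated on the current thread while running `action` n times.
    public static long MeasureAllocations(Action<int> action, int n)
    {
        action(0); // warm-up, so one-time costs are not counted
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < n; i++) action(i);
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    public static long ValueTupleBytes(int n) =>
        MeasureAllocations(i => ValueSink = MinMaxValue(i, 42), n);

    public static long ReferenceTupleBytes(int n) =>
        MeasureAllocations(i => ReferenceSink = MinMaxReference(i, 42), n);

    public static void Main()
    {
        Console.WriteLine($"ValueTuple: {ValueTupleBytes(10_000)} bytes");
        Console.WriteLine($"Tuple:      {ReferenceTupleBytes(10_000)} bytes");
    }
}
```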

Summing up all of the above, I can offer a few tips and remarks on using structures:

When choosing collections, avoid large arrays that hold large structures. This also applies to data structures built on top of arrays (and most of them are). Such arrays can end up in the Large Object Heap and fragment it. It is not enough to simply add up the sizes of the fields: the runtime aligns fields and pads the structure, so an element can take noticeably more memory than the sum of its fields, and the size can depend on the platform (references and IntPtr fields, for example, take 4 bytes on 32-bit systems and 8 bytes on 64-bit ones). So the number of elements an array can hold before it lands in the LOH depends both on the size of the structure and on the platform the application runs on. In our example with four fields taking from 4 to 8 bytes each, that is 85K / (4 to 8 bytes per field × 4 fields) minus the size of the array header: roughly 2,600 elements per array, depending on the platform (and, of course, you should round down). That's it! Not that many! Yet it might have seemed that a magic constant of 10,000 elements would do just fine!
You should also be aware that if you use a fairly large structure as a data source, place it as a field in some class, and, for example, the same value gets replicated across a thousand instances (simply because it is convenient to keep everything at hand), then you increase every instance of the class by the size of the structure. This ultimately bloats generation 0 and causes objects to be promoted to generation 1 or even 2. If the class instances are actually short-lived and you count on them being collected in generation 0 (in about 1 ms), you will be greatly disappointed to find that they managed to get into generation 1, or even generation 2. And what is the difference? If generation 0 is collected in about 1 ms, generations 1 and 2 are collected much more slowly, which leads to stalls out of nowhere;
For roughly the same reason, avoid passing large structures through a chain of method calls. If the methods start calling each other, such calls take up much more stack space, and your application may die with a StackOverflowException. The second reason is performance: the more copies, the slower everything runs;
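The effect of padding on element size, and therefore on when an array crosses into the LOH, can be checked with a small sketch. The Padded struct is hypothetical, the 85,000-byte value is the default LOH threshold, and the element limit deliberately ignores the array header, so it is only an approximation.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical element type: two bytes and two longs. Each long must be
// 8-byte aligned on 64-bit platforms, so padding is inserted and the struct
// is larger than the 18 bytes its fields add up to.
public struct Padded
{
    public byte A;
    public long B;
    public byte C;
    public long D;
}

public static class LohDemo
{
    // Default threshold: objects of 85,000 bytes or more go to the Large Object Heap.
    public const int LohThreshold = 85_000;

    // Rough upper bound on elements before the array lands in the LOH
    // (ignores the array header, so the real limit is slightly lower).
    public static int MaxElementsRoughly<T>() => LohThreshold / Unsafe.SizeOf<T>();

    public static void Main()
    {
        int size = Unsafe.SizeOf<Padded>();
        int limit = MaxElementsRoughly<Padded>();
        Console.WriteLine(size);  // 32 on x64 and ARM64
        Console.WriteLine(limit);

        var small = new Padded[limit / 2];  // well below the threshold: allocated in the SOH
        var large = new Padded[limit + 1];  // above the threshold: allocated in the LOH
        // LOH objects are collected together with generation 2,
        // and GC.GetGeneration reports them as (at least) that generation.
        Console.WriteLine(GC.GetGeneration(small));
        Console.WriteLine(GC.GetGeneration(large));
    }
}
```

Reordering the fields (both longs first, then both bytes) shrinks the element to 24 bytes on x64, which is a cheap way to fit more elements below the threshold.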

So, in general, choosing between the kinds of types is quite a non-trivial process. Often it turns out to be a premature optimization, which is not recommended. However, if you know that your situation fits the principles above, you can safely choose a value type.

The base type Object and implementing interfaces. Boxing.

It may seem that you and I have been through fire and water and can now pass any interview, perhaps even one for the .NET CLR team. But let's not rush to open microsoft.com and look for the careers section: there will be time for that. Let's answer a question instead. If value types contain neither a SyncBlockIndex nor a pointer to a virtual method table, then how, excuse me, do they inherit from object? After all, by all the canons, every type inherits from it. Unfortunately, the answer does not fit in one sentence, but it will give such an understanding of our type system that the last pieces of the puzzle will finally fall into place.
So, let us recall once again how value types are placed in memory. Wherever they are, they are embedded directly in that place and become part of it. Reference types, by contrast, always live in the small or large object heap, and wherever such a value is stored there is only a reference to the place in the heap where the object is located.
If you think about it, every value type has the ToString, Equals and GetHashCode methods, which are virtual and overridable, yet we are not allowed to inherit from value types and override their methods in derived types. Why? Because if value types had overridable methods, they would need a virtual method table to route the calls. That, in turn, would cause problems when passing structures into the unmanaged world: extra fields would go along with them. As a result, the descriptions of value type methods are stored somewhere, but they are not directly accessible through a virtual method table.
This suggests that the lack of inheritance is artificial:

Inheritance from object does exist, albeit not directly;
The base type has ToString, Equals and GetHashCode, and in value types these methods work in their own way: each value type has its own behavior for them. This means the methods are overridden relative to object;
Moreover, if you cast a value type to object, you can still call ToString, Equals and GetHashCode with full rights.
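Both points can be seen in a few lines. This is a minimal sketch with a hypothetical Point struct: it overrides the virtual methods it inherits from object through System.ValueType, and the overrides still run when the value is used through an object reference.

```csharp
using System;

// Hypothetical value type overriding the virtual methods it inherits
// (indirectly, via System.ValueType) from System.Object.
public struct Point
{
    public int X, Y;
    public Point(int x, int y) { X = x; Y = y; }

    public override string ToString() => $"({X}, {Y})";
    public override bool Equals(object? obj) => obj is Point p && p.X == X && p.Y == Y;
    public override int GetHashCode() => HashCode.Combine(X, Y);
}

public static class OverrideDemo
{
    // Calls ToString through an object reference (the argument is boxed).
    public static string Describe(object o) => o.ToString()!;

    public static void Main()
    {
        object boxed = new Point(1, 2);
        Console.WriteLine(Describe(boxed));                               // (1, 2)
        Console.WriteLine(boxed.Equals(new Point(1, 2)));                 // True
        // The inheritance chain: Point -> System.ValueType -> System.Object.
        Console.WriteLine(typeof(Point).BaseType == typeof(ValueType));   // True
        Console.WriteLine(typeof(ValueType).BaseType == typeof(object));  // True
    }
}
```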
When an instance method is called on a value type, the value is not copied into the method. That is, calling an instance method is similar to calling a static method: Method(ref structInstance, newInternalFieldValue). It is, in fact, a call that passes this, with one exception: the JIT must compile the method body so that it does not apply the extra offset to the structure fields that would skip over a pointer to the virtual method table, because that pointer is not in the structure itself. For value types, it lives elsewhere.
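The equivalence between an instance method and a static method taking the instance by reference can be sketched as follows. The Accumulator struct and its AddStatic method are hypothetical; AddStatic only makes explicit the hidden `ref this` that the instance method receives.

```csharp
using System;

public struct Accumulator
{
    public int Value;

    // Instance method: `this` is passed to it by reference (as a ref Accumulator),
    // so the call works on the caller's storage rather than on a copy.
    public void Add(int delta) => Value += delta;

    // Hypothetical static counterpart that makes the hidden `ref this` explicit.
    public static void AddStatic(ref Accumulator self, int delta) => self.Value += delta;
}

public static class ThisByRefDemo
{
    public static int Run()
    {
        var acc = new Accumulator();
        acc.Add(5);                        // instance call: modifies acc itself
        Accumulator.AddStatic(ref acc, 5); // equivalent static call
        return acc.Value;
    }

    public static void Main() => Console.WriteLine(Run()); // 10
}
```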

So, in a sense, we are not exactly being deceived, but we are not being told the whole story: the types differ a lot in behavior, but at the level of the CLR implementation the difference between them is not that significant. But more on that later.
If we write the following line in our program:

 var obj = (object)10;

Then we are no longer dealing with the number 10 itself. So-called boxing takes place: the value gets packaged. That is, we can now work with it through the base class. And since we have gained that ability, a VMT (virtual methods table) has become available to us, through which we can safely call the virtual methods ToString(), Equals and GetHashCode. And since the original value can be stored anywhere (on the stack or as a class field, for instance), while after casting to object we can keep a reference to this number for as long as we like, boxing actually creates a copy of the value type rather than a pointer to the original. That is, when boxing occurs:

The CLR allocates space on the heap for the structure plus a SyncBlockIndex plus a VMT pointer (to be able to call ToString, GetHashCode, Equals);
It copies the value type instance there.
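That boxing copies the value, rather than pointing at the original, is easy to observe. A minimal sketch: the boxed object keeps the old value after the variable changes, and every boxing operation produces a separate heap object.

```csharp
using System;

public static class BoxingCopyDemo
{
    // Boxes a value, then changes the original variable.
    public static (object Boxed, int Original) BoxThenChange()
    {
        int value = 10;
        object boxed = value; // boxing: a heap object is allocated and 10 is copied into it
        value = 20;           // changes the local variable only, not the boxed copy
        return (boxed, value);
    }

    public static void Main()
    {
        var (boxed, original) = BoxThenChange();
        Console.WriteLine(boxed);           // 10
        Console.WriteLine(original);        // 20
        Console.WriteLine(boxed.GetType()); // System.Int32

        // Each boxing operation creates a separate heap object.
        object first = original;
        object second = original;
        Console.WriteLine(ReferenceEquals(first, second)); // False
    }
}
```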

Ladies and gentlemen. It is not customary to say this in polite society, but we have obtained a reference version of a value type. Let me repeat: by boxing a structure I obtained **exactly the same set of system fields as a reference type has**, and it became a full-fledged reference type. The structure has become a class. Let's call this phenomenon the Dotnet Kulbit. It seems to me that this name suits such a cunning turn of events.
By the way, to convince yourself that I am telling the truth, it is enough to work out what happens when you use a structure that implements some interface through that very interface.

 interface IBoo
 {
     void Boo();
 }

 struct Foo : IBoo
 {
     int x;
     public void Boo() { x = 666; } // must be public to implement IBoo
 }

 IBoo boo = new Foo();
 boo.Boo();

So, when an instance of Foo is created, its value essentially sits on the stack. Then we put it into a variable of an interface type, that is, a structure goes into a variable of a reference type, and boxing occurs. Fine: as a result we get an object. But our variable is of an interface type, which means a type conversion is needed. So the call is actually something like this:

 IBoo boo = (IBoo)(object)new Foo(); // the cast to object is the boxing step
 boo.Boo();

That is, writing such code is extremely inefficient. What's more, you will change a copy instead of the original:

 void Main()
 {
     var foo = new Foo();
     foo.a = 1;
     Console.WriteLine(foo.a); // -> 1

     IBoo boo = foo;
     boo.Boo(); // looks like it changes foo.a to 10

     Console.WriteLine(foo.a); // -> 1
 }

 struct Foo : IBoo
 {
     public int a;
     public void Boo() { a = 10; }
 }

 interface IBoo
 {
     void Boo();
 }

We are deceived twice here. First: looking at someone else's code, we are not obliged to know what we are dealing with, and we see a cast to the IBoo interface, which practically guarantees the assumption that Foo is a class rather than a structure. Second: the complete lack of visual distinction between structures and classes gives the firm impression that the changes made through the interface must end up in foo, which does not happen because boo is a copy of foo. That is what misleads us. In my opinion, such code deserves comments so that other developers can figure it out correctly.
The second observation, related to our earlier reasoning, is that we can cast from object to IBoo. This is further proof that a boxed value type is nothing special but is, in fact, a reference version of a value type. Or, looked at from a different angle, all types in the type system are reference types; we simply can work with structures as values, "unloading" their value in its entirety, or, as they would say in the C++ world, dereferencing a pointer to an object.
But you could object: if everything were exactly as I say, then one could write something like this:

 var referenceToInteger = (IInt32)10;

And we would get not just an object but a typed reference to a boxed value type. But that would destroy the whole idea of value types, my friends, and the main idea is the integrity of their values, which enables excellent optimizations based on their properties. So let's not sit idle! Let's destroy this idea!

 using System.Runtime.CompilerServices;

 public sealed class Boxed<T> where T : struct
 {
     public T Value;

     [MethodImpl(MethodImplOptions.AggressiveInlining)]
     public override bool Equals(object obj)
     {
         // compare with another Boxed<T> by value, otherwise with the raw value
         return obj is Boxed<T> other ? Value.Equals(other.Value) : Value.Equals(obj);
     }

     [MethodImpl(MethodImplOptions.AggressiveInlining)]
     public override string ToString()
     {
         return Value.ToString();
     }

     [MethodImpl(MethodImplOptions.AggressiveInlining)]
     public override int GetHashCode()
     {
         return Value.GetHashCode();
     }
 }

What did we just get? A complete analog of boxing. But now we can change its contents by calling its instance methods, and these changes will be seen by everyone who holds a reference to this object.

 var typedBoxing = new Boxed<int> { Value = 10 };
 var pureBoxing = (object)10;

The first option, you must admit, looks somewhat clumsy: instead of a familiar cast we have who knows what. The second line, on the other hand, is as laconic as a haiku. Yet they are almost identical. The only difference is that with ordinary boxing, the memory allocated on the heap is not zeroed: the structure is copied straight into it. In the first version, the memory is zeroed first. Only because of this is our option about 10% slower than ordinary boxing.
But now we can call methods on our boxed value:

 struct Foo
 {
     public int x;
     public void ChangeTo(int newx) { x = newx; }
 }

 var boxed = new Boxed<Foo> { Value = new Foo { x = 5 } };
 boxed.Value.ChangeTo(10); // Value is a field, so the struct inside the box is changed in place
 var unboxed = boxed.Value;

We have a new tool, but we don't yet know what to do with it. Let's reason it out:

Our Boxed<T> type essentially does the same thing as ordinary boxing: it allocates memory on the heap, puts the value there and lets you take it back by performing a kind of unbox;
Likewise, if you lose the reference to the boxed structure, the GC will collect it;
However, we can now work with the boxed type: call its methods;
We can also replace the value type instance in the SOH/LOH with another one. We could not do this before: we would have had to unbox, replace the structure with another one and box it again, handing out the new reference to all consumers.
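The last two points can be sketched with a trimmed copy of Boxed<T> (the overrides are omitted here) and the Foo struct from the example above: a change made through one reference, and a wholesale replacement of the value, are both visible through another reference to the same box.

```csharp
using System;

// Minimal copy of the Boxed<T> class from the text (the overrides are omitted).
public sealed class Boxed<T> where T : struct
{
    public T Value;
}

// Mutable struct from the text's example.
public struct Foo
{
    public int x;
    public void ChangeTo(int newx) { x = newx; }
}

public static class BoxedDemo
{
    // Every holder of the reference sees both the method call and the replacement.
    public static (int AfterCall, int AfterReplace) SharedBox()
    {
        var first = new Boxed<Foo> { Value = new Foo { x = 5 } };
        var second = first;               // a second reference to the same heap object

        first.Value.ChangeTo(10);         // Value is a field: the struct is changed in place
        int afterCall = second.Value.x;

        first.Value = new Foo { x = 42 }; // replace the whole value inside the box
        return (afterCall, second.Value.x);
    }

    public static void Main() => Console.WriteLine(SharedBox()); // (10, 42)
}
```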
Also, let's think: what is the main problem with boxing? It creates memory traffic: traffic of an unpredictable number of objects, some of which may survive into generation 1, where we get garbage-collection problems. There will be collections, there will be a lot of them, and this could clearly have been avoided. And when we have traffic of short-lived objects, the first solution that comes to mind is pooling. That will be a fine finishing touch to the Dotnet Kulbit.

 // Pool<T> is not a BCL type: it stands for any object pool
 var pool = new Pool<Boxed<int>>(maxCount: 1000);
 var boxed = pool.Box(10);
 boxed.Value = 70;

 // use boxed value here

 pool.Free(boxed);

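That is, we now have a way to use boxing through a pool, eliminating the memory traffic that boxing creates. We could even let objects resurrect themselves in their finalizer and return to the pool. This may be useful when a boxed object is passed to someone else's asynchronous code and you cannot tell when it is no longer needed: in that case it will return itself to the pool during GC.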
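The Pool type in the snippet above is not a standard .NET class. Here is a minimal sketch of what such a pool could look like; BoxPool<T> is a hypothetical name, it takes the value type itself as its type parameter, it is not thread-safe, and it does not implement the finalizer-based resurrection mentioned above.

```csharp
using System;
using System.Collections.Generic;

public sealed class Boxed<T> where T : struct
{
    public T Value;
}

// Hypothetical pool of Boxed<T> objects (not thread-safe).
public sealed class BoxPool<T> where T : struct
{
    private readonly Stack<Boxed<T>> _free = new Stack<Boxed<T>>();
    private readonly int _maxCount;

    public BoxPool(int maxCount) => _maxCount = maxCount;

    public int FreeCount => _free.Count;

    // Takes a box from the pool (or allocates one if the pool is empty) and stores the value in it.
    public Boxed<T> Box(T value)
    {
        var box = _free.Count > 0 ? _free.Pop() : new Boxed<T>();
        box.Value = value;
        return box;
    }

    // Returns a box to the pool; boxes beyond maxCount are left to the GC.
    public void Free(Boxed<T> box)
    {
        box.Value = default;
        if (_free.Count < _maxCount) _free.Push(box);
    }
}

public static class PoolDemo
{
    public static bool ReusesBoxes()
    {
        var pool = new BoxPool<int>(maxCount: 1000);
        var first = pool.Box(10);
        first.Value = 70;
        pool.Free(first);
        var second = pool.Box(20); // the same heap object is handed out again
        return ReferenceEquals(first, second) && second.Value == 20;
    }

    public static void Main() => Console.WriteLine(ReusesBoxes()); // True
}
```

A stack keeps the most recently freed box on top, which tends to be the one still warm in the CPU cache.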
Let's draw some conclusions:

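- If boxing happens by accident and should not happen at all, don't let it happen: it can lead to performance problems;
- If boxing is required by the architecture of the system, there are options. If the traffic of boxed structures is small and practically invisible, you can simply use boxing. If the traffic is noticeable, it may be worth pooling the boxes using the approach described above. That costs some resources, but lets the GC work without overload;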

 static unsafe void Main()
 {
     // here we create a boxed int
     object boxed = 10;

     // here we get the address of the pointer to the VMT
     var address = (void**)EntityPtr.ToPointerWithOffset(boxed);

     unsafe
     {
         // here we get the address of the Virtual Methods Table
         var structVmt = typeof(SimpleIntHolder).TypeHandle.Value.ToPointer();

         // replace the VMT address of the number boxed on the Heap with the VMT of SimpleIntHolder, turning the Int into a structure
         *address = structVmt;
     }

     var structure = (IGetterByInterface)boxed;

     Console.WriteLine(structure.GetByInterface());
 }

 interface IGetterByInterface
 {
     int GetByInterface();
 }

 struct SimpleIntHolder : IGetterByInterface
 {
     public int value;

     int IGetterByInterface.GetByInterface()
     {
         return value;
     }
 }

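The code uses a small function that gets a pointer from a reference to an object; the library is available on github. This example shows that ordinary boxing turns an int into a typed reference type. Let's go through the steps: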

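1. box an integer;
2. get the address of the resulting object (the address of the Int32 VMT);
3. get the VMT of SimpleIntHolder;
4. replace the VMT of the boxed integer with the VMT of the structure;
5. unbox into the structure type;
6. print the field value to the screen, getting back the Int32 that was boxed.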

I do this through the interface on purpose, because I want to show that it works this way too.

Link to the whole book

CLR Book: GitHub
Release 0.5.0 of the book, PDF: GitHub Release

Source: https://habr.com/ru/post/342758/
