Something about garbage collection and generations

Everyone knows that most modern garbage collectors use generations to make collection more efficient. But have you ever wondered how generations actually work and where the performance gain comes from? Let's not get ahead of ourselves, though, and take things in order.

So, most modern garbage collectors (GC) use generations to reclaim short-lived objects more efficiently. The underlying heuristic is that most newly created objects are used only briefly and can safely be reclaimed at the first opportunity.

The main idea of generations is that the young generation is collected much more often and much faster, because the collector analyzes not all objects in the managed heap (of which there can be very many) but only the objects of that young generation.

Let's look at an example. Suppose we have an object “A” that initializes its property lazily:

    using System;

    public class B { }

    public class A
    {
        // The B instance is created only on first access to the property.
        private Lazy<B> _lazyB = new Lazy<B>(() => new B());

        public B B { get { return _lazyB.Value; } }
    }

The property B is first accessed, and so the object “B” is created, only when object “A” is already in the second generation:

    var a = new A();
    GC.Collect();
    GC.Collect();
    // output: A resides in Gen 2, A.B resides in Gen 0
    Console.WriteLine("A resides in Gen {0}, A.B resides in Gen {1}",
        GC.GetGeneration(a), GC.GetGeneration(a.B));
    GC.Collect(0); // (3): collect generation 0 only

A generation 0 collection analyzes only the objects of that generation. So when the last collection in the example (line (3)) starts, all generation 0 objects, including the newly created object “B”, are initially treated as unreachable. The collector then walks the root references to determine which objects of this generation are reachable: a reachable object is considered alive, and every other object that cannot be reached from the roots is considered garbage and is reclaimed.

But in our case object “B” is not reachable directly from the roots: the only reference to it lives in a field of the second-generation object “A”. To determine its reachability, the garbage collector would therefore have to analyze the fields of all objects in all heaps of the application; otherwise newly created objects could be “mistakenly” collected, which we obviously do not want. But then what is the point of generations, if determining the reachability of generation 0 objects still requires analyzing the entire managed heap every time?

To solve this problem, we need to somehow add object “A” to the set of objects that are analyzed when determining the reachability of object “B”. However, instead of storing a list of all such “dirty” objects, most generational garbage collectors use a special data structure called a card table, which records the address range containing the old object that references a young one.

A card table is a simple data structure: a bit mask in which each bit indicates that some object located in a particular address range is “dirty”, that is, contains a reference to a “young” object. At the moment, one bit covers a range of 128 bytes, so each byte of the bit mask describes a 1 KB range. This is a compromise between efficiency and the amount of additional memory the garbage collector needs for the table. For a 32-bit system, where user mode has 2 GB of address space available, the card table takes 2 MB. However, because one bit marks a whole 128-byte range, each collection has to analyze every object in each dirty range, including neighboring objects that may hold no references to the young generations.
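The arithmetic above can be checked with a small sketch. This is only a toy calculation built from the numbers quoted in the text (128 bytes per bit, 8 bits per byte); the class and method names are made up for illustration and are not part of the CLR.

```csharp
using System;

// Toy model of the card table arithmetic. The constants mirror the numbers
// in the text; CardTableMath and its members are hypothetical names.
public static class CardTableMath
{
    public const int BytesPerCard = 128; // one bit covers 128 bytes
    public const int CardsPerByte = 8;   // eight bits per table byte

    // Index of the card (bit) that covers the given address.
    public static long CardIndex(long address)
    {
        return address / BytesPerCard;
    }

    // Byte of the table, and bit inside that byte, for the given address.
    public static (long ByteIndex, int Bit) Locate(long address)
    {
        long card = CardIndex(address);
        return (card / CardsPerByte, (int)(card % CardsPerByte));
    }

    // Table size in bytes needed to cover an address space of the given size.
    public static long TableSizeBytes(long addressSpaceBytes)
    {
        return addressSpaceBytes / (BytesPerCard * CardsPerByte);
    }
}
```

With these numbers, CardTableMath.TableSizeBytes(2L * 1024 * 1024 * 1024) gives 2,097,152 bytes, which is the 2 MB quoted above, and all addresses from 0 to 127 map to card 0.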

To keep this data structure up to date, every time a reference is written to an object field the JIT compiler emits a so-called write barrier, which boils down to updating the card table if the object being stored is in the ephemeral segment, i.e. is a young object of generation 0 or 1.
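The idea of the write barrier can be sketched in a toy model. Everything here is an assumption made for illustration: real objects carry no explicit address or generation field, the real barrier is a few machine instructions emitted by the JIT, and ToyObject and ToyCardTable are hypothetical names.

```csharp
using System;
using System.Collections;

// Toy object: the address and generation are stored explicitly only so the
// sketch can reason about them.
public sealed class ToyObject
{
    public long Address;    // pretend address in the managed heap
    public int Generation;  // 0, 1 or 2
    public ToyObject Field; // a single reference field
}

public sealed class ToyCardTable
{
    private const int BytesPerCard = 128;
    private readonly BitArray _cards;

    public ToyCardTable(long heapSizeBytes)
    {
        _cards = new BitArray((int)(heapSizeBytes / BytesPerCard));
    }

    public bool IsDirty(long address)
    {
        return _cards[(int)(address / BytesPerCard)];
    }

    // Write barrier: perform the store, then mark the card covering the
    // written-to object if the stored reference points at a young object.
    public void WriteField(ToyObject target, ToyObject value)
    {
        target.Field = value;
        if (value != null && value.Generation < 2)
        {
            _cards[(int)(target.Address / BytesPerCard)] = true;
        }
    }
}
```

In this model, storing a generation 0 object into a field of an object at address 1000 marks card 7 (addresses 896 to 1023) as dirty, while storing a generation 2 object leaves the table untouched.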

Returning to our case, object “B” will not be collected: reachability analysis covers not only the root references (none of which point to it directly) but also all objects in the 128-byte range of the second generation that contains our object “A”.
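To make this concrete, here is a minimal sketch of a generation 0 mark phase that starts from the roots plus the old objects lying in dirty cards. It is a deliberately simplified model, not the CLR algorithm: it treats only generation 0 as young, represents dirty cards as a set of card indexes, and uses hypothetical names (Node, ToyGen0Collector).

```csharp
using System;
using System.Collections.Generic;

public sealed class Node
{
    public long Address;   // pretend address in the managed heap
    public int Generation; // 0, 1 or 2
    public List<Node> Fields = new List<Node>();
}

public static class ToyGen0Collector
{
    private const int BytesPerCard = 128;

    // Returns the generation 0 objects that survive the collection.
    public static HashSet<Node> Survivors(
        IEnumerable<Node> roots, IEnumerable<Node> heap, ISet<long> dirtyCards)
    {
        var pending = new Stack<Node>();

        // Roots that point straight at young objects.
        foreach (var root in roots)
            if (root.Generation == 0) pending.Push(root);

        // Old objects are scanned only if they lie in a dirty card.
        foreach (var obj in heap)
        {
            bool inDirtyCard = dirtyCards.Contains(obj.Address / BytesPerCard);
            if (obj.Generation == 0 || !inDirtyCard) continue;
            foreach (var field in obj.Fields)
                if (field.Generation == 0) pending.Push(field);
        }

        // Ordinary marking, never stepping into older generations.
        var alive = new HashSet<Node>();
        while (pending.Count > 0)
        {
            var node = pending.Pop();
            if (!alive.Add(node)) continue;
            foreach (var field in node.Fields)
                if (field.Generation == 0) pending.Push(field);
        }
        return alive;
    }
}
```

If “A” is a generation 2 node at address 1000 whose only field refers to a generation 0 node “B”, then “B” is among the survivors when card 7 is in the dirty set and is missing when the set is empty, which is exactly the mistake the card table exists to prevent.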

Why do I need all this?

True, knowing exactly how garbage collection is implemented has little practical benefit (at least until you subscribe to an event of a long-lived object and forget to unsubscribe). It is just that generations are always mentioned whenever garbage collection is discussed, yet it is rarely pointed out that efficient generational collection simply cannot be implemented without some extra tricks and brute force.

By the way, this implementation also has a small practical consequence. Older objects rarely create new objects and store them in their fields (“British scientists” found that no more than 1% of second-generation objects do so), yet every write of a reference to an object field pays the extra cost of the write barrier needed to update the card table. This makes writing a field in the managed world slightly more expensive than in the unmanaged world (for example, C++).

Additional links

Sources about garbage collection in .NET

Source: https://habr.com/ru/post/155847/
