Optimizing programs for the Garbage Collector
Not long ago, a great article appeared on Habré: "Optimizing garbage collection in a highly loaded .NET service". It is interesting because the authors, armed with theory, achieved what used to seem impossible: they optimized their application using knowledge of how the GC works. And where we previously had little idea of how the GC operates, it is now handed to us on a platter thanks to the efforts of Konrad Kokosa in his book Pro .NET Memory Management. What conclusions did I draw for myself? Let's make a list of problem areas and think about how to solve them.
At the recent CLRium #5: Garbage Collector workshop we talked about the GC for a whole day. Still, I decided to publish one of the talks with a text transcript: the one about conclusions for application optimization.
Reduce cross-generational connectivity
Problem
To speed up garbage collection, the GC collects the younger generation whenever possible. But to do this it also needs information about references from older generations (in this case they act as additional roots): the card table.
A single reference from an older generation to a younger one forces the GC to cover a whole region with a card-table entry:
4 bytes cover 4 KB, i.e. at most 320 objects, on the x86 architecture;
8 bytes cover 8 KB, again at most 320 objects, on the x64 architecture.
That is, when the GC finds a non-zero value while checking the card table, it has to examine up to 320 objects for outgoing references into the generation being collected.
Sparse references to the younger generation therefore make garbage collection more time-consuming.
Solution
Keep objects that hold references to the younger generation close together;
If heavy gen0 object traffic is expected, use pooling: create a pool of objects (no new allocations means no gen0 objects), then "warm up" the pool with two consecutive GCs so that its contents are guaranteed to be promoted to generation 2, thereby avoiding references to the younger generation and keeping zeros in the card table (see the sketch after this list);
Avoid references to the younger generation.
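A minimal sketch of such a pool, assuming a class-based pooled type; WarmObjectPool and its Rent/Return methods are illustrative names, not an existing API:

```csharp
using System;
using System.Collections.Concurrent;

public sealed class WarmObjectPool<T> where T : class, new()
{
    private readonly ConcurrentBag<T> _items = new ConcurrentBag<T>();

    public WarmObjectPool(int size)
    {
        for (int i = 0; i < size; i++)
            _items.Add(new T());

        // "Warm-up": two consecutive blocking collections promote the
        // pooled objects gen0 -> gen1 -> gen2, so afterwards the pool
        // holds no young objects and produces no old-to-young
        // card-table entries.
        GC.Collect();
        GC.Collect();
    }

    public T Rent() => _items.TryTake(out T item) ? item : new T();

    public void Return(T item) => _items.Add(item);
}
```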
Do not allow strong connectivity
Problem
As follows from the compaction phase of the collection algorithm for objects in the SOH:
to compact the heap, the GC must traverse the object graph and fix up every reference to the objects' new addresses;
and references recorded in the card table pull in whole groups of objects at once.
So a generally strongly connected object graph can cause noticeable slowdowns during GC.
Solution
Place strongly connected objects next to each other, in the same generation;
Avoid unnecessary references in general (for example, instead of duplicating a reference as this.Handle, use the existing path this.Service.Handle);
Avoid code with hidden connectivity, such as closures (see the sketch after this list).
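To illustrate the hidden connectivity of closures, a small sketch (the ReportBuilder type is invented for the example): a lambda that touches an instance field silently captures this, keeping the whole object graph alive.

```csharp
using System;

public class ReportBuilder
{
    private readonly byte[] _buffer = new byte[1_000_000];

    // Hidden connectivity: the lambda needs _buffer, so the compiler
    // captures `this`; the returned delegate keeps the whole
    // ReportBuilder (and its megabyte buffer) reachable as long as
    // the delegate itself lives.
    public Func<int> CapturesThis() => () => _buffer.Length;

    // Copying the value into a local first keeps the closure minimal:
    // only one int is captured, not the object graph behind `this`.
    public Func<int> CapturesValueOnly()
    {
        int length = _buffer.Length;
        return () => length;
    }
}
```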
Monitor segment usage
Problem
Under intensive load, a situation may arise where allocating new objects causes delays: new segments are allocated for the heap and later decommitted during garbage collection.
Solution
Using PerfMon / Sysinternals utilities, find the points where new segments are allocated, decommitted, and released;
If the LOH sees heavy buffer traffic, use ArrayPool<T> (see the sketch after this list);
For the SOH, make sure objects with the same lifetime are allocated next to each other, so that the GC can do a Sweep instead of a Collect;
SOH: use object pools.
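For the ArrayPool<T> point, a short sketch using the real System.Buffers API; the 128 KB buffer size is just an illustration of an LOH-sized allocation (the LOH threshold is 85,000 bytes):

```csharp
using System;
using System.Buffers;

byte[] buffer = ArrayPool<byte>.Shared.Rent(128 * 1024);
try
{
    // ... fill and process the buffer; note that Rent may return an
    // array larger than requested ...
}
finally
{
    // Returning the buffer lets the pool reuse it, avoiding repeated
    // LOH allocations for each operation.
    ArrayPool<byte>.Shared.Return(buffer);
}
```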
Do not allocate memory in hot sections of code
Problem
A hot section of code that allocates memory:
makes the GC hand out an allocation window of 8 KB instead of 1 KB;
triggers a GC and growth of the committed region whenever the window runs out of space.
A dense stream of new objects also pushes short-lived objects from other threads into older generations, where garbage collection conditions are worse,
which increases garbage collection time,
and that means longer stop-the-world pauses, even in Concurrent mode.
Solution
Completely ban closures in critical sections of code;
Completely ban boxing in critical sections of code (if necessary, emulate it via pooling);
Where a temporary object is needed to hold data, use a struct, better yet a ref struct; when it has more than two fields, pass it by ref (see the sketch after this list).
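A minimal sketch of the ref struct advice; the ParsePosition type and Advance method are invented for the example:

```csharp
using System;

// As a ref struct, this temporary data carrier can only live on the
// stack, so it creates no GC traffic at all.
public ref struct ParsePosition
{
    public int Line;
    public int Column;
    public int Offset;
}

public static class Parser
{
    // More than two fields, so the struct is passed by ref to avoid
    // copying all of it on every call.
    public static void Advance(ref ParsePosition pos, char c)
    {
        pos.Offset++;
        if (c == '\n') { pos.Line++; pos.Column = 0; }
        else { pos.Column++; }
    }
}
```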
Avoid unnecessary memory allocations in the LOH
Problem
Placing arrays in the LOH leads either to its fragmentation or to a heavier GC procedure.
Solution
Split large arrays into sub-arrays and wrap them in a class that encapsulates the logic of working with them (i.e., instead of a List<T> that stores one mega-array, use your own MyList backed by a T[][] that divides the data into several shorter arrays; see the sketch after this list);
The sub-arrays then go to the SOH;
After a couple of garbage collections they settle next to the long-lived objects and no longer affect collection times;
Keep an eye on double[] arrays longer than 1,000 elements: they go to the LOH despite being smaller than the usual 85,000-byte threshold.
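One possible shape of such a wrapper, sketched under the assumption of a fixed chunk size; ChunkedList is an illustrative name, not a BCL type:

```csharp
using System;
using System.Collections.Generic;

public sealed class ChunkedList<T>
{
    // Each chunk stays well under the 85,000-byte LOH threshold for
    // typical element sizes, so the backing storage lives in the SOH.
    private const int ChunkSize = 4096;
    private readonly List<T[]> _chunks = new List<T[]>();
    private int _count;

    public int Count => _count;

    public void Add(T item)
    {
        if (_count == _chunks.Count * ChunkSize)
            _chunks.Add(new T[ChunkSize]); // grow one SOH-sized chunk at a time
        _chunks[_count / ChunkSize][_count % ChunkSize] = item;
        _count++;
    }

    public T this[int index]
    {
        get => _chunks[index / ChunkSize][index % ChunkSize];
        set => _chunks[index / ChunkSize][index % ChunkSize] = value;
    }
}
```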
Where justified and possible, use the thread stack
Problem
There are objects that are ultra short-lived, or that live only within a single method call (including its nested calls). They create object traffic.
Solution
Allocate memory on the stack where possible:
it does not load the heap;
it does not load the GC;
freeing the memory is instantaneous.
Use Span<T> x = stackalloc T[size]; instead of new T[size] where possible (see the sketch after this list);
Use Span<T> / Memory<T> where possible;
Port algorithms to stack-based ref struct types (e.g., a StackList struct or ValueStringBuilder).
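A small sketch combining stackalloc and Span<T>; the SumOfDigits function is invented for the example:

```csharp
using System;

// The buffer lives on the thread stack: no heap allocation, no GC
// traffic, and the memory is reclaimed instantly when the method returns.
static int SumOfDigits(int value)
{
    Span<char> buffer = stackalloc char[16]; // 16 chars fit any Int32
    value.TryFormat(buffer, out int written);
    int sum = 0;
    foreach (char c in buffer.Slice(0, written))
        if (char.IsDigit(c))
            sum += c - '0';
    return sum;
}

Console.WriteLine(SumOfDigits(-407)); // prints 11
```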
Release objects as early as possible.
Problem
Objects conceived as short-lived end up in gen1, and sometimes in gen2. The result is a heavier GC that takes longer.
Solution
Release object references as early as possible;
If a lengthy algorithm works with some objects in places scattered through the code, but those places can be grouped together, group them: that lets the objects be collected earlier.
For example, a collection is fetched on line 10 but only filtered on line 120, so it stays reachable the whole time in between (see the sketch after this list).
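A sketch of the grouping idea; Order, LoadOrders, and GetRecentOrders are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public record Order(DateTime CreatedAt); // hypothetical domain type

public static class OrderService
{
    private static List<Order> LoadOrders() => new List<Order>(); // stub

    // Fetching and filtering side by side: the full collection becomes
    // unreachable as soon as the filter runs, instead of staying alive
    // across a hundred lines of unrelated work.
    public static List<Order> GetRecentOrders()
    {
        List<Order> all = LoadOrders();
        return all.Where(o => o.CreatedAt > DateTime.UtcNow.AddDays(-7))
                  .ToList();
        // `all` is not used past this point, so the JIT already treats
        // it as dead and the GC can reclaim it at the next collection.
    }
}
```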
Calling GC.Collect() is not necessary
Problem
It often seems that calling GC.Collect() will fix the situation.
Solution
It is much more productive to learn the GC's algorithms and to examine the application with ETW and other diagnostic tools (JetBrains dotMemory, ...).
Avoid pinning
Problem
Pinning may leave some objects stuck in a younger generation, thereby forming references tracked through the card table.
Solution
If there is no other way, use fixed () { }: this construct does not pin the object up front; the object is only actually pinned if a GC runs while execution is inside the curly braces (see the sketch below).
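A minimal sketch of the fixed () { } form (requires compiling with unsafe enabled):

```csharp
public static class PinExample
{
    // The array is pinned only while execution is inside the braces,
    // and the pin costs nothing unless a GC actually happens during
    // that window.
    public static unsafe int FirstByte(byte[] data)
    {
        fixed (byte* p = data)
        {
            return *p;
        }
    }
}
```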
Avoid finalization
Problem
Finalization is non-deterministic:
an uncalled Dispose() leads to finalization, which holds on to all the object's outgoing references;
dependent objects are kept alive longer than planned;
they grow older, moving into older generations;
if they contain references to younger objects, they generate card-table entries;
this complicates collection of the older generations, fragments them, and leads to a Compact instead of a Sweep.
Solution
Call Dispose() diligently (see the sketch below).
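A short sketch: the using statement is the usual way to make sure Dispose() runs even on the exception path.

```csharp
using System;
using System.IO;

public static class FirstLineReader
{
    // `using` guarantees Dispose() is called even if an exception is
    // thrown; a properly implemented Dispose() calls GC.SuppressFinalize,
    // so the object never reaches the finalization queue.
    public static string ReadFirstLine(string path)
    {
        using (var reader = new StreamReader(path))
        {
            return reader.ReadLine() ?? string.Empty;
        }
    }
}
```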
Avoid lots of threads
Problem
With a large number of threads, the number of allocation contexts grows, since each thread gets its own:
as a result, GC is triggered sooner;
and due to the lack of space in the ephemeral segment, a Collect follows instead of a Sweep.
Solution
Keep the number of threads in line with the number of cores.
Avoid traffic of objects of different sizes
Problem
Traffic of objects with different sizes and lifetimes causes fragmentation:
the fragmentation ratio grows;
compacting collections are triggered, with an address-fixup phase that walks all referencing objects.
Solution
If object traffic is unavoidable:
check for excessive slack and roughly over-estimated object sizes;
make sure strings are not being manipulated: where possible, replace them with ReadOnlySpan<char> / ReadOnlyMemory<char> (see the sketch after this list);
release references as early as possible;
use pooling;
"warm up" caches and pools with a double GC so that their objects are compacted; this way you avoid card-table problems.
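To close, a sketch of the ReadOnlySpan<char> advice; ConfigParser, ExtractKey, and the "key=value" format are illustrative:

```csharp
using System;

public static class ConfigParser
{
    // Slicing a ReadOnlySpan<char> allocates nothing, unlike Substring,
    // which creates a new string object on every call.
    public static ReadOnlySpan<char> ExtractKey(ReadOnlySpan<char> line)
    {
        int separator = line.IndexOf('=');
        return separator < 0 ? line : line.Slice(0, separator).Trim();
    }
}

// Usage: strings convert implicitly to ReadOnlySpan<char>, so
// ExtractKey("timeout = 30") yields "timeout" without allocating.
```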