Duke, take out the trash! - Part 3

Today we continue the series of articles about the garbage collectors shipped with the Oracle Java HotSpot VM. We have already covered a bit of theory and looked at how the two basic collectors, Serial GC and Parallel GC, deal with the heap. In this article we focus on the CMS GC and G1 GC collectors, whose primary task is to minimize pauses while tidying up the memory of applications that work with medium and large amounts of data, which for the most part means server applications.

These two collectors share the common name “mostly concurrent collectors”. The name reflects the fact that they perform part of their work concurrently with the main application threads, so at certain moments they compete with those threads for processor resources. Naturally, this comes at a cost: they trade shorter pauses for lower throughput, although each does so in its own way. Let's see how.

CMS GC

The CMS collector (the name stands for Concurrent Mark Sweep) appeared in the HotSpot VM at the same time as Parallel GC, as an alternative to it for applications that have access to several processor cores and are sensitive to STW pauses. At that time there was one more alternative, Incremental GC, but it did not survive natural selection for lack of clear advantages. CMS did survive. Although its popularity appears to have peaked already, its internals are still worth a look, since some of the ideas behind it have carried over to the more modern G1 GC.

CMS GC is enabled with the -XX:+UseConcMarkSweepGC option.
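
To check which collector a running JVM actually picked up, the standard java.lang.management API can be queried. This is only a minimal sketch: the class name ActiveCollectors is made up for illustration, and the reported names depend on the JVM version and the selected collector (with CMS, HotSpot typically reports ParNew and ConcurrentMarkSweep). Keep in mind that CMS was removed in JDK 14, so the flag has an effect only on older JDKs.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the collectors the running JVM exposes via JMX.
class ActiveCollectors {
    static List<String> names() {
        List<String> result = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            result.add(gc.getName());
        }
        return result;
    }

    public static void main(String[] args) {
        // The exact strings vary between JVM versions and collectors.
        names().forEach(System.out::println);
    }
}
```

Running it once with and once without the collector option is a quick way to confirm that the option was really applied.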

Principles of operation

We already met the words Mark and Sweep when looking at the serial and parallel collectors (if you have not, now is the time to do so). They denote two steps of garbage collection in the old generation: marking the surviving objects and removing the dead ones. The CMS collector got its name because it performs these steps concurrently with the main program.

CMS GC uses the same memory organization as the Serial / Parallel GC already reviewed: the Eden + Survivor 0 + Survivor 1 + Tenured regions, and the same principles of minor garbage collection. The differences begin only with the full collection. In CMS it is called a major collection rather than a full one, since it does not touch the objects of the young generation. As a result, minor and major collections are always separate here. One side effect of this separation is that all objects of the young generation (even potentially dead ones) can act as roots when determining the status of objects in the old generation.

An important difference between CMS and the collectors considered earlier is that it does not wait for Tenured to fill up before starting a major collection. Instead, it works in the background constantly, trying to keep the occupancy of Tenured low.

Let's look at what a major garbage collection involves when using CMS GC.

It begins by stopping the main application threads and marking all objects directly reachable from the roots. The application then resumes, and the collector, concurrently with it, searches for all live objects reachable by reference from those marked root objects (it does this part in one or several threads).

Naturally, the state of the heap can change during such a search, and not all the information gathered about live objects stays accurate. So the collector suspends the application once more and scans the heap for live objects that escaped it during the first pass. It is considered acceptable for the set of live objects to include some that are no longer live by the time the list is complete. Such objects are called floating garbage; they will be removed during the next collection.

Once the live objects are marked, the main application threads resume, and the collector clears dead objects from memory in several parallel threads. Keep in mind that after this sweep the objects in the old generation are not compacted, since that is very hard to do while the application is running.
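
The phases described above can be imitated on a toy object graph. The sketch below is a deliberately simplified, single-threaded model and not the CMS implementation: the class ToyMarkSweep, the Node type and the scenario in demo() are all invented for illustration, and the "remark" step simply rescans the one object that the "application" modified. It shows how an object that loses its last reference after being marked survives the first sweep as floating garbage and is reclaimed by the next cycle.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Toy, single-threaded model of mark / remark / sweep (hypothetical, not HotSpot code).
class ToyMarkSweep {
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Marks every object reachable from the given starting objects.
    static void mark(List<Node> from, Set<Node> marked) {
        Deque<Node> work = new ArrayDeque<>(from);
        while (!work.isEmpty()) {
            Node n = work.pop();
            if (marked.add(n)) {      // first visit: follow its references
                work.addAll(n.refs);
            }
        }
    }

    // Removes unmarked objects from the heap and returns their names.
    static List<String> sweep(List<Node> heap, Set<Node> marked) {
        List<String> freed = new ArrayList<>();
        Iterator<Node> it = heap.iterator();
        while (it.hasNext()) {
            Node n = it.next();
            if (!marked.contains(n)) {
                freed.add(n.name);
                it.remove();
            }
        }
        return freed;
    }

    // Runs two collection cycles and returns the names freed by each.
    static List<List<String>> demo() {
        Node root = new Node("root"), a = new Node("a"), b = new Node("b"), c = new Node("c");
        root.refs.add(a);
        a.refs.add(b);                                   // root -> a -> b, c is unreachable
        List<Node> heap = new ArrayList<>(List.of(root, a, b, c));
        List<Node> roots = List.of(root);

        // Cycle 1: initial mark and "concurrent" mark, collapsed into one traversal.
        Set<Node> marked = new HashSet<>();
        mark(roots, marked);
        // Meanwhile the "application" drops b and allocates d.
        a.refs.clear();
        Node d = new Node("d");
        heap.add(d);
        root.refs.add(d);
        // Remark: rescan the references of the modified object; b stays marked.
        mark(root.refs, marked);
        List<String> first = sweep(heap, marked);

        // Cycle 2: b is no longer reachable and is finally reclaimed.
        marked = new HashSet<>();
        mark(roots, marked);
        List<String> second = sweep(heap, marked);
        return List.of(first, second);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [[c], [b]]
    }
}
```

In the first cycle only c is freed, although b is already dead by the time of the sweep; b is the floating garbage of this example.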

The CMS collector is fairly smart. For example, it tries to space minor and major collections apart in time so that together they do not create long pauses in the application (more details about this spacing are in the comments). To do so, it keeps statistics on past collections and plans subsequent ones accordingly.

A separate case is when the collector does not manage to clean Tenured before memory runs out completely. The application is then stopped, and the entire collection is performed in serial mode. This situation is called a concurrent mode failure. The collector reports these failures when the -verbose:gc or -Xloggc:filename options are enabled.

CMS has one interesting mode of operation, called Incremental Mode, or i-cms, in which it periodically pauses its concurrent work for a short time to free up processor resources for the application (something like ABS in a car). This can be useful on machines with a small number of cores. However, this mode is already marked as deprecated and may be removed in future releases, so we will not examine it in detail.

STW situations

From all of the above it follows that during normal garbage collection CMS GC has the following situations that lead to STW:

Minor garbage collection. This pause is no different from the same pause in Parallel GC.
The initial phase of the search for live objects in a major collection (the so-called initial mark pause). This pause is usually very short.
The phase that completes the set of live objects in a major collection (known as the remark pause). It is usually longer than the initial mark pause.

In the event of a concurrent mode failure, the pause can last quite a long time.

Tuning

Since CMS's approach to memory organization is similar to that of Serial / Parallel GC, the same options for sizing the heap regions apply to it, as do the options for automatic tuning toward the required performance goals.

Usually CMS decides when to run a major collection based on the statistics it gathers about the application's behavior, but it also has a threshold for the occupancy of the Tenured region; when that threshold is reached, a major collection must be started. The threshold is set with the -XX:CMSInitiatingOccupancyFraction=? option, whose value is a percentage. A value of -1 (sometimes the default) means that triggering a collection by this condition is disabled.
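
The threshold rule can be written down in a few lines. This is a sketch of the condition as described in the text, not of the JVM's actual decision logic, which also takes collected statistics into account; the class CmsTrigger and its method name are hypothetical.

```java
// Hypothetical model of the occupancy threshold check described above.
class CmsTrigger {
    // Returns true when Tenured occupancy has reached the configured percentage.
    // A negative fraction models the -1 setting: this trigger is switched off.
    static boolean thresholdReached(long usedBytes, long capacityBytes, int occupancyFraction) {
        if (occupancyFraction < 0 || capacityBytes <= 0) {
            return false;
        }
        double occupancyPercent = 100.0 * usedBytes / capacityBytes;
        return occupancyPercent >= occupancyFraction;
    }

    public static void main(String[] args) {
        System.out.println(thresholdReached(700, 1000, 70)); // true: 70% occupied, threshold 70
        System.out.println(thresholdReached(699, 1000, 70)); // false: just below the threshold
        System.out.println(thresholdReached(999, 1000, -1)); // false: trigger disabled
    }
}
```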

Advantages and disadvantages

The advantage of this collector over the previously considered Serial / Parallel GC is its focus on minimizing pause times, which is critical for many applications. To achieve this, however, it sacrifices processor resources and often overall throughput.

Recall also that this collector does not compact objects in the old generation, which leads to fragmentation of Tenured. Combined with the presence of floating garbage, this means the application (specifically, the old generation) needs more memory than it would with other collectors (Oracle advises 20% more).

And long pauses in the event of concurrent mode failures can be an unpleasant surprise, although they are infrequent, and with enough memory CMS manages to avoid them entirely.

Nevertheless, this collector may suit applications that use a large amount of long-lived data. In that case some of its drawbacks are smoothed out. In any case, do not decide on it until you have met one more Java HotSpot VM collector.

G1 GC

So we have reached the last, and for many probably the most interesting, garbage collector: G1 (short for Garbage First). It is interesting above all because it is not a straightforward continuation of the Serial / Parallel / CMS line that adds parallelism to yet another phase of garbage collection; it takes a significantly different approach to cleaning up memory.

G1 is the youngest of the HotSpot virtual machine's garbage collectors. It was initially positioned as a collector for applications with large heaps (4 GB and above) for which keeping response time short and predictable matters, even at the cost of throughput. In this field it competed with CMS GC, although at first not as successfully as one would have liked. But it was gradually fixed, improved and stabilized, and has finally reached the point where Oracle speaks of it as the long-term replacement for CMS, and OpenJDK is even seriously considering it as the default collector for server configurations in version 9.

All of this clearly makes it worth understanding how it works. Let's not put it off.

G1 is enabled with the Java option -XX:+UseG1GC.

Principles of operation

The first thing that catches the eye with G1 is the changed approach to organizing the heap. Here memory is divided into many regions of equal size. The region size depends on the total heap size and by default is chosen so that there are no more than about 2048 regions, which usually gives a size between 1 and 32 MB. The only exceptions are the so-called humongous regions, which are created by merging ordinary regions to hold very large objects.
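
The sizing rule can be approximated as follows. This is a rough sketch under stated assumptions, not the HotSpot ergonomics code: it assumes the size is the heap divided by 2048, rounded down to a power of two and clamped to the 1 to 32 MB range mentioned above. The class RegionSize and the method forHeap are invented names.

```java
// Hypothetical approximation of the default region size selection.
class RegionSize {
    static final long MB = 1024L * 1024L;
    static final long TARGET_REGIONS = 2048;

    // heap / 2048, rounded down to a power of two, clamped to [1 MB, 32 MB].
    static long forHeap(long heapBytes) {
        long raw = Math.max(heapBytes / TARGET_REGIONS, MB);
        long powerOfTwo = Long.highestOneBit(raw);
        return Math.min(powerOfTwo, 32 * MB);
    }

    public static void main(String[] args) {
        long[] heapsInMb = {512, 4096, 6144, 131072};
        for (long heapMb : heapsInMb) {
            System.out.println(heapMb + " MB heap -> " + forHeap(heapMb * MB) / MB + " MB regions");
        }
    }
}
```

Under these assumptions a 4 GB heap gets 2 MB regions, while very large heaps hit the 32 MB cap and therefore end up with more than 2048 regions.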

The division of regions into Eden, Survivor and Tenured is purely logical here: regions of one generation do not have to be contiguous and can even change which generation they belong to. An example of dividing a heap into regions might look like this (the number of regions is greatly reduced):

Minor collections are performed periodically to clean the young generation and move objects to Survivor regions, or to promote them to the old generation by moving them to Tenured. Several threads do the moving, and the main application is stopped while they work. This approach is already familiar from the collectors discussed earlier, but the difference is that cleaning is not performed on the whole generation, only on the part of the regions that the collector can clean without exceeding the desired pause time. It picks the regions where, by its estimate, the most garbage has accumulated and whose cleaning will bring the greatest benefit. Hence the name Garbage First: garbage comes first.
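
The "garbage first" idea can be illustrated with a simple greedy selection. The sketch below is only an illustration of the principle, not G1's real heuristics: the Region type, its fields, the sample numbers and the ranking by reclaimable bytes per predicted millisecond are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical greedy model: most garbage per unit of predicted time goes first.
class GarbageFirstSelection {
    static final class Region {
        final String id;
        final long garbageBytes;      // estimated reclaimable space
        final double predictedMillis; // estimated cost of cleaning, must be > 0
        Region(String id, long garbageBytes, double predictedMillis) {
            this.id = id;
            this.garbageBytes = garbageBytes;
            this.predictedMillis = predictedMillis;
        }
    }

    // Picks regions in order of efficiency while the pause budget allows.
    static List<String> choose(List<Region> candidates, double pauseBudgetMillis) {
        List<Region> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator
                .comparingDouble((Region r) -> r.garbageBytes / r.predictedMillis)
                .reversed());
        List<String> chosen = new ArrayList<>();
        double spent = 0;
        for (Region r : sorted) {
            if (spent + r.predictedMillis <= pauseBudgetMillis) {
                chosen.add(r.id);
                spent += r.predictedMillis;
            }
        }
        return chosen;
    }

    // Made-up sample data: four regions and a 12 ms pause budget.
    static List<String> demo() {
        List<Region> regions = List.of(
                new Region("A", 900, 10),
                new Region("B", 500, 2),
                new Region("C", 800, 8),
                new Region("D", 100, 5));
        return choose(regions, 12);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [B, C]
    }
}
```

Region A holds the most garbage in absolute terms, but cleaning it would not fit into the remaining budget after the more efficient B and C are taken.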

The full collection (more precisely, here it is called a mixed collection) is a bit more elaborate than in the collectors discussed earlier. G1 has a process called the marking cycle, which runs concurrently with the main application and compiles a list of live objects. Except for the last step, this process should look familiar:

Initial mark. Marking the roots (with the main application stopped) using information obtained from minor collections.
Concurrent marking. Marking all live objects in the heap in several threads, concurrently with the main application.
Remark. An additional search for live objects not yet accounted for (with the main application stopped).
Cleanup. Clearing the auxiliary structures that track references to objects, and finding empty regions that can already be used for new objects. The first part of this step is performed with the main application stopped.

Keep in mind that to obtain the list of live objects G1 uses the Snapshot-At-The-Beginning (SATB) algorithm: the list includes all objects that were live when the algorithm started, plus all objects created while it was running. In particular, this means that G1 tolerates floating garbage, which we met when looking at the CMS collector.

After the marking cycle ends, G1 switches to performing mixed collections. This means that in each collection a certain number of old-generation regions are added to the set of young-generation regions to be cleaned. The number of such collections and the number of old-generation regions cleaned are chosen from the collector's statistics on previous collections, so as to stay within the required collection time. Once the collector has freed enough memory, it switches back to minor collections.

The next marking cycle and, consequently, the next mixed collections will start when heap occupancy exceeds a certain threshold.

A mixed garbage collection in the heap example above might go like this:

It may turn out that during the cleanup there are no free regions left in the heap into which surviving objects could be copied. This leads to an allocation (evacuation) failure, something similar to what we saw in CMS. In this case the collector performs a full garbage collection of the whole heap with the main application threads stopped.

Relying on the statistics on previous collections mentioned above, G1 can change the number of regions assigned to a particular generation in order to optimize future collections.

Giants

At the beginning of the G1 discussion I mentioned humongous regions, which hold so-called humongous objects. From the JVM's point of view, any object larger than half a region is considered humongous and is handled in a special way:

It is never moved between regions.
It can be removed as part of a marking cycle or a full garbage collection.
Nothing else is placed in a region occupied by a humongous object, even if free space remains in it.
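
The rules above translate into simple arithmetic. The following sketch assumes only what the text states (an object larger than half a region is humongous and does not share its regions with anything else); the class Humongous and its method names are made up for the example.

```java
// Hypothetical helper illustrating the humongous-object arithmetic.
class Humongous {
    // An object larger than half a region is treated as humongous.
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    // Number of whole regions a humongous object occupies (rounded up).
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    // Space left unused in those regions, since nothing else is placed there.
    static long wastedBytes(long objectBytes, long regionBytes) {
        return regionsNeeded(objectBytes, regionBytes) * regionBytes - objectBytes;
    }

    public static void main(String[] args) {
        long region = 1024L * 1024L;      // 1 MB regions
        long object = 2621440L;           // a 2.5 MB object
        System.out.println(isHumongous(object, region));   // true
        System.out.println(regionsNeeded(object, region)); // 3
        System.out.println(wastedBytes(object, region));   // 524288
    }
}
```

With 1 MB regions a 2.5 MB object takes three regions and leaves half a megabyte unusable until the object itself is collected.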

These points sometimes have far-reaching consequences. Large objects, especially short-lived ones, can cause a lot of trouble for any type of collector, since they are not removed by minor collections but take up precious space in old-generation regions (remember the “accelerated” objects discussed in the previous part?). But G1 is more vulnerable to their negative impact, because for it even an object of a few megabytes (and in some cases about 500 KB) already counts as humongous. A comment on the previous article gives an example of such a problem with Solr.

Later in this series we will see how to deal with this.

STW situations

To summarize, with G1 we get STW in the following cases:

Moving objects between generations. To minimize these pauses, G1 uses several threads.
The short initial mark phase within the marking cycle.
A longer pause at the end of the remark phase and at the beginning of the cleanup phase of the marking cycle.

Tuning

Since the main goal of the G1 collector is to minimize pauses in the main application, its main tuning option is one we have already met: -XX:MaxGCPauseMillis=?, which sets the maximum acceptable time for a single garbage collection. Even if you are not going to set this property, at least check its default value. Although the Oracle documentation states that the collection time is unlimited by default, in practice this is not always the case.

The -XX:ParallelGCThreads=? and -XX:ConcGCThreads=? options set the number of threads used for garbage collection and for running the marking cycle, respectively.

If you are not satisfied with the automatically chosen region size, you can set it manually with the -XX:G1HeapRegionSize=? option. The value must be a power of two when measured in megabytes. For example, -XX:G1HeapRegionSize=16m.

If desired, you can change the heap occupancy threshold at which a marking cycle is started and the switch to mixed collections begins. This is done with the -XX:InitiatingHeapOccupancyPercent=? option, which takes a percentage. The default threshold is 45%.

If you decide to dig deeper into G1 tuning, you can enable additional features with the -XX:+UnlockExperimentalVMOptions and -XX:+AggressiveOpts options and play with the experimental settings.

Advantages and disadvantages

In general, the G1 collector is considered to predict pause lengths more accurately than CMS and to distribute collections in time better so as to prevent long application stops, especially with large heaps. It is also free of some of CMS's other drawbacks; for example, it does not fragment memory.

The price of G1's advantages is processor resources, which it uses to perform a fairly large part of its work concurrently with the main program. As a result, application throughput suffers. The default throughput goal for G1 is 90%. For Parallel GC, for example, this value is 99%. This does not, of course, mean that throughput with G1 will always be almost 10% lower, but it should always be kept in mind.
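
Throughput in this sense is the share of time the application spends outside garbage collection. As a rough way to observe it on a live JVM, the accumulated collection time reported by the standard java.lang.management beans can be compared with the uptime. This is a sketch with an invented class name, GcThroughput; the reported collection time is approximate, and what exactly it covers differs between collectors and JVM versions, so treat the result as an estimate.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: estimates the share of time not spent in GC.
class GcThroughput {
    // Percentage of uptime that was not reported as collection time.
    static double percent(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 100.0;
        }
        return 100.0 * (uptimeMillis - gcMillis) / uptimeMillis;
    }

    // Sums the approximate accumulated collection time over all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime(); // -1 when the value is undefined
            if (time > 0) {
                total += time;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long gcTime = totalGcMillis();
        System.out.printf("GC time: %d ms, approximate throughput: %.1f%%%n",
                gcTime, percent(gcTime, uptime));
    }
}
```

For example, 100 ms of collection time over 1000 ms of uptime corresponds to the 90% goal, and 10 ms over the same period to 99%.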

So we have gone through the algorithms of all four garbage collectors in the HotSpot virtual machine. In the next article we will try to figure out how this knowledge can be used to optimize applications.

Previously:
← Part 2 - Serial GC and Parallel GC collectors
← Part 1 - Introduction

Source: https://habr.com/ru/post/269863/
