Duke, take out the trash! - Part 2

In the previous article, we introduced the terminology and basic ideas underlying the garbage collectors of the Java HotSpot VM and of many other virtual machines. Now we can finally take shovels in hand and start raking our heap. Today we are reviewing ~~two shovels~~ two garbage collectors that are used by a huge number of Java programs, often without anyone even being aware of it: Serial GC and Parallel GC. Their popularity has a simple explanation: the virtual machine selects these collectors by default on most hardware configurations.

The approaches to working with the heap used in these collectors reappear, in one form or another, in more advanced implementations, so at this stage it is very important to understand the ideas and capabilities behind them.

Serial GC

Serial GC (also known as the sequential collector) is the most junior collector in terms of functionality, but the most senior in terms of how long it has been part of the JVM. It was slowly but surely collecting garbage back when many of us did not even suspect that the Java language existed. And it still is. Just as slowly, and just as surely.
But it has not been consigned to history, because not all programs have large heaps and not all programs run on computers with powerful multi-core processors. In such Spartan conditions it is very helpful. And even if this is not your case, you should not skip this chapter, since it describes the basic approaches to garbage collection in the JVM. So let's get started.

Serial GC is enabled with the -XX:+UseSerialGC option.
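To check that the option actually took effect, one possibility is to ask the running JVM which collectors it reports through the standard java.lang.management API. The sketch below is only an illustration: the class name ActiveCollectors is made up for this example, and the exact collector names printed depend on the JVM build, so none are promised here.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the JVM or of the original article.
class ActiveCollectors {

    /** Returns the names of the garbage collectors the running JVM exposes. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Run for example as: java -XX:+UseSerialGC ActiveCollectors.java
        // The JVM typically reports one name for the young generation
        // collector and one for the old generation collector.
        System.out.println(collectorNames());
    }
}
```

Running the same class with a different collector option is a quick way to see the names change.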

Work principles

With this collector, the heap is divided into four regions. Three of them belong to the young generation (Eden, Survivor 0 and Survivor 1), and one (Tenured) to the old generation.

The average object begins its life in the Eden region (named after the Garden of Eden, which is quite logical). This is where the JVM places it at the moment of creation. But over time it may turn out that there is no room in Eden for a newly created object; in that case a minor garbage collection is started.

First of all, this collection finds and removes dead objects from Eden. The remaining live objects are moved to the empty Survivor region. One of the two Survivor regions is always empty, and it is the one chosen as the destination for objects from Eden.

After a minor collection, the Eden region is completely empty and can be used for new objects. But sooner or later the application will again fill the whole of Eden, and the JVM will again run a minor collection, this time cleaning both Eden and the occupied Survivor 0, and then moving all surviving objects to the empty Survivor 1 region.

Next time, Survivor 0 will again be the destination region. As long as there is enough space in the Survivor regions, everything goes well.

The JVM constantly tracks how many times objects have been moved between Survivor 0 and Survivor 1, and selects a suitable threshold for the number of such moves after which objects are promoted to Tenured, that is, to the old generation. If the Survivor region fills up, the objects that do not fit in it are also sent to Tenured.
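The life cycle described above can be modelled with a few lists. The following is a deliberately simplified toy model, not how HotSpot is implemented: the class ToyYoungGen, its fields, the fixed tenuring threshold and the capacity counted in objects rather than bytes are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a young generation with two Survivor regions (illustration only).
class ToyYoungGen {

    static class Obj {
        final String name;
        boolean live = true;  // the application still references the object
        int age = 0;          // number of minor collections survived
        Obj(String name) { this.name = name; }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>();  // Survivor region that may hold objects
    List<Obj> to = new ArrayList<>();    // Survivor region that is always empty
    final List<Obj> tenured = new ArrayList<>();
    final int survivorCapacity;          // capacity in objects, for simplicity
    final int tenuringThreshold;         // fixed here; the real JVM adapts it

    ToyYoungGen(int survivorCapacity, int tenuringThreshold) {
        this.survivorCapacity = survivorCapacity;
        this.tenuringThreshold = tenuringThreshold;
    }

    Obj allocate(String name) {
        Obj o = new Obj(name);
        eden.add(o);
        return o;
    }

    /** Copies survivors out of Eden and the occupied Survivor region. */
    void minorCollect() {
        copyLive(eden);
        copyLive(from);
        eden.clear();   // everything left behind is garbage
        from.clear();
        List<Obj> tmp = from;  // swap roles: the filled region becomes "from"
        from = to;
        to = tmp;
    }

    private void copyLive(List<Obj> region) {
        for (Obj o : region) {
            if (!o.live) continue;  // dead objects are simply not copied
            o.age++;
            boolean tooOld = o.age > tenuringThreshold;
            boolean noRoom = to.size() >= survivorCapacity;
            if (tooOld || noRoom) tenured.add(o); else to.add(o);
        }
    }

    public static void main(String[] args) {
        ToyYoungGen gen = new ToyYoungGen(4, 2);
        gen.allocate("a");
        gen.allocate("b").live = false;  // b dies before the first collection
        for (int i = 1; i <= 3; i++) {
            gen.minorCollect();
            System.out.println("after GC " + i + ": survivors=" + gen.from.size()
                    + ", tenured=" + gen.tenured.size());
        }
    }
}
```

In this model an object is promoted either because it has survived more collections than the threshold allows or because the destination Survivor region is full, which mirrors the two promotion paths described in the text.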

The minor collection process described above is quite simple, but the reasons for using Survivor regions, and exactly two of them, are not always obvious. A detailed explanation is beyond the scope of this article (it is discussed in the comments). Here we only note that of the two main ways of handling surviving objects, compacting and copying, Sun chose the second for the minor collector, since it is easier to implement and often turns out to be faster.

When there is not enough space in Tenured for new objects, a full garbage collection comes into play, working with objects from both generations. Unlike the young generation, the old generation is not divided into subregions; it is one big chunk of memory. So after dead objects are removed from Tenured, the survivors are not copied anywhere (there is nowhere to copy them to) but compacted, that is, placed one after another without gaps. This cleanup mechanism is called Mark-Sweep-Compact after its steps: mark the surviving objects, sweep away the dead ones, compact the survivors.

Oversized objects

The most attentive readers have probably noticed that the description above says an average object is created in Eden, not any object. That qualification is deliberate. There are also oversized objects, so large that it is too expensive to create them in Eden and then drag them from one Survivor region to the other. Such objects are placed directly in Tenured.

Is the heap too small?

Important factors in the described processes are the absolute size of the heap and the relative sizes of the regions within it.

As the heap fills with data, the JVM can not only clean up memory but also ask the OS for additional memory to expand the regions. Conversely, if the amount of memory actually in use falls below a certain threshold, the JVM can return part of the memory to the operating system. The well-known -Xms and -Xmx options exist to keep the virtual machine's appetite in check.
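The effect of these options can be observed from inside a program through the standard Runtime API. The sketch below is a minimal illustration; the class name HeapSizes and its helper methods are invented for this example, and the reported maximum may differ slightly from the -Xmx value depending on the collector.

```java
// Hypothetical helper class showing how much heap the JVM currently holds.
class HeapSizes {

    private static final long MB = 1024 * 1024;

    /** Heap memory the JVM has currently obtained from the OS, in bytes. */
    static long committed() {
        return Runtime.getRuntime().totalMemory();
    }

    /** Upper bound the JVM will attempt to use (governed by -Xmx), in bytes. */
    static long max() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Part of the committed heap currently occupied, in bytes (a snapshot). */
    static long used() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        // Try for example: java -Xms64m -Xmx256m HeapSizes.java
        System.out.printf("used=%d MB, committed=%d MB, max=%d MB%n",
                used() / MB, committed() / MB, max() / MB);
    }
}
```

Watching the committed value over time while the application allocates and releases memory shows the heap growing toward the maximum and, in some cases, shrinking again.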

Setting the heap limits is sometimes enough for a program to run without noticeable slowdowns, but fine-tuning the collector to reach the required performance is done by adjusting the sizes of the individual regions. We will look at examples of such tuning and its effect on a program in a separate article; for now we simply list the relevant parameters in the Customization section below.

It is also worth noting that by default the young generation occupies one third of the heap and the old generation, accordingly, two thirds. In addition, each Survivor region takes up one tenth of the young generation, so Eden takes eight tenths. As a result, by default Eden gets 8/30 of the whole heap, each Survivor region 1/30, and Tenured 20/30.

And what happens if, even after the maximum amount of memory has been allocated and fully cleaned, there is still no room for new objects? As expected, we get java.lang.OutOfMemoryError: Java heap space and the application stops working, leaving us its heap as a dump file for analysis as a keepsake. An OutOfMemoryError (in that case with the message GC overhead limit exceeded) is also thrown when the collector starts taking at least 98% of the running time while its collections free no more than 2% of the memory.

STW situations

With this collector everything is quite simple, since all of its work is one continuous STW (stop-the-world) pause. At the start of each garbage collection the application's threads are stopped, and they resume only after the collection ends. Moreover, Serial GC does all of the cleanup unhurriedly, in a single thread, sequentially, which is how it earned its name.

Customization

We have already mentioned that the -Xms and -Xmx options set the initial and maximum heap sizes, respectively. Most of you have surely used them already. Now let's dig deeper.

The options -XX:MinHeapFreeRatio=? and -XX:MaxHeapFreeRatio=? set the minimum and maximum share of free space in each generation; when a limit is reached, the generation is automatically grown or shrunk, respectively. For example, with MinHeapFreeRatio=35, if the share of free space in a generation falls below 35%, that generation is given additional space so that at least 35% becomes free. Similarly, with MaxHeapFreeRatio=65, if the share of free space in a generation rises to 65% or more, part of the memory allocated to that generation is released to return to the desired threshold. The default values of these parameters depend on the hardware characteristics of the computer.
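The arithmetic behind these two thresholds can be sketched as follows. This is a back-of-the-envelope model only: the class FreeRatioSizing and its methods are hypothetical, sizes are in whole megabytes, and a real JVM additionally respects the -Xms and -Xmx limits and its own alignment rules.

```java
// Hypothetical calculator for the free-space thresholds described above.
class FreeRatioSizing {

    /** Smallest capacity (MB) at which at least minFreePercent is free. */
    static long capacityForMinFree(long usedMb, int minFreePercent) {
        long occupiedPercent = 100 - minFreePercent;
        // Round up so that the free share does not fall below the minimum.
        return (usedMb * 100 + occupiedPercent - 1) / occupiedPercent;
    }

    /** Largest capacity (MB) at which no more than maxFreePercent is free. */
    static long capacityForMaxFree(long usedMb, int maxFreePercent) {
        long occupiedPercent = 100 - maxFreePercent;
        // Round down so that the free share does not exceed the maximum.
        return usedMb * 100 / occupiedPercent;
    }

    public static void main(String[] args) {
        // 700 MB used out of 1000 MB leaves 30% free, below MinHeapFreeRatio=35.
        System.out.println("grow to " + capacityForMinFree(700, 35) + " MB");
        // 200 MB used out of 1000 MB leaves 80% free, above MaxHeapFreeRatio=65.
        System.out.println("shrink to " + capacityForMaxFree(200, 65) + " MB");
    }
}
```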

You can set the desired ratio of the size of the old generation to the total size of the young generation regions with the -XX:NewRatio=? option. For example, NewRatio=3 means that the young generation (Eden + S0 + S1) gets a quarter of the heap and the old generation three quarters. The counterintuitive name of this option causes some confusion even in the Oracle documentation, but that is how it works. A simple way to remember it: where an option name ends in Ratio, the actual share works out as the inverse of what the name suggests.

If desired, you can bound the size of the young generation from below and above in absolute terms with the options -XX:NewSize=? and -XX:MaxNewSize=?. If you want the same value for NewSize and MaxNewSize, you can simply use the -Xmn option. For example, -Xmn256m is equivalent to -XX:NewSize=256m -XX:MaxNewSize=256m.

You can also reach inside the young generation and adjust the ratio of the size of Eden to the size of one Survivor region. This is done with the -XX:SurvivorRatio=? option. For example, with SurvivorRatio=6 each Survivor region occupies one eighth of the young generation and Eden six eighths (remember the rule about *Ratio options).
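Since the *Ratio options are easy to get backwards, it may help to spell the arithmetic out. The sketch below is a simplified calculator under the assumption that sizes divide evenly; the class HeapLayout and its methods are made up for this example, and a real JVM rounds region sizes to its own alignment.

```java
// Hypothetical calculator for region sizes derived from NewRatio and SurvivorRatio.
class HeapLayout {

    /** Young generation size: the heap is split as 1 part young to newRatio parts old. */
    static long young(long heapMb, int newRatio) {
        return heapMb / (newRatio + 1);
    }

    /** Old generation size: whatever the young generation does not take. */
    static long old(long heapMb, int newRatio) {
        return heapMb - young(heapMb, newRatio);
    }

    /** One Survivor region: young is split as survivorRatio parts Eden plus 2 Survivors. */
    static long survivor(long youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    /** Eden: the young generation minus both Survivor regions. */
    static long eden(long youngMb, int survivorRatio) {
        return youngMb - 2 * survivor(youngMb, survivorRatio);
    }

    public static void main(String[] args) {
        // Proportions from the text: NewRatio=3 and SurvivorRatio=6 on a 1024 MB heap.
        long heapMb = 1024;
        long youngMb = young(heapMb, 3);
        System.out.println("young=" + youngMb + " MB, old=" + old(heapMb, 3) + " MB");
        System.out.println("eden=" + eden(youngMb, 6) + " MB, each survivor="
                + survivor(youngMb, 6) + " MB");
    }
}
```

With NewRatio=2 and SurvivorRatio=8 the same methods reproduce the default one-third and one-tenth proportions mentioned earlier.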

With the -XX:-UseGCOverheadLimit option you can disable the 98% collector activity threshold that triggers an OutOfMemoryError.

If you are curious to watch how your objects age in the Survivor regions and what target size is currently set for them, use the -XX:+PrintTenuringDistribution option, which adds Survivor statistics to the information printed for some garbage collections.

Advantages and disadvantages

The main advantage of this collector is obvious: it is undemanding in terms of computer resources. Since it does all its work sequentially in a single thread, it has no noticeable overhead or negative side effects.


The main drawback is equally clear: long garbage collection pauses when there is a significant amount of data. In addition, all the settings for Serial GC revolve around the sizes of the heap regions. That is, for fine-tuning you have to study things yourself, adjust, experiment, and so on. Not everyone will like that.

If your application does not need a large heap (Oracle gives a nominal boundary of 100 MB), is not very sensitive to short pauses, and has only one processor core available, then this collector is worth a closer look. Otherwise, you can look for a better option.

Parallel GC

Parallel GC (the parallel collector) builds on the ideas of the sequential collector, adding parallelism and a bit of intelligence. If your computer has more than one processor core and you have not explicitly specified which collector your program should use, the JVM will almost certainly choose Parallel GC. It is fairly simple, yet functional enough to meet the needs of most applications.

The parallel collector is enabled with the -XX:+UseParallelGC option.

Work principles

With the parallel collector, the heap is organized in the same way as with Serial GC: it is divided into the same Eden, Survivor 0, Survivor 1 and Old Gen regions (Old Gen being what we have called Tenured), which work on the same principle. But there are two fundamental differences in how these regions are handled: first, garbage collection is performed by several threads in parallel; second, this collector can adjust itself to the required performance parameters. Let's see how that works.

To determine the number of garbage collection threads on a computer with N processor cores, the JVM uses the following rule by default: if N ≤ 8, the number of threads is N; otherwise the number of threads is a fraction of N that depends on other parameters, usually about 5/8 for large N, though on some platforms the coefficient may be smaller.
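As a rough illustration of this rule, the sketch below computes a thread count for a given number of cores. The class GcThreadCount is hypothetical, and the exact expression for N > 8 (all of the first eight cores plus 5/8 of the rest) is an assumption about how HotSpot commonly applies the 5/8 coefficient; the text above only states the approximate fraction, and other platforms may use a different one.

```java
// Hypothetical estimate of the default number of parallel GC threads.
class GcThreadCount {

    /**
     * Up to 8 cores: one GC thread per core.
     * Above 8 cores (assumed form): 8 threads plus 5/8 of the remaining cores.
     */
    static int defaultGcThreads(int cores) {
        if (cores <= 8) {
            return cores;
        }
        return 8 + (cores - 8) * 5 / 8;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(cores + " cores -> about " + defaultGcThreads(cores)
                + " GC threads (estimate)");
    }
}
```

For large N this expression approaches 5/8 of N, which matches the description in the text.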

By default, both minor and full collections use multithreading. A minor collection uses it when moving objects to the old generation, and a full collection when compacting data in the old generation.

Each collector thread gets its own chunk of memory in the Old Gen region, a so-called promotion buffer, which only that thread may copy data into, so that it does not interfere with the other threads. This approach speeds up garbage collection, but it has a small downside in the form of possible memory fragmentation.

The intelligent part of the parallel collector's improvements over the sequential one is that it has settings aimed at achieving the garbage collection efficiency you need. You can specify the performance parameters that suit you, a maximum pause time and/or a throughput target, and the collector will do its best to stay within the specified limits. To do so, it uses statistics from garbage collections that have already run and plans the parameters of subsequent collections accordingly: it varies the sizes of the generations and changes the proportions of the regions.

For example, if the JVM's minor collections do not fit into the time you have allotted, the size of the young generation may be reduced. If the throughput target cannot be met but pauses are not a problem, the size of the generation will be increased. And so on.

Keep in mind that these statistics ignore garbage collections that you start manually.

Of course, nobody gives you an absolute guarantee that the desired parameters will be achieved, but it is worth a try; often simply setting the necessary options is enough.

If you set requirements so strict that the collector cannot meet them, it will follow these priorities (in descending order of importance):

Reducing the maximum pause.
Increasing throughput.
Minimizing the memory used.

At the same time, Parallel GC lets us adjust the sizes of the regions ourselves, just as in the sequential collector. But doing both at once is not recommended, so as not to confuse the automatic tuning algorithms. Either we give the application enough memory, specify the desired performance parameters and watch from the sidelines, or we dive into the region settings ourselves, but then we lose the right to demand that the collector automatically adjust to our performance criteria. It will not complain if we break this rule, but it will not be able to do its job effectively either.

STW situations

As with the sequential collector, all application threads are stopped during memory cleanup. The only difference is that the pause is usually shorter, because part of the work is done in parallel.

Customization

All the options of the sequential collector also apply to the parallel one. You can manually set the sizes of the memory regions or the proportions between them. Below are the options that the parallel collector adds to those we have already discussed.

You can manually specify the number of threads to allocate for garbage collection. This is done with the -XX:ParallelGCThreads=? option. For example, -XX:ParallelGCThreads=9 limits the number of threads to nine. Keep in mind that increasing the number of threads not only makes collection more parallel, but also increases fragmentation of the Tenured region and adds thread synchronization overhead.

If you wish, you can completely disable parallel compaction of objects in the old generation with the -XX:-UseParallelOldGC option.

The desired performance parameters of the collector are set with the options -XX:MaxGCPauseMillis=? and -XX:GCTimeRatio=?.

MaxGCPauseMillis sets a limit on the maximum time the program is suspended for garbage collection. For example, -XX:MaxGCPauseMillis=400 tells the JVM that garbage collection pauses should preferably not exceed 400 milliseconds. By default there is no such limit. When setting this parameter, remember that limiting the collection time may force collections to run more often, and overall throughput will suffer as a result.

With the GCTimeRatio option you can specify the desired throughput threshold (the ratio of the time the program spends running to the time spent on garbage collection). For example, with -XX:GCTimeRatio=49 the JVM will try to arrange collections so that they take up no more than 2% of the program's running time (the share of time spent on collection will be 1 / (1 + 49)).
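The 1 / (1 + ratio) relationship is easy to invert by mistake, so here is a small sketch of the conversion in both directions. The class GcTimeBudget and its methods are hypothetical helpers, not JVM APIs, and the reverse conversion is exact only for percentages that divide 100 evenly.

```java
// Hypothetical helper converting between GCTimeRatio and the GC time share.
class GcTimeBudget {

    /** Share of total running time the collector may use for a given ratio. */
    static double gcShare(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    /** GCTimeRatio value corresponding to a target GC share in whole percent. */
    static int ratioForGcPercent(int percent) {
        // Integer division: exact only when percent divides 100 evenly.
        return 100 / percent - 1;
    }

    public static void main(String[] args) {
        for (int ratio : new int[] {49, 19}) {
            System.out.printf("GCTimeRatio=%d allows up to %.1f%% of time in GC%n",
                    ratio, gcShare(ratio) * 100);
        }
        System.out.println("a 2% budget corresponds to GCTimeRatio="
                + ratioForGcPercent(2));
    }
}
```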

The options -XX:YoungGenerationSizeIncrement=? and -XX:TenuredGenerationSizeIncrement=? set the percentage by which the young and old generations, respectively, are grown when necessary. By default both parameters are 20.

The rate at which generations shrink, however, is controlled not by a percentage but by a special factor, set with the -XX:AdaptiveSizeDecrementScaleFactor option. It specifies how many times smaller a decrease should be than an increase. The option applies to both generations. For example, with -XX:AdaptiveSizeDecrementScaleFactor=2 each decrease of a generation will be half the size of its increase (that is, both generations will shrink by 10% at a time with -XX:YoungGenerationSizeIncrement=20 and -XX:TenuredGenerationSizeIncrement=20).
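To make the relationship between the increment percentage and the decrement factor concrete, here is a small worked sketch. The class GenerationResize and its methods are invented for illustration and model a single resize step on whole megabytes; the real collector decides when to resize based on its statistics and applies its own rounding.

```java
// Hypothetical model of one grow step and one shrink step for a generation.
class GenerationResize {

    /** Size after growing by incrementPercent. */
    static long grow(long sizeMb, int incrementPercent) {
        return sizeMb + sizeMb * incrementPercent / 100;
    }

    /** Size after shrinking by incrementPercent divided by the scale factor. */
    static long shrink(long sizeMb, int incrementPercent, int decrementScaleFactor) {
        return sizeMb - sizeMb * incrementPercent / 100 / decrementScaleFactor;
    }

    public static void main(String[] args) {
        long sizeMb = 1000;
        // Increment of 20% and a decrement scale factor of 2, as in the text.
        System.out.println("grown:  " + grow(sizeMb, 20) + " MB");
        System.out.println("shrunk: " + shrink(sizeMb, 20, 2) + " MB");
    }
}
```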

Advantages and disadvantages

The indisputable advantages of this collector over Serial GC are its ability to adjust automatically to the required performance parameters and its shorter collection pauses. With multiple processor cores, almost every application will see a speed gain.

A certain amount of memory fragmentation is, of course, a minus, but it is unlikely to be significant for most applications, since the collector uses a relatively small number of threads.

Overall, Parallel GC is a simple, understandable and efficient collector that suits most applications. It has no hidden overhead, and we can always change its settings and clearly see the result of those changes.

But sometimes it is not enough and you need to look for something more sophisticated. We will talk about more advanced collector implementations in the next article.

Part 3 - CMS GC and G1 GC collectors →

Earlier
← Part 1 - Introduction

Source: https://habr.com/ru/post/269707/
