This post discusses the causes of long garbage collector pauses and how to deal with them. I will focus on CMS (the low-pause collector), since at the moment it is the most frequently used algorithm for applications with a large heap and low-latency requirements. The description assumes that your application runs on a machine with a large amount of memory and a large number of processors.

The general principles of GC, and of CMS in particular, are described in detail here. I will only briefly summarize what we need in order to understand this topic:
- The memory is divided into two areas, YoungGen and OldGen
- Newly created objects are placed in YoungGen
- Objects that survive several minor garbage collections are promoted to OldGen
- Minor garbage collection cleans YoungGen
- Major garbage collection cleans OldGen
- Full GC cleans both areas
- Stop-the-world means that your application is completely stopped when garbage collection is running.
- Concurrent algorithms and phases do not stop the application; garbage collection runs alongside the application
- Parallel algorithms and phases do their work in multiple threads. They can be either concurrent or stop-the-world. Unless explicitly stated otherwise, the documentation usually means stop-the-world.
- Minor garbage collection (minor GC) is always stop-the-world
- Full GC is stop-the-world
- CMS (Concurrent Mark Sweep) has the following main phases:
  - initial mark: stop-the-world
  - mark: concurrent
  - preclean: concurrent
  - remark: stop-the-world
  - sweep: concurrent
- Traditional garbage collectors find dead objects (those to which the application no longer holds any references) by tracing all live objects (those reachable by references from GC roots)
- CMS does not defragment the memory and uses free lists to manage it.
- GC ergonomics (the parameters that set the desired maximum pause) do not work with CMS
So, in our case the application stops completely at the following moments:
- Minor garbage collection
- CMS init-mark phase
- CMS remark phase
- Full GC
First you need to determine whether the pauses in these four cases are long enough to ruin the life of your application. It is best to look at the production system, since it has exactly the conditions you care about. Fortunately, this can be done with virtually no performance degradation by launching the JVM with the following parameters:
-verbose:gc
-Xloggc:gc.log
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintGCApplicationStoppedTime
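Before relying on the log, it can help to confirm that the running JVM really was started with these options. The following is a minimal sketch, not a definitive tool: the class `GcLoggingFlagCheck` and its method `missingFlags` are hypothetical names introduced for illustration, while `RuntimeMXBean.getInputArguments()` is the standard `java.lang.management` call that returns the options passed to the JVM.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of the JDK): reports which of the GC logging
// flags listed above are absent from a list of JVM arguments.
class GcLoggingFlagCheck {

    // The flags from the list above. -Xloggc is stored as a prefix because
    // its value (the log file name) varies between installations.
    static final List<String> REQUIRED = Arrays.asList(
            "-verbose:gc",
            "-Xloggc:",
            "-XX:+PrintGCDetails",
            "-XX:+PrintGCTimeStamps",
            "-XX:+PrintGCApplicationStoppedTime");

    static List<String> missingFlags(List<String> jvmArgs) {
        List<String> missing = new ArrayList<String>();
        for (String required : REQUIRED) {
            boolean found = false;
            for (String arg : jvmArgs) {
                // Exact match, or prefix match for flags that carry a value.
                if (arg.equals(required)
                        || (required.endsWith(":") && arg.startsWith(required))) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                missing.add(required);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // getInputArguments() returns the options given to the JVM itself,
        // not the arguments of main().
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Missing GC logging flags: " + missingFlags(jvmArgs));
    }
}
```

Running this inside the application (or a test harness started with the same launch script) prints an empty list when all five options are present.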
Details on how to read the CMS logs can be found here. If you want the time printed in a human-readable form rather than in seconds relative to JVM start, you can use the parser that I wrote in Python. For now we are only interested in the parts of the logs that concern stop-the-world events.
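One way to pull the stop-the-world events out of a log is to look only at the "Total time for which application threads were stopped" lines that -XX:+PrintGCApplicationStoppedTime produces. The sketch below assumes that line format; the class `StoppedTimeScanner` and its methods are hypothetical names, and this is not the Python parser mentioned above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts application-stopped times from GC log lines.
class StoppedTimeScanner {

    // Assumed line format, as printed with -XX:+PrintGCApplicationStoppedTime.
    private static final Pattern STOPPED = Pattern.compile(
            "Total time for which application threads were stopped: ([0-9.]+) seconds");

    // Returns every pause (in seconds) found in the given lines, in log order.
    static List<Double> pauses(List<String> logLines) {
        List<Double> result = new ArrayList<Double>();
        for (String line : logLines) {
            Matcher m = STOPPED.matcher(line);
            if (m.find()) {
                result.add(Double.parseDouble(m.group(1)));
            }
        }
        return result;
    }

    // Returns the longest pause, or 0.0 if the list is empty.
    static double longest(List<Double> pauses) {
        double max = 0.0;
        for (double p : pauses) {
            max = Math.max(max, p);
        }
        return max;
    }

    public static void main(String[] args) throws IOException {
        // The log file name defaults to gc.log, matching -Xloggc:gc.log.
        String file = args.length > 0 ? args[0] : "gc.log";
        List<String> lines = Files.readAllLines(Paths.get(file), StandardCharsets.UTF_8);
        List<Double> pauses = pauses(lines);
        System.out.println("Pauses found: " + pauses.size());
        System.out.println("Longest pause, seconds: " + longest(pauses));
    }
}
```

The longest pause is usually the first number worth checking against your latency requirement; only then is it worth finding out which of the four stop-the-world events caused it.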
1. Slow minor garbage collection.
[GC [DefNew: 209100K->25808K(235968K), 0.0828063 secs] 209100K->202964K(1284544K), 0.0828351 secs] [Times: user=0.02 sys=0.08, real=0.08 secs]
Total time for which application threads were stopped: 0.0829205 seconds
The main algorithm of minor collections is copying, so the more live objects there are in YoungGen, the longer a minor collection takes. I see at least three things to look at when the pauses are too long for you.
a. Your JVM is not using an appropriate algorithm. The log above comes from the single-threaded algorithm (DefNew). I recommend trying the newer multi-threaded algorithm (it appears in the logs as ParNew), which is enabled with -XX:+UseParNewGC. Note also that if you see the name PSYoungGen in the logs, your JVM is using a parallel algorithm, but the old implementation, which does not seem to be available in combination with CMS.
b. You have allocated too much memory to YoungGen (in the log above it is the figure 235968K). It can be reduced with the -Xmn option.
c. You use too large a survivor space together with a high maximum tenuring age, so objects are copied back and forth and cannot settle in OldGen, which puts unnecessary load on the minor GC. This can be corrected with the -XX:SurvivorRatio and -XX:MaxTenuringThreshold parameters. For a more detailed analysis of this case, run the JVM with -XX:+PrintTenuringDistribution to get more information about object ages in the GC logs.
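To get a feel for how -Xmn and -XX:SurvivorRatio interact, here is a rough sketch of the arithmetic. It relies on the fact that SurvivorRatio is the ratio of eden to one survivor space and that YoungGen holds eden plus two survivor spaces. The class `YoungGenLayout`, the 256 MB young generation and the ratio values are hypothetical, and the JVM's own size alignment is ignored, so real figures in a GC log will differ slightly.

```java
// Hypothetical helper: approximate YoungGen layout for a given -Xmn and
// -XX:SurvivorRatio. All sizes are in kilobytes.
class YoungGenLayout {

    // YoungGen = eden + 2 survivors and eden = ratio * survivor,
    // so one survivor space is young / (ratio + 2).
    static long survivorKb(long youngKb, int survivorRatio) {
        return youngKb / (survivorRatio + 2);
    }

    // Eden is whatever remains after the two survivor spaces.
    static long edenKb(long youngKb, int survivorRatio) {
        return youngKb - 2 * survivorKb(youngKb, survivorRatio);
    }

    public static void main(String[] args) {
        long youngKb = 256 * 1024; // hypothetical -Xmn256m
        for (int ratio : new int[] {2, 8, 32}) {
            System.out.printf("SurvivorRatio=%d: eden=%dK, each survivor=%dK%n",
                    ratio, edenKb(youngKb, ratio), survivorKb(youngKb, ratio));
        }
    }
}
```

A larger SurvivorRatio shrinks the survivor spaces, so fewer objects can be copied back and forth and more of them are promoted early; a smaller ratio does the opposite. The output of -XX:+PrintTenuringDistribution shows which direction actually helps.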
2. Long init-mark phase
[GC [1 CMS-initial-mark: 680168K(1048576K)] 706792K(1284544K), 0.0001161 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
Total time for which application threads were stopped: 0.0002740 seconds
In this phase, the garbage collector marks all objects directly reachable from GC roots. If you see that this phase takes a long time, you probably have a lot of references from local variables and static fields.
Here you can try increasing the number of threads involved in this phase (ParallelCMSThreads). By default it is calculated as (ParallelGCThreads + 3) / 4. That is, if ParallelGCThreads = 8, only two threads take part in the init-mark phase, which may give no speedup at all because of the overhead that parallelism introduces.
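The default quoted above is plain integer arithmetic, which a few lines of code make concrete. This is only a sketch of the formula as given in the text; the class `CmsThreadDefaults` and its method are hypothetical names, and the code does not query the JVM.

```java
// Hypothetical helper: evaluates the default described in the text,
// ParallelCMSThreads = (ParallelGCThreads + 3) / 4, using integer division.
class CmsThreadDefaults {

    static int defaultParallelCmsThreads(int parallelGcThreads) {
        return (parallelGcThreads + 3) / 4;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 4, 8, 16, 32}) {
            System.out.println("ParallelGCThreads=" + n
                    + " gives ParallelCMSThreads=" + defaultParallelCmsThreads(n));
        }
    }
}
```

With 8 GC threads the formula yields 2, matching the example in the text; a machine needs 16 GC threads before the default reaches 4.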
3. Long pauses in the remark phase
[GC[YG occupancy: 26624 K (235968 K)][Rescan (non-parallel) [grey object rescan, 0.0056478 secs][root rescan, 0.0001873 secs], 0.0059038 secs][weak refs processing, 0.0000090 secs] [1 CMS-remark: 750825K(1048576K)] 777449K(1284544K), 0.0059808 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]
Total time for which application threads were stopped: 0.0061668 seconds
During the mark phase, a special mechanism tracks all changes to references. The remark phase is needed precisely to revisit all the modified references.
a. If you see the phrase "Rescan (non-parallel)" in the logs, I recommend the -XX:+CMSParallelRemarkEnabled option to enable multiple threads for this phase.
b. Since weak references are cleared in this phase, check whether you use too many of them (for example, java.util.WeakHashMap).
c. Perhaps your references change a lot. Check how much time passes between initial-mark and remark: the less time elapses between these phases, the fewer references are modified and the faster remark completes. Starting with Java 5, an abortable-preclean phase was added just before remark. It essentially does nothing but wait until a minor collection has run, then waits a little longer and finishes, thereby starting the next phase, remark. There are two reasons for this logic. First, remark also scans YoungGen, and for it to work in multi-threaded mode a minor collection is needed first, after which the objects remaining in YoungGen can be split efficiently into chunks for parallel processing. Second, remark is a fairly long stop-the-world phase, and if it runs immediately after a minor collection you get one long combined pause. Several parameters control this behavior: CMSScheduleRemarkEdenSizeThreshold, CMSScheduleRemarkEdenPenetration, CMSMaxAbortablePrecleanTime. I suggest trying CMSScavengeBeforeRemark, which forces a minor collection immediately before remark. This shortens the time between init-mark and remark as much as possible and leaves less work for the remark phase. It is especially effective when minor collection pauses are much shorter than remark pauses, which is usually the case.
4. Full GC in the log
(concurrent mode failure): 798958K->74965K(1048576K), 0.0270334 secs] 1033467K->74965K(1284544K), [CMS Perm : 3022K->3022K(21248K)], 0.0270963 secs] [Times: user=0.03 sys=0.00, real=0.03 secs]
Total time for which application threads were stopped: 0.0271630 seconds
I have already described this and several other causes of Full GC in detail here.
That's all I wanted to tell you about pauses. Oh yes, please do not use incremental CMS (-XX:+CMSIncrementalMode) unless you have only one or two cores. Everything will just run slower.
And a few words about other algorithms.
The first is Garbage First (G1), which is expected to be available by default in Java 7 and can be enabled in Java 6, starting with Java SE 6 Update 14, with the options -XX:+UnlockExperimentalVMOptions and -XX:+UseG1GC. The idea is to divide the whole heap into small regions of memory that are collected at different times, which keeps individual pauses very short. There are various JVM parameters that let you specify the desired pauses, and the memory is divided into regions based on them. Note that this approach cannot be called universal, since its efficiency depends heavily on the topology of objects in memory. If you make heavy use of caches whose objects are referenced from all over the application, collecting one region can trigger scans of a large number of other regions, which will cause noticeable pauses.
Recently I keep coming across posts about Azul GC, which works without any pauses at all, regardless of the object topology and the size of the memory area. It sounds very promising, but for a long time the solution was available only on Azul's own hardware (the Vega systems), since the algorithm requires a special LVB (loaded value barrier) instruction. The good news is that it has finally become possible to implement a similar mechanism on the x86-64 architecture of Intel processors. If I were writing an ultra-low-latency application from scratch, I would definitely consider this JVM, but if your application is already in production and stability is one of its most important requirements, then switching from the Oracle HotSpot JVM to any other is a rather risky move. Recall how many problems users ran into even when moving from Java 5 to Java 6.
Links on the topic:
- Official Java memory management documentation
- Official documentation on configuring the garbage collector
- The blog of a Sun employee who worked on the GC, where he describes various aspects of garbage collection in detail. Highly recommended.
- FAQ on various aspects of JVM memory management
- Description of the alternative Azul GC. It also points out the shortcomings of the existing solutions and explains what caused them.