📜 ⬆️ ⬇️

Ahaha, HotSpot, what are you doing, stop it!



As you probably already know , soon the next Joker conference will take place in St. Petersburg. I am going to report on it on how to investigate the behavior of the JVM, which at first seems mysterious and mysterious. This post is a teaser designed to give you an opportunity to understand what to expect from the report.

Suppose that information about a problem suddenly comes to you: when collecting garbage, the reasons for “Last Ditch Collection” and “No GC” are displayed, and a search on the Internet does not give anything intelligible. Fortunately, HotSpot is almost completely assembled from OpenJDK, and therefore, at least in theory, we can find answers to all our questions directly in the source code. What we do!
')

Last Ditch Collection

Armed with the OpenJDK source code , look for the line of interest to us:

$ grep -irn "Last Ditch Collection" . ./src/share/vm/gc_interface/gcCause.cpp:94: return "Last ditch collection"; 

Go to the file and find there a reference to the _last_ditch_collection - the enum element, which contains all the possible causes of garbage collection. Immediately, we will _no_gc necessary constant for “No GC”, it will be _no_gc . We continue to grind:

 $ grep -rn "GCCause::_last_ditch_collection" . ... ./src/share/vm/gc_implementation/shared/vmGCOperations.cpp:286: heap->collect_as_vm_thread(GCCause::_last_ditch_collection); ... 


It looks like this is it! After reading the method in which the call is located, and especially its comments, we quickly understand what is happening.

Apparently, when HotSpot cannot allocate space for something that should be in the metaspace (for example, for a class), garbage collection starts first with the cause GCCause::_metadata_GC_threshold . If this does not help, then an attempt is made to expand metaspace. If this does not help, then a full garbage collection is started with the cause GCCause::_last_ditch_collection . During this assembly, soft references are cleared. Even if this does not help, then the place is really over, and it's time to throw an OOM.

It sounds believable, but it is worth learning how to reproduce it consistently. If we believe what was written above, then with active littering of metaspace we should see at least one assembly with the reason for the “Last ditch collection”:

 ClassPool pool = ClassPool.getDefault(); for(long l = 0; l < Long.MAX_VALUE; l++) { pool.makeClass("com.example.Kitty" + l).toClass(); } 

If this is run with the default settings, then it will take a long time to wait. But we can reduce the size of the metaspace and get the result faster:

 $ java -cp build/libs/labs-8-all.jar -XX:+PrintGCDetails -XX:MaxMetaspaceSize=10m ru.gvsmirnov.perv.labs.gc.NoGcTrigger ... [Full GC (Last ditch collection) [...] 14470K->14425K(32768K), [Metaspace: 8971K->8971K(1056768K)], 0.0481030 secs] ... Exception in thread "main" javassist.CannotCompileException: by java.lang.OutOfMemoryError: Metaspace at javassist.ClassPool.toClass(ClassPool.java:1099) ... 


In most cases, we will see more than one assembly with a reason for the “last ditch collection”. This is quite expected, since otherwise they would not have been confused, and they would hardly have implemented this feature at all.

By the way, here we looked at the Java 9 sources, but the logic of the last ditch collection has not changed for a very long time, so there’s nothing to worry about.

One more funny thing worth mentioning is that we, in general, didn’t need to write code that was actively digesting in metaspace. And in general of any code: after all, the JVM and the host itself load the classes themselves. If we just run java -XX:+PrintGCDetails -XX:MaxMetaspaceSize=2m X , we will get the last ditch collection several times, and then OOM. The virtual machine does not even have time to understand that class X does not exist.

No gc

Let's go to "No GC". The captain tells us that the developers of HotSpot would hardly have called the normal reason for garbage collection. Anyway, the only case in which we, logically, should see this is if we have time to run jstat earlier than at least one garbage collection occurs:

 $ jstat -gccause -t `jps | grep NoGc | cut -d' ' -f1` 100 Timestamp S0 S1 EOP YGC YGCT FGC FGCT GCT LGCC GCC 0.7 0.00 0.00 4.00 0.00 40.34 0 0.000 0 0.000 0.000 No GC No GC 0.8 0.00 0.00 6.00 0.00 45.82 0 0.000 0 0.000 0.000 No GC No GC 0.9 0.00 0.00 6.00 0.00 51.44 0 0.000 0 0.000 0.000 No GC No GC 

Or if he does a check when the garbage collector is not active:

 Timestamp S0 S1 EOP YGC YGCT FGC FGCT GCT LGCC GCC 2.8 0.00 0.00 0.00 0.78 31.00 5 0.080 5 1.051 1.131 Allocation Failure No GC 


So, if it was discovered in a different situation, then it must be a bug. And since when we analyzed the last ditch collection, such oddities were not observed, that is, there is a chance that it has already been fixed. Let's show some optimism and look for all suitable commits:

 $ hg grep --all '::_no_gc' ... src/share/vm/gc_implementation/shared/vmGCOperations.cpp:2097:+: assert(((_gc_cause != GCCause::_no_gc) && ... 


Aha It seems that the problem was indeed fixed in commit number 2097 , as early as February 2013. Looking in the file hotspot_version , we find that the last HotSpot in which the problem should be observed is 21.0-b01 . I had Java 6 on hand with HotSpot 20.45-b01 :

 $ java -version java version "1.6.0_45" Java(TM) SE Runtime Environment (build 1.6.0_45-b06) Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode) 


When we launch our cat kit, we immediately get the following result:

 Timestamp S0 S1 EOP YGC YGCT FGC FGCT GCT LGCC GCC 2.7 0.00 0.00 0.00 0.78 100.00 5 0.080 5 1.051 1.131 unknown GCCause Allocation Failure 2.8 0.00 0.00 0.00 0.78 100.00 5 0.080 5 1.051 1.131 unknown GCCause Allocation Failure 


It's pretty close, but not exactly what we were looking for. However, with more detailed study of the patch, it becomes obvious that you just need to use another garbage collector. Adding -XX:+UseG1GC will give us the desired result:

 Timestamp S0 S1 EOP YGC YGCT FGC FGCT GCT LGCC GCC 1.6 ? ? ? ? 96.78 ? ? ? ? ? G1 Evacuation Pause No GC 1.7 ? ? ? ? 100.00 ? ? ? ? ? No GC Permanent Generation Full 1.8 ? ? ? ? 100.00 ? ? ? ? ? No GC Permanent Generation Full 1.9 ? ? ? ? 100.00 ? ? ? ? ? No GC Permanent Generation Full 2.0 ? ? ? ? 100.00 ? ? ? ? ? No GC Permanent Generation Full 


Success! By the way, the person who answers first, why there are some question marks, and what exactly should lead us to use G1, will receive an invitation to Joker Unconference;) SerCe has already given the correct answer .

Afterword

So, just on the basis of a single tweet and using basic command-line utilities, we answered a dodgy question, found an old-time bug in HotSpot, were able to evaluate the JVM version from above, and even found out which garbage collector was used.

The moral is simple: HotSpot is not some mysterious shaitan machine. There is nothing mysterious in the world at all, it is only in our heads .

PS If you have an interesting example from this opera, then do not hesitate to send it. It is possible that I will analyze it in the report, or even just like that :)

PPS Several guest speakers who do not know Russian will come to the conference. They will be happy if there are more reports in English. I added a survey to understand how the main part of the audience can relate to this.

Source: https://habr.com/ru/post/233595/


All Articles