
How to build your JDK, without blackjack and automatic garbage collection

At the recent JavaOne, Ruslan cheremin mentioned that the Disruptor developers run a JVM without a garbage collector. They had their own reasons for this, which are beyond the scope of this post.

I have long wanted to dig into the virtual machine's sources, and cutting the GC out of it is a great place to start. Below I'll show how to build OpenJDK, cut the garbage collector out of it, and build it again. Toward the end there is even an answer to the question “why?”, which has probably already crossed your mind.



Sources? Give me two, and sprinkle them with binaries!


Main course


OpenJDK is stored in Mercurial using the forest extension, and the easiest way to get the code is to run:

$ hg fclone http://hg.openjdk.java.net/jdk7/jdk7 

If the forest extension is not installed and you do not want to install it for some reason, you can do it like this:

 $ hg clone http://hg.openjdk.java.net/jdk7/jdk7 && jdk7/get_source.sh 


Another option is to download the full source bundles from the official site. This cuts a couple of corners, but deprives you of the charms of a version control system.

An interesting feature: for some reason, jaxp and jaxws are stored separately. They must therefore either be downloaded manually from the relevant sites ( jaxp.java.net and jax-ws.java.net ), or you can simply let make download everything it needs by passing ALLOW_DOWNLOADS=true . Personally, I find the latter more convenient. Oh, and in the full source bundles everything has already been downloaded for us.

Tools without which the dish cannot be cooked


Clearly, the build requires a lot of things. The simplest is a bootstrap JDK, version 1.6 or later; the path to it is passed through the ALT_BOOTDIR variable. Beyond that, a huge pile of other things is needed, from the obvious ant and make to CUPS and ALSA . The easiest way to get all of it is to ask your package manager to satisfy all build dependencies. For example, with aptitude:

 $ aptitude build-dep openjdk-6 


Checking that everything is in place

To make sure that everything you need is present, run make with the sanity target. Pay attention to the environment variables:

 $ LANG=C ALT_BOOTDIR=/usr/lib/jvm/java-6-openjdk make sanity 


If everything is fine, you will see the message Sanity check passed

If something is wrong, you will get a fairly intelligible error message. Fix the problem and try again.

Now you can build the JDK itself. The previously mentioned ALLOW_DOWNLOADS is added to the environment variables:
 $ ALLOW_DOWNLOADS=true LANG=C ALT_BOOTDIR=/usr/lib/jvm/java-6-openjdk make 


If successful, after 20-40 minutes you will get a message like this:

 #-- Build times ----------
 Target all_product_build
 Start 2012-04-20 01:56:53
 End   2012-04-20 02:02:14
 00:00:06 corba
 00:00:09 hotspot
 00:00:06 jaxp
 00:00:08 jaxws
 00:04:47 jdk
 00:00:05 langtools
 00:05:21 TOTAL


You can verify that something useful has really been built, and then proceed to the next step.

 $ ./build/linux-amd64/bin/java -version
 openjdk version "1.7.0-vasily_p00pkin"
 OpenJDK Runtime Environment (build 1.7.0-vasily_p00pkin-gs_2012_04_20_01_06-b00)
 OpenJDK 64-Bit Server VM (build 23.0-b21, mixed mode)


I have an alternative operating system ...


... Based on BSD


It's not so bad. Under the strict guidance of kind Oracle employees, I managed to build HotSpot on a MacBook aboard the Sapsan train. Building the whole JDK over the following night, however, did not go so well. Still, it can be done: you only need a fresh Xcode and a lot of patience. I had neither, so I simply spun up a more powerful machine in the Selectel cloud and ran my experiments there. As a bonus, building in the cloud is faster and does not load my laptop, so I can do something useful in the meantime (instead of sword fighting on office chairs). If you still want to build on a Mac, here is a description of the process.

... Well, you understand, right?


Here, in fact, things are not so bad either. Arm yourself with Cygwin and read the manuals.

The most interesting part begins

- Patient, do you suffer from perversions?
- What do you mean, doctor! I enjoy them!

Now we face the task of figuring out where in the source code to work our magic in order to cut out the garbage collector. There are three obvious ways to do this: ask someone who knows, read all the sources, or show cunning and resourcefulness. The first did not work out, because knowledgeable people gave me strange looks and backed away, refusing to take part in such dubious activities. The second, though ideologically the most correct, would have taken too much time. That left the third way.

Let's think logically: how can anyone influence the garbage collector from the outside? Two ways come to mind immediately: startup flags (like -XX:+UseParallelGC ) and System.gc() . Although the first seems more logical, I decided to start with the second, because the javadocs do not fully satisfy one's curiosity about what exactly happens there. In the Java sources this call is delegated to Runtime, where the method is already native. Anyone who has ever worked with JNI knows how function names in native code are formed: Java_java_lang_Runtime_gc . A quick grep turns up this code in jdk/src/share/native/java/lang/Runtime.c , where we are interested in the following lines:

 JNIEXPORT void JNICALL
 Java_java_lang_Runtime_gc(JNIEnv *env, jobject this)
 {
     JVM_GC();
 }
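
The naming rule mentioned above can be sketched in a few lines of Java. This is only an illustration of the JNI name-mangling convention for non-overloaded methods; the class `JniNames` and its methods are made up for this example and are not part of the JDK. Overloaded native methods additionally get a `__` suffix with the mangled argument signature, which this sketch does not handle.

```java
public class JniNames {

    // Escapes one name component according to the JNI name-mangling rules.
    static String mangle(String name) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == '.' || c == '/') {
                sb.append('_');      // package separator becomes a plain underscore
            } else if (c == '_') {
                sb.append("_1");     // a literal underscore is escaped
            } else if (c == ';') {
                sb.append("_2");
            } else if (c == '[') {
                sb.append("_3");
            } else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9')) {
                sb.append(c);        // ASCII letters and digits pass through
            } else {
                // any other character is written as _0xxxx (lowercase hex)
                sb.append(String.format("_0%04x", (int) c));
            }
        }
        return sb.toString();
    }

    // Builds the short native function name for a non-overloaded method.
    static String nativeName(String className, String methodName) {
        return "Java_" + mangle(className) + "_" + mangle(methodName);
    }

    public static void main(String[] args) {
        // prints Java_java_lang_Runtime_gc
        System.out.println(nativeName("java.lang.Runtime", "gc"));
    }
}
```

Knowing the rule is what makes the grep trivial: the string to search for can be derived from the Java class and method name alone.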

Clearly, we are now looking for JVM_GC . Just as quickly we find its definition in src/share/vm/prims/jvm.cpp :

 JVM_ENTRY_NO_ENV(void, JVM_GC(void))
   JVMWrapper("JVM_GC");
   if (!DisableExplicitGC) {
     Universe::heap()->collect(GCCause::_java_lang_system_gc);
   }
 JVM_END

Here we see two very interesting things: the first is DisableExplicitGC , which needs no comment, and the second is the collect method of Universe::heap() . How simple it all is: it turns out that System.gc() does nothing but synchronously run the collector. No drama. Oh well. But now we know that the collect() method is most likely where collection can be disabled. We easily find the Universe class in the file hotspot/src/share/vm/memory/universe.hpp and notice that the static heap method returns CollectedHeap* , and that there is also an initialize_heap() method.
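
The effect of the JVM_GC code above can be observed from the Java side. The following is a minimal sketch, assuming the standard `java.lang.management` API is available; the class name `ExplicitGcProbe` is made up for this example. It counts the collections reported by the JVM before and after a call to System.gc().

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class ExplicitGcProbe {

    // Sums the collection counts reported by all collectors of this JVM.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {   // -1 means this collector does not report a count
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalCollections();
        System.gc();           // ends up in JVM_GC inside the virtual machine
        long after = totalCollections();
        System.out.println("collections before: " + before + ", after: " + after);
    }
}
```

Running it once as is and once with -XX:+DisableExplicitGC should show the difference: with the flag set, the call is expected to leave the counter untouched, since the branch that calls collect() is skipped.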

A small lyrical digression on the theme of the universe


I must say that the quality of the code in OpenJDK is excellent: good structure, easy to understand what is happening, a lot of comments. For example, an excellent snippet:

 class Universe: AllStatic {
   // Ugh.  Universe is much too friendly.
   friend class MarkSweep;
   friend class oopDesc;
   // ... and a pile of other friends
   //...
 };

Okay, back to our collector. The initialize_heap() method creates the heap, and a specific implementation is chosen depending on which collector the user has requested. The complete list can be found in the file hotspot/src/share/vm/gc_interface/collectedHeap.hpp :

 enum Name {
   Abstract,
   SharedHeap,
   GenCollectedHeap,
   ParallelScavengeHeap,
   G1CollectedHeap
 };

Continuing the study of the class, we finally come across the necessary code:

 // Perform a collection of the heap; intended for use in implementing
 // "System.gc".  This probably implies as full a collection as the
 // "CollectedHeap" supports.
 virtual void collect(GCCause::Cause cause) = 0;

 // This interface assumes that it's being called by the
 // vm thread. It collects the heap assuming that the
 // heap lock is already held and that we are executing in
 // the context of the vm thread.
 virtual void collect_as_vm_thread(GCCause::Cause cause) = 0;

Here the comments are what helps us most. For those who do not read English well, let me explain: the first method, plain collect() , is meant for triggering a collection “from the outside” (for example, from System.gc or, as the same grep shows, when a memory allocation fails on Linux). The second is called from the virtual machine thread responsible for garbage collection (and assumes that all the necessary locks are already held). A simple solution comes to mind immediately: make calls to these methods perform no collection. I even tried this approach first, but no luck: things turn out to be somewhat more complicated, and each heap implementation has additional places of its own where collection happens. So I had to pick one specific implementation ( GenCollectedHeap with MarkSweepPolicy , as the simplest) and, depending on a flag (which I named UseTheForce ), return from the methods that perform collection without doing anything. The resulting changes for the first version are here.
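
The shape of that change can be illustrated with a toy model. This is not HotSpot code (the real change is in C++ inside the virtual machine): every class and method below is a hypothetical stand-in written in Java only to show the idea of guarding each collection entry point with a flag, including the implementation-specific ones that the abstract interface does not reveal.

```java
public class ToyHeapDemo {

    // Stand-in for the CollectedHeap interface (toy model, not real JVM code).
    static abstract class ToyCollectedHeap {
        abstract void collect(String cause);
        abstract void collectAsVmThread(String cause);
    }

    // Stand-in for one concrete heap; useTheForce mimics the article's flag.
    static class ToyGenHeap extends ToyCollectedHeap {
        final boolean useTheForce;
        int collections = 0;

        ToyGenHeap(boolean useTheForce) {
            this.useTheForce = useTheForce;
        }

        @Override
        void collect(String cause) {
            if (useTheForce) return;   // leave without collecting
            doCollect();
        }

        @Override
        void collectAsVmThread(String cause) {
            if (useTheForce) return;
            doCollect();
        }

        // Hypothetical implementation-specific entry point: guarding only
        // the two interface methods would miss paths like this one.
        void onAllocationFailure() {
            if (useTheForce) return;
            doCollect();
        }

        private void doCollect() {
            collections++;
        }
    }

    // Drives all three entry points once and reports how many collections ran.
    static int exercise(boolean useTheForce) {
        ToyGenHeap heap = new ToyGenHeap(useTheForce);
        heap.collect("System.gc");
        heap.collectAsVmThread("vm thread");
        heap.onAllocationFailure();
        return heap.collections;
    }

    public static void main(String[] args) {
        System.out.println("normal heap: " + exercise(false) + " collections");
        System.out.println("with the flag: " + exercise(true) + " collections");
    }
}
```

The model also shows why patching only the abstract interface was not enough: any extra path inside a concrete heap needs its own guard.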

Let's try it!


We quickly scribble a class that should not fail with an OOM while the garbage collector works normally, but in its absence does so with great joy:

 public class TheForceTester {

     public static final int ARRAY_SIZE = 1000000;

     public static void main(String[] args) {
         while (true) {
             byte[] lotsOfUsefulData = new byte[ARRAY_SIZE];
         }
     }
 }

And we run it on our new virtual machine:

 $ ./build/linux-amd64/bin/java -XX:+UseTheForce -verbose:gc TheForceTester
 Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
     at ru.yandex.holocron.core.TheForceTester.main(TheForceTester.java:10)


Hooray! Epic win! Moreover, you can add output of the current free space to the tester application and make sure that everything else also seems to work correctly: with Xmx != Xms the heap expands, and when they are equal the free space decreases by exactly as much as theory says it should. Great! All that remains is to add a fly in the ointment.
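
A variant of the tester that prints the current free space might look like the sketch below. The class name `TheForceReporter`, the bounded loop and the reporting interval are choices made for this example; only the standard Runtime methods freeMemory(), totalMemory() and maxMemory() are used.

```java
public class TheForceReporter {

    public static final int ARRAY_SIZE = 1000000;

    // Formats the current heap figures in megabytes.
    static String heapReport() {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024 * 1024;
        return "free=" + rt.freeMemory() / mb + "M"
                + " total=" + rt.totalMemory() / mb + "M"
                + " max=" + rt.maxMemory() / mb + "M";
    }

    public static void main(String[] args) {
        // Bounded loop, so the sketch terminates on an ordinary JVM;
        // without a collector it is expected to die with an OOM earlier.
        for (int i = 0; i < 1000; i++) {
            byte[] lotsOfUsefulData = new byte[ARRAY_SIZE];
            if (i % 100 == 0) {
                System.out.println(i + ": " + heapReport());
            }
        }
    }
}
```

Running it with different -Xms and -Xmx values makes it possible to watch either the total grow (heap expansion) or the free figure shrink between reports.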

A disclaimer, and finally the answer to That Very Question


By That Very Question I mean, of course, “But why?!”. At the beginning of the post I mentioned Disruptor, for which performance is extremely critical. The garbage collector, as is well known, introduces poorly predictable pauses into an application. So if you can reuse most of your objects and restart from time to time, cutting out the GC is a perfectly reasonable way to speed things up.

Besides, I wanted to see whether I could. And I was curious.

The disclaimer is this: the solution presented is rather dirty and serves more as a proof of concept. First of all, because we have in effect just made garbage collection instantaneous, while keeping all the other overhead that the virtual machine incurs from having a collector. Ideally, one would write a dedicated implementation of CollectedHeap that eliminates all that overhead. Even then, there would probably be a few more places that need poking around in.

What does all this mean? Expect more posts! :)
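
The object reuse mentioned above can be sketched as a preallocated ring of mutable events. This is a single-threaded illustration of the idea only, not the Disruptor API; the names `ReusableRing`, `Event` and `claim` are invented for this example.

```java
public class ReusableRing {

    // Mutable event that is overwritten in place instead of being reallocated.
    static final class Event {
        long value;
    }

    private final Event[] slots;
    private long next = 0;

    // Allocates every event up front; nothing is allocated afterwards.
    ReusableRing(int size) {
        slots = new Event[size];
        for (int i = 0; i < slots.length; i++) {
            slots[i] = new Event();
        }
    }

    // Returns the next preallocated slot, wrapping around at the end.
    Event claim() {
        Event e = slots[(int) (next % slots.length)];
        next++;
        return e;
    }

    public static void main(String[] args) {
        ReusableRing ring = new ReusableRing(4);
        for (long i = 0; i < 10; i++) {
            ring.claim().value = i;   // reuse a slot, produce no garbage
        }
        System.out.println("published 10 events using 4 objects");
    }
}
```

After the constructor returns, publishing events allocates nothing, which is exactly the property that makes running without a collector plausible for a while.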


PS What else would you do?

PPS A build for linux-amd64 is available as an archive: clck.ru/1-L-9 (Yandex.Disk)

PPPS Please do not clone my entire repository. It weighs 600+ megabytes, and traffic on the machine that hosts it is paid for. Nothing stops you, however, from cloning from java.net and then pulling a single commit ( 3358:3f014511ecce ).

Source: https://habr.com/ru/post/142447/

