
JVM options: how they work

Every day Java is perceived more and more not as a language but as a platform, thanks in part to the notorious invokedynamic. That is why today I would like to talk about the Java virtual machine, namely the so-called performance options of the Oracle HotSpot JVM version 1.6 and higher (server mode), because today almost nobody knows anything beyond -Xmx, -Xms and -Xss. When I started digging into the topic, I discovered a huge amount of interesting information that I want to share. The starting point, of course, was the official documentation from Oracle; then came Google, experiments and conversations:

-XX:+DoEscapeAnalysis


I will begin, perhaps, with the most interesting option: DoEscapeAnalysis. As many of you know, primitives and object references are not created on the heap but allocated on the thread stack (256KB by default for HotSpot). It is also quite obvious that the Java language does not let you create objects on the stack directly. But your JVM 1.6 can do it quite well, starting from update 14.

How the algorithm itself works can be read here (PDF). In short:


To implement this algorithm, a so-called connection graph is built; at the analysis stage (there are several analysis algorithms) it is traversed to find where an object intersects with other threads and methods.
After traversing the connection graph, every object ends up in one of the following states:



After the analysis stage, the JVM performs the optimizations that have become possible: if an object is NoEscape, it can be created on the stack; if an object is NoEscape or ArgEscape, synchronization operations on it can be removed.

It should be clarified that it is not the object itself that is created on the stack but its fields: the JVM replaces the whole object with the collection of its fields (thanks to Walrus for the clarification).

It is quite obvious that this kind of analysis can significantly improve the performance of individual parts of a program. In synthetic tests like this:

 for (int i = 0; i < 1000 * 1000 * 1000; i++) {
     Foo foo = new Foo();
 }

execution speed may increase 8-15 times. However, on seemingly obvious cases from practice, about which was recently written (here and here), escape analysis does not work. I suspect this is due to the stack size.

By the way, escape analysis is partly responsible for the well-known argument about StringBuilder vs StringBuffer: if you happen to use a StringBuffer instead of a StringBuilder inside a method, escape analysis (if it triggers) will remove the StringBuffer's locks, effectively turning the StringBuffer into a StringBuilder.
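The non-escaping case can be sketched with a tiny example (class and method names are mine, for illustration): the StringBuffer below never leaves the method, so when escape analysis triggers, the JIT is free to elide its locks and scalar-replace the allocation.

```java
public class EscapeDemo {
    // The StringBuffer never escapes this method (NoEscape), so with
    // -XX:+DoEscapeAnalysis the JIT may elide its internal locks and
    // replace the allocation with the object's fields on the stack.
    static int noEscape() {
        StringBuffer sb = new StringBuffer();
        sb.append("a").append("b");
        return sb.length();
    }

    public static void main(String[] args) {
        System.out.println(noEscape());
    }
}
```

Whether the optimization actually fires is invisible to the program: the result is the same either way, only the allocation and locking costs differ.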

-XX:+AggressiveOpts


The AggressiveOpts option is a super option. Not in the sense that it dramatically increases the performance of your application, but in the sense that it only changes the values of other options (in fact, this is not quite so: there are quite a few places in the JDK source code where AggressiveOpts changes the behavior of the JVM beyond the options mentioned; one example is here). We will check which flags it modifies with the help of two commands:

 java -server -XX:-AggressiveOpts -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal > no_aggr
 java -server -XX:+AggressiveOpts -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal > aggr

After running them, the diff between the two outputs looked like this:

 Flag                        -AggressiveOpts   +AggressiveOpts
 AutoBoxCacheMax             128               20000
 BiasedLockingStartupDelay   4000              500
 EliminateAutoBox            false             true
 OptimizeFill                false             true
 OptimizeStringConcat        false             true

In other words, all this option does is change the values of five virtual machine parameters. Moreover, no differences were observed between versions 1.6 update 35 and 1.7 update 7. The option is disabled by default and changes nothing in client mode.
Let's figure out what Java means by aggressive optimization:

-XX:AutoBoxCacheMax=size


Allows you to extend the range of cached values for integer types when the virtual machine starts. I already mentioned this option here (second paragraph).
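A quick sketch of what the cache affects (the class name is mine, for illustration):

```java
public class BoxCacheDemo {
    // Integer.valueOf (and therefore autoboxing) returns a cached
    // instance for values in [-128, AutoBoxCacheMax], so == compares
    // the very same object within that range.
    static boolean sameInstance(int v) {
        return Integer.valueOf(v) == Integer.valueOf(v);
    }

    public static void main(String[] args) {
        // Always true: 127 is inside the cache range guaranteed by the spec.
        System.out.println(sameInstance(127));
        // For 1000 this is false on a default HotSpot, but would become
        // true with, e.g., -XX:AutoBoxCacheMax=20000:
        System.out.println(sameInstance(1000));
    }
}
```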

-XX:BiasedLockingStartupDelay=delay


As you know, a synchronized block in Java can be implemented by one of three kinds of locks:


You can read more about them here, here and here.

Since most synchronized objects are only ever locked by a single thread, such objects can be biased to that thread, and synchronization operations on them become much cheaper inside it. If another thread tries to acquire a biased lock, the lock on that object is switched to a thin lock.

The switch itself is relatively expensive, so there is a delay at JVM startup during which all locks are created thin; if no contention is detected and the code is used by a single thread, such locks become biased after the delay expires. That is, at startup the JVM tries to determine the locking usage patterns and thereby minimize the number of switches between lock kinds. Accordingly, by setting BiasedLockingStartupDelay to zero, we declare that we expect the main synchronized sections to be used by a single thread.

-XX:+OptimizeStringConcat


Also quite an interesting option. It recognizes patterns of the form:

 new StringBuilder().append(...).toString()
 new StringBuilder().append(new StringBuilder().append(...).toString()).toString()

and, instead of repeatedly allocating memory for each concatenation operation, it tries to compute the total number of characters of all the concatenated pieces so that memory is allocated only once.
In other words, if we call append() 20 times with strings 20 characters long, the char array will be created once, 400 characters long.
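Since javac compiles the + operator on strings into exactly that StringBuilder chain, ordinary concatenation is what benefits (a sketch, names mine):

```java
public class ConcatDemo {
    // javac turns "prefix + value + suffix" into
    // new StringBuilder().append(prefix).append(value).append(suffix).toString(),
    // the very pattern -XX:+OptimizeStringConcat recognizes: the JIT can
    // then size the backing char array once instead of growing it.
    static String concat(String prefix, int value, String suffix) {
        return prefix + value + suffix;
    }

    public static void main(String[] args) {
        System.out.println(concat("result=", 42, "!"));
    }
}
```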

-XX:+OptimizeFill


Array fill/copy loops are replaced with direct machine instructions to speed them up.
For example, the following loop (taken from Arrays.fill()):

 for (int i = fromIndex; i < toIndex; i++)
     a[i] = val;

will be completely replaced with the appropriate processor instructions, similar to the C-style memset/memcpy, only lower-level.
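Arrays.fill itself is built on exactly that loop, so it is a natural beneficiary (a sketch; behavior is identical with or without the flag, only speed differs):

```java
import java.util.Arrays;

public class FillDemo {
    public static void main(String[] args) {
        int[] a = new int[8];
        // Internally this is the simple "a[i] = val" loop shown above;
        // with -XX:+OptimizeFill the JIT replaces it with a memset-like
        // fill stub instead of iterating element by element.
        Arrays.fill(a, 2, 6, 7);
        System.out.println(Arrays.toString(a)); // [0, 0, 7, 7, 7, 7, 0, 0]
    }
}
```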

-XX:+EliminateAutoBox


Judging by the name, the flag should somehow reduce the number of autoboxing operations. Unfortunately, I have not yet been able to figure out exactly what this flag does. The only thing that has become clear is that it applies only to Integer wrappers.

-XX:+UseCompressedStrings


A rather controversial option, in my opinion... If back in the distant 90s Java's designers did not begrudge 2 bytes per character, such an optimization today looks rather odd. In case anyone has not guessed, the option replaces the character arrays inside strings with byte arrays where possible (for ASCII content). In effect:

 char[] -> byte[] 

Thus, significant memory savings are possible. But since the type changes, certain operations incur type-checking overhead; that is, with this option JVM performance may degrade. That is why the option is disabled by default.
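The saving is easy to estimate with back-of-envelope arithmetic, ignoring object headers (a sketch, names mine):

```java
public class StringMemoryDemo {
    // Payload size of the backing array, ignoring object headers:
    // UTF-16 storage costs 2 bytes per character, an ASCII byte[] costs 1.
    static int utf16PayloadBytes(String s) {
        return s.length() * 2;
    }

    static int asciiPayloadBytes(String s) {
        return s.length();
    }

    public static void main(String[] args) {
        String s = "hello world"; // pure ASCII, 11 characters
        System.out.println(utf16PayloadBytes(s) - asciiPayloadBytes(s)); // 11 bytes saved
    }
}
```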

-XX:+UseStringCache


Quite a mysterious option: judging by the name, it should somehow cache strings. How? Unclear; there is no information, and the code does not seem to do anything. I would be glad if someone could clarify.

-XX:+UseCompressedOops


First, a few facts:



This option allows reducing the pointer size on 64-bit JVMs to 32 bits. That alone would limit the heap to 4 GB, so in addition to the shortened pointer, the 8-byte alignment of objects is exploited. As a result, we can address 2^35 bytes (32 GB) with 32-bit pointers.
In fact, inside the virtual machine we have pointers to objects, not to specific bytes of memory. Of course, such an assumption (about alignment) introduces extra work for converting pointers, but in essence it is just one shift and one add.
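The decoding can be sketched as follows (a model for illustration, not the actual VM code; the 3-bit shift follows from 8-byte alignment):

```java
public class OopDecodeDemo {
    // A compressed oop is a 32-bit index of an 8-byte-aligned slot:
    // address = heapBase + (oop << 3). 2^32 slots * 8 bytes = 32 GB.
    static long decode(long heapBase, int compressedOop) {
        // Mask to treat the 32-bit oop as unsigned before shifting.
        return heapBase + ((compressedOop & 0xFFFFFFFFL) << 3);
    }

    public static void main(String[] args) {
        // With heap base 0, the largest compressed oop reaches 2^35 - 8,
        // i.e. the last slot of a 32 GB address space:
        System.out.println(decode(0L, -1) == (1L << 35) - 8);
    }
}
```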

Besides shrinking the pointers themselves, this option also shrinks object headers and various alignments and offsets inside objects, which reduces memory consumption by 20-60% on average, depending on the application's object model.

That is, the only disadvantages we are left with are:



Since the option benefits most applications, it is enabled by default starting from JDK 6 update 23, as well as in JDK 7. More details here and here.

-XX:+EliminateLocks


An option that eliminates unnecessary locks by merging them. For example, the following blocks:

 synchronized (object) {
     //doSomething1
 }
 synchronized (object) {
     //doSomething2
 }

 synchronized (object) {
     //doSomething3
 }
 //doSomething4
 synchronized (object) {
     //doSomething5
 }

will be converted, respectively, into:

 synchronized (object) {
     //doSomething1
     //doSomething2
 }


 synchronized (object) {
     //doSomething3
     //doSomething4
     //doSomething5
 }


This reduces the number of monitor acquisition attempts.

Conclusion

Quite a few interesting options were left overboard, since it is rather difficult to fit all ~700 flags into one article. I deliberately did not touch the garbage collector tuning options, as that is a rather extensive and complicated topic that deserves several posts of its own. I hope the article was useful to you.

Source: https://habr.com/ru/post/160049/
