Every day the word Java is perceived more and more not as a language but as a platform, thanks to the notorious invokeDynamic. That is why today I would like to talk about the Java virtual machine, namely the so-called performance options in Oracle HotSpot JVM version 1.6 and higher (server). Today very few people know anything beyond -Xmx, -Xms and -Xss. When I started digging into the topic, I discovered a huge amount of interesting information, which I want to share. The starting point, of course, was the official documentation from Oracle. And then: Google, experiments and discussions.
-XX:+DoEscapeAnalysis
I will begin, perhaps, with the most interesting option: DoEscapeAnalysis. As many of you know, primitives and object references are not created on the heap but are allocated on the thread stack (256 KB by default for HotSpot). Obviously, the Java language does not let you create objects on the stack directly. But your 1.6 JVM can do this quite well starting with update 14.
You can read about the algorithm itself here (PDF). In short:
- If the scope of an object does not extend beyond the scope of the method in which it is created, such an object can be created on the stack frame instead of the heap (in fact, not the object itself but its fields, whose combination replaces the object);
- If an object does not leave the scope of its thread, no other thread can access it, and therefore all synchronization operations on the object can be removed.
To implement this algorithm, a so-called connection graph is constructed; at the analysis stage (there are several analysis algorithms) it is traversed to find intersections with other threads and methods.
Thus, after traversing the connection graph, each object ends up in one of the following states:
- GlobalEscape: the object is accessible from other threads and from other methods, for example via a static field.
- ArgEscape: the object was passed as an argument, or is reachable from an argument object, but does not leave the scope of the thread in which it was created.
- NoEscape: the object does not leave the scope of the method, so its creation can be moved onto the stack.
After the analysis stage the JVM performs the possible optimizations: a NoEscape object can be created on the stack; for a NoEscape or ArgEscape object, synchronization operations on it can be removed.
It should be clarified that it is not the object itself that is created on the stack but its fields, since the JVM replaces the whole object with the set of its fields (thanks to Walrus for the clarification).
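To make the NoEscape case concrete, here is a minimal sketch (Point and doWork() are hypothetical, purely for illustration):

class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

static int doWork() {
    Point p = new Point(1, 2); // NoEscape: p never leaves this method
    return p.x + p.y;          // the JIT may replace p with two plain ints
}

With -XX:+DoEscapeAnalysis the JIT may perform scalar replacement here, so the heap allocation can disappear entirely.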
It is quite obvious that thanks to this kind of analysis the performance of individual parts of a program can increase significantly. In synthetic tests like this one:
for (int i = 0; i < 1000*1000*1000; i++) { Foo foo = new Foo(); }
execution speed may increase by 8-15 times. However, on seemingly obvious cases from practice that were written about recently (here and here), escape analysis does not work. I suspect that this is due to the size of the stack.
By the way, escape analysis is partly responsible for the well-known argument about StringBuilder vs StringBuffer: if you use a StringBuffer instead of a StringBuilder inside a method, escape analysis (if it triggers) will remove the StringBuffer's locks, effectively turning it into a StringBuilder.
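A hedged sketch of that effect (the join() method is hypothetical):

static String join(String a, String b) {
    StringBuffer sb = new StringBuffer(); // all its methods are synchronized...
    sb.append(a).append(b);               // ...but sb never escapes this method
    return sb.toString();                 // so the locks can be elided
}

Since sb is NoEscape, every monitor operation on it can be removed, and the StringBuffer behaves like an unsynchronized StringBuilder.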
-XX:+AggressiveOpts
The AggressiveOpts option is an umbrella option. Not in the sense that it dramatically increases the performance of your application, but in the sense that it mostly just changes the values of other options (in fact, this is not quite true: there are quite a few places in the JDK source code where AggressiveOpts changes the behavior of the JVM besides the options mentioned; one example is here). We can check the modified flags with the help of two commands:
java -server -XX:-AggressiveOpts -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal > no_aggr
java -server -XX:+AggressiveOpts -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal > aggr
After running both, the diff of the two outputs looked like this:
| Flag | -AggressiveOpts | +AggressiveOpts |
|---|---|---|
| AutoBoxCacheMax | 128 | 20000 |
| BiasedLockingStartupDelay | 4000 | 500 |
| EliminateAutoBox | false | true |
| OptimizeFill | false | true |
| OptimizeStringConcat | false | true |
In other words, all this option does is change the values of five virtual machine parameters. Moreover, no differences were noticed between versions 1.6 update 35 and 1.7 update 7. The option is disabled by default and changes nothing in client mode.
Let's figure out what Java means by aggressive optimization:
-XX:AutoBoxCacheMax=<size>
Allows you to extend the range of cached values for integer types when starting the virtual machine. I already mentioned this option here (second paragraph).
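A small illustration of what the extended cache changes (this applies to Integer boxing):

Integer a = 1000;
Integer b = 1000;
System.out.println(a == b); // false with the default cache of [-128, 127],
                            // true when the cache is extended past 1000

With the default settings only values from -128 to 127 are shared, so autoboxing 1000 creates a new object every time.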
-XX:BiasedLockingStartupDelay=<delay>
As you know, a synchronized block in Java can be represented by one of three types of locks: biased, thin or fat (inflated).
You can read more about them here, here and here.
Since most synchronized objects are only ever locked by a single thread, such objects can be tied (biased) to that thread, making synchronization operations on the object within that thread much cheaper. If another thread tries to access a biased object, the lock on that object is switched to a thin lock.
The switch itself is relatively expensive, so there is a delay at JVM startup: by default all locks are created as thin, and if no contention is detected and the code keeps being used by the same thread, such locks become biased after the delay expires. That is, the JVM tries to determine the locking usage patterns at startup and accordingly minimize the number of switches between lock types. So by setting BiasedLockingStartupDelay to zero we declare that we expect the main synchronized sections of our code to be used by a single thread.
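For example, a launch line for a mostly single-threaded application might look like this (app.jar is a placeholder):

java -server -XX:+UseBiasedLocking -XX:BiasedLockingStartupDelay=0 -jar app.jar

Here biasing kicks in immediately instead of after the default 4000 ms delay.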
-XX:+OptimizeStringConcat
Also quite an interesting option. It recognizes patterns of the form:

new StringBuilder().append(...).toString()

and, instead of repeatedly allocating memory for each concatenation, tries to compute the total number of characters of all the concatenated parts so that memory is allocated only once.
In other words, if we call append() 20 times with strings 20 characters long, the char array will be created once, with a length of 400 characters.
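For context, this is the shape javac generated for the + operator on strings in this JDK era, so the pattern occurs very often:

String s = a + b + c;
// is compiled by javac to roughly:
String s2 = new StringBuilder().append(a).append(b).append(c).toString();

With -XX:+OptimizeStringConcat the JIT can size the underlying buffer once, from the lengths of a, b and c.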
-XX:+OptimizeFill
Array fill/copy loops are replaced with direct machine instructions to speed them up.
For example, the following loop (taken from Arrays.fill()):
for (int i=fromIndex; i<toIndex; i++) a[i] = val;
will be completely replaced with the corresponding processor instructions, similar to C's memset and memcpy but lower-level.
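A quick sketch of the two equivalent forms (the array size is arbitrary):

int[] a = new int[1024];
for (int i = 0; i < a.length; i++) a[i] = 42; // hand-written fill loop
java.util.Arrays.fill(a, 42);                 // library call with the same loop inside

With -XX:+OptimizeFill the JIT can replace either form with a single optimized fill intrinsic.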
-XX:+EliminateAutoBox
Judging by the name, the flag should somehow reduce the number of autoboxing operations. Unfortunately, I have not yet been able to figure out exactly what this flag does. The only thing that is clear is that it applies only to Integer wrappers.
-XX:+UseCompressedStrings
A rather controversial option, in my opinion. If back in the distant 90s Java's developers did not begrudge spending 2 bytes per character, then today such an optimization looks rather odd. If you have not guessed yet, the option replaces the character arrays inside strings with byte arrays where possible (that is, for ASCII-only content). In effect:
char[] -> byte[]
This can save a significant amount of memory. But because the type changes, type checks add overhead to certain operations, so with this option JVM performance degradation is possible. That is why the option is disabled by default.
-XX:+UseStringCache
Quite a mysterious option: judging by the name, it should somehow cache strings. How? Unclear. There is no information, and the code does not seem to do anything. I would be glad if someone could clarify.
-XX:+UseCompressedOops
First, a few facts:
- The size of a pointer to an object in a 32-bit JVM is 32 bits; in a 64-bit JVM it is 64 bits. Therefore, in the first case you can address 2^32 bytes (4 GB) of memory, and in the second 2^64 bytes.
- Object sizes in Java are a multiple of 8 bytes regardless of the bitness of the virtual machine (this is not true of all virtual machines, but we are talking about HotSpot). That is, with 32-bit pointers the lowest 3 bits are always zero, so the virtual machine effectively uses only 29 of them.
This option allows you to shrink pointers on a 64-bit JVM to 32 bits. With plain 32-bit pointers the heap would be limited to 4 GB, so in addition to the shortened pointer the 8-byte alignment property is exploited. As a result we can address 2^35 bytes (32 GB) with 32-bit pointers.
In fact, inside the virtual machine we then have pointers to objects, not to specific bytes in memory. Of course, this alignment assumption adds the cost of converting pointers, but in essence it is just one shift and one addition.
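The decoding is essentially this (a conceptual sketch, not actual HotSpot code; heapBase and compressedOop are illustrative names):

long address = heapBase + ((long) compressedOop << 3); // shift by 3 = multiply by 8

Thirty-two bits shifted left by 3 give 35 significant bits, hence the 32 GB limit.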
Besides shrinking the pointers themselves, this option also shrinks object headers and various alignments and paddings inside objects, which on average reduces memory consumption by 20-60% depending on the application's object model.
That is, the only disadvantages are:
- The maximum heap size is limited to 32 GB (64 GB for JRockit, which aligns objects to 16 bytes);
- Additional costs appear for converting compressed references to native pointers and back.
Since for most applications the option brings real advantages, it is enabled by default starting with JDK 6 update 23, as well as in JDK 7. More details here and here.
-XX:+EliminateLocks
An option that eliminates unnecessary locks by coarsening (merging) them. For example, the following adjacent blocks (the method bodies are placeholders):

synchronized (object) {
    doSomething1();
}
synchronized (object) {
    doSomething2();
}

will be converted into

synchronized (object) {
    doSomething1();
    doSomething2();
}

This reduces the number of attempts to acquire the monitor.
Conclusion
Quite a few interesting options were left out, since it is rather difficult to fit all ~700 flags into one article. I deliberately did not touch the garbage collector tuning options, as that is a rather extensive and complex topic deserving several posts of its own. I hope the article was useful to you.