Java 10 was released last week, and although Graal was available before, it has now become more accessible (Congratulations, you're running #Graal!): just add
-XX:+UnlockExperimentalVMOptions -XX:+UseJVMCICompiler
What exactly does it give us, where can we expect improvements, and which homegrown workarounds can we start removing?
The example I will consider is partly contrived, but based on real events.
Surely many people use the Preconditions class from the guava library:
checkArgument(value > 0, "Non-negative value is expected, was %s", value);
And everything would be fine as long as such a line did not end up on the critical path of the code. The problem is the implicit creation of garbage.
This is the body of the checkArgument method:
public static void checkArgument(
    boolean expression,
    @Nullable String errorMessageTemplate,
    @Nullable Object... errorMessageArgs) {
  if (!expression) {
    throw new IllegalArgumentException(format(errorMessageTemplate, errorMessageArgs));
  }
}
Let's make the implicit explicit:
boolean expression = value > 0;
Object[] errorMessageArgs = new Object[]{Integer.valueOf(value)};
if (!expression) {
  throw new IllegalArgumentException(format(errorMessageTemplate, errorMessageArgs));
}
This is where the check-or-don't-check dilemma arises. As a rule, such checks in production code are a safety net: on the one hand you don't want to pay for them with additional garbage, but on the other hand you don't want to give up fail-fast behavior.
The problem lies in the objects created by autoboxing and varargs, which may never be used. Alas, once branching is involved, Escape Analysis can no longer prove that an object is unnecessary.
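To make the hidden allocations visible, here is a minimal sketch. The class HiddenAllocationDemo and the method capture are hypothetical names used only for illustration; they are not part of Guava. The method simply hands back the array that the compiler builds for a varargs call.

```java
// Hypothetical demo class, for illustration only.
final class HiddenAllocationDemo {
    private HiddenAllocationDemo() {}

    // Returns the array that the compiler created for the varargs call site.
    static Object[] capture(Object... args) {
        return args;
    }

    public static void main(String[] ignored) {
        int value = 42;
        Object[] first = capture(value);
        Object[] second = capture(value);
        // Every varargs call builds its own array, even for identical arguments.
        System.out.println(first != second);             // prints "true"
        // The primitive int was boxed before being stored in the array.
        System.out.println(first[0] instanceof Integer); // prints "true"
    }
}
```

Both the array and the boxed value exist before the callee decides whether it needs them, which is exactly the situation in checkArgument.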
How can I solve the problem?
For example, by overloading the checkArgument method (which is, in fact, what Guava >= 20 does):
public static void checkArgument(boolean expression, @Nullable String errorMessageTemplate, int p1) {
  if (!expression) {
    throw new IllegalArgumentException(format(errorMessageTemplate, p1));
  }
}
But what if we have not one argument, but more than the two for which Guava provides overloaded methods? Write your own workaround, or put up with the garbage? In our code we ran into a place that passes a combination of three ints, in a single line that is executed millions of times while the response time is limited.
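Such a workaround for the three-int case might look roughly like the sketch below. IntPreconditions is a hypothetical helper, not part of Guava, and String.format stands in for Guava's internal formatter, which is an assumption made only to keep the example self-contained.

```java
// Hypothetical helper class, not part of Guava.
final class IntPreconditions {
    private IntPreconditions() {}

    /**
     * Primitive overload for three int arguments. The parameters stay primitive
     * on the happy path; boxing and the varargs array for String.format are
     * created only inside the failing branch.
     */
    static void checkArgument(boolean expression, String template, int p1, int p2, int p3) {
        if (!expression) {
            // Assumption: String.format replaces Guava's internal formatter here.
            throw new IllegalArgumentException(String.format(template, p1, p2, p3));
        }
    }

    public static void main(String[] args) {
        // Passes silently.
        checkArgument(3 > 0, "expected positive values, were %s, %s, %s", 3, 2, 1);
        try {
            checkArgument(-1 > 0, "expected positive values, were %s, %s, %s", -1, 2, 1);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // expected positive values, were -1, 2, 1
        }
    }
}
```

The cost of this approach is one hand-written overload per argument combination, which is precisely the kind of boilerplate one would like the JIT to make unnecessary.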
Java 10 and -XX:+UnlockExperimentalVMOptions -XX:+UseJVMCICompiler
Graal brings many new optimizations, in particular Partial Escape Analysis. Its essence, among other things, is that it can determine that the created objects are used in only one of the branches, so their creation can be moved into that branch.
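At the source level, the effect can be pictured roughly as follows. This is only an illustration: the compiler performs the transformation on its internal representation, not on Java source, and the class and method names are hypothetical.

```java
// Hypothetical illustration class; the real transformation happens inside the JIT.
final class PartialEscapeSketch {
    private PartialEscapeSketch() {}

    // As written: the boxed value and the array are created before the branch.
    static String asWritten(int value) {
        Object[] args = new Object[] { Integer.valueOf(value) };
        if (value <= 0) {
            // The array escapes only on this path.
            return String.format("expected positive value, was %s", args);
        }
        return null; // args was never needed here
    }

    // Roughly what the code behaves like after the allocation is moved into the branch.
    static String afterPartialEscapeAnalysis(int value) {
        if (value <= 0) {
            Object[] args = new Object[] { Integer.valueOf(value) };
            return String.format("expected positive value, was %s", args);
        }
        return null; // no allocation on the common path
    }
}
```

Both methods return the same result for every input; the only difference is that the second one allocates nothing when the check passes.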
The moment of truth - what is your evidence?
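import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.profile.GCProfiler;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Fork(1)
@Warmup(iterations = 5, time = 5000, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 5, time = 5000, timeUnit = TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
public class PartialEATest {

    // -1 takes the failing branch, 1 takes the passing branch
    @Param(value = {"-1", "1"})
    private int value;

    @Benchmark
    public void allocate(Blackhole bh) {
        checkArg(bh, value > 0, "expected non-negative value: %s, %s", value, 1000, "A", 700);
    }

    // Mimics checkArgument: the message is formatted only when the condition fails
    private static void checkArg(Blackhole bh, boolean cond, String msg, Object ... args){
        if (!cond){
            bh.consume(String.format(msg, args));
        }
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(PartialEATest.class.getSimpleName())
                .addProfiler(GCProfiler.class)
                .build();
        new Runner(opt).run();
    }
}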
Of all the numbers, we are interested in the allocations, which is why GCProfiler is included:
Options | Benchmark | (value) | Score | Error | Units |
---|---|---|---|---|---|
-Graal | PartialEATest.allocate:·gc.alloc.rate.norm | -1 | 1008.000 | ± 0.001 | B/op |
-Graal | PartialEATest.allocate:·gc.alloc.rate.norm | 1 | 32.000 | ± 0.001 | B/op |
+Graal | PartialEATest.allocate:·gc.alloc.rate.norm | -1 | 1024.220 | ± 0.908 | B/op |
+Graal | PartialEATest.allocate:·gc.alloc.rate.norm | 1 | ≈ 10⁻⁴ | | B/op |
This shows quite clearly that Graal does not create objects unnecessarily, so it is time to remove the optimization workarounds.
Update:
olegchir reasonably noted that it would be good to see what exactly the code compiles to.
Let's look at the assembly produced by the good old C2 and by Graal. For this we need hsdis (download it or build it yourself) and a few extra launch parameters:
-XX:+UnlockDiagnosticVMOptions -XX:PrintAssemblyOptions=intel -XX:CompileCommand=print,"com/elastic/PartialEATest.*"
There is a lot of code, so the C2 listing is shown only up to the first autoboxing:
ImmutableOopMap{rbx=Oop }pc offsets: 1684 1697
Compiled method (c2) 619 736 4 com.elastic.PartialEATest::allocate (55 bytes)
 total in heap [0x00000001189a0c90,0x00000001189a1410] = 1920
 relocation [0x00000001189a0e08,0x00000001189a0e38] = 48
 main code [0x00000001189a0e40,0x00000001189a1060] = 544
 stub code [0x00000001189a1060,0x00000001189a1078] = 24
 oops [0x00000001189a1078,0x00000001189a10a0] = 40
 metadata [0x00000001189a10a0,0x00000001189a10b0] = 16
 scopes data [0x00000001189a10b0,0x00000001189a1210] = 352
 scopes pcs [0x00000001189a1210,0x00000001189a13c0] = 432
 dependencies [0x00000001189a13c0,0x00000001189a13c8] = 8
 handler table [0x00000001189a13c8,0x00000001189a1410] = 72
----------------------------------------------------------------------
com/elastic/PartialEATest.allocate(Lorg/openjdk/jmh/infra/Blackhole;)V [0x00000001189a0e40, 0x00000001189a1078] 568 bytes
[Entry Point]
[Constants]
  # {method} {0x000000022ea937b8} 'allocate' '(Lorg/openjdk/jmh/infra/Blackhole;)V' in 'com/elastic/PartialEATest'
  # this: rsi:rsi = 'com/elastic/PartialEATest'
  # parm0: rdx:rdx = 'org/openjdk/jmh/infra/Blackhole'
  # [sp+0x30] (sp of caller)
  0x00000001189a0e40: cmp rax,QWORD PTR [rsi+0x8]
  0x00000001189a0e44: jne 0x0000000110eb7580 ; {runtime_call ic_miss_stub}
  0x00000001189a0e4a: xchg ax,ax
  0x00000001189a0e4c: nop DWORD PTR [rax+0x0]
[Verified Entry Point]
  0x00000001189a0e50: mov DWORD PTR [rsp-0x14000],eax
  0x00000001189a0e57: push rbp
  0x00000001189a0e58: sub rsp,0x20 ;*synchronization entry
      ; - com.elastic.PartialEATest::allocate@-1 (line 26)
  0x00000001189a0e5c: mov r11d,DWORD PTR [rsi+0x10] ;*getfield value {reexecute=0 rethrow=0 return_oop=0}
      ; - com.elastic.PartialEATest::allocate@1 (line 26)
  0x00000001189a0e60: mov DWORD PTR [rsp],r11d
  0x00000001189a0e64: test r11d,r11d
  0x00000001189a0e67: jle 0x00000001189a0ffc ;*ifle {reexecute=0 rethrow=0 return_oop=0}
      ; - com.elastic.PartialEATest::allocate@4 (line 26)
  0x00000001189a0e6d: cmp r11d,0xffffff80
  0x00000001189a0e71: jl 0x00000001189a100e ;*if_icmplt {reexecute=0 rethrow=0 return_oop=0}
      ; - java.lang.Integer::valueOf@3 (line 1048)
      ; - com.elastic.PartialEATest::allocate@24 (line 26)
  0x00000001189a0e77: cmp r11d,0x7f
  0x00000001189a0e7b: jg 0x00000001189a0ea9 ;*if_icmpgt {reexecute=0 rethrow=0 return_oop=0}
      ; - java.lang.Integer::valueOf@10 (line 1048)
      ; - com.elastic.PartialEATest::allocate@24 (line 26)
  0x00000001189a0e7d: mov ebp,r11d
  0x00000001189a0e80: add ebp,0x80 ;*iadd {reexecute=0 rethrow=0 return_oop=0}
      ; - java.lang.Integer::valueOf@20 (line 1049)
      ; - com.elastic.PartialEATest::allocate@24 (line 26)
  0x00000001189a0e86: cmp ebp,0x100
  0x00000001189a0e8c: jae 0x00000001189a101e
  0x00000001189a0e92: movsxd r10,r11d
  0x00000001189a0e95: movabs r11,0x12ed02000 ; {oop(a 'java/lang/Integer'[256] {0x000000012ed02000})}
  0x00000001189a0e9f: mov rbp,QWORD PTR [r11+r10*8+0x418] ;*aaload {reexecute=0 rethrow=0 return_oop=0}
      ; - java.lang.Integer::valueOf@21 (line 1049)
      ; - com.elastic.PartialEATest::allocate@24 (line 26)
  ................
ImmutableOopMap{rbx=Oop }pc offsets: 251 264
Compiled method (JVMCI) 1850 3888 4 com.elastic.PartialEATest::allocate (55 bytes)
 total in heap [0x0000000119292590,0x0000000119292830] = 672
 relocation [0x0000000119292708,0x0000000119292718] = 16
 main code [0x0000000119292720,0x0000000119292795] = 117
 stub code [0x0000000119292795,0x0000000119292798] = 3
 oops [0x0000000119292798,0x00000001192927a0] = 8
 metadata [0x00000001192927a0,0x00000001192927a8] = 8
 scopes data [0x00000001192927a8,0x00000001192927c8] = 32
 scopes pcs [0x00000001192927c8,0x0000000119292828] = 96
 dependencies [0x0000000119292828,0x0000000119292830] = 8
----------------------------------------------------------------------
com/elastic/PartialEATest.allocate(Lorg/openjdk/jmh/infra/Blackhole;)V (com.elastic.PartialEATest.allocate(Blackhole)) [0x0000000119292720, 0x0000000119292798] 120 bytes
[Entry Point]
[Constants]
  # {method} {0x0000000231e007b8} 'allocate' '(Lorg/openjdk/jmh/infra/Blackhole;)V' in 'com/elastic/PartialEATest'
  # this: rsi:rsi = 'com/elastic/PartialEATest'
  # parm0: rdx:rdx = 'org/openjdk/jmh/infra/Blackhole'
  # [sp+0x20] (sp of caller)
  0x0000000119292720: cmp rax,QWORD PTR [rsi+0x8]
  0x0000000119292724: jne 0x000000010eadc300 ; {runtime_call ic_miss_stub}
  0x000000011929272a: nop
  0x000000011929272b: data16 data16 nop WORD PTR [rax+rax*1+0x0]
  0x0000000119292736: data16 nop WORD PTR [rax+rax*1+0x0]
[Verified Entry Point]
  0x0000000119292740: mov DWORD PTR [rsp-0x14000],eax
  0x0000000119292747: sub rsp,0x18
  0x000000011929274b: mov QWORD PTR [rsp+0x10],rbp
  0x0000000119292750: cmp DWORD PTR [rsi+0x10],0x1
  0x0000000119292754: jl 0x000000011929276d ;*ifle {reexecute=0 rethrow=0 return_oop=0}
      ; - com.elastic.PartialEATest::allocate@4 (line 26)
  0x000000011929275a: mov rbp,QWORD PTR [rsp+0x10]
  0x000000011929275f: add rsp,0x18
  0x0000000119292763: mov rcx,QWORD PTR [r15+0x70]
  0x0000000119292767: test DWORD PTR [rcx],eax ; {poll_return}
  0x0000000119292769: vzeroupper
  0x000000011929276c: ret ;*return {reexecute=0 rethrow=0 return_oop=0}
      ; - com.elastic.PartialEATest::allocate@54 (line 27)
  0x000000011929276d: mov DWORD PTR [r15+0x314],0xffffffed ;*ifle {reexecute=0 rethrow=0 return_oop=0}
      ; - com.elastic.PartialEATest::allocate@4 (line 26)
  0x0000000119292778: mov QWORD PTR [r15+0x320],0x0
  0x0000000119292783: call 0x000000010eadd2a4 ; ImmutableOopMap{rsi=Oop }
      ;*aload_0 {reexecute=1 rethrow=0 return_oop=0}
      ; - com.elastic.PartialEATest::allocate@0 (line 26)
      ; {runtime_call DeoptimizationBlob}
  0x0000000119292788: nop
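You can see how much larger the code compiled by C2 is than the code compiled by Graal: the C2 version contains both the autoboxing and the varargs, whereas the Graal version is essentially just a comparison and a return, with a call into the deoptimization stub on the failing branch.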
Source: https://habr.com/ru/post/351996/