Hardcore Java / JVM puzzles

Performance puzzles from Contour have already been published, and now our turn has come: we present the hardcore tasks from the Java conference JBreak 2018, aka “hell from Excelsior”.

The tasks are given in their original wording. Each task may have several correct answers, and each task is followed by its solution.

Task 1

Your colleague has been reading the Java Language Specification and wrote the following:

    void playWithRef() {
        Object obj = new Object();
        WeakReference<Object> ref = new WeakReference<>(obj);
        System.out.println(ref.get() != null);
        System.gc();
        System.out.println(ref.get() != null);
    }

Now it is up to you to sort it out: which execution results are possible?

A : false, false
B : false, true
C : true, false
D : true, true

Answer and solution

The correct answer is: A, C, D.

Solution

The scope of the obj variable is the entire method, but its live range ends as soon as the WeakReference constructor returns (in fact, even a little earlier, inside the constructor). And it is the live range, not the scope, that determines whether the GC may reclaim the object.

However, a VM may prolong the life of a variable when that is convenient for it. For example, the HotSpot interpreter reports variables to the GC as live for as long as they are in scope (this can be observed in a debugger). So option D is easily obtained by running the example on the HotSpot VM without any additional options (or with an explicit -Xint).

Result C is obtained with many compilers (for example, HotSpot C1/C2, Excelsior JET JIT & AOT, ...). The compilers are smart enough to figure out that the obj variable is no longer used, so by the first call to get() nothing prevents the GC from reclaiming the object. In practice, however, the GC will most often run only when System.gc() is called explicitly; this behavior shows up on the HotSpot VM with -Xcomp and on Excelsior JET in any mode.

Option A is theoretically possible if a GC cycle happens, for example, right at the end of the WeakReference constructor.

The task is based on a bug in the JDK 8 code, where a method argument was carelessly held only through a WeakReference and died while the method was still executing. There is a separate detailed post about it in our technical blog.
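If the object must stay alive regardless of what the compiler works out about the live range, one possible approach (on Java 9 and later) is Reference.reachabilityFence. The sketch below is only an illustration: the class name WeakRefFence and the method survivesGc are made up for this example and are not part of the original task.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;

class WeakRefFence {
    // Hypothetical helper: reports whether the referent is still
    // available after an explicit GC request.
    static boolean survivesGc() {
        Object obj = new Object();
        WeakReference<Object> ref = new WeakReference<>(obj);
        try {
            System.gc();
            return ref.get() != null;
        } finally {
            // obj is guaranteed to be strongly reachable up to this call,
            // so the weak reference cannot be cleared before it.
            Reference.reachabilityFence(obj);
        }
    }

    public static void main(String[] args) {
        System.out.println(survivesGc());
    }
}
```

With the fence in place, the live range of obj is pinned explicitly instead of being left to the interpreter or the JIT compiler.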

Task 2

An evil hacker deleted the original java file and shuffled the pieces of your class file:

    A: 0700 0401 0001 4300 2000 0300 0100 0000
    B: 0000 0000 00
    C: 6a61 7661 2f6c 616e 672f 4f62 6a65 6374
    D: cafe babe 0000 0031 0005 0700 0201 0010

Rearrange them so that a verifiable class file is obtained.

Answer and solution

The correct answer is: D, C, A, B.

Solution

This task is more of a brain teaser, but it still teaches something new.

It is widely known that a class file begins with the four-byte magic number 0xCAFEBABE, which means D definitely comes first. Common sense suggests that the short piece B comes last: it is the tail.

Next, one could recall that a class file contains a constant pool, in which string constants are stored as a two-byte length followed by the string itself in UTF-8. The only piece that looks like UTF-8 text is C: it is the UTF-8 representation of the string java/lang/Object (the name of our class's superclass). So it must be preceded by the bytes 0x0010 (the string is 16 bytes long), and the only piece that ends this way is D, so C comes second.

Alternatively, one could notice that the last piece B consists entirely of zeros, so the piece before it should also end in zeros, which means it is A!

Javap output

    class C
      minor version: 0
      major version: 49
      flags: ACC_SUPER
    Constant pool:
      #1 = Class    #2    // java/lang/Object
      #2 = Utf8     java/lang/Object
      #3 = Class    #4    // C
      #4 = Utf8     C
    {
    }
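The reasoning about the header and the first constant pool entries can be checked mechanically. The following is a minimal sketch that glues the pieces together in the order D, C, A, B and reads the beginning of the result with DataInputStream; the class name ClassFilePieces and its helper methods are invented for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class ClassFilePieces {
    static final String A = "0700 0401 0001 4300 2000 0300 0100 0000";
    static final String B = "0000 0000 00";
    static final String C = "6a61 7661 2f6c 616e 672f 4f62 6a65 6374";
    static final String D = "cafe babe 0000 0031 0005 0700 0201 0010";

    // Concatenates the hex pieces and converts them to raw bytes.
    static byte[] toBytes(String... pieces) {
        String hex = String.join("", pieces).replace(" ", "");
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Reads the header and the first two constant pool entries.
    static String describe(String... pieces) {
        try (DataInputStream in =
                     new DataInputStream(new ByteArrayInputStream(toBytes(pieces)))) {
            int magic = in.readInt();
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort();
            int poolCount = in.readUnsignedShort();
            int classTag = in.readUnsignedByte();   // 7 = CONSTANT_Class
            int nameIndex = in.readUnsignedShort();
            int utf8Tag = in.readUnsignedByte();    // 1 = CONSTANT_Utf8
            String name = in.readUTF();             // two-byte length, then the bytes
            return String.format("magic=%08x version=%d.%d pool=%d tags=%d,%d #%d=%s",
                    magic, major, minor, poolCount, classTag, utf8Tag, nameIndex, name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(D, C, A, B));
    }
}
```

The sketch stops after the second constant pool entry; it does not verify the rest of the file.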

Task 3

After listening to yet another talk about Graal and looking enthusiastically at the JVM Compiler Interface, you decided to write your own compiler for Java! You decided to start by generating x86_64 code for the following method:

 static boolean invert(boolean x) { return !x; }

What generated code will be correct for this method?

Legend: Intel syntax is used; the calling convention is such that rcx holds the argument and rax holds the result.

    A:
        test ecx, ecx
        jnz True
        mov eax, 1
        ret
    True:
        mov eax, 0
        ret

    B:
        xor eax, eax
        test ecx, ecx
        jnz End
        add eax, 1
    End:
        ret

    C:
        mov eax, 1
        sub eax, ecx
        ret

    D:
        mov eax, ecx
        xor eax, 1
        ret

Answer and solution

The correct answer is: A, B.

Solution

Assembler listings show up at Java conferences more and more often, but in case you are not familiar with the Intel x86 instruction set, here is the equivalent C code:

    A: res = (arg == 0) ? 1 : 0;
    B: res = 0; if (arg == 0) res += 1;
    C: res = 1; res -= arg;
    D: res = arg; res ^= 1;

In fact, all of these inversion algorithms work correctly as long as the input argument takes only the usual logical values 0 and 1.

Now for the interesting part. From the verifier's point of view, all the short integer types (boolean, byte, char, short) are equivalent to int. Moreover, there are no boolean-specific bytecode instructions at all. For example, the bytecode of the method in question looks as follows:

    public static boolean invert(boolean);
        0: iload_0
        1: ifne 8
        4: iconst_1
        5: goto 9
        8: iconst_0
        9: ireturn

Thus, a method accepting a boolean must be ready to work with any int, and any non-zero value is treated as true. In that case, the “optimized” options C and D start to misbehave: C(2) = -1 and D(2) = 3, while the more straightforward A and B keep working: A(2) = B(2) = 0.
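The arithmetic is easy to replay in plain Java by modelling the argument as an int. This is only a model of the four code patterns, not the generated code itself; the class InvertVariants and its method names are made up for the illustration.

```java
class InvertVariants {
    // Each method mirrors one of the four patterns, with the
    // boolean argument modelled as an arbitrary int.
    static int a(int arg) { return (arg == 0) ? 1 : 0; }

    static int b(int arg) {
        int res = 0;
        if (arg == 0) res += 1;
        return res;
    }

    static int c(int arg) {
        int res = 1;
        res -= arg;
        return res;
    }

    static int d(int arg) {
        int res = arg;
        res ^= 1;
        return res;
    }

    public static void main(String[] args) {
        for (int arg : new int[] {0, 1, 2, 3, -1}) {
            System.out.printf("arg=%d A=%d B=%d C=%d D=%d%n",
                    arg, a(arg), b(arg), c(arg), d(arg));
        }
    }
}
```

For the inputs 0 and 1 all four methods agree; for any other input only a and b still return 0.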

Illustrating these subtleties requires manipulating the bytecode. An example is available on GitHub: the numbers 0, 1, 2, 3, -1 are passed to the invert method, and the result is printed using calls to println(boolean) and println(int).

A curious fact: in JDK 8, the HotSpot C2 compiler generated option D, and in JDK 9 the code generation pattern was changed to a more correct one.

Code generated by HotSpot C2 on Intel x86_64

In JDK 8, pattern D and the incorrect returned result are clearly visible:

    $ jdk8/bin/java -Xcomp -Xbatch -XX:-TieredCompilation -XX:CompileCommand=print,Inverter.invert -XX:+UnlockDiagnosticVMOptions -XX:PrintAssemblyOptions=intel BooleanHell
    ...
    Compiled method (c2) 1216 533 Inverter::invert (10 bytes)
    ...
    # {method} {0x0000000012600d08} 'invert' '(Z)Z' in 'Inverter'
    # parm0: rdx = boolean
    # [sp+0x20] (sp of caller)
    0x00000000057d7ac0: sub rsp,0x18
    0x00000000057d7ac7: mov QWORD PTR [rsp+0x10],rbp  ;*synchronization entry
                                                      ; - Inverter::invert@-1 (line 3)
    0x00000000057d7acc: mov eax,edx
    0x00000000057d7ace: xor eax,0x1  ;*ireturn
                                     ; - Inverter::invert@9 (line 3)
    0x00000000057d7ad1: add rsp,0x10
    0x00000000057d7ad5: pop rbp
    0x00000000057d7ad6: test DWORD PTR [rip+0xfffffffffdf58524],eax  # 0x0000000003730000
                                                                     ; {poll_return}
    0x00000000057d7adc: ret
    ...
    false (0) -> true (1)
    true (1) -> false (0)
    true (2) -> true (3)
    true (3) -> true (2)
    true (-1) -> true (-2)

In JDK 9, the normalization of boolean values was improved: the input argument is first reduced to the {0, 1} range (test and setne), and the result is correct:

    $ jdk9/bin/java -Xcomp -Xbatch -XX:-TieredCompilation -XX:CompileCommand=print,Inverter.invert -XX:+UnlockDiagnosticVMOptions -XX:PrintAssemblyOptions=intel BooleanHell
    ...
    Compiled method (c2) 4702 1496 Inverter::invert (10 bytes)
    ...
    # {method} {0x000001fa974d2dc0} 'invert' '(Z)Z' in 'Inverter'
    # {method} {0x000001fa974d2dc0} 'invert' '(Z)Z' in 'Inverter'
    # parm0: rdx = boolean
    # [sp+0x20] (sp of caller)
    0x000001fafcb57720: sub rsp,0x18
    0x000001fafcb57727: mov QWORD PTR [rsp+0x10],rbp  ;*synchronization entry
                                                      ; - Inverter::invert@-1 (line 3)
    0x000001fafcb5772c: test edx,edx
    0x000001fafcb5772e: setne al
    0x000001fafcb57731: movzx eax,al
    0x000001fafcb57734: xor eax,0x1  ;*ireturn {reexecute=0 rethrow=0 return_oop=0}
                                     ; - Inverter::invert@9 (line 3)
    0x000001fafcb57737: add rsp,0x10
    0x000001fafcb5773b: pop rbp
    0x000001fafcb5773c: test DWORD PTR [rip+0xfffffffffdf688be],eax  # 0x000001fafaac0000
                                                                     ; {poll_return}
    0x000001fafcb57742: ret
    ...
    false (0) -> true (1)
    true (1) -> false (0)
    true (2) -> false (0)
    true (3) -> false (0)
    true (-1) -> false (0)

Task 4

Suddenly, you realize that you are very curious about what this method can print:

 void guessWhat(Iterable<?> x) { System.out.println(x.getClass()); }

A : class java.util.ArrayList
B : null
C : interface java.lang.Iterable
D : class java.lang.Integer

Answer and solution

The correct answer is: A, D.

Solution

Options B and C are impossible, since Object.getClass() always returns a non-null class, and there are no instances of an interface type. Option A is easy to obtain: guessWhat(new ArrayList<Object>()).

However, option D is reachable too: Integer does not implement the Iterable interface, yet an instance of it can still arrive in this method. The explanation is that the strictness of the Java language type system has once again fallen victim to the weakness of the JVM verifier's type system: for the verifier, any reference type is assignment compatible with any interface. That is, almost everywhere an interface type is expected (parameters, return values, fields), you can pass any reference value (that is, instances of arbitrary classes and arrays).

This effect can be demonstrated either by manipulating the bytecode or by partially recompiling the class files.
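For contrast, here is what happens when one tries to reach option D from ordinary source code. This is a small sketch with invented names (InterfaceCastCheck, castIsRejected): the cast that javac requires compiles to a checkcast instruction, and it is that runtime check, not the verifier, that stops the Integer. Bytecode without the checkcast is what the manipulation mentioned above has to produce.

```java
class InterfaceCastCheck {
    static void guessWhat(Iterable<?> x) {
        System.out.println(x.getClass());
    }

    // Hypothetical helper: tries to pass an Integer through a source-level cast.
    static boolean castIsRejected() {
        Object value = Integer.valueOf(42);
        try {
            guessWhat((Iterable<?>) value); // javac emits checkcast for this cast
            return false;
        } catch (ClassCastException e) {
            return true;                    // the Integer never reaches guessWhat
        }
    }

    public static void main(String[] args) {
        System.out.println(castIsRejected());
    }
}
```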

Task 5

Having once again believed in the infallibility of javac, you decided to experiment:

    class C {
        private boolean getBoolean() { return false; }
    }

    interface I {
        default boolean getBoolean() { return true; }
    }

    class D extends C implements I {}

    public class Test {
        public static void main(String[] a) {
            foo(new D());
        }

        public static void foo(I i) {
            System.out.println(i.getBoolean());
        }
    }

What happens when you try to compile and run the Test class?

A : Compilation fails
B : java.lang.IllegalAccessError is thrown
C : "true" is printed
D : "false" is printed

Answer and solution

The correct answer is: B.

Solution

Many believe that IllegalAccessError is the fate of those who get too clever with partial recompilation or obfuscation. That is how it was for us: during obfuscation, ProGuard gave two different methods (one private, the other default) the same name, and the resulting application began to throw IllegalAccessError.

However, it turned out that if two such methods have the same name right in the source code, javac compiles them without any warnings, and IllegalAccessError is thrown at run time just the same.

This JVM behavior is explained by the way the target method is looked up for the invokeinterface instruction. According to the specification, the instance methods of the class and all of its superclasses are examined first, only then is a suitable default method searched for among the superinterfaces, and the accessibility of the method found is checked only after the whole lookup is complete.

Thus, the lookup stops at the private method getBoolean from class C, which gets in the way of finding the default method getBoolean from the superinterface I. After that, IllegalAccessError is thrown, quite logically.
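The lookup order described above can be modelled with reflection. The sketch below is a deliberately simplified model (it looks only at direct superinterfaces and ignores parameter types), not the JVM's actual algorithm. The names Base, Flag, Derived and LookupSketch are made up and stand in for C, I and D from the task.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class Base {
    private boolean getBoolean() { return false; }
}

interface Flag {
    default boolean getBoolean() { return true; }
}

class Derived extends Base implements Flag {}

class LookupSketch {
    // Simplified model: the class and its superclasses first,
    // default methods of superinterfaces second.
    static Method select(Class<?> receiver, String name) {
        for (Class<?> c = receiver; c != null; c = c.getSuperclass()) {
            Method m = findInstanceMethod(c, name);
            if (m != null) return m; // found before any interface is examined
        }
        for (Class<?> c = receiver; c != null; c = c.getSuperclass()) {
            for (Class<?> itf : c.getInterfaces()) {
                Method m = findInstanceMethod(itf, name);
                if (m != null && m.isDefault()) return m;
            }
        }
        return null;
    }

    static Method findInstanceMethod(Class<?> type, String name) {
        for (Method m : type.getDeclaredMethods()) {
            if (m.getName().equals(name) && m.getParameterCount() == 0
                    && !Modifier.isStatic(m.getModifiers())) {
                return m;
            }
        }
        return null;
    }

    // The access check is applied only after the lookup has finished.
    static String outcome(Class<?> receiver, String name) {
        Method m = select(receiver, name);
        if (m == null) return "no method found";
        String owner = m.getDeclaringClass().getSimpleName();
        return Modifier.isPrivate(m.getModifiers())
                ? "IllegalAccessError (private " + owner + "." + name + ")"
                : "invoke " + owner + "." + name;
    }

    public static void main(String[] args) {
        System.out.println(outcome(Derived.class, "getBoolean"));
    }
}
```

In this model the private method of the superclass wins the lookup, and only then does the access check reject it.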

Interestingly, there are plans to change this in Java 11 and skip private methods during the lookup.

Task 6

Suddenly you find yourself debugging the native code of a compiled Java application. You do not have the source code, but you have already found the problem method, here it is:

    1: lea rax, [rel _Test_foo]
    2: push rax
    3: mov eax, dword [rcx+0FH]
    4: idiv dword [rdx+0FH]
    5: mov rbx, qword [rel _Test_array]
    6: mov ebx, dword [rbx+3BH]
    7: add eax, ebx
    8: ret 8

You suspect that executing this method could cause Java exceptions of various types to be thrown. It remains to figure out which instructions could be to blame (specify their numbers).

StackOverflowError : _________
NullPointerException : _________
ArithmeticException : _________
IndexOutOfBoundsException : _________

Answer and solution

Correct answer:

StackOverflowError : 2
NullPointerException : 3, 4, 6
ArithmeticException : 4
IndexOutOfBoundsException : none

Solution

A compiler can generate exception checks in various ways. For example, before accessing a field of an object it can emit an explicit null check that throws an exception on failure. However, such explicit checks hurt both performance and code size. So the compiler tries to get by with implicit checks: it emits only the pointer dereference itself, which for a null pointer causes a hardware exception that the JVM catches, recognizes and rethrows as the corresponding Java exception.

In this task, you just had to find the instructions that can trigger such implicit exceptions.

StackOverflowError occurs on an attempt to write or read the next stack slot outside the allowed range. This can happen in the push rax instruction.

There is also a single candidate for an implicit ArithmeticException: the integer division instruction idiv dword [rdx+0FH]. If the dereferenced value is zero, a hardware division-by-zero fault occurs, which is then turned into an ArithmeticException.

Implicit checks that can throw NullPointerException are very common in compiled Java code. To find them, it is enough to look at every place where something is dereferenced. The instruction mov rbx, qword [rel _Test_array] reads static data at a relative address, so it can never fail. But the instructions mov eax, dword [rcx+0FH], idiv dword [rdx+0FH] and mov ebx, dword [rbx+3BH] dereference the method parameters and a value read from static data, so they can throw a NullPointerException.

Interestingly, idiv dword [rdx+0FH] contains two implicit checks at once, which can sometimes cause a lot of trouble for the JVM.

An implicit check for IndexOutOfBoundsException would have to be in an instruction that accesses an array element. The hint is that a certain _Test_array is read into a register and dereferenced in instructions 5 and 6. Note, however, that with this code pattern for array element access, an out-of-range index would simply touch heap memory adjacent to the array, which triggers no hardware exception. That is why, on most processor architectures, checks for IndexOutOfBoundsException are generated explicitly. In rare cases, though, the compiler can prove that such a check is not needed at all, which is what happened in this task. So an IndexOutOfBoundsException cannot be thrown here at all.
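To make the listing more tangible, here is one guess at what the Java source might have looked like. The original source is not available, so everything in this sketch is an assumption: the Holder class, its value field, the array size and the constant index 10 are invented, and the sketch does not try to reproduce the conditions under which the bounds check was eliminated.

```java
class ImplicitChecksSketch {
    // Hypothetical parameter type with a single int field.
    static final class Holder {
        int value;
        Holder(int value) { this.value = value; }
    }

    // Stand-in for _Test_array; not final, so it could in principle be null.
    static int[] array = new int[16];

    // Assumed shape of the method: two field reads, a division, an array read.
    static int foo(Holder a, Holder b) {
        return a.value / b.value + array[10];
    }

    // Turns the possible exceptions into strings for easy inspection.
    static String outcome(Holder a, Holder b) {
        try {
            return String.valueOf(foo(a, b));
        } catch (NullPointerException e) {
            return "NullPointerException";
        } catch (ArithmeticException e) {
            return "ArithmeticException";
        }
    }

    public static void main(String[] args) {
        System.out.println(outcome(new Holder(6), new Holder(3)));
        System.out.println(outcome(null, new Holder(3)));
        System.out.println(outcome(new Holder(6), null));
        System.out.println(outcome(new Holder(6), new Holder(0)));
    }
}
```

Reading a.value corresponds to instruction 3, reading b.value and dividing by it to instruction 4, and the array element read to instructions 5 and 6.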

Task 7

An evil hacker broke into your computer again and edited Helper.class in a hex editor so that the end of the sayC method became unverifiable:

    public class Main {
        public static void main(String[] args) {
            System.out.print("A");
            Helper.sayB();
            Helper.sayC();
        }
    }

    public class Helper {
        public static void sayB() {
            System.out.print("B");
        }

        public static void sayC() {
            System.out.print("C");
            // bad bytecode goes here
        }
    }

What happens when you run the Main class?

A : VerifyError is thrown
B : “A” is printed and VerifyError is thrown
C : “AB” is printed and VerifyError is thrown
D : “ABC” is printed and VerifyError is thrown

Answer and solution

The correct answer is: B.

Solution

Bytecode verification of a class is performed before any method of that class is executed. The Helper class contains the unverifiable method sayC, which means the whole class fails verification. Thus, options C and D are definitely wrong: execution never gets into the sayB method.

Next, you need to figure out at what moment VerifyError is thrown. According to the specification, linkage errors must be thrown at the point where the link is actually needed at run time, even if the JVM links eagerly (resolving all links as soon as a class is loaded). In this task, the reference to Helper is first needed only after “A” has been printed, so the correct answer is B.

There is an example that demonstrates the described behavior. The unverifiable bytecode was obtained by manual manipulation.
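Reproducing a VerifyError needs a hand-edited class file, but the timing can be illustrated with an analogous mechanism that needs no bytecode editing: class initialization. Note that this is a swapped-in technique, not verification. In the sketch below (all names are invented), a nested class fails in its static initializer, and the error surfaces only at the first use of the class, after “A” has already been recorded.

```java
class LazyFailureSketch {
    static final StringBuilder log = new StringBuilder();

    // Stand-in for the broken Helper class: its initialization always fails.
    static class Broken {
        static final int MARK = fail();

        static int fail() {
            throw new IllegalStateException("broken class");
        }

        static void sayB() {
            log.append("B");
        }
    }

    // Returns what was recorded, with '!' marking the point of failure.
    static String run() {
        log.setLength(0);
        log.append("A");
        try {
            Broken.sayB();          // first active use of Broken
        } catch (LinkageError e) {  // ExceptionInInitializerError is a LinkageError
            log.append("!");
        }
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

As in the task, nothing from the broken class runs, yet the code before its first use executes normally.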

Nikita Lipsky aka pjBooms covered Java bytecode verification in more detail at JBreak 2018 (only slides so far) and at JPoint 2017 (there is a video).

Conclusion

Although some people at the conference were frightened by the sight of assembler, quite a few decided to dive into the subtleties of the JVM: for everyone who handed in their answers at our booth, we ran a crash course on the subtleties of bytecode, the verifier and implicit exceptions. I hope that you, having read the solutions, have learned something new. If so, our goal has been achieved!

Source: https://habr.com/ru/post/350638/
