Performance tasks from Contour have already been published, and now it is our turn: we present the hardcore tasks from the Java conference JBreak 2018, aka "hell from Excelsior".
The tasks are given in their original wording. Each task may have several correct answers, and each is followed by its solution under a spoiler.
Your colleague has been reading the Java Language Specification and wrote the following:
void playWithRef() {
    Object obj = new Object();
    WeakReference<Object> ref = new WeakReference<>(obj);
    System.out.println(ref.get() != null);
    System.gc();
    System.out.println(ref.get() != null);
}
And now the question for you: which execution results are possible?
The correct answer is: A, C, D.
The scope of the obj variable is the entire method, but its live range ends once the WeakReference constructor returns (in fact even slightly earlier, inside the constructor). And it is the live range, not the scope, that determines whether the GC may destroy the object.
However, a VM may sometimes extend the lifetime of a variable when that is convenient for it. For example, the HotSpot interpreter reports variables to the GC as live for as long as they are in scope (this can be observed in a debugger). So option D is easily obtained by running the example on the HotSpot VM without any additional options (or with an explicit -Xint).
Result C is obtained with many compilers (for example HotSpot C1/C2 and the Excelsior JET JIT and AOT compilers). They are smart enough to work out that the obj variable is no longer used, so already by the first call to get() nothing prevents the GC from destroying the object. In practice, however, the GC will most often arrive only when System.gc() is called explicitly; this behavior shows up on the HotSpot VM with -Xcomp and on Excelsior JET in any mode.
Option A is theoretically achievable if the GC arrives, for example, right at the end of the WeakReference constructor.
The task is based on a bug in JDK 8 code, where a method argument was carelessly held only through a WeakReference and died while the method was still executing. There is a separate detailed post about this in our technical blog.
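When an object has to stay alive up to a known point, one option is java.lang.ref.Reference.reachabilityFence, which exists since Java 9. The following is only a minimal sketch under that assumption; the class and method names are made up for illustration and are not taken from the JDK bug mentioned above.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;

final class KeepAliveSketch {
    /** Returns true if the referent was still present after the GC request. */
    static boolean survivesGc() {
        Object obj = new Object();
        WeakReference<Object> ref = new WeakReference<>(obj);
        try {
            System.gc();               // only a request; the GC may or may not run
            return ref.get() != null;  // obj is still strongly reachable here
        } finally {
            // Keeps obj strongly reachable up to this point, no matter how
            // aggressively a JIT compiler would otherwise shorten its live range.
            Reference.reachabilityFence(obj);
        }
    }

    public static void main(String[] args) {
        System.out.println(survivesGc());
    }
}
```

With the fence in place, the weak reference cannot be cleared before the second check, so only the "object still alive" outcome remains.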
An evil hacker deleted the original java file and shuffled the pieces of your class file:
A: 0700 0401 0001 4300 2000 0300 0100 0000
B: 0000 0000 00
C: 6a61 7661 2f6c 616e 672f 4f62 6a65 6374
D: cafe babe 0000 0031 0005 0700 0201 0010
Rearrange them to obtain a class file that passes verification.
The correct answer is: D, C, A, B.
This task is more of a brain-teaser, but it still teaches something new.
It is widely known that a class file begins with the four-byte magic number 0xCAFEBABE, so D definitely comes first. Common sense suggests that the short piece B comes last: it is the tail.
Next, one could recall that a class file contains a constant pool, in which string constants are stored as a two-byte length followed by the string itself encoded in UTF-8. The only piece that looks like UTF-8 text is C: it is the UTF-8 representation of the string java/lang/Object (a reference to the superclass of our class). It must therefore be preceded by the bytes 0x0010 (the string is 16 bytes long), and the only piece that ends that way is D, so C comes second.
Alternatively, one could notice that the whole of the last piece B consists of zeros, so the piece before it should end in zeros as well, which makes it A.
class C
  minor version: 0
  major version: 49
  flags: ACC_SUPER
Constant pool:
  #1 = Class  #2  // java/lang/Object
  #2 = Utf8   java/lang/Object
  #3 = Class  #4  // C
  #4 = Utf8   C
{
}
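The ordering can also be checked mechanically. The sketch below glues the four pieces together as D, C, A, B and reads the header and constant pool with DataInputStream. It handles only the two constant pool tags that occur in this file (1 for Utf8, 7 for Class), and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class ClassFilePieces {
    static final String A = "0700 0401 0001 4300 2000 0300 0100 0000";
    static final String B = "0000 0000 00";
    static final String C = "6a61 7661 2f6c 616e 672f 4f62 6a65 6374";
    static final String D = "cafe babe 0000 0031 0005 0700 0201 0010";

    /** Concatenates hex pieces in the given order and decodes them to bytes. */
    static byte[] bytes(String... pieces) {
        String hex = String.join("", pieces).replace(" ", "");
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /** Returns a one-line summary of the header, constant pool and access flags. */
    static String describe(byte[] classFile) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile))) {
            StringBuilder sb = new StringBuilder();
            sb.append(String.format("magic=%08x", in.readInt()));
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort();
            sb.append(" version=").append(major).append('.').append(minor);
            int count = in.readUnsignedShort();      // number of entries plus one
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                if (tag == 7) {                      // Class: index of a Utf8 entry
                    sb.append(" #").append(i).append("=Class#").append(in.readUnsignedShort());
                } else if (tag == 1) {               // Utf8: two-byte length, then bytes
                    sb.append(" #").append(i).append("=Utf8:").append(in.readUTF());
                } else {
                    throw new IOException("tag not handled in this sketch: " + tag);
                }
            }
            sb.append(String.format(" flags=0x%04x", in.readUnsignedShort()));
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(bytes(D, C, A, B)));
    }
}
```

Feeding the pieces in another order makes the sketch fail on an unknown tag or a wrong magic number, which is a quick way to rule out wrong permutations.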
After listening to yet another talk about Graal and enthusiastically studying the JVM Compiler Interface, you decided to write your own compiler for Java! And you decided to start by generating x86_64 code for this method:
static boolean invert(boolean x) { return !x; }
What generated code will be correct for this method?
Legend: Intel syntax is used; the calling convention passes the argument in rcx and returns the result in rax.
A:
    test ecx, ecx
    jnz True
    mov eax, 1
    ret
True:
    mov eax, 0
    ret

B:
    xor eax, eax
    test ecx, ecx
    jnz End
    add eax, 1
End:
    ret

C:
    mov eax, 1
    sub eax, ecx
    ret

D:
    mov eax, ecx
    xor eax, 1
    ret
The correct answer is: A, B.
Assembly listings show up at Java conferences more and more often, but in case you are not familiar with the Intel x86 instruction set, here is the equivalent C code:
A: res = (arg == 0) ? 1 : 0;
B: res = 0; if (arg == 0) res += 1;
C: res = 1; res -= arg;
D: res = arg; res ^= 1;
In fact, all of these inversion algorithms work correctly as long as the input argument takes only the usual logical values 0 and 1.
Now comes the interesting part. From the verifier's point of view, all the short integer types (boolean, byte, char, short) are equivalent to int. Moreover, there are no boolean-specific bytecode instructions at all. For example, the bytecode of the method in question looks like this:
public static boolean invert(boolean);
   0: iload_0
   1: ifne 8
   4: iconst_1
   5: goto 9
   8: iconst_0
   9: ireturn
Thus, a method that accepts a boolean must be ready to work with any int, and any non-zero value is treated as true. In that case the "optimized" variants C and D start to misbehave, C(2) = -1 and D(2) = 3, while the more straightforward A and B keep working: A(2) = B(2) = 0.
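The four variants are easy to replay on plain int values. The sketch below is a direct Java transcription of the C equivalents above (the class name is hypothetical); it works on int rather than boolean because javac would never let a boolean hold the value 2.

```java
final class InvertVariants {
    static int a(int arg) { return (arg == 0) ? 1 : 0; }

    static int b(int arg) {
        int res = 0;
        if (arg == 0) res += 1;
        return res;
    }

    static int c(int arg) { return 1 - arg; }   // mov eax, 1 / sub eax, ecx

    static int d(int arg) { return arg ^ 1; }   // mov eax, ecx / xor eax, 1

    public static void main(String[] args) {
        // 0 and 1 are ordinary booleans; 2 is a "true" that only bytecode tricks can produce.
        for (int arg : new int[] {0, 1, 2}) {
            System.out.printf("arg=%d A=%d B=%d C=%d D=%d%n",
                    arg, a(arg), b(arg), c(arg), d(arg));
        }
    }
}
```

For the inputs 0 and 1 all four variants agree; only the input 2 separates A and B from C and D.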
Illustrating these subtleties requires manipulating the bytecode. The example is available on GitHub: the numbers 0, 1, 2, 3, -1 are passed to the invert method, and the results are printed using calls to println(boolean) and println(int).
A curious fact: in JDK 8 the HotSpot C2 compiler generated variant D, and in JDK 9 the code generation pattern was changed to a more correct one.
In JDK 8, pattern D and the incorrect result it returns are clearly visible:
$ jdk8/bin/java -Xcomp -Xbatch -XX:-TieredCompilation -XX:CompileCommand=print,Inverter.invert -XX:+UnlockDiagnosticVMOptions -XX:PrintAssemblyOptions=intel BooleanHell
...
Compiled method (c2) 1216 533 Inverter::invert (10 bytes)
...
# {method} {0x0000000012600d08} 'invert' '(Z)Z' in 'Inverter'
# parm0: rdx = boolean
# [sp+0x20] (sp of caller)
0x00000000057d7ac0: sub rsp,0x18
0x00000000057d7ac7: mov QWORD PTR [rsp+0x10],rbp ;*synchronization entry
                                                 ; - Inverter::invert@-1 (line 3)
0x00000000057d7acc: mov eax,edx
0x00000000057d7ace: xor eax,0x1 ;*ireturn
                                ; - Inverter::invert@9 (line 3)
0x00000000057d7ad1: add rsp,0x10
0x00000000057d7ad5: pop rbp
0x00000000057d7ad6: test DWORD PTR [rip+0xfffffffffdf58524],eax # 0x0000000003730000
                                ; {poll_return}
0x00000000057d7adc: ret
...
false (0) -> true (1)
true (1) -> false (0)
true (2) -> true (3)
true (3) -> true (2)
true (-1) -> true (-2)
In JDK 9 the normalization of boolean values was improved: the input argument is first reduced to the range {0, 1} (test and setne), and the result becomes correct:
$ jdk9/bin/java -Xcomp -Xbatch -XX:-TieredCompilation -XX:CompileCommand=print,Inverter.invert -XX:+UnlockDiagnosticVMOptions -XX:PrintAssemblyOptions=intel BooleanHell
...
Compiled method (c2) 4702 1496 Inverter::invert (10 bytes)
...
# {method} {0x000001fa974d2dc0} 'invert' '(Z)Z' in 'Inverter'
# parm0: rdx = boolean
# [sp+0x20] (sp of caller)
0x000001fafcb57720: sub rsp,0x18
0x000001fafcb57727: mov QWORD PTR [rsp+0x10],rbp ;*synchronization entry
                                                 ; - Inverter::invert@-1 (line 3)
0x000001fafcb5772c: test edx,edx
0x000001fafcb5772e: setne al
0x000001fafcb57731: movzx eax,al
0x000001fafcb57734: xor eax,0x1 ;*ireturn {reexecute=0 rethrow=0 return_oop=0}
                                ; - Inverter::invert@9 (line 3)
0x000001fafcb57737: add rsp,0x10
0x000001fafcb5773b: pop rbp
0x000001fafcb5773c: test DWORD PTR [rip+0xfffffffffdf688be],eax # 0x000001fafaac0000
                                ; {poll_return}
0x000001fafcb57742: ret
...
false (0) -> true (1)
true (1) -> false (0)
true (2) -> false (0)
true (3) -> false (0)
true (-1) -> false (0)
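The difference between the two listings boils down to one extra normalization step before the xor. A minimal Java sketch of both patterns, operating on raw int values; the class and method names are hypothetical and only mirror the instruction sequences shown above:

```java
final class BooleanNormalization {
    /** JDK 8 listing: "xor eax, 0x1" applied directly to the raw argument. */
    static int jdk8Pattern(int raw) {
        return raw ^ 1;
    }

    /** JDK 9 listing: test/setne/movzx reduce the argument to 0 or 1, then xor. */
    static int jdk9Pattern(int raw) {
        int normalized = (raw != 0) ? 1 : 0;
        return normalized ^ 1;
    }

    public static void main(String[] args) {
        for (int raw : new int[] {0, 1, 2, 3, -1}) {
            System.out.println(raw + " -> jdk8 " + jdk8Pattern(raw)
                    + ", jdk9 " + jdk9Pattern(raw));
        }
    }
}
```

The values computed for 0, 1, 2, 3 and -1 match the integer results in the two logs.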
Suddenly you realize that you are very curious about what this method can print:
void guessWhat(Iterable<?> x) { System.out.println(x.getClass()); }
A: class java.util.ArrayList
B: null
C: interface java.lang.Iterable
D: class java.lang.Integer
The correct answer is: A, D.
Options B and C are impossible: Object.getClass() always returns a non-null class, and there are no instances of an interface type. Option A is easy to obtain: guessWhat(new ArrayList<Object>()).
Option D, however, is also reachable: Integer does not implement the Iterable interface, yet an instance of it can still arrive in this method. The reason is that the strictness of the Java language type system has once again run into the weakness of the JVM verifier type system: for the verifier, any reference type is assignment-compatible with any interface. That is, almost everywhere an interface type is expected (parameters, return values, fields), you can pass any reference value at all (an instance of an arbitrary class, or an array).
This effect can be demonstrated either by manipulating the bytecode, or by partially recompiling the class files.
Having once again believed in the infallibility of javac, you decided to experiment:
class C {
    private boolean getBoolean() { return false; }
}

interface I {
    default boolean getBoolean() { return true; }
}

class D extends C implements I {}

public class Test {
    public static void main(String[] a) {
        foo(new D());
    }

    public static void foo(I i) {
        System.out.println(i.getBoolean());
    }
}
What happens when you try to compile and run the Test class?
java.lang.IllegalAccessError is thrown
"true" is printed
"false" is printed
The correct answer is: B.
Many people believe that IllegalAccessError is the fate of those who get too clever with partial recompilation or obfuscation. That is how it happened to us: during obfuscation ProGuard gave two different methods (one private, the other default) the same name, and the resulting application started throwing IllegalAccessError.
However, it turned out that if two such methods have the same name in the source code to begin with, javac compiles them without any warnings, and IllegalAccessError is thrown at run time just the same.
This JVM behavior is explained by how the target method is looked up for the invokeinterface instruction. According to the specification, the instance methods of the class and of all its superclasses are examined first, and only then is a suitable default method searched for among the superinterfaces; whether the method found is private is checked only after the whole search has completed.
Thus the search stops at the private method getBoolean of class C, which got in the way of finding the default method getBoolean of the superinterface I. After that, IllegalAccessError is quite logically thrown.
Interestingly, in Java 11 it is planned to change this and skip private methods during the search.
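The search order can be imitated with reflection. The sketch below is not JVM code and makes no claim about what a particular JVM version does; it only replays the three steps from the explanation (superclass chain first, default methods second, access check last) on the classes from the task. The helper names find and outcome are hypothetical.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

final class LookupOrderSketch {
    static class C { private boolean getBoolean() { return false; } }
    interface I { default boolean getBoolean() { return true; } }
    static class D extends C implements I {}

    /** Hypothetical helper that mimics the described search order. */
    static Method find(Class<?> receiver, String name) {
        // Step 1: instance methods of the class and all superclasses, private ones included.
        for (Class<?> k = receiver; k != null; k = k.getSuperclass()) {
            for (Method m : k.getDeclaredMethods()) {
                if (m.getName().equals(name) && !Modifier.isStatic(m.getModifiers())) {
                    return m;
                }
            }
        }
        // Step 2: default methods of the directly implemented interfaces
        // (superinterfaces of interfaces are skipped to keep the sketch short).
        for (Class<?> k = receiver; k != null; k = k.getSuperclass()) {
            for (Class<?> i : k.getInterfaces()) {
                for (Method m : i.getDeclaredMethods()) {
                    if (m.getName().equals(name) && m.isDefault()) {
                        return m;
                    }
                }
            }
        }
        return null;
    }

    /** Step 3: the access check runs only after the search has finished. */
    static String outcome(Method m) {
        String where = m.getDeclaringClass().getSimpleName() + "." + m.getName();
        return Modifier.isPrivate(m.getModifiers())
                ? "IllegalAccessError: " + where + " is private"
                : "invoke " + where;
    }

    public static void main(String[] args) {
        System.out.println(outcome(find(D.class, "getBoolean")));
    }
}
```

Because step 1 already finds C.getBoolean, step 2 is never reached for D, which is exactly how the private method shadows the default one.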
Suddenly you find yourself debugging the native code of a compiled Java application. You do not have the source code, but you have already found the problem method, here it is:
1: lea rax, [rel _Test_foo]
2: push rax
3: mov eax, dword [rcx+0FH]
4: idiv dword [rdx+0FH]
5: mov rbx, qword [rel _Test_array]
6: mov ebx, dword [rbx+3BH]
7: add eax, ebx
8: ret 8
You suspect that executing this method could result in Java exceptions of various types being thrown. It remains to work out which instructions could be responsible (give their numbers).
StackOverflowError: _________
NullPointerException: _________
ArithmeticException: _________
IndexOutOfBoundsException: _________
Correct answer:
StackOverflowError: 2
NullPointerException: 3, 4, 6
ArithmeticException: 4
IndexOutOfBoundsException: none
The compiler can generate exception checks in various ways. For example, before accessing a field of an object, it can generate an explicit check that the object is not null, throwing an exception on failure. However, such explicit checks hurt both the performance and the size of the code. The compiler therefore tries to get by with implicit checks: it generates only the pointer dereference itself, which for a null pointer results in a hardware exception that the JVM catches, recognizes, and rethrows as the corresponding Java exception.
In this task you simply had to find the instructions that can trigger such implicit exceptions.
StackOverflowError occurs on an attempt to write or read the next stack slot outside the permitted range. Here that can happen in the push rax instruction.
There is also exactly one candidate for an implicit ArithmeticException: the integer division instruction idiv dword [rdx+0FH]. If the dereferenced value is zero, a hardware division-by-zero fault occurs, followed by an ArithmeticException.
Implicit checks that can throw NullPointerException are very common in Java code. To find them, it is enough to look at every place where something is dereferenced. The instruction mov rbx, qword [rel _Test_array] reads static data at a relative address, so it can never fault. But the instructions mov eax, dword [rcx+0FH], idiv dword [rdx+0FH] and mov ebx, dword [rbx+3BH] dereference the method parameters and the value just read from static data, so they can throw a NullPointerException.
Interestingly, idiv dword [rdx+0FH] contains two implicit checks at once, which can sometimes cause the JVM a lot of trouble.
An implicit check for IndexOutOfBoundsException would have to be in an instruction that accesses an array element. The clue is that a certain _Test_array is read into a register and then dereferenced, in instructions 5 and 6. Note, however, that with this code pattern for array element access, an index outside the valid range would simply touch heap memory adjacent to the array, which does not trigger any hardware exception. That is why, on most processor architectures, the checks for IndexOutOfBoundsException are generated explicitly. In rare cases, however, the compiler can prove that such a check is not needed at all, which is what happened in this task. So IndexOutOfBoundsException cannot be thrown here at all.
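To make the mapping from instructions to exceptions more tangible, here is one plausible Java shape for such a method. It is purely a guess for illustration: the original source is not given, and the Box class, the field name, the array length and the constant index 11 are all assumptions.

```java
final class ImplicitChecksSketch {
    /** Hypothetical holder type standing in for the two parameters. */
    static final class Box {
        int value;
        Box(int value) { this.value = value; }
    }

    // Hypothetical counterpart of _Test_array; length and index are made up.
    static int[] array = new int[16];

    /** One possible source for the listing, not the actual one. */
    static int foo(Box a, Box b) {
        // a.value   ~ instruction 3: NullPointerException if a is null
        // b.value   ~ instruction 4: NullPointerException if b is null,
        //             ArithmeticException if the field holds zero
        // array[11] ~ instructions 5 and 6: NullPointerException if array is null
        return a.value / b.value + array[11];
    }

    /** Returns the simple name of the exception thrown by foo, or "none". */
    static String probe(Box a, Box b) {
        try {
            foo(a, b);
            return "none";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(probe(null, new Box(1)));
        System.out.println(probe(new Box(1), null));
        System.out.println(probe(new Box(1), new Box(0)));
        System.out.println(probe(new Box(6), new Box(3)));
    }
}
```

At the source level each of these failures looks like an ordinary exception; whether the compiled code detects it with an explicit comparison or with a faulting instruction is invisible to the Java program.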
An evil hacker broke into your computer again and edited Helper.class in a hex editor so that the end of the sayC method became unverifiable:
public class Main {
    public static void main(String[] args) {
        System.out.print("A");
        Helper.sayB();
        Helper.sayC();
    }
}

public class Helper {
    public static void sayB() {
        System.out.print("B");
    }

    public static void sayC() {
        System.out.print("C");
        // bad bytecode goes here
    }
}
What happens when you run the Main class?
A: VerifyError is thrown
B: "A" is printed and VerifyError is thrown
C: "AB" is printed and VerifyError is thrown
D: "ABC" is printed and VerifyError is thrown
The correct answer is: B.
Bytecode verification of a class takes place before any method of that class is executed. The class Helper contains the unverifiable method sayC, which means the class as a whole fails verification. Thus options C and D are definitely wrong: execution never reaches the sayB method.
Next, you need to work out at what moment VerifyError is thrown. According to the specification, linkage errors must be thrown at the point where the link is actually needed at run time, even if the JVM resolves links eagerly (all links resolved right away when the class is loaded). In this task the reference to Helper is first needed only after "A" has been printed, so the correct answer is B.
An example demonstrates the described behavior; the unverifiable bytecode in it was obtained by manual manipulation.
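Producing unverifiable bytecode is not possible in plain Java source, but the timing rule itself can be observed with a loose analogy: a class whose static initializer fails. This is a different mechanism (initialization rather than verification), so the sketch below only illustrates that the error surfaces at the first use of the class, after "A" has already been produced. All names in it are hypothetical.

```java
final class LazyFailureSketch {
    static final StringBuilder out = new StringBuilder();

    /** Stands in for "something is wrong with this class". */
    static int fail() {
        throw new IllegalStateException("simulated broken class");
    }

    static class Helper {
        // Evaluated when Helper is initialized, that is, on its first active use.
        static final int BROKEN = fail();

        static void sayB() { out.append("B"); }
    }

    static String run() {
        out.setLength(0);
        out.append("A");
        try {
            Helper.sayB();  // first use of Helper: the error surfaces here
        } catch (LinkageError e) {
            // ExceptionInInitializerError on the first attempt,
            // NoClassDefFoundError on any later one; both are LinkageErrors.
            out.append(" then ").append(e.getClass().getSimpleName());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

As in the task, "B" never appears: the failure belongs to the class as a whole, not to the particular method being called.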
Nikita Lipsky aka pjBooms covered Java bytecode verification in more detail at JBreak 2018 (only the slides so far) and at JPoint 2017 (video available).
Although some people at the conference were scared off by the sight of assembler, quite a few decided to dive into the subtleties of the JVM: for everyone who handed in their answers at our booth, we ran a crash course on the fine points of bytecode, the verifier, and implicit exceptions. I hope that you, having read the solutions, have also learned something new. If so, our goal has been achieved!
Source: https://habr.com/ru/post/350638/