
Java Bytecode Fundamentals

Java application developers usually do not need to know about the bytecode running in the virtual machine, but those who develop frameworks, compilers, or Java tooling may need to understand bytecode and perhaps even how to use it for their own purposes. Although libraries such as ASM, cglib, and Javassist make it easier to work with bytecode, you need to understand the basics to use them effectively.
This article covers those basics and gives you a starting point for digging deeper into the topic.

Let's start with a simple example, namely a POJO with a single field and a getter and setter for it.
public class Foo {
    private String bar;

    public String getBar() {
        return bar;
    }
    public void setBar(String bar) {
        this.bar = bar;
    }
}

When you compile a class using the javac Foo.java command, you will have a file Foo.class containing the bytecode. Here is what its content looks like in a HEX editor:

image

Each pair of hexadecimal digits is one byte, and within a method body those bytes encode opcodes (each of which has a mnemonic) and their operands. Reading this in raw binary form would be cruel, so let's move on to the mnemonic representation.
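For readers who want to poke at the raw bytes first, here is a minimal sketch that reads only the start of a class file. Every class file begins with the magic number 0xCAFEBABE, followed by a two-byte minor and a two-byte major version. The class name ClassFileHeader and its parse method are made up for this illustration, and the sample bytes are written by hand rather than copied from a real Foo.class.

```java
import java.nio.ByteBuffer;

// Hypothetical helper written for this article; it is not part of the JDK.
class ClassFileHeader {
    final int minor;
    final int major;

    private ClassFileHeader(int minor, int major) {
        this.minor = minor;
        this.major = major;
    }

    // Reads the first 8 bytes of a class file: u4 magic, u2 minor, u2 major.
    // ByteBuffer is big-endian by default, which matches the class file format.
    static ClassFileHeader parse(byte[] classBytes) {
        if (classBytes.length < 8) {
            throw new IllegalArgumentException("Too short to be a class file");
        }
        ByteBuffer in = ByteBuffer.wrap(classBytes);
        int magic = in.getInt();
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("Bad magic: " + Integer.toHexString(magic));
        }
        int minor = in.getShort() & 0xFFFF; // u2 values are unsigned
        int major = in.getShort() & 0xFFFF;
        return new ClassFileHeader(minor, major);
    }

    public static void main(String[] args) {
        // Hand-written sample: magic, minor version 0, major version 50 (0x32, Java 6).
        byte[] sample = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 0x32};
        ClassFileHeader header = parse(sample);
        System.out.println("major=" + header.major + " minor=" + header.minor);
    }
}
```

To try it on a real file, the byte array could come from Files.readAllBytes(Paths.get("Foo.class")) instead of the hand-written sample.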
The javap -c Foo command displays the bytecode:
public class Foo extends java.lang.Object {
public Foo();
  Code:
   0: aload_0
   1: invokespecial #1; //Method java/lang/Object."<init>":()V
   4: return

public java.lang.String getBar();
  Code:
   0: aload_0
   1: getfield #2; //Field bar:Ljava/lang/String;
   4: areturn

public void setBar(java.lang.String);
  Code:
   0: aload_0
   1: aload_1
   2: putfield #2; //Field bar:Ljava/lang/String;
   5: return
}


The class is very simple, so it is easy to see the connection between the source code and the generated bytecode. First of all, the bytecode version of the class contains a default constructor that the source does not: the compiler generates it (as the Java Language Specification requires), and its body simply calls the superclass constructor, java/lang/Object."<init>".

Next, looking at the instructions themselves, we see that many of them carry a prefix, as in aload_0 here or istore_2 in a later example. The prefix indicates the type of data the instruction operates on: "a" means the opcode works with an object reference, and "i" means it works with an int.
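The naming convention can be sketched as a small lookup. The helper below is hypothetical and deliberately incomplete: it knows only the prefixes i, l, f, d, and a, and only a handful of instruction families (load, store, return, const, and basic arithmetic). The family check matters because not every mnemonic that starts with one of these letters is typed; invokespecial, for instance, starts with "i" but has nothing to do with int.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for this article; covers only a few instruction families.
class OpcodePrefix {
    private static final Map<Character, String> TYPES = new HashMap<>();

    // Instruction families in which the first letter encodes the operand type.
    private static final List<String> TYPED_FAMILIES =
            Arrays.asList("load", "store", "return", "const", "add", "sub", "mul", "div");

    static {
        TYPES.put('i', "int");
        TYPES.put('l', "long");
        TYPES.put('f', "float");
        TYPES.put('d', "double");
        TYPES.put('a', "reference");
    }

    // Returns the operand type encoded in the mnemonic, or "untyped" when the
    // mnemonic does not belong to one of the families listed above.
    static String operandType(String mnemonic) {
        if (mnemonic.length() < 2) {
            return "untyped";
        }
        String type = TYPES.get(mnemonic.charAt(0));
        if (type == null) {
            return "untyped";
        }
        String rest = mnemonic.substring(1);
        for (String family : TYPED_FAMILIES) {
            if (rest.startsWith(family)) {
                return type;
            }
        }
        return "untyped";
    }

    public static void main(String[] args) {
        for (String m : new String[] {"aload_0", "istore_2", "areturn", "invokespecial", "return"}) {
            System.out.println(m + " -> " + operandType(m));
        }
    }
}
```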

An interesting point here is that some instructions take odd-looking operands such as #1 and #2, which refer to entries in the class's constant pool. It's time to examine the class file more closely. Run the javap -c -s -verbose Foo command (-s prints signatures, -verbose prints additional detail):
Compiled from "Foo.java"
public class Foo extends java.lang.Object
  SourceFile: "Foo.java"
  minor version: 0
  major version: 50
  Constant pool:
const #1 = Method #4.#17; // java/lang/Object."<init>":()V
const #2 = Field #3.#18; // Foo.bar:Ljava/lang/String;
const #3 = class #19; // Foo
const #4 = class #20; // java/lang/Object
const #5 = Asciz bar;
const #6 = Asciz Ljava/lang/String;;
const #7 = Asciz <init>;
const #8 = Asciz ()V;
const #9 = Asciz Code;
const #10 = Asciz LineNumberTable;
const #11 = Asciz getBar;
const #12 = Asciz ()Ljava/lang/String;;
const #13 = Asciz setBar;
const #14 = Asciz (Ljava/lang/String;)V;
const #15 = Asciz SourceFile;
const #16 = Asciz Foo.java;
const #17 = NameAndType #7:#8;// "<init>":()V
const #18 = NameAndType #5:#6;// bar:Ljava/lang/String;
const #19 = Asciz Foo;
const #20 = Asciz java/lang/Object;

{
public Foo();
  Signature: ()V
  Code:
   Stack=1, Locals=1, Args_size=1
   0: aload_0
   1: invokespecial #1; //Method java/lang/Object."<init>":()V
   4: return
  LineNumberTable:
   line 1: 0

public java.lang.String getBar();
  Signature: ()Ljava/lang/String;
  Code:
   Stack=1, Locals=1, Args_size=1
   0: aload_0
   1: getfield #2; //Field bar:Ljava/lang/String;
   4: areturn
  LineNumberTable:
   line 5: 0

public void setBar(java.lang.String);
  Signature: (Ljava/lang/String;)V
  Code:
   Stack=2, Locals=2, Args_size=2
   0: aload_0
   1: aload_1
   2: putfield #2; //Field bar:Ljava/lang/String;
   5: return
  LineNumberTable:
   line 8: 0
   line 9: 5
}

Now you can see what these strange operands are. For example, #2:

const #2 = Field #3.#18; // Foo.bar:Ljava/lang/String;

It refers to:

const #3 = class #19; // Foo
const #18 = NameAndType #5:#6;// bar:Ljava/lang/String;

And so on: each of those entries in turn points to the Asciz (string) entries that hold the actual names.
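The chain of references can be followed mechanically. The sketch below is a toy model, not the real class file parser: the class ConstantPoolSketch, its Kind enum, and its builder methods are invented for this illustration, and the pool is filled by hand with only the entries needed to resolve #2 from the listing above.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a constant pool; names and structure are assumptions for illustration.
class ConstantPoolSketch {
    enum Kind { UTF8, CLASS, NAME_AND_TYPE, FIELD }

    private static final class Entry {
        final Kind kind;
        final String text; // only used by UTF8 entries
        final int first;   // first referenced index (unused for UTF8)
        final int second;  // second referenced index (NAME_AND_TYPE and FIELD only)

        Entry(Kind kind, String text, int first, int second) {
            this.kind = kind;
            this.text = text;
            this.first = first;
            this.second = second;
        }
    }

    private final Map<Integer, Entry> pool = new HashMap<>();

    void utf8(int index, String text) {
        pool.put(index, new Entry(Kind.UTF8, text, 0, 0));
    }

    void classRef(int index, int nameIndex) {
        pool.put(index, new Entry(Kind.CLASS, null, nameIndex, 0));
    }

    void nameAndType(int index, int nameIndex, int descriptorIndex) {
        pool.put(index, new Entry(Kind.NAME_AND_TYPE, null, nameIndex, descriptorIndex));
    }

    void field(int index, int classIndex, int nameAndTypeIndex) {
        pool.put(index, new Entry(Kind.FIELD, null, classIndex, nameAndTypeIndex));
    }

    // Follows the references recursively until only strings remain.
    String resolve(int index) {
        Entry e = pool.get(index);
        if (e == null) {
            throw new IllegalArgumentException("No constant pool entry #" + index);
        }
        switch (e.kind) {
            case UTF8:
                return e.text;
            case CLASS:
                return resolve(e.first);
            case NAME_AND_TYPE:
                return resolve(e.first) + ":" + resolve(e.second);
            case FIELD:
                return resolve(e.first) + "." + resolve(e.second);
            default:
                throw new IllegalStateException("Unhandled kind " + e.kind);
        }
    }

    // Only the entries of Foo's pool that are reachable from #2.
    static ConstantPoolSketch fooPool() {
        ConstantPoolSketch p = new ConstantPoolSketch();
        p.field(2, 3, 18);
        p.classRef(3, 19);
        p.utf8(5, "bar");
        p.utf8(6, "Ljava/lang/String;");
        p.nameAndType(18, 5, 6);
        p.utf8(19, "Foo");
        return p;
    }

    public static void main(String[] args) {
        System.out.println(fooPool().resolve(2)); // prints Foo.bar:Ljava/lang/String;
    }
}
```

The resolved string is the same text javap prints in the comment next to const #2.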

Note that each opcode is labeled with a number (0: aload_0). This is the position (byte offset) of the instruction within the method's bytecode array; I will explain below what this means.

To understand how bytecode works, it helps to look at the execution model. The JVM uses a stack-based execution model. Each thread has a JVM stack containing frames. For example, if we run the application in a debugger, we will see the following frames:
image

Each time a method is called, a new frame is created. A frame consists of an operand stack, an array of local variables, and a reference to the constant pool of the class whose method is being executed.
image

The size of the local variable array is determined at compile time and depends on the number and size of the local variables and method parameters. The operand stack is a LIFO stack onto which values are pushed and from which they are popped; its maximum depth is also determined at compile time. Some opcodes push values onto the stack, while others pop operands from it, compute a result, and push that result back. The operand stack is also used to receive the values returned by methods.
public String getBar() {
    return bar;
}

public java.lang.String getBar();
  Code:
   0: aload_0
   1: getfield #2; //Field bar:Ljava/lang/String;
   4: areturn

The bytecode for this method consists of three opcodes. The first, aload_0, pushes the value at index 0 of the local variable table onto the stack. For constructors and instance methods, the this reference always has index 0 in the local variable table. The next opcode, getfield, pops that object reference and pushes the value of the field. The last instruction, areturn, returns the reference from the method.

Each method has a corresponding bytecode array. Looking at the contents of the class file in a hex editor, you will see the following values in the bytecode array:

image

So, the bytecode for the getBar method is 2A B4 00 02 B0. 2A is the aload_0 instruction and B0 is areturn. It may seem strange that the method has three instructions but the byte array has five elements. The reason is that getfield (B4) takes a two-byte operand (00 02, the constant pool index #2), which occupies positions 2 and 3 in the array. That pushes the areturn instruction to position 4, which is exactly the number javap printed next to it.
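The same decoding can be written as a few lines of code. This is a minimal sketch, not a real disassembler: the class name TinyDisassembler is invented here, and it recognizes only the three opcodes that appear in getBar (0x2A, 0xB4, 0xB0), rejecting everything else.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical three-opcode disassembler for the getBar example.
class TinyDisassembler {
    // Each array element holds one unsigned byte value (0..255).
    static List<String> disassemble(int[] code) {
        List<String> out = new ArrayList<>();
        int pc = 0; // offset of the current instruction in the bytecode array
        while (pc < code.length) {
            int opcode = code[pc];
            switch (opcode) {
                case 0x2A: // aload_0 has no operands
                    out.add(pc + ": aload_0");
                    pc += 1;
                    break;
                case 0xB4: { // getfield is followed by a two-byte constant pool index
                    int index = (code[pc + 1] << 8) | code[pc + 2];
                    out.add(pc + ": getfield #" + index);
                    pc += 3;
                    break;
                }
                case 0xB0: // areturn has no operands
                    out.add(pc + ": areturn");
                    pc += 1;
                    break;
                default:
                    throw new IllegalArgumentException(
                            "Opcode 0x" + Integer.toHexString(opcode) + " is not handled by this sketch");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] getBar = {0x2A, 0xB4, 0x00, 0x02, 0xB0};
        for (String line : disassemble(getBar)) {
            System.out.println(line);
        }
    }
}
```

Run on the five bytes of getBar, it yields the offsets 0, 1, and 4, matching the javap listing.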

Local Variable Table

To illustrate what happens with local variables, let's use another example:
public class Example {
    public int plus(int a) {
        int b = 1;
        return a + b;
    }
}

There are two local variables here: the method parameter a and the local variable int b. Together with this, which always occupies slot 0 of an instance method, that gives three slots. Here is the bytecode:
public int plus(int);
  Code:
   Stack=2, Locals=3, Args_size=2
   0: iconst_1
   1: istore_2
   2: iload_1
   3: iload_2
   4: iadd
   5: ireturn
  LineNumberTable:
   line 5: 0
   line 6: 2

  LocalVariableTable:
   Start  Length  Slot  Name  Signature
   0      6       0     this  LExample;
   0      6       1     a     I
   2      4       2     b     I

The method loads the constant 1 with iconst_1 and stores it in local variable 2 with istore_2. From this point on, slot 2 of the local variable table is occupied by variable b, as expected. Next, iload_1 pushes the value of a onto the stack and iload_2 pushes the value of b. iadd pops the two operands from the stack, adds them, and pushes the sum back; ireturn then returns that value from the method.
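To make the interplay of the local variable array and the operand stack concrete, here is a toy interpreter for exactly these six instructions. It is a sketch under simplifying assumptions: the class PlusFrame is invented for this article, instructions are represented by their mnemonics rather than by bytes, and slot 0 (this) is left unused because the method never reads it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical hand-built frame for Example.plus(int); not how a real JVM is implemented.
class PlusFrame {
    static int run(int a) {
        int[] locals = new int[3];                 // Locals=3: slot 0 = this (unused), 1 = a, 2 = b
        Deque<Integer> stack = new ArrayDeque<>(); // operand stack; never deeper than 2 here
        locals[1] = a;                             // the argument arrives in slot 1

        String[] code = {"iconst_1", "istore_2", "iload_1", "iload_2", "iadd", "ireturn"};
        for (String insn : code) {
            switch (insn) {
                case "iconst_1":
                    stack.push(1);
                    break;
                case "istore_2":
                    locals[2] = stack.pop();
                    break;
                case "iload_1":
                    stack.push(locals[1]);
                    break;
                case "iload_2":
                    stack.push(locals[2]);
                    break;
                case "iadd": {
                    int right = stack.pop();
                    int left = stack.pop();
                    stack.push(left + right);
                    break;
                }
                case "ireturn":
                    return stack.pop();
                default:
                    throw new IllegalStateException("Unknown instruction " + insn);
            }
        }
        throw new IllegalStateException("Method ended without ireturn");
    }

    public static void main(String[] args) {
        System.out.println(run(41)); // prints 42
    }
}
```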

Exception Handling

It is interesting to see what bytecode is generated for exception handling, for example for the try-catch-finally construct.
public class ExceptionExample {
    public void foo() {
        try {
            tryMethod();
        } catch (Exception e) {
            catchMethod();
        } finally {
            finallyMethod();
        }
    }

    private void tryMethod() throws Exception {}
    private void catchMethod() {}
    private void finallyMethod() {}
}

The bytecode for the foo() method is:
public void foo();
  Code:
   0: aload_0
   1: invokespecial #2; //Method tryMethod:()V
   4: aload_0
   5: invokespecial #3; //Method finallyMethod:()V
   8: goto 30
   11: astore_1
   12: aload_0
   13: invokespecial #5; //Method catchMethod:()V
   16: aload_0
   17: invokespecial #3; //Method finallyMethod:()V
   20: goto 30
   23: astore_2
   24: aload_0
   25: invokespecial #3; //Method finallyMethod:()V
   28: aload_2
   29: athrow
   30: return
  Exception table:
   from  to  target  type
   0     4   11      Class java/lang/Exception
   0     4   23      any
   11    16  23      any
   23    24  23      any

The compiler generates code for every scenario that is possible inside the try-catch-finally block, so the call to finallyMethod() appears three times (!). The try block is compiled as if there were no try at all and its body were simply followed by the finally code:
0: aload_0
1: invokespecial #2; //Method tryMethod:()V
4: aload_0
5: invokespecial #3; //Method finallyMethod:()V
8: goto 30
If this block completes normally, the goto instruction at position 8 jumps to position 30, which holds the return opcode.

If tryMethod throws an Exception, the first matching (innermost) handler in the exception table is selected. The exception table shows that the handler for Exception starts at position 11:

0 4 11 Class java/lang/Exception

Execution therefore jumps to the code that calls catchMethod() and finallyMethod():

11: astore_1
12: aload_0
13: invokespecial #5; //Method catchMethod:()V
16: aload_0
17: invokespecial #3; //Method finallyMethod:()V

If any other exception is thrown along the way, the matching rows of the exception table point to position 23:

0 4 23 any
11 16 23 any
23 24 23 any

The instructions starting at 23 are:

23: astore_2
24: aload_0
25: invokespecial #3; //Method finallyMethod:()V
28: aload_2
29: athrow
30: return

So finallyMethod() is executed in every case: astore_2 saves the pending exception in local variable 2, finallyMethod() is called, and then aload_2 and athrow rethrow that exception to the caller.
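The handler selection described in this section can be sketched as a simple table scan. The class ExceptionTableSketch below is invented for this illustration and models only the lookup itself: rows are searched in order, the from offset is inclusive and the to offset is exclusive, and a null type stands for "any". The rows are copied by hand from the exception table of foo().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of an exception table lookup; not a real JVM data structure.
class ExceptionTableSketch {
    private static final class Row {
        final int from;      // first covered offset (inclusive)
        final int to;        // end of the covered range (exclusive)
        final int target;    // offset of the handler code
        final Class<?> type; // null means "any"

        Row(int from, int to, int target, Class<?> type) {
            this.from = from;
            this.to = to;
            this.target = target;
            this.type = type;
        }
    }

    private final List<Row> rows = new ArrayList<>();

    void add(int from, int to, int target, Class<?> type) {
        rows.add(new Row(from, to, target, type));
    }

    // Returns the handler offset for an exception thrown at pc,
    // or -1 if no row matches and the exception leaves the method.
    int findHandler(int pc, Throwable thrown) {
        for (Row r : rows) {
            boolean inRange = pc >= r.from && pc < r.to;
            boolean typeMatches = r.type == null || r.type.isInstance(thrown);
            if (inRange && typeMatches) {
                return r.target;
            }
        }
        return -1;
    }

    // The exception table of ExceptionExample.foo() from the listing above.
    static ExceptionTableSketch fooTable() {
        ExceptionTableSketch t = new ExceptionTableSketch();
        t.add(0, 4, 11, Exception.class);
        t.add(0, 4, 23, null);
        t.add(11, 16, 23, null);
        t.add(23, 24, 23, null);
        return t;
    }

    public static void main(String[] args) {
        ExceptionTableSketch table = fooTable();
        System.out.println(table.findHandler(1, new Exception()));         // tryMethod threw Exception
        System.out.println(table.findHandler(1, new Error()));             // tryMethod threw something else
        System.out.println(table.findHandler(13, new RuntimeException())); // catchMethod threw
        System.out.println(table.findHandler(5, new RuntimeException()));  // finallyMethod threw
    }
}
```

Note that the calls to finallyMethod() at offsets 5 and 17 are not covered by any row, so an exception thrown from the finally code itself propagates straight to the caller.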

Conclusion

These are just a few highlights from the world of JVM bytecode. Most of them come from a developerWorks article by Peter Haggar, "Java bytecode: Understanding bytecode makes you a better programmer". The article is a bit dated but still relevant. The BCEL user guide contains a decent description of bytecode basics, so I would recommend it to anyone interested. The Java Virtual Machine Specification is also a useful source of information, but it is not easy to read and has no illustrations to aid understanding.

Overall, I think that understanding how bytecode works is an important step in deepening your knowledge of Java programming, especially for those interested in frameworks, compilers for JVM languages, or other tooling.

Source: https://habr.com/ru/post/111456/

