
Java virtual machine bytecode structure

Recently, several articles on bytecode manipulation appeared on Habr, which prompted me to publish the following article about its structure.

The Java platform has two notable features. First, for portability, a program is compiled not into machine code but into an intermediate low-level language, bytecode. Second, executable classes are loaded by an extensible classloader. This mechanism adds a lot of flexibility: code can be modified as it is loaded, and new classes can be created and loaded while the program is running.

This technique is widely used to implement AOP and to build testing frameworks and ORMs. I especially want to mention Terracotta, a product built on the elegant idea of clustering the JVM, which uses bytecode modification to the fullest. This note reviews the structure of bytecode, the first half of this powerful combination.

Each Java class is compiled into its own file. This holds even for nested and anonymous classes. Such a file contains the name of the class, its superclass, the list of interfaces it implements, and descriptions of its fields and methods. Note that after compilation the information from import directives is lost, and all classes are referred to by their fully qualified names. For example, String is written as java/lang/String.
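
Both points can be checked from Java itself. The sketch below is not from the original article, and its class names are hypothetical; it assumes the file is compiled without a package declaration. It converts binary class names into the slash-separated internal form and shows that nested and anonymous classes get their own names (and therefore their own class files).

```java
final class InternalNames {
    // A nested class: javac writes it to a separate file, InternalNames$Inner.class.
    static final class Inner {}

    // Converts a binary name such as java.lang.String into the
    // slash-separated internal form used inside class files.
    static String internalName(Class<?> c) {
        return c.getName().replace('.', '/');
    }

    // An anonymous class also gets its own class file, named after the
    // enclosing class plus a number, for example InternalNames$1.class.
    static Class<?> anonymousClass() {
        Runnable r = new Runnable() {
            @Override
            public void run() {}
        };
        return r.getClass();
    }

    public static void main(String[] args) {
        Class<?>[] classes = {String.class, Inner.class, anonymousClass()};
        for (Class<?> c : classes) {
            System.out.println(internalName(c));
        }
    }
}
```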

The most interesting part is what class methods look like in bytecode. Let's see what the following class is compiled into:

 package org;

 class Test {
     private String name;

     public String getName() {
         return name;
     }

     public void setName(String name) {
         this.name = name;
     }
 }

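The getter compiles to the following bytecode:
 public getName()Ljava/lang/String;
 ALOAD 0
 GETFIELD org/Test name
 ARETURN

Let's start with the header. It contains the name of the method, shows that the method takes no parameters (the parentheses are empty), and gives the type of the return value, Ljava/lang/String;.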
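
The header follows a fixed descriptor format, which can be rebuilt with reflection. This is a hedged sketch rather than anything from the article: the nested class Test is a hypothetical stand-in for org.Test, and the helper names are invented for the illustration.

```java
import java.lang.reflect.Method;

final class Descriptors {
    // Hypothetical stand-in for the article's org.Test (default package here).
    static class Test {
        private String name;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    // Field descriptor of a type: one letter for primitives and void,
    // L<internal name>; for classes, and a leading [ per array dimension.
    static String typeDescriptor(Class<?> c) {
        if (c == void.class) return "V";
        if (c == boolean.class) return "Z";
        if (c == byte.class) return "B";
        if (c == char.class) return "C";
        if (c == short.class) return "S";
        if (c == int.class) return "I";
        if (c == long.class) return "J";
        if (c == float.class) return "F";
        if (c == double.class) return "D";
        if (c.isArray()) return "[" + typeDescriptor(c.getComponentType());
        return "L" + c.getName().replace('.', '/') + ";";
    }

    // Method descriptor: parameter descriptors in parentheses, then the return type.
    static String methodDescriptor(Method m) {
        StringBuilder sb = new StringBuilder("(");
        for (Class<?> p : m.getParameterTypes()) {
            sb.append(typeDescriptor(p));
        }
        return sb.append(')').append(typeDescriptor(m.getReturnType())).toString();
    }

    // Finds a declared method by name; assumes the name is not overloaded.
    static Method findMethod(Class<?> owner, String name) {
        for (Method m : owner.getDeclaredMethods()) {
            if (m.getName().equals(name)) return m;
        }
        throw new IllegalArgumentException("no method named " + name);
    }

    public static void main(String[] args) {
        System.out.println(methodDescriptor(findMethod(Test.class, "getName"))); // ()Ljava/lang/String;
        System.out.println(methodDescriptor(findMethod(Test.class, "setName"))); // (Ljava/lang/String;)V
    }
}
```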

Bytecode is a stack-oriented language, similar in structure to assembly. To operate on data, you first have to push it onto the stack. We want to read a field of an object, so we first need to push the object itself. Bytecode has no variable names; local variables are numbered instead. In an instance method, number 0 holds the reference to the current object, the variable this. The parameters of the method come next, followed by the remaining local variables (in a static method, numbering starts directly with the parameters).
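
The numbering rule can be sketched in code. This is an illustration only, with hypothetical names, and it encodes one detail the text does not spell out: a long or double value occupies two consecutive slots, while every other type, including references and arrays, takes one.

```java
import java.util.ArrayList;
import java.util.List;

final class SlotLayout {
    // Returns the local-variable slot of every parameter in a method descriptor,
    // followed by the first slot left free for the method's own local variables.
    static List<Integer> slots(String descriptor, boolean isStatic) {
        List<Integer> result = new ArrayList<>();
        int slot = isStatic ? 0 : 1;                  // instance methods keep slot 0 for this
        int i = descriptor.indexOf('(') + 1;
        while (descriptor.charAt(i) != ')') {
            result.add(slot);
            char type = descriptor.charAt(i);
            slot += (type == 'J' || type == 'D') ? 2 : 1; // long and double take two slots
            while (descriptor.charAt(i) == '[') i++;      // skip array dimensions
            if (descriptor.charAt(i) == 'L') {
                i = descriptor.indexOf(';', i);           // skip the class name
            }
            i++;                                          // consume the type letter or ';'
        }
        result.add(slot);
        return result;
    }

    public static void main(String[] args) {
        // testMethods(Collection<Long>): parameter in slot 1, first local in slot 2
        System.out.println(slots("(Ljava/util/Collection;)Ljava/lang/Long;", false)); // [1, 2]
    }
}
```

This matches the testMethods listing further on, where the first local variable is stored with LSTORE 2.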

The ALOAD 0 command pushes the variable this onto the stack. Other data types are pushed with other commands: ILOAD for an int, LLOAD for a long, DLOAD for a double. Reading an element of a double[] array is a separate command, DALOAD.
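
A hedged sketch of the type-to-instruction rule, with hypothetical helper names; types are given as field descriptors (I, J, D, Ljava/lang/String;, [D and so on). It keeps loading a double[] variable, which holds a reference and therefore uses ALOAD, apart from reading one of its elements with DALOAD.

```java
final class LoadOpcodes {
    // Instruction that pushes a local variable of the given type onto the stack.
    static String loadInstruction(String type) {
        switch (type.charAt(0)) {
            case 'Z': case 'B': case 'C': case 'S': case 'I': return "ILOAD"; // int-like types
            case 'J': return "LLOAD";
            case 'F': return "FLOAD";
            case 'D': return "DLOAD";
            case 'L': case '[': return "ALOAD";  // objects and arrays are references
            default: throw new IllegalArgumentException("unknown type: " + type);
        }
    }

    // Instruction that reads one element from an array with the given element type.
    static String arrayLoadInstruction(String elementType) {
        switch (elementType.charAt(0)) {
            case 'Z': case 'B': return "BALOAD"; // boolean[] and byte[] share BALOAD
            case 'C': return "CALOAD";
            case 'S': return "SALOAD";
            case 'I': return "IALOAD";
            case 'J': return "LALOAD";
            case 'F': return "FALOAD";
            case 'D': return "DALOAD";
            case 'L': case '[': return "AALOAD";
            default: throw new IllegalArgumentException("unknown type: " + elementType);
        }
    }

    public static void main(String[] args) {
        System.out.println(loadInstruction("J"));      // LLOAD
        System.out.println(loadInstruction("[D"));     // ALOAD: a double[] variable is a reference
        System.out.println(arrayLoadInstruction("D")); // DALOAD: one element of a double[]
    }
}
```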

The next command, GETFIELD, pops the object reference off the stack and pushes the value of that object's field, either a primitive or a reference. It takes two parameters: the first is the class name, the second is the field name. If the field is static, nothing needs to be pushed beforehand, and the command is replaced with GETSTATIC with the same parameters.

The last command, ARETURN, completes the method and returns the reference value from the top of the stack.
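
The getter's three instructions can be traced on an explicit operand stack. The sketch below is a simulation only: it models an object as a map from field names to values, which is an assumption for illustration and not how the JVM stores objects; the class and method names are hypothetical.

```java
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;

final class GetterSimulation {
    // Executes ALOAD 0, GETFIELD <fieldName>, ARETURN on an explicit operand stack.
    @SuppressWarnings("unchecked")
    static Object runGetter(Map<String, Object> thisObject, String fieldName) {
        Object[] locals = {thisObject};            // slot 0 holds this
        Deque<Object> stack = new LinkedList<>();  // LinkedList permits null field values

        stack.push(locals[0]);                     // ALOAD 0: push this
        Map<String, Object> ref = (Map<String, Object>) stack.pop(); // GETFIELD pops the reference
        stack.push(ref.get(fieldName));            // and pushes the field's value
        return stack.pop();                        // ARETURN: return the reference on top
    }

    public static void main(String[] args) {
        Map<String, Object> test = new HashMap<>(); // hypothetical model of an org.Test instance
        test.put("name", "Habr");
        System.out.println(runGetter(test, "name")); // Habr
    }
}
```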

The setter has a slightly more complex structure:
 public setName(Ljava/lang/String;)V
 ALOAD 0
 ALOAD 1
 PUTFIELD org/Test name
 RETURN

This method returns nothing (the V at the end of its header stands for void). The first two commands push the variable this and the method parameter onto the stack. Then PUTFIELD (PUTSTATIC for a static field) assigns the value to the object's field and pops both values from the stack. The last command, RETURN, exits the method.
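
The same kind of simulation works for the setter and makes the pop order visible: PUTFIELD takes the value from the top of the stack first and the object reference underneath it second. As before, objects are modelled as maps purely for illustration, and the names are hypothetical.

```java
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;

final class SetterSimulation {
    // Executes ALOAD 0, ALOAD 1, PUTFIELD <fieldName>, RETURN and reports
    // how many values are left on the operand stack when RETURN is reached.
    @SuppressWarnings("unchecked")
    static int runSetter(Map<String, Object> thisObject, Object argument, String fieldName) {
        Object[] locals = {thisObject, argument};  // slot 0: this, slot 1: the parameter
        Deque<Object> stack = new LinkedList<>();  // LinkedList permits null arguments

        stack.push(locals[0]);                     // ALOAD 0: push this
        stack.push(locals[1]);                     // ALOAD 1: push the new value
        Object value = stack.pop();                // PUTFIELD pops the value first,
        Map<String, Object> target = (Map<String, Object>) stack.pop(); // then the reference,
        target.put(fieldName, value);              // and stores the value in the field
        return stack.size();                       // RETURN: nothing is left to return
    }

    public static void main(String[] args) {
        Map<String, Object> test = new HashMap<>(); // hypothetical model of an org.Test instance
        int left = runSetter(test, "Habr", "name");
        System.out.println(test.get("name") + ", values left on the stack: " + left);
    }
}
```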

Let's add a couple more methods to our class and see what bytecode they correspond to.

     // Test.java also needs: import java.util.Collection;
     public void forTest(Boolean b) {
         System.out.println(b);
     }

     public Long testMethods(Collection<Long> testInterface) {
         long a = System.currentTimeMillis();
         forTest(testInterface.contains(a));
         return a;
     }

testMethods compiles to the following bytecode:
 INVOKESTATIC java/lang/System currentTimeMillis()J
 LSTORE 2
 ALOAD 0
 ALOAD 1
 LLOAD 2
 INVOKESTATIC java/lang/Long valueOf(J)Ljava/lang/Long;
 INVOKEINTERFACE java/util/Collection contains(Ljava/lang/Object;)Z
 INVOKESTATIC java/lang/Boolean valueOf(Z)Ljava/lang/Boolean;
 INVOKEVIRTUAL org/Test forTest(Ljava/lang/Boolean;)V
 LLOAD 2
 INVOKESTATIC java/lang/Long valueOf(J)Ljava/lang/Long;
 ARETURN

The first command calls the static method currentTimeMillis on the System class. The second, LSTORE 2, stores its long result in local variable 2. Then we push this, the method parameter and variable 2 onto the stack, and Long.valueOf converts the long into a java/lang/Long. We check whether it is contained in the collection by calling a method on the parameter. The parameter's type is an interface, so INVOKEINTERFACE is used; a method of a class is called with INVOKEVIRTUAL, and a static method with INVOKESTATIC. To call a method on an object or interface, the object must be on the stack first, followed by the arguments of the method being called. The call replaces them with its result, or simply removes them from the stack if the method returns nothing (void). Here the boolean result of contains is boxed with Boolean.valueOf and passed, together with the this pushed earlier, to forTest. The last three commands push variable 2 onto the stack again, box it into a Long and return it as the method's value.
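
The stack rule for calls can be written as a small function of the instruction and the method descriptor. This is a sketch with hypothetical names that covers only the three invoke instructions used above; it counts values rather than stack slots (a long is one value but occupies two slots).

```java
final class InvokeStackEffect {
    // Counts the parameters in a method descriptor such as (JLjava/lang/Object;[I)Z.
    static int parameterCount(String descriptor) {
        int count = 0;
        int i = descriptor.indexOf('(') + 1;
        while (descriptor.charAt(i) != ')') {
            while (descriptor.charAt(i) == '[') i++;  // array dimensions
            if (descriptor.charAt(i) == 'L') {
                i = descriptor.indexOf(';', i);       // class name up to ';'
            }
            i++;                                      // consume the type letter or ';'
            count++;
        }
        return count;
    }

    // Values popped: the receiver object (unless the call is static) plus one per argument.
    static int valuesPopped(String opcode, String descriptor) {
        int receiver = opcode.equals("INVOKESTATIC") ? 0 : 1;
        return receiver + parameterCount(descriptor);
    }

    // Values pushed: nothing for a void method, otherwise the single result.
    static int valuesPushed(String descriptor) {
        return descriptor.endsWith(")V") ? 0 : 1;
    }

    public static void main(String[] args) {
        String[][] calls = {
            {"INVOKESTATIC", "()J"},                        // System.currentTimeMillis
            {"INVOKESTATIC", "(J)Ljava/lang/Long;"},        // Long.valueOf
            {"INVOKEINTERFACE", "(Ljava/lang/Object;)Z"},   // Collection.contains
            {"INVOKEVIRTUAL", "(Ljava/lang/Boolean;)V"},    // Test.forTest
        };
        for (String[] call : calls) {
            System.out.println(call[0] + " " + call[1] + ": pops "
                + valuesPopped(call[0], call[1]) + ", pushes " + valuesPushed(call[1]));
        }
    }
}
```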

To finish our tour of bytecode, let's add one last method and look at loops and conditional statements:
     public void testAriphmentics() {
         int i = -17;
         while (i < 10) {
             if (i < 0) {
                 i = i + 7;
             }
             i = i * 13;
         }
     }

It will look like this in bytecode.
 BIPUSH -17
 ISTORE 1
 Label: L1466604866
 ILOAD 1
 BIPUSH 10
 IF_ICMPGE L329949514
 ILOAD 1
 IFGE L658705244
 ILOAD 1
 BIPUSH 7
 IADD
 ISTORE 1
 Label: L658705244
 ILOAD 1
 BIPUSH 13
 IMUL
 ISTORE 1
 GOTO L1466604866
 Label: L329949514
 RETURN

The first two commands initialize the variable i (number 1) with a value of -17.

Next comes the loop body. Implementing it requires labels, an unconditional jump command (GOTO) and conditional jumps. Our method has three labels: the first marks the start of the loop, the second is needed for the if statement, and the third marks the end of the loop. A conditional jump takes one parameter, the target label. Before it executes, the values being compared must be on the stack. Comparison with zero has its own commands, such as IFGE. Different types use different comparison commands; for two ints, IF_ICMPGE jumps if the first value is greater than or equal to the second. Note that the conditions are inverted: the loop test i < 10 becomes IF_ICMPGE, which leaves the loop once i >= 10, and the test i < 0 becomes IFGE, which skips the if body. After the comparison, the compared values are popped from the stack. Arithmetic operations on two values work the same way: both operands are pushed first, and the instruction pops them and pushes the result in their place.
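
To check this reading of the listing, here is a minimal interpreter sketch for just the int instructions it uses. The class name and the string-based instruction format are hypothetical; constants are pushed with BIPUSH, the instruction for small int constants. The source() method repeats the Java loop but returns i, so the two can be compared.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

final class LoopInterpreter {
    static final String[] PROGRAM = {
        "BIPUSH -17", "ISTORE 1",
        "Label: L1466604866",
        "ILOAD 1", "BIPUSH 10", "IF_ICMPGE L329949514",
        "ILOAD 1", "IFGE L658705244",
        "ILOAD 1", "BIPUSH 7", "IADD", "ISTORE 1",
        "Label: L658705244",
        "ILOAD 1", "BIPUSH 13", "IMUL", "ISTORE 1",
        "GOTO L1466604866",
        "Label: L329949514",
        "RETURN",
    };

    // Runs the program and returns local variable 1 (i) when RETURN is reached.
    static int run(String[] program) {
        Map<String, Integer> labels = new HashMap<>();
        for (int pc = 0; pc < program.length; pc++) {
            if (program[pc].startsWith("Label: ")) labels.put(program[pc].substring(7), pc);
        }
        int[] locals = new int[2];                 // slot 0 would hold this; slot 1 is i
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;
        while (true) {
            String[] parts = program[pc].split(" ");
            pc++;
            switch (parts[0]) {
                case "Label:": break;              // labels only mark jump targets
                case "BIPUSH": stack.push(Integer.parseInt(parts[1])); break;
                case "ILOAD": stack.push(locals[Integer.parseInt(parts[1])]); break;
                case "ISTORE": locals[Integer.parseInt(parts[1])] = stack.pop(); break;
                case "IADD": { int b = stack.pop(), a = stack.pop(); stack.push(a + b); break; }
                case "IMUL": { int b = stack.pop(), a = stack.pop(); stack.push(a * b); break; }
                case "IFGE": if (stack.pop() >= 0) pc = labels.get(parts[1]); break;
                case "IF_ICMPGE": {
                    int b = stack.pop(), a = stack.pop(); // a was pushed first
                    if (a >= b) pc = labels.get(parts[1]);
                    break;
                }
                case "GOTO": pc = labels.get(parts[1]); break;
                case "RETURN": return locals[1];
                default: throw new IllegalStateException("unsupported: " + program[pc - 1]);
            }
        }
    }

    // The source loop from testAriphmentics, returning i instead of discarding it.
    static int source() {
        int i = -17;
        while (i < 10) {
            if (i < 0) {
                i = i + 7;
            }
            i = i * 13;
        }
        return i;
    }

    public static void main(String[] args) {
        System.out.println(run(PROGRAM) + " " + source());
    }
}
```

The loop does terminate: i grows more negative on every pass until int multiplication overflows and wraps around to a large positive value, at which point IF_ICMPGE jumps to the end label.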

This concludes our brief tour of bytecode. Some topics, such as exceptions and synchronization, were not covered; I hope that, with a basic understanding of bytecode, the reader can easily work them out. In the next part, we will look at the tools used to modify bytecode.

http://math-and-prog.blogspot.com/2009/08/java.html

Source: https://habr.com/ru/post/69797/

