
How strong is the friendship between Java and C inside Dalvik VM?

In this article I have tried to describe in great detail my steps in exploring Android code and its execution in the Dalvik VM. I was very interested in finding the answers to the following questions:


Therefore, this article is divided into 3 parts.

It seems to me that posing such questions and studying them is an important step toward writing code later, because Android is already hot on our heels, and not knowing it as well as one of your favorite tools (for example, C) is no longer right.

Before this I had practically never analyzed virtual machines, and now I became interested in the Dalvik VM. The entire description is therefore tied to this VM, and you can follow my narration, whose course changes several times along the way.

Introduction


To understand this article you need basic knowledge of assembly language for the ARM architecture. The article provides a little of this background where it assumes the reader does not know it, so there are many explanations in my own words, but no more than that.

It is also necessary that you have the skills to create Android applications from scratch (knowing the directory structure and the main files).

You can also look at the summary of each part and decide whether you should read it.

The three parts carry very different loads. For example, when you get to part 3, it may seem as much more complicated than part 2 as part 2 was compared to part 1. So please be careful if you do not know the subject: toward the end the load can be heavy.

Who will be interested in the article

Only for adventurers.

A good developer who doesn't know this will easily find it out (I hope my notion of a good developer is not too inflated), but for someone who wants to start and try something yet does not know how, it will be very difficult.

I wanted to show that posing questions and finding answers to them in a forest as dense as the Dalvik VM is more than realistic, and I want to show this path to anyone who decides on the same audacious act.

Original purpose

My main task is to compare the C code:
int sum(int a, int b) { return a + b; } 


With the same Java code:
  public class Summator {
      int sum(int a, int b) { return a + b; }
      static int staticSum(int a, int b) { return a + b; }
  }


And with its native (JNI) counterpart:

  public class NativeSummator {
      native int sum(int a, int b);
      static native int staticSum(int a, int b);
  }


I have my own assumptions, but I really want to see how it actually happens.

Preparatory knowledge

Let me recall here roughly what integration with Java code through JNI looks like.

In order to use an external (native) function in Java code, the native keyword is used:

  package com.m039.study;

  public class Summator {
      native int sum(int a, int b);
  }


In this case, the C code should look like:

  int Java_com_m039_study_Summator_sum(JNIEnv* env, jobject thiz,
                                       int a, int b) {
      return a + b;
  }


In order for Java to see the function, its name must be composed according to certain rules (given from memory):


To compile, use the ndk-build command. It expects the C (or C++) files to be in the ./jni directory. That directory must contain an Android.mk file and may contain an Application.mk.

After compilation, the library libNAME.so will be created in the ./libs/armeabi directory, where armeabi may differ depending on the target ABI.
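For reference, a minimal Android.mk for building such a library might look roughly like this (the module name summator and the source file name are my assumptions, not from the article):

```makefile
LOCAL_PATH := $(call my-dir)

include $(CLEAR_VARS)

# module "summator" produces libsummator.so under ./libs/<abi>
LOCAL_MODULE    := summator
LOCAL_SRC_FILES := summator.c

include $(BUILD_SHARED_LIBRARY)
```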

You will have to refer to other documents if you need more detailed steps; see the documentation and examples shipped with the NDK.

Part I. What does the code generated by C look like?


Question: What exactly does the C code do?

In principle, this is understandable, but how does it look from the point of view of the ARM architecture, since almost all mobile devices now use this particular architecture?

To understand this, you need to see what C code looks like in assembly (raw) form.

Disassemble

The main issue is the interaction between C and Java, so I chose 5 examples based on it:

1. Standard:
 int sum(int a, int b) { return a + b; } 


2. Binding through JNI (non-static):
  int Java_com_m039_study_Summator_sum(JNIEnv* env, jobject thiz,
                                       int a, int b) {
      return a + b;
  }


3. Binding via JNI (static):
  int Java_com_m039_study_StaticSummator_sum(JNIEnv* env, jclass thiz,
                                             int a, int b) {
      return a + b;
  }


4. Binding through JNI (non-static, through a function):
  int Java_com_m039_study_Summator_sum(JNIEnv* env, jobject thiz,
                                       int a, int b) {
      return sum(a, b);
  }


5. Binding through JNI (static, through a function):
  int Java_com_m039_study_StaticSummator_sum(JNIEnv* env, jclass thiz,
                                             int a, int b) {
      return sum(a, b);
  }


Let's think a little. It seems to me that 4 and 5 are almost, if not completely, alike, while 1-3 should differ in the parameters passed to them. This is what I would like to check.

Assembly system

The build system for Android applications is extremely simplified, and for purposes such as inspecting compiled code you need to know the compilation flags, especially the optimization flag.

In the Application.mk file, you can change the APP_OPTIM variable and assign it one of two values: release or debug.

Because the purpose of this article is to compare the integration of two languages, and not to understand one of them well, for simplicity I will use the value 'debug'. This will probably give a better understanding and less chance of error (which gives more chance of finishing this article), but afterwards we still need to see how much the code differs in 'release' mode.

Now a little more in detail.

The APPLICATION-MK.html file describes what the APP_OPTIM variable does. It also indicates that if you set the attribute android:debuggable="true" in the application tag, APP_OPTIM is assigned the value 'debug'. If the attribute is set to "false", APP_OPTIM is assigned 'release'.

Therefore, in what follows it is assumed that the android:debuggable attribute in AndroidManifest.xml is set to "true".

And we cannot skip which values are passed to the compiler when the APP_OPTIM variable is used; this can be seen below, in the same file:

  ifeq ($(APP_OPTIM),debug)
    APP_CFLAGS := -O0 -g $(APP_CFLAGS)
  else
    APP_CFLAGS := -O2 -DNDEBUG -g $(APP_CFLAGS)
  endif


Now let's set about considering all 5 options.

In 'debug' mode

Note: all listings were obtained with the command arm-linux-androideabi-objdump -d " "

Option 1

  00000804 <sum>:
   804:   b082        sub   sp, #8
   806:   9001        str   r0, [sp, #4]
   808:   9100        str   r1, [sp, #0]
   80a:   9a01        ldr   r2, [sp, #4]
   80c:   9b00        ldr   r3, [sp, #0]
   80e:   18d3        adds  r3, r2, r3
   810:   1c18        adds  r0, r3, #0
   812:   b002        add   sp, #8
   814:   4770        bx    lr
   816:   46c0        nop   (mov r8, r8)


There is a lot of superfluous code in this listing, all because of the calling convention, and more specifically the ARM calling convention.

I'll translate this example:
- 04: reserve memory for 2 local variables.
- 06-08: save the arguments passed to the function.
- 0a-0c: load the same arguments into the registers r2 and r3, respectively.
- 0e: equivalent to r3 = r2 + r3
- 10: save the result to register r0
- 12: return the stack pointer to its initial state
- 14: return to the caller
- 16: align the function in the file using the nop command

Update: two points were unclear to me: the suffix s and the bx command. The first means that the status flags are also updated during execution. The second is a Thumb branch command: bx lr returns to the caller (and can switch between ARM and Thumb state), unlike bl, which also stores the return address in lr.

Now that the code has been chewed through, it is clear that a lot of it is excess. Of course, with optimization enabled there would be nothing of the sort.

In this listing, only 2 points should be noted: how the arguments are passed to the function, and how the main part is executed.

Arguments are passed through registers, and the main part contains only one command. From this one could conclude that the code adds r0, r1, r2; bx lr has the right to exist.

But again, you can't say such code will appear in a finished application (unless someone decides to release a debug build to the market). Meanwhile, there is an assumption that Java is still worse, and I want to check it. So let's quickly go through the remaining listings.

Option 2

  000007ec <Java_com_m039_study_Summator_sum>:
   7ec:   b084        sub   sp, #16
   7ee:   9003        str   r0, [sp, #12]
   7f0:   9102        str   r1, [sp, #8]
   7f2:   9201        str   r2, [sp, #4]
   7f4:   9300        str   r3, [sp, #0]
   7f6:   9a01        ldr   r2, [sp, #4]
   7f8:   9b00        ldr   r3, [sp, #0]
   7fa:   18d3        adds  r3, r2, r3
   7fc:   1c18        adds  r0, r3, #0
   7fe:   b004        add   sp, #16
   800:   4770        bx    lr
   802:   46c0        nop   (mov r8, r8)


Yes, the same as expected: only the number of arguments increased, nothing more. And how many extra operations, horror. Meanwhile the desire to see an optimized version grows much stronger.

Note: so far this confirms the assumption that using many JNI functions (which receive redundant arguments) is worse than using simple functions (which simply receive fewer arguments). (Now, while rewriting this article, this statement sounds very naive.)

Option 3

  00000850 <Java_com_m039_study_StaticSummator_sum>:
   850:   b084        sub   sp, #16
   852:   9003        str   r0, [sp, #12]
   854:   9102        str   r1, [sp, #8]
   856:   9201        str   r2, [sp, #4]
   858:   9300        str   r3, [sp, #0]
   85a:   9a01        ldr   r2, [sp, #4]
   85c:   9b00        ldr   r3, [sp, #0]
   85e:   18d3        adds  r3, r2, r3
   860:   1c18        adds  r0, r3, #0
   862:   b004        add   sp, #16
   864:   4770        bx    lr
   866:   46c0        nop   (mov r8, r8)


The assumption was justified: the code is identical to option 2.

Option 4

  000008e0 <Java_com_m039_study_Summator_sum>:
   8e0:   b500        push  {lr}
   8e2:   b085        sub   sp, #20
   8e4:   9003        str   r0, [sp, #12]
   8e6:   9102        str   r1, [sp, #8]
   8e8:   9201        str   r2, [sp, #4]
   8ea:   9300        str   r3, [sp, #0]
   8ec:   9a01        ldr   r2, [sp, #4]
   8ee:   9b00        ldr   r3, [sp, #0]
   8f0:   1c10        adds  r0, r2, #0
   8f2:   1c19        adds  r1, r3, #0
   8f4:   f7ff ffde   bl    8b4 <sum>
   8f8:   1c03        adds  r3, r0, #0
   8fa:   1c18        adds  r0, r3, #0
   8fc:   b005        add   sp, #20
   8fe:   bd00        pop   {pc}


Why consider this option at all? In case it seemed better to write everything directly in the JNI functions, whereas in fact it is better to use JNI as a wrapper for more complex constructions; inside small C functions you can do whatever you want. (Note: after writing this article this statement already seems obvious to me)

This code is a bit more complicated:
- e0: saves the return address
- e2: reserves memory for local variables
- e4-ea: saves the arguments to local variables
- ec-ee: loads the values of the local variables
- f0-f2: prepares the register values for the function call
- f4: the function call itself
- f8-fa: saves the result to a register, then writes it to the register for the return value
- fc: returns the stack pointer to its initial position
- fe: returns to the caller

Line 'e0' saves the return address, since the lr register will be overwritten by the bl instruction.

As you can see, there is a lot of extra code here too. I suppose code like this would have the right to exist:
  push  {lr}
  sub   sp, #20
  adds  r0, r2, #0
  adds  r1, r3, #0
  bl    8b4 <sum>
  add   sp, #20
  pop   {pc}


Option 5

I think it should not be considered.

In 'release' mode

I'd like to check my guesses about what the code will look like in the optimized version.
Option 1

  00000894 <sum>:
   894:   1808        adds  r0, r1, r0
   896:   4770        bx    lr

Option 2

  00000890 <Java_com_m039_study_Summator_sum>:
   890:   1898        adds  r0, r3, r2
   892:   4770        bx    lr

Option 3

  00000898 <Java_com_m039_study_StaticSummator_SUM>:
   898:   1898        adds  r0, r3, r2
   89a:   4770        bx    lr

Option 4

  0000089c <Java_com_m039_study_Summator_SUM3>:
   89c:   b510        push  {r4, lr}
   89e:   1c10        adds  r0, r2, #0
   8a0:   1c19        adds  r1, r3, #0
   8a2:   f7ff fff7   bl    894 <sum>
   8a6:   bd10        pop   {r4, pc}

Impression

I am a little surprised, and yet not. Everything assumed earlier was confirmed. The optimized code is good, very close to what I tried to write above myself.

Part I. Summary

This part has probably introduced you to ARM assembly. You may want to analyze other language constructs and understand them, and perhaps you will also be surprised by your own guesses.

But most importantly, if you are an Android developer, it should now be clear how the default compilation settings are used and where to look when you need them.

I was initially interested in how much better C code would be than Java code. I wrote the article first, but am writing this result later: for now, notice how elegantly the problem is solved in the C language and how compact it turned out in disassembled form. In Java opcodes this is also noticeable, but less so. And in part 3, where opcodes mix with C, it is almost gone.

And now, just a little about Java opcodes.

Part II. What does the code generated by java look like?


First you need to understand how the JVM works, or rather the Dalvik VM. To do this, you must disassemble the *.dex file, which is the bytecode file for the Dalvik VM.

What will the Summator class look like (see the original purpose)? To find out, run dexdump -d classes.dex. Here is what the command printed:
  #5              : (in Lcom/m039/study/Summator;)
    name          : 'sum'
    type          : '(II)I'
    access        : 0x0000 ()
    code          -
    registers     : 4
    ins           : 3
    outs          : 0
    insns size    : 3 16-bit code units
  00226c:            |[00226c] com.m039.study.Summator.sum:(II)I
  00227c: 9000 0203  |0000: add-int v0, v2, v3
  002280: 0f00       |0002: return v0
    catches       : (none)
    positions     : 0x0000 line=73
    locals        : 0x0000 - 0x0003 reg=1 this com/m039/study/Summator;
                    0x0000 - 0x0003 reg=2 a I
                    0x0000 - 0x0003 reg=3 b I


    name          : 'staticSum'
    type          : '(II)I'
    access        : 0x0008 (STATIC)
    code          -
    registers     : 3
    ins           : 2
    outs          : 0
    insns size    : 3 16-bit code units
  002100:            |[002100] com.m039.study.Summator.staticSum:(II)I
  002110: 9000 0102  |0000: add-int v0, v1, v2
  002114: 0f00       |0002: return v0
    catches       : (none)
    positions     : 0x0000 line=77
    locals        : 0x0000 - 0x0003 reg=1 a I
                    0x0000 - 0x0003 reg=2 b I


Here you should note the instruction add-int v0, v1, v2 and its opcode value '9000 0102'. Now we can move on to what these opcodes look like from the inside.

Which opcodes to look at?

I am more interested in the ARM architecture, and since this architecture is used by default, the opcodes will be the corresponding ones. But there are many ARM variants; which one is the default?

To answer that, refer to the Application.mk document, specifically the description of the APP_ABI variable. It says that armv5te is used by default. So that is what we will explore!

Where to start looking?

The opcodes themselves live in the armv5te directory. The correct approach would be to study the files in that directory, but I wanted to do it in a not-quite-correct way that also works: look at the already generated files, or rather the file InterpAsm-armv5te.S. That seems both interesting and practical to me.

Documentation

All the main documentation is located in the docs directory; where necessary I will give links in a more readable format than the raw HTML.

Investigate addition opcode (0x90)

Consider the opcode 0x90 , its code:
  .L_OP_ADD_INT: /* 0x90 */
  /* File: armv5te/OP_ADD_INT.S */
  /* File: armv5te/binop.S */
      /*
       * Generic 32-bit binary operation.  Provide an "instr" line that
       * specifies an instruction that performs "result = r0 op r1".
       * This could be an ARM instruction or a function call.  (If the result
       * comes back in a register other than r0, you can override "result".)
       *
       * If "chkzero" is set to 1, we perform a divide-by-zero check on
       * vCC (r1).  Useful for integer division and modulus.  Note that we
       * *don't* check for (INT_MIN / -1) here, because the ARM math lib
       * handles it correctly.
       *
       * For: add-int, sub-int, mul-int, div-int, rem-int, and-int, or-int,
       *      xor-int, shl-int, shr-int, ushr-int, add-float, sub-float,
       *      mul-float, div-float, rem-float
       */
      /* binop vAA, vBB, vCC */
      FETCH(r0, 1)                  @ r0<- CCBB
      mov     r9, rINST, lsr #8     @ r9<- AA
      mov     r3, r0, lsr #8        @ r3<- CC
      and     r2, r0, #255          @ r2<- BB
      GET_VREG(r1, r3)              @ r1<- vCC
      GET_VREG(r0, r2)              @ r0<- vBB
      .if 0
      cmp     r1, #0                @ is second operand zero?
      beq     common_errDivideByZero
      .endif

      FETCH_ADVANCE_INST(2)         @ advance rPC, load rINST
                                    @ optional op; may set condition codes
      add     r0, r0, r1            @ r0<- op, r0-r3 changed
      GET_INST_OPCODE(ip)           @ extract opcode from rINST
      SET_VREG(r0, r9)              @ vAA<- r0
      GOTO_OPCODE(ip)               @ jump to next instruction
      /* 11-14 instructions */


If you look at the comments, everything becomes clear, but I would like to arrive at those comments from the code myself. Therefore, an analysis of a piece of this code follows.

Opcode 0x90 corresponds to the addition instruction; its format is 23x. This means the instruction occupies 2 16-bit code units and uses 3 registers; the 'x' means there is nothing beyond that.

Therefore, the code of this opcode must extract the 3 registers transmitted in the 23x format and use them for the addition. But it turns out not to be quite that simple. The prefix 'v' on a register, for example 'vCC', means virtual: the opcode carries register numbers, and the instruction then retrieves the value contained in the specified register. (Note: registers do not always have the 'v' prefix)

It looks like this:
- We have the opcode: 00 | 90 02 | 01 (AA | op CC | BB) *
- 90 means the addition opcode
- Extract the register numbers: r9 = 0, r2 = 1, r3 = 2
- Extract the contents of the registers: r1 = REG(r3), r0 = REG(r2) **
- Execute the instruction 'add r0, r0, r1'
- Save the return value: REG(r9) = r0

* the | separators are added for ease of reading, as in the documentation
** REG is pseudocode

This opcode also contains the commands needed to move on to the next opcode. We should not pay attention to them now; they are parsed a bit lower.

And now how it all looks in assembly language, a more severe explanation:
1. FETCH(r0, 1). This opcode (0x90) uses the two registers rPC and rINST, which correspond to r4 and r5. If you look a little lower, you will notice that rINST (FETCH_INST) is the value at rPC. So it can be said that rINST equals the value of the command FETCH(rINST, 0).

2. The register numbers are extracted in the standard way (see the picture).

3. GET_VREG(r1, r3). A new register appears: rFP. It points to the memory area for local variables and arguments, i.e. through it you can both extract the argument values and write the return value. It can be thought of as a pointer to the VM's internal registers.

4. .if 0 ... .endif is incomprehensible, and all because I began exploring with the file InterpAsm-armv5te.S! If you look at the file binop.S, it becomes clear that the check of the second operand for zero is disabled in this opcode.

5. FETCH_ADVANCE_INST(2) is discussed further below.

6. The addition is performed.

7. GET_INST_OPCODE(ip) is discussed further below.

8. SET_VREG(r0, r9) writes the return value to the corresponding virtual register, whose number is held in r9.

9. GOTO_OPCODE(ip) is discussed further below.

I drew a (hopefully) visual picture:


One can conclude that the opcode is similar to the corresponding assembly code ('adds r0, r1, r0'), only with 2 additions: handling the virtual registers and moving on to the next opcode.

Not considered part

Virtual register handling has been considered. Now we also need to understand what the remaining commands in the opcode do; I hope they will not lead too far afield. For example, where the values of registers such as rFP and rPC are initialized: no less interesting, but a strong digression from the goal.

Now consider the remaining commands:
1. FETCH_ADVANCE_INST(2): the next opcode is written to the rINST register
2. GET_INST_OPCODE(ip): the opcode number (already the next one) is written to the ip register
3. GOTO_OPCODE(ip): jump to the code of the next opcode

The rIBASE register appears in these 3 commands. We should check: does it point at the opcode numbered 0x00? It seems it should, and so it is. The code often contains the line:
 adr rIBASE, dvmAsmInstructionStart @ set rIBASE 


And the value of dvmAsmInstructionStart equals .L_OP_NOP. And the code of OP_NOP? It really is 0x00.

What is not considered, but I would like to

In this section, only the initial values of the remaining registers (rPC, rFP) were not considered. Perhaps we will consider them further. Why only "perhaps"? Because of the two, it is rFP, not rPC, that matters most for this article.

Investigate addition opcode (correct version)

It seemed to me that the way the opcode was investigated was not quite correct, though more pleasant. Now let's consider how to do it properly, namely: how are the files in the armv5te folder composed?

To do this, refer to the document README.txt .

I am a little surprised, but the method used for the study turns out to be correct in terms of the README.txt file. Here is the gist of it:

It is easier to study the generated files in the out directory, such as out/InterpC-portstd.c, than to trace through the original template files.

Now, as for our example, consider the OP_ADD_INT.S file:
  %verify "executed"
  %include "armv5te/binop.S" {"instr":"add r0, r0, r1"}


This means that this file will, conditionally, be replaced by the file binop.S, in which $instr takes the value add r0, r0, r1.

It remains to consider the file binop.S :

  %default {"preinstr":"", "result":"r0", "chkzero":"0"}
      /*
       * Generic 32-bit binary operation.  Provide an "instr" line that
       * specifies an instruction that performs "result = r0 op r1".
       * This could be an ARM instruction or a function call.  (If the result
       * comes back in a register other than r0, you can override "result".)
       *
       * If "chkzero" is set to 1, we perform a divide-by-zero check on
       * vCC (r1).  Useful for integer division and modulus.  Note that we
       * *don't* check for (INT_MIN / -1) here, because the ARM math lib
       * handles it correctly.
       *
       * For: add-int, sub-int, mul-int, div-int, rem-int, and-int, or-int,
       *      xor-int, shl-int, shr-int, ushr-int, add-float, sub-float,
       *      mul-float, div-float, rem-float
       */
      /* binop vAA, vBB, vCC */
      FETCH(r0, 1)                  @ r0<- CCBB
      mov     r9, rINST, lsr #8     @ r9<- AA
      mov     r3, r0, lsr #8        @ r3<- CC
      and     r2, r0, #255          @ r2<- BB
      GET_VREG(r1, r3)              @ r1<- vCC
      GET_VREG(r0, r2)              @ r0<- vBB
      .if $chkzero
      cmp     r1, #0                @ is second operand zero?
      beq     common_errDivideByZero
      .endif

      FETCH_ADVANCE_INST(2)         @ advance rPC, load rINST
      $preinstr                     @ optional op; may set condition codes
      $instr                        @ $result<- op, r0-r3 changed
      GET_INST_OPCODE(ip)           @ extract opcode from rINST
      SET_VREG($result, r9)         @ vAA<- $result
      GOTO_OPCODE(ip)               @ jump to next instruction


As you can see, the code is very, very similar to what we saw before, only now there are no unclear points. For example, there was a comment "@ optional op .." with no explanation of why it was there; now it is clear.

I see no point in disassembling this opcode again; everything in it has already been considered. Now we need to move quickly to the most important question: how are the arguments of native and non-native functions filled in?

Part II. Total

This was already another level than the first part; a lot had to be learned and remembered, but now the structure of an opcode is clear! One can be happy about that. Yes, there is no more sense in it than simply rejoicing, but at the same time you could already write your own opcode.

From this part you could learn which architecture is used to generate the files, and where and how to find the corresponding opcode.

You may also have many questions, so I advise watching the presentation by the Dalvik VM developer. It shows very well how commands like GOTO_OPCODE are used and how the bytecode pipeline is organized.

I am not trying to draw conclusions immediately; all conclusions will come at the very end of the article. And now I propose to dive inside the Dalvik VM and understand where this code is executed. But I warn you: the level ahead is even higher, although I will insert less code.

Part III. How and where does the code run?


The third part considers the process of invoking a function in the Dalvik VM. The previous knowledge will help a lot, but I am afraid the jump in difficulty will be the same as between the first and second parts. Let's get started!

Note: it seems to me you had better not follow the links immediately, but read first; that way the material accumulates gradually and you are less likely to get confused.

Disassemble

It is necessary to understand how the opcode's arguments are filled in; for this, consider the following class:
  public class Summator {
      void test() {
          sum(44, 43);
          staticSum(42, 41);
          nSum(44, 43);
          nStaticSum(42, 41);
      }

      int sum(int a, int b) { return a + b; }
      static int staticSum(int a, int b) { return a + b; }

      native int nSum(int a, int b);
      native static int nStaticSum(int a, int b);
  }


    name          : 'test'
    type          : '()V'
    access        : 0x0000 ()
    code          -
    registers     : 5
    ins           : 1
    outs          : 3
    insns size    : 21 16-bit code units
  0022a8:                |[0022a8] com.m039.study.Summator.test:()V
  0022b8: 1303 2c00      |0000: const/16 v3, #int 44 // #2c
  0022bc: 1302 2b00      |0002: const/16 v2, #int 43 // #2b
  0022c0: 1301 2a00      |0004: const/16 v1, #int 42 // #2a
  0022c4: 1300 2900      |0006: const/16 v0, #int 41 // #29
  0022c8: 6e30 2e00 3402 |0008: invoke-virtual {v4, v3, v2}, Lcom/m039/study/Summator;.sum:(II)I // method@002e
  0022ce: 7120 2d00 0100 |000b: invoke-static {v1, v0}, Lcom/m039/study/Summator;.staticSum:(II)I // method@002d
  0022d4: 6e30 2600 3402 |000e: invoke-virtual {v4, v3, v2}, Lcom/m039/study/Summator;.nSum:(II)I // method@0026
  0022da: 7120 2500 0100 |0011: invoke-static {v1, v0}, Lcom/m039/study/Summator;.nStaticSum:(II)I // method@0025
  0022e0: 0e00           |0014: return-void
    catches       : (none)
    positions     : 0x0008 line=29
                    0x000b line=30
                    0x000e line=31
                    0x0011 line=32
                    0x0014 line=33
    locals        : 0x0000 - 0x0015 reg=4 this Lcom/m039/study/Summator;


I do not really want to disassemble (and copy into the article) every file in the armv5te directory. Therefore, I will try to give links and excerpts from these files.

In the listing above, you can see that a native method call is no different from the others. How so? Actually, this is how it should be, but still, the listing gives no hint of how the native method is invoked.

But first, why not consider the call itself (invoke-virtual)? We can assume right away that there is quite a bit of code there, so I will state in advance what I would like to find:
- the difference between virtual and static
- the actual invocation of the function
- and how many, and possibly which, operations come before the invocation

The rest is of no interest yet and should not distract from the study.

Execution method

Let's look at the listing from top to bottom.

The part of the code that fills the registers and invokes the sum method:
  0022b8: 1303 2c00      |0000: const/16 v3, #int 44 // #2c
  0022bc: 1302 2b00      |0002: const/16 v2, #int 43 // #2b
  0022c8: 6e30 2e00 3402 |0008: invoke-virtual {v4, v3, v2}, Lcom/m039/study/Summator;.sum:(II)I // method@002e


At first glance everything is very simple: the relevant registers are filled and the method is invoked.

I will try to guess what this method will look like from the inside.

const/16 probably enters the value 44 into the virtual register with the given number, and then the value 43 likewise. (Note: if we assume that v3 is the designation of a virtual register, then there is no doubt)

invoke-virtual passes the value of this in v4 and calls the function.

const/16

  %verify "executed"
      /* const/16 vAA, #+BBBB */
      FETCH_S(r0, 1)                @ r0<- ssssBBBB (sign-extended)
      mov     r3, rINST, lsr #8     @ r3<- AA
      FETCH_ADVANCE_INST(2)         @ advance rPC, load rINST
      SET_VREG(r0, r3)              @ vAA<- r0
      GET_INST_OPCODE(ip)           @ extract opcode from rINST
      GOTO_OPCODE(ip)               @ jump to next instruction


This code reads the value carried in the second code unit and writes it to the virtual register. Note (again) that everything looks the same as the generated C code (the optimized version) with only 2 additions: an intermediate virtual register is used for every write/read, and each time a new opcode is loaded and jumped to. Otherwise everything is very, very transparent.

invoke-virtual

2 virtual registers are filled; now go to the file OP_INVOKE_VIRTUAL.S. And there, deep horror. So let's immediately try to understand the difference between virtual (file OP_INVOKE_VIRTUAL.S) and static (OP_INVOKE_STATIC.S).

The code differs in two places: the value passed to the dvmResolveMethod function, and the non-static version has an additional appendix ".L${opcode}_continue:".

Consider what the dvmResolveMethod function does.

Before calling this function, the registers are filled as follows (extract from the comments):
- r0 <- method-> clazz
- r1 <- CCCC *
- r2 <- METHOD_VIRTUAL or METHOD_STATIC (method type)
- r3 <- glue-> method

* per the instruction-formats document this is CCCC, although BBBB is written in the comments.

Now let's look at the code of the dvmResolveMethod function. If you look at the function declaration, it becomes clear that the 4th argument (see above) is superfluous:

  Method* dvmResolveMethod(const ClassObject* referrer, u4 methodIdx,
                           MethodType methodType)


This function returns a pointer to a Method structure; there is no need to know more about the function itself. Let's take a look at the structure.

There are many interesting and useful fields, but one wonderful field is of particular interest: nativeFunc. So this structure also contains a pointer to the native function.

Now it would be nice to find out where this structure, or rather this method, is "executed".

Both OP_INVOKE_STATIC.S and OP_INVOKE_VIRTUAL.S end by calling bl common_invokeMethod${routine}. Most likely this is the main handler of the Method structure, and its code can be found in the file footer.S. The suffix "NoRange" appears because of the first line in the file: %default { ... , "routine" : "NoRange" }

But before looking at footer.S, what does the type DalvikBridgeFunc next to the nativeFunc field mean? If you run through the VM code, you can find that the dvmResolveNativeMethod function is assigned to this field.

From the comment on this function it becomes clear that it is used to find the native method (in the library libNAME.so) and, most importantly, to execute it. Well, we take its word for it; this is the one case where it is better to believe.

One question fewer: it is now clear who executes the native method, and roughly where. But still not completely clear where, since the file footer.S has not yet been examined.

Returning to the common_invokeMethodNoRange function: you may notice that it is very scary, and it holds the answers to what was unclear in the previous part, namely who fills registers such as rFP and rPC. As you can see, this function fills them.

With a little patience you can understand, for example, what fills rINST, rPC, r2: a pointer to the method->insns field, i.e. the instructions (bytecode). There are many such moments, so I will leave them out.

But the native case is of interest, and there everything is even simpler. If our method is native (the corresponding flags are checked), then that very nativeFunc function is executed, which we have already dealt with.

Otherwise, common_invokeMethodNoRange is very similar to the standard opcode considered earlier, only with many additional checks.

And what about static versus non-static methods? Apparently footer.S has no relation to them; everything that could be said about them was already said in the corresponding opcode files (invoke-virtual and invoke-static).

At this point we can say that who did what, and where, has been made clear. And if not, continuing to figure it out is not difficult. My task was to show that it is more than realistic to pose questions (see the original purpose) and work them out properly.

Part III. Total


In this part, the function responsible for invoking the native method was found. We also traced where this function is retrieved and where it is stored. With this knowledge you can look at other parts of the Java language and see them in the Dalvik VM code.

After this part it becomes clear that bytecode is a lot of opcodes, each interacting with the others. And this part considered only a small piece of that interaction.

If you are interested in this topic, you can go on to consider the standard constructs of the Java language in disassembled form, and understand why the Dalvik VM developer in his presentation showed examples of what to do and what not to do when programming in Java for Android.

Total


Here I will try to summarize what I think can be taken from this article.

But first I want to highlight a couple of features of how the article is composed.

You could see the word "Note" in the text; these notes were added after the whole article was written in draft, when it seemed to me that they should be added.

My goal for this article is to show, in practice, the thinking needed to research unfamiliar code. For this I chose very simple questions and an object of interest to me. That is why you will meet many digressions and notice how my attitude to the subject changes.

And now a little about what seems interesting to me and constitutes the result of the research. It may seem banal to you; many things are written in the Java language specification, but after this article it is much easier to understand these constructs as they are.

Here are the main points:

1. A method call, no matter whether static or virtual (there is also a "quick" variant), is a very complex affair compared to a plain C (and even C++) call. For example, suppose you have a wrapper for Box2D. If you call a method from this wrapper every time, in an infinite loop, just to check whether objects intersect, the question arises: why? This is better done in C, while you can create the world and initialize the objects in Java.

2. Opcodes are very close in functionality to assembly inserts, so much so that they are implemented in assembly. Naturally, they are better than calling a third-party function via JNI, but worse than just doing it in C.

They are worse by two criteria:

1) They use a wrapper for virtual calls through virtual registers, which is very similar to what a C compiler produces in debug mode. One can even say an opcode is very similar in speed to the unoptimized C version.

2) Each opcode contains the code to fetch the next opcode, but it is not large. After the fetch, a jump to the next opcode occurs, which looks like a simple bl instruction.

I satisfied my curiosity, found out what I wanted, and my guesses were mostly confirmed. I will be very pleased if you liked it and followed this path with me. I hope you are left with interesting thoughts and, most importantly, an interest in how everything works.

I often notice that developers very often rely on stereotypes (about the Java and C/C++ languages) and make not the slightest effort to destroy them, and without that nothing works.

If you are interested in additional material, look in the docs folder for the jni-tips.html file and the 2008 presentation by the Dalvik VM developer. There is also an interesting project, smali, but I haven't gotten around to it.

P.S. I apologize in advance if I made a mistake; please correct me and I will immediately make the change.

Source: https://habr.com/ru/post/126356/

