Diving deeper into C with assembly. Part 2 (conditions)

Here is the second part of the series. In it, we will deal with conditions. This time we will also try other optimization levels and see how they affect the code.

I should state the purpose of these articles to avoid any misunderstanding. I will not go through every C compiler separately; that would be long and tedious. Instead, I want to get readers interested in some curious interpretations of C code, so that people understand how their code can be transformed and executed by the processor, and also to dispel some myths that circulate among novice programmers. For example, some people believe that repeatedly adding a number in a loop is faster than simply multiplying one number by another. The articles deliberately use gcc with -m32 -O0; some readers did not quite get the idea. Where there is a real reason to, I will change both the compiler and the flags.

What do I mean? Consider two examples from the previous article:

    int main(void) {
        register int a = 1;
        return a;
    }

and

    int a = 1;
    int b = a * 2;

Indeed, in the first case clang places the variable on the stack, but how interesting or important is that for us? Anyone familiar with the register specifier has read, or knows, that it is merely a recommendation, so the compiler is free to ignore it. Besides, the purpose of that example was to introduce the reader to registers using the simplest possible case, and gcc is perfect for that. The second example is even simpler: clang immediately emits a shift, and when multiplying by 3 it gives up and produces imul. Honestly, I don't quite see what is curious about this example, which is why I also showed the gcc code, which keeps resorting to tricks instead of imul for multipliers up to 22.

We all know that the language standard does not say how any particular thing must be implemented. Compiler developers are free to make their own implementations as long as they do not violate the standard. That is why different compilers interpret the same code differently. But, forgive me, should we disassemble every one of them? What would be the practical value of such material? To confuse everyone? As was rightly noted, if you are interested in a specific compiler, you can simply sit down with a debugger. And for those who have read these articles, that will not be so scary.

So let's continue.

The simplest condition

First, let's compare a variable with a number:

    int a = 0;
    if (a < 5) {
        return 1;
    }
    return 0;

ASM (gcc 7.2):

        mov DWORD PTR [ebp-4], 0
        cmp DWORD PTR [ebp-4], 4
        jg .L2
        mov eax, 1
        jmp .L3
    .L2:
        mov eax, 0
    .L3:
        leave
        ret

In the first line, the compiler writes the value of the variable "a" onto the stack. In the second we have a new instruction, cmp. It is not hard to guess that it compares two values; in our case, the value on the stack and 4.

But how does it work? It simply subtracts the second operand from the first. The sub instruction works the same way, but cmp does not store the result. It does, however, set the flags in the EFLAGS/RFLAGS register according to that result. Without going into details, the flags tell us whether the result was positive, negative or zero. The conditional jump that follows, jg (jump if greater), is taken if the first operand was greater than the second, that is, roughly speaking, if the result was positive.
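To make the "positive, negative or zero" description a little more concrete, here is a small C model of what cmp leaves in the four flags that the jumps in this article rely on, with the jump conditions for jg, jle, jbe and je (the last three appear further down) written as the Intel manual states them. It is a simplified sketch assuming 32-bit operands; `struct flags`, `cmp32` and the `*_taken` helpers are hypothetical names, and the real cmp also updates AF and PF, which are ignored here.

```c
#include <stdint.h>

/* The four flags that "cmp a, b" sets and that jg/jle/jbe/je read. */
struct flags {
    int zf;  /* zero flag: result == 0 */
    int sf;  /* sign flag: top bit of the result */
    int cf;  /* carry flag: unsigned borrow (a < b as unsigned) */
    int of;  /* overflow flag: signed overflow in a - b */
};

/* Model of "cmp a, b" on 32-bit operands: subtract, keep only the flags. */
struct flags cmp32(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;   /* wraps modulo 2^32; cmp throws r away */
    struct flags f;

    f.zf = (r == 0);
    f.sf = (int)(r >> 31);
    f.cf = (a < b);
    /* Signed overflow: the operands have different signs and the
     * result's sign differs from the sign of a. */
    f.of = (int)(((a ^ b) & (a ^ r)) >> 31);
    return f;
}

/* Jump conditions as given in the Intel manual. */
int jg_taken(struct flags f)  { return !f.zf && f.sf == f.of; }  /* signed >    */
int jle_taken(struct flags f) { return f.zf || f.sf != f.of; }   /* signed <=   */
int jbe_taken(struct flags f) { return f.cf || f.zf; }           /* unsigned <= */
int je_taken(struct flags f)  { return f.zf; }                   /* ==          */
```

For instance, cmp32(0xFFFFFFFFu, 5u), i.e. comparing the bits of -1 with 5, makes jle taken but jbe not taken; and cmp32(0x80000000u, 1u) sets OF, which is why jle has to look at SF != OF rather than at the sign bit alone.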

If you have followed this, a fair question may arise: why "greater" when we wrote "less"? Indeed, we wrote if (a < 5), but it turned into "do something if a > 4". The program's logic is not broken, though: if a > 4, we jump to the code that does return 0. Another fair question may arise here: if we write the condition as if (a > 4) return 0;, how will the code change?

        mov DWORD PTR [ebp-4], 0
        cmp DWORD PTR [ebp-4], 4
        jle .L2
        mov eax, 0
        jmp .L3
    .L2:
        mov eax, 1
    .L3:
        leave
        ret

And once again the condition is reversed: jle, as you guessed, means less than or equal (jump if less or equal).

The point is that return ends the function, so execution has to reach the last two instructions (leave and ret); that is why the jmp .L3 line is the same in both examples. jmp is an unconditional jump instruction. Here it skips the line at the .L2 label, which would put a different number into the eax register.
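The first -O0 listing can be mirrored line by line in C with goto, which makes the inverted test easy to see. This is only an illustrative sketch, not compiler output: the function name `lowered_lt5` and the variable `ret` (standing in for eax) are hypothetical.

```c
/* Hypothetical C mirror of the gcc -O0 listing for
 *     if (a < 5) { return 1; } return 0;
 * The test is inverted (a > 4), just like "cmp ..., 4 / jg .L2". */
int lowered_lt5(int a)
{
    int ret;              /* plays the role of eax */

    if (a > 4)            /* cmp DWORD PTR [ebp-4], 4 ; jg .L2 */
        goto L2;
    ret = 1;              /* mov eax, 1 */
    goto L3;              /* jmp .L3: skip the "return 0" path */
L2:
    ret = 0;              /* .L2: mov eax, 0 */
L3:
    return ret;           /* .L3: leave ; ret */
}
```

For every a, lowered_lt5(a) gives the same result as a < 5 ? 1 : 0.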

In other words, the compiler tests the opposite condition: if the original condition is false, it jumps further down; if the original condition is true, execution continues with the code immediately after cmp and the conditional jump. For clarity, let's number the branches of the condition:

    int a = 0;
    if (a > 5) { //#0
        return 1;
    }
    //#1
    return 0;

        mov DWORD PTR [ebp-4], 0
        cmp DWORD PTR [ebp-4], 5
        jle .L2
        mov eax, 1        ;#0
        jmp .L3
    .L2:
        mov eax, 0        ;#1
    .L3:
        leave
        ret

As you can see, the order of the code follows the source. But if the compiler used the direct condition (jg) instead, we would get:

        mov DWORD PTR [ebp-4], 0
        cmp DWORD PTR [ebp-4], 5
        jg .L2
        mov eax, 0        ;#1
        jmp .L3
    .L2:
        mov eax, 1        ;#0
    .L3:
        leave
        ret

That is, the body of the if ends up at the bottom of the function, which is not very convenient: if there were many lines after the condition (in section #1), we would have to scroll far down the listing to find section #0. (And if the body of the if did not end with return, execution would also have to jump back to continue after the if, which means one more label and one more jump.)

unsigned

We have just looked at comparing signed numbers, but what if we compare unsigned ones?

    unsigned int a = 0;
    if (a > 5) {
        return 1;
    }
    return 0;

        mov DWORD PTR [ebp-4], 0
        cmp DWORD PTR [ebp-4], 5
        jbe .L2
        mov eax, 1
        jmp .L3
    .L2:
        mov eax, 0
    .L3:
        leave
        ret

Nothing has changed except the conditional jump: instead of jle we now have jbe (jump if below or equal). Why are there two different instructions for comparing signed and unsigned numbers?

    int a = 0 - 1;          // -1
    unsigned int b = 0 - 1; // 4294967295

Yet in memory both hold the same bit pattern, 0xFFFFFFFF (4294967295). The difference is only in how it is interpreted; in C you can even write:

    unsigned int a = 0 - 1;
    printf("%i", a); // -1
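A quick way to convince yourself that -1 and 4294967295 are the same bits, and that only the comparison differs, is the sketch below. It assumes a 32-bit two's-complement int (true for the x86 targets used here); the function names are hypothetical. signed_le_5 corresponds to the decision jle makes and unsigned_le_5 to the one jbe makes.

```c
#include <string.h>

/* 1 if -1 stored as int and 0u - 1u stored as unsigned int have
 * identical bytes in memory (assumes 32-bit two's-complement int). */
int same_bits_minus_one(void)
{
    int s = 0 - 1;              /* -1 */
    unsigned int u = 0u - 1u;   /* 4294967295 with a 32-bit unsigned int */
    return sizeof s == sizeof u && memcmp(&s, &u, sizeof s) == 0;
}

/* The decision jle makes: signed comparison. */
int signed_le_5(int a)
{
    return a <= 5;
}

/* The decision jbe makes: unsigned comparison. */
int unsigned_le_5(unsigned int a)
{
    return a <= 5u;
}
```

Same bits, opposite answers: signed_le_5(-1) is 1, while unsigned_le_5(0u - 1u) is 0.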

But cmp sets not just one flag but several. Roughly speaking, jbe checks the carry flag, which is set when the subtraction needs a borrow (the first operand is below the second as an unsigned number), and jle checks the sign flag, which equals the most significant bit of the result (1 for a negative result). In reality it is a bit more complicated: JBE jumps if CF = 1 or ZF = 1, and JLE if ZF = 1 or SF ≠ OF, but we will not dwell on that. Let's move on to more interesting things:

    unsigned int a = 0;
    if (a < 0) {
        return 1;
    }
    return 0;

will be converted to:

        mov DWORD PTR [ebp-4], 0
        mov eax, 0
        leave
        ret

Great, isn't it? An unsigned variable "a" can never be less than zero, so the condition is always false and the compiler simply throws it out.

And what about this:

    unsigned int a = 0;
    if (a > 0) {
        return 1;
    }
    return 0;

ASM:

        mov DWORD PTR [ebp-4], 0
        cmp DWORD PTR [ebp-4], 0
        je .L2
        mov eax, 1
        jmp .L3
    .L2:
        mov eax, 0
    .L3:
        leave
        ret

For an unsigned number, a > 0 is the same as a != 0, so the compiler uses je (jump if equal), which jumps if the result of the comparison is zero.

Is "<" faster than "<="? Or than "< || =="?

We have already seen several conditional jump instructions: jle, jbe, jg and je. There are quite a few more of them, covering all cases, including negated ones: for example, jne (jump if not equal, i.e. not zero) or jnbe (jump if not below or equal). So any comparison of numbers compiles to two instructions: cmp (or test) and a jcc (conditional jump). It follows that, for example, < and <= take the same number of instructions.

But for

 if (a < 0 || a == 0)

there will be a difference, but only at -O0.

Let's take a look at the following program:

    #include <stdio.h>

    int main() {
        int a = 0;
        scanf("%d", &a);
        if (a < 0 || a == 0) {
            return 10;
        }
        return 20;
    }

This time I use clang 5.0.0 -O3 -m32, because it generates less asm code and makes this example easier to explain:

        sub esp, 12
        mov dword ptr [esp + 8], 0
        ; scanf
        sub esp, 8
        lea eax, [esp + 16]
        push eax
        push .L.str
        call scanf
        add esp, 16
        ; end scanf
        cmp dword ptr [esp + 8], 0    ;#1
        mov ecx, 10                   ;#2
        mov eax, 20                   ;#3
        cmovle eax, ecx               ;#4
        add esp, 12
        ret
    .L.str:
        .asciz "%d"

#1: compare the variable a with zero
#2: ecx now holds 10
#3: eax now holds 20
#4: cmovle is similar to jle, except that instead of jumping it performs a mov when the condition holds. So if a <= 0, the value from ecx (10) goes into eax; otherwise eax keeps 20.

You can probably guess that replacing the condition with a <= 0 in the C code changes nothing, but you can check it if you like.
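The three functions below are a C-level sketch of that claim: the `||` form, the `<=` form, and a form written to resemble the clang output (load 10 and 20, then conditionally overwrite) all compute the same thing. The names are hypothetical, and whether a particular compiler turns each of them into cmovle is not guaranteed.

```c
/* The condition from the example, written with || ... */
int cond_or(int a)
{
    return (a < 0 || a == 0) ? 10 : 20;
}

/* ... and with <=. */
int cond_le(int a)
{
    return (a <= 0) ? 10 : 20;
}

/* Shaped like the clang -O3 output: both values are loaded first,
 * then one conditionally replaces the other (cmovle eax, ecx). */
int cond_cmov_like(int a)
{
    int ecx = 10;      /* mov ecx, 10 */
    int eax = 20;      /* mov eax, 20 */
    if (a <= 0)        /* cmp [a], 0 ; cmovle eax, ecx */
        eax = ecx;
    return eax;
}
```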

When conditions cease to be conditions

Imagine this: there is a condition in your code, but while debugging you cannot find a single conditional instruction. Intrigued?

Take a look at the following code:

    int x = 10;
    scanf("%d", &x);
    if (x < 0) {
        return 3;
    }
    return 2;

You might expect cmp and labels, or even something less trivial such as a setcc instruction, but you get the following (clang 5.0.0 -O3 -m32):

        mov eax, dword ptr [esp + 8]
        shr eax, 31
        or eax, 2
        add esp, 12
        ret

So what is going on here? Let's see. The first line is clear: the value of the variable x is loaded into eax.

You should remember the next instruction from the previous article: a logical right shift by 31 bits. After it, only the most significant bit of the number remains, i.e. its sign bit, now in the lowest position.

Next comes a bitwise OR with 2, so we end up with either 10 or 11 in binary, i.e. 2 or 3. That's all; the remaining lines are the function epilogue.

Interestingly, such code is not hard to come up with by hand: we simply combine the sign bit of x with 2.
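Written by hand in C, the clang trick looks roughly like this. It is a sketch that assumes a 32-bit int; `sign_or_two` is a hypothetical name. Converting to uint32_t before shifting makes it a logical shift (shr), so what remains is exactly the sign bit.

```c
#include <stdint.h>

/* C rendering of clang's "shr eax, 31 / or eax, 2" for
 *     if (x < 0) return 3; return 2;  */
int sign_or_two(int32_t x)
{
    uint32_t sign = (uint32_t)x >> 31;   /* shr: 1 if x < 0, else 0 */
    return (int)(sign | 2u);             /* 0b10 | sign: 2 or 3 */
}
```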

gcc 4.8.5, for example, follows similar logic in a slightly different way:

        sar eax, 31
        not eax
        add eax, 3

sar is also a right shift, but it works differently: it fills the vacated high bits with copies of the most significant bit, i.e. the sign, so the sign is preserved.

[1000] shr [0100] shr [0010] shr [0001]

[1000] sar [1100] sar [1110] sar [1111]

So if all bits are ones after the shift, the number was negative: inverting all the bits gives zero, and adding 3 gives exactly what we wanted. If the number is non-negative, all bits are zeros, and inverting turns them into ones, which is -1; adding 3 gives 2.
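The gcc variant can be written in C the same way. Note the assumption: in ISO C, right-shifting a negative signed value is implementation-defined; gcc and clang document it as an arithmetic shift (sar), which this sketch relies on. `sign_not_add` is a hypothetical name.

```c
#include <stdint.h>

/* C rendering of gcc's "sar eax, 31 / not eax / add eax, 3" for
 *     if (x < 0) return 3; return 2;
 * Assumes >> on a negative int32_t is arithmetic (gcc/clang behaviour). */
int sign_not_add(int32_t x)
{
    int32_t mask = x >> 31;   /* sar: -1 (all ones) if x < 0, else 0 */
    return ~mask + 3;         /* not: 0 or -1; add 3: gives 3 or 2 */
}
```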

MSVC with -O2, meanwhile, behaves more predictably:

        cmp DWORD PTR _x$[ebp], eax   ; x < 0
        setl al                       ; less ? al = 1 : al = 0
        add eax, 2                    ; eax + 2

Very briefly: eax already holds zero here (which is why x is compared with eax), setl sets its low byte al to 1 if x < 0 and to 0 otherwise, and then 2 is added. This is a classic. Compilers are very fond of this technique; I hope we will meet it again.
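In C, a comparison already evaluates to 0 or 1, which is exactly what setl writes into al, so the MSVC sequence corresponds to the sketch below (`setl_plus_two` is a hypothetical name).

```c
/* C rendering of MSVC's "setl al / add eax, 2" for
 *     if (x < 0) return 3; return 2;  */
int setl_plus_two(int x)
{
    int al = (x < 0);   /* setl al: 1 if x < 0, else 0 */
    return al + 2;      /* add eax, 2 */
}
```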

The else statement

I think everyone understands that else does not test the opposite condition again; the code simply gets one more label, and that is all there is to it. Let's confirm this with the following code:

    int x = 10;
    int c = 0;
    if (x < 4) {
        c = 3;
    } else {
        c = 2;
    }
    return c;

ASM:

        mov DWORD PTR [ebp-8], 10
        mov DWORD PTR [ebp-4], 0
        cmp DWORD PTR [ebp-8], 3
        jg .L2
        mov DWORD PTR [ebp-4], 3
        jmp .L3
    .L2:
        mov DWORD PTR [ebp-4], 2
    .L3:
        mov eax, DWORD PTR [ebp-4]

As you can see, the only difference is that after executing the body of the if, we jump over the body of the else block (the code at the .L2 label). I hope a detailed analysis is not needed; everything should be clear.
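As with the first example, the if/else listing can be mirrored in C with goto to show that else needs no second test. This is an illustrative sketch only; `lowered_if_else` is a hypothetical name, and the comments map each statement to the -O0 instructions above.

```c
/* Hypothetical goto rendering of the -O0 listing for
 *     if (x < 4) { c = 3; } else { c = 2; } return c;  */
int lowered_if_else(int x)
{
    int c = 0;            /* mov DWORD PTR [ebp-4], 0 */

    if (x > 3)            /* cmp DWORD PTR [ebp-8], 3 ; jg .L2 */
        goto L2;
    c = 3;                /* then-branch: mov DWORD PTR [ebp-4], 3 */
    goto L3;              /* jmp .L3: jump over the else body */
L2:
    c = 2;                /* else-branch: .L2: mov DWORD PTR [ebp-4], 2 */
L3:
    return c;             /* .L3: mov eax, DWORD PTR [ebp-4] */
}
```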

Logical operations in if

Let's look at a slightly non-standard example:

    int main(void) {
        int a = -1;
        if (a++ < 0 || a++ > 5) {
            a++;
        } else {
            a += 2;
        }
        return 0;
    }

First, try to answer: what value will the variable "a" hold at the end? If that was easy for you, you already have some idea of what the asm code will look like.

        mov DWORD PTR [ebp-4], -1
        mov eax, DWORD PTR [ebp-4]   ;#1
        lea edx, [eax+1]             ;#1
        mov DWORD PTR [ebp-4], edx   ;#1
        test eax, eax                ;#1
        js .L2                       ;#1
        mov eax, DWORD PTR [ebp-4]   ;#2
        lea edx, [eax+1]             ;#2
        mov DWORD PTR [ebp-4], edx   ;#2
        cmp eax, 5                   ;#2
        jle .L3                      ;#2
    .L2:
        mov eax, 1                   ;#3
        jmp .L4                      ;#3
    .L3:
        mov eax, 0                   ;#4
    .L4:
        test al, al                  ;#5
        je .L5                       ;#5
        add DWORD PTR [ebp-4], 1     ;#6
        jmp .L6                      ;#6
    .L5:
        add DWORD PTR [ebp-4], 2     ;#7
    .L6:
        mov eax, 0                   ;#8

Let's work through it. I have numbered the operations so that it is easier to see what belongs to what.

#1: a++ < 0. The value before the increment is loaded into eax, and it is exactly this value that must be compared with zero. The test instruction works like and, but it changes only the flags, not its operands. Here, after test we check the sign of the number: if the sign bit is 1, we jump. The incremented value is also written back to the stack. The increment itself is done by lea edx, [eax + 1]. The lea instruction is meant for loading an effective address, but here it replaces two instructions at once: mov edx, eax and add edx, 1.
#2: a++ > 5. Essentially the same thing happens, except that we jump to the .L3 label if a <= 5. So we reach the .L2 label if either the first or the second condition holds. Note that the second condition is not evaluated at all if the first one holds. But you should have known that already.
#3: eax becomes 1
#4: eax becomes 0
#5: the low byte of eax is compared with zero; if it is zero, we jump to the .L5 label
#6: otherwise, 1 is added to the variable "a", and we jump to the end of the program
#7: if the low byte of eax was zero, 2 is added to the variable "a"

So #3 sets a "flag" (eax = 1) when at least one of the two conditions holds, and #4 clears it otherwise; #5 then checks the flag: if it is set, 1 is added, otherwise 2.
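The whole listing can be rendered in C with goto and an explicit flag variable, next to the original expression, so the two can be compared for several starting values. This is an illustrative sketch; `lowered_or` and `direct_or` are hypothetical names.

```c
/* Hypothetical goto rendering of the gcc -O0 listing for
 *     if (a++ < 0 || a++ > 5) { a++; } else { a += 2; }
 * with the #1..#7 markers from the text. Returns the final value of a. */
int lowered_or(int a)
{
    int old, flag;

    old = a; a = old + 1;    /* #1: mov eax, [a] ; lea edx, [eax+1] ; mov [a], edx */
    if (old < 0)             /* #1: test eax, eax ; js .L2 */
        goto L2;
    old = a; a = old + 1;    /* #2: the second a++, reached only if #1 failed */
    if (old <= 5)            /* #2: cmp eax, 5 ; jle .L3 */
        goto L3;
L2:
    flag = 1;                /* #3: at least one condition held */
    goto L4;
L3:
    flag = 0;                /* #4: neither condition held */
L4:
    if (flag == 0)           /* #5: test al, al ; je .L5 */
        goto L5;
    a += 1;                  /* #6: then-branch */
    goto L6;
L5:
    a += 2;                  /* #7: else-branch */
L6:
    return a;
}

/* The original C, for comparison. */
int direct_or(int a)
{
    if (a++ < 0 || a++ > 5) {
        a++;
    } else {
        a += 2;
    }
    return a;
}
```

For a = -1, as in the example, both return 1: the first condition holds, so the second a++ never runs.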

It is also worth noting that in if (a < 0 && a < -5), if the first condition is false, the second one is not evaluated either.
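Short-circuit evaluation of && can be observed directly by counting how many comparisons actually run. This is a sketch with a hypothetical helper `is_less` that increments a counter on every call.

```c
/* Hypothetical helper: counts its own calls, then compares x < bound. */
static int is_less(int x, int bound, int *calls)
{
    ++*calls;
    return x < bound;
}

/* Evaluates "a < 0 && a < -5" and reports how many comparisons ran. */
int and_evaluations(int a)
{
    int calls = 0;
    (void)(is_less(a, 0, &calls) && is_less(a, -5, &calls));
    return calls;   /* 1 if a < 0 was false, 2 otherwise */
}
```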

Conclusion

We briefly reviewed the conditions in C. You saw that the compiler can either slightly modify the code or change it beyond recognition during optimization.

Unfortunately, the article has grown so long that there was no room for the switch statement; if you are interested, we can cover it in the next article, along with optimized programs that use else.


Source: https://habr.com/ru/post/345460/
