
Understanding C better with the help of assembler. Part 3

In this third article we continue with conditionals. Last time we never got around to the optimized versions of if-else.

Perhaps because of this, some fair questions arose. For example, why does the compiler, given this code:

    int x = 10;
    scanf("%i", &x);
    int c = 0;
    if (x < 4) {
        c = 3;
    } else {
        c = 2;
    }
    return c;

generate code with two jumps rather than one?

Indeed, the code we wrote is a little strange, because it could obviously have been written like this:
    int x = 10;
    scanf("%i", &x);
    int c = 2;
    if (x < 4) {
        c = 3;
    }
    return c;

It is the programmer, not the compiler, who should be asked why the else is needed. But if we look at the optimized build of the first version, we see that the compiler is not so stupid:

    sub esp, 12
    mov dword ptr [esp + 8], 10
    sub esp, 8
    lea eax, [esp + 16]
    push eax
    push .L.str
    call scanf
    add esp, 16
    xor eax, eax                ; zero eax
    cmp dword ptr [esp + 8], 4  ; compare x with 4
    setl al                     ; if less, al is set to 1
    or eax, 2                   ; OR the result with 2
    add esp, 12
    ret
.L.str:
    .asciz "%i"

The first thing to note is that there are no jumps at all. We have already seen how clang handles a condition like this. Exactly the same optimized code is generated for the second variant.
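To make the setl/or pair more tangible, here is a minimal sketch of the same idea written back in C. The function names are illustrative and not from the article; the only assumption is that a comparison in C evaluates to 0 or 1, which the language guarantees.

```c
/* Branchy form, as in the original source. */
int pick_branchy(int x)
{
    int c;
    if (x < 4) {
        c = 3;
    } else {
        c = 2;
    }
    return c;
}

/* Branchless form mirroring setl + or:
   (x < 4) evaluates to 1 or 0, and OR-ing it with 2 gives 3 or 2. */
int pick_branchless(int x)
{
    return (x < 4) | 2;
}
```

Both functions return the same value for any x, which is why the compiler is free to drop the jumps.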

Now let's recall this example:

    int main(void)
    {
        int a = -1;
        scanf("%i", &a);
        if (a++ < 0 || a++ > 5) {
            a++;
        } else {
            a += 2;
        }
        return a;
    }

Was it a bit scary last time? The optimized version manages to trim both the amount of code and the number of labels somewhat (I'll skip scanf):

    mov ecx, dword ptr [esp + 8]
    lea eax, [ecx + 1]
    test ecx, ecx
    mov dword ptr [esp + 8], eax
    js .LBB0_2
    lea eax, [ecx + 2]
    cmp ecx, 5
    mov dword ptr [esp + 8], eax
    jl .LBB0_3
.LBB0_2:
    inc eax
    add esp, 12
    ret
.LBB0_3:
    add ecx, 4
    mov eax, ecx
    add esp, 12
    ret

Up to the label .LBB0_2, the code corresponds to the condition itself. Note that the optimized version still respects the rule that the second operand of "or" is not evaluated if the first is true. It would be quite something if that rule were broken...

However, there is a significant difference from the non-optimized version. Let's go through the code so that everyone understands what is happening:

The first line loads the value of the variable "a" into ecx.
The second line puts a + 1 into eax.
Next comes the comparison with zero, so that the js instruction jumps to label 2 if the number was negative.
The fourth line stores a + 1 from eax into the variable "a".

In the non-optimized version, we loaded the value of the variable into eax again, but now we use the old value, which is still in ecx. Next we compute a + 2 and save it to eax.

Then we compare the value from before the increments with 5, so that the jl instruction jumps to label 3 if it is less, thus skipping the body of the if.

A fair question should arise: “So clang produces two different results for the optimized and non-optimized versions of the program?”

No. The point is that the jump to the else branch happens when the value is less than 5, not less than or equal to 5. The compiler accounted for the borderline case where the variable equals 5 and built the condition accordingly. As a result, it saved an extra memory access.

In the optimized version (a = 5):
5 >= 5 => 5 + 3
In the non-optimized version (a is 6 when compared and 7 afterwards):
6 > 5 => 7 + 1

Otherwise, the optimized version can simply add 4 to the original value and put it into eax. Such are the wonders.
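The reasoning above can be checked with a small sketch that expresses the optimized assembly in C, in terms of the input value only. The function names are made up for illustration, and the second function is my reading of the clang listing rather than anything the compiler emits as source.

```c
/* The condition exactly as written in the source. */
int original(int a)
{
    if (a++ < 0 || a++ > 5) {
        a++;
    } else {
        a += 2;
    }
    return a;
}

/* What the optimized assembly computes from the original value of a. */
int as_optimized(int a)
{
    if (a < 0) {
        return a + 2;   /* js .LBB0_2: eax holds a + 1, then inc eax */
    }
    if (a < 5) {
        return a + 4;   /* jl .LBB0_3: add ecx, 4 */
    }
    return a + 3;       /* fall through to .LBB0_2: eax holds a + 2, then inc eax */
}
```

For the borderline input 5 both functions return 8, and they agree on every other input as well, as long as the additions do not overflow.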

Why am I comparing the non-optimized gcc version with the optimized clang one? Well, gcc does the same thing, just not as neatly, while clang's code looks consistent and compact.

It is worth mentioning that the VS2015 compiler at full optimization left only one label:

    mov eax, DWORD PTR _a$[ebp]  ; a → eax
    add esp, 8
    mov ecx, eax
    inc eax
    test ecx, ecx
    js SHORT $LN4@main           ; if (a<0)
    mov ecx, eax
    inc eax
    cmp ecx, 5
    jg SHORT $LN4@main           ; if (a>5)
    add eax, 2                   ; else a+=2
    ...
$LN4@main:
    ...
    inc eax                      ; a++ in if
    ...

In between there is code unrelated to ours, so I simply removed it. As you can see, the conditions are arranged so as to skip the else, so there is only one label. But add some code after the else, and there will be two labels again. In that case, the code will resemble the non-optimized version in that one value is assigned first and then overwritten if the condition is met.

switch


Well, what about the switch statement? How does a battery of ifs differ from this “beautiful” operator? Let's find out using the following code as an example:

    char c = getchar();
    int a;
    if (c == 'q') {
        a = 0;
    } else if (c == 'w') {
        a = 1;
    } else if (c == 'e') {
        a = 2;
    } else {
        a = 4;
    }
    return a;

and its alternative:

    char c = getchar();
    int a;
    switch (c) {
    case 'q': a = 0; break;
    case 'w': a = 1; break;
    case 'e': a = 2; break;
    default: a = 4;
    }
    return a;

We will not examine the non-optimized version in detail. I think everyone understands that the differences, if any, will be minimal. Amusingly, though, it is the switch that ends up generating more code (gcc 7.2):

    movsx eax, BYTE PTR [ebp-13]
    cmp eax, 113
    je .L3
    cmp eax, 119
    je .L4
    cmp eax, 101
    je .L5
    jmp .L8

In the case of if, you should already be able to picture the code: comparison, action, jump. Here, as you can see, the actions are deferred until later.
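One way to picture this layout is to write it in C with goto. This is only a sketch: the label names mirror the ones in the gcc fragment, but the bodies placed under them are my assumption, since the listing above shows only the chain of comparisons.

```c
/* All comparisons first, then the bodies, each reached by a jump. */
int switch_layout(char c)
{
    int a;
    if (c == 'q') goto L3;   /* cmp eax, 113 / je .L3 */
    if (c == 'w') goto L4;   /* cmp eax, 119 / je .L4 */
    if (c == 'e') goto L5;   /* cmp eax, 101 / je .L5 */
    goto L8;                 /* jmp .L8, the default case */
L3: a = 0; goto done;
L4: a = 1; goto done;
L5: a = 2; goto done;
L8: a = 4;
done:
    return a;
}
```

With an if chain, each body would sit right after its own comparison instead.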

With optimization, gcc generates shorter code for the switch, while clang shows no difference at all between the two. Therefore, let's consider only the switch in gcc. But first, think: how could you get rid of the switch in this code altogether?

Once you have come up with an answer, open the spoiler and look:

asm code
    push DWORD PTR stdin
    call _IO_getc
    lea edx, [eax-101]
    add esp, 16
    mov eax, 4
    cmp dl, 18
    ja .L1
    movzx edx, dl
    mov eax, DWORD PTR CSWTCH.3[0+edx*4]
.L1:
    mov ecx, DWORD PTR [ebp-4]
    leave
    lea esp, [ecx-4]
    ret
CSWTCH.3:
    .long 2
    .long 4
    .long 4
    .long 4
    .long 4
    .long 4
    .long 4
    .long 4
    .long 4
    .long 4
    .long 4
    .long 4
    .long 0
    .long 4
    .long 4
    .long 4
    .long 4
    .long 4
    .long 1

The idea is very simple: map the character code to the final value of "a". To do this, the smallest character code used in the switch (in our case 'e', or 101) is subtracted from the input character.

Then we put 4 into eax, because that is what the else (default) branch yields.
Next, we compare edx with 18 (only the low byte of the register is used: dl is to edx what al is to eax). If the value is greater (ja), we jump to the label .L1. If you are wondering why 18, compute 'w' - 101.

Now, if the result is between 0 and 18, the instruction mov eax, DWORD PTR CSWTCH.3[0+edx*4] does the work: it simply fetches the result from the table using the index in edx. The factor 4 is the distance in memory between adjacent entries (4 bytes each).

For 0 ('e') the result will be 2,
For 12 ('q') the result will be 0,
For 18 ('w') the result will be 1,
In other cases there will be 4.
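The same lookup can be sketched in C. The names cswtch, via_switch and via_table are illustrative; the table contents are copied from CSWTCH.3 above, and the cast to unsigned char plays the role of using dl with an unsigned comparison.

```c
/* Values for 'e' (101) through 'w' (119); 4 is the default. */
static const int cswtch[19] = {
    2, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,   /* index 0 is 'e' */
    0, 4, 4, 4, 4, 4,                     /* index 12 is 'q' */
    1                                     /* index 18 is 'w' */
};

/* The switch from the article, for comparison. */
int via_switch(char c)
{
    int a;
    switch (c) {
    case 'q': a = 0; break;
    case 'w': a = 1; break;
    case 'e': a = 2; break;
    default:  a = 4;
    }
    return a;
}

/* The table lookup that gcc builds instead of the switch. */
int via_table(char c)
{
    /* Subtract 'e' and keep only the low byte, like lea edx, [eax-101] and dl. */
    unsigned char idx = (unsigned char)(c - 'e');
    if (idx > 18) {          /* cmp dl, 18 / ja .L1 */
        return 4;
    }
    return cswtch[idx];      /* mov eax, DWORD PTR CSWTCH.3[0+edx*4] */
}
```

Because the index is unsigned, characters below 'e' wrap around to large values, so a single comparison with 18 rejects both ends of the range.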

Our example is rather far removed from real life; hardly anyone uses switch like this. What if the cases perform some actions? You might be surprised, but essentially nothing changes. Consider this example:

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
    start:;  /* empty statement: before C23 a label cannot directly precede a declaration */
        char c = getchar();
        switch (c)
        {
        case 'T':
        case 't': printf("Talk\n"); break;
        case 'W':
        case 'w': printf("%f\n", sin(0)); break;
        case 'Q':
        case 'q': return 0;
        default:
            printf("wrong command\n");
        }
        goto start;
    }

asm code
 .LC0: .string "Talk" .LC2: .string "%f\n" .LC3: .string "wrong command" main: ... .L2: ;start sub esp, 12 push DWORD PTR stdin call _IO_getc sub eax, 81 add esp, 16 cmp al, 38 ja .L3 movzx eax, al jmp [DWORD PTR .L5[0+eax*4]] .L5: .long .L9 .long .L3 .long .L3 .long .L6 .long .L3 .long .L3 .long .L7 .long .L3 .long .L3 .long .L3 .long .L3 .long .L3 .long .L3 .long .L3 .long .L3 .long .L3 .long .L3 .long .L3 .long .L3 .long .L3 .long .L3 .long .L3 .long .L3 .long .L3 .long .L3 .long .L3 .long .L3 .long .L3 .long .L3 .long .L3 .long .L3 .long .L3 .long .L9 .long .L3 .long .L3 .long .L6 .long .L3 .long .L3 .long .L7 .L7: ;sin sub esp, 4 push 0 push 0 push OFFSET FLAT:.LC2 call printf add esp, 16 jmp .L2 .L6: ;Talk sub esp, 12 push OFFSET FLAT:.LC0 call puts add esp, 16 jmp .L2 .L9: ;return 0; mov ecx, DWORD PTR [ebp-4] xor eax, eax leave lea esp, [ecx-4] ret .L3: ;default sub esp, 12 push OFFSET FLAT:.LC3 call puts add esp, 16 jmp .L2 

There is a lot of code, so focus on the essentials. Now, instead of moving a number into eax, a jump is made to a specific label. But the idea itself does not change.
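Standard C has no direct way to jump through a table of labels, so the closest sketch uses a table of function pointers. Everything here is hypothetical: the handler names, their return codes and the dispatch function are stand-ins for the labels .L6, .L7, .L9 and .L3, and the handlers return a number instead of printing so that the result is easy to check. Unlike gcc's table, unused slots are left as NULL rather than pointing at the default handler.

```c
#include <stddef.h>

/* Hypothetical handlers standing in for the labels in the listing. */
static int on_talk(void)  { return 1; }    /* .L6 */
static int on_sin(void)   { return 2; }    /* .L7 */
static int on_quit(void)  { return 0; }    /* .L9 */
static int on_wrong(void) { return -1; }   /* .L3, the default */

typedef int (*handler)(void);

int dispatch(char c)
{
    /* 39 slots cover 'Q' (81) through 'w' (119); unlisted slots are NULL. */
    static const handler table[39] = {
        ['Q' - 'Q'] = on_quit, ['T' - 'Q'] = on_talk, ['W' - 'Q'] = on_sin,
        ['q' - 'Q'] = on_quit, ['t' - 'Q'] = on_talk, ['w' - 'Q'] = on_sin,
    };
    unsigned char idx = (unsigned char)(c - 'Q');   /* sub eax, 81 */
    if (idx > 38 || table[idx] == NULL) {           /* cmp al, 38 / ja .L3 */
        return on_wrong();
    }
    return table[idx]();                            /* jmp through .L5 */
}
```

For example, dispatch('t') and dispatch('T') land in the same handler, just as both table slots in the listing hold .L6.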

Conclusion


It seems we have now dealt with conditionals completely. If you had doubts about switch, there is no longer any reason to be afraid of it. In the next article, we will look at loops.


Source: https://habr.com/ru/post/347132/

