📜 ⬆️ ⬇️

How to make C / C ++ compiler generate bad code

This is a translation of the article “ How to trick C / C ++ compilers into a generating terrible code? , The author of the original - Aater Suleman.

On a computer architecture course, I was told that the processor was like a car. The steering wheel and pedals are ISA, the engine is a micro-architecture, and the program is the driver. Continuing this analogy, I will say that using a computer is similar to controlling a machine through a remote control. The console is a cool thing, but at the same time it is important to understand how it works. Even in professional software, I have seen many examples of code that even the smartest compiler can confuse. In this article I will talk about the main methods of entangling compilers.


')
Before I fully immerse in the reasoning about compilers, I want to add a little disclaimer: although I believe that existing compilers need improvement, I also think that compilers will always have bottlenecks where the programmer will need to help them a little.

Compilers


The compiler takes your code in a high level language and converts it into machine code. The scheme of the compiler looks like this:

  1. Parse the code and build its intermediate representation.
  2. Optimize intermediate performance.
  3. Generate machine code.
  4. Perform linking.


Optimization of the code requires the compiler to perform quite a lengthy analysis. A simple example is the removal of a dead code (English dead code elimination, DCE). When running DCE, the compiler excludes code from the program, the execution of which does not affect the result of the program. For example, the loop in lines 3 and 4 will be excluded by the compiler, because the result of the loop is not used anywhere.

1: void foo(){ 2: int x; 3: for(int i = 0; i < N; i++) 4: x += i; 5: } 


When running DCE, the compiler performs a reverse data flow analysis (def-use) to determine which variables are not used. It then deletes the lines that write to these unused variables. In order for the compiler to perform DCE and other similar optimizations, it must be able to know with 100% certainty whether the variable will be accessed. This is where the possibilities for compiler obfuscation are revealed. We can use several well-known bottlenecks.

Function calls


Compiler analysis is a very complex computational task. For reference, the code of the compiler itself is probably the most complex code in the world. Moreover, optimization problems occur very quickly. To keep compile time within reasonable limits, compilers often limit the scope of analysis to functions and perform limited inter-procedural analysis (eng. Inter-procedural analysis, IPA) or do not perform it at all. In addition to this, a compiler that perceives functions as independent entities can compile several files at the same time.

Adding multiple layers of function calls, ideally also scattered across different files, is often a good way to confuse the compiler.

Let's look at this example:

 void foo(){ int x; for(int i = 0; i < N; i++) x += bar(i); } 


And in another file:

 int bar(int i){ return i; } 


The programmer might think that the extra resources here are spent only on bar() calls. However, during the compilation of foo() compiler cannot see that bar() has no side effects, such as writing to a global variable or I / O operations. Therefore, it disables DCE, and the useless loop will now also be executed. The result can be a significant drop in performance.

Pointers


I have seen many programmers use pointers unnecessarily. In theory, for pointers, the main source of waste of resources is their dereference (if any). In practice, they burn much more resources, because they tangle the compiler.

The pointer, theoretically, can point to any place in memory. As soon as the compiler encounters a pointer in the code, it can no longer be sure which variables will be accessed. Therefore, it assumes that several variables in the scope can be accessed through a pointer. In the eyes of the compiler, this often creates a bunch of false dependencies. My classic example is this cycle (I found it in an image processing program):

 for(int i = 0; i < N; i++) *a++ = *b++ + *c++; 


After careful study, a person can understand that the code actually does the following:

 for(int i = 0; i < N; i++) a[i] = b[i] + c[i]; a += N; b += N; c += N; 


The first code confuses the compiler. Even if the iterations of the loop are independent, the compiler does not consider them as such; for this reason, it cannot vectorize this piece of code. But the second - maybe.

For reference: vectorized code runs 5 times faster. The use of multi-threaded processing can give an increase of 4 times on 4 cores with little effort from the programmer. The same code, automatically vectorized, gives a speed increase of 5 times without any effort. I tried the Intel C Compiler (ICC) 9.1 and GCC 4.1. Results are shown for ICC.

If there are no more ideas, use global variables.


Global variables are considered evil for many reasons. I'll bring one more. Since they have a global scope, code using global variables cannot be fully optimized by the compiler. For example, I noticed that the following code runs 30% faster if the variable N local and not global.

 for(int i = 0; i < N; i++) a[i] = b[i] + c[i]; 


When you declare a global variable N , the compiler leaves it in memory and adds an instruction to the loop to load it. If we declare N local, the compiler loads it into a register. You can scold this behavior of the compiler (because N is not volatile ), but you have to work with what is.

Conclusion


It is important to remember about IPA and pointers when you write C / C ++ code. So you can help the compiler generate code that will be executed faster.

By the way: I noticed that both GCC and ICC are sensitive to the linking order, i.e. The generated code depends on the order in which the input files are specified on the linker command line. This can have a strong impact (up to 10%) on performance due to caching and branch prediction. If you really appreciate the performance, you should try to play around with the linking order.

Source: https://habr.com/ru/post/160995/


All Articles