Uncertain behavior and the Fermat theorem

In accordance with C and C ++ standards, if program execution leads to an overflow of a signed integer variable, or to any of hundreds of other “undefined actions” (undefined behavior, UB), then the result of the program execution can be any: it can tweet obscenity, can format the disk for you ...
Alas, in reality, “Easter eggs”, which would force the program in the case of UB to do something out of the ordinary, have not met since GCC 1.17 - it launched nethack when it encountered unknown #pragma in the program code. Usually, the result of UB is much duller: the compiler simply optimizes the code for those cases when UB does not occur, without giving the slightest importance to what this code will do in the case of UB - because the standard allows you to do anything in this case!
To illustrate how the abundance of UB in the standard allows the compiler to perform unobvious optimizations, Raymond Chen gives this code example:

 int table[4]; bool exists_in_table(int v) { for (int i = 0; i <= 4; i++) { if (table[i] == v) return true; } return false; }

In the condition of the cycle, we made a mistake by one, putting <= instead of < . As a result, exists_in_table() must either return true on one of the first four iterations, or it will read the table[4] , which is UB, and in this case, exists_in_table() can do anything, including return true ! In full compliance with the standard, the compiler can optimize the exists_in_table() code to

 int table[4]; bool exists_in_table(int v) { return true; }

Such optimizations sometimes catch programmers by surprise. John Reger gives a selection of examples where UB led to significant consequences:

UB, with a significant shift to the left, allowed the compiler to remove from the NaCl a check of the return address important to safety.
In the Linux kernel, dereferencing a pointer before checking for NULL allowed the compiler to remove this check, creating a vulnerability in the system.
In Debian, using an uninitialized array as a source of random data to initialize the RNG seed led to the fact that the entire seed calculation was deleted by the compiler.
When the variable p not initialized, the program can execute both the code inside if (p) { ... } and the code inside if (!p) { ... } .
When the INT_MAX variable x is equal to INT_MAX , the expression (x+1)>x in different places of the same program can be interpreted as both true and false.
An infinite loop, such as finding a non-existent value, can be removed by the compiler. For example, so the compiler can "disprove" the Great Fermat theorem. (We will analyze this example in detail.)
The compiler can make the program "clairvoyant" by rearranging the operation, potentially capable of breaking down the process (division by zero, reading at the zero pointer, etc.), ahead of the output operation. For example, this code:
```
 int a; void bar (void) { setlinebuf(stdout); printf ("hello!\n"); } void foo3 (unsigned y, unsigned z) { bar(); a = y%z; } int main (void) { foo3(1,0); return 0; } 
```
- will have time to print a message before SIGFPE, if it is compiled without optimizations; and will crash right at the start, if you enable optimization. The program "knows in advance" that it is destined to fall with SIGFPE, and therefore does not even bother printing a message. A similar example, only with SIGSEGV, is Chen.

In 2012, Reger announced a contest for “the most bizarre UB result”. One of the winners took advantage of the fact that using a pointer after it is passed to a parameter in realloc() is UB. His program prints different values using the same pointer:

 #include <stdio.h> #include <stdlib.h> int main() { int *p = (int*)malloc(sizeof(int)); int *q = (int*)realloc(p, sizeof(int)); *p = 1; *q = 2; if (p == q) printf("%d %d\n", *p, *q); }

 $ clang -O realloc.c;  ./a.out 
 12

')
The programs of the other winners of the competition, in my opinion, are more boring and partially overlap with the examples given earlier.
But nothing compares to Reger’s example - the “refutation” of Fermat's compiler theorem .

He explains that for some embedded application he needed to write an infinite loop in C ++ so that the optimizing compiler could not remove all the code following the loop from the program. Modern compilers are smart enough to recognize “idioms” like while (1) { } or for (;;) { } - they understand that the code following such a cycle will never be executed, which means there is no need to compile it. For example, the function

  void foo (void) { for (;;) { } open_pod_bay_doors(); }

- most compilers “shorten” to a single instruction:

  foo: L2: jmp L2

Clang turns out to be even smarter, it is able to recognize (and delete) even such disguised endless cycles:

  unsigned int i = 0; do { i+=2; } while (0==(i&1));

Here, as in the previous example, the compiler is able to prove that it is impossible to exit the loop, which means that it can be replaced with a single jmp instruction.

Reger decided: Fermat's compilers are unlikely to prove the theorem during compilation?

 int fermat (void) { const int MAX = 1000; int a=1,b=1,c=1; while (1) { if (((a*a*a) == ((b*b*b)+(c*c*c)))) return 1; a++; if (a>MAX) { a=1; b++; } if (b>MAX) { b=1; c++; } if (c>MAX) { c=1; } } return 0; } #include <stdio.h> int main (void) { if (fermat()) { printf ("Fermat's Last Theorem has been disproved.\n"); } else { printf ("Fermat's Last Theorem has not been disproved.\n"); } return 0; }

 regehr @ john-home: ~ $ icc fermat2.c -o fermat2
 regehr @ john-home: ~ $ ./fermat2
 Fermat's Last Theorem has been disproved.
 regehr @ john-home: ~ $ suncc -O fermat2.c -o fermat2
 "fermat2.c", line 20: warning: statement not reached
 regehr @ john-home: ~ $ ./fermat2
 Fermat's Last Theorem has been disproved.

How so? The loop ends with return 1; - the compiler was able to prove that Fermat's theorem is incorrect ?!

I wonder what values a,b,c he “found”?

Reger added print "found values" before return 1; - that's when the compiler recognized the impotence and honestly compiled an infinite loop. (Nothing, naturally, was printed.)

Despite the fact that this program does not contain any arithmetic overflows (the factors vary within 1..1000, the sum of their cubes does not exceed 2 ³¹ ), the C ++ standard declares an “infinite action” an infinite loop without changing the external state - therefore, C ++ compilers can consider any such cycle is finite.
The compiler easily sees that the only way out of the while(1) is the operator return 1; , and operator return 0; end fermat() unreachable; so he optimizes this feature until

 int fermat (void) { return 1; }

In other words, the only possibility to write such an infinite loop in C ++ that the compiler could not remove is to add a change in the external state to the loop. The easiest (and standard!) Way to do this is to change a variable declared as volatile .

Source: https://habr.com/ru/post/229963/

All Articles

Uncertain behavior and the Fermat theorem

More articles: