📜 ⬆️ ⬇️

Uncertain behavior and the Fermat theorem

In accordance with C and C ++ standards, if program execution leads to an overflow of a signed integer variable, or to any of hundreds of other “undefined actions” (undefined behavior, UB), then the result of the program execution can be any: it can tweet obscenity, can format the disk for you ...
Alas, in reality, “Easter eggs”, which would force the program in the case of UB to do something out of the ordinary, have not met since GCC 1.17 - it launched nethack when it encountered unknown #pragma in the program code. Usually, the result of UB is much duller: the compiler simply optimizes the code for those cases when UB does not occur, without giving the slightest importance to what this code will do in the case of UB - because the standard allows you to do anything in this case!
To illustrate how the abundance of UB in the standard allows the compiler to perform unobvious optimizations, Raymond Chen gives this code example:

 int table[4]; bool exists_in_table(int v) { for (int i = 0; i <= 4; i++) { if (table[i] == v) return true; } return false; } 

In the condition of the cycle, we made a mistake by one, putting <= instead of < . As a result, exists_in_table() must either return true on one of the first four iterations, or it will read the table[4] , which is UB, and in this case, exists_in_table() can do anything, including return true ! In full compliance with the standard, the compiler can optimize the exists_in_table() code to
 int table[4]; bool exists_in_table(int v) { return true; } 

Such optimizations sometimes catch programmers by surprise. John Reger gives a selection of examples where UB led to significant consequences:

In 2012, Reger announced a contest for “the most bizarre UB result”. One of the winners took advantage of the fact that using a pointer after it is passed to a parameter in realloc() is UB. His program prints different values ​​using the same pointer:
 #include <stdio.h> #include <stdlib.h> int main() { int *p = (int*)malloc(sizeof(int)); int *q = (int*)realloc(p, sizeof(int)); *p = 1; *q = 2; if (p == q) printf("%d %d\n", *p, *q); } 

 $ clang -O realloc.c;  ./a.out 
 12

')
The programs of the other winners of the competition, in my opinion, are more boring and partially overlap with the examples given earlier.
But nothing compares to Reger’s example - the “refutation” of Fermat's compiler theorem .

He explains that for some embedded application he needed to write an infinite loop in C ++ so that the optimizing compiler could not remove all the code following the loop from the program. Modern compilers are smart enough to recognize “idioms” like while (1) { } or for (;;) { } - they understand that the code following such a cycle will never be executed, which means there is no need to compile it. For example, the function
  void foo (void) { for (;;) { } open_pod_bay_doors(); } 

- most compilers “shorten” to a single instruction:
  foo: L2: jmp L2 

Clang turns out to be even smarter, it is able to recognize (and delete) even such disguised endless cycles:
  unsigned int i = 0; do { i+=2; } while (0==(i&1)); 

Here, as in the previous example, the compiler is able to prove that it is impossible to exit the loop, which means that it can be replaced with a single jmp instruction.

Reger decided: Fermat's compilers are unlikely to prove the theorem during compilation?

 int fermat (void) { const int MAX = 1000; int a=1,b=1,c=1; while (1) { if (((a*a*a) == ((b*b*b)+(c*c*c)))) return 1; a++; if (a>MAX) { a=1; b++; } if (b>MAX) { b=1; c++; } if (c>MAX) { c=1; } } return 0; } #include <stdio.h> int main (void) { if (fermat()) { printf ("Fermat's Last Theorem has been disproved.\n"); } else { printf ("Fermat's Last Theorem has not been disproved.\n"); } return 0; } 

 regehr @ john-home: ~ $ icc fermat2.c -o fermat2
 regehr @ john-home: ~ $ ./fermat2
 Fermat's Last Theorem has been disproved.
 regehr @ john-home: ~ $ suncc -O fermat2.c -o fermat2
 "fermat2.c", line 20: warning: statement not reached
 regehr @ john-home: ~ $ ./fermat2
 Fermat's Last Theorem has been disproved.


How so? The loop ends with return 1; - the compiler was able to prove that Fermat's theorem is incorrect ?!

I wonder what values a,b,c he “found”?

Reger added print "found values" before return 1; - that's when the compiler recognized the impotence and honestly compiled an infinite loop. (Nothing, naturally, was printed.)

Despite the fact that this program does not contain any arithmetic overflows (the factors vary within 1..1000, the sum of their cubes does not exceed 2 31 ), the C ++ standard declares an “infinite action” an infinite loop without changing the external state - therefore, C ++ compilers can consider any such cycle is finite.
The compiler easily sees that the only way out of the while(1) is the operator return 1; , and operator return 0; end fermat() unreachable; so he optimizes this feature until
 int fermat (void) { return 1; } 


In other words, the only possibility to write such an infinite loop in C ++ that the compiler could not remove is to add a change in the external state to the loop. The easiest (and standard!) Way to do this is to change a variable declared as volatile .

Source: https://habr.com/ru/post/229963/


All Articles