
Great news awaits gcc users upgrading to version 4.9.0: new optimizations that exploit undefined behavior can "break" (in fact, really break) existing code that, for example, compares against null pointers that were previously passed to memmove() and a number of other standard library functions.
For example, it is claimed that in this code:
```c
#include <string.h>

int wtf( int* to, int* from, size_t count )
{
    memmove( to, from, count );
    if( from != 0 )
        return *from;
    return 0;
}
```
the new gcc may remove the comparison of the pointer with zero, and as a result the call wtf(0, 0, 0) will dereference a null pointer (and crash the program).
At first glance, it looks as if the compiler broke the program on purpose. Some readers are already full of indignation (especially at the "obscene" example code) and are rushing to the comments to express it. Not so fast: first, let's see what the C99 Standard has to say about this.
Section 7.21 describes the "string functions" declared in the header string.h. Paragraph 7.21.1/2 says that a size_t argument giving an array length may be zero on a call, but "unless explicitly stated otherwise in the description of a particular function in this subclause, pointer arguments on such a call shall still have valid values, as described in 7.1.4." The memmove() function is described in 7.21.2.2, so it is one of these "string functions", and its description says nothing about null pointers being acceptable as arguments.
TL;DR: see 7.1.4, which says: "If an argument to a function has an invalid value (such as <...>, a null pointer) <...>, the behavior is undefined."
Thus, passing a null pointer to memmove() results in undefined behavior, even if the third argument (the number of bytes) is zero. The compiler draws the following conclusion from this: if a pointer has been passed to memmove(), it can be assumed to be non-null, and the rest of the code can be optimized accordingly. This idea is explained in detail, with examples, in this excellent publication.
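To make the compiler's reasoning concrete, here is a sketch, written back out as C, of what gcc is effectively allowed to turn wtf() into once it sees that from has been passed to memmove(). The name wtf_as_optimized is hypothetical, and this illustrates the permitted transformation rather than actual gcc output.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration (not real gcc output): the function gcc may
   treat wtf() as, after inferring from the memmove() call that 'from'
   cannot be null. */
int wtf_as_optimized( int* to, int* from, size_t count )
{
    memmove( to, from, count );
    /* The 'from != 0' test is gone: passing a null 'from' to memmove()
       would already be undefined, so the compiler assumes it is non-null. */
    return *from;
}
```

With valid pointers it returns the same value as wtf(); the two only diverge for wtf(0, 0, 0), which is exactly the call the Standard leaves undefined.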
Let's try to reproduce this with MinGW and gcc 4.9.0:
```c
#include <stdio.h>
#include <string.h>

void magic1( char* to, char* from, size_t count )
{
    memmove( to, from, count );
    if( from == 0 ) {
        printf( "null\n" );
    } else {
        printf( "not null\n" );
    }
}

int main()
{
    magic1( 0, 0, 0 );
    return 0;
}
```
Compile:
gcc magic.c -O2 -o magic.exe
Running the resulting executable prints "not null".
For comparison, if you move the memmove() call below the check:
```c
void magic2( char* to, char* from, size_t count )
{
    if( from == 0 ) {
        printf( "null\n" );
    } else {
        printf( "not null\n" );
    }
    memmove( to, from, count );
}
```
then the output is the expected "null". So with the new optimization, the program's behavior can change depending on whether the memmove() call comes before or after the comparison of the pointer with zero.
That's not all. The behavior can also change when a library function is replaced with a home-grown "reinvented wheel" implementation, or vice versa:
```c
void mymemcpy( char* to, char* from, size_t count )
{
    while( count > 0 ) {
        *to++ = *from++;
        count--;
    }
}

void magic3( char* to, char* from, size_t count )
{
    mymemcpy( to, from, count );
    if( from == 0 ) {
        printf( "null\n" );
    } else {
        printf( "not null\n" );
    }
}
```
When you call magic3(0, 0, 0), the program prints "null". With the library memcpy() instead, it prints "not null".
The optimization described above is not explicitly mentioned in the description of gcc's optimization options. The closest match is -fdelete-null-pointer-checks, and indeed, with -fno-delete-null-pointer-checks this optimization is disabled, along with a number of others based on the assumption that there is no point in comparing a previously dereferenced pointer with zero. Note that the optimization described above does not involve dereferencing the pointer at all, only passing it as an argument to a string function.
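If you would rather not depend on a compiler flag, one defensive option is to never hand a possibly-null pointer to memmove() when there is nothing to copy. Below is a minimal sketch under that assumption; the wrapper checked_memmove and the function classify() (a variant of magic1() that returns its verdict instead of printing it) are hypothetical names. Because memmove() is only reached when count is non-zero, the compiler can infer that from is non-null only on that path, so the later null check has to stay.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: skips the library call for empty copies, so a
   null pointer paired with count == 0 never reaches memmove(). */
void* checked_memmove( void* to, const void* from, size_t count )
{
    if( count > 0 ) {
        memmove( to, from, count );
    }
    return to;
}

/* Hypothetical variant of magic1() that returns its verdict as a string. */
const char* classify( char* to, char* from, size_t count )
{
    checked_memmove( to, from, count );
    /* 'from' is only known to be non-null when count > 0, so this
       comparison cannot be removed for the count == 0 case. */
    return from == 0 ? "null" : "not null";
}
```

Here classify(0, 0, 0) returns "null" without ever calling memmove(). The price is one extra branch per call; whether that is preferable to a global flag such as -fno-delete-null-pointer-checks is a judgment call for each project.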
Contrary to popular belief, truly portable code is not as easy to write as we would like. Using size_t to index arrays is not enough.
Dmitry Mescheryakov,
Developer Products Department