
How ill-conceived compiler warnings help to spoil completely correct code.

This is a post about the complexities of interaction between artificial and natural intelligence. Modern compilers have fairly advanced static analysis tools that can issue warnings about various suspicious constructs in code. In theory, these warnings should help developers make fewer errors.

In practice, compiler warnings are not always equally useful. Often they do not help developers but get in their way, and can provoke "fixes" to completely correct code, violating the rule "if it works, don't touch it."

For example, here is the story of one line of code from a fairly popular open source library for working with XML. The context of this line:

  fseek( fp, 0, SEEK_END );
  const long filelength = ftell( fp );
  fseek( fp, 0, SEEK_SET );
  if ( filelength == -1L ) {
      return FILE_READ_ERROR;
  }
  if ( filelength >= (size_t)-1 ) {
      // ...
      return FILE_READ_ERROR;
  }
  const size_t size = filelength;
  _charBuffer = new char[size+1];
  size_t read = fread( _charBuffer, 1, size, fp );
  if ( read != size ) {
      return FILE_READ_ERROR;
  }

This fragment tries to read an entire file from disk. To do this, the file pointer is moved to the end of the file and the current position is read, giving the file length; the pointer is then moved back to the beginning. After that, memory is allocated for the entire file's data. Before allocating the memory, it is worth checking that the required size fits into size_t, to avoid hard-to-diagnose logical errors later on.

A check is performed (in the actual code the condition is inverted and the error code is returned) that filelength is strictly less than the maximum value of size_t. The check uses "strictly less" because one extra byte of memory must be allocated for the terminating null character written before the file contents are parsed.
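Here is a minimal sketch of why "strictly less" matters (the snippet is illustrative and not from the library): if size were allowed to equal (size_t)-1, adding one for the terminator would wrap around to zero.

  #include <cstddef>
  #include <cstdio>

  int main()
  {
      // Suppose the file length came out equal to the maximum size_t value.
      const size_t size = (size_t)-1;
      // size + 1 wraps around to 0: "new char[size + 1]" would then allocate
      // an empty buffer, and every later write through it would be out of
      // bounds. Hence the check for "strictly less".
      std::printf( "size + 1 = %zu\n", size + 1 );   // prints 0
      return 0;
  }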
Everything would be fine, but the compiler is indignant: on one side of the comparison is the signed type long, on the other the unsigned size_t. WARNING: -Wsign-compare from gcc !!!

What to do, where to run ??? This is a warning !!!

One of the developers corrects the check:
  if( (size_t)filelength >= (size_t)-1 ) { 

... and the compiler calms down, satisfied. Success? Alas, the initially correct code no longer works as intended. The sizes of long and size_t are not related to each other in any way: either type may be wider than the other on a particular platform. The original code required, in this situation, converting both values to a common type wide enough for both. In the "corrected" code, significant bits are lost whenever long is wider than size_t.
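A sketch of how the truncation bites, assuming a hypothetical platform where long is 64-bit and size_t is only 32-bit (the file size below is illustrative):

  #include <cstddef>
  #include <cstdio>

  int main()
  {
      // Assume sizeof(long) == 8 and sizeof(size_t) == 4 on this platform.
      const long filelength = 0x100000000L;   // 4 GiB: too big for 32-bit size_t
      // The cast silently drops the high bits: (size_t)0x100000000 becomes 0,
      // so the "does it fit?" check wrongly decides that the file fits.
      if ( (size_t)filelength >= (size_t)-1 ) {
          std::puts( "too big, bail out" );   // not taken on such a platform
      } else {
          std::puts( "fits -- wrong!" );      // taken: size silently becomes 0
      }
      return 0;
  }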

So, in order to get rid of a warning on initially correct code, that code was "fixed" and thereby broken.

The error could have lived on for many years, "delighting" users working with very large files on a certain subset of systems. It did not: the library code is open and even hosted on GitHub, and soon a pull request arrived with an edit to this code. In the corrected code, the check looks like this:
  if( (unsigned long)filelength >= (size_t)-1 ) { 

The corrected code compares two unsigned values, so the terrible problem (tm) that gcc reported with the -Wsign-compare warning is gone. The conversion of long to unsigned long does not misinterpret the data here, because the only possible negative value, -1L, has already been handled earlier.

Success? No, not so fast.

The edit is merged into the main branch, after which new warnings start arriving. When compiling for platforms where unsigned long is narrower than size_t, clang issues the warning -Wtautological-constant-out-of-range-compare: "WARNING !!! The range of unsigned long values is so hopelessly limited that this comparison always yields the same result and is utterly meaningless."
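A sketch of what clang is complaining about, assuming an LLP64-style platform such as 64-bit Windows, where unsigned long is 32 bits wide and size_t is 64 bits wide:

  #include <cstddef>

  bool Check( unsigned long value )
  {
      // Assume sizeof(unsigned long) == 4 and sizeof(size_t) == 8.
      // value can be at most 0xFFFFFFFF, while (size_t)-1 is
      // 0xFFFFFFFFFFFFFFFF, so this condition is always false:
      // exactly what -Wtautological-constant-out-of-range-compare reports.
      return value >= (size_t)-1;
  }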

Fully portable code on any platform, you say? No: when the explicit conversion of long to size_t could silently lose significant bits, everyone was fine with it; but now clang issues a useless warning, and library users leave comments on the edits and file defect reports. This is a warning !!!

What to do... it is necessary to arrange things so that on platforms where unsigned long is narrower than size_t there is no comparison at all, and on other platforms there is.

And here is the solution to the "problem":

  template <bool = (sizeof(long) >= sizeof(size_t))>
  struct LongFitsIntoSizeTMinusOne {
      static bool Fits( long value )
      {
          return (unsigned long)value < (size_t)-1;
      }
  };

  template <>
  bool LongFitsIntoSizeTMinusOne<false>::Fits( long /*value*/ )
  {
      return true;
  }

- What is this, Barrymore ???
- It's the Baskervilles pattern, sir !!!

A function template could have been used here, but default template arguments for function templates are only supported starting from C++11, and the library, trying to reach as wide an audience as possible, tries to do without C++11. struct is used instead of class so that public visibility does not have to be spelled out explicitly.
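For comparison, here is a sketch of what the C++11 function-template variant could look like; this is an illustration, not the library's actual code:

  #include <cstddef>

  // The default template argument (C++11 and later only) selects the
  // right specialization at compile time.
  template <bool LongIsAtLeastSizeT = (sizeof(long) >= sizeof(size_t))>
  bool LongFitsIntoSizeTMinusOne( long value );

  template <>
  inline bool LongFitsIntoSizeTMinusOne<true>( long value )
  {
      // long is at least as wide as size_t: the comparison is meaningful.
      return (unsigned long)value < (size_t)-1;
  }

  template <>
  inline bool LongFitsIntoSizeTMinusOne<false>( long /*value*/ )
  {
      // Every non-negative long fits into size_t: no comparison, no warning.
      return true;
  }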

The library's template is used as follows:
  if ( !LongFitsIntoSizeTMinusOne<>::Fits( filelength ) ) { 

Depending on the relationship between the sizes of long and size_t, the compiler chooses either the specialization or the primary template. In the first case there is no meaningless comparison, and the compiler has nothing to WARN about.

The edit with this template is merged into the main branch, and the "problem" is resolved.

To summarize: perfectly correct code with a single comparison of an integer variable against an integer constant was edited three times to make two very popular compilers happy. The first of those edits broke the completely correct code, but it pleased the compiler for which it was made. In the end, instead of a simple comparison, we got something closer to FizzBuzz Enterprise. But there are no warnings.

Now let's talk ...

How popular does a hypothetical piece of open source code have to be for it to have a chance of going all the way from the first edit to the last, and how much more likely is it that less popular code would have stayed broken after the first edit? Arriving at the solution shown above clearly took more than an hour of work by developers of far-from-lowest qualification, and that solution does not even fix a bug: it merely turns correct code into correct code without warnings.

Why did it happen? There are exactly two reasons.

The first reason is that developers trust the compiler almost unconditionally. Next to such a complex program, many developers feel as they would next to the 107-meter Monument to the Conquerors of Space on Prospekt Mira in Moscow. A modern compiler is a very complex program, and many believe it cannot be wrong. Alas, this belief is mistaken: the more complex a program, the more opportunities it has to err. A compiler may issue a warning on completely correct code.

The second reason is that the developers of static analyzers (especially those built into compilers) usually believe that a false positive is not a problem and nothing terrible: the main thing is to WARN, and "the developer must figure it out." The developer will certainly figure something out, but it is far from guaranteed that the code will get better as a result.

In discussions of this problem, the idea comes up that the compiler cannot determine whether there is a real problem in the code, and that only "the developer must." Is the compiler really so hopelessly helpless, so unable to "understand" the code? It is worth recalling code optimization. Nontrivial optimizations (automatic vectorization, loop unrolling, computing invariants, removing redundant checks, and other useful and clever transformations) require the compiler to analyze code very thoughtfully.

How much thoughtful analysis is needed in this example? The value being tested comes from a call to the ftell() function, whose behavior is documented. gcc, for example, already knows how to optimize (and "break" non-Standard code) using the Standard's requirement that only non-null pointers be passed to "string functions" such as memcpy(); details are in this publication. How does it do this? The gcc developers added this ability to it: to recognize a number of functions as "string functions" and make certain assumptions about the pointer values passed to them.
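A sketch of the kind of "breakage" meant here; this is a simplified illustration of documented gcc behavior, not code from any real project. Once a pointer has been passed to memcpy(), the compiler may assume it is non-null and remove a later null check:

  #include <cstring>
  #include <cstdio>

  void CopyAndCheck( char* dst, const char* src, size_t n )
  {
      std::memcpy( dst, src, n );   // the Standard requires src != NULL here
      if ( src == NULL ) {          // gcc may fold this condition to "false"
          std::puts( "never reached once the check is optimized away" );
      }
  }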

In the same way, compiler developers could add the ability to use knowledge about the behavior of other standard library functions. Hardly trivial, but definitely feasible. Third-party static analyzers also "know" about the functions of the standard library and of popular environments; this lets them report, for example, a possible logical error in code like the above if the result of ftell() is not checked against the error value -1L.

The technical possibility for significant improvement exists. It is a question of priorities and of attitude to the problem.

You can "teach" the compiler to optimize (and "break" non-Standard code that calls string functions), or you can "teach" it to issue fewer useless warnings.

What is the takeaway? Many static analysis tools still operate on the principles of "the main thing is to warn" and "the developer must figure it out," and in doing so they provoke developers into making unnecessary edits to code, sometimes breaking working code along the way. You can, of course, declare those developers unworthy and inept, but do not forget that the fate of millions of lines of code, used every day by hundreds of millions of people, depends on their work.

Static analysis was conceived as a means of making it easier to write correct code. To achieve this goal, static analysis tools should help developers, not hinder them with useless messages.

Every time a static analysis tool issues a warning on correct code, somewhere a cat grows sad (tm), and, possibly, an unnecessary edit is made to completely correct code, introducing yet another error with consequences along the lines of Heartbleed.

Dmitry Mescheryakov,
product department for developers

Source: https://habr.com/ru/post/277261/

