
Don't blame the mirror if your face is crooked

One of the most depressing realizations for any programmer is that much of your time is spent not on creating something useful, but on eliminating problems that you yourself created.

This process is called debugging. Every day, every programmer faces the fact that in writing code he also writes errors into it. And once he realizes that his program does not work, he has to hunt down problems that he himself created.

To deal with such problems, the computer industry has created a huge number of tools that help you make sure a program works correctly. To find errors, programmers use continuous integration, unit testing, assertions, debuggers, and so on. But errors still remain, and they have to be eliminated through human thinking.
Some programming languages, such as C, are especially prone to errors that appear and disappear seemingly at random, and that vanish just as you start to close in on their cause. Such errors are often called Heisenbugs, because as soon as you start looking for them, they disappear.

Such errors can occur in any programming language, especially in multithreaded code, where the slightest change in timing can trigger a race condition. But C has an additional problem: memory corruption.

Whatever the cause of an error, the key steps in finding the problem are always the same.

Recently a story appeared on Hacker News: "If you have a Heisenbug in C, then the problem is in your compiler's optimizer." This is a badly mistaken judgment.

The compiler you use is probably used by thousands of people, while your program is most likely used only by you. Which do you think is more stable: the compiler or your program?

In fact, one sign of an inexperienced programmer is that the first thing he does when hunting for an error is blame someone else. It is very tempting to blame the computer, the operating system, or the library you are using. A real programmer, however, is one who can keep his ego in check and accept that the error is most likely his own.

Of course, other programmers' code has errors too. There is no doubt that a library may be broken, the operating system may do incomprehensible things, and the compiler may generate strange code. But most of the time the mistake is yours, and that holds even when the error looks very strange.

While debugging, you often bang your head against your own code and tell yourself over and over that certain things simply cannot happen in it. At some point, however, the impossible turns out to be possible, and that is when you find the mistake.

The article mentioned above contains one piece of advice that is definitely incomplete:

“Disable the optimizer and check the program again. If it works, then the problem is in the optimizer. Play around with optimization levels, raising the level until the error starts to reproduce.”

All you learn by changing optimization levels is that the optimization level affects whether the error shows up. This does not tell you that the optimizer is working incorrectly. You have not found the actual cause of the error.

Since optimizers transform code to make it run faster, it is quite likely that Heisenbugs will appear or disappear depending on the optimization level. This does not mean that the optimizer is working incorrectly. It is still, most likely, your mistake.

Here is a concrete example of a C program containing an error that appears or disappears when the compiler's optimization level changes, which makes the program behave strangely.

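#include <stdlib.h>
#include <unistd.h>  /* getpid() */

/* Deliberately buggy: writes past the end of ar. */
void a(void)
{
    int ar[16];
    ar[20] = (getpid() % 19 == 0);  /* out of bounds: valid indices are 0..15 */
}

int main(int argc, char * argv[])
{
    int rc[16];
    rc[0] = 0;
    a();
    return rc[0];  /* exit status: expected to always be 0 */
}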


Compile this program with gcc on Mac OS X using the following Makefile (I saved the code in the file odd.c).

CC=gcc
CFLAGS=

odd: odd.o


And here is a script that runs the program 21 times and prints the exit status of each run:

#!/bin/bash
for i in {0..20}
do
./odd ; echo -n "$? "
done
echo


If you run this script, you would expect a string of zeros, since rc[0] is never assigned anything other than zero. However, here is an example of the output:

$ ./test
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


If you are an experienced C programmer, you will see how I made the 1 appear, and why it appears in different places. But for now, let's try debugging the program with printf:

[...]
rc[0] = 0;
printf( "[%d]", rc[0] );
a();
[...]


Now when you run the program, the error will disappear.

$ ./test
[0]0 [0]0 [0]0 [0]0 [0]0 [0]0 [0]0 [0]0 [0]0 [0]0 [0]0
[0]0 [0]0 [0]0 [0]0 [0]0 [0]0 [0]0 [0]0 [0]0 [0]0


It looks weird, so you move printf to another place:

[...]
rc[0] = 0;
a();
printf( "[%d]", rc[0] );
[...]


and get the same strange result: the error disappears. The same thing happens if you enable the optimizer; even without printf, the error does not appear:

$ make CFLAGS=-O3
gcc -O3 -c -o odd.o odd.c
gcc odd.o -o odd

$ ./test
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


All of this happens because the function a() allocates an array of 16 integers and then immediately writes past its end, storing either 1 or 0 depending on whether the process's PID is divisible by 19. Because of how the stack is laid out, that write ends up in rc[0].

Adding printf or changing the optimization level changes the layout of the generated code, so the bad write no longer lands on rc[0]. But be careful: the error has not gone away. The value is simply being written to some other memory location.

Because C is so prone to this type of error, it is important to use good tools to check for such problems. For example, the static code analyzer splint and the memory analyzer valgrind help eliminate a lot of nasty errors. You should also build your applications at the maximum warning level and eliminate every warning.

Only after you have done all of this can you begin to suspect someone else's code. But even then, go through all the steps again to establish the true cause of the error. Unfortunately, in most cases the error is yours.

Source: https://habr.com/ru/post/85884/

