
Undefined behavior is closer than you think.

Many people believe that undefined behavior arises only from gross errors (for example, writing beyond the bounds of an array) or from absurd constructs (for example, i = i++ + ++i). So it comes as a surprise to many when undefined behavior suddenly shows up in perfectly familiar, innocent-looking code. Let's look at one such example. When programming in C/C++, you should never let your guard down. Hell is closer than it seems.



Error description


I haven't raised the topic of 64-bit errors for a long time, so let me relive the old days. In this case, the undefined behavior manifests itself in a 64-bit program.
Consider the following synthetic example of incorrect code.
size_t Count = 1024*1024*1024; // 1 Gb
if (is64bit)
  Count *= 5; // 5 Gb
char *array = (char *)malloc(Count);
memset(array, 0, Count);

int index = 0;
for (size_t i = 0; i != Count; i++)
  array[index++] = char(i) | 1;

if (array[Count - 1] == 0)
  printf("The last array element contains 0.\n");

free(array);

This code works correctly if you build the 32-bit version of the program. But if you build the 64-bit version, things get much more interesting.

The 64-bit program allocates a 5-gigabyte array and fills it with zeros. Then the loop overwrites the array with arbitrary nonzero values. The "| 1" guarantees that no value is 0.

Try to guess how this program behaves when it is compiled in x64 mode with the compiler included in Visual Studio 2015. Do you have an answer? If so, read on.

If you run the debug version of this program, it crashes because of an array overrun. At some point the index variable overflows and its value becomes -2147483648 (INT_MIN).

A logical explanation? Nothing of the kind! This is undefined behavior, and anything can happen.
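To see why the int index cannot survive the 64-bit build, it helps to compare the number of elements with the largest value an int can hold. The following is only a sketch: the helper fits_in_int and the constants are hypothetical names that are not part of the original sample, and the checks assume a 32-bit int, as in Visual C++ and GCC on x64. The code only compares ranges and never performs the overflowing increment itself.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Array sizes from the synthetic sample, in bytes (one byte per element).
constexpr std::uint64_t kOneGiB  = 1024ull * 1024 * 1024;
constexpr std::uint64_t kFiveGiB = 5 * kOneGiB;

// Hypothetical helper (not in the original sample): true if every index
// in the range [0, count) is representable as an int.
constexpr bool fits_in_int(std::uint64_t count) {
    return count == 0 ||
           count - 1 <= static_cast<std::uint64_t>(std::numeric_limits<int>::max());
}

// Assumes a 32-bit int.
void check_index_range() {
    assert(fits_in_int(kOneGiB));    // 32-bit build: 1 GiB of elements is fine
    assert(!fits_in_int(kFiveGiB));  // 64-bit build: index++ has to overflow
}
```

With 5 GiB of elements, the loop has to increment index past INT_MAX long before it finishes, and that increment is where the undefined behavior comes from.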

When I or someone else says that this is undefined behavior, people start grumbling. I don't know why, but people are sure they know exactly how calculations work in C/C++ and how compilers behave.

But in fact they do not. If they did, they would not say all sorts of nonsense. The nonsense usually looks something like this (a composite picture):

You are talking theoretical nonsense. Well, yes, formally an 'int' overflow leads to undefined behavior. But that is nothing more than idle talk. In practice, you can always tell what will happen. If you add 1 to INT_MAX, you get INT_MIN. Maybe there are some exotic architectures where this is not the case, but my Visual C++ / GCC compiler produces the correct result.

So now I will demonstrate undefined behavior with a simple example, without any magic, and not on some exotic architecture but in a Win64 program.

Just build the example above in Release x64 mode and run it. The program no longer crashes, and the message “The last array element contains 0” is not displayed.

The undefined behavior manifests itself as follows. The array is filled completely, even though the type 'int' is too narrow to index all of its elements. For those who do not believe it, here is the assembly code:
  int index = 0;
  for (size_t i = 0; i != Count; i++)
000000013F6D102D  xor         ecx,ecx
000000013F6D102F  nop
    array[index++] = char(i) | 1;
000000013F6D1030  movzx       edx,cl
000000013F6D1033  or          dl,1
000000013F6D1036  mov         byte ptr [rcx+rbx],dl
000000013F6D1039  inc         rcx
000000013F6D103C  cmp         rcx,rdi
000000013F6D103F  jne         main+30h (013F6D1030h)

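Here it is, undefined behavior in action! The compiler keeps both i and index in the single 64-bit register rcx, so the index never wraps. And no exotic compilers are involved. This is VS2015.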
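Written back as C++, the Release x64 loop from the listing above looks roughly like the sketch below. The function name fill_as_optimized is hypothetical, and this is my reading of the listing rather than anything the compiler reports: the 32-bit index is gone, and one 64-bit counter serves as both the loop variable and the array offset.

```cpp
#include <cassert>
#include <cstddef>

// Rough C++ equivalent of the optimized loop. The comments name the
// instructions from the listing that each line corresponds to.
void fill_as_optimized(char* array, std::size_t count) {
    assert(array != nullptr || count == 0);
    for (std::size_t i = 0; i != count; ++i)        // inc rcx / cmp rcx,rdi / jne
        array[i] = static_cast<char>(char(i) | 1);  // movzx edx,cl / or dl,1 / mov [rcx+rbx],dl
}
```

Since a signed overflow is not supposed to happen in a valid program, the optimizer is allowed to assume that index always equals i, and this transformation is legal. That is why the whole array ends up filled.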

If you replace 'int' with 'unsigned', the undefined behavior disappears, because unsigned overflow is well defined: the value wraps around to 0. The array is filled only partially, and at the end the message “The last array element contains 0” is displayed.

Assembly code when 'unsigned' is used:
  unsigned index = 0;
000000013F07102D  xor         r9d,r9d
  for (size_t i = 0; i != Count; i++)
000000013F071030  mov         ecx,r9d
000000013F071033  nop         dword ptr [rax]
000000013F071037  nop         word ptr [rax+rax]
    array[index++] = char(i) | 1;
000000013F071040  movzx       r8d,cl
000000013F071044  mov         edx,r9d
000000013F071047  or          r8b,1
000000013F07104B  inc         r9d
000000013F07104E  inc         rcx
000000013F071051  mov         byte ptr [rdx+rbx],r8b
000000013F071055  cmp         rcx,rdi
000000013F071058  jne         main+40h (013F071040h)
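The wraparound of the unsigned index can be checked without allocating 5 GB. The sketch below assumes that unsigned is 32 bits wide, as in Visual C++, and uses std::uint32_t to make that explicit. The names index_after and check_wraparound are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kFiveGiB = 5ull * 1024 * 1024 * 1024;

// Value held by a 32-bit unsigned counter after `increments` steps from zero.
// The conversion is well defined: the result is reduced modulo 2^32.
constexpr std::uint32_t index_after(std::uint64_t increments) {
    return static_cast<std::uint32_t>(increments);
}

// After 5 GiB iterations the index has wrapped once and stands at 1 GiB.
static_assert(index_after(kFiveGiB) == 1024u * 1024u * 1024u,
              "5 GiB modulo 2^32 is 1 GiB");

void check_wraparound() {
    std::uint32_t index = 0xFFFFFFFFu;  // largest 32-bit unsigned value
    ++index;                            // defined behavior: wraps to 0
    assert(index == 0);
}
```

So the loop writes offsets 0 through 2^32 - 1, then starts over from 0 and rewrites the first gigabyte. Offsets at or above 2^32 are never touched, which is why the last element still contains 0.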

Note about PVS-Studio


The PVS-Studio analyzer does not directly diagnose overflow of signed variables. That is a thankless task: it is almost impossible to predict what values a variable will take and whether an overflow will occur. However, the analyzer can notice erroneous patterns in this code, which it associates with “64-bit errors”.

In fact, there is no such thing as a 64-bit error. There are just errors, such as undefined behavior. These errors simply lie dormant in 32-bit code and show up in 64-bit code. But talk about undefined behavior is not interesting, and nobody will buy an analyzer for that. Nor will anyone believe that there could be a problem. But if the analyzer says that a variable can overflow in a loop and that this is a “64-bit” error, that is a completely different matter. Profit.

PVS-Studio considers the code above erroneous and issues warnings from the group of 64-bit diagnostics. The logic is as follows: in Win32, variables of type size_t are 32-bit, a 5 GB array cannot be allocated, and everything works correctly. In Win64 there is plenty of memory, and we wanted to work with a large array. But the code fails. That is, the 32-bit code works and the 64-bit code does not. In PVS-Studio, this is called a 64-bit error.

Here are the diagnostic messages that PVS-Studio issues for the code shown at the beginning:

Correct code


For everything to work correctly, you must use appropriate data types. If you are going to process large arrays, forget about int and unsigned. There are dedicated types for this: ptrdiff_t, intptr_t, size_t, DWORD_PTR, std::vector::size_type, and so on. In this case, let it be size_t:
size_t index = 0;
for (size_t i = 0; i != Count; i++)
  array[index++] = char(i) | 1;
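For those who want to try the corrected loop without a 5 GB allocation, here is a scaled-down sketch. The function name make_filled and the use of std::vector instead of malloc are my assumptions for the illustration. The point is only that index has the same width as the loop counter, so it can reach every element that i can.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scaled-down variant of the corrected loop: allocates `count` zeroed bytes
// and overwrites each one with a nonzero value.
std::vector<char> make_filled(std::size_t count) {
    std::vector<char> array(count, 0);
    std::size_t index = 0;                    // same width as the loop counter
    for (std::size_t i = 0; i != count; ++i)
        array[index++] = static_cast<char>(char(i) | 1);
    assert(index == count);                   // every element was visited
    return array;
}
```

With size_t the behavior is the same in debug and release builds, whatever the array size, because nothing overflows.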

Conclusion


If a C++ language construct causes undefined behavior, then it does, and there is no point in arguing with that or predicting how it will manifest itself. Just do not write dangerous code.

There are plenty of stubborn programmers who refuse to see anything dangerous in shifting negative numbers, overflowing signed integers, comparing this with null, and so on.

Do not be one of them. The fact that the program works now does not mean anything. It is impossible to predict how UB will manifest itself. The expected behavior of the program is just one of the possible outcomes of UB.
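When a narrow signed counter really cannot be avoided, one option is to check before incrementing instead of relying on what the overflow "usually" does. This is a minimal sketch; try_increment is a hypothetical helper, not something from the original example or from any library.

```cpp
#include <cassert>
#include <limits>

// Hypothetical helper: increments `value` only if the result is representable,
// so the signed overflow never happens in the first place.
bool try_increment(int& value) {
    if (value == std::numeric_limits<int>::max())
        return false;  // incrementing would overflow, so refuse
    ++value;
    return true;
}

void check_try_increment() {
    int index = 41;
    assert(try_increment(index) && index == 42);

    int last = std::numeric_limits<int>::max();
    assert(!try_increment(last) && last == std::numeric_limits<int>::max());
}
```

The caller then decides what to do when the counter is exhausted, instead of leaving that decision to the optimizer.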

Source: https://habr.com/ru/post/276657/

