
I will continue the stories about how programmers walk along the edge without even knowing it. Talk about shift operations <<, >>. The principles of operation of shift operators are obvious and many programmers do not even know that their use according to the C / C ++ standard can lead to undefined or unspecified behavior (undefined behaviors / unspecified behavior).
Previous articles are available here: [
1 ], [
2 ].
Historical excursion
At the beginning of a little story. The need for bit shift operations is obvious to any programmer. Sooner or later, everyone is faced with the need to work with individual bits and bitmasks. However, shear operations are much more popular than they should have been. The reason is that by using shifts, you can multiply and divide numbers by a power of two. For example, the operation “X << 3” multiplies X by 8. The advantage of this method of multiplication and division was the speed of their work in the past.
')
Now I have got a book with a description of assembler commands for processors starting from 8086 and ending with 80486 from the dusty shelf. And I found a table about the number of cycles required to perform various instructions.
Multiplying a 16-bit register by a memory cell using the MUL instruction on an 8086 processor requires about 124-139 cycles!
Shifting a 16-bit register by N positions using the SHL instruction on an 8086 processor requires 8 + 4 * N ticks. That is, in the worst case, 72 tacts are obtained.
It was possible to obtain significant acceleration when calculating arithmetic expressions using various tricks when working with bit operations. This was the reason for the massive use of shifts, at the beginning, in assembly language, then in C and C ++ languages. The first C / C ++ compilers were simple. A person could get a gain in speed, obviously telling the compiler that it is necessary to use a shift here, and not instructions for multiplication or division.
With the development of processors, the benefits of using shifts have been maintained for a long time. In the 80486 processor, multiplication began to take about 26 cycles. It seems much better. However, the shift began to take only 3 bars and, again, was more attractive than multiplication.
Fortunately, these forced optimizations are mostly sunk into oblivion. First, compilers have become intelligent and use the optimal set of instructions to calculate arithmetic expressions. Secondly, the processors have also changed tremendously. Pipelines appeared, transition prediction, register renaming, and much more. Therefore, an ordinary programmer is no longer able to tell how long it takes to execute a particular instruction. But it is clear that if the code is not perfect in some places, this can not even be noticed. The processor will break the instructions into micro-instructions and start executing them in parallel. To be honest, I no longer understand how everything is happening there now. I realized that to understand the intricacies of meaningless, starting with the processor Intel Pentium. And he concluded that you should not think that you know better how to write an optimizing code, use shifts and bit operations wherever possible. Far from the fact that as a result, the code will be faster than the optimizer does in the compiler. But you can definitely say that the program will become confusing and difficult to understand.
Note. The above does not mean that the use of bit operations can no longer bring benefits. There are many interesting and useful tricks [
3 ]. The main thing is not to get involved.
Indefinite behavior
It all started with the fact that I decided to increase in the PVS-Studio analyzer the number of warnings related to undefined behavior [
4 ] and unspecified behavior [
5 ]. The rule detecting incorrect use of shift operations was implemented fairly quickly and simply. After that, I had to stop and think.
It turned out that programmers are very fond of shifts. And they are used in every possible way, often leading to indefinite behavior from the point of view of the standard. But theory is one thing, practice is another. Does it make sense to swear at the code that faithfully served for decades and has experienced more than one compiler? Complex issue. Despite the fact that the code is incorrect, the compilers follow no tacit agreement and process it in a uniform way.
After much deliberation, I decided to leave this diagnostic rule in PVS-Studio without making exceptions from it. If there are too many complaints from users, then maybe I will change my mind. However, users may be satisfied with the ability to turn off this diagnostics or use other
methods of suppressing warnings.
By the way, it was these spiritual torments that prompted me to write an article. I hope the information that I show will be interesting and useful.
So, let's see what is written in the C ++ 11 standard about the shift operators:
The shift operators << and >> group left-to-right.shift-expression << additive-expressionshift-expression >> additive-expressionThis is an integral part of the business process.1. The promoted left operand. If you are on the right side of the left operand.2. The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If it is an unsigned pattern, it can be used to make it possible. Otherwise, if E1 is a signed type and non-negative value, and E1 * 2 ^ E2 is representable, then that is the resulting value; otherwise, the behavior is undefined.3. The value of E1 >> E2 is E1 right-shifted E2 bit positions. It is an insignia of the E1 / 2 ^ E2. If E1 has a signed valueReading such texts is sad and sad. But do not worry. Now we will look at various incorrect situations with examples.
The simplest case leading to undefined behavior is when the right operand has a negative value. Example:
int A = 10;
int B = A << -5;
So, thank God, nobody does. At least, after analyzing more than 70 open-source projects, we did not encounter such errors.
The following case is much more interesting. This is a shift by N bits, where N is greater than the number of bits in the left operand. The simplest example is:
int A = 10;
int B = A << 100;
Let's see how this error may look like in practice. The following code snippet was found by us in the Lib7z library:
SZ_RESULT
SafeReadDirectUInt64 (ISzInStream * inStream, UInt64 * value)
{
int i;
* value = 0;
for (i = 0; i <8; i ++)
{
Byte b;
RINOK (SafeReadDirectByte (instream, & b));
* value | = ((UInt32) b << (8 * i));
}
return SZ_OK;
}
PVS-Studio diagnostic message: V610 Undefined behavior. Check the shift operator '<<. The right operand ('(8 * i)' = [0..56]) is the left operand. lib7z 7zin.c 233
The function attempts to read the 64-bit value byte-by-byte. Unfortunately, it will not work if the number was greater than 0x00000000FFFFFFFF. Notice the shift "(UInt32) b << (8 * i)". The size of the left operand is 32 bits. In this case, the shift occurs from 0 to 56 bits. In practice, this will lead to the fact that the older part of the 64-bit value will remain filled with zeros. In theory, there is generally undefined behavior and the result is unpredictable.
The correct code should look like this:
* value | = ((UInt64) b << (8 * i));
The reader may wonder if the code below is correct?
char A = 1;
int B = A << 20;
Yes, it is correct. To the left of the operator << there is a variable A, consisting only of 8 bits. But before the start of the shift operation, the left side will be expanded to type int. Accordingly, a value of type 'int' can be shifted by 20 bits.
And now the most interesting moment. This is a shift in negative values. The simplest example is:
int A = -1 << 5; // undefined behavior
int B = -1 >> 5; // unspecified behavior
In this code, there is undefined and unspecified behavior. From a practical point of view, there is no difference. Only one conclusion - you can not write like that.
On this one could put an end and give a couple of examples. Unfortunately, there are two nuances that spoil the ideal picture of the world.
Nuances that spoil the perfect picture of the world
Nuance N1. In the old C ++ standard of 1998, situations with uncertain behavior are bypassed. It is said how the << operator behaves when shifting unsigned values. And about sign values nothing is said. In general, the very case when reading the standard does not add clarity. Not considered such a case and all.
So, from the point of view of C ++ from 1998, the construction "-1 << 5" does not lead to indefinite behavior. However, how it should work is also not described.
Nuance N2. Programmers boldly shift negative values in many programs. And it will be difficult to argue with them, because the code works.
Let us try to figure out whether, because of these nuances, we should abandon the new diagnostics. We think not.
In the old C ++ standard, it does not say about undefined behavior. The new says. It turns out that the old standard was simply not accurate enough. By the way, in the new standard of C (I watched a draft of June 25, 2010), it is also said that shifts of negative values lead to indefinite behavior. Conclusion - should get rid of incorrect code.
Now, regarding the ubiquitous use of dangerous shifts. There are really a lot of them. For example, in the JPEG library, you must fill the array with the following values:
11111111111111111111111111111111b
11111111111111111111111111111101b
11111111111111111111111111111001b
11111111111111111111111111110001b
...
This is written as:
/ * entry n is (-1 << n) + 1 * /
static const int extend_offset [16] = {0,
((-1) << 1) + 1, ((-1) << 2) + 1, ((-1) << 3) + 1,
((-1) << 4) + 1, ((-1) << 5) + 1, ((-1) << 6) + 1,
((-1) << 7) +1, ((-1) << 8) +1, ((-1) << 9) +1,
((-1) << 10) + 1, ((-1) << 11) + 1, ((-1) << 12) + 1,
((-1) << 13) + 1, ((-1) << 14) + 1, ((-1) << 15) + 1
};
It's hard to call a jpeg library bad. And this code is tested by time and different compilers.
From the point of view of the standard, it should now be rewritten as follows:
static const int extend_offset [16] =
{0,
((~ 0u) << 1) | 1, ((~ 0u) << 2) | 1, ((~ 0u) << 3) | one,
((~ 0u) << 4) | 1, ((~ 0u) << 5) | 1, ((~ 0u) << 6) | one,
((~ 0u) << 7) | 1, ((~ 0u) << 8) | 1, ((~ 0u) << 9) | one,
((~ 0u) << 10) | 1, ((~ 0u) << 11) | 1, ((~ 0u) << 12) | one,
((~ 0u) << 13) | 1, ((~ 0u) << 14) | 1, ((~ 0u) << 15) | one
};
But whether to make such edits you decide. I can only give advice after all to do it. It is not known when and how, it can manifest itself.
You can give other examples of shifts of negative values in different programs. But they are all of the same type, so it will not be interesting to read about them.
findings
- Previously, the use of bit operations and shifts was a sign of a programmer's skill and allowed writing fast code. Now it is almost not relevant. It is much more important that the code is clear. Use games with bats only in case of real need.
- Expressions of the form "-1 << N" are now declared as incorrect and leading to undefined behavior.
- Expressions like "-1 << N" are long and often used. Therefore, it is difficult to bring real arguments against the use of such structures. The argument is only new standards of C and C ++ languages.
- Decide for yourself whether to correct the shift in negative values or not. But I recommend a fix. Just in case.
- Diagnostic messages regarding dangerous shifts will be available in PVS-Studio starting from version 4.60, which will be released soon.
Additional resources
- Without knowing the ford, do not go into the water. Part one. http://habrahabr.ru/post/137039/
- Without knowing the ford, do not go into the water. Part two. http://habrahabr.ru/post/137411/
- Sean Eron Anderson. Bit Twiddling Hacks. http://www.viva64.com/go.php?url=837
- Wikipedia. Undefined behavior. http://www.viva64.com/go.php?url=663
- Wikipedia. Unspeakable behavior. http://www.viva64.com/go.php?url=738
- Alena C ++. The difference between unspecified behavior and undefined behavior. http://www.viva64.com/go.php?url=739