
Algorithms for finding the most significant bit

In this post I want to present and discuss several algorithms for finding the most significant set bit of a number.

Just in case, let me clarify: the most significant bit is the set bit of a number that corresponds to the highest power of two. In other words, it is the largest power of two not exceeding the number. To avoid dealing with many special cases, we will assume here that we are working with a natural number from 1 to 2 ^ 31 - 1 inclusive. In addition, so as not to go too deeply into probability theory, we will assume that the number whose most significant bit we are looking for is equally likely to be any of the possible values.

To begin with, let's consider the simplest algorithm, the first that comes to mind. We go through all the powers of two and choose the largest one that does not exceed the number. Here we can obviously use the monotonicity of this property: if some power of two does not exceed the number, then no smaller power exceeds it either. Therefore, this method can be written very simply:
    int bit1(int x) {
        int t = 1 << 30;   // the largest power of two in our range
        while (x < t)      // shift down until t no longer exceeds x
            t >>= 1;
        return t;
    }
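
For illustration, here is a quick check (my own example, not from the original post) of what bit1 returns on a few inputs:

    System.out.println(bit1(1));                  // 1  (2^0)
    System.out.println(bit1(100));                // 64 (2^6 <= 100 < 2^7)
    System.out.println(bit1(Integer.MAX_VALUE));  // 1073741824 (2^30)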

(I am using Java here, but given the nature of the code, I think it will be clear to everyone.)

Let's see how long it can take. If we assume that the number is equally likely to be any value in the indicated interval, then with probability 1/2, namely when x turns out to be at least 2 ^ 30, the while loop does not execute at all. With probability 1/4, namely when x lies between 2 ^ 29 and 2 ^ 30 - 1, the loop runs once. And so on. It is easy to see that the loop body therefore runs about once on average: the expected number of iterations is the sum of k / 2 ^ (k + 1) over all k, which equals 1. That is very good on average, but bad in the worst case: for x = 1 the loop runs all thirty times.
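
If you want to check this average empirically, here is a minimal sketch of a test harness (my own illustration, not from the original post; the class name is arbitrary) that counts the loop iterations over random inputs:

    import java.util.Random;

    public class Bit1Average {
        // The same loop as in bit1, but counting iterations instead of returning t.
        static int iterations(int x) {
            int t = 1 << 30, count = 0;
            while (x < t) { t >>= 1; count++; }
            return count;
        }

        public static void main(String[] args) {
            Random rnd = new Random();
            long total = 0;
            int trials = 1_000_000;
            for (int i = 0; i < trials; i++) {
                // nextInt(Integer.MAX_VALUE) gives 0..2^31-2, so +1 covers 1..2^31-1
                int x = rnd.nextInt(Integer.MAX_VALUE) + 1;
                total += iterations(x);
            }
            System.out.println((double) total / trials); // prints a value close to 1.0
        }
    }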

The following algorithm is free of this drawback; more precisely, it has no uncertainty in its running time at all. I will first show the code and then explain how it works.

    int bit2(int x) {
        x |= x >> 1;          // smear the highest set bit one position down
        x |= x >> 2;          // then two more positions
        x |= x >> 4;          // then four, and so on
        x |= x >> 8;
        x |= x >> 16;
        return x - (x >> 1);  // keep only the highest bit
    }

So, suppose we are given the number x = 000001bbbbb (I have not written out the full number of bits; b stands for an arbitrary bit). Then
 x          == 000001bbbbb
 x >> 1     == 0000001bbbb
 x | x >> 1 == 0000011bbbb

Thus, the first operation sets a one in the position immediately after the most significant bit, wherever it is. The second, as you can see, sets two more ones after those two. The third sets four more (if there is room for them), and so on. As a result, all bits below the most significant one become ones. Then it is clear that x - (x >> 1) gives the correct answer.
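
To make this concrete, here is a trace (my own example, not from the original post) for x = 100, whose binary form is 1100100:

    int x = 100;       // 1100100
    x |= x >> 1;       // 1110110 : the top two bits are set
    x |= x >> 2;       // 1111111 : already saturated, all lower bits are ones
    x |= x >> 4;       // 1111111 : no further change
    x |= x >> 8;       // 1111111
    x |= x >> 16;      // 1111111
    System.out.println(x - (x >> 1));  // 127 - 63 = 64 = 2^6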

The third algorithm is not completely free of variation in its running time, but in the worst case it nevertheless works much faster than the first. It is based on binary search. Let's take the middle bit, say the 16th, and form a condition that tells us whether the most significant bit is below the 16th position or not. Clearly, this condition is x < 1 << 16. So we need to save the result of this test:

    int t = 1;
    if (x >= 1 << 16) t <<= 16;

Then, of course, we need to check whether t can be shifted by another 8 bits, then by 4, by 2, and by 1:

    int bit3(int x) {
        int t = 1;
        if (x >= t << 16) t <<= 16;
        if (x >= t << 8)  t <<= 8;
        if (x >= t << 4)  t <<= 4;
        if (x >= t << 2)  t <<= 2;
        if (x >= t << 1)  t <<= 1;
        return t;
    }
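
Tracing bit3 by hand on x = 100 (my own example) shows the fixed number of comparisons:

    // 100 >= 1 << 16 ?  no   (100 < 65536)   t stays 1
    // 100 >= 1 << 8  ?  no   (100 < 256)     t stays 1
    // 100 >= 1 << 4  ?  yes  (100 >= 16)     t = 16
    // 100 >= 16 << 2 ?  yes  (100 >= 64)     t = 64
    // 100 >= 64 << 1 ?  no   (100 < 128)     t stays 64
    System.out.println(bit3(100));  // 64, after exactly five comparisons for any input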

So, the second and third algorithms are slower than the first on average (direct experiment confirms this, and also shows that the third algorithm is slightly faster than the second), but in the worst case they are faster.
In addition, one more thing should be noted. The first method uses the magic constant 1 << 30. Its use rests on the fact that we know the number is equally likely to be anything from 1 to 2 ^ 31 - 1. If the numbers are smaller, the constant can be decreased. For example, if the numbers range from 1 to 100000, we can start with int t = 1 << 17. But what if we do not know in advance what numbers the method will be applied to? Suppose they can theoretically be as large as 2 ^ 31 - 1, but in practice never exceed 1000. Then we have to keep int t = 1 << 30, and this method (as experiments again confirm) runs much longer than the other two. That is, the first method not only can occasionally run for a long time, it may also turn out to be slower on average; its running time can be unpredictable. None of these concerns apply to the other two methods.

Finally, one more nuance. All three algorithms above return the power of two itself, that is, a number of the form 2 ^ k. But what if we need to find k itself? The first and third methods are easily transformed to return the exponent rather than the power; the first then becomes slightly faster, the third slightly slower. The second algorithm, however, is completely unsuitable for this. So the second algorithm has its own specific disadvantage.
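As a sketch of one such transformation (my own, not from the original post), the third method can return the exponent by shifting x down instead of shifting t up:

    int log2(int x) {
        int k = 0;
        if (x >= 1 << 16) { x >>= 16; k += 16; }
        if (x >= 1 << 8)  { x >>= 8;  k += 8; }
        if (x >= 1 << 4)  { x >>= 4;  k += 4; }
        if (x >= 1 << 2)  { x >>= 2;  k += 2; }
        if (x >= 1 << 1)  { k += 1; }
        return k;
    }

For x = 100 this returns 6, consistent with bit3(100) == 64.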

P.S. The first algorithm is too obvious for the question of authorship to arise; the second was suggested to me by a well-known olympiad programmer; the third I came up with myself, although I am sure that I am far from the only one to have done so, and certainly not the first.

Source: https://habr.com/ru/post/93172/

