Fast integer division by constant

All CPU division operation is relatively slow, nothing can be done about it. But if the divisor is a constant, then the division can be replaced by multiplication by some other constant (the inverse number, which is calculated at compile time). Then the code will work a little faster (and consume less energy). Many compilers (gcc, MSVC) do this optimization, but it turns out that many developers do not know how the multiplier is calculated, and this is not trivial.

Further it will be described how the factor is calculated.

Idea and case for real numbers

For example, if in the code double x = a / 2.5 , then this can be replaced by x = a * (1.0 / 2.5) or x = a * 0.4 . The code will be faster, but sometimes less accurate. For integers such a trick in the forehead does not work, tk. the factor will be less than one (those zero). You can use fixed-point numbers (mantissa N ): int x = a / b = a * B >> N, where B = (1 << N) / b . The problem is that sometimes so calculated x will be 1 different from the correct answer, which is unacceptable for integers.
')

Algorithm

For simplicity, we will work only with non-negative numbers. Mathematical wording:

For a known integer b, find such a pair of integers M, B so that for all integers a : 0 ≤ a <(1 << N ) , x = [a / b] = [a * B >> M] , where [] is an integer the part, and >> M is to divide a by 2 to the power of M (and << multiply).

We write 1.0 / b in binary form (sometimes an infinitely long number), the first M bits of which we denote as integer C , the remaining bits (tail-remainder) as D , those 1.0 / b = C >> M + D, D <(1> > M) .

Let's see what happens if B = C and a * D ≤ 1 (those N ≤ M ):

x = [a / b] = [a * (C >> M + D)] = [a * C >> M + a * D] ≥ [a * C >> M] + [a * D] = [ a * C >> M]

It turns out that if B is the first M bits of the number 1 / b , and the remaining bits are discarded (replaced by zeros), then we will sometimes get the correct values, and sometimes 1 less than x . This happens because a * D is dropped, which, although less than one, can still change the integer part.

Let's try to increase B by one (those B = C + 1 ), then:

[a * (C + 1) >> M] = [a * C >> M + a * (1 >> M)] ≥ [a * C >> M + a * D] = [a * (C> > M + D)] = [a / b] = x

Now it turns out that we will sometimes get the correct answer x , and sometimes 1 more.

It may seem that it is impossible to accurately calculate x through multiplication because at one approximation there will sometimes be numbers 1 less than necessary, with the other 1 more. But it is not. After all, a is integers, and the maximum fractional part of a / b is exactly (b-1) / b , and in the calculations of x it increases by a * ((1 >> M) - D) . Yeah, so if a * ((1 >> M) - D) <1 / b then the inequality turns into equality !!! Let the number L be such that (1 << L) ≥ b , then for equality it is enough (1 >> M) ≤ (1 >> (L + N)) or M ≥ L + N.

Java example

In the example, the divisor is 127 ( L = 7). The result of multiplying by JVM should fit in 32 bits, choose N = (32 - 7) / 2 = 12. Example:

final static int N = 12; private static int divideBy127(int a) { final int b = 127; final int L = 7; final int B = ((1 << (N + L)) / b) + 1; return (a * B) >>> (N + L); } public static void main(String[] args) { final int size = 20000; final int times = 10001; final int[] data = new int[size]; for (int i = 0; i < size; ++i) data[i] = (i * i) & ((1 << N) - 1); // fill by any random data for (int iteration = 0; iteration < 10; ++iteration) { long time0 = System.currentTimeMillis(); // Test 1: Using div int v0 = 0; for (int j = 0; j < times; ++j) for (int i = 0; i < size; ++i) v0 ^= data[i] / 127; // keep result long time1 = System.currentTimeMillis(); // Test 2: Using mul int v1 = 0; for (int j = 0; j < times; ++j) for (int i = 0; i < size; ++i) v1 ^= divideBy127(data[i]); // keep result long time2 = System.currentTimeMillis(); System.out.println((time1 - time0) + " vs " + (time2 - time1) + " " + (v0 - v1)); } }

On my Intel Core-i5 2500, JVM 1.7.0u5 with JIT and the -XX option: + AggressiveOpts the division test works 2 times slower than with multiplication.

Addition

The rest of the topic is not considered in this article:

the dividend may be negative;
how to avoid overflow when multiplying;
the result of multiplying integers by x86 takes 64-bits and under certain conditions you can remove the shift to the right;
if it is known that the numbers are completely divided, then another algorithm can be used;
in some cases, multiplication can be replaced by left shifts and additions;
special cases for some dividers with even faster execution speed.

Literature

"Division by Invariant Integers using Multiplication", Torbjorn Granlund, Peter L. Montgomery - formal proof and some more algorithms on the topic.

Source: https://habr.com/ru/post/147096/

All Articles