Fast index modulo multiplication

Introduction

Typically, this material is given with an abundance of formulas and is designed more for mathematicians. I will try to paint it most easily on simple numerical examples in terms of the application of this method in microelectronics at the hardware level. In numerical examples, for clarity, the value p = 11 will be used.

Formulation of the problem

Suppose that we need to perform the multiplication of the following form: res = (a*b) mod p , where
0 <= a < p
0 <= b < p
p is a prime number.
mod p is the operation of finding the modulo residue.
And it should be done at a low level, where there is no such multiplication operation and the operation of taking the remainder of the division or they are implemented quite difficult (for example, in an electronic device).

Simple solutions

The first thing that comes to mind: multiply, then divide and take the remainder. This approach has the right to exist, but it is extremely costly in the number of operations and rather difficult to implement.
The second thing you can think of is to implement this operation by a two-dimensional multiplication table of size p by p . What makes sense if p small, but as p grows, the cost of storing the table quadratically increases (Fig. 1).

Figure 1. The multiplication table modulo p for p = 11.

The multiplier based on the index method

However, there is a method that requires one (or for the convenience of two) tables of dimension p . The method is based on the replacement of multiplication by addition. And it can be schematically illustrated by the following figure (fig. 2):

Figure 2. Index multiplication.
')
We explain why this is possible. The index representation of a number is based on the concept of a primitive root modulo a simple p [1]. The primitive root w is an integer, the construction of which to a power of 0, 1, 2, …, (p-2) gives non-repeating deductions modulo p . A primitive root always exists for any simple p (proved by Gauss in 1801). In this case, each integer q from the interval (0; p) can be associated with a number i such that: q = (w ⁱ ) mod p . And thus get the following match:
(a*b) mod p <-> w^((ia+ib) mod (p-1)) [2].

Consider an example for the module p = 11 . The primitive root w for this modulus value is 2. As it is easy to make sure that w raised to a power of 0, 1, ... 9, it gives non-repetitive results:

(2 ⁰ ) mod 11 = 1 mod 11 = 1
(2 ¹ ) mod 11 = 2 mod 11 = 2
(2 ² ) mod 11 = 4 mod 11 = 4
(2 ³ ) mod 11 = 8 mod 11 = 8
(2 ⁴ ) mod 11 = 16 mod 11 = 5
(2 ⁵ ) mod 11 = 32 mod 11 = 10
(2 ⁶ ) mod 11 = 64 mod 11 = 9
(2 ⁷ ) mod 11 = 128 mod 11 = 7
(2 ⁸ ) mod 11 = 256 mod 11 = 3
(2 ⁹ ) mod 11 = 512 mod 11 = 6

To get the conversion table between the normal {q} and index {i} representation, you must sort the resulting value pairs in ascending order. Thus, the direct conversion table for the p = 11 module will look as follows:

q	one	2	3	four	five	6	7	eight	9	ten
i	0	one	eight	2	four	9	7	3	6	five

And the inverse transformation table for the p = 11 module will look like this:

i	0	one	2	3	four	five	6	7	eight	9
q	one	2	four	eight	five	ten	9	7	3	6

Find the value of the expression (3 * 5) mod 11. The numbers 3 and 5 have the corresponding indices 8 and 4 (see table 1). Summing these indices modulo (11-1) = 10, we get the result (8 + 4) mod 10 = 12 mod 10 = 2. From table 2 we find that the inverse transformation for index 2 gives the final result equal to 4.

The structural diagram of the index multiplier modulo m = 11 for the considered example can be viewed in the following figure (Fig. 3):

Figure 3. Scheme of the index multiplier for p = 11.

Zero values for inputs

If you look closely at the tables, you can see that there are no zero values for the input data. This is due to the fact that w ⁱ ! = 0 for no value of i . This case is processed separately (or the concept of a singularity is introduced with special rules for its processing). If 0 appears on one of the multiplier inputs, then the output will also be 0, which follows directly from the multiplication rules.

Paralleling multiplier

It turns out multiplication can be done even faster. If the number (p-1) can be divided into mutually simple factors p-1 = m1*m2*…*mr , then the addition operation can be divided into r addition operations of a smaller dimension. In this case, the index is converted into a vector of length r , according to the following formula: ( i mod m1, i mod m2, …, i mod mr ). And summation is performed independently for each element of the vector.

Consider this by example. For p = 11, the value of p -1 = 10 and it can be split into mutually simple factors uniquely: 10 = 2 * 5 ( m1 = 2, m2 = 5). In this case, table 1 may be listed as follows:

q	one	2	3	four	five	6	7	eight	9	ten
i	0	one	eight	2	four	9	7	3	6	five
(i `mod` 2, i `mod` 5)	(0, 0)	(eleven)	(0, 3)	(0, 2)	(0, 4)	(14)	(12)	(13)	(0, 1)	(ten)

And the reverse conversion table, respectively, as follows:

(i `mod` 2, i `mod` 5)	(0, 0)	(0, 1)	(0, 2)	(0, 3)	(0, 4)	(ten)	(eleven)	(12)	(13)	(14)
q	one	9	four	3	five	ten	2	7	eight	6

We will find, as in the previous example, the value (3 * 5) mod 11. First, we look for the corresponding vectors in the table: 3 -> (0, 3), 5 -> (0, 4). Then we add elementwise ((0 + 0) mod 2, (3 + 4) mod 5) = (0, 2). From the inverse transformation table, we find the answer: (0, 2) -> 4. The parallel multiplier circuit is shown below (Fig. 4):

Figure 4. Scheme of parallel index multiplier for p = 11.

How to search for a primitive root?

If you honestly did not ask this question. I use brute force starting from 2 to p. Or, you can use a ready-made sequence: oeis.org/A046145 . If there is a more effective method, write in the comments.

How to design a modulo adder (p-1)?

Due to the peculiarities of the input data of the modulo adder (p-1) , namely, that both inputs come in a number less than p-1 , which means their sum is less than 2*(p-1) . From this it follows that you can use any of the standard adders, the output of which is adjusted according to the following algorithm: if the value is greater than or equal to (p-1) , then subtract from the result (p-1) , otherwise leave it unchanged.

Verilog generator

At leisure, I wrote an online Verilog generator to implement an index multiplier modulo. A drawing of his work is also drawn there.
Verilog for multiplication modulo 11
Verilog for multiplication modulo 31
Verilog generator for any number up to 1000

Literature

[1] en.wikipedia.org/wiki/%D0%9F%D0%B5%D1%80%D0%B2%D0%BE%D0%BE%D0%B1%D1%80%D0B0%D0% B7% D0% BD% D1% 8B% D0% B9_% D0% BA% D0% BE% D1% 80% D0% B5% D0% BD% D1% 8C_% 28% D1% 82% D0% B5% D0% BE% D1% 80% D0% B8% D1% 8F_% D1% 87% D0% B8% D1% 81% D0% B5% D0% BB% 29
[2] www.researchgate.net/publication/224735018_A_fast_RNS_Galois_field_multiplier

From the author

If you have any additions to the article I will be glad to see them in the comments. And another article turned out to be poor in reference with the details, if anyone has something to throw. =)

Source: https://habr.com/ru/post/149273/

All Articles