I2P: Accelerating Asymmetric Cryptography with Spreadsheets

Asymmetric cryptography in I2P has always led to slower work: the Diffie-Hellman algorithm in establishing transport sessions and, in my opinion, the unsuccessful choice of the El-Gamal scheme in I2P addresses. This is especially noticeable when working on weak gland and floodfill. The approach proposed in the article is based on the use of some features of I2P and allows you to achieve a significant acceleration of work and reduce the load on the processor.

Asymmetric encryption in I2P

Currently, asymmetric encryption is based on the modular exponentiation operation and the hypothesis that it is practically impossible to use discrete logarithmization. There are plans to use other types of asymmetric encryption, but so far they have not been implemented.
The public key y is calculated on the basis of the secret key x by the formula y = g ^ x mod p, where p is simple, and in the general case, the triple of numbers (y, p, g) is transmitted as a key. In I2P, the numbers p and g are constants and should be the same for all implementations:
g = 2,
p = 2 ^ 2048 - 2 ^ 1984 - 1 + 2 ^ 64 * ([pi * 2 ^ 1918] + 124476),
where pi denotes the number pi, and [] is the operation of taking the integer part of a number.
The length p is 256 bytes, respectively, and the length of the public key.
')
According to official documentation, the secret key length is 2048 bits for x64 and 226 bits for all others.

Calculated tables

The use of calculated tables has previously been considered for EdDSA signing .
Recall briefly the essence. Over the points of the curve, an addition operation is defined and it is required, using this operation, to multiply the constant base point by a 32-byte number. To do this, we will add byte by byte, taking the result of multiplying the point by the value of each byte from the table, calculated once at the start. This will require exactly 32 additions, while the computation time becomes constant, which is extremely important for multiplying by the secret key.

Similarly, we proceed for the operation of raising g to the power modulo p, but instead of the operation of addition, we will use the multiplication operation.
Calculate a table of all possible values of degrees p for each byte, filling in for each byte an array of 255 elements, multiplying in order by g = 2.
You may notice that the first elements of the table are not necessary to be stored, since they are obtained by setting the bit in the corresponding position, however, the value should not exceed p, in fact, this corresponds to 2048 = 11 bits, that is, only one and third bytes of the table.

Thus, for a private key with a length of 226 bits, the size of the table is 29 * 255 * 255 ~ 2 megabytes, for a key with a length of 2048, the size of the table is 256 * 255 * 255 ~ 16 megabytes, which can be a significant amount, but the effect in this case is more significant.

Montgomery multiplication

Since each exponentiation requires at least 29 multiplications by module, it is advisable to use the Montgomery multiplication to speed up the work.
Its meaning comes down to replacing one module with another, more convenient for calculations module. For practical needs, the power of two is chosen and then the slow division is replaced by a fast bitwise shift.
The disadvantage is the need to convert to the Montgomery representation. However, in our case, this is done only once at the time of the calculation of the table and the only additional action is the conversion of the result.
For practical tasks, we do not need to implement Montgomery multiplication - the corresponding functions are presented in OpenSSL.

int BN_MONT_CTX_set(BN_MONT_CTX *mont, const BIGNUM *m, BN_CTX *ctx);

sets the module (in our case p)

 int BN_mod_mul_montgomery(BIGNUM *r, BIGNUM *a, BIGNUM *b, BN_MONT_CTX *mont, BN_CTX *ctx);

Montgomery's actual multiplication in context with a given module

 int BN_from_montgomery(BIGNUM *r, BIGNUM *a, BN_MONT_CTX *mont, BN_CTX *ctx);

converting the result to a normal presentation.

Implementation in i2pd

Starting with release 2.7.0, it is supported and controlled by the precomputation.elgamal parameter, disabled by default for x64 and enabled for the rest. It is recommended to enable it for work in the floodfill mode, if there are no hard restrictions on the use of memory. The load on the processor in this case is reduced by more than 2 times due to the need to frequently establish connections with other nodes.
It is used when generating key pairs during connection establishment, asymmetric encryption during tunnel construction, and garlic encryption between addresses. Similarly, you can speed up the DSA signature, but in I2P it is considered obsolete and is gradually being replaced by EdDSA.
The approach considered in the article may seem trivial, but its implementation has shown its effectiveness, making it possible to work on platforms that previously lacked performance.

Source: https://habr.com/ru/post/301304/

All Articles

I2P: Accelerating Asymmetric Cryptography with Spreadsheets

Asymmetric encryption in I2P

Calculated tables

Montgomery multiplication

Implementation in i2pd

More articles: