
The Second Coming of GOST 28147-89: Honest Tests

About ten years ago, symmetric cryptography based on GOST 28147-89 stopped meeting the speed requirements of hardware platforms. Crypto-transformations implemented on the processor's general-purpose registers could no longer keep pace with data exchange rates in networks and on disk drives.

On the other (American) side, AES-256 appeared, offering much better speed at the same level of cryptographic strength.

In this situation, the FSB's 8th Center began work on a new block cipher, which later received the name "Grasshopper" (Kuznyechik), formed from the first letters of its authors' last names.
It was a hopeless undertaking from the start: the cipher repeated the logic of AES, but while AES is accelerated in hardware on Intel and AMD processors, Grasshopper could never get such hardware acceleration on them.

So Grasshopper is a classic example of budget money thrown away, and no small amount of it ...

But there was another way to speed up cryptographic operations; unfortunately, for specific reasons it did not receive official support. This way involves developing algorithms that implement GOST 28147-89 in a multithreaded fashion and refining the standard itself to meet the requirements of multithreading.

Multithreading rests on three new methods of implementing GOST 28147-89 on x86-64 processors.

The first is pipelined operation of the processor during encryption. The second is implementing the substitution block (S-box) with SSE/AVX instructions. The third is using the specialized XMM/YMM/ZMM registers, which are 16/32/64 bytes wide.

Together, these methods made it possible to increase the speed of encryption according to GOST 28147-89 by at least an order of magnitude and to achieve speed parameters no worse than those of the American AES-256, which relies on a dedicated crypto accelerator in Intel/AMD processors.
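The idea common to these methods, running several independent blocks through the same round in lockstep, can be sketched in plain Python. This is only an illustration under stated assumptions, not the patented implementation: Python lists stand in for the lanes of an XMM/YMM/ZMM register, the names `crypt_block` and `crypt_lanes` are hypothetical, and the S-box is a placeholder generated by a formula rather than a real GOST parameter set.

```python
MASK32 = 0xFFFFFFFF

# PLACEHOLDER S-box: eight 4-bit permutations generated by a formula.
# NOT a real GOST 28147-89 parameter set; it only illustrates the data flow.
SBOX = [[(5 * x + 3 * i + 1) % 16 for x in range(16)] for i in range(8)]


def substitute(word):
    """Apply the eight 4-bit S-boxes to one 32-bit word."""
    out = 0
    for i in range(8):
        nibble = (word >> (4 * i)) & 0xF
        out |= SBOX[i][nibble] << (4 * i)
    return out


def rotl11(word):
    """Rotate a 32-bit word left by 11 bits."""
    return ((word << 11) | (word >> 21)) & MASK32


def round_keys(key_words, decrypt=False):
    """32 round keys: K0..K7 three times, then K7..K0 (reversed to decrypt)."""
    order = list(range(8)) * 3 + list(range(7, -1, -1))
    if decrypt:
        order.reverse()
    return [key_words[i] for i in order]


def crypt_block(n1, n2, key_words, decrypt=False):
    """Scalar reference: one 64-bit block held as two 32-bit halves."""
    for k in round_keys(key_words, decrypt):
        n1, n2 = n2 ^ rotl11(substitute((n1 + k) & MASK32)), n1
    return n2, n1  # the final round does not swap the halves


def crypt_lanes(blocks, key_words, decrypt=False):
    """Process several independent blocks in lockstep, one block per lane."""
    n1 = [b[0] for b in blocks]
    n2 = [b[1] for b in blocks]
    for k in round_keys(key_words, decrypt):
        # Each statement acts on every lane, as one SIMD instruction would.
        summed = [(x + k) & MASK32 for x in n1]
        substituted = [substitute(x) for x in summed]
        rotated = [rotl11(x) for x in substituted]
        n1, n2 = [a ^ b for a, b in zip(n2, rotated)], n1
    return list(zip(n2, n1))
```

Because the blocks do not depend on each other, every step of the round can be issued once for all lanes; on real hardware the per-lane list comprehensions would become single vector instructions, and the nibble substitution would map onto a byte-shuffle instruction.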

But, as the saying goes, God marks the rogue, and time puts everything in its place.
At present, the Grasshopper cipher is perceived as exotic and has no practical use because of its low speed and dubious reputation; manufacturers have started using multithreaded encryption according to GOST 28147-89 instead. But they try not to talk about it, because they know they are violating copyright.

The encryption acceleration methods are patented.

Largely because of this, no honest speed tests of the various encryption systems were carried out, and all sorts of fabulous and completely unfounded figures, having nothing to do with reality, circulated in online publications and on forums.

As a result, a persistent myth took shape that AES-256 is almost ten times faster than GOST 28147-89 ...

It is time to conduct an honest test.

Honest Tests

Articles devoted to cryptographic transformations quote various “fantastic” figures for the speed of the crypto function, and there is a lot of slyness in these figures. We will be honest: no "synthetic" tests, everything "for real".

Encryption is essentially a background service process. The crypto procedure runs behind much more important tasks, and a situation where encryption loads the processor to 100% is exotic. Therefore, we will not speed up the crypto function by raising its priority or the number of processor cores it uses; we will limit the processor load created by the crypto function to at most 15%.

In the demo version of the FastSecurityBoxs program (a proprietary homemade product; please leave remarks on that in the comments), we create dumps of encrypted disks on a four-core Skylake processor at 2.6 GHz with Hyper-Threading enabled (8 logical cores in total). The cryptographic procedure runs on one logical core out of the 8 available (100% / 8 = 12.5% of total CPU capacity), so the CPU load it creates does not exceed 12-15 percent, which corresponds to the real operation of a background task. Two SSDs were used for copying; their read/write speed through the file system is approximately 450-500 MB/s on a cleaned device, after running TRIM.
Here is a plain copy without cryptography:

image

Reading disk sectors (ProjectFK.exe) takes 5% of processor time, and writing to a file (System) takes 2%, at a speed of 449 MB/s. Remember these numbers: when we turn on cryptography, the cost of crypto-transformation is added to them, so we can estimate how much processor load the crypto-transformation itself creates.

Now let us turn on cryptography while creating the dump. First, encryption in 8 streams according to GOST 28147-89:

image

Screenshot: creating disk backups with crypto-transformation strictly according to GOST in 8 streams.

The encrypted dump is actually created at 190 MB/s. The speed is limited by the crypto-transformation, since the CPU load generated by ProjectFK.exe is 12%, which in our case is the limit for one logical processor core. The SSDs can work much faster, but they are held back by the restriction that cryptographic procedures use only half of one physical processor core.

The crypto-transformation in the ProjectFK program accounts for 7% of the CPU load.

Now encryption in 16 streams according to GOST 28147-89:

image

Screenshot: creating disk backups with crypto-transformation strictly according to GOST in 16 streams. This mode works effectively only on Intel processors of the Skylake generation and later, but more on that below.

The speed increased to 334 MB/s. The CPU load is 10.5%; here too the crypto-transformation is the limiting factor, but the load on the logical processor core is lower.

The crypto-transformation in the ProjectFK program accounts for 5.5% of the CPU load.

Well, and what can AES encryption offer? Take the most advanced solution that uses this cipher, BitLocker. Its encryption function is built into the OS kernel and optimized as much as possible; it is not a user application like FastSecurityBoxs, so it has a significant head start ...

This is what BitLocker delivers (AES-128, 10 rounds) when creating a dump on an encrypted disk:

image

Here the ProjectFK.exe program only creates the dump; the dump is encrypted by BitLocker directly in the System process that writes it to disk, so the processor loads created by the two processes must be added together. The total load is 12.5 percent of processor time at an encrypted dump creation speed of 392 MB/s.

In the System process, the crypto-transformation accounts for 5% of the CPU load.

Not bad at all, but keep in mind that AES-128 is not the equal of GOST 28147-89: they are in different weight classes of cryptographic strength.

The bottom line of the tests

Background encryption according to GOST 28147-89 (5-7% CPU load) at about 200 MB/s is already remarkable: no real-world application reaches these parameters, even with hardware-accelerated AES-256 in its 14 rounds.

AES-128, with its 10 rounds, can work a little faster, but the cryptographic strength of this algorithm is much lower than that of GOST 28147-89, because its key is half the size.

Let us state the obvious. The cryptographic procedure according to strict GOST 28147-89 in its multithreaded version can run in the background at 200-400 MB/s while loading the processor by only 5-7 percent. And nothing prevents raising the speed further by increasing the CPU load.
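To compare the three runs on one scale, the measurements above can be reduced to throughput per percent of CPU load spent on crypto-transformation. The sketch below only divides the figures quoted in the tests; the dictionary keys and the helper name `mb_per_cpu_percent` are illustrative, and the result is a rough indicator that ignores key size and number of rounds.

```python
# (throughput in MB/s, CPU % spent on crypto-transformation), from the tests above
measurements = {
    "GOST 28147-89, 8 streams": (190, 7.0),
    "GOST 28147-89, 16 streams": (334, 5.5),
    "BitLocker AES-128": (392, 5.0),
}


def mb_per_cpu_percent(throughput, cpu_percent):
    """Throughput obtained for each percent of total CPU load."""
    return throughput / cpu_percent


efficiency = {name: mb_per_cpu_percent(*m) for name, m in measurements.items()}

for name, value in efficiency.items():
    print(f"{name}: {value:.1f} MB/s per 1% CPU")
# GOST 28147-89, 8 streams: 27.1 MB/s per 1% CPU
# GOST 28147-89, 16 streams: 60.7 MB/s per 1% CPU
# BitLocker AES-128: 78.4 MB/s per 1% CPU
```

By this measure the move from 8 to 16 streams more than doubles the efficiency of GOST 28147-89 and brings it within about a quarter of hardware-accelerated AES-128.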

According to strict GOST 28147-89, in multithreaded mode, an FSB-certified cryptographic tool can provide background encryption of modern SSD and HDD disks on the SATA interface with a processor load of no more than 7% for this operation. This makes it possible to use GOST 28147-89 in programs like BitLocker even more efficiently than the currently used American standard AES-128, and with much stronger cryptographic protection ...

For NVMe disks, which already operate at 2–3 GB/s, and for 10G and faster networks, the speed needs to be increased at least another twofold.

There are two ways. The first, passive way has already been tried, when we switched from 8 streams to 16 streams in implementing GOST 28147-89: one can wait until the AVX-512 instruction set is implemented in processors. These instructions operate on 64-byte registers and can provide multithreaded execution according to strict GOST 28147-89 in 32 streams at once, which will automatically double the performance of the crypto function.
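The expected doubling can be written down as a simple projection. The sketch below assumes that the number of streams grows linearly with register width (anchored on the figure of 32 streams for 64-byte registers) and that speed grows linearly with the number of streams; both are idealizations, and the function names are hypothetical.

```python
# Register widths in bytes, as listed earlier in the article
REGISTER_BYTES = {"XMM": 16, "YMM": 32, "ZMM": 64}

# ASSUMPTION: streams scale linearly with register width, anchored on
# the article's figure of 32 streams for a 64-byte register.
BYTES_PER_STREAM = 64 // 32

MEASURED_16_STREAMS_MB_S = 334  # measured above in the 16-stream test


def streams_for(register_bytes):
    """Number of parallel streams that fit a register of the given width."""
    return register_bytes // BYTES_PER_STREAM


def projected_speed(streams):
    """Idealized linear projection from the 16-stream measurement, in MB/s."""
    return MEASURED_16_STREAMS_MB_S * streams / 16


for name, width in REGISTER_BYTES.items():
    n = streams_for(width)
    print(f"{name}: {n} streams, ~{projected_speed(n):.0f} MB/s")
# XMM: 8 streams, ~167 MB/s
# YMM: 16 streams, ~334 MB/s
# ZMM: 32 streams, ~668 MB/s
```

Note that the measured 8-stream speed was 190 MB/s rather than the 167 MB/s this projection gives, so the step from 8 to 16 streams yielded about 1.76x rather than 2x. The doubling for 32 streams is therefore best read as an upper bound at the same CPU load.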

There is a small “but”: a processor with AVX-512 support will appear only next year, and at first the AVX-512 instructions will be emulated by firmware, which means they will execute very slowly. So slowly that using them in real programs is pointless.

We have already seen this with the introduction of the AVX2 instructions: three years passed between the appearance of this instruction set and the move of its execution to a hardware cycle. Only in 2016 did Intel introduce hardware support for AVX2 instructions, in Skylake generation processors.

Waiting for Intel to move AVX-512 to a hardware implementation will take about 3 more years ...

The second way is active: develop a new encryption algorithm designed for multithreading from the outset, because at present only a small part of the potential of parallel computing is used.

So let us take this way and not wait 3 years ... Expect the third coming of GOST 28147-89 in the near future.

Source: https://habr.com/ru/post/318768/

