
Why is Keccak so cool and why was it chosen as the new SHA-3



Hi, %username%!

Not being a professional mathematician or cryptographer, I rarely understand right away how a given algorithm works, let alone why it was chosen.

The same goes for the new SHA-3 standard. Some Keccak got picked; comrade NeverWalkAloner kindly gave a description of it, but personally I did not understand how it works and what its killer feature is. Let's figure it out.



At the end of the article there is a small bonus for the paranoid: some food for thought about the strength of SHA-2.





To begin with, every hash function strives to be as close as possible to the so-called "random oracle", so let's start with a few words about it.


A random oracle is an abstract black box with infinite memory that works according to the following algorithm: when it receives a string it has never seen before, it returns a truly random answer and remembers it; when it is asked about a string it has already seen, it returns exactly the same answer as before.





Looks like a hash, right? With the fundamental difference that the relationship between the hashed string and the result cannot be computed even in principle. Moving on.
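To make the idea concrete, here is a minimal sketch of such an oracle in Java. The class and method names (ToyRandomOracle, query) are invented for illustration; a real random oracle cannot exist as code, since it would need true randomness and unbounded memory.

import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: the first time an input is seen, a random answer is drawn
// and remembered; any repeated query returns exactly the same answer.
class ToyRandomOracle {
    private final Map<String, byte[]> memory = new HashMap<>();
    private final SecureRandom rng = new SecureRandom();
    private final int outputBytes;

    ToyRandomOracle(int outputBytes) {
        this.outputBytes = outputBytes;
    }

    byte[] query(String input) {
        return memory.computeIfAbsent(input, k -> {
            byte[] answer = new byte[outputBytes];
            rng.nextBytes(answer); // the answer has no relation to the input at all
            return answer;
        });
    }
}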



Another scourge of all hash functions is collisions. Not the ones that must exist by definition because the hash has a fixed size, but the bad ones that turn up far earlier than they should (collisions of the first and second kind). Most modern hash designs use so-called compression functions that turn a block of, say, n bits into a block of, say, n/2 bits. And so far nobody has proved that a collision-resistant compression function can be built at all.
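Just to illustrate why collisions are unavoidable in principle, here is a toy compression function (entirely made up, not taken from any real hash) that maps 16 bits down to 8 bits; a brute-force loop immediately finds two inputs with the same output.

import java.util.Arrays;

class ToyCompression {
    // maps a 16-bit input to an 8-bit output; by the pigeonhole principle collisions must exist
    static int compress(int x) {
        return ((x * 31) ^ (x >>> 7)) & 0xFF;
    }

    public static void main(String[] args) {
        int[] firstInput = new int[256];
        Arrays.fill(firstInput, -1);
        for (int x = 0; x < (1 << 16); x++) {
            int h = compress(x);
            if (firstInput[h] >= 0) { // some earlier input already produced this value
                System.out.println(firstInput[h] + " and " + x + " collide on " + h);
                return;
            }
            firstInput[h] = x;
        }
    }
}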



So the Keccak authors went a different way. They decided to use a pseudo-random permutation, in which there can be no collisions by definition. Let's dwell on this point in more detail.



Recall how any block cipher works.


It takes a plaintext block of fixed length and a key and maps them to a ciphertext block of the same length. The mapping is unambiguous, otherwise we could not decrypt the block back. So an ideal block cipher (say, with a 128-bit block) can be pictured as an unimaginably huge table: the first row lists all possible keys (2^128 of them), the first column lists all possible plaintext blocks (also 2^128 of them), and at each intersection of a row and a column sits a truly randomly generated cipher block, with the key-plaintext-ciphertext relation being one-to-one.



In mathematics, this one-to-one correspondence is called a bijection.



Note that in any column of this table (that is, under any fixed key) you can find all possible 128-bit blocks in a random order, and each of them occurs exactly once. That is a permutation. In real life, of course, pseudo-random permutations described by algorithms are used, because generating such a table is impossible.
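A toy illustration of this table, assuming nothing about real ciphers: a made-up "cipher" on 3-bit blocks with 3-bit keys. For every fixed key the mapping from plaintext block to cipher block is a permutation, i.e. each of the eight possible values appears exactly once in that column.

class ToyBlockCipher {
    // invented for illustration: (x + key) mod 8 is a bijection in x, and
    // multiplying by the odd number 5 mod 8 keeps it a bijection
    static int encrypt(int x, int key) {
        return ((x + key) * 5) & 7;
    }

    public static void main(String[] args) {
        for (int key = 0; key < 8; key++) {
            boolean[] seen = new boolean[8];
            for (int x = 0; x < 8; x++) {
                seen[encrypt(x, key)] = true;
            }
            boolean bijective = true;
            for (boolean s : seen) {
                bijective &= s;
            }
            System.out.println("key=" + key + ": every cipher block occurs exactly once -> " + bijective);
        }
    }
}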



I think it is now clear even to non-specialists that collisions simply have no place here: otherwise the same plaintext block would map to different cipher blocks, or two different plaintext blocks to the same cipher block.



So what do hashes and Keccak have to do with all this?



The Keccak authors came up with a simple (once you look at it) construction, the so-called sponge function.

Inside this "sponge" there is a state (1600 bits in size for SHA-3), to which the same function f, implementing a pseudo-random permutation, is applied at every round.



In essence this is a block cipher without a key, with a block size of 1600 bits. And if we fill the state with zeros, run 10 rounds in one direction (apply their function f 10 times) and then the same number of rounds in the opposite direction (with the inverse of f), we get zeros again. A bijection in all its glory. But that is not all.
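Here is a minimal sketch of that experiment with a made-up invertible function on a tiny 16-bit state (the real Keccak-f works on 1600 bits and looks nothing like this): apply f ten times, apply its inverse ten times, and the all-zero state comes back.

class ToyPermutationDemo {
    // a made-up invertible transformation: add, rotate and xor are all reversible
    static int f(int s) {
        s = (s + 0x5A5A) & 0xFFFF;            // add a constant modulo 2^16
        s = ((s << 3) | (s >>> 13)) & 0xFFFF; // rotate the 16-bit value left by 3
        s ^= 0x3C3C;                          // xor a constant
        return s;
    }

    static int fInverse(int s) {
        s ^= 0x3C3C;                          // undo the xor
        s = ((s >>> 3) | (s << 13)) & 0xFFFF; // rotate right by 3
        s = (s - 0x5A5A) & 0xFFFF;            // undo the addition
        return s;
    }

    public static void main(String[] args) {
        int state = 0; // the all-zero state
        for (int i = 0; i < 10; i++) state = f(state);
        for (int i = 0; i < 10; i++) state = fInverse(state);
        System.out.println(state); // prints 0: the mapping is a bijection
    }
}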



To hash a string, it is first split into pieces of a certain size, and at each round the next piece is added (mixed) not into the entire 1600-bit state, but only into its first r bits. Here I will insert the picture again; now it makes sense.





That is, at each round the next piece of the string is mixed into only a part of the state, while the pseudo-random permutation f then processes the entire state, smearing everything absorbed so far across it and making the result depend on the whole string.



This is the so-called "absorbing" stage, and now it is clear why the construction is called a sponge. I hope it is clear to you too. Just a little bit left!



To get the hash, we keep applying the permutation f to the state, and at each step we copy out of it only a piece of size r, concatenating these pieces until we have a hash of the required length. This is the so-called "squeezing" of the sponge. That's it.
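To tie the two stages together, here is a deliberately simplified sponge sketch: a 16-byte state, a rate r of 8 bytes, a capacity c of 8 bytes, naive zero padding, and a made-up permutation instead of Keccak-f[1600]. None of the sizes or names here come from the real SHA-3; the point is only to show the data flow of absorbing and squeezing.

import java.util.Arrays;

class ToySponge {
    static final int STATE = 16; // total state size in bytes
    static final int RATE = 8;   // r: the part the message is mixed into and read from

    // made-up stand-in for Keccak-f: every step is reversible, so f permutes states
    static void f(byte[] s) {
        for (int round = 0; round < 8; round++) {
            for (int i = 0; i < STATE; i++) {
                int b = s[i] & 0xFF;
                b = ((b << 3) | (b >>> 5)) & 0xFF; // rotate the byte left by 3
                s[i] = (byte) (b ^ round ^ i);     // xor a round/position constant
            }
            for (int i = 0; i < STATE - 1; i++) {
                s[i] ^= s[i + 1];                  // mix neighbours (undoable in reverse order)
            }
        }
    }

    static byte[] hash(byte[] message, int outLen) {
        byte[] state = new byte[STATE];
        int blocks = Math.max(1, (message.length + RATE - 1) / RATE);
        byte[] padded = Arrays.copyOf(message, blocks * RATE); // naive zero padding

        // absorbing: xor each r-byte piece into the first r bytes, then permute the whole state
        for (int off = 0; off < padded.length; off += RATE) {
            for (int i = 0; i < RATE; i++) {
                state[i] ^= padded[off + i];
            }
            f(state);
        }

        // squeezing: copy out the first r bytes, permute, repeat until we have enough output
        byte[] out = new byte[outLen];
        for (int off = 0; off < outLen; off += RATE) {
            System.arraycopy(state, 0, out, off, Math.min(RATE, outLen - off));
            f(state);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(hash("hello, sponge".getBytes(), 16)));
    }
}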



Because only a part of the state is ever exposed rather than all of it, nothing can be said about the relationship between the input and the output. And the Keccak authors proved that the security of such a construction is indistinguishable from a random oracle with an "answer" of size c/2 (the whole state being c + r bits), provided f is an ideal permutation (the same kind of table as for an ideal block cipher, only with two columns, because there is no key). So to get a hash with a mathematically proven security level of 512 bits, the parameter c (the untouched part of the state) has to be 512 * 2 = 1024 bits. Thus, in the 1600-bit state, for a 512-bit hash we never touch 1024 bits and mix the next piece of the hashed string into the first 576 bits at each round.



There is, however, a fly in the ointment:

The authors described the function f well: it consists of five fairly simple reversible operations (there are even animations), it looks good and nobody has broken it yet, but there are no guarantees it will not be broken in the future. If that happens, though, only f will need to be replaced (for example, with an ordinary block cipher like AES, just without a key), and the "sponge" itself can stay, since its security is proven under the assumption that f is secure.



That's all from me; I hope I managed to lay everything out more or less clearly. Thanks to the folks at pgpru for consulting me on bijections, and to the Keccak authors for the code of the inverses of the operations that make up f (available in the keccak-tools package on their website).



Bonus
While digging into the HC-128 stream cipher I noticed two interesting functions: each takes a number, rotates it by two different amounts of bits, and XORs the results together with the same number shifted to the right. Three different constants each (six across the two functions).

In Java it looks like this:



static int func(int x, int a, int b, int c)
{
    // ror is a right rotation; Integer.rotateRight is the standard Java equivalent
    return Integer.rotateRight(x, a) ^ Integer.rotateRight(x, b) ^ (x >>> c);
}





Then it turned out that these are the two functions σ0 and σ1 from SHA-2.



And something about them bothered me: they look beautiful, but how they actually work, who knows.



The original a, b, c for SHA-2 and HC-128:

{7, 18, 3} and {17, 19, 10}



I decided to compute a tricky sort of period by running the endless loop f(f(f(f...(1)))) and looking for a repetition. It turned out that with the original constants this "period" is 49146, and I managed to find a bunch of constant pairs whose "period" is 131070. And, just like with the original constants, the sets of values generated during these cycles did not overlap. For example:



{22, 13, 3} and {18, 4, 9},

{24, 15, 2} and {18, 3, 4}



Meanwhile, if you measure the period properly, running func over the inputs from 0 to 2^32 - 1, its period is 2^32; in other words, it is again a bijection.
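A sketch of how that "period" measurement can be reproduced, under the assumption that "looking for a repetition" means iterating until some value shows up a second time; the class and method names are made up:

import java.util.HashSet;
import java.util.Set;

class CycleExperiment {
    static int func(int x, int a, int b, int c) {
        return Integer.rotateRight(x, a) ^ Integer.rotateRight(x, b) ^ (x >>> c);
    }

    // iterate x -> func(x, a, b, c) starting from 1 and count steps until a repeat;
    // memory use grows with the cycle length, which is fine for lengths of a few
    // tens of thousands like those mentioned above
    static long stepsUntilRepeat(int a, int b, int c) {
        Set<Integer> seen = new HashSet<>();
        int x = 1;
        long steps = 0;
        while (seen.add(x)) { // add() returns false the moment x repeats
            x = func(x, a, b, c);
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println("{7, 18, 3}:   " + stepsUntilRepeat(7, 18, 3));
        System.out.println("{17, 19, 10}: " + stepsUntilRepeat(17, 19, 10));
    }
}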



A further quote from the pgpru thread:

Even well-known cryptographers have tried to investigate this NSA-style trick, but they do not seem to have come to any definite conclusion.



[The quoted passage is garbled here; it concerns the Σ and σ functions being linear mappings over GF(2).]




"Evaluation Report Security Level of Cryptography - SHA-256" © Dr. Helena Handschuh, Dr. Henri gilbert



… It is not possible anymore (even in the XOR-linearized case). This is due to the bit-shift being used in σ0 and σ1.




"Preliminary Analysis of the SHA-256 Message Expansion" © Norbert Pramstaller, Christian Rechberger, Vincent Rijmen



[The quoted passage is garbled here; it concerns the role played by the S-boxes σ0 and σ1.]




"Analysis of simplified variants of SHA-256" © Krystian Matusiewicz, Josef Pieprzyk, Norbert Pramstaller, Christian Rechberger, Vincent Rijmen.



In essence, these are 32x32 linear (or nearly linear) S-boxes described by a simple combination of operations, and how exactly they were chosen is not clear.




So that's how things stand. They stuffed in some constants, pushed their algorithm through as the standard, and explained nothing to anyone. And who knows, maybe the pseudo-period I found lets the folks at the NSA forge hashes as easily as snapping their fingers. Or maybe not.

Source: https://habr.com/ru/post/168707/


