Time-memory trade off and idle tables

No, I will not tell you with what parameters you need to generate rainbow tables, or how to invent "strong" passwords. By itself, the subject matter is a bit outdated and will hardly help in abstract questions. But, as it turned out, the “rainbow tables” are based on a wonderful method (I would not call it a method or algorithm) to exchange time for memory, I mean, “time-memory trade off”. This is not the first (and, probably, not the last) topic about predictions, but I hope you will like it.

Hash functions

The paragraph is boring, those who know - flick through. But if you do not know, or doubt, then proceed.

Hash function - a function that performs one-way conversion. For a given string (password), it receives a certain hash value (hex-notation: usually a sequence of numbers and letters a - f), such that it is very difficult to obtain any information about the original password string.

Initially, the word is taken from the English language: hash is a jumble, garbage, unnecessary information. There are two possible transliteration options in Russian: hash or hash.
The meaning of the word can be interpreted as follows: by the hash value, we will not get any useful information, so by itself it is a meaningless set of bytes, garbage.
')
However, having the property of ordinary functions (that is, by the same lines at any time we get the same hash values), hash functions are useful in the area of authorization (access restriction). By password it is very easy to get a hash value that can be stored and used to compare passwords with the original one. At the same time, even with access to the database of hash values, it would seem, you can not get access to passwords.

A simple example: We want to save the test string as the password for the Admin user. To do this, we write to the password database:
Admin: 098f6bcd4621d373cade4e832627b4f6 (this sequence of characters is the MD5 hash value from the test string).
Now we can check any string for compliance with our password very easily - we calculate the hash value from it, and compare it with the one we saved in the database.

For example, MD5 from the rest line is 65e8800b5c6800aad896f888b2a62afc and it does not match what we saved. So rest is not a password for admin.

At the same time, MD5 from the test string is always 098f6bcd4621d373cade4e832627b4f6 , which coincides with the saved value. So, test is always suitable as a password for Admin.

Password recovery

Forget the word "hacking", and the fact that "recover" can only be "their" passwords - now it does not matter. We have everything — nothing, the hash function f and the hash value y . It is required to find the string x (password), such that f (x) = y , mathematics.

And we would have easily solved the problem if f (x) were equal to x ^ 2 , it seems that the school was taught. But it was not there, mathematicians say that for any honest hash function, it’s possible to find the opposite at a particular point except by direct brute force. And you need to go through O (2 ^ hash value).

In fact, this is not the case, and for all real-life functions this search has a smaller order, and honest hash functions may not exist at all, this question depends on the existence of pseudo-random generators, but they, in turn ... and we will not refute this thesis , if only because we can not.

So bust.

Brute force options

The search task is formalized very easily (albeit in many ways). Suppose we have the alphabet A (do you still remember the letters?). For a given n (for now remember), we construct all the lines of n characters of the alphabet A. For each line, we calculate its hash value. If it coincides with y , then it seems we have solved the problem.

Here I want to upset those who still consider the strength of the password dependent on the length of the password. It depends on length hardly more than on the contents of the alphabet.

And as an example, take the alphabet consisting of 4 lines: a , b , test , rest .

For n = 2, the search will cover the lines aa, ab, atest, arest, ba, bb, btest, brest ... and so on, only 4 ^ 2 = 16 options.

The complexity of the reordering algorithm is O (| A | ^ n).

Parameter evaluation

The search will be successful in one of two cases:
1. we stumble upon a first-order hash collision .
2. The initial password will be contained in A ^ n (that is, it can be composed in the words of the alphabet).

Regarding the first case, I can say that this idea should be left. Even for MD5 , which has only 2 ^ 128 different hash values, this will take a lot of time (millennia when going through a billion hash values per second). Just do not confuse collisions of the first kind in free collisions (of the second kind), they are really easy to find, but in practice it is completely useless.

So we aim at situation 2. It is easy to estimate the parameters and the order of calculations for typical situations.

No cheating

So, the tables will not allow to reduce this order. This would be contradictory to the thesis about the "irreversibility" of hash functions. All that the tables can do is to store some precomputed (for a time longer than the time of a direct search!) Information, which allows finding the corresponding password for each specific value rather quickly .

Precomputation algorithm

We will write to the table a pair of two hash values s _i and e _i . Lots and lots of such couples.

Where s _i = f ( random string from the alphabet ).
s _i ¹ = f ( r (s _i ))
s _i ² = f ( r (s _i ¹ ))
...
e _i = f ( r (s _i ^m ))

So, stop, what is r ? And this is such a special reduction function. It displays the hash value of the function f back into the alphabet A ^ n according to any law, but if possible, preserving injectivity (the function must give different values from different arguments).

If you forget for a while how to get the values, you can write them in a chain :
s _i → s _i ¹ → s _i ² → ... → s _i ^m → e _i , where m is the length of the chain, the same for all chains of one table.

The difficulty of obtaining one pair of O (m), total in the table of N pairs, respectively, O (N * m) time will be spent on the generation of the table. To save to disk O (N) memory. A password to any of the intermediate values of s _i ^k can be found for O (m).

Translation into “human language” will sound something like this: in order to be able to recover any alphabet password A ^ n, you need about A ^ n / 5000 (for m = 5000) disk bytes, about 5000 steps of the recovery algorithm.

The more predicted, the greater the number of passwords we can recover, in the same time. But the "weight" of such tables will be more. So time-memory trade off.

Extreme cases are simple: for m = 1, we need to save all the hash values of a given alphabet to disk, which is a lot. But the recovery time will be O (1) (in fact, O (log (| A | ^ n) due to the use of search, but oh well).

And for m = | A | ^ n, the table only takes n-eleven bytes, but the search will take as much time as without the table. Yes, and may not succeed because of the probabilistic nature of the table (oh, in vain, I said this, then you have to take the rap).

Use table

Great, let us have that same hash value y and a table of pairs (s ₀ , e ₀ ); (s ₁ , e ₁ ); ... (s _N , e _N ).

In the first iteration of the algorithm, we will look for the value of y among the end points of the chains: e _i .

Let us be lucky, y = e _k for some k . Then we regenerate the chain, starting with the corresponding s _k : after all, the element s _k ^N-1 going before e _k has a remarkable property: f (r (s _k ^N-1 )) = e _k = y, which means r ( s _k ^N-1 ) password required!

Let no luck, then we calculate y ₁ = f (r (y)). And our bad luck repeats, y _i = ((f ○ r) ^ i) (y), this I wrote down i applied f and r successively in turn (just as in chains in the table) . It should be noted that it makes no sense to count y _{N + 1} and further - even if they are in the table, the table will not help us. So here is a maximum of N steps, at each of which we count y _i and look for among e _k . And if, finally, it is found:

s _k → s _k ¹ → ... → s _k ^v → s _k ^v+1 → ... → ... → e _k
y → y ₁ → ... → y _i

The arrows here denote all the same - the composition of f and r. So, r (s _k ^v ) is the password for y !

Here is an explanation of why we don’t need to count y _N further: the “lower” chain should be shorter than the upper one; we refer to the previous relative to the y element.

So what is next?

Several questions remained unresolved:
1. how to quickly search the hash in the table
2. "Something about probability" :)
3. why do i need rainbow tables and how they are arranged
4. maybe more examples are needed
5) distinguished points and their simultaneous use with RT
6) debriefing of GSM A5 / 1 cryptanalysis, which is based on 5)

Source: https://habr.com/ru/post/82941/

All Articles