We generate pseudo-random ID a la Youtube

Hi% username%! Sometimes it is necessary to generate IDs in a row , and that they are guaranteed not to be repeated . On youtube, this is used so that you cannot get new and old vidosiki bruteforce, it is also not uncommon on different file sharing sites and generally everywhere where you need to prevent or at least make it difficult to directly search through the values.

For example, in the moodle system, which was used in our university for testing students, response IDs were incremental and cross-cutting across the entire base. It is logical to assume that the correct answer was the one with the lowest ID within the question. In general, we had no problems with tests. Then they switched to the GUID, but by that time I was already released, hehe.

Let's look at several ways to generate such length-limited sequences from the simplest to the most cryptographically stable.

Deduction ring

More precisely, the multiplicative group of the residue ring. In fact, it is easier than it sounds. I will explain the picture:
')

See, the troika degrees here go in order.
1, 2, 3, 4, 5, 6, and the result of the remainder of the division is not
3, 2, 6, 4, 5, 1. But still, as a result, all the numbers from 1 to 6 are present.

7 is called a module, and 3 is called a primitive root (generator). For this to work, the module must be of the form:

where p is a prime number greater than two. Modules for us are the maximum ID values that we can generate with their help.

What is not an approach for generating pseudo-random aydishek? Take the ID in order, we raise the generator to the power of this ID and take the remainder modulo. For sufficiently large modules it will turn out quite to itself.

This, by the way, is almost Diffie Hellman, only on a smaller scale. By the way, those who handles DH parameters for web servers with handles know that the process is not fast. Just because it is not easy to search for a primitive root for large numbers.

Linear congruential method

A popular thing as default random number generators in many programming languages, but not at all crypto-resistant. Here is its formula:

With properly selected parameters provides a period equal to m .

And if a prime number is chosen as m , then even c can be thrown out under certain conditions, the period will be maximal or close to it.

The disadvantages are obvious - you need to look for special numbers, and not the fact that they will be a multiple of the size of the byte. Well, to predict the next member of the sequence, knowing the previous ones, it is possible, it was proved by mathematicians.

Linear shift registers with feedback

Absolutely wonderful constructions that allow you to generate pseudo-random sequences by perekorivaniya just a few bits. What bits Xori defines the polynomial underlying the LFSR. If it is primitive, then the sequence will be maximal.

These primitive polynomials are not so easy to generate, just like looking for prime numbers. But there was still a great primpoly program that can find primitive polynomials of a given degree. It can even list all possible primitive polynomials by passing the -a parameter, for example, Primpoly.exe -a 2 64.

If we need an ID of 8 bytes in size, approximately like on Youtube, then here is the minimal polynomial x ^ 64 + x ^ 4 + x ^ 3 + x + 1.

How to use it:

We have any 64bit number other than zero. We cross the bits 64, 4, 3, and 1 between ourselves. A unit in a polynomial means that the number is shifted to the right by 1 bit, and the result of xor is placed on the high bit.

Implementation example on C:

bit = ((lfsr >> 0) ^ (lfsr >> 60) ^ (lfsr >> 61) ^ (lfsr >> 63) ) & 1; lfsr = (lfsr >> 1) | (bit << 63);

A shift to 0 gives us bit 64, a shift to 1 bit 63, etc. If we execute this code in an infinite loop, then we will eventually arrive at the value from which we started.

And if you drive a number through several registers with different polynomials, then it will be quite difficult to predict such a sequence even to mathematically savvy comrades. By the way, on the principle of a combination of LFSR, the encryption algorithms for GSM traffic A5 / 1 and A5 / 2 are built, although they have already been broken, but nonetheless.

The disadvantage of this approach is that we can only get the ID sequentially, without knowing in advance what will be “through one”. Therefore, we proceed to the next method.

Reversible function

Or call them isomorphic. Here in general there are an infinite number of options, everything is limited by your imagination and desire to wind up the complexity of the process, the result of which is your Aydishechka.

For example, take the functions σ0 and σ1 from SHA-256, which are compared to another 32-bit one 32bit number (they are called f-1 and f2 in the description of the stream cipher HC-128).

>>> this is a cyclic shift, >> this is just a shift to the right.

Here is a quote from the comrades who investigated these functions:

And and ma G G G 2 G F F GF (2) - vector).

It is clear from it that they are isomorphic (one to one), you can not check it yourself. We can even not build the inverse function, we simply run all the numbers in the cycle through one or the other, or both, or a couple of times one, then a couple of times another, well, you understand. And we get all the IDs in the 32-bit range in a pseudo-random order that only we know about.

Such functions can be thought out, as you understand, you can basilion and combine as you like. But for now, this is also not a cryptographic way. Yes, and get back, if necessary, the original ID simply will not work.

Crypto-resistant generation of ID sequences of any size.

You will not believe it, but everything has already been invented for us, I even wrote an article about it .

Only here we do not have credit card numbers, but numbers as large as we need. 16 to 192 bits in 1 bit increments for one round of the BPS algorithm. We take the base of the system as we need, not necessarily a multiple of 8. In the algorithm, the limit on the maximum base is 65536, but nothing prevents you from doing more, technically 96 bits are placed in one half of the Feistel network. Or do not pervert at all, make the base 256 and just encrypt as many bytes as there are in your ID.

So, set up a BPS for this base, a key for AES, IV (Tweak) and run all the original ID from 1 to “system base - 1” through this BPS. Then what we have turned out is wrapped up in some base58 and here you have a beautiful aydishechka ready.

We don’t even need to store it , we can decipher it and match the original normal aydishka, the backend may not even be aware that we are perverted with ID.

True random

The truest approach in terms of cryptoresistance (even theoretically, nothing can be predicted), but inconvenient in the economy. We generate each ID absolutely randomly with some piece of hardware and check in the database that there is no such one yet and then we write. But it is necessary to climb into the database every time to check, over time, the generation will be slower and other problems.

So it goes.

Source: https://habr.com/ru/post/311676/

All Articles