Random numbers are generated constantly by every machine that exchanges data. And even a machine that exchanges nothing needs randomness, for example to place programs in memory at unpredictable addresses. A computer, being a deterministic system, cannot of course create truly random numbers.
When it comes to random (or pseudo-random) number generators, the story always revolves around the search for true randomness. Serious mathematicians have debated for decades what counts as random, while in practical terms we long ago learned to harvest the "correct" entropy. But the noise is only the tip of the iceberg.
Where do we start if we want to untangle the strongest PRNG and TRNG algorithms? Whatever algorithm you deal with, it rests on three pillars: a seed, a table of predefined constants, and mathematical formulas.
And whatever the seed, even true random number generators rely on algorithms somewhere in the pipeline, and algorithms themselves are never random.
The first workable definition of a random sequence was given in 1966 by the Swedish statistician Per Martin-Löf, a student of Andrey Kolmogorov, one of the greatest mathematicians of the 20th century. Before that, researchers had tried to define a random sequence as one that passes all tests of randomness.
Martin-Löf's key idea was to use computability theory to formally define the notion of a test of randomness. This contrasts with the idea of randomness in probability theory, where no particular element of a sample space can be called random.
A "random sequence", in Martin-Löf's view, should be typical: it should have no individual distinguishing features.
Martin-Löf randomness has since been shown to admit many equivalent characterizations, each of which satisfies our intuitive sense of the properties random sequences ought to have. The existence of these multiple equivalent definitions, and their stability across different computational models, suggests that Martin-Löf randomness is a fundamental property of mathematics.
The earliest random number generation algorithms performed a series of basic arithmetic operations: multiply, divide, add, subtract, average, and so on. Next to them, powerful modern algorithms such as Fortuna and Yarrow (used in FreeBSD, AIX, macOS, NetBSD) look like random number generators for the paranoid. Fortuna, for example, is a cryptographic generator in which, to guard against compromise, after every request for random data (up to 2^20 bytes per request) another 256 bits of pseudo-random data are generated and used as the new encryption key; the old key is destroyed each time.
Years passed before the simplest algorithms evolved into cryptographically secure pseudo-random number generators. Part of that evolution can be traced through the workings of a single function in the C language.
The rand() function is the simplest of C's random number functions.
#include <stdio.h>
#include <stdlib.h>

int main()
{
    int r, a, b;

    puts("100 Random Numbers");
    for (a = 0; a < 20; a++) {
        for (b = 0; b < 5; b++) {
            r = rand();
            printf("%d\t", r);
        }
        putchar('\n');
    }
    return 0;
}
This example uses a nested loop to display 100 random values. rand() is good at churning out many values, but they are predictable. To make the output less predictable, you need to seed the generator, which is done with the srand() function.
The seed is the starting number, the point from which the pseudo-random sequence begins. A pseudo-random number generator uses a single initial value, and that is where its pseudo-randomness comes from. A true random number generator instead starts from a high-quality random value supplied by various entropy sources.
#include <stdio.h>
#include <stdlib.h>

int main()
{
    unsigned seed;
    int r, a, b;

    printf("Input a random number seed: ");
    scanf("%u", &seed);
    srand(seed);
    for (a = 0; a < 20; a++) {
        for (b = 0; b < 5; b++) {
            r = rand();
            printf("%d\t", r);
        }
        putchar('\n');
    }
    return 0;
}
srand() takes a number and uses it as the starting point. If the seed is not set, every run of the program produces the same "random" numbers.
Here is a simple random number formula from a classic, "The C Programming Language" by Kernighan and Ritchie, whose first edition was published in 1978:
int rand()
{
    random_seed = random_seed * 1103515245 + 12345;
    return (unsigned int)(random_seed / 65536) % 32768;
}
This formula assumes a variable called random_seed, initially set to some number. random_seed is multiplied by 1,103,515,245, then 12,345 is added to the result, and random_seed is replaced with this new value. It is actually a pretty good random number generator. If you use it to create random numbers from 0 to 9, the first 20 values it returns for seed = 10 are:
4 4 6 0 7 4 2 3 5 0 5 6 6 4 5 6 7 6 7 4
If you generate 10,000 values from 0 to 9, the distribution will be as follows:

0 - 1015
1 - 1024
2 - 1048
3 - 996
4 - 988
5 - 1001
6 - 996
7 - 1006
8 - 965
9 - 961
Any pseudo-random number formula depends on its initial value. If you seed rand() with 10 on one computer and look at the stream of numbers it produces, the result will be identical to the "random sequence" produced on any other computer seeded with 10.
Unfortunately, such a generator has another weakness: you can always predict what comes next from what came before. To produce the next number in the sequence, the generator must remember its last internal state. Given the same state, the same mathematical operation on the same numbers always yields the same answer.
How do we make the seed unique each time? The most obvious solution is to mix in the current system time, which can be done with the time() function.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main()
{
    int r, a, b;

    srand((unsigned)time(NULL));
    for (a = 0; a < 20; a++) {
        for (b = 0; b < 5; b++) {
            r = rand();
            printf("%d\t", r);
        }
        putchar('\n');
    }
    return 0;
}
The time() function returns the current calendar time, a value that changes constantly; the cast ensures that the value it returns is passed to srand() as an unsigned integer.
So, by mixing in the "random" system time, rand() generates values that look more random than those in the first example.
However, the seed can now be guessed by anyone who knows the system time or when the application was launched. As a rule, for applications where random numbers are truly critical, it is best to find an alternative.
The .NET framework offers the System.Security.Cryptography.RandomNumberGenerator class, which folds a number of additional factors into its calculations. But again, all these numbers are still not truly random.
The best you can do with a deterministic pseudo-random number generator is to add entropy from physical phenomena.
One of the most frequently used methods of generating pseudo-random numbers is some modification of the linear congruential method, whose scheme was proposed by D. H. Lehmer back in 1949:

X_{n+1} = (a·X_n + c) mod m,

where m is the modulus, a the multiplier, c the increment, and mod the remainder-of-division operation. Here m > 0, 0 ≤ a < m, 0 ≤ c < m, and the initial value X_0 satisfies 0 ≤ X_0 < m.
The linear congruential method produces repeating sequences: a congruential sequence always forms a loop. This cycle (the period), repeated infinitely many times, is a property of every sequence of the form X_{n+1} = f(X_n).
In C, the linear congruential method is implemented in the rand() function you already know:
#define RAND_MAX 32767

unsigned long next = 1;

int rand(void)
{
    next = next * 1103515245 + 12345;
    return (unsigned int)(next / 65536) % (RAND_MAX + 1);
}

void srand(unsigned int seed)
{
    next = seed;
}
What is a cycle in terms of random numbers? The period is the count of numbers generated before the sequence starts repeating. For example, the period of the A5 cipher averages 2^23, and the complexity of attacking it is about 2^40, which makes it breakable on any personal computer.
Consider a case where the seed is 1 and the period is 100 (in Haskell):
random i = (j, ans)
  where
    j   = 7 * i `mod` 101
    ans = (j - 1) `mod` 10 + 1   -- just the ones place, but 0 means 10
As a result, we get the following answer:
random  1 -> ( 7,  7)
random  7 -> (49,  9)
random 49 -> (40, 10)
random 40 -> (78,  8)
random 78 -> (41,  1)
random 41 -> (85,  5)
random 85 -> (90, 10)
random 90 -> (24,  4)
random 24 -> (67,  7)
random 67 -> (65,  5)
random 65 -> (51,  1)
This is just an illustration; nothing like it is used in real life. In Haskell, if you want to build a random sequence, you can use:
random :: StdGen -> (Int, StdGen)
Asking for a random Int gives you back the Int and a new StdGen, which you can use to get further pseudo-random numbers. Many programming languages, including Haskell, also provide generators that keep track of their state automatically (in Haskell, this is randomIO).
The longer the period, the better the chance of producing good random values, but even with billions of cycles a reliable seed is crucial. True random number generators usually draw on atmospheric noise (substitute any physical phenomenon here, from the user's mouse movements to radioactive decay), but we can also cheat in software by mixing asynchronous streams of assorted "garbage" into the seed: intervals between heartbeat-like events, mutex wait times, or, better yet, all of them together.
So: take a seed laced with data from real physical phenomena (or at least complicate a future attacker's life with the largest collection of software noise streams we can think of), keep state to guard against repeated values, and add cryptographic algorithms (or hard mathematical problems), and we get a data set that we are willing to treat as a random sequence. What next?
Then we return to the very beginning, to one of the fundamental requirements - tests.
The US National Institute of Standards and Technology collected 15 basic checks in its "Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications". One can stop there, but this suite is by no means the last word in randomness testing.
One of the most stringent batteries of statistical tests was proposed by Professor George Marsaglia of Florida State University. The Diehard tests include 17 different checks, some of which require very long sequences: at least 268 megabytes.
Randomness can also be tested with the TestU01 library by Pierre L'Ecuyer and Richard Simard of the Université de Montréal, which includes both the classic tests and some original ones, as well as with the publicly available SPRNG library.
Source: https://habr.com/ru/post/351282/