📜 ⬆️ ⬇️

Database of primes

Just now, again carried away with prime numbers. Lures me in their secret.

Wrote an algorithm similar to the sieve of Eratosthenes. For 3 hours, the program found 700 thousand first prime numbers. And I need at least 14 million prime numbers to multiply them to get a number with the number of decimal digits equal to 100 million pieces.

From the article "Once again on the search for prime numbers," written by a user of Bodigrim , I learned of the existence of a fast primegen program that works using an Atkin sieve . Installed it in the LUbuntu virtual machine (VirtualBox). Indeed, the primegen works very fast!
')
Then the question was how to save 14 million prime numbers? You can simply write every prime number to the file as int32. And if a prime number will be more than the power of 32 bits?

I got the idea to write to the file not the numbers themselves, but the distances between them. And the distance between adjacent prime numbers should always be small, suggested that fit in one byte.

It remains to find out the maximum possible distance for a certain range of numbers. Since the difference between primes is always an even number (except for the distance between 2 and 3), we divide the distance by 2.

In the primegen program, in the source file primes.c, instead of displaying the number on the screen, it implemented an algorithm for calculating statistics on the number of distances between the numbers:

int RastCount_Index = 0; int RastCount[1000]; for(i=0;i < 1000; i++) RastCount[i] = 0; for (;;) { u = primegen_next(&pg) - p; p += u; if (p > high) break; for (i = 0;u;++i) { u += digits[i]; if (u >= 200) { digits[i] = u % 10; u = u / 10; } else { digits[i] = mod10[u]; u = div10[u]; } } if (i > len) len = i; int LetsRast, index; LetsRast = 0; index = 0; char r[40], r_old[40]; for (i = 0;i < 40; i++) { r[i] = 0; r_old[i] = 0; } for (i = len - 1;i >= 0;--i) { if (! LetsRast) if (digits_old[i] != digits[i]) LetsRast = 1; if (LetsRast) { r[index] = '0' + digits[i]; r_old[index] = '0' + digits_old[i]; index++; } } int ri, ri_old, Rast; ri = atoi(r); ri_old = atoi(r_old); Rast = (ri - ri_old) >> 1; RastCount[Rast]++; if (Rast > RastCount_Index) RastCount_Index = Rast; for (i = len-1;i >= 0; i--) digits_old[i] = digits[i]; } for(i = 0; i <= RastCount_Index; i++) printf("%i = %i\n", i, RastCount[i]); 


In the terminal performed:

 ./primes 1 1000000000 

After 10 seconds, the list was displayed:
0 = 1 (distance between the numbers 2 and 3)
1 = 3424507
...
141 = 1

Thus, 141 is the maximum possible distance on a prime number of up to 1 billion.

Wrote the code to write to the file:

 FILE* fd; fd = fopen("primes.bin", "w+"); unsigned char b1[1]; b1[0] = Rast; fwrite(b1,1,1,fd); fclose(fd); 

Launched up to 300 million:

 ./primes 1 300000000 

In the file primes.bin received 16 million first prime numbers. Compressed with the archiver 7-Zip, the file is shrunk to 9 MB.

PS There is an idea how to compress the database of primes even more. It is necessary to divide simple numbers into 4 groups according to the last decimal digit: 1, 3, 7, 9. Divide the distance between the numbers by 10. Also form 4 files. It is possible that for the distance values ​​it will be possible to allocate not 8 bits, but less, then it will be necessary to implement the formation of a byte buffer from, for example, 5-bit distances.

Although in our century of gigabyte capacities it is not very important.

Source: https://habr.com/ru/post/246789/


All Articles