In the process of learning programming, I was faced with the task of finding 1M first prime numbers.
On Habré have already written about this. But, most often, it was about finding all prime numbers up to the number
N. I will tell you how I solved the problem of finding the
N first prime numbers.
At first glance it may seem that there is practically no difference in these tasks. And this is true until we realize that the enumeration of dividers copes with the task is extremely bad.
')
As a result of the work done to find 1 million primes, it takes me just 0.262 seconds, which, of course, is not the limit, but already impressive. The algorithm is implemented in C ++.
Like the author of the previous article
on prime numbers , I began to solve the problem head-on. So, the prime number
p is what has two dividers - one and
p . The first thing that comes to mind is to check each number by dividing into all previous numbers.
The idea turned out to be bad. I understood this after the first 30 minutes of the program in search of a million simple ones. The results were as follows:
10,000 - 2.349 seconds
100,000 - 293.552 seconds
1,000,000 - did not wait
As in the article already mentioned above, I began to optimize the algorithm. First, it does not make sense to check even numbers, because 2 is the only prime even number. The same with dividers - you can not check for even ones. Third, as divisors, it is sufficient to take numbers that do not exceed half of the number being checked. Here's what came of it:
// Listing 1 <br/>
<br/>
#include <iostream> <br/>
#include <vector> <br/>
#include <ctime> <br/>
<br/>
using namespace std ; <br/>
<br/>
int main ( ) <br/>
{ <br/>
clock_t start = clock ( ) ; <br/>
<br/>
const int AIM = 10000 ; <br/>
<br/>
vector < int > prime_nums ; <br/>
prime_nums. push_back ( 2 ) ; <br/>
<br/>
for ( int num = 3 ; prime_nums. size ( ) < AIM ; num + = 2 ) <br/>
{ <br/>
bool isprime = true ; <br/>
for ( int i = 3 ; i <= num / 2 ; i + = 2 ) <br/>
{ <br/>
if ( num % i == 0 ) <br/>
{ <br/>
isprime = false ; <br/>
break ; <br/>
} <br/>
} <br/>
<br/>
if ( isprime ) <br/>
prime_nums. push_back ( num ) ; <br/>
} <br/>
<br/>
cout << prime_nums [ AIM - 1 ] ; <br/>
<br/>
clock_t end = clock ( ) ; <br/>
cout << " \n ----------------" << endl ; <br/>
cout << "Time: " << double ( end - start ) / CLOCKS_PER_SEC << " sec" << endl ; <br/>
}
It became better, but a million I could not wait
10,000 - 0.589 seconds
100,000 - 72.662 seconds
1 000 000 - stopped waiting after 20 minutes
It is clear that you can continue to optimize this algorithm for a long time, which I actually did. You can not check the numbers ending in 5 and 0. You can make a list of the most common divisors and check each number first on them. At some point, it seemed to me that I could solve the problem by checking the new numbers only on the simple ones already found. Significant results are not brought. The problem was that with each new number, the number of checks increased, and the division operation was one of the most difficult.
Went to search. I learned about the
sieve of Eratosthenes . But to use the algorithm as it is for my task is impossible. It is not known in advance * how many numbers should be sifted in order to find
N simple ones. Therefore, I decided to modify the algorithm to fit my needs as follows.
Sifting in advance a large set of natural numbers is not profitable. So it is necessary to sift some sets of natural numbers until we reach the desired result. And each new set must first be sifted to the already found prime numbers. Here is the code obtained after a series of optimizations:
// Listing 2 <br/>
<br/>
#include <iostream> <br/>
#include <ctime> <br/>
#include <vector> <br/>
<br/>
using namespace std ; <br/>
<br/>
int main ( ) <br/>
{ <br/>
clock_t start = clock ( ) ; <br/>
<br/>
const int AIM = 1000000 ; <br/>
<br/>
int startSize = AIM ; // <br/>
int addSize = AIM ; // <br/>
<br/>
vector < bool > nums ( startSize ) ; <br/>
vector < int > primeNums ( AIM ) ; <br/>
<br/>
int foundPrimes = 0 ; <br/>
<br/>
for ( int i = 2 ; i < startSize ; i ++ ) <br/>
nums [ i ] = true ; <br/>
<br/>
bool addition = false ; <br/>
int adder = 0 ; <br/>
while ( true ) <br/>
{ <br/>
if ( addition ) <br/>
{ <br/>
nums. resize ( nums. size ( ) + addSize, true ) ; <br/>
<br/>
// <br/>
for ( int i = 0 ; i < foundPrimes ; i ++ ) <br/>
{ <br/>
int cur_num = primeNums [ i ] ; <br/>
if ( ( addSize + ( ( nums. size ( ) - addSize ) % cur_num ) ) < cur_num ) <br/>
continue ; <br/>
for ( int j = ( ( nums. size ( ) - addSize ) / cur_num ) * cur_num ; j < nums. size ( ) ; j + = cur_num ) <br/>
nums [ j ] = false ; <br/>
} <br/>
} <br/>
else <br/>
addition = true ; <br/>
<br/>
<br/>
int iter ; <br/>
if ( foundPrimes == 0 ) <br/>
iter = 2 ; <br/>
else <br/>
iter = primeNums [ foundPrimes - 1 ] + 2 ; <br/>
<br/>
for ( ; iter < nums. size ( ) ; iter ++ ) <br/>
{ <br/>
// <br/>
if ( nums [ iter ] ) <br/>
{ <br/>
primeNums [ foundPrimes ] = iter ; <br/>
foundPrimes ++ ; <br/>
if ( foundPrimes == AIM ) <br/>
break ; <br/>
// <br/>
for ( int j = iter + iter ; j < nums. size ( ) ; j + = iter ) <br/>
nums [ j ] = false ; <br/>
} <br/>
else <br/>
continue ; <br/>
<br/>
} <br/>
if ( foundPrimes == AIM ) <br/>
break ; <br/>
} <br/>
<br/>
cout << "Last prime: " << primeNums [ AIM - 1 ] ; <br/>
<br/>
clock_t end = clock ( ) ; <br/>
cout << " \n ----------------" << endl ; <br/>
cout << "Time: " << double ( end - start ) / CLOCKS_PER_SEC << " sec" << endl ; <br/>
return 0 ; <br/>
}
The results even slightly surprised:
10,000 - 0.001 s.
100,000 - 0.021 s.
1,000,000 - 0.262 s.
...
10,000,000 - 2.99 seconds
On this, with the task of finding 1M of the first primes, I finished. The result completely satisfied me. But what else can you do? You can implement any of the known algorithms based on the sieve of Eratosthenes, for example,
Sundaram sieve . But the results are unlikely to change significantly. Perhaps
Atkin's sieve will be able to show itself better - it is this algorithm that is most often used in modern cryptography.
* - unknown to me.
halyavin said in the first comment that it has long been known.
UPDRegarding memory consumption, the described algorithms behave as follows:
1 million primes, maximum RAM
- enumeration of dividers: ~ 4Mb
- Eratosthenes modified sieve: ~ 6Mb
10 million primes, maximum RAM
- sorting dividers: ~ 39Mb
- Eratosthenes modified sieve: ~ 61Mb