⬆️ ⬇️

Sieve of Eratosthenes for O (n). Evidence

In the comments to one of the past posts about the sieve of Eratosthenes, this short Wikipedia algorithm was mentioned:



Algorithm 1:



1:  i := 2, 3, 4, ...,  n: 2:  lp[i] = 0: 3: lp[i] := i 4: pr[] += {i} 5:  p  pr  p ≤ lp[i]  p*i ≤ n: 6: lp[p*i] := p : lp -        n pr -     n. 


The algorithm is simple, but not all, it seemed obvious. The main problem is that there is no proof on Wikipedia, and the reference to the original source ( pdf ) contains a rather different algorithm from the above.

')

In this post, I will try, I hope, it is easy to prove that this algorithm not only works, but also does it for linear complexity.



Algorithm complexity note
Although this algorithm is asymptotically faster than the standard sieve of Eratosthenes for O (n log log n), it requires much more memory. Therefore, for really large n, where this algorithm would shine in all its glory, it is not applicable. However, it is very useful even for small n, if you need not only to find all prime numbers, but also to quickly decompose all numbers up to n.



Definitions



 m a t h c a l P - set of prime numbers.

l p ( x ) - the minimum prime divisor of the number: lp (x) = min \ {p \ in \ mathcal {P} \ \ vert \ p \ vert x \}lp (x) = min \ {p \ in \ mathcal {P} \ \ vert \ p \ vert x \}

rd(x) - other factors: rd(x)=x/lp(x)



Please note all definitions above are for x ge2 .



Some obvious properties of the functions introduced above that will be used later:

lp(a timesb)=min(lp(a),lp(b))

rd(x)<x

p in mathcalP=>lp(p)=p



Evidence



Lemma 1 : lp(x) lelp(rd(x)), forallx notin mathcalP,x>1

Proof : Since any divider rd(x) also a divider x , but lp(x) does not exceed any divider x then lp(x) does not exceed any divider rd(x) , including the smallest of them. The only problem with this statement is if rd(x) has no prime dividers, but this is only possible in the case x in mathcalP which is excluded in the condition of the lemma.

 square



Let be E = \ {(lp (x), rd (x)) \ vert \ forall x \ notin \ mathcal {P}, 2 \ le x \ le n \}E = \ {(lp (x), rd (x)) \ vert \ forall x \ notin \ mathcal {P}, 2 \ le x \ le n \}



Because lp(x) timesrd(x)=x (by definition rd() ) if we were given a lot E then we could calculate lp() for all composite numbers up to n. This is done, for example, by the following algorithm:



Algorithm 2:



 1:   (l,r)  E: 2: lp[l*r] := l; 


notice, that  vertE vert len , so algorithm 2 above works for linear complexity.



Next, I will prove that algorithm 1 from Wikipedia actually just goes through all the elements of this set, because it can be parameterized in a different way.



Let be E '= \ {(p, r) \ vert p \ in \ mathcal {P}, 2 \ le r <n, p \ le lp (r), p \ times r \ le n \}E '= \ {(p, r) \ vert p \ in \ mathcal {P}, 2 \ le r <n, p \ le lp (r), p \ times r \ le n \}



Lemma 2 : E=E

Proof :



Let be (a,b) inE =>  existsx notin mathcalP, 2 lex len mida=lp(x),b=rd(x) .



By definition lp(),rd() : a in mathcalP , a timesb=x . By Lemma 1, a lelp(b) .

because rd(x)<x then b<n .

Insofar as x notin mathcalP , b ge2 .

Also, by definition E , x len , Consequently, a timesb len .

All 4 conditions E fulfilled means (a,b) inE => E subsete .



Let be (a,b) inE . Let be x=a timesb .

By definition E , a in mathcalP . So a - easy to divide x .

lp(x)=lp(a timesb)=min(lp(a),lp(b))=min(a,lp(b)).

Because a lelp(b) then lp(x)=a. So b=rd(x).

By definition, x=a timesb len. Also obviously x=a timesb ge2.

All conditions for E performed => E subsetE.



Consequently, E=E.

 square



It now remains to sort out the correct ones. r and p from the definition of the set E and apply algorithm 2. This is exactly what Algorithm 1 does (only i is used instead of r). But there is a problem! To go through all the elements of the set E to calculate lp, we need to know lp, because there is a condition in the definition p lelp(r) .



Fortunately, this problem is costing the correct brute force. Need to sort out r in the outer loop, and p - in the internal. Then lp(r) will already be calculated. This fact is proved by the following theorem.



Theorem 1:

Algorithm 1 supports the following invariant: After iterating the outer loop with i = k, all primes up to k inclusive will be allocated to the pr list. It will also be calculated lp() for all x notin mathcalP midx len, rd(x) lek in the lp array.



Proof :

We prove by induction. For k = 2, the invariant is checked manually. The single prime number 2 will be added to the pr list, because the lp [] array is initially filled with zeros. Also, the only composite number rd(x) le2 - this is 4. It is easy to verify that the inner loop will execute exactly once (for n> 3) and correctly execute lp [4]: ​​= 2.



Now suppose that the invariant holds for iteration i = k-1. We prove that it will be satisfied for the iteration i = k.



To do this, it is enough to check that the number i, if it is simple, will be added to the pr list and that lp() will be calculated for all compound numbers x len, including rd(x)=i. It is these numbers from the invariant for k that are not already covered by the invariant for k-1.



If i is simple, then lp [i] will be 0, because the only write operation to the lp array, which theoretically could count lp [i] (in line 6), always writes to composite indices, because p * i (for i> 1) - always composite. Therefore, the number i will be added to the list of simple ones. Also, in line 3, lp [i] will be calculated.



If i is composite, then at the beginning of the iteration lp [i] will already be calculated by the invariant for i = k-1, because rd(i)<i or rd(i) lei1, hence i falls under the condition of an invariant in the previous iteration. Therefore, the composite number i will not be added to the list of prime numbers.



Further, having the correct lp [i] and all prime numbers up to i inclusively, the loop in lines 5-6 will go through all the elements (p,i) inE whose second part is i:





All the necessary primes are in the pr list, because need only numbers to l p ( i ) l e i  . Therefore, lp [] values ​​will be calculated for all composite numbers. x , which r d ( x ) = i . These are exactly all the numbers that were missing in the transition from the invariant for k-1 to the invariant for k.



Therefore, the invariant is satisfied for any i = 2..n.



 s q u a r e



By the invariant from Theorem 1 for i = n, it turns out that all prime numbers up to n and all lp [] will be calculated by algorithm 1.



Moreover, since the various elements of the set E then the total internal loop will execute no more than  v e r t E v e r t < n  operations. The check operation in the loop will run smoothly.  v e r t E v e r t + n - 1 < 2 n  times (  v e r t E v e r t  returns true and n-1 times, for each i, returns false). All remaining operations are nested in one cycle with i from 2 to n.

It follows that the complexity of the algorithm 1 is O (n).

Source: https://habr.com/ru/post/452388/



All Articles