Sieve of Eratosthenes for O (n). Evidence

In the comments to one of the past posts about the sieve of Eratosthenes, this short Wikipedia algorithm was mentioned:

Algorithm 1:

1:  i := 2, 3, 4, ...,  n: 2:  lp[i] = 0: 3: lp[i] := i 4: pr[] += {i} 5:  p  pr  p ≤ lp[i]  p*i ≤ n: 6: lp[p*i] := p : lp -        n pr -     n.

The algorithm is simple, but not all, it seemed obvious. The main problem is that there is no proof on Wikipedia, and the reference to the original source ( pdf ) contains a rather different algorithm from the above.
')
In this post, I will try, I hope, it is easy to prove that this algorithm not only works, but also does it for linear complexity.

Algorithm complexity note

Although this algorithm is asymptotically faster than the standard sieve of Eratosthenes for O (n log log n), it requires much more memory. Therefore, for really large n, where this algorithm would shine in all its glory, it is not applicable. However, it is very useful even for small n, if you need not only to find all prime numbers, but also to quickly decompose all numbers up to n.

Definitions

$\ mathcal {P}$ - set of prime numbers.

$lp (x)$ - the minimum prime divisor of the number:

lp (x) = min \ {p \ in \ mathcal {P} \ \ vert \ p \ vert x \}

$lp (x) = min \ {p \ in \ mathcal {P} \ \ vert \ p \ vert x \}$

$rd (x)$ - other factors:

$rd (x) = x / lp (x)$

Please note all definitions above are for

$x \ ge 2$ .

Some obvious properties of the functions introduced above that will be used later:

$lp (a \ times b) = min (lp (a), lp (b))$

$rd (x) <x$

$p \ in \ mathcal {P} => lp (p) = p$

Evidence

Lemma 1 :

$lp (x) \ le lp (rd (x)), \ forall x \ notin \ mathcal {P}, x> 1$
Proof : Since any divider

$rd (x)$ also a divider

$x$ , but

$lp (x)$ does not exceed any divider

$x$ then

$lp (x)$ does not exceed any divider

$rd (x)$ , including the smallest of them. The only problem with this statement is if

$rd (x)$ has no prime dividers, but this is only possible in the case

$x \ in \ mathcal {P}$ which is excluded in the condition of the lemma.

$\ square$

Let be

E = \ {(lp (x), rd (x)) \ vert \ forall x \ notin \ mathcal {P}, 2 \ le x \ le n \}

$E = \ {(lp (x), rd (x)) \ vert \ forall x \ notin \ mathcal {P}, 2 \ le x \ le n \}$

Because

$lp (x) \ times rd (x) = x$ (by definition

$rd ()$ ) if we were given a lot

$E$ then we could calculate

$lp ()$ for all composite numbers up to n. This is done, for example, by the following algorithm:

Algorithm 2:

 1:   (l,r)  E: 2: lp[l*r] := l;

notice, that

$\ vert E \ vert \ le n$ , so algorithm 2 above works for linear complexity.

Next, I will prove that algorithm 1 from Wikipedia actually just goes through all the elements of this set, because it can be parameterized in a different way.

Let be

E '= \ {(p, r) \ vert p \ in \ mathcal {P}, 2 \ le r <n, p \ le lp (r), p \ times r \ le n \}

$E '= \ {(p, r) \ vert p \ in \ mathcal {P}, 2 \ le r <n, p \ le lp (r), p \ times r \ le n \}$

Lemma 2 :

$E = E '$
Proof :

Let be

$(a, b) \ in E$ =>

$\ exists x \ notin \ mathcal {P}, \ 2 \ le x \ le n \ mid a = lp (x), b = rd (x)$ .

By definition

$lp (), rd ()$ :

$a \ in \ mathcal {P}$ ,

$a \ times b = x$ . By Lemma 1,

$a \ le lp (b)$ .
because

$rd (x) <x$ then

$b <n$ .
Insofar as

$x \ notin \ mathcal {P}$ ,

$b \ ge 2$ .
Also, by definition

$E$ ,

$x \ le n$ , Consequently,

$a \ times b \ le n$ .
All 4 conditions

$E '$ fulfilled means

$(a, b) \ in E '$ =>

$E \ subset e '$ .

Let be

$(a, b) \ in E '$ . Let be

$x = a \ times b$ .
By definition

$E '$ ,

$a \ in \ mathcal {P}$ . So

$a$ - easy to divide

$x$ .

$lp (x) = lp (a \ times b) = min (lp (a), lp (b)) = min (a, lp (b)).$
Because

$a \ le lp (b)$ then

$lp (x) = a.$ So

$b = rd (x).$
By definition,

$x = a \ times b \ le n.$ Also obviously

$x = a \ times b \ ge 2.$
All conditions for

$E$ performed =>

$E '\ subset E.$

Consequently,

$E = E '.$

$\ square$

It now remains to sort out the correct ones.

$r$ and

$p$ from the definition of the set

$E '$ and apply algorithm 2. This is exactly what Algorithm 1 does (only i is used instead of r). But there is a problem! To go through all the elements of the set

$E '$ to calculate

$lp,$ we need to know

$lp,$ because there is a condition in the definition

$p \ le lp (r)$ .

Fortunately, this problem is costing the correct brute force. Need to sort out

$r$ in the outer loop, and

$p$ - in the internal. Then

$lp (r)$ will already be calculated. This fact is proved by the following theorem.

Theorem 1:
Algorithm 1 supports the following invariant: After iterating the outer loop with i = k, all primes up to k inclusive will be allocated to the pr list. It will also be calculated

$lp ()$ for all

$x \ notin \ mathcal {P} \ mid x \ le n, \ rd (x) \ le k$ in the lp array.

Proof :
We prove by induction. For k = 2, the invariant is checked manually. The single prime number 2 will be added to the pr list, because the lp [] array is initially filled with zeros. Also, the only composite number

$rd (x) \ le 2$ - this is 4. It is easy to verify that the inner loop will execute exactly once (for n> 3) and correctly execute lp [4]: = 2.

Now suppose that the invariant holds for iteration i = k-1. We prove that it will be satisfied for the iteration i = k.

To do this, it is enough to check that the number i, if it is simple, will be added to the pr list and that

$lp ()$ will be calculated for all compound numbers

$x \ le n,$ including

$rd (x) = i.$ It is these numbers from the invariant for k that are not already covered by the invariant for k-1.

If i is simple, then lp [i] will be 0, because the only write operation to the lp array, which theoretically could count lp [i] (in line 6), always writes to composite indices, because p * i (for i> 1) - always composite. Therefore, the number i will be added to the list of simple ones. Also, in line 3, lp [i] will be calculated.

If i is composite, then at the beginning of the iteration lp [i] will already be calculated by the invariant for i = k-1, because

$rd (i) <i$ or

$rd (i) \ le i-1,$ hence i falls under the condition of an invariant in the previous iteration. Therefore, the composite number i will not be added to the list of prime numbers.

Further, having the correct lp [i] and all prime numbers up to i inclusively, the loop in lines 5-6 will go through all the elements

$(p, i) \ in E '$ whose second part is i:

$p \ in P,$ because it is from the pr list
$p \ le lp (i),$ by the condition of the cycle stop
$p \ times i \ le n,$ by the condition of the cycle stop
$i <n,$ follows from $p \ times i \ le n, \ p> 1$

All the necessary primes are in the pr list, because need only numbers to

$lp (i) \ le i$ . Therefore, lp [] values will be calculated for all composite numbers.

$x$ , which

$rd (x) = i$ . These are exactly all the numbers that were missing in the transition from the invariant for k-1 to the invariant for k.

Therefore, the invariant is satisfied for any i = 2..n.

$\ square$

By the invariant from Theorem 1 for i = n, it turns out that all prime numbers up to n and all lp [] will be calculated by algorithm 1.

Moreover, since the various elements of the set

$E$ then the total internal loop will execute no more than

$\ vert E \ vert <n$ operations. The check operation in the loop will run smoothly.

$\ vert E \ vert + n - 1 <2n$ times (

$\ vert E \ vert$ returns true and n-1 times, for each i, returns false). All remaining operations are nested in one cycle with i from 2 to n.
It follows that the complexity of the algorithm 1 is O (n).

Source: https://habr.com/ru/post/452388/

All Articles

Sieve of Eratosthenes for O (n). Evidence

Definitions

Evidence

More articles: