
Sexy primes, "slow Python", or how I fought the wall of misunderstanding

Many developers, especially those actively involved in system design, have probably faced a similar situation: a colleague (a developer, a project manager or a salesperson, it doesn't matter) shows up with yet another idée fixe: let's rewrite everything in Java, Scala, etc. (substitute your favorite).

So once again I had to shoot down such an idea in a project with a rather large legacy codebase. Not a complete rewrite, and not quite everything (well, not yet): the proposal was to move from Python (we also have some Tcl living in a few modules there) to Scala. It concerned the development of new modules and services, i.e. starting with the mid-level and front-facing APIs that are least tied to the rest, and, as I understood it, eventually perhaps everything.

The person in question is not a developer; he is something like the project's originator and a bit of a salesperson (for one specific client) rolled into one.
It's not that I'm against it as such. I respect Scala and even love it in my own way, and I'm usually open to everything new. For example, in some places, besides Tcl and Python, we have services or modules in other languages. Likewise, we moved from CVS to SVN and then to Git (and long before that it was MS VSS). There are actually plenty of examples, and one thing unites them: the developers themselves made the decision, or at least took part in it (collectively, or there was a group of initiators, it doesn't matter). What matters is the arguments for and against.

The problem is that in a reasoned discussion of "developer vs. anybody else", the latter sometimes simply lacks the level of subject knowledge, or it is incredibly hard to get a thought across: it's as if you were talking in different languages. And it's fine if the other side is someone like a software architect. It's worse when the "conversation" is, say, with a pure salesperson announcing, for example, sudden customer requirements.

After all, nobody dictates to a tiler which trowel to work with ("with 10 mm notches you'll use up more adhesive, let's make it 5 mm" — and never mind that the walls are crooked). And in theory you can also drive a screw with a hammer, but there is a screwdriver for that, and later the power screwdriver was invented. I exaggerate, of course, but sometimes it really does remind me of that kind of absurdity.

My point is that ideally the developer should choose the tool himself, or at least have the last word, since he is the one who will either work with that tool or suffer with it.

But I digress. In my particular story, the man, as usual, had almost no arguments in favor of Scala.
I could talk at length about things like the developers we already have, the existing codebase, a well-honed and debugged system, and so on. But I latched onto his "Python is very slow". As evidence he threw me a link to "Interpreting a benchmark in C, Clojure, Python, Ruby, Scala and others" on Stack Overflow, which he hadn't even read to the end (it says there, almost in plain text, that things are not that bad with Python).

He meant this part specifically (times are in seconds):
 Sexy primes up to:        10k     20k     30k     100k
 --------------------------------------------------------
 Python2.7                 1.49    5.20   11.00   119
 Scala2.9.2                0.93    1.41    2.73    20.84
 Scala2.9.2 (optimized)    0.32    0.79    1.46    12.01


I had not the slightest desire to discuss it with him on a purely technical level, i.e. about memory allocation and object pools, about the GC (we don't run pure CPython; it's closer to RPython or PyPy with its own JIT, memory manager and GC), about the syntactic sugar the author of the benchmark had overused a bit, and so on.

My answer was "that's not why people like developing in Python" and "nobody writes it that way, at least not speed-critical code". I was being a little sly, of course: I understand perfectly well that this benchmark is artificial, i.e. meant purely to measure speed, and the sore spots of CPython that such a test exposes are well known to me too.
Still, I tried to show on this very example why the test makes little sense, or at least why it is not entirely objective.

I started by showing him this test and its execution results (marked with an asterisk) in PyMNg (our own build), pypy2, pypy3 and python3 (which is just as slow, for the same understandable reasons). The hardware is different, of course, but the order of magnitude, I think, is about the same.
 Sexy primes up to:          10k     20k     30k     100k
 ---------------------------------------------------------
 PyMNg *                     0.10    -       -         2.46
 pypy2 *                     0.13    -       -         5.44
 pypy3 *                     0.12    -       -         5.69
 ---------------------------------------------------------
 scala 2.10.4 (warmed up) *  -       -       -         6.57
 scala 2.10.4 (cold) *       -       -       -         7.11
 ---------------------------------------------------------
 Python3.4 *                 1.41    -       -       117.69
 ---------------------------------------------------------
 Python2.7                   1.49    5.20   11.00    119.00
 Scala2.9.2                  0.93    1.41    2.73     20.84
 Scala2.9.2 (optimized)      0.32    0.79    1.46     12.01

At that point I could in principle have stopped; I just wanted to try to explain to the man where he was wrong, and that the choice of a language is not made on the basis of "Hello, world"-style benchmarks.

So, the task: find "sexy" pairs of prime numbers in Python. Lots of them, and fast.
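To pin down the definition first: a "sexy" prime pair is simply two primes that differ by 6. A deliberately naive illustration of just the definition (my own snippet, not the benchmark code and not part of the script below) could look like this:

 # Naive illustration of the definition only (not the benchmark code):
 # a "sexy" pair is (p, p + 6) with both p and p + 6 prime.
 def is_prime_naive(n):
     return n > 1 and all(n % i for i in range(2, int(n**0.5) + 1))

 print([(p, p + 6) for p in range(3, 100)
        if is_prime_naive(p) and is_prime_naive(p + 6)])
 # -> [(5, 11), (7, 13), (11, 17), (13, 19), (17, 23), ...]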

For those not interested in the blow-by-blow, here is the link to the script on github, if you feel like playing with it.

The execution results of each variant are under the corresponding spoiler.
100K for all variants.
pypy3 sexy-primes-test.py 100K
  org   =   5.69000 s =   5690.00 mils | 2447 sp: [5, 11], [7, 13], ... [99901, 99907], [99923, 99929]
  mod1  =   2.45500 s =   2455.00 mils | 2447 sp: [5, 11], [7, 13], ... [99901, 99907], [99923, 99929]
  mod2  =   1.24300 s =   1243.00 mils | 2447 sp: [5, 11], [7, 13], ... [99901, 99907], [99923, 99929]
  org2  =   3.46800 s =   3468.00 mils | 2447 sp: [5, 11], [7, 13], ... [99901, 99907], [99923, 99929]
  max   =   0.03200 s =     32.00 mils | 2447 sp: [5, 11], [7, 13], ... [99901, 99907], [99923, 99929]
  orgm  =   0.13000 s =    130.00 mils | 2447 sp: [5, 11], [7, 13], ... [99901, 99907], [99923, 99929]
  siev1 =   0.01200 s =     12.00 mils | 2447 sp: [5, 11], [7, 13], ... [99901, 99907], [99923, 99929]
  siev2 =   0.01000 s =     10.00 mils | 2447 sp: [5, 11], [7, 13], ... [99901, 99907], [99923, 99929]
  osie1 =   0.01200 s =     12.00 mils | 2447 sp: [5, 11], [7, 13], ... [99901, 99907], [99923, 99929]
  osie2 =   0.00200 s =      2.00 mils | 2447 sp: [5, 11], [7, 13], ... [99901, 99907], [99923, 99929]

python34 sexy-primes-test.py 100K
  org   = 120.75802 s = 120758.02 mils | 2447 sp: [5, 11], [7, 13], ... [99901, 99907], [99923, 99929]
  mod1  = 132.99282 s = 132992.82 mils | 2447 sp: [5, 11], [7, 13], ... [99901, 99907], [99923, 99929]
  mod2  =  76.65101 s =  76651.01 mils | 2447 sp: [5, 11], [7, 13], ... [99901, 99907], [99923, 99929]
  org2  =  53.42527 s =  53425.27 mils | 2447 sp: [5, 11], [7, 13], ... [99901, 99907], [99923, 99929]
  max   =   0.44004 s =    440.04 mils | 2447 sp: [5, 11], [7, 13], ... [99901, 99907], [99923, 99929]
  orgm  =   0.39003 s =    390.03 mils | 2447 sp: [5, 11], [7, 13], ... [99901, 99907], [99923, 99929]
  siev1 =   0.04000 s =     40.00 mils | 2447 sp: [5, 11], [7, 13], ... [99901, 99907], [99923, 99929]
  siev2 =   0.04250 s =     42.50 mils | 2447 sp: [5, 11], [7, 13], ... [99901, 99907], [99923, 99929]
  osie1 =   0.04500 s =     45.00 mils | 2447 sp: [5, 11], [7, 13], ... [99901, 99907], [99923, 99929]
  osie2 =   0.04250 s =     42.50 mils | 2447 sp: [5, 11], [7, 13], ... [99901, 99907], [99923, 99929]

10M, starting from max (the other variants are simply too slow on an array of that size).
pypy3 sexy-primes-test.py 10M max
  max   =   5.28500 s =   5285.00 mils | 117207 sp: [5, 11], [7, 13], ... [9999931, 9999937], [9999937, 9999943]
  orgm  =  12.65600 s =  12656.00 mils | 117207 sp: [5, 11], [7, 13], ... [9999931, 9999937], [9999937, 9999943]
  siev1 =   0.51800 s =    518.00 mils | 117207 sp: [5, 11], [7, 13], ... [9999931, 9999937], [9999937, 9999943]
  siev2 =   0.23200 s =    232.00 mils | 117207 sp: [5, 11], [7, 13], ... [9999931, 9999937], [9999937, 9999943]
  osie1 =   0.26800 s =    268.00 mils | 117207 sp: [5, 11], [7, 13], ... [9999931, 9999937], [9999937, 9999943]
  osie2 =   0.22700 s =    227.00 mils | 117207 sp: [5, 11], [7, 13], ... [9999931, 9999937], [9999937, 9999943]

python34 sexy-primes-test.py 10M max
  max   = 288.81855 s = 288818.55 mils | 117207 sp: [5, 11], [7, 13], ... [9999931, 9999937], [9999937, 9999943]
  orgm  = 691.96458 s = 691964.58 mils | 117207 sp: [5, 11], [7, 13], ... [9999931, 9999937], [9999937, 9999943]
  siev1 =   4.02766 s =   4027.66 mils | 117207 sp: [5, 11], [7, 13], ... [9999931, 9999937], [9999937, 9999943]
  siev2 =   4.05016 s =   4050.16 mils | 117207 sp: [5, 11], [7, 13], ... [9999931, 9999937], [9999937, 9999943]
  osie1 =   4.69519 s =   4695.19 mils | 117207 sp: [5, 11], [7, 13], ... [9999931, 9999937], [9999937, 9999943]
  osie2 =   4.43018 s =   4430.18 mils | 117207 sp: [5, 11], [7, 13], ... [9999931, 9999937], [9999937, 9999943]

100M starting with version siev1 (for the same reason).
pypy3 sexy-primes-test.py 100M siev1
  siev1 =   7.39800 s =   7398.00 mils | 879908 sp: [5, 11], [7, 13], ... [99999617, 99999623], [99999821, 99999827]
  siev2 =   2.24500 s =   2245.00 mils | 879908 sp: [5, 11], [7, 13], ... [99999617, 99999623], [99999821, 99999827]
  osie1 =   2.53500 s =   2535.00 mils | 879908 sp: [5, 11], [7, 13], ... [99999617, 99999623], [99999821, 99999827]
  osie2 =   2.31000 s =   2310.00 mils | 879908 sp: [5, 11], [7, 13], ... [99999617, 99999623], [99999821, 99999827]

python34 sexy-primes-test.py 100M siev1
  siev1 =  41.87118 s =  41871.18 mils | 879908 sp: [5, 11], [7, 13], ... [99999617, 99999623], [99999821, 99999827]
  siev2 =  40.71163 s =  40711.63 mils | 879908 sp: [5, 11], [7, 13], ... [99999617, 99999623], [99999821, 99999827]
  osie1 =  48.08692 s =  48086.92 mils | 879908 sp: [5, 11], [7, 13], ... [99999617, 99999623], [99999821, 99999827]
  osie2 =  44.05426 s =  44054.26 mils | 879908 sp: [5, 11], [7, 13], ... [99999617, 99999623], [99999821, 99999827]

By the way, this kind of gap between CPython and PyPy is often one of the reasons why people switch to PyPy, write their own allocators, memory managers, GCs, stackless variants and the like, use third-party modules (NumPy, for example) and put together their own builds. For example, when execution speed matters and, as here, there is "aggressive" use of an object pool, very frequent calls and so on. That is how it went long ago in our case too, when we moved away from pure Python. There was plenty more to it: sluggish multithreading, refcounting, etc. But the move itself was a deliberate decision of the whole team, not a whim "lowered" from above. If there is interest and I find the time, I may try to squeeze out an article about that.

For this particular "task" one could write one's own C binding, use NumPy-like modules, etc. But I tried to show my colleague that it can be solved quickly, almost on the knee, practically "algorithmically", if you know how Python ticks inside.
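Just to illustrate that route (a rough sketch of my own, assuming NumPy is installed; it is not part of the original sexy-primes-test.py and its timings are not in the tables above), a vectorized boolean sieve solves the same task in a dozen lines:

 # Hedged sketch only: roughly how the same task could look with NumPy.
 # Not part of the original script; the timings above do not use it.
 import numpy as np

 def sexy_primes_below_np(n):
     # plain boolean sieve of Eratosthenes over 0..n
     sieve = np.ones(n + 1, dtype=bool)
     sieve[:2] = False
     for i in range(2, int(n ** 0.5) + 1):
         if sieve[i]:
             sieve[i * i::i] = False
     primes = np.flatnonzero(sieve)
     # keep p where p + 6 is also prime (and p + 6 still fits in the sieve)
     p = primes[primes + 6 <= n]
     p = p[sieve[p + 6]]
     return np.column_stack((p, p + 6))

 # e.g. sexy_primes_below_np(100000) -> array([[5, 11], [7, 13], ...])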

So, let's start proving to him that Python can "run" fast when you solve the actual task rather than an artificial test.

The original script, slightly modified by me for readability and Python 3 support; this variant is called org in my example script. (Just please, no comments about "xrange vs range" here - I know perfectly well what the difference is, and it is not particularly important in this case, especially in Python 3; besides, the iteration runs to the end anyway.)
 def is_prime(n):
     return all((n % i > 0) for i in range(2, n))

 # def sexy_primes_below(n):
 #     return [[j-6, j] for j in range(9, n+1) if is_prime(j) and is_prime(j-6)]

 def sexy_primes_below(n):
     l = []
     for j in range(9, n+1):
         if is_prime(j-6) and is_prime(j):
             l.append([j-6, j])
     return l

Since even at 10M we get only about 100K sexy pairs, replacing the original sexy_primes_below with my version using a plain loop barely affects the execution time; it just makes the changes in subsequent versions (the sieve, for example) easier to follow. All the interesting stuff is in the implementation of is_prime, at least for now.

1. First of all, sugar like in the original is not encouraged (especially in builds such as PyPy or our PyMNg), or at the very least, as in this case, it hurts in the form of reduced speed. Let's rewrite it as the mod1 variant:
 def is_prime(n):
     i = 2
     while True:
         if not n % i:
             return False
         i += 1
         if i >= n:
             return True
     return True

Already faster, at least in PyPy. But that's still not it...

2. The code immediately became clearer, and now it's obvious it can be rewritten as mod2, roughly twice as fast, if we don't check even numbers (which, apart from two, are not prime to begin with).
 def is_prime(n):
     if not n % 2:
         return False
     i = 3
     while True:
         if n % i == 0:
             return False
         i += 2
         if i >= n:
             return True
     return True

Let's apply the same fix to the original: org2 is the same as mod2, but with the "sugar" in one line.
 def is_prime(n):
     return n % 2 and all(n % i for i in range(3, n, 2))

3. If we only check divisors up to the square root (strictly speaking it should be rounded properly, but we'll keep it simple - it's just a test), everything gets much faster; we get the max variant:
 def is_prime(n):
     if not n & 1:
         return 0
     i = 3
     while 1:
         if not n % i:
             return 0
         if i * i > n:
             return 1
         i += 2
     return 1
Much faster, really.

Again, let's apply the same change to the original one-liner - orgm:
 def is_prime(n):
     #return ((n & 1) and all(n % i for i in range(3, int(math.sqrt(n))+1, 2)))
     return ((n & 1) and all(n % i for i in range(3, int(n**0.5)+1, 2)))

And we see that, at least in PyPy, it again runs slower than max (although partly, perhaps, because the "root" is computed directly inside the range call).

4. At this point my colleague's eyes light up and, like in that cartoon (I think it's "The Greedy Rich Man") about the furrier and the seven caps, he asks: "Can you go even faster?". Since memory is not a constraint here (we're not embedded, etc.), I decided to quickly redo it with a "half sieve": a precomputed array of flags indexed by offset for odd numbers only, i.e. divided by 2. And in Python such a sieve takes no time at all to put together.
At the same time we modify sexy_primes_below so that it generates the sieve itself, and, to avoid editing is_prime and calling it in a loop, we query the sieve directly.
We get the siev1 variant:
 def primes_sieve(n):
     """ temporary "half" mask sieve for primes < n (using bool) """
     sieve = [True] * (n//2)
     for i in range(3, int(n**0.5)+1, 2):
         if sieve[i//2]:
             sieve[i*i//2::i] = [False] * ((n-i*i-1)//(2*i)+1)
     return sieve

 def sexy_primes_below(n):
     l = []
     sieve = primes_sieve(n+1)
     #is_prime = lambda j: (j & 1) and sieve[int(j/2)]
     for j in range(9, n+1):
         i = j-6
         #if (i & 1) and is_prime(i) and is_prime(j):
         if (i & 1) and sieve[int(i/2)] and sieve[int(j/2)]:
             l.append([i, j])
     return l
This variant is so fast that in PyPy, for example, it handles 100M in almost the same time as the original handles 100K. That is, it checks a thousand times more numbers and produces a list of sexy prime pairs that is orders of magnitude longer.

Right away I also wrote the siev2 variant, because I remembered PyPy's somewhat "dumb" handling of bool, i.e. I replaced the boolean flags with 0/1. This version finishes 100M two to three times faster than the original finishes 100K! The change itself is essentially mechanical; a rough sketch of it is below.
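A sketch of the siev2 idea (the exact code is in the script on github): the same half sieve as in siev1, only with integer 1/0 flags instead of True/False.

 # Sketch of the siev2 idea: same half sieve as siev1, but with 1/0 integer
 # flags instead of True/False (see the github script for the exact code).
 def primes_sieve(n):
     """ temporary "half" mask sieve for primes < n (using 0/1) """
     sieve = [1] * (n//2)
     for i in range(3, int(n**0.5)+1, 2):
         if sieve[i//2]:
             sieve[i*i//2::i] = [0] * ((n-i*i-1)//(2*i)+1)
     return sieve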

5. I wrote the osiev1 and osiev2 variants so that in the future the single sieve over all numbers could be replaced by many shorter ones, i.e. so that pairs could be searched for incrementally or block by block.

To that end, I changed the offset-based sieve to a direct sieve that stores not flags but the values themselves:
 sieve  = [1, 1, 1, 0, 1, 1, ...]    # for 3, 5, 7, 9, 11, 13 ...
 osieve = [3, 5, 7, 0, 11, 13, ...]  # for 3, 5, 7, 9, 11, 13 ...

The osiev1 variant:
 def primes_sieve(n):
     """ temporary odd direct sieve for primes < n """
     sieve = list(range(3, n, 2))
     l = len(sieve)
     for i in sieve:
         if i:
             f = (i*i-3) // 2
             if f >= l:
                 break
             sieve[f::i] = [0] * -((f-l) // i)
     return sieve

 def sexy_primes_below(n):
     l = []
     sieve = primes_sieve(n+1)
     #is_prime = lambda j: (j & 1) and sieve[int((j-2)/2)]
     for j in range(9, n+1):
         i = j-6
         #if (i & 1) and is_prime(i) and is_prime(j):
         if (i & 1) and sieve[int((i-2)/2)] and sieve[int((j-2)/2)]:
             l.append([i, j])
     return l

The second variant, osiev2, is just a bit faster because it initializes the sieve more optimally.
 def primes_sieve(n):
     """ temporary odd direct sieve for primes < n """
     #sieve = list(range(3, n, 2))
     l = ((n-3)//2)
     sieve = [-1] * l
     for k in range(3, n, 2):
         o = (k-3)//2
         i = sieve[o]
         if i == -1:
             i = sieve[(k-3)//2] = k
         if i:
             f = (i*i-3) // 2
             if f >= l:
                 break
             sieve[f::i] = [0] * -((f-l) // i)
     return sieve

These two variants could be turned into an iterative sieve (searching for pairs in blocks of 10K or 100K, for example), which would save a lot of memory during the computation. If you try either osieve version right now with 1G or 10G, you are almost guaranteed to get a MemoryError straight away (well, unless you're a rich little Buratino with plenty of RAM and a 64-bit Python).

I never finished the block version (it is not in the example script; let it be "homework" in case someone feels like it), because by then my colleague had already repented, bowing in admiration, and I hope he will at least stop hammering my head (and the team's) with this kind of nonsense.

And on the original 100K the execution time of the last variant could no longer be measured directly - 0.00 mils (I only got a number by raising the repeat count to 10 runs).
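If you want to reproduce that kind of measurement yourself, the usual trick is simply to repeat the call and divide; for instance with timeit (my own illustration, not how the script does it internally):

 # Illustration only: averaging a sub-millisecond run over several repeats,
 # i.e. the "increase the number of iterations" trick mentioned above.
 # Assumes one of the sexy_primes_below variants from this article is defined.
 import timeit

 reps = 10
 total = timeit.timeit(lambda: sexy_primes_below(100000), number=reps)
 print("%.3f ms per run" % (total / reps * 1000))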

What I am fairly sure of is that squeezing out another order of magnitude is unlikely not only in Scala but even in pure C. Unless, again, it is done algorithmically...

As for the script itself, if anyone wants to experiment, you can ask it for help like this:
 sexy-primes-test.py --help 

By the way, primes are covered in great detail on Wikipedia.

By popular demand, I added a poll...

Source: https://habr.com/ru/post/259095/

