⬆️ ⬇️

Python performance. Easy way

I always knew that one of the advantages of Python is the ability to rewrite the most brake pieces of C code and increase the speed of the program to unprecedented heights. But I never tried it myself, because thought it was too hard. After reading this article, I no longer think so.



Programmers familiar with ctypes are unlikely to find something interesting here, but for beginners I ask for cat.



Ctypes is a Python mechanism for importing functions from external libraries.

% timeit is an IPython shell magic function that measures the execution time of an expression in Python




Ctypes is great! Let's start with a small trivial example: the summation of numbers in a certain range.

Here is the implementation of this Python function.

def sumrange(arg): return sum(xrange(arg)) 


Fine! But what if we try to summarize a really large range of numbers, for example, from 0 to 10 ** 8 (ie, 100,000,000)

 In [2]: %timeit sumrange(10**2) 1000000 loops, best of 3: 1.53 us per loop In [3]: %timeit sumrange(10**8) 1 loops, best of 3: 9.77 s per loop In [4]: %timeit sumrange(10**9) 1 loops, best of 3: 97.8 s per loop 


Not so fun anymore. Let's try something else:

 def sumrange2(arg): x = i = 0 while i < arg: x += i i += 1 return x 


What will come of it?

 In [10]: %timeit sumrange2(10**2) 100000 loops, best of 3: 10.5 us per loop In [11]: %timeit sumrange2(10**8) 1 loops, best of 3: 18.5 s per loop 


Wow ... So much the worse ... This time I will not even try 10 ** 9.

So how do we speed up the execution? Just do not offer mathematical optimization ... we are in the new world of computers! ( in the original: don't suggest math tricks ... this is the new world of computing! )

Yes, I know that the complexity of the algorithm is a constant and does not depend on the value of the argument, n * (n + 1) / 2. But the article is not about this.



What about ctypes?

')

 #include <stdio.h> unsigned long long sumrange(unsigned long long arg) { unsigned long long i, x; x = 0; for (i = 0; i < arg; i++) { x = x + i; } return x; } 


Save it with the name sumrange.c and compile it (we will not use optimizations for the purity of the experiment):

$ gcc -shared -Wl,-install_name,sumrange.so -o sumrange.so -fPIC sumrange.c



Importing in Python what happened:

 import ctypes sumrange_ctypes = ctypes.CDLL('./sumrange.so').sumrange sumrange_ctypes.restype = ctypes.c_ulonglong sumrange_ctypes.argtypes = ctypes.c_ulonglong, 




And Oscar gets ...



 In [15]: %timeit sumrange_ctypes(10**2) 1000000 loops, best of 3: 1.28 us per loop In [16]: %timeit sumrange_ctypes(10**8) 1 loops, best of 3: 381 ms per loop In [17]: %timeit sumrange_ctypes(10**9) 1 loops, best of 3: 3.79 s per loop In [18]: %timeit sumrange_ctypes(10**10) 1 loops, best of 3: 37.8 s per loop 




Summary report:

10 ** 210 ** 810 ** 910 ** 10
Pure Python, method number 11.53 µs9.77 seconds97.8 seconds-
Pure Python, method number 210.5 µs18.5 s--
ctypes1.28 µs381 ms3.79 s37.8 seconds




Hell of a performance boost!



For Node.js hackers, there is an equivalent ctypes - FFI (Foreign Function Interface): github.com/rbranson/node-ffi

Source: https://habr.com/ru/post/157537/



All Articles