I always knew that one of the advantages of Python is the ability to rewrite the most brake pieces of C code and increase the speed of the program to unprecedented heights. But I never tried it myself, because thought it was too hard. After reading this article, I no longer think so.
Programmers familiar with ctypes are unlikely to find something interesting here, but for beginners I ask for cat.
Ctypes is a Python mechanism for importing functions from external libraries.
% timeit is an
IPython shell magic function that measures the execution time of an expression in Python
Ctypes is great! Let's start with a small trivial example: the summation of numbers in a certain range.
Here is the implementation of this Python function.
def sumrange(arg): return sum(xrange(arg))
Fine! But what if we try to summarize a really large range of numbers, for example, from 0 to 10 ** 8 (ie, 100,000,000)
In [2]: %timeit sumrange(10**2) 1000000 loops, best of 3: 1.53 us per loop In [3]: %timeit sumrange(10**8) 1 loops, best of 3: 9.77 s per loop In [4]: %timeit sumrange(10**9) 1 loops, best of 3: 97.8 s per loop
Not so fun anymore. Let's try something else:
def sumrange2(arg): x = i = 0 while i < arg: x += i i += 1 return x
What will come of it?
In [10]: %timeit sumrange2(10**2) 100000 loops, best of 3: 10.5 us per loop In [11]: %timeit sumrange2(10**8) 1 loops, best of 3: 18.5 s per loop
Wow ... So much the worse ... This time I will not even try 10 ** 9.
So how do we speed up the execution? Just do not offer mathematical optimization ... we are in the new world of computers! (
in the original: don't suggest math tricks ... this is the new world of computing! )
Yes, I know that the complexity of the algorithm is a constant and does not depend on the value of the argument, n * (n + 1) / 2. But the article is not about this.
What about ctypes?
')
#include <stdio.h> unsigned long long sumrange(unsigned long long arg) { unsigned long long i, x; x = 0; for (i = 0; i < arg; i++) { x = x + i; } return x; }
Save it with the name sumrange.c and compile it (we will not use optimizations for the purity of the experiment):
$ gcc -shared -Wl,-install_name,sumrange.so -o sumrange.so -fPIC sumrange.c
Importing in Python what happened:
import ctypes sumrange_ctypes = ctypes.CDLL('./sumrange.so').sumrange sumrange_ctypes.restype = ctypes.c_ulonglong sumrange_ctypes.argtypes = ctypes.c_ulonglong,
And Oscar gets ...
In [15]: %timeit sumrange_ctypes(10**2) 1000000 loops, best of 3: 1.28 us per loop In [16]: %timeit sumrange_ctypes(10**8) 1 loops, best of 3: 381 ms per loop In [17]: %timeit sumrange_ctypes(10**9) 1 loops, best of 3: 3.79 s per loop In [18]: %timeit sumrange_ctypes(10**10) 1 loops, best of 3: 37.8 s per loop
Summary report:
| 10 ** 2 | 10 ** 8 | 10 ** 9 | 10 ** 10 |
---|
Pure Python, method number 1 | 1.53 µs | 9.77 seconds | 97.8 seconds | - |
Pure Python, method number 2 | 10.5 µs | 18.5 s | - | - |
ctypes | 1.28 µs | 381 ms | 3.79 s | 37.8 seconds |
Hell of a performance boost!
For Node.js hackers, there is an equivalent ctypes - FFI (Foreign Function Interface):
github.com/rbranson/node-ffi