
No matter how good Python is, he has a problem known to all developers - speed. On this topic was written many
articles , including
on Habré .
Most often, offer the following solutions:
- Use Psyco
- Rewrite part of the program in C using Python C Extensions
- Change
brains algorithm
Of course, the decisions are correct. But each of them has its drawbacks.
Psyco is an excellent module that achieves code acceleration by hundreds of percent, but: only 32-bit Python versions not higher than 2.6 are supported,
high memory consumption and the fact that lately the development of psyco slowed down, the last update on the
site is dated July 16, 2010. Perhaps with the release of Psyco v2, the situation will change, but so far, this module is not always applicable.
Python C Extensions (I recommend an excellent article
rushman Writing an extension module for Python on C ) - the creators of Python made all developers using this language an invaluable gift - Python / C API, giving the possibility of relatively transparent integration of C-code in Python programs 'e. There are only two drawbacks to this solution:
- The “threshold of entry” for C and Python / C API is still higher than that of “bare” Python, which cuts off this opportunity for developers who are not familiar with C
- One of the key features of Python is the speed of development. Writing a part of a C program reduces it, in proportion to the part of the code rewritten in C to the whole program.
So, this method is also not suitable for everyone.
The change of the algorithm is not always possible, the situations are frequent, as the fastest algorithm produces disappointing results in speed.
')
So what to do?
Then, if for your project the above listed methods did not fit, what should I do? Change Python to another language? No, you can not give up. We will optimize the code itself.
Examples will be taken from a program that builds a Mandelbrot set of a given size with a given number of iterations.
The operating time of the original version with parameters 600 * 600 pixels, 100 iterations was 3.07 seconds , we take this value as 100%I will say in advance, part of the optimization will lead to the fact that the code will become less pythonic, forgive the adherents of the python-way.Step 0. Bringing the main program code to a separate
This step helps the python interpreter to better conduct internal optimizations about the launch, and when using psyco, this step can help a lot, since psyco optimizes only functions without affecting the main body of the program.
If earlier the calculated part of the source program looked like this:
for Y in xrange(height): for X in xrange(width):
That, changing it to:
def mandelbrot(height, itt, width): for Y in xrange(height): for X in xrange(width):
we got a time of
2.4 seconds , i.e.
78% of the original.
Step 1. Profiling
The standard Python library, it's just a Klondike of useful modules. Now we are interested in the cProfile module, thanks to which, profiling the code becomes simple and even interesting.
Full documentation on this module can be found
here , we have enough of a couple of simple commands.
python -m cProfile sample.py
The -m key allows you to run modules as separate programs if the module itself provides such an opportunity.The result of this command will be getting the "profile" of the program - a table, like
4613944 function calls (4613943 primitive calls) in 2.818 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 2.309 2.309 2.766 2.766 mand_slow.py:22(mandelbrot)
...
With its help, it is easy to identify the places that need optimization (the lines with the highest values of ncalls (the number of function calls), tottime and percall (the running time of all calls to this function and each individual, respectively)).
For convenience, you can add the
-s time
sorting the profiler output by runtime.
In my case, the interesting part of the output was (the execution time differs from the above because the profiler adds its “overhead”):
4613944 function calls (4613943 primitive calls) in 2.818 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 2.309 2.309 2.766 2.766 mand_slow.py:22(mandelbrot)
3533224 0.296 0.000 0.296 0.000 {abs}
360000 0.081 0.000 0.081 0.000 {math.atan2}
360000 0.044 0.000 0.044 0.000 {math.cos}
360000 0.036 0.000 0.036 0.000 {math.sqrt}
...
So, the profile is received, now we will be engaged in optimization closely.
Step 2. Profile analysis
We see that in the first place in time is our main function mandelbrot, followed by the system function abs, followed by several functions from the math module, then - single function calls, with minimal time costs, they are not interesting for us.
So, the system functions "licked" by the community, we can hardly be improved, so let's move on to our own code:
Step 3. Math
Now, the code looks like this:
pix = img.load()
Note that the exponentiation ** operator is quite “common”, we only need the second-degree construction, i.e. all constructions of the form x ** 2 can be replaced by x * x, thus winning a little more time. Let's look at the time:
1.9 seconds , or
62% of the initial time, is achieved by simply replacing two lines:
p = math.sqrt((x - 0.25) ** 2 + y ** 2) ... Z_i = Z_i **2 + z
on:
p = math.sqrt((x - 0.25) * (x - 0.25) + y * y) ... Z_i = Z_i * Z_i + z
Steps 5, 6 and 7. Small but important
The basic truth that all Python programmers know about is working with global variables more slowly than working with local variables. But it is often forgotten that this is true not only for variables but in general for all objects. In the function code, calls are made to several functions from the math module. So why not import them into the function itself? Made by:
def mandelbrot(height, itt, width): from math import atan2, cos, sqrt pix = img.load()
Another 0.1 seconds is won back.
Recall that abs (x) returns a float number. So, it is worth comparing it with float and not int:
if abs(Z_i) > 2: ------> if abs(Z_i) > 2.0:
Another 0.15s.
53% of the start time.
And finally, the dirty hack.
In this particular task, it can be understood that the bottom half of the image is equal to the top, so The number of calculations can be halved, resulting in a total of
0.84 seconds or
27% of the original time.
Conclusion
Profile. Use timeit. Optimize. Python is a powerful language, and the programs on it will work at a speed proportional to your desire to understand and polish everything :)
The purpose of this article is to show that due to minor and minor changes, such as replacing ** with *, you can make a green snake crawl
up to two times faster without using heavy artillery in the form of Xi, or psyco shamanism.
Also, you can combine different tools, such as the above optimization and the psyco module, it won't get any worse :)
Thanks to everyone who read to the end, I will be glad to hear your opinions and comments in the comments!
UPD Useful
link in the comments cited
funca .