
Accelerating Python scripts without any mental effort

One of the common uses for Python is small data-processing scripts (for example, for parsing logs of some kind). I often had to deal with such tasks, and the scripts were usually written in haste. Combined with my weak knowledge of algorithms, this meant the code was far from optimal. That did not bother me at all: an extra minute of execution time made no real difference.

The situation changed somewhat when the amount of data to process grew. And after the run time of yet another script exceeded a day, I decided to spend some time on optimization: I would still like to get the result before it loses its relevance. In this article I do not plan to talk about profiling; instead I will touch on the topic of compiling Python code. I will state one condition up front: the optimization options must not demand much of the developer's time and should, on the contrary, be friendly to a quick "hack it together and ship it to production" workflow.

For the benchmark, I made two scripts that solve purely synthetic problems (one of them aggregates numbers; see the sketch below).

The tasks, though invented, are similar to what I have come across in real life, so they are quite suitable for a benchmark. Obviously, additional factors come into play in real work: the script may need to be parallelized, specialized libraries may apply (it is much easier and faster to aggregate numbers with numpy/pandas), and so on.
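For illustration, here is a minimal hypothetical sketch of an aggregation task of this kind; the file name and format are assumptions, not the actual benchmark script (which is in the repository linked at the end):

```python
# A synthetic aggregation task: read numbers from a log-like file
# and compute a simple aggregate in plain Python, one value per line.
# 'numbers.txt' is a made-up name used here for illustration only.
def aggregate(path):
    total, count = 0.0, 0
    with open(path) as f:
        for line in f:
            total += float(line)
            count += 1
    return total / count  # the mean, as a trivial aggregate


if __name__ == '__main__':
    print(aggregate('numbers.txt'))
```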
Since from the very beginning I required that the acceleration method be simple, the only candidates were options whose manuals you can get away with skimming. So I quickly googled things like 'jit python' and 'python compiler' and chose the following for the benchmark:

- PyPy
- Cython
- Nuitka
- Numba
Obviously, the comparison is not entirely fair: the list mixes interpreters and compilers, JIT and ahead-of-time alike. Nevertheless, all of them can be considered solutions to my practical problem: getting the scripts to run faster with minimal effort, before their results lose relevance.

Unfortunately, I did not manage to get numba running: the scripts crashed with exceptions like NotImplementedError: cell vars are not supported. It seems numba is not yet suitable as a universal tool for accelerating everything indiscriminately.
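For reference, here is a hypothetical sketch (not the author's actual script) of the kind of construct that can trigger this: a nested function capturing a variable from the enclosing scope, which Python stores as a "cell var":

```python
import numba


@numba.jit
def scale_and_sum(values, factor):
    # The nested function captures 'factor' from the enclosing scope,
    # making it a cell var -- plausibly the kind of code that made
    # numba of that era raise NotImplementedError.
    def scale(x):
        return x * factor

    total = 0.0
    for v in values:
        total += scale(v)
    return total
```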

The benchmark was run on a MacBook Pro (Late 2013, 2.4 GHz Intel Core i5). A small Fabric script was written to launch it, so that anyone can easily repeat it under the conditions that interest them. The results:
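The actual launcher is in the repository; as an illustration, here is a minimal hypothetical sketch of what such a Fabric (1.x API) script might look like. The interpreter list and script name are assumptions:

```python
# fabfile.py -- hypothetical sketch of a benchmark launcher; run with: fab bench
from fabric.api import local, task


@task
def bench(script='aggregate.py'):
    # Run the same script under each interpreter and time it.
    for runner in ('python2', 'python3', 'pypy'):
        local('time %s %s' % (runner, script))
```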


As you can see, the performance gain varies from script to script, which is not surprising. Still, a few conclusions stand out: pypy wins unconditionally in both categories; Cython gives some speedup; and nuitka is useless for these purposes (which does not negate its use if, for example, you just need to bundle a script together with all its dependencies). It is also interesting that on the aggregation task plain Python 3 turned out to be faster than the cythonized version of the same script. For myself, I decided that both pypy and Cython are reasonable choices in different situations: if a script leans heavily on numpy/scipy/pandas and the like, pypy may cause difficulties (not all of that stack works with pypy out of the box), while translating one heavy function into Cython is quite easy.
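To illustrate that last point, here is a minimal hypothetical sketch of moving one hot function into Cython; file and function names are made up, and static C types on the hot loop are where Cython typically buys its speed:

```python
# heavy.pyx -- a hypothetical hot function, compiled to C by Cython.
def aggregate(values):
    cdef double total = 0.0
    cdef double v
    for v in values:  # each item is coerced to a C double
        total += v
    return total
```

```python
# setup.py -- standard Cython build; run: python setup.py build_ext --inplace
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize('heavy.pyx'))
```

After building, the rest of the script simply does `from heavy import aggregate` and stays plain Python.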

Test sources are on GitHub; improvements and additions are welcome.

Source: https://habr.com/ru/post/276569/

