Pillow-SIMD is a "follower" fork of the Pillow imaging library (which is itself a fork of the now-defunct PIL library). "Follower" means that the project will not become independent: it will be updated along with Pillow and use the same version numbers, only with a suffix. I hope to release Pillow-SIMD versions more or less immediately after the corresponding Pillow versions are released.
There are several ways to improve image processing performance (and probably the performance of anything else, too).
The best case is when you can use a faster algorithm, as when Pillow 2.7 replaced the convolution-based Gaussian blur with a sequence of box filters. Unfortunately, the number of such tricks is very limited. Using more computing resources is also very tempting, but they are often either unavailable or cost extra money (as with leased servers). Using a GPU for computing is generally a non-trivial task that involves selecting specific hardware and configuring the drivers correctly. That leaves the most reliable way: making the existing code run faster on the existing hardware. SIMD instructions are a perfect fit for this.
SIMD stands for "single instruction, multiple data". In classic programs, we take operands, perform an operation, and store the result. With SIMD, we take a whole bundle of operands at once, perform the same operation on all of them, and store a bundle of results. For a processor, this is cheaper than doing the same thing several times. There are many processor instruction set extensions with SIMD instructions, for example: MMX, SSE through SSE4, AVX, AVX2, AVX-512, NEON.
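The difference in data flow can be modeled in a few lines. This is only a conceptual sketch: plain Python does not emit SIMD instructions, and the function names and the lane count of 4 (for example, four 32-bit values in a 128-bit SSE register) are illustrative assumptions.

```python
LANES = 4  # assumed bundle size, e.g. four 32-bit values in a 128-bit register


def scalar_add(a, b):
    """Classic approach: one pair of operands per operation."""
    result = []
    for x, y in zip(a, b):
        result.append(x + y)
    return result


def simd_style_add(a, b, lanes=LANES):
    """Model of SIMD: every step consumes a whole bundle of operands.

    Returns the results and the number of steps taken. On real hardware,
    each step would correspond to a single instruction.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    result = []
    steps = 0
    for start in range(0, len(a), lanes):
        bundle_a = a[start:start + lanes]
        bundle_b = b[start:start + lanes]
        # The same operation is applied to every element of the bundle.
        result.extend(x + y for x, y in zip(bundle_a, bundle_b))
        steps += 1
    return result, steps


values, steps = simd_style_add(list(range(8)), [10] * 8)
print(values, steps)  # eight additions in two bundle-sized steps
```

With eight operands and four lanes, the modeled version needs two steps where the scalar loop needs eight.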
In the current version, Pillow-SIMD can be compiled using SSE4 extensions (default) or AVX2.
Pillow-SIMD is suitable for production. Various versions of Pillow-SIMD have been running on Uploadcare servers for more than a year now. Uploadcare is a service for storing and processing user content and the main sponsor of Pillow-SIMD.
Currently the following operations are accelerated in the SIMD version:
The benchmark numbers show how many megapixels of the original image are processed per second. For example, if resizing a 7712 × 4352 image takes 0.5 seconds, the performance is 67.1 Mpx/s.
While editing, I realized there may be some confusion here: for ImageMagick a megapixel is 10^6 pixels, while for Pillow it is 2^20. This does not greatly affect the overall picture.
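The arithmetic behind this example, and the effect of the two megapixel definitions, can be checked with a short sketch. The function name is an illustrative assumption, not part of Pillow or ImageMagick.

```python
def mpx_per_second(width, height, seconds, pixels_per_mpx=10**6):
    """Throughput in megapixels of the source image per second."""
    return width * height / seconds / pixels_per_mpx


decimal = mpx_per_second(7712, 4352, 0.5)         # megapixel = 10^6 pixels
binary = mpx_per_second(7712, 4352, 0.5, 2**20)   # megapixel = 2^20 pixels
print(f"{decimal:.1f} Mpx/s vs {binary:.1f} Mpx/s")  # 67.1 Mpx/s vs 64.0 Mpx/s
```

The two definitions differ by a factor of 2^20 / 10^6 ≈ 1.049, that is, by less than 5%.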
Versions tested:
ImageMagick 6.9.3-8 Q8 x86_64
Pillow 3.2.0
Pillow-SIMD 3.2.0.post2
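Source | Operation | Filter | IM | Pillow | SIMD SSE4 | SIMD AVX2 |
---|---|---|---|---|---|---|
7712 × 4352 RGB | Resize to 16x16 | Bilinear | 27.0 | 217 | 437 | 710 |
7712 × 4352 RGB | Resize to 16x16 | Bicubic | 10.9 | 115 | 232 | 391 |
7712 × 4352 RGB | Resize to 16x16 | Lanczos | 6.6 | 76.1 | 157 | 265 |
7712 × 4352 RGB | Resize to 320x180 | Bilinear | 32.0 | 166 | 410 | 612 |
7712 × 4352 RGB | Resize to 320x180 | Bicubic | 16.5 | 92.3 | 211 | 344 |
7712 × 4352 RGB | Resize to 320x180 | Lanczos | 11.0 | 63.2 | 136 | 223 |
7712 × 4352 RGB | Resize to 2048x1155 | Bilinear | 20.7 | 87.6 | 229 | 265 |
7712 × 4352 RGB | Resize to 2048x1155 | Bicubic | 12.2 | 65.7 | 140 | 171 |
7712 × 4352 RGB | Resize to 2048x1155 | Lanczos | 8.7 | 41.3 | 100 | 126 |
7712 × 4352 RGB | Blur | 1px | 8.1 | 17.1 | 37.8 | |
7712 × 4352 RGB | Blur | 10px | 2.6 | 17.4 | 39.0 | |
7712 × 4352 RGB | Blur | 100px | 0.3 | 17.2 | 39.0 | |
1920 × 1280 RGB | Resize to 16x16 | Bilinear | 41.6 | 196 | 426 | 750 |
1920 × 1280 RGB | Resize to 16x16 | Bicubic | 18.9 | 102 | 221 | 379 |
1920 × 1280 RGB | Resize to 16x16 | Lanczos | 13.7 | 68.6 | 140 | 227 |
1920 × 1280 RGB | Resize to 320x180 | Bilinear | 27.6 | 111 | 303 | 346 |
1920 × 1280 RGB | Resize to 320x180 | Bicubic | 14.5 | 66.3 | 164 | 230 |
1920 × 1280 RGB | Resize to 320x180 | Lanczos | 9.8 | 44.3 | 108 | 143 |
1920 × 1280 RGB | Resize to 2048x1155 | Bilinear | 9.1 | 20.7 | 71.1 | 69.6 |
1920 × 1280 RGB | Resize to 2048x1155 | Bicubic | 6.3 | 16.9 | 53.8 | 53.1 |
1920 × 1280 RGB | Resize to 2048x1155 | Lanczos | 4.7 | 14.6 | 40.7 | 41.7 |
1920 × 1280 RGB | Blur | 1px | 8.7 | 16.2 | 35.7 | |
1920 × 1280 RGB | Blur | 10px | 2.8 | 16.7 | 35.4 | |
1920 × 1280 RGB | Blur | 100px | 0.4 | 16.4 | 36.2 | |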
Pillow is always faster than ImageMagick, and the SSE4 version of Pillow-SIMD is about 2-2.5 times faster than Pillow. On the whole, the AVX2 version is 10-15 times faster than ImageMagick.
Tests were performed on Ubuntu 14.04 64-bit running on an Intel Core i5 4258U processor with AVX2. All tests used only one processor core.
ImageMagick performance was measured with the convert command-line utility using the -verbose and -bench arguments. The selected filters exactly match the corresponding Pillow filters:
PIL.Image.BILINEAR == Triangle
PIL.Image.BICUBIC == Catrom
PIL.Image.LANCZOS == Lanczos
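The filter mapping above can be written down as a lookup table that builds a convert command line. This is a sketch under assumptions: the exact invocation used for the measurements is not given here, and the function name, file names, iteration count, and argument order are all hypothetical. Only the -verbose, -bench, -filter, and -resize options and the filter names are taken as given.

```python
# Pillow filter name -> equivalent ImageMagick filter name (from the list above)
PILLOW_TO_IMAGEMAGICK = {
    "BILINEAR": "Triangle",
    "BICUBIC": "Catrom",
    "LANCZOS": "Lanczos",
}


def convert_command(source, target, size, pillow_filter, iterations=5):
    """Build an argument list for ImageMagick's convert utility.

    The file names and the iteration count are placeholders chosen for
    illustration, not values from the original benchmark.
    """
    im_filter = PILLOW_TO_IMAGEMAGICK[pillow_filter]
    return [
        "convert", source,
        "-verbose",
        "-bench", str(iterations),
        "-filter", im_filter,
        "-resize", size,
        target,
    ]


print(" ".join(convert_command("in.jpg", "out.jpg", "320x180", "BICUBIC")))
```

The resulting list could be passed to subprocess.run on a machine where ImageMagick is installed.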
Benchmark scripts were used to run the tests.
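A measurement of this kind could look roughly like the sketch below. It is not the script used for the numbers above: the helper names, the number of runs, and the use of a synthetic solid-color image instead of a real photo are assumptions. Only standard Pillow calls are used (Image.new, Image.resize, Image.filter, ImageFilter.GaussianBlur).

```python
import time


def best_time(operation, runs=3):
    """Return the shortest wall-clock time of several runs of operation()."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        timings.append(time.perf_counter() - start)
    return min(timings)


def throughput(source_size, seconds):
    """Megapixels (10^6 pixels) of the source image processed per second."""
    return source_size[0] * source_size[1] / seconds / 1e6


def benchmark_resize(source_size, target_size, filter_name, runs=3):
    # Imported here so the timing helpers above work without Pillow installed.
    from PIL import Image

    filters = {
        "Bilinear": Image.BILINEAR,
        "Bicubic": Image.BICUBIC,
        "Lanczos": Image.LANCZOS,
    }
    # Assumption: a blank synthetic image stands in for a real photo.
    image = Image.new("RGB", source_size)
    resample = filters[filter_name]
    seconds = best_time(lambda: image.resize(target_size, resample), runs)
    return throughput(source_size, seconds)


def benchmark_blur(source_size, radius, runs=3):
    from PIL import Image, ImageFilter

    image = Image.new("RGB", source_size)
    blur = ImageFilter.GaussianBlur(radius)
    seconds = best_time(lambda: image.filter(blur), runs)
    return throughput(source_size, seconds)
```

For example, benchmark_resize((7712, 4352), (320, 180), "Bicubic") returns a figure in Mpx/s for the machine it runs on. Taking the best of several runs reduces noise from other processes.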
There are no tricks here: high-quality resize and blur methods were used in the tests, and the results match almost pixel for pixel, with a small error. The only difference is in the efficiency of the algorithms themselves. In Pillow 2.7, resampling was rewritten to use precomputed coefficients, minimal floating-point arithmetic, and transposition that makes effective use of the processor cache.
Pillow-SIMD is faster still because of its use of SIMD instructions, of course. But I still have a few ideas for improving this result further.
Merging this code into the original Pillow is, in short, very difficult. Pillow supports a large number of architectures, not just x86. And even on x86, Pillow is distributed for some platforms as precompiled binaries. To use SIMD instructions in code, you need to pass the compiler arguments that allow the most advanced instructions you want to use, for example -mavx2. After that, you need to check the processor's capabilities at runtime and enable one code branch or another depending on them. The problem is that such arguments automatically enable compilation of code guarded by preprocessor checks of __AVX2__ and lower instruction sets, and that code may have no runtime checks. The saddest thing is that such code really does exist, at least when compiling with GCC, and executables that do not use AVX2 explicitly but are compiled with -mavx2 start to crash on processors without AVX2. Of course, you can build several versions of the library with different compiler options and load them dynamically, but this is [see the beginning of this paragraph].
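The runtime check described here can be modeled in Python as a minimal sketch. It assumes Linux, where /proc/cpuinfo lists feature flags such as sse4_1, sse4_2, and avx2. The function names and the variant labels are hypothetical and do not correspond to actual Pillow-SIMD modules.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags from Linux /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        name, _, value = line.partition(":")
        if name.strip() == "flags":
            return set(value.split())
    return set()


def choose_variant(flags):
    """Pick the most capable build variant the processor can run.

    The labels are illustrative; a real dispatcher would load a differently
    compiled library for each of them.
    """
    if "avx2" in flags:
        return "avx2"
    if {"sse4_1", "sse4_2"} <= flags:
        return "sse4"
    return "generic"


sample = "model name : Example CPU\nflags : fpu sse sse2 sse4_1 sse4_2 avx avx2\n"
print(choose_variant(parse_cpu_flags(sample)))  # avx2
```

On Linux the text could come from reading /proc/cpuinfo; other systems need a different source for the flags. The check has to run before any code compiled with the advanced instructions executes, which is exactly what the compiler does not guarantee.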
The good news is that installing the SSE4 version only takes the usual pip install pillow-simd, and if your processor supports SSE4 (I estimate the probability at about 95%), everything will just work. Remember to remove the original Pillow package.
If you want to build the AVX2 version, you need to pass additional flags to the compiler. The easiest way to do this is to set the CC environment variable during installation and compilation:
$ pip uninstall -y pillow-simd ; CC="cc -mavx2" pip install pillow-simd
Sometimes Pillow is a dependency not only of your project but also of other packages that you use. Even if those packages do not really need fast resampling, they will still install Pillow without SIMD, which may then be imported first. In that case, this hack of installing from GitHub may come in handy:
$ pip install -e git+https://github.com/uploadcare/pillow-simd.git@v3.2.0.post3#egg=pillow
Then, when another package with a dependency on Pillow is installed, a different version of Pillow will not be installed:
$ pip install xhtml2pdf -e git+https://github.com/uploadcare/pillow-simd.git@v3.2.0.post3#egg=pillow
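Since a plain Pillow can end up being the one that is imported, it may help to check the version string of the package that is actually loaded. The sketch below is a heuristic only: it assumes the Pillow-SIMD suffix always has the .postN form seen in the versions mentioned here (3.2.0.post2, v3.2.0.post3), and the function name is hypothetical.

```python
import re


def looks_like_pillow_simd(version):
    """Heuristic: True if the version has a .postN suffix, e.g. 3.2.0.post2."""
    return re.fullmatch(r"\d+(\.\d+)*\.post\d+", version) is not None


print(looks_like_pillow_simd("3.2.0.post2"))  # True
print(looks_like_pillow_simd("3.2.0"))        # False
```

The version string could be taken from PIL.__version__ in recent releases, or from the output of pip show.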
Source: https://habr.com/ru/post/301576/