
Pillow-SIMD

Operations run 2.5 times faster than in Pillow and 10 times faster than in ImageMagick



Pillow-SIMD is a follower fork of the Pillow imaging library (which is itself a fork of the PIL library, now defunct). "Follower" means that the project will not become independent: it will be updated along with Pillow and keep the same version numbering, only with a suffix. I hope to release Pillow-SIMD versions more or less immediately after the corresponding Pillow releases.


Why SIMD


There are several ways to improve image processing performance (and probably the performance of everything else, too).


  1. You can use better algorithms that give the same result.
  2. You can make a faster implementation of the existing algorithm.
  3. You can connect more computing resources to solve the same problem: additional CPU cores, GPUs.

The best case is when you can use a faster algorithm, as when Pillow 2.7 replaced the convolution-based Gaussian blur with a sequence of box filters. Unfortunately, the number of such tricks is very limited. Using more computing resources is also very tempting. But often they are either unavailable or cost extra money (as with leased servers). Using a GPU for computation is generally a non-trivial task: it requires choosing specific hardware and configuring the drivers correctly. That leaves the most reliable way: making the existing code run faster on the existing hardware. And this is where SIMD instructions fit perfectly.
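To make the box-filter trick concrete, here is a minimal one-dimensional sketch in plain Python. It is not Pillow's implementation: the function names, the edge clamping and the choice of three passes are assumptions made for illustration. It shows that a few passes of a box filter turn a single spike into a bell-shaped curve, and that a running sum keeps the cost per sample independent of the radius.

```python
def box_blur_1d(values, radius):
    """Average each sample with its neighbours within `radius` (edges clamped)."""
    n = len(values)
    window = 2 * radius + 1
    # Sum of the window centred on sample 0; out-of-range indexes are clamped.
    acc = sum(values[min(max(i, 0), n - 1)] for i in range(-radius, radius + 1))
    out = []
    for x in range(n):
        out.append(acc / window)
        # Slide the window by one sample: one value leaves, one value enters.
        leaving = values[max(x - radius, 0)]
        entering = values[min(x + radius + 1, n - 1)]
        acc += entering - leaving
    return out


def approx_gaussian_1d(values, radius, passes=3):
    """Repeated box blurs approach a Gaussian shape (3 passes is an assumption)."""
    for _ in range(passes):
        values = box_blur_1d(values, radius)
    return values


impulse = [0.0] * 31
impulse[15] = 1.0
blurred = approx_gaussian_1d(impulse, radius=2)
```

The work per sample is one addition and one subtraction regardless of `radius`, which is why this approach scales so much better than a direct convolution with a wide kernel.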


SIMD stands for "single instruction, multiple data". In classic programs, we take operands, perform an operation, and save the result. With SIMD, we take a whole bundle of operands at once, perform the same operation on all of them, and save a bundle of results. For a processor, this is cheaper than doing the same thing several times. There are many processor instruction set extensions with SIMD instructions, for example MMX, SSE through SSE4, AVX, AVX2, AVX-512, and NEON.
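The difference in data flow can be modelled in plain Python. This is only a model: Python does not execute SIMD instructions here, and the lane count of 4 and the function names are hypothetical. It merely counts how many "instructions" each approach would need for the same additions.

```python
LANES = 4  # hypothetical register width: 4 values per "instruction"


def add_scalar(a, b):
    """Classic approach: one addition per pair of operands."""
    out = []
    steps = 0
    for x, y in zip(a, b):
        out.append(x + y)
        steps += 1
    return out, steps


def add_packed(a, b, lanes=LANES):
    """Model of SIMD: load a bundle, add every lane in one step, store a bundle."""
    out = []
    steps = 0
    for i in range(0, len(a), lanes):
        bundle_a = a[i:i + lanes]
        bundle_b = b[i:i + lanes]
        # On real hardware the whole bundle addition is a single instruction.
        out.extend(x + y for x, y in zip(bundle_a, bundle_b))
        steps += 1
    return out, steps


a = list(range(8))
b = [10] * 8
scalar_result, scalar_steps = add_scalar(a, b)
packed_result, packed_steps = add_packed(a, b)
```

Both functions return the same values; the packed version needs 2 steps instead of 8 for these inputs.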


In the current version, Pillow-SIMD can be compiled using SSE4 extensions (default) or AVX2.


Project status


Pillow-SIMD is suitable for production. Various versions of Pillow-SIMD have been running on Uploadcare servers for more than a year now. Uploadcare is a service for storing and processing user content and is the main sponsor of Pillow-SIMD.


Currently the following operations are accelerated in the SIMD version:



Performance


The numbers indicate the number of processed megapixels of the original image per second. For example, if a resize of an image of size 7712 × 4352 was completed in 0.5 seconds, the performance would be 67.1 Mpx/s.


While editing, I realized that the megapixel definitions may have been mixed up: a megapixel is 10^6 pixels for the ImageMagick numbers and 2^20 pixels for the Pillow numbers. But this does not greatly affect the overall picture.
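The arithmetic behind these numbers is easy to reproduce. In the sketch below, the function name and the `pixels_per_mpx` parameter are illustrative, not part of any library. It recomputes the 7712 × 4352 example and shows how much the two megapixel definitions differ.

```python
def throughput_mpx(width, height, seconds, pixels_per_mpx=10**6):
    """Source megapixels processed per second."""
    return width * height / seconds / pixels_per_mpx


decimal_mpx = throughput_mpx(7712, 4352, 0.5)         # megapixel = 10**6 pixels
binary_mpx = throughput_mpx(7712, 4352, 0.5, 2**20)   # megapixel = 2**20 pixels

print(round(decimal_mpx, 1))  # 67.1
print(round(binary_mpx, 1))   # 64.0
```

The two definitions differ by a factor of 2^20 / 10^6 = 1.048576, so under the 2^20 definition a given result is reported as about 5% lower.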


Tested by:



Source            Operation             Filter     IM     Pillow   SIMD SSE4   SIMD AVX2
7712 × 4352 RGB   Resize to 16x16       Bilinear   27.0   217      437         710
                                        Bicubic    10.9   115      232         391
                                        Lanczos    6.6    76.1     157         265
                  Resize to 320x180     Bilinear   32.0   166      410         612
                                        Bicubic    16.5   92.3     211         344
                                        Lanczos    11.0   63.2     136         223
                  Resize to 2048x1155   Bilinear   20.7   87.6     229         265
                                        Bicubic    12.2   65.7     140         171
                                        Lanczos    8.7    41.3     100         126
                  Blur                  1px        8.1    17.1     37.8
                                        10px       2.6    17.4     39.0
                                        100px      0.3    17.2     39.0
1920 × 1280 RGB   Resize to 16x16       Bilinear   41.6   196      426         750
                                        Bicubic    18.9   102      221         379
                                        Lanczos    13.7   68.6     140         227
                  Resize to 320x180     Bilinear   27.6   111      303         346
                                        Bicubic    14.5   66.3     164         230
                                        Lanczos    9.8    44.3     108         143
                  Resize to 2048x1155   Bilinear   9.1    20.7     71.1        69.6
                                        Bicubic    6.3    16.9     53.8        53.1
                                        Lanczos    4.7    14.6     40.7        41.7
                  Blur                  1px        8.7    16.2     35.7
                                        10px       2.8    16.7     35.4
                                        100px      0.4    16.4     36.2

Pillow is always faster than ImageMagick, and the SSE4 version of Pillow-SIMD is about 2-2.5 times faster than Pillow. In general, the AVX2 version is 10-15 times faster than ImageMagick.
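As a quick check of these ratios, the sketch below recomputes the speedups for three rows of the results, the 7712 × 4352 source resized to 2048x1155. The dictionary layout and the variable names are illustrative; the figures are copied from the measurements above.

```python
# Mpx/s for 7712 x 4352 RGB, resize to 2048x1155: (IM, Pillow, SIMD SSE4, SIMD AVX2)
results = {
    "Bilinear": (20.7, 87.6, 229, 265),
    "Bicubic": (12.2, 65.7, 140, 171),
    "Lanczos": (8.7, 41.3, 100, 126),
}

sse4_vs_pillow = {}
avx2_vs_im = {}
for name, (im, pillow, sse4, avx2) in results.items():
    sse4_vs_pillow[name] = round(sse4 / pillow, 1)
    avx2_vs_im[name] = round(avx2 / im, 1)

print(sse4_vs_pillow)  # {'Bilinear': 2.6, 'Bicubic': 2.1, 'Lanczos': 2.4}
print(avx2_vs_im)      # {'Bilinear': 12.8, 'Bicubic': 14.0, 'Lanczos': 14.5}
```

Other rows give different ratios; for reductions to very small sizes the gap to ImageMagick is noticeably larger.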


Tests were performed on Ubuntu 14.04 64-bit running on an Intel Core i5 4258U processor with AVX2. All tests used only one processor core.


ImageMagick performance was measured with the convert command-line utility using the -verbose and -bench arguments. The selected filters exactly match the existing Pillow filters:



Scripts like these were used for testing.
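The original scripts are not reproduced here. As a rough idea of what such a measurement involves, here is a minimal timing harness; it is an assumption about the approach, not the author's code, and the function name and the best-of-N policy are illustrative.

```python
import time


def measure_mpx_per_s(operation, width, height, runs=5):
    """Throughput in Mpx/s (10**6 source pixels) for the fastest of `runs` calls."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading on timers with coarse resolution.
    best = max(best, 1e-9)
    return width * height / best / 10**6


calls = []
speed = measure_mpx_per_s(lambda: calls.append(sum(range(10000))), 1920, 1280)
```

With Pillow installed, `operation` could be something like `lambda: im.resize((320, 180), Image.BILINEAR)` for an image `im` loaded beforehand, with `width` and `height` taken from `im.size`.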


Why is Pillow so fast


There are no tricks here: high-quality resize and blur methods were used for the tests. The results match almost pixel for pixel, with a small error. The only difference is in the efficiency of the algorithms themselves. In Pillow 2.7, resampling was rewritten to use precomputed coefficients, less floating-point arithmetic, and transposition that makes effective use of the processor cache.
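The idea of precomputed coefficients combined with transposition can be sketched in plain Python. This is a simplified illustration with a bilinear (triangle) kernel, not Pillow's C code: all function names are hypothetical and the window bounds are chosen for readability. The weights for each output pixel are computed once per axis and then reused for every row.

```python
def triangle(x):
    """Bilinear (triangle) kernel with support 1."""
    x = abs(x)
    return 1.0 - x if x < 1.0 else 0.0


def precompute_coeffs(in_size, out_size, kernel=triangle, support=1.0):
    """For each output pixel: first source index and normalized weights."""
    scale = in_size / out_size
    filterscale = max(scale, 1.0)  # widen the kernel when downscaling
    radius = support * filterscale
    table = []
    for out_x in range(out_size):
        center = (out_x + 0.5) * scale
        first = max(0, int(center - radius))
        last = min(in_size, int(center + radius) + 1)
        weights = [kernel((x + 0.5 - center) / filterscale)
                   for x in range(first, last)]
        total = sum(weights)
        table.append((first, [w / total for w in weights]))
    return table


def resample_rows(rows, table):
    """Horizontal pass: the same precomputed table is applied to every row."""
    return [
        [sum(w * row[first + i] for i, w in enumerate(weights))
         for first, weights in table]
        for row in rows
    ]


def transpose(rows):
    return [list(col) for col in zip(*rows)]


def resize(rows, out_width, out_height):
    """Two horizontal passes with a transpose in between cover both axes."""
    in_height, in_width = len(rows), len(rows[0])
    rows = resample_rows(rows, precompute_coeffs(in_width, out_width))
    cols = resample_rows(transpose(rows), precompute_coeffs(in_height, out_height))
    return transpose(cols)


image = [[float(x) for x in range(8)] for _ in range(6)]
small = resize(image, 4, 3)
```

Because both passes walk along rows, memory is read sequentially in each of them, which is the cache-friendly property the paragraph above refers to.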


Why Pillow-SIMD is even faster


Thanks to SIMD instructions, of course. But I still have a few ideas on how to improve this result.



Why not put changes back into Pillow


In short, it is very difficult. Pillow supports a large number of architectures, not just x86. And even on x86, Pillow is distributed for some platforms as precompiled binaries. To use SIMD instructions in the code, you have to pass the compiler a flag that allows the most advanced instructions you want to use, such as -mavx2. After that, you need to check the processor's capabilities at runtime and take one code branch or another depending on them. The problem is that such flags also automatically enable any code hidden behind preprocessor conditions like #ifdef __AVX2__ and below, which may contain no runtime checks. The saddest thing is that such code really does exist, at least when compiling with GCC, and binaries that do not use AVX2 explicitly but were compiled with -mavx2 start to fail on processors without AVX2. Of course, you can build several versions of the library with different compiler options and load them dynamically, but that is [see the beginning of this paragraph].


Installation


The good news is that to install the SSE4 version, it is enough to run pip install pillow-simd as usual, and if your processor supports SSE4 (I think the probability is about 95%), everything will be fine. Remember to remove the original Pillow package first.
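If you would rather check than guess, on Linux the processor's feature flags are listed in /proc/cpuinfo. The sketch below parses that text; the function names are hypothetical, and requiring both `sse4_1` and `sse4_2` for the SSE4 build is a conservative assumption rather than a documented requirement. The same flags tell you whether an AVX2 build could run.

```python
def parse_cpu_flags(cpuinfo_text):
    """Collect the feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        name, _, value = line.partition(":")
        if name.strip() == "flags":
            flags.update(value.split())
    return flags


def supported_builds(flags):
    """Map CPU flags to the Pillow-SIMD builds mentioned in this article."""
    builds = []
    # Assumption: treat SSE4 as available only when both 4.1 and 4.2 are present.
    if "sse4_1" in flags and "sse4_2" in flags:
        builds.append("SSE4")
    if "avx2" in flags:
        builds.append("AVX2")
    return builds


sample = "processor\t: 0\nflags\t\t: fpu sse2 ssse3 sse4_1 sse4_2 avx avx2\n"
print(supported_builds(parse_cpu_flags(sample)))  # ['SSE4', 'AVX2']
```

On a real machine, pass in the contents of /proc/cpuinfo, for example `open("/proc/cpuinfo").read()`. Other operating systems expose this information differently.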


If you want to build the AVX2 version, then you need to pass additional flags to the compiler. The easiest way to do this is to set the CC environment variable during installation and compilation.


 $ pip uninstall -y pillow-simd ; CC="cc -mavx2" pip install pillow-simd 

Sometimes Pillow is a dependency not only of your own code but also of other packages you use. Even if these packages do not really need fast resampling, they still install Pillow without SIMD, and that copy may end up being imported first. In this case, the following hack for installing from GitHub may come in handy:


 $ pip install -e git+https://github.com/uploadcare/pillow-simd.git@v3.2.0.post3#egg=pillow 

Then during the installation of another package with a dependency on Pillow, another version of Pillow will not be installed:


 $ pip install xhtml2pdf -e git+https://github.com/uploadcare/pillow-simd.git@v3.2.0.post3#egg=pillow 
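After juggling packages like this, it is worth checking which Pillow actually gets imported. The sketch below relies on the version suffix described at the beginning of the article. The function names are hypothetical, and treating any ".post" suffix as a sign of Pillow-SIMD is a heuristic, not a guarantee.

```python
def looks_like_simd(version):
    """Pillow-SIMD reuses Pillow's version number plus a suffix such as '.post3'."""
    return ".post" in version


def imported_pillow_version():
    """Version string of whatever `PIL` package is importable, or None."""
    try:
        import PIL
    except ImportError:
        return None
    # Depending on the release, the version is exposed under one of these names.
    return getattr(PIL, "__version__", None) or getattr(PIL, "PILLOW_VERSION", None)


version = imported_pillow_version()
if version is None:
    print("Pillow is not installed")
elif looks_like_simd(version):
    print("Pillow-SIMD " + version)
else:
    print("Plain Pillow " + version)
```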


Source: https://habr.com/ru/post/301576/

