
Numpy and multiprocessing

Many people now use the numpy library in their Python programs, since it significantly speeds up working with data and performing mathematical operations. However, in many cases numpy runs several times slower than it could, because it uses only one processor core even though it could use all the cores you have.

The reason is that numpy performs many operations by calling functions from a linear algebra library. This is where the problem usually lies. Fortunately, it is quite easy to fix.

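So, there are three possible situations:

- no linear algebra library is installed, and numpy falls back on its own built-in library;
- a classic library such as ATLAS or BLAS is installed, and it uses only one processor;
- numpy already uses a library that loads all the processors, and there is nothing to fix.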
Let's do a simple test. Run this program here:
 import numpy as np

 size = 10000
 a = np.random.random_sample((size, size))
 b = np.random.random_sample((size, size))
 n = np.dot(a, b)

After that, if you are working in Linux, run top; if you are working in Windows, open the “Performance” tab in the Task Manager (opened with Ctrl + Shift + Esc). If top shows the Python process at about 100%, or the “CPU load” indicator on the “Performance” tab stays many times below 100%, then only one core is busy with the calculations, and this article is for you. Those who see all their processors at work can be happy: everything is fine, and there is no need to read further.
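For those who would rather not watch top or the Task Manager, the same check can be roughly sketched in Python by comparing CPU time with wall-clock time. The helper name cpu_and_wall_time and the smaller matrix size of 2000 are assumptions made for this illustration; they are not part of the original test.

```python
import time


def cpu_and_wall_time(work):
    """Run work() once and return (wall_seconds, cpu_seconds).

    time.process_time() sums the CPU time of every thread in this
    process, so cpu_seconds can exceed wall_seconds when several
    cores are busy at once.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return wall_seconds, cpu_seconds


if __name__ == "__main__":
    import numpy as np

    size = 2000  # assumption: smaller than the article's 10000 to keep the run short
    a = np.random.random_sample((size, size))
    b = np.random.random_sample((size, size))

    wall, cpu = cpu_and_wall_time(lambda: np.dot(a, b))
    print("wall time: %.2f s, CPU time: %.2f s" % (wall, cpu))
    print("CPU time / wall time: %.1f" % (cpu / wall))
```

A ratio close to 1 suggests that a single core did the work; a ratio close to the number of cores suggests that the linear algebra library is using all of them.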

Windows solution

In theory you can, of course, find the sources of the libraries, recompile them and rebuild numpy. I have even heard that someone wrote that he saw people who said that they had succeeded... In practice, the easiest way is to install a scientific distribution of Python, for example Anaconda or Canopy. Such a distribution includes not only Python and numpy, but also a whole set of useful libraries for computation and visualization.

Then you can rerun the initial test to make sure that the speed has increased significantly.

Linux solution

In fact, you can also install Anaconda, Canopy or another distribution that comes with all the libraries at once. But if you prefer to build everything with your own hands, read on: all the recipes are below.

Library check


As you remember, there are two options that need fixing:

- numpy uses its own built-in library;
- numpy uses an installed ATLAS or BLAS.

If you are running a recent version of numpy (> 1.10), go to the directory where numpy is installed (usually /usr/local/lib/python2.7/dist-packages/numpy, but this may differ depending on your version of Linux and Python) and run the following commands in the console:

 cd core
 ldd multiarray.so

Earlier versions of numpy do not have multiarray.so, but they do have _dotblas.so:

 ldd _dotblas.so 
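Since the installation directory varies, the files to pass to ldd can also be located from Python. This is only a sketch: the function name find_blas_candidates is hypothetical, and the assumption is that the extension modules sit in the core subdirectory under names starting with multiarray or _dotblas, as described above.

```python
import glob
import importlib.util
import os


def find_blas_candidates(package_dir, names=("multiarray", "_dotblas")):
    """Return the .so files in <package_dir>/core whose names start with one of names."""
    core_dir = os.path.join(package_dir, "core")
    found = []
    for name in names:
        pattern = os.path.join(core_dir, name + "*.so")
        found.extend(sorted(glob.glob(pattern)))
    return found


if __name__ == "__main__":
    # find_spec locates the package without importing it.
    spec = importlib.util.find_spec("numpy")
    if spec is None or not spec.submodule_search_locations:
        print("numpy is not installed for this interpreter")
    else:
        package_dir = list(spec.submodule_search_locations)[0]
        print("numpy directory:", package_dir)
        for path in find_blas_candidates(package_dir):
            print("candidate for ldd:", path)
```

If the script prints no candidates, the extension modules of your numpy version are probably named or placed differently, and you will have to look through the numpy directory by hand.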

The output from the ldd command will show you whether numpy uses third-party linear algebra libraries.

 linux-vdso.so.1 => (0x00007fffe58a2000)
 libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f8adbff4000)
 libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f8adbdd6000)
 libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f8adba10000)
 /lib64/ld-linux-x86-64.so.2 (0x00007f8adc68c000)

If you do not see libblas.so in the listing, then your numpy uses its own built-in library. If you do see it, then you have ATLAS or BLAS.

In either case, you first need a proper linear algebra library.
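The rule above can be sketched as a small Python function that reads the text printed by ldd. The function name classify_blas and the three labels it returns are made up for this illustration, and the check looks only at library names, so treat the result as a hint rather than a verdict.

```python
def classify_blas(ldd_output):
    """Guess which linear algebra library appears in the output of ldd.

    Returns "openblas" if any line mentions OpenBLAS, "blas/atlas" if a
    library named libblas, libcblas or libatlas is listed, and "builtin"
    if none of them appears.
    """
    lines = [line.strip().lower() for line in ldd_output.splitlines()]
    lines = [line for line in lines if line]

    if any("openblas" in line for line in lines):
        return "openblas"

    for line in lines:
        library = line.split()[0]  # e.g. "libblas.so.3"
        if library.startswith(("libblas", "libcblas", "libatlas")):
            return "blas/atlas"

    return "builtin"


if __name__ == "__main__":
    sample = """
    linux-vdso.so.1 => (0x00007fffe58a2000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f8adbff4000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f8adba10000)
    """
    print(classify_blas(sample))
```

For the listing shown above, which contains no libblas.so, the function reports the built-in library.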

Installing OpenBLAS


OpenBLAS is a good library of the linear algebra algorithms and functions that underlie modern data analysis and machine learning methods.

First of all, you will need the gfortran Fortran compiler, since OpenBLAS is not compatible with the standard g77 compiler.

 sudo apt-get install gfortran 

Download OpenBLAS from GitHub (after changing to a directory suitable for the build):

 git clone https://github.com/xianyi/OpenBLAS.git 

Now go to the directory and run the build:

 cd OpenBLAS
 make FC=gfortran

When the build has completed successfully, install the library:

 sudo make install 

By default, the library will be installed in /opt/OpenBLAS. If you want to install it in another location, run make install with the PREFIX option:

 sudo make install PREFIX=/your/preferred/location 

Reassign Libraries


If you found out earlier that some linear algebra library is already installed, then you only need to run the library reassignment command:

 sudo update-alternatives --install /usr/lib/libblas.so.3 libblas.so.3 \
     /opt/OpenBLAS/lib/libopenblas.so 50

After that, OpenBLAS will become the default linear algebra library not only for numpy, but for all your programs and libraries.

Then run the test again to see that all the processors now take part in the calculations.
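update-alternatives does its work through symbolic links, so one way to confirm the reassignment is to follow the links starting from /usr/lib/libblas.so.3. The sketch below does this in Python; the function name symlink_chain and the hop limit are assumptions of this example.

```python
import os


def symlink_chain(path, max_hops=32):
    """Follow symbolic links from path and return every step as a list.

    The first element is path itself and the last one is the final
    target. max_hops guards against circular links.
    """
    chain = [path]
    current = path
    for _ in range(max_hops):
        if not os.path.islink(current):
            break
        target = os.readlink(current)
        if not os.path.isabs(target):
            # Relative targets are resolved against the directory of the link.
            target = os.path.join(os.path.dirname(current), target)
        current = os.path.normpath(target)
        chain.append(current)
    return chain


if __name__ == "__main__":
    # Path taken from the update-alternatives command above.
    for step in symlink_chain("/usr/lib/libblas.so.3"):
        print(step)
```

If the last line printed points into the OpenBLAS installation directory, the reassignment has taken effect.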

Installing the right numpy


If your numpy was running on the built-in library, you will have to rebuild it so that it picks up the newly installed OpenBLAS.

First get rid of the defective library:

 sudo pip uninstall numpy 

Then create a .numpy-site.cfg file in your home directory with the following content:

 [default]
 include_dirs = /opt/OpenBLAS/include
 library_dirs = /opt/OpenBLAS/lib

 [openblas]
 openblas_libs = openblas
 include_dirs = /opt/OpenBLAS/include
 library_dirs = /opt/OpenBLAS/lib

 [lapack]
 lapack_libs = openblas

 [atlas]
 atlas_libs = openblas
 libraries = openblas
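When OpenBLAS was installed under a different prefix, every path in this file has to change together. A minimal sketch of generating the same content for an arbitrary prefix is shown below; the function name render_site_cfg is hypothetical, and the script only prints the text instead of writing the file.

```python
import configparser
import io


def render_site_cfg(prefix="/opt/OpenBLAS"):
    """Return the text of .numpy-site.cfg for OpenBLAS installed under prefix."""
    include_dir = prefix + "/include"
    library_dir = prefix + "/lib"

    # interpolation=None keeps "%" characters in paths from being interpreted.
    config = configparser.ConfigParser(interpolation=None)
    config["default"] = {
        "include_dirs": include_dir,
        "library_dirs": library_dir,
    }
    config["openblas"] = {
        "openblas_libs": "openblas",
        "include_dirs": include_dir,
        "library_dirs": library_dir,
    }
    config["lapack"] = {"lapack_libs": "openblas"}
    config["atlas"] = {"atlas_libs": "openblas", "libraries": "openblas"}

    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Redirect the output to ~/.numpy-site.cfg once it looks right.
    print(render_site_cfg("/your/preferred/location"))
```

Called without arguments, the function reproduces the sections and values listed above for the default /opt/OpenBLAS location.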

If you chose a nonstandard location for OpenBLAS earlier, change the paths in the file accordingly. Now install numpy again:

 sudo pip install numpy 

When the compilation and installation are complete, run the initial test to make sure that the processors are no longer idle. That's all.

Source: https://habr.com/ru/post/274331/

