Many people use the numpy library in their Python programs, since it significantly speeds up working with data and performing mathematical operations. However, in many cases numpy runs several times slower than it could, because it uses only one processor core even though it could use all the cores you have.
The reason is that, for many operations, numpy calls functions from a linear algebra library, and that is where the problem usually lies. Fortunately, it is quite easy to fix.
So, there are three possible situations:
- you have no linear algebra libraries installed, in which case numpy uses its built-in implementation, which, it must be said, is very slow;
- you have classic libraries such as ATLAS or the reference BLAS, which can only use one core;
- you have a modern library such as OpenBLAS or MKL.
Let's do a simple test. Run this program:

```python
import numpy as np

size = 10000
a = np.random.random_sample((size, size))
b = np.random.random_sample((size, size))
n = np.dot(a, b)
```
After that, if you are working in Linux, run top; if you are working in Windows, open the "Performance" tab of the Task Manager (called with Ctrl+Shift+Esc). If top shows your process at 100% (that is, one core out of several), or the "CPU" indicator on the "Performance" tab shows a value many times below 100%, then only one core is busy with the calculations, and this article is for you. Those whose processors are all involved can be happy: they are fine, and they can stop reading here.
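If you prefer a programmatic check over watching a system monitor, you can compare the process's CPU time with wall-clock time: a multithreaded BLAS burns several CPU-seconds per wall-second. A minimal sketch (with a smaller matrix than in the test above, so it finishes quickly):

```python
import time

import numpy as np

size = 2000  # smaller than in the test above, so the check finishes quickly
a = np.random.random_sample((size, size))
b = np.random.random_sample((size, size))

wall_start = time.perf_counter()
cpu_start = time.process_time()
np.dot(a, b)
wall = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start

# process_time() sums CPU time over all threads, so with a multithreaded
# BLAS the ratio is roughly the number of cores in use; with a
# single-threaded one it stays close to 1.
print("wall %.2fs, cpu %.2fs, ratio %.1f" % (wall, cpu, cpu / wall))
```

A ratio well above 1 means several cores are already at work and you can skip the rest of the article.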
Windows solution
Theoretically, you could, of course, find the sources of the libraries, recompile them, and rebuild numpy. I have even heard that someone wrote that he saw people who said they succeeded... In general, the easiest way is to install a scientific Python distribution, for example, Anaconda or Canopy. A distribution includes not only python and numpy, but also a whole set of useful libraries for computation and visualization.
Then rerun the initial test to make sure that the speed has increased significantly.
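Besides timing, you can confirm that the distribution's numpy is linked against a fast library by asking numpy itself; on Anaconda the output typically mentions MKL or OpenBLAS:

```python
import numpy as np

# Prints the BLAS/LAPACK configuration that numpy was built against;
# look for mentions of mkl or openblas in the output.
np.show_config()
```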
Linux solution
In fact, here too you can install Anaconda, Canopy, or another distribution with all the libraries at once. But if you prefer to build everything with your own hands, read on: all the recipes are below.
Library check
As you remember, there are two options:
- you have "old school" (or "obsolete", as you like) libraries (for example, ATLAS);
- you have no libraries installed, and numpy uses its built-in implementation (which is even slower).
If you are running a recent version of numpy (> 1.10), go to the directory where numpy is installed (usually /usr/local/lib/python2.7/dist-packages/numpy, although it varies with the Linux and Python versions) and run the following commands in the console:

```shell
cd core
ldd multiarray.so
```
In earlier versions of the numpy library there is no multiarray.so, but there is _dotblas.so:

```shell
ldd _dotblas.so
```
The output of the ldd command will show you whether numpy uses third-party linear algebra libraries:

```
linux-vdso.so.1 =>  (0x00007fffe58a2000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f8adbff4000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f8adbdd6000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f8adba10000)
/lib64/ld-linux-x86-64.so.2 (0x00007f8adc68c000)
```
If you do not see libblas.so in the listing, your numpy uses its own internal library. If you do see it, you have ATLAS or BLAS installed. Either way, the first step is to get the right linear algebra library.
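A quick way to repeat this check without hunting for the directory by hand is to ask Python where numpy's compiled core lives. This is a sketch that assumes a Linux system with ldd available (on recent numpy versions the shared object is named _multiarray_umath rather than multiarray, which is why it inspects every .so in the directory):

```shell
# Ask Python where numpy's compiled core is installed, then inspect
# which linear algebra libraries its shared objects are linked against.
NUMPY_CORE=$(python3 -c "import numpy, os; print(os.path.join(os.path.dirname(numpy.__file__), 'core'))")
ldd "$NUMPY_CORE"/*.so | grep -i -E 'blas|lapack' || echo "no external BLAS linked"
```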
Installing OpenBLAS
OpenBLAS is a good library of linear algebra algorithms and routines, which underlie modern data analysis and machine learning methods.
First of all, you will need a Fortran compiler, since OpenBLAS is not compatible with the standard g77 compiler:

```shell
sudo apt-get install gfortran
```
Download OpenBLAS from GitHub (after changing to a suitable directory for the installation):

```shell
git clone https://github.com/xianyi/OpenBLAS.git
```
Now go into the directory and run the build:

```shell
cd OpenBLAS
make FC=gfortran
```
When the compilation and build complete successfully, install the library:

```shell
sudo make install
```
By default, the library will be installed in /opt/OpenBLAS. If you want to install it somewhere else, run make install with the PREFIX option:

```shell
sudo make install PREFIX=/your/preferred/location
```
Reassign Libraries
If you found out earlier that you already have some linear algebra library installed, you just need to run the command that reassigns the library:

```shell
sudo update-alternatives --install /usr/lib/libblas.so.3 libblas.so.3 \
    /opt/OpenBLAS/lib/libopenblas.so 50
```
After that, OpenBLAS becomes the default linear algebra library not only for numpy but for all your programs and libraries. Run the test again to see how all the processors are now involved in the calculations.
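OpenBLAS also lets you limit how many cores it grabs, which is handy on shared machines. The thread count is read when the library is loaded, so the environment variables must be set before numpy is imported (OPENBLAS_NUM_THREADS is OpenBLAS's own variable; OMP_NUM_THREADS covers OpenMP builds):

```python
import os

# Must be set before numpy (and thus OpenBLAS) is loaded:
os.environ["OPENBLAS_NUM_THREADS"] = "2"
os.environ["OMP_NUM_THREADS"] = "2"

import numpy as np

a = np.random.random_sample((1000, 1000))
np.dot(a, a)  # now uses at most two cores
```

Setting these on the command line (`OPENBLAS_NUM_THREADS=2 python script.py`) works just as well and avoids touching the code.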
Installing the right numpy
If your numpy was running on the built-in library, you will have to rebuild it so that it picks up the newly installed OpenBLAS. First, get rid of the defective copy:

```shell
sudo pip uninstall numpy
```
Then create a .numpy-site.cfg file in your home directory with the following content:

```ini
[default]
include_dirs = /opt/OpenBLAS/include
library_dirs = /opt/OpenBLAS/lib

[openblas]
openblas_libs = openblas
include_dirs = /opt/OpenBLAS/include
library_dirs = /opt/OpenBLAS/lib

[lapack]
lapack_libs = openblas

[atlas]
atlas_libs = openblas
libraries = openblas
```
If you chose a nonstandard location for OpenBLAS earlier, change the paths in the file accordingly. Now install numpy again:

```shell
sudo pip install numpy
```
When the compilation and installation are complete, run the initial test to make sure that the processors are no longer idle. That's all.
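To put a number on the improvement, you can also time the matrix product from the beginning of the article with the standard time utility (the size here is reduced so the run stays short; raise it for a more telling measurement):

```shell
# Compare the timing before and after switching to OpenBLAS.
time python3 -c "import numpy as np; size = 3000; a = np.random.random_sample((size, size)); np.dot(a, a)"
```

On a multithreaded BLAS, the reported "user" time will noticeably exceed the "real" time, since several cores contribute CPU time in parallel.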