📜 ⬆️ ⬇️

D std.ndslice as a replacement for Python Numpy

Preface: I have been writing in Python for more than 6 years and I can call myself a professional in this language. Recently I even wrote a book about him. However, for the last 8 months I have switched to D and for 4 months I have been actively involved in the development of this language in terms of the expansion of the standard library Phobos. I also participated in the code review of the std.ndslice module which will be discussed.

std.ndslice as well as Numpy is designed to work with multidimensional arrays. However, unlike Numpy, ndslice has an extremely low overhead as it is based on ranges (ranges) that are used in the regular library throughout. Ranges can avoid unnecessary copying procedures, as well as allow you to beautifully organize lazy calculations.

In this article I would like to tell about what advantages std.ndslice gives in comparison with Numpy.

Why should Python programmers look towards D?


Because D allows you to write almost the same simple and understandable code as Python, while this code will work much faster than Python code.
')
The following example creates a range of numbers from 0 to 999 using the iota function (the analog in Python is called xrange ) and returns an array of 5x5x40.

 import std.range : iota; import std.experimental.ndslice; void main() { auto slice = sliced(iota(1000), 5, 5, 40); } 

Although D is a statically typed language, and an explicit type indication increases the readability of the code so that it will be easier for Python programmers to use automatic typing using the auto keyword.

The sliced function returns a multi-dimensional slice. At the input, it can take as simple arrays as well as ranges . As a result, we have a 5x5x40 cube consisting of numbers from 0 to 999.

A few words about what Ranges is. It is more correct to translate them into Russian as ranges. Ranges allow us to describe the rules for sorting data of any data type, be it a class or a structure. It's enough that you implement the function: front , which returns the first element of the range, popFront , which goes to the next element and empty returns a boolean value indicating that the looped sequence is empty. Ranges allow you to perform lazy brute-force access to data as they are needed. The concept of Ranges is covered in more detail here .

Please note that no empty memory allocations! This happens because iota also allows you to generate lazy ranges, and sliced in lazy mode also receives data from iota and processes them as it is received.

As you can see, std.ndslice works a little differently than Numpy. Numpy creates its own type for data, while std.ndslice represents a way to manipulate already existing data. This allows you to use the same data in your entire program without wasting resources on useless memory allocation! It is not difficult to guess that this is an extremely serious impact on the performance of your decisions.

Let's look at the following example. In it, we will retrieve data from stdin , filter only unique strings, sort them, and then return to stdout .

 import std.stdio; import std.array; import std.algorithm; void main() { stdin // get stdin as a range .byLine(KeepTerminator.yes) .uniq // stdin is immutable, so we need a copy .map!(a => a.idup) .array .sort // stdout.lockingTextWriter() is an output range, meaning values can be // inserted into to it, which in this case will be sent to stdout .copy(stdout.lockingTextWriter()); } 

For those who wish to more thoroughly understand the lazy generation of ranges, I recommend reading this article.

Since slice has three dimensions, this is a range that returns ranges of ranges (ranges of ranges). This is clearly seen in the following example:

 import std.range : iota; import std.stdio : writeln; import std.experimental.ndslice; void main() { auto slice = sliced(iota(1000), 5, 5, 40); foreach (item; slice) { writeln(item); } } 

The result of his work will be as follows (three dots for short):

 [[0, 1, ... 38, 39], [40, 41, ... 78, 79], [80, 81, ... 118, 119], [120, 121, ... 158, 159], [160, 161, ... 198, 199]] ... [[800, 801, ... 838, 839], [840, 841, ... 878, 879], [880, 881, ... 918, 919], [920, 921, ... 958, 959], [960, 961, ... 998, 999]] 

The foreach works much like a for in Python. However, in D you can use it both in C style and in Python cycle style, but without the hassle of enumerate or xrange .

Using UFCS (Uniform Function Call Syntax) you can rewrite the code in the following manner:

 import std.range : iota; import std.experimental.ndslice; void main() { auto slice = 1000.iota.sliced(5, 5, 40); } 

UFCS allows you to record a method call on a chain and write:

 a.func(b) 

instead:

 func(a, b) 

Let's now generate an empty project using the dub package manager. With the command dub init and in the file \source\app.d we write:

 import std.experimental.ndslice; void main() { } 

Currently std.experimental.ndslice; located in the std.experimental section. This does not mean that it is raw. This means that he needs to insist.

Now we will build the project with the command:

 dub 

Module D ndslice is very similar to Numpy:

 a = numpy.arange(1000).reshape((5, 5, 40)) b = a[2:-1, 1, 10:20] 

Equivalently:

 auto a = 1000.iota.sliced(5, 5, 40); auto b = a[2 .. $, 1, 10 .. 20]; 

Now let's create a two-dimensional array and get the middle of each column.

Python:

 import numpy data = numpy.arange(100000).reshape((100, 1000)) means = numpy.mean(data, axis=0) 

D:

 import std.range; import std.algorithm.iteration; import std.experimental.ndslice; import std.array : array; void main() { auto means = 100_000.iota .sliced(100, 1000) .transposed .map!(r => sum(r) / r.length) .array; } 

In order for this code not to work in lazy mode, I had to call the array method. However, in a real application, the result will not be calculated until it is used in another part of the program.

Currently, Phobos does not have built-in statistics . Therefore, the example uses a simple lambda to find the average value. map! function map! has an exclamation mark at the end. This means that this is a template function. It allows at the compilation stage to generate code based on the expression specified in its body. Here's a good article on how the patterns themselves in D work .

Although the D code is a bit more verbose than in Python, but thanks to the map! the code will work with any input data that is a range. While the Python code will only work with special arrays from Numpy.

Here I must say that after carrying out this test, it turned out that Python lost D at times and after long discussions on Hacker News, I realized that I had made a mistake and the comparison was not entirely correct. iota dynamically creates data that the sliced function sliced . And accordingly, we don’t touch the memory until its last relocation. Also D returns an array with a long data type while Numpy is from double . As a result, I rewrote the code and brought the value of the array to 1,000,000 instead of 10,000. Here’s what happened:

 import std.range : iota; import std.array : array; import std.algorithm; import std.datetime; import std.conv : to; import std.stdio; import std.experimental.ndslice; enum test_count = 10_000; double[] means; int[] data; void f0() { means = data .sliced(100, 10000) .transposed .map!(r => sum(r, 0L) / cast(double) r.length) .array; } void main() { data = 1_000_000.iota.array; auto r = benchmark!(f0)(test_count); auto f0Result = to!Duration(r[0] / test_count); f0Result.writeln; } 

Tests conducted on the 2015 MacBook Pro with a 2.9 GHz Intel Core Broadwell i5. In Python, I used the %timeit function in D std.datetime.benchmark to measure speed. Compiled everything using LDC v0.17 with the following keys: ldmd2 -release -inline -boundscheck=off -O . Or if you are using a dub, then the equivalent of these keys will be the dub --build=release-nobounds --compiler=ldmd2 options dub --build=release-nobounds --compiler=ldmd2 .

Here are the results of the first test:

 Python: 145 µs LDC: 5 µs D is 29x faster 

Here is the test of the revised version:

 Python: 1.43 msec LDC: 628 ÎĽs D is 2.27x faster 

Agree not very bad difference considering the fact that Numpy is written in C, and in D everyone scolds the garbage collector.

How can D avoid Numpy problems?


Yes Numpy is fast, but it is fast only in comparison with simple Python masises. But the problem is that it is only partially compatible with these simple arrays.

The Numpy library is somewhere to the side of the rest of Python. She lives her life. It uses its own functions, it works with its types. For example, if we need to use an array created in Python in NumPy, we will need to use np.asarray which will copy it into a new variable. A quick search on github showed that thousands of projects use this crutch. While data could simply be transferred from one function to another without these blank copies.

 import numpy as np a = [[0.2,0.5,0.3], [0.2,0.5,0.3]] p = np.asarray(a) y = np.asarray([0,1]) 

Try to solve this problem by rewriting part of the standard Python library to use Numpy types. However, this is still not a full-fledged solution, which leads to remarkable jokes when writing:

 sum(a) 

instead:

 a.sum() 

we get a 10x drop in speed.

D has no such problems by design. This is a compiled, statically typed language. During code generation, all types of variables are known. In std.ndslice itself, you get full access to all the functions of the Phobos library, for example, to such wonderful things as std.algorithm and std.range. Oh yeah, at the same time you can use patterns D that allow you to generate code right at the compilation stage.

Here is an example:

 import std.range : iota; import std.algorithm.iteration : sum; import std.experimental.ndslice; void main() { auto slice = 1000.iota.sliced(5, 5, 40); auto result = slice // sum expects an input range of numerical values, so to get one // we call std.experimental.ndslice.byElement to get the unwound // range .byElement .sum; } 

You take and just use the sum function and it just takes and works, just like any other function from the base library.

In Python itself, in order to get a list of a specified length initialized with a specific value, we need to write:

 a = [0] * 1000 

Numpy will be completely different:

 a = numpy.zeros((1000)) 

And if you do not use Numpy, then your code will be 4 times slower, not counting the extra copy operation that eats memory. In D, range comes in handy, which allows us to do the same operation quickly and without empty copy operations:

 auto a = repeat(0, 1000).array; 

And if needed, we can immediately call ndslice:

 auto a = repeat(0, 1000).array.sliced(5, 5, 40); 

The main advantage of Numpy at present is its prevalence. Now it is used really very widely from banking systems to machine learning. There are a lot of books, examples and articles on it. However, the mathematical possibilities in D will obviously be expanded very soon. So the author of ndslice stated that he is now working on BLAS (Basic Linear Algebra Subprograms) for Phobos, which will also be fully integrated with ndslice and with the standard library.

A powerful mathematical subsystem will allow you to very effectively solve a number of tasks where you need to work with big data. For example, a computer vision system. The prototype of one of these systems is already being developed and is called DCV .

Image processing on D


The following example will show how the median filter works allowing you to eliminate noise in the image. The movingWindowByChannel function can also be used in other filters where a sliding window is required. movingWindowByChannel allows movingWindowByChannel to move around the image using a sliding window. Each such window is passed to a filter that calculates pixel values ​​based on the selected zone.

This function does not handle zones with partial overlap. However, it can be used to calculate the values ​​in them too. To do this, create an enlarged image with the edges reflecting the borders of the original image and then process it.

 /** Params: filter = unary function. Dimension window 2D is the argument. image = image dimensions `(h, w, c)`, where  is the number of channels in the image nr = number of rows in the window n = number of columns in the window Returns: image dimensions `(h - nr + 1, w - nc + 1, c)`, where  is the number of channels in the image. Dense data layout is guaranteed. */ Slice!(3, C*) movingWindowByChannel(alias filter, C) (Slice!(3, C*) image, size_t nr, size_t nc) { // local imports in D work much like Python's local imports, // meaning if your code never runs this function, these will // never be imported because this function wasn't compiled import std.algorithm.iteration: map; import std.array: array; // 0. 3D // The last dimension represents the color channel. auto wnds = image // 1. 2D composed of 1D // Packs the last dimension. .pack!1 // 2. 2D composed of 2D composed of 1D // Splits image into overlapping windows. .windows(nr, nc) // 3. 5D // Unpacks the windows. .unpack // 4. 5D // Brings the color channel dimension to the third position. .transposed!(0, 1, 4) // 5. 3D Composed of 2D // Packs the last two dimensions. .pack!2; return wnds // 6. Range composed of 2D // Gathers all windows in the range. .byElement // 7. Range composed of pixels // 2D to pixel lazy conversion. .map!filter // 8. `C[]` // The only memory allocation in this function. .array // 9. 3D // Returns slice with corresponding shape. .sliced(wnds.shape); } 

The following function is an example of how to calculate the median of an object. The function is greatly simplified in order to increase readability.

 /** Params: r = input range buf = buffer with length no less than the number of elements in `r` Returns: median value over the range `r` */ T median(Range, T)(Range r, T[] buf) { import std.algorithm.sorting: sort; size_t n; foreach (e; r) { buf[n++] = e; } buf[0 .. n].sort(); immutable m = n >> 1; return n & 1 ? buf[m] : cast(T)((buf[m - 1] + buf[m]) / 2); } 

And now Main itself:

 void main(string[] args) { import std.conv: to; import std.getopt: getopt, defaultGetoptPrinter; import std.path: stripExtension; // In D, getopt is part of the standard library uint nr, nc, def = 3; auto helpInformation = args.getopt( "nr", "number of rows in window, default value is " ~ def.to!string, &nr, "nc", "number of columns in window, default value is equal to nr", &nc); if (helpInformation.helpWanted) { defaultGetoptPrinter( "Usage: median-filter [<options...>] [<file_names...>]\noptions:", helpInformation.options); return; } if (!nr) nr = def; if (!nc) nc = nr; auto buf = new ubyte[nr * nc]; foreach (name; args[1 .. $]) { import imageformats; // can be found at code.dlang.org IFImage image = read_image(name); auto ret = image.pixels .sliced(cast(size_t)image.h, cast(size_t)image.w, cast(size_t)image.c) .movingWindowByChannel !(window => median(window.byElement, buf)) (nr, nc); write_image( name.stripExtension ~ "_filtered.png", ret.length!1, ret.length!0, (&ret[0, 0, 0])[0 .. ret.elementsCount]); } } 

If not all the examples seemed clear to you, I recommend that you read the free version of the wonderful book Programming in D.

PS If you know how to translate this publication in the status of "translations", then write in private.

Source: https://habr.com/ru/post/277715/


All Articles