How I switched from Python to Julia (and why)

A little background on Python

Python is a great language. I tried several languages before it: Pascal at school; C, C with classes, C ++ - at the university. The last two (three) have instilled a persistent aversion to programming: instead of solving a problem, you are busy with allocations and destructors (scary words from the past), thinking in terms of low-level primitives. My opinion is that C is not suitable for solving educational and scientific problems (in any case, in the field of mathematics). I am sure that I will be objected, but I'm not trying to impose anything on anyone, just expressing my opinion.

Python has become a revelation in its time. For the first time in my life, I wrote several levels of abstraction higher than is customary in Xi. The distance between the task and the code has decreased as never before.

I would probably have written it all my life in Python if I didn’t have to suddenly implement NIST statistical tests. It would seem that the task is very simple: there is an array of several lengths (> = 10) megabytes, there is a set of tests that need to be applied to this array.

What is good numpy?

In python for working with arrays there is a good numpy package, which is essentially a high-level interface on fast libraries like LAPACK. It would seem that the entire gentlemanly set for scientific computing is available (Sympy, Numpy, Scipy, Matplotlib), what more could you ask for?
')
Everyone who has dealt with numpy knows that he is good, but not in everything. We still have to try to make the operations vectorized (as in R), i.e. in a sense, uniform for the whole array. If suddenly your task for some reason does not fit into this paradigm, then you have problems.

What kind of non-vectorized tasks are I talking about? Yes, at least take the same NIST: calculate the length of the linear shift register according to the Berlekamp-Messi algorithm; calculate the length of the longest subsequence of consecutive units, and so on. I do not exclude the possibility that there is some ingenious solution that will reduce the task to a vectorized one.

Clever?

As an example from the same NIST: it was necessary to count the number of “switching” sequences, where by “switching” I mean changing successively going units (... 1111 ...) to consecutive zeroes (... 000 ... ), and vice versa. To do this, you can take the original sequence without the last element (x [: -1]) and subtract the sequence shifted by 1 (x [2:]) from it, and then calculate the number of nonzero elements in the resulting new sequence. All together will look like:

np.count_nonzero(x[:-1] - x[1:])

It may look like an entertaining warm-up for the mind, but in fact here is something unnatural, some kind of trick that will not be obvious to the reader after a short while. Not to mention the fact that it is still slow - no one has canceled the allocation of memory.

Non-vectorized operations in Python are long. How to deal with them if numpy is not enough? Usually offer several solutions:

Numba jit. If she worked as described on the main page of Numba, then I think it would be worth it to finish the story. Over the prescription I had forgotten a little what had gone wrong with her; the bottom line is that the acceleration was not as impressive as I expected, alas.
Cython. OK, raise your hands to those who believe that cython is a really beautiful, elegant solution that does not destroy the very meaning and spirit of Python? I do not think so; if you’ve gotten to Cython, then you can stop deceiving yourself and move on to something less sophisticated like C ++ and C.
Rewrite the bottlenecks on C and jerk them out of your favorite Python. OK, but what if I have the whole program - this is a solid calculation and bottlenecks? C does not offer! I'm not talking about the fact that in this decision you need to know well not one, but two languages - Python and Xi.

Here comes the JULIA!

Namayavshis with existing solutions and not finding (having failed to program) anything good, I decided to rewrite it on something “faster”. In fact, if you write the “number thrasher” in the 21st century with normal array support, vectorized out-of-box operations, etc. etc., the choice you have is not particularly thick:

Fortran . And do not laugh, which of us did not pull the BLAS / LAPACK of their favorite languages? Fortran is a really good (better SI!) Language for SCIENTIFIC calculations. I did not take it for the reason that since the times of Fortran they have already invented a lot of things and added them to programming languages; I was hoping for something more modern.
R suffers from the same problems as Python (vectorization).
Matlab - probably yes, I unfortunately have no money to check.
Julia is a dark horse, it will take off - it will not take off (and it was natural until stable-version 1.0)

Some obvious advantages of Julia

It looks like Python, in any case, the same "high-level", with the possibility, if necessary, to go down to the bits in the numbers.
There is no messing around with memory allocations and things like that.
Powerful type system. Types are prescribed optionally and very dosage. You can write a program without specifying a single type - and if you do it STRONGLY, then the program will be fast. But there are nuances.
It's easy to write custom types that are as fast as built-in types.
Multiple dispatch. You can talk about it for hours, but in my opinion, this is the best thing about Julia, it is the basis for the design of all programs and in general the basis of the philosophy of language.
Due to multiple dispatch, many things are written very uniformly.
Examples with multiple dispatch
```
 rand() #       0  1 rand(10) #   10     0  1 rand(3,5) #   3  5   .... using Distributions d = Normal() #     0, 1 rand(d) #     rand(d, 10) #   10 ...    
```
That is, the first argument can be any (one-dimensional) distribution from Distributions, but the function call will remain the same. And not only the distribution (you can transfer the RNG itself as a MersenneTwister object and much more). Another (in my opinion indicative) example is navigation in DataFrames without loc / iloc.
6. Arrays are native, embedded. Vectorization out of the box.

Cons, silent about which would be a crime!

New language. On the one hand, of course, a minus. In some ways, perhaps immature. On the other hand, he took into account the rakes of many past languages, stands "on the shoulders of giants," absorbed many interesting and new things.
Immediately fast programs are unlikely to write. There are some non-obvious things that are very easy to deal with: you step on a rake, ask the community for help, step in again ... and so on. Basically it is type instability, type instability and global variables. In general, as far as I can judge by myself, a programmer who wants to write quickly on Julia goes through several stages: a) he writes like in Python. This is great and possible, but sometimes there will be performance problems. b) writes as in C: prescribing manic types wherever possible. c) gradually understands where it is necessary (very metered) to prescribe types, and where they interfere.
Ecosystem. Some bags are raw, in the sense that something is constantly falling off somewhere; some are mature but non-compatible with each other (conversion to other types is necessary, for example). On the one hand, this is obviously bad; on the other hand, there are many interesting and bold ideas in Julia just because “we are standing on the shoulders of giants” - a tremendous experience has been gained in attacking the rake and “how not to do it”, and this is taken into account (partially) by the developers of the packages.
The numbering of the arrays from 1. Ha, kidding, this is of course a plus! No, well, seriously, what's the matter, from which index does the numbering begin? You get used to it in 5 minutes. Nobody complains that integers are called integer, not whole. These are all matters of taste. And yes, take at least the same Cormen on the algorithms - there everything is numbered from one, and sometimes it is inconvenient to rework in Python.

Well, what about the speed of something?

It is time to remember for the sake of which everything was started.

Note: I safely forgot Python, so write your improvements in the comments, I will try to run them on my laptop and see the execution time.

So, two small examples with microbench marks.

Something vectorized

The task: the input of the function is the vector X from 0 and 1. It is necessary to convert it into a vector from 1 and (-1) (1 -> 1, 0 -> -1) and calculate how many coefficients from the Fourier transform of this vector by the absolute value exceeds the boundary.

 def process_fft(x, boundary): return np.count_nonzero(np.abs(np.fft.fft(2*x-1)) > boundary) %timeit process_fft(x, 2000) 84.1 ms ± 335 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

 function process_fft(x, boundary) count(t -> t > boundary, abs.(fft(2*x.-1))) end @benchmark process_fft(x, 2000) BenchmarkTools.Trial: memory estimate: 106.81 MiB allocs estimate: 61 -------------- minimum time: 80.233 ms (4.75% GC) median time: 80.746 ms (4.70% GC) mean time: 85.000 ms (8.67% GC) maximum time: 205.831 ms (52.27% GC) -------------- samples: 59 evals/sample: 1

We won’t see anything surprising here - all the same they don’t consider themselves, but transfer to well-optimized Fortran programs.

Something non-vectorized

The input is an array of 0 and 1. Find the length of the longest subsequence of consecutive units.

 def longest(x): maxLen = 0 currLen = 0 # This will count the number of ones in the block for bit in x: if bit == 1: currLen += 1 maxLen = max(maxLen, currLen) else: maxLen = max(maxLen, currLen) currLen = 0 return max(maxLen, currLen) %timeit longest(x) 599 ms ± 639 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

 function longest(x) maxLen = 0 currLen = 0 # This will count the number of ones in the block for bit in x if bit == 1 currLen += 1 maxLen = max(maxLen, currLen) else maxLen = max(maxLen, currLen) currLen = 0 end end return max(maxLen, currLen) end @benchmark longest(x) BenchmarkTools.Trial: memory estimate: 0 bytes allocs estimate: 0 -------------- minimum time: 9.094 ms (0.00% GC) median time: 9.248 ms (0.00% GC) mean time: 9.319 ms (0.00% GC) maximum time: 12.177 ms (0.00% GC) -------------- samples: 536 evals/sample: 1

The difference is obvious "naked" view. Tips for doping and / or vectorizing numpy versions are welcome. I also want to note that the programs are almost identical. For example, I did not register any type in Julia (compare with the previous one) - all the same, everything works quickly.

I would also note that the versions presented were not included in the final program, but were further optimized; here they are given as an example and without unnecessary complications (forwarding extra memory in Julia to do rfft in-place, etc.).

What happened in the end?

The final code for Python and for Julia I will not show: this is a mystery (at least for now). At the time when I quit finishing the python version, she ran all the NIST tests on an array of 16 megabytes of binary characters in ~ 50 seconds. At Julia, at the moment, all tests are run on the same volume of ~ 20 seconds. It may well be that I am a fool, and there was space for optimizations, etc. etc. But this is me, what I am, with all its advantages and disadvantages, and in my opinion, it is necessary to consider not the spherical speed of programs in vacuum in programming languages, but how I can program it (without really very gross errors).

Why did I write all this?

People here became interested; I decided to write when the time comes. In the future, I think to go through Julia more thoroughly, with an example of solving some typical problems. What for? More languages, good and different, and let everyone who wants can find something that suits him personally.

Source: https://habr.com/ru/post/427879/

All Articles