Bitwise sorting with a human face

Despite the fame of the bitwise sorting algorithm, it is difficult to find a decent implementation of it in the C ++ language on the Internet (to be honest, I think that in other languages too). Almost everything that is located by search engines is monstrous either in terms of code or in terms of efficiency. And most often bad, and then, and more.

The main mistake is that the authors are trying to bring universality where it is not needed, and do not provide universality where it is really necessary. The result is something that is slow and impossible to use.

Perhaps that is why many people still consider deconstruction an algorithm of purely academic interest and of little use in reality. However, this is a misconception.

Recently, however, began to appear tolerable options. One of them tried to get into Bust, but was not missed, and the second one even got.
')
Moreover, if the first one is simply not very well designed and implemented, then the second one - which got into boost - is not a completely bit-sort sort at all, but a kind of hybrid that on small arrays reduces to calling std :: sort. Accordingly, its interface contains everything else, and a function of comparing elements, which must be passed to std :: sort.

Both of these options, in my opinion, do not reach the true, faithful, Orthodox bitwise sorting, neither in terms of the interface, nor in terms of speed.

Please note that the purpose of the article is neither an analysis of the algorithm nor a detailed description of the implementation details. The algorithm is considered from the point of view of its “user characteristics”: it should work quickly and should be convenient to use.

To begin with, to show that I am not just theoretical, I will give a comparison of my bitwise sorting with std :: sort and boost :: integer_sort.

8 bits, 100-1000 elements

8 bits, 1000-10000 elements

8 bits, 10,000–100,000 items

16 bits, 100-1000 elements

16 bits, 1000—10000 elements

16 bits, 10,000–100,000 items

32 bits, 100-1000 elements

32 bits, 1000-10000 elements

32 bits, 10,000-100,000 items

64 bits, 100-1000 elements

64 bits, 1000-10000 elements

64 bits, 10,000-100,000 items

Measurements were made on a machine with a Core i7 3.40GHz processor. On a laptop with a Core 2 Duo 2.26GHz processor, the advantage of bitwise sorting is even more obvious.

And now we proceed to the design.

First approach

To get a competent interface, it is naturally best to turn to the standard library for enlightenment.

Take the standard algorithm std :: sort. The first option takes only two iterators. Iterators set the range to be sorted, and about the elements of the range it is known that they have an order relationship defined - the “less” operator. This is enough to sort.

template <typename RandomAccessIterator> void sort (RandomAccessIterator first, RandomAccessIterator last);

But what if the “less” operator is not defined for the elements or does it need to compare the elements somehow differently? Then the user must set the order relation himself and pass it as the third argument.

 template <typename RandomAccessIterator, typename Compare> void sort (RandomAccessIterator first, RandomAccessIterator last, Compare comp);

So, what will the bitwise sorting look like if you start from this example? In the first version - about the same. If it is known that the elements of the input range are integers, then we get the same thing as std :: sort:

 template <typename RandomAccessIterator> void radix_sort (RandomAccessIterator first, RandomAccessIterator last);

But what if the input elements are not numbers? It's simple: you need to set the mapping from these elements to numbers. This mapping, as well as the order relation in std :: sort, will be the third argument of the function.

 template <typename RandomAccessIterator, typename Map> void radix_sort (RandomAccessIterator first, RandomAccessIterator last, Map to_integer);

This interface does not provide any possibility to influence the selection of digits from numbers. On the other hand, it is clear that the majority of non-distracting users will have enough of the existing interface, because the user wants to sort the numbers and do it quickly. Implementation details, as a rule, are not very interesting.

This means that all the arguments of the function that will be associated directly with the discharge of digits will be somewhere in the end.

Therefore, in this step, we believe that the discharges are selected somehow automatically, and you can still be distracted by other problems.

Additional memory

The algorithm of bitwise sorting requires additional memory to work:

Array for storing intermediate results of sorting one by one.
Array for counters elements.

The problem of storing intermediate results has two solutions:

Allocate memory for an array inside a function.
Transfer buffer outside.

And if you think a little, the right decision is obvious - the user must transfer the buffer outside. And the buffer will be set as an iterator at the beginning of a piece of memory, the size of which is not less than the size of the sorted range.

It is more flexible.

The user is not required to allocate a piece of memory, strictly equal to the size of the sorted range. If he wants, he can select one large piece and use it to sort the differently sized ranges.
It is more efficient.

If sorting is performed repeatedly, then it is enough to allocate memory once for the buffer, and then transfer it to sorting. And the sorting itself will not waste precious time on allocation.

Given the chosen solution, the updated interface will be as follows:

 template <typename RandomAccessIterator1, typename RandomAccessIterator2> void radix_sort (RandomAccessIterator1 first, RandomAccessIterator1 last, RandomAccessIterator2 buffer); template <typename RandomAccessIterator1, typename RandomAccessIterator2, typename Map> void radix_sort (RandomAccessIterator1 first, RandomAccessIterator1 last, RandomAccessIterator2 buffer, Map to_integer);

If the user is too lazy to create a buffer before sorting each time, then he can write an elementary wrapper for himself:

 template <typename RandomAccessIterator, typename Map> void im_too_lazy_to_create_a_buffer_for_the_radix_sort (RandomAccessIterator first, RandomAccessIterator last, Map to_integer) { using value_type = std::iterator_traits<RandomAccessIterator>::value_type; std::vector<value_type> buffer(std::distance(first, last)); radix_sort(first, last, buffer.begin(), to_integer); }

With an array of counters, too, in principle, it is clear.

It is clear that the work on calculating the size and number of digits, the allocation of the necessary arrays for the counters and the transfer of these counters to the sorting function is too complicated to pass it to the user. The user wants to sort an array. The rest is not interesting to him.

And here we come to the most important thing - the problem of discharge, and this should be discussed separately.

Separation of numbers from numbers

The favorite way to isolate the digits of most unfortunate developers of bitwise sorting is to transfer the function of an integer base, the remainder of the division by which is considered a single digit. And the most common such basis is 10. This can be understood. But do not forgive.

A small improvement of this method, which some developers guess is to make - they take the power of two as the base, and in the arguments of the function they set this degree, and not the base itself. Accordingly, the discharge is allocated with the help of bit operations, which in itself is already better than simple division, but still bad.

Some, like, for example, the author of the first of the above examples, prompts the user to specify the width and minimum and maximum values of a single digit. Which is also unacceptable, because the user can easily break a function by passing incorrect or inconsistent parameters to it.

So, to choose the right interface, we formulate a few important thoughts.

The user wants to use for sorting all digits of the sorted numbers.

Indeed, the main case is sorting by the first category, then by the second, by the third, and so on. until the last without gaps.

Although omissions of discharges or parts of discharges are the only non-trivial possibility that the user may need from the mechanism for selecting discharges - you can arrange it, and I will tell about it later.
The best bit is one byte.

On all processors, in all operating systems and compilers, which I was able to check, the fastest sorting works if you take one byte in one bit. For the car is the most convenient case.

In fact, for sorting microscopic arrays (10-20 elements) of eight-bit numbers, it would be more profitable to take half a byte as a discharge, but on such sizes the bitwise sorting is in any case inferior (or not particularly winning) to sorting by comparisons, so this case does not interest us.

Based on these statements, we will create a discharge extraction mechanism, which will be the radix mapping from the result of the to_integer function to an integer. Moreover, the type returned by the function radix will be considered as one digit. For example, if radix returns std :: uint8_t, then one byte will be taken as a bit.

Based on this, at the compilation stage, you can get all the necessary information: the size of the discharge and shift to obtain the next digit (in this case - 8), the range of possible values of the discharge (in this case - [0, 256)), the number of digits in the sorted numbers etc.

It will also allow to completely get rid of dynamic memory, and even for counters of discharges to start arrays on the stack.

The final interface will look like this:

 template <typename RandomAccessIterator1, typename RandomAccessIterator2> void radix_sort (RandomAccessIterator1 first, RandomAccessIterator1 last, RandomAccessIterator2 buffer); template <typename RandomAccessIterator1, typename RandomAccessIterator2, typename Map> void radix_sort (RandomAccessIterator1 first, RandomAccessIterator1 last, RandomAccessIterator2 buffer, Map to_integer); template <typename RandomAccessIterator1, typename RandomAccessIterator2, typename Map, typename Radix> void radix_sort (RandomAccessIterator1 first, RandomAccessIterator1 last, RandomAccessIterator2 buffer, Map to_integer, Radix radix);

The last two arguments are optional. Moreover, the latter will be used only in some exceptional cases, and the penultimate - relatively often. Perhaps, approximately the same as the order relation in std :: sort.

By default, the radix function returns the low byte of a number, and the to_integer function is the identical transformation. Thus, if ordinary integers are sorted, you do not need to specify anything further.

About passes

If someone is still interested in how to arrange the skipping of discharges or individual numbers when sorting, I tell. To do this, you either need to throw away unnecessary bits in the radix display, or make a more cunning display to_integer, in which you need to zero all unnecessary digits.

For example, to discard the older half-byte from each digit, the radix mapping must be written like this:

 auto radix = [] (auto x) -> std::uint8_t { return x & 0xf; }; //     .

findings

The advantages of this approach to bitwise sorting are:

High speed default

The user can try different options, for example, take std :: uint16_t as a discharge or even bool, but, rather, everything, the result will only get worse.
Compatibility with the standard library

The algorithm uses one of the basic design patterns of the standard library - an iterator. Besides the fact that the algorithm on iterators is more convenient to use, it is also faster than on containers. This requires a separate explanation, but it does not fit into the scope of this article.

In addition, the converter (to_integer) is also familiar to the SBSH user through algorithms such as std :: transform.
The user is maximally excluded from implementation details, but retains control over what is happening.

The user cannot “spoil” the algorithm, pass incorrect arguments to the function, etc. - in the worst case, its code will not compile. At the same time, he still has full control over the process (with the exception of those very “academic” experiments, such as “sort by base 97”).
All that is possible is done at compile time.

There is no allocation in the function. All additional memory that is used, either comes from the outside, or is allocated on the stack.

In addition, the template metaprogramming allows you to unwind cycles by digits, since the number of digits is also known at the compilation stage.

So bitwise sorting is an algorithm that is quite suitable for practical use in the combat code. Checkmate, "academicians"!

If the internal implementation is interesting, then the code lies here .

Source: https://habr.com/ru/post/271677/

All Articles

Bitwise sorting with a human face

First approach

Additional memory

Separation of numbers from numbers

findings

More articles: