
Sorting: determining the best algorithm

As you might guess, I know the answer to this question, but since my article describing the funnel sorting algorithm was received here, to put it mildly, nervously, I decided to run the same kind of testing myself, based, of course, on the articles already published here on Habr.

The Internet has no difficult arrays for sorting algorithms (at least I could not find any), and the numerous "comparative analyses" of algorithms on arrays of a few hundred or a few thousand elements are simply ridiculous, so I decided to run the funnel on the arrays used in studies with at least 10^5 elements.

First, a brief overview of what people write about sorting algorithms, with my comments:

At interviews they often ask which sorting algorithm is the fastest. A trick question. In response you should ask: "For what case is the time-optimal sorting being chosen?" Only once the conditions have been stated can you safely go through the available options.
My answer is obvious: the "funnel" algorithm is ideal for any "conditions", and there is no need to state them at all.

Think about how these and other sorting algorithms will behave with input data (sometimes degenerate) such as:
1. already sorted array;
2. an array sorted in reverse order;
3. partially ordered array;
4. array with repeating elements;
5. an array in which all values are the same.
The first, second and fifth cases obviously have linear complexity; the other two depend on how "partially ordered" the array is and on how many "repeating elements" there are. The more of them, the faster the sorting goes! In any case, the "funnel" has no "difficult" input data!
The main goal of designing new algorithms can be defined as follows: reducing the total array sorting time.
Exactly! And not minimizing that unfortunate "number of comparisons"!

Different arrays of four-byte numbers are used as test data:
- pseudo-random unordered numbers;
- the numbers are arranged in ascending order;
- the numbers are arranged in non-descending order.
The last two options are obviously linear. Sorting ints is, of course, much simpler than sorting strings. Well, so be it: if the test arrays turn out to be arrays of ints, I will slightly modify the reading, writing and comparison of elements.

As practice shows, the relationship between the number of operations and the program execution time is not always linear. As shown below, a similar situation arises when estimating the time needed to sort large amounts of data. Explaining the reasons for this phenomenon is beyond the scope of the issues under consideration, but its existence is an established fact that should be taken into account when constructing efficient sequential and parallel algorithms.
An axiom! Exactly what I wrote in my note and in the numerous comments to it.

Using the standard "quick sort" procedure qsort is impractical for the following reasons:
- in the worst case the algorithm requires about O(n^2) operations, which makes it impossible to sort arbitrary arrays of any significant size in reasonable time;
- the sorting time with the qsort procedure is longer than the running time of the other algorithms considered, even in the average case. This is probably the price of the versatility of the interface (the qsort procedure can handle data of any type) and is caused by inefficient access to the elements of the array being sorted.
Is that so? As I have already written, the first point alone is enough for me personally.
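Just to make the second point tangible, here is a minimal sketch (my own illustration, not code from the quoted material) of sorting 4-byte numbers with the standard qsort: every comparison goes through void pointers and a call through a function pointer, which is exactly the kind of indirect access to elements being blamed here.

```c
#include <stdio.h>
#include <stdlib.h>

/* Comparator for qsort: it receives void pointers and must cast them.
   A function call per comparison is the price of qsort's generic interface. */
static int cmp_u32(const void *a, const void *b)
{
    unsigned int x = *(const unsigned int *)a;
    unsigned int y = *(const unsigned int *)b;
    return (x > y) - (x < y);   /* avoids the overflow that x - y could cause */
}

int main(void)
{
    unsigned int arr[] = { 42u, 7u, 1000000u, 0u, 7u };
    size_t n = sizeof arr / sizeof arr[0];

    qsort(arr, n, sizeof arr[0], cmp_u32);   /* O(n log n) on average, ~O(n^2) worst case */

    for (size_t i = 0; i < n; i++)
        printf("%u ", arr[i]);
    putchar('\n');
    return 0;
}
```

A sort specialised for a fixed 4-byte element type can inline the comparison and skip the callback entirely, which is where a large part of the speed difference mentioned in the quote comes from.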

Direct use of recursion in a merge sort algorithm involves significant overhead, caused mainly by the need to use additional arrays when merging the ordered fragments.
So remove the recursion! As I have already written, normal people simply start merging the arrays from single elements! Then there is no need "to copy the result at each step, which significantly slows down the processing".
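For clarity, a minimal sketch of what "start merging from single elements" means: an ordinary bottom-up, non-recursive merge sort that ping-pongs between the input array and a single auxiliary buffer, so nothing has to be copied back after every pass. This is only my illustration of the textbook idea, not the funnel algorithm and not code from the quoted article.

```c
#include <stdlib.h>
#include <string.h>

/* Bottom-up merge sort: no recursion. Runs of width 1, 2, 4, ... are merged
   while the roles of the array and one auxiliary buffer alternate, so there
   is no copy-back after every pass. */
static void merge_sort_bottom_up(unsigned int *a, size_t n)
{
    unsigned int *buf = malloc(n * sizeof *buf);
    if (!buf) return;                        /* allocation failed: give up */

    unsigned int *src = a, *dst = buf;

    for (size_t width = 1; width < n; width *= 2) {
        for (size_t lo = 0; lo < n; lo += 2 * width) {
            size_t mid = lo + width     < n ? lo + width     : n;
            size_t hi  = lo + 2 * width < n ? lo + 2 * width : n;
            size_t i = lo, j = mid, k = lo;

            /* merge src[lo..mid) and src[mid..hi) into dst[lo..hi) */
            while (i < mid && j < hi)
                dst[k++] = (src[i] <= src[j]) ? src[i++] : src[j++];
            while (i < mid) dst[k++] = src[i++];
            while (j < hi)  dst[k++] = src[j++];
        }
        unsigned int *tmp = src; src = dst; dst = tmp;   /* swap roles */
    }

    if (src != a)                            /* result ended up in the buffer */
        memcpy(a, src, n * sizeof *a);
    free(buf);
}
```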

With four processors, even with access to shared memory (in which case the cost of transferring arrays from processor to processor can be considered zero), the speedup will not exceed 3.5. With 32 processors on shared memory, the speedup will not exceed 10.
Well, don't bother! One processor is quite enough! The algorithm is fast enough; it is not the traveling salesman problem, after all! All the more so since the "final merge of the two halves of the entire array is performed on a single processor" anyway.

An array of 100 million four-byte numbers was sorted on 640 processors in 1.21 seconds. A single processor of this system needs 83.51 seconds to sort the same array.
As I said: don't bother! Better to give the processors some other useful work!

Now let us look at two articles from Habr.

First article: habr.com/ru/post/133996
Of course, I am not going to compare anything with anything; I will simply run the same arrays through the funnel, if that turns out to be possible.

Each algorithm was run on the same data three times and the best result was taken (so that short-term system load would not affect our honest experiment). Hardware: HP dv3510er laptop, Intel Core 2 Duo 2 GHz, 2 GB RAM. OS: Ubuntu 11.10
Well, three times it is. My hardware is about the same (a bit better): a 2.8 GHz dual-core, 2 GB RAM. OS: WinXP.

Experiment 1: Work on ordinary data
10 sets of 10^8 random natural numbers were generated; the average running time was measured for each algorithm.
Well, 10 sets it is... now, what do we have here... rand() from WATCOM generates ints in the 32K range? That will hardly be enough. All right, we will write a rand() value twice, into the low and the high half of the int (see the sketch below). And while we are at it, let's also merge all 10 arrays into one, so we get more data: an array of 10^9 elements.
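A sketch of that workaround as I intend it (assuming WATCOM's rand() returns 15-bit values, i.e. RAND_MAX = 32767; the helper name is mine):

```c
#include <stdlib.h>
#include <stdint.h>

/* rand() in the 0..32767 range is far too narrow for 10^8..10^9 elements.
   Workaround: call rand() twice and pack one result into the low half and
   one into the high half of a 32-bit value. That gives roughly 30 random
   bits per element (bits 15 and 31 stay zero), which is enough here. */
static uint32_t rand_u32(void)
{
    uint32_t lo = (uint32_t)rand() & 0xFFFFu;   /* low 16 bits  */
    uint32_t hi = (uint32_t)rand() & 0xFFFFu;   /* high 16 bits */
    return (hi << 16) | lo;
}
```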

Experiment 2: Independent sorted groups of equal size
Given 10^8 different natural numbers, sorted in ascending order. We group them by k consecutive elements and rearrange all the groups in reverse order.
Lord, the things they come up with! :) Well, for k = 10^8, as I understand it, this gives the "sorted array"; for k = 10^0 the same array sorted in reverse order; and the rest accordingly... fine, consider it generated. We will also merge these arrays into one; it will have 100 million fewer elements than in the first "combined" test.
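For the record, a sketch of this construction as I read it, using the values 1..n as the "10^8 different natural numbers" (the function name and the choice of values are mine):

```c
#include <stdint.h>
#include <stddef.h>

/* Experiment 2 data, as I understand the description: take the sorted
   sequence 1..n, split it into groups of k consecutive elements and emit
   the groups in reverse order (the last, possibly shorter, group comes
   first). k = n gives the sorted array, k = 1 the reverse-sorted one.
   Requires k >= 1; out must hold n elements. */
static void fill_reversed_groups(uint32_t *out, size_t n, size_t k)
{
    size_t pos = 0;
    size_t ngroups = (n + k - 1) / k;            /* ceil(n / k) */

    for (size_t g = ngroups; g-- > 0; ) {        /* last group first */
        size_t start = g * k;
        size_t end = (start + k < n) ? start + k : n;
        for (size_t i = start; i < end; i++)
            out[pos++] = (uint32_t)(i + 1);      /* values 1, 2, ..., n */
    }
}
```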

Experiment 3: Sorted groups of different sizes, limited range of values
Given 10^8 integers from a limited range (in our case [0; 100000]), ordered in groups of arbitrary size from 1 to k.
And how is one supposed to understand that? I could not work out the generation algorithm at all! Okay, we skip this test.

Experiment 4: Arbitrary Permutations
We will generate partially ordered data in another way: take 10^8 arbitrary natural numbers, sort them, and then make k arbitrary swaps of pairs of elements.
We skip this one too; it is not really a test at all: at the start it is simply a sorted array with a small amount of distortion, and at the end just a random array. And indeed: all the algorithms line up in the same order as in the first test, and even their times almost coincide!
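For completeness, what this test data amounts to, as a small sketch (my own reading of the description; rand() is used naively, ignoring its limited range and modulo bias):

```c
#include <stdlib.h>
#include <stdint.h>
#include <stddef.h>

/* Experiment 4 data as described: a sorted array of n elements spoiled by
   k random swaps of element pairs. With small k the array stays almost
   sorted; with very large k it degenerates into an essentially random
   permutation, which is why the test tells us little. The array a is
   assumed to be sorted before the call. */
static void apply_random_swaps(uint32_t *a, size_t n, size_t k)
{
    for (size_t s = 0; s < k; s++) {
        size_t i = (size_t)rand() % n;   /* two random positions */
        size_t j = (size_t)rand() % n;
        uint32_t tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```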

The second article: habr.com/ru/post/335920
Algorithm descriptions, descriptions... descriptions... descriptions... and where are the tests? Ah, here they are: Hardware and system: Processor: Intel Core i7-3770 CPU 3.40 GHz, RAM: 8 GB
This hardware is a bit better than mine, but that does not matter.

All tests are divided into four groups.
- The first group is an array of random numbers taken modulo different values (10, 1000, 10^5, 10^7 and 10^9).
- The second group is an array divided into several sorted subarrays. In fact, an array of random numbers modulo 10^9 was taken, and then subarrays were sorted whose size equals the minimum of the length of the remaining suffix and a random number modulo some constant. The sequence of constants is 10, 100, 1000 and so on, right up to the size of the array.
- The third group is an initially sorted array of random numbers with a certain number of "swaps", i.e. exchanges of two random elements. The sequence of swap counts is the same as in the previous group.
- Finally, the last group consists of several tests with a completely sorted array (in direct and in reverse order), several tests with an initial array of the natural numbers from 1 to n in which a few numbers are replaced by random ones, and tests with a large number of repetitions of one element (10%, 25%, 50%, 75% and 90%).
And here there is essentially nothing to test: it is either what we have already run, or, even more favourable for the funnel, "a large number of repetitions" of one element. What would be the point of running that "over 20 runs"? So instead I give the results of the tests from the first article:

Tests of the first group:
0: N = 2602009514 T = 195 sec
1: N = 2602148436 T = 193 sec
2: N = 2601943056 T = 191 sec
3: N = 2602055129 T = 194 sec
4: N = 2601888286 T = 195 sec
5: N = 2602105643 T = 191 sec
6: N = 2601928733 T = 191 sec
7: N = 2601947256 T = 196 sec
8: N = 2601891575 T = 193 sec
9: N = 2602105643 T = 196 sec
0-9: N = 29495155841 T = 3210 sec (8 tmp-files)
Somehow that is quite a lot, both in time and in the number of comparisons.

Tests of the second group:
0: N = 199999998 T = 18 sec
1: N = 1923669691 T = 57 sec
2: N = 1683854900 T = 46 sec
3: N = 1480233649 T = 42 sec
4: N = 1249780062 T = 39 sec
5: N = 996602000 T = 33 sec
6: N = 666000198 T = 26 sec
7: N = 340000016 T = 21 sec
8: N = 99999999 T = 16 sec
0-8: N = 11462881951 T = 968 sec (7 tmp-files)

In all cases, the time is "dirty", i.e. it includes the time to load the original array into RAM, the sorting itself, and the time to write the results to a file (for large arrays, also the time to merge the temporary files). The results speak for themselves (I wrote this note while the "billion-element" array was being sorted). The sorting program is almost the same as the one for sorting strings: elements are read and written byte by byte (only the "string" length is always 4 bytes), and the element comparison procedure compares ints (as UI32) instead of strings. So throw all the other algorithms in the trash, gentlemen: the funnel sorting algorithm is IDEAL! :)
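To illustrate the only difference mentioned here, a sketch of an element comparison that treats the 4-byte record as an unsigned 32-bit integer rather than a string. Everything in it is my illustration, not the actual code of the funnel program; it assumes the element's 4 bytes were written in the machine's native byte order.

```c
#include <stdint.h>
#include <string.h>

/* Compare two 4-byte elements, stored as raw bytes, as unsigned 32-bit
   integers. The rest of the program can keep moving elements around
   byte by byte exactly as in the string-sorting version. */
static int cmp_elem_u32(const unsigned char *a, const unsigned char *b)
{
    uint32_t x, y;
    memcpy(&x, a, 4);   /* reinterpret the 4 bytes as a UI32 */
    memcpy(&y, b, 4);
    return (x > y) - (x < y);
}
```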

Source: https://habr.com/ru/post/453552/

