Another sorting by distribution

When it comes to efficient sorting algorithms, the erudite habrauser will immediately recall the unfading “ quick sort ”, the new-fashioned “ Tim sort ”, the legendary “ merge sort ” and even the tricky “ introspective sort ”.

Without questioning the effectiveness of the above methods, I suggest you sorting, which, under certain input conditions, easily does any other algorithm in terms of speed.

We are talking about FlashSort , which handles very large arrays with evenly distributed data.

The method in 1998 was presented by the German scientist Karl-Dietrich Neubert . His scientific activity is devoted not so much to the theory of algorithms as to solid state physics, biophysical systems, and elementary particle physics. Analyzing the results of experiments, Neubert came to the conclusion that it is logical to sort large arrays of statistical data by resorting to the theory of probability.
')

The essence

The principle of operation is easy to explain with a specific example.

Suppose there is a large array of n elements whose values range from 1 to 100 . If we meet an element with a value of 50 , then it is reasonable to assume that its rightful place is in the middle of the array. The situation is similar with other elements: 5 , probably, should be closer to the beginning of the structure, 95 it is appropriate to push almost to the end. As a result of such careless manipulations, we quickly obtain an almost sorted array .

Having scrambled up the elements, it remains to apply some method that quickly complements the unordered ( sorting inserts fit perfectly).

Matan

The main task is reduced to the fact that all the elements of the array, depending on their values, need to be distributed into several classes. The smallest numbers are in the first class, the largest - in the last, the remaining numbers - in intermediate groups.

If we take m for the number of classes and also know the minimum A _min and maximum A _max elements in the array, then the class number for the element A _{i is} calculated using the following formula:

From the point of view of probability theory, class K is nothing more than a quantile indicating the place of the element A _i in our distribution model.

Square brackets in the expression mean the integer part. Classes will thus be renumbered from 1 to m . In the additional array Q [ 1..m ], the position Q [ K ] contains the number of elements from the main array belonging to the corresponding class.

Then, using an additional array, we redistribute the elements of the main array, inserting them into almost our places. Algorithmic nuances - in the code of the java-program below.

Visualization

Duration more than 2 minutes. It will be clear why the sorting is dubbed "flickering."

YouTube version

Time effectiveness

As already mentioned, the algorithm is very effective on large arrays with uniformly distributed data. In this case, FlashSort with an average time complexity O (n) is significantly ahead of both QuickSort and other efficient sorting methods.

In other situations, things are not so brilliant. On short arrays, on long arrays with data that are not evenly distributed in size, the “flashing sorting” shows at least acceptable, but much more modest results.

It also does not make sense to apply the main algorithm to non-fully ordered arrays, it is easier to go straight to the sorting by inserts . An almost sorted array for FlashSort is a degenerate case, and in particularly unsuccessful situations, the worst time complexity is O (n ² ) .

Memory efficiency

The method is not very memory demanding, although it requires resources for a new array, the number of elements in which is the number of classes. The question is rather actual: what m to take? The more classes - the better the distribution will be, but the higher the price in the form of additional memory. In most cases, the acceptable ratio "price / quality" gives the formula

m ≈ 0.42 n ,

derived by Neubert empirically.

If the range of possible values of the elements of the main array is small enough, then as m it is possible (and even desirable) to take the whole distance between the minimum and maximum elements. In such cases, if the condition “1 value = 1 class” is observed, the speed of sorting over time achieves the best result O (n) at insignificant expenses for additional memory.

Implementation

FlashSort in Java.

import java.util.Arrays; public class FlashSortAlgorithm extends SortAlgorithm { void sort(int[] a) throws Exception { FlashSort(a); // FlashSort InsertionSort(a); //   } //FlashSort,    //   private void FlashSort(int [] a) throws Exception { int n = a.length; //  int m = n * 0.42; //  int [] l = new int[m]; //  int i = 0, j = 0, k = 0; //   int anmin = a[0]; //  int nmax = 0; //   //     for (i = 1; i < n; i++) { if (a[i] < anmin) anmin = a[i]; if (a[i] > a[nmax]) nmax = i; } // = ?   //   . // ,  ! if (anmin == a[nmax]) return; //   double c1 = ((double) m - 1) / (a[nmax] - anmin); //   //    - //     for (i = 0; i < n; i++) { k = (int) (c1 * (a[i] - anmin)); l[k]++; } //     //(  2-)   . //      //        //     for (k = 1; k < m; k++){ l[k] += l[k - 1]; } //        //         //       int hold = a[nmax]; a[nmax] = a[0]; a[0] = hold; //  // ,  //     int nmove = 0; // ,      //      ""  int flash; //   //  ,    //  j = 0; //    //      1..m-1 k = m - 1; while (nmove < n - 1) { while (j > (l[k] - 1)) { j++; k = (int) (c1 * (a[j] - anmin)); } flash = a[j]; while (!(j == l[k])) { k = (int) (c1 * (flash - anmin)); hold = a[l[k] - 1]; a[l[k] - 1] = flash; flash = hold; l[k]--; nmove++; } } } //    //     FlashSort private void InsertionSort(int [] a) throws Exception { int i, j, hold; for (i=a.length-3; i>=0; i--) { if (a[i+1] < a[i]) { hold = a[i]; j=i; while (a[j+1] < hold) { a[j] = a[j+1]; j++; } a[j] = hold; } } } }

The code is taken from here , my comments.
A collection of implementations on various YPs (Ada, C, Basic, Fort, Fortran, Java, Pascal) can be found on the Professor’s website .

Algorithm Characteristics

Title	FlashSort (Flashing sort; Flashing sort; Flashing sort)
Author	Karl-Dietrich Neubert
Year of publication	1998
Class	Sort by distribution
Resilience	Unstable
Comparisons	No comparisons
Time complexity	the worst	O (n ² )
	average	O (n + m)
	the best	O (n)
Memory difficulty	Total	O (n + m)
Memory difficulty	additional data	O (m)

Links

FlashSort in English Wikipedia
Karl-Dietrich Neubert, personal site.
FlashSort on Dr. Dobb's journal
Java visualization

Source: https://habr.com/ru/post/195968/

All Articles