
Hash table sorting

I would like to bring to your attention a new (as far as I know) sorting algorithm. I tried to find something similar but saw no analogues. Let me know if you have seen anything like it.

The essence of the sort is that the hash function and the collision resolution are constructed so that the data lands in the hash table already in sorted order. All that remains is to walk through the hash table and collect the non-empty, already sorted entries.



If you're interested, the details are below the cut.





So, the algorithm works as follows:

  1. On the first pass, we find the minimum and maximum values of the source data (A_min, A_max), needed to compute the coefficient of the hash function that projects values onto the range of hash table indexes.
  2. On the second pass, we insert the source values into the hash table, computing each index with the hash function index = (int)((A_i - A_min) * hashKoef), where hashKoef = 0.9 * (k - 1) * n / (A_max - A_min), n is the number of values being sorted, and the table is k times larger than the input; the factor 0.9 * (k - 1) leaves free cells at the right edge of the table.
  3. On the third pass, we walk through the hash table and copy the sorted data back into the source array.


To resolve collisions, occupied cells are skipped while the value stored there is less than or equal to the value being inserted, and part of the table is shifted to the right (up to the first free cell) when the value must be placed where a greater value already sits.


The temporary array for the hash table is made 2-3 times larger than the source array. This headroom makes it possible to simplify collision resolution and use only the shift to the right. The sketch below shows all three passes together with this collision handling.
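To make this concrete, here is a minimal, self-contained sketch of the three passes in Java. It follows the logic of the full listing at the end; the method name hashSort and the all-equal early return are my additions, and, like the original, it reserves zero as the "empty cell" marker and omits bounds checks at the right edge of the table:

import java.util.Arrays;

public class HashSortSketch {
    // Minimal sketch of the three passes. Zero marks an empty cell, so the
    // input values must be non-zero, as in the full listing at the end.
    static int[] hashSort(int[] a, int k) {
        // Pass 1: find the minimum and maximum of the source data.
        int min = a[0], max = a[0];
        for (int v : a) { if (v < min) min = v; if (v > max) max = v; }
        if (min == max) return a.clone();            // all values equal

        int[] tmp = new int[a.length * k];           // table k times larger
        // Projection coefficient; the factor 0.9 * (k - 1) leaves free
        // cells at the right edge of the table for shifted collisions.
        double coef = 0.9 * (k - 1) * (double) a.length / ((double) max - min);

        // Pass 2: insert every value at its hashed index.
        for (int v : a) {
            int i = (int) ((v - (double) min) * coef);
            while (tmp[i] != 0 && tmp[i] <= v) i++;  // skip smaller-or-equal
            if (tmp[i] != 0) {                       // cell holds a greater
                int end = i;                         // value: shift the run
                while (tmp[end] != 0) end++;         // right, up to the
                System.arraycopy(tmp, i, tmp, i + 1, end - i); // first free cell
            }
            tmp[i] = v;
        }

        // Pass 3: collect the non-empty cells; they are already sorted.
        int[] out = new int[a.length];
        int j = 0;
        for (int v : tmp) if (v != 0) out[j++] = v;
        return out;
    }

    public static void main(String[] args) {
        int[] data = {5, 3, 9, 1, 7, 3};
        System.out.println(Arrays.toString(hashSort(data, 3)));  // [1, 3, 3, 5, 7, 9]
    }
}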



The algorithm belongs to the class of stable sorts and combines comparison with address calculation. (Stability follows from the <= in the skip step: an equal element inserted later always lands after the ones inserted earlier.)



Computational complexity ranges from O(n log n) (judging by comparison with the quicksort built into Java) to O(n * n) in the worst case, when many values hash to the same index and every insertion has to scan and shift the whole occupied run. Those who know the theory can derive this formally; I have forgotten most of it myself. I look forward to your thoughts in the comments.



With a uniform distribution, I found that the algorithm runs 20-25% faster than quicksort!



Memory cost: O(n).



Comparing with quicksort, I found that when the source data contains only a small number of distinct values, the algorithm degrades badly, since a large number of collisions has to be resolved.

However, by combining it with merge sort (hash-sorting blocks of 500 values and then merging them), I achieved performance comparable to quicksort and better than pure merge sort; a sketch of the hybrid follows.
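As I understand it from sortPart/stickParts in the listing at the end, the hybrid is an ordinary top-down merge sort whose base case is the hash sort. A sketch, assuming the hashSort method (and the java.util.Arrays import) from the previous sketch; sortRange and the out-of-place merging are my simplifications, while the original works in place:

// Hybrid driver: blocks of up to SORTBLOCK elements are hash-sorted,
// larger ranges are split, sorted recursively, and merged.
static final int SORTBLOCK = 500;

static int[] sortRange(int[] a, int lo, int hi) {     // sorts a[lo..hi]
    if (hi - lo + 1 <= SORTBLOCK) {                   // small block:
        return hashSort(Arrays.copyOfRange(a, lo, hi + 1), 3);  // hash sort
    }
    int mid = lo + (hi - lo) / 2;                     // large range: split,
    int[] left = sortRange(a, lo, mid);               // sort both halves
    int[] right = sortRange(a, mid + 1, hi);          // recursively...
    int[] out = new int[left.length + right.length];  // ...and merge them
    int i = 0, j = 0, n = 0;
    while (i < left.length && j < right.length)
        out[n++] = (left[i] <= right[j]) ? left[i++] : right[j++];
    while (i < left.length) out[n++] = left[i++];
    while (j < right.length) out[n++] = right[j++];
    return out;
}
// usage: int[] sorted = sortRange(data, 0, data.length - 1);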



Benefits:

  - with a uniform distribution it runs 20-25% faster than the built-in quicksort;
  - the sort is stable;
  - memory cost is O(n).

Disadvantages:

  - it degrades badly when the data contains few distinct values, because of the mass of collisions;
  - it needs a temporary array 2-3 times the size of the input;
  - zero is reserved as the "empty cell" marker and cannot appear in the data.



I don't know whether this algorithm will be useful in practice, but it certainly can't hurt to study it academically.



My Java source code for testing is below.



By playing with the parameters, you can test different modes. For example, if you set SORTBLOCK = SOURCELEN, the pure hashing algorithm is applied, without merging. MIN_VAL and MAX_VAL set the range for generating the random values.



With SOURCELEN < 300, the sorted data is printed to the console. Zero is used as the "empty cell" marker in the array, so do not include it in the value range.
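For example, a pure-hash run could be configured by editing the fields at the top of the listing below; the values here that differ from the defaults are just for illustration:

static int SOURCELEN = 1000000;   // size of the test array
static int SORTBLOCK = SOURCELEN; // block = whole array: pure hash sort, no merge
static int k = 3;                 // the hash table is k times larger than a block
static int MIN_VAL = 10;          // generated values start here; keep it above 0,
static int MAX_VAL = 1000000;     // since 0 marks an empty cell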



I realize that this way of measuring performance is not entirely rigorous, but it is good enough for a preliminary estimate. When I find the time, I will redo it with a proper benchmarking framework, as colleagues advise.



import java.util.Arrays;
import java.util.Date;
import java.util.Random;

public class HashSort {
    // Size of the source array
    static int SOURCELEN = 1000000;
    int source[] = new int[SOURCELEN];

    // Copy of the source data for the built-in quicksort
    int quick[] = new int[SOURCELEN];

    // Block size for the hash sort
    static int SORTBLOCK = 500;
    static int k = 3;

    // Temporary hash table, k times larger than a block
    static int TMPLEN = (SOURCELEN < SORTBLOCK * k) ? SORTBLOCK * k : SOURCELEN;
    int tmp[] = new int[TMPLEN];

    // Range for generating random values
    static int MIN_VAL = 10;
    static int MAX_VAL = 1000000;

    int minValue = 0;
    int maxValue = 0;
    double hashKoef = 0;

    long finishTime = 0;
    long startTime = 0;
    long finishTimeQuick = 0;
    long startTimeQuick = 0;

    // Fill the source array with random values
    public void randomize() {
        int i;
        Random rnd = new Random();
        for (i = 0; i < SOURCELEN; i++) {
            int rndValue = MIN_VAL + ((int) (rnd.nextDouble() * ((double) MAX_VAL - MIN_VAL)));
            source[i] = rndValue;
        }
    }

    // Find the min and max of a block and compute the hash coefficient
    public void findMinMax(int startIndex, int endIndex) {
        int i;
        minValue = source[startIndex];
        maxValue = source[startIndex];
        for (i = startIndex + 1; i <= endIndex; i++) {
            if (source[i] > maxValue) {
                maxValue = source[i];
            }
            if (source[i] < minValue) {
                minValue = source[i];
            }
        }
        hashKoef = ((double) (k - 1) * 0.9) * ((double) (endIndex - startIndex) / ((double) maxValue - (double) minValue));
    }

    // Merge two sorted adjacent parts of the source array
    public void stickParts(int startIndex, int mediana, int endIndex) {
        int i = startIndex;
        int j = mediana + 1;
        int k = 0;
        while (i <= mediana && j <= endIndex) {
            if (source[i] < source[j]) {
                tmp[k] = source[i];
                i++;
            } else {
                tmp[k] = source[j];
                j++;
            }
            k++;
        }
        if (i > mediana) {
            while (j <= endIndex) {
                tmp[k] = source[j];
                j++;
                k++;
            }
        }
        if (j > endIndex) {
            while (i <= mediana) {
                tmp[k] = source[i];
                i++;
                k++;
            }
        }
        System.arraycopy(tmp, 0, source, startIndex, endIndex - startIndex + 1);
    }

    // Shift the occupied run starting at index one cell to the right,
    // up to the first free cell
    boolean shiftRight(int index) {
        int endpos = index;
        while (tmp[endpos] != 0) {
            endpos++;
            if (endpos == TMPLEN) return false;
        }
        while (endpos != index) {
            tmp[endpos] = tmp[endpos - 1];
            endpos--;
        }
        tmp[endpos] = 0;
        return true;
    }

    // Hash function: project a value onto the table indexes
    public int hash(int value) {
        return (int) (((double) value - (double) minValue) * hashKoef);
    }

    // Insert a value into the hash table, resolving collisions
    public void insertValue(int index, int value) {
        int _index = index;
        while (tmp[_index] != 0 && tmp[_index] <= value) {
            _index++;
        }
        if (tmp[_index] != 0) {
            shiftRight(_index);
        }
        tmp[_index] = value;
    }

    // Copy the sorted non-empty cells back into the source array
    public void extract(int startIndex, int endIndex) {
        int j = startIndex;
        for (int i = 0; i < (SORTBLOCK * k); i++) {
            if (tmp[i] != 0) {
                source[j] = tmp[i];
                j++;
            }
        }
    }

    // Clear the used part of the temporary table
    public void clearTMP() {
        if (tmp.length < SORTBLOCK * k) {
            Arrays.fill(tmp, 0);
        } else {
            Arrays.fill(tmp, 0, SORTBLOCK * k, 0);
        }
    }

    // Hash sort of one block
    public void hashingSort(int startIndex, int endIndex) {
        // 1. Find min and max, compute the hash coefficient
        findMinMax(startIndex, endIndex);
        // 2. Clear the temporary table
        clearTMP();
        // 3. Insert the values using the hash function
        for (int i = startIndex; i <= endIndex; i++) {
            insertValue(hash(source[i]), source[i]);
        }
        // 4. Collect the sorted data back into the source array
        extract(startIndex, endIndex);
    }

    // Recursive driver: hash-sort small blocks, then merge
    public void sortPart(int startIndex, int endIndex) {
        if ((endIndex - startIndex) <= SORTBLOCK) {
            hashingSort(startIndex, endIndex);
            return;
        }
        int mediana = startIndex + (endIndex - startIndex) / 2;
        sortPart(startIndex, mediana);
        sortPart(mediana + 1, endIndex);
        stickParts(startIndex, mediana, endIndex);
    }

    public void sort() {
        sortPart(0, SOURCELEN - 1);
    }

    // Print the results and timings
    public String toString() {
        int i;
        String res = "";
        res += "Source:";
        if (SOURCELEN < 300) {
            for (i = 0; i < SOURCELEN; i++) {
                res += " " + source[i];
            }
        }
        res += "\n";
        res += "Quick: ";
        if (SOURCELEN < 300) {
            for (i = 0; i < SOURCELEN; i++) {
                res += " " + quick[i];
            }
        }
        res += "\n";
        res += "len: " + SOURCELEN + "\n";
        res += "hashing&merge sort time: " + (finishTime - startTime) + "\n";
        res += "quick sort time: " + (finishTimeQuick - startTimeQuick) + "\n";
        return res;
    }

    // Copy the source data for the quicksort comparison
    public void copyToQuick() {
        System.arraycopy(source, 0, quick, 0, source.length);
    }

    public static void main(String[] args) {
        HashSort hs = new HashSort();
        hs.randomize();
        hs.copyToQuick();

        hs.startTime = (new Date()).getTime();
        hs.sort();
        hs.finishTime = (new Date()).getTime();

        hs.startTimeQuick = (new Date()).getTime();
        Arrays.sort(hs.quick);
        hs.finishTimeQuick = (new Date()).getTime();

        System.out.println(hs.toString());
    }
}
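To try it, compile with javac HashSort.java and run with java HashSort. With the default SOURCELEN = 1000000, the program prints the array length and the two timings in milliseconds (the lines "len:", "hashing&merge sort time:" and "quick sort time:"); the array contents themselves are dumped only when SOURCELEN < 300.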

Source: https://habr.com/ru/post/203032/


