📜 ⬆️ ⬇️

One more analysis of bubble sort


Once, on New Year's Eve, inspired by the article about bubble sorting and its modifications, I decided to write my own implementation, and to think about how I could improve it. But at the same time, to start learning JAVA after all (by profession I am not a programmer, although I wrote a little).

Why do we need bubble sorting these days?
She's practically the slowest.
It has the highest (quadratic) complexity algorithm.

But! It is the easiest to implement and very visual, and is often used for educational purposes or at junior / intern interviews.
In addition, with minor modifications, interesting results can be achieved.
Newbies in programming and interested - please under the cat.
')

So, we immediately pursue several goals.
Write the implementation of the classic bubble sort;
Try to write a modified algorithm that should overtake the classic "bubble";
Learn the basics of JAVA and a bit of OOP.

All work with sorting and processing will be moved to a separate class ArrayUtils , from the class Run we will call sorting methods and form the order of calls with different data sets.

In the ArrayUtils class, we will have:
  1. Actually the array itself
  2. As the simplest metrics, we add two variables compareValue and switchValue to count the number of comparisons and the number of displacements of values, as well as timeAmount - operation time - they will be updated when the sorting method is started;
  3. Method of outputting results and metrics results ();
  4. The validation () method to immediately check if the sort is really working correctly;


I did the validation in the following way.
The constructor of the ArrayUtils class receives an array as input, which copies it into two local arrays, array and sortedArray , the latter is immediately sorted by the regular sorter Arrays.sort ().
class ArrayUtils
public class ArrayUtils { final private int[] array; final private int[] sortedArray; long switchCount, compareCount, timeAmount; public ArrayUtils(int[] array) { this.array = array; this.sortedArray = Arrays.copyOf(array, array.length); Arrays.sort(sortedArray); } void results(String text) { System.out.println(String.format("%-35s Compares: %, 15d, Switches: %, 15d, Time: %, 15d", text, compareCount, switchCount, timeAmount)); } } 


The actual validation () method actually compares the current state of the array with our sortedArray , which is sorted by the regular sorter (in Java, this is one of the variations of the smart quicksort). I call the method via assert, in eclipse it is turned off by default, but by adding the -ea options, our assert correctly drops out if the array breaks.
Validation method
 boolean validate() { if (Arrays.equals(array,sortedArray)) return true; return false; } 


Now we can proceed to the implementation of bubble sorting, which will be the benchmark for further work.
sortBubbleClassic
 public void sortBubbleClassic() { int currentPosition; int maxPosition; int temp; switchCount=0; compareCount=0; time = System.nanoTime(); for (maxPosition=array.length - 1; maxPosition >= 0;maxPosition--) { for (currentPosition = 0; currentPosition < maxPosition; currentPosition++) { compareCount++; if (array[currentPosition] > array[currentPosition+1]) { temp = array[currentPosition]; array[currentPosition] = array[currentPosition+1]; array[currentPosition+1] = temp; switchCount++; } } } time = System.nanoTime() - time; assert(validate()); return; } 


We now write the Run class, from which we will call sorting, we add the fillRandom () method in it to create a set of random data for further sorting.
Run.java
 public class Run { static public void FillRandom(int[] array) { for (int count=0; count < array.length; count++) { array[count] = (int)(Math.random()*100); } } public static void main(String[] args) { int arraySize= 10000; int random[] = new int[arraySize+1]; FillRandom(random); ArrayUtils bubbleClassic = new ArrayUtils(random); bubbleClassic.sortBubbleClassic(); bubbleClassic.results("Bubble Classic, random array"); } } 


You can run. Passing an array to the created class, and then copying it, I did so that the original array with random data remains unchanged for later use. So we can sort an identical set of random data in different ways.

Run, we get the result:
 Bubble Classic, random array Compares: 50 005 000, Switches: 24 486 908, Time: 117 116 326 

If assert did not throw an error, then the algorithm sorts correctly. On an array of 10,000 elements, we received over 50 million comparisons and 24 million permutations.

We copy our sortBubbleClassic method into sortBubbleAdvanced, and we begin to think what can be improved.
First of all, I thought that it was possible to add a check on whether the array was sorted so as not to drive through it for nothing.
To do this, I created the Boolean variable changed , which is set to false before the start of the inner loop, and inside the loop, if we do the permutation, is set to true .
If, having run the entire internal cycle, we have not made a single permutation, we can not drive further the wasted cycles, but immediately exit.
sortBubbleAdvanced - step 1
  public void sortBubbleClassic() { int currentPosition; int maxPosition; int temp; switchCount=0; compareCount=0; timeAmount = System.nanoTime(); Boolean changed; //    for (maxPosition=array.length - 1; maxPosition >= 0;maxPosition--) { changed=false; //   for (currentPosition = 0; currentPosition < maxPosition; currentPosition++) { compareCount++; if (array[currentPosition] > array[currentPosition+1]){ temp = array[currentPosition]; array[currentPosition] = array[currentPosition+1]; array[currentPosition+1] = temp; switchCount++; changed = true; //   } } if (!changed) { //     -   timeAmount = System.nanoTime() - timeAmount; assert(validate()); return; } } timeAmount = System.nanoTime() - timeAmount; assert(validate()); return; } 


Further, if at the beginning of the array, we have a large number, we pull it to the right, making a bunch of permutations. It is logical that you can immediately move it to the end of the array.
However, doing a lot of checks to figure out exactly where to throw it is expensive. Therefore, I made the simplest solution - check if the current value is greater than the value in the rightmost element - swap them. Increase the number of checks, but reduce the number of permutations.
Immediately and the second optimization - if we have a small number at the very end of the array, it will generally run to the beginning of the array through N internal cycles, almost equal to the number of elements in the array. Therefore, we add another check to swap the current value and the leftmost value in the array, if it is smaller. In itself, this action gives an extra check, but it accelerates slightly. But, after the completion of the internal cycle, we can be sure that the smallest number is in the leftmost position of our data array. This means that we can now begin the inner loop not from the first element, but from the second. With each step, we will now cut the array for the inner loop into two values ​​at once - left and right. This is a clear increase.
We have our three checks rationally.
The first check is the main bubble - two elements are compared, then we compare the current element with the leftmost element of the array, for which one is smaller. Then with the most right, on the subject, who is more.
sortBubbleAdvanced - step 2
 public void sortBubbleAdvanced(){ int currentPosition; int maxPosition; //    int minPosition = 0; //    boolean changed; // ,    int temp; switchCount = 0; compareCount = 0; timeAmount = System.nanoTime(); for (maxPosition = array.length - 1; maxPosition >= 0; maxPosition--) { changed=false; for (currentPosition = minPosition; currentPosition < maxPosition; currentPosition++) { if (array[currentPosition] > array[currentPosition+1]){ //      temp = array[currentPosition+1]; array[currentPosition+1] = array[currentPosition]; array[currentPosition] = temp; switchCount++; changed=true; } if (array[currentPosition] < array[minPosition]){ //      temp = array[minPosition]; array[minPosition] = array[currentPosition]; array[currentPosition] = temp; switchCount++; changed=true; } if (array[currentPosition] > array[maxPosition]){ //      temp = array[maxPosition]; array[maxPosition] = array[currentPosition]; array[currentPosition] = temp; switchCount++; changed=true; } compareCount+=3; } if (!changed) { timeAmount=System.nanoTime()-timeAmount; assert(validate()); return; } compareCount++; minPosition++ //    } timeAmount=System.nanoTime()-timeAmount; assert(validate()); return; } 


You can check what happened with us. Rule the Run class to add now the check of two methods and run
 Bubble Classic, random array Compares: 50 005 000, Switches: 24 758 509, Time: 105 797 881 Bubble Advanced, random array Compares: 112 317 213, Switches: 18 684 909, Time: 87 415 460 

As you can see, the result is very significant. The number of comparisons has more than doubled. But on the other hand, the number of permutations has decreased, and they are more expensive in time, so by time we win about 15% ! ..

Add two more data sets — already sorted incremental array, and sorted in reverse order — decremental (in theory it should be the worst case for sorting), add them to the Run class and add sortBubbleClassic and sortBubbleAdvanced calls for all three arrays - random , incremental and decremental
two more types of arrays
 static public void fillDecremental(int[] array) { for (int count=0; count < array.length; count++) { array[count] = array.length-count; } } static public void fillIncremental(int[] array) { for (int count=0; count < array.length; count++) { array[count] = count; } } 


See the results:
 Bubble Classic, random array Compares: 50 005 000, Switches: 24 678 169, Time: 116 314 053 Bubble Advanced, random array Compares: 111 879 748, Switches: 18 615 013, Time: 77 282 419 Bubble Classic, decremental array Compares: 50 005 000, Switches: 50 005 000, Time: 48 202 818 Bubble Advanced, decremental array Compares: 112 527 500, Switches: 49 985 004, Time: 77 115 071 Bubble Classic, incremental array Compares: 50 005 000, Switches: 0, Time: 24 805 261 Bubble Advanced, incremental array Compares: 30 000, Switches: 0, Time: 35 084 

On a random data set, our advanced method is expected to win, and on the decremental it is almost 60% longer; (
But he just instantly checks the already sorted array, thanks to our small check with the variable changed .

I was only partially pleased with this result. You can not leave the optimization of the algorithm, if it on some sets can show the result worse than the original. Reflecting on how to improve bubble sorting, without changing the basic principle, I noticed that in the “bubble”, large numbers actively travel to the right with each pass of the internal cycle, plus the maximum number also moves there. Thus, our right-hand side of the array becomes sorted before the left-hand side ... This idea was realized into the following idea:
With each permutation, I remember this position. Having reached the last element of the inner loop, I can reduce the array on the right not by one, but immediately cut to this last position where the permutation was, since this means that all positions after it are already sorted.
The number of cycles should be significantly reduced, at least about two times for the decremental array.
It is implemented in just three lines:
sortBubbleAdvanced - final step
 public void sortBubbleAdvanced(){ int currentPosition; int maxPosition; int changedMaxPosition = array.length - 1; //    int minPosition = 0; boolean changed; int temp; switchCount = 0; compareCount = 0; timeAmount = System.nanoTime(); for (maxPosition = array.length - 1; maxPosition >= 0; minPosition++) //       ,    minPosition++ { changed=false; for (currentPosition = minPosition; currentPosition < maxPosition; currentPosition++) { if (array[currentPosition] > array[currentPosition+1]){ temp = array[currentPosition+1]; array[currentPosition+1] = array[currentPosition]; array[currentPosition] = temp; switchCount++; changed=true; changedMaxPosition = currentPosition; //      } if (array[currentPosition] < array[minPosition]){ temp = array[minPosition]; array[minPosition] = array[currentPosition]; array[currentPosition] = temp; switchCount++; changed=true; } if (array[currentPosition] > array[maxPosition]){ temp = array[maxPosition]; array[maxPosition] = array[currentPosition]; array[currentPosition] = temp; switchCount++; changed=true; } compareCount+=3; } if (!changed) { timeAmount=System.nanoTime()-timeAmount; assert(validate()); return; } compareCount++; maxPosition = changedMaxPosition; //    -     } timeAmount=System.nanoTime()-timeAmount; assert(validate()); return; } 


We look at what happened with us:
 Bubble Classic, random array Compares: 50 005 000, Switches: 25 020 725, Time: 117 550 372 Bubble Advanced, random array Compares: 45 090 482, Switches: 10 174 100, Time: 70 032 156 Bubble Classic, decremental array Compares: 50 005 000, Switches: 50 005 000, Time: 47 815 033 Bubble Advanced, decremental array Compares: 60 022 000, Switches: 30 003 000, Time: 46 042 519 Bubble Classic, incremental array Compares: 50 005 000, Switches: 0, Time: 25 072 582 Bubble Advanced, incremental array Compares: 30 000, Switches: 0, Time: 34 773 

We snapped up on random data by about 10%, and YES , on the decremental array with a small margin, but we overtook the classic “bubble” - as expected, the time decreased by about two times.
The incremental array is instantaneous.
Due to the sharp reduction of empty passes on the already sorted part, the number of checks of the advanced algorithm has decreased, and in some cases even less than the original, and the number of permutations is always much less (well, apart from the checks, this is just the cost of an empty pass through the array).

So, staying within the main idea of ​​bubble sorting, we were able to significantly improve the result.
What else can be done?
Let's compare our algorithm with the quicksort leader.

We write a simple implementation (to be honest, I just stole it in the internet, slightly correcting it so that the metrics remain, but given that quicksort uses recursion, in a good way, you would need to add a metric for it ... but how can you compare it with algorithms without recursion? In general, not the essence ...), so:
Simple quicksort implementation
 public void quickSort() { timeAmount = System.nanoTime(); switchCount = 0; compareCount = 0; doQuickSort(0, array.length - 1); timeAmount = System.nanoTime() - timeAmount; assert(validate()); } private void doQuickSort(int startPosition, int lastPosition) { if (startPosition >= lastPosition) { compareCount++; return; } int tempStartPosition = startPosition, tempLastPosition = lastPosition; int currentPosition = tempStartPosition - (tempStartPosition - tempLastPosition) / 2; while (tempStartPosition < tempLastPosition) { while (tempStartPosition < currentPosition && (array[tempStartPosition] <= array[currentPosition])) { compareCount++; tempStartPosition++; } while (tempLastPosition > currentPosition && (array[currentPosition] <= array[tempLastPosition])) { compareCount++; tempLastPosition--; } if (tempStartPosition < tempLastPosition) { int temp = array[tempStartPosition]; array[tempStartPosition] = array[tempLastPosition]; array[tempLastPosition] = temp; switchCount++; if (tempStartPosition == currentPosition) currentPosition = tempLastPosition; else if (tempLastPosition == currentPosition){ currentPosition = tempStartPosition; compareCount++; } compareCount++; } compareCount++; } doQuickSort(startPosition, currentPosition); doQuickSort(currentPosition + 1, lastPosition); } 



See the results:
 Bubble Classic, random array Compares: 50 005 000, Switches: 24 684 713, Time: 117 338 627 QuickSort, random array Compares: 453 318, Switches: 22 234, Time: 3 000 141 Bubble Advanced, random array Compares: 44 832 715, Switches: 10 048 493, Time: 70 540 407 Bubble Classic, decremental array Compares: 50 005 000, Switches: 50 005 000, Time: 47 269 214 QuickSort, decremental array Compares: 153 632, Switches: 5 000, Time: 179 766 Bubble Advanced, decremental array Compares: 60 022 000, Switches: 30 003 000, Time: 45 579 908 Bubble Classic, incremental array Compares: 50 005 000, Switches: 0, Time: 24 927 899 QuickSort, incremental array Compares: 143 632, Switches: 0, Time: 134 437 Bubble Advanced, incremental array Compares: 30 000, Switches: 0, Time: 35 394 

As expected, quicksort easily does all of our algorithms on both a random data set and a decrement array. But suddenly, on an already sorted array, our advanced bubble makes it almost 4 times faster!

At first, I thought that there wasn’t much point. Well, yes, my advanced algorithm checks that the array is already sorted faster, but such a task is extremely rare.
Then I figured that this is not all - in fact, almost at the same speed, our advanced algorithm, in one pass, can sort at best 3 numbers (minimum, maximum, and move a couple more along the way), and decided to check for practice.
I changed the method that creates the incremental array so that there are several random numbers in the middle of the sorted array:
Create a not exactly sorted array.
 static public void fillIncremental(int[] array) { for (int count=0; count < array.length; count++) { array[count] = count; } for (int count=array.length/2; count < array.length/2+5; count++) //     5    array[count]=(int)Math.random()*100; } 



Look what happened:
 Bubble Classic, incremental array Compares: 50 005 000, Switches: 24 995, Time: 24 751 238 QuickSort, incremental array Compares: 219 964, Switches: 8 426, Time: 249 624 Bubble Advanced, incremental array Compares: 179 740, Switches: 24 981, Time: 125 123 

The advanced bubble algorithm did faster than quicksort, sorting out 5 random numbers almost twice as fast.

Increase the size of the array 10 times, check just in case:
 Bubble Classic, incremental array Compares: 5 000 050 000, Switches: 249 995, Time: 2 168 664 535 QuickSort, incremental array Compares: 2 539 530, Switches: 86 062, Time: 11 479 582 Bubble Advanced, incremental array Compares: 1 799 740, Switches: 249 981, Time: 6 411 974 

Anyway, almost twice as fast.
On an almost sorted array, where the number of unsorted elements does not exceed a certain minimum number, our converted bubble algorithm does not show quadratic complexity, but the usual O (N), and considering its simplicity to implement, it bypasses quicksort.

If we add the number of non-sorted elements up to about 10, the quicksort and bubbleAdvanced speeds are compared, and then our algorithm is still getting bored into hopeless quadratic slowness. However, if you need to sort a few randomly inserted values ​​in a previously sorted data array, it was out of competition.

Moral, results, conclusions.

The results , or rather the figures were shown above, can be discussed in the comments.
Sources are available on github (although there is a bit of garbage there, I tried to learn maven, but the source files of the classes can be compiled into any IDE or console).
In addition, I did write my first JAVA code, a bit more complicated than HelloWorld. And also learned at least about two methods of sorting from within.
And besides, if you dig into the algorithms, you can then intuitively guess what results you can expect, and where to dig.
So far, I don’t really understand why the decremental array is processed faster than the random one. It seemed to me that for the bubble sort the worst case is that it is a decremental array. Perhaps this is due to the fact that in a random array there are the same numbers, in general, there is still something to think about.

Conclusions and morality - under a certain set of data, if speed is important, it always makes sense to think and come up with your own bike, which can outrun regular methods, whose main task is to show quick results in the most common cases.

Source: https://habr.com/ru/post/274493/


All Articles