About sorting (bubble, fast, comb ...)

This article focuses primarily on novice programmers. About sorting in general and the articles mentioned in the Internet title, why do we need another one? For example, there is a good article here on Habré: Bubble Sort and all-all-all . First of all, there is not much good, and secondly, in this article I want to answer the questions “why, what, how, why”. Why do we need sorting? First of all, for searching and presenting data. Some tasks with unsorted data are very difficult to solve, and some are simply impossible. Example: spelling dictionary, the words in it are sorted alphabetically. If this were not the case, then try to find the right word in it. Phonebook where subscribers are sorted alphabetically. Even if sorting is not necessary and not much needed, it is still more convenient to work with sorted data.

What we sort, what type of data? We will sort the integer data in the array using comparison and exchange. Those. we compare m [i] and m [j] (i <j) elements of the array, and if the j-th is “less” of the i-th, then we swap them. As a result, the array will be sorted in ascending order.
Note : we will compare as m [j] / 10 <m [i] / 10. This is done to see how items with the same key are sorted. With this comparison, 30 <40, but 30 = 31 = 32 ... Stable (or stable sorting) does not change the order of the elements that have the same comparison keys (in our example, the comparison key is the whole part of the value of the / 10 element)

Here is the initial (unsorted) array:

thirty

ten

but sorted:

ten

thirty

Note: the order of the elements 32,31,30 has changed, i.e. sorting is not stable (not stable).
')
All the sortings used in the article compare a couple of elements and change them if they are not in the right order. How are they different? They differ in the way they select pairs of compared items.
Further examples of sorting procedures in C languages. Everywhere the first parameter is the address of the array, the second is the number of elements in the array. The outer loop in the procedures is called the passage, and the inner loop is simply a cycle.

Simple sorting

The pass goes through all the elements of the array and in a loop compares the current element with all subsequent ones.

void sorto(int *m,int n) // , , ...  { int tmp; for (int i=0;i<n-1;++i) { //  for (int j=i+1; j<n;++j) { //  if (m[j]/10<m[i]/10) { // j- ,     i- tmp=m[i]; m[i]=m[j]; m[j]=tmp; } } } }

Work result

ten

thirty

The advantages of simple sorting are simple to implement and stable.
Cons - it is very slow, because there are a lot of comparison operations. Each of the n elements is compared with all subsequent ones, so the number of comparisons is ~ n * n / 2. If n = 100,000, then 5 billion comparisons. True, if n = 1000, then only 500 thousand comparisons and a modern computer can do it in less than a second, so with small quantities this sorting is good. There is one important minus - if the data is almost sorted (out of 100 thousand elements only 10 are not in their places), all the same, the work time will change little, because comparison operations will be the same. Let's try to improve the algorithm and move on to

Bubble Sort

It is also “Bubble” and “Bubble.” The internal loop runs over the array again and again and compares the current element with the previous one (therefore, the mileage starts from 1) and if the current one is less, rearranges it with the previous one. The location of the permutation is stored in the variable k, and at the end of the passage in k is the index from which the array is sorted.

 void sortb(int *m,int n) { int tmp,k; while(n>1) { //  ,     k=0; //   for (int i=1; i<n;++i) { if(m[i]/10<m[i-1]/10) { tmp=m[i-1]; m[i-1]=m[i]; m[i]=tmp; k=i; //       } } n=k; //  k   } }

In the worst case (when the source array is sorted in the reverse order, the same n * n / 2 comparisons are obtained. But in the best or in the good one? ”? The best case is when the array is completely otorotirovan. Then nothing will be rearranged in the first pass, in n will be assigned to the end of the pass 0 (k = 0) and there will be no second pass. Instead of 100 thousand passes, only 1 is great, but what if the array is sorted but not completely. For example, rearrange the minimum and maximum elements in the sorted array and sort this " problematic array (remember that 30 32 have the same key):

thirty

ten

In the first step of passage 1, “20” is compared with “50” and is exchanged. At the next step, "32" is compared again with "50" (after all, he now took the place of "20." As a result, at the end of the first pass, "50" will take place at the end, and the rest will move 1 position to the beginning of the array. And what will happen from “10", which was at the end? It will also move to position 1 to the beginning. If you imagine that the starting positions are at the top and the end ones at the bottom, "at the bottom", then as a result of the passage the heaviest will be at the bottom, and the easiest " will emerge "by one position. In one of the articles, heavy elements are called" hares "and light elements are called" turtles. " The ode “hare” quickly runs to the end (or drowns) until it meets a more weighty hare (which will take the baton and run further) or until it takes its place. And the lighter elements pop up one position. If the lightest takes the nth position , then you need n passes to get to its place. The heaviest (“pellet”) quickly sinks in one pass, and the easiest (“bubble”) pops up slowly. The bubble that floats the longest and determines the total number of passages. If the easiest is at the end of the array, you need as many passes as there are elements in the array.

What to do with the "turtles" so that they do not interfere so much. Comes to the rescue

Shaker sorting

Why do heavy elements sink quickly in bubble sorting, and light ones emerge slowly ? Because the comparison cycle advances from the beginning of the array to the end and "drags" heavy elements with it. And if you move from the end to the beginning, then the lungs will begin to emerge quickly, and the heavy ones will slowly sink. The easiest one pass will move from the end to the beginning. Shaker sorting on even passages moves from the beginning to the end, and on odd sets, our problem array will also be sorted out in 3 passes. For a nearly sorted array, shaker sorting can be much faster than a bubble, but for a randomly sorted source array, the gain is usually not very large.

Hairbrush sorting

Or comb. Again, ask why the bubble slowly pops up? Because for the passage it moves to 1 position. Why does he move only 1 position? Because the neighboring elements are compared and rearranged. So let's compare not neighboring elements, but those located at a distance s (gradually decreasing s with each pass). Implementing this idea in practice, we come to sorting the comb.

 void sortc(int *m,int n) { int tmp,k; int s=n; //    long long cnt=0; while(n>1) { s/=1.247f; //    .      80%   , //         20% if (s<1) s=1; // 0   ,  1 k=0; //   for (int i=0; i+s<n;++i) { // ,    ( s  )   if(m[i]/10>m[i+s]/10) { tmp=m[i]; m[i]=m[i+s]; m[i+s]=tmp; k=i; } ++cnt; } if (s==1) //  1,    .   n=k+1; } }

The idea is simple and the algorithm is not complicated, but the result is impressive. Sorting hair comb turns out to be much faster than a bubble / shaker, it can even outrun the “fast” qsort sorting. But there is also a minus - it is not stable (which is intuitive).

Quick sort qsort

The quick sort function of our array is very simple:

 //   qsort int fcomp(const void *i, const void *j) { return (*(int *)i)/10 - (*(int *)j)/10; } void sortq(int *m,int n) { qsort(m,n,sizeof(int),&fcomp); }

It is simple because it uses the standard qsort quick sort. Let's look again at the above algorithms. In all, pairs of elements m [i] and m [j] are selected and compared. But what is m [i]? This is the i-th element in the array of integers m or "element located at the address Ai = m + i * <sizeof (int)>". Accordingly, the j-th element is located at the address Aj = m + j * <sizeof (int)>. So, we compare elements by addresses Ai and Aj and rearrange them if Aj <Ai (less in the sense of some comparison operation that we use). So, the qsort function is a kind of "sorting combine" that selects, compares and swaps elements from our array (which we pass to it by the first parameter). Naturally, you need to know the number of elements in the array and it is passed by the second parameter. How does qsort determine where the i-th element is? Very simple - by adding to the address of the array i * <element size in bytes>. This <size in bytes> is passed in the third parameter. Good, but how does qsort compare the elements, because we have a clever way to compare? It does not compare itself; it uses our comparison function fcomp, whose address is passed in 4 parameters. When qsort decides by its internal algorithm that it is necessary to compare i and j-th elements, it will transmit their addresses as 1 and 2 parameters of the function fcomp, which returns the comparison result <0, = 0,> 0, depending on whether the first the second parameter is equal to it or more. If qsort sees that i <j but fcomp (& m [i], & m [j])> 0, she will simply rearrange the elements in the array (she knows the size of the element, but the contents are not important to her).

Sorting time 100001 element

Let's measure the sorting time for an array containing 100001 elements on a computer with an Intel i5 processor (3.3GHz). The time is indicated in seconds, the fraction indicates the number of passes (for quick sorting it is unknown). As expected, the shaker sorting on the problem array (which is completely ordered, only the first and last elements are rearranged) the absolute leader. She is perfectly "sharpened" for this data. But on random data sorting comb and qsort do not leave opponents a chance. Bubble sorting on the problem array shows a twofold increase in speed compared to a random one simply because the number of permutation operations is orders of magnitude smaller.

Sorting	Simple	Bubble	Shaker	Comb	Fast (qsort)
Stable	+	+	+	-	-
Random	23.1 / 100000	29.1 / 99585	19.8 / 50074	0.020 / 49	0.055
Problem	11.5 / 100000	12.9 / 100000	0.002 / 3	0.015/48	0.035
Back	18.3 / 100000	21.1 / 100000	21.1 / 100001	0.026 / 48	0.037

And now we will return to the sources, to bubble sorting and take a look at the sorting process. See how on the first pass the heavy element (50) moves to the end?

Compared elements are shown in green frames, and swapped elements are shown in red

Supplement after publication

I do not consider qsort bad or slow by any means - it is fast enough, functional and should be used with it if possible. Yes, she has to spend time calling the comparison function and she yielded to a “comb” that compares “in place.” This lag is insignificant (compare with the lag of the bubble from qsort, which will increase with the growth of the array). Let now it is necessary to compare not numbers, but some complex structure on a certain field and let this structure consist of 1000 bytes. Put 100k elements in the array (100mb is something) and call qsort. The fcomp function (comparator function) compares the required fields and the result is a sorted array. In this case, when rearranging qsort elements, you will have to copy fragments of 1000 bytes 3 times. And now “a little trick” - we will create an array of 100 thousand references to the original elements and pass the beginning of this array of references to qsort. Since the link is 4 bytes (64 bit 8), and not 1000, then when exchanging qsort links, you need to change these 4/8 bytes. Of course, you will need to change fcomp, because as parameters it will now receive not element addresses, but element addresses (but this is a simple change). But now you can make several sorting functions (each sorts structures by its field). And even, if necessary, you can make several arrays of links. Here are the possibilities qsort gives!

By the way: the use of references to objects instead of the objects themselves can be useful not only when calling qsort, but also when using containers such as vector, set or map.

Number of comparisons and exchanges

The following table shows the number of comparisons / number of exchanges for each sort. For qsort, the number of exchanges is unknown, so only the number of comparisons is shown. It can be seen that for a random array the number of comparisons is minimal for qsort.

Sorting	Simple	Bubble	Shaker	Comb	Fast (qsort)
Random	5'000'050'000 / 1'131'715'503	4'999'702'086 / 2'507'142'238	3'341'739'679 / 2'507'142'238	4'489'129 / 714'533	1'915'414
Problem	5'000'050'000 / 199'999	5'000'050'000 / 199'999	299'997 / 199'999	4'395'305 / 91'829	1'588'741
Back	5'000'050'000 / 5'000'050'000	5'000'050'000 / 5'000'050'000	5'000'050'000 / 5'000'050'000	4'395'305 / 233'210	1'588'741

Source: https://habr.com/ru/post/357640/

All Articles

About sorting (bubble, fast, comb ...)

Simple sorting

Bubble Sort

Shaker sorting

Hairbrush sorting

Quick sort qsort

Sorting time 100001 element

Supplement after publication

Number of comparisons and exchanges

More articles: