Search for the k-th smallest element

Today a very interesting article appeared on Habré , about finding the minimum (maximum) value on a segment in an array . Since the article turned out to be interesting and popular, I decided to share with you one more search algorithm in the array of some “special” values.

Surely everyone met the task of finding the k-th smallest element in the array. The k-th element is characterized by the fact that it is greater than (or equal to) k elements of the array and less than or equal to Nk of the remaining elements (where N is the number of elements in the array).

The task of finding the k-th smallest element is usually associated with the sorting task, since the obvious method of finding this element is to sort the N elements and choose the k-th.

But we will go a little different way. I assume that readers know how the quick sort algorithm works, but just in case, let me remind you. A random element x is selected in the array, and the array is scanned to the left until a [i]> x is found, then scans to the right until a [j] <x is found. As soon as two such elements are found, they are exchanged and viewed until the indices i, j become equal somewhere in the middle of the array. The result is an array whose left side contains the elements <= x, and the right side contains the elements> = x. The described procedure is applied recursively for the left and right side and continues until a fully sorted array is obtained. ( A little more about effective sorting algorithms ).
')
The separation procedure used in quick sorting gives the potential to find the desired (k-th) element much faster.
This algorithm works as follows. At the first step, the separation procedure with L = 1 and R = N is called (that is, the separation is performed for the entire array), and a [k] is selected as the separating value x. After separation, the values of the indices i, j are obtained such that

a [h] <x for all h <i
a [h]> x for all h> j
i> j

Three cases are possible here:
• The separation value of x turned out to be too small. As a result, the boundary between the two parts is less than the desired value of k. Then the separation operation must be repeated with elements a [i] ... a [R].

• The selected x value was too large. Then the separation operation should be repeated with elements a [L] ... a [j].

• The element a [k] splits an array into two parts in the required proportion and therefore is the desired value.

The separation operation must be repeated until Case 3 is implemented. This cycle is expressed in the following fragment (I apologize for Pascal, but my students still know only him):

procedure Find ( k : integer ) ;
var
L , R , i , j : integer ;
w , x : integer ;
begin
L : = 1 ; R : = N ;
while L <R - 1 do
begin
x : = a [ k ] ;
i : = L ;
j : = R ;
REPEAT
while a [ i ] <x do
i : = i + 1 ;
while x <a [ j ] do
j : = j - 1 ;
if i < = j then
begin
w : = a [ i ] ;
a [ i ] : = a [ j ] ;
a [ j ] : = w ;
i : = i + 1 ;
j : = j - 1 ;
end ;
UNTIL i> j ;
if j <k then
L : = i ;
if k <i then
R : = j ;
end ;
end ;

If we assume that, on average, each partition divides in half the size of the part of the array in which the desired value is located, then the required number of comparisons will be N + N / 2 + N / 4 + ... + 1 = 2N. This explains the effectiveness of the above procedure for searching medians and other values, and also explains its superiority over the simple method consisting in pre-sorting the entire array with the subsequent selection of the k-th element (where the best behavior is of the order N * log (N)).

I hope this algorithm will help you make your programs more efficient and faster. Thanks for attention.

Source: https://habr.com/ru/post/115018/

All Articles

Search for the k-th smallest element

More articles: