📜 ⬆️ ⬇️

Optimization for beginners, or the benefits of profiling

I got the task to write in PHP the optimal algorithm for inserting a new value into an ordered array. And it is reasoned to prove that this algorithm is the best. For this it was proposed to write three options and choose the best of them. Of course, I know that the best search method is binary, but once told to prove that it is the best, so be it, I will write two more. With this attitude and confidence in the future, I began to code.

I invite the beginning programmers to read what came out of it and to discuss the experienced ones.

Task


There is a fairly large (10 thousand elements) ordered array with numbers. It is necessary to optimally insert a new value into it while maintaining orderliness.

Solution options


The easiest way is to insert at the end and re-sort the built-in function. But initially the condition was not to do so.
')
What should be done to insert a new value? To begin to find the desired position. Given the size of the array, this will probably be the most resource-intensive part. And then insert this value into the position found. So you need to write 3 options for finding this position. We take as guinea-pigs: brute force, binary search, interpolation search (similar to binary, just do not divide in half, but try to guess the position more accurately).

Who is not interested, the program code of the search functions can be skipped.

Search by search


function insertBruteForce(&$array, $value) { function insertBruteForce(&$array, $value) { foreach($array as $position => $test) { if ($test >= $value) { break; } } insertTo($array, $position, $value); } 

Binary search


 function insertBinary(&$array, $value) { $begin = 0; $end = count($array) - 1; while ($end - $begin > 1) { $position = round(($begin + $end) / 2); if ($array[$position] > $value) { $end = $position; } elseif ($array[$position] < $value) { $begin = $position; } else { break; } } if ($array[$position] < $value) { ++$position; } insertTo($array, $position, $value); } 

It has a somewhat strange appearance due to the fact that we are not looking for the exact value, but the position between the elements.

Interpolated search


 function insertInterpolation(&$array, $value) { $begin = 0; $end = count($array) - 1; while ($end - $begin > 1) { $range = $array[$end] - $array[$begin]; $percentPosition = ($value - $array[$begin]) / $range; $position = $begin + round(($end - $begin) * $percentPosition); $position = min($position, $end); if($array[$position] <= $value && (!isset($array[$position+1]) || $array[$position+1] >= $value)) { break; } elseif ($array[$position] > $value) { $end = $end != $position ? $position : $position - 1; } elseif ($array[$position] < $value) { $begin = $begin != $position ? $position : $position + 1; } } if ($array[$position] < $value) { ++$position; } insertTo($array, $position, $value); } 


Inserting a value into a found position


Well, it should be easy (as I thought then). However, in PHP there is no built-in function for inserting a new value into a given position; there is only a substitution of a value. Not scary, take advantage of what is - cut, paste the value and glue. This is not an array brute force, you only have to do it once, use the built-in functions, they also work quickly.

 function insertTo(&$array, $position, $value) { $array = array_merge(array_slice($array, 0, $position), array($value), array_slice($array, $position)); } 

As it turned out later, this should not be done.

Test results


I quickly write the code to generate a random array with data, a test for repeated launching and collecting statistics. And here something strange happened. The result was something like this:
insertBruteForce: 0.0088
insertBinary: 0.0088
insertInterpolation: 0.0087

The lack of difference between binary search and interpolation can still be explained. But why does a simple search give the same result? An increase in the array does not change the ratio of forces.

Profiling hurries to the rescue


It became clear that the usual time metering for these questions will not answer. Well, Xdebug is already installed and configured, it only remains to enable profiling in it and see what happens.

And here again a surprise awaited me. The main part of the time was not the search for a position, but the insertion of a new element into the found position. At the same time, the execution time of the search itself was not much affected by the result.

So you need to rewrite the insert function. Instead of cutting and gluing try to push and paste.
 function insertDown(&$array, $value) { $i = count($array); for ($i = $i - 1; $i >= 0 && $array[$i] < $value; --$i) { $array[$i+1] = $array[$i]; } $array[$i] = $value; } 

Already better, but still not right.

This option works 40% faster and consumes less memory. And the result is:
insertBruteForce: 0.0052
insertBinary: 0.0053
insertInterpolation: 0.0053

And now we look once again at the last function. What she does? She pushes the elements until it reaches the desired position. Does she really need to know the position in advance?

Search and insert in one bottle


 function insertDown(&$array, $value) { $i = count($array); for ($i = $i - 1; $i >= 0 && $array[$i] >= $value; --$i) { $array[$i+1] = $array[$i]; } $array[$i] = $value; } 

The result: just one simple function (yes, with a brute force) and test time in 0.0049 seconds, so far the best result.

The benefits of collective intelligence


Added the next day
As a result of the discussion here by the comrades and myself in the code, errors were revealed that I made during his edits (I tested the initial version, but then started the experiments). Already corrected and inserted the updated results in the text.

PQR prompted to replace the value insertion function with this:
 function insertTo(&$array, $position, $value) { array_splice($array, $position, 0, $value); } 

It turns out the insert function in PHP is, but it is part of a more universal one. Test result:
insertBruteForce: 0.0035
insertBinary: 0.0036
insertInterpolation: 0.0037
insertDown: 0.0047

Even better, but still strange.

SerafimArts suggested using the SplFixedArray class instead of the usual array. I try. The insertion function really had to be “manual” again:
 function insertTo($array, $position, $value) { $size = $array->count(); $array->setSize($size + 1); for ($i = $size - 1; $i >= $position; --$i) { $array[$i+1] = $array[$i]; } $array[$position] = $value; } 

Result:
insertBruteForce: 0.0033
insertBinary: 0.0019
insertInterpolation: 0.0018
insertDown: 0.0026

All options have reduced lead time. And what is most interesting, the result is exactly what was originally expected and exactly the way we were taught in the university.

Epilogue


Knowledge and assumptions are good, but it is necessary to check what happens in practice. The right tool for this is not generally accepted:
  $start = microtime(true); <- > $time = microtime(true) - $start; 

and profiling. Although the above method may be useful, still profiling gives more detailed information and collects statistics not only where you explicitly indicated this (although this is also possible).

While experimenting with code, you need to be very careful about writing automated tests. In my case, this would eliminate a number of mistakes made during his revisions.

Many thanks to all the commentators for the hints and suggestions.

Source: https://habr.com/ru/post/242309/


All Articles