⬆️ ⬇️

Optimization for beginners, or the benefits of profiling

I got the task to write in PHP the optimal algorithm for inserting a new value into an ordered array. And it is reasoned to prove that this algorithm is the best. For this it was proposed to write three options and choose the best of them. Of course, I know that the best search method is binary, but once told to prove that it is the best, so be it, I will write two more. With this attitude and confidence in the future, I began to code.



I invite the beginning programmers to read what came out of it and to discuss the experienced ones.



Task



There is a fairly large (10 thousand elements) ordered array with numbers. It is necessary to optimally insert a new value into it while maintaining orderliness.



Solution options



The easiest way is to insert at the end and re-sort the built-in function. But initially the condition was not to do so.

')

What should be done to insert a new value? To begin to find the desired position. Given the size of the array, this will probably be the most resource-intensive part. And then insert this value into the position found. So you need to write 3 options for finding this position. We take as guinea-pigs: brute force, binary search, interpolation search (similar to binary, just do not divide in half, but try to guess the position more accurately).



Who is not interested, the program code of the search functions can be skipped.



Search by search



function insertBruteForce(&$array, $value) { function insertBruteForce(&$array, $value) { foreach($array as $position => $test) { if ($test >= $value) { break; } } insertTo($array, $position, $value); } 


Binary search



 function insertBinary(&$array, $value) { $begin = 0; $end = count($array) - 1; while ($end - $begin > 1) { $position = round(($begin + $end) / 2); if ($array[$position] > $value) { $end = $position; } elseif ($array[$position] < $value) { $begin = $position; } else { break; } } if ($array[$position] < $value) { ++$position; } insertTo($array, $position, $value); } 


It has a somewhat strange appearance due to the fact that we are not looking for the exact value, but the position between the elements.



Interpolated search



 function insertInterpolation(&$array, $value) { $begin = 0; $end = count($array) - 1; while ($end - $begin > 1) { $range = $array[$end] - $array[$begin]; $percentPosition = ($value - $array[$begin]) / $range; $position = $begin + round(($end - $begin) * $percentPosition); $position = min($position, $end); if($array[$position] <= $value && (!isset($array[$position+1]) || $array[$position+1] >= $value)) { break; } elseif ($array[$position] > $value) { $end = $end != $position ? $position : $position - 1; } elseif ($array[$position] < $value) { $begin = $begin != $position ? $position : $position + 1; } } if ($array[$position] < $value) { ++$position; } insertTo($array, $position, $value); } 




Inserting a value into a found position



Well, it should be easy (as I thought then). However, in PHP there is no built-in function for inserting a new value into a given position; there is only a substitution of a value. Not scary, take advantage of what is - cut, paste the value and glue. This is not an array brute force, you only have to do it once, use the built-in functions, they also work quickly.



 function insertTo(&$array, $position, $value) { $array = array_merge(array_slice($array, 0, $position), array($value), array_slice($array, $position)); } 


As it turned out later, this should not be done.



Test results



I quickly write the code to generate a random array with data, a test for repeated launching and collecting statistics. And here something strange happened. The result was something like this:

insertBruteForce: 0.0088

insertBinary: 0.0088

insertInterpolation: 0.0087



The lack of difference between binary search and interpolation can still be explained. But why does a simple search give the same result? An increase in the array does not change the ratio of forces.



Profiling hurries to the rescue



It became clear that the usual time metering for these questions will not answer. Well, Xdebug is already installed and configured, it only remains to enable profiling in it and see what happens.



And here again a surprise awaited me. The main part of the time was not the search for a position, but the insertion of a new element into the found position. At the same time, the execution time of the search itself was not much affected by the result.



So you need to rewrite the insert function. Instead of cutting and gluing try to push and paste.

 function insertDown(&$array, $value) { $i = count($array); for ($i = $i - 1; $i >= 0 && $array[$i] < $value; --$i) { $array[$i+1] = $array[$i]; } $array[$i] = $value; } 


Already better, but still not right.



This option works 40% faster and consumes less memory. And the result is:

insertBruteForce: 0.0052

insertBinary: 0.0053

insertInterpolation: 0.0053



And now we look once again at the last function. What she does? She pushes the elements until it reaches the desired position. Does she really need to know the position in advance?



Search and insert in one bottle



 function insertDown(&$array, $value) { $i = count($array); for ($i = $i - 1; $i >= 0 && $array[$i] >= $value; --$i) { $array[$i+1] = $array[$i]; } $array[$i] = $value; } 


The result: just one simple function (yes, with a brute force) and test time in 0.0049 seconds, so far the best result.



The benefits of collective intelligence



Added the next day

As a result of the discussion here by the comrades and myself in the code, errors were revealed that I made during his edits (I tested the initial version, but then started the experiments). Already corrected and inserted the updated results in the text.



PQR prompted to replace the value insertion function with this:

 function insertTo(&$array, $position, $value) { array_splice($array, $position, 0, $value); } 


It turns out the insert function in PHP is, but it is part of a more universal one. Test result:

insertBruteForce: 0.0035

insertBinary: 0.0036

insertInterpolation: 0.0037

insertDown: 0.0047



Even better, but still strange.



SerafimArts suggested using the SplFixedArray class instead of the usual array. I try. The insertion function really had to be “manual” again:

 function insertTo($array, $position, $value) { $size = $array->count(); $array->setSize($size + 1); for ($i = $size - 1; $i >= $position; --$i) { $array[$i+1] = $array[$i]; } $array[$position] = $value; } 


Result:

insertBruteForce: 0.0033

insertBinary: 0.0019

insertInterpolation: 0.0018

insertDown: 0.0026



All options have reduced lead time. And what is most interesting, the result is exactly what was originally expected and exactly the way we were taught in the university.



Epilogue



Knowledge and assumptions are good, but it is necessary to check what happens in practice. The right tool for this is not generally accepted:

  $start = microtime(true); <- > $time = microtime(true) - $start; 


and profiling. Although the above method may be useful, still profiling gives more detailed information and collects statistics not only where you explicitly indicated this (although this is also possible).



While experimenting with code, you need to be very careful about writing automated tests. In my case, this would eliminate a number of mistakes made during his revisions.



Many thanks to all the commentators for the hints and suggestions.

Source: https://habr.com/ru/post/242309/



All Articles