Express assessment of the complexity of the algorithm (+ analysis of the problem with Joker 2017 and DotNext 2017 Moscow)

For any practical application, log(n) can be considered constant. Just in some companies, this constant is larger than yours. © folk wisdom

Half the life I learn to program. Including I teach developers to make a quick assessment of the computational complexity of the algorithm. What for?!

At first glance, this is quite an academic skill that is not needed beyond the scope of the algorithm exam. However, experienced developers use it daily, without hesitation, at the level of reflexes. They do this for a rough analysis of the code that they are reading, writing, or just about to write. Usually a quick analysis of the asymptotics is enough to exclude the accidental appearance of a highly inefficient code.

First, let us figure out how to make an estimate of the complexity, using the example of a short but non-trivial task. Then I will tell you how to learn how to make a quick assessment, and show statistics on how the participants of the Joker and DotNext conferences solved the problem-example.

Picture from here .

Challenge on Joker 2017 and DotNext 2017 Moscow

My colleagues , who were at the Joker conferences in St. Petersburg and DotNext in Moscow last fall, asked me to come up with some kind of problem about the complexity of the Contour booth.

At first I decided that complexity was a bad topic, boring. But still decided to try, and in my opinion the result was quite beautiful! The idea was born easily, on the way home, and then I spent almost an hour behind the keyboard trying to draw it into a short but effective piece of C # code. Here's what happened:

 private static ISet<string> Compute(int n) { var set = new HashSet<string>(); for (int i = 2; i < n; i++) set.Add(ToBinaryString(i)); for (int step = 2; step < n; step++) for (int i = 2 * step; i < n; i += step) set.Remove(ToBinaryString(i)); return set; } private static string ToBinaryString(int x) { return x == 0 ? "0" : ToBinaryString(x / 2) + x % 2; }

Same java code

 private static Set<String> compute(int n) { Set<String> set = new HashSet<>(); for (int i = 2; i < n; i++) set.add(toBinaryString(i)); for (int step = 2; step < n; step++) for (int i = 2 * step; i < n; i += step) set.remove(toBinaryString(i)); return set; } private static String toBinaryString(int x) { return x == 0 ? "0" : toBinaryString(x / 2) + x % 2; }

It was suggested, having studied this code, to answer three questions:

What does this strange code do?
How can I optimize it?
What is the complexity of the function Compute(n) ?

The most accurate answer to the last question was suggested to choose from an impressive list of options:

O (n)
O (n²)
O (n³)
O (n log (n))
O (n log² (n))
O (n log³ (n))
O (n²log (n))
O (n³ log (n))
O (n² log² (n))
O (n² log³ (n))
O (n³ log³ (n))

It’s time to pour a cup of coffee, read the code snippet again, speculate for five minutes and choose the answer. Next will be spoilers.

Decision

Let's start from the bottom up. What the ToBinaryString method does is easy to understand - working recursively, it translates the string into a binary record. Since the argument is halved on each recursion call, its depth is O(log(x)) . Since there is no branching of recursion, we could stop here, having decided that the complexity of this function is O(log(x)) .

But you can show a little more meticulousness and recall the immutability of strings in C # and Java. Indeed, string concatenation will lead to copying, which means it will work for O (string length). So the total complexity will be slightly worse than the logarithm. O(log²(x)) is our choice! Of course, from a practical point of view, the difference is almost not noticeable (see the epigraph), but for the problem - the most it!

Let's go back to the Compute method . The complexity of the first cycle for - O(n log²(n)) . Obviously, the complexity of the whole method will be dictated by heavier nested for.

The inner loop of nested fors will make O(n/step) ToBinaryString calls. So the total complexity will be O(n/2 + n/3 + n/4 +... n/(n-1)) = O(n(1/2 + 1/3 + ...))

Now is the time to make a shallow dive from computer science to mathematical analysis to recall that the formula in brackets is exactly a partial sum of the harmonic series . The benefit of this series has been studied along and across. In particular, the Euler formula tells us that partial sums grow as O(n log(n)) . Thank you, analysis, bye-bye, see you soon! :)

So, remembering the price of the call ToBinaryString, we can estimate the complexity of our algorithm from above as O(n log³(n)) . This is the answer!

It turns out, to solve the problem, you need quite a lot:

know that concatenating strings is bad
be able to count the complexity of recursive methods
Know that HashSet's remove works efficiently.
be able to reduce the complexity estimate of nested loops to a partial sum of a series
and finally, to know about Euler's formula for a partial sum of the harmonic series

Wow! It seems pretty cool for such a small piece of code :)

What is going on?

Is it time to understand what this strange code is doing at all? Take a closer look! This is a search for a list of prime numbers using the unfinished sieve of Eratosthenes . Indeed, the inner loop deletes every second number, every third and so on. As a result, there will remain only those who are not divided into anything, except for 1 and themselves.

Here, no known optimization has been applied (for example, a reasonable choice of the number of loop iterations), and pessimization in the form of ToBinaryString has also been added, in order to completely confuse. As a result, there is no trace of the original complexity O (n log (log (n))).

Here are some interesting optimization tips from conference participants:

Memoize ToBinaryString. Simply put, cache the result. Obviously this will speed up the work, but it is far from obvious that this will improve the assessment of complexity. And really improve. How hard - think for yourself.
Use not HashSet, but BitArray (C #) / BitSet (Java). Interesting offer. To do this, you have to store in the set not the string, but the numbers themselves. And the ToBinaryString method is applied only at the end to the result. But this refactoring alone will improve the complexity. After that, replacing the collection will probably speed up the program by a constant, but will not affect the asymptotic estimate of complexity :)
Use a different algorithm for finding prime numbers. For example, Atkin's sieve is behind O (n / log (log (n))).

Joker vs. members DotNext members

Statistics!

On Joker, the problem was solved by 86 people. Of these, 8 gave the correct answer O (n log ³ (n)), and another 8 fell for the trick with concatenation and answered O (n log ² (n)).

On DotNext, the puzzle was solved by 69 people. Of these, 10 answered correctly, and another 17 people answered O (n log² (n)). But far-reaching holivarny conclusions about the superiority of dotnetchikov over javistami according to this statistics will not work. We know for sure that 49 konturovtsy arrived on DotNext, and they had an advantage.

2 hour difficulty assessment course

Have you forgotten what I am learning to program?

Last fall, I launched a mini course on rapid assessment of the complexity of algorithms on ulearn.me, “our little Curser”. It can be completed in just a couple of hours, it contains almost no theory, but it will have to solve 3 dozen of tasks that lead you from simple techniques to more complex ones. If the analysis of this task was difficult for you to understand, the course is for you.

This course may have influenced the results. Moreover, after Joker, I added to it an analysis of the elements of this very task - do not lose the good :)

Want to spend two hours to deal with the express assessment ~~and not get a reward in our internal social network~~ ? Welcome to the course.

" Evaluation of the complexity of algorithms " on ulearn.me

Source: https://habr.com/ru/post/348870/

All Articles