📜 ⬆️ ⬇️

"Topological" sorting graph with cycles

The full title of the article was supposed to sound like “Stable“ topological ”sorting of a graph with cycles by O(|V| + |e| log |e|) by time and O(|V|) by memory without recursion”, but I was told what is bust.

Disclaimer: I am a programmer, not a mathematician, so in some places inaccurate wording is possible, for which you can and should kick.

The essence of the task


I will analyze the formulation of the problem, the solution of which I want to share, in parts.

Topological sorting is the ordering of the vertices of a directed acyclic graph, in which each of the vertices from which the edge extends comes earlier than the vertex into which this edge enters. There are two important nuances: a graph can have more than one such orderings and it applies only to acyclic graphs. Mathematicians are not worried, but programmers sometimes want determinism and a little more than “sorry, you have a cycle here, there will be no sorting for you”.
')
Therefore, we add the requirement of stability : a pair of vertices, the order between which is not specified by the edges of the graph, must be determined by the order in which these same vertices arrived at the input of the algorithm. As a result, repeated sorts will not change the order of the vertices.

With the absence of recursion, everything is simple, the computer is significantly weaker than the Turing machine, and the memory (and especially the stack) is limited. Therefore, in application programming, usually iterative algorithms are preferable to recursive ones.

And, finally, I will define what I call “topological” sorting if there are cycles in the graph. This is the ordering of the vertices, which coincides with the true topological sorting, if each of the cycles is replaced by one vertex, and the vertices of the cycle itself, in accordance with the requirement of stability, are located relative to each other in the original order.

And now with all this garbage, we will try to take off, I will do it all within the limits specified in the beginning of the post time and memory restrictions.

Search for a solution


If you look at the existing topological sorting algorithms ( Kahn's algorithm , search in depth ), it turns out that they all, if there is a cycle, say "I can not" and stop working.

Therefore, let us go on the other hand, with algorithms that can do something intelligible with cycles. For example, find them. Among the algorithms for searching cycles in graphs listed in Wikipedia, Tarjan’s algorithm attracted attention. It contains an interesting remark that, as a by-product, the algorithm produces the inverse topological graph sorting:
In the case of a successor. Therefore, it was strongly connected .
True, the algorithm is recursive and it is not clear that it has stability, but this is clearly a movement in the right direction. When reading Wikipedia more attentively, it reveals a reference to the article A by the authorship of Comrade David Pierce, in which it is not enough that there is an imperative algorithm, but also with reduced memory requirements compared to the classical Tarjan algorithm. The bonus is the implementation of the algorithm in Java . We must take!

Algorithm PEA_FIND_SCC3 (V, E) from Pierce's article


So, we have a list of vertices at the input and (thanks to Pierce) a certain index of the strong connected component to which this vertex belongs to the output. The next step is to stably sort the vertices according to the sequence number of their component. There is an algorithm for such a task; it is called sorting by calculation , performing it in O(n) time.

In the process of collecting the algorithm in a heap, it turned out that from Tarjan the fact that it is natural for him to give back the topological sorting — that the neighboring branches of the graph (not having an order relationship) —number the graph backwards, not really having any connections with each other, are in the reverse order ...

Answer


So, the final decision:

  1. We number the vertices of the source list. O(|V|)
  2. We sort the edges of each of the vertices according to the number of the vertex where the edge goes. O(|e| log |e|)
  3. Using the Pierce algorithm, we find and number the components of strong connectivity. O(|V|)
  4. Using sorting by counting, sort the vertices on the basis of the numbers of the components of the strong connectivity they received. O(|V|)

The code on GitHub, Java, public domain . It can be noted that in order to ensure the stability of sorting, the Pierce algorithm is slightly modified and goes around the vertices in the reverse order.

But why???


And now the background, why it all took. When loading / unloading dynamic libraries (.so), glibc, you need to decide in which order to initialize the static variables. Variables depend on each other, depend on various functions, etc. In general, all this forms the same graph in which there are cycles and which must be sorted.

Once upon a time, this task was dealt with by a rather non-optimal code that performed the task in O(n^2) . And in general, it didn’t really bother anyone, until in 2012 it was discovered that the code was not working correctly and was mistaken in some cases.

Severe men from RedHat thought-thought and screwed on top some more cycles. Problem cases have been fixed, but the algorithm has already started working for O(n^3) , and this has already become noticeable and in some applications it has taken a few to tens of seconds, about which in 2013 a bug was introduced. Also, the author of the bug found cases in which the algorithm with O(n^3) also wrong . He suggested using Tarjan’s algorithm, although the patch with corrections was never issued.

As time went on, glibc mercilessly braked and in 2015 there was another attempt to fix the algorithm . Unfortunately unsuccessful, the algorithm was chosen O(n^2) , besides it was confused by the branches of the graph, between which the order is not defined.

Today is the year 2019, and glibc is still slowing down. Judging by how much time it took for me to correct the task, the chances that I will complete it are significantly lower than 100%. Everything is aggravated by the fact that it happens on C, without IDE support, in the code with the GNU coding style, the insane test runner (“and if you want to run the test again, just delete the corresponding .out file”). And in order for glibc to even take a look at your patch, you need to go through the procedure of copyright assignment , correctly patch the patch and the devil knows what else. Therefore, in order to at least remove the issue with the invention of the algorithm that solves the problem, this post was written.

Source: https://habr.com/ru/post/451208/


All Articles