Proof of the incorrect sorting algorithm Android, Java and Python

Tim Peters developed the Timsort hybrid sorting algorithm in 2002. The algorithm is a skillful combination of ideas of merge sorting and sorting inserts and sharpened to work effectively with real data. Timsort was first developed for Python, but then Joshua Bloch (the creator of the Java collections, who, incidentally, noted that most binary search algorithms contain an error ) ported it to Java (the java.util.Collections.sort and java.util.Arrays methods .sort) Today Timsort is the standard sorting algorithm in the Android SDK, Oracle JDK and OpenJDK. Given the popularity of these platforms, it can be concluded that computers, cloud services, and mobile devices that use Timsort for sorting are worth billions.

But back in 2015 year. After we successfully verified the Java implementations of sorting by counting and bitwise sorting ( J. Autom. Reasoning 53 (2), 129-139 ) with our formal verification tool called KeY , we looked for a new object to learn. Timsort seemed to be a suitable candidate, because it is quite complex and widely used. Unfortunately, we could not prove its correctness. The reason for this, when examined in detail, turned out to be simple: there is a bug in the Timsort implementation. Our theoretical studies have shown us where to look for the error (curiously, the error was already in the Python implementation). This article describes how we achieved this.

An article with a more complete analysis, as well as several test programs are available on our website .

Content

1. A bug in the Timsort implementation on Android, Java and Python

So what is the bug? And why don't you try to play it yourself?
')

1.1 Reproduce Timsort bug in Java

git clone https://github.com/abstools/java-timsort-bug.git cd java-timsort-bug javac *.java java TestTimSort 67108864

If a bug is present, you will see this result.

 Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 40 at java.util.TimSort.pushRun(TimSort.java:413) at java.util.TimSort.sort(TimSort.java:240) at java.util.Arrays.sort(Arrays.java:1438) at TestTimSort.main(TestTimSort.java:18)

Video playback playback errors

1.2 How does Timsort work?

Timsort is a hybrid algorithm that uses insert sorting and merge sorting.

The algorithm reorders the input array from left to right, finding in it consecutive (non-overlapping) sorted segments (“series”, or runs). If the series is shorter than a certain minimum length, it is supplemented by subsequent elements and sorted by insertion. The lengths of the created series are added to the end of the runLen array. When a new series is added to runLen , the mergeCollapse method merges the series until the last three elements in runLen satisfy the following pair of conditions, which is called the " invariant ":

runLen [n-2]> runLen [n-1] + runLen [n]
runLen [n-1]> runLen [n]

Here n is the index of the last series in runLen. The idea was that checking this invariant for the last three series guarantees that all successive triples of series will satisfy it. At the end, all the series are merged, resulting in a fully sorted input array.

For reasons of efficiency, it is desirable to allocate as little memory as possible under runLen , but it should be enough for all the series to fit there. If the invariant is satisfied for all series, the length of the series will grow exponentially (even faster than the Fibonacci numbers: the length of each series is strictly greater than the sum of the lengths of the next two). Since the series do not overlap, they will not need much to completely sort even a huge input array.

1.3 Bug Timsort step by step

The following code snippet shows that the implementation of mergeCollapse checks an invariant for the last three series in runLen :

 private void mergeCollapse() { while (stackSize > 1) { int n = stackSize - 2; if (n > 0 && runLen[n-1] <= runLen[n] + runLen[n+1]) { if (runLen[n - 1] < runLen[n + 1]) n--; mergeAt(n); } else if (runLen[n] <= runLen[n + 1]) { mergeAt(n); } else { break; //   } } }

Unfortunately, this is not enough for all series to satisfy the invariant. Suppose that runLen contains such a set of lengths before executing mergeCollapse (n = 4):

 120, 80, 25, 20, 30

On the first pass of the while loop, series of lengths 25 and 20 will be combined (since 25 <20 + 30 and 25 <30):

 120, 80, 45, 30

In the second pass of the cycle (now n = 3), we determine that the invariant is set for the last three series, because 80> 45 + 30 and 45> 30, so the mergeCollapse ends the work. But mergeCollapse did not restore the invariant for the whole array: it is broken in the first three, since 120 <80 + 45.

The test generator on our website allows you to identify this problem. It creates an input array with many short series, while their length does not satisfy the invariant, which leads to the fall of Timsort. The real reason for the error is that due to the violation of the invariant, the length of the series grows slower than expected, resulting in their number exceeds runLen.length and this leads to an ArrayOutOfBoundsException.

2. Proof of (Incorrect) Timsort

We found that the mergeCollapse invariant is violated when we tried to formally verify Timsort. Fortunately, we also understood how to fix it. As a result, we even managed to verify the corrected version, in which the invariant is actually observed. But let's not hurry. To begin with, let's see what a formal verification is and how it is performed.

2.1 KeY Verification System

KeY is a deductive verification platform for single-threaded Java and JavaCard programs. It allows you to prove the correctness of programs for a given specification. Roughly speaking, the specification consists of preconditions (in terms of KeY - requires ) and postconditions (in terms of KeY - ensures ). Specifications are given before the methods that implement them (for example, before the mergeCollapse () mentioned above). The specification of a method is also called its contract .

In the case of sorting, the precondition simply states that a non-empty array is fed to the input, and the post-condition requires that the resulting array be sorted and be a permutation of the input. The KeY system proves that if a verifiable method is called with input data that satisfies the preconditions, then the method will end normally and the resulting state will satisfy the postcondition. This is also called complete correctness , because the normal completion of the method is also proved. As we saw, the java.util.Arrays.sort () method from OpenJDK does not honor this contract, because for certain input data it ends with an exception.

Additionally, class and object invariants are used to indicate general constraints on field values. The consistency of data or boundary conditions is usually checked:

 /*@ private invariant @ runBase.length == runLen.length && runBase != runLen; @*/

This invariant says that the lengths of the runBase and runLen arrays must match, but at the same time they must be two different arrays. The semantics of invariants implies that each output method must not only provide the postconditions of its contract, but also the invariants of its own object “this”.

KeY uses the Java Modeling Language (JML) as its specification language. Expressions on it are written in the usual Java language, so Java programmers can easily master it. The main JML extension is quantifiers ( \ forall T x - for any T, \ exists T x - there is T) and, of course, special keywords for contracts. JML specifications are provided in java-file comments before the code to which they refer. The following is a simple example of a Java method with a JML specification:

 /*@ private normal_behavior @ requires @ n >= MIN_MERGE; @ ensures @ \result >= MIN_MERGE/2; @*/ private static int /*@ pure @*/ minRunLength(int n) { assert n >= 0; int r = 0; //  1          /*@ loop_invariant n >= MIN_MERGE/2 && r >=0 && r<=1; @ decreases n; @ assignable \nothing; @*/ while (n >= MIN_MERGE) { r |= (n & 1); n >>= 1; } return n + r; }

The minRunLength () contract contains one precondition: the input parameter must be no less than MIN_MERGE. In this case (and only in this) the method promises to end normally (do not get hung up and do not throw an exception) and return a value that is greater than or equal to MIN_MERGE / 2. Additionally, the method is marked as pure . This means that the method does not modify the heap.

The key point is that the KeY system is able to statically prove contracts of similar methods for any input parameters. How is this possible? KeY performs symbolic execution of a verifiable method, that is, executes it using character values, so that all possible execution paths are taken into account. But this is not enough, because the symbolic execution of loops with a non-fixed number of iterations (such as a loop in mergeCollapse (), where we don’t know the value of stackSize) will never end. To make the symbolic execution of loops finite, the logic of invariants is used. For example, the above minRunLength () method contains a cycle, the specification of which is expressed by a cycle invariant. The invariant states that after each iteration the condition n >= MIN_MERGE/2 && r >= 0 && r <= 1 is satisfied, and from this one can prove the postcondition of the entire method. The summary, Redu, is used to prove the end of a loop. It indicates an expression whose value is non-negative and strictly decreases. The assignable construct lists heap objects that can be modified in a loop. The keyword \ nothing means that the heap is not modified. And indeed: the loop changes only the local variable r and the argument n.

In general, there are not enough method contracts for formal verification. It is also necessary to provide cycle invariants. Sometimes it is very difficult to come up with an invariant that is observed in a cycle and is strong enough to derive the post-condition of the method from it. Without instrumental support and the technology of automated proof of theorems, it is hardly possible to make correct cycle invariants in non-trivial programs. In fact, this is where the Timsort creators mistake lies. Under certain conditions, a loop in mergeCollapse violates the following invariant of the Timsort class (see section 1.3 Timsort Bug Step by Step ):

 /*@ private invariant @ (\forall int i; 0<=i && i<stackSize-4; @ runLen[i] > runLen[i+1] + runLen[i+2])) @*/

This invariant states that runLen [i] must be greater than the sum of two subsequent elements, for any non-negative i, less than stackSize-4. The invariant is not restored even after the loop, so the entire mergeCollapse method does not save the class invariant. Therefore, the loop invariant is not as strict as the developers assumed. This is what we discovered during our attempt to verify Timsort with KeY. Without a special tool, it would be almost impossible to detect.

Although JML is very similar to Java, it is a full-fledged contract programming language suitable for full functional verification of Java programs.

2.2 Correction and its formal specification

A simplified version of the contract, which according to the authors' idea should be respected in mergeCollapse, is given below.

 /*@ requires @ stackSize > 0 && @ runLen[stackSize-4] > runLen[stackSize-3]+runLen[stackSize-2] @ && runLen[stackSize-3] > runLen[stackSize-2]; @ ensures @ (\forall int i; 0<=i && i<stackSize-2; @ runLen[i] > runLen[i+1] + runLen[i+2]) @ && runLen[stackSize-2] > runLen[stackSize-1] @*/ private void mergeCollapse()

Two ensured formulas imply that when mergeCollapse is completed, all series satisfy the invariant given in section 1.2 . We have already seen that this contract is not being executed in the current implementation of mergeCollapse ( section 1.3 ). Now we present a revised version that honors the contract:

 private void newMergeCollapse() { while (stackSize > 1) { int n = stackSize - 2; if (n > 0 && runLen[n-1] <= runLen[n] + runLen[n+1] || n-1 > 0 && runLen[n-2] <= runLen[n] + runLen[n-1]) { if (runLen[n - 1] < runLen[n + 1]) n--; } else if (n<0 || runLen[n] > runLen[n + 1]) { break; //   } mergeAt(n); } }

The main idea of this version is to check that the invariant is observed for the last four series in runLen instead of three . As we will see below, this is enough for the lengths of all series to satisfy the invariant after the mergeCollapse is completed.

The first step in proving a contract for a revised version of mergeCollapse is to find a suitable loop invariant. Here is its simplified version:

 /*@ loop_invariant @ (\forall int i; 0<=i && i<stackSize-4; @ runLen[i] > runLen[i+1] + runLen[i+2]) @ && runLen[stackSize-4] > runLen[stackSize-3]) @*/

Intuitively, the cycle invariant states that all series except, perhaps, the last four, satisfy the condition. In this case, we note that the cycle ends (according to the break operator) only if the last four series also satisfy it. Thus, it is possible to guarantee that all series satisfy the invariant.

2.3 Analysis of the results of KeY

When we feed KeY with a corrected version of mergeCollapse along with a contract and a cycle invariant, the system performs a symbolic execution of the cycle and generates verification conditions: formulas that imply that the mergeCollapse contract is fulfilled. The following (simplified) formula shows the main condition for proving the correctness of a mergeCollapse generated by KeY:

The formula states that the conditions in brackets that are true after the end of the cycle should follow the post condition mergeCollapse. It is clear where the expressions in brackets came from: the break statement that completes the loop is executed only when they are all true. We formally proved this formula (and all other verification conditions) with the help of KeY in semi-automatic mode. The following is a sketch of evidence:

Proof . The formula runLen [stackSize-2]> runLen [stackSize-1] from the post-condition mergeCollapse directly follows from n> = 0 ==> runLen [n]> runLen [n + 1] .

We prove the following formula:

\ forall int i; 0 <= i && i <stackSize-2; runLen [i]> runLen [i + 1] + runLen [i + 2] .

The following options are possible i:

i <stackSize-4: corresponds to the loop invariant
i = stackSize-4: follows from n> 1 ==> runLen [n-2]> runLen [n-1] + runLen [n]
i = stackSize-3: follows from n> 0 ==> runLen [n-1]> runLen [n] + runLen [n + 1]

This proof shows that the new version of mergeCollapse is completed only when all series satisfy the invariant.

3. Proposed bug fixes for Timsort implementations in Python and Android / Java

Our error analysis (including the mergeCollapse fix) has been sent, reviewed and accepted into the Java bugtracker .

The bug is present at least in the Java version for Android, OpenJDK and OracleJDK: all use the same Timsort code. Also a bug is present in Python. The following sections provide the original and revised versions.

As mentioned above, the idea of a fix is very simple: check that the invariant is executed for the last four series in runLen, and not just for three.

3.1 Incorrect function merge_collapse (Python)

Timsort for Python (written in C using the Python API) is available in the subversion repository - Algorithm is also described here .

The Timsort version for Java was ported with CPython. The CPython version contains an error, designed to support arrays of up to 2 ⁶⁴ elements, also contains an error. However, it is impossible to cause an array overflow error on modern machines: the algorithm allocates 85 elements for runLen, and this is enough (as our analysis shows in the full article) for arrays containing no more than 2 ⁴⁹ elements. For comparison, the most powerful supercomputer to date Tianhe-2 has 2 ⁵⁰ bytes of memory.

 /* The maximum number of entries in a MergeState's * pending-runs stack. * This is enough to sort arrays of size up to about * 32 * phi ** MAX_MERGE_PENDING * where phi ~= 1.618. 85 is ridiculously large enough, * good for an array with 2**64 elements. */ #define MAX_MERGE_PENDING 85 merge_collapse(MergeState *ms) { struct s_slice *p = ms->pending; assert(ms); while (ms->n > 1) { Py_ssize_t n = ms->n - 2; if (n > 0 && p[n-1].len <= p[n].len + p[n+1].len) { if (p[n-1].len < p[n+1].len) --n; if (merge_at(ms, n) < 0) return -1; } else if (p[n].len <= p[n+1].len) { if (merge_at(ms, n) < 0) return -1; } else break; } return 0; }

3.2 Corrected merge_collapse (Python) function

 merge_collapse(MergeState *ms) { struct s_slice *p = ms->pending; assert(ms); while (ms->n > 1) { Py_ssize_t n = ms->n - 2; if ( n > 0 && p[n-1].len <= p[n].len + p[n+1].len || (n-1 > 0 && p[n-2].len <= p[n].len + p[n-1].len)) { if (p[n-1].len < p[n+1].len) --n; if (merge_at(ms, n) < 0) return -1; } else if (p[n].len <= p[n+1].len) { if (merge_at(ms, n) < 0) return -1; } else break; } return 0; }

3.3 Incorrect mergeCollapse function (Java / Android)

The error is completely similar to the error in Python:

  private void mergeCollapse() { while (stackSize > 1) { int n = stackSize - 2; if (n > 0 && runLen[n-1] <= runLen[n] + runLen[n+1]) { if (runLen[n - 1] < runLen[n + 1]) n--; mergeAt(n); } else if (runLen[n] <= runLen[n + 1]) { mergeAt(n); } else { break; //   } } }

3.4 Corrected mergeCollapse function (Java / Android)

The fix is similar to the one above for Python.
[UPDATE 26/2: we changed the code from the original version of the article. The old code was equivalent, but contained redundant checks and differed in style. Thanks to everyone who noticed!]

  private void newMergeCollapse() { while (stackSize > 1) { int n = stackSize - 2; if ( (n >= 1 && runLen[n-1] <= runLen[n] + runLen[n+1]) || (n >= 2 && runLen[n-2] <= runLen[n] + runLen[n-1])) { if (runLen[n - 1] < runLen[n + 1]) n--; } else if (runLen[n] > runLen[n + 1]) { break; //   } mergeAt(n); } }

4. Conclusion: which conclusions follow from this?

When trying to verify Timsort, we were unable to establish an invariant of the sort object. Analyzing the reasons for the failure, we found an error in the Timsort implementation, which causes an ArrayOutOfBoundsException for certain input data. We suggested correcting an erroneous method (without a perceptible reduction in performance) and formally proved that the correction is correct and there is no more error.

From this story, in addition to the actual error found, several observations can be made.

Programmers often consider formal methods impractical and / or far from real tasks. This is not the case: we found and fixed a software bug that billions of users use every day. As our analysis showed, it would be almost impossible to find and fix such a bug without formal analysis and a tool for verification. Error for many years lived in the standard library of languages Java and Python. Its manifestations were noticed earlier and even thought that they had corrected, but in reality they only achieved that the error began to occur less frequently.
Although it is unlikely that such an error will occur by chance, it is easy to imagine how it can be used to attack. Probably, in the standard libraries of popular programming languages lie other unnoticed errors. Maybe it is worth to look for them before they cause harm or will be used by hackers?
The response of Java developers to our report is somewhat disappointing. Instead of using our corrected (and verified!) Version of mergeCollapse (), they preferred to increase the selected runLen size to “sufficient” size. As we have shown, it was not at all necessary to do so. But now everyone who uses java.utils.Collection.sort () will have to spend more memory. Given the astronomical number of programs that use such a basic function, this will lead to noticeable additional energy costs. We can only guess why our decision was not made: perhaps the JDK developers did not bother to read our report completely and therefore did not understand the essence of the fix and decided not to rely on it. In the end, OpenJDK develops a community largely composed of volunteers who do not have much time.

What conclusions follow from this? We would be happy if our work served as a starting point for closer interaction between researchers of formal methods and developers of open software platforms. Formal methods are already being successfully applied in Amazon and Facebook . Modern languages of formal specification and formal verification tools are not so confusing and super-complex to learn. Their usability and automation is constantly improving. But we need more people to try, test and use our tools. Yes, it will take some effort to start writing formal specifications and verifying the code, but this is no more difficult than, for example, figuring out how to use the compiler or the project build tool. But we are talking about days or weeks, not months or years. Well, how do you accept the challenge?

Best wishes,
Stein de Gove , Jurian Roth , Frank de Boer , Richard Bubel and Rainer Henle .

Thanks

This work was partially supported by the project of the seventh framework program of the European Union FP7-610582 ENVISAGE: Engineering Virtualized Services .
This article would not have been written without the support and polite reminders of Amunde Tveit ! We also want to thank Beruza Nobacte for providing us with a video showing an error.

Envisage logo

Source: https://habr.com/ru/post/251751/

All Articles