Static analysis and property-based testing: together we are powerful

As you know, there are bugs in all programs. There are many ways to deal with them: unit tests, review, static analysis, dynamic analysis, smoke testing, and so on. Sometimes to eradicate a certain bug it is useful to combine different techniques.

I am developing Java Inspection at IntelliJ IDEA, which is mostly written in Java. In a sense, I am in a privileged position compared to other programmers: to refine the static IDE analyzer in order to find a new class of errors - this is my direct working duty, which, at the same time, allows finding and disabling bugs in the same IDE. I want to share one such success story.

In early October, a colleague threw me a report with a hung property-based test . This test does something like the following: opens a random Java file from IDEA sources, makes some random edits, puts the cursor in a random place, applies some random quick-fix that is available at that place, and so on. The colleagues did this thing at the summer hackathon and thanks to it they managed to catch and neutralize many bugs in the inspections before they were reported by users. Also reproduced the errors that were once reported by users, but for some reason they were not reproduced here. In general, a very useful thing. If anyone is interested, the source is all available in the community IDEA version on GitHub .

So, the report. The test fell on a timeout, but it gives the initial value of the pseudo-random number generator, which can be used to reproduce the behavior. Looking at the stack-trace, we managed to find out that the test hung in a fairly simple endless loop inside the inspection, reporting string concatenation in the loop:

PsiElement parent = PsiUtil.skipParenthesizedExprUp(expression.getParent()); while (parent instanceof PsiTypeCastExpression || parent instanceof PsiConditionalExpression) { parent = PsiUtil.skipParenthesizedExprUp(expression.getParent()); }

Let me tell you why this cycle is needed. String concatenation in a loop is known to be harmful, and often interchangeable with StringBuilder. However, in some cases it makes no sense to warn about such a concatenation. For example, if after each concatenation the current result is used:

 String dots = ""; for(int i=0; i<10; i++) { dots += "."; pacman.eatSomeDots(dots); }

There is no point in replacing it with StringBuilder , since every intermediate string is needed for passing to the method. If it is possible to optimize, then it is not trivial. It is here (or rather just below) that we check how the occurrences of references to a string variable are used. Here, expression is another reference to a string in the source text of the program being analyzed (in our example, the parameter of the eatSomeDots method). Obviously, if we call eatSomeDots((dots)) , or eatSomeDots((String)(dots)) , or even eatSomeDots(flag ? dots : "***") , then the line dots should still be considered always used. Therefore, we climb up the hierarchy of expressions up, skipping possible brackets, type conversions (PsiTypeCastExpression) and conditional operators (PsiConditionalExpression).

However, a banal typo has crept into the loop: there should be no parent = PsiUtil.skipParenthesizedExprUp(expression.getParent()); , and parent = PsiUtil.skipParenthesizedExprUp(parent.getParent()); . It turned out to be an unpleasant situation: if the condition of the loop is true, then the parent reassigned to the same value that it already had, with the result that the loop becomes infinite.

Ironically, the code, the purpose of which is to deal with performance problems in cycles, itself contains a problem in the cycle, albeit of a completely different sense. This was not noticed on the review and for some reason it turned out that such a case was not covered by the unit test. It is unpleasant. I, of course, corrected the code and wrote a unit test. Is it worth it to stop? No, every mistake must be analyzed to determine why it occurred, why it went unnoticed and what we can do to prevent the situation from repeating.

There was no suitable unit test, which means that they never entered the loop body in tests - this is bad. If we were to monitor code coverage regularly, this problem would probably have been avoided immediately. But just to start demanding 100% test coverage is also a road to nowhere. It was a review, it was good, but such an error was easy to miss, it was also wrong to blame the reviewer. The fact that the property-based-test caught it, and earlier users - is generally excellent. It means that such tests are really useful stuff and we made them not in vain. But I, as the IDE author, was worried about another question: why couldn’t any of our inspections have warned of such a mistake? Is it possible to write an inspection that marks this code? Or can improve existing?

We have several inspections that allow us to detect endless cycles. For example, if the parent variable were not assigned in a loop, then the Loop variable is not updated inside loop inspection would work. But it is assigned. If the condition were always true, the inspection of “Constant conditions & exceptions” would have worked. But it can be false if we never enter the cycle. Interestingly, the skipParenthesizedExprUp and getParent methods have no visible side effects, and the analyzer knows this because they are marked with the @Contract(pure = true) annotation. Also, the cycle condition has no side effect. It turns out that the entire side effect of one iteration of the loop is assignment to the local variable parent .

Let's call a side effect, which changes only local variables (the variables themselves, and not, for example, the fields of the objects to which these variables refer) as “local side effects”, and any other side effect - non-local. A local side effect can only affect the behavior of the current method. If a cycle has only a local side effect, and its condition does not have any side effects at all, let's collect all the variables that we write to during one iteration of the cycle. In our case, this is only the parent variable. Let me remind you, the body of the loop looks like this:

 parent = PsiUtil.skipParenthesizedExprUp(expression.getParent());

Now let's see if the value of this variable is used before the fact of assignment. We can do this using simple control flow analysis. This is generally a necessary operation for issuing correct error messages in Java: if a local variable is used before the first assignment, such a Java program is considered incorrect. But we can apply this analysis only to the body of our cycle. If the variable recorded in the body is used before assignment, then the value from the last iteration affects the next iteration. Otherwise, the next iteration will do the same as the previous one, and we will not get new effects from subsequent iterations!

In our loop, the situation is quite simple: the value of parent from the previous iteration in the body of the loop is not used at all. However, this approach will allow you to catch more complex cases. For example, imagine that another variable is written in a loop:

 PsiExpression immediateParent = null; while (parent instanceof PsiTypeCastExpression || parent instanceof PsiConditionalExpression) { immediateParent = expression.getParent(); parent = PsiUtil.skipParenthesizedExprUp(immediateParent); }

Now we write two variables and read one of them during the iteration. But we read guaranteed after it was written down, which means that we still never use the value from the previous iteration.

How to characterize such a cycle? Cycle without side effects? But there is a side effect. The problem is that this effect is always the same after the first iteration. That is, either we will not enter the cycle at all, or we will execute one iteration. If, after the first iteration, the condition of the loop remains true, then nothing new will happen on subsequent iterations, and we will loop. Then the word idempotency came to my mind. The body of such a cycle is idempotent: re-performing the body has no effect. And so a new inspection of the “ Idempotent loop body ” appeared.

We made sure that the inspection highlighted the initial error cycle. However, it is worth checking out what else it highlights and whether it gives false positives. False workings can negate all the benefits of the inspection, such an inspection will only annoy and turn it off. On the IDEA Ultimate source code, there were four more draws and it turned out that they are all correct: these are really four potentially infinite loops. Such a trivial error, for example, was found in refactoring, which converts Groovy to Java:

 private static String getNewFileName(GroovyFile file) { ... String prefix = FileUtil.getNameWithoutExtension(file.getName()); String fileName = prefix + ".java"; int index = 1; while (fileNames.contains(fileName)) { fileName = prefix + index + ".java"; } return fileName; }

Pretty standard code to generate a unique file name. In general, it is strange that it is written directly in the refactoring code. For good this should be reusable utility solution. By the way, another way to avoid bugs - do not reinvent the wheel. It is better to debug a single reference bike and ride it all.

In the code, obviously, an error, which is described by our script. If there is no file named prefix + ".java" , then this name will be used. If there is a file with this name, then the name prefix + "1.java" will be generated. But if there is such a file, then we will have an infinite loop, because subsequent iterations will not give new effects (here the effect is a change in fileName ). Probably the first two cases were tested, and the third was missed. The fix is quite simple - you need to increase the index variable in the loop.

This story showed how important it is to analyze the error found. Property-based-test revealed one infinite loop, that's good. But such tests also cover not everything. In this case, it turned out that the error can also be found statically and after a corresponding refinement of the static analyzer, we found four more similar errors. Does this mean that static analysis is five times better than property-based-tests? No, because without a test, we would not have known about this problem at all (at least until the users stumbled upon it). In addition, the property-based-test can find errors that cannot be described by static rules at all. Different search and error prevention techniques are powerful when they work together. If you think about how to detect the error found statically, you can save a lot of time and effort in the future. Of course, not everyone can easily modify the static analyzer. But you can search for simple cases in IDEA using the Structural search (you can also configure the inspection to highlight your own template). If you ran into a more complicated, but potentially formalized situation, you can always throw an idea into the official bug tracker of your IDE.

The “Idempotent loop body” inspection is available in EAP-versions of IDEA 2018.1, so you can try it now. Program with pleasure!

Source: https://habr.com/ru/post/347676/

All Articles

Static analysis and property-based testing: together we are powerful

More articles: