Control of integer ranges in FindBugs

FindBugs is an open source static code analyzer for Java (under the LGPL ). It contains many detectors that identify certain problems in the code. Recently, I am a participant in the project and am writing new detectors for it. I will tell about one of them in this article. We will also look at examples of bugs found in real projects.

FindBugs device

FindBugs does not analyze the source code, but bytecode. This method has some drawbacks. It is impossible to write a detector that is based only on incorrect formatting of the code (for example, it is possible that the else omitted else in else if, like V646 in PVS-Studio ). It is impossible to detect some situations in which the same byte code is generated. For example, it would be nice to detect such a strange code (which is issued with an IDEA warning and the same PVS-Studio ):

while(a < 5) { a++; return; }

But if we replace while with if, the compilers will generate exactly the same bytecode, so FindBugs cannot see the difference. It is important to keep this in mind when you use or plan to expand FindBugs.

The Byte Code Engineering Library (6.0-SNAPSHOT) from Apache and sometimes ASM 5.0.2 is used to work with bytecode. Apparently, this situation happened for historical reasons: when the FindBugs project started, BCEL was the only suitable option, but now, unfortunately, it hardly develops. In fact, the FindBugs authors themselves are refining it so that new versions of class files can be analyzed.

The main part of the work of FindBugs is performed in detectors, which are involved in issuing warnings. There are also special detectors that collect some information about the code (for example, build a call graph) and save it for future use by other detectors. FindBugs works in two passes: on the first pass, these special detectors work mainly and they analyze not only the classes of the project itself, but also the classes of the libraries used. There are also analysis factories that build a specific analysis of a class or method on the demand of a particular detector (for example, analyzing the flow of null and non-null values within a method).
')

Task

I wanted to catch logical errors like these:

 int test(int x) { if( x > 10 && x > 5 ) { return 1; } return 0; }

The second condition is obviously useless here, because if x is greater than 10, then it is at least as large as 5. The given example is very simple, although such errors occur in real code. In general, the task is to find the conditions on the integers, which for any possible value of the number is always true or always false.

Control flow graph

In compilation theory, there is such a thing - control flow graph (CFG). This is similar to the good old flowchart built on the program code: the vertices of the graph are the basic blocks, the code in which is executed linearly without any transitions, and the edges - the possible transition directions. The control flow graph can be quite large even for a simple method, because most instructions can generate an exception, for which an edge is added. The control flow graph for each method is already built in FindBugs, so it can be used. This is how the graph for the method given above looked like:

For our task, we first need to select all the parameters and local variables that can participate in the analysis. So far I have limited myself to those that never change (perhaps in the future the restriction will be relaxed). That is, we are interested in parameters that are never assigned, and variables that are assigned exactly once. Such a list is easy to compile by simply running over the instructions of the method in search of instructions for writing to a local variable. For each variable, we initially set the allowable range of values. For example, for the int type, this would be [Integer.MIN_VALUE..Integer.MAX_VALUE].

Next, we iterate over those CFG nodes that contain the condition. If this is a comparison of a variable of interest with a constant, then we divide the corresponding interval according to the condition. For example, for the condition x> 5 we get [Integer.MIN_VALUE..5] + [6..Integer.MAX_VALUE], and for x! = 0 we get three intervals: [Integer.MIN_VALUE ..- 1] + {0} + [1..Integer.MAX_VALUE]. Also for each such condition, an interval is preserved that satisfies the condition.

Then for each variable we are interested in we run through all the intervals of the definition domain and for each interval we walk along the flow graph, starting from the input node. Moreover, if we find ourselves in a conditional node associated with a given variable, then we go only along one edge depending on whether the condition is met at a given interval, and mark which one we went through. For all other nodes we go through all kinds of outgoing edges. Our small method divides the domain of x into three intervals, for which the traversal of the graph will look like this:

Note that in the first two cases the same thing happened. In the end, it remains only to find edges that we have never reached and generate an error message on them. If the final vertex of this edge was also not reached, this means that we have not just a useless condition, but also a code that will never be executed. In this case, I increase the priority of the error message. For the above example, there is an unreachable edge, but all vertices are reachable:

And here is a simple method in which an unreachable vertex appears (the body of the if operator):

 int test(int x) { if( x > 10 && x < 5 ) { return 1; } return 0; }

CFG will look like this:

finally

It all worked great until I ran into finally. Compilers duplicate finally-blocks in byte-code, which causes unattainable conditions that were not in the source. Let's say such a simple example:

 int test(int x) { try { if( x > 0 ) return -1; return 0; } finally { if( x <= 0 ) log.info("x <= 0!"); } }

The code is quite normal, but in the compilation process it turns into something like this (without taking into account possible exceptions):

 int test(int x) { if( x > 0 ) { if( x <= 0 ) log.info("x <= 0!"); return -1; } if( x <= 0 ) log.info("x <= 0!"); return 0; }

As a result, our new detector complains that in the first case, x <= 0 is always false and there is a dead code, and in the second, that x <= 0 is always true. There was no ready-made solution for analyzing such situations in FindBugs. Only in the null analysis some assumptions are made based on the table of line numbers. It seemed to me unreliable (especially this table may not be), and I wrote a separate analyzer, which with a small amount of heuristics finds duplicate finally-blocks in byte code and can output all its duplicates for a given CFG edge. Actually, in the process of this work, the idea was born to write a comic article . Now I issue an error only if the edge was not reached in all duplicates.

Bugs found

Now the fun part. I usually analyze a pack of open source projects for tests, so examples will be from different projects.

Errors due to a range of type values

This is a side effect of the analysis, about which I did not immediately think: sometimes the condition is useless, not because the condition above contradicts it, but because the type of the variable is limited. I made a separate message for him. Here is an example, about which I will not even speak, from which project it is. This method with small variations was thoughtlessly copied into a dozen projects that use XML:

 public final static boolean isNameStartChar(char ch) { return (ch > bA && ch < aZ) || (ch > ba && ch < az) || (ch == ':') || (ch == '_') || (ch > 0xBF && ch < 0xD7) || (ch > 0xD7 && ch < 0xF7) || (ch > 0xF7 && ch < 0x300) || (ch > 0x36F && ch < 0x37E) || (ch > 0x37E && ch < 0x2000) || (ch > 0x200B && ch < 0x200E) || (ch > 0x206F && ch < 0x2190) || (ch > 0x2BFF && ch < 0x2FF0) || (ch > 0x3000 && ch < 0xD800) || (ch > 0xF900 && ch < 0xFDD0) || (ch > 0xFDEF && ch < 0xFFFE) || (ch > 0x0FFFF && ch < 0xF0000); // ch > 0xFFFF   }

The last pair of conditions is meaningless: the value of a char variable cannot be greater than 0xFFFF. If the authors planned to support such characters, it is necessary to analyze the UTF-16 surrogate pairs, the analysis of a single char is not enough.

Here is a more specific - Utility # encodeRun from the IBM ICU project:

 private static final char ESCAPE = '\uA5A5'; private static final <T extends Appendable> void encodeRun(T buffer, short value, int length) { try { if (length < 4) { for (int j=0; j<length; ++j) { if (value == (int) ESCAPE) // value == 0xA5A5   buffer.append(ESCAPE); buffer.append((char) value); } } .... }

ESCAPE when converting to int is equal to 0xA5A5 (42405) and can in no way equal to a variable of type short, which takes values from -32768 to 32767.

Some of these errors were already caught by FindBugs (for example, comparing byte with numbers greater than 127), now it is caught more. But in general, it is not interesting, for such a general and IDE could follow without problems.

Used && instead of ||

One of the common blunders. Usually found in validating input parameters, which is not always properly covered by unit tests. For example, in SegmentArrayWithData # setElementAt in IntelliJ IDEA:

 public void setElementAt(int i, int startOffset, int endOffset, int data) { if (data < 0 && data > Short.MAX_VALUE) throw new IndexOutOfBoundsException("data out of short range" + data); ... }

Here data> Short.MAX_VALUE is meaningless, because there can not be a number at the same time less than 0 and more than 32767. As a result, a dead code is formed: the exception will never be thrown. A similar bug was found in various projects, but I was surprised to see it in IDEA: they themselves have a built-in serious static analyzer, which probably should catch this. Perhaps overlooked.

Used || instead of &&

This is much less common. Here again from ICU - ULocale # parseAcceptLanguage :

 case 8: // before q value fraction part if ('0' <= c || c <= '9') { // c <= '9'   ... } else { // invalid state = -1; } break;

The condition c <= '9' is meaningless: if it turns out that c <'0', then it is certainly <= '9'. As a result, the whole if is useless, the else branch will never be executed.

The same condition in two branches

Example from Eclipse Mylyn - DefaultXmlStreamWriter # printEscaped :

 private static void printEscaped(PrintWriter writer, int ch, boolean attribute) throws IOException { String ref = getEntityRef(ch, attribute); if (ref != null) { writer.write('&'); writer.write(ref); writer.write(';'); } else if (ch == '\r' || ch == 0x0085 || ch == 0x2028) { printHex(writer, ch); } else if ((ch >= ' ' && ch != 160 && isUtf8Printable((char) ch) && XML11Char.isXML11ValidLiteral(ch)) || ch == '\t' || ch == '\n' || ch == '\r') { // ch == '\r'   writer.write((char) ch); } else { printHex(writer, ch); } }

Here the condition ch == '\ r' is in two else if branches. Of course, only the first will work. There is no dead code, but it is unclear what the author meant.

Mutually exclusive paragraphs

Here's a pretty fresh code from the same Mylyn - EscapeBlock # canStart :

 public boolean canStart(String line, int lineOffset) { if (lineOffset == 0) { Matcher matcher = START_PATTERN.matcher(line); if (lineOffset > 0) { // lineOffset > 0   matcher.region(lineOffset, line.length()); } if (matcher.matches()) { return true; } } return false; }

If lineOffset is 0, then it can never be greater than 0.

Forgot what happened before?

Example from the project Jmol - GromacsReader # readAtoms

 for (int i = 0; i < modelAtomCount; ++i) { rd(); int len = line.length(); if (len != 44 && len != 68) { Logger.warn("line cannot be read for GROMACS atom data: " + line); continue; } ... if (len < 69) // len < 69   continue; float vx = parseFloatRange(line, 44, 52) * 10; float vy = parseFloatRange(line, 52, 60) * 10; ... }

At the beginning of the loop, the length of the line read from the file is checked. If it is not equal to 44 or 68, issue a warning and proceed to the next iteration of the loop. Toward the end of the iteration is another test for len: now if (len <69). Only two len values could reach here: 44 or 68. Both are less than 69, so the condition always works and the last 6 lines of the cycle will never be fulfilled.

Entangled in comparison operators

Again ICU - CollationParsedRuleBuilder # isJamo :

 private static final boolean isJamo(char ch) { return (ch >= 0x1100 && ch <= 0x1112) || (ch >= 0x1175 && ch <= 0x1161) // ch <= 0x1161   || (ch >= 0x11A8 && ch <= 0x11C2); }

The average pair of conditions finds numbers in the range from 0x1175 to 0x1161. Of course, there cannot be such numbers, since 0x1161 is less than 0x1175. Here we issue a warning that the condition ch <= 0x1161 is useless. Dead code is not formed, but there is definitely some kind of error.

Computation overflow

Here, Andrey2008 constantly asks if he can show funny bugs from the PVS-Studio code, found by PVS-Studio, and he answers that all the bugs are usually fixed right away, as they use incremental verification, and they don’t even get into the repository. In general, this is also true for FindBugs: The Eclipse project in FindBugs is configured for incremental checking, so it is difficult to find something serious in FindBugs, testing it yourself. But everything changes when you develop a new detector. Then he can really find a very tasty bug in the old code, which is what happened. See the DumbMethods # checkForCompatibleLongComparison Method

 private void checkForCompatibleLongComparison(OpcodeStack.Item left, OpcodeStack.Item right) { if (left.getSpecialKind() == Item.RESULT_OF_I2L && right.getConstant() != null) { long value = ((Number) right.getConstant()).longValue(); if ( (value > Integer.MAX_VALUE || value < Integer.MIN_VALUE)) { int priority = Priorities.HIGH_PRIORITY; if (value == Integer.MAX_VALUE+1 || value == Integer.MIN_VALUE -1) { //     priority = Priorities.NORMAL_PRIORITY; } ... }

This code (written, by the way, more than four years ago) is executed if in the source code being analyzed the number of type int is compared with the number of type long, which does not fit into the range int. Such a comparison is meaningless, so the INT_BAD_COMPARISON_WITH_INT_VALUE warning is generated here. It was supposed to set the priority lower if the comparison is made with a number that is on the border of a range of integers (Integer.MAX_VALUE + 1 or Integer.MIN_VALUE-1). But the fact is that these expressions themselves are calculated in integers and an overflow occurs: instead of Integer.MAX_VALUE + 1 we get Integer.MIN_VALUE, and instead of Integer.MIN_VALUE-1 we get Integer.MAX_VALUE. Moreover, the compiler performs this calculation, and the resulting number is already in the bytecode. This is where my new detector flashed: it follows from the above condition that neither Integer.MIN_VALUE nor Integer.MAX_VALUE can be in value at this point. As a result, the condition will never be fulfilled, and the priority will never be lowered. Of course, you should have written

 if (value == Integer.MAX_VALUE+1L || value == Integer.MIN_VALUE-1L)

It is clear why this bug went unnoticed for years: users could not even guess about the intention of the authors to lower the priority in this case, and the tests in FindBugs usually check if there is a certain warning on a certain code or not, but do not check its priority.

Reinsurance

Quite a lot of messages are issued on reinsurance conditions such as:

 if( x > 0 ) { ... } else if( x <= 0 ) { ... }

The second condition is obviously not necessary, although someone may argue that it additionally documents the code. Nevertheless, I believe that if an explanation is needed, it is better to write it in a comment. Anyway, earlier FindBugs already cursed such code:

 if( obj == null ) { ... } else if( obj != null ) { ... }

Therefore, I believe that I did not break the general concept.

Another such code is found:

 if( x > 0 ) { ... //   if( x > 0 ) { ... //   } ... }

Usually there is no error here, but it looks as if the author himself does not remember which branch of code he is in. Perhaps this is a sign that the method is very complex.

Conclusion

Those who want to try a new detector can download the code from our repository and compile it with ant. If someone is too lazy to compile, I made unofficial assemblies from the current code - findbugs.jar and Eclipse update site (unfortunately, the official Eclipse daily site does not work). The new alerts are called UC_USELESS_CONDITION and UC_USELESS_CONDITION_TYPE (category Dodgy code).

If anyone is interested in the implementation, most of the code in the ValueRangeAnalysisFactory class, the detector itself is RedundantConditions , and the search for duplicate finally-blocks is FinallyDuplicatesInfoFactory . The code is still inconclusive, but quite working. There may be false drawbacks in asserts, we will deal with this.

Errors are in any program, and static analysis is good.

Source: https://habr.com/ru/post/240121/

All Articles