While a static analyzer runs, it can often calculate the exact values, or the ranges of values, of some variables and expressions. This information is useful when searching for errors. We call such values virtual values, and this article is about them.

If the static analyzer can calculate what an expression evaluates to, it can analyze the code more deeply and find more errors. This covers not only exact values of expressions such as 1 + 2, but also the range of values that a variable can take at a certain point in a program. In the PVS-Studio analyzer, we call the algorithms responsible for calculating ranges the mechanism of virtual values. Such a mechanism exists both in the core of the C/C++ analyzer and in the core of the C# analyzer. In this article we will look at the mechanism of virtual values using the C# analyzer as an example.
To find errors in C# projects, PVS-Studio uses Roslyn to get all the necessary information about the source code. Roslyn provides us with a syntax tree, type information, dependency search, and so on. During the analysis, PVS-Studio traverses the syntax tree and applies diagnostics to its nodes. In addition, during the traversal the analyzer collects information that it can use later. Virtual values are one example of such information.
Creating virtual values
Virtual values are stored for fields, properties, variables, and parameters when first mentioned in the code. If the first mention is a declaration of a variable or an assignment, then we will try to calculate the virtual value by analyzing the expression to the right of the equal sign. Otherwise, we usually know nothing about the property / field / parameter and believe that it can take any valid value. Consider an example:
public class MyClass
{
  private bool hasElements = false;

  public void Func(byte x, List<int> list)
  {
    int y = x;
    hasElements = (list.Count >= 0);
    if (hasElements && y >= 0)
When the analyzer reaches the body of the Func function while traversing the tree, it begins to calculate the virtual values of the variables. The first line declares the variable y, which is initialized with x. Since all we know about x is that it is of type byte, it can take any value from 0 to 255. This range is assigned as the virtual value of the variable y. The same is done for the hasElements field: the analyzer knows that the Count property of a list cannot be negative, so the value true is assigned to hasElements. Now, when analyzing the condition hasElements && y >= 0, we know that both the left and the right sides are true, so the whole condition is always true. This is where the V3022 diagnostic is triggered.
Let's take a closer look at how the virtual value of an expression is calculated.
Calculating the virtual value of an expression
The virtual value is calculated differently for variables of different types. For a variable of an integer type, the virtual value is stored as a set of value ranges that the variable can take. For example, consider the following code:
public void Func(int x)
{
  if (x >= -10 && x <= 10 && x != 0)
  {
    int y = x + 5;
  }
}
At the beginning of the function, nothing is known about the variable x, and its range is all valid values of type int: [int.MinValue, int.MaxValue]. When entering the if block, we can refine the virtual value based on the condition. Thus, inside the if block, the variable x will have the ranges [-10, -1], [1, 10]. If x is now used when calculating an expression, the analyzer will take its virtual value into account. In our example, y will get the virtual value [-5, 4], [6, 15].
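The range arithmetic above can be sketched as follows. This is a minimal illustration only: RangeSet and its methods Clamp, Without, and Plus are hypothetical names invented for this sketch and say nothing about how PVS-Studio stores ranges internally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of an integer virtual value: a sorted list of
// closed, non-overlapping ranges. Not PVS-Studio's real data structure.
public sealed class RangeSet
{
    public readonly List<(long Min, long Max)> Ranges;

    public RangeSet(params (long Min, long Max)[] ranges)
    {
        // Drop empty ranges (Min > Max) and keep the rest sorted.
        Ranges = ranges.Where(r => r.Min <= r.Max).OrderBy(r => r.Min).ToList();
    }

    // Keep only values inside [min, max]; models "x >= min && x <= max".
    public RangeSet Clamp(long min, long max) =>
        new RangeSet(Ranges
            .Select(r => (Math.Max(r.Min, min), Math.Min(r.Max, max)))
            .ToArray());

    // Remove one value; models "x != value". A range may split in two.
    public RangeSet Without(long value) =>
        new RangeSet(Ranges
            .SelectMany(r => new (long, long)[]
            {
                (r.Min, Math.Min(r.Max, value - 1)),
                (Math.Max(r.Min, value + 1), r.Max)
            })
            .ToArray());

    // Shift every range by a constant; models "x + k".
    public RangeSet Plus(long k) =>
        new RangeSet(Ranges.Select(r => (r.Min + k, r.Max + k)).ToArray());

    public override string ToString() =>
        string.Join(", ", Ranges.Select(r => $"[{r.Min}, {r.Max}]"));
}

public static class RangeSetDemo
{
    public static void Main()
    {
        var x = new RangeSet((int.MinValue, int.MaxValue)); // nothing known yet
        var insideIf = x.Clamp(-10, 10).Without(0);         // the if condition
        var y = insideIf.Plus(5);                           // int y = x + 5

        Console.WriteLine(insideIf); // [-10, -1], [1, 10]
        Console.WriteLine(y);        // [-5, 4], [6, 15]
    }
}
```

The sketch uses long bounds so that operations near int.MinValue and int.MaxValue do not overflow inside the model itself.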
For expressions of type bool, the virtual value is calculated differently. Here there are only three options: false, true, or unknown. Therefore, we simply enumerate a sufficient number of values for all the variables of the expression and check whether the expression takes the same value in every case. For example:
public void Func(uint x)
{
  bool b = (x >= 0);
Whatever value we take for the parameter x, the expression x >= 0 is always true. Therefore, by substituting several values for x, we make sure of this and assign true as the virtual value of b.
Another example from the umbraco project:
private object Aggregate(object[] args, string name)
{
  ....
  if (name != "Min" || name != "Max")
To make sure that the condition in the example is always true, the analyzer substitutes the values "Min", "Max", "", and null for name. In each of these cases, either the left or the right side of the expression is true, so the expression in the condition is always true.
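The substitution idea can be sketched like this. ConditionProbe and Evaluate are hypothetical names used only for illustration, and the && variant is merely an assumption about what the author of the checked code might have intended.

```csharp
using System;
using System.Linq;

public static class ConditionProbe
{
    // Returns true or false when the condition gives the same result for
    // every sample, or null ("unknown") when the results differ.
    public static bool? Evaluate<T>(Func<T, bool> condition, params T[] samples)
    {
        var results = samples.Select(condition).Distinct().ToList();
        return results.Count == 1 ? results[0] : (bool?)null;
    }

    public static void Main()
    {
        // Samples from the text: both literals, an unrelated string, and null.
        string[] names = { "Min", "Max", "", null };

        // The condition from the example.
        bool? original = Evaluate(name => name != "Min" || name != "Max", names);

        // Assumed intended variant with &&, shown only for contrast.
        bool? withAnd = Evaluate(name => name != "Min" && name != "Max", names);

        Console.WriteLine(original.HasValue); // True: same result for every sample
        Console.WriteLine(withAnd.HasValue);  // False: the result depends on name
    }
}
```

Sampling a handful of values is enough here only because the expression compares name with two literals; the choice of samples is what makes the check meaningful.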
Virtual values calculated for all variables are stored separately for each block. When entering a nested block, the analyzer creates its own set of virtual values based on the parent block. For a simple nested block, this is just a copy of all virtual values. For conditions, loops, and other blocks, virtual values are not just copied, they may be subject to additional restrictions.
Refinement of virtual values in the if / else block
Consider for example how virtual values behave when processing an if / else block.
public void Func(int x)
{
  if (x >= 0)
  {
After analyzing the condition x >= 0, PVS-Studio limits the range of the variable x in the first if block to the values [0, int.MaxValue]. After processing the second condition, x <= 10, the analyzer creates two more copies of the virtual value of x: one for the if block and the other for the else block. These copies are restricted by both the virtual value of the same variable in the parent block and the virtual value of the expression in the condition. That is, for the nested if block the virtual value of x will be [0, 10], and for the else block it will be all the remaining values, [11, int.MaxValue].
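As a rough sketch of this split, the helper below divides a parent range by a condition of the form x <= bound. BranchRefinement and SplitByLessOrEqual are hypothetical names, and a single range is used instead of a set of ranges to keep the example short.

```csharp
using System;

public static class BranchRefinement
{
    // Splits the range [x.Min, x.Max] by the condition "x <= bound".
    // Returns the sub-range for the if branch and for the else branch;
    // a branch that cannot be reached is returned as null.
    public static ((long Min, long Max)? Then, (long Min, long Max)? Else)
        SplitByLessOrEqual((long Min, long Max) x, long bound)
    {
        (long Min, long Max)? thenRange = null;
        (long Min, long Max)? elseRange = null;

        if (x.Min <= bound)
            thenRange = (x.Min, Math.Min(x.Max, bound));
        if (x.Max > bound)
            elseRange = (Math.Max(x.Min, bound + 1), x.Max);

        return (thenRange, elseRange);
    }

    public static void Main()
    {
        // Inside the outer "if (x >= 0)" the parent range is [0, int.MaxValue].
        (long Min, long Max) parent = (0, int.MaxValue);

        var (thenRange, elseRange) = SplitByLessOrEqual(parent, 10);

        Console.WriteLine(thenRange); // (0, 10)
        Console.WriteLine(elseRange); // (11, 2147483647)
    }
}
```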
After traversing the if / else block, we need to combine the virtual values from the if and else blocks for each variable. Note that if the if or else block ends with a jump statement, for example return, then the values from that block do not need to be combined. Examples:
public void Func1(int x)
{
  bool b1 = false;
  bool b2 = false;
  if (x >= 0)
  {
    ....
    b1 = true;
    b2 = true;
  }
  else
  {
    ....
    b1 = true;
  }
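The merge rule can be sketched for bool values as follows. BranchMerge and Merge are hypothetical names, and the inputs are read off the visible part of Func1 only: b1 is set to true in both branches, while b2 is set to true in the if branch and keeps its initial false in the else branch.

```csharp
using System;
using System.Linq;

public static class BranchMerge
{
    // A bool virtual value is true, false, or null for "unknown".
    // Each branch reports its final value and whether it ends with a jump
    // statement such as return; such branches never reach the merge point.
    public static bool? Merge(params (bool? Value, bool EndsWithJump)[] branches)
    {
        var reachable = branches.Where(b => !b.EndsWithJump)
                                .Select(b => b.Value)
                                .Distinct()
                                .ToList();
        return reachable.Count == 1 ? reachable[0] : null;
    }

    public static void Main()
    {
        bool? b1 = Merge((true, false), (true, false));  // true in both branches
        bool? b2 = Merge((true, false), (false, false)); // branches disagree

        // If the else branch ended with return, only the if branch would count.
        bool? b2ElseReturns = Merge((true, false), (false, true));

        Console.WriteLine(b1?.ToString() ?? "unknown");            // True
        Console.WriteLine(b2?.ToString() ?? "unknown");            // unknown
        Console.WriteLine(b2ElseReturns?.ToString() ?? "unknown"); // True
    }
}
```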
Loop processing
The peculiarity of calculating virtual values for loops is that the body of the loop must be traversed twice. Consider an example.
public void Func()
{
  int i = 0;
  while (i < 10)
  {
    if (i == 5)
    {
      ....
    }
    i++;
  }
}
If we simply copied the virtual values from the parent block to the while block, then when analyzing the condition i == 5 we would get a false positive V3022, since we know that before the loop the variable i was zero. Therefore, before analyzing the body of the loop, we need to calculate what values the variables can take at the end of an iteration, as well as in all the blocks containing a continue statement, and combine all these values with the values of the variables before entering the loop. In addition, when analyzing a for loop, we need to take into account the initialization and the update of the counter. After the values from all possible points of entry into the loop body are combined, the loop condition must be applied in the same way as for an if block. This gives us the correct virtual values for the variables, and we can perform the second traversal of the loop, during which the diagnostics are applied.
After traversing the loop, we need to combine the virtual values of the variables from all points from which control can reach the statement that follows the loop. These are the values before the beginning of the loop (if no iteration is performed), the values at the end of the loop body, and the values in the blocks containing break or continue statements. All of these values were already calculated and remembered during the first traversal of the loop. Now they must all be combined, and the negation of the loop condition must be applied.
That was a complicated explanation, so let's look at an example:
public void Func(bool condition1, bool condition2, bool condition3)
{
  int x = 0;
  while (condition1)
  {
In this example, the variable x is zero before entering the loop. After the first pass through the loop, the analyzer will have calculated that x can also take the values 1, 2 (in the block with break), and 3 (in the block with continue). Since there are three points from which control reaches the start of an iteration, at the beginning of the loop body the variable x can take the values 0, 1, or 3. The statement that follows the loop can be reached from four points, so there x can be 0, 1, 2, or 3.
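The two merges in this example amount to set unions, which can be sketched as below. LoopMerge and Union are hypothetical names. The values are taken from the text; attributing the value 1 to the end of the loop body is an assumption, since the listing is cut off.

```csharp
using System;
using System.Collections.Generic;

public static class LoopMerge
{
    // Combines the values of a variable recorded at several program points.
    public static SortedSet<int> Union(params IEnumerable<int>[] sources)
    {
        var result = new SortedSet<int>();
        foreach (var source in sources)
            result.UnionWith(source);
        return result;
    }

    public static void Main()
    {
        // Values of x recorded during the first pass over the loop.
        int[] beforeLoop = { 0 };
        int[] endOfBody = { 1 };  // assumed: value at the end of the loop body
        int[] atBreak = { 2 };
        int[] atContinue = { 3 };

        // Start of an iteration: loop entry, end of the body, continue.
        var atLoopStart = Union(beforeLoop, endOfBody, atContinue);

        // Statement after the loop: all of the above plus break.
        var afterLoop = Union(beforeLoop, endOfBody, atBreak, atContinue);

        Console.WriteLine(string.Join(", ", atLoopStart)); // 0, 1, 3
        Console.WriteLine(string.Join(", ", afterLoop));   // 0, 1, 2, 3
    }
}
```

The loop condition and its negation are not applied here because condition1 does not constrain x.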
The analyzer also calculates which values the variables can take within the case blocks of the switch statement, within try / catch / finally, and for other language constructs.
Division by zero
Division by zero is one of the errors that can be found with virtual values. The peculiarity of this diagnostic is that not every division that could theoretically have a zero in the denominator should trigger it. Consider an example:
public int GetBlockCount(int dataLength, int blockSize)
{
  return dataLength / blockSize;
}
In this function, blockSize can theoretically take any value of type int, and zero also falls within this range. But if the analyzer issued warnings for such code, the diagnostic would lose its meaning, since useful warnings would be lost among hundreds of false positives. Therefore, we needed to teach the analyzer to pick out the really suspicious divisions, for example, the following:
public string GetDownloadAvgRateString()
{
  if (SecondsDownloading >= 0)
  {
    return GetSpeed(Downloaded / SecondsDownloading);
  }
  else
  {
    return "";
  }
}
or this:
public void Func(int x, int y)
{
  for (int i = -10; i <= 10; i++)
  {
    y = x / i;
  }
}
As a solution, we divide the ranges of virtual values into exact and inexact. By default, a range is considered inexact until it is refined by an explicit assignment of a constant or of a variable with an exact range, or by the condition of an if statement or a loop. If zero falls inside or on the border of an exact range, the division by zero diagnostic is triggered.
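A minimal sketch of this rule, assuming a single range with an exactness flag, might look as follows. DivisorRange and all of its members are hypothetical names, not part of PVS-Studio.

```csharp
using System;

// Hypothetical range with an "exact" flag, used only for illustration.
public sealed class DivisorRange
{
    public long Min { get; }
    public long Max { get; }

    // True once the range has been refined by an assignment or a condition.
    public bool IsExact { get; }

    public DivisorRange(long min, long max, bool isExact)
    {
        Min = min;
        Max = max;
        IsExact = isExact;
    }

    // A parameter we know nothing about: the full int range, inexact.
    public static DivisorRange UnknownInt() =>
        new DivisorRange(int.MinValue, int.MaxValue, isExact: false);

    // Applying a condition such as "x >= min" makes the range exact.
    public DivisorRange RefineAtLeast(long min) =>
        new DivisorRange(Math.Max(Min, min), Max, isExact: true);

    // Warn only when zero lies inside or on the border of an exact range.
    public bool ShouldWarnOnDivision() => IsExact && Min <= 0 && 0 <= Max;
}

public static class DivisionDemo
{
    public static void Main()
    {
        var blockSize = DivisorRange.UnknownInt();                // GetBlockCount
        var seconds = DivisorRange.UnknownInt().RefineAtLeast(0); // SecondsDownloading >= 0
        var i = new DivisorRange(-10, 10, isExact: true);         // for loop counter

        Console.WriteLine(blockSize.ShouldWarnOnDivision()); // False
        Console.WriteLine(seconds.ShouldWarnOnDivision());   // True
        Console.WriteLine(i.ShouldWarnOnDivision());         // True
    }
}
```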
Examples
Now let's look at some errors from real projects that PVS-Studio found using virtual values.
Example N1 (RavenDB).
internal static void UrlPathEncodeChar (char c, Stream result)
{
  if (c < 33 || c > 126)
  {
    byte [] bIn = Encoding.UTF8.GetBytes (c.ToString ());
    for (int i = 0; i < bIn.Length; i++)
    {
      ....
    }
  }
  else if (c == ' ')
  {
The first condition of the UrlPathEncodeChar function handles special characters, and the second condition is a special optimization for the space. But since the ASCII code of the space is 32, the space will be processed by the first block. PVS-Studio finds this error as follows: inside the if block, the virtual value of the variable c will be [0, 32], [127, char.MaxValue], and inside the first else block it will be all the other values: [33, 126]. Since the code of the space does not fall into this range, the analyzer reports error V3022: the expression c == ' ' is always false.
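The conclusion is easy to confirm by brute force, because char has only 65536 values. The sketch below copies just the two conditions from the example; SpaceBranchCheck and CountReachingSpaceBranch are hypothetical names.

```csharp
using System;

public static class SpaceBranchCheck
{
    // Counts the char values that reach the "else if (c == ' ')" branch.
    public static int CountReachingSpaceBranch()
    {
        int count = 0;
        for (int code = char.MinValue; code <= char.MaxValue; code++)
        {
            char c = (char)code;
            if (c < 33 || c > 126)
            {
                // First branch: [0, 32] and [127, char.MaxValue].
            }
            else if (c == ' ')
            {
                // Only [33, 126] can get here, and ' ' is 32.
                count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        Console.WriteLine(CountReachingSpaceBranch()); // 0
    }
}
```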
Example N2 (ServiceStack).
protected override sealed void Initialize()
{
  if (RootDirInfo == null)
    RootDirInfo = new DirectoryInfo(WebHostPhysicalPath);
  if (RootDirInfo == null || !RootDirInfo.Exists)
At the beginning of the Initialize function, we know nothing about the RootDirInfo property. After analyzing the condition RootDirInfo == null, two more copies of the virtual values are created: one for the if block, in which RootDirInfo is null, and the other for the else block, in which RootDirInfo is not null. Although there is no else block in our example, virtual values are still created for it. Then, inside the if block, the RootDirInfo property is assigned a new value, the result of a constructor call. Since a constructor never returns null, the RootDirInfo value in the if block is now not null. Since RootDirInfo in the else block is also not null, when combining these two branches we get that RootDirInfo will never be null after the first condition is processed. As a result, when analyzing the second condition, PVS-Studio reports error V3063: a part of the condition is always false.
Example N3 (ServiceStack).
public static TextNode ParseTypeIntoNodes(this string typeDef)
{
  ....
  var lastBlockPos = typeDef.IndexOf('<');
  if (lastBlockPos >= 0)
  {
    ....
Consider what happens to the lastBlockPos variable in this example. First, it is assigned the result of a call to the IndexOf function. The analyzer knows that the IndexOf, IndexOfAny, LastIndexOf, and LastIndexOfAny functions return either a non-negative value or -1. Therefore, the range of the lastBlockPos variable will be [-1, int.MaxValue]. After entering the if block, the range is limited to non-negative values only: [0, int.MaxValue]. Next, the analyzer goes through the body of the while loop. The variable nextPos gets the range [-1, int.MaxValue] when it is declared. After analyzing the condition if (nextPos == -1), two copies of the virtual value of nextPos are created: [-1] for the if branch and [0, int.MaxValue] for the else branch. Since the if branch contains a break statement, the rest of the loop body uses only the virtual value from the else branch for nextPos, [0, int.MaxValue], and at the end this value is assigned to the lastBlockPos variable.
Thus, we have two points of transition to the loop body: one at the entrance to the loop, where lastBlockPos has the value [0, int.MaxValue], and the other at the transition to the next iteration, where lastBlockPos also has the value [0, int.MaxValue]. Therefore, lastBlockPos never takes negative values in the loop condition, which means the loop condition is always true, as reported by the V3022 diagnostic.
It is worth noting that finding this error manually is quite difficult, since the body of the loop contains about forty lines and tracing the passage through all its branches is problematic.
Example N4 (TransmissionRemoteDotnet).
Here, the variable y of type long is assigned a value of type byte. Since the byte type is unsigned, the check y < 0 is meaningless.
Example N5 (MSBuild).
private bool ValidateTaskNode()
{
  bool foundInvalidNode = false;
  foreach (XmlNode childNode in _taskNode.ChildNodes)
  {
    switch (childNode.NodeType)
    {
      case XmlNodeType.Comment:
      case XmlNodeType.Whitespace:
      case XmlNodeType.Text:
In this example, the loop body contains two statements: switch and if. Consider the switch block. The first section, with three case labels, contains only the continue statement, so control cannot pass from it to the check of foundInvalidNode. The second case section either goes to the next iteration of the loop, or sets foundInvalidNode to true and exits the switch. And finally, the default section also sets foundInvalidNode to true and exits the switch. Thus, after exiting the switch, the variable foundInvalidNode will always be true, which means the following if is superfluous. The analyzer took into account that this switch block has a default branch, which means that control cannot proceed straight to the condition check: one of the switch sections will necessarily be executed.
Note that inside this switch statement, continue applies to the loop, while break exits the switch, not the loop!
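This behavior of continue and break can be seen in a small standalone example. SwitchInLoop and Trace are hypothetical names and have nothing to do with the MSBuild code.

```csharp
using System;
using System.Collections.Generic;

public static class SwitchInLoop
{
    // Records which statements run for each item.
    public static List<string> Trace(IEnumerable<int> items)
    {
        var log = new List<string>();
        foreach (int item in items)
        {
            switch (item)
            {
                case 0:
                    continue;          // goes to the next loop iteration
                case 1:
                    log.Add("case 1");
                    break;             // leaves the switch only
                default:
                    log.Add("default");
                    break;
            }
            // Reached after break, never after continue.
            log.Add("after switch");
        }
        return log;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Trace(new[] { 0, 1, 2 })));
        // case 1, after switch, default, after switch
    }
}
```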
Conclusion
Calculating the values of variables at the stage of static analysis is a powerful tool for finding errors. The code may contain complex branches, nested conditions and loops, and blocks hundreds of lines long. Manually tracing how a variable changes and finding an error can be very difficult, and the PVS-Studio static analyzer is a good assistant in the search for many such errors.
If you want to share this article with an English-speaking audience, please use the link to the translation: Ilya Ivanov. Search for errors.