The publication by Microsoft of the source code of their projects is quite a good reason to check them. This time was no exception, and today we will look at suspicious places found in the Infer.NET code. Down with annotation - more to the point!
A little bit about the project and analyzer
Infer.NET is a machine learning system developed by Microsoft experts. The source code for the project has recently become available on
GitHub , which was the reason for checking it out. More details about the project can be read, for example,
here .
The project was checked with the help of the PVS-Studio static analyzer version 6.26. I remind you that PVS-Studio is looking for errors in the code in C \ C ++ \ C # (and soon in Java) under Windows, Linux, macOS. C # code while analyzing only under Windows. The analyzer can be
downloaded and tried on your project.
The test itself was extremely simple and without problems. I previously unloaded the project from GitHub, restored the required packages (dependencies) and made sure that the project is being successfully built. This is required in order for the analyzer to have access to all the necessary information to conduct a comprehensive analysis. After assembling in a couple of clicks, I launched the analysis of the solution through the PVS-Studio plugin for Visual Studio.
')
By the way, this is not the first project from Microsoft, tested with PVS-Studio - there were others:
Roslyn ,
MSBuild ,
PowerShell ,
CoreFX and
others .
Note If you or your friends are interested in analyzing Java code - you can write to us in
support by selecting the item “I want an analyzer for Java”. There is no public beta-version of the analyzer yet, but it should appear soon. Somewhere in the secret laboratory (through the wall) the guys are actively working on it.
But enough of the abstract conversations - let's look at the problems in the code.
Is this a bug or a feature?
I propose to try to find the error yourself - a completely solvable problem. No joke in the spirit of what was in the article "
Top 10 errors in C ++ projects for 2017 ", honestly. So do not rush to read the analyzer warning presented after the code fragment.
private void MergeParallelTransitions() { .... if ( transition1.DestinationStateIndex == transition2.DestinationStateIndex && transition1.Group == transition2.Group) { if (transition1.IsEpsilon && transition2.IsEpsilon) { .... } else if (!transition1.IsEpsilon && !transition2.IsEpsilon) { .... if (double.IsInfinity(transition1.Weight.Value) && double.IsInfinity(transition1.Weight.Value)) { newElementDistribution.SetToSum( 1.0, transition1.ElementDistribution, 1.0, transition2.ElementDistribution); } else { newElementDistribution.SetToSum( transition1.Weight.Value, transition1.ElementDistribution, transition2.Weight.Value, transition2.ElementDistribution); } .... }
PVS-Studio warning :
V3001 There are identical.-double.IsInfinity (transition1.Weight.Value) there are identical sub-expressions. Runtime Automaton.Simplification.cs 479
As can be seen from the code fragment, the method is working with a couple of variables -
transition1 and
transition2 . The use of similar names is sometimes quite justified, but it is worth remembering that in this case the probability of accidentally making a mistake somewhere with a name increases.
This is
what happened when checking numbers to infinity (
double.IsInfinity ). Because of the error, we checked the value of the same variable 2 times -
transition1.Weight.Value . The variable in the second subexpression should be
transition2.Weight.Value .
Another similar suspicious code.
internal MethodBase ToMethodInternal(IMethodReference imr) { .... bf |= BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Public | BindingFlags.Instance; .... }
PVS-Studio Warning :
V3001 There are identical sub-expressions of 'BindingFlags.Public' operator. Compiler CodeBuilder.cs 194
When generating the value of the
bf variable, the
BindingFlags.Public enumeration element is used twice. Either this code contains an extra flag setting operation, or instead of the second use of
BindingFlags.Public, there should be a different enum value.
By the way, in the source this code is written in one line. It seems to me that if it is formatted in tabular style (like here), the problem is easier to detect.
Go ahead. I give the whole body of the method and again I suggest you find the error (or maybe the error) on your own.
private void ForEachPrefix(IExpression expr, Action<IExpression> action) {
Found? Checking!
PVS-Studio warnings :
- V3003 The use of if (A) {...} else if (A) {...} 'pattern was detected. There is a possibility of logical error presence. Check lines: 1719, 1727. Compiler CodeRecognizer.cs 1719
- V3003 The use of if (A) {...} else if (A) {...} 'pattern was detected. There is a possibility of logical error presence. Check lines: 1721, 1729. Compiler CodeRecognizer.cs 1721
Let's simplify the code a bit so that the problems become more obvious.
private void ForEachPrefix(IExpression expr, Action<IExpression> action) { if (....) .... else if (expr is IUnaryExpression) ForEachPrefix(((IUnaryExpression)expr).Expression, action); else if (expr is IAddressReferenceExpression) ForEachPrefix(((IAddressReferenceExpression)expr).Expression, action); .... else if (expr is IUnaryExpression) ForEachPrefix(((IUnaryExpression)expr).Expression, action); else if (expr is IAddressReferenceExpression) ForEachPrefix(((IAddressReferenceExpression)expr).Expression, action) .... }
Conditional expressions and
then- branches of several
if statements are duplicated. Perhaps this code was written using the copy-paste method, which is why the problem arose. Now it turns out that the
then- branches of duplicates will never be executed, since:
- if the conditional expression is true, the body of the first if statement from the corresponding pair is executed;
- if the conditional expression is false in the first case, it will be false in the second.
Since the
then- branches contain the same actions, now it looks like a redundant code that is confusing. It is possible that there is a problem of another kind - instead of duplicates, other checks should have been performed.
We continue.
public int Compare(Pair<int, int> x, Pair<int, int> y) { if (x.First < y.First) { if (x.Second >= y.Second) {
PVS-Studio warnings :
- V3004 The 'then' statement is equivalent to the 'else' statement. Runtime RegexpTreeBuilder.cs 1080
- V3004 The 'then' statement is equivalent to the 'else' statement. Runtime RegexpTreeBuilder.cs 1093
The code looks extremely suspicious, as it contains two conditional operators with the same bodies
then and
else- branches. Probably in both cases it is worth returning different values. Or, if this is conceived behavior, it will be useful to remove redundant conditional statements.
There were interesting cycles. Example below:
private static Set<StochasticityPattern> IntersectPatterns(IEnumerable<StochasticityPattern> patterns) { Set<StochasticityPattern> result = new Set<StochasticityPattern>(); result.AddRange(patterns); bool changed; do { int count = result.Count; AddIntersections(result); changed = (result.Count != count); break; } while (changed); return result; }
PVS-Studio warning :
V3020 An unconditional 'break' within a loop. Compiler DefaultFactorManager.cs 474
Because of the unconditional
break statement, exactly one iteration of the loop is performed, and the
changed control variable is not even used. In general, the code looks strange and suspicious.
The same method (exact copy) met in another class. Appropriate analyzer warning:
V3020 An unconditional 'break' within a loop. Visualizers.Windows FactorManagerView.cs 350
By the way, the method met with the unconditional
continue operator in the loop (it was found by the analyzer with the same diagnostics), but above it was a comment confirming that this is a special temporary solution:
I remind you that there were no such comments near the unconditional
break operator.
Go ahead.
internal static DependencyInformation GetDependencyInfo(....) { .... IExpression resultIndex = null; .... if (resultIndex != null) { if (parameter.IsDefined( typeof(SkipIfMatchingIndexIsUniformAttribute), false)) { if (resultIndex == null) throw new InferCompilerException( parameter.Name + " has SkipIfMatchingIndexIsUniformAttribute but " + StringUtil.MethodNameToString(method) + " has no resultIndex parameter"); .... } .... } .... }
PVS-Studio warning :
V3022 Expression 'resultIndex == null' is always false. Compiler FactorManager.cs 382
Immediately, I note that between the declaration and the above test, the value of the variable
resultIndex may change. However, between the checks of
resultIndex! = Null and
resultIndex == null, the value cannot be changed. Therefore, the result of the expression
resultIndex == null will always be
false , which means that an exception will never be generated.
I hope that you have an interest in finding errors yourself without my suggestions to find a problem, but just in case I will suggest doing it again. The code for the method is small;
public static Tuple<int, string> ComputeMovieGenre(int offset, string feature) { string[] genres = feature.Split('|'); if (genres.Length < 1 && genres.Length > 3) { throw new ArgumentException(string.Format( "Movies should have between 1 and 3 genres; given {0}.", genres.Length)); } double value = 1.0 / genres.Length; var result = new StringBuilder( string.Format( "{0}:{1}", offset + MovieGenreBuckets[genres[0]], value)); for (int i = 1; i < genres.Length; ++i) { result.Append( string.Format( "|{0}:{1}", offset + MovieGenreBuckets[genres[i].Trim()], value)); } return new Tuple<int, string>(MovieGenreBucketCount, result.ToString()); }
Let's see what happens here. The input string is parsed by the '|' character. If the array length is not as expected, an exception must be generated.
Wait a second ...
genres.Length <1 && genres.Length> 3 ? Since there is no number that would immediately fall into both values ​​of the range (
[int.MinValue..1) and
(3..int.MaxValue] ) required by the expression, the result of the expression will always be
false . Therefore, this check does not protect against anything, and the expected exception will not be generated.
This is what the analyzer warns about:
V3022 Expression 'genres.Length <1 && genres.Length> 3' is always false. Probably the '||' operator should be used here. Evaluator Features.cs 242
A suspicious division operation has been encountered.
public static void CreateTrueThetaAndPhi(....) { .... double expectedRepeatOfTopicInDoc = averageDocLength / numUniqueTopicsPerDoc; .... int cnt = Poisson.Sample(expectedRepeatOfTopicInDoc); .... }
PVS-Studio warning :
V3041 The expression was implicitly cast from 'int' type to 'double' type. Consider using a fractional part. An example: double A = (double) (X) / Y ;. LDA Utilities.cs 74
The following is suspicious: the integer division is performed (the variables
averageDocLength and
numUniqueTopicsPerDoc are of type
int ), and the result is written to a variable of type
double . This begs the question: was it specially done, or was it still meant the division of real numbers? If the variable
expectedRepeatOfTopicInDoc were of type
int , this would remove the possible questions.
In other places, the
Poisson.Sample method, whose argument is the suspicious variable
expectedRepeatOfTopicInDoc , is used, for example, as described below.
int numUniqueWordsPerTopic = Poisson.Sample((double)averageWordsPerTopic);
averageWordsPerTopic is of type
int , which at the place of use is
cast to
double .
And here is another place of use:
double expectedRepeatOfWordInTopic = ((double)numDocs) * averageDocLength / numUniqueWordsPerTopic; .... int cnt = Poisson.Sample(expectedRepeatOfWordInTopic);
Notice that the variables have the same names as in the original example, only for initializing the
expectedRepeatOfWordInTopic , the division of real numbers is used (due to the explicit reduction of
numDocs to
double type).
In general, it is worth looking at the original place to which the analyzer issued a warning.
But thinking about whether to correct it, and how, let's leave it to the authors of the code (they know better), and go ahead. To the next suspicious division.
public static NonconjugateGaussian BAverageLogarithm(....) { .... double v_opt = 2 / 3 * (Math.Log(mx * mz / Ex2 / 2) - m); if (v_opt != v) { .... } .... }
PVS-Studio warning :
V3041 The expression was implicitly cast from 'int' type to 'double' type. Consider using a fractional part. An example: double A = (double) (X) / Y ;. Runtime ProductExp.cs 137
The analyzer again detected a suspicious integer division operation, since
2 and
3 are integer numeric literals, and the result of the expression 2/3 is
0 . As a result, the whole expression takes the form:
double v_opt = 0 * expr;
Agree, a bit strange. Several times I returned to this warning, trying to find some kind of trick, and not trying to add it to the article. The method is filled with mathematics and different formulas (which, to be honest, we didn’t really want to), is there anything to expect. In addition, I try to be as skeptical as possible about the warnings that I write to the article, and I describe them only after studying better.
But then it dawned on me - why do we need a multiplier in the form of
0 , written as
2/3 ? So this place is, in any case, worth a look.
public static void WriteAttribute(TextWriter writer, string name, object defaultValue, object value, Func<object, string> converter = null) { if ( defaultValue == null && value == null || value.Equals(defaultValue)) { return; } string stringValue = converter == null ? value.ToString() : converter(value); writer.Write($"{name}=\"{stringValue}\" "); }
PVS-Studio warning :
V3080 Possible null dereference. Consider inspecting 'value'. Compiler WriteHelpers.cs 78
It is a fair statement of the analyzer based on the condition. A dereferencing null reference can occur in a
value.Equals (defaultValue) expression if
value == null . Since this expression is the right operand of the operator ||, to calculate it, the left operand must be
false , and for this it suffices that at least one of the variables
defaultValue \
value is not
null . As a result, if
defaultValue! = Null , and
value == null :
- defaultValue == null -> false ;
- defaultValue == null && value == null -> false ; ( value check failed)
- value.Equals (defaultValue) -> NullReferenceException , since value is null .
Let's look at a similar case:
public FeatureParameterDistribution( GaussianMatrix traitFeatureWeightDistribution, GaussianArray biasFeatureWeightDistribution) { Debug.Assert( (traitFeatureWeightDistribution == null && biasFeatureWeightDistribution == null) || traitFeatureWeightDistribution.All( w => w != null && w.Count == biasFeatureWeightDistribution.Count), "The provided distributions should be valid and consistent in the number of features."); .... }
PVS-Studio warning :
V3080 Possible null dereference. Consider inspecting 'traitFeatureWeightDistribution'. Recommender FeatureParameterDistribution.cs 65
Let's throw out the excess, leaving only the logic for calculating the Boolean value, to make it easier to understand:
(traitFeatureWeightDistribution == null && biasFeatureWeightDistribution == null) || traitFeatureWeightDistribution.All( w => w != null && w.Count == biasFeatureWeightDistribution.Count)
Again, the right operand of the operator || calculated only if the result of calculating the left is
false . The left operand can be
false , including when
traitFeatureWeightDistribution == null and
biasFeatureWeightDistribution! = Null . Then the right operand of the operator || will be calculated, and the call to
traitFeatureWeightDistribution.All will result in an
ArgumentNullException .
Another interesting code snippet:
public static double GetQuantile(double probability, double[] quantiles) { .... int n = quantiles.Length; if (quantiles == null) throw new ArgumentNullException(nameof(quantiles)); if (n == 0) throw new ArgumentException("quantiles array is empty", nameof(quantiles)); .... }
PVS-Studio Warning :
V3095 The 'quantiles' object was used against it. Check lines: 91, 92. Runtime OuterQuantiles.cs 91
Notice that the
quantiles.Length property is
first accessed , and then
quantiles are checked for equality
null . In summary, if
quantiles == null , the method will throw an exception, just a little bit wrong, and a bit out of place where it was needed. Apparently, they mixed up the lines in some places.
If you have successfully coped with the detection of the errors given earlier on your own, I suggest you brew a cup of coffee and try the feat again, finding an error in the method below. To make it a bit more interesting, I quote the whole method code.
(
Link to full size )

Okay, okay, it was a joke (or did you still succeed ?!). Let's simplify the task a bit:
if (sample.Precision < 0) { precisionIsBetween = true; lowerBound = -1.0 / v; upperBound = -mean.Precision; } else if (sample.Precision < -mean.Precision) { precisionIsBetween = true; lowerBound = 0; upperBound = -mean.Precision; } else {
Got better? The analyzer issued the following warning for this code:
V3008 The 'lowerBound' variable is assigned values ​​twice successively. Perhaps this is a mistake. Check lines: 324, 323. Runtime GaussianOp.cs 324
Indeed, in the last
else branch, the value of the variable
lowerBound is assigned twice in a row. Apparently (and judging by the code above), the variable
upperBound should participate in one of the assignments.
Follow on.
private void WriteAucMatrix(....) { .... for (int c = 0; c < classLabelCount; c++) { int labelWidth = labels[c].Length; columnWidths[c + 1] = labelWidth > MaxLabelWidth ? MaxLabelWidth : labelWidth; for (int r = 0; r < classLabelCount; r++) { int countWidth = MaxValueWidth; if (countWidth > columnWidths[c + 1]) { columnWidths[c + 1] = countWidth; } } .... }
PVS-Studio warning :
V3081 The 'r' counter is not used inside a loop. Consider inspecting usage of 'c' counter. CommandLine ClassifierEvaluationModule.cs 459
Please note that the internal loop counter -
r - is not used in the body of this cycle. Because of this, it turns out that during all iterations of the inner loop, the same operations are performed on the same elements - after all, the indices also use the counter of the outer loop (
c ), and not the inner one (
r ).
Let's see what else was interesting.
public RegexpFormattingSettings( bool putOptionalInSquareBrackets, bool showAnyElementAsQuestionMark, bool ignoreElementDistributionDetails, int truncationLength, bool escapeCharacters, bool useLazyQuantifier) { this.PutOptionalInSquareBrackets = putOptionalInSquareBrackets; this.ShowAnyElementAsQuestionMark = showAnyElementAsQuestionMark; this.IgnoreElementDistributionDetails = ignoreElementDistributionDetails; this.TruncationLength = truncationLength; this.EscapeCharacters = escapeCharacters; }
PVS-Studio warning :
V3117 Constructor parameter 'useLazyQuantifier' is not used. Runtime RegexpFormattingSettings.cs 38
The constructor does not use one parameter -
useLazyQuantifier . It looks especially suspicious against the background of the fact that the class defines a property with the appropriate name and type -
UseLazyQuantifier . Apparently, they forgot to initialize it through the corresponding parameter.
There are several potentially dangerous event handlers. An example of one of them is shown below:
public class RecommenderRun { .... public event EventHandler Started; .... public void Execute() {
PVS-Studio warning :
V3083 Unsafe invocation of event 'Started', NullReferenceException is possible. Consider assigning an event to a local variable before invoking it. Evaluator RecommenderRun.cs 115
The fact is that between checking for
null inequality and calling the handler, an unsubscribing event can occur, and if at the time between checking for
null and calling the handler the event has no subscribers, a
NullReferenceException will be
thrown . To avoid such problems, you can, for example, save a reference to a chain of delegates to a local variable or use the '?.' to call handlers.
In addition to the above code snippet, there were 35 such places.
By the way,
785 warnings of
V3024 were met. The warning
V3024 is issued when comparing real numbers using the '! =' Or '==' operators. I will not dwell here on why such comparisons are not always correct - for more details, see the documentation, there is also a link to
StackOverflow (this is it).
Considering that formulas and calculations were often encountered, these warnings can also be important, although they are placed at level 3 (since not all projects are relevant at all).
If there is a certainty that these warnings are irrelevant - you can remove them with
almost one click , reducing the total number of analyzer responses.
Conclusion
Somehow it happened that I hadn’t written articles about project validation for a long time, and it was quite pleasant to touch this process again. I hope that you learned something new / useful from this article, or at least read it with interest.
I wish the developers to fix the problem areas as soon as possible and remind that making mistakes is normal, yet we are human. That's why additional tools like static analyzers are needed to find what a person missed, right? Anyway - good luck with the project, and thanks for the work!
And remember that the maximum benefit from the static analyzer is achieved with its
regular use .
All the best!

If you want to share this article with an English-speaking audience, then please use the link to the translation: Sergey Vasiliev.
What Errors Lurk in Infer .NET Code?