📜 ⬆️ ⬇️

What errors are hiding in the code Infer.NET?


The publication by Microsoft of the source code of their projects is quite a good reason to check them. This time was no exception, and today we will look at suspicious places found in the Infer.NET code. Down with annotation - more to the point!

A little bit about the project and analyzer


Infer.NET is a machine learning system developed by Microsoft experts. The source code for the project has recently become available on GitHub , which was the reason for checking it out. More details about the project can be read, for example, here .

The project was checked with the help of the PVS-Studio static analyzer version 6.26. I remind you that PVS-Studio is looking for errors in the code in C \ C ++ \ C # (and soon in Java) under Windows, Linux, macOS. C # code while analyzing only under Windows. The analyzer can be downloaded and tried on your project.

The test itself was extremely simple and without problems. I previously unloaded the project from GitHub, restored the required packages (dependencies) and made sure that the project is being successfully built. This is required in order for the analyzer to have access to all the necessary information to conduct a comprehensive analysis. After assembling in a couple of clicks, I launched the analysis of the solution through the PVS-Studio plugin for Visual Studio.
')
By the way, this is not the first project from Microsoft, tested with PVS-Studio - there were others: Roslyn , MSBuild , PowerShell , CoreFX and others .

Note If you or your friends are interested in analyzing Java code - you can write to us in support by selecting the item “I want an analyzer for Java”. There is no public beta-version of the analyzer yet, but it should appear soon. Somewhere in the secret laboratory (through the wall) the guys are actively working on it.

But enough of the abstract conversations - let's look at the problems in the code.

Is this a bug or a feature?


I propose to try to find the error yourself - a completely solvable problem. No joke in the spirit of what was in the article " Top 10 errors in C ++ projects for 2017 ", honestly. So do not rush to read the analyzer warning presented after the code fragment.

private void MergeParallelTransitions() { .... if ( transition1.DestinationStateIndex == transition2.DestinationStateIndex && transition1.Group == transition2.Group) { if (transition1.IsEpsilon && transition2.IsEpsilon) { .... } else if (!transition1.IsEpsilon && !transition2.IsEpsilon) { .... if (double.IsInfinity(transition1.Weight.Value) && double.IsInfinity(transition1.Weight.Value)) { newElementDistribution.SetToSum( 1.0, transition1.ElementDistribution, 1.0, transition2.ElementDistribution); } else { newElementDistribution.SetToSum( transition1.Weight.Value, transition1.ElementDistribution, transition2.Weight.Value, transition2.ElementDistribution); } .... } 

PVS-Studio warning : V3001 There are identical.-double.IsInfinity (transition1.Weight.Value) there are identical sub-expressions. Runtime Automaton.Simplification.cs 479

As can be seen from the code fragment, the method is working with a couple of variables - transition1 and transition2 . The use of similar names is sometimes quite justified, but it is worth remembering that in this case the probability of accidentally making a mistake somewhere with a name increases.

This is what happened when checking numbers to infinity ( double.IsInfinity ). Because of the error, we checked the value of the same variable 2 times - transition1.Weight.Value . The variable in the second subexpression should be transition2.Weight.Value .

Another similar suspicious code.

 internal MethodBase ToMethodInternal(IMethodReference imr) { .... bf |= BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Public | BindingFlags.Instance; .... } 

PVS-Studio Warning : V3001 There are identical sub-expressions of 'BindingFlags.Public' operator. Compiler CodeBuilder.cs 194

When generating the value of the bf variable, the BindingFlags.Public enumeration element is used twice. Either this code contains an extra flag setting operation, or instead of the second use of BindingFlags.Public, there should be a different enum value.

By the way, in the source this code is written in one line. It seems to me that if it is formatted in tabular style (like here), the problem is easier to detect.

Go ahead. I give the whole body of the method and again I suggest you find the error (or maybe the error) on your own.

 private void ForEachPrefix(IExpression expr, Action<IExpression> action) { // This method must be kept consistent with GetTargets. if (expr is IArrayIndexerExpression) ForEachPrefix(((IArrayIndexerExpression)expr).Target, action); else if (expr is IAddressOutExpression) ForEachPrefix(((IAddressOutExpression)expr).Expression, action); else if (expr is IPropertyReferenceExpression) ForEachPrefix(((IPropertyReferenceExpression)expr).Target, action); else if (expr is IFieldReferenceExpression) { IExpression target = ((IFieldReferenceExpression)expr).Target; if (!(target is IThisReferenceExpression)) ForEachPrefix(target, action); } else if (expr is ICastExpression) ForEachPrefix(((ICastExpression)expr).Expression, action); else if (expr is IPropertyIndexerExpression) ForEachPrefix(((IPropertyIndexerExpression)expr).Target, action); else if (expr is IEventReferenceExpression) ForEachPrefix(((IEventReferenceExpression)expr).Target, action); else if (expr is IUnaryExpression) ForEachPrefix(((IUnaryExpression)expr).Expression, action); else if (expr is IAddressReferenceExpression) ForEachPrefix(((IAddressReferenceExpression)expr).Expression, action); else if (expr is IMethodInvokeExpression) ForEachPrefix(((IMethodInvokeExpression)expr).Method, action); else if (expr is IMethodReferenceExpression) ForEachPrefix(((IMethodReferenceExpression)expr).Target, action); else if (expr is IUnaryExpression) ForEachPrefix(((IUnaryExpression)expr).Expression, action); else if (expr is IAddressReferenceExpression) ForEachPrefix(((IAddressReferenceExpression)expr).Expression, action); else if (expr is IDelegateInvokeExpression) ForEachPrefix(((IDelegateInvokeExpression)expr).Target, action); action(expr); } 

Found? Checking!

PVS-Studio warnings :

Let's simplify the code a bit so that the problems become more obvious.

 private void ForEachPrefix(IExpression expr, Action<IExpression> action) { if (....) .... else if (expr is IUnaryExpression) ForEachPrefix(((IUnaryExpression)expr).Expression, action); else if (expr is IAddressReferenceExpression) ForEachPrefix(((IAddressReferenceExpression)expr).Expression, action); .... else if (expr is IUnaryExpression) ForEachPrefix(((IUnaryExpression)expr).Expression, action); else if (expr is IAddressReferenceExpression) ForEachPrefix(((IAddressReferenceExpression)expr).Expression, action) .... } 

Conditional expressions and then- branches of several if statements are duplicated. Perhaps this code was written using the copy-paste method, which is why the problem arose. Now it turns out that the then- branches of duplicates will never be executed, since:


Since the then- branches contain the same actions, now it looks like a redundant code that is confusing. It is possible that there is a problem of another kind - instead of duplicates, other checks should have been performed.

We continue.

 public int Compare(Pair<int, int> x, Pair<int, int> y) { if (x.First < y.First) { if (x.Second >= y.Second) { // y strictly contains x return 1; } else { // No containment - order by left bound return 1; } } else if (x.First > y.First) { if (x.Second <= y.Second) { // x strictly contains y return -1; } else { // No containment - order by left bound return -1; } } .... } 

PVS-Studio warnings :

The code looks extremely suspicious, as it contains two conditional operators with the same bodies then and else- branches. Probably in both cases it is worth returning different values. Or, if this is conceived behavior, it will be useful to remove redundant conditional statements.

There were interesting cycles. Example below:

 private static Set<StochasticityPattern> IntersectPatterns(IEnumerable<StochasticityPattern> patterns) { Set<StochasticityPattern> result = new Set<StochasticityPattern>(); result.AddRange(patterns); bool changed; do { int count = result.Count; AddIntersections(result); changed = (result.Count != count); break; } while (changed); return result; } 

PVS-Studio warning : V3020 An unconditional 'break' within a loop. Compiler DefaultFactorManager.cs 474

Because of the unconditional break statement, exactly one iteration of the loop is performed, and the changed control variable is not even used. In general, the code looks strange and suspicious.

The same method (exact copy) met in another class. Appropriate analyzer warning: V3020 An unconditional 'break' within a loop. Visualizers.Windows FactorManagerView.cs 350

By the way, the method met with the unconditional continue operator in the loop (it was found by the analyzer with the same diagnostics), but above it was a comment confirming that this is a special temporary solution:

 // TEMPORARY continue; 

I remind you that there were no such comments near the unconditional break operator.

Go ahead.

 internal static DependencyInformation GetDependencyInfo(....) { .... IExpression resultIndex = null; .... if (resultIndex != null) { if (parameter.IsDefined( typeof(SkipIfMatchingIndexIsUniformAttribute), false)) { if (resultIndex == null) throw new InferCompilerException( parameter.Name + " has SkipIfMatchingIndexIsUniformAttribute but " + StringUtil.MethodNameToString(method) + " has no resultIndex parameter"); .... } .... } .... } 

PVS-Studio warning : V3022 Expression 'resultIndex == null' is always false. Compiler FactorManager.cs 382

Immediately, I note that between the declaration and the above test, the value of the variable resultIndex may change. However, between the checks of resultIndex! = Null and resultIndex == null, the value cannot be changed. Therefore, the result of the expression resultIndex == null will always be false , which means that an exception will never be generated.

I hope that you have an interest in finding errors yourself without my suggestions to find a problem, but just in case I will suggest doing it again. The code for the method is small;

 public static Tuple<int, string> ComputeMovieGenre(int offset, string feature) { string[] genres = feature.Split('|'); if (genres.Length < 1 && genres.Length > 3) { throw new ArgumentException(string.Format( "Movies should have between 1 and 3 genres; given {0}.", genres.Length)); } double value = 1.0 / genres.Length; var result = new StringBuilder( string.Format( "{0}:{1}", offset + MovieGenreBuckets[genres[0]], value)); for (int i = 1; i < genres.Length; ++i) { result.Append( string.Format( "|{0}:{1}", offset + MovieGenreBuckets[genres[i].Trim()], value)); } return new Tuple<int, string>(MovieGenreBucketCount, result.ToString()); } 

Let's see what happens here. The input string is parsed by the '|' character. If the array length is not as expected, an exception must be generated. Wait a second ... genres.Length <1 && genres.Length> 3 ? Since there is no number that would immediately fall into both values ​​of the range ( [int.MinValue..1) and (3..int.MaxValue] ) required by the expression, the result of the expression will always be false . Therefore, this check does not protect against anything, and the expected exception will not be generated.

This is what the analyzer warns about: V3022 Expression 'genres.Length <1 && genres.Length> 3' is always false. Probably the '||' operator should be used here. Evaluator Features.cs 242

A suspicious division operation has been encountered.

 public static void CreateTrueThetaAndPhi(....) { .... double expectedRepeatOfTopicInDoc = averageDocLength / numUniqueTopicsPerDoc; .... int cnt = Poisson.Sample(expectedRepeatOfTopicInDoc); .... } 

PVS-Studio warning : V3041 The expression was implicitly cast from 'int' type to 'double' type. Consider using a fractional part. An example: double A = (double) (X) / Y ;. LDA Utilities.cs 74

The following is suspicious: the integer division is performed (the variables averageDocLength and numUniqueTopicsPerDoc are of type int ), and the result is written to a variable of type double . This begs the question: was it specially done, or was it still meant the division of real numbers? If the variable expectedRepeatOfTopicInDoc were of type int , this would remove the possible questions.

In other places, the Poisson.Sample method, whose argument is the suspicious variable expectedRepeatOfTopicInDoc , is used, for example, as described below.

 int numUniqueWordsPerTopic = Poisson.Sample((double)averageWordsPerTopic); 

averageWordsPerTopic is of type int , which at the place of use is cast to double .

And here is another place of use:

 double expectedRepeatOfWordInTopic = ((double)numDocs) * averageDocLength / numUniqueWordsPerTopic; .... int cnt = Poisson.Sample(expectedRepeatOfWordInTopic); 

Notice that the variables have the same names as in the original example, only for initializing the expectedRepeatOfWordInTopic , the division of real numbers is used (due to the explicit reduction of numDocs to double type).

In general, it is worth looking at the original place to which the analyzer issued a warning.

But thinking about whether to correct it, and how, let's leave it to the authors of the code (they know better), and go ahead. To the next suspicious division.

 public static NonconjugateGaussian BAverageLogarithm(....) { .... double v_opt = 2 / 3 * (Math.Log(mx * mz / Ex2 / 2) - m); if (v_opt != v) { .... } .... } 

PVS-Studio warning : V3041 The expression was implicitly cast from 'int' type to 'double' type. Consider using a fractional part. An example: double A = (double) (X) / Y ;. Runtime ProductExp.cs 137

The analyzer again detected a suspicious integer division operation, since 2 and 3 are integer numeric literals, and the result of the expression 2/3 is 0 . As a result, the whole expression takes the form:

 double v_opt = 0 * expr; 

Agree, a bit strange. Several times I returned to this warning, trying to find some kind of trick, and not trying to add it to the article. The method is filled with mathematics and different formulas (which, to be honest, we didn’t really want to), is there anything to expect. In addition, I try to be as skeptical as possible about the warnings that I write to the article, and I describe them only after studying better.

But then it dawned on me - why do we need a multiplier in the form of 0 , written as 2/3 ? So this place is, in any case, worth a look.

 public static void WriteAttribute(TextWriter writer, string name, object defaultValue, object value, Func<object, string> converter = null) { if ( defaultValue == null && value == null || value.Equals(defaultValue)) { return; } string stringValue = converter == null ? value.ToString() : converter(value); writer.Write($"{name}=\"{stringValue}\" "); } 

PVS-Studio warning : V3080 Possible null dereference. Consider inspecting 'value'. Compiler WriteHelpers.cs 78

It is a fair statement of the analyzer based on the condition. A dereferencing null reference can occur in a value.Equals (defaultValue) expression if value == null . Since this expression is the right operand of the operator ||, to calculate it, the left operand must be false , and for this it suffices that at least one of the variables defaultValue \ value is not null . As a result, if defaultValue! = Null , and value == null :


Let's look at a similar case:

 public FeatureParameterDistribution( GaussianMatrix traitFeatureWeightDistribution, GaussianArray biasFeatureWeightDistribution) { Debug.Assert( (traitFeatureWeightDistribution == null && biasFeatureWeightDistribution == null) || traitFeatureWeightDistribution.All( w => w != null && w.Count == biasFeatureWeightDistribution.Count), "The provided distributions should be valid and consistent in the number of features."); .... } 

PVS-Studio warning : V3080 Possible null dereference. Consider inspecting 'traitFeatureWeightDistribution'. Recommender FeatureParameterDistribution.cs 65

Let's throw out the excess, leaving only the logic for calculating the Boolean value, to make it easier to understand:

 (traitFeatureWeightDistribution == null && biasFeatureWeightDistribution == null) || traitFeatureWeightDistribution.All( w => w != null && w.Count == biasFeatureWeightDistribution.Count) 

Again, the right operand of the operator || calculated only if the result of calculating the left is false . The left operand can be false , including when traitFeatureWeightDistribution == null and biasFeatureWeightDistribution! = Null . Then the right operand of the operator || will be calculated, and the call to traitFeatureWeightDistribution.All will result in an ArgumentNullException .

Another interesting code snippet:

 public static double GetQuantile(double probability, double[] quantiles) { .... int n = quantiles.Length; if (quantiles == null) throw new ArgumentNullException(nameof(quantiles)); if (n == 0) throw new ArgumentException("quantiles array is empty", nameof(quantiles)); .... } 

PVS-Studio Warning : V3095 The 'quantiles' object was used against it. Check lines: 91, 92. Runtime OuterQuantiles.cs 91

Notice that the quantiles.Length property is first accessed , and then quantiles are checked for equality null . In summary, if quantiles == null , the method will throw an exception, just a little bit wrong, and a bit out of place where it was needed. Apparently, they mixed up the lines in some places.

If you have successfully coped with the detection of the errors given earlier on your own, I suggest you brew a cup of coffee and try the feat again, finding an error in the method below. To make it a bit more interesting, I quote the whole method code.

( Link to full size )

Picture 2



Okay, okay, it was a joke (or did you still succeed ?!). Let's simplify the task a bit:

 if (sample.Precision < 0) { precisionIsBetween = true; lowerBound = -1.0 / v; upperBound = -mean.Precision; } else if (sample.Precision < -mean.Precision) { precisionIsBetween = true; lowerBound = 0; upperBound = -mean.Precision; } else { // in this case, the precision should NOT be in this interval. precisionIsBetween = false; lowerBound = -mean.Precision; lowerBound = -1.0 / v; } 

Got better? The analyzer issued the following warning for this code: V3008 The 'lowerBound' variable is assigned values ​​twice successively. Perhaps this is a mistake. Check lines: 324, 323. Runtime GaussianOp.cs 324

Indeed, in the last else branch, the value of the variable lowerBound is assigned twice in a row. Apparently (and judging by the code above), the variable upperBound should participate in one of the assignments.

Follow on.

 private void WriteAucMatrix(....) { .... for (int c = 0; c < classLabelCount; c++) { int labelWidth = labels[c].Length; columnWidths[c + 1] = labelWidth > MaxLabelWidth ? MaxLabelWidth : labelWidth; for (int r = 0; r < classLabelCount; r++) { int countWidth = MaxValueWidth; if (countWidth > columnWidths[c + 1]) { columnWidths[c + 1] = countWidth; } } .... } 

PVS-Studio warning : V3081 The 'r' counter is not used inside a loop. Consider inspecting usage of 'c' counter. CommandLine ClassifierEvaluationModule.cs 459

Please note that the internal loop counter - r - is not used in the body of this cycle. Because of this, it turns out that during all iterations of the inner loop, the same operations are performed on the same elements - after all, the indices also use the counter of the outer loop ( c ), and not the inner one ( r ).

Let's see what else was interesting.

 public RegexpFormattingSettings( bool putOptionalInSquareBrackets, bool showAnyElementAsQuestionMark, bool ignoreElementDistributionDetails, int truncationLength, bool escapeCharacters, bool useLazyQuantifier) { this.PutOptionalInSquareBrackets = putOptionalInSquareBrackets; this.ShowAnyElementAsQuestionMark = showAnyElementAsQuestionMark; this.IgnoreElementDistributionDetails = ignoreElementDistributionDetails; this.TruncationLength = truncationLength; this.EscapeCharacters = escapeCharacters; } 

PVS-Studio warning : V3117 Constructor parameter 'useLazyQuantifier' is not used. Runtime RegexpFormattingSettings.cs 38

The constructor does not use one parameter - useLazyQuantifier . It looks especially suspicious against the background of the fact that the class defines a property with the appropriate name and type - UseLazyQuantifier . Apparently, they forgot to initialize it through the corresponding parameter.

There are several potentially dangerous event handlers. An example of one of them is shown below:

 public class RecommenderRun { .... public event EventHandler Started; .... public void Execute() { // Report that the run has been started if (this.Started != null) { this.Started(this, EventArgs.Empty); } .... } .... } 

PVS-Studio warning : V3083 Unsafe invocation of event 'Started', NullReferenceException is possible. Consider assigning an event to a local variable before invoking it. Evaluator RecommenderRun.cs 115

The fact is that between checking for null inequality and calling the handler, an unsubscribing event can occur, and if at the time between checking for null and calling the handler the event has no subscribers, a NullReferenceException will be thrown . To avoid such problems, you can, for example, save a reference to a chain of delegates to a local variable or use the '?.' to call handlers.

In addition to the above code snippet, there were 35 such places.

By the way, 785 warnings of V3024 were met. The warning V3024 is issued when comparing real numbers using the '! =' Or '==' operators. I will not dwell here on why such comparisons are not always correct - for more details, see the documentation, there is also a link to StackOverflow (this is it).

Considering that formulas and calculations were often encountered, these warnings can also be important, although they are placed at level 3 (since not all projects are relevant at all).

If there is a certainty that these warnings are irrelevant - you can remove them with almost one click , reducing the total number of analyzer responses.



Conclusion


Somehow it happened that I hadn’t written articles about project validation for a long time, and it was quite pleasant to touch this process again. I hope that you learned something new / useful from this article, or at least read it with interest.

I wish the developers to fix the problem areas as soon as possible and remind that making mistakes is normal, yet we are human. That's why additional tools like static analyzers are needed to find what a person missed, right? Anyway - good luck with the project, and thanks for the work!

And remember that the maximum benefit from the static analyzer is achieved with its regular use .

All the best!



If you want to share this article with an English-speaking audience, then please use the link to the translation: Sergey Vasiliev. What Errors Lurk in Infer .NET Code?

Source: https://habr.com/ru/post/429578/


All Articles