Accord.Net: we are looking for an error in the code, due to which the machines will enslave humanity

Articles about validating open source projects are a good thing. Someone, including developers, will learn about errors contained in the project, someone will learn about the methodology of static analysis and will begin to apply it to improve the quality of their code. For us, this is a great way to popularize the PVS-Studio analyzer, and at the same time the possibility of additional testing. This time I checked the Accord.Net platform and found many interesting fragments in the code.

About project and analyzer

Accord.Net is a .Net based machine learning platform written in C #. The platform consists of several libraries covering a wide range of tasks, such as static data processing, machine learning, pattern recognition, etc. The project source code is available in the repository on GitHub .

')
To check the project, we used the PVS-Studio static code analyzer, available for download by reference . In addition, you can read other articles about checking open projects or the " base of errors ", in which we collect the bugs we found.

Little warning

The analyzer issued 91 first-level warnings and 141 second-level warnings. 109 of these warnings have been described or mentioned in the article. Skimming through the rest of the warnings, I would consider 23 more errors, but they are not listed in the article, as they seemed not interesting enough or were similar to those already described. It is already more difficult to judge the remaining warnings of the analyzer - you need to more thoroughly understand and analyze the code. As a result, at least 132 of 232 warnings I would attribute to errors. This suggests that the number of false positives of the analyzer is approximately 46%. And, no, wait ... This tells us that every second analyzer warning is an error in the code! In my opinion, this is quite a weighty argument for the need to use static analysis tools. How correctly and incorrectly to use static analyzers is written at the end of the article. In the meantime, I propose to look at what interesting was found in the source code.

Bugs found

Same subexpressions

It is quite easy to make an error caught by the V3001 diagnostic message, especially if you use copy-paste, or if the names of the variables used in the expression are similar. These errors are one of the most common, and this project was not without them. Try to find an error in the fragment below, without looking into the analyzer warning.

public Blob[] GetObjects(UnmanagedImage image, bool extractInOriginalSize) { .... if ((image.PixelFormat != PixelFormat.Format24bppRgb) && (image.PixelFormat != PixelFormat.Format8bppIndexed) && (image.PixelFormat != PixelFormat.Format32bppRgb) && (image.PixelFormat != PixelFormat.Format32bppArgb) && (image.PixelFormat != PixelFormat.Format32bppRgb) && (image.PixelFormat != PixelFormat.Format32bppPArgb) ) .... }

PVS-Studio warning : V3001 There are identical sub-expressions 'image.PixelFormat! = PixelFormat.Format32bppRgb' operator. Accord.Imaging BlobCounterBase.cs 670

Here, the subexpression image.PixelFormat! = PixelFormat.Format32bppRgb is repeated twice . It was rather simple to make a mistake with such names of the enumeration elements, which happened. It is worth noting that, looking through such code, it is very easy to miss the extra subexpression. But whether one of the comparisons is superfluous, or a different meaning of the listing was implied, the question is interesting. In one case, there will simply be a redundant code, and in the other, a logical error.

This code was met again in the same file in line 833. Copy-paste ... Copy-paste never changes.

Wrong Cycle Exit Condition

We all used to call cycle counters by names like i , j , k , etc. As a rule, it is convenient and is common practice, but sometimes it can stand sideways. As in the example below.

 public static void Convert(float[][] from, short[][] to) { for (int i = 0; i < from.Length; i++) for (int j = 0; i < from[0].Length; j++) to[i][j] = (short)(from[i][j] * (32767f)); }

PVS-Studio warning : V3015 It is likely that a variable is being compared inside the 'for' operator. Consider reviewing 'i' Accord.Audio SampleConverter.cs 611

In the condition of exiting the second cycle, the variable i is used , although j is the counter of this cycle. In this case, the variable i inside the body of the nested loop does not change. As a result, j will be incremented until it goes beyond the bounds of the array, and an exception is generated.

Different logic for the same conditions.

The code met when for two identical if statements different logic was provided.

 public void Fit(double[][] observations, double[] weights, MultivariateEmpiricalOptions options) { if (weights != null) throw new ArgumentException("This distribution does not support weighted samples.", "weights"); .... if (weights != null) weights = inPlace ? weights : (double[])weights.Clone(); .... }

PVS-Studio warning : V3021 There are two 'if' statements with identical conditional expressions. The first 'if' statement contains method return. If it’s a statement of senseless Accord.Statistics MultivariateEmpiricalDistribution.cs 653

Strange code, right? Especially considering the fact that the weights variable is a parameter of the method and is not used between these conditions. Thus, the second if statement will never be executed, since if the expression weights! = Null is true, an exception of type ArgumentException will be thrown.

This code is met again in the same file in line 687.

Conditions whose values are always false

Diagnostic rule V3022 is very prettier from the first release and never ceases to amaze me. Let's see if it can surprise you. First of all, I suggest finding an error in the code without resorting to the diagnostic message of the analyzer. At the same time, I draw attention to the fact that this is already a simplified version, with fragments of code cut out.

 private static void dscal(int n, double da, double[] dx, int _dx_offset, int incx) { .... if (((n <= 0) || (incx <= 0))) { return; } .... int _i_inc = incx; for (i = 1; (_i_inc < 0) ? i >= nincx : i <= nincx; i += _i_inc) .... }

PVS-Studio warning : V3022 Expression '(_i_inc <0)' is always false. Accord.Math BoundedBroydenFletcherGoldfarbShanno.FORTRAN.cs 5222

Of course, now it is easier to find the error, because of the method were specially cut out fragments that are not interesting to us. Yet it is hard to tell right away where the problem is in this code. And the fact is (as you might have guessed by reading the message from the analyzer) that the expression (_i_inc <0) is always false. Here you should pay attention to the fact that the _i_inc variable is initialized by the value of the incx variable. At the same time, the value of the incx variable at the time of initialization _i_inc is positive, since otherwise it would be possible to exit the method body above the code. Therefore, _i_inc can only have a positive value, the result of the comparison _i_inc <0 is always false , and the condition for exiting the loop is always the expression i <= nincx.

Such an in-depth analysis became available due to the mechanism of virtual values , which greatly improved some diagnostic rules of the analyzer.

 private void hqr2() { .... int low = 0; .... for (int i = 0; i < nn; i++) { if (i < low | i > high) .... } .... }

PVS-Studio warning : V3063 A part of conditional expression is always false: i <low. Accord.Math JaggedEigenvalueDecompositionF.cs 571

The subexpression i <low will always be false , since the minimum value accepted by the variable i is 0, and the low value at the time of comparison will also always be 0. Therefore, the subexpression i <low will be calculated as “idle”.

There are many such places. I will not give all warnings, but I will list a few:

V3063 A part of the conditional expression is always false: i <low. Accord.Math JaggedEigenvalueDecompositionF.cs 972
V3063 A part of the conditional expression is always false: i <low. Accord.Math JaggedEigenvalueDecomposition.cs 571
V3063 A part of the conditional expression is always false: i <low. Accord.Math JaggedEigenvalueDecomposition.cs 972
V3063 A part of the conditional expression is always false: i <low. Accord.Math EigenvalueDecomposition.cs 567

Integer division with reduction to real type

The analyzer found suspicious mathematical calculations in the code. It is enough just to forget that when dividing integers, integer division is performed. If real division was meant, an unpleasant and subtle mistake can occur. In some cases, it is difficult for a third-party developer to say whether such an expression contains an error, but it is worth checking out such operations. I propose to consider several such cases.

 public static double GetSpectralResolution(int samplingRate, int samples) { return samplingRate / samples; }

PVS-Studio warning : V3041 The expression was implicitly cast from 'int' type to 'double' type. Consider using a fractional part. An example: double A = (double) (X) / Y ;. Accord.Audio Tools.cs 158

This method performs the division of two integer variables, but the result of this division is implicitly cast to double type. It looks suspicious.

But the following code looks even more strange:

 public static int GreatestCommonDivisor(int a, int b) { int x = a - b * (int)Math.Floor((double)(a / b)); while (x != 0) { a = b; b = x; x = a - b * (int)Math.Floor((double)(a / b)); } return b; }

PVS-Studio warnings:

V3041 The expression was implicitly cast from 'int' type to 'double' type. Consider using a fractional part. An example: double A = (double) (X) / Y ;. Accord.Math Tools.cs 137
V3041 The expression was implicitly cast from 'int' type to 'double' type. Consider using a fractional part. An example: double A = (double) (X) / Y ;. Accord.Math Tools.cs 142

The analyzer found the expression suspicious (double) (a / b) . The Floor method returns the largest integer less than or equal to the specified double-precision floating-point number ( MSDN. Math. Flora ). But the variables a and b are of type int , therefore - integer division will be performed, the result of which will be an integer value. It turns out that there is no point in explicitly casting this value to a double type and calling the Floor method.

To perform the operation correctly, it was necessary to cast one of the operands to type double . Then the real division would be performed, and the call to the Floor method would be justified:

 Math.Floor((double)a / b)

The parameter of the method, the value of which is always overwritten

We continue. Errors of the following type are not common, but still come across.

 private static double WeightedMode(double[] observations, double[] weights, double mode, int imax, int imin) { .... var bestValue = currentValue; .... mode = bestValue; return mode; }

PVS-Studio warning : V3061 Parameter 'mode' is always rewritten in method body before being used. Accord.Statistics TriangularDistribution.cs 646

One of the method parameters - mode - is overwritten and returned. In this case, the mode method body is not used in any way (not counting overwriting). I will not undertake to assert that this is a mistake (among the suspicious places found by this diagnostic rule in other projects, it was clearly erroneous), but this code looks strange.

In general, there is an interesting feature: as a rule, errors in this project are not encountered one by one. The exact same code can be found in several places. Indeed, copy-paste does not change ...

V3061 Parameter 'mode' is always rewritten in method body before being used. Accord.Statistics TriangularDistribution.cs 678
V3061 Parameter 'mode' is always rewritten in method body before being used. Accord.Statistics TriangularDistribution.cs 706
V3061 Parameter 'mode' is always rewritten in method body before being used. Accord.Statistics TriangularDistribution.cs 735

Null reference dereference

 public override string ToString(string format, IFormatProvider formatProvider) { .... var fmt = components[i] as IFormattable; if (fmt != null) sb.AppendFormat(fmt.ToString(format, formatProvider)); else sb.AppendFormat(fmt.ToString()); .... }

PVS-Studio warning : V3080 Possible null dereference. Consider inspecting 'fmt'. Accord.Statistics MultivariateMixture'1.cs 697

Usually, errors that lead to exceptions are trapped during the development process, if the code is tested thoroughly enough. But not always, and the example above is proof of that. If the fmt! = Null expression is false, the instance method ToString is called on the fmt object . Result? NullReferenceException Exception .

And, as you might have guessed, the same mistake occurred again: MultivariateMixture'1.cs 697

Mutual assignment of links

 public class MetropolisHasting<T> : IRandomNumberGenerator<T[]> { .... T[] current; T[] next; .... public bool TryGenerate() { .... var aux = current; current = next; next = current; .... } .... }

PVS-Studio warning : V3037 An odd sequence of assignments of this kind: A = B; B = A ;. Check lines: 290, 289. Accord.Statistics MetropolisHasting.cs 290

Obviously, in the above fragment of the TryGenerate method, they wanted to swap references to the arrays next and current (the aux variable is not used anywhere else). But because of the error, it turned out that both current and next now store references to the same array — the one to which was contained in the variable next .

The correct code should look like this:

 var aux = current; current = next; next = aux;

Potential division by 0

The analysis revealed several errors of potential division by 0. Glance take a look at them:

 public BlackmanWindow(double alpha, int length) : base(length) { double a0 = (1.0 - alpha) / 2.0; double a1 = 0.5; double a2 = alpha / 2.0; for (int i = 0; i < length; i++) this[i] = (float)(a0 - a1 * Math.Cos((2.0 * System.Math.PI * i) / (length - 1)) + a2 * Math.Cos((4.0 * System.Math.PI * i) / (length - 1))); }

PVS-Studio warnings:

V3064 Potential division by zero. Consider inspecting denominator '(length - 1)'. Accord.Audio BlackmanWindow.cs 64
V3064 Potential division by zero. Consider inspecting denominator '(length - 1)'. Accord.Audio BlackmanWindow.cs 65

The analyzer found the following code suspicious:

 (2.0 * System.Math.PI * i) / (length - 1)

It is possible that in reality there will be no error here ( length is the length of some window, and for the occurrence of an error it should have the value 1), but who knows? It is better to be safe, otherwise there is a great chance to get a very unpleasant mistake, which, moreover, will be hard to detect.

Another interesting code snippet met with potential division by 0.

 public static double[,] Centering(int size) { if (size < 0) { throw new ArgumentOutOfRangeException("size", size, "The size of the centering matrix must be a positive integer."); } double[,] C = Matrix.Square(size, -1.0 / size); .... }

PVS-Studio warning : V3064 Potential division by zero. Consider inspecting denominator 'size'. Accord.Math Matrix.Construction.cs 794

As for me - a very interesting and funny mistake. The analyzer noted that in the expression -1.0 / size the division by 0 is possible. Now pay attention to the test above. If size <0 , an exception will be generated, but if size == 0 , there will be no exception, but division by 0 will be. In this case, in the literal passed to the exception constructor, it is written that the size of the matrix must be a positive number, but the test is performed for non-negative values. A positive and non - negative - yet different concepts. I suppose, to correct the error, it is enough just to correct the check:

 if (size <= 0)

Using bit operator instead of logical

Sometimes you have to deal with the fact that not all programmers know the difference between bit and logical operators ('|' and '||', '&' and '&&'). This can lead to completely different consequences - from superfluous calculations to falls. This project encountered several suspicious places using bit operations:

 public JaggedSingularValueDecompositionF( Single[][] value, bool computeLeftSingularVectors, bool computeRightSingularVectors, bool autoTranspose, bool inPlace) { .... if ((k < nct) & (s[k] != 0.0)) .... }

PVS-Studio warning : V3093 The '&' operator evaluates both operands. Perhaps a short-circuit '&& operator should be used instead. Accord.Math JaggedSingularValueDecompositionF.cs 461

The body of the if statement will be executed if both subexpressions ( k <nct and s [k]! = 0.0 ) are true . But at the same time, even if the value of the first subexpression ( k <nct ) is false , the second subexpression will still be calculated, which would not be the case when using the && operator. So, if the programmer checked the value of k in order not to go beyond the array, then he did not get anything out of this.

Similar warnings:

V3093 The '&' operator evaluates both operands. Perhaps a short-circuit '&& operator should be used instead. Accord.Math JaggedSingularValueDecompositionF.cs 510
V3093 The '&' operator evaluates both operands. Perhaps a short-circuit '&& operator should be used instead. Accord.Math JaggedSingularValueDecompositionF.cs 595
V3093 The '&' operator evaluates both operands. Perhaps a short-circuit '&& operator should be used instead. Accord.Math JaggedSingularValueDecomposition.cs 461
V3093 The '&' operator evaluates both operands. Perhaps a short-circuit '&& operator should be used instead. Accord.Math JaggedSingularValueDecomposition.cs 510
V3093 The '&' operator evaluates both operands. Perhaps a short-circuit '&& operator should be used instead. Accord.Math JaggedSingularValueDecomposition.cs 595
V3093 The '&' operator evaluates both operands. Perhaps a short-circuit '&& operator should be used instead. Accord.Math Gamma.cs 296

Check in the loop of the same element

An error was found that was found by the diagnostic rule from the latest release of the analyzer.

 public override int[] Compute(double[][] data, double[] weights) { .... int cols = data[0].Length; for (int i = 0; i < data.Length; i++) if (data[0].Length != cols) throw new DimensionMismatchException("data", "The points matrix should be rectangular. The vector at position {} has a different length than previous ones."); .... }

PVS-Studio warning : V3102 Suspicious access to the element of the 'data' object. Accord.MachineLearning BinarySplit.cs 121

An interesting mistake. The programmer wanted to verify that the jagged data array is actually two-dimensional (i.e., is a matrix). But I made a mistake in the expression data [0] .Length! = Cols , using not the loop counter i , but the integer literal - 0 as the index value. As a result, the expression data [0] .Length! = Cols will always be false, since it is equivalent expression data [0] .Length! = data [0] .Length . If the data parameter were a two-dimensional array ( Double [,] ), this error, as well as the entire check, could have been avoided. But it is possible that the use of a gear array due to some features of the application architecture.

Passing to the method as the argument of the caller

The following code snippet also looks suspicious.

 public static double WeightedMean(this double[] values, double[] weights) { .... } public override void Fit(double[] observations, double[] weights, IFittingOptions options) { .... mean = observations.WeightedMean(observations); .... }

PVS-Studio warning : V3062 An object of the observation of its own method. Consider checking the 'WeightedMean' method. Accord.Statistics InverseGaussianDistribution.cs 325

The analyzer found it suspicious that the WeightedMean method takes as argument the same object for which it is called. This becomes doubly strange when you consider that the WeightedMean is an extension method. I conducted additional research by looking at how this method is used in other places. In all places of its use, the array of weights spoke as the second argument (note that such an array exists in the considered Fit method), so most likely this is an error and the correct code should look like this:

 mean = observations.WeightedMean(weights);

Potential serialization error

The potential problem of serializing one of the classes was discovered.

 public class DenavitHartenbergNodeCollection : Collection<DenavitHartenbergNode> { .... } [Serializable] public class DenavitHartenbergNode { .... public DenavitHartenbergNodeCollection Children { get; private set; } .... }

PVS-Studio warning : V3097. Possible exception: The 'DenavitHartenbergNode' type marked by [Serialable] contains non-serializable members not marked by [NonSerialized]. Accord.Math DenavitHartenbergNode.cs 77

When serializing an instance of the DenavitHartenbergNode class, an exception of the type SerializationException may occur. It all depends on the type of serializer selected. If, for example, an instance of the BinaryFormatter type acts as a serializer, an exception will be generated, since all serializable members are required (and this property is also serializable) decorated with the [Serializable] attribute.

Some solutions:

implement this property through the field, decorated with the [NonSerialized] attribute. Then the field (and therefore the property associated with it) will not be serialized;
implement the ISerializable interface, and in the GetObjecData method ignore the serialization of this property;
decorate the DenavitHartenbergNodeCollection type with the [Serializable] attribute .

Judging by the surrounding code (all other properties have serializable types), it is necessary to implement the latter case.

If instances of this type are serialized using serializers that do not require all members to be serialized to be decorated with the [Serializable] attribute, you can not worry about this code.

The analyzer detected a lot of unsafe event call fragments. How much? 75 warnings V3083 ! I propose to consider one piece of code, since the rest, in principle, are similar to it.

 private void timeUp_Elapsed(object sender, ElapsedEventArgs e) { .... if (TempoDetected != null) TempoDetected(this, EventArgs.Empty); }

PVS-Studio warning : V3083 Unsafe invocation of event 'TempoDetected', NullReferenceException is possible. Consider assigning an event to a local variable before invoking it. Accord.Audio Metronome.cs 223

In this code snippet, it is checked whether there are subscribers to the TempoDetected event, and, if they exist, the event is triggered. It was assumed that if, at the time of verification, TempoDetected had no subscribers, this would avoid the occurrence of an exception. However, there is a possibility that at the time between checking TempoDetected for null inequality and triggering an event, it will not have any subscribers (for example, there will be a reply from this event in other threads). As a result, if at the time the event is called, TempoDetected does not have subscribers, an exception of type NullReferenceException will be generated. To avoid such problems you can, for example, use the null-conditional operator '?.', Added in C # 6.0. You can read more about the problem and other ways to solve it in the documentation for this diagnostic rule.

How to and how not to use a static analyzer

Before you finish, I would like to talk a little bit about how you still need to use static analysis tools. Often we have to deal with this approach: “We checked our project before release. Something of no particular interest was found. ”No, no, no! This is the wrong way to use. As a parallel, you can give such an example - refuse to develop programs in the IDE, write everything in a simple notebook. And just before the release, start working in the IDE. Strange? Still would! Sense from this IDE, if most of the time, when from it would have been a sense, she was gathering dust on your SSD / HDD? The same situation with static analyzers. They should be applied regularly, not once.

If you check the application code with the analyzer just before the release, it is obvious that many errors will already be fixed. But at what cost? At the cost of nerves and time of the developers who wrote the code, at the cost of numerous tests designed to identify these very errors. As a result, the cost of such errors becomes, to put it mildly, considerable.

, . , , , , . , , , . , .

. , , . .

, .

Results

, . , - - , - . , . , , PVS-Studio . , .

, : Sergey Vasiliev. Accord.Net: Looking for a Bug that Could Help Machines Conquer Humankind .

Read the article and have a question?

Often our articles are asked the same questions. We collected answers to them here: Answers to questions from readers of articles about PVS-Studio, version 2015 . Please review the list.

Source: https://habr.com/ru/post/305406/

All Articles