Nullable Reference types in C# 8.0 and static analysis

It's no secret that Microsoft has been working on the eighth version of C# for a long time. The new version of the language (C# 8.0) is already available in the recent release of Visual Studio 2019, but only as a beta. Some of the features planned for this version may seem not quite obvious or, more precisely, not quite expected. One of them is the ability to use Nullable Reference types. The stated purpose of this feature is to fight Null Reference Exceptions (NRE).

We are glad that the language is evolving and that new features should help developers. Coincidentally, we have recently and significantly expanded the ability of our analyzer, PVS-Studio for C#, to detect those very same NREs in code. So we wondered: does it still make sense for static analyzers in general, and for PVS-Studio in particular, to look for potential null dereferences if, at least in new code that uses Nullable Reference types, such dereferences become “impossible”? Let's try to answer this question.

Pros and cons of innovation

To begin with, it is worth recalling that in the latest beta version of C# 8.0 available at the time of writing, Nullable Reference types are disabled by default, i.e. the behavior of reference types does not change.

So what are nullable reference types in C# 8.0 once you enable them? They are the same good old reference types, except that variables of such a type must now be marked with '?' (for example, string?), just as it is already done for Nullable<T>, i.e. nullable value types (for example, int?). However, the same string without '?' is now interpreted as a non-nullable reference, i.e. a reference type whose variable cannot contain null.
Null Reference Exception is one of the most annoying exceptions, since it says little about the source of the problem, especially if the method that threw it contains several dereferences in a row. The ability to forbid assigning null to a variable of a reference type looks great, but what if null used to be passed to a method and some further execution logic depended on it? Of course, instead of null you can pass a literal, a constant, or simply an “impossible” value that, according to the program logic, cannot be assigned to this variable anywhere else. However, a crash of the whole program may then be replaced by “silent” incorrect execution later on, which is not always better than seeing the error right away.

What if we throw an exception instead? A meaningful exception at the place where something went wrong is always better than an NRE somewhere up or down the stack. That works well in your own project, where you can fix the consumers and insert a try-catch block. When developing a library with (non-)nullable references, however, we take on the responsibility that a method always returns a value. And even in your own code it is not always possible (or at least not easy) to replace a returned null with a thrown exception, because too much code may be affected.

Nullable Reference can be enabled for an entire project by adding the NullableContextOptions property with the value enable (the final C# 8.0 release names this property Nullable), or at the file level with a preprocessor directive:

    #nullable enable
    string cantBeNull = string.Empty;
    string? canBeNull = null;
    cantBeNull = canBeNull!;

Types now become more informative. From a method's signature you can tell how it behaves: whether it checks for null and whether it can return null. Now, if you try to use a nullable reference variable without a check, the compiler will issue a warning.
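As a rough illustration of that warning, here is a minimal sketch. The class NullableWarningSketch and the helper FindName are hypothetical names invented for this example; the commented-out line is the one the compiler would flag (warning CS8602).

```csharp
#nullable enable
using System;

public static class NullableWarningSketch
{
    // Hypothetical helper: the '?' in the signature says the result may be null.
    public static string? FindName(int id) => id == 1 ? "Roslyn" : null;

    public static int NameLength(int id)
    {
        string? name = FindName(id);

        // return name.Length;   // warning CS8602: dereference of a possibly null reference

        // After an explicit check the compiler treats 'name' as non-null.
        if (name == null)
            return 0;

        return name.Length;
    }
}
```

Calling NullableWarningSketch.NameLength(2) takes the null branch and returns 0 instead of throwing.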

This is quite convenient when working with third-party libraries, but it also opens the door to misinformation. The point is that passing null is still possible, for example by using the new null-forgiving operator (!). That is, a single exclamation mark is enough to break every further assumption made about an interface that uses such variables:

    #nullable enable
    String GetStr() { return _count > 0 ? _str : null!; }
    String str = GetStr();
    var len = str.Length;

Yes, you can say that it is wrong to write code this way and that nobody ever will, but as long as this possibility exists, you can no longer rely solely on the contract imposed by the method's interface (that it cannot return null).

By the way, you can write the same thing with several ! operators, because C# now allows code like this (and it compiles just fine):

 cantBeNull = canBeNull!!!!!!!;

That is, we seem to be adding extra emphasis: pay attention, this can be null!!! (on our team we call this “emotional” programming). In fact, when building the syntax tree, the compiler (from Roslyn) interprets the ! operator much like plain parentheses, so their number, as with parentheses, is not limited. Although, if you write enough of them, you can crash the compiler. Perhaps this will change in the final version of C# 8.0.

Similarly, you can get around the compiler warning when accessing a nullable reference variable without a check:

 canBeNull!.ToString();

You can write more emotionally:

 canBeNull!!!?.ToString();

Such syntax is hard to imagine in a real project: by adding the null-forgiving operator we tell the compiler that everything is fine and no check is needed. By adding the Elvis operator we then say: actually, things may not be fine, so let's check.
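The difference between the two operators shows up only at run time. The following minimal sketch (NullForgivingSketch and its methods are hypothetical names used only for illustration) assumes nothing beyond standard C# 8.0 behavior: '!' merely silences the compiler, while '?.' performs a real check.

```csharp
#nullable enable
using System;

public static class NullForgivingSketch
{
    // '!' only suppresses the compiler warning; the runtime still sees null.
    public static bool ThrowsWithBang(string? canBeNull)
    {
        try
        {
            canBeNull!.ToString();
            return false;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }

    // '?.' checks for null at run time and yields null instead of throwing.
    public static string? SafeToString(string? canBeNull) => canBeNull?.ToString();
}
```

Here ThrowsWithBang(null) reports that an NRE was thrown, whereas SafeToString(null) quietly returns null.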

A legitimate question now arises: if the concept of a non-nullable reference type implies that a variable of this type cannot contain null, why can we still write null into it so easily? The point is that “under the hood”, at the level of IL code, our non-nullable reference type remains the same “ordinary” reference type. All the nullability syntax is in fact just an annotation for the static analyzer built into the compiler (and, in our opinion, not the most convenient analyzer, but more on that later). We believe that adding new syntax to the language merely as an annotation for a third-party tool (even one built into the compiler) is not the most “elegant” solution, since it may not be obvious to a programmer using the language that this is only an annotation. After all, the very similar syntax for nullable structures works quite differently.
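One way to see that the annotation does not create a new runtime type is to compare field types through reflection. This is only a sketch: the class AnnotationSketch and its fields are made up for the example.

```csharp
#nullable enable
using System;
using System.Reflection;

public class AnnotationSketch
{
    public string Plain = string.Empty;
    public string? Annotated = null;
    public int Number = 0;
    public int? OptionalNumber = null;

    // Returns the runtime type of the public field with the given name.
    public static Type FieldType(string name)
    {
        FieldInfo field = typeof(AnnotationSketch).GetField(name)
            ?? throw new ArgumentException("No such field: " + name);
        return field.FieldType;
    }
}
```

FieldType("Plain") and FieldType("Annotated") are both System.String, while FieldType("OptionalNumber") is Nullable<int>, a type distinct from int.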

Let's return to the ways in which Nullable Reference types can still be broken. At the time of writing, if a solution contains several projects and a reference variable, say of type String, is passed from a method declared in one project to a method in another project where NullableContextOptions is enabled, the compiler decides that it is dealing with a non-nullable String and issues no warning. And this is despite the many [Nullable(1)] attributes added to every field and method of a class in the IL code when Nullable Reference is enabled. By the way, keep these attributes in mind if you work with attribute lists through reflection and expect to find only the attributes you added yourself.
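If reflection code must see only user-defined attributes, one possible approach is to filter by namespace. The sketch below assumes that the compiler-emitted nullability attributes live in System.Runtime.CompilerServices; MyMarkerAttribute, AnnotatedSample and AttributeFilterSketch are hypothetical names. Note that this filter also drops every other attribute from that namespace.

```csharp
#nullable enable
using System;
using System.Linq;
using System.Reflection;

// Hypothetical user-defined attribute.
[AttributeUsage(AttributeTargets.All)]
public sealed class MyMarkerAttribute : Attribute { }

public class AnnotatedSample
{
    [MyMarker]
    public string? Name;
}

public static class AttributeFilterSketch
{
    // Returns the names of attributes declared outside System.Runtime.CompilerServices,
    // assuming that is where the compiler places its nullability attributes.
    public static string[] UserAttributeNames(MemberInfo member) =>
        member.GetCustomAttributes(inherit: false)
              .Select(a => a.GetType())
              .Where(t => t.Namespace != "System.Runtime.CompilerServices")
              .Select(t => t.Name)
              .ToArray();
}
```

For the Name field above, the filtered list contains only MyMarkerAttribute, whatever the compiler added next to it.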

This situation may cause additional problems when migrating a large code base to Nullable Reference. Most likely the process will be gradual, project by project. Of course, with a careful approach you can move to the new functionality step by step, but if you already have a working project, any changes to it are dangerous and undesirable (if it works, don't touch it!). That is why the PVS-Studio analyzer does not require you to edit the source code or mark it up in any way in order to detect potential NREs. To check the places where a NullReferenceException can occur, you just need to run the analyzer and look at the V3080 warnings. There is no need to change project properties or source code, and no need to add directives, attributes or operators.

When adding support for Nullable Reference types to the PVS-Studio analyzer, we faced a choice: should the analyzer treat non-nullable reference variables as always holding non-null values? After studying the ways in which this guarantee can be “broken”, we concluded that no, the analyzer should not make such an assumption. After all, even if non-nullable reference types are used everywhere in a project, the analyzer can complement them by detecting situations in which such a variable may nevertheless contain null.

How PVS-Studio searches for Null Reference Exceptions

The dataflow mechanisms of the C# PVS-Studio analyzer track the possible values of variables during analysis. In particular, PVS-Studio performs interprocedural analysis, i.e. it tries to determine the possible value returned by a method, as well as by the methods called inside that method, and so on. Among other things, the analyzer remembers variables that can potentially be null. If it then sees such a variable dereferenced without a check, either in the code currently being checked or inside a method called from that code, it issues warning V3080 about a potential Null Reference Exception.

The main idea behind this diagnostic is that the analyzer complains only if it has seen null assigned to the variable somewhere. This is the main difference between this diagnostic and the analyzer built into the compiler, which works with Nullable Reference types. The built-in analyzer complains about any dereference of an unchecked nullable reference variable, unless, of course, it is “fooled” by the ! operator or simply by a sufficiently convoluted check (it is worth noting, though, that any analyzer can be fooled one way or another, especially if you set out to do so, and PVS-Studio is no exception).

PVS-Studio complains only if it sees null (in the local context or coming from a method). Even if the variable is a non-nullable reference variable, the analyzer's behavior does not change: it will still complain if it sees that null was written to it. This approach seems more correct to us (or at least more convenient for the user of the analyzer), since it does not require “smearing” null checks all over the code to find potential dereferences. That could have been done before, without Nullable Reference, for example with contracts. In addition, the analyzer can now be used as an extra control over those same non-nullable reference variables. If they are used “honestly” and never assigned null, the analyzer stays silent. If null is assigned and the variable is dereferenced without a check, the analyzer warns about it with message V3080:

    #nullable enable
    String GetStr() { return _count > 0 ? _str : null!; }
    String str = GetStr();
    var len = str.Length; // <== V3080: Possible null dereference. Consider inspecting 'str'

Let's now look at some examples of V3080 being triggered in the code of Roslyn itself. We checked this project not so long ago, but this time we will consider only potential Null Reference Exceptions that were not covered in previous articles. Let's see how the PVS-Studio analyzer finds potential null dereferences and how these places can be fixed using the new Nullable Reference syntax.

V3080 [CWE-476] Possible null dereference inside method. Consider inspecting the 2nd argument: chainedTupleType. Microsoft.CodeAnalysis.CSharp TupleTypeSymbol.cs 244

    NamedTypeSymbol chainedTupleType;
    if (_underlyingType.Arity < TupleTypeSymbol.RestPosition)
    {
        ....
        chainedTupleType = null;
    }
    else
    {
        ....
    }
    return Create(ConstructTupleUnderlyingType(firstTupleType,
                                               chainedTupleType,
                                               newElementTypes),
                  elementNames: _elementNames);

As you can see, the variable chainedTupleType can be null in one of the execution branches. It is then passed to the ConstructTupleUnderlyingType method, where it is used after a check through Debug.Assert. This situation is very common in Roslyn, but keep in mind that Debug.Assert is removed from the release build. Therefore, the analyzer still considers the dereference inside ConstructTupleUnderlyingType dangerous. Here is the body of that method, where the dereference takes place:

    internal static NamedTypeSymbol ConstructTupleUnderlyingType(
        NamedTypeSymbol firstTupleType,
        NamedTypeSymbol chainedTupleTypeOpt,
        ImmutableArray<TypeWithAnnotations> elementTypes)
    {
        Debug.Assert(chainedTupleTypeOpt is null == elementTypes.Length < RestPosition);
        ....
        while (loop > 0)
        {
            ....
            currentSymbol = chainedTupleTypeOpt.Construct(chainedTypes);
            loop--;
        }
        return currentSymbol;
    }

Whether the analyzer should take such asserts into account is actually debatable (some of our users want it to), especially since it already takes contracts from System.Diagnostics.Contracts into account, for example. Here is a small real-life example from our own use of Roslyn in the analyzer. We recently added support for the new version of Visual Studio and, at the same time, updated Roslyn in the analyzer to version 3. After that, the analyzer started crashing on certain code that it used to check without problems. The crash happened not in our code but inside Roslyn itself, with a Null Reference Exception. Further debugging showed that just a couple of lines above the place where Roslyn now crashes, there is that same kind of null check through Debug.Assert. And, as we can see, it did not help.

This is a very good example of the problems with Nullable Reference, because the compiler treats Debug.Assert as a valid check in any configuration. That is, if you simply add #nullable enable and mark the chainedTupleTypeOpt argument as a nullable reference, the compiler will issue no warning at the dereference in the ConstructTupleUnderlyingType method.
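Debug.Assert disappears from release builds because it is marked with [Conditional("DEBUG")]: calls to it are compiled only when the DEBUG symbol is defined. The sketch below reproduces the same mechanism with a hypothetical Check method and a made-up symbol SKETCH_CHECKS, which is never defined, so the effect does not depend on the build configuration.

```csharp
using System;
using System.Diagnostics;

public static class ConditionalAssertSketch
{
    // Hypothetical stand-in for Debug.Assert: calls are compiled only when
    // SKETCH_CHECKS is defined, just as Debug.Assert calls need DEBUG.
    [Conditional("SKETCH_CHECKS")]
    private static void Check(bool condition)
    {
        if (!condition)
            throw new InvalidOperationException("Check failed");
    }

    public static string Describe(string text)
    {
        Check(text != null);   // dropped by the compiler: SKETCH_CHECKS is not defined

        try
        {
            return text.Length.ToString();
        }
        catch (NullReferenceException)
        {
            return "NRE";
        }
    }
}
```

Describe(null) ends up in the catch block: the “check” above the dereference never runs, much like Debug.Assert in a release build.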

Let's look at the next PVS-Studio warning.

V3080 Possible null dereference. Consider inspecting 'effectiveRuleset'. RuleSet.cs 146

    var effectiveRuleset = ruleSet.GetEffectiveRuleSet(includedRulesetPaths);
    effectiveRuleset = effectiveRuleset.WithEffectiveAction(ruleSetInclude.Action);
    if (IsStricterThan(effectiveRuleset.GeneralDiagnosticOption, ....))
        effectiveGeneralOption = effectiveRuleset.GeneralDiagnosticOption;

This warning says that the call to WithEffectiveAction may return null, yet the result is used without a check (effectiveRuleset.GeneralDiagnosticOption). Here is the body of WithEffectiveAction, which can return null; its result is written to the variable effectiveRuleset:

    public RuleSet WithEffectiveAction(ReportDiagnostic action)
    {
        if (!_includes.IsEmpty)
            throw new ArgumentException(....);
        switch (action)
        {
            case ReportDiagnostic.Default:
                return this;
            case ReportDiagnostic.Suppress:
                return null;
            ....
                return new RuleSet(....);
            default:
                return null;
        }
    }

If we enable Nullable Reference mode for the GetEffectiveRuleSet method, there will be two places where the behavior has to change. Since the method above throws an exception, it is logical to assume that its call is wrapped in a try-catch block, so it would seem correct to rewrite the method to throw an exception instead of returning null. But going up the call chain, we see that the exception is caught far above, and the consequences may be quite unpredictable. Let's look at the consumer of the effectiveRuleset variable, the IsStricterThan method:

    private static bool IsStricterThan(ReportDiagnostic action1, ReportDiagnostic action2)
    {
        switch (action2)
        {
            case ReportDiagnostic.Suppress:
                ....;
            case ReportDiagnostic.Warn:
                return action1 == ReportDiagnostic.Error;
            case ReportDiagnostic.Error:
                return false;
            default:
                return false;
        }
    }

As you can see, this is a simple switch over two enumerations, with ReportDiagnostic.Default as a possible value. So it is best to rewrite the call as follows.

The signature of WithEffectiveAction will change:

    #nullable enable
    public RuleSet? WithEffectiveAction(ReportDiagnostic action)

The call will look like this:

    RuleSet? effectiveRuleset = ruleSet.GetEffectiveRuleSet(includedRulesetPaths);
    effectiveRuleset = effectiveRuleset?.WithEffectiveAction(ruleSetInclude.Action);
    if (IsStricterThan(effectiveRuleset?.GeneralDiagnosticOption ?? ReportDiagnostic.Default,
                       effectiveGeneralOption))
        effectiveGeneralOption = effectiveRuleset.GeneralDiagnosticOption;

Knowing that IsStricterThan only performs a comparison, the condition can be rewritten so that the assignment in its body never runs for a null effectiveRuleset, for example:

    if (effectiveRuleset != null && IsStricterThan(effectiveRuleset.GeneralDiagnosticOption, effectiveGeneralOption))

Let's move on to the next analyzer warning.

V3080 Possible null dereference. Consider inspecting 'propertySymbol'. BinderFactory.BinderFactoryVisitor.cs 372

    var propertySymbol = GetPropertySymbol(parent, resultBinder);
    var accessor = propertySymbol.GetMethod;
    if ((object)accessor != null)
        resultBinder = new InMethodBinder(accessor, resultBinder);

When fixing this warning, the further use of the propertySymbol variable has to be taken into account. GetPropertySymbol itself can return null:

    private SourcePropertySymbol GetPropertySymbol(
        BasePropertyDeclarationSyntax basePropertyDeclarationSyntax,
        Binder outerBinder)
    {
        ....
        NamedTypeSymbol container = GetContainerType(outerBinder, basePropertyDeclarationSyntax);
        if ((object)container == null)
            return null;
        ....
        return (SourcePropertySymbol)GetMemberSymbol(propertyName,
            basePropertyDeclarationSyntax.Span, container, SymbolKind.Property);
    }

The GetMemberSymbol method can also return null in some cases.

    private Symbol GetMemberSymbol(
        string memberName,
        TextSpan memberSpan,
        NamedTypeSymbol container,
        SymbolKind kind)
    {
        foreach (Symbol sym in container.GetMembers(memberName))
        {
            if (sym.Kind != kind)
                continue;
            if (sym.Kind == SymbolKind.Method)
            {
                ....
                var implementation = ((MethodSymbol)sym).PartialImplementationPart;
                if ((object)implementation != null)
                    if (InSpan(implementation.Locations[0], this.syntaxTree, memberSpan))
                        return implementation;
            }
            else if (InSpan(sym.Locations, this.syntaxTree, memberSpan))
                return sym;
        }
        return null;
    }

With nullable reference types, the call changes as follows:

    #nullable enable
    SourcePropertySymbol? propertySymbol = GetPropertySymbol(parent, resultBinder);
    MethodSymbol? accessor = propertySymbol?.GetMethod;
    if ((object)accessor != null)
        resultBinder = new InMethodBinder(accessor, resultBinder);

This is pretty simple once you know where to fix it. Static analysis easily finds this potential error by collecting all possible values along the whole chain of method calls.

V3080 Possible null dereference. Consider inspecting 'simpleName'. CSharpCommandLineParser.cs 1556

    string simpleName;
    simpleName = PathUtilities.RemoveExtension(
        PathUtilities.GetFileName(sourceFiles.FirstOrDefault().Path));
    outputFileName = simpleName + outputKind.GetDefaultExtension();
    if (simpleName.Length == 0 && !outputKind.IsNetModule())
        ....

The problem is in the line with the simpleName.Length check. simpleName is the result of a whole chain of method calls and can be null. By the way, out of curiosity you can look at the RemoveExtension method and find how it differs from Path.GetFileNameWithoutExtension. Here we could limit ourselves to a simpleName != null check, but in the context of non-nullable references the code will look something like this:

    #nullable enable
    public static string? RemoveExtension(string path) { .... }
    string simpleName;

The call will look like this:

    simpleName = PathUtilities.RemoveExtension(
        PathUtilities.GetFileName(sourceFiles.FirstOrDefault().Path)) ?? String.Empty;
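The same pattern in isolation might look like the sketch below. The RemoveExtension here is a simplified, hypothetical stand-in and not Roslyn's PathUtilities.RemoveExtension; it exists only to show how ?? turns a string? result into a non-nullable string.

```csharp
#nullable enable
using System;

public static class CoalesceSketch
{
    // Hypothetical simplified stand-in: returns null when there is no path at all.
    public static string? RemoveExtension(string? path)
    {
        if (path == null)
            return null;

        int dot = path.LastIndexOf('.');
        return dot < 0 ? path : path.Substring(0, dot);
    }

    // '??' substitutes an empty string, so the result is never null.
    public static string SimpleName(string? path) =>
        RemoveExtension(path) ?? string.Empty;
}
```

With this, SimpleName(null).Length is simply 0 rather than an NRE. Whether an empty name is the right fallback depends on the surrounding logic, as discussed earlier for “impossible” values.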

Conclusion

Nullable Reference types can help a lot when designing an architecture from scratch, but reworking existing code can take a lot of time and care, since it may introduce many subtle errors. In this article, we did not set out to discourage anyone from using Nullable Reference types in their projects. We consider this feature generally useful for the language, although the way it was implemented may raise questions.

You should always keep in mind the limitations inherent in this approach: enabling Nullable Reference mode does not protect against null dereference errors and, if used incorrectly, can even lead to them. It is worth considering a modern static analyzer with interprocedural analysis, such as PVS-Studio, as an additional tool that, together with Nullable Reference, can protect you from null dereferences. Each of these approaches, in-depth interprocedural analysis and annotation of method signatures (which is essentially what Nullable Reference does), has its advantages and disadvantages. The analyzer lets you get a list of potentially dangerous places and, when you change existing code, see all the consequences of such changes. If you assign null somewhere, the analyzer should immediately point out all consumers of the variable where it is dereferenced without a check.

You can look for other errors yourself, both in the project discussed here and in your own. To do this, just download and try the PVS-Studio analyzer.

If you want to share this article with an English-speaking audience, please use the link to the translation: Paul Eremeev, Alexander Senichkin. Nullable Reference types in C# 8 and static analysis

Source: https://habr.com/ru/post/455230/
