
Examining pattern matching in C# 7

C# 7 finally introduced a long-awaited feature called pattern matching. If you are familiar with functional languages such as F#, you may be slightly disappointed with the feature in its current form, but even today it can simplify your code in a variety of scenarios.

Every new feature can be dangerous for a developer working on a performance-critical application. New levels of abstraction are good, but to use them effectively you need to know what is going on under the hood. Today we are going to explore the internals of pattern matching to understand how it is implemented.

The C# language introduced the concept of a pattern that can be used in an is-expression and inside a case block of a switch statement.

There are three types of patterns: the const pattern, the type pattern, and the var pattern.

Pattern matching in is-expressions


    public void IsExpressions(object o)
    {
        // Alternative way of checking for null
        if (o is null) Console.WriteLine("o is null");

        // Const pattern can refer to a constant value
        const double value = double.NaN;
        if (o is value) Console.WriteLine("o is value");

        // Const pattern can use a string literal
        if (o is "o") Console.WriteLine("o is \"o\"");

        // Type pattern
        if (o is int n) Console.WriteLine(n);

        // Type pattern and compound expressions
        if (o is string s && s.Trim() != string.Empty)
            Console.WriteLine("o is not blank");
    }

An is-expression can check whether a value is equal to a constant, and a type check can optionally create a pattern variable.

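I found several interesting aspects of pattern matching in is-expressions:

  1. A variable introduced in an if statement is lifted into the enclosing scope.
  2. A variable introduced in an if statement is definitely assigned only when the pattern matches.
  3. The current implementation of const pattern matching in is-expressions is not very efficient.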


Let's look at the first two cases first:

    public void ScopeAndDefiniteAssigning(object o)
    {
        if (o is string s && s.Length != 0)
        {
            Console.WriteLine("o is not empty string");
        }

        // Can't use 's' any more. 's' is already declared in the current scope.
        if (o is int n || (o is string s2 && int.TryParse(s2, out n)))
        {
            Console.WriteLine(n);
        }
    }

The first if statement introduces the variable s, and the variable is visible throughout the enclosing block, which here is the whole method body. This is reasonable, but it complicates things if other if statements in the same block try to reuse the same name. In that case you need to pick a different name to avoid a collision.

A variable introduced in an is-expression is definitely assigned only when the predicate is true. This means that n in the second if statement is not assigned in the right operand, but because the variable is already declared, we can pass it as an out argument to int.TryParse.
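To make the two rules concrete, here is a minimal sketch. The class ScopeSketch and the helper ToInt are hypothetical names of mine, not part of any library. It relies only on the scoping and definite-assignment behavior described above.

```csharp
using System;

public static class ScopeSketch
{
    // Hypothetical helper: returns the int carried by o, or null if there is none.
    public static int? ToInt(object o)
    {
        // n is declared once by the pattern and reused as the out argument,
        // so it is definitely assigned whenever the whole condition is true.
        if (o is int n || (o is string s && int.TryParse(s, out n)))
        {
            return n;
        }

        // n is still in scope here, but it is not definitely assigned:
        // reading it at this point would be a compile-time error.
        return null;
    }
}
```

Calling ToInt(42) or ToInt("17") takes the first branch, while ToInt("abc") and ToInt(null) fall through to the final return.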

The third aspect, mentioned above, is the most important. Consider the following code:

    public void BoxTwice(int n)
    {
        if (n is 42) Console.WriteLine("n is 42");
    }

In most cases, the is-expression is translated to object.Equals(constValue, variable), even though the specification says that the == operator should be used for primitive types:

    public void BoxTwice(int n)
    {
        if (object.Equals(42, n))
        {
            Console.WriteLine("n is 42");
        }
    }

This code causes two boxing allocations, which can seriously affect performance if it sits on the application's critical path. The expression o is null used to cause boxing as well (see "Suboptimal code for e is null"), and I hope the current behavior will also be fixed soon (there is a corresponding issue on GitHub).

If the variable is of type object, then o is 42 causes one memory allocation (for boxing the literal 42), whereas similar switch-based code causes no allocations.
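On a hot path, one way to sidestep the question is to compare with == directly. The sketch below (ConstPatternSketch and its members are hypothetical names) puts the const pattern next to the plain comparison, and also shows a consequence of the object.Equals-like semantics: against an object, the pattern matches only a boxed int 42, not a boxed long 42. Whether n is 42 actually boxes depends on the compiler version. The behavior described above is that of the initial C# 7 compiler.

```csharp
public static class ConstPatternSketch
{
    // Const pattern against an int: same result as n == 42.
    public static bool IsFortyTwo(int n) => n is 42;

    // Plain operator comparison: no boxing is involved.
    public static bool EqualsFortyTwo(int n) => n == 42;

    // Const pattern against object: matches only a boxed int 42.
    public static bool IsBoxedFortyTwo(object o) => o is 42;
}
```

IsBoxedFortyTwo(42L) returns false, just as object.Equals(42, 42L) does, because the boxed value is a long rather than an int.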

var pattern in is-expressions


The var pattern is a special case of the type pattern with one key difference: it matches any value, even null.

    public void IsVar(object o)
    {
        if (o is var x) Console.WriteLine($"x: {x}");
    }

The expression o is object is true when o is not null, but o is var x is always true. The compiler knows this and, in Release mode (*), removes the if statement entirely and leaves only the Console.WriteLine call. Unfortunately, the compiler does not warn that the code is unreachable in the following case: if (!(o is var x)) Console.WriteLine("Unreachable"). I hope this will be fixed too.

(*) It is not clear why the behavior differs only in Release mode. But I think all these problems have the same root: the initial implementation of the feature is not optimal. According to a comment by Neal Gafter, this will change: the pattern-matching lowering code is being rewritten from scratch (to support recursive patterns too), and most of the improvements sought here are expected to come for "free" in the new code.
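The difference between the two patterns is easiest to see with a null argument. A minimal sketch, with hypothetical names (VarPatternSketch, MatchesObject, MatchesVar):

```csharp
public static class VarPatternSketch
{
    // Type check: false when o is null.
    public static bool MatchesObject(object o) => o is object;

    // var pattern: true for every value, including null.
    public static bool MatchesVar(object o) => o is var x;
}
```

MatchesObject(null) returns false while MatchesVar(null) returns true. For any non-null argument both return true.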

The lack of a null check makes this case very special and potentially dangerous. But if you know exactly what is happening, you may find this pattern useful. It can be used to introduce a temporary variable inside an expression:

    public void VarPattern(IEnumerable<string> s)
    {
        if (s.FirstOrDefault(o => o != null) is var v
            && int.TryParse(v, out var n))
        {
            Console.WriteLine(n);
        }
    }

Is-expression and "Elvis" operator


There is another case that I found very useful. The type pattern matches a value only if the value is not null. We can combine this "filtering" logic with the null-propagating operator to make the code more readable:

    public void WithNullPropagation(IEnumerable<string> s)
    {
        if (s?.FirstOrDefault(str => str.Length > 10)?.Length is int length)
        {
            Console.WriteLine(length);
        }

        // Similar to
        if (s?.FirstOrDefault(str => str.Length > 10)?.Length is var length2 && length2 != null)
        {
            Console.WriteLine(length2);
        }

        // And similar to
        var length3 = s?.FirstOrDefault(str => str.Length > 10)?.Length;
        if (length3 != null)
        {
            Console.WriteLine(length3);
        }
    }

Note that the same pattern can be used for both value types and reference types.
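The same idea can be wrapped into a small helper that returns a value instead of printing it. This is only a sketch: NullPropagationSketch and FirstLongLength are hypothetical names, and, like the code above, it assumes the sequence contains no null elements.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class NullPropagationSketch
{
    // Hypothetical helper: length of the first string longer than
    // 10 characters, or -1 if there is no such string (or s is null).
    public static int FirstLongLength(IEnumerable<string> s)
    {
        // The chain produces an int?; "is int length" matches only
        // when that nullable actually holds a value.
        if (s?.FirstOrDefault(str => str.Length > 10)?.Length is int length)
        {
            return length;
        }

        return -1;
    }
}
```

A null sequence and a sequence without a long enough string both end up in the final return, so the caller sees a single "not found" result.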

Pattern matching in switch statements


C# 7 extends the switch statement to allow patterns in case clauses:

    public static int Count<T>(this IEnumerable<T> e)
    {
        switch (e)
        {
            case ICollection<T> c: return c.Count;
            case IReadOnlyCollection<T> c: return c.Count;
            // Matches concurrent collections
            case IProducerConsumerCollection<T> pc: return pc.Count;
            // Matches if e is not null.
            // LINQ is called explicitly: e.Count() could bind to this very method.
            case IEnumerable<T> _: return Enumerable.Count(e);
            // Default case is handled when e is null
            default: return 0;
        }
    }

The example shows the first set of changes to the switch statement.

  1. A switch statement can switch on a value of any type.
  2. A case clause can specify a pattern.
  3. The order of the case clauses matters. The compiler reports an error if an earlier case matches a base type and a later case matches a derived type.
  4. All case clauses contain an implicit null check (**). In the previous example, the last case clause is valid because it matches only when the argument is not null.

(**) The last case clause shows another feature added in C# 7, called the "discard" pattern. The name _ is special and tells the compiler that the variable is not needed. A type pattern in a case clause requires a variable name, and if you are not going to use it, you can ignore it with _.
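The implicit null check from point 4 can be observed with a small sketch (SwitchNullSketch and Describe are hypothetical names). A null argument skips every type pattern, so it has to be handled either by an explicit case null, as here, or by default.

```csharp
public static class SwitchNullSketch
{
    public static string Describe(object o)
    {
        switch (o)
        {
            // Type patterns never match null.
            case string s:
                return "string of length " + s.Length;
            case int n:
                return "int " + n;
            // The const pattern handles null explicitly.
            case null:
                return "null";
            default:
                return "something else";
        }
    }
}
```

Even a null reference typed as string, such as Describe((string)null), lands in the case null clause rather than in case string s.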

The following fragment shows another feature of switch-based pattern matching: the ability to use predicates.

    public static void FizzBuzz(object o)
    {
        switch (o)
        {
            case string s when s.Contains("Fizz") || s.Contains("Buzz"):
                Console.WriteLine(s);
                break;
            case int n when n % 5 == 0 && n % 3 == 0:
                Console.WriteLine("FizzBuzz");
                break;
            case int n when n % 5 == 0:
                Console.WriteLine("Fizz");
                break;
            case int n when n % 3 == 0:
                Console.WriteLine("Buzz");
                break;
            case int n:
                Console.WriteLine(n);
                break;
        }
    }

A switch can have more than one case clause for the same type. In that situation the compiler combines all the type checks into one block to avoid redundant computations:

    public static void FizzBuzz(object o)
    {
        // All cases can match only if the value is not null
        if (o != null)
        {
            if (o is string s && (s.Contains("Fizz") || s.Contains("Buzz")))
            {
                Console.WriteLine(s);
                return;
            }

            bool isInt = o is int;
            int num = isInt ? ((int)o) : 0;
            if (isInt)
            {
                // The type check and unboxing happens only once per group
                if (num % 5 == 0 && num % 3 == 0)
                {
                    Console.WriteLine("FizzBuzz");
                    return;
                }
                if (num % 5 == 0)
                {
                    Console.WriteLine("Fizz");
                    return;
                }
                if (num % 3 == 0)
                {
                    Console.WriteLine("Buzz");
                    return;
                }
                Console.WriteLine(num);
            }
        }
    }

But you need to keep in mind two things:

  1. The compiler combines only consecutive case blocks with the same type, and if you mix blocks for different types, the compiler will generate less optimal code:

     switch (o)
     {
         // The generated code is less optimal:
         // If o is int, then more than one type check and unboxing operation
         // may happen.
         case int n when n == 1: return 1;
         case string s when s == "": return 2;
         case int n when n == 2: return 3;
         default: return -1;
     }

    The compiler translates it roughly like this:

     if (o is int n && n == 1) return 1;
     if (o is string s && s == "") return 2;
     if (o is int n2 && n2 == 2) return 3;
     return -1;

  2. The compiler does its best to prevent typical problems caused by incorrectly ordered case clauses.

     switch (o)
     {
         case int n: return 1;
         // Error: The switch case has already been handled by a previous case.
         case int n when n == 1: return 2;
     }

    But the compiler does not know that one predicate is stronger than another and therefore makes a later clause unreachable:

     switch (o)
     {
         case int n when n > 0: return 1;
         // Will never match, but the compiler won't warn you about it
         case int n when n > 1: return 2;
     }
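Both caveats can be addressed by reordering the clauses by hand. The sketch below uses hypothetical names (CaseOrderSketch, Grouped, Ordered). Grouped keeps the two int clauses adjacent, which does not change the result because an int and a string can never both match. Ordered puts the stronger predicate first so that both clauses are reachable. I have not inspected the generated code here, so the claim that the grouped version is cheaper rests on the compiler behavior described above.

```csharp
public static class CaseOrderSketch
{
    // Same results as the mixed version, but the int clauses are adjacent.
    public static int Grouped(object o)
    {
        switch (o)
        {
            case int n when n == 1: return 1;
            case int n when n == 2: return 3;
            case string s when s == "": return 2;
            default: return -1;
        }
    }

    // The stronger predicate (n > 1) comes before the weaker one (n > 0).
    public static int Ordered(object o)
    {
        switch (o)
        {
            case int n when n > 1: return 2;
            case int n when n > 0: return 1;
            default: return -1;
        }
    }
}
```

With this ordering, Ordered(5) returns 2 and Ordered(1) returns 1, whereas the original ordering returns 1 for both.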

Pattern Matching 101


Source: https://habr.com/ru/post/347916/

