Pattern matching, changes and movements in Rust

One of the main goals of the Rust project is secure system programming. This area usually implies an imperative paradigm, which means the presence of side effects, the need to think about the shared state, etc. In order to ensure security under such conditions, the programs and data types on Rust must be structured in such a way that they can be statically checked. The elements and limitations of the Rust language together make it easier to write programs that pass these checks and thus provide security. For example, the concept of data ownership is deeply integrated into Rust.

The match expression is a special construct in which these features and limitations are combined in an interesting way. match expression takes an input value, classifies it, and then passes execution to the code that processes the corresponding data class.

In this article we will look at how match works in Rust. Here are the main elements that match and its complement, enum , combine into one:

Structural pattern matching: option analysis and usability are much better than using switch in C or Java.
Exhaustive analysis: match ensures that no options are omitted.
match supports both imperative and functional styles: you can continue to use the break statement, assignment, and so on, and you absolutely do not need to relearn the style based on expressions;
match can both “borrow” and “move”: Rust encourages the programmer to think about owning and borrowing data. The match expression is designed including the possibility of only borrowing a part of the structure instead of moving it . This is necessary in order not to transfer ownership of any data earlier than necessary.

We will look at each of these points separately below, but first we need to lay the foundation for further discussion - how does a match look and work?

`match` basics

The match expression in Rust looks like this:

 match INPUT_EXPRESSION { PATTERNS_1 => RESULT_EXPRESSION_1, PATTERNS_2 => RESULT_EXPRESSION_2, ... PATTERNS_n => RESULT_EXPRESSION_n }

where each of the PATTERNS_i contains at least one sample ( pattern , pattern - pattern). The sample describes a subset of all possible values into which the expression INPUT_EXPRESSION can be calculated. The PATTERN => RESULT_EXPRESSION is called a “branch”, or simply “branch” (“arm”).

Patterns can correspond to simple values, such as numbers or symbols; they may also correspond to special character data defined by the user with the help of enum .

The code below generates a guess (bad) in the number guessing game based on the previous answer:

 enum Answer { Higher, Lower, Bingo, } fn suggest_guess(prior_guess: u32, answer: Answer) { match answer { Answer::Higher => println!("maybe try {} next", prior_guess + 10), Answer::Lower => println!("maybe try {} next", prior_guess - 1), Answer::Bingo => println!("we won with {}!", prior_guess), } } #[test] fn demo_suggest_guess() { suggest_guess(10, Answer::Higher); suggest_guess(20, Answer::Lower); suggest_guess(19, Answer::Bingo); }

(Almost all the code in this article is directly executed - you can copy pieces of code into the demo.rs file, compile it with the argument - --test , run the resulting executable file, and observe the test results.)

Patterns can also be matched with structured data (i.e., tuples, slices, user-defined data types). In such templates, parts of the input piece of data are usually bound to individual local variables, which can then be used to calculate the result.

The special pattern _ matches any value and is often used for the default branch; special pattern .. summarizes _ behavior for mapping to a sequence of values or key-value pairs.

In addition, several templates can be used in one branch, combining them through a vertical line. In this case, this branch will be executed if the data is suitable for any of the combined templates.

All of the above can be seen in the updated version of the strategy for generating answers in the "guessing":

 struct GuessState { guess: u32, answer: Answer, low: u32, high: u32, } fn suggest_guess_smarter(s: GuessState) { match s { //      `answer`,  `Bingo`, //        `p`. GuessState { answer: Answer::Bingo, guess: p, .. } => { // ~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~ ~~ // | | | | // | | |    // | | | // | |    `guess` // | |    `p` // | | // | ,   `answer`  `Bingo` // | //   `s`   `GuessState` println!("we won with {}!", p); } //   ,      //     . //        (l..h), : // -    ,     , //        `l`,    //    ---  `h`; // -    ,     , //     `h`   ,  `l` --- //      . GuessState { answer: Answer::Higher, low: _, guess: l, high: h } | GuessState { answer: Answer::Lower, low: l, guess: h, high: _ } => { // ~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~ ~~~~~~ ~~~~~~~~ ~~~~~~~ // | | | | | // | | | |   // | | | |  // | | | |  `high`, // | | | |    // | | | |  // | | | | // | | |   `guess` // | | |    `l` // | | |  `h`,   // | | |   // | | | // | |     `low`, // | |     // | | // | ,   `answer`  // | `Higher`  `Lower`,   // |   // | //   `s`   `GuessState` let mid = l + ((h - l) / 2); println!("lets try {} next", mid); } } } #[test] fn demo_guess_state() { suggest_guess_smarter(GuessState { guess: 20, answer: Answer::Lower, low: 10, high: 1000 }); }

The ability to simultaneously check enumeration options and simultaneously link the substructures of the input data with variables allows you to write powerful, clear and simple code that focuses the attention of its reader on data important to a particular branch.

In a nutshell, this is how match works.

So how exactly does this design interact with the concept of ownership and security in general?

Exhaustive analysis of options

I usually start by excluding everything that is impossible. What remains must be true, however incredible it may seem.

Sherlock Holmes

One of the convenient approaches to solving a complex problem is to break it down into its component parts and analyze each separately. In order for this method to work, the partitioning must be exhaustive : the parts you selected should cover all possible scenarios.

Using enum and match in Rust can help with this, because match provides an exhaustive analysis of options: every possible input value must be covered with a pattern in at least one match branch. This helps to catch errors in the program logic and ensures that the result of the match expression is clearly defined.

Therefore, for example, the following code is rejected by the compiler:

 fn suggest_guess_broken(prior_guess: u32, answer: Answer) { let next_guess = match answer { Answer::Higher => prior_guess + 10, Answer::Lower => prior_guess - 1, // ERROR: non-exhaustive patterns: `Bingo` not covered }; println!("maybe try {} next", next_guess); }

Many other languages provide some sort of pattern matching (for example, ML and various implementations of match on macros in Scheme), but not all of them have this limitation.

In Rust, this restriction is present for three reasons:

First, as noted above, splitting a problem into parts gives a general solution only if this partition covers all possible scenarios. Check for completeness makes logical errors obvious.
Secondly, the fullness check helps with refactoring. During the development process, I often add new variations to individual definitions of enum 's. Checking for completeness helps to find all those match expressions where I used variants from the previous version of the enum type.
Thirdly, since match is an expression, the completeness check ensures that all its branches are either computed into a value of one type or transfer control to some other part of the program.

Exit `match`

The following code is a revised version of the suggest_guess_broken function, which is shown above; it directly demonstrates the transfer of control to another part of the program:

 fn suggest_guess_fixed(prior_guess: u32, answer: Answer) { let next_guess = match answer { Answer::Higher => prior_guess + 10, Answer::Lower => prior_guess - 1, Answer::Bingo => { println!("we won with {}!", prior_guess); return; } }; println!("maybe try {} next", next_guess); } #[test] fn demo_guess_fixed() { suggest_guess_fixed(10, Answer::Higher); suggest_guess_fixed(20, Answer::Lower); suggest_guess_fixed(19, Answer::Bingo); }

The suggest_guess_fixed function shows that match can process some of the variants and immediately exit the function, and in the remaining variants calculate the corresponding values and pass them further into the function body.

We can use similar constructions in match without fear of losing one of the options, because in match their analysis is exhaustive.

Algebraic data types and structural invariants

Algebraic data types concisely and accurately describe data classes and allow you to encode a variety of structural invariants. Rust uses the struct and enum definitions for this.

enum -type allows you to define mutually exclusive classes of values. The examples above use enumerations to create simple character labels, but in Rust, enumerations are also used for much more complex data classes.

For example, a binary tree is either a leaf or an internal node with links to two child trees. Here is one way to represent a binary tree of integers:

 enum BinaryTree { Leaf(i32), Node(Box<BinaryTree>, i32, Box<BinaryTree>) }

(The Box<V> means “owning” a link to an instance of a V type placed on the heap. If you own Box<V> , then you also own the V that it contains, and therefore you can change it, create it links, etc. When you have finished working with Box , and it is out of scope, the resources associated with placing V on the heap are automatically released.)

The above definition of enum ensures that if you have a value of type BinaryTree , it will always match one of the specified options. You will never get BinaryTree::Node without a left descendant. No need to check for null .

It’s still necessary to check whether this BinaryTree value is a Leaf or Node option, but the compiler statically guarantees that these checks will be made: you cannot accidentally read a Leaf value as if it were a Node , and vice versa.

Here is a function that adds all the numbers in the tree using match :

 fn tree_weight_v1(t: BinaryTree) -> i32 { match t { BinaryTree::Leaf(payload) => payload, BinaryTree::Node(left, payload, right) => { tree_weight_v1(*left) + payload + tree_weight_v1(*right) } } } ///  ,   : /// /// +----(4)---+ /// | | /// +-(2)-+ [5] /// | | /// [1] [3] /// fn sample_tree() -> BinaryTree { let l1 = Box::new(BinaryTree::Leaf(1)); let l3 = Box::new(BinaryTree::Leaf(3)); let n2 = Box::new(BinaryTree::Node(l1, 2, l3)); let l5 = Box::new(BinaryTree::Leaf(5)); BinaryTree::Node(n2, 4, l5) } #[test] fn tree_demo_1() { let tree = sample_tree(); assert_eq!(tree_weight_v1(tree), (1 + 2 + 3) + 4 + 5); }

Algebraic data types establish structural invariants that are strictly supported by the language. (The system of modules and privacy offers more ample opportunities for defining invariants, but we will not deviate from the topic.)

Expression and Operator Orientation

Unlike many other languages that provide pattern matching, Rust supports both expression-based and operator-based styles.

Many functional languages in which there is pattern matching encourage code writing using expressions, which focuses on the values returned by a combination of expressions, and the use of side effects is not recommended. In imperative languages, everything is exactly the opposite - they assume a comprehensive use of operators, i.e. sequences of commands executed only for their side effects.

Rust perfectly supports both styles.

Consider a function that converts a non-negative integer into a string, representing it as a numeral (“1st”, “2nd”, “3rd”, ...). The following code uses range patterns for simplicity, but it is written in a style similar to using switch in imperative languages like C (or C ++, or Java, etc.), when the branches of match are executed only for their side effects:

 fn num_to_ordinal(x: u32) -> String { let suffix; match (x % 10, x % 100) { (1, 1) | (1, 21...91) => { suffix = "st"; } (2, 2) | (2, 22...92) => { suffix = "nd"; } (3, 3) | (3, 23...93) => { suffix = "rd"; } _ => { suffix = "th"; } } return format!("{}{}", x, suffix); } #[test] fn test_num_to_ordinal() { assert_eq!(num_to_ordinal( 0), "0th"); assert_eq!(num_to_ordinal( 1), "1st"); assert_eq!(num_to_ordinal( 12), "12th"); assert_eq!(num_to_ordinal( 22), "22nd"); assert_eq!(num_to_ordinal( 43), "43rd"); assert_eq!(num_to_ordinal( 67), "67th"); assert_eq!(num_to_ordinal(1901), "1901st"); }

This program will compile, which is great because static analysis ensures at the same time that:

suffix always initialized before we call format! at the end of the function, and
the value of the variable suffix assigned only once during the execution of the function (if this were not the case, the compiler would force us to mark the suffix as a variable variable).

It is clear that this program can be written in a style based on expressions, for example, like this:

 fn num_to_ordinal_expr(x: u32) -> String { format!("{}{}", x, match (x % 10, x % 100) { (1, 1) | (1, 21...91) => "st", (2, 2) | (2, 22...92) => "nd", (3, 3) | (3, 23...93) => "rd", _ => "th" }) }

Sometimes this style helps to write a very clear and precise code, but sometimes vice versa, and then it is better to use operators with side effects (the ability to return a value from a function from one match branch to the suggest_guess_fixed function defined earlier, just shows this).

For both styles have their own applications. The most important thing is that switching to the operator style does not force you to sacrifice other Rust features, such as ensuring that the value of a non- mut variable can be assigned only once.

An important case where this is important is the initialization of a state and its subsequent borrowing, which occurs only in some branches of the execution flow:

 fn sometimes_initialize(input: i32) { let string: String; //     let borrowed: &str; //    match input { 0...100 => { //    ... string = format!("input prints as {}", input); // ...      borrowed = &string[6..]; } _ => { //      borrowed = "expected between 0 and 100"; } } println!("borrowed: {}", borrowed); //     ,   : // println!("string: {}", string); // ... : error: use of possibly uninitialized variable: `string` } #[test] fn demo_sometimes_initialize() { sometimes_initialize(23); //    `string` sometimes_initialize(123); //   -  }

The interesting thing here is that the code after match cannot directly access the string , because the compiler requires that the variable be initialized on all the paths of the program execution before it is accessed. At the same time, we can (with the help of borrowed ) access the data that is contained within the string , because the link to them is recorded in the borrowed in the first branch, and we made the borrowed initialized on all the execution paths leading to println! which borrowed and uses.

(The compiler ensures that no string borrowings can live longer than string , and in the generated code, at the end of the scope, the string value is automatically deployed if it was actually created.)

In short, to ensure correctness, the Rust language ensures that data is always initialized before it is used, but its developers have tried to avoid the need for artificial writing patterns when they are needed solely to “coax” the static analyzer (for example, requiring the string be initialized with empty data or using style based on expressions).

Pattern matching without moving

Matching a pattern with a value can borrow a substructure without taking ownership. This is important when applying pattern matching to links (i.e., to values of type &T ).

The section "Algebraic data types" above shows the type of binary tree and the program that calculates the sum of all numbers in the tree instance.

That version of tree_weight has a significant drawback: it takes a tree by value. As soon as you pass the tree to tree_weight_v1 , this tree disappears (i.e., is deployed).

 #[test] fn tree_demo_v1_fails() { let tree = sample_tree(); assert_eq!(tree_weight_v1(tree), (1 + 2 + 3) + 4 + 5); //      ... // assert_eq!(tree_weight_v1(tree), (1 + 2 + 3) + 4 + 5); // ...   : error: use of moved value: `tree` }

This behavior occurs, however, not because of the use of match , but rather because of the choice of the function signature:

 fn tree_weight_v1(t: BinaryTree) -> i32 { 0 } // ^~~~~~~~~~  ,    //   `t`

In fact, match in Rust works fine where there is no need to obtain ownership. In particular, the input to match is the L-value expression , that is, the result of the input expression must be a value that is in a certain area of memory . match evaluates this expression and then checks the data in the corresponding memory area.

(If the input expression is a variable name, dereference of a link or field, then the L-value will correspond to the memory area that contains this variable or field or where the link points. If the input expression is a function call or another operation that creates temporary anonymous value, then this value will be formally stored in the temporary memory area, and it will be processed by the match .)

Therefore, if we want to make such a variant of tree_weight , which simply borrows a tree, and does not take it away at all, then we should use the appropriate opportunity of the expression match .

 fn tree_weight_v2(t: &BinaryTree) -> i32 { // ^~~~~~~~~~~ `&` ,   **  match *t { BinaryTree::Leaf(payload) => payload, BinaryTree::Node(ref left, payload, ref right) => { tree_weight_v2(left) + payload + tree_weight_v2(right) } } } #[test] fn tree_demo_2() { let tree = sample_tree(); assert_eq!(tree_weight_v2(&tree), (1 + 2 + 3) + 4 + 5); }

The tree_weight_v2 function tree_weight_v2 very similar to tree_weight_v1 . The differences are as follows: firstly, it takes a reference ( & type), secondly, we added deference *t , and thirdly, importantly, we use ref patterns for left and right in the case of Node .

The dereferencing of the *t reference, being interpreted as an L-value expression, simply gets the address in memory where the BinaryTree is located (because t: &BinaryTree is just a reference to this data). Here *t neither creates a copy of the tree, nor moves it to a new place in memory because match treats it as L-value.

And the most important part of how the L-value expressions work, the ref patterns.

, - ref -:

T , i , i . , i T (, , " i: T ").

T , ( " T [] Copy "), i (, T ).

, -ref- , i T .

payload tree_weight_v2 i32 , i32 Copy , payload .

, ref -:

L-value- T ref i , . , ref i T , i T (, , i: &T ).

Node tree_weight_v2 left Box ( ), right , , .

tree_weight_v2 , .

, ref mut - ( ref mut i ) : i: &mut T . , . , , match , .

ref mut , :

 fn tree_grow(t: &mut BinaryTree) { // ^~~~~~~~~~~~~~~ `&mut`:      match *t { BinaryTree::Leaf(ref mut payload) => *payload += 1, BinaryTree::Node(ref mut left, ref mut payload, ref mut right) => { tree_grow(left); *payload += 1; tree_grow(right); } } } #[test] fn tree_demo_3() { let mut tree = sample_tree(); tree_grow(&mut tree); assert_eq!(tree_weight_v2(&tree), (2 + 3 + 4) + 5 + 6); }

, ref mut - payload . ref , payload , , . .

, left right Node . , , &mut -.

Conclusion

Rust — , , — Rust . enum match , , .

, , :

Higher Answer::Higher ,
,
ident @ pattern ,
{ let id = expr; ... } match expr { id => { ... } } ,

( #rust IRC ).

( , , Aaron Turon, Niko Matsakis, Mutabah , proc , libfud , asQuirrel annodomini #rust .)

Source: https://habr.com/ru/post/256941/

All Articles