One of the main goals of the Rust project is secure system programming. This area usually implies an imperative paradigm, which means the presence of side effects, the need to think about the shared state, etc. In order to ensure security under such conditions, the programs and data types on Rust must be structured in such a way that they can be statically checked. The elements and limitations of the Rust language together make it easier to write programs that pass these checks and thus provide security. For example, the concept of data ownership is deeply integrated into Rust.
The match
expression is a special construct in which these features and limitations are combined in an interesting way. match
expression takes an input value, classifies it, and then passes execution to the code that processes the corresponding data class.
In this article we will look at how match
works in Rust. Here are the main elements that match
and its complement, enum
, combine into one:
switch
in C or Java.match
ensures that no options are omitted.match
supports both imperative and functional styles: you can continue to use the break
statement, assignment, and so on, and you absolutely do not need to relearn the style based on expressions;match
can both “borrow” and “move”: Rust encourages the programmer to think about owning and borrowing data. The match
expression is designed including the possibility of only borrowing a part of the structure instead of moving it . This is necessary in order not to transfer ownership of any data earlier than necessary.We will look at each of these points separately below, but first we need to lay the foundation for further discussion - how does a match
look and work?
match
basicsThe match
expression in Rust looks like this:
match INPUT_EXPRESSION { PATTERNS_1 => RESULT_EXPRESSION_1, PATTERNS_2 => RESULT_EXPRESSION_2, ... PATTERNS_n => RESULT_EXPRESSION_n }
where each of the PATTERNS_i
contains at least one sample ( pattern , pattern - pattern). The sample describes a subset of all possible values ​​into which the expression INPUT_EXPRESSION
can be calculated. The PATTERN => RESULT_EXPRESSION
is called a “branch”, or simply “branch” (“arm”).
Patterns can correspond to simple values, such as numbers or symbols; they may also correspond to special character data defined by the user with the help of enum
.
The code below generates a guess (bad) in the number guessing game based on the previous answer:
enum Answer { Higher, Lower, Bingo, } fn suggest_guess(prior_guess: u32, answer: Answer) { match answer { Answer::Higher => println!("maybe try {} next", prior_guess + 10), Answer::Lower => println!("maybe try {} next", prior_guess - 1), Answer::Bingo => println!("we won with {}!", prior_guess), } } #[test] fn demo_suggest_guess() { suggest_guess(10, Answer::Higher); suggest_guess(20, Answer::Lower); suggest_guess(19, Answer::Bingo); }
(Almost all the code in this article is directly executed - you can copy pieces of code into the demo.rs
file, compile it with the argument - --test
, run the resulting executable file, and observe the test results.)
Patterns can also be matched with structured data (i.e., tuples, slices, user-defined data types). In such templates, parts of the input piece of data are usually bound to individual local variables, which can then be used to calculate the result.
The special pattern _
matches any value and is often used for the default branch; special pattern ..
summarizes _
behavior for mapping to a sequence of values ​​or key-value pairs.
In addition, several templates can be used in one branch, combining them through a vertical line. In this case, this branch will be executed if the data is suitable for any of the combined templates.
All of the above can be seen in the updated version of the strategy for generating answers in the "guessing":
struct GuessState { guess: u32, answer: Answer, low: u32, high: u32, } fn suggest_guess_smarter(s: GuessState) { match s { // `answer`, `Bingo`, // `p`. GuessState { answer: Answer::Bingo, guess: p, .. } => { // ~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~ ~~ // | | | | // | | | // | | | // | | `guess` // | | `p` // | | // | , `answer` `Bingo` // | // `s` `GuessState` println!("we won with {}!", p); } // , // . // (l..h), : // - , , // `l`, // --- `h`; // - , , // `h` , `l` --- // . GuessState { answer: Answer::Higher, low: _, guess: l, high: h } | GuessState { answer: Answer::Lower, low: l, guess: h, high: _ } => { // ~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~ ~~~~~~ ~~~~~~~~ ~~~~~~~ // | | | | | // | | | | // | | | | // | | | | `high`, // | | | | // | | | | // | | | | // | | | `guess` // | | | `l` // | | | `h`, // | | | // | | | // | | `low`, // | | // | | // | , `answer` // | `Higher` `Lower`, // | // | // `s` `GuessState` let mid = l + ((h - l) / 2); println!("lets try {} next", mid); } } } #[test] fn demo_guess_state() { suggest_guess_smarter(GuessState { guess: 20, answer: Answer::Lower, low: 10, high: 1000 }); }
The ability to simultaneously check enumeration options and simultaneously link the substructures of the input data with variables allows you to write powerful, clear and simple code that focuses the attention of its reader on data important to a particular branch.
In a nutshell, this is how match
works.
So how exactly does this design interact with the concept of ownership and security in general?
I usually start by excluding everything that is impossible. What remains must be true, however incredible it may seem.
Sherlock Holmes
One of the convenient approaches to solving a complex problem is to break it down into its component parts and analyze each separately. In order for this method to work, the partitioning must be exhaustive : the parts you selected should cover all possible scenarios.
Using enum
and match
in Rust can help with this, because match
provides an exhaustive analysis of options: every possible input value must be covered with a pattern in at least one match
branch. This helps to catch errors in the program logic and ensures that the result of the match
expression is clearly defined.
Therefore, for example, the following code is rejected by the compiler:
fn suggest_guess_broken(prior_guess: u32, answer: Answer) { let next_guess = match answer { Answer::Higher => prior_guess + 10, Answer::Lower => prior_guess - 1, // ERROR: non-exhaustive patterns: `Bingo` not covered }; println!("maybe try {} next", next_guess); }
Many other languages ​​provide some sort of pattern matching (for example, ML and various implementations of match
on macros in Scheme), but not all of them have this limitation.
In Rust, this restriction is present for three reasons:
enum
's. Checking for completeness helps to find all those match
expressions where I used variants from the previous version of the enum
type.match
is an expression, the completeness check ensures that all its branches are either computed into a value of one type or transfer control to some other part of the program.match
The following code is a revised version of the suggest_guess_broken
function, which is shown above; it directly demonstrates the transfer of control to another part of the program:
fn suggest_guess_fixed(prior_guess: u32, answer: Answer) { let next_guess = match answer { Answer::Higher => prior_guess + 10, Answer::Lower => prior_guess - 1, Answer::Bingo => { println!("we won with {}!", prior_guess); return; } }; println!("maybe try {} next", next_guess); } #[test] fn demo_guess_fixed() { suggest_guess_fixed(10, Answer::Higher); suggest_guess_fixed(20, Answer::Lower); suggest_guess_fixed(19, Answer::Bingo); }
The suggest_guess_fixed
function shows that match
can process some of the variants and immediately exit the function, and in the remaining variants calculate the corresponding values ​​and pass them further into the function body.
We can use similar constructions in match
without fear of losing one of the options, because in match
their analysis is exhaustive.
Algebraic data types concisely and accurately describe data classes and allow you to encode a variety of structural invariants. Rust uses the struct
and enum
definitions for this.
enum
-type allows you to define mutually exclusive classes of values. The examples above use enumerations to create simple character labels, but in Rust, enumerations are also used for much more complex data classes.
For example, a binary tree is either a leaf or an internal node with links to two child trees. Here is one way to represent a binary tree of integers:
enum BinaryTree { Leaf(i32), Node(Box<BinaryTree>, i32, Box<BinaryTree>) }
(The Box<V>
means “owning” a link to an instance of a V
type placed on the heap. If you own Box<V>
, then you also own the V
that it contains, and therefore you can change it, create it links, etc. When you have finished working with Box
, and it is out of scope, the resources associated with placing V
on the heap are automatically released.)
The above definition of enum
ensures that if you have a value of type BinaryTree
, it will always match one of the specified options. You will never get BinaryTree::Node
without a left descendant. No need to check for null
.
It’s still necessary to check whether this BinaryTree
value is a Leaf
or Node
option, but the compiler statically guarantees that these checks will be made: you cannot accidentally read a Leaf
value as if it were a Node
, and vice versa.
Here is a function that adds all the numbers in the tree using match
:
fn tree_weight_v1(t: BinaryTree) -> i32 { match t { BinaryTree::Leaf(payload) => payload, BinaryTree::Node(left, payload, right) => { tree_weight_v1(*left) + payload + tree_weight_v1(*right) } } } /// , : /// /// +----(4)---+ /// | | /// +-(2)-+ [5] /// | | /// [1] [3] /// fn sample_tree() -> BinaryTree { let l1 = Box::new(BinaryTree::Leaf(1)); let l3 = Box::new(BinaryTree::Leaf(3)); let n2 = Box::new(BinaryTree::Node(l1, 2, l3)); let l5 = Box::new(BinaryTree::Leaf(5)); BinaryTree::Node(n2, 4, l5) } #[test] fn tree_demo_1() { let tree = sample_tree(); assert_eq!(tree_weight_v1(tree), (1 + 2 + 3) + 4 + 5); }
Algebraic data types establish structural invariants that are strictly supported by the language. (The system of modules and privacy offers more ample opportunities for defining invariants, but we will not deviate from the topic.)
Unlike many other languages ​​that provide pattern matching, Rust supports both expression-based and operator-based styles.
Many functional languages ​​in which there is pattern matching encourage code writing using expressions, which focuses on the values ​​returned by a combination of expressions, and the use of side effects is not recommended. In imperative languages, everything is exactly the opposite - they assume a comprehensive use of operators, i.e. sequences of commands executed only for their side effects.
Rust perfectly supports both styles.
Consider a function that converts a non-negative integer into a string, representing it as a numeral (“1st”, “2nd”, “3rd”, ...). The following code uses range patterns for simplicity, but it is written in a style similar to using switch
in imperative languages ​​like C (or C ++, or Java, etc.), when the branches of match
are executed only for their side effects:
fn num_to_ordinal(x: u32) -> String { let suffix; match (x % 10, x % 100) { (1, 1) | (1, 21...91) => { suffix = "st"; } (2, 2) | (2, 22...92) => { suffix = "nd"; } (3, 3) | (3, 23...93) => { suffix = "rd"; } _ => { suffix = "th"; } } return format!("{}{}", x, suffix); } #[test] fn test_num_to_ordinal() { assert_eq!(num_to_ordinal( 0), "0th"); assert_eq!(num_to_ordinal( 1), "1st"); assert_eq!(num_to_ordinal( 12), "12th"); assert_eq!(num_to_ordinal( 22), "22nd"); assert_eq!(num_to_ordinal( 43), "43rd"); assert_eq!(num_to_ordinal( 67), "67th"); assert_eq!(num_to_ordinal(1901), "1901st"); }
This program will compile, which is great because static analysis ensures at the same time that:
suffix
always initialized before we call format!
at the end of the function, andsuffix
assigned only once during the execution of the function (if this were not the case, the compiler would force us to mark the suffix
as a variable variable).It is clear that this program can be written in a style based on expressions, for example, like this:
fn num_to_ordinal_expr(x: u32) -> String { format!("{}{}", x, match (x % 10, x % 100) { (1, 1) | (1, 21...91) => "st", (2, 2) | (2, 22...92) => "nd", (3, 3) | (3, 23...93) => "rd", _ => "th" }) }
Sometimes this style helps to write a very clear and precise code, but sometimes vice versa, and then it is better to use operators with side effects (the ability to return a value from a function from one match
branch to the suggest_guess_fixed
function defined earlier, just shows this).
For both styles have their own applications. The most important thing is that switching to the operator style does not force you to sacrifice other Rust features, such as ensuring that the value of a non- mut
variable can be assigned only once.
An important case where this is important is the initialization of a state and its subsequent borrowing, which occurs only in some branches of the execution flow:
fn sometimes_initialize(input: i32) { let string: String; // let borrowed: &str; // match input { 0...100 => { // ... string = format!("input prints as {}", input); // ... borrowed = &string[6..]; } _ => { // borrowed = "expected between 0 and 100"; } } println!("borrowed: {}", borrowed); // , : // println!("string: {}", string); // ... : error: use of possibly uninitialized variable: `string` } #[test] fn demo_sometimes_initialize() { sometimes_initialize(23); // `string` sometimes_initialize(123); // - }
The interesting thing here is that the code after match
cannot directly access the string
, because the compiler requires that the variable be initialized on all the paths of the program execution before it is accessed. At the same time, we can (with the help of borrowed
) access the data that is contained within the string
, because the link to them is recorded in the borrowed
in the first branch, and we made the borrowed
initialized on all the execution paths leading to println!
which borrowed
and uses.
(The compiler ensures that no string
borrowings can live longer than string
, and in the generated code, at the end of the scope, the string
value is automatically deployed if it was actually created.)
In short, to ensure correctness, the Rust language ensures that data is always initialized before it is used, but its developers have tried to avoid the need for artificial writing patterns when they are needed solely to “coax” the static analyzer (for example, requiring the string
be initialized with empty data or using style based on expressions).
Matching a pattern with a value can borrow a substructure without taking ownership. This is important when applying pattern matching to links (i.e., to values ​​of type &T
).
The section "Algebraic data types" above shows the type of binary tree and the program that calculates the sum of all numbers in the tree instance.
That version of tree_weight
has a significant drawback: it takes a tree by value. As soon as you pass the tree to tree_weight_v1
, this tree disappears (i.e., is deployed).
#[test] fn tree_demo_v1_fails() { let tree = sample_tree(); assert_eq!(tree_weight_v1(tree), (1 + 2 + 3) + 4 + 5); // ... // assert_eq!(tree_weight_v1(tree), (1 + 2 + 3) + 4 + 5); // ... : error: use of moved value: `tree` }
This behavior occurs, however, not because of the use of match
, but rather because of the choice of the function signature:
fn tree_weight_v1(t: BinaryTree) -> i32 { 0 } // ^~~~~~~~~~ , // `t`
In fact, match
in Rust works fine where there is no need to obtain ownership. In particular, the input to match
is the L-value expression , that is, the result of the input expression must be a value that is in a certain area of ​​memory . match
evaluates this expression and then checks the data in the corresponding memory area.
(If the input expression is a variable name, dereference of a link or field, then the L-value will correspond to the memory area that contains this variable or field or where the link points. If the input expression is a function call or another operation that creates temporary anonymous value, then this value will be formally stored in the temporary memory area, and it will be processed by the match
.)
Therefore, if we want to make such a variant of tree_weight
, which simply borrows a tree, and does not take it away at all, then we should use the appropriate opportunity of the expression match
.
fn tree_weight_v2(t: &BinaryTree) -> i32 { // ^~~~~~~~~~~ `&` , ** match *t { BinaryTree::Leaf(payload) => payload, BinaryTree::Node(ref left, payload, ref right) => { tree_weight_v2(left) + payload + tree_weight_v2(right) } } } #[test] fn tree_demo_2() { let tree = sample_tree(); assert_eq!(tree_weight_v2(&tree), (1 + 2 + 3) + 4 + 5); }
The tree_weight_v2
function tree_weight_v2
very similar to tree_weight_v1
. The differences are as follows: firstly, it takes a reference ( &
type), secondly, we added deference *t
, and thirdly, importantly, we use ref
patterns for left
and right
in the case of Node
.
The dereferencing of the *t
reference, being interpreted as an L-value expression, simply gets the address in memory where the BinaryTree
is located (because t: &BinaryTree
is just a reference to this data). Here *t
neither creates a copy of the tree, nor moves it to a new place in memory because match
treats it as L-value.
And the most important part of how the L-value expressions work, the ref
patterns.
, - ref
-:
T
, i
, i
. , i
T
(, , " i: T
").T
, ( " T
[] Copy
"), i
(, T
).
, -ref- , i
T
.
payload
tree_weight_v2
i32
, i32
Copy
, payload
.
, ref
-:
T
ref i
, . , ref i
T
, i
T
(, , i: &T
).Node
tree_weight_v2
left
Box
( ), right
, , .
tree_weight_v2
, .
, ref mut
- ( ref mut i
) : i: &mut T
. , . , , match
, .
ref mut
, :
fn tree_grow(t: &mut BinaryTree) { // ^~~~~~~~~~~~~~~ `&mut`: match *t { BinaryTree::Leaf(ref mut payload) => *payload += 1, BinaryTree::Node(ref mut left, ref mut payload, ref mut right) => { tree_grow(left); *payload += 1; tree_grow(right); } } } #[test] fn tree_demo_3() { let mut tree = sample_tree(); tree_grow(&mut tree); assert_eq!(tree_weight_v2(&tree), (2 + 3 + 4) + 5 + 6); }
, ref mut
- payload
. ref
, payload
, , . .
, left
right
Node
. , , &mut
-.
Rust — , , — Rust . enum
match
, , .
, , :
Higher
Answer::Higher
,ident @ pattern
,{ let id = expr; ... }
match expr { id => { ... } }
,( , , Aaron Turon, Niko Matsakis, Mutabah
, proc
, libfud
, asQuirrel
annodomini
#rust
.)
Source: https://habr.com/ru/post/256941/
All Articles