Rust: for and iterators

In this article, we will discuss for loops, along with the related concepts of iterators and "iterables".

Depending on your previous experience with other programming languages, these concepts may seem very familiar in syntax and semantics, or completely new and puzzling. Their closest counterparts are found in Python, but I think Java, C#, and (modern) C++ programmers will also see a lot of overlap with what their languages offer.

The basics

In Rust, the for loop syntax is almost spartan in its brevity:

    let v = vec!["1", "2", "3"];
    for x in v {
        println!("{}", x);
    }

(The C-style for loop with two semicolons simply does not exist in Rust. As in Python, we can either iterate over a range or use while or loop for more complex cases.)

As expected, the code above prints three lines containing 1, 2, and 3. Perhaps less obvious is the fact that the vector v was moved into the loop. Attempting to use the vector after the loop produces an error:

    <anon>:6:22: 6:23 error: use of moved value: `v` [E0382]
    <anon>:4         println!("{}", x);
    <anon>:5     }
    <anon>:6     println!("{:?}", v);
                                  ^

Ownership of the vector and its elements has moved into the loop completely and irrevocably. Although quite unexpected compared with other languages, this behavior is fully consistent with Rust's general policy of "moving by default".

Still, until you are fully accustomed to the rules of moving and borrowing, this may come as a surprise, since moves are mostly associated with function calls. To make it easier to understand, in most cases you can think of the for loop above as a call to a function for_each:

 for_each(v, |x| println!("{}", x));

This view also hints at how we can avoid moving the value into the loop. Instead of passing the vector itself, we can pass only a reference to it:

 for_each_ref(&v, |x| println!("{}", x));

Translated back into loop form, it looks like this:

    for x in &v {
        println!("{}", x);
    }
    println!("{:?}", v);

This gets rid of the compiler error.
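The function analogy can be made concrete. The following is only a sketch: for_each and for_each_ref are hypothetical helpers written for this illustration (they are not standard library functions), and their signatures are an assumption about what the analogy implies.

```rust
// Hypothetical helper: takes the vector by value, so the caller gives it up.
fn for_each<T, F: FnMut(T)>(v: Vec<T>, mut f: F) {
    for x in v {
        f(x);
    }
}

// Hypothetical helper: only borrows the vector, so the caller keeps it.
fn for_each_ref<T, F: FnMut(&T)>(v: &Vec<T>, mut f: F) {
    for x in v {
        f(x);
    }
}

fn main() {
    let v = vec!["1", "2", "3"];

    for_each_ref(&v, |x| println!("{}", x));
    println!("{:?}", v); // still usable: v was only borrowed

    for_each(v, |x| println!("{}", x));
    // println!("{:?}", v); // would not compile: v was moved into for_each
}
```

Just as with the loops, the only difference between the two calls is whether the vector or a reference to it is handed over.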

Iterators and "iterables"

It is important to note that the added ampersand (&) is by no means part of the for loop syntax. We simply changed the object we iterate over: instead of Vec<T>, the vector itself, we pass &Vec<T>, an immutable reference to it. As a consequence, the type of x changes from T to &T, i.e. it is now a reference to the element. (This had no effect on the loop body here, because println! formats a reference the same way as the value it points to.)

Thus, it turns out that Vec<T> and &Vec<T> are both "iterables". The usual way programming languages implement this is by introducing a special object, an "iterator".

The iterator tracks which element it points to at the moment and supports at least the following operations:

Getting the current item
Moving to the next item
Notification that items have run out

Some languages provide a separate method for each of these tasks, but Rust combines them into one. If you look at the documentation for the Iterator trait, you will see that implementing it requires only a single method, next.
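To illustrate how a single next method covers all three operations, here is a minimal sketch. The Countdown type is a hypothetical example invented for this illustration; only the Iterator trait itself comes from the standard library.

```rust
// Hypothetical example type: counts down from `remaining` to 1.
struct Countdown {
    remaining: u32,
}

impl Iterator for Countdown {
    type Item = u32;

    // The single required method. It returns the current item, moves on to
    // the next one, and reports exhaustion by returning None.
    fn next(&mut self) -> Option<u32> {
        if self.remaining == 0 {
            None
        } else {
            let current = self.remaining;
            self.remaining -= 1;
            Some(current)
        }
    }
}

fn main() {
    let mut countdown = Countdown { remaining: 2 };
    println!("{:?}", countdown.next()); // Some(2)
    println!("{:?}", countdown.next()); // Some(1)
    println!("{:?}", countdown.next()); // None
}
```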

Removing the syntactic sugar

But how exactly are iterator objects created from iterables?

In typical Rust fashion, this task is delegated to another trait, called IntoIterator:

    // (simplified)
    trait IntoIterator {
        fn into_iter(self) -> Iterator;
    }

A distinctive feature of Rust is that into_iter, the only method of this trait, does not just create an iterator from the collection: it essentially consumes the original collection, leaving the resulting iterator as the only way to access its elements. (How can we tell? into_iter takes self as its argument, not &self or &mut self, which means that ownership of the object is transferred into the method.)

(Translator's note: here and below, the author does not discuss in detail the difference between the into_iter, iter, and iter_mut methods for creating iterators. The first moves the collection into the iterator; the second borrows it immutably, so iteration yields references to the elements; the third borrows it mutably, which allows the elements to be changed during iteration.)

This behavior protects us from a very common mistake called iterator invalidation, which is probably well known to C++ programmers. Since the collection is essentially "converted" into an iterator, the following become impossible:

The existence of more than one iterator pointing to a collection
Modification of the collection while one of the iterators is in scope
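A small sketch of what this looks like in practice. The function name sum_by_consuming is a hypothetical helper for this illustration, and the commented-out line is there only to show what the compiler rejects.

```rust
// Hypothetical helper: sums a vector by driving its iterator by hand.
fn sum_by_consuming(v: Vec<i32>) -> i32 {
    // `v` is moved into the iterator here; from now on `iter` is the
    // only way to reach the elements.
    let mut iter = v.into_iter();

    // Any further use of `v`, such as creating a second iterator or
    // modifying it, is rejected at compile time (E0382, use of moved value):
    // let second = v.into_iter();

    let mut sum = 0;
    while let Some(x) = iter.next() {
        sum += x;
    }
    sum
}

fn main() {
    println!("{}", sum_by_consuming(vec![1, 2, 3]));
}
```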

Don't all these "moves" and "borrows" sound familiar? Earlier I noted that when we iterate over a vector in a for loop, we essentially move it "into the loop".

As you may have guessed, when iterating over a vector we actually call IntoIterator::into_iter on it and get an iterator back. We then call next on each iteration and keep looping until we get None.

Thus, the loop:

    for x in v {
        // loop body
    }

is in essence just syntactic sugar for the following expression:

    let mut iter = IntoIterator::into_iter(v);
    loop {
        match iter.next() {
            Some(x) => {
                // loop body
            },
            None => break,
        }
    }

You can clearly see that v cannot be used not only after the loop ends, but even before it begins. That is because we moved the vector into the iterator iter through the into_iter method of the ... IntoIterator trait!

Simple, isn't it? :)

The for loop is syntactic sugar for a call to IntoIterator::into_iter followed by repeated calls to Iterator::next.
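To check the equivalence, the two forms can be put side by side. This is only a sketch: with_for and desugared are hypothetical names, and multiplying by 10 is an arbitrary loop body chosen for the illustration.

```rust
// The ordinary for loop.
fn with_for(v: Vec<i32>) -> Vec<i32> {
    let mut out = Vec::new();
    for x in v {
        out.push(x * 10);
    }
    out
}

// The same loop written out by hand, following the expansion described above.
fn desugared(v: Vec<i32>) -> Vec<i32> {
    let mut out = Vec::new();
    let mut iter = IntoIterator::into_iter(v);
    loop {
        match iter.next() {
            Some(x) => out.push(x * 10),
            None => break,
        }
    }
    out
}

fn main() {
    println!("{:?}", with_for(vec![1, 2, 3]));
    println!("{:?}", desugared(vec![1, 2, 3]));
}
```

Both functions take the vector by value, so in both cases the caller can no longer use it afterwards.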

Ampersand

However, this behavior is not always desirable. But we already know how to avoid it: instead of iterating over the vector itself, iterate over a reference to it:

    for x in &v {
        // loop body
    }

(Translator's note: this is equivalent to for x in v.iter() { ... }.)

Everything we discussed above applies here as well, right down to the desugaring. The into_iter method is called just as before, with one difference: instead of the vector, it receives a reference to it:

    // (simplified)
    impl IntoIterator for &Vec<T> {
        fn into_iter(self) -> Iterator<Item=&T> { ... }
    }

Thus, the resulting iterator yields references to the elements of the vector (&T) rather than the elements themselves (T). And since self above is also a reference, the collection is not moved anywhere, so we can safely access it after the loop ends.

The same goes for mutable references:

    for x in &mut v {
        // loop body
    }

(Translator's note: this is equivalent to for x in v.iter_mut() { ... }.)

The only difference is that into_iter is now called on &mut Vec<T>. Accordingly, the result, of the form Iterator<Item = &mut T>, allows us to modify the elements of the collection.
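A short sketch of both borrowing forms in action. The helper names doubled and total are hypothetical and exist only for this illustration.

```rust
// Iterating over `&mut v` yields `&mut i32`, so elements can be changed in place.
fn doubled(mut v: Vec<i32>) -> Vec<i32> {
    for x in &mut v {
        *x *= 2;
    }
    v // still owned by us: the loop only borrowed it
}

// Iterating over `&v` yields `&i32`: read-only access, no move.
fn total(v: &Vec<i32>) -> i32 {
    let mut sum = 0;
    for x in v {
        sum += *x;
    }
    sum
}

fn main() {
    let v = doubled(vec![1, 2, 3]);
    println!("{:?} sums to {}", v, total(&v));
}
```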

Supporting these two cases required no additional help from the compiler, since everything is already covered by the same trait.

Desugaring a loop through IntoIterator works the same way for collections themselves and for mutable and immutable references to them.

What about the iter method?

So far, we have only talked about for loops, which represent a very imperative style of computation.

If you are more inclined towards functional programming, you may have seen and written various constructs that chain methods, like the following:

    let doubled_odds: Vec<_> = numbers.iter()
        .filter(|&x| x % 2 != 0)
        .map(|&x| x * 2)
        .collect();

Methods like map and filter are called iterator adapters, and all of them are defined on the Iterator trait. They are not only numerous and expressive, but can also be supplied by third-party crates.

To take advantage of adapters, we first need to get an iterator. We know that loops usually obtain one through into_iter, so in principle we can use the same approach here:

    let doubled_odds: Vec<_> = IntoIterator::into_iter(&numbers)
        .filter(|&x| x % 2 != 0)
        .map(|&x| x * 2)
        .collect();

To make the code more readable and more compact, collections usually provide an iter method, which is shorthand for the expression above. This is the method you will usually see in expressions like the one above.

v.iter() is nothing more than shorthand for IntoIterator::into_iter(&v).
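The two spellings can be compared directly. In this sketch, the function names via_iter and via_into_iter are hypothetical, and i32 is assumed as the element type; the adapter chain itself is the one from the article.

```rust
// Uses the `iter` shorthand.
fn via_iter(numbers: &Vec<i32>) -> Vec<i32> {
    numbers.iter()
        .filter(|&x| x % 2 != 0)
        .map(|&x| x * 2)
        .collect()
}

// Spells out the call that `iter` stands for.
fn via_into_iter(numbers: &Vec<i32>) -> Vec<i32> {
    IntoIterator::into_iter(numbers)
        .filter(|&x| x % 2 != 0)
        .map(|&x| x * 2)
        .collect()
}

fn main() {
    let numbers = vec![1, 2, 3, 4, 5];
    println!("{:?}", via_iter(&numbers));
    println!("{:?}", via_into_iter(&numbers));
    println!("{:?}", numbers); // only borrowed, so still available
}
```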

How about both?

One last thing worth noting: Rust does not dictate whether we should use iterators or loops to work with collections. With optimizations enabled in release mode, both approaches should compile to equally efficient machine code, with closures inlined and loops unrolled where appropriate.

Thus, the choice of approach is nothing more than a matter of style and habit. Sometimes the right solution is to mix both approaches, which Rust lets you do without any trouble:

    fn print_prime_numbers_upto(n: i32) {
        println!("Prime numbers lower than {}:", n);
        for x in (2..n).filter(|&i| is_prime(i)) {
            println!("{}", x);
        }
    }

As before, this is possible thanks to desugaring through the IntoIterator trait. In this case, Rust simply converts the iterator into itself.

Iterators themselves are also "iterables", thanks to a "transparent" implementation of the IntoIterator trait.
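Here is a runnable sketch of a for loop over an adapter chain. The article does not show is_prime, so the naive trial-division version below is an assumption, and primes_below is a hypothetical variant that collects the numbers instead of printing them.

```rust
// Assumed helper (not shown in the article): naive trial division.
fn is_prime(n: i32) -> bool {
    if n < 2 {
        return false;
    }
    let mut d = 2;
    // `d <= n / d` is the same test as `d * d <= n`, without overflow.
    while d <= n / d {
        if n % d == 0 {
            return false;
        }
        d += 1;
    }
    true
}

// Hypothetical variant of the article's function that returns the primes.
fn primes_below(n: i32) -> Vec<i32> {
    let mut primes = Vec::new();
    // The filtered range is already an iterator; the for loop still calls
    // into_iter on it, which simply hands the same iterator back.
    for x in (2..n).filter(|&i| is_prime(i)) {
        primes.push(x);
    }
    primes
}

fn main() {
    println!("Prime numbers lower than 20: {:?}", primes_below(20));
}
```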

Finally

If you want to learn more about iterators and loops, the official documentation is the best source. And although mastering every iterator adapter is by no means necessary for writing effective Rust code, a close look at the documentation for the collect method and the associated FromIterator trait is very likely to be useful.

Source: https://habr.com/ru/post/306702/
