📜 ⬆️ ⬇️

Why your first FizzBuzz implementation on Rust may not work

The full original title of the article : "Although you might not like it)"

tl; dr; -version : At first glance, some aspects of Rust may seem strange and even uncomfortable, however, they turn out to be very successful for a language that is positioned as a system language. The concepts of ownership and lifetime make it possible to bring strong static guarantees into the language and make the programs effective and safe both in memory and in time.

License : CC-BY by Chris Morgan .
')

Why your first FizzBuzz implementation may not work: exploring some of the features of Rust that initially shock, but in fact are its best sides (although you may still not like them)

http://chrismorgan.info/media/images/rust-fizzbuzz.svg FizzBuzz is supposed to be a simple task for a beginner, but in Rust there are several pitfalls that you should know about. These pitfalls are not Rust problems, but rather differences from what most programmers are familiar with, restrictions that may seem very tough at first glance, but in reality they give huge advantages at a low price.

Rust is a "moving target", however, the language becomes more stable. The code from the article works with version 0.12. If something breaks, please contact me . Regarding the code in Python, it will work in both deuce and triple.

Simple implementation


Ok, I said in the title that your first implementation of FizzBuzz might not work, well, it may work. You could write it as in the example below. For brevity, omit fn main() { … } . If you are worried that the Python code is shorter than Rust, then there is a special form of Python code for you, it is available by clicking on the checkbox. ( Note: in the original article there is a checkbox that switches the Python code into a “special form” from __future__ import braces , something like Easter eggs from the author ).
Implementing FizzBuzz with separate print instructions: everything just works.
PythonRusty
 for i in range(1, 101): if i % 15 == 0: print('FizzBuzz') elif i % 5 == 0: print('Buzz') elif i % 3 == 0: print('Fizz') else: print(i) 

 for i in range(1i, 101) { if i % 15 == 0 { println!("FizzBuzz"); } else if i % 5 == 0 { println!("Buzz"); } else if i % 3 == 0 { println!("Fizz"); } else { println!("{}", i); } } 


Both programs produce the desired result, and they are obviously very similar. The main thing that is worth mentioning here is that in Rust println!() [1] requires a string literal as the first argument, a format string; the corresponding Python code will look like this: print('{}'.format(i)) .

But what if we want to get rid of the duplication of the print call in the code? Here’s how it might look:
FizzBuzz with one print instruction.
PythonRust (will not compile)
 for i in range(1, 101): if i % 15 == 0: result = 'FizzBuzz' elif i % 5 == 0: result = 'Buzz' elif i % 3 == 0: result = 'Fizz' else: result = i print(result) 

 for i in range(1i, 101) { let result = if i % 15 == 0 { "FizzBuzz" } else if i % 5 == 0 { "Buzz" } else if i % 3 == 0 { "Fizz" } else { i }; println!("{}", result); } 


Notice how in Rust we can use the whole if block as an expression. Even the result of the assignment is not really needed, we could just shove the entire block into the construct where it is used. This is a very familiar approach for rubists, but not for pythonists, because in Python everything is an instruction, not an expression. If you are skeptical of this approach, I understand; when I started to get acquainted with Rust, his bias in expressions and the ability to omit the return instruction seemed to me wonderful. But using Rust, I realized that this was not the case. Actually it's great.

Python was my favorite language for five years, but despite the fact that I continue to write professionally in this language (although I would like to switch to Rust), I discovered that I’ve increasingly lacked chips from Rust. When working with Rust, I don’t feel the same lack of anything from Python, apart from the need for libraries that are not yet in Rust. In general, Rust has greatly degraded Python in my eyes.

The code on Rust looks good, but in reality it does not work because of the strict typing rules in this language. So what is the type of the result variable? The first three if branches return strings, and the fourth, an integer:

 f.rs:7:12: 11: 6 error: if and then have incompatible types: expected `& 'static str`, found` int` (expected & -ptr, found int)
 f.rs:7} else if i% 3 == 0 {
 f.rs:8 "Fizz"
 f.rs:9} else {
 f.rs:10 i
 f.rs:11};
 error: aborting due to previous error

This does not work. How about turning a number into a string?
 for i in range(1i, 101) { let result = if i % 15 == 0 { "FizzBuzz" } else if i % 5 == 0 { "Buzz" } else if i % 3 == 0 { "Fizz" } else { i.to_string() }; println!("{}", result); } 

Here we drew off the chip, which many are familiar with from other programming languages ​​( to_string ) and applied it in a field that not many people understand. In general, this does not work.

  f.rs:7:12: 11: 6 error: if and else have incompatible types: expected `static str`, found` collections :: string :: String` (expected & -ptr, found struct collections :: string :: String)
 f.rs:7} else if i% 3 == 0 {
 f.rs:8 "Fizz"
 f.rs:9} else {
 f.rs:10 i.to_string ()
 f.rs:11};
 error: aborting due to previous error

“What?” I can hear you say, “aren't they all lines now? What's the deal with this &'static str (yes how the hell do you even say it?) And collections::string::String ? ”. At this stage, we need to take a more careful approach to analyzing the types of values ​​produced by the branches: the first three branches do not produce just some kind of "string", they produce &'static str , and the fourth branch does not produce just "whole", and int . In languages ​​like Python, Ruby and JavaScript, the types of integers are combined (and JS went even further, and completely combined all numeric types), while C #, Java, Go have many types of integers that differ in size. But even languages ​​like C #, Java, Go have only one type for a string.

And Rust is not. He has two of them.

Two types of strings? What is this anyway?


Here we could restrict ourselves to a simple explanation and go further, but since we went down so deeply, why not go down to the end, and understand what was done and why it was absolutely worth it. So why C♯, Java and Go could be satisfied with one string type, but Rust did not? To answer this question we must descend to the level of memory management.

Both C♯, Java and Go are all managed languages [2] (also known as garbage collection languages). That is, they have a mechanism in runtime that manages the allocation and freeing of memory at the appropriate time: when no one else uses the string, it can be specially born. Thus, they can return a link to a string without worrying about its lifetime: strings that are still in use will not be singled out.

There is also one assignment for these languages. As a rule, they have immutable (immutable) lines - if you concatenate two lines, memory allocation (allocation) for a new line of the desired size will occur. (This also means that for a solution that will concatenate Fizz and Buzz in appropriate cases, there will be two allocations for numbers divisible by 15. True, some languages ​​may slightly smooth out this negative effect by applying what is called a string pool or internment The success of this mechanism depends on the optimizer and how the code is written. I suppose the lines are immutable because, as an alternative, we will have more of the evils - changing the line may affect other lines depending on it. This strongly beats the correctness of the program and can lead to race conditions in what is essentially a primitive type.
Also, for unicode strings, this can lead to the appearance of incorrect line slices. Of course, these problems also arise in other places, but having them in lines can be much worse. (I said that these languages ​​have the same string type, but this is not quite the case - there are also specialized string types, for example, both Java and .NET have a mechanism called StringBuilder ).

The Rust model differs from the one used in garbage collection languages ​​and is based on the concept of ownership. In this model, each object has one owner ( approx. Ln. Is owned ) in one place at one time, and in other places you can safely receive a pointer to it, borrow (borrow).

collections::string::String is a type with ownership . This means that he has the exclusive right to own the contents of the effluent. When an object of this type leaves its scope, the string is freed. Therefore, any substring cannot be of type String , because there will be no connection between the string and the substring, and when the first leaves its scope, the second will become incorrect. Instead, substrings (or string sections ) use a type that is a reference to an object owned by someone else - &str . Rust, thanks to the concept of the lifetime of an object, is able to ensure that not a single string cut survives its original string, thus maintaining memory security.

There is a more detailed explanation in the life time guide . Here, if you see the construction of '_ after the reference type, know that this is how the link lifetime is determined. There is a special lifetime of 'static , which means that the object exists for the entire duration of the program. Such objects are baked directly into the executable file, as well as string literals that are found in the code — that is, the type of the string literal &'static str .

Earlier, when the type ~T was what Box<T> , and str was fake, type ~str was a string type resizable. It kept the current (size) and maximum (capacity) size as the current type of String (which replaced ~str ). It was assumed that all types of wrappers will work this way. Now, Box<T> is a simple wrapped value. That is why it is not used - without additional capacity, he would need to re-allocate the memory each time he writes to the line. String able to re-allocate memory and does it by default. Therefore, the difference between Box<str> and &str significant.

I can add that during this change, the new type wore the name StrBuf . In fact, the situation is not much different from that in other languages. In fact, this is the effect of the lack of mandatory garbage collection, which makes some applications of &str stupid. In Rust, you have to access the string buffer more often than in other languages, simply because other languages allow you to treat your main string type more frivolously.


Back to fizzbuzz


That is, the problem is that in one branch we have a line with ownership, and the lines in the other three are just static string sections (links to statically defined lines). How do we solve this problem? Maybe we can try to make them all in string sections (yes, yes, the type &str any lifetime 'a we can implicitly result in 'b , if 'a longer than 'b . Since 'static longer than anything, the compiler can freely convert it to a suitable lifetime):
 for i in range(1i, 101) { let result = if i % 15 == 0 { "FizzBuzz" } else if i % 5 == 0 { "Buzz" } else if i % 3 == 0 { "Fizz" } else { i.to_string().as_slice() }; println!("{}", result); } 

Looks like a good idea, huh? Sorry, that won't work either:
 f.rs:10:9: 10:22 error: borrowed value does not live long enough
 f.rs:10 i.to_string (). as_slice ()
                 ^ ~~~~~~~~~~~~
 f.rs:2:25: 13: 2 note: reference must be valid for the block at 2:24 ...
 f.rs:2 for i in range (1i, 101) {
 f.rs Called result = if i% 15 == 0 {
 f.rs:4 "FizzBuzz"
 f.rs.12} else if i% 5 == 0 {
 f.rs:6 "Buzz"
 f.rs:7} else if i% 3 == 0 {
        ...
 f.rs:9:12: 11: 6 note: ... but borrowed value is only valid for the expression at 9:11
 f.rs:9} else {
 f.rs:10 i.to_string (). as_slice ()
 f.rs:11};
 error: aborting due to previous error

Here we run into a lifetime: the string spawned in i.to_string() not stored for sufficient time and is freed at the end of the block. Thus, the link to it also cannot leave the block. This is a potential bug related to the reference to non-valid memory that the Rust compiler successfully caught. In some languages, this is called “dangling index” and it is very bad.

Here we can simply raise the string variable per block, it is enough for us that the string be valid during the body of the loop. Sometimes you will be confronted with situations in which this will be enough, but often not.
 for i in range(1i, 101) { let x; let result = if i % 15 == 0 { "FizzBuzz" } else if i % 5 == 0 { "Buzz" } else if i % 3 == 0 { "Fizz" } else { x = i.to_string(); x.as_slice() }; println!("{}", result); } 
We place the link in the covering block, it works.

How about making everything a String ?


We can go in the opposite direction, obliging all branches to return rows with ownership:
 for i in range(1i, 101) { let result = if i % 15 == 0 { "FizzBuzz".to_string() } else if i % 5 == 0 { "Buzz".to_string() } else if i % 3 == 0 { "Fizz".to_string() } else { i.to_string() }; println!("{}", result); } 
We do everything in rows, but not for free for runtime.

This approach works well, but it means that memory will be allocated for each iteration, not only for those in which we get a number.

Write the function


We passed as much as we could in this direction, without rolling the code into absurdity. How about changing the very formulation of the problem, namely, that we do not print the result, but return it from the function?

Let's start with this code:
The fizz_buzz function that returns a String .
PythonRusty
 def fizz_buzz(i): if i % 15 == 0: return 'FizzBuzz' elif i % 5 == 0: return 'Buzz' elif i % 3 == 0: return 'Fizz' else: return i for i in range(1, 101): print(fizz_buzz(i)) 

 fn fizz_buzz(i: int) -> String { if i % 15 == 0 { "FizzBuzz".to_string() } else if i % 5 == 0 { "Buzz".to_string() } else if i % 3 == 0 { "Fizz".to_string() } else { i.to_string() } } for i in range(1i, 101) { println!("{}", fizz_buzz(i)); } 

Now we have an extra level of encapsulation. It demonstrates just the case when a decision with the removal of a variable to a higher level will not work, because the variable will leave the function itself.
( You can try it yourself ; the return value of the function cannot be represented in the Rust type system, since there is no suitable lifetime - x will not get a 'static lifetime, and there is nothing to which we could tie it.)

Also, since we put the code in a function, we allocate new lines for those cases when it is not needed.

Let's SendStr


Fortunately, Rust supports algebraic data types (also known as enum ). Also, in the standard library there is a suitable type that can describe an object that is either a string cut or a possession string.

Below is the definition of this type (without a description of the methods that make it even more useful):
 pub enum MaybeOwned<'a> { Slice(&'a str), Owned(String) } pub type SendStr = MaybeOwned<'static>; 

Definitions of MaybeOwned and SendStr from std::str .

Send is a restriction that indicates that an object can be safely transferred between tasks (that is, between threads, while not losing memory security); this also implies that the object is self-sufficient, and can be returned from a function. Let there be a string like &'static str , as in the definition of SendStr ; it does not contain references to any objects inside the function, is it? Therefore, it can exist as long as it takes. The same is true for String . Therefore, either of these two objects can be captured inside an enum type, which says that we own one or another object. Therefore, SendStr satisfies the Send condition. This type stores a value in it and the user can perform various operations on it . Now the most remarkable fact is that we can extract a string cut from this type using as_slice() . This type also implements std::fmt::Show , which means that we can use it in formatted output directly, specifying {} (type Show is a direct analog of __str__() in Python or to_s() , toString() , &c in other languages, but it works directly with the writer object, which makes it possible to get rid of the intermediate string object. Calling to_string() on any type that implements Show also calls this mechanism).

Here is the application:
 use std::str::SendStr; fn fizz_buzz(i: int) -> SendStr { if i % 15 == 0 { "FizzBuzz".into_maybe_owned() } else if i % 5 == 0 { "Buzz".into_maybe_owned() } else if i % 3 == 0 { "Fizz".into_maybe_owned() } else { i.to_string().into_maybe_owned() } } for i in range(1i, 101) { println!("{}", fizz_buzz(i)); } 

The fizz_buzz function returns SendStr . It works.
( .into_maybe_owned() taken from IntoMaybeOwned and is available by default )

Cool! Now we have reduced the amount of work the computer needs to do and have made our well-known example faster.
But can we go further?

Writing your own enum type and implementing the type std::fmt::Show


Of course, what we convey is not really a “string”, these are some meanings of “Fizz”, “Buzz”, “FizzBuzz”, or a number. We simply converted all the options into a string in advance; we can easily do it lazily, avoiding unnecessary allocations (in fact, all allocations here can be avoided).

Let's make our own enum .
In the process, we also implement std::fmt::Show for it, which will allow outputting it directly to stdout , without the need for an intermediate line.
Use an isolated data type to effectively represent possible values.
similar to Python (extremely tight)Rusty
 class FizzBuzzItem: def __init__(self, value): self._value = value def __str__(self): if self is Fizz: return "Fizz" elif self is Buzz: return "Buzz" elif self is FizzBuzz: return "FizzBuzz" else: return str(self._value) # ,     Fizz = FizzBuzzItem(object()) Buzz = FizzBuzzItem(object()) FizzBuzz = FizzBuzzItem(object()) def Number(number): return FizzBuzzItem(number) def fizz_buzz(i): if i % 15 == 0: return FizzBuzz elif i % 5 == 0: return Buzz elif i % 3 == 0: return Fizz else: return Number(i) for i in range(1, 101): print(fizz_buzz(i)) 

 use std::fmt; enum FizzBuzzItem { Fizz, Buzz, FizzBuzz, Number(int), } impl fmt::Show for FizzBuzzItem { fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { match *self { Fizz => f.write(b"Fizz"), Buzz => f.write(b"Buzz"), FizzBuzz => f.write(b"FizzBuzz"), Number(num) => write!(f, "{}", num), } } } fn fizz_buzz(i: int) -> FizzBuzzItem { if i % 15 == 0 { FizzBuzz } else if i % 5 == 0 { Buzz } else if i % 3 == 0 { Fizz } else { Number(i) } } for i in range(1i, 101) { println!("{}", fizz_buzz(i)); } 


Please note that this is a really good way to present data, although we could not bother so much in this case and simply replace the first three branches with the Word(&'static str) type Word(&'static str) : Word("FizzBuzz") so on. (In truth, this was the first version I wrote in this step. Even I was turned to using strings where it is not required!)

We could go further by writing a separate iterator, but considering how iterators work in Rust, this is not necessary at all - you can simply write range(1, 101).map(fizz_buzz) . range(1, 101).map(fizz_buzz) . This will give much more flexibility. Once Iterator<int> is implemented somewhere, you can simply add .map(fizz_buzz) to the end and you will get a type that implements Iterator<FizzBuzzItem> .

The cycle can be rewritten once or twice in this style:
We use the fizz_buzz function on the integer iterator.
PythonRusty
 for f in map(fizz_buzz, range(1, 101)): print(f) 

 for f in range(1, 101).map(fizz_buzz) { println!("{}", f); } 


Whichever way we choose, as a result we get the good old FizzBuzz exhaust.

Conclusion


Now you know why your first implementation of FizzBuzz on Rust might not work. Some of the difficulties described in the article are typical for statically-typed languages, some relate to the specifics of Rust. (Actually, the situation is similar to the same in C ++, with the difference that C ++ allows you to make a bunch of silly mistakes and does not give any guarantees of working with memory. Do not argue with me about this, here I just quote other people, I I do not know C ++ properly.)

We walked around the topic of ownership in Rust, and how it can prevent you from writing in the style you are used to, and why this is true (though without describing specific advantages). We also mentioned the effective concept of enum types (algebraic data types), which allows us to describe data more strictly and efficiently.

I hope you saw the power of all these things, and it interested you.

Is the described additional meaning? Yes.

It is unpleasant? Periodically. (My experience says that all this saves from difficulties as often as it creates them.)

Does this improve the effectiveness of your programs? Of course, with full confidence. Previously, these things required a loss of security and correctness, now in Rust, you do not need such compromises.

Does it make it easier to code through?In simple cases, like this one, there is not much difference, but in complex these mechanisms become real help. (I really lack them in Python.)

Summing up on these concepts, we can say that there are bad and good sides: sometimes you will love them, sometimes you hate them. But at least I hate them less often.

Should you use Rust? Well, I suggest at least try it. You may find it raw or unsuitable for your purposes because of its emphasis on system programming. For many high-level tasks, it can be somewhat cumbersome. But I believe that the time will come, and it will be a cool tool for things like web programming, which I talked about at the report in StrangeLoop (you can also watch the slides, 2MB SVG).

Finally, if you have little knowledge of Rust or have not understood some part of the article, I suggest that you familiarize yourself with the official documentation ; The thirty-minute introduction to Rust describes the concept of ownership quite well, and the Hyde reveals the enumtypes and more. There are also more detailed guides on specific issues . If you still have questions, places like the #rust channel on irc.mozilla.org can be a great help - I am there for a long time, my nickname is ChrisMorgan .

Well, if you really love messing around with optimizing FizzBuzz


Yes please . This is the final version, with the minimum corrections necessary to compile with the up-to-date version of Rust, and the string version OUTfor improved readability (!?):
 #![no_std] #![feature(asm, lang_items)] extern crate libc; static OUT: &'static [u8] = b"\ 1\n2\nFizz\n4\nBuzz\nFizz\n7\n8\nFizz\nBuzz\n11\nFizz\n13\n14\nFizzBuzz\n\ 16\n17\nFizz\n19\nBuzz\nFizz\n22\n23\nFizz\nBuzz\n26\nFizz\n28\n29\nFizzBuzz\n\ 31\n32\nFizz\n34\nBuzz\nFizz\n37\n38\nFizz\nBuzz\n41\nFizz\n43\n44\nFizzBuzz\n\ 46\n47\nFizz\n49\nBuzz\nFizz\n52\n53\nFizz\nBuzz\n56\nFizz\n58\n59\nFizzBuzz\n\ 61\n62\nFizz\n64\nBuzz\nFizz\n67\n68\nFizz\nBuzz\n71\nFizz\n73\n74\nFizzBuzz\n\ 76\n77\nFizz\n79\nBuzz\nFizz\n82\n83\nFizz\nBuzz\n86\nFizz\n88\n89\nFizzBuzz\n\ 91\n92\nFizz\n94\nBuzz\nFizz\n97\n98\nFizz\nBuzz\n"; #[start] fn start(_argc: int, _argv: *const *const u8) -> int { unsafe { asm!( " mov $$1, %rax mov $$1, %rdi mov $0, %rsi mov $$0x19d, %rdx syscall " : : "r" (&OUT[0]) : "rax", "rdi", "rsi", "rdx" : ); } 0 } #[lang = "stack_exhausted"] extern fn stack_exhausted() {} #[lang = "eh_personality"] extern fn eh_personality() {} #[lang = "fail_fmt"] extern fn fail_fmt() {} 


Translator's Notes :
1. Rust has a developed system of macros, in this case println!in compile-time it is developed into a specialized call for a specific type println.
2. When you read the original for the first time, you might get the impression that we are talking about managed code , however, here we mean managed memory . Despite the different wording within the brackets and outside, we are talking about the same thing.

The material is quite large, quite possible stylistic or semantic errors of translation. Also, due to the fact that I am not an expert in Rust and statically-typed languages, there may be inaccuracies in the description of some mechanisms. In both cases, I will be grateful if you send me your corrections in personal messages.
Thanks for attention.

Source: https://habr.com/ru/post/240617/


All Articles