This is the last article in Herman Radtke's series on working with strings and memory in Rust, which I am translating. I found it the most useful of the series and originally wanted to start the translation with it, but then decided that the other articles are also needed to set the context and introduce simpler, but very important, aspects of the language, without which this article loses much of its value.

In this article I want to show how to write a function that returns either a String or a &str. I also want to discuss why we may need it.

Let's start with a function that removes all spaces from a given string:

```rust
fn remove_spaces(input: &str) -> String {
    let mut buf = String::with_capacity(input.len());

    for c in input.chars() {
        if c != ' ' {
            buf.push(c);
        }
    }

    buf
}
```

This function allocates a string buffer, iterates over the characters of the input string, and adds all non-space characters to the buf buffer. Now the question is: what if the input does not contain a single space? Then the result will be exactly the same as input. In this case it would be more efficient not to create buf at all and simply return the given input to the caller. But input is a &str, while our function returns a String. We could change the input type to String:

```rust
fn remove_spaces(input: String) -> String { ... }
```

This causes two problems. First, if input becomes a String, the caller has to transfer ownership of the input to our function and can no longer work with the same data afterwards. We should take ownership of input only if we really need it. Second, the input may already be a &str, and then we force the caller to convert it to a String, which defeats our attempt to avoid allocating memory for buf.

What we really want is to return the input string (&str) if it contains no spaces, and a new string (String) if there are spaces and we need to remove them. This is where the clone-on-write type Cow comes to the rescue. The Cow type lets us abstract over whether we own the data (Owned) or only borrow it (Borrowed). In our example, &str is a reference to an existing string, so it is borrowed data.
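The two variants of Cow mentioned above can be made concrete with a minimal sketch that is separate from remove_spaces. The names describe, borrowed_example, and owned_example are hypothetical and exist only for illustration.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not from the article): reports which variant a Cow holds.
fn describe(value: &Cow<'_, str>) -> &'static str {
    match value {
        Cow::Borrowed(_) => "borrowed",
        Cow::Owned(_) => "owned",
    }
}

/// Wraps an existing string slice; no new string is allocated here.
fn borrowed_example(text: &str) -> Cow<'_, str> {
    Cow::Borrowed(text)
}

/// Builds a new String and moves its ownership into the Cow.
fn owned_example(text: &str) -> Cow<'static, str> {
    Cow::Owned(text.to_uppercase())
}
```

Calling describe on the result of borrowed_example yields "borrowed", and on the result of owned_example yields "owned"; in both cases the caller holds a single Cow type.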
If the string has spaces, we need to allocate memory for a new String. The variable buf owns this string. Normally we would move ownership of buf by returning it to the caller. With Cow, we instead move ownership of buf into the Cow and return that.

```rust
use std::borrow::Cow;

fn remove_spaces<'a>(input: &'a str) -> Cow<'a, str> {
    if input.contains(' ') {
        let mut buf = String::with_capacity(input.len());

        for c in input.chars() {
            if c != ' ' {
                buf.push(c);
            }
        }

        return Cow::Owned(buf);
    }

    return Cow::Borrowed(input);
}
```

Our function now checks whether the input argument contains at least one space, and only then allocates memory for the new buffer. If there are no spaces in input, it is simply returned as is. We add a little runtime complexity to optimize memory handling. Note that our Cow has the same lifetime as the &str. As we said earlier, the compiler needs to track the use of the &str reference in order to know when it is safe to free the memory (or call the destructor if the type implements Drop).

Another useful property of Cow is that it implements the Deref trait, so you can call methods that do not modify the data without even knowing whether a new buffer was allocated for the result. For example:

```rust
let s = remove_spaces("Herman Radtke");
println!("Length: {}", s.len());
```

If I do need to modify s, I can convert it into an owned value with the into_owned() method. If the Cow contains borrowed data (the Borrowed variant), memory will be allocated. This approach lets us clone (that is, allocate memory) lazily, only when we really need to write to (or modify) the value.

An example with Cow::Borrowed:

```rust
let s = remove_spaces("Herman"); // s is a Cow::Borrowed
let len = s.len(); // immutable access through Deref
let owned: String = s.into_owned(); // allocates a new String
```

An example with Cow::Owned:

```rust
let s = remove_spaces("Herman Radtke"); // s is a Cow::Owned
let len = s.len(); // immutable access through Deref
let owned: String = s.into_owned(); // no allocation, we already have a String
```

The idea behind Cow is as follows: allocation is deferred for as long as possible, and the user of remove_spaces does not have to worry about memory allocation.
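Lazy cloning can also happen in place, without consuming the Cow. The following is a minimal sketch using Cow::to_mut() from the standard library; ensure_exclamation is a hypothetical function invented for this illustration and is not part of the article's code.

```rust
use std::borrow::Cow;

/// Hypothetical helper: appends '!' only when the text does not already end with one.
fn ensure_exclamation(mut text: Cow<'_, str>) -> Cow<'_, str> {
    if !text.ends_with('!') {
        // to_mut() clones borrowed data into a String on first use;
        // if the Cow is already Owned, the existing String is reused.
        text.to_mut().push('!');
    }
    text
}
```

If the input already ends with '!', a borrowed Cow passes through untouched and nothing is allocated; otherwise the first call to to_mut() performs the clone.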
Either way, code that uses the Cow looks the same, whether or not new memory was allocated.

Earlier we talked about using the Into trait to convert a &str to a String. Similarly, we can use it to convert a &str or a String into the appropriate Cow variant. Calling .into() makes the compiler choose the correct conversion automatically. Using .into() does not slow our code down at all; it is just a way to avoid naming Cow::Owned or Cow::Borrowed explicitly.

```rust
use std::borrow::Cow;

fn remove_spaces<'a>(input: &'a str) -> Cow<'a, str> {
    if input.contains(' ') {
        let mut buf = String::with_capacity(input.len());

        // Iterate over the characters directly; no intermediate Vec<char> is needed.
        for c in input.chars() {
            if c != ' ' {
                buf.push(c);
            }
        }

        return buf.into();
    }

    return input.into();
}
```

The same function can be written more compactly with iterators:

```rust
use std::borrow::Cow;

fn remove_spaces<'a>(input: &'a str) -> Cow<'a, str> {
    if input.contains(' ') {
        input
            .chars()
            .filter(|&x| x != ' ')
            .collect::<String>()
            .into()
    } else {
        input.into()
    }
}
```

The pattern fits any function that can return the input &str in the optimal case and has a less optimal case that requires allocating a String. Other examples that come to mind: encoding a string into valid XML/HTML, or correctly escaping special characters in an SQL query. In many cases the input is already correctly encoded or escaped, and then it is better to simply return the input string as is. If the data does need to change, we have to allocate a string buffer and return that instead.

You may have noticed that the examples use String::with_capacity() instead of String::new() when creating the string buffer. String::new() would also work, but it is much more efficient to allocate all the memory the buffer needs at once than to reallocate it as we add new characters.

A String is really a Vec<u8> holding UTF-8 encoded bytes. Calling String::new() creates a zero-length vector. When we put the character a into the string buffer, for example with buf.push('a'), Rust has to increase the capacity of the vector. To do this, it allocates 2 bytes of memory (the exact numbers are an implementation detail of the standard library and differ between Rust versions; newer versions start from a larger minimum capacity).
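The difference between the two constructors can be observed through String::capacity(). The following is a minimal sketch; count_capacity_changes is a hypothetical helper, and it makes no assumption about the exact growth sequence, only about whether the capacity changes at all.

```rust
/// Hypothetical helper: pushes `count` one-byte characters into `buf` and
/// returns how many times the reported capacity changed along the way.
fn count_capacity_changes(mut buf: String, count: usize) -> usize {
    let mut changes = 0;
    let mut last_capacity = buf.capacity();

    for _ in 0..count {
        buf.push('a');
        // A different capacity means the buffer had to grow.
        if buf.capacity() != last_capacity {
            changes += 1;
            last_capacity = buf.capacity();
        }
    }

    changes
}
```

Starting from String::with_capacity(1000) and pushing 1000 one-byte characters, the helper reports no capacity changes, while starting from String::new() it reports at least one.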
As we push more characters into the buffer and exceed the allocated memory, Rust doubles the capacity of the string, reallocating the memory. It keeps growing the capacity of the vector each time it is exceeded. In the implementation described here, the sequence of allocated capacities is 0, 2, 4, 8, 16, 32, …, 2^n, where n is the number of times Rust has detected that the allocated memory was exceeded. Reallocating memory is very slow (correction: kmc_v3 explained that it may not be as slow as I thought). Rust not only has to ask the kernel to allocate new memory, it also has to copy the contents of the vector from the old memory to the new one. Take a look at the source code of Vec::push to see the resizing logic for yourself.

As kmc_v3 pointed out, the copy is a byte-wise memcpy with a completely predictable memory access pattern, so this is probably the most efficient way to move data from memory to memory. The libc system library typically includes memcpy optimizations for your particular microarchitecture. [...] jemalloc in Rust makes such optimizations. Resizing a std::vector in C++ can be very slow because move constructors have to be called individually for each element, and they can throw an exception.

For something like remove_spaces("Herman Radtke"), the overhead of reallocating memory does not matter much. But what if I want to remove all spaces from every JavaScript file on my site? The reallocation overhead would be much greater. When putting data into a vector (a String or anything else), it is very useful to specify the required capacity when creating it. In the best case you know the needed length in advance, so the capacity of the vector can be set exactly. The comments in the Vec source code warn about the same thing.

Source: https://habr.com/ru/post/274565/