String or &str. I also want to discuss why we might need this.

    fn remove_spaces(input: &str) -> String {
        let mut buf = String::with_capacity(input.len());

        for c in input.chars() {
            if c != ' ' {
                buf.push(c);
            }
        }

        buf
    }
The function iterates over the input string and adds every character that is not a space to the buf buffer. Now the question is: what if the input does not contain a single space? Then buf will hold exactly the same value as input. In that case it would be more efficient not to create buf at all. Instead, we would like to simply return the given input back to the caller.

The type of input is &str, but our function returns a String. We could change the type of input to String:

    fn remove_spaces(input: String) -> String { ... }
This causes two problems. First, if input becomes a String, the caller has to transfer ownership of input to our function and can no longer work with the same data afterwards. We should take ownership of input only if we really need it. Second, the input may already be a &str, in which case we force the caller to convert it to a String, defeating our attempt to avoid allocating memory for buf.

What we really want is to return the input string as is (&str) if there are no spaces in it, and a new string (String) if there are spaces and we need to remove them. This is where the copy-on-write (clone-on-write) type Cow comes to the rescue. The Cow type lets us abstract over whether we own the data (Owned) or merely borrow it (Borrowed). In our example, &str is a reference to an existing string, so it is borrowed data. If the string contains spaces, we need to allocate memory for a new String. The buf variable owns this string. Normally we would move ownership of buf by returning it to the caller. With Cow, we instead move ownership of buf into the Cow and then return that.

    use std::borrow::Cow;

    fn remove_spaces<'a>(input: &'a str) -> Cow<'a, str> {
        if input.contains(' ') {
            let mut buf = String::with_capacity(input.len());

            for c in input.chars() {
                if c != ' ' {
                    buf.push(c);
                }
            }

            return Cow::Owned(buf);
        }

        return Cow::Borrowed(input);
    }
Our function now checks whether the input argument contains at least one space, and only then allocates memory for the new buffer. If there are no spaces in input, it is simply returned as is. We add a bit of runtime complexity to optimize memory handling. Note that our Cow type has the same lifetime as the &str. As we said earlier, the compiler needs to track the use of the &str reference in order to know when it is safe to free the memory (or call the destructor if the type implements Drop).

A nice property of Cow is that it implements the Deref trait, so you can call methods that do not modify the data without even knowing whether a new buffer was allocated for the result. For example:

    let s = remove_spaces("Herman Radtke");
    println!("Length: {}", s.len());
If I need to modify s, I can convert it into an owned value with the into_owned() method. If the Cow contains borrowed data (the Borrowed variant), memory allocation will occur. This approach lets us clone (that is, allocate memory) lazily, only when we really need to write to (or modify) the value.

An example with Cow::Borrowed:

    let s = remove_spaces("Herman"); // s is Cow::Borrowed
    let len = s.len(); // immutable access through Deref
    let owned: String = s.into_owned(); // memory is allocated for a new String

An example with Cow::Owned:

    let s = remove_spaces("Herman Radtke"); // s is Cow::Owned
    let len = s.len(); // immutable access through Deref
    let owned: String = s.into_owned(); // no new allocation, we already have a String
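The two cases above can be checked in one small program. This is a sketch rather than part of the original code: describe is a hypothetical helper that reports which variant a Cow holds, and the to_mut() call is included as one way to see the "clone only on write" behaviour directly.

```rust
use std::borrow::Cow;

fn remove_spaces(input: &str) -> Cow<'_, str> {
    if input.contains(' ') {
        Cow::Owned(input.chars().filter(|&c| c != ' ').collect())
    } else {
        Cow::Borrowed(input)
    }
}

// Hypothetical helper: names the variant without consuming the Cow.
fn describe(s: &Cow<'_, str>) -> &'static str {
    match s {
        Cow::Borrowed(_) => "borrowed",
        Cow::Owned(_) => "owned",
    }
}

fn main() {
    let no_spaces = remove_spaces("Herman");
    let with_spaces = remove_spaces("Herman Radtke");
    println!("{no_spaces} is {}", describe(&no_spaces)); // prints "Herman is borrowed"
    println!("{with_spaces} is {}", describe(&with_spaces)); // prints "HermanRadtke is owned"

    // `to_mut` clones borrowed data into a String on first write;
    // if the Cow were already owned, it would mutate in place.
    let mut s = remove_spaces("Herman");
    s.to_mut().push('!');
    println!("{s} is now {}", describe(&s)); // prints "Herman! is now owned"

    // `into_owned` consumes the Cow; here it reuses the existing String.
    let owned: String = with_spaces.into_owned();
    println!("{owned}");
}
```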
The idea behind Cow is as follows: it lets the user of remove_spaces not worry about memory allocation. Working with the Cow is the same either way (whether new memory was allocated or not).

The Into trait can convert a &str to a String. Similarly, we can use it to convert a &str or a String into the appropriate Cow variant. Calling .into() makes the compiler choose the correct conversion automatically. Using .into() does not slow down our code at all; it is just a way to avoid naming Cow::Owned or Cow::Borrowed explicitly.

    fn remove_spaces<'a>(input: &'a str) -> Cow<'a, str> {
        if input.contains(' ') {
            let mut buf = String::with_capacity(input.len());

            // Iterate over the chars directly; no intermediate Vec<char> is needed.
            for c in input.chars() {
                if c != ' ' {
                    buf.push(c);
                }
            }

            return buf.into();
        }

        return input.into();
    }
The same function can also be written with iterators:

    fn remove_spaces<'a>(input: &'a str) -> Cow<'a, str> {
        if input.contains(' ') {
            input
                .chars()
                .filter(|&x| x != ' ')
                .collect::<String>()
                .into()
        } else {
            input.into()
        }
    }
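The borrow-or-allocate shape of remove_spaces is not specific to spaces. As a minimal sketch, here is the same pattern applied to HTML escaping. The escape_html function is hypothetical (it is not taken from any library) and handles only four characters, which is an assumption made to keep the example short.

```rust
use std::borrow::Cow;

/// Hypothetical helper: escapes `&`, `<`, `>` and `"` for use in HTML text.
fn escape_html(input: &str) -> Cow<'_, str> {
    // Fast path: nothing to escape, hand the input back without allocating.
    if !input.contains(|c: char| matches!(c, '&' | '<' | '>' | '"')) {
        return Cow::Borrowed(input);
    }

    // Slow path: build a new String. The escaped text may be longer than
    // the input, so `input.len()` is only a lower bound for the capacity.
    let mut buf = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => buf.push_str("&amp;"),
            '<' => buf.push_str("&lt;"),
            '>' => buf.push_str("&gt;"),
            '"' => buf.push_str("&quot;"),
            _ => buf.push(c),
        }
    }
    Cow::Owned(buf)
}

fn main() {
    let plain = escape_html("hello world");
    let risky = escape_html("a < b && c");
    assert!(matches!(plain, Cow::Borrowed(_)));
    assert!(matches!(risky, Cow::Owned(_)));
    println!("{plain} / {risky}");
}
```

Input that is already safe costs one scan and no allocation; only input that actually needs escaping pays for a new buffer.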
The pattern applies wherever a function can return a &str in the optimal case and only needs to allocate a String in the less optimal case. Other examples that come to mind: encoding a string into valid XML/HTML, or correctly escaping special characters in an SQL query. In many cases the input is already correctly encoded or escaped, and then it is better to simply return the input string as is. If the data needs to be changed, we have to allocate memory for a string buffer and return that instead.

You may have noticed that I use String::with_capacity() instead of String::new() when creating the string buffer. You can use String::new() instead of String::with_capacity(), but it is usually more efficient to allocate all the memory the buffer needs at once, rather than reallocating it as we add new characters.

A String is actually a Vec<u8> holding UTF-8 encoded bytes. When you call String::new(), Rust creates a zero-length vector. When we put the character a into the string buffer, for example with buf.push('a'), Rust has to increase the capacity of the vector. To do this, it allocates a small block of memory (2 bytes in this illustration). As we put more characters into the buffer and exceed the allocated amount, Rust doubles the capacity of the string, reallocating the memory. It keeps increasing the capacity of the vector each time it is exceeded. The sequence of allocated capacities looks like 0, 2, 4, 8, 16, 32, …, 2^n, where n is the number of times Rust has detected that the allocated memory was exceeded (the exact starting size is an implementation detail of the standard library and may differ between versions). Reallocating memory is very slow (correction: kmc_v3 explained that it may not be as slow as I thought). Rust not only has to ask the memory allocator (which may in turn ask the kernel) for new memory, it also has to copy the contents of the vector from the old memory to the new one. Take a look at the source code of Vec::push to see the resizing logic for yourself.

The copy is done with memcpy, a plain byte copy with a completely predictable memory access pattern. So this is probably the most efficient way to move data from memory to memory. The system libc typically includes memcpy optimizations for your particular micro-architecture.

[...] jemalloc in Rust makes such optimizations.

By comparison, resizing a std::vector in C++ can be very slow because move constructors have to be called individually for each element, and they can throw an exception.

For a call like remove_spaces("Herman Radtke"), the overhead of reallocating memory does not play a big role. But what if I want to remove all spaces in all JavaScript files on my site? The overhead of reallocating the buffer will be much larger. When placing data into a vector (a String or anything else), it is very useful to specify the amount of memory that will be needed when creating the vector. In the best case you know the required length in advance, so the capacity of the vector can be set exactly. The comments in the Vec source code warn about the same thing.

Source: https://habr.com/ru/post/274565/
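The effect of preallocating can be observed without looking at the Vec source. The following is a small sketch: count_capacity_changes is a hypothetical helper that pushes characters one by one and counts how often the reported capacity changes, which here stands in for a (re)allocation. The exact capacities are an implementation detail, so the example prints only the counts.

```rust
/// Hypothetical helper: pushes every char of `text` into `buf` and counts
/// how many times the buffer's capacity changed along the way.
fn count_capacity_changes(mut buf: String, text: &str) -> usize {
    let mut changes = 0;
    let mut last = buf.capacity();
    for c in text.chars() {
        buf.push(c);
        if buf.capacity() != last {
            changes += 1;
            last = buf.capacity();
        }
    }
    changes
}

fn main() {
    let text = "x".repeat(1000);

    // Starts with capacity 0, so the buffer has to grow several times.
    let grown = count_capacity_changes(String::new(), &text);

    // Starts with room for the whole text, so it never has to grow.
    let preallocated = count_capacity_changes(String::with_capacity(text.len()), &text);

    println!("String::new(): capacity changed {grown} times");
    println!("String::with_capacity(): capacity changed {preallocated} times");
}
```

With the capacity set up front, the count for the second buffer is zero, because pushing never exceeds what was reserved.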