
Rust key features

Rust is a new programming language developed by Mozilla. The developers' main goal is to create a safe, practical language for parallel computing. The first version of the language was written by Graydon Hoare in 2006, and in 2009 Mozilla joined the development. Since then the compiler itself, originally written in OCaml, has been successfully rewritten in Rust, using LLVM as the back-end.

The main product being developed in Rust is the new Servo web engine, also a Mozilla project. In 2013 Samsung Electronics joined the development of Rust and Servo, and with its active participation the Servo engine was ported to the ARM architecture. Support from such serious players in the IT industry is encouraging and gives hope for the language's further active development and improvement.

The Rust language is bound to appeal to system and network developers — those who have to write a lot of performance-critical code in C and C++ — because:
  1. Rust is focused on developing safe applications. This includes memory safety: the absence of null pointers, control over the use of uninitialized and deinitialized variables, the inability to share mutable state across multiple tasks, and static analysis of pointer lifetimes.
  2. Rust is focused on developing parallel applications. It supports lightweight (green) threads, asynchronous message passing without copying the data sent, and the ability to choose where objects are placed: on the stack, in a task-local heap, or in a heap shared between tasks.
  3. Rust is focused on developing fast, memory-efficient applications. Using LLVM as the back-end allows an application to be compiled to native code, and a simple interface for interacting with C code makes it easy to use existing high-performance libraries.
  4. Rust is focused on developing cross-platform applications. The compiler is officially supported on Windows, Linux and Mac OS X, and there are ports to other *NIX platforms such as FreeBSD. Several processor architectures are supported: i386, x64 and ARM.
  5. Rust allows you to write in different styles: object-oriented, functional, actor-based, imperative.
  6. Rust supports existing debugging tools: GDB, Valgrind, Instruments.
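To give a flavor of point 1 — the absence of null pointers — here is a minimal sketch in the 0.x-era syntax used throughout this article (it will not compile with a modern rustc): a value that may be absent must be expressed with the Option type, and the compiler forces both cases to be handled.

```rust
// There is no null in Rust: a "maybe missing" pointer is an Option.
let maybe_num: Option<@int> = None;

match maybe_num {
    Some(p) => io::println(p.to_str()), // only here is a valid pointer in scope
    None    => io::println("no value")  // the empty case cannot be forgotten
}
```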


The target audience


At first I planned to write an introductory article that would cover the language from the very beginning, starting with variable declarations and ending with the functionality and features of the memory model. On the one hand, such an approach would reach the largest possible audience; on the other, an article with that content would be uninteresting to people with solid experience in languages like C++ or Java, and would leave no room for a deeper analysis of the basic features of Rust — that is, exactly what makes it attractive.
Therefore, I decided not to describe in detail such basic things as creating variables, loops, functions, closures and everything else that is clear from the code. Less obvious features will be covered as needed, in the course of examining the main features of Rust. As a result, the article describes two of the three key features of the language: safe work with memory and writing parallel applications. Unfortunately, at the time of writing, the network subsystem was under active development, which made including a description of it completely pointless.

Terminology


By and large, this is one of only two or three articles on Rust available in Russian, so there is no established Russian terminology, and I had to pick the most fitting equivalents already familiar from other programming languages. For the convenience of further reading of documentation and articles in English, the English equivalent is given in parentheses at the first appearance of each term.

The most trouble was caused by the terms Box and Pointer. In their properties, both Box and Pointer most resemble smart pointers from C++, so I decided to use the term "pointers". Thus, owned boxes became unique pointers, and borrowed pointers became temporary pointers.

Work with memory


The approach to memory is the first key feature of Rust, distinguishing it both from languages with full access to memory (such as C++) and from languages with memory fully controlled by a GC (such as Java). On the one hand, Rust lets the developer control where data is placed, introducing a separation by pointer type and enforcing the rules of their use at compile time. On the other hand, a reference-counting mechanism — which in the final version of the language is to be replaced by a full GC — provides automatic resource management.

In Rust, there are several types of pointers, each addressing objects located in a different kind of memory and obeying its own rules: plain stack-allocated objects, shared (@) pointers into the task-local heap, unique (~) pointers into the exchange heap, and temporary (&) pointers. Each is examined in its own section below. (The original article illustrates the Rust memory model with a diagram at this point.)


Using the stack


    let x = Point {x: 1f, y: 1f}; // (1)
    let y = x;                    // (2)

So, line (1) places an object of type Point on the stack of the task in which it executes. When such an object is copied (2), it is not a pointer to x that is copied but the entire Point structure.

For information: variables

As the example above shows, the let keyword is used in Rust to create variables. By default all variables are immutable; to create a mutable variable, you must add the keyword mut. Thus, creating a mutable variable of type Point could look like this: let mut x = Point {x: 1f, y: 1f};.

When working with variables it is extremely important to remember that the data itself is immutable, and the compiler keeps a close eye on attempts to change it through indirection.
    let x = Point {x: 1, y: 2};
    let y = Point {x: 2, y: 3};

    let mut px = &x; // (1)
    let py = &y;

    px.x = 42;       // (2)
    px = py;         // (3)

So, it is perfectly possible (1) to create a mutable variable that points to immutable data, but an attempt (2) to change the data itself ends with a compile-time error. Changing the value of the variable that stores the address of the previously created immutable Point object, however, is valid (3).
    error: assigning to immutable field
    px.x = 42;
    ^~~~~
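For contrast — a sketch in the same 0.x-era syntax the article uses, not checked against any current compiler — if the variable itself is declared mut, the data it owns is mutable, and the field assignment that failed above becomes legal:

```rust
let mut x = Point {x: 1f, y: 2f}; // x owns mutable data
x.x = 42f;                        // fine: mutating data through its mut owner
```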


Shared Pointers

Shared pointers address objects located in the task-local heap. Each task has its own local heap, and pointers to objects in it can never be passed outside the task. The unary operator @ is used to create shared pointers.
 let x = @Point {x: 1f, y: 1f}; 

Unlike stack objects, when a shared pointer is copied only the pointer itself is copied, not the data. This property is what gave this pointer type its name: its behavior is very similar to shared_ptr from C++.
    let y = x; // x and y now address the same Point object

It should also be noted that it is impossible to declare a structure containing a direct pointer to its own type (the classic example is a singly linked list). For the compiler to allow such a construction, the pointer must be wrapped in the Option type (1).
    struct LinkedList<T> {
        data: T,
        nextNode: Option<@LinkedList<T>> // (1)
    }
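A short illustration of how such a list could then be assembled — again a sketch in the article's 0.x-era syntax — with None terminating the chain:

```rust
let tail = @LinkedList { data: 2, nextNode: None };       // last node
let head = @LinkedList { data: 1, nextNode: Some(tail) }; // links to tail
```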


Unique pointers

Unique pointers, like shared pointers, point to objects in a heap — but there the similarity ends. The data addressed by unique pointers lives in the exchange heap, which is common to all tasks. The unary operator ~ is used to create unique pointers.
 let p = ~Point {x: 1f, y: 1f}; 

Unique pointers implement ownership semantics: an object can be addressed by only one unique pointer at a time. C++ developers will surely notice the similarity between Rust's unique pointers and the unique_ptr class from the STL.
    let new_p = p;    // (1)
    let val_x = p.x;  // (2)

Assigning (1) pointer p to pointer new_p makes new_p point to the previously created Point object, while p itself is deinitialized. On an attempt to use the deinitialized variable (2), the compiler reports a "use of moved value" error and suggests making a copy of the value instead of moving the pointer.
    let p = ~Point {x: 1f, y: 1f};
    let new_p = p.clone(); // (1)

Thanks to the explicit copy (1), new_p points to a copy of the previously created Point object, and the pointer p is unaffected. For the clone method to be applicable to the Point structure, the structure must be declared with the #[deriving(Clone)] attribute.
    #[deriving(Clone)]
    struct Point {x: float, y: float}


Temporary pointers

Temporary pointers can point to an object located in any kind of memory — the stack, the local heap or the exchange heap — as well as to an internal member of any data structure. At the physical level, temporary pointers are ordinary C pointers; consequently, they are not tracked by the garbage collector and introduce no additional overhead. Their main difference from C pointers is the additional compile-time checks that ensure they are used safely. The unary operator & is used to create temporary pointers.
 let on_the_stack = &Point {x: 3.0, y: 4.0}; // (1) 

An object of type Point was created (1) on the stack and a temporary pointer was stored in on_the_stack. This code is similar to the following:
    let on_the_stack = Point {x: 3.0, y: 4.0};
    let on_the_stack_pointer = &on_the_stack;

Non-stack types are converted to temporary pointers automatically, without the address-of operator, which makes it easy to write functions (1) for which the kind of pointer does not matter.
    let on_the_stack : Point  = Point {x: 3.0, y: 4.0};
    let managed_box  : @Point = @Point {x: 5.0, y: 1.0};
    let owned_box    : ~Point = ~Point {x: 7.0, y: 9.0};

    fn compute_distance(p1: &Point, p2: &Point) -> float { // (1)
        let x_d = p1.x - p2.x;
        let y_d = p1.y - p2.y;
        sqrt(x_d * x_d + y_d * y_d)
    }

    compute_distance(&on_the_stack, managed_box);
    compute_distance(managed_box, owned_box);

And now a small illustration of getting a temporary pointer to an internal element of a data structure.
    let point = @Point {x: 3.0, y: 4.0};
    let y = &point.y;

Tracking the lifetimes of temporary pointers is a rather extensive and not yet fully settled topic. If you wish, you can read about it in detail in the articles Rust Borrowed Pointers Tutorial and Lifetime Notation.

Pointer dereferencing

To access the values addressed by pointers, a dereferencing operation is required. When accessing fields of structured objects, dereferencing is performed automatically.
    let managed = @10;
    let owned = ~20;
    let borrowed = &30;

    let sum = *managed + *owned + *borrowed;


Conversion between pointers

Almost immediately after starting to work with Rust, the question arises: "How do I convert an object addressed by a unique pointer into one addressed by a shared pointer, or vice versa?" The answer is brief and at first somewhat discouraging: you can't. On reflection it becomes obvious that no such conversion can exist: the objects live in different heaps and obey different rules, and they may have dependency graphs whose automatic tracking would also be difficult. So if you need to "convert" between pointer types — which is nothing other than moving objects between heaps — you have to create copies of the objects, for which serialization can be used.
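As a sketch of the copying approach (0.x-era syntax, and assuming Point derives Clone as shown earlier), what you get is not a conversion but an independent object in the other heap:

```rust
let unique = ~Point {x: 1f, y: 1f}; // lives in the exchange heap
let shared = @(*unique).clone();    // an independent copy in the local heap
```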

Tasks


The second key feature of Rust is support for writing parallel applications. In this respect Rust is reminiscent of Erlang, with its actor model and message passing, and of Limbo, with its channels. The developer is given the choice of whether to copy memory when sending a message or simply transfer ownership of the object. And when several tasks work with the same object, one-writer-many-readers access can easily be arranged. For the tasks created, you can choose the most suitable scheduler or write your own.

For information: do-syntax

Before moving on to tasks, it is worth getting acquainted with the do-syntax, which Rust uses to simplify work with higher-order functions. As an example, take the function each, which passes a pointer (1) to each element of an array to the function op.
    fn each(v: &[int], op: &fn(v: &int)) {
        let mut n = 0;
        while n < v.len() {
            op(&v[n]); // (1)
            n += 1;
        }
    }

With the each function and the do-syntax (1), you can print every element of the array — not forgetting that the lambda receives a pointer which must be dereferenced (2) to reach the data:
    do each([1, 2, 3]) |n| {     // (1)
        io::println(n.to_str()); // (2)
    }

Since the do-syntax is syntactic sugar, the entry below is equivalent to the entry using the do-syntax.
    each([1, 2, 3], |n| {
        io::println(n.to_str());
    });


Launching a task

Creating and running a task in Rust is very simple. The task-related code is concentrated in the std::task module, and the simplest way to create and start a task is to call the spawn function from that module.
    use std::task;

    fn print_message() {
        println("Message from task 1");
    }

    fn main() {
        spawn(print_message);                       // (1)
        spawn( || println("Message from task 2") ); // (2)
        do spawn {                                  // (3)
            println("Message from task 3");
        }
    }

The spawn function takes a closure as its argument and runs it as a task (remember that tasks in Rust are implemented on top of green threads). To get the current task, within which the code is running, you can use the get_task() function from the task module. Given that a task executes a closure, three ways to start a task suggest themselves: passing the address of a function (1), creating a closure in place (2), or — most in the spirit of the language — using the do-syntax (3).

Interaction between tasks

The Rust memory model, in general, does not allow different tasks to share the same memory (shared memory model), offering instead message exchange between tasks (mailbox model). At the same time, several tasks may work with shared memory in "read only" and "one writer, many readers" modes. For organizing interaction between tasks, Rust offers the mechanisms described in the sections below: low-level message passing over streams (std::comm), sending serialized objects (flatpipes), a higher-level bidirectional abstraction (DuplexStream from extra::comm), and shared access through the Arc module.


Low level messaging

The most widely used means of inter-task communication at the moment is the std::comm module. The code in std::comm is well debugged, well documented and quite easy to use. The basis of its messaging mechanism is the stream, which is manipulated through a channel and a port. A stream is a unidirectional communication mechanism in which the channel (Chan) is used to send messages and the port (Port) to receive them. The simplest use of a stream looks like this:
    let (port, chan) = stream(); // (1)

    chan.send("data");           // (2)
    // chan.send(1);             // (3)

    println(port.recv());        // (4)

In this example, a pair (1) consisting of a port and a channel is created and used to send (2) a value of string type. Pay special attention to the prototype of the stream() function: fn stream<T: Send>() -> (Port<T>, Chan<T>). As the prototype shows, the channel and port are generic types, which at first glance is not obvious from the code above: the type of the transferred data is inferred automatically from its first use. So, if you uncomment the line that sends the number 1 (3) into the stream, the compiler reports an error:
    error: mismatched types: expected `&'static str` but found `<VI0>`
    (expected &'static str but found integral variable)

The Send bound on the type parameter deserves special attention: it means that only objects that support being sent outside the current task can be transmitted.

To retrieve data from a stream there is the recv() function, which either returns the data or blocks the task until data appears. Looking at the example above, one begins to suspect that it is quite useless: there is no practical sense in passing messages over a stream within a single task. So let us move on to something more practical — using streams to transfer information between tasks.

    let value = vec::from_fn(5, |x| x + 1);    // (1)

    let (server_port, server_chan) = stream(); // (2)
    let (client_port, client_chan) = stream(); // (3)

    do task::spawn {
        let val: ~[uint] = server_port.recv(); // (4)
        let res = val.map(|v| {v + 1});
        client_chan.send(res)                  // (5)
    }

    server_chan.send(value);                   // (6)

    io::println(fmt!("Result: %?", client_port.recv())); // (7)

The first thing to note when working with streams is that the values sent must be addressed by unique pointers, and the from_fn() function (1) creates exactly such an array. Since a stream is unidirectional, two streams are needed: one to send the request (2) and one to receive the answer (3). The recv() function reads data from the server stream (4), blocking the task until data is available. To deliver the result to the client, the send() function (5) is used — belonging not to the server stream but to the client one. The data for the server task is handled similarly: it is written (6) with the send() function of the server stream. Finally, the result produced by the server task is read (7) from the client stream.

Thus, the server stream (server_chan, server_port) is used to deliver messages to the server task, and, because streams are one-directional, a separate client stream (client_chan, client_port) had to be created to return the result of the server's computation.

Stream sharing

Although a stream is a unidirectional data-transfer mechanism, that does not mean a new stream must be created for every would-be sender: there is a mechanism that provides a one-receiver-many-senders mode.
    enum command { // (1)
        print_hello(int),
        stop
    }
    ...
    let (server_port, server_chan) = stream(); // (2)
    let (client_port, client_chan) = stream(); // (3)

    do spawn { // (4)
        let mut hello_count = 0;
        let mut done = false;
        while !done {
            let req: command = server_port.recv(); // (5)
            match req {
                print_hello(client_id) => {
                    println(fmt!("Hello from client #%d", client_id));
                    hello_count += 1;
                }
                stop => {
                    println("Stop command received");
                    done = true;
                }
            }
        }
        client_chan.send(hello_count); // (6)
    }

    let server_chan = SharedChan::new(server_chan); // (7)
    for i in range(0, 5) {
        let server_chan = server_chan.clone(); // (8)
        do spawn {
            server_chan.send(print_hello(i)); // (9)
        }
    }
    server_chan.send(stop);

    println(fmt!("Result: %?", client_port.recv()));

As in the one-reader-one-writer scheme, server (2) and client (3) streams are created and the server task is started (4). The logic of the server task is extremely simple: read (5) the commands sent by the clients (9) from the server stream, print a message for each request received, and finally send the resulting count of print_hello requests into the client stream (6). Since there are now several writers, the sending side must be changed: the server channel is converted (7) to a SharedChan instead of a Chan, and each writer gets its own copy of it (8) via the clone() method. After that, working with the channel is no different from the previous example: the send() method (9) delivers data to the server, the only difference being that the data now comes from several tasks at once.

Besides illustrating shared use of a stream, this example shows how to send several different kinds of messages over a single stream. Since the type of the data carried by a stream is fixed at compile time, to transfer data of different types you must either use serialization followed by transfer of the binary data (this method is described below in the section "Object Forwarding") or transfer an enumeration (1). In their properties, enumerations in Rust resemble unions from the C language, or the Variant type present in one form or another in almost all high-level programming languages.

Object Forwarding

In cases where the requirement to send only values addressed by unique pointers becomes a problem, the flatpipes module comes to the rescue. It allows you to send and receive arbitrary binary data in the form of byte arrays, or objects that support serialization.
    #[deriving(Decodable)] // (1)
    #[deriving(Encodable)] // (2)
    struct EncTest {
        val1: uint,
        val2: @str,
        val3: ~str
    }
    ...
    let (server_port, server_chan) = flatpipes::serial::pipe_stream(); // (3)

    do task::spawn {
        let value = @EncTest{val1: 1u, val2: @"test string 1", val3: ~"test string 2"};
        server_chan.send(value); // (4)
    }

    let val = server_port.recv(); // (5)

As the example shows, working with flatpipes is extremely simple. The structure whose objects are to be transferred must be declared serializable (1) and deserializable (2). Creating a flat pipe (3) is technically no different from creating an ordinary stream, nor are sending (4) and receiving (5) messages through the channel and port. The main difference from a plain stream is that a deep copy of the object is made on the sending side and a new object is constructed on the receiving side. With this approach the overhead, compared with ordinary streams, grows — but so do the possibilities for passing data between tasks.

High-level messaging abstraction

In most of the examples above, two streams are created: one to send data to the server and a second to receive data from it. This approach brings no tangible benefit and simply litters the code. Hence the extra::comm module, a high-level abstraction over std::comm containing DuplexStream, which organizes bidirectional communication within a single stream. Needless to say, a look at the DuplexStream source makes it clear that it is nothing more than a convenient wrapper over a pair of ordinary streams.
    let value = ~[1, 2, 3, 4, 5];

    let (server, client) = DuplexStream(); // (1)

    do task::spawn {
        let val: ~[uint] = server.recv();  // (2)
        io::println(fmt!("Value: %?", val));
        let res = val.map(|v| {v + 1});
        server.send(res)                   // (3)
    }

    client.send(value);                    // (4)
    io::println(fmt!("Result: %?", client.recv())); // (5)

When working with DuplexStream, (1) creates a single pair of bidirectional endpoints, both of which can be used for sending as well as receiving messages. The server object is captured by the task's closure and used to receive (2) and send (3) messages in the server task; the client object is used on the client side (4, 5). The principle of working with DuplexStream is no different from ordinary streams, but it reduces the number of auxiliary objects.

Arc module

For all the delights of message passing, sooner or later the question arises: what do you do with a large data structure that several tasks need to access at the same time? It could, of course, be shuttled between tasks as a unique pointer, but that approach complicates development, and maintenance turns into a real nightmare. For such cases the Arc module was created, which allows several tasks to share access to a single object.
Sharing unique pointers with read-only access

First, the simplest case: shared access to immutable data from several tasks. This problem is solved by the Arc module, which implements atomic reference counting (Atomic Reference Counting) for a shared object. In the prototype of the function that creates an Arc object, pub fn new(data: T) -> Arc<T>, note the restrictions imposed on the type T.
    impl<T: Freeze + Send> Arc<T> {
        pub fn new(data: T) -> Arc<T> { ... }
        ...
    }

Now the object must satisfy not only the Send bound, as with streams, but also Freeze, which guarantees the absence of any mutable fields, or pointers to mutable fields, inside the object of type T (such objects in Rust are called deeply immutable).
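To illustrate deep immutability, here is a hypothetical sketch in the article's 0.x-era syntax: a structure containing a mutable managed box does not satisfy Freeze, and so cannot be wrapped in an Arc.

```rust
struct Frozen    { val: int }      // no mutable state anywhere: Freeze holds
struct NotFrozen { val: @mut int } // a pointer to mutable data: not Freeze,
                                   // so Arc::new(NotFrozen {...}) is rejected
```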
    let data = arc::Arc::new(~[1, 2, 3, 4, 5]); // (1)
    let shared_data = data.clone();             // (2)

    do spawn {
        let val = shared_data.get();            // (3)
        println(fmt!("Shared array: %?", val));
    }

    println(fmt!("Original array: %?", data.get())); // (4)

Granted, this example involves no streams, but it is quite sufficient to illustrate Arc, since it clearly demonstrates the module's main purpose — simultaneous access to the same data from different tasks. So, to share the array wrapped in an Arc (1), a clone of the Arc wrapper is created (2), making the data accessible both from the new task (3) and from the main one (4).

R/W access to unique pointers

The RWArc module leaves me with mixed feelings. On the one hand, RWArc embodies the many-readers-one-writer concept, widespread and well known to most developers, which is probably a good thing. On the other hand, shared access to memory — not the read-only access described above, but read-write access — is fraught with deadlocks, the very thing Rust is supposed to protect developers from. For myself I reached the following conclusion: you need to know about the module, but you should not use it except as a last resort.
    let data = arc::RWArc::new(~[1, 2, 3, 4, 5]); // (1)

    do 5.times {
        let reader = data.clone();                // (2)
        do spawn {
            do reader.read() |data| {             // (3)
                io::println(fmt!("Value: %?", data)); // (4)
            }
        }
    }

    do spawn {
        do data.write() |data| {                  // (5)
            for x in data.mut_iter() { *x = *x * 2 } // (6)
        }
    }

In the example above, (1) creates an array wrapped in an RWArc so that it can be accessed both for reading (4) and for writing (6). The fundamental difference from all the previous examples is the use of closures passed as arguments to the read() (3) and write() (5) functions: reading and writing the data wrapped in an RWArc may happen only inside them. And, as usual, a copy (2) of the object must be created to access it from a closure, otherwise the original would become inaccessible.

How is that even possible?

Yes, this is the question that arises once you learn that the Arc and RWArc modules exist in Rust. At first glance they contradict the language's whole approach to memory in general and the rules of unique pointers in particular. Not being a creator or developer of the language, I can only describe how this behavior is possible. Rust has an unsafe keyword that allows you to write code that works with memory directly: call memory-unsafe functions such as malloc and free, and use pointer arithmetic. It is this feature that is used to bypass Rust's built-in memory protection and share the same object. All code related to this functionality is marked "COMPLETELY UNSAFE" and should not be used by end users directly.
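A tiny hypothetical sketch (again in the 0.x-era syntax of this article) of what that escape hatch looks like; outside an unsafe block, dereferencing a raw pointer like this is a compile-time error:

```rust
let x = 10;
let p: *int = &x as *int;       // a raw, unchecked pointer
unsafe {
    io::println((*p).to_str()); // dereferencing it is allowed only here
}
```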

Instead of conclusion


Although right now Rust is not ready for industrial use, in my opinion it has great potential. It may well be that in a few years Rust will be able to compete with such venerable dinosaurs as C and C++, at least in areas related to writing network and parallel applications. At any rate, I really hope so.

As for the article, it can hardly be considered complete: first, the syntax of the language is sure to undergo a number of changes, and second, work on the third key feature of the language — support for networking — has yet to be finished. As soon as that functionality reaches a more or less complete state, I will certainly write about it.

Source: https://habr.com/ru/post/191916/

