Black magic of metaprogramming: how macros work in Rust 1.15

In the previous article we got acquainted with one of the most interesting features of the Rust language: procedural macros.

As promised, today I will talk about how to write such macros yourself and how they fundamentally differ from the notorious preprocessor macros in C/C++.

But first, let's go through release 1.15 and look at its other new features, since for many people they were just as eagerly awaited.

What to read?

The Rust language is developing very rapidly. Naturally, publishers cannot keep up and do not take on printing books about it, since they would become obsolete before the ink dries on the pages.

Therefore, most of the up-to-date documentation exists in electronic form. The traditional source of information is the Book, in which you can find answers to most beginners' questions. For the most frequent questions there is a FAQ section.

For those who already have programming experience in other languages and are generally grown-up enough to figure things out on their own, another book will do. It is expected to present the material better and eventually replace the first book. And for those who like to learn from examples, there is Rust by Example.

People familiar with C++ may be interested in a book, or rather a porting guide, that presents the material in comparison with C++ and focuses on the differences between the languages and on the problems that Rust solves better.

If you are interested in the history of the language's development and in the view from the other side of the barricades, I highly recommend the blogs of Aaron Turon and Niko Matsakis. They write in a very lively style and talk about current problems of the language and how they are supposed to be solved. You will often learn much more relevant information from these blogs than from other sources.

Finally, if you are not afraid of dragons and dark corners, you can take a look at the Rustonomicon. But I warn you: after reading this book, you will never look at Rust in the same way. However, I digress...

New in Rust 1.15

About 6 weeks have passed since the release of 1.14. During this time, 1443 patches (not bad, right?) that fix bugs and add new features made it into the new release. And just the other day hotfix 1.15.1 appeared, with small but important fixes.

For details, you can refer to the announcement page or to the detailed description of the changes (changelog). Here we will concentrate on the most noticeable changes.

Cargo has grown up

The build system of the compiler and the standard Rust library has been rewritten in Rust itself, using Cargo, the standard package manager and build system of the Rust ecosystem.

From now on, Cargo is the default build system. It was a long process, but it finally bore fruit. The authors claim that the new build system has been used since last December in the master repository branch and so far everything is going well.

Now a file named build.rs that lies at the same level as Cargo.toml will be interpreted as a build script.

There is even already a pull request that removes all the makefiles; it is scheduled to land in release 1.17.

All this prepares the ground for using packages from crates.io directly when building the compiler, as in any other project. It is also a good demonstration of what Cargo can do.

New architectures

Rust has gained Tier 3 support for i686-unknown-openbsd, MSP430, and ARMv5TE. Recently it became known that support for the AVR microcontroller architecture will appear in the LLVM 4.0 release. The Rust developers are aware of this and have already done almost everything needed to integrate the new version of LLVM and the new architecture.

Moreover, there are already projects using Rust in an embedded environment. The compiler developers are polling the community to find out the needs of this still small but important group of users.

Faster! Higher! Stronger!

The compiler has become faster. Recently it was also announced that the incremental compilation system has entered the beta testing phase. On my projects, the compile time after minor changes dropped from ~20 to ~4 seconds, although the final linking still takes a fair amount of time. So far, incremental compilation works only in nightly builds and depends heavily on the nature of the dependencies, but the progress is encouraging.

The slice::sort() algorithm was rewritten and became much, much, much faster. It is now a hybrid sort inspired by Timsort. Previously, a plain merge sort was used.
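As a reminder of what this API looks like in use, here is a minimal sketch. The function name sort_by_length and the sample words are made up for the illustration. It relies on the documented guarantee that slice::sort() and its relatives sort_by() and sort_by_key() are stable, so elements that compare equal keep their original relative order.

```rust
/// Sorts words by length. `sort_by_key` is a stable sort, so words of equal
/// length stay in their original relative order.
fn sort_by_length(mut words: Vec<&str>) -> Vec<&str> {
    words.sort_by_key(|w| w.len());
    words
}
```

For example, sorting "pear", "fig", "apple", "kiwi", "plum" this way puts "fig" first and "apple" last, with the three four-letter words in between in their original order.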

In C++, we can define a template specialization for a particular type, but so far we cannot restrict which types a template may be specialized with. Work in this direction is underway, but so far everything is very complicated.

Rust has always been able to set type constraints, and more recently it has become possible to define, or rather override, a generic implementation with a more specific one if the latter sets stricter constraints. This feature, specialization, is still unstable for user code, but the standard library already relies on it. It allows you to optimize the code for special cases without breaking the generic interface.

In particular, release 1.15 added a specialized implementation of the extend() method for Vec<T> where T: Copy, which uses a simple linear copy of memory regions and gives a significant speedup.
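From the user's side nothing changes: the call looks the same as before. Below is a minimal sketch with a hypothetical helper append_all. Whether a given call actually takes the fast path is an internal detail of the standard library, so treat the comments as an illustration rather than a guarantee.

```rust
/// Appends all elements of `src` to `dst`. Iterating over a slice yields `&u32`,
/// which `Vec<u32>` accepts through its `Extend<&T>` implementation for `T: Copy`.
fn append_all(mut dst: Vec<u32>, src: &[u32]) -> Vec<u32> {
    dst.extend(src.iter());
    dst
}
```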

In addition, the implementations of the chars().count(), chars().last(), and char_indices().last() methods were sped up.
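For readers who have not met these methods, a small sketch of what they return (the helper last_char_info is hypothetical). Note that char_indices() reports byte offsets, not character positions.

```rust
/// Returns (number of characters, last character, byte offset of the last character).
fn last_char_info(s: &str) -> (usize, Option<char>, Option<usize>) {
    (
        s.chars().count(),
        s.chars().last(),
        s.char_indices().last().map(|(offset, _)| offset),
    )
}
```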

IDE support

This is not yet in stable Rust, but the news is too significant to keep quiet about. The developers of the Rust Language Server have recently announced the release of an alpha version of their brainchild.

The Language Server Protocol is a standard protocol that lets editors and development environments talk to compilers in a common language. It abstracts operations such as auto-completion, go to definition, refactoring, working with buffers, and so on.

This means that any editor or IDE that supports LSP automatically gets support for all LSP-compatible languages.

You can already try the basic features in compatible editors, but the authors strongly advise you to be careful with your data, because the code is still quite raw.

Macros in Rust

Let's get back to the matter at hand.

From the very beginning, programmers have wanted to write less and get more. At different times this meant different things, but roughly speaking there are two ways to reduce the amount of code:

Extracting logically complete pieces of code for reuse
Extracting dependent code fragments that mean nothing outside their context

The first approach corresponds to the traditional decomposition of programs: splitting code into functions, methods, classes, etc.

Macros, includes and other kinds of preprocessing belong to the second. The Rust language has three mechanisms for this:

Ordinary macros
Procedural macros
Compiler plugins

Ordinary macros (called "macros by example" in the documentation) are used when you want to avoid repeating the same code, but extracting it into a function is impractical or impossible. The vec! and println! macros are examples. They are defined declaratively and work by pattern matching and substitution. The implementation is based on a 1986 paper, from which they got their full name.

Procedural macros are the first attempt to stabilize the compiler plugin interface. Unlike ordinary declarative macros, a procedural macro is a piece of Rust code that is executed while the program is being compiled and whose result is a set of tokens. The compiler interprets these tokens as the result of macro expansion.

At the moment, the compiler supports procedural macros only for custom derive attributes. In the future, the number of scenarios will be expanded.

Compiler plugins are the most powerful, but also the most complex and unstable (in the sense of API) tool, available only in nightly compiler builds. The documentation gives an example of a plugin that adds support for Roman numerals as numeric literals.

Macro example

Because macros are not limited to the lexical context of a function, they can generate definitions for higher-level entities as well. For example, a macro can define a whole impl block, or a method together with a name, a list of parameters and the type of the return value.

Macro invocations are allowed almost anywhere in the module hierarchy:

inside expressions
in trait and impl blocks
in the bodies of functions and methods
in the body of the module

Macros are often used in libraries when many similar constructs have to be defined, for example a series of impl blocks for the standard data types.
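A minimal sketch of that idea: one macro invocation produces a separate impl block for each listed type. The trait Describe and the macro impl_describe! are invented for this illustration and do not come from any library.

```rust
/// Hypothetical trait used only for this illustration.
trait Describe {
    fn describe(&self) -> String;
}

/// Generates one `impl Describe` block per listed type.
macro_rules! impl_describe {
    ($($t:ty),+) => {
        $(
            impl Describe for $t {
                fn describe(&self) -> String {
                    // stringify! turns the matched type into its textual name.
                    format!("{} = {}", stringify!($t), self)
                }
            }
        )+
    };
}

// The macro is invoked in item position, outside of any function.
impl_describe!(u8, i32, f64);
```

After this, 42i32.describe() returns the string "i32 = 42", and adding support for one more type is a matter of extending the list.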

For example, in the standard Rust library, macros are used to compactly declare implementations of the PartialEq trait for various combinations of slices, arrays and vectors:

Careful, this may hurt your brain!

    macro_rules! __impl_slice_eq1 {
        ($Lhs: ty, $Rhs: ty) => {
            __impl_slice_eq1! { $Lhs, $Rhs, Sized }
        };
        ($Lhs: ty, $Rhs: ty, $Bound: ident) => {
            #[stable(feature = "rust1", since = "1.0.0")]
            impl<'a, 'b, A: $Bound, B> PartialEq<$Rhs> for $Lhs where A: PartialEq<B> {
                #[inline]
                fn eq(&self, other: &$Rhs) -> bool { self[..] == other[..] }
                #[inline]
                fn ne(&self, other: &$Rhs) -> bool { self[..] != other[..] }
            }
        }
    }

    __impl_slice_eq1! { Vec<A>, Vec<B> }
    __impl_slice_eq1! { Vec<A>, &'b [B] }
    __impl_slice_eq1! { Vec<A>, &'b mut [B] }
    __impl_slice_eq1! { Cow<'a, [A]>, &'b [B], Clone }
    __impl_slice_eq1! { Cow<'a, [A]>, &'b mut [B], Clone }
    __impl_slice_eq1! { Cow<'a, [A]>, Vec<B>, Clone }

We will look at a more illustrative example, namely the implementation of the vec! macro, which acts as a constructor for Vec:

    macro_rules! vec {
        // repeat an element: vec![0; 32]
        ( $elem:expr; $n:expr ) => (
            $crate::vec::from_elem($elem, $n)
        );
        // list of elements: vec![1, 2, 3]
        ( $($x:expr),* ) => (
            <[_]>::into_vec(box [$($x),*])
        );
        // the same with a trailing comma: vec![1, 2, 3, ]
        ( $($x:expr,)* ) => (
            vec![$($x),*]
        )
    }

A macro works like a match construct, but at compile time. Its input is a fragment of the program's syntax tree. Each branch consists of a matching pattern and a substitution expression, separated by =>.

The matching pattern resembles a regular expression with the quantifiers * and +. Each metavariable is followed, after a colon, by its expected kind (designator). For example, the designator expr matches an expression, ident matches any identifier, and ty matches a type. The syntax of macros is described in more detail in the macro guide and in the documentation, and the porting guide contains an up-to-date analysis of the vec! macro with a description of each branch.
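To make the quantifiers and designators more tangible, here is a small sketch with two made-up macros. max_of! uses expr together with the + quantifier and recursion, while make_getter! combines ident, ty and expr to generate a whole function.

```rust
/// Returns the largest of one or more expressions.
macro_rules! max_of {
    // Base case: a single expression.
    ($x:expr) => { $x };
    // Recursive case: the first expression, then one or more others.
    ($x:expr, $($rest:expr),+) => {{
        let a = $x;
        let b = max_of!($($rest),+);
        if a > b { a } else { b }
    }};
}

/// Defines a function with the given name and return type.
macro_rules! make_getter {
    ($name:ident, $t:ty, $value:expr) => {
        fn $name() -> $t { $value }
    };
}

// Expands to: fn answer() -> u32 { 42 }
make_getter!(answer, u32, 42);
```

Here max_of!(1, 7, 4) evaluates to 7. An invocation such as max_of!() matches neither branch and is rejected at compile time.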

Specifying the kinds of metavariables lets you define the macro's area of applicability more precisely and catch possible errors.

When the compiler encounters a macro invocation in the code, it selects the branch that fits the given case and replaces the macro construct in the tree with the corresponding substitution expression. If no branch in the macro body fits, the compiler produces a meaningful error message.

Cleanliness and tidiness

A macro in Rust must be written so that it generates syntactically valid code. This means that not every sequence of characters can be a valid macro. This avoids many of the problems associated with using the preprocessor in C/C++.

    #define SQUARE(a) a*a

    int x = SQUARE(my_list.pop_front());
    int y = SQUARE(x++);

In a seemingly innocuous code fragment, instead of one element we pulled out two, computed a result other than the one we expected, and on top of that provoked undefined behavior in the last line. Three serious errors in two lines of code is a bit too much.

Of course, the example is synthetic, but we all know perfectly well how constantly changing requirements and team members can tangle up even code that was once good.

The root of the evil is that the C/C++ preprocessor operates at the text level, and the compiler has to parse a program already mangled by the preprocessor.

In contrast, macros in Rust are parsed and expanded by the compiler itself and work at the level of the program's syntax tree: an argument matched as expr is substituted as a single expression node, not as raw text. Therefore, the precedence and undefined-behavior problems described above cannot arise in principle.
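For comparison, here is a sketch of how the same macro might look in Rust. The names square! and square_front are invented for this example. One caveat, stated as my own remark rather than the article's: if the body were written as $a * $a, the argument would still be evaluated twice, so the sketch binds it to a local variable first.

```rust
use std::collections::VecDeque;

/// Rust counterpart of the C `SQUARE` macro.
macro_rules! square {
    ($a:expr) => {{
        // `$a` is one expression node, so `square!(1 + 2)` means (1 + 2) * (1 + 2).
        // Binding it to a local evaluates the argument exactly once.
        let value = $a;
        value * value
    }};
}

/// Removes the front element of the queue and returns its square (0 if empty).
fn square_front(list: &mut VecDeque<i32>) -> i32 {
    square!(list.pop_front().unwrap_or(0))
}
```

With a queue holding 3 and 4, square_front returns 9 and leaves exactly one element in the queue.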

Macros in Rust:

do not shadow variables
do not break the order in which expressions are parsed
do not give hidden side effects
do not lead to undefined behavior

Such macros are called hygienic. One consequence is that a macro cannot declare a variable that is visible outside of it.

But inside a macro you can declare variables that are guaranteed not to clash with variables higher up in the code. For example, the vec! macro above can be rewritten using an intermediate variable. For simplicity, consider only the main branch:

    macro_rules! vec {
        ( $($x:expr),* ) => {
            {
                // create a temporary vector
                let mut result = Vec::new();
                // push every $x into it
                $(result.push($x);)*
                // return result as the value of the block
                result
            }
        };
    }

So the code

    let vector = vec![1, 2, 3];

after macro substitution will be converted to

    let vector = {
        let mut result = Vec::new();
        result.push(1);
        result.push(2);
        result.push(3);
        result
    };
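Hygiene itself is easy to check with a small experiment. In the sketch below (add_hundred! and hygiene_demo are made-up names), both the macro and the calling code use a variable called x, yet the two never get mixed up.

```rust
/// Adds a constant to an expression, using a local variable named `x` internally.
macro_rules! add_hundred {
    ($e:expr) => {{
        let x = 100; // belongs to the macro's own syntax context
        $e + x
    }};
}

fn hygiene_demo() -> i32 {
    let x = 1;
    // The `x` written here is the caller's `x`, not the macro's.
    add_hundred!(x * 2)
}
```

The function returns 102: the caller's x (equal to 1) is doubled and the macro's own x (equal to 100) is then added. A text-level preprocessor would have confused the two names.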

Procedural macros

When the capabilities of ordinary macros are not enough, procedural ones go into battle.

As mentioned above, procedural macros are so called because, instead of a simple substitution, they can return a completely arbitrary set of tokens that is the result of executing a certain procedure, or rather, a function. This function is what we are going to study.

As a guinea pig, let's take the implementation of the automatically derived constructor #[derive(new)] from the corresponding library.

From the user's point of view, the usage will look like this:

    #[macro_use]
    extern crate derive_new;

    #[derive(new)]
    struct Bar {
        x: i32,
        y: String,
    }

    fn main() {
        let _ = Bar::new(42, "Hello".to_owned());
    }

That is, by specifying the attribute #[derive(new)] we asked the compiler to derive ... but what exactly? How does the compiler know which method we expect to get? Let's figure it out.

To get started, let's look at the source code of the library; fortunately, it is not that big:

Lots of code (75 lines)

    #![crate_type = "proc-macro"]

    extern crate proc_macro;
    extern crate syn;
    #[macro_use]
    extern crate quote;

    use proc_macro::TokenStream;

    #[proc_macro_derive(new)]
    pub fn derive(input: TokenStream) -> TokenStream {
        let input: String = input.to_string();

        let ast = syn::parse_macro_input(&input).expect("Couldn't parse item");

        let result = new_for_struct(ast);

        result.to_string().parse().expect("couldn't parse string to tokens")
    }

    fn new_for_struct(ast: syn::MacroInput) -> quote::Tokens {
        let name = &ast.ident;
        let (impl_generics, ty_generics, where_clause) = ast.generics.split_for_impl();
        let doc_comment = format!("Constructs a new `{}`.", name);

        match ast.body {
            syn::Body::Struct(syn::VariantData::Struct(ref fields)) => {
                let args = fields.iter().map(|f| {
                    let f_name = &f.ident;
                    let ty = &f.ty;
                    quote!(#f_name: #ty)
                });
                let inits = fields.iter().map(|f| {
                    let f_name = &f.ident;
                    quote!(#f_name: #f_name)
                });

                quote! {
                    impl #impl_generics #name #ty_generics #where_clause {
                        #[doc = #doc_comment]
                        pub fn new(#(args),*) -> Self {
                            #name { #(inits),* }
                        }
                    }
                }
            },
            syn::Body::Struct(syn::VariantData::Unit) => {
                quote! {
                    impl #impl_generics #name #ty_generics #where_clause {
                        #[doc = #doc_comment]
                        pub fn new() -> Self {
                            #name
                        }
                    }
                }
            },
            syn::Body::Struct(syn::VariantData::Tuple(ref fields)) => {
                let (args, inits): (Vec<_>, Vec<_>) = fields.iter().enumerate().map(|(i, f)| {
                    let f_name = syn::Ident::new(format!("value{}", i));
                    let ty = &f.ty;
                    (quote!(#f_name: #ty), f_name)
                }).unzip();

                quote! {
                    impl #impl_generics #name #ty_generics #where_clause {
                        #[doc = #doc_comment]
                        pub fn new(#(args),*) -> Self {
                            #name(#(inits),*)
                        }
                    }
                }
            },
            _ => panic!("#[derive(new)] can only be used with structs"),
        }
    }

Now let's take it apart piece by piece and try to understand what it does.

    #![crate_type = "proc-macro"]

    extern crate proc_macro;
    extern crate syn;
    #[macro_use]
    extern crate quote;

    use proc_macro::TokenStream;

The first line of the library declares the special crate type proc-macro, which says that this is not just any library but a plugin for the compiler. Then the necessary proc_macro and syn libraries are pulled in with all their tools. The first one defines the main types, the second one provides the means to parse Rust code into an abstract syntax tree (AST). In turn, the quote library provides the very important quote! macro, which we will see in action a little later.

Finally, the TokenStream type is imported, since it appears in the function signature.

Next comes the function itself, which acts as the entry point of the procedural macro:

    #[proc_macro_derive(new)]
    pub fn derive(input: TokenStream) -> TokenStream {
        let input: String = input.to_string();
        let ast = syn::parse_macro_input(&input).expect("Couldn't parse item");
        let result = new_for_struct(ast);
        result.to_string().parse().expect("couldn't parse string to tokens")
    }

Pay attention to the proc_macro_derive(new) attribute, which tells the compiler that this function is responsible for #[derive(new)].

As input, it receives from the compiler a set of tokens that make up the body of the macro (here, the definition of the annotated structure). As output, the compiler expects to get another set of tokens that is the result of the macro. Thus, the derive() function works as a kind of filter.

The body of the function is very simple. First we convert the input set of tokens to a string, and then parse the string into an abstract syntax tree. The most interesting part happens inside the call to new_for_struct(), which takes the AST as input and returns quoted tokens (more on that later). Finally, the resulting tokens are converted back to a string (do not ask me why), parsed into a TokenStream and handed to the compiler as the result of the macro.

To be honest, I do not understand either why the data has to be shuffled back and forth through strings and why a sane interface could not be made right away, but oh well. Perhaps the situation will change in the future.

Let's see what the new_for_struct() function does. But first, let's look at the structures for which we may need to generate constructors.

So, as input we can have:

    // regular struct
    #[derive(new)]
    struct Normal {
        x: i32,
        y: String,
    }

    // tuple struct
    #[derive(new)]
    struct Tuple(i32, i32, i32);

    // unit struct
    #[derive(new)]
    struct Empty;

It is clear that the syntax trees for the three variants will be different, and this has to be taken into account when generating the new() method. In fact, all that new_for_struct() does is look at the AST passed to it, determine which variant it is dealing with at the moment, and generate the required substitution. And if it is handed something it does not know how to handle, it panics.

    fn new_for_struct(ast: syn::MacroInput) -> quote::Tokens {
        let name = &ast.ident;
        let (impl_generics, ty_generics, where_clause) = ast.generics.split_for_impl();
        let doc_comment = format!("Constructs a new `{}`.", name);

        match ast.body {
            syn::Body::Struct(syn::VariantData::Struct(ref fields)) => { /* regular struct */ },
            syn::Body::Struct(syn::VariantData::Unit) => { /* unit struct */ },
            syn::Body::Struct(syn::VariantData::Tuple(ref fields)) => { /* tuple struct */ }
            _ => panic!("#[derive(new)] can only be used with structs"),
        }
    }

Let's look at the code that generates the substitution for a regular structure. This code is awkward to split into pieces, so I will put the comments directly into the text:

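    // Build the list of constructor arguments:
    // a `name: Type` pair for every field of the structure
    let args = fields.iter().map(|f| {
        let f_name = &f.ident;
        let ty = &f.ty;
        quote!(#f_name: #ty)
    });

    // Build the field initializers, `name: name`:
    let inits = fields.iter().map(|f| {
        let f_name = &f.ident;
        quote!(#f_name: #f_name)
    });

    // Finally, assemble the impl block with the new() method:
    quote! {
        impl #impl_generics #name #ty_generics #where_clause {
            #[doc = #doc_comment]
            pub fn new(#(args),*) -> Self {
                #name { #(inits),* }
            }
        }
    }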

The whole trick here is in the quote! macro, which lets you quote code snippets, producing the corresponding set of tokens in their place. Pay attention to the metavariables that start with a hash sign (#). They are taken from the lexical scope in which the quotation is located.

If it is still not clear how this works, take a look at the result of applying the procedural macro to the Normal structure described above.

The structure itself, once again:

    #[derive(new)]
    struct Normal {
        x: i32,
        y: String,
    }

The result of applying a procedural macro:

    impl Normal {
        /// Constructs a new `Normal`.
        pub fn new(x: i32, y: String) -> Self {
            Normal { x: x, y: y }
        }
    }

Suddenly, everything falls into place. It turns out that we have just generated, with our own hands, an impl block for the structure and added to it the associated constructor function new(), complete with documentation (!), with two parameters x and y of the corresponding types, and with an implementation that returns our structure, initializing its fields one by one with the values of the parameters.

Since Rust can tell from the context what x and y refer to before and after the colon, everything compiles successfully.

As an exercise, I suggest that you take apart the remaining two branches on your own.
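To check your answer: judging by the library source shown earlier, the other two branches should expand to roughly the following for the Tuple and Empty structures. This is a hand-written sketch, not actual compiler output, and the #[derive(Debug, PartialEq)] attributes are added here only so that the results can be compared.

```rust
#[derive(Debug, PartialEq)]
struct Tuple(i32, i32, i32);

#[derive(Debug, PartialEq)]
struct Empty;

// Tuple branch: the parameters get the generated names value0, value1, ...
impl Tuple {
    /// Constructs a new `Tuple`.
    pub fn new(value0: i32, value1: i32, value2: i32) -> Self {
        Tuple(value0, value1, value2)
    }
}

// Unit branch: there are no fields, so the constructor takes no arguments.
impl Empty {
    /// Constructs a new `Empty`.
    pub fn new() -> Self {
        Empty
    }
}
```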

Conclusion

The potential of procedural macros has only begun to be explored. The examples mentioned in the previous article are just the tip of the iceberg and the most straightforward use case. There are much more interesting projects, such as a garbage collector implemented entirely with the lexical means of the Rust language.

I hope that the article was useful to you. And if, after reading it, you also want to play around with the Rust language, I will consider my task fully completed :)

The material was prepared jointly with Daria Schetinina.

Source: https://habr.com/ru/post/321620/
