
Framework for procedural macros in Rust

From the translator


Procedural macros are one of the most anticipated features of Rust. At the moment, procedural macros can be written only for the unstable (nightly) compiler, although several crates, such as syntex, allow limited code generation on the stable compiler. However, this does not make things much easier, since the AST interface remains unstable, and although the syntex authors try to keep up with the nightly builds, failures sometimes occur because of changes to the structure of the AST.
In this blog post, Nick Cameron, a member of the core team, shares his vision of the future of procedural macros. Although the post is full of technical details about compiler internals, I thought the Habr community might be interested in a look behind the scenes of Rust development.

Procedural macro framework


In this post I will explain how I think procedural macros should look. I have already discussed the syntax in another post, and when we publish the API for procedural macros, I will write a post about that as well. I have already described a number of changes to the macro system, so here I will repeat some of that (and partly contradict the previous post), but I will go into more detail.

Types of macros


There are only two kinds of procedural macros: function macros and attribute macros. The former are functions marked with the #[macro] attribute; the latter are marked with #[macro_attribute]. Function macros are invoked as foo!(tokens), and attribute macros as #[foo] or #[foo(tokens)], attached to an AST node according to the usual rules for attributes. Inner attributes of the form #![...] are also supported, with the obvious semantics (they apply not to the following AST node but to the enclosing one - translator's note).
Function macros have the following signature:
 #[macro] pub fn foo(TokenStream, &mut MacroContext) -> TokenStream 

Attribute macros have the following signature:
 #[macro_attribute] pub fn foo(Option<TokenStream>, TokenStream, &mut MacroContext) -> TokenStream 

The first argument is an optional stream of tokens from the macro attribute itself (the tokens in #[foo(tokens)]). The second argument is the stream of tokens for the AST node to which the attribute is attached. The returned TokenStream replaces the original AST node and may contain zero or more AST nodes (that is, attribute macros replace both the Modifier and Decorator syntax extensions at once).
We guarantee that the second TokenStream parses as some valid AST node, while the first may or may not parse.
A procedural macro must ensure that the returned TokenStream parses in the context of the macro call.
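To make the two signatures more concrete, here is a minimal sketch in plain Rust. TokenStream and MacroContext below are hypothetical stand-ins (a vector of strings and a list of error messages), since the real libmacro types are not defined in this post. The #[macro] and #[macro_attribute] attributes appear only in comments because they are proposals, and the macro names double and rename are invented for illustration.

```rust
// Hypothetical stand-in: a token stream modelled as a list of token strings.
#[derive(Debug, Clone, PartialEq)]
pub struct TokenStream(pub Vec<String>);

// Hypothetical stand-in: the context only collects error messages here.
#[derive(Default)]
pub struct MacroContext {
    pub errors: Vec<String>,
}

// Shape of a function macro, used as `double!(tokens)`.
// Under the proposal this function would be marked with #[macro].
pub fn double(input: TokenStream, _cx: &mut MacroContext) -> TokenStream {
    let mut out = input.0.clone();
    out.extend(input.0);
    TokenStream(out)
}

// Shape of an attribute macro, used as `#[rename(new_name)]`.
// Under the proposal this function would be marked with #[macro_attribute].
pub fn rename(
    args: Option<TokenStream>,
    item: TokenStream,
    cx: &mut MacroContext,
) -> TokenStream {
    // The attribute's own tokens are optional; report an error if they are missing.
    let new_name = match args.and_then(|a| a.0.into_iter().next()) {
        Some(name) => name,
        None => {
            cx.errors.push("expected #[rename(new_name)]".to_string());
            return item;
        }
    };
    // Replace the token that follows `fn` with the new name.
    let mut tokens = item.0;
    if let Some(pos) = tokens.iter().position(|t| t.as_str() == "fn") {
        if let Some(name) = tokens.get_mut(pos + 1) {
            *name = new_name;
        }
    }
    TokenStream(tokens)
}
```

In both cases the returned stream replaces the input. The attribute macro receives the tokens of the annotated item as its second argument and falls back to returning the item unchanged when it cannot do its work.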

libmacro


Libmacro is a new library that will be added to the standard language distribution. It is intended to be used mainly by procedural macros. Its contents will follow the same stabilization rules as other library crates (that is, all features are introduced as unstable and then stabilized once they have proved useful). The libsyntax library remains, but it will be an implementation detail of the compiler: procedural macros should not use it, and it will not be marked as stable (that is, stable macros cannot use it).
The idea is that libmacro will provide a fairly low-level interface. We expect crates with higher-level libraries to appear in the ecosystem. In particular, libmacro will have no concept of an AST. We expect crates in the wider ecosystem to provide an AST, along with functionality for parsing tokens into an AST and for building ASTs.
Libmacro will contain the token structures (which may be re-exported from libsyntax) and MacroContext, which is passed to macros. Libmacro will include the following functionality:

Most of this functionality will be available as MacroContext methods.
I will describe this API in more detail in a future post. Here I will cover some aspects of tokens and MacroContext.

Tokens


Designing an efficient and ergonomic representation of tokens involves balancing many concerns. Here is a first sketch:
    mod tokens {
        use {Span, HygieneObject, InternedString};

        pub struct TokenStream(Vec<TokenTree>);

        impl TokenStream {
            // Methods for adding and removing tokens, etc.
        }

        pub struct TokenTree {
            pub kind: TokenKind,
            pub span: Span,
            pub hygiene: HygieneObject,
        }

        pub enum TokenKind {
            Delimited(Delimiter, Vec<TokenTree>),
            // String includes the commenting tokens.
            Comment(String, CommentKind),
            String(String, StringKind),
            Dollar,
            Semicolon,
            Eof,
            Word(InternedString),
            Punctuation(char),
        }

        pub enum Delimiter {
            // { }
            Brace,
            // ( )
            Parenthesis,
            // [ ]
            Bracket,
        }

        pub enum CommentKind {
            Regular,
            InnerDoc,
            OuterDoc,
        }

        pub enum StringKind {
            Regular,
            Raw(usize),
            Byte,
            RawByte(usize),
        }
    }
We could store hygiene information only for TokenKind::Word rather than for all tokens. We could also store it for ranges of tokens rather than for each token separately.
I'm not sure that we need to single out $ and ; as separate variants: the dollar sign denotes metavariables in macros, and the semicolon separates elements from each other, so it may be useful to distinguish them. Perhaps we should also distinguish ! and #, since they are used when invoking macros, although I can't think of a case where this would be useful.
It may be worth interning string literals. Perhaps we should not store the contents of comments, since they can be read through the spans (at the moment we do both).
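For readers unfamiliar with interning, here is a minimal sketch of the general technique that a type like InternedString relies on: each distinct string is stored once and referred to by a small integer handle. The Interner and Symbol names are hypothetical and are not part of the proposed libmacro API; the compiler's actual interner is not described in this post.

```rust
use std::collections::HashMap;

// A cheap, copyable handle to an interned string (hypothetical name).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Symbol(u32);

// Stores each distinct string once and hands out handles to it.
#[derive(Default)]
pub struct Interner {
    ids: HashMap<String, u32>,
    strings: Vec<String>,
}

impl Interner {
    // Returns the existing handle for `s`, or allocates a new one.
    pub fn intern(&mut self, s: &str) -> Symbol {
        if let Some(&id) = self.ids.get(s) {
            return Symbol(id);
        }
        let id = self.strings.len() as u32;
        self.strings.push(s.to_string());
        self.ids.insert(s.to_string(), id);
        Symbol(id)
    }

    // Looks up the text behind a handle produced by this interner.
    pub fn resolve(&self, sym: Symbol) -> &str {
        &self.strings[sym.0 as usize]
    }
}
```

With this scheme, comparing two words is an integer comparison, and a token such as Word only needs to carry the handle. The same trade-off would apply to string literals if they were interned too.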
I don't think we need interpolated non-terminals here.
We should also provide some helper functions. Note, however, that I expect us to provide stability guarantees for these data structures over time. The helper functions will be stable only in their signatures, not in the results they produce. They will accept either a TokenTree or a &[TokenTree]:

And maybe some functions for the convenience of building a tree of tokens.
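As an illustration of what such convenience functions might look like, here is a small sketch over a simplified version of the token types above. Spans and hygiene are omitted, String stands in for InternedString, and the helper names word, parens, and words are hypothetical; none of them come from the proposed API.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Delimiter {
    Brace,
    Parenthesis,
    Bracket,
}

// Simplified token tree: no span or hygiene information.
#[derive(Debug, Clone, PartialEq)]
pub enum TokenKind {
    Delimited(Delimiter, Vec<TokenKind>),
    Word(String),
    Punctuation(char),
    Semicolon,
}

// Builds a word token from a string slice.
pub fn word(s: &str) -> TokenKind {
    TokenKind::Word(s.to_string())
}

// Wraps a sequence of tokens in parentheses.
pub fn parens(inner: Vec<TokenKind>) -> TokenKind {
    TokenKind::Delimited(Delimiter::Parenthesis, inner)
}

// Collects every word in a slice of tokens, descending into delimited groups.
pub fn words(tokens: &[TokenKind]) -> Vec<String> {
    let mut out = Vec::new();
    for token in tokens {
        match token {
            TokenKind::Word(w) => out.push(w.clone()),
            TokenKind::Delimited(_, inner) => out.extend(words(inner)),
            _ => {}
        }
    }
    out
}
```

For example, the tokens of `foo(a, b);` can be built as a word, a parenthesized group, and a semicolon, and words then walks the nested group to collect foo, a, and b.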

MacroContext


MacroContext performs several roles:

MacroContext may be a struct, but I think most of its fields would be private. It is also possible that it will be a trait.

Contextual information

Access methods:


Return Tokens Properties



Other functionality

I will cover much of this in future posts about libmacro. The most important functionality is reporting errors, warnings, and so on. This includes the ability to attach notes and suggestions to the code and to provide span information based on the tokens available to the macro.
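The following sketch shows one way span-based diagnostics could be collected through a context object. Everything in it is an assumption made for illustration: the Span, Level, and Diagnostic types, the span_err and span_warn method names, and the rendering format are invented here, since the real MacroContext API is not specified in this post.

```rust
// Byte range into the source text (hypothetical representation).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub lo: usize,
    pub hi: usize,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Level {
    Error,
    Warning,
}

#[derive(Debug)]
pub struct Diagnostic {
    pub level: Level,
    pub message: String,
    pub span: Span,
}

// Hypothetical context that only accumulates diagnostics.
#[derive(Default)]
pub struct MacroContext {
    diagnostics: Vec<Diagnostic>,
}

impl MacroContext {
    pub fn span_err(&mut self, span: Span, message: &str) {
        self.push(Level::Error, span, message);
    }

    pub fn span_warn(&mut self, span: Span, message: &str) {
        self.push(Level::Warning, span, message);
    }

    fn push(&mut self, level: Level, span: Span, message: &str) {
        self.diagnostics.push(Diagnostic {
            level,
            message: message.to_string(),
            span,
        });
    }

    pub fn has_errors(&self) -> bool {
        self.diagnostics.iter().any(|d| d.level == Level::Error)
    }

    // Renders each diagnostic together with the source text its span covers.
    pub fn render(&self, src: &str) -> Vec<String> {
        self.diagnostics
            .iter()
            .map(|d| {
                let label = match d.level {
                    Level::Error => "error",
                    Level::Warning => "warning",
                };
                format!("{}: {} (at `{}`)", label, d.message, &src[d.span.lo..d.span.hi])
            })
            .collect()
    }
}
```

A macro would take the span from the token it is complaining about, so the message points at the user's code rather than at the macro's implementation.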

Rollout


Initially, we will support both the new procedural macros and the old syntax extensions. Both will be unstable. Defining an old-style syntax extension should produce a deprecation warning that suggests using the new procedural macros instead. We will stabilize procedural macros over time, starting with the attributes used to declare them, and then gradually stabilize libmacro part by part. Once enough of the functionality is stable (and we have ported the built-in syntax extensions to the new system), we will remove support for the old syntax extensions.

Alternatives


We currently support the IdentTT syntax extension, which is a function macro with an identifier between the macro name and the opening delimiter. I would like to drop this support. It can, however, be useful for emulating items (for example, my_struct! foo { ... }). Unfortunately, this form falls short, since it does not support modifiers (such as pub my_struct! foo ...), and some authors want other kinds of tokens after the macro name at the call site, not just identifiers. My suggestion is to remove this feature for now. It can be added back later in a backward-compatible way, either through a new attribute (#[macro_with_ident]) or by adding information to MacroContext.
MacroContext is somewhat heavyweight; perhaps it would be better to split it into several smaller traits or structs. However, this may make writing macros less ergonomic.

Source: https://habr.com/ru/post/274225/

