Framework for procedural macros in Rust

From the translator

Procedural macros are one of the most anticipated features of Rust. At the moment, procedural macros can only be written for the unstable version of the compiler, although there are several crates, such as syntex, that allow limited code generation with the stable compiler. However, this does not make things much easier: the interface to the AST remains unstable, and although the syntex authors try to keep up with nightly builds, breakages still happen from time to time because of changes in the structure of the AST.
In this blog post, core team member Nick Cameron shares his vision of the future of procedural macros. Although the post is full of technical details about the compiler's internals, I thought the Habr community might be interested in a look behind the scenes of Rust's development.

Procedural macro framework

In this post I will explain what, in my opinion, procedural macros should look like. I have already talked about the syntax in another post, and when we publish the API for procedural macros, I will write a post about that too. I have already described a number of changes to the macro system, so here I will repeat some things (partly contradicting the previous post), but go into more detail.

Types of macros

There are only two kinds of procedural macros: function-like macros and attribute macros. The former are functions marked with the #[macro] attribute, the latter are marked with #[macro_attribute]. Function-like macros are invoked as foo!(tokens), and attribute macros as #[foo] or #[foo(tokens)], attaching to an AST node according to the usual rules for attributes. Inner attributes of the form #![...] are also supported, with the obvious semantics (they apply not to the following AST node but to the enclosing one - translator's note).
Function-like macros have the following signature:

 #[macro] pub fn foo(TokenStream, &mut MacroContext) -> TokenStream

Attribute macros have the following signature:

 #[macro_attribute] pub fn foo(Option<TokenStream>, TokenStream, &mut MacroContext) -> TokenStream

The first argument is an optional stream of tokens from the macro attribute itself (the tokens in #[foo(tokens)]). The second argument, a TokenStream, is the stream of tokens from the AST node to which the macro attribute is attached. The returned TokenStream replaces the original AST node and may consist of zero or more AST nodes (that is, this replaces both the Modifier and Decorator syntax extensions).
We guarantee that the second TokenStream parses as some valid AST node, whereas the first may or may not parse.
A procedural macro must ensure that the returned TokenStream parses in the context of the macro call.
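
To make the two signatures concrete, here is a minimal sketch that models them with hypothetical stand-ins. Here `TokenStream` is just a vector of token strings and `MacroContext` is an empty struct, because neither libmacro nor the `#[macro]`/`#[macro_attribute]` attributes exist in this form. The macro names `answer` and `toggle` are invented for illustration. The attribute macro shows how a returned stream can contain zero nodes, which deletes the annotated item.

```rust
// Hypothetical stand-ins: these are NOT real libmacro types.

/// A token stream modeled as one string per token.
#[derive(Debug, Clone, PartialEq)]
pub struct TokenStream(pub Vec<String>);

impl TokenStream {
    /// Build a stream by splitting on whitespace (a crude stand-in for lexing).
    pub fn words(s: &str) -> TokenStream {
        TokenStream(s.split_whitespace().map(String::from).collect())
    }
}

/// Placeholder for the proposed MacroContext; this sketch needs no state.
pub struct MacroContext;

/// Function-like macro in the proposed shape: `answer!(...)` expands to `42`
/// regardless of its input.
pub fn answer(_input: TokenStream, _cx: &mut MacroContext) -> TokenStream {
    TokenStream::words("42")
}

/// Attribute macro in the proposed shape: `#[toggle(off)]` removes the item it
/// is attached to (zero output nodes); otherwise the item is returned unchanged.
pub fn toggle(args: Option<TokenStream>, item: TokenStream, _cx: &mut MacroContext) -> TokenStream {
    match args {
        Some(ref a) if a.0 == ["off"] => TokenStream(Vec::new()),
        _ => item,
    }
}
```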

libmacro

Libmacro is a new library that will be added to the standard distribution. It is intended to be used mainly by procedural macros. Its contents will follow the same stability rules as other library crates (that is, all features are introduced as unstable and then stabilized as their usefulness is proven). The libsyntax library remains, but it becomes an implementation detail of the compiler: procedural macros should not use it, and it will not be marked as stable (that is, stable macros cannot use it).
The idea is that libmacro will provide a fairly low-level interface. We expect crates with higher-level libraries to appear in the ecosystem. In particular, libmacro will have no concept of an AST. Crates in the wider ecosystem are expected to provide an AST, as well as functionality for parsing tokens into an AST and for building ASTs.
Libmacro will contain data structures for tokens (which may be re-exported from libsyntax) and the MacroContext that is passed to macros. Libmacro will include the following functionality:

parsing a string into tokens,
quasi-quoting (converting text and meta-variables to tokens),
pattern matching for tokens,
string interning,
generation of new identifiers with different hygiene settings,
manipulating information about the hygiene of tokens,
applying macros (including name resolution),
manipulating spans (a span records which part of the source code a piece of the AST came from - translator's note), in particular expansion traces and creating new spans, and obtaining source location information from spans,
checking the status of feature gates (attributes of the form #[feature(name)], which enable various, usually unstable, compiler features - translator's note) and setting #[feature(name)] flags for use during code generation,
marking attributes as used,
reporting errors, warnings, etc.,
parsing tokens into key-value pairs, as specified in the arguments of the attribute.
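
As an illustration of the last item, here is one way attribute arguments such as `name = "x", count = 3` might be split into key-value pairs. This is a sketch under simplifying assumptions: tokens are plain strings, and `parse_key_values` is a hypothetical helper, not a libmacro API.

```rust
/// Hypothetical helper: split attribute-argument tokens of the form
/// `key = value, key = value` into (key, value) pairs.
/// Returns None if any entry is malformed.
pub fn parse_key_values(tokens: &[&str]) -> Option<Vec<(String, String)>> {
    let mut pairs = Vec::new();
    // Entries are separated by `,`; each entry must be exactly `key = value`.
    for entry in tokens.split(|t| *t == ",") {
        match entry {
            [key, "=", value] => pairs.push((key.to_string(), value.to_string())),
            [] => {} // tolerate empty entries, e.g. from a trailing comma
            _ => return None,
        }
    }
    Some(pairs)
}
```

With real tokens, the value side would be a literal or path token rather than a string. The shape of the loop would stay the same.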

Most of this functionality will be available as methods on MacroContext.
I will describe this API in more detail in a future post. Here I will cover some aspects of tokens and MacroContext.

Tokens

Designing an efficient and ergonomic representation of tokens involves many trade-offs. Here is a first sketch:

 mod tokens {
     use {Span, HygieneObject, InternedString};

     pub struct TokenStream(Vec<TokenTree>);

     impl TokenStream {
         // Methods for adding and removing tokens, etc.
     }

     pub struct TokenTree {
         pub kind: TokenKind,
         pub span: Span,
         pub hygiene: HygieneObject,
     }

     pub enum TokenKind {
         Delimited(Delimiter, Vec<TokenTree>),
         // The String includes the comment tokens themselves.
         Comment(String, CommentKind),
         String(String, StringKind),
         Dollar,
         Semicolon,
         Eof,
         Word(InternedString),
         Punctuation(char),
     }

     pub enum Delimiter {
         // { }
         Brace,
         // ( )
         Parenthesis,
         // [ ]
         Bracket,
     }

     pub enum CommentKind {
         Regular,
         InnerDoc,
         OuterDoc,
     }

     pub enum StringKind {
         Regular,
         Raw(usize),
         Byte,
         RawByte(usize),
     }
 }
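
Because `Delimited` holds a nested `Vec<TokenTree>`, a token stream in this sketch is really a tree, and most helpers end up recursive. The self-contained example below uses a trimmed copy of the sketched types, with spans, hygiene and several token kinds dropped, so that it compiles on its own. The helpers `word`, `punct`, `group`, `count_words` and `render` are hypothetical illustrations, not part of the proposal.

```rust
// Trimmed copy of the sketched token types (spans, hygiene and several
// kinds omitted) so this example is self-contained.
pub enum Delimiter { Brace, Parenthesis, Bracket }

pub enum TokenKind {
    Delimited(Delimiter, Vec<TokenTree>),
    Word(String),
    Punctuation(char),
}

pub struct TokenTree { pub kind: TokenKind }

// Small hypothetical constructors to keep test data readable.
pub fn word(s: &str) -> TokenTree { TokenTree { kind: TokenKind::Word(s.to_string()) } }
pub fn punct(c: char) -> TokenTree { TokenTree { kind: TokenKind::Punctuation(c) } }
pub fn group(d: Delimiter, inner: Vec<TokenTree>) -> TokenTree {
    TokenTree { kind: TokenKind::Delimited(d, inner) }
}

/// Count `Word` tokens, descending into delimited groups.
pub fn count_words(trees: &[TokenTree]) -> usize {
    trees.iter().map(|t| match &t.kind {
        TokenKind::Delimited(_, inner) => count_words(inner),
        TokenKind::Word(_) => 1,
        TokenKind::Punctuation(_) => 0,
    }).sum()
}

/// Print trees back as text, separating tokens with single spaces.
pub fn render(trees: &[TokenTree]) -> String {
    let parts: Vec<String> = trees.iter().map(|t| match &t.kind {
        TokenKind::Delimited(d, inner) => {
            let (open, close) = match d {
                Delimiter::Brace => ('{', '}'),
                Delimiter::Parenthesis => ('(', ')'),
                Delimiter::Bracket => ('[', ']'),
            };
            format!("{}{}{}", open, render(inner), close)
        }
        TokenKind::Word(w) => w.clone(),
        TokenKind::Punctuation(c) => c.to_string(),
    }).collect();
    parts.join(" ")
}
```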

We could store hygiene information only for TokenKind::Word rather than for all tokens. We could also store it for ranges of tokens rather than for each token separately.
I'm not sure we need to single out $ and ;. The dollar sign marks metavariables in macros and the semicolon separates items, so it may be useful to distinguish them. Perhaps we should also single out ! and #, since they are used in macro invocations, although I can't think of where that would be useful.
It may be worth interning string literals. Perhaps we should not store the contents of comments, since they can be recovered through spans (currently we do both).
I don't think we need interpolated non-terminals here.
We should also provide some helper functions. Note, however, that although I expect us to give stability guarantees for these data structures over time, the functions will be stable only in their signatures, not in their results. They will take either a TokenTree or a &[TokenTree]:

is_keyword
is_reserved_word
is_special_ident
is_operator
is_ident
is_path
metavariables - extracts metavariables from a TokenStream; for example, for foo($x:ident, $y:expr) it returns [("x", 2, ident), ("y", 6, expr)] in the form of some data structure.

And perhaps some convenience functions for building token trees.
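
Here is a minimal sketch of two of the helpers listed above, `is_ident` and `metavariables`, written over a flat, hypothetical token model. Two choices here are assumptions. First, the keyword list is deliberately incomplete. Second, the index returned for each metavariable is the position of its `$` token in the flat stream, because the post does not say how its example indices are counted.

```rust
// Hypothetical flat token model for this sketch (no delimited groups).
#[derive(Debug, Clone, PartialEq)]
pub enum Tok {
    Dollar,
    Word(String),
    Punctuation(char),
}

// Deliberately incomplete keyword list, for illustration only.
const KEYWORDS: &[&str] = &["as", "fn", "impl", "let", "match", "mut", "pub", "struct", "use"];

pub fn is_keyword(t: &Tok) -> bool {
    matches!(t, Tok::Word(w) if KEYWORDS.contains(&w.as_str()))
}

pub fn is_ident(t: &Tok) -> bool {
    matches!(t, Tok::Word(_)) && !is_keyword(t)
}

/// Find `$name:fragment` sequences; each result is
/// (name, index of the `$` token, fragment specifier).
pub fn metavariables(tokens: &[Tok]) -> Vec<(String, usize, String)> {
    let mut found = Vec::new();
    for (i, window) in tokens.windows(4).enumerate() {
        if let [Tok::Dollar, Tok::Word(name), Tok::Punctuation(':'), Tok::Word(frag)] = window {
            found.push((name.clone(), i, frag.clone()));
        }
    }
    found
}
```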

MacroContext

MacroContext performs several roles:

contains information about the context of the macro declaration and the context of its application,
conveys information about how the results of the macro should be used,
provides access to libmacro functionality that requires maintaining some state.

MacroContext will probably be a struct, but I expect most of its fields to be private. It may also turn out to be a trait.

Contextual information

Access methods:

the span of the macro use (note: the TokenStream arguments of the macro also have their own spans),
the span of the macro definition,
the hygiene contexts of the macro use site and of the macro definition site (note that these are opaque objects; again, every token will also have its own hygiene context),
any unexpanded attributes at the macro use site,
the kind of AST node the macro is expected to produce,
feature flags enabled for the macro,
information on whether a macro is used in an unsafe block or not,
the delimiters used to invoke a function-like macro.
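
One way to read "a struct with mostly private fields" is sketched below: the contextual information lives in private fields and is exposed through accessor methods, so the compiler could change the representation without breaking macros. Everything here (`Span` as a byte range, `MacroContext::new`, the method names) is a hypothetical illustration covering only a few items from the list, not the planned API.

```rust
/// Hypothetical span: a byte range in the source file.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub lo: usize,
    pub hi: usize,
}

/// Delimiter used to invoke a function-like macro: `{}`, `()` or `[]`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Delimiter {
    Brace,
    Parenthesis,
    Bracket,
}

/// Hypothetical MacroContext: fields are private, so the representation
/// can change while the accessor methods stay stable.
pub struct MacroContext {
    call_site: Span,
    def_site: Span,
    in_unsafe: bool,
    delimiter: Option<Delimiter>,
}

impl MacroContext {
    /// In the proposal only the compiler would build this; public here so the sketch runs.
    pub fn new(call_site: Span, def_site: Span, in_unsafe: bool, delimiter: Option<Delimiter>) -> Self {
        MacroContext { call_site, def_site, in_unsafe, delimiter }
    }
    /// Span of the macro use.
    pub fn call_site(&self) -> Span { self.call_site }
    /// Span of the macro definition.
    pub fn def_site(&self) -> Span { self.def_site }
    /// Whether the macro is used inside an `unsafe` block.
    pub fn in_unsafe_block(&self) -> bool { self.in_unsafe }
    /// Delimiter of a function-like invocation; `None` for attribute macros.
    pub fn delimiter(&self) -> Option<Delimiter> { self.delimiter }
}
```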

Properties of the returned tokens

the set of feature flags for the generated code,
an indication of how hygiene should be applied to the generated code.
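
The two output properties could be recorded roughly as follows. This is a sketch under assumptions. The `Hygiene` variants, the default of definition-site hygiene, and the feature name `my_unstable_feature` are all hypothetical, since the post does not specify how hygiene modes would be expressed.

```rust
use std::collections::BTreeSet;

/// Hypothetical hygiene modes for generated tokens.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Hygiene {
    DefinitionSite,
    CallSite,
}

/// Hypothetical record of how the macro's output should be treated.
pub struct OutputProperties {
    features: BTreeSet<String>,
    hygiene: Hygiene,
}

impl OutputProperties {
    /// Starts with no extra features; the default hygiene mode is an assumption.
    pub fn new() -> Self {
        OutputProperties { features: BTreeSet::new(), hygiene: Hygiene::DefinitionSite }
    }
    /// Enable a feature flag for the generated code only (duplicates are ignored).
    pub fn enable_feature(&mut self, name: &str) {
        self.features.insert(name.to_string());
    }
    pub fn set_hygiene(&mut self, hygiene: Hygiene) {
        self.hygiene = hygiene;
    }
    /// Enabled features in sorted order.
    pub fn features(&self) -> Vec<&str> {
        self.features.iter().map(|s| s.as_str()).collect()
    }
    pub fn hygiene(&self) -> Hygiene {
        self.hygiene
    }
}
```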

Other functionality

I will cover much more of libmacro in future posts. The most important functionality includes reporting errors, warnings, and so on. This includes the ability to attach notes and suggestions to code, and to provide span information based on the tokens available to the macro.
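
As a rough illustration of span-based error reporting, the sketch below collects diagnostics with optional notes and a suggested replacement. Every name here (`Diagnostic`, `error`, `warning`, `note`, `suggest`, `has_errors`) is hypothetical, because the post defers the real API to a later article.

```rust
/// Hypothetical span: a byte range in the source file.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span { pub lo: usize, pub hi: usize }

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Level { Error, Warning }

#[derive(Debug, Clone, PartialEq)]
pub struct Diagnostic {
    pub level: Level,
    pub span: Span,
    pub message: String,
    pub notes: Vec<String>,
    /// Replacement text suggested for `span`, if any.
    pub suggestion: Option<String>,
}

impl Diagnostic {
    pub fn note(&mut self, text: &str) -> &mut Self {
        self.notes.push(text.to_string());
        self
    }
    pub fn suggest(&mut self, replacement: &str) -> &mut Self {
        self.suggestion = Some(replacement.to_string());
        self
    }
}

/// Hypothetical context that only collects diagnostics.
#[derive(Default)]
pub struct MacroContext { diagnostics: Vec<Diagnostic> }

impl MacroContext {
    pub fn error(&mut self, span: Span, message: &str) -> &mut Diagnostic {
        self.push(Level::Error, span, message)
    }
    pub fn warning(&mut self, span: Span, message: &str) -> &mut Diagnostic {
        self.push(Level::Warning, span, message)
    }
    fn push(&mut self, level: Level, span: Span, message: &str) -> &mut Diagnostic {
        self.diagnostics.push(Diagnostic {
            level, span, message: message.to_string(), notes: Vec::new(), suggestion: None,
        });
        self.diagnostics.last_mut().unwrap()
    }
    pub fn has_errors(&self) -> bool {
        self.diagnostics.iter().any(|d| d.level == Level::Error)
    }
    pub fn diagnostics(&self) -> &[Diagnostic] { &self.diagnostics }
}
```

Returning `&mut Diagnostic` lets a macro chain notes and suggestions onto the message it just reported.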

Rollout

Initially, we will support both new procedural macros and old syntax extensions, and both will be unstable. Defining an old-style syntax extension should produce a deprecation warning that suggests using procedural macros instead. Over time we will stabilize procedural macros, starting with the attributes used to declare them, and then gradually stabilize libmacro piece by piece. Once a sufficient part of the functionality is stable (and we have ported the compiler's internal syntax extensions to the new system), we should remove support for old syntax extensions.

Alternatives

We currently support the IdentTT syntax extension, which is a function-like macro with an identifier between the macro name and the opening delimiter. I would like to drop this support. However, it may be useful for emulating items (for example, my_struct! foo { ... }). Unfortunately, it is unsatisfactory for that purpose, since it does not support modifiers (like pub my_struct! foo ...), and some authors want tokens other than identifiers after the macro name. My suggestion is to remove this feature for now. It can be added back later in a backwards-compatible way, either with a new attribute (#[macro_with_ident]) or by adding information to MacroContext.
MacroContext is somewhat heavyweight; perhaps it would be better to split it into several smaller types or structs. However, that may make writing macros less ergonomic.
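
For comparison, an ordinary function-like macro can already approximate the my_struct! foo { ... } form by moving the name, and any pub modifier, inside the invocation. The sketch below uses today's `macro_rules!` with the `vis` fragment, not the proposed procedural API. The names `my_struct`, `Point` and `Empty` are illustrative, and making every field `pub` is an arbitrary choice for the example.

```rust
/// Accepts `my_struct!(pub Point { x: i32, y: i32 })`: the optional visibility
/// and the type name live inside the invocation, so no IdentTT-style form is needed.
macro_rules! my_struct {
    ($vis:vis $name:ident { $($field:ident : $ty:ty),* $(,)? }) => {
        #[derive(Debug, Default, PartialEq)]
        $vis struct $name { $(pub $field: $ty),* }
    };
}

// A public struct with two fields, and a private struct with none.
my_struct!(pub Point { x: i32, y: i32 });
my_struct!(Empty {});
```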

Source: https://habr.com/ru/post/274225/
