ST Macros
ST Macros
Daniel Keep
2015-10-14
2
Contents
1 Introduction 5
Thanks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
License . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
4 Patterns 59
Callbacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Incremental TT Munchers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Internal Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Push-Down Accumulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Repetition Replacement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Trailing Separators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4 CONTENTS
TT Bundling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Provisional . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5 Building Blocks 69
AST Coercion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Counting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Enum Parsing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
6 Annotated Examples 75
Ook! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
Introduction
1
Note: this is a work in progress.
This book is an attempt to distil the Rust community’s collective knowledge of Rust macros.
As such, both additions (in the form of pull requests) and requests (in the form of issues) are
welcome.
If you wish to contribute, see the GitHub repository1 .
Thanks
Thanks to the following for suggestions and corrections: IcyFoxy, Rym, TheMicroWorm, Yurume,
akavel, cmr, ogham, and snake_case.
License
This work is licensed under both the Creative Commons Attribution-ShareAlike 4.0 International
License2 and the MIT license3 .
1 https://github.com/DanielKeep/tlborm/
2 http://creativecommons.org/licenses/by-sa/4.0/
3 http://opensource.org/licenses/MIT
6 CHAPTER 1. INTRODUCTION
Macros, A Methodical Introduction
2
This chapter will introduce Rust’s Macro-By-Example system: macro_rules!. Rather than trying
to cover it based on practical examples, it will instead attempt to give you a complete and
thorough explanation of how the system works. As such, this is intended for people who just
want the system as a whole explained, rather than be guided through it.
There is also the Macros chapter of the Rust Book1 which is a more approachable, high-level ex-
planation, and the practical introduction2 chapter of this book, which is a guided implementation
of a single macro.
Syntax Extensions
Before talking about macros, it is worthwhile to discuss the general mechanism they are built on:
syntax extensions. To do that, we must discuss how Rust source is processed by the compiler,
and the general mechanisms on which user-defined macros are built.
Source Analysis
The first stage of compilation for a Rust program is tokenisation. This is where the source text
is transformed into a sequence of tokens (i.e. indivisible lexical units; the programming language
equivalent of “words”). Rust has various kinds of tokens, such as:
…among others. There are some things to note about the above: first, self is both an identifier
and a keyword. In almost all cases, self is a keyword, but it is possible for it to be treated as
an identifier, which will come up later (along with much cursing). Secondly, the list of keywords
includes some suspicious entries such as yield and macro that aren’t actually in the language,
but are parsed by the compiler—these are reserved for future use. Third, the list of symbols also
includes entries that aren’t used by the language. In the case of <-, it is vestigial: it was removed
from the grammar, but not from the lexer. As a final point, note that :: is a distinct token; it
is not simply two adjacent : tokens. The same is true of all multi-character symbol tokens in
Rust, as of Rust 1.2. 3
As a point of comparison, it is at this stage that some languages have their macro layer, though
Rust does not. For example, C/C++ macros are effectively processed at this point.4 This is why
the following code works:5
The next stage is parsing, where the stream of tokens is turned into an Abstract Syntax Tree
(AST). This involves building up the syntactic structure of the program in memory. For example,
the token sequence 1 + 2 is transformed into the equivalent of:
┌─────────┐ ┌─────────┐
│ BinOp │ ┌╴│ LitInt │
│ op: Add │ │ │ val: 1 │
│ lhs: ◌ │╶┘ └─────────┘
│ rhs: ◌ │╶┐ ┌─────────┐
└─────────┘ └╴│ LitInt │
│ val: 2 │
└─────────┘
The AST contains the structure of the entire program, though it is based on purely lexical
information. For example, although the compiler may know that a particular expression is
referring to a variable called “a”, at this stage, it has no way of knowing what “a” is, or even
where it comes from.
It is after the AST has been constructed that macros are processed. However, before we can
discuss that, we have to talk about token trees.
3 @ has a purpose, though most people seem to forget about it completely: it is used in patterns to bind a
non-terminal part of the pattern to a name. Even a member of the Rust core team, proof-reading this chapter,
who specifically brought up this section, didn’t remember that @ has a purpose. Poor, poor swirly.
4 In fact, the C preprocessor uses a different lexical structure to C itself, but the distinction is broadly irrelevant.
5 Whether it should work is an entirely different question.
SOURCE ANALYSIS 9
Token trees
Token trees are somewhere between tokens and the AST. Firstly, almost all tokens are also token
trees; more specifically, they are leaves. There is one other kind of thing that can be a token tree
leaf, but we will come back to that later.
The only basic tokens that are not leaves are the “grouping” tokens: (...), [...], and {...}.
These three are the interior nodes of token trees, and what give them their structure. To give a
concrete example, this sequence of tokens:
a + b + (c + d[0]) + e
Note that this has no relationship to the AST the expression would produce; instead of a single
root node, there are nine token trees at the root level. For reference, the AST would be:
┌─────────┐
│ BinOp │
│ op: Add │
┌╴│ lhs: ◌ │
┌─────────┐ │ │ rhs: ◌ │╶┐ ┌─────────┐
│ Var │╶┘ └─────────┘ └╴│ BinOp │
│ name: a │ │ op: Add │
└─────────┘ ┌╴│ lhs: ◌ │
┌─────────┐ │ │ rhs: ◌ │╶┐ ┌─────────┐
│ Var │╶┘ └─────────┘ └╴│ BinOp │
│ name: b │ │ op: Add │
└─────────┘ ┌╴│ lhs: ◌ │
┌─────────┐ │ │ rhs: ◌ │╶┐ ┌─────────┐
│ BinOp │╶┘ └─────────┘ └╴│ Var │
│ op: Add │ │ name: e │
┌╴│ lhs: ◌ │ └─────────┘
┌─────────┐ │ │ rhs: ◌ │╶┐ ┌─────────┐
│ Var │╶┘ └─────────┘ └╴│ Index │
│ name: c │ ┌╴│ arr: ◌ │
└─────────┘ ┌─────────┐ │ │ ind: ◌ │╶┐ ┌─────────┐
│ Var │╶┘ └─────────┘ └╴│ LitInt │
│ name: d │ │ val: 0 │
└─────────┘ └─────────┘
It is important to understand the distinction between the AST and token trees. When writing
macros, you have to deal with both as distinct things.
10 CHAPTER 2. MACROS, A METHODICAL INTRODUCTION
One other aspect of this to note: it is impossible to have an unpaired paren, bracket or brace;
nor is it possible to have incorrectly nested groups in a token tree.
The first two are “attributes”, and are shared between both language-specific constructs (such
as #[repr(C)] which is used to request a C-compatible ABI for user-defined types) and syntax
extensions (such as #[derive(Clone)]). There is currently no way to define a macro that uses
these forms.
The third is the one of interest to us: it is the form available for use with macros. Note that this
form is not limited to macros: it is a generic syntax extension form. For example, whilst format!
is a macro, format_args! (which is used to implement format!) is not.
The fourth is essentially a variation which is not available to macros. In fact, the only case where
this form is used at all is with macro_rules! which, again we will come back to.
Disregarding all but the third form ($name ! $arg), the question becomes: how does the Rust
parser know what $arg looks like for every possible syntax extension? The answer is that it
doesn’t have to. Instead, the argument of a syntax extension invocation is a single token tree.
More specifically, it is a single, non-leaf token tree; (...), [...], or {...}. With that knowledge,
it should become apparent how the parser can understand all of the following invocation forms:
bitflags! {
flags Color: u8 {
const RED = 0b0001,
const GREEN = 0b0010,
const BLUE = 0b0100,
const BRIGHT = 0b1000,
}
}
lazy_static! {
static ref FIB_100: u32 = {
fn fib(a: u32) -> u32 {
match a {
0 => 0,
1 => 1,
MACROS IN THE AST 11
fib(100)
};
}
fn main() {
let colors = vec![RED, GREEN, BLUE];
println!("Hello, World!");
}
Although the above invocations may look like they contain various kinds of Rust code, the parser
simply sees a collection of meaningless token trees. To make this clearer, we can replace all these
syntactic “black boxes” with �, leaving us with:
bitflags! ⬚
lazy_static! ⬚
fn main() {
let colors = vec! ⬚;
println! ⬚;
}
Just to reiterate: the parser does not assume anything about �; it remembers the tokens it
contains, but doesn’t try to understand them.
The important takeaways are:
• There are multiple kinds of syntax extension in Rust. We will only be talking about macros
defined by the macro_rules! construct.
• Just because you see something of the form $name! $arg, doesn’t mean it’s actually a macro;
it might be another kind of syntax extension.
• The input to every macro is a single non-leaf token tree.
• Macros (really, syntax extensions in general) are parsed as part of the abstract syntax tree.
Aside: due to the first point, some of what will be said below (including the next
paragraph) will apply to syntax extensions in general.6
The last point is the most important, as it has significant implications. Because macros are
parsed into the AST, they can only appear in positions where they are explicitly supported.
Specifically macros can appear in place of the following:
• Patterns
6 This is rather convenient as “macro” is much quicker and easier to type than “syntax extension”.
12 CHAPTER 2. MACROS, A METHODICAL INTRODUCTION
• Statements
• Expressions
• Items
• impl Items
• Identifiers
• Match arms
• Struct fields
• Types
There is absolutely, definitely no way to use macros in any position not on the first list.
Expansion
Expansion is a relatively simple affair. At some point after the construction of the AST, but
before the compiler begins constructing its semantic understanding of the program, it will expand
all macros.
This involves traversing the AST, locating macro invocations and replacing them with their
expansion. In the case of non-macro syntax extensions, how this happens is up to the particular
syntax extension. That said, syntax extensions go through exactly the same process that macros
do once their expansion is complete.
Once the compiler has run a syntax extension, it expects the result to be parseable as one of a
limited set of syntax elements, based on context. For example, if you invoke a macro at module
scope, the compiler will parse the result into an AST node that represents an item. If you invoke
a macro in expression position, the compiler will parse the result into an expression AST node.
In fact, it can turn a syntax extension result into any of the following:
• an expression,
• a pattern,
• zero or more items,
• zero or more impl items, or
• zero or more statements.
In other words, where you can invoke a macro determines what its result will be interpreted as.
The compiler will take this AST node and completely replace the macro’s invocation node with
the output node. This is a structural operation, not a textural one!
For example, consider the following:
┌─────────────┐
│ Let │
│ name: eight │ ┌─────────┐
│ init: ◌ │╶─╴│ BinOp │
└─────────────┘ │ op: Mul │
┌╴│ lhs: ◌ │
┌────────┐ │ │ rhs: ◌ │╶┐ ┌────────────┐
│ LitInt │╶┘ └─────────┘ └╴│ Macro │
│ val: 2 │ │ name: four │
└────────┘ │ body: () │
└────────────┘
From context, four!() must expand to an expression (the initialiser can only be an expression).
Thus, whatever the actual expansion is, it will be interpreted as a complete expression. In this
case, we will assume four! is defined such that it expands to the expression 1 + 3. As a result,
expanding this invocation will result in the AST changing to:
┌─────────────┐
│ Let │
│ name: eight │ ┌─────────┐
│ init: ◌ │╶─╴│ BinOp │
└─────────────┘ │ op: Mul │
┌╴│ lhs: ◌ │
┌────────┐ │ │ rhs: ◌ │╶┐ ┌─────────┐
│ LitInt │╶┘ └─────────┘ └╴│ BinOp │
│ val: 2 │ │ op: Add │
└────────┘ ┌╴│ lhs: ◌ │
┌────────┐ │ │ rhs: ◌ │╶┐ ┌────────┐
│ LitInt │╶┘ └─────────┘ └╴│ LitInt │
│ val: 1 │ │ val: 3 │
└────────┘ └────────┘
Note that we added parens despite them not being in the expansion. Remember that the compiler
always treats the expansion of a macro as a complete AST node, not as a mere sequence of tokens.
To put it another way, even if you don’t explicitly wrap a complex expression in parentheses,
there is no way for the compiler to “misinterpret” the result, or change the order of evaluation.
It is important to understand that macro expansions are treated as AST nodes, as this design
has two further implications:
• In addition to there being a limited number of invocation positions, macros can only expand
to the kind of AST node the parser expects at that position.
• As a consequence of the above, macros absolutely cannot expand to incomplete or syntac-
tically invalid constructs.
14 CHAPTER 2. MACROS, A METHODICAL INTRODUCTION
There is one further thing to note about expansion: what happens when a syntax extension
expands to something that contains another syntax extension invocation. For example, consider
an alternative definition of four!; what happens if it expands to 1 + three!()?
let x = four!();
Expands to:
let x = 1 + three!();
This is resolved by the compiler checking the result of expansions for additional macro invoca-
tions, and expanding them. Thus, a second expansion step turns the above into:
let x = 1 + 3;
The takeaway here is that expansion happens in “passes”; as many as is needed to completely
expand all invocations.
Well, not quite. In fact, the compiler imposes an upper limit on the number of such recursive
passes it is willing to run before giving up. This is known as the macro recursion limit and
defaults to 32. If the 32nd expansion contains a macro invocation, the compiler will abort with
an error indicating that the recursion limit was exceeded.
This limit can be raised using the #![recursion_limit="…"] attribute, though it must be done
crate-wide. Generally, it is recommended to try and keep macros below this limit wherever
possible.
macro_rules!
With all that in mind, we can introduce macro_rules! itself. As noted previously, macro_rules!
is itself a syntax extension, meaning it is technically not part of the Rust syntax. It uses the
following form:
macro_rules! $name {
$rule0 ;
$rule1 ;
// …
$ruleN ;
}
There must be at least one rule, and you can omit the semicolon after the last rule.
Each “rule” looks like so:
Actually, the parens and braces can be any kind of group, but parens around the pattern and
braces around the expansion are somewhat conventional.
If you are wondering, the macro_rules! invocation expands to… nothing. At least, nothing that
appears in the AST; rather, it manipulates compiler-internal structures to register the macro. As
such, you can technically use macro_rules! in any position where an empty expansion is valid.
MACRO_RULES! 15
Matching
When a macro is invoked, the macro_rules! interpreter goes through the rules one by one, in
lexical order. For each rule, it tries to match the contents of the input token tree against that
rule’s pattern. A pattern must match the entirety of the input to be considered a match.
If the input matches the pattern, the invocation is replaced by the expansion; otherwise, the next
rule is tried. If all rules fail to match, macro expansion fails with an error.
The simplest example is of an empty pattern:
macro_rules! four {
() => {1 + 3};
}
This matches if and only if the input is also empty (i.e. four!(), four![] or four!{}).
Note that the specific grouping tokens you use when you invoke the macro are not matched.
That is, you can invoke the above macro as four![] and it will still match. Only the contents of
the input token tree are considered.
Patterns can also contain literal token trees, which must be matched exactly. This is done
by simply writing the token trees normally. For example, to match the sequence 4 fn ['spang
"whammo"] @_@, you would use:
macro_rules! gibberish {
(4 fn ['spang "whammo"] @_@) => {...};
}
You can use any token tree that you can write.
Captures
Patterns can also contain captures. These allow input to be matched based on some general
grammar category, with the result captured to a variable which can then be substituted into the
output.
Captures are written as a dollar ($) followed by an identifier, a colon (:), and finally the kind of
capture, which must be one of the following:
macro_rules! one_expression {
($e:expr) => {...};
}
These captures leverage the Rust compiler’s parser, ensuring that they are always “correct”.
An expr capture will always capture a complete, valid expression for the version of Rust being
compiled.
You can mix literal token trees and captures, within limits (explained below).
A capture $name:kind can be substituted into the expansion by writing $name. For example:
macro_rules! times_five {
($e:expr) => {5 * $e};
}
Much like macro expansion, captures are substituted as complete AST nodes. This means that
no matter what sequence of tokens is captured by $e, it will be interpreted as a single, complete
expression.
You can also have multiple captures in a single pattern:
macro_rules! multiply_add {
($a:expr, $b:expr, $c:expr) => {$a * ($b + $c)};
}
Repetitions
Patterns can contain repetitions. These allow a sequence of tokens to be matched. These have
the general form $ ( ... ) sep rep.
Repetitions can contain any other valid pattern, including literal token trees, captures, and other
repetitions.
Repetitions use the same syntax in the expansion.
For example, below is a macro which formats each element as a string. It matches zero or more
comma-separated expressions and expands to an expression that constructs a vector.
MINUTIAE 17
macro_rules! vec_strs {
(
// Start a repetition:
$(
// Each repeat must contain an expression...
$element:expr
)
// ...separated by commas...
,
// ...zero or more times.
*
) => {
// Enclose the expansion in a block so that we can use
// multiple statements.
{
let mut v = Vec::new();
// Start a repetition:
$(
// Each repeat will contain the following statement, with
// $element replaced with the corresponding expression.
v.push(format!("{}", $element));
)*
v
}
};
}
Minutiae
This section goes through some of the finer details of the macro system. At a minimum, you
should try to be at least aware of these details and issues.
Once the parser begins consuming tokens for a capture, it cannot stop or backtrack. This means
that the second rule of the following macro cannot ever match, no matter what input is provided:
macro_rules! dead_rule {
($e:expr) => { ... };
($i:ident +) => { ... };
}
18 CHAPTER 2. MACROS, A METHODICAL INTRODUCTION
Consider what happens if this macro is invoked as dead_rule!(x+). The interpreter will start at
the first rule, and attempt to parse the input as an expression. The first token (x) is valid as an
expression. The second token is also valid in an expression, forming a binary addition node.
At this point, given that there is no right-hand side of the addition, you might expect the parser
to give up and try the next rule. Instead, the parser will panic and abort the entire compilation,
citing a syntax error.
As such, it is important in general that you write macro rules from most-specific to least-specific.
To defend against future syntax changes altering the interpretation of macro input, macro_rules!
restricts what can follow various captures. The complete list, as of Rust 1.3 is as follows:
• item: anything.
• block: anything.
• stmt: => , ;
• pat: => , = if in
• expr: => , ;
• ty: , => : = > ; as
• ident: anything.
• path: , => : = > ; as
• meta: anything.
• tt: anything.
macro_rules! capture_expr_then_stringify {
($e:expr) => {
stringify!($e)
};
}
fn main() {
println!("{:?}", stringify!(dummy(2 * (1 + (3)))));
println!("{:?}", capture_expr_then_stringify!(dummy(2 * (1 + (3)))));
}
Note that stringify! is a built-in syntax extension which simply takes all tokens it is given and
concatenates them into one big string.
The output when run is:
"dummy ( 2 * ( 1 + ( 3 ) ) )"
"dummy(2 * (1 + (3)))"
Note that despite having the same input, the output is different. This is because the first
invocation is stringifying a sequence of token trees, whereas the second is stringifying an AST
expression node.
CAPTURES AND EXPANSION REDUX 19
To visualise the difference another way, here is what the stringify! macro gets invoked with in
the first case:
«dummy» «( )»
╭───────┴───────╮
«2» «*» «( )»
╭───────┴───────╮
«1» «+» «( )»
╭─┴─╮
«3»
« »
│ ┌─────────────┐
└╴│ Call │
│ fn: dummy │ ┌─────────┐
│ args: ◌ │╶─╴│ BinOp │
└─────────────┘ │ op: Mul │
┌╴│ lhs: ◌ │
┌────────┐ │ │ rhs: ◌ │╶┐ ┌─────────┐
│ LitInt │╶┘ └─────────┘ └╴│ BinOp │
│ val: 2 │ │ op: Add │
└────────┘ ┌╴│ lhs: ◌ │
┌────────┐ │ │ rhs: ◌ │╶┐ ┌────────┐
│ LitInt │╶┘ └─────────┘ └╴│ LitInt │
│ val: 1 │ │ val: 3 │
└────────┘ └────────┘
As you can see, there is exactly one token tree, which contains the AST which was parsed from
the input to the capture_expr_then_stringify! invocation. Hence, what you see in the output
is not the stringified tokens, it’s the stringified AST node.
This has further implications. Consider the following:
macro_rules! capture_then_match_tokens {
($e:expr) => {match_tokens!($e)};
}
macro_rules! match_tokens {
($a:tt + $b:tt) => {"got an addition"};
(($i:ident)) => {"got an identifier"};
($($other:tt)*) => {"got something else"};
}
fn main() {
println!("{}\n{}\n{}\n",
match_tokens!((caravan)),
20 CHAPTER 2. MACROS, A METHODICAL INTRODUCTION
match_tokens!(3 + 6),
match_tokens!(5));
println!("{}\n{}\n{}",
capture_then_match_tokens!((caravan)),
capture_then_match_tokens!(3 + 6),
capture_then_match_tokens!(5));
}
got an identifier
got an addition
got something else
By parsing the input into an AST node, the substituted result becomes un-destructible; i.e. you
cannot examine the contents or match against it ever again.
Here is another example which can be particularly confusing:
macro_rules! capture_then_what_is {
(#[$m:meta]) => {what_is!(#[$m])};
}
macro_rules! what_is {
(#[no_mangle]) => {"no_mangle attribute"};
(#[inline]) => {"inline attribute"};
($($tts:tt)*) => {concat!("something else (", stringify!($($tts)*), ")")};
}
fn main() {
println!(
"{}\n{}\n{}\n{}",
what_is!(#[no_mangle]),
what_is!(#[inline]),
capture_then_what_is!(#[no_mangle]),
capture_then_what_is!(#[inline]),
);
}
no_mangle attribute
inline attribute
something else (# [ no_mangle ])
something else (# [ inline ])
HYGIENE 21
The only way to avoid this is to capture using the tt or ident kinds. Once you capture with
anything else, the only thing you can do with the result from then on is substitute it directly
into the output.
Hygiene
Macros in Rust are partially hygienic. Specifically, they are hygienic when it comes to most
identifiers, but not when it comes to generic type parameters or lifetimes.
Hygiene works by attaching an invisible “syntax context” value to all identifiers. When two iden-
tifiers are compared, both the identifiers’ textural names and syntax contexts must be identical
for the two to be considered equal.
To illustrate this, consider the following code:
We will use the background colour to denote the syntax context. Now, let’s expand the macro
invocation:
First, recall that macro_rules! invocations effectively disappear during expansion.
Second, if you attempt to compile this code, the compiler will respond with something along the
following lines:
Note that the background colour (i.e. syntax context) for the expanded macro changes as part
of expansion. Each macro expansion is given a new, unique syntax context for its contents. As
a result, there are two different as in the expanded code: one in the first syntax context, the
second in the other. In other words, a is not the same identifier as a, however similar they may
appear.
That said, tokens that were substituted into the expanded output retain their original syntax
context (by virtue of having been provided to the macro as opposed to being part of the macro
itself). Thus, the solution is to modify the macro as follows:
Which, upon expansion becomes:
The compiler will accept this code because there is only one a being used.
Non-Identifier Identifiers
There are two tokens which you are likely to run into eventually that look like identifiers, but
aren’t. Except when they are.
First is self. This is very definitely a keyword. However, it also happens to fit the definition of
an identifier. In regular Rust code, there’s no way for self to be interpreted as an identifier, but
it can happen with macros:
22 CHAPTER 2. MACROS, A METHODICAL INTRODUCTION
macro_rules! what_is {
(self) => {"the keyword `self`"};
($i:ident) => {concat!("the identifier `", stringify!($i), "`")};
}
macro_rules! call_with_ident {
($c:ident($i:ident)) => {$c!($i)};
}
fn main() {
println!("{}", what_is!(self));
println!("{}", call_with_ident!(what_is(self)));
}
But that makes no sense; call_with_ident! required an identifier, matched one, and substituted
it! So self is both a keyword and not a keyword at the same time. You might wonder how this
is in any way important. Take this example:
macro_rules! make_mutable {
($i:ident) => {let mut $i = $i;};
}
struct Dummy(i32);
impl Dummy {
fn double(self) -> Dummy {
make_mutable!(self);
self.0 *= 2;
self
}
}
#
# fn main() {
# println!("{:?}", Dummy(4).double().0);
# }
So the macro will happily match self as an identifier, allowing you to use it in cases where you
can’t actually use it. But, fine; it somehow remembers that self is a keyword even when it’s an
identifier, so you should be able to do this, right?
macro_rules! make_self_mutable {
($i:ident) => {let mut $i = self;};
}
struct Dummy(i32);
impl Dummy {
fn double(self) -> Dummy {
make_self_mutable!(mut_self);
mut_self.0 *= 2;
mut_self
}
}
#
# fn main() {
# println!("{:?}", Dummy(4).double().0);
# }
<anon>:2:33: 2:37 error: `self` is not available in a static method. Maybe a `self` ar
↳ gument is missing? [E0424]
<anon>:2 ($i:ident) => {let mut $i = self;};
^~~~
That doesn’t make any sense, either. It’s not in a static method. It’s almost like it’s complaining
that the self it’s trying to use isn’t the same self… as though the self keyword has hygiene,
like an… identifier.
macro_rules! double_method {
($body:expr) => {
fn double(mut self) -> Dummy {
$body
}
};
}
struct Dummy(i32);
impl Dummy {
double_method! {{
self.0 *= 2;
self
}}
24 CHAPTER 2. MACROS, A METHODICAL INTRODUCTION
}
#
# fn main() {
# println!("{:?}", Dummy(4).double().0);
# }
macro_rules! double_method {
($self_:ident, $body:expr) => {
fn double(mut $self_) -> Dummy {
$body
}
};
}
struct Dummy(i32);
impl Dummy {
double_method! {self, {
self.0 *= 2;
self
}}
}
At last, this works. So self is both a keyword and an identifier when it feels like it. Surely this
works for other, similar constructs, right?
macro_rules! double_method {
($self_:ident, $body:expr) => {
fn double($self_) -> Dummy {
$body
}
};
}
struct Dummy(i32);
impl Dummy {
double_method! {_, 0}
}
#
# fn main() {
# println!("{:?}", Dummy(4).double().0);
# }
No, of course not. _ is a keyword that is valid in patterns and expressions, but somehow isn’t an
identifier like the keyword self is, despite matching the definition of an identifier just the same.
You might think you can get around this by using $self_:pat instead; that way, _ will match!
Except, no, because self isn’t a pattern. Joy.
The only work around for this (in cases where you want to accept some combination of these
tokens) is to use a tt matcher instead.
Debugging
rustc provides a number of tools to debug macros. One of the most useful is trace_macros!, which
is a directive to the compiler instructing it to dump every macro invocation prior to expansion.
For example, given the following:
#![feature(trace_macros)]
macro_rules! each_tt {
() => {};
($_tt:tt $($rest:tt)*) => {each_tt!($($rest)*);};
}
This is particularly invaluable when debugging deeply recursive macros. You can also enable this
from the command-line by adding -Z trace-macros to the compiler command line.
Secondly, there is log_syntax! which causes the compiler to output all tokens passed to it. For
example, this makes the compiler sing a song:
#![feature(log_syntax)]
macro_rules! sing {
() => {};
($tt:tt $($rest:tt)*) => {log_syntax!($tt); sing!($($rest)*);};
}
26 CHAPTER 2. MACROS, A METHODICAL INTRODUCTION
sing! {
^ < @ < . @ *
'\x08' '{' '"' _ # ' '
- @ '$' && / _ %
! ( '\t' @ | = >
; '\x08' '\'' + '$' ? '\x7f'
, # '"' ~ | ) '\x07'
}
Sometimes, it is what the macro expands to that proves problematic. For this, the --pretty
argument to the compiler can be used. Given the following code:
fn main() {
let world = S!("World");
println!("Hello, {}!", world);
}
#![feature(no_std, prelude_import)]
#![no_std]
#[prelude_import]
use std::prelude::v1::*;
#[macro_use]
extern crate std as std;
// Shorthand for initialising a `String`.
fn main() {
let world = String::from("World");
::std::io::_print(::std::fmt::Arguments::new_v1(
{
static __STATIC_FMTSTR: &'static [&'static str]
= &["Hello, ", "!\n"];
__STATIC_FMTSTR
},
&match (&world,) {
(__arg0,) => [
SCOPING 27
::std::fmt::ArgumentV1::new(__arg0, ::std::fmt::Display::fmt)
],
}
));
}
Other options to --pretty can be listed using rustc -Z unstable-options --help -v; a full list
is not provided since, as implied by the name, any such list would be subject to change at any
time.
Scoping
The way in which macros are scoped can be somewhat unintuitive. Firstly, unlike everything
else in the languages, macros will remain visible in sub-modules.
Note: In these examples, remember that all of them have the same behaviour when
the module contents are in separate files.
Secondly, also unlike everything else in the language, macros are only accessible after their
definition. Also note that this example demonstrates how macros do not “leak” out of their
defining scope:
mod a {
// X!(); // undefined
}
mod b {
// X!(); // undefined
macro_rules! X { () => {}; }
X!(); // defined
}
mod c {
// X!(); // undefined
}
To be clear, this lexical order dependency applies even if you move the macro to an outer scope:
28 CHAPTER 2. MACROS, A METHODICAL INTRODUCTION
mod a {
// X!(); // undefined
}
macro_rules! X { () => {}; }
mod b {
X!(); // defined
}
mod c {
X!(); // defined
}
mod a {
// X!(); // undefined
}
macro_rules! X { () => { Y!(); }; }
mod b {
// X!(); // defined, but Y! is undefined
}
macro_rules! Y { () => {}; }
mod c {
X!(); // defined, and so is Y!
}
mod a {
// X!(); // undefined
}
#[macro_use]
mod b {
macro_rules! X { () => {}; }
X!(); // defined
}
mod c {
X!(); // defined
}
Note that this can interact in somewhat bizarre ways due to the fact that identifiers in a macro
(including other macros) are only resolved upon expansion:
mod a {
// X!(); // undefined
}
#[macro_use]
mod b {
macro_rules! X { () => { Y!(); }; }
SCOPING 29
Another complication is that #[macro_use] applied to an extern crate does not behave this way:
such declarations are effectively hoisted to the top of the module. Thus, assuming X! is defined
in an external crate called mac, the following holds:
mod a {
// X!(); // defined, but Y! is undefined
}
macro_rules! Y { () => {}; }
mod b {
X!(); // defined, and so is Y!
}
#[macro_use] extern crate macs;
mod c {
X!(); // defined, and so is Y!
}
# fn main() {}
Finally, note that these scoping behaviours apply to functions as well, with the exception of
#[macro_use] (which isn’t applicable):
macro_rules! X {
() => { Y!() };
}
fn a() {
macro_rules! Y { () => {"Hi!"} }
assert_eq!(X!(), "Hi!");
{
assert_eq!(X!(), "Hi!");
macro_rules! Y { () => {"Bye!"} }
assert_eq!(X!(), "Bye!");
}
assert_eq!(X!(), "Hi!");
}
fn b() {
macro_rules! Y { () => {"One more"} }
assert_eq!(X!(), "One more");
}
These scoping rules are why a common piece of advice is to place all macros which should be
accessible “crate wide” at the very top of your root module, before any other modules. This
30 CHAPTER 2. MACROS, A METHODICAL INTRODUCTION
Import/Export
There are two ways to expose a macro to a wider scope. The first is the #[macro_use] attribute.
This can be applied to either modules or external crates. For example:
#[macro_use]
mod macros {
macro_rules! X { () => { Y!(); } }
macro_rules! Y { () => {} }
}
X!();
Macros can be exported from the current crate using #[macro_export]. Note that this ignores
all visibility.
Given the following definition for a library package macs:
mod macros {
#[macro_export] macro_rules! X { () => { Y!(); } }
#[macro_export] macro_rules! Y { () => {} }
}
X!(); // X is defined
#[macro_use] extern crate macs;
X!();
#
# fn main() {}
Note that you can only #[macro_use] an external crate from the root module.
Finally, when importing macros from an external crate, you can control which macros you import.
You can use this to limit namespace pollution, or to override specific macros, like so:
macro_rules! Y { () => {} }
IMPORT/EXPORT 31
fn main() {}
When exporting macros, it is often useful to refer to non-macro symbols in the defining crate.
Because crates can be renamed, there is a special substitution variable available: $crate. This
will always expand to an absolute path prefix to the containing crate (e.g. :: macs).
Note that this does not work for macros, since macros do not interact with regular name resolution
in any way. That is, you cannot use something like $crate::Y! to refer to a particular macro
within your crate. The implication, combined with selective imports via #[macro_use] is that
there is currently no way to guarantee any given macro will be available when imported by
another crate.
It is recommended that you always use absolute paths to non-macro names, to avoid conflicts,
including names in the standard library.
32 CHAPTER 2. MACROS, A METHODICAL INTRODUCTION
Macros, A Practical Introduction
3
This chapter will introduce the Rust macro-by-example system using a relatively simple, practical
example. It does not attempt to explain all of the intricacies of the system; its goal is to get you
comfortable with how and why macros are written.
There is also the Macros chapter of the Rust Book1 which is another high-level explanation, and
the methodical introduction (chapter 2, page 7) chapter of this book, which explains the macro
system in detail.
A Little Context
Note: don’t panic! What follows is the only math will be talked about. You can
quite safely skip this section if you just want to get to the meat of the article.
If you aren’t familiar, a recurrence relation is a sequence where each value is defined in terms of
one or more previous values, with one or more initial values to get the whole thing started. For
example, the Fibonacci sequence2 can be defined by the relation:
Thus, the first two numbers in the sequence are 0 and 1, with the third being F0 + F1 = 0 + 1
= 1, the fourth F1 + F2 = 1 + 1 = 2, and so on forever.
Now, because such a sequence can go on forever, that makes defining a fibonacci function a little
tricky, since you obviously don’t want to try returning a complete vector. What you want is to
return something which will lazily compute elements of the sequence as needed.
In Rust, that means producing an Iterator. This is not especially hard, but there is a fair
amount of boilerplate involved: you need to define a custom type, work out what state needs to
be stored in it, then implement the Iterator trait for it.
1 http://doc.rust-lang.org/book/macros.html
2 https://en.wikipedia.org/wiki/Fibonacci_number
34 CHAPTER 3. MACROS, A PRACTICAL INTRODUCTION
However, recurrence relations are simple enough that almost all of these details can be abstracted
out with a little macro-based code generation.
So, with all that having been said, let’s get started.
Construction
Usually, when working on a new macro, the first thing I do is decide what the macro invocation
should look like. In this specific case, my first attempt looked like this:
From that, we can take a stab at how the macro should be defined, even if we aren’t sure of the
actual expansion. This is useful because if you can’t figure out how to parse the input syntax,
then maybe you need to change it.
macro_rules! recurrence {
( a[n] = $($inits:expr),+ , ... , $recur:expr ) => { /* ... */ };
}
Assuming you aren’t familiar with the syntax, allow me to elucidate. This is defining a macro,
using the macro_rules! system, called recurrence!. This macro has a single parsing rule. That
rule says the input to the macro must match:
Finally, the rule says that if the input matches this rule, then the macro invocation should be
replaced by the token sequence /* ... */.
It’s worth noting that inits, as implied by the name, actually contains all the expressions that
match in this position, not just the first or last. What’s more, it captures them as a sequence
as opposed to, say, irreversibly pasting them all together. Also note that you can do “zero or
more” with a repetition by using * instead of +. There is no support for “zero or one” or more
specific numbers of repetitions.
As an exercise, let’s take the proposed input and feed it through the rule, to see how it is
processed. The “Position” column will show which part of the syntax pattern needs to be
matched against next, denoted by a “�”. Note that in some cases, there might be more than one
possible “next” element to match against. “Input” will contain all of the tokens that have not
been consumed yet. inits and recur will contain the contents of those bindings.
35
<thead>
<tr>
<th>Position</th>
<th>Input</th>
<th><code>inits</code></th>
<th><code>recur</code></th>
</tr>
</thead>
<tbody class="small-code">
<tr>
<td><code>a[n] = $($inits:expr),+ , ... , $recur:expr</code>
<code>⌂</code></td>
<td><code>a[n] = 0, 1, ..., a[n-1] + a[n-2]</code></td>
<td></td>
<td></td>
</tr>
<tr>
<td><code>a[n] = $($inits:expr),+ , ... , $recur:expr</code>
<code> ⌂</code></td>
<td><code>[n] = 0, 1, ..., a[n-1] + a[n-2]</code></td>
<td></td>
<td></td>
</tr>
<tr>
<td><code>a[n] = $($inits:expr),+ , ... , $recur:expr</code>
<code> ⌂</code></td>
<td><code>n] = 0, 1, ..., a[n-1] + a[n-2]</code></td>
<td></td>
<td></td>
</tr>
<tr>
<td><code>a[n] = $($inits:expr),+ , ... , $recur:expr</code>
<code> ⌂</code></td>
<td><code>] = 0, 1, ..., a[n-1] + a[n-2]</code></td>
<td></td>
<td></td>
</tr>
<tr>
<td><code>a[n] = $($inits:expr),+ , ... , $recur:expr</code>
<code> ⌂</code></td>
<td><code>= 0, 1, ..., a[n-1] + a[n-2]</code></td>
<td></td>
<td></td>
</tr>
<tr>
<td><code>a[n] = $($inits:expr),+ , ... , $recur:expr</code>
<code> ⌂</code></td>
<td><code>0, 1, ..., a[n-1] + a[n-2]</code></td>
<td></td>
36 CHAPTER 3. MACROS, A PRACTICAL INTRODUCTION
<td></td>
</tr>
<tr>
<td><code>a[n] = $($inits:expr),+ , ... , $recur:expr</code>
<code> ⌂</code></td>
<td><code>0, 1, ..., a[n-1] + a[n-2]</code></td>
<td></td>
<td></td>
</tr>
<tr>
<td><code>a[n] = $($inits:expr),+ , ... , $recur:expr</code>
<code> ⌂ ⌂</code></td>
<td><code>, 1, ..., a[n-1] + a[n-2]</code></td>
<td><code>0</code></td>
<td></td>
</tr>
<tr>
<td colspan="4" style="font-size:.7em;">
Note: there are two � here, because the next input token might match either the comma separator
betweenem> elements in the repetition, or the comma after the repetition. The macro system
will keep track of both possibilities, until it is able to decide which one to follow.
</td>
</tr>
<tr>
<td><code>a[n] = $($inits:expr),+ , ... , $recur:expr</code>
<code> ⌂ ⌂</code></td>
<td><code>1, ..., a[n-1] + a[n-2]</code></td>
<td><code>0</code></td>
<td></td>
</tr>
<tr>
<td><code>a[n] = $($inits:expr),+ , ... , $recur:expr</code>
<code> ⌂ ⌂ <s>⌂</s></code></td>
<td><code>, ..., a[n-1] + a[n-2]</code></td>
<td><code>0</code>, <code>1</code></td>
<td></td>
</tr>
<tr>
<td colspan="4" style="font-size:.7em;">
Note: the third, crossed-out marker indicates that the macro system has, as a consequence of
the last token consumed, eliminated one of the previous possible branches.
</td>
</tr>
<tr>
37
Note: this particular step should make it clear that a binding like $recur:expr will consume an
entire expression, using the compiler’s knowledge of what constitutes a valid expression. As will
be noted later, you can do this for other language constructs, too.
</td>
</tr>
</tbody>
The key take-away from this is that the macro system will try to incrementally match the tokens
provided as input to the macro against the provided rules. We’ll come back to the “try” part.
Now, let’s begin writing the final, fully expanded form. For this expansion, I was looking for
something like:
let fib = {
struct Recurrence {
mem: [u64; 2],
pos: usize,
}
38 CHAPTER 3. MACROS, A PRACTICAL INTRODUCTION
This will be the actual iterator type. mem will be the memo buffer to hold the last few values so
the recurrence can be computed. pos is to keep track of the value of n.
Aside: I’ve chosen u64 as a “sufficiently large” type for the elements of this sequence.
Don’t worry about how this will work out for other sequences; we’ll come to it.
#[inline]
fn next(&mut self) -> Option<u64> {
if self.pos < 2 {
let next_val = self.mem[self.pos];
self.pos += 1;
Some(next_val)
We need a branch to yield the initial values of the sequence; nothing tricky.
} else {
let a = /* something */;
let n = self.pos;
let next_val = (a[n-1] + a[n-2]);
self.mem.TODO_shuffle_down_and_append(next_val);
self.pos += 1;
Some(next_val)
}
}
}
This is a bit harder; we’ll come back and look at how exactly to define a. Also, TODO_shuffle_-
down_and_append is another placeholder; I want something that places next_val on the end of the
array, shuffling the rest down by one space, dropping the 0th element.
Lastly, return an instance of our new structure, which can then be iterated over. To summarise,
the complete expansion is:
let fib = {
struct Recurrence {
mem: [u64; 2],
39
pos: usize,
}
#[inline]
fn next(&mut self) -> Option<u64> {
if self.pos < 2 {
let next_val = self.mem[self.pos];
self.pos += 1;
Some(next_val)
} else {
let a = /* something */;
let n = self.pos;
let next_val = (a[n-1] + a[n-2]);
self.mem.TODO_shuffle_down_and_append(next_val.clone());
self.pos += 1;
Some(next_val)
}
}
}
Aside: Yes, this does mean we’re defining a different Recurrence struct and its im-
plementation for each macro invocation. Most of this will optimise away in the final
binary, with some judicious use of #[inline] attributes.
It’s also useful to check your expansion as you’re writing it. If you see anything in the expansion
that needs to vary with the invocation, but isn’t in the actual macro syntax, you should work
out where to introduce it. In this case, we’ve added u64, but that’s not neccesarily what the user
wants, nor is it in the macro syntax. So let’s fix that.
macro_rules! recurrence {
( a[n]: $sty:ty = $($inits:expr),+ , ... , $recur:expr ) => { /* ... */ };
}
/*
let fib = recurrence![a[n]: u64 = 0, 1, ..., a[n-1] + a[n-2]];
Aside: if you’re wondering, the bit after the colon in a capture can be one of several
kinds of syntax matchers. The most common ones are item, expr, and ty. A com-
plete explanation can be found in Macros, A Methodical Introduction; macro_rules!
(Captures) (section 2, page 15).
There’s one other thing to be aware of: in the interests of future-proofing the language,
the compiler restricts what tokens you’re allowed to put after a matcher, depending
on what kind it is. Typically, this comes up when trying to match expressions or
statements; those can only be followed by one of =>, ,, and ;.
A complete list can be found in Macros, A Methodical Introduction; Minutiae; Cap-
tures and Expansion Redux (section 2, page 17).
I will skim a bit over this part, since it’s effectively tangential to the macro stuff. We want to
make it so that the user can access previous values in the sequence by indexing a; we want it to
act as a sliding window keeping the last few (in this case, 2) elements of the sequence.
We can do this pretty easily with a wrapper type:
struct IndexOffset<'a> {
slice: &'a [u64; 2],
offset: usize,
}
#[inline(always)]
fn index<'b>(&'b self, index: usize) -> &'b u64 {
use std::num::Wrapping;
Aside: since lifetimes come up a lot with people new to Rust, a quick explanation:
'a and 'b are lifetime parameters that are used to track where a reference (i.e. a
borrowed pointer to some data) is valid. In this case, IndexOffset borrows a reference
to our iterator’s data, so it needs to keep track of how long it’s allowed to hold that
reference for, using 'a.
41
'b is used because the Index::index function (which is how subscript syntax is actually
implemented) is also parameterised on a lifetime, on account of returning a borrowed
reference. 'a and 'b are not necessarily the same thing in all cases. The borrow
checker will make sure that even though we don’t explicitly relate 'a and 'b to one
another, we don’t accidentally violate memory safety.
{
use std::mem::swap;
This swaps the new value into the end of the array, swapping the other elements down one space.
Aside: doing it this way means that this code will work for non-copyable types, as
well.
macro_rules! recurrence {
( a[n]: $sty:ty = $($inits:expr),+ , ... , $recur:expr ) => { /* ... */ };
}
fn main() {
/*
let fib = recurrence![a[n]: u64 = 0, 1, ..., a[n-1] + a[n-2]];
struct Recurrence {
mem: [u64; 2],
pos: usize,
}
42 CHAPTER 3. MACROS, A PRACTICAL INTRODUCTION
struct IndexOffset<'a> {
slice: &'a [u64; 2],
offset: usize,
}
#[inline(always)]
fn index<'b>(&'b self, index: usize) -> &'b u64 {
use std::num::Wrapping;
#[inline]
fn next(&mut self) -> Option<u64> {
if self.pos < 2 {
let next_val = self.mem[self.pos];
self.pos += 1;
Some(next_val)
} else {
let next_val = {
let n = self.pos;
let a = IndexOffset { slice: &self.mem, offset: n };
(a[n-1] + a[n-2])
};
{
use std::mem::swap;
self.pos += 1;
Some(next_val)
43
}
}
}
Note that I’ve changed the order of the declarations of n and a, as well as wrapped them (along
with the recurrence expression) in a block. The reason for the first should be obvious (n needs
to be defined first so I can use it for a). The reason for the second is that the borrowed reference
&self.mem will prevent the swaps later on from happening (you cannot mutate something that is
aliased elsewhere). The block ensures that the &self.mem borrow expires before then.
Incidentally, the only reason the code that does the mem swaps is in a block is to narrow the scope
in which std::mem::swap is available, for the sake of being tidy.
If we take this code and run it, we get:
0
1
2
3
5
8
13
21
34
Success! Now, let’s copy & paste this into the macro expansion, and replace the expanded code
with an invocation. This gives us:
macro_rules! recurrence {
( a[n]: $sty:ty = $($inits:expr),+ , ... , $recur:expr ) => {
{
/*
What follows here is *literally* the code from before,
cut and pasted into a new position. No other changes
have been made.
*/
use std::ops::Index;
struct Recurrence {
mem: [u64; 2],
pos: usize,
}
44 CHAPTER 3. MACROS, A PRACTICAL INTRODUCTION
struct IndexOffset<'a> {
slice: &'a [u64; 2],
offset: usize,
}
#[inline(always)]
fn index<'b>(&'b self, index: usize) -> &'b u64 {
use std::num::Wrapping;
#[inline]
fn next(&mut self) -> Option<u64> {
if self.pos < 2 {
let next_val = self.mem[self.pos];
self.pos += 1;
Some(next_val)
} else {
let next_val = {
let n = self.pos;
let a = IndexOffset { slice: &self.mem, offset: n };
(a[n-1] + a[n-2])
};
{
use std::mem::swap;
self.pos += 1;
Some(next_val)
45
}
}
}
fn main() {
let fib = recurrence![a[n]: u64 = 0, 1, ..., a[n-1] + a[n-2]];
Obviously, we aren’t using the captures yet, but we can change that fairly easily. However, if we
try to compile this, rustc aborts, telling us:
Here, we’ve run into a limitation of macro_rules. The problem is that second comma. When it
sees it during expansion, macro_rules can’t decide if it’s supposed to parse another expression
for inits, or .... Sadly, it isn’t quite clever enough to realise that ... isn’t a valid expression,
so it gives up. Theoretically, this should work as desired, but currently doesn’t.
Aside: I did fib a little about how our rule would be interpreted by the macro
system. In general, it should work as described, but doesn’t in this case. The macro_-
rules machinery, as it stands, has its foibles, and its worthwhile remembering that
on occasion, you’ll need to contort a little to get it to work.
In this particular case, there are two issues. First, the macro system doesn’t know
what does and does not constitute the various grammar elements (e.g. an expres-
sion); that’s the parser’s job. As such, it doesn’t know that ... isn’t an expression.
Secondly, it has no way of trying to capture a compound grammar element (like an
expression) without 100% committing to that capture.
In other words, it can ask the parser to try and parse some input as an expression,
but the parser will respond to any problems by aborting. The only way the macro
system can currently deal with this is to just try to forbid situations where this could
be a problem.
On the bright side, this is a state of affairs that exactly no one is enthusiastic about.
The macro keyword has already been reserved for a more rigorously-defined future
macro system. Until then, needs must.
Thankfully, the fix is relatively simple: we remove the comma from the syntax. To keep things
balanced, we’ll remove both commas around ...:
46 CHAPTER 3. MACROS, A PRACTICAL INTRODUCTION
macro_rules! recurrence {
( a[n]: $sty:ty = $($inits:expr),+ ... $recur:expr ) => {
// ^~~ changed
/* ... */
};
}
fn main() {
let fib = recurrence![a[n]: u64 = 0, 1 ... a[n-1] + a[n-2]];
// ^~~ changed
Success! We can now start replacing things in the expansion with things we’ve captured.
Substitution
Substituting something you’ve captured in a macro is quite simple; you can insert the contents
of a capture $sty:ty by using $sty. So, let’s go through and fix the u64s:
macro_rules! recurrence {
( a[n]: $sty:ty = $($inits:expr),+ ... $recur:expr ) => {
{
use std::ops::Index;
struct Recurrence {
mem: [$sty; 2],
// ^~~~ changed
pos: usize,
}
struct IndexOffset<'a> {
slice: &'a [$sty; 2],
// ^~~~ changed
offset: usize,
}
#[inline(always)]
fn index<'b>(&'b self, index: usize) -> &'b $sty {
// ^~~~ changed
use std::num::Wrapping;
#[inline]
fn next(&mut self) -> Option<$sty> {
// ^~~~ changed
/* ... */
}
}
fn main() {
let fib = recurrence![a[n]: u64 = 0, 1 ... a[n-1] + a[n-2]];
Let’s tackle a harder one: how to turn inits into both the array literal [0, 1] and the array
type, [$sty; 2]. The first one we can do like so:
This effectively does the opposite of the capture: repeat inits one or more times, separating
each with a comma. This expands to the expected sequence of tokens: 0, 1.
Somehow turning inits into a literal 2 is a little trickier. It turns out that there’s no direct way
to do this, but we can do it by using a second macro. Let’s take this one step at a time.
macro_rules! count_exprs {
/* ??? */
}
The obvious case is: given zero expressions, you would expect count_exprs to expand to a literal
0.
48 CHAPTER 3. MACROS, A PRACTICAL INTRODUCTION
macro_rules! count_exprs {
() => (0);
// ^~~~~~~~~~ added
}
Aside: You may have noticed I used parentheses here instead of curly braces for the
expansion. macro_rules really doesn’t care what you use, so long as it’s one of the
“matcher” pairs: ( ), { } or [ ]. In fact, you can switch out the matchers on the
macro itself (i.e. the matchers right after the macro name), the matchers around the
syntax rule, and the matchers around the corresponding expansion.
You can also switch out the matchers used when you invoke a macro, but in a more
limited fashion: a macro invoked as { ... } or ( ... ); will always be parsed as an
item (i.e. like a struct or fn declaration). This is important when using macros in a
function body; it helps disambiguate between “parse like an expression” and “parse
like a statement”.
macro_rules! count_exprs {
() => (0);
($e:expr) => (1);
// ^~~~~~~~~~~~~~~~~ added
}
Two?
macro_rules! count_exprs {
() => (0);
($e:expr) => (1);
($e0:expr, $e1:expr) => (2);
// ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ added
}
We can “simplify” this a little by re-expressing the case of two expressions recursively.
macro_rules! count_exprs {
() => (0);
($e:expr) => (1);
($e0:expr, $e1:expr) => (1 + count_exprs!($e1));
// ^~~~~~~~~~~~~~~~~~~~~ changed
}
This is fine since Rust can fold 1 + 1 into a constant value. What if we have three expressions?
macro_rules! count_exprs {
() => (0);
49
Aside: You might be wondering if we could reverse the order of these rules. In
this particular case, yes, but the macro system can sometimes be picky about what
it is and is not willing to recover from. If you ever find yourself with a multi-rule
macro that you swear should work, but gives you errors about unexpected tokens,
try changing the order of the rules.
Hopefully, you can see the pattern here. We can always reduce the list of expressions by matching
one expression, followed by zero or more expressions, expanding that into 1 + a count.
macro_rules! count_exprs {
() => (0);
($head:expr) => (1);
($head:expr, $($tail:expr),*) => (1 + count_exprs!($($tail),*));
// ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ changed
}
With this, we can now modify recurrence to determine the necessary size of mem.
// added:
macro_rules! count_exprs {
() => (0);
($head:expr) => (1);
($head:expr, $($tail:expr),*) => (1 + count_exprs!($($tail),*));
}
macro_rules! recurrence {
( a[n]: $sty:ty = $($inits:expr),+ ... $recur:expr ) => {
{
use std::ops::Index;
struct Recurrence {
mem: [$sty; MEM_SIZE],
// ^~~~~~~~ changed
pos: usize,
}
struct IndexOffset<'a> {
slice: &'a [$sty; MEM_SIZE],
// ^~~~~~~~ changed
50 CHAPTER 3. MACROS, A PRACTICAL INTRODUCTION
offset: usize,
}
#[inline(always)]
fn index<'b>(&'b self, index: usize) -> &'b $sty {
use std::num::Wrapping;
#[inline]
fn next(&mut self) -> Option<$sty> {
if self.pos < MEM_SIZE {
// ^~~~~~~~ changed
let next_val = self.mem[self.pos];
self.pos += 1;
Some(next_val)
} else {
let next_val = {
let n = self.pos;
let a = IndexOffset { slice: &self.mem, offset: n };
(a[n-1] + a[n-2])
};
{
use std::mem::swap;
self.pos += 1;
51
Some(next_val)
}
}
}
With that done, we can now substitute the last thing: the recur expression.
# macro_rules! count_exprs {
# () => (0);
# ($head:expr $(, $tail:expr)*) => (1 + count_exprs!($($tail),*));
# }
# macro_rules! recurrence {
# ( a[n]: $sty:ty = $($inits:expr),+ ... $recur:expr ) => {
# {
# const MEMORY: uint = count_exprs!($($inits),+);
# struct Recurrence {
# mem: [$sty; MEMORY],
# pos: uint,
# }
# struct IndexOffset<'a> {
# slice: &'a [$sty; MEMORY],
# offset: uint,
# }
# impl<'a> Index<uint, $sty> for IndexOffset<'a> {
# #[inline(always)]
# fn index<'b>(&'b self, index: &uint) -> &'b $sty {
# let real_index = *index - self.offset + MEMORY;
# &self.slice[real_index]
# }
# }
# impl Iterator<u64> for Recurrence {
/* ... */
#[inline]
fn next(&mut self) -> Option<u64> {
if self.pos < MEMORY {
let next_val = self.mem[self.pos];
self.pos += 1;
Some(next_val)
} else {
let next_val = {
let n = self.pos;
let a = IndexOffset { slice: &self.mem, offset: n };
52 CHAPTER 3. MACROS, A PRACTICAL INTRODUCTION
$recur
// ^~~~~~ changed
};
{
use std::mem::swap;
let mut swap_tmp = next_val;
for i in range(0, MEMORY).rev() {
swap(&mut swap_tmp, &mut self.mem[i]);
}
}
self.pos += 1;
Some(next_val)
}
}
/* ... */
# }
# Recurrence { mem: [$($inits),+], pos: 0 }
# }
# };
# }
# fn main() {
# let fib = recurrence![a[n]: u64 = 1, 1 ... a[n-1] + a[n-2]];
# for e in fib.take(10) { println!("{}", e) }
# }
… wait, what? That can’t be right… let’s check what the macro is expanding to.
53
The --pretty expanded argument tells rustc to perform macro expansion, then turn the resulting
AST back into source code. Because this option isn’t considered stable yet, we also need -Z
unstable-options. The output (after cleaning up some formatting) is shown below; in particular,
note the place in the code where $recur was substituted:
#![feature(no_std)]
#![no_std]
#[prelude_import]
use std::prelude::v1::*;
#[macro_use]
extern crate std as std;
fn main() {
let fib = {
use std::ops::Index;
const MEM_SIZE: usize = 1 + 1;
struct Recurrence {
mem: [u64; MEM_SIZE],
pos: usize,
}
struct IndexOffset<'a> {
slice: &'a [u64; MEM_SIZE],
offset: usize,
}
impl <'a> Index<usize> for IndexOffset<'a> {
type Output = u64;
#[inline(always)]
fn index<'b>(&'b self, index: usize) -> &'b u64 {
use std::num::Wrapping;
let index = Wrapping(index);
let offset = Wrapping(self.offset);
let window = Wrapping(MEM_SIZE);
let real_index = index - offset + window;
&self.slice[real_index.0]
}
}
impl Iterator for Recurrence {
type Item = u64;
#[inline]
fn next(&mut self) -> Option<u64> {
if self.pos < MEM_SIZE {
let next_val = self.mem[self.pos];
self.pos += 1;
Some(next_val)
} else {
let next_val = {
let n = self.pos;
let a = IndexOffset{slice: &self.mem, offset: n,};
54 CHAPTER 3. MACROS, A PRACTICAL INTRODUCTION
a[n - 1] + a[n - 2]
};
{
use std::mem::swap;
let mut swap_tmp = next_val;
{
let result =
match ::std::iter::IntoIterator::into_iter((0..MEM_SIZ
↳ E).rev()) {
mut iter => loop {
match ::std::iter::Iterator::next(&mut iter) {
::std::option::Option::Some(i) => {
swap(&mut swap_tmp, &mut self.mem[i]);
}
::std::option::Option::None => break,
}
},
};
result
}
}
self.pos += 1;
Some(next_val)
}
}
}
Recurrence{mem: [0, 1], pos: 0,}
};
{
let result =
match ::std::iter::IntoIterator::into_iter(fib.take(10)) {
mut iter => loop {
match ::std::iter::Iterator::next(&mut iter) {
::std::option::Option::Some(e) => {
::std::io::_print(::std::fmt::Arguments::new_v1(
{
static __STATIC_FMTSTR: &'static [&'static str] =
↳ &["", "\n"];
__STATIC_FMTSTR
},
&match (&e,) {
(__arg0,) => [::std::fmt::ArgumentV1::new(__arg0,
↳ ::std::fmt::Display::fmt)],
}
))
}
::std::option::Option::None => break,
}
},
55
};
result
}
}
But that looks fine! If we add a few missing #![feature(...)] attributes and feed it to a nightly
build of rustc, it even compiles! … what?!
Aside: You can’t compile the above with a non-nightly build of rustc. This is
because the expansion of the println! macro depends on internal compiler details
which are not publicly stabilised.
Being Hygienic
The issue here is that identifiers in Rust macros are hygienic. That is, identifiers from two
different contexts cannot collide. To show the difference, let’s take a simpler example.
macro_rules! using_a {
($e:expr) => {
{
let a = 42i;
$e
}
}
}
This macro simply takes an expression, then wraps it in a block with a variable a defined. We
then use this as a round-about way of computing 4. There are actually two syntax contexts
involved in this example, but they’re invisible. So, to help with this, let’s give each context a
different colour. Let’s start with the unexpanded code, where there is only a single context:
Now, let’s expand the invocation.
As you can see, the a that’s defined by the macro is in a different context to the a we provided in
our invocation. As such, the compiler treats them as completely different identifiers, even though
they have the same lexical appearance.
This is something to be really careful of when working on macros: macros can produce ASTs
which will not compile, but which will compile if written out by hand, or dumped using --pretty
expanded.
The solution to this is to capture the identifier with the appropriate syntax context. To do that,
we need to again adjust our macro syntax. To continue with our simpler example:
This now expands to:
Now, the contexts match, and the code will compile. We can make this adjustment to our
recurrence! macro by explicitly capturing a and n. After making the necessary changes, we
have:
56 CHAPTER 3. MACROS, A PRACTICAL INTRODUCTION
macro_rules! count_exprs {
() => (0);
($head:expr) => (1);
($head:expr, $($tail:expr),*) => (1 + count_exprs!($($tail),*));
}
macro_rules! recurrence {
( $seq:ident [ $ind:ident ]: $sty:ty = $($inits:expr),+ ... $recur:expr ) => {
// ^~~~~~~~~~ ^~~~~~~~~~ changed
{
use std::ops::Index;
struct Recurrence {
mem: [$sty; MEM_SIZE],
pos: usize,
}
struct IndexOffset<'a> {
slice: &'a [$sty; MEM_SIZE],
offset: usize,
}
#[inline(always)]
fn index<'b>(&'b self, index: usize) -> &'b $sty {
use std::num::Wrapping;
#[inline]
fn next(&mut self) -> Option<$sty> {
if self.pos < MEM_SIZE {
let next_val = self.mem[self.pos];
self.pos += 1;
57
Some(next_val)
} else {
let next_val = {
let $ind = self.pos;
// ^~~~ changed
let $seq = IndexOffset { slice: &self.mem, offset: $ind };
// ^~~~ changed
$recur
};
{
use std::mem::swap;
self.pos += 1;
Some(next_val)
}
}
}
fn main() {
let fib = recurrence![a[n]: u64 = 0, 1 ... a[n-1] + a[n-2]];
↳ };
for e in recurrence!(f[i]: f64 = 1.0 ... f[i-1] * i as f64).take(10) {
println!("{}", e)
}
1
1
2
58 CHAPTER 3. MACROS, A PRACTICAL INTRODUCTION
6
24
120
720
5040
40320
362880
Success!
4
Patterns
Callbacks
macro_rules! call_with_larch {
($callback:ident) => { $callback!(larch) };
}
macro_rules! expand_to_larch {
() => { larch };
}
macro_rules! recognise_tree {
(larch) => { println!("#1, the Larch.") };
(redwood) => { println!("#2, the Mighty Redwood.") };
(fir) => { println!("#3, the Fir.") };
(chestnut) => { println!("#4, the Horse Chestnut.") };
(pine) => { println!("#5, the Scots Pine.") };
($($other:tt)*) => { println!("I don't know; some kind of birch maybe?") };
}
fn main() {
recognise_tree!(expand_to_larch!());
call_with_larch!(recognise_tree);
}
Due to the order that macros are expanded in, it is (as of Rust 1.2) impossible to pass information
to a macro from the expansion of another macro. This can make modularising macros very
60 CHAPTER 4. PATTERNS
difficult.
An alternative is to use recursion and pass a callback. Here is a trace of the above example to
demonstrate how this takes place:
recognise_tree! { expand_to_larch ! ( ) }
println! { "I don't know; some kind of birch maybe?" }
// ...
call_with_larch! { recognise_tree }
recognise_tree! { larch }
println! { "#1, the Larch." }
// ...
macro_rules! callback {
($callback:ident($($args:tt)*)) => {
$callback!($($args)*)
};
}
fn main() {
callback!(callback(println("Yes, this *was* unnecessary.")));
}
Incremental TT Munchers
macro_rules! mixed_rules {
() => {};
(trace $name:ident; $($tail:tt)*) => {
{
println!(concat!(stringify!($name), " = {:?}"), $name);
mixed_rules!($($tail)*);
}
};
(trace $name:ident = $init:expr; $($tail:tt)*) => {
{
let $name = $init;
println!(concat!(stringify!($name), " = {:?}"), $name);
mixed_rules!($($tail)*);
}
};
}
INTERNAL RULES 61
This pattern is perhaps the most powerful macro parsing technique available, allowing one to
parse grammars of significant complexity.
A “TT muncher” is a recursive macro that works by incrementally processing its input one step
at a time. At each step, it matches and removes (munches) some sequence of tokens from the
start of its input, generates some intermediate output, then recurses on the input tail.
The reason for “TT” in the name specifically is that the unprocessed part of the input is always
captured as $($tail:tt)*. This is done as a tt repetition is the only way to losslessly capture
part of a macro’s input.
The only hard restrictions on TT munchers are those imposed on the macro system as a whole:
• You can only match against literals and grammar constructs which can be captured by
macro_rules!.
• You cannot match unbalanced groups.
It is important, however, to keep the macro recursion limit in mind. macro_rules! does not have
any form of tail recursion elimination or optimisation. It is recommended that, when writing a
TT muncher, you make reasonable efforts to keep recursion as limited as possible. This can be
done by adding additional rules to account for variation in the input (as opposed to recursion into
an intermediate layer), or by making compromises on the input syntax to make using standard
repetitions more tractable.
Internal Rules
#[macro_export]
macro_rules! foo {
(@as_expr $e:expr) => {$e};
($($tts:tt)*) => {
foo!(@as_expr $($tts)*)
};
}
Because macros do not interact with regular item privacy or lookup, any public macro must
bring with it all other macros that it depends on. This can lead to pollution of the global macro
namespace, or even conflicts with macros from other crates. It may also cause confusion to users
who attempt to selectively import macros: they must transitively import all macros, including
ones that may not be publicly documented.
A good solution is to conceal what would otherwise be other public macros inside the macro
being exported. The above example shows how the common as_expr! macro could be moved
into the publicly exported macro that is using it.
The reason for using @ is that, as of Rust 1.2, the @ token is not used in prefix position; as such,
it cannot conflict with anything. Other symbols or unique prefixes may be used as desired, but
use of @ has started to become widespread, so using it may aid readers in understanding your
code.
62 CHAPTER 4. PATTERNS
Note: the @ token was previously used in prefix position to denote a garbage-collected
pointer, back when the language used sigils to denote pointer types. Its only current
purpose is for binding names to patterns. For this, however, it is used as an infix
operator, and thus does not conflict with its use here.
Additionally, internal rules will often come before any “bare” rules, to avoid issues with macro_-
rules! incorrectly attempting to parse an internal invocation as something it cannot possibly
be, such as an expression.
If exporting at least one internal macro is unavoidable (e.g. you have many macros that depend
on a common set of utility rules), you can use this pattern to combine all internal macros into
a single uber-macro.
macro_rules! crate_name_util {
(@as_expr $e:expr) => {$e};
(@as_item $i:item) => {$i};
(@count_tts) => {0usize};
// ...
}
Push-Down Accumulation
macro_rules! init_array {
(@accum (0, $_e:expr) -> ($($body:tt)*))
=> {init_array!(@as_expr [$($body)*])};
(@accum (1, $e:expr) -> ($($body:tt)*))
=> {init_array!(@accum (0, $e) -> ($($body)* $e,))};
(@accum (2, $e:expr) -> ($($body:tt)*))
=> {init_array!(@accum (1, $e) -> ($($body)* $e,))};
(@accum (3, $e:expr) -> ($($body:tt)*))
=> {init_array!(@accum (2, $e) -> ($($body)* $e,))};
(@as_expr $e:expr) => {$e};
[$e:expr; $n:tt] => {
{
let e = $e;
init_array!(@accum ($n, e.clone()) -> ())
}
};
}
All macros in Rust must result in a complete, supported syntax element (such as an expression,
item, etc.). This means that it is impossible to have a macro expand to a partial construct.
One might hope that the above example could be more directly expressed like so:
REPETITION REPLACEMENT 63
macro_rules! init_array {
(@accum 0, $_e:expr) => {/* empty */};
(@accum 1, $e:expr) => {$e};
(@accum 2, $e:expr) => {$e, init_array!(@accum 1, $e)};
(@accum 3, $e:expr) => {$e, init_array!(@accum 2, $e)};
[$e:expr; $n:tt] => {
{
let e = $e;
[init_array!(@accum $n, e)]
}
};
}
The expectation is that the expansion of the array literal would proceed as follows:
[init_array!(@accum 3, e)]
[e, init_array!(@accum 2, e)]
[e, e, init_array!(@accum 1, e)]
[e, e, e]
However, this would require each intermediate step to expand to an incomplete expression. Even
though the intermediate results will never be used outside of a macro context, it is still forbidden.
Push-down, however, allows us to incrementally build up a sequence of tokens without needing
to actually have a complete construct at any point prior to completion. In the example given at
the top, the sequence of macro invocations proceeds as follows:
As you can see, each layer adds to the accumulated output until the terminating rule finally
emits it as a complete construct.
The only critical part of the above formulation is the use of $($body:tt)* to preserve the output
without triggering parsing. The use of ($input) -> ($output) is simply a convention adopted to
help clarify the behaviour of such macros.
Push-down accumulation is frequently used as part of incremental TT munchers (??, page ??),
as it allows arbitrarily complex intermediate results to be constructed.
Repetition Replacement
macro_rules! replace_expr {
($_t:tt $sub:expr) => {$sub};
}
64 CHAPTER 4. PATTERNS
This pattern is where a matched repetition sequence is simply discarded, with the variable being
used to instead drive some repeated pattern that is related to the input only in terms of length.
For example, consider constructing a default instance of a tuple with more than 12 elements (the
limit as of Rust 1.2).
macro_rules! tuple_default {
($($tup_tys:ty),*) => {
(
$(
replace_expr!(
($tup_tys)
Default::default()
),
)*
)
};
}
Here, we are not actually using the matched types. Instead, we throw them away and instead
replace them with a single, repeated expression. To put it another way, we don’t care what the
types are, only how many there are.
Trailing Separators
macro_rules! match_exprs {
($($exprs:expr),* $(,)*) => {...};
}
There are various places in the Rust grammar where trailing commas are permitted. The
two common ways of matching (for example) a list of expressions ($($exprs:expr),* and
$($exprs:expr,)*) can deal with either no trailing comma or a trailing comma, but not both.
Placing a $(,)* repetition after the main list, however, will capture any number (including zero
or one) of trailing commas, or any other separator you may be using.
Note that this cannot be used in all contexts. If the compiler rejects this, you will likely need to
use multiple arms and/or incremental matching.
TT Bundling
macro_rules! call_a_or_b_on_tail {
((a: $a:expr, b: $b:expr), call a: $($tail:tt)*) => {
TT BUNDLING 65
$a(stringify!($($tail)*))
};
fn main() {
assert_eq!(
call_a_or_b_on_tail!(
(a: compute_len, b: show_tail),
the recursive part that skips over all these
tokens doesn't much care whether we will call a
or call b: only the terminal rules care.
),
None
);
assert_eq!(
call_a_or_b_on_tail!(
(a: compute_len, b: show_tail),
and now, to justify the existence of two paths
we will also call a: its input should somehow
be self-referential, so let's make it return
some ninety one!
),
Some(91)
);
}
In particularly complex recursive macros, a large number of arguments may be needed in order to
carry identifiers and expressions to successive layers. However, depending on the implementation
there may be many intermediate layers which need to forward these arguments, but do not need
to use them.
As such, it can be very useful to bundle all such arguments together into a single TT by placing
66 CHAPTER 4. PATTERNS
them in a group. This allows layers which do not need to use the arguments to simply capture and
substitute a single tt, rather than having to exactly capture and substitute the entire argument
group.
The example above bundles the $a and $b expressions into a group which can then be forwarded
as a single tt by the recursive rule. This group is then destructured by the terminal rules to
access the expressions.
Provisional
This section is for patterns or techniques which are of dubious value, or which might be too niche
for inclusion.
Abacus Counters
Note: this section assumes understanding of push-down accumulation (??, page ??)
and incremental TT munchers (??, page ??).
macro_rules! abacus {
((- $($moves:tt)*) -> (+ $($count:tt)*)) => {
abacus!(($($moves)*) -> ($($count)*))
};
((- $($moves:tt)*) -> ($($count:tt)*)) => {
abacus!(($($moves)*) -> (- $($count)*))
};
((+ $($moves:tt)*) -> (- $($count:tt)*)) => {
abacus!(($($moves)*) -> ($($count)*))
};
((+ $($moves:tt)*) -> ($($count:tt)*)) => {
abacus!(($($moves)*) -> (+ $($count)*))
};
fn main() {
let equals_zero = abacus!((++-+-+++--++---++----+) -> ());
assert_eq!(equals_zero, true);
}
This technique can be used in cases where you need to keep track of a varying counter that starts
at or near zero, and must support the following operations:
PROVISIONAL 67
• Increment by one.
• Decrement by one.
• Compare to zero (or any other fixed, finite value).
In this way, operations on the counter are like flicking tokens back and forth like an abacus.1
In cases where you want to represent negative values, -n can be represented as n instances of a
different token. In the example given above, +n is stored as n + tokens, and -m is stored as m -
tokens.
In this case, the operations become slightly more complicated; increment and decrement effec-
tively reverse their usual meanings when the counter is negative. To whit given + and - for the
positive and negative tokens respectively, the operations change to:
• Increment by one:
• match (), substitute (+).
• match (- $($count:tt)*), substitute ($($count)*).
• match ($($count:tt)+), substitute (+ $($count)+).
• Decrement by one:
• match (), substitute (-).
• match (+ $($count:tt)*), substitute ($($count)*).
• match ($($count:tt)+), substitute (- $($count)+).
• Compare to 0: match ().
• Compare to +1: match (+).
• Compare to -1: match (-).
• Compare to +2: match (++).
• Compare to -2: match (--).
• (and so on…)
Note that the example at the top combines some of the rules together (for example, it combines
increment on () and ($($count:tt)+) into an increment on ($($count:tt)*)).
If you want to extract the actual value of the counter, this can be done using a regular counter
macro4 . For the example above, the terminal rules can be replaced with the following:
1 This desperately thin reasoning conceals the real reason for this name: to avoid having yet another thing with
“token” in the name. Talk to your writer about avoiding semantic satiation2 today!
In fairness, it could also have been called “unary counting”3 .
4 ../blk/README.html#counting
68 CHAPTER 4. PATTERNS
macro_rules! abacus {
// ...
macro_rules! replace_expr {
($_t:tt $sub:expr) => {$sub};
}
macro_rules! abacus {
(-) => {-1};
(+) => {1};
($($moves:tt)*) => {
0 $(+ abacus!($moves))*
}
}
Building Blocks
5
Reusable snippets of macro code.
AST Coercion
The Rust parser is not very robust in the face of tt substitutions. Problems can arise when
the parser is expecting a particular grammar construct and instead finds a lump of substituted
tt tokens. Rather than attempt to parse them, it will often just give up. In these cases, it is
necessary to employ an AST coercion.
These coercions are often used with push-down accumulation (section 4, page 62) macros in order
to get the parser to treat the final tt sequence as a particular kind of grammar construct.
Note that this specific set of macros is determined by what macros are allowed to expand to, not
what they are able to capture. That is, because macros cannot appear in type position1 , you
cannot have an as_ty! macro.
Counting
Repetition with replacement
Counting things in a macro is a surprisingly tricky task. The simplest way is to use replacement
with a repetition match.
1 See Issue #273362 .
70 CHAPTER 5. BUILDING BLOCKS
macro_rules! replace_expr {
($_t:tt $sub:expr) => {$sub};
}
macro_rules! count_tts {
($($tts:tt)*) => {0usize $(+ replace_expr!($tts 1usize))*};
}
This is much better, but will likely crash the compiler with inputs of around 500 or so tokens.
Consider that the output will look something like this:
The compiler must parse this into an AST, which will produce what is effectively a perfectly
unbalanced binary tree 500+ levels deep.
Recursion
macro_rules! count_tts {
() => {0usize};
($_head:tt $($tail:tt)*) => {1usize + count_tts!($($tail)*)};
}
Note: As of rustc 1.2, the compiler has grevious performance problems when large
numbers of integer literals of unknown type must undergo inference. We are using
explicitly usize-typed literals here to avoid that.
If this is not suitable (such as when the type must be substitutable), you can help
matters by using as (e.g. 0 as $ty, 1 as $ty, etc.).
This works, but will trivially exceed the recursion limit. Unlike the repetition approach, you can
extend the input size by matching multiple tokens at once.
macro_rules! count_tts {
($_a:tt $_b:tt $_c:tt $_d:tt $_e:tt
$_f:tt $_g:tt $_h:tt $_i:tt $_j:tt
$_k:tt $_l:tt $_m:tt $_n:tt $_o:tt
$_p:tt $_q:tt $_r:tt $_s:tt $_t:tt
$($tail:tt)*)
=> {20usize + count_tts!($($tail)*)};
($_a:tt $_b:tt $_c:tt $_d:tt $_e:tt
$_f:tt $_g:tt $_h:tt $_i:tt $_j:tt
$($tail:tt)*)
=> {10usize + count_tts!($($tail)*)};
($_a:tt $_b:tt $_c:tt $_d:tt $_e:tt
COUNTING 71
$($tail:tt)*)
=> {5usize + count_tts!($($tail)*)};
($_a:tt
$($tail:tt)*)
=> {1usize + count_tts!($($tail)*)};
() => {0usize};
}
fn main() {
assert_eq!(700, count_tts!(
,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,,
,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,,
Slice length
A third approach is to help the compiler construct a shallow AST that won’t lead to a stack
overflow. This can be done by constructing an array literal and calling the len method.
macro_rules! replace_expr {
($_t:tt $sub:expr) => {$sub};
}
macro_rules! count_tts {
($($tts:tt)*) => {<[()]>::len(&[$(replace_expr!($tts ())),*])};
}
72 CHAPTER 5. BUILDING BLOCKS
This has been tested to work up to 10,000 tokens, and can probably go much higher. The
downside is that as of Rust 1.2, this cannot be used to produce a constant expression. Although
the result can be optimised to a simple constant (in debug builds it compiles down to a load
from memory), it still cannot be used in constant positions (such as the value of consts, or a
fixed array’s size).
However, if a non-constant count is acceptable, this is very much the preferred method.
Enum Parsing
macro_rules! parse_unitary_variants {
(@as_expr $e:expr) => {$e};
(@as_item $($i:item)+) => {$($i)+};
// Exit rules.
(
@collect_unitary_variants ($callback:ident ( $($args:tt)* )),
($(,)*) -> ($($var_names:ident,)*)
) => {
parse_unitary_variants! {
@as_expr
$callback!{ $($args)* ($($var_names),*) }
}
};
(
@collect_unitary_variants ($callback:ident { $($args:tt)* }),
($(,)*) -> ($($var_names:ident,)*)
) => {
parse_unitary_variants! {
@as_item
$callback!{ $($args)* ($($var_names),*) }
}
};
// Consume an attribute.
(
@collect_unitary_variants $fixed:tt,
(#[$_attr:meta] $($tail:tt)*) -> ($($var_names:tt)*)
) => {
parse_unitary_variants! {
@collect_unitary_variants $fixed,
($($tail)*) -> ($($var_names)*)
}
};
(
@collect_unitary_variants $fixed:tt,
($var:ident $(= $_val:expr)*, $($tail:tt)*) -> ($($var_names:tt)*)
) => {
parse_unitary_variants! {
@collect_unitary_variants $fixed,
($($tail)*) -> ($($var_names)* $var,)
}
};
// Entry rule.
(enum $name:ident {$($body:tt)*} => $callback:ident $arg:tt) => {
parse_unitary_variants! {
@collect_unitary_variants
($callback $arg), ($($body)*,) -> ()
}
};
}
This macro shows how you can use an [incremental tt muncher] and push-down accumulation
(section 4, page 62) to parse the variants of an enum where all variants are unitary (i.e. they
have no payload). Upon completion, parse_unitary_variants! invokes a [callback] macro with
the list of variants (plus any other arbitrary arguments supplied).
This can be modified to also parse struct fields, compute tag values for the variants, or even
extract the names of all variants in an arbitrary enum.
74 CHAPTER 5. BUILDING BLOCKS
Annotated Examples
6
This section contains real-world1 macros which have been annotated to explain their design and
construction.
Ook!
This macro is an implementation of the Ook! esoteric language2 , which is isomorphic to the
Brainfuck esoteric language3 .
The execution model for the language is very simple: memory is represented as an array of “cells”
(typically at least 8-bits) of some indeterminate number (usually at least 30,000). There is a
pointer into memory which starts off at position 0. Finally, there is an execution stack (used to
implement looping) and pointer into the program, although these last two are not exposed to
the running program; they are properties of the runtime itself.
The language itself is comprised of just three tokens: Ook., Ook?, and Ook!. These are combined
in pairs to form the eight different operations:
Implementation
#![recursion_limit = "158"]
This is, in fact, the lowest possible recursion limit for which the example program provided at
the end will actually compile. If you’re wondering what could be so fantastically complex that
it would justify a recursion limit nearly five times the default limit… take a wild guess4 .
These are here purely to ensure they are visible to the macro expansion.5
macro_rules! Ook {
The name should probably have been ook! to match the standard naming convention, but the
opportunity was simply too good to pass up.
The rules for this macro are broken up into sections using the internal rules6 pattern.
The first of these will be a @start rule, which takes care of setting up the block in which the rest
of our expansion will happen. There is nothing particularly interesting in this: we define some
variables and helper functions, then do the bulk of the expansion.
A few small notes:
• We are expanding into a function largely so that we can use try! to simplify error handling.
• The use of underscore-prefixed names is so that the compiler will not complain about
unused functions or variables if, for example, the user writes an Ook! program that does
no I/O.
around (due to hygiene). To be honest, by the time I realised I needed to define these, the macro was already
mostly written and… well, would you want to go through and fix this thing up if you didn’t absolutely need to?
6 ../pat/README.html#internal-rules
OOK! 77
Opcode parsing
Next are the “execute” rules, which are used to parse opcodes from the input.
The general form of these rules is (@e $syms; ($input)). As you can see from the @start rule,
$syms is the collection of symbols needed to actually implement the program: input, output, the
memory array, etc.. We are using TT bundling7 to simplify forwarding of these symbols through
later, intermediate rules.
First, is the rule that terminates our recursion: once we have no more input, we stop.
Next, we have a single rule for almost each opcode. For these, we strip off the opcode, emit the
corresponding Rust code, then recurse on the input tail: a textbook TT muncher8 .
// Increment pointer.
(@e ($a:expr, $i:expr, $inc:expr, $dec:expr, $r:expr, $w:expr, $re:expr);
7 ../pat/README.html#tt-bundling
8 ../pat/README.html#incremental-tt-munchers
78 CHAPTER 6. ANNOTATED EXAMPLES
// Decrement pointer.
(@e ($a:expr, $i:expr, $inc:expr, $dec:expr, $r:expr, $w:expr, $re:expr);
(Ook? Ook. $($tail:tt)*))
=> {
$i = if $i == 0 { MEM_SIZE } else { $i } - 1;
Ook!(@e ($a, $i, $inc, $dec, $r, $w, $re); ($($tail)*));
};
// Increment pointee.
(@e ($a:expr, $i:expr, $inc:expr, $dec:expr, $r:expr, $w:expr, $re:expr);
(Ook. Ook. $($tail:tt)*))
=> {
$inc($a, $i);
Ook!(@e ($a, $i, $inc, $dec, $r, $w, $re); ($($tail)*));
};
// Decrement pointee.
(@e ($a:expr, $i:expr, $inc:expr, $dec:expr, $r:expr, $w:expr, $re:expr);
(Ook! Ook! $($tail:tt)*))
=> {
$dec($a, $i);
Ook!(@e ($a, $i, $inc, $dec, $r, $w, $re); ($($tail)*));
};
// Write to stdout.
(@e ($a:expr, $i:expr, $inc:expr, $dec:expr, $r:expr, $w:expr, $re:expr);
(Ook! Ook. $($tail:tt)*))
=> {
try!($w.write_all(&$a[$i .. $i+1]));
Ook!(@e ($a, $i, $inc, $dec, $r, $w, $re); ($($tail)*));
};
Here is where things get more complicated. This opcode, Ook! Ook?, marks the start of a loop.
Ook! loops are translated to the following Rust code:
while memory[ptr] != 0 {
// Contents of loop
}
Of course, we cannot actually emit an incomplete loop. This could be solved by using pushdown9 ,
were it not for a more fundamental problem: we cannot write while memory[ptr] != {, at all,
anywhere. This is because doing so would introduce an unbalanced brace.
The solution to this is to actually split the input into two parts: everything inside the loop, and
everything after it. The @x rules handle the first, @s the latter.
Loop extraction
Next are the @x, or “extraction”, rules. These are responsible for taking an input tail and
extracting the contents of a loop. The general form of these rules is: (@x $sym; $depth; $buf;
$tail).
The purpose of $sym is the same as above. $tail is the input to be parsed, whilst $buf is a
push-down accumulation buffer10 into which we will collect the opcodes that are inside the loop.
But what of $depth?
A complication to all this is that loops can be nested. Thus, we must have some way of keeping
track of how many levels deep we currently are. We must track this accurately enough to not
stop parsing too early, nor too late, but when the level is just right.11
Since we cannot do arithmetic in macros, and it would be infeasible to write out explicit integer-
matching rules (imagine the following rules all copy & pasted for a non-trivial number of positive
integers), we will instead fall back on one of the most ancient and venerable counting methods
in history: counting on our fingers.
9 ../pat/README.html#push-down-accumulation
10 ../pat/README.html#push-down-accumulation
11 It is a little known fact[ˆaeg-ook.md--fact] that the story of Goldie Locks was actually an allegory for accurate
But as macros don’t have fingers, we’ll use a token abacus counter12 instead. Specifically, we
will use @s, where each @ represents one additional level of depth. If we keep these @s contained
in a group, we can implement the three important operations we need:
First is a rule to detect when we find the matching Ook? Ook! sequence that closes the loop we’re
parsing. In this case, we feed the accumulated loop contents to the previously defined @e rules.
Note that we do not need to do anything with the remaining input tail (that will be handled by
the @s rules).
Next, we have rules for entering and exiting nested loops. These adjust the counter and add the
opcodes to the buffer.
Finally, we have a rule for “everything else”. Note the $op0 and $op1 captures: as far as Rust is
concerned, our Ook! tokens are always two Rust tokens: the identifier Ook, and another token.
Thus, we can generalise over all non-loop opcodes by matching !, ?, and . as tts.
Here, we leave $depth untouched and just add the opcodes to the buffer.
Loop Skipping
This is broadly the same as loop extraction, except we don’t care about the contents of the loop
(and as such, don’t need the accumulation buffer). All we need to know is when we are past the
loop. At that point, we resume processing the input using the @e rules.
As such, these rules are presented without further exposition.
// End of loop.
(@s $syms:tt; ();
(Ook? Ook! $($tail:tt)*))
=> {
Ook!(@e $syms; ($($tail)*));
};
Entry point
This is the only non-internal rule.
It is worth noting that because this formulation simply matches all tokens provided to it, it is
extremely dangerous. Any mistake can cause an invocation to fail to match all the above rules,
thus falling down to this one and triggering an infinite recursion.
When you are writing, modifying, or debugging a macro like this, it is wise to temporarily prefix
rules such as this one with something, such as @entry. This prevents the infinite recursion case,
and you are more likely to get matcher errors at the appropriate place.
($($Ooks:tt)*) => {
Ook!(@start $($Ooks)*)
};
}
82 CHAPTER 6. ANNOTATED EXAMPLES
Usage
Here, finally, is our test program.
fn main() {
let _ = Ook!(
Ook. Ook? Ook. Ook. Ook. Ook. Ook. Ook.
Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook.
Ook. Ook. Ook. Ook. Ook! Ook? Ook? Ook.
Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook.
Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook.
Ook. Ook? Ook! Ook! Ook? Ook! Ook? Ook.
Ook! Ook. Ook. Ook? Ook. Ook. Ook. Ook.
Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook.
Ook. Ook. Ook! Ook? Ook? Ook. Ook. Ook.
Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook?
Ook! Ook! Ook? Ook! Ook? Ook. Ook. Ook.
Ook! Ook. Ook. Ook. Ook. Ook. Ook. Ook.
Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook.
Ook! Ook. Ook! Ook. Ook. Ook. Ook. Ook.
Ook. Ook. Ook! Ook. Ook. Ook? Ook. Ook?
Ook. Ook? Ook. Ook. Ook. Ook. Ook. Ook.
Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook.
Ook. Ook. Ook! Ook? Ook? Ook. Ook. Ook.
Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook?
Ook! Ook! Ook? Ook! Ook? Ook. Ook! Ook.
Ook. Ook? Ook. Ook? Ook. Ook? Ook. Ook.
Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook.
Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook.
Ook. Ook. Ook! Ook? Ook? Ook. Ook. Ook.
Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook.
Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook.
Ook. Ook? Ook! Ook! Ook? Ook! Ook? Ook.
Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook.
Ook? Ook. Ook? Ook. Ook? Ook. Ook? Ook.
Ook! Ook. Ook. Ook. Ook. Ook. Ook. Ook.
Ook! Ook. Ook! Ook! Ook! Ook! Ook! Ook!
Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook.
Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook!
Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook!
Ook! Ook. Ook. Ook? Ook. Ook? Ook. Ook.
Ook! Ook. Ook! Ook? Ook! Ook! Ook? Ook!
Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook.
Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook.
Ook. Ook. Ook. Ook. Ook! Ook.
);
}
The output when run (after a considerable pause for the compiler to do hundreds of recursive
macro expansions) is:
OOK! 83
Hello World!
With that, we have demonstrated the horrifying truth that macro_rules! is Turing-complete!
An aside
This was based on a macro implementing an isomorphic language called “Hodor!”. Manish
Goregaokar then implemented a Brainfuck interpreter using the Hodor! macro13 . So that is a
Brainfuck interpreter written in Hodor! which was itself implemented using macro_rules!.
Legend has it that after raising the recursion limit to three million and allowing it to run for
four days, it finally finished.
…by overflowing the stack and aborting. To this day, esolang-as-macro remains a decidedly
non-viable method of development with Rust.
13 https://www.reddit.com/r/rust/comments/39wvrm/hodor_esolang_as_a_rust_macro/cs76rqk?context=
10000