100-exercises-to-learn-rust
Luca Palmieri
Contents

1 Welcome
  1.1 Syntax

2 A Basic Calculator
  2.1 Types, part 1
  2.2 Variables
  2.3 Control flow, part 1
  2.4 Panics
  2.5 Factorial
  2.6 Loops, part 1: while
  2.7 Loops, part 2: for
  2.8 Overflow
  2.9 Case-by-case behavior
  2.10 Conversions, pt. 1

3 Modelling A Ticket
  3.1 Structs
  3.2 Validation
  3.3 Modules
  3.4 Visibility
  3.5 Encapsulation
  3.6 Ownership
  3.7 Mutable references
  3.8 Memory layout
  3.9 Heap
  3.10 References
  3.11 Destructors
  3.12 Wrapping up

4 Traits
  4.1 Traits
  4.2 Implementing traits
  4.3 Operator overloading
  4.4 Derive macros
  4.5 Trait bounds
  4.6 String slices
  4.7 Deref trait
  4.8 Sized
  4.9 From and Into
  4.10 Generics and associated types
  4.11 Copying values, pt. 1
  4.12 Copying values, pt. 2
  4.13 The Drop trait
  4.14 Wrapping up

6 Intro
  6.1 Arrays
  6.2 Vectors
  6.3 Resizing
  6.4 Iteration
  6.5 .iter()
  6.6 Lifetimes
  6.7 Combinators
  6.8 impl Trait
  6.9 impl Trait in argument position
  6.10 Slices
  6.11 Mutable slices
  6.12 Ticket ids
  6.13 Indexing
  6.14 Mutable indexing
  6.15 HashMap
  6.16 Ordering

7 Intro
  7.1 Threads
  7.2 'static
  7.3 Leaking data
  7.4 Scoped threads
  7.5 Channels
  7.6 Interior mutability
  7.7 Two-way communication
  7.8 A dedicated Client type
  7.9 Bounded vs unbounded channels
  7.10 Update operations
  7.11 Locks, Send and Arc
  7.12 Readers and writers
  7.13 Design review
  7.14 Sync

9 Epilogue
Chapter 1
Welcome
Methodology
Formats
You can go through the course material in the browser or download it as a PDF file, for offline
reading.
If you prefer to have the course material printed out, buy a paperback copy on Amazon.
Structure
On the left side of the screen, you can see that the course is divided into sections. Each section
introduces a new concept or feature of the Rust language.
To verify your understanding, each section is paired with an exercise that you need to solve.
You can find the exercises in the companion GitHub repository.
Before starting the course, make sure to clone the repository to your local machine:
We also recommend you work on a branch, so you can easily track your progress and pull in
updates from the main repository, if needed:
cd 100-exercises-to-learn-rust
git checkout -b my-solutions
All exercises are located in the exercises folder. Each exercise is structured as a Rust package.
The package contains the exercise itself, instructions on what to do (in src/lib.rs), and a test
suite to automatically verify your solution.
Tools
• Rust. If rustup is already installed on your system, run rustup update (or another appropriate command, depending on how you installed Rust on your system) to ensure you're running the latest stable version.
• (Optional but recommended) An IDE with Rust autocompletion support. We recommend
one of the following:
– RustRover;
– Visual Studio Code with the rust-analyzer extension.
To verify your solutions, we've also provided a tool to guide you through the course: the wr CLI (short for "workshop runner"). Install it with:
In a new terminal, navigate back to the top-level folder of the repository. Run the wr command
to start the course:
wr
We recommend committing your solutions to Git as you progress through the course,
so you can easily track your progress and “restart” from a known point if needed.
Author
This course was written by Luca Palmieri, Principal Engineering Consultant at Mainmatter.
Luca has been working with Rust since 2018, initially at TrueLayer and then at AWS.
Luca is the author of “Zero to Production in Rust”, the go-to resource for learning how to build
backend applications in Rust.
He is also the author and maintainer of a variety of open-source Rust projects, including cargo-chef, Pavex and wiremock.
Exercise
1.1 Syntax
Comments
Functions
Functions in Rust are defined using the fn keyword, followed by the function’s name, its input
parameters, and its return type. The function’s body is enclosed in curly braces {}.
In the previous exercise, you saw the greeting function:
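For reference, the greeting function looked like this:

```rust
fn greeting() -> &'static str {
    // The final expression of the body is the return value;
    // note there's no trailing semicolon.
    "I'm ready to learn Rust!"
}
```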
greeting has no input parameters and returns a reference to a string slice (&'static str).
Return type
The return type can be omitted from the signature if the function doesn’t return anything (i.e. if
it returns (), Rust’s unit type). That’s what happened with the test_welcome function:
fn test_welcome() {
    assert_eq!(greeting(), "I'm ready to learn Rust!");
}
Returning values
You can also use the return keyword to return a value early:
fn greeting() -> &'static str {
    // Notice the semicolon at the end of the line!
    return "I'm ready to learn Rust!";
}
Input parameters
Input parameters are declared inside the parentheses () that follow the function’s name.
Each parameter is declared with its name, followed by a colon :, followed by its type.
For example, the greet function below takes a name parameter of type &str (a “string slice”):
// `name` is an input parameter
fn greet(name: &str) -> String {
    format!("Hello, {}!", name)
}
If there are multiple input parameters, they must be separated with commas.
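For instance, a hypothetical add function takes two parameters (the name and types here are chosen just for illustration):

```rust
// Two input parameters, separated by a comma.
fn add(x: u32, y: u32) -> u32 {
    x + y
}
```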
Type annotations
Since we've mentioned "types" a few times, let's state it clearly: Rust is a statically typed language.
Every single value in Rust has a type and that type must be known to the compiler at compile-time.
Types are a form of static analysis.
You can think of a type as a tag that the compiler attaches to every value in your program.
Depending on the tag, the compiler can enforce different rules—e.g. you can’t add a string to a
number, but you can add two numbers together. If leveraged correctly, types can prevent whole
classes of runtime bugs.
Exercise
Chapter 2
A Basic Calculator
Nailing the basics with a few exercises will get the language flowing under your fingers. When
we move on to more complex topics, such as traits and ownership, you’ll be able to focus on the
new concepts without getting bogged down by the syntax or other trivial details.
Exercise
Primitive types
u32 is one of Rust’s primitive types. Primitive types are the most basic building blocks of a
language. They’re built into the language itself—i.e. they are not defined in terms of other types.
You can combine these primitive types to create more complex types. We’ll see how soon
enough.
Integers
An integer is a number that can be written without a fractional component. E.g. 1 is an integer,
while 1.2 is not.
Bit width
The 32 in u32 refers to the number of bits¹ used to represent the number in memory.
The more bits, the larger the range of numbers that can be represented.
Rust supports multiple bit widths for integers: 8, 16, 32, 64, 128.
With 32 bits, u32 can represent numbers from 0 to 2^32 - 1 (a.k.a. u32::MAX).
With the same number of bits, a signed integer (i32) can represent numbers from -2^31 to 2^31 - 1 (i.e. from i32::MIN to i32::MAX).
The maximum value for i32 is smaller than the maximum value for u32 because one bit is used
to represent the sign of the number. Check out the two’s complement representation for more
details on how signed integers are represented in memory.
¹ A bit is the smallest unit of data in a computer. It can only have two values: 0 or 1.
Summary

Combining the two dimensions (signedness and bit width), we get the following integer types:

| Bit width | Signed | Unsigned |
|-----------|--------|----------|
| 8-bit     | i8     | u8       |
| 16-bit    | i16    | u16      |
| 32-bit    | i32    | u32      |
| 64-bit    | i64    | u64      |
| 128-bit   | i128   | u128     |
Literals
But all values in Rust have a type, so… what’s the type of 42?
The Rust compiler will try to infer the type of a literal based on how it’s used.
If you don’t provide any context, the compiler will default to i32 for integer literals.
If you want to use a different type, you can add the desired integer type as a suffix—e.g. 2u64 is
a 2 that’s explicitly typed as a u64.
Underscores in literals
Arithmetic operators
• + for addition
• - for subtraction
• * for multiplication
• / for division
• % for remainder
Precedence and associativity rules for these operators are the same as in mathematics.
You can use parentheses to override the default precedence. E.g. 2 * (3 + 4).
² Rust doesn't let you define custom operators, but it puts you in control of how the built-in operators behave. We'll talk about operator overloading later in the course, after we've covered traits.
⚠ Warning
The division operator / performs integer division when used with integer types. I.e.
the result is truncated towards zero. For example, 5 / 2 is 2, not 2.5.
This strictness extends to conversions: Rust won't silently convert between integer types, even when the conversion is lossless. The following snippet, for example, doesn't compile:

let b: u8 = 100;
// Error: expected `u32`, found `u8`
let a: u32 = b;
Further reading
Exercise
³ There are some exceptions to this rule, mostly related to references, smart pointers and ergonomics. We'll cover those later on. A mental model of "all conversions are explicit" will serve you well in the meantime.
2.2 Variables

In Rust, you declare a variable using the let keyword. For example:

let x = 42;
Type
Every variable in Rust must have a type. It can either be inferred by the compiler or explicitly
specified by the developer.
You can specify the variable type by adding a colon : followed by the type after the variable
name. For example:
Type inference
If we don’t specify the type of a variable, the compiler will try to infer it based on the context in
which the variable is used.
let x = 42;
let y: u32 = x;

Here x is later assigned to y, which is declared as a u32, so the compiler infers that x must be a u32 as well.
Inference limitations
The compiler sometimes needs a little help to infer the correct variable type based on its usage.
In those cases you’ll get a compilation error and the compiler will ask you to provide an explicit
type hint to disambiguate the situation.
Not all heroes wear capes; not all variables are declared with let.
Function arguments are variables too!
Initialization

You don't have to initialize a variable when you declare it:

let x: u32;

You can't use an uninitialized variable, though. The compiler rejects this, for example:

let x: u32;
// Error: used binding `x` isn't initialized
let y = x + 1;
Exercise
if clauses
let number = 3;
if number < 5 {
    println!("`number` is smaller than 5");
}
This program will print `number` is smaller than 5 because the condition number < 5 is true.
else clauses
Like most programming languages, Rust supports an optional else branch to execute a block of
code when the condition in an if expression is false.
For example:
let number = 3;
if number < 5 {
    println!("`number` is smaller than 5");
} else {
    println!("`number` is greater than or equal to 5");
}
else if clauses
Your code drifts more and more to the right when you have multiple if expressions, one nested
inside the other.
let number = 3;
if number < 5 {
    println!("`number` is smaller than 5");
} else {
    if number >= 3 {
        println!("`number` is greater than or equal to 3, but smaller than 5");
    } else {
        println!("`number` is smaller than 3");
    }
}
You can use the else if keyword to combine multiple if expressions into a single one:
let number = 3;
if number < 5 {
    println!("`number` is smaller than 5");
} else if number >= 3 {
    println!("`number` is greater than or equal to 3, but smaller than 5");
} else {
    println!("`number` is smaller than 3");
}
Booleans
let number = 3;
if number {
    println!("`number` is not zero");
}
The snippet above doesn't compile: the condition in an if expression must be a bool. This follows from Rust's philosophy around type coercion: there's no automatic conversion from non-boolean types to booleans. Rust doesn't have the concept of truthy or falsy values, like JavaScript or Python.
You have to be explicit about the condition you want to check.
Comparison operators
It’s quite common to use comparison operators to build conditions for if expressions.
Here are the comparison operators available in Rust when working with integers:
• ==: equal to
• !=: not equal to
• <: less than
• >: greater than
• <=: less than or equal to
• >=: greater than or equal to
if/else is an expression
let number = 3;
let message = if number < 5 {
"smaller than 5"
} else {
"greater than or equal to 5"
};
In the example above, each branch of the if evaluates to a string literal, which is then assigned
to the message variable.
The only requirement is that both if branches return the same type.
Exercise
2.4 Panics
Let’s go back to the speed function you wrote for the “Variables” section. It probably looked
something like this:
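A sketch of what that function may have looked like (the signature is reconstructed from the exercise, so treat the details as illustrative rather than canonical):

```rust
fn speed(start: u32, end: u32, time_elapsed: u32) -> u32 {
    // Distance covered during the elapsed time.
    let distance = end - start;
    // If `time_elapsed` is zero, this division panics at runtime.
    distance / time_elapsed
}
```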
If you have a keen eye, you might have spotted one issue: what happens if time_elapsed is zero?
You can try it out on the Rust playground!
The program will exit with the following error message:
fn main() {
    panic!("This is a panic!");
    // The line below will never be executed
    let x = 1 + 2;
}
There are other mechanisms to work with recoverable errors in Rust, which we’ll cover later. For
the time being we’ll stick with panics as a brutal but simple stopgap solution.
Further reading
Exercise
2.5 Factorial
Exercise
while <condition> {
    // code to execute
}
let sum = 0;
let i = 1;

// "while i is less than or equal to 5"
while i <= 5 {
    // `+=` is a shorthand for `sum = sum + i`
    sum += i;
    i += 1;
}
This will keep adding 1 to i and i to sum until i is no longer less than or equal to 5.
The example above won’t compile as is. You’ll get an error like:
3 | let i = 1;
| -
| |
| first assignment to `i`
| help: consider making this binding mutable: `mut i`
...
8 | i += 1;
| ^^^^^^ cannot assign twice to immutable variable
To fix it, the variables must be declared as mutable with the mut keyword:

let mut sum = 0;
let mut i = 1;
while i <= 5 {
    sum += i;
    i += 1;
}
Further reading
Exercise
Having to manually increment a counter variable is somewhat tedious. The pattern is also extremely common!
To make this easier, Rust provides a more concise way to iterate over a range of values: the for
loop.
A for loop is a way to execute a block of code for each element in an iterator⁸.
Here’s the general syntax:
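In sketch form (the angle brackets mark placeholders, not valid Rust):

```
for <element> in <iterator> {
    // code to execute, once per element
}
```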
Ranges
Rust's standard library provides a range type that can be used to iterate over a sequence of numbers⁹.
For example, if we want to sum the numbers from 1 to 5:
Every time the loop runs, i will be assigned the next value in the range before executing the
block of code.
There are five kinds of ranges in Rust:
• 1..5: A (half-open) range. It includes all numbers from 1 to 4. It doesn’t include the last
value, 5.
• 1..=5: An inclusive range. It includes all numbers from 1 to 5. It includes the last value, 5.
• 1..: An open-ended range. It includes all numbers from 1 to infinity (well, until the maximum value of the integer type).
• ..5: A range that starts at the minimum value for the integer type and ends at 4. It doesn’t
include the last value, 5.
• ..=5: A range that starts at the minimum value for the integer type and ends at 5. It
includes the last value, 5.
⁸ Later in the course we'll give a precise definition of what counts as an "iterator". For now, think of it as a sequence of values that you can loop over.
⁹ You can use ranges with other types too (e.g. characters and IP addresses), but integers are definitely the most common case in day-to-day Rust programming.
You can use a for loop with the first three kinds of ranges, where the starting point is explicitly specified. The last two range types are used in other contexts that we'll cover later.
The extreme values of a range don't have to be integer literals—they can be variables or expressions too!
For example:
let end = 5;
let mut sum = 0;

for i in 1..(end + 1) {
    sum += i;
}
Further reading
Exercise
2.8 Overflow
When the result of an arithmetic operation is bigger than the maximum value for a given integer
type, we are talking about an integer overflow.
Integer overflows are an issue because they violate the contract for arithmetic operations.
The result of an arithmetic operation between two integers of a given type should be another
integer of the same type. But the mathematically correct result doesn’t fit into that integer type!
If the result is smaller than the minimum value for a given integer type, we refer to the
event as an integer underflow.
For brevity, we’ll only talk about integer overflows for the rest of this section, but keep
in mind that everything we say applies to integer underflows as well.
The speed function you wrote in the “Variables” section underflowed for some input
combinations. E.g. if end is smaller than start, end - start will underflow the
u32 type since the result is supposed to be negative but u32 can’t represent negative
numbers.
No automatic promotion
One possible approach would be to automatically promote the result to a bigger integer type. E.g. if you're summing two u8 integers and the result is 256 (u8::MAX + 1), Rust could choose to interpret the result as u16, the next integer type that's big enough to hold 256.
But, as we’ve discussed before, Rust is quite picky about type conversions. Automatic integer
promotion is not Rust’s solution to the integer overflow problem.
Alternatives
Since we ruled out automatic promotion, what can we do when an integer overflow occurs?
It boils down to two different approaches: reject the operation, or come up with a "sensible" result.
Rejecting the operation is the most conservative approach: we stop the program when an integer overflow occurs. That's done via a panic, the mechanism we've already seen in the "Panics" section.
When the result of an arithmetic operation is bigger than the maximum value for a given integer
type, you can choose to wrap around.
If you think of all the possible values for a given integer type as a circle, wrapping around means
that when you reach the maximum value, you start again from the minimum value.
For example, if you do a wrapping addition between 1 and 255 (=u8::MAX), the result is 0
(=u8::MIN). If you’re working with signed integers, the same principle applies. E.g. adding 1 to
127 (=i8::MAX) with wrapping will give you -128 (=i8::MIN).
overflow-checks
Rust lets you, the developer, choose which approach to use when an integer overflow occurs.
The behaviour is controlled by the overflow-checks profile setting.
If overflow-checks is set to true, Rust will panic at runtime when an integer operation overflows. If overflow-checks is set to false, Rust will wrap around when an integer operation overflows.
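For example, this Cargo.toml fragment enables overflow panics even in optimized release builds (overflow-checks is a standard Cargo profile setting; dev builds already default to true):

```toml
[profile.release]
overflow-checks = true
```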
Profiles
A profile is a set of configuration options that can be used to customize the way Rust code is compiled. Cargo provides two built-in profiles: dev (used by default for local builds) and release (optimized, intended for production builds).
“Have you built your project in release mode?” is almost a meme in the Rust commu-
nity.
It refers to developers who are not familiar with Rust and complain about its performance on social media (e.g. Reddit, Twitter, etc.) before realizing they haven't built their project in release mode.
You can also define custom profiles or customize the built-in ones.
Further reading
• Check out “Myths and legends about integer overflow in Rust” for an in-depth discussion
about integer overflow in Rust.
Exercise
overflow-checks is a blunt tool: it’s a global setting that affects the whole program.
It often happens that you want to handle integer overflows differently depending on the context:
sometimes wrapping is the right choice, other times panicking is preferable.
wrapping_ methods
You can opt into wrapping arithmetic on a per-operation basis by using the wrapping_ methods¹⁰.
For example, you can use wrapping_add to add two integers with wrapping:
let x = 255u8;
let y = 1u8;
let sum = x.wrapping_add(y);
assert_eq!(sum, 0);
saturating_ methods
Alternatively, you can opt into saturating arithmetic by using the saturating_ methods.
Instead of wrapping around, saturating arithmetic will return the maximum or minimum value for
the integer type. For example:
let x = 255u8;
let y = 1u8;
let sum = x.saturating_add(y);
assert_eq!(sum, 255);
Since 255 + 1 is 256, which is bigger than u8::MAX, the result is u8::MAX (255).
The opposite happens for underflows: 0 - 1 is -1, which is smaller than u8::MIN, so the result
is u8::MIN (0).
You can't get saturating arithmetic via the overflow-checks profile setting—you have to explicitly opt into it when performing the arithmetic operation.
Exercise
¹⁰ You can think of methods as functions that are "attached" to a specific type. We'll cover methods (and how to define them) in the next chapter.
We've repeated over and over again that Rust won't perform implicit type conversions for integers.
How do you perform explicit conversions then?
as
The semantics of this conversion are what you expect: all u32 values are valid u64 values.
Truncation
This program will run without issues, because as conversions are infallible. But what is the value of b? When going from a larger integer type to a smaller one, the Rust compiler performs a truncation.
To understand what happens, let’s start by looking at how 256u16 is represented in memory, as
a sequence of bits:
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
| | |
+---------------+---------------+
First 8 bits Last 8 bits
When converting to a u8, the Rust compiler will keep the last 8 bits of a u16's memory representation:
0 0 0 0 0 0 0 0
| |
+---------------+
Last 8 bits
Recommendation

As a rule of thumb, be quite careful with as casting: use it only where you're certain that truncation either can't occur or is the behaviour you want.
Limitations
Surprising behaviour is not the only downside of as casting. It is also fairly limited: you can only
rely on as casting for primitive types and a few other special cases.
When working with composite types, you’ll have to rely on different conversion mechanisms
(fallible and infallible), which we’ll explore later on.
Further reading
• Check out Rust’s official reference to learn the precise behaviour of as casting for each
source/target combination, as well as the exhaustive list of allowed conversions.
Exercise
Chapter 3
Modelling A Ticket
The previous chapter should have given you a good grasp of some of Rust's primitive types, operators and basic control flow constructs.
In this chapter we’ll go one step further and cover what makes Rust truly unique: ownership.
Ownership is what enables Rust to be both memory-safe and performant, with no garbage collector.
As our running example, we’ll use a (JIRA-like) ticket, the kind you’d use to track bugs, features,
or tasks in a software project.
We’ll take a stab at modeling it in Rust. It’ll be the first iteration—it won’t be perfect nor very
idiomatic by the end of the chapter. It’ll be enough of a challenge though!
To move forward you'll have to pick up several new Rust concepts, such as structs, modules, visibility, ownership, references, and destructors.
Exercise
3.1 Structs

A ticket needs to keep track of three pieces of information:

• A title
• A description
• A status
We can start by using a String to represent them. String is the type defined in Rust’s standard
library to represent UTF-8 encoded text.
But how do we combine these three pieces of information into a single entity?
Defining a struct
struct Ticket {
    title: String,
    description: String,
    status: String
}
A struct is quite similar to what you would call a class or an object in other programming languages.
Defining fields

Each field is declared with its name, followed by a colon : and its type; fields are separated by commas. For example:

struct Configuration {
    version: u32,
    active: bool
}
Instantiation
You can create an instance of a struct by specifying the values for each field:
Accessing fields
// Field access
let x = ticket.description;
Methods
impl Ticket {
    fn is_open(self) -> bool {
        self.status == "Open"
    }
}

// Syntax:
// impl <StructName> {
//     fn <method_name>(<parameters>) -> <return_type> {
//         // Method body
//     }
// }
self
If a method takes self as its first parameter, it can be called using the method call syntax:
This is the same calling syntax you used to perform saturating arithmetic operations on u32 values
in the previous chapter.
Static methods
If a method doesn’t take self as its first parameter, it’s a static method.
struct Configuration {
    version: u32,
    active: bool
}

impl Configuration {
    // `default` is a static method on `Configuration`
    fn default() -> Configuration {
        Configuration { version: 0, active: false }
    }
}
The only way to call a static method is by using the function call syntax:
Equivalence
You can use the function call syntax even for methods that take self as their first parameter:
The function call syntax makes it quite clear that ticket is being used as self, the first parameter
of the method, but it’s definitely more verbose. Prefer the method call syntax when possible.
Exercise
3.2 Validation
struct Ticket {
    title: String,
    description: String,
    status: String,
}
We are using “raw” types for the fields of our Ticket struct. This means that users can create a
ticket with an empty title, a suuuuuuuper long description or a nonsensical status (e.g. “Funny”).
We can do better than that!
Further reading
• Check out String’s documentation for a thorough overview of the methods it provides.
You’ll need it for the exercise!
Exercise
3.3 Modules
The new method you’ve just defined is trying to enforce some constraints on the field values
for Ticket. But are those invariants really enforced? What prevents a developer from creating
a Ticket without going through Ticket::new?
To get proper encapsulation you need to become familiar with two new concepts: visibility and
modules. Let’s start with modules.
What is a module?
In Rust a module is a way to group related code together, under a common namespace (i.e. the
module’s name).
You’ve already seen modules in action: the unit tests that verify the correctness of your code are
defined in a different module, named tests.
#[cfg(test)]
mod tests {
    // [...]
}
Inline modules
The tests module above is an example of an inline module: the module declaration (mod
tests) and the module contents (the stuff inside { ... }) are next to each other.
Module tree
Inline modules are useful for small pieces of code, but as your project grows you’ll want to split
your code into multiple files. In the parent module, you declare the existence of a submodule
using the mod keyword.
mod dog;
cargo, Rust's build tool, is then in charge of finding the file that contains the module implementation.
If your module is declared in the root of your crate (e.g. src/lib.rs or src/main.rs), cargo
expects the file to be named either:
• src/<module_name>.rs
• src/<module_name>/mod.rs
• [..]/<parent_module>/<module_name>.rs
• [..]/<parent_module>/<module_name>/mod.rs
You can access items defined in the same module without any special syntax. You just use their
name.
struct Ticket {
    // [...]
}
That’s not the case if you want to access an entity from a different module.
You have to use a path pointing to the entity you want to access.
You can compose the path in various ways:
Star imports
You can also import all the items from a module with a single use statement.
use crate::module_1::module_2::*;
If you’re struggling to picture the module tree of your project, you can try using cargo-modules
to visualize it!
Refer to their documentation for installation instructions and usage examples.
Exercise
3.4 Visibility
When you start breaking down your code into multiple modules, you need to start thinking about
visibility. Visibility determines which regions of your code (or other people’s code) can access
a given entity, be it a struct, a function, a field, etc.
Private by default

By default, everything in Rust is private. A private entity can only be accessed within the module where it's defined, or one of its submodules.
Visibility modifiers
You can modify the default visibility of an entity using a visibility modifier.
Some common visibility modifiers are:
• pub: makes the entity public, i.e. accessible from outside the module where it’s defined,
potentially from other crates.
• pub(crate): makes the entity public within the same crate, but not outside of it.
• pub(super): makes the entity public within the parent module.
• pub(in path::to::module): makes the entity public within the specified module.
You can use these modifiers on modules, structs, functions, fields, etc. For example:

pub struct Configuration {
    pub(crate) version: u32,
    active: bool,
}

Configuration is public, but you can only access the version field from within the same crate. The active field, instead, is private and can only be accessed from within the same module or one of its submodules.
Exercise
3.5 Encapsulation
Now that we have a basic understanding of modules and visibility, let’s circle back to encapsulation.
Encapsulation is the practice of hiding the internal representation of an object. It is most commonly used to enforce some invariants on the object’s state.
Going back to our Ticket struct:
struct Ticket {
title: String,
description: String,
status: String,
}
In summary: by combining modules, visibility and methods we can keep Ticket’s fields private and force consumers to go through functions we control, so that our validation invariants are always enforced.
That’s a good start, but it’s not enough: apart from creating a Ticket, we also need to interact with it. But how can we access the fields if they’re private?
Accessor methods
We need to provide accessor methods. Accessor methods are public methods that allow you to read the value of a private field (or fields) of a struct.
Rust doesn’t have a built-in way to generate accessor methods for you, like some other languages
do. You have to write them yourself—they’re just regular methods.
Exercise
3.6 Ownership
If you solved the previous exercise using what this course has taught you so far, your accessor
methods probably look like this:
impl Ticket {
    pub fn title(self) -> String {
        self.title
    }

    // [...] `description` and `status` follow the same pattern
}
Those methods compile and are enough to get tests to pass, but in a real-world scenario they
won’t get you very far. Consider this snippet:
if ticket.status() == "To-Do" {
// We haven't covered the `println!` macro yet,
// but for now it's enough to know that it prints
// a (templated) message to the console
println!("Your next task is: {}", ticket.title());
}
That snippet doesn’t compile: ticket.status() consumes ticket, so the subsequent ticket.title() call tries to use a value we no longer own. Rules like this one are enforced by the borrow checker, a subsystem of the Rust compiler, often the subject of jokes and memes in the Rust community.
Ownership is a key concept in Rust, and it’s what makes the language unique. Ownership enables Rust to provide memory safety without compromising performance. All these things are true at the same time for Rust:
1. There is no runtime garbage collector
2. As a developer, you rarely have to manage memory directly
3. You can’t cause dangling pointers, double frees, and other memory-related bugs
Languages like Python, JavaScript, and Java give you 2. and 3., but not 1.
Languages like C or C++ give you 1., but neither 2. nor 3.
Depending on your background, 3. might sound a bit arcane: what is a “dangling pointer”? What is a “double free”? Why are they dangerous?
Don’t worry: we’ll cover these concepts in more detail during the rest of the course.
For now, though, let’s focus on learning how to work within Rust’s ownership system.
The owner
In Rust, each value has an owner, statically determined at compile-time. There is only one owner
for each value at any given time.
Move semantics
Rust’s ownership system is baked into the type system: each function has to declare in its signature how it wants to interact with its arguments.
So far, all our methods and functions have consumed their arguments: they’ve taken ownership
of them. For example:
impl Ticket {
pub fn description(self) -> String {
self.description
}
}
In particular, this is the sequence of events that unfolds when we call ticket.status():
• ownership of the Ticket instance is transferred to the status method (via its self input)
• status extracts the status field from self and transfers its ownership to the caller
• the rest of the Ticket instance is discarded
When we try to use ticket again via ticket.title(), the compiler complains: the ticket
value is gone now, we no longer own it, therefore we can’t use it anymore.
To build useful accessor methods we need to start working with references.
Borrowing
It is desirable to have methods that can read the value of a variable without taking ownership of it. Programming would be quite limited otherwise. In Rust, that’s done via borrowing: whenever you borrow a value, you get a reference to it.
References are tagged with their permissions:
• Immutable references (&) allow you to read the value, but not to mutate it
• Mutable references (&mut) allow you to read and mutate the value
The distinction matters because Rust aims to guarantee two properties: data is never mutated while it’s being read, and data is never read while it’s being mutated. To ensure these two properties, Rust has to introduce some restrictions on references:
• You can’t have a mutable reference and an immutable reference to the same value at the
same time
• You can’t have more than one mutable reference to the same value at the same time
• The owner can’t mutate the value while it’s being borrowed
• You can have as many immutable references as you want, as long as there are no mutable
references
In a way, you can think of an immutable reference as a “read-only” lock on the value, while a
mutable reference is like a “read-write” lock.
All these restrictions are enforced at compile-time by the borrow checker.
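A short example of the rules in action; it compiles because the immutable borrows are no longer used once the mutable borrow starts:

```rust
fn main() {
    let mut value = 42;

    // Any number of immutable borrows can coexist...
    let r1 = &value;
    let r2 = &value;
    assert_eq!(*r1 + *r2, 84);

    // ...and once they're no longer used, a mutable borrow is allowed.
    let m = &mut value;
    *m += 1;
    assert_eq!(value, 43);
}
```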
Syntax
How do you borrow a value, in practice? By adding & (or &mut) in front of it:

struct Configuration {
    version: u32,
    active: bool,
}

fn main() {
    let config = Configuration {
        version: 1,
        active: true,
    };
    // `b` is a reference to the `version` field of `config`.
    // The type of `b` is `&u32`, since it contains a reference to
    // a `u32` value.
    let b = &config.version;
}

² This is a great mental model to start out, but it doesn’t capture the full picture. We’ll refine our understanding of references later in the course.
Exercise
3.7 Mutable references
Your accessor methods should look like this now:
impl Ticket {
    pub fn title(&self) -> &String {
        &self.title
    }

    // [...] `description` and `status` follow the same pattern
}
Setters
Setter methods allow users to change the values of Ticket’s private fields while making sure
that its invariants are respected (i.e. you can’t set a Ticket’s title to an empty string).
There are two common ways to implement setters in Rust:
impl Ticket {
pub fn set_title(mut self, new_title: String) -> Self {
// Validate the new title [...]
self.title = new_title;
self
}
}
It takes ownership of self, changes the title, and returns the modified Ticket instance.
This is how you’d use it:

let ticket = Ticket::new("Title".into(), "Description".into(), "To-Do".into());
let ticket = ticket.set_title("New title".into());
Since set_title takes ownership of self (i.e. it consumes it), we need to reassign the result
to a variable. In the example above we take advantage of variable shadowing to reuse the same
variable name: when you declare a new variable with the same name as an existing one, the new
variable shadows the old one. This is a common pattern in Rust code.
self-setters work quite nicely when you need to change multiple fields at once, since you can chain multiple calls together:

let ticket = ticket
    .set_title("New title".into())
    .set_description("New description".into())
    .set_status("In Progress".into());
The second approach to setters, using &mut self, looks like this instead:
impl Ticket {
pub fn set_title(&mut self, new_title: String) {
// Validate the new title [...]
self.title = new_title;
}
}
This time the method takes a mutable reference to self as input, changes the title, and that’s
it. Nothing is returned.
You’d use it like this:

let mut ticket = Ticket::new("Title".into(), "Description".into(), "To-Do".into());
ticket.set_title("New title".into());
Ownership stays with the caller, so the original ticket variable is still valid. We don’t need to
reassign the result. We need to mark ticket as mutable though, because we’re taking a mutable
reference to it.
&mut-setters have a downside: you can’t chain multiple calls together. Since they don’t return
the modified Ticket instance, you can’t call another setter on the result of the first one. You
have to call each setter separately:
ticket.set_title("New title".into());
ticket.set_description("New description".into());
ticket.set_status("In Progress".into());
Exercise
3.8 Memory layout
We’ve looked at ownership and references from an operational point of view—what you can and
can’t do with them. Now it’s a good time to take a look under the hood: let’s talk about memory.
When discussing memory, you’ll often hear people talk about the stack and the heap.
These are two different memory regions used by programs to store data.
Stack
The stack is a LIFO (last-in, first-out) data structure. When a function is called, a new stack frame is pushed onto the stack, holding the function’s arguments and local variables. When the function returns, its frame is popped off³.
+-----------------+
| frame for func1 |
+-----------------+
|
| func2 is
| called
v
+-----------------+
| frame for func2 |
+-----------------+
| frame for func1 |
+-----------------+
|
| func2
| returns
v
+-----------------+
| frame for func1 |
+-----------------+
³ If you have nested function calls, each function pushes its data onto the stack when it’s called but it doesn’t pop it off until the innermost function returns. If you have too many nested function calls, you can run out of stack space—the stack is not infinite! That’s called a stack overflow.
Rust
Rust leans on the stack as much as possible: whenever the size of a value is known at compile time, the compiler can reserve space for it in the current stack frame. Stack allocation is extremely cheap, since it only requires moving the stack pointer.
std::mem::size_of
You can verify how much space a type would take on the stack using the std::mem::size_of
function.
For a u8, for example:
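A quick check for a few integer types (these sizes are guaranteed by the language):

```rust
fn main() {
    // A `u8` takes exactly one byte on the stack.
    assert_eq!(std::mem::size_of::<u8>(), 1);
    // Larger integer types take proportionally more space.
    assert_eq!(std::mem::size_of::<u16>(), 2);
    assert_eq!(std::mem::size_of::<u32>(), 4);
}
```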
Exercise
3.9 Heap
The stack is great, but it can’t solve all our problems. What about data whose size is not known at compile time? Collections, strings, and other dynamically-sized data cannot be (entirely) stack-allocated. That’s where the heap comes in.
Heap allocations
You can visualize the heap as a big chunk of memory—a huge array, if you will.
Whenever you need to store data on the heap, you ask a special program, the allocator, to
reserve for you a subset of the heap. We call this interaction (and the memory you reserved) a
heap allocation. If the allocation succeeds, the allocator will give you a pointer to the start of
the reserved block.
No automatic de-allocation
+---+---+---+---+---+---+-...-+-...-+---+---+---+---+---+---+---+
| Allocation 1 | Free | ... | ... | Allocation N | Free |
+---+---+---+---+---+---+ ... + ... +---+---+---+---+---+---+---+
It’s the allocator’s job to keep track of which parts of the heap are in use and which are free. The
allocator won’t automatically free the memory you allocated, though: you need to be deliberate
about it, calling the allocator again to free the memory you no longer need.
Performance
The heap’s flexibility comes at a cost: heap allocations are slower than stack allocations. There’s
a lot more bookkeeping involved!
If you read articles about performance optimization you’ll often be advised to minimize heap
allocations and prefer stack-allocated data whenever possible.
When you create a local variable of type String, Rust is forced to allocate on the heap⁴: it doesn’t know in advance how much text you’re going to put in it, so it can’t reserve the right amount of space on the stack.
But a String is not entirely heap-allocated, it also keeps some data on the stack. In particular:
• The length of the string, i.e. how many bytes are in the string.
• The capacity of the string, i.e. how many bytes have been reserved on the heap.
If you run this code:

let mut s = String::with_capacity(5);

memory will be laid out like this:
+---------+--------+----------+
Stack | pointer | length | capacity |
| | | 0 | 5 |
+--|------+--------+----------+
|
|
v
+---+---+---+---+---+
Heap: | ? | ? | ? | ? | ? |
+---+---+---+---+---+
If you then push some text into it:

s.push_str("Hey");

the layout becomes:
+---------+--------+----------+
Stack | pointer | length | capacity |
| | | 3 | 5 |
+--| ----+--------+----------+
|
|
v
+---+---+---+---+---+
Heap: | H | e | y | ? | ? |
+---+---+---+---+---+
s now holds 3 bytes of text. Its length is updated to 3, but capacity remains 5. Three of the five
bytes on the heap are used to store the characters H, e, and y.
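You can observe this bookkeeping directly with String’s len and capacity methods:

```rust
fn main() {
    let mut s = String::with_capacity(5);
    // Freshly created: no text yet, but room was reserved on the heap.
    assert_eq!(s.len(), 0);
    assert!(s.capacity() >= 5); // in practice, exactly 5

    s.push_str("Hey");
    // The length grew to 3; the existing capacity was enough,
    // so no reallocation was needed.
    assert_eq!(s.len(), 3);
    assert!(s.capacity() >= 5);
}
```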
usize
How much space do we need to store pointer, length and capacity on the stack?
It depends on the architecture of the machine you’re running on.
Every memory location on your machine has an address, commonly represented as an unsigned
integer. Depending on the maximum size of the address space (i.e. how much memory your
machine can address), this integer can have a different size. Most modern machines use either
a 32-bit or a 64-bit address space.
Rust abstracts away these architecture-specific details by providing the usize type: an unsigned
integer that’s as big as the number of bytes needed to address memory on your machine. On a
32-bit machine, usize is equivalent to u32. On a 64-bit machine, it matches u64.
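You can verify the pointer-sized nature of usize in code:

```rust
fn main() {
    // `usize` matches the width of a raw pointer on the target platform.
    assert_eq!(
        std::mem::size_of::<usize>(),
        std::mem::size_of::<*const u8>()
    );
    // On a 64-bit target both are 8 bytes; on a 32-bit target, 4.
}
```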
Capacity, length and pointers are all represented as usize values in Rust⁵.
std::mem::size_of returns the amount of space a type would take on the stack, which is also
known as the size of the type.
What about the memory buffer that String is managing on the heap? Isn’t that part
of the size of String?
No!
That heap allocation is a resource that String is managing. It’s not considered to be part of the
String type by the compiler.
std::mem::size_of doesn’t know (or care) about additional heap-allocated data that a type
might manage or refer to via pointers, as is the case with String, therefore it doesn’t track its
size.
Unfortunately there is no equivalent of std::mem::size_of to measure the amount of heap
memory that a certain value is allocating at runtime. Some types might provide methods to
inspect their heap usage (e.g. String’s capacity method), but there is no general-purpose
“API” to retrieve runtime heap usage in Rust.
You can, however, use a memory profiler tool (e.g. DHAT or a custom allocator) to inspect the
heap usage of your program.
Exercise
⁵ The size of a pointer depends on the operating system too. In certain environments, a pointer is larger than a memory address (e.g. CHERI). Rust makes the simplifying assumption that pointers are the same size as memory addresses, which is true for most modern systems you’re likely to encounter.
3.10 References
What about references, like &String or &mut String? How are they represented in memory?
Most references⁶ in Rust are represented, in memory, as a pointer to a memory location.
It follows that their size is the same as the size of a pointer, a usize.
You can verify this using std::mem::size_of. On a 64-bit machine, where a pointer takes 8 bytes:

assert_eq!(std::mem::size_of::<&String>(), 8);
assert_eq!(std::mem::size_of::<&mut String>(), 8);
A &String, in particular, is a pointer to the memory location where the String’s metadata is
stored.
If you run this snippet:
let s = String::from("Hey");
let r = &s;
--------------------------------------
| |
+----v----+--------+----------+ +----|----+
Stack | pointer | length | capacity | | pointer |
| | | 3 | 5 | | |
+--| ----+--------+----------+ +---------+
| s r
|
v
+---+---+---+---+---+
Heap | H | e | y | ? | ? |
+---+---+---+---+---+
It’s a pointer to a pointer to the heap-allocated data, if you will. The same goes for &mut String.
The example above should clarify one thing: not all pointers point to the heap.
They just point to a memory location, which may be on the heap, but doesn’t have to be.
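A tiny illustration: the reference below points at a stack location, and no heap allocation is involved:

```rust
fn main() {
    // `x` lives on the stack: no heap allocation happens here.
    let x = 42u32;
    // `r` is a pointer to a *stack* location.
    let r = &x;
    assert_eq!(*r, 42);
}
```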
Exercise
⁶ Later in the course we’ll talk about fat pointers, i.e. pointers with additional metadata. As the name implies, they are larger than the pointers we discussed in this chapter, also known as thin pointers.
3.11 Destructors
When introducing the heap, we mentioned that you’re responsible for freeing the memory you
allocate.
When introducing the borrow-checker, we also stated that you rarely have to manage memory
directly in Rust.
These two statements might seem contradictory at first. Let’s see how they fit together by
introducing scopes and destructors.
Scopes
The scope of a variable is the region of Rust code where that variable is valid, or alive.
The scope of a variable starts with its declaration. It ends when one of the following happens:
1. the block (i.e. the code between {}) where the variable was declared ends
fn main() {
// `x` is not yet in scope here
let y = "Hello".to_string();
let x = "World".to_string(); // <-- x's scope starts here...
let h = "!".to_string(); // |
} // <-------------- ...and ends here
2. ownership of the variable is transferred to someone else (e.g. a function or another variable)
fn compute(t: String) {
// Do something [...]
}
fn main() {
let s = "Hello".to_string(); // <-- s's scope starts here...
// |
compute(s); // <------------------- ..and ends here
// because `s` is moved into `compute`
}
Destructors
When the owner of a value goes out of scope, Rust invokes its destructor.
The destructor tries to clean up the resources used by that value—in particular, whatever memory
it allocated.
You can manually invoke the destructor of a value by passing it to std::mem::drop.
That’s why you’ll often hear Rust developers saying “that value has been dropped” as a way to
state that a value has gone out of scope and its destructor has been invoked.
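The trait behind destructors is Drop, which the course hasn’t formally introduced at this point; this sketch just shows the mechanics:

```rust
struct Guard(u32);

// `Drop` is the trait behind destructors: the compiler calls `drop`
// automatically when a `Guard` value goes out of scope.
impl Drop for Guard {
    fn drop(&mut self) {
        println!("dropping Guard({})", self.0);
    }
}

fn main() {
    let _a = Guard(1);
    let _b = Guard(2);
    println!("end of main");
    // Output order:
    //   end of main
    //   dropping Guard(2)
    //   dropping Guard(1)
    // Variables are dropped in reverse order of declaration.
}
```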
We can insert explicit calls to drop to “spell out” what the compiler does for us. Going back to
the previous example:
fn main() {
let y = "Hello".to_string();
let x = "World".to_string();
let h = "!".to_string();
}

It is equivalent to:
fn main() {
let y = "Hello".to_string();
let x = "World".to_string();
let h = "!".to_string();
// Variables are dropped in reverse order of declaration
drop(h);
drop(x);
drop(y);
}
Let’s look at the second example instead, where s’s ownership is transferred to compute:
fn compute(s: String) {
// Do something [...]
}
fn main() {
let s = "Hello".to_string();
compute(s);
}

It is equivalent to:
fn compute(t: String) {
// Do something [...]
drop(t); // <-- Assuming `t` wasn't dropped or moved
// before this point, the compiler will call
// `drop` here, when it goes out of scope
}
fn main() {
let s = "Hello".to_string();
compute(s);
}
Notice the difference: even though s is no longer valid after compute is called in main, there is no
drop(s) in main. When you transfer ownership of a value to a function, you’re also transferring
the responsibility of cleaning it up.
This ensures that the destructor for a value is called at most⁷ once, preventing double free bugs by design.
What happens if you try to use a value after it’s been dropped?
let x = "Hello".to_string();
drop(x);
println!("{}", x); // error: borrow of moved value: `x`
Drop consumes the value it’s called on, meaning that the value is no longer valid after the call.
The compiler will therefore prevent you from using it, avoiding use-after-free bugs.
Dropping references
What if a variable contains a reference? For example:

let x = 42i32;
let y = &x;
drop(y);

When drop(y) is invoked, it’s the reference itself that gets discarded, not the value it points to: x remains valid, and its destructor only runs when x goes out of scope.
⁷ Rust doesn’t guarantee that destructors will run. They won’t, for example, if you explicitly choose to leak memory.
It goes back to what we said earlier: we only want to call the destructor once.
You can have multiple references to the same value—if we called the destructor for the value
they point at when one of them goes out of scope, what would happen to the others? They would
refer to a memory location that’s no longer valid: a so-called dangling pointer, a close relative
of use-after-free bugs. Rust’s ownership system rules out these kinds of bugs by design.
Exercise
3.12 Wrapping up
Exercise
4 Traits
In the previous chapter we covered the basics of Rust’s type and ownership system.
It’s time to dig deeper: we’ll explore traits, Rust’s take on interfaces.
Once you learn about traits, you’ll start seeing their fingerprints all over the place.
In fact, you’ve already seen traits in action throughout the previous chapter, e.g. .into() invocations as well as operators like == and +.
On top of traits as a concept, we’ll also cover some of the key traits that are defined in Rust’s standard library:
• operator traits (e.g. Add, Sub, PartialEq, etc.)
• From and Into, for infallible conversions
• Clone and Copy, for copying values
• Deref and deref coercion
• Sized, to mark types with a known size
• Drop, for custom cleanup logic
Since we’ll be talking about conversions, we’ll seize the opportunity to plug some of the “knowledge gaps” from the previous chapter—e.g. what is "A title", exactly? Time to learn more about slices too!
Exercise
4.1 Traits
All our tests, so far, have been making assertions using Ticket’s fields.
Ticket is a new type. Out of the box, there is no behavior attached to it.
Rust doesn’t magically infer how to compare two Ticket instances just because they contain
Strings.
The Rust compiler is nudging us in the right direction though: it’s suggesting that we might be
missing an implementation of PartialEq. PartialEq is a trait!
Defining a trait
The syntax for a trait definition goes like this:
trait <TraitName> {
fn <method_name>(<parameters>) -> <return_type>;
}
We might, for example, define a trait named MaybeZero that requires its implementors to define
an is_zero method:
trait MaybeZero {
fn is_zero(self) -> bool;
}
Implementing a trait
To implement a trait for a type we use the impl keyword, just like we do for regular methods, but the syntax is a bit different:

impl <TraitName> for <TypeName> {
    fn <method_name>(<parameters>) -> <return_type> {
        // Method body
    }
}
For example, to implement the MaybeZero trait for a custom number type, WrappingU32:
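A sketch of how that might look; the exact definition of WrappingU32 (a struct wrapping a u32 in an inner field) is an assumption here:

```rust
trait MaybeZero {
    fn is_zero(self) -> bool;
}

// Assumed shape: a struct wrapping a `u32`.
pub struct WrappingU32 {
    inner: u32,
}

impl MaybeZero for WrappingU32 {
    fn is_zero(self) -> bool {
        self.inner == 0
    }
}

fn main() {
    let zero = WrappingU32 { inner: 0 };
    assert!(zero.is_zero());

    let five = WrappingU32 { inner: 5 };
    assert!(!five.is_zero());
}
```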
To invoke a trait method, we use the . operator, just like we do with regular methods. For the invocation to work, though, the trait must be implemented for the type and the trait itself must be in scope. To satisfy the latter, you may have to add a use statement for the trait:
use crate::MaybeZero;
A use statement is not necessary if:
• The trait is defined in the same module where the invocation occurs.
• The trait is defined in the standard library’s prelude. The prelude is a set of traits and types that are automatically imported into every Rust program. It’s as if use std::prelude::*; was added at the beginning of every Rust module.
You can find the list of traits and types in the prelude in the Rust documentation.
Exercise
4.2 Implementing traits
When a type is defined in another crate (e.g. u32, from Rust’s standard library), you can’t directly
define new methods for it. If you try:
impl u32 {
    fn is_even(&self) -> bool {
        self % 2 == 0
    }
}

the compiler will refuse to compile your code: inherent impl blocks (impls without a trait) are only allowed for types defined in the current crate. To attach new methods to a foreign type, you need a trait.
Extension trait
An extension trait is a trait whose primary purpose is to attach new methods to foreign types,
such as u32. That’s exactly the pattern you deployed in the previous exercise, by defining the
IsEven trait and then implementing it for i32 and u32. You are then free to call is_even on
those types as long as IsEven is in scope.
fn main() {
// Invoke its method on a type that implements it
if 4.is_even() {
// [...]
}
}
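Putting the pieces together, here’s a complete, runnable version of the pattern in a single file:

```rust
// A self-contained sketch of the extension-trait pattern.
trait IsEven {
    fn is_even(&self) -> bool;
}

// Attach the method to a foreign type (`u32` comes from std).
impl IsEven for u32 {
    fn is_even(&self) -> bool {
        self % 2 == 0
    }
}

fn main() {
    // The trait is in scope (defined in this module),
    // so the method is callable on any `u32`.
    assert!(4u32.is_even());
    assert!(!3u32.is_even());
}
```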
One implementation
trait IsEven {
fn is_even(&self) -> bool;
}
Within a single crate, you can’t define two implementations of the same trait for the same type. There can be no ambiguity as to which trait implementation should be used when IsEven::is_even is invoked on a u32 value, therefore there can only be one.
Orphan rule
Things get more nuanced when multiple crates are involved. In particular, at least one of the following must be true:
• the trait is defined in the current crate, or
• the type you’re implementing it for is defined in the current crate.
This is known as Rust’s orphan rule. Its goal is to make the method resolution process unambiguous.
Imagine the following situation:
• Crate A defines the IsEven trait
• Crate B implements IsEven for u32
• Crate C also implements IsEven for u32
• Crate D depends on both B and C and calls 4.is_even()
Which implementation should be used? The one defined in B? Or the one defined in C? There’s no good answer, therefore the orphan rule was defined to prevent this scenario. Thanks to the orphan rule, neither crate B nor crate C would compile.
Further reading
• There are some caveats and exceptions to the orphan rule as stated above. Check out the
reference if you want to get familiar with its nuances.
Exercise
4.3 Operator overloading
Now that we have a basic understanding of what traits are, let’s circle back to operator overloading. Operator overloading is the ability to define custom behavior for operators like +, -, *, /, ==, !=, etc.
In Rust, operators are traits: each operator is syntactic sugar for a method of a corresponding trait. For example, a simplified view of PartialEq:

pub trait PartialEq {
    // Required method
    fn eq(&self, other: &Self) -> bool;

    // Provided method
    fn ne(&self, other: &Self) -> bool { ... }
}
When you write x == y the compiler will look for an implementation of the PartialEq trait for
the types of x and y and replace x == y with x.eq(y). It’s syntactic sugar!
This is the correspondence for the main operators:
Operator Trait
+ Add
- Sub
* Mul
/ Div
% Rem
== and != PartialEq
<, >, <=, and >= PartialOrd
Arithmetic operators live in the std::ops module, while comparison ones live in the std::cmp
module.
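As an illustration, here’s Add implemented for a made-up Meters type (not one of the book’s exercise types):

```rust
use std::ops::Add;

// `Meters` is a hypothetical example type.
#[derive(Debug, PartialEq)]
struct Meters(u32);

impl Add for Meters {
    type Output = Meters;

    fn add(self, other: Meters) -> Meters {
        Meters(self.0 + other.0)
    }
}

fn main() {
    // `Meters(2) + Meters(3)` desugars to `Meters(2).add(Meters(3))`.
    assert_eq!(Meters(2) + Meters(3), Meters(5));
}
```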
Default implementations
The comment on ne says “provided method”: PartialEq ships a default implementation of ne in the trait definition (roughly !self.eq(other)), so implementors only need to define eq. That’s what happens, for example, when you implement PartialEq for a wrapper type:

struct WrappingU8 {
    inner: u8,
}
You are not forced to use the default implementation though. You can choose to override it when
you implement the trait:
struct MyType;

impl PartialEq for MyType {
    fn eq(&self, other: &MyType) -> bool {
        // Custom implementation
        // [...]
    }

    fn ne(&self, other: &MyType) -> bool {
        // Custom implementation
        // [...]
    }
}
Exercise
4.4 Derive macros
Implementing PartialEq for Ticket was a bit tedious, wasn’t it? You had to manually compare
each field of the struct.
Destructuring syntax
Furthermore, the implementation is brittle: if the struct definition changes (e.g. a new field is
added), you have to remember to update the PartialEq implementation.
You can mitigate the risk by destructuring the struct into its fields:
If the definition of Ticket changes, the compiler will error out, complaining that your destructuring is no longer exhaustive.
You can also rename struct fields, to avoid variable shadowing:
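Here’s a sketch that combines both ideas: exhaustive destructuring, with renamed fields on one side to avoid shadowing:

```rust
struct Ticket {
    title: String,
    description: String,
    status: String,
}

impl PartialEq for Ticket {
    fn eq(&self, other: &Self) -> bool {
        // If a field is added to `Ticket`, these patterns stop being
        // exhaustive and the compiler rejects the implementation.
        let Ticket {
            title,
            description,
            status,
        } = self;
        // Rename the fields on the right-hand side to avoid shadowing.
        let Ticket {
            title: other_title,
            description: other_description,
            status: other_status,
        } = other;
        title == other_title
            && description == other_description
            && status == other_status
    }
}

fn main() {
    let a = Ticket {
        title: "A".into(),
        description: "B".into(),
        status: "To-Do".into(),
    };
    let b = Ticket {
        title: "A".into(),
        description: "B".into(),
        status: "To-Do".into(),
    };
    assert!(a == b);
}
```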
Destructuring is a useful pattern to have in your toolkit, but there’s an even more convenient way
to do this: derive macros.
Macros
#[derive(PartialEq)], which you’ll see below, is a macro invocation. You can think of macros as code generators: based on the input you provide, they generate new Rust code at compile time, and that generated code is then compiled alongside the rest of your project.
Inspection
Some IDEs let you expand a macro to inspect the generated code. If that’s not possible, you can
use cargo-expand.
Derive macros
A derive macro is a particular flavour of macro, specified as an attribute on top of a struct definition:
#[derive(PartialEq)]
struct Ticket {
title: String,
description: String,
status: String
}
Derive macros are used to automate the implementation of common (and “obvious”) traits for
custom types. In the example above, the PartialEq trait is automatically implemented for
Ticket. If you expand the macro, you’ll see that the generated code is functionally equivalent
to the one you wrote manually, although a bit more cumbersome to read:
#[automatically_derived]
impl ::core::cmp::PartialEq for Ticket {
#[inline]
fn eq(&self, other: &Ticket) -> bool {
self.title == other.title
&& self.description == other.description
&& self.status == other.status
}
}
Further reading
Exercise
4.5 Trait bounds
The problem
All our functions and methods, so far, have been working with concrete types.
Code that operates on concrete types is usually straightforward to write and understand. But it’s
also limited in its reusability.
Let’s imagine, for example, that we want to write a function that returns true if an integer is
even. Working with concrete types, we’d have to write a separate function for each integer type
we want to support:

fn is_even_i32(n: i32) -> bool {
    n % 2 == 0
}

fn is_even_i64(n: i64) -> bool {
    n % 2 == 0
}

// Etc.
Alternatively, we could write a single extension trait and then a different implementation of that trait for each integer type:

trait IsEven {
    fn is_even(&self) -> bool;
}

impl IsEven for i32 {
    fn is_even(&self) -> bool {
        self % 2 == 0
    }
}

impl IsEven for i64 {
    fn is_even(&self) -> bool {
        self % 2 == 0
    }
}

// Etc.
Generic programming
We can do better using generics. Generics allow us to write code that works with a type parameter instead of a concrete type:
fn print_if_even<T>(n: T)
where
T: IsEven + Debug
{
if n.is_even() {
println!("{n:?} is even");
}
}
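A self-contained, runnable version of the example, with an IsEven implementation for i32 assumed for illustration:

```rust
use std::fmt::Debug;

trait IsEven {
    fn is_even(&self) -> bool;
}

impl IsEven for i32 {
    fn is_even(&self) -> bool {
        self % 2 == 0
    }
}

// `T` can be any type that implements both `IsEven` and `Debug`.
fn print_if_even<T>(n: T)
where
    T: IsEven + Debug,
{
    if n.is_even() {
        println!("{n:?} is even");
    }
}

fn main() {
    print_if_even(4); // prints "4 is even"
    print_if_even(3); // prints nothing
}
```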
Trait bounds
What purpose does IsEven + Debug serve in the signature above? It’s a trait bound: it restricts the set of types T can stand for. If you remove the bound, the example no longer compiles:

fn print_if_even<T>(n: T) {
    if n.is_even() {
        println!("{n:?} is even");
    }
}

The compiler rejects it with an error along these lines:
3 | println!("{n:?} is even");
| ^^^^^
| `T` cannot be formatted using `{:?}` because
| it doesn't implement `Debug`
|
help: consider restricting type parameter `T`
|
1 | fn print_if_even<T: std::fmt::Debug>(n: T) {
| +++++++++++++++++
Without trait bounds, the compiler doesn’t know what T can do.
It doesn’t know that T has an is_even method, and it doesn’t know how to format T for printing.
From the compiler point of view, a bare T has no behaviour at all.
Trait bounds restrict the set of types that can be used by ensuring that the behaviour required
by the function body is present.
All the examples above used a where clause to specify trait bounds:
fn print_if_even<T>(n: T)
where
T: IsEven + Debug
// ^^^^^^^^^^^^^^^^^
// This is a `where` clause
{
// [...]
}
If the trait bounds are simple, you can inline them directly next to the type parameter:

fn print_if_even<T: IsEven + Debug>(n: T) {
    // [...]
}
In the examples above, we used T as the type parameter name. This is a common convention
when a function has only one type parameter.
Nothing stops you from using a more meaningful name, though:

fn print_if_even<Number: IsEven + Debug>(n: Number) {
    // [...]
}
It is actually desirable to use meaningful names when there are multiple type parameters at play
or when the name T doesn’t convey enough information about the type’s role in the function.
Maximize clarity and readability when naming type parameters, just as you would with variables or function parameters. Follow Rust’s conventions, though: use upper camel case for type parameter names.
You may wonder why we need trait bounds at all. Can’t the compiler infer the required traits from
the function’s body?
It could, but it won’t.
The rationale is the same as for explicit type annotations on function parameters: each function
signature is a contract between the caller and the callee, and the terms must be explicitly stated.
This allows for better error messages, better documentation, less unintentional breakages across
versions, and faster compilation times.
Exercise
4.6 String slices
Throughout the previous chapters you’ve seen quite a few string literals being used in the
code, like "To-Do" or "A ticket description". They were always followed by a call to
.to_string() or .into(). It’s time to understand why!
String literals
You define a string literal by enclosing the raw text in double quotes:

let s = "Hello";
Memory layout
We’ve already looked at how a String is laid out. If you run:

let mut s = String::with_capacity(5);
s.push_str("Hello");

memory is organized like this:
+---------+--------+----------+
Stack | pointer | length | capacity |
| | | 5 | 5 |
+--|------+--------+----------+
|
|
v
+---+---+---+---+---+
Heap: | H | e | l | l | o |
+---+---+---+---+---+
If you remember, we’ve also examined how a &String is laid out in memory:
--------------------------------------
| |
+----v----+--------+----------+ +----|----+
| pointer | length | capacity | | pointer |
| | | 5 | 5 | | |
+----|----+--------+----------+ +---------+
| s &s
|
v
+---+---+---+---+---+
| H | e | l | l | o |
+---+---+---+---+---+
&String points to the memory location where the String’s metadata is stored.
If we follow the pointer, we get to the heap-allocated data. In particular, we get to the first byte
of the string, H.
What if we wanted a type that represents a substring of s? E.g. ello in Hello?
String slices
A &str is a view into a string, a reference to a sequence of UTF-8 bytes stored elsewhere. You can, for example, create a &str from a String like this:

let mut s = String::with_capacity(5);
s.push_str("Hello");
// Create a string slice referencing the last four characters
let slice: &str = &s[1..];

In memory, it looks like this:
s slice
+---------+--------+----------+ +---------+--------+
Stack | pointer | length | capacity | | pointer | length |
| | | 5 | 5 | | | | 4 |
+----|----+--------+----------+ +----|----+--------+
| s |
| |
v |
+---+---+---+---+---+ |
Heap: | H | e | l | l | o | |
+---+---+---+---+---+ |
^ |
| |
+--------------------------------+
slice doesn’t own the data, it just points to it: it’s a reference to the String’s heap-allocated
data.
When slice is dropped, the heap-allocated data won’t be deallocated, because it’s still owned by s. That’s why slice doesn’t have a capacity field: it doesn’t own the data, so it doesn’t need to know how much space was reserved for it; it only cares about the data it references.
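A small runnable example of the same slicing operation:

```rust
fn main() {
    let mut s = String::with_capacity(5);
    s.push_str("Hello");

    // Borrow everything after the first byte: "ello".
    let slice: &str = &s[1..];
    assert_eq!(slice, "ello");
    assert_eq!(slice.len(), 4);

    // `s` still owns the heap buffer and is fully usable afterwards.
    assert_eq!(s, "Hello");
}
```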
&str vs &String
As a rule of thumb, use &str rather than &String whenever you need a reference to textual
data.
&str is more flexible and generally considered more idiomatic in Rust code.
If a method returns a &String, you’re promising that there is heap-allocated UTF-8 text somewhere that matches exactly the one you’re returning a reference to.
If a method returns a &str, instead, you have a lot more freedom: you’re just saying that somewhere there’s a bunch of text data and that a subset of it matches what you need, therefore you’re returning a reference to it.
Exercise
4.7 Deref trait
In the previous exercise, going from
impl Ticket {
pub fn title(&self) -> &String {
&self.title
}
}
to
impl Ticket {
pub fn title(&self) -> &str {
&self.title
}
}
was all you needed to do to get the code to compile and the tests to pass. Some alarm bells
should be ringing in your head though.
• self.title is a String
• &self.title is, therefore, a &String
• The output of the (modified) title method is &str
You would expect a compiler error, wouldn’t you? Expected &String, found &str or some-
thing similar. Instead, it just works. Why?
The Deref trait is the mechanism behind the language feature known as deref coercion.
The trait is defined in the standard library, in the std::ops module:

pub trait Deref {
    type Target;

    fn deref(&self) -> &Self::Target;
}
Deref coercion
By implementing Deref<Target = U> for a type T you’re telling the compiler that &T and &U are
somewhat interchangeable.
In particular, you get the following behavior:
• References to T are implicitly converted into references to U (i.e. &T becomes &U)
• You can call on &T all the methods defined on U that take &self as input.
There is one more thing around the dereference operator, *, but we don’t need it yet (see std’s
docs if you’re curious).
String implements Deref with str as its target type:

impl Deref for String {
    type Target = str;

    fn deref(&self) -> &str {
        // [...]
    }
}

Thanks to this implementation and deref coercion, a &String is automatically converted into a &str when needed.
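A short demonstration of deref coercion at work:

```rust
fn takes_str(s: &str) -> usize {
    s.len()
}

fn main() {
    let owned = String::from("Hello");

    // `&String` is coerced to `&str` at the call site.
    assert_eq!(takes_str(&owned), 5);

    // Methods defined on `str` can be called directly on a `String`:
    assert!(owned.starts_with("He"));
}
```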
Exercise
4.8 Sized
There’s more to &str than meets the eye, even after having investigated deref coercion.
From our previous discussion on memory layouts, it would have been reasonable to expect &str
to be represented as a single usize on the stack, a pointer. That’s not the case though. &str
stores some metadata next to the pointer: the length of the slice it points to. Going back to the
example from a previous section:

let mut s = String::with_capacity(5);
s.push_str("Hello");
let slice: &str = &s[1..];

In memory, we get:
s slice
+---------+--------+----------+ +---------+--------+
Stack | pointer | length | capacity | | pointer | length |
| | | 5 | 5 | | | | 4 |
+----|----+--------+----------+ +----|----+--------+
| s |
| |
v |
+---+---+---+---+---+ |
Heap: | H | e | l | l | o | |
+---+---+---+---+---+ |
^ |
| |
+--------------------------------+
The length is there because str, the type &str points to, is a dynamically sized type (DST): its size isn’t known at compile time, since it depends on how many bytes the slice covers. Rust captures the “size known at compile time” property with a dedicated trait, Sized.
A type is Sized if its size is known at compile time. In other words, it’s not a DST.
Marker traits
Sized is our first example of a marker trait: a trait that doesn’t require any methods. It doesn’t define new behaviour; it’s only used to “mark” a type as having certain properties, which the compiler can then rely on.
Auto traits
Sized is also an auto trait: you don’t need to implement it explicitly; the compiler implements it automatically for you, based on the type’s definition.
Examples
All the types we’ve seen so far are Sized: u32, String, bool, etc.
str, as we just saw, is not Sized.
&str is Sized though! We know its size at compile time: two usizes, one for the pointer and
one for the length.
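You can confirm this with std::mem::size_of:

```rust
fn main() {
    let usize_width = std::mem::size_of::<usize>();

    // `&str` is a "fat" pointer: a pointer plus a length.
    assert_eq!(std::mem::size_of::<&str>(), 2 * usize_width);

    // A `&String`, by contrast, is a plain (thin) pointer.
    assert_eq!(std::mem::size_of::<&String>(), usize_width);
}
```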
Exercise
4.9 From and Into
The problem
impl Ticket {
pub fn new(
title: String,
description: String,
status: String
) -> Self {
// [...]
}
}
We’ve also seen that string literals (such as "A title") are of type &str.
We have a type mismatch here: a String is expected, but we have a &str. No magical coercion
will come to save us this time; we need to perform a conversion.
The Rust standard library defines two traits for infallible conversions: From and Into, in the
std::convert module.
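Their definitions in std::convert look like this (doc comments elided):

```rust
pub trait From<T>: Sized {
    fn from(value: T) -> Self;
}

pub trait Into<T>: Sized {
    fn into(self) -> T;
}
```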
These trait definitions showcase a few concepts that we haven’t seen before: supertraits and
implicit trait bounds. Let’s unpack those first.
Supertrait / Subtrait
The From: Sized syntax implies that From is a subtrait of Sized: any type that implements
From must also implement Sized. Alternatively, you could say that Sized is a supertrait of
From.
Every time you have a generic type parameter, the compiler implicitly assumes that it’s Sized.
For example:
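For instance, these two struct definitions are equivalent; the second is renamed here only so both can compile side by side:

```rust
// What you write:
pub struct Foo<T> {
    inner: T,
}

// What the compiler sees (renamed so both definitions coexist):
pub struct FooDesugared<T: Sized> {
    inner: T,
}
```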
In other words, both T and the type implementing From<T> must be Sized, even though the
former bound is implicit.
You can opt out of the implicit Sized bound with a negative trait bound:
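A sketch of the opt-out, using the ?Sized syntax (also known as a relaxed bound):

```rust
// `T` may or may not be `Sized`; an unsized `T` must be the last field.
pub struct Foo<T: ?Sized> {
    inner: T,
}
```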
This syntax reads as “T may or may not be Sized”, and it allows you to bind T to a DST (e.g.
Foo<str>). It is a special case, though: negative trait bounds are exclusive to Sized, you can’t
use them with other traits.
&str to String
In std’s documentation you can see which std types implement the From trait.
You’ll find that String implements From<&str> for String. Thus, we can write:
If a type U implements From<T>, then Into<U> for T is automatically implemented. That’s why
we can write let title = "A title".into();.
.into()
Every time you see .into(), you’re witnessing a conversion between types.
What’s the target type, though?
In most cases, the target type is either:
• the type of the variable you're assigning to, when it carries an explicit type annotation (e.g. let title: String = "A title".into(););
• the type of a function or method parameter the result is passed to.
.into() will work out of the box as long as the compiler can infer the target type from the context without ambiguity.
Exercise
Let’s re-examine the definition for two of the traits we studied so far, From and Deref:
Due to how deref coercion works, there can only be one “target” type for a given type. E.g.
String can only deref to str. It’s about avoiding ambiguity: if you could implement Deref
multiple times for a type, which Target type should the compiler choose when you call a &self
method?
That’s why Deref uses an associated type, Target.
An associated type is uniquely determined by the trait implementation. Since you can't implement Deref more than once, you'll only be able to specify one Target for a given type and there won't be any ambiguity.
Generic traits
On the other hand, you can implement From multiple times for a type, as long as the input type
T is different. For example, you can implement From for WrappingU32 using both u32 and u16
as input types:
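A sketch of the two implementations (the book's exact code may differ):

```rust
pub struct WrappingU32 {
    value: u32,
}

impl From<u32> for WrappingU32 {
    fn from(value: u32) -> Self {
        WrappingU32 { value }
    }
}

impl From<u16> for WrappingU32 {
    fn from(value: u16) -> Self {
        // Widen the `u16` to a `u32` first.
        WrappingU32 { value: value.into() }
    }
}
```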
This works because From<u16> and From<u32> are considered different traits.
There is no ambiguity: the compiler can determine which implementation to use based on the type of the value being converted.
As a closing example, consider the Add trait from the standard library:
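Slightly abridged, its definition is (newer std sources spell the generic parameter Rhs):

```rust
pub trait Add<RHS = Self> {
    type Output;

    fn add(self, rhs: RHS) -> Self::Output;
}
```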
RHS
RHS is a generic parameter with a default value: if you don't specify it, it falls back to Self, the type you're implementing the trait for.
Output
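The paragraph below discusses std's implementation of addition between a &u32 and a u32. A self-contained sketch, using a local stand-in for the Add trait (the real impl lives in std::ops):

```rust
// Local stand-in for `std::ops::Add`.
trait Add<RHS = Self> {
    type Output;
    fn add(self, rhs: RHS) -> Self::Output;
}

// Roughly how std implements `&u32 + u32`:
impl Add<u32> for &u32 {
    type Output = u32;

    fn add(self, rhs: u32) -> u32 {
        *self + rhs
    }
}
```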
The type they’re implementing the trait for is &u32, but the result of the addition is u32.
It would be impossible² to provide this implementation if add had to return Self, i.e. &u32 in
this case. Output lets std decouple the implementor from the return type, thus supporting this
case.
On the other hand, Output can’t be a generic parameter. The output type of the operation must
be uniquely determined once the types of the operands are known. That’s why it’s an associated
type: for a given combination of implementor and generic parameters, there is only one Output
type.
Conclusion
To recap:
• Use an associated type when the type must be uniquely determined for a given trait implementation.
• Use a generic parameter when you want to allow multiple implementations of the trait for
the same type, with different input types.
Exercise
² Flexibility is rarely free: the trait definition is more complex due to Output, and implementors have to reason about what they want to return. The trade-off is only justified if that flexibility is actually needed. Keep that in mind when designing your own traits.
4.11 Copying values, pt. 1
fn consumer(s: String) { /* */ }
fn example() {
let mut s = String::from("hello");
consumer(s);
s.push_str(", world!"); // error: value borrowed here after move
}
Clone
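Clone is a trait from Rust's standard library; abridged (the real definition also provides a defaulted clone_from method):

```rust
pub trait Clone: Sized {
    fn clone(&self) -> Self;
}
```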
Its method, clone, takes a reference to self and returns a new owned instance of the same
type.
In action
Going back to the example above, we can use clone to create a new String instance before
calling consumer:
fn consumer(s: String) { /* */ }
fn example() {
let mut s = String::from("hello");
let t = s.clone();
consumer(t);
s.push_str(", world!"); // no error
}
Instead of giving ownership of s to consumer, we create a new String (by cloning s) and give
that to consumer instead.
s remains valid and usable after the call to consumer.
In memory
Let’s look at what happened in memory in the example above. When let mut s =
String::from("hello"); is executed, the memory looks like this:
s
+---------+--------+----------+
Stack | pointer | length | capacity |
| | | 5 | 5 |
+--|------+--------+----------+
|
|
v
+---+---+---+---+---+
Heap: | H | e | l | l | o |
+---+---+---+---+---+
When let t = s.clone() is executed, a whole new region is allocated on the heap to store a
copy of the data:
s t
+---------+--------+----------+ +---------+--------+----------+
Stack | pointer | length | capacity | | pointer | length | capacity |
| | | 5 | 5 | | | | 5 | 5 |
+--|------+--------+----------+ +--|------+--------+----------+
| |
| |
v v
+---+---+---+---+---+ +---+---+---+---+---+
Heap: | H | e | l | l | o | | H | e | l | l | o |
+---+---+---+---+---+ +---+---+---+---+---+
If you’re coming from a language like Java, you can think of clone as a way to create a deep
copy of an object.
Implementing Clone
To make a type Clone-able, we have to implement the Clone trait for it.
You almost always implement Clone by deriving it:
#[derive(Clone)]
struct MyType {
// fields
}
The compiler implements Clone for MyType as you would expect: it clones each field of MyType
individually and then constructs a new MyType instance using the cloned fields.
Remember that you can use cargo expand (or your IDE) to explore the code generated by
derive macros.
Exercise
4.12 Copying values, pt. 2
Let’s consider the same example as before, but with a slight twist: using u32 rather than String
as a type.
fn consumer(s: u32) { /* */ }
fn example() {
let s: u32 = 5;
consumer(s);
let t = s + 1;
}
It’ll compile without errors! What’s going on here? What’s the difference between String and
u32 that makes the latter work without .clone()?
Copy
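Copy is another trait from the standard library (std::marker); it's a marker trait with no methods of its own:

```rust
pub trait Copy: Clone {}
```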
1. The type doesn’t manage any additional resources (e.g. heap memory, file handles, etc.)
beyond the std::mem::size_of bytes that it occupies in memory.
2. The type is not a mutable reference (&mut T).
If both conditions are met, then Rust can safely create a new instance of the type by performing
a bitwise copy of the original instance—this is often referred to as a memcpy operation, after the
C standard library function that performs the bitwise copy.
If String implemented Copy, assigning one String to another would duplicate only its stack part (pointer, length and capacity) bit by bit, and both copies would end up pointing at the same heap buffer:
s copied_s
+---------+--------+----------+ +---------+--------+----------+
| pointer | length | capacity | | pointer | length | capacity |
| | | 5 | 5 | | | | 5 | 5 |
+--|------+--------+----------+ +--|------+--------+----------+
| |
| |
v |
+---+---+---+---+---+ |
| H | e | l | l | o | |
+---+---+---+---+---+ |
^ |
| |
+------------------------------------+
This is bad! Both String instances would try to free the memory buffer when they go out of
scope, leading to a double-free error. You could also create two distinct &mut String references
that point to the same memory buffer, violating Rust’s borrowing rules.
When we introduced ownership and mutable borrows, we stated one rule quite clearly: there
can only ever be one mutable borrow of a value at any given time.
That’s why &mut u32 doesn’t implement Copy, even though u32 does.
If &mut u32 implemented Copy, you could create multiple mutable references to the same value
and modify it in multiple places at the same time. That’d be a violation of Rust’s borrowing rules!
It follows that &mut T never implements Copy, no matter what T is.
Implementing Copy
In most cases, you don’t need to manually implement Copy. You can just derive it, like this:
#[derive(Copy, Clone)]
struct MyStruct {
field: u32,
}
Exercise
The Drop trait is a mechanism for you to define additional cleanup logic for your types, beyond
what the compiler does for you automatically.
Whatever you put in the drop method will be executed when the value goes out of scope.
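The trait itself, as defined in std::ops:

```rust
pub trait Drop {
    fn drop(&mut self);
}
```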
When talking about the Copy trait, we said that a type can't implement Copy if it manages additional resources beyond the std::mem::size_of bytes that it occupies in memory.
You might wonder: how does the compiler know if a type manages additional resources? That’s
right: Drop trait implementations!
If your type has an explicit Drop implementation, the compiler will assume that your type has
additional resources attached to it and won’t allow you to implement Copy.
// This is a unit struct, i.e. a struct with no fields.
#[derive(Clone, Copy)]
struct MyType;
Exercise
4.14 Wrapping up
We’ve covered quite a few different traits in this chapter—and we’ve only scratched the surface!
It may feel like you have a lot to remember, but don’t worry: you’ll bump into these traits so often
when writing Rust code that they’ll soon become second nature.
Closing thoughts
• Don’t make a function generic if it is always invoked with a single type. It introduces indi-
rection in your codebase, making it harder to understand and maintain.
• Don’t create a trait if you only have one implementation. It’s a sign that the trait is not
needed.
• Implement standard traits for your types (Debug, PartialEq, etc.) whenever it makes
sense. It will make your types more idiomatic and easier to work with, unlocking a lot of
functionality provided by the standard library and ecosystem crates.
• Implement traits from third-party crates if you need the functionality they unlock within their
ecosystem.
• Beware of making code generic solely to use mocks in your tests. The maintainability cost
of this approach can be high, and it’s often better to use a different testing strategy. Check
out the testing masterclass for details on high-fidelity testing.
Before moving on, let’s go through one last exercise to consolidate what we’ve learned. You’ll
have minimal guidance this time—just the exercise description and the tests to guide you.
Exercise
5 Modelling A Ticket, Pt. 2
The Ticket struct we worked on in the previous chapters is a good start, but it still screams “I’m
a beginner Rustacean!”.
We’ll use this chapter to refine our Rust domain modelling skills. We’ll need to introduce a few
more concepts along the way:
Exercise
5.1 Enumerations
Based on the validation logic you wrote in a previous chapter, there are only a few valid statuses
for a ticket: To-Do, InProgress and Done.
This is not obvious if we look at the status field in the Ticket struct or at the type of the status
parameter in the new method:
#[derive(Debug, PartialEq)]
pub struct Ticket {
title: String,
description: String,
status: String,
}
impl Ticket {
pub fn new(
title: String,
description: String,
status: String
) -> Self {
// [...]
}
}
In both cases we’re using String to represent the status field. String is a very general type—it
doesn’t immediately convey the information that the status field has a limited set of possible
values. Even worse, the caller of Ticket::new will only find out at runtime if the status they
provided is valid or not.
We can do better than that with enumerations.
enum
An enumeration is a type that can have a fixed set of values, called variants.
In Rust, you define an enumeration using the enum keyword:
enum Status {
ToDo,
InProgress,
Done,
}
Exercise
5.2 match
enum Status {
ToDo,
InProgress,
Done
}
impl Status {
fn is_done(&self) -> bool {
match self {
Status::Done => true,
// The `|` operator lets you match multiple patterns.
// It reads as "either `Status::ToDo` or `Status::InProgress`".
Status::InProgress | Status::ToDo => false
}
}
}
A match statement lets you compare a Rust value against a series of patterns.
You can think of it as a type-level if. If status is a Done variant, execute the first block; if it's an InProgress or ToDo variant, execute the second block.
Exhaustiveness
There’s one key detail here: match is exhaustive. You must handle all enum variants.
If you forget to handle a variant, Rust will stop you at compile-time with an error.
E.g. if we forget to handle the ToDo variant:
match self {
Status::Done => true,
Status::InProgress => false,
}
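The compiler rejects the snippet above with an error along these lines (exact wording varies by compiler version):

```text
error[E0004]: non-exhaustive patterns: `Status::ToDo` not covered
```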
Catch-all
If you don’t care about one or more variants, you can use the _ pattern as a catch-all:
match status {
Status::Done => true,
_ => false
}
The _ pattern matches anything that wasn’t matched by the previous patterns.
If you’re keen on correctness, avoid using catch-alls. Leverage the compiler to re-examine all
matching sites and determine how new enum variants should be handled.
Exercise
enum Status {
ToDo,
InProgress,
Done,
}
Variants
Let’s say that we want to store the name of the person who’s currently working on a ticket.
We would only have this information if the ticket is in progress. It wouldn’t be there for a to-do
ticket or a done ticket. We can model this by attaching a String field to the InProgress variant:
enum Status {
ToDo,
InProgress {
assigned_to: String,
},
Done,
}
match status {
Status::InProgress { assigned_to } => {
println!("Assigned to: {}", assigned_to);
},
Status::ToDo | Status::Done => {
println!("Done");
}
}
Bindings
In the pattern Status::InProgress { assigned_to }, assigned_to is a binding: it captures the value of that field so you can use it in the match arm. You can also rename the binding:
match status {
Status::InProgress { assigned_to: person } => {
println!("Assigned to: {}", person);
},
Status::ToDo | Status::Done => {
println!("Done");
}
}
Exercise
impl Ticket {
pub fn assigned_to(&self) -> &str {
match &self.status {
Status::InProgress { assigned_to } => assigned_to,
Status::Done | Status::ToDo => {
panic!(
"Only `In-Progress` tickets can be \
assigned to someone"
)
}
}
}
}
You only care about the Status::InProgress variant. Do you really need to match on all the
other variants?
New constructs to the rescue!
if let
The if let construct allows you to match on a single variant of an enum, without having to
handle all the other variants.
Here’s how you can use if let to simplify the assigned_to method:
impl Ticket {
pub fn assigned_to(&self) -> &str {
if let Status::InProgress { assigned_to } = &self.status {
assigned_to
} else {
panic!(
"Only `In-Progress` tickets can be assigned to someone"
);
}
}
}
let/else
If the else branch is meant to return early (a panic counts as returning early!), you can use the
let/else construct:
impl Ticket {
pub fn assigned_to(&self) -> &str {
let Status::InProgress { assigned_to } = &self.status else {
panic!(
"Only `In-Progress` tickets can be assigned to someone"
);
};
assigned_to
}
}
It allows you to assign the destructured variable without incurring any "rightward drift", i.e. the variable is assigned at the same indentation level as the code that precedes it.
Style
Exercise
5.5 Nullability
Our implementation of the assigned_to method is fairly blunt: panicking for to-do and done tickets
is far from ideal.
We can do better using Rust’s Option type.
Option
enum Option<T> {
Some(T),
None,
}
Option encodes the idea that a value might be present (Some(T)) or absent (None).
It also forces you to explicitly handle both cases. You’ll get a compiler error if you are working
with a nullable value and you forget to handle the None case.
This is a significant improvement over “implicit” nullability in other languages, where you can
forget to check for null and thus trigger a runtime error.
Option’s definition
Option’s definition uses a Rust construct that you haven’t seen before: tuple-like variants.
Tuple-like variants
Tuple-like structs
You can then access the two fields of a Point instance using their positional index:
Tuples
It’s weird to say that something is tuple-like when we haven’t seen tuples yet!
Tuples are another example of a primitive Rust type. They group together a fixed number of
values with (potentially different) types:
The syntax is simple: you list the types of the values between parentheses, separated by commas.
You can access the fields of a tuple using the dot notation and the field index:
let second: (i32, u32, u8) = (-42, 3, 8);
assert_eq!(second.0, -42);
assert_eq!(second.1, 3);
assert_eq!(second.2, 8);
Tuples are a convenient way of grouping values together when you can’t be bothered to define
a dedicated struct type.
Exercise
5.6 Fallibility
impl Ticket {
pub fn new(
title: String,
description: String,
status: Status
) -> Ticket {
if title.is_empty() {
panic!("Title cannot be empty");
}
if title.len() > 50 {
panic!("Title cannot be longer than 50 bytes");
}
if description.is_empty() {
panic!("Description cannot be empty");
}
if description.len() > 500 {
panic!("Description cannot be longer than 500 bytes");
}
Ticket {
title,
description,
status,
}
}
}
As soon as one of the checks fails, the function panics. This is not ideal, as it doesn’t give the
caller a chance to handle the error.
It’s time to introduce the Result type, Rust’s primary mechanism for error handling.
Both Ok and Err are generic, allowing you to specify your own types for the success and error
cases.
No exceptions
Rust, with Result, forces you to encode fallibility in the function’s signature.
If a function can fail (and you want the caller to have a shot at handling the error), it must return
a Result.
Exercise
5.7 Unwrapping
Unlike exceptions, Rust’s Result forces you to handle errors at the call site.
If you call a function that returns a Result, Rust won’t allow you to implicitly ignore the error
case.
When you call a function that returns a Result, you have two key options:
• Panic if the operation failed. This is done using either the unwrap or expect methods.
• Destructure the Result using a match expression to deal with the error case explicitly.
match parse_int("42") {
Ok(number) => println!("Parsed number: {}", number),
Err(err) => eprintln!("Error: {}", err),
}
Exercise
Your solution to the previous exercise may have felt awkward: matching on strings is not ideal!
A colleague might rework the error messages returned by Ticket::new (e.g. to improve readability) and, all of a sudden, your calling code would break.
You already know the machinery required to fix this: enums!
Reacting to errors
When you want to allow the caller to behave differently based on the specific error that occurred,
you can use an enum to represent the different error cases:
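Based on the variants used in the match below, the error enum presumably looks like this:

```rust
enum U32ParseError {
    NotANumber,
    TooLarge,
    Negative,
}
```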
Using an error enum, you’re encoding the different error cases in the type system—they become
part of the signature of the fallible function.
This simplifies error handling for the caller, as they can use a match expression to react to the
different error cases:
match s.parse_u32() {
Ok(n) => n,
Err(U32ParseError::Negative) => 0,
Err(U32ParseError::TooLarge) => u32::MAX,
Err(U32ParseError::NotANumber) => {
panic!("Not a number: {}", s);
}
}
Exercise
Error reporting
In the previous exercise you had to destructure the TitleError variant to extract the error mes-
sage and pass it to the panic! macro.
This is a (rudimentary) example of error reporting: transforming an error type into a representation that can be shown to a user, a service operator, or a developer.
It’s not practical for each Rust developer to come up with their own error reporting strategy: it’d
be a waste of time and it wouldn’t compose well across projects. That’s why Rust provides the
std::error::Error trait.
There are no constraints on the type of the Err variant in a Result, but it’s a good practice to
use a type that implements the Error trait. Error is the cornerstone of Rust’s error handling
story:
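Slightly simplified, the trait looks like this (the real definition also provides defaulted methods, such as source):

```rust
use std::fmt::{Debug, Display};

pub trait Error: Debug + Display {}
```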
You might recall the : syntax from the From trait—it’s used to specify supertraits. For Error,
there are two supertraits: Debug and Display. If a type wants to implement Error, it must also
implement Debug and Display.
We’ve already encountered the Debug trait in a previous exercise—it’s the trait used by
assert_eq! to display the values of the variables it’s comparing when the assertion fails.
From a “mechanical” perspective, Display and Debug are identical—they encode how a type
should be converted into a string-like representation:
// `Debug`
pub trait Debug {
fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>;
}
// `Display`
pub trait Display {
fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>;
}
The difference is in their purpose: Display returns a representation that’s meant for “end-users”,
while Debug provides a low-level representation that’s more suitable to developers and service
operators.
That’s why Debug can be automatically implemented using the #[derive(Debug)] attribute,
while Display requires a manual implementation.
Exercise
It took a bit of code to implement the Error trait for TicketNewError, didn’t it?
A manual Display implementation, plus an Error impl block.
We can remove some of the boilerplate by using thiserror, a Rust crate that provides a procedural macro to simplify the creation of custom error types.
But we're getting ahead of ourselves: thiserror is a third-party crate, it'd be our first dependency!
Let’s take a step back to talk about Rust’s packaging system before we dive into dependencies.
What is a package?
A Rust package is defined by the [package] section in a Cargo.toml file, also known as its
manifest. Within [package] you can set the package’s metadata, such as its name and version.
Go check the Cargo.toml file in the directory of this section’s exercise!
What is a crate?
Inside a package, you can have one or more crates, also known as targets.
The two most common crate types are binary crates and library crates.
Binaries
A binary crate compiles to a standalone executable program: it must have a main function as its entry point, and you can run it with cargo run.
Libraries
Libraries, on the other hand, are not executable on their own. You can’t run a library, but you can
import its code from another package that depends on it.
A library groups together code (i.e. functions, types, etc.) that can be leveraged by other packages as a dependency.
All the exercises you’ve solved so far have been structured as libraries, with a test suite attached
to them.
Conventions
There are some conventions around Rust packages that you need to keep in mind:
• If there’s a src/main.rs file, cargo will infer that the package contains a binary crate.
You can override these defaults by explicitly declaring your targets in the Cargo.toml file—see
cargo’s documentation for more details.
Keep in mind that while a package can contain multiple crates, it can only contain one library
crate.
Exercise
5.11 Dependencies
A package can depend on other packages by listing them in the [dependencies] section of its
Cargo.toml file.
The most common way to specify a dependency is by providing its name and version:
[dependencies]
thiserror = "1"
This will add thiserror as a dependency to your package, with a minimum version of 1.0.0.
thiserror will be pulled from crates.io, Rust’s official package registry. When you run cargo
build, cargo will go through a few stages:
• Dependency resolution
• Downloading the dependencies
• Compiling your project (your own code and the dependencies)
Dependency resolution is skipped if your project has a Cargo.lock file and your manifest files are unchanged. A lockfile is automatically generated by cargo after a successful round of dependency resolution: it contains the exact versions of all dependencies used in your project, and is used to ensure that the same versions are consistently used across different builds (e.g. in CI).
If you’re working on a project with multiple developers, you should commit the Cargo.lock file
to your version control system.
You can use cargo update to update the Cargo.lock file with the latest (compatible) versions
of all your dependencies.
Path dependencies
You can also specify a dependency using a path. This is useful when you’re working on multiple
local packages.
[dependencies]
my-library = { path = "../my-library" }
The path is relative to the Cargo.toml file of the package that’s declaring the dependency.
Other sources
Check out the Cargo documentation for more details on where you can get dependencies from
and how to specify them in your Cargo.toml file.
Dev dependencies
You can also specify dependencies that are only needed for development—i.e. they only get
pulled in when you’re running cargo test.
They go in the [dev-dependencies] section of your Cargo.toml file:
[dev-dependencies]
static_assertions = "1.1.0"
We’ve been using a few of these throughout the book to shorten our tests.
Exercise
5.12 thiserror
We’ve seen how to implement the Error trait “manually” for a custom error type.
Imagine that you have to do this for most error types in your codebase. That’s a lot of boilerplate,
isn’t it?
We can remove some of the boilerplate by using thiserror, a Rust crate that provides a procedural macro to simplify the creation of custom error types.
#[derive(thiserror::Error, Debug)]
enum TicketNewError {
#[error("{0}")]
TitleError(String),
#[error("{0}")]
DescriptionError(String),
}
All the derive macros we’ve seen so far were provided by the Rust standard library.
thiserror::Error is the first example of a third-party derive macro.
derive macros are a subset of procedural macros, a way to generate Rust code at compile
time. We won’t get into the details of how to write a procedural macro in this course, but it’s
important to know that you can write your own!
A topic to approach in a more advanced Rust course.
Custom syntax
Each procedural macro can define its own syntax, which is usually explained in the crate's documentation. In the case of thiserror, we have:
• #[derive(thiserror::Error)]: this is the syntax to derive the Error trait for a custom
error type, helped by thiserror.
• #[error("{0}")]: this is the syntax to define a Display implementation for each variant
of the custom error type. {0} is replaced by the zero-th field of the variant (String, in this
case) when the error is displayed.
Exercise
In the previous chapter we looked at the From and Into traits, Rust’s idiomatic interfaces for
infallible type conversions.
But what if the conversion is not guaranteed to succeed?
We now know enough about errors to discuss the fallible counterparts of From and Into:
TryFrom and TryInto.
Both TryFrom and TryInto are defined in the std::convert module, just like From and Into.
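Their definitions, abridged from std::convert:

```rust
pub trait TryFrom<T>: Sized {
    type Error;
    fn try_from(value: T) -> Result<Self, Self::Error>;
}

pub trait TryInto<T>: Sized {
    type Error;
    fn try_into(self) -> Result<T, Self::Error>;
}
```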
The main difference between From/Into and TryFrom/TryInto is that the latter return a Result
type.
This allows the conversion to fail, returning an error instead of panicking.
Self::Error
Both TryFrom and TryInto have an associated Error type. This allows each implementation to
specify its own error type, ideally the most appropriate for the conversion being attempted.
Self::Error is a way to refer to the Error associated type defined in the trait itself.
Duality
Just like From and Into, TryFrom and TryInto are dual traits.
If you implement TryFrom for a type, you get TryInto for free.
Exercise
5.14 Error::source
There’s one more thing we need to talk about to complete our coverage of the Error trait: the
source method.
Implementing source
The Error trait provides a default implementation that always returns None (i.e. no underlying
cause). That’s why you didn’t have to care about source in the previous exercises.
You can override this default implementation to provide a cause for your error type.
use std::error::Error;
#[derive(Debug)]
struct DatabaseError {
source: std::io::Error
}
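The full picture, restating the struct together with the Display and source implementations the prose refers to (a sketch; the book's exact code may differ):

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug)]
struct DatabaseError {
    source: std::io::Error,
}

impl fmt::Display for DatabaseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "Failed to connect to the database")
    }
}

impl Error for DatabaseError {
    // Override the default (which returns `None`) to expose the cause.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}
```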
In this example, DatabaseError wraps an std::io::Error as its source. We then override the
source method to return this source when called.
• dyn Error is a trait object. It’s a way to refer to any type that implements the Error trait.
• 'static is a special lifetime specifier. 'static implies that the reference is valid for “as
long as we need it”, i.e. the entire program execution.
Combined: &(dyn Error + 'static) is a reference to a trait object that implements the Error
trait and is valid for the entire program execution.
Don’t worry too much about either of these concepts for now. We’ll cover them in more detail in
future chapters.
thiserror provides three ways to automatically implement source for your error types:
• A field named source will automatically be used as the source of the error.
use thiserror::Error;
#[derive(Error, Debug)]
pub enum MyError {
#[error("Failed to connect to the database")]
DatabaseError {
source: std::io::Error
}
}
• A field annotated with the #[source] attribute will automatically be used as the source of
the error.
use thiserror::Error;
#[derive(Error, Debug)]
pub enum MyError {
#[error("Failed to connect to the database")]
DatabaseError {
#[source]
inner: std::io::Error
}
}
• A field annotated with the #[from] attribute will automatically be used as the source of
the error and thiserror will automatically generate a From implementation to convert the
annotated type into your error type.
use thiserror::Error;
#[derive(Error, Debug)]
pub enum MyError {
#[error("Failed to connect to the database")]
DatabaseError {
#[from]
inner: std::io::Error
}
}
The ? operator
The ? operator is a shorthand for early returns on failure. For example:

use std::fs::File;

let file = File::open("file.txt")?;

is equivalent to:

use std::fs::File;

let file = match File::open("file.txt") {
    Ok(file) => file,
    Err(e) => return Err(e.into()),
};
You can use the ? operator to shorten your error handling code significantly.
In particular, the ? operator will automatically convert the error type of the fallible operation into the error type of the function, if a conversion is possible (i.e. if there is a suitable From implementation).
Exercise
5.15 Wrapping up
Further reading
Exercise
6 Intro
In the previous chapter we modelled Ticket in a vacuum: we defined its fields and their constraints, we learned how to best represent them in Rust, but we didn't consider how Ticket fits
into a larger system. We’ll use this chapter to build a simple workflow around Ticket, introducing
a (rudimentary) management system to store and retrieve tickets.
The task will give us an opportunity to explore new Rust concepts, such as:
• Stack-allocated arrays
• Vec, a growable array type
• Iterator and IntoIterator, for iterating over collections
• Slices (&[T]), to work with parts of a collection
• Lifetimes, to describe how long references are valid
• HashMap and BTreeMap, two key-value data structures
• Eq and Hash, to compare keys in a HashMap
• Ord and PartialOrd, to work with a BTreeMap
• Index and IndexMut, to access elements in a collection
Exercise
6.1 Arrays
As soon as we start talking about “ticket management” we need to think about a way to store
multiple tickets. In turn, this means we need to think about collections. In particular, homogeneous collections: we want to store multiple instances of the same type.
What does Rust have to offer in this regard?
Arrays
Accessing elements
Out-of-bounds access
If you try to access an element that’s out of bounds, Rust will panic:
This is enforced at runtime using bounds checking. It comes with a small performance overhead,
but it’s how Rust prevents buffer overflows.
In some scenarios the Rust compiler can optimize away bounds checks, especially if iterators are
involved—we’ll speak more about this later on.
If you don’t want to panic, you can use the get method, which returns an Option<&T>:
Performance
Since the size of an array is known at compile-time, the compiler can allocate the array on the
stack. If you run the following code:
+---+---+---+
Stack: | 1 | 2 | 3 |
+---+---+---+
Exercise
6.2 Vectors
Arrays’ strength is also their weakness: their size must be known upfront, at compile-time. If you
try to create an array with a size that’s only known at runtime, you’ll get a compilation error:
let n = 10;
let numbers: [u32; n];
Arrays wouldn’t work for our ticket management system—we don’t know how many tickets we’ll
need to store at compile-time. This is where Vec comes in.
Vec
You would then push elements into the vector using the push method:

let mut numbers: Vec<u32> = Vec::new();
numbers.push(1);
numbers.push(2);
numbers.push(3);
Accessing elements
Access is bounds-checked, just like element access with arrays. It has O(1) complexity.
Memory layout
This layout should look familiar: it’s exactly the same as String!
That’s not a coincidence: String is defined as a vector of bytes, Vec<u8>, under the hood:
pub struct String {
vec: Vec<u8>,
}
Exercise
6.3 Resizing
We said that Vec is a “growable” vector type, but what does that mean? What happens if you
try to insert an element into a Vec that’s already at maximum capacity?
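The growth is observable through the capacity method; a small sketch:

```rust
fn main() {
    let mut numbers = Vec::with_capacity(2);
    numbers.push(1);
    numbers.push(2);
    assert!(numbers.capacity() >= 2);
    // The Vec is full: the next push triggers a reallocation.
    // The elements are copied over to a new, larger buffer.
    numbers.push(3);
    assert!(numbers.capacity() >= 3);
}
```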
Vec::with_capacity
If you have a rough idea of how many elements you’ll store in a Vec, you can use the
Vec::with_capacity method to pre-allocate enough memory upfront.
This can avoid a new allocation when the Vec grows, but it may waste memory if you overestimate
actual usage.
Evaluate on a case-by-case basis.
Exercise
6.4 Iteration
During the very first exercises, you learned that Rust lets you iterate over collections using for
loops. We were looking at ranges at that point (e.g. 0..5), but the same holds true for collections
like arrays and vectors.
for desugaring
Every time you write a for loop in Rust, the compiler desugars it into the following code:
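For a vector v, a loop like for n in v { sum += n; } becomes, roughly:

```rust
fn main() {
    let v = vec![1, 2, 3];
    let mut sum = 0;
    // Roughly what the `for` loop desugars to:
    let mut iter = v.into_iter();
    loop {
        match iter.next() {
            Some(n) => sum += n,
            None => break,
        }
    }
    assert_eq!(sum, 6);
}
```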
Iterator trait
The next method in the previous code snippet comes from the Iterator trait. The Iterator trait is defined in Rust's standard library and provides a shared interface for types that can produce a sequence of values:
trait Iterator {
type Item;
fn next(&mut self) -> Option<Self::Item>;
}
The Item associated type specifies the type of the values produced by the iterator.
next returns the next value in the sequence.
It returns Some(value) if there’s a value to return, and None when there isn’t.
Be careful: there is no guarantee that an iterator is exhausted when it returns None. That’s only
guaranteed if the iterator implements the (more restrictive) FusedIterator trait.
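As an illustration (not one of the course's exercises), here's a hand-rolled iterator that counts up to three:

```rust
struct Counter {
    count: u32,
}

impl Iterator for Counter {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.count < 3 {
            self.count += 1;
            Some(self.count)
        } else {
            None // the sequence is over
        }
    }
}

fn main() {
    let counter = Counter { count: 0 };
    let collected: Vec<u32> = counter.collect();
    assert_eq!(collected, vec![1, 2, 3]);
}
```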
IntoIterator trait
Not all types implement Iterator, but many can be converted into a type that does.
That’s where the IntoIterator trait comes in:
trait IntoIterator {
type Item;
type IntoIter: Iterator<Item = Self::Item>;
fn into_iter(self) -> Self::IntoIter;
}
The into_iter method consumes the original value and returns an iterator over its elements.
A type can only have one implementation of IntoIterator: there can be no ambiguity as to
what for should desugar to.
One detail: every type that implements Iterator automatically implements IntoIterator as
well. They just return themselves from into_iter!
Bounds checks
Iterating over iterators has a nice side effect: you can’t go out of bounds, by design.
This allows Rust to remove bounds checks from the generated machine code, making iteration
faster.
In other words,
There are exceptions to this rule: the compiler can sometimes prove that you’re not going out of
bounds even with manual indexing, thus removing the bounds checks anyway. But in general,
prefer iteration to indexing where possible.
Exercise
6.5 .iter()
This pattern can be simplified by implementing IntoIterator for a reference to the collection.
In our example above, that would be &Vec<Ticket>.
The standard library does this, that’s why the following code works:
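A sketch of the two equivalent forms:

```rust
fn main() {
    let v = vec![1, 2, 3];
    // `&v` works because `IntoIterator` is implemented for `&Vec<T>`...
    for n in &v {
        println!("{n}");
    }
    // ...and it yields the same iterator you'd get by calling `iter`:
    for n in v.iter() {
        println!("{n}");
    }
    // `v` is still usable afterwards: we only iterated over references.
    assert_eq!(v.len(), 3);
}
```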
The former is convenient in for loops, the latter is more explicit and can be used in other contexts.
Exercise
6.6 Lifetimes
Let’s try to complete the previous exercise by adding an implementation of IntoIterator for
&TicketStore, for maximum convenience in for loops.
Let’s start by filling in the most “obvious” parts of the implementation:
Lifetime parameters
Lifetimes are labels used by the Rust compiler to keep track of how long a reference (either
mutable or immutable) is valid.
The lifetime of a reference is constrained by the scope of the value it refers to. Rust always
makes sure, at compile-time, that references are not used after the value they refer to has been
dropped, to avoid dangling pointers and use-after-free bugs.
This should sound familiar: we’ve already seen these concepts in action when we discussed
ownership and borrowing. Lifetimes are just a way to name how long a specific reference is
valid.
Naming becomes important when you have multiple references and you need to clarify how they
relate to each other. Let’s look at the signature of Vec::iter():
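Written out with an explicit lifetime, its shape is fn iter<'a>(&'a self) -> Iter<'a, T>. The sketch below uses stand-in types, MyVec and Iter, rather than the real std definitions, to show just the lifetime relationship:

```rust
// Stand-ins for `Vec` and `std::slice::Iter`, for illustration only.
struct Iter<'a, T> {
    remaining: &'a [T],
}

struct MyVec<T> {
    data: Vec<T>,
}

impl<T> MyVec<T> {
    // The returned `Iter` is tied to the lifetime `'a`:
    // it cannot outlive the `&self` borrow it was created from.
    fn iter<'a>(&'a self) -> Iter<'a, T> {
        Iter { remaining: &self.data }
    }
}

fn main() {
    let v = MyVec { data: vec![1, 2, 3] };
    let it = v.iter();
    assert_eq!(it.remaining.len(), 3);
}
```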
In plain English: the Iter returned by iter() cannot outlive the Vec reference (&self) it was
created from.
This is important because Vec::iter, as we discussed, returns an iterator over references to
the Vec’s elements. If the Vec is dropped, the references returned by the iterator would be
invalid. Rust must make sure this doesn’t happen, and lifetimes are the tool it uses to enforce
this rule.
Lifetime elision
Rust has a set of rules, called lifetime elision rules, that allow you to omit explicit lifetime
annotations in many cases. For example, Vec::iter’s definition looks like this in std’s source
code:
No explicit lifetime parameter is present in the signature of Vec::iter(). Elision rules imply that
the lifetime of the Iter returned by iter() is tied to the lifetime of the &self reference. You
can think of '_ as a placeholder for the lifetime of the &self reference.
See the References section for a link to the official documentation on lifetime elision.
In most cases, you can rely on the compiler telling you when you need to add explicit lifetime
annotations.
References
• std::vec::Vec::iter
• std::slice::Iter
• Lifetime elision rules
Exercise
6.7 Combinators
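Combinators are methods on Iterator that transform or consume the sequence; a small sketch of chaining a few of them:

```rust
fn main() {
    let numbers = vec![1, 2, 3, 4, 5];
    // Keep the even numbers, square them, then sum the results.
    let total: i32 = numbers
        .iter()
        .filter(|&&n| n % 2 == 0)
        .map(|&n| n * n)
        .sum();
    assert_eq!(total, 4 + 16);
}
```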
Closures
let x = 42;
let add_x = |y| x + y;
let sum = add_x(1);
If necessary, you can specify the types of the arguments and/or the return type:
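For example:

```rust
fn main() {
    let x = 42;
    // Same closure as above, with explicit argument and return types:
    let add_x = |y: i32| -> i32 { x + y };
    assert_eq!(add_x(1), 43);
}
```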
collect
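collect consumes an iterator and gathers its items into a collection; the target collection type usually needs an annotation, since many types implement it:

```rust
fn main() {
    let v = vec![1, 2, 3];
    // The annotation on `doubled` tells `collect` what to build.
    let doubled: Vec<i32> = v.iter().map(|n| n * 2).collect();
    assert_eq!(doubled, vec![2, 4, 6]);
}
```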
Further reading
• Iterator’s documentation gives you an overview of the methods available for iterators in
std.
• The itertools crate defines even more combinators for iterators.
Exercise
That’s tricky though! What’s the return type of to_dos, as implemented below?
impl TicketStore {
pub fn to_dos(&self) -> ??? {
self.tickets.iter().filter(|t| t.status == Status::ToDo)
}
}
Unnameable types
The filter method returns an instance of std::iter::Filter, which has the following definition:
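A simplified sketch (the real struct keeps its fields private):

```rust
// Simplified stand-in for `std::iter::Filter`.
pub struct Filter<I, P> {
    iter: I,
    predicate: P,
}

fn main() {
    let f = Filter {
        iter: [1, 2, 3].into_iter(),
        predicate: |x: &i32| *x > 1,
    };
    // Just demonstrating the shape; we don't implement `Iterator` here.
    assert_eq!(f.iter.len(), 3);
    assert!((f.predicate)(&2));
}
```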
where I is the type of the iterator being filtered on and P is the predicate used to filter the
elements.
We know that I is std::slice::Iter<'_, Ticket> in this case, but what about P?
P is a closure, an anonymous function. As the name suggests, closures don’t have a name, so
we can’t write them down in our code.
impl Trait
impl Trait is a feature that allows you to return a type without specifying its name. You just
declare what trait(s) the type implements, and Rust figures out the rest.
impl TicketStore {
pub fn to_dos(&self) -> impl Iterator<Item = &Ticket> {
self.tickets.iter().filter(|t| t.status == Status::ToDo)
}
}
That’s it!
Generic?
RPIT
If you read RFCs or deep-dives about Rust, you might come across the acronym RPIT.
It stands for “Return Position Impl Trait” and refers to the use of impl Trait in return position.
Exercise
In the previous section, we saw how impl Trait can be used to return a type without specifying
its name.
The same syntax can also be used in argument position:
fn print_iter(iter: impl Iterator<Item = i32>) {
    for i in iter {
        println!("{}", i);
    }
}

This is equivalent to using a generic parameter with a trait bound:

fn print_iter<T>(iter: T)
where
    T: Iterator<Item = i32>
{
    for i in iter {
        println!("{}", i);
    }
}
Downsides
Exercise
6.10 Slices
        +----------+----------+----------+
Stack:  | pointer  |  length  | capacity |
        |          |    2     |    3     |
        +----|-----+----------+----------+
             |
             |
             v
        +---+---+---+
Heap:   | 1 | 2 | ? |
        +---+---+---+
&[T]
Vec implements the Deref trait using [T] as the target type, so you can use slice methods on a
Vec directly thanks to deref coercion:
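A sketch:

```rust
fn main() {
    let numbers = vec![1, 2, 3];
    // Borrow a slice of the first two elements:
    let slice: &[i32] = &numbers[..2];
    assert_eq!(slice, &[1, 2]);
    // Deref coercion at work: `first` is a slice method,
    // but we can call it directly on the Vec.
    assert_eq!(numbers.first(), Some(&1));
}
```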
Memory layout
             numbers                             slice
        +----------+----------+----------+   +----------+--------+
Stack:  | pointer  |  length  | capacity |   | pointer  | length |
        |          |    3     |    4     |   |          |   2    |
        +----|-----+----------+----------+   +----|-----+--------+
             |                                    |
             |                                    |
             v                                    |
        +---+---+---+---+                         |
Heap:   | 1 | 2 | 3 | ? |                         |
        +---+---+---+---+                         |
          ^                                       |
          |                                       |
          +---------------------------------------+
&Vec<T> vs &[T]
When you need to pass an immutable reference to a Vec to a function, prefer &[T] over &Vec<T>.
This allows the function to accept any kind of slice, not necessarily one backed by a Vec.
For example, you can then pass a subset of the elements in a Vec. But it goes further than
that—you could also pass a slice of an array:
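For instance, with a hypothetical sum helper:

```rust
// Accepts any contiguous sequence of i32s, whatever owns it.
fn sum(numbers: &[i32]) -> i32 {
    numbers.iter().sum()
}

fn main() {
    let v = vec![1, 2, 3, 4];
    assert_eq!(sum(&v), 10);     // a whole Vec, via deref coercion
    assert_eq!(sum(&v[..2]), 3); // a subset of a Vec
    let a = [5, 6];
    assert_eq!(sum(&a), 11);     // a slice of a stack-allocated array
}
```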
Array slices and Vec slices are the same type: they’re fat pointers to a contiguous sequence of
elements. In the case of arrays, the pointer points to the stack rather than the heap, but that
doesn’t matter when it comes to using the slice.
Exercise
Every time we’ve talked about slice types (like str and [T]), we’ve used their immutable borrow
form (&str and &[T]).
But slices can also be mutable!
Here’s how you create a mutable slice:
let mut numbers = vec![1, 2, 3];
let slice: &mut [i32] = &mut numbers;
slice[0] = 42;
Limitations
When working with immutable borrows, the recommendation was clear: prefer slice references
over references to the owned type (e.g. &[T] over &Vec<T>).
That’s not the case with mutable borrows.
Consider this scenario:
It won’t compile!
push is a method on Vec, not on slices. This is the manifestation of a more general principle: Rust
won’t allow you to add or remove elements from a slice. You will only be able to modify/replace
the elements that are already there.
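A sketch of that boundary; the commented-out line is the one the compiler rejects:

```rust
fn modify(numbers: &mut [u32]) {
    // Replacing existing elements is allowed...
    numbers[0] = 10;
    // ...but changing the length is not: `&mut [u32]` has no `push`.
    // numbers.push(4); // error[E0599]: no method named `push` found
}

fn main() {
    let mut v = vec![1, 2, 3];
    modify(&mut v);
    assert_eq!(v, vec![10, 2, 3]);
}
```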
In this regard, a &mut Vec or a &mut String are strictly more powerful than a &mut [T] or a
&mut str.
Choose the type that best fits based on the operations you need to perform.
Exercise
But we don’t know the id before creating the ticket. So it can’t be there from the get-go.
It’d have to be optional:
That’s also not ideal—we’d have to handle the None case every single time we retrieve a ticket
from the store, even though we know that the id should always be there once the ticket has been
created.
The best solution is to have two different ticket states, represented by two separate types: a
TicketDraft and a Ticket:
A TicketDraft is a ticket that hasn’t been created yet. It doesn’t have an id, and it doesn’t have
a status.
A Ticket is a ticket that has been created. It has an id and a status.
Since each field in TicketDraft and Ticket embeds its own constraints, we don’t have to
duplicate logic across the two types.
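A sketch of the split, again with hypothetical stand-in field types:

```rust
// Stand-in field types, for illustration only.
struct TicketDraft {
    title: String,
    description: String,
}

struct Ticket {
    id: u64,
    title: String,
    description: String,
    status: String,
}

fn main() {
    let draft = TicketDraft {
        title: "Fix the bug".into(),
        description: "The login page 500s".into(),
    };
    // The store would assign the id and the initial status on insertion:
    let created = Ticket {
        id: 1,
        title: draft.title,
        description: draft.description,
        status: "ToDo".into(),
    };
    assert_eq!(created.id, 1);
}
```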
Exercise
6.13 Indexing
Index
// Slightly simplified
pub trait Index<Idx>
{
type Output;
// Required method
fn index(&self, index: Idx) -> &Self::Output;
}
It has a generic parameter, Idx, the type used to index into the collection, and an associated type, Output, the type of the value that indexing returns.
Notice how the index method doesn’t return an Option. The assumption is that index will panic
if you try to access an element that’s not there, as it happens for array and vec indexing.
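A sketch of an implementation, using a simplified stand-in for the course's ticket store:

```rust
use std::ops::Index;

// Simplified: tickets are just strings here.
struct TicketStore {
    tickets: Vec<String>,
}

impl Index<usize> for TicketStore {
    type Output = String;

    fn index(&self, index: usize) -> &String {
        // Delegates to Vec's indexing, which panics when out of bounds.
        &self.tickets[index]
    }
}

fn main() {
    let store = TicketStore {
        tickets: vec!["first".into(), "second".into()],
    };
    assert_eq!(store[1], "second");
}
```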
Exercise
Index allows read-only access. It doesn’t let you mutate the value you retrieved.
IndexMut
If you want to allow mutability, you need to implement the IndexMut trait.
// Slightly simplified
pub trait IndexMut<Idx>: Index<Idx>
{
// Required method
fn index_mut(&mut self, index: Idx) -> &mut Self::Output;
}
IndexMut can only be implemented if the type already implements Index, since it unlocks an
additional capability.
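A sketch of the pair, on a toy type:

```rust
use std::ops::{Index, IndexMut};

struct Scores {
    values: Vec<i32>,
}

impl Index<usize> for Scores {
    type Output = i32;

    fn index(&self, i: usize) -> &i32 {
        &self.values[i]
    }
}

// `IndexMut` builds on `Index`: same `Idx`, same `Output`.
impl IndexMut<usize> for Scores {
    fn index_mut(&mut self, i: usize) -> &mut i32 {
        &mut self.values[i]
    }
}

fn main() {
    let mut s = Scores { values: vec![1, 2] };
    s[0] += 10; // goes through `index_mut`
    assert_eq!(s[0], 11);
}
```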
Exercise
6.15 HashMap
Our implementation of Index/IndexMut is not ideal: we need to iterate over the entire Vec to
retrieve a ticket by id; the algorithmic complexity is O(n), where n is the number of tickets in the
store.
We can do better by using a different data structure for storing tickets: a HashMap<K, V>.
use std::collections::HashMap;

let mut book_reviews: HashMap<String, String> = HashMap::new();
book_reviews.insert(
    "Adventures of Huckleberry Finn".to_string(),
    "My favorite book.".to_string(),
);
HashMap works with key-value pairs. It’s generic over both: K is the generic parameter for the
key type, while V is the one for the value type.
The expected cost of insertions, retrievals and removals is constant, O(1). That sounds perfect
for our usecase, doesn’t it?
Key requirements
There are no trait bounds on HashMap’s struct definition, but you’ll find some on its methods.
Let’s look at insert, for example:
// Slightly simplified
impl<K, V> HashMap<K, V>
where
K: Eq + Hash,
{
pub fn insert(&mut self, k: K, v: V) -> Option<V> {
// [...]
}
}
Hash
A hashing function (or hasher) maps a potentially infinite set of values (e.g. all possible strings) to a bounded range (e.g. a u64 value).
There are many different hashing functions around, each with different properties (speed, collision risk, reversibility, etc.).
A HashMap, as the name suggests, uses a hashing function behind the scenes. It hashes your key and then uses that hash to store/retrieve the associated value. This strategy requires the key type to be hashable, hence the Hash trait bound on K.
You can find the Hash trait in the std::hash module:
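It boils down to a single method, roughly fn hash<H: Hasher>(&self, state: &mut H). A small sketch of using it directly:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn main() {
    // Hashing the same value twice, with the same hasher type,
    // yields the same digest:
    let mut h1 = DefaultHasher::new();
    42u32.hash(&mut h1);
    let mut h2 = DefaultHasher::new();
    42u32.hash(&mut h2);
    assert_eq!(h1.finish(), h2.finish());
}
```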
You will rarely implement Hash manually. Most of the time you'll derive it:
#[derive(Hash)]
struct Person {
id: u32,
name: String,
}
Eq
HashMap must be able to compare keys for equality. This is particularly important when dealing
with hash collisions—i.e. when two different keys hash to the same value.
You may wonder: isn’t that what the PartialEq trait is for? Almost!
PartialEq is not enough for HashMap because it doesn't guarantee reflexivity, i.e. that a == a is always true.
For example, floating point numbers (f32 and f64) implement PartialEq, but they don’t satisfy
the reflexivity property: f32::NAN == f32::NAN is false.
Reflexivity is crucial for HashMap to work correctly: without it, you wouldn’t be able to retrieve a
value from the map using the same key you used to insert it.
The Eq trait extends PartialEq with the reflexivity property:
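The definition is essentially empty (sketched here with a stand-in name); the reflexivity property lives in the contract, not in code. NaN shows why f64 can't implement it:

```rust
// Shape of std's definition, simplified: no new methods,
// just a stronger contract layered on PartialEq.
trait MyEq: PartialEq<Self> {}

fn main() {
    // f64 implements PartialEq but not Eq: NaN breaks reflexivity.
    assert!(f64::NAN != f64::NAN);
}
```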
It’s a marker trait: it doesn’t add any new methods, it’s just a way for you to say to the compiler
that the equality logic implemented in PartialEq is reflexive.
You can derive Eq automatically when you derive PartialEq:
#[derive(PartialEq, Eq)]
struct Person {
id: u32,
name: String,
}
There is an implicit contract between Eq and Hash: if two keys are equal, their hashes must be
equal too. This is crucial for HashMap to work correctly. If you break this contract, you’ll get
nonsensical results when using HashMap.
Exercise
6.16 Ordering
By moving from a Vec to a HashMap we have improved the performance of our ticket management
system, and simplified our code in the process.
It’s not all roses, though. When iterating over a Vec-backed store, we could be sure that the
tickets would be returned in the order they were added.
That’s not the case with a HashMap: you can iterate over the tickets, but the order is random.
BTreeMap
Just like HashMap, you won’t find trait bounds on the definition of BTreeMap. But you’ll find trait
bounds on its methods. Let’s look at insert:
// `K` and `V` stand for the key and value types, respectively,
// just like in `HashMap`.
impl<K, V> BTreeMap<K, V> {
pub fn insert(&mut self, key: K, value: V) -> Option<V>
where
K: Ord,
{
// implementation
}
}
Hash is no longer required. Instead, the key type must implement the Ord trait.
Ord
The cmp method returns an Ordering enum, which can be one of Less, Equal, or Greater.
Ord requires that two other traits are implemented: Eq and PartialOrd.
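A small sketch of cmp in action:

```rust
use std::cmp::Ordering;

fn main() {
    assert_eq!(1.cmp(&2), Ordering::Less);
    assert_eq!(2.cmp(&2), Ordering::Equal);
    assert_eq!(3.cmp(&2), Ordering::Greater);
}
```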
PartialOrd
PartialOrd is a weaker version of Ord, just like PartialEq is a weaker version of Eq. You can
see why by looking at its definition:
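Its key method, partial_cmp, returns an Option<Ordering>: some pairs of values simply can't be ordered. Floats are the canonical example:

```rust
use std::cmp::Ordering;

fn main() {
    assert_eq!(1.0f64.partial_cmp(&2.0), Some(Ordering::Less));
    // NaN is not comparable to anything, including itself:
    assert_eq!(f64::NAN.partial_cmp(&1.0), None);
}
```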
Exercise
Intro
One of Rust’s big promises is fearless concurrency: making it easier to write safe, concurrent
programs. We haven’t seen much of that yet. All the work we’ve done so far has been single-
threaded. Time to change that!
In this chapter we’ll make our ticket store multithreaded.
We’ll have the opportunity to touch most of Rust’s core concurrency features, including:
We’ll also discuss various design patterns for multithreaded systems and some of their trade-offs.
Exercise
7.1 Threads
Before we start writing multithreaded code, let’s take a step back and talk about what threads
are and why we might want to use them.
What is a thread?
main
When a Rust program starts, it runs on a single thread, the main thread.
This thread is created by the operating system and is responsible for running the main function.
use std::thread;
use std::time::Duration;
fn main() {
loop {
thread::sleep(Duration::from_secs(2));
println!("Hello from the main thread!");
}
}
std::thread
Rust’s standard library provides a module, std::thread, that allows you to create and manage
threads.
spawn
You can use std::thread::spawn to create new threads and execute code on them.
For example:
use std::thread;
use std::time::Duration;
fn main() {
let handle = thread::spawn(|| {
loop {
thread::sleep(Duration::from_secs(1));
println!("Hello from a thread!");
}
});
loop {
thread::sleep(Duration::from_secs(2));
println!("Hello from the main thread!");
}
}
If you execute this program on the Rust playground you’ll see that the main thread and the
spawned thread run concurrently.
Each thread makes progress independently of the other.
Process termination
When the main thread finishes, the overall process will exit.
A spawned thread will continue running until it finishes or the main thread finishes.
use std::thread;
use std::time::Duration;
fn main() {
let handle = thread::spawn(|| {
loop {
thread::sleep(Duration::from_secs(1));
println!("Hello from a thread!");
}
});
thread::sleep(Duration::from_secs(5));
}
In the example above, you can expect to see the message “Hello from a thread!” printed roughly
five times.
Then the main thread will finish (when the sleep call returns), and the spawned thread will be
terminated since the overall process exits.
join
You can also wait for a spawned thread to finish by calling the join method on the JoinHandle
that spawn returns.
use std::thread;
fn main() {
let handle = thread::spawn(|| {
println!("Hello from a thread!");
});
handle.join().unwrap();
}
In this example, the main thread will wait for the spawned thread to finish before exiting.
This introduces a form of synchronization between the two threads: you’re guaranteed to see
the message “Hello from a thread!” printed before the program exits, because the main thread
won’t exit until the spawned thread has finished.
Exercise
7.2 'static
If you tried to borrow a slice from the vector in the previous exercise, you probably got a compiler
error that looks something like this:
The error says that the argument requires v to be borrowed for 'static. What does that mean?
The 'static lifetime is a special lifetime in Rust.
It means that the value will be valid for the entire duration of the program.
Detached threads
A thread launched via thread::spawn can outlive the thread that spawned it.
For example:
use std::thread;
fn f() {
thread::spawn(|| {
thread::spawn(|| {
loop {
thread::sleep(std::time::Duration::from_secs(1));
println!("Hello from the detached thread!");
}
});
});
}
In this example, the first spawned thread will in turn spawn a child thread that prints a message
every second.
The first thread will then finish and exit. When that happens, its child thread will continue
running for as long as the overall process is running.
In Rust’s lingo, we say that the child thread has outlived its parent.
'static lifetime
A detached thread can keep running until the very end of the program. Therefore, it must not borrow any values that might be dropped before the program exits; violating this constraint would expose us to a use-after-free bug.
That’s why std::thread::spawn’s signature requires that the closure passed to it has the
'static lifetime:
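std declares it, simplified, as pub fn spawn<F, T>(f: F) -> JoinHandle<T> where F: FnOnce() -> T + Send + 'static, T: Send + 'static. A sketch of satisfying that bound by moving ownership into the closure:

```rust
use std::thread;

fn main() {
    let v = vec![1, 2, 3];
    // `move` transfers ownership of `v` into the closure,
    // so the closure borrows nothing and satisfies `'static`.
    let handle = thread::spawn(move || v.iter().sum::<i32>());
    assert_eq!(handle.join().unwrap(), 6);
}
```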
There are two ways to satisfy this requirement: the closure can take ownership of the data it needs, or it can borrow references that are valid for the entire duration of the program. The first approach is how you solved the issue in the previous exercise: by allocating new vectors to hold the left and right parts of the original vector, which were then moved into the spawned threads.
'static references
Let’s talk about the second case, references that are valid for the entire duration of the program.
Static data
The most common case is a reference to static data, such as string literals:
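For example:

```rust
fn main() {
    // The string literal is baked into the executable itself,
    // so the reference is valid for the whole program:
    let s: &'static str = "Hello world!";
    assert_eq!(s.len(), 12);
}
```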
Since string literals are known at compile-time, Rust stores them inside your executable, in a
region known as read-only data segment. All references pointing to that region will therefore
be valid for as long as the program runs; they satisfy the 'static contract.
Further reading
Exercise
The main concern around passing references to spawned threads is use-after-free bugs: access-
ing data using a pointer to a memory region that’s already been freed/de-allocated.
If you’re working with heap-allocated data, you can avoid the issue by telling Rust that you’ll
never reclaim that memory: you choose to leak memory, intentionally.
This can be done, for example, using the Box::leak method from Rust’s standard library:
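A sketch:

```rust
use std::thread;

fn main() {
    // `Box::leak` trades ownership of the allocation for a
    // reference that's valid for the rest of the program.
    let v: &'static Vec<u32> = Box::leak(Box::new(vec![1, 2, 3]));
    // A `'static` shared reference can be captured by a spawned thread:
    let handle = thread::spawn(move || v.iter().sum::<u32>());
    assert_eq!(handle.join().unwrap(), 6);
}
```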
Leaking data is dangerous: if you keep leaking memory, you’ll eventually run out and crash with
an out-of-memory error.
At the same time, memory leaked via the leak method is not truly forgotten.
The operating system can map each memory region to the process responsible for it. When the
process exits, the operating system will reclaim that memory.
Keeping this in mind, it can be OK to leak memory in some scenarios.
“Let the OS deal with it” is a perfectly valid memory management strategy if your usecase allows
for it.
Exercise
All the lifetime issues we discussed so far have a common source: the spawned thread can
outlive its parent.
We can sidestep this issue by using scoped threads.
let v = vec![1, 2, 3, 4];
let midpoint = v.len() / 2;

std::thread::scope(|scope| {
    scope.spawn(|| {
        let first = &v[..midpoint];
        println!("Here's the first half of v: {first:?}");
    });
    scope.spawn(|| {
        let second = &v[midpoint..];
        println!("Here's the second half of v: {second:?}");
    });
});

println!("Here's v: {v:?}");
scope
Scoped spawns
let handle1 = thread::spawn(|| {
    let first = &v[..midpoint];
    println!("Here's the first half of v: {first:?}");
});
let handle2 = thread::spawn(|| {
    let second = &v[midpoint..];
    println!("Here's the second half of v: {second:?}");
});

handle1.join().unwrap();
handle2.join().unwrap();

println!("Here's v: {v:?}");
The translated example wouldn’t compile, though: the compiler would complain that &v can’t be
used from our spawned threads since its lifetime isn’t 'static.
That’s not an issue with std::thread::scope—you can safely borrow from the environment.
In our example, v is created before the spawning points. It will only be dropped after scope
returns. At the same time, all threads spawned inside scope are guaranteed to finish before
scope returns, therefore there is no risk of having dangling references.
The compiler won’t complain!
Exercise
7.5 Channels
For our ticket management system, we want to do something different: a client-server architecture.
We will have one long-running server thread, responsible for managing our state, the stored
tickets.
Communication
Channels
use std::sync::mpsc::channel;
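A minimal sketch of the two halves in action:

```rust
use std::sync::mpsc::channel;

fn main() {
    // `channel` returns a `Sender` and a `Receiver`:
    // data pushed into one side comes out of the other.
    let (sender, receiver) = channel();
    sender.send("ping").unwrap();
    assert_eq!(receiver.recv().unwrap(), "ping");
}
```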
Multiple senders
Sender is clonable: we can create multiple senders (e.g. one for each client thread) and they
will all push data into the same channel.
Receiver, instead, is not clonable: there can only be a single receiver for a given channel.
That’s what mpsc (multi-producer single-consumer) stands for!
Message type
Errors
Exercise
7.6 Interior mutability

Take a second look at the signature of Sender's send method:

impl<T> Sender<T> {
    pub fn send(&self, t: T) -> Result<(), SendError<T>> {
        // [...]
    }
}

send takes &self as its receiver, yet it clearly mutates state: it enqueues a message into the channel.
When we introduced the borrow-checker, we named the two types of references we can have in Rust: immutable references (&T) and mutable references (&mut T).
Immutable/mutable is a mental model that works for the vast majority of cases, and it’s a great
one to get started with Rust. But it’s not the whole story, as you’ve just seen: &T doesn’t actually
guarantee that the data it points to is immutable.
Don’t worry, though: Rust is still keeping its promises. It’s just that the terms are a bit more
nuanced than they might seem at first.
UnsafeCell
Whenever a type allows you to mutate data through a shared reference, you’re dealing with
interior mutability.
By default, the Rust compiler assumes that shared references are immutable. It optimises your
code based on that assumption.
The compiler can reorder operations, cache values, and do all sorts of magic to make your code
faster.
You can tell the compiler “No, this shared reference is actually mutable” by wrapping the data in
an UnsafeCell.
Every time you see a type that allows interior mutability, you can be certain that UnsafeCell is
involved, either directly or indirectly.
Using UnsafeCell, raw pointers and unsafe code, you can mutate data through shared refer-
ences.
Let’s be clear, though: UnsafeCell isn’t a magic wand that allows you to ignore the borrow-
checker!
unsafe code is still subject to Rust’s rules about borrowing and aliasing. It’s an (advanced) tool
that you can leverage to build safe abstractions whose safety can’t be directly expressed in
Rust’s type system. Whenever you use the unsafe keyword you’re telling the compiler: “I know
what I’m doing, I won’t violate your invariants, trust me.”
Every time you call an unsafe function, there will be documentation explaining its safety preconditions: under what circumstances it's safe to execute its unsafe block. You can find the ones for UnsafeCell in std's documentation.
We won’t be using UnsafeCell directly in this course, nor will we be writing unsafe code. But
it’s important to know that it’s there, why it exists and how it relates to the types you use every
day in Rust.
Key examples
Let’s go through a couple of important std types that leverage interior mutability.
These are types that you’ll encounter somewhat often in Rust code, especially if you peek under
the hood of some the libraries you use.
Reference counting
Rc is a reference-counted pointer.
It wraps around a value and keeps track of how many references to the value exist. When the
last reference is dropped, the value is deallocated.
The value wrapped in an Rc is immutable: you can only get shared references to it.
use std::rc::Rc;

let a: Rc<String> = Rc::new("My string".to_string());
// Cloning an Rc does not clone the wrapped String:
// it just increments the reference count.
let b = Rc::clone(&a);
Rc uses UnsafeCell internally to allow shared references to increment and decrement the reference count.
RefCell
RefCell is one of the most common examples of interior mutability in Rust. It allows you to
mutate the value wrapped in a RefCell even if you only have an immutable reference to the
RefCell itself.
This is done via runtime borrow checking. The RefCell keeps track of the number (and type)
of references to the value it contains at runtime. If you try to borrow the value mutably while
it’s already borrowed immutably, the program will panic, ensuring that Rust’s borrowing rules are
always enforced.
use std::cell::RefCell;

let x = RefCell::new(42);

let y = x.borrow();     // immutable borrow
let z = x.borrow_mut(); // panics! x is already borrowed immutably
Exercise
In our current client-server implementation, communication flows in one direction: from the client
to the server.
The client has no way of knowing if the server received the message, executed it successfully,
or failed. That’s not ideal.
To solve this issue, we can introduce a two-way communication system.
Response channel
We need a way for the server to send a response back to the client.
There are various ways to do this, but the simplest option is to include a Sender channel in the
message that the client sends to the server. After processing the message, the server can use
this channel to send a response back to the client.
This is a fairly common pattern in Rust applications built on top of message-passing primitives.
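A sketch of the pattern, with a hypothetical Command type (the names here are made up for illustration):

```rust
use std::sync::mpsc::{channel, Sender};

// The client bundles a response channel into the request itself.
enum Command {
    Echo {
        payload: String,
        response_channel: Sender<String>,
    },
}

fn main() {
    let (request_tx, request_rx) = channel::<Command>();
    let (response_tx, response_rx) = channel();

    // Client side: send a request carrying the response sender.
    request_tx
        .send(Command::Echo {
            payload: "hello".into(),
            response_channel: response_tx,
        })
        .unwrap();

    // Server side: process the request, reply on the embedded channel.
    let Command::Echo { payload, response_channel } = request_rx.recv().unwrap();
    response_channel.send(payload.to_uppercase()).unwrap();

    // Client side: wait for the response.
    assert_eq!(response_rx.recv().unwrap(), "HELLO");
}
```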
Exercise
All the interactions from the client side have been fairly low-level: you have to manually create a
response channel, build the command, send it to the server, and then call recv on the response
channel to get the response.
This is a lot of boilerplate code that could be abstracted away, and that’s exactly what we’re
going to do in this exercise.
Exercise
Bounded channels
use std::sync::mpsc::sync_channel;

// The argument is the channel's capacity.
let (sender, receiver) = sync_channel(10);
Sending messages
• send: if there is space in the channel, it will enqueue the message and return Ok(()).
If the channel is full, it will block and wait until there is space available.
• try_send: if there is space in the channel, it will enqueue the message and return Ok(()).
If the channel is full, it will return Err(TrySendError::Full(value)), where value is the
message that couldn’t be sent.
Depending on your use case, you might want to use one or the other.
Backpressure
The main advantage of using bounded channels is that they provide a form of backpressure.
They force the producers to slow down if the consumer can’t keep up. The backpressure can
then propagate through the system, potentially affecting the whole architecture and preventing
end users from overwhelming the system with requests.
Exercise
Legacy updates
In the non-threaded version of the system, updates were fairly straightforward: TicketStore
exposed a get_mut method that allowed the caller to obtain a mutable reference to a ticket, and
then modify it.
Multithreaded updates
The same strategy won’t work in the current multithreaded version. The borrow checker would
stop us: SyncSender<&mut Ticket> isn’t 'static because &mut Ticket doesn’t satisfy
the 'static lifetime, therefore they can’t be captured by the closure that gets passed to
std::thread::spawn.
There are a few ways to work around this limitation. We’ll explore a few of them in the following
exercises.
Patching
We can’t send a &mut Ticket over a channel, therefore we can’t mutate on the client-side.
Can we mutate on the server-side?
We can, if we tell the server what needs to be changed. In other words, if we send a patch to
the server:
struct TicketPatch {
id: TicketId,
title: Option<TicketTitle>,
description: Option<TicketDescription>,
status: Option<TicketStatus>,
}
The id field is mandatory, since it’s required to identify the ticket that needs to be updated.
All other fields are optional: if a field is set to None, the corresponding ticket attribute is left unchanged.
Exercise
The patching strategy you just implemented has a major drawback: it’s racy.
If two clients send patches for the same ticket at roughly the same time, the server will apply them in an arbitrary order. Whoever enqueues their patch last will overwrite the changes made by the other client.
Version numbers
Locking
use std::sync::Mutex;
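A minimal sketch of locking and releasing:

```rust
use std::sync::Mutex;

fn main() {
    let lock = Mutex::new(0);
    {
        // `lock` blocks until the mutex is acquired.
        // The returned guard gives access to the protected data.
        let mut guard = lock.lock().unwrap();
        *guard += 1;
    } // the guard is dropped here: the lock is released
    assert_eq!(*lock.lock().unwrap(), 1);
}
```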
Locking granularity
This approach is more efficient, but it has a downside: TicketStore has to become aware of
the multithreaded nature of the system; up until now, TicketStore has been blissfully ignoring
the existence of threads.
Let’s go for it anyway.
For the whole scheme to work, the lock must be passed to the client that wants to modify the
ticket.
The client can then directly modify the ticket (as if they had a &mut Ticket) and release the lock
when they’re done.
This is a bit tricky.
We can’t send a Mutex<Ticket> over a channel, because Mutex is not Clone and we can’t move
it out of the TicketStore. Could we send the MutexGuard instead?
Let’s test the idea with a small example:
use std::thread::spawn;
use std::sync::Mutex;
use std::sync::mpsc::sync_channel;

fn main() {
    let lock = Mutex::new(0);
    let (sender, receiver) = sync_channel(1);
    let guard = lock.lock().unwrap();

    spawn(move || {
        receiver.recv().unwrap();
    });

    // Try to send the `MutexGuard` over the channel
    // to the spawned thread:
    sender.send(guard);
}
Send
Send is a marker trait that indicates that a type can be safely transferred from one thread to
another.
Send is also an auto-trait, just like Sized; it’s automatically implemented (or not implemented)
for your type by the compiler, based on its definition.
You can also implement Send manually for your types, but it requires unsafe since you have to
guarantee that the type is indeed safe to send between threads for reasons that the compiler
can’t automatically verify.
Channel requirements
MutexGuard
MutexGuard is not Send because the underlying operating system primitives that Mutex uses
to implement the lock require (on some platforms) that the lock must be released by the same
thread that acquired it.
If we were to send a MutexGuard to another thread, the lock would be released by a different
thread, which would lead to undefined behavior.
Our challenges
Summing it up:
• We can’t send a MutexGuard over a channel. So we can’t lock on the server-side and then
modify the ticket on the client-side.
• We can send a Mutex over a channel because it’s Send as long as the data it protects is
Send, which is the case for Ticket. At the same time, we can’t move the Mutex out of the
TicketStore nor clone it.
impl<T> Mutex<T> {
// `&self`, not `self`!
pub fn lock(&self) -> LockResult<MutexGuard<'_, T>> {
// Implementation details
}
}
use std::sync::Arc;
If you’re having a déjà vu moment, you’re right: Arc sounds very similar to Rc, the reference-
counted pointer we introduced when talking about interior mutability. The difference is thread-
safety: Rc is not Send, while Arc is. It boils down to the way the reference count is implemented:
Rc uses a “normal” integer, while Arc uses an atomic integer, which can be safely shared and
modified across threads.
Arc<Mutex<T>>
We have all the pieces we need to implement the locking strategy for our ticket store.
Further reading
• We won’t be covering the details of atomic operations in this course, but you can find more
information in the std documentation as well as in the “Rust atomics and locks” book.
Exercise
Our new TicketStore works, but its read performance is not great: there can only be one client
at a time reading a specific ticket, because Mutex<T> doesn’t distinguish between readers and
writers.
We can solve the issue by using a different locking primitive: RwLock<T>.
RwLock<T> stands for read-write lock. It allows multiple readers to access the data simultaneously, but only one writer at a time.
RwLock<T> has two methods to acquire a lock: read and write.
read returns a guard that allows you to read the data, while write returns a guard that allows
you to modify it.
use std::sync::RwLock;
Trade-offs
On the surface, RwLock<T> seems like a no-brainer: it provides a superset of the functionality of
Mutex<T>. Why would you ever use Mutex<T> if you can use RwLock<T> instead?
There are two key reasons:
• Overhead: RwLock has to do more bookkeeping than Mutex to keep track of readers and writers, so acquiring and releasing it is more expensive. Under low read contention, a plain Mutex can be faster.
• Writer starvation: if new readers keep arriving, a writer may have to wait a long time before it gets exclusive access.
In our case, we can expect the workload to be read-heavy (since most clients will be reading
tickets, not modifying them), so RwLock<T> is a good choice.
Exercise
No locking of the state was necessary, since the server was the only one modifying the state.
That’s because the “inbox” channel naturally serialized incoming requests: the server would
process them one by one.
We’ve already discussed the limitations of this approach when it comes to patching behaviour,
but we didn’t discuss the performance implications of the original design: the server could only
process one request at a time, including reads.
Fine-grained locking
We then moved to a more sophisticated design, where each ticket was protected by its own
lock and clients could independently decide if they wanted to read or atomically modify a ticket,
acquiring the appropriate lock.
This design allows for better parallelism (i.e. multiple clients can read tickets at the same time),
but it is still fundamentally serial: the server processes commands one by one. In particular, it
hands out locks to clients one by one.
Could we remove the channels entirely and allow clients to directly access the TicketStore,
relying exclusively on locks to synchronize access?
Removing channels
We want all threads to refer to the same state, otherwise we don’t really have a multithreaded
system—we’re just running multiple single-threaded systems in parallel.
We’ve already encountered this problem when we tried to share a lock across threads: we can
use an Arc.
There is one interaction that’s still lockless thanks to the serialization provided by the channels:
inserting (or removing) a ticket from the store.
If we remove the channels, we need to introduce (another) lock to synchronize access to the
TicketStore itself.
If we use a Mutex, then it makes no sense to use an additional RwLock for each ticket: the Mutex
will already serialize access to the entire store, so we wouldn’t be able to read tickets in parallel
anyway.
If we use a RwLock, instead, we can read tickets in parallel. We just need to pause all reads while
inserting or removing a ticket.
Let’s go down this path and see where it leads us.
Exercise
7.14 Sync
Before we wrap up this chapter, let’s talk about another key trait in Rust’s standard library: Sync.
Sync is an auto trait, just like Send.
It is automatically implemented by all types that can be safely shared between threads.
In other words: T is Sync if &T is Send.
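One way to see these properties in action is with helper functions whose bounds are checked at compile time (`require_send` and `require_sync` are made-up names, not std items):

```rust
use std::sync::MutexGuard;

// These helpers compile only if `T` satisfies the bound.
fn require_send<T: Send>() {}
fn require_sync<T: Sync>() {}

fn main() {
    // `i32` is both `Send` and `Sync`.
    require_send::<i32>();
    require_sync::<i32>();

    // `&T` being `Send` is exactly what `T: Sync` means,
    // so shared references to `Sync` types can cross threads.
    require_send::<&'static i32>();

    // `MutexGuard` is not `Send`, but it *is* `Sync`:
    // other threads may look through a `&MutexGuard`,
    // they just can't take ownership of it.
    require_sync::<MutexGuard<'static, i32>>();

    println!("all bounds satisfied");
}
```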
Exercise
Async Rust
Threads are not the only way to write concurrent programs in Rust.
In this chapter we’ll explore another approach: asynchronous programming.
In particular, you’ll get an introduction to:
Exercise
All the functions and methods you’ve written so far were eager.
Nothing happened until you invoked them. But once you did, they ran to completion: they did all
their work, and then returned their output.
Sometimes that’s undesirable.
For example, if you’re writing an HTTP server, there might be a lot of waiting: waiting for the
request body to arrive, waiting for the database to respond, waiting for a downstream service to
reply, etc.
What if you could do something else while you’re waiting?
What if you could choose to give up midway through a computation?
What if you could choose to prioritise another task over the current one?
That’s where asynchronous functions come in.
async fn
use tokio::net::TcpListener;

// Hypothetical: bind a listener to a random local port
async fn bind_random() -> TcpListener {
    TcpListener::bind("127.0.0.1:0").await.unwrap()
}

fn run() {
    // Invoke `bind_random`
    let listener = bind_random();
    // Now what?
}
Nothing happens!
Rust doesn’t start executing bind_random when you call it, not even as a background task (as you
might expect based on your experience with other languages). Asynchronous functions in Rust
are lazy: they don’t do any work until you explicitly ask them to. Using Rust’s terminology, we
say that bind_random returns a future, a type that represents a computation that may complete
later. They’re called futures because they implement the Future trait, an interface that we’ll
examine in detail later on in this chapter.
.await
The most common way to ask an asynchronous function to do some work is to use the .await
keyword:
use tokio::net::TcpListener;

async fn run() {
    // Invoke `bind_random` and wait for it to complete
    let listener = bind_random().await;
    // Now `listener` is ready
}
.await doesn’t return control to the caller until the asynchronous function has run to completion—e.g. until the TcpListener has been created in the example above.
Runtimes
Futures don’t run themselves: an asynchronous runtime is in charge of polling them and driving them to completion. Runtimes have two key responsibilities:
• Progress: they make sure that tasks make progress whenever they can.
• Efficiency: if a task is waiting for something, they try to make sure that another task can
run in the meantime, fully utilising the available resources.
No default runtime
Rust is fairly unique in its approach to asynchronous programming: there is no default runtime.
The standard library doesn’t ship with one. You need to bring your own!
In most cases, you’ll choose one of the options available in the ecosystem. Some runtimes are
designed to be broadly applicable, a solid option for most applications. tokio and async-std
belong to this category. Other runtimes are optimised for specific use cases—e.g. embassy for
embedded systems.
Throughout this course we’ll rely on tokio, the most popular runtime for general-purpose asynchronous programming in Rust.
#[tokio::main]
The entrypoint of your executable, the main function, must be a synchronous function. That’s
where you’re supposed to set up and launch your chosen async runtime.
Most runtimes provide a macro to make this easier. For tokio, it’s tokio::main:
#[tokio::main]
async fn main() {
    // Your async code goes here
}

Under the hood, the macro expands to something like this:

fn main() {
    let rt = tokio::runtime::Runtime::new().unwrap();
    rt.block_on(
        // Your async function goes here
        // [...]
    );
}
#[tokio::test]
The same applies to tests: they must be synchronous functions. tokio provides #[tokio::test] to write asynchronous tests:

#[tokio::test]
async fn my_test() {
    // Your async test code goes here
}
Exercise
Your solution to the previous exercise should look something like this:
tokio::spawn
tokio::spawn allows you to hand off a task to the executor, without waiting for it to complete.
Whenever you invoke tokio::spawn, you’re telling tokio to continue running the spawned task,
in the background, concurrently with the task that spawned it.
Here’s how you can use it to process multiple connections concurrently:
use tokio::net::TcpListener;
Asynchronous blocks
JoinHandle
Panic boundary
If a task spawned with tokio::spawn panics, the panic will be caught by the executor.
If you don’t .await the corresponding JoinHandle, the panic won’t be propagated to the
spawner. Even if you do .await the JoinHandle, the panic won’t be propagated automatically. Awaiting a JoinHandle returns a Result, with JoinError as its error type. You can then
check if the task panicked by calling JoinError::is_panic and choose what to do with the
panic—either log it, ignore it, or propagate it.
use tokio::task::JoinError;
std::thread::spawn vs tokio::spawn
Exercise
So far we’ve been talking about async runtimes as an abstract concept. Let’s dig a bit deeper
into the way they are implemented—as you’ll see soon enough, it has an impact on our code.
Flavors
The current-thread runtime, as the name implies, relies exclusively on the OS thread it was
launched on to schedule and execute tasks.
When using the current-thread runtime, you have concurrency but no parallelism: asyn-
chronous tasks will be interleaved, but there will always be at most one task running at any
given time.
Multithreaded runtime
When using the multithreaded runtime, instead, there can be up to N tasks running in parallel at any given time, where N is the number of threads used by the runtime. By default, N matches the number of available CPU cores.
There’s more: tokio performs work-stealing.
If a thread is idle, it won’t wait around: it’ll try to find a new task that’s ready for execution, either
from a global queue or by stealing it from the local queue of another thread.
Work-stealing can have significant performance benefits, especially on tail latencies, whenever
your application is dealing with workloads that are not perfectly balanced across threads.
Implications
Let’s ignore the Future trait for now to focus on the rest.
spawn requires all its inputs to be Send and to have a 'static lifetime.
The 'static constraint follows the same rationale as the 'static constraint on std::thread::spawn:
the spawned task may outlive the context it was spawned from, therefore it shouldn’t depend
on any local data that may be de-allocated after the spawning context is destroyed.
fn spawner() {
    let v = vec![1, 2, 3];
    // This won't work, since `&v` doesn't
    // live long enough.
    tokio::spawn(async {
        for x in &v {
            println!("{x}")
        }
    })
}
Send, on the other hand, is a direct consequence of tokio’s work-stealing strategy: a task that
was spawned on thread A may end up being moved to thread B if that’s idle, thus requiring a
Send bound since we’re crossing thread boundaries.
use std::rc::Rc;

fn spawner(input: Rc<u64>) {
    // This won't work either, because
    // `Rc` isn't `Send`.
    tokio::spawn(async move {
        println!("{}", input);
    })
}
Exercise
use std::rc::Rc;
use tokio::task::yield_now;

fn spawner() {
    tokio::spawn(example());
}

async fn example() {
    // A value that's not `Send`,
    // created _inside_ the async function
    let non_send = Rc::new(1);
    // A `.await` point, with `non_send` still in scope
    yield_now().await;
    println!("{}", non_send);
}

The compiler rejects this code:
13 | yield_now().await;
| ^^^^^
| await occurs here, with `non_send` maybe used later
note: required by a bound in `tokio::spawn`
|
164 | pub fn spawn<F>(future: F) -> JoinHandle<F::Output>
| ----- required by a bound in this function
165 | where
166 | F: Future + Send + 'static,
| ^^^^ required by this bound in `spawn`
To understand why that’s the case, we need to refine our understanding of Rust’s asynchronous
model.
We stated early on that async functions return futures, types that implement the Future trait.
You can think of a future as a state machine. It’s in one of two states: it’s either still running, or it has finished with an output.
This is captured in the definition of the Future trait from the standard library:

trait Future {
    type Output;
    // `Pin`, `Context` and `Poll` come from `std::pin` and `std::task`
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}
poll
• Poll::Pending: the future is not ready yet. You need to call poll again later.
• Poll::Ready(value): the future has finished. value is the result of the computation, of
type Self::Output.
Once Future::poll returns Poll::Ready, it should not be polled again: the future has completed, there’s nothing left to do.
use std::rc::Rc;
use tokio::task::yield_now;

async fn example() {
    let non_send = Rc::new(1);
    yield_now().await;
    println!("{}", non_send);
}
The compiler would transform it into an enum that looks somewhat like this:
When example is called, it returns ExampleFuture::NotStarted. The future has never been
polled yet, so nothing has happened.
When the runtime polls it the first time, ExampleFuture will advance until the next .await point:
it’ll stop at the ExampleFuture::YieldNow(Rc<i32>) stage of the state machine, returning
Poll::Pending.
When it’s polled again, it’ll execute the remaining code (println!) and return Poll::Ready(()).
Looking at its state machine representation, ExampleFuture, it is now clear why example is not Send: it holds an Rc across an .await point, therefore the future itself cannot be Send.
Yield points
As you’ve just seen with example, every .await point creates a new intermediate state in the
lifecycle of a future.
That’s why .await points are also known as yield points: your future yields control back to the
runtime that was polling it, allowing the runtime to pause it and (if necessary) schedule another
task for execution, thus making progress on multiple fronts concurrently.
We’ll come back to the importance of yielding in a later section.
Exercise
What is blocking?
A task is blocking when it runs for a “long” time without ever reaching an .await point, thus never yielding control back to the runtime.
How long is too long? How much time can a task spend without yielding before it becomes a
problem?
It depends on the runtime, the application, the number of in-flight tasks, and many other factors.
But, as a general rule of thumb, try to spend less than 100 microseconds between yield points.
Consequences
• Deadlocks: if the task that’s not yielding is waiting for another task to complete, and that
task is waiting for the first one to yield, you have a deadlock. No progress can be made,
unless the runtime is able to schedule the other task on a different thread.
• Starvation: other tasks might not be able to run, or might run after a long delay, which can
lead to poor performance (e.g. high tail latencies).
The most common sources of blocking are:
• Synchronous I/O. You can’t predict how long it will take, and it’s likely to be longer than 100
microseconds.
• Expensive CPU-bound computations.
The latter category is not always obvious though. For example, sorting a vector with a few
elements is not a problem; that evaluation changes if the vector has billions of entries.
OK, so how do you avoid blocking the runtime when you have to perform an operation that qualifies (or risks qualifying) as blocking?
You need to move the work to a different thread. You don’t want to use the so-called runtime
threads, the ones used by tokio to run tasks.
tokio provides a dedicated threadpool for this purpose, called the blocking pool. You can
spawn a synchronous operation on the blocking pool using the tokio::task::spawn_blocking
function. spawn_blocking returns a future that resolves to the result of the operation when it
completes.
use tokio::task;

// Stand-in for some CPU-heavy work:
fn expensive_computation() -> u64 {
    42
}

async fn run() {
    let handle = task::spawn_blocking(expensive_computation);
    // Do other stuff in the meantime
    let result = handle.await.unwrap();
}
The blocking pool is long-lived. spawn_blocking should be faster than creating a new thread directly via std::thread::spawn because the cost of thread initialization is amortized over multiple calls.
Further reading
Exercise
If you browse tokio’s documentation, you’ll notice that it provides a lot of types that “mirror”
the ones in the standard library, but with an asynchronous twist: locks, channels, timers, and
more.
When working in an asynchronous context, you should prefer these asynchronous alternatives
to their synchronous counterparts.
To understand why, let’s take a look at Mutex, the mutually exclusive lock we explored in the
previous chapter.
Task A                     Task B
  |
Acquire lock
Yields to runtime
  |
  +------------------------+
                           |
                 Tries to acquire lock
We have a deadlock. Task B will never manage to acquire the lock, because the lock is currently
held by task A, which has yielded to the runtime before releasing the lock and won’t be scheduled
again because the runtime cannot preempt task B.
tokio::sync::Mutex
use std::sync::Arc;
use tokio::sync::Mutex;
Acquiring the lock is now an asynchronous operation, which yields back to the runtime if it can’t
make progress.
Going back to the previous scenario, the following would happen:
Task A                     Task B
  |
Acquires the lock
Starts `http_call`
Yields to runtime
  |
  +------------------------+
                           |
                 Tries to acquire the lock
                 Cannot acquire the lock
                 Yields to runtime
                           |
  +------------------------+
  |
`http_call` completes
Releases the lock
Yields to runtime
  |
  +------------------------+
                           |
                 Acquires the lock
                 [...]
All good!
We’ve used a single-threaded runtime as the execution context in our previous example, but the
same risk persists even when using a multithreaded runtime.
The only difference is in the number of concurrent tasks required to create the deadlock: in a
single-threaded runtime, 2 are enough; in a multithreaded runtime, we would need N+1 tasks,
where N is the number of runtime threads.
Downsides
Other primitives
We used Mutex as an example, but the same applies to RwLock, semaphores, etc.
Prefer async-aware versions when working in an asynchronous context to minimise the risk of
issues.
Exercise
8.7 Cancellation
use tokio::time::timeout;
use tokio::sync::oneshot;
use std::time::Duration;

async fn http_call() {
    // [...]
}

async fn run() {
    // Wrap the future with a `Timeout` set to expire in 10 milliseconds.
    let duration = Duration::from_millis(10);
    if let Err(_) = timeout(duration, http_call()).await {
        println!("Didn't receive a value within 10 ms");
    }
}
When the timeout expires, the future returned by http_call will be cancelled. Let’s imagine
that this is http_call’s body:
use tokio::io::AsyncWriteExt;
use tokio::net::TcpStream;

async fn http_call() {
    let mut stream = TcpStream::connect(/* */).await.unwrap();
    let request: Vec<u8> = /* */;
    stream.write_all(&request).await.unwrap();
}
Clean up
Rust’s cancellation mechanism is quite powerful—it allows the caller to cancel an ongoing task
without needing any form of cooperation from the task itself.
At the same time, this can be quite dangerous. It may be desirable to perform a graceful cancellation, to ensure that some clean-up tasks are performed before aborting the operation.
For example, consider this fictional API for a SQL transaction:
async fn transfer_money(
    connection: SqlConnection,
    payer_id: u64,
    payee_id: u64,
    amount: u64
) -> Result<(), anyhow::Error> {
    let transaction = connection.begin_transaction().await?;
    update_balance(payer_id, amount, &transaction).await?;
    decrease_balance(payee_id, amount, &transaction).await?;
    transaction.commit().await?;
    Ok(())
}
On cancellation, it’d be ideal to explicitly abort the pending transaction rather than leaving it hanging. Rust, unfortunately, doesn’t provide a bullet-proof mechanism for this kind of asynchronous clean-up operation.
The most common strategy is to rely on the Drop trait to schedule the required clean-up work.
This can be done by:
When you spawn a task using tokio::spawn, you can no longer drop it; it belongs to the runtime.
Nonetheless, you can use its JoinHandle to cancel it if needed:
async fn run() {
let handle = tokio::spawn(/* some async task */);
// Cancel the spawned task
handle.abort();
}
Further reading
• Be extremely careful when using tokio’s select! macro to “race” two different futures.
Retrying the same task in a loop is dangerous unless you can ensure cancellation safety.
Check out select!’s documentation for more details.
If you need to interleave two asynchronous streams of data (e.g. a socket and a channel),
prefer using StreamExt::merge instead.
• A CancellationToken may be preferable to JoinHandle::abort in some cases.
Exercise
8.8 Outro
Rust’s asynchronous model is quite powerful, but it does introduce additional complexity. Take
time to know your tools: dive deep into tokio’s documentation and get familiar with its primitives
to make the most out of it.
Keep in mind, as well, that there is ongoing work at the language and std level to streamline and
“complete” Rust’s asynchronous story. You may experience some rough edges in your day-to-day
work due to some of these missing pieces.
A few recommendations for a mostly-pain-free async experience:
Don’t let these caveats scare you: asynchronous Rust is being used effectively at massive scale
(e.g. AWS, Meta) to power foundational services.
You will have to master it if you’re planning to build networked applications in Rust.
Exercise
Epilogue
In the end, there are no shortcuts: if you want to get good at something, you need to do it,
over and over again. Throughout this course you wrote a fair amount of Rust, enough to get the
language and its syntax flowing under your fingers. It’ll take many more lines of code to feel it
“yours”, but that moment will come without a doubt if you keep practicing.
Going further
Let’s close with some pointers to additional resources that you might find useful as you move
forward in your journey with Rust.
Exercises
You can find more exercises to practice Rust in the rustlings project and on exercism.io’s Rust
track.
Introductory material
Check out the Rust book and “Programming Rust” if you’re looking for a different perspective
on the same concepts we covered throughout this course. You’ll certainly learn something new
since they don’t cover exactly the same topics; Rust has a lot of surface area!
Advanced material
If you want to dive deeper into the language, refer to the Rustonomicon and “Rust for Rustaceans”.
The “Decrusted” series is another excellent resource to learn more about the internals of many
of the most popular Rust libraries.
Domain-specific material
If you want to use Rust for backend development, check out “Zero to Production in Rust”.
If you want to use Rust for embedded development, check out the Embedded Rust book.
Masterclasses
You can then find resources on key topics that cut across domains.
For testing, check out “Advanced testing, going beyond the basics”.
For telemetry, check out “You can’t fix what you can’t see”.