Rust Linked List Light
Rust Linked List Light
Table of Contents
Learning Rust With Entirely Too Many Linked Lists
Learn Rust With Entirely Too Many Linked Lists
An Obligatory Public Service Announcement
Performance doesn't always matter
They have O(1) split-append-insert-remove if you have a pointer there
I can't afford amortization
Linked lists waste less space
I use linked lists all the time in <functional language>
Linked lists are great for building concurrent data structures!
Mumble mumble kernel embedded something something intrusive.
Iterators don't get invalidated by unrelated insertions/removals
They're simple and great for teaching!
Take a Breath
A Bad Singly-Linked Stack
Basic Data Layout
New
Ownership 101
Push
Pop
Testing
Drop
Bonus Section For Premature Optimization!
The Final Code
An Ok Singly-Linked Stack
Using Option
Making it all Generic
Peek
IntoIter
Iter
IterMut
Final Code
A Persistent Singly-Linked Stack
Layout
Basics
Drop
Arc
Final Code
A Bad but Safe Doubly-Linked Deque
Layout
Building Up
When to choose interior mutability
Introducing inherited mutability roots to shared types
Breaking Down
Peeking
Symmetric Junk
Iteration
IntoIter 2
Iter
Final Code
An Ok Unsafe Singly-Linked Queue
Layout
Unsafe Rust
Basics
Miri
Attempting To Understand Stacked Borrows
The Motivation: Pointer Aliasing
Safe Stacked Borrows
Unsafe Stacked Borrows
Managing Stacked Borrows
Testing Stacked Borrows
Basic Borrows
Testing Arrays
Testing Shared References
Testing Interior Mutability
Testing Box
Layout and Basics 2: Getting Raw
Layout
Basics
Extra Junk
Final Code
A Production-Quality Unsafe Doubly-Linked Deque
Layout
Variance and PhantomData
Basics
Drop and Panic Safety
Boring Combinatorics
Filling In Random Bits
Testing
Send, Sync, and Compile Tests
An Introduction To Cursors
Implementing Cursors
Split
Splice
Testing Cursors
Final Code
A Bunch of Silly Lists
The Double Singly-Linked List
The Stack-Allocated Linked List
3
Got any issues or want to check out all the final code at once? Everything's on Github!
NOTE: The current edition of this book is written against Rust 2018, which was first released with
rustc 1.31 (Dec 8, 2018). If your rust toolchain is new enough, the Cargo.toml file that cargo new
creates should contain the line edition = "2018" (or if you're reading this in the far future, perhaps
some even larger number!). Using an older toolchain is possible, but unlocks a secret hardmode,
where you get extra compiler errors that go completely unmentioned in the text of this book. Wow,
sounds like fun!
I fairly frequently get asked how to implement a linked list in Rust. The answer honestly depends on what
your requirements are, and it's obviously not super easy to answer the question on the spot. As such I've
decided to write this book to comprehensively answer the question once and for all.
In this series I will teach you basic and advanced Rust programming entirely by having you implement 6
linked lists. In doing so, you should learn:
The following pointer types: & , &mut , Box , Rc , Arc , *const , *mut , NonNull (?)
Ownership, borrowing, inherited mutability, interior mutability, Copy
All The Keywords: struct, enum, fn, pub, impl, use, ...
Pattern matching, generics, destructors
Testing, installing new toolchains, using miri
Unsafe Rust: raw pointers, aliasing, stacked borrows, UnsafeCell, variance
Yes, linked lists are so truly awful that you deal with all of these concepts in making them real.
Everything's in the sidebar (may be collapsed on mobile), but for quick reference, here's what we're going to
be making:
Just so we're all the same page, I'll be writing out all the commands that I feed into my terminal. I'll also be
using Rust's standard package manager, Cargo, to develop the project. Cargo isn't necessary to write a Rust
program, but it's so much better than using rustc directly. If you just want to futz around you can also run
some simple programs in the browser via play.rust-lang.org.
In later sections, we'll be using "rustup" to install extra Rust tooling. I strongly recommend installing all of
your Rust toolchains using rustup.
Let's get started and make our project: 4
We'll put each list in a separate file so that we don't lose any of our work.
It should be noted that the authentic Rust learning experience involves writing code, having the compiler
scream at you, and trying to figure out what the heck that means. I will be carefully ensuring that this occurs
as frequently as possible. Learning to read and understand Rust's generally excellent compiler errors and
documentation is incredibly important to being a productive Rust programmer.
Although actually that's a lie. In writing this I encountered way more compiler errors than I show. In
particular, in the later chapters I won't be showing a lot of the random "I typed (copy-pasted) bad" errors
that you expect to encounter in every language. This is a guided tour of having the compiler scream at us.
We're going to be going pretty slow, and I'm honestly not going to be very serious pretty much the entire
time. I think programming should be fun, dang it! If you're the type of person who wants maximally
information-dense, serious, and formal content, this book is not for you. Nothing I will ever make is for you.
You are wrong.
5
But all of these cases are super rare for anyone writing a Rust program. 99% of the time you should just
use a Vec (array stack), and 99% of the other 1% of the time you should be using a VecDeque (array
deque). These are blatantly superior data structures for most workloads due to less frequent allocation,
lower memory overhead, true random access, and cache locality.
Linked lists are as niche and vague of a data structure as a trie. Few would balk at me claiming a trie is a
niche structure that your average programmer could happily never learn in an entire productive career -- and
yet linked lists have some bizarre celebrity status. We teach every undergrad how to write a linked list. It's
the only niche collection I couldn't kill from std::collections. It's the list in C++!
We should all as a community say no to linked lists as a "standard" data structure. It's a fine data structure
with several great use cases, but those use cases are exceptional, not common.
Several people apparently read the first paragraph of this PSA and then stop reading. Like, literally they'll
try to rebut my argument by listing one of the things in my list of great use cases. The thing right after the
first paragraph!
Just so I can link directly to a detailed argument, here are several attempts at counter-arguments I have
seen, and my response to them. Feel free to skip to the first chapter if you just want to learn some Rust!
If performance doesn't matter, then it's surely fine to apply the natural default of an array.
They have O(1) split-append-insert-remove if you have a 6
pointer there
Yep! Although as Bjarne Stroustrup notes this doesn't actually matter if the time it takes to get that pointer
completely dwarfs the time it would take to just copy over all the elements in an array (which is really quite
fast).
Unless you have a workload that is heavily dominated by splitting and merging costs, the penalty every
other operation takes due to caching effects and code complexity will eliminate any theoretical gains.
But yes, if you're profiling your application to spend a lot of time in splitting and merging, you may have
gains in a linked list.
Then push and pop will be truly O(1) operations. And they're going to be considerably faster than push
and pop on linked list. You do a pointer offset, write the bytes, and increment an integer. No need to go to
any kind of allocator.
But yes, if you can't predict your load, there are worst-case latency savings to be had!
But this is a worst-case scenario. In the best-case, an array stack only has three pointers of overhead for the
entire array. Basically no overhead.
Linked lists on the other hand unconditionally waste space per element. A singly-linked list wastes one
pointer while a doubly-linked list wastes two. Unlike an array, the relative wasteage is proportional to the
size of the element. If you have huge elements this approaches 0 waste. If you have tiny elements (say,
bytes), then this can be as much as 16x memory overhead (8x on 32-bit)!
Actually, it's more like 23x (11x on 32-bit) because padding will be added to the byte to align the whole
node's size to a pointer.
This is also assuming the best-case for your allocator: that allocating and deallocating nodes is being done 7
densely and you're not losing memory to fragmentation.
But yes, if you have huge elements, can't predict your load, and have a decent allocator, there are memory
savings to be had!
Specifically, linked lists are nice because they represent an iteration without the need for any mutable state.
The next step is just visiting the next sublist.
Rust mostly does this kind of thing with iterators. They can be infinite and you can map, filter, reverse, and
concatenate them just like a functional list, and it will all be done just as lazily!
Rust also lets you easily talk about sub-arrays with slices. Your usual head/tail split in a functional
language is just slice.split_at_mut(1) . For a long time, Rust had an experimental system for pattern
matching on slices which was super cool, but the feature was simplified when it was stabilized. Still, basic
slice patterns are neat! And of course, slices can be turned into iterators!
But yes, if you're limited to immutable semantics, linked lists can be very nice.
Note that I'm not saying that functional programming is necessarily weak or bad. However it is
fundamentally semantically limited: you're largely only allowed to talk about how things are, and not how
they should be done. This is actually a feature, because it enables the compiler to do tons of exotic
transformations and potentially figure out the best way to do things without you having to worry about it.
However this comes at the cost of being able to worry about it. There are usually escape hatches, but at
some limit you're just writing procedural code again.
Even in functional languages, you should endeavour to use the appropriate data structure for the job when
you actually need a data structure. Yes, singly-linked lists are your primary tool for control flow, but they're
a really poor way to actually store a bunch of data and query it.
But yes, linked lists are the defacto heroes of the dark world of lock-free concurrency.
Mumble mumble kernel embedded something something 8
intrusive.
It's niche. You're talking about a situation where you're not even using your language's runtime. Is that not a
red flag that you're doing something strange?
But yes, you can do some really cool crazy stuff with cursors.
Take a Breath
Ok. That's out of the way. Let's write a bajillion linked lists.
We'll put our first list in src/first.rs . We need to tell Rust that first.rs is something that our lib
uses. All that requires is that we put this at the top of src/lib.rs (which Cargo made for us):
// in lib.rs
pub mod first;
11
Which reads approximately as "A List is either Empty or an Element followed by a List". This is a recursive
definition expressed as a sum type, which is a fancy name for "a type that can have different values which
may be different types". Rust calls sum types enum s! If you're coming from a C-like language, this is
exactly the enum you know and love, but in overdrive. So let's transcribe this functional definition into Rust!
For now we'll avoid generics to keep things simple. We'll only support storing signed 32-bit integers:
// in first.rs
// pub says we want people outside this module to be able to use List
pub enum List {
Empty,
Elem(i32, List),
}
Well. I don't know about you, but I certainly feel betrayed by the functional programming community.
If we actually check out the error message (after we get over the whole betrayal thing), we can see that rustc
is actually telling us exactly how to solve this problem:
insert indirection (e.g., a Box , Rc , or & ) at some point to make first::List representable
Lesse here...
A pointer type for heap allocation. See the module-level documentation for more.
clicks link
Box<T> , casually referred to as a 'box', provides the simplest form of heap allocation in Rust. Boxes
provide ownership for this allocation, and drop their contents when they go out of scope.
Examples
Creating a box:
let x = Box::new(5);
#[derive(Debug)]
enum List<T> {
Cons(T, Box<List<T>>),
Nil,
}
fn main() {
let list: List<i32> = List::Cons(1, Box::new(List::Cons(2, Box::new(List::Nil))));
println!("{:?}", list);
}
Recursive structures must be boxed, because if the definition of Cons looked like this:
Cons(T, List<T>),
It wouldn't work. This is because the size of a List depends on how many elements are in the list, and
so we don't know how much memory to allocate for a Cons. By introducing a Box, which has a
defined size, we know how big Cons needs to be.
Wow, uh. That is perhaps the most relevant and helpful documentation I have ever seen. Literally the first
thing in the documentation is exactly what we're trying to write, why it didn't work, and how to fix it.
Dang, docs rule. 13
Hey it built!
...but this is actually a really foolish definition of a List, for a few reasons.
[] = Stack
() = Heap
We're allocating a node that just says "I'm not actually a Node"
One of our nodes isn't heap-allocated at all.
On the surface, these two seem to cancel each-other out. We heap-allocate an extra node, but one of our
nodes doesn't need to be heap-allocated at all. However, consider the following potential layout for our
list:
In this layout we now unconditionally heap allocate our nodes. The key difference is the absence of the
junk from our first layout. What is this junk? To understand that, we'll need to look at how an enum is laid
out in memory.
enum Foo {
D1(T1),
D2(T2),
...
Dn(Tn),
}
A Foo will need to store some integer to indicate which variant of the enum it represents ( D1 , D2 , .. Dn ).
This is the tag of the enum. It will also need enough space to store the largest of T1 , T2 , .. Tn (plus some
extra space to satisfy alignment requirements).
The big takeaway here is that even though Empty is a single bit of information, it necessarily consumes
enough space for a pointer and an element, because it has to be ready to become an Elem at any time.
Therefore the first layout heap allocates an extra element that's just full of junk, consuming a bit more 14
space than the second layout.
One of our nodes not being allocated at all is also, perhaps surprisingly, worse than always allocating it.
This is because it gives us a non-uniform node layout. This doesn't have much of an appreciable effect on
pushing and popping nodes, but it does have an effect on splitting and merging lists.
layout 1:
[Elem A, ptr] -> (Elem B, ptr) -> (Elem C, ptr) -> (Empty *junk*)
split off C:
layout 2:
[ptr] -> (Elem A, ptr) -> (Elem B, ptr) -> (Elem C, *null*)
split off C:
Layout 2's split involves just copying B's pointer to the stack and nulling the old value out. Layout 1
ultimately does the same thing, but also has to copy C from the heap to the stack. Merging is the same
process in reverse.
One of the few nice things about a linked list is that you can construct the element in the node itself, and
then freely shuffle it around lists without ever moving it. You just fiddle with pointers and stuff gets
"moved". Layout 1 trashes this property.
Alright, I'm reasonably convinced Layout 1 is bad. How do we rewrite our List? Well, we could do
something like:
Hopefully this seems like an even worse idea to you. Most notably, this really complicates our logic,
because there is now a completely invalid state: ElemThenNotEmpty(0, Box(Empty)) . It also still suffers
from non-uniformly allocating our elements.
However it does have one interesting property: it totally avoids allocating the Empty case, reducing the total
number of heap allocations by 1. Unfortunately, in doing so it manages to waste even more space! This is
because the previous layout took advantage of the null pointer optimization.
We previously saw that every enum has to store a tag to specify which variant of the enum its bits
represent. However, if we have a special kind of enum:
enum Foo { 15
A,
B(ContainsANonNullPtr),
}
the null pointer optimization kicks in, which eliminates the space needed for the tag. If the variant is A, the
whole enum is set to all 0 's. Otherwise, the variant is B. This works because B can never be all 0 's, since
it contains a non-zero pointer. Slick!
Can you think of other enums and types that could do this kind of optimization? There's actually a lot! This
is why Rust leaves enum layout totally unspecified. There are a few more complicated enum layout
optimizations that Rust will do for us, but the null pointer one is definitely the most important! It means & ,
&mut , Box , Rc , Arc , Vec , and several other important types in Rust have no overhead when put in an
Option ! (We'll get to most of these in due time.)
So how do we avoid the extra junk, uniformly allocate, and get that sweet null-pointer optimization? We
need to better separate out the idea of having an element from allocating another list. To do this, we have
to think a little more C-like: structs!
While enums let us declare a type that can contain one of several values, structs let us declare a type that
contains many values at once. Let's break our List into two types: A List, and a Node.
As before, a List is either Empty or has an element followed by another List. By representing the "has an
element followed by another List" case by an entirely separate type, we can hoist the Box to be in a more
optimal position:
struct Node {
elem: i32,
next: List,
}
Alright! We actually just constructed exactly the layout that we used to demonstrate that our first layout (as
suggested by the official Rust documentation) was problematic.
> cargo build 16
:(
Rust is mad at us again. We marked the List as public (because we want people to be able to use it), but
not the Node . The problem is that the internals of an enum are totally public, and we're not allowed to
publicly talk about private types. We could make all of Node totally public, but generally in Rust we favour
keeping implementation details private. Let's make List a struct, so that we can hide the implementation
details:
enum Link {
Empty,
More(Box<Node>),
}
struct Node {
elem: i32,
next: Link,
}
Because List is a struct with a single field, its size is the same as that field. Yay zero-cost abstractions!
> cargo build 17
Alright, that compiled! Rust is pretty mad, because as far as it can tell, everything we've written is totally
useless: we never use head , and no one who uses our library can either since it's private. Transitively, that
means Link and Node are useless too. So let's solve that! Let's implement some code for our List!
18
New
To associate actual code with a type, we use impl blocks:
impl List {
// TODO, make code happen
}
Now we just need to figure out how to actually write code. In Rust we declare a function like so:
The first thing we want is a way to construct a list. Since we hide the implementation details, we need to
provide that as a function. The usual way to do that in Rust is to provide a static method, which is just a
normal function inside an impl :
impl List {
pub fn new() -> Self {
List { head: Link::Empty }
}
}
Self is an alias for "that type I wrote at the top next to impl ". Great for not repeating yourself!
We create an instance of a struct in much the same way we declare it, except instead of providing the
types of its fields, we initialize them with values.
We refer to variants of an enum using :: , which is the namespacing operator.
The last expression of a function is implicitly returned. This makes simple functions a little neater.
You can still use return to return early like other C-like languages.
19
Ownership 101
Now that we can construct a list, it'd be nice to be able to do something with it. We do that with "normal"
(non-static) methods. Methods are a special case of function in Rust because of the self argument, which
doesn't have a declared type:
There are 3 primary forms that self can take: self , &mut self , and &self . These 3 forms represent the
three primary forms of ownership in Rust:
self - Value
&mut self - mutable reference
&self - shared reference
A value represents true ownership. You can do whatever you want with a value: move it, destroy it, mutate
it, or loan it out via a reference. When you pass something by value, it's moved to the new location. The
new location now owns the value, and the old location can no longer access it. For this reason most
methods don't want self -- it would be pretty lame if trying to work with a list made it go away!
A mutable reference represents temporary exclusive access to a value that you don't own. You're allowed to
do absolutely anything you want to a value you have a mutable reference to as long you leave it in a valid
state when you're done (it would be rude to the owner otherwise!). This means you can actually completely
overwrite the value. A really useful special case of this is swapping a value out for another, which we'll be
using a lot. The only thing you can't do with an &mut is move the value out with no replacement. &mut
self is great for methods that want to mutate self .
A shared reference represents temporary shared access to a value that you don't own. Because you have
shared access, you're generally not allowed to mutate anything. Think of & as putting the value out on
display in a museum. & is great for methods that only want to observe self .
Later we'll see that the rule about mutation can be bypassed in certain cases. This is why shared references
aren't called immutable references. Really, mutable references could be called unique references, but we've
found that relating ownership to mutability gives the right intuition 99% of the time.
20
Push
So let's write pushing a value onto a list. push mutates the list, so we'll want to take &mut self . We also
need to take an i32 to push:
impl List {
pub fn push(&mut self, elem: i32) {
// TODO
}
}
First things first, we need to make a node to store our element in:
What goes next ? Well, the entire old list! Can we... just do that?
impl List {
pub fn push(&mut self, elem: i32) {
let new_node = Node {
elem: elem,
next: self.head,
};
}
}
Nooooope. Rust is telling us the right thing, but it's certainly not obvious what exactly it means, or what to
do about it:
We're trying to move the self.head field out to next , but Rust doesn't want us doing that. This would
leave self only partially initialized when we end the borrow and "give it back" to its rightful owner. As we
said before, that's the one thing you can't do with an &mut : It would be super rude, and Rust is very polite
(it would also be incredibly dangerous, but surely that isn't why it cares).
What if we put something back? Namely, the node that we're creating:
pub fn push(&mut self, elem: i32) { 21
let new_node = Box::new(Node {
elem: elem,
next: self.head,
});
self.head = Link::More(new_node);
}
No dice. In principle, this is something Rust could actually accept, but it won't (for various reasons -- the
most serious being exception safety). We need some way to get the head without Rust noticing that it's
gone. For advice, we turn to infamous Rust Hacker Indiana Jones:
Ah yes, Indy suggests the mem::replace maneuver. This incredibly useful function lets us steal a value
out of a borrow by replacing it with another value. Let's just pull in std::mem at the top of the file, so that
mem is in local scope:
use std::mem;
self.head = Link::More(new_node);
}
Here we replace self.head temporarily with Link::Empty before replacing it with the new head of the list.
I'm not gonna lie: this is a pretty unfortunate thing to have to do. Sadly, we must (for now).
But hey, that's push all done! Probably. We should probably test it, honestly. Right now the easiest way to22
do that is probably to write pop , and make sure that it produces the right results.
23
Pop
Like push , pop wants to mutate the list. Unlike push , we actually want to return something. But pop
also has to deal with a tricky corner case: what if the list is empty? To represent this case, we use the trusty
Option type:
Option<T> is an enum that represents a value that may exist. It can either be Some(T) or None . We could
make our own enum for this like we did for Link, but we want our users to be able to understand what the
heck our return type is, and Option is so ubiquitous that everyone knows it. In fact, it's so fundamental that
it's implicitly imported into scope in every file, as well as its variants Some and None (so we don't have to
say Option::None ).
The pointy bits on Option<T> indicate that Option is actually generic over T. That means that you can
make an Option for any type!
So uh, we have this Link thing, how do we figure out if it's Empty or has More? Pattern matching with
match !
Whoops, pop has to return a value, and we're not doing that yet. We could return None , but in this case
it's probably a better idea to return unimplemented!() , to indicate that we aren't done implementing the
function. unimplemented!() is a macro ( ! indicates a macro) that panics the program when we get to it
(~crashes it in a controlled manner).
pub fn pop(&mut self) -> Option<i32> { 24
match self.head {
Link::Empty => {
// TODO
}
Link::More(node) => {
// TODO
}
};
unimplemented!()
}
Unconditional panics are an example of a diverging function. Diverging functions never return to the caller,
so they may be used in places where a value of any type is expected. Here, unimplemented!() is being
used in place of a value of type Option<T> .
Note also that we don't need to write return in our program. The last expression (basically line) in a
function is implicitly its return value. This lets us express really simple things a bit more concisely. You can
always explicitly return early with return like any other C-like language.
Come on Rust, get off our back! As always, Rust is hella mad at us. Thankfully, this time it's also giving us
the full scoop! By default, a pattern match will try to move its contents into the new branch, but we can't do
this because we don't own self by-value here.
Rust says we should add a reference to our match to fix that. 🤷♀️ Let's try it:
pub fn pop(&mut self) -> Option<i32> { 25
match &self.head {
Link::Empty => {
// TODO
}
Link::More(node) => {
// TODO
}
};
unimplemented!()
}
Hooray, compiling again! Now let's figure out that logic. We want to make an Option, so let's make a
variable for that. In the Empty case we need to return None. In the More case we need to return Some(i32) ,
and change the head of the list. So, let's try to do basically that?
head
desk
We're trying to move out of node when all we have is a shared reference to it.
We should probably step back and think about what we're trying to do. We want to:
The key insight is we want to remove things, which means we want to get the head of the list by value. We
certainly can't do that through the shared reference we get through &self.head . We also "only" have a
mutable reference to self , so the only way we can move stuff is to replace it. Looks like we're doing the
Empty dance again!
cargo build
OMG
Which is a bit more concise and idiomatic. Note that the Link::Empty branch completely lost its braces,
because we only have one expression to evaluate. Just a nice shorthand for simple cases.
cargo build
Testing
Alright, so we've got push and pop written, now we can actually test out our stack! Rust and cargo
support testing as a first-class feature, so this will be super easy. All we have to do is write a function, and
annotate it with #[test] .
Generally, we try to keep our tests next to the code that it's testing in the Rust community. However we
usually make a new namespace for the tests, to avoid conflicting with the "real" code. Just as we used mod
to specify that first.rs should be included in lib.rs , we can use mod to basically create a whole new
file inline:
// in first.rs
mod test {
#[test]
fn basics() {
// TODO
}
}
running 1 test
test first::test::basics ... ok
Yay our do-nothing test passed! Let's make it not-do-nothing. We'll do that with the assert_eq! macro.
This isn't some special testing magic. All it does is compare the two things you give it, and panic the
program if they don't match. Yep, you indicate failure to the test harness by freaking out!
mod test { 29
#[test]
fn basics() {
let mut list = List::new();
// Populate list
list.push(1);
list.push(2);
list.push(3);
// Check exhaustion
assert_eq!(list.pop(), Some(1));
assert_eq!(list.pop(), None);
}
}
Oops! Because we made a new module, we need to pull in List explicitly to use it.
mod test {
use super::List;
// everything else the same
}
> cargo test 30
running 1 test
test first::test::basics ... ok
Yay!
What's up with that warning though...? We clearly use List in our test!
...but only when testing! To appease the compiler (and to be friendly to our consumers), we should indicate
that the whole test module should only be compiled if we're running tests.
#[cfg(test)]
mod test {
use super::List;
// everything else the same
}
Drop
We can make a stack, push on to, pop off it, and we've even tested that it all works right!
Do we need to worry about cleaning up our list? Technically, no, not at all! Like C++, Rust uses destructors
to automatically clean up resources when they're done with. A type has a destructor if it implements a trait
called Drop. Traits are Rust's fancy term for interfaces. The Drop trait has the following interface:
Basically, "when you go out of scope, I'll give you a second to clean up your affairs".
You don't actually need to implement Drop if you contain types that implement Drop, and all you'd want to
do is call their destructors. In the case of List, all it would want to do is drop its head, which in turn would
maybe try to drop a Box<Node> . All that's handled for us automatically... with one hitch.
When list gets dropped, it will try to drop A, which will try to drop B, which will try to drop C. Some of
you might rightly be getting nervous. This is recursive code, and recursive code can blow the stack!
Some of you might be thinking "this is clearly tail recursive, and any decent language would ensure that
such code wouldn't blow the stack". This is, in fact, incorrect! To see why, let's try to write what the
compiler has to do, by manually implementing Drop for our List as the compiler would:
impl Drop for List { 32
fn drop(&mut self) {
// NOTE: you can't actually explicitly call `drop` in real Rust code;
// we're pretending to be the compiler!
self.head.drop(); // tail recursive - good!
}
}
We can't drop the contents of the Box after deallocating, so there's no way to drop in a tail-recursive
manner! Instead we're going to have to manually write an iterative drop for List that hoists nodes out of
their boxes.
Running target/debug/lists-5c71138492ad4b4a
running 1 test
test first::test::basics ... ok
enum Link {
Empty,
More(Box<Node>),
}
struct Node {
elem: i32,
next: Link,
}
impl List {
pub fn new() -> Self {
List { head: Link::Empty }
}
self.head = Link::More(new_node);
}
#[cfg(test)]
mod test {
use super::List;
#[test]
fn basics() {
let mut list = List::new();
// Check exhaustion
assert_eq!(list.pop(), Some(1));
assert_eq!(list.pop(), None);
}
}
Geez. 80 lines, and half of it was tests! Well, I did say this first one was going to take a while!
37
An Ok Singly-Linked Stack
In the previous chapter we wrote up a minimum viable singly-linked stack. However there's a few design
decisions that make it kind of sucky. Let's make it less sucky. In doing so, we will:
// in lib.rs
Using Option
Particularly observant readers may have noticed that we actually reinvented a really bad version of Option:
enum Link {
Empty,
More(Box<Node>),
}
Link is just Option<Box<Node>> . Now, it's nice not to have to write Option<Box<Node>> everywhere, and
unlike pop , we're not exposing this to the outside world, so maybe it's fine. However Option has some
really nice methods that we've been manually implementing ourselves. Let's not do that, and replace
everything with Options. First, we'll do it naively by just renaming everything to use Some and None:
use std::mem; 39
struct Node {
elem: i32,
next: Link,
}
impl List {
pub fn new() -> Self {
List { head: None }
}
self.head = Some(new_node);
}
This is marginally better, but the big wins will come from Option's methods.
First, mem::replace(&mut option, None) is such an incredibly common idiom that Option actually just
went ahead and made it a method: take .
pub struct List { 40
head: Link,
}
struct Node {
elem: i32,
next: Link,
}
impl List {
pub fn new() -> Self {
List { head: None }
}
self.head = Some(new_node);
}
Second, match option { None => None, Some(x) => Some(y) } is such an incredibly common idiom that
it was called map . map takes a function to execute on the x in the Some(x) to produce the y in Some(y) .
We could write a proper fn and pass it to map , but we'd much rather write what to do inline.
The way to do this is with a closure. Closures are anonymous functions with an extra super-power: they
can refer to local variables outside the closure! This makes them super useful for doing all sorts of
conditional logic. The only place we do a match is in pop , so let's just rewrite that:
Running target/debug/lists-5c71138492ad4b4a
running 2 tests
test first::test::basics ... ok
test second::test::basics ... ok
It turns out that's actually really easy. Let's make all of our types generic right now:
struct Node<T> {
elem: T,
next: Link<T>,
}
You just make everything a little more pointy, and suddenly your code is generic. Of course, we can't just do
this, or else the compiler's going to be Super Mad.
The problem is pretty clear: we're talking about this List thing but that's not real anymore. Like Option
and Box, we now always have to talk about List<Something> .
But what's the Something we use in all these impls? Just like List, we want our implementations to work
with all the T's. So, just like List, let's make our impl s pointy:
impl<T> List<T> { 43
pub fn new() -> Self {
List { head: None }
}
self.head = Some(new_node);
}
Running target/debug/lists-5c71138492ad4b4a
running 2 tests
test first::test::basics ... ok
test second::test::basics ... ok
All of our code is now completely generic over arbitrary values of T. Dang, Rust is easy. I'd like to make a
particular shout-out to new which didn't even change:
Bask in the Glory that is Self, guardian of refactoring and copy-pasta coding. Also of interest, we don't
write List<T> when we construct an instance of list. That part's inferred for us based on the fact that we're
returning it from a function that expects a List<T> .
Peek
One thing we didn't even bother to implement last time was peeking. Let's go ahead and do that. All we
need to do is return a reference to the element in the head of the list, if it exists. Sounds easy, let's try:
Map takes self by value, which would move the Option out of the thing it's in. Previously this was fine
because we had just take n it out, but now we actually want to leave it where it was. The correct way to
handle this is with the as_ref method on Option, which has the following definition:
impl<T> Option<T> {
pub fn as_ref(&self) -> Option<&T>;
}
It demotes the Option<T> to an Option to a reference to its internals. We could do this ourselves with an
explicit match but ugh no. It does mean that we need to do an extra dereference to cut through the extra
indirection, but thankfully the . operator handles that for us.
cargo build
Nailed it.
We can also make a mutable version of this method using as_mut : 45
EZ
#[test]
fn peek() {
let mut list = List::new();
assert_eq!(list.peek(), None);
assert_eq!(list.peek_mut(), None);
list.push(1); list.push(2); list.push(3);
assert_eq!(list.peek(), Some(&3));
assert_eq!(list.peek_mut(), Some(&mut 3));
}
cargo test
Running target/debug/lists-5c71138492ad4b4a
running 3 tests
test first::test::basics ... ok
test second::test::basics ... ok
test second::test::peek ... ok
That's nice, but we didn't really test to see if we could mutate that peek_mut return value, did we? If a
reference is mutable but nobody mutates it, have we really tested the mutability? Let's try using map on
this Option<&mut T> to put a profound value in:
#[test] 46
fn peek() {
let mut list = List::new();
assert_eq!(list.peek(), None);
assert_eq!(list.peek_mut(), None);
list.push(1); list.push(2); list.push(3);
assert_eq!(list.peek(), Some(&3));
assert_eq!(list.peek_mut(), Some(&mut 3));
list.peek_mut().map(|&mut value| {
value = 42
});
assert_eq!(list.peek(), Some(&42));
assert_eq!(list.pop(), Some(42));
}
The compiler is complaining that value is immutable, but we pretty clearly wrote &mut value ; what
gives? It turns out that writing the argument of the closure that way doesn't specify that value is a mutable
reference. Instead, it creates a pattern that will be matched against the argument to the closure; |&mut
value| means "the argument is a mutable reference, but just copy the value it points to into value ,
please." If we just use |value| , the type of value will be &mut i32 and we can actually mutate the head:
#[test]
fn peek() {
let mut list = List::new();
assert_eq!(list.peek(), None);
assert_eq!(list.peek_mut(), None);
list.push(1); list.push(2); list.push(3);
assert_eq!(list.peek(), Some(&3));
assert_eq!(list.peek_mut(), Some(&mut 3));
list.peek_mut().map(|value| {
*value = 42
});
assert_eq!(list.peek(), Some(&42));
assert_eq!(list.pop(), Some(42));
}
cargo test 47
Running target/debug/lists-5c71138492ad4b4a
running 3 tests
test first::test::basics ... ok
test second::test::basics ... ok
test second::test::peek ... ok
Much better!
48
IntoIter
Collections are iterated in Rust using the Iterator trait. It's a bit more complicated than Drop :
The new kid on the block here is type Item . This is declaring that every implementation of Iterator has an
associated type called Item. In this case, this is the type that it can spit out when you call next .
The reason Iterator yields Option<Self::Item> is because the interface coalesces the has_next and
get_next concepts. When you have the next value, you yield Some(value) , and when you don't you yield
None . This makes the API generally more ergonomic and safe to use and implement, while avoiding
redundant checks and logic between has_next and get_next . Nice!
Sadly, Rust has nothing like a yield statement (yet), so we're going to have to implement the logic
ourselves. Also, there's actually 3 different kinds of iterator each collection should endeavour to
implement:
IntoIter - T
IterMut - &mut T
Iter - &T
We actually already have all the tools to implement IntoIter using List's interface: just call pop over and
over. As such, we'll just implement IntoIter as a newtype wrapper around List:
impl<T> List<T> {
pub fn into_iter(self) -> IntoIter<T> {
IntoIter(self)
}
}
Running target/debug/lists-5c71138492ad4b4a
running 4 tests
test first::test::basics ... ok
test second::test::basics ... ok
test second::test::into_iter ... ok
test second::test::peek ... ok
Nice!
50
Iter
Alright, let's try to implement Iter. This time we won't be able to rely on List giving us all the features we
want. We'll need to roll our own. The basic logic we want is to hold a pointer to the current node we want
to yield next. Because that node may not exist (the list is empty or we're otherwise done iterating), we want
that reference to be an Option. When we yield an element, we want to proceed to the current node's next
node.
impl<T> List<T> {
pub fn iter(&self) -> Iter<T> {
Iter { next: self.head.map(|node| &node) }
}
}
Let's try something new: see that error[E0106] thing? That's a compiler error code. We can ask rustc to
explain those with, well, --explain :
> rustc --explain E0106 51
This error indicates that a lifetime is missing from a type. If it is an error
inside a function signature, the problem may be with failing to adhere to the
lifetime elision rules (see below).
Here are some simple examples of where you'll run into this error:
That uh... that didn't really clarify much (these docs assume we understand Rust better than we currently
do). But it looks like we should add those 'a things to our struct? Let's try that.
Alright I'm starting to see a pattern here... let's just add these little guys to everything we can:
pub struct Iter<'a, T> { 52
next: Option<&'a Node<T>>,
}
Maybe we should actually figure out what the heck this 'a lifetime stuff even means.
Lifetimes can scare off a lot of people because they're a change to something we've known and loved since
the dawn of programming. We've actually managed to dodge lifetimes so far, even though they've been
tangled throughout our programs this whole time.
Lifetimes are unnecessary in garbage collected languages because the garbage collector ensures that
everything magically lives as long as it needs to. Most data in Rust is manually managed, so that data
needs another solution. C and C++ give us a clear example what happens if you just let people take
pointers to random data on the stack: pervasive unmanageable unsafety. This can be roughly separated into
two classes of error:
So what's a lifetime?
Quite simply, a lifetime is the name of a region (~block/scope) of code somewhere in a program. That's it.
When a reference is tagged with a lifetime, we're saying that it has to be valid for that entire region.
Different things place requirements on how long a reference must and can be valid for. The entire lifetime
system is in turn just a constraint-solving system that tries to minimize the region of every reference. If it
successfully finds a set of lifetimes that satisfies all the constraints, your program compiles! Otherwise you
get an error back saying that something didn't live long enough.
Within a function body you generally can't talk about lifetimes, and wouldn't want to anyway. The compiler
has full information and can infer all the constraints to find the minimum lifetimes. However at the type and
API-level, the compiler doesn't have all the information. It requires you to tell it about the relationship
between different lifetimes so it can figure out what you're doing.
In principle, those lifetimes could also be left out, but then checking all the borrows would be a huge
whole-program analysis that would produce mind-bogglingly non-local errors. Rust's system means all
borrow checking can be done in each function body independently, and all your errors should be fairly local
(or your types have incorrect signatures).
But we've written references in function signatures before, and it was fine! That's because there are certain
cases that are so common that Rust will automatically pick the lifetimes for you. This is lifetime elision.
In particular:
// Only one reference in input, so the output must be derived from that input
fn foo(&A) -> &B; // sugar for:
fn foo<'a>(&'a A) -> &'a B;
So what does fn foo<'a>(&'a A) -> &'a B mean? In practical terms, all it means is that the input must
live at least as long as the output. So if you keep the output around for a long time, this will expand the
region that the input must be valid for. Once you stop using the output, the compiler will know it's ok for
the input to become invalid too.
With this system set up, Rust can ensure nothing is used after free, and nothing is mutated while
outstanding references exist. It just makes sure the constraints all work out!
impl<T> List<T> {
pub fn iter(&self) -> Iter<T> {
Iter { next: self.head.map(|node| &node) }
}
}
// We *do* have a lifetime here, because Iter has one that we need to define
impl<'a, T> Iterator for Iter<'a, T> {
// Need it here too, this is a type declaration
type Item = &'a T;
(╯°□°)╯︵ ┻━┻
OK. SO. We fixed our lifetime errors but now we're getting some new type errors.
We want to be storing &Node 's, but we're getting &Box<Node> s. Ok, that's easy enough, we just need to
dereference the Box before we take our reference:
impl<T> List<T> {
pub fn iter<'a>(&'a self) -> Iter<'a, T> {
Iter { next: self.head.map(|node| &*node) }
}
}
(ノಥ益ಥ)ノ┻━┻
We forgot as_ref , so we're moving the box into map , which means it would be dropped, which means
our references would be dangling:
impl<T> List<T> {
pub fn iter<'a>(&'a self) -> Iter<'a, T> {
Iter { next: self.head.as_ref().map(|node| &*node) }
}
}
😭
as_ref added another layer of indirection we need to remove:
impl<T> List<T> {
pub fn iter<'a>(&'a self) -> Iter<'a, T> {
Iter { next: self.head.as_deref() }
}
}
cargo build
🎉🎉🎉
The as_deref and as_deref_mut functions are stable as of Rust 1.40. Before that you would need to do
map(|node| &**node) and map(|node| &mut**node) . You may be thinking "wow that &** thing is really
janky", and you're not wrong, but like a fine wine Rust gets better over time and we no longer need to do
such. Normally Rust is very good at doing this kind of conversion implicitly, through a process called deref
coercion, where basically it can insert *'s throughout your code to make it type-check. It can do this because58
we have the borrow checker to ensure we never mess up pointers!
But in this case the closure in conjunction with the fact that we have an Option<&T> instead of &T is a bit
too complicated for it to work out, so we need to help it by being explicit. Thankfully this is pretty rare, in
my experience.
Just for completeness' sake, we could give it a different hint with the turbofish:
The turbofish, ::<> , lets us tell the compiler what we think the types of those generics should be. In this
case ::<&Node<T>, _> says "it should return a &Node<T> , and I don't know/care about that other type".
This in turn lets the compiler know that &node should have deref coercion applied to it, so we don't need
to manually apply all those *'s!
But in this case I don't think it's really an improvement, this was just a thinly veiled excuse to show off
deref coercion and the sometimes-useful turbofish. 😅
Let's write a test to be sure we didn't no-op it or anything:
#[test]
fn iter() {
let mut list = List::new();
list.push(1); list.push(2); list.push(3);
Running target/debug/lists-5c71138492ad4b4a
running 5 tests
test first::test::basics ... ok
test second::test::basics ... ok
test second::test::into_iter ... ok
test second::test::iter ... ok
test second::test::peek ... ok
Heck yeah.
Finally, it should be noted that we can actually apply lifetime elision here:
impl<T> List<T> { 59
pub fn iter<'a>(&'a self) -> Iter<'a, T> {
Iter { next: self.head.as_deref() }
}
}
is equivalent to:
impl<T> List<T> {
pub fn iter(&self) -> Iter<T> {
Iter { next: self.head.as_deref() }
}
}
Or, if you're not comfortable "hiding" that a struct contains a lifetime, you can use the Rust 2018 "explicitly
elided lifetime" syntax, '_ :
impl<T> List<T> {
pub fn iter(&self) -> Iter<'_, T> {
Iter { next: self.head.as_deref() }
}
}
60
IterMut
I'm gonna be honest, IterMut is WILD. Which in itself seems like a wild thing to say; surely it's identical to
Iter!
Semantically, yes, but the nature of shared and mutable references means that Iter is "trivial" while IterMut
is Legit Wizard Magic.
The key insight comes from our implementation of Iterator for Iter:
The signature of next establishes no constraint between the lifetime of the input and the output! Why do
we care? It means we can call next over and over unconditionally!
Cool!
This is definitely fine for shared references because the whole point is that you can have tons of them at
once. However mutable references can't coexist. The whole point is that they're exclusive.
The end result is that it's notably harder to write IterMut using safe code (and we haven't gotten into what
that even means yet...). Surprisingly, IterMut can actually be implemented for many structures completely
safely!
We'll start by just taking the Iter code and changing everything to be mutable:
pub struct IterMut<'a, T> { 61
next: Option<&'a mut Node<T>>,
}
impl<T> List<T> {
pub fn iter_mut(&self) -> IterMut<'_, T> {
IterMut { next: self.head.as_deref_mut() }
}
}
Ok looks like we've got two different errors here. The first one looks really clear though, it even tells us
how to fix it! You can't upgrade a shared reference to a mutable one, so iter_mut needs to take &mut
self . Just a silly copy-paste error.
Oops! I actually accidentally made an error when writing the iter impl in the previous section, and we
were just getting lucky that it worked!
We have just had our first run in with the magic of Copy. When we introduced ownership we said that
when you move stuff, you can't use it anymore. For some types, this makes perfect sense. Our good friend
Box manages an allocation on the heap for us, and we certainly don't want two pieces of code to think that
they need to free its memory.
However for other types this is garbage. Integers have no ownership semantics; they're just meaningless 62
numbers! This is why integers are marked as Copy. Copy types are known to be perfectly copyable by a
bitwise copy. As such, they have a super power: when moved, the old value is still usable. As a
consequence, you can even move a Copy type out of a reference without replacement!
All numeric primitives in Rust (i32, u64, bool, f32, char, etc...) are Copy. You can also declare any user-
defined type to be Copy as well, as long as all its components are Copy.
Critically to why this code was working, shared references are also Copy! Because & is copy, Option<&> is
also Copy. So when we did self.next.map it was fine because the Option was just copied. Now we can't
do that, because &mut isn't Copy (if you copied an &mut, you'd have two &mut's to the same location in
memory, which is forbidden). Instead, we should properly take the Option to get it.
#[test]
fn iter_mut() {
let mut list = List::new();
list.push(1); list.push(2); list.push(3);
Running target/debug/lists-5c71138492ad4b4a
running 6 tests
test first::test::basics ... ok
test second::test::basics ... ok
test second::test::iter_mut ... ok
test second::test::into_iter ... ok
test second::test::iter ... ok
test second::test::peek ... ok
Yep. It works.
Holy shit.
What. 63
Ok I mean it actually is supposed to work, but there's usually something stupid that gets in the way! Let's
be clear here:
We have just implemented a piece of code that takes a singly-linked list, and returns a mutable reference to
every single element in the list at most once. And it's statically verified to do that. And it's totally safe. And
we didn't have to do anything wild.
That's kind of a big deal, if you ask me. There are a couple reasons why this works:
We take the Option<&mut> so we have exclusive access to the mutable reference. No need to worry
about someone looking at it again.
Rust understands that it's ok to shard a mutable reference into the subfields of the pointed-to struct,
because there's no way to "go back up", and they're definitely disjoint.
It turns out that you can apply this basic logic to get a safe IterMut for an array or a tree as well! You can
even make the iterator DoubleEnded, so that you can consume the iterator from the front and the back at
once! Woah!
64
Final Code
Alright, that's it for the second list; here's the final code!
65
pub struct List<T> {
head: Link<T>,
}
struct Node<T> {
elem: T,
next: Link<T>,
}
impl<T> List<T> {
pub fn new() -> Self {
List { head: None }
}
self.head = Some(new_node);
}
#[cfg(test)]
mod test {
use super::List;
#[test]
fn basics() {
let mut list = List::new();
// Populate list
list.push(1);
list.push(2);
list.push(3);
// Check exhaustion
assert_eq!(list.pop(), Some(1));
assert_eq!(list.pop(), None);
}
#[test]
fn peek() {
let mut list = List::new();
assert_eq!(list.peek(), None);
assert_eq!(list.peek_mut(), None);
list.push(1); list.push(2); list.push(3);
assert_eq!(list.peek(), Some(&3));
assert_eq!(list.peek_mut(), Some(&mut 3));
list.peek_mut().map(|value| {
*value = 42
});
assert_eq!(list.peek(), Some(&42));
assert_eq!(list.pop(), Some(42));
}
#[test]
fn into_iter() {
let mut list = List::new();
list.push(1); list.push(2); list.push(3);
#[test]
fn iter() {
let mut list = List::new();
list.push(1); list.push(2); list.push(3);
#[test]
fn iter_mut() {
let mut list = List::new();
list.push(1); list.push(2); list.push(3);
Getting beefier!
69
Let's move from single ownership to shared ownership by writing a persistent immutable singly-linked
list. This will be exactly the list that functional programmers have come to know and love. You can get the
head or the tail and put someone's head on someone else's tail... and... that's basically it. Immutability is a
hell of a drug.
In the process we'll largely just become familiar with Rc and Arc, but this will set us up for the next list
which will change the game.
// in lib.rs
Layout
Alright, back to the drawing board on layout.
The most important thing about a persistent list is that you can manipulate the tails of lists basically for
free:
For instance, this isn't an uncommon workload to see with a persistent list:
This just can't work with Boxes, because ownership of B is shared. Who should free it? If I drop list2, does
it free B? With boxes we certainly would expect so!
Functional languages — and indeed almost every other language — get away with this by using garbage
collection. With the magic of garbage collection, B will be freed only after everyone stops looking at it.
Hooray!
Rust doesn't have anything like the garbage collectors these languages have. They have tracing GC, which
will dig through all the memory that's sitting around at runtime and figure out what's garbage
automatically. Instead, all Rust has today is reference counting. Reference counting can be thought of as a
very simple GC. For many workloads, it has significantly less throughput than a tracing collector, and it
completely falls over if you manage to build cycles. But hey, it's all we've got! Thankfully, for our usecase
we'll never run into cycles (feel free to try to prove this to yourself — I sure won't).
So how do we do reference-counted garbage collection? Rc ! Rc is just like Box, but we can duplicate it,
and its memory will only be freed when all the Rc's derived from it are dropped. Unfortunately, this
flexibility comes at a serious cost: we can only take a shared reference to its internals. This means we can't
ever really get data out of one of our lists, nor can we mutate them.
struct Node<T> {
elem: T,
next: Link<T>,
}
// in third.rs
struct Node<T> {
elem: T,
next: Link<T>,
}
cargo build
Oh dang, sick burn. Unlike everything we used for our mutable lists, Rc is so lame that it's not even
implicitly imported into every single Rust program. What a loser.
use std::rc::Rc;
cargo build 72
Seems legit. Rust continues to be completely trivial to write. I bet we can just find-and-replace Box with Rc
and call it a day!
...
No. No we can't.
73
Basics
We already know a lot of the basics of Rust now, so we can do a lot of the simple stuff again.
impl<T> List<T> {
pub fn new() -> Self {
List { head: None }
}
}
push and pop don't really make sense anymore. Instead we can provide prepend and tail , which
provide approximately the same thing.
Let's start with prepending. It takes a list and an element, and returns a List. Like the mutable list case, we
want to make a new node, that has the old list as its next value. The only novel thing is how to get that
next value, because we're not allowed to mutate anything.
The answer to our prayers is the Clone trait. Clone is implemented by almost every type, and provides a
generic way to get "another one like this one" that is logically disjoint, given only a shared reference. It's like
a copy constructor in C++, but it's never implicitly invoked.
Rc in particular uses Clone as the way to increment the reference count. So rather than moving a Box to be
in the sublist, we just clone the head of the old list. We don't even need to match on the head, because
Option exposes a Clone implementation that does exactly the thing we want.
tail is the logical inverse of this operation. It takes a list and returns the whole list with the first element
removed. All that is is cloning the second element in the list (if it exists). Let's try this:
cargo build
Hrm, we messed up. map expects us to return a Y, but here we're returning an Option<Y> . Thankfully, this
is another common Option pattern, and we can just use and_then to let us return an Option.
Great.
Now that we have tail , we should probably provide head , which returns a reference to the first element.
That's just peek from the mutable list:
Nice.
#[test]
fn basics() {
let list = List::new();
assert_eq!(list.head(), None);
}
}
Running target/debug/lists-5c71138492ad4b4a
running 5 tests
test first::test::basics ... ok
test second::test::into_iter ... ok
test second::test::basics ... ok
test second::test::iter ... ok
test third::test::basics ... ok
Perfect!
impl<T> List<T> {
pub fn iter(&self) -> Iter<'_, T> {
Iter { next: self.head.as_deref() }
}
}
#[test]
fn iter() {
let list = List::new().prepend(1).prepend(2).prepend(3);
cargo test
Running target/debug/lists-5c71138492ad4b4a
running 7 tests
test first::test::basics ... ok
test second::test::basics ... ok
test second::test::iter ... ok
test second::test::into_iter ... ok
test second::test::peek ... ok
test third::test::basics ... ok
test third::test::iter ... ok
(chumps did)
Note that we can't implement IntoIter or IterMut for this type. We only have shared access to elements.
77
Drop
Like the mutable lists, we have a recursive destructor problem. Admittedly, this isn't as bad of a problem
for the immutable list: if we ever hit another node that's the head of another list somewhere, we won't
recursively drop it. However it's still a thing we should care about, and how to deal with isn't as clear.
Here's how we solved it before:
cur_link = boxed_node.next.take();
This is mutating the Node inside the Box, but we can't do that with Rc; it only gives us shared access,
because any number of other Rc's could be pointing at it.
But if we know that we're the last list that knows about this node, it would actually be fine to move the
Node out of the Rc. Then we could also know when to stop: whenever we can't hoist out the Node.
And look at that, Rc has a method that does exactly this: try_unwrap :
running 8 tests
test first::test::basics ... ok
test second::test::basics ... ok
test second::test::into_iter ... ok
test second::test::iter ... ok
test second::test::iter_mut ... ok
test second::test::peek ... ok
test third::test::basics ... ok
test third::test::iter ... ok
Great! Nice.
79
Arc
One reason to use an immutable linked list is to share data across threads. After all, shared mutable state
is the root of all evil, and one way to solve that is to kill the mutable part forever.
Except our list isn't thread-safe at all. In order to be thread-safe, we need to fiddle with reference counts
atomically. Otherwise, two threads could try to increment the reference count, and only one would happen.
Then the list could get freed too soon!
In order to get thread safety, we have to use Arc. Arc is completely identical to Rc except for the fact that
reference counts are modified atomically. This has a bit of overhead if you don't need it, so Rust exposes
both. All we need to do to make our list is replace every reference to Rc with std::sync::Arc . That's it.
We're thread safe. Done!
But this raises an interesting question: how do we know if a type is thread-safe or not? Can we accidentally
mess up?
The reason this is the case is because Rust models thread-safety in a first-class way with two traits: Send
and Sync .
A type is Send if it's safe to move to another thread. A type is Sync if it's safe to share between multiple
threads. That is, if T is Sync, &T is Send. Safe in this case means it's impossible to cause data races, (not
to be mistaken with the more general issue of race conditions).
These are marker traits, which is a fancy way of saying they're traits that provide absolutely no interface.
You either are Send, or you aren't. It's just a property other APIs can require. If you aren't appropriately Send,
then it's statically impossible to be sent to a different thread! Sweet!
Send and Sync are also automatically derived traits based on whether you are totally composed of Send
and Sync types. It's similar to how you can only implement Copy if you're only made of Copy types, but
then we just go ahead and implement it automatically if you are.
Almost every type is Send and Sync. Most types are Send because they totally own their data. Most types
are Sync because the only way to share data across threads is to put them behind a shared reference, which
makes them immutable!
However there are special types that violate these properties: those that have interior mutability. So far
we've only really interacted with inherited mutability (AKA external mutability): the mutability of a value is
inherited from the mutability of its container. That is, you can't just randomly mutate some field of a non-
mutable value because you feel like it.
Interior mutability types violate this: they let you mutate through a shared reference. There are two major
classes of interior mutability: cells, which only work in a single-threaded context; and locks, which work in
a multi-threaded context. For obvious reasons, cells are cheaper when you can use them. There's also
atomics, which are primitives that act like a lock.
So what does all of this have to do with Rc and Arc? Well, they both use interior mutability for their
reference count. Worse, this reference count is shared between every instance! Rc just uses a cell, which
means it's not thread safe. Arc uses an atomic, which means it is thread safe. Of course, you can't magically80
make a type thread safe by putting it in Arc. Arc can only derive thread-safety like any other type.
I really really really don't want to get into the finer details of atomic memory models or non-derived Send
implementations. Needless to say, as you get deeper into Rust's thread-safety story, stuff gets more
complicated. As a high-level consumer, it all just works and you don't really need to think about it.
81
Final Code
That's all I really have to say on the immutable stack. We're getting pretty good at implementing lists now!
82
use std::rc::Rc;
struct Node<T> {
elem: T,
next: Link<T>,
}
impl<T> List<T> {
pub fn new() -> Self {
List { head: None }
}
#[cfg(test)]
mod test {
use super::List;
#[test]
fn basics() {
let list = List::new();
assert_eq!(list.head(), None);
#[test]
fn iter() {
let list = List::new().prepend(1).prepend(2).prepend(3);
In the process we'll become familiar with interior mutability, and probably learn the hard way that safe
doesn't mean correct. Doubly-linked lists are hard, and I always make a mistake somewhere.
// in lib.rs
This will be another clean-room operation, though as usual we'll probably find some logic that applies
verbatim again.
Disclaimer: this chapter is basically a demonstration that this is a very bad idea.
85
Layout
The key to our design is the RefCell type. The heart of RefCell is a pair of methods:
The rules for borrow and borrow_mut are exactly those of & and &mut : you can call borrow as many
times as you want, but borrow_mut requires exclusivity.
Rather than enforcing this statically, RefCell enforces them at runtime. If you break the rules, RefCell will
just panic and crash the program. Why does it return these Ref and RefMut things? Well, they basically
behave like Rc s but for borrowing. They also keep the RefCell borrowed until they go out of scope. We'll
get to that later.
Now with Rc and RefCell we can become... an incredibly verbose pervasively mutable garbage collected
language that can't collect cycles! Y-yaaaaay...
Alright, we want to be doubly-linked. This means each node has a pointer to the previous and next node.
Also, the list itself has a pointer to the first and last node. This gives us fast insertion and removal on both
ends of the list.
use std::rc::Rc;
use std::cell::RefCell;
struct Node<T> {
elem: T,
next: Link<T>,
prev: Link<T>,
}
> cargo build 86
Hey, it built! Lots of dead code warnings, but it built! Let's try to use it.
87
Building Up
Alright, we'll start with building the list. That's pretty straight-forward with this new system. new is still
trivial, just None out all the fields. Also because it's getting a bit unwieldy, let's break out a Node
constructor too:
impl<T> Node<T> {
fn new(elem: T) -> Rc<RefCell<Self>> {
Rc::new(RefCell::new(Node {
elem: elem,
prev: None,
next: None,
}))
}
}
impl<T> List<T> {
pub fn new() -> Self {
List { head: None, tail: None }
}
}
Yay!
Now let's try to write pushing onto the front of the list. Because doubly-linked lists are significantly more
complicated, we're going to need to do a fair bit more work. Where singly-linked list operations could be
reduced to an easy one-liner, doubly-linked list ops are fairly complicated.
In particular we now need to specially handle some boundary cases around empty lists. Most operations
will only touch the head or tail pointer. However when transitioning to or from the empty list, we need
to edit both at once.
An easy way for us to validate if our methods make sense is if we maintain the following invariant: each
node should have exactly two pointers to it. Each node in the middle of the list is pointed at by its
predecessor and successor, while the nodes on the ends are pointed to by the list itself.
cargo build
Why can't we access the prev and next fields on our nodes? It worked before when we just had an
Rc<Node> . Seems like the RefCell is getting in the way.
clicks link
Values of the Cell<T> and RefCell<T> types may be mutated through shared references (i.e. the
common &T type), whereas most Rust types can only be mutated through unique ( &mut T )
references. We say that Cell<T> and RefCell<T> provide 'interior mutability', in contrast with typical 89
Rust types that exhibit 'inherited mutability'.
Cell types come in two flavors: Cell<T> and RefCell<T> . Cell<T> provides get and set
methods that change the interior value with a single method call. Cell<T> though is only compatible
with types that implement Copy . For other types, one must use the RefCell<T> type, acquiring a
write lock before mutating.
RefCell<T> uses Rust's lifetimes to implement 'dynamic borrowing', a process whereby one can
claim temporary, exclusive, mutable access to the inner value. Borrows for RefCell<T> s are tracked
'at runtime', unlike Rust's native reference types which are entirely tracked statically, at compile time.
Because RefCell<T> borrows are dynamic it is possible to attempt to borrow a value that is already
mutably borrowed; when this happens it results in thread panic.
90
It's very common then to put a RefCell<T> inside shared pointer types to reintroduce mutability:
use std::collections::HashMap;
use std::cell::RefCell;
use std::rc::Rc;
fn main() {
let shared_map: Rc<RefCell<_>> = Rc::new(RefCell::new(HashMap::new()));
shared_map.borrow_mut().insert("africa", 92388);
shared_map.borrow_mut().insert("kyoto", 11837);
shared_map.borrow_mut().insert("piccadilly", 11826);
shared_map.borrow_mut().insert("marbles", 38);
}
Note that this example uses Rc<T> and not Arc<T> . RefCell<T> s are for single-threaded scenarios.
Consider using Mutex<T> if you need shared mutability in a multi-threaded situation.
shared_map.borrow_mut().insert("africa", 92388);
In particular, the borrow_mut thing. Seems we need to explicitly borrow a RefCell. The . operator's not
going to do it for us. Weird. Let's try:
pub fn push_front(&mut self, elem: T) { 91
let new_head = Node::new(elem);
match self.head.take() {
Some(old_head) => {
old_head.borrow_mut().prev = Some(new_head.clone());
new_head.borrow_mut().next = Some(old_head);
self.head = Some(new_head);
}
None => {
self.tail = Some(new_head.clone());
self.head = Some(new_head);
}
}
}
Breaking Down
pop_front should be the same basic logic as push_front , but backwards. Let's try:
cargo build
Hrm... It seems that Box was really spoiling us. borrow_mut only gets us an &mut Node<T> , but we can't
move out of that!
We need something that takes a RefCell<T> and gives us a T . Let's check the docs for something like
that:
fn into_inner(self) -> T
old_head.into_inner().elem
Ah dang. into_inner wants to move out the RefCell, but we can't, because it's in an Rc . As we saw in the
previous chapter, Rc<T> only lets us get shared references into its internals. That makes sense, because
that's the whole point of reference counted pointers: they're shared!
This was a problem for us when we wanted to implement Drop for our reference counted list, and the
solution is the same: Rc::try_unwrap , which moves out the contents of an Rc if its refcount is 1.
Rc::try_unwrap(old_head).unwrap().into_inner().elem
Rc::try_unwrap returns a Result<T, Rc<T>> . Results are basically a generalized Option , where the
None case has data associated with it. In this case, the Rc you tried to unwrap. Since we don't care about
the case where it fails (if we wrote our program correctly, it has to succeed), we just call unwrap on it.
Anyway, let's see what compiler error we get next (let's face it, there's going to be one).
> cargo build 94
UGH. unwrap on Result requires that you can debug-print the error case. RefCell<T> only implements
Debug if T does. Node doesn't implement Debug.
Rather than doing that, let's just work around it by converting the Result to an Option with ok :
Rc::try_unwrap(old_head).ok().unwrap().into_inner().elem
PLEASE.
cargo build
YES.
phew
We did it.
Let's test by stealing the old stack basic test (because that's all that we've implemented so far):
#[cfg(test)] 95
mod test {
use super::List;
#[test]
fn basics() {
let mut list = List::new();
// Populate list
list.push_front(1);
list.push_front(2);
list.push_front(3);
// Check exhaustion
assert_eq!(list.pop_front(), Some(1));
assert_eq!(list.pop_front(), None);
}
}
cargo test
Running target/debug/lists-5c71138492ad4b4a
running 9 tests
test first::test::basics ... ok
test fourth::test::basics ... ok
test second::test::iter_mut ... ok
test second::test::basics ... ok
test fifth::test::iter_mut ... ok
test third::test::basics ... ok
test second::test::iter ... ok
test third::test::iter ... ok
test second::test::into_iter ... ok
Nailed it.
Now that we can properly remove things from the list, we can implement Drop. Drop is a little more
conceptually interesting this time around. Where previously we bothered to implement Drop for our stacks
just to avoid unbounded recursion, now we need to implement Drop to get anything to happen at all.
Rc can't deal with cycles. If there's a cycle, everything will keep everything else alive. A doubly-linked list, 96
as it turns out, is just a big chain of tiny cycles! So when we drop our list, the two end nodes will have their
refcounts decremented down to 1... and then nothing else will happen. Well, if our list contains exactly one
node we're good to go. But ideally a list should work right if it contains multiple elements. Maybe that's
just me.
As we saw, removing elements was a bit painful. So the easiest thing for us to do is just pop until we get
None:
cargo build
(We actually could have done this with our mutable stacks, but shortcuts are for people who understand
things!)
We could look at implementing the _back versions of push and pop , but they're just copy-paste jobs
which we'll defer to later in the chapter. For now let's look at more interesting things!
97
Peeking
Alright, we made it through push and pop . I'm not gonna lie, it got a bit emotional there. Compile-time
correctness is a hell of a drug.
Let's cool off by doing something simple: let's just implement peek_front . That was always really easy
before. Gotta still be easy, right?
Right?
HAH.
cargo build
This is exactly the same logic as our singly-linked stack. Why are things different. WHY.
The answer is really the whole moral of this chapter: RefCells make everything sadness. Up until now,
RefCells have just been a nuisance. Now they're going to become a nightmare.
So what's going on? To understand that, we need to go back to the definition of borrow :
It's later.
Ref and RefMut implement Deref and DerefMut respectively. So for most intents and purposes they
behave exactly like &T and &mut T . However, because of how those traits work, the reference that's
returned is connected to the lifetime of the Ref, and not the actual RefCell. This means that the Ref has to
be sitting around as long as we keep the reference around.
This is in fact necessary for correctness. When a Ref gets dropped, it tells the RefCell that it's not borrowed
anymore. So if we did manage to hold onto our reference longer than the Ref existed, we could get a
RefMut while a reference was kicking around and totally break Rust's type system in half.
So where does that leave us? We only want to return a reference, but we need to keep this Ref thing
around. But as soon as we return the reference from peek , the function is over and the Ref goes out of
scope.
😖
As far as I know, we're actually totally dead in the water here. You can't totally encapsulate the use of
RefCells like that.
But... what if we just give up on totally hiding our implementation details? What if we returns Refs?
Hmm... that's right. We have a Ref<Node<T>> , but we want a Ref<T> . We could abandon all hope of
encapsulation and just return that. We could also make things even more complicated and wrap
Ref<Node<T>> in a new type to only expose access to an &T .
Instead, we're going to go deeper down. Let's have some fun. Our source of fun is this beast:
Yes: just like you can map over an Option, you can map over a Ref.
I'm sure someone somewhere is really excited because monads or whatever but I don't care about any of
that. Also I don't think it's a proper monad since there's no None-like case, but I digress.
It's cool and that's all that matters to me. I need this.
Awww yissss
Let's make sure this is working by munging up the test from our stack. We need to do some munging to
deal with the fact that Refs don't implement comparisons.
#[test] 100
fn peek() {
let mut list = List::new();
assert!(list.peek_front().is_none());
list.push_front(1); list.push_front(2); list.push_front(3);
assert_eq!(&*list.peek_front().unwrap(), &3);
}
Running target/debug/lists-5c71138492ad4b4a
running 10 tests
test first::test::basics ... ok
test fourth::test::basics ... ok
test second::test::basics ... ok
test fourth::test::peek ... ok
test second::test::iter_mut ... ok
test second::test::into_iter ... ok
test third::test::basics ... ok
test second::test::peek ... ok
test second::test::iter ... ok
test third::test::iter ... ok
Great!
101
Symmetric Junk
Alright let's get all that combinatoric symmetry over with.
//..
// Populate list
list.push_front(1);
list.push_front(2);
list.push_front(3);
// Check exhaustion
assert_eq!(list.pop_front(), Some(1));
assert_eq!(list.pop_front(), None);
// Populate list
list.push_back(1);
list.push_back(2);
list.push_back(3);
// Check exhaustion
assert_eq!(list.pop_back(), Some(1));
assert_eq!(list.pop_back(), None);
}
#[test]
fn peek() {
let mut list = List::new();
assert!(list.peek_front().is_none());
assert!(list.peek_back().is_none());
assert!(list.peek_front_mut().is_none());
assert!(list.peek_back_mut().is_none());
104
list.push_front(1); list.push_front(2); list.push_front(3);
assert_eq!(&*list.peek_front().unwrap(), &3);
assert_eq!(&mut *list.peek_front_mut().unwrap(), &mut 3);
assert_eq!(&*list.peek_back().unwrap(), &1);
assert_eq!(&mut *list.peek_back_mut().unwrap(), &mut 1);
}
Are there some cases we're not testing? Probably. The combinatoric space has really blown up here. Our
code is at very least not obviously wrong.
Running target/debug/lists-5c71138492ad4b4a
running 10 tests
test first::test::basics ... ok
test fourth::test::basics ... ok
test second::test::basics ... ok
test fourth::test::peek ... ok
test second::test::iter ... ok
test third::test::iter ... ok
test second::test::into_iter ... ok
test second::test::iter_mut ... ok
test second::test::peek ... ok
test third::test::basics ... ok
Iteration
Let's take a crack at iterating this bad-boy.
IntoIter
IntoIter, as always, is going to be the easiest. Just wrap the stack and call pop :
impl<T> List<T> {
pub fn into_iter(self) -> IntoIter<T> {
IntoIter(self)
}
}
But we have an interesting new development. Where previously there was only ever one "natural" iteration
order for our lists, a Deque is inherently bi-directional. What's so special about front-to-back? What if
someone wants to iterate in the other direction?
Rust actually has an answer to this: DoubleEndedIterator . DoubleEndedIterator inherits from Iterator
(meaning all DoubleEndedIterator are Iterators) and requires one new method: next_back . It has the exact
same signature as next , but it's supposed to yield elements from the other end. The semantics of
DoubleEndedIterator are super convenient for us: the iterator becomes a deque. You can consume elements
from the front and back until the two ends converge, at which point the iterator is empty.
Much like Iterator and next , it turns out that next_back isn't really something consumers of the
DoubleEndedIterator really care about. Rather, the best part of this interface is that it exposes the rev
method, which wraps up the iterator to make a new one that yields the elements in reverse order. The
semantics of this are fairly straight-forward: calls to next on the reversed iterator are just calls to
next_back .
Anyway, because we're already a deque providing this API is pretty easy:
cargo test
Running target/debug/lists-5c71138492ad4b4a
running 11 tests
test fourth::test::basics ... ok
test fourth::test::peek ... ok
test fourth::test::into_iter ... ok
test first::test::basics ... ok
test second::test::basics ... ok
test second::test::iter ... ok
test second::test::iter_mut ... ok
test third::test::iter ... ok
test third::test::basics ... ok
test second::test::into_iter ... ok
test second::test::peek ... ok
Nice.
Iter
Iter will be a bit less forgiving. We'll have to deal with those awful Ref things again! Because of Refs, we
can't store &Node s like we did before. Instead, let's try to store Ref<Node> s:
impl<T> List<T> {
pub fn iter(&self) -> Iter<T> {
Iter(self.head.as_ref().map(|head| head.borrow()))
}
}
So far so good. Implementing next is going to be a bit hairy, but I think it's the same basic logic as the old
stack IterMut but with extra RefCell madness:
impl<'a, T> Iterator for Iter<'a, T> { 107
type Item = Ref<'a, T>;
fn next(&mut self) -> Option<Self::Item> {
self.0.take().map(|node_ref| {
self.0 = node_ref.next.as_ref().map(|head| head.borrow());
Ref::map(node_ref, |node| &node.elem)
})
}
}
cargo build
Shoot.
node_ref doesn't live long enough. Unlike normal references, Rust doesn't let us just split Refs up like
that. The Ref we get out of head.borrow() is only allowed to live as long as node_ref , but we end up
trashing that in our Ref::map call.
pub fn map_split<U, V, F>(orig: Ref<'b, T>, f: F) -> (Ref<'b, U>, Ref<'b, V>) where
F: FnOnce(&T) -> (&U, &V),
U: ?Sized,
V: ?Sized,
elem
})
}
cargo build
Compiling lists v0.1.0 (/Users/ADesires/dev/temp/lists)
error[E0521]: borrowed data escapes outside of closure
--> src/fourth.rs:159:13
|
153 | fn next(&mut self) -> Option<Self::Item> {
| --------- `self` is declared here, outside of the closure body
...
159 | self.0 = next.as_ref().map(|head| head.borrow());
| ^^^^^^ ---- borrow is only valid in the closure body
| |
| reference to `next` escapes the closure body here
Ergh. We need to Ref::Map again to get our lifetimes right. But Ref::Map returns a Ref and we need an
Option<Ref> , but we need to go through the Ref to map over our Option...
??????
self.0 = if next.is_some() {
Some(Ref::map(next, |next| &**next.as_ref().unwrap()))
} else {
None
};
elem
})
}
I don't think there's anything we can do here. It's a dead end. Let's try getting out of the RefCells.
What about our Rc s. Who said we even needed to store references? Why can't we just Clone the whole
Rc to get a nice owning handle into the middle of the list?
impl<T> List<T> {
pub fn iter(&self) -> Iter<T> {
Iter(self.head.as_ref().map(|head| head.clone()))
}
}
No, none of those work... our Iter doesn't have a lifetime anymore! Both &T and Ref<T> require us to
declare some lifetime up front before we get into next . But anything we manage to get out of our Rc
would be borrowing the Iterator... brain... hurt... aaaaaahhhhhh
Maybe we can... map... the Rc... to get an Rc<T> ? Is that a thing? Rc's docs don't seem to have anything like
that. Actually someone made a crate that lets you do that.
But wait, even if we do that then we've got an even bigger problem: the dreaded spectre of iterator
invalidation. Previously we've been totally immune to iterator invalidation, because the Iter borrowed the
list, leaving it totally immutable. However if our Iter was yielding Rcs, they wouldn't borrow the list at all!
That means people can start calling push and pop on the list while they hold pointers into it!
Well, pushing is actually fine. We've got a view into some sub-range of the list, and the list will just grow
beyond our sights. No biggie.
However pop is another story. If they're popping elements outside of our range, it should still be fine. We
can't see those nodes so nothing will happen. However if they try to pop off the node we're pointing at...
everything will blow up! In particular when they go to unwrap the result of the try_unwrap , it will actually
fail, and the whole program will panic.
That's actually pretty cool. We can get tons of interior owning pointers into the list and mutate it at the
same time and it will just work until they try to remove the nodes that we're pointing at. And even then we
don't get dangling pointers or anything, the program will deterministically panic!
But having to deal with iterator invalidation on top of mapping Rcs just seems... bad. Rc<RefCell> has
really truly finally failed us. Interestingly, we've experienced an inversion of the persistent stack case.
Where the persistent stack struggled to ever reclaim ownership of the data but could get references all day
every day, our list had no problem gaining ownership, but really struggled to loan our references.
Although to be fair, most of our struggles revolved around wanting to hide the implementation details and110
have a decent API. We could do everything fine if we wanted to just pass around Nodes all over the place.
Heck, we could make multiple concurrent IterMuts that were runtime checked to not be mutable accessing
the same element!
Really, this design is more appropriate for an internal data structure that never makes it out to consumers of
the API. Interior mutability is great for writing safe applications. Not so much safe libraries.
Anyway, that's me giving up on Iter and IterMut. We could do them, but ugh.
111
Final Code
Alright, so that's implementing a 100% safe doubly-linked list in Rust. It was a nightmare to implement,
leaks implementation details, and doesn't support several fundamental operations.
But, it exists.
Oh, I guess it's also riddled with tons of "unnecessary" runtime checks for correctness between Rc and
RefCell . I put unnecessary in quotes because they're actually necessary to guarantee the whole actually
being safe thing. We encountered a few places where those checks actually were necessary. Doubly-linked
lists have a horribly tangled aliasing and ownership story!
Still, it's a thing we can do. Especially if we don't care about exposing internal data structures to our
consumers.
From here on out, we're going to be focusing on other side of this coin: getting back all the control by
making our implementation unsafe.
112
use std::rc::Rc;
use std::cell::{Ref, RefMut, RefCell};
struct Node<T> {
elem: T,
next: Link<T>,
prev: Link<T>,
}
impl<T> Node<T> {
fn new(elem: T) -> Rc<RefCell<Self>> {
Rc::new(RefCell::new(Node {
elem: elem,
prev: None,
next: None,
}))
}
}
impl<T> List<T> {
pub fn new() -> Self {
List { head: None, tail: None }
}
#[cfg(test)]
mod test {
use super::List;
#[test]
fn basics() {
let mut list = List::new();
// Populate list
list.push_front(1);
list.push_front(2);
list.push_front(3);
// Check exhaustion
assert_eq!(list.pop_front(), Some(1));
assert_eq!(list.pop_front(), None);
// Populate list
list.push_back(1);
list.push_back(2);
list.push_back(3);
// Check normal removal 115
assert_eq!(list.pop_back(), Some(3));
assert_eq!(list.pop_back(), Some(2));
// Check exhaustion
assert_eq!(list.pop_back(), Some(1));
assert_eq!(list.pop_back(), None);
}
#[test]
fn peek() {
let mut list = List::new();
assert!(list.peek_front().is_none());
assert!(list.peek_back().is_none());
assert!(list.peek_front_mut().is_none());
assert!(list.peek_back_mut().is_none());
assert_eq!(&*list.peek_front().unwrap(), &3);
assert_eq!(&mut *list.peek_front_mut().unwrap(), &mut 3);
assert_eq!(&*list.peek_back().unwrap(), &1);
assert_eq!(&mut *list.peek_back_mut().unwrap(), &mut 1);
}
#[test]
fn into_iter() {
let mut list = List::new();
list.push_front(1); list.push_front(2); list.push_front(3);
In this chapter we're going to roll back to singly-linked lists and implement a singly-linked queue to dip
our toes into raw pointers and Unsafe Rust.
// in lib.rs
Our code is largely going to be derived from second.rs, since a queue is mostly an augmentation of a stack
in the world of linked lists. Still, we're going to go from scratch because there's some fundamental issues
we want to address with layout and what-not.
117
Layout
So what's a singly-linked queue like? Well, when we had a singly-linked stack we pushed onto one end of
the list, and then popped off the same end. The only difference between a stack and a queue is that a queue
pops off the other end. So from our stack implementation we have:
input list:
[Some(ptr)] -> (A, Some(ptr)) -> (B, None)
stack push X:
[Some(ptr)] -> (X, Some(ptr)) -> (A, Some(ptr)) -> (B, None)
stack pop:
[Some(ptr)] -> (A, Some(ptr)) -> (B, None)
To make a queue, we just need to decide which operation to move to the end of the list: push, or pop? Since
our list is singly-linked, we can actually move either operation to the end with the same amount of effort.
To move push to the end, we just walk all the way to the None and set it to Some with the new element.
input list:
[Some(ptr)] -> (A, Some(ptr)) -> (B, None)
flipped push X:
[Some(ptr)] -> (A, Some(ptr)) -> (B, Some(ptr)) -> (X, None)
To move pop to the end, we just walk all the way to the node before the None, and take it:
input list:
[Some(ptr)] -> (A, Some(ptr)) -> (B, Some(ptr)) -> (X, None)
flipped pop:
[Some(ptr)] -> (A, Some(ptr)) -> (B, None)
We could do this today and call it quits, but that would stink! Both of these operations walk over the entire
list. Some would argue that such a queue implementation is indeed a queue because it exposes the right
interface. However I believe that performance guarantees are part of the interface. I don't care about precise
asymptotic bounds, just "fast" vs "slow". Queues guarantee that push and pop are fast, and walking over
the whole list is definitely not fast.
One key observation is that we're wasting a ton of work doing the same thing over and over. Can we
"cache" all that work and reuse it? Why, yes! We can store a pointer to the end of the list, and just jump
straight to there!
It turns out that only one inversion of push and pop works with this. To invert pop we would have to
move the "tail" pointer backwards, but because our list is singly-linked, we can't do that efficiently. If we
instead invert push we only have to move the "head" pointer forward, which is easy.
struct Node<T> {
elem: T,
next: Link<T>,
}
impl<T> List<T> {
pub fn new() -> Self {
List { head: None, tail: None }
}
match old_tail {
Some(mut old_tail) => {
// If the old tail existed, update it to point to the new tail
old_tail.next = Some(new_tail);
}
None => {
// Otherwise, update the head to point to it
self.head = Some(new_tail);
}
}
}
}
I'm going a bit faster with the impl details now since we should be pretty comfortable with this sort of
thing. Not that you should necessarily expect to produce this code on the first try. I'm just skipping over
some of the trial-and-error we've had to deal with before. I actually made a ton of mistakes writing this
code that I'm not showing, but you can only see me leave off a mut or ; so many times before it stops
being instructive. Don't worry, we'll see plenty of other error messages!
> cargo build 119
Shoot!
Box doesn't implement Copy, so we can't just assign it to two locations. More importantly, Box owns the
thing it points to, and will try to free it when it's dropped. If our push implementation compiled, we'd
double-free the tail of our list! Actually, as written, our code would free the old_tail on every push. Yikes!
🙀
Alright, well we know how to make a non-owning pointer. That's just a reference!
pub struct List<T> { 120
head: Link<T>,
tail: Option<&mut Node<T>>, // NEW!
}
struct Node<T> {
elem: T,
next: Link<T>,
}
impl<T> List<T> {
pub fn new() -> Self {
List { head: None, tail: None }
}
// Put the box in the right place, and then grab a reference to its Node
let new_tail = match self.tail.take() {
Some(old_tail) => {
// If the old tail existed, update it to point to the new tail
old_tail.next = Some(new_tail);
old_tail.next.as_deref_mut()
}
None => {
// Otherwise, update the head to point to it
self.head = Some(new_tail);
self.head.as_deref_mut()
}
};
self.tail = new_tail;
}
}
Nothing too tricky here. Same basic idea as the previous code, except we're using some of that implicit
return goodness to extract the tail reference from wherever we stuff the actual Box.
Oh right, we need to give lifetimes to references in types. Hmm... what's the lifetime of this reference? Well,
this seems like IterMut, right? Let's try what we did for IterMut, and just add a generic 'a :
pub struct List<'a, T> { 121
head: Link<T>,
tail: Option<&'a mut Node<T>>, // NEW!
}
struct Node<T> {
elem: T,
next: Link<T>,
}
// Put the box in the right place, and then grab a reference to its Node
let new_tail = match self.tail.take() {
Some(old_tail) => {
// If the old tail existed, update it to point to the new tail
old_tail.next = Some(new_tail);
old_tail.next.as_deref_mut()
}
None => {
// Otherwise, update the head to point to it
self.head = Some(new_tail);
self.head.as_deref_mut()
}
};
self.tail = new_tail;
}
}
cargo build 122
Woah, that's a really detailed error message. That's a bit concerning, because it suggests we're doing
something really messed up. Here's an interesting part:
the lifetime must be valid for the lifetime 'a as defined on the impl
We're borrowing from self , but the compiler wants us to last as long as 'a , what if we tell it self does
last that long..?
cargo build
head.elem
})
}
#[cfg(test)]
mod test {
use super::List;
#[test]
fn basics() {
let mut list = List::new();
// Populate list
list.push(1);
list.push(2);
list.push(3);
// Check exhaustion
assert_eq!(list.pop(), Some(5));
assert_eq!(list.pop(), None);
}
}
cargo test 124
....
....
🙀🙀🙀🙀🙀🙀🙀🙀🙀🙀🙀🙀🙀🙀🙀🙀🙀🙀🙀🙀🙀🙀🙀
Oh my goodness.
The compiler's not wrong for vomiting all over us. We just committed a cardinal Rust sin: we stored a
reference to ourselves inside ourselves. Somehow, we managed to convince Rust that this totally made
sense in our push and pop implementations (I was legitimately shocked we did).
The reason this sort of works is that Rust doesn't really have the notion of a pointer into yourself at all.
Each part of the code is technically correct in isolation (we can call push and pop once) but then the
absurdity of what we created takes affect and everything just locks up.
I'm sure there is some use for what we've written, but as far as I'm concerned it's just syntatically valid 125
gibberish. We're saying we contain something with lifetime 'a , and that push and pop borrows self for
that lifetime. That's weird but Rust can look at each part of our code individually and it doesn't see any
rules being broken.
But as soon as we try to actually use the list, the compiler quickly goes "yep you've borrowed self
mutably for 'a , so you can't use self anymore until the end of 'a " but also "because you contain 'a , it
must be valid for the entire list's existence".
It's nearly a contradiction but there is one solution: as soon as you push or pop , the list "pins" itself in
place and can't be accessed anymore. It has swallowed its own proverbial tail, and ascended to a world of
dreams.
NARRATOR: it didn't exist when this book was first written, but Rust actually formalized the notion
of a pin into something useful! This was probably the most complex addition to the language since
the borrowchecker. We don't want our list to be pinned though!
Pins are necessary and useful for async-await/futures/coroutines because the compiler needs to be
able to bundle up all the local variables of a function into some kind of struct and store them
somewhere until the future/coroutine is ready to be resumed. Since local variables can reference other
local variables, and we want that to work, these structs can end up containing references to
themselves!
So to await or yield Rust needs a way to be able to properly describe and manipulate pinned
values. Thankfully all of this stuff is largely just hidden away in automatic compiler machinery and no
one actually has to think about Pin (or even Futures) under normal circumstances. The main
exception is that this stuff is very important for the folks building and designing async runtimes like
tokio.
We will not be implementing an async runtime in this book. I know my friends know all sorts of
"cool" (messed up) tricks you can do with Pin , but from what I can tell, I'd be happier to just not
know them. I will continue to tell myself that Pinned types aren't real and they can't hurt me.
Our pop implementation hints at why storing a reference to ourselves inside ourselves could be really
dangerous:
// ...
if self.head.is_none() {
self.tail = None;
}
What if we forgot to do this? Then our tail would point to some node that had been removed from the list.
Such a node would be instantly freed, and we'd have a dangling pointer which Rust was supposed to
protect us from!
And indeed Rust is protecting us from that kind of danger. Just in a very... roundabout way.
Please. No.
No, instead we're going to go off the rails and use raw pointers. Our layout is going to look like this: 126
struct Node<T> {
elem: T,
next: Link<T>,
}
And that's that. None of this wimpy reference-counted-dynamic-borrow-checking nonsense! Real. Hard.
Unchecked. Pointers.
NARRATOR: This implementation was in fact still dangerously wrong, but it wasn't yet time to learn
that lesson. The next section will learn that the hard way, as usual.
Hello unsafe .
Unsafe Rust
This is a serious, big, complicated, and dangerous topic. It's so serious that I wrote an entire other book on
it.
The long and the short of it is that every language is actually unsafe as soon as you allow calling into other
languages, because you can just have C do arbitrarily bad things. Yes: Java, Python, Ruby, Haskell...
everyone is wildly unsafe in the face of Foreign Function Interfaces (FFI).
Rust embraces this truth by splitting itself into two languages: Safe Rust, and Unsafe Rust. So far we've
only worked with Safe Rust. It's completely 100% safe... except that it can FFI into Unsafe Rust.
Unsafe Rust is a superset of Safe Rust. It's completely the same as Safe Rust in all its semantics and rules,
you're just allowed to do a few extra things that are wildly unsafe and can cause the dreaded Undefined
Behaviour that haunts C.
Again, this is a really huge topic that has a lot of interesting corner cases. I really don't want to go really
deep into it (well, I do. I did. Read that book). That's ok, because with linked lists we can actually ignore
almost all of it.
The main Unsafe tool we'll be using are raw pointers. Raw pointers are basically C's pointers. They have
no inherent aliasing rules. They have no lifetimes. They can be null. They can be misaligned. They can be
dangling. They can point to uninitialized memory. They can be cast to and from integers. They can be cast
to point to a different type. Mutability? Cast it. Pretty much everything goes, and that means pretty much
anything can go wrong.
This is some bad stuff and honestly you'll live a happier life never having to touch these. Unfortunately, we
want to write linked lists, and linked lists are awful. That means we're going to have to use unsafe
pointers.
There are two kinds of raw pointer: *const T and *mut T . These are meant to be const T* and T* from
C, but we really don't care about what C thinks they mean that much. You can only dereference a *const T
to an &T , but much like the mutability of a variable, this is just a lint against incorrect usage. At most it just
means you have to cast the *const to a *mut first. Although if you don't actually have permission to
mutate the referent of the pointer, you're gonna have a bad time.
Anyway, we'll get a better feel for this as we write some code. For now, *mut T == &unchecked mut T !
128
Basics
NARRATOR: This section has a looming fundamental error in it, because that's the whole point of
the book. However once we start using unsafe it's possible to do things wrong and still have
everything compile and seemingly work. The fundamental mistake will be identified in the next
section. Don't actually use the contents of this section in production code!
impl<T> List<T> {
pub fn new() -> Self {
List { head: None, tail: None }
}
}
We could use an Option, but unlike Box, *mut is nullable. This means it can't benefit from the null pointer
optimization. Instead, we'll be using null to represent None.
So how do we get a null pointer? There's a few ways, but I prefer to use std::ptr::null_mut() . If you
want, you can also use 0 as *mut _ , but that just seems so messy.
use std::ptr;
// defns...
impl<T> List<T> {
pub fn new() -> Self {
List { head: None, tail: ptr::null_mut() }
}
}
cargo build 129
Alright, let's move on to writing push again. This time, instead of grabbing an Option<&mut Node<T>>
after we insert, we're just going to grab a *mut Node<T> to the insides of the Box right away. We know we
can soundly do this because the contents of a Box has a stable address, even if we move the Box around.
Of course, this isn't safe, because if we just drop the Box we'll have a pointer to freed memory.
How do we make a raw pointer from a normal pointer? Coercions! If a variable is declared to be a raw
pointer, a normal reference will coerce into it:
We have all the info we need. We can translate our code into, approximately, the previous reference
version:
pub fn push(&mut self, elem: T) { 130
let mut new_tail = Box::new(Node {
elem: elem,
next: None,
});
self.tail = raw_tail;
}
Huh? We have a pointer to a Node, why can't we get the next field?
Rust is kinda a jerk when you use raw pointers. To access the contents of a raw pointer, it insists that we
manually deref them, because it's such an unsafe operation. So let's do that:
*self.tail.next = Some(new_tail);
(*self.tail).next = Some(new_tail);
> cargo build 131
--> src/fifth.rs:31:13
|
31 | (*self.tail).next = Some(new_tail);
| ^^^^^^^^^^^^^^^^^ dereference of raw pointer
|
= note: raw pointers may be NULL, dangling or unaligned;
they can violate aliasing rules and cause data races:
all of these are undefined behavior
Remember how I said Unsafe Rust is like an FFI language for Safe Rust? Well, the compiler wants us to
explicitly delimit where we're doing this FFI-ing. We have two options. First, we can mark our entire
function as unsafe, in which case it becomes an Unsafe Rust function and can only be called in an unsafe
context. This isn't great, because we want our list to be safe to use. Second, we can write an unsafe block
inside our function, to delimit the FFI boundary. This declares the overall function to be safe. Let's do that
one:
if !self.tail.is_null() {
// Hello Compiler, I Know I Am Doing Something Dangerous And
// I Promise To Be A Good Programmer Who Never Makes Mistakes.
unsafe {
(*self.tail).next = Some(new_tail);
}
} else {
self.head = Some(new_tail);
}
self.tail = raw_tail;
}
Yay!
It's kind've interesting that that's the only place we've had to write an unsafe block so far. We do raw
pointer stuff all over the place, what's up with that?
It turns out that Rust is a massive rules-lawyer pedant when it comes to unsafe . We quite reasonably 132
want to maximize the set of Safe Rust programs, because those are programs we can be much more
confident in. To accomplish this, Rust carefully carves out a minimal surface area for unsafety. Note that all
the other places we've worked with raw pointers has been assigning them, or just observing whether
they're null or not.
If you never actually dereference a raw pointer those are totally safe things to do. You're just reading and
writing an integer! The only time you can actually get into trouble with a raw pointer is if you actually
dereference it. So Rust says only that operation is unsafe, and everything else is totally safe.
NARRATOR: Somewhere on the other side of the world, a hardware engineer feels a shiver down her
spine — someone must be insisting pointers are just integers again. She looks down at her proposal
for a new hardware pointer authentication scheme and sheds a single tear. The compiler engineer
next door feels nothing — they long ago learned to always wear a heavy sweater.
Having only some of the pointer operations be actually unsafe raises an interesting problem: although
we're supposed to delimit the scope of the unsafety with the unsafe block, it actually depends on state
that was established outside of the block. Outside of the function, even!
This is what I call unsafe taint. As soon as you use unsafe in a module, that whole module is tainted with
unsafety. Everything has to be correctly written in order to make sure all invariants are upheld for the unsafe
code.
This taint is manageable because of privacy. Outside of our module, all of our struct fields are totally
private, so no one else can mess with our state in arbitrary ways. As long as no combination of the APIs we
expose causes bad stuff to happen, as far as an outside observer is concerned, all of our code is safe! And
really, this is no different from the FFI case. No one needs to care if some python math library shells out to
C as long as it exposes a safe interface.
Anyway, let's move on to pop , which is pretty much verbatim the reference version:
if self.head.is_none() {
self.tail = ptr::null_mut();
}
head.elem
})
}
Again we see another case where safety is stateful. If we fail to null out the tail pointer in this function,
we'll see no problems at all. However subsequent calls to push will start writing to the dangling tail!
// Populate list
list.push(1);
list.push(2);
list.push(3);
// Check exhaustion
assert_eq!(list.pop(), Some(5));
assert_eq!(list.pop(), None);
This is just the stack test, but with the expected pop results flipped around. I also added some extra steps
at the end to make sure that tail-pointer corruption case in pop doesn't occur.
cargo test 134
running 12 tests
test fifth::test::basics ... ok
test first::test::basics ... ok
test fourth::test::basics ... ok
test fourth::test::peek ... ok
test second::test::basics ... ok
test fourth::test::into_iter ... ok
test second::test::into_iter ... ok
test second::test::iter ... ok
test second::test::iter_mut ... ok
test second::test::peek ... ok
test third::test::basics ... ok
test third::test::iter ... ok
Gold Star!
Miri
nervously laughs This unsafe stuff is so easy, I don't know why everyone says otherwise. Our program
works perfectly.
NARRATOR: 🙂
...right?
NARRATOR: 🙂
Well, we're writing unsafe code now, so the compiler can't help us catch mistakes as well. It could be that
the tests happened to work, but were actually doing something non-deterministic. Something Undefined
Behavioury.
But what can we do? We've pried open the windows and snuck out of rustc's classroom. No one can help
us now.
"It's wild man, it can validate that the actual dynamic execution of your program conforms to the semantics
of Rust's memory model. Blows your mind..."
What?
Of course I do, it's the tool for having an up to date Rust toolchain!
> rustup +nightly-2022-01-21 component add miri 136
The tool we're installing, miri , works closely with rustc's internals, so it's only available for nightly
toolchains.
+nightly-2022-01-21 tells rustup we want to install miri with the rust nightly toolchain for that
date. I'm giving a specific date because sometimes miri falls behind and can't be built for a few
nightlies. rustup will automatically download whatever toolchain we specify with + if we don't have
it installed yet.
2022-01-21 is just a nightly I know has miri support, which you can check on this status page. You
can just use +nightly if you're feeling lucky.
Whenever we invoke miri via cargo miri we will also use this + syntax to specify the toolchain we
installed miri on. If you don't want to have to specify it every time, you can use rustup override
set .
UH???
> y
--> \lib\rustlib\src\rust\library\core\src\option.rs:846:18
|
846 | Some(x) => Some(f(x)),
| ^ trying to reborrow for Unique at alloc84055,
| but parent tag <209678> does not have an
| appropriate item in the borrow stack
|
= help: this indicates a potential bug in the program:
it performed an invalid operation, but the rules it
violated are still experimental
= help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-
borrows.md
for further information
"Here take the bottle of estradiol too, you're gonna need it later."
Wait why?
NARRATOR: The mysterious person then proceeded to transform into a fox and scampered through a
hole in the wall. The author then stared into the middle distance for several minutes while they tried
to process everything that just happened.
The mysterious fox in the alleyway was right about more than just my gender: miri really is The Good Shit.
Ok so what is miri?
An experimental interpreter for Rust's mid-level intermediate representation (MIR). It can run binaries
and test suites of cargo projects and detect certain classes of undefined behavior, for example:
On top of that, Miri will also tell you about memory leaks: when there is memory still allocated at the
end of the execution, and that memory is not reachable from a global static, Miri will raise an error.
...
However, be aware that Miri will not catch all cases of undefined behavior in your program, and
cannot run all programs
TL;DR: it interprets your program and notices if you break the rules at runtime and Do An Undefined
Behaviour. This is necessary because Undefined Behaviour is generally a thing that happens at runtime. If
the issue could be found at compile time, the compiler would just make it an error!
If you're familiar with tools like ubsan and tsan: it's basically that but all together and more extreme.
Miri is now hanging outside the classroom window with a knife. A learning knife.
If we ever want miri to check our work, we can ask them to interpret our test suite with
> cargo +nightly-2022-01-21 miri test 140
Now let's take a closer look at what they carved into our desk:
error: Undefined Behavior: trying to reborrow for Unique at alloc84055, but parent tag
<209678> does not have an appropriate item in the borrow stack
--> \lib\rustlib\src\rust\library\core\src\option.rs:846:18
|
846 | Some(x) => Some(f(x)),
| ^ trying to reborrow for Unique at alloc84055,
| but parent tag <209678> does not have an
| appropriate item in the borrow stack
|
= help: see
https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md
for further information
Well I can see we made an error, but that's a confusing error message. What's the "borrow stack"?
Normally I'd give a guided tour of the docs, but we're not really the target audience of that documentation.
It's more designed for compiler developers and academics who are working on the semantics of Rust.
So I'm going to just give you the high level idea of "stacked borrows", and then give you a simple strategy
for following the rules.
NARRATOR: Stacked borrows are still "experimental" as a semantic model for Rust, so breaking
these rules may not actually mean your program is "wrong". But unless you literally work on the
compiler, you should just fix your program when miri complains. Better safe than sorry when it
comes to Undefined Behaviour.
142
We say two pointers alias when the pieces of memory they point to overlap. Just as someone who "goes
by an alias" can be referred to by two different names, that overlapping piece of memory can be referred to
by two different pointers. This can lead to problems.
The compiler uses information about pointer aliasing to optimize accesses to memory, so if the information
it has is wrong then the program will be miscompiled and do random garbage.
NARRATOR: Practically speaking, aliasing is more concerned with memory accesses than the
pointers themselves, and only really matters when one of the accesses is mutating. Pointers are
emphasized because they're a convenient thing to attach rules to.
To understand why pointer aliasing information is important, let's consider The Parable of the Tiny Angry
Man.
Michiel was looking through their bookshelf one day when they saw a book they didn't remember. They
pulled it from the bookcase and looked at the cover.
"Oh yes, my old copy of War and Peace, a book I definitely have read. I loved the part with all the Peace."
Suddenly there was a knock at the door. Michiel returned the book to its shelf and opened the door -- it
was their sworn nemesis Hamslaw. As Hamslaw prepared a devastating remark about Michiel's clearly
inferior codegolfing skills, they sensed an opening:
"Well I have, look it's right there in my bookcase, which obviously means I've read it."
Hamslaw couldn't believe it. Her face shifted from its usual smug demeanor to an iron mask of rage and
determination. Hamslaw pushed Michiel aside and power-walked to the book shelf, cleaving the tome
from its resting place with the fury of a thousand Valkyries. She turned the ancient text over in her hands,
and the instant she saw the cover she began to shake.
Michiel prepared to boast of their clearly unparalleled brilliance, but was interrupted by the sudden laughter
of Hamslaw.
Tears were rolling down Hamslaw's face. This was clearly the greatest moment of her life.
They fell to their knees and stared blankly at the bookcase. How could this have happened? They had
checked the cover only a moment ago!
And then they saw a bit of motion in the bookcase. It was a tiny man. A tiny many with the angriest scowl
Michiel had ever seen. The tiny man flipped Michiel off and mouthed the words "no one will believe you"
and disappeared back between the books.
Michiel's plan had been perfect, but they had failed to account for the possibility of a tiny angry man with a
sharpie and the desire for destruction. They thought they knew what the cover of the book said, and they
thought that no one could have possibly changed it. But alas, they were wrong.
Hamslaw was already working on a zine commemorating her incredible victory — Michiel's reputation at
the local Internet Cafe would never recover.
No one wants to be like Michiel, but no one wants to live in constant fear of the tiny angry man either. We
want to know when the tiny angry man could be playing tricks on us. When he is, we will be very careful
and paranoid about checking everything before we use it. But when the tiny angry man is gone, we want to
be able to remember things.
That's the (very simplified) crux of pointer aliasing: when can the compiler assume it's safe to "remember"
(cache) values instead of loading them over and over? To know that, the compiler needs to know whenever
there could be little angry men mutating the memory behind your back.
NARRATOR: the compiler also uses this information to cache stores, which just means it can avoid
committing things to memory if it thinks no one will notice. In this case the problem is still tiny angry
men, but they only need to read the memory for it to be a problem.
144
Except it's more complicated than that. We can "reborrow" mutable pointers like this:
*ref2 += 2;
*ref1 += 1;
println!("{}", data);
// ORDER SWAPPED!
*ref1 += 1;
*ref2 += 2;
println!("{}", data);
For more information about this error, try `rustc --explain E0503`.
error: could not compile `playground` due to previous error
When we reborrow a mutable pointer, the original pointer can't be used anymore until the borrower is done
with it (no more uses).
In the code that works, there's a nice little nesting of the uses. We reborrow the pointer, use the new pointer
for a while, and then stop using it before using the older pointer again. In the code that doesn't work, that
doesn't happen. We just interleave the uses arbitrarily.
This is how we can have reborrows and still have aliasing information: all of our reborrows clearly nest, so145
we can consider only one of them "live" at any given time.
Hey, you know what's a great way to represent cleanly nested things? A stack. A stack of borrows.
Whatever's at the top of the borrow stack is "live" and knows it's effectively unaliased. When you reborrow
a pointer, the new pointer is pushed onto the stack, becoming the live pointer. When you use an older
pointer it's brought back to life by popping everything on the borrow stack above it. At this point the pointer
"knows" it was reborrowed and that the memory might have been modified, but that it once more has
exclusive access -- no need to worry about little angry men.
So it's actually always ok to access a reborrowed pointer, because we can always pop everything above it.
The real trouble is accessing a pointer that has already been popped off of the borrow stack -- then you've
messed up.
Thankfully the design of the borrowchecker ensures that safe Rust programs follow these rules, as we saw
in the above example, but the compiler generally views this problem "backwards" from the stacked
borrows perspective. Instead of saying using ref1 invalidates ref2 , it insists that ref2 must be valid for
all its uses, and that ref1 is the one messing things up by going out of turn.
Hence "cannot use *ref1 because it was mutably borrowed". It's the same result (especially with non-
lexical lifetimes), but framed in a way that's probably more intuitive.
But the borrowchecker can't help us when we start using unsafe pointers!
146
That's a hard problem, and I don't know how to solve it, but the folks who worked on Stacked Borrows
came up with something plausible, and miri tries to implement it.
The very high-level concept is that when you convert a reference (or any other safe pointer) into an raw
pointer it's basically like taking a reborrow. So now the raw pointer is allowed to do whatever it wants with
that memory, and when the reborrow expires it's just like when that happens with normal reborrows.
But the question is, when does that reborrow expire? Well, probably a good time to expire it is when you
start using the original reference again. Otherwise things aren't a nice nested stack.
But wait, you can turn a raw pointer into a reference! And you can copy raw pointers! What if you go &mut
-> *mut -> &mut -> *mut and then access the first *mut ? How the heck do the stacked borrows work
then?
I genuinely don't know! That's why things are complicated. In fact they're extra complicated because
stacked borrows are trying to be more permissive and let more unsafe code work the way you'd expect it to.
This is why I run things under miri to try to help me catch mistakes.
In fact, this messiness is why there is an extra-experimental extra-strict mode of miri: -Zmiri-tag-raw-
pointers .
To enable it, we need to pass it via a MIRIFLAGS environment variable like this:
Or like this on Windows, where you need to just set the variable globally:
$env:MIRIFLAGS="-Zmiri-tag-raw-pointers"
cargo +nightly-2022-01-21 miri test
We'll generally be trying to conform to this extra-strict mode just to be extra confident in our work. It's also
in some sense "simpler", so it's actually better for messing around and getting an intuition for stacked
borrows.
147
Once you start using raw pointers, try to ONLY use raw pointers.
This makes it as unlikely as possible to accidentally lose the raw pointer's "permission" to access the
memory.
1. Safe pointers often assert more properties than just aliasing: the memory is allocated, it's
aligned, it's large enough to fit the type of the pointee, the pointee is properly initialized, etc. So
it's even more dangerous to wildly throw them around when things are in a dubious state.
2. Even if you stay in raw pointer land, you can't just wildly alias any memory. Pointers are
conceptually tied to specific "allocations" (which can be as granular as a local variable on the
stack), and you're not supposed to take a pointer from one allocation, offset it, and then access
memory in a different allocation. If this was allowed, there would always be the threat of tiny
angry men everywhere. This is part of the reason "pointers are just integers" is a problematic
viewpoint.
Now, we still want safe references in our interface, because we want to build a nice safe abstraction so the
user of our list doesn't have to know or worry about.
1. At the start of a method, use the input references to get our raw pointers
2. Do our best to only use unsafe pointers from this point on
3. Convert our raw pointers back to safe pointers at the end if needed
But the fields of our types are private so we're going to keep those entirely as raw pointers.
In fact, part of the big mistake we made was continuing to use Box! Box has a special annotation in it that
tells the compiler "hey this is a lot like &mut , because it uniquely owns that pointer". Which is true!
But the raw pointer we were keeping to the end of the list was pointing into a Box, so whenever we access
the Box normally we're probably invalidating that raw pointer's "reborrow"! ☠
In the next section we'll return to our true form and hit our heads against a bunch of examples.
148
That was a lot of theory and ideas -- let's move on to the true heart and soul of this book: writing some bad
code and getting our tools to scream at us. We're going to go through a ton of examples to try to see if our
mental model makes sense, and to try to get an intuitive feel for stacked borrows.
NARRATOR: Catching Undefined Behaviour in practice is a hairy business. After all, you're dealing
with situations that the compiler literally assumes don't happen.
If you're lucky, things will "seem to work" today, but they'll be a ticking time bomb for a Smarter
Compiler or slight change to the code. If you're really lucky things will reliably crash so you can just
catch the mistake and fix it. But if you're unlucky, then things will be broken in weird and baffling
ways.
Miri tries to work around this by getting rustc's most naive and unoptimized view of the program and
tracking extra state as it interprets. As far as "sanitizers" go, this is a fairly deterministic and robust
approach but it will never be perfect. You need your test program to actually have an execution with
that UB, and for a large enough program it's very easy to introduce all sorts of non-determinism
(HashMaps use RNG by default!).
We can never take miri approving of our program's execution as an absolute certain statement there's
no UB. It's also possible for miri to think something's UB when it really isn't. But if we have a mental
model of how things work, and miri seems to agree with us, that's a good sign that we're on the right
track.
149
Basic Borrows
In previous sections we saw that the borrowchecker didn't like this code:
// ORDER SWAPPED!
*ref1 += 1;
*ref2 += 2;
println!("{}", data);
unsafe {
let mut data = 10;
let ref1 = &mut data;
let ptr2 = ref1 as *mut _;
// ORDER SWAPPED!
*ref1 += 1;
*ptr2 += 2;
println!("{}", data);
}
cargo run
Compiling miri-sandbox v0.1.0
Finished dev [unoptimized + debuginfo] target(s) in 0.71s
Running `target\debug\miri-sandbox.exe`
13
Rustc seems perfectly happy with this: no warnings and the program produced the result we expected!
Now let's look at what miri (in strict mode) thinks of it:
--> src\main.rs:9:9
|
9 | *ptr2 += 2;
| ^^^^^^^^^^ no item granting read access to tag <untagged>
| at alloc748 found in borrow stack.
|
= help: this indicates a potential bug in the program:
it performed an invalid operation, but the rules it
violated are still experimental
Nice! Our intuitive model of how things work held up: although the compiler couldn't catch the issue for us,
150
miri did.
Let's try something more complicated, the &mut -> *mut -> &mut -> *mut case we alluded to before:
unsafe {
let mut data = 10;
let ref1 = &mut data;
let ptr2 = ref1 as *mut _;
let ref3 = &mut *ptr2;
let ptr4 = ref3 as *mut _;
println!("{}", data);
}
cargo run
22
--> src\main.rs:13:5
|
13 | *ptr4 += 4;
| ^^^^^^^^^^ no item granting read access to tag <1621>
| at alloc748 found in borrow stack.
|
Wow yep! In strict mode miri can "tell apart" the two raw pointers and have using the second one invalidate
the first one. Let's see if everything works when we remove the first use that messes everything up:
unsafe {
let mut data = 10;
let ref1 = &mut data;
let ptr2 = ref1 as *mut _;
let ref3 = &mut *ptr2;
let ptr4 = ref3 as *mut _;
println!("{}", data);
}
cargo run 151
20
NICE.
Yeah I'm pretty sure at this point we can all get PhD's in programming language memory model design and
implementation. Who even needs compilers, this stuff is easy.
Testing Arrays
Let's mess with some arrays and pointer offsets ( add and sub ). This should work, right?
unsafe {
let mut data = [0; 10];
let ref1_at_0 = &mut data[0]; // Reference to 0th element
let ptr2_at_0 = ref1_at_0 as *mut i32; // Ptr to 0th element
let ptr3_at_1 = ptr2_at_0.add(1); // Ptr to 1st element
*ptr3_at_1 += 3;
*ptr2_at_0 += 2;
*ref1_at_0 += 1;
cargo run
[3, 3, 0, 0, 0, 0, 0, 0, 0, 0]
What happened? We're using the borrow stack perfectly fine! Does something weird happen when we go
ptr -> ptr ? What if we just copy the pointer so they all go to the same location:
unsafe {
let mut data = [0; 10];
let ref1_at_0 = &mut data[0]; // Reference to 0th element
let ptr2_at_0 = ref1_at_0 as *mut i32; // Ptr to 0th element
let ptr3_at_0 = ptr2_at_0; // Ptr to 0th element
*ptr3_at_0 += 3;
*ptr2_at_0 += 2;
*ref1_at_0 += 1;
cargo run
[6, 0, 0, 0, 0, 0, 0, 0, 0, 0]
unsafe {
let mut data = [0; 10];
let ref1_at_0 = &mut data[0]; // Reference to 0th element
let ptr2_at_0 = ref1_at_0 as *mut i32; // Ptr to 0th element
let ptr3_at_0 = ptr2_at_0; // Ptr to 0th element
let ptr4_at_0 = ptr2_at_0.add(0); // Ptr to 0th element
let ptr5_at_0 = ptr3_at_0.add(1).sub(1); // Ptr to 0th element
cargo run
[20, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Nope! Miri is actually way more permissive when it comes to raw pointers that are derived from other raw
pointers. They all share the same "borrow" (or miri calls it, a tag).
Once you start using raw pointers they can freely split into their own tiny angry men and mess with
themselves. This is ok because the compiler understands that and won't optimize the reads and writes the
same it does with references.
NARRATOR: If the code is simple enough, the compiler can keep track of all the derived pointers and
still optimize things where possible, but it's going to be a lot more brittle than the reasoning it can
use for references.
Even though data is one "allocation" (local variable), ref1_at_0 is only borrowing the first element. Rust
allows borrows to be broken up so that they only apply to particular parts of the allocation! Let's try it out:
unsafe { 154
let mut data = [0; 10];
let ref1_at_0 = &mut data[0]; // Reference to 0th element
let ref2_at_1 = &mut data[1]; // Reference to 1th element
let ptr3_at_0 = ref1_at_0 as *mut i32; // Ptr to 0th element
let ptr4_at_1 = ref2_at_1 as *mut i32; // Ptr to 1th element
*ptr4_at_1 += 4;
*ptr3_at_0 += 3;
*ref2_at_1 += 2;
*ref1_at_0 += 1;
Shoot! Rust doesn't track array indices to prove these borrows are disjoint, but it does give us
split_at_mut to break a slice into multiple parts in a way that is safe to assume works:
unsafe {
let mut data = [0; 10];
*ptr7_at_1 += 7;
*ptr6_at_0 += 6;
*ref5_at_1 += 5;
*ref4_at_0 += 4;
cargo run
[10, 12, 0, 0, 0, 0, 0, 0, 0, 0]
Also note that operations like split_at_mut being allowed tells us that borrows can be less of a stack
and more of a tree, because we can break one big borrow into a bunch of disjoint smaller ones, and
everything still works.
(I think in the actual stacked borrows model everything's still stacks because the stacks are conceptually
tracking permissions for each byte of the program..?)
What if we directly turn a slice into a pointer? Will that pointer have access to the full slice?
unsafe {
let mut data = [0; 10];
*ref6_at_1 += 6;
*ref5_at_0 += 5;
*ptr4_at_1 += 4;
*ptr3_at_0 += 3;
cargo run
[8, 12, 4, 6, 8, 10, 12, 14, 16, 18]
Nice! Pointers aren't just integers: they have a range of memory associated with them, and with Rust we're
allowed to narrow that range!
156
But Rust has shared references that are read-only and can be freely copied, how should those work? Well
we've seen that raw pointers can be freely copied and we can handle that by saying they "share" a single
borrow. Maybe we think of shared references the same way?
Let's test that out with a function that reads a value ( println! can be a little magical with auto-ref/deref
stuff, so I'm wrapping it in a function to make sure we're testing what we want to be):
fn opaque_read(val: &i32) {
println!("{}", val);
}
unsafe {
let mut data = 10;
let mref1 = &mut data;
let sref2 = &mref1;
let sref3 = sref2;
let sref4 = &*sref2;
*mref1 += 1;
opaque_read(&data);
}
cargo run
10
10
10
10
10
11
Oh yeah we forgot to do anything with raw pointers, but at least we can see that it's fine for all the shared
references to be used interchangeably. Now let's mix in some raw pointers:
fn opaque_read(val: &i32) { 157
println!("{}", val);
}
unsafe {
let mut data = 10;
let mref1 = &mut data;
let ptr2 = mref1 as *mut i32;
let sref3 = &mref1;
let ptr4 = sref3 as *mut i32;
*ptr4 += 4;
opaque_read(sref3);
*ptr2 += 2;
*mref1 += 1;
opaque_read(&data);
}
cargo run
Oh whoops, we were actually messing around with & &mut instead of & ! Rust is very good at papering
over that when it doesn't matter. Let's properly reborrow it with let sref3 = &*mref1 :
cargo run
Nope, Rust still doesn't like that! You can only cast a shared reference to a *const which can only read.
But what if we just... do... this...?
cargo run
14
17
WHAT. OK SURE FINE? Great cast system there Rust. It's almost like the *const is a pretty useless type
that only really exists to describe C APIs and to vaguely suggest correct usage (it is, it does). What does
miri think?
MIRIFLAGS="-Zmiri-tag-raw-pointers" cargo +nightly-2022-01-21 miri run 158
Alas, though we can get around the compiler complaining with a double cast, it doesn't actually make this
operation allowed. When we take the shared reference, we're promising not to modify the value.
This is important because that means when the shared borrow is popped off the borrow stack, the mutable
pointers below it can assume the memory hasn't changed. There may have been some tiny angry men
reading the memory (so writes had to be comitted) but they weren't able to modify it and the mutable
pointers can assume the last value they wrote is still there!
Once a shared reference is on the borrow-stack, everything that gets pushed on top of it only
has read permissions.
fn opaque_read(val: &i32) {
println!("{}", val);
}
unsafe {
let mut data = 10;
let mref1 = &mut data;
let ptr2 = mref1 as *mut i32;
let sref3 = &*mref1;
let ptr4 = sref3 as *const i32 as *mut i32;
opaque_read(&*ptr4);
opaque_read(sref3);
*ptr2 += 2;
*mref1 += 1;
opaque_read(&data);
}
Note how it was still "fine" to create a mutable raw pointer as long as we only actually read from it!
cargo run
10
10
13
And just to be sure, let's check that a shared reference gets popped like normal:
fn opaque_read(val: &i32) { 159
println!("{}", val);
}
unsafe {
let mut data = 10;
let mref1 = &mut data;
let ptr2 = mref1 as *mut i32;
let sref3 = &*mref1;
*ptr2 += 2;
opaque_read(sref3); // Read in the wrong order?
*mref1 += 1;
opaque_read(&data);
}
cargo run
12
13
--> src\main.rs:13:17
|
13 | opaque_read(sref3); // Read in the wrong order?
| ^^^^^ trying to reborrow for SharedReadOnly
| at alloc742, but parent tag <1620>
| does not have an appropriate item
| in the borrow stack
|
Hey, we even got a slightly different error message about SharedReadOnly instead of some specific tag.
That makes sense: once there's any shared references, basically everything else is just a big
SharedReadOnly soup so there's no need to distinguish any of them!
160
We've been insisting shared references can't be used for mutation but that chapter was all about how you
could actually mutate through shared references with interior mutability. Let's try the nice and simple
std::cell::Cell type:
use std::cell::Cell;
unsafe {
let mut data = Cell::new(10);
let mref1 = &mut data;
let ptr2 = mref1 as *mut Cell<i32>;
let sref3 = &*mref1;
sref3.set(sref3.get() + 3);
(*ptr2).set((*ptr2).get() + 2);
mref1.set(mref1.get() + 1);
println!("{}", data.get());
}
Ah, such a beautiful mess. It will be lovely to see miri spit on it.
cargo run
16
Smashes another padlock just to really show the stdlib we mean business
#[lang = "unsafe_cell"]
#[repr(transparent)]
#[repr(no_niche)]
pub struct UnsafeCell<T: ?Sized> {
value: T,
}
Oh it's wizard magic. Ok. I guess. #[lang = "unsafe_cell"] is literally just saying UnsafeCell is
UnsafeCell. Let's stop breaking locks and check the actual documentation of std::cell::UnsafeCell.
161
The core primitive for interior mutability in Rust.
If you have a reference &T , then normally in Rust the compiler performs optimizations based on the
knowledge that &T points to immutable data. Mutating that data, for example through an alias or by
transmuting an &T into an &mut T , is considered undefined behavior. UnsafeCell<T> opts-out of
the immutability guarantee for &T : a shared reference &UnsafeCell<T> may point to data that is
being mutated. This is called “interior mutability”.
UnsafeCell basically tells the compiler "hey listen, we're gonna get goofy with this memory, don't make
any of the usual aliasing assumptions about it". Like putting up a big "CAUTION: TINY ANGRY MEN
CROSSING" sign.
use std::cell::UnsafeCell;
fn opaque_read(val: &i32) {
println!("{}", val);
}
unsafe {
let mut data = UnsafeCell::new(10);
let mref1 = data.get_mut(); // Get a mutable ref to the contents
let ptr2 = mref1 as *mut i32;
let sref3 = &*ptr2;
*ptr2 += 2;
opaque_read(sref3);
*mref1 += 1;
println!("{}", *data.get());
}
cargo run
12
13
--> src\main.rs:15:17
|
15 | opaque_read(sref3);
| ^^^^^ trying to reborrow for SharedReadOnly
| at alloc748, but parent tag <1629> does
| not have an appropriate item in the
| borrow stack
|
Wait, what? We spoke the magic words! What am I going to do with all this federally approved ritual- 162
enhancing goat blood?
Well, we did, but then we completely discarded the spell by using get_mut which peeks inside the
UnsafeCell and makes a proper &mut i32 to it anyway!
Think about it: if the compiler had to assume &mut i32 could be looking inside an UnsafeCell , then it
would never be able to make any assumptions about aliasing at all! Everything could be full of tiny angry
men.
So what we need to do is keep the UnsafeCell in our pointer types so that the compiler understands what
we're doing.
use std::cell::UnsafeCell;
fn opaque_read(val: &i32) {
println!("{}", val);
}
unsafe {
let mut data = UnsafeCell::new(10);
let mref1 = &mut data; // Mutable ref to the *outside*
let ptr2 = mref1.get(); // Get a raw pointer to the insides
let sref3 = &*mref1; // Get a shared ref to the *outside*
println!("{}", *data.get());
}
cargo run
12
16
It works! I won't have to throw out all this blood after all.
Actually, hey wait. We're still being a bit goofy with the order here. We made ptr2 first, and then made
sref3 from the mutable pointer. And then we used the raw pointer before the shared pointer. That all
seems... wrong.
Actually wait we did that with the Cell example too. HMMM.
use std::cell::UnsafeCell;
fn opaque_read(val: &i32) {
println!("{}", val);
}
unsafe {
let mut data = UnsafeCell::new(10);
let mref1 = &mut data;
// These two are swapped so the borrows are *definitely* totally stacked
let sref2 = &*mref1;
// Derive the ptr from the shared ref to be super safe!
let ptr3 = sref2.get();
*ptr3 += 3;
opaque_read(&*sref2.get());
*sref2.get() += 2;
*mref1.get() += 1;
println!("{}", *data.get());
}
cargo run
13
16
Now, one reason why the first implementation we had might actually be correct is because if you really
think about it &UnsafeCell<T> really is no different from *mut T as far as aliasing is concerned. You can
infinitely copy it and mutate through it!
So in some sense we just created two raw pointers and used them interchangeably like normal. It's a little
sketchy that both were derived from the mutable reference, so maybe the second one's creation should still
pop the first one off the borrow stack, but that's not really necessary since we're not actually accessing the
contents of the mutable reference, just copying its address.
A line like let sref2 = &*mref1 is a tricksy thing. Syntactically it looks like we're dereferencing it, but
dereferencing on it's own isn't actually a thing? Consider &my_tuple.0 : you aren't actually doing anything
to my_tuple or .0 , you're just using them to refer to a location in memory and putting & in front of it that
says "don't load this, just write the address down".
&* is the same thing: the * is just saying "hey let's talk about the location this pointer points to" and the
& is just saying "now write that address down". Which is of course the same value the original pointer
had. But the type of the pointer has changed, because, uh, types!
That said, if you do &** then you are in fact loading a value with the first * ! * is weird!
164
NARRATOR: No one cares that you know the word "lvalue", Jonathan. In Rust we call them places,
which is totally different and so much cooler?
165
Testing Box
Hey remember why we started this extremely long aside? You don't? Weird.
Well it was because we mixed Box and raw pointers. Box is kind of like &mut , because it claims unique
ownership of the memory it points to. Let's test that claim out:
unsafe {
let mut data = Box::new(10);
let ptr1 = (&mut *data) as *mut i32;
*data += 10;
*ptr1 += 1;
// Should be 21
println!("{}", data);
}
cargo run
21
--> src\main.rs:7:5
|
7 | *ptr1 += 1;
| ^^^^^^^^^^ no item granting read access to tag <1707>
| at alloc763 found in borrow stack.
|
Yep, miri hates that. Let's check that doing things in the right order is ok:
unsafe {
let mut data = Box::new(10);
let ptr1 = (&mut *data) as *mut i32;
*ptr1 += 1;
*data += 10;
// Should be 21
println!("{}", data);
}
cargo run
21
Yep!
Whelp that's all folks, we're finally done talking and thinking about stacked borrows! 166
...wait how do we solve this problem with Box? Like, sure we can write toy programs like this but we need
to store the Box somewhere and hold onto our raw pointers for a potentially long time. Surely stuff will get
mixed up and invalidated?
Great question! To answer that we'll finally be returning to our true calling: writing some god damn linked
lists.
Wait, I need to write linked lists again? Let's not be hasty folks. Be reasonable. Just hold on I'm sure there's
some other interesting issues for me to discu—
167
TL;DR on the previous three sections: randomly mixing safe pointers like & , &mut , and Box with
unsafe pointers like *mut and *const is a recipe for Undefined Behaviour because the safe pointers
introduce extra constraints that we aren't obeying with the raw pointers.
Oh god I need to write linked lists again. Fine. FINE. It's Fine. We're fine.
We're gonna knock a lot of this section out real quick since we already discussed the design in the first try
around, and everything we did was basically correct except for how we mixed together safe and unsafe
pointers.
168
Layout
So in the new layout we're only going to only use raw pointers and everything will be perfect and we'll
never make mistakes again.
struct Node<T> {
elem: T,
next: Link<T>,
}
struct Node<T> {
elem: T,
next: Link<T>,
}
Remember: Option isn't as nice or useful when we're using raw pointers, so we're not using that anymore.
In later sections we'll look at the NonNull type, but don't worry about that for now.
169
Basics
List::new is basically the same.
use ptr;
impl<T> List<T> {
pub fn new() -> Self {
List { head: ptr::null_mut(), tail: ptr::null_mut() }
}
}
Wait we're not using Box anymore. How do we allocate memory without Box?
Well, we could with std::alloc::alloc , but that's like bringing a katana into the kitchen. It'll get the job
done but it's kinda overkill and unwieldy.
We want to have boxes, but, not. One completely wild but maybe viable option would be to do something
like this:
struct Node<T> {
elem: T,
real_next: Option<Box<Node<T>>>,
next: *mut Node<T>,
}
With the idea that we create the Boxes and store them in our node, but then we take a raw pointer into
them and only use that raw pointer until we're done with the Node and want to destroy it. Then we can
take the Box out of real_next and drop it. I think that would conform to our very simplified stacked
borrows model?
If you wanna try to make that, have "fun", but that just looks awful right? This isn't the chapter on Rc and
RefCell, we're not gonna play this game anymore. We're gonna just make simple and clean stuff.
After calling this function, the caller is responsible for the memory previously managed by the Box. In
particular, the caller should properly destroy T and release the memory, taking into account the
memory layout used by Box. The easiest way to do this is to convert the raw pointer back into a Box
with the Box::from_raw function, allowing the Box destructor to perform the cleanup.
Note: this is an associated function, which means that you have to call it as Box::into_raw(b) 170
instead of b.into_raw() . This is so that there is no conflict with a method on the inner type.
Examples
Converting the raw pointer back into a Box with Box::from_raw for automatic cleanup:
let x = Box::new(String::from("Hello"));
let ptr = Box::into_raw(x);
let x = unsafe { Box::from_raw(ptr) };
Nice, that looks literally designed for our use case. It also matches the rules we're trying to follow: start
with safe stuff, turn into into raw pointers, and then only convert back to safe stuff at the end (when we
want to Drop it).
This is basically exactly like doing the weird real_next thing but without having to faff around storing the
Box when it's the exact same pointer as the raw pointer anyway.
Also now that we're just using raw pointers everywhere, let's not worry about keeping those unsafe
blocks narrow: it's all unsafe now. (It always was, but it's nice to lie to yourself sometimes.)
if !self.tail.is_null() {
(*self.tail).next = new_tail;
} else {
self.head = new_tail;
}
self.tail = new_tail;
}
}
Hey that code's actually looking a lot cleaner now that we're sticking to raw pointers!
On to pop, which is also pretty similar to how we left it, although we've got to remember to use
Box::from_raw to clean up the allocation:
pub fn pop(&mut self) -> Option<T> { 171
unsafe {
if self.head.is_null() {
None
} else {
// RISE FROM THE GRAVE
let head = Box::from_raw(self.head);
self.head = head.next;
if self.head.is_null() {
self.tail = ptr::null_mut();
}
Some(head.elem)
}
}
}
Our nice little take s and map s are dead, gotta just check and set null manually now.
And while we're here, let's slap in the destructor. This time we'll implement it as just repeatedly popping,
because it's cute and simple:
// Populate list
list.push(1);
list.push(2);
list.push(3);
// Check exhaustion
assert_eq!(list.pop(), Some(5));
assert_eq!(list.pop(), None);
cargo test
running 12 tests
test fifth::test::basics ... ok
test first::test::basics ... ok
test fourth::test::basics ... ok
test fourth::test::peek ... ok
test second::test::basics ... ok
test fourth::test::into_iter ... ok
test second::test::into_iter ... ok
test second::test::iter ... ok
test second::test::iter_mut ... ok
test second::test::peek ... ok
test third::test::basics ... ok
test third::test::iter ... ok
running 12 tests
test fifth::test::basics ... ok
test first::test::basics ... ok
test fourth::test::basics ... ok
test fourth::test::peek ... ok
test second::test::basics ... ok
test fourth::test::into_iter ... ok
test second::test::into_iter ... ok
test second::test::iter ... ok
test second::test::iter_mut ... ok
test second::test::peek ... ok
test third::test::basics ... ok
test third::test::iter ... ok
EYYYY!!!!!
IT FRIGGIN WORKED!
PROBABLY!
FAILING TO FIND UNDEFINED BEHAVIOUR IS NOT A PROOF THAT IT ISN'T THERE WAITING TO
CAUSE PROBLEMS BUT THERE IS A LIMIT TO HOW RIGOROUS I AM WILLING TO BE FOR A JOKE
BOOK ABOUT LINKED LISTS SO WE'RE GONNA CALL THIS A 100% MACHINE VERIFIED PROOF
AND ANYONE WHO SAYS OTHERWISE CAN SUCK MY COQ!
∴ QED □
174
Extra Junk
Now that push and pop are written, everything else is acutally exactly the same as the stack case, weirdly.
Only operations that change the length of the list need to touch the tail pointer.
But of course, now that everything's unsafe pointers we need to rewrite the code to use those! And if we're
going to be touching all the code, we might as well take the chance to make sure we aren't missing
something.
But anyway, let's start copy-pasting code from the stack implementation:
// ...
IntoIter looks fine, but Iter and IterMut are breaking our simple rule of never using safe pointers in our
types anymore. Let's be safe and change those to use raw pointers:
impl<T> List<T> {
pub fn into_iter(self) -> IntoIter<T> {
IntoIter(self)
}
Looks good!
error[E0392]: parameter `'a` is never used 175
--> src\fifth.rs:17:17
|
17 | pub struct Iter<'a, T> {
| ^^ unused parameter
|
= help: consider removing `'a`, referring to it in a field,
or using a marker such as `PhantomData`
Zero-sized type used to mark things that “act like” they own a T .
Adding a PhantomData<T> field to your type tells the compiler that your type acts as though it stores
a value of type T , even though it doesn’t really. This information is used when computing certain
safety properties.
For a more in-depth explanation of how to use PhantomData<T> , please see the Nomicon.
Hey don't get hasty there, we're reading the book that I wrote. Not that other book that some huge nerd
probably wrote! I bet if they write a data structure in there it's something lame like an Array Stack and not a
Linked List.
Perhaps the most common use case for PhantomData is a struct that has an unused lifetime
parameter, typically as part of some unsafe code.
Ah so we're naming a lifetime in our type but not actually using it. We could go down the PhantomData
path, but I want to save that for the doubly-linked list in the next chapter that will really need it.
We're in an interesting situation where we actually don't need PhantomData. I think. I'm just going to claim
that and trust that it's true, and if miri yells at us at the end I'll concede the point and we'll do the
PhantomData thing.
What we're actually going to do is put the references back in these Iterator types and be happy we get to
use references in some places still. I think that's sound because there's still a kind of proper nesting when
you use an iterator: you create the iterator, use safe references for a while, and then discard the iterator.
Only once the iterator is gone can you access the list and call things like push and pop which need to
mess with the tail pointer and Boxes. Now, during the iteration we are going to be dereferencing a bunch of
raw pointers, so there is a kind of mixing there, but we should be able to think of those references as 176
reborrows of the unsafe pointers.
I'm not even 100% convinced but I just wanna give it a try and see!
impl<T> List<T> {
pub fn into_iter(self) -> IntoIter<T> {
IntoIter(self)
}
If we're going to be storing references, we need to upgrade our raw pointers to options-of-references. We
could check if the pointer is null, but this is one of the incredibly narrow cases where I think it's ok to use
the nasty ptr::as_ref and ptr::as_mut methods.
I usually recommend avoiding these methods like the plague because they do some surprising and nasty
stuff and they're inherently reintroducing references when my whole "easy rule" is to avoid doing that!
Those methods come with a lot of warnings, but the most interesting is this:
You must enforce Rust’s aliasing rules, since the returned lifetime 'a is arbitrarily chosen and does
not necessarily reflect the actual lifetime of the data. In particular, for the duration of this lifetime, the
memory the pointer points to must not get accessed (read or written) through any other pointer.
Hey look it's the thing we talked about for 25 pages! I have already asserted we're definitely going to be
fine to use references here, so aliasing solved! The other evil part is the signature:
Do you see how that lifetime isn't attached to the input at all, because self is by-value? Yeah that's what
we call an "unbounded lifetime" and it's nasty stuff. It's willing to pretend to be as large as we ask it to be,
even 'static ! The way you deal with that is by putting it somewhere that is bounded, which usually just177
means "return this from a function as soon as possible so that the function signature limits it".
Boy I'm nervous about this but we're gonna keep pushing through! Let's steal some iterator impls from the
stack:
running 15 tests
test fifth::test::basics ... ok
test fifth::test::into_iter ... ok
test fifth::test::iter ... ok
test fifth::test::iter_mut ... ok
test first::test::basics ... ok
test fourth::test::basics ... ok
test fourth::test::into_iter ... ok
test fourth::test::peek ... ok
test second::test::basics ... ok
test second::test::into_iter ... ok
test second::test::iter ... ok
test second::test::iter_mut ... ok
test second::test::peek ... ok
test third::test::iter ... ok
test third::test::basics ... ok
running 15 tests
test fifth::test::basics ... ok
test fifth::test::into_iter ... ok
test fifth::test::iter ... ok
test fifth::test::iter_mut ... ok
test first::test::basics ... ok
test fourth::test::basics ... ok
test fourth::test::into_iter ... ok
test fourth::test::peek ... ok
test second::test::basics ... ok
test second::test::into_iter ... ok
test second::test::iter ... ok
test second::test::iter_mut ... ok
test second::test::peek ... ok
test third::test::basics ... ok
test third::test::iter ... ok
NARRATOR: but wasn't the whole point that the mistakes are there to teach the reader.
YEAH WELL SOMETIMES THE LESSON IS THAT I'M RIGHT AND EVERYONE SHOULD LISTEN TO ME
WHEN I SAY THINGS ABOUT UNSAFE CODE BECAUSE I HAVE SPENT FAR TOO MUCH TIME
THINKING ABOUT THE SOUNDNESS OF ITERATOR IMPLEMENTATIONS?! OK?! OK.
I'm not even gonna test them because I never make mistakes anymore.
FINE.
I guess I am going to continue to make mistakes, so we're going to be extra careful and add a new test I'm
going to call "miri food": something that just messes around and mixes up our APIs a bunch to help miri
catch our mistakes.
#[test] 180
fn miri_food() {
let mut list = List::new();
list.push(1);
list.push(2);
list.push(3);
assert!(list.pop() == Some(1));
list.push(4);
assert!(list.pop() == Some(2));
list.push(5);
assert!(list.peek() == Some(&3));
list.push(6);
list.peek_mut().map(|x| *x *= 10);
assert!(list.peek() == Some(&30));
assert!(list.pop() == Some(30));
assert!(list.pop() == Some(400));
list.peek_mut().map(|x| *x *= 10);
assert!(list.peek() == Some(&5000));
list.push(7);
running 16 tests
test fifth::test::basics ... ok
test fifth::test::into_iter ... ok
test fifth::test::iter ... ok
test fifth::test::iter_mut ... ok
test fifth::test::miri_food ... ok
test first::test::basics ... ok
test fourth::test::basics ... ok
test fourth::test::into_iter ... ok
test fourth::test::peek ... ok
test second::test::into_iter ... ok
test second::test::basics ... ok
test second::test::iter_mut ... ok
test second::test::peek ... ok
test third::test::iter ... ok
test second::test::iter ... ok
test third::test::basics ... ok
running 16 tests
test fifth::test::basics ... ok
test fifth::test::into_iter ... ok
test fifth::test::iter ... ok
test fifth::test::iter_mut ... ok
test fifth::test::miri_food ... ok
test first::test::basics ... ok
test fourth::test::basics ... ok
test fourth::test::into_iter ... ok
test fourth::test::peek ... ok
test second::test::into_iter ... ok
test second::test::basics ... ok
test second::test::iter_mut ... ok
test second::test::peek ... ok
test third::test::iter ... ok
test second::test::iter ... ok
test third::test::basics ... ok
Perfect.
182
Final Code
Alright, so with a teeny-tiny dash of unsafety we managed to get a linear time improvement over the naive
safe queue, and we managed to reuse almost all of the logic from the safe stack!
You know, except for that part where miri completely dunked on us and we had to write a short master's
thesis on rust's memory model. You know, as you do.
But on the bright side we didn't have to write any jank Rc or RefCell stuff.
183
use std::ptr;
struct Node<T> {
elem: T,
next: Link<T>,
}
impl<T> List<T> {
pub fn new() -> Self {
List { head: ptr::null_mut(), tail: ptr::null_mut() }
}
pub fn push(&mut self, elem: T) {
unsafe {
let new_tail = Box::into_raw(Box::new(Node {
elem: elem,
next: ptr::null_mut(),
}));
if !self.tail.is_null() {
(*self.tail).next = new_tail;
} else {
self.head = new_tail;
}
self.tail = new_tail;
}
}
pub fn pop(&mut self) -> Option<T> {
unsafe {
if self.head.is_null() {
None
} else {
let head = Box::from_raw(self.head);
self.head = head.next;
if self.head.is_null() {
self.tail = ptr::null_mut();
}
Some(head.elem)
}
}
}
pub fn peek(&self) -> Option<&T> { 184
unsafe {
self.head.as_ref().map(|node| &node.elem)
}
}
#[cfg(test)]
mod test {
use super::List;
#[test]
fn basics() {
let mut list = List::new();
// Populate list
list.push(1);
list.push(2);
list.push(3);
// Check exhaustion
assert_eq!(list.pop(), Some(5));
assert_eq!(list.pop(), None);
#[test]
fn into_iter() {
let mut list = List::new();
list.push(1); list.push(2); list.push(3);
#[test]
fn iter() {
let mut list = List::new();
list.push(1); list.push(2); list.push(3); 186
#[test]
fn iter_mut() {
let mut list = List::new();
list.push(1); list.push(2); list.push(3);
#[test]
fn miri_food() {
let mut list = List::new();
list.push(1);
list.push(2);
list.push(3);
assert!(list.pop() == Some(1));
list.push(4);
assert!(list.pop() == Some(2));
list.push(5);
assert!(list.peek() == Some(&3));
list.push(6);
list.peek_mut().map(|x| *x *= 10);
assert!(list.peek() == Some(&30));
assert!(list.pop() == Some(30));
assert!(list.pop() == Some(400));
list.peek_mut().map(|x| *x *= 10);
assert!(list.peek() == Some(&5000));
list.push(7);
Our story begins as 2014 was coming to a close and we were rapidly approaching the release of Rust 1.0,
Rust's first stable release. I had found myself in the role of caring for std::collections , or as we
affectionately called it in those times, libcollections.
libcollections had spent years as a dumping ground for everyone's Cute Ideas and Vaguely Useful Things.
This was all well and good when Rust was a fledgling experimental language, but if my children were
going to escape the nest and be stabilized, they would have to prove their worth.
Until then I had encouraged and nurtured them all, but it was now time for them to face judgement for their
failings.
I sunk my claws into the bedrock and carved tombstones for my most foolish children. A grisly monument
that I placed in the town square for all to see:
Their fates were sealed, for my word was absolute. The other collections were horrified by my brutality, but
they were not yet safe from their mother's wrath. I soon returned with two more tombstones:
The Bit twins were more cunning than their fallen comrades, but they lacked the strength to escape me.
Most thought my work done, but I soon took one more:
Deprecate VecMap
VecMap had tried to survive through stealth — it was so small and inoffensive! But that wasn't enough for
the libcollections I saw in my vision of the future.
No, DList, it can't be! I thought you died in that tragic garbage collection incident! The one which was
definitely an accident and not intentional at all!
They had faked their death and taken on a new name, but it was still them: LinkedList, the shadowy and
untrustworthy schemer of computing.
I spread word of their misdeeds to all that would hear me, but hearts were unmoved. LinkedList was a 188
silver-tongued devil who had convinced everyone around me that it was some sort of fundamental and
natural datastructure of computing. It had even convinced C++ that it was the list!
Easily! Trivially!
"It's non-trivial unsafe code, so it makes sense to have it in the standard library!"
But alas, LinkedList had gathered too many allies and grown too strong while I was distracted with its kin.
I fled to my laboratory and tried to devise some sort of evil clone or enhanced cyborg replicant that could
rival and destroy it, but my grant funding ran out because my research was "too murderously evil" or
somesuch nonsense.
But you're here now. You've come this far. Surely now you can understand the depths of LinkedList's
debauchery! Come, I will you show you everything you need to know to help me destroy it once and for all
— everything you need to know to implement an unsafe production-quality Doubly-Linked Deque.
How production-quality? Well we're going to completely rewrite my ancient Rust 1.0 linked-list crate, the
one that is objectively better than the one in std. The one with Cursors on stable Rust, from 2015!
Something the 2022 stdlib still doesn't have!
189
Layout
Let us begin by first studying the structure of our enemy. A Doubly-Linked List is conceptually simple, but
that's how it decieves and manipulates you. It's the same kind of linked list we've looked at over and over,
but the links go both ways. Double the links, double the evil.
So rather than this (gonna drop the Some/None stuff to keep it cleaner):
We have this:
This lets you traverse the list from either direction, or seek back and forth with a cursor.
In exchange for this flexibility, every node has to store twice as many pointers, and every operation has to
fix up way more pointers. It's a significant enough complication that it's a lot easier to make a mistake, so
we're going to be doing a lot of testing.
You might have also noticed that I intentionally haven't drawn the ends of the list. This is because this is
one of the places where there are genuinely defensible options for our implementation. We definitely need
our implementation to have two pointers: one to the start of the list, and one to the end of the list.
There are two notable ways to do this in my mind: "traditional" and "dummy node".
The traditional approach is the simple extension of how we did a Stack — just store the head and tail
pointers on the stack:
This is fine, but it has one downside: corner cases. There are now two edges to our list, which means twice
as many corner cases. It's easy to forget one and have a serious bug.
The dummy node approach attempts to smooth out these corner cases by adding an extra node to our list
which contains no data but links the two ends together into a ring:
[ptr] -> (ptr, ?DUMMY?, ptr) <-> (ptr, A, ptr) <-> (ptr, B, ptr)
^ ^
+-------------------------------------------------+
By doing this, every node always has actual pointers to a previous and next node in the list. Even when you
remove the last element from the list, you just end up stitching the dummy node to point at itself:
Problem 1: An extra indirection and allocation, especially for the empty list, which must include the
dummy node. Potential solutions include:
Don't allocate the dummy node until something is inserted: simple and effective, but it adds back
some of the corner cases we were trying to avoid by using dummy pointers!
Use a static copy-on-write empty singleton dummy node, with some really clever scheme that lets
the Copy-On-Write checks piggy-back on normal checks: look I'm really tempted, I really do love that
shit, but we can't go down that dark path in this book. Read ThinVec's sourcecode if you want to see
that kind of perverted stuff.
Store the dummy node on the stack - not practical in a language without C++-style move-
constructors. I'm sure there's something weird thing we could do here with pinning but we're not
gonna.
Problem 2: What value is stored in the dummy node? Sure if it's an integer it's fine, but what if we're
storing a list full of Box ? It may be impossible for us to initialized this value! Potential solutions include:
Make every node store Option<T> : simple and effective, but also bloated and annoying.
Really careful and clever inheritance-style type punning so the dummy node doesn't include the data
field. This is also tempting but it's extremely dangerous and annoying. Read BTreeMap's source if
you want to see that kind of perverted stuff.
The problems really outweigh the convenience for a language like Rust, so we're going to stick to the
traditional layout. We'll be using the same basic design as we did for the unsafe queue in the previous
chapter:
struct Node<T> {
front: Link<T>,
back: Link<T>,
elem: T,
}
(Now that we have reached the doubly-linked-deque, we have finally earned the right to call ourselves
LinkedList, for this is the True Linked List.)
This isn't quite a true production-quality layout yet. It's fine but there's magic tricks we can do to tell Rust
what we're doing a bit better. To do that we're going to need to go... deeper.
191
1. Variance
2. Drop Check
3. NonNull Optimizations
4. The isize::MAX Allocation Rule
5. Zero-Sized Types
The third we could make into our problem but it's more trouble than it's worth -- if you've opted into a
LinkedList you've already given up the battle on memory-effeciency 100-fold already.
The second is something that I used to insist was really important and that std messes around with, but
the defaults are safe, the ways to mess with it are unstable, and you need to try so very hard to ever notice
the limitations of the defaults, so, don't worry about it.
That just leaves us with Variance. To be honest, you can probably punt on this one too, but I still have my
pride as a Collections Person, so we're going to Do The Variance Thing.
So, surprise: Rust has subtyping. In particular, &'big T is a subtype of &'small T . Why? Well because if
some code needs a reference that lives for some particular region of the program, it's usually perfectly fine
to give it a reference that lives for longer. Like, intuitively that's just true, right?
Why is this important? Well imagine some code that takes two values with the same type:
fn take_two<T>(_val1: T, _val2: T) { }
This is some deeply boring code, and so we should expect it to work with T=&u32 fine, right?
fn take_two<T>(_val1: T, _val2: T) { }
Now let's have some fun and wrap it in, oh, I don't know, std::cell::Cell :
192
use std::cell::Cell;
fn take_two<T>(_val1: T, _val2: T) { }
Huh??? We didn't touch the lifetimes, why's the compiler angry now!?
Ah well, the lifetime "subtyping" stuff must be really simple, so it falls over if you wrap the references in
anything, see look it breaks with Vec too:
fn take_two<T>(_val1: T, _val2: T) { }
Well, yes. But also, no. The magic was inside us all along, and that magic is ✨Variance✨.
Read the nomicon's chapter on subtyping if you want all the gorey details, but basically subtyping isn't
always safe. In particular it's not safe when mutable references are involved because you can use things
like mem::swap and suddenly oops dangling pointers!
Things that are "like mutable references" are invariant which means they block subtyping from happening
on their generic parameters. So for safety, &mut T is invariant over T, and Cell<T> is invariant over T
because &Cell<T> is basically just &mut T (because of interior mutability).
Almost everything that isn't invariant is covariant, and that just means that subtyping "passes through" it
and continues to work normally (there are also contravariant types that make subtyping go backwards but
they are really rare and no one likes them so I won't mention them again). 193
Collections generally contain a mutable pointer to their data, so you might expect them to be invariant too,
but in fact, they don't need to be! Because of Rust's ownership system, Vec<T> is semantically equivalent
to T , and that means it's safe for it to be covariant!
struct Node<T> {
front: Link<T>,
back: Link<T>,
elem: T,
}
But how is Rust actually deciding the variance of things? Well in the good-old-days before 1.0 we messed
around with just letting people specify the variance they wanted and... it was an absolute train-wreck!
Subtyping and variance is really hard to wrap your head around, and core developers genuinely disagreed
on basic terminology! So we moved to a "variance by example" approach: the compiler just looks at your
fields and copies their variances. If there's any kind of disagreement, then invariance always wins, because
that's safe.
So what's in our type definitions that Rust is getting mad about? *mut !
Raw pointers in Rust really just try to let you do whatever, but they have exactly one safety feature: because
most people have no idea that variance and subtyping are a thing in Rust, and being incorrectly covariant
would be horribly dangerous, *mut T is invariant, because there's a good chance it's being used "as" &mut
T.
This is extremely annoying for Exactly Me as a person who has spent a lot of time writing collections in
Rust. This is why when I made std::ptr::NonNull, I added this little piece of magic:
Unlike *mut T , NonNull<T> was chosen to be covariant over T . This makes it possible to use
NonNull<T> when building covariant types, but introduces the risk of unsoundness if used in a type
that shouldn’t actually be covariant.
But hey, it's interface is built around *mut T , what's the deal! Is it just magic?! Let's look:
194
pub struct NonNull<T> {
pointer: *const T,
}
impl<T> NonNull<T> {
pub unsafe fn new_unchecked(ptr: *mut T) -> Self {
// SAFETY: the caller must guarantee that `ptr` is non-null.
unsafe { NonNull { pointer: ptr as *const T } }
}
}
NOPE. NO MAGIC HERE! NonNull just abuses the fact that *const T is covariant and stores that instead,
casting back and forth between *mut T at the API boundary to make it "look like" it's storing a *mut T .
That's the whole trick! That's how collections in Rust are covariant! And it's miserable! So I made the Good
Pointer Type do it for you! You're welcome! Enjoy your subtyping footgun!
The solution to all your problems is to use NonNull, and then if you want to have nullable pointers again,
use Option<NonNull<T>> . Are we really going to bother doing that..?
Yep! It sucks, but we're making production grade linked lists so we're going to eat all our vegetables and do
things the hard way (we could just use bare *const T and cast everywhere, but I genuinely want to see
how painful this is... for Ergonomics Science).
use std::ptr::NonNull;
// !!!This changed!!!
pub struct LinkedList<T> {
front: Link<T>,
back: Link<T>,
len: usize,
}
struct Node<T> {
front: Link<T>,
back: Link<T>,
elem: T,
}
...wait nope, one last thing. Any time you do raw pointer stuff, you should add a Ghost to protect your
pointers:
use std::marker::PhantomData;
PhantomData is a way for us to give the compiler an extra "example" field that conceptually exists in your
type but for various reasons (indirection, type erasure, ...) doesn't. In this case we're using NonNull because
we're claiming our type behaves "as if" it stored a value T, so we add a PhantomData to make that explicit.
The stdlib actually has other reasons to do this because it has access to the accursed Drop Check overrides,
but that feature has been reworked so many times that I don't actually know if the PhantomData thing is a
thing for it anymore. I'm still going to cargo-cult it for all eternity, because Drop Check Magic is burned into
my brain!
...ok for real we're done with layout now! On to actual basic functionality!
196
Basics
Alright, this is the part of the book that sucks, and why it took me 7 years to write this chapter! Time to just
burn through a whole lot of really boring stuff we've done 5 times already, but extra verbose and long
because we have to do everything twice and with Option<NonNull<Node<T>>> !
impl<T> LinkedList<T> {
pub fn new() -> Self {
Self {
front: None,
back: None,
len: 0,
_boo: PhantomData,
}
}
}
PhantomData is a weird type with no fields so you just make one by, saying its type name. shrug
Ah yes, I truly hate my pointer-y children. We need to explicitly get the raw pointer out of NonNull with
as_ptr , because DerefMut is defined in terms of &mut and we don't want to randomly introduce safe
references into our unsafe code!
(*old.as_ptr()).front = Some(new); 197
(*new.as_ptr()).back = Some(old);
self.len -= 1;
result
// Box gets implicitly freed here, knows there is no T.
})
}
}
#[test]
fn test_basic_front() {
let mut list = LinkedList::new();
// Mess around
list.push_front(10);
assert_eq!(list.len(), 1);
list.push_front(20);
assert_eq!(list.len(), 2);
list.push_front(30);
assert_eq!(list.len(), 3);
assert_eq!(list.pop_front(), Some(30));
assert_eq!(list.len(), 2);
list.push_front(40);
assert_eq!(list.len(), 3);
assert_eq!(list.pop_front(), Some(40));
assert_eq!(list.len(), 2);
assert_eq!(list.pop_front(), Some(20));
assert_eq!(list.len(), 1);
assert_eq!(list.pop_front(), Some(10));
assert_eq!(list.len(), 0);
assert_eq!(list.pop_front(), None);
assert_eq!(list.len(), 0);
assert_eq!(list.pop_front(), None);
assert_eq!(list.len(), 0);
}
}
running 1 test
test test::test_basic_front ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in
0.00s
...Right?
199
Is it right?
Sorry did you forget the book you're reading? Of course it's wrong! (Sort Of.)
// Bring the Box back to life so we can move out its value and
// Drop it (Box continues to magically understand this for us).
let boxed_node = Box::from_raw(node.as_ptr());
let result = boxed_node.elem;
self.len -= 1;
result
// Box gets implicitly freed here, knows there is no T.
debug_assert!(self.len == 1);
Really? Our friggin' integrity check for tests is a bug?? Yes!!! Well, if we implement our collection right it
shouldn't be, but it can turn something benign like "oh we are doing a bad job of keeping len up to date"
into An Exploitable Memory Safety Bug! Why? Because it can panic! Most of the time you don't have to
think or worry about panics, but once you start writing really unsafe code and playing fast and loose with
"invariants", you need to become hyper-vigilant about panics!
We've gotta talk about exception safety (AKA panic safety, AKA unwind safety, ...).
So here's the deal: by default, panics are unwinding. Unwinding is just a fancy way to say "make every
single function immediately return". You might think "well, if everyone returns then the program is about to
die, so why care about it?", but you'd be wrong!
We have to care for two reasons: destructors run when a function returns, and the unwind can be caught. In
both cases, code can keep running after a panic, so we need to be very careful and make sure our unsafe
collections are always in some kind of coherent state whenever a panic could occur, because each panic is 200
an implicit early return!
Let's think about what state our collection is in when we get to that line:
We have our boxed_node on the stack, and we've extracted the element from it. If we were to return at this
point, the Box would be dropped, and the node would be freed. Do you see it now..? self.back is still
pointing at that freed node! Once we implement the rest of our collection and start using self.back for
things, this could result in a use-after-free! Yikes!
Interestingly, this line has similar problems, but it's much safer:
self.len -= 1;
By default in debug builds Rust checks for underflows and overflows and will panic when they happen.
Yes, every arithmetic operation is a panic-safety hazard! This one is better because it happens after we've
repaired all of our invariants, so it won't cause memory-safety issues... as long as we don't trust len to be
right, but then, if we underflow it's definitely wrong, so we were dead either way! The debug assert is in
some sense worse because it can escalate a minor issue into a critical one!
I've brought up the term "invariants" a few times, and that's because it's a really useful concept for panic-
safety! Basically, to an outside observer of our collection, there are certain property we're always upholding.
For a LinkedList, one of those is that any node that is reachable in our list is still allocated and initialized.
Inside the implementation we have a bit more flexibility to break invariants temporarily as long as we make
sure to repair them before anyone notices. This is actually one of the "killer apps" of Rust's ownership and
borrowing system for collections: if an operation requires an &mut Self , then we are guaranteed that we
have exclusive access to our collection and that it's fine for us to temporarily break invariants, safe in the
knowledge that no one can sneakily mess with it.
Perhaps the greatest expression of this is Vec::drain, which actually lets you completely smash a core
invariant of Vec and start moving values out from the front or even middle of a Vec. The reason this is
sound is because the Drain iterator that we return holds an &mut to the Vec, and so all access is gated
behind it! No one can observe the Vec until the Drain iterator goes away, and then it's destructor can "repair"
the Vec before anyone can notice, it's perfe--
It's not perfect. Unfortunately, you can't rely on destructors in code you don't control to run, and so even
with Drain we need to do a little extra work to make our type always preserved invariants, but in a kind of
goofy way: we just set the Vec's len to 0 at the start, so if anyone leaks the Drain, then they will have a safe
Vec... but they will have also lost a bunch of data. You leak me? I leak you! An eye for an eye! True justice!
For a situation where you can actually use destructors for panic-safety, check out the BinaryHeap::sift_up
case study.
Anyway, we won't be needing all of this fancy stuff for our LinkedLists, we just need to be a bit more
vigilant about where we break our invariants, what we trust/require to be correct, and to avoid introducing
unnecessary unwinds in the middle of hairy tasks.
In this case, we have two options to make our code a bit more robust:
Use operations like Option::take a lot more aggressively, because they are more "transactional" and
have a tendency to preserve invariants.
Kill the debug_asserts and trust ourselves to write better tests with dedicated "integrity check" 201
functions that won't run in user code ever.
In principle I like the first option, but it doesn't actually work great for a doubly-linked list, because
everything is doubly-redundantly encoded. Option::take wouldn't fix the problem here, but moving the
debug_assert down a line would. But really, why make things harder for ourselves? Let's just remove those
debug_asserts, and make sure anything can panic is at the start or end of our methods, where our invariants
should be known to hold.
(In this way it's perhaps more accurate to think of them as preconditions and postconditions but you really
should endeavour to treat them as invariants as much as possible!)
struct Node<T> {
front: Link<T>,
back: Link<T>,
elem: T,
}
impl<T> LinkedList<T> {
pub fn new() -> Self {
Self {
front: None,
back: None,
len: 0,
_boo: PhantomData,
}
}
self.len -= 1;
result
// Box gets implicitly freed here, knows there is no T.
})
}
}
What can panic here? Well, knowing that honestly requires you to be a bit of a Rust expert, but thankfully, I
am!
The only places I can see in this code that possibly can panic (barring some absolute fuckery where
someone recompiles the stdlib with debug_asserts enabled, but this is not something you should ever do)
are Box::new (for out-of-memory conditions) and the len arithmetic. All of that stuff is at the very end or
very start of our methods, so yep, we're being nice and safe!
...were you surprised by Box::new being able to panic? Panics will get you like that! Try to preserve those
invariants so you don't need to worry about it!
204
Boring Combinatorics
Ok, back to our regularly scheduled linked lists!
We've got to fill in a bunch of really boring combinatoric implementations like front, front_mut, back,
back_mut, iter, iter_mut, into_iter, ...
You could do them with macros or whatever but honestly, that's a worse fate than copy-pasting. We're just
going to do a lot of copy-pasting. I have very carefully crafted the previous push/pop implementations so
that we should be able to literally just swap front and back and the code does/says the right thing! Hooray
for painful experience! (It's so tempting to talk about "prev and next" for nodes, but I find it's really worth it
to just consistently talk about "front" and "back" as much as possible to avoid mistakes.)
Hey actually, this book is really old and some nice new things have been added like the ? operator which
does an early return on Option::None, does that make our code nicer?
Maybe? It's kind of a wash for something this simple, and the previous section was all about how early
returns are kinda spooky for us, so maybe we should prefer being a bit more explicit here (I'm sticking to the
map implementation). On to front_mut:
So in addition to next and size_hint , we're going to support next_back and len .
The vigilant among you might notice that IterMut seems a lot more sketchy with double-ended iteration,
but it's actually still sound!
... god this is gonna be a lot of boilerplate. Maybe I should really write a macro... no, no, that's still a worse
fate.
pub struct Iter<'a, T> { 206
front: Link<T>,
back: Link<T>,
len: usize,
_boo: PhantomData<&'a T>,
}
impl<T> LinkedList<T> {
pub fn iter(&self) -> Iter<T> {
Iter {
front: self.front,
back: self.back,
len: self.len,
_boo: PhantomData,
}
}
}
we'll paste IterMut at the end, it's literally the exact same code with mut in a lot of places, let's just knock
out into_iter first. We can mercifully still lean on our tried-and-true solution of just making it wrap our
collection and using pop for next:
pub struct IntoIter<T> { 208
list: LinkedList<T>,
}
impl<T> LinkedList<T> {
pub fn into_iter(self) -> IntoIter<T> {
IntoIter {
list: self
}
}
}
Alright, here's all of our code with all the combinatorics filled in:
209
use std::ptr::NonNull;
use std::marker::PhantomData;
struct Node<T> {
front: Link<T>,
back: Link<T>,
elem: T,
}
impl<T> LinkedList<T> {
pub fn new() -> Self {
Self {
front: None,
back: None,
len: 0,
_boo: PhantomData,
}
}
self.len -= 1;
result
// Box gets implicitly freed here, knows there is no T.
})
}
}
self.len -= 1;
result
// Box gets implicitly freed here, knows there is no T.
})
}
}
#[cfg(test)]
mod test {
use super::LinkedList;
#[test]
fn test_basic_front() {
let mut list = LinkedList::new();
// Mess around
list.push_front(10);
assert_eq!(list.len(), 1);
list.push_front(20);
assert_eq!(list.len(), 2);
list.push_front(30);
assert_eq!(list.len(), 3); 215
assert_eq!(list.pop_front(), Some(30));
assert_eq!(list.len(), 2);
list.push_front(40);
assert_eq!(list.len(), 3);
assert_eq!(list.pop_front(), Some(40));
assert_eq!(list.len(), 2);
assert_eq!(list.pop_front(), Some(20));
assert_eq!(list.len(), 1);
assert_eq!(list.pop_front(), Some(10));
assert_eq!(list.len(), 0);
assert_eq!(list.pop_front(), None);
assert_eq!(list.len(), 0);
assert_eq!(list.pop_front(), None);
assert_eq!(list.len(), 0);
}
}
216
impl<T> LinkedList<T> {
pub fn is_empty(&self) -> bool {
self.len == 0
}
And now we've got a bunch of traits to implement that everyone expects:
impl<T> Default for LinkedList<T> { 217
fn default() -> Self {
Self::new()
}
}
I definitely wrote all of these from scratch, and didn't just copy the std impls. Because they're so interesting,
and I definitely remember the subtleties of manually implementing Hash. Yeah, that's something I think
about All The Time...
First, a nasty namespace clash. For whatever reason std now has macros named Hash and Debug, and so if
you don't have the traits imported, you'll get really cryptic errors about macros instead of the proper
"missing trait".
The other intersting thing to talk about is Hash itself. Do you see how we hash in len ? That's actually
really important! If collections don't hash in lengths, they can accidentally make themselves vulnerable to
prefix collisions. For instance, what distinguishes ["he", "llo"] from ["hello"] ? If no one is hashing
lengths or some other "separator", nothing! Making it too easy for hash collisions to accidentally or
maliciously happen can result in serious sadness, so just do it!
struct Node<T> {
front: Link<T>,
back: Link<T>,
elem: T,
}
impl<T> LinkedList<T> {
pub fn new() -> Self {
Self {
front: None,
back: None,
len: 0,
_boo: PhantomData,
}
}
self.len -= 1;
result
// Box gets implicitly freed here, knows there is no T.
})
}
}
self.len -= 1;
result
// Box gets implicitly freed here, knows there is no T.
})
}
}
#[cfg(test)]
mod test {
use super::LinkedList;
#[test]
fn test_basic_front() { 226
let mut list = LinkedList::new();
// Mess around
list.push_front(10);
assert_eq!(list.len(), 1);
list.push_front(20);
assert_eq!(list.len(), 2);
list.push_front(30);
assert_eq!(list.len(), 3);
assert_eq!(list.pop_front(), Some(30));
assert_eq!(list.len(), 2);
list.push_front(40);
assert_eq!(list.len(), 3);
assert_eq!(list.pop_front(), Some(40));
assert_eq!(list.len(), 2);
assert_eq!(list.pop_front(), Some(20));
assert_eq!(list.len(), 1);
assert_eq!(list.pop_front(), Some(10));
assert_eq!(list.len(), 0);
assert_eq!(list.pop_front(), None);
assert_eq!(list.len(), 0);
assert_eq!(list.pop_front(), None);
assert_eq!(list.len(), 0);
}
}
227
Testing
Alright I put off testing for a while because well, we both know that we're now masters of Rust and we
don't make mistakes anymore! Also just, this is a rewrite of an old crate, so I just have all the tests already.
They're tests, you've seen tests a lot already. Here they are:
#[cfg(test)] 228
mod test {
use super::LinkedList;
#[test]
fn test_basic_front() {
let mut list = LinkedList::new();
// Mess around
list.push_front(10);
assert_eq!(list.len(), 1);
list.push_front(20);
assert_eq!(list.len(), 2);
list.push_front(30);
assert_eq!(list.len(), 3);
assert_eq!(list.pop_front(), Some(30));
assert_eq!(list.len(), 2);
list.push_front(40);
assert_eq!(list.len(), 3);
assert_eq!(list.pop_front(), Some(40));
assert_eq!(list.len(), 2);
assert_eq!(list.pop_front(), Some(20));
assert_eq!(list.len(), 1);
assert_eq!(list.pop_front(), Some(10));
assert_eq!(list.len(), 0);
assert_eq!(list.pop_front(), None);
assert_eq!(list.len(), 0);
assert_eq!(list.pop_front(), None);
assert_eq!(list.len(), 0);
}
#[test]
fn test_basic() {
let mut m = LinkedList::new();
assert_eq!(m.pop_front(), None);
assert_eq!(m.pop_back(), None);
assert_eq!(m.pop_front(), None);
m.push_front(1);
assert_eq!(m.pop_front(), Some(1));
m.push_back(2);
m.push_back(3);
assert_eq!(m.len(), 2); 229
assert_eq!(m.pop_front(), Some(2));
assert_eq!(m.pop_front(), Some(3));
assert_eq!(m.len(), 0);
assert_eq!(m.pop_front(), None);
m.push_back(1);
m.push_back(3);
m.push_back(5);
m.push_back(7);
assert_eq!(m.pop_front(), Some(1));
#[test]
fn test_iterator() {
let m = generate_test();
for (i, elt) in m.iter().enumerate() {
assert_eq!(i as i32, *elt);
}
let mut n = LinkedList::new();
assert_eq!(n.iter().next(), None);
n.push_front(4);
let mut it = n.iter();
assert_eq!(it.size_hint(), (1, Some(1)));
assert_eq!(it.next().unwrap(), &4);
assert_eq!(it.size_hint(), (0, Some(0)));
assert_eq!(it.next(), None);
}
#[test]
fn test_iterator_double_end() {
let mut n = LinkedList::new();
assert_eq!(n.iter().next(), None);
n.push_front(4);
n.push_front(5);
n.push_front(6);
let mut it = n.iter();
assert_eq!(it.size_hint(), (3, Some(3)));
assert_eq!(it.next().unwrap(), &6);
assert_eq!(it.size_hint(), (2, Some(2)));
assert_eq!(it.next_back().unwrap(), &4);
assert_eq!(it.size_hint(), (1, Some(1)));
assert_eq!(it.next_back().unwrap(), &5);
assert_eq!(it.next_back(), None);
assert_eq!(it.next(), None);
} 230
#[test]
fn test_rev_iter() {
let m = generate_test();
for (i, elt) in m.iter().rev().enumerate() {
assert_eq!(6 - i as i32, *elt);
}
let mut n = LinkedList::new();
assert_eq!(n.iter().rev().next(), None);
n.push_front(4);
let mut it = n.iter().rev();
assert_eq!(it.size_hint(), (1, Some(1)));
assert_eq!(it.next().unwrap(), &4);
assert_eq!(it.size_hint(), (0, Some(0)));
assert_eq!(it.next(), None);
}
#[test]
fn test_mut_iter() {
let mut m = generate_test();
let mut len = m.len();
for (i, elt) in m.iter_mut().enumerate() {
assert_eq!(i as i32, *elt);
len -= 1;
}
assert_eq!(len, 0);
let mut n = LinkedList::new();
assert!(n.iter_mut().next().is_none());
n.push_front(4);
n.push_back(5);
let mut it = n.iter_mut();
assert_eq!(it.size_hint(), (2, Some(2)));
assert!(it.next().is_some());
assert!(it.next().is_some());
assert_eq!(it.size_hint(), (0, Some(0)));
assert!(it.next().is_none());
}
#[test]
fn test_iterator_mut_double_end() {
let mut n = LinkedList::new();
assert!(n.iter_mut().next_back().is_none());
n.push_front(4);
n.push_front(5);
n.push_front(6);
let mut it = n.iter_mut();
assert_eq!(it.size_hint(), (3, Some(3)));
assert_eq!(*it.next().unwrap(), 6);
assert_eq!(it.size_hint(), (2, Some(2)));
assert_eq!(*it.next_back().unwrap(), 4);
assert_eq!(it.size_hint(), (1, Some(1)));
assert_eq!(*it.next_back().unwrap(), 5);
assert!(it.next_back().is_none());
assert!(it.next().is_none());
}
#[test]
fn test_eq() {
let mut n: LinkedList<u8> = list_from(&[]);
let mut m = list_from(&[]);
assert!(n == m);
n.push_front(1); 231
assert!(n != m);
m.push_back(1);
assert!(n == m);
#[test]
fn test_ord() {
let n = list_from(&[]);
let m = list_from(&[1, 2, 3]);
assert!(n < m);
assert!(m > n);
assert!(n <= n);
assert!(n >= n);
}
#[test]
fn test_ord_nan() {
let nan = 0.0f64 / 0.0;
let n = list_from(&[nan]);
let m = list_from(&[nan]);
assert!(!(n < m));
assert!(!(n > m));
assert!(!(n <= m));
assert!(!(n >= m));
let n = list_from(&[nan]);
let one = list_from(&[1.0f64]);
assert!(!(n < one));
assert!(!(n > one));
assert!(!(n <= one));
assert!(!(n >= one));
#[test]
fn test_debug() {
let list: LinkedList<i32> = (0..10).collect();
assert_eq!(format!("{:?}", list), "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]");
assert_eq!(map.len(), 2);
assert_eq!(map.get(&list1), Some(&"list1"));
assert_eq!(map.get(&list2), Some(&"list2"));
assert_eq!(map.remove(&list1), Some("list1"));
assert_eq!(map.remove(&list2), Some("list2"));
assert!(map.is_empty());
}
}
cargo test
Finished test [unoptimized + debuginfo] target(s) in 0.00s
Running unittests src\lib.rs
running 12 tests
test test::test_basic ... ok
test test::test_basic_front ... ok
test test::test_eq ... ok
test test::test_iterator ... ok
test test::test_iterator_mut_double_end ... ok
test test::test_ord_nan ... ok
test test::test_iterator_double_end ... ok
test test::test_mut_iter ... ok
test test::test_rev_iter ... ok
test test::test_hashmap ... ok
test test::test_ord ... ok
test test::test_debug ... ok
test result: ok. 12 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in
0.00s
$env:MIRIFLAGS="-Zmiri-tag-raw-pointers" 233
cargo miri test
Compiling linked-list v0.0.3
Finished test [unoptimized + debuginfo] target(s) in 0.35s
Running unittests src\lib.rs
running 12 tests
test test::test_basic ... ok
test test::test_basic_front ... ok
test test::test_debug ... ok
test test::test_eq ... ok
test test::test_hashmap ... ok
test test::test_iterator ... ok
test test::test_iterator_double_end ... ok
test test::test_iterator_mut_double_end ... ok
test test::test_mut_iter ... ok
test test::test_ord ... ok
test test::test_ord_nan ... ok
test test::test_rev_iter ... ok
😭
We did it, we actually didn't screw up. This isn't a trick! All of our practice and training was finally worth it,
we've finally written good code!!!
Now with all that gunk out of the way, we can get back to the Interesting Stuff!
234
Like Copy, these traits have absolutely no code associated with them, and are just markers that your type
has a particular property. Send says that your type is safe to send to another thread. Sync says your type is
safe to share between threads (&Self: Send).
The same argument for LinkedList being covariant applies here: generally normal collections which don't
use fancy interior mutability tricks are safe to make Send and Sync.
But I said they're opt out. So actually, are we already? How would we know?
Let's add some new magic to our code: random private garbage that won't compile unless our types have
the properties we expect:
#[allow(dead_code)]
fn assert_properties() {
fn is_send<T: Send>() {}
fn is_sync<T: Sync>() {}
is_send::<LinkedList<i32>>();
is_sync::<LinkedList<i32>>();
is_send::<IntoIter<i32>>();
is_sync::<IntoIter<i32>>();
is_send::<Iter<i32>>();
is_sync::<Iter<i32>>();
is_send::<IterMut<i32>>();
is_sync::<IterMut<i32>>();
is_send::<Cursor<i32>>();
is_sync::<Cursor<i32>>();
Oh geez, what gives! I had that great Holy Roman Empire joke!
Well, I lied to you when I said raw pointers have only one safety guard: this is the other. *const AND
*mut explicitly opt out of Send and Sync to be safe, so we do actually have to opt back in:
Note that we have to write unsafe impl here: these are unsafe traits! Unsafe code (like concurrency libraries)
gets to rely on us only implementing these traits correctly! Since there's no actual code, the guarantee we're
making is just that, yes, we are indeed safe to Send or Share between threads!
Don't just slap these on lightly, but I am a Certified Professional here to say: yep there's are totally fine.
Note how we don't need to implement Send and Sync for IntoIter: it just contains LinkedList, so it auto-
derives Send and Sync — I told you they were actually opt out! (You opt out with the hillarious syntax of
impl !Send for MyType {} .)
cargo build
Compiling linked-list v0.0.3
Finished dev [unoptimized + debuginfo] target(s) in 0.18s
Ok nice!
...Wait, actually it would be really dangerous if stuff that shouldn't be these things wasn't. In particular,
IterMut definitely shouldn't be covariant, because it's "like" &mut T . But how can we check that?
With Magic! Well, actually, with rustdoc! Ok well we don't have to use rustdoc for this, but it's the funniest236
way to do it. See, if you write a doccomment and include a code block, then rustdoc will try to compile and
run it, so we can use that to make fresh anonymous "programs" that don't affect the main one:
/// ```
/// use linked_list::IterMut;
///
/// fn iter_mut_covariant<'i, 'a, T>(x: IterMut<'i, &'static T>) -> IterMut<'i, &'a T> {
x }
/// ```
fn iter_mut_invariant() {}
cargo test
...
Doc-tests linked-list
running 1 test
test src\lib.rs - assert_properties::iter_mut_invariant (line 458) ... FAILED
failures:
Ok cool, we've proved it's invariant, but uh, now our tests fail. No worries, rustdoc lets you say that's
expected by annotating the fence with compile_fail!
(Actually we only proved it's "not covariant" but honestly if you manage to make a type "accidentaly and
incorrectly contravariant" then, congrats?)
/// ```compile_fail
/// use linked_list::IterMut;
///
/// fn iter_mut_covariant<'i, 'a, T>(x: IterMut<'i, &'static T>) -> IterMut<'i, &'a T> {
x }
/// ```
fn iter_mut_invariant() {}
cargo test 237
Compiling linked-list v0.0.3
Finished test [unoptimized + debuginfo] target(s) in 0.49s
Running unittests src\lib.rs
...
Doc-tests linked-list
running 1 test
test src\lib.rs - assert_properties::iter_mut_invariant (line 458) - compile fail ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in
0.12s
Yay! I recommend always making the test without compile_fail so that you can confirm that it fails to
compile for the right reason. For instance, that test will also fail (and therefore pass) if you forget the use ,
which, is not what we want! While it's conceptually appealing to be able to "require" a specific error from
the compiler, this would be an absolute nightmare that would effectively make it a breaking change for the
compiler to produce better errors. We want the compiler to get better, so, no you don't get to have that.
(Oh wait, we can actually just specify the error code we want next to the compile_fail but this only works
on nightly and is a bad idea to rely on for the reasons state above. It will be silently ignored
on not-nightly.)
/// ```compile_fail,E0308
/// use linked_list::IterMut;
///
/// fn iter_mut_covariant<'i, 'a, T>(x: IterMut<'i, &'static T>) -> IterMut<'i, &'a T> {
x }
/// ```
fn iter_mut_invariant() {}
...also, did you notice the part where we actually made IterMut invariant? It was easy to miss, since I "just"
copy-pasted Iter and dumped it at the end. It's the last line here:
cargo build
Compiling linked-list v0.0.3 (C:\Users\ninte\dev\contain\linked-list)
error[E0392]: parameter `'a` is never used
--> src\lib.rs:30:20
|
30 | pub struct IterMut<'a, T> {
| ^^ unused parameter
|
= help: consider removing `'a`, referring to it in a field, or using a marker such as
`PhantomData`
Ha! The compiler has our back and won't just let us not use the lifetime. Let's try using the wrong example238
instead:
cargo build
Compiling linked-list v0.0.3 (C:\Users\ninte\dev\contain\linked-list)
Finished dev [unoptimized + debuginfo] target(s) in 0.17s
cargo test
...
Doc-tests linked-list
running 1 test
test src\lib.rs - assert_properties::iter_mut_invariant (line 458) - compile fail ... FAILED
failures:
failures:
src\lib.rs - assert_properties::iter_mut_invariant (line 458)
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in
0.15s
Eyyy!!! The system works! I love having tests that actually do their job, so that I don't have to be quite so
horrified of looming mistakes!
239
An Introduction To Cursors
OK!!! We now have a LinkedList that's on par with std's 1.0 implementation! Which of course means that
our LinkedList is still completely useless. We've taken the enormous performance penalty of implementing
a Deque as a linked list, and we don't have any of the APIs that make it actually useful.
We're not going to make our list support "weird" stuff, because that's all adhoc and domain-specific. But
the splitting and splicing thing, now that's something we can do!
But here's the problem: actually reaching the kth element in a LinkedList takes O(k) time, so how can we
possibly do arbitrary splits and merges in O(1)? Well, the trick is that you don't have an API like
split_at(index) -- you make a system where the user can statefully iterate to a position in the list and
make O(1) modifications at that point!
Hey, we already have iterators! Can we use them for this? Kind of... but one of their super-powers gets in
the way. You may recall that the way that we write out the lifetimes for by-ref iterators means that the
references they return aren't tied to the iterator. This lets us repeatedly call next and hold onto the
elements:
If the returned references borrowed the iterator, then this code wouldn't work at all. The compiler would
just complain about the second call to next ! This flexibility is great, but it puts some implicit constraints
on us:
By-Mutable-Ref Iterators can never go backwards and yield an element again, because the user
would be able to get two &mut 's to the same element, breaking fundamental rules of the language.
By-Ref Iterators can't have extra methods which could possibly modify the underlying collection in a
way that would invalidate any reference that has already been yielded.
Unfortunately, both of these things are exactly what we want our LinkedList API to do! So we can't just use
iterators, we need something new: Cursors.
Cursors are exactly like the little blinking | you get when you're editing some text on a computer. It's a 240
position in a sequence (the text) that you can move around (with the arrow keys), and whenever you type
the edits happen at that point.
See if I just
press
enter
the whole
text
Sorry you're standing behind me and watching me type this right? So that totally makes sense, right? Right.
Now if you've ever had the misfortune of having a keyboard with an "insert" key and actually pressed it, you
know that there's actually technically two interpretations of cursors: they can either lie between elements
(characters) or on elements. I'm pretty sure no one has ever pressed "insert" on purpose in their life, and that
it exists purely as a Suffering Button, so it's pretty obvious which one is Better and Right: cursors go
between elements!
Pretty rock-solid logic right there, I don't think anyone can disagree with me.
Sorry what? There was an RFC in 2018 to add Cursors to Rust's LinkedList?
With a Cursor one can seek back and forth through a list and get the current element. With a
CursorMut One can seek back and forth and get mutable references to elements, and it can insert and
delete elements before and behind the current element (along with performing several list operations
such as splitting and splicing).
Current element? This cursor is on elements, not between them! I can't believe they didn't accept my totally
rock-solid argument! So yeah you can just go use the Cursor in std... wait, it's 2022, and Rust 1.60 still
has Cursor marked as unstable?
Hey wait:
Cursors always rest between two elements in the list, and index in a logically circular way. To
accommodate this, there is a "ghost" non-element that yields None between the head and tail of the
list.
HEY WAIT. This is the opposite of what the RFC says??? But wait all the docs on the methods still refer to
"current" elements... wait hold on, where have I seen this ghost stuff before. Oh wait, didn't I do that in my
old linked-list fork where I prototyped?
241
Cursors always rest between two elements in the list, and index in a logically circular way. To
accomadate this, there is a "ghost" non-element that yields None between the head and tail of the
List.
Hold up what the fuck. This isn't a gag, I am actually trying to Read The Docs right now. Did std actually
RFC a different design from the one I proposed in 2015, but then copy-paste the docs from my
prototype??? Is std meta-shitposting me for writing a book about how much I hate LinkedList????? Like
yeah I built that prototype to demonstrate the concept so that people would let me add it to std and make
LinkedList not useless but, qu'est-ce que le fuck??????????????
Ok you know what, clearly std is blessing my design as the objectively superior one, so we're going to do
my design. Also that's nice because this entire chapter is me actually literally rewriting that library from
scratch, so not changing the API sounds Good To Me!
A Cursor is like an iterator, except that it can freely seek back-and-forth, and can safely mutate the list
during iteration. This is because the lifetime of its yielded references are tied to its own lifetime,
instead of just the underlying list. This means cursors cannot yield multiple elements at once.
Cursors always rest between two elements in the list, and index in a logically circular way. To
accomadate this, there is a "ghost" non-element that yields None between the head and tail of the
List.
When created, cursors start between the ghost and the front of the list. That is, next will yield the
front of the list, and prev will yield None. Calling prev again will yield the tail.
Cute, even though we concluded that the whole "sentinel-node" thing was more trouble than it's worth,
we're still going to end up with semantics that "pretend" there's a sentinel node so that the cursor can wrap
around to the other side of the list.
Oh yeah, this is coming back to me. I wrote this when I was really mad about combinatoric explosion, and
was trying to come up with a way for there to only be one copy of each operation. Unfortunately this is...
semantically problematic. See, when the user wants to splice one list into another, they might want the
cursor to end up before the splice or after it. The inserted list can be arbitrarily large, so it's a genuine issue
for us to only allow for one and expect the user to walk over the entire inserted list!
We're gonna have to rework this design from the ground up after all. What does our Cursor type need?
Well it needs to:
point "between" two elements 242
as a nice little feature, keep track of what "index" is next
update the list itself to modify front/back/len.
How do you point between two elements? Well, you don't. You just point at the "next" element. So, yeah
even though we're exposing "cursor goes in-between" semantics, we're really implementing it as "cursor is
on", and just pretending everything happens before or after that point.
But there's a reason! The splice use-case wants to let the user choose whether they end up before or after
the list, but this is... horribly complicated to express with the std API! They have splice_after and
splice_before, but neither changes the cursor's position, so really you'd need splice_after_before and
splice_after_after...
Wait no I'm being silly. In the std API you can just choose the node you want to end up on, and then use
splice_after/before as appropriate.
squints
Alright screw it, we're going to implement the RFC. Or at least the interesting parts of it.
I have my quibbles with some of the terminology std uses, but cursors are always going to be a bit brain-
melty: iter().next_back() gets you back() , so that's good, but then each subsequent next_back() is
actually bringing you closer to the front and indeed, every pointer we follow is a "front" pointer! If I think
about this seeming-paradox too much it hurts my brain, so, I can certainly respect going for different
terminology to avoid this.
The std API talks about operations before "before" (towards the front) and "after" (towards the back), and
instead of next and next_back , it... calls things move_next and move_prev . HRM. Ok so they're getting
into a bit of the iterator terminology, but at least next doesn't evoke front/back, and helps you orient how
things behave compared to the iterators.
Implementing Cursors
Ok so we're only going to bother with std's CursorMut because the immutable version isn't actually
interesting. Just like my original design, it has a "ghost" element that contains None to indicate the start/end
of the list, and you can "walk over it" to wrap around to the other side of the list. To implement it, we're
going to need:
Ok so quite reasonably index on a Cursor returns an Option<usize> . The std implementation does a
bunch of junk to avoid storing it as an Option but... we're a linked list, it's fine. Also std has the
cursor_front/cursor_back stuff which starts the cursor on the front/back elements, which feels intuitive, but
then has to do something weird when the list is empty.
You can implement that stuff if you want, but I'm going to cut down on all the repetitive gunk and corner
cases and just make a bare cursor_mut method that starts at the ghost, and people can use
move_next/move_prev to get the one they want (and then you can wrap that up as cursor_front if you really
want).
Pretty straight-forward, one field for each item of our bulleted list! Now the cursor_mut method:
impl<T> LinkedList<T> {
pub fn cursor_mut(&mut self) -> CursorMut<T> {
CursorMut {
list: self,
cur: None,
index: None,
}
}
}
Since we're starting at the ghost, we can just start with everything as None, nice and simple! Next,
movement:
impl<'a, T> CursorMut<'a, T> { 244
pub fn index(&self) -> Option<usize> {
self.index
}
move_prev is the exact same logic, but with front/back inverted and the indexing changes inverted:
Next let's add some methods to look at the elements around the cursor: current, peek_next, and peek_prev.
A Very Important Note: these methods must borrow our cursor by &mut self , and the results must be
tied to that borrow. We cannot let the user get multiple copies of a mutable reference, and we cannot let 245
them use any of our insert/remove/split/splice APIs while holding onto such a reference!
Thankfully, this is the default assumption rust makes when you use lifetime elision, so, we will just do the
right thing by default!
Head empty, Option methods and (omitted) compiler errors do all thinking now. I was skeptical about the
Option<NonNull> stuff, but, god damn it really just lets me autopilot this code. I've spent way too much
time writing array-based collections where you never get to use Option, wow this is nice!
( (*node.as_ptr()) is still miserable but, that's just Rust's raw pointers for you...)
Next we have a choice: we can either jump right to split and splice, the entire point of these APIs, or we can
take a baby-step with single element insert/remove. I have a feeling we're just going to want to implement
insert/remove in terms of split and splice so... let's just do those first and see where the cards fall
(genuinely have no idea as I type this).
246
Split
First up, split_before and split_after, which return everything before/after the current element as a LinkedList
(stopping at the ghost element, unless you're at the ghost, in which case we just return the whole List and
the cursor now points to an empty list):
squints ok this one is actually some non-trivial logic so we're going to have to talk it out one step at a time.
Let's start with the corner cases. The third case I believe is just
mem::replace(self.list, LinkedList::new())
Right? We become empty, we return the whole list, and our fields were already None, so nothing to update.
Nice. Oh hey, this also Does The Right Thing on the fourth case too!
So now the normal cases... ok I'm going to need some ASCII diagrams for this. In the most general case,
we have something like this:
So we need to break the link between cur and prev, and... god so much needs to change. Ok I just need to
break this up into steps so I can convince myself it makes sense. This will be a bit over-verbose but I can at
least make sense of it:
pub fn split_before(&mut self) -> LinkedList<T> { 247
if let Some(cur) = self.cur {
// We are pointing at a real element, so the list is non-empty.
unsafe {
// Current state
let old_len = self.list.len;
let old_idx = self.index.unwrap();
let prev = (*cur.as_ptr()).front;
LinkedList {
front: output_front,
back: output_back,
len: output_len,
_boo: PhantomData,
}
}
} else {
// We're at the ghost, just replace our list with an empty one.
// No other state needs to be changed.
std::mem::replace(self.list, LinkedList::new())
}
}
Note that this if-let is handling the "normal case, but prev is the ghost" situation:
If you want to, you can squash that all together and apply optimizations like:
I am done Making Mistakes and I am just going to try to write the most foolproof code I can. This is how I
actually write collections: just break things down into trivial steps and cases until it can fit in my head and
seems foolproof. Then write a ton of tests until I'm convinced I didn't manage to mess it up still.
Because most of the collections work I've done is extremely unsafe I don't generally get to rely on the
compiler catching mistakes, and miri didn't exist back in the day! So I just need to squint at a problem until
my head hurts and try my hardest to Never Ever Ever Make A Mistake.
Splice
Just one more boss to fight, splice_before and splice_after, which I expect to be the corner-casiest one of
them all. The two functions take in a LinkedList and grafts its contents into outrs. Our list could be empty,
their list could be empty, we've got ghosts to deal with... sigh let's just take it one step at a time with
splice_before.
Becoming this:
Ok? Ok. Let's write that out... TAKES A HUGE BREATH AND PLUNGES IN:
pub fn splice_before(&mut self, mut input: LinkedList<T>) { 250
unsafe {
if input.is_empty() {
// Input is empty, do nothing.
} else if let Some(cur) = self.cur {
if let Some(0) = self.index {
// We're appending to the front, see append to back
(*cur.as_ptr()).front = input.back.take();
(*input.back.unwrap().as_ptr()).back = Some(cur);
self.list.front = input.front.take();
(*prev.as_ptr()).back = Some(in_front);
(*in_front.as_ptr()).front = Some(prev);
(*cur.as_ptr()).front = Some(in_back);
(*in_back.as_ptr()).back = Some(cur);
Ok this one is genuinely horrendous, and really is feeling that Option<NonNull> pain now. But there's a lot
of cleanups we can do. For one, we can pull this code out to the very end, because we always want to do it.
I don't love (although sometimes it's a noop, and setting input.len is more a matter of paranoia about
future extensions to the code):
self.list.len += input.len;
input.len = 0;
251
Use of moved value: input
Ah, right, in the "we're empty" case we're moving the list. Let's replace that with a swap:
In this case the writes will be pointless, but, they still work (we could probably also early-return in this
branch to appease the compiler).
This unwrap is just a consequence of me thinking about the cases backwards, and can be fixed by making
the if-let ask the right question:
} else {
let prev = (*cur.as_ptr()).front.unwrap();
}
Adjusting the index is duplicated inside the branches, so can also be hoisted out:
*self.index.as_mut().unwrap() += input.len;
(*prev.as_ptr()).back = Some(in_front);
(*in_front.as_ptr()).front = Some(prev);
(*cur.as_ptr()).front = Some(in_back);
(*in_back.as_ptr()).back = Some(cur);
} else {
// We're appending to the front, see append to back below
(*cur.as_ptr()).front = input.back.take();
(*input.back.unwrap().as_ptr()).back = Some(cur);
self.list.front = input.front.take();
}
// Index moves forward by input length
*self.index.as_mut().unwrap() += input.len;
} else if let Some(back) = self.list.back {
// We're on the ghost but non-empty, append to the back
// We can either `take` the input's pointers or `mem::forget`
// it. Using take is more responsible in case we do custom
// allocators or something that also needs to be cleaned up!
(*back.as_ptr()).back = input.front.take();
(*input.front.unwrap().as_ptr()).front = Some(back);
self.list.back = input.back.take();
} else {
// We're empty, become the input, remain on the ghost
std::mem::swap(self.list, &mut input);
}
self.list.len += input.len;
// Not necessary but Polite To Do
input.len = 0;
Alright this still sucks, but mostly because of -- nope ok just spotted a bug:
(*back.as_ptr()).back = input.front.take();
(*input.front.unwrap().as_ptr()).front = Some(back);
We take input.front and then unwrap it on the next line! sigh and we do the same thing in the equivalent
mirror case. We would have caught this instantly in tests, but, we're trying to be Perfect now, and I'm just
kinda doing this live, and this is the exact moment where I saw it. This is what I get for not being my usual
tedious self and doing things in phases. More explicit!
253
// We can either `take` the input's pointers or `mem::forget`
// it. Using `take` is more responsible in case we ever do custom
// allocators or something that also needs to be cleaned up!
if input.is_empty() {
// Input is empty, do nothing.
} else if let Some(cur) = self.cur {
// Both lists are non-empty
let in_front = input.front.take().unwrap();
let in_back = input.back.take().unwrap();
(*back.as_ptr()).back = Some(in_front);
(*in_front.as_ptr()).front = Some(back);
self.list.back = Some(in_back);
} else {
// We're empty, become the input, remain on the ghost
std::mem::swap(self.list, &mut input);
}
self.list.len += input.len;
// Not necessary but Polite To Do
input.len = 0;
Alright now this, this I can tolerate. The only complaints I have are that we don't dedupe in_front/in_back
(probably we could rejig our conditions but eh whatever). Really this is basically what you would write in C
but with Option<NonNull> gunk making it tedious. I can live with that. Well no we should just make raw
pointers better for this stuff. But, out of scope for this book.
Anyway, I am absolutely exhausted after that, so, insert and remove and all the other APIs can be left as
an excercise to the reader.
Here's the final code for our Cursor with my attempt at copy-pasting the combinatorics. Did I get it right? I'll
only find out when I write the next chapter and test this monstrosity!
pub struct CursorMut<'a, T> { 254
list: &'a mut LinkedList<T>,
cur: Link<T>,
index: Option<usize>,
}
impl<T> LinkedList<T> {
pub fn cursor_mut(&mut self) -> CursorMut<T> {
CursorMut {
list: self,
cur: None,
index: None,
}
}
}
LinkedList {
front: output_front,
back: output_back,
len: output_len,
_boo: PhantomData,
}
}
} else {
// We're at the ghost, just replace our list with an empty one.
// No other state needs to be changed.
std::mem::replace(self.list, LinkedList::new())
}
}
LinkedList {
front: output_front,
back: output_back,
len: output_len,
_boo: PhantomData,
}
}
} else {
// We're at the ghost, just replace our list with an empty one.
// No other state needs to be changed.
std::mem::replace(self.list, LinkedList::new())
}
}
(*back.as_ptr()).back = Some(in_front);
(*in_front.as_ptr()).front = Some(back);
self.list.back = Some(in_back);
} else {
// We're empty, become the input, remain on the ghost
std::mem::swap(self.list, &mut input);
}
self.list.len += input.len;
// Not necessary but Polite To Do
input.len = 0;
(*front.as_ptr()).front = Some(in_back);
(*in_back.as_ptr()).back = Some(front);
self.list.front = Some(in_front);
} else {
// We're empty, become the input, remain on the ghost
std::mem::swap(self.list, &mut input);
}
self.list.len += input.len;
// Not necessary but Polite To Do
input.len = 0;
Testing Cursors
Time to find out how many horribly embarassing mistakes I made in the previous section!
Oh god we made our API unlike both std and the old impl. Alright well I'm just gonna hastily cobble
together something from both of them. Yeah let's "borrow" these tests from std:
#[test] 261
fn test_cursor_move_peek() {
let mut m: LinkedList<u32> = LinkedList::new();
m.extend([1, 2, 3, 4, 5, 6]);
let mut cursor = m.cursor_mut();
cursor.move_next();
assert_eq!(cursor.current(), Some(&mut 1));
assert_eq!(cursor.peek_next(), Some(&mut 2));
assert_eq!(cursor.peek_prev(), None);
assert_eq!(cursor.index(), Some(0));
cursor.move_prev();
assert_eq!(cursor.current(), None);
assert_eq!(cursor.peek_next(), Some(&mut 1));
assert_eq!(cursor.peek_prev(), Some(&mut 6));
assert_eq!(cursor.index(), None);
cursor.move_next();
cursor.move_next();
assert_eq!(cursor.current(), Some(&mut 2));
assert_eq!(cursor.peek_next(), Some(&mut 3));
assert_eq!(cursor.peek_prev(), Some(&mut 1));
assert_eq!(cursor.index(), Some(1));
#[test]
fn test_cursor_mut_insert() {
let mut m: LinkedList<u32> = LinkedList::new();
m.extend([1, 2, 3, 4, 5, 6]);
let mut cursor = m.cursor_mut();
cursor.move_next();
cursor.splice_before(Some(7).into_iter().collect());
cursor.splice_after(Some(8).into_iter().collect());
// check_links(&m);
assert_eq!(m.iter().cloned().collect::<Vec<_>>(), &[7, 1, 8, 2, 3, 4, 5, 6]);
let mut cursor = m.cursor_mut();
cursor.move_next();
cursor.move_prev();
cursor.splice_before(Some(9).into_iter().collect());
cursor.splice_after(Some(10).into_iter().collect());
check_links(&m);
assert_eq!(m.iter().cloned().collect::<Vec<_>>(), &[10, 7, 1, 8, 2, 3, 4, 5, 6, 9]);
fn check_links<T>(_list: &LinkedList<T>) {
// would be good to do this!
}
Moment of truth!
cargo test 263
running 14 tests
test test::test_basic_front ... ok
test test::test_basic ... ok
test test::test_debug ... ok
test test::test_iterator_mut_double_end ... ok
test test::test_ord ... ok
test test::test_cursor_move_peek ... FAILED
test test::test_cursor_mut_insert ... FAILED
test test::test_iterator ... ok
test test::test_mut_iter ... ok
test test::test_eq ... ok
test test::test_rev_iter ... ok
test test::test_iterator_double_end ... ok
test test::test_hashmap ... ok
test test::test_ord_nan ... ok
failures:
failures:
test::test_cursor_move_peek
test::test_cursor_mut_insert
test result: FAILED. 12 passed; 2 failed; 0 ignored; 0 measured; 0 filtered out; finished in
0.00s
I'll admit, I had some hubris here and was hoping I nailed it. This is why we write tests (but maybe I just did
a bad job of porting the tests..?).
cursor.move_next();
assert_eq!(cursor.current(), Some(&mut 1));
assert_eq!(cursor.peek_next(), Some(&mut 2));
assert_eq!(cursor.peek_prev(), None);
assert_eq!(cursor.index(), Some(0));
cursor.move_prev();
assert_eq!(cursor.current(), None);
assert_eq!(cursor.peek_next(), Some(&mut 1)); // DIES HERE
Head empty, Option methods and (omitted) compiler errors do all thinking now.
...Yeah this is just wrong. If self.cur is None, we aren't just supposed to give up, we need to check
self.list.front too, because we're on the ghost! So we just need to add an or_else to the chain:
failures:
failures:
test::test_cursor_mut_insert
test result: FAILED. 13 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in
0.00s
Did you notice the part where I commented out some code for testing remove_current? Yeah I wasn't paying
attention to the fact that this test is stateful. Let's just create a new list with the state the remove_current
part would have left us in:
let mut m: LinkedList<u32> = LinkedList::new(); 266
m.extend([1, 8, 2, 3, 4, 5, 6]);
cargo test
Compiling linked-list v0.0.3
Finished test [unoptimized + debuginfo] target(s) in 0.70s
Running unittests src\lib.rs
running 14 tests
test test::test_basic_front ... ok
test test::test_basic ... ok
test test::test_cursor_move_peek ... ok
test test::test_eq ... ok
test test::test_cursor_mut_insert ... ok
test test::test_iterator ... ok
test test::test_iterator_double_end ... ok
test test::test_ord_nan ... ok
test test::test_mut_iter ... ok
test test::test_hashmap ... ok
test test::test_debug ... ok
test test::test_ord ... ok
test test::test_iterator_mut_double_end ... ok
test test::test_rev_iter ... ok
test result: ok. 14 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in
0.00s
Doc-tests linked-list
running 1 test
test src\lib.rs - assert_properties::iter_mut_invariant (line 803) - compile fail ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in
0.12s
Heyyyy look at thaaat... ok now I'm getting paranoid. Let's properly fill in check_links and test it under miri:
assert_eq!(from_front, re_reved);
}
running 14 tests
test test::test_basic ... ok
test test::test_basic_front ... ok
test test::test_cursor_move_peek ... ok
test test::test_cursor_mut_insert ... ok
test test::test_debug ... ok
test test::test_eq ... ok
test test::test_hashmap ... ok
test test::test_iterator ... ok
test test::test_iterator_double_end ... ok
test test::test_iterator_mut_double_end ... ok
test test::test_mut_iter ... ok
test test::test_ord ... ok
test test::test_ord_nan ... ok
test test::test_rev_iter ... ok
Doc-tests linked-list
running 1 test
test src\lib.rs - assert_properties::iter_mut_invariant (line 803) - compile fail ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in
0.10s
DONE.
Done.
We did it. We made a god damn production-quality LinkedList, with basically all the same functionality as
the one in std. Are we missing little convenience methods here and there? Absolutely. Will I add them into
the final published version of the crate? Probably!
So. We win.
Wait fuck. We're being production quality. Ok one last final boss: clippy.
cargo clippy 268
cargo clippy
Checking linked-list v0.0.3 (C:\Users\ninte\dev\contain\linked-list)
warning: redundant pattern matching, consider using `is_some()`
--> src\lib.rs:189:19
|
189 | while let Some(_) = self.pop_front() { }
| ----------^^^^^^^------------------- help: try this: `while
self.pop_front().is_some()`
|
= note: `#[warn(clippy::redundant_pattern_matching)]` on by default
= note: this will change drop order of the result, as well as all temporaries
= note: add `#[allow(clippy::redundant_pattern_matching)]` if this is important
= help: for further information visit https://rust-lang.github.io/rust-
clippy/master/index.html#redundant_pattern_matching
warning: method `into_iter` can be confused for the standard trait method
`std::iter::IntoIterator::into_iter`
--> src\lib.rs:210:5
|
210 | / pub fn into_iter(self) -> IntoIter<T> {
211 | | IntoIter {
212 | | list: self
213 | | }
214 | | }
| |_____^
|
= note: `#[warn(clippy::should_implement_trait)]` on by default
= help: consider implementing the trait `std::iter::IntoIterator` or choosing a less
ambiguous method name
= help: for further information visit https://rust-lang.github.io/rust-
clippy/master/index.html#should_implement_trait
Complaint 1 (and 3): we use while let Some(_) = instead of while .is_some() . The loop is empty so
this truly doesn't matter but ok fine, clippy, I'll do things your way.
Complaint 2: We have an actual inherent into_iter method. Wait, what checks std ok, point to clippy.
IntoIterator is in the prelude (and basically a lang item) so, we don't need an inherent version too.
Complaint 4: we copied a weird cargocult from std. shrug fine I'll remove it.
cargo clippy
Finished dev [unoptimized + debuginfo] target(s) in 0.00s
Nice. Just one last thing to do before calling it production quality: fmt.
cargo fmt
...yeah it added some newlines and removed some trailing whitespace. Nothing interesting.
Final Code
I can't believe I actually just made you sit through me actually reimplementing std::collections::LinkedList
from scratch, with all the fiddly little pedantry and mistakes I made along the way.
Alright, here's all 1200 lines of our complete rewrite of in all of its glory. This should be the same text as
this commit.
I'll put some polish and docs back on and publish 0.1.0 later.
271
use std::cmp::Ordering;
use std::fmt::{self, Debug};
use std::hash::{Hash, Hasher};
use std::iter::FromIterator;
use std::marker::PhantomData;
use std::ptr::NonNull;
struct Node<T> {
front: Link<T>,
back: Link<T>,
elem: T,
}
impl<T> LinkedList<T> {
pub fn new() -> Self {
Self {
front: None,
back: None,
len: 0,
_boo: PhantomData,
}
}
self.len -= 1;
result
// Box gets implicitly freed here, knows there is no T. 273
})
}
}
self.len -= 1;
result
// Box gets implicitly freed here, knows there is no T.
})
}
}
LinkedList {
front: output_front,
back: output_back,
len: output_len,
_boo: PhantomData,
} 280
}
} else {
// We're at the ghost, just replace our list with an empty one.
// No other state needs to be changed.
std::mem::replace(self.list, LinkedList::new())
}
}
LinkedList {
front: output_front,
back: output_back,
len: output_len,
_boo: PhantomData,
} 281
}
} else {
// We're at the ghost, just replace our list with an empty one.
// No other state needs to be changed.
std::mem::replace(self.list, LinkedList::new())
}
}
(*back.as_ptr()).back = Some(in_front);
(*in_front.as_ptr()).front = Some(back);
self.list.back = Some(in_back);
} else {
// We're empty, become the input, remain on the ghost
std::mem::swap(self.list, &mut input);
}
282
self.list.len += input.len;
// Not necessary but Polite To Do
input.len = 0;
(*front.as_ptr()).front = Some(in_back);
(*in_back.as_ptr()).back = Some(front);
self.list.front = Some(in_front);
} else {
// We're empty, become the input, remain on the ghost
std::mem::swap(self.list, &mut input);
}
self.list.len += input.len; 283
// Not necessary but Polite To Do
input.len = 0;
#[allow(dead_code)]
fn assert_properties() {
fn is_send<T: Send>() {}
fn is_sync<T: Sync>() {}
is_send::<LinkedList<i32>>();
is_sync::<LinkedList<i32>>();
is_send::<IntoIter<i32>>();
is_sync::<IntoIter<i32>>();
is_send::<Iter<i32>>();
is_sync::<Iter<i32>>();
is_send::<IterMut<i32>>();
is_sync::<IterMut<i32>>();
/// ```compile_fail,E0308
/// use linked_list::IterMut;
///
/// fn iter_mut_covariant<'i, 'a, T>(x: IterMut<'i, &'static T>) -> IterMut<'i, &'a T> {
x }
/// ```
fn iter_mut_invariant() {}
}
#[cfg(test)]
mod test {
use super::LinkedList;
#[test]
fn test_basic_front() {
let mut list = LinkedList::new();
// Mess around
list.push_front(10);
assert_eq!(list.len(), 1);
list.push_front(20);
assert_eq!(list.len(), 2);
list.push_front(30);
assert_eq!(list.len(), 3);
assert_eq!(list.pop_front(), Some(30));
assert_eq!(list.len(), 2);
list.push_front(40);
assert_eq!(list.len(), 3);
assert_eq!(list.pop_front(), Some(40));
assert_eq!(list.len(), 2);
assert_eq!(list.pop_front(), Some(20));
assert_eq!(list.len(), 1);
assert_eq!(list.pop_front(), Some(10));
assert_eq!(list.len(), 0);
assert_eq!(list.pop_front(), None);
assert_eq!(list.len(), 0);
assert_eq!(list.pop_front(), None);
assert_eq!(list.len(), 0);
}
#[test]
fn test_basic() {
let mut m = LinkedList::new();
assert_eq!(m.pop_front(), None);
assert_eq!(m.pop_back(), None);
assert_eq!(m.pop_front(), None);
m.push_front(1);
assert_eq!(m.pop_front(), Some(1));
m.push_back(2);
m.push_back(3);
assert_eq!(m.len(), 2);
assert_eq!(m.pop_front(), Some(2));
assert_eq!(m.pop_front(), Some(3));
assert_eq!(m.len(), 0);
assert_eq!(m.pop_front(), None);
m.push_back(1);
m.push_back(3);
m.push_back(5);
m.push_back(7); 285
assert_eq!(m.pop_front(), Some(1));
#[test]
fn test_iterator() {
let m = generate_test();
for (i, elt) in m.iter().enumerate() {
assert_eq!(i as i32, *elt);
}
let mut n = LinkedList::new();
assert_eq!(n.iter().next(), None);
n.push_front(4);
let mut it = n.iter();
assert_eq!(it.size_hint(), (1, Some(1)));
assert_eq!(it.next().unwrap(), &4);
assert_eq!(it.size_hint(), (0, Some(0)));
assert_eq!(it.next(), None);
}
#[test]
fn test_iterator_double_end() {
let mut n = LinkedList::new();
assert_eq!(n.iter().next(), None);
n.push_front(4);
n.push_front(5);
n.push_front(6);
let mut it = n.iter();
assert_eq!(it.size_hint(), (3, Some(3)));
assert_eq!(it.next().unwrap(), &6);
assert_eq!(it.size_hint(), (2, Some(2)));
assert_eq!(it.next_back().unwrap(), &4);
assert_eq!(it.size_hint(), (1, Some(1)));
assert_eq!(it.next_back().unwrap(), &5);
assert_eq!(it.next_back(), None);
assert_eq!(it.next(), None);
}
#[test]
fn test_rev_iter() {
let m = generate_test();
for (i, elt) in m.iter().rev().enumerate() {
assert_eq!(6 - i as i32, *elt);
}
let mut n = LinkedList::new(); 286
assert_eq!(n.iter().rev().next(), None);
n.push_front(4);
let mut it = n.iter().rev();
assert_eq!(it.size_hint(), (1, Some(1)));
assert_eq!(it.next().unwrap(), &4);
assert_eq!(it.size_hint(), (0, Some(0)));
assert_eq!(it.next(), None);
}
#[test]
fn test_mut_iter() {
let mut m = generate_test();
let mut len = m.len();
for (i, elt) in m.iter_mut().enumerate() {
assert_eq!(i as i32, *elt);
len -= 1;
}
assert_eq!(len, 0);
let mut n = LinkedList::new();
assert!(n.iter_mut().next().is_none());
n.push_front(4);
n.push_back(5);
let mut it = n.iter_mut();
assert_eq!(it.size_hint(), (2, Some(2)));
assert!(it.next().is_some());
assert!(it.next().is_some());
assert_eq!(it.size_hint(), (0, Some(0)));
assert!(it.next().is_none());
}
#[test]
fn test_iterator_mut_double_end() {
let mut n = LinkedList::new();
assert!(n.iter_mut().next_back().is_none());
n.push_front(4);
n.push_front(5);
n.push_front(6);
let mut it = n.iter_mut();
assert_eq!(it.size_hint(), (3, Some(3)));
assert_eq!(*it.next().unwrap(), 6);
assert_eq!(it.size_hint(), (2, Some(2)));
assert_eq!(*it.next_back().unwrap(), 4);
assert_eq!(it.size_hint(), (1, Some(1)));
assert_eq!(*it.next_back().unwrap(), 5);
assert!(it.next_back().is_none());
assert!(it.next().is_none());
}
#[test]
fn test_eq() {
let mut n: LinkedList<u8> = list_from(&[]);
let mut m = list_from(&[]);
assert!(n == m);
n.push_front(1);
assert!(n != m);
m.push_back(1);
assert!(n == m);
#[test]
fn test_ord() {
let n = list_from(&[]);
let m = list_from(&[1, 2, 3]);
assert!(n < m);
assert!(m > n);
assert!(n <= n);
assert!(n >= n);
}
#[test]
fn test_ord_nan() {
let nan = 0.0f64 / 0.0;
let n = list_from(&[nan]);
let m = list_from(&[nan]);
assert!(!(n < m));
assert!(!(n > m));
assert!(!(n <= m));
assert!(!(n >= m));
let n = list_from(&[nan]);
let one = list_from(&[1.0f64]);
assert!(!(n < one));
assert!(!(n > one));
assert!(!(n <= one));
assert!(!(n >= one));
#[test]
fn test_debug() {
let list: LinkedList<i32> = (0..10).collect();
assert_eq!(format!("{:?}", list), "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]");
#[test]
fn test_hashmap() {
// Check that HashMap works with this as a key
assert_eq!(map.len(), 2);
assert_eq!(map.get(&list1), Some(&"list1"));
assert_eq!(map.get(&list2), Some(&"list2"));
assert_eq!(map.remove(&list1), Some("list1"));
assert_eq!(map.remove(&list2), Some("list2"));
assert!(map.is_empty());
}
#[test]
fn test_cursor_move_peek() {
let mut m: LinkedList<u32> = LinkedList::new();
m.extend([1, 2, 3, 4, 5, 6]);
let mut cursor = m.cursor_mut();
cursor.move_next();
assert_eq!(cursor.current(), Some(&mut 1));
assert_eq!(cursor.peek_next(), Some(&mut 2));
assert_eq!(cursor.peek_prev(), None);
assert_eq!(cursor.index(), Some(0));
cursor.move_prev();
assert_eq!(cursor.current(), None);
assert_eq!(cursor.peek_next(), Some(&mut 1));
assert_eq!(cursor.peek_prev(), Some(&mut 6));
assert_eq!(cursor.index(), None);
cursor.move_next();
cursor.move_next();
assert_eq!(cursor.current(), Some(&mut 2));
assert_eq!(cursor.peek_next(), Some(&mut 3));
assert_eq!(cursor.peek_prev(), Some(&mut 1));
assert_eq!(cursor.index(), Some(1));
#[test]
fn test_cursor_mut_insert() {
let mut m: LinkedList<u32> = LinkedList::new();
m.extend([1, 2, 3, 4, 5, 6]);
let mut cursor = m.cursor_mut();
cursor.move_next(); 289
cursor.splice_before(Some(7).into_iter().collect());
cursor.splice_after(Some(8).into_iter().collect());
// check_links(&m);
assert_eq!(
m.iter().cloned().collect::<Vec<_>>(),
&[7, 1, 8, 2, 3, 4, 5, 6]
);
let mut cursor = m.cursor_mut();
cursor.move_next();
cursor.move_prev();
cursor.splice_before(Some(9).into_iter().collect());
cursor.splice_after(Some(10).into_iter().collect());
check_links(&m);
assert_eq!(
m.iter().cloned().collect::<Vec<_>>(),
&[10, 7, 1, 8, 2, 3, 4, 5, 6, 9]
);
assert_eq!(from_front, re_reved);
}
}
291
ahahahaha
No
This chapter is a living document of the more ridiculous linked lists and how they interact with Rust.
Instead, we can smash our list into two halves: one going to the left, and one going to the right:
// lib.rs
// ...
pub mod silly1; // NEW!
// silly1.rs
use crate::second::List as Stack;
struct List<T> {
left: Stack<T>,
right: Stack<T>,
}
Now, rather than having a mere safe stack, we have a general purpose list. We can grow the list leftwards
or rightwards by pushing onto either stack. We can also "walk" along the list by popping values off one
end and onto the other. To avoid needless allocations, we're going to copy the source of our safe Stack to
get access to its private details:
pub struct Stack<T> { 293
head: Link<T>,
}
struct Node<T> {
elem: T,
next: Link<T>,
}
impl<T> Stack<T> {
pub fn new() -> Self {
Stack { head: None }
}
self.head = Some(new_node);
}
self.push_node(new_node);
}
impl<T> List<T> {
fn new() -> Self {
List { left: Stack::new(), right: Stack::new() }
}
}
We return booleans here as just a convenience to indicate whether we actually managed to move. Now
let's test this baby out:
#[cfg(test)]
mod test {
use super::List;
#[test]
fn walk_aboot() {
let mut list = List::new(); // [_]
list.push_left(0); // [0,_]
list.push_right(1); // [0, _, 1]
assert_eq!(list.peek_left(), Some(&0));
assert_eq!(list.peek_right(), Some(&1));
list.push_left(2); // [0, 2, _, 1]
list.push_left(3); // [0, 2, 3, _, 1]
list.push_right(4); // [0, 2, 3, _, 4, 1]
assert_eq!(list.pop_left(), None);
assert_eq!(list.pop_right(), Some(0)); // [_, 2, 3, 4, 1]
assert_eq!(list.pop_right(), Some(2)); // [_, 3, 4, 1]
list.push_left(5); // [5, _, 3, 4, 1]
assert_eq!(list.pop_right(), Some(3)); // [5, _, 4, 1]
assert_eq!(list.pop_left(), Some(5)); // [_, 4, 1]
assert_eq!(list.pop_right(), Some(4)); // [_, 1]
assert_eq!(list.pop_right(), Some(1)); // [_]
assert_eq!(list.pop_right(), None);
assert_eq!(list.pop_left(), None);
}
}
> cargo test 296
Running target/debug/lists-5c71138492ad4b4a
running 16 tests
test fifth::test::into_iter ... ok
test fifth::test::basics ... ok
test fifth::test::iter ... ok
test fifth::test::iter_mut ... ok
test fourth::test::into_iter ... ok
test fourth::test::basics ... ok
test fourth::test::peek ... ok
test first::test::basics ... ok
test second::test::into_iter ... ok
test second::test::basics ... ok
test second::test::iter ... ok
test second::test::iter_mut ... ok
test third::test::basics ... ok
test third::test::iter ... ok
test second::test::peek ... ok
test silly1::test::walk_aboot ... ok
This is an extreme example of a finger data structure, where we maintain some kind of finger into the
structure, and as a consequence can support operations on locations in time proportional to the distance
from the finger.
We can make very fast changes to the list around our finger, but if we want to make changes far away from
our finger we have to walk all the way over there. We can permanently walk over there by shifting the
elements from one stack to the other, or we could just walk along the links with an &mut temporarily to do
the changes. However the &mut can never go back up the list, while our finger can!
297
So let's allocate memory on the stack the easy way: by calling a function and getting a new stack frame
with more space! This is a very silly solution to our problem but also genuinely practical and useful. It's
done all the time, potentially without actually even thinking about it as a linked list!
Any time you're doing something recursively, you can just pass a pointer to the current step's state to the
next step. If that pointer itself is part of the state, then you've created a linked list that's stack-allocated!
Now of course we're in the silly part of the book so we're going to do this in a silly way: by making the
linked list the star and forcing all the user's code to live in a swamp of callbacks. Everybody loves nested
callbacks!
Our List type will just be a Node with a reference to another Node:
And it will have only one operation, push , which will take the old list, the state for the current node, and a
callback. The new list will be produced in the callback. We will also let callbacks return any value, which
push will return when it completes:
List::push(None, 3, |list| {
println!("{}", list.data);
List::push(Some(list), 5, |list| {
println!("{}", list.data);
List::push(Some(list), 13, |list| {
println!("{}", list.data);
})
})
})
It's beautiful. 😿 298
The user can already traverse this list by using while-let to walk over the prev values, but just for fun, let's
implement an iterator, which is the usual:
#[cfg(test)]
mod test {
use super::List;
#[test]
fn elegance() {
List::push(None, 3, |list| {
assert_eq!(list.iter().copied().sum::<i32>(), 3);
List::push(Some(list), 5, |list| {
assert_eq!(list.iter().copied().sum::<i32>(), 5 + 3);
List::push(Some(list), 13, |list| {
assert_eq!(list.iter().copied().sum::<i32>(), 13 + 5 + 3);
})
})
})
}
}
> cargo test 299
running 18 tests
test fifth::test::into_iter ... ok
test fifth::test::iter ... ok
test fifth::test::iter_mut ... ok
test fifth::test::basics ... ok
test fifth::test::miri_food ... ok
test first::test::basics ... ok
test second::test::into_iter ... ok
test fourth::test::peek ... ok
test fourth::test::into_iter ... ok
test second::test::iter_mut ... ok
test fourth::test::basics ... ok
test second::test::basics ... ok
test second::test::iter ... ok
test third::test::basics ... ok
test silly1::test::walk_aboot ... ok
test silly2::test::elegance ... ok
test second::test::peek ... ok
test third::test::iter ... ok
Now at this point you might be wonder "hey can I mutate the data stored in a node?". Maybe! Let's try to
make the list use mutable references instead of shared ones:
300
pub struct List<'a, T> {
pub data: T,
pub prev: Option<&'a mut List<'a, T>>,
}
<ad infinitum>
Whelp. Seems like it doesn't like our iterator. Maybe we messed that up? Let's simplify the test a bit to
check:
#[test]
fn elegance() {
List::push(None, 3, |list| {
assert_eq!(list.data, 3);
List::push(Some(list), 5, |list| {
assert_eq!(list.data, 5);
List::push(Some(list), 13, |list| {
assert_eq!(list.data, 13);
})
})
})
}
> cargo test 302
The problem is that our list is accidentally( 😉) relying on variance. Variance is a complicated subject but
let's look at it in simplified terms here:
Each list contains a reference to a List with the exact same type as itself. From the perspective of the inner-
most list, that means all lists are using the same lifetime as itself, but this is objectively false: each node in
the list lives strictly longer than the next one, because they are literally in nested scopes!
So... why did the code compile when we were using shared references? Because in many cases, the
compiler knows it's safe to have something that lives "too long"! When we stuff a reference to a list into
the next one, the compiler is quietly "shrinking" down the lifetimes to make them fit what the new list
expects. This lifetime shrinking is variance.
It's the exact same trick in languages with inheritance that let's you pass a Cat where an Animal (a
supertype of a Cat) is expected. Intuitively we know it's fine to pass a Cat when an Animal is expected
because a Cat is just an Animal and more. It's fine to forget the "and more" part for a while, right?
Similarly, a larger lifetime is just a smaller lifetime and more. So it's fine to forget the "and more" here too!
But of course you are now wondering: then why doesn't the mutable reference version work!?
Well, variance isn't always safe. If our code did compile, we could have written a use-after-free like this:
List::push(None, 3, |list| { 303
List::push(Some(list), 5, |list| {
List::push(Some(list), 13, |list| {
// HAHAHA all the lifetimes are the same, so the compiler will
// let me rewrite my parent to hold a mutable reference to myself!
// I will create all the use-after-frees!!
*list.prev.as_mut().unwrap().prev = Some(list);
})
})
})
The problem with forgetting details is that somewhere else might remember those details and expect them
to remain true. That is a very big problem once you introduce mutation. If you're not careful, the code that
doesn't remember the "and more" that we threw away might think it's fine to write things to places that
"remember" and expect the "and more" to still be there.
So while you can shorten the lifetime of a mutable reference, once you start nesting them things become
"invariant" and you're not allowed to shorten lifetimes anymore.
Specifically &mut &'big mut T cannot be converted to &mut &'small mut T , where 'big is bigger than
'small . Or more formally, &'a mut T is covariant over 'a but invariant over T .
Fun fact: Java actually specifically lets you do this kind of thing, but it does runtime checks to prevent
meowing dogs.
So what can we do to mutate the data? Use interior mutability! This lets us tell the compiler that we just
want to be able to mutate the data but won't touch the references.
We can just revert back to the previous version of our code with shared references, and use Cell in a new
test:
#[test] 304
fn cell() {
use std::cell::Cell;
running 19 tests
test fifth::test::into_iter ... ok
test fifth::test::basics ... ok
test fifth::test::iter_mut ... ok
test fifth::test::iter ... ok
test fourth::test::basics ... ok
test fourth::test::into_iter ... ok
test second::test::into_iter ... ok
test first::test::basics ... ok
test fourth::test::peek ... ok
test second::test::basics ... ok
test fifth::test::miri_food ... ok
test silly2::test::cell ... ok
test third::test::iter ... ok
test second::test::iter_mut ... ok
test second::test::peek ... ok
test silly1::test::walk_aboot ... ok
test silly2::test::elegance ... ok
test third::test::basics ... ok
test second::test::iter ... ok