The Embedded Rust Book
Introduction
Welcome to The Embedded Rust Book: An introductory book about using the Rust
Programming Language on "Bare Metal" embedded systems, such as
Microcontrollers.
Scope
The goals of this book are:
Share current best practices about using Rust for embedded development, i.e. how to best use Rust language features to write more correct embedded software.
This book tries to be as general as possible but to make things easier for both the
readers and the writers it uses the ARM Cortex-M architecture in all its examples.
However, the book doesn't assume that the reader is familiar with this particular
architecture and explains details particular to this architecture where required.
Assumptions and Prerequisites
You are comfortable using the Rust Programming Language, and have
written, run, and debugged Rust applications on a desktop environment. You
should also be familiar with the idioms of the 2018 edition as this book
targets Rust 2018.
Other Resources
If you are unfamiliar with anything mentioned above or if you want more
information about a specific topic mentioned in this book you might find some of
these resources helpful.
If you have trouble following the instructions in this book or find that some section
of the book is not clear enough or hard to follow then that's a bug and it should be
reported in the issue tracker of this book.
Pull requests fixing typos and adding new content are very welcome!
The code samples and free-standing Cargo projects contained within this
book are licensed under the terms of both the MIT License and the Apache
License v2.0.
The written prose, pictures and diagrams contained within this book are
licensed under the terms of the Creative Commons CC-BY-SA v4.0 license.
TL;DR: If you want to use our text or images in your work, you need to:
Give the appropriate credit (i.e. mention this book on your slide, and provide
a link to the relevant page)
Provide a link to the CC-BY-SA v4.0 licence
Indicate if you have changed the material in any way, and make any changes
to our material available under the same licence
Meet Your Hardware
For most of the hardware examples we'll use the STM32F3DISCOVERY board. Among other features, it provides:
An STM32F303VCT6 microcontroller with 256 KiB of Flash memory and 48 KiB of RAM.
General purpose Input Output (GPIO) and other types of pins accessible
through the two rows of headers alongside the board.
A USB interface accessible through the USB port labeled "USB USER".
For a more detailed list of features and further specifications of the board take a
look at the STMicroelectronics website.
A word of caution: be careful if you want to apply external signals to the board. The
microcontroller STM32F303VCT6 pins take a nominal voltage of 3.3 volts. For
further information consult the 6.2 Absolute maximum ratings section in the manual.
Hosted Environments
These kinds of environments are close to a normal PC environment. What this
means is that you are provided with a system interface, e.g. POSIX, that provides
you with primitives to interact with various systems, such as file systems,
networking, memory management, threads, etc. Standard libraries in turn usually
depend on these primitives to implement their functionality. You may also have
some sort of sysroot and restrictions on RAM/ROM-usage, and perhaps some
special HW or I/Os. Overall it feels like coding on a special-purpose PC
environment.
Bare Metal Environments
In a bare metal environment no code has been loaded before your program.
Without the software provided by an OS we cannot load the standard library.
Instead the program, along with the crates it uses, can only use the hardware (bare
metal) to run. To prevent Rust from loading the standard library, use no_std . The
platform-agnostic parts of the standard library are available through libcore.
libcore also excludes things which are not always desirable in an embedded
environment. One of these things is a memory allocator for dynamic memory
allocation. If you require this or any other functionalities there are often crates
which provide these.
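As a concrete sketch (not part of the original text): with the alloc crate plus an allocator crate such as alloc-cortex-m, heap allocation can be brought back in roughly like this. The exact API differs between versions, and older toolchains also require an #[alloc_error_handler].

#![no_std]
#![no_main]

extern crate alloc;

use alloc::vec::Vec;
use alloc_cortex_m::CortexMHeap;
use cortex_m_rt::entry;
use panic_halt as _;

// register the heap allocator as the global allocator
#[global_allocator]
static ALLOCATOR: CortexMHeap = CortexMHeap::empty();

#[entry]
fn main() -> ! {
    // Initialize the heap BEFORE the first allocation.
    // NOTE older toolchains also need an `#[alloc_error_handler]`.
    unsafe { ALLOCATOR.init(cortex_m_rt::heap_start() as usize, 1024) }

    let mut xs = Vec::new();
    xs.push(42);

    loop {}
}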
As mentioned before using libstd requires some sort of system integration, but this
is not only because libstd is just providing a common way of accessing OS
abstractions, it also provides a runtime. This runtime, among other things, takes
care of setting up stack overflow protection, processing command line arguments,
and spawning the main thread before a program's main function is invoked. This
runtime also won't be available in a no_std environment.
Summary
#![no_std] is a crate-level attribute that indicates that the crate will link to the
core-crate instead of the std-crate. The libcore crate in turn is a platform-agnostic
subset of the std crate which makes no assumptions about the system the
program will run on. As such, it provides APIs for language primitives like floats,
strings and slices, as well as APIs that expose processor features like atomic
operations and SIMD instructions. However it lacks APIs for anything that involves
platform integration. Because of these properties no_std and libcore code can be
used for any kind of bootstrapping (stage 0) code like bootloaders, firmware or
kernels.
Overview
Dynamic memory allocation is only available if you use the alloc crate together
with a suitable allocator like alloc-cortex-m, and collections additionally require
the collections crate and a configured global default allocator.
See Also
RFC-1184
Tooling
Dealing with microcontrollers involves using several different tools as we'll be
dealing with an architecture different than your laptop's and we'll have to run and
debug programs on a remote device.
We'll use all the tools listed below. Any recent version should work when a
minimum version is not specified, but we have listed the versions we have tested.
The text below explains why we are using these tools. Installation instructions can
be found on the next page.
cargo-generate OR git
Bare metal programs are non-standard ( no_std ) Rust programs that require some
adjustments to the linking process in order to get the memory layout of the
program right. This requires some additional files (like linker scripts) and settings
(like linker flags). We have packaged those for you in a template such that you only
need to fill in the missing information (such as the project name and the
characteristics of your target hardware).
cargo-binutils
cargo-binutils is a collection of Cargo subcommands that make it easy to use
the LLVM tools that are shipped with the Rust toolchain. These tools include the
LLVM versions of objdump , nm and size and are used for inspecting binaries.
The advantage of using these tools over GNU binutils is that (a) installing the LLVM
tools is the same one-command installation ( rustup component add llvm-tools-preview ) regardless of your OS and (b) tools like objdump support all the
architectures that rustc supports -- from ARM to x86_64 -- because they both
share the same LLVM backend.
qemu-system-arm
QEMU is an emulator. In this case we use the variant that can fully emulate ARM
systems. We use QEMU to run embedded programs on the host. Thanks to this
you can follow some parts of this book even if you don't have any hardware with
you!
GDB
A debugger is a very important component of embedded development as you may
not always have the luxury to log stuff to the host console. In some cases, you may
not even have LEDs to blink on your hardware!
In general, LLDB works as well as GDB when it comes to debugging but we haven't
found an LLDB counterpart to GDB's load command, which uploads the program
to the target hardware, so currently we recommend that you use GDB.
OpenOCD
GDB isn't able to communicate directly with the ST-Link debugging hardware on
your STM32F3DISCOVERY development board. It needs a translator and the Open
On-Chip Debugger, OpenOCD, is that translator. OpenOCD is a program that runs
on your laptop/PC and translates between GDB's TCP/IP based remote debug
protocol and ST-Link's USB based protocol.
OpenOCD also performs other important work as part of its translation for the
debugging of the ARM Cortex-M based microcontroller on your
STM32F3DISCOVERY development board:
It knows how to interact with the memory mapped registers used by the ARM
CoreSight debug peripheral. It is these CoreSight registers that allow for:
Breakpoint/Watchpoint manipulation
Reading and writing of the CPU registers
Detecting when the CPU has been halted for a debug event
Continuing CPU execution after a debug event has been encountered
etc.
It also knows how to erase and write to the microcontroller's FLASH
Rust Toolchain
NOTE Make sure you have a compiler version equal to or newer than 1.31 . rustc
-V should return a date newer than the one shown below.
$ rustc -V
rustc 1.31.1 (b6c32da9b 2018-12-18)
For bandwidth and disk usage concerns the default installation only supports
native compilation. To add cross compilation support for the ARM Cortex-M
architectures choose one of the following compilation targets. For the
STM32F3DISCOVERY board used for the examples in this book, use the final
thumbv7em-none-eabihf target.
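The rustup invocations themselves were omitted here; they look like this:

$ rustup target add thumbv6m-none-eabi    # Cortex-M0 and Cortex-M0+
$ rustup target add thumbv7m-none-eabi    # Cortex-M3
$ rustup target add thumbv7em-none-eabi   # Cortex-M4 and Cortex-M7 (no FPU)
$ rustup target add thumbv7em-none-eabihf # Cortex-M4F and Cortex-M7F (with FPU)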
cargo-binutils
cargo-generate
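The installation commands for these two tools were omitted above; they are the
usual cargo installs:

$ cargo install cargo-binutils
$ rustup component add llvm-tools-preview

$ cargo install cargo-generate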
OS-Specific Instructions
Linux
Windows
macOS
Linux
Here are the installation commands for a few Linux distributions.
Packages
Ubuntu 18.04 or newer / Debian stretch or newer
NOTE gdb-multiarch is the GDB command you'll use to debug your ARM
Cortex-M programs
Fedora 27 or newer
Arch Linux
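The package installation commands were omitted above; they typically look like
this (package names may vary slightly between releases):

$ # Ubuntu / Debian
$ sudo apt install gdb-multiarch openocd qemu-system-arm

$ # Fedora
$ sudo dnf install gdb openocd qemu-system-arm

$ # Arch Linux
$ sudo pacman -S arm-none-eabi-gdb qemu-system-arm openocd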
udev rules
This rule lets you use OpenOCD with the Discovery board without root privilege.
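The rule file itself was omitted here; a typical /etc/udev/rules.d/70-st-link.rules
(matching the ST-LINK USB IDs seen below) looks like this. Reload the rules
afterwards with sudo udevadm control --reload-rules .

# STM32F3DISCOVERY rev A/B - ST-LINK/V2
ATTRS{idVendor}=="0483", ATTRS{idProduct}=="3748", TAG+="uaccess"

# STM32F3DISCOVERY rev C+ - ST-LINK/V2-1
ATTRS{idVendor}=="0483", ATTRS{idProduct}=="374b", TAG+="uaccess"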
If you had the board plugged into your laptop, unplug it and then plug it in again. Then run:
lsusb
(..)
Bus 001 Device 018: ID 0483:374b STMicroelectronics ST-LINK/V2.1
(..)
Take note of the bus and device numbers. Use those numbers to create a path like
/dev/bus/usb/<bus>/<device> . Then use this path like so:
$ getfacl /dev/bus/usb/001/018 | grep user
user::rw-
user:you:rw-
Your username should appear after user: with rw- permissions; if it doesn't, check
your udev rules and reload them.
macOS
All the tools can be installed using Homebrew:
$ # GDB
$ brew install armmbed/formulae/arm-none-eabi-gcc
$ # OpenOCD
$ brew install openocd
$ # QEMU
$ brew install qemu
Windows
arm-none-eabi-gdb
ARM provides .exe installers for Windows. Grab one from here, and follow the
instructions. Just before the installation process finishes tick/select the "Add path
to environment variable" option. Then verify that the tools are in your %PATH% :
$ arm-none-eabi-gdb -v
GNU gdb (GNU Tools for Arm Embedded Processors 7-2018-q2-update)
8.1.0.20180315-git
(..)
OpenOCD
There's no official binary release of OpenOCD for Windows but if you're not in the
mood to compile it yourself, the xPack project provides a binary distribution, here.
Follow the provided installation instructions. Then update your %PATH%
environment variable to include the path where the binaries were installed.
( C:\Users\USERNAME\AppData\Roaming\xPacks\@xpack-dev-tools\openocd\0.10.0-13.1\.content\bin\ , if you've been using the easy install)
$ openocd -v
Open On-Chip Debugger 0.10.0
(..)
QEMU
Grab QEMU from the official website.
Verify Installation
In this section we check that some of the required tools / drivers have been
correctly installed and configured.
Connect your laptop / PC to the discovery board using a micro USB cable. The
discovery board has two USB connectors; use the one labeled "USB ST-LINK" that
sits on the center of the edge of the board.
Also check that the ST-LINK header is populated. See the picture below; the ST-
LINK header is circled in red.
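The command itself was omitted here; for recent (rev C and newer) boards it is
typically:

$ openocd -f interface/stlink-v2-1.cfg -f target/stm32f3x.cfg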
You should get the following output and the program should block the console:
The contents may not match exactly but you should get the last line about
breakpoints and watchpoints. If you got it then terminate the OpenOCD process
and move to the next section.
If you didn't get the "breakpoints" line then try one of the following commands.
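The alternative invocations were omitted here; they select the older ST-LINK
interface file, e.g.:

$ openocd -f interface/stlink-v2.cfg -f target/stm32f3x.cfg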
If one of those commands works it means you got an old hardware revision of the
discovery board. That won't be a problem but commit that fact to memory as you'll
need to configure things a bit differently later on. You can move to the next
section.
If none of the commands work as a normal user then try to run them with root
permission (e.g. sudo openocd .. ). If the commands do work with root
permission then check that the udev rules have been correctly set.
If you have reached this point and OpenOCD is not working please open an issue
and we'll help you out!
Getting Started
In this section we'll walk you through the process of writing, building, flashing and
debugging embedded programs. You will be able to try most of the examples
without any special hardware as we will show you the basics using QEMU, a
popular open-source hardware emulator. The only section where hardware is
required is, naturally enough, the Hardware section, where we use OpenOCD to
program an STM32F3DISCOVERY.
QEMU
We'll start writing a program for the LM3S6965, a Cortex-M3 microcontroller. We
have chosen this as our initial target because it can be emulated using QEMU so
you don't need to fiddle with hardware in this section and we can focus on the
tooling and the development process.
IMPORTANT We'll use the name "app" for the project name in this tutorial.
Whenever you see the word "app" you should replace it with the name you
selected for your project. Or, you could also name your project "app" and avoid the
substitutions.
Using cargo-generate
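The command itself was omitted here; with cargo-generate it is typically:

$ cargo generate --git https://github.com/rust-embedded/cortex-m-quickstart
 Project Name: app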
cd app
Using git
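Without cargo-generate, clone the template and then fill in the placeholders in
Cargo.toml by hand:

$ git clone https://github.com/rust-embedded/cortex-m-quickstart app
$ cd app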
[package]
authors = ["{{authors}}"] # "{{authors}}" -> "John Smith"
edition = "2018"
name = "{{project-name}}" # "{{project-name}}" -> "awesome-app"
version = "0.1.0"
# ..
[[bin]]
name = "{{project-name}}" # "{{project-name}}" -> "awesome-app"
test = false
bench = false
Using neither
Grab the latest snapshot of the cortex-m-quickstart template and extract it.
Then fill in the placeholders in the Cargo.toml file as done in the second part of
the Using git method.
Program Overview
For convenience here are the most important parts of the source code in
src/main.rs :
#![no_std]
#![no_main]
use panic_halt as _;
use cortex_m_rt::entry;
#[entry]
fn main() -> ! {
loop {
// your code goes here
}
}
This program is a bit different from a standard Rust program so let's take a closer
look.
#![no_std] indicates that this program will not link to the standard crate, std .
Instead it will link to its subset: the core crate.
#![no_main] indicates that this program won't use the standard main interface
that most Rust programs use. The main (no pun intended) reason to go with
no_main is that using the main interface in no_std context requires nightly.
fn main() -> ! . Our program will be the only process running on the target
hardware so we don't want it to end! We use a divergent function (the -> ! bit in
the function signature) to ensure at compile time that it won't.
Cross compiling
The next step is to cross compile the program for the Cortex-M3 architecture.
That's as simple as running cargo build --target $TRIPLE if you know what the
compilation target ( $TRIPLE ) should be. Luckily, the .cargo/config in the
template has the answer:
[build]
# Pick ONE of these compilation targets
# target = "thumbv6m-none-eabi" # Cortex-M0 and Cortex-M0+
target = "thumbv7m-none-eabi" # Cortex-M3
# target = "thumbv7em-none-eabi" # Cortex-M4 and Cortex-M7 (no FPU)
# target = "thumbv7em-none-eabihf" # Cortex-M4F and Cortex-M7F (with FPU)
Since the thumbv7m-none-eabi compilation target has been set as the default in
your .cargo/config file, the two commands below do the same:
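$ cargo build --target thumbv7m-none-eabi
$ cargo build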
Inspecting
Now we have a non-native ELF binary in target/thumbv7m-none-eabi/debug/app .
We can inspect it using cargo-binutils .
With cargo-readobj we can print the ELF headers to confirm that this is an ARM
binary.
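The invocation was omitted here; it is typically:

$ cargo readobj --bin app -- --file-headers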
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0x0
Type: EXEC (Executable file)
Machine: ARM
Version: 0x1
Entry point address: 0x405
Start of program headers: 52 (bytes into file)
Start of section headers: 153204 (bytes into file)
Flags: 0x5000200
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 2
Size of section headers: 40 (bytes)
Number of section headers: 19
Section header string table index: 18
cargo-size can print the size of the linker sections of the binary.
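The invocation was omitted here; it is typically:

$ cargo size --bin app --release -- -A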
app :
section size addr
.vector_table 1024 0x0
.text 92 0x400
.rodata 0 0x45c
.data 0 0x20000000
.bss 0 0x20000000
.debug_str 2958 0x0
.debug_loc 19 0x0
.debug_abbrev 567 0x0
.debug_info 4929 0x0
.debug_ranges 40 0x0
.debug_macinfo 1 0x0
.debug_pubnames 2035 0x0
.debug_pubtypes 1892 0x0
.ARM.attributes 46 0x0
.debug_frame 100 0x0
.debug_line 867 0x0
Total 14570
IMPORTANT: ELF files contain metadata like debug information so their size on disk
does not accurately reflect the space the program will occupy when flashed on a
device. Always use cargo-size to check how big a binary really is.
NOTE this output can differ on your system. New versions of rustc, LLVM and
libraries can generate different assembly. We truncated some of the
instructions to keep the snippet small.
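The invocation used to produce the disassembly below is typically:

$ cargo objdump --bin app --release -- --disassemble --no-show-raw-insn --print-imm-hex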
Reset:
406: bl #0x24e
40a: movw r0, #0x0
< .. truncated any more instructions .. >
DefaultHandler_:
656: b #-0x4 <DefaultHandler_>
UsageFault:
657: strb r7, [r4, #0x3]
DefaultPreInit:
658: bx lr
__pre_init:
659: strb r7, [r0, #0x1]
__nop:
65a: bx lr
HardFaultTrampoline:
65c: mrs r0, msp
660: b #-0x2 <HardFault_>
HardFault_:
662: b #-0x4 <HardFault_>
HardFault:
663: <unknown>
Running
Next, let's see how to run an embedded program on QEMU! This time we'll use the
hello example which actually does something.
#![no_main]
#![no_std]
use panic_halt as _;
use cortex_m_rt::entry;
use cortex_m_semihosting::{debug, hprintln};
#[entry]
fn main() -> ! {
hprintln!("Hello, world!").unwrap();
// exit QEMU
// NOTE do not run this on hardware; it can corrupt OpenOCD state
debug::exit(debug::EXIT_SUCCESS);
loop {}
}
This program uses something called semihosting to print text to the host console.
When using real hardware this requires a debug session but when using QEMU
this Just Works.
qemu-system-arm \
-cpu cortex-m3 \
-machine lm3s6965evb \
-nographic \
-semihosting-config enable=on,target=native \
-kernel target/thumbv7m-none-eabi/debug/examples/hello
Hello, world!
The command should successfully exit (exit code = 0) after printing the text. On
*nix you can check that with the following command:
echo $?
0
-cpu cortex-m3 . This tells QEMU to emulate a Cortex-M3 CPU. Specifying the
CPU model lets us catch some miscompilation errors: for example, running a
program compiled for the Cortex-M4F, which has a hardware FPU, will make
QEMU error during its execution.
-kernel $file . This tells QEMU which binary to load and run on the
emulated machine.
Typing out that long QEMU command is too much work! We can set a custom
runner to simplify the process. .cargo/config has a commented out runner that
invokes QEMU; let's uncomment it:
[target.thumbv7m-none-eabi]
# uncomment this to make `cargo run` execute programs on QEMU
runner = "qemu-system-arm -cpu cortex-m3 -machine lm3s6965evb -nographic
-semihosting-config enable=on,target=native -kernel"
This runner only applies to the thumbv7m-none-eabi target, which is our default
compilation target. Now cargo run will compile the program and run it on QEMU:
Debugging
Debugging is critical to embedded development. Let's see how it's done.
Remote debugging involves a client and a server. In a QEMU setup, the client will
be a GDB (or LLDB) process and the server will be the QEMU process that's also
running the embedded program.
qemu-system-arm \
-cpu cortex-m3 \
-machine lm3s6965evb \
-nographic \
-semihosting-config enable=on,target=native \
-gdb tcp::3333 \
-S \
-kernel target/thumbv7m-none-eabi/debug/examples/hello
This command won't print anything to the console and will block the terminal. We
have passed two extra flags this time:
-gdb tcp::3333 . This tells QEMU to wait for a GDB connection on TCP port
3333.
-S . This tells QEMU to freeze the machine at startup. Without this the
program would have reached the end of main before we had a chance to
launch the debugger!
Next we launch GDB in another terminal and tell it to load the debug symbols of
the example:
gdb-multiarch -q target/thumbv7m-none-eabi/debug/examples/hello
Then within the GDB shell we connect to QEMU, which is waiting for a connection
on TCP port 3333.
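That's done with GDB's target remote command:

(gdb) target remote :3333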
You'll see that the process is halted and that the program counter is pointing to a
function named Reset . That is the reset handler: what Cortex-M cores execute
upon booting.
GDB may instead print a warning like:

core::num::bignum::Big32x40::mul_small () at src/libcore/num/bignum.rs:254
src/libcore/num/bignum.rs: No such file or directory.
That's a known glitch. You can safely ignore those warnings, you're most likely
at Reset().
This reset handler will eventually call our main function. Let's skip all the way there
using a breakpoint and the continue command. To set the breakpoint, let's first
take a look where we would like to break in our code, with the list command.
list main
This will show the source code, from the file examples/hello.rs.
6 use panic_halt as _;
7
8 use cortex_m_rt::entry;
9 use cortex_m_semihosting::{debug, hprintln};
10
11 #[entry]
12 fn main() -> ! {
13 hprintln!("Hello, world!").unwrap();
14
15 // exit QEMU
We would like to add a breakpoint just before the "Hello, world!", which is on line
13. We do that with the break command:
break 13
We can now instruct gdb to run up to our main function, with the continue
command:
continue
Continuing.
We are now close to the code that prints "Hello, world!". Let's move forward using
the next command.
next
16 debug::exit(debug::EXIT_SUCCESS);
At this point you should see "Hello, world!" printed on the terminal that's running
qemu-system-arm .
$ qemu-system-arm (..)
Hello, world!
next
quit
Hardware
By now you should be somewhat familiar with the tooling and the development
process. In this section we'll switch to real hardware; the process will remain
largely the same. Let's dive in.
Before we begin you need to identify some characteristics of the target device, as
these will be used to configure the project:
Does the ARM core include an FPU? Cortex-M4F and Cortex-M7F cores do.
How much Flash memory and RAM does the target device have? e.g. 256 KiB
of Flash and 32 KiB of RAM.
Where are Flash memory and RAM mapped in the address space? e.g. RAM is
commonly located at address 0x2000_0000 .
You can find this information in the data sheet or the reference manual of your
device.
In this section we'll be using our reference hardware, the STM32F3DISCOVERY. This
board contains an STM32F303VCT6 microcontroller, which has a single ARM
Cortex-M4F core (with FPU), 256 KiB of Flash memory and 48 KiB of RAM.
Configuring
We'll start from scratch with a fresh template instance. Refer to the previous
section on QEMU for a refresher on how to do this without cargo-generate .
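With cargo-generate, instantiate a fresh template, again answering app for the
project name:

$ cargo generate --git https://github.com/rust-embedded/cortex-m-quickstart
 Project Name: app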
$ cd app
The second step is to enter the memory region information into the memory.x file.
$ cat memory.x
/* Linker script for the STM32F303VCT6 */
MEMORY
{
/* NOTE 1 K = 1 KiBi = 1024 bytes */
FLASH : ORIGIN = 0x08000000, LENGTH = 256K
RAM : ORIGIN = 0x20000000, LENGTH = 40K
}
NOTE: If you for some reason changed the memory.x file after you had made
the first build of a specific build target, then do cargo clean before cargo
build , because cargo build may not track updates of memory.x .
Make sure the debug::exit() call is commented out or removed; it is used only
for running in QEMU.
#[entry]
fn main() -> ! {
hprintln!("Hello, world!").unwrap();
// exit QEMU
// NOTE do not run this on hardware; it can corrupt OpenOCD state
// debug::exit(debug::EXIT_SUCCESS);
loop {}
}
You can now cross compile programs using cargo build and inspect the binaries
using cargo-binutils as you did before. The cortex-m-rt crate handles all the
magic required to get your chip running, as helpfully, pretty much all Cortex-M
CPUs boot in the same fashion.
Debugging
Debugging will look a bit different. In fact, the first steps can look different
depending on the target device. In this section we'll show the steps required to
debug a program running on the STM32F3DISCOVERY. This is meant to serve as a
reference; for device specific information about debugging check out the
Debugonomicon.
As before we'll do remote debugging and the client will be a GDB process. This
time, however, the server will be OpenOCD.
As done during the verify section connect the discovery board to your laptop / PC
and check that the ST-LINK header is populated.
On a terminal run openocd to connect to the ST-LINK on the discovery board. Run
this command from the root of the template; openocd will pick up the
openocd.cfg file which indicates which interface file and target file to use.
$ cat openocd.cfg
# Depending on the hardware revision you got you'll have to pick ONE of these
# interfaces. At any time only one interface should be commented out.
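The interface and target lines that follow that comment look like this in the
template:

# Revision C (newer revision)
source [find interface/stlink-v2-1.cfg]

# Revision A and B (older revision)
# source [find interface/stlink-v2.cfg]

source [find target/stm32f3x.cfg]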
NOTE If you found out that you have an older revision of the discovery board
during the verify section then you should modify the openocd.cfg file at this
point to use interface/stlink-v2.cfg .
$ openocd
Open On-Chip Debugger 0.10.0
Licensed under GNU GPL v2
For bug reports, read
http://openocd.org/doc/doxygen/bugs.html
Info : auto-selecting first available session transport "hla_swd". To
override use 'transport select <transport>'.
adapter speed: 1000 kHz
adapter_nsrst_delay: 100
Info : The selected transport took over low-level target control. The
results might differ compared to plain JTAG/SWD
none separate
Info : Unable to match requested speed 1000 kHz, using 950 kHz
Info : Unable to match requested speed 1000 kHz, using 950 kHz
Info : clock speed 950 kHz
Info : STLINK v2 JTAG v27 API v2 SWIM v15 VID 0x0483 PID 0x374B
Info : using stlink api v2
Info : Target voltage: 2.913879
Info : stm32f3x.cpu: hardware has 6 breakpoints, 4 watchpoints
On another terminal run GDB, also from the root of the template.
$ <gdb> -q target/thumbv7em-none-eabihf/debug/examples/hello
Next connect GDB to OpenOCD, which is waiting for a TCP connection on port
3333.
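(gdb) target remote :3333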
Now proceed to flash (load) the program onto the microcontroller using the load
command.
(gdb) load
Loading section .vector_table, size 0x400 lma 0x8000000
Loading section .text, size 0x1e70 lma 0x8000400
Loading section .rodata, size 0x61c lma 0x8002270
Start address 0x800144e, load size 10380
Transfer rate: 17 KB/sec, 3460 bytes/write.
The program is now loaded. This program uses semihosting so before we do any
semihosting call we have to tell OpenOCD to enable semihosting. You can send
commands to OpenOCD using the monitor command.
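The command in question is shown below; OpenOCD should reply that
semihosting is enabled:

(gdb) monitor arm semihosting enable
semihosting is enabled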
You can see all the OpenOCD commands by invoking the monitor help
command.
Like before we can skip all the way to main using a breakpoint and the continue
command.
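First set the breakpoint:

(gdb) break main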
(gdb) continue
Continuing.
Note: automatically using hardware breakpoints for read-only addresses.
NOTE If GDB blocks the terminal instead of hitting the breakpoint after you
issue the continue command above, you might want to double check that
the memory region information in the memory.x file is correctly set up for
your device (both the starts and lengths).
Advancing the program with next should produce the same results as before.
(gdb) next
16 writeln!(stdout, "Hello, world!").unwrap();
(gdb) next
19 debug::exit(debug::EXIT_SUCCESS);
At this point you should see "Hello, world!" printed on the OpenOCD console,
among other stuff.
$ openocd
(..)
Info : halted: PC: 0x08000e6c
Hello, world!
Info : halted: PC: 0x08000d62
Info : halted: PC: 0x08000d64
Info : halted: PC: 0x08000d66
Info : halted: PC: 0x08000d6a
Info : halted: PC: 0x08000a0c
Info : halted: PC: 0x08000d70
Info : halted: PC: 0x08000d72
Issuing another next will make the processor execute debug::exit . This acts as a
breakpoint and halts the process:
(gdb) next
$ openocd
(..)
Info : halted: PC: 0x08001188
semihosting: *** application exited ***
Warn : target not halted
Warn : target not halted
target halted due to breakpoint, current mode: Thread
xPSR: 0x21000000 pc: 0x08000d76 msp: 0x20009fc0, semihosting
However, the process running on the microcontroller has not terminated and you
can resume it using continue or a similar command.
(gdb) quit
Debugging now requires a few more steps so we have packed all those steps into a
single GDB script named openocd.gdb . The file was created during the cargo
generate step, and should work without any modifications. Let's have a peek:
$ cat openocd.gdb
load
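Only the load command survived extraction here; the full openocd.gdb in the
quickstart template looks roughly like this (a sketch from memory — your copy of
the template is authoritative):

target extended-remote :3333

# print demangled symbols
set print asm-demangle on

# detect unhandled exceptions, hard faults and panics
break DefaultHandler
break HardFault
break rust_begin_unwind

monitor arm semihosting enable

load

# start the process but immediately halt the processor
stepi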
Alternatively, you can turn <gdb> -x openocd.gdb into a custom runner to make
cargo run build a program and start a GDB session. This runner is included in
.cargo/config but it's commented out.
[target.'cfg(all(target_arch = "arm", target_os = "none"))']
# uncomment ONE of these three options to make `cargo run` start a GDB session
# which option to pick depends on your system
# runner = "arm-none-eabi-gdb -x openocd.gdb"
# runner = "gdb-multiarch -x openocd.gdb"
# runner = "gdb -x openocd.gdb"
You may well find that the code you need to access the peripherals in your
microcontroller has already been written, at one of the following levels:
Micro-architecture crate: routines common to the processor core your
microcontroller uses, e.g. the cortex-m crate for Cortex-M cores.
Peripheral Access Crate (PAC): a thin wrapper over the memory-mapped
registers of your particular part number, e.g. tm4c123x or stm32f30x.
HAL crate: a friendlier API for your particular chip, often built on top of a PAC.
Board crate: ready-to-go abstractions for a specific board, like the
STM32F3DISCOVERY.
Board Crate
A board crate is the perfect starting point, if you're new to embedded Rust. They
nicely abstracts the HW details that might be overwelming when starting studying
this subject, and makes standard tasks easy, like turning a LED on or off. The
functionality they exposes varies a lot between boards. Since this book aims at
staying hardware agnostic, the board crates won't be covered by this book.
But if you're working on a system that doesn't yet have dedicated board crate, or
you need functionality not provided by existing crates, read on as we start from the
bottom, with the micro-architecture crates.
Micro-architecture crate
Let's look at the SysTick peripheral that's common to all Cortex-M based micro-
controllers. We can find a pretty low-level API in the cortex-m crate, and we can use
it like this:
#![no_std]
#![no_main]
use cortex_m::peripheral::{syst, Peripherals};
use cortex_m_rt::entry;
use panic_halt as _;
#[entry]
fn main() -> ! {
let peripherals = Peripherals::take().unwrap();
let mut systick = peripherals.SYST;
systick.set_clock_source(syst::SystClkSource::Core);
systick.set_reload(1_000);
systick.clear_current();
systick.enable_counter();
while !systick.has_wrapped() {
// Loop
}
loop {}
}
The functions on the SYST struct map pretty closely to the functionality defined by
the ARM Technical Reference Manual for this peripheral. There's nothing in this API
about 'delaying for X milliseconds' - we have to crudely implement that ourselves
using a while loop. Note that we can't access our SYST struct until we have called
Peripherals::take() - this is a special routine that guarantees that there is only
one SYST structure in our entire program. For more on that, see the Peripherals
section.
Peripheral Access Crate (PAC)
To go much further we'll need code specific to the particular microcontroller
we're using. In this example, let's assume we have a Texas Instruments TM4C123
and its svd2rust-generated tm4c123x PAC:
#![no_std]
#![no_main]
use panic_halt as _;
use cortex_m_rt::entry;
use tm4c123x;
#[entry]
pub fn main() -> ! {
    let _cp = cortex_m::Peripherals::take().unwrap();
    let p = tm4c123x::Peripherals::take().unwrap();

    let pwm = p.PWM0;
    pwm.ctl.write(|w| w.globalsync0().clear_bit());
    // `load` and `compa` take raw bit values, so these accessors are `unsafe`
    pwm._2_load.write(|w| unsafe { w.load().bits(263) });
    pwm._2_compa.write(|w| unsafe { w.compa().bits(64) });
    loop {}
}
We've accessed the PWM0 peripheral in exactly the same way as we accessed the
SYST peripheral earlier, except we called tm4c123x::Peripherals::take() . As
this crate was auto-generated using svd2rust, the access functions for our register
fields take a closure, rather than a numeric argument. While this looks like a lot of
code, the Rust compiler can use it to perform a bunch of checks for us, but then
generate machine-code which is pretty close to hand-written assembler! Where the
auto-generated code isn't able to determine that all possible arguments to a
particular accessor function are valid (for example, if the SVD defines the register
as 32-bit but doesn't say if some of those 32-bit values have a special meaning),
then the function is marked as unsafe . We can see this in the example above
when setting the load and compa sub-fields using the bits() function.
Reading
The read() function returns an object which gives read-only access to the various
sub-fields within this register, as defined by the manufacturer's SVD file for this
chip. You can find all the functions available on special R return type for this
particular register, in this particular peripheral, on this particular chip, in the
tm4c123x documentation.
if pwm.ctl.read().globalsync0().is_set() {
// Do a thing
}
Writing
The write() function takes a closure with a single argument. Typically we call this
w . This argument then gives read-write access to the various sub-fields within this
register, as defined by the manufacturer's SVD file for this chip. Again, you can find
all the functions available on the 'w' for this particular register, in this particular
peripheral, on this particular chip, in the tm4c123x documentation. Note that all of
the sub-fields that we do not set will be set to a default value for us - any existing
content in the register will be lost.
pwm.ctl.write(|w| w.globalsync0().clear_bit());
Modifying
If we wish to change only one particular sub-field in this register and leave the
other sub-fields unchanged, we can use the modify function. This function takes a
closure with two arguments - one for reading and one for writing. Typically we call
these r and w respectively. The r argument can be used to inspect the current
contents of the register, and the w argument can be used to modify the register
contents.
pwm.ctl.modify(|r, w| w.globalsync0().clear_bit());
The modify function really shows the power of closures here. In C, we'd have to
read into some temporary value, modify the correct bits and then write the value
back. This means there's considerable scope for error:
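Here's a sketch of that C pattern (the register addresses and bit positions below
are made up for illustration, not the real TM4C123x map):

#include <stdint.h>

/* hypothetical memory-mapped registers */
#define PWM0_CTL    (*(volatile uint32_t *)0x40028000u)
#define PWM0_ENABLE (*(volatile uint32_t *)0x40028008u)

void enable_pwm4(void) {
    uint32_t temp = PWM0_CTL;
    temp |= (1u << 0);       /* GLOBALSYNC0 */
    PWM0_CTL = temp;

    uint32_t temp2 = PWM0_ENABLE;
    temp2 |= (1u << 4);      /* PWM4EN */
    PWM0_ENABLE = temp;      /* Uh oh! Wrote the wrong variable! */
}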
Using a HAL crate
HAL crates sit on top of PACs and offer higher-level abstractions. Here's an
example that uses the tm4c123x-hal crate to configure a UART and print over it:
#![no_std]
#![no_main]
use panic_halt as _;
use core::fmt::Write;
use cortex_m_rt::entry;
use tm4c123x_hal as hal;
use tm4c123x_hal::prelude::*;
use tm4c123x_hal::serial::{NewlineMode, Serial};
use tm4c123x_hal::sysctl;
#[entry]
fn main() -> ! {
    let p = hal::Peripherals::take().unwrap();
    let _cp = hal::CorePeripherals::take().unwrap();

    // Wrap up the SYSCTL struct into an object with a higher-layer API
    let mut sc = p.SYSCTL.constrain();
    // Pick our oscillation settings
    sc.clock_setup.oscillator = sysctl::Oscillator::Main(
        sysctl::CrystalFrequency::_16mhz,
        sysctl::SystemClock::UsePll(sysctl::PllOutputFrequency::_80_00mhz),
    );
    // Configure the PLL with those settings
    let clocks = sc.clock_setup.freeze();

    // Wrap up GPIO_PORTA; it borrows `sc.power_control` so it can power up
    // the GPIO peripheral automatically
    let mut porta = p.GPIO_PORTA.split(&sc.power_control);

    // Activate the UART
    let mut uart = Serial::uart0(
        p.UART0,
        // The transmit and receive pins
        porta.pa1.into_af_push_pull::<hal::gpio::AF1>(&mut porta.control),
        porta.pa0.into_af_push_pull::<hal::gpio::AF1>(&mut porta.control),
        // No RTS or CTS required
        (),
        (),
        // The baud rate
        115200_u32.bps(),
        // Output handling
        NewlineMode::SwapLFtoCRLF,
        // We need the clock rates to calculate the baud rate divisors
        &clocks,
        // We need this to power up the UART peripheral
        &sc.power_control,
    );
    loop {
        writeln!(uart, "Hello, World!\r\n").unwrap();
    }
}
Semihosting
Semihosting is a mechanism that lets embedded devices do I/O on the host and is
mainly used to log messages to the host console. Semihosting requires a debug
session and pretty much nothing else (no extra wires!) so it's super convenient to
use. The downside is that it's super slow: each write operation can take several
milliseconds depending on the hardware debugger (e.g. ST-Link) you use.
#![no_main]
#![no_std]
use panic_halt as _;
use cortex_m_rt::entry;
use cortex_m_semihosting::hprintln;
#[entry]
fn main() -> ! {
hprintln!("Hello, world!").unwrap();
loop {}
}
If you run this program on hardware you'll see the "Hello, world!" message within
the OpenOCD logs.
$ openocd
(..)
Hello, world!
(..)
QEMU understands semihosting operations so the above program will also work
with qemu-system-arm without having to start a debug session. Note that you'll
need to pass the -semihosting-config flag to QEMU to enable semihosting
support; these flags are already included in the .cargo/config file of the
template.
There's also an exit semihosting operation that can be used to terminate the
QEMU process. Important: do not use debug::exit on hardware; this function
can corrupt your OpenOCD session and you will not be able to debug more
programs until you restart it.
#![no_main]
#![no_std]
use panic_halt as _;
use cortex_m_rt::entry;
use cortex_m_semihosting::debug;
#[entry]
fn main() -> ! {
let roses = "blue";
if roses == "red" {
debug::exit(debug::EXIT_SUCCESS);
} else {
debug::exit(debug::EXIT_FAILURE);
}
loop {}
}
$ cargo run
Running `qemu-system-arm (..)
$ echo $?
1
One last tip: you can set the panicking behavior to exit(EXIT_FAILURE) . This will
let you write no_std run-pass tests that you can run on QEMU.
For convenience, the panic-semihosting crate has an "exit" feature that when
enabled invokes exit(EXIT_FAILURE) after logging the panic message to the host
stderr.
#![no_main]
#![no_std]
use panic_semihosting as _; // features = ["exit"]
use cortex_m_rt::entry;
#[entry]
fn main() -> ! {
let roses = "blue";
assert_eq!(roses, "red");
loop {}
}
$ cargo run
Running `qemu-system-arm (..)
panicked at 'assertion failed: `(left == right)`
left: `"blue"`,
right: `"red"`', examples/hello.rs:15:5
$ echo $?
1
Panicking
Panicking is a core part of the Rust language. Built-in operations like indexing are
runtime checked for memory safety. When out of bounds indexing is attempted
this results in a panic.
In the standard library panicking has a defined behavior: it unwinds the stack of
the panicking thread, unless the user opted for aborting the program on panics.
Given that embedded systems range from user facing to safety critical (cannot
crash) there's no one size fits all panicking behavior but there are plenty of
commonly used behaviors. These common behaviors have been packaged into
crates that define the #[panic_handler] function. Some examples include:
You may be able to find even more crates searching for the panic-handler
keyword on crates.io.
A program can pick one of these behaviors simply by linking to the corresponding
crate. The fact that the panicking behavior is expressed in the source of an
application as a single line of code is not only useful as documentation but can also
be used to change the panicking behavior according to the compilation profile. For
example:
#![no_main]
#![no_std]
// ..
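The elided lines use conditional compilation to pick the handler:

// dev profile: easier to debug panics; can put a breakpoint on `rust_begin_unwind`
#[cfg(debug_assertions)]
use panic_halt as _;

// release profile: minimize the binary size of the application
#[cfg(not(debug_assertions))]
use panic_abort as _;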
In this example the crate links to the panic-halt crate when built with the dev
profile ( cargo build ), but links to the panic-abort crate when built with the
release profile ( cargo build --release ).
An example
Here's an example that tries to index an array beyond its length. The operation
results in a panic.
#![no_main]
#![no_std]
use panic_semihosting as _;
use cortex_m_rt::entry;
#[entry]
fn main() -> ! {
let xs = [0, 1, 2];
let i = xs.len() + 1;
let _y = xs[i]; // out of bounds access
loop {}
}
This example chose the panic-semihosting behavior which prints the panic
message to the host console using semihosting.
$ cargo run
Running `qemu-system-arm -cpu cortex-m3 -machine lm3s6965evb (..)
panicked at 'index out of bounds: the len is 3 but the index is 4',
src/main.rs:12:13
You can try changing the behavior to panic-halt and confirm that no message is
printed in that case.
Exceptions
Exceptions and interrupts are a hardware mechanism by which the processor
handles asynchronous events and fatal errors (e.g. executing an invalid
instruction). Exceptions imply preemption and involve exception handlers,
subroutines executed in response to the signal that triggered the event.
Other than the exception attribute exception handlers look like plain functions
but there's one more difference: exception handlers can not be called by
software. Following the previous example, the statement SysTick(); would result
in a compilation error.
This behavior is pretty much intended and it's required to provide a feature:
static mut variables declared inside exception handlers are safe to use.
#[exception]
fn SysTick() {
static mut COUNT: u32 = 0;
// `COUNT` has transformed to type `&mut u32` and it's safe to use
*COUNT += 1;
}
As you may know, using static mut variables in a function makes it non-reentrant.
It's undefined behavior to call a non-reentrant function, directly or indirectly, from
more than one exception / interrupt handler or from main and one or more
exception / interrupt handlers.
A complete example
Here's an example that uses the system timer to raise a SysTick exception
roughly every second. The SysTick exception handler keeps track of how many
times it has been called in the COUNT variable and then prints the value of COUNT
to the host console using semihosting.
NOTE: You can run this example on any Cortex-M device; you can also run it
on QEMU
#![deny(unsafe_code)]
#![no_main]
#![no_std]
use panic_halt as _;
use core::fmt::Write;
use cortex_m::peripheral::syst::SystClkSource;
use cortex_m_rt::{entry, exception};
use cortex_m_semihosting::{
debug,
hio::{self, HStdout},
};
#[entry]
fn main() -> ! {
    let p = cortex_m::Peripherals::take().unwrap();
    let mut syst = p.SYST;

    // configures the system timer to trigger a SysTick exception every second
    syst.set_clock_source(SystClkSource::Core);
    syst.set_reload(12_000_000); // 1 s period on the LM3S6965's 12 MHz clock
    syst.clear_current();
    syst.enable_counter();
    syst.enable_interrupt();

    loop {}
}
#[exception]
fn SysTick() {
static mut COUNT: u32 = 0;
static mut STDOUT: Option<HStdout> = None;
*COUNT += 1;
// Lazy initialization
if STDOUT.is_none() {
*STDOUT = hio::hstdout().ok();
}
    if let Some(stdout) = STDOUT.as_mut() {
        let _ = writeln!(stdout, "{}", *COUNT);
    }

    // IMPORTANT omit this `if` block if running on real hardware or your
    // debugger will end in an inconsistent state
if *COUNT == 9 {
// This will terminate the QEMU process
debug::exit(debug::EXIT_SUCCESS);
}
}
[dependencies]
cortex-m = "0.5.7"
cortex-m-rt = "0.6.3"
panic-halt = "0.2.0"
cortex-m-semihosting = "0.3.1"
If you run this on the Discovery board you'll see the output on the OpenOCD
console. Also, the program will not stop when the count reaches 9.
If left undefined, all exceptions are serviced by a catch-all handler that's
equivalent to an infinite loop:
fn DefaultHandler() {
    loop {}
}
You can override it using the exception attribute, in which case it also receives
the exception number:
#[exception]
fn DefaultHandler(irqn: i16) {
// custom default handler
}
The irqn argument indicates which exception is being serviced. A negative value
indicates that a Cortex-M exception is being serviced, while zero or a positive
value indicates that a device specific exception, AKA interrupt, is being serviced.
The HardFault handler, used for fatal errors, is also special: it must have the following signature:
fn(&ExceptionFrame) -> ! . The argument of the handler is a pointer to registers
that were pushed into the stack by the exception. These registers are a snapshot of
the processor state at the moment the exception was triggered and are useful to
diagnose a hard fault.
NOTE: This program won't work, i.e. it won't crash, on QEMU because qemu-
system-arm -machine lm3s6965evb doesn't check memory loads and will
happily return 0 on reads to invalid memory.
#![no_main]
#![no_std]
use panic_halt as _;
use core::fmt::Write;
use core::ptr;
use cortex_m_rt::{entry, exception, ExceptionFrame};
use cortex_m_semihosting::hio;
#[entry]
fn main() -> ! {
// read a nonexistent memory location
unsafe {
ptr::read_volatile(0x3FFF_FFFE as *const u32);
}
loop {}
}
#[exception]
fn HardFault(ef: &ExceptionFrame) -> ! {
if let Ok(mut hstdout) = hio::hstdout() {
writeln!(hstdout, "{:#?}", ef).ok();
}
loop {}
}
The HardFault handler prints the ExceptionFrame value. If you run this you'll see
something like this on the OpenOCD console.
$ openocd
(..)
ExceptionFrame {
r0: 0x3ffffffe,
r1: 0x00f00000,
r2: 0x20000000,
r3: 0x00000000,
r12: 0x00000000,
lr: 0x080008f7,
pc: 0x0800094a,
xpsr: 0x61000000
}
The pc value is the value of the Program Counter at the time of the exception and
it points to the instruction that triggered the exception.
You can look up the value of the program counter 0x0800094a in the disassembly.
You'll see that a load operation ( ldr r0, [r0] ) caused the exception. The r0
field of ExceptionFrame will tell you the value of register r0 was 0x3fff_fffe at
that time.
Interrupts
Interrupts differ from exceptions in a variety of ways but their operation and use is
largely similar and they are also handled by the same interrupt controller. Whereas
exceptions are defined by the Cortex-M architecture, interrupts are always vendor
(and often even chip) specific implementations, both in naming and functionality.
Interrupts do allow for a lot of flexibility which needs to be accounted for when
attempting to use them in an advanced way. We will not cover those uses in this
book, however it is a good idea to keep the following in mind:
The available interrupts (and their positions in the interrupt handler table) are
usually automatically generated via svd2rust from an SVD description.
Interrupt handlers look like plain functions (except for the lack of arguments)
similar to exception handlers. However they can not be called directly by other
parts of the firmware due to the special calling conventions. It is however possible
to generate interrupt requests in software to trigger a diversion to the interrupt
handler.
#[interrupt]
fn TIM2() {
    static mut COUNT: u32 = 0;

    // `COUNT` is transformed to type `&mut u32` here, just like in exception handlers
    *COUNT += 1;
}
For a more detailed description about the mechanisms demonstrated here please
refer to the exceptions section.
IO
Peripherals
Most Microcontrollers have more than just a CPU, RAM, or Flash Memory - they
contain sections of silicon which are used for interacting with systems outside of
the microcontroller, as well as directly and indirectly interacting with their
surroundings in the world via sensors, motor controllers, or human interfaces such
as a display or keyboard. These components are collectively known as Peripherals.
These peripherals are useful because they allow a developer to offload processing
to them, avoiding having to handle everything in software. Similar to how a
desktop developer would offload graphics processing to a video card, embedded
developers can offload some tasks to peripherals allowing the CPU to spend its
time doing something else important, or doing nothing in order to save power.
If you look at the main circuit board in an old-fashioned home computer from the
1970s or 1980s (and actually, the desktop PCs of yesterday are not so far removed
from the embedded systems of today) you would expect to see:
A processor
A RAM chip
A ROM chip
An I/O controller
The RAM chip, ROM chip and I/O controller (the peripheral in this system) would be
joined to the processor through a series of parallel traces known as a 'bus'. This
bus carries address information, which selects which device on the bus the
processor wishes to communicate with, and a data bus which carries the actual
data. In our embedded microcontrollers, the same principles apply - it's just that
everything is packed on to a single piece of silicon.
However, unlike graphics cards, which typically have a Software API like Vulkan,
Metal, or OpenGL, peripherals are exposed to our Microcontroller with a hardware
interface, which is mapped to a chunk of memory.
Although 32 bit microcontrollers have a real and linear address space from
0x0000_0000 to 0xFFFF_FFFF , they generally only use a few hundred kilobytes
of that range for actual memory. This leaves a significant amount of address space
remaining. In earlier chapters, we were talking about RAM being located at address
0x2000_0000 . If our RAM was 64 KiB long (i.e. with a maximum address of 0xFFFF)
then addresses 0x2000_0000 to 0x2000_FFFF would correspond to our RAM.
When we write to a variable which lives at address 0x2000_1234 , what happens
internally is that some logic detects the upper portion of the address (0x2000 in
this example) and then activates the RAM so that it can act upon the lower portion
of the address (0x1234 in this case). On a Cortex-M we also have our Flash ROM
mapped in at address 0x0000_0000 up to, say, address 0x0007_FFFF (if we have a
512 KiB Flash ROM). Rather than ignore all remaining space between these two
regions, Microcontroller designers instead mapped the interface for peripherals in
certain memory locations.
Interaction with these peripherals is simple at a first glance - write the right data to
the correct address. For example, sending a 32 bit word over a serial port could be
as direct as writing that 32 bit word to a certain memory address. The Serial Port
Peripheral would then take over and send out the data automatically.
This interface is how interactions with the hardware are made, no matter what
language is used, whether that language is Assembly, C, or Rust.
A First Attempt
The Registers
Let's look at the 'SysTick' peripheral - a simple timer which comes with every
Cortex-M processor core. Typically you'll be looking these up in the chip
manufacturer's data sheet or Technical Reference Manual, but this example is
common to all ARM Cortex-M cores, so let's look in the ARM reference manual. We
see there are four registers:
Offset  Name        Description
0x00    SYST_CSR    Control and Status Register
0x04    SYST_RVR    Reload Value Register
0x08    SYST_CVR    Current Value Register
0x0C    SYST_CALIB  Calibration Value Register
The C Approach
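The C version was elided here; a sketch of how it typically looks (the base
address 0xE000E010 is SysTick's real location on Cortex-M cores):

#include <stdint.h>

/* Map the four SysTick registers, in order, at the peripheral's base address */
typedef struct {
    volatile uint32_t CSR;
    volatile uint32_t RVR;
    volatile uint32_t CVR;
    volatile uint32_t CALIB;
} SysTick_t;

#define SYSTICK ((SysTick_t *) 0xE000E010)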
In Rust, we can represent a collection of registers in exactly the same way as we do
in C - with a struct .
#[repr(C)]
struct SysTick {
pub csr: u32,
pub rvr: u32,
pub cvr: u32,
pub calib: u32,
}
The qualifier #[repr(C)] tells the Rust compiler to lay this structure out like a C
compiler would. That's very important, as Rust allows structure fields to be re-
ordered, while C does not. You can imagine the debugging we'd have to do if these
fields were silently re-arranged by the compiler! With this qualifier in place, we
have our four 32-bit fields which correspond to the table above. But of course, this
struct is of no use by itself - we need a variable.
Volatile Accesses
Now, there are a few problems with this approach.
Now, the problem is that compilers are clever. If you make two writes to the same
piece of RAM, one after the other, the compiler can notice this and just skip the
first write entirely. In C, we can mark variables as volatile to ensure that every
read or write occurs as intended. In Rust, we instead mark the accesses as volatile,
not the variable.
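For example, using the SysTick struct defined above, a single volatile read might
look like this (a sketch; note the raw pointer cast and the unsafe block):

use core::ptr;

let systick = 0xE000_E010 as *mut SysTick;
// read_volatile forces a real load from the register on every call
let time = unsafe { ptr::read_volatile(&(*systick).cvr) };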
So, we've fixed one of our problems, but now we have even more unsafe
code! Fortunately, there's a third party crate which can help - volatile_register .
#[repr(C)]
struct SysTick {
pub csr: RW<u32>,
pub rvr: RW<u32>,
pub cvr: RW<u32>,
pub calib: RO<u32>,
}
Now, the volatile accesses are performed automatically through the read and
write methods. It's still unsafe to perform writes, but to be fair, hardware is a
bunch of mutable state and there's no way for the compiler to know whether these
writes are actually safe, so this is a good default position.
We can wrap this struct up in a higher-layer API that presents a safe interface to
our users, so they don't have to worry about it (provided they trust us to get it
right!).
#[repr(C)]
struct RegisterBlock {
pub csr: RW<u32>,
pub rvr: RW<u32>,
pub cvr: RW<u32>,
pub calib: RO<u32>,
}
pub struct SystemTimer {
    p: &'static mut RegisterBlock,
}

impl SystemTimer {
    pub fn new() -> SystemTimer {
        SystemTimer {
            p: unsafe { &mut *(0xE000_E010 as *mut RegisterBlock) },
        }
    }

    pub fn set_reload(&mut self, reload_value: u32) {
        unsafe { self.p.rvr.write(reload_value) }
    }
}
Now, the problem with this approach is that the following code is perfectly
acceptable to the compiler:
fn thread1() {
let mut st = SystemTimer::new();
st.set_reload(2000);
}
fn thread2() {
let mut st = SystemTimer::new();
st.set_reload(1000);
}
Our &mut self argument to the set_reload function checks that there are no
other references to that particular SystemTimer struct, but it doesn't stop the
user creating a second SystemTimer which points to the exact same peripheral!
Code written in this fashion will work if the author is diligent enough to spot all of
these 'duplicate' driver instances, but once the code is spread out over multiple
modules, drivers, developers, and days, it gets easier and easier to make these
kinds of mistakes.
These access rules (many simultaneous readers, or exactly one writer) sound
suspiciously similar to what the Borrow Checker does already!
But for the Borrow Checker to enforce them, we need to have exactly one instance
of each peripheral, so Rust can track ownership correctly. Luckily, in the hardware
there is only one instance of any given peripheral, but how can we expose that in
the structure of our code?
Singletons
In Rust, the naive approach would be a mutable global:
static mut THE_SERIAL_PORT: SerialPort = SerialPort;

fn main() {
    let _ = unsafe {
        THE_SERIAL_PORT.read_speed();
    };
}
But this has a few problems. It is a mutable global variable, and in Rust, these are
always unsafe to interact with. These variables are also visible across your whole
program, which means the borrow checker is unable to help you track references
and ownership of these variables.
Instead of just making our peripheral a global variable, we might instead decide to
make a global variable, in this case called PERIPHERALS , which contains an
Option<T> for each of our peripherals.
use core::mem::replace;

struct Peripherals {
serial: Option<SerialPort>,
}
impl Peripherals {
fn take_serial(&mut self) -> SerialPort {
let p = replace(&mut self.serial, None);
p.unwrap()
}
}
static mut PERIPHERALS: Peripherals = Peripherals {
serial: Some(SerialPort),
};
fn main() {
let serial_1 = unsafe { PERIPHERALS.take_serial() };
// This panics!
// let serial_2 = unsafe { PERIPHERALS.take_serial() };
}
Although interacting with this structure is unsafe , once we have the SerialPort it
contained, we no longer need to use unsafe , or the PERIPHERALS structure at all.
This has a small runtime overhead because we must wrap the SerialPort
structure in an option, and we'll need to call take_serial() once, however this
small up-front cost allows us to leverage the borrow checker throughout the rest of
our program.
#[macro_use(singleton)]
extern crate cortex_m;
fn main() {
// OK if `main` is executed only once
let x: &'static mut bool =
singleton!(: bool = false).unwrap();
}
See the cortex_m docs for more details on the singleton! macro.
Frameworks like cortex-m-rtic abstract this away for you: they take the
peripherals and hand them to your code exactly once:
// cortex-m-rtic v0.5.x
#[rtic::app(device = lm3s6965, peripherals = true)]
const APP: () = {
    #[init]
    fn init(cx: init::Context) {
        static mut X: u32 = 0;

        // Cortex-M peripherals
        let core: cortex_m::Peripherals = cx.core;

        // Device specific peripherals
        let device: lm3s6965::Peripherals = cx.device;
    }
};
But why?
But how do these Singletons make a noticeable difference in how our Rust code
works?
impl SerialPort {
const SER_PORT_SPEED_REG: *mut u32 = 0x4000_1000 as _;
fn read_speed(
&self // <------ This is really, really important
) -> u32 {
unsafe {
ptr::read_volatile(Self::SER_PORT_SPEED_REG)
}
}
}
Because we are using a singleton, there is only one way or place to obtain a
SerialPort structure
To call the read_speed() method, we must have ownership or a reference to
a SerialPort structure
These two factors put together means that it is only possible to access the
hardware if we have appropriately satisfied the borrow checker, meaning that at
no point do we have multiple mutable references to the same hardware!
fn main() {
    // missing reference to `self`! Won't work.
    // SerialPort::read_speed();
}
This is allowed to change hardware settings:
fn setup_spi_port(
spi: &mut SpiPort,
cs_pin: &mut GpioPin
) -> Result<()> {
// ...
}
This isn't, since it only holds a shared reference to the hardware:
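// a sketch: with only a shared reference, this function can read the pin
// state but can't reconfigure it (`is_high` is a hypothetical accessor)
fn read_button(gpio: &GpioPin) -> bool {
    gpio.is_high()
}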
This allows us to enforce whether code should or should not make changes to
hardware at compile time, rather than at runtime. As a note, this generally only
works across one application, but for bare metal systems, our software will be
compiled into a single application, so this is not usually a restriction.
Static Guarantees
Rust's type system prevents data races at compile time (see Send and Sync traits).
The type system can also be used to check other properties at compile time;
reducing the need for runtime checks in some cases.
When applied to embedded programs these static checks can be used, for example,
to enforce that configuration of I/O interfaces is done properly. For instance, one
can design an API where it is only possible to initialize a serial interface by first
configuring the pins that will be used by the interface.
One can also statically check that operations, like setting a pin low, can only be
performed on correctly configured peripherals. For example, trying to change the
output state of a pin configured in floating input mode would raise a compile error.
And, as seen in the previous chapter, the concept of ownership can be applied to
peripherals to ensure that only certain parts of a program can modify a peripheral.
This access control makes software easier to reason about compared to the
alternative of treating peripherals as global mutable state.
Typestate Programming
The concept of typestates describes the encoding of information about the current
state of an object into the type of that object. Although this can sound a little
arcane, if you have used the Builder Pattern in Rust, you have already started using
Typestate Programming!
impl FooBuilder {
pub fn new(starter: u32) -> Self {
Self {
a: starter,
b: starter,
}
    }

    // ..plus methods like `double_a` and `into_foo`; see the sketch below..
}
fn main() {
let x = foo_module::FooBuilder::new(10)
.double_a()
.into_foo();
println!("{:#?}", x);
}
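The rest of the module was elided above; here's a minimal sketch that makes the
example compile (the struct fields and the exact arithmetic inside double_a and
into_foo are illustrative, not necessarily the original's):

mod foo_module {
    #[derive(Debug)]
    pub struct Foo {
        a: u32,
        b: u32,
    }

    pub struct FooBuilder {
        a: u32,
        b: u32,
    }

    impl FooBuilder {
        pub fn new(starter: u32) -> Self {
            Self { a: starter, b: starter }
        }

        // consumes and returns the builder, so calls can be chained
        pub fn double_a(self) -> Self {
            Self { a: self.a * 2, b: self.b }
        }

        // consumes the builder, so it can't be reused afterwards
        pub fn into_foo(self) -> Foo {
            Foo { a: self.a, b: self.b }
        }
    }
}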
In this example, there is no direct way to create a Foo object. We must create a
FooBuilder , and properly initialize it before we can obtain the Foo object we
want.
Strong Types
Because Rust has a Strong Type System, there is no easy way to magically create
an instance of Foo , or to turn a FooBuilder into a Foo without calling the
into_foo() method. Additionally, calling the into_foo() method consumes the
original FooBuilder structure, meaning it can not be reused without the creation
of a new instance.
This allows us to represent the states of our system as types, and to include the
necessary actions for state transitions into the methods that exchange one type for
another. By creating a FooBuilder , and exchanging it for a Foo object, we have
walked through the steps of a basic state machine.
Disabled
Enabled
Configured as Output
Output: High
Output: Low
Configured as Input
Input: High Resistance
Input: Pulled Low
Input: Pulled High
If the peripheral starts in the Disabled mode, to move to the Input: High
Resistance mode, we must perform the following steps:
1. Disabled
2. Enabled
3. Configured as Input
4. Input: High Resistance
Similarly, if we want to move a GPIO pin from configured as Input: Pulled Low to
Output: High , we must perform the following steps:
1. Input: Pulled Low
2. Configured as Input
3. Configured as Output
4. Output: High
Hardware Representation
Typically the states listed above are set by writing values to given registers mapped
to a GPIO peripheral. Let's define an imaginary GPIO Configuration Register to
illustrate this:
Bit Number(s)  Name          Value  Meaning    Notes
0              enable        0      disabled   Disables the GPIO
                             1      enabled    Enables the GPIO
1              direction     0      input      Sets the direction to Input
                             1      output     Sets the direction to Output
2..3           input_mode    00     hi-z       Sets the input as high resistance
                             01     pull-low   Input pin is pulled low
                             10     pull-high  Input pin is pulled high
                             11     n/a        Invalid state. Do not set
4              output_mode   0      set-low    Output pin is driven low
                             1      set-high   Output pin is driven high
5              input_status  x      in-val     0 if input is < 1.5v, 1 if input >= 1.5v
struct GpioConfig {
    periph: GPIO_CONFIG, // register block, e.g. generated by svd2rust
}

impl GpioConfig {
    pub fn set_enable(&mut self, is_enabled: bool) {
        self.periph.modify(|_r, w| w.enable().set_bit(is_enabled));
    }
    // ... similar unchecked setters for direction, input_mode and
    // output_mode, and a getter for input_status.
}
However, this would also allow us to modify certain fields in ways that make
no sense. For example, what happens if we set the output_mode field while our
GPIO is configured as an input?
In general, use of this structure would allow us to reach states not defined by our
state machine above: e.g. an output that is pulled low, or an input that is set high.
For some hardware, this may not matter. On other hardware, it could cause
unexpected or undefined behavior!
Design Contracts
In our last chapter, we wrote an interface that didn't enforce design contracts. Let's
take another look at our imaginary GPIO configuration register:
Name         | Bit Number(s) | Value | Meaning   | Notes
enable       | 0             | 0     | disabled  | Disables the GPIO
             |               | 1     | enabled   | Enables the GPIO
direction    | 1             | 0     | input     | Sets the direction to Input
             |               | 1     | output    | Sets the direction to Output
input_mode   | 2..3          | 00    | hi-z      | Sets the input as high resistance
             |               | 01    | pull-low  | Input pin is pulled low
             |               | 10    | pull-high | Input pin is pulled high
             |               | 11    | n/a       | Invalid state. Do not set
output_mode  | 4             | 0     | set-low   | Output pin is driven low
             |               | 1     | set-high  | Output pin is driven high
input_status | 5             | x     | in-val    | 0 if input is < 1.5v, 1 if input >= 1.5v
If we instead checked the state before making use of the underlying hardware,
enforcing our design contracts at runtime, we might write code that looks like this
instead:
impl GpioConfig {
    pub fn set_enable(&mut self, is_enabled: bool) {
        self.periph.modify(|_r, w| {
            w.enable().set_bit(is_enabled)
        });
    }

    pub fn set_direction(&mut self, is_output: bool) -> Result<(), ()> {
        if self.periph.read().enable().bit_is_clear() {
            // Must be enabled to set direction
            return Err(());
        }

        self.periph.modify(|_r, w| {
            w.direction().set_bit(is_output)
        });

        Ok(())
    }

    pub fn set_input_mode(&mut self, variant: InputMode) -> Result<(), ()> {
        if self.periph.read().enable().bit_is_clear() {
            // Must be enabled to set input mode
            return Err(());
        }

        if self.periph.read().direction().bit_is_set() {
            // Direction must be input
            return Err(());
        }

        self.periph.modify(|_r, w| {
            w.input_mode().variant(variant)
        });

        Ok(())
    }

    pub fn set_output_status(&mut self, is_high: bool) -> Result<(), ()> {
        if self.periph.read().enable().bit_is_clear() {
            // Must be enabled to set output status
            return Err(());
        }

        if self.periph.read().direction().bit_is_clear() {
            // Direction must be output
            return Err(());
        }

        self.periph.modify(|_r, w| {
            w.output_mode().set_bit(is_high)
        });

        Ok(())
    }

    pub fn get_input_status(&self) -> Result<bool, ()> {
        if self.periph.read().enable().bit_is_clear() {
            // Must be enabled to get input status
            return Err(());
        }

        if self.periph.read().direction().bit_is_set() {
            // Direction must be input
            return Err(());
        }

        Ok(self.periph.read().input_status().bit_is_set())
    }
}
Type States
But what if instead, we used Rust's type system to enforce the state transition
rules? Take this example:
/// GPIO interface, parameterised by zero sized type states
struct GpioConfig<ENABLED, DIRECTION, MODE> {
    periph: GPIO_CONFIG,
    enabled: ENABLED,
    direction: DIRECTION,
    mode: MODE,
}

// Type states
struct Disabled;
struct Enabled;
struct Output;
struct Input;
struct PulledHigh;
struct HighZ;
struct DontCare;

/// These functions may be used on any GPIO pin
impl<EN, DIR, MODE> GpioConfig<EN, DIR, MODE> {
    // ... plus into_disabled, into_enabled_input, etc.

    pub fn into_enabled_output(self) -> GpioConfig<Enabled, Output, DontCare> {
        self.periph.modify(|_r, w| {
            w.enable().enabled()
             .direction().output()
             .output_mode().set_high()
        });
        GpioConfig {
            periph: self.periph,
            enabled: Enabled,
            direction: Output,
            mode: DontCare,
        }
    }
}

/// These functions may only be used on an enabled input pin
impl<MODE> GpioConfig<Enabled, Input, MODE> {
    pub fn into_input_pull_high(self) -> GpioConfig<Enabled, Input, PulledHigh> {
        self.periph.modify(|_r, w| w.input_mode().pull_high());
        GpioConfig {
            periph: self.periph,
            enabled: Enabled,
            direction: Input,
            mode: PulledHigh,
        }
    }
}
Now let's see what the code using this would look like:
/*
 * Example 1: Unconfigured to High-Z input
 */
let pin: GpioConfig<Disabled, _, _> = get_gpio();

// Can't do this, the pin isn't enabled!
// pin.into_input_pull_down(); // Error!

// Now turn the pin from unconfigured to a high-z input
let input_pin = pin.into_enabled_input();

/*
 * Example 2: High-Z input to Pulled Low input
 */
let pulled_low = input_pin.into_input_pull_down();
let pin_state = pulled_low.bit_is_set();

/*
 * Example 3: Pulled Low input to Output, set high
 */
let output_pin = pulled_low.into_enabled_output();
output_pin.set_bit(true);

// Can't do this, output pins don't have this interface!
// output_pin.into_input_pull_down(); // Error!
This is definitely a convenient way to store the state of the pin, but why do it this
way? Why is this better than storing the state as an enum inside of our GpioConfig
structure?
Because these states are enforced by the type system, there is no longer room
for errors by consumers of this interface: if they try to perform an illegal
state transition, the code will not compile!
use core::mem::size_of;
let _ = size_of::<Enabled>(); // == 0
let _ = size_of::<Input>(); // == 0
let _ = size_of::<PulledHigh>(); // == 0
let _ = size_of::<GpioConfig<Enabled, Input, PulledHigh>>(); // == 0
struct Enabled;
Structures defined like this are called Zero Sized Types, as they contain no
actual data. Although these types act "real" at compile time (you can copy
them, move them, take references to them, etc.), the optimizer will
completely strip them away.
The GpioConfig we return never exists at runtime. Calling this function will
generally boil down to a single assembly instruction - storing a constant register
value to a register location. This means that the type state interface we've
developed is a zero cost abstraction - it uses no more CPU, RAM, or code space
tracking the state of GpioConfig , and renders to the same machine code as a
direct register access.
Nesting
In general, these abstractions may be nested as deeply as you would like. As long
as all components used are zero sized types, the whole structure will not exist at
runtime.
For complex or deeply nested structures, it may be tedious to define all possible
combinations of state. In these cases, macros may be used to generate all
implementations.
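As a sketch of that idea, a macro-by-example can stamp out the repetitive
state definitions. The impl_pin_state macro and PinState trait below are
hypothetical names, not part of any published HAL:

mod pin_states {
    pub trait PinState {}

    // Hypothetical helper: generates one unit struct plus its trait impl
    // per listed state, instead of writing each item out by hand.
    macro_rules! impl_pin_state {
        ($($state:ident),*) => {
            $(
                pub struct $state;
                impl PinState for $state {}
            )*
        };
    }

    impl_pin_state!(Disabled, Enabled, Input, Output, PulledHigh);
}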
Portability
In embedded environments portability is a very important topic: every vendor,
and even each family from a single manufacturer, offers different peripherals
and capabilities, and similarly the ways to interact with the peripherals will
vary.
Embedded systems are a bit special in this regard: we typically do not have
operating systems and user-installable software, but firmware images which are
compiled as a whole, along with a number of other constraints. So while the
traditional approach to portability as defined by Wikipedia could potentially
work, it is likely not the most productive approach.
What is embedded-hal?
In a nutshell it is a set of traits which define implementation contracts between
HAL implementations, drivers and applications (or firmwares). Those contracts
include both capabilities (i.e. if a trait is implemented for a certain type, the HAL
implementation provides a certain capability) and methods (i.e. if you can
construct a type implementing a trait it is guaranteed that you have the methods
specified in the trait available).
[Diagram: the Application uses the embedded-hal Traits; a Hardware
Abstraction Layer implements those traits on top of a Peripheral Access
Crate, which in turn talks to the Microcontroller.]
The main reason for having the embedded-hal traits, and crates implementing
and using them, is to keep complexity in check. If you consider that an
application might have to implement the use of the peripheral in the hardware,
as well as the application itself and potentially drivers for additional
hardware components, then it should be easy to see that the re-usability is
very limited. Expressed mathematically, if M is the number of peripheral HAL
implementations and N the number of drivers, then if we were to reinvent the
wheel for every application we would end up with M*N implementations, while
using the API provided by the embedded-hal traits makes the implementation
complexity approach M+N. Of course, there are additional benefits to be had,
such as less trial-and-error thanks to well-defined and ready-to-use APIs.
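To make the M+N argument concrete, here is a sketch in the spirit of
embedded-hal's digital OutputPin trait (simplified; the real trait lives in
the embedded-hal crate) and a driver written against it:

// Simplified contract in the spirit of embedded-hal's OutputPin.
pub trait OutputPin {
    type Error;
    fn set_high(&mut self) -> Result<(), Self::Error>;
    fn set_low(&mut self) -> Result<(), Self::Error>;
}

// A driver written once against the trait (one of the "N")...
pub struct Led<P: OutputPin> {
    pin: P,
}

impl<P: OutputPin> Led<P> {
    pub fn on(&mut self) -> Result<(), P::Error> {
        self.pin.set_high()
    }
}

// ...works with any pin type from any HAL implementation (the "M")
// that implements the trait.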
HAL implementation
A HAL implementation provides the interfacing between the hardware and the
users of the HAL traits. Typical implementations consist of three parts:
Driver
Application
The application binds the various parts together and ensures that the desired
functionality is achieved. When porting between different systems, this is the part
which requires the most adaptation efforts, since the application needs to correctly
initialize the real hardware via the HAL implementation and the initialisation of
different hardware differs, sometimes drastically so. Also the user choice often
Concurrency
Concurrency happens whenever different parts of your program might execute at
different times or out of order. In an embedded context, this includes
interrupt handlers, which run whenever the associated interrupt happens;
various forms of multithreading, where your microprocessor regularly swaps
between parts of your program; and, in some systems, multiple-core processors,
where each core can independently run a different part of your program at the
same time.
Since many embedded programs need to deal with interrupts, concurrency will
usually come up sooner or later, and it's also where many subtle and difficult
bugs can occur. Luckily, Rust provides a number of abstractions and safety
guarantees to help us write correct code.
No Concurrency
The simplest concurrency for an embedded program is no concurrency: your
software consists of a single main loop which just keeps running, and there are no
interrupts at all. Sometimes this is perfectly suited to the problem at hand!
Typically your loop will read some inputs, perform some processing, and write
some outputs.
#[entry]
fn main() -> ! {
let peripherals = setup_peripherals();
loop {
let inputs = read_inputs(&peripherals);
let outputs = process(inputs);
write_outputs(&peripherals, outputs);
}
}
To share state with an interrupt handler, the usual first idea is a global
variable, declared in Rust as a static mut. In Rust, such static mut
variables are always unsafe to read or write, because without taking special
care, you might trigger a race condition, where your access to the variable is
interrupted halfway through by an interrupt which also accesses that variable.
For an example of how this behaviour can cause subtle errors in your code,
consider an embedded program which counts rising edges of some input signal in
each one-second period (a frequency counter):
static mut COUNTER: u32 = 0;

#[entry]
fn main() -> ! {
    set_timer_1hz();
    let mut last_state = false;
    loop {
        let state = read_signal_level();
        if state && !last_state {
            // DANGER - Not actually safe! Could cause data races.
            unsafe { COUNTER += 1 };
        }
        last_state = state;
    }
}

#[interrupt]
fn timer() {
    unsafe { COUNTER = 0; }
}
Each second, the timer interrupt sets the counter back to 0. Meanwhile, the
main loop continually measures the signal, and increments the counter when it
sees a change from low to high. We've had to use unsafe to access COUNTER ,
as it's static mut , and that means we're promising the compiler we won't
cause any undefined behaviour. Can you spot the race condition? The increment
on COUNTER is not guaranteed to be atomic — in fact, on most embedded
platforms, it will be split into a load, then the increment, then a store. If
the interrupt fired after the load but before the store, the reset back to 0
would be ignored after the interrupt returns — and we would count twice as
many transitions for that period.
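To make the race concrete, one possible interleaving looks like this
(illustrative pseudo-assembly, not actual compiler output):

// main:      load COUNTER into a register     ; register = 5
// interrupt: store 0 to COUNTER               ; COUNTER = 0
// main:      add 1 to the register            ; register = 6
// main:      store the register to COUNTER    ; COUNTER = 6, reset lost!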
Critical Sections
So, what can we do about data races? A simple approach is to use critical sections, a
context where interrupts are disabled. By wrapping the access to COUNTER in main
in a critical section, we can be sure the timer interrupt will not fire until we're
finished incrementing COUNTER :
static mut COUNTER: u32 = 0;

#[entry]
fn main() -> ! {
    set_timer_1hz();
    let mut last_state = false;
    loop {
        let state = read_signal_level();
        if state && !last_state {
            // New critical section ensures synchronised access to COUNTER
            cortex_m::interrupt::free(|_| {
                unsafe { COUNTER += 1 };
            });
        }
        last_state = state;
    }
}
#[interrupt]
fn timer() {
unsafe { COUNTER = 0; }
}
This is done using the cortex_m::interrupt::free function, which runs the
given closure inside a critical section: it is the same as disabling
interrupts, running some code, and then re-enabling interrupts.
Note we didn't need to put a critical section inside the timer interrupt, for
two reasons: writing 0 to COUNTER can't be affected by a race since we don't
read it, and the handler will never be interrupted by the main thread anyway,
since the main thread never preempts an interrupt.
If COUNTER was being shared by multiple interrupt handlers that might preempt
each other, then each one might require a critical section as well.
This solves our immediate problem, but we're still left writing a lot of unsafe code
which we need to carefully reason about, and we might be using critical sections
needlessly. Since each critical section temporarily pauses interrupt processing,
there is an associated cost of some extra code size and higher interrupt latency
and jitter (interrupts may take longer to be processed, and the time until they are
processed will be more variable). Whether this is a problem depends on your
system, but in general, we'd like to avoid it.
It's worth noting that while a critical section guarantees no interrupts will fire, it
does not provide an exclusivity guarantee on multi-core systems! The other core
could be happily accessing the same memory as your core, even without
interrupts. You will need stronger synchronisation primitives if you are using
multiple cores.
Atomic Access
On some platforms, atomic instructions are available, which provide guarantees
about read-modify-write operations. Specifically for Cortex-M, thumbv6
(Cortex-M0) does not provide atomic read-modify-write instructions, while
thumbv7 (Cortex-M3 and above) does. These instructions give an alternative to
the heavy-handed disabling of all
interrupts: we can attempt the increment, it will succeed most of the time, but if it
was interrupted it will automatically retry the entire increment operation. These
atomic operations are safe even across multiple cores.
use core::sync::atomic::{AtomicUsize, Ordering};

static COUNTER: AtomicUsize = AtomicUsize::new(0);

#[entry]
fn main() -> ! {
    set_timer_1hz();
    let mut last_state = false;
    loop {
        let state = read_signal_level();
        if state && !last_state {
            // Use `fetch_add` to atomically add 1 to COUNTER
            COUNTER.fetch_add(1, Ordering::Relaxed);
        }
        last_state = state;
    }
}

#[interrupt]
fn timer() {
    // Use `store` to write 0 directly to COUNTER
    COUNTER.store(0, Ordering::Relaxed)
}
This time COUNTER is a safe static variable. Thanks to the AtomicUsize type
COUNTER can be safely modified from both the interrupt handler and the main
thread without disabling interrupts. When possible, this is a better solution — but
it may not be supported on your platform.
A note on Ordering : this affects how the compiler and hardware may reorder
instructions, and also has consequences on cache visibility. Assuming that the
target is a single core platform Relaxed is sufficient and the most efficient choice
in this particular case. Stricter ordering will cause the compiler to emit memory
barriers around the atomic operations; depending on what you're using atomics
for you may or may not need this! The precise details of the atomic model are
complicated and best described elsewhere.
We can do better in Rust by abstracting our counter into a safe interface
which can be used safely anywhere else in our code. For this example, we'll
use the critical-section counter, but you could do something very similar with
atomics.
use core::cell::UnsafeCell;
use cortex_m::interrupt;

struct CSCounter(UnsafeCell<u32>);

impl CSCounter {
    pub fn reset(&self, _cs: &interrupt::CriticalSection) {
        // Requiring a CriticalSection proves we are operating inside one,
        // so this unsafe block (required to call UnsafeCell::get) cannot
        // race with other access to the cell.
        unsafe { *self.0.get() = 0 };
    }

    pub fn increment(&self, _cs: &interrupt::CriticalSection) {
        unsafe { *self.0.get() += 1 };
    }
}

// Required to allow a static CSCounter (see the discussion of Sync below).
unsafe impl Sync for CSCounter {}

static COUNTER: CSCounter = CSCounter(UnsafeCell::new(0));
#[entry]
fn main() -> ! {
set_timer_1hz();
let mut last_state = false;
loop {
let state = read_signal_level();
if state && !last_state {
// No unsafe here!
interrupt::free(|cs| COUNTER.increment(cs));
}
last_state = state;
}
}
#[interrupt]
fn timer() {
    // We do need to enter a critical section here just to obtain a valid
    // cs token, even though we know no other interrupt could pre-empt
    // this one.
    interrupt::free(|cs| COUNTER.reset(cs));
}
We've moved our unsafe code to inside our carefully-planned abstraction, and
now our application code does not contain any unsafe blocks.
This design requires that the application pass a CriticalSection token in: these
tokens are only safely generated by interrupt::free , so by requiring one be
passed in, we ensure we are operating inside a critical section, without having to
actually do the lock ourselves. This guarantee is provided statically by the compiler:
there won't be any runtime overhead associated with cs . If we had multiple
counters, they could all be given the same cs , without requiring multiple nested
critical sections.
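For instance, assuming a second hypothetical CSCounter static named
COUNTER_B, a single critical section suffices for both updates:

interrupt::free(|cs| {
    COUNTER.increment(cs);
    COUNTER_B.increment(cs); // same token, no nested critical section
});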
This also brings up an important topic for concurrency in Rust: the Send and Sync
traits. To summarise the Rust book, a type is Send when it can safely be moved to
another thread, while it is Sync when it can be safely shared between multiple
threads. In an embedded context, we consider interrupts to be executing in a
separate thread to the application code, so variables accessed by both an interrupt
and the main code must be Sync.
For most types in Rust, both of these traits are automatically derived for you by the
compiler. However, because CSCounter contains an UnsafeCell , it is not Sync,
and therefore we could not make a static CSCounter : static variables must be
Sync, since they can be accessed by multiple threads.
To tell the compiler we have taken care that the CSCounter is in fact safe to share
between threads, we implement the Sync trait explicitly. As with the previous use
of critical sections, this is only safe on single-core platforms: with multiple cores,
you would need to go to greater lengths to ensure safety.
Mutexes
We've created a useful abstraction specific to our counter problem, but there
are many common abstractions used for concurrency.
One such synchronisation primitive is a mutex, short for mutual exclusion. These
constructs ensure exclusive access to a variable, such as our counter. A thread can
attempt to lock (or acquire) the mutex, and either succeeds immediately, or blocks
waiting for the lock to be acquired, or returns an error that the mutex could not be
locked. While that thread holds the lock, it is granted access to the protected data.
When the thread is done, it unlocks (or releases) the mutex, allowing another thread
to lock it. In Rust, we would usually implement the unlock using the Drop trait to
ensure it is always released when the mutex goes out of scope.
Using a mutex with interrupt handlers can be tricky: it is not normally acceptable
for the interrupt handler to block, and it would be especially disastrous for it to
block waiting for the main thread to release a lock, since we would then deadlock
(the main thread will never release the lock because execution stays in the
interrupt handler). Deadlocking is not considered unsafe: it is possible even in safe
Rust.
This is in fact done for us in the cortex_m crate! We could have written our
counter using it:

use core::cell::Cell;
use cortex_m::interrupt::{self, Mutex};

static COUNTER: Mutex<Cell<u32>> = Mutex::new(Cell::new(0));
#[entry]
fn main() -> ! {
set_timer_1hz();
let mut last_state = false;
loop {
let state = read_signal_level();
if state && !last_state {
interrupt::free(|cs|
COUNTER.borrow(cs).set(COUNTER.borrow(cs).get() + 1));
}
last_state = state;
}
}
#[interrupt]
fn timer() {
    // We still need to enter a critical section here to satisfy the Mutex.
    interrupt::free(|cs| COUNTER.borrow(cs).set(0));
}
We're now using Cell , which along with its sibling RefCell is used to provide
safe interior mutability. We've already seen UnsafeCell which is the bottom layer
of interior mutability in Rust: it allows you to obtain multiple mutable references to
its value, but only with unsafe code. A Cell is like an UnsafeCell but it provides a
safe interface: it only permits taking a copy of the current value or replacing it, not
taking a reference, and since it is not Sync, it cannot be shared between threads.
These constraints mean it's safe to use, but we couldn't use it directly in a static
variable as a static must be Sync.
So why does the example above work? The Mutex<T> implements Sync for any T
which is Send — such as a Cell . It can do this safely because it only gives access
to its contents during a critical section. We're therefore able to get a safe counter
with no unsafe code at all!
This is great for simple types like the u32 of our counter, but what about more
complex types which are not Copy? An extremely common example in an
embedded context is a peripheral struct, which generally is not Copy. For that, we
can turn to RefCell .
Sharing Peripherals
Device crates generated using svd2rust and similar abstractions provide safe
access to peripherals by enforcing that only one instance of the peripheral struct
can exist at a time. This ensures safety, but makes it difficult to access a peripheral
from both the main thread and an interrupt handler.
To safely share peripheral access, we can use the Mutex we saw before. We'll also
need to use RefCell , which uses a runtime check to ensure only one reference to
a peripheral is given out at a time. This has more overhead than the plain Cell ,
but since we are giving out references rather than copies, we must be sure only
one exists at a time.
Finally, we'll also have to account for somehow moving the peripheral into the
shared variable after it has been initialised in the main code. To do this we can use
the Option type, initialised to None and later set to the instance of the peripheral.
use core::cell::RefCell;
use cortex_m::interrupt::{self, Mutex};
use stm32f4::stm32f405;

static MY_GPIO: Mutex<RefCell<Option<stm32f405::GPIOA>>> =
    Mutex::new(RefCell::new(None));

#[entry]
fn main() -> ! {
    // Obtain the peripheral singletons and configure them.
    // This example is from an svd2rust-generated crate, but
    // most embedded device crates will be similar.
    let dp = stm32f405::Peripherals::take().unwrap();
    let gpioa = &dp.GPIOA;

    // ... configure the pins via `gpioa`, then move the peripheral
    // into the shared variable (explained below):
    interrupt::free(|cs| MY_GPIO.borrow(cs).replace(Some(dp.GPIOA)));

    loop {
        // ... access MY_GPIO via the Mutex, as shown below.
    }
}

#[interrupt]
fn timer() {
    // ... access MY_GPIO inside a critical section, as shown below.
}
That's quite a lot to take in, so let's break down the important lines.
Our shared variable is now a Mutex around a RefCell which contains an Option .
The Mutex ensures we only have access during a critical section, and therefore
makes the variable Sync, even though a plain RefCell would not be Sync. The
RefCell gives us interior mutability with references, which we'll need to use our
GPIOA . The Option lets us initialise this variable to something empty, and only
later actually move the variable in. We cannot access the peripheral singleton
statically, only at runtime, so this is required.
interrupt::free(|cs| MY_GPIO.borrow(cs).replace(Some(dp.GPIOA)));
Inside a critical section we can call borrow() on the mutex, which gives us a
reference to the RefCell . We then call replace() to move our new value into the
RefCell .
interrupt::free(|cs| {
let gpioa = MY_GPIO.borrow(cs).borrow();
gpioa.as_ref().unwrap().odr.modify(|_, w| w.odr1().set_bit());
});
Finally, we use MY_GPIO in a safe and concurrent fashion. The critical section
prevents the interrupt firing as usual, and lets us borrow the mutex. The RefCell
then gives us an &Option<GPIOA> , and tracks how long it remains borrowed - once
that reference goes out of scope, the RefCell will be updated to indicate it is no
longer borrowed.
Since we can't move the GPIOA out of the &Option , we need to convert it to an
&Option<&GPIOA> with as_ref() , which we can finally unwrap() to obtain the
&GPIOA which lets us modify the peripheral.
If we need a mutable reference to a shared resource, then borrow_mut and
deref_mut should be used instead. The following code shows an example using
the TIM2 timer.
use core::cell::RefCell;
use core::ops::DerefMut;
use cortex_m::interrupt::{self, Mutex};
use cortex_m::asm::wfi;
use stm32f4::stm32f405;

static G_TIM: Mutex<RefCell<Option<Timer<stm32f405::TIM2>>>> =
    Mutex::new(RefCell::new(None));

#[entry]
fn main() -> ! {
    let mut cp = cm::Peripherals::take().unwrap();
    let dp = stm32f405::Peripherals::take().unwrap();

    // Some sort of timer configuration function. Assume it configures
    // the TIM2 timer, its NVIC interrupt, and starts the timer.
    let tim = configure_timer_interrupt(&mut cp, dp);

    interrupt::free(|cs| {
        G_TIM.borrow(cs).replace(Some(tim));
    });

    loop {
        wfi();
    }
}

#[interrupt]
fn timer() {
    interrupt::free(|cs| {
        if let Some(ref mut tim) = G_TIM.borrow(cs).borrow_mut().deref_mut() {
            tim.start(1.hz());
        }
    });
}
Whew! This is safe, but it is also a little unwieldy. Is there anything else we can do?
RTIC
One alternative is the RTIC framework, short for Real Time Interrupt-driven
Concurrency. It enforces static priorities and tracks accesses to static mut
variables ("resources") to statically ensure that shared resources are always
accessed safely, without requiring the overhead of always entering critical sections
and using reference counting (as in RefCell ). This has a number of advantages
such as guaranteeing no deadlocks and giving extremely low time and memory
overhead.
The framework also includes other features like message passing, which reduces
the need for explicit shared state, and the ability to schedule tasks to run at a given
time, which can be used to implement periodic tasks. Check out the
documentation for more information!
Real Time Operating Systems
At the time of writing, there are not many Rust RTOS examples to point to, but
it's an interesting area, so watch this space!
Multiple Cores
It is becoming more common to have two or more cores in embedded processors,
which adds an extra layer of complexity to concurrency. All the examples using a
critical section (including the cortex_m::interrupt::Mutex ) assume the only other
execution thread is the interrupt thread, but on a multi-core system that's no
longer true. Instead, we'll need synchronisation primitives designed for multiple
cores (also called SMP, for symmetric multi-processing).
These typically use the atomic instructions we saw earlier, since the processing
system will ensure that atomicity is maintained over all cores.
Covering these topics in detail is currently beyond the scope of this book, but the
general patterns are the same as for the single-core case.
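As an illustration only (real code should use a vetted crate such as spin,
and note that this will not compile for thumbv6 targets, which lack
compare-and-swap), a minimal test-and-set spinlock built on those atomic
instructions might look like:

use core::sync::atomic::{AtomicBool, Ordering};

pub struct SpinLock {
    locked: AtomicBool,
}

impl SpinLock {
    pub const fn new() -> Self {
        SpinLock { locked: AtomicBool::new(false) }
    }

    pub fn lock(&self) {
        // Spin until we atomically change `locked` from false to true.
        while self
            .locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {}
    }

    pub fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}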
Collections
Eventually you'll want to use dynamic data structures (AKA collections) in your
program. std provides a set of common collections: Vec , String , HashMap , etc.
All the collections implemented in std use a global dynamic memory allocator
(AKA the heap).
If you need collections, a heap allocated implementation is not your only option.
You can also use fixed capacity collections; one such implementation can be found
in the heapless crate.
Using alloc
The alloc crate is shipped with the standard Rust distribution. To import the
crate you can directly use it without declaring it as a dependency in your
Cargo.toml file.
#![feature(alloc)]
use alloc::vec::Vec;
To be able to use any collection you'll first need to use the
global_allocator attribute to declare the global allocator your program will
use. It's required that the allocator you select implements the GlobalAlloc
trait.
For completeness and to keep this section self-contained, we'll implement a
simple bump pointer allocator and use that as the global allocator. However,
we strongly suggest you use a battle-tested allocator from crates.io in your
program instead of this allocator.
use core::alloc::{GlobalAlloc, Layout};
use core::cell::UnsafeCell;
use core::ptr;
use cortex_m::interrupt;

// Bump pointer allocator for *single* core systems.
struct BumpPointerAlloc {
    head: UnsafeCell<usize>,
    end: usize,
}

#[global_allocator]
static HEAP: BumpPointerAlloc = BumpPointerAlloc {
    head: UnsafeCell::new(0x2000_0100),
    end: 0x2000_0200,
};
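The extracted snippet above omits the GlobalAlloc implementation itself; a
minimal sketch consistent with the declarations above (the exact rounding
arithmetic here is a reconstruction) is:

// Safety: `head` is only accessed inside a critical section.
unsafe impl Sync for BumpPointerAlloc {}

unsafe impl GlobalAlloc for BumpPointerAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // The critical section makes the allocator usable from interrupts
        // on single core systems.
        interrupt::free(|_| {
            let head = self.head.get();
            // Round the current head up to the requested alignment.
            let start = (*head + layout.align() - 1) & !(layout.align() - 1);
            if start + layout.size() > self.end {
                // A null pointer signals an Out Of Memory condition.
                ptr::null_mut()
            } else {
                *head = start + layout.size();
                start as *mut u8
            }
        })
    }

    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        // This allocator never deallocates memory.
    }
}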
Apart from selecting a global allocator the user will also have to define how Out Of
Memory (OOM) errors are handled using the unstable alloc_error_handler
attribute.
#![feature(alloc_error_handler)]

use core::alloc::Layout;
use cortex_m::asm;

#[alloc_error_handler]
fn on_oom(_layout: Layout) -> ! {
    asm::bkpt();
    loop {}
}
Once all that is in place, the user can finally use the collections in alloc .
#[entry]
fn main() -> ! {
    let mut xs = Vec::new();
    xs.push(42);
    assert_eq!(xs.pop(), Some(42));
    loop {
        // ..
    }
}
If you have used the collections in the std crate then these will be familiar,
as they are the exact same implementation.
Using heapless
heapless requires no setup as its collections don't depend on a global memory
allocator. Just use its collections and proceed to instantiate them:
use heapless::Vec;
use heapless::consts::*;
#[entry]
fn main() -> ! {
    let mut xs: Vec<_, U8> = Vec::new();
    xs.push(42).unwrap();
    assert_eq!(xs.pop(), Some(42));
    loop {}
}
You'll note two differences between these collections and the ones in alloc .
First, you have to declare upfront the capacity of the collection. heapless
collections never reallocate and have fixed capacities; this capacity is part
of the type signature of the collection. In this case we have declared that
xs has a capacity of 8 elements, that is, the vector can hold at most 8
elements. This is indicated by the U8 (see typenum ) in the type signature.
Second, the push method, and many other methods, return a Result . Since the
heapless collections have fixed capacity all operations that insert elements into
the collection can potentially fail. The API reflects this problem by returning a
Result indicating whether the operation succeeded or not. In contrast, alloc
collections will reallocate themselves on the heap to increase their capacity.
As of version v0.4.x all heapless collections store all their elements inline. This
means that an operation like let x = heapless::Vec::new(); will allocate the
collection on the stack, but it's also possible to allocate the collection on a static
variable, or even on the heap ( Box<Vec<_, _>> ).
Trade-offs
Keep these in mind when choosing between heap allocated, relocatable collections
and fixed capacity collections.
With heap allocations Out Of Memory is always a possibility, and can occur in
any place where a collection may need to grow: for example, all
alloc::Vec.push invocations can potentially generate an OOM condition. Thus
some operations can implicitly fail. Some alloc collections expose
try_reserve methods that let you check for potential OOM conditions when
growing the collection, but you need to be proactive about using them.
If you exclusively use heapless collections and you don't use a memory
allocator for anything else then an OOM condition is impossible. Instead,
you'll have to deal with collections running out of capacity on a case by case
basis. That is, you'll have to deal with all the Result s returned by methods
like Vec.push .
OOM failures can be harder to debug than, say, unwrap -ing on all Result s
returned by heapless::Vec.push , because the observed location of failure may
not match the location of the cause of the problem. For example, even
vec.reserve(1) can trigger an OOM if the allocator is nearly exhausted because
some other collection was leaking memory (memory leaks are possible in safe
Rust).
Memory usage
Reasoning about memory usage of heap allocated collections is hard because the
capacity of long lived collections can change at runtime. Some operations may
implicitly reallocate the collection increasing its memory usage, and some
collections expose methods like shrink_to_fit that can potentially reduce the
memory used by the collection -- ultimately, it's up to the allocator to decide
whether to actually shrink the memory allocation or not. Additionally, the allocator
may have to deal with memory fragmentation which can increase the apparent
memory usage.
On the other hand if you exclusively use fixed capacity collections, store most of
them in static variables and set a maximum size for the call stack then the linker
will detect if you try to use more memory than what's physically available.
However, fixed capacity collections can not be shrunk which can result in lower
load factors (the ratio between the size of the collection and its capacity) than what
relocatable collections can achieve.
Worst Case Execution Time (WCET)
If you are building time sensitive or hard real time applications then you
care, maybe a lot, about the worst case execution time (WCET) of the different
parts of your program.
The alloc collections can reallocate so the WCET of operations that may grow the
collection will also include the time it takes to reallocate the collection, which itself
depends on the runtime capacity of the collection. This makes it hard to determine
the WCET of, for example, the alloc::Vec.push operation as it depends on both
the allocator being used and its runtime capacity.
On the other hand fixed capacity collections never reallocate so all operations have
a predictable execution time. For example, heapless::Vec.push executes in
constant time.
Ease of use
The alloc API will be familiar to virtually every Rust developer. The heapless API
tries to closely mimic the alloc API but it will never be exactly the same due to its
explicit error handling -- some developers may feel the explicit error handling is
excessive or too cumbersome.
Design Patterns
This chapter aims to collect various useful design patterns for embedded Rust.
Checklist
- Naming
- Interoperability
- Predictability
- GPIO
Naming
Interoperability
The free method should shut down and reset the peripheral if necessary.
Calling new with the raw peripheral returned by free should not fail due to
an unexpected state of the peripheral.
If the HAL type requires other non- Copy objects to be constructed (for example
I/O pins), any such object should be released and returned by free as well. free
should return a tuple in that case.
For example:
pub struct Timer(TIMER0);

impl Timer {
    pub fn new(periph: TIMER0) -> Self {
        Self(periph)
    }

    pub fn free(self) -> TIMER0 {
        self.0
    }
}
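If the type also owns pins, free hands them back too; a sketch with
hypothetical UART0, TxPin and RxPin types:

pub struct Uart {
    periph: UART0,
    tx: TxPin,
    rx: RxPin,
}

impl Uart {
    pub fn free(self) -> (UART0, TxPin, RxPin) {
        (self.periph, self.tx, self.rx)
    }
}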
A PAC should be reexported under the name pac , regardless of the actual name
of the crate, as the name of the HAL should already make it clear what PAC is being
accessed.
Predictability
Each GPIO Interface or Port should implement a split method returning a struct
with every pin.
Example:
impl PortA {
pub fn split(self) -> PortAPins {
PortAPins {
pa0: PA0,
pa1: PA1,
// ...
}
}
}
Example:
impl PA0 {
pub fn erase_pin(self) -> PA {
PA { pin: 0 }
}
}
impl PA {
pub fn erase_port(self) -> Pin {
Pin {
port: Port::A,
pin: self.pin,
}
}
}
enum Port {
A,
B,
C,
D,
}
Additional, chip-specific state (e.g. drive strength) may also be encoded in
this way, using additional type parameters.
Methods for changing the pin state should be provided as into_input and
into_output methods.
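A minimal sketch of what such transition methods can look like, using zero
sized marker types (all names here are hypothetical):

use core::marker::PhantomData;

pub struct Input;
pub struct Output;

pub struct PA0<MODE> {
    _mode: PhantomData<MODE>,
}

impl<MODE> PA0<MODE> {
    pub fn into_input(self) -> PA0<Input> {
        // ... configure the hardware for input mode here ...
        PA0 { _mode: PhantomData }
    }

    pub fn into_output(self) -> PA0<Output> {
        // ... configure the hardware for output mode here ...
        PA0 { _mode: PhantomData }
    }
}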
The following methods should be provided for every pin type (that is, both erased
and non-erased pin types should provide the same API):
Pin state should be bounded by sealed traits. Users of the HAL should have no
need to add their own state. The traits can provide HAL-specific methods required
to implement the pin state API.
Example:
mod sealed {
    pub trait Sealed {}
}

pub trait PinState: sealed::Sealed {}
Preprocessor
In embedded C it is very common to use the preprocessor for a variety of
purposes, such as:
- compile-time selection of code blocks (#ifdef)
- compile-time array sizes and computations (#define)
- macros to simplify common patterns (to avoid function call overhead)
In Rust there is no preprocessor, and so many of these use cases are addressed
differently. In the rest of this section we cover various alternatives to using the
preprocessor.
The closest match to #ifdef ... #endif in Rust are Cargo features. These are a
little more formal than the C preprocessor: all possible features are explicitly listed
per crate, and can only be either on or off. Features are turned on when you list a
crate as a dependency, and are additive: if any crate in your dependency tree
enables a feature for another crate, that feature will be enabled for all users of that
crate.
For example, you might have a crate which provides a library of signal processing
primitives. Each one might take some extra time to compile or declare some large
table of constants which you'd like to avoid. You could declare a Cargo feature for
each component in your Cargo.toml :
# In Cargo.toml
[features]
FIR = []
IIR = []

// In your top-level lib.rs
#[cfg(feature="FIR")]
pub mod fir;

#[cfg(feature="IIR")]
pub mod iir;
You can similarly include code blocks only if a feature is not enabled, or if any
combination of features are or are not enabled.
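For example (module names hypothetical):

// Compiled only when the FIR feature is NOT enabled:
#[cfg(not(feature = "FIR"))]
pub mod fir_fallback;

// Compiled only when both features are enabled:
#[cfg(all(feature = "FIR", feature = "IIR"))]
pub mod cascade;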
The conditional compilation will only apply to the next statement or block. If a
block can not be used in the current scope then the cfg attribute will need to be
used multiple times. It's worth noting that most of the time it is better to simply
include all the code and allow the compiler to remove dead code when optimising:
it's simpler for you and your users, and in general the compiler will do a good job
of removing unused code.
Const functions ( const fn ) are new to stable Rust as of 1.31, so
documentation is still sparse. The functionality available to const fn is
also very limited at the time of writing; in future Rust releases it is
expected to expand on what is permitted in a const fn .
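As a small sketch of the idea (the conversion factor here is made up for
illustration):

// A const fn can replace a C #define-style compile-time computation.
const fn ms_to_ticks(ms: u32) -> usize {
    (ms * 32_768 / 1_000) as usize
}

// Evaluated entirely at compile time, including in array lengths:
static TICK_BUFFER: [u8; ms_to_ticks(100)] = [0; ms_to_ticks(100)];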
Macros
In general, where you might have used a C preprocessor macro, you probably want
to see if a macro-by-example can do the job instead. They can be defined in your
crate and easily used by your own crate or exported for other users. Be aware that
As with Cargo features, it is worth considering if you even need the macro. In many
cases a regular function is easier to understand and will be inlined to the same
code as a macro. The #[inline] and #[inline(always)] attributes give you
further control over this process, although care should be taken here as well — the
compiler will automatically inline functions from the same crate where
appropriate, so forcing it to do so inappropriately might actually lead to decreased
performance.
Explaining the entire Rust macro system is out of scope for this tips page, so you
are encouraged to consult the Rust documentation for full details.
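For instance, a small macro-by-example replacing what might have been a C
#define helper (entirely illustrative):

macro_rules! max {
    ($a:expr, $b:expr) => {
        if $a > $b { $a } else { $b }
    };
}

fn main() {
    // Expands in place like a C macro, but is hygienic and type checked.
    let m = max!(3, 7);
    assert_eq!(m, 7);
}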
Build System
Most Rust crates are built using Cargo (although it is not required). This takes care
of many difficult problems with traditional build systems. However, you may wish
to customise the build process. Cargo provides build.rs scripts for this purpose.
They are Rust scripts which can interact with the Cargo build system as required.
At present there is no support for post-build scripts, which you might traditionally
have used for tasks like automatic generation of binaries from the build objects or
printing build information.
Cross-Compiling
Using Cargo for your build system also simplifies cross-compiling. In most cases it
suffices to tell Cargo --target thumbv6m-none-eabi and find a suitable executable
in target/thumbv6m-none-eabi/debug/myapp .
For platforms not natively supported by Rust, you will need to build libcore for
that target yourself. On such platforms, Xargo can be used as a stand-in for Cargo
which automatically builds libcore for you.
int16_t arr[16];
int i;
for(i=0; i<sizeof(arr)/sizeof(arr[0]); i++) {
process(arr[i]);
}
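In Rust, the idiomatic equivalent uses an iterator, which removes the manual
index bookkeeping entirely (assuming a process function like the C
version's):

let arr = [0i16; 16];
for element in arr.iter() {
    process(*element);
}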
See the Iterators in the Book and Iterator documentation for more details.
References vs Pointers
In Rust, pointers (called raw pointers) exist but are only used in specific
circumstances, as dereferencing them is always considered unsafe: Rust cannot
provide its usual guarantees about what might be behind the pointer.
In most cases, we instead use references, indicated by the & symbol, or mutable
references, indicated by &mut . References behave similarly to pointers, in that they
can be dereferenced to access the underlying values, but they are a key part of
Rust's ownership system: Rust will strictly enforce that you may only have one
mutable reference or multiple non-mutable references to the same value at any
given time.
In practice this means you have to be more careful about whether you need
mutable access to data: where in C the default is mutable and you must be explicit
about const , in Rust the opposite is true.
One situation where you might still use raw pointers is interacting directly with
hardware (for example, writing a pointer to a buffer into a DMA peripheral
register), and they are also used under the hood for all peripheral access crates to
allow you to read and write memory-mapped registers.
Volatile Access
In C, individual variables may be marked volatile , indicating to the compiler that
the value in the variable may change between accesses. Volatile variables are
commonly used in an embedded context for memory-mapped registers.
volatile bool signalled = false;

void ISR() {
    // Signal that the interrupt has occurred
    signalled = true;
}
void driver() {
while(true) {
// Sleep until signalled
while(!signalled) { WFI(); }
// Reset signalled indicator
signalled = false;
// Perform some task that was waiting for the interrupt
run_task();
}
}
static mut SIGNALLED: bool = false;

#[interrupt]
fn ISR() {
    // Signal that the interrupt has occurred
    // (In real code, you should consider a higher level primitive,
    // such as an atomic type).
    unsafe { core::ptr::write_volatile(&mut SIGNALLED, true) };
}
fn driver() {
loop {
// Sleep until signalled
while unsafe { !core::ptr::read_volatile(&SIGNALLED) } {}
// Reset signalled indicator
unsafe { core::ptr::write_volatile(&mut SIGNALLED, false) };
// Perform some task that was waiting for the interrupt
run_task();
}
}
A few things are worth noting in the Rust example:
- We can pass &mut SIGNALLED into the function requiring *mut T , since
  &mut T automatically converts to a *mut T (and the same for *const T ).
- We need unsafe blocks for the read_volatile / write_volatile methods,
  since they are unsafe functions. It is the programmer's responsibility to
  ensure safe use: see the methods' documentation for further details.
It is rare to require these functions directly in your code, as they will usually be
taken care of for you by higher-level libraries. For memory mapped peripherals,
the peripheral access crates will implement volatile access automatically, while for
concurrency primitives there are better abstractions available (see the
Concurrency chapter).
In Rust, struct layout is controlled by the repr attribute on a struct or
union. The default representation provides no guarantees of layout, so it
should not be used for code that interoperates with hardware or C. The
compiler may re-order struct members or insert padding, and the behaviour may
change with future versions of Rust.
struct Foo {
x: u16,
y: u8,
z: u16,
}
fn main() {
let v = Foo { x: 0, y: 0, z: 0 };
println!("{:p} {:p} {:p}", &v.x, &v.y, &v.z);
}
#[repr(C)]
struct Foo {
x: u16,
y: u8,
z: u16,
}
fn main() {
let v = Foo { x: 0, y: 0, z: 0 };
println!("{:p} {:p} {:p}", &v.x, &v.y, &v.z);
}
#[repr(packed)]
struct Foo {
x: u16,
y: u8,
z: u16,
}
fn main() {
let v = Foo { x: 0, y: 0, z: 0 };
// Unsafe is required to borrow a field of a packed struct.
unsafe { println!("{:p} {:p} {:p}", &v.x, &v.y, &v.z) };
}
Note that using repr(packed) also sets the alignment of the type to 1 .
#[repr(C)]
#[repr(align(4096))]
struct Foo {
x: u16,
y: u8,
z: u16,
}
fn main() {
let v = Foo { x: 0, y: 0, z: 0 };
let u = Foo { x: 0, y: 0, z: 0 };
println!("{:p} {:p} {:p}", &v.x, &v.y, &v.z);
println!("{:p} {:p} {:p}", &u.x, &u.y, &u.z);
}
For further details on type layouts, refer to the type layout chapter of the Rust
Reference.
Other Resources
In this book:
A little C with your Rust
A little Rust with your C
The Rust Embedded FAQs
Rust Pointers for C Programmers
I used to use pointers - now what?
Interoperability
Interoperability between Rust and C code is always dependent on transforming
data between the two languages. For this purpose there are two dedicated
modules in the stdlib called std::ffi and std::os::raw .
std::os::raw deals with low-level primitive types that can be converted implicitly
by the compiler because the memory layout between Rust and C is similar enough
or the same.
std::ffi provides some utilities for converting more complex types such as
strings, mapping both &str and String to C types that are easier and safer
to handle.
Neither of these modules are available in core , but you can find a #![no_std]
compatible version of std::ffi::{CStr,CString} in the cstr_core crate, and
most of the std::os::raw types in the cty crate.
We are collecting examples and use cases for this on our issue tracker in issue #61.
We are collecting examples and use cases for this on our issue tracker in issue #62.
As C++ does not have a stable ABI for the Rust compiler to target, it is
recommended to use the C ABI when combining Rust with C or C++.
First, we will cover manually translating these definitions from C/C++ to Rust.
Typically, libraries written in C or C++ will provide a header file defining all types
and functions used in public interfaces. An example file may look like this:
/* File: cool.h */
typedef struct CoolStruct {
    int x;
    int y;
} CoolStruct;

void cool_function(int i, char c, CoolStruct* cs);

/* File: cool_bindings.rs */
#[repr(C)]
pub struct CoolStruct {
    pub x: cty::c_int,
    pub y: cty::c_int,
}

extern "C" {
    pub fn cool_function(
        i: cty::c_int,
        c: cty::c_char,
        cs: *mut CoolStruct
    );
}
Let's take a look at this definition one piece at a time, to explain each of the parts.
#[repr(C)]
pub struct CoolStruct { ... }
By default, Rust does not guarantee order, padding, or the size of data included in
a struct . In order to guarantee compatibility with C code, we include the
#[repr(C)] attribute, which instructs the Rust compiler to always use the same
rules C does for organizing data within a struct.
pub x: cty::c_int,
pub y: cty::c_int,

Because C integer types are flexible in size, it is recommended to use the
primitive data types defined in cty , which map to the platform's C types.

extern "C" {
    pub fn cool_function(i: cty::c_int, c: cty::c_char, cs: *mut CoolStruct);
}

This statement defines the signature of a function that uses the C ABI, called
cool_function . By defining the signature without defining the body of the
function, the definition of this function will need to be provided elsewhere,
or linked into the final library or binary from a static library.
i: cty::c_int,
c: cty::c_char,
cs: *mut CoolStruct
Similar to our datatype above, we define the datatypes of the function arguments
using C-compatible definitions. We also retain the same argument names, for
clarity.
We have one new type here, *mut CoolStruct . As C does not have a concept of
Rust's references, which would look like this: &mut CoolStruct , we instead have a
raw pointer. As dereferencing this pointer is unsafe , and the pointer may in fact
be a null pointer, care must be taken to ensure the guarantees typical of Rust
when interacting with C or C++ code.
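Calling the declared function is then an unsafe operation, since the compiler
cannot check what the C code does with the pointer; a hypothetical call site:

let mut cs = CoolStruct { x: 3, y: 4 };
// Safety: cool_function must uphold its C contract; the pointer is valid
// for the duration of the call.
unsafe { cool_function(10, b'a' as cty::c_char, &mut cs) };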
Rather than manually generating these interfaces, which may be tedious and error
prone, there is a tool called bindgen which will perform these conversions
automatically. For instructions of the usage of bindgen, please refer to the bindgen
user's manual, however the typical process consists of the following:
1. Gather all C or C++ headers defining interfaces or datatypes you would like to
use with Rust
2. Write a bindings.h file, which #include "..." 's each of the files you
gathered in step one
3. Feed this bindings.h file, along with any compilation flags used to compile
your code into bindgen . Tip: use Builder.ctypes_prefix("cty") /
--ctypes-prefix=cty and Builder.use_core() / --use-core to make the
generated code #![no_std] compatible.
4. bindgen will write the generated Rust code to its standard output. This
output may be piped to a file in your project, such as bindings.rs . You
may use this file in your Rust project to interact with C/C++ code
compiled and linked as an external library. Tip: don't forget to use the cty
crate if your types in the generated bindings are prefixed with cty .
For embedded projects, this most commonly means compiling the C/C++ code to a
static archive (such as cool-library.a ), which can then be combined with your
Rust code at the final linking step.
If the library you would like to use is already distributed as a static archive, it is not
necessary to rebuild your code. Just convert the provided interface header file as
described above, and include the static archive at compile/link time.
If your code exists as a source project, it will be necessary to compile your C/C++
code to a static library, either by triggering your existing build system (such as
make , CMake , etc.), or by porting the necessary compilation steps to use a tool
called the cc crate. For both of these steps, it is necessary to use a build.rs
script.
The full reference may be found here. build.rs scripts are useful for
generating code (such as via bindgen), calling out to external build systems
such as Make , or directly compiling C/C++ through use of the cc crate.
For projects with complex external projects or build systems, it may be
easiest to use std::process::Command to "shell out" to your other build
systems by traversing relative paths, calling a fixed command (such as
make library ), and then copying the resulting static library to the proper
location in the target build directory.
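A build.rs sketch of that approach (the make target and library name here
are assumptions):

// build.rs
use std::process::Command;

fn main() {
    // Invoke the external build system; `library` is a hypothetical target.
    let status = Command::new("make")
        .arg("library")
        .status()
        .expect("failed to run make");
    assert!(status.success());

    // Tell Cargo where to find the resulting static library.
    println!("cargo:rustc-link-search=native=./lib");
    println!("cargo:rustc-link-lib=static=foo");
}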
While your crate may be targeting a no_std embedded platform, your build.rs
executes only on machines compiling your crate. This means you may use any Rust
crates which will run on your compilation host.
fn main() {
cc::Build::new()
.file("foo.c")
.compile("libfoo.a");
}
Apart from cargo and meson , most build systems don't have native Rust support.
So you're most likely best off just using cargo for compiling your crate and any
dependencies.
Setting up a project
Create a new cargo project as usual.
There are flags to tell cargo to emit a systems library, instead of its regular rust
target. This also allows you to set a different output name for your library, if you
want it to differ from the rest of your crate.
[lib]
name = "your_crate"
crate-type = ["cdylib"] # Creates dynamic lib
# crate-type = ["staticlib"] # Creates static lib
Building a C API
Because C++ has no stable ABI for the Rust compiler to target, we use C for
any interoperability between different languages. This is no exception when
using Rust code inside of C and C++ code.
#[no_mangle]
The Rust compiler mangles symbol names differently than native code linkers
expect. As such, any function that Rust exports to be used outside of Rust needs to
be told not to be mangled by the compiler.
extern "C"
By default, any function you write in Rust will use the Rust ABI (which is also not
stabilized). Instead, when building outwards facing FFI APIs we need to tell the
compiler to use the system ABI.
Depending on your platform, you might want to target a specific ABI version, which
are documented here.
Putting these parts together, you get a function that looks roughly like this.
#[no_mangle]
pub extern "C" fn rust_function() {
    // ...
}
Just as when using C code in your Rust project you now need to transform data
from and to a form that the rest of the application will understand.
However, calling a Rust function from C requires a header file to declare the
function signatures.
Every function in your Rust-ffi API needs to have a corresponding header function.
#[no_mangle]
pub extern "C" fn rust_function() {}
void rust_function();
etc.
There is a tool to automate this process, called cbindgen which analyses your Rust
code and then generates headers for your C and C++ projects from it.
At this point, using the Rust functions from C is as simple as including the header
and calling them!
#include "my-rust-project.h"
rust_function();
Unsorted topics
No optimizations
This is the default. When you call cargo build you use the development (AKA
dev ) profile. This profile is optimized for debugging so it enables debug
information and does not enable any optimizations, i.e. it uses -C opt-level = 0 .
At least for bare metal development, debuginfo is zero cost in the sense that it
won't occupy space in Flash / ROM so we actually recommend that you enable
debuginfo in the release profile -- it is disabled by default. That will let you use
breakpoints when debugging release builds.
[profile.release]
# symbols are nice and they don't increase the size on Flash
debug = true
No optimizations is great for debugging because stepping through the code feels
like you are executing the program statement by statement, plus you can print
stack variables and function arguments in GDB. When the code is optimized, trying
to print variables results in $0 = <value optimized out> being printed.
The biggest downside of the dev profile is that the resulting binary will be huge
and slow. The size is usually more of a problem because unoptimized binaries can
occupy dozens of KiB of Flash, which your target device may not have -- the result:
your unoptimized binary doesn't fit in your device!
Optimizing dependencies
There's a Cargo feature named profile-overrides that lets you override the
optimization level of dependencies. You can use that feature to optimize all
dependencies for size while keeping the top crate unoptimized and debugger
friendly.
Here's an example:
# Cargo.toml
[package]
name = "app"
# ..
[profile.dev.package."*"] # +
opt-level = "z" # +
That's a 6 KiB reduction in Flash usage without any loss in the debuggability of the
top crate. If you step into a dependency then you'll start seeing those <value
optimized out> messages again but it's usually the case that you want to debug
the top crate and not the dependencies. And if you do need to debug a
dependency then you can use the profile-overrides feature to exclude a
particular dependency from being optimized. See example below:
# ..

# Exclude one dependency from optimization (cortex-m-rt is shown here as an
# example; pick the dependency you actually need to debug)
[profile.dev.package.cortex-m-rt] # +
opt-level = 0 # +

Optimize for speed
rustc supports three optimization levels for speed: opt-level = 1 , 2 and
3 . When you run cargo build --release you are using the release profile,
which defaults to opt-level = 3 .
Both opt-level = 2 and 3 optimize for speed at the expense of binary size,
but level 3 does more vectorization and inlining than level 2 . In
particular, you'll see that at opt-level equal to or greater than 2 LLVM
will unroll loops. Loop unrolling has a rather high cost in terms of
Flash / ROM (e.g. from 26 bytes to 194 for a zero-the-array loop) but can
also halve the execution time given the right conditions (e.g. a big enough
number of iterations).
If you want your release binaries to be optimized for size then change the
profile.release.opt-level setting in Cargo.toml as shown below.
[profile.release]
# or "z"
opt-level = "s"
These two optimization levels greatly reduce LLVM's inline threshold, a metric
used to decide whether to inline a function or not. One of Rust's principles
is zero cost abstractions; these abstractions tend to use a lot of newtypes
and small functions to hold invariants (e.g. functions that borrow an inner
value, like deref or as_ref ), so a low inline threshold can make LLVM miss
optimization opportunities (e.g. eliminating dead branches, inlining calls to
closures).
When optimizing for size you may want to try increasing the inline threshold to see
if that has any effect on the binary size. The recommended way to change the
inline threshold is to append the -C inline-threshold flag to the other rustflags
in .cargo/config .
# .cargo/config
# this assumes that you are using the cortex-m-quickstart template
[target.'cfg(all(target_arch = "arm", target_os = "none"))']
rustflags = [
# ..
"-C", "inline-threshold=123", # +
]
What value to use? As of 1.29.0 these are the inline thresholds that the
different optimization levels use:
- opt-level = 3 uses 275
- opt-level = "s" uses 75
- opt-level = "z" uses 25
You should try 225 and 275 when optimizing for size.
Appendix A: Glossary
The embedded ecosystem is full of different protocols, hardware components and
vendor-specific things that use their own terms and abbreviations. This Glossary
attempts to list them with pointers for understanding them better.
Term  | Meaning
I2C   | Sometimes referred to as I²C or Inter-IC. A protocol for hardware communication within a single integrated circuit. See i2c.info for more details
SPI   | Serial Peripheral Interface
USART | Universal synchronous and asynchronous receiver-transmitter
UART  | Universal asynchronous receiver-transmitter
FPU   | Floating-point Unit. A 'math processor' running only operations on floating-point numbers
PAC   | Peripheral Access Crate