Timestamp Peripherals For Precise Real-Time Programming
but that precision is lost when using high-level languages without suitable abstractions for temporal behavior. So, for timing-sensitive applications, programmers resort to low-level languages like C which lack expressiveness and safety guarantees. Other programmers use specialized precision-timing hardware which is expensive and difficult to obtain.
In this work, we achieve sub-microsecond precision from a high-level real-time programming language on the RP2040, a cheap, widely available microcontroller. Our work takes advantage of the RP2040’s Programmable I/O (PIO) devices, which are cycle-accurate coprocessors designed for implementing hardware protocols over the RP2040’s GPIO pins.
We use the PIO devices to implement timestamp peripherals, which are input capture and output compare devices. We use timestamp peripherals to mediate I/O from programs written in Sslang, a real-time programming language with deterministic concurrency. We show that timestamp peripherals help Sslang programs achieve the precise timing behavior prescribed by Sslang’s Sparse Synchronous Programming model.

KEYWORDS
real time systems, concurrency control, computer languages, timing

ACM Reference Format:
John Hui, Kyle J. Edwards, and Stephen A. Edwards. 2023. Timestamp Peripherals for Precise Real-Time Programming. In 21st ACM-IEEE International Conference on Formal Methods and Models for System Design (MEMOCODE ’23), September 21–22, 2023, Hamburg, Germany. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3610579.3611084

1 INTRODUCTION
Systems often have a mix of real-time requirements, ranging from picosecond-level precision to best-effort. This paper proposes timestamp peripherals (general-purpose peripherals that timestamp input events and emit output events according to timestamps) as an interface between the hardest real-time layer and the first software layer (Figure 1). While a handful of existing peripherals timestamp events, most are specialized.

Figure 1: Our approach: a peripheral interprets changes on input pins as timestamped events, which are passed to a real-time discrete-event simulator (the Sslang program), which sends timestamped output events to another peripheral that generates precisely timed output waveforms.

These hardware-managed timestamps make it much easier to develop and analyze real-time software. Lohstroh et al. [13] argue that real-time programming models should provide software with some notion of logical time, an engineering fiction that is easier to reason about than physical models of time. Within a real-time system, the timestamp peripherals we propose here form the boundary between the logical software and the physical external environment.
Timestamping in software, such as with an interrupt service routine that records a system timer value, is imprecise because of interrupt response time uncertainty. Another approach would be to implement such timestamping hardware in an FPGA with a processor core, but such chips are substantially more expensive than commodity microcontrollers.
To demonstrate timestamp peripherals, we implement them on the inexpensive (US$0.70), widely available RP2040 microcontroller, using its programmable input/output (PIO) blocks and interface them with the Sparse Synchronous Model (SSM) runtime. The resulting peripherals sample input pins at 16 MHz and allow output changes to be scheduled with the same precision, far more accurately than is possible using only the RP2040’s 1 MHz timer. Overall, our system† gives users the ability to write high-level programs that can measure and produce output signals with 62.5 ns precision.
In this paper, we describe and evaluate the performance of our real-time software environment with timestamp peripherals. We based our environment on Edwards and Hui’s [5] Sparse Synchronous Model and propose a real-time language called Sslang (Sparse Synchronous Language), described in Section 2. Sslang relies on timestamp peripherals, which we implemented on the RP2040 microcontroller and its PIO blocks, described in Section 3 and Section 4. To determine the performance limits of our approach, we ran experiments and describe our findings in Section 5. Section 6 summarizes related work; we conclude in Section 7.

† Source code available at https://github.com/ssm-lang/pico-ssm

This work was supported by the NIH under grant RF1MH120034-01.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
MEMOCODE ’23, September 21–22, 2023, Hamburg, Germany
© 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 979-8-4007-0318-8/23/09. . . $15.00
https://doi.org/10.1145/3610579.3611084
procedure tick_loop(invar, outvar):                 // The main tick loop, with PIO input and output variables
  init_ssm_runtime()                                // Initialize the SSM runtime
  tick()                                            // Run the program for time zero
  forever
    rt ← timer_read()                               // Read the real time from the system timer
    nt ← next_time()                                // Get the time of the next scheduled event
    if input_queued() && input_peek().time < nt     // Is there a pending input event before any other event?
      schedule(invar, input_dequeue())              // . . . yes: move it from the PIO queue to the SSM runtime queue
    elseif nt ≤ rt                                  // Has the model fallen behind physical time?
      tick()                                        // . . . yes: run the program for an instant; update next time
      if outvar.next_time ≠ ∞                       // Is there a scheduled PIO output?
        pio_output(outvar.next_time,                // . . . yes: send it to the PIO
                   outvar.next_value)
    elseif nt ≠ ∞                                   // Is there an event scheduled for the future?
      set_alarm(nt)                                 // . . . yes: schedule an alarm to wake up then
      wait(semaphore)                               // Wait for the alarm or an input event
      cancel_alarm()                                // If an input event awakened us, cancel the alarm
      release(semaphore)                            // Release the semaphore if an alarm came just after an input event
    else
      wait(semaphore)                               // Wait for an input event
Figure 4: The RP2040 platform runtime tick loop, which calls tick() to advance model time, then sleeps until the next scheduled
event or external input. Based on the tick loop from Hui & Edwards [7].
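The branch structure of the tick loop in Figure 4 can be isolated as a pure decision function. The following is a minimal sketch, not the runtime's actual code: the names next_action, tick_action_t, and TIME_NEVER are hypothetical, and the infinite timestamp is assumed to be represented by the largest 64-bit value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the "infinity" timestamp of Figure 4. */
#define TIME_NEVER UINT64_MAX

/* The four branches of the tick loop body (names are illustrative). */
typedef enum {
    ACT_SCHEDULE_INPUT, /* move a queued input event into the runtime queue */
    ACT_TICK,           /* model time has fallen behind: run one instant */
    ACT_SLEEP_ALARM,    /* sleep until the next scheduled event or an input */
    ACT_SLEEP_INPUT     /* nothing scheduled: sleep until an input arrives */
} tick_action_t;

/* rt: current real time; nt: time of the next scheduled event;
 * input_pending and input_time describe the head of the input queue. */
tick_action_t next_action(uint64_t rt, uint64_t nt,
                          bool input_pending, uint64_t input_time)
{
    if (input_pending && input_time < nt)
        return ACT_SCHEDULE_INPUT;
    if (nt <= rt)
        return ACT_TICK;
    if (nt != TIME_NEVER)
        return ACT_SLEEP_ALARM;
    return ACT_SLEEP_INPUT;
}
```

Separating the decision from its side effects makes the priority order explicit: a pending input that precedes the next scheduled event is always handled before the model is allowed to advance.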
Figure 5: System block diagram. The Capture SM (in pio0) timestamps input pin events; the DMA controller enqueues them.
The tick loop (Figure 4) gathers the next event from the input queues, schedules it in the SSM event queue, calls tick to run the
Sslang program for an instant, feeds updated time/value to Alarm and Buffer SMs (also in pio0), sets an alarm, and sleeps.
Because the PIO programs cannot directly read the system timer, we maintain two additional real-time clocks in the PIO programs that need access to the current time. Fortunately, all three timers are driven by clocks derived from the external 12 MHz crystal, so we set them to run at precisely the same rate. They will remain synchronized provided we start them in phase. We initialize the PIO counters with the code in Figure 6, which reads the system timer, sends the initial count value to the counting SMs, and starts all three SMs simultaneously.
The initialization routine compensates for its own latency, which we measured to be roughly 3 µs. We add this offset to the initial PIO counter to ensure it runs slightly ahead of the system clock. This offset is critical for the correctness of the tick loop, which assumes that if the PIO input queue is empty, future queued events will have a greater timestamp than the current system clock time. If the PIO counters were run behind the system clock, PIO timestamps could be smaller, violating this assumption. We verify our clocks are synchronized using the loopback test described in Section 5.4.
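The offset arithmetic can be illustrated with a small sketch. It assumes 64-bit tick values and an upward-counting clock purely for illustration; the real PIO counters are narrower and live in PIO registers, and the helper names below are hypothetical rather than taken from the runtime.

```c
#include <stdbool.h>
#include <stdint.h>

#define TICKS_PER_US    16u /* 16 MHz PIO clock vs. the 1 MHz system timer */
#define INIT_LATENCY_US  3u /* measured initialization latency, roughly 3 us */

/* Convert a 1 MHz system timer reading to 16 MHz ticks. */
uint64_t us_to_ticks(uint64_t us)
{
    return us * TICKS_PER_US;
}

/* Initial PIO count: the timer reading biased ahead by the latency, so
 * the PIO clock never lags the system clock once it starts. */
uint64_t pio_initial_ticks(uint64_t timer_us)
{
    return us_to_ticks(timer_us + INIT_LATENCY_US);
}

/* The tick loop's assumption: a PIO timestamp taken "now" is never
 * smaller than the current system time. */
bool pio_not_behind(uint64_t pio_ticks, uint64_t system_us)
{
    return pio_ticks >= us_to_ticks(system_us);
}
```

With a 3 µs bias the initial count leads the timer reading by 48 ticks, which is what keeps the check true even if the counters start up to 3 µs after the timer was read.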
time_t read_timer(void) {
  uint32_t lo = timer_hw->timelr;              // Latches timehr
  uint32_t hi = timer_hw->timehr;
  uint64_t us = ((uint64_t) hi << 32) | lo;
  return us << 4;                              // Convert 1 MHz timer to 16 MHz
}

void set_alarm(time_t t) {                     // Convert to 1 MHz timer
  timer_hw->alarm[ALARM_NUM] = t >> 4;
}

Figure 9: Translation between SSM time and the RP2040’s 64-bit 1 MHz timer. Reading the lower 32 bits from the timelr register latches the upper 32 bits, to avoid a wraparound race.

Figure 10: Output Alarm PIO program. Every 8 cycles, this decrements a counter, reads a new alarm value if the CPU has written one, and sends an interrupt to the Output Buffer PIO program if the counter matches the alarm.

.program output_buffer
; ISR: Reads the initial GPIO state
; OSR: Holds output to GPIO
  in pins, 32      ; Read current GPIO state
  mov osr, isr     ; as the default output value
.wrap_target
  wait 1 irq 4     ; Wait for IRQ from Alarm SM
  out pins, 32     ; Write OSR to GPIO pins
.wrap              ; Loop again

  // Make the SM read this output value: inject a pull instruction
  pio_sm_exec(pio0, buffer_sm,
              pio_encode_pull(0, 1));

  if (t < read_timer() + OUTPUT_MARGIN)
    // Deadline too close: immediately send the output value
    // by injecting an out instruction
    pio_sm_exec(pio0, buffer_sm,
                pio_encode_out(pio_pins, 32));
  else {
    // Set Alarm time
    uint32_t tgt = time_to_pio(t);
    pio_sm_put(pio0, alarm_sm, tgt);
  }
}

Figure 12: Function that schedules a new value/time event on the output buffer and alarm programs

  after ms 1000, gate <- ()
  loop
    if updated gate                  // Was gate assigned just now?
      log_count (deref count)

      if updated input               // Was input assigned just now?
        count <- 1                   // Yes: reinitialize count to 1
      else
        count <- 0                   // No: reinitialize count to 0

      after ms 1000, gate <- ()
      wait gate                      // Pause before counting again

      after ms 1000, gate <- ()
    else
      count <- deref count + 1
    wait gate || input               // Block until either is assigned

Figure 14: A frequency counter that reports the number of events on an input variable each second, after Krook et al. [12].
The updated function returns true when the variable was assigned in the current instant.

pulsewidth input =
  let result = new 0
  par loop
        wait input                // Wait for rising edge
        let b = now ()
        wait input                // Wait for falling edge
        let a = now ()
        result <- a - b           // Compute pulse width
      loop
        sleep (ms 1000)           // Pause between logging
        log_pwm (deref result)

Figure 13: A Sslang program to measure pulse width. Note that the now calls return model time, not wall-clock time.

We test this program with 10 kHz pulses of varying widths and record the difference in timestamps between pulse edges. Table 1 shows the results. We observe a single least-significant bit of jitter in all cases, likely an artifact of sampling. We attribute the roughly 20 ppm errors in the long-period measurements to the expected precision of the crystal oscillator.
While we do not expect correct results for pulses shorter than the 62.5 ns sampling period, we were pleased that the resulting behavior was not absurd. The input SM was still able to observe certain pulses and conclude that they were short.
When short pulses are applied above 200 kHz, we begin to observe sporadic but drastic measurement errors. For instance, with a 320 kHz pulse signal with a 500 ns pulse, the program occasionally reports 808 or 809 ticks instead of the expected 8 or 9. These errors are due to incoming input events accumulating faster than the program can process them, overflowing the input ring buffer. It takes 32 events (16 cycles of the pulse signal) to overflow a 256 B ring buffer; at 320 kHz, 16 cycles is 50 µs, accounting for the extra 800 ticks we observe.

Table 1: Pulse widths (in clock cycles) reported by the pulsewidth program

Pulse Input   Expected    Observed    Jitter   Error
80 ms         1 280 000   1 280 021   1        22
8 ms          128 000     128 002     1        3
800 µs        12 800      12 800      1        1
80 µs         1 280       1 280       1        1
8 µs          128         128         1        1
800 ns        12.8        13          1        0.2
80 ns         1.28        2                    0.72
40 ns         0.64        2                    1.36

Experimental data for the Sslang pulse timer program in Figure 13. Units are 16 MHz clock cycles (62.5 ns). We attribute the 17 ppm error in the 80 ms measurement to the crystal oscillator.

5.3 Frequency Counter
To further assess our RP2040 runtime’s ability under high input load, we implement the frequency counter from Krook et al. [12] in Sslang, shown in Figure 14. This program measures the frequency of a signal by counting the number of events that appear on an input variable every second.
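The gate-based counting idea can also be checked offline against a list of recorded event timestamps. The sketch below is a simplified C analogue and not a translation of the Sslang program: count_in_gate is a hypothetical helper, timestamps are assumed to be sorted 16 MHz tick values, and the one-second gate is taken as a half-open interval.

```c
#include <stddef.h>
#include <stdint.h>

#define TICKS_PER_SECOND 16000000u /* 16 MHz timestamp resolution */

/* Count events whose timestamps fall in [gate_start, gate_start + 1 s).
 * The array ts must be sorted in ascending order. */
uint32_t count_in_gate(const uint64_t *ts, size_t n, uint64_t gate_start)
{
    uint64_t gate_end = gate_start + TICKS_PER_SECOND;
    uint32_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (ts[i] >= gate_end)
            break; /* sorted input: no later event can be in the gate */
        if (ts[i] >= gate_start)
            count++;
    }
    return count;
}
```

Because the count depends only on timestamps, the result is independent of when the software happens to process each event, which is the property the timestamp peripherals are meant to provide.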
has a shorter reaction time (1.96 µs vs. 13.8 µs; see Figure 18b) because it eliminates most software overhead by performing all work in an input-triggered ISR. The Sslang program times the falling edge significantly better because of our PIO timestamp peripherals. The output of the C program is 1.43 µs–2.39 µs late (Figure 18c) because of the initial latency and imprecision in the busy wait loop, which polls the system timer. The Sslang program’s falling output is 0 ns–62.4 ns late; that jitter in Figure 18d is purely due to phase differences between the PIO’s 16 MHz sampling clock and the frequency generator’s oscillator.
The Sslang output system uses two strategies to transmit output variable assignments to the environment. For assignments scheduled sufficiently far in the future, the output system sets the Alarm SM’s target counter to trigger the Buffer SM at the scheduled time. For instantaneous assignments, shorter delays, and when the system is running behind, the output system instructs the Buffer SM to immediately emit the event, rather than risk missing the output deadline while programming the Alarm SM. The result is high-precision output timing when possible, and best-effort timing otherwise.
6 RELATED WORK
6.1 Synchronous Software on Real Hardware
Our RP2040 runtime is not the first implementation of the Sparse Synchronous Model on real hardware; Krook et al. previously developed Zephyr bindings for Edwards & Hui’s SSM runtime to run programs on an NRF52840-DK development board [7, 12]. In contrast to our work, Krook et al. implement the input and output timestamping in software: input event timestamps are captured during the GPIO interrupt service routine, and output event timing depends on when tick executes the output handler process. Though their approach does not require specialized hardware like the RP2040’s PIO, their timestamps’ accuracy is limited by the unpredictable latency of the interrupt handler. Our RP2040 platform runtime can capture and emit events far more reliably, as demonstrated by our pulse generator experiment in Section 5.5. Our approach supports Scoria-like non-timestamp peripherals alongside timestamp GPIO.
Other synchronous, discrete-event programming models have also been implemented on real hardware. Jellum et al. [11] implement an embedded target for Lingua Franca, a polyglot coordination language that supports event-driven execution like SSM [14]. Like Scoria, Jellum et al.’s embedded target is based on Zephyr RTOS and manages timestamps in software. Their square wave generator’s 1 µs sleep-induced jitter appears consistent with that of our C implementation for the reactive pulse generator, and their 28 µs/35 µs input/output latency reflects the kind of error we eliminate using dedicated PIO hardware.

Figure 18: Behavior of the reactive 100 µs pulse generator. The top trace (blue) is the input; the middle (cyan) is the C program’s output; the bottom (magenta) is from Sslang. (a) The C and Sslang programs try to match the 100 µs input pulse. (b) C responds faster (1.96 µs) than Sslang (13.8 µs). (c) The C program’s falling edge is 1.43 µs–2.39 µs late, while (d) the Sslang program is at most 62.4 ns late.
Zou et al. [20–22] develop PtidyOS to execute programs implemented using the Ptides programming model, a precursor to Lingua Franca. PtidyOS features a preemptive EDF scheduler and runs on the Luminary microcontroller, though it uses the same software-based timestamping strategy as Scoria’s runtime and does not appear to take advantage of the hardware’s input capture features. To help developers account for this latency, Zou et al. show that they can statically determine the schedulability of Ptides programs by annotating actors and input/output ports with worst-case latencies and simulating the execution, made possible by the fixed actor topology of Ptides programs. Sslang trades the ability to do such analysis for a more expressive programming model.

6.2 Timestamp Peripherals
Certain microcontrollers’ peripherals are capable of a primitive form of timestamping termed input capture and output compare. For example, Atmel ATmega328P microcontrollers [1] include input capture units that sample a 16-bit timer on the rising or falling edge of a single input. The Microchip PIC32 family of microcontrollers [16] possesses similar functionality with a 32-bit timer and also includes an output compare device that can raise, lower, or toggle an output pin when the timer matches a target timestamp. These facilities are geared toward the measurement and generation of PWM signals, which are highly periodic and not bursty.
We chose to implement our SSM runtime on the RP2040 rather than on an ATmega or PIC32 device because we wanted to take advantage of the RP2040’s 64-bit timer. Unlike the ATmega328P and PIC32, our timestamp peripherals are implemented using the RP2040’s PIO device, and support reading from and writing to multiple consecutive GPIO pins at the same time.
Timestamping hardware devices also exist for specific applications. For instance, the IEEE’s Time-Sensitive Networking protocols [8, 9] ensure deterministic networking between devices synchronized using the Precision Time Protocol. These devices work by timestamping network packets; Austad and Mathisen [2] show that this capability is useful for minimizing network-induced jitter for distributed Lingua Franca programs.
Certain Nordic Semiconductor SoCs, such as the NRF52 series [17], include a “programmable peripheral interconnect” system that can configure a timer to timestamp and schedule events on arbitrary peripherals including single GPIO pins. This feature appears to enable timestamp peripherals, but we are unaware of any implementations.

6.3 Timing-Predictable Hardware
Rather than rely on peripherals for precise timing, Precision Timed (PRET) machine architectures ensure predictable timing for the main processor [6]. This approach typically sacrifices single-threaded performance in favor of highly parallel real-time workloads that benefit from numerous timing-predictable cores. Jellum et al. [10] propose InterPRET as a hardware architecture for running Lingua Franca programs. Their architecture is comparable to XMOS’s XCore architecture [15], a commercial PRET machine.
Other approaches offload time-sensitive computation to timing-predictable co-processors. For instance, Vicuna [18] is a co-processor designed for massively parallel workloads. Meanwhile, the Beaglebone family of development boards [3] features timing-predictable Programmable Real-time Unit (PRU) co-processors that execute alongside the Beaglebone’s desktop-class processors.
Such PRET processors are currently much more expensive than the RP2040 we used in this work, and it is unclear how precise timing (as opposed to predictable) can be achieved on these machines. The RP2040’s PIO blocks are technically PRET machines (their parallel SMs even appear to be implemented with an interleaved pipeline), but their lack of memory access and most arithmetic operations make them far more limited than other PRET machines.

7 CONCLUSIONS
This work shows how software can achieve high timing precision through access to peripherals that can timestamp input events and schedule timestamped output events. We demonstrated a system running on the RP2040, an inexpensive, commodity microcontroller, able to achieve 62.5 ns precision on both input and output, although minimum reaction time is in the 13 µs range. We implemented the input and output systems as precisely timed programs running on the RP2040’s novel PIO system, but similar results could be achieved with peripherals implemented in an FPGA or directly on the processor chip.
Although the RP2040 has a 64-bit 1 MHz system timer designed to be a master time base, limitations of the PIO system forced us to implement separate clocks within the PIO devices, which provided higher timing precision (these clocks run at 16 MHz) as well as clock synchronization headaches. While the system clock and the PIO clocks run off the same crystal oscillator, it was very important to start them in sync and in phase so that the peripheral timestamps did not “time travel” and cause unexpected behavior. This confirmed to us that synchronized clocks are key to implementing the Sparse Synchronous Model.
An early plan for the output system had it consuming a sequence of time-value pairs from a FIFO, but this proved unworkable since SSM semantics allows a scheduled output event to be replaced with an earlier event. While the SSM runtime handles this with a heap that supports re-insertion, implementing such a data structure with a PIO is impractical. This led us to the simpler mechanism presented above: separate time and value “registers” that can be overwritten when preemption is needed. The disadvantage of this approach is that the software runtime needs to perform a separate action for each output event, even if the desired output sequence is known in advance and could be buffered. For future work, we plan to introduce non-preemptible events to Sslang for reducing software load, combined with a DMA-assisted output queue for more reliable and precise burst outputs.
While the Input SM clock and the RP2040’s system timer are synchronized, there is a small but difficult-to-characterize latency between when an input event is observed (and timestamped) and when the DMA controller makes that event available to the main tick loop. The uncertainty arises from any DMA controller latency plus any interference from other bus traffic. While short, this latency raises the question of when the system can safely advance time past a certain point and be assured that no additional inputs will arrive before that point. Interestingly, this is exactly the problem that Zou et al. [22] considered for distributed systems, even though our system is not one that would traditionally be considered distributed.
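The overwritable time and value “registers” discussed in the conclusions can be modeled in plain C to show why a single slot handles preemption where a FIFO cannot. This is a software model under stated assumptions, not the PIO implementation: output_slot_t, slot_schedule, and slot_fire_if_due are hypothetical names, and the slot is polled rather than driven by an alarm interrupt.

```c
#include <stdbool.h>
#include <stdint.h>

/* One pending output event: a time register and a value register. */
typedef struct {
    bool     pending;
    uint64_t time;   /* when to emit, in 16 MHz ticks */
    uint32_t value;  /* GPIO value to emit */
} output_slot_t;

/* Schedule or reschedule: the new event simply overwrites the old one,
 * so an earlier replacement event needs no queue reordering. */
void slot_schedule(output_slot_t *s, uint64_t time, uint32_t value)
{
    s->pending = true;
    s->time = time;
    s->value = value;
}

/* Emit the pending value if its time has arrived; returns true if
 * *pins was updated. */
bool slot_fire_if_due(output_slot_t *s, uint64_t now, uint32_t *pins)
{
    if (!s->pending || now < s->time)
        return false;
    *pins = s->value;
    s->pending = false;
    return true;
}
```

The cost noted in the text is visible here as well: every output event requires its own slot_schedule call, because the slot holds only one event at a time.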