The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2022
J. Beningo, Embedded Software Design
https://doi.org/10.1007/978-1-4842-8279-3_5
5. Design Patterns
Jacob Beningo1
(1) Linden, MI, USA
I’ve worked on over 100 different embedded projects during the first 20 years of my career. These projects have ranged from low-power sensors used in the defense industry to safety-critical medical devices and flight software to children’s toys. The one thing I have noticed across all these projects is that common patterns tend to repeat themselves from one embedded system to the next. Of course, this isn’t an instance where a single system is designed and we just build the same system repeatedly. However, having had the chance to review several dozen products where I had no input, I found that these same patterns made their way into many systems. They must, therefore, represent some basic patterns necessary to build embedded systems.
This chapter will explore a few design patterns for embedded systems that leverage a microcontroller. We will not cover the well-established design patterns for object-oriented programming languages by the “Gang of Four (GoF),”1 but instead focus on common patterns for building real-time embedded software from a design perspective. We will explore several areas such as single vs. multicore development, publish and subscribe models, RTOS patterns, handling interrupts, and designing for low power.
One of the primary design philosophies is that data dictates design. When designing embedded software, we must follow the data. In many cases, the data starts in the outside world, interacts with the system through a peripheral device, and is then served up to the application. A major design decision that all architects encounter is “How do we get the data from the peripheral to the application?” As it turns out, there are several different design mechanisms that we can use, such as
Polling
Interrupts
Direct memory access (DMA)
Each of these design mechanisms, in turn, has several design patterns that can be used to ensure that data is not lost. Let’s explore these mechanisms now.
Peripheral Polling
Figure 5-1 The application polls a peripheral for data on its schedule
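In code, the polling pattern reduces to the application checking a peripheral status flag on its own schedule. The following is a minimal sketch; the memory-mapped register addresses and bit layout (UART_STATUS, UART_DATA, UART_RX_READY) are illustrative assumptions that would come from a real part’s datasheet.

#include <stdint.h>
#include <stdbool.h>

/* Hypothetical memory-mapped UART registers; real addresses and
 * bit definitions come from the microcontroller's datasheet. */
#define UART_STATUS   (*(volatile uint32_t *)0x40001000u)
#define UART_DATA     (*(volatile uint32_t *)0x40001004u)
#define UART_RX_READY (1u << 0)

/* Poll the peripheral on the application's schedule: check the
 * status flag and only read the data register when data is ready. */
bool Uart_TryReadByte(uint8_t * const byte)
{
    if ((UART_STATUS & UART_RX_READY) != 0u)
    {
        *byte = (uint8_t)UART_DATA;  /* reading often clears the flag */
        return true;
    }
    return false;  /* no data yet; the application polls again later */
}

If the application polls too slowly for the peripheral’s data rate, incoming data can be overrun and lost, which is the fundamental trade-off of this mechanism.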
Peripheral Interrupts
Figure 5-2 When data is received in the peripheral, it fires an interrupt which stops executing the application, runs the interrupt handler, and then returns to running the application
When an interrupt is used in a design, there is a chance that the work performed on the data will take too long to run in the ISR. When we design an ISR, we want the interrupt to run as quickly as possible.
There’s also a good chance that the data just received needs to be combined with past or future data to be useful. We can’t do all those operations in a timely manner within an interrupt. We are much better served by saving the data and notifying the application that data is ready to be processed. When this happens, we need to reach for design patterns that allow us to get the data quickly, store it, and get back to the main application as soon as possible.
Designers can leverage several such patterns used on bare-metal and RTOS-based systems. A few of the most exciting patterns, each explored in the following sections, include
Linear data stores
Ping-pong buffers
Circular buffers
Circular buffers with notification (semaphores or event flags)
Message queues
A linear data store is a shared memory location that an interrupt service routine can
directly access, typically to write new data to memory. The application code, usually
the data reader, can also directly access this memory, as shown in Figure 5-3.
Figure 5-3 An interrupt (writer) and application code (reader) can directly access the data store
Data stores often require the designer to build mutual exclusion into the data store. Mutual exclusion is needed because data stores contain a critical section: if the application is partway through reading the data when an interrupt fires and changes it, the application can end up with corrupt data. We don’t care how the developers implement the mutex at the design level, but we need to make them aware that the mutex exists. I often do this by putting a circular symbol on the data store containing either an “M” for a mutex or a key symbol, as shown in Figure 5-4.
Unfortunately, at this time, there are no standards that support official nomenclature
for representing a mutex.
Figure 5-4 The data store must be protected by a mutex to prevent race conditions. The mutex is shown as a circular symbol containing an “M”
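To make the pattern concrete, here is a minimal bare-metal sketch of a linear data store with mutual exclusion. The accelerometer names and the Accel_ReadRaw() driver call are illustrative assumptions; __disable_irq() and __enable_irq() are the standard CMSIS intrinsics for a short critical section on Arm Cortex-M parts.

#include <stdint.h>
#include <string.h>

typedef struct
{
    int16_t Axis[3];   /* x, y, z */
} AccelSample_t;

/* The shared memory location; volatile because the ISR writes it. */
static volatile AccelSample_t DataStore;

/* Hypothetical driver call that drains the peripheral's registers. */
extern void Accel_ReadRaw(volatile AccelSample_t * const dest);

/* Writer: the ISR deposits fresh data directly into the data store. */
void ACCEL_IRQHandler(void)
{
    Accel_ReadRaw(&DataStore);
}

/* Reader: the copy is the critical section. If the ISR fired partway
 * through, the application could read a torn (corrupt) sample, so
 * interrupts are locked for the few cycles the copy takes. */
AccelSample_t DataStore_Read(void)
{
    AccelSample_t copy;

    __disable_irq();                                   /* "M" lock   */
    memcpy(&copy, (const void *)&DataStore, sizeof copy);
    __enable_irq();                                    /* "M" unlock */

    return copy;
}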
Ping-pong buffers, sometimes referred to as double buffers, offer another design solution that helps alleviate some of the race condition problems encountered with a data store. Instead of having a single data store, we have two identical data stores, as shown in Figure 5-5.
Figure 5-5 A ping-pong buffer transfers data between the ISR and application
Now, at first, having two data stores might seem like an opportunity just to double the trouble, but it’s a potential race condition saver. A ping-pong buffer is so named because the data buffers are used back and forth in a ping-pong-like manner. For example, at the start of an application, both buffers are marked as write only. When data comes in, the ISR stores it in the first data store. When the ISR is done and the data is ready for the application code to read, it marks that data store as ready to read. While the application reads that data, any additional data that comes in is stored by the ISR in the second data store. The process then repeats.
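A simplified bare-metal sketch of the idea follows. The names are illustrative, and the sketch assumes the application always drains a filled buffer before the ISR finishes filling the other one.

#include <stdint.h>
#include <stdbool.h>

#define BUFFER_SIZE 64u

/* Two identical data stores used back and forth. The ISR always
 * writes into Buffer[WriteIndex]; the application reads the other
 * buffer once the ISR marks it ready. */
static volatile uint8_t Buffer[2][BUFFER_SIZE];
static volatile uint8_t WriteIndex = 0u;     /* buffer the ISR fills */
static volatile bool    ReadReady  = false;  /* a filled buffer waits */

/* ISR side: when the current buffer is full, hand it to the reader
 * and "ping-pong" to the other buffer for new data. */
void Sensor_ISR_StoreByte(uint8_t data, uint32_t offset)
{
    Buffer[WriteIndex][offset] = data;

    if (offset == (BUFFER_SIZE - 1u))
    {
        ReadReady  = true;
        WriteIndex ^= 1u;   /* swap to the other data store */
    }
}

/* Application side: read the buffer the ISR is NOT writing to. */
bool PingPong_Read(uint8_t * const dest)
{
    if (!ReadReady)
    {
        return false;
    }

    const volatile uint8_t *src = Buffer[WriteIndex ^ 1u];
    for (uint32_t i = 0u; i < BUFFER_SIZE; i++)
    {
        dest[i] = src[i];
    }
    ReadReady = false;
    return true;
}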
One of the simplest and most commonly used patterns for getting and using data from an interrupt is to leverage a circular buffer. A circular buffer is a data structure that uses a single, fixed-size buffer as if it were connected end to end. Circular buffers are often represented as a ring, as shown in Figure 5-6. Microcontroller memory, however, is not circular but linear. When we build a circular buffer in code, we specify the start and stop addresses, and once the stop address is reached, we loop back to the starting address.
Figure 5-6 An eight-element circular buffer representation. The red arrow indicates the head where
new data is stored. The green arrow represents the tail, where data is read out of the buffer
The idea with the circular buffer is that the real-time data we receive in the interrupt can be removed from the peripheral and stored in a circular buffer. As a result, the interrupt can run as fast as possible while allowing the application code to process the circular buffer at its discretion. Using a circular buffer helps ensure that data is not lost, the interrupt is fast, and we still process the data in a reasonable amount of time.2
The most straightforward design pattern for a circular buffer can be seen in
Figure 5-7. In this pattern, we are simply showing how data moves from the
peripheral to the application. The data starts in the peripheral, is handled by the ISR,
and is stored in a circular buffer. The application can come and retrieve data from the
circular buffer when it wants to. Of course, the circular buffer needs to be sized
appropriately, so the buffer does not overflow.
Figure 5-7 The data flow diagram for moving data from the peripheral memory storage into the application
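Here is a minimal sketch of the data structure itself, with illustrative names. Because there is a single writer (the ISR, advancing the head) and a single reader (the application, advancing the tail), this version needs no locking.

#include <stdint.h>
#include <stdbool.h>

#define CB_SIZE 8u  /* fixed size; must cover the worst-case burst */

typedef struct
{
    uint8_t           Data[CB_SIZE];
    volatile uint32_t Head;  /* where the ISR writes new data */
    volatile uint32_t Tail;  /* where the application reads   */
} CircularBuffer_t;

/* Called from the ISR: store a byte at the head and wrap the index
 * back to the start once the end of linear memory is reached. */
bool CircularBuffer_Put(CircularBuffer_t * const cb, uint8_t byte)
{
    uint32_t next = (cb->Head + 1u) % CB_SIZE;
    if (next == cb->Tail)
    {
        return false;  /* buffer full; data would be lost */
    }
    cb->Data[cb->Head] = byte;
    cb->Head = next;
    return true;
}

/* Called from the application at its discretion. */
bool CircularBuffer_Get(CircularBuffer_t * const cb, uint8_t * const byte)
{
    if (cb->Head == cb->Tail)
    {
        return false;  /* buffer empty */
    }
    *byte = cb->Data[cb->Tail];
    cb->Tail = (cb->Tail + 1u) % CB_SIZE;
    return true;
}

Note that this design sacrifices one slot to distinguish full from empty, so sizing the buffer for the worst-case data burst, plus one, prevents the overflow mentioned earlier.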
The circular buffer design pattern is great, but there is one problem with it that we haven’t discussed: the application needs to poll the buffer to see if there is new data available. While this is not a world-ending catastrophe, it would be nice to have the application notified that the data buffer should be checked. Two methods can signal the application: a semaphore and an event flag.
Figure 5-8 A ring buffer stores incoming data with a semaphore to notify the application that data is available
An example design pattern for using event flags and interrupts can be seen in Figure 5-9. We can represent an event flag version of a circular buffer with a notification design pattern. As you can see, the pattern itself does not change, just the tool we use to implement it. The event flag implementation typically uses fewer clock cycles and less RAM than the semaphore version.
Figure 5-9 A ring buffer stores incoming data with an event flag to notify the application that data is available
The last method we will look at for moving data from an interrupt into the application uses message queues. Message queues are a tool available in real-time operating systems that can take data of a preset maximum size and queue it up for processing by a task. A message queue can typically store more than a single message and is configurable by the developer.
To leverage the message queue design pattern, the peripheral once again produces data retrieved by an ISR. The ISR then passes the data into a message queue that can be used to signal an application task that data is available for processing. When the application task becomes the highest-priority ready task, it will run and process the data stored in the queue. The overall pattern can be seen in Figure 5-10.
Figure 5-10 A message queue stores incoming data and passes it into the application for processing
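A minimal FreeRTOS sketch of this pattern follows. The queue and task APIs are FreeRTOS’s own; the sensor register address and the Process() function are illustrative assumptions.

#include <stdint.h>
#include "FreeRTOS.h"
#include "queue.h"
#include "task.h"

/* Queue of up to 16 samples, each of a preset maximum size. */
static QueueHandle_t SensorQueue;

/* Hypothetical peripheral register and processing function. */
#define SENSOR_DATA (*(volatile uint16_t *)0x40002000u)
extern void Process(uint16_t sample);

void Queue_Init(void)
{
    SensorQueue = xQueueCreate(16, sizeof(uint16_t));
}

/* ISR: retrieve the data and pass it into the message queue. The
 * FromISR variant is required in interrupt context. */
void SENSOR_IRQHandler(void)
{
    BaseType_t woken = pdFALSE;
    uint16_t sample = SENSOR_DATA;

    xQueueSendFromISR(SensorQueue, &sample, &woken);
    portYIELD_FROM_ISR(woken);  /* switch now if a task unblocked */
}

/* Application task: blocks until data is available for processing. */
void SensorTask(void *params)
{
    (void)params;
    uint16_t sample;

    for (;;)
    {
        if (xQueueReceive(SensorQueue, &sample, portMAX_DELAY) == pdPASS)
        {
            Process(sample);
        }
    }
}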
Many microcontrollers today include a direct memory access (DMA) controller that
allows individual channels to be set up to move data in the following ways without
interaction from the CPU:
RAM to RAM
Peripheral to RAM
Peripheral to peripheral
A typical block diagram for DMA and how it interacts with the CPU, memory,
and peripherals can be seen in Figure 5-11.
Figure 5-11 The DMA controller can transfer data between and within the RAM and peripherals without CPU interaction
For example, Figure 5-12 shows a design pattern where the DMA controller
transfers peripheral data into a circular buffer. After a prespecified number of byte
transfers, the DMA controller will trigger an interrupt. The interrupt, in this case,
uses a semaphore to signal the application that there is data ready to be processed.
Note that we could have used one of the other interrupt patterns, like an event flag,
but I think this gives you the idea of how to construct different design patterns.
Figure 5-12 An example design pattern that uses the DMA controller to move data from a peripheral into a circular buffer
Again, the advantage is using the DMA controller to move data without the CPU being involved. The CPU can execute other instructions while the DMA controller is performing transfer operations. Developers do have to be careful that they don’t access or try to manipulate the memory locations used by the DMA controller while a transfer is in progress. A DMA interrupt is often used to signal that it is safe to operate on the data. Don’t forget that it is possible for the DMA to be interrupted.
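As a sketch of how the interrupt in Figure 5-12 might signal the application, assuming FreeRTOS and two hypothetical driver calls (Dma_ClearTransferCompleteFlag and ProcessDmaBuffer), the semaphore version could look like this:

#include "FreeRTOS.h"
#include "semphr.h"

static SemaphoreHandle_t DmaDataReady;

/* Hypothetical vendor/driver calls for the DMA channel. */
extern void Dma_ClearTransferCompleteFlag(void);
extern void ProcessDmaBuffer(void);

void Dma_AppInit(void)
{
    DmaDataReady = xSemaphoreCreateBinary();
}

/* DMA transfer-complete ISR: the prespecified number of bytes has
 * landed in the circular buffer, so signal the application that the
 * data is now safe to operate on. */
void DMA_IRQHandler(void)
{
    BaseType_t woken = pdFALSE;

    Dma_ClearTransferCompleteFlag();
    xSemaphoreGiveFromISR(DmaDataReady, &woken);
    portYIELD_FROM_ISR(woken);
}

/* Application task: sleeps until the DMA signals, then processes
 * only the buffer region the controller just finished filling. */
void TelemetryTask(void *params)
{
    (void)params;
    for (;;)
    {
        if (xSemaphoreTake(DmaDataReady, portMAX_DELAY) == pdTRUE)
        {
            ProcessDmaBuffer();
        }
    }
}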
Design patterns can cover nearly every aspect of the embedded software design process, including the RTOS application. For example, in an RTOS application, designers often need to synchronize the execution of various tasks and the data flowing through the application. Two types of synchronization are usually found in RTOS applications: resource synchronization and activity synchronization.
Resource Synchronization
Figure 5-13 Two tasks looking to access the same memory location require a design pattern to synchronize access
In Figure 5-13, you can see that the sensor task acquires data from a device and then writes it to memory. The control task reads the data from memory and then uses it to generate an output. Marked in red, you can see that the write and read operations are critical sections! If the control task is in the process of reading the memory when the sensor task decides to update the memory, we can end up with data corruption and perhaps a wrong value being used to control a motor or other device. The result could be really bad things happening to a user!
There are three ways designers can deal with resource synchronization: interrupt locking, preemption locking, and mutex locking.
Tip A good architect will minimize the need for resource synchronization. Avoid it if possible, and when necessary, use these techniques (with mutex locking being the preference).
Interrupt Locking
Interrupt locking occurs when a system task disables interrupts to provide resource
synchronization between the task and an interrupt. For example, suppose an interrupt
is used to gather sensor data that is written to the shared memory location. In that
case, the control task can lock (disable) the specific interrupt during the reading
process to ensure that the data is not changed during the read operation, as shown in
Figure 5-14.
Figure 5-14 Interrupt locking is used to disable interrupts to prevent a race condition with the sensor
task
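A minimal sketch of the idea follows. NVIC_DisableIRQ() and NVIC_EnableIRQ() are the standard CMSIS calls; the device header, SENSOR_IRQn interrupt number, and shared data layout are illustrative assumptions.

#include <stdint.h>
#include <string.h>
#include "device.h"   /* hypothetical device header defining SENSOR_IRQn */

/* Shared memory location written by the sensor interrupt. */
extern volatile int32_t SensorData[4];

/* The control task locks only the specific sensor interrupt, not all
 * interrupts, for the duration of the read operation. */
void ControlTask_ReadSensorData(int32_t dest[4])
{
    NVIC_DisableIRQ(SENSOR_IRQn);   /* lock: the sensor ISR cannot fire */
    memcpy(dest, (const void *)SensorData, sizeof(int32_t) * 4u);
    NVIC_EnableIRQ(SENSOR_IRQn);    /* unlock as quickly as possible    */
}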
Preemption Lock
Preemption lock is a method that can be used to ensure that a task is uninterrupted during the execution of a critical section. Preemption lock temporarily disables the RTOS kernel’s preemptive scheduler during the critical section, as shown in Figure 5-15.
Figure 5-15 A preemption lock protects the critical section by not allowing the RTOS kernel to preempt the running task
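In FreeRTOS, a preemption lock is a sketch as short as the one below; vTaskSuspendAll() and xTaskResumeAll() are the real kernel calls, while the critical-section helper is a hypothetical placeholder.

#include "FreeRTOS.h"
#include "task.h"

extern void ReadSharedSensorData(void);  /* hypothetical critical section */

/* While the scheduler is suspended, no other task can preempt this
 * one, but interrupts still run, so this protects the critical
 * section only against other tasks. */
void ControlTask_Read(void)
{
    vTaskSuspendAll();        /* preemption lock on  */
    ReadSharedSensorData();
    (void)xTaskResumeAll();   /* preemption lock off */
}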
Mutex Lock
One of the safest and most recommended methods for protecting a shared resource is to use a mutex lock. A mutex is an RTOS object whose sole purpose is to provide mutual exclusion to shared resources, as shown in Figure 5-16.
Figure 5-16 A mutex lock is used to protect a critical section by creating an object whose state can be locked and unlocked
A mutex will not disable interrupts. It will not disable the kernel’s preemptive scheduler. It will protect the shared resource in question. One potential problem with a mutex lock protecting a shared resource is that developers need to know it exists! For example, if a developer wrote a third task that accessed the sensor data without knowing it was a shared resource, they could still just directly access the memory! A mutex is an object that manages a lock state, but it does not physically lock the shared resource.
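A minimal FreeRTOS sketch of the convention follows. The mutex API is FreeRTOS’s own; the write helper and timeout value are illustrative assumptions.

#include "FreeRTOS.h"
#include "semphr.h"

static SemaphoreHandle_t SensorDataMutex;

extern void WriteSharedSensorData(void);  /* hypothetical */

void Mutex_Init(void)
{
    SensorDataMutex = xSemaphoreCreateMutex();
}

/* EVERY task that touches the shared data must take the mutex first.
 * The mutex only manages a lock state; a task that skips this call
 * can still reach the memory directly and corrupt it. */
void SensorTask_Update(void)
{
    if (xSemaphoreTake(SensorDataMutex, pdMS_TO_TICKS(10)) == pdTRUE)
    {
        WriteSharedSensorData();
        xSemaphoreGive(SensorDataMutex);
    }
}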
Activity Synchronization
Unilateral Rendezvous
The unilateral rendezvous is the first method we will discuss for synchronizing two tasks. The unilateral rendezvous uses a binary semaphore or an event flag to synchronize the tasks. For example, Figure 5-17 shows tasks 1 and 2 synchronized with a unilateral rendezvous. First, task 2 executes its code to a certain point and then becomes blocked. Task 2 will remain blocked until task 1 reaches the point where it is ready for task 2 to resume executing. Then, task 1 notifies task 2 that it’s okay to proceed by giving a semaphore. Task 2 then unblocks, takes the semaphore, and continues to execute.
Figure 5-17 An example of unilateral rendezvous synchronization between tasks using a binary
semaphore
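In FreeRTOS, the unilateral rendezvous of Figure 5-17 can be sketched as follows; the semaphore calls are the real API, and the two work functions are hypothetical placeholders.

#include "FreeRTOS.h"
#include "semphr.h"

static SemaphoreHandle_t SyncPoint;

extern void DoWorkUntilSyncPoint(void);  /* hypothetical */
extern void ContinueExecution(void);     /* hypothetical */

void Rendezvous_Init(void)
{
    SyncPoint = xSemaphoreCreateBinary();  /* starts "empty" */
}

/* Task 1 runs to the point where task 2 may resume, then signals. */
void Task1(void *params)
{
    (void)params;
    for (;;)
    {
        DoWorkUntilSyncPoint();
        xSemaphoreGive(SyncPoint);  /* task 2 may preempt here if it
                                       has the higher priority */
    }
}

/* Task 2 blocks at the rendezvous until task 1 gives the semaphore. */
void Task2(void *params)
{
    (void)params;
    for (;;)
    {
        (void)xSemaphoreTake(SyncPoint, portMAX_DELAY);
        ContinueExecution();
    }
}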
The unilateral rendezvous is not only for synchronizing two tasks. A unilateral rendezvous can also synchronize and coordinate execution between an interrupt and a task, as shown in Figure 5-18. The difference here is that after the ISR gives the semaphore or the event flag, the ISR will continue to execute until it is complete. In contrast, a unilateral rendezvous between two tasks may cause the task that gives the semaphore to be preempted if the receiving task has a higher priority.
Figure 5-18 A unilateral rendezvous can be used to synchronize an ISR and task code
Figure 5-19 A counting semaphore is used in this unilateral rendezvous to track how many “credits” are available
Bilateral Rendezvous
A bilateral rendezvous is similar to the unilateral rendezvous, except that the synchronization occurs in both directions using two binary semaphores: each task gives one semaphore and then blocks until it can take the other, so neither task proceeds until both have reached the rendezvous point.
Broadcast Design Patterns
The broadcast design pattern allows multiple tasks to block until a semaphore is given, an event flag occurs, or even a message is placed into a message queue. Figure 5-21 shows an example of a task or an interrupt giving a semaphore broadcast to three other tasks. Each receiving task can consume the semaphore and execute its task code once it is the highest priority task ready to run.
Figure 5-21 A task or an ISR can give a binary semaphore broadcast consumed by multiple tasks
Any developer working with embedded systems in the IoT industry and connecting to cloud services is probably familiar with the publish/subscribe model. In many cases, an IoT device will power up, connect to the cloud, and then subscribe to the message topics it wants to receive. The device may even publish to specific topics as well. Interestingly, even an embedded system running FreeRTOS can leverage the publish and subscribe model for its embedded architecture.
The general concept, shown in Figure 5-22, is that a broker is used to receive and send messages belonging to topics. Publishers send messages to the broker, specifying which topic each message belongs to. The broker then routes those messages to the subscribers that have requested messages from those specific topics. It’s possible that a topic has no subscribers, or that one subscriber subscribes to multiple topics, and so forth. The publish and subscribe pattern is excellent for abstracting the system architecture. The publisher doesn’t know who listens to its messages and doesn’t care. The subscriber only cares about the messages it is subscribed to. The result is a scalable system.
Figure 5-22 An example publish/subscribe system where two publishers publish messages to the broker, which then routes the messages to the subscribers of the message topics
Let’s, for a moment, think about an example. Let’s say we have a system that will collect data from several sensors. That sensor data is going to be stored in nonvolatile memory. In addition, that data needs to be transmitted as part of a telemetry beacon for the system. On top of that, the data needs to be used to maintain the system’s orientation through use in a control loop.
We could architect this part of the system in several ways, but one exciting way
would be to use the publish and subscribe design pattern. For example, we could add
a message broker to our system, define the sensor acquisition task as a publisher, and
then have several subscribers that receive data messages and act on that data, as
shown in Figure 5-23.
Figure 5-23 A sensor task delivers sensor data to subscribers through the broker
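A minimal sketch of such a broker follows. All names here (topics, table sizes, callback type) are illustrative assumptions; for simplicity, this version delivers messages synchronously through callbacks.

#include <stdbool.h>
#include <stddef.h>

#define MAX_SUBSCRIBERS 4u

/* Topics the broker routes; extend as the system grows. */
typedef enum { TOPIC_SENSOR_DATA, TOPIC_COUNT } Topic_t;

typedef void (*Subscriber_t)(const void *message);

static Subscriber_t Subscribers[TOPIC_COUNT][MAX_SUBSCRIBERS];

/* A subscriber registers with the broker for the topics it wants. */
bool Broker_Subscribe(Topic_t topic, Subscriber_t callback)
{
    for (size_t i = 0u; i < MAX_SUBSCRIBERS; i++)
    {
        if (Subscribers[topic][i] == NULL)
        {
            Subscribers[topic][i] = callback;
            return true;
        }
    }
    return false;  /* no free subscriber slot for this topic */
}

/* The publisher hands the broker a message for a topic; it neither
 * knows nor cares who is listening. The broker routes the message
 * to every registered subscriber. */
void Broker_Publish(Topic_t topic, const void *message)
{
    for (size_t i = 0u; i < MAX_SUBSCRIBERS; i++)
    {
        if (Subscribers[topic][i] != NULL)
        {
            Subscribers[topic][i](message);
        }
    }
}

In the sensor example, the storage, telemetry, and control modules would each call Broker_Subscribe(TOPIC_SENSOR_DATA, ...), and the sensor task would call Broker_Publish() with each new sample. In an RTOS design, the callbacks could instead post to each subscriber’s message queue so that delivery is decoupled from the publisher’s task.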
Regarding low-power design patterns, the primary pattern is to keep the device turned off as much as possible. The software architecture needs to be event driven. Those events could be a button press, an elapsed time period, or another trigger. In between those events, when no practical work is going to be done, the microcontroller should be placed into an appropriate low-power state, and any nonessential electronics should be turned off. A simple state diagram for the design pattern can be seen in Figure 5-24.
Figure 5-24 The system lives in the low-power state unless a wake-up event occurs. The system returns to the low-power state once the event has been processed
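On a bare-metal Arm Cortex-M part, the event-driven pattern can be sketched as below. __disable_irq(), __enable_irq(), and __WFI() are the standard CMSIS intrinsics; the init, power-gating, and event-handling helpers are hypothetical.

#include <stdbool.h>

extern void SystemAndBoardInit(void);          /* hypothetical */
extern void NonEssentialElectronicsOff(void);  /* hypothetical */
extern void HandleEvent(void);                 /* hypothetical */

static volatile bool EventPending = false;

/* A wake-up source (button, timer, etc.) sets the flag. */
void BUTTON_IRQHandler(void)
{
    EventPending = true;
}

int main(void)
{
    SystemAndBoardInit();

    for (;;)
    {
        /* The check-and-sleep must be atomic: WFI still wakes on a
         * pending interrupt even while interrupts are masked, which
         * closes the race between the check and the sleep. */
        __disable_irq();
        if (!EventPending)
        {
            NonEssentialElectronicsOff();
            __WFI();                 /* enter the low-power state */
        }
        __enable_irq();              /* the wake-up ISR runs here */

        if (EventPending)
        {
            EventPending = false;
            HandleEvent();           /* do the work, then sleep again */
        }
    }
}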
The preceding design pattern is, well, obvious. The devil is in the details of how you stay in the low-power state, wake up fast enough, and then get back to sleep. Like many things in embedded systems, the specifics are device and application specific. However, there are at least a few tips and suggestions that I can give to help you minimize the energy that your devices consume.
First, if your system uses an RTOS, one of the biggest struggles you’ll encounter is keeping the RTOS kernel asleep. Most RTOS application implementations I’ve designed, and the ones I’ve seen, set the RTOS timer tick to one millisecond. This is too fast when you want to go into a low-power state. The kernel will wake the system every millisecond when the timer tick expires! I’d be surprised if the microcontroller and system had even settled into a low-power state by that time. Designers need to use an RTOS with a tickless mode built in or one where they can scale their system tick to keep the microcontroller asleep for longer periods.
Figure 5-25 An unoptimized system whose RTOS system tick prevents the system from going to sleep
In this system, I enabled the FreeRTOS tickless mode and set up the low-power
state I wanted the system to move into. I configured the tickless mode such that I
could get a system tick once every 50 milliseconds, as shown in Figure 5-26. The
result was that the system could move into a low-power state and stay there for much
longer. The power consumption also dropped from 32 milliamps down to only 11
milliamps!
Figure 5-26 Enabling a tickless mode removed the constant system wake-up caused by the system tick
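In FreeRTOS, the tickless behavior described above is enabled through a few configuration macros. The macro names below are FreeRTOS’s own; the values are illustrative, not the exact settings from this project.

/* Excerpt from FreeRTOSConfig.h */
#define configTICK_RATE_HZ                     1000  /* 1 ms tick      */
#define configUSE_TICKLESS_IDLE                1     /* tickless mode  */

/* Suppress the tick and enter the low-power state only when at least
 * this many tick periods of idle time are expected, which keeps the
 * system asleep for longer stretches. */
#define configEXPECTED_IDLE_TIME_BEFORE_SLEEP  50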
Over the last couple of years, as the IoT has taken off and machine learning at the edge has established itself, an interest in multicore microcontrollers has blossomed. Microcontrollers have always been the real-time hot rods of microprocessing. They fill a niche where hard deadlines must be met consistently. However, with the newer needs and demands being placed on them, many manufacturers realized that either you move up to a general-purpose processor and lose the real-time performance, or you start to put multiple microcontrollers on a single die. Multiple processors in a single package can allow a low-power processor to acquire data and then offload it for processing to a much more powerful processor.
Today, in 2022, there are many multicore microcontroller offerings, such as the
Cypress PSoC 6, the STMicroelectronics STM32H7, and the Raspberry Pi Pico, to
name a few. A multicore microcontroller is interesting because it provides multiple
execution environments that can be hardware isolated where code can run in parallel.
I often joke, although prophetically, that one day microcontrollers may look
something like Figure 5-27, where they have six, eight, or more cores all crunching
together in parallel.
Today’s architectures are not quite this radical yet. A typical multicore microcontroller has two cores. However, two types of architectures are common. First, there is homogeneous multiprocessing. In these architectures, each processing core uses the same processor architecture. For example, the ESP32 has dual Xtensa 32-bit LX6 cores, as shown in Figure 5-28. One core is dedicated to Wi-Fi/Bluetooth, while the other is dedicated to the user application.
Figure 5-28 Symmetric multicore processing has two cores of the same processor architecture
The alternative architecture is heterogeneous multiprocessing. In these architectures, each processing core has a different underlying architecture. For example, the Cypress PSoC 64 has an Arm Cortex-M4 for user applications and an Arm Cortex-M0+ that acts as a security processor (see Figure 5-29). The two cores also don’t have to run at the same clock speed. In the PSoC 64, for example, the Arm Cortex-M4 runs at 150 MHz, while the Arm Cortex-M0+ runs at 100 MHz.
Artificial Intelligence and Real-Time Control
Artificial intelligence combined with real-time control is the first use case that I see being pushed relatively frequently. Running a machine learning inference on a microcontroller can undoubtedly be done, but it is compute cycle intensive. The idea is that one core is used to run the machine learning inference, while the other does the real-time control, as shown in Figure 5-30. For example, in the STM32H7 family, the multicore parts have an Arm Cortex-M7 and an Arm Cortex-M4. The M7 runs the machine learning inference, while the M4 does the standard real-time stuff like motor control, sensor acquisition, and communication. The M4 core can feed the M7 with the data it needs to run the AI algorithm, and the M7 can feed the M4 the result.
Figure 5-30 A multicore microcontroller use case where one core is used for running an AI inference, while the other core manages real-time application control (Source: STM32H7 MCUs for rich and complex applications)
Real-Time Control
Figure 5-31 A multicore microcontroller use case where one core runs cycle-intensive real-time displays and memory, while the other manages real-time application control (Source: STM32H7 MCUs for rich and complex applications)
Security Solutions
Another popular use case for multiple cores is managing security solutions within an application. For example, developers can use the hardware isolation built into the multiple cores to use one core as a security processor where the security operations and the Root-of-Trust live, while the other core hosts the normal application space (see Figure 5-32). Data can be shared between the cores using shared memory, but the cores only interact through interprocessor communication (IPC) requests.
Figure 5-32 A multicore microcontroller use case where one core is used as a security processor, while the other core manages real-time application control (Source: STM32H7 MCUs for rich and complex applications)
The three use cases that I described earlier are just a few of the design patterns emerging for multicore microcontrollers. There are undoubtedly many other potential design patterns, and there will certainly be others once microcontrollers start to move beyond having just two cores. Multicore microcontrollers will be an area to keep an eye on in the future. While many applications currently target the high end of the spectrum, as costs come down, we will undoubtedly start seeing 8-bit and 16-bit applications for multicore parts.
Final Thoughts
Action Items
To put this chapter’s concepts into action, here are a few activities the reader can
perform to get more familiar with design patterns:
Examine how your most recent application accesses peripherals. For example,
are they accessed using polling, interrupts, or DMA?
What improvements could be made to your peripheral
interactions?
Consider a product or hobby project you would be interested in
building. What design patterns that we’ve discussed would be nec-
essary to build the system?
What parts, if any, of your application could benefit from using pub-
lish and subscribe patterns?
What are a couple of design patterns that can be used to minimize
energy consumption? Are you currently using these patterns in
your architecture? Why or why not?
What are some of the advantages of using a multicore microcon-
troller? Are there disadvantages? If so, what are they?
Footnotes
1 https://springframework.guru/gang-of-four-design-patterns/
2 The definition of a reasonable amount of time is obviously open to debate and very application dependent.
3 Real-Time Concepts for Embedded Systems by Qing Li with Caroline Yao, page
231.
4 Real-Time Concepts for Embedded Systems by Qing Li with Caroline Yao, page
233.