RTOS Application Design - Embedded Software Design - A Practical Approach To Architecture, Processes, and Coding Techniques
The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2022
J. Beningo, Embedded Software Design
https://doi.org/10.1007/978-1-4842-8279-3_4
Jacob Beningo (1)
(1) Linden, MI, USA
Embedded software has steadily become more and more complex. As
businesses focus on joining the IoT, the need for an operating system to
manage low-level hardware, memory, and time has steadily increased.
Approximately 65% of embedded systems implement a real-time operating
system (RTOS).1 The remaining systems are simple enough for
bare-metal scheduling techniques to achieve the system’s requirements.
An RTOS provides developers with several key capabilities that can be
time-consuming and costly to develop and test from scratch. For example, an RTOS
will provide
A multithreading environment
At least one scheduling algorithm
Mutexes, semaphores, queues, and event flags
Middleware components (generally optional)
The RTOS typically fits into the software stack above the board support
package but under the application code, as shown in Figure 4-1.
Thus, the RTOS is typically considered part of the middleware, even
though middleware components may be called directly by the RTOS.
While an RTOS can provide developers with a great starting point and several
tools to jump-start development, designing an RTOS-based application can be
challenging the first few times developers use one. There are common questions that
developers encounter, such as
This chapter will explore the answers to these questions by looking at how we
design an application that uses an RTOS. However, before we dig into the design, we
first need to examine the similarities and differences between tasks, threads, and
processes.
Figure 4-1 An RTOS is a middleware library that, at a minimum, includes a scheduler and a kernel. In
addition, an RTOS will typically also provide additional stacks such as networking stacks, device input/out-
A task has several definitions that are worth discussing. First, a task
is a concurrent and independent program that competes for execution
time on a CPU.4 This definition tells us that tasks are isolated
applications without interactions with other tasks in the system but may
compete with them for CPU time. They also need to appear like they are
the only program running on the processor. This definition is helpful,
but it doesn’t represent what a task is on an embedded system.
It is a separate “program.”
It may interact with other tasks (programs) running on the system.
It has a dedicated function or purpose.
For most developers working with an RTOS, a thread and a task are
synonyms! Surveying several different RTOSes available in the wild,
you’ll find that there are several that provide thread APIs, such as
Azure RTOS, Keil RTX, and Zephyr. These operating systems provide
similar capabilities that compete with RTOSes that use task
terminology.
This approach is acceptable and used in many applications, but it does not isolate
or protect the various elements. For example, if one of the green threads goes off the
rails and starts overwriting memory used by the blue thread, there is nothing in place
to detect or protect the blue thread. Furthermore, since everything is in one memory
space, everyone can access everyone else’s data! Obviously, this may be unwanted
behavior in many applications, which is why the MPU could be used to create
processes, resulting in multiprocess applications like that shown in Figure 4-3.
Figure 4-3 A multiprocess application can leverage an MPU to group threads, input/output, interrupts,
Feature-Based Decomposition
Display
Touch screen
LED backlight
Cloud connectivity
Temperature measurement
Humidity measurement
HVAC controller
Most teams will create a list of features the system must support
when they develop their stakeholder diagrams and identify the system
requirements. This effort can also be used to determine the tasks that
make up the application software.
When using the feature-based approach, it’s critical that developers also go
through an optimization phase to see where identified tasks can be combined based
on common functionality. For example, tasks may be specified for measuring
temperature, pressure, humidity, etc. However, having a task for each individual
measurement will overcomplicate the design. Instead, these measurements could all be
combined into a sensor task. Figure 4-4 provides an example of a system’s appearance
when feature-based task decomposition is used.
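As a rough illustration of that optimization pass, the sketch below starts from one candidate task per feature and merges candidates that share a functional category. The category labels and the `merge_tasks` helper are assumptions made for this example; they are not part of any RTOS API, and a real design would group tasks using the team's own judgment.

```python
from collections import defaultdict

# One candidate task per feature, tagged with a functional category.
# The category assignments are illustrative assumptions.
FEATURES = {
    "Display": "output",
    "Touch screen": "user input",
    "LED backlight": "output",
    "Cloud connectivity": "network",
    "Temperature measurement": "sensor",
    "Humidity measurement": "sensor",
    "HVAC controller": "control",
}


def merge_tasks(features):
    """Group features that share a category into a single task."""
    tasks = defaultdict(list)
    for feature, category in features.items():
        tasks[category].append(feature)
    return dict(tasks)


TASKS = merge_tasks(FEATURES)
for task_name, members in TASKS.items():
    print(f"{task_name} task: {', '.join(members)}")
```

Seven candidate tasks collapse into five here, with both measurements handled by one sensor task.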
Figure 4-4 An example of feature-based task decomposition for an IoT thermostat that shows all the
Using features is not the only way to decompose tasks. One of my
favorite methods to use is the outside-in approach.
Developers can follow a simple process to decompose their applications using the
outside-in approach. The outside-in process helps developers think through the
application, examine the data elements, and ensure a consistent process for
developing applications. The steps are relatively straightforward and include
The easiest way to examine and explain each step is to look at an
example. One of my favorite teaching examples is to use an
Internet-connected thermostat. So let’s discuss how we can decompose an IoT
thermostat into tasks.
Like any design, we need to understand the hardware before designing the
software architecture. Figure 4-5 shows an example thermostat hardware block
diagram. We can then use it to follow the steps we have defined to decompose our
application into tasks.
Figure 4-5 The hardware block diagram for an IoT thermostat. Devices are grouped based on the hard-
The first step to decomposing an application into tasks is identifying the major
components that make up the system. These are going to be components that
influence the software system. For example, the IoT thermostat would have major
components such as
Humidity/temperature sensor
Gesture sensor
Touch screen
Analog sensors
Connectivity devices (Wi-Fi/Bluetooth)
LCD/display
Fan/motor control
Backlight
Etc.
One component that I find people often overlook is the device itself!
Make sure you add that to the list as well. For example, add an IoT
device to your list if you are building an IoT device. If you are making a
propulsion controller, add a propulsion controller. The reason to do
this is that this block acts as a placeholder for our second-tier tasks that
we will decompose in the last step.
At this point, we now know the major components that we will be trying
to integrate into our software architecture. We don’t know what
tasks we will have yet, though. That will depend on how these different
components interact with each other and how they produce data. So,
the next logical step is to take our list of components and build a block
diagram with them. A block diagram will help us visualize the system,
which I find helpful. We could, in theory, create a table for the
decomposition steps, but I’ve always found that a picture is worth a thousand
words and much more succinctly gets the point across.
Figure 4-6 demonstrates how one might start to develop a block diagram for the
IoT thermostat. Notice that I’ve grouped the analog and digital sensors into a single
sensor block to simplify the diagram and grouped them with the fan/state control
components.
Figure 4-6 The major components are arranged in a way that allows us to visualize the system
We don’t want to lose sight of our design principles when designing embedded
software. We’ve discussed how important it is to let data dictate the design. When we
are working on decomposing our application into tasks, the story is not any different.
Therefore, we want to examine each major component and identify which blocks
generate input data into the IoT device block, as shown in Figure 4-7.
Figure 4-7 In step #3, we identify where input data comes into the system. These are the data sources
In this step, we examine each block and determine what the output
from the block is and the input into the application. For example, the
touch screen in our application will likely generate an event with x and
y coordinate data when someone presses the touch screen. In addition,
the Wi-Fi and Bluetooth blocks will generate network data, which we
are leaving undefined for now. Finally, we mark the sensor block as
generating sensor values.
At this stage, I’ve left the data input into the IoT device block as descriptive.
However, it would not hurt to add additional detail if it is known. For example, I
might know that the touch screen data will have the following characteristics:
X and y coordinates.
Event driven (data only present on touch).
Data generation is every 50 milliseconds when events occur.
Coordinate data may be used to detect gestures.
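One way to capture characteristics like these is to write them down as a data type before any task code exists. The sketch below is only an illustration: the `TouchEvent` type, its field names, and the `is_swipe` check with its 100-pixel threshold are assumptions made for this example, not part of the thermostat design.

```python
from dataclasses import dataclass

SAMPLE_PERIOD_MS = 50  # Data is generated every 50 ms while a touch is active.


@dataclass(frozen=True)
class TouchEvent:
    """One touch sample; produced only while the screen is being touched."""
    x: int
    y: int
    timestamp_ms: int


def is_swipe(events, min_distance=100):
    """Rough, hypothetical gesture check on a list of TouchEvent samples.

    Returns True when the samples form one continuous touch whose
    horizontal travel is at least min_distance pixels.
    """
    if len(events) < 2:
        return False
    # Samples more than one period apart belong to separate touches.
    for prev, curr in zip(events, events[1:]):
        if curr.timestamp_ms - prev.timestamp_ms > SAMPLE_PERIOD_MS:
            return False
    return abs(events[-1].x - events[0].x) >= min_distance
```

Writing the input down this way makes the data rate and the event-driven nature of the source explicit, which helps later when deciding which task consumes it.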
After identifying all the inputs into the IoT device block, we naturally want to look at
all the outputs. Outputs are things that our device is going to control in the “real
world.”12 An example of the outputs in the IoT thermostat example can be found in
Figure 4-8.
Figure 4-8 In step #4, we identify the system’s outputs. These are the data products that are generated
by the system
We can see from the diagram that this system has quite a few
outputs. First, we have an output to the LCD, which is character data or
commands to the LCD. The LED backlight and fan are each fed duty
cycle information. There is then data associated with several other blocks
as well.
Just like with the input labeling, it would not hurt to add a reference label to the
block diagram and generate a table or description that gives more information about
each. For example, I might expand on the LED backlight duty cycle output data to
include
Finally, we are ready to start identifying the first tasks in our system!
As you may have already figured out, these first tasks are specific to
interacting with the hardware components in our system! That’s really
what the outside-in approach is doing. The application needs to look at
the outer layer of hardware, identify the tasks necessary to interact
with it, and finally identify the inner, purely software-related tasks.
Typically, I like to take the block diagram I was working with and
transform it during this step. I want to look at the inputs and outputs to
then rearrange the blocks and data into elements that can be grouped.
For example, I may take the sensor and touch screen blocks and put
them near each other because they can be looked at as sensor inputs to
the application. I might look at the Wi-Fi and Bluetooth blocks and say
these are connectivity blocks and should also be placed near each
other. This helps to organize the block diagram a bit before we identify
our first tasks.
I make one critical adjustment to the block diagram; I expand the IoT device
block from a small text block to a much larger one. I do this so that the IoT device
block can encompass all the tasks I am about to define! I then go through the system
inputs and outputs and generate my first-tier tasks when I do this. The result can be
seen in Figure 4-9.
Figure 4-9 In step #5, we group data inputs and outputs and identify the first tasks in the outer layer of
the application
Five tasks have been identified in this initial design which include
Process inputs
Network manager
Print
Process outputs
Memory manager
What can be seen here is that we are trying to minimize the number
of tasks included in this design. We’ve grouped all the sensor inputs
into a single process input task. All connectivity has been put into a
network management task. The nonvolatile memory source is accessed
through a memory manager gatekeeping task rather than using a
mutex to protect access.
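The gatekeeper idea can be sketched on a desktop with a thread and a queue. This is a model of the pattern rather than RTOS code: the `MemoryManager` class, its method names, and the dictionary standing in for nonvolatile memory are all assumptions made for illustration. Only the gatekeeper thread ever touches the storage, so no mutex is required around it.

```python
import queue
import threading


class MemoryManager(threading.Thread):
    """Gatekeeper: the only thread that touches the storage dictionary."""

    def __init__(self):
        super().__init__(daemon=True)
        self._requests = queue.Queue()
        self._storage = {}  # Stand-in for the nonvolatile memory.

    def write(self, key, value):
        """Called from any other task; posts a request and returns."""
        self._requests.put(("write", key, value, None))

    def read(self, key):
        """Post a read request and block until the gatekeeper replies."""
        reply = queue.Queue(maxsize=1)
        self._requests.put(("read", key, None, reply))
        return reply.get()

    def stop(self):
        """Ask the gatekeeper to exit after earlier requests are served."""
        self._requests.put(("stop", None, None, None))

    def run(self):
        # Requests are served strictly in the order they were posted.
        while True:
            op, key, value, reply = self._requests.get()
            if op == "stop":
                break
            if op == "write":
                self._storage[key] = value
            elif op == "read":
                reply.put(self._storage.get(key))
```

A caller would create the manager, call `start()`, and then use `write("setpoint_c", 21.5)` or `read("setpoint_c")` from any thread. The trade-off against a mutex is an extra task and queue in exchange for a single, easily audited owner of the memory.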
At this point, we now have our initial, first-tier tasks representing our design’s outer
layer. This layer is based on the hardware our application needs to interact with and
the data inputs and outputs from those devices. The next step is carefully reviewing
how that data will flow through the task system and interact with our application
task(s). Figure 4-10 shows how one might modify our previous diagram to see this
data flow.
Figure 4-10 Carefully identify the high-level dependencies between the various tasks. This diagram
shows arrows between the application task and the input/output tasks
The diagram also allows us to start considering which tasks are
concurrent, that is, which ones may need to appear to run simultaneously
and which are feeding other tasks. For example, if we have an output
that we want to generate, we may want the input task and the
application task to run first to act on the latest data. (We may also want to act
on the output data from the last cycle first to minimize jitter in the
output!)
Once we have identified our initial tasks, it’s the perfect time to develop a data
flow diagram. A data flow diagram shows how data is input, transferred through the
system, acted on, and output from the system. In addition, the data flow diagram
allows the designer to identify event-driven mechanisms and RTOS objects needed
to build the application. These will include identifying
Figure 4-11 provides an example data flow diagram for our IoT thermostat.
Figure 4-11 Developing a data flow diagram is critical in determining how data moves through the
application and identifying shared resources, task synchronization, and resources needed for the application
The data flow diagram we created in the last step can be beneficial in
forming the core for our second-tier task diagram. In this step, we want
to look at our application, its features, processes, etc., and break it into
more manageable tasks. It wouldn’t be uncommon for this generic
block we have been looking at to break up into a dozen or more tasks
suddenly! We’ve essentially been slowly designing our system in layers,
and the application can be considered the inner layer in the design.
To identify our second-tier or application tasks, I start with my data flow diagram
and strip out everything in the data flow outside the tasks that directly interact with
the application. This essentially takes Figure 4-11 and strips it down into
Figure 4-12. This diagram focuses on the application block’s inputs, outputs, and processing.
Figure 4-12 Identifying the second-tier application tasks involves focusing on the application task
block and all the input and output data from that block
We now remove the application block and try to identify the tasks
and activities performed by the application. The approach should look
at each input and determine what application task would handle that
input. Each output can be looked at to determine what application task
would take that output. The designer can be methodical about this.
Let’s start by looking at the process input task. Process inputs execute and
receive new sensor data. The sensor data is passed to the application through a data
store protected by a mutex. The mutex ensures mutually exclusive access to the data
so that we don’t end up with corrupted data. How process inputs execute would be as
follows:
The process input task behavior is now clearly defined. So the
question becomes, how will the application receive and process the new
sensor data?
One design option for the application is to have a new task, sensor DSP, block on
a semaphore waiting for new data to be available. For example, the sensor DSP task
would behave as follows:
Figure 4-13 Second-tier tasks are created by breaking down the application task and identifying semi-
independent program segments that assist in driving the controller and its output
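The interaction between process inputs and the sensor DSP task can be modeled on a desktop with Python threads. The sketch below is an assumption-laden illustration, not the chapter's implementation: `threading.Lock` stands in for the RTOS mutex, `threading.Semaphore` for the RTOS semaphore, and the "processing" step (conversion to fixed-point tenths of a degree) is a placeholder.

```python
import threading


def run_demo(raw_samples):
    """Model process inputs feeding a sensor DSP task; returns DSP output."""
    data_lock = threading.Lock()          # Plays the role of the mutex.
    data_ready = threading.Semaphore(0)   # Signals that new data is available.
    store = {"temperature_c": None}       # Shared data store.
    results = []                          # Written only by the sensor DSP task.

    def process_inputs():
        for sample in raw_samples:
            with data_lock:               # Exclusive access while writing.
                store["temperature_c"] = sample
            data_ready.release()          # Wake the sensor DSP task.

    def sensor_dsp():
        for _ in raw_samples:
            data_ready.acquire()          # Block until new data is signaled.
            with data_lock:               # Exclusive access while copying.
                sample = store["temperature_c"]
            # Placeholder processing: convert to fixed-point tenths.
            results.append(round(sample * 10))

    tasks = [threading.Thread(target=process_inputs),
             threading.Thread(target=sensor_dsp)]
    for task in tasks:
        task.start()
    for task in tasks:
        task.join()
    return results
```

Note that the store holds only the latest sample. If process inputs runs ahead of the sensor DSP task, an older sample is overwritten and the newest one may be processed more than once; a design that must not lose samples would pass them through a queue instead.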
Now that we have identified our tasks, several questions should be on every
designer’s mind:
There are typically three different algorithms that designers can use to set task
priorities:
Task Number   Task Type   Response Time (ms)   Execution Time (ms)   Period (ms)
1             Periodic    30                   20                    100
2             Periodic    15                   5                     150
3             Aperiodic   100                  15                    –
4             Aperiodic   20                   2                     –
In shortest job first scheduling, the task with the shortest execution
time is given the highest priority. For example, for the tasks in
Table 4-1, task 4 has a two-millisecond execution time, which makes it the
highest priority task. The complete task priority for the shortest job first can
be seen in Table 4-2.
Shortest response time scheduling sets the task with the shortest response time as
the highest priority. Each task has a requirement for its real-time response. For
example, a task that receives a command packet every 100 milliseconds may need to
respond to a completed packet within five milliseconds. For the tasks in Table 4-1,
task 2 has the shortest response time, making it the highest priority.
Table 4-2 Priority results for the system defined in Table 4-1 based on the
selected scheduling algorithm
Scheduling Policy        Shortest Response Time   Shortest Execution Time   Rate Monotonic Scheduling
Assigned Task Priority   Task 2 (highest)         Task 4 (highest)          Task 1 (highest)
                         Task 4                   Task 2                    Task 2
                         Task 1                   Task 3                    Undefined
                         Task 3 (lowest)          Task 1 (lowest)           Undefined
Finally, periodic execution time scheduling gives the highest priority to
the task with the shortest period. Therefore, the task must be a periodic task.
For example, in Table 4-1, tasks 3 and 4 are aperiodic (event-driven)
tasks that do not occur at regular intervals. Therefore, they have an
undefined task priority based on the periodic execution time policy.
Table 4-2 summarizes the results for setting task priorities for the
tasks defined in Table 4-1 based on the selected scheduling policy. The
reader can see that the chosen scheduling policy can dramatically
impact the priority settings. Conversely, selecting an inappropriate
scheduling policy could affect how the system performs and even determine
whether deadlines will be met successfully or not.
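The three policies are simple enough to check with a few lines of code. The sketch below applies each one to the task values from Table 4-1; the tuple layout and function names are assumptions made for this example, and aperiodic tasks are given a period of `None` to mark it as undefined.

```python
# (task number, type, response time ms, execution time ms, period ms)
TASKS = [
    (1, "Periodic", 30, 20, 100),
    (2, "Periodic", 15, 5, 150),
    (3, "Aperiodic", 100, 15, None),
    (4, "Aperiodic", 20, 2, None),
]


def by_shortest_response(tasks):
    """Task numbers ordered from highest to lowest priority."""
    return [t[0] for t in sorted(tasks, key=lambda t: t[2])]


def by_shortest_execution(tasks):
    """Shortest job first: shortest execution time gets highest priority."""
    return [t[0] for t in sorted(tasks, key=lambda t: t[3])]


def by_rate_monotonic(tasks):
    """Only periodic tasks can be ranked; the rest stay undefined."""
    periodic = [t for t in tasks if t[4] is not None]
    ranked = [t[0] for t in sorted(periodic, key=lambda t: t[4])]
    undefined = [t[0] for t in tasks if t[4] is None]
    return ranked, undefined


print("Shortest response time: ", by_shortest_response(TASKS))
print("Shortest execution time:", by_shortest_execution(TASKS))
print("Rate monotonic:         ", by_rate_monotonic(TASKS))
```

Running the same task set through all three functions makes it easy to see how much the priority order shifts with the policy.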
For many real-time systems, the go-to scheduling policy starts with
the periodic execution time, also known as rate monotonic scheduling
(RMS). RMS allows developers to set task priorities and verify that all
the tasks in a system can be scheduled to meet their deadlines!
Calculating CPU utilization for a system based on its tasks is a powerful
tool to make sure that the design is on the right track. Let’s now look at
how verifying our design is possible by using rate monotonic analysis
(RMA).
Rate monotonic analysis (RMA) is an analysis technique to determine if all tasks can
be scheduled to run and meet their deadlines.15 It relies on calculating the CPU
utilization for each task over a defined time frame. RMA comes in several different
flavors based on how complex and accurate an analysis the designer wants to
perform. However, the basic version has several critical assumptions that developers
need to be aware of, which include
The other big assumption in this list is that all tasks are
independent. If you have ever worked on an RTOS-based system, it’s
evident that this is rarely the case. Sometimes, tasks will protect a shared
resource using a mutex or synchronize task execution using a
semaphore. These interactions mean that a task’s execution time may
be affected by other tasks and is not wholly independent. The basic RMA
assumptions don’t allow for these cases, although there are
modifications to RMA that cover them; those calculations are beyond our
discussion’s scope.16
The primary test to determine if all system tasks can be scheduled successfully is
to use the basic RMA equation:
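In its commonly published form, the test can be written as follows. The symbol names are assumed here: $E_k$ is the worst-case execution time of task $k$, $T_k$ is its period, and $n$ is the number of tasks.

```latex
\sum_{k=1}^{n} \frac{E_k}{T_k} \le n\left(2^{1/n} - 1\right)
```

The left side is the total CPU utilization of the task set, obtained by summing each task's execution time divided by its period.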
The right side of the equation provides us with an upper bound against which we
must compare the CPU utilization. Table 4-3 shows the upper bound on the CPU
utilization that still allows all tasks in a system to be scheduled successfully.
Notice that the bound quickly converges toward 69.3% CPU utilization as the number
of tasks approaches infinity.
Table 4-3 Scheduling all tasks in a system and ensuring they meet their
deadlines depends on the number of tasks in the system and the upper bound of the
CPU utilization
Number of Tasks   CPU Utilization Upper Bound
1                 100%
2                 82.8%
3                 77.9%
4                 75.6%
∞                 69.3%
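The values in Table 4-3 follow from the bound n(2^(1/n) − 1), which approaches ln 2 as the number of tasks grows. A short sketch, with the function name `rma_bound` chosen only for this example, can regenerate them for any task count:

```python
import math


def rma_bound(n):
    """Upper bound on total CPU utilization for n tasks: n(2^(1/n) - 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return n * (2 ** (1 / n) - 1)


RMA_LIMIT = math.log(2)  # Value the bound approaches as n grows.

for n in (1, 2, 3, 4):
    print(f"{n} tasks: {rma_bound(n):.4f}")
print(f"limit:   {RMA_LIMIT:.4f}")
```

For three and four tasks, the computed bounds are roughly 0.7798 and 0.7568, which Table 4-3 lists as 77.9% and 75.6%.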
RMA allows the designer to use their assumptions about their tasks,
and then based on the number of tasks, they can calculate whether the
system can successfully schedule all the tasks or not. It is essential to
recognize that the basic analysis is a sanity check. In many cases, I view
RMA as a model that isn’t calculated once but tested using our initial
assumptions and then periodically updated based on real-world
measurements and refined assumptions as the system is built.
The question that may be on some readers’ minds is whether the example system
we were looking at in Table 4-1 can be scheduled successfully or not. Using RMA,
we can calculate the individual CPU utilization as shown in Table 4-4.
Table 4-4 An example system has four tasks: two periodic and two aperiodic
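The per-task calculation is execution time divided by period, summed over all tasks and compared against the bound for that task count. The sketch below applies it to the two periodic tasks from Table 4-1 only. The aperiodic tasks need a worst-case period before they can be included, and since those values are not reproduced here, they are left for the designer to supply. The function names are assumptions made for this example.

```python
def utilization(tasks):
    """Sum execution_ms / period_ms over (execution_ms, period_ms) pairs."""
    return sum(execution / period for execution, period in tasks)


def is_schedulable(tasks):
    """Basic RMA test: total utilization must not exceed n(2^(1/n) - 1)."""
    n = len(tasks)
    return utilization(tasks) <= n * (2 ** (1 / n) - 1)


# Periodic tasks 1 and 2 from Table 4-1 as (execution ms, period ms).
# Aperiodic tasks would be appended with a designer-supplied worst-case
# period in place of the undefined period.
PERIODIC = [(20, 100), (5, 150)]

print(f"Utilization: {utilization(PERIODIC):.3f}")
print(f"Schedulable: {is_schedulable(PERIODIC)}")
```

Task 1 contributes 20% and task 2 roughly 3.3%, well under the two-task bound of 82.8%. Re-running the check as measured execution times replace the initial estimates keeps the model in step with the real system.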
Most RTOS kernels have a trace capability built into them. The trace
capability is a setting in the RTOS that allows the kernel to track what is
happening in the kernel so that developers can understand the
application’s performance and even debug it if necessary. The trace capability
is an excellent, automated instrumentation tool for designers to
leverage to understand how the implemented application is performing
compared to the design.
Designers need two things to retrieve the event data from the RTOS
kernel. First, they need a recording library that can take the events
from the kernel and store them in a RAM buffer. The RAM buffer can
then either be read out when it is full (snapshot mode) or it can be
emptied at periodic intervals and streamed to the host (streaming mode) in
real time. Second, there needs to be a host-side application that can
retrieve the event data and then reconstruct it into a visualization that
can be reviewed by the developer.
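The difference between the two modes comes down to when the RAM buffer is emptied. The toy model below illustrates this; `TraceRecorder` and the two capture functions are hypothetical names for this sketch and do not correspond to the API of any real recording library.

```python
from collections import deque


class TraceRecorder:
    """Toy event recorder with a fixed-size RAM buffer (hypothetical API)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = deque()
        self.dropped = 0

    def is_full(self):
        return len(self.buffer) >= self.capacity

    def record(self, timestamp, event):
        """Store one kernel event; returns False if it had to be dropped."""
        if self.is_full():
            self.dropped += 1
            return False
        self.buffer.append((timestamp, event))
        return True

    def drain(self):
        """Hand all buffered events to the host and empty the buffer."""
        events = list(self.buffer)
        self.buffer.clear()
        return events


def snapshot_capture(recorder, events):
    """Snapshot mode: record until the buffer is full, then read it once."""
    for timestamp, event in events:
        recorder.record(timestamp, event)
        if recorder.is_full():
            break
    return recorder.drain()


def streaming_capture(recorder, events, drain_every):
    """Streaming mode: empty the buffer every `drain_every` events."""
    host_log = []
    for count, (timestamp, event) in enumerate(events, start=1):
        recorder.record(timestamp, event)
        if count % drain_every == 0:
            host_log.extend(recorder.drain())
    host_log.extend(recorder.drain())  # Flush whatever is left.
    return host_log
```

In this model, snapshot mode captures only as many events as the buffer holds, while streaming mode can capture an arbitrarily long run as long as the buffer is drained faster than it fills.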
We will discuss tools a bit more in Chapter 15, but I don’t want to move on
without giving you a few examples of the tools I use to perform RTOS
measurements. The first tool that can be used is SEGGER SystemView. An example
trace from SystemView can be seen in Figure 4-14. The RTOS generates the events
which are then streamed using the SEGGER RTT library through SWD into a
SEGGER J-Link which then distributes it through the J-Link server to SystemView.
SystemView then records the events and generates
An event log
A task context switch diagram in a “chart strip” visualization
The total CPU utilization over time
A context summary that lists the system tasks along with statistical
data about them such as total runtime, minimum execution time,
average execution time, maximum execution time, and so forth.
SystemView is a free tool that can be downloaded by any team that is using a
J-Link. SystemView can provide some amazing insights, but as a free tool, it does have
limitations as to what information it reports and its capabilities. However, for teams
looking for quick and easy measurements without spending any money, this is a
good first tool to investigate.
Figure 4-14 An example RTOS trace taken using SEGGER SystemView. Notice the running trace data
that shows task context switches and the event log with all the events generated in the system. (Image
Source: SEGGER19)
The second tool that I would recommend and the tool that I use for
all my RTOS measurements and analysis is Percepio’s Tracealyzer.
Tracealyzer is the “hot rod” of low-cost RTOS visualization tools.
Tracealyzer provides the same measurements and visualizations as
SystemView but takes things much further. The event library can
record the standard RTOS events but also allows users to create custom
events. That means if a developer wants to track the state of a
state machine or the status of a network stack and so forth, they can
create custom events that are recorded and reported.
Figure 4-15 An example RTOS trace taken using Percepio Tracealyzer. Notice the multiple views and
the time synchronization between them to visualize what is happening in the system at a given point in
I’m probably a little bit biased when it comes to using these tools, so
I would recommend that you evaluate the various tools yourself and
use the one that best fits your own needs and budget. There are certainly
other tools out there that can help you measure your task execution
and performance, but these two are my favorites for teams that are on
a tight budget. (And the cost for Tracealyzer is probably not even
noticeable in your company’s balance sheet; I know it’s not in mine!)
Final Thoughts
Real-time operating systems have found their way into most embedded
systems, and it’s likely their use will increase and continue to dominate
embedded systems. How you break your system up into tasks and processes
will affect how scalable, reusable, and even how robust your system
is. In this chapter, we’ve only scratched the surface of what it takes to
decompose a system into tasks, set our task priorities, and arrive at a
functional system task architecture. No design or implementation can
be complete without leveraging modern-day tracing tools to measure
your assumptions in the real world and use them to feed back into your
RMA model and help to verify or tune your application.
Action Items
To put this chapter’s concepts into action, here are a few activities the reader can
perform to start applying RTOS design principles to their application(s):
Footnotes
1 www.embedded.com/wp-content/uploads/2019/11/EETimes_Embedded_2019_Embedded_Markets_Study.pdf
2 www.embedded.com/program-structure-and-real-time/
3 www.embeddedrelated.com/thread/5762/rtos-vs-bare-metal
4 Unknown reference.
5 https://docs.microsoft.com/en-us/azure/rtos/threadx/chapter1
6 https://docs.microsoft.com/en-us/azure/rtos/threadx/chapter1
7 https://docs.microsoft.com/en-us/azure/rtos/threadx/chapter1
8 This diagram is inspired by and borrowed from Jean Labrosse’s “Using a MPU
with an RTOS” blog series.
9 This diagram is inspired by and borrowed from Jean Labrosse’s “Using a MPU
with an RTOS” blog series.
10 www.webopedia.com/definitions/feature/#:~:text=(n.),was%20once%20a%20simple%20application
12 I mention “real world” here because there will be inputs and outputs in the
application layer that don’t physically interact with the world but are just
software constructs.
13 Miller, G. (1956), The psychological review, 63, 81–97.
14 The concepts, definitions, and examples in this section are taken from
Real-Time Operating Systems Book 1 – The Theory by Jim Cooling, Chapter 9, page 6.
15 https://en.wikipedia.org/wiki/Rate-monotonic_scheduling
17 This is the worst-case periodic rate for this aperiodic, event-driven task.
18 This is the worst-case periodic rate for this aperiodic, event-driven task.
19 https://c.a.segger.com/fileadmin/documents/Press_Releases/PR_151106_SEGGER_SystemView.pdf
20 https://percepio.com/docs/OnTimeRTOS32/manual/