Embedded Software Development: The Open-Source Approach
“…valuable to students and professionals who need a single, coherent source of
information.”
—Kristian Sandström, ABB Corporate Research, Västerås, Sweden
Embedded Software Development: The Open-Source Approach delivers a practi-
cal introduction to embedded software development, with a focus on open-source
components. This programmer-centric book is written in a way that enables even novice
practitioners to grasp the development process as a whole.
• Defines the role and purpose of embedded systems, describing their internal
structure and interfacing with software development tools
• Examines the inner workings of the GNU compiler collection (GCC)-based
software development system or, in other words, toolchain
• Presents software execution models that can be adopted profitably to
model and express concurrency
• Addresses the basic nomenclature, models, and concepts related to task-based
scheduling algorithms
• Shows how an open-source protocol stack can be integrated in an embedded
system and interfaced with other software components
• Analyzes the main components of the FreeRTOS Application Programming Interface
(API), detailing the implementation of key operating system concepts
• Discusses advanced topics such as formal verification, model checking, runtime
checks, memory corruption, security, and dependability
Tingting Hu
National Research Council of Italy
Politecnico di Torino
Turin, Italy
This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information stor-
age or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222
Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that pro-
vides licenses and registration for a variety of users. For organizations that have been granted a photo-
copy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
Ché nessun quaggiù lasciamo,
né timore, né desir
(For we leave no one behind us,
nor dreads, nor desires)
— Ivan
Foreword .................................................................................................................xiii
Preface....................................................................................................................xvii
Index...................................................................................................................... 515
Foreword
Embedded software touches many aspects of our daily life by defining the behavior
of the things that surround us, be it a phone, a thermostat, or a car. Over time,
these things have been made more capable by a combination of advances in hardware
and software. The features and capabilities of these things are further augmented by
connecting to other things or to external software services in the cloud, e.g., by a
training watch connecting wirelessly to a heartrate monitor that provides additional
sensor data and to a cloud service providing additional analysis and presentation
capabilities. This path of evolution can be seen in most domains, including industrial
embedded systems, and has over time added new layers of knowledge that are needed
for development of embedded systems.
Embedded systems cover a large group of different systems. For instance, a phone
is a battery-powered device with relatively high processing power and ample options
for wireless connectivity, while a thermostat is likely to have scarce resources and
limited connectivity options. A car on the other hand represents another type of em-
bedded system, comprising a complex distributed embedded system that operates in
harsh environments. Although the requirements and constraints on these systems are
quite different, there are still strong commonalities among most types of embedded
systems, considering the software layers that sit closest to the hardware, typically
running on embedded or real-time operating systems that offer similar services.
Every new generation of a software-based system typically is more elaborate and
operates in a more complex and dynamic environment than the previous one. Over
the last four decades many things have changed with respect to software-based systems in general, starting in the 1980s with the early desktop PC, like the IBM XT that
I used at the university, which was similar to industrial and embedded systems with
respect to operating system support and development environments. In the 1990s,
real-time and embedded operating systems services remained mostly the same, while
the evolution of general purpose operating systems witnessed many changes includ-
ing graphical user interfaces. During the first decade of the new millennium we saw
the introduction of smart phones, server virtualization, and data centers, and cur-
rently we see the emergence of large scale open-source software platforms for cloud
computing and the Internet of Things (IoT).
Some embedded systems such as consumer electronics have been an integral part
of these latest developments while industrial system adoption is slower and more
careful. Although the system context today is much more complex than thirty-five
years ago and many software layers have been added, the foundation, the software
that sits close to the hardware, is still much the same with the same requirements for
timeliness and predictability.
As the software stack in embedded systems increases and as the functionality
reaches beyond the embedded device through connectivity and collaboration with
What is lacking is a continuous and coherent description that holds all the fragments together, presenting a holistic view of the fundamentals of the embedded system, the components involved, how they operate in general, and how they interact with and depend on each other. As such, this book fills a gap in the literature and will prove valuable
to students and professionals who need a single coherent source of information.
Kristian Sandström
ABB Corporate Research
Västerås, Sweden
The Authors
Ivan Cibrario Bertolotti earned the Laurea degree (summa cum laude) in com-
puter science from the University of Torino, Turin, Italy, in 1996. Since then, he has
been a researcher with the National Research Council of Italy (CNR). Currently, he
is with the Institute of Electronics, Computer, and Telecommunication Engineering
(IEIIT) of CNR, Turin, Italy.
His research interests include real-time operating system design and implementa-
tion, industrial communication systems and protocols, and formal methods for vul-
nerability and dependability analysis of distributed systems. His contributions in this
area comprise both theoretical work and practical applications, carried out in coop-
eration with leading Italian and international companies.
Dr. Cibrario Bertolotti taught several courses on real-time operating systems at
Politecnico di Torino, Turin, Italy, from 2003 until 2013, as well as a PhD degree
course at the University of Padova in 2009. He regularly serves as a technical referee
for the main international conferences and journals on industrial informatics, factory
automation, and communication. He has been an IEEE member since 2006.
List of Figures
6.1 Notation for real-time scheduling algorithms and analysis. ....................... 141
6.2 U-based schedulability tests for Rate Monotonic....................................... 145
6.3 Scheduling diagram for the task set of Table 6.2........................................ 147
6.4 Scheduling diagram for the task set of Table 6.3........................................ 148
6.5 Unbounded priority inversion. .................................................................... 154
6.6 Priority inheritance protocol. ...................................................................... 158
10.1 Object-Like Macros to Be Provided by the FreeRTOS Porting Layer .... 276
10.2 Function-Like Macros to Be Provided by the FreeRTOS
Porting Layer .............................................................................................. 277
10.3 Data Types to Be Provided by the FreeRTOS Porting Layer................... 278
10.4 Functions to Be Provided by the FreeRTOS Porting Layer..................... 278
14.1 Original (N) and Additional (A) Transitions of the Election Protocol’s
Timed FSM ................................................................................................. 412
but are indeed of utmost importance to guarantee that embedded systems work reli-
ably and efficiently.
Furthermore, virtually all chapters in the first part of the book make explicit ref-
erence to real fragments of code and, when necessary, to a specific operating system,
that is, FreeRTOS. This also addresses a shortcoming of other textbooks, that is, the
lack of practical programming examples presented and commented on in the main
text (some provide specific examples in appendices).
From this point of view FreeRTOS is an extremely useful case study because it
has a very limited memory footprint and execution-time overhead, but still provides
a quite comprehensive range of primitives.
In fact, it supports a multithreaded programming model with synchronization and
communication primitives that are not far away from those available on much bigger and more complex systems. Moreover, thanks to its features, it has successfully been
used in a variety of real-world embedded applications.
At the same time, it is still simple enough to be discussed in a relatively detailed
way within a limited number of pages and without overwhelming the reader with
information. This also applies to its hardware abstraction layer—that is, the code
module that layers the operating system on top of a specific hardware architecture—
which is often considered an “off-limits” topic in other operating systems.
As a final remark, the book title explicitly draws attention to the open-source
approach to software development. In fact, even though the usage of open-source
components, at least in some scenarios (like industrial or automotive applications)
is still limited nowadays, it is the authors’ opinion that open-source solutions will
enjoy an ever-increasing popularity in the future.
For this reason, all the software components mentioned and taken as examples
in this book—from the software development environment to the real-time operat-
ing system, passing through the compiler and the software analysis and verification
tools—are themselves open-source.
The first part of the book guides the reader through the key aspects of embedded software development, spanning from the peculiar requirements of embedded systems, in contrast to their general-purpose counterparts, to network communication, implemented by means of open-source protocol stacks.
In fact, even though the focus of this book is mainly on software development
techniques, most embedded systems are nowadays networked or distributed, that is,
they consist of a multitude of nodes that cooperate by communicating through a
network. For this reason, embedded programmers must definitely be aware of the
opportunities and advantages that adding network connectivity to the systems they
design and develop may bring.
The chapters of the first part of the book are:
their internal structure and to the most common way they are interfaced
to software development tools. The chapter also gives a first overview of
the embedded software development process, to be further expanded in the
following.
• Chapter 3, GCC-Based Software Development Tools. The main topic
of this chapter is a thorough description of the GNU compiler collec-
tion (GCC)-based software development system, or toolchain, which is ar-
guably the most popular open-source product of this kind in use nowadays.
The discussion goes through the main toolchain components and provides
insights on their inner workings, focusing in particular on the aspects that
may affect programmers’ productivity.
• Chapter 4, Execution Models for Embedded Systems. This chapter and the
next provide readers with the necessary foundations to design and imple-
ment embedded system software. Namely, this chapter presents in detail
two different software execution models that can be profitably applied to
model and express the key concept of concurrency, that is, the parallel ex-
ecution of multiple activities, or tasks, within the same software system.
• Chapter 5, Concurrent Programming Techniques. In this chapter, the con-
cept of execution model is further expanded to discuss in detail how con-
currency must be managed to achieve correct and timely results, by means
of appropriate concurrent programming techniques.
• Chapter 6, Scheduling Algorithms and Analysis. After introducing some
basic nomenclature, models, and concepts related to task-based scheduling
algorithms, this chapter describes the most widespread ones, that is, rate
monotonic (RM) and earliest deadline first (EDF). The second part of the
chapter briefly discusses scheduling analysis, a technique that allows pro-
grammers to predict the worst-case timing behavior of their systems.
• Chapter 7, Configuration and Usage of Open-Source Protocol Stacks.
In recent years, many embedded systems rapidly evolved from central-
ized to networked or distributed architectures, due to the clear advan-
tages this approach brings. In this chapter, we illustrate how an open-
source protocol stack—which provides the necessary support for inter-node
communication—can easily be integrated in an embedded software system
and how it interfaces with other software components.
• Chapter 8, Device Driver Development. Here, the discourse goes from the
higher-level topics addressed in the previous three chapters to a greater level
of detail, concerning how software manages and drives hardware devices.
This is an aspect often neglected in general-purpose software development,
but of utmost importance when embedded systems are considered, because
virtually all of them are strongly tied to at least some dedicated hardware.
• Chapter 9, Portable Software. While the previous chapters set the stage
for effective embedded software development, this chapter outlines the all-
important trade-off between code execution efficiency and portability, that
is, the ease of migrating software from one project to another. This aspect
In the second part, the book presents a few advanced topics, focusing mainly on
improving software quality and dependability. The importance of these goals is ever
increasing nowadays, as embedded systems are becoming commonplace in critical
application areas, like anti-lock braking system (ABS) and motion control.
In order to reach this goal, it is necessary to adopt a range of different techniques,
spanning from the software design phase to runtime execution, passing through its
implementation. In particular, this book first presents the basic principles of formal
verification through model checking, which can profitably be used since the very
early stages of algorithm and software design.
Then, a selection of runtime techniques to prevent or, at least, detect memory
corruption is discussed. Those techniques are useful to catch any software error
that escaped verification and testing, and keep within bounds the damage it can do to
the system and its surroundings.
Somewhat in between these two extremes, static code analysis techniques are use-
ful, too, to spot latent software defects that may escape manual code inspection. Compared with formal verification, static code analysis techniques have the advantage of
working directly on the actual source code, rather than on a more abstract model.
Moreover, their practical application became easier in recent years because sev-
eral analysis tools moved away from being research prototypes and became stable
enough for production use. The book focuses on one of them as an example.
The second part of the book consists of the following chapters:
The bibliography at the end of the book has been kept rather short because it has
been compiled with software practitioners in mind. Hence, instead of providing an
exhaustive and detailed list of references that would have been of interest mainly to
people willing to dive deep into the theoretical aspects of embedded real-time sys-
tems, we decided to highlight a smaller number of additional sources of information.
In this way, readers can more effectively use the bibliography as a starting point
to seek further knowledge on this rather vast field, without getting lost. Within the
works we cite, readers will also find further, more specific pointers to pursue their
quest.
Part I
CONTENTS
2.1 Role and Purpose of Embedded Systems .......................................................... 9
2.2 Microcontrollers and Their Internal Structure................................................. 13
2.2.1 Flash Memory..................................................................................... 15
2.2.2 Static Random Access Memory (SRAM)........................................... 17
2.2.3 External Memory ................................................................................ 19
2.2.4 On-Chip Interconnection Architecture ............................................... 23
2.3 General-Purpose Processors versus Microcontrollers ..................................... 29
2.4 Embedded Software Development Process ..................................................... 32
2.5 Summary.......................................................................................................... 37
This chapter outlines the central role played by embedded software in a variety of
contemporary appliances. At the same time, it compares embedded and general-
purpose computing systems from the hardware architecture and software develop-
ment points of view. This is useful to clarify and highlight why embedded software
development differs from the application software development for personal computers that most readers are already familiar with.
What’s more, embedded systems are deeply involved in the transportation indus-
try, especially automotive. It is quite common to find that even inexpensive cars are
equipped with more than 10 embedded nodes, including those used for anti-lock
braking system (ABS). For what concerns industry automation, embedded systems
are deployed in production lines to carry out all sorts of activities, ranging from mo-
tion control to packaging, data collection, and so on.
The main concerns of embedded systems design and development are different from those of general-purpose systems. Embedded systems are generally equipped with limited resources, for instance, a small amount of memory and a low clock frequency, leading to the need for better code optimization strategies. However, fast CPUs sometimes simply cannot be adopted in industrial environments because they are typically specified to work within a much narrower range of temperature, for instance [0, 40] degrees Celsius.
Instead, industrial-grade microprocessors are generally required to keep functioning correctly at temperatures up to 85 degrees Celsius. This leads to one main concern
of embedded systems, that is reliability, which encompasses hardware, software, and
communication protocol design. In addition, processor heat dissipation in such envi-
ronments is another issue if they are working at a high frequency.
All these differences bring unavoidable changes in the way embedded software
is developed, in contrast with the ordinary, general-purpose software development
process, which is already well known to readers. The main purpose of this chapter is
to outline those differences and explain how they affect programmers’ activities and
way of working.
In turn, this puts the focus on how to make the best possible use of the limited
resources available in an embedded system by means of code optimization. As outlined above, this topic is of greater importance in embedded software development than in general-purpose development, because resource constraints are usually stronger in the former.
Moreover, most embedded systems have to deal with real-world events, for in-
stance, continuously changing environmental parameters, user commands, and oth-
ers. Since in the real world events inherently take place independently from each
other and in parallel, it is natural that embedded software has to support some form
of parallel, or concurrent execution. From the software development point of view,
this is generally done by organizing the code as a set of activities, or tasks, which are
carried out concurrently by a real-time operating system (RTOS).
The concept of task—often also called process—was first introduced in the semi-
nal work of Dijkstra [48]. In this model, any concurrent application, regardless of its
nature or complexity, is represented by, and organized as, a set of tasks that, concep-
tually, execute in parallel.
Each task is autonomous and holds all the information needed to represent the
evolving execution state of a sequential program. This necessarily includes not only
the program instructions but also the state of the processor (program counter, regis-
ters) and memory (variables).
Informally speaking, each task can be regarded as the execution of a sequential
program by “its own” conceptual processor even though, in a single-processor system, the RTOS will actually implement concurrent execution by switching the physi-
cal processor from one task to another when circumstances warrant.
Therefore, thoroughly understanding the details of how tasks are executed, or
scheduled, by the operating system and being aware of the most common concurrent
programming techniques is of great importance for successful embedded software
development. This is the topic of Chapters 4 through 6.
In the past, the development of embedded systems has witnessed the evolution
from centralized to distributed architectures. This is because, first of all, the easiest
way to cope with the increasing need for more and more computing power is to
use a larger number of processors to share the computing load. Secondly, as their
complexity grows, centralized systems cannot scale up as well as distributed systems.
A simple example is that, in a centralized system, one more input point may re-
quire one more pair of wires to bring data to the CPU for processing. Instead, in a
distributed system, many different input/output values can be transmitted to other nodes for processing through the same shared communication link. Last but not least, with time, it becomes more and more important to integrate different subsys-
tems, not only horizontally but also vertically.
For example, the use of buses and networks at the factory level makes it much easier to integrate the production plant into the factory management hierarchy and support better business
decisions. Moreover, it is becoming more and more common to connect embedded
systems to the Internet for a variety of purposes, for instance, to provide a web-based
user interface, support firmware updates, be able to exchange data with other equip-
ment, and so on.
For this reason protocol stacks, that is, software components which implement
a set of related communication protocols—for instance, the ubiquitous TCP/IP
protocols—play an ever-increasing role in all kinds of embedded systems. Chap-
ter 7 presents in detail how a popular open-source TCP/IP protocol stack can be
interfaced, integrated, and configured for use within an embedded system.
For the same reason Chapter 8, besides illustrating in generic terms how software
and hardware shall be interfaced in an embedded system—by means of suitable de-
vice drivers—also shows an example of how protocol stacks can be interfaced with
the network hardware they work with.
Another major consideration in embedded systems design and development is
cost, including both hardware and software, as nowadays software cost is growing
and becoming as important as hardware cost. For what concerns software, if existing
applications and software modules can be largely reused, this could significantly
save time and effort, and hence, reduce cost when integrating different subsystems
or upgrading the current system with more advanced techniques/technologies.
An important milestone toward this goal, besides the adoption of appropriate soft-
ware engineering methods (which are outside the scope of this book), consists of
writing portable code. As discussed in Chapters 9 and 10, an important property of
portable code is that it can be easily compiled, or ported, to diverse hardware archi-
tectures with a minimum amount of change.
This property, besides reducing software development time and effort—as out-
lined previously—also brings the additional benefit of improving software reliability
because less code must be written anew and debugged when an application is moved
from one architecture to another. In turn, this topic is closely related to code opti-
mization techniques, which are the topic of Chapter 11.
Generally, embedded systems also enforce requirements on real-time performance. These could be soft real-time, as in the case of consumer appliances and building automation, or hard real-time, as for critical systems like ABS and motion control.
More specifically, two main aspects directly related to real-time are delay and
jitter. Delay is the amount of time taken to complete a certain job, for example,
how long it takes a command message to reach the target and be executed. Delay
variability gives rise to jitter: jobs are completed sometimes sooner, sometimes later.
Hard real-time systems tolerate much less jitter than their soft real-time counterparts and they
require the system to behave in a more deterministic way. This is because, in a hard
real-time system, deadlines must always be met and any jitter that results in missing
the deadline is unacceptable. Instead, occasionally missing a deadline is allowed in soft real-time systems, as long as the probability is sufficiently small. This also explains why jitter is generally more of a concern in hard real-time systems. Several of these important aspects of software development will be
considered in more detail in Chapter 12, within the context of a real-world example.
As we can see, in several circumstances, embedded systems also have rather tight
dependability requirements that, if not met, could easily lead to safety issues, which
is another big concern in some kinds of embedded systems. Safety is not only about
the functional correctness (including the timing aspect) of a system but also related
to the security aspect of a system, especially since embedded systems nowadays are
also network-based and an insecure system is often bound to be unsafe.
Therefore, embedded software must often be tested more accurately than general-
purpose software. In some cases, embedded software correctness must be ensured
not only through careful testing, but also by means of formal verification. This topic
is described in Chapter 13 from the theoretical point of view and further developed
in Chapter 14 by means of a practical example. Further information about security
and dependability, and how they can be improved by means of automatic software
analysis tools, is contained in Chapter 16.
A further consequence, related to both code reliability and security, of the limited
hardware resources available in embedded architectures with respect to their general-
purpose counterparts, is that the hardware itself may provide very limited support to
detect software issues as early as possible and before damage is done to the system.
From this point of view, the software issue of most interest is memory corruption
that takes place when part of a memory-resident data structure is overwritten with
inconsistent data, often due to a wayward task that has no direct relationship with
the data structure itself. This issue is also often quite difficult to detect, because the
effects of memory corruption may be subtle and become manifest a long time after
the corruption actually took place.
Embedded Applications and Their Requirements 13
• One or more processor cores, which are the functional units responsible for
program execution.
• Multiple internal memory banks, with different characteristics regarding
capacity, speed, and volatility.
• Optionally, one or more memory controllers to interface the microcontroller
with additional, external memory.
• A variety of input–output controllers and devices, ranging from very
simple, low-speed devices like asynchronous serial receivers/transmitters
to very fast and complex ones, like Ethernet and USB controllers.
It is therefore evident that, even though most of this book will focus on the pro-
cessing capabilities of microcontrollers, in terms of program execution, it is ex-
tremely important to consider the microcontroller architecture as a whole during
component selection, as well as software design and development.
For this reason, this chapter contains a brief overview of the major components
outlined above, with special emphasis on the role they play from the programmer’s
perspective. Interested readers are referred to more specific literature [50, 154] for
detailed information. After a specific microcontroller has been selected for use, its
hardware data sheet of course becomes the most authoritative reference on this topic.
Arguably the most important component to be understood and taken into account
when designing and implementing embedded software is memory. In fact, if it is true
that processor cores are responsible for instruction execution, those instructions are
stored in memory and must be continuously retrieved, or fetched, from memory to
be executed.
Moreover, program instructions heavily refer to, and work on, data that are stored
in memory, too. In a similar way, processor cores almost invariably make use of
one or more memory-resident stacks to hold arguments, local variables, and return
addresses upon function calls.
This all-important data structure is therefore referenced implicitly upon each func-
tion call (to store input arguments and the return address into it), during the call itself
(to retrieve input arguments, access local variables, and store function results), and
after the call (to retrieve results).
As a consequence, the performance and determinism of a program are deeply affected, albeit indirectly, by the exact location of its instructions, data, and stacks in the microcontroller's memory banks. This dependency is easily overlooked when just looking at the program source code because, more often than not, the source simply does not contain this kind of information.
This is because most programming languages (including the C language this book focuses on) allow the programmer to specify the abstract storage class of a variable (for instance, whether it is local or global, read-only or read-write, and so on), but they provide no standard mechanism to indicate more precisely where (in which memory bank) that variable will be allocated.
As will be better described in Chapters 3 and 9, this important goal must there-
fore be pursued in a different way, by means of other components of the software
development toolchain. In this case, as shown in Figure 2.2, the linker plays a central
role because it is the component responsible for the final allocation of all memory-
resident objects defined by the program (encompassing both code and data), to fit the
available memory banks.
Namely, compiler-dependent extensions of the programming language are first
used to tag specific functions and data structures in the source code, to specify where
they should be allocated in a symbolic way. Then, the linker is instructed to pick up
all the objects tagged in a certain way and allocate them in a specific memory bank,
rather than using the default allocation method.
An example of this strategy for a GCC-based toolchain will be given in Chapters 8
and 11, where it will be used to distribute the data structures needed by the example
programs among the memory banks available on the microcontroller under study. A
higher-level and more thorough description of the technique, in the broader context
of software portability, will be given in Chapter 9 instead.
Flash memory is non-volatile, that is, it does not lose its contents when power is removed from the microcontroller. On the other hand, it works as a read-only memory during normal use.
Write operations into flash memory are indeed possible, but they require a special procedure (normal store operations performed by the processor are usually not adequate for this purpose), are relatively slow (several orders of magnitude slower than read operations) and, in some cases, can only be performed when the microcontroller is put into a special operating mode by means of dedicated tools.
Flash memory is usually used to store the program code (which, on recent architectures, is read-only by definition), constant data, and the initial values of global variables.
In many cases, flash memory is not as fast as the processor. Therefore, it may be
unable to sustain the peak transfer rate the processor may require for instruction and
data access, and it may also introduce undue delays or stalls in processing activities
if those accesses are performed on demand, that is, when the processor asks for them.
For this reason, units of various complexity are used to try and predict the next
flash memory accesses that will be requested by the processor and execute them in
advance, while the processor is busy with other activities.
For instance, both the NXP LPC24xx and LPC17xx microcontroller families [126, 128] embed a Memory Accelerator Module (MAM) that works in combination with the flash memory controller to accelerate both code and data accesses to flash memory.
In this way, if the prediction is successful, the processor is never stalled waiting for
flash memory access because processor activities and flash memory accesses proceed
concurrently.
On the other hand, if the prediction is unsuccessful, a processor stall will definitely occur. Because prediction techniques inherently work on a statistical basis, it is often very hard or impossible to predict exactly when a stall will occur during execution, and how long it will last. In turn, this introduces a certain degree of non-determinism in program execution timings, which can hinder a program's real-time properties in critical applications.
In those cases, it is important that programmers are aware of the existence of
the flash acceleration units just described and are able to turn them off (partially or
completely) in order to change the trade-off point between average performance and
execution determinism.
1. Since SRAM is volatile, it cannot permanently retain the code, which would otherwise be lost as soon as power is removed from the system. Hence, it is necessary to store the code elsewhere, in a non-volatile memory (usually flash memory), and copy it into SRAM at system startup, which brings additional complexity to system initialization.
Furthermore, after the code has been copied, its execution address (that is, the
address at which it is executed by the processor) will no longer be the same as its
load address (the address of the memory area that the linker originally assigned
to it).
This clearly becomes an issue if the code contains absolute memory addresses to
refer to instructions within the code itself, or other forms of position-dependent
code, because these references will no longer be correct after the copy and, if
followed, they will lead code execution back to flash memory.
This scenario can be handled in two different ways, at either the compiler or
linker level, but both require additional care and configuration instructions to
those toolchain components, as better described in Chapter 3. Namely:
• It is possible to configure the compiler—for instance, by means of appropriate
command-line options and often at the expense of performance—to gener-
ate position-independent code (PIC), that is, code that works correctly even
though it is moved at will within memory.
• Another option is to configure the linker so that it uses one base address (often
called load memory address (LMA)) as its target to store the code, but uses
a different base address (the virtual memory address (VMA)) to calculate and
generate absolute addresses.
2. When the code is spread across different memory banks—for instance, part of it
resides in flash memory while other parts are in SRAM—it becomes necessary to
jump from one bank to another during program execution—for instance, when a
flash-resident function calls a SRAM-resident function. However, memory banks
are often mapped far away from each other within the microcontroller’s address
range and the relative displacement, or offset, between addresses belonging to
different banks is large.
Modern microcontrollers often encode this offset in jump and call instructions
to locate the target address and, in order to reduce code size and improve per-
formance, implement several different instruction variants that support different
(narrower or wider) offset ranges. Compilers are unaware of the target address
when they generate jump and call instructions—as better explained in Chapter 3,
this is the linker's responsibility instead. Hence, by default, they choose the instruction variant that represents the best trade-off between addressing capability and instruction size.
While this instruction variant is perfectly adequate for jumps and calls among
functions stored in the same memory bank, the required address offset may not fit
into it when functions reside in different memory banks. For the reasons recalled
above, the compiler cannot detect this issue by itself. Instead, the mismatch surfaces as (often rather obscure) link-time errors, which programmers may find hard to trace back to their original cause.
Although virtually all compilers provide directives to force the use of an instruc-
tion variant that supports larger offsets for function calls, this feature has not
been envisaged in most programming language standards, including the C language [89]. Therefore, as discussed in Chapter 9, compiler-dependent language extensions have to be used for this purpose, severely impairing code portability.
3. When SRAM or, more generally, the same bank of memory is used by the processor for more than one kind of access, for instance, to access both instructions and data, memory contention may occur.
• Flash memory
• SRAM
• Dynamic Random Access Memory (DRAM)
In these systems, external flash memory and SRAM share the same general characteristics as their on-chip counterparts. By contrast, DRAM is rarely found as on-chip microcontroller memory, because it is difficult to make the two chip production processes coexist. It is worth mentioning the main DRAM properties here
because they have some important side effects on program execution, especially in an
embedded real-time system. Interested readers are referred to [155] for more detailed
information about this topic.
DRAM, like SRAM, is a random access memory, that is, the processor can freely
and directly read from and write into it, without using any special instructions or
procedures. However, there are two important differences concerning access delay
and jitter:
1. Due to its internal architecture, DRAM usually has a much higher capacity than
SRAM but read and write operations are slower. While both on-chip and external
SRAM are able to perform read and write operations at the same speed as the
processor, DRAM access times are one or two orders of magnitude higher in
most cases.
To avoid intolerable performance penalties, especially as processor speed grows,
DRAM access requires and relies on the interposition of another component,
called cache, which speeds up read and write operations. A cache is basically
a fast memory of limited capacity (much smaller than the total DRAM capacity),
which is as fast as SRAM and holds a copy of DRAM data recently accessed by
the processor.
The caching mechanism relies on a widespread property of programs, namely their memory access locality. Informally speaking, locality means that, if the processor has just made a memory access at a certain address, its next accesses have a high probability of falling in the immediate vicinity of that address.
To see this intuitively, let us consider the two main kinds of memory access that take place during program execution:
• Instruction fetch. This kind of memory access is inherently sequential in most cases, the exception being jump and call instructions. However, these represent a relatively small fraction of program instructions, and many contemporary processors provide a way to avoid them completely, at least for very short-range conditional jumps, by means of conditionally executed instructions. A thorough description of how conditionally executed instructions work is beyond the scope of this book. For instance, Reference [8] discusses in detail how they have been implemented on the ARM Cortex family of processor cores.
• Data load and store. In typical embedded system applications, the most commonly used memory-resident data structure is the array, and array elements are quite often (albeit not always) accessed within loops by means of some sort of sequential indexing.
Cache memory is organized in fixed-size blocks, often called cache lines, each managed as an indivisible unit. Depending on the device, the line size is usually a power of 2 between 16 and 256 bytes. When the processor initiates a transaction to read or write data at a certain address, the cache controller checks whether or not a line containing those data is currently present in the cache.
• If the required data are found in the cache, a fast read or write transaction, involving only the cache itself and not memory, takes place. The fast transaction
is performed at the processor's usual speed and does not introduce any extra delay. This case is called a cache hit.
• Otherwise, the processor request results in a cache miss. In this case, the cache controller performs two distinct actions:
a. It selects a cache line to receive the new data. If the cache is completely full, this may entail writing the contents of an occupied cache line back into memory, an operation known as eviction.
b. The cache line is filled with the data block that surrounds the address
targeted by the processor in the current transaction.
In the second case, the transaction requested by the processor finishes only after
the cache controller has completed both actions outlined previously. Therefore, a
cache miss entails a significant performance penalty from the processor’s point of
view, stemming from the extra time needed to perform memory operations.
On the other hand, further memory access transactions issued by the processor in
the future will likely hit the cache, due to memory access locality, with a signifi-
cant reduction in data access time.
From this summary description, it is evident that cache performance heavily de-
pends on memory access locality, which may vary from one program to another
and even within the same program, depending on its current activities.
An even more important observation, from the point of view of real-time embed-
ded systems design, is that although it is quite possible to satisfactorily assess the
average performance of a cache from a statistical point of view, the exact cache
behavior with respect to a specific data access is often hard to predict.
For instance, it is impossible to exclude scenarios in which a sequence of cache
misses occurs during the execution of a certain section of code, thus giving rise to
a worst-case execution time that is much larger than the average one.
To make the problem even more complex, cache behavior also depends in part on
events external to the task under analysis, such as the allocation of cache lines—
and, consequently, the eviction of other lines from the cache—due to memory
accesses performed by other tasks (possibly executed by other processor cores) or
interrupt handlers.
2. Unlike SRAM, DRAM is unable to retain its contents indefinitely, even when power is continuously applied to it, unless a periodic refresh operation is performed. Even though a detailed explanation of the (hardware-related) reasons for this, and of the exact refresh procedure, is beyond the scope of this book, it is useful to briefly recall the main consequences of refresh on real-time code execution.
Firstly, it is necessary to highlight that during a refresh cycle DRAM is unable
to perform regular read and write transactions, which must be postponed, unless
advanced techniques such as hidden refresh cycles are adopted [155]. Therefore,
if the processor initiates a transaction while a refresh cycle is in progress, it will
incur additional delay—beyond the one needed for the memory access itself—
unless the transaction results in a cache hit.
Secondly, the exact time at which a refresh cycle starts is determined by the mem-
ory controller (depending on the requirements of the DRAM connected to it) and
not by the processor.
• Concerning internal signal routing, we already mentioned that typical microcontrollers have a limited number of external pins, too few to route all internal input–output signals to the printed circuit board (PCB).
The use of an external (parallel) bus is likely to consume a significant num-
ber of pins and make them unavailable for other purposes. A widespread
workaround for this issue is to artificially limit the external bus width to
save pins.
For instance, even though a 32-bit microcontroller may support an external 32-bit bus, hardware designers may limit its width to 16 or even 8 bits.
Besides the obvious advantage in terms of how many pins are needed to
implement the external bus, a negative side-effect of this approach is that
more bus cycles become necessary to transfer the same amount of data.
Assuming that the bus speed is kept constant, this entails that more time is
needed, too.
• To simplify external signal routing, hardware designers may also keep the external bus speed below the maximum theoretically supported by the microcontroller, besides reducing its width. This is beneficial for a variety of reasons, of which only the two main ones are briefly presented here.
• It makes the system more tolerant to signal propagation time skews and
gives the designer more freedom to route the external bus, by means of
longer traces or traces of different lengths.
• Since a reduced number of PCB traces is required to connect the mi-
crocontroller to the external components, fewer routing conflicts arise
against other parts of the layout.
A serial memory interface requires a maximum of 6 wires, plus ground, to be implemented. These figures are much lower than what a parallel bus requires, even if its width is kept at 8 bits.
As it is easy to imagine, the price to be paid is that this kind of interface is likely
unable to sustain the peak data transfer rate a recent processor requires, and the
interposition of a cache (which brings all the side effects mentioned previously) is
mandatory to bring average performance to a satisfactory level.
Regardless of the underlying reasons, those design choices are often bound to cap
the performance of external memory below its maximum. From the software point of
view, in order to assess program execution performance from external memory, it is
therefore important to evaluate not only the theoretical characteristics of the memory
components adopted in the system, but also the way they have been connected to the
microcontroller.
The components connected to a bus can be categorized into two different classes:
1. Bus masters, which are able to initiate a read or write transaction on the bus,
targeting a certain slave.
2. Bus slaves, which can respond to transactions initiated by masters, but cannot
initiate a transaction on their own.
A typical example of bus master is, of course, the processor. However, peripheral
devices may act as bus masters, too, when they are capable of autonomous direct
memory access (DMA) to directly retrieve data from, and store them to, memory
without processor intervention.
A different approach to DMA, which does not require devices to be bus masters,
consists of relying on a general-purpose DMA controller, external to the devices. The
DMA controller itself is a bus master and performs DMA by issuing two distinct bus
transactions, one targeting the device and the other one targeting memory, on behalf
of the slaves.
For instance, to transfer a data item from a device into memory, the DMA con-
troller will first wait for a trigger from the device, issue a read transaction targeting
the device (to read the data item from its registers), and then issue a write trans-
action targeting memory (to store the data item at the appropriate address). This
approach simplifies device implementation and allows multiple devices to share the same DMA hardware, provided they do not need to use it at the same time.
In order to identify which slave is targeted by the master in a certain transaction,
masters provide the target’s address for each transaction and slaves respond to a
unique range of addresses within the system’s address space. The same technique is
also used by bridges to recognize when they should forward a transaction from one
bus to the other.
Each bus supports only one ongoing transaction at a time. Therefore, all bus masters connected to the same bus must compete for bus access by means of an arbitration mechanism. When multiple masters are willing to initiate a bus transaction at the same time, the bus arbiter chooses which one may proceed and forces the others to wait. By contrast, several transactions can proceed in parallel on different buses, provided those transactions are local to their bus, that is, no bridges are involved.
As shown in Figure 2.3, buses are interconnected by means of a number of
bridges, depicted as gray blocks.
• Two bridges connect the local bus to the AHB buses. They let the proces-
sor access the controllers connected there, as well as the additional SRAM
banks.
• One additional bridge connects one of the AHB buses to the APB bus. All
transactions directed to the lower-performance peripherals go through this
bridge.
• The last bridge connects the two AHB buses together, to let the Ethernet
controller (on the left) access the SRAM bank residing on the other bus, as
well as external memory through the external memory controller.
The role of a bridge between two buses A and B is to allow a bus master M
residing, for instance, on bus A to access a bus slave S connected to bus B. In order
to do this, the bridge plays two different roles on the two buses at the same time:
• On bus A, it works as a bus slave and responds on behalf of S to the trans-
action initiated by M.
• On bus B, it works as a bus master, performing on behalf of M the transac-
tion directed to S.
The kind of bridge described so far is the simplest one and works in an asymmetric way, that is, it is able to forward transactions initiated on bus A (where its slave port is) toward bus B (where its master port is), but not vice versa. For instance, referring
back to Figure 2.3, the AHB/AHB bridge can forward transactions initiated by a
master on the left-side AHB toward a slave connected to the right-side AHB, but not
the opposite.
Other, more complex bridges are symmetric instead and can assume both master
and slave roles on both buses, albeit not at the same time. Those bridges can also be
seen as a pair of asymmetric bridges, one from A to B and the other from B to A,
and they are often implemented in this way.
As outlined above, bus arbitration mechanisms and bridges play a very important role in determining the overall performance and determinism of the on-chip interconnection architecture. It is therefore very important to consider them at design time and, especially where bridges are concerned, make the best use of them in software.
When the processor (or another bus master) crosses a bridge to access memory
or a peripheral device controller, both the processor itself and the system as a whole
may incur a performance and determinism degradation. Namely: