Real-Time Embedded Systems—Open-Source Operating Systems Perspective
This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information stor-
age or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222
Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that pro-
vides licenses and registration for a variety of users. For organizations that have been granted a pho-
tocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
Foreword
sight into working of the whole system stopped at the system level commands
of the operating systems. The mystery had to be revealed by discussing hy-
pothetical implementations of the system level calls and interaction with the
operating system kernel—of course at the expense of the digital control sub-
ject. At that time, seldom any electrical engineering curriculum had a separate
subject dedicated to operating systems. Of frustration and to avoid the “black
box” approach to illustrating control systems in action, I have written in C
a simple multitasking real-time executive for MS-DOS-based platforms, to be
run on an IBM PC (Intel 8088). Students were provided with the implementa-
tion documentation in addition to theoretical background; quite a lot of pages
to study. But the reward was substantial: they were now in full “control.”
With the support of an enterprising post-graduate student, the executive was
intended to grow into a more robust RTOS with a view to commercialization.
But it was never to be. Academic life has other priorities. Around 1992,
I decided to harness the MINIX operating system, which I then taught to the
final-year graduate students, to run my real-time control lab experiments to
illustrate control algorithms in their supporting real-time operating system
environment. But soon after that came the Linux kernel.
If you are one of those professionals with compartmentalized knowledge,
particularly with an electrical and computer engineering or software engineering
background and not much theoretical knowledge of, or practical
exposure to, real-time operating systems, this book is certainly an invaluable
help to “close the loop” in your knowledge, and to develop an insight into how
things work in the realm of real-time systems. Readers with a background in
computer science will benefit from the hands-on approach, and a comprehen-
sive overview of the aspects of control theory and signal processing relevant
to real-time systems. The book also discusses a range of advanced topics
which will allow computer science professionals to stay up-to-date with the
recent developments and emerging trends.
The book was written by two Italian researchers from the Italian National
Research Council (CNR) actively working in the area of real-time (embedded)
operating systems, with a considerable background in control and communi-
cation systems, and a history of the development of actual real-time systems.
Both authors are also involved in teaching several courses related to these
topics at Politecnico di Torino and University of Padova.
The book has been written with remarkable clarity, which is particularly
appreciated whilst reading the section on real-time scheduling analysis. The
presentation of real-time scheduling is probably the best, in terms of clarity,
I have ever read in the professional literature. It is easy to understand, which is
important for busy professionals keen to acquire (or refresh) new knowledge
without being bogged down in a convoluted narrative and excessive detail.
The authors have largely managed to avoid a theory-only presentation
of the subject, which frequently affects books on operating systems. Selected
concepts are illustrated by practical programming examples developed for the
Linux and FreeRTOS operating systems. As the authors stated: Linux has a
Richard Zurawski
ISA Group, San Francisco, California
The Authors
Ivan Cibrario Bertolotti received the Laurea degree (summa cum laude)
in computer science from the University of Torino, Turin, Italy, in 1996.
Since then, he has been a researcher with the National Research Council of
Italy (CNR). Currently, he is with the Istituto di Elettronica e di Ingegneria
dell’Informazione e delle Telecomunicazioni (IEIIT), Turin, Italy.
His research interests include real-time operating system design and im-
plementation, industrial communication systems and protocols, and formal
methods for vulnerability and dependability analysis of distributed systems.
His contributions in this area comprise both theoretical work and practical
applications, carried out in cooperation with leading Italian and international
companies.
He has taught several courses on real-time operating systems at Politecnico
di Torino, Turin, Italy, starting in 2003, as well as a PhD degree course at the
University of Padova in 2009. He regularly serves as a technical referee for the
main international conferences and journals on industrial informatics, factory
automation, and communication. He has been an IEEE member since 2006.
Gabriele Manduchi received the Laurea degree (summa cum laude) in elec-
tronic engineering from the University of Padova, Padua, Italy, in 1987. Since
1998 he has been a researcher with the National Research Council of Italy
(CNR), and currently he is senior researcher at the Istituto Gas Ionizzati
(IGI), Padua, Italy, where he leads the control and data acquisition group of
the RFX nuclear fusion experiment.
His research interests include data acquisition and real-time control in
large physics experiments. He is a coauthor of a software framework for data
acquisition widely used in many nuclear fusion experiments around the world,
and he is involved in several international projects for the management of
computer infrastructures in fusion research.
He has taught several courses on computer architectures and software en-
gineering at the University of Padova, Padua, Italy, starting in 1996, as well
as PhD degree courses at the University of Padova in 2008–2010.
Acknowledgments
This book is the outcome of more than 10 years of research and teaching ac-
tivity in the field of real-time operating systems and real-time control systems.
During this time, we have been positively influenced by many other people
we came in contact with, both from academia and industry. They are too nu-
merous to mention individually, but we are nonetheless indebted to them for
their contribution to our professional growth.
A special thank you goes to our university students, who first made use
of the lecture notes this book is based upon. Their questions, suggestions,
and remarks were helpful to make the book clearer and easier to read. In
particular, we would like to thank Antonio Barbalace for his advice about
Linux internals.
We would also like to express our appreciation to our coworkers for
their support and patience while we were busy with the preparation of the
manuscript. A special mention goes to one of Ivan’s past teachers, Albert Wer-
brouck, who first drew his attention to the wonderful world of embedded
systems.
Last, but not least, we are grateful to Richard Zurawski, who gave us
the opportunity to write this book. We are also indebted to the CRC Press
publishing and editorial staff: Nora Konopka, Jim McGovern, Laurie Schlags,
and Jessica Vakili. Without their help, the book would probably not exist.
3.1 Multiprogramming
3.2 Process Interleaving and System Timing
3.3 Process State
3.4 Process state diagram
3.5 Process State with Multithreading
13.1 Upper Bounds and Least Upper Bound for scheduling algorithm A.
13.2 Necessary schedulability condition.
13.3 No overlap between instances of τ1 and the next release time of τ2.
13.4 Overlap between instances of τ1 and the next release time of τ2.
13.5 Schedulability conditions for Rate Monotonic.
13.6 RM scheduling for a set of tasks with U = 0.900.
13.7 RM scheduling for a set of tasks with U = 1.
13.8 Ulub value versus the number of tasks in the system.
13.9 A sample task set where an overflow occurs.
13.10 Utilization based schedulability check for EDF.
14.1 Scheduling sequence of the tasks of Table 14.1 and RTA analysis for task τ3.
14.2 RM scheduling fails for the tasks of Table 14.3.
14.3 DM scheduling succeeds for the tasks of Table 14.3.
1 Introduction
4 Deadlock
4.1 A Simple Example
4.2 Formal Definition of Deadlock
4.3 Reasoning about Deadlock: The Resource Allocation Graph
4.4 Living with Deadlock
4.5 Deadlock Prevention
4.6 Deadlock Avoidance
4.7 Deadlock Detection and Recovery
Bibliography
Index
1
Introduction
This book addresses three different topics: Embedded Systems, Real-Time Sys-
tems, and Open Source Operating Systems. Even though each of these topics
could well be the subject of a whole book, they are normally intermixed in
practical applications. This is particularly true for the first two topics: very often
industrial or automotive applications, implemented as embedded systems,
must provide timely responses in order to perform the required operation.
Further, real-time requirements typically refer to applications that
are expected to react to the events of some kind of controlled process.
Often in the literature, real-time embedded systems are presented and an-
alyzed in terms of abstract concepts such as tasks, priorities, and concurrency.
However, in order to be of practical use, such concepts must eventually be
implemented in real programs, interacting with real operating systems,
and executed to control real applications.
Traditionally, textbooks concentrate on specific topics using different ap-
proaches. Scheduling theory is often presented using a formal approach based
on a set of assumptions for describing a computer system in a mathemati-
cal framework. This is fine, provided that the reader has enough experience
and skills to understand how well real systems fit into the presented mod-
els, and this may not be the case when the textbook is used in a course or,
more generally, when the reader is approaching this area for the first time. Operating
system textbooks traditionally make much more limited use of mathematical
formalism and take a more practical approach, but often lack practical
programming examples in the main text (some provide specific examples in
appendices), as the presented concepts apply to a variety of real-world systems.
A different approach is taken here: after a general presentation of the ba-
sic concepts in the first chapters, the remaining ones make explicit reference
to two specific operating systems: Linux and FreeRTOS. Linux represents a
full-fledged operating system with a steadily growing user base and, what is
more important from the perspective of this book, is moving toward real-time
responsiveness and is becoming a feasible choice for the development of real-
time applications. FreeRTOS represents somewhat the opposite extreme in
complexity. FreeRTOS is a minimal system with a very limited footprint in
terms of system resources, and it can therefore be used on very small platforms
such as microcontrollers. At the same time, FreeRTOS supports a multithreading
programming model with primitives for thread synchronization that are
not far from what larger systems offer. If, on the one hand, the choice of two
specific case studies may leave specific details of other widespread operating
systems uncovered, on the other hand it presents to the reader a complete
conceptual path from general concepts of concurrency and synchronization
down to their specific implementation, including dealing with the unavoidable
idiosyncrasies of specific application programming interfaces. Here, code
examples are not collected in appendices, but presented in the book chapters to
stress the fact that concepts cannot be fully grasped without undertaking the
“dirty job” of writing, debugging, and running programs.
The same philosophy has been adopted in the chapters dealing with
scheduling theory. It is not possible, of course, to get rid of some mathe-
matical formalism, nor to avoid mathematical proofs (which can, however, be
skipped without losing the main conceptual flow). However, since these chapters
follow the presentation of concurrency-related issues in operating systems,
it has been possible to provide a more practical perspective on the presented
results and to better describe how the formalism used maps onto real-world
applications.
This book differs from other textbooks in two further aspects:
• The presentation of a case study at the beginning of the book, rather than
at its end. This choice may sound bizarre as case studies are normally used
to summarize presented concepts and results. However, the purpose of the
case study here is different: rather than providing a final example, it is
used to summarize prerequisite concepts on computer architectures that
are assumed to be known by the reader afterwards. Readers may in fact
have different backgrounds: less experienced ones may find the informal
description of computer architecture details useful to understand more in-
depth concepts that are presented later in the book such as task context
switch and virtual memory issues. The more experienced will likely skip
details on computer input/output or memory management, but may nev-
ertheless have some interest in the presented application, handling online
image processing over a stream of frames acquired by a digital camera.
• The presentation of the basic concepts of control theory and Digital Signal
Processing in a nutshell. Traditionally, control theory and Digital Signal
Processing are not presented in textbooks dealing with concurrency and
schedulability, as this kind of knowledge is not strictly related to operating
systems issues. However, the practical development of embedded systems
is often not restricted to the choice of the optimal operating system archi-
tecture and task organization, but requires also analyzing the system from
different perspectives, finding proper solutions, and finally implementing
them. Different engineering disciplines cover the various facets of embed-
ded systems: control engineers develop the optimal control strategies in
the case the embedded system is devoted to process control; electronic
engineers will develop the front-end electronics, such as sensor and ac-
tuator circuitry, and finally software engineers will define the computing
architecture and implement the control and supervision algorithms. Ac-
and therefore without detailed code examples. Afterwards, the same concepts
are described, with the aid of several code examples, in the context of the two
reference systems: Linux and FreeRTOS.
Part I includes a chapter on network communication; even if this book does
not address network communication in depth, a topic which by itself deserves
a whole book, some basic concepts about networks and network programming
are very often required when developing embedded applications.
The chapters of this part are the following:
• Chapter 2: A Case Study: Vision Control. Here, an application is presented
that acquires a stream of images from a Web camera and detects online the
center of a circular shape in the acquired images. This represents a com-
plete example of an embedded application. Both theoretical and practical
concepts are introduced here, such as the input/output architecture in op-
erating systems and the video capture application programming interface
for Linux.
• Chapter 3: Real-Time Concurrent Programming Principles. From this
chapter onwards, an organic presentation of concurrent programming con-
cepts is provided. Here, the concept of parallelism and its consequences,
such as race conditions and deadlocks, are presented. Some general imple-
mentation issues of multiprocessing, such as process context and states,
are discussed.
• Chapter 4: Deadlock. This chapter focuses on deadlock, arguably one of
the most important issues that may affect a concurrent application. After
defining the problem in formal terms, several solutions of practical interest
are presented, each characterized by a different trade-off between ease of
application, execution overhead, and conceptual complexity.
• Chapter 5: Interprocess Communication Based on Shared Variables. The
chapter introduces the notions of Interprocess Communication (IPC), and
it concentrates on the shared memory approach, introducing the concepts
of lock variables, mutual exclusion, semaphores, and monitors, which repre-
sent the basic mechanisms for process coordination and synchronization
in concurrent programming.
• Chapter 6: Interprocess Communication Based on Message Passing. An
alternate way for achieving interprocess communication, based on the ex-
change of messages, is discussed in this chapter. As in the previous two
chapters, the general concepts are presented and discussed without any
explicit reference to any specific operating system.
• Chapter 7: Interprocess Communication Primitives in POSIX/Linux. This
chapter introduces several examples showing how the general concurrent
programming concepts presented before are then mapped into Linux and
POSIX. The presented information lies somewhere between a user guide
and a reference for Linux/POSIX IPC primitives.
• Chapter 10: Lock and Wait-Free Communication. The last chapter of Part
I outlines an alternative approach in the development of concurrent pro-
grams. Unlike the more classic methods discussed in Chapters 5 and 6,
lock and wait-free communication never forces any participating process
to wait for another. In this way, it implicitly addresses most of the prob-
lems lock-based process interaction causes to real-time scheduling—to be
discussed in Chapter 15—at the expense of a greater design and imple-
mentation complexity. This chapter is based on more formal grounds than
the other chapters of Part I, but it is completely self-contained. Readers
not mathematically inclined can safely skip it and go directly to Part II.
• Chapter 15: Process Interactions and Blocking. This chapter and the next
provide the concepts that are required to map the theoretical results
on scheduling analysis presented up to now onto real-world applications,
where the tasks cannot anymore be described as independent processes,
but interact with each other. In particular, this chapter addresses the in-
terference among tasks due to the sharing of system resources, and intro-
duces the priority inheritance and priority ceiling procedures, which are of
fundamental importance in the implementation of real-world applications.
The last part will cover other aspects of embedded systems. Unlike the first two
parts, where concepts are introduced step by step to provide a comprehensive
understanding of concurrent programming and real-time systems, the chapters
of the last part cover separate, self-contained topics. The chapters of this
part are the following:
• Chapter 20: Basics of Control Theory and Digital Signal Processing. This
chapter provides a quick tour of the most important mathematical con-
cepts for control theory and digital signal processing, using two case stud-
ies: the control of a pump and the development of a digital low-pass filter.
The only mathematical background required of the reader corresponds to
what is taught in a basic engineering math course, and no specific previous
knowledge of control theory and digital signal processing is assumed.
The short bibliography at the end of the book has been compiled with less
experienced readers in mind. For this reason, we did not provide an exhaus-
tive list of references, aimed at acknowledging each and every author who
contributed to the rather vast field of real-time systems.
Rather, the bibliography is meant to point to a limited number of addi-
tional sources of information, which readers can and should actually use as a
starting point to seek further information, without getting lost. There, readers
will also find more, and more detailed, references to continue their quest.
Part I
Concurrent Programming Concepts
2
A Case Study: Vision Control
CONTENTS
2.1 Input/Output on Computers
2.1.1 Accessing the I/O Registers
2.1.2 Synchronization in I/O
2.1.3 Direct Memory Access (DMA)
2.2 Input/Output Operations and the Operating System
2.2.1 User and Kernel Modes
2.2.2 Input/Output Abstraction in Linux
2.3 Acquiring Images from a Camera Device
2.3.1 Synchronous Read from a Camera Device
2.3.2 Virtual Memory
2.3.3 Handling Data Streaming from the Camera Device
2.4 Edge Detection
2.4.1 Optimizing the Code
2.5 Finding the Center Coordinates of a Circular Shape
2.6 Summary
The presented case study consists of a Linux application that acquires a se-
quence of images (frames) from a video camera device. The data acquisition
program will then perform some processing of the acquired images in order
to detect the coordinates of the center of a circular shape appearing in them.
This chapter is divided into four main sections. In the first section general
concepts in computer input/output (I/O) are presented. The second section
will discuss how I/O is managed by operating systems, in particular Linux,
while in the third one the implementation of the frame acquisition is pre-
sented. The fourth section will concentrate on the analysis of the acquired
frames to retrieve the desired information; after presenting two widespread
algorithms for image analysis, the main concepts about software complexity
will be presented, and it will be shown how the execution time for those al-
gorithms can be reduced, sometimes drastically, using a few optimization and
parallelization techniques.
Embedded systems carrying out online analysis of acquired images are be-
coming widespread in industrial control and surveillance. In order to acquire
the sequence of the frames, the video capture application programming inter-
face for Linux (V4L2) will be used. This interface supports most commercial
USB webcams, which are now ubiquitous in laptops and other PCs. There-
fore this sample application can be easily reproduced by the reader, using for
example his/her laptop with an integrated webcam.
FIGURE 2.1
Bus architecture with a separate I/O bus.
values onto the I/O bus locations (i.e., at the addresses corresponding to the
device registers) via specific I/O Read and Write instructions.
In memory-mapped I/O, devices are seen by the processor as a set of reg-
isters, but no specific bus for I/O is defined. Rather, the same bus used to
exchange data between the processor and the memory is used to access I/O
devices. Clearly, the address range used for addressing device registers must
be disjoint from the set of addresses for the memory locations. Figure 2.1 and
Figure 2.2 show the bus organization for computers using a dedicated I/O
bus and memory-mapped I/O, respectively. Memory-mapped architectures
are more common nowadays, but connecting all the external I/O devices di-
rectly to the memory bus represents a somewhat simplified solution with sev-
eral potential drawbacks in reliability and performance. In fact, since the speed
of memory access represents one of the major bottlenecks in computer performance,
the memory bus is intended to operate at a very high speed, and
therefore it has very strict constraints on the electrical characteristics of the
bus lines, such as their capacitance, and on their physical length. Letting external devices
be directly connected to the memory bus would increase the likelihood that
possible malfunctions of the connected devices would seriously affect the func-
tion of the whole system and, even if that were not the case, there would be
the concrete risk of lowering the data throughput over the memory bus.
In practice, one or more separate buses are present in the computer for I/O,
even with memory-mapped architectures. This is achieved by letting a bridge
FIGURE 2.2
Bus architecture for Memory Mapped I/O.
component connect the memory bus with the I/O bus. The bridge presents
itself to the processor as a device, defining a set of registers for programming
the way the I/O bus is mapped onto the memory bus. Basically, a bridge
can be programmed to define one or more address mapping windows. Every
address mapping window is characterized by the following parameters:
bus) that are mapped onto the corresponding address range in the secondary
PCI bus (for which the bridge is the master, i.e., leads bus operations). Follow-
ing the same approach, new I/O buses, such as the Small Computer System
Interface (SCSI) bus for high-speed disk I/O, can be integrated into the com-
puter board by means of bridges connecting the I/O bus to the memory bus
or, more commonly, to the PCI bus. Figure 2.3 shows an example of bus con-
figuration defining a memory to PCI bridge, a PCI to PCI bridge, and a PCI
to SCSI bridge.
One of the first actions performed when a computer boots is the configuration
of the bridges in the system. Firstly, the bridges directly connected to the
memory bus are configured, so that the devices over the connected buses can
be accessed, including the registers of the bridges connecting these to new I/O
buses. Then the bridges over these buses are configured, and so on. When all
the bridges have been properly configured, the registers of all the devices in
the system are directly accessible by the processor at given addresses over the
memory bus. Properly setting all the bridges in the system may be tricky, and
a wrong setting may make the system totally unusable. Consider, for example,
what could happen if an address mapping window for a bridge on the memory bus
were programmed so that it overlapped the address range used by the RAM
memory. At that point the processor would be unable to access portions of
memory and therefore would no longer be able to execute programs.
Bridge setting, as well as other very low-level configurations, is normally
performed before the operating system starts, and is carried out by the Basic
Input/Output System (BIOS), code which is normally stored in ROM and
executed as soon as the computer is powered on. So, when the operating system
starts, all the device registers are available at proper memory addresses. This
is, however, not the end of the story: in fact, even if device registers are seen
by the processor as if they were memory locations, there is a fundamental
difference between devices and RAM blocks. While RAM memory chips are
expected to respond in a time frame on the order of nanoseconds, the response
time of devices largely varies and in general can be much longer. It is therefore
necessary to synchronize the processor and the I/O devices.
FIGURE 2.3
Bus architecture with two PCI buses and one SCSI bus.
device. This comes, however, at a cost: no useful operation can be carried out
by the processor when synchronizing to devices in polling. If we assume that
100 ns are required on average for memory access, and assuming that access
to device registers takes the same time as a memory access (a somewhat
simplified scenario since we ignore here the effects of the memory cache),
acquiring a data stream from the serial port would require more than 8000
read operations of the status register for every incoming byte of the stream
– that is, wasting 99.99% of the processor power in useless accesses to the
status register. This situation becomes even worse for slower devices; imagine
the percentage of processor power for doing anything useful if polling were
used to acquire data from the keyboard!
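To make the cost concrete, the following fragment sketches what polling looks like in code. It is only an illustration: the register addresses STATUS_REG and DATA_REG and the RX_READY bit are invented for the example and do not refer to any real device.

#include <stdint.h>

/* Hypothetical memory-mapped registers of a slow input device; the
   addresses and the RX_READY bit are invented for this example. */
#define STATUS_REG ((volatile uint8_t *)0x40001000)
#define DATA_REG   ((volatile uint8_t *)0x40001004)
#define RX_READY   0x01

uint8_t poll_read_byte(void)
{
    /* Busy-wait: every iteration is one access to the status register,
       and for a slow device almost all of them find RX_READY still clear. */
    while ((*STATUS_REG & RX_READY) == 0)
        ;               /* the processor does no useful work here */
    return *DATA_REG;   /* reading the data register completes the transfer */
}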
Observe that the operations carried out by I/O devices, once programmed
by a proper configuration of the device registers, can normally proceed in par-
allel with the execution of programs. It is only required that the device should
notify the processor when an I/O operation has been completed, and new data
can be read or written by the processor. This is achieved using Interrupts, a
mechanism supported by most I/O buses. When a device has been started,
typically by writing an appropriate value in a command register, it proceeds
on its own. When new data is available, or the device is ready to accept new
data, the device raises an interrupt request to the processor (in most buses,
some lines are dedicated to interrupt notification) which, as soon as it finishes
executing the current machine instruction, will serve the interrupt request by
executing a specific routine, called Interrupt Service Routine (ISR), for the
management of the condition for which the interrupt has been generated.
Several facts must be taken into account when interrupts are used to syn-
chronize the processor and the I/O operations. First of all, more than one
device could issue an interrupt at the same time. For this reason, in most sys-
tems, a priority is associated with interrupts. Devices can in fact be ranked
based on their importance, where important devices require a faster response.
As an example, consider a system controlling a nuclear plant: An interrupt
generated by a device monitoring the temperature of a nuclear reactor core is
for sure more important than the interrupt generated by a printer device for
printing daily reports. When a processor receives an interrupt request with
a given associated priority level N, it will promptly respond to the request only
if it is not executing any service routine for a previous interrupt of priority
M, with M ≥ N; otherwise, the interrupt request will be served as soon as the
previous Interrupt Service Routine has terminated and there are no pending
interrupts with priority greater than or equal to the current one.
current processor status. Assuming that memory locations used to store the
program and the associated data are not overwritten during the execution of
the interrupt service routine, it is only necessary to preserve the content of
the processor registers. Normally, the first actions of the routine are to save
in the stack the content of the registers that are going to be used, and such
registers will be restored just before its termination. Not all the registers can
be saved in this way; in particular, the PC and the SR are changed just before
starting the execution of the interrupt service routine. The PC will be set to
the address of the first instruction of the routine, and the SR will be updated
to reflect the fact that the process is starting to service an interrupt of a given
priority. So it is necessary that these two registers are saved by the processor
itself and restored when the interrupt service routine has finished (a specific
instruction to return from ISR is defined in most computer architectures). In
most architectures the SR and PC registers are saved on the stack, but oth-
ers, such as the ARM architecture, define specific registers to hold the saved
values.
A specific interrupt service routine has to be associated with every possi-
ble source of interrupt, so that the processor can take the appropriate actions
when an I/O device generates an interrupt request. Typically, computer ar-
chitectures define a vector of addresses in memory, called a Vector Table,
containing the start addresses of the interrupt service routines for all the I/O
devices able to generate interrupt requests. The offset of a given ISR within
the vector table is called the Interrupt Vector Number. So, if the interrupt vec-
tor number were communicated by the device issuing the interrupt request,
the right service routine could then be called by the processor. This is ex-
actly what happens; when the processor starts serving a given interrupt, it
performs a cycle on the bus called the Interrupt Acknowledge Cycle (IACK)
where the processor communicates the priority of the interrupt being served,
and the device which issued the interrupt request at the specified priority
returns the interrupt vector number. In case two different devices issued an
interrupt request at the same time with the same priority, the device closest
to the processor in the bus will be served. This is achieved in many buses by
defining a bus line in a Daisy Chain configuration, that is, a line which is propagated
from every device to the next one along the bus only if the device did not
answer the IACK cycle. Therefore, a device will answer an IACK cycle
only if both of the following conditions are met:
• It has a pending interrupt request at the priority specified in the IACK cycle;
• It has received the daisy chain signal, that is, no device closer to the processor has answered the same IACK cycle.
Note that in this case it will not propagate the daisy chain signal to the next
device.
The offset returned by the device in an IACK cycle depends on the cur-
rent organization of the vector table and therefore must be a programmable
parameter in the device. Typically, all the devices which are able to issue an
FIGURE 2.4
The Interrupt Sequence.
interrupt request have two registers for the definition of the interrupt prior-
ity and the interrupt vector number, respectively. The sequence of actions is
shown in Figure 2.4, highlighting the main steps of the sequence:
1. The device issues an interrupt request;
2. The processor saves the context, i.e., puts the current values of the
PC and of the SR on the stack;
3. The processor issues an interrupt acknowledge cycle (IACK) on the
bus;
4. The device responds by putting the interrupt vector number (IVN)
over the data lines of the bus;
5. The processor uses the IVN as an offset in the vector table and
loads the interrupt service routine address in the PC.
I/O devices are configured, the code of the interrupt service routine
has to be loaded in memory, and its start address written in the
vector table at, say, offset N ;
3. The value N has to be communicated to the device, usually written
in the interrupt vector number register;
4. When an I/O operation is requested by the program, the device
is started, usually by writing appropriate values in one or more
command registers. At this point the processor can continue with
the program execution, while the device operates. As soon as the
device is ready, it will generate an interrupt request, which will
be eventually served by the processor by running the associated
interrupt service routine.
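The following fragment sketches steps 2 to 4 above for a hypothetical device. All names and addresses (vector_table, dev_isr, DEV_IVN_REG, DEV_CMD_REG, DEV_START) are placeholders invented for the example, not references to a real architecture or device.

#include <stdint.h>

/* Hypothetical device registers and command value, invented for this example. */
#define DEV_IVN_REG ((volatile uint32_t *)0x40002000)  /* interrupt vector number */
#define DEV_CMD_REG ((volatile uint32_t *)0x40002004)  /* command register        */
#define DEV_START   0x01
#define N           64      /* offset chosen for this device in the vector table */

extern void (*vector_table[])(void);  /* assumed writable (kernel mode)          */
void dev_isr(void);                   /* device-specific interrupt service routine */

void dev_setup(void)
{
    vector_table[N] = dev_isr;   /* load the ISR address in the vector table */
    *DEV_IVN_REG = N;            /* communicate N to the device              */
}

void dev_start_io(void)
{
    *DEV_CMD_REG = DEV_START;    /* start the device; completion will be
                                    signaled by an interrupt request         */
}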
In this case it is necessary to handle the fact that data reception is asyn-
chronous. A commonly used technique is to let the program continue after
issuing an I/O request, until the data received by the device are required. At
that point the program has to suspend its execution if the data are not yet
available, that is, to wait until the corresponding interrupt service
routine has been executed. For this purpose the interprocess communication
mechanisms described in Chapter 5 will be used.
For a 1 GHz processor this means that 10% of the processor time is dedicated
to data transfer, a percentage that is clearly no longer acceptable.
Very often data exchanged with I/O devices are transferred from or to
memory. For example, when a disk block is read it is first transferred to mem-
ory so that it is later available to the processor. If the processor itself were in
charge of transferring the block, say, after receiving an interrupt request from
the disk device to signal the block availability, the processor would repeat-
edly read data items from the device’s data register into an internal processor
register and write it back into memory. The net effect is that a block of data
has been transferred from the disk into memory, but it has been obtained
at the expense of a number of processor cycles that could have been used to
do other jobs if the device were allowed to write the disk block into memory
by itself. This is exactly the basic concept of Direct Memory Access (DMA),
which is letting the devices read and write memory by themselves so that the
processor will handle I/O data directly in memory. In order to put this simple
concept in practice it is, however, necessary to consider a set of facts. First
of all, it is necessary that the processor can “program” the device so that it
will perform the correct actions, that is, reading/writing a number N of data
items in memory, starting from a given memory address A. For this purpose,
every device able to perform DMA provides at least the following registers:
• The Memory Address Register (MAR), holding the memory address of the next data item to be transferred;
• The Word Count (WC) register, holding the number of data items still to be transferred.
So, in order to program a block read or write operation, it is necessary that the
processor, after allocating a block in memory and, in case of a write operation,
filling it with the data to be output to the device, writes the start address
and the number of data items in the MAR and WC registers, respectively.
Afterwards the device will be started by writing an appropriate value in (one
of) the command register(s). When the device has been started, it will operate
in parallel with the processor, which can proceed in the execution of the
program. However, as soon as the device is ready to transfer a data item,
it will require the memory bus used by the processor to exchange data with
memory, and therefore some sort of bus arbitration is needed since it is not
possible that two devices read or write the memory at the same time on
the same bus (note however that nowadays memories often provide multiport
access, that is, allow simultaneous access to different memory addresses). At
any time one, and only one, device (including the processor) connected to the
bus is the master, i.e., can initiate a read or write operation. All the other
connected devices at that time are slaves and can only answer to a read/write
bus cycle when they are addressed. The memory will be always a slave in the
bus, as well as the DMA-enabled devices when they are not performing DMA.
At the time such a device needs to exchange data with the memory, it will
ask the current master (normally the processor, but it may be another device
performing DMA) for the ownership of the bus. For this purpose, every bus
able to support ownership transfer defines a specific cycle for the transfer of
bus ownership. In this cycle, the potential master raises a request line
and the current master, in response, relinquishes the mastership, signaling this
over another bus line, and possibly waiting for the termination of a read/write
operation in progress. When a device has taken the bus ownership, it can then
perform the transfer of the data item and will remain the current master until
the processor or another device asks to become the new master. It is worth
noting that the bus ownership transfers are handled by the bus controller
components and are carried out entirely in hardware. They are, therefore,
totally transparent to the programs being executed by the processor, except
for a possible (normally very small) delay in their execution.
Every time a data item has been transferred, the MAR is incremented and
the WC is decremented. When the content of the WC becomes zero, all the
data have been transferred, and it is necessary to inform the processor of
this fact by issuing an interrupt request. The associated Interrupt Service
Routine will handle the block transfer termination by notifying the system of
the availability of new data. This is normally achieved using the interprocess
communication mechanisms described in Chapter 5.
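As an illustration of the sequence just described, the following sketch programs a hypothetical DMA-capable device for a block read. The register addresses and the command value are invented for the example; moreover, a real driver would program the physical address of the destination buffer, an aspect related to virtual memory and discussed later in this chapter.

#include <stdint.h>
#include <stdlib.h>

/* Hypothetical DMA-capable device registers, invented for this example. */
#define DEV_MAR    ((volatile uint32_t *)0x40003000)  /* Memory Address Register  */
#define DEV_WC     ((volatile uint32_t *)0x40003004)  /* Word Count register      */
#define DEV_CMD    ((volatile uint32_t *)0x40003008)  /* command register         */
#define DEV_DMA_RD 0x02                               /* "start DMA read" command */

/* Program the transfer of nwords items from the device into memory.
   The device raises an interrupt request when the word count reaches zero. */
uint32_t *start_dma_read(uint32_t nwords)
{
    uint32_t *buf = malloc(nwords * sizeof(uint32_t));  /* destination block   */
    *DEV_MAR = (uint32_t)(uintptr_t)buf;  /* start address (a real driver would
                                             program the physical address here) */
    *DEV_WC  = nwords;                    /* number of data items to transfer   */
    *DEV_CMD = DEV_DMA_RD;                /* start the device                   */
    return buf;    /* the processor is now free to execute other code           */
}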
system, which may lead to the crash of the whole system. (At least in mono-
lithic operating systems such as Linux and Windows; this may not be true
for other systems, such as microkernel-based ones.) User programs will never
interact directly with the driver as the device is accessible only via the Ap-
plication Programming Interface (API) provided by the operating system. In
the following we shall refer to the Linux operating system and shall see how
a uniform interface can be adapted to the variety of available devices. Other
operating systems adopt a similar architecture for I/O, which typically
differs only in the names and arguments of the I/O system routines, but
not in their functionality.
routine terminates, it will pick the saved return address from the stack and
put it into the Program Counter, so that the execution of the calling program
is resumed. We have already seen, however, how the interrupt mechanism can
be used to “invoke” an interrupt service routine. In this case the sequence
is different, and is triggered not by the calling program but by an external
hardware device. It is exactly when the processor starts executing an Inter-
rupt Service routine that the current execution mode is switched to kernel
mode. When the interrupt service routine returns and the interrupted pro-
gram resumes its execution, unless a new interrupt service routine is going to
be executed, the execution mode is switched back to user mode. It is worth noting that
the mode switch is not controlled by software; it is the processor itself
that switches to kernel mode when it starts servicing an interrupt.
This mechanism makes sense because interrupt service routines interact
with devices and are part of the device driver, that is, of a software compo-
nent that is integrated in the operating system. However, it may happen that
user programs have to do I/O operations, and therefore they need to execute
some code in kernel mode. We have claimed that all the code handling I/O
is part of the operating system and therefore the user program will call some
system routine for doing I/O. However, how do we switch to kernel mode in
this case, where the trigger does not come from a hardware device? The so-
lution is given by Software Interrupts. Software interrupts are not triggered
by an external hardware signal, but by the execution of a specific machine
instruction. The interrupt mechanism is quite the same: The processor saves
the current context, picks the address of the associated interrupt service rou-
tine from the vector table and switches to kernel mode, but in this case the
Interrupt Vector number is not obtained by a bus IACK cycle; rather, it is
given as an argument to the machine instruction for the generation of the
software interrupt.
The net effect of software interrupts is very similar to that of a function
call, but the underlying mechanism is completely different. This is the typical
way the operating system is invoked by user programs when requesting system
services, and it represents an effective barrier protecting the integrity of the
system. In fact, in order to let any code be executed via software interrupts,
it is necessary to write in the vector table the initial address of such code but,
not surprisingly, the vector table is not accessible in user mode, as it belongs to
the set of data structures whose integrity is essential for the correct operation
of the computer. The vector table is typically initialized during the system
boot (executed in kernel mode) when the operating system initializes all its
data structures.
To summarize the above concepts, let us consider the execution story of one
of the most used C library functions: printf(), which takes as a parameter the
(possibly formatted) string to be printed on the screen. Its execution consists
of the following steps:
1. The program calls printf(), which is part of the C run-time
library. Arguments are passed on the stack and the start address of
the printf routine is put in the program counter;
2. The printf code will carry out the required formatting of the
passed string and the other optional arguments, and then calls the
operating system specific system service for writing the formatted
string on the screen;
3. The system routine executes initially in user mode, performs some
preparatory work, and then needs to switch to kernel mode. To do
this, it will issue a software interrupt, where the passed interrupt
vector number specifies the offset in the Vector Table of the corre-
sponding ISR routine to be executed in kernel mode;
4. The ISR is eventually activated by the processor in response to the
software interrupt. This routine is provided by the operating system
and it is now executing in kernel mode;
5. After some work to prepare the required data structures, the ISR
routine will interact with the output device. To do this, it will call
specific routines of the device driver;
6. The activated driver code will write appropriate values in the device
registers to start transferring the string to the video device. In the
meantime the calling process is put in wait state (see Chapter 3 for
more information on processes and process states);
7. A sequence of interrupts will be likely generated by the device to
handle the transfer of the bytes of the string to be printed on the
screen;
8. When the whole string has been printed on the screen, the calling
process will be resumed by the operating system and printf will
return.
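The same path can be followed one level down with a minimal program that, instead of calling printf(), requests the kernel write service directly through the syscall() wrapper provided by the C library; the wrapper enters the kernel via the trap mechanism just described (on modern processors the actual trap instruction may differ, for example syscall on x86-64 Linux, but the principle is the same):

#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <string.h>

int main(void)
{
    const char *msg = "Hello, kernel\n";

    /* Request the kernel "write" service on file descriptor 1 (standard
       output): syscall() performs the switch from user to kernel mode. */
    syscall(SYS_write, 1, msg, strlen(msg));
    return 0;
}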
Software interrupts provide the required barrier between user and kernel mode,
which is of paramount importance in general purpose operating systems. This
comes, however, at a cost: the activation of a kernel routine involves a sequence
of actions, such as saving the context, which is not necessary in a direct call.
Many embedded systems, on the other hand, are not intended for general use. Rather,
they are intended to run a single program for control and supervision or, in
more complex systems involving multitasking, a well-defined set of programs
developed ad hoc. For this reason several real-time operating systems do not
support different execution levels (even if the underlying hardware could), and
all the software is executed in kernel mode, with full access to the whole set of
system resources. In this case, a direct call is used to activate system routines.
Of course, the failure of a program will likely bring the whole system down,
but in this case it is assumed that the programs being executed have already
been tested and can therefore be trusted.
The evil, however, hides in the details, and in fact all the complexity in the
device/computer interaction has been simply moved to ioctl(). Depending
on the device’s nature, the set of operations and of the associated data struc-
tures may range from a very few and simple configurations to a fairly complex
set of operations and data structures, described by hundreds of user manual
pages. This is exactly the case of the standard driver for the camera devices
that will be used in the subsequent sections of this chapter for the presented
case study.
The abstraction carried out by the operating system in the application
programming interface for device I/O is also maintained in the interaction
between the operating system and the device-specific driver. We have already
seen that, in order to integrate a device in the system, it is necessary to
provide device-specific code, assembled into the device driver and then integrated
into the operating system. Basically, a device driver provides the implementa-
tion of the open, close, read, write, and ioctl operations. So, when a program
opens a device by invoking the open() system routine, the operating system
will first carry out some generic operations common to all devices, such as
the preparation of its own data structures for handling the device, and will
then call the device driver’s open() routine to carry out the required device
specific actions. The actions carried out by the operating system may involve
the management of the calling process. For example, in a read operation, the
operating system, after calling the device-specific read routine, may suspend
the current process (see Chapter 3 for a description of the process states) in
the case the required data are not currently available. When the data to be
read becomes available, the system will be notified of it, say, with an interrupt
from the device, and the operating system will wake the process that issued
the read() operation, which can now terminate the read() system call.
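In Linux, the glue between the operating system and the driver-specific implementations of these operations is a table of function pointers, struct file_operations. The following skeleton is only a sketch of this mechanism: my_open, my_read, and my_ioctl stand for the device-specific code a real driver would provide.

#include <linux/fs.h>
#include <linux/module.h>

/* Placeholders for the device-specific code of a real driver. */
static int my_open(struct inode *inode, struct file *filp) { return 0; }
static ssize_t my_read(struct file *filp, char __user *buf,
                       size_t count, loff_t *off) { return 0; }
static long my_ioctl(struct file *filp, unsigned int cmd,
                     unsigned long arg) { return 0; }

/* The kernel invokes the driver through this table of function pointers
   when a program opens, reads, or issues an ioctl() on the device. */
static const struct file_operations my_fops = {
    .owner          = THIS_MODULE,
    .open           = my_open,
    .read           = my_read,
    .unlocked_ioctl = my_ioctl,
};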
#define MAX_FORMAT 100
#define FALSE 0
#define TRUE 1
#define CHECK_IOCTL_STATUS(message)  \
    if (status == -1)                \
    {                                \
        perror(message);             \
        exit(EXIT_FAILURE);          \
    }

int yuyvFound;

    for (;;)
    {
        status = select(1, &fds, NULL, NULL, &tv);
        if (status == -1)
        {
            perror("Error in Select");
            exit(EXIT_FAILURE);
        }
        status = read(fd, buf, imageSize);
        if (status == -1)
        {
            perror("Error reading buffer");
            exit(EXIT_FAILURE);
        }
        /* Step 8: Do image processing */
        processImage(buf, width, height, imageSize);
    }
}
The first action (step 1) in the program is opening the device. The system routine
open() looks exactly like an open call for a file. As for files, the first argument
is a path name, but in this case such a name specifies the device instance. In
Linux the names of the devices are all contained in the /dev directory. The
files contained in this directory do not correspond to real files (a Webcam is
obviously different from a file), rather, they represent a rule for associating a
unique name with each device in the system. In this way it is also possible to
discover the available devices using the ls command to list the files contained
in a directory. By convention, camera devices have the name /dev/video<n>,
so the command ls /dev/video* will show how many camera devices are
available in the system. The second argument given to system routine open()
specifies the protection associated with that device. In this case the constant
O_RDWR specifies that the device can be read and written. The returned value
is an integer that uniquely identifies, within the system, the Device Descriptor,
that is, the set of data structures held by Linux to manage this device.
This number is then passed to the following ioctl() calls to specify the target
device. Step 2 consists in checking whether the camera device supports
read/write operations. The attentive reader may find this a bit strange—how could
the image frames be acquired otherwise?—but we shall see in the second ex-
ample that an alternative way, called streaming, is normally (and indeed most
often) provided. This query operation is carried out by the following line:
status = ioctl(fd, VIDIOC_QUERYCAP, &cap);
In the above line the ioctl operation code is given by the constant
VIDIOC_QUERYCAP (defined, as all the other constants used in the management
of the video device, in linux/videodev2.h), and the associated data
structure for the pointer argument is of type v4l2_capability. This structure,
documented in the V4L2 API specification, defines, among others, a
capabilities field containing a bit mask specifying the supported capabilities of
that device.
Line
if(cap.capabilities & V4L2_CAP_READWRITE)
will let the program know whether read/write ability is supported by the
device.
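A possible way to combine the capability query and the check, with error handling, is sketched below; check_readwrite is simply a name chosen for this example.

#include <linux/videodev2.h>
#include <sys/ioctl.h>
#include <stdio.h>
#include <stdlib.h>

/* Step 2: query the device capabilities and check read/write support. */
static void check_readwrite(int fd)
{
    struct v4l2_capability cap;

    if (ioctl(fd, VIDIOC_QUERYCAP, &cap) == -1)
    {
        perror("Error querying capabilities");
        exit(EXIT_FAILURE);
    }
    if (!(cap.capabilities & V4L2_CAP_READWRITE))
        printf("Read/write not supported: streaming must be used\n");
}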
In step 3 the device is queried about the supported pixel formats. To do
this, ioctl() is repeatedly called, specifying the VIDIOC_ENUM_FMT operation and
passing the pointer to a v4l2_fmtdesc structure whose fields of interest are
(a minimal enumeration loop is sketched after this list):
• index: to be set before calling ioctl() in order to specify the index of the
queried format. When no more formats are available, that is, when the
index is greater than or equal to the number of supported formats, ioctl() will
return an error.
• type: specifies the type of the buffer for which the supported format is
being queried. Here, we are interested in the returned image frame, and
this is set to V4L2_BUF_TYPE_VIDEO_CAPTURE.
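A minimal enumeration loop could look as follows; find_yuyv is a name chosen for this example, while the structure fields and constants are those defined by V4L2.

#include <linux/videodev2.h>
#include <sys/ioctl.h>
#include <string.h>

/* Step 3: enumerate the supported pixel formats, looking for YUYV.
   Returns TRUE (1) if YUYV is supported, FALSE (0) otherwise. */
static int find_yuyv(int fd)
{
    struct v4l2_fmtdesc fmtdesc;
    int idx;

    for (idx = 0; ; idx++)
    {
        memset(&fmtdesc, 0, sizeof(fmtdesc));
        fmtdesc.index = idx;
        fmtdesc.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        if (ioctl(fd, VIDIOC_ENUM_FMT, &fmtdesc) == -1)
            break;                                 /* no more formats */
        if (fmtdesc.pixelformat == V4L2_PIX_FMT_YUYV)
            return 1;                              /* YUYV supported  */
    }
    return 0;
}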
If the pixel format YUYV is found (this is the normal format supported by
all Webcams), the program proceeds in defining an appropriate image format.
There are many parameters for specifying such information, all defined in
structure v4l2 format passed to ioctl to get (operation VIDIOC G FMT) or to
set the format (operation VIDIOC S FMT). The program will first read (step 4)
the currently defined image format (normally most default values are already
appropriate) and then change (step 5) the formats of interest, namely, image
width, image height, and the pixel format. Here, we are going to define a
640 x 480 image using the YUYV pixel format by writing the appropriate
values in fields fmt.pix.width, fmt.pix.height and fmt.pix.pixelformat
of the format structure. Observe that, after setting the new image format,
the program checks the returned values for image width and height. In fact,
it may happen that the device does not support exactly the requested image
width and height, and in this case the format structure returned by ioctl
contains the appropriate values, that is, the supported width and height that
are closest to the desired ones. Fields pix.sizeimage will contain the total
length in bytes of the image frame, which in our case will be given by 2 times
width times height (recall that in YUYV format four bytes are used to encode
two pixels).
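Steps 4 and 5 can be sketched as follows; fd is as above, the error checks are omitted for brevity, and the structure fields are those defined by the V4L2 API.

struct v4l2_format fmt;

/* Step 4: read the currently configured image format */
memset(&fmt, 0, sizeof(fmt));
fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
ioctl(fd, VIDIOC_G_FMT, &fmt);

/* Step 5: request a 640 x 480 image in YUYV pixel format */
fmt.fmt.pix.width = 640;
fmt.fmt.pix.height = 480;
fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_YUYV;
ioctl(fd, VIDIOC_S_FMT, &fmt);

/* The driver may have adjusted width and height: use the returned values */
int width = fmt.fmt.pix.width;
int height = fmt.fmt.pix.height;
int imageSize = fmt.fmt.pix.sizeimage;   /* 2 * width * height for YUYV */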
At this point the camera device is configured, and the program can start
acquiring image frames. In this example a frame is acquired via a read() call
whose arguments are:
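the device descriptor returned by open(), the address of the destination buffer, and the number of bytes to be transferred, that is, the size of one image frame. A minimal sketch of the acquisition, with buffer and imageSize as defined above (and assuming <unistd.h> and <stdlib.h> are included), might be:

unsigned char *buffer = malloc(imageSize);  /* destination buffer          */
int n = read(fd, buffer, imageSize);        /* try to read one whole frame */
if (n < 0)
    perror("read");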
Function read() returns the number of bytes actually read, which is not
necessarily equal to the number of bytes passed as argument. In fact, it may
happen that at the time the function is called, not all the required bytes are
available, and the program has to manage this properly. So, it is necessary to
make sure that when read() is called, a frame is available for readout. The
usual technique in Linux to synchronize read operations on devices is the use
of the select() function, which allows a program to monitor multiple device
descriptors, waiting until one or more devices become “ready” for some class
of I/O operation (e.g., input data available). A device is considered ready if
it is possible to perform the corresponding I/O operation (e.g., read) without
blocking. Observe that select() is especially useful when a program has to
deal with several devices. In fact, since read() is blocking, that is, it suspends
the execution of the calling program until some data are available, a program
reading from multiple devices may get stuck in a read() operation regardless of the
fact that some other device may have data ready to be read. The arguments
passed to select() are
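the highest-numbered descriptor to be monitored plus one, three descriptor masks (for read, write, and exceptional conditions, respectively), and a timeout. The following fragment sketches a possible use for the camera device; fd, buffer, and imageSize are as above, and this is not the full program listing.

fd_set fds;
struct timeval tv;
int retval;

FD_ZERO(&fds);                 /* reset the read mask                */
FD_SET(fd, &fds);              /* add the camera descriptor to it    */
tv.tv_sec = 2;                 /* give up after two seconds          */
tv.tv_usec = 0;
retval = select(fd + 1, &fds, NULL, NULL, &tv);
if (retval > 0)                /* fd is ready: read() will not block */
    read(fd, buffer, imageSize);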
The device masks are of type fd_set, and there is no need to know
its definition since the macros FD_ZERO and FD_SET allow resetting the mask
and adding a device descriptor to it, respectively. When select() does not have
to monitor a device class, the corresponding mask is NULL, as in the above
example for the write and exception masks. The timeout is specified using the
structure timeval, which defines two fields, tv_sec and tv_usec, to specify
the number of seconds and microseconds, respectively.
The above example will work fine, provided the camera device supports
the read() operation directly, as long as it is possible to guarantee that the
read() routine is called at least as often as the frame rate. This is, however, not
always the case because the process running the program may be preempted
by the operating system in order to assign the processor to other processes.
Even if we can guarantee that, on average, the read rate is high enough, it is in
general necessary to handle the occasional cases in which the reading process
is late and the frame may be lost. Several chapters of this book will discuss
this fact, and we shall see several techniques to ensure real-time behavior, that
is, making sure that a given action will be executed within a given amount
of time. If this were the case, and we could ensure that the read() operation
for the current frame will be always executed before a new frame is acquired,
there would be no risk of losing frames. Otherwise it is necessary to handle
occasional delays in frame readout. The common technique for this is double
buffering, that is, using two buffers for the acquired frames. As soon as the
driver is able to read a frame, normally in response to an interrupt indicating
that the DMA transfer for that frame has terminated, the frame is written
into one of two alternate memory buffers. The process acquiring such frames can then
copy from one buffer while the driver is filling in the other one. In this case,
if T is the frame acquisition period, a process is allowed to read a frame with
a delay up to T . Beyond this time, the process may be reading a buffer that
at the same time is being written by the driver, producing inconsistent data
or losing entire frames. The double buffering technique can be extended to
multiple buffering by using N buffers linked to form a circular chain. When
the driver has filled the nth buffer, it will use buffer (n + 1) mod N for the next
acquisition. Similarly, when a process has read a buffer it will proceed to the
next one, selected in the same way as above. If the process is fast enough, the
new buffer will not be yet filled, and the process will be blocked in the select
operation. When select() returns, at least one buffer contains valid frame
data. If, for any reason, the process is late, more than one buffer will contain
acquired frames not yet read by the program. With N buffers, for a frame
acquisition period of T , the maximum allowable delay for the reading process
is (N − 1)T . In the next example, we shall use this technique, and we shall
see that it is no longer necessary to call function read() to get data, as one or
more frames will be already available in the buffers that have been set before
by the program. Before proceeding with the discussion of the new example, it
is, however, necessary to introduce the virtual memory concept.
FIGURE 2.5
The Virtual Memory address translation.
In the common case of 32-bit architectures, where 32 bits are used to represent
virtual addresses, the top 32 − K bits of a virtual address are used as the index
in the page table. This corresponds to providing a logical organization of the
virtual address range in a set of memory pages, each 2^K bytes long. So the most
significant 32 − K bits will provide the memory page number, and the least
significant K bits will specify the offset within the memory page. Under this
perspective, the page table provides a page number translation mechanism,
from the logical page number into the physical page number. In fact, also the
physical memory can be considered divided into pages of the same size, and
the offset of the physical address within the translated page will be the same
as in the original logical page.
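For instance, with K = 12 (4096-byte pages), the page number and the page offset can be obtained from a 32-bit virtual address with a shift and a mask, mimicking the split performed in hardware by the translation mechanism. The following fragment is only an illustrative sketch, not related to any specific operating system.

#include <stdint.h>
#include <stdio.h>

#define K 12   /* page size = 2^K = 4096 bytes */

int main(void)
{
    uint32_t virtualAddress = 0x12345678;
    uint32_t pageNumber = virtualAddress >> K;               /* top 32 - K bits */
    uint32_t pageOffset = virtualAddress & ((1u << K) - 1);  /* bottom K bits   */

    /* Prints: page number 0x12345, offset 0x678 */
    printf("page number 0x%x, offset 0x%x\n", pageNumber, pageOffset);
    return 0;
}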
Even if virtual memory may seem at first glance a method merely in-
vented to complicate the engineer’s life, the following example should convince
the skeptics of its convenience. Consider two processes running the same pro-
gram: this is perfectly normal in everyday life, and no one is in fact surprised
by the fact that two Web browsers or editor programs can be run by differ-
ent processes in Linux (or tasks in Windows). Recalling that a program is
composed of a sequence of machine instructions handling data in processor
registers and in memory, if no virtual memory were supported, the two in-
stances of the same program run by two different processes would interfere
with each other since they would access the same memory locations (they
are running the same program). This situation is elegantly solved, using the
virtual memory mechanism, by providing two different mappings to the two
processes so that the same virtual address page is mapped onto two different
physical pages for the two processes, as shown in Figure 2.6. Recalling that
the address translation is driven by the content of the page table, this means
that the operating system, whenever it assigns the processor to one process,
will also set the corresponding page table entries accordingly. The page table
FIGURE 2.6
The usage of virtual address translation to avoid memory conflicts.
contents become therefore part of the set of information, called Process Con-
text, which needs to be restored by the operating system in a context switch,
that is whenever a process regains the usage of the processor. Chapter 3 will
describe process management in more detail; here it suffices to know that
virtual address translation is part of the process context.
Virtual memory support complicates quite a bit the implementation of
an operating system, but it greatly simplifies the programmer’s life: the pro-
grammer no longer needs to be concerned about possible interference with other programs. At
this point, however, the reader may be falsely convinced that in an operat-
ing system not supporting virtual memory it is not possible to run the same
program in two different processes, or that, in any case, there is always the
risk of memory interferences among programs executed by different processes.
Luckily, this is not the case, but memory consistency can be obtained only by
imposing a set of rules on programs, such as the usage of the stack for keeping
local variables. Programs which are compiled by a C compiler normally use
the stack to contain local variables (i.e., variables which are declared inside a
program block without the static qualifier) and the arguments passed in rou-
tine calls. Only static variables (i.e., local variables declared with the static
qualifier or variables declared outside program blocks) are allocated outside
FIGURE 2.7
Sharing data via static variables on systems which do not support Virtual
Addresses.
the stack. A separate stack is then associated with each process, thus allow-
ing memory isolation even on systems that do not support virtual memory. When
writing code for systems without virtual memory, it is therefore important to
pay attention to the usage of static variables, since these are shared among
different processes, as shown in Figure 2.7. This is not necessarily a negative
fact, since a proper usage of static data structures may represent an effective
way for achieving interprocess communication. Interprocess communication,
that is, exchanging data among different processes, can be achieved also with
virtual memory, but in this case it is necessary that the operating system is
involved so that it can set up the content of the page table in order to allow
the sharing of one or more physical memory pages among different processes,
as shown in Figure 2.8.
FIGURE 2.8
Using the Page Table translation to map possibly different virtual addresses
onto the same physical memory page.
For some devices a read operation does not correspond to reading data at any
given address: reading from a network device, for example, returns the data
just received from the network. The same holds for a video device, and a read
operation will get the acquired image frame, not data read from any “address.”
However, when handling memory buffers in double buffering, it is necessary to
find some way to map the regions of memory used by the driver into memory
buffers for the program.
mmap() can be used for this purpose, and the preparation of the shared buffers
is carried out in two steps:
1. The driver allocates the buffers in its (physical) memory space, and
returns (in a data structure passed to ioctl()) the unique address
(in the driver context) of such buffers. The returned addresses may
be the same physical address of the buffers, but in any case they
are seen outside the driver as addresses referring to the conceptual
file model.
2. The user program calls mmap() to map such buffers in its virtual
memory onto the driver buffers, passing as arguments the file ad-
dresses returned in the previous ioctl() call. After the mmap() call
the memory buffers are shared between the driver, using physical
addresses, and the program, using virtual addresses.
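For a single buffer of index i, the query and mapping operations just described might be sketched as follows. The fragment is not part of the program listed below: buffers is the array of bufferDsc descriptors allocated in Step 8 of that program, <sys/mman.h> is assumed to be included, and error handling is simplified.

struct v4l2_buffer buf;

memset(&buf, 0, sizeof(buf));
buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
buf.memory = V4L2_MEMORY_MMAP;
buf.index = i;                    /* number of the buffer being queried */

/* Ask the driver for the buffer length and its "file" offset */
if (ioctl(fd, VIDIOC_QUERYBUF, &buf) == -1)
    perror("VIDIOC_QUERYBUF");

/* Map the driver buffer into the virtual memory of the program */
buffers[i].length = buf.length;
buffers[i].start = mmap(NULL, buf.length, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, buf.m.offset);
if (buffers[i].start == MAP_FAILED)
    perror("mmap");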
The code of the program using multiple buffering for handling image frame
streaming from the camera device is listed below.
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <linux/videodev2.h>
#include <asm/unistd.h>
#include <poll.h>

#define MAX_FORMAT 100
#define FALSE 0
#define TRUE 1

#define CHECK_IOCTL_STATUS(message)  \
    if (status == -1)                \
    {                                \
        perror(message);             \
        exit(EXIT_FAILURE);          \
    }

/* Descriptor of a driver buffer mapped in user space
   (the start field is an assumed name for the mapped address) */
typedef struct {
    void *start;
    size_t length;
} bufferDsc;

int idx;
fd_set fds;             // Select descriptors
struct timeval tv;      // Timeout specification structure
Steps 1–6 are the same as in the previous program, except for step 2, where
the streaming capability of the device is now checked. In Step 7, the driver is
asked to allocate four image buffers. The actual number of allocated buffers
is returned in the count field of the v4l2_requestbuffers structure passed
to ioctl(). At least two buffers must have been allocated by the driver to
allow double buffering. In Step 8 the descriptors of the buffers are allocated
via the calloc() system routine (every descriptor contains the dimension and
a pointer to the associated buffer). The actual buffers, which have been allo-
cated by the driver, are queried in order to get their address in the driver’s
space. Such an address, returned in field m.offset of the v4l2_buffer struc-
ture passed to ioctl(), cannot be used directly in the program since it refers
to a different address space. The actual address in the user address space is
returned by the following mmap() call. When the program arrives at Step 9,
the buffers have been allocated by the driver and also mapped to the pro-
gram address space. They are now enqueued by the driver, which maintains a
linked queue of available buffers. Initially, all the buffers are available: every
time the driver has acquired a frame, the first available buffer in the queue
is filled. Streaming, that is, frame acquisition, is started at Step 10, and then
at Step 11 the program waits for the availability of a filled buffer, using the
select() system call. Whenever select() returns, at least one buffer con-
tains an acquired frame. It is dequeued in Step 12, and then enqueued in Step
13, after it has been used in image processing. The reason for dequeuing and
then enqueuing the buffer again is to make sure that the buffer will not be
used by the driver during image processing.
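Steps 10 to 13 can thus be summarized by the following sketch, which is not the full listing: VIDIOC_STREAMON starts the acquisition, VIDIOC_DQBUF dequeues a filled buffer, and VIDIOC_QBUF puts it back in the driver queue after use. The signature assumed here for processImage() may differ from that of the actual routine.

enum v4l2_buf_type type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
struct v4l2_buffer buf;

/* Step 10: start streaming */
if (ioctl(fd, VIDIOC_STREAMON, &type) == -1)
    perror("VIDIOC_STREAMON");

for (;;)
{
    /* Step 11: wait until at least one buffer contains a frame */
    FD_ZERO(&fds);
    FD_SET(fd, &fds);
    tv.tv_sec = 2;
    tv.tv_usec = 0;
    if (select(fd + 1, &fds, NULL, NULL, &tv) <= 0)
        continue;                      /* timeout or error: retry */

    /* Step 12: dequeue the filled buffer */
    memset(&buf, 0, sizeof(buf));
    buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    buf.memory = V4L2_MEMORY_MMAP;
    if (ioctl(fd, VIDIOC_DQBUF, &buf) == -1)
        continue;

    processImage(buffers[buf.index].start, buffers[buf.index].length);

    /* Step 13: enqueue the buffer again, making it available to the driver */
    ioctl(fd, VIDIOC_QBUF, &buf);
}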
Finally, image processing will be carried out by routine processImage(),
which will first build a byte buffer containing only the luminance, that is,
taking the first byte of every 16 bit word of the passed buffer, coded using the
YUYV format.
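Since in the YUYV format the luminance is the first byte of every 16-bit word, this conversion amounts to a simple loop. The following fragment is a sketch: frame, width, and height are assumed to come from the surrounding program.

unsigned char *frame = buffers[buf.index].start;  /* acquired YUYV frame  */
unsigned char *lum = malloc(width * height);      /* luminance-only image */
int i;

for (i = 0; i < width * height; i++)
    lum[i] = frame[2 * i];     /* take the Y byte, skip the U/V byte */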
The edge detection step allows reducing the size of the problem, since for the
following analysis it suffices to take into account only the pixels representing
the edges in the image. Edge detection is carried out by computing the
approximation of the gradients in the X ($L_x$) and Y ($L_y$) directions for every
pixel of the image, selecting then only those pixels for which the gradient
magnitude, computed as $|\nabla L| = \sqrt{L_x^2 + L_y^2}$, is above a given
threshold. In fact, informally stated, an edge
corresponds to a region where the brightness of the image changes sharply,
the gradient magnitude being an indication of the “sharpness” of the change.
Observe that in edge detection we are only interested in the luminance, so in
the YUYV pixel format, only the first byte of every two will be considered. The
gradient is computed using a convolution matrix filter. Image filters based on
convolution matrix filters are very common in image processing and, based on
the matrix used for the computation, often called kernel, can perform several
types of image processing. Such a matrix is normally a 3 x 3 or 5 x 5 square
matrix, and the computation is carried out by considering, for each image pixel
P (x, y), the pixels surrounding it and multiplying them by
the corresponding coefficient of the kernel matrix K. Here we shall use a
3 x 3 kernel matrix, and therefore the computation of the filtered pixel value
$P_f(x, y)$ is
$$P_f(x, y) = \sum_{i=0}^{2} \sum_{j=0}^{2} K(i, j)\, P(x + i - 1, y + j - 1) \qquad (2.1)$$
Here, we use the Sobel Filter for edge detection, which defines the following
two kernel matrixes:
$$\begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} \qquad (2.2)$$
for the gradient along the X direction, and
$$\begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} \qquad (2.3)$$
for the gradient along the Y direction.
#define THRESHOLD 100
/* Sobel matrixes */
static int GX [3][3];
static int GY [3][3];
/* Initialization of the Sobel matrixes, to be called before
   Sobel filter computation */
static void initG ()
{
/* 3x3 GX Sobel mask */
GX [0][0] = -1; GX [0][1] = 0; GX [0][2] = 1;
GX [1][0] = -2; GX [1][1] = 0; GX [1][2] = 2;
GX [2][0] = -1; GX [2][1] = 0; GX [2][2] = 1;
/* 3x3 GY Sobel mask */
GY [0][0] = 1; GY [0][1] = 2; GY [0][2] = 1;
GY [1][0] = 0; GY [1][1] = 0; GY [1][2] = 0;
GY [2][0] = -1; GY [2][1] = -2; GY [2][2] = -1;
}
/* Convolution starts here */
else
{
/* X Gradient */
for ( i = -1; i <= 1; i ++)
{
for ( j = -1; j <= 1; j ++)
{
sumX = sumX + ( int )( (*( image + x + i +
( y + j )* cols )) * GX [ i +1][ j +1]);
}
}
/* Y Gradient */
for ( i = -1; i <= 1; i ++)
{
for ( j = -1; j <= 1; j ++)
{
sumY = sumY + ( int )( (*( image + x + i +
( y + j )* cols )) * GY [ i +1][ j +1]);
}
}
/* Gradient magnitude approximation, to avoid square root operations */
sum = abs ( sumX ) + abs ( sumY );
}
Consider, for example, two algorithms for a problem of dimension N , the first one requiring f (N ) op-
erations, and the second one requiring exactly 100f (N ). Of course, we would
never choose the second one; however, they are equivalent in the big-O nota-
tion, being both O(f (N )).
Therefore, in order to assess the complexity of a given algorithm and to op-
timize it, other techniques must be considered, in addition to the choice of the
appropriate algorithm. This is the case of our application: given the algorithm,
we want to make its computation as fast as possible.
First of all, we need to perform a measurement of the time the algorithm
takes. A crude but effective method is to use the system routines for getting
the current time, and to measure the difference between the times read before
and after the computation of the algorithm. The following code snippet makes
a rough estimate of the time procedure makeBorder() takes in a Linux system.
#include <sys/time.h>

#define ITERATIONS 1000
struct timeval beforeTime, afterTime;
int executionTime;
....
gettimeofday(&beforeTime, NULL);
for (i = 0; i < ITERATIONS; i++)
    makeBorder(image, border, cols, rows);
gettimeofday(&afterTime, NULL);
/* Execution time is expressed in microseconds */
executionTime = (afterTime.tv_sec - beforeTime.tv_sec) * 1000000
              + afterTime.tv_usec - beforeTime.tv_usec;
executionTime /= ITERATIONS;
...
The POSIX routine gettimeofday() reads the current time from the system
clock and stores it in a timeval structure whose fields hold the number of
seconds (tv_sec) and microseconds (tv_usec) elapsed since the Epoch, that is, a
reference time which, for POSIX, is assumed to be 00:00:00 UTC, January 1,
1970.
The execution time measured in this way can be affected by several factors,
among which is the current load of the computer. In fact, the process
running the program may be interrupted during execution by other processes
in the system. Even after setting the priority of the current process to the
highest one, the CPU will be interrupted many times for performing I/O and
for operating system activities. Nevertheless, if the computer is not loaded,
and the process running the program has a high priority, the measurement is
accurate enough.
We are now ready to start the optimization of our edge detection algo-
rithm. The first action is the simplest one: let the compiler do it. Modern
compilers perform very sophisticated optimization of the machine code that
is produced when parsing the source code. It is easy to get an idea of the
degree of optimization by comparing the execution time when compiling the
program without optimization (compiler flag -O0) and with the highest degree
of optimization (compiler flag -O3), which turns out to be 5–10 times shorter
for the edge detection routine. The optimization performed by the compiler
addresses the following aspects:
for (i = 0; i < 10; i++)
{
a = 15 * i;
.....
}
a = 0;
for (i = 0; i < 10; i++)
{
a = a + 15;
.....
}
The compiler recognizes, then, induction variables and replaces more com-
plex operations with additions. This optimization is particularly useful for
the loop variables used as indexes in arrays; in fact, many computer ar-
chitectures define memory access operations (arrays are stored in memory
and are therefore accessed via memory access machine instructions such as
LOAD or STORE), which increment the passed memory index by a given
amount in the same memory access operation.
If the referenced data item is already present in the cache, the access
is performed on the cached copy of the data item. Otherwise, a free block in
the cache is found (possibly copying back to memory an existing cache block if the
cache is full), and a block of data located around that memory address is first
copied from memory to the cache. The two cases are called Cache Hit and
Cache Miss, respectively. Clearly, a cache miss incurs a penalty in execution
time (the copy of a block from memory to cache), but, due to memory access
locality, it is likely that further memory accesses will hit the cache, with a
significant reduction in data access time.
The gain in performance due to the cache memory depends on the program
itself: the more local the memory accesses, the faster the program execution.
Consider the following code snippet, which computes the sum of the elements
of an M x N matrix.
double a [ M ][ N ];
double sum = 0;
for ( i = 0; i < M ; i ++)
for ( j = 0; j < N ; j ++)
sum += a [ i ][ j ];
In C, matrixes are stored in row-major order, that is, rows are stored sequen-
tially. In this case a[i][j] will be adjacent in memory to a[i][j+1], and the
program will access matrix memory sequentially. The following code is also
correct, differing from the previous one only for the exchange of the two for
statements.
double a [ M ][ N ];
double sum = 0;
for ( j = 0; j < N ; j ++)
for ( i = 0; i < M ; i ++)
sum += a [ i ][ j ];
However, in this case memory access is not sequential, since matrix elements
a[i][j] and a[i+1][j] are stored in memory locations that are N elements
apart. In this case, the number of cache misses will be much higher than
in the former case, especially for large matrixes, affecting the execution time
of that code.
Coming back to routine makeBorder(), we observe that it is accessing
memory in the right order. In fact, what the routine basically does is to con-
sider a 3 x 3 matrix sweeping along the 480 rows of the image. The order
of access is therefore row first, corresponding to the order in which bytes
are stored in the image buffer. So, if bytes are being considered in a “cache
friendly” order, what can we do to improve performance? Recall that the
compiler is very clever in optimizing access to information stored in program
variables, but is mostly blind as regards the management of information stored
in memory (i.e., in arrays and matrixes). This fact suggests to us a possible
strategy: move the current 3 x 3 portion of the image being considered in the
Sobel filter into 9 variables. Filling this set of 9 variables the first time a line
is considered will require reading 9 values from memory, but at the follow-
ing iterations, that is, moving the 3 x 3 matrix one position left, only three
new values will be read from memory, the others already being stored in pro-
gram variables. Moreover, the nine multiplications and summations required
to compute the value of the current output filter can be directly expressed in
the code, without defining the 3 x 3 matrixes GX and GY used in the program
listed above. The new implementation of makeBorder() is listed below, using
the new variables c11, c12, . . . , c33 to store the current portion of the image
being considered for every image pixel.
void makeBorder(char *image, char *border, int cols, int rows)
{
    int x, y, sumX, sumY, sum;
    /* Variables to hold the 3x3 portion of the image used in the computation
       of the Sobel filter output */
    int c11, c12, c13, c21, c22, c23, c31, c32, c33;
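The body of the routine is not reported in full here; the idea, however, is sketched by the following fragment, which moves the window one position to the right along the current row, shifting six values between variables and reading only the three new pixels from memory. Variable names are as above, and the gradients are written out explicitly under that assumption.

/* Shift the 3x3 window one position to the right ... */
c11 = c12; c12 = c13;
c21 = c22; c22 = c23;
c31 = c32; c32 = c33;
/* ... and read only the new rightmost column from memory */
c13 = *(image + (x + 1) + (y - 1) * cols);
c23 = *(image + (x + 1) +  y      * cols);
c33 = *(image + (x + 1) + (y + 1) * cols);
/* Sobel gradients computed directly, without the GX and GY matrixes */
sumX = -c11 + c13 - 2 * c21 + 2 * c23 - c31 + c33;
sumY =  c11 + 2 * c12 + c13 - c31 - 2 * c32 - c33;
sum = abs(sumX) + abs(sumY);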
The resulting code is certainly less readable than the previous version but, when
compiled, it produces code that is around three times faster, because the
compiler now has more opportunities for optimizing the management of information,
memory access being limited to the essential cases.
In general, code optimization is not a trivial task and requires ingenuity and
a good knowledge of the optimization strategies carried out by the compiler.
Very often, in fact, the programmer experiences the frustration of getting no
advantage after working hard at optimizing his/her code, simply because the
foreseen optimization had already been carried out by the compiler. Since
optimized source code is often much less readable than nonoptimized code,
implementing a given algorithm while also taking care of possible code optimization
may be an error-prone task. For this reason, implementation should be done
in two steps:
1. implement the algorithm in the most straightforward way, without
optimization, and check its correctness;
2. optimize the code, and then check the correctness of the new code and the amount of gained
performance.

FIGURE 2.9
r and θ representation of a line.
$$y = -\left(\frac{\cos\theta}{\sin\theta}\right) x + \left(\frac{r}{\sin\theta}\right) \qquad (2.5)$$
Imagine an image containing one line. After edge detection, the pixels
associated with the detected edges may belong to the line, or to some other
element of the scene represented by the image. Every such pixel at coordinates
(x0 , y0 ) is assumed by the algorithm as belonging to a potential line, and the
(infinite) set of lines passing through (x0 , y0 ) is considered. For all such lines, the
associated parameters r and θ obey the relation
$$r = x_0 \cos\theta + y_0 \sin\theta ,$$
that is, a sinusoidal law in the plane (r, θ).

FIGURE 2.10
(r, θ) relationship for points (x0 , y0 ) and (x1 , y1 ).

Suppose now that the considered
pixel effectively belongs to the line, and consider another pixel at position
(x1 , y1 ), belonging to the same line. Again, for the set of lines passing through
(x1 , y1 ), their r and θ will obey the law
$$r = x_1 \cos\theta + y_1 \sin\theta .$$
Plotting the two relations in the (r, θ) plane (Figure 2.10), we observe that the two
graphs intersect in (r0 , θ0 ), where r0 and θ0 are the parameters of the line
passing through (x0 , y0 ) and (x1 , y1 ). Considering every pixel on that line, all
the corresponding curves in the plane (r, θ) will intersect in (r0 , θ0 ). This suggests a
voting procedure for detecting the lines in an image. We must consider, in fact,
that in an image spurious pixels are present, in addition to those representing
the line. Moreover, the (x, y) positions of the line pixels may not lie exactly at
the expected coordinates for that line. So, a matrix corresponding to the (r, θ)
plane, initially set to 0, is maintained in memory. For every edge pixel, the
matrix elements corresponding to all the pairs (r, θ) defined by the associated
sinusoidal relation are incremented by one. When all the edge pixels have been
considered, supposing a single line is represented in the image, the matrix
element at coordinates (r0 , θ0 ) will hold the highest value, and therefore it
suffices to choose the matrix element with the highest value, whose coordinates
will identify the recognized line in the image.
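The voting procedure for lines can be summarized by the following sketch, which is not part of the programs presented in this chapter; the angular and radial resolutions, as well as the way edge pixels are marked (dark pixels, as produced by makeBorder()), are assumptions.

#include <math.h>

#define N_THETA 180                 /* one-degree angular resolution   */
#define N_R     800                 /* number of bins for the radius r */

static int votes[N_R][N_THETA];     /* voting matrix, initially zero   */

void voteLines(unsigned char *edges, int rows, int cols)
{
    int x, y, t;
    for (y = 0; y < rows; y++)
        for (x = 0; x < cols; x++)
        {
            if (edges[y * cols + x] > 10)
                continue;           /* not an edge pixel               */
            for (t = 0; t < N_THETA; t++)
            {
                double theta = t * M_PI / N_THETA;
                int r = (int)(x * cos(theta) + y * sin(theta) + 0.5);
                if (r >= 0 && r < N_R)
                    votes[r][t]++;  /* one vote for this (r, theta)    */
            }
        }
    /* The detected line corresponds to the entry of votes[][] holding
       the maximum value (search omitted for brevity) */
}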
A similar procedure can be used to detect the center of a circular shape
in the image. Assume initially that the radius R of such circle is known.
In this case, a matrix with the same dimensions as the image is maintained,
initially set to 0. For every edge pixel (x0 , y0 ) in the image, the circle of radius
R centered in (x0 , y0 ) is considered, and the corresponding elements in the
matrix are incremented by 1. All such circles intersect in the center of the circle
in the image, as shown in Figure 2.11. Again, a voting procedure will allow
discovering the center of the circle in the edge image, even in the presence of spurious
pixels and of approximate positions of the pixels representing the circle edges.
If the radius R is not known in advance, it is necessary to repeat the above
procedure for different values of R and choose the radius value that yields
FIGURE 2.11
Circles drawn around points over the circumference intersect in the circle
center.
FIGURE 2.12
A sample image with a circular shape.
the maximum count value for the candidate center. Intuitively, this holds
because only when the considered radius is the right one will all the circles
built around the border pixels of the original circle intersect in a single point.
Observe that even if the effective radius of the circular object to be detected
in the image is known in advance, the radius of its shape in the image may
depend on several factors, such as its distance from the camera, or even on
the illumination of the scene, which may yield slightly different edges in the
image, so in practice it is always necessary to consider a range of possible
radius values.
The overall detection procedure is summarized in Figures 2.12, 2.13, 2.14,
and 2.15. The original image and the detected edges are shown in Figures 2.12
and 2.13, respectively. Figure 2.14 is a representation of the support matrix
used in the detection procedure. It can be seen that most of the circles in the
image intersect in a single point (the others are circles drawn around the other
edges of the image), reported then in the original image in Figure 2.15.
The code of routine findCenter() is listed below. Its input arguments are
the radius of the circle, the buffer containing the edges of the original image
(created by routine makeBorder()), and the number of rows and columns. The
routine returns the position of the detected center and a quality indicator,
expressed as the normalized maximum value in the matrix used for center
detection. The buffer for such a matrix is passed in the last argument.
FIGURE 2.13
The image of 2.12 after edge detection.
FIGURE 2.14
The content of the voting matrix generated from the edge pixels of 2.13.
FIGURE 2.15
The detected center in the original image.
/* Black threshold:
   a pixel value less than the threshold is considered black. */
#define BLACK_LIMIT 10
void findCenter( int radius , unsigned char * buf , int rows , int cols ,
int * retX , int * retY , int * retMax , unsigned char * map )
{
int x , y , l , m , currCol , currRow , maxCount = 0;
int maxI = 0 , maxJ = 0;
/* Square roots needed for the computation are computed only once
   and maintained in array sqr */
static int sqr [2 * MAX_RADIUS ];
static int sqrInitialized = 0;
/* Hit counter, used to normalize the returned quality indicator */
double totCounts = 0;
/* The matrix is initially set to 0 */
memset ( map , 0 , rows * cols );
/* If the square root values are not yet initialized, compute them */
if (! sqrInitialized )
{
sqrInitialized = 1;
for ( l = - radius ; l <= radius ; l ++)
/* integer approximation of sqrt(radius^2 - l^2) */
sqr [ l + radius ] = sqrt ( radius * radius - l * l ) + 0.5;
}
for ( currRow = 0; currRow < rows ; currRow ++)
{
for ( currCol = 0; currCol < cols ; currCol ++)
{
/* Consider only pixels corresponding to borders of the image.
   Such pixels are set by makeBorder as dark ones */
if ( buf [ currRow * cols + currCol ] <= BLACK_LIMIT)
{
x = currCol ;
y = currRow ;
/* Increment the value of the pixels in the map buffer which correspond to
   a circle of the given radius centered in (currCol, currRow) */
for ( l = x - radius ; l <= x + radius ; l ++)
{
if ( l < 0 || l >= cols )
continue ; // Out of image X range
m = sqr [l - x + radius ];
if (y - m < 0 || y + m >= rows )
continue ; //Out of image Y range
map [( y - m )* cols + l ]++;
map [( y + m )* cols + l ]++;
totCounts += 2; // Two more pixels incremented
/* Update current maximum */
if ( maxCount < map [( y + m )* cols + l ])
{
maxCount = map [( y + m )* cols + l ];
maxI = y + m ;
maxJ = l ;
}
if ( maxCount < map [( y - m )* cols + l ])
{
maxCount = map [( y - m )* cols + l ];
maxI = y - m ;
maxJ = l ;
}
}
}
}
}
/* Return the (x, y) position in the map which yields the largest value */
* retX = maxJ ;
* retY = maxI ;
/* The returned quality indicator is expressed as the maximum pixel
   value in the map matrix */
* retMax = maxCount ;
}
As stated before, due to small variations of the actual radius of the circular
shape in the image, routine findCenter() will be iterated for a set of radius
values, ranging between a given minimum and maximum value.
When considering the possible optimization of the detection procedure,
we observe that every time routine findCenter() is called, it is necessary to
compute the square root values that are required to select the map elements
which lie on a circumference centered on the current point. Since the routine is
called for a fixed range of radius values, we may think of removing the square
root calculation at the beginning of the routine, and of passing an array of
precomputed values, prepared in an initialization phase for all the
considered radius values. This change would, however, bring very little
improvement in speed: in fact, only a few tens of square root computations (i.e.,
the pixel dimension of the radius) are carried out every time findCenter() is
called, a very small number of operations compared with the total number of
operations actually performed. A much larger improvement can be obtained by
observing that it is possible to execute findCenter() for the different radius
values in parallel instead of in sequence. The following code uses POSIX
threads, described in detail in Chapter 7, to launch a set of threads, each
computing the center coordinates for a given value of the radius. Every thread
can be considered an independent flow of execution for the passed routine. In a
multicore processor, threads can run on different cores, thus providing a drastic
reduction of the execution time because code is executed effectively in parallel.
A new thread is created by the POSIX routine pthread_create(), which takes as
arguments the routine to be executed and the (single) parameter to be passed.
As findCenter() accepts multiple input and output parameters, it cannot be
passed directly as an argument to pthread_create(). The normal practice is to
allocate a data structure containing the routine-specific parameters and to
pass its pointer to pthread_create() using a support routine (doCenter()
in the code below).
After launching the threads, it is necessary to wait for their termina-
tion before selecting the best result. This is achieved using the POSIX routine
pthread_join(), which suspends the execution of the calling program until
the specified thread terminates, called in a loop for every created thread.
When the loop exits, all the centers have been computed, and the best can-
didate can be chosen using the returned arguments stored in the support
argument structures.
#include <pthread.h>
/* Definition of a structure to contain the arguments to be
   exchanged with findCenter() */
struct arguments {
    unsigned char *edges;   // Edge image
    int rows, cols;         // Rows and columns of the image
    int r;                  // Current radius
    int retX, retY;         // Returned center position
    int retMax;             // Returned quality factor
};
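The rest of the listing, which launches the threads and waits for their termination, might be organized along the following lines. This is a sketch rather than the book listing: the body given here for the doCenter() support routine, the radius range, and the final selection of the best candidate are assumptions, and <stdio.h> and <stdlib.h> are assumed to be included as well.

#define MIN_RADIUS 60                /* assumed radius range        */
#define N_THREADS  20                /* one thread per radius value */

/* Support routine passed to pthread_create(): unpack the argument
   structure and call findCenter() with the expected parameters */
static void *doCenter(void *arg)
{
    struct arguments *a = (struct arguments *)arg;
    unsigned char *map = malloc(a->rows * a->cols);  /* per-thread voting matrix */
    findCenter(a->r, a->edges, a->rows, a->cols,
               &a->retX, &a->retY, &a->retMax, map);
    free(map);
    return NULL;
}

/* Launch one thread per candidate radius, wait for all of them,
   and select the radius yielding the best quality indicator */
void detectCircle(unsigned char *edges, int rows, int cols)
{
    pthread_t threads[N_THREADS];
    struct arguments args[N_THREADS];
    int i, best = 0;

    for (i = 0; i < N_THREADS; i++)
    {
        args[i].edges = edges;
        args[i].rows = rows;
        args[i].cols = cols;
        args[i].r = MIN_RADIUS + i;
        pthread_create(&threads[i], NULL, doCenter, &args[i]);
    }
    for (i = 0; i < N_THREADS; i++)
    {
        pthread_join(threads[i], NULL);
        if (args[i].retMax > args[best].retMax)
            best = i;
    }
    printf("center (%d, %d), radius %d\n",
           args[best].retX, args[best].retY, args[best].r);
}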
2.6 Summary
In this chapter a case study has been used to introduce several important facts
about embedded systems. In the first part, the I/O architecture of computers
has been presented, introducing basic techniques such as polling, interrupts,
and Direct Memory Access.
The interface to I/O operations provided by operating systems, in par-
ticular Linux, has then been presented. The operating system hides all the
internal management of I/O operations behind a very simple interface, but
nonetheless some knowledge of the underlying I/O techniques is essential to fully understand
how I/O routines can be used. The rather sophisticated interface provided by
the V4L2 library for camera devices allowed us to learn further concepts such
as virtual memory and multiple buffering techniques for streaming.
The second part of the chapter concentrated on image analysis, introducing
some basic concepts and algorithms. In particular, the important problem of
code optimization was discussed, presenting some optimization techniques car-
ried out by compilers and showing how to “help” compilers produce more
optimized code. Finally, an example of code parallelization has been presented,
to introduce the basic concepts of thread activation and synchronization.
We are ready to enter the more specific topics of the book. As explained
in the introduction, embedded systems represent a field of application with
many aspects, only a few of which can be treated in depth in a reasonably sized
text. Nevertheless, the general concepts we met so far will hopefully help us
in gaining some understanding of the facets not “officially” covered by this
book.
3
Real-Time Concurrent Programming
Principles
CONTENTS
3.1 The Role of Parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.2 Definition of Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.3 Process State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.4 Process Life Cycle and Process State Diagram . . . . . . . . . . . . . . . . . . . . . . . 70
3.5 Multithreading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Even more importantly, the ability of carrying out multiple activities “at
once” helps in fulfilling any timing constraint that may be present in the sys-
tem in an efficient way. This aspect is often of concern whenever the computer
interacts with the outside world. For example, network interfaces usually have
a limited amount of space to buffer incoming data. If the system as a whole
is unable to remove them from the buffer and process them within a short
amount of time—on the order of a few milliseconds for a high-speed net-
work coupled with a low-end interface—the buffer will eventually overflow,
and some data will be lost or will have to be retransmitted. In more extreme
cases, an excessive delay will also trigger higher-level errors, such as network
communication timeouts.
The same end result can be obtained when a single processor or core is
available, or when the number of parallel activities exceeds the number of avail-
able processors or cores, by means of software techniques implemented at the
operating system level, known as multiprogramming, that repeatedly switch
the processor back and forth from one activity to another. If properly im-
plemented, this context switch is completely transparent to, and independent
from, the activities themselves, and they are usually unaware of its details.
The term pseudo parallelism is often used in this case, to contrast it with the
real hardware-supported parallelism discussed before, because technically the
computer is still executing exactly one activity at any given instant of time.
The notion of sequential process (or process for short) was born, mainly
in the operating system community, to help programmers express parallel
activities in a precise way and keep them under control. It provides both an
abstraction and a conceptual model of a running program.
FIGURE 3.1
Multiprogramming: abstract model of three sequential processes (left) and
their execution on a single-processor system (right).
The right part of the figure shows how a single processor, switching from one process to another, can
execute them. The solid lines represent the execution of a certain process,
whereas the dashed lines represent context switches. The multiprogramming
mechanism ensures, in the long run, that all processes will make progress even
if, as shown in the time line of processor activity at the bottom of
the figure, the processor indeed executes only one process at a time.
Comparing the left and right sides of Figure 3.1 explains why the adoption
of the process model simplifies the design and implementation of a concurrent
system: By using this model, the system design is carried out at the process
level, a clean and easy to understand abstraction, without worrying about the
low-level mechanisms behind its implementation. In principle, it is not even
necessary to know whether the system’s hardware is really able to execute
more than one process at a time or not, or the degree of such a parallelism,
as long as the execution platform actually provides multiprogramming.
The responsibility of choosing which processes will be executed at any
given time by the available processors, and for how long, falls on the operat-
ing system and, in particular, on an operating system component known as
scheduler. Of course, if a set of processes must cooperate to solve a certain
problem, not all possible choices will produce meaningful results. For example,
if a certain process P makes use of some values computed by another process
Q, executing P before Q is probably not a good idea.
Therefore, the main goal of concurrent programming is to define a set of
interprocess communication and synchronization primitives. When used ap-
propriately, these primitives ensure that the results of the concurrent program
will be correct by introducing and enforcing appropriate constraints on the
scheduler decisions. They will be discussed in Chapters 5 and 6.
Another aspect of paramount importance for real-time systems—that is,
systems in which there are timing constraints on system activities—is that,
even if the correct application of concurrent programming techniques guaran-
tees that the results of the concurrent program will be correct, the scheduling
decisions made by the operating system may still affect the behavior of the
system in undesirable ways, concerning timing.
This is due to the fact that, even when all constraints set forth by the
interprocess communication and synchronization primitives are met, there are
still many acceptable scheduling sequences, or process interleavings. Choosing
one or another does not affect the overall result of the computation, but may
change the timing of the processes involved.
As an example, Figure 3.2 shows three different interleavings of processes
P , Q, and R. All of them are ready for execution at t = 0, and their execution
requires 10, 30, and 20 ms of processor time, respectively. Since Q produces
some data used by P , P cannot be executed before Q. For simplicity, it is also
assumed that processes are always run to completion once started and that
there is a single processor in the system.
Interleaving (a) is unsuitable from the concurrent programming point of
view because it does not satisfy the precedence constraint between P and Q
stated in the requirements, and will lead P to produce incorrect results. On
FIGURE 3.2
Unsuitable process interleavings may produce incorrect results. Process inter-
leaving, even when it is correct, also affects system timing. All processes are
ready for execution at t = 0.
the other hand, interleavings (b) and (c) are both correct in this respect—the
precedence constraint is met in both cases—but they are indeed very different
from the system timing point of view. As shown in the figure, the completion
time of P and R will be very different. If we are dealing with a real-time
system and, for example, process P must conclude within 50 ms, interleaving
(b) will satisfy this requirement, but interleaving (c) will not.
In order to address this issue, real-time systems use specially devised
scheduling algorithms, to be discussed in Chapters 11 and 12. Those algo-
rithms, complemented by appropriate analysis techniques, guarantee that a
concurrent program will not only produce correct results but it will also satisfy
its timing constraints for all permitted interleavings. This will be the main
subject of Chapters 13 through 16.
FIGURE 3.3
Graphical representation of the process state components.
However, this is still not enough because the same program code, with
the same processor state, can still give rise to distinct execution activities
depending on the memory state. The same instruction, for example, a division,
can in fact correspond to very different activities, depending on the contents of
the memory word that holds the divisor. If the divisor is not zero, the division
will be carried out normally; if it is zero, most processors will instead take a
trap.
The last elements the process state must be concerned with are the operat-
ing system resources currently assigned to the process itself. They undoubtedly
have an influence on program execution—that is, in the final analysis, on the
process—because, for example, the length and contents of an input file may
affect the behavior of the program that is reading it.
It should be noted that none of the process state components discussed
so far have anything to do with time. As a consequence, by design, a context
switch operation will be transparent with respect to the results computed
by the process, but may not be transparent for what concerns its timeliness.
This is another way to justify why different scheduling decisions—that is,
performing a context switch at a certain instant instead of another—will not
affect process results, but may lead to either an acceptable or an unacceptable
timing behavior. It also explains why other techniques are needed to deal with,
and satisfy, timing constraints in real-time systems.
The fact that program code is one of the process components, but not
the only one, also implies that there are some decisive differences be-
tween programs and processes, and that those two terms must not be used
interchangeably. Similarly, processes and processors are indeed not synonyms.
In particular:
FIGURE 3.4
An example of process state diagram: the states are 1. Created, 2. Ready, 3. Running,
4. Blocked, and 5. Terminated; the transitions are a. creation, b. admission, c. scheduling,
d. yield or preemption, e. passive wait, f. end of wait, g. termination, and h. destruction.
All processes, except the very first one (called init in Unix systems), have exactly
one parent and zero or more children. This relationship can be conveniently
represented by arranging all processes into a tree, in which each node corresponds
to a process and each arc links a parent to one of its children.
Some operating systems keep track of and make use of this relationship in order
to define the scope of some service requests related to the processes themselves.
For example, in most Unix and Unix-like systems only the parent of a process
can wait for its termination and get its final termination status. Moreover,
the parent–child relation also controls resource inheritance, for example, open
files, upon process creation.
During their life, processes can be in one of several different states. They go
from one state to another depending on their own behavior, operating system
decision, or external events. At any instant, the operating system has the
responsibility of keeping track of the current state of all processes under its
control.
A useful and common way to describe in a formal way all the possible
process states and the transition rules is to define a directed graph, called
Process State Diagram (PSD), in which nodes represent states and arcs rep-
resent transitions.
A somewhat simplified process state diagram is shown in Figure 3.4. It
should be remarked that real-world operating systems tend to have more
states and transitions, but in most cases they are related to internal details of
that specific operating systems and are therefore not important for a general
discussion.
Looking at the diagram, at any instant, a process can be in one of the
following states:
This includes the current position of the process within the Process State
Diagram and other process attributes that drive scheduling decisions and de-
pend on the scheduling algorithm being used. A relatively simple scheduling
algorithm may only support, for example, a numeric attribute that represents
the relative process priority, whereas more sophisticated scheduling techniques
may require more attributes.
3.5 Multithreading
According to the definition given in Section 3.3, each process can be regarded
as the execution of a sequential program on “its own” processor. That is, the
process state holds enough state information to fully characterize its address
space, the state of the resources associated with it, and one single flow of
control, the latter being represented by the processor state.
In many applications, there are several distinct activities that are nonethe-
less related to each other, for example, because they have a common goal. For
example, in an interactive media player, it is usually necessary to take care
of the user interface while decoding and playing an audio stream, possibly
retrieved from the Internet. Other background activities may be needed as
well, such as retrieving the album artwork and other information from a re-
mote database.
It may therefore be useful to manage all these activities as a group and
share system resources, such as files, devices, and network connections, among
them. This can be done conveniently by envisaging multiple flows of control, or
threads, within a single process. As an added bonus, all of them will implicitly
refer to the same address space and thus share memory. This is a useful feature
because many interprocess communication mechanisms, for instance, those
discussed in Chapter 5, are indeed based on shared variables.
Accordingly, many modern operating systems support multithreading, that
is, they support multiple threads within the same process by splitting the pro-
cess state into per-process and per-thread components as shown in Figure 3.5.
In particular,
• The program code is the same for all threads, so that all of them execute
from the same code base.
• Each thread has its own processor state and procedure call stack, in order
to make the flows of control independent from each other.
• All threads evolve autonomously for what concerns execution, and hence,
each of them has its own position in the PSD and its own scheduling at-
tributes.
• All threads reside in the same address space and implicitly share memory.
FIGURE 3.5
Graphical representation of the process state components in a multithreading
system.
• All resources pertaining to the process are shared among its threads.
real-time operating systems, too. In all these situations, the only way to still
support multiprogramming despite the hardware limitations is through mul-
tithreading.
For example, the ARM Cortex-M3 [7] port of the FreeRTOS operating
system [13], to be discussed in Chapter 17, can make use of the MPU if it
is available. If it is not, the operating system still supports multiple threads,
which share the same address space and can freely read and write each other’s
data.
3.6 Summary
In this chapter, the concept of process has been introduced. A process is an
abstraction of an executing program and encompasses not only the program
itself, which is a static entity, but also the state information that fully char-
acterizes execution.
The notion of process as well as the distinction between programs and
processes become more and more important when going from sequential to
concurrent programming because it is essential to describe, in a sound and
formal way, all the activities going on in parallel within a concurrent system.
This is especially important for real-time applications since the vast majority
of them are indeed concurrent.
The second main concept presented in this chapter is the PSD. Its main
purpose is to define and represent the different states a process may be in dur-
ing its lifetime. Moreover, it also formalizes the rules that govern the transition
of a process from one state to another.
As it will be better explained in the next chapters, the correct definition
of process states and transitions plays a central role in understanding how
processes are scheduled for execution, when they outnumber the processors
available in the system, how they exchange information among themselves,
and how they interact with the outside world in a meaningful way.
Last, the idea of having more than one execution flow within the same
process, called multithreading, has been discussed. Besides being popular in
modern, general-purpose systems, multithreading is of interest for real-time
systems, too. This is because hardware limitations may sometimes prevent
real-time operating systems from supporting multiple processes in an effective
way. In that case, typical of small embedded systems, multithreading is the
only option left to support concurrency anyway.
4
Deadlock
CONTENTS
4.1 A Simple Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.2 Formal Definition of Deadlock . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.3 Reasoning about Deadlock: The Resource Allocation Graph . . . . . . . . . 84
4.4 Living with Deadlock . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.5 Deadlock Prevention . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.6 Deadlock Avoidance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.7 Deadlock Detection and Recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
4.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Two processes simultaneously using the same disk block for storage, for instance, must be avoided because this
would lead to incorrect results and loss of data.
To deal with this problem, most operating systems compel the processes
under their control to request resources before using them and wait if those
resources are currently assigned to another process, so that they are not im-
mediately available for use. Processes must also release their resources when
they no longer need them, in order to make them available to others. In this
way, the operating system acts as an arbiter for what concerns resource allo-
cation and can ensure that processes will have exclusive access to them when
required.
Unless otherwise specified, in this chapter we will only be concerned with
reusable resources, a term taken from historical IBM literature [33]. A reusable
resource is a resource that, once a process has finished with it, is returned to
the system and can be used by the same or another process again and again.
In other words, the value of the resource or its functionality do not degrade
with use. This is in contrast with the concept of consumable resource, for
example, a message stored in a FIFO queue, that is created at a certain point
and ceases to exist as soon as it is assigned to a process.
In most cases, processes need more than one resource during their lifetime
in order to complete their job, and request them in succession. A process A
wanting to print a file may first request a memory buffer in order to read the
file contents into it and have a workspace to convert them into the printer-
specific page description language. Then, it may request exclusive use of the
printer and send the converted data to it. We leave out, for clarity, the possibly
complex set of operations A must perform to get access to the file.
If the required amount of memory is not immediately available, it is reason-
able for the process to wait until it is, instead of failing immediately because
it is likely that some other process will release part of its memory in the im-
mediate future. Likewise, the printer may be assigned to another process at
the time of the request and, also in this case, it is reasonable to wait until it
is released.
Sadly, this very common situation can easily lead to an anomalous condi-
tion, known as deadlock, in which a whole set of processes is blocked forever
and will no longer make any progress. Not surprisingly, this problem has re-
ceived a considerable amount of attention in computer science; in fact, one of
the first formal definitions of deadlock was given in 1965 by Dijkstra [23],
who called it “deadly embrace.”
To illustrate how a deadlock may occur in our running example, let us
consider a second process B that runs concurrently with A. It has the same
goal as process A, that is, to print a file, but it has been coded in a different
way. In particular, process B requests the printer first, and then it tries to
get the memory buffer it needs. The nature of this difference is not at all
important (it may be due, for example, to the fact that A and B have been
written by two different programmers unaware of each other’s work), but it
may give rise to the following sequence of events, depicted in Figure 4.1:
FIGURE 4.1
A simple example of deadlock involving two processes and two resources.
1. Process B requests the printer, P . Since the printer has not been
assigned to any process yet, the request is granted immediately and
B continues.
2. Process A requests a certain amount MA of memory. If we assume
that the amount of free memory at the time of the request is greater
than MA , this request is granted immediately, too, and A proceeds.
3. Now, it is the turn of process B to request a certain amount of
memory MB . If the request is sensible, but there is not enough
free memory in the system at the moment, the request is not de-
clined immediately. Instead, B is blocked until a sufficient amount
of memory becomes available.
This may happen, for example, when the total amount of memory
M in the system is greater than both MA and MB , but less than their
sum, MA + MB .
4. When process A requests the printer P , it finds that it has been as-
signed to B and that it has not been released yet. As a consequence,
A is blocked, too.
At this point, both A and B will stay blocked forever because they own a
resource, and are waiting for another resource that will never become available
since it has been assigned to the other process involved in the deadlock.
Even in this very simple example, it is evident that a deadlock is a complex
phenomenon with a few noteworthy characteristics:
• Its occurrence depends on the precise timing relationships among the
actions of the processes involved, so that the same set of processes may
or may not deadlock from one run to the next.
Unfortunately, this means that the code will be hard to debug, and even the
insertion of a debugger or code instrumentation to better understand what
is happening may perturb system timings enough to make the deadlock
disappear. This is a compelling reason to address deadlock problems from
a theoretical perspective, during system design, rather than while testing
or debugging it.
• The processes involved in the deadlock will no longer make any progress in
their execution, that is, they will wait forever.
Havender [33] and Coffman et al. [20] were able to formulate four conditions
that are individually necessary and collectively sufficient for a deadlock to
occur. These conditions are useful, first of all because they define deadlock in a
way that abstracts away as much as possible from any irrelevant characteristics
of the processes and resources involved.
Second, they can and have been used as the basis for a whole family of
deadlock prevention algorithms because, if an appropriate policy is able to
prevent (at least) one of them from ever being fulfilled in the system, then no
deadlock can possibly occur by definition. The four conditions are
1. Mutual exclusion: Each resource can be assigned to, and used by, at
most one process at a time. As a consequence, a resource can only
be either free or assigned to one particular process. If any process
requests a resource currently assigned to another process, it must
wait.
2. Hold and Wait : For a deadlock to occur, the processes involved in
the deadlock must have successfully obtained at least one resource
in the past and have not released it yet, so that they hold those
resources and then wait for additional resources.
3. Nonpreemption: Any resource involved in a deadlock cannot be
taken away from the process it has been assigned to without its
consent, that is, unless the process voluntarily releases it.
4. Circular wait : The processes and resources involved in a deadlock
can be arranged in a circular chain, so that the first process waits
for a resource assigned to the second one, the second process waits
for a resource assigned to the third one, and so on up to the last
process, which is waiting for a resource assigned to the first one.
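For instance—anticipating the deadlock prevention techniques these conditions suggest—the circular wait condition can be broken by imposing a fixed, global order in which resources must be acquired. The following sketch illustrates the idea with two POSIX mutexes standing in for the memory buffer and the printer of the running example; the use of POSIX threads and all the names are assumptions made only for this illustration.

    #include <pthread.h>

    static pthread_mutex_t buffer_lock  = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t printer_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Every process acquires the buffer first and the printer second.
       Since all processes follow the same global order, no circular
       chain of waits can ever form, and condition 4 cannot hold. */
    void print_file(void)
    {
        pthread_mutex_lock(&buffer_lock);
        pthread_mutex_lock(&printer_lock);
        /* ... convert the file and send it to the printer ... */
        pthread_mutex_unlock(&printer_lock);
        pthread_mutex_unlock(&buffer_lock);
    }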
FIGURE 4.2
A simple resource allocation graph, indicating a deadlock.
Hence, the resource allocation graph shown in Figure 4.2 represents the fol-
lowing situation:
• Process P1 owns resources R2 and R3 , and is waiting for R1 .
• Process P2 owns resource R1 and is waiting for resource R4 .
• Process P3 owns resource R4 and is waiting for two resources to become
available: R2 and R3 .
• Process P4 owns resource R5 and is not waiting for any other resource.
It should also be noted that arcs connecting either two processes, or two
resources, have got no meaning and are therefore not allowed in a resource al-
location graph. More formally, the resource allocation graph must be bipartite
with respect to process and resource nodes.
The same kind of data structure can also be used in an operating system
to keep track of the evolving resource request and allocation state. In this
case,
• When a process P requests a certain resource R, the corresponding “request
arc,” going from P to R, is added to the resource allocation graph.
• As soon as the request is granted, the request arc is replaced by an “own-
ership arc,” going from R to P . This may either take place immediately or
after a wait. The latter happens, for example, if R is busy at the moment.
Deadlock avoidance algorithms, discussed in Section 4.6, may compel a
process to wait, even if the resource it is requesting is free.
• When a process P releases a resource R it has previously acquired, the
ownership arc going from R to P is deleted. This arc must necessarily be
present in the graph, because it must have been created when R has been
granted to P .
For this kind of resource allocation graph, it has been proved that the presence
of a cycle in the graph is a necessary and sufficient condition for a deadlock.
It can therefore be used as a tool to check whether a certain sequence of
resource requests, allocations, and releases leads to a deadlock. It is enough
to keep track of them, by managing the arcs of the resource allocation graph
as described earlier, and check whether or not there is a cycle in the graph
after each step.
If a cycle is found, then there is a deadlock in the system, and the
deadlock involves precisely the set of processes and resources belonging to
the cycle. Otherwise the sequence is “safe” from this point of view. The
resource allocation graph shown in Figure 4.2 models a deadlock because
P1 → R1 → P2 → R4 → P3 → R2 → P1 is a cycle. Processes P1 , P2 , and
P3 , as well as resources R1 , R2 , and R4 are involved in the deadlock. Likewise,
P1 → R1 → P2 → R4 → P3 → R3 → P1 is a cycle, too, involving the same
processes as before and resources R1 , R3 , and R4 .
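As an illustration of how an operating system might carry out this check, the following sketch performs a depth-first search for a cycle on a resource allocation graph stored as an adjacency matrix over all nodes, processes and resources alike. The representation and the node count are assumptions made for the example, not something prescribed by the text.

    #define NODES 16              /* total number of process and resource nodes (assumed) */

    int adj[NODES][NODES];        /* adj[a][b] = 1 if there is an arc from node a to node b */

    /* Depth-first visit: returns 1 if a cycle is reachable from node v.
       color[v] is 0 (unvisited), 1 (on the current path), or 2 (done). */
    static int visit(int v, int color[])
    {
        int w;
        color[v] = 1;
        for(w = 0; w < NODES; w++) {
            if(!adj[v][w]) continue;
            if(color[w] == 1) return 1;                    /* back arc: cycle found */
            if(color[w] == 0 && visit(w, color)) return 1;
        }
        color[v] = 2;
        return 0;
    }

    /* Returns 1 if the resource allocation graph contains a cycle,
       that is, if the current request/allocation state is deadlocked. */
    int graph_has_cycle(void)
    {
        int color[NODES] = {0};
        int v;
        for(v = 0; v < NODES; v++)
            if(color[v] == 0 && visit(v, color))
                return 1;
        return 0;
    }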
As for any directed graph, also in this case arc orientation must be taken
into account when assessing the presence of a cycle. Hence, referring again to
Figure 4.2, P1 ← R2 ← P3 → R3 → P1 is not a cycle and does not imply the
presence of any deadlock in the system.
The deadlock problem becomes more complex when there are different
kinds (or classes) of resources in a system and there is, in general, more than
one resource of each kind. All resources of the same kind are fungible, that is,
they are interchangeable so that any of them can be used to satisfy a resource
request for the class they belong to.
This is a common situation in many cases of practical interest: if we do
not consider data access time optimization, disk blocks are fungible resources
because, when any process requests a disk block to store some data, any
free block will do. Other examples of fungible resources include memory page
frames and entries in most operating system tables.
The definition of resource allocation graph can be extended to handle
multiple resource instances belonging to the same class, by using one rectangle
for each resource class, and representing each instance by means of a dot
drawn in the corresponding rectangle. In Reference [39], this is called a general
resource graph.
However, in this case, the theorem that relates cycles to deadlocks becomes
weaker. It can be proved [39] that the presence of a cycle is still a necessary
condition for a deadlock to take place, but it is no longer sufficient. The
theorem can hence be used only to deny the presence of a deadlock, that is, to
state that if there is not any cycle in an extended resource allocation graph,
then there is not any deadlock in the system.
Most other operating systems derived from Unix, for example, Linux, suffer
from the same problems.
On the contrary, in many cases, real-time applications cannot tolerate any
latent deadlock, regardless of its probability of occurrence, for instance, due
to safety concerns. Once it has been decided to actually “do something” about
deadlock, the algorithms being used can be divided into three main families:
deadlock prevention, deadlock avoidance, and deadlock detection and recovery,
discussed in Sections 4.5 through 4.7. Besides when they act, these algorithms
also differ in other important respects:
• How much influence they have in the way designers and programmers de-
velop the application.
• How much and what kind of information about process behavior they
need, in order to work correctly.
N = X − C (4.10)
and has the same shape as C and X. Since C changes with time, N also
does.
ri = ti − Σj=1..n Cij ,    ∀i = 1, . . . , m . (4.12)
Finally, a resource request coming from the j-th process will be denoted by
the vector qj :
qj = (q1j , . . . , qmj )T , (4.13)
where the i-th element of the vector, qij , indicates how many resources of the
i-th class the j-th process is requesting. Of course, if the process does not want
to request any resource of a certain class, it is free to set the corresponding
qij to 0.
Whenever it receives a new request qj from the j-th process, the banker
executes the following algorithm:
1. It checks whether the request is legitimate, that is, whether qj does not
exceed the worst-case needs the j-th process declared and has not used yet
(the j-th column of N). If it does, the process made an error and the request
is rejected.
2. It checks whether the request could be satisfied with the resources currently
available, that is, whether qj ≤ r element by element. If this is not the case,
the requesting process must wait.
3. It simulates the allocation, by provisionally updating C, N, and r as if the
request had been granted, and then assesses whether the resulting resource
allocation state is still safe. If it is, the allocation is confirmed; otherwise,
the provisional updates are undone and the requesting process must wait.
To assess the safety of a resource allocation state, during step 3 of the preced-
ing algorithm, the banker uses a conservative approach. It tries to compute at
least one sequence of processes—called a safe sequence—comprising all the n
processes in the system and that, when followed, allows each process in turn to
attain the worst-case resource need it declared, and thus successfully conclude
its work. The safety assessment algorithm uses two auxiliary data structures:
• A (column) vector w that is initially set to the currently available resources
(i.e., w = r initially) and tracks the evolution of the available resources as
the safe sequence is being constructed.
• A (row) vector f , of n elements. The j-th element of the vector, fj , cor-
responds to the j-th process: fj = 0 if the j-th process has not yet been
inserted into the safe sequence, fj = 1 otherwise. The initial value of f is
zero, because the safe sequence is initially empty.
The algorithm can be described as follows:
1. Try to find a new, suitable candidate to be appended to the safe
sequence being constructed. In order to be a suitable candidate, a
certain process Pj must not already be part of the sequence and it
must be able to reach its worst-case resource need, given the current
resource availability state. In formulas, it must be fj = 0 and nj ≤ w, where
nj denotes the j-th column of N and the comparison is made element by
element.
2. If such a candidate Pj is found, it is appended to the safe sequence: fj is
set to 1 and the resources currently allocated to Pj are added back to w,
because Pj is assumed to eventually complete its work and release all of
them. Then, the algorithm goes back to step 1 to look for the next
candidate.
3. When no suitable candidate can be found, the construction stops: the
state is safe if, and only if, all processes belong to the safe sequence, that
is, all elements of f are equal to 1.
Even if a state is unsafe, all processes could still be able to conclude their
work without deadlock if, for example, they never actually request the maxi-
mum number of resources they declared.
It should also be remarked that the preceding algorithm does not need
to backtrack when it picks up a sequence that does not ensure the successful
termination of all processes. A theorem proved in Reference [32] guarantees
that, in this case, no safe sequences exist at all. As a side effect, this property
greatly reduces the computational complexity of the algorithm.
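A compact sketch of the safety assessment just described is shown below. The fixed array sizes, the function name, and the representation of N, C, and r as plain integer arrays are assumptions made for the illustration; they are not prescribed by the text.

    #define M_CLASSES 3                /* number of resource classes (assumed) */
    #define N_PROCS   4                /* number of processes (assumed) */

    /* need[i][j] holds the j-th column of N for class i (residual worst-case need),
       alloc[i][j] the j-th column of C (current allocation), and avail[i] is r.
       Returns 1 if the state is safe, 0 otherwise. */
    int state_is_safe(int need[M_CLASSES][N_PROCS],
                      int alloc[M_CLASSES][N_PROCS], int avail[M_CLASSES])
    {
        int w[M_CLASSES], f[N_PROCS];
        int i, j, placed = 0, progress = 1;

        for(i = 0; i < M_CLASSES; i++) w[i] = avail[i];   /* w = r */
        for(j = 0; j < N_PROCS; j++) f[j] = 0;            /* safe sequence empty */

        while(progress) {
            progress = 0;
            for(j = 0; j < N_PROCS; j++) {
                if(f[j]) continue;                        /* already in the sequence */
                for(i = 0; i < M_CLASSES; i++)            /* can Pj reach its worst case? */
                    if(need[i][j] > w[i]) break;
                if(i == M_CLASSES) {                      /* yes: append it */
                    for(i = 0; i < M_CLASSES; i++)
                        w[i] += alloc[i][j];              /* it will eventually release C_j */
                    f[j] = 1;
                    placed++;
                    progress = 1;
                }
            }
        }
        return placed == N_PROCS;                         /* safe iff every process fits */
    }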
Going back to the overall banker’s algorithm, we still have to discuss the
fate of the processes which had their requests postponed and were forced to
wait. This can happen for two distinct reasons: either the resources they
requested were simply not available at the moment, or granting the request
would have brought the system into an unsafe state.
In the safety assessment algorithm, we build the safe sequence one step at
a time. In order to do this, we must inspect at most n candidate processes in
the first step, then n − 1 in the second step, and so on. When the algorithm
is able to build a safe sequence of length n, the worst case for what concerns
complexity, the total number of inspections is therefore
n + (n − 1) + . . . + 1 = n(n + 1)/2 . (4.21)
The insertion of each candidate into the safe sequence (4.19), an operation
performed at most n times, does not make the complexity any larger because
the complexity of each insertion is O(m), giving a complexity of O(mn) for
them all.
On the other hand, as for all other processes, the additional column of X
must hold the maximum number of resources the new process will need during
its lifetime for each resource class, represented by xn+1 . The initial value of
the column being added to N must be xn+1 , according to how this matrix
has been defined in Equation (4.10).
The outstanding resource requests of the processes are represented by vectors
sj , one for each process, defined in the same way as qj in (4.13); if a process
is not requesting any additional resources at a given time, the elements of its
sj vector will all be zero at that time.
As for the graph-based method, all these data structures evolve with time
and must be updated whenever a process requests, receives, and relinquishes
resources. However, again, all of them can be maintained in constant time.
Deadlock detection is then based on the following algorithm:
1. Start with the auxiliary (column) vector w set to the currently
available resources (i.e., w = r initially) and the (row) vector f set
to zero. Vector w has one element for each resource class, whereas
f has one element for each process.
2. Try to find a process Pj that has not already been marked and
whose resource request can be satisfied with the currently available
resources, that is, sj ≤ w, where the comparison is made element
by element.
3. If such a process Pj is found, it is marked by setting fj = 1, and
the resources currently allocated to it are added back to w, because
Pj is assumed to be able to proceed, eventually complete its work,
and release them. Then, the algorithm goes back to step 2 to look
for additional processes.
It can be proved that a deadlock exists if, and only if, there are unmarked
processes—in other words, at least one element of f is still zero—at the end
of the algorithm. Rather obviously, this algorithm bears a strong resemblance
to the state safety assessment part of the banker’s algorithm. Unsurprisingly,
they also have the same computational complexity.
From the conceptual point of view, the main difference is that the latter
algorithm works on the actual resource requests performed by processes as
they execute (and represented by the vectors sj ), whereas the banker’s algo-
rithm is based on the worst-case resource needs forecast (or guessed) by each
process (represented by xj ).
As a consequence, the banker’s algorithm results are conservative, and a
state can pessimistically be marked as unsafe, even if a deadlock will not nec-
essarily ensue. On the contrary, the last algorithm provides exact indications.
It can be argued that, in general, since deadlock detection algorithms have
a computational complexity comparable to the banker’s algorithm, there is
apparently nothing to be gained from them, at least from this point of view.
However, the crucial difference is that the banker’s algorithm must necessar-
ily be invoked on every resource request and release, whereas the frequency
of deadlock detection can be chosen freely—for instance, it can be run
periodically, or only when there is some hint that the system is no longer
making progress—thus trading off detection latency for run-time overhead.
4.8 Summary
Starting with a very simple example involving only two processes, this chapter
introduced the concept of deadlock, an issue that may arise whenever processes
compete with each other to acquire and use some resources. Deadlock is espe-
cially threatening in a real-time system because its occurrence blocks one or
more processes forever and therefore jeopardizes their timing.
Fortunately, it is possible to define formally what a deadlock is, and when it
takes place, by introducing four conditions that are individually necessary and
collectively sufficient for a deadlock to occur. Starting from these conditions,
it is possible to define a whole family of deadlock prevention algorithms. Their
underlying idea is to ensure, by design, that at least one of the four conditions
cannot be satisfied in the system being considered so that a deadlock cannot
occur. This is done by imposing various rules and constraints to be followed
during system design and implementation.
When design-time constraints are unacceptable, other algorithms can be
used as well, at the expense of a certain amount of run-time overhead. They
operate during system execution, rather than design, and are able to avoid
deadlock by checking all resource allocation requests. They make sure that the
system never enters a risky state for what concerns deadlock by postponing
some requests on purpose, even if the requested resources are free.
To reduce the overhead, it is also possible to deal with deadlock even later
by using a deadlock detection and recovery algorithm. Algorithms of this kind
let the system enter a deadlock state but are able to detect deadlock and
recover from it by aborting some processes or denying resource requests and
grants forcibly.
For the sake of completeness, it should also be noted that deadlock is only
one aspect of a more general group of phenomena, known as indefinite wait,
indefinite postponement, or starvation. A full treatment of indefinite wait is very
complex and well beyond the scope of this book, but an example taken from
the material presented in this chapter may still be useful to grasp the full
extent of this issue. A good starting point for readers interested in a more
thorough discussion is, for instance, the work of Owicki and Lamport [70].
In the banker’s algorithm discussed in Section 4.6, when more than one
request can be granted safely but not all of them, a crucial point is how to
pick the “right” request, so that no process is forced to wait indefinitely in
favor of others.
Even if, strictly speaking, there is not any deadlock under these circum-
stances, if the choice is not right, there may still be some processes that are
blocked for an indefinite amount of time because their resource requests are
always postponed. Similarly, Reference [38] pointed out that, even granting
safe requests as soon as they arrive, without reconsidering postponed requests
first, may lead to other forms of indefinite wait.
5
Interprocess Communication Based on
Shared Variables
CONTENTS
5.1 Race Conditions and Critical Regions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5.2 Hardware-Assisted Lock Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.3 Software-Based Mutual Exclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
5.4 From Active to Passive Wait . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
5.5 Semaphores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
5.6 Monitors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
5.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
More often than not, in both general-purpose and real-time systems, processes
do not live by themselves and are not independent of each other. Rather,
several processes are brought together to form the application software, and
they must therefore cooperate to solve the problem at hand.
Processes must therefore be able to communicate, that is, exchange infor-
mation, in a meaningful way. As discussed in Chapter 2, it is quite possible
to share some memory among processes in a controlled way by making part
of their address space refer to the same physical memory region.
However, this is only part of the story. In order to implement a correct
and meaningful data exchange, processes must also synchronize their actions
in some ways. For instance, they must not try to use a certain data item
if it has not been set up properly yet. Another purpose of synchronization,
presented in Chapter 4, is to regulate process access to shared resources.
This chapter addresses the topic, explaining how shared variables can
be used for communication and introducing various kinds of hardware- and
software-based synchronization approaches.
the application and exchange data exactly in the same way: there is a set
of global variables defined in the program, all functions have access to them
(within the limits set forth by the scoping rules of the programming language),
and they can get and set their value as required by the specific algorithm they
implement.
A similar thing also happens at the function call level, in which the caller
prepares the function arguments and stores them in a well-known area of mem-
ory, often allocated on the stack. The called function then reads its arguments
from there and uses them as needed. The value returned by the function is
handled in a similar way. When possible, for instance, when the function argu-
ments and return value are small enough, the whole process may be optimized
by the language compiler to use some processor registers instead of memory,
but the general idea is still the same.
Unfortunately, when trying to apply this idea to a concurrent system, one
immediately runs into several, deep problems, even in trivial cases. If we want,
for instance, to count how many events of a certain kind happened in a sequen-
tial programming framework, it is quite intuitive to define a memory-resident,
global variable (the definition will be somewhat like int k if we use the C
programming language) and then a very simple function void inck(void)
that only contains the statement k = k+1.
It should be pointed out that, as depicted in Figure 5.1, no real-world
CPU is actually able to increment k in a single, indivisible step, at least
when the code is compiled into ordinary assembly instructions. Indeed, even a
strongly simplified computer based on the von Neumann architecture [31, 86]
will perform a sequence of three distinct steps:
1. Load the value of k from memory into an internal processor register; on a
simple processor, this register would be the accumulator. From the proces-
sor’s point of view, this is an external operation because it involves both the
processor itself and memory, two distinct units that communicate through
the memory bus. The load operation is not destructive, that is, k retains
its current value after it has been performed.
2. Increment the value loaded from memory by one. Unlike the previous one,
this operation is internal to the processor. It cannot be observed from
the outside, also because it does not require any memory bus cycle to
be performed. On a simple processor, the result is stored back into the
accumulator.
3. Store the new value of k into memory with an external operation involving
a memory bus transaction like the first one. It is important to notice that
the new value of k can be observed from outside the processor only at
this point, not before. In other words, if we look at memory, k retains its
original value until this final step has been completed.
Even if real-world architectures are actually much more sophisticated than
what is shown in Figure 5.1—and a much more intimate knowledge of their
int k;
void inck(void)
{
k = k+1;
}
FIGURE 5.1
Simplified representation of how the CPU increments a memory-resident
variable k.
1.1. CPU #1 loads the value of k from memory and stores it into one of its
registers, R1 . Since k currently contains 0, R1 will also contain 0.
FIGURE 5.2
Concurrently incrementing a shared variable in a careless way leads to a race
condition.
2.1. Now CPU #2 takes over, loads the value of k from memory, and stores
it into one of its registers, R2 . Since CPU #1 has not stored the updated
value of R1 back to memory yet, CPU #2 still gets the value 0 from k, and
R2 will also contain 0.
1.2, 2.2. Both CPUs increment the value held in their registers, so that R1
and R2 now both contain 1.
2.3. CPU #2 stores the contents of R2 back to memory, that is, it stores 1
into k.
1.3. CPU #1 does the same: it stores the contents of R1 back to memory,
that is, it stores 1 into k.
The final value of k is therefore 1 instead of 2: one of the two increments has
been lost. Whether or not this happens depends on how the steps performed
by one CPU are interleaved with the steps performed by the other. In turn,
this depends on the precise timing
relationship between the processors, down to the instruction execution level.
This is not only hard to determine but will likely change from one execution
to another, or if the same code is executed on a different machine.
Informally speaking, this kind of time-dependent error may be hard to find
and fix. Typically, they take place with very low probability and may therefore
be very difficult to reproduce and analyze. Moreover, they may occur when a
certain piece of machinery is working in the field and no longer occur during
bench testing because the small, but unavoidable, differences between actual
operation and testing slightly disturbed system timings.
Even the addition of software-based instrumentation or debugging code to
a concurrent application may subtly change process interleaving and make a
time-dependent error disappear. This is also the reason why software devel-
opment techniques based on concepts like “write a piece of code and check
whether it works or not; tweak it until it works,” which are anyway ques-
tionable even for sequential programming, easily turn into a nightmare when
concurrent programming is involved.
These observations also lead to the general definition of a pathological con-
dition, known as race condition, that may affect a concurrent system: whenever
a set of processes reads and/or writes some shared data to carry out a com-
putation, and the results of this computation depend on the exact way the
processes interleaved, there is a race condition.
In this statement, the term “shared data” must be construed in a broad
sense: in the simplest case, it refers to a shared variable residing in memory, as
in the previous examples, but the definition actually applies to any other kind
of shared object, such as files and devices. Since race conditions undermine
the correctness of any concurrent system, one of the main goals of concurrent
programming will be to eliminate them altogether.
Fortunately, the following consideration is of great help to better focus this
effort and concentrate only on a (hopefully small) part of the processes’ code.
The original concept is due to Hoare [36] and Brinch Hansen [15]:
1. A process spends part of its execution doing internal operations,
that is, executing pieces of code that do not require or make access
to any shared data. By definition, all these operations cannot lead
to any race condition, and the corresponding pieces of code can be
safely disregarded when the code is analyzed from the concurrent
programming point of view.
2. Sometimes a process executes a region of code that makes access to
shared data. Those regions of code must be looked at more carefully
because they can indeed lead to a race condition. For this reason,
they are called critical regions or critical sections.
With this definition in mind, and going back to the race condition depicted
in Figure 5.2, we notice that both processes have a critical region, and it is
the body of function inck(). In fact, that fragment of code increments the
shared variable k. Even if the critical region code is correct when executed by
one single process, the race condition stems from the fact that we allowed two
distinct processes to be in their critical region simultaneously.
We may therefore imagine solving the problem by allowing only one process
to be in a critical region at any given time, that is, forcing the mutual exclusion
between critical regions pertaining to the same set of shared data. For the
sake of simplicity, in this book, mutual exclusion will be discussed in rather
informal and intuitive terms. See, for example, the works of Lamport [57, 58]
for a more formal and general treatment of this topic.
In simple cases, mutual exclusion can be ensured by resorting to special
machine instructions that many contemporary processor architectures sup-
port. For example, on the Intel R
64 and IA-32 architecture [45], the INC
instruction increments a memory-resident integer variable by one.
When executed, the instruction loads the operand from memory, incre-
ments it internally to the CPU, and finally stores back the result; it is therefore
subject to exactly the same race condition depicted in Figure 5.2. However, it
can be accompanied by the LOCK prefix so that the whole sequence is executed
atomically, even in a multiprocessor or multicore environment.
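From C, the same effect is usually reached through the atomic operations the compiler exposes rather than by writing the instruction by hand. A minimal sketch using the C11 <stdatomic.h> interface—assuming a C11-capable toolchain, which is not something the text requires—is:

    #include <stdatomic.h>

    atomic_int k = 0;                 /* shared event counter */

    void inck(void)
    {
        /* a single, indivisible read-modify-write; on Intel 64/IA-32 the
           compiler typically emits a LOCK-prefixed instruction for it */
        atomic_fetch_add(&k, 1);
    }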
Unfortunately, these ad-hoc solutions, which coerce a single instruction to
be executed atomically, cannot readily be applied to more general cases, as
it will be shown in the following example. Figure 5.3 shows a classic way of
solving the so-called producers–consumers problem. In this problem, a group
of processes P1 , . . . , Pn , called producers, generate data items and make
them available to the consumers by means of the prod() function. On the
other hand, another group of processes C1 , . . . , Cm , the consumers, use the
cons() function to get hold of data items.
To keep the code as simple as possible, data items are assumed to be integer
values, held in int-typed variables. For the same reason, the error-handling
code (which should detect and handle any attempt of putting a data item into
a full buffer or, symmetrically, getting an item from an empty buffer) is not
shown.
With this approach, producers and consumers exchange data items
through a circular buffer with N elements, implemented as a shared, stati-
cally allocated array int buf[N]. A couple of shared indices, int in and int
out, keep track of the first free element of the buffer and the oldest full ele-
ment, respectively. Both of them start at zero and are incremented modulus
N to circularize the buffer. In particular,
• Assuming that the buffer is not completely full, the function prod() first
stores d, the data item provided by the calling producer, into buf[in], and
then increments in. In this way, in now points to the next free element of
the buffer.
• Assuming that the buffer is not completely empty, the function cons()
takes the oldest data item residing in the buffer from buf[out], stores it
into the local variable c, and then increments out so that the next consumer
Macros and shared variables:
    #define N 8
    int buf[N];
    int in = 0, out = 0;

    void prod(int d) {
        if((in+1) % N == out)
            ...the buffer is full...
        else {
            buf[in] = d;
            in = (in+1) % N;
        }
    }

    int cons(void) {
        int c;
        if(in == out)
            ...the buffer is empty...
        else {
            c = buf[out];
            out = (out+1) % N;
        }
        return c;
    }
FIGURE 5.3
Not all race conditions can be avoided by forcing a single instruction to be
executed atomically.
will get a fresh data item. Last, it returns the value of c to the calling
consumer.
It should be noted that, since the condition in == out would be true not only
for a buffer that is completely empty but also for a buffer containing N full
elements, the buffer is never filled completely in order to avoid this ambiguity.
In other words, we must consider the buffer to be full even if one free element—
often called the guard element—is still available. The corresponding predicate
is therefore (in+1) % N == out. As a side effect of this choice, of the N buffer
elements, only up to N − 1 can be filled with data.
Taking for granted the following two, quite realistic, hypotheses:
1. any integer variable can be loaded from, or stored to, memory with
a single, atomic operation;
2. neither the processor nor the memory access subsystem reorders
memory accesses;
it is easy to show that the code shown in Figure 5.3 works correctly for up to
one producer and one consumer running concurrently.
For processors that reorder memory accesses—as most modern, high-
performance processors do—the intended sequence can be enforced by using
dedicated machine instructions, often called fences or barriers. For example,
the SFENCE, LFENCE, and MFENCE instructions of the Intel® 64 and IA-32
architecture [45] provide different degrees of memory ordering.
The SFENCE instruction is a store barrier; it guarantees that all store op-
erations that come before it in the instruction stream have been committed
to memory and are visible to all other processors in the system before any
of the following store operations becomes visible as well. The LFENCE instruc-
tion does the same for memory load operations, and MFENCE does it for both
memory load and store operations.
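As a sketch of where such a barrier would go in the producer of Figure 5.3, the fragment below uses a C11 release fence as a portable stand-in for an instruction like SFENCE; it reuses the declarations of the earlier listing, only illustrates the intended ordering, and a fully portable version would also have to give the shared variables atomic types.

    #include <stdatomic.h>

    #define N 8
    extern int buf[N];
    extern int in, out;

    void prod(int d)
    {
        if((in+1) % N == out)
            return;                                    /* buffer full: error handling omitted */
        buf[in] = d;                                   /* store the data item first... */
        atomic_thread_fence(memory_order_release);     /* ...make the store visible... */
        in = (in+1) % N;                               /* ...and only then publish it */
    }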
On the contrary, and quite surprisingly, the code no longer works as it
should as soon as we add a second producer to the set of processes being
considered. One obvious issue is with the increment of in (modulus N ), but
this is the same issue already considered in Figure 5.2 and, as discussed before,
it can be addressed with some hardware assistance. However, there is another,
subtler issue besides this one.
Let us consider two producers, P1 and P2 : they both concurrently invoke
the function prod() to store an element into the shared buffer. For this ex-
ample, let us assume, as shown in Figure 5.3, that P1 and P2 want to put
the values 1 and 2 into the buffer, respectively, although the issue does not
depend on these values at all.
It is also assumed that the shared buffer has a total of 8 elements and
initially contains two data items, represented as black dots, whereas white
dots represent empty elements. Moreover, we assume that the initial values of
in and out are 4 and 2, respectively. This is the situation shown in the middle
of the figure. The following interleaving could take place:
1. Process P1 begins executing prod(1) first. Since in is 4, it stores 1
into buf[4].
2. Before P1 makes further progress, P2 starts executing prod(2). The
value of in is still 4, hence P2 stores 2 into buf[4] and overwrites
the data item just written there by P1 .
3. At this point, both P1 and P2 increment in. Assuming that the race
condition issue affecting these operations has been addressed, the
final value of in will be 6, as it should.
It can easily be seen that the final state of the shared variables after these op-
erations, shown in the lower part of Figure 5.3, is severely corrupted because:
• the data item written by P1 into buf[4] has been overwritten and lost, and
• the index in has nonetheless been incremented twice, so that buf[5] is now
considered full even though no process ever stored a data item into it, and
its content is therefore undefined.
From the consumer’s point of view, this means that, on the one hand, the
data item produced by P1 has been lost and the consumer will never get it.
On the other hand, the consumer will get and try to use a data item with an
undefined value, with dramatic consequences.
With the same reasoning, it is also possible to conclude that a very similar
issue also occurs if there is more than one consumer in the system. In this
case, multiple consumers could get the same data item, whereas other data
items are never retrieved from the buffer. From this example, two important
conclusions can be drawn: forcing a single instruction to be executed atomically
is of no help when the critical code comprises several statements and involves
more than one shared variable; what is needed, instead, is to grant each process
exclusive access to the shared object for the whole duration of its operation on
it. In the most general case, a process that wants to operate on a shared object
must therefore:
1. Acquire some sort of lock associated with the shared object and
wait if it is not immediately available.
2. Use the shared object.
3. Release the lock, so that other processes can acquire it and be able
to access the same object in the future.
In the above sequence, step 2 is performed by the code within the critical
region, whereas steps 1 and 3 are a duty of two fragments of code known as
the critical region entry and exit code. In this approach, these fragments of
code must compulsorily surround the critical region itself. If they are relatively
short, they can be incorporated directly by copying them immediately before
and after the critical region code, respectively.
If they are longer, it may be more convenient to execute them indirectly
by means of appropriate function calls in order to reduce the code size, with
the same effect. In the examples presented in this book, we will always follow
the latter approach. This also highlights the fact that the concept of critical
region is related to code execution, and not to the mere presence of some
code between the critical region entry and exit code. Hence, for example, if a
function call is performed between the critical region entry and exit code, the
body of the called function must be considered as part of the critical region
itself.
In any case, four conditions must be satisfied in order to have an acceptable
solution [85]:
1. It must really work, that is, it must prevent any two processes from
simultaneously executing code within critical regions pertaining to
the same shared object.
2. Any process that is busy doing internal operations, that is, is not
currently executing within a critical region, must not prevent other
processes from entering their critical regions, if they so decide.
FIGURE 5.4
Lock variables do not necessarily work unless they are handled correctly.
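The entry and exit code that Figure 5.4 refers to can be sketched as follows; the figure itself is not reproduced here, so the listing is a plausible reconstruction for illustration rather than a verbatim copy.

    int lock = 0;                 /* 0: no process within a critical region, 1: busy */

    void entry(void)
    {
        while(lock == 1);         /* busy wait while another process holds the lock */
        lock = 1;                 /* claim the lock -- but another process may sneak in here */
    }

    void exit(void)
    {
        lock = 0;                 /* release the lock */
    }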
1.1. P1 executes the entry code and checks the value of lock, finds that lock
is 0, and immediately escapes from the while loop.
2.1. Before P1 had the possibility of setting lock to 1, P2 executes the entry
code, too. Since the value of lock is still 0, it exits from the while loop as
well.
At this point, both P1 and P2 execute their critical code and violate the
mutual exclusion constraint. An attentive reader would have certainly noticed
that, using lock variables in this way, the mutual exclusion problem has merely
been shifted from one “place” to another. Previously the problem was how
to ensure mutual exclusion when accessing the set of shared variables, but
FIGURE 5.5
Hardware-assisted lock variables work correctly.
now the problem is how to ensure mutual exclusion when accessing the lock
variable itself.
Given the clear similarities between the scenarios depicted in Figures 5.2
and 5.4, it is not surprising that the problem has not been solved at all.
However, this is one of the cases in which hardware assistance is very effective.
In the simplest case, we can assume that the processor provides a test and set
instruction. This instruction has the address p of a memory word as argument
and atomically performs the following three steps:
1. it reads the current value of the memory word p points to;
2. it stores the value 1 into that word;
3. it returns the value read in step 1 to the caller.
As shown in Figure 5.5, this instruction can be used in the critical region
entry code to avoid the race condition discussed before because it forces the
test of lock to be atomic with respect to its update. The rest of the code stays
the same. For convenience, the test and set instruction has been denoted as
a C function int test and set(int *p), assuming that the int type indeed
represents a memory word.
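With this notation, the critical region entry and exit code sketched in Figure 5.5 boils down to something like the following; the identifier test_and_set stands for the int test and set(int *p) function just introduced.

    int lock = 0;                        /* shared lock variable, initially free */

    void entry(void)
    {
        /* atomically read the old value of lock and set it to 1; keep trying
           until the old value was 0, that is, until the lock was actually free */
        while(test_and_set(&lock) == 1);
    }

    void exit(void)
    {
        lock = 0;                        /* release the lock */
    }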
For what concerns the practical implementation of this technique, on the
Intel® 64 and IA-32 architecture [45], the BTS instruction tests and sets a
single bit in either a register or a memory location. It also accepts the LOCK
prefix so that the whole instruction is executed atomically.
Another, even simpler, instruction is XCHG, which exchanges the contents
of a register with the contents of a memory word. In this case, the bus-locking
protocol is activated automatically regardless of the presence of the LOCK
prefix. The result is the same as the test and set instruction if the value
of the register before executing the instruction is 1. Many other processor
architectures provide similar instructions.
It can be shown that, considering the correctness conditions stated at the
end of Section 5.1, the approach just described is correct with respect to
conditions 1 and 2 but does not fully satisfy conditions 3 and 4:
• In extreme cases, the execution of the while loop may be so slow that other
processes may succeed in taking turns entering and exiting their critical
regions, so that the “slow” process never finds lock at 0 and is never
allowed to enter its own critical region. This is in contrast with condition
3.
From the practical standpoint, this may or may not be a real issue depending
on the kind of hardware being used. For example, using this method for mutual
exclusion among cores in a multicore system, assuming that all cores execute
at the same speed (or with negligible speed differences), is quite safe.
FIGURE 5.6
Peterson’s software-based mutual exclusion for two processes.
It is also assumed that the memory access atomicity and ordering constraints
set forth in Section 5.1 are either satisfied or can be enforced.
Unlike the other methods discussed so far, the critical region entry and exit
functions take a parameter pid that uniquely identifies the invoking process
and will be either 0 (for P0 ) or 1 (for P1 ). The set of shared, access-control
variables becomes slightly more complicated, too. In particular,
• There is now one flag for each process, implemented as an array flag[2]
of two flags. Each flag will be 1 if the corresponding process wants to,
or succeeded in entering its critical section, and 0 otherwise. Since it is
assumed that processes neither want to enter, nor already are within their
critical region at the beginning of their execution, the initial value of both
flags is 0.
• The variable turn is used to enforce the two processes to take turns if
both want to enter their critical region concurrently. Its value is a process
identifier and can therefore be either 0 or 1. It is initially set to 0 to make
sure it has a legitimate value even if, as we will see, this initial value is
irrelevant to the algorithm.
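Putting these pieces together, the enter() and exit() functions can be sketched as follows; the local variable other, mentioned again later in the text, simply holds the identifier of the other process.

    int flag[2] = {0, 0};      /* flag[i] = 1: Pi wants to be, or is, in its critical region */
    int turn = 0;              /* identifier of the process that must yield */

    void enter(int pid)
    {
        int other = 1 - pid;   /* identifier of the other process */

        flag[pid] = 1;         /* declare the intention to enter */
        turn = pid;            /* record that this process set turn last */
        while(flag[other] == 1 && turn == pid);   /* busy wait */
    }

    void exit(int pid)
    {
        flag[pid] = 0;         /* no longer interested in the critical region */
    }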
A formal proof of the correctness of the algorithm is beyond the scope of this
book, but it is anyway useful to gain an informal understanding of how it
works and why it behaves correctly. As a side note, the same technique is also
useful in gaining a better understanding of how other concurrent programming
algorithms are designed and built.
The simplest and easiest-to-understand case for the algorithm happens
when the two processes do not execute the critical region entry code con-
currently but sequentially. Without loss of generality, let us assume that P0
executes this code first. It will perform the following operations: it sets flag[0]
to 1, then sets turn to 0, and finally evaluates the predicate of its while loop.
Since flag[1] is still 0, the predicate is false, the loop terminates immediately,
and P0 enters its critical region. At this point, the access-control variables are
in the following state:
• flag[0] == 1 (P0 is within its critical region);
• flag[1] == 0 (P1 is not trying to enter its critical region);
• turn == 0 (P0 has been the last process to start executing enter()).
If, at this point, P1 tries to enter its critical region by executing enter(1),
the following sequence of events takes place:
Shared variables:
    int flag[2] = {0, 0};
    int turn = 0;
    ... set of shared variables ...

Code executed by P0:
    flag[0] = 1;
    turn = 0;
    while(flag[1] == 1 && turn == 0);

Code executed by P1:
    flag[1] = 1;
    turn = 1;
    while(flag[0] == 1 && turn == 1);
FIGURE 5.7
Code being executed concurrently by the two processes involved in Peterson’s
critical region enter code.
1. P1 sets flag[1] to 1 and then sets turn to 1.
2. P1 evaluates the predicate of its while loop: flag[0] is 1 and turn is 1,
hence the predicate is true.
P1 is therefore trapped in the while loop and will stay there until P0 exits
its critical region and invokes exit(0), setting flag[0] back to 0. In turn,
this will make the left-hand part of the predicate being evaluated by P1 false,
break its busy waiting loop, and allow P1 to execute its critical region. After
that, P0 cannot enter its critical region again because it will be trapped in
enter(0).
When thinking about the actions performed by P0 and P1 when they
execute enter(), just discussed above, it should be remarked that, even if
the first program statement in the body of enter(), that is, flag[pid] = 1,
is the same for both of them, the two processes are actually performing very
different actions when they execute it.
That is, they are operating on different flags because they have got different
values for the variable pid, which belongs to the process state. This further
highlights the crucial importance of distinguishing programs from processes
when dealing with concurrent programming because, as it happens in this
case, the same program fragment produces very different results depending
on the executing process state.
It has just been shown that the mutual exclusion algorithm works satis-
factorily when P0 and P1 execute enter() sequentially, but this, of course,
does not cover all possible cases. Now, we must convince ourselves that, for
every possible interleaving of the two, concurrent executions of enter(), the
algorithm still works as intended. To facilitate the discussion, the code exe-
cuted by the two processes has been listed again in Figure 5.7, replacing the
local variables pid and other by their value. As already recalled, the value of
these variables depends on the process being considered.
For what concerns the first two statements executed by P0 and P1 , that
is, flag[0] = 1 and flag[1] = 1, it is easy to see that the result does not
depend on the execution order because they operate on two distinct variables.
In any case, after both statements have been executed, both flags will be set
to 1.
On the contrary, the second pair of statements, turn = 0 and turn = 1,
respectively, work on the same variable turn. The result will therefore depend
on the execution order but, thanks to the memory access atomicity taken for
granted at the single-variable level, the final value of turn will be either 0 or
1, and not anything else, even if both processes are modifying it concurrently.
More precisely, the final value of turn only depends on which process executed
its assignment last, and represents the identifier of that process.
Let us now consider what happens when both processes evaluate the pred-
icate of their while loop:
• The left-hand part of the predicate has no effect on the overall outcome of
the algorithm because both flag[0] and flag[1] have been set to one.
• The right-hand part of the predicate will be true for one and only one
process. It will be true for at least one process because turn will be either
0 or 1. In addition, it cannot be true for both processes because turn cannot
assume two different values at once and no processes can further modify it.
In summary, either P0 or P1 (but not both) will be trapped in the while loop,
whereas the other process will be allowed to enter into its critical region. Due
to our considerations about the value of turn, we can also conclude that the
process that set turn last will be trapped, whereas the other will proceed. As
before, the while loop executed by the trapped process will be broken when
the other process resets its flag to 0 by invoking its critical region exit code.
Going back to the correctness conditions outlined in Section 5.1, this algo-
rithm is clearly correct with respect to conditions 1 and 2. For what concerns
conditions 3 and 4, it also works better than the hardware-assisted lock vari-
ables discussed in Section 5.2. In particular,
• The slower process is no longer systematically put at a disadvantage. As-
suming that P0 is slower than P1 , it is still true that P1 may initially
overcome P0 if both processes execute the critical region entry code con-
currently. However, when P1 exits from its critical region and then tries to
immediately reenter it, it can no longer overcome P0 .
When P1 is about to evaluate its while loop predicate for the second time,
the value of turn will in fact be 1 (because P1 set it last), and both flags
will be set. Under these conditions, the predicate will be true, and P1 will
be trapped in the loop. At the same time, as soon as turn has been set to
1 by P1 , P0 will be allowed to proceed regardless of its speed because its
while loop predicate becomes false and stays this way.
• For the same reason, and due to the symmetry of the code, if both processes
FIGURE 5.8
Busy wait and fixed-priority assignment may interfere with each other, leading
to an unbounded priority inversion.
repeatedly contend against each other for access to their critical region,
they will take turns at entering them, so that no process can systematically
overcome the other. This property also implies that, if a process wants to
enter its critical region, it will succeed within a finite amount of time, that
is, at the most the time the other process spends in its critical region.
For the sake of completeness, it should be noted that the algorithm just de-
scribed, albeit quite important from the historical point of view, is definitely
not the only one of this kind. For instance, interested readers may want to
look at the famous Lamport’s bakery algorithm [55]. One of the most inter-
esting properties of this algorithm is that it still works even if memory read
and writes are not performed in an atomic way by the underlying processor.
In this way, it completely solves the mutual exclusion problem without any
kind of hardware assistance.
A first drawback of busy wait is that the waiting loop accomplishes nothing from the
application point of view—as a matter of fact, the intent of the loop is to pre-
vent it from proceeding further at the moment—but it wastes processor cycles
anyway. Since in many embedded systems processor power is at a premium,
due to cost and power consumption factors, it would be a good idea to put
these wasted cycles to better use.
Another side effect is subtler but not less dangerous, at least when dealing
with real-time systems, and concerns an adverse interaction of busy wait with
the concept of process priority and the way this concept is often put into
practice by a real-time scheduler.
In the previous chapters it has already been highlighted that not all pro-
cesses in a real-time system have the same “importance” (even if the concept
of importance has not been formally defined yet) so that some of them must
be somewhat preferred for execution with respect to the others. It is there-
fore intuitively sound to associate a fixed priority value to each process in
the system according to its importance and design the scheduler so that it
systematically prefers higher-priority processes when it is looking for a ready
process to run.
The intuition is not at all far from reality because several popular real-time
scheduling algorithms, to be discussed in Chapter 12, are designed exactly in
this way. Moreover, Chapters 13–16 will also make clearer that assigning the
right priority to all the processes, and strictly obeying them at run-time, also
plays a central role to ensure that the system meets its timing requirements
and constraints.
A very simple example of the kind of problems that may occur is given in
Figure 5.8. The figure shows two processes, P0 and P1 , being executed on a
single physical processor under the control of a scheduler that systematically
prefers P1 to P0 because P1 ’s priority is higher. It is also assumed that these
two processes share some data and—being written by a proficient programmer
who just read this chapter—therefore contain one critical region each. The
critical regions are protected by means of one of the mutual exclusion methods
discussed so far. As before, the critical region entry code is represented by the
function enter(). The following sequence of events may take place:
1. Process P0 becomes ready for execution at t0 , while P1 is blocked
for some other reason. Supposing that P0 is the only ready process
in the system at the moment, the scheduler makes it run.
2. At t1 , P0 wants to access the shared data; hence, it invokes the
critical region entry function enter(). This function is nonblocking
because P1 is currently outside its critical region and allows P0 to
proceed immediately.
3. According to the fixed-priority relationship between P0 and P1 , as
soon as P1 becomes ready, the scheduler grabs the processor from
P0 and immediately brings P1 into the running state with an action
often called preemption. In Figure 5.8, this happens at t2 .
4. If, at t3 , P1 tries to enter its critical section, it will get stuck in
the busy waiting loop of enter(), because P0 is still within its own
critical region. Since the scheduler systematically prefers P1 , the
lower-priority process P0 will never run again and will never leave
its critical region; hence, P1 will spin forever. The higher-priority
process is blocked by the lower-priority one for an unbounded amount
of time, a situation known as unbounded priority inversion.
FIGURE 5.9
Using passive wait instead of busy wait solves the unbounded priority inversion
problem in simple cases.
The basic idea of passive wait is that a process that cannot proceed should be
blocked, so that the scheduler can assign the processor to other ready processes
and bring the waiting process back to running when appropriate. This certainly
saves processor power and, at least in our simple example, also addresses the
unbounded priority inversion issue, as shown in Figure 5.9, when enter() and
exit() are somewhat modified to use passive wait.
It is easy to see that there still is a priority inversion region, shown in dark
gray in Figure 5.9, but it is no longer unbounded. The maximum amount of
time P1 can be blocked by P0 is in fact bounded by the maximum amount of
time P0 can possibly spend executing in its critical section. For well-behaved
processes, this time will certainly be finite.
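For instance, on a system offering the POSIX thread interface, enter() and exit() could simply be mapped onto a mutex, whose lock operation makes the caller wait passively. Using POSIX threads here is an assumption made for the sketch, not something the text mandates; the exit function is renamed to avoid clashing with the C library exit().

    #include <pthread.h>

    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

    void enter(void)
    {
        pthread_mutex_lock(&m);      /* blocks the caller passively if the mutex is taken */
    }

    void exit_region(void)           /* called exit() in the running example */
    {
        pthread_mutex_unlock(&m);    /* may move a blocked process back to the Ready state */
    }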
Even if settling on passive wait is still not enough to completely solve the
unbounded priority inversion problem in more complex cases, as will be shown
in Chapter 15, the underlying idea is certainly a step in the right direction and
5.5 Semaphores
Semaphores were first proposed as a general interprocess synchronization
framework by Dijkstra [23]. Even if the original formulation was based on busy
wait, most contemporary implementations use passive wait instead. Even if
semaphores are not powerful enough to solve, strictly speaking, any arbitrary
concurrent programming problem, as pointed out for example in [53], they
have successfully been used to address many problems of practical significance.
They still are the most popular and widespread interprocess synchronization
method, also because they are easy and efficient to implement.
A semaphore is an object that comprises two abstract items of information:
• a nonnegative integer value, and
• a queue of processes passively waiting on the semaphore.
Two primitives operate on a semaphore s. P(s) checks the value of s: if it is
greater than zero, the value is decremented by one and the caller proceeds;
otherwise, the caller is blocked and put into the queue of s. V(s) checks
whether any processes are currently blocked in the queue of s: if so, one of
them is unblocked and moved back into the Ready state; otherwise, the value
of s is incremented by one.
In both cases, V(s) never blocks the caller. It should also be re-
marked that, when V(s) unblocks a process, it does not necessarily
make it running immediately. In fact, determining which processes
must be run, among the Ready ones, is a duty of the scheduling al-
gorithm, not of the interprocess communication mechanism. More-
over, this decision is often based on information—for instance, the
relative priority of the Ready processes—that does not pertain to
the semaphore and that the semaphore implementation may not
even have at its disposal.
The process state diagram transitions triggered by the semaphore primitives
are highlighted in Figure 5.10. As in the general process state diagram shown in
Figure 3.4 in Chapter 3, the transition of a certain process A from the Running
to the Blocked state caused by P() is voluntary because it depends on, and is
caused by, a specific action performed by the process that is subjected to the
transition, in this case A itself.
On the other hand, the transition of a process A from the Blocked back
into the Ready state is involuntary because it depends on an action performed
by another process. In this case, the transition is triggered by another process
B that executes a V() involving the semaphore on which A is blocked. After
all, by definition, as long as A is blocked, it does not proceed with execution
and cannot perform any action by itself.
Semaphores provide a simple and convenient way of enforcing mutual ex-
clusion among an arbitrary number of processes that want to have access to
a certain set of shared variables. As shown in Figure 5.11, it is enough to
associate a mutual exclusion semaphore mutex to each set of global variables
to be protected. The initial value of this kind of semaphore is always 1.
Then, all critical regions pertaining to that set of global variables must
be surrounded by the statements P(mutex) and V(mutex), using them as a
pair of “brackets” around the code, so that they constitute the critical region
entry and exit code, respectively.
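The P() and V() primitives used in this chapter are abstract. As a minimal sketch, the following fragment shows the entry/exit "bracket" pattern using POSIX semaphores (which will be formally introduced in Chapter 7) as a stand-in for them; the shared counter and the names are purely illustrative.

#include <semaphore.h>

int shared_counter = 0;   /* shared variable protected by mutex */
sem_t mutex;              /* mutual exclusion semaphore */

void init(void)
{
    /* Initial value 1; the second argument (0) means the semaphore
       is shared among the threads of a single process */
    sem_init(&mutex, 0, 1);
}

void update(void)
{
    sem_wait(&mutex);     /* P(mutex): critical region entry code */
    shared_counter++;     /* critical region */
    sem_post(&mutex);     /* V(mutex): critical region exit code */
}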
FIGURE 5.10
Process State Diagram transitions induced by the semaphore primitives P()
and V().
FIGURE 5.11
A semaphore can be used to enforce mutual exclusion.
The first process attempting to enter the critical region, for instance P1, will be allowed to proceed immediately because it will find mutex at 1. All the other processes will find mutex at 0
and will be blocked on it until P1 reaches the critical region exit code and
invokes V(mutex).
When this happens, one of the processes formerly blocked on mutex will be
resumed—for example P2 —and will be allowed to execute the critical region
code. Upon exit from the critical region, P2 will also execute V(mutex) to
wake up another process, and so on, until the last process Pn exits from the
critical region while no other processes are currently blocked on P(mutex).
In this case, the effect of V(mutex) is to increment the value of mutex and
bring it back to 1 so that exactly one process will be allowed to enter into the
critical region immediately, without being blocked, in the future.
It should also be remarked that no race conditions during semaphore ma-
nipulation are possible because the semaphore implementation must guarantee
that P() and V() are executed atomically.
For what concerns the second correctness condition, it can easily be ob-
served that the only case in which the mutual exclusion semaphore prevents
a process from entering a critical region takes place when another process is
already within a critical region controlled by the same semaphore. Hence, pro-
cesses doing internal operations cannot prevent other processes from entering
their critical regions in any way.
The behavior of semaphores with respect to the third and fourth correctness conditions can be assessed along the same lines.
FIGURE 5.12
Producers–consumers problem solved with mutual exclusion and condition
synchronization semaphores.
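Figure 5.12 itself is not reproduced in this text. The following minimal sketch reflects the structure described below, with mutex initialized to 1, empty to the number of free buffer elements N, and full to 0; POSIX semaphores are again used as a stand-in for the abstract P() and V(), and the buffer size and names are illustrative assumptions.

#include <semaphore.h>

#define N 8

int buf[N];
int in = 0, out = 0;

sem_t mutex;              /* mutual exclusion, initial value 1 */
sem_t empty;              /* number of free elements, initial value N */
sem_t full;               /* number of full elements, initial value 0 */

void init(void)
{
    sem_init(&mutex, 0, 1);
    sem_init(&empty, 0, N);
    sem_init(&full, 0, 0);
}

void prod(int d)
{
    sem_wait(&empty);     /* P(empty): wait for a free element */
    sem_wait(&mutex);     /* P(mutex): enter the critical region */
    buf[in] = d;
    in = (in + 1) % N;
    sem_post(&mutex);     /* V(mutex): leave the critical region */
    sem_post(&full);      /* V(full): one more full element */
}

int cons(void)
{
    int d;
    sem_wait(&full);      /* P(full): wait for a full element */
    sem_wait(&mutex);
    d = buf[out];
    out = (out + 1) % N;
    sem_post(&mutex);
    sem_post(&empty);     /* V(empty): one more free element */
    return d;
}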
5.6 Monitors
As discussed in the previous section, semaphores can be defined quite easily;
as we have seen, their behavior can be fully described in about one page. Their
practical implementation is also very simple and efficient so that virtually all
modern operating systems support them. However, semaphores are also a very
low-level interprocess communication mechanism. For this reason, they are
difficult to use, and even the slightest mistake in the placement of a semaphore
primitive, especially P(), may completely disrupt a concurrent program.
For example, the program shown in Figure 5.13 may seem another legit-
imate way to solve the producers–consumers problem. Actually, it has been
derived from the solution shown in Figure 5.12 by swapping the two semaphore
primitives shown in boldface. After all, the program code still “makes sense”
after the swap because, as programmers, we could reason in the following way:
• In order to store a new data item into the shared buffer, a producer must,
first of all, make sure that it is the only process allowed to access the shared
buffer itself. Hence, a P(mutex) is needed.
• In addition, there must be room in the buffer, that is, at least one element
must be free. As discussed previously, P(empty) updates the count of free
buffer elements held in the semaphore empty and blocks the caller until
there is at least one free element.
FIGURE 5.13
Semaphores may be difficult to use: even the incorrect placement of one single
semaphore primitive may lead to a deadlock.
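Figure 5.13 is not reproduced here either. Assuming, as the reasoning above suggests, that the swapped primitives are the producer's P(empty) and P(mutex), its producer would look like the following sketch (same illustrative names as in the previous sketch).

void prod(int d)
{
    sem_wait(&mutex);     /* P(mutex): enter the critical region first... */
    sem_wait(&empty);     /* ...then P(empty): may block while holding mutex */
    buf[in] = d;
    in = (in + 1) % N;
    sem_post(&mutex);
    sem_post(&full);
}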
However, this reasoning is flawed. When a producer tries to store an element into the buffer, the following sequence of events may occur:
• If the buffer is full, the value of the semaphore empty will be zero because
its value represents the number of empty elements in the buffer. As a con-
sequence, the execution of P(empty) blocks the producer. It should also be
noted that the producer is blocked within its critical region, that is, without
releasing the mutual exclusion semaphore mutex.
After this sequence of events takes place, the only way to wake up the blocked
producer is by means of a V(empty). However, by inspecting the code, it can
easily be seen that the only V(empty) is at the very end of the consumer’s
code. No consumer will ever be able to reach that statement because it is
preceded by a critical region controlled by mutex, and the current value of
mutex is zero.
In other words, the consumer will be blocked by the P(mutex) located at
the beginning of the critical region itself as soon as it tries to get a data item
from the buffer. All the other producers will be blocked, too, as soon as they
try to store data into the buffer, for the same reason.
As a side effect, the first N consumers trying to get data from the buffer
will also bring the value of the semaphore full back to zero so that the
following consumers will not even reach the P(mutex) because they will be
blocked on P(full).
The presence of a deadlock can also be deduced in a more abstract way,
for instance, by referring to the Havender/Coffman conditions presented in
Chapter 4. In particular,
• The hold and wait condition is satisfied because both processes hold a
resource—either the mutual exclusion semaphore or the ability to provide
more empty buffer space—and wait for the other.
• Neither resource can be preempted. Due to the way the code has been
designed, the producer cannot be forced to relinquish the mutual exclusion
semaphore before it gets some empty buffer space. On the other hand, the
consumer cannot be forced to free some buffer space without first passing
through its critical region controlled by the mutual exclusion semaphore.
Informally speaking, a monitor is a composite object that comprises a set of shared data along with the methods that operate on them. With respect to its components, a monitor guarantees the following two main properties:
• Information hiding, because the set of shared data defined in the monitor is
accessible only through the monitor methods and cannot be manipulated
directly from the outside. At the same time, monitor methods are not
allowed to access any other shared data. Monitor methods are not hidden
and can be freely invoked from outside the monitor.
• Mutual exclusion among monitor methods, that is, the monitor implementation must guarantee that only one process will be actively executing within any monitor method at any given instant.
Condition synchronization within a monitor is provided by condition variables, on which two primitives operate:
• wait(c) blocks the invoking process and releases the monitor in a single, atomic action.
• signal(c) wakes up one of the processes blocked on the condition variable c; it has no effect if no process is blocked on c.
The informal reasoning behind the primitives is that, if a process starts ex-
ecuting a monitor method and then discovers that it cannot finish its work
immediately, it invokes wait on a certain condition variable. In this way, it
blocks and allows other processes to enter the monitor and perform their job.
When one of those processes, usually by inspecting the monitor’s shared
data, detects that the first process can eventually continue, it calls signal
on the same condition variable. The provision for multiple condition variables
stems from the fact that, in a single monitor, there may be many, distinct
reasons for blocking. Processes can easily be divided into groups and then
awakened selectively by making them block on distinct condition variables,
one for each group.
However, even if the definition of wait and signal just given may seem
quite good by intuition, it is nonetheless crucial to make sure that the syn-
chronization mechanism does not “run against” the mutual exclusion property
that monitors must guarantee in any case, leading to a race condition. It turns
out that, as shown in Figure 5.14, the following sequence of events involving
two processes A and B may happen:
FIGURE 5.14
After a wait/signal sequence on a condition variable, there is a race condition
that must be adequately addressed.
FIGURE 5.15
An appropriate constraint on the placement of signal in the monitor methods
solves the race condition issue after a wait/signal sequence.
In any case, processes like B—that entered the monitor and then
blocked as a consequence of a signal—take precedence over processes
that want to enter the monitor from the outside, like process C
in the figure. These processes will therefore be admitted into the
monitor, one at a time, only when no other process is actively executing in it.
FIGURE 5.16
Another way to eliminate the race condition after a wait/signal sequence is
to block the signaling process until the signaled one ceases execution within
the monitor.
FIGURE 5.17
The POSIX approach to eliminate the race condition after a wait/signal
sequence is to force the process just awakened from a wait to reacquire the
monitor mutual exclusion lock before proceeding.
The most important side effect of this approach from the practical
standpoint is that, when process A waits for a condition and then
process B signals that the condition has been fulfilled, process A
cannot be 100% sure that the condition it has been waiting for will
still be true when it will eventually resume executing in the monitor.
It is quite possible, in fact, that another process C was able to enter
the monitor in the meantime and, by altering the monitor’s shared
data, make the condition false again.
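A practical consequence, anticipating the POSIX interface discussed in Chapter 7, is that the condition must always be checked again in a loop after waking up. The following minimal sketch illustrates the idiom; the predicate and the names are illustrative.

#include <pthread.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  c = PTHREAD_COND_INITIALIZER;
int condition_holds = 0;          /* shared predicate, protected by m */

void wait_for_condition(void)
{
    pthread_mutex_lock(&m);
    /* The predicate is re-evaluated after every wakeup, because another
       process may have entered the monitor and made it false again */
    while (!condition_holds)
        pthread_cond_wait(&c, &m);
    /* ... use the shared data while the condition holds ... */
    pthread_mutex_unlock(&m);
}

void make_condition_true(void)
{
    pthread_mutex_lock(&m);
    condition_holds = 1;
    pthread_cond_signal(&c);
    pthread_mutex_unlock(&m);
}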
To conclude the description, Figure 5.18 shows how the producers–consumers
problem can be solved by means of a Brinch Hansen monitor, that is, the
simplest kind of monitor presented so far. Unlike the previous examples, this
one is written in “pseudo C” because the C programming language, by itself, does not support monitors.
#define N 8

monitor ProducersConsumers
{
    int buf[N];
    int in = 0, out = 0;
    condition full, empty;
    int count = 0;
    /* ... monitor methods (not reproduced here) ... */
FIGURE 5.18
Producers–consumers problem solved by means of a Brinch Hansen monitor.
The fake keyword monitor introduces a monitor.
The monitor’s shared data and methods are syntactically grouped together
by means of a pair of braces. Within the monitor, the keyword condition
defines a condition variable.
The main differences with respect to the semaphore-based solution of
Figure 5.12 are
• The mutual exclusion semaphore mutex is no longer needed because the
monitor construct already guarantees mutual exclusion among monitor
methods.
• The two synchronization semaphores empty and full have been replaced
by two condition variables with the same name. Indeed, their role is still
the same: to make producers wait when the buffer is completely full, and
make consumers wait when the buffer is completely empty.
• In the semaphore-based solution, the value of the synchronization
semaphores represented the number of empty and full elements in the
buffer. Since condition variables have no memory, and thus have no value
at all, the monitor-based solution keeps that count in the shared variable
count.
• Unlike in the previous solution, all wait primitives are executed conditionally, that is, only when the invoking process must certainly wait. For
example, the producer’s wait is preceded by an if statement checking
whether count is equal to N or not, so that wait is executed only when the
buffer is completely full. The same is also true for signal; this is due to the semantic differences between the semaphore and the condition variable primitives.
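Since the monitor methods of the figure are written in pseudo C, it may also be useful to see, as a sketch, how the same structure could be expressed with the POSIX mutexes and condition variables discussed in Chapter 7. Note that, unlike the Brinch Hansen monitor of Figure 5.18, the POSIX approach requires the waits to be placed in a while loop, for the reason explained above; all names here are illustrative.

#include <pthread.h>

#define N 8

int buf[N];
int in = 0, out = 0;
int count = 0;

/* The mutex plays the role of the implicit monitor lock */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  full  = PTHREAD_COND_INITIALIZER;   /* producers wait here */
pthread_cond_t  empty = PTHREAD_COND_INITIALIZER;   /* consumers wait here */

void produce(int d)
{
    pthread_mutex_lock(&lock);              /* enter the monitor */
    while (count == N)                      /* buffer completely full */
        pthread_cond_wait(&full, &lock);
    buf[in] = d;
    in = (in + 1) % N;
    count++;
    pthread_cond_signal(&empty);            /* wake up a waiting consumer */
    pthread_mutex_unlock(&lock);            /* leave the monitor */
}

int consume(void)
{
    int d;
    pthread_mutex_lock(&lock);
    while (count == 0)                      /* buffer completely empty */
        pthread_cond_wait(&empty, &lock);
    d = buf[out];
    out = (out + 1) % N;
    count--;
    pthread_cond_signal(&full);             /* wake up a waiting producer */
    pthread_mutex_unlock(&lock);
    return d;
}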
5.7 Summary
In order to work together toward a common goal, processes must be able
to communicate, that is, exchange information in a meaningful way. A set
of shared variables is undoubtedly a very effective way to pass data from one
process to another. However, if multiple processes make use of shared variables
in a careless way, they will likely incur a race condition, that is, a harmful
situation in which the shared variables are brought into an inconsistent state,
with unpredictable results.
Given a set of shared variables, one way of solving this problem is to locate
all the regions of code that make use of them and force processes to execute
these critical regions one at a time, that is, in mutual exclusion. This is done
by associating a sort of lock to each set of shared variables. Before entering a
critical region, each process tries to acquire the lock. If the lock is unavailable,
because another process holds it at the moment, the process waits until it is
released. The lock is released at the end of each critical region.
The lock itself can be implemented in several different ways and at differ-
ent levels of the system architecture. That is, a lock can be either hardware- or
software-based. Moreover, it can be based on active or passive wait. Hardware-
based locks, as the name says, rely on special CPU instructions to realize lock
acquisition and release, whereas software-based locks are completely imple-
mented with ordinary instructions.
When processes perform an active wait, they repeatedly evaluate a pred-
icate to check whether the lock has been released or not, and consume CPU
cycles doing so. On the contrary, a passive wait is implemented with the help
of the operating system scheduler by moving the waiting processes into the
Blocked state. This is a dedicated state of the Process State Diagram, in which
processes do not compete for CPU usage and therefore do not proceed with
execution.
From a practical perspective, the two most widespread ways of supporting
mutual exclusion for shared data access in a real-time operating system are
semaphores and, at a higher level of abstraction, monitors. Both of them are
based on passive wait and are available in most modern operating systems.
Moreover, besides mutual exclusion, they can both be used for condi-
tion synchronization, that is, to establish timing constraints among processes,
which are not necessarily related to mutual exclusion. For instance, a synchro-
nization semaphore can be used to block a process until an external event of
interest occurs, or until another process has concluded an activity.
Last, it should be noted that adopting a lock to govern shared data access
is not the only way to proceed. Indeed, it is possible to realize shared objects
that can be concurrently manipulated by multiple processes without using any
lock. This will be the topic of Chapter 10.
6
Interprocess Communication Based on
Message Passing
CONTENTS
6.1 Basics of Message Passing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
6.2 Naming Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
6.3 Synchronization Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
6.4 Message Buffers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
6.5 Message Structure and Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
6.6 Producer–Consumer Problem with Message Passing . . . . . . . . . . . . . . . . . 153
6.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
Even if this definition still lacks many important details that will be discussed
later, it is already clear that the most apparent effect of message passing
primitives is to transfer a certain amount of information from the sending
process to the receiving one. At the same time, the arrival of a message to a
process also represents a synchronization signal because it allows the process
to proceed after a blocking receive.
The last important requirement of a satisfactory interprocess communica-
tion mechanism, mutual exclusion, is not a concern here because messages are
never shared among processes, and their ownership is passed from the sender
to the receiver when the message is transferred. In other words, the mecha-
nism works as if the message were instantaneously copied from the sender to
the receiver even if real-world message passing systems do their best to avoid
actually copying a message for performance reasons.
In this way, even if the sender alters a message after sending it, it will
merely modify its local copy, and this will therefore not influence the message
sent before. Symmetrically, the receiver is allowed to modify a message it
received, and this action will not affect the sender in any way.
Existing message-passing schemes comprise a number of variations around this basic theme, which will be the subject of the following sections. The main design choices left open by our summary description concern the naming scheme, the synchronization model, the amount of message buffering, and the message structure and contents.
FIGURE 6.1
Direct versus indirect naming scheme; the direct scheme is simpler, the other
one makes software integration easier.
Concerning the naming scheme, two aspects are most important:
1. how the send and receive primitives are associated to each other;
2. their symmetry (or asymmetry).
About the first aspect, the most straightforward approach is for the send-
ing process to name the receiver directly, for instance, by passing its process
identifier to send as an argument. On the other hand, when the software gets
more complex, it may be more convenient to adopt an indirect naming scheme
in which the send and receive primitives are associated because they both
name the same intermediate entity. In the following, we will use the word
mailbox for this entity, but in the operating system jargon, it is also known
under several other names, such as channel or message queue.
As shown in Figure 6.1, an indirect naming scheme is advantageous to
software modularity and integration. If, for example, a software module A
wants to send a message to another module B, the process P (of module A)
responsible for the communication must know the identity of the intended
recipient process Q within module B. If the internal architecture of module B is later changed, so that the intended recipient becomes Q' instead of Q, module A must be updated accordingly, or otherwise communication will no
longer be possible.
In other words, module A becomes dependent not only upon the interface
of module B—that would be perfectly acceptable—but also upon its internal
design and implementation. In addition, if process identifiers are used to name
processes, as it often happens, even more care is needed because there is
usually no guarantee that the identifier of a certain process will still be the
same across reboots even if the process itself was not changed at all.
On the contrary, if the communication is carried out with an indirect
naming scheme, depicted in the lower part of the figure, module A and process
P must only know the name of the mailbox that module B is using for incoming
messages. The name of the mailbox is part of the external interface of module
B and will likely stay the same even if B’s implementation and internal design
change with time, unless the external interface of the module is radically
redesigned, too.
Another side effect of indirect naming is that the relationship among com-
municating processes becomes more complex. For both kinds of naming, we can already have one-to-one and many-to-one relationships, in which one or more senders address a single recipient.
With indirect naming, since multiple processes can receive messages from the same mailbox, there may also be a one-to-many or a many-to-many structure, in which one or more processes send messages to a group of recipients, without caring about which of them will actually get the message.
This may be useful to conveniently handle concurrent processing in a
server. For example, a web server may comprise a number of “worker” pro-
cesses (or threads), all equal and able to handle a single HTTP request at a
time. All of them will be waiting for requests through the same intermediate
entity (which will most likely be a network communication endpoint in this
case).
When a request eventually arrives, one of the workers will get it, process it,
and provide an appropriate reply to the client. Meanwhile, the other workers
will still be waiting for additional requests and may start working on them
concurrently.
FIGURE 6.2
Asynchronous message transfer. The sender is never blocked by send even if
the receiver is not ready for reception.
A nonblocking receive, instead, basically checks whether a message is available and, in that case, retrieves it, but never waits if it is not. On the sending side, the establishment of additional synchronization constraints proceeds, in most cases, along three basic schemes:
1. As shown in Figure 6.2, a message transfer is asynchronous if the
sending process is never blocked by send even if the receiving pro-
cess has not yet executed a matching receive. This kind of message
transfer gives rise to two possible scenarios:
• If, as shown in the upper part of the figure, the receiving pro-
cess B executes receive before the sending process A has sent
the message, it will be blocked and it will wait for the mes-
sage to arrive. The message transfer will take place when A
eventually sends the message.
• If the sending process A sends the message before the receiving process B performs a matching receive, the system will accept the message, keep it in a buffer, and let A continue immediately; B will find the message there when it eventually performs a receive.
FIGURE 6.3
Synchronous message transfer, or rendezvous. The sender is blocked by send
when the receiver is not ready for reception.
FIGURE 6.4
Remote invocation message transfer, or extended rendezvous. The sender is
blocked until it gets a reply from the receiver. Symmetrically, the receiver is
blocked until the reply has successfully reached the original sender.
2. A message transfer is synchronous, also known as a rendezvous, as shown in Figure 6.3: if the sending process A invokes the send primitive when the receiving process B has not called receive yet, A is blocked until B does so.
When B is eventually ready to receive the message, the message
transfer takes place, and A is allowed to continue. As shown in
the upper part of the figure, nothing changes with respect to the
asynchronous model if the receiver is ready for reception when the
sender invokes send. In any case, with this kind of message transfer,
the receiver B can rest assured that the sending process A will not
proceed beyond its send before B has actually received the message.
This difference about the synchronization model has an important
impact for what concerns message buffering, too: since in a ren-
dezvous the message sender is forced to wait until the receiver is
ready, the system need not provide any form of intermediate buffering to handle this case. The message can simply be
kept by the sender until the receiver is ready and then transferred
directly from the sender to the receiver address space.
3. A remote invocation message transfer, also known as extended rendezvous, is even stricter for what concerns synchronization. As de-
picted in Figure 6.4, when process A sends a request message to
process B, it is blocked until a reply message is sent back from B
to A.
At first sight it may seem that, since an asynchronous message transfer can be
used as the “basic building block” to construct all the others, it is the most
useful one. For this reason, as will be discussed in Chapters 7 and 8, most
operating systems provide just this synchronization model. However, it has
been remarked [18] that it has a few drawbacks, too. Moreover, the amount of buffering interposed between the sender and the receiver is itself a trade-off:
• Having a large buffer between the sender and the receiver decouples the
two processes and, on average, makes them less sensitive to any variation
in execution and message passing speed. Thus, it increases the likelihood of
executing them concurrently without unnecessarily waiting for one another.
• The interposition of a buffer increases the message transfer delay and makes
it less predictable. As an example, consider the simple case in which we
assume that the message transfer time is negligible, the receiver consumes
messages at a fixed rate of k messages per second, and there are already m messages in the buffer when the (m + 1)-th message is sent. In this case, the receiver will start processing the (m + 1)-th message after m/k seconds. Clearly,
if m becomes too large for any reason, the receiver will work on “stale”
data.
For these and other reasons, the approach to buffering differs widely from one message passing implementation to another, ranging from no buffering at all (as in a fully synchronous transfer) to a large amount of buffer space.
Another design choice concerns the structure and contents of messages. In an ideal world it would be possible to directly send and receive any kind of data, even of a user-defined type, but this is rarely the case in practice.
The first issue is related to data representation: the same data type, for
instance the int type of the C language, may be represented in very different
ways by the sender and the receiver, especially if they reside on different hosts.
For instance, the number of bits may be different, as well as the endianness,
depending on the processor architecture. When this happens, simply moving
the bits that made up an int data item from one host to another is clearly
not enough to ensure a meaningful communication.
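For a single 32-bit integer, for instance, a common way of addressing the endianness part of the problem is to convert the value to a conventional, neutral byte order before sending it and back after receiving it, as in the following sketch; the text later suggests the POSIX functions htonl() and ntohl() for exactly this purpose. Note that this does not address differences in integer size or representation.

#include <stdint.h>
#include <arpa/inet.h>

/* Convert a 32-bit value to network (neutral) byte order before
   transmission, and back to host byte order after reception */
uint32_t host_to_neutral32(uint32_t d)
{
    return htonl(d);
}

uint32_t neutral_to_host32(uint32_t m)
{
    return ntohl(m);
}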
A similar issue also occurs if the data item to be exchanged contains point-
ers. Even if we take for granted that pointers have the same representation
in both the sending and receiving hosts, a pointer has a well-defined meaning
only within its own address space, as discussed in Chapter 2. Hence, a pointer
may or may not make sense after message passing, depending on how the
sending and receiving agents are related to each other:
1. If they are two threads belonging to the same process (and, there-
fore, they necessarily reside on the same host), they also live within
the same address space, and the pointer will still reference the same
underlying memory object.
2. If they are two processes residing on the same host, the pointer will
still be meaningful after message passing only under certain very
specific conditions, that is, only if their programmers were careful
enough to share a memory segment between the two processes, make
sure that it is mapped at the same virtual address in both processes,
and allocate the referenced object there.
3. If the processes reside on different hosts, there is usually no way to
share a portion of address spaces between them, and the pointer
will definitely lose its meaning after the transfer.
Even worse, it may happen that the pointer will still be formally
valid in the receiver’s context—that is, it will not be flagged as in-
valid by the memory management subsystem because it falls within
the legal boundaries of the address space—but will actually point
to a different, and unrelated, object.
In any case, it should also be noted that, even if passing a pointer makes sense
(as in cases 1 and 2 above), it implies further memory management issues,
especially if memory is dynamically allocated. For instance, programmers must
make sure that, when a pointer to a certain object is passed from the sender
to the receiver, the object is not freed (and its memory reused) before the
receiver is finished with it.
This fact may not be trivial to detect for the sender, which in a sense
can be seen as the “owner” of the object when asynchronous or synchronous
transfers are in use. This is because, as discussed in Section 6.3, the sender is
allowed to continue after the execution of a send primitive even if the receiver
void prod(int d) {
    uint32_t m;
    m = host_to_neutral(d);
    send(C, &m, sizeof(m));
}

int cons(void) {
    uint32_t m;
    int d;
    recv(P, &m, sizeof(m));
    d = neutral_to_host(m);
    return d;
}
FIGURE 6.5
A straightforward solution to the producer–consumer problem with syn-
chronous message passing. The same approach also works with asynchronous
message passing with a known, fixed amount of buffering.
either did not get the message (asynchronous transfer) or did not actually
work on the message (synchronous transfer) yet.
Since the problem is very difficult to solve in general terms, most operating
systems and programming languages leave this burden to the programmer. In
other words, in many cases, the message-passing primitives exported by the
operating system and available to the programmer are merely able to move a
sequence of bytes from one place to another.
The programmer is then entirely responsible for making sure that the
sequence of bytes can be interpreted by the receiver. This is the case for
both POSIX/Linux and FreeRTOS operating systems (discussed in Chapters 7
and 8), as well as the socket programming interface for network communication
(outlined in Chapter 9).
On the other side, the consumer C invokes the function cons() whenever it
is ready to retrieve a message:
• The function waits until a message arrives, by invoking the recv message-
passing primitive. Since the naming scheme is direct and symmetric, the
first argument of recv identifies the intended sender of the message, that
is, P . The next two arguments locate a memory buffer in which recv is
expected to store the received message and its size.
• Then, the data item found in the message just received is converted to the
host representation by means of the function neutral to host(). For a
single, 32-bit integer variable, a suitable POSIX function would be ntohl().
The result d is returned to the caller.
Upon closer examination of Figure 6.5, it can be seen that the code just
described gives rise to a unidirectional flow of messages, depicted as light grey
boxes, from P to C, each carrying one data item. The absence of messages
represents a synchronization condition because the consumer C is forced to
wait within cons() until a message from P is available.
However, if we compare this solution with, for instance, the semaphore-
based solution shown in Figure 5.12 in Chapter 5, it can easily be noticed that
another synchronization condition is missing. In fact, in the original formulation
of the producer–consumer problem, the producer P must wait if there are “too
many” messages already enqueued for the consumer. In Figure 5.12, the exact
definition of “too many” is given by N, the size of the buffer interposed between
producers and consumers.
Therefore, the solution just proposed is completely satisfactory—and
matches the previous solutions, based on other interprocess synchronization
mechanisms—only if the second synchronization condition is somewhat pro-
vided implicitly by the message-passing mechanism itself. This happens when
the message transfer is synchronous, implying that there is no buffer at all
between P and C.
An asynchronous message transfer can also be adequate if the maximum
amount of buffer provided by the message-passing mechanism is known and
#define N 8

void prod(int d) {
    uint32_t m;
    empty_t s;
    recv(C, &s, sizeof(s));
    /* location 1 */
    m = host_to_neutral(d);
    send(C, &m, sizeof(m));
}

void cons_init(void) {
    int i;
    empty_t s = empty;
    for(i=0; i<N; i++)
        send(P, &s, sizeof(s));
}

int cons(void) {
    uint32_t m;
    empty_t s = empty;
    int d;
    recv(P, &m, sizeof(m));
    /* location 2 */
    d = neutral_to_host(m);
    send(P, &s, sizeof(s));
    return d;
}
FIGURE 6.6
A more involved solution to the producer–consumer problem based on asyn-
chronous message passing. In this case, the synchronization condition for the
producer P is provided explicitly rather than implicitly.
fixed, and the send primitive blocks the sender when there is no buffer space
available.
If only asynchronous message passing is available, the second synchroniza-
tion condition must be implemented explicitly. Assuming that the message-
passing mechanism can successfully buffer at least N messages, a second flow
of empty messages that goes from C to P and only carries synchronization
information is adequate for this, as shown in Figure 6.6. In the figure, the ad-
ditional code with respect to Figure 6.5 is highlighted in bold. The data type
empty t represents an empty message. With respect to the previous example,
• The consumer C sends an empty message to P after retrieving a message
from P itself.
• The producer P waits for an empty message from the consumer C before
sending its own message to it.
• By means of the initialization function cons init(), the consumer injects
N empty messages into the system at startup.
At startup, there are therefore N empty messages. As the system evolves, the
total number of empty plus full messages is constant and equal to N because
one empty (full) message is sent whenever a full (empty) message is retrieved.
The only transient exception happens when the producer or the consumer are
executing at locations 1 and 2 of Figure 6.6, respectively. In that case, the
total number of messages can be N − 1 or N − 2 because one or two messages
may have been received by P and/or C and have not been sent back yet.
In this way, C still waits if there is no full message from P at the moment,
as before. In addition, P also waits if there is no empty message from C. The
total number of messages being constant, this also means that P already sent
N full messages that have not yet been handled by C.
6.7 Summary
In this chapter we learnt that message passing is a valid alternative to inter-
process communication based on shared variables and synchronization devices
because it encompasses both data transfer and synchronization in the same
set of primitives.
Although the basics of message passing rely on two intuitive and sim-
ple primitives, send and receive, there are several design and implementation
variations worthy of attention. They fall into three main areas: the naming scheme, the synchronization model, and the way messages are buffered and structured.
7
Interprocess Communication Primitives in
POSIX/Linux
CONTENTS
7.1 Threads and Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
7.1.1 Creating Threads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
7.1.2 Creating Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
7.2 Interprocess Communication among Threads . . . . . . . . . . . . . . . . . . . . . . . . . 175
7.2.1 Mutexes and Condition Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
7.3 Interprocess Communication among Processes . . . . . . . . . . . . . . . . . . . . . . . 180
7.3.1 Semaphores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
7.3.2 Message Queues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
7.3.3 Signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
7.4 Clocks and Timers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
7.5 Threads or Processes? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
7.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
This chapter describes the POSIX/Linux primitives for creating and managing threads and processes and for letting them communicate; it will then introduce some Linux primitives for the management of clocks and timers, an important aspect when developing programs that interact with the outside world in embedded applications.
2. The content of the processor registers, in particular:
• the Program Counter, that is, the address of the next machine instruction to be executed by the program;
• the Stack Pointer, that is, the address of the stack in memory containing the local program variables and the arguments of all the active procedures of the program at the time the scheduler reclaims the processor.
3. The page table content for that program. We have seen in Chapter 2
that, when virtual memory is supported by the system, the memory
usage of the program is described by a set of page table entries that
specify how virtual addresses are translated into physical addresses.
In this case, the context switch changes the memory mapping and prevents the new process from overwriting sections of memory used by the previous one.
4. Process-specific data structures maintained by the operating system
to manage the process.
The amount of information to be saved for the process losing the processor and
to be restored for the new process can be large, and therefore many processor
cycles may be spent at every context switch. Very often, most of the time
spent at the context switch is due to saving and restoring the page table since
the page table entries describe the possibly large number of memory pages
used by the process. For the same reason, creating new processes involves the
creation of a large set of data structures.
The above facts are the main reason for a new model of computation rep-
resented by threads. Conceptually, threads are not different from processes
because both entities provide an independent flow of execution for programs.
This means that all the problems, strategies, and solutions for managing con-
current programming apply to processes as well as to threads. There are,
however, several important differences due to the amount of information that
is saved by the operating system in context switches. Threads, in fact, live
in the context of a process and share most process-specific information, in
particular memory mapping. This means that the threads that are activated
within a given process share the same memory space and the same files and
devices. For this reason, threads are sometimes called “lightweight processes.”
Figure 7.1 shows on the left the information forming the process context. The
memory assigned to the process is divided into
• Stack, containing the private (sometimes called also automatic) variables
and the arguments of the currently active routines. Normally, a processor
register is designated to hold the address of the top of the stack;
• Text, containing the machine code of the program being executed. This area is normally read-only;
• Data, containing the data section of the program. Static C variables and
variables declared outside the routine body are maintained in the data
section;
• Heap, containing the dynamically allocated data structures. Memory allocated by the C malloc() routine or by the new operator in C++ belongs to the heap section.
In addition to the memory used by the program, the process context is formed
by the content of the registers, the descriptors for the open files and devices,
FIGURE 7.1
Process and Thread contexts.
and the other operating system structures maintained for that process. On
the right of Figure 7.1 the set of information for a process hosting two threads
is shown. Note that the Text, Data, and Heap sections are the same for both
threads. Only the Stack memory is replicated for each thread, and the thread-
specific context is only formed by the register contents. The current content
of the processor registers in fact represents a snapshot of the program activity
at the time the processor is removed by the scheduler from one thread to
be assigned to another one. In particular, the stack pointer register contains
the address of the thread-specific stack, and the program counter contains
the address of the next instruction to be executed by the program. As the
memory-mapping information is shared among the threads belonging to the
same process as well as the open files and devices, the set of registers basically
represents the only information to be saved in a context switch. Therefore,
unless a thread from a different process is activated, the time required for a
context switch between threads is much shorter compared to the time required
for a context switch among processes.
For a long time, no thread programming interface existed that could be portable across different systems. For this reason, a standardized interface has been specified by the IEEE POSIX 1003.1c standard in 1995, and an API for POSIX threads, called Pthreads, is now available on every UNIX
system including Linux. The C types and routine prototypes for threads are
defined in the pthread.h header file.
The most important routine is:
int pthread_create(pthread_t *thread, pthread_attr_t *attr,
                   void *(*start_routine)(void *), void *arg)
which creates and starts a new thread. Its arguments are the following:
• thread: the returned identifier of the created thread to be used for subsequent operations. This is of type pthread_t, which is opaque, that is, the programmer has no knowledge of its internal structure, this being only meaningful to the pthread routines that receive it as argument.
• attr: the attributes of the thread. Attributes are represented by the opaque
type pthread_attr_t.
• start routine: the routine to be executed by the thread.
• arg: the pointer argument passed to the routine.
All the pthread routines return a status that indicates whether the required
action was successful: all functions return 0 on success and a nonzero error
code on error. Since the data type for the attribute argument is opaque, it is
not possible to define directly its attribute fields, and it is necessary to use
specific routines for this purpose. For example, one important attribute of the
thread is the size of its stack: if the stack is not large enough, there is the risk
that a stack overflow occurs especially when the program is using recursion.
To prepare the attribute argument specifying a given stack size, it is necessary
first to initialize a pthread_attr_t parameter with default settings and then
use specific routines to set the specific attributes. After having been used,
the argument should be disposed. For example, the following code snippet
initializes a pthread_attr_t parameter and then sets the stack size to 4 MByte
(the default stack size on Linux is normally 1 MByte for 32 bit architectures,
and 2 MByte for 64 bit architectures).
pthread_attr_t attr;
//Attribute initialization
pthread_attr_init (&attr);
//Set stack size to 4 MBytes
pthread_attr_setstacksize(&attr, 0x00400000);
...
//Use attr in thread creation
...
//Dispose attribute parameter
pthread_attr_destroy(&attr);
A thread can wait for the termination of another thread and collect its result by calling int pthread_join(pthread_t thread, void **value_ptr). The second argument, when non-NULL, is the pointer to the returned value of the thread. A thread may return a value either when the code terminates with a return statement or when pthread_exit(void *value) is called. The latter is preferable especially when many threads are created and terminated because pthread_exit() frees the internal resources allocated for the thread.
Threads can either terminate spontaneously or be canceled. Extreme care
is required when canceling threads because an abrupt termination may lead
to inconsistent data, especially when the thread is sharing data structures.
Even worse, a thread may be canceled in a critical section: If this happens,
no other thread will ever be allowed to enter that section. For this reason, POSIX defines dedicated routines, such as pthread_cancel(), pthread_setcancelstate(), and pthread_setcanceltype(), to handle thread cancelation in a controlled way.
#define MAX_THREADS 256
#define ROWS 10000
#define COLS 10000
...
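The definitions above open an example program whose remaining code is not reproduced in this text. The following sketch, consistent with the discussion below, shows its likely overall structure: the rows of a large matrix are partitioned among a number of threads, each of which returns its partial sum through its argument structure. The structure and variable names (thread_arg, bigMatrix, nThreads, and so on) are illustrative assumptions, and ROWS is assumed to be exactly divisible by the number of threads.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_THREADS 256
#define ROWS 10000
#define COLS 10000

/* Declared outside any routine and allocated in main(): the matrix
   lives in a memory segment shared by all threads of the process */
long *bigMatrix;

/* Arguments of each thread: first row, number of rows, and a field
   used to return the partial sum */
struct thread_arg {
    int  start_row;
    int  num_rows;
    long partial_sum;
};

static void *thread_routine(void *arg)
{
    struct thread_arg *a = (struct thread_arg *)arg;
    long sum = 0;
    int i, j;

    for (i = a->start_row; i < a->start_row + a->num_rows; i++)
        for (j = 0; j < COLS; j++)
            sum += bigMatrix[(long)i * COLS + j];

    a->partial_sum = sum;   /* result returned via the argument structure */
    return NULL;
}

int main(int argc, char *argv[])
{
    pthread_t threads[MAX_THREADS];
    struct thread_arg args[MAX_THREADS];
    int nThreads = (argc > 1) ? atoi(argv[1]) : 2;
    long total = 0, k;
    int i;

    bigMatrix = malloc(ROWS * COLS * sizeof(long));
    for (k = 0; k < (long)ROWS * COLS; k++)
        bigMatrix[k] = 1;               /* fill the matrix with some values */

    for (i = 0; i < nThreads; i++) {
        args[i].start_row = i * (ROWS / nThreads);
        args[i].num_rows  = ROWS / nThreads;
        pthread_create(&threads[i], NULL, thread_routine, &args[i]);
    }

    /* Wait for all threads; the order of termination does not matter */
    for (i = 0; i < nThreads; i++) {
        pthread_join(threads[i], NULL);
        total += args[i].partial_sum;
    }
    printf("Total sum: %ld\n", total);
    return 0;
}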
In the foregoing code there are several points worth examining in detail. First
of all, the matrix is declared outside the body of any routine in the code. This
means that the memory for it is not allocated in the Stack segment but in
the Heap segment, being dynamically allocated in the main program. This
segment is shared by every thread (only the stack segment is private for each
thread). Since the matrix is accessed only in read mode, there is no need to
consider synchronization. The examples in the next section will present ap-
plications where the shared memory is accessed for both reading and writing,
and, in this case, additional mechanisms for ensuring data coherence will be
required. Every thread needs two parameters: the row number of the first
element of the set of rows assigned to the thread, and the number of rows
to be considered. Since only one pointer argument can be passed to threads,
the program creates an array of data structures in shared memory, each con-
taining the two arguments for each thread, plus a third return argument that
will contain the partial sum, and then passes the pointer of the corresponding
structure to each thread. Finally, the program awaits the termination of the threads by calling pthread_join() in a loop with as many iterations as the
number of activated threads. Note that this works also when the threads ter-
minate in an order that is different from the order pthread_join() is called. In fact, if pthread_join() is called for a thread that has already terminated, the routine will return soon with the result value passed by the thread to pthread_exit() and maintained temporarily by the system. In the program,
FIGURE 7.2
Execution time of the large matrix summation for an increasing number of
executor threads on an 8-core processor.
the partial sum computed by each thread is stored in the data structure used
to exchange the thread routine argument, and therefore the second parame-
ter of pthread_join() is null, and pthread_exit() is not used in the thread
routine.
In the above example, the actions carried out by each thread are purely
computational. So, with a single processor, there is no performance gain in carrying out the computation in parallel rather than serially, because every thread
requires the processor 100% of its time and therefore cannot proceed when the
processor is assigned to another thread. Modern processors, however, adopt
a multicore architecture, that is, host more than one computing unit in the
processor, and therefore, there is a true performance gain in carrying out
computation concurrently. Figure 7.2 shows the execution time for the above
example at an increasing number of threads on an 8-core processor. The ex-
ecution time halves when passing from one to two threads, and the performance improves further as additional threads are introduced. When more than 8 threads are used,
the performance does not improve any further; rather it worsens slightly. In
fact, when more threads than available cores are used in the program, there
cannot be any gain in performance because the thread routine does not make
any I/O operation and requires the processor (core) during all its execution.
The slight degradation in performance is caused by the added overhead in the
context switch due to the larger number of active threads.
The improvement in execution speed due to multithreading becomes more
evident when the program being executed by threads makes I/O operations.
In this case, the operating system is free to assign the processor to another
thread when the current thread starts an I/O operation and needs to await
its termination. For this reason, if the routines executed by threads are I/O
intensive, adding new threads still improves performance because this reduces
the chance that the processor idles awaiting the termination of some I/O
operation. Observe that even if no I/O operation is executed by the thread
code, there is a chance that the program blocks itself awaiting the completion
of an I/O operation in systems supporting memory paging. When paging is in use, pages of the active memory of processes can be held in secondary
memory (i.e., on disk), and are transferred (swapped in) to RAM memory
whenever they are accessed by the program, possibly copying back (swapping
out) other pages in memory to make room for them. Paging allows handling
a memory that is larger than the RAM memory installed in the computer,
at the expense of additional I/O operations for transferring memory pages
from/to the disk.
Threads represent entities that are handled by the scheduler and, from
this point of view, do not differ from processes. In fact, the difference between
processes and threads lies only in the actions required for the context switch,
which is only a subset of the process-specific information if the processing
unit is exchanged among threads of the same process. The following chapters
will describe in detail how a scheduler works, but here we anticipate a few
concepts that will allow us to understand the pthread API for controlling
thread scheduling.
We have already seen in Chapter 3 that, at any time, the set of active
processes (and threads) can be partitioned in two main categories:
• Ready processes, that is, processes that could use the processor as soon as
it is assigned to them;
• Waiting processes, that is, processes that are waiting for the completion of some I/O operation, and that could not do any useful work in the meantime.
Unlike the FIFO policy, the round-robin policy ensures that all the processes with the highest priority have a chance of being assigned processor time, at the expense, however, of more overhead due to the larger number of context switches.
Scheduling policy represents one of the elements that compose the thread’s
attributes, passed to routine pthread_create(). We have already seen that
the thread’s attributes are represented by an opaque type and that a set of
routines are defined to set individual attributes. The following routine allows
for defining the scheduling policy:
int pthread_attr_setschedpolicy(pthread_attr_t *attr, int policy);
where policy is either SCHED FIFO, SCHED RR, or SCHED OTHER. SCHED OTHER
can only be used at static priority 0 and represents the standard Linux time-
sharing scheduler that is intended for all processes that do not require special
static priority real-time mechanisms. The above scheduling policies do not rep-
resent the only possible choices and the second part of this book will introduce
different techniques for scheduling processes in real-time systems.
Thread priority is finally defined for a given thread by routine:
int pthread_setschedprio(pthread_t thread, int prio);
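As an illustrative sketch of how these routines fit together, the following function creates a thread with the SCHED_FIFO policy and a given priority. The call to pthread_attr_setinheritsched(), an additional detail not mentioned above, is needed so that the explicitly set attributes are not overridden by those inherited from the creating thread; moreover, real-time policies usually require adequate privileges.

#include <pthread.h>
#include <sched.h>

int create_fifo_thread(pthread_t *tid, void *(*body)(void *),
                       void *arg, int prio)
{
    pthread_attr_t attr;
    struct sched_param param;
    int ret;

    pthread_attr_init(&attr);
    /* Use the attributes set below instead of inheriting them */
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    param.sched_priority = prio;
    pthread_attr_setschedparam(&attr, &param);

    ret = pthread_create(tid, &attr, body, arg);
    pthread_attr_destroy(&attr);
    return ret;
}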
A new process is created by means of the fork() system call, which clones the calling process: fork() returns 0 in the child process and the process identifier of the newly created child in the parent, so that the two flows of execution can be told apart:

pid_t pid;
...
pid = fork();
if(pid == 0)
{
//Actions for the created process
}
else
{
//Actions for the calling process
}
The created process is a child process of the creating one and will proceed
in parallel with the latter. As for threads, if processes are created to carry
out a collaborative work, it is necessary that, at a certain point, the creator
process synchronizes with its child processes. The following system routine
will suspend the execution of the process until the child process, identified by
the process identifier returned by fork(), has terminated.
pid_t waitpid(pid_t pid, int *status, int options)
Its argument status, when non-NULL, is a pointer to an integer variable
that will hold the status of the child process (e.g., if the child process termi-
nated normally or was interrupted). Argument options, when different from
0, specifies more specialized wait options.
If processes are created to carry out collaborative work, it is necessary that
they share memory segments in order to exchange information. While with
threads every memory segment different from the stack was shared among
threads, and therefore it suffices to use static variables to exchange informa-
tion, the memory allocated for the child process is by default separate from the
memory used by the calling process. We have in fact seen in Chapter 2 that in
operating systems supporting virtual memory (e.g., Linux), different processes
access different memory pages even if using the same virtual addresses, and
that this is achieved by setting the appropriate values in the Page Table at
every context switch. The same mechanism can, however, be used to provide
controlled access to segments of shared memory by setting appropriate values
in the page table entries corresponding to the shared memory pages, as shown
in Figure 2.8 in Chapter 2. The definition of a segment of shared memory is
done in Linux in two steps:
1. A segment of shared memory of a given size is created via system
routine shmget();
2. A region of the virtual address space of the process is “attached”
to the shared memory segment via system routine shmat().
The prototype of shmget() routine is
int shmget(key_t key, size_t size, int shmflg)
where key is the unique identifier of the shared memory segment, size is the
dimension of the segment, and shmflg defines the way the segment is cre-
ated or accessed. When creating a shared memory segment, it is necessary to
provide an unique identifier to it so that the same segment can be referenced
by different processes. Moreover, the shared memory segment has to be cre-
ated only the first time shmget() is called, and the following times it is called
by different processes with the same identifier, the memory segment is simply
referenced. It is, however, not always possible to know in advance if the spec-
ified segment of shared memory has already been created by another process.
The following code snippet shows how to handle such a situation. It shows
also the use of system routine ftok() to create an identifier for shmget()
starting from a numeric value, and the use of shmat() to associate a range of
virtual addresses with the shared memory segment.
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
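The body of the announced snippet is not reproduced in this text. The following sketch, under illustrative assumptions about the key and the segment size, shows the create-or-reference pattern just described, together with the use of ftok() and shmat().

#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <stdio.h>

#define SHM_SIZE 4096     /* illustrative segment size */

void *get_shared_area(void)
{
    key_t key;
    int shmid;
    void *addr;

    /* Build a systemwide key from an existing path and a project id */
    key = ftok(".", 42);

    /* Try to create the segment; this fails if it already exists */
    shmid = shmget(key, SHM_SIZE, IPC_CREAT | IPC_EXCL | 0666);
    if (shmid == -1) {
        /* The segment already exists: simply reference it */
        shmid = shmget(key, SHM_SIZE, 0666);
        if (shmid == -1) {
            perror("shmget");
            return NULL;
        }
    }

    /* Attach the segment; the OS chooses the virtual address range */
    addr = shmat(shmid, NULL, 0);
    if (addr == (void *)-1) {
        perror("shmat");
        return NULL;
    }
    return addr;
}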
In the case where the memory region is shared by a process and its children
processes, it is not necessary to explicitly define shared memory identifiers.
In fact, when a child process is created by fork(), it inherits the memory
segments defined by the parent process. So, in order to share memory with
children processes, it suffices, before calling fork(), to create and map a new
TABLE 7.1
Protection bitmask
Operation and permissions Octal value
Read by user 00400
Write by user 00200
Read by group 00040
Write by group 00020
Read by others 00004
Write by others 00002
shared memory segment passing constant IPC_PRIVATE as the first argument of shmget(). The memory identifier returned by shmget() will then be passed to shmat(), which will in turn return the starting address of the shared memory. When the second argument of shmat() is NULL (the common case), the operating system is free to choose the virtual address range for the shared memory. The access permissions of the segment are specified, in the third argument of shmget(), as a bitmask normally expressed in octal value as shown in Table 7.1. Octal value 0666 will specify read-and-write access for all processes. The following example, performing the same computation as the thread-based example in the previous section, illustrates the use of shared memory among children processes.
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>

#define MAX_PROCESSES 256
#define ROWS 10000L
#define COLS 10000L

/* Allocate the matrix M */
bigMatrix = malloc(ROWS * COLS * sizeof(long));
/* Fill the matrix with some values */
...
/* Spawn child processes */
From a programming point of view, the major conceptual difference with the
thread-based example is that parameters are not explicitly passed to child
processes. Rather, a variable within the program (currProcessIdx) is set to
the index of the child process just before calling fork() so that it can be used
in the child process to select the argument structure specific to it.
The attentive reader may be concerned about the fact that, since fork()
creates a clone of the calling process including the associated memory, the
amount of processing at every child process creation in the above example may
be very high due to the fact that the main process has allocated in memory a
very large matrix. Fortunately this is not the case because the memory pages
in the child process are not physically duplicated. Rather, the corresponding
page table entries in the child process refer to the same physical pages of the
parent process and are marked as Copy On Write. This means that, whenever
the page is accessed in read mode, both the parent and the child process refer
to the same physical page, and only upon a write operation is a new page
in memory created and mapped to the child process. So, pages that are only
read by the parent and child processes, such as the memory pages containing
the program code, are not duplicated at all. In our example, the big matrix is
written only before creating child processes, and therefore, the memory pages
for it are never duplicated, even if they are conceptually replicated for every
process. Nevertheless, process creation and context switches take more time
than for threads because more information, including the page table, has
to be saved and restored at every context switch.
Routines shmget() and shmat(), now incorporated into POSIX, derive
from the System V interface, one of the two major “flavors” of UNIX, the
other being Berkeley UNIX (BSD). POSIX also defines a different interface
for creating named shared memory objects, namely, the routine shm_open().
The arguments passed to shm_open() specify the systemwide name of the
shared memory object and the associated access mode and protection. In this
case, routine mmap(), which has been encountered in Chapter 2 for mapping
I/O into memory, is used to map the shared memory object onto a range of
process-specific virtual addresses.
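A minimal sketch of this alternative interface, using an arbitrary object name
/my_shm and an arbitrary size, could be the following:

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
    /* Create (or open) a named shared memory object; the name and
       size used here are arbitrary */
    int fd = shm_open("/my_shm", O_CREAT | O_RDWR, 0666);
    if (fd == -1)
    {
        perror("shm_open");
        return 1;
    }
    /* Set the size of the object */
    ftruncate(fd, 4096);

    /* Map it into the process address space */
    char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
    {
        perror("mmap");
        return 1;
    }

    p[0] = 'A';               /* the mapped region is now shared memory */
    munmap(p, 4096);
    close(fd);
    shm_unlink("/my_shm");
    return 0;
}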
Mutexes are declared as variables of type pthread_mutex_t and are initialized
via the routine
pthread_mutex_init(pthread_mutex_t *mutex,
                   pthread_mutexattr_t *attr)
where the first argument is a pointer to the mutex variable, and the second
one, when different from NULL, is a pointer to a variable holding the attributes
for the mutex. Such attributes will be explained later in this book, so, for the
moment, we will use the default attributes. Once initialized, a thread can lock
and unlock the mutex via routines
pthread_mutex_lock(pthread_mutex_t *mutex)
pthread_mutex_unlock(pthread_mutex_t *mutex)
Routine pthread mutex lock() is blocking, that is, the calling thread is pos-
sibly put in wait state. Sometimes it is more convenient just to check the
status of the mutex and, if the mutex is already locked, return immediately
with an error rather than returning only when the thread has acquired the
lock. The following routine does exactly this:
int pthread_mutex_trylock(pthread_mutex_t *mutex)
Finally, a mutex should be destroyed, that is, its associated resources released,
when it is no longer used:
pthread_mutex_destroy(pthread_mutex_t *mutex)
Recalling the producer/consumer example of Chapter 5, we can see that mutexes
are well suited to ensuring mutual exclusion for the segments of code that
update the circular buffer and change the index accordingly. In addition to
using critical sections when retrieving an element from the circular buffer and
when inserting a new one, consumers need also to wait until at least one el-
ement is available in the buffer, and producers have to wait until the buffer
is not full. This kind of synchronization is different from mutual exclusion
because it requires waiting for a given condition to occur. This is achieved
by pthread condition variables acting as monitors. Once a condition variable
has been declared and initialized, the following operations can be performed:
wait and signal. The former will suspend the calling thread until some other
thread executes a signal operation for that condition variable. The signal op-
eration will have no effect if no thread is waiting for that condition variable;
otherwise, it will wake only one waiting thread. In the producer/consumer
program, two condition variables will be defined: one to signal the fact that
the circular buffer is not full, and the other to signal that the circular buffer
is not empty. The producer performs a wait operation over the first condition
variable whenever it finds the buffer full, and the consumer will execute a sig-
nal operation over that condition variable after consuming one element of the
buffer. A similar sequence occurs when the consumer finds the buffer empty.
The prototypes of the pthread routines for initializing, waiting, signaling,
and destroying condition variables are respectively:
int pthread_cond_init(pthread_cond_t *condVar,
pthread_condattr_t *attr)
int pthread_cond_wait(pthread_cond_t *cond ,
pthread_mutex_t *mutex)
int pthread_cond_signal(pthread_cond_t *cond)
int pthread_cond_destroy(pthread_cond_t *cond)
The attr argument passed to pthread_cond_init() will specify whether the
condition variable can be shared also among threads belonging to different
processes. When NULL is passed as second argument, the condition variable
is shared only by threads belonging to the same process. The first argument of
pthread_cond_wait() and pthread_cond_signal() is the condition variable,
and the second argument of pthread_cond_wait() is a mutex variable that
must be locked at the time pthread_cond_wait() is called. This argument
may seem somewhat confusing, but it reflects the normal way condition vari-
ables are used. Consider the producer/consumer example, and in particular,
the moment in which the consumer waits, in a critical section, for the condi-
tion variable indicating that the circular buffer is not empty. If the mutex used
for the critical section were not released prior to issuing a wait operation, the
program would deadlock since no other thread could enter that critical sec-
tion. If it were released prior to calling pthread_cond_wait(), it may happen
that, just after finding the circular buffer empty and before issuing the wait
operation, a producer adds an element to the buffer and issues a signal
operation on that condition variable, which does nothing since no thread is
waiting for it yet. Soon after, the consumer issues a wait request, suspending
itself even though the buffer is not empty. It is therefore necessary to issue the
wait at the same time the mutex is unlocked, and this is the reason for the second
argument of pthread_cond_wait(), which atomically unlocks the mutex
and suspends the thread, and locks the mutex again just before returning
to the caller when the thread is awakened.
The following program shows the usage of mutexes and condition variables
when a producer thread puts integer data in a shared circular buffer, which
are then read by a set of consumer threads. The number of consumer threads
is passed as an argument to the program. A mutex is defined to protect inser-
tion and removal of elements into/from the circular buffer, and two condition
variables are used to signal the availability of data and room in the circular
buffer.
#define BUFFER_SIZE 128
/* Shared data */
int buffer[BUFFER_SIZE];
/* readIdx is the index in the buffer of the next item to be retrieved */
int readIdx = 0;
/* writeIdx is the index in the buffer of the next item to be inserted */
int writeIdx = 0;
/* Buffer empty condition corresponds to readIdx == writeIdx. Buffer full
   condition corresponds to (writeIdx + 1)%BUFFER_SIZE == readIdx */

    /* Initialize mutex and condition variables */
    pthread_mutex_init(&mutex, NULL);
    pthread_cond_init(&dataAvailable, NULL);
    pthread_cond_init(&roomAvailable, NULL);
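A minimal, self-contained sketch of how these primitives fit together, with one
producer and one consumer thread (the loop bounds and the thread set-up are
illustrative, not the book's full listing), is the following:

#include <pthread.h>
#include <stdio.h>

#define BUFFER_SIZE 128

static int buffer[BUFFER_SIZE];
static int readIdx = 0, writeIdx = 0;
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t dataAvailable = PTHREAD_COND_INITIALIZER;
static pthread_cond_t roomAvailable = PTHREAD_COND_INITIALIZER;

static void *producer(void *arg)
{
    for (int item = 0; item < 1000; item++)
    {
        pthread_mutex_lock(&mutex);
        /* Wait while the buffer is full */
        while ((writeIdx + 1) % BUFFER_SIZE == readIdx)
            pthread_cond_wait(&roomAvailable, &mutex);
        buffer[writeIdx] = item;
        writeIdx = (writeIdx + 1) % BUFFER_SIZE;
        /* Signal that new data are available */
        pthread_cond_signal(&dataAvailable);
        pthread_mutex_unlock(&mutex);
    }
    return NULL;
}

static void *consumer(void *arg)
{
    for (int i = 0; i < 1000; i++)
    {
        pthread_mutex_lock(&mutex);
        /* Wait while the buffer is empty */
        while (readIdx == writeIdx)
            pthread_cond_wait(&dataAvailable, &mutex);
        int item = buffer[readIdx];
        readIdx = (readIdx + 1) % BUFFER_SIZE;
        /* Signal that a new empty slot is available */
        pthread_cond_signal(&roomAvailable);
        pthread_mutex_unlock(&mutex);
        printf("consumed %d\n", item);
    }
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}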
A program using only pthread primitives is more easily portable across different
platforms than a program using Linux-specific synchronization primitives.
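POSIX semaphores provide another synchronization mechanism. Named
semaphores are created, or opened, with routine sem_open(); when the semaphore
is being created, the full form of its prototype is

sem_t *sem_open(const char *name, int oflag, mode_t mode, unsigned int value)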
where the first argument specifies the semaphore’s name. The second argu-
ment defines associated flags that specify, among other information, if the
semaphore has to be created if not yet existing. The third argument specifies
the associated access protection (as seen for shared memory), and the last
argument specifies the initial value of the semaphore in the case where this
has been created. sem_open() will return the address of a sem_t structure
to be passed to sem_wait() and sem_post(). Named semaphores are used
when they are shared by different processes, which then use the associated name
to identify the right semaphore. When the communicating processes are all
children of the same process, unnamed semaphores are preferable because it is
not necessary to define names that may collide with other semaphores used by
different processes. Unnamed semaphores are created by the following routine:
int sem_init(sem_t *sem, int pshared, unsigned int value)
sem init() will always create a new semaphore whose data structure will be
allocated in the sem t variable passed as first argument. The second argument
specifies whether the semaphore will be shared by different processes and will
be set to 0 only if the semaphore is to be accessed by threads belonging to the
same process. If the semaphore is shared among processes, the sem t variable
to host the semaphore data structures must be allocated in shared memory.
Lastly, the third argument specifies the initial value of the semaphore.
The following example is an implementation of our well-known produc-
er/consumer application where the producer and the consumers execute on
different processes and use unnamed semaphores to manage the critical section
and to handle producer/consumer synchronization. In particular, the initial
value of the semaphore (mutexSem) used to manage the critical section is set to
one, thus ensuring that only one process at a time can enter the critical section
by issuing first a P() (sem_wait()) and then a V() (sem_post()) operation.
The other two semaphores (dataAvailableSem and roomAvailableSem) will
contain the current number of available data slots and free ones, respectively.
Initially there will be no data slots and BUFFER_SIZE free slots, and therefore
the initial values of dataAvailableSem and roomAvailableSem will be 0 and
BUFFER_SIZE, respectively.
#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <semaphore.h>

#define MAX_PROCESSES 256
#define BUFFER_SIZE 128

/* Shared buffer, indexes, and semaphores are held in shared memory.
   readIdx is the index in the buffer of the next item to be retrieved;
   writeIdx is the index in the buffer of the next item to be inserted.
   Buffer empty condition corresponds to readIdx == writeIdx.
   Buffer full condition corresponds to
   (writeIdx + 1)%BUFFER_SIZE == readIdx.
   Semaphores used for synchronization:
   mutexSem is used to protect the critical section;
   dataAvailableSem is used to wait for data availability;
   roomAvailableSem is used to wait for room available in the buffer. */
struct BufferData {
    int readIdx;
    int writeIdx;
    int buffer[BUFFER_SIZE];
    sem_t mutexSem;
    sem_t dataAvailableSem;
    sem_t roomAvailableSem;
};
struct BufferData *sharedBuf;

/* Consumer routine */
static void consumer()
{
    int item;
    while (1)
    {
        /* Wait for availability of at least one data slot */
        sem_wait(&sharedBuf->dataAvailableSem);
        /* Enter critical section */
        sem_wait(&sharedBuf->mutexSem);
        /* Get data item */
        item = sharedBuf->buffer[sharedBuf->readIdx];
        /* Update read index */
        sharedBuf->readIdx = (sharedBuf->readIdx + 1)%BUFFER_SIZE;
        /* Signal that a new empty slot is available */
        sem_post(&sharedBuf->roomAvailableSem);
        /* Exit critical section */
        sem_post(&sharedBuf->mutexSem);
        /* Consume data item and take actions (e.g., return) */
        ...
    }
}
/* Producer routine */
static void producer()
{
    int item = 0;
    while (1)
    {
        /* Produce data item and take actions (e.g., return) */
        ...
        /* Wait for availability of at least one empty slot */
        sem_wait(&sharedBuf->roomAvailableSem);
        /* Enter critical section */
        sem_wait(&sharedBuf->mutexSem);
        /* Write data item */
        sharedBuf->buffer[sharedBuf->writeIdx] = item;
        /* Update write index */
        sharedBuf->writeIdx = (sharedBuf->writeIdx + 1)%BUFFER_SIZE;
        /* Signal that a new data slot is available */
        sem_post(&sharedBuf->dataAvailableSem);
        /* Exit critical section */
        sem_post(&sharedBuf->mutexSem);
    }
}
/* Main program: the passed argument specifies the number
   of consumers */
int main(int argc, char *args[])
{
    int memId;
    int i, nConsumers;
    pid_t pids[MAX_PROCESSES];
    if (argc != 2)
    {
        printf("Usage: prodcons <numProcesses>\n");
        exit(0);
    }
    sscanf(args[1], "%d", &nConsumers);
    /* Set up shared memory */
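A plausible sketch of the remaining set-up code, assuming the BufferData
structure is placed in a segment created with IPC_PRIVATE and the semaphores
are initialized as process-shared (this sketch is illustrative, not the book's
exact listing), is:

    /* Create a private shared memory segment large enough to hold
       the shared buffer, the indexes, and the semaphores */
    memId = shmget(IPC_PRIVATE, sizeof(struct BufferData), 0666);
    if (memId == -1)
    {
        perror("shmget");
        exit(0);
    }
    /* Attach the segment and initialize indexes and semaphores */
    sharedBuf = shmat(memId, NULL, 0);
    sharedBuf->readIdx = 0;
    sharedBuf->writeIdx = 0;
    sem_init(&sharedBuf->mutexSem, 1, 1);
    sem_init(&sharedBuf->dataAvailableSem, 1, 0);
    sem_init(&sharedBuf->roomAvailableSem, 1, BUFFER_SIZE);
    /* Launch the producer process */
    pids[0] = fork();
    if (pids[0] == 0)
    {
        producer();
        exit(0);
    }
    /* Launch the consumer processes */
    for (i = 0; i < nConsumers; i++)
    {
        pids[i + 1] = fork();
        if (pids[i + 1] == 0)
        {
            consumer();
            exit(0);
        }
    }
    /* Wait for the termination of all child processes */
    for (i = 0; i <= nConsumers; i++)
        waitpid(pids[i], NULL, 0);
    return 0;
}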
Observe that, in the above example, no check is performed on the read and
write indexes to determine whether data or free room is available. This check is
in fact implicit in the P() (sem_wait()) and V() (sem_post()) operations carried
out on the dataAvailableSem and roomAvailableSem semaphores.
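Message queues provide another interprocess communication mechanism. A
message queue is created, or an existing one referenced, by means of routine
msgget(), whose prototype is

int msgget(key_t key, int msgflg)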
whose first argument, if not IPC_PRIVATE, is the unique identifier of the message
queue, and the second argument specifies, among other things, the access
protection to the message queue, expressed as a bitmask as for shared memory. The
returned value is the message queue identifier to be used in the following
routines. New data items are inserted in the message queue by the following
routine:
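int msgsnd(int msqid, const void *msgp, size_t msgsz, int msgflg)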
where the first argument is the message queue identifier. The second argument
is a pointer to the data structure to be passed, whose length is specified in
the third argument. Such a structure defines, as its first long element, a user-
provided message type that can be used to select the messages to be received.
The last argument may define several options, such as specifying whether the
process is put in wait state when the message queue is full, or whether the
routine returns immediately with an error in this case.
Message reception is performed by the following routine:
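ssize_t msgrcv(int msqid, void *msgp, size_t msgsz, long msgtyp, int msgflg)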
whose arguments are the same as those of the previous routine, except for msgtyp,
which, if different from 0, specifies the type of the messages to be received. Unless
differently specified, msgrcv() will put the process in wait state if a message
of the specified type is not present in the queue.
The following example uses a message queue to exchange data items between
a producer and a set of consumer processes.
#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/wait.h>
#include <sys/msg.h>

#define MAX_PROCESSES 256
/* The type of message */
#define PRODCONS_TYPE 1
/* Message structure definition */
struct msgbuf {
    long mtype;
    int item;
};
/* Message queue id */
int msgId;

/* Consumer routine */
static void consumer()
{
    int retSize;
    struct msgbuf msg;
    int item;
    while (1)
    {
        /* Receive the message. msgrcv returns the size of the received message */
        retSize = msgrcv(msgId, &msg, sizeof(int), PRODCONS_TYPE, 0);
        if (retSize == -1)   /* If message reception failed */
        {
            perror("error msgrcv");
            exit(0);
        }
        item = msg.item;
        /* Consume data item */
        ...
    }
}
/* Producer routine */
static void producer()
{
    int item = 0;
    struct msgbuf msg;
    msg.mtype = PRODCONS_TYPE;
    while (1)
    {
        /* Produce data item */
        ...
        msg.item = item;
        msgsnd(msgId, &msg, sizeof(int), 0);
    }
}
/* Main program. The number of consumers
   is passed as argument */
int main(int argc, char *args[])
{
    int i, nConsumers;
    pid_t pids[MAX_PROCESSES];
    if (argc != 2)
    {
        printf("Usage: prodcons <nConsumers>\n");
        exit(0);
    }
    sscanf(args[1], "%d", &nConsumers);
    /* Initialize message queue */
    msgId = msgget(IPC_PRIVATE, 0666);
    if (msgId == -1)
    {
        perror("msgget");
        exit(0);
    }
    /* Launch producer process */
    pids[0] = fork();
    if (pids[0] == 0)
    {
        /* Child process */
        producer();
        exit(0);
    }
    /* Launch consumer processes */
    for (i = 0; i < nConsumers; i++)
    {
The above program is much simpler than the previous ones because there is no
need to worry about synchronization: everything is managed by the operating
system! Several factors, however, limit the applicability of message queues
in practice, among them the fact that they consume more system resources
than simpler mechanisms such as semaphores.
Routines msgget(), msgsnd(), and msgrcv(), now in the POSIX stan-
dard, originally belonged to the System V interface. POSIX defines also a
different interface for named message queues, that is, routines mq_open() to
create a message queue, and mq_send() and mq_receive() to send and receive
messages over a message queue, respectively. As in the case of shared memory
object creation, the definition of the message queue name is more immediate:
the name is directly passed to mq_open(), without the need for using ftok()
to create the identifier to be passed to the message queue creation routine.
On the other hand, msgget() (as well as shmget()) allows the creation of unnamed
message queues, which are shared by a process and its children with no risk
of conflict with other similar resources having the same name.
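As an illustration of the POSIX named message queue interface just mentioned,
the following sketch (not taken from the book's examples) creates a queue,
sends one message, and reads it back; the queue name /prodcons_mq and the
attribute values are arbitrary:

#include <fcntl.h>
#include <mqueue.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    struct mq_attr attr;
    attr.mq_flags = 0;
    attr.mq_maxmsg = 16;      /* maximum number of queued messages */
    attr.mq_msgsize = 64;     /* maximum size of a single message  */
    attr.mq_curmsgs = 0;

    /* Create (or open) the named queue; the name is chosen arbitrarily */
    mqd_t mq = mq_open("/prodcons_mq", O_CREAT | O_RDWR, 0666, &attr);
    if (mq == (mqd_t)-1)
    {
        perror("mq_open");
        return 1;
    }

    char out[64] = "hello";
    char in[64];
    unsigned int prio;

    /* Send a message with priority 0 and read it back */
    mq_send(mq, out, strlen(out) + 1, 0);
    mq_receive(mq, in, sizeof(in), &prio);
    printf("received: %s (priority %u)\n", in, prio);

    mq_close(mq);
    mq_unlink("/prodcons_mq");
    return 0;
}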
7.3.3 Signals
The synchronization mechanisms we have seen so far provide the neces-
sary components, which, if correctly used, allow building concurrent and dis-
tributed systems. However, it is sometimes necessary to handle the occurrence
of signals, that is, asynchronous events requiring some kind of action in
response. POSIX and ANSI define a set of signals, summarized in Table 7.2,
and the corresponding action can be specified using the following routine:
signal(int signum, void (*handler)(int))
where the first argument is the event number, and the second one is the address
of the event handler routine, which will be executed asynchronously when an
event of the specified type is sent to the process.
A typical use of routine signal() is for “trapping” the SIGINT event that
is generated by the <Ctrl>-C key combination. In this case, instead of an abrupt program
termination, it is possible to let a cleanup routine be executed, for example
closing the files which have been opened by the process and making sure that
their content is not corrupted. Another possible utilization of event handlers is
TABLE 7.2
Some signal events defined in Linux
Signal Name and Number        Description
SIGHUP 1 Hangup (POSIX)
SIGINT 2 Terminal interrupt (ANSI)
SIGQUIT 3 Terminal quit (POSIX)
SIGILL 4 Illegal instruction (ANSI)
SIGTRAP 5 Trace trap (POSIX)
SIGFPE 8 Floating point exception (ANSI)
SIGKILL 9 Kill (can’t be caught or ignored) (POSIX)
SIGUSR1 10 User-defined signal 1 (POSIX)
SIGSEGV 11 Invalid memory segment access (ANSI)
SIGUSR2 12 User-defined signal 2 (POSIX)
SIGPIPE 13 Write on a pipe with no reader, Broken pipe (POSIX)
SIGALRM 14 Alarm clock (POSIX)
SIGTERM 15 Termination (ANSI)
SIGSTKFLT 16 Stack fault
SIGCHLD 17 Child process has stopped or exited, changed (POSIX)
SIGCONT 18 Continue executing, if stopped (POSIX)
SIGSTOP 19 Stop executing (can’t be caught or ignored) (POSIX)
SIGTSTP 20 Terminal stop signal (POSIX)
SIGTTIN 21 Background process trying to read from TTY (POSIX)
SIGTTOU 22 Background process trying to write to TTY (POSIX)
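As a simple illustration of the cleanup use of signal() described above, the
following sketch (not part of the book's examples) installs a handler that
closes an open file before terminating when SIGINT is received; the file name
is arbitrary:

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static FILE *logFile;

/* Handler invoked asynchronously when SIGINT is delivered */
static void onSigint(int signum)
{
    /* Close the file so that its content is not left in an
       inconsistent state, then terminate the program */
    if (logFile != NULL)
        fclose(logFile);
    exit(0);
}

int main(void)
{
    logFile = fopen("log.txt", "w");   /* arbitrary file name */
    signal(SIGINT, onSigint);
    while (1)
    {
        fprintf(logFile, "working...\n");
        fflush(logFile);
        sleep(1);
    }
    return 0;
}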
At this point the reader may wonder why any more processes need to be used
when developing a concurrent application. After all, a single big process, host-
ing all the threads that cooperate in carrying out the required functionality,
may definitely appear as the best choice. Indeed, very often this is the case,
but threads have a weakness that is sometimes not acceptable, that is,
the lack of protection. As threads share the same memory, except for their stacks,
a wrong memory access performed by one thread may corrupt the data structures
of other threads. We have already seen that this is impossible among processes,
since their memories are guaranteed to be isolated by the operating system,
which builds a “fence” around them by properly setting the processes' page
tables. Therefore, if some code to be executed is not trusted, that is, there is
any likelihood that errors could arise during execution, the protection provided
by the process model is mandatory. An example is given by
Web Servers, which are typically concurrent programs because they must be
able to serve multiple clients at the same time. Serving an HTTP connection
may, however, also imply the execution of external code (i.e., not belonging
to the Web Server application), for example, when CGI scripts are activated.
If the Web Server were implemented using threads, the failure of a CGI script
could potentially crash the whole server. Conversely, if the Web Server is imple-
mented as a multiprocess application, failure of a CGI script will abort the
client connection, but the other connections remain unaffected.
7.6 Summary
This chapter has presented the Linux implementation of the concepts intro-
duced in Chapters 3 and 5. Firstly, the difference between Linux processes
and threads has been described, leading to a different memory model and
two different sets of interprocess communication primitives. The main differ-
ence between threads and processes lies in the way memory is managed; since
threads live in the context of a process, they share the same address space of
the hosting process, duplicating only the stack segment containing local vari-
ables and the call frames. This means in practice that static variables, which
are not located in the stack, are shared by all the threads created by a given
process.
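8
Interprocess Communication Primitives in FreeRTOS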
CONTENTS
8.1 FreeRTOS Threads and Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
8.2 Message Queues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
8.3 Counting, Binary, and Mutual Exclusion Semaphores . . . . . . . . . . . . . . . . 207
8.4 Clocks and Timers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
8.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
TABLE 8.1
Summary of the task-related primitives of FreeRTOS
Function Purpose Optional
vTaskStartScheduler Start the scheduler -
vTaskEndScheduler Stop the scheduler -
xTaskCreate Create a new task -
vTaskDelete Delete a task given its handle ∗
uxTaskPriorityGet Get the priority of a task ∗
vTaskPrioritySet Set the priority of a task ∗
vTaskSuspend Suspend a specific task ∗
vTaskResume Resume a specific task ∗
xTaskResumeFromISR Resume a specific task from an ISR ∗
xTaskIsTaskSuspended Check whether a task is suspended ∗
vTaskSuspendAll Suspend all tasks but the running one -
xTaskResumeAll Resume all tasks -
uxTaskGetNumberOfTasks Return current number of tasks -
void vTaskStartScheduler(void);
void vTaskEndScheduler(void);
The two scenarios are clearly very different because, in the first case, the return
is immediate and the application tasks are never actually executed, whereas
in the second the return is delayed and usually occurs when the application is
shut down in an orderly manner, for instance, at the user’s request. However,
portBASE_TYPE xTaskCreate(
pdTASK_CODE pvTaskCode,
const char * const pcName,
unsigned short usStackDepth,
void *pvParameters,
unsigned portBASE_TYPE uxPriority,
xTaskHandle *pvCreatedTask);
where:
• usStackDepth indicates how many stack words must be reserved for the
task stack. The stack word size depends on the underlying hardware archi-
tecture and is configured when the operating system is being ported onto
it. If necessary, the actual size of a stack word can be calculated by look-
ing at the portSTACK TYPE data type, defined in an architecture-dependent
header file that is automatically included by the main FreeRTOS header
file.
In Table 8.1, as well as in all the ensuing ones, those functions are marked as
optional.
Another important difference with respect to a POSIX-compliant oper-
ating system is that FreeRTOS—like most other small, real-time operating
systems—does not provide anything comparable to the POSIX thread can-
cellation mechanism. This mechanism is rather complex and allows POSIX
threads to decline or postpone deletion requests, or cancellation requests as
they are called in POSIX, directed to them. This is useful in ensuring that
these requests are honored only when it is safe to do so.
In addition, POSIX threads can also register a set of functions, called
cleanup handlers, which will be invoked automatically by the system while
a cancellation request is being honored, before the target thread is actually
deleted. Cleanup handlers, as their name says, provide therefore a good op-
portunity for POSIX threads to execute any last-second cleanup action they
may need to make sure that they leave the application in a safe and consistent
state upon termination.
On the contrary, task deletion is immediate in FreeRTOS, that is, it can
neither be refused nor delayed by the target task. As a consequence, the target
task may be deleted and cease execution at any time and location in the code,
and it will not have the possibility of executing any cleanup handler before
terminating. From the point of view of concurrent programming, it means that,
for example, if a task is deleted when it is within a critical region controlled
by a mutual exclusion semaphore, the semaphore will never be unlocked.
The high-level effect of the deletion is therefore the same as the termi-
nated task never having exited from the critical region: no other tasks will
ever be allowed to enter a critical region controlled by the same semaphore
in the future. Since this usually corresponds to a complete breakdown of any
concurrent program, the direct invocation of vTaskDelete should usually be
avoided, and it should be replaced by a more sophisticated deletion mecha-
nism.
One simple solution, mimicking the POSIX approach, is to send a deletion
request to the target task by some other means (for instance, one of the
interprocess communication mechanisms described in Sections 8.2 and 8.3)
and to design the target task so that it responds to the request by terminating
itself, at a well-known location in its own code, after any required cleanup
operation has been carried out.
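A minimal sketch of this approach, assuming a dedicated FreeRTOS queue
xDeleteRequestQueue carries deletion requests (both the queue and the task
body are illustrative, not taken from the book), might look as follows:

#include "FreeRTOS.h"
#include "task.h"
#include "queue.h"

/* Hypothetical names introduced for this sketch only */
extern xQueueHandle xDeleteRequestQueue;
extern void doWork(void);

static void vWorkerTask(void *pvParameters)
{
    portBASE_TYPE request;

    for (;;)
    {
        doWork();   /* normal processing, outside any critical region */

        /* Non-blocking check for a pending deletion request */
        if (xQueueReceive(xDeleteRequestQueue, &request, 0) == pdPASS)
        {
            /* Release any resources still held, then delete the
               calling task itself at a well-known, safe point */
            vTaskDelete(NULL);
        }
    }
}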
After creation, it is possible to retrieve the priority of a task and change
it by means of the functions
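unsigned portBASE_TYPE uxTaskPriorityGet(xTaskHandle pxTask);
void vTaskPrioritySet(xTaskHandle pxTask,
    unsigned portBASE_TYPE uxNewPriority);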
Both functions are optional, that is, they can be excluded from the oper-
ating system to reduce its code and data space requirements. They both take
a task handle, pxTask, as their first argument. The special value NULL can be
used as a shortcut to refer to the calling task.
The function vTaskPrioritySet modifies the priority of a task after it
has been created, and uxTaskPriorityGet returns the current priority of the
task. It should, however, be noted that both the priority given at task creation
and the priority set by vTaskPrioritySet represent the baseline priority of
the task.
Instead, uxTaskPriorityGet returns its active priority, which may differ
from the baseline priority when one of the mechanisms to prevent unbounded
priority inversion, to be discussed in Chapter 15, is in effect. More specifically,
FreeRTOS implements the priority inheritance protocol for mutual exclusion
semaphores. See also Section 8.3 for more information.
The pair of optional functions vTaskSuspend and vTaskResume take an
argument of type xTaskHandle according to the following prototypes:
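void vTaskSuspend(xTaskHandle pxTaskToSuspend);
void vTaskResume(xTaskHandle pxTaskToResume);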
They are used to suspend and resume the execution of the task identified
by the argument. For vTaskSuspend, the special value NULL can be used to
suspend the invoking task, whereas, obviously, it is impossible for a suspended
task to resume execution on its own initiative.
Like vTaskDelete, vTaskSuspend also may suspend the execution of a task
at an arbitrary point. Therefore, it must be used with care when the task to
be suspended contains critical sections—or, more generally, can get mutually
exclusive access to one or more shared resources—because those resources are
not implicitly released while the task is suspended.
FreeRTOS, like most other monolithic operating systems, does not hold a
full task context for interrupt handlers, and hence, they are not full-fledged
tasks. One of the consequences of this design choice is that interrupt handlers
cannot block or suspend themselves (informally speaking, there is no dedicated
space within the operating system to save their context into), and hence,
calling vTaskSuspend(NULL) from an interrupt handler makes no sense. For
related reasons, interrupt handlers are also not allowed to suspend regular
tasks by invoking vTaskSuspend with a valid xTaskHandle as argument.
The function
void vTaskSuspendAll(void);
suspends all tasks but the calling one. Interrupt handling is not suspended
and is still performed as usual.
Symmetrically, the function
portBASE_TYPE xTaskResumeAll(void);
1. Any FreeRTOS primitive that might block the caller for any reason
and even temporarily, or might require a context switch, must not
be used within this kind of critical region. This is because blocking
the only task allowed to run would completely lock up the system,
and it is impossible to perform a context switch with the scheduler
disabled.
2. Protecting critical regions with a sizable execution time in this way
would probably be unacceptable in many applications because it
leads to a large amount of unnecessary blocking. This is especially
true for high-priority tasks, because if one of them becomes ready
for execution while a low-priority task is engaged in a critical region
of this kind, it will not run immediately, but only at the end of the
critical region itself. See Chapter 15 for additional information on
how to compute the worst-case blocking time a task will suffer,
depending on the method used to address the unbounded priority
inversion problem.
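A minimal sketch of this kind of critical region (the shared counter and the
function name are illustrative) is shown below; note that nothing inside it can
block or force a context switch:

#include "FreeRTOS.h"
#include "task.h"

/* Hypothetical shared variable protected by suspending the scheduler */
static volatile unsigned long shared_count;

void vIncrementSharedCount(void)
{
    vTaskSuspendAll();       /* no other task can run from here on */
    shared_count++;          /* short, non-blocking critical region */
    xTaskResumeAll();        /* re-enable the scheduler */
}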
The last function related to task management simply returns the number of
tasks currently present in the system, regardless of their state:
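unsigned portBASE_TYPE uxTaskGetNumberOfTasks(void);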
Therefore, the count also includes the calling task and blocked tasks. More-
over, it may also include some tasks that have been deleted by vTaskDelete.
This is a side effect of the previously mentioned delayed dismissal of the
operating system's data structures associated with a task upon deletion.
TABLE 8.2
Summary of the main message-queue related primitives of FreeRTOS
Function Purpose Optional
xQueueCreate Create a message queue -
vQueueDelete Delete a message queue -
xQueueSendToBack Send a message -
xQueueSendToFront Send a high-priority message -
xQueueSendToBackFromISR . . . from an interrupt handler -
xQueueSendToFrontFromISR . . . from an interrupt handler -
xQueueReceive Receive a message -
xQueueReceiveFromISR . . . from an interrupt handler -
xQueuePeek Nondestructive receive -
uxQueueMessagesWaiting Query current queue length -
uxQueueMessagesWaitingFromISR . . . from an interrupt handler -
xQueueIsQueueEmptyFromISR Check if a queue is empty -
xQueueIsQueueFullFromISR Check if a queue is full -
xQueueHandle xQueueCreate(
unsigned portBASE_TYPE uxQueueLength,
unsigned portBASE_TYPE uxItemSize);
creates a new message queue, given the maximum number of elements it can
contain, uxQueueLength, and the size of each element, uxItemSize, expressed
in bytes. Upon successful completion, the function returns a valid message
queue handle to the caller, which must be used for any subsequent operation
on the queue just created. When an error occurs, the function returns a NULL
pointer instead.
When a message queue is no longer needed, it is advisable to delete it, in
order to reclaim its memory for future use, by means of the function
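void vQueueDelete(xQueueHandle xQueue);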
It should be noted that the deletion of a FreeRTOS message queue takes place
immediately and is never delayed even if some tasks are waiting on it. The
fate of the waiting tasks then depends on whether they specified a time limit
for the wait or not:
• if they did specify a time limit for the message queue operation, they will
receive an error indication when the operation times out;
• otherwise, they will be blocked forever.
After a message queue has been successfully created and its xQueue handle
is available for use, it is possible to send a message to it by means of the
functions
portBASE_TYPE xQueueSendToBack(
xQueueHandle xQueue,
const void *pvItemToQueue,
portTickType xTicksToWait);
portBASE_TYPE xQueueSendToFront(
xQueueHandle xQueue,
const void *pvItemToQueue,
portTickType xTicksToWait);
portBASE_TYPE xQueueSendToBackFromISR(
xQueueHandle xQueue,
const void *pvItemToQueue,
portBASE_TYPE *pxHigherPriorityTaskWoken);
portBASE_TYPE xQueueSendToFrontFromISR(
xQueueHandle xQueue,
const void *pvItemToQueue,
portBASE_TYPE *pxHigherPriorityTaskWoken);
• If the value is 0 (zero), the function returns an error indication to the caller
when the operation cannot be performed immediately because the message
queue is completely full at the moment.
• Any other value is interpreted as the maximum amount of time the function
will wait, expressed as an integral number of clock ticks. See Section 8.4
for more information about ticks.
The return value of xQueueSendToBack will be pdPASS if the function was
successful; any other value means that an error occurred. In particular, the error
code errQUEUE_FULL means that the function was unable to send the message
within the maximum amount of time specified by xTicksToWait because the
queue was full.
Unlike in POSIX, FreeRTOS messages do not have a full-fledged priority
associated with them, and hence, they are normally sent and received in First-
In, First-Out (FIFO) order. However, a high-priority message can be sent
using the xQueueSendToFront function instead of xQueueSendToBack. The
only difference between those two functions is that xQueueSendToFront sends
the message to the front of the message queue so that it passes over the other
messages stored in the queue and will be received before them.
Neither xQueueSendToBack nor xQueueSendToFront can be called
from an interrupt handler. Instead, either xQueueSendToBackFromISR or
xQueueSendToFrontFromISR must be used. The only differences with respect
to their regular counterparts are
• They cannot block the caller, and hence, they do not have a xTicksToWait
argument and always behave as if the timeout were 0, that is, they re-
turn an error indication to the caller if the operation cannot be concluded
immediately.
portBASE_TYPE xQueueReceive(
xQueueHandle xQueue,
void *pvBuffer,
portTickType xTicksToWait);
portBASE_TYPE xQueueReceiveFromISR(
xQueueHandle xQueue,
void *pvBuffer,
portBASE_TYPE *pxHigherPriorityTaskWoken);
portBASE_TYPE xQueuePeek(
xQueueHandle xQueue,
void *pvBuffer,
portTickType xTicksToWait);
All these functions take a message queue handle, xQueue, as their first ar-
gument; this is the message queue they will work upon. The second argument,
pvBuffer, is a pointer to a memory buffer into which the function will store
the message just received. It must be large enough to hold the message, that
is, at least as large as a message queue item as declared when the queue was
created.
In the case of xQueueReceive, the last argument, xTicksToWait, speci-
fies how much time the function should wait for a message to become avail-
able if the message queue was completely empty when it was invoked. The
valid values of xTicksToWait are the same already mentioned when discussing
xQueueSendToBack.
The return value of xQueueReceive will be pdPASS if the function was
successful; any other value means that an error occurred. In particular, the
error code errQUEUE_EMPTY means that the function was unable to receive
a message within the maximum amount of time specified by xTicksToWait
because the queue was empty. In this case, the buffer pointed to by pvBuffer
will not contain any valid message after xQueueReceive returns.
The function xQueueReceive, when successful, removes the message it just
received from the message queue so that each message sent to the queue is
received exactly once. On the contrary, the function xQueuePeek simply copies
the message into the memory buffer indicated by the caller without removing
it from the queue. It takes the same arguments as xQueueReceive.
The function xQueueReceiveFromISR is the variant of xQueueReceive
that must be used within an interrupt handler. It never blocks, but it re-
unsigned portBASE_TYPE
uxQueueMessagesWaiting(const xQueueHandle xQueue);
unsigned portBASE_TYPE
uxQueueMessagesWaitingFromISR(const xQueueHandle xQueue);
portBASE_TYPE
xQueueIsQueueEmptyFromISR(const xQueueHandle xQueue);
portBASE_TYPE
xQueueIsQueueFullFromISR(const xQueueHandle xQueue);
These functions should be used with caution because, although the informa-
tion they return is certainly correct and valid at the time of the call, the scope
of its validity is somewhat limited. It is worth mentioning, for example, that
the information may no longer be valid and should not be relied upon when
any subsequent message queue operation is attempted because other tasks
may have changed the queue status in the meantime.
For example, calling uxQueueMessagesWaiting beforehand and obtaining a
result less than the total length of the message queue is not enough to
guarantee that the same task will be able to immediately conclude
an xQueueSendToBack in the near future: other tasks, or interrupt handlers,
may have sent additional items into the queue and filled it completely in the
meantime.
The following program shows how the producers/consumers problem can
be solved using a FreeRTOS message queue.
/* Producers/Consumers problem solved with a FreeRTOS message queue */
    while (1)
    {
        /* A real producer would put together an actual data item.
           Here, we block for a while and then make up a fake item.
        */
        vTaskDelay(PRODUCER_DELAY);
        item = args->n*1000 + c;
        c++;

    while (1)
    {
        /* Receive a data item from the front of the queue, waiting if
           the queue is empty. portMAX_DELAY means that there is no
           upper bound on the amount of wait.
        */
        if (xQueueReceive(args->q, &item, portMAX_DELAY) != pdPASS)
            printf("* Consumer #%d unable to receive\n", args->n);
        else
            printf("Consumer #%d - received item %6d\n", args->n, item);

    else
    {
        /* Create NP producer tasks */
        for (i=0; i<NP; i++)
        {
            prod_args[i].n = i;    /* Prepare the arguments */
            prod_args[i].q = q;

        /* Create NC consumer tasks */
        for (i=0; i<NC; i++)
        {
            cons_args[i].n = i;
            cons_args[i].q = q;

        vTaskStartScheduler();
        printf("* vTaskStartScheduler() failed\n");
    }
The main program first creates the message queue that will be used for
interprocess communication, and then a few producer and consumer tasks.
For the sake of the example, the number of tasks to be created is controlled
by the macros NP and NC, respectively.
The producers will all execute the same code, that is, the function
producer_code. Each of them receives as argument a pointer to a struct
task_args_s that holds two fields:
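• n, the identification number of the task;
• q, the handle of the message queue to be used for communication.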
In this way, all tasks can work together by only looking at their arguments and
without sharing any variable, as foreseen by the message-passing paradigm.
Symmetrically, the consumers all execute the function consumer_code, which
has a very similar structure.
TABLE 8.3
Summary of the semaphore creation/deletion primitives of FreeRTOS
Function Purpose Optional
xSemaphoreCreateCounting Create a counting semaphore ∗
vSemaphoreCreateBinary Create a binary semaphore -
xSemaphoreCreateMutex Create a mutex semaphore ∗
xSemaphoreCreateRecursiveMutex Create a recursive mutex ∗
vQueueDelete Delete a semaphore of any kind -
• Both the maximum and initial value of a binary semaphore are constrained
to be 1, and hence, they are not explicitly indicated.
• Binary semaphores are the only kind of semaphore that is always available
for use in FreeRTOS, regardless of its configuration. All the others are
optional.
Mutual exclusion semaphores are created by means of two different functions,
depending on whether the recursive lock and unlock feature is desired or not:
xSemaphoreHandle xSemaphoreCreateMutex(void);
xSemaphoreHandle xSemaphoreCreateRecursiveMutex(void);
In both cases, the creation function returns either a semaphore handle upon
successful completion, or NULL. All mutual exclusion semaphores are unlocked
when they are first created, and priority inheritance is always enabled for
them.
Since FreeRTOS semaphores of all kinds are built on top of a message
queue, they can be deleted by means of the function vQueueDelete, already
discussed in Section 8.2. Also in this case, the semaphore is destroyed imme-
diately even if there are some tasks waiting on it.
After being created, all kinds of semaphores except recursive, mutual ex-
clusion semaphores are acted upon by means of the functions xSemaphoreTake
and xSemaphoreGive, the FreeRTOS counterpart of P() and V(), respectively.
Both take a semaphore handle xSemaphore as their first argument:
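portBASE_TYPE xSemaphoreTake(xSemaphoreHandle xSemaphore,
    portTickType xTicksToWait);
portBASE_TYPE xSemaphoreGive(xSemaphoreHandle xSemaphore);
Within an interrupt handler, xSemaphoreGiveFromISR must be used instead of
xSemaphoreGive:
portBASE_TYPE xSemaphoreGiveFromISR(xSemaphoreHandle xSemaphore,
    portBASE_TYPE *pxHigherPriorityTaskWoken);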
Like many other FreeRTOS primitives that can be invoked from an
interrupt handler, this function returns to the caller, in the variable pointed
to by pxHigherPriorityTaskWoken, an indication of whether or not it awak-
ened a task with a priority higher than the task which was running when the
interrupt handler started.
The interrupt handler should use this information, as discussed in Sec-
tion 8.1, to determine if it should invoke the FreeRTOS scheduling algorithm
before exiting. The function also returns either pdTRUE or pdFALSE, depending
on whether it was successful or not.
The last pair of functions to be discussed here are the counterpart of
xSemaphoreTake and xSemaphoreGive, to be used with recursive mutual ex-
clusion semaphores:
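portBASE_TYPE xSemaphoreTakeRecursive(xSemaphoreHandle xMutex,
    portTickType xTicksToWait);
portBASE_TYPE xSemaphoreGiveRecursive(xSemaphoreHandle xMutex);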
Both their arguments and return values are the same as xSemaphoreTake
and xSemaphoreGive, respectively. Table 8.4 summarizes the FreeRTOS func-
tions that work on semaphores.
The following program shows how the producers–consumers problem can
be solved using a shared buffer and FreeRTOS semaphores.
TABLE 8.4
Summary of the semaphore manipulation primitives of FreeRTOS
Function Purpose Optional
xSemaphoreTake Perform a P() on a semaphore -
xSemaphoreGive Perform a V() on a semaphore -
xSemaphoreGiveFromISR . . . from an interrupt handler -
xSemaphoreTakeRecursive P() on a recursive mutex ∗
xSemaphoreGiveRecursive V() on a recursive mutex ∗
int buf[N];
int in = 0, out = 0;
xSemaphoreHandle empty, full;
xSemaphoreHandle mutex;

    while (1)
    {
        /* A real producer would put together an actual data item.
           Here, we block for a while and then make up a fake item.
        */
        vTaskDelay(PRODUCER_DELAY);
        item = args->n*1000 + c;
        c++;

        /* Mutual exclusion for buffer access */
        else if (xSemaphoreTake(mutex, portMAX_DELAY) != pdTRUE)
            printf("* Producer %d unable to take 'mutex'\n", args->n);
        else
        {
            /* Store data item into 'buf', update 'in' index */
            buf[in] = item;
            in = (in + 1) % N;
            /* Release mutex */
            if (xSemaphoreGive(mutex) != pdTRUE)
                printf("* Producer %d unable to give 'mutex'\n", args->n);

    while (1)
    {
        /* Synchronize with producers */
        if (xSemaphoreTake(full, portMAX_DELAY) != pdTRUE)
            printf("* Consumer %d unable to take 'full'\n", args->n);
        /* Mutual exclusion for buffer access */
        else if (xSemaphoreTake(mutex, portMAX_DELAY) != pdTRUE)
            printf("* Consumer %d unable to take 'mutex'\n", args->n);
        else
        {
            /* Get data item from 'buf', update 'out' index */
            item = buf[out];
            out = (out + 1) % N;
            /* Release mutex */
            if (xSemaphoreGive(mutex) != pdTRUE)
                printf("* Consumer %d unable to give 'mutex'\n", args->n);

    else
    {
        /* Create NP producer tasks */
        for (i=0; i<NP; i++)
        {
            prod_args[i].n = i;    /* Prepare the argument */

        /* Create NC consumer tasks */
        for (i=0; i<NC; i++)
        {
            cons_args[i].n = i;

        vTaskStartScheduler();
        printf("* vTaskStartScheduler() failed\n");
    }
As before, the main program takes care of initializing the shared synchro-
nization and mutual exclusion semaphores needed by the application, creates
several producers and consumers, and then starts the scheduler. Even if the
general parameter passing strategy adopted in the previous example has been
maintained, the only argument passed to the tasks is their identification num-
ber because the semaphores, as well as the data buffer itself, are shared and
globally accessible.
With respect to the solution based on message queues, the most important
difference to be remarked is that, in this case, the data buffer shared between
the producers and consumers must be allocated and handled explicitly by
the application code, instead of being hidden behind the operating system's
implementation of message queues. In the example, it has been implemented
by means of the (circular) buffer buf[], assisted by the input and output
indexes in and out.

TABLE 8.5
Summary of the time-related primitives of FreeRTOS
Function            Purpose                      Optional
xTaskGetTickCount   Get current time, in ticks   -
vTaskDelay          Relative time delay          ∗
vTaskDelayUntil     Absolute time delay          ∗

FIGURE 8.1
Comparison between relative and absolute time delays, as implemented by
vTaskDelay and vTaskDelayUntil.
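The prototypes of the two delay primitives listed in Table 8.5 are

void vTaskDelay(portTickType xTicksToDelay);
void vTaskDelayUntil(portTickType *pxPreviousWakeTime,
    portTickType xTimeIncrement);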
The following example shows how relative and absolute delays can be used in
an actual piece of code. When run for a long time, the example is also useful
in better highlighting the difference between those two kinds of delay. In fact,
it can be seen that the wake-up time of task rel_delay (which uses a relative
delay) not only drifts forward but is also irregular, because the variations in
its response time are not accounted for when determining the delay before its
next activation. On the contrary, the wake-up time of task abs_delay (which
uses an absolute delay) does not drift and is strictly periodic.
/* FreeRTOS vTaskDelay versus vTaskDelayUntil */
    vTaskDelay(PERIOD);

    while (1)
    {
        /* Block until the instant last_wakeup + PERIOD,
           then update last_wakeup and return
        */
        vTaskDelayUntil(&last_wakeup, PERIOD);

    else
    {
        vTaskStartScheduler();
        printf("* vTaskStartScheduler() failed\n");
    }
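Since the listing above is abridged, the following minimal sketch (the names
PERIOD and abs_delay_code are illustrative) shows the typical structure of a
strictly periodic task based on vTaskDelayUntil:

#include "FreeRTOS.h"
#include "task.h"

#define PERIOD ((portTickType)100)   /* period in ticks, arbitrary value */

/* Body of a strictly periodic task based on an absolute delay */
static void abs_delay_code(void *pvParameters)
{
    /* Record the activation time once, before entering the loop */
    portTickType last_wakeup = xTaskGetTickCount();

    for (;;)
    {
        /* Periodic work goes here */

        /* Block until last_wakeup + PERIOD; last_wakeup is updated
           automatically, so the period does not drift */
        vTaskDelayUntil(&last_wakeup, PERIOD);
    }
}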
8.5 Summary
In this chapter, we filled the gap between the abstract concepts of multipro-
gramming and IPC, presented in Chapters 3 through 6, and what real-time
operating systems actually offer programmers when the resources at their dis-
posal are severely constrained.
FreeRTOS, an open-source, real-time operating system targeted to small
embedded systems, has been considered as a case study. This is in sharp con-
trast to what was shown in Chapter 7, which deals instead with a full-fledged,
POSIX-compliant operating system like Linux. This also gives the readers the
opportunity of comparing several real-world code examples, written in C for
these two very dissimilar execution environments.
The first important difference is about how FreeRTOS implements multi-
processing. In fact, to cope with hardware limitations and to simplify the
implementation, FreeRTOS does not support multiple processes but only
threads, or tasks, all living within the same address space. With respect to a
POSIX system, task creation and deletion are much simpler and less sophis-
ticated, too.
As far as IPC is concerned, the primitives provided by FreeRTOS are rather
conventional, and do not depart significantly from any of the abstract concepts
discussed earlier in the book. The most important aspect that is worth not-
ing is that FreeRTOS sometimes maps a single abstract concept into several
distinct, concrete objects.
For example, the abstract semaphore corresponds to four different “flavors”
of semaphore in FreeRTOS, each representing a different trade-off between the
flexibility and power of the object and the efficiency of its implementation.
This is exactly the reason why this approach is rather common and is also
taken by most other real-world operating systems.
Another noteworthy difference is that, quite unsurprisingly, time plays
a central role in a real-time operating system. For this reason, all abstract
primitives that may block a process (such as a P() on a semaphore) have
been extended to support a timeout mechanism. In this way, the caller can
specify the maximum amount of time it is willing to block in any given primitive
and does not run the risk of being blocked forever if something goes wrong.
Last but not least, FreeRTOS also provides a couple of primitives to syn-
chronize a task with the elapsed time. Even if those primitives were not dis-
cussed in abstract terms, they are especially important anyway in a real-time
system because they lay out the foundation for executing any kind of periodic
activity in the system. Moreover, they also provide a convenient way to insert
a controlled delay in a task without wasting processing power for it.
9
Network Communication
CONTENTS
9.1 The Ethernet Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
9.2 TCP/IP and UDP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
9.3 Sockets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
9.3.1 TCP/IP Sockets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
9.4 UDP Sockets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
9.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
We have seen in Chapters 7 and 8 how processes and threads can communicate
within the same computer. This chapter will introduce the concepts and inter-
faces for achieving communication among different computers to implement
distributed applications. Distributed applications involving network commu-
nication are used in embedded systems for a variety of reasons, among which
are
• Computing Power : Whenever the computing power needed by the applica-
tion cannot be provided by a single computer, it is necessary to distribute
the application among different machines, each carrying out a part of the
required computation and coordinating with the others via the network.
• Distributed Data: Often, an embedded system is required to acquire and
process data coming from different locations in the controlled plant. In
this case, one or more computers will be dedicated to data acquisition
and first-stage data processing. They will then send the preprocessed data to
other computers that will complete the computation required for the control loop.
• Single Point of Failure: For some safety-critical applications, such as aircraft
control, it is important that the system does not exhibit a single point of
failure, that is, the failure of a single computer cannot bring the system
down. In this case, it is necessary to distribute the computing load among
separate machines so that, in case of failure of one of them, another one
can resume the activity of the failed component.
Here we shall concentrate on the most widespread programming interface for
network communication based on the concept of socket. Before describing the
programming interface, we shall briefly review some basic concepts in network
communication with an eye on Ethernet, a network protocol widely used in
local area networks (LANs).
• MAC Source (6 octets): The MAC address of the sender of the frame.
• Packet Length (2 octets): Coding either the length of the data frame or
other special information about the packet type.
FIGURE 9.1
Network frames: Ethernet, IP, and TCP/IP.
the packets, check for duplicated data assembling and de-assembling of data
packets, and traffic congestion control. TCP defines its own data packet for-
mat, which brings, in addition to data themselves, all the required information
for reliable and stream-oriented communication, including Source/Destination
port definitions and Packet Sequence and Acknowledge numbers used to de-
tect lost packets and handle retransmission.
Since the TCP layer is built on top of the Internet layer, the latter cannot know
anything about the structure of the TCP data packet, which is contained in
the data part of the Internet packet, as shown in Figure 9.1. So, when a data
packet is received by the Internet layer (possibly contained in the payload of
an Ethernet data packet), the specific header fields will be used by the Internet
layer, which will pass the data content of the packet to the above TCP layer,
which in turn will interpret this as a TCP packet.
The abstraction provided by the TCP layer represents an effective way
to achieve network communication and, for this reason, TCP/IP communica-
tion is widely used in applications. In the next section we shall present the
programming model of TCP/IP and illustrate it in a sample client/server ap-
plication. This is, however, not the end of the story: many other protocols are
built over TCP/IP, such as File Transfer Protocol (FTP), and the ubiquitous
Hypertext Transfer Protocol (HTTP) used in web communication.
without knowing which are the recipients. This would not be possible using
TCP/IP because a connection should be established with every listener, and
therefore, its address must be known in advance. The User Datagram Protocol
(UDP) [73], which is built over the Internet layer, lets computer applications
send messages, in this case referred to as datagrams, to other hosts on an
Internet network without the need of establishing point-to-point connections.
In addition, UDP provides multicast capability, that is, it allows sending of
datagrams to sets of recipients without even knowing their IP addresses. On
the other hand, the communication model offered by UDP is less sophisticated
than that of TCP/IP, and data reliability is not provided. Later in this chapter,
the programming interface of UDP will be presented, together with a sample
application using UDP multicast communication.
9.3 Sockets
The programming interface for TCP/IP and UDP is centered around the con-
cept of socket, which represents the endpoint of a bidirectional interprocess
communication flow. The creation of a socket is therefore the first step in the
procedure for setting up and managing network communication. The proto-
type of the socket creation routine is
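int socket(int domain, int type, int protocol);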
where domain selects the protocol family that will be used for communi-
cation. In the case of the Internet, the communication domain is AF_INET.
type specifies the communication semantics, which can be SOCK_STREAM or
SOCK_DGRAM for TCP/IP or UDP communication, respectively. The last ar-
gument, protocol, specifies a particular protocol within the communication
domain to be used with the socket. Normally, only a single protocol exists
and, therefore, the argument is usually specified as 0.
The creation of a socket represents the only common step when man-
aging TCP/IP and UDP communication. In the following we shall first de-
scribe TCP/IP programming using a simple client–server application. Then
UDP communication will be described by presenting a program for multicast
notification.
of command, from clients and returns other character strings representing the answer to the commands. It is worth noting that a high-level protocol for information exchange must be handled by the program: TCP/IP sockets in fact provide full-duplex, point-to-point communication in which the communication partners can send and receive bytes, but it is up to the application to organize transmission and reception to avoid, for example, situations in which the two communication partners both hang waiting to receive some data from the other. The protocol defined in the program below is a simple one and can be summarized as follows:
• The client initiates the transaction by sending a command to be executed.
To do this, it first sends the length (4 bytes) of the command string, followed
by the command characters. Sending the string length first allows the server
to receive the correct number of bytes afterwards.
• The server, after receiving the command string, executes the command
getting an answer string, which is sent back to the client. Again, first the
length of the string is sent, followed by the answer string characters. The
transaction is then terminated and a new one can be initiated by the client.
Observe that, in the protocol used in the example, numbers and single-byte
characters are exchanged between the client and the server. When exchanging
numbers that are represented by two, four, or more bytes, the programmer
must take into account the possible difference in byte ordering between the
client and the server machine. Getting weird numbers from a network connection is one of the main sources of headaches for novice network programmers. Luckily, there is no need to devise exotic ways of discovering whether the client and the server use different byte orders and to shuffle bytes manually: it suffices to use a few routines available in the network API that convert short and integer numbers to and from the network byte order, which is, by
convention, big endian.
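For instance, a 32-bit string length and a 16-bit port number can be converted with the standard conversion routines as in the following fragment; the variable names are illustrative only, and the values are assumed to be in the host's native byte order initially:

uint32_t net_len   = htonl(str_len);    /* 32-bit value, host to network order */
uint16_t net_port  = htons(port);       /* 16-bit value, host to network order */
uint32_t host_len  = ntohl(net_len);    /* back from network to host order     */
uint16_t host_port = ntohs(net_port);   /* back from network to host order     */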
Another possible source of frustration for network programmers is the fact that the recv() routine for receiving a given number of bytes from the socket does not necessarily return after the specified number of bytes has been read: it may return when a smaller number of bytes has been received, reporting the actual number of bytes read. This occurs very seldom in practice
and typically not when the program is tested, since it is related to the level of
congestion of the network. Consequently, when not properly managed, this fact
generates random communication errors that are very hard to reproduce. In
order to receive a given number of bytes, it is therefore necessary to check the
number of bytes returned by recv(), possibly issuing the read operation again until all the expected bytes are read, as done by routine receive() in this
program.
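The following is a minimal sketch of such a loop; it is not the routine receive() used in the listings of this chapter, just an illustration of the technique:

/* Read exactly size bytes from socket sd into buf. Returns 0 on success,
   -1 if an error occurs or the connection is closed by the peer. */
static int receive_all(int sd, char *buf, size_t size)
{
    size_t done = 0;
    while (done < size)
    {
        ssize_t n = recv(sd, buf + done, size - done, 0);
        if (n <= 0)            /* error or orderly shutdown by the peer */
            return -1;
        done += (size_t)n;     /* partial read: issue recv() again      */
    }
    return 0;
}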
The client program is listed below:
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
The above program first creates a socket and connects it to the server whose
IP Address and port are passed in the command string. Socket connection
is performed by routine connect(), and the server address is specified in a
variable of type struct sockaddr_in, which is defined as follows:
struct sockaddr_in {
short sin_family; // Address family e.g. AF_INET
unsigned short sin_port; // Port number in Network Byte order
struct in_addr sin_addr; // see struct in_addr, below
char sin_zero[8]; //Padding zeroes
};
struct in_addr {
unsigned long s_addr; //4 byte IP address
};
The Internet address is internally specified as a 4-byte integer but is presented to users in the usual dot notation. The conversion from the human-readable notation to the integer address is carried out by routine gethostbyname(), which fills a struct hostent variable with several pieces of address-related information. We are interested here (and in almost all applications in practice) in field h_addr, which contains the resolved IP address and which is copied into the corresponding field of variable sin. When connect() returns successfully, the connection with the server is established, and data can be exchanged. Here, the exchanged information is represented by character strings: command strings are sent to the server and, for every command, an answer string is received. The length of the string is sent first, converted into network byte order by routine htonl(), followed by the string characters. Afterwards, the answer is obtained by first reading its length and converting it from network byte order via routine ntohl(), and then reading the expected number of characters.
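As a condensed sketch of the connection setup just described (error handling omitted, and the variable names serverName and port chosen for illustration only):

struct sockaddr_in sin;
struct hostent *hp;
int sd = socket(AF_INET, SOCK_STREAM, 0);           /* TCP socket               */

hp = gethostbyname(serverName);                     /* resolve name or dot notation */
memset(&sin, 0, sizeof(sin));
sin.sin_family = AF_INET;
memcpy(&sin.sin_addr.s_addr, hp->h_addr, hp->h_length);  /* resolved IP address */
sin.sin_port = htons(port);                         /* port in network byte order */

connect(sd, (struct sockaddr *)&sin, sizeof(sin));  /* establish the connection */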
The server code is listed below and differs in several points from the client code. First of all, the server does not have to know the address of the clients: after creating a socket, binding it to the port number (i.e., the port number clients will specify to connect to the server), and specifying the maximum number of pending connection requests via the listen() routine, the server suspends itself in a call to routine accept(). This routine will return a new socket to be used to communicate with the client that just established the connection.
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <string.h>
#include <stdlib.h>
/* Handle an established connection.
   Routine receive is listed in the previous example */
static void handleConnection(int currSd)
{
    unsigned int netLen;
    int len;
    char *command, *answer;
    for (;;)
    {
        /* Get the command string length.
           If receive fails, the client most likely exited */
/* Main Program */
main(int argc, char *argv[])
{
    int sd, currSd;
    int sAddrLen;
    int port;
    int len;
    unsigned int netLen;
    char *command, *answer;
    struct sockaddr_in sin, retSin;
    /* The port number is passed as command argument */
    if (argc < 2)
    {
        printf("Usage: server <port>\n");
        exit(0);
    }
    sscanf(argv[1], "%d", &port);
    /* Create a new socket */
    if ((sd = socket(AF_INET, SOCK_STREAM, 0)) == -1)
    {
        perror("socket");
        exit(1);
    }
    /* Initialize the address (struct sockaddr_in) fields */
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = INADDR_ANY;
    sin.sin_port = htons(port);
In the above example, the server program has two nested loops: the external
loop waits for incoming connections, and the internal one, defined in routine
handleConnection(), handles the connection just established until the client
exits. Observe that the way the connection is terminated in the example is
rather harsh: the inner loop breaks whenever an error is reported when either reading or writing the socket, under the assumption that the error is due to the client having exited. A more graceful way to terminate the connection would have been to provide in the client–server protocol an explicit command for closing the communication. This would also allow discriminat-
ing between possible errors in the communication and the natural termination
of the connection.
Another consequence of the nested loop approach in the above program is that the server, while serving one connection, is not able to accept any other connection request. This fact may severely limit the functionality of a network server: imagine a web server that is able to serve only one connection at a time! Fortunately, there is a ready solution to this problem: let a separate thread (or process) handle the established connection, allowing the main process to accept other connection requests. This also explains the apparently strange fact that routine accept() returns a new socket to be used in the following communication with the client. The returned socket, in fact, is specific to the communication with that client, while the original socket can still be used to issue accept() again.
The above program can be turned into a multithreaded server by simply replacing the external loop accepting incoming connections, as follows:
/* Thread routine. It calls routine handleConnection()
   defined in the previous program. */
static void *connectionHandler(void *arg)
{
    int currSock = *(int *)arg;
    handleConnection(currSock);
    free(arg);
    pthread_exit(0);
    return NULL;
}
...
/* Replacement of the external (accept) loop of the previous program */
for (;;)
{
In the new version of the program, routine handleConnection(), which manages the communication with the client, is wrapped into a thread. The only small change in the program is that the socket descriptor is passed by address, because the thread routine accepts a pointer argument. The new server can now accept and serve incoming connections in parallel.
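A minimal sketch of that replacement accept loop, reusing sd, retSin, and connectionHandler() from the listings above and assuming pthread.h is included, might look as follows; the exact listing may differ in its details:

for (;;)
{
    pthread_t handler;
    socklen_t addrLen = sizeof(retSin);
    /* The socket descriptor is passed through dynamically allocated memory,
       which is freed by connectionHandler() itself */
    int *currSdPtr = malloc(sizeof(int));
    *currSdPtr = accept(sd, (struct sockaddr *)&retSin, &addrLen);
    pthread_create(&handler, NULL, connectionHandler, currSdPtr);
}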
built above IP that allows applications to send and receive messages, called datagrams, over an IP network. Unlike TCP/IP, the communication does not require a preliminary phase to set up a client–server connection and, for this reason, it is called connectionless. UDP provides an unreliable service: datagrams may arrive out of order, duplicated, or lost, and these conditions must be handled in the user application. Conversely, faster communication can be achieved with respect to reliable protocols because UDP introduces less overhead. As for TCP/IP, message senders and receivers are uniquely identified by the pair (IP address, port). No connection is established prior to communication, and datagrams, sent and received by routines sendto() and recvfrom(), respectively, can be exchanged with any other communication partner. In addition to specifying a datagram recipient in the form
(IP Address, port), UDP allows broadcast, that is, sending the datagram to
all the recipients in the network, and multicast, that is, sending the data-
gram to a set of recipients. In particular, multicast communication is useful
in distributed embedded applications because it is often required that data
are exchanged among groups of communicating actors. The approach taken in
UDP multicast is called publish–subscribe, and the set of IP addresses ranging
from 224.0.0.0 to 239.255.255.255 is reserved for multicast communication.
When an address is chosen for multicast communication, it is used by the
sender, and receivers must register themselves for receiving datagrams sent to
such address. So, the sender is not aware of the actual receivers, which may
change over time.
The use of UDP multicast communication is explained by the following
sender and receiver programs: the sender sends a string message to the mul-
ticast address 225.0.0.37, and the message is received by every receiver that
subscribed to that multicast address.
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
/* Port number used in the application */
#define PORT 4444
/* Multicast address */
#define GROUP "225.0.0.37"
/* Sender main program: get the string from the command argument */
main(int argc, char *argv[])
{
    struct sockaddr_in addr;
    int sd;
    char *message;
    /* Get message string */
    if (argc < 2)
    {
        printf("Usage: sendUdp <message>\n");
        exit(0);
    }
    message = argv[1];
    /* Create the socket. The second argument specifies that
       this is an UDP socket */
    if ((sd = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
    {
        perror("socket");
        exit(0);
    }
    /* Set up destination address: same as TCP/IP example */
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = inet_addr(GROUP);
    addr.sin_port = htons(PORT);
    /* Send the message */
    if (sendto(sd, message, strlen(message), 0,
        (struct sockaddr *)&addr, sizeof(addr)) < 0)
    {
        perror("sendto");
        exit(0);
    }
    /* Close the socket */
    close(sd);
}
After creating the UDP socket, the receiver must bind it to the agreed port and then join the multicast group before it can start receiving datagrams.
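A minimal receiver sketch, assuming the same PORT and GROUP definitions as in the sender and omitting error handling, might therefore be:

struct sockaddr_in addr;
struct ip_mreq mreq;
char buf[256];
int sd = socket(AF_INET, SOCK_DGRAM, 0);

/* Bind the socket to the agreed port, on any local interface */
memset(&addr, 0, sizeof(addr));
addr.sin_family = AF_INET;
addr.sin_addr.s_addr = htonl(INADDR_ANY);
addr.sin_port = htons(PORT);
bind(sd, (struct sockaddr *)&addr, sizeof(addr));

/* Join the multicast group: from now on, datagrams sent to GROUP:PORT
   will be delivered to this socket */
mreq.imr_multiaddr.s_addr = inet_addr(GROUP);
mreq.imr_interface.s_addr = htonl(INADDR_ANY);
setsockopt(sd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));

/* Receive one datagram */
recvfrom(sd, buf, sizeof(buf), 0, NULL, NULL);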
9.5 Summary
This chapter has presented the programming interface of TCP/IP and UDP,
which are widely used in computer systems and embedded applications. Even
if the examples presented here refer to Linux, the same interface is exported
in other operating systems, either natively as in Windows and VxWorks or by
separate modules, and so it can be considered a multiplatform communication
standard. For example, lwIP [27] is a lightweight, open-source protocol stack
that can easily be layered on top of FreeRTOS [13] and other small operating
systems. It exports a subset of the socket interface to the users.
TCP/IP provides reliable communication and represents the base protocol for
a variety of other protocols such as HTTP, FTP and Secure Shell (SSH).
UDP is a lighter protocol and is often used in embedded systems, especially
for real-time applications, because it introduces less overhead. Using UDP,
user programs need to handle the possible loss, duplication and out-of-order
reception of datagrams. Such a management is not as complicated as it might
appear, provided the detected loss of data packets is acceptable. In this case,
it suffices to add a timestamp to each message: the sender increases the times-
tamp for every sent message, and the timestamp is checked by the receiver.
If the timestamp of the received message is the previous received timestamp
plus one, the message has been correctly received, and no datagram has been
lost since the last reception. If the timestamp is greater than the previous one
plus one, at least another datagram has been lost or will arrive out of order.
Finally, if the timestamp is less than or equal to the previous one, the message is a duplicate or arrived out of order, and will be discarded.
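For instance, the receiver-side check just described could be coded along the following lines; the function name and types are illustrative only:

/* ts: timestamp of the message just received; *last: highest timestamp
   accepted so far. Returns 1 to accept the message, 0 to discard it. */
static int check_timestamp(unsigned long ts, unsigned long *last)
{
    if (ts <= *last)
        return 0;              /* duplicate or out-of-order message: discard */
    if (ts > *last + 1)
        printf("at least one datagram was lost or reordered\n");
    *last = ts;
    return 1;                  /* in sequence, or first one after a gap      */
}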
The choice between TCP/IP and UDP in an embedded system depends
on the requirements: whenever fast communication is required, and the occa-
sional loss of some data packet is tolerable, UDP is a good candidate. There
are, however, other applications in which the loss of information is not tolera-
ble: imagine what would happen if UDP were used for communicating alarms
in a nuclear plant! So, in practice, both protocols are used, often in the same
application, where TCP/IP is used for offline communication (no realtime re-
quirements) and whenever reliability is an issue. The combined use of TCP/IP
and UDP is common in many applications. For example, the H.323 proto-
col [51], used to provide audiovisual communication sessions on any packet
network, prescribes the use of UDP for voice and image transmission, and
TCP/IP for communication control and management. In fact, the loss of data packets introduces a degradation in the quality of communication, which can be
acceptable to a certain extent. Conversely, failure in management information
exchange may definitely abort a videoconference session.
Even if this chapter concentrated on Ethernet, TCP/IP, and UDP, which
represent the most widespread communication protocols in many fields of ap-
plication, it is worth noting that several other protocols exist, especially in
CONTENTS
10.1 Basic Principles and Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
10.2 Multidigit Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
10.3 Application to the Readers/Writer Problem . . . . . . . . . . . . . . . . . . . . . . . . . . 251
10.4 Universal Constructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
10.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
some hardware and software components introduce execution delays that are
inherently hard to predict precisely, for example,
• Branch mispredictions
• Failures
This implies a certain degree of uncertainty on how much time a given task will
actually spend within a critical region in the worst case, and the uncertainty
is reflected back into worst-case blocking time computation, as we will see in
Chapter 15.
Even if a full discussion of the topic is beyond the scope of this book,
this chapter contains a short introduction to a different method of object
sharing, known as lock and wait-free communication. The main difference with
respect to lock-based object sharing is that the former is able to guarantee the
consistency of an object shared by many concurrent processes without ever
forcing any process to wait for another.
In particular, we will first look at a very specific method, namely, a lock-free
algorithm for sharing data among multiple, concurrent readers and one single
writer. Then, another method will be presented, which solves the problem in
more general terms and allows objects of any kind to be shared in a lock-
free way. The two methods are related because some aspects of the former—
namely, the properties of multidigit counters—are used as a base for the latter.
The inner workings of this kind of algorithm are considerably more complex than those of, for instance, semaphores. Hence, this chapter is, by necessity, based
on more formal grounds than the previous ones and includes many theorems.
However, the proof of most theorems can safely be skipped without losing the
general idea behind them.
See, for example, References [4, 3] for a more thorough discussion about
how lock-free objects can be profitably adopted in a real-time system, as well
as a framework for their implementation.
1977 [56]. Its basic principle is that, if a multidigit register is read in one
direction while it is being written in the opposite direction, the value obtained
by the read has a few useful properties.
Even if it is not so commonly used in practice, this algorithm has the
advantage of being relatively simple to describe. Nevertheless, it has a great
teaching value because it still contains all the typical elements to be found in
more complex lock-free algorithms. This is the main reason it will be presented
here. It should also be noted that simple nonblocking buffers for real-time
systems were investigated even earlier by Sorenson and Hemacher in 1975 [84].
The main underlying hypothesis of Lamport's method is that there are
basic units of data, called digits, which can be read and written by means of in-
divisible, atomic operations. In other words, it is assumed that the underlying
hardware automatically sequences concurrent operations at the level of a sin-
gle digit. This hypothesis is not overly restrictive because digits can be made
as small as needed, even as small as a single bit. For example, making them
the size of a machine word is enough to accommodate most existing hardware
architectures, including many shared-memory multiprocessor systems.
Let v denote a data item composed of one or more digits. It is assumed
that two distinct processes cannot concurrently modify v because there is only
one writer. Since the value of v may, and usually will, change over time, let
v[0] denote the initial value of v. This value is assumed to be already there
even before the first read operation starts. In other words, it is assumed that
the initial value of v is implicitly written by an operation that precedes all
subsequent read operations. In the same way, let v[1] , v[2] , . . . , denote the
successive values of v over time.
Due to the definition just given, each write operation to v begins with v
equal to v[i] , for some i ≥ 0, and ends with v equal to v[i+1] . Since v is in
general composed of more than one digit, the transition from the old to the
new value cannot be accomplished in one single, atomic step.
The notation v = v1 , . . . , vm indicates that the data item v is composed
of the subordinate data items vj , 1 ≤ j ≤ m. Each subordinate data item vj
is only written as part of a write to v, that is, subordinate data items do not
change value “on their own.” For simplicity, it is also assumed that a read (or
write) operation of v involves reading (or writing) each vj . If this is not the
case, we pretend that a number of dummy read (or write) operations occur
anyway. For write operations, if a write to v does not involve writing vj , we
pretend that a write to vj was performed, with a value that happens to be
the same as that subordinate data item had before. Under these hypotheses,
we can write

v[i] = v1[i], . . . , vm[i]    ∀i ≥ 0.  (10.1)

In particular, we can say that the i-th value v assumes over time, v[i], is composed of the i-th values of all its subordinate data items vj[i], 1 ≤ j ≤ m.
When one or more processes read a data item that cannot be accessed in
an atomic way, while another process is updating it, the results they get may be
FIGURE 10.1
When one or more processes read a data item that cannot be accessed in an
atomic way while another process is updating it, the results they get may be
difficult to predict.
difficult to predict. As shown in Figure 10.1, the most favorable case happens
when a read operation is completely performed while no write operation is in
progress. In this case, it will return a well-defined value, that is, one of the
values v[i] the data item v assumes over time. For example, the leftmost read
operation shown in the figure will obtain v[2] .
However, a read of v that is performed concurrently and that overlaps with
one or more writes to v may obtain a value different from any v[i] . In fact, if
v is not composed of a single digit, reading and writing it involves a sequence
of separate operations that take a certain amount of time to be performed.
All those individual operations, invoked by different processes, are executed
concurrently and may therefore overlap with each other.
By intuition, a read operation performed concurrently with one or more
write operations will obtain a value that may contain “traces” of different val-
ues v[i] . If a read obtains traces of m different values, that is, v[i1 ] , . . . , v[im ] ,
we say that it obtained a value of v[k,l] , where k = min(i1 , . . . , im ) and
l = max(i1 , . . . , im ). According to how k and l are defined, it must be
0 ≤ k ≤ l.
Going back to Figure 10.1, we may be convinced that the rightmost read
operation may return traces of v[2] , v[3] , v[4] , and v[5] because the time frame
occupied by the read operation spans across all those successive values of
v. More specifically, the read operation started before v[2] was completely
overwritten by the new value, made progress while values v[3] , v[4] , and v[5]
were being written into v, but ended before the write operation pertaining to
v[6] started.
It should also be remarked that lock-based techniques need not be concerned with this problem because they enforce mutual exclusion between read
and write operations. For instance, when using a mutual exclusion semaphore
to access v, the scenario shown in the rightmost part of Figure 10.1 simply
cannot happen because, assuming that the semaphore has a First Come First
Served (FCFS) waiting queue,
244 Real-Time Embedded Systems—Open-Source Operating Systems Perspective
1. the read operation will start only after the write operation of v[3]
has been fully completed; and
2. the next write operation of v[4] will be postponed until the read
operation has read all digits of v.
More formally, let v be a sequence of m digits: v = d1, . . . , dm. Since reading and writing a single digit are atomic operations, a read obtains a value d1[i1], . . . , dm[im]. But since dj[ij] is a digit of value v[ij], we can also say that the read obtained a trace of that particular value of v. In summary, the read obtains a value v[k,l], where k = min(i1, . . . , im) and l = max(i1, . . . , im). Hence, the consistency of this value depends on how k and l are related:
1. If k = l, then the read definitely obtained a consistent value of v, and the value is v[k] = d1[k], . . . , dm[k].
2. If k ≠ l, then the consistency of the value obtained by the read cannot be guaranteed.
The last statement does not mean that if k ≠ l the read will never get a consistent value. A consistent value can still be obtained if some digits of v were not changed when going from one value to another. For instance, a read of a 3-digit data item v may obtain the value

d1[6] d2[7] d3[6],  (10.2)
because it has been performed while a write operation was bringing the value
of v from v[6] to v[7] . However, if the second digit of v is the same in both
v[6] and v[7] , this still is a consistent value, namely, v[6] .
In general, a data item v can be much more complicated than a fixed
sequence of m digits. The data item size may change with time so that suc-
cessive values of v may consist of different sets of digits. This happens, for
instance, when v is a list linked with pointers in which a certain set of digits
may or may not still be part of the data item's value, depending on how the
pointers are manipulated over time.
In this case, a read operation carried out while v is being updated may
return digits that were never part of v. Moreover, it may even be hard to define
what it means for a read to obtain traces of a certain value v[i] . Fortunately,
to solve the readers/writer problem for v, it turns out that the important
thing to ensure is that a read does not return traces of certain versions of v.
More formally, we only need a necessary (not sufficient) condition for a
read to obtain traces of a certain value v[i] . Clearly, if this condition does not
hold, the read cannot obtain traces of v[i] . By generalizing what was shown in
the example of Figure 10.1, the necessary conditions can be stated as follows:
Lemma 10.1. If a read of v obtains traces of value v[i] , then
1. the beginning of the read preceded the end of the write of v[i+1] ;
and
2. the end of the read followed the beginning of the write of v[i] .
It is easy to prove that this theorem is true in the special case of a multidigit
data item. It is also a reasonable assumption in general, for other types of data,
although it will be harder to prove it.
Proof. For a multidigit number, the theorem can be proven by showing that
if either necessary condition is false, then a read of v cannot obtain a trace of
v[i] . In particular:
• Immediately after the write of v[i+1] concludes, it is v = d1[i+1], . . . , dm[i+1], because all digits of v have been updated with the new value they have in v[i+1].
• If the beginning of the read follows the end of this write, the read obtains a value d1[i1], . . . , dm[im] with ij ≥ i + 1, that is, ij > i for 1 ≤ j ≤ m. Informally speaking, it may be either ij = i + 1 if the read got the j-th digit before any further update was performed on it by the writer, or ij > i + 1 if the digit was updated with an even newer value before the read. In either case, the read cannot obtain traces of v[i] because it cannot be ij ≤ i for any j.
• If the end of the read precedes the beginning of this write, the read obtains a value d1[i1], . . . , dm[im] with ij ≤ i − 1, that is, ij < i for 1 ≤ j ≤ m. Informally speaking, for any j, it will either be ij = i − 1, meaning that the read received the latest value of the j-th digit, or ij < i − 1, signifying it received some past value.
• As in the previous case, since the read operation cannot “look into the
future,” the read cannot obtain traces of v[i] because it cannot be ij ≥ i
for any j.
The necessary condition just stated can be combined with the definition
of v[k,l] , given previously, to extend it and make it more general. As before,
the following statement can be proven for multidigit data items, whereas it is a reasonable assumption for other types of data: if a read of v obtains a value v[k,l], then
1. the beginning of the read preceded the end of the write of v[k+1] ;
and
2. the end of the read followed the beginning of the write of v[l] .
It is now time to discuss a few useful properties of the value a read oper-
ation can obtain when it is performed on a multidigit data item concurrently
with other read operations and one single write operation. In order to do this, it is necessary to formally define the order in which the groups of digits forming
a multidigit data item are read and written.
Let v = v1 , . . . , vm , where the vj are not necessarily single digits but
may be groups of several adjacent digits. A read (or write) of v is performed
from left to right if, for each 1 ≤ j < m, the read (or write) of vj is completed
before the read (or write) of vj+1 is started. Symmetrically, a read (or write)
of v is performed from right to left if, for each 1 < j ≤ m, the read (or write)
of vj is completed before the read (or write) of vj−1 is started.
Before continuing, it should be noted that there is no particular reason to
state that lower-numbered digit groups are “on the left” of the data item, and
that higher-numbered digit groups are “on the right.” This definition is needed
only to give an intuitive name to those two read/write orders and be able to
tell them apart. An alternative statement that would place higher-numbered
digit groups on the left would clearly be equally acceptable.
It is also important to remark that the order in which the individual digits within a certain digit group vj are read (or written) is left unspecified by the
definitions.
k1 ≤ l1 ≤ k2 ≤ . . . ≤ km ≤ lm . (10.3)
Then,
k1 ≤ k ⇒ d1[k1] ≤ d1[k].  (10.6)

But v[k] = d1[k], by definition, and hence,

k1 ≤ k ⇒ d1[k1] ≤ v[k].  (10.7)

But (10.7) is equivalent to the first proposition of the Lemma. The second proposition can be derived in a similar way by observing that

k1 ≥ k ⇒ d1[k1] ≥ d1[k].  (10.8)

Hence, it must be

k1 ≥ k ⇒ d1[k1] ≥ v[k].  (10.9)
For the induction step, we assume that m > 1 and that the Lemma is
true for m − 1; this will be our induction hypothesis. By definition of μ(·),
according to Definition 10.3, v[i] ≤ v[j] implies that μ(v[i] ) ≤ μ(v[j] ). We can
therefore apply the induction hypothesis because its premises are satisfied.
From the induction hypothesis,

k1 ≤ . . . ≤ km−1 ≤ km ⇒ d1[k1], . . . , dm−1[km−1] ≤ μ(v[km]).  (10.10)

The inequality on the right of (10.10) still holds if we append the same digit dm[km] to both its sides:

d1[k1], . . . , dm−1[km−1] ≤ μ(v[km]) ⇒ d1[k1], . . . , dm[km] ≤ v[km],  (10.11)
but this is exactly what the first proposition of the Lemma states. The sec-
ond proposition can be proven in the same way, basically by reversing all
inequalities in the proof.
Proof. Since the dj are single digits and read/write operations on them are assumed to be atomic, reading a digit always returns a value dj[kj,lj] in which kj = lj for all 1 ≤ j ≤ m. Let dj[kj] be that value.
From Theorem 10.1, we know that if v is written from right to left and read from left to right, the value obtained by the read is

d1[k1], . . . , dm[km] with k1 ≤ . . . ≤ km.  (10.13)

k1 ≤ . . . ≤ km−1 ≤ l,  (10.14)

and we can apply the first proposition of Lemma 10.3 to state that

d1[k1], . . . , dm[km] ≤ v[l].  (10.15)
FIGURE 10.2
An example of how concurrent read and write operations, performed in oppo-
site orders, relate to each other.
The last equation proves the first proposition of the Theorem. The sec-
ond proposition can be proved in a similar way using the “mirror image” of
Theorem 10.1 and the second proposition of Lemma 10.3.
pertaining to distinct write operations are shown with distinct shades of grey
in the background, whereas the initial values have a white background.
With this depiction, a read operation performed concurrently with the
write operations mentioned above is represented by a path going from left to
right along the figure and picking one value for each digit. Figure 10.2 shows
one of those paths in which the read operation obtains the value 1012. It is
easy to see that, even if there are many possible paths, each corresponding to
a different value, all paths lead the read operation to obtain a value that does
not exceed the value eventually written by the last write operation, that is,
2112.
• The writer increments version number v1 from left to right before starting
to write into the shared data.
• The writer increments version number v2 from right to left after finishing
the write operation.
• The reader reads version number v2 from left to right before starting to
read the shared data.
• The reader reads version number v1 from right to left after finishing reading
of the shared data.
• If the version numbers obtained before and after reading the shared data
do not match, the reader retries the operation.
The corresponding C-like code, derived from the Algol code of Reference [56],
is shown in Figure 10.3. In the figure,
Initialization:
    multidigit number v1 = 0, v2 = 0;

Writer:
    →v1 = v1 + 1;
    ...write shared data...
    ←v2 = v1;

Reader:
    do {
        temp = →v2;
        ...read shared data...
    } while (←v1 != temp);
FIGURE 10.3
C-like code of a lock-free solution to the readers/writer problem, using two
multidigit version numbers, adapted from [56].
• →v means that the read or write accesses to the digits of v are performed from left to right.
• ←v means that the read or write accesses to the digits of v are performed from right to left.
Due to the read/write order, Theorem 10.2 states that the reader always
obtains a value of v2 less than or equal to the value just written (or being
written) into v2 when the read concluded. For the same reason, the value
obtained by reading v1 is always greater than or equal to the value v1 had
when the read began, even if that value was already being overwritten by the
writer.
Since the writer increments v1 before starting to update the shared data,
and increments v2 to the same value as v1 after the update is done, if the
reader obtains the same value from reading v2 and v1, it also read a single,
consistent version of the shared data. Otherwise, it might have obtained an
inconsistent value, and it must therefore retry the operation.
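For the common case in which the two version numbers fit in a single machine word (so that each access behaves as an individual digit), the algorithm of Figure 10.3 could be sketched in C as follows; on real hardware, compiler and memory barriers would also be required to preserve the intended access order, an aspect not shown here:

static volatile unsigned long v1 = 0, v2 = 0;
static volatile int shared_data;           /* stands for the shared data item */

void writer(int value)
{
    v1 = v1 + 1;              /* announce that an update is in progress */
    shared_data = value;      /* update the shared data                 */
    v2 = v1;                  /* announce that the update is complete   */
}

int reader(void)
{
    unsigned long temp;
    int value;
    do {
        temp  = v2;           /* version number seen before the read    */
        value = shared_data;  /* read the shared data                   */
    } while (v1 != temp);     /* retry if a write overlapped the read   */
    return value;
}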
Formally, the correctness of the algorithm relies on the following theorem.
It states that, if the reader does not repeat the operation, then it obtained a
consistent value for the shared data.
Theorem 10.3. Let D denote the shared data item. Let v2[k1 ,l1 ] , D[k2 ,l2 ] ,
and v1[k3 ,l3 ] be the values of v2, D, and v1 obtained by a reader in one single
iteration of its loop.
If v2[k1 ,l1 ] = v1[k3 ,l3 ] (the reader exits from the loop and does not retry the
read), then k2 = l2 (the value it obtained for D is consistent).
Hence, v2[k1 ,l1 ] = v1[k3 ,l3 ] (the reader exits from the loop and does not
retry the read) implies v2[l1 ] = v1[k3 ] , but this in turn entails l1 = k3 . From
(10.18) we can conclude that it must also be k2 = l2 , but this result implies
that the reader obtained a consistent version of D, that is, D[k2 ,l2 ] with k2 = l2 ,
and this proves the theorem.
FIGURE 10.4
Basic technique to transform a sequential object implementation into a lock-
free implementation of the same object.
FIGURE 10.5
A careless memory management approach may lead to a race condition in
object access.
FIGURE 10.6
Making a copy of an object while it is being overwritten by another process
is usually not a good idea.
FIGURE 10.7
Pseudocode of the universal construction of a lock-free object proposed in
Reference [34].
instruction checks whether a variable previously read with load linked has
been modified or not, but does not store any new value into it. If the under-
lying architecture does not provide this kind of support, like the ARM V6
processor architecture [5], a software-based consistency check can be used as
a replacement.
In this case, two counters, C0 and C1 , complement each object version.
Both counters are unbounded, that is, it is assumed that they never over-
flow, and start from the same value. The counters are used according to the
following rules:
• On the other hand, C1 and C0 are read before starting and after finishing the copy of an object, respectively.
• The consistency check consists of comparing these values: if they match, the
copied object is definitely consistent; otherwise, it might be inconsistent.
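A sketch of this software-based check is given below, assuming, symmetrically, that the writer increments C0 before it starts modifying an object version and C1 after it has finished; as in the previous sketch, memory-ordering issues are ignored and all names are illustrative:

struct object { int data[4]; };          /* placeholder object contents       */

struct version {
    unsigned long C0, C1;                /* unbounded consistency counters     */
    struct object obj;
};

/* Copy src->obj into dst->obj; return 1 if the copy is surely consistent,
   0 if another process may have modified src in the meantime. */
static int copy_consistent(struct version *dst, const struct version *src)
{
    unsigned long before = src->C1;      /* read C1 before starting the copy  */
    dst->obj = src->obj;                 /* copy the obj field only           */
    return before == src->C0;            /* read C0 after finishing the copy  */
}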
We are now ready to put all pieces together and discuss the full algorithm
to build a lock-free object from a sequential one in a universal way. It is shown
in Figure 10.7 as pseudocode, and its formal correctness proof was given in
Reference [34]; however, we will only discuss it informally here. In the listing, Q
is a shared variable that points to the current version of the shared object, and
N is a local variable that points to the object version owned by the executing
process.
Shared objects are assumed to be structures containing three fields: the
obj field holds the object contents, while C0 and C1 are the two counters used
for consistency check. The obj field does not comprise the counters; hence,
when the obj field is copied from one structure to another, the counters of
the destination structure are not overwritten and retain their previous value.
• The algorithm starts by getting a local pointer, called old, to the current
object version, using load linked (step 1 of Figure 10.7).
• After the copy, the values obtained for the two counters are compared:
if they do not match, another process has worked on the object in the
meantime and the whole operation must be retried (step 6).
• If the consistency check was successful, then N points to a consistent copy
of the shared object, and the executing process can perform the intended
operation on it (step 7).
• If the conditional store was successful, the executing process can acquire
ownership of the old object version (step 10).
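Putting the steps just listed together, the overall control flow might be sketched as follows. The names Q and N are taken from the text, load_linked() and store_conditional() are hypothetical C wrappers for the instructions discussed above, copy_consistent() is the sketch given earlier, and the counter updates the owner must perform on its own version, together with several other details of the actual algorithm of Figure 10.7, are omitted:

/* Hypothetical primitives standing for the hardware instructions */
extern struct version *load_linked(struct version **addr);
extern int store_conditional(struct version **addr, struct version *val);

extern struct version *Q;   /* shared pointer to the current object version */
static struct version *N;   /* version owned by the executing process       */

static void lock_free_operation(void (*op)(struct object *))
{
    struct version *old;
    for (;;) {
        old = load_linked(&Q);            /* step 1: get the current version  */
        if (!copy_consistent(N, old))     /* copy guarded by the C0/C1 check  */
            continue;                     /* step 6: interference, retry      */
        op(&N->obj);                      /* step 7: apply the operation      */
        if (store_conditional(&Q, N)) {   /* publish the new version          */
            N = old;                      /* step 10: acquire the old version */
            return;
        }
    }
}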
It should be noted that the basic scheme just described is quite inefficient in
some cases. For example, it is not suited for “large” objects because it relies
on copying the whole object before working on it. However, other universal
constructions have been proposed just for that case, for instance Reference [2].
It can also be made wait-free by means of a general technique known as
operation combining and discussed in Reference [34].
10.5 Summary
In this chapter, it has been shown that it is not strictly necessary for processes
to wait for each other if they must share information in an orderly and mean-
ingful way. Rather, the lock and wait-free communication approach allows
processes to perform concurrent accesses to a shared object without blocking
and without introducing any kind of synchronization constraint among them.
Since, due to lack of space, it would have been impossible to fully discuss
the topic, only a couple of algorithms have been considered in this chapter.
They represent an example of two distinct ways of approaching the problem:
the first one is a simple, ad-hoc algorithm that addresses a very specific con-
current programming problem, whereas the second one is more general and
serves as a foundation to build many different classes of lock-free objects.
As a final remark, it is worth noting that, even if lock and wait-free algo-
rithms are already in use for real-world applications, they are still an active
research topic, above all for what concerns their actual implementation. The
development and widespread availability of open-source libraries containing
a collection of lock-free data structures such as, for example, the Concurrent
Data Structures library (libcds) [52] is encouraging. More and more program-
mers will be exposed to them in the near future, and they will likely bring
what is today considered an advanced topic in concurrent programming into
the mainstream.
Part II
Real-Time Scheduling
Analysis
11
Real-Time Scheduling Based on the Cyclic Executive
CONTENTS
11.1 Scheduling and Process Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
11.2 The Cyclic Executive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
11.3 Choice of Major and Minor Cycle Length . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
11.4 Tasks with Large Period or Execution Time . . . . . . . . . . . . . . . . . . . . . . . . . . 273
11.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
may be unacceptable for what concerns timings. This is the main goal of
scheduling models, the main topic of the second part of this book.
in a given time interval. As in the previous cases, the focus of the schedul-
ing algorithm is on carrying out as much useful work as possible given a
certain set of processes, rather than satisfying any time-related property of
a specific process.
On the other hand, real-time scheduling algorithms must put the emphasis
on the timing requirements of each individual process being executed, even
if this entails a greater overhead and the mean performance of the system
becomes worse. In order to do this, the scheduling algorithms can take advan-
tage of the greater amount of information that, on most real-time systems, is
available on the processes to be executed.
This is in sharp contrast with the scenario that general-purpose scheduling
algorithms usually face: nothing is known in advance about the processes being
executed, and their future characteristics must be inferred from their past
behavior. Moreover, those characteristics, such as processor time demand,
may vary widely with time. For instance, think about a web browser: the
interval between execution bursts and the amount of processor time each of
them requires both depend on what its human user is doing at the moment
and on the contents of the web pages he or she is looking at.
Even in the context of real-time scheduling, it turns out that the analysis
of an arbitrarily complex concurrent program, in order to predict its worst-
case timing behavior, is very difficult. It is necessary to introduce a simplified
process model that imposes some restrictions on the structure of real-time
concurrent programs to be considered for analysis.
The simplest model, also known as the basic process model, has the fol-
lowing characteristics:
TABLE 11.1
Notation for real-time scheduling algorithms and analysis methods
Symbol Meaning
τi The i-th task
τi,j The j-th instance of the i-th task
Ti The period of task τi
Di The relative deadline of task τi
Ci The worst-case execution time of task τi
Ri The worst-case response time of task τi
ri,j The release time of τi,j
fi,j The response time of τi,j
di,j The absolute deadline of τi,j
The basic model just introduced has a number of shortcomings, and will be
generalized to make it more suitable to describe real-world systems. In par-
ticular,
• Process independence must be understood in a very broad sense. It means
that there are no synchronization constraints among processes at all, so no
process must even wait for another. This rules out, for instance, mutual
exclusion and synchronization semaphores and is somewhat contrary to
the way concurrent systems are usually designed, in which processes must
interact with one another.
• The deadline of a process is not always related to its period, and is often
shorter than it.
• Some processes are sporadic rather than periodic. In other words, they
are executed “on demand” when an external event, for example an alarm,
occurs.
• For some applications and hardware architectures, scheduling and context
switch times may not be negligible.
• The behavior of some nondeterministic hardware components, for example,
caches, must sometimes be taken into account, and this makes it difficult to
determine a reasonably tight upper bound on the process execution time.
• Real-time systems may sometimes be overloaded, a critical situation in
which the computational demand exceeds the system capacity during a
certain time interval. Clearly, not all processes will meet their deadline in
this case, but some residual system properties may still be useful. For in-
stance, it may be interesting to know what processes will miss their deadline
first.
The basic notation and nomenclature most commonly adopted to define
scheduling algorithms and the related analysis methods are summarized in
FIGURE 11.1
Notation for real-time scheduling algorithms and analysis methods.
Table 11.1. It will be used throughout the second part of this book. Figure 11.1
contains a graphical depiction of the same terms and shows the execution of
two instances, τi,1 and τi,2 , of task τi .
As shown in the figure, it is important to distinguish among the worst-case
execution time of task τi , denoted by Ci , the response time of its j-th instance
fi,j , and its worst-case response time, denoted by Ri . The worst-case execution
time is the time required to complete the task without any interference from
other activities, that is, if the task being considered were alone in the system.
The response time may (and usually will) be longer due to the effect of
other tasks. As shown in the figure, a higher-priority task becoming ready
during the execution of τi will lead to a preemption for most scheduling algo-
rithms, so the execution of τi will be postponed and its completion delayed.
Moreover, the execution of a task does not necessarily start as soon as it is released, that is, as soon as it becomes ready for execution.
It is also important to clearly distinguish between relative and absolute
deadlines. The relative deadline Di is defined for task τi as a whole and is the
same for all instances. It indicates, for each instance, the distance between
its release time and the deadline expiration. On the other hand, there is one
distinct absolute deadline di,j for each task instance τi,j . Each of them denotes
the instant in which the deadline expires for that particular instance.
TABLE 11.2
A simple task set to be executed by a cyclic executive
Task τi Period Ti (ms) Execution time Ci (ms)
τ1 20 9
τ2 40 8
τ3 40 8
τ4 80 2
In its most basic form, it is assumed that the basic model just introduced
holds, that is, there is a fixed set of periodic tasks. The basic idea is to lay out
offline a completely static schedule such that its repeated execution causes all
tasks to run at their correct rate and finish within their deadline. The existence
of such a schedule is also a proof “by construction” that all tasks will actually
and always meet their deadline at runtime. Moreover, the sequence of tasks
in the schedule is always the same so that it can be easily understood and
visualized.
For what concerns its implementation, the schedule can essentially be
thought of as a table of procedure calls, where each call represents (part of)
the code of a task. During execution, a very simple software component, the
cyclic executive, loops through the table and invokes the procedures it con-
tains in sequence. To keep the executive in sync with the real elapsed time,
the table also contains synchronization points in which the cyclic executive
aligns the execution with a time reference usually generated by a hardware
component.
In principle, a static schedule can be entirely crafted by hand but, in prac-
tice, it is desirable for it to adhere to a certain well-understood and agreed-
upon structure, and most cyclic executives are designed according to the fol-
lowing principles. The complete table is also known as the major cycle and is
typically split into a number of slices called minor cycles, of equal and fixed
duration.
Minor cycle boundaries are also synchronization points: during execution,
the cyclic executive switches from one minor cycle to the next after waiting
for a periodic clock interrupt. As a consequence, the activation of the tasks at
the beginning of each minor cycle is synchronized with the real elapsed time,
whereas all the tasks belonging to the same minor cycle are simply activated
in sequence. The minor cycle interrupt is also useful in detecting a critical
error known as minor cycle overrun, in which the total execution time of the
tasks belonging to a certain minor cycle exceeds the length of the cycle itself.
As an example, the set of tasks listed in Table 11.2 can be scheduled on
a single-processor system as shown in the time diagram of Figure 11.2. If
deadlines are assumed to be the same as periods for all tasks, from the figure
it can easily be seen that all tasks are executed periodically, with the right
period, and they all meet their deadlines.
More generally, this kind of time diagram illustrates the job that each
FIGURE 11.2
An example of how a cyclic executive can successfully schedule the task set of
Table 11.2.
while (1)
{
    wait_for_interrupt();
    task_1();
    task_2();
    task_4();
    wait_for_interrupt();
    task_1();
    task_3();
    wait_for_interrupt();
    task_1();
    task_2();
    wait_for_interrupt();
    task_1();
    task_3();
}
}
In the sample program, the scheduling table is actually embedded into the
main loop. The main program first sets up the timer with the right minor
cycle period and then enters an endless loop. Within the loop, the function
wait for interrupt() sets the boundary between one minor cycle and the
next. In between, the functions corresponding to the task instances to be
executed within the minor cycle are called in sequence.
Hence, for example, the first minor cycle contains a call to task_1(), task_2(), and task_4() because, as can be seen in the left part of Figure 11.2, the first minor cycle must contain one instance of τ1, τ2, and τ4.
With this implementation, no actual processes exist at run-time because
the minor cycles are just a sequence of procedure calls. These procedures share
a common address space, and hence, they implicitly share their global data.
Moreover, on a single processor system, task bodies are always invoked se-
quentially one after another. Thus, shared data do not need to be protected
in any way against concurrent access because concurrent access is simply not
possible.
Once a suitable cyclic executive has been constructed, its implementa-
tion is straightforward and very efficient because no scheduling activity takes
place at run-time and overheads are very low, without precluding the use of
a very sophisticated (and computationally expensive) algorithm to construct
the schedule. This is because scheduler construction is done completely offline.
On the downside, the cyclic executive “processes” cannot be protected from
each other, as regular processes are, during execution. It is also difficult to
incorporate nonperiodic activities efficiently into the system without changing
the task sequence.
TABLE 11.3
A task set in which a single task, τ4 , leads to a large major cycle because its
period is large
Task τi Period Ti (ms) Execution time Ci (ms)
τ1 20 9
τ2 40 8
τ3 40 8
τ4 400 2
The cyclic executive repeats the same schedule over and over at each major
cycle. Therefore, the major cycle must be an integer multiple of all task periods, but no larger than that, to avoid making the scheduling table larger than necessary for no reason. A sensible choice is to let the major cycle length be the Least Common Multiple (LCM) of the task periods. Sometimes this is also called the hyperperiod of the task set. For example, if we call TM the major cycle length and T1, . . . , TN are the task periods, we have

TM = lcm(T1, . . . , TN).
FIGURE 11.3
An example of how a simple secondary schedule can schedule the task set of
Table 11.3 with a small major cycle.
the issue can be circumvented by designing the schedule as if τ4 were not part
of the system, and then using a so-called secondary schedule.
In its simplest form, a secondary schedule is simply a wrapper placed
around the body of a task, task_4() in our case. The secondary schedule is
invoked on every major cycle and, with the help of a private counter q that
is incremented by one at every invocation, it checks if it has been invoked for
the k-th time, with k = 10 in our example. If this is not the case, it does
nothing; otherwise, it resets the counter and invokes task_4(). The code of
the secondary schedule and the corresponding time diagram are depicted in
Figure 11.3.
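A plausible sketch of such a wrapper, with k = 10 as in the example, is the following; the actual listing in the figure may differ in its details:

/* Secondary schedule for task_4(): called on every major cycle, it
   actually invokes task_4() only once every 10 invocations */
static void task_4_wrapper(void)
{
    static int q = 0;          /* private invocation counter */
    if (++q >= 10)
    {
        q = 0;
        task_4();
    }
}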
As shown in the figure, even if the time required to execute the wrapper
itself is negligible, as is often the case, the worst-case execution time that
must be considered during the cyclic executive design to accommodate the
secondary schedule is still equal to the worst-case execution time of τ4 , that is,
C4 . This is an extremely conservative approach because task_4() is actually invoked only once every k iterations of the schedule, and hence, the worst-case
execution time of the wrapper is very different from its mean execution time.
A different issue may occur when one or more tasks have a large execu-
tion time. The most obvious case happens when the execution time of a task
is greater than the minor cycle length so that it simply cannot fit into the
schedule. However, there may be subtler problems as well. For instance, for
TABLE 11.4
Large execution times, of τ3 in this case, may lead to problems when designing
a cyclic executive
Task τi Period Ti (ms) Execution time Ci (ms)
τ1 25 10
τ2 50 8
τ3 100 20
the task set shown in Table 11.4, the minor and major cycle length, chosen
according to the rules given in Section 11.3, are
Tm = 25 ms (11.6)
TM = 100 ms (11.7)
In this case, as shown in the upper portion of Figure 11.4, task instance
τ3,1 could be executed entirely within a single minor cycle because C3 ≤ Tm ,
but this choice would hamper the proper scheduling of other tasks, especially τ1 .
In fact, the first instance of τ1 would not fit in the first minor cycle because
C1 + C3 > Tm . Shifting τ3,1 into another minor cycle does not solve the
problem either.
Hence, the only option is to split τ3,1 into two pieces: τ3,1a and τ3,1b ,
and put them into two distinct minor cycles. For example, as shown in the
lower part of Figure 11.4, we could split τ3,1 into two equal pieces with an
execution time of 10 ms each and put them into the first and third minor
cycle, respectively.
Although it is possible to work out a correct cyclic executive in this way, it
should be remarked that splitting tasks into pieces may cut across the tasks in
a way that has nothing to do with the structure of the code itself. In fact, the
split is not made on the basis of some characteristics of the code but merely
on the constraints the execution time of each piece must satisfy to fit into the
schedule.
Moreover, task splits make shared data management much more com-
plicated. As shown in the example of Figure 11.4—but this is also true in
general—whenever a task instance is split into pieces, other task instances
are executed between those pieces. In our case, two instances of task τ1 and
one instance of τ2 are executed between τ3,1a and τ3,1b . This fact has two
important consequences:
FIGURE 11.4
In some cases, such as for the task set of Table 11.4, it is necessary to split
one or more tasks with a large execution time into pieces to fit them into a
cyclic executive.
Last but not least, building a cyclic executive is mathematically hard in itself.
Moreover, the schedule is sensitive to any change in the task characteristics,
above all their periods, which requires the entire scheduling sequence to be
reconstructed from scratch when those characteristics change.
Even if the cyclic executive approach is a simple and effective tool in many
cases, it may not be general enough to solve all kinds of real-time scheduling
problems that can be found in practice. This reasoning led to the introduction
of other, more sophisticated scheduling models, to be discussed in the following
chapters. The relative advantages and disadvantages of cyclic executives with
respect to other scheduling models have been subject to considerable debate.
11.5 Summary
To start discussing real-time scheduling, it is first of all necessary to
abstract away, at least at the beginning, from most of the complex and involved
details of real concurrent systems and introduce a simple process model, more
suitable for reasoning and analysis. In a similar way, an abstract scheduling
model specifies a scheduling algorithm and its associated analysis methods
without going into the fine details of its implementation.
In this chapter, one of the simplest process models, called the basic process
model, has been introduced, along with the nomenclature associated with it. It
is used throughout the book as a foundation to talk about the most widespread
real-time scheduling algorithms and gain an insight into their properties. Since
some of its underlying assumptions are quite unrealistic, it will also be pro-
gressively refined and extended to make it adhere better to what real-world
processes look like.
Then we have gone on to specify how one of the simplest and most intuitive
real-time scheduling methods, the cyclic executive, works. Its basic idea is to
lay out a time diagram and place task instances into it so that all tasks
are executed periodically at their proper time and they meet their timing
constraints or deadlines.
The time diagram is completely built offline before the system is ever
executed, and hence, it is possible to put into action sophisticated layout
algorithms without incurring any significant overhead at runtime. The time
diagram itself also provides intuitive and convincing evidence that the system
really works as intended.
That said, the cyclic executive also has a number of disadvantages: it may be hard to build, especially for unfortunate combinations of task execution times and periods; it is quite inflexible; and it may be difficult to maintain properly when task characteristics change with time or the system complexity grows. For this reason, we shall go further and examine other, more sophisticated scheduling methods in the next chapters.
12
Real-Time, Task-Based Scheduling
CONTENTS
12.1 Fixed and Variable Task Priority . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
12.1.1 Preemption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
12.1.2 Variable Priority in General Purpose Operating
Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
12.2 Rate Monotonic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
12.2.1 Proof of Rate Monotonic Optimality . . . . . . . . . . . . . . . . . . . . . . . . . . 285
12.3 The Earliest Deadline First Scheduler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
12.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
The previous chapter has introduced the basic model and terminology for
real-time scheduling. The same notation will be used in this chapter as well
as in the following ones, and therefore, it is briefly recalled here. A periodic
real-time process is called a task and denoted by τi . A task models a periodic
activity: at the j-th occurrence of the period Ti a job τi,j for a given task τi is
released. The job is also called an instance of the task τi . The relative deadline
Di for a task τi represents the maximum time allowed between the release of
any job τi,j and its termination, and, therefore, the absolute deadline di,j for
the job τi,j is equal to its release time plus the relative deadline. The worst-
case execution time of task τi represents the upper limit of the processor time required for the computation of any job for that task, while Ri indicates the worst-case response time of task τi, that is, the maximum elapsed time
between the release of any job for this task and its termination. The worst-
case execution time (WCET) Ci is the time required to complete any job of
the task τi without any interference from other activities. Finally, fi,j is the
actual absolute response time (i.e., the time of its termination) for job τi,j of
task τi .
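For later reference, the per-task parameters just recalled can be grouped into a small C structure; this is only a notational convenience for the reader, not code taken from the book or from any specific operating system:

    /* Static parameters of a periodic task, following the notation of the text. */
    struct task_params {
        unsigned int T;   /* period T_i                                           */
        unsigned int D;   /* relative deadline D_i (D_i = T_i in the basic model) */
        unsigned int C;   /* worst-case execution time C_i                        */
        unsigned int R;   /* worst-case response time R_i, filled in by analysis  */
    };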
While in the cyclic executive scheduling policy all jobs were executed in a
predefined order, in this chapter we shall analyze a different situation where
tasks correspond to processes or threads and are therefore scheduled by the
operating system based on their current priority. Observe that, in the cyclic
executive model, there is no need for a scheduler at all: the jobs are represented
by routines that are invoked in a predefined order by a single program. Here
we shall refer to a situation, which is more familiar to those who have read
the first part of this book, where the operating system handles the concurrent
execution of different units of execution. In the following, we shall indicate
such units as tasks, since the distinction between processes and threads is not relevant in this context. Depending on the practical requirements, tasks will be
implemented either by processes or threads, and the results of the scheduling
analysis are valid in both cases.
Many of the results presented in this chapter and in the following ones
are due to the seminal work of Liu and Layland [60], published in 1973. Sev-
eral proofs given in the original paper have later been refined and put in a
more intuitive form by Buttazzo [19]. Interested readers are also referred to
Reference [78] for more information about the evolution of real-time schedul-
ing theory from a historical perspective. Moreover, References [19, 61] discuss
real-time scheduling in much more formal terms than can be afforded here,
and they will surely be of interest to readers with a stronger mathematical
background.
12.1 Fixed and Variable Task Priority
12.1.1 Preemption
Before discussing priority assignment policies, we need to consider an
important fact: what happens if, during the execution of a task at a given
priority, another task with higher priority becomes ready? Most modern op-
erating systems in this case reclaim the processor from the executing task and
assign it to the task with higher priority by means of a context switch. This
policy is called preemption, and it ensures that the most important task able
to utilize the processor is always executing. Older operating systems, such as
MS-DOS or the Mac OS versions prior to 10, did not support preemption,
and therefore a task that took possession of the processor could not be forced
to release it, unless it performed an I/O operation or invoked a system call.
Preemption presents several advantages, such as making the system more re-
active and preventing rogue tasks from monopolizing the processor. The other
side of the coin is that preemption is responsible for most race conditions due
to the possibly unforeseen interleaved execution of higher-priority tasks.
Since in a preemptive policy the scheduler must ensure that the task cur-
rently in execution is always at the highest priority among those that are
ready, it is important to understand when a context switch is possibly re-
quired, that is, when a new higher priority task may request the processor.
Let us assume first that the priorities assigned to tasks are fixed: a new task
may reclaim the processor only when it becomes ready, and this may happen
only when a pending I/O operation for that task terminates or a system call
(e.g., waiting for a semaphore) is concluded. In all cases, such a change in the
task scenario is carried out by the operating system, which can therefore effec-
tively check current task priorities and ensure that the current task is always
that with the highest priority among the ready ones. This fact holds also if
we relax the fixed-priority assumption: the change in task priority would be
in any case carried out by the operating system, which again is aware of any
possible change in the priority distribution among ready tasks.
Within the preemptive organization, differences may arise in the management of multiple ready tasks sharing the same, highest priority. In the following discussion, we shall assume that all the tasks have a different priority level, but such a situation is somewhat of an abstraction, because the number of available priority levels is limited in practice. We have already seen that POSIX threads allow two different ways of managing multiple ready tasks at the same (highest) priority (a code sketch showing how the policy is chosen follows the list):
1. The First In First Out (FIFO) management, where the task that acquires the processor will execute until it terminates, enters a wait state due to an I/O operation or a synchronization primitive, or a higher-priority task becomes ready.
2. The Round Robin (RR) management, where after some amount of time (often called a time slice) the running task is preempted by the scheduler, even if it performs no I/O operation and no higher-priority task is ready, to let another task at the same priority gain processor usage.
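As a concrete reminder of the POSIX interface involved, the sketch below creates a thread with an explicitly chosen real-time policy; it is a minimal example, with error checking omitted, and it normally requires adequate privileges to succeed:

    #include <pthread.h>
    #include <sched.h>

    extern void *task_body(void *arg);    /* thread entry point, defined elsewhere */

    int spawn_rt_thread(pthread_t *tid, int priority)
    {
        pthread_attr_t attr;
        struct sched_param sp;

        pthread_attr_init(&attr);
        /* Use the attributes set here instead of inheriting the creator's. */
        pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
        /* SCHED_FIFO: run until blocking, termination, or preemption by a
           higher-priority thread; SCHED_RR would add a time slice among
           threads at the same priority. */
        pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
        sp.sched_priority = priority;
        pthread_attr_setschedparam(&attr, &sp);

        return pthread_create(tid, &attr, task_body, NULL);
    }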
tions, but interaction with a human user is a major use case and, in this
case, the perceived responsiveness of the system is an important factor. When
interacting with a computer via a user interface, in fact, getting a quick re-
sponse to user events such as the click of the mouse is preferable over other
performance aspects such as overall throughput in computation. For this rea-
son, a task that spends most of its time doing I/O operations, including the
response to user interface events, is considered more important than a task
making intensive computation. Moreover, when the processor is assigned to a
task making lengthy computation, if preemption were not supported by the
scheduler, the user would experience delays in interaction due to the fact that
the current task would not get a chance to release the processor if performing
only computation and not starting any I/O. For these reasons, the scheduler
in a general purpose operating system will assign a higher priority to I/O
intensive tasks and will avoid that a computing-intensive task monopolize the
processor, thus blocking interaction for an excessively long period. To achieve
this, it is necessary to provide an answer to the following questions:
TABLE 12.1
An example of Rate Monotonic priority assignment
Task Period Computation Time Priority
τ1 20 7 High
τ2 50 13 Low
τ3 25 6 Medium
cate higher priorities, but in any case, this is only an implementation issue
and does not affect the following discussion.
In order to better understand how a scheduler (the rate monotonic sched-
uler in this case) works, and to draw some conclusions about its characteris-
tics, we can simulate the behavior of the scheduler and build the corresponding
scheduling diagram. To be meaningful, the simulation must be carried out for
an amount of time that is “long enough” to cover all possible phase relations
among the tasks. As for the cyclic executive, the right amount of time is the
Least Common Multiple (LCM) of the task periods. After such a period, if no
overflow occurs (i.e., the scheduling does not fail), the same sequence will re-
peat, and therefore, no further information is obtained when simulation of the
system behavior is performed for a longer period of time. Since we do not have
any additional information about the tasks, we also assume that all tasks are
simultaneously released at t = 0. We shall see shortly that this assumption
is the most pessimistic one when considering the scheduling assignment, and
therefore, if we prove that a given task priority assignment can be used for
a system, then it will be feasible regardless of the actual initial release time
(often called phase) of task jobs. A sample schedule is shown in Figure 12.1.
The periods of the three tasks are 20, 25, and 50 ms, and therefore the period to be considered in the simulation is 100 ms, that is, the Least Common Multiple of 20, 25, and 50. At t = 0, all tasks are ready: the first one to be executed is τ1; then, at its completion, τ3. At t = 13 ms, τ2
finally starts but, at t = 20 ms, τ1 is released again. Hence, τ2 is preempted
in favor of τ1 . While τ1 is executing, τ3 is released, but this does not lead to a
preemption: τ3 is executed after τ1 has finished. Finally, τ2 is resumed and then
completed at t = 39 ms. At t = 40 ms, after 1 ms of idling, task τ1 is released.
Since it is the only ready task, it is executed immediately, and completes at t
= 47 ms. At t = 50ms, both τ3 and τ2 become ready simultaneously. τ3 is run
first, then τ2 starts and runs for 4 ms. However, at t = 60 ms, τ1 is released
again. As before, this leads to the preemption of τ2 and τ1 runs to completion.
Then, τ2 is resumed and runs for 8 ms, until τ3 is released. τ2 is preempted
again to run τ3 . The latter runs for 5 ms but at, t = 80 ms, τ1 is released for
the fifth time. τ3 is preempted, too, to run τ1 . After the completion of τ1 , both
τ3 and τ2 are ready. τ3 runs for 1 ms, then completes. Finally, τ2 runs and
completes its execution cycle by consuming 1 ms of CPU time. After that, the
system stays idle until t = 100 ms, where the whole cycle starts again.
FIGURE 12.1
Scheduling sequence for tasks τ1 , τ2 , and τ3 .
Intuitively, Rate Monotonic makes sense: tasks with a shorter period are expected to be executed before others because they have less time available. Conversely, a task with a long period can afford to wait for other, more urgent tasks and still finish its execution in time. However, intuition does not represent a mathematical proof, and we shall prove that Rate Monotonic
is really the best scheduling policy among all the fixed priority scheduling
policies. In other words, if every task job finishes execution within its deadline
under any given fixed priority assignment policy, then the same system is
feasible under Rate Monotonic priority assignment. The formal proof, which
may be skipped by the less mathematically inclined reader, is given below.
We shall prove this in two steps. First we shall introduce the concept of “crit-
ical instant,” that is, the “worst” situation that may occur when a set of
periodic tasks with given periods and computation times is scheduled. Task
jobs can in fact be released at arbitrary instants within their period, and the
time between period occurrence and job release is called the phase of the task.
We shall see that the worst situation will occur when all the jobs are initially
released at the same time (i.e., when the phase of all the tasks is zero). The
following proof will refer to such a situation: proving that the system under
consideration is schedulable in such a bad situation means proving that it will
be schedulable for every task phase.
We first introduce some considerations and definitions:
• According to the simple process model, the relative deadline of a task is
equal to its period, that is, Di = Ti ∀i.
• Hence, for each task instance, the absolute deadline is the time of its next
release, that is, di,j = ri,j+1 .
• We say that there is an overflow at time t if t is the deadline for a job that
misses the deadline.
• A scheduling algorithm is feasible for a given set of tasks if they are scheduled
so that no overflows ever occur.
• A critical instant for a task is an instant at which the release of the task
will produce the largest response time.
• A critical time zone for a task is the interval between a critical instant and
the end of the task response.
The following theorem, proved by Liu and Layland [60], identifies critical
instants.
Theorem 12.1. A critical instant for any task occurs whenever it is released
simultaneously with the release of all higher-priority tasks.
To prove the theorem, which is valid for every fixed-priority assignment,
let τ1 , τ2 , . . . , τm be a set of tasks, listed in order of decreasing priority, and
consider the task with the lowest priority, τm . If τm is released at t1 , between
t1 and t1 + Tm , that is, the time of the next release of τm , other tasks with
a higher priority will possibly be released and interfere with the execution
of τm because of preemption. Now, consider one of the interfering tasks, τi ,
with i < m and suppose that, in the interval between t1 and t1 + Tm , it is
released at t2 ; t2 + Ti , . . . ; t2 + kTi , with t2 ≥ t1 . The preemption of τm by
τi will cause a certain amount of delay in the completion of the instance of τm
being considered, unless it has already been completed before t2 , as shown in
Figure 12.2.
From the figure, it can be seen that the amount of delay depends on the
relative placement of t1 and t2. However, moving t2 towards t1 will never decrease the amount of delay, and hence the completion time, of the instance of τm under consideration.
FIGURE 12.2
Interference to τm due to higher-priority tasks τi .
FIGURE 12.3
Tasks τ1 and τ2 not scheduled under RM.
We shall now prove that, if a task set is schedulable by some fixed-priority assignment, then it is also schedulable by RM. This result also implies that
if RM cannot schedule a certain task set, no other fixed-priority assignment
algorithm can schedule it.
We shall consider first the simpler case in which exactly two tasks are
involved, and we shall prove that if the set of two tasks τ1 and τ2 is schedulable
by any arbitrary, but fixed, priority assignment, then it is schedulable by RM
as well.
Let us consider two tasks, τ1 and τ2 , with T1 < T2 . If their priorities are
not assigned according to RM, then τ2 will have a priority higher than τ1 . At
a critical instant, their situation is that shown in Figure 12.3.
The schedule is feasible if (and only if) the following inequality is satisfied:
C1 + C2 ≤ T1 (12.1)
FIGURE 12.4
Situation in which all the instances of τ1 are completed before the next release
of τ2 .
Let F = ⌊T2 /T1 ⌋ denote the number of periods of τ1 entirely contained within T2. In the first case, C1 is short enough that all the instances of τ1 within the critical zone of τ2 complete before the next release of τ2, that is,
C1 < T2 − F T1 (12.3)
From Figure 12.4 , we can see that the task set is schedulable if and only
if
(F + 1)C1 + C2 ≤ T2 (12.4)
Now consider case 2, corresponding to Figure 12.5. The second case occurs
when
C1 ≥ T2 − F T1 (12.5)
From Figure 12.5, we can see that the task set is schedulable if and only if
F C1 + C2 ≤ F T1 (12.6)
In summary, given a set of two tasks, τ1 and τ2 , with T1 < T2 we have the
following two conditions:
1. When priorities are assigned according to RM, the set is schedulable
if and only if
• (F + 1)C1 + C2 ≤ T2 , when C1 < T2 − F T1 .
FIGURE 12.5
Situation in which the last instance of τ1 that starts within the critical zone
of τ2 overlaps the next release of τ2 .
• F C1 + C2 ≤ F T1 , when C1 ≥ T2 − F T1 .
(F + 1)C1 + F C2 ≤ F T1 + C1 (12.7)
F C2 ≥ C2 (12.8)
F T1 + C1 < T2 (12.9)
As a consequence, we have
F C1 + F C2 ≤ F T1 (12.11)
F C2 ≥ C2 (12.12)
As a consequence, we have
F C1 + C2 ≤ F C1 + F C2 ≤ F T1 (12.13)
which concludes the proof of the optimality of RM when considering two tasks.
The optimality of RM is then extended to an arbitrary set of tasks thanks
to the following theorem [60]:
Theorem 12.2. If the task set τ1 , . . . , τn (n tasks) is schedulable by any
arbitrary, but fixed, priority assignment, then it is schedulable by RM as well.
The proof is a direct consequence of the previous considerations: let τi and
τj be two tasks of adjacent priorities, τi being the higher-priority one, and
suppose that Ti > Tj . Having adjacent priorities, both τi and τj are affected
in the same way by the interferences coming from the higher-priority tasks
(and not at all by the lower-priority ones). Hence, we can apply the result just
obtained and state that if we interchange the priorities of τi and τj , the set is
still schedulable. Finally, we notice that the RM priority assignment can be
obtained from any other priority assignment by a sequence of pairwise priority
reorderings as above, thus ending the proof.
The above result has far-reaching implications because it gives us a simple way of assigning priorities to real-time tasks, knowing that this choice is the best possible one. At this point we may wonder if it is possible to
do better by relaxing the fixed-priority assumption. From the discussion at
the beginning of this chapter, the reader may have concluded that dynamic
priority should be abandoned when dealing with real-time systems. This is
true for the priority assignment algorithms that are commonly used in general
purpose operating systems since there is no guarantee that a given job will
terminate within a fixed amount of time. There are, however, other algorithms for assigning priorities to tasks that not only ensure a timely termination of the job execution but also perform better than fixed-priority scheduling. The next
section will introduce the Earliest Deadline First dynamic priority assignment
policy, which takes into account the absolute deadline of every task in the
priority assignment.
12.3 The Earliest Deadline First Scheduler
The Earliest Deadline First (EDF) scheduler dynamically assigns the highest priority to the task instance with the earliest absolute deadline; for job τi,j the absolute deadline is
di,j = φi + jTi + Di (12.14)
where φi is the phase of task τi , that is, the release time of its first instance
(for which j = 0), and Ti and Di are the period and relative deadlines of task
τi , respectively. The priority of each task is assigned dynamically, because it
depends on the current deadlines of the active task instances. The reader may
be concerned about the practical implementation of such dynamic priority
assignment: does it require that the scheduler must continuously monitor the
current situation in order to arrange task priorities when needed? Luckily,
the answer is no: in fact, task priorities may be updated only when a new
task instance is released (task instances are released at every task period).
Afterwards, when time passes, the relative order due to the proximity in time
of the next deadline remains unchanged among active tasks, and therefore,
priorities are not changed.
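The following C fragment sketches the only decision the scheduler has to take: among the currently active jobs, pick the one with the earliest absolute deadline. It is an illustrative sketch, assuming that the absolute deadline of each job was computed when the job was released:

    #include <stddef.h>

    /* One entry for each active (released and not yet completed) job. */
    struct active_job {
        int           task_id;        /* index of the owning task  */
        unsigned long abs_deadline;   /* absolute deadline d_{i,j} */
    };

    /* Return the index of the active job with the earliest absolute deadline,
       or -1 if no job is active.  It needs to be called only when a job is
       released or completes: between such events the relative order of the
       deadlines, and hence of the priorities, cannot change. */
    int edf_pick(const struct active_job *job, size_t n)
    {
        int best = -1;
        size_t i;

        for (i = 0; i < n; i++) {
            if (best < 0 || job[i].abs_deadline < job[best].abs_deadline)
                best = (int)i;
        }
        return best;
    }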
As for RM, EDF is an intuitive choice as it makes sense to increase the
priority of more “urgent” tasks, that is, those whose deadline is approaching. We have already stated that intuition is not a mathematical proof; therefore, we need
a formal way of proving that EDF is the optimal scheduling algorithm, that
is, if any task set is schedulable by any scheduling algorithm, then it is also
schedulable by EDF. This fact can be proved under the following assumption:
The formal proof will be provided in the next chapter, where it will be shown
that any set of tasks whose processor utilization does not exceed the processor
capability is schedulable under EDF. The processor utilization for a set of tasks
τ1 , . . . , τn is formally defined as
Σ_{i=1}^{n} Ci /Ti (12.15)
12.4 Summary
This chapter has introduced the basics of task based scheduling, providing two
“optimal” scheduling procedures: RM for fixed task priority assignment, and
EDF for dynamic task priority assignment. Using a fixed-priority assignment
has several advantages over EDF, among which are the following:
• EDF requires a more complex run-time system, which will typically have
a higher overhead.
• EDF is less predictable and can experience a domino effect in which a large
number of tasks unnecessarily miss their deadline.
On the other side, EDF is always able to exploit the full processor capacity,
whereas fixed-priority assignment, and therefore RM, in the worst case does
not.
EDF implementations are not common in commercial real-time kernels
because the operating system would need to take into account a set of param-
eters that is not considered in general-purpose operating systems. Moreover,
EDF refers to a task model (periodic tasks with given deadline) that is more
specific than the usual model of process. There is, however, a set of real-time
open-source kernels that support EDF scheduling, and a new scheduling mode
has recently been proposed for Linux [29]. Both have been developed under the FP7 European project ACTORS [1].
Here, each task is characterized by a budget and a period, which is equal to
its relative deadline. At any time, the system schedules the ready tasks having
the earliest deadlines. During execution, the budget is decreased at every clock
tick, and when a task’s budget reaches zero (i.e., the task has executed for a time interval equal to its budget), the task is stopped until the beginning of its next period, the deadlines of the other tasks are updated accordingly, and the task with the earliest deadline is chosen for execution.
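The per-tick bookkeeping just described can be rendered, in a much simplified form, by the following C sketch; it only illustrates the mechanism and does not correspond to the actual code of any of the kernels cited above:

    struct edf_budget_task {
        unsigned long budget;      /* budget granted in each period          */
        unsigned long remaining;   /* budget left in the current period      */
        unsigned long period;      /* period, equal to the relative deadline */
        unsigned long deadline;    /* current absolute deadline              */
        int           stopped;     /* nonzero when the budget is exhausted   */
    };

    /* Called at every clock tick for the task currently in execution. */
    void edf_budget_tick(struct edf_budget_task *t)
    {
        if (t->remaining > 0)
            t->remaining--;

        if (t->remaining == 0) {
            /* Budget exhausted: stop the task until the beginning of its
               next period, when the budget is replenished and the absolute
               deadline moved forward by one period.  The scheduler then
               picks the ready task with the earliest deadline. */
            t->stopped = 1;
            t->deadline += t->period;
            t->remaining = t->budget;
        }
    }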
Up to now, however, the usage of EDF scheduling is not common in em-
bedded systems, and a fixed task priority under RM policy is normally used.
As a final remark, observe that all the analysis presented relies on the assumption that the considered tasks do not interact with each other, nor are they ever suspended, for example, due to an I/O operation. This is a somewhat unrealistic assumption (whole chapters of this book are devoted to interprocess communication and I/O), and such effects must be taken into consideration in real-world systems. This will be the main topic of Chapters 15 and 16,
which will discuss the impact of the use of system resources and I/O operations on schedulability analysis.
13
Schedulability Analysis Based on
Utilization
CONTENTS
13.1 Processor Utilization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
13.2 Sufficient Schedulability Test for Rate Monotonic . . . . . . . . . . . . . . . . . . . . 298
13.2.1 Ulub for Two Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
13.2.2 Ulub for N Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
13.3 Schedulability Test for EDF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
13.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
The previous chapter introduced the basic concepts in process scheduling and
analyzed the two classes of scheduling algorithms: fixed priority and vari-
able priority. When considering fixed-priority scheduling, it has been shown
that Rate Monotonic (RM) Scheduling is optimal, that is, if a task set is
schedulable under any-fixed priority schema, then it will be under RM. For
variable-priority assignment, the optimality of Earliest Deadline First (EDF)
has been enunciated and will be proved in this chapter.
Despite the elegance and importance of these two results, their practical
impact for the moment is rather limited. In fact, what we are interested in
practice is to know whether a given task assignment is schedulable, before
knowing what scheduling algorithm to use. This is the topic of this chapter
and the next one. In particular, a sufficient condition for schedulability will
be presented here, which, when satisfied, ensures that the given set of tasks is
definitely schedulable. Only at this point do the results of the previous chapter
turn out to be useful in practice because they give us an indication of the right
scheduling algorithm to use.
13.1 Processor Utilization
Given a set Γ of n periodic tasks, the processor utilization factor U is defined as
U = Σ_{i=1}^{n} Ci /Ti
where Ci /Ti is the fraction of processor time spent executing task τi . The proces-
sor utilization factor is therefore a measure of the computational load imposed
on the processor by a given task set and can be increased by increasing the
execution times Ci of the tasks. For a given scheduling algorithm A, there
exists a maximum value of U below which the task set Γ is schedulable, but
for which any increase in the computational load Ci of any of the tasks in the
task set will make it no longer schedulable. This limit will depend on the task
set Γ and on the scheduling algorithm A.
A task set Γ is said to fully utilize the processor with a given scheduling al-
gorithm A if it is schedulable by A, but any increase in the computational load
Ci of any of its tasks will make it no longer schedulable. The corresponding
upper bound of the utilization factor is denoted as Uub (Γ, A).
If we consider now all the possible task sets Γ, it is interesting (and use-
ful) to ask how large the utilization factor can be in order to guarantee the
schedulability of any task set Γ by a given scheduling algorithm A. In order
to do this, we must determine the minimum value of Uub (Γ, A) over all task
sets Γ that fully utilize the processor with the scheduling algorithm A. This
new value, called least upper bound and denoted as Ulub (A), will only depend
on the scheduling algorithm A and is defined as
Ulub(A) = min_Γ Uub(Γ, A)
where Γ represents the set of all task sets that fully utilize the processor. A
pictorial representation of the meaning of Ulub (A) is given in Figure 13.1. The
least upper bound Ulub (A) corresponds to the shaded part of the figure. For
every possible task set Γi , the maximum utilization depends on both A and
Γ. The actual utilization for task set Γi will depend on the computational load
of the tasks but will never exceed Uub (Γi , A). Since Ulub (A) is the minimum
upper bound over all possible task sets, any task set whose utilization factor
is below Ulub (A) will be schedulable by A. On the other hand, it may happen
FIGURE 13.1
Upper Bounds and Least Upper Bound for scheduling algorithm A.
that Ulub(A) can sometimes be exceeded, but not in the general case.
FIGURE 13.2
Necessary schedulability condition.
FIGURE 13.3
No overlap between instances of τ1 and the next release time of τ2 .
Two cases must be considered:
• The execution time C1 is short enough so that all the instances of τ1 within
the critical zone of τ2 are completed before the next release of τ2 .
• The execution of the last instance of τ1 that starts within the critical zone
of τ2 overlaps the next release of τ2 .
Let us consider the first case, shown in Figure 13.3. The largest possible value
of C2 is:
C2 = T2 − (F + 1)C1 (13.4)
If we compute U for this value of C2 , we will obtain Uub . In fact, in this case,
the processor is fully utilized, and every increment in either C1 or C2 would
make the task set no longer schedulable.
FIGURE 13.4
Overlap between instances of τ1 and the next release time of τ2 .
By definition of U we have
Uub = C1 /T1 + C2 /T2 = C1 /T1 + [T2 − (F + 1)C1 ]/T2
    = 1 + C1 /T1 − (F + 1)C1 /T2
    = 1 + (C1 /T2 )[T2 /T1 − (F + 1)] (13.5)
Since F = ⌊T2 /T1 ⌋, it holds
F T1 ≤ T2 < (F + 1)T1 (13.6)
and the quantity between square brackets will be strictly negative. Therefore,
Uub is monotonically decreasing with respect to C1 .
Consider now the second case, in which the execution of the last instance
of τ1 that starts within the critical zone of τ2 overlaps the next release of τ2 .
This case is shown in Figure 13.4. The largest possible value of C2 in this case
is
C2 = F T1 − F C1 (13.7)
C1 = T2 − F T1 (13.9)
At this point, we can take either one of the expressions we derived for Uub
and substitute C1 = T2 − F T1 into it. In fact, both refer to the same situation
from the scheduling point of view, and hence, they must both give the same
result.
It should be noted that the resulting expression for Uub will still depend on
the task periods T1 and T2 through F , and hence, we will have to minimize it
with respect to F in order to find the least upper bound Ulub . By substituting
C1 = T2 − F T1 into (13.8), we get
U = F T1 /T2 + [(T2 − F T1 )/T2 ](T2 /T1 − F )
  = F T1 /T2 + (1 − F T1 /T2 )(T2 /T1 − F )
  = F T1 /T2 + (T1 /T2 )(T2 /T1 − F )(T2 /T1 − F )
  = (T1 /T2 )[F + (T2 /T1 − F )²] (13.10)
dU/dG = [2G(1 + G) − (1 + G²)]/(1 + G)²
      = (G² + 2G − 1)/(1 + G)² (13.14)
dU/dG will be zero when G² + 2G − 1 = 0, that is, when
G = −1 ± √2 (13.15)
Of these solutions, only G = −1 + √2 is acceptable because the other one is negative.
Ulub = U|G=√2−1
     = [1 + (√2 − 1)²]/[1 + (√2 − 1)]
     = (4 − 2√2)/√2
     = 2(√2 − 1) (13.16)
For a set of N periodic tasks, the sufficient schedulability condition for RM therefore becomes
Σ_{i=1}^{N} Ci /Ti ≤ N(2^{1/N} − 1) (13.19)
We can summarize this result as shown in Figure 13.5. With respect to Fig-
ure 13.2, the area of uncertain utilization has been restricted. Only for uti-
lization values falling into the white area are we not yet able to state schedulability.
The next three examples will illustrate in practice the above concepts. Here
we shall assume that both Ti and Ci are measured with the same, arbitrary
time unit.
Consider first the task set of Table 13.1. In this task assignment, the com-
bined processor utilization factor is U = 0.625. For three tasks, from (13.19)
we have Ulub = 3(2^{1/3} − 1) ≈ 0.779 and, since U < Ulub, we conclude from the
sufficient schedulability test that the task set is schedulable by RM.
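A direct, illustrative C implementation of this check is shown below; it returns 1 only when (13.19) guarantees schedulability, and 0 when the test is inconclusive:

    #include <math.h>

    /* Sufficient, utilization-based schedulability test for Rate Monotonic.
       C[i] and T[i] are the execution time and period of task i, expressed
       in the same time unit.  A return value of 1 means the task set is
       certainly schedulable by RM; 0 means the test is inconclusive. */
    int rm_sufficient_test(const double *C, const double *T, int n)
    {
        double U = 0.0;
        int i;

        for (i = 0; i < n; i++)
            U += C[i] / T[i];

        return U <= n * (pow(2.0, 1.0 / n) - 1.0);
    }

For the task set of Table 13.1 it returns 1, since U = 0.625 is below the bound 3(2^{1/3} − 1) ≈ 0.779.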
Consider now the set of tasks described in Table 13.2. The priority as-
FIGURE 13.5
Schedulability conditions for Rate Monotonic.
TABLE 13.1
A task set definitely schedulable by RM.
Task τi Period Ti Computation Time Ci Priority Utilization
τ1 50 20 Low 0.400
τ2 40 4 Medium 0.100
τ3 16 2 High 0.125
TABLE 13.2
A task set for which the sufficient RM scheduling condition does not hold.
Task τi Period Ti Computation Time Ci Priority Utilization
τ1 50 10 Low 0.200
τ2 30 6 Medium 0.200
τ3 20 10 High 0.500
FIGURE 13.6
RM scheduling for a set of tasks with U = 0.900.
signment does not change with respect to the previous example because periods
Ti are still ordered as before. The combined processor utilization factor now
becomes U = 0.900 and, since U > Ulub , the sufficient schedulability test does
not tell us anything useful in this case.
A snapshot of the scheduling sequence is given in Figure 13.6, where all
the tasks are released at time 0. In fact we know that if all tasks fulfill their
deadlines when they are released at their critical instant, that is, simultane-
ously, then the RM schedule is feasible. However, it is easy to show that task
τ1 misses its deadline, and hence the task set is not schedulable.
Let us now consider yet another set of processes, listed in Table 13.3. As
before, the priority assignment does not change with respect to the previous
example because periods Ti are still ordered in the same way. The combined
processor utilization factor is now the maximum value allowed by the necessary schedulability condition, that is, U = 1.
TABLE 13.3
A task set for which the sufficient RM scheduling condition does not hold.
Task Period Computation Time
τi Ti Ci Priority Utilization
τ1 40 14 Low 0.350
τ2 20 5 Medium 0.250
τ3 10 4 High 0.400
FIGURE 13.7
RM scheduling for a set of tasks with U = 1.
FIGURE 13.8
Ulub value versus the number of tasks in the system; for a large number of tasks, Ulub asymptotically approaches ln 2 ≈ 0.69.
FIGURE 13.9
A sample task set where an overflow occurs.
FIGURE 13.10
Utilization based schedulability check for EDF.
The EDF algorithm is optimum in the sense that, if any task set is schedu-
lable by any scheduling algorithm, under the hypotheses just set out, then it
is also schedulable by EDF. In fact,
13.4 Summary
This chapter has presented two simple schedulability tests based on the processor utilization for the RM and EDF scheduling algorithms. The major advantage of such tests is simplicity: it is possible to execute a schedulability acceptance test at run-time, whenever a new task is dynamically added to a running system. As already stated before, fixed priority is the scheduling algorithm supported by most current operating systems, and RM can be implemented in this case by
In this case, however, RM and EDF scheduling, provably optimum for single-
processor systems, are not necessarily optimum. On the other side, multicore
computers are becoming more and more widespread even in the embedded
systems market. A good compromise between the computational power offered
by multicore systems and the required predictability in real-time applications
14
Schedulability Analysis Based on Response Time Analysis
CONTENTS
14.1 Response Time Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
14.2 Computing the Worst-Case Execution Time . . . . . . . . . . . . . . . . . . . . . . . . . . 321
14.3 Aperiodic and Sporadic Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
14.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
condition that the relative deadline corresponds to the task’s period is now
relaxed into condition Di ≤ Ti .
During execution, the preemption mechanism grabs the processor from a
task whenever a higher-priority task is released. For this reason, all tasks (ex-
cept the highest-priority one) suffer a certain amount of interference from
higher-priority tasks during their execution. Therefore, the worst-case re-
sponse time Ri of task τi is computed as the sum of its computation time
Ci and the worst-case interference Ii it experiences, that is,
Ri = Ci + Ii (14.1)
Observe that the interference must be considered over any possible interval
[t, t + Ri ], that is, for any t, to determine the worst case. We already know,
however, that the worst case occurs when all the higher-priority tasks are
released at the same time as task τi . In this case, t becomes a critical instant
and, without loss of generality, it can be assumed that all tasks are released
simultaneously at the critical instant t = 0.
The contribution of each higher-priority task to the overall worst-case in-
terference will now be analyzed individually by considering the interference
due to any single task τj of higher priority than τi . Within the interval [0, Ri ],
τj will be released one (at t = 0) or more times. The exact number of releases
can be computed by means of a ceiling function, as
⌈Ri /Tj ⌉ (14.2)
Since each release of τj will impose on τi an interference of Cj , the worst-case
interference imposed on τi by τj is
⌈Ri /Tj ⌉ Cj (14.3)
This is because, if task τj is released at any time t < Ri, then its execution must have finished before Ri, as τj has a higher priority, and therefore, that instance of τj must have terminated before τi can resume.
Let hp(i) denote the set of task indexes with a priority higher than τi .
These are the tasks from which τi will suffer interference. Hence, the total
interference endured by τi is
Ii = Σ_{j∈hp(i)} ⌈Ri /Tj ⌉ Cj (14.4)
Combining (14.1) and (14.4), we obtain
Ri = Ci + Σ_{j∈hp(i)} ⌈Ri /Tj ⌉ Cj (14.5)
No simple solution exists for this equation since Ri appears on both sides and, on the right-hand side, it also appears inside a ceiling function. The equation may have more than one solution:
the smallest solution is the actual worst-case response time.
The simplest way of solving the equation is to form a recurrence relationship of the form
wi(k+1) = Ci + Σ_{j∈hp(i)} ⌈wi(k) /Tj ⌉ Cj (14.6)
where wi(k) is the k-th estimate of Ri, and the (k + 1)-th estimate is computed from the k-th one through the above relationship. The initial approximation wi(0) is chosen by letting wi(0) = Ci (the smallest possible value of Ri).
The succession wi(0), wi(1), . . . , wi(k), . . . is monotonically nondecreasing. This can be proved by induction, that is, by proving that:
1. wi(0) ≤ wi(1) (Base Case)
2. If wi(k−1) ≤ wi(k), then wi(k) ≤ wi(k+1) for k > 1 (Inductive Step)
The base case derives directly from the expression of wi(1):
wi(1) = Ci + Σ_{j∈hp(i)} ⌈wi(0) /Tj ⌉ Cj ≥ wi(0) = Ci (14.7)
In fact, since, by hypothesis, wi(k) ≥ wi(k−1), each term of the summation is either 0 or a positive integer multiple of Cj. Therefore, the succession wi(0), wi(1), . . . , wi(k), . . . is monotonically nondecreasing.
Two cases are possible for the succession wi(0), wi(1), . . . , wi(k), . . .:
• If the equation has no solutions, the succession does not converge, and it will be wi(k) > Di for some k. In this case, τi clearly does not meet its deadline.
• Otherwise, the succession converges to Ri, and it will be wi(k) = wi(k−1) = Ri for some k. In this case, τi meets its deadline if and only if Ri ≤ Di.
It is possible to assign a physical meaning to the current estimate wi(k). If
we consider a point of release of task τi , from that point and until that task
TABLE 14.1
A sample task set.
Task τi Period Ti Computation Time Ci Priority
τ1 8 3 High
τ2 14 4 Medium
τ3 22 5 Low
instance completes, the processor will be busy and will execute only tasks with the priority of τi or higher. wi(k) can be seen as a time window that is moving down the busy period. Consider the initial assignment wi(0) = Ci: in the transformation from wi(0) to wi(1), the results of the ceiling operations will be (at least) 1. If this is indeed the case, then
wi(1) = Ci + Σ_{j∈hp(i)} Cj (14.9)
The priority assignment is Rate Monotonic and the CPU utilization factor U is
U = Σ_{i=1}^{3} Ci /Ti = 3/8 + 4/14 + 5/22 ≈ 0.89 (14.10)
The necessary schedulability test (U ≤ 1) does not deny schedulability, but the sufficient test for RM is of no help in this case because U > 3(2^{1/3} − 1) ≈ 0.78.
The highest-priority task τ1 does not endure interference from any other
task. Hence, it will have a response time equal to its computation time, that
is, R1 = C1. In fact, considering (14.6), hp(1) = ∅ and, given w1(0) = C1, we trivially have w1(1) = C1. In this case, C1 = 3, hence R1 = 3 as well. Since
R1 = 3 and D1 = 8, then R1 ≤ D1 and τ1 meets its deadline.
For τ2, hp(2) = {1} and w2(0) = C2 = 4. The next approximations of R2 are
w2(1) = 4 + ⌈4/8⌉ · 3 = 7
w2(2) = 4 + ⌈7/8⌉ · 3 = 7 (14.11)
Since w2(2) = w2(1) = 7, the succession converges, and R2 = 7. In other
words, widening the time window from 4 to 7 time units did not introduce
any additional interference. Task τ2 meets its deadline, too, because R2 = 7,
D2 = 14, and thus R2 ≤ D2 .
For τ3 , hp(3) = {1, 2}. It gives rise to the following calculations:
w3(0) = 5
w3(1) = 5 + ⌈5/8⌉ · 3 + ⌈5/14⌉ · 4 = 12
w3(2) = 5 + ⌈12/8⌉ · 3 + ⌈12/14⌉ · 4 = 15
w3(3) = 5 + ⌈15/8⌉ · 3 + ⌈15/14⌉ · 4 = 19
w3(4) = 5 + ⌈19/8⌉ · 3 + ⌈19/14⌉ · 4 = 22
w3(5) = 5 + ⌈22/8⌉ · 3 + ⌈22/14⌉ · 4 = 22 (14.12)
R3 = 22 and D3 = 22, and thus R3 ≤ D3 and τ3 (just) meets its deadline.
Figure 14.1 shows the scheduling of the three tasks: τ1 and τ2 are released
3 and 2 times, respectively, within the period of τ3 , which, in this example,
corresponds also to the worst response time for τ3 . The worst-case response
time for all the three tasks is summarized in Table 14.2.
In this case RTA guarantees that all tasks meet their deadline.
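The whole procedure is easy to automate; the C function below is a direct, illustrative transcription of recurrence (14.6). Tasks are assumed to be sorted by decreasing priority, so that tasks 0 . . . i−1 are exactly hp(i):

    /* Response Time Analysis for preemptive, fixed-priority scheduling.
       T[i], C[i], and D[i] are the period, WCET, and relative deadline of
       task i; tasks are ordered by decreasing priority.  On success, R[i]
       holds the worst-case response time of each task and 1 is returned;
       0 is returned as soon as some response time exceeds its deadline. */
    int rta(const unsigned *T, const unsigned *C, const unsigned *D,
            unsigned *R, int n)
    {
        int i, j;

        for (i = 0; i < n; i++) {
            unsigned w = C[i], prev;

            do {
                prev = w;
                w = C[i];
                for (j = 0; j < i; j++)                      /* interference from hp(i) */
                    w += ((prev + T[j] - 1) / T[j]) * C[j];  /* ceil(prev/T[j]) * C[j]  */
                if (w > D[i])
                    return 0;                                /* deadline missed */
            } while (w != prev);

            R[i] = w;
        }
        return 1;
    }

With T = {8, 14, 22}, C = {3, 4, 5}, and D = T, it computes R = {3, 7, 22}, matching the results summarized in Table 14.2.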
FIGURE 14.1
Scheduling sequence of the tasks of Table 14.1 and RTA analysis for task τ3 .
TABLE 14.2
Worst-case response time for the sample task set
Task τi Period Ti Computation Priority Worst-Case
Time Ci Resp. Time Ri
τ1 8 3 High 3
τ2 14 4 Medium 7
τ3 22 5 Low 22
the past execution flow of the program, thus significantly reducing the average
execution time. Prediction is, however, based on statistical assumptions, and
therefore, prediction errors may occur, leading to the flush of the pipeline with
an adverse impact on the task execution time.
The reader should be convinced at this point that, using analysis alone,
it is in practice not possible to derive the worst-case execution time for mod-
ern processors. However, given that most real-time systems will be subject
to considerable testing anyway, for example, for safety reasons, an approach that combines testing and measurement for basic blocks with path analysis for complete components can often be appropriate.
less frequently, but a suitable schedulability analysis test will ensure (if passed)
that the maximum rate can be sustained.
For these tasks, assuming Di = Ti , that is, a relative deadline equal to the
minimum interarrival time, is unreasonable because they usually encapsulate
error handlers or respond to alarms. The fault model of the system may state
that the error routine will be invoked rarely but, when it is, it has a very short
deadline. For many periodic tasks it is useful to define a deadline shorter than
the period.
The RTA method just described is adequate for use with the extended
process model just introduced, that is, when Di ≤ Ti . Observe that the method
works with any fixed-priority ordering, and not just with the RM assignment,
as long as the set hp(i) of tasks with priority larger than task τi is defined
appropriately for all i and we use a preemptive scheduler. This fact is especially
important to make the technique applicable also for sporadic tasks. In fact,
even if RM was shown to be an optimal fixed-priority assignment scheme when
Di = Ti , this is no longer true for Di ≤ Ti .
The following theorem (Leung and Whitehead, 1982) [59] introduces another fixed-priority assignment, no longer based on the periods of the tasks but on their relative deadlines: the Deadline Monotonic Priority Ordering (DMPO), in which a task with a shorter relative deadline is given a higher priority.
The optimality of DMPO means that, if any task set Γ can be scheduled
using a preemptive, fixed-priority scheduling algorithm A, then the same task
set can also be scheduled using DMPO. As before, such a priority assignment sounds like a good choice since it makes sense to give precedence to
more “urgent” tasks. The formal proof of optimality will involve transforming
the priorities of Γ (as assigned by A), until the priority ordering is Deadline
Monotonic (DM). We will show that each transformation step will preserve
schedulability.
Let τi and τj be two tasks in Γ, with adjacent priorities, that are in the
wrong order for DMPO under the priority assignment scheme A. That is, let Pi > Pj and Di > Dj under A, where Pi (Pj) denotes the priority of τi (τj). We shall now define a new priority assignment scheme A′, identical to A except that the priorities of τi and τj are swapped, and we prove that Γ is still schedulable under A′.
We observe first that all tasks with a priority higher than Pi (the max-
imum priority of the tasks being swapped) will be unaffected by the swap.
All tasks with priorities lower than Pj (the minimum priority of the tasks
being swapped) will be unaffected by the swap, too, because the amount of
interference they experience from τi and τj is the same before and after the
swap.
Task τj has a higher priority after the swap, and since it was schedulable,
by hypothesis, under A, it will suffer after the swap either the same or less
interference (due to the priority increase). Hence, it must be schedulable under A′, too. The most difficult step is now to show that task τi, which was schedulable under A and has had its priority lowered, is still schedulable under A′.
We observe first that once the tasks have been switched, the new worst-
case response time of τi becomes equal to the old response time of τj , that is,
Ri′ = Rj.
Under the previous priority assignment A, we had
• Rj ≤ Dj (schedulability)
• Dj < Di (hypothesis)
• Di ≤ Ti (hypothesis)
• Rj ≤ Dj (schedulability under A)
• Dj < Di (hypothesis)
Hence, Ri′ < Di, and it can be concluded that τi is still schedulable after the
switch.
In conclusion, the DM priority assignment can be obtained from any other
priority assignment by a sequence of pairwise priority reorderings as above.
Each such reordering step preserves schedulability.
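In practice, assigning Deadline Monotonic priorities simply amounts to sorting the tasks by relative deadline, as in the following illustrative C sketch:

    #include <stdlib.h>

    struct dm_task {
        unsigned D;          /* relative deadline                        */
        unsigned priority;   /* assigned priority: 0 is the highest here */
    };

    static int by_deadline(const void *a, const void *b)
    {
        const struct dm_task *x = a;
        const struct dm_task *y = b;
        return (x->D > y->D) - (x->D < y->D);
    }

    /* Deadline Monotonic priority assignment: the shorter the relative
       deadline, the higher the priority. */
    void assign_dm_priorities(struct dm_task *task, size_t n)
    {
        size_t i;

        qsort(task, n, sizeof(task[0]), by_deadline);
        for (i = 0; i < n; i++)
            task[i].priority = (unsigned)i;
    }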
The following example illustrates a successful application of DMPO for a
task set where RM priority assignment fails. Consider the task set listed in
Table 14.3, where the RM and DM priority assignments differ for some tasks.
The behaviors of RM and DM for this task set will be now examined and
compared.
From Figures 14.2 and 14.3 we can see that RM is unable to schedule the
task set, whereas DM succeeds. We can derive the same result by performing RTA analysis on the RM schedule.
For the RM schedule, we have, for τ3 , hp(3) = ∅. Hence, R3 = C3 = 3, and
τ3 (trivially) meets its deadline.
TABLE 14.3
RM and DM priority assignment
Task Parameters Priority
Task Ti Deadline Di Ci RM DM
τ1 19 6 3 Low High
τ2 14 7 4 Medium Medium
τ3 11 11 3 High Low
τ4 20 19 2 Very Low Very Low
FIGURE 14.2
RM scheduling fails for the tasks of Table 14.3.
FIGURE 14.3
DM scheduling succeeds for the tasks of Table 14.3.
For τ2 under RM, hp(2) = {3}, and the recurrence gives
w2(0) = 4
w2(1) = 4 + ⌈4/11⌉ · 3 = 7
w2(2) = 4 + ⌈7/11⌉ · 3 = 7 = R2 (14.13)
Since R2 = 7 and D2 = 7, τ2 meets its deadline as well.
For τ1, the lowest-priority task under RM, hp(1) = {3, 2}, and we obtain
w1(0) = 3
w1(1) = 3 + ⌈3/11⌉ · 3 + ⌈3/14⌉ · 4 = 10
w1(2) = 3 + ⌈10/11⌉ · 3 + ⌈10/14⌉ · 4 = 10 = R1 (14.14)
Since R1 = 10 exceeds D1 = 6, τ1 misses its deadline, confirming that the task set is not schedulable under RM.
Under the DM priority assignment, τ1 has the highest priority and trivially meets its deadline with R1 = C1 = 3. For τ2, hp(2) = {1}, and
w2(0) = 4
w2(1) = 4 + ⌈4/19⌉ · 3 = 7
w2(2) = 4 + ⌈7/19⌉ · 3 = 7 = R2 (14.15)
For τ3 under DM, hp(3) = {1, 2}, and
w3(0) = 3
w3(1) = 3 + ⌈3/19⌉ · 3 + ⌈3/14⌉ · 4 = 10
w3(2) = 3 + ⌈10/19⌉ · 3 + ⌈10/14⌉ · 4 = 10 = R3 (14.16)
Since R3 = 10 and D3 = 11, τ3 meets its deadline.
Finally, τ4 also meets its deadline because R4 = 19 and D4 = 19. This terminates the RTA analysis, proving the schedulability of Γ under the DM priority assignment.
14.4 Summary
This chapter introduced RTA, a schedulability check that allows a finer resolution than the utilization-based checks. It is worth noting that, even when RTA fails, the task set may still be schedulable, because RTA assumes that all the tasks are released at the same critical instant. It is, however, always convenient and safer to consider the worst case (the critical instant), because it would be very hard to make sure that critical instants never occur, given the variability of task execution times due, for example, to cache misses and pipeline hazards. Even if not as straightforward as the utilization-based check, RTA represents a practical schedulability check because, even in case of nonconvergence, it can be stopped as soon as the currently computed response time for any task exceeds the task deadline.
The RTA method can also be applied to Earliest Deadline First (EDF), but it is considerably more complex than in the fixed-priority case and will not be considered in this book, due to its very limited applicability in practice.
RTA is based on an estimation of the worst-case execution time for each
considered task, and we have seen that the exact derivation of this parameter
is not easy, especially for general-purpose processors, which adopt techniques
for improving average execution speed, and for which the worst-case execution
time can be orders of magnitude larger than the execution time observed in the large majority of the executions. In this case, basing schedulability analysis on the worst case may sacrifice most of the processor's potential, with the risk of having a very low total utilization and, therefore, of wasting computing resources. On the other hand, considering lower execution times may produce occasional, albeit rare, deadline misses, so a trade-off between deadline miss probability and efficient processor utilization is normally chosen. The applications where absolutely no deadline miss is acceptable are in fact not so common. For example, if the embedded system is used within a feedback loop, the effect of occasional deadline misses can be considered as a disturbance (or noise) affecting either the controlled process, the detectors, or the actuators, and can be tolerated by the system, provided the achieved control has enough stability margin.
Finally, RTA also allows dealing with the more general case in which the
relative deadline Di for task τi is lower than the task period Ti . This is the case
for sporadic jobs that model a set of system activities such as event and alarm
handling. As a final remark, observe that, with the inclusion of sporadic tasks,
we are moving toward a more realistic representation of real-time systems. The
major abstraction so far is due to the task model, which assumes that tasks
do not depend on each other, an assumption that is often not realistic. The next two chapters will cover this aspect, taking into account the effects due to the use of resources shared among tasks and to the consequent synchronization among them.
15
Task Interactions and Blocking
CONTENTS
15.1 The Priority Inversion Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334
15.2 The Priority Inheritance Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
15.3 The Priority Ceiling Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
15.4 Schedulability Analysis and Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
15.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
The basic process model, first introduced in Chapter 11, was used in Chap-
ters 12 through 14 as an underlying set of hypotheses to prove several inter-
esting properties of real-time scheduling algorithms and, most importantly, all
the schedulability analysis results we discussed so far. Unfortunately, as has
already been remarked at the end of Chapter 14, some aspects of the basic
process model are not fully realistic and make those results hard to apply to
real-world problems.
The hypothesis that tasks are completely independent from each other re-
garding execution is particularly troublesome because it sharply goes against
the basics of all the interprocess communication methods introduced in Chap-
ters 5 and 6. In one form or another, they all require tasks to interact and
coordinate, or synchronize, their execution. In other words, tasks will some-
times be forced to block and wait until some other task performs an action in
the future.
For example, tasks may either have to wait at a critical region’s boundary
to keep a shared data structure consistent, or wait for a message from another
task before continuing. In all cases, their execution will clearly no longer be
independent from what the other tasks are doing at the moment.
FIGURE 15.1
A simple example of unbounded priority inversion involving three tasks.
• Initially, neither τH nor τM are ready for execution. They may be, for
instance, periodic tasks waiting for their next execution instance or they
may be waiting for the completion of an I/O operation.
• During its execution, at t1 in the figure, τL enters into its critical region RL ,
protected by semaphore m. The semaphore primitive P(m) at the critical
region’s boundary is nonblocking because no other tasks are accessing the
shared memory M at the moment. Therefore, τL is allowed to proceed
immediately and keeps running within the critical region.
• At t3 , task τM becomes ready for execution and moves into the Ready state,
too, but this has no effect on the execution of τH , because the priority of
τM is lower than the priority of τH .
• As τH proceeds with its execution, it may try to enter its critical region. In
the figure, this happens at t4 . At this point, τH is blocked by the semaphore
primitive P(m) because the value of semaphore m is now zero. This behavior
is correct since τL is within its own critical region RL , and the semaphore
mechanism is just enforcing mutual exclusion between RL and RH , the
critical region τH wants to enter.
it, that is, τH and τL . In fact, τH is in the Blocked state and, by definition, it
cannot perform any further action until τL exits from RL and executes V(m)
to unblock it. For its part, τL is Ready, but it is not being executed because
the fixed-priority scheduler does not give it any processor time. Hence, it has
no chance of proceeding through RL and eventually executing V(m).
The length of the priority inversion region depends instead on how much
time τM keeps running. Unfortunately, as discussed above, τM has nothing
to do with τH and τL . The programmers who wrote τH and τL may even
be unaware that τM exists. The existence of multiple middle-priority tasks
τM1 , . . . , τMn instead of a single one makes the situation even worse. In a
rather extreme case, those tasks could take turns entering the Ready state
and being executed so that, even if none of them keeps running for a long
time individually, taken as a group there is always at least one task τMk in
the Ready state at any given instant. In that scenario, τH will be blocked for
an unbounded amount of time by τM1 , . . . , τMn even if they all have a lower
priority than τH itself.
In summary, we are willing to accept that a certain amount of blocking
of τH by some lower-priority tasks cannot be removed. By intuition, when
τH wants to enter its critical region RH in the example just discussed, it
must be prepared to wait up to the maximum time needed by τL to execute
within critical region RL . This is a direct consequence of the mutual exclusion
mechanism, which is necessary to access the shared resources in a safe way.
However, it is also necessary for the blocking time to have a computable and
finite upper bound. Otherwise, the overall schedulability of the whole system,
and of τH in particular, will be compromised in a rather severe way.
As for many other concurrent programming issues, it must also be re-
marked that this is not a systematic error. Rather, it is a time-dependent
issue that may go undetected when the system is bench tested.
in the system so that no other task can preempt it. The task goes back to its
regular priority when it exits from the critical region.
One clear advantage of this method is its extreme simplicity. It is also easy
to convince oneself that it really works. Informally speaking, if we prevent
any tasks from unexpectedly losing the processor while they are holding any
mutual exclusion semaphores, they will not block any higher-priority tasks for
this reason should they try to get the same semaphores. At the same time,
however, the technique introduces a new kind of blocking, of a different nature.
That is, any higher-priority task τM that becomes ready while a low-
priority task τL is within a critical region will not get executed—and we
therefore consider it to be blocked by τL —until τL exits from the critical
region. This happens even if τM does not interact with τL at all. The problem
has been solved anyway because the amount of blocking endured by τM is
indeed bounded. The upper bound is the maximum amount of time τL may
actually spend running within its critical region. Nevertheless, we are now
blocking some tasks, like τM , which were not blocked before.
For this reason, this way of proceeding is only appropriate for very short
critical regions, because it causes much unnecessary blocking. A more sophisti-
cated approach is needed in the general case, although introducing additional
kinds of blocking into the system in order to set an upper bound on the block-
ing time is a trade-off common to all the solutions to the unbounded priority
inversion problem that we will present in this chapter. We shall see that the
approach just discussed is merely a strongly simplified version of the priority
ceiling emulation protocol, to be described in Section 15.3.
In any case, the underlying idea is useful: the unbounded priority inversion
problem can be solved by means of a better cooperation between the synchro-
nization mechanism used for mutual exclusion and the processor scheduler.
This cooperation can be implemented, for instance, by allowing the mutual
exclusion mechanism to temporarily change task priorities. This is exactly the
way the priority inheritance algorithm, or protocol, works.
The priority inheritance protocol was proposed by Sha, Rajkumar,
and Lehoczky [79], and offers a straightforward solution to the problem of
unbounded priority inversion. The general idea is to dynamically increase
the priority of a task as soon as it is blocking some higher-priority tasks. In
particular, if a task τL is blocking a set of n higher-priority tasks τH1 , . . . , τHn
at a given instant, it will temporarily inherit the highest priority among them.
This prevents any middle-priority task from preempting τL and unduly making
the blocking experienced by τH1 , . . . , τHn any longer than necessary.
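The inheritance step itself can be sketched as follows; the data structures, the helper names, and the convention that a larger number means a higher priority are hypothetical and are not taken from any particular operating system.

    /* Minimal sketch of the inheritance rule on a mutual exclusion semaphore. */
    struct task {
        int base_priority;     /* baseline priority assigned by the scheduler   */
        int active_priority;   /* current priority, possibly raised by inheritance */
    };

    struct mutex {
        struct task *holder;   /* task currently inside the critical region, or NULL */
    };

    /* Called when 'waiter' finds the mutex busy and is about to block. */
    static void inherit_priority(struct mutex *m, struct task *waiter)
    {
        if (m->holder != NULL &&
            m->holder->active_priority < waiter->active_priority) {
            /* The holder is blocking a higher-priority task: it temporarily
               inherits the highest priority among the tasks it is blocking. */
            m->holder->active_priority = waiter->active_priority;
        }
    }

    /* Called when the holder releases the mutex: in the simplest case, with
       no other semaphores held, its priority falls back to the baseline. */
    static void restore_priority(struct task *holder)
    {
        holder->active_priority = holder->base_priority;
    }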
In order to define the priority inheritance protocol in a more rigorous way
and look at its most important properties, it is necessary to set forth some
additional hypotheses and assumptions about the system being considered. In
particular,
• It is first of all necessary to distinguish between the initial, or baseline,
priority given to a task by the scheduling algorithm and its current, or
active, priority. The baseline priority is used as the initial, default value of
the active priority but, as we just saw, the latter can be higher if the task
being considered is blocking some higher-priority tasks.
• The tasks are under the control of a fixed-priority scheduler and run within
a single-processor system. The scheduler works according to active priori-
ties.
• If there are two or more highest-priority tasks ready for execution, the
scheduler picks them in First-Come First-Served (FCFS) order.
• Semaphore waits due to mutual exclusion are the only source of blocking in
the system. Other causes of blocking such as, for example, I/O operations,
must be taken into account separately, as discussed in Chapter 16.
The priority inheritance protocol itself consists of the following set of rules:
FIGURE 15.2
A simple application of the priority inheritance protocol involving three tasks.
• Task τH executes within its critical region from t5 until t6 . Then it exits
from the critical region, releasing the mutual exclusion semaphore with
V(m), and keeps running past t6 .
Even from this simple example, it is clear that the introduction of the priority
inheritance protocol makes the concept of blocking more complex than it was
before. Looking again at Figure 15.2, there are now two distinct kinds of
blocking rather than one, both occurring between t4 and t5 :
FIGURE 15.3
In a task τ , critical regions can be properly (above) or improperly (below)
nested.
same property. It can be shown that most results discussed in the following are
still valid even if only maximal critical regions are taken into account, unless
otherwise specified. The interested reader should refer to Reference [79] for
more information about this topic.
The first lemma we shall discuss establishes under which conditions a high-
priority task τH can be blocked by a lower-priority task.
When the hypotheses of Lemma 15.1 are satisfied, then task τL can block
τH . The same concept can also be expressed in two slightly different, but
equivalent, ways:
Proof. Lemma 15.2 states that each lower-priority task τLi can block τH for
at most the duration of one of its critical sections. The critical section must
be one of those that can block τH according to Lemma 15.1.
In the worst case, the same scenario may happen for all the n lower-priority
tasks, and hence, τH can be blocked at most n times, regardless of how many
semaphores τH uses.
More important information we get from this lemma is that, provided all
tasks only spend a finite amount of time executing within their critical regions
in all possible circumstances, then the maximum blocking time is bounded.
This additional hypothesis is reasonable because, by intuition, if we allowed a
task to enter a critical region and execute within it for an unbounded amount
of time without ever leaving, the mutual exclusion framework would no longer
work correctly anyway since no other tasks would be allowed to get into any
critical region controlled by the same semaphore in the meantime.
It has already been discussed that push-through blocking is an additional
form of blocking, introduced by the priority inheritance protocol to keep the
worst-case blocking time bounded. The following lemma gives a better char-
acterization of this kind of blocking and identifies which semaphores can be
responsible for it.
Lemma 15.4. A semaphore S can induce push-through blocking onto task τH
only if it is accessed both by a task that has a priority lower than the priority
of τH , and by a task that either has or can inherit a priority higher than the
priority of τH .
Proof. The lemma can be proved by showing that, if the conditions set forth
by the lemma do not hold, then push-through blocking cannot occur.
If S is not accessed by any task τL with a priority lower than the priority
of τH , then, by definition, push-through blocking cannot occur.
Let us then suppose that S is indeed accessed by a task τL , with a priority
lower than the priority of τH . If S is not accessed by any task that has or can
inherit a priority higher than the priority of τH , then the priority inheritance
mechanism will never give to τL an active priority higher than τH . In this
case, τH can always preempt τL and, again, push-through blocking cannot
take place.
If both conditions hold, push-through blocking of τH by τL may occur, and
the lemma follows.
At this point, by combining Lemmas 15.3 and 15.5, we obtain the following
important theorem, due to Sha, Rajkumar, and Lehoczky [79].
FIGURE 15.4
An example of transitive priority inheritance involving three tasks. If transitive
inheritance did not take place, τH ’s blocking would be unbounded due to the
preemption of τL by τB .
Proof. In the proof, we will use the same task nomenclature just introduced to
define transitive priority inheritance. Since τH is directly blocked by τM , then
τM must hold a semaphore, say SM . But, by hypothesis, τM is also directly
blocked by a third task τL on a different semaphore held by τL , say SL .
As a consequence, τM must have performed a blocking P() on SL after
successfully acquiring SM , that is, within the critical region protected by SM .
This corresponds to the definition of properly nested critical regions.
If transitive priority inheritance is ruled out with the help of Lemma 15.6,
that is, if nested critical regions are forbidden, a stronger version of
Lemma 15.4 holds:
lemma is the same as Lemma 15.4, minus the “can inherit” clause. The rea-
soning is valid because Lemma 15.6 rules out transitive priority inheritance
if critical regions are not nested. Moreover, transitive inheritance is the only
way for a task to acquire a priority higher than the highest-priority task with
which it shares a resource.
The conditions set forth by this lemma also cover direct blocking because
semaphore S can directly block a task τH only if it is used by another task
with a priority less than the priority of τH , and by τH itself. As a consequence,
the lemma is valid for all kinds of blocking (both direct and push-through).
Proof. The proof of this theorem descends from the straightforward applica-
tion of the previous lemmas. The function usage(k, i) captures the conditions
under which semaphore Sk can block τi by means of either direct or push-
through blocking, set forth by Lemma 15.7.
A single semaphore Sk may guard more than one critical region, and it is
generally unknown which specific critical region will actually cause the block-
ing for task τi . It is even possible that, on different execution instances of τi ,
the region will change. However, the worst-case blocking time is still bounded
by the worst-case execution time among all critical regions guarded by Sk ,
that is, C(k).
Eventually, the contributions to the worst-case blocking time coming from
the K semaphores are added together because, as stated by Lemma 15.5, τi
can be blocked at most once for each semaphore.
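The bound of Theorem 15.2 lends itself to a direct computation; the following self-contained sketch uses a hypothetical number of semaphores, usage table, and critical region lengths, which would normally be extracted from the application code.

    #include <stdio.h>

    #define K_SEMAPHORES 3
    #define N_TASKS      3

    /* C_region[k]: worst-case execution time among all critical regions
       guarded by semaphore S_k (hypothetical values). */
    static const double C_region[K_SEMAPHORES] = { 0.5, 1.0, 0.2 };

    /* usage[k][i] = 1 if semaphore S_k can block task tau_i, that is, S_k is
       used both by a task with a priority lower than tau_i and by a task with
       a priority higher than or equal to that of tau_i (hypothetical values). */
    static const int usage[K_SEMAPHORES][N_TASKS] = {
        { 1, 1, 0 },
        { 1, 0, 0 },
        { 0, 1, 0 },
    };

    /* Worst-case blocking bound for priority inheritance: tau_i can be
       blocked at most once for each semaphore that can block it, so the
       contributions are simply added together. */
    static double blocking_bound(int i)
    {
        double B = 0.0;

        for (int k = 0; k < K_SEMAPHORES; k++)
            if (usage[k][i])
                B += C_region[k];
        return B;
    }

    int main(void)
    {
        for (int i = 0; i < N_TASKS; i++)
            printf("B_%d = %g\n", i, blocking_bound(i));
        return 0;
    }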
optimal for the priority inheritance protocol, and the bound it computes is
not so tight because
• The priority inheritance protocol does not prevent deadlock from occurring.
Deadlock must be avoided by some other means, for example, by imposing
a total order on semaphore accesses, as discussed in Chapter 4.
All of these issues are addressed by the priority ceiling protocols, also proposed
by Sha, Rajkumar, and Lehoczky [79]. In this chapter we will discuss the
original priority ceiling protocol and its immediate variant; both have the
following properties:
semaphore. The overall goal of the protocol is to ensure that, if task τL holds
a semaphore that could lead to the blocking of a higher-priority task τH ,
then no other semaphore that could also block τH may be acquired by any
task other than τL itself.
A side effect of this approach is that a task can be blocked, and hence
delayed, not only by attempting to lock a busy semaphore but also when
granting a lock to a free semaphore could lead to multiple blocking on higher-
priority tasks.
In other words, as we already did before, we are trading off some useful
properties for an additional form of blocking that did not exist before. This
new kind of blocking, which the priority ceiling protocol introduces in addition
to direct and push-through blocking, is called ceiling blocking.
The underlying hypotheses of the original priority ceiling protocol are the
same as those of the priority inheritance protocol. In addition, it is assumed
that each semaphore has a static ceiling value associated with it. The ceiling
of a semaphore can easily be calculated by looking at the application code
and is defined as the maximum initial priority of all tasks that use it.
As in the priority inheritance protocol, each task has a current (or ac-
tive) priority that is greater than or equal to its initial (or baseline) priority,
depending on whether it is blocking some higher-priority tasks or not. The
priority inheritance rule is exactly the same in both cases.
A task can immediately acquire a semaphore only if its active priority
is higher than the ceiling of any currently locked semaphore, excluding any
semaphore that the task has already acquired in the past and not released
yet. Otherwise, it will be blocked. It should be noted that this last rule can
block the access to busy as well as free semaphores.
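The acquisition rule just described can be sketched as follows; the data structures, the helper names, and the convention that larger numbers mean higher priorities are assumptions made only for illustration.

    #define N_SEMAPHORES 4

    struct pcp_sem {
        int ceiling;   /* static ceiling: maximum baseline priority of the
                          tasks that use this semaphore                      */
        int locked;    /* nonzero while some task is inside a region guarded
                          by it                                              */
        int holder;    /* identifier of the task currently holding it        */
    };

    static struct pcp_sem sem[N_SEMAPHORES];

    /* Returns nonzero if a task with identifier 'tid' and active priority
       'active_prio' may immediately acquire a semaphore, zero if it must
       block.  Semaphores already held by the requesting task are excluded. */
    static int pcp_ceiling_test(int tid, int active_prio)
    {
        int system_ceiling = -1;   /* highest ceiling among semaphores
                                      currently locked by other tasks */

        for (int k = 0; k < N_SEMAPHORES; k++)
            if (sem[k].locked && sem[k].holder != tid &&
                sem[k].ceiling > system_ceiling)
                system_ceiling = sem[k].ceiling;

        /* The task may proceed only if its active priority is strictly higher
           than this ceiling; note that the test may block the task even if
           the semaphore it asked for is currently free. */
        return active_prio > system_ceiling;
    }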
The first property of the priority ceiling protocol to be discussed puts an
upper bound on the priority a task may get when it is preempted within a
critical region.
Proof. The easiest way to prove this lemma is by contradiction. If, contrary
to our thesis, τL inherits a priority higher than or equal to the priority of τM,
then it must block a task τH . The priority of τH must necessarily be higher
than or equal to the priority of τM . If we call PH and PM the priorities of τH
and τM , respectively, it must be PH ≥ PM .
On the other hand, since τM was allowed to enter ZM without blocking, its
priority must be strictly higher than the maximum ceiling of the semaphores
currently locked by any task except τM itself. Even more so, if we call C ∗ the
maximum ceiling of the semaphores currently locked by tasks with a priority
lower than the priority of τM, thus including τL , it must be PM > C ∗ .
Then, we can prove that no transitive blocking can ever occur when the
priority ceiling protocol is in use because
Proof. We assume that a task cannot deadlock “by itself,” that is, by trying
to acquire again a mutual exclusion semaphore it already acquired in the past.
In terms of program code, this would imply the execution of two consecutive
P() on the same semaphore, with no V() in between.
Then, a deadlock can only be formed by a cycle of n ≥ 2 tasks {τ1 , . . . , τn }
waiting for each other according to the circular wait condition discussed in
Chapter 4. Each of these tasks must be within one of its critical regions;
otherwise, deadlock cannot occur because the hold & wait condition is not
satisfied.
By Lemma 15.9, it must be n = 2; otherwise, transitive blocking would
occur, and hence, we consider only the cycle {τ1 , τ2 }. For a circular wait to
occur, one of the tasks was preempted by the other while it was within a critical
region because they are being executed by one single processor. Without loss
This theorem is also very useful from the practical standpoint. It means
that, under the priority ceiling protocol, programmers can put into their code
an arbitrary number of critical regions, possibly (properly) nested into each
other. As long as each task does not deadlock with itself, there will be no
deadlock at all in the system.
The next goal is, as before, to compute an upper bound on the worst-case
blocking time that a task τi can possibly experience. First of all, it is necessary
to ascertain how many times a task can be blocked by others.
Theorem 15.4. Under the priority ceiling protocol, a task τH can be blocked
for, at most, the duration of one critical region.
The next step is to identify the critical regions of interest for blocking,
that is, which critical regions can block a certain task.
Lemma 15.10. Under the priority ceiling protocol, a critical region Z, be-
longing to task τL and guarded by semaphore S, can block another task τH
only if PL < PH , and the priority ceiling of S, CS∗ is greater than or equal to
PH , that is, CS∗ ≥ PH .
Theorem 15.5. Let K be the total number of semaphores in the system. The
worst-case blocking time experienced by each activation of task τi under the
priority ceiling protocol is bounded by Bi :
B_i = \max_{k=1}^{K} \{ usage(k, i) \, C(k) \}
where
• C(k) is the worst-case execution time among all critical regions guarded by
semaphore Sk .
Proof. The proof of this theorem descends from the straightforward applica-
tion of
The only difference between the formulas given in Theorem 15.2 (for prior-
ity inheritance) and Theorem 15.5 (for priority ceiling), namely, the presence
of a summation instead of a maximum can be easily understood by comparing
Lemma 15.5 and Theorem 15.4.
Lemma 15.5 states that, for priority inheritance, a task τH can be blocked
at most once for each semaphore that satisfies the blocking conditions of
Lemma 15.7, and hence, the presence of the summation for priority inher-
itance. On the other hand, Theorem 15.4 states that, when using priority
ceiling, τH can be blocked at most once, period, regardless of how many
It can be proved that the new recurrence relationship still has the same
properties as the original one. In particular, if it converges, it still provides
the worst-case response time Ri for an appropriate choice of wi0 . As before,
either 0 or Ci are good starting points.
The main difference is that the new formulation is pessimistic, rather than
necessary and sufficient, because the bound Bi on the worst-case blocking
time is not tight, and hence, it may be impossible for a task to ever actually
incur a blocking time equal to Bi .
Let us now consider an example. We will consider a simple set of tasks
and determine the effect of the priority inheritance and the immediate priority
ceiling protocols on their worst-case response time, assuming that their periods
are large (> 100 time units). In particular, the system includes
The upper part of Figure 15.5 sketches the internal structure of the three
tasks. Each task is represented as a rectangle with the left side aligned with
its release time. The rectangle represents the execution of the task if it were
alone in the system; the gray area inside the rectangle indicates the location
of the critical region the task contains, if any.
The lower part of the figure shows how the system of tasks being considered
FIGURE 15.5
An example of task scheduling with unbounded priority inversion.
FIGURE 15.6
An example of task scheduling with priority inheritance.
FIGURE 15.7
An example of task scheduling with immediate priority ceiling.
TABLE 15.1
Task response times for Figures 15.5–15.7

                  Actual Response Time                        Worst-Case
Task    Nothing    P. Inheritance    Imm. P. Ceiling      Response Time
τH         6             4                  3                    5
τM         5             8                  8                    9
τL        11            11                 11                   11
15.5 Summary
In most concurrent applications of practical interest, tasks must interact with
each other to pursue a common goal. In many cases, task interaction also
implies blocking, that is, tasks must synchronize their actions and therefore
wait for each other.
In this chapter we saw that careless task interactions may undermine pri-
ority assignments and, eventually, jeopardize the ability of the whole system
to be scheduled because they may lead to an unbounded priority inversion.
This happens even if the interactions are very simple, for instance, when tasks
manipulate some shared data by means of critical regions.
One way of solving the problem is to set up a better cooperation between
the scheduling algorithm and the synchronization mechanism, that is, to allow
the synchronization mechanism to modify task priorities as needed. This is the
underlying idea of the algorithms discussed in this chapter: priority inheritance
and the two variants of priority ceiling.
It is then possible to show that all these algorithms are actually able to
force the worst-case blocking time of any task in the system to be upper-
bounded. Better yet, it is also possible to calculate the upper bound, starting
from a limited amount of additional information about task characteristics,
which is usually easy to collect.
Once the upper bound is known, the RTA method can also be extended
to include it in the calculation of the worst-case response time of each task in
the system, thus taking task interaction into account.
16
Self-Suspension and Schedulability Analysis
CONTENTS
16.1 Self-Suspension and the Critical Instant Theorem . . . . . . . . . . . . . . . . . . . . 362
16.2 Self-Suspension and Task Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
16.3 Extension of the Response Time Analysis Method . . . . . . . . . . . . . . . . . . . 369
16.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
FIGURE 16.1
When a high-priority task self-suspends, the response time of lower-
priority tasks may no longer be the worst at a critical instant.
The lower part of Figure 16.1 shows, instead, what happens if each instance
of τ1 is allowed to self-suspend for a variable amount of time, between 0 and
2 time units, at the very beginning of its execution. In particular, the time
diagrams have been drawn assuming that τ1,1 self-suspends for 2 time units,
whereas τ1,2 and τ1,3 self-suspend for 1 time unit each. In the time diagram,
the self-suspension regions are shown as grey rectangles. Task instance τ2,1 is
still released as in time diagram B, that is, at t = 4 time units.
However, the scheduling of τ2,1 shown in time diagram C is very different
than before. In particular,
• The self-suspension of τ1,1 , lasting 2 time units, has the local effect of
shifting its execution to the right of the time diagram. However, the shift
has a more widespread effect, too, because it induces an “extra” interference
on τ2,1 that postpones the beginning of its execution. In Figure 16.1, this
extra interference is shown as a dark gray rectangle.
• Since its execution has been postponed, τ2,1 now experiences 8 time units
of “regular” interference instead of 4 due to τ1,2 and τ1,3 . It should be noted
that this amount of interference is still within the worst-case interference
computed using the critical instant theorem.
B_1^{SS} = S_1 + 0 = 2                                              (16.2)
FIGURE 16.2
When a task self-suspends, it may suffer more blocking than predicted
by the theorems of Chapter 15.
• Then, it enters its first critical region, shown as a dark grey rectangle in
the figure. If we are using a semaphore m for synchronization, the critical
region is delimited, as usual, by the primitives P(m) (at the beginning) and
V(m) (at the end).
• The task instance executes within the first critical region for 2 time units.
• After concluding its first critical region, the task instance further executes
for 3 time units before entering its second critical region.
• After the end of the second critical region, the task instance executes for 1
more time unit, and then it concludes.
Time diagram A of Figure 16.2 shows how the two tasks are scheduled, as-
suming that τ1 is released at t = 3 time units, τ2 is released at t = 0 time
units, and the priority of τ1 is higher than the priority of τ2 . It is also taken for
granted that the period of both tasks is large enough—let us say T1 = T2 = 30
time units—so that only one instance of each is released for the whole length
of the time diagram, and no self-suspension is allowed. In particular,
It should be noted that the system behavior, in this case, can still be accurately
predicted by the theorems of Chapters 12 through 15, namely:
• The reason for this can readily be discerned from the time diagram. When
τ1 is resumed after self-suspension and tries to enter its second critical
region, it is blocked again because τ2 entered its second critical region while
τ1 itself was suspended.
• The second block of τ1 lasts for 1.5 time units, that is, until τ2 eventually
exits from the second critical region and unblocks it.
In summary,
B_i^{1S} = \sum_{k=1}^{K} usage(k, i) \, C(k)        (for priority inheritance)    (16.4)

or

B_i^{1S} = \max_{k=1}^{K} \, usage(k, i) \, C(k)     (for priority ceiling)        (16.5)
where, as before,
• C(k) is the worst-case execution time among all critical regions guarded
by semaphore Sk .
This quantity now becomes the worst-case blocking time endured by each
individual task segment. Therefore, the total worst-case blocking time of task
τi due to task interaction, B_i^{TI}, is given by

B_i^{TI} = (Q_i + 1) \, B_i^{1S}                                    (16.6)

where Q_i is the number of self-suspensions of task τi .
In our example, from either (16.4) or (16.5) we have

B_1^{1S} = 2                                                        (16.7)

because the worst-case execution time of any task within a critical region
controlled by semaphore m is 2 time units and τ1 can be blocked by τ2 .
We also have

B_2^{1S} = 0                                                        (16.8)

because τ2 is the lowest-priority task in the system and cannot be blocked by
any other task. In fact, usage(k, 2) = 0 for any k.
According to (16.6), the worst-case blocking times are

B_1^{TI} = 2 \, B_1^{1S} = 4                                        (16.9)

for τ1 , and

B_2^{TI} = B_2^{1S} = 0                                             (16.10)

for τ2 . This is of course not a formal proof, but it can be seen that B_1^{TI} is
indeed a correct upper bound of the actual amount of blocking seen in the
example.
An additional advantage of this approach is that it is very simple and
requires very little knowledge about task self-suspension itself. It is enough
to know how many self-suspensions each task contains, a piece of information
that is quite easy to collect. However, the disadvantage of using such a limited
amount of information is that it makes the method extremely conservative.
Thus, B_i^{TI} is not a tight upper bound for the worst-case blocking time and
may widely overestimate it in some cases.
More sophisticated and precise methods do exist, such as that described
in Reference [54]. However, as we have seen in several other cases, the price
to be paid for a tighter upper bound for the worst-case blocking is that much
more information is needed. For instance, in the case of [54], we need to know
not only how many self-suspensions each task has, but also their exact
location within the task. In other words, we need to know the execution time
of each individual task segment, instead of the task execution time as a whole.
where Ci is its execution time, and Ii is the worst-case interference the task
may experience due to the presence of higher-priority tasks.
Then, in Chapter 15, we argued that the same method can be extended to
also handle task interactions—as well as the blocking that comes from them—
by considering an additional contribution to the worst-case response time as
written in (15.1):
Ri = Ci + Bi + Ii (16.12)
where Bi is the worst-case blocking time of τi . It can be proved that (16.12)
still holds even when tasks are allowed to self-suspend, if we redefine Bi to
include the additional sources of blocking discussed in Sections 16.1 (extra
interference) and 16.2 (additional blocking after each self-suspension).
Namely, Bi must now be expressed as:

B_i = B_i^{SS} + B_i^{TI}                                           (16.13)
In summary, referring to (16.1) and (16.6) for the definition of B_i^{SS} and
B_i^{TI}, respectively, and then to (16.4) and (16.5) for the definition of B_i^{1S},
we can write

B_i = S_i + \sum_{j \in hp(i)} \min(C_j, S_j) + (Q_i + 1) \sum_{k=1}^{K} usage(k, i) \, C(k)      (16.14)
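The bound (16.14) lends itself to a direct computation, as the following sketch shows; the numeric values are hypothetical placeholders, and hp(i) is assumed to be the set of tasks with an index lower than i (task 0 having the highest priority).

    #include <stdio.h>

    #define N_TASKS      2
    #define K_SEMAPHORES 1

    static const double C[N_TASKS] = { 10.0, 10.0 };  /* execution times       */
    static const double S[N_TASKS] = {  3.5,  0.0 };  /* worst-case self-susp. */
    static const int    Q[N_TASKS] = {  1,    0   };  /* # of self-suspensions */
    static const double C_region[K_SEMAPHORES]       = { 2.0 };
    static const int    usage[K_SEMAPHORES][N_TASKS] = { { 1, 0 } };

    static double min_d(double a, double b) { return a < b ? a : b; }

    /* B_i = S_i + sum over hp(i) of min(C_j, S_j)
             + (Q_i + 1) * sum over k of usage(k, i) * C(k)        (16.14) */
    static double total_blocking(int i)
    {
        double B = S[i];
        double B1S = 0.0;                          /* per-segment bound (16.4) */

        for (int j = 0; j < i; j++)                /* higher-priority tasks    */
            B += min_d(C[j], S[j]);

        for (int k = 0; k < K_SEMAPHORES; k++)
            if (usage[k][i])
                B1S += C_region[k];

        return B + (Q[i] + 1) * B1S;
    }

    int main(void)
    {
        for (int i = 0; i < N_TASKS; i++)
            printf("B_%d = %g\n", i, total_blocking(i));
        return 0;
    }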
w_i^{0} = C_i + B_i                                                 (16.17)
Let us now apply the RTA method to the examples presented in Sec-
tions 16.1 and 16.2. Table 16.1 summarizes the attributes of the tasks shown
in Figure 16.1 as calculated so far. The blocking time due to task interaction,
B_i^{TI}, is obviously zero for both tasks in this case, because they do not interact
in any way.
As far as the total blocking time Bi is concerned, from (16.13) we simply have:

B_1 = 2 + 0 = 2                                                     (16.18)
B_2 = 2 + 0 = 2                                                     (16.19)
TABLE 16.1
Attributes of the tasks shown in Figure 16.1 when τ1 is allowed to self-suspend

Task                   C_i   T_i   S_i   B_i^{SS}   B_i^{TI}
τ1 (high priority)      4     7     2       2          0
τ2 (low priority)       6    15     0       2          0
TABLE 16.2
Attributes of the tasks shown in Figure 16.2 when τ1 is allowed to self-suspend

Task                   C_i   T_i   S_i   B_i^{SS}   B_i^{TI}
τ1 (high priority)     10    30   3.5      3.5         4
τ2 (low priority)      10    30    0       3.5         0
w_1^{0} = C_1 + B_1 = 4 + 2 = 6                                     (16.20)
w_2^{0} = C_2 + B_2 = 6 + 2 = 8                                     (16.21)

w_2^{1} = 8 + \lceil w_2^{0} / T_1 \rceil C_1 = 8 + \lceil 8/7 \rceil \cdot 4 = 16   (16.22)
w_2^{2} = 8 + \lceil 16/7 \rceil \cdot 4 = 20                       (16.23)
w_2^{3} = 8 + \lceil 20/7 \rceil \cdot 4 = 20                       (16.24)
As before, the high-priority task τ1 does not endure any interference, and
hence, we simply have:
and the succession converges immediately, giving R1 = 17.5 time units. For
the low-priority task τ2 , we have instead:
16.4 Summary
This chapter complements the previous one and further extends the schedu-
lability analysis methods at our disposal to also consider task self-suspension.
This is a rather common event in a real-world system because it occurs
whenever a task voluntarily suspends itself for a certain variable, but upper-
bounded, amount of time. A typical example would be a task waiting for an
I/O operation to complete.
Perhaps surprisingly, we saw that the self-suspension of a task not only has a
local effect on the response time of that task—which is quite obvious—but it
may also give rise to extra interference affecting lower-priority tasks, and
further increase the blocking times due to task interaction well beyond what
is predicted by the analysis techniques discussed in Chapter 15.
The main goal of the chapter was therefore to look at all these effects,
calculate their worst-case contribution to the blocking time of each task, and
further extend the RTA method to take them into account. This last extension
concludes our journey to make our process model, as well as the analysis
techniques associated with it, closer and closer to how tasks really behave in
an actual real-time application.
Part III
Advanced Topics
17
Internal Structure of FreeRTOS
CONTENTS
17.1 Task Scheduler/Context Switch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375
17.2 Synchronization Primitives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
17.3 Porting FreeRTOS to a New Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389
17.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 400
FIGURE 17.1
Simplified view of the data structures used by the FreeRTOS scheduler. The
elements marked with ∗ are optional; they may or may not be present, de-
pending on the FreeRTOS configuration.
count of how many elements currently are in the list, to speed up common
operations like checking whether a list is empty or not.
In Figure 17.1, an xList is represented by a sequence of ordered grey boxes
connected by arrows. The leftmost box is the list header, and the number
within it indicates how many elements currently belong to the list. The next
boxes represent list elements; each of them points to a TCB, although, for
clarity, not all of them are shown in the figure.
It should also be noted that the actual implementation of an xList is
slightly more complicated than what has been described so far. That is, it
actually is a circular list and incorporates a guard element to delimit its end.
However, those additional details are mainly related to compactness and effi-
ciency, and do not significantly change the underlying idea.
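As an illustration, the two data types can be pictured roughly as follows; the declarations are deliberately simplified and do not reproduce the actual FreeRTOS headers, which contain additional fields and configuration-dependent details.

    typedef struct xLIST_ITEM {
        portTickType        xItemValue;    /* sorting key, e.g., a wakeup time     */
        struct xLIST_ITEM  *pxNext;        /* next item along the ring             */
        struct xLIST_ITEM  *pxPrevious;    /* previous item along the ring         */
        void               *pvOwner;       /* usually points back to the owning TCB */
        void               *pvContainer;   /* list this item currently belongs to  */
    } xListItem;

    typedef struct xLIST {
        unsigned portBASE_TYPE uxNumberOfItems;  /* element count kept in the header */
        xListItem             *pxIndex;          /* used to walk through the list    */
        xListItem              xListEnd;         /* guard element closing the ring   */
    } xList;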
The main components of the scheduler data structures are
• The CurrentTCB pointer designates the TCB of the running task. There is
only one instance of this pointer, because FreeRTOS only supports single-
processor systems, where at most one task can be running at a
time.
• A task may become ready for execution while the scheduler is sus-
pended and the ReadyTaskLists[] cannot be manipulated directly. In
this case, the task is temporarily “parked,” by linking its TCB to the
PendingReadyList list. The elements of this list are moved into the proper
position of ReadyTaskLists[], depending on their priority, as soon as the
scheduler becomes operational again and before any scheduling decision is
taken.
• The SuspendedTaskList holds the TCBs of all tasks that are currently
suspended, that is, those that are waiting for an undetermined number of
clock ticks. This list is needed only if FreeRTOS has been configured to
support task suspension. Such a configuration is needed to support infinite
timeouts in interprocess communication as well as explicit task suspension.
• Last, the TasksWaitingTermination list collects all tasks that are finished
but that have not yet been removed from the system because the memory
associated with them has not yet been freed. For a variety of reasons, this
last operation in the lifetime of a task is accomplished by the idle task,
running at the minimum priority in the system. Hence, finished tasks may
spend a nonnegligible amount of time in this list if the system is busy.

TABLE 17.1
Contents of a FreeRTOS Task Control Block (TCB)

Field               Purpose                              Optional
pcTaskName[]        Human-readable task name                -
uxTCBNumber         Task identification number              ∗
uxPriority          Active priority                         -
uxBasePriority      Baseline priority                       ∗
pxStack             Lowest task stack address               -
pxEndOfStack        Highest task stack address              ∗
pxTopOfStack        Current top of the task stack           -
xGenericListItem    Link to the scheduler's lists           -
xEventListItem      Link to an event wait list              -
uxCriticalNesting   Interrupt disable nesting level         ∗
ulRunTimeCounter    CPU time consumed                       ∗
pxTaskTag           User-defined, per-task pointer          ∗
xMPUSettings        Memory protection information           ∗
As said before, in FreeRTOS each task is represented by a data structure
called TCB, containing a number of fields. Some of them are always present;
others are optional, in order to save space, because they are needed only for
certain operating system configurations. As shown in Table 17.1, the TCB
contains many of the elements that were discussed in the previous chapters,
namely,
• pcTaskName holds the human-readable task name as a character string.
This information is not directly used in any way by the scheduler but may
be useful to identify the task for debugging purposes.
• Its machine-readable counterpart is the uxTCBNumber. This field is present
only if the FreeRTOS runtime trace facility has been configured, and is a
unique number that represents the task.
• uxPriority represents the current, or active, priority of the task used by
the scheduling algorithm.
• When the system has been configured to support mutual exclusion
semaphores with priority inheritance, the previous field is complemented
by uxBasePriority, which represents the baseline priority of the task.
• pxStack points to the area of memory used to store the task stack. Re-
gardless of the direction in which the stack grows on a certain architecture
(toward higher or lower addresses), this field always points to the base of
the area, that is, its lowest address.
• For architectures in which task stacks grow upward, that is, toward higher
addresses, pxEndOfStack points to the highest, legal stack address. This
information is required to perform stack occupancy checks.
• pxTopOfStack points to the top of the task stack. This information is used
for two distinct, but related, purposes:
1. When the task context is saved, most of the task state informa-
tion is pushed onto its stack; the pointer is used to later retrieve
this information.
2. The value of the stack pointer itself is part of the task state, and
hence, the pointer is also used to restore the task stack pointer
to the right value during context restoration.
• The xGenericListItem field is used to link the task control block to one
of the lists managed by the scheduler, depending on the task state. In
particular,
• The interrupt disable nesting level indicates how many nested critical
regions protected by disabling interrupts are currently in effect for a
given task. It is used to properly reenable interrupts when the outer-
most critical region concludes, that is, when the nesting level goes back
to zero. In some architectures, this datum is held within the TCB in the
uxCriticalNesting field.
• The ulRunTimeCounter is present in the TCB only when FreeRTOS has
been configured to collect runtime statistics. It represents how much time
has been spent running the task from its creation. It should be noted that
its value is not derived from the operating system tick but from a separate,
architecture-dependent timer. Hence, its resolution and unit of measure-
ment may not be the same.
• pxTaskTag holds a pointer that can be uniquely associated with the task
by the user. It is useful, for example, to store a pointer to a data structure
holding additional user-defined, per-task information besides what is held
in the TCB itself.
• If the architecture supports memory protection among tasks, the
xMPUSettings points to an architecture-dependent data structure. Its con-
tents are used during context switch to reprogram the Memory Protection
Unit (MPU) according to the requirements of the task to be executed next.
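Putting the fields of Table 17.1 together, the TCB can be pictured roughly as follows; this is a simplified sketch rather than the literal FreeRTOS declaration, and the optional fields are shown unconditionally.

    typedef struct tskTaskControlBlock {
        volatile portSTACK_TYPE *pxTopOfStack;     /* saved stack pointer of the task      */
        xListItem                xGenericListItem; /* links the TCB to the scheduler lists */
        xListItem                xEventListItem;   /* links the TCB to an event wait list  */
        unsigned portBASE_TYPE   uxPriority;       /* active priority                      */
        portSTACK_TYPE          *pxStack;          /* base (lowest address) of the stack   */
        signed char              pcTaskName[16];   /* human-readable task name             */

        /* Optional fields, present only in some configurations: */
        portSTACK_TYPE          *pxEndOfStack;     /* highest stack address (upward stacks) */
        unsigned portBASE_TYPE   uxBasePriority;   /* baseline priority (mutexes)           */
        unsigned portBASE_TYPE   uxCriticalNesting;/* interrupt-disable nesting level       */
        unsigned portBASE_TYPE   uxTCBNumber;      /* task identification number (trace)    */
        unsigned long            ulRunTimeCounter; /* CPU time consumed (statistics)        */
        void                    *pxTaskTag;        /* user-defined, per-task pointer        */
        void                    *xMPUSettings;     /* memory protection information (MPU)   */
    } tskTCB;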
An interesting omission in the FreeRTOS TCB is the task state, that is, its
location in the process state diagram. However, this information can easily
be inferred, by looking at which lists the TCB is currently linked to, through
xGenericListItem and xEventListItem. The TCB of the running task can
be reached directly through the CurrentTCB pointer.
Another thing that is seemingly missing is the processor state information
pertaining to the task, that is, the value of the program counter, general
registers, and so on. In FreeRTOS, this information is pushed onto the task
stack when its context is saved. Therefore, even if it is not stored in the TCB
directly, it can still be retrieved indirectly because the TCB does contain a
pointer to the top of the stack, pxTopOfStack. This is the situation shown for
task B in Figure 17.2.
We can now start discussing how FreeRTOS implements a context switch
in practice. In particular, let us assume that task A is currently running and
the operating system is about to switch to task B, which is ready for execution.
The status of the main data structures involved in this context switch
before it begins is shown in Figure 17.2. Since task A is being executed, its
processor state (depicted as a dark grey block in the figure) is actually within
the CPU itself. The CPU stack pointer points somewhere within task A’s stack
and delimits the portion of stack currently in use by the task (the light grey
zone) from the free stack space (the white zone). For the sake of the example,
stacks are assumed to grow downward in the figure.
While task A is running, the stack pointer value evolves according to what
FIGURE 17.2
State of the main FreeRTOS data structures involved in a context switch when
it is executing task A and is about to switch to task B.
FIGURE 17.3
State of the main FreeRTOS data structures involved in a context switch when
the context of task A has been saved and the scheduler is about to run.
pxTopOfStack field points to the information just saved. In this way, the saved
task context is made accessible from the TCB itself.
At this point, the processor stack pointer is also switched to a dedicated
kernel stack, and hence, the processor can safely be used to execute the
scheduling algorithm without fear of damaging the context of any task in
the system. The final result of the scheduling algorithm is an update of the
scheduler data structures, namely, to the CurrentTCB pointer.
In particular, as shown in Figure 17.4, if we suppose that the scheduling al-
gorithm chooses B as the next task to be executed, it updates the CurrentTCB
pointer so that it refers to the TCB of task B.
It should also be noted that immediately before the context switch takes
place, further updates to the data structures may be necessary, depending on
the reason of the context switch itself. The figure refers to the simplest case,
in which a context switch is needed due to the readiness of a higher-priority
task (B) and the currently executing task (A) is still ready for execution.
In this case, the TCB of A should, in principle, be linked back to one of
the ReadyTaskLists[] according to its priority. Actually, as an optimization,
FreeRTOS never removes a task from its ready list when it becomes running,
so this operation is unnecessary. More complex scenarios involve intertask
synchronization or communication and will be discussed in Section 17.2.
The last step of the context switch is to restore the context of task B to
resume its execution. The final state of the system is depicted in Figure 17.5.
After context restoration, the processor state of task B has been loaded into
the processor, and the processor stack pointer has been brought back exactly
where it was when the context of B was saved.

FIGURE 17.4
State of the main FreeRTOS data structures involved in a context switch when
the scheduling algorithm has chosen B as the next task to be executed.

Indeed, by comparing Figures 17.2 and 17.5, it can be seen that they are exactly equivalent, with the
roles of tasks A and B interchanged.
FIGURE 17.5
State of the main FreeRTOS data structures involved in a context switch after
the context of task B has been restored.
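The role played by pxTopOfStack in this sequence can be illustrated with a small, self-contained toy program; the "processor state" is simulated by an array of integers and all names are hypothetical, so this is only an analogy for the real, architecture-dependent assembly code.

    #include <stdio.h>

    #define STACK_WORDS 32
    #define NUM_REGS     4

    struct toy_tcb {
        long *pxTopOfStack;          /* current top of the task stack          */
        long  stack[STACK_WORDS];    /* the task stack itself (grows downward) */
    };

    static struct toy_tcb tcb_a = { 0 };
    static struct toy_tcb *current_tcb = &tcb_a;

    /* Save the simulated processor state onto the current task's stack and
       remember the new top of stack in its TCB. */
    static void save_context(const long regs[NUM_REGS])
    {
        for (int i = 0; i < NUM_REGS; i++)
            *(--current_tcb->pxTopOfStack) = regs[i];
    }

    /* Restore the simulated processor state of whatever task current_tcb
       designates, reading it back from the location its TCB points to. */
    static void restore_context(long regs[NUM_REGS])
    {
        for (int i = NUM_REGS - 1; i >= 0; i--)
            regs[i] = *(current_tcb->pxTopOfStack++);
    }

    int main(void)
    {
        long regs[NUM_REGS] = { 1, 2, 3, 4 };   /* state of the running task */

        tcb_a.pxTopOfStack = &tcb_a.stack[STACK_WORDS];

        save_context(regs);      /* context save: state now lives on the stack */
        /* ...here the scheduler would update current_tcb to the next task... */
        restore_context(regs);   /* context restore: state reloaded from stack */

        printf("restored state: %ld %ld %ld %ld\n",
               regs[0], regs[1], regs[2], regs[3]);
        return 0;
    }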
• pcHead and pcTail delimit the message storage zone associated with the
queue. In particular, pcHead points to the base, that is, the lowest address
of the memory area, and pcTail points to one byte more than the highest
address of the area.
A separate message storage zone is used, instead of embedding it into the
xQUEUE, so that the main xQUEUE data structure always has the same length
and layout regardless of how many messages can be stored into it, and their
size.
• pcReadFrom and pcWriteTo delineate the full portion of the message stor-
age zone, which currently contains messages, and separate it from the free
message storage space. It should be remarked that the meaning of the
pcReadFrom differs from the meaning of pcWriteTo in a slightly counter-
intuitive way: while pcWriteTo points to the first free slot in the message
storage zone, pcReadFrom points to the element that was last read from the
queue. As a consequence, the oldest message in the queue is not pointed
directly by pcReadFrom but resides one element beyond that.
These pointers are used by tasks to know where the oldest message cur-
rently stored in the queue starts (at the location pointed by pcReadFrom
plus the item size) and where the next message must be written (at the
location pointed by pcWriteTo).
Overall, the message storage zone is managed as a circular buffer to avoid
moving messages from one location to another within the storage area when
performing a send or receive operation. Hence, both pointers wrap back to
pcHead whenever they reach pcTail.
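The circular management of the storage area might be sketched as follows; the structure and function names are invented for the example and, unlike the real code, no locking or space accounting is shown.

    #include <string.h>

    struct toy_queue {
        char *pcHead;        /* lowest address of the storage zone             */
        char *pcTail;        /* one byte past the highest address of the zone  */
        char *pcWriteTo;     /* next free slot                                 */
        char *pcReadFrom;    /* slot that was last read                        */
        unsigned uxItemSize; /* size of one message, in bytes                  */
    };

    /* Copy one message into the queue and advance pcWriteTo, wrapping it back
       to pcHead when it reaches pcTail. */
    void copy_data_to_queue(struct toy_queue *q, const void *item)
    {
        memcpy(q->pcWriteTo, item, q->uxItemSize);
        q->pcWriteTo += q->uxItemSize;
        if (q->pcWriteTo >= q->pcTail)
            q->pcWriteTo = q->pcHead;
    }

    /* Copy the oldest message out of the queue: pcReadFrom points to the
       element that was last read, so it is advanced first and the message is
       taken from the new position. */
    void copy_data_from_queue(struct toy_queue *q, void *buffer)
    {
        q->pcReadFrom += q->uxItemSize;
        if (q->pcReadFrom >= q->pcTail)
            q->pcReadFrom = q->pcHead;
        memcpy(buffer, q->pcReadFrom, q->uxItemSize);
    }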
TABLE 17.2
Contents of a FreeRTOS message queue data structure (xQUEUE)
Field Purpose
uxLength Maximum queue capacity (# of messages)
uxItemSize Message size in bytes
pcHead Lowest address of message storage zone
pcTail Highest address of message storage zone +1
pcReadFrom Address of oldest full element -uxItemSize
pcWriteTo Address of next free element
uxMessagesWaiting # of messages currently in the queue
xTasksWaitingToSend List of tasks waiting to send a message
xTasksWaitingToReceive List of tasks waiting to receive a message
xRxLock Send queue lock flag and message counter
xTxLock Receive queue lock flag and message counter
FIGURE 17.6
Simplified depiction of a FreeRTOS message queue, shown both when it holds
three messages (uxMessagesWaiting = 3) and when it is empty with two tasks
linked to its xTasksWaitingToReceive list (uxMessagesWaiting = 0).
• The receive operation starts within a critical section during which neither
other tasks nor ISRs are allowed to operate on the queue because interrupts
are disabled in order to guarantee its consistency.
• If the queue is empty, the task exits from the “strong” critical section just
discussed and enters a “weaker” critical section, protected by disabling the
operating system scheduler. Within this critical section, the running task
cannot be preempted but interrupts are enabled again, and hence, the task
locks the queue against further updates from ISRs by means of the fields
xRxLock and xTxLock.
At first sight, having two distinct critical sections arranged in this way may
look like a useless complication. However, as it will become clearer from
the following description, the operations contained in the weaker critical
section require a relatively long execution time. Hence, especially in a real-
time system, it is important to keep interrupts enabled while they are
carried out, even at the expense of making the code more involved.
• If the timeout of the receive operation has already expired at this point, the
queue is unlocked and the operation is concluded with an error indication.
• If the timeout of the receive operation is not yet expired (or no timeout was
specified) and the queue is still empty—some messages could have been
sent to the queue between the two critical sections—the task is blocked
by removing it from the element of ReadyTaskLists[] it belongs to and
then linked to either one of the delayed task lists (if the task specified
a finite timeout for the receive operation) or to the suspended task list
(if no timeout was specified). In addition, the task is also linked to the
xTasksWaitingToReceive list associated to the queue.
• At this point, the queue is unlocked and the scheduler is reenabled. If the
current task was blocked in the previous step, this also forces a context
switch to occur.
Moreover, unlocking the queue may also wake up some tasks blocked on
either xTasksWaitingToReceive or xTasksWaitingToSend. This is neces-
sary because ISRs are allowed to send and receive messages from the queue
while it is locked, but they are not allowed to update the waiting task lists.
This update is therefore delayed and performed as soon as the queue is
unlocked.
The whole sequence outlined above is repeated to retry the receive operation
whenever the task is awakened. This may happen either because the receive
timeout expired or more messages were sent to the queue. In the first case,
the next receive attempt will necessarily fail because the timeout expiration
will definitely be detected.
However, in the second case, the receive operation is not necessarily bound
to succeed on the next attempt because other, higher-priority tasks may
“steal” all the messages sent to the queue before the current task had a chance
of running. In this case, the task will find that the queue is empty and block
again.

TABLE 17.3
xQUEUE fields that have a different meaning when the message queue supports
a mutual exclusion semaphore with priority inheritance

Original name    New name                 Purpose
pcHead           uxQueueType              Queue type
pcTail           pxMutexHolder            Task owning the mutex
pcReadFrom       uxRecursiveCallCount     Critical region nesting counter
All other communication and synchronization objects provided by FreeR-
TOS are directly layered on message queues. For example, a counting
semaphore with an initial value of x and a maximum value of y is implemented
as a message queue that can hold at most y zero-byte messages, with x dummy
messages stored into the queue during initialization. Binary semaphores are
handled as a special case of counting semaphores, with y = 1 and either x = 0
or x = 1.
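A sketch of this layering, using the generic queue API, is shown below; the helper names are invented here, while the real FreeRTOS semaphore macros in semphr.h follow essentially the same pattern.

    #include "FreeRTOS.h"
    #include "queue.h"

    /* A counting semaphore with maximum value y and initial value x, built as
       a queue of y zero-byte messages, x of which are preloaded at creation. */
    xQueueHandle create_counting_semaphore(unsigned portBASE_TYPE y,
                                           unsigned portBASE_TYPE x)
    {
        unsigned portBASE_TYPE i;
        xQueueHandle sem = xQueueCreate(y, 0);   /* item size is zero */

        if (sem != NULL)
            for (i = 0; i < x; i++)
                xQueueSend(sem, NULL, 0);        /* preload x dummy messages */
        return sem;
    }

    /* P(): take one dummy message, blocking for up to 'timeout' ticks if the
       semaphore value is currently zero. */
    #define semaphore_take(sem, timeout)   xQueueReceive((sem), NULL, (timeout))

    /* V(): put one dummy message back; fails if the value is already at y. */
    #define semaphore_give(sem)            xQueueSend((sem), NULL, 0)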
Mutual exclusion semaphores are an important exception because FreeR-
TOS implements the priority inheritance algorithm for them and supports
the recursive lock and unlock feature for them. As a consequence, the message
queue mechanism just described cannot be applied as it is. Just to make an
example, task priorities are obeyed but never modified by the message queue
operations discussed so far.
On the one hand, to implement the priority inheritance algorithm, more
information is needed than is provided by the xQUEUE data structure dis-
cussed so far. On the other hand, several fields in the same data structure
are unneeded when it is used to support a mutual exclusion semaphore rather
than a true message queue.
Hence, as shown in Table 17.3, several xQUEUE fields get a different name
and meaning in this case, such as:
• As seen in Table 17.2, for regular message queues, the pcHead field holds
the lowest address of the message storage zone associated with the queue.
However, as discussed before, message queues used to build semaphores
hold zero-size messages, and thus, no memory at all is actually needed to
store them; only their count is important.
For this reason, the pcHead field—now renamed uxQueueType—is initial-
ized to a NULL pointer to indicate that the message queue is indeed a mutual
exclusion semaphore.
currently owns the mutex. In this context, owning the mutex means that
the task is currently within a critical region controlled by that semaphore.
data types required by FreeRTOS into the corresponding data types supported
by the compiler:
1   #define portCHAR        char
2   #define portFLOAT       float
3   #define portDOUBLE      double
4   #define portLONG        long
5   #define portSHORT       short
6   #define portSTACK_TYPE  unsigned portLONG
7   #define portBASE_TYPE   long
For example, the code excerpt shown above states that, for the Cortex-
M3, the FreeRTOS data type portCHAR (an 8-bit character) corresponds to
the C language data type char. Even more importantly, it also states that
portBASE_TYPE, the most “natural” integer data type of the architecture,
which usually corresponds to a machine word, is a long integer. Similarly,
portSTACK_TYPE is used as the base type for the task stacks, and its correct
definition is crucial for correct stack alignment.
Then, the data type used by FreeRTOS to represent time, expressed in
ticks, must be defined. This data type is called portTickType and it is defined
as follows:
#if ( configUSE_16_BIT_TICKS == 1 )
    typedef unsigned portSHORT portTickType;
    #define portMAX_DELAY ( portTickType ) 0xffff
#else
    typedef unsigned portLONG portTickType;
    #define portMAX_DELAY ( portTickType ) 0xffffffff
#endif
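Three further definitions characterize the direction of stack growth, the tick duration, and the alignment constraints. A sketch of how they typically appear in the Cortex-M3 portmacro.h is shown below; the exact code is assumed here, but it is consistent with the discussion that follows.
#define portSTACK_GROWTH      ( -1 )
#define portTICK_RATE_MS      ( ( portTickType ) 1000 / configTICK_RATE_HZ )
#define portBYTE_ALIGNMENT    8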
The first definition states that, on this architecture, stacks grow downward.
The macro can also be defined as ( +1 ) to denote that they grow upward
instead. The second definition determines the length of a tick in milliseconds,
starting from the configuration option configTICK_RATE_HZ. The last one ex-
presses the strongest memory alignment constraint of the architecture for any
kind of object in bytes. In this case, the value 8 means that a memory address
that is a multiple of 8 bytes is good for storing any kind of object.
The next definition concerns portYIELD, the function or macro invoked by
FreeRTOS to perform a context switch from the current task to a new one
chosen by the scheduling algorithm. In this case, this activity is delegated to
the architecture-dependent function vPortYieldFromISR:
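A sketch of the corresponding portmacro.h definition, as it typically appears in the Cortex-M3 port (assumed here, not necessarily the book's exact listing), is:
extern void vPortYieldFromISR( void );
#define portYIELD()    vPortYieldFromISR()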
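The next group of definitions concerns interrupt masking. The following sketch shows how these macros are typically defined in the Cortex-M3 port; the exact code is an assumption, but it reflects the behavior described in the following paragraphs.
/* Sketch (assumed): mask and unmask the FreeRTOS interrupt sources. */
#define portSET_INTERRUPT_MASK()                                  \
    __asm volatile                                                \
    (                                                             \
        " mov r0, %0      \n"                                     \
        " msr basepri, r0 \n"                                     \
        :: "i" ( configMAX_SYSCALL_INTERRUPT_PRIORITY ) : "r0"    \
    )

#define portCLEAR_INTERRUPT_MASK()      \
    __asm volatile                      \
    (                                   \
        " mov r0, #0      \n"           \
        " msr basepri, r0 \n"           \
        ::: "r0"                        \
    )

#define portDISABLE_INTERRUPTS()              portSET_INTERRUPT_MASK()
#define portENABLE_INTERRUPTS()               portCLEAR_INTERRUPT_MASK()

/* From an ISR, a dummy value is returned and then ignored. */
#define portSET_INTERRUPT_MASK_FROM_ISR()     0; portSET_INTERRUPT_MASK()
#define portCLEAR_INTERRUPT_MASK_FROM_ISR(x)  portCLEAR_INTERRUPT_MASK(); ( void ) x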
The first two definitions are not used directly by FreeRTOS; rather, they
act as a building block for the following ones. portSET_INTERRUPT_MASK
unconditionally disables all interrupt sources that may interact with
FreeRTOS by setting the basepri processor register to the value
configMAX_SYSCALL_INTERRUPT_PRIORITY.
This is accomplished with the help of an assembly language insert (in-
troduced by the GCC-specific keyword asm) because the basepri register
can be accessed only by means of the specialized msr instruction instead of a
standard mov.
The effect of the assignment is that all interrupt requests with a priority
lower than or equal to either the specified value or the current execution prior-
ity of the processor are not honored immediately but stay pending. Interrupt
requests with a higher priority are still handled normally, with the constraint
that they must not invoke any FreeRTOS function.
The portCLEAR_INTERRUPT_MASK macro does the opposite: it unconditionally
reenables all interrupt sources by resetting the basepri processor register to
zero, that is, the lowest possible priority. As a side effect, the processor will
also handle immediately any interrupt request that was left pending previously.
The two macros just mentioned are used directly to implement
portDISABLE_INTERRUPTS and portENABLE_INTERRUPTS, invoked by FreeRTOS
to disable and enable interrupts, respectively, from a task context. On the
other hand, FreeRTOS invokes two other macros,
portSET_INTERRUPT_MASK_FROM_ISR and portCLEAR_INTERRUPT_MASK_FROM_ISR,
to do the same from an interrupt service routine, as this distinction is needed
on some architectures.
On the Cortex-M3 architecture, this is unnecessary, and therefore,
the same code is used in both cases. The rather counterintuitive def-
initions found at lines 17–21 of the listing stem from the fact that
portSET_INTERRUPT_MASK_FROM_ISR is expected to return a value that will
be passed to the matching portCLEAR_INTERRUPT_MASK_FROM_ISR as an ar-
gument. This simplifies their implementation on some architectures because
it makes it possible to pass some information from one macro to the other,
but it is unnecessary for the Cortex-M3. As a consequence,
portSET_INTERRUPT_MASK_FROM_ISR returns a dummy zero value, and
portCLEAR_INTERRUPT_MASK_FROM_ISR ignores its argument.
The last two functions related to interrupt handling, to be defined here, are
portENTER_CRITICAL and portEXIT_CRITICAL. They are used within FreeR-
TOS to delimit very short critical regions of code that are executed in a task
context and must be protected by disabling interrupts.
Since these critical regions can be nested into each other, it is
not enough to map them directly into portDISABLE_INTERRUPTS and
portENABLE_INTERRUPTS. If this were the case, interrupts would be incor-
rectly reenabled at the end of the innermost nested critical region instead
of the outermost one. Hence, a slightly more complex approach is in order.
For the Cortex-M3, the actual implementation is delegated to the functions
vPortEnterCritical and vPortExitCritical. They are defined in another
architecture-dependent module.
Last, portmacro.h contains an empty definition for the macro portNOP, a
macro that must “do nothing.” For the Cortex-M3 architecture, it is in fact
defined to be empty:
#define portNOP()
On some other architectures, however, portNOP cannot be empty because the
instruction that enables interrupts does not have any effect until after the
instruction that follows it, whereas the instruction that disables interrupts
takes effect immediately.
Hence, on those architectures, enabling interrupts and disabling them
again in the next instruction—as it happens with the STI/CLI sequence in
the Intel® 64 and IA-32 architecture—prevents any interrupt requests from
actually being accepted by the processor. The most straightforward solution
is to insert something between the interrupt enable and disable instructions.
This something must not modify the machine state in any way but still count
as (at least) one instruction, and this is exactly what portNOP does.
Besides what has been discussed so far, portmacro.h may also contain
additional macro, data type, and function definitions that are not required by
FreeRTOS but are used by other architecture-dependent modules.
The portmacro.h header only contains data type and macro defini-
tions. We have seen that, in some cases, those macro definitions map func-
tion names used by FreeRTOS, like portYIELD, into architecture-dependent
function names, like vPortYieldFromISR. We shall therefore discuss how
the architecture-dependent functions described so far are actually imple-
mented, along with other functions not mentioned so far but still required
by FreeRTOS.
The implementation is done in one or more architecture-dependent mod-
ules. For the Cortex-M3 architecture, all of them are in the port.c source
file. The first couple of functions to be discussed implements (possibly nested)
critical regions by disabling interrupts:
static unsigned portBASE_TYPE uxCriticalNesting = 0xaaaaaaaa;

void vPortEnterCritical( void )
{
    portDISABLE_INTERRUPTS();
    uxCriticalNesting++;
}

void vPortExitCritical( void )
{
    uxCriticalNesting--;
    if( uxCriticalNesting == 0 )
    {
        portENABLE_INTERRUPTS();
    }
}
Interrupts are reenabled only when uxCriticalNesting goes back to
zero, that is, when the calling task is about to exit from the outermost critical re-
gion. Incrementing and decrementing uxCriticalNesting does not pose any
concurrency issue on a single-processor system because these operations are
always performed with interrupts disabled.
It should also be noted that, although, in principle, uxCriticalNesting
should be part of each task context—because it holds per-task information—it
is not necessary to save it during a context switch. In fact, due to the way the
Cortex-M3 port has been designed, a context switch never occurs unless the
critical region nesting level of the current task is zero. This property implies
that the nesting level of the task targeted by the context switch must be zero,
too, because its context has been saved exactly in the same way. Then it is
assured that any context switch always saves and restores a critical nesting
level of zero, making this action redundant.
The next two functions found in port.c are used to request a processor
rescheduling (also called a yield) and to perform it, respectively, as follows:
 1 #define portNVIC_INT_CTRL     ( ( volatile unsigned long * ) 0xe000ed04 )
 2 #define portNVIC_PENDSVSET    0x10000000
 3
 4 void vPortYieldFromISR( void )
 5 {
 6     *( portNVIC_INT_CTRL ) = portNVIC_PENDSVSET;
 7 }
 8
 9 void xPortPendSVHandler( void )
10 {
11     __asm volatile
12     (
13         " mrs r0, psp                             \n"
14         "                                         \n"
15         " ldr r3, pxCurrentTCBConst               \n"
16         " ldr r2, [r3]                            \n"
17         "                                         \n"
18         " stmdb r0!, {r4-r11}                     \n"
19         " str r0, [r2]                            \n"
20         "                                         \n"
21         " stmdb sp!, {r3, r14}                    \n"
22         " mov r0, %0                              \n"
23         " msr basepri, r0                         \n"
24         " bl vTaskSwitchContext                   \n"
25         " mov r0, #0                              \n"
26         " msr basepri, r0                         \n"
27         " ldmia sp!, {r3, r14}                    \n"
28         "                                         \n"
29         " ldr r1, [r3]                            \n"
30         " ldr r0, [r1]                            \n"
31         " ldmia r0!, {r4-r11}                     \n"
32         " msr psp, r0                             \n"
33         " bx r14                                  \n"
34         "                                         \n"
35         " .align 2                                \n"
36         " pxCurrentTCBConst: .word pxCurrentTCB   \n"
37         :: "i" ( configMAX_SYSCALL_INTERRUPT_PRIORITY )
38     );
39 }
FIGURE 17.7
Detailed stack layout during a FreeRTOS context switch on the ARM Cortex-
M3 architecture. (The figure shows the task stack holding xPSR, PC, LR,
R0–R3, and R12, saved by hardware, followed by R4–R11, saved by the context
switch routine, together with the SP and PSP registers and the CurrentTCB
and TopOfStack pointers of the current task's TCB.)
The function vPortYieldFromISR requests a context switch by raising a
PendSV interrupt request to the interrupt controller by means of its interrupt
control register, portNVIC_INT_CTRL. The priority assigned to this interrupt
request is the lowest among all interrupt sources. Thus, the corresponding
exception handler is not necessarily executed immediately.
When the processor eventually honors the interrupt request, it automat-
ically saves part of the execution context onto the task stack, namely, the
program status register (xPSR), the program counter and the link register (PC
and LR), as well as several other registers (R0 to R3 and R12). Then it switches
to a dedicated operating system stack and starts executing the exception han-
dling code, xPortPendSVHandler.
The handler first retrieves the task stack pointer PSP and stores it in the
R0 register (line 13). This does not clobber the task context because R0 has
already been saved onto the stack by hardware. Then, it puts into R2 a pointer
to the current TCB taken from the global variable pxCurrentTCB (lines 15–
16).
The handler is now ready to finish the context save initiated by hardware
by pushing onto the task stack registers R4 through R11 (line 18). At last,
the task stack pointer in R0 is stored into the first field of the TCB, that is,
the TopOfStack field (line 19). At this point, the stack layout is as shown in
Figure 17.7, which represents the specialization of Figure 17.3 for the Cortex-
M3 architecture. In particular,
• the stack pointer currently used by the processor, SP, points to the oper-
ating system stack;
• the PSP register points to where the top of the task stack was after excep-
tion entry, that is, below the part of task context saved automatically by
hardware;
• the TopOfStack field of the current task TCB points to the top of the task
stack after the context save has been concluded.
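The initial context of a task is built when the task is created by the architecture-dependent function pxPortInitialiseStack, whose listing is discussed next. A sketch of how it typically looks in the Cortex-M3 port (assumed here, not necessarily the exact listing of the book) is:
portSTACK_TYPE *pxPortInitialiseStack( portSTACK_TYPE *pxTopOfStack,
                                       pdTASK_CODE pxCode, void *pvParameters )
{
    pxTopOfStack--;
    *pxTopOfStack = portINITIAL_XPSR;                 /* xPSR */
    pxTopOfStack--;
    *pxTopOfStack = ( portSTACK_TYPE ) pxCode;        /* PC: task entry point */
    pxTopOfStack--;
    *pxTopOfStack = 0;                                /* LR: catch task return */
    pxTopOfStack -= 5;                                /* R12, R3, R2 and R1 */
    *pxTopOfStack = ( portSTACK_TYPE ) pvParameters;  /* R0: task argument */
    pxTopOfStack -= 8;                                /* R11, R10, ..., R4 */
    return pxTopOfStack;
}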
By comparing the listing with Figure 17.7, it can be seen that the initial
context is set up as follows:
• The initial Processor Status Register xPSR is the value of the macro
portINITIAL_XPSR.
• The Program Counter PC comes from the pxCode argument.
• The Link Register LR is set to 0 so that any attempt of the task to return
from its main function causes a jump to that address and can be caught.
• Register R0, which holds the first (and only) argument of the task entry
function, points to the task parameter block pvParameters.
• The other registers are not initialized.
We have already examined the architecture-dependent functions that switch
the processor from one task to another. Starting the very first task is somewhat
of an exception to this general behavior.
 1 void vPortStartFirstTask( void )
 2 {
 3     __asm volatile (
 4         " ldr r0, =0xE000ED08     \n"
 5         " ldr r0, [r0]            \n"
 6         " ldr r0, [r0]            \n"
 7         " msr msp, r0             \n"
 8         " svc 0                   \n"
 9     );
10 }
11
12 void vPortSVCHandler( void )
13 {
14     __asm volatile (
15         " ldr r3, pxCurrentTCBConst2              \n"
16         " ldr r1, [r3]                            \n"
17         " ldr r0, [r1]                            \n"
18         " ldmia r0!, {r4-r11}                     \n"
19         " msr psp, r0                             \n"
20         " mov r0, #0                              \n"
21         " msr basepri, r0                         \n"
22         " orr r14, #0xd                           \n"
23         " bx r14                                  \n"
24         "                                         \n"
25         " .align 2                                \n"
26         " pxCurrentTCBConst2: .word pxCurrentTCB  \n"
27     );
28 }
The function vPortStartFirstTask first sets the main stack pointer MSP back
to its initial value (lines 4–7); this value is read from the first entry of the
exception table, whose location is in turn obtained by reading the word at
address 0xE000ED08. This is the address of the VTOR register that points to the base
of the exception table.
It should be noted that the MSP (Main Stack Pointer) register being dis-
cussed here is not the same as the PSP (Process Stack Pointer) register we
talked about earlier. The Cortex-M3 architecture, in fact, specifies two dis-
tinct stack pointers. With FreeRTOS the PSP is used when a task is running
whereas the MSP is dedicated to exception handling. The processor switches
between them automatically as its operating mode changes.
The initial context restoration is performed by means of a synchronous
software interrupt request made by the svc instruction (line 8).
This software interrupt request is handled by the exception handler
vPortSVCHandler; its code is very similar to xPortPendSVHandler, but it
only restores the context of the new task pointed to by pxCurrentTCB without
saving the context of the previous task beforehand. This is correct because
there is no previous task at all. As before, the processor base priority mask
basepri is reset to zero (lines 20–21) to enable all interrupt sources as soon
as the exception handling function ends.
Before returning from the exception with a bx instruction, the contents of
the link register LR (a synonym of R14) are modified (line 22) to ensure that the
processor returns to the so-called “thread mode,” regardless of what its mode
was. When handling an exception, the Cortex-M3 processor automatically
enters “handler mode” and starts using the dedicated operating system stack
mentioned earlier.
When the execution of a task is resumed, it is therefore necessary to restore
the state from that task’s stack and keep using the same task stack to con-
tinue with the execution. This is exactly what the exception return instruction
does when it goes back to thread mode. A similar, automatic processor mode
switch for exception handling is supported by most other modern processors,
too, although the exact names given to the various execution modes may be
different.
#define portNVIC_SYSTICK_LOAD    ( ( volatile unsigned long * ) 0xe000e014 )
#define portNVIC_SYSTICK_CTRL    ( ( volatile unsigned long * ) 0xe000e010 )
#define portNVIC_SYSTICK_CLK     0x00000004
#define portNVIC_SYSTICK_INT     0x00000002
#define portNVIC_SYSTICK_ENABLE  0x00000001

void prvSetupTimerInterrupt( void )
{
    *( portNVIC_SYSTICK_LOAD ) =
        ( configCPU_CLOCK_HZ / configTICK_RATE_HZ ) - 1UL;
    *( portNVIC_SYSTICK_CTRL ) =
        portNVIC_SYSTICK_CLK | portNVIC_SYSTICK_INT | portNVIC_SYSTICK_ENABLE;
}

void xPortSysTickHandler( void )
{
    unsigned long ulDummy;

#if configUSE_PREEMPTION == 1
    *( portNVIC_INT_CTRL ) = portNVIC_PENDSVSET;
#endif

    ulDummy = portSET_INTERRUPT_MASK_FROM_ISR();
    {
        vTaskIncrementTick();
    }
    portCLEAR_INTERRUPT_MASK_FROM_ISR( ulDummy );
}
The next two functions manage the interval timer internal to Cortex-M3
processors, also known as SYSTICK:
• The function prvSetupTimerInterrupt programs the timer to generate
periodic interrupt requests at the rate specified by the configTICK_RATE_HZ
configuration variable and starts it.
• The function xPortSysTickHandler handles the timer interrupt requests:
when preemption is enabled, it requests a context switch by raising a PendSV
interrupt request and, in any case, it invokes vTaskIncrementTick to update
the tick counter while the FreeRTOS interrupt sources are masked.
17.4 Summary
Looking at how a real-time operating system, like FreeRTOS, really works
inside is useful for at least two reasons:
18
Internal Structures and Operating Principles of Linux Real-Time Extensions
CONTENTS
18.1 The Linux Scheduler  402
18.2 Kernel Preemption  406
18.3 The PREEMPT_RT Linux Patch  409
18.3.1 Practical Considerations  410
18.4 The Dual-Kernel Approach  412
18.4.1 Adeos  412
18.4.2 Xenomai  414
18.4.3 RTAI  416
18.5 Summary  419
In Linux, at every system tick (normally in the range of 1–10 ms) the kernel
regularly takes control of the processor via the kernel routine scheduler_tick()
and can then adjust task time slices as well as update the priorities of non-real-
time tasks, possibly assigning the processor to a new task.
Unless declared as a First In First Out (FIFO) task, every task is assigned
a timeslice, that is, a given amount of execution time. Whenever the task has
been running for such an amount of time, it expires and the processor can
be assigned to another task of the same priority unless a higher-priority task
is ready at that time. The Linux scheduler uses a queue of task descriptors,
called the run queue, to maintain task specific information. The run queue is
organized in two sets of arrays:
1. Active: Stores tasks that have not yet used their timeslice.
2. Expired: Stores tasks that have used their timeslice.
For every priority level, one active and one expired array are defined. When-
ever the active array becomes empty, the two arrays are swapped, and there-
fore, the tasks can proceed to the next timeslice. Figure 18.1 shows the organi-
zation of the run queue. In order to be able to select the highest-priority task
in constant time, a bitmap of active tasks is used, where every bit corresponds
to a given priority level and defines the presence or absence of ready tasks at
that priority. With this organization of the run queue, adopted since kernel
version 2.4.20, the complexity of the management of the queue is O(1), that
is, the selection of a new task for execution, as well as the reorganization of the
queue, can be performed in constant time regardless of the number of active
tasks in the system. The kernel routine scheduler_tick(), called at every timer
interrupt, performs the following actions:
• If no task is currently running, that is, all tasks are waiting for some event,
the only action is the update of the statistics for every idle task. Statistics
such as the amount of time a task has been in wait state are then used in
the computation of the current priority for non-real-time tasks.
• Otherwise, the scheduler checks the current task to see whether it is a
real-time task (with priority above a given threshold). If the task is a real-
time task, and if it has been scheduled as a FIFO task, no timeslice check
is performed. In fact, a FIFO task remains the current task even in the
presence of other ready tasks of the same priority. Otherwise, the task
has been scheduled as round-robin and, as the other non-real-time tasks,
its timeslice field is decremented. If the timeslice field goes to 0, the task
descriptor is moved to the expired array and, if for that priority level the
active array is empty, the two arrays are swapped, that is, a new timeslice
starts for the tasks at that priority.
• Based on the new statistics, the priorities of non-real-time tasks are recal-
culated. Without entering into the details of the dynamic priority computation,
the priority is adjusted around a given base value so that tasks that are
FIGURE 18.1
Data organization of the Linux O(1) scheduler. (For each priority level, from
the highest priority down to MAX_PRIO − 1, the figure shows an active and
an expired queue of tasks, together with the active bitmap indicating which
priority levels currently have ready tasks.)
more likely to be in a wait state are rewarded with a priority boost. On the
other hand, the priority of compute-intensive tasks, doing very few I/O
operations, tends to be lowered in order to improve the user-perceived re-
sponsiveness of the system, even at the price of a slightly reduced overall
throughput.
Whenever scheduler_tick() detects that a task other than the current one
is eligible to gain processor ownership, it passes control to kernel routine
schedule(), which selects the new running task, possibly performing a con-
text switch. Recall that routine schedule() can also be called by system
routines or device drivers whenever the current task needs to be put in wait
state, waiting for the termination of an I/O or synchronization operation, or,
conversely, when a task becomes newly ready because of the termination of
an I/O or synchronization operation. The actions performed by schedule()
are the following:
• It finds the highest-priority ready task at that time, first by checking the
active bitmap to find the nonempty active queue at the highest priority (re-
call that in Linux lower-priority numbers correspond to higher priorities).
The task at the head of the corresponding active queue is selected: if it cor-
responds to the current task, no further action is required and schedule()
terminates; otherwise, a context switch occurs.
• After saving the context of the current task, the context of the new task is
restored by setting the current content of the page table and by changing
the content of the stack pointer register to the kernel stack (in the task
descriptor) of the new task. As soon as the machine registers are restored,
in particular the Program Counter, the new task resumes, and the final
part of the schedule() routine is executed in the new context.
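Both scheduler_tick() and schedule() rely on the bitmap of active priorities to locate the highest-priority nonempty queue in constant time. The following self-contained sketch illustrates the idea; the names and the simplified data layout are illustrative only and do not correspond to the actual kernel code (the GCC builtin __builtin_ctz is used to find the lowest set bit).
#include <stdint.h>

#define MAX_PRIO  140                    /* number of priority levels */
#define WORD_BITS 32
#define N_WORDS   ((MAX_PRIO + WORD_BITS - 1) / WORD_BITS)

struct prio_bitmap {
    uint32_t bits[N_WORDS];              /* one bit per priority level */
};

static void mark_ready(struct prio_bitmap *b, int prio)
{
    b->bits[prio / WORD_BITS] |= 1u << (prio % WORD_BITS);
}

/* Returns the lowest set bit index, that is, the highest priority (recall
   that in Linux lower priority numbers mean higher priority), or -1 if no
   task is ready. The loop is bounded by N_WORDS, so the cost does not
   depend on the number of ready tasks. */
static int highest_ready(const struct prio_bitmap *b)
{
    for (int w = 0; w < N_WORDS; w++)
        if (b->bits[w])
            return w * WORD_BITS + __builtin_ctz(b->bits[w]);
    return -1;
}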
Although the user-space part of the address space may be different from task
to task, the kernel memory mapping never changes.
Therefore, kernel routines can exchange pointers among tasks with the guar-
antee that they consistently refer to the same objects in memory. Recall that
it is not possible to exchange memory pointers among user tasks because the
same virtual address may refer to different physical memory due to the
task-specific page table content.
The basic requirement for kernel preemption is the possibility of the kernel
code being interrupted. The kernel cannot, of course, be made preemptible
outright because there would be the risk of corrupting the kernel data
structures. For this reason, the kernel code defines a number of critical
sections that are protected by spinlocks. A global counter, preempt_count,
keeps track of the currently active critical sections. When its value equals
zero, the kernel can be preempted; otherwise, preemption is forbidden.
preempt_count is incremented every time a spinlock is acquired, and
decremented when it is released.
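The following self-contained, single-processor sketch illustrates the mechanism; the names (my_spin_lock, preempt_count, and so on) are illustrative only and do not reproduce the kernel's actual implementation.
#include <stdbool.h>

static volatile int preempt_count = 0;   /* nesting level of critical sections */
static volatile int lock_word = 0;       /* 0 = lock free, 1 = lock taken */

static void my_spin_lock(void)
{
    preempt_count++;                             /* preemption now forbidden */
    while (__sync_lock_test_and_set(&lock_word, 1))
        ;                                        /* busy wait for the lock */
}

static void my_spin_unlock(void)
{
    __sync_lock_release(&lock_word);
    preempt_count--;                             /* leaving the critical section */
}

static bool preemption_allowed(void)
{
    return preempt_count == 0;                   /* outermost section was left */
}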
Suppose a high-priority real-time task τ2 is waiting for an event repre-
sented by an interrupt that arrives at time T1 while the system is executing
in kernel mode within the context of a low-priority task τ1 , as shown in Fig-
ure 18.2. In a non-preemptible kernel, the context switch will occur only at
time T2 , that is, when the kernel section of task τ1 terminates. Only at that
time, will the scheduler be invoked, and therefore, task τ2 will gain proces-
sor ownership with a delay of T2 − T1 , which can be tens or even hundreds
of milliseconds. In a preemptible kernel, it is not necessary to wait until the
whole kernel section of task τ1 has terminated. If the kernel is executing in
a preemptible section, as soon as the interrupt is received, the ISR is
called and schedule() invoked, giving the processor to task τ2. In the case
where the kernel is executing a critical section, the interrupt will be served at
time T′1 < T2, that is, when the kernel code exits the critical section. In this
case, the delay experienced by task τ2 is T′1 − T1, which is normally very short
since the code protected by spinlocks is usually very limited.
An important consequence of kernel preemptibility is that kernel activities
can now be delegated to kernel threads. Consider the case in which an I/O
device requires some sort of kernel activity in response to an interrupt, for
example, the interrupt generated by the disk controller to signal the termination
of the Direct Memory Access (DMA) transfer for a block of data. If such activity
were carried out by the ISR associated with the interrupt, the processor would
not be able to perform anything else even if a more urgent task became ready
in the meantime. If, after performing a minimal action, the ISR delegated the
rest of the required activity to a kernel task, there would be a chance for
a more urgent task to gain processor ownership before the related activity is
terminated.
Spinlocks represent the most basic locking mechanism in the kernel and,
as such, can be used also in ISR code. If the kernel code is not associated with
ISR but runs within the context of a user or a kernel task, critical sections
can be protected by semaphores. Linux kernel semaphores provide two basic
functions: up() and down(). If a task calls down() on a semaphore, the count
field of the semaphore is decremented. If the resulting value is negative, the
task calling down() is blocked and added to the semaphore's waiting queue;
otherwise, the task continues. Calling up() increments the count field and, if
any task is waiting on the semaphore's queue, wakes one of them. Semaphores
have the advantage over spinlocks of allowing another task to gain processor
usage while waiting for the resource, but they cannot be used for
synchronization with ISRs (an ISR does not run in the context of any task).
Moreover, when the critical section is very short, it may be preferable to use
spinlocks because they are simpler and introduce less overhead.
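The two locking styles can be contrasted with the following kernel-module sketch; the identifiers my_sem, my_lock, and the functions around them are illustrative only, while the locking primitives themselves (sema_init(), down(), up(), and the spin_lock_irqsave() family) are part of the standard kernel API.
#include <linux/semaphore.h>
#include <linux/spinlock.h>

static struct semaphore my_sem;              /* protects a long critical section */
static DEFINE_SPINLOCK(my_lock);             /* protects a very short one */

static void my_locks_init(void)
{
    sema_init(&my_sem, 1);                   /* binary semaphore, initially free */
}

/* Task context only: down() may put the caller to sleep. */
static void update_from_task(void)
{
    down(&my_sem);
    /* ... critical section that may be long or may sleep ... */
    up(&my_sem);
}

/* Usable also from an ISR: the critical section must be very short. */
static void update_from_isr(void)
{
    unsigned long flags;

    spin_lock_irqsave(&my_lock, flags);      /* also masks local interrupts */
    /* ... very short critical section ... */
    spin_unlock_irqrestore(&my_lock, flags);
}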
FIGURE 18.2
Latency due to non-preemptible kernel sections. (The figure compares a non-
preemptive kernel, where the high-priority task τ2 gains the processor only at
time T2, when the low-priority task τ1 leaves kernel mode, with a preemptive
kernel, where the context switch already occurs at time T′1, as soon as τ1
exits its spinlock-protected critical region.)
FIGURE 18.3
The evolution of kernel preemption in Linux. (The figure contrasts the non-
preemptible V 2.4 kernel with the preemptible V 2.6 kernel.)
18.4.1 Adeos
Adeos is a resource virtualization layer, that is, a software system that inter-
faces to the hardware machine and provides a Hardware Abstraction Layer
(HAL). Adeos enables multiple entities, called domains, to exist simultane-
ously on the same machine. Instead of interfacing directly to the hardware,
every domain relies on an Application Programming Interface (API) exported
FIGURE 18.4
The Adeos pipeline. (Interrupts and traps enter the pipeline at the highest-
priority domain and flow down toward the lowest-priority one, the root domain
hosting the Linux kernel.)
by Adeos to interact with the machine. In this way, Adeos can properly
dispatch interrupts and exceptions to domains. A domain is a kernel-based
software component that can ask the Adeos layer to be notified of incom-
ing external interrupts, including software interrupts generated when system
calls are issued by user applications. Normally, at least two domains will be
defined: a real-time kernel application carrying out real-time activity, and a
Linux kernel for the rest of the system activities. It is in principle possible
to build more complex applications where more than one operating system is
involved, each represented by a different domain. Adeos ensures that system
events, including interrupts, are orderly dispatched to the various client do-
mains according to their respective priority. So, a high-priority domain will
receive system event notification before lower-priority domains and can decide
whether to pass them to the other ones. All active domains are queued ac-
cording to their respective priority, forming a pipeline of events (Figure 18.4).
Incoming events are pushed to the head of the pipeline and progress down to
its tail. Any pipeline stage corresponding to a domain can be stalled, which
means that the next incoming interrupt will be delivered neither to that domain
nor to the downstream domains. While a stage is stalled, pending
interrupts accumulate in the domain's interrupt log and eventually get played
when the stage is unstalled. This mechanism is used by domains to protect
their critical sections from interrupts. Recall that, in Linux, critical sections are
also protected by disabling interrupts, thus preventing interrupt handler code
from interfering with the critical section. If the Linux kernel is represented by an
Adeos domain, critical sections are protected by stalling the corresponding
pipeline stage. Interrupts are not delivered to downstream stages (if any), but
upstream domains will keep on receiving events. In practice, this means that
a real-time system running ahead of the Linux kernel in the pipeline would
still be able to receive interrupts at any time with no incurred delay. In this
FIGURE 18.5
The Xenomai domains in the Adeos pipeline. (Interrupts and traps traverse,
in order, the Xenomai primary domain, the interrupt shield, and the Linux
secondary domain.)
18.4.2 Xenomai
Xenomai is a real-time extension of Linux based on Adeos and defines three
Adeos domains: the primary domain, which hosts the real-time nucleus in-
cluding a scheduler for real-time applications; the interrupt shield, used to
selectively block the propagation of interrupts; and the secondary domain,
consisting of the Linux kernel (Figure 18.5). The primary domain receives all
incoming interrupts first before the Linux kernel has had the opportunity to
notice them. The primary domain can therefore use such events to perform
real-time scheduling activities, regardless of any attempt of the Linux kernel
to lock them, by stalling the corresponding pipeline stage to protect Linux
kernel critical sections.
Xenomai allows running real-time threads, called Xenomai threads, either
strictly in kernel space or within the address space of a Linux process. All
FIGURE 18.6
The Xenomai layers. (From bottom to top: the hardware, the low-level Linux
HAL, Adeos, and the Linux and Xenomai domains on top of it.)
Xenomai threads are known to the primary domain and normally run in the
context of this domain, which is guaranteed to receive interrupts regardless of
the activity of the secondary (Linux) domain. A Xenomai thread can use spe-
cific primitives for thread synchronization but is also free to use Linux system
calls. In the latter case, the Xenomai thread is moved to the secondary domain
and will rely on the services offered by the Linux kernel. Conversely, when a
Xenomai thread running in the secondary domain invokes a possibly block-
ing Xenomai system call, it will be moved to the primary domain before the
service is eventually performed, relying on the Xenomai-specific kernel data
structures. Xenomai threads can be moved back and forth between the primary
and secondary domains depending on the kind of services (Linux vs. Xenomai
system calls) requested. Even when moved to the secondary (Linux) domain, a
Xenomai thread can maintain real-time characteristics: this is achieved by pre-
venting the Xenomai thread's execution from being perturbed by non-real-time Linux inter-
rupt activities. A simple way to prevent delivery of interrupts to the Linux
kernel when the Xenomai thread is running in the secondary (Linux) domain
is to stall the interrupt shield domain lying between the primary and sec-
ondary ones. The interrupt shield will then be disengaged when the Xenomai
thread finishes the current computation. In this way, Xenomai threads can
have real-time characteristics even when running in the Linux domain, albeit
suffering from an increased latency due to the fact that, in this domain, they
are scheduled by the original Linux scheduler.
Letting real-time Xenomai threads work in user space and use Linux sys-
tem calls simplifies the development of real-time applications and, above all,
allows an easy porting of existing applications from other systems. To this
purpose, the Xenomai API provides several sets of synchronization primitives,
called skins, each emulating the set of system routines of a given operating
system.
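As an illustration of what working against a skin looks like, the following sketch shows a periodic user-space thread written with the Xenomai 2 "native" API; the task name, priority, and period are arbitrary, and the example is an assumption-based sketch rather than code taken from this chapter.
#include <native/task.h>
#include <native/timer.h>

static RT_TASK control_task;

static void control_loop(void *arg)
{
    /* Become periodic with a 1 ms period, expressed in nanoseconds. */
    rt_task_set_periodic(NULL, TM_NOW, 1000000);
    for (;;) {
        rt_task_wait_period(NULL);    /* runs in the primary domain */
        /* ... real-time work; calling a Linux system call here would
           migrate the thread to the secondary domain ... */
    }
}

int start_control(void)
{
    int err = rt_task_create(&control_task, "control", 0, 50, 0);
    if (err == 0)
        err = rt_task_start(&control_task, control_loop, NULL);
    return err;
}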
The Adeos functionality is used in Xenomai also for handling system call
interception. In fact, system calls performed by Xenomai threads, including
Linux system calls, must be intercepted by Xenomai to properly handle the
migration of the thread between the primary and secondary domain. This is
achieved thanks to the possibility offered by Adeos of registering an event
handler that is then activated every time a syscall is executed (and therefore
a software interrupt is generated).
The layers of Xenomai are shown in Figure 18.6. At the lowest level is the
hardware, which is directly interfaced to the Linux HAL, used by Adeos to
export a new kind of HAL that supports event and interrupts dispatching.
This layer is then exported to the primary (Xenomai) and secondary (Linux)
domains.
18.4.3 RTAI
Real-Time Application Interface (RTAI) is another example of a dual-kernel
real-time extension of Linux based on Adeos. Its architecture is not far from
that of Xenomai, and the Xenomai project itself originates from a common
development with RTAI, from which it separated in 2005.
In RTAI, the Adeos layer is used to provide the dispatching of system
events to two different domains: the RTAI scheduler, and the Linux kernel.
In RTAI, however, the Adeos software has been patched to adapt it to the
specific requirements of RTAI. So, the clean distinction in layers of Xenomai
(Figure 18.6) is somewhat lost, and parts of the RTAI code make direct access
to the underlying hardware (Figure 18.7). The reason for this choice is the
need to avoid passing through the Adeos layer when dispatching those
events that are critical for real-time responsiveness. Despite this difference in
implementation, the concept is the same: let interrupts, which may signal
system events requiring some action from the system, reach the real-time
scheduler before they are handled by the Linux kernel and regardless of the
possible interrupt-disabling actions performed by the Linux kernel to protect
critical sections.
The main components of RTAI are shown in Figure 18.8. Above the ab-
straction layer provided by the patched version of Adeos, the RTAI scheduler
organizes the execution of real-time tasks. The RTAI IPC component provides
InterProcess Communication (IPC) between real-time and Linux tasks.
The RTAI scheduler handles two main kinds of tasks:
1. Native kernel RTAI threads. These tasks live outside the Linux en-
vironment and therefore cannot use any Linux resource. It is, how-
ever, possible to let them communicate with Linux tasks using the
RTAI IPC mechanisms.
FIGURE 18.7
The RTAI layers. (As in Figure 18.6, the hardware, the low-level Linux HAL,
and Adeos form the lower layers, but here parts of RTAI also access the
hardware directly, alongside Linux.)
FIGURE 18.8
The RTAI components. (RTAI IPC and the RTAI scheduler are layered on
top of Adeos/HAL.)
RM and EDF refer to timed tasks. RTAI interfaces directly to the hardware
timer device in order to manage the occurrence of timer ticks with a minimum
overhead. For the EDF scheduling policy, it is also necessary to define release
time and deadline for each timed RTAI thread.
While the shortest latencies in the system are achieved using native kernel
RTAI threads, the possibility of providing real-time responsiveness to selected
Linux tasks increases the practical usability of RTAI. For this purpose, a
new scheduler, called LinuX RealTime (LXRT), has been provided in RTAI,
which is also able to manage Linux tasks. A Linux task can be made real-time
by invoking the RTAI rt_make_hard_real_time() routine. In this case, the
task is moved from the Linux scheduler to the RTAI one, and therefore, it
enters the set of tasks that are considered by the RTAI scheduler for real-
time responsiveness. In order to provide a fast response, the RTAI scheduler
is invoked when either
Considering the fact that, thanks to the underlying Adeos layer, interrupts are
first delivered to the RTAI domain, the LXRT scheduler has a chance to release
new ready tasks whenever a significant system event occurs, including the soft
interrupt originated by the activation of a system routine in the Linux kernel
(traps are redirected by Adeos to the RTAI domain). Basically, the LXRT
scheduler works as a coscheduler of the Linux one: whenever no real-time
task (either “converted” Linux or native RTAI thread) can be selected for
computation, control is passed to the Linux scheduler by invoking the original
schedule() function.
A real-time Linux task can at any time return to its original condition by
invoking rt_make_soft_real_time(). In this case, its descriptor is removed
from the task queues of the LXRT scheduler and put back into the
Linux scheduler.
Two levels of priority inheritance are supported by the RTAI scheduler:
18.5 Summary
The interest in real-time Linux is growing due to several factors, among
which are the following:
This chapter presented the two different approaches that have been followed
in the implementation of real-time systems based on Linux: the “mainstream”
evolution of the Linux kernel, and the dual-kernel organization. The former
is a slower process: the Linux kernel is complex and changes in such a crit-
ical part of software require a long time to be fully tested and accepted by
the user community because they potentially affect a great number of exist-
ing applications. The dual-kernel approach circumvents the problem of kernel
complexity, and therefore, allows implementing working systems in a much
shorter time with much less effort. Nevertheless, systems like Xenomai and
RTAI are implemented as patches (for Adeos) and loadable modules for the
Linux kernel, and require continuous work to adapt them to the evolution of
the Linux kernel. On the other hand, such systems are likely to achieve shorter
latencies because they remove the problem from its origin by defining a com-
pletely different path, outside the Linux kernel, for those system events that
are involved in real-time operations.
The reader may wonder now which approach is going to be the “winner.”
Giving an answer to such a question is not easy, but the authors’ impres-
sion is that the mainstream Linux evolution is likely to become the common
choice, restricting the usage of dual-core solutions to a selected set of highly
demanding applications.
19
OS Abstraction Layer
CONTENTS
19.1 An Object Oriented Interface to Threads and Other IPC Mechanisms  423
19.1.1 Linux Implementation  424
19.1.2 FreeRTOS Implementation  430
19.2 A Sample Multiplatform Application  436
19.3 Summary  440
Throughout this book we have presented examples based on two different open
source operating systems: Linux and FreeRTOS. These two systems represent,
in a sense, two opposite extremes in complexity. Linux is a full-fledged system
supporting all those features that are required for handling complex systems,
and is used in top-level applications such as large servers. FreeRTOS is a min-
imal system oriented towards small applications, with minimal requirements
for memory and computing resources, so that it can be used on very tiny
systems such as microcontrollers.
Despite their different sizes, Linux and FreeRTOS provide a similar view
of the system, where computation is carried out by a number of threads that
can interact by sharing memory and synchronizing via semaphores. Linux
provides, of course, support for more sophisticated features such as virtual
memory, processes, and a richer I/O interface, but nevertheless the conceptual
models of the two systems are basically the same. Where they differ is in
their Application Programming Interface (API), and therefore an application
written for FreeRTOS cannot be ported to Linux as it is, even if all the
features supported by FreeRTOS are supported by Linux as well.
The difference in API is not an issue when the development of the
embedded system targets a specific architecture. However, there is a
growing number of applications that require multiplatform support, that is,
the ability of the application code to be used on different operating systems,
for the following two reasons:
For a long time, C++ compilers were not available in all the tool chains used
to generate the application code of embedded systems and, above all, the
code produced by C++ compilers was rather inefficient, requiring, for example,
a large number of libraries (and therefore more code in memory) to carry out
even simple operations. This is no longer the case, and modern C++ compilers
are able to produce highly efficient code which can be safely used in
time-critical applications.
class SystemException
{
    char errorMsg[512];
public:
    /* Constructor: convert the passed error code and append it to the
       passed string message */
    SystemException(const char *msg, int errNo)
    {
        memset(errorMsg, 0, 512);
        sprintf(errorMsg, "%s: %s", msg, strerror(errNo));
    }
    /* Get the error message */
    char *what()
    {
        return errorMsg;
    }
};
Class Semaphore specifies a counting semaphore. Its private field is the handle
of the underlying Linux semaphore, which is created by the class’ constructor.
The class exports the public wait() and post() methods.
class Semaphore
{
    /* The corresponding Linux semaphore handle */
    sem_t semHandle;
public:
    /* Constructor: create the semaphore with the given initial value
       (the method bodies below are a reconstruction based on the POSIX
       semaphore API, as the original listing is truncated here) */
    Semaphore(int initVal = 0)
    {
        if (sem_init(&semHandle, 0, initVal) != 0)
            throw new SystemException("Error Creating Semaphore", errno);
    }
    void wait()
    {
        sem_wait(&semHandle);
    }
    void post()
    {
        sem_post(&semHandle);
    }
    ~Semaphore()
    {
        sem_destroy(&semHandle);
    }
};
Class Mutex provides mutual exclusion; its constructor initializes a pthread
mutex, and the lock() and unlock() methods map directly onto the
corresponding pthread primitives.
class Mutex
{
    /* The pthread mutex */
    pthread_mutex_t mutex;
public:
    /* Constructor: initialize the internal pthread mutex */
    Mutex()
    {
        int status = pthread_mutex_init(&mutex, NULL);
        if (status != 0)
            throw new SystemException("Error Creating Mutex", errno);
    }
    void lock()
    {
        pthread_mutex_lock(&mutex);
    }
    void unlock()
    {
        pthread_mutex_unlock(&mutex);
    }
    ~Mutex()
    {
        pthread_mutex_destroy(&mutex);
    }
};
Class Condition carries out the monitor functionality. Its public method
wait() suspends the execution of the current thread until the monitor is
signaled via the public method signal(). Method wait() can accept as a param-
eter a pointer to a Mutex object. This mutex will be unlocked prior to waiting
and locked again before method wait() returns.
The Linux implementation is straightforward and the monitor functional-
ity is mapped onto a pthread condition object.
class Condition
{
    pthread_cond_t cond;
public:
    /* Constructor: initialize the internal pthread condition object */
    Condition()
    {
        int status = pthread_cond_init(&cond, NULL);
        if (status != 0)
            throw new SystemException("Error Creating Condition", errno);
    }
Class MessageQueue wraps a Linux (System V) message queue and is used to
exchange fixed-size messages.
class MessageQueue
{
    /* The size of the exchanged messages */
    int itemSize;
    /* The Linux message queue identifier */
    int msgId;
    /* Internal buffer holding the message class followed by the payload */
    char *msgBuf;
public:
    /* Constructor: initialize the Linux message queue and allocate the
       internal buffer */
    MessageQueue(int itemSize)
    {
        this->itemSize = itemSize;
        msgId = msgget(IPC_PRIVATE, 0666);
        if (msgId == -1)
            throw new SystemException("Error Creating Message Queue", errno);
        msgBuf = new char[sizeof(long) + itemSize];
        /* The message class required by Linux is here always 1 */
        *((long *)msgBuf) = 1;
    }
    /* Send a message: the message dimension has been declared in the constructor */
    void send(void *item)
    {
        memcpy(&msgBuf[sizeof(long)], item, itemSize);
        int status = msgsnd(msgId, msgBuf, itemSize, 0);
        if (status == -1)
            throw new SystemException("Error Sending Message", errno);
    }
    /* Receive a message, possibly waiting for it, and return the number
       of bytes actually read */
    int receive(void *retItem)
    {
        int retBytes = msgrcv(msgId, msgBuf, itemSize, 0, 0);
        if (retBytes == -1)
            throw new SystemException("Error Receiving Message", errno);
        /* Copy the message from the internal buffer into the client's buffer,
           skipping the first longword containing the message class */
        memcpy(retItem, &msgBuf[sizeof(long)], retBytes);
        return retBytes;
    }
    /* Destructor: remove the Linux message queue structures and
       deallocate the buffer */
    ~MessageQueue()
    {
        msgctl(msgId, IPC_RMID, NULL);
        delete[] msgBuf;
    }
};
The following are the classes and structures used to implement Thread class.
Class Runnable is declared as an abstract class and must be implemented
by the client application. Thread’s method start() takes two arguments: the
pointer of a Runnable subclass instance and a generic argument to be passed to
Runnable’s method run(). Since the routine passed to pthread create() ac-
cepts only one parameter, this is defined as the pointer to a structure contain-
ing both the address of the Runnable instance and the argument. A pointer to
routine handlerWithArg() is passed to pthread create(), and this routine
will in turn call Runnable’s run() method with the specified argument. Thread
stack size and priority can be set using setter methods setStackSize() and
setPriority(). Finally, method join() will suspend caller’s execution until
the created thread terminates.
/* Abstract class for representing Runnable entities. It will be
   inherited by user-provided classes in order to specify
   specific thread code. The class has only one virtual method,
   run(), which will receive a void * generic argument */
class Runnable
{
public:
    virtual void run(void *arg) = 0;
};
class Thread
{
    /* Arguments for the thread routine: the Runnable instance and its
       generic argument (fields reconstructed from their use below) */
    struct
    {
        Runnable *rtn;
        void *arg;
    } args;
    pthread_t threadId;
    pthread_attr_t attr;
    int priority;
public:
    /* Constructor: initialize pthread attributes */
    Thread()
    {
        pthread_attr_init(&attr);
        priority = -1;
    }
    /* Set the thread stack size */
    void setStackSize(int stackSize)
    {
        pthread_attr_setstacksize(&attr, stackSize);
    }
    /* Set the thread priority */
    void setPriority(int priority)
    {
        /* Priority is simply recorded, as the priority can be set only
           when the thread is started */
        this->priority = priority;
    }
    /* Start a new thread. The passed routine is embedded in an instance of
       a (subclass of) Runnable class */
    void start(Runnable *rtn, void *arg)
    {
        /* Prepare the argument structure */
        args.rtn = rtn;
        args.arg = arg;
        int status = pthread_create(&threadId, &attr,
            (void *(*)(void *))handlerWithArg, (void *)&args);
        if (status != 0)
            throw new SystemException("Error Creating Thread", errno);
        if (priority != -1) /* If not default priority */
        {
            status = pthread_setschedprio(threadId, priority);
            if (status != 0)
                throw new SystemException("Error setting Thread Priority", errno);
        }
    }
    /* Wait for the termination of the created thread */
    void join()
    {
        pthread_join(threadId, NULL);
    }
    /* Start the scheduler: a dummy operation in the Linux implementation
       (method assumed from its use in main() below) */
    static void startScheduler()
    {
    }
};
class SystemException
{
    char errorMsg[512];
public:
    SystemException(const char *msg)
    {
        memset(errorMsg, 0, 512);
        sprintf(errorMsg, "%s", msg);
    }
    char *what()
    {
        return errorMsg;
    }
};
class Semaphore
{
    /* The FreeRTOS counting semaphore handle */
    xSemaphoreHandle semHandle;
public:
    /* Constructor: creates the semaphore object */
    Semaphore(int initVal = 0)
    {
        semHandle = xSemaphoreCreateCounting(MAX_FREERTOS_SEM_VAL, initVal);
        if (semHandle == NULL)
            throw new SystemException("Error initializing Semaphore");
    }
    void wait()
    {
        if (xSemaphoreTake(semHandle, portMAX_DELAY) != pdTRUE)
            throw new SystemException("Error waiting semaphore");
    }
    void post()
    {
        if (xSemaphoreGive(semHandle) != pdTRUE)
            throw new SystemException("Error posting Semaphore");
    }
    ~Semaphore()
    {
        vQueueDelete(semHandle);
    }
};
class Mutex
{
    /* The FreeRTOS recursive mutex handle */
    xSemaphoreHandle semHandle;
public:
    /* Constructor: creates the recursive binary semaphore */
    Mutex()
    {
        semHandle = xSemaphoreCreateRecursiveMutex();
        if (semHandle == NULL)
            throw new SystemException("Error Creating Mutex");
    }
    /* Recursive mutexes must be taken and given with the ...Recursive
       variants of the semaphore primitives */
    void lock()
    {
        if (xSemaphoreTakeRecursive(semHandle, portMAX_DELAY) != pdTRUE)
            throw new SystemException("Error locking mutex");
    }
    void unlock()
    {
        if (xSemaphoreGiveRecursive(semHandle) != pdTRUE)
            throw new SystemException("Error unlocking mutex");
    }
    ~Mutex()
    {
        vQueueDelete(semHandle);
    }
};
While in Linux monitors are directly supported by the pthread library, they
are not natively available in FreeRTOS. Therefore, in the FreeRTOS im-
plementation, monitors must be implemented using the available objects
(Mutexes and Semaphores). It would be possible to use the FreeRTOS
semaphores directly, but using the interface classes makes the code more read-
able. In this way, there is also no need for a destructor: the destructors of the
object fields, which discard the native semaphores, are automatically called
whenever the Condition instance is discarded.
class Condition
{
    /* The Mutex to protect condition data structures */
    Mutex mutex;
    /* The Semaphore used to wake waiting processes, initially set to
       zero */
    Semaphore sem;
    /* The number of tasks currently waiting on the condition */
    int waitingCount;
public:
    /* Constructor: the mutex and semaphore objects are created by the
       constructors of fields mutex and sem,
       so the only required action is to initialize waitingCount to zero */
    Condition()
    {
        waitingCount = 0;
    }
    /* Simulated wait procedure: increment the waiting counter, wait
       for the semaphore after releasing the passed Mutex, if any,
       and acquire it afterwards */
    void wait(Mutex *userMutex)
    {
        waitingCount++;
        if (userMutex)
            userMutex->unlock();
        sem.wait();
        if (userMutex)
            userMutex->lock();
    }
    /* Simulated signal procedure: check the waiting counter. If greater
       than zero, there is at least one waiting task, which is awakened
       by posting the semaphore. The check and possible decrement of
       variable waitingCount must be performed in a critical section,
       protected by the Mutex */
    void signal()
    {
        mutex.lock();
        if (waitingCount == 0)
        {
            /* No waiting tasks */
            mutex.unlock();
            return;
        }
        /* There is at least one waiting task */
        waitingCount--;
        sem.post();
        mutex.unlock();
    }
};
class MessageQueue
{
    /* The size of the exchanged messages */
    int itemSize;
    /* The FreeRTOS queue handle */
    xQueueHandle queue;
public:
    /* Constructor: create the native message queue object */
    MessageQueue(int itemSize)
    {
        this->itemSize = itemSize;
        queue = xQueueCreate(MAX_FREERTOS_QUEUE_LEN, itemSize);
        if (queue == NULL)
The basic concept in the Thread interface, that is, embedding the code to be
executed by the thread into the method run() of a Runnable subclass, is retained
in the FreeRTOS implementation. There are, however, three main differences
with the Linux implementation:
    /* Start the scheduler */
    static void startScheduler()
    {
        vTaskStartScheduler();
    }
};
class TimeInterval
{
    long secs;
    long nanoSecs;
public:
    TimeInterval(long secs, long nanoSecs)
    {
        this->secs = secs + nanoSecs / 1000000000;
        this->nanoSecs = nanoSecs % 1000000000;
    }
    TimeInterval(long milliSecs)
    {
        this->secs = milliSecs / 1000;
        this->nanoSecs = (milliSecs % 1000) * 1000000;
    }
class Timer
{
public:
    void sleep(TimeInterval &tv)
    {
        portTickType numTicks =
            (tv.getTotMilliSecs() * configTICK_RATE_HZ) / 1000;
        vTaskDelay(numTicks);
    }
};
#endif // OS_ABSTRACTION_FREE_RTOS_H
The reader may wonder why the subscriber should receive a new kind
of object (of class MulticastReceiver) instead of directly a message queue
instance. After all, class MulticastReceiver is just a container for the mes-
sage queue, and doesn’t do anything more. The reason for this choice is that
class MulticastReceiver allows hiding the actual implementation of the mul-
ticast mechanism. The subscriber is not interested in knowing how multicast
is implemented, and it therefore makes no sense to expose such knowledge.
Moreover, if for any reason the internal structure of MulticastManager were
changed, using a different mechanism in place of message queues, this change
would be reflected in the subscriber's interface if it were not hidden by the in-
terface class MulticastReceiver.
Classes MulticastReceiver and MulticastManager are listed below:
#ifndef MULTICAST_H
#define MULTICAST_H
#include "OSAbstraction.h"
class MulticastReceiver
{
    MessageQueue *mq;
public:
    /* This class is instantiated only by MulticastManager,
       passing the corresponding message queue */
    MulticastReceiver(MessageQueue *mq)
    {
        this->mq = mq;
    }
    /* Called by the subscriber to receive multicast messages */
    void receive(void *retItem)
    {
        mq->receive(retItem);
    }
};
#define INITIAL_MAX_CLIENTS 100
class MulticastManager
{
    /* Maximum number of clients before reallocating arrays */
    int maxClients;
    /* Actual number of clients */
    int currClients;
    /* All exchanged information is assumed of the same size */
    int itemSize;
    /* Mutex to protect data structures */
    Mutex mutex;
    /* Array of message queue references */
    MessageQueue **msgQueues;
public:
    /* The dimension of the messages is passed to the constructor */
    MulticastManager(int itemSize)
    {
        this->itemSize = itemSize;
        /* Initial allocation */
        maxClients = INITIAL_MAX_CLIENTS;
        msgQueues = new MessageQueue *[maxClients];
        currClients = 0;
    }
    /* Called by the subscriber actor */
    MulticastReceiver *subscribe()
    {
        /* Create a message queue for this subscriber */
        MessageQueue *mq = new MessageQueue(itemSize);
        /* Update the message queue array, possibly reallocating it */
        mutex.lock();
        if (currClients == maxClients)
        {
            /* Need reallocation: double the number of allocated message
               queue pointers */
            int newMaxClients = maxClients * 2;
            MessageQueue **newMsgQueues = new MessageQueue *[newMaxClients];
            /* Copy the existing pointers (the size must be given in bytes) */
            memcpy(newMsgQueues, msgQueues, maxClients * sizeof(MessageQueue *));
            delete[] msgQueues;
            msgQueues = newMsgQueues;
            maxClients = newMaxClients;
        }
        /* At this point there is room for sure */
        msgQueues[currClients++] = mq;
        mutex.unlock();
        return new MulticastReceiver(mq);
    }
    /* Publish a message: it will be received by all subscribers */
    void publish(void *item)
    {
        /* Lock data structures to avoid interference with other
           publish/subscribe operations */
        mutex.lock();
        /* Send the message to all subscribers */
        for (int i = 0; i < currClients; i++)
            msgQueues[i]->send(item);
        mutex.unlock();
    }
};
#endif // MULTICAST_H
public:
/* Constructor receiving a reference of the multicast manager */
    MulticastListener(MulticastManager *mm)
    {
        this->mm = mm;
    }
/* Run method executed by the thread. The passed argument here
   is a pointer to the index of the thread */
    void run(void *arg)
    {
        printf("run Thread %d\n", *(int *)arg);
        /* Wait 1 second */
        timer.sleep(ti);
    }
        /* The last message contains the QUIT code */
        int quitCode = QUIT_CODE;
        mm.publish(&quitCode);
        /* Wait for the termination of all threads */
        for (int i = 0; i < NUM_THREADS; i++)
            threads[i].join();
        printf("End of publisher\n");
    }
    catch (SystemException *exc)
    {
        /* If anything went wrong, print the error message */
        printf("System error in publisher: %s\n", exc->what());
    }
}
};
/* Main program */
int main(int argc, char *argv[])
{
    Thread publisher;
/* Create and start the publisher thread. */
/* It will create the listeners, send them messages,
   and join with them */
    try {
        publisher.start(new MulticastPublisher(), NULL);
        /* Start the scheduler (dummy in the Linux implementation) */
        Thread::startScheduler();
    }
    catch (SystemException *exc)
    {
        printf("System error in main: %s\n", exc->what());
    }
}
19.3 Summary
This chapter has presented a possible approach to handling multiplatform
applications. Handling interaction with more than one operating system may
be desirable for several reasons, among which are these:
20
Control Theory and Digital Signal Processing Primer
CONTENTS
20.1 Case Study 1: Controlling the Liquid Level in a Tank . . . . . . . . . . . . . . . . 444
20.1.1 The Use of Differential Equations to Describe the Dynamics of
the System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 445
20.1.2 Introducing an Integral Gain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 448
20.1.3 Using Transfer Functions in the Laplace Domain . . . . . . . . . . . . . 450
20.1.4 Deriving System Properties from Its Transfer Function . . . . . . . 452
20.1.5 Implementing a Transfer Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 455
20.1.6 What We Have Learned . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461
20.2 Case Study 2: Implementing a Digital Low-Pass Filter . . . . . . . . . . . . . . . . 462
20.2.1 Harmonics and the Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . 462
20.2.2 Low-Pass Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466
20.2.3 The Choice of the Sampling Period . . . . . . . . . . . . . . . . . . . . . . . . . . . 473
20.2.4 Building the Digital Low-Pass Filter . . . . . . . . . . . . . . . . . . . . . . . . . . 479
20.2.5 Signal to Noise Ratio (SNR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 482
20.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 483
Two case studies will be presented in this chapter in order to introduce basic
concepts in Control Theory and Digital Signal Processing, respectively.
The first one will consist of a simple control problem: regulating the flow of
liquid into a tank in order to stabilize its level. Here, some basic concepts
of control theory will be introduced to let the reader become familiar with
the concepts of transfer function and system stability, and with the techniques
for their practical implementation.
FIGURE 20.1
The tank–pump system (figure: pump delivering flow f(t) into a tank with liquid level h(t)).
FIGURE 20.2
Tank–pump system controlled in feedback mode.
the liquid is being pumped away from the tank. The tank–pump system input and
output are correlated as follows:

h(t) = h0 + (1/B) ∫_0^t f(τ) dτ    (20.1)

where h0 is the level in the tank at time 0, and B is the base surface of
the tank. f(τ)dτ represents, in fact, the liquid volume change during the
infinitesimal time dτ. In order to control the level of the liquid, we may think
of sending to the actuator (the pump) a reference signal that is proportional
to the difference between the reference level and the measured level, that is,

f(t) = Kp[href(t) − h(t)]    (20.2)
corresponding to the schema shown in Figure 20.2. This is an example of
feedback control, where the signal sent to the actuator depends also on the
current system output. Parameter Kp is called the Proportional Gain, and this kind of feedback
control is called proportional because the system is fed by a signal which is
proportional to the current error, that is, the difference between the desired
output href(t) and the current one h(t). This kind of control intuitively works.
In fact, if the current level h(t) is lower than the reference value href(t), the
requested flow is positive, and therefore liquid enters the tank. Conversely, if
h(t) > href(t), liquid is pumped away from the tank, and when the liquid
level matches the reference, that is, h(t) = href(t), the requested flow is 0.
dh(t)/dt = (Kp/B)[href − h(t)]    (20.4)

that is, the differential equation

dh(t)/dt + (Kp/B) h(t) = (Kp/B) href    (20.5)
whose solution is the actual evolution h(t) of the liquid level in the tank.
The tank–pump system is an example of a linear system. More generally,
the I/O relationship of a linear system is expressed by a differential equation
relating the output and its time derivatives to the input and its time derivatives.
The free evolution of the system, that is, the solution of the associated homogeneous equation

a_n d^n y/dt^n + a_(n−1) d^(n−1) y/dt^(n−1) + ... + a_1 dy/dt + a_0 y = 0    (20.8)

is of the form

y(t) = Σ_(i=1..n) Σ_(k=0..μi) A_ik t^k e^(p_i t)    (20.9)

where A_ik are coefficients that depend on the initial system condition, and n
and μi are the number of distinct roots of the polynomial

a_n p^n + a_(n−1) p^(n−1) + ... + a_1 p + a_0 = 0    (20.10)

and their multiplicities, respectively. Polynomial (20.10) is called the characteristic equation of the differential equation (20.7). The terms t^k e^(p_i t) are called the modes of the system.
Often, the roots of (20.10) have multiplicity one, and the modes are then
of the form e^(p_i t). It is worth noting that the roots of polynomial (20.10) may
be real values or complex ones, that is, of the form p = a + jb, where a and
b are the real and imaginary parts of p, respectively. Complex roots of (20.10)
appear in conjugate pairs (the conjugate of a complex number a + jb is a − jb).
We recall also that the exponential of a complex number p = a + jb is of the
form e^p = e^a[cos b + j sin b]. The modes are very important in describing the
dynamics of the system. In fact, if any root of the associated polynomial has a
positive real part, the corresponding mode will have a term that diverges over
time (an exponential function with an increasing positive argument), and the
system becomes unstable. Conversely, if all the roots have a negative real part,
the system transients will become negligible after a given amount of time.
Moreover, the characteristics of the modes of the system provide us with additional information. If the roots are real numbers, the corresponding modes have the shape of an
exponential function; if instead the roots have a nonnull imaginary part, the modes
will also contain an oscillating term, whose frequency is related to the imaginary
part and whose amplitude depends on the real part of the corresponding root.
The attentive reader may be concerned by the fact that, while the modes
are represented by complex numbers, the free evolution of the system must be
represented by real numbers (after all, we live in a real world). This apparent
contradiction is explained by considering that the complex roots of (20.10)
always come in conjugate pairs, and therefore the imaginary terms cancel in the
final summation of (20.9). In fact, for every complex number p = a + jb and
its complex conjugate p̄ = a − jb, we have

e^p̄ = e^a[cos(b) + j sin(−b)] = e^a[cos(b) − j sin(b)] = conj(e^p);    (20.11)

moreover, considering the common case in which the solutions of the characteristic equation have multiplicity one, the (complex) coefficients in (20.9)
associated with p_i = a + jb and p̄_i = a − jb are A_i and Ā_i, respectively, and
therefore

A_i e^(p_i t) + Ā_i e^(p̄_i t) = e^(at)[2 Re(A_i) cos(bt) − 2 Im(A_i) sin(bt)]    (20.12)

where Re(A_i) and Im(A_i) are the real and imaginary parts of A_i, respectively.
Equation (20.12) represents the contribution of the pair of conjugate roots,
which is a real number.
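The cancellation of the imaginary parts can also be checked numerically. The short C program below is only a sketch: the pole p = a + jb and the coefficient A are arbitrary illustrative values, not taken from the text. It evaluates the left-hand side of (20.12) with complex arithmetic and compares it with the real-valued right-hand side.

#include <stdio.h>
#include <complex.h>
#include <math.h>

int main(void)
{
    /* Arbitrary example values: pole p = a + jb and coefficient A */
    double complex p = -0.5 + 2.0 * I;   /* a = -0.5, b = 2.0 */
    double complex A = 0.3 - 0.7 * I;

    for (double t = 0.0; t <= 2.0; t += 0.5) {
        /* Contribution of the pair of conjugate roots, left-hand side of (20.12) */
        double complex pair = A * cexp(p * t) + conj(A) * cexp(conj(p) * t);
        /* Right-hand side of (20.12) */
        double rhs = exp(creal(p) * t) *
            (2.0 * creal(A) * cos(cimag(p) * t) -
             2.0 * cimag(A) * sin(cimag(p) * t));
        printf("t=%.1f  pair=%g%+gi  rhs=%g\n",
               t, creal(pair), cimag(pair), rhs);
    }
    return 0;
}

The imaginary part printed for the pair is zero up to rounding errors, as expected.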
FIGURE 20.3
Tank–pump system response when controlled in feedback mode (level in m versus time in s, for Kp = 0.2 and Kp = 0.4, with h0 = 0.3 and href = 1).
where hl(t) and hf(t) are the free and forced solutions of (20.5), respectively.
Parameter A is finally computed considering the boundary condition of
the system, that is, the value of h(t) for t = 0⁻, just before the reference
value href is applied to the system. For t = 0⁻, (20.15) becomes
which yields the solution A = h0 − href, thus giving the final response of our
tank–pump system

h(t) = (h0 − href) e^(−Kp t / B) + href    (20.17)
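As a quick numerical check of (20.17), the following C fragment integrates (20.4) with a simple Euler step and compares the result with the closed-form solution. It is only a sketch: the values of B, Kp, h0, and href are illustrative assumptions, not parameters given in the text.

#include <stdio.h>
#include <math.h>

int main(void)
{
    /* Illustrative parameters (assumed, not from the text) */
    const double B = 1.0, Kp = 0.4, h0 = 0.3, href = 1.0;
    const double dt = 0.01;   /* integration step, in seconds */

    double h = h0;
    for (int k = 0; k <= 2000; k++) {
        double t = k * dt;
        if (k % 500 == 0) {
            double exact = (h0 - href) * exp(-Kp * t / B) + href;
            printf("t=%5.1f  euler=%.4f  closed form=%.4f\n", t, h, exact);
        }
        /* Euler step of dh/dt = (Kp/B) * (href - h), see (20.4) */
        h += dt * (Kp / B) * (href - h);
    }
    return 0;
}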
FIGURE 20.4
Flow request to the pump using feedback control with proportional gain.
the proportional gain is limited by the maximum flow the pump is able to
generate. Moreover, the system will always approach the requested liquid level
asymptotically. We may wonder whether some other control
schema could provide a faster response. A possibility is to add to the pump input a term
that is proportional to the integral of the error. In fact,
in the evolution plotted in Figure 20.3, the integral of the error is always
positive. We may expect that forcing the integral of the error to become null
will provide a faster step response, possibly with an overshoot, in order to
make the overall error integral equal to 0 (see Figure 20.9). To prove our
intuition, we consider a reference signal for the pump of the form

f(t) = Kp[href(t) − h(t)] + Ki ∫_0^t [href(τ) − h(τ)] dτ    (20.18)
where Kp and Ki are the proportional and integral gains in the feedback
control, respectively. We obtain, therefore, the relation
h(t) = h0 + (1/B) ∫_0^t f(τ) dτ
     = h0 + (1/B) ∫_0^t { Kp[href(τ) − h(τ)] + Ki ∫_0^τ [href(τ') − h(τ')] dτ' } dτ    (20.19)

Differentiating both sides, we obtain

dh(t)/dt = (Kp/B)[href(t) − h(t)] + (Ki/B) ∫_0^t [href(τ) − h(τ)] dτ    (20.20)

and, differentiating again,

d²h(t)/dt² = (Kp/B)[dhref(t)/dt − dh(t)/dt] + (Ki/B)[href(t) − h(t)]    (20.21)
where s and F(s) are complex numbers. Even if at first glance this approach
may seem to complicate things rather than simplify them (we are now considering complex functions of complex variables), this new formalism can rely on
a few interesting properties of the Laplace transform, which turn out to be
very useful for expressing I/O relationships of linear systems in a simple way.
In particular, we have

L{df(t)/dt} = s L{f(t)}    (20.25)

L{∫ f(t) dt} = L{f(t)} / s    (20.26)
Equation (20.24) states that the Laplace transform is a linear operator and,
due to (20.25) and (20.26), relations expressed by time integration and differentiation become algebraic relations when considering Laplace transforms. In the
differential equation in (20.22), if we consider the Laplace transforms H(s)
and Href(s) in place of h(t) and href(t), from (20.24), (20.25), and (20.26)
we have for the tank–pump system
that is,

H(s) = [(Ki + Kp s) / (B s² + Kp s + Ki)] Href(s)    (20.28)
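As a sketch of the intermediate step, assuming zero initial conditions so that (20.25) applies directly: transforming (20.21) term by term with (20.24) and (20.25), each time derivative becomes a multiplication by s, giving

s² H(s) = (Kp/B)[s Href(s) − s H(s)] + (Ki/B)[Href(s) − H(s)]

Multiplying by B and collecting H(s) on the left-hand side yields H(s)(Bs² + Kp s + Ki) = (Kp s + Ki) Href(s), which is exactly (20.28).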
FIGURE 20.5
Graphical representation of the transfer function for the tank–pump system (block with transfer function 1/sB, input F(s), output H(s)).
Observe that the I/O relationship of our tank–pump system, which is expressed in the time domain by a differential equation, becomes an algebraic
relation in the Laplace domain. The term

W(s) = (Ki + Kp s) / (B s² + Kp s + Ki)    (20.29)

is called the Transfer Function and fully characterizes the system behavior.
Using Laplace transforms, it is not necessary to explicitly express the differential equation describing the system behavior, and the transfer function can
be derived directly from the block description of the system. In fact, recalling
the I/O relationship of the tank, which relates the actual liquid level h(t) and
the pump flow f(t) in (20.1), and using property (20.26), we can express the same
relationship in the Laplace domain as

H(s) = (1 / sB) F(s)    (20.30)

where H(s) and F(s) are the Laplace transforms of h(t) and f(t), respectively.
This relation can be expressed graphically as in Figure 20.5. Considering the
control law involving the proportional and integral gains

f(t) = Kp[href(t) − h(t)] + Ki ∫_0^t [href(τ) − h(τ)] dτ    (20.31)
FIGURE 20.6
Graphical representation of tank–pump system controlled in feedback.
FIGURE 20.7
The module of the transfer function for the tank–pump system.
FIGURE 20.8
Zeroes and poles of the transfer function for the tank–pump system (complex plane with real and imaginary axes).
becomes null when N (s) = 0, that is, for s = −1 + j0. There is normally no
need to represent graphically in this way the module of the transfer function
W (s), and it is more useful to express graphically its poles and zeroes in the
complex plane, as shown in Figure 20.8. By convention, zeroes are represented
by a circle and poles by a cross. System stability can be inferred from W(s)
when expressed in the form W(s) = N(s)/D(s), where the numerator N(s) and the
denominator D(s) are polynomials in s. Informally stated, a system is stable
when its natural evolution with a null input decays towards rest, regardless of its
initial condition. The output of an unstable system may instead diverge even
with null inputs. If we recall how the expression of W (s) has been derived
from the differential equation expressing the I/O relationship of the system,
we recognize that the denominator D(s) corresponds to the characteristic
equation of the linear system whose roots define the modes of the system. So,
recalling the definition (20.9) of the modes of the system contributing to its
free evolution, we can state that if the poles of the transfer function W (s) have
a positive real part, the system will be unstable. Moreover, if the poles of W (s)
have a non-null imaginary part, we can state that oscillations will be present
in the free evolution of the system, and these oscillations will have a decreasing
amplitude if the real part of the poles is negative, and an increasing one otherwise.
The limit case occurs when the poles of W(s) are purely imaginary (i.e.,
with null real part); in this case, the free evolution of the system oscillates
with constant amplitude over time.
Let us now return to the tank–pump system controlled in feedback using a
proportional gain, Kp , and an integral gain, Ki . Recalling its transfer function
in (20.34), we observe that its poles are the solutions of the equation

s²B + sKp + Ki = 0    (20.36)

that is,

s1,2 = [−Kp ± √(Kp² − 4 Ki B)] / (2B)    (20.37)
(recall that B is the base surface of the tank). Figure 20.9 shows the controlled
tank–pump response for Kp = 0.4 and Ki = 0.02, compared with the same
FIGURE 20.9
The response of the controlled tank–pump system with proportional gain set to 0.4 and integral gain set to 0.02 (black) and 0 (grey), respectively (level in m versus time in s, with h0 = 0.3 and href = 1).
response for a proportional gain Kp = 0.4 only. It can be seen that the response
is faster, but an overshoot is present as a consequence of the nonnull integral
gain (recall that the control now tries to reduce the integral of the error in addition to
the error itself). For Kp > 2√(Ki B), Equation (20.36) yields two real solutions,
and therefore the modes of the system in its free evolution have an exponential
shape and do not oscillate. Conversely, if Ki is large enough, that is,
Ki > Kp²/(4B), the two poles of W(s) become complex, and therefore the system
response contains an oscillating term, as shown in Figure 20.10. In any case,
the oscillations do not grow over time because Kp ≥ 0, and therefore the real part of the
poles is not positive. It is interesting to observe that if Kp = 0, that is, when
considering only the integral of the error href(t) − h(t) in the feedback control,
the system response contains an oscillating term with constant amplitude.
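The boundary between exponential and oscillating behaviour follows directly from (20.37). The C fragment below is a sketch that computes the two poles and reports whether they are real or complex conjugates; the value B = 1 is an arbitrary assumption, since the base surface used for the plots is not stated.

#include <stdio.h>
#include <math.h>

/* Poles of W(s) = (Ki + Kp s)/(B s^2 + Kp s + Ki), see (20.36) and (20.37) */
static void print_poles(double B, double Kp, double Ki)
{
    double disc = Kp * Kp - 4.0 * Ki * B;
    if (disc >= 0.0) {
        double r = sqrt(disc);
        printf("Kp=%.2f Ki=%.2f: real poles s1=%.4f s2=%.4f (no oscillation)\n",
               Kp, Ki, (-Kp + r) / (2.0 * B), (-Kp - r) / (2.0 * B));
    } else {
        double im = sqrt(-disc) / (2.0 * B);
        printf("Kp=%.2f Ki=%.2f: complex poles %.4f +/- j%.4f (oscillating modes)\n",
               Kp, Ki, -Kp / (2.0 * B), im);
    }
}

int main(void)
{
    const double B = 1.0;          /* assumed tank base surface */
    print_poles(B, 0.4, 0.02);     /* case of Figure 20.9: real poles */
    print_poles(B, 0.4, 1.0);      /* case of Figure 20.10: damped oscillation */
    print_poles(B, 0.0, 1.0);      /* pure integral control: constant-amplitude oscillation */
    return 0;
}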
Before proceeding, it is worthwhile to summarize the advantages provided by representing transfer functions in the Laplace
domain for the analysis of linear systems. We have seen how it is possible
to derive several system properties simply from the block diagram representation of a system, without deriving the differential equation describing the system
dynamics. More importantly, this can be done without deriving any analytical
solution of such a differential equation. This is the reason why this formalism
is ubiquitously used in control engineering.
FIGURE 20.10
The response of the controlled tank–pump system with proportional gain set to 0.4 and integral gain set to 1 (level in m versus time in s, with h0 = 0.3 and href = 1).
is the specification of the control function, often expressed in the Laplace do-
main. It is worth noting that in the definition of the control strategy for a
given system, we may deal with different transfer functions. For example, in
the tank–pump system used throughout this chapter, we had the following
transfer functions:
W1(s) = 1 / (Bs)    (20.38)

which is the description of the tank I/O relationship,

W2(s) = Kp + Ki / s    (20.39)

which represents the control law, and

W3(s) = (Ki + Kp s) / (Bs² + Kp s + Ki)    (20.40)

which describes the overall system response.
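Although not stated explicitly above, the three transfer functions are related by the usual unity-feedback composition W3(s) = W1(s)W2(s) / (1 + W1(s)W2(s)): substituting (20.38) and (20.39) gives W1(s)W2(s) = (Kp s + Ki)/(Bs²), and multiplying numerator and denominator by Bs² yields exactly (20.40).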
If we turn our attention to the implementation of the control, once the parameters Kp and Ki have been chosen, we observe that the embedded system
must implement W2(s), that is, the controller. It is necessary to provide to
the controller the input error href(t) − h(t), that is, the difference between the
reference level and the current level of the liquid in the tank. The output of
the controller, f(t), will drive the pump in order to provide the requested flow.
The input to the controller is therefore provided by a sensor, while its output
will be sent to an actuator. The signals h(t) and f(t) may be analog signals,
such as a voltage level. This was the common case in older times, prior
to the advent of digital controllers. In this case the controller itself was implemented by an analog electronic circuit whose I/O law corresponded to the
FIGURE 20.11
Sampling a continuous function (y(t) versus t).
desired transfer function for control. Nowadays, analog controllers are rarely
used and digital controllers are used instead. Digital controllers operate on
the sampled values of the input signals and produce sampled outputs. The
input may derive from analog-to-digital conversion (ADC) performed on the
signal coming from the sensors, or from the direct acquisition of numerical values from
a sensor connected via a local bus, a Local Area Network or, more recently,
a wireless connection. The digital controller's outputs can then be converted
to analog voltages by means of a Digital to Analog converter (DAC) and then
given to the actuators, or directly sent to digital actuators with some sort of
bus interface.
When dealing with digital values, that is, the sampled values of the I/O
signals, an important factor is the sampling period T . For the tank–pump
system, the analog input h(t) is then transformed into a sequence of sampled
values h(nT ) as shown in Figure 20.11. Since sampling introduces unavoidable
loss of information, we would like the sampled signal to be an
approximation that is accurate enough for our purposes. Of course, the shorter
the sampling period T, the more accurate the representation of the original
analog signal. On the other hand, a higher sampling speed comes at a cost, since
it requires a faster controller and, above all, faster communication, which may
become expensive. A trade-off is therefore desirable, that is, choosing a value
of T that is short enough to get an acceptable approximation, but that avoids
overkill. Such a value of T cannot be defined a priori, but
depends on the dynamics of the system; the faster the system response, the
shorter must be the sampling period T. A crude method for guessing an appropriate value of T is to consider the step response of the system, as shown
in Figure 20.9, and to choose T as the rise time divided by 10, so that a sufficient
number of samples can describe the variation of the system output. More ac-
curate methods consider the poles of W (s), which determine the modes of the
free system evolution, that is, its dynamics. In principle, the poles with the
largest absolute value of their real part, that is, describing the fastest modes
of the system, should be considered, but they are not always significant for
control purposes (for example, when their contribution to the overall system
response is negligible). Moreover, what gets digitized in the system is only the
controller, not the system itself. For this reason, normally the working range
of frequencies for the controller is considered, and the sampling period chosen
accordingly. The second test case presented in this chapter will describe in more
detail how the sampling period is chosen based on frequency information.
Even if this may seem another way to complicate the engineer’s life (as for the
Laplace transforms, we are moving from the old, familiar, real space into a
complex one), we shall see shortly how Z transforms are useful for specifying
the I/O relationship in digital controllers. The following two important facts
hold:

Z{A y1(n) + B y2(n)} = A Z{y1(n)} + B Z{y2(n)}    (20.42)

that is, the Z transform is linear, and

Z{y(n − 1)} = Σ_(n=−∞..∞) y(n − 1) z^(−n) = Σ_(n'=−∞..∞) y(n') z^(−n') z^(−1) = z^(−1) Z{y(n)}    (20.43)

The above relation has been obtained by replacing the summation index n with n' = n − 1. Stated in words, (20.43) means that there is a simple
relation (multiplication by z^(−1)) between the Z transform of a sequence
y(n) and that of the same sequence delayed by one sample. Let us recall the
general I/O relationship in a linear system:
a_n d^n y/dt^n + a_(n−1) d^(n−1) y/dt^(n−1) + ... + a_1 dy/dt + a_0 y =
b_m d^m u/dt^m + b_(m−1) d^(m−1) u/dt^(m−1) + ... + b_1 du/dt + b_0 u    (20.44)
where u(t) and y(t) are the continuous input and output of the system, re-
spectively. If we move from the continuous values of y(t) and u(t) to the
corresponding sequence of sampled values y(kT ) and u(kT ), after having cho-
sen a period T small enough to provide a satisfactory approximation of the
system evolution, we need to compute an approximation of the sampled time
derivatives of y(t) and u(t). This is obtained by approximating the first order
derivative with a finite difference:
dy/dt (kT) ≈ [y(kT) − y((k − 1)T)] / T    (20.45)
Recalling that the Z transform is linear, we obtain the following Z represen-
tation of the first-order time-derivative approximation:
Y(z)[a_n (1 − z^(−1))^n / T^n + ... + a_1 (1 − z^(−1)) / T + a_0] =
U(z)[b_m (1 − z^(−1))^m / T^m + ... + b_1 (1 − z^(−1)) / T + b_0]    (20.48)
that is,
Y (z) = V (z)U (z) (20.49)
where Y(z) and U(z) are the Z transforms of y(t) and u(t), respectively, and
V (z) is the transfer function of the linear system in the Z domain. Again,
we obtain an algebraic relationship between the transforms of the input and
output, considering the sampled values of a linear system. Observe that the
transfer function V (z) can be derived directly from the transfer function W (s)
in the Laplace domain, by the replacement
s = (1 − z^(−1)) / T    (20.50)
From a specification of the transfer function W(s) expressed in the form
W(s) = N(s)/D(s), with the replacement of (20.50) we derive a specification of
V(z) expressed in the form

V(z) = [Σ_(i=0..m) b_i (z^(−1))^i] / [Σ_(i=0..n) a_i (z^(−1))^i]    (20.51)

that is,

Y(z) Σ_(i=0..n) a_i (z^(−1))^i = U(z) Σ_(i=0..m) b_i (z^(−1))^i    (20.52)
Recalling that Y(z)z^(−1) is the Z transform of the sequence y((k − 1)T) and,
more generally, that Y(z)z^(−i) is the Z transform of the sequence y((k − i)T),
we can finally get the I/O relationship of the discretized linear system in the
form

Σ_(i=0..n) a_i y((k − i)T) = Σ_(i=0..m) b_i u((k − i)T)    (20.53)

that is,

y(kT) = (1/a_0) [ Σ_(i=0..m) b_i u((k − i)T) − Σ_(i=1..n) a_i y((k − i)T) ]    (20.54)

Put in words, the system output is a linear combination of the previous n
outputs and of the current input plus the previous m inputs. This representation of the controller behavior can then be easily implemented by a program
making only multiplications and summations, operations that can be executed
efficiently by CPUs.
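Indeed, one step of (20.54) amounts to two dot products over the most recent samples. The C function below is a sketch of such a step; the function name and the history-buffer convention are ours, not taken from the text, and a[0] is assumed to be nonzero.

/* One step of (20.54):
   y(kT) = (1/a[0]) * ( sum_{i=0..m} b[i]*u((k-i)T) - sum_{i=1..n} a[i]*y((k-i)T) )
   u is the current input sample; uPrev[i] holds u((k-1-i)T) and
   yPrev[i] holds y((k-1-i)T), i.e., the most recent past samples. */
static double difference_equation_step(const double *a, int n,
                                       const double *b, int m,
                                       double u, double *uPrev, double *yPrev)
{
    double acc = b[0] * u;
    for (int i = 1; i <= m; i++)
        acc += b[i] * uPrev[i - 1];
    for (int i = 1; i <= n; i++)
        acc -= a[i] * yPrev[i - 1];
    double y = acc / a[0];

    /* Shift the history buffers: the current samples become the most recent past */
    for (int i = m - 1; i > 0; i--)
        uPrev[i] = uPrev[i - 1];
    if (m > 0) uPrev[0] = u;
    for (int i = n - 1; i > 0; i--)
        yPrev[i] = yPrev[i - 1];
    if (n > 0) yPrev[0] = y;
    return y;
}

The routine performs a fixed number of multiplications and additions per call, which is what makes this form attractive for real-time execution.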
In summary, the steps required to transform the controller specification
given as a transfer function W (s) into a sequence of summations and multi-
plications are the following:
V(z) = (Kp + Ki T − z^(−1) Kp) / (1 − z^(−1))    (20.56)
from which we derive the definition of the algorithm
where the input u(kT ) is represented by the sampled values of the difference
between the sampled reference href (kT ) and the actual liquid level h(kT ),
and the output y(kT ) corresponds to the flow reference sent to the pump.
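Since (20.56) has the form V(z) = (b0 + b1 z^(−1)) / (a0 + a1 z^(−1)) with a0 = 1, a1 = −1, b0 = Kp + Ki T, and b1 = −Kp, the resulting algorithm reduces to a single update per sample: y(kT) = y((k − 1)T) + (Kp + Ki T) u(kT) − Kp u((k − 1)T). The C fragment below is a sketch of such a discretized PI controller; the type and function names are ours and only illustrate the idea.

/* State of a discretized PI controller derived from (20.56) */
typedef struct {
    double Kp, Ki, T;   /* proportional gain, integral gain, sampling period */
    double prevU;       /* u((k-1)T): previous error sample */
    double prevY;       /* y((k-1)T): previous controller output */
} PIController;

/* One control period: u is the current error href(kT) - h(kT);
   the returned value is the flow reference f(kT) sent to the pump. */
static double pi_step(PIController *c, double u)
{
    double y = c->prevY + (c->Kp + c->Ki * c->T) * u - c->Kp * c->prevU;
    c->prevU = u;
    c->prevY = y;
    return y;
}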
Observe that the same technique can be used in a simulation tool to compute the overall system response, given the transfer function W(s). For example, the plots in Figures 20.9 and 20.10 have been obtained by discretizing
the overall tank–pump transfer function (20.40).
An alternative method for the digital implementation of the control transfer function is the usage of the Bilinear Transform, that is, using the replacement

s = (2/T) (1 − z^(−1)) / (1 + z^(−1))    (20.58)
which can be derived from the general differential equation describing the
linear system using a reasoning similar to that used to derive (20.50).
such a way that its parameters can be easily changed. Very often, in fact,
some fine tuning is required during the commissioning of the system.
As a final observation, all the theory presented here and used in practice
relies on the assumption that the system being controlled can be modeled as
a linear system. Unfortunately, many real-world systems are not linear, even
simple ones. In this case, it is necessary to adopt techniques for approximating
the nonlinear system with a linear one in a restricted range of the parameters of
interest.
that is, a sinusoidal signal of amplitude A, frequency f, and phase ϑ. Its period
T is the inverse of the frequency, that is, T = 1/f. Under rather general conditions, every periodic function y(t) can be expressed as a (possibly infinite)
sum of harmonics, forming a Fourier series. As an example, consider the square
function fsquare(t) with period T shown in Figure 20.12. The same function
can be expressed by the following Fourier series:

fsquare(t) = (4/π) Σ_(k=1..∞) cos(2π(2k − 1)t − π/2) / (2k − 1)    (20.60)
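Figure 20.13 can be reproduced by truncating the series (20.60) after a finite number of harmonics. The C function below is a sketch of such a partial sum; it assumes a unit period, since the printed series shows no explicit dependence on T.

#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Partial sum of (20.60) with K harmonics, assuming unit period */
static double square_partial_sum(double t, int K)
{
    double sum = 0.0;
    for (int k = 1; k <= K; k++)
        sum += cos(2.0 * M_PI * (2 * k - 1) * t - M_PI / 2.0) / (2 * k - 1);
    return 4.0 / M_PI * sum;
}

Evaluating square_partial_sum with K = 1 and K = 10 over one period gives the two curves of Figure 20.13.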
FIGURE 20.12
A square function (y(t) versus t).
FIGURE 20.13
The approximation of a square function considering 1 and 10 harmonics.
FIGURE 20.14
The components (amplitude vs. frequency) of the harmonics of the square function.
The reader may wonder why the Fourier transform yields a complex value.
FIGURE 20.15
Representation of a complex number in the re–im plane (real part a, imaginary part b, module A, phase θ).
That is, the contribution of the Fourier transform Y (f ) at the given frequency
f1 is the harmonic at frequency f1 whose amplitude and phase are given by
the module and phase of the complex number Y (f1 ).
Usually, the module of the Fourier transform Y(f) is plotted against frequency f to show the frequency distribution of a given function. The plot
is symmetrical with respect to the vertical axis. In fact, we have already seen that
Y(−f) is the complex conjugate of Y(f), and therefore the modules of Y(f) and Y(−f) are the
same.
The concepts we have learned so far can be applied to a familiar
experience, that is, sound. Intuitively, we expect that low-pitched sounds will have
a harmonic content mostly made of low-frequency components, while high-pitched
sounds will contain harmonic components at higher frequencies. In any case,
the sound we perceive will have no harmonics above a given frequency value,
because our ear is not able to perceive sounds above a given frequency limit.
FIGURE 20.16
A signal with noise.
FIGURE 20.17
The spectrum of the signal shown in Figure 20.16 (frequency axis in Hz).
FIGURE 20.18
The signal of Figure 20.16 after low-pass filtering.
FIGURE 20.19
The spectrum of the signal shown in Figure 20.18 (frequency axis in Hz).
FIGURE 20.20
Frequency response of an ideal filter with cut-off frequency fc (attenuation versus frequency in Hz).
low-pass filter used to filter the signal shown in Figure 20.16 is shown in
Figure 20.21. The gain of the filter is normally expressed in decibels (dB),
corresponding to 20 log10(A1/A0), where A0 and A1 are the amplitudes of the
original and filtered harmonic, respectively. Since the gain of a filter is normally less
than or equal to one, its expression in decibels is normally negative.
The frequency response of Figure 20.21 is shown expressed in decibels
in Figure 20.22. Referring to Figure 20.21, for frequencies included in the Pass
Band the gain of the filter is above a given threshold, normally −3 dB (in the
ideal filter the gain in the pass band is exactly 0 dB), while in the Stop Band
the gain is below another threshold, which, depending on the application, may
range from −20 dB to −120 dB (in the ideal filter the gain in decibels for
these frequencies tends to −∞). The range of frequencies between the pass
band and the stop band is often called the Transition Band: for an ideal low-pass
filter there is no transition band, but in practice the transition band depends
on the kind of filter selected, and its width is never null.
A low-pass filter is a linear system whose relationship between the input
(unfiltered) signal and the output (filtered) signal is expressed by a differential
equation in the form of (20.6). We already have in hand some techniques for
handling linear systems and, in particular, we know how to express the I/O
relationship using a transfer function W(s) expressed in the Laplace domain.
At this point, we are able to recognize a very interesting aspect of the Laplace
transform. Recalling the expression of the Laplace transform of a function w(t)

W(s) = ∫_0^∞ w(t) e^(−st) dt    (20.68)
FIGURE 20.21
Frequency response of the filter used to filter the signal shown in Figure 20.16 (attenuation versus frequency in Hz).
FIGURE 20.22
Frequency response shown in Figure 20.21 expressed in decibels (attenuation in dB versus frequency in Hz).
and supposing that w(t) = 0 before time 0, the Fourier transform corresponds
to the Laplace one for s = j2πf, that is, when considering the values of
the complex variable s corresponding to the imaginary axis. So, the information
carried by the Laplace transform covers also the frequency response of the
system. Recalling the relationship between the input signal u(t) and the output
signal y(t) for a linear system expressed in the Laplace domain

Y(f) = W(f) U(f)    (20.71)

In particular, if we apply a sinusoidal input function u(t) = cos(2πf t), the
output will be of the form y(t) = A cos(2πf t + θ), where A and θ are the
module and phase of W(s) evaluated at s = j2πf.
Let us consider again the transfer function of the tank–pump system with feedback
control we defined in the previous section. We recall here its expression

W(s) = (sKp + Ki) / (s²B + sKp + Ki)    (20.72)
where fc is the cut-off frequency, n is called the order of the filter, and
sk, k = 1, . . . , n are the poles, which are of the form
The poles of the transfer function lie, therefore, on the left side of a circle
FIGURE 20.23
The module of the transfer function for the tank–pump system controlled in
feedback highlighting its values along the imaginary axis.
FIGURE 20.24
The Fourier representation of the tank–pump transfer function (attenuation versus frequency in Hz).
FIGURE 20.25
The poles of a Butterworth filter of the third order (complex plane).
of radius 2πfc. Figure 20.25 displays the poles of a Butterworth filter of the
third order. Figure 20.26 shows the module of the transfer function in the
complex plane, highlighting its values along the imaginary axis, which correspond
to the frequency response shown in Figure 20.27. The larger the number of poles
in the Butterworth filter, the sharper the frequency response of the filter, that
is, the narrower the Transition Band. As an example, compare the frequency
response of Figure 20.27, corresponding to a Butterworth filter with 3 poles,
with that of Figure 20.21, corresponding to a Butterworth filter with 10 poles.
An analog Butterworth filter can be implemented by an electronic circuit,
as shown in Figure 20.28. We are, however, interested here in its digital
implementation, which can be carried out using the technique introduced in
the previous section, that is
FIGURE 20.26
The module of a Butterworth filter of the third order, and the corresponding values along the imaginary axis.
FIGURE 20.27
The module of the Fourier transform of a third-order Butterworth filter with 5 Hz cutoff frequency (attenuation versus frequency in Hz).
FIGURE 20.28
An electronic implementation of a Butterworth filter of the third order with 5 Hz cutoff frequency (operational amplifier circuit using 10 kΩ and 1 kΩ resistors and 3.18 µF capacitors).
its samples y(nT). In other words, if we are able to find a mathematical
transformation that, starting from the sampled values y(nT), can rebuild y(t)
for every time t, then we can ensure that no information has been lost when
sampling the signal. To this purpose, let us recall the expression of the Fourier
transform of the signal y(t)

Y(f) = ∫_(−∞)^(∞) y(t) e^(−j2πf t) dt    (20.75)

which transforms the real function of real variable y(t) into a complex function
of real variable Y(f). Y(f) maintains all the information of y(t), and, in fact,
the latter can be obtained from Y(f) via the Fourier antitransform:

y(t) = ∫_(−∞)^(∞) Y(f) e^(j2πf t) df    (20.76)

Now, suppose we have in hand only the sampled values of y(t), that is, y(nT)
for a given value of the sampling period T. An approximation of the Fourier
transform can be obtained by replacing the integral in (20.75) with the summation

YT(f) = T Σ_(n=−∞..∞) y(nT) e^(−j2πf nT)    (20.77)
Even if from the discrete-time Fourier transform we can rebuild the sam-
pled values y(nT ), we cannot yet state anything about the values of y(t) at
the remaining times. The following relation between the continuous Fourier
transform Y (f ) of (20.75) and the discrete time version YT (f ) of (20.77) will
allow us to derive information on y(t) also for the times between consecutive
samples:
YT(f) = (1/T) Σ_(r=−∞..∞) Y(f − r/T)    (20.79)

Put in words, (20.79) states that the discrete-time Fourier representation
YT(f) can be obtained by summing infinitely many terms, the rth term being
the continuous Fourier transform shifted to the right by r fc = r/T.
The higher the sampling frequency fc, the more separated the terms of
the summation will be. In particular, suppose that Y(f) = 0 for |f| > f1, with f1 < fc/2. Its
module will be represented by a curve similar to that shown in Figure 20.29.
The module of the discrete-time Fourier transform will then be of the form shown in
Figure 20.30, and therefore, for −fc/2 < f < fc/2, the discrete-time transform
YT(f) will be exactly the same as the continuous one Y(f).
FIGURE 20.29
A frequency spectrum limited to fc/2 (Y(f) versus f, band edges at ±fc/2).
FIGURE 20.30
The discrete-time Fourier transform corresponding to the continuous one of Figure 20.29.
FIGURE 20.31
The aliasing effect.
This means that, if the sampling frequency is at least twice the highest
frequency of the harmonics composing the original signal, no information is
lost in sampling. In fact, in principle, the original signal y(t) could be derived
by antitransforming, using (20.76), the discrete-time Fourier transform built
considering only the sampled values y(nT). Of course, we are not interested
in the actual computation of (20.76), but this theoretical result gives us a
clear indication in the choice of the sampling period T .
This is a fundamental result in the field of signal processing, and is called
the Nyquist–Shannon sampling theorem, after Harry Nyquist and Claude
Shannon, even if other authors independently discovered and proved part of
it. The proof by Shannon was published in 1949 [80] and is based on an earlier
work by Nyquist [67].
Unfortunately, things are not so bright in real life, and normally, it is not
possible for a given function y(t) to find a frequency f0 for which Y (f ) =
0, |f | > f0 . In this case we will have an aliasing phenomenon, as illustrated
in Figure 20.31, which shows how the spectrum is distorted as a consequence
of sampling. The effect of aliasing when considering the sampled values
y(nT) is the “creation” of new harmonics that do not exist in the original
continuous signal y(t). The aliasing effect is negligible when the sampling frequency
is large enough that the amplitude of the tail of the spectrum above fc/2
becomes small, but significant distortion in the sampled signal may occur for
a poor choice of fc.
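Aliasing is easy to observe numerically: a cosine at frequency f sampled at fc < 2f produces exactly the same samples as a cosine at the alias frequency |f − fc|. The short C program below is a sketch with arbitrarily chosen frequencies (a 7 Hz cosine sampled at 10 Hz is indistinguishable from a 3 Hz one).

#include <stdio.h>
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

int main(void)
{
    const double fc = 10.0;            /* sampling frequency (Hz), assumed */
    const double f1 = 7.0, f2 = 3.0;   /* 7 Hz is above fc/2; its alias is 10 - 7 = 3 Hz */

    for (int n = 0; n < 8; n++) {
        double t = n / fc;
        printf("n=%d  cos(2*pi*%.0f*t)=% .4f  cos(2*pi*%.0f*t)=% .4f\n",
               n, f1, cos(2.0 * M_PI * f1 * t), f2, cos(2.0 * M_PI * f2 * t));
    }
    return 0;
}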
The theory presented so far provides us with the “golden rule” of data acquisition when signals sampled by ADC converters are acquired in an embedded system, that is, choosing a sampling frequency which is at least twice the
maximum frequency of any significant harmonic of the acquired signal. How-
ever, ADC converters cannot provide an arbitrarily high sampling frequency,
and in any case, this may be limited by the overall system architecture. As an
example, consider an embedded system that acquires 100 signals coming from
sensors in a controlled industrial plant, and suppose that a serial link connects
the ADC converters and the computer. Even if the single converter may be
able to acquire the signal at, say, 1 kHz (commercial ADC converters can have
a sampling frequency up to some MHz), sending 100 signals over a serial link
means that a data throughput of 100 KSamples/s has to be sustained by the
communication link, as well as properly handled by the computer. Moving to
a sampling frequency of 10 kHz may not be feasible for such a system because
either the data link is not able to sustain a higher data throughput, or the
processing power of the computer becomes insufficient.
Once a sampling frequency fc has been chosen, it is mandatory to make
sure that the conversion does not introduce aliasing, and it is therefore necessary to filter the input signals, before ADC conversion, with an analog low-pass
filter whose cut-off frequency does not exceed fc/2. Butterworth filters, whose
electrical schema is shown in Figure 20.28, are often used in practice, and are
normally implemented inside the ADC boards themselves.
W(s) = 31006.28 / [(s − (−15.708 + j27.207)) (s − (−31.416 + j0)) (s − (−15.708 − j27.207))]    (20.83)

y(nT) = [y((n − 3)T) − 3.063 y((n − 2)T) + 3.128 y((n − 1)T) + 3.142 × 10⁻⁵ x(nT)] / 1.065    (20.85)
typedef struct {
    int currIndex;    // Current index in circular buffers
    int bufSize;      // Number of elements in the buffers
    float *yBuf;      // Output history buffer
    float *uBuf;      // Input history buffer
    float *a;         // Previous output coefficients
    int aSize;        // Number of a coefficients
    float *b;         // Previous input coefficients
    int bSize;        // Number of b coefficients
} FilterDescriptor;
/* Filter structure initialization.
   To be called before entering the real-time phase.
   Its arguments are the a and b coefficients of the filter */
FilterDescriptor *initFilter(float *aCoeff, int numACoeff,
                             float *bCoeff, int numBCoeff)
{
    int i;
    FilterDescriptor *newFilter;
    return currOut;
}
/* Filter deallocation routine.
   To be called when the filter is no longer used, outside
   the real-time phase */
void releaseFilter(FilterDescriptor *filter)
{
    free((char *)filter->a);
    free((char *)filter->b);
    free((char *)filter->yBuf);
    free((char *)filter->uBuf);
    free((char *)filter);
}
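Only the final return currOut; statement of the online filtering routine appears above. A possible shape of the complete run-time routine, consistent with the FilterDescriptor structure and with (20.54), is sketched below; this is a reconstruction under assumptions (hypothetical function name, bufSize at least as large as aSize and bSize, history buffers zeroed by initFilter), not the book's listing.

/* Hypothetical reconstruction of the online filtering step.
   To be called once per sampling period with the new input sample;
   it returns the new filter output, computed as in (20.54). */
float filterStep(FilterDescriptor *filter, float currIn)
{
    float currOut = 0;
    int i, idx;

    /* Store the current input sample in the circular input buffer */
    filter->uBuf[filter->currIndex] = currIn;

    /* Contribution of the current and previous inputs: sum of b[i]*u((k-i)T) */
    for (i = 0; i < filter->bSize; i++)
    {
        idx = (filter->currIndex - i + filter->bufSize) % filter->bufSize;
        currOut += filter->b[i] * filter->uBuf[idx];
    }
    /* Contribution of the previous outputs: subtract a[i]*y((k-i)T), i >= 1 */
    for (i = 1; i < filter->aSize; i++)
    {
        idx = (filter->currIndex - i + filter->bufSize) % filter->bufSize;
        currOut -= filter->a[i] * filter->yBuf[idx];
    }
    currOut /= filter->a[0];

    /* Store the output and advance the circular index */
    filter->yBuf[filter->currIndex] = currOut;
    filter->currIndex = (filter->currIndex + 1) % filter->bufSize;
    return currOut;
}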
If we use B bits for the conversion, and an input range of A, the quantization
interval Δ is equal to A/2^B, and therefore
that is, the SNR is incremented by around 6 dB for every additional bit in the ADC conversion (the other
terms in (20.90) are constant). This gives us an estimation of the effect of the
introduced quantization error and also an indication of the number of bits to
be considered in the ADC conversion. Nowadays, commercial ADC converters
use 16 bits or more in conversion, and the number of bits may be reduced for
very high-speed converters.
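For a full-scale sinusoidal input, this kind of computation leads to the well-known approximation SNR ≈ 6.02 B + 1.76 dB; whether (20.90) takes exactly this form is an assumption here, but the slope of about 6 dB per bit is the same. The small C helper below simply tabulates the value for a few typical converter resolutions.

#include <stdio.h>

int main(void)
{
    /* SNR of an ideal B-bit converter with a full-scale sinusoidal input:
       approximately 6.02*B + 1.76 dB, i.e., about 6 dB per additional bit. */
    int bits[] = { 8, 12, 16, 24 };
    for (int i = 0; i < 4; i++)
        printf("%2d bits -> SNR = %.1f dB\n", bits[i], 6.02 * bits[i] + 1.76);
    return 0;
}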
20.3 Summary
In this section we have learned the basic concepts of control theory and the
techniques that are necessary to design and implement a digital low-pass filter.
The presented concepts represent facts that every developer of embedded sys-
tems should be aware of. In particular, the effects of sampling and the conse-
quent harmonic distortion due to the aliasing effect must always be taken into
account when developing embedded systems for control and data acquisition.
Another important aspect that should always be taken into consideration when
developing systems handling acquired data is the choice of the appropriate
number of bits in analog-to-digital conversion. Finally, once the parameters
of the linear system have been defined, an accurate implementation of the
algorithm is necessary in order to ensure that the system will have a deterministic execution time, less than a given maximum time. Most of this book
will be devoted to techniques that can ensure real-time system responsiveness.
A precondition to every technique is that the number of machine instructions
required for the execution of the algorithms is bounded. For this reason, in
the presented example, the implementation of the filter has been split into
two sets of routines: offline routines for the creation and the deallocation of the
required data structures, and an online routine for the actual run-time filter
computation. Only the latter will be executed under real-time constraints,
and will consist of a fixed number of machine instructions.
Bibliography
[57] L. Lamport. The mutual exclusion problem: part I—a theory of inter-
process communication. Journal of the ACM, 33(2):313–326, 1986.
[58] L. Lamport. The mutual exclusion problem: part II—statement and so-
lutions. Journal of the ACM, 33(2):327–348, 1986.
[69] J. Oberg. Why the Mars probe went off course. IEEE Spectrum,
36(12):34–39, December 1999.
[73] J. Postel. User Datagram Protocol, RFC 768. ISI, August 1980.