
Thirteenth Real-Time Linux Workshop

Organizing committee
Local staff at Czech Technical University Prague
Michal Sojka (Faculty of Electrical Engineering, Department of Control Engineering)
Pavel Píša (Faculty of Electrical Engineering, Department of Control Engineering)
Petr Hodač (SUT - Středisko un*xových technologií, UN*X Technologies Center)
Real-Time Linux Foundation Working Group and OSADL
Prof. Nicholas Mc Guire (Lanzhou University, China)
Andreas Platschek (OpenTech, Austria)
Dr. Carsten Emde (OSADL, Germany)

Program committee
Roberto Bucher, RTAI Maintainer, Switzerland
Alfons Crespo, University Valencia, Spain
Carsten Emde, OSADL, Germany
Oliver Fendt, Siemens, Corporate Technology, Germany
Gerhard Fohler, Technische Universität Kaiserslautern, Germany
Thomas Gleixner, Linutronix, Germany
Nicholas Mc Guire, Lanzhou University, China
Hermann Härtig, TU Dresden, Germany
Zdeněk Hanzálek, Czech Technical University, Prague
Paul E. McKenney, IBM
Jan Kiszka, Siemens, Germany
Miguel Masmano, Universidad Politecnica de Valencia, Spain
Odhiambo Okech, University of Nairobi, Kenya
Pavel Píša, Czech Technical University, Prague
Andreas Platschek, OpenTech, Austria
Zhou Qingguo, Lanzhou University, China
Ismael Ripoll, University Valencia, Spain
Georg Schiesser, OpenTech, Austria
Stefan Schönegger, Bernecker + Rainer, Austria
Michal Sojka, Czech Technical University, Prague
Martin Terbuc, University of Maribor, Slovenia
Mathias Weber, Roche Diagnostics Ltd., Switzerland
Bernhard Zagar, Johannes Kepler University, Austria
Peter Zijlstra, Red Hat, Netherlands

Prague 2011
Title: Proceedings of the 13th Real Time Linux Workshop

Publisher: Open Source Automation Development Lab (OSADL) eG

Year of publication: 2011

ISBN: 978-3-0003-6193-7
Preface

After several Real-Time Linux Workshops in Europe (Vienna 1999, Milan 2001, Valencia 2003, Lille 2005,
Linz 2007, Dresden 2009), in America (Orlando 2000, Boston 2002, Guadalajara 2008) and in Asia (Singapore
2004, Lanzhou 2006), and after reaching Africa for the first time in 2010 (Nairobi), the Thirteenth Real-Time
Linux Workshop comes to Prague, Czech Republic this year.

The event is still driven by the same simple goal: bring together developers and users, present new
developments, discuss 'real' user demand, get to know those anonymous people that only exist as e-mail
folders in your mailing-list archive and, last but not least, encourage the spirit of a community.

Free Libre Open-Source Software is a fast growing technology pool, and we can observe this well in the
breadth of development presented at this year's Real-Time Linux Workshop. Not only has FLOSS reached
traditional automation and control, but it is increasingly reaching into technical areas that were almost
unthinkable for "non-commercial" entities: safety-critical systems. This development is underpinned by
developments in the FLOSS tools for formal and semi-formal verification. In other words, FLOSS is covering
the entire area from educational material, traditional automation and control, and robotics to the aerospace
and automotive industries. While no single workshop can ever claim to cover it all, we do hope to have
collected a representative snapshot of this sprawling community.

Thank you very much for attending the Real-Time Linux Workshop. We hope that your expectations
are met during this workshop, whether as a developer, a user or a newcomer to real-time Linux.

The organizing committee

Acknowledgements

No Real-Time Linux community, no Real-Time Linux users, no Real-Time Linux Workshop. Therefore, our
thanks go to the Real-Time Linux community for the work done in open source software development as an
international cooperation.

All authors and attendees, thank you for your contributions in every respect.

In particular, we want to express our thanks to the sponsors of the 13th Real-Time Linux Workshop.

Last but not least, thanks to everybody who has contributed to this workshop and is not explicitly mentioned
above.

OSADL – Real-Time Linux Foundation Working Group

Contents

Real-Time Linux in Education . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1


Realtime Suite: a step-by-step introduction to the world of real-time signal acquisition and
conditioning.
Alberto Guiggiani, Michele Basso, Massimo Vassalli and Francesco Difato . . . . . . . 1
A Nonlinear Model-Based Control realized with an Open Framework for Educational Purposes
Klaus Weichinger . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Real-Time Linux Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
The Witch Navigator A Software GNSS Receiver Built on Real-Time Linux
Petr Kacmarik, Pavel Kovar, Ondrej Jakubov and Frantisek Vejrazka . . . . . . . . . . 17
Case Study: Challenges and Benefits in Integrating Real Time patch in PowerPc Based Media
System
Manikandan Ramachandran and Aviral Pandey . . . . . . . . . . . . . . . . . . . . . . 29
Hard real-time Control and Coordination of Robot Tasks using Lua
Markus Klotzbuecher and Herman Bruyninckx . . . . . . . . . . . . . . . . . . . . . . 37
Platform independent remote control and data exchange with real-time targets on the example
of a rheometer system
Martin Leonhartsberger and Bernhard G. Zagar . . . . . . . . . . . . . . . . . . . . . . 45
Using GNU/Linux and other Free Software for Remote Data Collection, Analysis and Control
of Silos in Mexico
Don W. Carr, Juan Villalvazo Naranjo, Rubén Ruelas, Benjamin Ojeda Magaña . . . 53
A Status Report on REACT, a Control Engine that runs on top of GNU/Linux for Creating
SCADA and DCS Systems
Don W. Carr, Rubén Ruelas, Benjamin Ojeda Magaña, Adriana Corona Nakamura . 57
Development of an optical profile measurement system under RTAI Linux using a CD pickup
head
Gregor Gerstorfer and Bernhard G. Zagar . . . . . . . . . . . . . . . . . . . . . . . . 61
Process Data Connection Channels in uLan Network for Home Automation and Other Dis-
tributed Applications
Pavel Píša, Petr Smolík, František Vacek, Martin Boháček, Jan Štefan and Pavel Němeček . . . . . . 67
Real-Time Linux Infrastructure and Tools . . . . . . . . . . . . . . . . . . . . . . . . 75
Application of RT-Preempt Linux and Sercos III for Real-time Simulation
Michael Abel, Luis Contreras and Prof. Peter Klemm . . . . . . . . . . . . . . . . . . 75
Lachesis: a testsuite for Linux based real-time systems
Andrea Claudi and Aldo Franco Dragoni . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Generic User-Level PCI Drivers
Hannes Weisbach, Björn Döbel and Adam Lackorzynski . . . . . . . . . . . . . . . . . 91
COMEDI and UIO drivers for PCI Multifunction Data Acquisition and Generic I/O Cards
and Their QEMU Virtual Hardware Equivalents
Pavel Píša and Rostislav Lisový . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
A Framework for Component-Based Real-Time Control Applications
Stefan Richter, Michael Wahler and Atul Kumar . . . . . . . . . . . . . . . . . . . . . 107

Performance Evaluation and Enhancement of Real-Time Linux . . . . . . . . 117
Real-Time Performance of L4Linux
Adam Lackorzynski, Janis Danisevskis, Jan Nordholz and Michael Peter . . . . . . . . 117
Tiny Linux Project: Section Garbage Collection Patchset
Sheng Yong, Wu Zhangjin and Zhou Qingguo . . . . . . . . . . . . . . . . . . . . . . . 125
Performance Evaluation of openPOWERLINK
Yang Minqiang, Li Xuyuan, Nicholas Mc Guire and Zhou Qingguo . . . . . . . . . . . 135
Improving Responsiveness for Virtualized Networking Under Intensive Computing Workloads
Tommaso Cucinotta, Fabio Checconi and Dhaval Giani . . . . . . . . . . . . . . . . . 143
Evaluation of RT-Linux on different hardware platforms for the use in industrial machinery
control
Thomas Gusenleitner and Gerhard Lettner . . . . . . . . . . . . . . . . . . . . . . . . 149
openPOWERLINK in Linux Userspace: Implementation and Performance Evaluation of the
Real-Time Ethernet Protocol Stack in Linux Userspace
Wolfgang Wallner and Josef Baumgartner . . . . . . . . . . . . . . . . . . . . . . . . . 155
Timing Analysis of a Linux-Based CAN-to-CAN Gateway
Michal Sojka, Pavel Píša, Ondřej Špinka, Oliver Hartkopp and Zdeněk Hanzálek . . 165
Evaluation of embedded virtualization on real-time Linux for industrial control system
Sanjay Ghosh and Pradyumna Sampath . . . . . . . . . . . . . . . . . . . . . . . . . . 173
Real-Time Linux Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
Turning Kriegers MCS Lock into a Send Queue or, a Case for Reusing Clever, Mostly Lock-
Free Code in a Different Area
Benjamin Engel and Marcus Völp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
DYNAMIC MEMORY ALLOCATION ON REAL-TIME LINUX
Jianping Shen, Michael Hamal and Sven Ganzenmüller . . . . . . . . . . . . . . . . . 187
pW/CS - Probabilistic Write / Copy-Select (Locks)
Nicholas Mc Guire . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
On the implementation of real-time slot-based task-splitting scheduling algorithms for multi-
processor systems
Paulo Baltarejo Sousa, Konstantinos Bletsas, Eduardo Tovar and Björn Andersson . 207
Experience with Sporadic Server Scheduling in Linux: Theory vs. Practice
Mark J. Stanovich, Theodore P. Baker and An-I Andy Wang . . . . . . . . . . . . . . 219
How to cope with the negative impact of a processors energy-saving features on real-time
capabilities?
Carsten Emde . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
FLOSS in Safety Critical Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
On integration of open-source tools for system validation, example with the TASTE tool-chain
Julien Delange and Maxime Perrotin . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
Safety logic on top of complex hardware software systems utilizing dynamic data types.
Nicholas McGuire . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
POK, an ARINC653-compliant operating system released under the BSD license
Julien Delange and Laurent Lec . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
D-Case Editor: A Typed Assurance Case Editor
Yutaka Matsuno . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
A FLOSS library for the safety domain
Peter Krebs, Andreas Platschek and Hans Tschürtz . . . . . . . . . . . . . . . . . . . 279
Open Proof for Railway Safety Software
Klaus-Rüdiger Hase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
Migrating a OSEK run-time environment to the OVERSEE platform
Andreas Platschek and Georg Schiesser . . . . . . . . . . . . . . . . . . . . . . . . . . . 309

Real-Time Linux in Education

Realtime Suite: a step-by-step introduction to the world of real-time signal acquisition and conditioning

Alberto Guiggiani
Università di Firenze, Dipartimento di Sistemi e Informatica
Via di S. Marta 3, Florence, Italy
[email protected]

Michele Basso
Università di Firenze, Dipartimento di Sistemi e Informatica
Via di S. Marta 3, Florence, Italy
[email protected]

Massimo Vassalli
National Research Council of Italy, Institute of Biophysics
Via De Marini 6, Genoa, Italy
[email protected]

Francesco Difato
Italian Institute of Technology (IIT), Dept. of Neuroscience and Brain Technologies
Via Morego 30, Genoa, Italy
[email protected]

Abstract
In this paper we present the Realtime Suite, a project that aims to provide all the tools and guides
needed to set up a Linux-RTAI real-time machine within a CACSD environment. This is addressed in
particular to researchers and students approaching the world of real-time control applications for the first
time, or people searching for an open-source alternative to commercial solutions.

1 Introduction

When building a real-time machine, open-source software can constitute a suitable alternative to commercial
solutions in terms of functionality, performance and costs. On the other hand, the major drawbacks of this
choice are the long amount of time and the advanced computer skills needed to make all the pieces work
together. These steps represent an obstacle that might discourage less-experienced people, like students,
from entering the world of Linux-based real-time applications. With Realtime Suite [1] our aim is to provide
the tools and documentation needed to easily set up a working real-time machine for signal acquisition and
processing within a Computer Aided Control System Design (CACSD) framework. In order to achieve this,
we started by collecting all the pieces required to configure a complete software chain: an Ubuntu operating
system with an RTAI [2] modified kernel; Comedi [3] to interface with DAQ boards; Scicoslab [4] with the
RTAI-Lib [5][6] palette to generate and build the real-time target; and, finally, QRTAI-Lab [7] and
RTAI-XML [8] to monitor the target locally or remotely, respectively.


The next step was to edit the source code with the objective to avoid conflicts and compile errors.
Lastly, we packed everything in the so-called Realtime Suite, alongside documentation with simple
step-by-step instructions and examples. In addition to that, we configured the suite on a Virtual Machine
(VM) that works out of the box, useful for testing purposes.

In order to verify the efficacy of the proposed approach in developing an actual real-time application,
a Realtime Suite based system has been installed [9] to realize a feedback control for nanometer-precision
specimen tracking in an optical tweezer system with piezoelectric actuators.

The paper is organized as follows. In Section 2 an overview of the software included in the Realtime
Suite package will be presented. Section 3 will cover the remaining two components of Realtime Suite: a
step-by-step tutorial and the preconfigured virtual machine. Section 4 will focus on the results of performance
tests to evaluate achievable sampling rates and jitter, while in the last section an application will be
proposed in which a Realtime Suite based machine has been used to control specimen position in an optical
tweezers setup.

2 An Open-source Rapid Controller Prototyping Environment

FIGURE 1: Real-time Machine (software chain on Linux Ubuntu with RTAI kernel; remote access over TCP/IP)

In this section follows a brief description of the software needed to build a Linux-based machine for
signal acquisition and conditioning. The proposed configuration is based on open-source projects and
packed in one single archive, available for download on the Realtime Suite website [1]. Some of the source
files included in the package have been edited in order to assure compatibility between the software
versions and obtain a fluent installation procedure. A schematic diagram of the functional connections
between the various components of the real-time machine is shown in figure 1.

2.1 RTAI and Comedi

RTAI [2], the Real Time Application Interface, is a Linux extension that allows the execution of tasks with
strict temporal constraints, enabling hard real-time (HRT) control algorithm implementations. Development
started back in 1999 with the work of Paolo Mantegazza at Politecnico di Milano. The Suite includes
RTAI version 3.8.1 alongside Linux kernel 2.6.32 patched from LinuxCNC [10].

Comedi [3] is a set of drivers provided as Linux kernel extensions that enable communication with a
broad range of commercial data acquisition boards. A collection of libraries is provided, with APIs which
allow the real-time target to interface with the device.
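To give an idea of the driver layer underneath, the following minimal sketch reads one raw sample from an
analog input channel through the user-space Comedi library and converts it to volts. It is only an
illustration of the Comedi API: the real-time targets generated with the Suite access the boards through
the RTAI-Lib Comedi blocks instead, and the device file, subdevice and channel numbers used here are
assumptions that depend on the installed board.

/* Minimal Comedi sketch: read one sample from analog input channel 0.
 * Device path, subdevice and channel numbers are illustrative assumptions. */
#include <stdio.h>
#include <comedilib.h>

int main(void)
{
    comedi_t *dev = comedi_open("/dev/comedi0");    /* first Comedi device  */
    if (!dev) {
        comedi_perror("comedi_open");
        return 1;
    }
    unsigned int subdev = 0, chan = 0, range = 0;   /* analog input, ch. 0  */
    lsampl_t raw;
    if (comedi_data_read(dev, subdev, chan, range, AREF_GROUND, &raw) < 0) {
        comedi_perror("comedi_data_read");
        comedi_close(dev);
        return 1;
    }
    comedi_range *rng = comedi_get_range(dev, subdev, chan, range);
    lsampl_t maxdata  = comedi_get_maxdata(dev, subdev, chan);
    printf("raw=%u  value=%g V\n", raw, comedi_to_phys(raw, rng, maxdata));
    comedi_close(dev);
    return 0;
}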

2.2 Scicoslab and RTAI-Lib

FIGURE 2: RTAI Scicoslab blocks

Scicoslab [4] is a scientific software package born in 2005 from the BUILD4 distribution of Scilab, when
the development of the latter left INRIA (Institut National de Recherche en Informatique et en Automatique).
It provides a wide range of functions for mathematical and engineering applications, and comes with the
CACSD software Scicos. With Scicos [11] control algorithms can be designed by means of a graphical UI,
connecting blocks from the available palettes. The RTAI-Lib palette, installed in the machine setup phase,
adds various blocks functional to hard real-time control targets. They can be divided in four groups:
blocks providing the interface with Comedi devices, mailboxes for communications between multiple targets
running on the same machine, semaphores for task synchronization, and input/output blocks like signal
generators, scopes and meters. The RTAI-Lib palette blocks are shown in figure 2.

2.3 QRTAILab

QRTAILab [7] is an RTAI-Linux graphical user interface useful to manage real-time targets running on
the same machine. It creates a virtual oscilloscope to monitor signals and allows online modification of
the target parameters. It was developed starting from the source code of xrtailab, a software part of
the RTAI-Lab package, using the Qt libraries. With respect to xrtailab, QRTAILab is much lighter [12]
in CPU usage when connecting to complex targets.

2.4 Remote interface: RTAI-XML

One of the major concerns that becomes evident when designing real-time control system architectures
is the intrinsic duality between hard real-time (HRT) and soft real-time (SRT) components. While the
first requires the programmer to focus on timing constraints, latencies, and sampling rates, SRT
components like human-machine interfaces (HMI) require flexibility, user-friendliness and efficient data
handling. In order to separate those two worlds, a web services approach can be taken. Web services [13]
allow two pieces of software to communicate through a network by defining a standard object access
protocol. Here only the communication language is shared, leaving freedom of implementation.

RTAI-XML [8][14] brings a web services approach to the world of Linux real-time control applications.
The Realtime Suite includes the RTAI-XML server component, which is compiled on the RT machine
during the last steps of the tutorial. This server component acts as an intermediary between the
real-time targets and a remote procedure call framework. Using XML, it bridges the target signals and
parameters over the network, making them accessible from remote clients like jRTAILab [15], a Java
implementation of xrtailab, or any other application-oriented client such as the one presented in Section 5.

3 Setup Tutorial and Realtime Suite VM

The Realtime Suite project was born with the objective to guide less experienced people into the world
of Linux real-time applications. To achieve this, the software package described in the previous section
comes with a step-by-step tutorial that accompanies the user in all the phases needed to build a working
real-time control machine, from the initial system setup to the compilation and execution of the first
real-time target. The majority of the steps involves running console commands to compile and install
the various components, but everything is explained and viable for users without advanced programming
skills. The tutorial is a revised version of the one proposed by Bucher, Mannori and Netter [16].

In addition to the software package and its related tutorial, a virtual machine has been released.
It is based on Linux Ubuntu 10.04 and includes all the software of the Realtime Suite compiled and
ready to use. It is available through the RTAI-XML project website [8] in the Open Virtual Machine
format (.ova) and can be executed on a wide range of host machines with the open-source virtualization
software VirtualBox [17]. Due to the limitations of a virtualized system it cannot be used as a substitute
for a physical RTAI machine in actual signal acquisition and conditioning, but it can constitute a handy
tool in the design phases of Scicos control algorithms or while testing remote RTAI-XML clients.

4 Performance Tests

This section shows the results of two kinds of experimental tests performed on a machine configured with
Realtime Suite. The machine was a commercial personal computer with the following specifications:

CPU: Intel Core2Duo 6300 @ 1.86 GHz
System RAM: 1 GB
Video card: ATI Radeon HD 4350
HDD: WD Caviar Blue SATA @ 7200 RPM
DAQ board: National Instruments NI PCI-6229

4.1 Jitter

The first test aims at evaluating the precision of the sampling tick with respect to sampling rate and
task priority. In order to calculate it, two separate real-time targets are in execution: the first with
high priority and a high sampling rate (2.5 kHz), the second with low priority and a low sampling rate
(125 Hz). Each one takes the effective sampling timestamps and compares them with the expected
timestamps, calculated by adding the task period to a counter. The absolute value of their difference is
the instantaneous jitter. The two tasks are kept running until the maximum jitter reaches a stable value.
Results are shown in table 1.

Priority   Samp. rate [Hz]   Max jitter [ns]
High       2500              19425
Low        125               23445

TABLE 1: Jitter Test
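For readers who want to reproduce the measurement principle outside of Scicos, the following stand-alone
sketch illustrates it with plain POSIX timers: a periodic loop compares the effective wake-up timestamps
with the expected ones (task period added to a counter) and records the maximum absolute difference. This
is only an approximation of the test described above, which used two Scicos-generated RTAI targets rather
than this program; the period and iteration count here are arbitrary assumptions.

/* Sketch of the jitter measurement: expected vs. effective timestamps. */
#include <stdio.h>
#include <time.h>

#define PERIOD_NS 400000L                 /* 2.5 kHz task period, as in the test */

static long long ts_ns(const struct timespec *t)
{
    return (long long)t->tv_sec * 1000000000LL + t->tv_nsec;
}

int main(void)
{
    struct timespec next, now;
    long long expected, max_jitter = 0;

    clock_gettime(CLOCK_MONOTONIC, &next);
    expected = ts_ns(&next);

    for (int i = 0; i < 100000; i++) {
        expected += PERIOD_NS;                       /* expected timestamp    */
        next.tv_sec  = expected / 1000000000LL;
        next.tv_nsec = expected % 1000000000LL;
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        clock_gettime(CLOCK_MONOTONIC, &now);        /* effective timestamp   */
        long long jitter = ts_ns(&now) - expected;   /* instantaneous jitter  */
        if (jitter < 0)
            jitter = -jitter;
        if (jitter > max_jitter)
            max_jitter = jitter;
    }
    printf("max jitter: %lld ns\n", max_jitter);
    return 0;
}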
4.2 Sampling Rates

The sampling rate test is executed to establish the maximum reachable sampling rates with respect to the
number of input and output channels open on the data acquisition board. This is done by running a single
target with a fixed number of input/output Comedi blocks and gradually increasing the sampling rate until
the output waveform, measured with an oscilloscope, stops being equivalent to the one generated inside
the block diagram. The test is repeated for different numbers of I/O channels, as summarized in figure 3.

FIGURE 3: Sampling Rate Tests

Looking at the chart, it is clear that the major impact on the maximum sampling rate is given by the
number of input channels. Anyway, sampling rates in the magnitude of some kHz, suitable for a wide range
of control applications, are easily handled even with many open channels. This is a remarkable result for
a machine built with such limited and cheap hardware components.

5 Application - Optical Tweezers Control

Optical tweezers are scientific instruments that exploit the property of light to exert forces on matter
to optically trap a small particle (usually a silica or glass bead with a diameter in the range of
micrometers) near the focus of a concentrated laser beam, without mechanical contact [18]. For small
displacements of the trapped bead, a few hundred nanometers from the laser focus centre, the particle
suffers a force actuated by the light beam, in the pN order, attracting it towards the center of the trap.
This light-force has the important property of showing a behavior similar to a spring for displacements
of the bead of about 200 nm, therefore the force-displacement relationship can be described by Hooke's
law [19]. In the past two decades optical tweezers have found applications for nanometer- and
micrometer-scale experiments in many scientific areas, spanning from physics to chemistry and biology.
The application shown in this paper involves studies on neuron cells during differentiation, and relies
heavily on a real-time control machine built with the Realtime Suite.

FIGURE 4: Optical Tweezers scheme (Win machine running the C# interface (e), connected via RTAI-XML to
the RTAI machine running the PI controller (c))

The setup is shown in figure 4, and has been described in detail in [9]. A laser beam is used (a) to trap
a microsphere. The interference fringes generated by the interaction of the bead with the laser are
projected onto a four-quadrant photodiode (b). This interferometric measurement allows measuring the
position of the sphere with respect to the center of the beam with sub-nanometer and sub-millisecond
resolution. The photodiode signals are acquired by the RTAI machine, equipped with a National Instruments
DAQ PCI board and Comedi drivers, and sampled at a 2 kHz rate. On the machine a control target designed
with Scicoslab implements a proportional/integral feedback (c) comparing the bead displacement with an
external reference position given by the user. The controller output pilots a Physik Instrumente 3-axis
piezoelectric positioning system (d). Thanks to RTAI-XML, all signals and feedback parameters are sent
through the network to another machine where a custom C# interface is running (e). This interface provides
a graphical front-end for parameter editing, signal visualization and measurement saving. Moreover, it
automatically handles recoveries with a DC-motor translation stage to extend the piezoelectric stage range.
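As an illustration of the feedback law mentioned above, one proportional/integral step might look like the
sketch below. The actual controller is a Scicoslab/Scicos block diagram, so the gains, the saturation
limits and all names here are assumptions made only to show the structure of the computation executed at
each 2 kHz sample.

/* Minimal sketch of a proportional/integral feedback step.
 * Gains, limits and variable names are illustrative assumptions only;
 * the real controller is a Scicos block diagram. */
typedef struct {
    double kp, ki;            /* proportional and integral gains            */
    double ts;                /* sample time in seconds (500 us at 2 kHz)   */
    double integral;          /* integrator state                           */
    double out_min, out_max;  /* range accepted by the piezo stage input    */
} pi_ctrl_t;

static double pi_step(pi_ctrl_t *c, double reference, double measurement)
{
    double error = reference - measurement;   /* bead displacement error    */
    c->integral += c->ki * error * c->ts;     /* integrate the error        */
    double u = c->kp * error + c->integral;   /* PI control action          */
    if (u > c->out_max) u = c->out_max;       /* clamp to the stage range   */
    if (u < c->out_min) u = c->out_min;
    return u;                                 /* drive signal for the stage */
}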


FIGURE 5: Section of the C# interface

Figure 5 shows the section of the C# interface that provides a graphical front-end to edit on-the-fly the
parameters, e.g. the proportional and integral gains, of the control algorithm running on the real-time
machine.

This application shows the functionality of a control machine built with Realtime Suite. We were able to
acquire multiple analog channels (nine in this application) sampled with a bandwidth of two kHz, and to
condition them by a custom control algorithm, with the use of limited hardware (a commercial PC and a DAQ
board). In addition to that, we could separate the implementation of the HMI soft real-time components
from the business logic ones thanks to RTAI-XML.

6 Conclusions

With Realtime Suite, we have configured a ready-to-use software package useful for researchers and users
approaching real-time applications for the first time. Everything is based on open-source projects
supported by active developers and communities. At the cost of a personal computer with a supported data
acquisition board, and a few hours of work, it is possible to build a real-time machine capable of running
custom control targets, sampling signals with a bandwidth of a few kHz. Flexibility in control
architecture design is added by the inclusion of the RTAI-XML project, which allows separating the HRT
components (signal acquisition, control algorithms...) from the SRT components (user interface, data
manipulation), to develop appropriate strategies for interfacing two distinct worlds.

References

[1] Realtime Suite, Online, http://www.rtaixml.net/realtime-suite

[2] RTAI, Real Time Application Interface, Online, https://www.rtai.org/

[3] Comedi, Linux control and measurement device interface, Online, http://www.comedi.org

[4] Scicoslab, Online, http://www.scicoslab.org

[5] R. Bucher, L. Dozio, CACSD under RTAI Linux with RTAI-LAB, Real Time Linux Workshop, Valencia, 2003

[6] R. Bucher, S. Balemi, Rapid controller prototyping with Matlab/Simulink and Linux, Control Engineering
Practice, 14: 185-192, 2006

[7] QRTAILab, a user interface for RTAI, Online, http://qrtailab.sourceforge.net

[8] RTAI-XML, Online, http://www.rtaixml.net

[9] A. Guiggiani, B. Torre, A. Contestabile, F. Benfenati, M. Basso, M. Vassalli, F. Difato, Long-range
and long-term interferometric tracking by static and dynamic force-clamp optical tweezers, Optics Express,
in print 2011

[10] Linux CNC, Online, http://www.linuxcnc.org

[11] Scicos, Online, http://www.scicos.org

[12] xrtailab and QRtaiLab performance comparison, Online, http://qrtailab.sourceforge.net/performance.html

[13] W3C: Web Services, Online, http://www.w3.org/2002/ws/

[14] M. Basso, R. Bucher, M. Romagnoli and M. Vassalli, Real-Time Control with Linux: A Web Services
Approach, In Proc. 44th IEEE Conference on Decision and Control - European Control Conference, Seville
(Spain), pp. 2733-2738, 12-15 Dec., 2005

[15] jRTAILab, a client for RTAI-XML, Online, http://www.rtaixml.net/client-applications/jrtailab

[16] R. Bucher, S. Mannori, T. Netter, RTAI-Lab tutorial: Scicoslab, Comedi, and real-time control, 2010

[17] VirtualBox, Online, http://www.virtualbox.org

[18] A. Ashkin, J. Dziedzic, J. Bjorkholm, and S. Chu, Observation of a single-beam gradient force optical
trap for dielectric particles, Optics Letters, 11: 288-290, 1986

[19] K. Svoboda and S. Block, Force and velocity measured for single kinesin molecules, Cell, 77(5):
773-784, 1994

A Nonlinear Model-Based Control realized with an Open Framework for Educational Purposes

Klaus Weichinger
BIOE Open Hardware Automation System Developer
3300 Greinsfurth, Austria
[email protected] - http://bioe.sourceforge.net

Abstract
Today, nonlinear model-based control methods are an essential part in different control applications.
To provide a complete open framework for educational purposes this contribution extends common open
source software (Scilab/Scicos, Maxima and rt-preempt Linux real-time system) with a low-cost do-it-
yourself open hardware interface and a web-based monitoring system embedded into Scicos blocks.
The simple concept of the open hardware interface called Bioe (Basic Input Output Elements) allows
the real-time application to interact with general analog and digital signals as well as to more complex
devices (e.g. resistive touch panels, RC servos, I2C acceleration sensors). Furthermore, a prototype of a
web-based monitoring system using Ajax is treated. It consists of a Http web server embedded into a
Scicos block so that existing Scicos code generation packages for rt-preempt Linux can be used without
modifications.
To demonstrate the applicability and usability of the proposed framework a nonlinear model-based
control law for a mechatronic multi-input multi-output system is derived with the concept of input/output
linearization and realized with the proposed open framework.

1 Introduction

Rapid Control Prototyping (RCP) is still associated with cost-intensive hardware and software investments.
Such proprietary RCP frameworks are used in industrial applications because they are well known from
education.

Mercifully, the real-time capability of the Linux kernel has highly increased due to patches like Rtai [1]
or rt-preempt [2], and the usability of cost-saving open source software (OSS) for computer-aided control
system design (CACSD) has improved within the last years. A fully usable RCP framework for industrial and
educational applications is provided by Rtai-Lab, which consists of a Linux distribution with a Rtai
patched kernel, Scilab/Scicos [3] or ScicosLab [4] and the Comedi drivers collection [5] for a variety of
data acquisition boards. Furthermore, the software xRtaiLab allows to scope and record signals and to edit
system parameters. There also exist other real-time Linux projects like RTLinux [6] and Xenomai [7], but
the comparison of real-time approaches is not the topic of this paper.

This contribution picks up two aspects that should complete the existing OSS [2, 8, 4, 9] to a cost- and
time-saving open RCP framework for rt-preempt Linux real-time systems. The first aspect is discussed in
section 2 and concerns the hardware to interface a RCP system with the target system. The presented
solution is a low-cost and do-it-yourself system called Basic Input Output Elements (Bioe, [10]) with a
simple but very flexible interface. The second aspect treats a web-based monitoring approach that
implements a Http web server within a Scicos block and uses an Ajax web-application to scope signals and
edit parameters. The web-based concept, first results of the prototype and further details are the topic
of section 3.

In section 4 the proposed open RCP framework is used to implement a nonlinear model-based control law for
a mechatronic system. The multiple-input multiple-output (MIMO) mechatronic system consists of the
well-known mass-spring system with viscous friction that is actuated with a double-acting hydraulic piston
(DAP). Beside the position control of the mass, the sum-pressure of the DAP has to be stabilized at a
constant value. The method of input/output exact linearization [11, 12] is used to derive the control law,
and a hardware-in-the-loop (HIL) simulation is used to test the control law and to demonstrate the usage
of the Bioe system and the web-based monitoring system.

Usability and reliability of an open RCP framework are basic requirements so that open hard- and software
can be used for educational purposes, a very important precondition to introduce open RCP frameworks into
industrial applications.

2 Open Hardware - BIOE

The Bioe system was developed within a diploma thesis [13] and is a simple but effective piece of hardware
to interface a PC with the signals of a physical system. It consists of small modules that are connected
in parallel with the Bioe bus cable to the parallel port interface (Line Print Terminal, LPT) of the PC,
as shown in figure 1.

FIGURE 1: A Bioe mock-up

Each module can be addressed with a 4bit dip-switch and so up to 16 Bioe modules can be connected to the
bus cable. An adapter board is used for signal amplification and to protect the LPT against damage. This
interface is used for communication because of its real-time performance and simplicity. In addition, the
LPT interface is still available on some embedded and desktop systems or can be retrofitted with e.g. PCI
Express cards.

A Bioe module is equipped with an ATmega16 microprocessor. This 8bit microprocessor contains the
functionality to deal with different types of signals and to provide the information via the Bioe bus by
the use of 16 end-points (EP0, EP1, ..., EP15). Each end-point consists of a 16bit receive and a 16bit
transmit register (RX_{adr,ep} and TX_{adr,ep}, with the Bioe address adr ∈ {0, ..., 15} and the end-point
number ep ∈ {0, ..., 15}). These end-points are the common interface between the real-time application and
the physical signal. The end-points are accessed with transactions. During a transaction a 16bit value is
written from the PC to the register RX_{adr,ep} and, simultaneously, a value is read from the Bioe module
register TX_{adr,ep} back to the PC.
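As a rough illustration of this end-point model, the sketch below shows how a cyclic real-time task could
exchange values with a module through one transaction per end-point. The type and function names
(bioe_bus_t, bioe_transaction) and the end-point assignment are assumptions made for this illustration,
not the actual API of the Bioe C library, which also handles addressing, timing and the checksum on the
4-bit parallel bus.

/* Illustrative sketch of the end-point/transaction model described above.
 * bioe_bus_t and bioe_transaction() are assumed names, not the real Bioe API. */
#include <stdint.h>

typedef struct { int lpt_base; } bioe_bus_t;   /* handle for the LPT adapter */

/* One transaction: write a 16-bit value into RX(adr,ep) and, in the same bus
 * cycle, read the 16-bit value of TX(adr,ep) back. Stub implementation; the
 * real library drives the CS/CLK/data lines of the parallel port. */
static uint16_t bioe_transaction(bioe_bus_t *bus, uint8_t adr, uint8_t ep,
                                 uint16_t tx_value)
{
    (void)bus; (void)adr; (void)ep; (void)tx_value;
    return 0;
}

void control_cycle(bioe_bus_t *bus)
{
    /* module at address 2, end-point 2: write a PWM duty cycle and read back
     * an ADC sample in the same transaction (assumed end-point assignment)  */
    uint16_t adc = bioe_transaction(bus, 2, 2, 512);

    /* ... run the control law on 'adc' and compute the next PWM value ...   */
    (void)adc;
}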

FIGURE 2: UML sequence diagram from the Bioe module firmware

In addition, the end-points EP0 and EP1 are used to activate the required interface type with the device
type number (DTN). This approach reduces the software effort for the real-time task, and so only two
Scicos blocks are required to use the whole functionality of Bioe (see section 2.2). Furthermore, each
Bioe module contains all interface types that


are realized for Bioe and each interface type can and connectors, communication speed and the micro-
be activated with an unique DTN. After the initial- processor performance. For the communication with
ization of the interface the corresponding interface transactions a 4 bit parallel bus is used and each
function is processed and the informations are con- transaction contains the device address, the end-
verted and exchanged between the end-points and point number, the register values and a very simple
the hardware interface as illustrated in figure 2. check-sum to detect transmission errors.
The table 1 gives an overview of the provided PIN Name Description
interfaces:
1 Vcc +5 V supply
DTN Interface Type Description 2 GN D ground supply
001 16 digital outputs 3 CS Chip-Select (by Master)
4 CLK Clock (by Master)
002 16 digital inputs 5 DO0 Bit 0 Master→Slave
003 8 digital inputs, 8 digital outputs 6 DO1 Bit 1 Master→Slave
7 DO2 Bit 2 Master→Slave
050 two 10bit PWM, four 10bit ADC 8 DO3 Bit 3 Master→Slave
090 square signal generator 9 DI0 Bit 0 Slave→Master
10 DI1 Bit 1 Slave→Master
091 square signal frequency measurement
11 DI2 Bit 2 Slave→Master
100 incremental decoder 12 DI3 Bit 3 Slave→Master
101 incremental decoder with HCTL2022 13 NC not in use
14 NC not in use
105 16x2 LCD driver, four push buttons
106 ADC, PWM, RC-Servomotor and TABLE 2: Description of the 14 pole Bioe
incremental decoder bus flat cable
107 UART interface Performance measurements done with different
120 RC5 infrared receiver PC’s return a transaction time Ttrans between 25 µs
and 50 µs (time where CS is high; see figure 3). This
127 4 wire resistive touch interface
parameter depends on the PC’s hardware, especially
128 WII-Nunchuck interface (I2 C) the type of parallel port (parallel port directly on the
129 WII-Remote IR camera interface (I2 C) motherboard, a PCIe extension module, ...).

230 electrical network simulator


CS
231 torsional oscillator simulator CLK
250 cycle time measurement DO0-3 ADR EP O15:12 O11:8 O7:4 O3:0 CHECKSUM

DI0-3 I15:12 I11:8 I7:4 I3:0 CHECKSUM

TABLE 1: List of available interface types


FIGURE 3: Timing-Diagram of a Bioe
If a new interface type is required the implementa- transaction (O is the value written to
tion is done in the Bioe module software. The con- RXadr,ep , I is the value read from T Xadr,ep )
cept to keep hardware related functionality on the
Bioe modules avoids any modification of the Bioe The following example demonstrates the deter-
software on the PC and reduces the efforts to keep mination of the smallest possible sample time of a
the PC software updated. Hence, if the Bioe com- control realized with Bioe.
munication via the end-points is ensured all available
DTN interfaces can be used. In the following sec- Example 1 A PID control of a system with one in-
tions further topics concerning the Bioe system are put ni = 1 and one output no = 1 should be realized
discussed. with Bioe. This fictive example illustrates how to
determine the smallest sample time for the control
2.1 BIOE Bus and Performance task. For discrete time systems it is assumed that the
time delay Tio between reading the inputs and setting
The Bioe bus is a 14 pole flat cable (see table the outputs is zero. Nevertheless, due to finite cal-
2) that connects the modules with the LPT adapter culation speed there is a delay. If a sample time Ts
board. This bus is a compromise of low-cost cables is used so that the relation Tio ≤ 0.1 Ts is ensured


then the effects of the timing error can be neglected.


In this example a measurement of the Bioe bus re- System IO 2 QA
sulted in Ttrans = 30 µs. ADR=2 DTN=50 ADR=2 EP2
[888,1,0,9,0] [888,1,0,9,0]
Ts = 10 Ttrans (ni + no ) (1)
The equation (1) delivers the minimum sample time FIGURE 5: Scicos blocks (left side: de-
of Ts = 0.6 ms for this PID control example. vice block; right side: end-point block)

To configure the parallel port the Bioe API


2.2 Toolchain requires the base-address of the parallel port (e.g.
888 dec), the channel number (e.g. 1) to define the
The Bioe toolchain consists of a set of Scicos chip-select number, the speed-parameter (e.g. 0)
blocks, a command-line tool and a tool with a graph- to trim the communication speed and the pause-
ical user interface. A CLisp API is also available. parameter (e.g. 9) to avoid CPU stress of the Bioe
This toolchain uses the Bioe C library which pro- module in consequence of a permanent Bioe bus
vides the required communication functions and con- communication. Optionally, a fifth parameter is used
tains the platform abstraction layer. to define a setting number that loads the port set-
For the development and testing of Bioe mod- tings from the file bioesettings.conf if available.
ules and new interface functions the command- This feature will be required if a compiled Bioe ap-
line tool bioedude and the graphical user interface plication has to be adapted to a different hardware
bioegui (figure 4) can be used to configure the mod- platform (e.g. an other parallel port base-address).
ules and to access the end-points directly. These
tools are also very useful to test sensors and actuators 2.3 An Example Interface
of a new mock-up. The toolchain can be compiled
and used on WIN32 and Linux based platforms.

FIGURE 6: Illustration of a complex Bioe


interface
All interfaces are described similar to DTN106 illus-
FIGURE 4: Screenshot of bioegui (uses trated in figure 6. The concept of Bioe allows such
FLTK, a cross-platform C++ GUI toolkit) simple documentation so that the principle of What
You See Is What You Get (WYSIWYG) fits very
Only two Scicos blocks (figure 5 are required well.
to include the Bioe system into a Scicos model.
The first block bioe dev does not have inputs and
outputs and it initializes the interface type of a Bioe 3 Web-Based Monitoring
module. The second block bioe ep provides the end-
point access. This block can be configured so that it This section presents the prototype of a web-
reads or writes an end-point triggered by an external based monitoring system for rt-preempt Linux real-
event. time applications created with Scicos. The idea was


to implement a Http server within the real-time ap- the moment the Rtxmlserver provides an access
plication, to start the server as non-real-time thread via Http requests (a reduced Http server is im-
and to exchange signals and parameters between the plemented according RFC-2616 [15]; TCP and UDP
real-time application and the Http server (see sec- servers are planed).
tion 3.1). A REST-style architecture [14] is used for s ta ti c p t h r e a d _ t thrd ;
Ajax based communication in which the signals are
s ta ti c void * s e r v e r _ t a s k(void * p )
exchanged as XML formated byte stream. A web {
application or an other kind of application uses this struct s c h e d _ p a r a m param ;
param . s c h e d _ p r i o r i t y = 10;
interface to establish a connection with the real-time i f ( sched setscheduler (0 , SCHED_FIFO , & param )== -1)
application to scope signals and to modify parame- {
exit ( -1);
ters. For the client side communication and a basic }
web-based monitoring system see section 3.2. // ... process the server
}

3.1 Server Side Component void r t x m l s e r v e r _ f n c( s c i c o s _ b l o c k * block , int flag )


{
i f ( flag ==1) { /* set output */ }
The Rtxmlserver illustrated in figure 7 was re- i f ( flag ==2) { /* get input */ }
i f ( flag ==5) { /* t e r m i n a t i o n */ }
alized within one Scicos block. During the initial- i f ( flag ==4)
ization of this Scicos block a new POSIX thread is { /* i n i t i a l i s a t i o n */
// ... block o p e r a t i o n s
created (see listing 1). pthread create (& thrd , NULL , server_task , NULL );
}
}

Control-Thread
BIOE bus
Real-Time
LISTING 1: Schematic structure of the
Rtxmlserver Scicos block
Rtxmlserver Create Thread
The Http server supports GET requests to ac-
FIFO based cess the file system and to load a web-page. In ad-
Server-Thread
Signal dition, a PUT-Request with the URI /rtxml.pipe
Manager Non-Real-Time
passes the RTXML-Request within the Http
request message-body to the RTXML-Request-
Data Process Server Handler. The handler parses the XML byte stream
and does the corresponding actions similar to SAX
RTXML- [16]. The XML formated RTXML-Response is
XML HTTP
Request sent back to the client within the message-body of
Stream Server
Handler the PUT-Response. The whole description of the
RTXML-Requests and RTXML-Responses is not in-
Record tended within this contribution but sub-section 3.3
should give an idea about the structure of the XML
formated messages.
File-System
The concept to start a non-real-time thread
within a Scicos block (see listing 1) can be applied
HTTP-Request for other purposes like image processing. In this
case, the function to get the image from a camera
FIGURE 7: Simplified Structure Diagram and to do the image processing can be done within
of the Rtxmlserver a non-real-time task. The results are passed to the
real-time thread. This approach reduces the effort
This thread processes the data-exchange between to start the application and to share the data with
a remote client monitoring system and the real-time inter-process communication.
task. For the communication between the real-time
thread and the server thread a object called RTSig- 3.2 Client Side Application
nalManager was implemented. The Scicos signal
blocks for the Rtxmlserver use this manager to The prototype of a client side monitoring system
register their signals and parameters and during the shown in figure 8 is an Ajax based web-application
operation the signal value with time-stamp is passed using JavaScipt and jQuery (see [17]). Ajax (with
via thread-safe FIFOs to the Rtxmlserver. At jQuery) and the Http request method PUT are


used to establish and configure a connection with the with all sampled informations for offline analysis the
Rtxmlserver and to exchange the signals and pa- Rtxmlserver can record all data-points into a local
rameters. This functionality is encapsulated within *.csv file and this file can be downloaded with the
the JavaScript Rtxmlclient class that prepares web browser.
and stores the data in arrays. These arrays can be
used for a visualization purposes, e.g., realized with
a JavaScript plotting library to illustrate the signals
continuously without refreshing the whole web-page.
The library jQuery UI [17] was used to improve the
look and feel of the visualization and to get a behav-
ior comparable with a native application.

HTTP-Request
RT-Task
Data
Container

Rtxmlclient
JavaScript-Class Web-Browser
using Ajax
FIGURE 9: Screenshot of a web-based
monitoring example

3.3 About the RTXML-Request and


jQuery Web- RTXML-Response
HTML-Web Page
Application
The RTXML-Request in listing 2 contains most
of the available functions. With the first XML tag
FIGURE 8: Simplified Structure Diagram <C>...</C> the client provides the connection num-
of the web-based application using Rtxml- ber so that the Rtxmlserver knows that it is the
client same client. Then some action commands are sent
to the Rtxmlserver and at the end the parameter
Figure 9 shows the screenshot of the standard with the id 6 is set with the value 2.44.
interface. It allows the configuration of the connec- <R >
tion. If the connection to the real-time application < ! -- Connection - Number -- >
< C > 69 </ C >
is established, the three signal scopes and the pa- < ! -- Actions: -- >
rameter list are automatically configured. This stan- < A > <N > G E T P A R A M E T E R S</ N > </ A >
< A > <N > G E T S I G N A L S</ N > </ A >
dard interface consists of a HTML file, some CSS and < A > <N > S T A R T R E C O R D</ N > < V > data . csv </ V > </ A >
JavaScript files. A custom interface can be created < ! -- set p a r a m e t e r 6 to 2.44 -- >
< P > <I >6 </ I > <V > 2.44 </ V >< / P >
with a text editor and without the need of further </ R >
compilers or a complex toolchain. This approach
allows to prepare a demonstration system mock-up
with a specialized interface for presentation purposes LISTING 2: An example of a RTXML-
or to create laboratory setups where the students Request
can configure the control-law parameters and test the
performance of the control-law. The RTXML-Responses have a similar format
(see listing 3). The Rtxmlserver collects all sig-
The first prototype works very well and can be
nals of one sample time, creates a list that contains
used for applications, but if a high sample-rate and
the time (double precision), signal id and signal value
more signals are transmitted it is necessary to re-
(double precision), converts this list to a HEX-ASCII
duce the number of transmitted data-points. Oth-
string and places it with XML tags into the RTXML-
erwise, there is not enough CPU power to parse the
Response.
XML formated context with JavaScript. The reduc-
tion of data-points has no significant effect on the The main idea was to keep all information in dou-
quality of the visualization. To provide data records ble precision so that there is no information lost due


to truncation. The HEX-ASCII string yields to a L is a geometrical parameter of the system and L0 is
100% increase of the bytes to transmit. In the future the unstressed length of the spring. Here and in the
the use of Base64 (RFC4648, [18]) coding is planed following let us assume for simplicity L = L0 .
because this increases the bytes to sent only about
33%. Furthermore, it should be checked if the XML hydraulic L
parsing can be improved. actuator x
<R > m
<! -- Connection - Number -- > c, L0
<C > 69 </ C >
<! -- Provide Signals and P a r a m e t e r s -- >
<P > <I >5 < / I > <N > Param1 </ N > </ P >
<P > <I >6 < / I > <N > Param2 </ N > </ P > viscous friction
<S > <I >1 < / I > <N > Input </ N > </ S >
<S > <I >2 < / I > <N > Output </ N > </ S > QA QB
<! -- Signal Samples Hex - Formated -- >
<! -- time , id1 , signal1 , id2 , signal2 ,.. -- >
<S > <V > AF43 .....4 B2C </ V > </ S > FIGURE 10: Schematic of the hydraulic
<S > <V > AF43 .....4 B3A </ V > </ S > system
</ R >

4.1 Control Law Design


LISTING 3: An example of a RTXML- The design goal of this application is to keep the
Response DAP position x and the sum-pressure pA + α pB as
close as possible to the desired values xd and pd . The
4 Application Example control deviation can be formulated with
e1 = x − xd e 2 = pA + α pB − pd .
To demonstrate the proposed open RCP frame-
work this chapter is devoted to the position and sum- With the concept of exact input-output linearization
pressure control of a non-linear hydraulic system by it can be shown that the inputs of the system appear
...
the use of exact input-output linearization [11, 12]. at e 1 and ė2 and in consequence the system is input-
The hydraulic system, realized with a PC and Bioe state linearizable. To keep these outputs as close as
for HIL simulations, is controlled by an other PC. possible to zero the dynamics
Both PCs use Linux with a rt-preempt patched ker- ...
nel to provide real-time capability. The signals like 0 = e 1 + α12 ë1 + α11 ė1 + α10 e1 (3)
position, pressure and volume flow are exchanged be- 0 = ė2 + α20 e2 (4)
tween the PCs via an analog interface and Bioe (see
are chosen with a set of parameters α10 , α11 , α12 and
figure 1). According to [19, 20, 21], the mathematical
α20 so that the dynamics (3) and (4) are asymptot-
model of the system illustrated in figure 10 can be
ically stable and have the required behavior. In the
described with the set of ordinary differential equa-
next step the control law
tions
...
QA = QA (x, v, pA , pB , xd , ẋd , ẍd , x d , pd , ṗd )
ẋ = v ... (5)
QB = QB (x, v, pA , pB , xd , ẋd , ẍd , x d , pd , ṗd )
1
v̇ = [A pA − α A pB − d v + c (L − L0 − x))]
m can be evaluated by solving the equations (3) and
Ef (4) for the volume flows QA and QB . The time
ṗA = (QA − A v) ...
VA0 + A x derivatives ẋd , ẍd , x d and ṗd can be provided by
Ef a trajectory planning system and this allows a feed-
ṗB = (QB + α v A) , forward tracking control of the system (2). Let us
VB0 − α A x
(2) assume that the desired trajectories xd (t) and pd (t)
are smooth and changes very slow and so the choice
...
where ˙ denotes the total time derivative, A and α A ẋd = ẍd = x d = ṗd = 0 is justifiable. Consider that
the piston areas, m the mass, Ef the isothermal bulk the control law (5) depends on the piston velocity v
modulus of the fluid (see, e.g. [21]), VA0 and VB0 which can not be measured. Hence, a velocity and
are the initial volumes, c the spring coefficient and d disturbance observer is designed in the next step.
the viscous friction coefficient. The piston position
x and the pressures pA , pB are measurable states of 4.2 Observer Design
the system and it is assumed that the velocity v of
the piston can not be measured. The volume flows To implement the control law (5) a velocity ob-
QA and QB are the inputs of the system. The length server can be used. The realized HIL mock-up (see


figure 1) was realized with a minimum of effort and with e = [ev eFd ]T . Nevertheless, the proof of the
the signals have small offsets. The pressures pA stability of the observer does not allow for conclu-
and pB are very sensitive for signal-offsets because a sions concerning the stability of the overall system
wrong measured pressure deals like an external dis- because of the nonlinear system (2).
turbance force and this irritates a reduced observer
for the velocity v. To fix this problem the system (2) 4.3 Implementation
is extended by an unknown but constant disturbance
force Fd with the dynamic Ḟd = 0. To realize the HIL simulation Ubuntu 10.04 LTS
The following observer design (see, e.g., [12]) [22] with the rt-preempt patched 2.6.31-11-rt kernel
treats the general system from the Lucid repository was installed. ScicosLab
4.4.1 [4] with the rt-preempt code generation pack-
ẋ = v age [9] is used for the simulation and real-time code
1 generation. The Bioe and Rtxmlserver blocks
v̇ = (−c x − d v + F + Fd ) were installed into the ScicosLab directory struc-
m
Ḟd = 0 ture. A performance measurement of a dummy sim-
ulation with a cycle time of Ts = 2 ms and a Bioe
with the position x, the velocity v and the distur- with DTN250 (see table 1) was performed on an In-
bance force Fd . To get the equations for the DAP tel(R) Pentium(R) M 1.4 GHz notebook. During the
system F = A pA − α A pB has to be used. The test duration of two hours the maximum Ts,max =
position x and the force F can be determined by 2.08 ms and the minimum Ts,min = 1.928 ms of the
measurements and calculations and so a reduced ob- cycle time were detected. It should be noted that
server is designed to estimate the velocity v and the the measured time variations cover the rt-preempt
disturbance force Fd . In the first step the following patched Linux notebook, the Bioe bus and the Bioe
state transformation element software.

w1 = v + kv x w2 = Fd + kFd x In addition, the rt-preempt code generation


package [9] was modified so that a real-time scal-
is introduced and results in the linear and time in- ing parameter can be used to slow-down or speed-up
variant system the HIL simulation. This feature can be very use-
ful in the case of complex systems (e.g. distributed
ẇ = Aobs w + u (6) parameter systems) were the calculation of the dy-
with the new state vector w = [w1 w2 ]T , the matrix namics can not be performed in real-time. In the
case of the DAP system control a real-time scaling
 
k − d 1 factor of three is used to keep the time of the Bioe
Aobs = v m m transactions and the settling time of the analog in-
kFd 0
terface circuit small compared to the cycle time. In
and the input addition, RC low-pass filters with R = 1 kΩ and
  x F
 C = 1 µF were used to convert the digital PWM
d kv − kFd − c − kv2 m m + m
u= . signals (fP W M = 15.6 kHz) into analog signals.
−kv kFd x
For the numerical simulation and HIL simula-
With a specific choice of parameters kv and kFd the tion Scicos blocks of the systems (2) and (5) are
matrix Aobs becomes a Hurwitz matrix and then the required. For this job a Maxima toolbox was devel-
standard approach of a trivial observer for the sys- oped that allows to export a system
tem (6) can be applied to get an estimation ŵ of w,
where 
ˆ indicates the estimated value of .
ẋ = f (x, u, p) (7)
ŵ˙ = Aobs ŵ + u y = g(x, u, p) (8)
Now it is easy to prove that the estimation error for
the velocity v and the disturbance force Fd into a ready to use Scicos block. A Scicos
block normally consists of an interface-function and
ev = v − v̂ = w1 − wˆ1 a computational-function (see, e.g. [23]). The
eF = Fd − Fˆd = w2 − wˆ2 interface-function is written in the Scilab language
d
and provides an interface to edit the system param-
has the asymptotically stable dynamic eters p and the initial values of the state x. The
computational-function can be written in Scilab
ė = Aobs e or C and contains the set of ordinary equations

14
Real-Time Linux in Education

xref
["xref"] psumref QA
"QA"
["pref"] x QB
["x"] pA "QB"
Sample−Time Plot−Output File−Output ["pA"] v_obs
pB "vobs"
From ["pB"]
fr..
Tp [..
Goto EALin−Control+Observer
Ts xref calc_psum "ps..
Goto psumref"xref"
"pref"
Reference Position/Pressure
x x−>dac S. N. adc−>x
QA "x"
["QA"] q−>dac N. S. adc−>q v v−>dac S. N. adc−>v "v"
QB pA p−>dac S. N. adc−>p
["QB"] q−>dac N. S. adc−>q "pA"
pB p−>dac S. N. adc−>p "pB"
Analog Interface (S/H, Noise, ADC, DAC)
Analog Interface (S/H, Noise, ADC, DAC)
DAP−System

FIGURE 11: Numerical simulation of the DAP control realized with Scicos (Tp = 1 ms, Ts = 5 ms)

f (x, u, p) with the system input u and the equations changed dramatically in order to obtain the step re-
g(x, u, p) for the system output y. sponse of the system. In the third plot the volume-
flow QA into the DAP are compared and the fourth
Such a code-generation package is an important
plot compares the real piston velocity (with noise)
component of a complete RCP framework because
and the estimated velocity from the numerical simu-
it avoids the repeated implementation of equations,
lation.
the search for typing errors and it is easy to keep the
analytic calculations and the simulation code syn-
chronized. The created blocks for the DAP system 0.015
and the control law can be used for numerical simula- Reference
Piston Position

0.01
tions and the real-time code generation with Scicos. 0.005 Simulation
x/m

0 HIL
To obtain realistic results the numerical simula- −0.005
tion in figure 11 takes effects like quantization, signal −0.01
ranges and noise into account. The numerical results −0.015
0 1 2 3 4 5
fit very well with the HIL measurements and they are 210000
discussed in the following section. 200000
Sum−Pressure
psum / Pa

190000
180000
4.4 Results 170000 Reference
160000 Simulation
150000 HIL
Following system parameters were used for the 140000
numerical and HIL simulation: m = 1 kg, d = 1 Nms , 0 1 2 3 4 5
N
c=1m , A = 1 · 10−4 m2 , α = 0.7, VA0 = 5 · 10−6 m3 , 5e−005
Simulation
4e−005
Volume−Flow

VB0 = 3.5 · 10−6 m3 , Ef = 1.6 · 105 mN2 . Some of HIL


QA / m3/s

3e−005
the parameter values are not realistic because they 2e−005
1e−005
were chosen so that bad influences caused by noise, 0
quantization, offsets and signal limitations due to −1e−005
the low-cost analog interface will be reduced. The −2e−005
0 1 2 3 4 5
eigenvalues of the observer dynamic matrix Aobs are 0.1
placed with the coefficients kv = −3 and kFd = −4 0.08 Simulation v
0.06 Simulation vobs
to −2. Finally, the poles of the error dynamic (3) are
Velocity
v / m/s

0.04
defined with α12 = 90, α11 = 2700, α10 = 27000 at 0.02
−30 and the pole of the error dynamic (4) is aligned 0
with α20 = 10 to −10 . −0.02
−0.04
The results are summarized in figure 12. The 0 1 2 3 4 5

first two plots illustrates the tracking behavior of Time t / s


the piston position x and the sum-pressure psum con-
FIGURE 12: Comparison of the numerical
trol. The desired trajectories xd (t) and pd (t) are also
simulation results and the HIL measurements

15
A Nonlinear Model-Based Control realized with an Open Framework for Educational Purposes

5 Conclusions and Perspectives [10] Basic Input Output Elements Project Website:
http://bioe.sourceforge.net. Web. 20 Aug. 2011.
The HIL simulation of an industrial motivated
control demonstrated that rt-preempt patched Linux [11] Isidori, A.: Nonlinear Control Systems, 3rd Edi-
kernels, ScicosLab, Maxima, the presented open tion. Springer, Londen, UK, 1995.
hardware Bioe and the web-based monitoring sys-
tem prototype build a complete open rapid control [12] Schlacher, K.; Zehetleitner, K.: Control of Hy-
prototyping framework which can be used for edu- draulic Devices, an Internal Model Approach In:
cational purposes. Due to the web-based approach Analysis and Design of Nonlinear Control Sys-
there is no additional software except a modern web- tems. Springer-Verlag Berlin Heidelbarg, 2008.
browser required to interact with the real-time sys-
tem. Furthermore, the web interface can be modified [13] Weichinger, K.: Tonregelung einer Trompete
to the needs of the application by editing the web mit einem Low-Cost Automatisierungssystem
pages with a single text editor. und RTAI Linux, JK University Linz, Austria,
2008.
Improvements of the web-based monitoring sys-
tem and the release of a ready to use version for [14] Fielding, R. T.: Architectural Styles and the
the community with more human interface demos Design of Network-based Software Architectures.
are planed. An other point of interest is to use this Dissertation, University of California, Irvine,
framework to realize a distributed parameter system 2000.
benchmark example and to compare the results with
measurements. Therefore, more complex Bioe in- [15] Fiedling, R. T. et al.: RFC2616: Hyper-
terfaces will be used and the realization of 2D/3D text Transfer Protocol - HTTP/1.1, June 1999,
visualizations are planed. http://www.ietf.org/rfc/rfc2616.txt, Web. 20
Aug. 2011.

References [16] SAX-Project Website:


http://www.saxproject.org. Web. 20 Aug. 2011.
[1] RTAI - Real Time Application Interface Official
Website: https://www.rtai.org. Web. 20 Aug. [17] jQuery Project Website: http://jquery.org.
2011. Web. 20 Aug. 2011.
[2] Real-Time Linux Wiki:
[18] Josefsson, S.: RFC4648: The Base16, Base32,
http://rt.wiki.kernel.org. Web. 20 Aug. 2011.
and Base64 Data Encodings, October 2006,
[3] Scilab / Scicos Website: http://www.ietf.org/rfc/rfc4648.txt, Web. 20
http://www.scilab.org. Web. 20 Aug. 2011. Aug. 2011.
[4] ScicosLab 4.4.1 - Project Website: [19] Grabmair, G.; Schlacher, K.; Kugi, A.: Geomet-
http://www.scicoslab.org. Web. 20 Aug. 2011. ric energy based analysis and controller design
[5] COMEDI - Linux Control and Measurement De- of hydraulic acutators applied in rolling mills.
vice Interface - Website: European Control Conference (ECC), Cam-
http://www.comedi.org. Web. 20 Aug. 2011. bridge, 2003.

[6] FSM Labs, Inc.: RTLinux3.1 Getting Started [20] Kugi, A.: Non-linear Control Based on Physical
with RTLinux, 2001. Models, volume 260 of Lecture Notes in Con-
trol and Information Sciences. Springer-Verlag,
[7] Xenomai: Real-Time Framework for Linux
London, 2001.
Website:
http://www.xenomai.org. Web. 20 Aug. 2011. [21] Murrenhoff, H.: Grundlagen der Fluidtechnik,
[8] Maxima, a Computer Algebra System; Web- Teil 1: Hydraulik. Shaker Verlag, Aachen, 2007.
site: http://maxima.sourceforge.net. Web. 20
Aug. 2011. [22] Ubuntu Linux Distribution Website:
http://www.ubuntu.com. Web. 20 Aug. 2011.
[9] Bucher, R.: rt-preempt Code Generation Pack-
age for ScicosLab [23] Campbell, S.; Chancelier J.; Nikoukhah,
http://www.dti.supsi.ch/b̃ucher. Web. 20 Aug. R.: Modeling and Simulation in Scilab/Scicos.
2011. Springer. 2006.

16
Real-Time Linux Applications

The Witch Navigator – A Software GNSS Receiver Built on


Real-Time Linux

Petr Kacmarik
Department of Radio Engineering K13137, CTU FEE Prague
Technicka 2, 166 27 Praha 6, Czech Republic
[email protected]

Pavel Kovar
Department of Radio Engineering K13137, CTU FEE Prague
Technicka 2, 166 27 Praha 6, Czech Republic
[email protected]

Ondrej Jakubov
Department of Radio Engineering K13137, CTU FEE Prague
Technicka 2, 166 27 Praha 6, Czech Republic
[email protected]

Frantisek Vejrazka
Department of Radio Engineering K13137, CTU FEE Prague
Technicka 2, 166 27 Praha 6, Czech Republic
[email protected]

Abstract
The Witch Navigator (WNav) is an open source project of GNSS (Global Navigation Satellite System)
receiver whose hardware is implemented as an ExpressCard hosted in PC with Linux OS.
The employment of PC offers a possibility of an easy implementation of signal processing algorithms
since almost no restrictions are introduced by a specific embedded platform (concerning memory re-
quirements or real data type and its arithmetic). As a consequence, the WNav is especially suitable for
researchers or students because the signal processing algorithms can there be implemented in asimilar
manner as in high-level simulations. Furthermore, developers can rely on the wide and well known col-
lection of development tools for Linux on x86 architecture. Unlike similar projects, WNav has capability
to achieve performance comparable to professional GNSS receivers.
The WNav receiver is equipped with two front ends which can process any civil GNSS signals on two
frequencies simultaneously. The whole receiver task is distributed between the device driver, user space
real-time process and other axillary processes. The real-time needs are satisfied with RT PREEMPT
kernel patch.
The paper describes the whole conception of WNav with focus on the kernel part (device driver) and
the real-time user space process, provides information about the processes synchronization and presents
the achieved performance.
The first obvious milestone is to develop the fully functional GPS L1 C/A receiver which justifies
the selected conception. The achieved results and experience with this legacy signal are presented in the
paper, as well.

17
The Witch Navigator A Software GNSS Receiver Built on Real-Time Linux

1 Introduction searchers. Since WNav is a software receiver, the


researchers can implement and test almost an arbi-
trary GNSS receiver related algorithm. The utiliza-
It makes no sense to emphasize the importance of
tion of PC workstation (x86) is also a big advantage.
satellite navigation for everyday life. A satellite nav-
The PC is a well know and quite common architec-
igation system with worldwide coverage is termed as
ture with a wide range of development tools. Next,
GNSS (Global Navigation Satellite System). Nowa-
there is a sufficient computational power for algo-
days the term GNSS can represent several partic-
rithms, and PC architecture does not impose restric-
ular navigation systems. The most important one
tions as in embedded architecture (fix-point arith-
and well know is the United State GPS (Global Po-
metic, memory size). A proposed algorithm in some
sitioning System) since it can be considered as the
high level simulation tool can be directly applied in
only fully operable GNSS. For the WNav project,
WNav, i.e. without a huge work to adapt the algo-
the implementation of the legacy GPS L1 C/A signal
rithm to specific architecture. Thus, the WNav re-
processing into WNav is important milestone that
ceiver can be used for rapid verification of algorithms
verifies the entire WNav project conception. The
in a real environment.
next GNSSa are EU’s Galileo, Russian’s Glonass and
China’s Beidou. Technical university students and teachers fo-
cused on radio engineering or satellite navigation can
The project of our software GNSS receiver, which
be second group of interest. Since the PC worksta-
is carried out at the Department of Radio Engineer-
tion is an integral part of the WNav receiver, all sig-
ing FEE CTU, goes back to 2000. The WNav re-
nals, registers, status etc. can be logged, and then
ceiver is its latest contribution. Up to the WNav
viewed and visualized in a comfortable manner.
project beginning we tried several types of architec-
tures. All of those were based on FPGA (Field Pro- Obviously, there will still be a few applications
grammable Gate Array) which was supported with which can not be fully satisfied with COTS (Com-
some processor (x86, PowerPC, MicroBlaze). There mercially available Off-The-Shelf) products. In these
was established a simple rule how to divide the whole cases the WNav can be utilized as a cheep but capa-
GNSS signal processing task between the FPGA and ble solution.
the processor: the direct signal manipulation at high
sampling rate is performed in the FPGA while rela-
tively slow but logically complicated operations are 2 Introduction to GNSS Signal
done in the processor.
Processing
At that time, we faced several drawbacks. If
some FPGA development board together with PC
Before we move to the detailed description of the
were used, such solution showed low communication
WNav receiver and its particular components, it
capability between this board and the PC. If the ar-
will be useful to explain basic principles of satel-
chitecture was purely based on FPGA and processor
lite navigation. Our goal is to prepare background
core incorporated into the FPGA the main drawback
for the time requirement explanation and also com-
was low computational performance of such proces-
munication explanation between the receiver hard-
sor core.
ware (WNav device) and processor (PC worksta-
The WNav project tried to overcome most of the tion). More details concerning satellite navigation
drawbacks. The utilization of PC (with x86 proces- can be found in textbooks [1, 2].
sor) takes advantage of a widely available architec-
Consider an unknown user position (x, y, z) and
ture with sufficient computation power. The WNav
known positions of satellites (xi , yi , zi ), i ∈ 1, 2, . . .
receiver analog and FPGA part are implemented as
(the satellite positions are computed from parame-
an ExpressCard peripheral device. The ExpressCard
ters of satellite tracks). In principle, the receiver
internally relies on the PCI Express (PCIe) standard
obtains the distance di to the i-th satellite by mea-
which ensures sufficient throughput between FPGA
surement of the signal propagation time. If we mea-
and the processor on PC. The ExpressCard now is
sure distance di to 3 satellites with known positions
relatively modern interface and we can expect its
(xi , yi , zi ) we can obtain user position (x, y, z) by
support for a long period.
solving the following system of nonlinear equations
What is the purpose of the WNav project? p
Someone can argue that the GNSS market is satu- di = (x − xi )2 + (y − yi )2 + (z − zi )2 (1)
rated and there is no space for a new GNSS receiver.
The real situation is complicated since the receiver
At first, we expect an interest of GNSS re- clock and satellite clocks are not synchronized. As a

18
Real-Time Linux Applications

consequence, the receiver is not capable of direct dis- and PLL are their detectors. The detector is a block
tance measurement but rather so called pseudorange which output is proportional to an error of tracking
ρi which differs form true distance di by an unknown parameter, i.e. to τ − τ̂ for DLL or to ϕ − ϕ̂ for
bias b. The system of equations has then following PLL. The detector output drives (through the loop
form filter) particular NCO in the signal replica generator
p in other to minimize the error. The contemporary
ρi − b = (x − xi )2 + (y − yi )2 + (z − zi )2 (2) GNSS receiver has several such DLL/PLL blocks,
Since there are 4 unknown parameters (3 position co- each of them tracks one satellite signal.
ordinates and bias b) the receiver needs to perform
minimally 4 measurements to find the solution.
How is the time measurement accomplished from
the signal? The receiver generates identical signals as DLL
satellites. These locally generated signals are called detector
si(t) τ error DLL loop
signal replicas. The signal replicas are kept synchro-
filter
nized with the received signals. The time (delay)
information is then carried in signal replica parame- φ error PLL loop
ters needed for their generation so it is available in filter
PLL
the receiver. detector
ri(t)
Let’s consider the following simplified model of
replica
the received signal generator

si (t) = A d(t − τi ) ci (t − τi ) exp j(ωo,i t + ϕ0 ) + n(t), code NCO carr. NCO
(3)
where A is a signal amplitude, d(t) represents a nav-
igation message, ci (t) is a pseudorandom code (PRN
code) and n(t) is an additive noise. The parameter τi
represents the signal delay, ωo is a frequency offset,
ϕ0 arbitrary carrier phase in t = 0. The correspond-
ing signal replica has the form FIGURE 1: Simplified block diagram of
GNSS signal tracking – DLL and PLL struc-
ri (t) = ci (t − τ̂i ) exp(−jω̂o,i t), (4) ture
where τ̂i is an estimation of the received signal delay.
The τ̂i is the key parameter needed for pseudorange
formation ρi .
The specific correlation property of ci (t) enables Now we move towards the real implementation
to keep si (t) and ri (t) synchronized, i.e. the estima- of the signal tracking as utilized in the WNav re-
tion error τi − τ̂i is kept small. ceiver. The modified block diagram of the algorithm
is shown in Fig. 2.
In fact, the parameter τi can carry only a frac-
tional part of the pseudorange ρi due to ci (t) peri- The detector in GNSS receiver consists of sev-
odicity. Since ci (t) has period of 1ms the τi can only eral correlators (WNav employs just Early and Late
be measured in the range of 0 to 1 ms which corre- correlators in one DLL/PLL block) and discrimina-
sponds to 0 to 300 m in distance. The pseudorange tor block. The correlator is a block which computes
ρi is extended using the bits of navigation message the mutual energy of the si (t) and ri (t) over spec-
d(t) and so called Z-count (time mark imprinted in ified interval (given by period of ci (t), i.e. 1 ms in
d(t)). GPS L1 case). This interval is denoted as the inte-
gration time and, in WNav project, the instants of
The signal tracking is a stage of signal processing
the Early and Late integration ends are denoted as
where a locally generated replica is kept synchronized
Early and Late PRN TIC (E PRN TIC and L PRN
with a received signal. The receiver usually utilizes
TIC), respectively. The next block, discriminator,
cooperation of two feed-back systems for this task.
is usually non-linear memory-less block. Note that
For code and carrier synchronization the DLL and
the DLL and PLL detectors have common correla-
PLL feed-back systems are employed, respectively.
tors while discriminator blocks differ. The DLL and
The simplified block diagram of the signal track- PLL detectors are marked as the blue and red areas,
ing is shown in Fig. 1. The key parts of both DLL respectively.

19
The Witch Navigator A Software GNSS Receiver Built on Real-Time Linux

each DLL/PLL block tracks different satellite signal,


the E/L PRN TIC events of particular blocks are
PLL detector DLL detecor not synchronized. To make the transfer feasible, the
WNav utilize a resampling conception: registers of

discriminator
correlator

DLL loop
si(t) all DLL/PLL blocks are sampled at a slightly higher
Early

filter
DLL
rate than E/L PRN TIC and at this time all register
values are interchanged between the FPGA and the
PC. The software on the PC side then does recogni-

discriminator
tion whether new correlator output values were re-
correlator

PLL loop
filter
ceived as a consequence of E/L PRN TIC of a partic-
Late

PLL
ular DLL/PLL block. The resampling runs at 800 µs
rate and in the WNav project is denoted as TIC
rE,i(t) rL,i(t) event. We will also discuss the TIC event later since
it is an important event which drives the FPGA-PC
carr. NCO
meas. carr.
generator
carrier

communication.
Signal tracking can not be an initial stage of sig-
nal processing. The DLL/PLL supposes that the
code NCO
meas. code

parameter error is small enough and then the detec-


generator

cL,i(t)
code

tor can produce a meaningful output which drives


cE,i(t)
the NCO in the right way. The stage which has to
precede the signal tracking is denoted as signal ac-
quisition. The task of the acquisition is to provide
a coarse estimate of a signal parameter to seamless
FIGURE 2: Signal tracking algorithm in transition to the tracking stage.
the WNav receiver – DLL and PLL structure
The WNav receiver relies on parallel acquisition
The interface between the FPGA and the PC algorithm. The FPGA part therefore contains snap-
workstation divides the entire tracking algorithm shot memory, where the sampled complex envelope
into a high frequency part processed in the FPGA of the received signal is stored. Since the acquisi-
and a slow part processed in the PC. This interface tion algorithm is completely implemented in the PC,
is outlined with gray line in the figure. The interface the contents of the snapshot memory have to be also
output/input registers are marked as gray boxes in transferred to the PC. If we consider In-phase (I) and
the figure. Quadrature (Q) signal components, eight bit samples
at 20 MHz each, we obtain 2 × 20 106 × 800 10−6 =
The correlators perform signal down-sampling.
32000 = 0x7d00 bytes as a size of the snapshot mem-
While the correlators treat the sampled signals at
ory which has to be entirely transferred to PC at
their inputs (at 20 MHz rate in WNav case), their
every TIC.
outputs are issued at integration time rate, i.e. at
1 kHz. The relatively slow correlator outputs are
read from appropriate registers (Early/Late corre-
lator output registers) and the rest of the loops are
implemented as software in the PC. The local replica
generator, since treat sampled signal, has to be im-
plemented in the FPGA. The replica generator is
driven through the NCO code/carrier control regis- 3 Description of the WNav Re-
ters. The signal delay can be read from the code mea- ceiver
surement register (and also from the carrier measure-
ment register in case of precise carrier phase mea-
surement).
In this section, we will describe particular parts of
The Early/Late correlator output registers issue the WNav receiver. We will start with the receiver
new values at every E/L PRN TIC event (approxi- hardware. Next, the device driver will be described
mately every 1ms). These values have to be sent to and finally we will turn our attention to the software
PC workstation and new NCO code/carrier registers (processes) in user space. Here, the provided infor-
have to be received. The receiver has several such mation introduces WNav as a Linux project. The
DLL/PLL blocks, and this type of register trans- signal processing and GNSS perspective of the WNav
fer has to be accomplished for all of them. Since receiver can be found in [3, 4].

20
Real-Time Linux Applications

3.1 Receiver Hardware 1192). The 8-bit samples at sampling rate of 20 MHz
are used for each component. The next signal pro-
The WNav hardware consists of a peripheral device cessing is accomplished in the FPGA. The WNav is
plugged into the PC workstation. The device is im- now built on Xilinx Spartan 6 FPGA (XC6SLX45T).
plemented as an ExperessCard/54 (L-shape). Two
The key elements of the FPGA part are
different WNav device prototypes can be seen in
DLL/PLL correlator blocks as described above (see
Fig. 3.
Fig. 2). These blocks are organized into groups of six
and, on the higher level, there are four such groups
in WNav. Thus the WNav receiver has capability to
track 6 × 4 = 24 satellite signals simultaneously.
Except the correlator blocks, the FPGA part
contains the other blocks as snapshot memory for
signal acquisition purpose, and an I2C block which
can control the direct-conversion tuners.
All input and output registers and snapshot
memory are arranged in such a way that they can be
accessible through I/O memory mapped mechanism
from the PC side. The device offers one I/O mem-
ory region which is common for reading and writing
operations. But, the read and write operations with
identical address access different memory cells in the
device (generally, a value which was written to the
FIGURE 3: Two different versions of the device can not be read back from the device at the
WNav ExperessCard device prototypes same address). The arrangement of the input and
output registers into I/O memory space as viewed
The WNav device consists of an analog part and from the PC side can be seen in Fig. 6 and Fig. 7.
FPGA part. The FPGA part is responsible for digi- To ensure register visibility from the PC side, the
tal signal processing and communication via PCI Ex- FPGA part contains the communication block which
press (PCIe). forms and processes TLP (Transaction Layer Pack-
ets) packets and transforms them to the I/O memory
I operations.
CH1 MAX 2120 MAX 1192 I,Q
direct conv. 2x8 bit
RX ADC adc_clk The memory region for the read operation is
Q
AGC PCIe
large due to the snapshot memory size. To meet
lanes strict time requirement the transfer from the device
TCXO Spartan 6
I2C to the PC is accomplished with DMA and the com-
20 MHz xc6slx45t
AGC munication block is equipped with a simple DMA
I controller.
MAX 2120 MAX 1192 I,Q
direct conv. 2x8 bit The communication with the device is synchro-
CH2 RX ADC adc_clk
Q nized with the TIC event. The TIC event occurs
every 800 µs and at this instant, the new values can
be read and written through I/O memory. The TIC
Lin. Reg. Lin. Reg. Lin. Reg. Config. event is propagated to the PC side by MSI interrupt
1.2V (core) 1.2V (PCIe) 2.5V (AUX) flash
Prog. (Message Signaled Interrupt) which is generated at
connect. the end of DMA transfer. The time relation of the
TIC event and the interrupt is depicted in Fig. 5.
FIGURE 4: Functional block diagram of
the WNav ExperessCard peripheral device. The correctly plugged WNav device can be seen
in a list of PCI devices:
The device is equipped with two RF (Radio Fre-
quency) inputs with MMCX connector, thus, sig- $ lspci -v
nals from two antennas can be processed simultane- ...
18:00.0
ously. The analog receiver part utilizes direct conver- RAM memory: Xilinx Corporation Zomojo Z1
sion concept (MAX 2120), so the complex envelope Subsystem: Xilinx Corporation Zomojo Z1
(I&Q components) are fed into A/D converter (MAX Physical Slot: 1

21
The Witch Navigator A Software GNSS Receiver Built on Real-Time Linux

Flags: bus master, fast devsel, latency 0, IRQ 44 driver. The second structure, struct wnav dev,
Memory at e4000000 (32-bit, non-prefetchable) gathers data for each particular device (for one
[size=1M]
Capabilities: [40] Power Management version 3
plugged WNav card) into a system; it is supposed,
Capabilities: [48] MSI: Enable+ Count=1/1 that there can be more WNav cards plugged in
Maskable- 64bit+ one PC. The struct wnav dri contains an array of
Capabilities: [58] Express Endpoint, MSI 00 pointers to struct wnav dev as a one of its item.
Capabilities: [100] Device Serial Number
00-00-00-00-00-00-00-00 Most of struct wnav dev items are filled in
Kernel driver in use: wnav
wnav pci probe() function call invoked after the
device plugging. The important items are ad-
dresses for accessing I/O memory of the WNav de-
3.2 Device Driver – Kernel Module
vice. The hardware address baddr hw is obtained
from pci resource start(), and is mapped using
The device driver was written based on information
pcim iomap() to obtain virtual address baddr vir,
in [5, 6]. Other up to date information was obtained
which is used for access from the driver side. The
using a Linux identifier search server [7].
next two addresses are related to the DMA transfer.
The WNav device driver is implemented as a There is dma hw address which has to be sent into the
character device driver. When module is loaded, the device (the DMA controller in the FPGA needs this)
plugged WNav card is accessible through the device and dma vir address, which is used for access the
file /dev/wnav0. The kernel message, when the de- DMA region from driver side. Both of them are ob-
vice was plugged, is shown here (there was one WNav tained as a result of pci alloc consistent() call.
device detected):
The device driver counterpart in the user space is
wnav: wnav_module_init() BEGIN
a RT process wnav core, which is mainly responsible
wnav: wnav_pci_probe() BEGIN for channel services, i.e. the closing feedback of the
wnav: /dev/wnav0 created for device DLLs/PLLs. The driver implements following sys-
wnav 0000:18:00.0: enabling device (0000 -> 0002) tem calls: open(), close(), read() and write().
wnav 0000:18:00.0: PCI INT A -> GSI 18 (level, low)
-> IRQ 18 The RT process calls read() and write() periodi-
wnav: resource start: 0x000000e4000000 cally and between these two calls the channel services
wnav: resource end: 0x000000e40fffff are accomplished (more wnav core details will be
wnav: resource length:0x00000000100000 provided in 3.3.1). The time relation of the FPGA,
wnav: resource flags: 0x00000000040200
wnav: +--> IORESOURCE_IO 0 driver and RT process is depicted in Fig. 5. The
wnav: +--> IORESOURCE_MEM 1 behavior of read() and write() system calls is de-
wnav: +--> IORESOURCE_IRQ 0 pended on the device status stored in variable stat
wnav: +--> IORESOURCE_DMA 0
(item in struct wnav dev). The stat can be one of
wnav: +--> IORESOURCE_PREFETCH 0
wnav: +--> IORESOURCE_READONLY 0 the following: WN READ, WN WRITE and WN DONE.
wnav: +--> IORESOURCE_CACHEABLE 0
wnav: +--> IORESOURCE_RANGELENGTH 0 We describe driver function according Fig. 5.
wnav: +--> IORESOURCE_SHADOWABLE 0 Consider that we are in instant of TIC. The RT pro-
wnav 0000:18:00.0: setting latency timer to 64 cess is sleeping now (is in waiting queue), since it
wnav 0000:18:00.0: irq 44 for MSI/MSI-X called read() and status was WN DONE. However, af-
wnav: recognized MSI IRQ: 44
wnav: probe function ends with success ter the TIC new values are available through I/O
wnav: +--> device id (minor): 0 memory. But it is not supposed that the I/O
wnav: +--> strct wnav_dev ptr: 0xf44f4000 memory would be accessible now with functions as
wnav: wnav_pci_probe() END
ioread32() or iowrite32(). Instead, the DMA
wnav: module init ends with success
wnav: Number of recognized WNav devices: 1 transfer is initiated after the TIC event. Entire read
wnav: +--> &dev[0] 0xf44f4000 block, as shown in Fig. 6, is then available in the
wnav: +--> &dev[1] 0x (null) kernel space (driver). The transfer end is signalized
wnav: +--> &dev[2] 0x (null)
wnav: +--> &dev[3] 0x (null)
by the MSI interrupt. The interrupt handler changes
wnav: +--> &dev[4] 0x (null) the status from WN DONE to WN READ. Since the status
wnav: +--> &dev[5] 0x (null) in WN READ is the condition for RT process wake up,
wnav: +--> &dev[6] 0x (null) the RT process is removed from the waiting queue
wnav: +--> &dev[7] 0x (null)
wnav: wnav_module_init() END
and can now continue in reading. The read block is
then copied into the user space with copy to user()
function. Except the data from the FPGA (out-
Internally, the module data are stored into two
put registers, snapshot memory block), TMARK and
structures. On the top, there is a structure struct
FFLAG blocks are also added. The TMARK and
wnav dri, which gathers common data for entire

22
Real-Time Linux Applications

FFLAG contain timing information and fault flags read block, no DMA transfer mechanism is imple-
from previous read() & write() cycle. This is a mented due to simplicity. The write() system call is
way, how to make available these useful data for per- implemented into two stages. First, the write block is
formance debugging in user space (we will mention copied into the kernel space with copy from user().
both of them later). When the entire read block is Next, depending of write block contents, the data
transferred into the user space, the status is changed are copied from the kernel space to the device with
from WN READ to WN WRITE. Next possible attempt of iowrite32(). When all data are written, the status
read ends with an error. Then, the RT process does is changed from WN WRITE to WN DONE. Next possible
channel services and prepares data for writing. The attempt of write ends with an error. When the RT
arrangement of the write block can be seen in Fig. 7. process calls read() in this time, the RT process is
Since the write block is significantly smaller than the put into waiting queue due to the status in WN DONE.

read() write() read()


(RT process)
space
user

copy_from_user()
channel

copy_to_user()
sleeping services sleeping

WN_WRITE
WN_DONE

WN_DONE
WN_READ
(module)
kernel
space

transfer
DMA

iowrite32()
(FPGA)

wnav_irq_handler()
HW

TIC MSI TIC


interrupt 800 μs

FIGURE 5: Time relations during the data


transfer, cooperation of the WNav hardware,
device driver (kernel module) and user space
RT process
The previous paragraph describes the desired sit- handler calls the TIC counters differ exactly by one.
uation, when all the PC tasks after the MSI inter- Next, the TIC event can be detected in the driver
rupt are finished before the next TIC. Despite uti- by reading the TIC counter directly from I/O mem-
lization of Real Time Linux kernel (RT PREEMPT) ory (not by reading the DMA memory region). The
[8], such strict time requirement can not be ensured driver checks that in particular driver status (inter-
in all cases and conditions. The strategy of WNav rupt handler, reading and writing) the TIC counter
driver is to announce the situation when time con- read from I/O memory does not differ with the TIC
straints were broken. Further, the driver offers time counter obtained from the DMA memory region.
measurement of particular drive statuses. This infor-
To analyze time consumptions in particular
mation is available in user space due to mentioned
statuses the driver stores time markers (using
TMARK and FFLAG blocks.
get cycles() function). The markers form items
The breaking of time constraints is gathered in of struct time mark, its instance TMARK is then
struct faul flags, its instance FFLAG is an item a item in struct wnav dev. The driver stores mark-
in struct wanv dev. The FFLAG contains error ers at the beginning and end of the interrupt handler,
counters or cycle slip counters, i.e. if all items re- at the beginning and end of read() system call, at
main in zeros the time constraints are not broken. the time when the RT process wakes up and at the
The driver checks that in two successive interrupt beginning and end of write() system call.

23
The Witch Navigator A Software GNSS Receiver Built on Real-Time Linux

reading WN_SIZE_ACQMEM

0x7d00
0x0000 acq. mem.
(snap-shot)
0x7cfc
0x7d00

corr. meas.

2×0x80 0x100
Late code
Early carr.
and
0x7dfc
WN_SIZE_ALLDMA

WN_SIZE_RDBUFF
0x7e00
0x7f1c

output
0x7efc

anf
0x7f00
corr. ctr.
0x7f08 corr. space
TIC count
0x7f10 i2c stat.
struct
0x7f18 time_mark
struct
TMARK fault_flag

FFLAG

FIGURE 6: Reading from the WNav de-


vice

0x4+0x100+0x100
header

WN_SIZE_WRBUFF
0x0000
NCO control NCO ctr.
registers registers
0x00fc
corr. ctr. reg. 0x8000 reg. value
PRN mem.
DMA address 0x8008 segment
corr. space reg. 0x8004
I2C ctr. reg. 0x800c
0x01 0000
PRN mem. for
bank 0 0x100×0x40
0x01 3ffc
0x01 4000 = 0x4000
PRN mem. for
bank 1
0x01 7ffc
0x01 8000
PRN mem. for
bank 2 0x01 bffc
0x01 c000
PRN mem. for
bank 3
0x01 fffc

FIGURE 7: Writing to the WNav device

3.3 User Space Processes of three main user space processes. We already men-
tioned the RT process labeled as wnav core which
Our goal now is to describe the processes which run is mainly responsible for the channel services. The
in the user space context. We describe just coarse second one is a user interface and offers a look in-
conception. The detailed information is available on side of WNav but also provides WNav’s control fa-
the project homepage [12] where the source code doc- cility. The process is labeled as wnav monitor. The
umented with Doxygen [11] tool is placed. third one is a process responsible for position, ve-
locity and time (PVT) estimation and is labeled as
The WNav project, in the first approach, consists

24
Real-Time Linux Applications

wnav pvt. All of these three processes can be run In the shared memory, a queue of tasks (struct
separately. Of course, to allow a meaningful oper- wnc llist) is implemented. This queue is filled from
ation of wnav monitor or wnav pvt, the wnav core the user interface. Here, in the RT process, the task
has to be also running. from queue is performed, and if the task is finished,
the it is removed form the queue.
The Inter-Process Communication (IPC) in the
WNav project is based on shared memory. As a The following work of the RT process is the chan-
synchronization object of shared resources (items in nel services. The signal processing related data (filter
shared memory) we simply rely on integer variables status, channel status, accumulated correlator out-
which are treated atomically (we used gcc built- puts, etc.) are stored in structure wnav corr t. The
in atomic functions). A shared memory top struc- array of wnav corr t with identical organization as
ture struct wnav shamem can be used as an outline, DLL/PLL blocks in FPGA (i.e. 4 × 6) is an item
which data are shared among the processes: of the structure struct wnav shamem. The signal
processing task is driven according the channel sta-
struct wnav_shamem tus, item chst in wnav corr t. Based on the avail-
{ able correlator outputs from struct rd restbuff
/* --- IO memory --- */ and data in wnav corr t, new values for code and
struct acq_mem acq;
carrier NCOs are prepared.
struct rd_restbuff rd_rest;
struct wr_buff wr; Another important task of the RT process is an
/* --- --- */
struct prn_gener prn; export of code measurement for pseudorange form-
struct wnav_tuner tuner; ing. These data are stored in struct pvt share and
struct wnav_core wcore; then are read by PVT process.
/* --- channels (correlators) --- */
wnav_corr_t The final task is to write the data back to the
corr_all[WN_CORR_NO_GRP][WN_CORR_IN_GRP]; FPGA through the device file /dev/wnav0. The task
/* --- monitor --- */
struct wna_heap heap;
is accomplished in a function do write(). It may be
struct wnc_llist llist; divided into two parts, see Fig. 7. If just new NCO
/* --- pvt process --- */ values and possible one register value have to be writ-
struct pvt_share pvt; ten, the writing is accomplished with one write()
};
system call. All needed data are prepared in struct
wr buff. In such case, when the PRN code has to
be written in addition, the write is divided into two
3.3.1 RT Process: wnav core
steps. In the first one, the header and NCO values
are written and then a segment of PRN code is writ-
The RT process is responsible for several tasks which
ten.
are accomplished in infinite loop. We describe them
in next paragraphs.
First, the data are read from the device file 3.3.2 Process of User Interface: wnav monitor
/dev/wnav0. This task is covered in a func-
tion do read(). The reading is accomplished into
The user interface is based on Ncurses library [10],
two steps (there are two read() system calls in
see screenshot in Fig. 8.
do read()), see Fig. 6. In the first step, the block of
snapshot memory is read. Such block contains sig- The first task of the user interface is to make
nal samples over 800 µs (interval between two suc- possible a look inside the receiver. The displayed in-
cessive TICs). The block is stored as one element of formation is organized into pages. In the screenshot,
an array in struct acq mem. This array organizes the channel status related data are displayed. There
successive snapshot memory blocks to result into a are other pages like an acquisition page, tuner status
region with signal samples over long interval equals page, fault flag page and help page.
to several multiples of 800 µs. The second step of
The second task of the user interface is a re-
reading get the code and carrier measurements, cor-
ceiver control. The user can control the receiver
relator outputs but also debugging and performance
by typing the commands into a command line (see
related information from kernel driver. These data
the last line in the screenshot). The command
are stored in struct rd restbuff.
consists of its identification (string) and argument
Next task of the RT process is connected with the lists, e.g. ACQ 0 0 starts the acquisition for the
receiver control through the user interface. This task first channel in the first group (channel with coor-
is accomplished in function wnc llist perfrmv(). dinates 0, 0), or TUNER 0 1575.42 4.0 10.0 sets

25
The Witch Navigator A Software GNSS Receiver Built on Real-Time Linux

the frequency, bandwidth and gain for the first tuner algorithm is based on Eq. 2 which is solved using the
(with 0 id). Since most tasks need some write least square method.
and read to/from the FPGA, the monitor process
For proper function the process wnav pvt needs
just converts commands into task objects and put
data from the RT process wnav core. The com-
them into the queue. The queue is internally imple-
munication is accomplished using struct pvt share
mented with struct wnav llist in the shared mem-
placed in the shared memory.
ory. The function wnc llist add() performs a com-
mand parsing and putting task into this queue. Such The items of struct pvt share can be seen
created task objects are then retrieved in wnav core from next code:
with wnc llist perfrmv() function.

struct pvt_share {
uint8_t count_m125;
uint8_t core_idx;
uint8_t pvt_idx;
enum praw_stat stat[PVT_RAW_CNT];
PVT_RAW raw[PVT_RAW_CNT];
};

The communication utilizes a circular buffer con-


ception. The size of the buffer is determined with
PVT RAW CNT. The data for the PVT process are en-
capsulated in raw array. The structure has two in-
dexes, core idx and pvt idx, which point to places
where wnav core and wnav pvt shall perform write
and read, respectively. The items in raw array are
protected with a lock (or status) stat. The lock stat
indicates whether the corresponding item in raw ar-
FIGURE 8: wnav monitor process screen- ray is free, locked for writing by wnav core, contains
shot: page with channel statuses consistent data or is locked for reading by wnav pvt.
The data export from wnav core is controlled
3.3.3 Position, Velocity and Time Estima- by count m125. This is TIC counter modulo 125
tion Process: wnav pvt and data are exported at its overflow, i.e. each
125 × 800 µs = 100 ms. Thus, 100 ms is an interval
Process wnav pvt performs an estimation of the re- between two successive position, velocity and time
ceiver position, velocity and time. Internally, the estimation of the WNav receiver.

PC workstation 1 PC workstation 2
PC workstation type – HP Compaq nc6320
CPU model name Intel(R) Core(TM)2 CPU Genuine Intel(R) CPU
CPU MHz 1866.669 MHz 1833.337 MHz
CPU cores 2 2
cache L1/L2 32/4096 KB 64/2048 KB
bogoMIPS 3732.89 3657.46
address size: physical/virtual 36/48 32/32
system memory 2G 1G
distribution Fedora 12 (Constantine) Fedora 15 (Lovelock)
kernel 2.6.33.7-rt29 SMP PREEMPT RT 3.0.1-rt11 SMP PREEMPT RT
hardware platform x86 64 i386

TABLE 1: Parameters of PC workstations


used for WNav receiver testing

26
Real-Time Linux Applications

4 WNav Receiver Testing interrupt thread and RT process real time priority
and CPU affinity.
The WNav receiver was intensively tested on two dif- Unfortunately, we have not yet gathered enough
ferent PC platforms. The first of them was a desktop information to reliable answer which tuning mecha-
PC (gaming computer), further labeled as PC 1, the nism or parameters have key impact on the WNav
second of them was a laptop, further labeled as PC receiver. The kernel and operation system tunings
2. The PC parameters are enumerated in Tab 1. for WNav are challenges for the future. Clearly, it
On both PC platforms, we were capable to put the will always be a trade off between the PC hardware
WNav receiver into operation. performance and amount of work needed for kernel
The test configuration for PC 1 can be seen in and operation system tunings.
Fig. 9. We used Spirent simulator GSS6560[13] as
a GPS L1 C/A signal source. During the testing Early correlator
the important receiver parameters were logged for 2000
I
later analysis and visualization. See Fig. 10, where 0 Q

the correlator outputs of one channel are depicted. −2000


0 5 10 15 20 25 30 35 40 45 50
The data for the figure were acquired from wnav core time [s]
Late correlator
process. 2000
I
0 Q

−2000
24.8 25 25.2 25.4 25.6 25.8 26 26.2
time [s]
5 E & L power
x 10
20
10
0
11.5 12 12.5 13 13.5
time [s]

FIGURE 10: Visualization of Early and


Late correlator outputs; 1) Early correlator
outputs during the test, 2) Late correlator out-
puts: detail with synchronized carrier phase,
3) Early power and Late power signals: detail
immediately after a signal detection

FIGURE 9: The WNav receiver tested with 5 Conclusion


a signal from Spirent simulator
In this paper, the Witch Navigator GNSS soft-
The time constraints in the receiver were also
ware receiver was described. The receiver hard-
monitored using build-in facility in the kernel driver.
ware (device) is implemented as an ExpressCard in-
However, breaking of most time constraints is clearly
tended for PC workstation running Linux kernel with
visible since such situation issues in a malfunction of
RT PREEMPT patch. The first obvious milestone
the receiver. The synchronization of all tracked sig-
of the project is an implementation of GPS L1 C/A
nals is then lost and the receiver has to be restarted.
capable algorithms which was already accomplished.
While the first PC platform (PC 1) worked di- The conducted experiments prove, that the key re-
rectly after using custom kernel with RT PREEMPT ceiver elements (ExpressCard, FPGA, x86, Linux
patch, it was not true in case of PC 2. To force the kernel) have enough capability not only for GNSS
WNav receiver to operation on PC 2, we had to do demanding signal processing tasks bat also the ca-
a lot of operation system and kernel related experi- pability to meet strict time constraints needed for
ments and tunings. DLL/PLL controlling.
For PC 2 we took into consideration all available The paper brings details concerning receiver im-
recommendation related to RT PREEMPT patch, plementation as a software project in the Linux op-
mainly from [8, 9]. We did some experiments with eration system. The communication with the de-
kernel parameters, tried to reduce PC load by remov- vice, device driver, user space processes and their
ing unnecessary demons and service and changed the IPC were described.

27
The Witch Navigator A Software GNSS Receiver Built on Real-Time Linux

The WNav project is developed completely with dioengineering, Volume 19, Issue 4, pp.
free tools. This applies to both hardware (PCB de- 536–543, 2010.
sign, FPGA programming) and software. The WNav
project is open source project. Its source code, doc- [4] Kovar, P., Kacmarik, P., Vejrazka,
umentation and other related materials will be avail- F.: Interoperable GPS, GLONASS
able on the project’s homepage [12]. and Galileo Software Receiver. IEEE
Aerospace and Electronic Systems
Further project development has two obvious di- Magazine, Volume 26, Issue 4, pp. 24–
rections. The first of them is the project development 30, April 2011.
in terms of GNSS, i.e. to introduce algorithms for
new GNSS signals and systems (now, just GPS L1 [5] Corbet, J., Rubini, A., Kroah-
C/A has been implemented). The second of them is a Hartman, G.: Linux Device Drivers,
project development in terms of software implemen- Third Edition. O’Reilly Media. Febru-
tation, i.e. improving an ineffective implementation ary 2005.
of the algorithms, gathering the information how to
[6] Venkateswaran, S.: Essential Linux De-
configure the kernel and operation system, maintain-
vice Drivers. Prentice Hall. 2008.
ing the code, keeping up to date documentations etc.
We hope that in both developing directions we will [7] Linux Cross Reference, Identifier
utilize the feedback of other potential users of the Search. [Online]. Available: http:
WNav receiver. //lxr.free-electrons.com/ident
[8] The RT-kernel Wiki page. [Online].
Available: http://rt.wiki.kernel.
Acknowledgments org/

The authors would like to thank the Spirent Com- [9] Red Hat Realtime Tuning Guide.
munication for lending the GPS simulator GSS6560, 2011. [Online]. Available: http:
which was used for the verification and testing of the //docs.redhat.com/docs/en-US/
final tracking and PVT algorithms. Red_Hat_Enterprise_MRG/1.3/html/
Realtime_Tuning_Guide/index.html
[10] The Ncurses (new curses) library
References homepage. [Online]. Available:
http://www.gnu.org/software/
[1] Kaplan, E., Hegarty, Ch.: Understand- ncurses/ncurses.html
ing GPS: Principles And Applications. [11] Doxygen home page. [Online].
Second Edition. Artech House Mobile Available: http://www.stack.nl/
Communications. 2005.
~dimitri/doxygen/index.html
[2] Misra, E., Enge, P.: Global Positioning [12] Witch Navigator home page. [Online].
System: Signals, Measurements, and Available: http://www.witchnav.cz/
Performance. Second Edition. Ganga-
Jamuna Press. 2006. [13] GSS6560 Multi-Channel Fully
Flexible GPS/SBAS Simulation
[3] Jakubov, O., Kovar, P., Kacmarik, P., System. Spirent product descrip-
Vejrazka, P.: The Witch Navigator – A tion page. [Online]. Available:
Low Cost GNSS Software Receiver for http://www.spirentfederal.com/
Advanced Processing Techniques. Ra- GPS/Products/GSS6560/Overview/

28
Real-Time Linux Applications

Case Study: Challenges and Benefits in Integrating Real Time


patch in PowerPc Based Media System

Manikandan Ramachandran
Infosys Technologies Limited
Electronic City, Hosur Road, Bengaluru 560100,India
mani [email protected]

Aviral Pandey
Motorola Mobility
2450 Walsh Avenue,Santa Clara, CA 95051, USA
[email protected]

Abstract
Media systems generally have many CPU intensive as well as time critical processes. Vanilla Linux 2.6
with preemption enabled does provide solution to this kind of system. However, if the system is interrupt
intensive, as ours, then vanilla Linux 2.6 performance is not expected to be good; as by definition inter-
rupt preempts all higher priority tasks. RT patch seems to address exact issue by providing an option
to handle interrupts in process context, but the solution address exact issue by providing an option to
handle interrupts in process context, but the solution doesn’t seem to fit customized Linux. Quite a bit
of architecture changes had to be made to reap the benefits of RT patch.

This paper describes about various challenges faced in integrating Ingo’s RT patch on a customized
PowerPc Linux package. And explains how those challenges were overcome. It describes how LTTng can
be used to identify the bottlenecks and finally concludes by comparing performance of the application
that was run on vanilla and RT Linux.

29
Case Study: Challenges and Benefits in Integrating Real Time patch in PowerPc Based Media System

1 Introduction 3 Problem Statement

Linux operating system is evolving fast as a de- The most important goal of the product software de-
facto embedded operating system. Looking back the sign is that whenever there is an interrupt from video
community has made tremendous progress from be- hardware, kernel has to schedule video processing
ing stuck with big-o-lock to a really preemptible ker- tasks with almost zero latency. In theory assign-
nel in 2.6, and to fairly predictable kernel with RT ing high real time priority should take care of this
patch. However,still there are few challenges in using requirement; however in few instances it has been
Linux for commercial systems that require real-time demonstrated that Linux kernel has failed to honor
capabilities. priority of tasks because of various other kernel de-
pendencies.
This paper takes one of such case and walks Linux Trace Tool next generation or LTTng is
through the challenges in integrating RT patch along an open source tool that helps to find performance
with slew of other patches. After all the effort in in- bottle neck on Linux based system. Using LTTng,
tegrating, the real performance of the system is not following system bottlenecks were identified:
up to the expectation. The paper dwells deeper into
a performance issue and proposes a solution to fix
that issue. 3.1 Multiple IDE interrupts

The media product has two compact flashes each of


which can be accessed through kernel’s IDE driver.
Any access to compact flash generates lot of inter-
2 Product Overview rupts that could preempt the high priority applica-
tion process and cause degradation in system perfor-
mance. This issue was kind of mitigated by deferring
The system identified for this study is a head-end
IDE request by 1 millisecond, think of it like yielding
media device. The device is a real time video splicer
CPU to other tasks before scheduling another IDE
and statiscal remultiplexer that can groom hundreds
request. This feature ensures that there is only one
of video services. The device has 2 PowerPc 7448,
IDE interrupt during 1 millisecond window. This so-
built on a Marvell communication controller board
lution seems to work; however in certain condition,
[MV64460]. Real MPEG related operation is done
when there is excessive IDE access, LTTng shows
by a proprietary application that is highly CPU in-
that more than 1 IDE interrupt could occur in 1 mil-
tensive and it has the highest priority among all pro-
lisecond window.
cesses. In short, proprietary application ”almost”
expects hard real time capability from the operat-
ing system. The figure below gives intended hierar-
chy of CPU resource to be utilized by various pro-
cesses/interrupts.

FIGURE 2: IDE interrupt in ”video pro-


cess” context
This scenario happens when video hardware inter-
rupts and IDE interrupts are handled one after the
other.
Figure 2[Refer Appendix A for legend] is a
FIGURE 1: Processes Hierarchy capture of LTTng viewer. ”Marker 1” and ”Marker

30
Real-Time Linux Applications

2”. shows that there are more than 1 IDE interrupts In our system there are 2 scenarios in which spin-
while application process is woken up by video inter- lock usage is inevitable:
rupt. ”Marker 3” just shows the events that occurred
around ”Marker 1”. • Custom drivers handling interrupts or to pro-
vide mutual exclusion to critical resource.
• Kernel usage of spinlocks. One common in-
3.2 Softirqs preempts high priority
stance is usage of spinlocks in ”printk” func-
process tion.

Linux kernel is driven by timer interrupts. When-


ever there is timer interrupt, scheduler is woken up
and deferred tasks are executed from .softirq. con-
4 A Case for Real-Time Patch
text. The occurrences of timer interrupt can be con-
figured. In our kernel we have configured that to Real-time patch [3] provides following features that
be 1 millisecond. So every 1 millisecond, ”softirq” make the kernel more preemptible:
daemon is woken up and it preempts all other high
priority tasks. Usually ”softirq” daemon doesn”t hog • Makes spinlocks preemptible by default.
CPU [it would take 30-40 microseconds], but if there
• Option to handle IRQs in kernel thread context
are many kernel soft timers registered it could hog
instead of IRQ context.
the CPU.
• ”Softirqs” are handled in kernel threads con-
text.

Our study about system performance issues with


vanilla kernel 2.6.33 made a compelling argument for
using RT patch. We thought by using RT patch we
could address our system performance issues. Based
on that thought we made following changes to our
system software architecture:

• Apply RT patch to our kernel and make drivers


compatible to that.
• All interrupts except video hardware interrupts
were made to handle as kernel threads.
FIGURE 3: ”softirq” stealing ”video pro- • Priority of Application process was made
cess” context higher than interrupt and softirq threads.

In Figure 3, marker points to context switch from The idea of new design is that whenever there is a
application process to softirq daemon. In this case video interrupt, media process is woken up with zero
few micro seconds are lost from application process or little latency. Another expectation from RT patch
context. is that, the feedback mechanism used in IDE driver
to throttle IDE request can be removed and RT patch
would inherently take care of preempting low priority
3.3 Spinlocks and Preemption tasks including tasks that handle interrupts.

In multiprocessor system spinlocks are used to pro-


vide mutual exclusion for any critical resource. Spin- 5 Challenges in Integrating
locks by its nature disables preemption in CPU of the
current process. So if a lower priority process hap- RT-patch
pens to take spinlock before a higher priority tasks
scheduled to run, then as long as spinlock is held The end goal of this exercise was to have a stable
by the lower priority task, higher priority is task is real-time Linux kernel and have performance profil-
starved for CPU time. This is one scenario, where ing capability by integrating LTTng. At the time of
priority of process fails to get honored. this exercise 2.6.33 was the latest and stable kernel

31
Case Study: Challenges and Benefits in Integrating Real Time patch in PowerPc Based Media System

version so we choose real-time patch version 2.6.33.9 • Thread to run 10000 tight loops
and LTTng version [version]. Patch process went
• Interval between each loops 10000 micro sec-
about quite smoothly, but we had following run time
onds
issues:

1 Unable to boot the System in SMP mode: We conducted this test in 2 phases. In first phase, all
The cause of this issue was found to be with application was stopped and made sure CPU utiliza-
calling ”kzalloc” from ”setup cpu”. Appar- tion was less than 1% on both CPU. Then we ran
ently this issue is not seen in non-real-time cyclic test first with vanilla kernel then cyclic test
kernel. We worked around this issue by stat- was ran on kernel with real-time patch. Results of
ically allocating memory rather than using the test are given in Figure-4.
kzalloc or kmalloc.
2 Dependency on BSP code: When used real-
time patch over Marvell’s BSP code, we found
lot of recursive locks issues. We identified those
issues and fixed them.
3 Handling IRQ in a Thread: This is one of the
toughest challenges that we are dealing with.
In first look, after booting real-time kernel one
would think that interrupts are handled in In-
terrupt threads, but we found it in hard way
that interrupts continue to get handled in in- FIGURE 4: Cyclic Test with No Load
terrupt context unless a few specific changes
are made[4]. In second phase, both CPUs of the system were
This issue is the main point of this paper, and heavily loaded using a script that just creates a tight
it is discussed in detail in section ”Performance loop. Multiple instances of this script were run
Analysis”. concurrently till the CPU utilization of the system
reached 97%. Cyclic test was run in this loaded sce-
nario. Figure 5 gives the result of cyclic test on both
kernels.
6 Performance Measure
After applying real-time patch we tried to get raw
performance measure of our system using ”Cyclic
testr” [5] and the actual application performance in a
controlled test environment. From cyclic test, we no-
ticed that the performance of the system was similar
to vanilla kernel. However, with our custom prod-
uct performance test we found that the performance
of the application has deteriorated a lot. Following
section describes our test scenario and results.
FIGURE 5: Cyclic Test with Load

6.1 Kernel Performance Measure


6.1.1 Inference
”Cyclic” test is used to determine the internal worst-
case latency of a Linux real-time system.”Cyclitest” From figure 4 we see that the minimum latency
measures the amount of time that passes between for vanilla kernel is 896 microseconds which is less
when a timer expires and when the thread actually than real-time patched kernel. While maximum la-
runs the thread is woken[9]. tency for real-time patched kernel is more than 1mil-
liseonds which is far worst than vannila kernel.
We used this tool in both patched real-time ker-
nel and vanilla kernel. The ”Cyclic test” test was From figure 5, we can infer that latency isnr’t
run with following parameter set: bad with loaded system. Both for vanilla and real-
patched kernel we see the difference is only about 10
• one thread with priority 80 micro seconds.

32
Real-Time Linux Applications

The conclusion of this test is that we don’t ob-


serve huge improvement in performance of the ker-
nel/system with real-time patch.

6.2 Application Performance Mea-


sure

The idea behind this performance test is to study


how vanilla and real-time patched kernel behaves
when a high priority process is made to starve for
CPU time. Since real-time patch has option to han-
dle interrupts in process context and makes normal
spinlocks preemptible, the expectation and the desire
was that the real-time patched kernel always honors
process.s priority irrespective of other processes or FIGURE 7: CPU1 load comparison
kernel state.
Figure 6 and 7 compares the performance of each
In our case, Media application is configured to kernel while the process was stressed for 30 minutes.
have higher priority than all other processes, includ-
ing processes that are supposed to handle interrupts.
6.2.2 Inference

From Figure 6 and Figure 7 it is apparent that appli-


6.2.1 Test Case Description
cation performance on real-time patch kernel is not
better than its performance on vanilla kernel. Only
The Media application was configured to run at full difference between real-time patch kernel and vanilla
capacity, so that CPU resource is solely taken by kernel from a platform perspective is that a patch
this process. Executions of other trivial tasks were to throttle IDE access was not ported to real-time
reduced. The test was then run on 3 different kernels: kernel.
To make sure IDE access is the cause of the issue,
• 2.6.22.5 [scheduler based on RB-tree] excessive IDE access was made when media applica-
tion process was fully loaded. Under this scenario, we
• 2.6.33.5 [CFS scheduler]
found that the performance of the application dete-
riorated further. Ideally a process that handles IDE-
• 2.6.33.5 + RT patch
interrupts should have yielded to high priority task
processes like the media application process; how-
ever, from this test it looks like media application
process seems to be preempted when there is an IDE
interrupt.

7 Application Performance
Analysis
After making the kernel real-time enabled, the expec-
tation was that the IRQ thread will be preempted by
high priority media application process. However, as
demonstrated in previous section, we have seen that
performance of the application was not good with
real-time patched kernel. There could be two possi-
bilities for this kind of behavior:
FIGURE 6: CPU0 load comparison
1. Scheduler is not honoring processes priority.

33
Case Study: Challenges and Benefits in Integrating Real Time patch in PowerPc Based Media System

2. IDE interrupts are still handled in interrupt in thread, one has to use ”request threaded irq” and
context like vanilla kernel. split the handling of irq into two parts by using two
handler functions. In first part, interrupt will be han-
We confirmed ”Case 1” is not the case by running dled in interrupt context. In this context, handler
”rt-migrate-test” written by Steven Rostedt[6]. This should disable the source of the interrupt and wake
is a simple program that creates multiple threads corresponding interrupt thread. The second handler
with various priority and then checks if scheduler should do the actual handling of the interrupt, which
honors the priority of each thread. This test passed obviously will be done in the thread context.
both in normal scenario and in heavily loaded system
Having identified the bottleneck, we are in pro-
with IDE interrupts and media application process.
cess of converting IDE and other similar CPU inten-
To check if IDE interrupts are still handled in sive interrupt handlers to be run in thread context.
interrupt context, we patched real-time kernel with
LTTng. Then with patched kernel, the media ap-
plication was stressed while generating multiple IDE
interrupts.
9 Future Work

9.1 Creating thread function for in-


terrupt handling

As discussed in section 8, just having real-time patch


is not go enough for making interrupts to be han-
dled in threads. A thread function, which handles
deferred interrupt work, should be designed and im-
plemented. In future, we intend to make low priority
interrupts that are CPU intensive to be handled in
threads.

9.2 High Resolution Timers

We didn.t enable .High Resolution Timers. in any of


the current kernel configuration as we weren.t sure
FIGURE 8: Video Process being preempted about the support of this feature in PowerPc based
by interrupts system [8]. In future, we intend to do deep profile of
kernel performance with high resolution timers.
Figure 8 captures the various process states at
the time of the issue. ”irq75-ide1” is the handler
process for the IDE interrupt: in this case IDE in-
terrupt number is 75.From the viewer, we can in- 10 Conclusion
terpret that video process was starved for CPU as
it was busy handling IDE interrupt ”75” in inter- Effectiveness of real-time patch on a system depends
rupt context rather than in the context of the thread on various factors. In our case, preemption of IRQ
”irq75-ide1”. This is the cause of video process star- handler is the single most important criteria in de-
vation. termining system performance. Although real-time
patch creates interrupt threads by default, it really
doesn.t mean that interrupts are handled in those
8 Solution threads. This behavior confuses the user. Instead,
it would have been better if no interrupt threads
were created. If one prefers to have an interrupt
We started looking into ”request irq” implementa-
threaded then he or she may do so by using ”re-
tion in real-time patched kernel to understand why
quest threaded irq”.
IDE interrupts were handled in interrupt context not
in thread context. From ”request irq” implemen- This case study shows that applying real-time
tation it was clear that irq thread did nothing if patch will not solve all real time requirements of a
handlers for an interrupt were installed through ”re- system. Instead it provides great tools like preemp-
quest irq” call. In order for interrupt to be handled tive spinlocks, threaded interrupt handler etcetera.

34
Real-Time Linux Applications

To make a system near real time capable, one has [8] High resolution timers and
to understand their system bottlenecks. LTTng is dynamic ticks design notes
a great tool which clearly brings out hard to un- [linux/Documentation/timers/highres.txt]
derstand system behavior. As demonstrated in this
case, LTTng was used extensively to get to bottom
[9] rt-latency-howto.txt
of many performance issues. It is highly recom-
[http://people.redhat.com/williams/latency-
mended to use LTTng to understand system perfor-
howto/rt-latency-howto.txt]
mance issues and make few system software archi-
tecture changes to take advantage of tools provided
by real-time patch. To summarize, real-time patch is
not a ”cure-all” of all system real-time owes; instead
it provides great tools that could be used as the first A Appendix
step to make a system real time capable.
Following is the legend for LTTV out-
put. Source: ”http://ltt.openrapids.net/lttv-
References doc/user guide/x81.html”.

[1] Introduction to LTTng [ http://lttng.org/lttng-


kernel-tracer]
[2] How to Use LTTV [http://lttng.org/lttv]
[3] Introduction to RealTime
Patch[https://rt.wiki.kernel.org]
[4] Cyclictest examples
[https://rt.wiki.kernel.org/index.php/Cyclictest]
[5] Moving interrupts to thread by Jake
Edge.[http://lwn.net/Articles/302043]
[6] http://kerneltrap.org/Linux/Balancing Real Time Threads
[7] Investigating latency effects of the Linux real-
time Preemption Patches (PREEMPT RT) on
AMDs GEODE LX Platform FIGURE 9: Lengend For LTTV

35
Case Study: Challenges and Benefits in Integrating Real Time patch in PowerPc Based Media System

36
Real-Time Linux Applications

Hard real-time Control and Coordination of Robot Tasks using Lua

Markus Klotzbuecher
Katholieke Universiteit Leuven
Celestijnenlaan 300B, Leuven, Belgium
[email protected]

Herman Bruyninckx
Katholieke Universiteit Leuven
Celestijnenlaan 300B, Leuven, Belgium
[email protected]

Abstract
Control and Coordination in industrial robot applications operating under hard real-time constraints
is traditionally implemented in languages such as C/C++ or Ada. We present an approach to use Lua,
a lightweight and single threaded extension language that has been integrated in the Orocos RTT frame-
work. Using Lua has several advantages: increasing robustness by automatic memory management and
preventing pointer related programming errors, supporting inexperienced users by offering a simpler syn-
tax and permitting dynamic changes to running systems. However, to achieve deterministic temporal
behavior, the main challenge is dealing with allocation and recuperation of memory. We describe a prac-
tical approach to real-time memory management for the use case of Coordination. We carry out several
experiments to validate this approach qualitatively and quantitatively and provide robotics engineers the
insights and tools to assess the impact of using Lua in their applications.

1 Introduction tages. Firstly, the robustness and hence safety, of


the system is increased. This is because scripts, in
contrast to C/C++, can not easily crash a process
This work takes place in the context of component and thereby bring down unrelated computations that
based systems. To construct an application, compu- are executed in sibling threads. This property is es-
tational blocks are instantiated and interconnected sential for the aspect of coordination, which, as a
with anonymous, data-flow based communication. system level concern, has higher robustness require-
Coordination refers to the process of managing and ments than regular functional computations.
monitoring these functional computations such that Secondly, the use of a scripting language facili-
the system behaves as intended. Keeping Coordina- tates less experienced programmers not familiar with
tion separate from Computations increases reusabil- C/C++ to construct components. This is impor-
ity of the latter blocks as these are not polluted with tant for the robotics domain, where users are often
application specific knowledge. Examples of typical not computer scientists. Moreover, rapid prototyp-
coordination tasks are switching between controllers ing is encouraged while leaving the option open to
upon receiving events, reconfiguring computations convert parts of the code to compiled languages af-
and dealing with erroneous conditions. A complex ter identification of bottlenecks. At last, the use of
example of an robot applications constructed using an interpreted language permits dynamic changes to
this paradigm can be found here [1]. a running system such as hot code updates. This
To implement coordination we propose to use is essential for building complex and long running
the Lua [2] extension language. Using an inter- systems that can not afford downtime.
preted language for this purpose has several advan-

37
Hard real-time Control and Coordination of Robot Tasks using Lua

The major challenge of using Lua in a hard real- to Lua. Lua was ultimately chosen because of its
time context is dealing with allocation and recu- significantly larger user community.
peration of memory. Previously we sketched two
strategies to address this: either running in a zero-
allocation mode and with the garbage collector de-
activated or in a mode permitting allocations from
3 Approach
a preallocated memory pool using a O(1) allocator
and with active but controlled garbage collection [3]. To achieve deterministic allocations, Lua was con-
In practice, especially when interacting with C/C++ figured to use the Two-Level Segregate Fit (TLSF)
code it may be inconveniant to entirely avoid collec- [8] O(1) memory allocator. This way memory allo-
tions, hence now we consider it necessary to run the cations are served from a pre-allocated, fixed pool.
garbage collector. Naturally, this raises the issue of how to determine
the required pool size such that the interpreter will
The rest of this paper is structured as follows. not run out of memory. We address this in two ways.
The next section gives an overview over related work. Firstly, by examining memory management statistics
Section 3 describes how we address the issue of mem- the worst case memory consumption of a particular
ory management in a garbage collected language application can be determined and an appropriate
used for coordination. Section 4 describes four ex- size set. Due to the single threaded nature of Lua
periments with the two goals of demonstrating the a simple coverage test can give high confidence that
approach and giving an overview of the worst-case this value will not be exceeded in subsequent runs.
timing behavior to be expected. Robustness is dis- Furthermore, to achieve robust behavior the current
cussed in the context of the last experiment, a coor- memory use is monitored online and appropriate ac-
dination statechart. We conclude in section 5. tions are defined for the (unlikely) case of a memory
shortage. What actions are appropriate depends on
the respective application.
2 Related work This leads to the second challenge for using Lua
in a hard real-time context, namely garbage collec-
tion. In previous work [3] we suggested to avoid
The Orocos RTT framework [4] provides a hard real-
garbage collection entirely by excluding a set of op-
time safe scripting language and a simple state ma-
erations that resulted in allocations. However, in
chine. While both are much appreciated by the user
practical applications that transfer data between the
community, the limited expressivity of the state ma-
scripting language and C/C++ this is not always
chine model (e.g. the lack of hierarchical states)
possible. Consequently the garbage collector can not
and the comparably complex implementation of both
be disabled for long periods and must be either au-
scripting language and state machines have been rec-
tomatically or manually invoked to prevent running
ognized as shortcomings. This work is an effort to
out of memory. For achieving high determinism, it
address this.
is necessary to stop automatic collections and to ex-
The real-time Java community has broadly ad- plicitly invoke incremental collection steps when the
dressed the topic of using Java in hard real-time respective application permits this. Only this way
applications [5]. The goal is to use Java as a re- it can be avoided that an automatic collection takes
placement to C/C++ to build multi-threaded real- place at an undesirable time.
time systems. To limit the impact of garbage collec-
The Lua garbage collector is incremental, mean-
tion parallel and concurrent collection techniques are
ing that it may execute the garbage collection cycle
used [6]. For our use case of building domain spe-
in smaller steps. This is a necessary prerequisite for
cific coordination languages we chose to avoid this
achieving low garbage collection latencies, although
complexity as coordination can be defined without
of course no guarantee; ultimately the latency de-
language level concurrency. In return this permits
pends on various factors such as the amount of live
taking advantage of the deterministic behavior of a
data, the properties of the live data1 and the amount
single threaded scripting language.
of memory to be freed. The control and coordination
The Extensible Embeddable Language (EEL) [7] applications we have in mind generally tend to pro-
is a scripting language designed for use in real-time duce little garbage because the scripting language
application such as audio processing or control ap- is primarily used to combine calls to C/C++ code
plications. Hence, it seems an interesting alternative in meaningful ways. Even though, to achieve high
1 In Lua, for instance, tables are collected atomically. Hence large tables will increase the worst-case duration of an incremental

collection step.

38
Real-Time Linux Applications

robustness the worst-case duration of the collection The purpose of this test is to compare the aver-
steps can be monitored to deal robustly with possible age and worst case latencies between the Lua and C
timing violations. version and to investigate the impact of the garbage
collector in different modes.
The following summarizes the basic approach.
First, the desired functionality is implemented and
executed with a freely running garbage collector.
This serves to determine the maximum memory use
Results The following table summarizes the re-
from which the necessary memory pool size can be
sults of the cyclictest experiments. Each field con-
inferred by adding a safety margin (e.g. the maxi-
tains two values, the average (“a”) and worst case
mum use times 2). Next, the program is optimized
(“w”) latency given in microseconds, that were ob-
to stop the garbage collector in critical paths and in-
tained after fifteen minutes of execution.
cremental steps are executed explicitly. The worst
case timing of these steps is benchmarked, as is the
overall memory consumption. The program is then sleep time 500 1000 2000 5000 10000
executed again with the goal to confirm that the ex- a, w a, w a, w a, w a, w
plicitly executed garbage collection is sufficient to C 0, 35 0, 31 0, 45 1, 35 1, 30
not run low on memory. Lua/free 2, 41 2, 39 3, 39 3, 45 5, 46
Lua/off 2, 38 2, 39 3, 38 3, 43 5, 38
Lua/ctrl 2, 38 2, 42 3, 37 3, 36 5, 46

4 Experiments
Comparing the C cyclictest with the Lua variants
as expected indicates that there is an overhead of us-
In this section we describe the experiments carried ing the scripting language. The difference between
out to assess worst-case latencies and overhead of the three garbage collection modes are less visible.
Lua compared to using C/C++ implementations. The table below shows the average of the worst case
All tests are executed using Xenomai [9] (v2.5.6 on latencies in microseconds and expressed as a ratio to
Linux-2.6.37) on a Dell Latitude E6410 with an Intel the average worst case of C. Note that the average
i7 quad core CPU and 8 GiB of RAM, with real- of a worst-case latency is only meaningful for reveal-
time priorities, current and future memory locked in ing the differences between the four tests, but not in
RAM and under load.2 Apart from the cyclictest absolute terms. A better approach might be to base
all tests are implemented using the Orocos RTT [4] the average on the 20% worst-case values.
framework. The source code is available here [15].

test WC avg (us) ratio to C


C 35.2 1
4.1 Lua Cyclictest
Lua/free 42 1.19
Lua/off 39.2 1.11
The first test is a Lua implementation of the well
Lua/ctrl 39.8 1.13
known cyclictest [10]. This test measures the latency
between scheduled and real wake up time of a thread
after a request to sleep using clock nanosleep(2). The above table shows that a freely running
The test is repeated with different, absolute sleep garbage collector will introduce additional overhead
times. For the Lua version, the test is run with in critical paths. Running with the garbage collector
three different garbage collector modes: Free, Off or off or triggered manually at points where it will not
Controlled. Free means the garbage collector is not interfere add approximately 11% and 13% respec-
stopped and hence automatically reclaims memory tively compared to the C implementation. Of course
(the Lua default). Off means the allocator is stopped the first option is only sustainable for finite periods.
completely3 by calling collectgarbage(’stop’). 13% of overhead does not seem much for using a
Controlled means that the collector is stopped and scripting language, however it should be noted that
an incremental garbage collection step is executed af- this is largely the result of only traversing the bound-
ter computing the wake up time statistics (this way ary to C twice: first for returning from the sleep
the step does not add to the latency as long as the system call and secondly for requesting the current
collection completes before the next wake up). time.
2 ping -f localhost, and while true; do ls -R /; done.
3 This is possible because the allocations are so few that the system does not run out of memory within the duration of the
test.

39
Hard real-time Control and Coordination of Robot Tasks using Lua

4.2 Event messages round trip slower. Of the 1 MiB memory pool, a maximum of
34% was used. It is worth noting that for the initial
The second experiment measures the timing of time- version of this benchmark, the response times were
stamped event messages sent from a requester to a approximately eight times slower. Profiling revealed
responder component, as shown in Figure 1. The test that this was caused by inefficient access to the time-
simulates a simple yet common coordination scenario stamp message; switching to a faster foreign function
in which a Coordinator reacts to an incoming event interface yielded the presented results.
by raising a response event, and serves to measure
the overhead of calls into the Lua interpreter. The
test is constructed using the Orocos RTT framework 4.3 Cartesian Position Tracker
and is implemented using event driven ports con-
nected by lock free connections. Both components The following two experiments illustrate more prac-
are deployed in different threads. Three timestamps tical use cases. The first experiment compares both a
are recorded: the first before sending the message, Lua and C++ implementation of a so-called “Carte-
the second at the responder side and the third on sian position tracker”, typical in robotics, and run-
the requester side after receiving the response. The ning at 1KHz, by measuring the duration of the con-
test is executed using two different responder com- troller update function. In contrast to the previ-
ponents implemented in Lua and C++. ous example the incremental garbage collection step
is executed during the controller update and hence
Req Resp contributes to its worst case execution time.
The following listing shows the simplified code
timestamp t1 of the update function. Note that diff function is a
store timestamp t2
call to the Kinematics and Dynamics Library (KDL)
and send response [11] C++ library, hence the controller is not imple-
mented in pure Lua. This is perfectly acceptable,
gcstep
timestamp t3 as the goal is not to replace compiled languages but
to improve the simplicity and flexibility of using the
primitives these offer.

pos_msr = rtt.Variable("KDL.Frame")
FIGURE 1: Sequence diagram of event pos_dsr = rtt.Variable("KDL.Frame")
round trip test. vel_out = rtt.Variable("KDL.Twist")
local vel, rot = vel_out.vel, vel_out.rot
For the Lua responder, this application takes ad-
function updateHook()
vantage of the fact that the requester component will if pos_msr:read(pos_msr) == ’NoData’ or
wait for 500us before sending the next message and pos_dsr:read(pos_dsr) == ’NoData’ then
executes an incremental garbage collection step af- return
ter sending each response. If this assumption could end
not be made, the worst-case garbage collection delay diff(pos_msr, pos_dsr, vel_out, 1)
would have to be added to the response time (as is
the case for experiment 4.3). vel.X = vel.X * K[0]
vel.Y = vel.Y * K[1]
vel.Z = vel.Z * K[2]
rot.X = rot.X * K[3]
Results The following table summarizes the aver-
rot.Y = rot.Y * K[4]
age (“a”) and worst-case (“w”) duration of this ex- rot.Z = rot.Z * K[5]
periment for the request (t2 − t1), response (t3 − t2)
and total round trip time (t3 − t1); all values in mi- vel_out:write(vel_out)
luagc.step()
croseconds. end
req resp total Lua/C (total)
a, w a, w a, w a, w Note that for Lua versions prior to 5.2
C 9, 37 7, 18 16, 50 - invoking the incremental garbage collector
Lua 15, 47 11, 59 26, 106 1.63, 2.12 (collectgarbage(’step’)) restarts automatic col-
lection, hence collectgarbage(’stop’) must be
On average, the time for receiving a response invoked immediately after the first statement. The
from the Lua component is 1.6 times slower than custom luagc.step function executes both state-
using the C responder. The worst case is 2.2 times ments.

40
Real-Time Linux Applications

Results The following table summarizes the re- disabled when entering the approach state and en-
sults of the worst case execution times in microsec- abled again in grasp after the respective controllers
onds. The average execution time is approximately have been enabled.
14 times, the worst case duration 7 times slower than
Besides the actual grasping it is necessary to
the C version. The worst case garbage collection time
monitor the memory use to avoid running out of
measured was 29us, of the 1MiB memory pool size a
memory. With an appropriately sized memory pool
maximum of 34% was in use.
and sufficient garbage collection steps, such a short-
age should not occur. Nevertheless, to guarantee ro-
type duration (avg, max) Lua/C (total)
bust and safe behavior this condition must be taken
a, w a, w
into account and the robot put into a safe state. This
C 5, 19 -
is shown in Figure 3.
Lua 68, 128 13.6, 6.7

In the current implementation the majority of Root


both execution time spent and amount of garbage
generated results from the multiplication of the K operational [ mem_use < 0.7 ] mem_low
gains with the output velocity. If performance load(grasp.fsm) entry:
[ mem_use > 0.6 ] robot_stop()
needed to be optimized, moving this operation to luagc.full()
exit:
C++ would yield the largest improvement. robot_start()

4.4 Coordination Statechart FIGURE 3: Dealing with low memory.

The second real-world example is a coordination As the grasping task can only take place while
Statechart that is implemented using the Reduced enough memory is available, it is defined as a sub-
Finite State Machine (rFSM) domain specific lan- state of operational. The structural priority rule of
guage [12], a lightweight Statechart execution engine the Statechart model [13] then guarantees that the
implemented in pure Lua. The goal is to coordinate transition to mem low has always higher priority than
the operation of grasping an object in an uncertain any transitions in the grasping state machine.
position. The grasping consists of two stages: ap-
proaching the object in velocity control mode and Identifying the required memory pool size has
switching to force control for the actual grasp opera- currently to be done by measuring empirically the
tion when contact is made. This statechart is shown maximum required memory of a state machine and
in Figure 2. adding a safety margin. To avoid this, it would
be desirable to infer the expected memory use from
the state machine description. Predicting the static
grasping memory used by the state machine graph is straight-
forward; also the run-time memory use of the rFSM
core is predictable 4 as it depends on few factors such
approach grasp as the longest possible transition and the maximum
entry: e_contact entry:
luagc.step() en_force_ctrl() number of events to be expected within a time step.
en_vel_ctrl() grasp() However, predicting the memory use of the user sup-
approach_object() luagc.start()
plied programs would require a more detailed analy-
e_grasp_failed
e_grasp_ok
sis/simulation, which is currently out of the scope of
this work; but in robotics, most user supplied pro-
grams are in C/C++ anyway.
FIGURE 2: Coordinating the grasping of
an object. Results The previously described grasping coor-
dination Statecharts are tested by raising the events
The real-time constraints of this example depend that effect the transitions from grasping, approach
largely on the approach velocity: if the transition to to grasp. The timing is measured from receiving the
the grasp state is taken too late, the object might e contact event until completing the entry of the
have been knocked over. To avoid the overhead of grasp state. After this, the same sequence of events
garbage collection in this hot path, the collector is is repeated. The functions for enabling the controller
4 It consists mainly of traversing and transforming the FSM graph.

41
Hard real-time Control and Coordination of Robot Tasks using Lua

are left empty, hence the pure overhead of the FSM priate validation should be repeated for each critical
execution is measured. Running the test repeatedly use. In particular when real-time allocation and col-
for five minutes indicates a worst-case transition du- lection is involved, run time validation of real-time
ration between approach and grasp of 180us. The constraints must be considered as an integral part
memory pool size was set to 1 MiB and the TLSF the application.
statistics report a maximum use of 58%. To test
The major shortcoming of the current approach
the handling of low memory conditions, in a second
is that worst-case memory use can be difficult to pre-
experiment the collector is not started in the grasp
dict. To deal with this we currently allocate addi-
state. As a result no memory is recovered, eventually
tional safety margins. As the overall memory usage
leading to a low memory condition and a transition
of the Lua language is comparably small, such a mea-
to the mem low state. For this test the worst case
sure will be acceptable for many systems, save the
maximum memory use was as expected 70%.
very resource constrained.
This test does not take into account the latencies
To conclude, we believe the results demonstrate
of transporting an event to the state machine. For
the feasibility of our approach to use a scripting lan-
example, when using the Orocos RTT event driven
guage for hard real-time control and coordination
ports, the experiments from Section 4.2 can comple-
that permits to significantly improve robustness and
ment this one. Moreover it should be noted that so
safety of a system. The price of these improvements
far no efforts have been put into minimizing rFSM
are (i) increased yet bounded worst-case latencies,
transitions latencies; we expect some improvement
(ii) computational overhead, as well as (iii) requir-
by optimizing these in future work.
ing additional precautions such as manual schedul-
ing of garbage collection. In summary, we believe
Robustness considerations As described, basic this constitutes a modern and practical approach to
robustness of coordination state machines is achieved building hard real-time systems that shifts the fo-
by monitoring of memory and current real-time la- cus from lowest possible latency to sufficient latency
tencies. However, the system level concern of coordi- while maximizing reliability.
nation unfortunately combines the two characteris- Future work will take place in two directions. On
tics of (i) requiring higher robustness than functional the high level we are investigating how to automat-
computations and (ii) being subject to frequent late ically generate executable domain specific languages
modifications during system integration, the latter from formal descriptions. Implementationwise we in-
of course being susceptible to introduce new errors. tend to investigate if and how the presented worst
The combination of scripting language and rFSM case timing behavior can be improved by using the
model can mitigate this effect in two ways. Firstly luajit [14] implementation, a high performance just-
the scripting language inherently prevents fatal er- in-time compiler for Lua.
rors caused by memory corruption, thereby making it
impossible to crash the application. Secondly, rFSM
statecharts execute Lua user code in safe mode5 . Acknowledgments This research was funded by the
This way errors are caught and converted to events European Community under grant agreements FP7-
that again can be used to stop the robot in a safe ICT-231940 (Best Practice in Robotics), and FP7-ICT-
way. 230902(ROSETTA), and by K.U.Leuven’s Concerted Re-
search Action Global real-time optimal control of au-
tonomous robots and mechatronic systems. The au-
thors also gratefully acknowledge the support by Willow
5 Conclusions
Garage, in the context of the PR2 Beta program.

We have described how the Lua programming lan-


guage can be used for hard real-time coordination
and control by making use of an O(1) memory allo- References
cator, experimentally determining worst-case mem-
ory use and manually optimizing garbage collection
[1] R. Smits et al., “Constraint-based motion spec-
to not interfere in critical paths. Several experiments
ification application using two robots.”, http:
are carried out to determine worst-case latencies.
//www.orocos.org/orocos/constraint-
As usual, benchmark results should be judged based-motion-specification-application-
with caution and mainly serve to remind that appro- using-two-robots, 2008.
5 Using the Lua pcall function

42
Real-Time Linux Applications

[2] R. Ierusalimschy, L. H. de Figueiredo, and W. [8] M. Masmano, I. Ripoll, P. Balbastre, and A.


C. Filho, “Lua – an extensible extension lan- Crespo, “A constant- time dynamic storage al-
guage”, Softw. Pract. Exper., vol. 26, no. 6, pp. locator for real-time systems”, Real-Time Syst.,
635652, 1996. vol. 40, no. 2, pp. 149179, 2008.
[3] M. Klotzbuecher, P. Soetens, and H. Bruyn- [9] P. Gerum. “Xenomai - implementing a rtos em-
inckx. “OROCOS RTT-Lua: an Execution ulation framework on GNU/Linux”, 2004.
Environment for building Real-time Robotic
Domain Specific Languages.” In International [10] T. Gleixner. https://rt.wiki.kernel.org/
Workshop on Dynamic languages for RObotic index.php/Cyclictest
and Sensors, pages 284289, 2010.
[11] R. Smits. “KDL: Kinematics and Dynamics Li-
[4] P. Soetens, “A software framework for brary”, http://www.orocos.org/kdl/, 2001.
real-time and distributed robot and ma-
chine control,” Ph.D. dissertation, May [12] M. Klotzbuecher. “rFSM Coordination Stat-
2006, http://www.mech.kuleuven.be/dept/ echarts”, https://github.com/kmarkus/rFSM,
resources/docs/soetens.pdf. 2011.
[5] “Real-time specification for Java (RTSJ)”,
[13] D. Harel and A. Naamad. “The STATEMATE
version 1.0.2, http://www.rtsj.org/
semantics of statecharts.”, ACM Trans. on Soft-
specjavadoc/book_index.html, 2006.
ware Engineering Methodolody, 5(4):293333,
[6] Sun Microsystems. “Memory Management 1996.
in the Java HotSpotTM Virtual Ma-
chine.”, 2006, http://java.sun.com/j2se/ [14] M. Pall, “The LuaJIT Just-In-Time Compiler”,
reference/whitepapers/memorymanagement_ 2011, http://luajit.org/
whitepaper.pdf
[15] Experiments source code. http://people.
[7] D. Olofson, “The Extensible Embeddable Lan- mech.kuleuven.be/~mklotzbucher/2011-09-
guage”, http://eel.olofson.net/, 2005. 19-rtlws2011/source.tar.bz2

43
Hard real-time Control and Coordination of Robot Tasks using Lua

44
Real-Time Linux Applications

Platform independent remote control and data exchange with


real-time targets on the example of a rheometer system

Martin Leonhartsberger
Institute for Measurement Technology, Johannes Kepler University
Altenbergerstrasse 69, 4040 Linz, Austria
[email protected]

Bernhard G. Zagar
Institute for Measurement Technology, Johannes Kepler University
Altenbergerstrasse 69, 4040 Linz, Austria
[email protected]

Abstract
Most real-time applications in context to automatic control and data acquisition do require user
interfaces to change parameters or to extract data at runtime. Especially in prototyping control processes,
it is common to use CACSD programs such as Matlab or Scicos for code generation. While still being in
the prototyping stage the customer demands to start evaluation using a software tool. Very often those
customers do not possess the ability to work with CACSD tools or other tools as xRTAILab because of
missing licenses or lack of knowledge in linux. To close the gap between the real-time target and data
evaluation a remote control framework is introduced. Our approach is to use RTAIXML as a XML-RPC
server. It is running on the rt-system and has the ability to connect to the CACSD generated code. This
server instance can be contacted through a platform independent Java framework which allows quick
prototyping of simple to use user interfaces without interfering the control system development. It is
possible to transfer scopes and to change parameters on running targets. A successful application is
shown on the example of a low cost rheometer.

1 Introduction be used. It should be possible to produce it in small


series and especially for the rheometer it is a require-
ment to make usage easy. Still it is a prototype with
Connecting a well adapted end user software to an many changes ongoing, so rapid prototyping should
experimental real-time application can be quite hard. be possible. Therefore we suggest a combination of
Usually such environments are served through soft- Matlab or Scicos for Computer Aided Control Design
ware packages as Matlab [1] and Scicos [2]. A tech- (CACSD), an open source real-time target for exe-
nician is acquiring data and evaluating it with those cution, a webservice provider on the target for data
tools directly. On the other hand, professional in- exchange and a java framework to setup an applica-
dustrial appliances use products as for example Lab- tion for remote controlling the whole application. A
VIEW [3] and its data acquisition hardware to get similar unified approach is described from Roberto
everything from a single source. A handy user inter- Bucher and Silvano Balemi in [4].
face is built and the application is on its way. The With this suggested approach an end user software,
second approach is to develop all the needed software independent from most operating systems (it uses
in house for a series production. Both is very expen- Java) can be written. With the framework pro-
sive and usually not applicable for projects like the vided, software development concentrates on build-
one described in this paper. Often it should be low ing a graphical user interface (GUI) and a state ma-
cost, therefore many professional tool chains can’t chine changing parameters of the running real-time

45
Platform independent remote control and data exchange with real-time targets

target through the framework. With this combina- new target parameter values. This is especially
tion rapid prototyping of the control process is pos- needed to change feedback gains or the state.
sible parallel to the GUI implementation as long as
Get Signal Structure: receives a structure of ex-
the state machine interface parameters are designed
posed signals on the target. They can be scalar
properly.
or arrays.
View Selected Signal: This will start a continu-
2 Used tools ous transfer of a signal, so not over XML-RPC
but with an explicit socket connection.
An RTAI [5] patched 2.6.32.11 kernel is used on Write Signal Data on File: This will store a sig-
our reference system. RTAI was started by Paolo nal on the server to request it in a later stage.
Mategazza from the Politecnico di Milano and is in- It is not used in this project, but this may be
troducing a real-time schedule into the kernel which very useful for unattended recordings.
is then able to execute hard real-time code, both
in user and kernel space. Precompiled kernels for Disconnect: This command will shutdown the ses-
a quick start can be found for example in [6]. For sion and deallocate the target lock.
communication with the data acquisition hardware a
COMEDI [7] driver is being used. COMEDI drivers 2.2 Java Framework and state ma-
are low level kernel drivers which provide their func-
tionalities through a common interface for different
chine
data acquisition hardware. Comedilib as a user space
For remote controlling a target over RTAI-XML a
library is used to extend the block modeling capabili-
Java Framework was developed during this work.
ties of Simulink [1] and Scicos [2] for code generation
in a later step. All tools except Simulink are avail-
able as open source.

2.1 RTAI-Lab and RTAI-XML

RTAI-Lab makes a connection between RTAI and


CACSD tools with the aim to allow code genera-
tion for RTAI out of CACSD tools. To monitor and
change parameters and scopes GUI applications are
existing. RTAI-XML [8] is a branch of those ap-
plications which aims to bring the lab environment
x/qrtailab [9] to a remote space. This tool commu-
nicates over an XML remote procedure call proto-
col (XML-RPC) with the target and exchanges data
and parameter settings. The protocol is mostly doc-
umented in the sources and was implemented into a
Java framework during this work. Overview to the
available XML-RPC commands:

Connect: will initiate a connection to the RT target


and negotiate ports.
Start: will start the target connected with if it is
not already running. Targets could also start
automatically on connect.
FIGURE 1: Java Framework
Stop: will stop the target connected to.
Get Parameters: will download the target param- It enables one to change parameters and export scope
eters. data from the target in a programming context, not
in real-time of course, but fast enough for visualiza-
Send Modified Parameters: requires a (changed) tion. The framework is intended to work with dif-
getParameters structure, upload them and set ferent target generators, for example Simulink and

46
Real-Time Linux Applications

Scicos. The java classes, designed with the Model A possibility to change parameters is now found,
View Controller pattern are shortly presented in the the next step is to implement a work flow by us-
UML chart in Fig 1. ing states for example. This could be done by state
flow objects in Simulink or by additional C programs
Class RTAITarget: This class provides an ob- directly on the target for example. Still, a client pro-
ject for a target. setParameterByName(..) gram for evaluating the results would be necessary as
allows the change of a single parameter well and very often the required workflows are due
or the whole parameter-set can be up- to change quite often, therefore we suggest to im-
dated with setParameters(..) at once. plement a state machine on the client in a program.
getScopeByName(String sName) returns an Every automatic control block and calculation can be
object of type RTAIScope which is being de- switched on and off with a constant in the CACSD
scribed in the next item. Functions for send- model. Initially all processes are switched off. The
ing and receiving parameters allow to update remote program then starts to switch on or off the
the target at certain times when all parame- parts as it is defined in a state model. We have suc-
ters have been set or can be triggered with the cessfully used this approach to implement different
function parameter updateImmediately which workflows for one device.
is available for all ”set” functions. Listeners To show the usage of the framework, a very small
will notify attached objects about changes in demonstration program has been written. The com-
the parameters. ments in the code will explain the different steps.
Class RTAIScope: This class provides an object The source of the framework itself is not printed
of a single scope instance. All the scope due to its large amount of line numbers but will
history is saved in this object after trig- shortly be available as an sourceforge project under
gering the socket transfer of the data with the GPL.
scope.start(). Listeners will as well update
attached objects about new data.
2.2.1 A short example
Interface RTAITargetListener: Provides the in-
terface which target listener objects will have
to implement. We create a small example which is shown in Fig. 2.
The demo target has been inherited of the comedilib
Interface RTAIScopeListener: Provides the in- demos. It demonstrates the output of a signal to an
terface which scope listener objects will have RTAI scope and to a COMEDI device. After code
to implement. generation and setup of all necessary components on
the real-time target it is now possible to change pa-
Interface XMLRPCClientInterface: Provides
rameters with a few lines of Java code on a platform-
an interface with all remote procedure calls
independent remote machine.
specified by RTAI-XML. As it is possible to
use different XML-RPC implementations (in
the corresponding implementation the apache
classes have been used) this is done generic
to change the implementation if necessary or
desired.
Class XMLRPCClientImpl: The actual imple-
mentation to the XMLRPCClientInterface. FIGURE 2: Simple Simulink example

public c l a s s AppMinimal {

public s t a t i c void main ( S t r i n g [ ] args ) {

// c r e a t e c o n n e c t i o n t o RTAI−XML s e r v e r
C l i e n t X m l I n t e r f a c e r e m o t e C l i e n t = new Cl i e n t X m l R p c R t a i I mpl (new S t r i n g ( ” 1 0 . 0 . 0 . 1 0 ” ) ,
2 9 5 0 0 ,new S t r i n g ( ”STEST” ) ) ;

// c r e a t e a t a r g e t
RTAITarget t a r g e t = new RTAITarget ( r e m o t e C l i e n t ) ;

try {
// Connect t o t a r g e t and s t a r t it
t ar ge t . connect ( ) ;
target . start ( ) ;

47
Platform independent remote control and data exchange with real-time targets

// Query amount o f Parameters and Scopes , p r i n t them


System . o u t . p r i n t l n ( ” P a r a m e t e r s : ”+t a r g e t . getNumberOfParameters ( ) ) ;
System . o u t . p r i n t l n ( ” S c o p e s : ”+t a r g e t . getNumberOfScopes ( ) ) ;

// L i s t a l l a v a i l a b l e Parameters
System . o u t . p r i n t l n ( ” P a r a m e t e r s a v a i l a b l e on t a r g e t : ” ) ;
HashMap v=t a r g e t . g e t P a r a m e t e r s ( ) ;
I t e r a t o r <V e c t o r> m = v . v a l u e s ( ) . i t e r a t o r ( ) ;
while ( m. hasNext ( ) ) {
V e c t o r o = m. n e x t ( ) ;
System . o u t . p r i n t l n ( o ) ;
}

// L i s t a l l a v a i l a b l e s c o p e s
System . o u t . p r i n t l n ( ” S c o p e s a v a i l a b l e on t a r g e t : ” ) ;
V e c t o r v1=t a r g e t . g e t S c o p e s ( ) ;

f o r ( i n t n=0;n<v1 . s i z e ( ) ; n++) {
System . o u t . p r i n t l n ( v1 . g e t ( n ) ) ;
}

// Update a p a r a m e t e r s on t h e t a r g e t , i n t h i s c a s e a s o u r c e s b l o c k
t a r g e t . u p d a t e P a r a m e t e r ( ” t e s t / Gain ” , ” Value ” , 1 0 0 . , true ) ;

// Get a s c o p e
RTAIScope s c = new RTAIScope ( t a r g e t , ”U” ) ;
sc . st ar t Tr an sf e r ( ) ;
sleep (20);
sc . stopTransfer ( ) ;
System . o u t . p r i n t l n ( ” S c o p e d a t a from U : ” + s c . g e t D a t a ( ) . t o S t r i n g ( ) ) ;

// s t o p t a r g e t and d i s c o n n e c t it
t a r g e t . stop ( ) ;
target . disconnect ( ) ;

// i n c a s e o f any e x c e p t i o n , c a t c h i t , s t o p and c l o s e t a r g e t .
} catch ( E x c e p t i o n e ) {
e . printStackTrace ( ) ;
t a r g e t . stop ( ) ;
target . disconnect ( ) ;
}
}
}

3 Application 3.1 Mechanical design

Rheologic measurement cycles do take a certain


amount of time and the number of rheometer-devices
is limited, therefore it is often not possible to carry
out a large test series. It is intended to raise the
amount of control samples in quality assurance and 14
laboratory environments with new automated low
cost measurement devices. Extensive research in im-
plementation and verification was carried out with a
feasibility study [10] and three master theses ([11], 1
13
[12], [13]). We propose a low cost principle which
should be accomplished by a new measurement con- 18

cept, usage of low cost parts and open source prod- 17

ucts in control system design. A fully operational 15


prototype based on a disc–disc CSR principle was
built and is now being evaluated. Optimized for Bi- 16
tumen (remains of distillation of crude oil with a col-
loidal structure) which changes viscosity typically in
a range of 107 it can nevertheless be used for many
other materials.
FIGURE 3: Mechanical setup — top view
The rheometer is intellectual property of vialit
GmbH in Austria.

48
Real-Time Linux Applications

FIGURE 5: Schematic view of electronic


parts

1 The electronic subsystem consists of actuators, sen-


4
sors, amplifying circuits, a data acquisition card and
5
an EPIC/CE embedded PC. There are three sensors
2 mounted on the rheometer:

7
6
Distance sensor: A TCRT reflex sensor was se-
ϑ 8
9 3 11 lected in [11, S. 45]. It is measuring the dis-
12 13
10
tance between a vertical bolt (Fig. 3 - 17,18)
14
ϑ mounted on the rotor and a fixed position on
the rheometer. Using the geometry the rota-
tion degree can be determined.
Stator temperature sensor: A PT100 (Fig. 4 -
10) sensor is used to determine the tempera-
FIGURE 4: Mechanical setup — front ture in the stator mounting clamp. Further
view on this temperature is used for control of the
peltier elements.
The rheometer consists of a rotatory and a static
shaft. The rotor is carried by a small metal ball Rotor temperature sensor: An Analog Devices
bearing (which is kept centered by a small magnet AD592CN sensor (Fig. 4 - 6) is used in the ro-
in the stator). Stabilization occurs through a mag- tor head to determine the temperature of the
net on the top which is also lifting the rotor to keep heat filament. Later on this temperature is be-
friction down. The distance needs to be trimmed to ing used to correct the temperature loss due to
a point, where the magnetic force is stabilizing the the poor heat transfer capacity of the probe.
rotor but does not actually lift it completely. Only
a reduction of the weight on the ball is feasible and The following actuators are used:
gives a positive side effect to reduce friction forces.
On top of the rotor a tripod is mounted. Each foot Peltier elements: Two peltier elements (Fig. 4
of this tripod has a magnet glued into epoxide resin. - 12) mounted on the stator clamp are heat-
Each of these magnets is centered between two coils ing and respectively cooling the stator and the
in a Helmholtz type configuration. The constant probe.
magnetic field between those coils causes the mag-
nets to move, the combination is an electrodynamic Heating filament: Due to the poor heat transfer
actor. AC-currents are causing the rotator to oscil- stator tempering is not sufficient. A heating
late. Both parts, rotor and stator are heated to a filament (Fig. 4 - 7) is inserted into the rotor
selectable temperature. head to enable the possibility of exact temper-
ature control in the probe. An extensive re-
search on the heat propagation model can be
3.2 Electronic design found in [12].

Both, sensors and actuators need amplifying circuits


which have been designed in [11] and [12]. Con-
Actuators
nected to the amplifying stage a Sensoray S526 data
Helmholtz
Coils
acquisition card is used. The card driver is embed-
Heating
Peltier Elements
ded through COMEDI [7]. Connected through the
PC/104 interface of the EPIC/CE embedded PC the
Heating
Filament Data
Acquisiton PCI 104
EPIC / PC 104
System
data acquisition card communicates with the embed-
Amplifier
Card
(DAQ)
BUS
ded PC.
Circuits
Sensors CPU
PCI 104

Distance Sensoray s526


(TCRT)
3.3 Control software
Temperature ϑ
Top

Temperature ϑ Simulink generated targets are providing a good pos-


Bottom
sibility to run control processes in a real-time en-
vironment. Though it was not made to implement

49
Platform independent remote control and data exchange with real-time targets

complex structures in form of workflows. Very often controllers (running on the embedded target) de-
and especially in the rheometer case program states pending on the actual requirements for the measure-
and workflows are needed. After a certain measure- ment steps. Additionally it is possible to change
ment point has been recorded and evaluated there some of the parameters if necessary. All additional
is a need to go on to the next one. When moving parameter can still be changed by an control engi-
on parameters have to be changed according to the neer using q/x/jRTAI-Lab from the RTAI project
workflow and its requirements. Using the framework [5] parallel to the running GUI.
from Section 2.2 a state flow and a GUI (Fig. 6) for
an industrial customer was developed.

FIGURE 6: Rheometer GUI

FIGURE 8: Rheometer state flow (auto-


matic)

4 Conclusions

We have shown tools and methods to fast prototype


control applications using a real-time linux system.
FIGURE 7: Rheometer state flow (main) With this toolsets it is possible to program a plat-
form independent GUI for a control application with-
The state machine (running on the client with the out loosing flexibility in automatic control develop-
GUI) is switching on and off parts of the automatic ment.

50
Real-Time Linux Applications

5 Acknowledgment [7] http://comedi.org/, Linux Control Measure-


ment Device Interface
The author gratefully acknowledges the partial finan-
cial support for the work presented in this paper by [8] http://www.rtaixml.net//, Server component of
the Austrian Research Promotion Agency and the the real-time Application Interface (RTAI)
Austrian COMET program supporting the Austrian
Center of Competence in Mechatronics (ACCM) and [9] http://www.rtai.org/, RTAILab Toolchain to
the company Vialit GmbH, Austria. develop block diagrams

[10] Low–Cost Rheometer–Konzeptstudie, Bernhard


References Zagar, 2005, Institute for Measurement
Technology, JKU, Linz, Austria
[1] http://www.mathworks.com/, Mathworks Mat-
[11] Systemanalyse und Prototyping für ein Low
lab and Simulink
Cost Rheometer, Daniel Schleicher, 2007, In-
[2] http://www.scicos.org/, Open source block dia- stitute for Measurement Technology,
gram modeler/simulator JKU, Linz, Austria
[3] http://www.ni.com/labview/, National Instru-
[12] Hochpräzise Temperatur–Regelung für ein Low
ments LabVIEW
Cost Rheometer auf Basis von real-time-Linux,
[4] Scilab/Scicos and Linux RTAI — A unified Klaus Oppermann 2007, Institute for Mea-
approach, Roberto Bucher, Silvano Balemi, surement Technology, JKU, Linz, Aus-
2005,Proceedings of the 2005 IEEE Con- tria
ference on Control Applications
[13] Low Cost Rheometer for dynamic viscosity mea-
[5] http://www.rtai.org/, Linux real-time Applica-
surement of bitumen with real-time Linux, Mar-
tion Interface
tin Leonhartsberger, 2011, Institute for
[6] http://www.linuxcnc.org/, Home of users of the Measurement Technology, JKU, Linz,
Enhanced Machine Controller Austria

51
Platform independent remote control and data exchange with real-time targets

52
Real-Time Linux Applications

Using GNU/Linux and other Free Software for Remote Data


Collection, Analysis and Control of Silos in Mexico

Don W. Carr, Juan Villalvazo Naranjo, Rubén Ruelas, Benjamin Ojeda Magaña
Universidad de Guadalajara
José Guadalupe Zuno 48, Col. Los Belenes, C.P. 45101, Zapopan, Jalisco, México
[email protected]

Abstract
We have developed a system based on GNU/Linux, Apache, PostgreSQL, Zend Framework, the RE-
ACT control engine, and various free software projects to create a system for remote data collection,
analysis, and control of grain silos. This system allows grain operators to monitor/manage grain silos
from a distance. The system includes a GNU Linux based computer, locally on site at the silo, with the
REACT control engine installed, and, a GNU/Linux server in the cloud running the Zend Framework
and the PostgreSQL database. Communication with the server is via GET/POST.

1 Introduction 2 What is monitored and con-


trolled at each site
In this section, we describe what is monitored and
controlled at each site. Locally we must be able
to monitor the temperature and relative humidity
For grain operators to be able to monitor/manage of the environment, and columns of temperature of
grain silos from a distance, they must have access the grain inside of the silo. We must also be able
to grain quality evaluations done locally on-site at to turn the ventilation fans for the silos on and off,
the silo, history of outside temperature, relative hu- and also sense when they are on to be sure they are
midity, and history of temperature measurements in- working or to know when they have been turned on
side the silo. In addition, operators must be able to manually.
set the grain ventilation parameters remotely so that
ventilation can then happen automatically on-site,
with no need to have a person locally checking tem-
perature and relative humidity and deciding when
the fans should be turned on and off. All of this can
be automated except that the grain quality evalua-
tions must be done by an actual human being using
laboratory equipment. We can however provide web
forms so that the data from these evaluations get into
the database on the website the same day that they
are carried out. This design also allows the manage-
ment team to be distributed at any location around
the world where there is Internet service. It of course
also allows all interested parties such as those that
own the grain, or loaned the money for the grain, to
verify that the grain is still in the silo and in good FIGURE 1: Temperature and Relative Hu-
condition. midity Data from Poncitlan, Mexico

53
Using GNU/Linux and other Free Software for Remote Data Collection, Analysis and Control

The temperature columns are typically located such 1-Wire cable runs, via a 16:1 multiplexor. Each ca-
that there is a maximum of 5 meters to the next ble run can be up to 50 meters long, and is for one
column, or 2.5 meters from the wall to the nearest column of temperatures inside the silo. Each column
sensor, such that there will be a temperature mea- must be hung from the ceiling of the silo and bound
surement no farther than 2.5 meters from every point together with a 1/4 inch steel cable using shrink-fit
in the silo. Vertically the temperature sensors have tubing. The steel cable is necessary due to the ex-
been located every 1.5 meters, but, will switch to treme forces that can be generated when the body of
every 0.5 meters to be able to more accurately esti- grain moves as the silo is being unloaded. The Dallas
mate the height of the grain. More on how we esti- 1-Wire temperature sensors must be soldered every
mate the height of the grain later. The number of 0.5 meters or 1.5 meters depending on the accuracy
temperatures measured inside a silo can run into the of volume measurements desired.
hundreds, depending on the size of the silo. Every
Finally, to turn the ventilator fans on/off, we
10 minutes, the outside temperature, relative humid-
need a relay board with two relays, one that is nor-
ity, all of the temperatures in the columns inside the
mally open and goes in parallel with the ON button,
silo, and the status of the ventilation fans, are logged
and one that is normally closed, and goes in series
locally for backup, and, also communicated to the re-
with the OFF button. The ON relay is pulsed to turn
mote web server. At the same time, we check for new
the ventilator fan on, and, the OFF relay is pulsed
control parameters, and download that at the same
to turn the ventilator fan off. This allows both au-
time, if necessary.
tomatic and manual control of the ventilator fans.
A graph of temperature and relative humidity
The architecture for an installation using 900
taken from near Poncitlan, Jalisco, for one of the
MHz radios is shown below. The SBC7300 is the
initial tests, is shown in Figure 1.
GNU/Linux hardened computer, the CL4490 is a 900
MHz radio, the X505 is a master controller for ven-
tilation fans that can connect multiple relay boards,
3 How the monitoring is done the X105 is a slave device in case that more venti-
lation fans need to be connected, and the W100 is
On site, we use one hardened GNU/Linux computer a weather station. The T200 is the PCB for read-
running our REACT control engine [1],[2],3] to read ing up to 16 temperature columns, and can be daisy
the values from all of the data acquisition devices. chained with an RS-485 port.
We are currently using the Technologic Systems TS-
7300 single board computer, but, we are also testing
the TS-7500 and TS-7553 and plan to switch to one
of these for future projects, as they are cheaper and
more compact. None of these have moving parts, and
will withstand temperature up to 70 degrees Celsius.
They all have SD card slots for gigabytes of local
storage. All of the data acquisition devices that we
use support the MODBUS protocol, and we com-
municate via RS-232, RS-485, and 900 MHz radios.
The GNU/Linux computer is connected to the Inter-
net via Cellular modem, or DSL modem, or the local
network, depending on availability.
FIGURE 2: Temperature and Relative Hu-
The temperature and relative humidity are read midity Data from Poncitlan, Mexico
using a small PCB with micro-controller and inter-
face for a GE Chipcap-D temperature/relative hu-
midity sensor. The GE Chipcap-D is soldered to a 4 The Server
small PCB that must be located close to the micro-
controller, and protected from the weather. Dis-
The server where all of the data is logged can be
cussing the optimum location of this system is be-
located anywhere in the world where there is an
yond the scope of this paper.
Internet connection. The actual servers we use
The temperatures inside of a silo are read us- for this project are located in Karlsruhe, Germany,
ing Dallas 1-Wire temperature sensors, and con- and run Debian GNU/Linux, with the Zend Frame-
nected via Category 5, UTP cable. There is a micro- work, and PostgreSQL database installed. The
controller that can communicate with up to 16 Dallas monitored data at each silo is uploaded every 10

54
Real-Time Linux Applications

minitues via HTTP/POST, and the ventilation pa- necessary, they will be able to disable ventilation en-
rameters are checked for changes every 10 minutes tirely via the web page for the silo in question. To
via HTTP/GET. If we go for more than 20 minutes start, the managers must put in the range of humid-
without communications from a particular site, we itys/temperatures to ventilate the grain. For corn
mark it as offline. We use only HTTPS for these re- for instance, if you ventilate the grain in the range
quests, and, each request must be accompanied by of 70-75% relative humidity, you will end up with
the correct security key for each silo, or the request grain that is 13.5 - 14% humidity by weight. We do
is rejected. Notice that the ventilation parameters plan to automate the process further, by letting the
for each silo can be set from anywhere in the world manager only put the target humidity of the grain,
with Internet access, and they will be transferred and and the algorithm will then automatically try to hit
applied at the local site within 10 minutes. Thus, that target. We should note, that the parameter of
the team managing the silos must only log into the the grain that matters more than all others is the
server for all management functions of the various humidity. If it is too humid, then there will be mold
silos. Onsite at each silo, we will not require grain and other things that will destroy the grain. If it is
conservation experts, only technical people to main- too dry, the grain will crack and break. After hu-
tain the equipment. There must be experts on grain midity, we then want the grain as cool as possible.
certification/quality to visit all of the silos periodi- If the grain is just above zero degrees Celsius, it can
cally and carry out quality analysis and certify the be stored for years. If it is over 30 degrees Celsius, it
quality of the grain using portable laboratory equip- will only last a matter of months. In hotter regions
ment. like the state of Sinaloa, Mexico, when the humidity
is in the correct range, the temperature is typically
much hotter than we would like, and, thus, the grain
stored in Sinaloa must be used in a relatively short
period of time.
To estimate how much grain is in a silo, we ba-
sically need to know the level of the grain in the
silo. From this, using simple geometry, we can cal-
culate how many cubic meters of grain are in each
silo. We can then estimate the number of metric tons
of grain in the silo using the approximate number of
metric tons of grain per cubic meter. So, the ques-
tion is now can we estimate the level of grain in the
silos? The answer is in the temperature readings.
In the silo, the temperature above the grain makes
wide swings based on the outside temperature, sun
hitting the silo, cloud cover, weather conditions in
general. Grain, however, is a natural insulator, and,
temperatures inside the grain remain very constant.

FIGURE 3: Overall System Architecture

5 Control algorithms and anal-


ysis of the data
The basic algorithm for controlling the ventilation
fans is to turn the ventilation fans on when the rel-
ative humidity and temperature are in the correct
range, and, we are not in the hours when electricity
is most expensive, if specified. Obviously, the fans
are then turned off when the conditions are not met.
Further, when managers determine that the grain is FIGURE 4: Data from one Temperature
adequately ventilated, and ventilation is no longer Column in a Silo

55
Using GNU/Linux and other Free Software for Remote Data Collection, Analysis and Control

6 Conclusions References
[1] Don W. Carr, R. Ruelas, Ramón Reynoso, Anne
Santerre, 2006, An Extensible Object-Oriented
We have developed a novel system using GNU/Linux Instrument Controller for Linux, The Eighth
and other free software projects, that allows silos Real-Time Linux Workshop, Lanzhou,
to be managed/monitored from a distance, so that Gansu, China.
there is no need for grain conservation experts at [2] Donald Wayne Carr Finch, Rubén Ruelas, Raúl
each site, and, so that all stakeholders have access Aquino Santos, Apolinar González Potes, 2007,
to the grain quality and also the quantity of grain Interlocks For An Object-Oriented Control En-
in each silo. The system allows managers to set gine, The 3rd Balkan Conference in In-
the parameters of humidity and temperature when formatics, Sofia, Bulgaria.
the grain will be ventilated, and then monitor the
progress/effectiveness of the ventilation via the qual- [3] Donald Wayne Carr Finch, Rubén Ruelas, Apoli-
ity analysis done periodically on-site, and, disable nar González Potes, Raúl Aquino Santos, 2007,
ventilation when goals have been reached. Because REACT: An Object-Oriented Control Engine for
of GNU/Linux and the many other free software Rapidly Building Control Systems, Conference
projects, we were able to complete this project much on Electronics, Robotics, and Automo-
quicker and at a lower cost. tive Mechanics, Cuernavaca, México.

56
Real-Time Linux Applications

A Status Report on REACT, a Control Engine that runs on top of


GNU/Linux for Creating SCADA and DCS Systems

Don W. Carr, Rubén Ruelas, Benjamin Ojeda Magaña, Adriana Corona Nakamura
Universidad de Guadalaara
José Guadalupe Zuno 48, Col. Los Belenes, C.P. 45101, Zapopan, Jalisco, México
[email protected]

Abstract
REACT is a control engine that runs on GNU/Linux, written in C/C++, for creating SCADA and
DCS systems, or just simple controllers for a single machine or laboratory equipment. The design was
based on the experience of the author on large SCADA systems for natural gas pipelines, natural gas
distribution systems, and water distribution systems, software for testing military, industrial, and light
vehicle transmissions, and research on control systems in general. REACT was designed as a general
purpose control engine to scale similar to how GNU/Linux scales: from tiny embedded devices in the
field, all the way up to large computers. Thus, REACT can run as the central software on a server
as a SCADA system, and can also run on small hardened devices as the software for an RTU, or DCS
controller.

1 Introduction is supported through a code generator that automat-


ically generates the code to link the script function
to the object.
The development of REACT [1],[2],[3], was started
in 2002 for Masters students at the University of
Guadalajara, so that students would be able to actu-
ally see the software used for real-time systems and 2 REACT Driver Model
take the mystery away. It was first used for a project
of two electronics masters students to automate a liq- When we started out, the code to communicate with
uid chromatography instrument, with robotic system the data acquisition card was compiled directly into
for changing samples, for the molecular biology de- REACT, and communicated with the factory driver
partment at the University of Guadalajara. REACT that was a shared object file (.so). The problem is
allows the configuration of all the common objects that it created a dependency on a particular shared
often referred to as point types or tagnames that are object file, and the code had to be commented out
common for a SCADA or DCS systems: analog in- for other systems that did not use this code. So, we
put, analog output, discrete input, discrete output, quickly moved to putting the driver interface code
timed discrete output, pulse count input, PID con- into shared objects that can be loaded on demand
troller, etc. A variety of other object types have been using the system API dlopen(), dlsym(), dlerror(),
created for data logging, control, and monitoring, dlclose(). To support objects, we use this API to load
etc, for special applications such as laboratory instru- an object factory function that is called to instanti-
ments, air conditioning monitoring/control, pump ate an object that inherits from our abstract base
station monitoring/control, silo monitoring/control, class iodriver t which defines the interface to all I/O
etc. There is also a scripting language with the drivers. The factory function for each driver must
scripts tokenized for faster execution. Scripts can call return a pointer to iodriver t. We currently have
member functions of all the REACT objects, which drivers for Modbus RTU, Modbus ASCII, Modbus

57
A Status Report on REACT, a Control Engine that runs on top of GNU/Linux

TCP/IP, various PC data acquisition cards, Dallas be done using code generation from a configuration
1-Wire via the OWFS project, a simple ASCII pro- file that named the database field names/types, and
tocol that we developed, and, some simulators that the corresponding field names in the objects. We
we developed that load as drivers. This driver model also quickly realized that this same configuration file
allows us to write a simple simulator that loads as a could be used to convert existing projects with de-
driver to use during testing, and then switch easily limited text files, to use database files. Finally, the
to the real driver in the field. It allows us to keep existing user interface was a text editor to edit the
the code compact for memory constrained embedded delimited text files, and we would need a new user
systems. interface to edit the object configurations/configure
REACT. With these same configuration files, if we
add a few prompts, and a few other simple things,
3 REACT Object model we can generate either a web based user interface for
editing the configuration, or, a text interface, based
on curses, to edit remotely via ssh. We have im-
We needed an object model to implement all of the plemented the code generator for converting existing
control engine objects, often referred to as point projects, and to read/write the configurations from
types or tagnames, or just tags. Actually, we identify REACT, and are working on generating a web in-
all control objects by their tagname which serves as terface so that we can offer an online configuration
a unique ID. As you will see, there is a need to have editor. For now, we are using the SQLite console ap-
a unique ID to refer to objects from scrips, displays, plication to remotely edit configuration parameters
etc. We have four basic types of objects: 1) Input ob- via ssh.
jects that receive process values via a device driver,
2) Output objects that send/write process values via See below, the configuration file for discrete out-
a device driver, 3) Control objects that access pro- puts in Figure 1, and then, the user interface gener-
cess values via input objects, and send process values ated from this file, in Figure 2.
via an output object, and, 4) Objects that only cal-
culate values, do data logging, do user interfaces, etc.
Currently, all of the object types must be hard-coded
into REACT, but, we are in the process of switching
them to be all loaded on demand at load-time, if they
are needed for a particular project. We have started
by implementing dynamic loading for one object type
(analog input).

4 REACT Configuration
Like many projects, we started out storing all con-
figuration in a directory full of delimited text files.
However, there are possible corruption problems, and
it is tedious to copy all of the files to the target sys-
tem, and retrieve all of the configuration files, af-
ter local changes have been made, for backup. For
this reason, we are moving to putting all of the con-
figurations into SQLite, to eliminate the problems
of corruption by using transactions, and, thus sim-
plify copying the configurations to a target system,
and backing up, since SQLite stores the complete FIGURE 1: Config File used to Auto-
database with all tables in a single file. SQLite is Generate code for Discrete Outputs
also extremely compact and can itself be loaded when
needed, and then unloaded using dlopen(), dlsym(),
dlerror(), dlclose().
Faced with the need to write all of the functions
to read/write the configuration for ALL object types,
we quickly realized that it was repetitive, and could

58
Real-Time Linux Applications

things like sleep, wait for condition to be true, wait


for user input, write text to output, etc.
These scripts have proven to be very useful and
have been used on many different projects. Below is
an example of a script to control the level in a tank
between a low point and a high point. We basically
open the valve when we reach the high point, and
close the valve when we reach the low point. The
tag hi level is a discrete input connected to a level
sensor that is on/true when the water is at or above
the high point. The tag lo level is a discrete input
connected to a level sensor that is on/true when the
water is at or above the low point. The tag valve 2
is a discrete output that activates a drain valve that
is opened by sending the value true, and closed by
sending the value false. The 15 second wait was in-
troduced to avoid the valve opening and closing in
rapid succession when a wave was generated.

FIGURE 3: Example Script to Control


FIGURE 2: Auto-Generated Form for Dis-
Level in a Tank
crete Outputs
As an alternative to scripts, we are also work-
5 REACT Scripts ing on state diagrams [4],[5] to describe control algo-
rithms/sequences. For state diagrams, we will still
use the script commands described here to specify
Early on, we realized the need for simple scripts so actions that take place on arriving at a state, or leav-
that common users, that may be control experts, but, ing a state.
are not programmers could extend the funtionality
of REACT by creating, for example, test sequences,
control sequences, interlocks, alarm sequences, shut-
down sequences, etc. Thus, we developed a parser 6 Conclusion
to tokenize scripts for faster run-time execution, and
a code generator to bind scripts to actual C++ ob-
We have created REACT, a very useful free soft-
ject method calls. The code generator works by pro-
ware application that runs on GNU/Linux for cre-
cessing the object header file with a special keyword
ating SCADA systems, DCS systems, simple ma-
(SCRIPT OBJECT) added (in comments) above ob-
chine/instrument controllers, etc. REACT is ex-
jects that support script functions, and then another
tensible through creating new objects written in
special keyword (SCRIPT FUNCTION) above each
C/C++, and also, for non-programmers by writing
method that can be executed from a script. Using
REACT scripts. We make extensive use of code gen-
the tagname, at load-time, we can identify the ob-
eration for repetitive programming tasks to save time
ject, verify it has the method with the given name
and eliminate errors, and enable new funtionality to
and parameters, and bind to that object/method.
be added faster. We use dynamically loaded shared
Thus, at run-time, when the script reaches the line
objects for drivers and REACT objects, to elimi-
with this method call, the given method is called, on
nate dependency conflicts, reduce code size, and al-
the given object, with the given parameters.
low new drivers/object types to be added without
We also needed to create system functions for re-compiling REACT.

59
A Status Report on REACT, a Control Engine that runs on top of GNU/Linux

References [3] Donald Wayne Carr Finch, Rubén Ruelas, Apoli-


nar González Potes, Raúl Aquino Santos, 2007,
REACT: An Object-Oriented Control Engine for
[1] Don W. Carr, R. Ruelas, Ramón Reynoso, Anne Rapidly Building Control Systems, Conference
Santerre, 2006, An Extensible Object-Oriented on Electronics, Robotics, and Automo-
Instrument Controller for Linux, The Eighth tive Mechanics, Cuernavaca, México.
Real-Time Linux Workshop, Lanzhou,
Gansu, China. [4] D. Harel, E. Gery, 1996, Executable Object Mod-
eling with Statecharts, Proceedings of the
18th International Conference on Soft-
[2] Donald Wayne Carr Finch, Rubén Ruelas, Raúl ware Engineering, pp246–257.
Aquino Santos, Apolinar González Potes, 2007,
Interlocks For An Object-Oriented Control En- [5] D. Harel, 1987, Statecharts: A visual formalism
gine, The 3rd Balkan Conference in In- for complex systems, Science of Computer
formatics, Sofia, Bulgaria. Programming, no. 8, pp231–274.

60
Real-Time Linux Applications

Development of an optical profile measurement system under RTAI


Linux using a CD pickup head

Gerstorfer, Gregor
Institute for Measurement Technology, Johannes Kepler University Linz, Austria
Altenbergerstr. 69, 4040 Linz
[email protected]

Zagar, Bernhard G.
Institute for Measurement Technology, Johannes Kepler University Linz, Austria
Altenbergerstr. 69, 4040 Linz
[email protected]

Abstract
The development of a low cost profile measurement system suggests – because of the feature low cost –
the use of an open source control system. In this work, the authors combine a commercially available
compact disc (CD) pickup head, the National Instruments NI-PCI 6221 data acquisition card, and a PC
running RTAI Linux. The setup is used to scan along a CD and measure the distance between tracks.
The setup will be used in a lab course by students to get to know rapid control prototyping systems and
open source alternatives to expensive industrial systems.

1 Introduction sensor system, several investigations have to be car-


ried out. These investigations ought to deliver in-
formation about the electrical and mechanical struc-
The words low cost are nowadays omnipresent, so tures and characteristics of those high quality but
this paper will fulfill these words and explains the use still low cost pickup heads. Main focus is given to the
of low cost components in combination with RTAI so called focus error signal, conveying the sought af-
Linux. The centrepiece of the setup is a commer- ter measurement information of a change in distance
cially available optical pickup head of a standard CD to the reflective surface. This will be presented in
drive. Those pickup heads are very inexpensive but the first part of this paper. The second part intro-
still show a high level of accuracy. This is the reason duces the software and hardware parts of the system.
why the authors use them to scan across a specimen’s At the end, some measurement results and how the
surface for measuring its profile. system could be used in a students lab will be pre-
Several different applications using an optical sented.
pickup head as a measurement system can be found
in [1], [2] or [3]. In this paper there is a focus on the
pickup head itself but also on the data acquisition
and control system which is done with RTAI Linux
2 The optical system of the CD
and RTAI–Lab. There are more than just several ap- pickup head
plications which already run RTAI Linux (can par-
tially be found on the project’s homepage [4]). Here
Optical storage systems usually consist of a rotat-
the application is also aiming to motivate students
ing data storage medium (i.e. a CD) carrying the
to work with open source software.
data which is read out by the mechano-optical pickup
Preliminary to the use of a CD pickup head as a head. The included electronics control the system

61
Development of an optical profile measurement system under RTAI Linux using a CD pickup head

and convert the data. In this section the used shape at the detector array is either elliptic or circu-
pickup head (Mitsumi PXR–550X) from a commer- lar, depending on whether the reflection of the beam
cially available CD–drive is investigated. takes place in the focal plane of the objective lens
(circular beam) or if the reflection takes place out
Detector of the focal plane (elliptic shape). The orientation
(NE–SW or NW–SE) and size of the elliptic beam
pattern depends on the location and distance from
Astigmatic Lens the focal plane.

2.5
Beam Splitter
2

1.5

RF in V
1

0.5

0
0 20 40 60 80 100 120 140 160
z in µm

Laser Diode
2
A B
1
D C
FE in V

Too Close
0
Collimator A B A B
-1
D C D C
Too Far In Focus
-2
0 20 40 60 80 100 120 140 160
z in µm
VCM

Objective Lens
FIGURE 2: Bottom: The FE signal’s char-
acteristic s–curve with corresponding beam
shapes at the detector array. Top: The RF
signal.
Reflective Surface The mentioned four elements of the detector ar-
ray are named A, B, C and D. Each of them deliv-
ers a photocurrent depending on the illuminance in
FIGURE 1: The working principle of a CD the respective area. These currents are subsequently
pickup head – focused case. converted into voltages A, B, C and D. From these
voltages, the so–called focus error (FE) signal can be
Figure 1 shows the principle of a CD pickup head, derived:
which consists of a laser diode operating at a wave-
length of 780 nm, a linearly polarizing beam split- F E = (A + C) − (B + D). (1)
ter, a collimator, an objective lens and an astigmatic
lens in front of the detector array. The emitted laser When a specimen with a reflective surface is
beam is polarized by the polarizing beam splitter, moved towards the pickup head, the measured FE
collimated and reflected at the specimen’s surface af- signal shows a characteristic s–shape with an approx-
ter passing the objective lens. This objective lens can imately linear range over an input range of ±3 µm,
be moved by the so called voice coil motors (VCM). see Fig. 2. The reflection takes place exactly in the
The scattered back light takes its way back through focal plane, when the FE signal becomes zero.
the objective lens, passes the beam splitter and is di- Beside the FE signal also the total illumination
rected via the astigmatic lens to the detector array. integrated over all four photodiodes, the RF signal,
The detector array is a four–quadrant photo diode, is of interest:
in Fig. 2 their arrangement can be seen. Because of
the characteristics of the astigmatic lens, the beam RF = A + B + C + D. (2)

62
Real-Time Linux Applications

The RF signal corresponding to the above mentioned At the begin of a profile measurement at first a
FE signal is shown in the upper part of Fig. 2. voltage sweep from −0.5 V up to 0.5 V must be ap-
plied to the VCM and the corresponding FE signal
has to be acquired so that we can derive the actual
3 Profile measurement with a sensitivity of the FE signal to the specimen’s rough-
ness according to its reflective properties. Later some
CD pickup head measurement results will be presented based on this
idea.
The fact that the FE signal shows an approximately
linear range in its s–curve is used for profile mea-
surement: a specimen is placed exactly in the focal 4 The setup using RTAI Linux
plane and is then moved in any direction within the
focal plane to obtain a scan. The variation of the
FE signal is then the measure for the profile. Since In this work there is also a focus on the controlling
some surfaces show better reflective properties than hard- and software because the students in the course
others do, and the FE signal’s amplitude depends on should get to know and use open source software.
the amount of scattered back light, a way to obtain
a proportionality factor between FE signal and pro- Host
file roughness is sought after. Therefore one has to Master
assume, that the reflective properties of a specimen MATLAB/Simulink TCP/IP
RTAILab
are approximately constant in the investigated area. Realtime Workshop

Hence the voice coil motors are used to ascertain TCP/IP


Virtual Box Realtime Task
the proportionality between FE signal amplitude and
specimen roughness. RTAI Kernel
Comedi
QRtaiLab
For having the voice coil motors as a reference, NI PCI 6221
their characteristics were investigated first (see also ...
[5]). In Fig. 3 the results of the investigation us-
ing a laser vibrometer (Polytec OFV–505) measur- optical
ing the VCM–displacement when different voltages pickup head
are applied to the VCM are shown. This is a static
investigation, dynamic investigations are shown in a
previous work [5]. The results yield to a sensitivity FIGURE 4: Hard- and Software structure
of the VCM (only in axial direction) of of the setup.
z mm
= 1.1 (3) The structure of the entire system is shown in Fig. 4.
ua V An ordinary Desktop PC (Intel P4 2.8 GHz, 512 MB
in the linear range between −0.5 V ≤ ua ≤ 0.5 V. RAM) is used as the Master PC. The Master also
contains the data acquisition device NI PCI 6221.
−3
x 10 Via TCP/IP the Master is connected to the Host PC.
1
The Host is running Windows with an installed Ver-
0.8
sion of MATLAB/Simulink (MATLAB R2010a [6])
0.6 and also Ubuntu (Lucid Lynx 10.04 [7]) is running
0.4 in a Virtual Machine (Oracle VM VirtualBox 4.0.12
0.2 [8]).
z, x in m

0 On the Master PC the Linux distribution


−0.2 Ubuntu (Lucid Lynx 10.04) is installed. A new ker-
−0.4 nel (version 2.6.32-2, downloaded from [9]) with the
−0.6
applied RTAI patch (version 3.8, downloaded from
axial [4]) is configured to make a realtime system out of
−0.8 radial
it. Several pages in the web provide how–tos [10],
−1
−1 −0.5 0 0.5 1 [11] which can be used to configure the RTAI ker-
u in V
nel. Also the driver interface Comedi (version 0.7.76,
downloaded from [12]) is installed and RTAI is mod-
FIGURE 3: The relation between VCM ified so that the data acquisition device can be used
movement and supplied voltage. in realtime tasks.

63
Development of an optical profile measurement system under RTAI Linux using a CD pickup head

During the installation of RTAI a folder named The sampling time amounts to ts = 0.1 ms.
MATLAB is created. This folder is copied into the
MATLAB directory of the PC where – after a small
setup – the realtime task then can be developed in
Simulink. Since the configured RTAI kernel contains
Comedi support there are several Comedi blocks in-
cluded. With the realtime workshop (included in
Simulink) the code is generated and can be trans-
ferred to the Master. Subsequently, the code has to
be compiled (the generated code contains a makefile)
and can be started as long as the required realtime
modules are loaded. The Ubuntu installation in the
Host’s virtual machine has the same RTAI patched
kernel and loaded modules as the Master. With QR-
taiLab [13] a connection to the realtime task can be
established to view scopes and to change parameters
of the realtime task.
In this work the NI PCI 6221 data acquisition FIGURE 5: Screenshot of QRtaiLab plot-
card measures the voltages A, B, C and D (see sec- ting acquired signals.
tion 3) as well as the voltage it applies to the VCM
In Fig. 5 a zoomed-in view of the processed FE signal
ua .
and the corresponding VCM voltage is shown. The
linear range of the FE signal crosses zero when the
VCM voltage amounts to ua = 415 mV. Furthermore
the sensitivity of the FE signal can be computed by
considering Eqn. 3 to
z µm
= 2.54 . (4)
FE V

Focus Error signal


2

1
FE

−1

−2
−0.02 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14
t in s
5 Measurement example 0.5
VCM voltage

0.45
u in V

0.4
O

0.35
As a measurement example we first measure the pro-
−0.02 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14
file of a CD. Hence CDs are easily available, their re- t in s
flective characteristics are good and because they are
one of the few available specimen with a microstruc-
FIGURE 6: FE signal acquired while the
ture on it they are used as specimen.
reference delta voltage (top). Corresponding
First, the CD is placed near and parallel to the VCM voltage (bottom).
pickup head. Then the objective lens is moved by
the voice coil motor in axial direction (a delta volt- Subsequently the VCM voltage is fixed to the zero
age is applied to the VCM). This is done for finding crossing voltage and slightly adjusted so that the FE
the lens’ position where the reflection takes place in signal becomes zero. Now the CD is moved in ra-
the focal plane and for acquiring the data for the cal- dial direction along the pickup head. This is accom-
ibration of the FE signal (see section 3). In Fig. 5 plished by a translation stage (Oriel Encoder Mike
the screenshot of QRtaiLab measuring the signals A, 18011) with a velocity of v = 1 µm s . The acquired
B, C and D and the VCM delta voltage is depicted. data is shown in Fig. 7, please remark that the plot

64
Real-Time Linux Applications

already shows the profile roughness with respect to software were zero and students should learn to use
the travelled distance, both in µm. similar setups for their own projects.
Profile of a CD
1

Acknowledgment
0.5

The authors gratefully acknowledge the partial finan-


0
cial support of the work presented in this paper by
the Austrian Center of Competence in Mechatronics
z in µm

−0.5
(ACCM).
−1

−1.5 References
−2
0 5 10 15 20 25 30 [1] Development of a low-cost autofocusing probe
x in µm
for profile measurement, Kuang-Chao Fan, Chih-
Liang Chu, Jong-I Mou, (12) 2001, Measurement
FIGURE 7: Measured profile of a part of a Science and Technology
CD.
[2] Measurement of Cantilever Displacement Us-
The track pitch is measured and amounts to 1.6 µm ing a Compact Disk/Digital Versatile Disk
which corresponds to the specification of the CD Pickup Head, En-Te Hwu, Kuang-Yuh Huang,
standard. The profile shows a roughness of maximal Shao-Kang Hung, Ing-Shou Hwang, 45 (2006),
2 µm. Japanese Journal of Applied Physics
The setup shows some insufficiencies hence it is
[3] DVD pickup heads for optical measurement ap-
hard to position the pickup head and the CD per-
plications, Stefan Kostner, Michael J. Vellekoop,
fectly parallel. Furthermore the calibration of the FE
125 (2008), Elektrotechnik und Information-
signal is difficult because the reflective characteristics
stechnik (e&i)
are never constant. Another problem occured when
a controller was implemented. Because the FE sig- [4] https://www.rtai.org
nal’s linear range is very narrow the controller is not
able to position the lens in focus but rather switches [5] Development of a low-cost measurement system
from the negative to the positive defocus where the for cutting edge profile detection, Gregor Ger-
FE signal also goes zero. Nevertheless a micro struc- storfer, Bernhard G. Zagar, 2011, Chinese Optics
ture can be measured despite the insufficiencies. Letters

[6] http://www.mathworks.com
6 Conclusions [7] http://www.ubuntu.com

[8] http://www.virtualbox.org
The development of a profile measurement system
based on a optical pickup head using RTAI Linux [9] http://www.kernel.org
was presented. The optical and electronic compo-
nents of a CD pickup head and its working principle [10] https://www.rtai.org/RTAILAB/RTAI-
were introduced. The control and the data acquisi- KubuntuJaunty-ScicosLab-Qrtailab.txt
tion from a remote PC delivered measurement results
which showed the functionality of the setup. Based [11] http://qrtailab.sourceforge.net/rtai installation.html
on this work students will work in a lab course with [12] http://www.comedi.org
open source/rapid control prototyping system. By
using the introduced setup the costs for additional [13] http://qrtailab.sourceforge.net

65
Development of an optical profile measurement system under RTAI Linux using a CD pickup head

66
Real-Time Linux Applications

Process Data Connection Channels in uLan Network for Home


Automation and Other Distributed Applications

Pavel Pı́ša1,2
[email protected]
Petr Smolı́k1,3
[email protected]
František Vacek1
[email protected]
Martin Boháček1
[email protected]
Jan Štefan1
[email protected]
Pavel Němeček1
[email protected]

1
Czech Technical University in Prague, Department of Control Engineering
Karlovo náměstı́ 13, 121 35 Praha 2, Czech Republic

2
PiKRON s.r.o.
Kaňkovského 1235, 182 00 Praha 8, Czech Republic

3
AGROSOFT Tábor s.r.o.
Harantova 2213, 390 02 Tábor, Czech Republic

Abstract

The uLan protocol is the multi-master communication protocol aimed on small RS-485 control net-
works. It provides deterministic media access arbitration and it is open in design from its origin. An
open-source implementation of the protocol has already been available for many years. The article fo-
cuses on its adaptation for use in distributed home appliances (switches, lights and HVAC components
interconnection and control). For resource restricted control nodes, it was a challenging task to imple-
ment a flexible and persistent configuration of data and events direct routing between distributed nodes
without need for permanent operation of commanding master. Because devices do not have resources
to mutually examine their often large objects/properties dictionaries, the mechanism to map properties
values into process data messages slots has been implemented. The message slots act as (virtual) wires
which are setup by configuration tools running on PC which has enough resources to build and visualize
full objects/properties model by examining of connected devices. Examples of developed devices using
developed concept are presented at the end of the article together with tools available to help with fast
prototyping of new devices and their testing in PC environment. The compilation of embedded devices
code as native Linux binaries is quite straightforward because uLAN driver implementation is portable
and provides same API when compiled for system-less nodes, GNU/Linux or Windows operating system
environment.

67
Process Data Connection Channels in uLan Network

1 Introduction level layers need to support examination of instru-


ment type and available properties/variables.
There is a need for a cheap, two wires bus commu-
nication between resource constrained MCU based
node in many projects and application areas. Many uLAN Protocol Overview
standards exist but most of them require a special
MAC hardware to be integrated onto MCU or at- The initial design of instruments control electronics
tached as an additional communication chip. Many has been restricted to Intel-8051 based controllers
technologies have the disadvantage of being propri- due to its availability and price. These devices pro-
etary or at least controlled by (sometimes secretly vide only single UART hardware for communication
held) patents, even those declared as public stan- and their computational power is quite low. But they
dards. The described solution is based on a different offer multi-drop (9-bit per character) feature which
approach. It is targetted to standard UART hard- allows to suppress the need to process these received
ware (with multi-drop or stick parity bit support) data characters (address bit clear / bit 8 = 0) which
available on most MCUs and PC serial port inter- are not a part of message targeted to given instru-
faces and it has been developed as an open protocol ment/network node (module in uLAN terminology).
from its beginning. uLAN defines character values 0 · · · 0x64 with ad-
dress bit set to address target module but only up to
The article is divided to main parts. The first one 64 masters are considered by media arbitration de-
describes protocol basic ideas leading to uLAN pro- scribed later. The value 0 works as the broadcast ad-
tocol design and implementation. Description starts dress. Values from 0x75 · · · 0x7F range have control
from low level frame protocol and describes uLAN and data delimiters role. Values above 0x80 are used
Object Interface (uLOI) higher level layer with a to release the bus by master after it finishes its mas-
brief application example. tering role in one message(s) exchange session. The
The second part focuses on process data ex- whole range above 0x80 is used for bus release to en-
change based on device properties/variables values code releasing node/module address which allows to
mapping into communication channels distributed in enhance fairness of communication channel capacity
publisher-subscriber manner. distribution between nodes. Due to standard UART
behavior and need to synchronize on character ba-
sis, whole time to transfer character includes start
and stop bit in addition to the address indication
2 uLAN Protocol bit. The whole character transfer takes 11 bit times
in uLAN case.
Origin and Initial Target Applications
As a physical layer, RS-485 signal levels, wiring
and transceivers have been selected. Because multi-
The protocol design has been motivated by the need
master operation has been required (as stated above)
of control and data acquisition networking suitable
some mechanism of media access control/arbitration
for next generation of High Pressure Liquid Chro-
has to be defined. The one solution is to use token
matography (HPLC) instruments sets designed by
passing (Profibus, BACnet MS/TP). But it requires
yet future PiKRON company forming group in 1992.
to keep and update nodes lists in each communica-
The HPLC chromatography instruments do not re-
tion node, initial single token selection and its regen-
quire so fast command/data exchange for basic se-
eration after node failure is quite complex. RS-486
tups, but there are many parameters which have to
signalling does allow reliable collision detection on
be setup and should be monitored. The data types
the wire when transceiver is switched to Tx direction.
range from simple one scalar variable setup (wave-
Switching between Tx and Rx direction and link level
length, flow rate) to gradient time program and de-
stabilization is much slower than available data rates
tector data stream (one float at 25 Hz in our case).
as well. However simulation of dominant/recessive
There should be no loss of sampled data but grouping
levels is possible by switching between Tx logic zero
into longer packets is possible (group of 32 samples
level and Rx direction when bus termination with
at the time is used in our case). The requirement
quiet level bias to logic one is used.
has the ability to send some synchronization com-
mands between instruments without latency added uLAN deterministic distributed media arbitra-
by data resending or even polling cycle controlled tion has been partially inspired by Philip’s I2C de-
by a single master (PC). Because the development sign. But to allow full speed data rates during mes-
of new/different instruments and components to the sage data phase and because UART hardware allows
modular system was expected, the protocol higher only control of transceiver Tx/Rx direction (in most

68
Real-Time Linux Applications

cases assisted by CPU code in ISR) only on whole chronously (UART is used) and some delays could be
character time granularity, the arbitration is based caused by latencies in interrupt processing and some
on switching between Tx zero and Rx for whole char- delays are even required for safe transceiver Rx/Tx
acter time (sometimes implemented by break charac- switching without spikes the minimal time is speci-
ter send). Not like in I2C case, the arbitration needs fied as 4 character/byte transfer times Tchr .
to finish before target address and data are sent in
The first phase TarbW waiting time is not the
transceiver fully driven Tx mode. The arbitration
same for all nodes to ensure some distribution of the
sequence is based on self node/module address to
channel capacity between multiple nodes. The wait
ensure unique dominant/recessive sequence for each
time value is counted as
node.
TarbW = ((LAdr − Adr − 1) mod 16 + 4) · Tchr (1)
uLAN is targetted to control applications which
require data receiption acknowledgement and com-
munication exchanges can be simplified by a direct where LAdr is node address of the last node
reply by addressed device during a single arbitra- which has won arbitration and now releases the bus,
tion cycle. Direct reply frame follows directly after Adr is the address of given node which prepares for
initial frame end without media arbitration. Mas- bus use and Tchr is time to transfer one character.
ter releases the bus after last frame belonging to the This setup ensures strict cycling of media access pri-
given session. This is technique used in many other ority between nodes with messages prepared in Tx
standards but the advantage of uLan is mechanism queue when only addresses up to 16 are assigned to
generic enough that there is no need to use special- nodes. If more nodes are used, the cycling between
ized command format knowledge on the master’s side aliasing nodes is not ensured on deterministic basis
of communication and required/expected single mes- but at least helps with some stochastic distribution.
sage session frames sequence can be prepared and The second phase ensures that node with lower
passed to the driver on application level. own address wins arbitration when two or more
The single frame consists of destination address nodes finish the first phase at the same time. The
(DAdr) with address bit set, source address (SAdr), arbitration is based on sending next three dominant
command (Com) followed by frame data characters. level break characters separated from initial one by
The end of data is delimited by one of four control precomputed time intervals Tarb,0 , Tarb,1 and Tarb,2
characters describing the frame end kind. The sim- Tarb,i = ((Adr shr(2 · i)) mod 4 + 1) · Tchr (2)
ple frame consistency check byte (XorSum) follows.
The frame is directly acknowledged if frame end kind If the activity from other node is detected during
specifies that. Then an direct reply frame can follow inactive interval time, the node abandons arbitra-
if indicated by frame end as well. tion and restarts from the first phase. Direct binary
coding and sending of own address as sequence of
Data frame format
dominant recessive character intervals have not been
selected because precise timing would be a problem
DAdr
or
SAdr Com 0 to MaxBlock
of data bytes
uL_End,
uL_Arq,
XorSum through ISR responses. The addition of one dom-
uL_Beg uL_Prq
or
inant start bit and recessive stop bit around each
uL_Aap
arbitration bit would result in even longer phase two
sequence (3 · Tchr · 8 = 24 · Tchr ) length.
FIGURE 1: uLan Frame Format
Bus request and release

Media Arbitration and Its Cost LAdr delay first


(LAdr-Adr-1) connect
delay
Adr
delay
(Adr shr 2)
delay
(Adr shr 4)
transfer of data release
frames beginning of bus by
mod 16 + 4 mark and 3 and 3 +1 and 3 +1 with DAdr of LAdr=Adr
+1 first frame or 80h

The media arbitration is divided into two phases.


FIGURE 2: uLan Media Access Arbitra-
The first phase is bus quiet time which given node tion
waits to ensure that bus is free (TarbW ). The dom-
inant level (break character) is sent after detection The designed deterministic distributed media ar-
of TarbW bus quiet time. If the other node character bitration poses quite significant cost and consumes
is received, arbitration restarts from the beginning. important part of communication channel capacity.
The second phase ensures a deterministic resolution The 11 bit times Tb are required to transfer single
for the case when two or more nodes finish the first character Tchr = 11 · Tb .
phase in same time.
The TarbAll time of whole arbitration sequence
Because characters are sent and processed asyn- (TarbW + 1 + Tarb,0 + 1 + Tarb,1 + 1 + Tarb,2 + 1) is

69
Process Data Connection Channels in uLan Network

bounded by next ranges single message.

TarbAll ∈ h4 + 3 · 2, 20 + 3 · 5i · 11 · Tb (3)
TarbAll ∈ h10, 20 + 35i · 11 · Tb (4) Higher Level Layers

There are multiple higher level services built above


The whole time of one message arbitration cycle
low level uLAN messages and frames protocol de-
consisting of single frame and reception acknowledge-
scribed earlier.
ment represents time interval TarbAll + (3 + ld + 2 +
4 + 1) · 11 · Tb where ld is number of data bytes. If
network with only 10 nodes with addresses 1 · · · 11 Network Control Messages (uLNCS) the com-
is considered, the arbitration overhead is much lower mands from this group allow to check and
due to shorter times of the second phase for modes change module/node assigned network address,
assigned by lower addresses and because maximal check its identification and production serial
length of the first phase applies only in case when number
same node requests bus repeatedly (see equation 1).
The average message transfer time is more favorable Dynamic Address Assignment (uLDY) the
for this case, if full Tx saturation from all nodes is mechanism to unveil newly attached nodes
supposed. The first phase time is 9 × 5 · 11Tb and from new serial product number appearance,
1 × 13 · 11Tb for this case. The second phase from 2 assign them free network address and detect
contributions evaluates to 7, 8, 9, 7, 8, 9, 10, 8, 9, 10, 11 node disconnection or switching off
character times. The average arbitration time Tarb
uLan Object Interface Layer (uLOI) the
settles on (9.6 + 5.8) · 11 · Tb and whole message time
mechanism to retrieve list of device supported
is (ld + 25.4) · 11 · Tb . In case of quite common (for
readable and writeable variables/properties,
our HPLC applications) message length of 256 B and
their names and data types
communication speed of 19200 Bd it takes 1.6122 s to
send 10 messages (one from each station) and over-
head caused by arbitration and other control charac- Only very short description of use of the last
ters represents 10 %. If the whole encoding schema is mechanism fits in this article.
compared to synchronous communication which does
not need any address, start and stop bits, the over-
head causes 50 %. But even synchronous communi- 3 uLan Object Interface Layer
cation requires some bit-stuffing in real applications
protocols and some media access control. On the
The uLOI defines the system how to serialize ob-
other hand, if short messages of 8 bytes each are con-
jects (properties/variables) identification and their
sidered then uLAN protocol makes up much higher
values in transferred messages. The service works
overhead about 300 % (550 % if counted on bit level).
with asynchronous reply as further master transfer
The uLAN protocol compared to CAN can offer after request sends service/command number to spe-
in case of dedicated or FPGA hardware solution up cific uLOI node/module. Multiple queries for ob-
to 10 times higher transfer rates for bus of the same jects values and/or their description can be serial-
physical length because arbitration (requiring propa- ized in a single message. The limitation is given
gation of dominant/recessive level to whole link and only by the maximal length of a request and ex-
back) is running with 11 times slower timing than pected reply messages which is at least 1000 B for ac-
actual data bytes. Other advantage is that during tual products. The controlling application can build
data transfer full active push/pull transceiver mode model representing connected devices and then use
is used which provides better noise immunity and this model to access data and control attached mod-
works well even if only single twisted pair of wires ules/instruments. The objects serialization and iden-
is used. CAN typically does not work well with- tification minimizes amount of metadata to minimize
out ground interconnection. When compared to to- communication overhead. Each object in module is
ken passing networks, uLAN has much simpler (ba- identified only by 16 bit Object Identification Num-
sically none) master node connection to the network ber. No type, name nor data length for plain types is
and minimal delays are caused by node failure or included in regular transfers. All these information
switched off. The significant disadvantage of very has to be obtained by controlling application/system
high overhead for small messages can be adjusted by in advance through predefined OIDs for this purpose.
building higher level protocol in the way that mul- The significant advantage of the protocol and cur-
tiple variables/properties transfers are grouped into rent uLan driver implementation is that mesage can

70
Real-Time Linux Applications

be read from incoming queue by parts and OIDs are are not required for the most tasks of home automa-
directly interpreted and the reply message is build tion systems. That is why use of uLan for heating
again in “driver space” buffers. The second advan- monitoring and control, lights switching and ring-
tage is that reply allows to identify which objects bells has been proposed by team preparing new home
data it contains. This allows to have more data re- automation project at the Department of Control
quest on the fly from different controlling nodes or Engineering.
applications.
uLOI layer supports devices configuration and
The example of system utilizing many of uLAN their state monitoring by higher level systems. But
services is CHROMuLAN HPLC control system de- use of polling cycle by higher level system is sig-
veloped by Jindrich Jindrich and PiKRON Ltd. nificant disadvantage for home automation. The
home appliances has to be equipped by system which
allows direct communication between nodes in re-
Control System Device 1 Device 2
Local
sponse to the incoming events. This is important
Local display
CHROMuLAN display
keyboard, UI not only to short latencies caused by polling cycle
User scripts keyboard
Control logic Graphic and time program
parameters and User and UI but even to allow system to provide at least basic
acquired data IFPS interpretter Interface
Device Device logic functionality even in the case of higher level control
Object tree browser and handler function
Object tree of branches, properties appli−
and application
comunicating
system failure. It would be possible to use uLOI
and process variables cation over uLan objects messages for direct data writes or reads to/from one
Process
Persistent variables and appliance to objects located in other one. However,
storage Temp. Dev 1 Dev 2 device
ULF, ULC storage model model uLan
object
uLan
object para− this would require mutual knowledge of the structure
ULD files (mem)
uLan net. model interface interface meters of appliances and require quite complex and memory
uLan API uLan uLan resource huge OIDs list and types retrieval or made
API and
MCU
API and Contro−
MCU llers
system inflexible by storing other device OIDs into
Operating System
Linux/Windows uLan
support
libraries
support sensors
libraries etc.
firmware in fixed form.
DOS driver

UART RS485 UART UART more


The generic system for building uLan Connection
Control Computer
Hardware (PC)
chip buffer RS485 RS485 HW Network (uLCN) for processing the data exchange
has been designed instead. This mechanism consists
of two main specifications. The first there is de-
fined new uLAN protocol level command/service for
FIGURE 3: uLOI in Devices and Corre-
process data exchange (UL CMD PDO). The mes-
sponding Model Build in CHROMuLAN Ap-
sage of this type contains one or more blocks holding
plication
data corresponding to individual virtual “wires” con-
nected between appliances. Each such wire is iden-
Many other applications have been developed at
tified by its Connection ID (CID) and delivers data
PiKRON company or by other uLAN adopters. I.e.
or events of some type.
Agrosoft Tábor FASTOP and FASTOS systems for
automatic batch distribution of feed to pigs, cows
and their systems for cow milking data collection.
uLAN interconnect the feeding units with RF animal
identification with central database in these systems uLAN PDO Connection Channels
for example.
The subsystem is designed for direct pro-
cess data (PDO) exchange between devices
4 Data Exchange in Home (nodes/instruments). Every data transfer is iden-
tified by connection ID (CID). Design allows to map
Control Application one or multiple uLOI dictionary objects (properties,
variables) as data source or destination for given
The multi-master capability of uLan, very low cost CID. The mapping is stored directly in devices. The
interconnection with use of a phone line grade ca- mechanism allows to transfer multiple CID identi-
bles, free bus topology, non problematic interfacing fied data values in single message. Receiver identi-
between many low cost microcontrollers and stable fies data scope only by CID, no source address or
drivers for PC operating systems are features which device internal uLOI OID assignment or meta-data
speaks for spreading of uLan into other areas as well. format is encoded in PDO messages or directly influ-
uLan is not intended for high speed communication ence the processing. This allows to connect objects
or hard real-time data exchange but these features with different OIDs, group multiple objects under

71
Process Data Connection Channels in uLan Network

Res Lo Res Hi Ext len (el) Ext CID data len (dl) data CID ...
1 byte 1 byte 1 byte 0..el bytes 2 bytes LE 1 (2) byte dl bytes

Table 1: UL CMD PDO Message Structure

a single CID, use broadcast to distribute data into mappings for the same CID. The special form to em-
multiple destination devices or even use more de- bed 3 bytes (OID + single byte) or 4 bytes (OID +
vices as data source for same CID. When device 2 bytes) directly into ULOI PICO or ULOI POCO
receives PDO message, it processes every CID iden- mapping table entry is also supported.
tified data according to configured mapping. CIDs
and their respective data for which no mapping is
found are simply skipped. Only data types compati-
bility between mapped source and destination OIDs Events to Process Messages Mapping
is required and sometimes this requirement can be
even relaxed to some degree. If destination type is The ULOI PEV2C array specifies, which CID/CIDs
shorter then source, remaining bytes are skipped, identified transfers should be initiated when given
counter case is illegal for actual implementation. event number is activated. One event can be speci-
Predefined constant data can be sent in response to fied multiple times to trigger multiple CID transfers.
event activation as well. The ULOI PEV2C array entry specifies event num-
ber to CID mapping and some flags to nail down CID
Command UL CMD PDO (0x50) is specified for
processing.
PDO messages. Message format starts with two
reserved bytes for future static extensions and one
byte follows, which can be used for dynamic PDO
messages header extensions in future. These bytes
should be sent as zero for current protocol version. 5 Example Applications
Each data block is preceded by its CID and data
length. Maximal individual data block length is 127 DAMIC Home Automation Compo-
bytes for actual implementation and is encoded in nents
single byte. Format allows extension to two bytes in
future if needed.
The concept of the uLAN PDO connection channels
is used in a components and appliances set which has
been developed at the Department of Control Engi-
Control of Data Mapping into Chan- neering to cover needs of heating, ventilation, air-
nels conditioning (HVAC), light control and other home
automation tasks:
All configuration/mapping of PDO data source and
processing of received PDO messages is done through
device objects dictionary (uLOI). Exchanged data
and meta-data stored in mapping tables have same
format as is used for uLOI layer properties/data ac-
cess.
The core component are ULOI PICO and
ULOI POCO mapping tables, both with same for-
mat structure. They are accessible as regular uLOI
arrays of four field structures. Each array entry spec-
ifies mapping between CID and object dictionary en-
tries. Simple one to one mappings are specified di-
rectly by entry by OID number. Complex mapping
can specify offset into block of meta-data byte array
instead of direct OID specification. This allows to
serialize multiple objects/OIDs data under one CID,
add execute command after CID data reception and FIGURE 4: uACT 2i2ct - uLan Actuator
distribution into uLDOI objects etc. Another possi- and Temperature Sensor
bility is to process the same received data by multiple

72
Real-Time Linux Applications

The uLAN uLOI, uLCN infrastructure is used on


PC hardware which runs Linux or Windows oper-
ating systems but even Linux equipped access-point
devices or PowerPC based boards are supported by
Linux builds of uLAN driver.

uLAN-Admin

uLAN-admin is a set of Qt library based components


which provide access to devices properties/variables
by means of uLOI protocol, allows devices scanning,
identification, monitoring and administration. The
core component is library ”libulproxy”. The model
of uLan devices and their OI variables is built by the
library in memory and creates abstraction to access
uLAN network components over JSON RPC 2.0 in-
FIGURE 5: uLTH 010 - uLan Room Ther- terface. Thye library provides a routing of uLAN bus
mostat communication through TCP sockets as well. An
utility library ”libulqttypes” take care about con-
uACT (010) an actuator and temperature sensor version of OI variables values between uLAN bus
available in more variants of output and input data format and Qt types. Its primary purpose is
channels count and power stages to decode/encode byte arrays of uLAN communica-
tion to/from QVariant variables. uLAN-admin also
uLMI (010) a device equipped by digital inputs to contains exemplary application ”browser” providing
sense doors and windows state with additional overview of devices on bus, which is based on above
temperature sensor described libraries.
uLSW (010) not an only light wall switch which
allows to map four contacts (left, right x up,
down), their combinations and pres duration
and “double click” to different events
uDIM 010 a multiple channels dimming controller
for 8-230 VAC lights control
uLMO (010) a miniaturized variant of the actua-
tor controller
uLTH 010 a room temperature controller equipped
by local multi-setpoint week program and user
interface logic for program visualization and FIGURE 6: uLAN-admin - ”Browser” Ap-
editing plication Main Window

All above listed components can be combined to-


uLAN-genmod
gether. The temperature controller uLTH can con-
trol a heater equipped by valve controlled by uACT
uLAN GenMod is an application that allows to con-
for example. The uLMI can be used to indicate open
nect a virtual devices to uLAN bus. Each device
window and this state can be routed to the uLTH to
is defined by two files. A graphical representation
switch of heating when inhibitant opens doors for
of a virtual device is described by QML (Qt Model-
ventilation. The designed infrastructure is used in a
ing Language). uLAN description is defined in XDS
thermal recuperation and ventilation units (VECO)
file (XML description), where are device’s name, ad-
as well.
dress, serial number and device’s object interface.
One or more computers can be used to monitor A whole house network and variables interconnec-
and visualize components states and setup parame- tion can be configured by uLAN-admin tool through
ters and time programs over uLOI protocol and or ULOI PICO and ULOI POCO tables, where is de-
can participate in PDO uLCN based data exchange. fined what PDO messages and CIDs device receive

73
Process Data Connection Channels in uLan Network

and send. The application allows save this network ness of the project make it an excellent candidate for
configuration. The network configuration is transfer- smaller hobbyists home automation projects. The
able to real devices. The virtual device can control minimal requirements for small nodes (only UART
the real devices connected to uLAN bus and vice with software parity control) allows to base such de-
versa. signs on a cheap Cortex-M3 or even smaller MCUs.
The design of higher communication layers can be
utilized even in combination with different link tech-
nologies or can serve as an inspiration for other sim-
ilar projects at least.
uLan project is a live thanks to more companies’
and university members’ participation. The actual
version of the code used in multiple real sold prod-
ucts is available from uLAN project SourceForge GIT
repository and file releases archives.

References
FIGURE 7: uLAN-genmod - Application [1] Jindřich, J., Pı́ša, P.: CHROMuLAN
Main window with Two Devices project [online], 2004–2011, Available:
http://sourceforge.net/projects/chromulan/.
6 Conclusion [2] Pı́ša, P., Smolı́k, P.: uLan Communication Pro-
tocol for Laboratory Instruments, Home Automa-
The uLAN protocol and surrounding infrastructure tion and Field Applications, In 15th International
have been used in many applications for years. They Conference on Process Control 05, Bratislava,
include two generations of HPCL instruments sets 2005. Slovak University of Technology. ISBN
(third generation is in preparation now), more agri- ISBN 80-227-2235-9.
cultural control systems and componets, other se- [3] Pı́ša, P., Smolı́k, P.: uLan SF.net
rious production grade and hobbyists projects (i.e. project [online], 2004–2011, Available:
HISC private house control network based on sole http://ulan.sourceforge.net/.
uLOI which componets has been designed around the
year 2005). [4] Pı́ša, P.: ulan Driver and Protocol Base
Documentation [online], 2004–2011, Available:
uLAN uLCN/PDO design started in 2008 and its http://ulan.sourceforge.net/index.php?page=3.
actual version is complete and well tested. The ap-
proach is similar to CANopen dictionary and PDO [5] PiKRON s.r.o.: HPLC Systems Man-
idea but it is more flexible and suitable for wider uals and Products [online], 2011,
size data types, generic arrays and inherits under- http://www.pikron.com/pages/-
laying uLOI layer flexibility. uLOI layer provides products/hplc.html.
network introspection capabilities much better than [6] Pı́ša, P.: Mathematics and electrical processing
many other standards offers. Yet the metadata over- of liquid chromatography detector signal, Ph.D.
head is kept very small for data exchange after initial Thesis, Czech Technical University in Prague,
device model retrieval phase. 2010.
The PDO mapping system has been tested on [7] Němeček, P., Čarek, L., Fiala, O., Burget, P.:
the CTU developed components for home automa- DAMIC - HVAC control system [application pro-
tion during the DAMIC project. The initial versions totype], 2009
of open-sourced management software utilizing Qt
library is being developed as well. uLan driver and [8] MIKROKLIMA s.r.o.: DAMIC prod-
fully portable interface libraries allows to test even ucts for MIDAM Control System [online],
GNU/Linux builds of components and their interac- 2011, http://www.midam.cz/categories/-
tion. The Qt based components builder and dictio- DAMIC-inteligentni-dum.html.
nary sources generator is in development to help new-
[9] DCE MCU HW and SW Development Resources
comers to test capabilities and speed up new nodes
Wiki – Rtime Server at DCE FEE CTU [online],
design.
2011 http://rtime.felk.cvut.cz/hw/.
The uLCN/PDO mapping extension and open-
74
Real-Time Linux Infrastructure and Tools

Application of RT-Preempt Linux and Sercos III for Real-time


Simulation

Michael Abela,b , Luis Contrerasa , Prof. Peter Klemma,b


a
Institute for Control Engineering of Machine Tools and Manufacturing Units (ISW),
Seidenstr. 36, 70174 Stuttgart, Germany
University of Stuttgart
b
Graduate School of Excellence advanced Manufacturing Engineering (GSaME),
Nobelstr. 12, 70569 Stuttgart, Germany
University of Stuttgart
{Michael.Abel, Peter.Klemm}@isw.uni-stuttgart.de, [email protected]

Abstract
This paper presents the application of RT-Preempt Linux in a virtual commissioning scenario. In
this scenario, a proprietary Programmable Logic Controller (PLC) is connected to a real-time simulation
model. The model is located on a separate Linux personal computer which simulates for example the
hardware of a production machine. Furthermore, the controller and the simulation computer are con-
nected through the sercos III automation bus. The simulation computer uses a sercos III PCI card as
communication hardware in combination with a user space IO (UIO) driver. This allows the execution
of the simulation model and the sercos III driver as real-time processes on the simulation computer. The
sercos III driver was adapted in order to imitate the bus-interface of a custom sercos III bus-coupler and
to provide easy integration into the PLC engineering system. Moreover, variables in the PLC can be
coupled to input and output values of the simulation model. With this virtual commissioning method,
it is possible to reduce the time to market of a machine, since writing and testing the PLC code for the
controller can be done in parallel to the construction of the hardware.

1 Introduction by software tests before the hardware of the machine


is finished. But testing and bug-fixing of functions
which depend on the availability of the machine can
Due to the constant pressure of the market, manu- only be done when the hardware is available. The
facturers have to bring new production machines to same problem comes up at the commissioning of the
the market regularly. To reduce the time to market machine, since the whole machine needs to be fin-
of their machines manufacturers have to find meth- ished until commissioning can begin. To solve this
ods to decrease the overall development time. Es- problem, a virtual commissioning of the machine can
pecially at companies where machines are built for be performed. This can be achieved by applying a
special purposes, every machine is unique and even virtual machine model which simulates the mecha-
needs software which has to be developed particu- tronic parts of the machine. Since production ma-
larly for one machine. The overall development time chines are usually controlled by real-time systems, a
of a machine can be decreased dramatically by par- real-time model for hardware in the loop simulation
allelisation of development tasks. Since most of the is well suited for this purpose [1]. Hardware in the
development tasks depend on each other, dependen- loop simulation means that a simulation model cal-
cies have to be taken into account. Software compo- culates the mechatronic behaviour of a machine while
nents can be written by means of the system spec- its functions are controlled by real control programs.
ification. Parts of the software can even be tested

75
Application of RT-Preempt Linux and Sercos III for Real-time Simulation

2 Theory & State of the Art Each device is equipped with two Ethernet ports.
The preferred bus topology is a ring structure, since a
This chapter gives a short introduction into the tech- ring provides more redundancy than a star topology.
nologies and software systems which are used within Apart from this, a line topology with one or two lines
this project. (i.e. broken ring) can be used as well. Sercos uses
a sophisticated device model which classifies every
bus component into different classes of functionality.
2.1 RT-Preempt Linux and User According to the device model it is possible to dis-
Space IO Drivers tinguish between servo-drives, IO-devices and other
automation hardware.
Linux with real-time kernel preemption (RT- Furthermore, a parameter model was introduced
Preempt) is a enhancement to the Linux kernel. The to describe functional interfaces of field-bus devices.
aim of this patch is to enable real time capabilities Every device has a set of sercos parameters which
in the Linux kernel. The RT-Preempt patch allows characterise the interface of the device. Parame-
user space programs to run in real-time [2], [3]. ters can be accessed by unique identification numbers
The User Space IO (UIO) driver model enables (IDN). Furthermore, a parameter contains a descrip-
drivers to run in the user space of a Linux system [4]. tion of the parameter as string, several attributes and
UIO drivers are a convenient method to implement the data of the parameter with a with fixed or vari-
drivers for non-standard and rarely used hardware able length.
which does not fit into the regular kernel subsys- Sercos uses a start-up phase with five different
tems. The memory of a device is mapped into ad- communication phases (CP) which are usually called
dresses which are accessible from user space memory CP0 to CP4. When the communication phase has
segments. To handle interrupts, a user-space thread passed the early stages and reaches CP4, real-time
can be applied. In addition, a small interrupt han- communication is active, devices and connections are
dler within the kernel space is necessary to wake the set up adequately and real-time data can be trans-
thread. With this functionality it is possible to write mitted. Furthermore, sercos devices can be described
drivers for special purpose devices without the need in the sercos Device Description Markup Language
to handle complex in-kernel structures. UIO-drivers (SDDML) which is based on the Extended Markup
are often used to handle networking devices for field- Language (XML).
buses on systems which are running on RT-Preempt
Linux.

2.2 Serial Real-Time Communication 2.3 Passive sercos III PCI Card
System (sercos) III

The automation bus sercos III is an Ethernet based Custom and PC based sercos slaves can be built
field-bus system which can be used in a wide range by equipping PCs for example with sercos III PCI
of automation applications. Sercos III is standard- networking cards from the company Automata [6].
ised by the association sercos International e.V. [5]. The card contains standard Ethernet communication
In the following the term sercos is used as abbrevia- hardware and a FPGA in order to connect it to the
tion to sercos III. Sercos is based on standard Eth- PCI bus. To bring the card to operation, a propri-
ernet and uses Ethernet frames to communicate on etary driver is necessary. This Sercos Slave Driver
the bus. A sercos network consists of a bus master (SSLV) is written OS independently and contains a
and several slave devices (Figure 1). hardware abstraction layer which can be ported to
other operating systems easily. The card is named
”passive” because a driver, which executes the sercos
networking stack, and a real-time operating system
are necessary to use the card.
This project utilises a port of the SSLV to RT-
Preempt Linux. The SSLV is running as UIO-Driver
within the user space. To support the user space
part of the driver there is also a small kernel module
FIGURE 1: Sercos III ring with master called uio sercos3 in the mainline kernel. Figure 2
and slave devices. shows a rough overview of the SSLV.

76
Real-Time Linux Infrastructure and Tools

models in real-time. Within this project a Virtuos-S


variant is deployed which can be executed on a Linux
system. To synchronise Virtuos with other programs
semaphores are a convenient method. They can trig-
ger the beginning of a simulation step or inform that
an simulation step has finished. Other programs can
use a Virtuos library that provides access to the in-
put and output ports of the simulation model.

3 Problem Definition
To perform virtual commissioning of a production
machine, a real-time simulation model of the hard-
ware is necessary. This virtual machine model can
usually be executed by Virtuos on a PC which also
FIGURE 2: Automata sercos III Slave
executes a PLC or an other type of controller soft-
Driver (SSLV); according to [7].
ware. Indeed, the project specification demanded the
application of a MLP VEP PLC, a proprietary con-
The driver consists of two parts: A small kernel
troller which is not able to execute a Virtuos model.
module called uio sercos3 and the user space applica-
Moreover, it provides no standard interfaces to con-
tion of the SSLV. The user space part is separated in
nect it to a virtual machine model. To solve these
two threads: A UIO interrupt handler thread which
problems, a new method is desired to connect the
is executed with a high real-time priority. And the
PLC to the model. Since the PLC is a proprietary
UserTask, a regular user space part of the SSLV. The
device which needs to be programmed by proprietary
UserTask has a lower real-time priority and needs to
software there are no simple methods to extend it by
be executed at least once in a sercos communication
custom real-time tasks.
cycle. Moreover, a database for IDNs is contained in
the SSLV which can be interfaced from the bus and The solution to this problem is to move the sim-
from the UserTask. ulation model to a PC and let it communicate with
the PLC. Since the communication between PLC and
simulation model needs to be run in real-time, a field-
2.4 MLP VEP and IndraWorks bus can be used to connect the PLC to the simulation
(Figure 3).
In this project a proprietary Programmable Logic
Controller (PLC) produced by the company Bosch
Rexroth [8] is deployed. The MLP VEP is a PC
based PLC which is equipped with several sercos
ports and acts as sercos master device on the bus
system. Furthermore, it can be programed and con-
figured with the engineering tool IndraWorks. It is FIGURE 3: PLC and simulation PC.
able to execute PLC programs written in the five
languages specified in IEC 61131-3 [9] and has addi- Since the MLP VEP PLC offers direct access to
tional Motion Logic Control (MLC) functionality. the sercos field-bus, sercos will be applied as field-
bus in this project. As this field-bus will also be
used later on in the production machine, it does also
2.5 Virtuos also simplify the integration of other hardware which
will be connected to the PLC afterwards.
Virtuos [10],[1] is a simulation software which en-
ables the execution of mechatronic and other mod-
els in real-time. Virtuos consists of three different
software parts: Virtuos-M, Virtuos-V and Virtuos- 4 Approach
S which provide different functionality to the user.
Virtuos-M and Virtuos-V are used for modelling and Sercos uses a device model which can provide dif-
visualisation of simulation models. Virtuos-S is used ferent types of automation devices. But no device
as simulation solver which can compute simulation has an interface which resembles the complexity of

77
Application of RT-Preempt Linux and Sercos III for Real-time Simulation

Virtuos simulation model. A simulation model pro- the PLC. The configuration of the field-bus system is
duces and consumes a high amount of data in every done from this system as well. On the right hand side
simulation cycle. In this project it is sufficient to the simulation computer system with a RT-Preempt
provide exchange of floating point values and inte- patched Linux kernel is shown. This system is also
ger values, since the simulation consists of a mecha- equipped with a passive sercos PCI communication
tronic model. The interface between the PLC and card. PLC and simulation PC are connected via ser-
the simulation was defined as an amount of integer cos. For debugging purposes an Ethernet wiretap
and floating point values. To be able to integrate the can be inserted, as shown in the figure. Inside the
simulation model into the bus system the interface simulation PC the SSLV and the simulation model
of the model was enhanced to resemble the interface are executed. Since the system is (beside the RT-
of a (very large) bus-coupler (see figure 4). kernel) a standard Linux PC, additional software can
be used for debugging purposes as well. The SSLV
is executed to support sercos communication with
the PCI card. Moreover, it is equipped with IPC in-
terfaces to communicate with the simulation model.
Besides of that, the SSLV can record debugging in-
formation in real-time into a FIFO buffer. This in-
FIGURE 4: Simulation model hidden be- formation can be easily read by third party programs
hind the interface of a bus-coupler device or saved for later analysis.

A bus-coupler is a standard field-bus device that


usually couples various kinds of electrical input and 5.1 Communication Concept
output signals to the field-bus. With this solution it
is possible to ”hide” the interface of the simulation The concept for distributing data within the system
model behind the interface of a large bus-coupler. is shown in figure 6.
This also enables easy integration of the simulation
model into the PLC program, since the interface of
the simulation model looks like a bus-coupler. Due
to the application of the real bus system to connect
to the simulation model, the timing which comes to
use later is also applied.

5 System Design
This chapter introduces the design of the simulation
system. Figure 5 shows the overall system structure.
FIGURE 6: Communication concept

Simulation data is transferred in data packets. A


data packet consists of a certain amount of integer
values and a certain amount of floating point values.
To support the simulation with enough data, packets
with 64 32-bit integer values and 32 64-bit floating
point values are used. All the values are composed
together to data packets of 512 bytes in size. Data
packets are composed in the PLC and are put into
Ethernet frames which are sent via the field-bus to
the simulation PC. In the PC, packets are copied to
the address space of the SSLV. The Virtuos-IO (VIO)
thread synchronises the execution of the simulation,
FIGURE 5: Overall system structure decomposes the packet into data types and copies
them to a memory mapped address space which is
The PLC Controller is located on the left hand shared with the simulation. The data transfer back
side of the figure. A separate computer with Indra- from the simulation to the PLC works in a similar
Works is needed to develop and compile programs for manner.

78
Real-Time Linux Infrastructure and Tools

5.2 Controlling the Simulation from data exchange with Virtuos. As a first step, the ser-
the PLC cos interface of the IDN database of the SSLV was
enhanced to emulate the bus-interface of a standard
Within the programming system of the PLC, IO off the shelf bus-coupler with just 16 bits of IO-data.
ports of devices can be mapped to variables. Vari- This is an error prone process since there is no de-
ables can be connected to either input or output scription which IDNs are retrieved and evaluated by
ports. Afterwards, IO-operations can be done by the PLC during the start-up phases. The fields for
setting bit-masks in the PLC program. In addition, cyclic real-time data were extended afterwards to the
field-bus devices can be added to the IndraWorks size of 512 bytes as specified in the custom SDDML
project from a device database. The database can be file of the device. In the final configuration 512 bytes
extended by device descriptions. For sercos devices of data are transferred from the PLC to the simula-
this can be achieved by using files in the SDDML- tion and the same from the simulation to the PLC in
Language. Since the bus-coupler which is used in this every communication cycle. Data composition and
project is not a standard off-the shelf bus-coupler, decomposition is also done by the communication
a SDDML file was written which describes a very thread. Listing 2 shows the specification of a data
large bus-coupler. The file was added to the device packet in C source code:
database of IndraWorks to be able to use it within
the PLC program. typedef struct {
double doubles[32];
Moreover, a data structure was created which int ints[64];
combines all the IO-data that has to be send or re- } io_type;
ceived in one communication cycle. The structure
contains a certain amount of 32 bit integer and 64 Listing 2: Specification of a data packet in the
bit floating point variables (See listing 1). SSLV
TYPE io_type: Luckily, the compiler for the PLC code and the
STRUCT GNU-C compiler use the same method of storing
reals:ARRAY [0..31] OF LREAL; data. To decompose the data packet back into struc-
integers: ARRAY [0..63] OF DINT; tures of variables a pointer to a byte array can be
END_STRUCT used. The pointer has be to casted into a pointer of
END_TYPE type io type and vice versa. To connect the SSLV
to the running simulation and to synchronise a sep-
Listing 1: Specification of a data packet in the PLC arate Virtuos-IO (VIO) thread is used. The VIO
thread has the responsibility to exchange data with
Since the size of bytes in the structure is equal
the running Virtuos simulation and to trigger simu-
to the size of bytes of IO-data in the bus-coupler it
lation steps from outside. For purposes of synchroni-
is possible to connect the complete structure to the
sation two semaphores are deployed. Figure 7 shows
IO-configuration at once. To support input and out-
the execution model as simple Gantt-diagram (with-
put data, two structures were added to the input and
out the running communication thread).
to the output of the device. Consequently, a regu-
lar PLC program can be used to perform calculation
input and output operations.

5.3 Enhancing the sercos Slave Driver


(SSLV)

The SSLV provides two different tasks. On the one


hand, it connects to the field-bus system to exchange
data with the PLC System. On the other hand it is
used to connect to the simulation and exchange data
with the simulation. Since both tasks have critical
FIGURE 7: Execution model of the simu-
timing behaviour they are executed separately in two
lation and the VIO-thread
threads within the SSLV. To move data from one
thread to another they write into global data struc- The execution model of both processes is very
tures. One thread is responsible for covering ser- similar to those of a traditional producer-consumer
cos communication and the other thread handles the model. Two semaphores are deployed: The ”start”

79
Application of RT-Preempt Linux and Sercos III for Real-time Simulation

semaphore has the purpose to signal the beginning lation may cover more than one field-bus devices at
of a simulation step, the ”end” semaphore signals one. As a result it will, be feasible to switch between
the end of a simulation step. The VIO thread is simulated hardware and real hardware without the
started at TN . When the VIO thread has completed need for any changes in the PLC.
its data transfer to the simulation, the simulation
is started. The simulation executes one simulation
cycle and signals the end of the cycle to the VIO- 8 Acknowlegement
thread. Since the exact simulation time varies from
application to application, the VIO-thread does not
The authors would like to thank the German Re-
start data transfer immediately but sleeps until the
search Foundation (DFG) for financial support of
next TN +1 to be in time with the other parts of the
the projects within supporting the Graduate School
system.
of Excellence advanced Manufacturing Engineering
(GSaME) at the University of Stuttgart.

6 Conclusion
References
This paper presents how RT-Preempt Linux can be
used for real-time simulation and the virtual com- [1] Hardware in the loop simulation of production
missioning of production machines. A proprietary systems dynamics, Sascha Röck, 2011, Prod.
PLC is connected to a simulation PC which executes Eng. Res. Devel., German Academic Societey
the real-time simulation model. The automation bus for Production Engineering (WGP), Springer
sercos III is used to transfer data in a deterministic Verlag, Germany.
manner between PLC and simulation PC. To adapt
the simulation model to the field-bus, its interface is [2] Realtime Linux, Open Source Au-
hidden behind the interface of a bus-coupler device. tomation Development Lab (OS-
For this purpose a sercos III PCI networking card is ADL), https://www.osadl.org/Realtime-
utilised. The driver of the card is enhanced to emu- Linux.projects-realtime-linux.0.html, 2011.
late the interface of a bus-coupler and to to transfer
data between the bus and the simulation model. The [3] Real-Time Linux Wiki,
simulation model is executed by the simulation soft- https://rt.wiki.kernel.org.
ware Virtuos on the simulation PC. With this setup,
[4] UIO drivers in the context of RT kernels, Hans-
PLC programs for controlling production machines,
Jürgen Koch, Germany, Twelfth Real-Time
which need run on their (proprietary) and unmodi-
Linux Workshop, 2010, Kenya.
fied target hardware can be tested by means of sim-
ulated mechatronic hardware. Accordingly, the time [5] sercos International e.V., www.sercos.org, Ger-
to market of a production machine can be reduced many.
by parallelisation of development tasks. As testing
of programs which control or depend on mechanical [6] AUTOMATA GmbH & Co. KG,
hardware can be tested without the real hardware to www.automataweb.com, Germany.
be available.
[7] Sercos III Slave Driver API Documentation
V1.1, Automata GmbH, 2011, Germany.

7 Future Work [8] Bosch Rexroth AG, www.boschrexroth.com,


Germany.
At current, only one field-bus device can be emu- [9] Programmable controllers - IEC 61131-3, In-
lated by the PCI card. This is why a small hardware ternational Electrotechnical Commission (IEC),
abstraction layer in the PLC code is necessary to 2010, Switzerland.
switch from simulated to real hardware. In a follow-
up project, the emulation of more than one sercos [10] Virtuos, ISG - Industrielle Steuerungstechnik
III devices will be possible. Consequently, the simu- GmbH, www.isg-stuttgart.de, Germany.

80
Real-Time Linux Infrastructure and Tools

Lachesis: a testsuite for Linux based real-time systems

Andrea Claudi
Università Politecnica delle Marche, Department of Ingegneria dell’Informazione (DII)
Via Brecce Bianche, 60131 Ancona, Italy
[email protected]

Aldo Franco Dragoni


Università Politecnica delle Marche, Department of Ingegneria dell’Informazione (DII)
Via Brecce Bianche, 60131 Ancona, Italy
[email protected]

Abstract
Testing is a key step in software development cycle. Error and bug fixing costs can significantly affect
development costs without a full and comprehensive test on the system.
First efforts to introduce real-time features in the Linux kernel are now more than ten years old.
Nevertheless, no comprehensive testsuites is able to assess the functionality or the conformance to the
real-time operating systems standards of the Linux kernel and of real-time nanokernels that rely on it.
In this paper we propose Lachesis, an automated testsuite derived from the LTP (Linux Test Project)
real-time tests. Lachesis is designed with portability and extensibility as main goals, and it can be used
to test Linux, PREEMPT RT, RTAI and Xenomai real-time features and performances. It provides some
tests for SCHED DEADLINE patch, too. Lachesis is now under active development, and more tests are
planned to be added in the near future.

1 Introduction on a real-time patch to the Linux kernel, which in-


troduces or improves some essential features for a
real-time operating system.
Linux kernel is being increasingly used in a wide vari-
ety of contexts, from desktops to smartphones, from Other solutions use a real-time nanokernel in
laptops to robots. The use of Linux is rapidly grow- parallel with the Linux kernel. This is made possi-
ing due to its reliability and robustness. ble through the use of an hardware abstraction layer
on top of the hardware devices, that listens for in-
Thanks to these qualities, Linux is widely used terrupts and dispatches them to the kernel respon-
in safety and mission critical systems, too. In these sible for managing them. RTAI [4] and Xenomai [5]
contexts, time is extremely important: these systems nanokernels, both built on the top of Adeos [6] hard-
must meet their temporal requirements in every fail- ware abstraction layer, belong to this category.
ure and fault scenario. A system where the correct-
ness of an operation depends upon the time in which Although first efforts to enhance and introduce
the operation is completed is called a real-time sys- real-time features in the Linux kernel are now more
tem. than ten years old, nowadays there are no compre-
hensive testsuites able to assess the functionality or
Over the years various solutions have been pro- the conformance to the real-time operating systems
posed to adapt the Linux kernel to real-time require- standards for the Linux kernel and for the nanoker-
ments. Each of them has found extensive applica- nels that rely on it.
tions in production environments. Some of these so-
lutions, like PREEMPT RT [1] and, more recently, Properly testing the Linux kernel is not easy,
IRMOS [2] and SCHED DEADLINE [3], are based since the code base is continuously growing and in

81
Lachesis: a testsuite for Linux based real-time systems

rapid evolution. We need more efficient, effective and 2 Taxonomy of testing method-
comprehensive test methods, able to ensure proper
software behaviour in the wide range of situations
ologies
where systems can be deployed. For example, tests
are critical in an environment where a malfunction- In software engineering, testing is the process of vali-
ing system can seriously damage machinery, struc- dation, verification and reliability measurement that
tures or even human life. ensure the software to work as expected and to meet
requirements.
Nowadays there are many automatic testsuites,
covering an increasingly wide range of kernel fea- The Linux kernel has been tested since its intro-
tures. Many of these testsuites make it possible duction. In the early stage of development tests were
to functionally test file systems and network stack, ad-hoc and very informal: every developer individu-
to evaluate efficiency in memory management and ally conducted tests on the portion of code he devel-
in communication between processes, and to assess oped, with his own methodologies and techniques;
standards compliance. Very few of them make func- frequently tests came after end-users bug reports,
tional testing on real-time features, and none of them and were aimed at resolving the problem, identify-
test performances or conformance with real-time fea- ing the section of code causing it.
tures. Over the years kernel grew across many differ-
ent architectures and platforms. Testing activities
became increasingly difficult and costly in terms of
1.1 Paper contributions time, but remained very critical for kernel reliability,
robustness and stability.
In this paper we propose Lachesis, an automated
testsuite for Linux based real-time systems, derived For this reason different testing methodologies
from the LTP real-time tests. Lachesis main goals and techniques for the Linux kernel were experi-
are: mented and used. Indicatively, these methods can
be grouped into seven categories [7].
• to provide extensive and comprehensive testing
of real-time Linux kernel features
2.1 Built-in debugging options
• to provide a common test environment for dif-
ferent Linux based real-time systems This kind of tests must not be done simultaneously
with functional and performance tests. It consists in
• to provide a set of functional, regression, per- a series of debugging options (CONFIG DEBUG *
formance and stress test, either developing or in the Linux kernel) and fault insertion routines that
porting them from other testsuites allow the kernel to test itself.
• to design and experiment a series of build tests

• to minimize development time for new tests 2.2 Build tests


• to make the testsuite extensible and portable Build tests just compile the kernel code searching for
warnings and errors from the compiler. This kind of
Several real-time tests were ported to Lachesis tests is an extensive problem, for a number of differ-
from other testsuites. In this paper we also detail ent reasons:
porting procedures and results.
• Different architectures to build for;
1.2 Paper structure • Different configuration options that could be
used;
The rest of this paper is organized as follow: section
2 illustrates a taxonomy for testing methodologies; • Different toolchains to build with;
section 3 briefly introduces the most important test-
suites for the Linux kernel and give some reasons
why to choose LTP as start point to develop a new 2.3 Static verification tests
real-time testsuite; Section 4 illustrates Lachesis and
the tests it includes; Section 5, finally concludes the This kind of tests is designed to find bugs in the code
paper. without having to execute it. Static verification tests

82
Real-Time Linux Infrastructure and Tools

examine the code statically with tools like sparse, 3 Automated testsuites
LClint [8] (later renamed as splint), and BLAST [9].
Testing is expensive, both in terms of costs and time.
Automation is a good way to reduce economic and
2.4 Functional and unit tests human efforts on testing. A number of automated
testing environments for the Linux kernel has been
Functional and unit tests are conceived to examine proposed, each with its own strengths and weak-
one specific system functionality. The code imple- nesses.
menting a feature is tested in isolation, to ensure it In the following paragraphs the most impor-
meets some requirement for the implemented specific tant test suites for the Linux kernel are presented.
operation. Crashme [10] is an example of this kind Goals and basic concepts which guide their design
of test. are stated.

2.5 Regression tests 3.1 IBM autobench

Regression tests are designed to uncover new er- IBM autobench is an open source test harness1 con-
rors and bugs in existing functionalities after changes ceived to supports build and system boot tests, along
made on software, such as new features introduction with support for profiling [7]. It is written in a com-
or patches correcting old bugs. The goal for this kind bination of perl and shell scripts, and it is fairly com-
of tests is to assure that a change did not introduce prehensive.
new errors.
Autobench can set up a test execution environ-
Regression testing methods are different, but in ment, perform various tests on the system, and write
general consist in rerunning previously ran tests and logs of statistical data. Tests can be executed in par-
evaluating system behaviour, checking whether new allel, but test control support is basic and the user
errors appear or old errors re-emerge. have almost no control over the way tests are exe-
cuted. Error handling includes the success or failure
of the tests, but is a very complex activity and must
be done explicitly in all cases. In addition the use
2.6 Performance tests
of different languages limits testsuite’s extensibility
and maintainability.
Performance tests measure the relative performance
of a specific workload on a certain system. They IBM autobench project is inactive since 2004,
produce data sets and comparisons between tests, al- when the last version was released.
lowing to identify performance changes or to confirm
that no changes has happened. In this category we
can include kernbench, a tool for CPU performance 3.2 Autotest
tests; iobench, a tool for disk performance tests; and
netperf, a tool to test network performance. Autotest is an open source test harness capable of
running as a standalone client. It is easy to plug
it into an existing server harness [11], too. Au-
2.7 Stress tests totest provides a large number of tests, including
functional, stress, performance, regression and ker-
nel build tests. It supports various profilers, too.
Stress tests push the system to the limits of its capa-
bilities, trying to identify anomalous behaviours. A Autotest is written in python, which enables it
test of this kind can be conceived as an highly parallel to provide an object oriented and clean design. In
task, such as a completely parallelized matrix multi- this way testsuite is easy to extend and maintain.
plication. A performance test running under heavy Including python syntax in job control file, for ex-
memory pressure (such as running with a small phys- ample, users can take more control on test execution.
ical memory), or in a highly resource-competitive en- On the other hand, python is not widely used in the
vironment (competing with many other tasks to ac- real-time community, and is not suited for real-time
cess the CPU, for example) can become a stress test. tests or applications development.
1 A test harness is an automated test framework designed to perform a series of tests on a program unit in different operating

conditions and load, monitor the behaviour of the system and compare the test results with a given range of good values.

83
Lachesis: a testsuite for Linux based real-time systems

Autotest has built-in error handling support. cyclictest [14], for example, is a well known test
Tests produce machine parsable logs; their exit sta- that measures the latency of cyclic timer interrupts.
tus are consistent and a descriptive message of them Through command line options, the user can choose
is provided. A parser is built into the server har- to pin the measurement thread to a particular core
ness, with the task of summarizing test execution of a multi-core system, or to run one thread per core.
results from different testers, and formatting them Cyclictest works by creating one or more user space
in an easy consultation form. periodic thread, the period being specified by the
user. The accuracy of the measurement is ensured
Autotest includes very few tests to examine the
by using different timing mechanism.
Linux kernel from a real-time point of view; almost
all of them are functional tests. Moreover, it does not Another interesting test is hackbench. It is both
include any compliance test on real-time standards. a benchmark and a stress test for the Linux kernel
scheduler. Hackbench creates a specified number of
pairs of schedulable entities which communicate via
3.3 Crackerjack socket. It measures how long it takes for each pair
to send data back and forth.
Crackerjack is a testsuite whose main goal is regres-
sion testing [12]. It provides: rt-tests is a good and well established suite to
test Linux kernel real-time features. However it is
• automatic assessment of kernel behaviours conceived primarily to test the PREEMPT RT patch
set, so it’s quite difficult to extend it to other Linux
• test results storage and analysis based real-time systems. For example, it contains a
• incompatibilities notification couple of tests based on a driver for the Linux kernel;
in systems such as Xenomai, the use of a driver as
• test result and expected test result manage- this causes a mode change in which a real-time task
ment (register, modify, remove) switch from the Xenomai to the Linux environment.
Thus, the task experiences much longer latencies.
Crackerjack is initially developed to test Linux ker-
nel system calls, but over time has been revised to Test results are outputted in a statistical sum-
easy future extension to other operating systems. mary, rather than in a boolean ”PASS” or ”FAIL”.
Unfortunately rt-tests do not provide any mechanism
It is implemented using Ruby on Rails. This for collecting the results and present them in a ma-
makes it easy to modify it, ensuring a low mainte- chine parsable form.
nance cost and simplifying development of new tests.
However, as for python, Ruby is not suited for real-
time tests or applications development.
3.5 LTP - Linux Test Project
Crackerjack integrates a branch tracer for the
Linux kernel, called btrax. Btrax is a tool to anal- LTP (Linux Test Project) is a functional and regres-
yse programs effectiveness. Crackerjack uses btrax to sion testsuite [15]. It contains more than 3000 test
trace the branch executions of the target program, to cases to test much of the functionalities of the ker-
analyse the trace log file, and to display data about nel, and the number of tests is increasingly growing.
coverage and execution path. btrax makes use of LTP is written almost entirely in C, except for some
Intel processors’ branch trace capabilities, recording shell scripts.
how much code was tested.
In recent years LTP has been increasingly used
Crackerjack does not support conformance, per- by kernel developers and testers and today is almost
formance or stress tests, and does not include any a de-facto standard to test the Linux kernel [16] [17].
functional test on real-time features. Linux distributors use LTP, too, and contributes en-
hancements, bug fixes and new tests back to the
suite.
3.4 rt-tests
LTP excellence is testing Linux kernel basic func-
rt-tests [13] is a popular testsuite developed to test tionality, generating sufficient stress from the test
the PREEMPT RT patch to the Linux kernel. It is cases. LTP is able to test and stress filesystems, de-
developed by Thomas Gleixner and Clark Williams, vice drivers, memory management, scheduler, disk
and it is used in the OSADL lab, across various hard- I/O, networking, system calls and IPC and provides
ware architectures, for a continuous testing. rt-tests a good number of scripts to generate heavy load on
includes ten different tests for real-time features. the system.

84
Real-Time Linux Infrastructure and Tools

It also provides some additional testsuites such common test environment for different Linux based
as pounder, kdump, open-hpi, open-posix, code cov- real-time systems. Therefore it seems reasonable to
erage [18], and others. start from an existing, accepted and widely used test-
suite, to adopt its principles and apply them to a new
LTP lacks support for profiling, build and boot
testsuite, conceived with other goals and priorities.
tests. Even if it contains a complete set of tests, LTP
is not a general heavy weight testing client. We choose LTP as a starting point for Lachesis.
There are many reasons behind this choice. First,
LTP also lacks support for machine parsable logs.
LTP is one of the few testsuites able to provide a
Test results can be formatted as HTML pages, but
set of tests for Linux kernel real-time features, and
they are either “PASS” or “FAIL”, and for tester
a large number of testers use it. Second, LTP has
is more complex to understand the reasons behind
a well established and clean architecture. It makes
failures.
use of two main libraries, librttest which provides
LTP has a particularly interesting real-time test- an API to create, signal, join and destroy real-time
suite, that provides functional, performance and tasks, and libstats which provides an API for some
stress tests on the Linux kernel real-time features. basic statistical analysis and some functions to save
To the best of our knowledge, LTP is one of the few data for subsequent analysis. Last, LTP provides a
testsuites that provides such a comprehensive and logging infrastructure. This is an important and de-
full featured set of tests for Linux real-time func- sirable feature for Lachesis, too.
tionalities.
We believe it is of little significance to compare
the results of tests to absolute values statically built
inside the testsuite. In fact, varying the hardware
4 Lachesis to test, these values should vary as well. So, un-
like LTP, Lachesis provides a boolean pass/no pass
All analysed testsuites seem to suffer some key fail- output only on functional tests; by contrast, in per-
ings in relation to testing the Linux kernel real-time formance tests it outputs a statistical summary of
features. the results.
Many of them seem to have little consideration
for real-time features in the Linux kernel. A great 4.1 Architecture
part of them (with the notable exception of LTP and
rt-tests) does not offer any real-time functional, per-
Lachesis is designed to analyse a variety of Linux
formance or stress test.
based real-time systems; therefore it provides a
Usually it is simple to design and develop a new straightforward method to build tests for different
test inside an existing testsuite. However it is very kernels. During the configuration Lachesis probes
difficult to extend an entire testsuite in order to the system to determine which nanorkernels or real-
test some new real-time nanokernels. System calls time patches are present, and instructs the compiler
analysed in tests, in fact, may differ syntactically to produce in output different executables for each
from one kernel to another maintaining the same system to be tested. A set of scripts is provided to
functionality. Moreover some nanokernels, such as execute tests sequentially; launching these scripts,
RTAI and Xenomai, provides some real-time features tests are executed one after another and tests results
through additional system calls, for which specific are stored in logs for subsequent analysis.
tests should be developed.
librttest had to be rewritten to support both
Another problem is the lack of machine parsable RTAI and Xenomai primitives. Lachesis maintains
results. There is no standard way to consistently librttest API, extending it to provide advanced real-
communicate results to the user; often we have not time features, typical of Linux-based nanokernels.
any detail on the reason that led a test to failure. Basic real-time features are provided encapsulating
real-time specific function into the pre-existing API,
Lastly, every testsuites has grown rapidly and
thus concealing them from the user. For example,
chaotically in response to the evolution of the Linux
the create task() primitive was modified to take
kernel. For this reason they are not easy to under-
into account the corresponding primitives for Xeno-
stand, maintain and extend.
mai and RTAI.
The lack of a comprehensive testsuite to meet
As a result, it is possible to write a single test
previously exposed needs led us to develop Lachesis.
for a specific real-time feature and use it to test all
The ambitious goal of Lachesis is to provide a supported systems, thus increasing testsuite’s porta-

85
Lachesis: a testsuite for Linux based real-time systems

bility and maintainability.

The logging infrastructure of Lachesis is based on LTP features. Libstat is used to provide a set of functions to store data and perform statistical calculations on them; other functions are used to write files with reports on the data. These files are written in an easily readable form (though they are not yet machine parsable) and stored in a unique subdirectory.

FIGURE 1: Lachesis's architecture

4.2 API

One of the principal goals of Lachesis is to provide a single API to develop tests for a number of different Linux based real-time systems. To do that, it provides an API that can be divided in four sections: tasks management, time management, buffers management and data management.

Tasks management API has been largely changed from the original LTP API, to provide support for testing on the RTAI and Xenomai nanokernels. In particular, the create_thread primitive has been modified to call the corresponding Xenomai primitive, if Lachesis is requested to compile tests for the Xenomai nanokernel. RTAI's approach is quite different and consists in ensuring that certain sections of code can be scheduled in accordance with some hard real-time algorithm, calling some specific functions immediately before and after them. We have addressed this particular mechanism by defining four macros, to be used inside the tasks to be defined:

• RTAI_INIT defines task descriptor, scheduler and system timer

• RTAI_START_TASK starts the hard real-time section

• RTAI_STOP_TASK ends the hard real-time section

• RTAI_END deletes the task

In all the other primitives, where approaches do not differ so much from system to system, small changes have proved to be enough. Two primitives to create periodic and deadline tasks were added.

Time management API was extended with many primitives. An RTIME structure was defined to standardize time management between the different systems to be tested. Two functions have been defined to make additions and subtractions on it. Nanosleep and busy work primitives have been modified to call specific real-time sleep functions. A primitive was added to end the current period for the calling task.

Buffers management API remained almost unchanged from the original API. A few changes were made to support RTAI's memory management primitives, changing malloc() and free() to rt_malloc() and rt_free() where appropriate.

Data management API also remained unchanged from the original LTP API, since these functions only deal with data collected from tests previously carried out. We plan to work on this API in the near future, to support XML parsing and to format results adequately.

In addition to the API, we needed to introduce some macros to replace some standard functions with their real-time counterpart, provided by a particular nanokernel. For example, we have a macro to replace printf() with rt_printf() instances, if Lachesis has to build tests for the Xenomai nanokernel.

4.3 Tests included in Lachesis

Below we briefly present all the tests currently included in Lachesis. Some of them were ported from different testsuites, others were developed as a part of this work. In the naming scheme we have tried to adopt the following principle: the first word is the parameter measured by the test, the second indicates the way in which the measurement is made. We will indicate when the test is considered to be passed and the category it belongs to.

Tests are supposed to be done in an unloaded system. When a load is necessary, the test itself generates it. It is worth noting that Lachesis does not take power management features into account, so it is up to the user to ensure that tests are running under the same power management conditions. This can be done by disabling power management in the configuration of the Linux kernel.

1) blocktime mutex is a performance test that


measures the time a task waits to lock a mutex. Test to ensure a correct preemption between tasks. It cre-
creates a task with higher priority, one with lower ates 26 tasks at different priority, each of them trying
priority, and some tasks at medium priority; highest to acquire a mutex. Test is passed if all task are ap-
priority task tries to lock a mutex shared with all the propriately preempted in 1 loop. This is a functional
other tasks. Test is repeated 100 times. test.
2) func deadline1 is a functional test we 11) func prio verifies priority ordered wakeup
developed, conceived to be used only on from waiting. It creates a number of tasks with in-
SCHED DEADLINE patched kernels. It creates creasing priorities, and a master task; each of them
a task set with U = 1, using the UUniFast [20] algo- waits on the same mutex. When the master task
rithm not to bias test’s results. Task set is scheduled, releases its mutex, any other task can run. Test is
and test is passed if no deadline is missed. passed if tasks wakeup happened in the correct pri-
ority order. This is a functional test.
3) func deadline2 is a functional test we
developed, conceived to be used only on 12) func sched verifies scheduler behaviour using
SCHED DEADLINE patched kernels. It creates a football analogy. Two kinds of tasks are created:
a task set with U > 1, using the UUniFast algorithm defence tasks and offence tasks. Offence tasks are
not to bias test’s results. Task set is scheduled, and at lowest priority and tries to increment the value of
test is passed if at least one deadline is missed. a shared variable (the ball). Defence tasks have an
higher priority and they should block offence tasks,
4) func gettime verifies clock gettime() be-
in such a way that they never execute. In this way
haviour. It creates a certain number of tasks, some of
ball position should never change. The highest prior-
them setted to sleep, some other ready to be sched-
ity task (the referee) end the game after 50 seconds.
uled. Test is passed if the total execution time of
Test is passed if at the end of the test the shared
sleeping tasks is close to zero. This is a functional
variable is zero. This is a functional test.
test.
13) jitter sched measures the maximum execu-
5) func mutex creates a number of tasks to walk
tion jitter obtained scheduling two different tasks.
through an array of mutexes. Each task holds a max-
The execution jitter of a task is the largest difference
imum number of locks at a time. When the last task
between the execution times of any of its jobs [19].
is finished, it tries to destroy all mutexes. Test is
The first task measures the time it takes to do a fixed
passed if all mutexes can be destroyed, none of them
amount of work; it is periodically interrupted by an
being held by a terminated task. This is a functional
higher priority task, that simply wakes up and goes
test.
back to sleep. Test is repeated 1000 times. This is a
6) func periodic1 creates three groups of periodic performance test.
tasks, each group with different priorities. Each task
14) latency gtod is a performance test. It mea-
makes some computation then sleeps till its next pe-
sures the time elapsed between two consecutive calls
riod, for 6000 times. Test is passed if no period is
of the gettimeofday() primitive. Test is repeated a
missed. This is a functional test.
million of times, at bulks of ten thousand per time.
7) func periodic2 is a functional test we devel-
15) latency hrtimer one timer task and many
oped, and is conceived to be used in kernels that
busy tasks have to be scheduled. Busy tasks run at
support primitives for periodic scheduling. It creates
lower priority than timer task; they perform a busy
a task set with U = 1, using the UUniFast algorithm
wait, then yield the cpu. Timer task measures the
not to bias test’s results. Task set is scheduled, and
time it takes to return from a nanosleep call. Test
test is passed if no period is missed.
is repeated 10000 times, and is passed if the highest
8) func periodic3 is a functional test we devel- priority task latency is not increased by low priority
oped, and is conceived to be used in kernels that tasks. This is a performance test.
support primitives for periodic scheduling. It creates
16) latency kill two tasks with different priority
a task set with U > 1, using the UUniFast algorithm
are to be scheduled. Lower priority task sends a kill
not to bias test’s results. Task set is scheduled, and
signal to the higher priority task, that terminates.
test is passed if at least one period is missed.
The test measures the latency between higher pri-
9) func pi checks whether priority inheritance ority task start and termination. Test is repeated
support is present in the running kernel. This is a 10000 times, and we expect a latency under the tens
functional test. of microseconds. This is a performance test.
10) func preempt verifies that the system is able 17) latency rdtsc is a performance test. It mea-


sures the average latency between two read of the Several real-time tests were ported to Lachesis
TSC register, using the rdtscll() primitive. Test from other testsuites, in a simple and straightfor-
is repeated a million of times. ward way. In many cases there were no needs to
change the code except to add some macro calls at
18) latency sched is conceived to measure the la-
the beginning and at the end of the test’s code.
tency involved in periodic scheduling in systems that
do not support primitives for periodic scheduling. Our extension to librttest API has made possi-
A task is executed, then goes to sleep for a certain ble to develop some new tests for threads with fixed
amount of time; at the beginning of the new period periods or deadlines. These tests are useful to value
the task is rescheduled. We measure the difference jitter and latency in periodic task scheduling. Simi-
between expected start time and effective start time lar tests can be developed in very short times.
for the task. We expect this difference is under the
Unfortunately, Lachesis is far from complete.
tens of µs and, additionally, we expect the task does
First, its test coverage is very low. Second, tests
not miss any period. This is a performance test.
included in Lachesis are somewhat general to be re-
19) latency signal schedules two tasks with the ally useful in development. So Lachesis needs to ex-
same priority. One task sends a signal, the other re- pand its test coverage with more specific tests, and
ceives it. The test measures the time elapsed between librttest needs to take into account more low level
sending and receiving the signal. Test is repeated a primitive, to make possible to develop more signifi-
million of times. We expect a latency under the tens cant tests.
of µs. This is a performance test.
For this reasons, we believe it’s very important to
20) stress pi stresses the Priority Inheritance integrate the testsuite rt-tests in Lachesis. As under-
protocol. It creates 3 real-time tasks and 2 non real- lined previously, rt-tests is very specific in respect to
time tasks, locking and unlocking a mutex 5000 times Lachesis, and so it’s quite difficult to figure out how
per period. We expect real-time tasks to make more to extend these tests to other real-time nanokernels.
progress on the CPU than non real-time tasks. This We expect that a strong extension to librttest API
is a stress test. is necessary to reach this goal.
Up to now Lachesis is tested and used only on
x86 architecture. Given that we use only high-level
5 Conclusions and future work kernel primitives, we are quite confident that the
testsuite is easily portable on other architectures,
In this paper we have presented Lachesis, a unified with little or no effort. Recently we developed a
and automated testsuite for Linux based real-time porting of Xenomai 2.5.5.2 and RTAI 3.8 to a Mar-
systems. Lachesis tries to meet the need for a soft- vell ARM9 board2 , and we plan to use Lachesis to
ware tool to test Linux and Linux-based systems real- test the functionalities and the performances of these
time features, having the following qualities: portings.
However, just the variety of Linux based real-
• supports tests on Linux, RTAI, Xenomai, PRE- time systems that Lachesis is able to test proves that
EMPT RT and SCHED DEADLINE real-time it is portable and easy to use. We plan to exploit
features, through a standard test API this qualities porting Lachesis to other systems, such
• provides a series of functional, performance IRMOS [2], SCHED SPORADIC [21], XtratuM [22]
and stress tests to ensure the functionality of and PartiKle [23], and to other architectures.
the examined kernels Beyond this, we plan to develop some kernel-
space tests for real-time nanokernels and to build
• provides a series of tests for periodic and dead- a system to parse and XML format results. Test re-
line tasks sults quality can be improved, also, detailing possible
• is easy to use: each feature to be tested is as- reasons behind a test failure.
sociated to a script, which runs tests and logs Lachesis is actually under active development,
the results for every testable system. and can be downloaded from bitbucket.org3 .
• it includes a set of bash scripts that helps to
execute tests in the correct order and in the
correct conditions.
2 ARM Marvell 88F6281, equipped with a Marvell Feroceon processor, clocked at 1.2 GHz, ARMv5TE instruction set.
3 https://bitbucket.org/whispererindarkness/lachesis


Acknowledgments [11] J. Admanski and S. Howard, Autotest-Testing


the Untestable, in Proceedings of the Linux
We would like to thank Andrea Baldini and Symposium, 2009.
Francesco Lucconi for their contributions in the de-
velopment of the testsuite, and Massimo Lupi for his [12] H. Yoshioka, Regression Test Framework and
ideas and advices. Kernel Execution Coverage, in Proceedings of
the Linux Symposium, 2007, pp. 285-296.
[13] http://git.kernel.org/?p=linux/kernel/git/clrk-
References wllms/rt-tests.git;a=summary

[1] S. Rostedt and D. Hart, Internals of the RT [14] https://rt.wiki.kernel.org/index.php/Cyclictest


Patch, in Proceedings of the Linux Symposium, [15] P. Larson, Testing Linux with the Linux Test
2007, pp. 161-172. Project, in Ottawa Linux Symposium, 2001, p.
[2] F. Checconi, T. Cucinotta, D. Faggioli, and G. 265.
Lipari, Hierarchical multiprocessor CPU reser- [16] S. Modak and B. Singh, Building a Robust Linux
vations for the linux kernel. OSPERT 2009, p. kernel piggybacking The Linux Test Project, in
15. Proceedings of the Linux Symposium, 2008.
[3] D. Faggioli, F. Checconi, M. Trimarchi, and [17] S. Modak, B. Singh, and M. Yamato, Putting
C. Scordino, An EDF scheduling class for the LTP to test - Validating both the Linux kernel
Linux kernel, in Proceedings of the 11th Real- and Test-cases, Proceedings of Linux Sympo-
Time Linux Workshop, 2009, pp. 1-8. sium, 2009.
[4] P. Mantegazza, E. Dozio, and S. Papacharalam-
[18] P. Larson, N. Hinds, R. Ravindran, and H.
bous, RTAI: Real time application interface,
Franke, Improving the Linux Test Project with
Linux Journal, no. 72es, pp. 10-es, 2000.
kernel code coverage analysis, in Proceedings of
[5] P. Gerum, Xenomai - Implement- the Linux Symposium, 2003, pp. 1-12.
ing a RTOS emulation framework on
[19] G. C. Buttazzo, Hard real-time computing sys-
GNU/Linux, 2004. [Online]. Available:
tems - predictable scheduling algorithms and ap-
http://www.xenomai.org/documentation/
plications, 2nd edition, Springer, 2005.
[6] K. Yaghmour, Adaptive domain environment
[20] E. Bini and G. C. Buttazzo, Measuring the
for operating systems, 2001. [Online]. Available:
Performance of Schedulability Tests, Real-Time
http://www.opersys.com/adeos/
Systems, vol. 30, no. 1-2, pp. 129-154, May 2005.
[7] M. Bligh and A. Whitcroft, Fully Automated
[21] D. Faggioli, A. Mancina, F. Checconi, and G. Li-
Testing of the Linux Kernel, in Proceedings of
pari, Design and implementation of a posix com-
the Linux Symposium, vol. 1, 2006, pp. 113-125.
pliant sporadic server for the Linux kernel, in
[8] D. Evans, J. Guttag, J. Horning, and Y. Tan, Proceedings of the 10th Real-Time Linux work-
LCLint: A tool for using specications to check shop, 2008, pp. 65-80. [Online].
code, ACM SIGSOFT Software Engineering
Notes, vol. 19, no. 5, pp. 87-96, 1994. [22] M. Masmano, I. Ripoll, A. Crespo, and J.
Metge, Xtratum: a hypervisor for safety criti-
[9] D. Beyer, T. Henzinger, R. Jhala, and R. Ma- cal embedded systems, in Proceedings of the 11th
jumdar, The software model checker Blast, In- Real-Time Linux Workshop. Dresden. Germany,
ternational Journal on Software Tools for Tech- 2009.
nology Transfer, vol. 9, no. 5-6, pp. 505-525,
Sep. 2007. [23] S. Peiro, M. Masmano, I. Ripoll, and A. Cre-
spo, PaRTiKle OS, a replacement for the core of
[10] G. Carette, CRASHME: Random In- RTLinux-GPL, in Proceedings of the 9th Real-
put Testing, 1996. [Online]. Available: Time Linux Workshop, Linz, Austria, 2007, p.
http://crashme.codeplex.com/ 6.


Generic User-Level PCI Drivers

Hannes Weisbach, Björn Döbel, Adam Lackorzynski


Technische Universität Dresden
Department of Computer Science, 01062 Dresden
{weisbach,doebel,adam}@tudos.org

Abstract
Linux has become a popular foundation for systems with real-time requirements such as industrial
control applications. In order to run such workloads on Linux, the kernel needs to provide certain
properties, such as low interrupt latencies. For this purpose, the kernel has been thoroughly examined,
tuned, and verified. This examination includes all aspects of the kernel, including the device drivers
necessary to run the system.
However, hardware may change and therefore require device driver updates or replacements. Such an
update might require reevaluation of the whole kernel because of the tight integration of device drivers
into the system and the manifold ways of potential interaction. This approach is time-consuming and
might require revalidation by a third party. To mitigate these costs, we propose to run device drivers in
user-space applications. This makes it possible to rely on the unmodified and already analyzed latency characteristics
of the kernel when updating drivers, so that only the drivers themselves remain in need of evaluation.
In this paper, we present the Device Driver Environment (DDE), which uses the UIO framework
supplemented by some modifications, which allow running any recent PCI driver from the Linux kernel
without modifications in user space. We report on our implementation, discuss problems related to DMA
from user space and evaluate the achieved performance.

1 Introduction in safety-critical systems would need additional au-


dits and certifications to take place.
Several advantages make the Linux kernel an at- The web server in above example makes use of an
tractive OS platform for developing systems with in-kernel network device driver. Now, if the network
real-time capabilities in areas as diverse as indus- driver needs to be upgraded for instance because of
trial control, mobile computing, and factory automa- a security-related bugfix, the whole kernel or at least
tion: The kernel supports many popular computing parts of it would need to be reaudited. These certifi-
platforms out of the box, which provides a low bar- cations incur high cost in terms of time and manual
rier starting to develop software for it. Being open labor. They become prohibitively expensive when
source, it can be easily adapted to the target plat- they need to be repeated every time a part of the
form’s needs. A huge community of developers guar- system is upgraded.
antees steady progress and fast response to problems.
Running device drivers in user space allows to
Applying Linux in a real-time environment how- circumvent recertification of the whole kernel by en-
ever leads to additional problems that need to be capsulating the device driver in a user-level appli-
handled. We imagine a system where a computer cation. If, like in our example, the network is solely
controls a safety-critical industrial machine while in used by non-real-time work, it can be completely run
parallel providing non-real-time services. For ex- outside the real-time domain and doesn’t need to be
ample, it might provide work statistics through a certified at all.
web server running on the same machine. The
Linux already comes with UIO, a framework for
RT PREEMPT series of kernel patches [1] aims to
writing device drivers in user space [5]. However,
provide the ability to do so. However, use of Linux


these drivers still need to be rewritten from scratch using UIO. In this paper we propose an alternative technique: Using UIO and other available kernel mechanisms, we implement a Device Driver Environment (DDE) – a library providing a kernel-like interface at the user level. This approach allows for reusing unmodified in-kernel drivers by simply wrapping them with the library and running them at the user level.

In the following section, we introduce the general idea of the DDE and inspect the UIO framework with respect to its support of a generic user-level driver layer. We then discuss our implementation of a DDE for Linux in Section 3. Thereafter, we continue analyzing the special needs of Direct Memory Access (DMA) from user space in Section 4 and present a solution that requires only minimal kernel support. In Section 5 we evaluate our DDE implementation with an in-kernel e1000e network device driver running as user-space application.

2 User-Level Device Drivers

Device drivers are known to be one of the single most important sources of bugs in today's systems [4]. Combined with the fact that most modern operating systems run device drivers inside their kernel, it is not surprising that a majority of system crashes is caused by device drivers – Swift and colleagues reported in 2003 that 85% of Windows crashes may be attributed to device driver faults [24].

One way to improve reliability in this area is to separate device drivers from the kernel and run them as independent user-level applications. Doing so isolates drivers from each other and the remaining components and increases the chance that a faulting driver does not take down the rest of the system. Properly isolated drivers may be restarted after a crash as it is done in Minix3 [13]. Performance degradation resulting from moving drivers out of the kernel into user space is often considered a major disadvantage of this approach. However, it has been proven that user-level device drivers may achieve the same performance as if run in the kernel [16]. Further research showed that existing device drivers can be automatically retrofitted to run most of their critical code in user space and only keep performance-critical paths within the kernel [11].

Our ultimate goal is to provide a Device Driver Environment, a common runtime library that can be linked against arbitrary in-kernel device drivers in order to run them as user-level applications without modification. In this section we give an overview of the DDE approach and analyze Linux' UIO framework regarding its capabilities of supporting generic user-level device drivers.

2.1 The DDE Approach

Our approach for reusing in-kernel device drivers in user space is depicted in Figure 1. The source code of an unmodified native Linux device driver is linked against a wrapper library, the Device Driver Environment. The wrapper provides all functions the driver expects to be implemented originally by the Linux kernel. The DDE reimplements these functions solely using mechanisms provided by a device driver abstraction layer, called DDEKit.

[Figure 1: layer diagram – the native Linux device driver sits on top of the Device Driver Environment for Linux and the DDEKit abstraction layer, which in turn runs on the Linux kernel]

FIGURE 1: DDE Architecture

Only DDE knows about the intricate requirements of guest drivers. In turn, the DDEKit provides abstract driver-related functionality (device discovery and management of device resources, synchronization, threading, etc.) and implements it using functionality from the underlying host OS. Splitting development into these two layers allows to use a DDEKit for a certain host platform in connection with different guest DDE implementations as well as reuse the same guest DDE on a variety of hosts. This layering has allowed for implementations of DDE/DDEKit for different guest drivers (Linux, FreeBSD [10]) as well as different host platforms (Linux, L4/Fiasco [12], Genode [15], Minix3 [25], GNU/HURD [7]).

In this paper we focus on implementing a DDE for Linux PCI device drivers on top of the Linux kernel. To achieve this goal, it is necessary to understand the facilities at hand to perform device driver-related tasks from user space. The User-level IO framework (UIO) appears to be a good starting point for this.
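As a quick preview of the user-space side that UIO provides (described in detail in Section 2.2 below), a minimal access sketch could look as follows; the device node name, the 4 KiB mapping size and the register access are illustrative assumptions, not code from the paper.

/*
 * Illustrative sketch only: open a UIO device, map its first I/O
 * memory region and wait for one interrupt.  Error handling omitted.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
        int fd = open("/dev/uio0", O_RDWR);
        uint32_t events;

        /* Memory mapping n of a UIO device is selected by passing
         * n * page size as the mmap offset; here we map region 0. */
        volatile uint32_t *regs = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                       MAP_SHARED, fd, 0);

        /* read() blocks until the next interrupt and returns the total
         * number of interrupt events seen so far as a 32-bit integer. */
        read(fd, &events, sizeof(events));
        printf("interrupt events so far: %u, first register: 0x%x\n",
               events, regs[0]);

        munmap((void *)regs, 4096);
        close(fd);
        return 0;
}

Section 3 describes how DDEKit/Linux builds on these mechanisms, using the blocking read on the UIO file for interrupt handling and sysfs resource files for mapping I/O memory.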


2.2 UIO Overview

The Linux user-level IO framework (UIO) is an extension to the kernel that allows user-level drivers to access device resources through a file interface and is depicted in Figure 2. The interfacing is performed by the generic uio_core. In addition to that, UIO relies on a tiny device-specific driver stub, labelled uio_dev in the figure. During startup, this stub obtains information about the device's I/O resources and when encountering an interrupt takes care of checking whether the interrupt was raised by the device and handles the device-specific way of acknowledging the interrupt.

[Figure 2: diagram – the user-level device driver accesses the device through the generic uio_core and a small device-specific uio_dev stub inside the Linux kernel]

FIGURE 2: UIO components and their interaction

A user-level driver obtains access to the target device's resources through a /dev/uioXXX device file. Reading the device returns the number of (interrupt) events that occurred since the last read. Device I/O memory can be accessed by mmap'ing the device. UIO neither supports x86 I/O ports1 nor direct memory access (DMA).

UIO for Generic User-Level Drivers

Our goal is to implement a DDE that allows generic PCI device drivers to be run in user space. This does not fit well with UIO's dependence on a device-specific stub driver. Unfortunately, there is no generic way to move acknowledgment of an interrupt out of the kernel. Instead, this is often highly device-specific and requires running in kernel mode. As an exception, the situation improves with PCI devices that adhere to more recent versions of the PCI specification [21] (v2.3 or later). These devices allow generic detection of whether an interrupt is asserted using an interrupt state bit in the PCI config space. Furthermore, it is possible to generically disable interrupt delivery using an interrupt disable bit. This enables the implementation of a generic UIO PCI driver and removes the requirement of a device-specific driver stub.

The lack of support for user-level DMA is another issue that needs to be resolved in order to support arbitrary user-level PCI drivers. In the following sections we present the details of our implementation.

3 A DDE For Linux

As described in Section 2.1, the user-space driver environment consists of two parts: a host-specific DDEKit providing a generic device driver abstraction and a guest-specific DDE that solely relies on the functionality provided by the DDEKit. For our implementation, we can build upon the already existing Linux-specific DDE for the L4/Fiasco microkernel [12]. In addition to that we need to implement a DDEKit for Linux as a host, which we describe in this section.

3.1 Anatomy of a DDEKit

The DDEKit's task is to provide a generic interface that suits the needs of guest device driver environments. To come up with this interface, we analyzed network, block, and character device drivers in two different kernels (Linux and FreeBSD) [10], resulting in a list of mechanisms all these drivers and their respective environments rely on.

The most important task of a device driver is managing I/O resources. Therefore, a driver abstraction layer needs to provide means to access and handle interrupts, memory-mapped I/O, and I/O ports. As most of the drivers we are concerned with are PCI device drivers, DDEKit also needs to provide ways to enumerate the system's PCI devices or at least discover the resources that are attached to a device. Additionally, means for dynamic memory management are crucial when implementing anything but the most simple device driver.

While most device drivers operate single-threaded, threading plays an important role in DDE implementations, because threads can be used to implement tasks such as interrupt handling, Linux Soft-IRQs, as well as deferred activities (work queues). The existence of threading implies that synchroniza-
1 Actually, UIO does not need to support I/O ports, because these can be directly accessed by the user application if it is

given the right I/O permissions.


tion mechanisms such as locks, semaphores, and even uio pci generic to the device. Whenever the read
condition variables need to be present. returns, at least one interrupt event has occurred
and the handler function registered by the driver is
Furthermore, a lot of drivers need a notion of
executed.
time, which Linux drivers usually obtain by look-
ing at the magic jiffies variable. Hence, DDEKit The interrupt handler thread is the only one
needs to support this. Apart from these features, in polling the UIO device file for interrupts. Af-
order to be useful, the DDEKit also provides means ter successful return from the blocking read, the
for printing messages and a link-time mechanism sysfs node for the device’s PCI config space
for implementing prioritized init-calls, that is func- (/sys/class/uio/.../config) is written to disable
tions that are automatically run during application IRQs while handling the interrupts. In order to avoid
startup before the program’s main function is exe- interrupt storms in the kernel while the user-level
cuted. driver is executing its handler, the disabled interrupt
is only turned on right before the interrupt thread be-
comes ready to wait for the next interrupt by reading
3.2 I/O Ports and Memory the UIO device.

In order to drive PCI devices and handle their re-


sources, DDEKit needs means to discover devices at 3.4 Threads and Synchronization
runtime. This is implemented using libpci [18], which
allows scanning the PCI bus from user space. The
Threads are a fundamental building block of a DDE,
located devices are then attached to a virtual PCI
because drivers may use a wide range of facilities
bus implemented by DDE. At runtime, any calls by
that might be executed in parallel: soft-IRQs, ker-
the driver to the PCI subsystem use this virtual bus
nel threads, and work queues are implemented by
to perform their work.
spawning a dedicated thread for each such object.
After the virtual PCI bus is filled with Furthermore, threads are used for implementing in-
the devices to be driven, information about terrupts as discussed in Section 3.3. However, not
the provided resources is obtained from all of these activities are actually allowed to execute
/sys/bus/pci/devices/.../resource. Access to in parallel. Therefore, means for (blocking) synchro-
I/O ports is later granted by first checking whether nization are needed.
the ports requested by the driver match the ones
As DDEKit/Linux is implemented to run in
specified by the resource file, and thereafter grant-
Linux user space, we can make use of the full range
ing the process port access using the ioperm system
functions provided by the libpthread API to imple-
call. I/O memory resources are also validated and
ment threading as well as synchronization.
then made accessible by mmaping the respective sysfs
resource files.

3.5 Timing
3.3 Interrupt Handling
Linux device drivers use timing in two flavors: first,
For managing interrupts, DDEKit/Linux makes use the jiffies counter is incremented with every clock
of the UIO interrupt handling mechanism, which tick. DDEKit/Linux emulates jiffies as a global
supports generic interrupt handling through the variable. During startup, a dedicated jiffies
uio pci generic module for all PCI devices sup- thread is started that uses the libC’s nanosleep to
porting the PCI specification v2.3 or higher. sleep for a while and thereafter adapt the jiffies
counter accordingly. For the drivers we experimented
Once the driver requests an IRQ for a de-
with so far, it has proven sufficient to not tick with
vice, DDEKit locates the generic UIO driver’s
HZ frequency as the Linux kernel would, but in-
sysfs node (/sys/bus/pci/drivers/.../new id).
stead only update the jiffies counter every 10th
It then writes the PCI device’s device and vendor IDs
HZ tick. This might be adapted once a driver needs
into this file and thereby makes uio pci generic
a finer granularity. Furthermore, as device drivers
become responsible for handling this device’s inter-
run as independent instances in user space, this can
rupts.
be configured for every device driver separately ac-
Thereafter, a new interrupt handler thread is cording to its needs and the jiffies counting over-
started. This thread performs a blocking read on head can even be completely removed for drivers that
the UIO file that was generated when attaching don’t need this time source.


The second way Linux drivers use timing is 1. The region’s physical address needs to be avail-
through the add timer group of functions that allows able as DMA does not use virtual addresses.
to program deferred events. DDEKit/Linux provides
an implementation by spawning a dedicated timer 2. It needs to be physically contiguous so that no
thread for every driver instance. This thread man- virtual-to-physical address translations need to
ages a list of pending timers and uses a semaphore be done during the DMA transfer.
to block with a timeout until the next timer occur-
3. It needs to be pinned, that is the region or parts
rence should be triggered. If the blocking semaphore
of it must not be swapped out during the DMA
acquisition returns with a timeout, the next pending
transfer.
timer needs to be handled by executing the handler
function. Otherwise, an external thread has mod-
ified the timer list by either adding or removing a None of these criteria are met by user-
timer. In this case the timer thread recalculates the level memory allocation routines such as malloc,
time to sleep until the next trigger and goes back to posix memalign or mmap, because they work on
sleep. purely virtual addresses and the underlying kernel
is free to map those pages anywhere it wants.
As it is necessary to get kernel support for han-
dling DMA, we implemented a small kernel module
3.6 Memory Management providing an interface to the in-kernel DMA API.
The module supports two modes: copy-mode pro-
Running in user space means that DDEKit/Linux vides a simple translation layer between user and
may use LibC’s malloc and free functions for inter- kernel pages for DMA and zero-copy mode facilitates
nal memory management needs. However, this does an IOMMU to improve DMA performance.
not suffice for implementing Linux’ memory manage-
ment functions. Linux’ kmalloc is internally already
implemented using SLABs or one of their equiva- 4.1 Copy-DMA
lents. Our implementation currently provides a spe-
cific SLAB implementation in DDEKit, but we plan Our kernel module for supporting DMA from user
to use Linux’ original memory allocator in the fu- space closely collaborates with the uio core as
ture and only back it with page-granularity memory shown in Figure 3. The uio dma module is noti-
allocations provided from DDEKit. fied by the uio core when a device is bound to it
Additionally, Linux drivers may use the group and creates an additional device node /dev/uio-dma
of get free pages functions to allocate memory which user-level drivers can use to obtain DMA-able
with page granularity. DDEKit/Linux supports page memory for a specific device2 .
granularity allocations through a function that uses
mmap in order to allocate page-aligned memory. User-Level
Device Driver
A remaining problem is that drivers commonly
acquire DMA-able memory in order to allow high
amounts of data to be copied without CPU in-
teraction. This is impossible by solely relying on
user-level primitives. This means that an imple- uio_core uio_dma

mentation of DMA allocation functions such as


dma alloc coherent requires additional thought.
We go on to discuss our solution to this problem uio_dev
Linux Kernel
in the following section.

FIGURE 3: Introducing the uio dma mod-


ule
4 Attacking the DMA Problem Linux device drivers can allocate DMA memory
either using dma alloc coherent or they can request
In order for DMA to or from a memory region to to map DMA memory for a certain buffer in virtual
work properly, the region needs to meet three crite- memory and a DMA direction (send/receive) using
ria: {dma,pci} map single.
2 The notification is necessary so that the uio dma module has access to the respective UIO PCI device data structure.
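As a rough sketch of the kernel support discussed in this section – allocating DMA-able memory in the kernel and handing it out to the user-space driver via mmap – one possible shape of such a handler is shown below. This is an assumption about how a helper like uio_dma could be written, not its actual code; x86-like behaviour of dma_alloc_coherent is assumed, and all bookkeeping, freeing and size checks are omitted.

/*
 * Sketch only: map kernel-allocated, DMA-able memory into the calling
 * user process.  The use of file->private_data for the struct device
 * pointer is an assumption made for this example.
 */
#include <linux/dma-mapping.h>
#include <linux/fs.h>
#include <linux/io.h>
#include <linux/mm.h>

static int dma_helper_mmap(struct file *file, struct vm_area_struct *vma)
{
        struct device *dev = file->private_data;   /* stored at open() time */
        size_t size = vma->vm_end - vma->vm_start;
        dma_addr_t dma_handle;
        void *cpu_addr;

        /* Physically contiguous and pinned, hence suitable for DMA. */
        cpu_addr = dma_alloc_coherent(dev, size, &dma_handle, GFP_KERNEL);
        if (!cpu_addr)
                return -ENOMEM;

        /* Make the very same pages visible to the user-level driver. */
        return remap_pfn_range(vma, vma->vm_start,
                               virt_to_phys(cpu_addr) >> PAGE_SHIFT,
                               size, vma->vm_page_prot);
}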


A naı̈ve idea would be to simply implement a de-


vice driver that allows allocating DMA memory from IOVA Physical Virtual
the kernel and then use it from user space. However, space Memory Memory
on many platforms it is possible to use any physi-
cal memory for DMA. Therefore, many drivers do
not explicitly allocate DMA memory upfront, but
instead simply start DMA from arbitrary memory
regions, even from their stack. This means the DMA
allocator driver would have to provide all dynamic
memory allocations for the user space driver. Not
only would this circumvent the convenience of man-
aging user space memory using libC’s malloc and
free, but it would also decrease the possibility of
using GDB with a user space driver, because such FIGURE 4: Mapping DMA buffers using
kernel memory would not be ptraceable. an IOMMU
DDE’s implementation of dma alloc coherent Using an IOMMU comes to the rescue here. In
performs an mmap on the uio-dma device which in this case we can use an arbitrary buffer that is vir-
turn allocates DMA-able memory in the kernel and tually contiguous (which includes every buffer allo-
establishes a mapping to user space so that upon cated using malloc). The uio dma module upon en-
return from the system call the driver can use this countering the DMA ioctl then only needs to run
memory area for DMA. through the list of pages forming the buffer and per-
For the map single family of functions an ioctl form an equally contiguous mapping into the device’s
on the uio-dma device is used to send a virtual user IOVA space. Thereafter, the user-level driver can use
address and the DMA direction to the kernel mod- the IOVA address returned from the ioctl call and
ule. The system call returns the physical address the program DMA without needing to care about phys-
driver can then use to initiate DMA. ical contiguity.
If upon a DMA MAP ioctl the direction indi- A minor intricacy arises because of the fact that
cates that data shall be sent from user space, the user-level buffers do not always start and end at
kernel module allocates a DMA-able kernel buffer a page boundary. This means that multiple DMA
and copies the user data into this buffer before re- buffers may share the same page, so that upon un-
turning the DMA buffer’s physical address. If DMA mapping one of the buffers, the uio dma module can-
shall be done from the device into a user buffer, not safely remove the IOVA mapping as other DMA
the ioctl only allocates a DMA buffer in the ker- buffers may still contain the same page. Therefore,
nel and delays copying data from the DMA buffer uio dma uses reference counting to detect when a
to the user buffer until the buffer is unmapped us- page may really be unmapped.
ing dma unmap single. It is safe to do so, because
only after this function call the DMA can safely be
assumed to be finished and therefore the user-level 5 Case Studies
driver should not touch the buffer beforehand any-
way. There are three interesting questions concerning
user-level device drivers:
4.2 DMA With Fewer Copies 1. Does running the driver in user space modify
the real-time capabilities of a PREEMPT RT
While the copy-DMA method works without any fur- kernel?
ther support than the one that is already present
within the kernel, more recent hardware featuring 2. How does the user space driver’s performance
an IOMMU can be used to get rid of the copying compare with an in-kernel driver?
steps between user and kernel buffers.
3. Which benefits can be gained by running a
Copy-DMA uses different kernel- and user-level driver in user space and using existing profiling
buffers because it needs to ensure that the kernel and debugging tools?
buffers that are effectively used for the DMA opera-
tion are in fact physically contiguous, which cannot In this section we try to answer these questions us-
be guaranteed for the user-level buffers. ing a real-world example. We downloaded the Linux


e1000e network interface driver from the Intel web-


site [6] and compiled it to run in user space.
For all our experiments we used a quad-core Intel Core i7 running at 2.8 GHz with 2 GB of RAM. The operating system was a Linux 3.0.1-rt11 kernel with the PREEMPT RT option switched on. We tested the e1000e driver with an Intel 82578DC Gb ethernet card.

5.1 Real-Time Operation

To evaluate the influence of running PCI drivers in user space on the system's real-time behavior, we used the cyclic test utility provided by OSADL [20]. Figure 5 shows the maximum latencies for several scenarios we tested.

[Figure 6: bar chart "cyclic_test latencies with IOMMU"; y-axis: maximum latency in microseconds (0–60); one group of bars per load scenario]

FIGURE 6: Maximum cyclic test latencies for the IOMMU scenario

Each group has four bars corresponding to
threads running on the 4 CPUs in our test ma-
chine. The group labelled no load shows the la- In addition to the experiments also present in
tencies for running cyclic test on the idle system the no-IOMMU case, we added two more bar groups
running with idle=poll to mitigate power manage- labelled * user map. These groups show maximum
ment effects. For the group labelled hi load we set latencies obtained when using the no-copy version of
each CPU’s load to 100% and reran cyclic test. the uio dma module.
Thereafter, we added network load to the system
by running the IPerf UDP benchmark [8] between In both setups we see that the maximum laten-
the test machine and a remote PC. The groups with cies using user-level device drivers are within the
labels * e1000e show the latency for using network bounds of the other measurements. Although we ob-
through the in-kernel e1000e driver. The groups la- serve a peak in the latency for hi load user, this peak
belled * user give latencies obtained for running the is within the bounds of the unmodified measurements
experiment with the e1000e driver in user space using (e.g., hi load in the previous experiment). We con-
DDE. clude that running device drivers in user space using
DDE has no influence on the real-time capabilities of
the system.

[Figure 5: bar chart "cyclic_test latencies without IOMMU"; y-axis: maximum latency in microseconds (0–60); one group of bars per load scenario]

FIGURE 5: Maximum cyclic test latencies for the non-IOMMU scenario

5.2 UDP/TCP Performance

To evaluate the performance of user-level DDE drivers, we linked our user-level e1000e driver to the lwIP stack [9] and then ran a UDP and TCP throughput benchmark while connected to an external computer. For comparison, we also ran the same benchmark using the in-kernel device driver and the builtin Linux TCP stack.

Figure 7 shows the average and maximum throughputs achieved in these experiments. For UDP it is notable that even though the network link
is a 1Gb NIC, IPerf was only able to saturate 800
Additionally, we tried to figure out whether turn- MBit/s. Furthermore, user-level and kernel stacks
ing the machine’s IOMMU on or off makes a differ- perform equally well. However, the CPU utilization
ence and therefore reran our experiments with the for the user-level driver is higher: the kernel stack
IOMMU turned on. The results for these experi- ran at about 50% utilization, while the user stack
ments are shown in Figure 6. consumed 80%.


Profiling DDE

[Figure 7: bar chart of IPerf throughput in MBit/s (0–1000) for UDP and TCP, average and maximum, comparing the kernel stack with the user lwIP stack]

FIGURE 7: IPerf throughput results

When we initially ran the e1000e driver in user space, performance was by far not as convincing as in the experiments described in Section 5.2.

Using Valgrind's [19] Callgrind profiler, we were able to investigate where the performance went. We were caught by surprise by the result: DDE manages a list of virtual-to-physical mappings for all allocated memory. This list is used to implement the virt_to_phys lookup mechanism. This is implemented as a linked list and the assumption was that this would suffice, because there would never be many mappings stored in this list and calls to virt_to_phys would take place rather less frequently. Callgrind's output however told us that this function accounted for a huge amount of execution time.
TCP performance is unfortunately much worse look at what regions were registered in this list and
for the user-level TCP stack than with the in-kernel found out that in many cases, we did not need to
one. We are still investigating these issues and right store this information at all, thereby reducing the
now attribute this to problems with the lwIP stack. amount of time spent searching the virt-phys map-
pings.

5.3 Testing, Debugging, and Profiling


6 Related Work
So far we showed that user-level device drivers allow
for increased isolation at acceptable speed while not Our work relates to the problem of reusing existing
influencing the system’s real-time capabilities. An- device drivers when designing a new operating sys-
other advantage of running them as user applications tem. LeVasseur et al. proposed to use per-driver vir-
is the availability of debugging and profiling tools tual machines to reuse and isolate device drivers [17].
that ease driver development. In this section we in- Poess showed that the DDE approach also works
troduce two examples where user-level tools could be for binary device drivers [22], which is not yet im-
applied to kernel code and helped us find problems plemented in our system. Friebel implemented a
in our implementation of DDE. DDE for FreeBSD device drivers which uses the split
DDEKit/DDE architecture [10]. Building upon this,
it should now also be possible to use FreeBSD device
Debugging DDE drivers within Linux user space. Boyd-Wickizer’s
SUD framework [2] also allows running Linux drivers
in user space but focusses more on security and iso-
While working on the user-level e1000e device driver,
lation and instead of the real-time capabilities of the
we experienced hangs in TCP connectivity and
system.
started debugging them. With the help of the GDB
debugger we were able to figure out that the problem Our work is also related to Schneider’s device
occurred when the driver was in NAPI polling mode driver validation mechanism [23]. By wrapping
and ran out of its network processing budget. drivers in user-level applications, we can use the sys-
tem’s native analysis and profiling tools in order to
In the Linux kernel, the driver at this point vol-
observe driver behavior and identify security viola-
untarily reschedules, giving other kernel activity the
tions. The RUMP framework for NetBSD has similar
chance to run. This was improperly implemented
goals as our work and allows debugging and devel-
within the DDE as it simply returned from the Soft-
oping device drivers for NetBSD in user space [14].
IRQ handler. In our case however, it would have
However, it is not intended to be used for actually
been necessary to raise the Soft-IRQ again. This did
running drivers at user space in production systems.
not happen and so the driver went to sleep until it
got woken up by the next interrupt. Chipounov proposed to perform heavyweight


instruction-level tracing and symbolic execution in ings of the 2010 USENIX conference on USENIX
order to generate device-specific code that can be annual technical conference, USENIX ATC’10,
dropped into existing per-OS device driver skele- pages 9–9, Berkeley, CA, USA, 2010. USENIX As-
tons [3]. While this approach eases device driver sociation.
reuse, applying it to a real-time kernel has the same [3] Vitaly Chipounov and George Candea. Reverse en-
drawbacks as native in-kernel drivers in that they gineering of binary device drivers with RevNIC. In
still need to be revalidated every time an update is EuroSys ’10: Proceedings of the 5th European Con-
applied. ference on Computer Systems, pages 167–180, New
York, NY, USA, 2010. ACM.
[4] Andy Chou, Junfeng Yang, Benjamin Chelf, Seth
Hallem, and Dawson Engler. An empirical study of
7 Conclusion operating systems errors. In SOSP ’01: Proceed-
ings of the Eighteenth ACM Symposium on Operat-
In this paper we presented a Device Driver En- ing Systems Principles, pages 73–88, New York, NY,
vironment that allows executing generic Linux in- USA, 2001. ACM.
kernel PCI drivers as user-level applications on top [5] Jonathan Corbet. UIO: user-space drivers. https:
of Linux. This is achieved by implementing the DDE //lwn.net/Articles/232575/, 2007.
as a wrapper library implementing the facilities ex- [6] Intel Corp. Network adapter driver for Gigabit
pected by in-kernel drivers at user space using off- PCI based network connections for Linux. http:
the-shelf kernel mechanisms such as UIO and sysfs. //downloadcenter.intel.com, 2010.
With the help of a small kernel module our frame- [7] Zheng Da. DDE for GNU/HURD. http://www.
work also supports DMA from user space. gnu.org/software/hurd/dde.html.
Using this framework, we were able to run the [8] Jon Dugan and Mitch Kutzko. IPerf TCP/UDP
bandwidth benchmark. http://sourceforge.net/
widely used e1000e network interface driver in user
projects/iperf/, 2011.
space on a PREEMPT RT kernel. Experiments us-
ing cyclic test showed that the real-time latencies of [9] Adam Dunkels. Minimal TCP/IP implementation
the system were not influenced by the fact that the with proxy support. Technical Report T2001:20,
SICS – Swedish Institute of Computer Science,
driver was running from user space. Furthermore, it
February 2001. Master’s thesis.
was possible to use common Linux program analy-
sis tools such as the GDB debugger and Valgrind to [10] Thomas Friebel. Uebertragung des Device-
Driver-Environment-Ansatzes auf Module des BSD-
profile and debug drivers.
Betriebssystemkerns. Master’s thesis, TU Dresden,
The DDEKit for Linux is available for download 2006.
at http://os.inf.tu-dresden.de/ddekit/. [11] Vinod Ganapathy, Matthew J. Renzelmann, Arini
Balakrishnan, Michael M. Swift, and Somesh Jha.
The design and implementation of microdrivers.
In ASPLOS’08: Proceedings of the Thirteenth In-
Acknowledgments ternational Conference on Architectural Support
for Programming Languages and Operating Sys-
We’d like to thank several people whose hard work tems, pages 168–178, Seattle, Washington, USA,
within the recent years has made design and imple- March 2008. ACM Press, New York, NY, USA.
mentation of the Device Driver Environment possi- http://doi.acm.org/10.1145/1346281.1346303.
ble. Thank you, Christian Helmuth, Thomas Friebel, [12] TU Dresden OS Group. DDE/DDEKit for
and Dirk Vogt. Carsten Weinhold provided valuable Fiasco+L4Env. http://wiki.tudos.org/DDE/
hints on improving this paper. DDEKit, 2006.
[13] Jorrit N. Herder, Herbert Bos, Ben Gras, Philip
This work was partially supported by the Ger-
Homburg, and Andrew S. Tanenbaum. Failure re-
man Research Association (DFG) within the Special silience for device drivers. In DSN ’07: Proceedings
Purpose Program 1500, project title ASTEROID. of the 37th Annual IEEE/IFIP International Con-
ference on Dependable Systems and Networks, pages
41–50, Washington, DC, USA, 2007. IEEE Com-
References puter Society.
[14] Antti Kantee. Rump device drivers: Shine on
[1] Linux RT project. http://www.kernel.org/pub/ you kernel diamond. http://ftp.netbsd.org/pub/
linux/kernel/projects/rt/. NetBSD/misc/pooka/tmp/rumpdev.pdf, 2010.
[2] Silas Boyd-Wickizer and Nickolai Zeldovich. Toler- [15] Genode Labs. Genode dde kit. http://genode.
ating malicious device drivers in linux. In Proceed- org/documentation/api/dde\_kit\_index.


[16] Ben Leslie, Peter Chubb, Nicholas Fitzroy-Dale, [21] PCI SIG. PCI Local Bus Specification.
Stefan Götz, Charles Gray, Luke Macpherson, http://www.pcisig.com/specifications/
Daniel Potts, Yueting Shen, Kevin Elphinstone, and conventional/conventional_pci_23/, 2002.
Gernot Heiser. User-level device drivers: Achieved
performance. Journal of Computer Science and [22] Bernhard Poess. Binary device driver reuse. Mas-
Technology, 20, 2005. ter’s thesis, Universitaet Karlsruhe, 2007.
[17] Joshua LeVasseur, Volkmar Uhlig, Jan Stoess, and
Stefan Götz. Unmodified device driver reuse and im- [23] Fred Schneider, Dan Williams, Patrick Reynolds,
proved system dependability via virtual machines. Kevin Walsh, and Emin Gun Sirer. Device driver
In In Proceedings of the 6th Symposium on Operat- safety through a reference validation mechanism. In
ing Systems Design and Implementation, pages 17– Proceedings of the 8th USENIX Symposium on Op-
30, 2004. erating Systems Design and Implementation OSDI
’08, December 2008.
[18] Martin Mares. PCI Utilities. http://mj.ucw.cz/
pciutils.html, 2010.
[24] Michael M. Swift, Brian N. Bershad, and Henry M.
[19] Nicholas Nethercote and Julian Seward. Valgrind: a Levy. Improving the reliability of commodity oper-
framework for heavyweight dynamic binary instru- ating systems. SIGOPS Oper. Syst. Rev., 37(5):207–
mentation. In Proceedings of the 2007 ACM SIG- 222, 2003.
PLAN Conference on Programming Language De-
sign and Implementation, PLDI ’07, pages 89–100, [25] Andrew Tanenbaum, Raja Appuswamy, Herbert
New York, NY, USA, 2007. ACM. Bos, Lorenzo Cavallaro, Cristiano Giuffrida, Tomáš
[20] OSADL. Cyclic test util- Hrubý, Jorrit Herder, Erik van der Kouwe, and
ity. https://www.osadl.org/ David van Moolenbroek. MINIX 3: Status Report
Realtime-test-utilities-cyclictest-and-s. and Current Research. ;login: The USENIX Maga-
rt-test-cyclictest-signaltest.0.html, 2011. zine, 35(3), June 2010.


COMEDI and UIO Drivers for PCI Multifunction


Data Acquisition and Generic I/O Cards
and Their QEMU Virtual Hardware Equivalents

Pavel Pı́ša
Czech Technical University in Prague, Department of Control Engineering
Karlovo náměstı́ 13, 121 35 Praha 2, Czech Republic
[email protected]

Rostislav Lisový
Czech Technical University in Prague, Faculty of Electrical Engineering
Karlovo náměstı́ 13, 121 35 Praha 2, Czech Republic
[email protected]

Abstract
The article describes the implementation of UIO and Comedi drivers for the Humusoft MF624 and MF614
data acquisition cards. The basic functions (D/A and A/D converters, digital inputs/outputs) of the Humusoft
MF624 card were implemented in the Qemu emulator as well, which makes it possible to experiment with driver
implementation without physical access to the cards and without the risk of data loss when drivers are developed
and tested on the same primary Linux kernel instance. The article can help newcomers in the area gain the
knowledge required to implement support for other similar cards and hardware emulation of these cards.
The matching real and virtual setup can be used in operating system courses as a practical introduction to
simple driver implementation and helps with understanding how the computer's internal computation world
interfaces with the real world.

1 Introduction

When teaching the development of Linux drivers, one of the approaches is to explain the kernel API and programming paradigms by creating a driver which does not require any special hardware – e.g. a character driver which returns upper-case ASCII text when receiving lower case. Although this approach can be useful, the issues associated with dealing with hardware should be practised as well.

The approach we took in this work eliminates the need for physical access to hardware while providing the full feature set of a PCI device in the form of virtual hardware. This was made possible by implementing a virtual PCI device in the Qemu emulator.

The main reason for choosing DAQ cards for this project was easy interfacing from the programmer's point of view and straightforward testing of the proper function of the driver. This can be very helpful for beginners who are not familiar with hardware-related topics.

2 Humusoft MF614, MF624

Humusoft MF614 and MF624 are data acquisition (DAQ) cards. Both of these cards use the PCI interface to connect to the computer. The main features these cards provide are digital inputs, digital outputs, ADCs, DACs, timers, and encoder inputs. The Humusoft MF614 is the predecessor of the MF624 – the available functions are quite similar. The main difference is in driver programming – the MF614 has only 8-bit wide registers, whereas the MF624 has 16- or 32-bit wide ones.

The MF624 is available for purchase on the manufacturer's web page. The MF614 is no longer produced.

3 UIO Driver

Each UIO driver consists of two parts – a small kernel module (needed mostly because of device-specific interrupt handling/disabling) and the user-space driver logic (as shown in figure 1). The main advantage of this approach is that most of the development happens in user-space, so during prototyping of the driver (or when using a bad one) the integrity and stability of the kernel will not be disrupted.

FIGURE 1: UIO driver structure

Driver uio_pci_generic

When dealing with any device compliant to PCI 2.3, it is also possible to use the uio_pci_generic driver in the kernel instead of programming a specific one. This driver makes all memory regions of the device available to user-space.

Binding to the device is done by writing the Vendor and Device ID into the /sys/bus/pci/drivers/uio_pci_generic/new_id file.
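For a PCI 2.3 compliant card, the binding described above can be illustrated as follows; this is a generic illustration only (and, as explained below, not applicable to the MF614/MF624), with VVVV and DDDD standing for the card's vendor and device ID in hexadecimal:

# modprobe uio_pci_generic
# echo "VVVV DDDD" > /sys/bus/pci/drivers/uio_pci_generic/new_id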
The interrupt handler uses the Interrupt Disable bit in the PCI command register and the Interrupt Status bit in the PCI status register. Because neither the MF614 nor the MF624 is PCI 2.3 compliant, it is not possible to use this driver for them.

Implementing the Kernel Part

In the case of writing a UIO driver for a PCI device, the initialization function of the module registers struct pci_driver in the standard way (for more information about PCI driver development see [1], available online at https://lwn.net/Kernel/LDD3/), where the probe function handles the initialization of the UIO-related structures. The main structure holding all data of a particular UIO driver is struct uio_info. Its simple initialization (including registration) is shown below:

/* struct pci_dev *dev */
struct uio_info *info;
info = kzalloc(sizeof(struct uio_info), GFP_KERNEL);

info->name = "mf624";
info->version = "0.0.1";

info->mem[0].name = "PCI chipset";
info->mem[0].addr = pci_resource_start(dev, 0);
info->mem[0].size = pci_resource_len(dev, 0);
info->mem[0].memtype = UIO_MEM_PHYS;
info->mem[0].internal_addr = pci_ioremap_bar(dev, 0);

info->port[0].name = "Board programming registers";
info->port[0].porttype = UIO_PORT_X86;
info->port[0].start = pci_resource_start(dev, 1);
info->port[0].size = pci_resource_len(dev, 1);

uio_register_device(&dev->dev, info);
pci_set_drvdata(dev, info);

Structure uio_mem is used for enabling memory-mapped I/O regions, whereas structure uio_port is used for I/O ports (for each of these structures there is a statically allocated array with a size of 5 elements).

Interface to User-space

Communication with the kernel part of the UIO driver is possible through the /dev/uioX file (where X is the number of the driver instance). Several syscalls can be used when interfacing with this file:

open() opens the device and returns a file descriptor used for the other syscalls.

read() blocks until an interrupt occurs (the value read is the number of interrupts seen by the device).

mmap() is used to map memory of the device to user-space. The offset value passed to mmap() determines the memory area of a device to map – for the n-th area the offset should be n*sysconf(_SC_PAGESIZE).


irqcontrol() is used for enabling (called with the parameter set to (int) 1) or disabling ((int) 0) interrupts.

It is possible to define your own mmap(), open(), release() functions as an option. When there is a need to use irqcontrol(), it is necessary to implement this function per device.

Information related to a particular driver instance can be found in the /sys/class/uio/uioX directory. Most of the files are read-only. The subdirectory maps contains information about MMIO regions mapped by the driver; the subdirectory portio is for I/O port regions.

When using UIO and mmap() with the MF624 card (which has 32 or 128 bytes long memory regions) there is an issue with the return value of this syscall – the pointer to the memory is page-size-aligned, so it is necessary to add the low bits of the physical address (page offset) of each memory region to it. The physical address can be obtained from the addr file located in /sys/class/uio/uioX/maps/mapX. The region offset is equal to addr & (sysconf(_SC_PAGESIZE) - 1).
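A minimal user-space counterpart could look like the sketch below. It is our own illustration and not part of the described drivers; the device index and region number are examples and error handling is omitted. It maps the first memory region, applies the page offset described above, and then uses write()/read() on the UIO file descriptor to enable and wait for an interrupt, which only has an effect if the kernel part implements irqcontrol():

/* Sketch: access BAR0 of /dev/uio0 and wait for one interrupt. */
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
	unsigned long phys_addr;
	long pagesize = sysconf(_SC_PAGESIZE);
	FILE *f = fopen("/sys/class/uio/uio0/maps/map0/addr", "r");
	fscanf(f, "%lx", &phys_addr);
	fclose(f);

	int fd = open("/dev/uio0", O_RDWR);
	/* map the first memory region; for region n the offset is n*pagesize */
	void *base = mmap(NULL, pagesize, PROT_READ | PROT_WRITE,
	                  MAP_SHARED, fd, 0 * pagesize);
	/* the returned pointer is page aligned, so add the page offset of the region */
	volatile uint32_t *regs =
		(volatile uint32_t *)((char *)base + (phys_addr & (pagesize - 1)));

	printf("first register: 0x%08x\n", regs[0]);

	uint32_t irq_on = 1, count;
	write(fd, &irq_on, sizeof(irq_on));   /* enable interrupts via irqcontrol() */
	read(fd, &count, sizeof(count));      /* block until an interrupt arrives */
	printf("interrupts so far: %u\n", count);

	munmap(base, pagesize);
	close(fd);
	return 0;
}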
4 Comedi Driver

The UIO driver is a versatile solution available mainly for uncommon devices. In our case of a DAQ card, a special subsystem of the Linux kernel designated for DAQ card drivers can be used. It is called Comedi (Linux control and measurement device interface). It provides library functions for user- and kernel-space, making development and usage of DAQ devices easier. It consists of three different parts.

Comedi is a part of the Linux kernel. It consists of individual device drivers, including the Comedi core driver providing the basic set of functions used by the device drivers.

Comedilib is a user-space library providing a unified interface for other user-space applications to devices supported by Comedi.

Kcomedilib is also a part of the Linux kernel. It provides the same API as Comedilib, but it is used for real-time applications.

Implementing the Driver

Each Comedi driver should be registered to the list of active Comedi drivers. This is done by invoking the comedi_driver_register() function. The only parameter passed to this function is a pointer to a struct comedi_driver structure. The most important fields of this structure are:

const char *driver_name; /* "my_driver" */
struct module *module; /* THIS_MODULE */
int (*attach) (struct comedi_device *, struct comedi_devconfig *);
int (*detach) (struct comedi_device *);

Unlike the UIO or generic PCI driver, the main initialization function is not probe() (of struct pci_driver) but attach() (of struct comedi_driver), which is invoked by the Comedi subsystem.

The attach() function is responsible not only for common PCI device initialization but also for the initialization of struct comedi_device (which is accessible through a pointer passed to the attach() function). The most important step is to allocate and initialize each subdevice of the DAQ card (in Comedi's nomenclature a subdevice represents one particular function of the device – e.g. ADC, digital out, etc.). Allocation is done by the Comedi function alloc_subdevices(struct comedi_device *dev, unsigned int num_subdev); each struct comedi_subdevice is then accessible in the array called subdevices which is part of struct comedi_device. Example of the initialization of a subdevice representing an ADC:

s = dev->subdevices + 0;
s->type = COMEDI_SUBD_AI;
s->subdev_flags = SDF_READABLE | SDF_GROUND;
s->n_chan = 8;
s->maxdata = (1 << 14) - 1;
s->range_table = &range_bipolar10;
s->len_chanlist = 8;
s->insn_read = mf624_ai_rinsn;
s->insn_config = mf624_ai_cfg;

Interface to User-space

After successful compilation and loading of a particular Comedi driver, there should be a /dev/comediX file (where X is the number of the driver instance). For communication with this file, Comedi library functions are used: for opening the device – comedi_open(); for reading/writing ADCs/DACs – comedi_data_read(), comedi_data_write(); and for reading/writing digital inputs/outputs – comedi_dio_read(), comedi_dio_write().

There are already applications using the Comedi API (for a basic list of available applications see http://www.comedi.org/applications.html) – thus in some cases there is no need for implementing a user-space application from scratch.
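A small user-space program using Comedilib could look like the following sketch. It is our own illustration; the device file, subdevice numbers, channel, and written value are examples and depend on the actual driver configuration:

/* Sketch: read one analog input sample and write one analog output value. */
#include <stdio.h>
#include <comedilib.h>

int main(void)
{
	comedi_t *dev = comedi_open("/dev/comedi0");
	lsampl_t sample;

	if (dev == NULL) {
		comedi_perror("comedi_open");
		return 1;
	}

	/* subdevice 0 = analog input, channel 0, range 0, ground reference */
	comedi_data_read(dev, 0, 0, 0, AREF_GROUND, &sample);
	printf("AI0 raw value: %u\n", sample);

	/* subdevice 1 = analog output (subdevice numbering depends on the driver) */
	comedi_data_write(dev, 1, 0, 0, AREF_GROUND, 2048);

	comedi_close(dev);
	return 0;
}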
5 Qemu Virtual Hardware

Qemu is an open-source processor emulator. Unlike common virtualization solutions it is capable of emulating x86, x86-64, ARM and other widespread processor architectures. For the purposes of this work it was used for implementing the virtual Humusoft MF624 DAQ card.

Implementation of Virtual PCI device

When creating a new virtual device in Qemu, the main hook into the Qemu device infrastructure is done by invoking device_init() with a pointer to an initialization function with the prototype static void (*)(void) as parameter. For registering a new PCI device, it is necessary to call pci_qdev_register(), passing a pointer to PCIDeviceInfo as parameter. The most important fields of this Qemu-specific data type are pointers to init and exit functions with the prototype int (*)(PCIDevice *).

The PCI device specific initialization consists of:

• Initializing the configuration space of the PCI device – e.g. setting Vendor and Device IDs, device class, interrupt pin, etc.

• Registration of the I/O memory used by the device.

• Creating a function (called when the device gets memory allocated from the virtual PCI controller) for mapping physical memory to particular BARs (Base Address Registers) of the PCI device.

The very basic (non-compilable) example:

static CPUReadMemoryFunc * const mf624_BAR0_read[3] = { NULL, NULL, mf624_BAR0_read32 };
static CPUWriteMemoryFunc * const mf624_BAR0_write[3] = { NULL, NULL, mf624_BAR0_write32 };

static void mf624_map(PCIDevice *pci_dev, int region, pcibus_t addr, pcibus_t sz, int tp)
{
	mf624_state_t *s = DO_UPCAST(mf624_state_t, dev, pci_dev);
	cpu_register_physical_memory(addr + 0x0, BAR0_size, s->BAR0_mem_table_index);
}

static int pci_mf624_init(PCIDevice *pci_dev)
{
	mf624_state_t *s = DO_UPCAST(mf624_state_t, dev, pci_dev); /* i.e. container_of() */
	uint8_t *pci_conf;

	pci_conf = s->dev.config;
	pci_config_set_vendor_id(pci_conf, PCI_VENDOR_ID_HUMUSOFT);
	/* ... */
	pci_conf[PCI_INTERRUPT_PIN] = 0x1;

	s->BAR0_mem_table_index = cpu_register_io_memory(mf624_BAR0_read, mf624_BAR0_write,
	                                                 s, DEVICE_NATIVE_ENDIAN); /* returns unsigned int */
	pci_register_bar(&s->dev, 0, BAR0_size, PCI_BASE_ADDRESS_SPACE_MEMORY, mf624_map);
	return 0;
}

static PCIDeviceInfo mf624_info = {
	.qdev.name = "mf624", .qdev.size = sizeof(mf624_state_t),
	.init = pci_mf624_init, .exit = mf624_exit,
};
static void reg_dev(void) { pci_qdev_register(&mf624_info); }
device_init(reg_dev)
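Assuming the device model is compiled into Qemu and registered under the qdev name "mf624" as in the listing above, an instance can typically be added to a guest from the command line; the exact invocation depends on the Qemu version and the disk image name is a placeholder:

$ qemu -hda guest_linux.img -device mf624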


Usage of Virtual MF624

When running any guest operating system in Qemu (with support for the MF624 activated) the virtual MF624 device is available in the same way as if it were real hardware – there are no issues with interfacing between the guest operating system and the virtual device. Interfacing between the virtual hardware and the real world is handled by a TCP/IP connection from the MF624 part in Qemu to the host operating system. It is used for reading/setting output/input values (as shown in figure 2).

The most fundamental way of communication through this channel is by using the telnet application. Example of real communication:

$ telnet localhost 55555
Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
DA1=9.998779
DOUT=255.000000
DOUT=0.000000
DA1=5.000000
^]
telnet> Connection closed.

As a much easier way of interfacing, there is also a graphical application created just for the purpose of communicating with the virtual MF624 card (see figure 3). It was created using the Qt4 graphical toolkit.

FIGURE 2: Qemu implementing virtual MF624 device

FIGURE 3: Graphical application used for interfacing between virtual MF624 and real world

6 Conclusion

The outcome of this work is a basic integrated tool for teaching PCI driver development (mostly) for the GNU/Linux operating system. Its main advantage is the possibility to practice driver development as if on real hardware without the necessity of having an expensive DAQ device. The other advantage is a safe environment for driver prototyping – where no mistake can damage the host operating system.

All the information (including source code) related to the topic covered in this article is publicly available on the web page rtime.felk.cvut.cz/hw/index.php/Humusoft_MF6xx

References

[1] Jonathan Corbet, Alessandro Rubini, Greg Kroah-Hartman, Linux Device Drivers, 3rd Edition, O'Reilly Media, 2005

[2] Hans-Jürgen Koch, http://www.kernel.org/doc/htmldocs/uio-howto.html

[3] David Schleef, Frank Hess, Herman Bruyninckx, http://www.comedi.org/doc/

[4] Fabrice Bellard et al., git://git.qemu.org/qemu.git


A Framework for Component-Based Real-Time Control Applications

Stefan Richter
ABB Corporate Research, Industrial Software Systems
Segelhofstr. 1K, Baden-Dättwil, Switzerland
[email protected]

Michael Wahler
ABB Corporate Research, Industrial Software Systems
Segelhofstr. 1K, Baden-Dättwil, Switzerland
[email protected]

Atul Kumar
ABB Corporate Research, Industrial Software Systems
Whitefield Road, Bangalore, India
[email protected]

Abstract
State-of-the-art real-time control systems execute multiple concurrent control applications using op-
erating system mechanisms such as processes, mutexes, or message queues. Such mechanisms leave a
high degree of freedom to developers but are often hard to deal with: they incur runtime overhead, e. g.,
context switches between threads, and often require tedious and costly fine-tuning, e. g., of process and
thread priorities. Reuse is often made more difficult by the tight coupling of software to a given hardware
or other software.
In this paper, we present a software architecture and execution framework for cyclic control appli-
cations that simplifies the construction of real-time control systems while increasing predictability and
reducing runtime overhead and coupling. We present the concepts of this framework as well as imple-
mentation details of our RTLinux-based prototype.

1 Introduction

Control systems interact with the physical world through sensors and actuators. Since physical processes typically do not wait for the result of some computation, control systems must meet given deadlines and are therefore real-time systems. Traditionally, there was a separate controller for each control application. These controllers are typically running at control cycles of one millisecond, i.e., they acquire new measurements, process them, and take actions once every millisecond. With the continuous improvement of computer technologies and the introduction of digital communication standards such as IEC 61850 [3], there have been strong trends to combine several such logical controllers into one physical device to reduce cost. Still, a certain degree of independence is required to make sure that a faulty logical controller does not affect other controllers on the same physical device.

Executing multiple control applications on the same controller has significantly increased the complexity of the controller's software. In particular, the concurrent execution of multiple applications requires synchronization between different processes.

The fine-tuning of such synchronization to satisfy all runtime constraints on a given platform can be tedious and error-prone. Furthermore, it is difficult to reuse a given set of processes on a different platform. At runtime, the concurrent execution of multiple processes causes context switches, which may be expensive in comparison to the program logic.

In this paper, we present a component-based software architecture and execution framework for a large class of industrial control systems. These systems comprise one or several periodic real-time tasks (as defined by Liu in Section 3.3.1 [8]). Using a small set of abstraction mechanisms, our approach simplifies system construction and makes large parts of an engineered control system reusable across different platforms.

This introductory section finishes with a motivating example and an overview of related issues. Our component framework and its abstraction mechanisms are presented in Section 2. We give implementation details in Section 3 along with several measurements of the framework's performance in Section 4. We conclude the paper with a summary, discussion, and outlook in Section 5.

1.1 Motivating Example

Throughout this paper we will use an example from the domain of power systems. In this domain, a typical purpose of an embedded controller is protecting primary equipment, e.g., a transformer. To this end, the controller receives measurements, i.e., voltages or currents, from a sensor and determines from these measurements whether the equipment is in a healthy state. If not, the controller sends a command to a circuit breaker that disconnects the equipment from the network to avoid damage. There are controllers for erroneous situations regarding voltage, current, temperature, and many more.

In our example, we want to run an overcurrent protection and an overvoltage protection on the same device. Both protection functions run at 1 kHz (a typical frequency for this kind of controllers) and rely on measurements that are sampled at 1.6 kHz (a typical sampling frequency in IEC 61850). These measurements need to be resampled by the controllers from 1.6 to 1 kHz to feed consistent data into the protection algorithms. Because of the general setup of the environment, all sensor data from one point in time are sent in one Ethernet packet as defined in IEC 61850-9-2. The structure of the software comprising these functionalities is shown in Figure 1.

FIGURE 1: Structure of a typical controller software (overvoltage protection and overcurrent protection on top of a sample manager)

A typical implementation of such a system would follow the client-server pattern and comprise three processes: overvoltage protection, overcurrent protection, and the sample manager. The latter would be responsible for receiving, parsing, resampling, and storing the data packets temporarily. Ultimately, the protection functions would request resampled data at 1 ms intervals.

Both protection processes would contain one thread each, which would consist of an endless loop that waits for a trigger (either as a direct timer interrupt handler, some IPC mechanism such as a semaphore, or by calling a sleep function) and then runs its respective protection algorithm. For obtaining data from the sample manager process, some inter-process communication mechanism would be required.

On the other side, the sample manager would provide two threads for answering to the protection functions and one thread for dealing with incoming network packets. These three threads would have to be synchronized using standard mechanisms such as mutexes to protect access to the sample buffer (even though there might be a solution without explicit synchronization mechanisms, it is not obvious and possibly not easy).

1.2 Issues

This example exhibits several drawbacks of this kind of implementation.

Predictability Algorithms for analyzing general thread-based systems, e.g., detecting deadlocks, are hard to implement (see, e.g., [10]). By using specific characteristics of the class of systems under consideration here, predictions could be automated.

Implementation effort For every such application, the application engineer has to think thoroughly about the implementation details

such as priorities, proper synchronization, possible deadlock scenarios etc. Often, application engineers in the power and automation domain lack formal education in computer science, making the task even harder for them.

Reusability While all processes seem to be fairly independent from each other, they do depend on the protocol between the sample manager and the protection functions. This protocol needs to be implemented by both sides. Further, developers must also decide if the threads are supposed to be run in a light-weight (same process) or in a heavy-weight way (different processes) and must adapt the usage of inter-application communication mechanisms accordingly.

OS overhead If there are fewer CPU cores than threads, context switches between threads have to occur several times in each cycle, whenever the flow of execution requires one thread to wait for another thread's output. Because of the short cycle times in many embedded systems, context switches may have a significant impact on the system behavior (see Section 4).

Communication overhead The communication must rely on means of inter-process communication, which is considered to be slow (e.g., message passing) or to potentially compromise data integrity (e.g., shared memory).

1.3 Related Work

Component-based systems have been proposed to overcome some but not all of these issues. Kopetz [4] describes a component system for real-time systems that addresses predictability and implementation effort. While we use a different, i.e., more fine-grained and restrictive, component model, we follow the proposed concept by using static schedules as advocated also by Locke [9].

Kuz et al. [5] present a component framework on top of the L4 microkernel. It strives for defining operating system functionality (e.g., file system) in a component-based way. In contrast, our framework builds on top of existing real-time operating systems such as RTLinux and focuses on the control applications.

Rastofer and Bellosa [12] aim at separating component functionality from platform mechanisms such as concurrency and synchronization to increase predictability of system properties. Earlier, Büchi and Weck [2] pointed out that black-box components are not sufficient for analyzing important properties of systems and that white-box components contain too much detail for this task. The decomposition of components into blocks (see Section 2.1) in our approach follows up on their plea for gray-box components.

Reusability has been addressed in a component-based real-time system by Wang et al. [15]. They propose a component-based resource overlay that isolates the underlying resource management from applications to separate the concerns of application designers and component providers. Complementing the work presented in this paper, they focus on the proper allocation of resources to real-time components.

2 Component Framework

To address the issues in Section 1.2, we designed a component framework with a runtime concept comprising four structural elements: component, function block, port, and channel. In Section 2.1 and Section 2.2, we describe these concepts in greater detail. Our component framework further encompasses a concept for executing fully deterministic static but replaceable schedules. Application schedules are explained in Section 2.3; their execution is presented in Section 2.4. In Section 2.5, we discuss how the concepts introduced in this section address the aforementioned issues.

2.1 Components and Function Blocks

In our component framework, the example above would consist of three components representing overvoltage protection, overcurrent protection, and sample manager. Conceptually, components separate pieces of software from each other. Their purpose is to provide for the independence required in Section 1.1, hence to ensure that software defects that could affect system stability (e.g., memory safety violations) do not propagate across component boundaries.

Components as such do not directly provide for any executable code; they are in effect comparable to address spaces in typical operating systems. In contrast to a process, which contains threads, a component contains a number of function blocks. A function block (or simply block) is defined as a sequence of instructions with the following properties:

Sequential It needs to consist of exactly one stream of instructions that can be executed on one processor core, i.e., it cannot distribute its work

on several cores or threads. The stream of instructions can contain branches and loops as long as the other properties below are satisfied.

Terminating Each block needs to guarantee that it finishes its execution within its given deadline for arbitrary inputs if the block is free to run on the CPU.

Non-Blocking A block may not depend on any functionality of the underlying platform that could block its execution. This includes in particular synchronization mechanisms, standard I/O operations, and sleep instructions. It does not mean that blocks cannot be used at all to access file systems, but it has to be done in a non-blocking way, i.e., by polling.

Stateless A block may not keep any state, i.e., the output of a block depends only on its inputs in the same execution cycle. (Of course, some blocks might also depend on the state of the underlying system, e.g., the file system. For the sake of simplicity such blocks and their implications to the system are out of scope of this paper.)

These properties ensure that data flow and control flow are not part of the block's logic. Instead, block execution is orchestrated by the framework according to given data flow and control flow models. This orchestration is described in Section 2.3.

Compared to a thread as a basic entity in operating systems, a function block does not maintain data across cycles. An important consequence is that the stack associated with the execution of a block is empty after a block finished. Moreover, blocks do not require sophisticated synchronization like threads since they do not rely on blocking mechanisms. In particular, blocks cannot wait for other blocks and thus situations involving deadlocks cannot arise.

In Figure 2, we depict a decomposition of our example from Section 1.1 into components and blocks. The sample manager has a block Sample Receiver, which receives samples from the network, and one block Sample Provider for each protection component that provides the required samples. Each protection component has a block that determines which samples have to be requested from the sample manager and a block containing the actual protection algorithm.

FIGURE 2: Block diagram of example (sample manager with Sample Receiver and two Sample Providers; each protection component with a Voltage/Current Sample Selector and a protection algorithm block)

2.2 Ports and Channels

Blocks depend on input and generate output. To this end, they expose interfaces consisting of input ports and output ports, which have been depicted as inbound and outbound arrows in Figure 2. Blocks are not allowed to write to input ports or to read from output ports.

Ports need to be connected to channels, which are represented by circles in Figure 2, such that each port is connected to exactly one channel and each channel is connected to exactly one input port and one output port.

The framework ensures that each block has valid inputs before its execution gets triggered, i.e., all blocks on which a block depends get executed and finished before this block.

There might be different kinds of channels in a system to optimize its performance. For instance, a channel between two blocks in the same component could be implemented as simply as some shared memory. A channel between two components on the same controller could be implemented by some message queue mechanism. This observation yields the essential argument for not encapsulating each block into its own component.

The reader may have noticed that our example requires a buffer for received samples but that a block is explicitly stateless. We consider this buffer a kind of feedback that can be represented by a component-internal channel as depicted with a dashed line in Figure 3. The Repeater is a block that distributes the same data from one channel to n others, in this case as feedback to the sample receiver and as input to the sample providers.
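To make the notions of blocks, ports, and channels more concrete, the following sketch shows one possible plain-C representation. It is our own illustration and not the authors' actual (C++-based) interface; all identifiers are hypothetical.

/* A channel buffers one value between exactly one producer and one consumer. */
struct channel {
	double value;
};

/* A block only reads its input ports and writes its output ports and keeps
 * no state across cycles; its body is sequential, terminating, non-blocking. */
struct block {
	const char *name;
	struct channel **in;   /* input ports, each connected to one channel */
	struct channel **out;  /* output ports, each connected to one channel */
	void (*run)(struct block *self);
};

/* Example block body: scales its single input and writes the result. */
static void scale_run(struct block *self)
{
	self->out[0]->value = 0.5 * self->in[0]->value;
}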

FIGURE 3: State as feedback (the Repeater block feeds data from the Sample Receiver back as a component-internal channel)

Because input ports are read-only and output ports are write-only, this setup could be easily optimized by a compiler to avoid copying data. The advantage of this approach is its proximity to control theory, in which system state is usually described in terms of some mathematical function that depends on time (see, e.g., [13]). Domain-relevant standards such as IEC 61131 thus consequently model system state as feedback.

2.3 Application Schedules

Application schedules describe the execution order and periodicity of the blocks of an application. In particular, they define the control flow between blocks because applications do not change during execution. Such separation of control logic (defined in the blocks) and of how individual algorithms operate together (defined in the schedule) allows for easy reuse of blocks in different contexts.

An application schedule is a tree whose leaf nodes are blocks and whose inner nodes (called control nodes) specify details of the execution of their children. Schedule trees can be of arbitrary depth, i.e., control nodes can contain other control nodes. Schedules can be automatically generated from the data flow specified in block diagrams [6] and subsequently be optimized either automatically or manually. The control nodes of our system are:

(S)equential nodes require execution of their child nodes in exactly the sequence specified.

(P)arallel nodes allow for an arbitrary execution order of their child nodes.

(A)lternating nodes alternate through their n child nodes. In each cycle, the next child gets executed. It has to be specified after how many cycles c ≥ n execution starts again with the first child. This allows for an efficient implementation of multi-rate systems as each child gets executed with a c times larger period than its parent.

(W)atchdog nodes terminate the execution of their children after a predefined time.

The application schedule for the example of Figure 2 is shown in Figure 4. The root node, which is executed first in each cycle, is a watchdog node that terminates subsequent blocks if they run longer than 850 µs. The next block is a sequential node, which first executes the Sample Receiver (SR) block and then a parallel control node. This parallel node specifies that its branches (two sequences of blocks) can be executed in any order.

FIGURE 4: Schedule for example application (a watchdog node with an 850 µs budget over a sequence of SR followed by a parallel node containing the sequences CSS, SP, OCP and VSS, SP, OVP)

Sometimes the worst-case execution time of a system cannot be statically verified. In this case, the root node of the schedule should be a watchdog control node with a timeout that is less than the cycle time to ensure the timely termination of the whole schedule. As an example, a system running at 1 kHz should be guarded by a watchdog control node that ends a cycle after 850 µs. This leaves a slack time of 150 µs in which asynchronous tasks such as an FTP server can be scheduled by the operating system.

Application schedules as presented in this subsection offer several benefits:

• The information about dependencies between blocks is not in the blocks themselves but in the schedule. Thus, blocks can be independently developed, tested, and verified.

• Interaction between blocks is simplified, which makes it easier to reason about the system.

• Blocks can be guaranteed to get their share of CPU time by using time slot guarantees provided by the scheduling mechanisms of modern real-time operating systems. Conversely, blocks can be checked at runtime whether they stick to their time limits using a watchdog timer as described above.
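The following sketch illustrates, again in plain C and purely as our own example rather than the authors' implementation, how such a schedule tree with the four control node kinds could be represented and walked once per cycle; all identifiers are hypothetical.

enum node_kind { NODE_BLOCK, NODE_SEQ, NODE_PAR, NODE_ALT, NODE_WDOG };

struct sched_node {
	enum node_kind kind;
	void (*block)(void);        /* block body, valid for NODE_BLOCK leaves */
	struct sched_node **child;  /* children of a control node */
	unsigned nchild;
	unsigned alt_cycles;        /* c for (A) nodes, with c >= nchild */
	unsigned alt_next;          /* position within the c-cycle pattern */
	unsigned long timeout_us;   /* budget of (W) nodes */
};

/* One cycle of execution of a subtree. */
static void run_node(struct sched_node *n)
{
	unsigned i;

	switch (n->kind) {
	case NODE_BLOCK:
		n->block();
		break;
	case NODE_SEQ:
	case NODE_PAR:  /* any order is allowed; plain order is one valid choice */
		for (i = 0; i < n->nchild; i++)
			run_node(n->child[i]);
		break;
	case NODE_ALT:  /* one child per cycle, restarting after alt_cycles cycles */
		if (n->alt_next < n->nchild)
			run_node(n->child[n->alt_next]);
		n->alt_next = (n->alt_next + 1) % n->alt_cycles;
		break;
	case NODE_WDOG: /* a real implementation arms a timer of timeout_us here */
		for (i = 0; i < n->nchild; i++)
			run_node(n->child[i]);
		break;
	}
}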

2.4 Execution

Since multiple applications can be executed on the same controller, a system schedule needs to be computed from the individual application schedules. Blocks are executed sequentially in our framework. Therefore, the system schedule defines a total order on the blocks of all applications. This total order must be consistent with the partial orders of the blocks in the individual application schedules.

As an example, assume that the application from Figure 2 is the only application to be scheduled. Since the application schedule of this application (cf. Figure 4) involves parallel control nodes, there are several possible system schedules that satisfy the partial block order. One such system schedule would be the sequence {SR, CSS, SP, OCP, VSS, SP, OVP}.

Although system schedules are static, an active schedule can be replaced with another schedule at runtime. In [14] we describe how this is achieved and explain how control software can be updated at runtime using this mechanism.

System schedules are executed by a dispatcher, which executes a sequence of blocks in a cyclic fashion at a given base frequency. To this end, timer interrupts are generated at regular intervals, each of which triggers one execution of the dispatcher. In Figure 5, timer interrupts are displayed as vertical arrows. The dispatcher then calls some or all blocks according to its schedule, which is represented by the gray bars in Figure 5. After the last block has been executed, the dispatcher waits until the next timer interrupt. During this period of time, which we call slack time, other software may run asynchronously, e.g., an FTP server. This software will be preempted by the operating system as soon as the timer interrupt triggers the dispatcher for the execution of the next cycle.

FIGURE 5: Timeline of schedule execution (schedule execution followed by slack time within each cycle)

The interval between two timer interrupts is called the base cycle time of the system; its inverse is called the base frequency. Alternating nodes allow for the implementation of multi-rate systems as long as the cycle times of all blocks are integer multiples of the base cycle time. In other words, the base cycle time should be chosen such that it represents the least common denominator of the individual cycle times of all applications in the system. As an example, if three applications shall be executed that need to be run at 100 Hz, 500 Hz, and 1000 Hz, the base cycle time should be set to 1 ms. The applications can then be executed in every 10th (100 Hz), in every 5th (500 Hz), and in every single cycle (1000 Hz).

2.5 Benefits

The concepts of the component framework presented in this section reduce the drawbacks listed in Section 1.2. The static scheduling approach of our framework improves the predictability of the real-time system compared to dynamic scheduling approaches with priority-based scheduling (cf. Liu, Section 4.4 [8]). Its simplified programming model lets developers focus on the algorithms in the blocks instead of priorities and synchronization mechanisms. It has been shown empirically that the implementation effort can be reduced significantly by providing an appropriate programming model [11].

Through the explicit representation of control flow and data flow, blocks become more reusable because they can be developed independently of the context in which they are used. Furthermore, programmers can rely on the robust communication protocol between blocks and do not have to implement a protocol themselves.

We have shown that blocks are executed sequentially. As we will show in Section 3, context switches can be avoided when sacrificing component separation. In this case, OS overhead will be reduced. See Section 5 for a discussion on how this overhead could be reduced while still maintaining component separation.

Furthermore, system engineers have the freedom to adjust the communication overhead depending on their respective priority for performance and safety. This will be discussed in the subsequent section.

3 Implementation

Besides the conceptual separation of concerns by using components on the code level, our framework allows for the physical enforcement of such separation at runtime. By assigning components to distinct address spaces, a defect in one component cannot affect arbitrary other components by corrupting their memory. As an example, on POSIX-based systems, components can run as individual processes and the channels between components can be implemented

using message passing.

However, this increased level of safety comes at a price: while blocks in the same address space can be executed without overhead, blocks in different address spaces require additional context and address space switches. Whereas components in the same address space can communicate via light-weight mechanisms such as shared memory, components in different address spaces must employ indirect mechanisms such as message passing or sockets for communicating with each other (on the other hand, research indicates that message passing can be faster than shared memory on multi-core machines [1]).

To allow system engineers to make a trade-off between performance and safety, our framework can be configured at compile time to enable or disable physical separation while still having the same conceptual separation. Therefore, systems with a high level of safety can be built by implementing physical separation of all components. For such systems, more powerful and thus more expensive CPUs have to be used. If in contrast the price of a system should be kept as low as possible, engineers can waive increased safety and build a system using a low-cost CPU without physical separation of components.

Consequently, we implemented two versions of our framework based on POSIX. In the high-performance (or fast) implementation, the framework and all components are executed in the same address space and shared memory is used for inter-component communication. In the high-safety (or safe) implementation, the framework and each component run as processes in separate address spaces and message queues are used for communication. In Section 4, we compare the performance between these two implementations.

In order to foster reusability of the framework and the components, maximal platform independence has been a major design decision. In fact, both versions of our framework, high-performance and high-safety, share an extensive common code base except for a very thin platform layer. It requires C++ wrapper classes and class templates for only a few platform-specific mechanisms:

• Timers allow the framework to perform its execution cycles and watchdog control nodes to terminate blocks.

• Data Transmission (e.g., shared memory or message queue) is required for implementing channels.

• Synchronization (e.g., mutex, semaphore, or message queue) is used for initiating the execution of blocks and notifying the framework of their completion.

4 Performance Measurement

In the following, we compare the performance of the high-safety implementation and the high-performance implementation.

The hardware used for the measurements is a PC with an Intel Core2 Duo CPU E6550 running at 2.33 GHz. The Linux kernel (2.6.31-9-rt) was instructed to only use one CPU core. The hardware used for the measurements is certainly more powerful than current embedded hardware. However, the measurements will provide valuable information on the factor by which performance decreases in the high-safety implementation.

Figure 6 shows the setup for measuring. In this setup, two blocks b1 and b2, which belong to different components, communicate with each other. In each cycle, b1 gets the current time from the system and sends the timestamp to b2. In Figure 6, two numbered dotted arrows indicate the following two measurements:

1. Channel transmission time: The time it takes for data to be sent across a channel. In the high-performance implementation, a channel is simply a location in memory. In the high-safety implementation, a channel is a message queue provided by the OS.

2. Block control: The time it takes for the framework to start the execution of a block and regain control after its execution. The high-performance implementation directly calls functions whereas the high-safety implementation relies on message queues.

FIGURE 6: Performance measurement setup (blocks B1 and B2 in components C1 and C2 on top of the component framework)

Table 1 lists the results of our performance measurements. Each measurement is the average result

of hundreds of measurements. The tolerance of best case and worst case is about 10% to 15%.

                          "fast"    "safe"
(1) Channel transmission  0.02 µs   1.97 µs
(2) Block control         0.46 µs   6.79 µs
Sum                       0.48 µs   8.76 µs

TABLE 1: Performance measurement results

In total, it takes the framework about 0.48 µs to schedule a block and a channel in the high-performance implementation and 8.76 µs in the high-safety implementation. In measurements performed by Li et al. [7] on a comparable system, the average context switch was around 3.8 µs. This indicates that most of the overhead of a safe implementation is caused by context switches and not by the feature that is actually required for safety, address space separation.

5 Conclusions

We have presented a component-based software framework that enforces structurization of cyclic real-time control software systems vertically and horizontally. By decoupling different aspects, such as application logic, control flow, or communication, our approach can be expected to simplify the construction of complex control systems and to reduce the implementation effort.

In addition, static scheduling and non-preemptible execution of function blocks increase the system's determinism and thus its predictability compared to interleaved execution of threads and dynamic scheduling. Moreover, our approach allows system engineers to adjust the system to their needs if safety is of lesser concern than cost because the number of context switches and thus system overhead can be reduced.

We have shown how the abstractions offered by the framework can be implemented on RTLinux and provided performance measurements for two different implementations.

5.1 Discussion

In domains such as power and automation systems, components are required to be separated from each other at runtime to prevent the propagation of faults across component boundaries. We have shown that it is feasible to construct systems that are either safe or fast.

The prevailing concept of processes and threads, however, makes the construction of systems that are safe and fast difficult. We argue that a small and easy-to-implement modification of this concept will overcome this limitation: Instead of running a thread in the same process, i.e., address space, we propose to allow threads to change the address space during runtime.

With this modification, we could statically schedule the execution of blocks in one thread per CPU core. During execution of this schedule, the thread would enter and leave address spaces in correspondence to the blocks' components. An obvious advantage is the lack of context switches because blocks always run to completion and because they leave the stack empty after execution.

Moreover, inter-process communication could be implemented efficiently by using shared memory without synchronization mechanisms. Imagine two blocks A and B in different components that are connected by a channel such that A sends data to B. Since B only gets started after A finished its execution, B will always read consistent data from the channel/shared memory. This is still true if the same thread is executing A and B, or if A and B are executed on different cores.

Note that it is also possible to implement an operating system based on our component/block paradigm instead of the process/thread paradigm. We currently consider this idea less feasible. The main reason is that in cyclic control systems there are also sporadic tasks without real-time requirements, e.g., an FTP server. Such low-priority tasks run in the slack time of the real-time cycle, and typically their execution requires the slack time of more than one cycle. They therefore need to be preemptible and the operating system would have to provide some preemption mechanism similar to the one in the process/thread paradigm.

5.2 Future Work

We see considerable potential for automatic tools that can assist in system design. For instance, component boundaries do not have to be drawn arbitrarily. Instead, an automatic tool can formally derive component boundaries according to logical constraints. In our example, the three components have to exist because the two protection functions have to be separated: If one of them fails the other one is not affected. In addition, the sample manager needs to

be separate because it must not be affected by faults from either protection function in order to still be able to serve the other one. However, if there is only one protection function it can be merged with the sample manager because a fault in any block renders the whole system dysfunctional.

In Section 2.4, we showed how the system schedule maps all blocks to be executed onto the same CPU core. This approach can be extended to multiple cores by providing one mapping for each core such that every block is statically assigned to one core. This allows for executing our framework in multi-core, multi-CPU, or even distributed scenarios. Synchronization between the different cores is simplified to well-defined synchronization points because the dependencies between the blocks are explicitly specified in the application schedules.

The dispatcher in our framework can be extended such that blocks are not only executed in sequence, but at precise points in time. This is an effective means for reducing the system's jitter.

References

[1] A. Baumann, P. Barham, P. E. Dagand, T. Harris, R. Isaacs, S. Peter, T. Roscoe, A. Schüpbach, and A. Singhania, "The Multikernel: A New OS Architecture for Scalable Multicore Systems," in SOSP '09: Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, New York, NY, USA: ACM, 2009, pp. 29–44.

[2] M. Büchi and W. Weck, "A Plea for Grey-Box Components," in Workshop on Foundations of Component-Based Systems (FoCBS '97), Zürich, 1997.

[3] IEC 61850, Communication networks and systems in substations, International Electrotechnical Commission standard.

[4] H. Kopetz, "The component-based design of large distributed real-time systems," Control Engineering Practice, vol. 6, no. 1, pp. 53–60, 1998.

[5] I. Kuz, Y. Liu, I. Gorton, and G. Heiser, "CAmkES: A Component Model for Secure Microkernel-based Embedded Systems," Journal of Systems and Software, vol. 80, no. 5, pp. 687–699, 2007.

[6] E. Lee and D. Messerschmitt, "Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing," IEEE Transactions on Computers, vol. 36, no. 1, pp. 24–35, 1987.

[7] C. Li, C. Ding, and K. Shen, "Quantifying the cost of context switch," in Proceedings of the 2007 Workshop on Experimental Computer Science, Article 2, San Diego.

[8] J. W. S. Liu, Real-Time Systems, Prentice-Hall, New Jersey, 2000, ISBN 0-13-099651-3.

[9] C. D. Locke, "Software Architecture for Hard Real-Time Applications: Cyclic Executives vs. Fixed Priority Executives," Journal of Real-Time Systems, no. 4, pp. 37–53, 1992.

[10] M. Naik, C.-S. Park, K. Sen, and D. Gay, "Effective Static Deadlock Detection," ICSE '09.

[11] L. Prechelt, "An Empirical Comparison of Seven Programming Languages," IEEE Computer, vol. 33, no. 10, pp. 23–29, 2000.

[12] U. Rastofer and F. Bellosa, "Distributed component-based software engineering for distributed embedded real-time systems," IEE Proceedings Software, vol. 148, no. 3, pp. 99–103, 2001.

[13] C. B. Speedy, R. F. Brown, and G. C. Goodwin, Control Theory: Identification and Optimal Control, Oliver & Boyd, Edinburgh, 1970.

[14] M. Wahler, S. Richter, S. Kumar, and M. Oriol, "Non-disruptive Large-scale Component Updates for Real-Time Controllers," in Third Workshop on Hot Topics in Software Upgrades (HotSWUp '11), 2011.

[15] S. Wang, S. Rho, Z. Mai, R. Bettati, and W. Zhao, "Real-time component-based systems," in Proceedings of the IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), San Francisco, CA, USA, 7–10 March 2005.
A Framework for Component-Based Real-Time Control Applications

116
Performance Evaluation and Enhancement of Real-Time Linux

Real-Time Performance of L4Linux

Adam Lackorzynski
Technische Universität Dresden
Department of Computer Science, Operating Systems Group
[email protected]

Janis Danisevskis, Jan Nordholz, Michael Peter
Technische Universität Berlin
Deutsche Telekom Laboratories, Security in Telecommunications
{janis,jnordholz,peter}@sec.t-labs.tu-berlin.de

Abstract
Lately, the greatly improved real-time properties of Linux piqued the interest of the automization
industry. Linux holds a lot of promise as its support by a large active developer community ensures an
array of supported platforms. At the same time, automization vendors can tap into a huge reservoir of
sophisticated software and skilled developers. The open nature of Linux, however, raises questions as to
their resilience against attacks, particularly in installations that are connected to the internet.
While Linux has a good security record in general, security vulnerabilities have been recurring and
will do so for the foreseeable future. As such, it is expedient to supplement Linux’ security mechanisms
with the stronger isolation afforded by virtual machines. However, virtualization introduces an additional
layer in the system software stack which may impair the responsiveness of its guest operating systems.
In this paper, we show that L4Linux — an encapsulated Linux running on a microkernel — can be
improved on such that it exhibits a real-time behavior that falls very close to that of the corresponding
mainline Linux version. In fact, we only measured a small constant increase of the response times, which
should be small enough to be negligible for most applications. Our results show that it is practically
possible to run security critical tasks and control applications in dedicated virtual machines, thus greatly
improving the system’s resilience against attackers.

1 Introduction

Traditionally, embedded systems follow the trend set by their siblings in the desktop and server arena. Rapid technological advances allow for ever more functionality. On the downside, the growing complexity poses a risk as software defects may badly impair the expected machine behavior. This risk is particularly threatening for embedded systems as these systems often have to meet stringent safety requirements.

Another trend carrying over from desktops and servers is the use of open source software, particularly Linux. The adoption of Linux is remarkable insofar as it started out as a hobbyist's system aimed at desktops. Although neither security nor scalability were considered in the beginning, Linux came to address either of them quite well, making it hugely popular as a server operating system. Linux also got a significant foothold in the embedded market. Apart from the absence of licensing costs, its availability for a host of systems holds a lot of appeal for device manufacturers. Having ports of Linux for many platforms is due in no small part to an active community, which allows for improvements to circulate as open source. A third contributing factor for Linux' popularity is its maturity. For example, as embedded systems embrace network connectivity, they can draw on a mature network stack and a variety of complementing user level software. Originally not being a real-time operating system, the tireless effort of the community managed to evolve Linux to the point where it can run applications with real-time requirements. With that ability, Linux is set to make inroads into markets that used to be the preserve of dedicated real-time operating systems, most of them proprietary.

With regard to security, the operating system is of particular importance. It falls to it to keep the applications running on it in check. Yet, prevalent operating systems have a rather poor record when it comes to isolation. The first problem is that they do poorly when it comes to the principle of least
Real-Time Performance of L4Linux

Some applications need special privileges to fulfill their task. Unfortunately, the granularity with which these privileges can be granted is too coarse, leaving many applications running with excessive rights. If such an application succumbs to an attacker, the attacker can wield that right for malicious purposes. The point is corroborated by the observation that exploits of applications running with the root identity are numerous. However, today's operating systems cannot be easily phased out because porting their large number of applications and device drivers is a laborious endeavor with uncertain prospects of success.

Confronted with the wanting protection of operating systems, designers of security-critical systems have long turned to physical separation, that is, placing each application on its own machine. While ensuring excellent separation, this approach is not easily applicable to use cases where budget, space, and power limitations have to be met. That is where operating system encapsulation comes into play. While virtualization is currently the most prominent encapsulation technique, it is neither the only one nor the one best suited for all use cases. In the absence of hardware virtualization support, other approaches like OS rehosting might be the better choice.

While virtualization is a well-studied subject, we are not aware of research covering real-time operating systems in virtual machines. In this paper, we set out to investigate whether the real-time behavior of Linux can be preserved if it is executed in a container. Our results indicate that such a setup is feasible if moderately longer event response times can be tolerated.

We will proceed with a brief summary of OS encapsulation use cases followed by a description of L4Linux, the system we build on. The design chapter follows up with a discussion of the problems we faced when deriving an L4Linux version from a Linux with the PREEMPT_RT patch applied. To evaluate the real-time performance of L4Linux, we conducted a number of experiments. The results will be presented before we discuss related work and conclude.

2 Mitigating OS Shortcomings

In the past, monolithic operating system kernels have been perceived as an obstacle for the evolvement of secure, robust, and flexible systems. The underlying concern was that their growing complexity would inevitably introduce software defects. If such a defect triggers a fault, the consequences are most likely fatal and will crash the system, which is unacceptable for safety-critical systems.

As an alternative solution, the research community put forth designs based on a minimalistic kernel which is small enough to be thoroughly inspected. Most of the functionality that used to be part of monolithic kernels (device drivers, protocol stacks) would be moved into user level tasks. The rationale was that such an architecture would allow for easy customization by replacing or duplicating the relevant user level components.

As device driver development had been known to be the weak spot for any new system, there was the need to reuse as many device drivers as possible. That raised the question of how to provide the execution environment they were written for. The original vision was that monolithic kernels would be decomposed into small components, each executing in its own protection domain. Since faults would be contained, many system architects nourished the hope that the resulting system would be superior in terms of flexibility, fault tolerance, and security. This endeavor, though, proved complicated. Plans to move to a system completely devoid of legacy operating system code have not materialized so far, and it is dubious if they will in the foreseeable future.

As set out above, the insufficiencies of prevalent operating systems pose a risk that may be unacceptable for security- and safety-critical applications. At the same time, they cannot be completely retired as many applications and device drivers depend on them. One solution might be to deploy multiple of them, each dedicated to a specific task, and ensure that no unsanctioned interference between them occurs. The most frequently cited arguments are the following:

Security. Embedded systems are ever more often connected to the internet, whereby the attack surface is drastically enlarged. Given the checkered security record of existing operating systems, it seems prudent to place exposed functionality in dedicated compartments which provide an additional line of defense.

Consolidation. High-end embedded systems often employ multiple electronic control units. Not only does such a setup incur high material cost, each unit also has to be wired and adds to the overall energy consumption. Consolidating multiple of them onto one unit can yield significant cost, weight, and energy savings.

Certification. So far, safety-certified components could not be deployed alongside noncertified


ones on the same controller, necessitating physical separation which incurs costs. A certifiable kernel holds the promise of providing isolation strong enough so that a coexistence on one controller becomes viable.

Development. The diversity of current software may be difficult to leverage if the development is constrained to one operating system. It would be much better if the respective class-leading solutions could be used while preserving the investments in existing software assets.

3 Design

[Figure 1: block diagram. Left: Linux with its processes, drivers and protocol stacks running directly on the hardware. Right: the Fiasco microkernel in kernel mode, with the L4Linux server, its drivers and protocol stacks, L4Lx processes and a secure application running as user level tasks.]

FIGURE 1: Left: Linux running directly on a machine. Right: Linux running on top of Fiasco, an L4 kernel.

Our design is based on L4Linux[?], a port of the Linux kernel to the L4[5] microkernel. When it was initially developed it set the standard regarding performance for encapsulated operating systems. Ever since, it has evolved alongside Fiasco as the latter improved on its security features (capability-based access control) and became available for multiprocessors and multiple architectures (IA32, AMD64, ARM).

Unlike previous versions of L4Linux, which relied on L4 threads, the current vCPU-based execution model[1] is very similar to the environment found on physical machines. As such, any improvement of the mainline Linux kernel with respect to interrupt latencies should in principle also be available in the corresponding version of L4Linux. The overhead due to the additional context switches and privilege transitions will have an impact but should lengthen the critical paths only by a constant delay.

That said, the previous versions of L4Linux were designed with an emphasis on throughput performance. Design decisions that yield good performance results might however prove problematic for real-time performance. In fact, we came up against three problems of that kind:

Timer. The current version of Fiasco uses a 1 kHz clock to drive its scheduler. Accordingly, timeouts are limited to that granularity as well, which is insufficient for high-resolution timers.

Communication. L4Linux operates in an environment where some services are provided by other tasks. Requesting them involves the platform's communication means, in the case of Fiasco primarily synchronous IPC. Waiting for an outstanding reply may delay the dispatch of an incoming event.

Memory virtualization. For security reasons, L4Linux cannot manage its page tables directly but has to use L4 tasks instead. It turned out that some applied performance optimizations have an adverse effect on responsiveness.

We will touch on each of these points in the following sections.

3.1 Communication

In our architecture, L4Linux draws on services provided by other tasks. Depending on the use case, communication may either be synchronous or asynchronous. Synchronous IPC has performance advantages for short operations, while asynchronous communication can better hide latencies.

An example where synchronous IPC was chosen is the graphics server. L4Linux maintains a shadow framebuffer and notifies the graphics server when changes have occurred. The graphics server, which also has access to the shadow framebuffer, then updates the device framebuffer taking into account the current visibility situation (as multiple L4Linux instances can have windows at various positions).

Considering performance, such an arrangement is reasonable. However, whenever a screen update is in progress, the L4Linux main thread cannot pick up incoming events as it waits for the reply from the graphics server. While this situation is hardly noticeable in performance measurements, latency-sensitive applications are hurt badly. Our solution involves a second thread relieving the main thread from engaging directly in IPC. The two threads interact through a ringbuffer in shared memory and use asynchronous notification. As a result, the main thread is never tied up and can always promptly respond to incoming events.
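To make the arrangement concrete, the following is a minimal sketch of such a single-producer/single-consumer ring buffer shared between the main thread and an IPC proxy thread. It is only an illustration of the idea described above; the structure layout, the names and the notification step are assumptions, not the actual L4Linux code.

/* Illustrative sketch only -- not the L4Linux implementation.
 * The main thread enqueues framebuffer update requests without ever
 * blocking; a dedicated proxy thread dequeues them and performs the
 * synchronous IPC to the graphics server. */
#include <stdint.h>

#define RING_SLOTS 64                    /* power of two, chosen arbitrarily */

struct update_req {
        uint32_t x, y, w, h;             /* dirty rectangle in the shadow framebuffer */
};

struct ring {
        volatile uint32_t head;          /* written only by the producer (main thread) */
        volatile uint32_t tail;          /* written only by the consumer (proxy thread) */
        struct update_req slot[RING_SLOTS];
};

/* Producer side: never blocks, so the main thread stays responsive. */
static int ring_put(struct ring *r, const struct update_req *req)
{
        uint32_t head = r->head;

        if (head - r->tail == RING_SLOTS)
                return -1;               /* full: drop or coalesce the update */
        r->slot[head % RING_SLOTS] = *req;
        __sync_synchronize();            /* publish the data before the index */
        r->head = head + 1;
        /* at this point an asynchronous notification would wake the proxy thread */
        return 0;
}

/* Consumer side, run by the proxy thread before it issues the
 * blocking IPC to the graphics server. */
static int ring_get(struct ring *r, struct update_req *req)
{
        uint32_t tail = r->tail;

        if (tail == r->head)
                return -1;               /* empty */
        *req = r->slot[tail % RING_SLOTS];
        __sync_synchronize();
        r->tail = tail + 1;
        return 0;
}

In such a scheme the proxy thread waits for the notification, drains the ring, and blocks in the synchronous IPC on behalf of the main thread, which can therefore react to the next incoming event immediately.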


3.2 Timer

Unlike other peripheral devices, timers are not directly accessible by user level components. Where timeouts or delays are required, they are typically implemented using IPC timeouts. IPC timeouts as implemented by the Fiasco microkernel have timer tick granularity—typically with a one millisecond period.

As this granularity is too coarse to use as a timer source for a high-resolution timer, we decided to ditch the periodic IPC-timeout based timer. Instead, L4Linux was granted direct access to the HPET device, which can be used to trigger timer events with a high resolution.

3.3 Memory Virtualization

As a user-level task, L4Linux has no direct access to the page tables, which are under the exclusive control of the microkernel. To isolate its processes, L4Linux makes use of L4 tasks. L4 allows two tasks to share memory and provides the original owner with the means to later revoke that sharing. L4Linux uses that memory sharing facility to provision its processes with memory. Whenever Linux modifies the page table of one of its processes, this change is reflected in a corresponding change in the memory configuration of the process' task.

While under normal operation page table updates are propagated individually, the destruction of a process is handled differently. To avoid the overhead of a microkernel syscall for each single page table invalidation, the destruction is performed by a single system call. The destruction of a task address space requires the microkernel to iterate the page directory and return page tables to its internal memory pools. Although task destruction can be preempted, it will not be aborted once started. As such, it is fully added to the worst-case times of timing-critical paths in L4Linux.

We added a thread to L4Linux which disposes of the tasks of perished processes. No longer executing long-latency syscalls, the main L4Linux thread remains responsive to incoming events.

4 Evaluation

To evaluate our design we conducted a number of experiments. Our test machine contained a 2.7 GHz Athlon 64 X2 5200+ processor, an nVidia-MCP78-based motherboard and 4 GB RAM. In order to increase the magnitude of the differences between our test scenarios, we reduced the memory available to the OS to 128 MB and disabled the second core of the CPU.

As base OS version we chose the recent Linux version 3.0.3 and the corresponding release of L4Linux. We then took Thomas Gleixner's Linux-RT patches and applied them to both code bases. L4Linux was supplemented with the current version of Fiasco and a minimal L4 runtime environment, which consisted of a framebuffer and console driver. Both resulting setups were finally booted directly from Grub.

We chose the following software components as our test suite:

• cyclictest is a simple benchmark to measure the accuracy of OS sleep primitives. The tool is part of the kernel.org "rt-tests" benchmark suite¹. To achieve accurate results, we executed it at the highest realtime priority and chose clock_nanosleep() as sleep function, as it allows an absolute specification of the wakeup time and thus ignores the time spent setting up the actual timer event.

• hackbench transmits small messages between a fixed set of processes and thus exerts stress on the OS scheduler. Its current development version is also hosted through git².

• Finally, the compilation of a Linux kernel both causes a considerable amount of harddisk interrupt activity and creates and destroys a large number of processes. We used a standard 2.6 series Linux source tree and the default i386 configuration as baseline.

4.1 Throughput

To get a feeling for the setup we were about to experiment with, we started out with some throughput measurements. While these are by no means representative due to the restriction to very little memory and only one core, they serve as a good starting point and already hint at some of the expected results.

As every switch between a userspace task and the L4Linux server involves a round-trip into the microkernel and a switch to a different address space, L4Linux suffers from frequent TLB and cache misses.
1 RT-Tests Repository: git://git.kernel.org/pub/scm/linux/kernel/git/clrkwllms/rt-tests.git
2 Hackbench Repository: https://github.com/kosaki/hackbench
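To illustrate the measurement principle of cyclictest described above, the following self-contained C sketch measures wakeup latencies with clock_nanosleep() and an absolute wakeup time. The period, iteration count and priority value are assumptions chosen for illustration; the real rt-tests implementation differs in many details.

/* Sketch of a cyclictest-like measurement loop (illustrative only). */
#include <stdio.h>
#include <time.h>
#include <sched.h>

#define NSEC_PER_SEC 1000000000L
#define PERIOD_NS       1000000L        /* 1 ms period, assumed */
#define LOOPS            100000

static void timespec_add_ns(struct timespec *t, long ns)
{
        t->tv_nsec += ns;
        while (t->tv_nsec >= NSEC_PER_SEC) {
                t->tv_nsec -= NSEC_PER_SEC;
                t->tv_sec++;
        }
}

int main(void)
{
        struct sched_param sp = { .sched_priority = 99 };
        struct timespec next, now;
        long lat, max = 0;
        int i;

        /* run at the highest real-time priority, as in the setup above */
        sched_setscheduler(0, SCHED_FIFO, &sp);

        clock_gettime(CLOCK_MONOTONIC, &next);
        for (i = 0; i < LOOPS; i++) {
                timespec_add_ns(&next, PERIOD_NS);
                /* absolute wakeup time: timer setup cost does not enter the result */
                clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
                clock_gettime(CLOCK_MONOTONIC, &now);
                lat = (now.tv_sec - next.tv_sec) * NSEC_PER_SEC +
                      (now.tv_nsec - next.tv_nsec);
                if (lat > max)
                        max = lat;
        }
        printf("max wakeup latency: %ld us\n", max / 1000);
        return 0;
}

As discussed later in section 4.3, an interrupt hitting between the return from the sleep syscall and the subsequent time lookup is accounted for with its full duration in such a loop.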


To highlight the effect of this disadvantage, we created an "intermediate" version of native Linux without support for global pages and large pages, and with explicit TLB flushes on every kernel entry and exit path.

The hackbench benchmark (cf. Fig. 2) shows the stripped-down version of Linux almost halfway between native Linux and L4Linux, which demonstrates that the impact of repeated cache misses is quite severe when tasks are rapidly rescheduled. The compilation of a Linux kernel (cf. Fig. 3) displays, as expected, only a mild slowdown for the non-caching Linux variant.

[Figure 2: bar chart, time (s) of the hackbench run for native Linux, the TLB-flush variant, and L4Linux.]

FIGURE 2: Runtime of hackbench (40 processes, 100'000 iterations).

[Figure 3: bar chart, time (s) of the kernel compilation for native Linux, the TLB-flush variant, and L4Linux.]

FIGURE 3: Duration of kernel compilation.

4.2 Memory Management

Another possible source of large latencies is long-running, non-abortable system calls. One particularly long operation, especially in the Fiasco-based setup, is the destruction of an address space, as this requires assistance from the microkernel. To see this problem in effect, we created two latency histograms with cyclictest under concurrent operation of the hackbench benchmark (cf. Fig. 4) and a kernel compilation (cf. Fig. 5), respectively. While the maxima of both histograms are in the expected order of magnitude, the former shows outliers up to 100µs and the latter (due to its constant destruction of processes) even an almost constant distribution of "long" latencies beyond 40µs.

[Figure 4: latency histogram, count over latency (us) on a logarithmic scale, for L4Linux without deferred destruction.]

FIGURE 4: Latencies measured with hackbench as load.

[Figure 5: latency histogram, count over latency (us) on a logarithmic scale, for L4Linux without deferred destruction.]

FIGURE 5: Latencies measured with a kernel compilation as load.

As outlined in section 3.3, we therefore externalized the destruction of address spaces to a separate L4 thread. While the execution context issuing the destruction request still waits for the destruction to complete, L4Linux as a whole is then interruptible during the wait and can react to external events.

4.3 Latency under Load

Taking all these findings into consideration, we finally compared native Linux and our improved L4Linux implementation directly, using our established benchmark combinations


cyclictest/hackbench and cyclictest/kernel-compile. The results are shown in Fig. 6 and Fig. 7.

The first obvious result is that the L4Linux distribution is offset from the native distribution by 5µs. The explanation for this is pretty simple. As L4Linux is not allowed to receive interrupts directly, the microkernel must first handle the interrupt, determine the recipient attached to it and pass the interrupt on. This operation induces a very constant amount of overhead – we measured a delay of 3.65µs in our setup for the delivery of an edge-triggered interrupt. Handling level-triggered interrupts requires even more time, as L4Linux has to issue an extra syscall to unmask the interrupt line once it is done with the device in these cases.

As our timer is edge-triggered, there remains a delay of about 1.3µs. We attribute this to the overhead induced by the microkernel syscalls involved in switching to the real-time task (restoring global descriptor table and execution state) as well as to the aforementioned additional address space switch.

Both distributions do not reach their maximum immediately, which means that the ideal interrupt delivery path is not the most frequent one. This effect is likely linked to execution paths during which Linux (just as L4Linux) has disabled interrupt reception. The delay incurred by this deferred delivery is more pronounced in the rehosted setup, because deferring the delivery causes additional round-trips to the microkernel once L4Linux has enabled interrupt reception again. Overall though, the difference is not exceptionally large.

Finally, both setups exhibit outliers well beyond the main part of their respective distribution. With hackbench as load-generating benchmark, these show the usual difference between native and rehosted and are therefore no specific effect of an implementation detail of Fiasco or L4Linux. The interrupt-heavy kernel compilation, on the other hand, demonstrates that additional interrupts (mostly generated by the hard drive controller) affect L4Linux much harder than native Linux due to the interrupt delivery overhead. This effect is even worsened by the fact that the cyclictest benchmark has no way to atomically determine the clock overrun once it is woken from its sleep syscall: interrupts hitting between the sleep and the time lookup are accounted for with their full duration.

L4Linux even has to deal with another interrupt source which is not present in the native scenario: Fiasco employs the on-board PIT for its own scheduling needs and configures it to generate interrupts in regular intervals.

[Figure 6: latency histograms, count over latency (us) on a logarithmic scale, for native Linux and L4Linux.]

FIGURE 6: Latencies measured with hackbench as load.

[Figure 7: latency histograms, count over latency (us) on a logarithmic scale, for native Linux and L4Linux.]

FIGURE 7: Latencies measured with kernel compile as load.

5 Related Work

Virtualization shares many goals with OS encapsulation; it can even be viewed as an alternative implementation. For a long time, the adoption of virtual machines was hampered by the lack of hardware support in processors. The construction of VMs that are efficient, equivalent, and properly controlled requires the instruction set architecture to meet certain requirements[6]. Unfortunately, many widely used processors, in particular those of IA32 provenance[7], do not, and among those which do meet the basic requirements are again many which lack support for virtualization of memory management structures. While this deficiency does not bar virtualization on these processors, performance is indeed severely hampered, as operations on a guest's page tables require extensive help by the hypervisor[2]. The performance degradation usually cancels the benefits provided by the comfortable hardware virtualization interface, so OS encapsulation remains a viable solution on these platforms.


Virtualization support has also been announced for ARM processors. It remains to be seen how long it takes until processors without it are fully supplanted.

Xen[3] provides for running multiple operating systems on one physical machine. As with L4Linux, guests have to be modified if the underlying machine does not provide virtualization support. Unlike Fiasco, Xen does not provide light-weight tasks and IPC. Although Xen was ported to the ARM architecture, this port has not found the echo of its siblings on the desktop or server.

Virtualization and microkernel architectures do not rule each other out; they are rather complementary[8, 9]. While thorough research of worst case execution times of microkernels themselves exists[4], so far neither of the encapsulation approaches has been examined as to its applicability for real-time applications.

6 Conclusion

In this paper we investigated the real-time execution characteristics of an encapsulated operating system. Our results indicated that run-time overhead is incurred in timing-critical paths. However, this overhead is on the order of the latency experienced with native Linux. As such, L4Linux is suitable for use cases where the security of real-time applications shall be bolstered by deploying them in dedicated OS capsules.

Future work will concentrate on reducing the current overheads. We see good chances that the influence of large overhead contributors such as the early timer expiration can be mitigated, e.g. by delaying the timer and rearming it until after the real-time task has finished executing.

Our choice of using OS rehosting as encapsulation technology was mainly motivated by its applicability to a host of processors currently deployed in embedded systems. As hardware virtualization support finds its way into more and more processors, the question comes up whether the software stack underneath a virtual machine can be designed in such a way that it allows for real-time operations in the virtual machine.

References

[1] A. W. Adam Lackorzynski and M. Peter. Generic Virtualization with Virtual Processors. In Proceedings of the Twelfth Real-Time Linux Workshop, Nairobi, 2010.

[2] K. Adams and O. Agesen. A comparison of software and hardware techniques for x86 virtualization. In ASPLOS-XII: Proceedings of the 12th international conference on Architectural support for programming languages and operating systems, pages 2–13, New York, NY, USA, 2006. ACM.

[3] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield. Xen and the art of virtualization, 2003.

[4] B. Blackham, Y. Shi, S. Chattopadhyay, A. Roychoudhury, and G. Heiser. Timing analysis of a protected operating system kernel. In Proceedings of the 32nd IEEE Real-Time Systems Symposium, Vienna, Austria, Nov 2011.

[5] J. Liedtke. On micro-kernel construction. In Proceedings of the fifteenth ACM symposium on Operating systems principles, SOSP '95, pages 237–250, New York, NY, USA, 1995. ACM.

[6] G. J. Popek and R. P. Goldberg. Formal requirements for virtualizable third generation architectures. Commun. ACM, 17(7):412–421, 1974.

[7] J. Robin and C. Irvine. Analysis of the Intel Pentium's ability to support a secure virtual machine monitor, 2000.

[8] A. L. Steffen Liebergeld, Michael Peter. Towards Modular Security-Conscious Virtual Machines. In Proceedings of the Twelfth Real-Time Linux Workshop, Nairobi, 2010.

[9] U. Steinberg and B. Kauer. NOVA: a microhypervisor-based secure virtualization architecture. In EuroSys '10: Proceedings of the 5th European conference on Computer systems, pages 209–222, New York, NY, USA, 2010. ACM.


Tiny Linux Kernel Project: Section Garbage Collection Patchset

Wu Zhangjin
Tiny Lab - Embedded Geeks
http://tinylab.org
[email protected]

Sheng Yong
Distributed & Embedded System Lab, SISE, Lanzhou University, China
Tianshui South Road 222, Lanzhou, P.R.China
[email protected]

Abstract
Linux is widely used in embedded systems, which usually have storage limitations and hence require size optimization. Building on the previous work on the “Section Garbage Collection Patchset” to reduce the kernel size, this paper details the patchset's principle, presents some new ideas, documents the porting steps, reports the testing results on the four most popular architectures (ARM, MIPS, PowerPC and X86), and finally proposes future work which may enhance or derive from this patchset.

1 Introduction

The size of the Linux kernel source code increases rapidly, while memory and storage are limited in embedded systems (e.g. in-vehicle driving safety systems, data acquisition equipment, etc.). This requires a small or even tiny kernel to lighten or even eliminate this limitation and eventually expand the potential applications.

Tiny Lab evaluated the conventional tailoring methods and found that the old Tiny-linux project is far from finished and requires more work, and hence submitted a project proposal to CELF: “Work on Tiny Linux Kernel” to improve the previous work and explore more ideas.

The “Section garbage collection patchset (gc-sections)” is a subproject of Tiny-linux; the initial work is from Denys Vlasenko[9]. The existing patchset did make the basic support of section garbage collection work on X86 platforms, but it is still outside of the mainline because of the limitations of the old GNU toolchains and because some work remains to be done (e.g. compatibility with the other kernel features).

Our gc-sections subproject focuses on analyzing its working mechanism, improving the patchset (e.g. unique user-defined sections), applying the ideas to more architectures, testing them and exploring potential enhancements and derivations. The following sections will present these respectively.

2 Link time dead code removal using section garbage collection

The compiler puts all executable code produced by compiling C code into a section called .text, r/o data into a section called .rodata, r/w data into .data, and uninitialized data into .bss[2, 3]. The linker does not know which parts of sections are referenced and which ones are not referenced. As a result, unused (or ‘dead’) functions or data cannot be removed. In order to solve this issue, each function or data item should have its own section.

gcc provides the -ffunction-sections or -fdata-sections option to put each function or data


to its own section, for instance, there is a function $ echo ’ unused (){} main (){} ’ | gcc -S -x c -o - - \
| grep . text
called unused func(), it goes to .text.unused func . text
section, Then, ld provides the --gc-sections op-
tion to check the references and determine which Or else, each function has its own section (indi-
dead function or data should be removed, and the cated by the .section instruction of assembly):
--print-gc-sections option of ld can print the the
$ echo ’ unused (){} main (){} ’ \
function or data being removed, which is helpful to | gcc - ffunction - sections -S -x c -o - - | grep . text
. section . text . unused , " ax " , @progbits
debugging. . section . text . main ," ax " , @progbits

The following two figures demonstrates the dif-


ferences between the typical object and the one with As we can see, the prefix is the same .text, the
-ffunction-sections: suffix is function name, this is the default section
naming rule of gcc.
Expect -ffunction-sections, the section at-
tribute instruction of gcc can also indicate where
should a section of the funcion or data be put in,
and if it is used together with -ffunction-sections,
it has higher priority, for example:
$ echo ’ __attribute__ (( __section__ ( " . text . test " ))) unused (){} \
main (){} ’ | gcc - ffunction - sections -S -x c -o - - | grep . text
. section . text . test ," ax " , @progbits
. section . text . main ," ax " , @progbits

.text.test is indicated instead of the default


.text.unused. In order to avoid function redefinition,
the function name in a source code file should be
unique, and if only with -ffunction-sections, every
FIGURE 1: Typical Object function has its own unique section, but if the same
section attribute applies on different functions, dif-
ferent functions may share the same section:
$ echo ’ __attribute__ (( __section__ ( " . text . test " ))) unused (){} \
__attribute__ (( __section__ ( ". text . test " ))) main (){} ’ \
| gcc - ffunction - sections -S -x c -o - - | grep . text
. section . text . test ," ax " , @progbits

Only one section is reserved, this breaks the core


rule of section garbage collection: before linking,
each function or data should has its own section, but
sometimes, for example, if want to call some func-
tions at the same time, section attribute instruction
is required to put these functions to the same section
and call them one by one, but how to meet these
two requirements? use the section attribute instruc-
FIGURE 2: Object with -ffunction- tion to put the functions to the section named with
sections the same prefix but unique suffix, and at the linking
stage, merge the section which has the same prefix
To learn more about the principle of section to the same section, so, to linker, the sections are
garbage collection, the basic compiling, assebmling unique and hence better for dead code elimination,
and linking procedure should be explained at first but still be able to link the functions to the same
(Since the handling of data is similar to function, section. The implementation will be explained in
will mainly present function below). the coming sections.
Based on the same rule, the usage of section at-
2.1 Compile: Translate source code tribute instruction should also follow the other two
from C to assembly rules:

If no -ffunction-sections for gcc, all functions are 1. The section for function should named with
put into .text section (indicated by the .text instruc- .text prefix, then, the linker may be able to
tion of assembly): merge all of the .text sections. or else, will not


be able to or not conveniently merge the sec- Here is a basic linker script:
tions and at last instead may increase the size O U T P U T _ F O R M A T( " elf32 - i386 " , " elf32 - i386 " ,
of executable for the accumulation of the sec- " elf32 - i386 " )
O U T P U T _ A R C H( i386 )
tion alignment. ENTRY ( _start )
SECTIONS
2. The section name for function should be pre- {
. text :
fixed with .text. instead of the default .text {
prefix used gcc and break the core rule. *(. text . stub . text .* . gnu . linkonce . t .*)
...
}
. data :
And we must notice that: ‘You will not be able {
to use “gprof” on all systems if you specify this op- *(. data . data .* . gnu . linkonce . d .*)
...
tion and you may have problems with debugging if }
you specify both this option and -g.’ (gcc man page) / DISCARD / : { *(. note . GNU - stack ) *(. gnu . lto_ *) }
}
$ echo ’ unused (){} main (){} ’ | \
gcc - ffunction - sections -p -x c -o test -
< stdin >:1:0: warning : - ffunction - sections disabled ; \ The first two commands tell the target architec-
it makes p r o f i l i n g i m p o s s i b l e ture and the ABI, the ENTRY command indicates
the entry of the executable and the SECTIONS com-
mand deals with sections.
2.2 Assemble: Translate assembly The entry (above is start, the standard C entry,
files to binary objects defined in crt1.o) is the root of the whole executable,
all of the other symbols (function or data) referenced
In assembly file, it is still be possible to put the func- (directly or indirectly) by the the entry must be kept
tion or data to an indicated section with the .sec- in the executable to make ensure the executable run
tion instruction (.text equals .section “.text”). Since without failure. Besides, the undefined symbols (de-
-ffunction-sections and -fdata-sections doesn’t fined in shared libraries) may also need to be kept
work for assembly files and they has no way to de- with the EXTERN command. and note, the --entry
termine the function or data items, therefore, for the and --undefined options of ld functions as the same
assembly files written from scratch (not translated to the ENTRY and EXTERN commands of linker
from C language), .section instruction is required to script respectively.
added before the function or data item manually, or
else the function or data will be put into the same --gc-sections will follow the above rule to deter-
.text or .data section and the section name indicated mine which sections should be reserved and then pass
should also be unique to follow the core rule of sec- them to the SECTIONS command to do left merg-
tion garbage collection. ing and including. The above linker script merges all
section prefixed by .text, .stub and .gnu.linkonce.t
The following commands change the section to the last .text section, the .data section merging is
name of the ‘unused’ function in the assembly file similar. The left sections will not be merged and kept
and show that it does work. as their own sections, some of them can be removed
$ echo ’ unused (){} main (){} ’ \ by the /DISCARD/ instruction.
| gcc - ffunction - sections -S -x c -o - - \
| sed -e " s / unused / test / g " \ Let’s see how --gc-section work, firstly, without
| gcc -c - x a s s e m b l e r - -o test
$ objdump -d test | grep . section it:
D i s a s s e m b l y of section . text . test :
$ echo ’ unused (){} main (){} ’ | gcc -x c -o test -
D i s a s s e m b l y of section . text . main :
$ size test
text data bss dec hex filename
800 252 8 1060 424 test

2.3 Link: Link binary objects to tar- Second, With --gc-sections (passed to ld with
get executable -Wl option of gcc):
$ echo ’ unused (){} main (){} ’ | gcc - ffunction - sections \
At the linking stage, based on a linker script, the -Wl , - - gc - sections -x c -o test -
$ size test
linker should be able to determine which sections text data bss dec hex filename
should be merged and included to the last executa- 794 244 8 1046 416 test
bles. When linking, the -T option of ld can be used
to indicate the path of the linker script, if no such It shows, the size of the .text section is reduced
option used, a default linker script is called and can and --print-gc-sections proves the dead ‘unused’
be printed with ld --verbose. function is really removed:


$ echo ’unused(){} main(){}’ | gcc -ffunction-sections \ 3.1 Basic support of gc-sections


-Wl,--gc-sections,--print-gc-sections -x c -o test - patchset for Linux
/usr/bin/ld: Removing unused section ’.rodata’ in file ’.../crt1.o’

/usr/bin/ld: Removing unused section ’.data’ in file ’.../crt1.o’ The basic support of gc-sections patchset for Linux
/usr/bin/ld: Removing unused section ’.data’ in file ’.../crtbegin.o’ includes:
/usr/bin/ld: Removing unused section ’.text.unused’ in file ’/tmp/cclR3Mgp.o’

• Avoid naming duplication between the


The above output also proves why the size of the magic sections defined by section attribute
.data section is also reduced. instruction and -ffunction-sections or
-fdata-sections
But if a section is not referenced (directly or in- The kernel has already defined some sections
directly) by the entry, for instance, if want to put with the section attribute instruction of gcc,
a file into the executable for late accessing, the file the naming method is prefixing the sections
can be compressed and put into a .image section like with .text., as we have discussed in the above
this:
section, the name of the sections may be the
... same as the ones used by -ffunction-sections
SECTIONS
{ or -fdata-sections and hence break the core
... rule of section garbage collections.
. data :
{ Therefore, several patches has been up-
_ _ i m a g e _ s t a r t = .;
*(. image )
streamed to rename the magic sections
_ _ i m a g e _ e n d = .; from {.text.X, .data.X, .bss.X, .rodata.X}
...
}
to {.text..X, .data..X, .bss..X, .rodata..X}
} and from {.text.X.Y, .data.X.Y, .bss.X.Y,
.rodata.X.Y} to {.text..X..Y, .data..X..Y,
The file can be accessed through the point- .bss..X..Y, .rodata..X..Y}, accordingly, the re-
ers: image start and image end, but the .image lated headers files, c files, assembly files, linker
section itself is not referenced by anybody, then, scripts which reference the sections should be
--gc-sections has no way to know the fact that .im- changed to use the new section names.
age section is used and hence removes .image and as As a result, the duplication between
a result, the executable runs without expected, In the section attribute instruction and
order to solve this issue, another KEEP instruction -ffunction-sections/-fdata-sections is elim-
of the linker script can give a help. inated.
...
SECTIONS
• Allow linker scripts to merge the sec-
{ tions generated by -ffunction-sections or
...
. data :
-fdata-sections and prevent them from merg-
{ ing the magic sections
_ _ i m a g e _ s t a r t = .;
KEEP (*(. image )) In order to link the function or data sec-
i m a g e _ e n d = .; tions generated by -ffunction-sections or
...
} -fdata-sections to the last {.text, .data, .bss,
} .rodata}, the related linker scripts should be
changed to merge the corresponding {.text.*,
.data.*, .bss.*, .rodata.*} and to prevent the
linker from merging the magic sections(e.g.
3 Section garbage collection .data..page aligned), more restrictive patterns
patchset for Linux like the following is preferred:
*(. text .[ A - Za - z0 -9 _$ ^]*)

The previous section garbage collection patchset is A better pattern may be the following:
for the -rc version of 2.6.35, which did add the core *(. text .[^.]*)
support of section garbage collection for Linux but
Note, both of the above patterns are only sup-
it still has some limitations.
ported by the latest ld, please use the versions
Now, Let’s analyze the basic support of section newer than 2.21.0.20110327 or else, they don’t
garbage collection patchset for Linux and then list work and will on the contrary to generate big-
the existing limitations. ger kernel image for ever such section will be


linked to its own section in the last executable 6. Didn’t pay enough attention to the the kernel
and the size will be increased heavily for the modules, the kernel modules may also include
required alignment of every section. dead symbols which should be removed
• Support objects with more than 64k sections 7. Only for X86 platform, not enough for the
The variant type of section number(the other popular embedded platforms, such as
e shnum member of elf{32,64} hdr) is u16, ARM, MIPS and PowerPC
the max number is 65535, the old modpost
tool (used to postprocess module symbol) In order to break through the above limita-
can only handle an object which only has tions, improvement has been added in our gc-sections
small than 64k sections and hence may fail project, see below section.
to handle the kernel image compiled with
huge kernel builds (allyesconfig, for exam-
3.2 Improvement of the previous gc-
ple) with -ffunction-sections. Therefore, the
modpost tool is fixed to support objects with sections patchset
more than 64k sections by the document
“IA-64 gABI Proposal 74: Section Indexes”: Our gc-sections project is also based on mainline
http://www.codesourcery.com/public/cxx- 2.6.35(exactly 2.6.35.13), it brings us with the fol-
abi/abi/prop-74-sindex.html. lowing improvement:

• Invocation of -ffunction-sections/-fdata-sections 1. Ensure the other kernel features work with gc-
and --gc-sections sections
In order to have a working kernel with Ftrace requires the mcount loc section to
-ffunction-sections and -fdata-sections: store the mcount calling sites; Kgcov requires
$ make KCFLAGS = " - ffunction - sections - fdata - sections " the .ctors section to do gcov initialization,
these two sections are not referenced directly
Then, in order to also garbage-collect the sec- and will be removed by --gc-sections and
tions, added hence should be kept by the KEEP instruc-
L D F L A G S _ v m l i n u x += -- gc - sections tion explicitly. Besides, more sections listed
in include/asm-generic/vmlinux.lds.h or the
in the top-level Makefile. other arch specific header files has the similar
situation and should be kept explicitly too.
The above support did make a working kernel /* include / asm - generic / vmlinux . lds . h */
...
with section garbage collection on X86 platforms, but - *( _ _ m c o u n t _ l o c) \
still has the following limitations: + KEEP (*( _ _ m c o u n t _ l o c )) \
...
- *(. ctors ) \
+ KEEP (*(. ctors )) \
1. Lack of test, and is not fully compatible with ...
some main kernel features, such as Ftrace, Kg-
cov 2. The section name defined by section attribute
2. The current usage of section attribute instruc- instruction should be unique
tion itself still breaks the core rule of section The symbol name should be globally unique
garbage collections for lots of functions or data (or else gcc will report symbol redefinition), in
may be put into the same sections(e.g. init), order to keep every section name unique, it is
which need to be fixed possible to code the section name with the sym-
bol name. FUNCTION (or func in Linux)
3. Didn’t take care of assembly carefully and is available to get function name, but there is
therefore, the dead sections in assembly may no way to get the variable name, which means
also be reserved in the last kernel image there is no general method to get the symbol
4. Didn’t focus on the support of compressed ker- name so instead, another method should be
nel images, the dead sections in them may also used, that is coding the section name with line
be reserved in the last compressed kernel image number and a file global counter. the combina-
tion of these two will minimize the duplication
5. The invocation of the gc-sections requires to of the section name (but may also exist dupli-
pass the gcc options to ‘make’ through the en- cation) and also reduces total size cost of the
vironment variables, which is not convenient section names.


gcc provides LINE and COUNTER to get # define _ _ a s m _ s e c t i o n( S ) \


. section __us ( S .)
the line number and counter respectively, so,
the previous section() macro can be changed
Then, every .section instruction used in the as-
from:
sembly files should be changed as following:
# define _ _ s e c t i o n( S ) \
_ _ a t t r i b u t e _ _ (( _ _ s e c t i o n _ _(# S ))) /* include / linux / init . h */
-# define __HEAD . section " . head . text " ," ax "
+# define __HEAD _ _ a s m _ s e c t i o n(. head . text ) , " ax "
to the following one:
# define __concat (a , b ) a ## b
# define _ _ u n i q u e _ i m p l(a , b ) __concat (a , b ) 4. Simplify the invocation of the gc-sections
# define __ui (a , b ) _ _ u n i q u e _ i m p l(a , b )
# define _ _ u n i q u e _ c o u n t e r( a ) \
In order to avoid passing -ffunction-sectoins,
__ui (a , _ _ C O U N T E R _ _) -fdata-sections to ‘make’ in every compiling,
# define __uc ( a ) _ _ u n i q u e _ c o u n t e r( a )
# define _ _ u n i q u e _ l i n e( a ) __ui (a , __LINE__ ) both of these two options should be added
# define __ul ( a ) _ _ u n i q u e _ l i n e( a ) to the top-level Makefile or the arch spe-
# define __unique ( a ) __uc ( __ui ( __ul ( a ) , l_c ))
# define _ _ u n i q u e _ s t r i n g( a ) \ cific Makefile directly, and we also need to
_ _ s t r i n g i f y( __unique ( a )) disable -ffunction-sectoins explicitly when
# define __us ( a ) _ _ u n i q u e _ s t r i n g( a )
Ftrace is enabled for Ftrace requires the
# define _ _ s e c t i o n( S ) \ -p option, which is not compatible with
_ _ a t t r i b u t e _ _ (( _ _ s e c t i o n _ _( __us ( S .))))
-ffunction-sectoins.
Let’s use the init for an example to see the Adding them to the Makefile directly may also
effect, before, the section name is .init.text, be better to fix the other potential compatia-
all of the functions marked with init will be bilities, for example, -fdata-sections doesn’t
put into that section. With the above change, work on 32bit kernel, which can be fixed as fol-
every function will be put into a unique sec- lowing:
tion like .text.init.13l c16 and make the linker # arch / mips / Makefile
be able to determine which one should be re- ifndef C O N F I G _ F U N C T I O N _ T R A C E R
cflags - y := - ffunction - sections
moved. endif
# FIXME : 32 bit doesn ’t work with - fdata - sections
Similarly, the other macros used the section ifdef C O N F I G _ 6 4 B I T
attribute instruction should be revisited, e.g. cflags - y += - fdata - sections
endif
sched.
In order to make the linker link the functions Note, some architectures may prefer
marked with init to the last .init.text section, KBUILD CFLAGS than cflags-y, it depends.
the linker scripts must be changed to merge Besides, the --print-gc-sections option
.init.text.* to .init.text. The same change need should be added for debugging, which can
to be applied to the other sections. help to show the effect of gc-sections or when
3. Ensure every section name indicated in assem- the kernel doesn’t boot with gc-sections, it can
bly is unique help to find out which sections are wrongly
removed and hence keep them explicitly.
-ffunction-sections and -fdata-sections only
# Makefile
works for C files, for assembly files, the .section ifeq ( $ ( K B U I L D _ V E R B O S E) ,1)
instruction is used explicitly, by default, the L D F L A G S _ v m l i n u x += -- print - gc - sections
endif
kernel uses the instruction like this: .section
.text, which will break the core rule of section Then, make V=1 can invocate the linker to print
garbage collection tool, therefore, every assem- which symbols are removed from the last the
bly file should be revisited. executables.
For the macros, like LEAF and NESTED used by In the future, in order to make the whole
MIPS, the section name can be uniqued with gc-sections configurable, 4 new kernel config
symbol name: options may be required to reflect the selec-
# define LEAF ( symbol ) \ tion of -ffunction-sections, -fdata-sections,
- . section . text ; \
+ . section . text . asm . symbol ;\ --gc-sections and --print-gc-sections.

But the other directly used .section instruc- 5. Support compressed kernel image
tions require a better solution, fortunately, we The compressed kernel image often include a
can use the same method proposed above, that compressed vmlinux and an extra bootstraper,
is: the bootstraper decompress the compressed


kernel image and boot it. the bootstraper may The architecture and platform specific parts
also include dead code, but for its Makefile does are small but need to follow some basic steps
not inherit the make rules from either the top to minimize the time cost, the porting steps
level Makefile or the Makefile of a specific ar- to a new platform will be covered in the next
chitecture, therefore, this should be taken care section.
of independently.
Just like we mentioned in section 2.3, 3.3 The steps of porting gc-sections
the section stored the kernel image must
be kept with the KEEP instruction, and
patchset to a new platform
the -ffunction-sectoins, -fdata-sections,
In order to make gc-sections work on a new platform,
--gc-sections and --print-gc-sections op-
the following steps should be followed (use ARM as
tions should also be added for the compiling
an example).
and linking of the bootstraper.

6. Take care of the kernel modules 1. Prepare the development and testing environ-
ment, including real machine(e.g. dev board)
Currently, all of the kernel modules share
or emulator(e.g. qemu), cross-compiler, file
a common linker script: scripts/module-
system etc.
common.lds, which is not friendly to
--gc-sections for some architectures may re- For ARM, we choose qemu 0.14.50 as the emu-
quires to discard some specific sections. there- lator and versatilepb as the test platform, the
fore, a arch specific module linker script should corss-compiler (gcc 4.5.2, ld 2.21.0.20110327)
be added to arch/ARCH/ and the following is provided by ubuntu 11.04 and the filesys-
lines should be added to the top-level Make- tem is installed by debootstrap, the ramfs
file: is available from http://d-i.debian.org/daily-
# Makefile
images/armel/.
+ LDS_MODULE = \
-T $ ( srctree )/ arch / $ ( SRCARCH )/ module . lds 2. Check whether the GNU toolchains sup-
+ LDFLAGS_MODULE = \ port -ffunction-sections, -fdata-sections
$ ( if $ ( wildcard arch / $ ( SRCARCH )/ module . lds ) ,\
$ ( L D S _ M O D U L E)) and --gc-sections, if no support, add the
L D F L A G S _ M O D U L E += \ toolchains support at first
-T $ ( srctree )/ scripts / module - common . lds
The following command shows the GNU
Then, every architecture can add the archi- toolchains of ARM does support gc-sections,
tecture specific parts to its own module linker or else, there will be failure.
script, for example: $ echo ’ unused (){} main (){} ’ | arm - linux - gnueabi - gcc \
- ffunction - sections -Wl ,-- gc - sections \
# arch / mips / module . lds -S -x c -o - - | grep . section
SECTIONS { . section . text . unused , " ax " ,% progbits
. section . text . main , " ax " ,% progbits
/ DISCARD / : {
*(. MIPS . options )
...
} 3. Add -ffunction-sections, -fdata-sections, at
} proper place in arch or platform specific Make-
file
In order to remove the dead code in the kernel # arch / arm / Makefile
modules, it may require to enhance the com- ifndef C O N F I G _ F U N C T I O N _ T R A C E R
KBUILD_CFLAGS += - ffunction - sections
mon module linker script to keep the functions endif
called by module init() and module exit(), for KBUILD_CFLAGS += - fdata - sections
these two are the init and exit entries of the
modules. Besides, the other specific sections 4. Fix the potential compatibility problem (e.g.
(e.g. .modinfo, version) may need to be kept disable -ffunction-sections while requires
explicitly. This idea is not implemented in our Ftrace)
gc-sections project yet. The Ftrace compatiability problem is fixed
above, no other compatibility has been found
7. Port to the other architectures based platforms
up to now.
Our gc-sections have added the gc-sections sup-
port for the top 4 architectures (ARM, MIPS, 5. Check if there are sections which are unrefer-
PowerPC and X86) based platforms and all of enced but used, keep them
them have been tested. The following three sections are kept for ARM:


# arch / arm / kernel / vmlinux . lds . S # arch / arm / module . lds . S


... ...
KEEP (*(. proc . info . init *)) SECTIONS {
... / DISCARD / : {
KEEP (*(. arch . info . init *)) # ifndef C O N F I G _ M M U
... *(. fixup )
KEEP (*(. taglist . init *)) *( _ _ e x _ t a b l e)
# endif
}
6. Do basic build and boot test, if boot failure }
happens, use make V=1 to find out the wrongly # arch / arm / Makefile
extra - y := module . lds
removed sections and keep them explicitly with
the KEEP instruction
11. Do full test: test build, boot with NFS root
$ qemu - system - arm -M v e r s a t i l e p b -m 128 M \
- kernel vmlinux - initrd initrd . gz \ filesystem, the modules and so forth.
- append " root =/ dev / ram init =/ bin / sh " \
Enable the network bridge support between
If the required sections can not be determined qemu and your host machine, open the NFS
in the above step, it will be found at this step server on your host, config smc91c111 kernel
for make V=1 will tell you which sections may driver, dhcp and NFS root client, then, boot
be related to the boot failure. your kernel with NFS root filesystem to do a
full test.
7. Add support for assembly files with the $ qemu - system - arm - kernel / path / to / zImage \
asm section() macro - append " init =/ bin / bash root =/ dev / nfs \
nfsroot = $ n f s _ s e r v e r:/ path / to / rootfs ip = dhcp " \
Using grep command to find out every .section -M v e r s a t i l e p b -m 128 M - net \
nic , model = s m c 9 1 c 1 1 1 - net tap
place and replace it with the asm section()
macro, for example:
# arch / arm / mm / proc - arm926 . S
- . section " . rodata "
+ _ _ a s m _ s e c t i o n(. rodata ) 4 Testing results
8. Follow the above steps to add support for com- Test has been run on all of the top 4 architectures,
pressed kernel including basic boot with ramfs, full boot with NFS
Enable gc-sections in the Makefile of com- root filesystem and the main kernel features (e.g.
pressed kernel: Ftrace, Kgcov, Perf and Oprofile).
# arch / arm / boot / c o m p r e s s e d/ Makefile
E X T R A _ C F L A G S+= - ffunction - sections - fdata - sections The host machine is thinkpad SL400 with In-
... tel(R) Core(TM)2 Duo CPU T5670, the host sytem
L D F L A G S _ v m l i n u x := --gc - sections
ifeq ( $ ( K B U I L D _ V E R B O S E ) ,1) is ubuntu 11.04.
L D F L A G S _ v m l i n u x += -- print - gc - sections
endif The qemu version and the cross compiler for
... ARM is the same as above section, the cross com-
And then, keep the required sections in the piler for MIPS is compiled from buildroot, the cross
linker script of the compressed kernel: compiler for PowerPC is downloaded from emde-
bian.org/debian/, the details are below:
# arch / arm / boot / c o m p r e s s e d/ vmlinux . lds . in
KEEP (*(. start ))
KEEP (*(. text ))
KEEP (*(. text . c a l l _ k e r n e l)) arch board net gcc ld
ARM versatilepb smc91c111 4.5.2 2.21.0.20110327

9. Make sure the main kernel features (e.g. MIPS malta pcnet 4.5.2 2.21
Ftrace, Kgcov, Perf and Oprofile) work nor- PPC g3beige pcnet 4.4.5 2.20.1.20100303

mally with gc-sections X86 pc-0.14 ne2k pci 4.5.2 2.21.0.20110327

Validated Ftrace, Kgcov, Perf and Oprofile on


ARM platform and found they worked well. TABLE 1: Testing environment

10. Add architecture or platform specific mod- Note:


ule.lds to remove unwanted sections for the ker-
nel modules • In order to boot qemu-system-ppc on ubuntu
In order to eliminate the unneeded sections(e.g. 11.04, the openbios-ppc must be downloaded
.fixup, ex table) for modules while no CON- from debian repository and installed, then use
FIG MMU, a new module.lds.S is added for the -bios option of qemu-system-ppc to indi-
ARM: cate the path of the openbios.


• Due to the regular experssion pattern bug 5 Conclusions


of ld <2.20 described in section 3, In or-
der to make the gc-sections features work The testing result shows that gc-sections does elim-
with 2.20.1.20100303, the linker script of pow- inate some dead code and reduces the size of the
erpc is changed through using the pattern kernel image by about 1˜3%, which is useful to some
.text.*, .data.*, .bss.*, .sbss.*, but to avoid size critical embedded applications.
wrongly merging the kernel magic sections (e.g.
.data..page aligned) to the .data section, the Besides, this brings Linux kernel with link time
magic sections are moved before the merging dead code elimination, more dead code can be further
of .data, then it works well because of the the eliminated in some specific applications (e.g. only
.data..page aligned will be linked at first, then, parts of the kernel system calls are required by the
it will not match .data.* and then will not go target system, finding out the system calls really not
to the .data section. Due to the unconvenience used may guide the kernel to eliminate those system
of this method, the real solution will be forcing calls and their callees), and for safety critical sys-
the users to use ld >= 2.21, or else, will dis- tems, dead code elimination may help to reduce the
able this gc-sections feature to avoid generating code validations and reduce the possibinity of execu-
bigger kernel image. tion on unexpected code. And also, it may be possi-
SECTIONS
ble to scan the kernel modules(‘make export report’
{ does help this) and determine which exported kernel
...
. data .. p a g e _ a l i g n e d : ... {
symbols are really required, keep them and recom-
P A G E _ A L I G N E D _ D A T A( P A G E _ S I Z E) pile the kernel may help to only export the required
}
...
symbols.
. data : AT ( ADDR (. data ) - L O A D _ O F F S E T) {
DATA_DATA Next step is working on the above ideas and
*(. data .*) firstly will work on application guide system call op-
...
*(. sdata .*) timization, which is based on this project and maybe
... eliminate more dead code.
}
...
}
And at the same time, do more test, clean up
the existing patchset, rebase it to the latest stable
kernel, then, upstream them.
The following table shows the size information of Everything in this work is open and free, the
the vanilla kernel image(vmlinux, .orig) and the ker- homepage is tinylab.org/index.php/projects/tinylinux,
nel with gc-sections(.gc), both of them are stripped the project repository is gitorious.org/tinylab/tinylinux.
through strip -x (or even strip s) because of gc-
sections may introduce more symbols (especially,
non-global symbols) which are not required for run-
ning on the embedded platforms. References
The kernel config is gc sections defconfig placed
under arch/ARCH/configs/, it is based on the ver- [1] Tiny Lab, http://www.tinylab.org
satile defconfig, malta defconfig, pmac32 defconfig
and i386 defconfig respectively, extra config options [2] Link time dead code and data
only include the DHCP, net driver, NFS client and elimination using GNU toolchain,
NFS root file system. http://elinux.org/images/2/2d/ELC2010-gc-
sections Denys Vlasenko.pdf, Denys Vlasenko
arch text data bss total save [3] Executable and Linkable Format,
ARM 3062975 137504 198940 3650762 http://www.skyfree.org/linux/references/
3034990 137120 198688 3608866 -1.14% ELF Format.pdf
MIPS 3952132 220664 134400 4610028
3899224 217560 123224 4545436 -1.40% [4] GNU Linker ld,
PPC 5945289 310362 153188 6671729 http://sourceware.org/binutils/docs-2.19/ld/
5849879 309326 152920 6560912 -1.66%
X86 2320279 317220 1086632 3668580 [5] A Whirlwind Tutorial on Creating Re-
2206804 311292 498916 3572700 -2.61% ally Teensy ELF Executables for Linux,
http://www.muppetlabs.com/b̃readbox/software/
TABLE 2: Testing results tiny/teensy.html


[6] Understanding ELF using readelf and obj- https://www.ibm.com/developerworks/cn/linux/l-


dump, http://www.linuxforums.org/articles/ excutff/
understanding-elf-using-readelf-and-
objdump 125.html [9] Section garbage collection patchset,
https://patchwork.kernel.org/project/linux-
[7] ELF: From The Programmers parisc/list/?submitter=Denys+Vlasenko, Denys
Perspective, http://www.ru.j- Vlasenko
npcs.org/usoft/WWW/www debian.org/
Documentation/elf/elf.html [10] Work on Tiny Linux Kernel,
http://elinux.org/Work on Tiny Linux Kernel,
[8] UNIX/LINUX , Wu Zhangjin


Performance Evaluation of openPOWERLINK


Yang Minqiang, Li Xuyuan, Nicholas Mc Guire, Zhou Qingguo

Distributed & Embedded System Lab, SISE, Lanzhou University, China
Tianshui South Road 222,Lanzhou,P.R.China
[email protected]

Abstract
Because of the low cost and high bandwidth of Ethernet, there have been many investigations into implementing real-time communication over Ethernet. To overcome the intrinsic non-determinism of Ethernet, a number of real-time Ethernet variants have appeared, but industrial users are still hesitant about the choice among those variants. openPOWERLINK is an open source industrial Ethernet solution which follows POWERLINK, a real-time Ethernet protocol. Given the good performance and wide platform support of RT-PREEMPT, a soft PLC which consists of openPOWERLINK and RT-PREEMPT might be a cheap and easy solution for many deployment cases. In this paper, we evaluate synchronicity, jitter, cycle time and other highly relevant quality indicators of openPOWERLINK on RT-PREEMPT in distributed systems, which are commonly used to implement tightly coordinated controllers, data acquisition or synchronization systems. To allow evaluating such typical scenarios with inexpensive hardware we used the parallel port to simulate an input/output unit of a PLC. We designed several benchmark cases to evaluate the related indicators and present the results as reference data. Further, with the increased use of COTS and FLOSS components for safety related systems, which are often distributed/replicated systems, the results of this evaluation may also be of interest to designers of safety related systems. openPOWERLINK provides
Keywords: openPOWERLINK, EPL, Real Time Ethernet, VIAC3

1 Introduction but it only fulfills the requirement of the low level


real-time demand applications. Another kind of Eth-
It is generally known that Ethernet has low-price de- ernet field-buses are realized with modified Ethernet
vices, while providing higher and higher speeds, but hardware, e.g. EtherCAT. It generally means that
also its intrinsic non-deterministic. A further note- you could get better performance (cycle time and
worthy advantage clearly is the availability of well throughput) and you probably cannot use the normal
trained engineers as Ethernet is a widely in-use stan- Ethernet device directly. Solutions such as Ether-
dard technology. In order to utilize these good fea- CAT require special Ethernet controller which have
tures, a lot of researches and attempts to use Eth- two Ethernet ports and the capability of processing
ernet in the industrial control context are underway. packets ”on the fly”[4].
Real-time Ethernet[1] is a communication architec- Ethernet POWERLINK (EPL) is another vari-
ture using standard Ethernet hardware with various ant of Ethernet; it introduces Slot Communication
modifications which introduce real-time and deter- Network Management (SCNM) to provide determin-
ministic property to meet the requirements of indus- istic communication and leaves the hardware without
trial control system. These modifications lie on dif- any modification, but uses its own software protocol
ferent layers in TCP/IP reference model[2]. stack (note that the EPL NIC utilizes the CANopen
Ethernet variants like Modbus/TCP use the application layer interface). This compromise on the
TCP/IP protocol stack without any modifications, modification to the Ethernet brings a balance be-
∗ Research supported by National Natural Science Foundation of China under Grant No.60973137; Gansu International

Sci.&Tech. Cooperation under Grant No.090WCGA891.


tween cost and performance. Moreover, for its good CANopen is one of the most popular higher layer
synchronization, EPL attracted much attention of in- protocols for CAN-based networks. Therefore there
dustrial users. are a number of device and application profiles un-
To provide reference to industrial users and der development or already available which are used
related developers, we built a simple distributed in example in building related applications like door
system with openPOWERLINK[9] which is the control or elevators, for ships, trains, municipal ve-
open-source implementation of EPL. Some demo hicles or railway as well as for medical applications.
and benchmark applications were implemented for Besides these standardized profiles, another big ad-
benchmark. In the following sections, we will intro- vantage of CANopen is that it’s used in a wide range
duce our work and represent the evaluation result. of proprietary systems and applications which eases
integration of these and the transformation to open-
source.
2 Background: EPL and open- openPOWERLINK is an open source industrial
POWERLINK Ethernet solution provided by SYSTEC electronic
[10]. It contains the Ethernet POWERLINK proto-
Bringing together Ethernet, CANopen, and a newly col stack for the Managing Node (master) and for
developed stack for real-time data communication, the Controlled Nodes (slaves). It is released under
POWERLINK integrates features and abilities from the BSD License.
three different worlds. In contrast to a number of competing products, POWERLINK keeps very close to the Ethernet standard, retaining original Ethernet features, and thus reducing the cost of industrial deployment. It expands Ethernet with a mixed Polling and Time slicing mechanism named SCNM (refer to figure 1).

(Figure 1, captioned below, shows the EPL cycle between the Managing Node (MN) and the Controlled Nodes (CN): SoC, then PReq to CN 1 and CN 2 with PRes from CN 1, CN 2 and the MN in the isochronous phase, SoA and ASnd in the asynchronous phase, followed by the idle phase.)

Fig. 2: Abstract Model of openPOWERLINK (layer diagram: the EPL application layer with object dictionary, PDO and SDO, and NMT sit above the EPL data link layer, next to HTTP, FTP and other application layers over the regular transport and network layers; both stacks run on the standard Ethernet MAC and physical layer)
Fig. 1: EPL Cycle The EPL stack is divided into two parts: low-
There are two kinds of nodes in EPL, namely prioritized processes above the Communication Ab-
managing node (MN) and controlled node (CN). straction Layer (CAL) called EPL user part and
A MN, which acts as the master in the EPL net- high-prioritized processes below the CAL called EPL
work, polls the CN cyclically. This process takes kernel part. Processes which have to be processed in
place in the isochronous phase of the EPL cycle. every EPL cycle have high priority, e.g. Data Link
Immediately after the isochronous phase follows an Layer (DLL), PDO processing and core NMT state
asynchronous phase for communication which is not machine. All other processes have low priority, e.g.
time-critical, e.g. TCP/IP communication. The SDO. It is possible to swap out the high-prioritized
isochronous phase starts with a Start of Cyclic frame processes on a separate CPU (e.g. on a SMP ma-
on which all nodes are synchronized. This schedule chine) to ensure the real-time requirements[16].
design avoids collisions, which are usually present on
Standard Ethernet, and ensures the determinism of
the hard real-time communication. It is implemented
3 Performance Evaluation
in the EPL data link layer. The SoC1 packet is sent
3.1 Performance Indicators
for synchronizing and indicating that the start of the
isochronous phase of a new cycle. SoA starts the There are a lot of indicators that should be consid-
asynchronous phase[16]. ered when evaluating real-time Ethernet. We paid
EPL integrates the CANopen, a robust and more attention on the most important performance
proven protocol widely used throughout the au- indicators (PIs) in this paper, namely synchronicity,
tomation world, which greatly simplifies setting up minimum cycle time, latency and jitter, and the scal-
networks because of its extensive standardization. ability over the number of end nodes. Some other PIs
1 EPL frame types, SoA: Start of Asynchronous, SoC: Start of Cyclic, PReq: Poll Request, PRes: Poll Response


were also considered, such as topology, throughput etc.[3]. To get the worst case values, all the tests in the paper have been done with heavy system load.

3.2 Setup

We built a three node distributed system using VIA boards with little modification of the default configuration of openPOWERLINK. The motivation for the three node setup is that a common setup for safety related systems is a triple modular redundancy (TMR), thus this is one of the target profiles we are interested in for a real-time Ethernet solution. There was one MN and two CNs. The three nodes were connected via a 10Mbps hub or a 100Mbps switch (the ideal solution, a 100Mbps hub, was not available). A parallel-port cable was attached to each CN. Suitable pins of the other end of the cable were connected to an oscilloscope. We used two channels of the oscilloscope to display the signals from the two CNs, one channel for each CN. The structure of the setup is shown in the following figure.

Fig. 3: System Setup

The details of the system are as follows.

  Component     Model/Version
  CPU           VIA Nehemiah (C3), 1GHz
  NIC           RTL8139 10M/100Mbps
  RAM           256M
  Hub           Ovislink 10Mbps
  Switch        TP-Link TL-SF1016 10/100Mbps
  EPL           openPOWERLINK 1.6
  RTOS          RT-PREEMPT linux-2.6.33.9-rt31
  Distribution  Debian GNU/Linux 5 and 6

Before we run openPOWERLINK, we have to make sure our RTOS is correctly configured and running properly. To ensure this, the de-facto standard cyclictest[11] benchmark was used; the overall result for RT-PREEMPT on VIAC3 is quite good. Results are plotted in the following figures.

Fig. 4: System Latency of RT-PREEMPT

We ran cyclictest with different intervals (500, 1000, 2000 microseconds) and the worst case latencies were less than 80 microseconds. The distributions of the latency over frequency are quite stable.

3.3 Synchronicity

As we described above, there was one MN and two CNs in our setup. To meet our requirement, we need to configure the related entries of the object dictionary and let the MN send 8-bit real-time data to each of the two CNs periodically. The two CNs output the data to their parallel port when they get the next SoC packet. To facilitate our programming, the MN sent 0x00H and 0xFFH to the two CNs separately and reversed them in the next cycle. The snapshot of the oscilloscope below shows the output of the CNs.

The implementation of the MN includes the following main parts (a sketch of the synchronization callback is given further below):

  • Define the object dictionary
  • Link the variables to the corresponding objects
  • Event callback function: the TPDOs should be initialized when the reset-communication event occurs.
  • Synchronization callback function: the synchronization function invoked while transmitting the SoC packet reverses the output value to generate the PWM signal.
  • Clean up: close the NMT and EPL stack

The CN also needs the object dictionary to be defined, as well as linking the variables with the entries of the object dictionary. The related callback functions should be properly set up and configured following the specification[12][16]. Make sure that no


other machine is connected to the hub the EPL en-


vironment is using, if there are, make sure the ma-
chine does not send any packet to the network. Raise
the priority of the interrupt handler thread of open-
POWERLINK to a value above the other peripherals
and system management kernel threads (we used a
priority above 50 - 50 being the default priority of
IRQ-threads in RT-Preempt prior to Linux 3.X).
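To make the callback description above more concrete, the following is a minimal sketch of a synchronization callback in the spirit of Section 3.3: it writes an 8-bit value to the parallel port and inverts it each cycle so that a PWM-like square wave can be observed on the oscilloscope. The names (app_sync_callback, app_io_init, process_value, PARPORT_BASE) are illustrative assumptions only and do not correspond to the actual openPOWERLINK API or to the benchmark code used in this paper:

    #include <sys/io.h>      /* ioperm(), outb(): x86 only, needs root */
    #include <stdint.h>

    #define PARPORT_BASE 0x378   /* assumed base address of the parallel port */

    /* value that would be linked to an object dictionary entry elsewhere */
    static volatile uint8_t process_value;

    /* hypothetical hook invoked by the stack when the SoC frame is handled;
     * the real stack registers its own callback type */
    static int app_sync_callback(void)
    {
            /* output the current value ... */
            outb(process_value, PARPORT_BASE);
            /* ... and invert it for the next cycle to generate the PWM signal */
            process_value = (uint8_t)~process_value;
            return 0;
    }

    /* one-time setup, e.g. in main(): grant access to the I/O port */
    static int app_io_init(void)
    {
            return ioperm(PARPORT_BASE, 1, 1);   /* returns 0 on success */
    }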

Fig. 6: Overload Synchronization

EPL use precision time protocol according to


IEEE 1588[8] to synchronize the independent clocks
running on separate MNs of a distributed control
system in which a high accuracy and precision is re-
quired. As clock synchronization is not relevant for
the openPOWERLINK stack and the EPL protocol,
Fig. 5: Output of Slave Nodes
the clock synchronization between MNs is not cov-
ered in this paper.
.
EPL protocol provides good synchronization ca-
pabilities, but the latency of software stack of open-
POWERLINK on the CNs and RTOS mainly con- 3.4 Cycle time and number of nodes
tribute the synchronization deviation, we need to
know how good the synchronicity we could get by
The cycle time is the key issue of real-time communi-
running openPOWERLINK as CNs(the latency of
cation for many automation and control applications.
MN does not affect the synchronization and the jitter
The homepage of openPOWERLINK announced
of NIC and Ethernet HUB is negligible). As we de-
that ”POWERLINK, which operates with standard,
scribed above, the parallel-port of the two CNs were
on-board Ethernet controllers, reaches cycle times
connected to the two channels of the oscilloscope.
down to 0.5 milliseconds in this Open Source imple-
The oscilloscope used the signal of one channel (A) as
mentation while ensuring high synchronicity. Sup-
the reference (trigger), while the signal of the other
ported by co-processors, POWERLINK even ensures
channel (B) would fluctuate relatively and is a direct
cycle times down to 0.1 ms.”[9] Some benchmark
measure for the synchronicity of the two CNs. By ac-
results on openPOWERLINK by B & R also pre-
cumulating the wave shape, we could get the relative
sented results indicating that the minimal cycle time
synchronization performance of openPOWERLINK.
of 250 microseconds was achieved[7] though in the
We run the openPOWERLINK for 30 minutes with
presented setup such cycle times are out of reach.
heavy load and recorded the persistent mode snap-
shot in which it indicated about 100 microsecond’s The jitter of the SoC can directly reflect the over-
synchronization deviation. Note that the deviation all real-time performance of open-POWERLINK on
is not the worst case synchronization time, because RT-PREEMPT, which can be seen as reference for
the worst cases of the two nodes probably did not the system (including application, RTOS and hard-
occur at the same time. The worst case synchro- ware ) jitter. To measure the jitter, a rising edge or a
nization time should be less than the deviation - to falling edge was generated alternately on the parallel-
estimate this the distribution of jitter would need port while sending SoC packet. We used the persist
further study. mode of oscilloscope to capture the PWM signal.


asynchronous period and the idle phase. Obviously,


we have
Tsoc = Tsr − Trr (2)
The histogram of Tsr and Trr is showed in Figure
9 and Figure 10.

Fig. 7: SoC Jitter

.
The cycle time which is commonly one of the
most critical indicator depends on the system jitter,
number of nodes and transmission hardware (NIC Fig. 9: Timing over 10Mbps HUB
and Ethernet). Ethernet packet capture tools like
Wireshark cannot meet the high precise timer re-
quirement for a system analysis. Thus we need to
record the time stamps via high resolution timer.
The MN and CNs used independent clocks, so we
cannot compare the time stamp from different nodes
without time synchronization. Our solution is to
record the related time stamps on the same machine
so that we could get the duration for different phases.
The following figure showed a simplified communica-
tion period.

Fig. 10: Timing over 100Mbps Switch


Fig. 8: Cycle Timing

  Setup          Tsr(avg)   Trr(avg)   Tsoc(avg)
  10M Hub        437        176        261
  100M Switch    279        75         204
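As a quick plausibility check (added here for illustration), these averages are consistent with Equation (2): Tsoc = Tsr - Trr gives 437 - 176 = 261 for the 10Mbps hub and 279 - 75 = 204 for the 100Mbps switch, which matches the Tsoc column of the table above.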
Refer to figure 8 for the latency benchmark of
some steps which mainly contribute to the overall Because the system (protocol stack, RTOS, NIC
openPOWERLINK latency. To analyze the overall and switch etc.) have jitters and the data might
real-time performance and cycle time, namely Tsr be lost for various factor. The polling mechanism of
(duration from sending SoC to getting PRes), Trr EPL protocol requires a timeout To within which the
(time from sending PReq to getting PRes). MN should get the PRes packet after it sent PReq
The whole cycle of EPL can be simply formu- packet to the CN, otherwise MN stop waiting the
lated below, CN and turn to poll next CN. If the timeout value is
too big, the cycle time will be unnecessary long. On
the other side that, if the timeout value is too small,
TC = Tsoc + Trr ∗ n + Tai (1)
the late PRes from the previous CN will collide with
in which, TC is the cycle time. Tsoc is the time the PReq to the next CN. The EPL standard does
for delivering SoC and the safety margin after it. not define the timeout value which is closely depen-
Trr indicates the time from sending PReq to getting dent on the EPL implementation, system, hardware
PRes. n is the number of nodes. Tai is the time for and the transport media, so that we have to define


the timeout value which fits to our specific environ- The throughput is closely related to the amount
ment and setup. A new equation can be derived from of data to send during one cycle and minimal cycle
Equation 1, time. After getting those data, we could easily esti-
mate the relationship between throughput, payload
TC = Tsoc + Trr ∗ n + Ta + To ∗ n (3) and cycle time. As the Ethernet has enough band-
in which Ta is the time for asynchronous phase(that width to fulfill the RTE and non-RTE throughput,
we can refer to the Trr to calculate); To ∗ n is the we would not extend the issue in this paper.
idle time to ensure the jitter of the system will not
affect the regular cyclic communication. Note that
the possibility of the worst case latency occur at the 4 Conclusion
same time is very low. We do not need a safety mar-
gin as long as To ∗ n. In other words, you might need As the strong potential of introducing Ethernet into
to find out a balance between cycle time and failure distributed real-time control system, we took several
rate(here means the tolerable rate of the actual cycle important performance indicators of real-time Eth-
time exceeds the configured value). ernet into consideration when evaluating openPOW-
As soon as the timeout, specification of com- ERLINK. User could easily setup the EPL on a Linux
munication (RT and non-RT data amount) and the machine without any special hardware or particu-
number of nodes are specified, referring to the data lar topology and get relatively good hard real-time
and equation above we could get the time for deliver- communication. This feature facilitates the user and
ing SoC (including the gap) and the round time for saves much cost.
PReq and Pres and the time for the asynchronous In this paper, we studied the openPOWERLINK
and idle phase. And go a step further, we could on a few significant indicators, benchmarked the crit-
roughly estimate the possible cycle length of the EPL ical phases in the EPL communicate cycle and gave
network with 10Mbps HUB or 100Mbps Switch. a reference model for industrial user or related re-
Comparing to the system jitter, the jitter of searchers and developers. The data we got indi-
switch is negligeable in our case. We could take the cates that the cycle time and synchronization per-
average value of the parameters as the reference to es- formance of openPOWERLINK-on-RT-PREEMPT
timate the possible cycle time we can get (note we got solution meet the requirement of process control sys-
data in overload system condition). Normally the la- tems and most motion control systems[2]. The data
tency of a 100Mbps switch is about 10 microseconds shows that the system jitter is quite big which makes
(the precise value depends on a specific hardware), a great impact on the EPL cycle time and syn-
so you may use a 100Mbps Hub to avoid the latency chronicity. To use it in high precision motion con-
generated by switch and get better performance. trol system, there are some points need to be opti-
mized, such as the software architecture, code and
RT-PREEMPT etc. Besides optimization of those
3.5 Other indicators points and implementation of a real distributed con-
Up to 240 CNs can be employed connected in var- trol which has mechanical equipment integrated, to
ious configurations. You may use both hubs(hub is seek for a solution to effectively reuse the Ethernet
recommended for its small latency) or switches(you card driver in Linux kernel is also our future work.
need to consider the latency) in more than one level
with different topology. Rather than fixed topology
of EtherCAT, EPL has flexible topology. A mixed tree and line structure is available when a large number of nodes are being used. As the intrinsic Ethernet property, the EPL network can be easily connected via gateways to non real-time networks.

Fig. 11: EPL Topology

References

[1] Jean-Dominique Decotignie, Ethernet-Based Real-Time and Industrial Communications, Proceedings of the IEEE, 2005.

[2] Max Felser, Real-Time Ethernet - Industry Prospective, Proceedings of the IEEE, 2005.

[3] L. Seno, S. Vitturi, C. Zunino, Real-time Ethernet Networks Evaluation Using Performance Indicators, Emerging Technologies & Factory Automation (ETFA), 2009.


[4] D. Jansen and H. Buttner, Real-time Ethernet: the EtherCAT solution, Computing and Control Engineering, 2004.

[5] http://www.ethernet-powerlink.org/

[6] Yang Minqiang, Bringing openPOWERLINK to MIPS (Loongson 2F), RTLWS12, 2010.

[7] Josef Baumgartner, POWERLINK and Real-Time Linux: A Perfect Match for Highest Performance in Real Applications, RTLWS12, 2010.

[8] IEEE Std., IEEE Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems, 1588-2002.

[9] http://openpowerlink.sourceforge.net/

[10] http://www.systec-electronic.com

[11] http://rt.wiki.kernel.org/index.php/Cyclictest

[12] EPSG, Ethernet POWERLINK V2.0 Communication Profile Specification, Version 1.0.0

[13] RT-Preempt, http://rt.wiki.kernel.org/

[14] http://rt.wiki.kernel.org/index.php/Cyclictest

[15] Daniel P. Bovet & Marco Cesati, Understanding the LINUX KERNEL, 3rd Edition, O'REILLY, 2005

[16] EPSG, openPOWERLINK: Ethernet POWERLINK Protocol Stack, 2010


Improving Responsiveness for Virtualized Networking Under


Intensive Computing Workloads

Tommaso Cucinotta
Scuola Superiore Sant’Anna
Pisa, Italy
[email protected]

Fabio Checconi
IBM Research, T.J. Watson
Yorktown Heights, NY, USA
[email protected]

Dhaval Giani
Scuola Superiore Sant’Anna
Pisa, Italy
[email protected]

September 27, 2011

Abstract
In this paper the problem of providing network response guarantees to multiple Virtual Machines
(VMs) co-scheduled on the same set of CPUs is tackled, where the VMs may have to host both responsive
real-time applications and batch compute-intensive workloads. When trying to use a real-time reservation-
based CPU scheduler for providing stable performance guarantees to such a VM, the compute-intensive
workload would be scheduled better with high time granularities, to increase performance and reduce
system overheads, whilst the real-time workload would need lower time granularities in order to keep the
response-time under acceptable levels. The mechanism that is proposed in this paper mixes both concepts,
allowing the scheduler to dynamically switch between fine-grain and coarse-grain scheduling intervals
depending on whether the VM is performing network operations or not. A prototype implementation of
the proposed mechanism has been realized for the KVM hypervisor when running on Linux, modifying
a deadline-based real-time scheduling strategy for the Linux kernel developed previously. The gathered
experimental results show that the proposed technique is effective in controlling the response-times of the
real-time workload inside a VM while at the same time it allows for an efficient execution of the batch
compute-intensive workload.

Acknowledgements

The research leading to these results has received funding from the European Community's Seventh Framework Programme FP7 under grant agreements n. 214777 "IRMOS—Interactive Realtime Multimedia Applications on Service Oriented Infrastructures" and n. 248465 "S(o)OS – Service-oriented Operating Systems."


1 Introduction and Related based real-time CPU scheduling [2] for the Linux ker-
nel in order to stabilize the performance of individ-
Work ual compute-intensive VMs, tackling the problem of
network-intensive VMs later [6].
Virtualization is increasingly gaining momentum as
the enabling technology for the management of phys- The latter works rely on the use of a reservation-
ical resources in data centers and Infrastructure-as- based scheduler [2] for the CPU (a hard-reservation
a-Service (IaaS) providers in the domain of Cloud variant of the Constant Bandwidth Server [1]) that
allows for configuring the scheduling guarantees for
Computing. Indeed, virtualization enhances the flex-
ibility in managing physical resources, thanks to its a given VM in terms of a budget (Q) and a period
capability to virtualize the hardware so as to host (P ). The scheduler will guarantee that each VM will
multiple Virtual Machines (VMs) executing poten- be scheduled for Q time units every period of P time
tially different Operating Systems, and the capability units, under the usual assumption of non-saturation
to live-migrate them as needed without interrupting for EDF (sum over i of Qi/Pi <= 1, see [10] for details). The
the provided service, except for a very low down- reservation period can be specified independently for
time. Virtualized systems are also capable of ex- each VM, and it constitutes the time granularity over
which the CPU allocation is granted to the VM.
hibiting a performance nearly equal to the one expe-
rienced on the bare metal, due to the hardware virtu- A shorter period improves the responsiveness of
alization extensions provided by modern processors. the VMs at the cost of higher scheduling overheads,
As a consequence of virtualization, multiple thus being beneficial for time-sensitive workloads.
under-utilized servers can easily be consolidated onto On the other hand, a longer period leads to lower
the same physical host. This allows a reduction in scheduling overheads, thus it is beneficial for batch
and high-performance workloads, at the cost of po-
the number of required physical hosts to support a
number of virtualized OSes, leading to advantages in tentially longer time intervals during which the VM
terms of costs for running the infrastructure and of is unresponsive (in the worst-case, a VM might have
energy impact. to wait as much as 2(P − Q) before being scheduled
again). However, for VMs embedding both batch
However, once multiple VMs are deployed on computing activities (including both main VM func-
the same physical resources, their individual perfor- tionality or typical bookkeeping OS activities, such
mance is at risk of becoming greatly unstable, unless as updating indexes) and time-sensitive tasks (e.g.,
proper mechanisms are utilized. A VM which tempo- reporting on the progress of batch tasks, or realizing
rararily saturates either the processing, networking, independent features), both configurations do not fit
or storage access capacity of the underlying physical very well, as highlighted by Dunlap in the discussion
resources immediately impacts the performance of about future work on the new upcoming Xen Credit
the other VMs which share the same resources. This Scheduler [7].
is a potentially critical issue for IaaS providers where
proper QoS specifications are included in the Service- In this paper we propose a novel mechanism for
scheduling VMs with both compute-intensive and
Level Agreements (SLAs) with the customers.
network-responsive workloads. In absence of exter-
The problem of providing a stable performance nal requests the VM progresses with its (long) period
to individual VMs has been studied in the past. For configuration (e.g., hundreds of ms) and can perform
example, Gupta et al. [8] introduce in the Xen hyper- batch computing activities reducing scheduling over-
visor1 a proper CPU scheduling strategy accounting heads to the minimum. However the occurrence of
for the consumption of device driver domain(s) as external requests allows the VM to be woken up by
due to the individual VMs operations. In [11], an the scheduler within a much shorter interval (e.g., ms
extension to the Xen credit-based scheduler is pro- or tens of ms), to perform relatively short activities
posed, to improve its behavior in presence of multi- configured at a higher priority inside the VM, so as
ple different applications with I/O bound workloads. to respond very quickly to external events.
Also, Liao et al. [9] propose to modify the Xen CPU
scheduler, by making it cache aware, and the net-
working infrastructure to improve the performance
of virtualized I/O on 10Gbps Ethernet.
For the KVM hypervisor2, Cucinotta et al. [3,
4, 5] investigated on the use of hierarchical deadline-
1 More information at: http://www.xen.org/.
2 More information at: http://www.linux-kvm.org/.


2 Approach
The mechanism proposed in this paper applies to vir-
tual machines scheduled under a reservation-based
real-time scheduler like the one presented in [2]. For
the sake of simplicity the focus is on single-core VMs
scheduled according to a partitioned EDF policy (so
one or more VMs are pinned on each physical core
and scheduled on it).
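As a purely illustrative example of the per-core non-saturation condition recalled in the previous section (numbers chosen arbitrarily, not taken from the experiments): two VMs pinned on the same core with reservations (Q1, P1) = (10 ms, 100 ms) and (Q2, P2) = (30 ms, 60 ms) are admissible under partitioned EDF, since Q1/P1 + Q2/P2 = 0.1 + 0.5 = 0.6 <= 1, whereas adding a third VM with (50 ms, 100 ms) would exceed the bound (0.6 + 0.5 = 1.1 > 1).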
Each VM can be configured with a set of schedul-
ing parameters denoted by (Q, P ), with the meaning FIGURE 2: Example schedule of a
that Q time units are granted to the VM for each P VM with generic scheduling parameters of
time units. The interest in having Q/P < 1, thus the (Q, P ), when co-scheduled, exhibiting a non-
possibility to have multiple VMs co-scheduled on the responsiveness time interval of 2(P − Q).
same processor and core, comes from the fact that
the infrastructure provider may have an interest in Also, P controls the scheduling overheads im-
“partitioning” the big computing power available on posed on the system. In fact, the scheduler forces a
a single powerful core into multiple VMs with lower context switch at least every interval as long as the
computing capabilities and rent them separately, or minimum P value across all the reservations config-
merely from the fact that the hosted VMs have an ured on the core.
expected workload (e.g., as due to requests coming
from the network) that cannot saturate the comput- This kind of scheduler allows heterogeneous vir-
ing power on the underlying physical core, thus en- tualized workloads to safely coexist as far as they
abling the provider to perform server consolidation. belong to different VMs. One can easily configure a
The Q value constitutes both a guarantee and a lim- short P value for a VM with a real-time workload
itation (i.e., we are using hard reservations). This that needs to be responsive, and a long P for a VM
ensures that the performance of each VM is not af- that performs mainly batch computations. However,
fected (too much) from how much intensively other mixing such types of workloads in the same VM may
VMs are computing [5, 4]. lead to problems. One can configure the responsive
activities in the VM to run at a higher priority as
Roughly speaking, at equal Q over P ratios, the compared to the batch computing ones (i.e., by ex-
chosen value for P regulates the responsiveness of the ploiting priority-based scheduling as available on ev-
associated VM. It is easy to see that, if the VM is ery OS). However, still the non-responsive periods
running alone, then its schedule comes out as shown of the VM will largely dominate the response time
in Figure 1, and the non-responsiveness time interval of the real-time task(s). So, in order to keep such
for the VM may be as long as P − Q. However, the response times low, the normal option would be the
worst-case condition when the VM is co-scheduled one to use small P values, obtaining high schedul-
with other VMs is the one shown in Figure 2, with ing overheads also while the VM is doing its batch
the budget granted to the VM at the beginning of a computing activities without any request from the
P time window (for example, because at that time outside triggering the real-time functionality.
all other VMs were idle), and at the end of the time
window immediately following (for example, as due In order to resolve this problem, in this paper
to the wake-up of a VM at the beginning of this sec- we propose the following mechanism (see Figure 3).
ond time window, with a deadline slightly shorter The VM is normally attached to a reservation con-
than the first VM, under theoretical saturation for figured with scheduling parameters (Q, P ), with a
the EDF scheduler). period P tuned for the batch computing case, i.e., it
is relatively large, for example in the range of hun-
dreds of milliseconds. In addition, a second “spare”
reservation is configured in the system with parame-
ters (Qs , Ps ) tuned for the operation of the real-time
activity, i.e., Ps is relatively small, for example in
the range of tens of milliseconds or shorter, and Qs
FIGURE 1: Example schedule of a sufficient to complete an activation of the real-time
VM with generic scheduling parameters of activity. Now, whenever the VM receives a network
(Q, P ), when running alone, exhibiting a non- packet and its current budget is exhausted (i.e., it
responsiveness time interval of P − Q. is in the non-responsiveness time frame), the VM


is temporarily attached to the “spare” reservation. Finally, in order to avoid keeping a spare reser-
Having a much shorter deadline, the spare reserva- vation for each and every VM hosted onto the same
tion forces the VM to be scheduled and receive Qs physical host, we propose to use a pool of spare reser-
execution time units on the processor within the Ps vations which can be used for the purpose illustrated
deadline from the packet receive time; this will cause above. The idea is that, exploiting statistical multi-
the VM to run, receive the packet and possibly ac- plexing of the networking traffic patterns among in-
tivate the real-time activity that will perform some dependent VMs, one can assume that the probability
fast computation (and possibly provide a response of having all the VMs requiring a spare reservation
packet). If the real-time activity cannot complete attached dynamically at the same time be very small.
within the first activation of the spare reservation, it This way, the additional utilization to keep for spare
will be resumed during the subsequent activations, reservations may be kept limited.
so it will receive additional Qs time units during the
Therefore, a pool of a few reservations with short
following Ps time window, and so on, till the time
periods will be ready to be used for boosting reser-
of replenishment of the original reservation budget,
vations (with longer periods) of VMs when they re-
at which time the VM relinquishes the spare reserva-
ceive packets from the external world but their nor-
tion. With a proper tuning of the Qs and Ps parame-
mal budget is exhausted due to compute-intensive
ters a VM configured for batch computing activities
activities. This allows for a very quick reaction-time
should exhibit a tremendously improved response-
of the VMs.
time to sporadic requests coming from the network,
at the cost of keeping some extra-capacity unused in
the system.
3 Implementation Details
In order to validate the proposed approach we im-
plemented a proof of concept in the Linux kernel,
using the KVM hypervisor to execute the VMs. We
started from the IRMOS scheduler [2], modifying it
to include support for reservations providing “spare”
bandwidth, and introducing the glue code needed to
use this new feature.
FIGURE 3: Example schedule of a VM
with generic scheduling parameters of (Q, P ), From the interface point of view, each reservation
and a spare reservation of (Qs , Ps ) which may have the property of providing spare bandwidth
is dynamically activated and attached to the to the reservations needing it, and/or the property
same VM on a new packet arrival. Despite the of using spare bandwidth from reservations provid-
budget for the VM at packet arrival time was ing it. The system administrator controls the pa-
exhausted, the VM can complete a short real- rameters of the reservations and the dependencies
time activity of duration Qs within the spare between users and providers of spare bandwidth us-
reservation period Ps . ing the CGROUP filesystem interface.
To recognize the events that are related to VM
The requirements of the real-time workload are
I/O, and consequently activate the spare bandwidth
assumed to be relatively small, and in any case the
mechanism we modified the networking code. In our
additional reservation to be attached dynamically to
modified kernel, when a packet arrives we check its
a VM cannot bee too large in terms of utilization
destination and if is headed towards a Virtual Ma-
(budget over period), because it needs to remain un-
chine we retrieve its server using a simplified hash
used for all the time in which the VM does not ac-
table. If the server has run out of bandwidth we set
cess the network. For example, it might require a
a flag to mark that it needs to access its spare reser-
10% or a lower CPU utilization to complete. This
vation. Setting the flag may also imply requeueing
should allow the real-time activity triggered by the
the running tasks belonging to the same VM, as they
received network packet to complete, assuming it is
may need to access the spare bandwidth too.
configured in the VM for running at higher prior-
ity than other activities. For example, the VM may When a task is activated, along as performing a
perform kernel-level activities inside the networking regular activation, the scheduler checks if the task
driver and stack, and relatively short userspace ac- belongs to a virtual machine, and if the VM’s server
tivities, which may be running in a task that was needs spare bandwidth; if this is the case, the task
waiting for the packet arrival. is not only enqueued in its own server, as would be


done anyway, but it is also enqueued in the server der to evaluate the worst-case latency experienced by
providing the spare reservation. ping, the VM was pinned on the first physical core of
the host, while a user-space tool, pinned on the other
The flag set on the VM’s server needs to be re-
core, was used to spin-wait for budget exhaustion of
set, and this may happen on two conditions. The
the associated reservation, and issue a ping request at
first possibility is when the emergency bandwidth
that time. As highlighted in Section 2, the minimum
has been set for a certain duration, empirically de-
observed ping time is theoretically P − Q = 60ms
termined not as a function of time, but rather of the
in this case (but far higher values were observed, ac-
chances the server has had to execute its tasks. The
tually). However, the mechanism introduced in this
other possibility is when the original server has its
paper foresees the attachment of the spare reserva-
bandwidth restored.
tion to the VM at the ping packet receive time, thus
the VM has a chance to run for Qs = 1ms within the
deadline of Ps = 10ms (and for an additional 1ms
4 Experimental Results for each subsequent 10ms time window, till the re-
plenishment of the original reservation budget), thus
The approach presented in the previous section was responding to the request much more quickly.
validated through an experiment conducted on a pro-
The obtained ping times with the VM running
totype implementation of the mechanism, evaluated
under the real-time scheduler are shown in Figure 4.
on a Linux 2.6.35 kernel patched with the IRMOS
As it can be seen when using the spare reservation
real-time scheduler [2], running on an Intel Core 2
(bottom curve) mechanism, the experienced ping
Duo P9600 CPU configured for running at a fixed
times are highly reduced as compared to when not
2.66 GHz frequency. The VM was configured with
using it (top curve).
the CPU thread running at real-time priority lower
than the one used for all its other threads. We were
180
unable to use the full implementation described in "< sed -e ’s/^.*time=\\([0-9.]\\+\\) ms/\\1/’ ping-notrick.dat"
"< sed -e ’s/^.*time=\\([0-9.]\\+\\) ms/\\1/’ ping-trick.dat"
Section 3, and we used only a subset of it, handling 160

part of the transition to the spare reservation from 140

userspace; however in the experiments we made sure


120
that the mapping of the VMs to the reservation was
Ping time (ms)

compatible with the described approach. 100

80
In order to show the advantages of the technique,
the ping times for reaching the VM have been mea- 60

sured under various conditions (so, the ping time 40

is representative of the responsiveness of the VM),


20
while a fake compute-intensive workload was used
inside the VM, using a throughput utility that has 0
0 10 20 30 40 50 60 70 80 90 100

the capability to measure how many repetitions of Request

a basic for loop with a few arithmetic operations


FIGURE 4: Obtained ping times with-
have been realized over a time horizon. Note that
out the additional spare reservation (top
a ping packet only reaches the kernel-level network
curve) and with the spare reservation (bottom
driver of the target VM (which runs at higher priority
curve).
as compared to user-space computing applications).
The evaluation of the technique with real user-space
Also, looking at the throughput that can be
applications (e.g., a webserver that needs to remain
achieved by the batch computing activities inside
responsive) is deferred as future work on the topic.
the VM with various equivalent reservation config-
In the experiment, the potential of the mech- urations (in terms of occupied CPU share), we can
anism is highlighted by measuring the worst-case observe that with a reservation of (40ms, 100ms) our
responsiveness of the system, under the assump- program was reporting 1.11 cycles per microsecond,
tion of sporadically interspaced, non-enqueuing ping while with a reservation of (4ms, 10ms) it was re-
requests, while the VM is under heavy compute- porting 0.56 cycles. The big difference is due to the
intensive workload. This has been achieved running additional scheduling overheads due to the ten times
the throughput utility inside the VM, attaching it more context switches. Therefore, it is highly ben-
to a reservation with scheduling parameters (Q, P ) = eficial to keep the VM configured with the longer
(40ms, 100ms), and by using a spare reservation con- period, in this case, while our mechanism allows to
figuration of (Qs , Ps ) = (4ms, 10ms). Also, in or- greatly improve its responsivenes.


5 Conclusions and Future Work

In this paper we present a novel scheduling mechanism to efficiently provide tight responsiveness to virtual machines hosting mixed compute-intensive and real-time workloads. It is possible to schedule such VMs with a reservation-based scheduler by using a large period for minimum overheads during compute-intensive periods, but at the same time ensure that the VM responds within a much shorter deadline when receiving input from the outside, as in the case of receiving a network packet.

The presented mechanism can still be improved, and various directions for future extension are possible. Firstly, the mechanism may be improved to shorten the response-time of the VM also during the periods in which its own reservation still has budget, but the deadline is still quite far away. This may be seen in workloads where the other VMs in the system have deadlines shorter than the one of the VM receiving the packet, but still quite far away as compared to the desired tightness of the real-time activity response. Second, the current implementation is only a proof-of-concept and needs to be better engineered to reach production quality levels. Third, the idea of the pool of spare reservations has only been sketched out, but it needs to be refined, implemented and experimented on a real system. Finally, the presented mechanism needs to be applied to some real-life workload in order to highlight its full potential for real application scenarios.

References

[1] L. Abeni and G. Buttazzo. Integrating multimedia applications in hard real-time systems. In Proc. IEEE Real-Time Systems Symposium, Madrid, Spain, December 1998.

[2] F. Checconi, T. Cucinotta, D. Faggioli, and G. Lipari. Hierarchical multiprocessor CPU reservations for the Linux kernel. In Proceedings of the 5th International Workshop on Operating Systems Platforms for Embedded Real-Time Applications (OSPERT 2009), Dublin, Ireland, June 2009.

[3] T. Cucinotta, G. Anastasi, and L. Abeni. Real-time virtual machines. In Proceedings of the 29th IEEE Real-Time System Symposium (RTSS 2008) – Work in Progress Session, Barcelona, December 2008.

[4] T. Cucinotta, G. Anastasi, and L. Abeni. Respecting temporal constraints in virtualised services. In Proceedings of the 2nd IEEE International Workshop on Real-Time Service-Oriented Architecture and Applications (RTSOAA 2009), Seattle, Washington, July 2009.

[5] T. Cucinotta, G. Anastasi, F. Checconi, D. Faggioli, K. Kostanteli, A. Cuevas, D. Lamp, S. Berger, M. Stein, T. Voith, L. Fuerst, D. Golbourn, and M. Muggeridge. IRMOS deliverable: D6.4.2 final version of real-time architecture of execution environment. Available on-line at: http://www.irmosproject.eu/Deliverables/Default.aspx, 1 2010.

[6] T. Cucinotta, D. Giani, D. Faggioli, and F. Checconi. Providing performance guarantees to virtual machines using real-time scheduling. In Proceedings of the 5th Workshop on Virtualization and High-Performance Cloud Computing (VHPC 2010), Ischia (Naples), Italy, August 2010.

[7] G. Dunlap. Scheduler development update. Xen Summit Asia 2009, Shanghai, 11 2009.

[8] D. Gupta, L. Cherkasova, R. Gardner, and A. Vahdat. Enforcing performance isolation across virtual machines in Xen. In Proc. ACM/IFIP/USENIX 2006 International Conference on Middleware, New York, NY, USA, 2006.

[9] G. Liao et al. Software techniques to improve virtualized I/O performance on multi-core systems. In Proc. ACM/IEEE ANCS 2008, New York, NY, USA, 2008.

[10] C. L. Liu and J. Layland. Scheduling algorithms for multiprogramming in a hard real-time environment. Journal of the ACM, 20(1), 1973.

[11] D. Ongaro, A. L. Cox, and S. Rixner. Scheduling I/O in virtual machine monitors. In Proc. ACM SIGPLAN/SIGOPS VEE '08, New York, NY, USA, 2008. ACM.


Evaluation of RT-Linux on different hardware platforms for the use


in industrial machinery control

Thomas Gusenleitner, DI(FH)


ENGEL Austria GmbH
Ludwig-Engel-Strasse 1, A-4311 Schwertberg
[email protected]

Gerhard Lettner, BSc


ENGEL Austria GmbH
Ludwig-Engel-Strasse 1, A-4311 Schwertberg
[email protected]

October 3, 2011

Abstract
Using Linux and Open Source in an industrial environment is becoming more and more common, in
part to ensure participation with daily improvements and compatibility with future development. One
of the most important requirements in the environment of industrial machinery control is realtime, so
we decided to evaluate RT-Linux on different hardware platforms. To generate a realistic load which is
comparable to the real machinery control a simplified version of machinery control, called testplc, was
developed and used in the hardware assessments conducted. The results of this evaluation should give a
clear statement about the applicability of each hardware platform for the machinery control area.
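To give a concrete picture of the kind of cyclic task the testplc described in Section 2.1 is built from, a minimal sketch of one such thread follows. It is a simplified, hypothetical example written for this text (names and structure are not the actual testplc source): the task wakes up on absolute deadlines via clock_nanosleep and records the difference between the scheduled and the actual wake-up time as latency.

    /* illustrative sketch only; build e.g.: gcc -O2 cyclic.c -lpthread */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <stdint.h>
    #include <time.h>
    #include <sched.h>
    #include <pthread.h>

    #define NSEC_PER_SEC 1000000000LL

    static void timespec_add_ns(struct timespec *t, int64_t ns)
    {
            t->tv_nsec += ns;
            while (t->tv_nsec >= NSEC_PER_SEC) {
                    t->tv_nsec -= NSEC_PER_SEC;
                    t->tv_sec++;
            }
    }

    /* wake up on absolute deadlines and record the wake-up latency */
    static void *cyclic_task(void *arg)
    {
            int64_t period_ns = *(int64_t *)arg;   /* e.g. 100 us, 500 us, ... */
            int64_t max_latency_ns = 0;
            struct timespec next, now;

            clock_gettime(CLOCK_MONOTONIC, &next);
            for (int i = 0; i < 100000; i++) {
                    timespec_add_ns(&next, period_ns);
                    clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
                    clock_gettime(CLOCK_MONOTONIC, &now);
                    int64_t latency_ns =
                            (now.tv_sec - next.tv_sec) * NSEC_PER_SEC +
                            (now.tv_nsec - next.tv_nsec);
                    if (latency_ns > max_latency_ns)
                            max_latency_ns = latency_ns;
                    /* the real test would fill a histogram here and then run
                     * its per-cycle workload (multiplications, sorting, UDP) */
            }
            printf("worst case latency: %lld ns\n", (long long)max_latency_ns);
            return NULL;
    }

    int main(void)
    {
            pthread_t tid;
            pthread_attr_t attr;
            struct sched_param sp = { .sched_priority = 80 }; /* like the 100 us task */
            static int64_t period_ns = 100000;                /* 100 us period */

            pthread_attr_init(&attr);
            pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
            pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
            pthread_attr_setschedparam(&attr, &sp);
            if (pthread_create(&tid, &attr, cyclic_task, &period_ns) != 0) {
                    /* creating a SCHED_FIFO thread usually requires root rights
                     * or an appropriate rtprio rlimit */
                    return 1;
            }
            pthread_join(tid, NULL);
            return 0;
    }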

1 Introduction 2 Method

To get a wide range of results we take some really dif-


ferent hardware platforms from old industrial over
modern energy saving up to high performance and
consumer electronics.
In this environment its impossible to take the
Repeatable cycles is the main goal of our machin-
real control application for evaluation on these dif-
ery control, so one of the preconditions is an exact
ferent hardware platforms, the used test control ap-
and reproduceable timing of the scheduler. The ma-
plication is explained in the next section.
chinery control in real is a bit too complex for this
evaluation so we decided to implement a little test The used operation system is based on debian
control application with a comparable core design. lenny, the kernel version is 2.6.33.91 which is patched
with the realtime patch rt312 . More details see chap-
To get a concrete appreciation of the interaction
ter 2.2.
between RT-Linux, CPU, chipset and board design
we evaluate the same application with the same RT- The aim of this evaluation work is to get more
Linux kernel on different hardware platforms. into details using RT-Linux for industrial machin-
1 http://www.kernel.org/pub/linux/kernel/v2.6/longterm/v2.6.33
2 http://www.kernel.org/pub/linux/kernel/projects/rt/


ery control. It’s very interesting to implement a This test control application gives us two histograms
small test control application designed towards the as result. The first one is a latency histogram of all
real control application. The concrete results give five threads within a range of 0 to 300 micro seconds.
us a better understanding which influences lead to a The second one is a timing histogram of the 500 dou-
scheduling malfunction and the properties of multi ble multiplications within a range of 0 to 20000 nano
core cpus are displayed in the latency histograms. seconds.
Indeed a machinery control application in real
life has to deal with locking, interrupts, priority in- 2.2 RT-Linux
heritance and other complicated stuff, but if this
simple test control application shows problems in To get compareable results its evident to use equal
scheduling and runtime a real machinery control ap- Kernel configurations for the evaluation on each
plication will never work. hardware platform. As already mentioned in chap-
ter 2 the used Kernel was patched using the required
RT-Patch which is available at www.kernel.org. To
2.1 Test Control Application ensure the real time behavour the kernel configura-
tion (see section 2.2.1) as well as the runtime con-
The test control application is implemented in pure figuration (see section 2.2.2) is important and was
C with posix threads. Five of these threads build adopted.
the core of this control application with different cy-
cle times and priorities to simulate the real machin-
ery control. The timing is controlled with absolute 2.2.1 Kernel configuration
timestamps starting with a global start time. In ev-
ery cycle the period time is added to the absolute The following items in the Kernel configuration were
time. Each thread greps the actual system time af- considered:
ter returning from clock_nanosleep before it calcu-
lates the difference between scheduled time and the Processor type and features
actual time. We call this difference latency and in- [ * ] High Resolution Timer Support
serted the value in a histogram. The parameters for [ * ] Symmetric multi-processing support
clock_nanosleep are clockId=CLOCK_MONOTONIC
Preemtion Mode
and flags=TIMER_ABSTIME. For security reasons we
introduce a latency threshold to detect a scheduling (*) Compl.Preemption (Real-Time)
fault. [ * ] Generix x86 support
Timer frequency
Every thread has a number of functions to ex-
ecute, for example calculate 100 double multiplica- (x) 1000 HZ
tions, 500 double multiplications, sorting lists, cal- Power management and ACPI options
culating pi and communicate 1024 bytes over udp
to a server. The udp communication is only done [ * ] Power Management Support
in the 10000 micro second cycle time task to avoid [ ] CPU Freqeuncy Scaling
network problems. This communication generates a Kernel Hacking
huge amount of interrupts to stress the scheduler.
[ * ] Tracers
For measurement we take the time of 500 double
[ * ] Scheduling Latency Tracer
multiplications in nano seconds which are also stored
in a histogram. The timing and priority configura- [ * ] Scheduling Latency Histogram
tions of these threads are [ * ] Missed Timer Offset Histogram

1. 100 us cycle time with priority 80 2.2.2 Runtime configuration

2. 500 us cycle time with priority 75 To ensure real time behavior during run-
time, the Real-Time group scheduling must
3. 1000 us cycle time with priority 70 be modified. Therefore the content of the
files /proc/sys/kernel/sched_rt_period_us and
4. 2000 us cycle time with priority 65 /proc/sys/kernel/sched_rt_runtime_us has
to be set equal. The standard content for
5. 10000 us cycle time with priority 60 sched_rt_period_us is 1000000 (1s) and for


sched_rt_runtime_us 950000 (0.95s). Using the


standard settings causes to give the non real time
tasks a chance in 5% of the CPU time, if a real
time task would lock the whole CPU. The described
modifaction has the effect that the real time tasks
get the whole CPU time and so can eventually lock
the system so that no more non real time task can
run. Further details can be found in the Kernel
documentation [3].

3 Results

In the following sections we list the results and de-


scriptions of the used test systems. The following his- Figure 1: Testsystem Z530 latency histogram
tograms show the latency and multiplication times
in micro seconds on the X axis and the number of
occurencies on the Y axis in a logarithmic scale.
The following figures (Figure 2, 4, 6, 8) with the
multiplication times show the variation of calculation
time in the different realtime tasks and illustrate the
influence of the thread priority.

3.1 Testsystem Z530

3.1.1 Description

The testsystem Z530 is based on a Kontron main-


board with an Intel(R) Atom(TM) CPU Z530 Figure 2: Testsystem Z530 multiplication time
clocked with 1.60GHz and two giga bytes of mem-
ory.
3.2 Testsystem T7500

3.2.1 Description
3.1.2 Result
The testsystem T7500 is based on a Kontron main-
The testplc takes about 80 % of the whole CPU per- board with an Intel(R) Core(TM)2 Duo CPU T7500
formance. In this case the latency times vary over clocked with 2.20GHz and three giga bytes of mem-
more than 100 micro seconds, the peaks look like ory.
serialized tasks in order to the configured priority.
Maybe one of the main reasons for this picture of
latency times in Figure 1 is the poor performance 3.2.2 Result
and the missing ability for parallel computing. The
first three, highest priority tasks do not exceed the The testplc takes about 15 % of the whole CPU per-
maximum worst case latency of 300 micro seconds, formance. In this case the latency times as shown in
the 2000 and 100000 micro second task exceed the Figure 3 do not vary much and concentrate within 10
worst case latency of 300 micro seconds very often, micro seconds. In this plot we see the ability of par-
as shown in the histogram on the right peaks where allel computing, the two fastest tasks set their cpu
the overruns are cumulated. affinity so that each of them use one core.


Figure 3: Testsystem T7500 latency histogram Figure 5: Testsystem D525 latency histogram

Figure 4: Testsystem T7500 multipliaction time Figure 6: Testsystem D525 multipliaction time

3.4 Testsystem CP255


3.3 Testsystem D525
3.4.1 Description
3.3.1 Description
The testsystem CP255 is based on a KEBA main-
The testsystem D525 is based on a Gigabyte GA- board with an Intel(R) Pentium(R) M processor
D525TUD mainboard with an Intel(R) Atom(TM) clocked with 1.40GHz and one giga bytes of mem-
CPU D525 clocked with 1.80GHz and two giga bytes ory.
of memory.
3.4.2 Result

3.3.2 Result The testplc takes about 25 % of the whole CPU per-
formance. In this case the latency times in Figure 7
The testplc takes about 32 % of the whole CPU per- of the three highest priority tasks do not vary much
formance. In this case of Figure 5 the latency times and concentrate within 30 micro seconds. The lower
cover a range up to 200 micro seconds. Only a view the priority the higher the variation in the latency
peaks are detected which exceed the 300 micro sec- time, but even the lowest priority tasks latency so
onds. far stays below 150 micro seconds.



Figure 7: CP255 latency histogram

Figure 8: CP255 multiplication time

The results in Figure 7 were produced using a
vendor-specific rt-kernel based on version 2.6.33.9.

3.5 Testsystem OMAP4

3.5.1 Description

The testsystem OMAP4 is based on an OMAP4
Panda board with an ARMv7 Processor rev 2 (v7l)
clocked at 1.00 GHz and one gigabyte of memory.

3.5.2 Result

The testplc takes about 75 % of the whole CPU per-
formance. In Figure 9 the latency times of the three
highest-priority tasks vary strongly and reach 300
microseconds, as do the lower-priority tasks. This
behaviour shows that the real-time capability is not
given for this testsystem.

Figure 9: OMAP4 latency histogram

The result in Figure 9 must be accepted with
reservation, as the version of the real-time patch and
the kernel configuration of the used operating system
was unknown.

4 Conclusion

The results of the recorded histograms show the dif-
ferent real-time behaviours of the evaluated hard-
ware platforms. For the Z530 and OMAP testsys-
tems the real-time capability cannot be confirmed,
as the latency time shows a variation of more than
300 microseconds. In contrast to the testsystem
Z530, the histograms from the testsystem T7500
show a very small range of latency variation of only
40 microseconds for the tasks with the lowest pri-
ority. Although the testsystem CP255 is optimized
for real time, the results are not as good as the val-
ues from testsystem T7500. We assume the CPU
performance is the reason for this behaviour. The
four higher-priority tasks show only a variation of
15 microseconds. As the implementation considers
the requirements of a real machine control and the
evaluation gives a good overview of the real-time
capability, we consider the acquired evaluation
method as an extension to the OSADL QA farm [2]
evaluation.


References

[1] Open Source Automation Development Lab, www.osadl.org

[2] OSADL QA Farm Realtime,
    www.osadl.org/QA-Farm-Realtime.qa-farm-about.0.html

[3] Kernel.org: Real-Time group scheduling,
    http://www.kernel.org/doc/Documentation/scheduler/sched-rt-group.txt


openPOWERLINK in Linux Userspace: Implementation and


Performance Evaluation of the Real-Time Ethernet Protocol Stack
in Linux Userspace

Wolfgang Wallner
[email protected]

Josef Baumgartner
[email protected]

Bernecker + Rainer Industrie-Elektronik Ges.m.b.H


B & R Strasse 1, 5142 Eggelsberg, Austria

Abstract
The RT-Preempt patch for Linux turns the Linux operating system into a real-time operating system
and therefore is an ideal platform to implement a real-time Ethernet protocol stack like openPOWERLINK.
The initial implementation of the openPOWERLINK stack on X86 Linux was developed as a kernel mod-
ule. The solution completely bypasses the Linux network stack and achieves maximum performance
through the usage of its own network interface drivers. However, this limits the protocol stack to the
few openPOWERLINK network interface drivers currently available and also makes the protocol very
dependent on the used kernel version. To circumvent these drawbacks, the whole protocol stack was
implemented as Linux user space library. As most of the necessary real-time features are also available in
user space and many applications do not need the performance level of the kernel space implementation,
this solution is adequate for a lot of applications.
This paper describes the porting of the openPOWERLINK stack to user space and examines the
performance of the user space implementation. Therefore, the influence of the user space implementation
on the network jitter and on the generated system load is analyzed and compared with the kernel space
implementation. Due to the long term goal to integrate the lower level layers of the openPOWERLINK
stack into the mainline Linux kernel, in this paper it is furthermore discussed how the protocol stack
could be segmented into a kernel part that would be integrated into the Linux kernel and a user part that
is provided as a user space library.

1 Ethernet POWERLINK

1.1 Overview

POWERLINK is a strict, deterministic hard real-
time protocol based on Fast Ethernet (100 MBit).
It supports time-isochronous data transfer along
with asynchronous communication between network
nodes. POWERLINK was originally developed by
B&R in 2001. In 2002 the protocol was opened to the
public and the Ethernet POWERLINK Standardiza-
tion Group (EPSG) was founded. The EPSG is an
organization which drives the further development
of the protocol. Ethernet POWERLINK is a patent-
free and open standard, and the specification [1] is
freely available on the EPSG website. As Fast
Ethernet is used unmodified as the physical layer,
no proprietary hardware (e.g. ASICs) is needed for
an implementation. The POWERLINK network can
be built with standard Ethernet hardware. Network
switches could be used, but should be avoided, as
they have no upper bound for the forwarding delay
of frames. Instead of switches, hubs are preferred,
because they provide lower latencies and more deter-
ministic behaviour. As all nodes in a POWERLINK


network have to support the timing rules, standard sion of the Start of Cyclic (SoC) frame by the MN.
Ethernet devices may not be connected directly to a The SoC frame is sent as a multicast and can be
POWERLINK domain. received and processed by all other POWERLINK
stations in the network. No application data is trans-
ported in the SoC, it is only used for synchronization.
1.2 POWERLINK Layer Model
Immediately after transmitting the SoC, the MN
Figure 1 shows the generic Ethernet POWERLINK addresses each CN in the network with a Poll Re-
layer model. The POWERLINK protocol is lo- quest frame(PReq). Each CN responds with a Poll
cated at OSI layer 2 and 7 (Data Link Layer (DLL) Response (PRes). This frame is sent as multicast and
and application layer). The characteristic timing can therefore be received by the MN as well as by all
that is used to circumvent non-real-time attributes other CNs in the network. Therefore, the PRes can
of standard Ethernet (mainly CSMA/CD) belongs not only send input data from the CN to the MN, but
to the DLL. The POWERLINK specification de- also allows cross-communication among the CNs. Di-
fines that the CANopen interface is used as appli- rect cross-communication allows the times for data
cation layer. The usage of CANopen as application exchange between stations to be reduced consider-
layer makes it easy to integrate classical CANopen ably, since the data need not be copied in the MN.
applications to POWERLINK. The CANopen con- A CN only transmits when it receives a directly
cepts of Device Profiles, the Object Dictionary, Ser- addressed request (PReq) from the MN. The MN
vice Data Objects (SDOs), Process Data Objects waits for the response from the CN. This prevents
(PDOs) and Network Management (NMT) are all collisions on the network and enables deterministic
reused in POWERLINK. This is the reason why timing.
POWERLINK is often referred to as ”CANopen over
Ethernet”. A fixed time is reserved in the network cycle for
asynchronous data. Asynchronous data differs from
cyclic data in that it need not be configured in ad-
vance. Asynchronous data is generated on-demand
by a POWERLINK station. Examples are visualiza-
tion data, diagnostic data, etc. One asynchronous
frame can be sent per POWERLINK cycle. The
CNs can signal the MN in the poll response frame
that they would like to send asynchronous data. The
MN determines which station is allowed to send, and
shares this information in the Start of Asynchronous
(SoA) frame. Any Ethernet frame can be sent as
an asynchronous frame (ARP, IP, etc.). However, a
maximum length (MTU = Maximum Transfer Unit)
must not be exceeded.
The most important timing characteristic in an
Ethernet POWERLINK network is the cycle time,
which is measured between the start of two consec-
FIGURE 1: Overview of the utive SoC frames. The worst case jitter of the cy-
POWERLINK Protocol Layers. cle time is a quality attribute of the MN. A typical
POWERLINK communication cycle is shown in Fig-
1.3 Communication Principle (Data ure 2.
Link Layer)

A POWERLINK device can be a Managing Node


(MN) or a Controlled Node (CN). A POWERLINK
network has exactly one active MN (active redun-
dant MNs are possible to increase fault tolerance).
The MN regulates the activity on the network. All
other active devices in the network are CNs.
Communication in POWERLINK networks hap-
pens in cycles. Each cycle starts with the transmis-


FIGURE 2: Schematic showing a typical • Multi-Tasking: The openPOWERLINK


POWERLINK cycle. Notice how the cycle stack requires some kind of concurrent execu-
time is measured from one SoC frame to the tion for its modules. On bare-metal devices,
next. this can be done using IRQs, on hosted plat-
forms this is usually implemented using the
platform specific thread API.
2 openPOWERLINK
• Shared Buffer: These are message queues,
2.1 Overview which are internally used by the CAL to con-
nect the kernel and user part of the stack.
openPOWERLINK is an open source implementa- • Ethernet Driver: In order for the DLL to be
tion of the POWERLINK technology. It was orig- platform independent, an interface has defined
inally developed by SYS TEC electronic GmbH [9] to access the network. Each platform needs an
and later released under the BSD license in 2006. implementation of an Ethernet Driver (Edrv)
The openPOWERLINK project is hosted on the module to access the platform specific network
SourceForge website [10]. interface that uses this interface.
A main design goal of openPOWERLINK was • Low Resolution Timers (LRTs): Some
portability. Current implementations include Linux, parts of the stack need to watch timeouts in
Windows, VxWorks, bare-metal devices and more. the range of milliseconds (i.e. SDO transfer
timeout). These timeouts are not critical for
real time, and the timers used for these pur-
2.2 Software Architecture pose are referred to as Low Resolution Timers.
The software architecture of openPOWERLINK is • High Resolution Timers (HRTs): The
very similar to the generic POWERLINK architec- cyclic transmission of frames is controlled by
ture as previously shown in Figure 1. A remarkable the HRTs. These timers need to handle time-
exception is the strict partitioning in two parts: outs in the micro second range with a de-
sired precision of a few nano seconds. To
generate precise isochronous SoC frames, a
• Kernel part: The DLL and all layers below,
POWERLINK MN implementation needs a
like Ethernet driver or High Resolution Timers
very accurate system timer and low interrupt
(HRTs), are contained in what is called the ker-
latencies.
nel part. This part contains the time critical
modules of POWERLINK.
While the first three points are usually straight-
• User part: The CANopen specific modules
forward, the last point poses a challenge on many
(Object Dictionary, PDO, SDO, . . . ) are
platforms.
grouped in this part.

These two parts exchange information through 3 Userspace Implementation


the Communication Abstraction Layer (CAL). The
notations kernel part and user part are currently only
naming conventions. In the current implementations 3.1 Motivation
these two parts are always located in the same mem-
ory space. However, this is one of the preparations On the Linux platform, previously the only imple-
for future implementations where this two parts are mentation of the openPOWERLINK stack was com-
actually split apart. pletely in kernel space, having only the application
code in user space. This implementation is charac-
terized by the following properties:
2.3 Porting openPOWERLINK
+ Provides high performance and precision
To increase portability, platform dependent code is
concentrated in a few isolated places. The porting – Requires special Ethernet drivers
process of the openPOWERLINK stack to a new – Maintenance burden (not mainline)
platform typically consists of the adaption or reim-
plementation of the following modules: – Hard to debug


The performance and precision reached by this


implementation are satisfying (cycle times down to
250µs, jitter in the two-digit microsecond range).
However, there are disadvantages: As it needs
special device drivers, a new device driver has to
be written for every additionally supported Ethernet
chip. As these drivers are not part of the mainline
Linux kernel, this will increase the amount of main-
tenance needed to keep them functional. Addition-
ally, this implementation is not suitable for general
purpose debugging of the openPOWERLINK stack
(kernel space debugging is more difficult). This lead
to the idea of porting the stack completely to user
space. The result of the porting efforts should have
the following advantages:

+ Support for all Ethernet chips by using some


kind of standard network interface FIGURE 3: Overview of different
openPOWERLINK stack implementations:
+ Less maintenance effort (stable interfaces in (a) Stack in kernel space (b) Stack in user
user space) space using pcap (c) Stack in user space using
kernel driver (experimental).

+ Easier to debug than kernel space implementa-


tion 3.3 Porting to user space

+ Still enough performance for many production Section 2.3 sketched to general porting procedure,
applications this section will describe the design decisions that
were made for the Linux user space implementation.

+ Possible first step to a later kernel space/user


space hybrid solution (outlined in Section 5) 3.3.1 Multi-Tasking

The user space implementation is based on the


pthread library, which is used to provide concurrent
3.2 Linux platform overview execution of different openPOWERLINK modules.

Figure 3 shows the general architecture of differ-


ent openPOWERLINK stack implementations on 3.3.2 Shared Buffer
the Linux platform. The first architecture shows
the complete openPOWERLINK stack implemented The shared buffers in user space were implemented
in Linux kernel space. This implementation is de- using plain malloc. This is possible because all parts
scribed in detail in [2]. A long term performance of the stack not only reside in the same memory
and stability test of this implementation is run in space, but also as threads inside the same process.
the OSADL Realtime QA Farm[7]. The second ar- POSIX semaphores and mutexes were used for syn-
chitecture shows the current port to user space that chronization between the different threads.
is based on the pcap library, which is in the focus of
this paper. The architecture on the right shows an
implementation of the user space stack which uses 3.3.3 Ethernet Interface
the openPOWERLINK kernel space drivers. This
implementation was developed to examine the influ- To provide access to the Ethernet interface from user
ence of the libPCAP interface on the performance space two possible implementations could be used,
and determinism of the system. It is shown for com- either libpcap or raw sockets. Because PCAP based
parison, but will not be further described in the rest openPOWERLINK Edrv modules are already avail-
of this paper. able for Windows XP and Windows CE, it was also


chosen as the basis for the Linux platform. As libP- The Ethernet POWERLINK network in the fig-
CAP uses RAW sockets on Linux there is nearly no ure is highlighted in orange, other connections that
performance difference. are shown in white indicate standard non-real-time
Ethernet. As measuring network times using tools
like Wireshark on a standard desktop PC suffers
3.3.4 Timers from larger jitter in the timestamps of individual
frames, a B&R Network Analyzer X20ET8819 was
Implementations for both LRTs as well as HRTs used. The B&R Network Analyzer is equipped with
use the POSIX timer API. Using POSIX timers for two network ports, one for POWERLINK and one
the needed HRTs is possible because of the high- for standard Ethernet. It is able to capture frames
resolution timers that ware introduced by Thomas on the POWERLINK network and timestamp them
Gleixner and Ingo Molnar as part of the Linux kernel with a 20ns resolution. It packs the timestamped
since 2.6.16. The new timer system does no longer POWERLINK frames into UDP packets and sends
depend on the periodic tick of the operating system them onto the Ethernet interface for further analy-
and allows nanoseconds resolution. However, the res- sis. The PC that was used to collect the captured
olution depends on the available timer hardware of POWERLINK frames and later run statistical ana-
the system. On an Intel X86 architecture there are lyzes and create test protocol is shown in the upper
different clock sources available (hpet, tsc, acpi pm) right corner. To generate network stress, another ex-
which provide a usable timer resolution in the mi- ternal device was needed. For this purpose, another
crosecond range. These high-resolution timers can Linux PC was used, which is shown in the upper left
be used to increase the precision of POSIX user space corner.
timers, which is exactly what we needed.
A detailed overview of the new architecture is
given in the paper Hrtimers and Beyond: Transform-
ing the Linux Time Subsystems[3].
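
As an illustration of the mechanism described above (a simplified sketch,
not the openPOWERLINK HRT module itself), a periodic high-resolution POSIX
timer on CLOCK_MONOTONIC can be set up as follows; the 250 µs interval is
only an example value.

/* Sketch of a periodic high-resolution POSIX timer.  The callback runs in
 * its own thread and could, for instance, trigger the next cyclic frame. */
#include <signal.h>
#include <time.h>
#include <unistd.h>

static void cycle_cb(union sigval sv)
{
    (void)sv;
    /* cyclic work goes here */
}

int main(void)
{
    timer_t t;
    struct sigevent sev = { 0 };
    struct itimerspec its = { 0 };

    sev.sigev_notify = SIGEV_THREAD;       /* run callback in its own thread */
    sev.sigev_notify_function = cycle_cb;
    timer_create(CLOCK_MONOTONIC, &sev, &t);

    its.it_value.tv_nsec    = 250 * 1000;  /* first expiry after 250 us */
    its.it_interval.tv_nsec = 250 * 1000;  /* then every 250 us */
    timer_settime(t, 0, &its, NULL);

    pause();                               /* keep the process alive */
    return 0;
}

The achievable precision of such a timer depends on the clock source and on
the interrupt latency of the system, which is why the RT-Preempt kernel and
careful thread priorities are used in the measurements below.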

4 Performance Evaluation

4.1 Test description

The different implementations of the openPOWERLINK


stack on Linux were configured as MN and used to
control a network of up to 30 CNs. While the timing
of frames that were sent by the MN were monitored
by an external PC, different load scenarios were run
on the MN to analyze there influence. To simulate a
real world application, each CN was equipped with
digital I/O modules. The control application on the FIGURE 4: Schematic showing the test
MN modified the outputs based on the input values setup. The number of actually connected CNs
in every cycle. varies depending on the test case (3 to 30 CNs,
partitioned as 3 daisy chains).
4.1.1 Hardware wiring
4.1.2 Node configuration
Figure 4 shows the general hardware setup that was
used for the performance and precision tests. The Managing Node (MN)
most interesting node in the test setup is the MN,
The hardware platform used for the MN was a B&R
which is shown in blue in the drawing. The MN
AutomationPC810[5] (APC810). Besides being de-
is the Linux PC that was used to compare the dif-
signed as mechanically robust for harsh environ-
ferent openPOWERLINK implementations. A vari-
ments, it is not different from a standard X86 desk-
able amount of Ethernet POWERLINK CNs was
top platform.
connected to the MN using a standard Ethernet
hub. Up to 30 CNs were used, partitioned as 3 The APC810 used in our tests was equipped with
daisy chains, each consisting of up to 10 nodes. a Intel Core2Duo U7500 dual core processor running


at 1.06 GHz, 1 GByte DDR2 PC2-5300 DRAM and related threads has been increased. It is impor-
a 40GB hard disk drive. The Intel 945GME chipset tant that the real-time related openPOWERLINK
contains the Graphics Media Accelerator GMA 950. threads all have a higher priority than the other sys-
The APC is equipped with two on-board network in- tem threads. The internal priority relation between
terfaces, which use different Ethernet chips. One of the different openPOWERLINK threads is based on
these interfaces is based on the Intel 82573L, while the stack architecture (timer threads higher than
the other uses a Realtek 8111B. A third interface was network threads, thread of kernel-to-user shared
added as a PCI card, based on the Realtek 8139 chip. buffer higher than user-to-kernel shared buffer, . . . ).
The Low-res timer threads have the same priority as
These connections were used as follows:
the system SIRQs, as they are not critical to the real-
time behaviour. The startup thread has a very low
• Intel 82573L: Used as POWERLINK inter- priority, because it is mainly used for initialization,
face. This interface was configured to have no but has nothing to do during cyclic operation.
IP address, to avoid interference between the
POWERLINK and the Linux network stack.
Controlled Nodes (CNs)
• Realtek 8111B: Connected to the corporate
network, used for TCP/IP communication. A network of standard B&R POWERLINK bus cou-
• Realtek 8139: Directly connected to the PC plers (X20BC0083[6]) was used as CNs. As the ap-
that serves as flood ping generator (used for plication on the MN should simulate a real world
network stress test). implementation, these CNs were equipped with in-
put and output modules and exchanged new data in
The operating system used on the MN was an every cycle. These CNs were addressed as standard
Ubuntu 10.04 LTS (Lucid Lynx). We used the lat- CANopen DS401: Generic I/O modules.
est stable real time kernel 2.6.33.7-rt30 as listed on
the OSADL webpage[8]. The thread priorities were Network Analyzer
adjusted to the following values (demo pi console is
the name of the used demo application): To generate high precision time stamps for the
observed POWERLINK frames, a B&R network
analyzer (X20ET8819) was used. This device is
Thread Priority Description equipped with two Ethernet ports. One of these in-
sirq-hrtimer/0 -81 High-res timer terfaces is used as a pure POWERLINK input port
sirq-hrtimer/1 -81 High-res timer to analyze the received frames. It latches the time
demo pi console -76 High-res timer of reception with a precision of 20 ns. This informa-
irq/29-eth1 -71 Network interface tion is packed in UDP packets and sent out on the
sirq-net-rx/0 -61 Network handling second Ethernet port. On the PC this information
sirq-net-rx/1 -61 Network handling can be received and further processed, i.e. to create
demo pi console -56 Shared Buffer K→U high precision Wireshark traces. In our case these
demo pi console -51 Shared Buffer U→K measurements were evaluated in our test program to
demo pi console -51 Edrv (PCAP) measure the SoC jitter.
demo pi console -50 Low-res timer
demo pi console -21 Startup thread
Flood ping generator
TABLE 1: Thread priorities used by the A standard desktop PC running Linux was used to
user space implementation. create high amounts of network IRQs on the MN by
sending flood pings.
The priorites of the system threads were in-
creased using the tool chrt before the POWERLINK
Measurement PC
application was started. To adjust the priorities
of the stack internal threads, the the API call Another standard desktop PC running Linux that
sched priority was used during run time. was used to dump the timing measurements sent by
As stated earlier in section 2.3, the interrupt la- the network analyzer, do statistical calculations on
tencies for timer IRQs need to be as low as possible them using GNU R, and create the test reports with
to increase precision. This is the reason why the LATEX.
timer related threads are set to the highest priori-
ties. For the same reason, the priorities of network


4.1.3 Load scenario

• Idle: The first measurement was done on an
idle system as a reference for the different stress
tests.

• CPU load: For the CPU stress test, the tool
cpuburn was used [11]. It is designed to load
X86 CPUs as heavily as possible for the pur-
poses of system testing.

• Hard Disk I/O load: The tool dd was used
to read and write large amounts of data from
and to the hard disk drive.

• USB I/O load: As for the hard disk, dd was
used on an USB drive to produce USB I/O
load.

• Network load: Heavy network stress was
caused by an external flood ping on the first
Ethernet interface.

• Scheduling load: Heavy process scheduling
load was caused by hackbench [12]. It spawns
over a hundred processes which are communi-
cating by sending signals to each other.

• Miscellaneous load: To cause miscellaneous
system load, a Linux kernel compilation was
started.

Cycle time   CNs   User [%]   Kernel [%]
10 ms          3         6            1
10 ms         10         7            2
10 ms         20        11            2
10 ms         30        15            3
 5 ms          3         8            2
 5 ms         10        15            2
 5 ms         20        23            5
 5 ms         30        32            8
 2 ms          3        22            7
 2 ms         10        36           10
 2 ms         20       N/A           16
 2 ms         30       N/A           20

TABLE 2: Comparison of the CPU load on
different network configurations.

As can be seen from these numbers, the CPU
load to drive the same network configuration may
increase by a factor of 4-5. With the current user
space implementation, it was not possible to handle
more than 15 CNs with a cycle time of 2 ms. The
reason for this limitation is currently unknown and
must be further examined.

4.2 Results

Precision

For a comparison between the measured jitter values FIGURE 6: CPU load of the pcap based
of the kernel space and the user space implemen- POWERLINK stack in different configura-
tation, see Figure 5. The influence of the different tions.
load scenarios is very similar for both the user space
and kernel space implementation. Notice however
the different scale: in the range of 100 µs for user 5 Conclusion and Future Work
space and in the range of 40 µs for kernel space.
High scheduling load has the greatest impact on the The measured values of performance and precision of
network latencies on both implementations. the user space implementation are inferior to the ker-
nel space variant, which was expected. While high
performance application still need to be served by the
Performance kernel space implementation, the experiments have
shown that the user space variant can be used for
The measured CPU load of the user space imple- many applications with lower requirements. A no-
mentation on different configurations is visualized in ticeable benefit of the user space implementation is
Figure 6. The kernel space and user space imple- the portability. Through the use of the pcap library
mentation are compared in the following table (CPU it can be used on any Ethernet chip that is supported
load is given in percent of a single CPU core): by the mainline Linux kernel. In combination with


FIGURE 5: Boxplots showing the measured jitters values during different load scenarios. (a)
shows the jitter values of the kernel based implementation, while (b) shows the results of the user
space version using pcap.

the RT-Preempt patch this implementation can be


used to turn any standard X86 Linux box into a
master for real time industrial networking based on
Ethernet POWERLINK.
A possible and preferred future stack architec-
ture is shown in figure 7. We would like to submit
the time critical parts (mostly the Data Link Layer
(DLL)) directly into the mainline kernel. The non
time critical parts (mainly CANopen) could be im-
plemented and distributed as a user space library.
This setup would greatly reduce the amount of time
and effort needed to turn a standard Linux installa-
tion into a hard real-time network master while still
providing high performance.
The openPOWERLINK stack has already been
prepared for the use as a kernel space/user space
hybrid solution and many parts of the needed in-
frastructure are already in place. However, be-
fore we can finally split the two parts, more work
FIGURE 7: Architecture of the preferred
needs to be done. There is ongoing effort in the
future implementation: Time critical parts in-
openPOWERLINK community to realize the de-
cluded in the mainline Linux kernel (using
scribed architecture.
standard network drivers), while higher lay-
Additionally some enhancements in the Linux ers are implemented and distributed as a user
network stack architecture are needed for a high- space library.
performance POWERLINK stack. It is necessary to
be able to capture and insert Ethernet packets with
low latencies. These topics were already outlined by
Thomas Gleixner in the document Powerlink - Linux
kernel support[4]. As these functions are of general
interest there is a good chance that they will be im-
plemented in the mainline kernel.


References

[1] EPSG Draft Standard 301, Ethernet POWERLINK, Communication Profile
    Specification, 2008, Ethernet POWERLINK Standardization Group, V 1.1.0

[2] POWERLINK and Real-Time Linux: A Perfect Match for Highest Performance
    in Real Applications, October 2010, Josef Baumgartner, Stefan Schoenegger,
    Bernecker + Rainer Industrie-Elektronik Ges.m.b.H, Austria

[3] Hrtimers and Beyond: Transforming the Linux Time Subsystems, July 2006,
    Thomas Gleixner, Douglas Niehaus

[4] Powerlink - Linux kernel support, May 2010, Thomas Gleixner

[5] APC 810 User's Manual, Version 1.20, October 2009, Bernecker + Rainer
    Industrie-Elektronik Ges.m.b.H, Austria

[6] X20 System User's Manual, Version 2.10, 9.6 BC0083, Bernecker + Rainer
    Industrie-Elektronik Ges.m.b.H, Austria

[7] OSADL Realtime QA Farm,
    https://www.osadl.org/Real-time-Ethernet-Powerlink-jitter-an.qa-farm-rt-powerlink-jitter.0.html

[8] OSADL Project: Realtime Linux,
    https://www.osadl.org/Realtime-Linux.projects-realtime-linux.0.html

[9] SYS TEC electronic GmbH, http://www.systec-electronic.com

[10] openPOWERLINK Protocol Stack Source, http://openpowerlink.sourceforge.net/

[11] cpuburn homepage, http://pages.sbcglobal.net/redelm/, Robert Redelmeier

[12] hackbench homepage, http://devresources.linux-foundation.org/craiger/hackbench


Timing Analysis of a Linux-Based CAN-to-CAN Gateway

Michal Sojka1 , Pavel Pı́ša1 , Ondřej Špinka1 , Oliver Hartkopp2 , Zdeněk Hanzálek1
1
Czech Technical University in Prague
Technická 2, 121 35 Praha 6, Czech Republic
{sojkam1,pisa,spinkao,hanzalek}@fel.cvut.cz
2
Volkswagen Group Research
Brieffach 1777, 38436 Wolfsburg, Germany
[email protected]

Abstract
In this paper, we thoroughly analyze timing properties of CAN-to-CAN gateway built with Linux
kernel CAN subsystem. The latencies induced by this gateway are evaluated under many combinations
of conditions, such as when traffic filtering is used, when the gateway is configured to modify the routed
frames, when various types of load are imposed on the gateway or when the gateway is run on different
kernels (both rt-preempt and vanilla are included). From the detailed results, we derive the general
characteristics of the gateway. Some of the results apply not only for the special case of CAN-to-CAN
routing, but also for the whole Linux networking subsystem because many mechanisms in the Linux
networking stack are shared by all protocols.
The overall conclusion of our analysis is that the gateway is in pretty good shape and our results were
used to support merging the gateway into Linux mainline.

1 Introduction vices (or complete subsystems) that are not compat-


ible with each other. They may use different proto-
cols or simply use fixed CAN IDs, that collide with
Controller Area Network (CAN) is still by far the other devices. Then, it is necessary to separate those
most widespread networking standard used in the au- devices using a gateway that ensures that only the
tomotive industry today, even in the most recent ve- traffic needed for communication with the rest of the
hicle designs. Although there are more modern solu- system passes the gateway, perhaps after being mod-
tions available on the market [1, 2] (such as FlexRay ified to not collide with the rest of the system.
or various industrial Ethernet standards), CAN rep- For this reason a widely configurable CAN gate-
resents a reliable, cheap, proven and well-known net- way has been implemented inside the Linux kernel,
work. Thanks to its non-destructive and strictly de- which is based on the existing Linux CAN subsystem
terministic medium arbitration, CAN also exhibits (PF CAN/SocketCAN [3]) and can be configured via
very predictable behavior, making it ideally suited the commonly used netlink configuration interface.
for real-time distributed systems. Because of these This gateway is designed for CAN-to-CAN routing
indisputable qualities, it is unlikely that the CAN is and allows frame1 filtering and manipulation of the
going to be phased out in foreseeable future. routed frame content. Obviously, such a gateway
The maturity of CAN technology means that must satisfy very strict real-time requirements, espe-
there exist a huge number of CAN compatible de- cially if it connects critical control systems. There-
vices on the market. It is therefore quite easy to fore, the gateway had to undergo a set of comprehen-
build prototypes of complex systems by just connect- sive tests, focused on measuring latencies intruded by
ing off-the-shelf devices and configuring them to do the gateway under various conditions.
what it desired. Sometimes, however, there are de-
1 CAN uses the term “frame” for what other networks call packet or message.


The results of this testing are reported in this pa- The software configuration is kept as simple as
per. The complete data set, consisting of gigabytes possible in order to make the results not disturbed by
of data and more than one thousand graphs, as well unrelated activities. The gateway runs only a Linux
as the source codes of our testing tools, are available kernel, a Dropbear SSH server and, obviously, the
for download in our public repositories [4, 5]. This gateway itself. On the PC, a stripped-down Debian
allows other people interested in this topic to inde- distribution is used. The tasks that generate the test
pendently review our results and methods, as well traffic and measure the gateway latency are assigned
as to use them as a base for their own experiments. the highest real-time priority and their memory is
Our methods and results are relevant not only for the locked in order to prevent page-faults. SocketCAN
special case of CAN-to-CAN routing but, since Linux was used on both the gateway and the PC as the
networking subsystem forms the core of many other CAN driver.
protocols, also for other networks including Ether-
net, Bluetooth, Zigbee etc.
The paper is organized as follows: the next sec-
tion describes the setup of our testbed and how we 2.1 Measurement Methodology
measured the gateway latencies. Section 3 summa-
rizes the main results found during our testing. We To measure the gateway latency, we generate CAN
give our conclusion in Section 4. traffic in the PC and send it out from can0 inter-
face. As can be seen in Figure 1, this interface is
directly wired to the can1 interface of the PC as
2 Testbed Setup well as to one interface of the gateway. The can1
interface is used to receive the frames to determine
The testbed, used for gateway latency measure- the exact time when each frame actually appears on
ments, is depicted in Figure 1 and consists of a stan- the bus2 . This is necessary in order to exclude vari-
dard PC and the gateway. The PC is Pentium 4 ous delays such as queuing time in the can0 transmit
running at 2.4 GHz with 2 GB RAM, equipped with queue. When a frame is received on can1 interface,
Kvaser PCI quad-CAN SJA1000-based adapter. The it is timestamped by the driver in its interrupt han-
gateway is an embedded board based on MPC5200B dler. These timestamps are sufficiently precise for
(PowerPC) microcontroller running at 400 MHz. our measurements.
There are two CAN buses that connects the PC with The frames routed through the gateway are re-
the gateway. The PC generates the CAN traffic on ceived on can2 interface of the PC. Again, these
one bus and looks at the traffic routed via the gate- frames are timestamped the same way as was de-
way on the other bus. The gateway is also connected scribed in the previous paragraph. The total latency
to the PC via Ethernet (using a dedicated adapter is then calculated by simply subtracting the times-
in the PC). This connection serves for booting the tamps measured on the can2 and can1 interfaces (see
gateway via TFTP and NFS protocols, for configur- Figure 2). It is worth noting that both timestamps
ing it via SSH, and also to generate Ethernet load to are obtained using the same clock (in our case time-
see how it influences the gateway latencies. stamp counter register of the PC’s CPU), which en-
sures that the results are not influenced by the offset
of non-synchronized clocks.
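
For illustration, the following minimal sketch (not the authors' actual
measurement tool) shows how a frame received on a SocketCAN interface can be
read together with the reception timestamp recorded by the kernel; the
interface name can1 is used here only as an example, and error handling is
omitted for brevity.

/* Minimal SocketCAN receive with kernel RX timestamp. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <net/if.h>
#include <linux/sockios.h>
#include <linux/can.h>
#include <linux/can/raw.h>

int main(void)
{
    int s = socket(PF_CAN, SOCK_RAW, CAN_RAW);
    struct ifreq ifr;
    struct sockaddr_can addr = { 0 };
    struct can_frame frame;
    struct timeval tv;

    strcpy(ifr.ifr_name, "can1");
    ioctl(s, SIOCGIFINDEX, &ifr);            /* resolve interface index */
    addr.can_family = AF_CAN;
    addr.can_ifindex = ifr.ifr_ifindex;
    bind(s, (struct sockaddr *)&addr, sizeof(addr));

    read(s, &frame, sizeof(frame));          /* blocking receive */
    ioctl(s, SIOCGSTAMP, &tv);               /* RX timestamp of last frame */
    printf("ID 0x%X received at %ld.%06ld\n",
           frame.can_id, (long)tv.tv_sec, (long)tv.tv_usec);
    return 0;
}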
To calculate the latency, we need to determine
which received frame corresponds to which transmit-
ted one, and this mechanism must be able to cope
with possible frame losses or frame modifications in
the gateway. For this purpose, the first two bytes of
the data payload are used to store a unique number
that is never modified by the gateway. This number
serves as an index to a lookup table, which stores the
timestamps relevant to the particular frame. This
allows for easy detection of frame losses: when the
corresponding entry in the lookup table still contains
only one timestamp after a certain timeout, the frame
is considered lost.
FIGURE 1: Testbed configuration. to 1 s by default, the frame is considered lost.
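
The lookup-table mechanism described above can be sketched as follows (our
simplified illustration, not the code used for the experiments); the 16-bit
sequence number stored in the first two payload bytes selects the table slot.

/* Matching transmitted and routed frames via a sequence number carried in
 * the first two payload bytes. */
#include <stdint.h>
#include <string.h>
#include <sys/time.h>
#include <linux/can.h>

struct rec {
    struct timeval tx_stamp;   /* timestamp from can1 (frame on the bus) */
    struct timeval rx_stamp;   /* timestamp from can2 (after the gateway) */
    int have_tx, have_rx;
};

static struct rec table[65536];          /* one slot per 16-bit sequence number */

static uint16_t seq_of(const struct can_frame *f)
{
    uint16_t seq;
    memcpy(&seq, f->data, sizeof(seq));  /* bytes 0..1 hold the sequence number */
    return seq;
}

void record_tx(const struct can_frame *f, struct timeval stamp)
{
    struct rec *r = &table[seq_of(f)];
    r->tx_stamp = stamp;
    r->have_tx = 1;
}

void record_rx(const struct can_frame *f, struct timeval stamp)
{
    struct rec *r = &table[seq_of(f)];
    r->rx_stamp = stamp;
    r->have_rx = 1;   /* latency = rx_stamp - tx_stamp minus TX duration */
}

An entry that keeps only its transmit timestamp for longer than the timeout
counts as a lost frame.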
2 We could also use kernel provided TX timestamps for this, but it somehow didn’t work in our setup.


Since we are interested only in the latency intro-
duced by the gateway (see GW latency in Figure 2),
we subtract from the total latency the duration of
the frame transmission, where we take into account
the stuff bits inserted by the CAN link layer.

FIGURE 2: Calculation of gateway latency.

The sources of our testing tools and the individ-
ual test cases can be found in our git repository [4]
under the gw-tests directory.

2.2 What Was Tested?

Our goal was to measure the properties of the gate-
way under a wide range of conditions. These in-
cluded:

1. The gateway configuration such as frame fil-
ters, frame modifications, etc.

2. Additional load imposed on the gateway sys-
tem. The following types of load were con-
sidered: no load; CPU load, i.e. running
hackbench3 on the gateway; Ethernet load, i.e.
running ping -f -s 60000 -q gw on the PC,
with gw being the IP address of the gateway.

3. Type of CAN traffic. We tested the gateway
with three kinds of traffic: one frame at a time,
where the next frame was sent only after receiv-
ing the previously sent frame from the gateway;
50% bus load, where frames were sent with a
fixed period equal to two times the transmission
duration; and finally, 100% bus load (flood),
where frames were sent as fast as possible.

4. Linux kernel version used on the gateway.
The following versions were tested: 2.6.33.7,
2.6.33.7-rt29, 2.6.36.2, 3.0.4 and 3.0.4-rt14.

We ran all experiments for all possible combinations
of the above conditions, which resulted in 653 exper-
iments. The interested reader can find the full set of
the results at [5]. The most important findings are
discussed later in this paper.

2.3 Presentation of Results

In every experiment we measured the latency of mul-
tiple (in most cases 10000) frames. The results are
presented below in a sort of histogram called “la-
tency profile”. Figure 3 shows how the latency pro-
file (at the bottom) relates to the classical histogram
(at top). In essence, a latency profile is a backward-
cumulative histogram with a logarithmic vertical
axis.

FIGURE 3: How a latency profile is constructed.

The advantage of using latency profiles is that
the worst-case behavior (bottom right part of the
graph) is “magnified” by the logarithmic scale. More
formally, the properties of the latency profile are
as follows: given two points (t1, m1) and (t2, m2)
from a latency profile, where t1 < t2, we can say
that m1 − m2 frames had a latency in the range
(t1, t2). Additionally, the rightmost point (tw, mw)
means that there were exactly mw frames with the
worst-case latency of tw.

2.4 Measurement Precision

We conducted a few experiments to evaluate the
precision of measuring the latencies with our setup.
First, we measured the total frame latencies by two
means: (1) by a PC as described above and (2) by an
independent CAN analyzer by Vector4. In the other
experiments we used only method 1 (PC) as it al-
lows for full automation of the measurement, whereas
3 Hackbench repository is at http://git.kernel.org/?p=linux/kernel/git/tglx/rt-tests.git.
4 We used an analyzer called CANalyzer (http://www.vector.com/vi_canalyzer_en.html)


method 2 (CANalyzer) requires a lot of manual work 3 The Results


to save and process the measured data. The results
of the comparison can be seen in Figure 4. It can be In this section we present the main results of the
observed that the time resolution of CAN analyzer analysis. Some of the results gives a nice insight
is only 10 µs while the PC was able to measure data in how networking in Linux works and how Linux
with far better resolution, thanks to the support of schedules various activities.
high resolution timers in Linux. Our histogram uses
bins of 1 µs. The difference between the two meth-
ods is most of the time below 10 µs. Occasionally (for
less then 0.01% of frames), we got a bigger difference. 3.1 Simple gateway
Such precision is sufficient for our experiments.
In the first experiment we measured the behavior of
the gateway which simply routes all frames from one
 


bus to another without any modifications.
 



 #$
$%& '



 
      
  


! " 


  

 



 


FIGURE 4: Comparison of total latency measure- 

ment by PC and by CANalyzer, payload: 4 bytes. 


 



      
    !  ! 
Since the PC not only timestamps the received
frames but also generates the Ethernet load to ar-
tificially load the gateway, this additional activity
influences the precision of the measurements. To see
how big this influence is, we ignored the frames re- FIGURE 6: Simple gateway – latency profile and
the corresponding time chart. Conditions: GW ker-
ceived on can2 interface and calculated the latency
l from RX timestamp on can1, the time before the nel 2.6.33.7, traffic: oneatatime, load: none, pay-
frame was sent (in user space) to can0 and TX dura- load: 4 bytes.
tion (i.e. l = tRX − tsend − tduration ). The ideal result
would be a vertical line at time 0, i.e. all frames were The latency profile and the corresponding time
received immediately after being transmitted, but in chart is shown in Figure 6. It can be seen that
reality (see Figure 5) we get a sloped line around the best-case gateway latency is about 35 µs and the
31 µs, because the measurement includes the over- worst-case is about 140 µs. Most of the observed la-
head of sending and receiving operation. The second tencies fall into two groups, which are split by a gap
line in the graph shows, the generating the Ethernet around 50 µs. We attribute this gap to timer in-
traffic decreases the measurement precision by ap- terrupts which were triggered during processing the
proximately 30 µs, if we ignore a few out-layers, or by frame in the gateway. It can be seen that the cost of
about 200 µs if we do not ignore them. Both number the timer interrupt is about 20 µs.
are far below the increase of gateway induced latency
(which is in order of milliseconds – see Section 3.3)
and therefore, the precision is sufficient even when 3.2 Batched Processing of Frames
the PC generates the Ethernet load.
Linux kernel processes the incoming CAN frames (or

packets in Ethernet networks) in batches. Basically,


#
 


$%  
 when RX soft-irq is scheduled, it runs in a loop and

tries to process all frames sitting in receive buffers



        (either in hardware or in software). The graph in
 
         ! "  Figure 7 shows nicely the effect of this. If we com-
pare the latencies when the CAN traffic was gener-
ated with one frame at a time and flood methods,
it can be seen that in the former case, the overhead
FIGURE 5: Influence of Ethernet load generator of scheduling the RX soft-irq is always included (the
on measurement precision (no GW involved). latency profile starts at 35 µs), whereas in the latter


case, the overhead is reduced. Whenever the gate- 3.4 Frame Filtering
way receives a frame just when it finishes processing
of the previous frame, it does not exit the soft-irq The SocketCAN gateway allows for filtering the
and continues processing the new frame. Therefore, frames based on their IDs. There are two kinds of
the best-case latencies are much lower in that case filter implementations. First implementation (used
(the latency of 0 is of course caused by measurement for all EFF5 frames) puts the filtering rules into a
inaccuracies). The worst-case is about the same in linked list. Whenever a frame is received, this list
both cases. is traversed and when a match is found, the frame
is routed to the requested interface. The second im-
 plementation is optimized for matching single SFF6
      


 IDs. Since there is only 2048 distinct SFF IDs, the


filter uses the frame ID as an index to the table and


  


the destination interface if found without traversing
 a potentially long list.



       
             
 

  
  
   !


  
  "
   !
  #
FIGURE 7: The effect of batched frame process-   
   !"
ing. Conditions: GW kernel 2.6.33.7, load: none, 
  

payload: 4 bytes. 
     

'(( )    *

3.3 Effect of Loading the Gateway 


(  * (  *!"
 (  *! (  *


(  *!
  

(  *#
Figure 8 shows the effect of loading the gateway.  (  *

The No load line is the same as in the graphs be- 




fore. The CPU load line represents the case when 


the CPU of the gateway was heavily loaded by do-      
$% &
ing many inter-process communications in parallel
(hackbench). This approximately doubles the worst-
case latency from 140 µs to 250 µs. The Ethernet
load (flood ping), however, influences the gateway FIGURE 9: Gateway with filters. Top: latency
much more significantly. As it was shown in [6], for different number of EFF filter, bottom: single
this is due to the shared RX queue for both CAN SFF ID filters. Conditions: GW kernel 2.6.33.7,
and Ethernet traffic. Therefore, processing of CAN traffic: one frame at a time, load: none.
frames has to wait after the Ethernet packets (in our
case big ICMP echo requests) are processed. The top graph in Figure 9 show the cost of hav-
ing a different number of EFF filters in the list and

only the last one matches the sent frames. The gate-
 
  
way latency obviously increases with the number of
 !"    filters. From a more detailed graph [5] it was nicely

  

 visible when the list started to be bigger than CPU


cache and the latency started to increase quicker. In



our case this boundary was hit for about 80 filters.



     Additionally, when the filter list is too long and
 
CAN frames arrive faster, the gateway is no longer
able to handle all of them and starts dropping them.
This is visible in the appropriate graph at [5].

FIGURE 8: The effect of loading the gateway.
Conditions: GW kernel 2.6.33.7, traffic: one frame
at a time, payload: 4 bytes.

The bottom graph in Figure 9 shows that the
single ID SFF filters perform the same for all frame
IDs even when there are 2048 distinct filtering rules.
5 Extended Frame Format
6 Standard Frame Format
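
The two filtering schemes described in this section can be illustrated with
the following simplified sketch (not the actual kernel implementation): EFF
rules are matched by walking a list, whereas SFF frames can be dispatched
through a table indexed directly by the 11-bit identifier.

/* Illustrative comparison of list-based and table-based filter dispatch. */
#include <stdint.h>
#include <stddef.h>

struct eff_filter {
    uint32_t can_id;
    uint32_t can_mask;
    int dst_ifindex;
    struct eff_filter *next;
};

/* O(n): walk the list until a rule matches */
int route_eff(const struct eff_filter *list, uint32_t id)
{
    for (; list; list = list->next)
        if ((id & list->can_mask) == (list->can_id & list->can_mask))
            return list->dst_ifindex;
    return -1;                      /* no match, frame is not routed */
}

/* O(1): 2048 possible SFF IDs allow a direct-indexed dispatch table */
int sff_table[2048];

int route_sff(uint32_t id)
{
    return sff_table[id & 0x7FF];   /* 11-bit standard identifier */
}

This also explains why the latency grows with the number of EFF filters and
why it stays flat for the SFF table, as the measurements show.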


3.5 Frame Modifications Interestingly, 3.0-rt is much better in this regard


than 2.6.33-rt. Compared to non-rt kernels, 3.0-rt
Besides routing the frames, the gateway is also able increases latencies by about 20 µs, whereas 2.6.33-rt
to modify them. There are different operations which increases them by almost 100 µs.
can be applied to the frames: AND, OR, XOR, SET
and two checksum generators. The graph in Fig- 
ure 10 shows the cost of modifying the frames. In  


  

 
essence, most of the cost comes from copying the 
 
 
socket buffer before modifying it. The difference be- 


tween different modifications is negligible. 
      
 


  
  
 


  
 

   ! " #$



  %!%& " #$
FIGURE 12: Huge latencies in rt-preempt kernel.



Conditions: 2048 EFF filters, GW kernel: 3.0.4-



     rt14, traffic: one frame at a time, load: none, pay-
  load: 2 bytes.

There seems to be a bug in rt-preempt kernels.


For certain workloads, we get the latencies as big as
FIGURE 10: The Cost of Frame Modification. 50 ms. This can be seen in Figure 12. Interestingly,
Conditions: GW kernel 2.6.33.7, traffic: one frame we get such a high latency regularly, exactly every
at a time, load: none, payload: 8 bytes. one second. This behavior appears in both -rt ker-
nels tested and we will try to find the source of the
3.6 Differences between Kernels latency later. Additionally, kernel 3.0.4-rt14 hangs
with heavy Ethernet load, which is also something
Figure 11 shows the differences between different ker- we want to look at in the future.
nel versions. We tried to keep the configs7 of the ker-
nels as similar as possible by using make oldconfig.
From the graph, we can observe two things. First, 3.7 User-Space Gateway
with increasing non-rt versions, the latency increases
as well. The difference between 2.6.33 and 2.6.36 is The SocketCAN gateway is implemented in kernel
about 10 µs, the difference between 2.6.36 and 3.0 is space. It is interesting to see the differences when
smaller – about 2 µs. the gateway is implemented as a user space program.
The user space gateway was executed with real-time
scheduling policy (SCHED FIFO) with priority 90.


 
 


 
 

     


    


  








 
             

   

FIGURE 11: Different kernels. Conditions: traf- FIGURE 13: Kernel-space vs. User-space gate-
fic: one frame at a time, load: none, payload: 4 way. Conditions: kernel: 2.6.33.7, load: none, traf-
bytes. fic: one frame at a time, payload: 2 bytes.

Second, the latency of rt-preempt kernels is In Figure 13 can be seen that the time needed
higher than any of non-rt kernels. This is obvious, to route the frame in user-space is about three times
because the preemptivity of the kernel has its costs. bigger then with the kernel gateway.
7 The configs of our gateway kernels can be found at https://rtime.felk.cvut.cz/gitweb/can-benchmark.git/tree/HEAD:/kernel/build/shark


It is interesting to compare these graphs for dif-


ferent kernel versions. From that, one can see where

the increase of overhead comes from. The interested
 

 
   
 




   
  
reader is referred to our web site [5].





       
  4 Conclusion
This paper presented the timing analysis of a Linux-
FIGURE 14: Kernel-space vs. User-space gate- based CAN-to-CAN gateway and studied influence of
way under heavy traffic. Conditions: kernel: various factors (like CPU and bus load, kernel ver-
2.6.33.7 (non-rt and rt), load: none, traffic: flood, sions etc.) on frame latencies. The results indicate
payload: 2 bytes. that the gateway itself introduces no significant over-
head under real-life bus loads and working conditions
Figure 14 shows the gateway latencies with flood and can reliably work as a part of a distributed em-
CAN traffic. One can see that the user-space gate- bedded system. Our results were used to support
way under non-rt kernel drops some frames (the gap merging the gateway into Linux mainline. The gate-
at the top) and exhibits latencies up to 100 ms. The way should appear in Linux 3.2 release.
latencies are caused mainly by queuing frames in re-
On the other hand, it must be noted that espe-
ceiving socket queues. Since the user space had no
cially excessive Ethernet traffic or improperly con-
chance to run for a long time, the queue becomes
structed frame filters can lead to significant perfor-
long and eventually some frames are dropped. The
mance penalties and possible frame losses. The CAN
kernel simply has “higher priority” than anything in
subsystem, which forms the core of the examinated
the user space. With -rt kernel, the situation is dif-
CAN gateway, is inherently prone to problems un-
ferent. The priorities of both user and kernel threads
der heavy bus loads, not only on CAN bus, but also
can be set to (almost) arbitrary values, which allows
on other networking devices, as was already demon-
to reduce latencies of the user-space gateway down
strated in our previous work [6]. Nevertheless, the
to 2 ms. described gateway is a standard and easy-to-use solu-
tion, integrated in Linux kernel mainline, and there-
3.8 Multihop Routing fore represents the framework of choice for most de-
velopers.
In the last experiment, we modified the kernel gate- It was also clearly demonstrated that the kernel-
way to allow routing a single frame multiple times space solution works much better than the user-space
via virtual CAN devices. This allows us to split the solution, and that it can be beneficial to use stan-
overall latency into two parts. The first part is the dard non-rt kernels (providing that the gateway runs
overhead of interrupt handling and soft-irq schedul- in kernel-space). This allows to avoid greater over-
ing and the second part is the processing of the frame head and resulting performance penalty of rt kernels,
in CAN subsystem. The latter part can be derived providing that the standard kernel is properly con-
from Figure 15, by looking at the difference between figured.
consecutive lines (and dividing it by two). We get
Finally, our benchmarks revealed a few problems
that CAN subsystem processing takes about 10 µs.
in -rt kernels. We will investigate these problems as
The rest (from ca. 60 µs to 130 µs) is the overhead
of the rest of the system. our future work.



FIGURE 15: Latencies of multi-hop routing via
virtual CAN interfaces.

References

[1] T. Nolte, H. Hansson, and L. L. Bello, “Automotive communications - past,
    current and future,” in 10th IEEE Conference on Emerging Technologies and
    Factory Automation (ETFA), Catania, Italy, 2005, pp. 992–1000.

[2] N. Navet, Y. Song, F. Simonot-Lion, and C. Wilwert, “Trends in automotive
    communication systems,” Proceedings of the IEEE, vol. 93(6), pp. 1204–1223,
    2005.

[3] “The SocketCAN project website,” http://developer.berlios.de/projects/socketcan.

[4] M. Sojka, “CAN Benchmark git repository,” 2010. [Online]. Available:
    http://rtime.felk.cvut.cz/gitweb/can-benchmark.git

[5] M. Sojka and P. Píša, “Netlink-based CAN-to-CAN gateway timing test
    results,” 2011. [Online]. Available: http://rtime.felk.cvut.cz/can/benchmark/2.1/

[6] M. Sojka and P. Píša, “Timing analysis of Linux CAN drivers,” in Eleventh
    Real-Time Linux Workshop. Homagstr. 3-5, D-72296 Schopfloch: Open Source
    Automation Development Lab, 2009, pp. 147–153.


Evaluation of embedded virtualization on real-time Linux for


industrial control system

Sanjay Ghosh
Industrial Software Systems, ABB Corporate Research
Bhoruka Tech Park, Whitefield Road, Bangalore, India
[email protected]

Pradyumna Sampath
Industrial Communications, ABB Corporate Research
Bhoruka Tech Park, Whitefield Road, Bangalore, India
[email protected]

Abstract
Real-time control applications in industrial control systems have long been trusted to run on specially designed, dedicated embedded hardware such as PLCs and controllers. The essential non real-time functionalities of automation systems, like engineering and HMI, are executed separately on independent hardware. Over the last decade, advancements in RTOS technology, technologies such as virtualization and the availability of more powerful COTS hardware have enabled a trend in which industrial PCs are proposed to replace the hardware controller units. This paper describes the evaluation of our prototype control system on general purpose hardware based on Linux rt-preempt. In our setup, the control logic is executed by a runtime control engine alongside the standard engineering framework, standard HMI, data acquisition, EtherCAT based IO and also general PC utility applications on a uni-processor system without impacting the deterministic performance. The paper discusses in detail our performance evaluation results and the methodologies used in terms of the test setup, boundary conditions, and the parameters measured under typical load conditions as in real industrial applications.

1 Introduction and Background

The automation industry has seen a trend in the usage of control systems based on commodity hardware instead of being based on standard controller hardware. The key advantages this solution provides are higher flexibility of configuration and ease of customization. In fact, with the availability of high processing and memory capabilities, PCs can significantly out-perform most of the commercial controllers currently used in the industry in terms of the hardware resources available to perform tasks. Therefore, control applications running on a resource-rich PC-based control may consume a comparatively smaller share of the total available CPU than in the case of a traditional controller, leaving significant resources left over to run other essential non real-time applications. Additionally, PCs are available in several configurations and form factors to meet diverse user needs compared to controller hardware. The use of a commercial PC along with a desktop operating system, however, does not guarantee the deterministic hard real-time requirements for discrete manufacturing. Hard real-time controllers, by their nature, are meant to provide fast, deterministic and repeatable scan times without being affected by the other background activities undertaken by the operating system. Typical discrete manufacturing applications require deterministic, repeatable scan times as fast as one millisecond. Therefore the control engineering community, as it moves to PC-based control, expects it to deliver a similar level of performance and reliability. The challenge yet remains for PC based control
on one hand to enable all the benefits discussed above and on the other hand to achieve the reliability required of a controller.

1.1 Related Work

Some previous works have been reported in the literature exploring the concepts of PC based control. Work by Magro et al. [1] includes an evaluation of the time performance of a soft-PLC, OpenPCS by Info Team, running on a native operating system, Windows 2000, with different configurations and loading. In one of these works [2], it was shown that, by carefully tuning a Linux rt-preempt based host hypervisor and using hardware assisted virtualization along with a device emulation application, sub-millisecond scheduling latencies inside guests can be achieved. In fact, Wurmsdobler [3] in one of his articles concludes, based on extensive testing, that "for slow processes, Linux supports hard real time without any changes". There have been many real-time benchmarking studies on native operating systems [4], [5], [6] to observe the performance and capabilities of the real-time patches. Most of these works conclude that, with appropriate tuning of real-time priorities and appropriate kernel configurations, appreciable real-time behavior can be achieved, sufficient to support industry grade real-time application demands. Schild et al. [7] studied the interrupt-response times of a real-time operating system, Linux rt-preempt, hosting virtual machines using hardware assisted virtualization technology. A mechanism called "coscheduling" [8], i.e. dynamically boosting the guest VM's priority levels, is proposed in order to improve the CPU throughput of a general purpose OS VM on a RTOS host.

In this paper we present the evaluation of our prototype PC based control based on Linux rt-preempt to run real-time control. This additionally allows a Windows environment to run on the same hardware, thus providing the best of both worlds. Section 2 discusses the requirements for the PC based controller from the perspective of the factory automation domain and describes our prototype in detail. Section 3 describes the evaluation setup based on our prototype PC based controller. Section 4 presents the evaluation results and our observations. Section 5 concludes the paper by summarizing our findings, with a note on our vision for future work.

2 Industrial Control on commodity hardware

2.1 Requirements for PC based control

The key high level requirement for a PC based control is to run the control engine and the IO communication as user space tasks in a real-time operating environment with a real-time execution guarantee. In addition to this, the non real-time applications like the application engineering tool, HMI, monitoring program and PC utility applications should be able to run on the same hardware without jeopardizing the real-time response. Generally, all the application engineering tools as well as HMI interfaces are packaged as Windows based applications. The need for Windows based engineering stations stems from the existing knowledge base of using Windows for such applications. Furthermore, the user should be able to perform shift reporting, batch reporting etc. using general PC utility applications (mostly Windows based) like Microsoft Office. In this context, we try to define the requirements of a system that is capable of providing an environment to achieve the above functionalities. The control engine [9] executes inside an independent execution environment as a user space application on the operating system and fully relies on the underlying OS to provide the infrastructure to ensure determinism.

2.1.1 The requirements on control tasks

There is a clear difference in the resource requirements of the different real-time and non real-time tasks that are required to be present on the PC based hardware. The execution behavior of the control tasks is usually short or periodic [11]. The control engine executes in a 'scan cycle', wherein it performs input, executes the control logic, and ultimately produces output. One full machine cycle of the control is divided into the execution cycle time and the slack time, both adding up to make the fixed interval cycle time. The control engine executes the cycle as fast as possible, and the worst-case loop time determines the response time of the system [3].
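To make the scan cycle structure concrete, the following minimal C sketch shows one common way such a fixed-interval loop is written on Linux; read_inputs(), execute_logic() and write_outputs() are placeholders (they are not part of the evaluated control engine), and the 1 ms interval is only an example:

    #include <time.h>

    #define INTERVAL_NS 1000000L          /* example: 1 ms interval cycle time */

    extern void read_inputs(void);        /* placeholders for the scan cycle body */
    extern void execute_logic(void);
    extern void write_outputs(void);

    void scan_loop(void)
    {
        struct timespec next;
        clock_gettime(CLOCK_MONOTONIC, &next);
        for (;;) {
            read_inputs();                /* input */
            execute_logic();              /* control logic */
            write_outputs();              /* output */
            /* the remaining slack time is spent sleeping until the next interval */
            next.tv_nsec += INTERVAL_NS;
            while (next.tv_nsec >= 1000000000L) {
                next.tv_nsec -= 1000000000L;
                next.tv_sec++;
            }
            clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        }
    }

Using an absolute deadline (TIMER_ABSTIME) rather than a relative sleep keeps the interval from drifting when a cycle takes longer than usual.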


2.1.2 The requirements on non-realtime tasks

For factory automation processes, the HMI especially is required to be responsive to ensure an expected level of user experience. However, these applications are scarcely needed during operation of a factory automation system. The resource demands of the control engine execution should not starve the other non real-time tasks, especially the engineering and the HMI.

2.2 Technology alternatives for PC based control

In order to host the control engine we identified and selected Linux rt-preempt [12] over other available real-time extensions to the Linux kernel such as RTAI and RT-Linux, mainly based on the requirements of our domain [13]. Another reason for choosing rt-preempt is that many of the features in this patch have already been included into the mainline kernel in parts. It is definitely important to ensure the availability of community support that is sustainable over long product life cycles. The rt-preempt patch implements real-time behavior by allowing nearly the entire kernel to be preempted, with the exception of a few very small regions of code. Further, by inclusion of the high resolution timers (hrtimers), hard real-time behavior can be achieved in rt-preempt [14]. Thus the control engine in the PC based control is to be run as a user space task in Linux preempt RT with appropriate real-time priorities. In order to achieve co-existence of both the real-time and the non real-time tasks, a significant degree of isolation in terms of memory space is required. The Linux rt-preempt kernel, with its implementation of a virtual memory model, provides this feature with real-time responses. Combining virtualization and real-time gives several use cases for embedded systems. Full virtualization options (such as Intel VT-x) are now commonly available in commodity hardware having x86 [10] architecture.
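As an illustration of what this implies for the control engine process (this is not code from the prototype; the priority value 68 merely mirrors the one used later in the evaluation, and everything else is an assumption), a user-space task on a PREEMPT_RT host would typically request a real-time scheduling policy and lock its memory as follows:

    #include <sched.h>
    #include <sys/mman.h>
    #include <stdio.h>

    int make_realtime(int priority)
    {
        struct sched_param sp = { .sched_priority = priority };

        /* give the calling process a fixed-priority real-time policy */
        if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1) {
            perror("sched_setscheduler");
            return -1;
        }
        /* lock current and future pages to avoid page faults during the cycle */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1) {
            perror("mlockall");
            return -1;
        }
        return 0;
    }

    /* e.g. make_realtime(68) for the control engine and make_realtime(50) for the
       EtherCAT master stack; requires CAP_SYS_NICE or root privileges. */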
We also selected the Kernel Based Virtual Machine (KVM) [15], [16], which utilizes the hardware virtualization extension to enable the virtualization in our PC based control prototype. Since 2.6.20, KVM, an active open source project with a strong developer community, has been part of the mainline kernel as a Linux kernel module. KVM runs unmodified guest operating systems on the host OS, providing each virtual machine with its own private virtualized hardware: a network card, disk, graphics adapter, etc. It relies on the host OS for tasks like scheduling, native interrupt handling, memory management, hardware management, etc. using a kernel-space device driver (/dev/kvm) and is hence categorized as a type-2 hypervisor. It uses a user-space component, QEMU [17], for all device emulation. KVM adds an operating mode in addition to the default kernel and user modes, in Linux called the guest mode, which in turn has its own kernel and user modes [17]. The resulting guest VM's physical memory can be mapped to the virtual memory of the host hypervisor. Using the corresponding host process for the VM guest process, standard configurations in terms of priority, affinity etc. can be applied in order to flexibly influence the scheduling of virtual machines during runtime [2]. This is another advantage by which a PC based control using hardware assisted virtualization on top of a native operating system scores over the traditional control.

2.3 Prototype PC based control

Based on the requirements for the PC based control mentioned in section 2.1 and considering the technology solution mentioned in section 2.2, we came up with a prototype PC based control. For the control engine we selected one of the control engines that provides support for the Linux operating system on x86 architecture hardware. The programming IDE runs on the Windows guest OS, is based on the open international standard IEC 61131 and is also supplied by the same vendor. The control engine also supports visualization on the device, where the HMI is executed as a user space process on the host operating system. The control engine executes at a high real-time priority on the Linux rt-preempt based host, while the engineering and other PC utility applications are executed inside one or many Windows based guests with comparatively lower process priorities. The execution model of the prototype PC based control is shown below in figure 1.

FIGURE 1: Execution Model

Figure 2 below shows the communication model of the prototype PC based control. The network configuration between the host operating system (Linux rt-preempt) and the guest operating system (Windows) uses a local Ethernet based soft bridge
(bridge configuration using /etc/network/interfaces - auto br0). Communication between the control engine (also known as the Target Device) executing on the host and the programming IDE (also known as the Engineering) executing on the guest is based on the industry standard OPC Server. We used the industry standard real-time communication protocol EtherCAT for IO communication in our prototype. The EtherCAT master stack executes as a high real-time priority user space process in the host Linux rt-preempt operating system. This enables the Ethernet network interface card of the PC to communicate with the EtherCAT slave module connected to it using the EtherCAT protocol. An oscilloscope is used to display and verify the real-time output from the PC based control.

FIGURE 2: Communication Model

3 Evaluation Setup

The earlier section described our PC based control prototype in general. Going further, this section describes the specific evaluation setup and configurations of the prototype we used for our experiments.

3.1 System and Runtime Environment

The host system in our evaluation setup consisted of an Intel Core 2 Duo processor (Core2 G6950 at 2.8 GHz) with 2 GB of RAM. The CPU is x86 architecture based and has support for hardware assisted full virtualization with the Intel VT-x technology. The system had a 1 Gbps Ethernet network interface card. The host was running the Linux distribution Ubuntu 10.04. The host kernel was built from the 2.6.33.7.2-rt30 kernel with the rt-preempt patch. CONFIG_PREEMPT, CONFIG_PREEMPT_RT and the high resolution timers were enabled and CONFIG_ACPI_PROCESSOR was disabled to achieve the best possible latencies. Both the host as well as the guests used the native TSC (constant_tsc flag in /proc/cpuinfo) as clock source so that the time measurements obtained within the guest are reliable. In-kernel tracing functions, i.e. CONFIG_LATENCY_TRACE, were built in, but were kept disabled during the experiments [2]:

    echo 0 > /proc/sys/kernel/ftrace_enabled
    echo 0 > /sys/kernel/debug/tracing/tracing_enabled

One of the two CPUs of the Core 2 Duo system was kept disabled during the experiments:

    echo 0 > /sys/devices/system/cpu/cpu1/online

The experimental setup consisted of two non real-time custom created guests with 512 MB virtual RAM and a single VCPU, created using the Virtual Machine Manager application. Both of these virtual machines were based on Windows XP SP3. One of the guest VMs runs the application engineering while the other was meant for running PC utility applications. We also disabled the USB legacy option in the system BIOS, as it was reported in earlier work [8] that the USB legacy device is one key factor that causes latencies arising from SMIs. Instead we used a PS/2 keyboard and mouse. One of the experiments requires running the application engineering on a separate networked system. The other PC consists of an x86 architecture based Intel Core i5 CPU, M520, 2.4 GHz, 2 GB RAM, running Windows XP SP3.

3.2 Test Applications

The benchmarking control programs referred to by the PLCopen [18] have been used for performance evaluation of the PC based control in adherence to
the standard guidelines from the PLCopen. These scripts are written in the IEC 61131 structured text format [19] and are used as standard programs to benchmark the performance of PLCs. The application engineering interface used in our prototype allows application programming using the standard IEC 61131 language. There are two types of benchmarking scripts, for application oriented tests and for language oriented tests. Application oriented benchmarks are used to measure the whole cycle of the control, i.e. from receiving an input signal, through the internal processing, until writing an output signal. These are a set of different types of applications and their mixtures, which are typically used in factory automation. Language oriented benchmarks evaluate the computational performance of a controller while exercising all the available language constructs of the 61131-3 language. We have evaluated our prototype PC based control using seven of the benchmarking programs, including both the application oriented tests and the language oriented tests. Almost similar results were observed for all of these tests. For the sake of conciseness, in this paper we present only the results of one of the experiments, i.e. the language oriented test for the control statements. This test evaluates the performance of the PC based control in the operation of one thousand instances of different control statements, e.g. IF, CASE, FOR, WHILE etc. The repetition count is coded inside the test project scripts using looping. All the experiments were performed for more than two hours, with the control engine continuously executing the test program during this duration under different test conditions. The application engineering tool used in the prototype also allows developing visualization applications. The visualization program executes on the control engine along with the application test program, however with an execution cycle time typically two orders of magnitude larger than that of the control application. We have created a simple visualization program which shows six visualization objects on the screen, linked to monitor and update the status of six different variables (i.e. six OPC tags) in the application test program.

While performing the performance evaluation of any real-time system, it is usual practice to load or stress the system under test in order to observe the performance under such conditions. In order to stress the Linux rt-preempt host, we used the open source tool 'Stress' 1.0.4 [20], running as a background process, to impose a configurable amount of CPU, memory, I/O, and disk stress on the system. An example of the command line used for executing the stress program is as follows:

    stress --cpu <c> --io <i> --vm <m> --timeout <t> &

Similarly, in order to load or stress the Windows based guest VMs, the Windows based freeware HeavyLoad 3.0.0.159 [21] was used. In order to stress the system resources, HeavyLoad writes a large test file to the temp folder, allocates physical and virtual memory and draws patterns in its window.

3.3 Evaluation Parameters

For the performance evaluation of the PC based control, we mainly focus on measuring the upper bound or the worst case latencies of the cyclic execution of the control application. The key to appropriate performance evaluation is accurately measuring the cycle time. The actual cycle time is the time span between start and end of a test cycle, excluding any possible overheads such as the task startup process, IO access time etc. However, system specific overhead (like the timer tick and task scheduler) gets included by default in the cycle time measurement. Therefore the time measurements are instrumented within the application test program. At every iteration cycle, at the start of the operations, the current system timestamp is stored in a variable. Then the operations are performed as per the application logic. Right at the end of all the operations, the system time is stored again and the time elapsed while executing the operations is calculated. This is termed the Execution Cycle time or, more commonly, Cycle Time. Based on these per-cycle measurements, the average, minimum and maximum execution cycle times are calculated. Another measurement parameter of interest is the Jitter in the execution of the control logic. This is the measure of how early or how late the execution cycle starts with reference to the desired time of start.
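A minimal C sketch of this instrumentation idea is given below. In the paper the timestamps are taken inside the IEC 61131 test program itself, so the structure and names here are only illustrative assumptions:

    #include <time.h>
    #include <stdint.h>

    extern void run_control_logic(void);      /* placeholder for the cycle body */

    static int64_t ns_diff(const struct timespec *a, const struct timespec *b)
    {
        return (int64_t)(a->tv_sec - b->tv_sec) * 1000000000LL
               + (a->tv_nsec - b->tv_nsec);
    }

    void measure_cycles(long interval_ns, long cycles)
    {
        struct timespec planned, start, end;
        int64_t exec, jit, exec_min = INT64_MAX, exec_max = 0, jit_max = 0;

        clock_gettime(CLOCK_MONOTONIC, &planned);
        for (long i = 0; i < cycles; i++) {
            clock_gettime(CLOCK_MONOTONIC, &start);
            jit = ns_diff(&start, &planned);   /* how early/late the cycle starts */
            if (jit < 0) jit = -jit;
            if (jit > jit_max) jit_max = jit;

            run_control_logic();

            clock_gettime(CLOCK_MONOTONIC, &end);
            exec = ns_diff(&end, &start);      /* execution cycle time */
            if (exec < exec_min) exec_min = exec;
            if (exec > exec_max) exec_max = exec;

            planned.tv_nsec += interval_ns;    /* desired start of the next cycle */
            while (planned.tv_nsec >= 1000000000L) {
                planned.tv_nsec -= 1000000000L;
                planned.tv_sec++;
            }
            clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &planned, NULL);
        }
        /* exec_min/exec_max and jit_max now hold the minimum and maximum
           execution cycle times and the maximum observed jitter in nanoseconds */
    }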
Prior to performing the actual performance evaluation experiments, as a first step, it is also important to measure the processing capabilities of a control system. In order to estimate the processing capability of the control system for the particular control logic, we execute the test control program with different interval cycle times and the watchdog enabled. The watchdog in this context is a monitor built into the runtime engine which indicates an exception when the actual execution time of the IEC 61131 application exceeds the designated interval time. The shortest interval without watchdog exceptions is taken as the minimum possible interval cycle time of the system under test for that particular control application. Further, for all the performance evaluation experiments on the system using that test control application, the interval cycle time is configured to be the minimum
possible interval time.

As mentioned in the description of our evaluation prototype, an oscilloscope is connected to the IO module in order to display and verify the real-time output from the PC based control. The IO cycle time (also known as the scan time in control engineering) is kept constant for all the experiments and is equal to the interval cycle time. Our experimental prototype consists of real-time processes, such as the control engine and the EtherCAT IO master stack, and non real-time processes, such as a guest VM for running the application engineering tool and another guest VM for running the PC utility applications. Among all the user space processes running on the host system, the control engine process is assigned the highest real-time priority, followed by the EtherCAT IO master stack process.

In order to limit the size of the test matrix, we decided to focus only upon a few specific aspects of the performance evaluation and hence we fixed some of the variable parameters for all the experiments. For all the experiments we fixed the real-time priorities of the control engine process and the EtherCAT IO master stack process to 68 and 50 respectively. As mentioned in the earlier section, the stress program is used to produce controlled load on the host Linux system. The stress program runs continuously for the whole duration of the experiment with a constant value of the load. The last-15-minutes average load from /proc/loadavg is taken as the measure of the constant system load. Based on this measure, for the purpose of our experiments, we defined the categories of the system load level as Light (<2), Moderate (5), Heavy (10) and Very Heavy (<=20); the reference numbers in parentheses approximately represent the /proc/loadavg values for each of the categories. According to the standard guidelines from the PLCopen, each of the standard benchmark control programs needs to be repeated at least 10,000 times to get accurate results. We ran all our tests for a longer duration of at least two hours each. If the interval cycle time is 1 millisecond, then during the duration of a test the control application runs >= 7x10^6 execution cycles, which is typically large enough to observe the maximum latencies.

4 Evaluation Results and Discussions

The objective of the performance evaluation is to measure the worst case latencies and jitters observed while executing the control applications on the PC based control. This also gives the performance of the control applications and real-time IO communication when the engineering application interface, HMI and PC utility applications all run on the same hardware.

4.1 Test 1: Interval time evaluation

This experiment was performed in order to approximately measure the performance of the PC based control in terms of the minimum possible interval cycle times it can support for the execution of a control logic. For the PC based control, a comparative measurement was performed between two possible scenarios. Scenario-1: the application engineering tool, visualization as well as PC utility applications run on the same hardware as the control engine; Scenario-2: all these applications run on a different system. In this experiment, we tested the execution of the system by stepwise reducing the interval cycle time of the control logic until the system throws watchdog exceptions. The following were our observations while performing the experiment for both of the above mentioned scenarios.

Table 1: Execution time evaluation for Scenario-1

    Interval (µs)   Observation
    300             Watchdog exception at start
    350             Watchdog exception at start
    400             Watchdog exception at start
    450             Watchdog exception after a few minutes
    500             Smooth execution

Table 2: Execution time evaluation for Scenario-2

    Interval (µs)   Observation
    300             Watchdog exception at start
    350             Watchdog exception at start
    400             Watchdog exception after a few minutes
    450             Smooth execution
    500             Smooth execution

The measurements presented in Table 2 show that the PC based control was able to smoothly execute a particular test control application when configured to run with an interval cycle time of 450 µs. This is only when the system runs just the control engine and the IO communication as real-time processes in the Linux rt-preempt host. In the other scenario,
where the necessary non real-time tasks are also run alongside these real-time processes, smooth execution of the control application was found possible only beyond an interval cycle time of 500 µs. However, typical factory automation applications require scan times as small as one millisecond. In both of the above mentioned scenarios, it was observed that the average execution time the system requires for executing one cycle of the test control logic was less than 225 µs, i.e. well below even 50% of the interval cycle time. As the control engine and the IO communication run as high real-time priority processes, these tasks are never preempted by other non real-time tasks during their execution. It is during the available slack times that the non real-time processes are scheduled, if required.

4.2 Test 2: Latency and jitter evaluation

This experiment is meant to evaluate the performance of the PC based control prototype for reliably running the control applications and real-time IO communication, even in the presence of the necessary non real-time tasks running on the same hardware. For the purpose of this experiment we executed the test control application on the PC based prototype under seven different test conditions, identified as the seven setup configurations C1, C2, ..., C7. These configurations are defined based on which tasks are being run on the system:

• Base Configuration (Base): Control engine + IO communication + Visualization on host + Windows VM1 running application engineering tool
• C1: Base
• C2: Base + Low stress on host
• C3: Base + Moderate stress on host
• C4: Base + Heavy stress on host
• C5: Base + Very Heavy stress on host
• C6: Base + Very Heavy stress on host + Max possible load on Windows VM1
• C7: Base + Very Heavy stress on host + Max possible load on Windows VM1 + Windows VM2 running a text processing application

Table 3 below presents the measurements of the execution cycle times and jitter for the test control application and the scan times for the EtherCAT based output observed using an oscilloscope.

FIGURE 3: Execution Model

The measurements presented in Table 3 show that the PC based control was able to accommodate the execution of typical non real-time tasks required in a control system on the same hardware without compromising the real-time guarantee of the control applications. Further, it was also observed that, even by deliberately applying heavy loads on the host system as well as the guest systems, there is only a negligible change in the real-time execution behavior of the control application.

In one of the scenarios, the reliability of the isolation between the Linux rt-preempt host and the Windows based guest partitions was also evaluated. Even during deliberate crashing or rebooting of the Windows guests, the control engine and the real-time IO continue to perform unaffected in terms of cycle times and jitter.

5 Conclusion and Future Work

Our experiments with the prototype PC based control device have demonstrated that it is possible to achieve deterministic responses by using Linux rt-preempt along with KVM as the host RTOS. Irrespective of the concurrently running application load on the Windows guests, the deterministic behavior of the user space applications running on the rt-preempt kernel is not affected. The choice of real-time tasks and their priorities must still be carefully managed, and the host and guest must still follow the traditional separation of concerns, i.e. real-time and non real-time respectively. The results of the tests performed also indicate that such a system may be conceivable for certain industrial application domains, but may be inappropriate for applications
which demand more stringent real-time constraints (such as closed loop motion control).

Going forward we believe that there is potential for further work in this area. One such activity might involve the confluence of multi-core, virtualization and real-time. Comparative studies between SMP virtualization and AMP virtualization for real-time systems are an area where industry and academia might see benefit.

References

[1] Micaela Caserza Magro, Paolo Pinceti, "Measuring Real Time Performances of PC-based Industrial Control Systems", Proceedings of IEEE Conference on Emerging Technologies and Factory Automation (ETFA), Sept. 2007.

[2] Jan Kiszka, "Towards Linux as a Real-Time Hypervisor", RTLWS11, 2009.

[3] Peter Wurmsdobler, "Slower is easier", Industrial Computing, pp. 49-51, Nov 2001.

[4] Kushal Koolwal, "Investigating latency effects of the Linux real-time Preemption Patches (PREEMPT RT) on AMD's GEODE LX Platform", RTLWS11, 2009.

[5] A. Heursch, D. Grambow, A. Horstkotte, and H. Rzehak, "Steps towards a fully preemptable Linux kernel", Proceedings of the 27th IFAC/IFIP/IEEE Workshop on Real-Time Programming, May 2003.

[6] Carsten Emde, "Long-term monitoring of apparent latency in PREEMPT RT Linux real-time systems", RTLWS12, 2010.

[7] Henning Schild, Adam Lackorzynski, and Alexander Warg, "Faithful Virtualization on a Real-Time Operating System", RTLWS11, 2009.

[8] Baojing Zuo et al., "Performance Tuning Towards a KVM-based Low Latency Virtualization System", Proceedings of 2nd International Conference on Information Engineering and Computer Science (ICIECS), 2010.

[9] Robert Kaiser, Stephan Wagner, and Alexander Zuepke, "Safe and Cooperative Coexistence of a SoftPLC and Linux", Sysgo AG white paper, 2007.

[10] Intel Corporation, "Intel 64 and IA-32 Architectures Software Developer's Manual", Vol. 3B, pp. 3-8, March 2010.

[11] Gernot Heiser, "The role of virtualization in embedded systems", Proceedings of the 1st Workshop on Isolation and Integration in Embedded Systems, pp. 11-16, April 2008.

[12] CONFIG_PREEMPT_RT Patch - http://rt.wiki.kernel.org/index.php/

[13] Morten Mossige, Pradyumna Sampath, and Rachana Rao, "Evaluation of Linux rt-preempt for embedded industrial devices for Automation and Power technologies - A case study", RTLWS9, 2007.

[14] Steven Rostedt and Darren Hart, "Internals of the RT Patch", Proceedings of the Linux Symposium, 2007.

[15] Kernel Based Virtual Machine - http://www.linux-kvm.org/

[16] Avi Kivity et al., "KVM: The Linux Virtual Machine Monitor", Proceedings of the Linux Symposium, 2007.

[17] QEMU - open source processor emulator - http://www.qemu.org

[18] PLCopen Benchmark home page - http://www.plcopen.org/pages/tc3_certification/benchmarking/index.htm

[19] IEC 61131, "Programmable controllers - Part 3: Programming languages", ed. 2.0, 2003.

[20] Stress project home page - http://weather.ou.edu/~apw/projects/stress/

[21] HeavyLoad download page - http://www.softpedia.com/get/System/Benchmarks/HeavyLoad.shtml
Real-Time Linux Concepts

Turning Krieger’s MCS Lock into a Send Queue


or, a Case for Reusing Clever, Mostly Lock-Free Code in a Different Area

Benjamin Engel and Marcus Völp


Technische Universität Dresden
Operating-Systems Group
Nöthnitzer Strasse 46, Dresden, Germany
{voelp, engel}@os.inf.tu-dresden.de

Abstract
Lock- and wait-free data structures can be constructed in a generic way. However, when complex
operations are involved, their practical use is rather limited due to high performance overheads and, in
some settings, difficult to fulfil object lifecycles.
While working on a synchronous inter-processor communication (IPC) path for multicore systems, we
stumbled over a clever piece of code that did fulfil most of the properties that this path requires for its
send queue. Unfortunately, this piece of code was by no means a data-structure publication or somehow
related to send queues. Reporting on our experience in translating Krieger’s MCS-style reader-writer lock
into a send queue for cross-processor IPC, we would like to make the point that sometimes, searching for
code could end up in a valuable treasure chest even for largely different areas.

1 Introduction

Predictability, security and the ease to support application-tailored OS functionalities all speak for microkernels and microhypervisors as host operating systems for today's and future manycore systems. In particular, augmented virtualization environments as we find them in desktop, server and embedded systems benefit from the ability to co-host large legacy software stacks next to more sensitive code such as real-time subsystems [2, 3] and secure applications [4].

Synchronous inter-process communication (IPC) is one of the central mechanisms many of these kernels implement. Reasons favoring synchronous (i.e., unbuffered and blocking) IPC are:

1. asynchronous communication and coordination primitives can easily be built on top of synchronous IPC, however the reverse is not possible in this generality;

2. synchronous IPC is able to exploit preallocated memory buffers, thereby eliminating the need for memory allocation during the message transfer; and,

3. processor-local synchronous IPC implementations can be as fast as a few hundred cycles¹, which makes them suitable for higher-level synchronization primitives.

¹ 288 cycles on an Intel Core i7 920 2.67 GHz [5].

While processor-local IPC does not necessarily require a send queue, it turns out that in cross-processor IPC paths senders have to be blocked at the receiver and processed in some arrival-dependent order. In other words, multiprocessor synchronous IPC needs a send queue to avoid unbounded starvation of senders waiting to rendezvous with their receiver.

This paper reports on our experience implementing such a send queue and the surprising result we found: basically a wait-free MCS-style reader-writer lock by Krieger et al. [7] provides all the essential
functionality that we required. In the following, we summarize the key requirements of send queues, motivate our choice of a non-blocking implementation and highlight the difficulties of using non-blocking code in the kernel. Then we briefly outline the original reader-writer lock by Krieger and present our modifications to turn it into a send queue. We evaluate our results and relate them to others, before introducing our idea of a treasure chest of non-blocking building blocks.

2 Send Queue Requirements

The primary purpose of a send queue is to order incoming requests to prevent sender starvation. In processor-local IPC paths, the order of send operations is governed by the sequence in which threads are picked by the scheduler. This completely eliminates the need for a send queue. For example, facilitating time and priority inheritance, servers may receive time from the currently highest prioritized thread blocked on them. Using this time, they can therefore complete potentially pending requests before processing the request of the time provider [6]. The crucial feature ensuring this ordering is the ability to donate time to other threads, an operation which is easily done locally but very hard to apply across processor boundaries. Therefore, for cross-processor IPC, the ordering of incoming requests needs another mechanism: a send queue.

When designing IPC primitives, multiple, partially contradicting targets need close attention, like performance and flexibility. Individual send and receive operations allow for a high degree of freedom, but add complexity. Threads may invoke multiple servers by sending multiple messages before entering a receive state, or they may receive from another client before replying to a previous one. On the other hand, allowing only calls (i.e., atomic send and receive operations) to invoke servers, and restricting servers to reply only to the last caller, possibly after calling other servers in the course of handling the caller's request, limits the flexibility of IPC but simplifies synchronization and is sufficient for most use cases. Processor-local versions of such a call-reply style IPC path can be very fast because they do not need any form of synchronization [5].

The operations required of a send queue are enqueuing to the tail of the list and dequeuing the head when replying. Depending on whether these operations are used to block threads only in the case of a contended server or in every IPC, enqueuing into an empty list and dequeuing the head must be fast. In addition, send queues should also support a dequeue operation from the middle of the list, allowing threads to cancel their not-yet-started IPC. Situations where callers have to abort an IPC include the deletion of a thread and timeouts (e.g., set to react on certain error situations).

One of the key benefits of a synchronous IPC path is the possibility to preallocate all message buffers to avoid memory allocation during the IPC. The immediate consequence for the send queue is that all operations have to operate on preallocated memory as well. In particular, if a call is finished, the caller's message buffer and all meta data used during the IPC must be ready for use in subsequent IPC calls. Especially for generic constructions of lock-free data structures, these properties are difficult to fulfil. Still, non-blocking implementations have their benefits: they typically scale to higher CPU numbers than lock-based variants and, in our case, the effort and overhead required for a fair lock protecting the send queue is in the same order as the effort and overhead of a mostly lock-free send queue.

3 Krieger's MCS-style Reader Writer Lock

By swinging a single pointer to the tail of a list, MCS locks implicitly arrange lock acquiring threads in FIFO order. Either if a thread enqueues into an empty queue (i.e., tail = 0 prior to the enqueue operation) or if the previous lock holder releases the lock by clearing a field the next thread in the list spins on, the thread at the head of the list becomes lock holder.

FIGURE 1: Data structure and dequeue operation from Krieger's MCS lock

Eliminating contention on the active reader counter of Mellor-Crummey and Scott's original reader-writer lock [8], Krieger et al. [7] introduce a second pointer to each queue element to enqueue readers into a lazily updated double-linked list. Finished readers dequeue from the middle of the list using spin locks that are only taken for the purpose of
protecting neighboring list nodes during a dequeue. The last active reader releases a subsequent writer if present. Figure 1 illustrates this algorithm and the used data structures.

The key features of Krieger's MCS lock, making it a perfect starting point for send queues, are:

1. an implicit FIFO ordering by threads atomically swinging the tail pointer to their list element and lazily enqueuing afterwards;

2. an extremely fast enqueue operation for the uncontended case (essentially just an atomic swap of the tail pointer);

3. in most situations, an extremely fast dequeue operation of the head element (essentially only the release of the next thread and an atomic compare-and-swap if the queue is empty afterwards); and,

4. the possibility to dequeue from the middle of the queue.

4 Turning Krieger's Lock into a Send Queue

Our goal is to translate Krieger's MCS lock into a mostly lock-free send queue for a call-reply style IPC path. In general, there are two principal ways to use such a send queue: enqueue callers only if the callee is contended, or enqueue all callers that are sending or waiting to send to the callee. In the first case, exclusive control over the callee must be taken to determine whether it is ready to receive new requests. If not, the caller would block in the send queue while waiting for the callee to be ready to receive its request. Otherwise, the caller, in case of a caller-driven implementation, or the callee, in case of a callee-driven implementation, starts transferring the message. However, because locking the callee to obtain exclusive control ideally involves a fair lock that facilitates local spinning, we expect comparable costs for locking the callee and for enqueuing into an MCS-style send queue.

To avoid the additional overhead of locking the callee in the uncontended case (i.e., when the callee is already receiving), we maintain the following invariant:

Invariant I: A callee with an empty send queue is always receiving and implicitly locked by the first thread entering the send queue.

The first part of Invariant I is ensured automatically by the call-reply style IPC path as long as ongoing requests are not aborted (Section 4.1 below discusses how the spirit of this invariant can be maintained in case of aborts). The second part is ensured by leaving the callee locked in situations where the last thread dequeues itself from the send queue.

    type SQ_Item = record
        next : ^sq_item
        pred : ^sq_item
        lock : precedence_spinlock
    type Status = enum {EMPTY, NOT_EMPTY, OTHER}
    type send_queue = class
        head : ^sq_item
        tail : ^sq_item
        method enqueue(I : ^SQ_Item) : Status
            pred : ^SQ_Item := swap(tail, I)
            if pred = nil
                head := I
                release_precedence(I->lock)
                return EMPTY
            I->pred := pred
            // store fence
            pred->next := I
            release_clear_precedence(I->lock)
            return NOT_EMPTY

FIGURE 2: Types and enqueue operation

Figure 2 shows the pseudocode for enqueuing into the send queue². Like in MCS locks, the function enqueue starts by atomically swinging the tail pointer of the send queue to the list element of the invoking caller and remembering the old value of tail. If this old value is nil, the queue was empty and the function may return after updating the head pointer. Otherwise, enqueue first sets the predecessor pointer of its element before completing the enqueue operation by updating the predecessor's next pointer. As pointed out by Krieger, this write sequence prevents a race with a concurrent dequeue operation from the middle of the list and may require an additional fence. The role of the precedence spinlock and the validity of head will be discussed after we have introduced the two dequeue functions.
Krieger’s MCS lock does not distinguish between
To avoid the additional overhead of locking the
dequeuing from the middle of the list and dequeuing
callee in the uncontended case (i.e., when the callee
the head element, since finished readers have to re-
is already receiving), we maintain the following in-
tract from the queue lock no matter where they are.
variant:
For a send queue however, the former operation is
Invariant I: A callee with an empty send queue only invoked when a thread cancels its IPC, whereas
is always receiving and implicitly locked by the first dequeuing the head is used in every reply to a caller,
thread entering the send queue. thereby completing the IPC.
2 For a better comparison, we adopt the pseudocode introduced by Mellor-Crummey and Scott [8], which is also used in

Krieger et al. [7].

183
Turning Kriegers MCS Lock into a Send Queue

    method dequeue_head(I : ^SQ_Item) : Status
        if acquire_precedence(I->lock) = false
            return OTHER
        if I->next = nil
            if !compare_swap(tail, I, nil)
                repeat while I->next = nil
        next : ^SQ_Item := I->next
        if next != nil
            next->pred := nil
            head := next
            I->next := nil
            return NOT_EMPTY
        I->next := nil
        return EMPTY

FIGURE 3: Dequeue head operation

Figure 3 shows the pseudocode for dequeuing the head element. Again we defer the discussion of the precedence spin lock to Section 4.1. Like in all MCS-style locks, a nil next pointer can indicate one of two situations: either the dequeue operation is about to remove the last thread from the list (in this case, tail = I holds) or a thread is about to enqueue into the list but did not yet manage to update the predecessor's next pointer. The atomic compare-and-swap checks which of the two states the list is in and clears tail in the first case. Otherwise, the dequeuing thread spins until the next pointer is set. This spinning is bounded because enqueue executes in the kernel with interrupts disabled. After returning from this loop, either the list is empty or the next pointer is set and the dequeuing thread can update the head pointer to the corresponding thread, now being the new head of the list.

Notice that the head pointer is not always current. In particular, dequeue head does not update head in case the list gets empty. The invariant that maintains correctness of this implementation is the following:

Invariant II: Head is valid only for the thread that is under exclusive control of the callee.

An immediate consequence of this invariant is that threads may not poll head until they reach the front of the send queue. The following sequence illustrates this race:

1. Thread A enqueues and dequeues from the list. Assuming the list was empty, head now refers to A because dequeue head will not clear the head pointer, as doing so would race with B's concurrent enqueue operation.

2. Thread B starts enqueuing itself but is delayed after updating the tail pointer.

3. Thread A returns, enqueues itself to the list and finds itself to be the head because B did not yet update this pointer after Step 1. If now a thread C enqueues itself and both A and B dequeue themselves using dequeue head, A will set C's predecessor to nil but B's later dequeue will set the already left A as the new head.

Although the list structure itself is maintained, any head dependent action by C will now block forever or work on the wrong head A. For call-reply style IPC paths, the restriction Invariant II imposes on the use of the head pointer is no problem, because head is used only by the thread having exclusive control of the callee and in one of the following two situations: (1) to pull in the next message after a reply; and (2) to identify the thread to reply to. In a callee-driven implementation, the callee is active anyway when these situations occur. In a caller-driven implementation, the exclusive owner of the callee takes over control of the thread at the head of the send queue to push its message to the receiver.

4.1 Cancel

    method dequeue_middle(I : ^SQ_Item) : Status
        pred : ^SQ_Item = I->pred
        // chase and lock pred
        repeat while pred != nil
            if try_acquire(pred->lock)
                if pred = I->pred
                    break
                release(pred->lock)
            pred := I->pred
        if pred
            acquire_precedence(I->lock)
            prev->next := nil
            if I->next = nil
                if !compare_swap(tail, I, prev)
                    repeat while I->next = nil
            next : ^SQ_Item := I->next
            next->pred := pred
            if next != nil
                pred->next := next
            release(prev->lock)
            I->pred := nil
            I->next := nil
        return dequeue_head(I)

FIGURE 4: Dequeue middle operation

Figure 4 shows the pseudocode for dequeuing threads from the middle of the send queue, which is required for canceling waiting threads. Except for the precedence spinlocks, the above code directly resembles Krieger's reader unlock operation. The first while loop chases the predecessor pointer of the dequeuing caller's list element to lock it for the subsequent dequeue operation. In our code base, this loop is preemption reactive in the sense that it will abort the dequeue operation if a preemption is pending. We have omitted this test for reasons of simplicity.
Having locked the predecessor, the caller's own list element lock is acquired to prevent concurrent dequeues from modifying the prev and next pointers. Like in all MCS-style locks, compare-and-swap is used to update the tail pointer in situations where the tail element of the list is dequeued. Otherwise, the dequeue operation waits for a pending enqueue operation to update the next pointer of the to-be-dequeued element. Together with enqueue and dequeue head, dequeue middle maintains the invariant that dequeued list elements are always precedence locked. In the following we describe the role of these precedence locks in greater detail.

The primary purpose of the spin lock is to protect the prev and next pointers from concurrent dequeues. In our version, we grant dequeue head precedence over concurrent dequeue operations from the middle of the list, which are invoked as part of an IPC cancel operation and, unlike dequeue head, are not performance critical. To grant precedence, threads dequeuing from the middle can obtain the lock only if it is free and the precedence bit is clear. Therefore, by setting the precedence bit in case the lock could not be acquired immediately, the callee replying to the caller will obtain the lock after at most the duration of one dequeue operation.

In a callee-driven implementation of synchronous IPC, there is one situation where two threads concurrently attempt to acquire the lock with precedence: when the reply of a callee to its caller collides with an IPC cancel operation to the head of the send queue. In this situation, and because dequeuing from the send queue is the last operation of the IPC, it does not matter which one of the threads completes its dequeue operation. Therefore, a thread will bail out from the dequeue head operation with the status code Other if it finds the lock with the precedence bit set. If these two operations happen in any sequence one after the other, it is important to abort the second operation because otherwise, after the compare-and-swap operation in dequeue head, the second dequeue head would wait for a thread to update its next pointer, which will never happen. In Krieger's MCS lock and likewise in our send queue, there is no means to detect just from the link information whether or not an element is enqueued. The invariant that dequeued elements are precedence locked introduces precisely this information to avoid the above livelock in subsequent dequeue head operations.

Because threads dequeuing from the middle may refer to elements that are no longer enqueued, threads and their corresponding send queue items may not be deallocated immediately upon their destruction. Instead, we reuse a read-copy update (RCU) like deferred destruction scheme [9] that was already available in Nova [5].

Although a callee is not necessarily receiving, the spirit of Invariant I holds trivially in a callee-driven implementation because a caller enqueuing into an empty list will simply activate the callee no matter what state it is in. After a cancel, this may result in the callee completing its prior operation or performing some cleanup before entering the receive state during which it will pull the caller's message.

4.2 Evaluation

FIGURE 5: Send queue overhead (enqueue + dequeue time in cycles over the number of participating cores, 0-12; Xeon X5650 @ 2.67 GHz, dual socket, 6 cores each; single core, single socket and cross socket curves)

To evaluate the performance overhead of our send queue we have measured roundtrip times of the operations involved in a call-reply pair — enqueue followed by dequeue head if the caller is at the head of the send queue. Measurements were done on an Intel Xeon X5650 @ 2.67 GHz by increasing the number of cores participating in the roundtrip. As seen in Figure 5, the call/reply to an uncontended server on a single core is quite fast; having multiple cores on the same socket competing on a queue adds an overhead of up to 1200 cycles, but having to access the cross-socket interconnect is really painful.
5 Related Work and Treasure Chest

In the literature, one finds several lock-free implementations of common data structures such as lists, stacks and trees [11, 12]. However, typically these data structures were designed as reference implementations to illustrate some high-level concept such as linearizability [1], wait-freeness [14] or obstruction freeness [13]. Others demonstrate the use of memory management schemes such as hazard pointers [10] to establish the type safe memory these data structures require. However, little work is published on the building blocks leading to lock-free data structures (Valois' work [11] forming an exception) and on non-pure algorithms that, like Krieger's MCS lock, are lock-free in the important operations but use potentially unfair locks when this unfairness does no harm. The consequences are that it is difficult to find an implementation that perfectly suits a given problem, it is hard to identify the building blocks used in such an implementation and even harder to perform the necessary adjustments.

More raising the problem than providing a definite solution, we propose to collect searchable building blocks for lock-free data structures. By building blocks we mean essential ways to introduce a certain functionality and the prerequisites and properties they imply. For example, dequeuing from the middle of a double-linked list may be implemented using Valois' helper nodes [11], but with the limitation of not being able to reuse these nodes immediately, or, in a mostly lock-free fashion, with Krieger's spin locks that are just used for the dequeue operation. RCU [9] is an excellent building block for read-mostly data structures and deferred object destruction, however send queues are write dominated. In our case, relaxing the validity of the head pointer allowed for a very simple implementation of the queue, and the queue-state dependent implicit locking of the callee improves performance for the uncontended case.

6 Conclusions

This paper describes the modifications necessary to turn Krieger's MCS-style queue lock into a send queue for call-reply style synchronous IPC paths. To our surprise, Krieger's lock already provides all the essential functionality. Our send queue extends Krieger's lock in two invariant driven ways: by introducing a head pointer and by introducing an implicit locking scheme where threads enqueuing into an empty queue immediately get hold of the callee lock. This resulted in a fast send queue usable in call-reply style synchronous cross-processor IPC paths.

References

[1] M. Herlihy and J. Wing, "Linearizability: a correctness condition for concurrent objects," ACM Trans. on Programming Languages and Systems, vol. 12, no. 3, pp. 463-492, 1990.

[2] M. Roitzsch and H. Härtig, "Ten Years of Research on L4-Based Real-Time," Proc. of the Eighth Real-Time Linux Workshop, Lanzhou, China, 2006.

[3] Guanghui Cheng, Nicholas Mc Guire, Qingguo Zhou and Lian Li, "L4eRTL: Port of eRTL(PaRTiKle) to L4/Fiasco microkernel," 11th RT Linux Workshop, 2009.

[4] H. Härtig, "Security Architectures Revisited," Proc. of the 10th ACM SIGOPS European Workshop, France, Sept. 2002.

[5] U. Steinberg and B. Kauer, "NOVA: A Microhypervisor-Based Secure Virtualization Architecture," Proc. of EuroSys, April 2010.

[6] U. Steinberg, J. Wolter and H. Härtig, "Fast Component Interaction for Real-Time Systems," Proc. of the 17th Euromicro Conference on Real-Time Systems, July 2005.

[7] O. Krieger, M. Stumm, R. Unrau and J. Hanna, "A Fair Fast Scalable Reader-Writer Lock," Proc. of the IEEE International Conference on Parallel Processing, 1993.

[8] J. Mellor-Crummey and M. Scott, "Scalable reader-writer synchronization for shared-memory multiprocessors," 3rd ACM Symp. on Principles and Practice of Parallel Programming, pp. 106-113, April 1991.

[9] P. McKenney, "Read-Copy Update," http://www.rdrop.com/users/paulmck/RCU/

[10] M. Michael, "Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects," IEEE Trans. on Parallel and Distributed Systems, 15(6): 491-504, 2004.

[11] J. Valois, "Lock-free linked lists using compare-and-swap," 14th Annual ACM Symp. on Principles of Distributed Computing, pp. 214-222, Aug 1995.

[12] K. Fraser, "Practical lock-freedom," PhD Thesis, University of Cambridge, Feb 2004.

[13] M. Herlihy, V. Luchangco and M. Moir, "Obstruction-Free Synchronization: Double-Ended Queues as an Example," 23rd Int. Conference on Distributed Computing Systems (ICDCS 2003), 19-22 May 2003.

[14] M. Herlihy, "Wait-free synchronization," ACM Trans. on Programming Languages and Systems, 13(1): 124-149, Jan 1991.

DYNAMIC MEMORY ALLOCATION ON REAL-TIME LINUX

Jianping Shen
Institut Dr. Foerster GmbH und Co. KG
In Laisen 70, 72766, Reutlingen, Germany
[email protected]

Michael Hamal
Institut Dr. Foerster GmbH und Co. KG
In Laisen 70, 72766, Reutlingen, Germany
[email protected]

Sven Ganzenmüller
Institut Dr. Foerster GmbH und Co. KG
In Laisen 70, 72766, Reutlingen, Germany
[email protected]

Abstract
Dynamic memory allocation is used in many real-time systems. In such systems there are a lot of
objects, which are referenced by different threads. Their number and lifetime is unpredictable, therefore
they should be allocated and deallocated dynamically. Heap operations are in conflict with the main
demand of real-time systems, that all operations in high priority threads must be deterministic. In this
paper we provide a generic solution, a combination of the memory pool pattern with a shared pointer,
which meets both: high system reliability by automatic memory deallocation and deterministic execution
time by avoiding heap operations.

1 Introduction

Dynamic memory management on real-time multi-threaded systems has two handicaps.

1. Execution time for memory allocation and deallocation should be fast and predictable in time. Heap allocations in general (with new() or malloc()) are not deterministic, because of memory fragmentation and the non-deterministic behaviour of system calls (brk(), sbrk(), mmap()) [1].

2. For objects which are referenced by more than one thread, it is difficult or even impossible to predict which thread will be the last user of the object and has the duty to deallocate the object's memory at the right time.

For such systems we need a deterministic, automatic and multi-threading capable dynamic memory management solution for C and C++ real-time development.

2 Detailed Problem Description

In this section we will explain the two mentioned handicaps in detail and try to figure out the solutions.


2.1 Execution Time of Memory Allocation

The implementation of memory management depends greatly upon operating system and architecture. Some operating systems supply an allocator for malloc(), while others supply functions to control certain regions of data. The same dynamic memory allocator is often used to implement both malloc() and operator new() in C++ [2].

Thus, we firstly limit our discussion to the following preconditions:

Operating system: Linux
C Runtime Library: glibc
Architecture: X86-32

Linux uses virtual memory, each process runs in its own virtual address space. Linux dynamically maps the virtual memory to physical memory during runtime. The virtual memory layout for a running process is shown in figure 1.

FIGURE 1: Process memory layout on Linux.

The allocator implementation in glibc is ptmalloc2(). A memory block managed by ptmalloc2() is called a chunk. The heap organises all available chunks in two containers called Bins and Fastbins. Fastbins contains small sized¹ chunks for fast allocation, Bins contains normal sized² chunks. Available chunks in Bins are organized in 128 size-ordered double linked lists (shown in figure 2).

FIGURE 2: Available chunks in Bin.

If the deallocator function void free(void* ptr) is called, a chunk will be released. This means the chunk is marked as available and, depending on its size, goes to Bins or Fastbins for further allocations. Obviously the available chunks in Bins and Fastbins are dynamic and dependent on runtime conditions.

If the allocator function void* malloc(size_t size) is called, the following steps will be processed:

1. If memory size ≤ max_fast, ptmalloc2() tries to find a chunk in Fastbins.

2. If step 1 failed or size > max_fast and size <= DEFAULT_MMAP_THRESHOLD, ptmalloc2() tries to find a chunk in Bins.

3. If step 2 failed, ptmalloc2() tries to increase the heap size by calling sbrk().

4. If step 3 failed or size > DEFAULT_MMAP_THRESHOLD, ptmalloc2() calls mmap() to map physical memory into the process virtual address space.

5. Finally return the allocated chunk, or NULL if steps 1 – 4 all failed.

This is a very simplified description. What really happens is much more complicated, but we don't want to inspect the details. It is now important to know that the execution time of memory allocation is not predictable.

For a memory allocation of small or normal size (steps 1 – 3), ptmalloc2() tries to find an appropriate chunk in Bins or Fastbins. The number of available chunks in both containers is dynamic and dependent on runtime conditions. The allocation time is therefore not predictable.

¹ size <= max_fast (default 72 Bytes).
² size > max_fast (default 72 Bytes) and size <= DEFAULT_MMAP_THRESHOLD (default 128 kB).

For large size memory allocations (step 4), ptmalloc2() uses the system call mmap() to map


physical memory in the process' virtual address space. With virtual memory management the physical memory may be swapped out to hard disk. This involves disk IO, thus its execution time is also unpredictable.

On other operating systems and architectures, most allocator implementations involve system calls. Hence the memory allocation on such systems may be expensive and slow. The allocation time may or may not be predictable, depending on the concrete implementation.

A common solution for this problem is the memory pool approach. A memory pool preallocates a number of memory blocks during startup. While the system is running, threads request objects from the pool and return them back to the pool after usage. In a single-threaded environment the memory pool pattern allows allocations with constant execution time [3]. In a multi-threading environment the execution time is predictable, but not constant.

2.2 Memory Deallocation at the Right Time

Consider memory allocations in a multi-threading environment. Thread A allocates a memory block, and passes it as a pointer to Thread B and Thread C. Now Thread A, Thread B, and Thread C are all users of this memory block. The last user must deallocate the memory block to prevent a memory leak. Which of them becomes the last user is dynamic and depends on runtime conditions. In such a situation, it is impossible to safely deallocate the memory by just calling the deallocator in one of the three threads.

A solution to this problem is the shared pointer pattern. The idea behind shared pointers (and other resource management objects) is called Resource Acquisition Is Initialization (RAII). Once a memory block is allocated, it will be immediately turned over to a shared pointer. A reference counter is used by the shared pointer to keep track of all memory block users. The shared pointer automatically releases the memory block when nobody is using this block any longer [4].

With the help of shared pointers, the users (in our case the threads) don't need to release the memory by themselves; this will be done by the shared pointer, always at the right time.

3 The Approach

3.1 Combination of Memory Pool and Shared Pointer

The goal is to benefit from both approaches, a combination of the memory pool with shared pointers.

1. The memory pool preallocates a number of objects (see figure 3).

2. During runtime threads acquire objects from the pool as shared pointers.

3. If nobody is using the object any longer, the shared pointer will automatically return the object back to the pool (see figure 4).

FIGURE 3: Memory pool layout.

FIGURE 4: Shared pointer layout.

3.2 Execution Time of Shared Pointers

The common shared pointer implementation raises an interesting question: where is the reference counter located and who allocates it? Unfortunately, it will be allocated with new() on the heap by the shared pointer itself. That means the constructor call of a shared pointer is unpredictable in time.
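To make the "last user deallocates" behaviour of section 2.2 concrete, the following minimal sketch uses C++11's std::shared_ptr (boost::shared_ptr, on which the implementation described later is based, behaves the same way); the payload type and thread roles are invented for the example. Note that, as just discussed, the reference counter of such a plain shared pointer is still allocated on the heap, which is exactly what the pool-aware shared pointer developed below avoids.

    #include <memory>
    #include <thread>

    struct Measurement { double value[64]; };        // example payload type

    void consumer(std::shared_ptr<Measurement> m)
    {
        // work with *m; the block stays alive as long as any copy
        // of the shared pointer exists
    }                                                // last owner to finish frees the block

    int main()
    {
        auto m = std::make_shared<Measurement>();    // "Thread A" allocates
        std::thread b(consumer, m);                  // Thread B becomes a co-owner
        std::thread c(consumer, m);                  // Thread C becomes a co-owner
        m.reset();                                   // Thread A drops its reference
        b.join();
        c.join();                                    // whichever owner releases last deallocates
        return 0;
    }

Which thread actually performs the deallocation is decided at runtime by the reference count reaching zero, not by the program structure - this is the property the approach relies on.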


Hence we have to modify the shared pointer in a way that we also preallocate the reference counter, as follows.

1. Extend the memory pool. The memory pool must preallocate memory not only for the object, but also for the object's reference counter.

2. Modify the standard shared pointer to work with the memory pool's preallocated reference counter.

3.3 The Final Approach

We put 3.1 and 3.2 together, and provide here the final solution.

1. The memory pool preallocates memory for user data and its reference counter.

2. At runtime the process acquires memory from the memory pool as a shared pointer.

3. The shared pointer uses the preallocated reference counter. Its execution time is therefore predictable. The shared pointer also keeps a pointer to the memory pool. It will be used to return memory in step 4.

4. When nobody is using the memory block any longer, the shared pointer automatically returns the memory to the memory pool.

This solution works without memory allocation at runtime. The process and the shared pointer both use preallocated memory from the memory pool. In our approach we call the shared pointer RtSharedPtr.

3.4 Execution Time

Our memory pool is designed to be used in multi-threading environments. The maximum execution time of a memory acquisition from a memory pool is dependent on the maximum number of threads. For multi-threading environments the access to the memory pool is protected by a mutex. Let's assume:

t_p = execution time for a memory acquisition at a memory pool
t_m = execution time of mutex lock + unlock
thread_max = maximum thread number
t_all = complete latency for a memory acquisition

Single-threaded environment:

t_all = t_p + t_m

The complete latency is constant in a single-threaded environment.

Multi-threaded environment:

Best case: no concurrent access to the memory pool

t_all,min = t_p + t_m

Worst case: full concurrent access, all threads access the memory pool at the same time.

t_all,max = (t_p + t_m) · thread_max

Thus, the complete latency in a multi-threaded environment can be calculated as:

t_p + t_m ≤ t_all ≤ (t_p + t_m) · thread_max

It is not constant, but predictable in time.

3.5 Performance Improvement

The mutex protection enforces that all parallel memory pool accesses will be serialised. This could be a performance bottleneck. A workaround is to create more memory pools to improve parallel memory acquisition.

If we create a memory pool for each dynamic data type in each thread, there is no concurrent memory pool access, thus we don't need the mutex any more. We can disable the mutex protection by creating a memory pool as thread local. To create a thread-local memory pool, we call the memory pool constructor as follows:

RtMemoryPool(int size, bool tLocal=true, const QString& name)

We then reach the maximal performance. The execution time is minimal and constant:

t_all = t_p

Obviously in a single-threaded environment we should also create the memory pool as thread local to get the best performance.
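As a purely illustrative example (the numbers are invented, not measurements from this paper): with t_p = 2 µs, t_m = 1 µs and thread_max = 8, the mutex-protected pool is bounded by 3 µs ≤ t_all ≤ 24 µs, while a thread-local pool yields the constant t_all = t_p = 2 µs. Such a thread-local pool would be created with the constructor shown above, for instance:

    RtMemoryPool<SomeType> localPool(100, true, "WorkerPool");

where SomeType and the pool name are placeholders for the example.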


4 The Implementation

4.1 Memory Pool

Our implementation³ is based on C++ templates [5], therefore all preallocated memory chunks of a concrete memory pool are of the same size. That means we need a memory pool for each dynamically allocated data type, but the template approach guarantees type-safety.

    template <typename F>
    class RtSharedPtr;

    template <typename T>
    class RtMemoryPool
    {
        RtMemoryPool(int size, bool tLocal, const QString& name);
        RtSharedPtr<T> sharedPtrAlloc();
        ...
    };

The preallocated memory blocks are organized as a linked list. Each block contains user data, its reference counter and a pointer to the next block. For a memory acquisition the pool always returns the head block (see figure 5).

FIGURE 5: Memory Acquisition

If a memory block returns, it will be added as the new head block in front of the list (see figure 6).

FIGURE 6: Return Memory

Instead of a raw pointer the memory pool returns a shared pointer to the allocated memory block. The shared pointer uses the reference count in the allocated memory block, and keeps a pointer to the memory pool.

    // usage
    RtMemoryPool<SomeType> memPool(100, "MyPool");
    RtSharedPtr<SomeType> sptr = memPool.sharedPtrAlloc();

If the template type is a class, the memory pool will call the class constructor⁴ to initialize the memory block. Accordingly the class destructor will be called when the memory block returns back to the pool. If the object holds some resources⁵, they are released by the class destructor to prevent resource leaks.

4.2 Shared Pointer

We modified the boost shared pointer [6] as follows:

1. Make it possible to use the preallocated reference counter.

2. Add a boolean to indicate whether the managed memory block is from a memory pool.

3. Add a memory pool pointer, which will be used to return the memory block back to the pool.

4. Extend the release behavior; make the shared pointer also able to return memory back to the pool.

³ Our implementation is based on the Qt library, so we use QString, which can be easily replaced by std::string.
⁴ Constructors of pool objects should be kept simple, because their execution directly affects the memory acquisition time.
⁵ Objects with dynamically allocated resources violate the real-time conditions because of their destruction in the object's destructor.
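The list handling behind figures 5 and 6 can be sketched as follows. This is a simplified illustration only, not the actual RtMemoryPool code: the member names are invented, and mutex protection, reference counting and construction/destruction of the payload are omitted.

    // Sketch of the pool free list: acquisition pops the head block,
    // release pushes the returned block back as the new head.
    template <typename T>
    class SimplePool
    {
        struct Block {
            T      data;        // user data
            int    refCount;    // preallocated reference counter for the shared pointer
            Block* next;        // link to the next available block
        };
        Block* head;            // first available block (figure 5)

    public:
        explicit SimplePool(int size) : head(0)
        {
            for (int i = 0; i < size; ++i) {    // preallocate all blocks up front
                Block* b = new Block();
                b->next = head;
                head = b;
            }
        }
        Block* acquire()                        // O(1): hand out the head block
        {
            Block* b = head;
            if (b)
                head = b->next;
            return b;                           // 0 if the pool is exhausted
        }
        void release(Block* b)                  // O(1): returned block becomes the new head (figure 6)
        {
            b->next = head;
            head = b;
        }
    };

Both operations touch only the head pointer, which is why the acquisition from the pool itself takes constant time.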


The modified shared pointer can work in two modes.

Normal mode: it works identically to a standard shared pointer and allocates the reference counter on the heap. Its execution time is therefore unpredictable.

Real-time mode: in this mode the shared pointer must work together with a memory pool. Since the reference counter is preallocated in the pool, the execution time is predictable. There is no heap operation during runtime.

    template <typename T>
    class RtSharedPtr
    {
        template <typename F>
        explicit RtSharedPtr(F* p) :
            pn(p), px(p)
        {
        }
        // a new constructor
        template <typename F>
        explicit RtSharedPtr(F* p,
                             ReferenceCount* count,
                             RtMemoryPool<T>* pool) :
            pn(p, count, pool), px(p)
        {
        }
        ...
    };

We provide a new constructor. Compared to the boost version it takes two extra parameters: a pointer to a preallocated reference counter and a pointer to the related memory pool. With the new constructor the shared pointer will be bound to the memory pool and to its preallocated reference counter.

A complete implementation can be found at http://www.foerstergroup.de/files/TS/RtSharedPtr.tgz

4.3 Test Results

We tested our solution on:

Hardware: Intel Atom N270, 1 GB RAM
Operating System: Linux 3.0.0-rt3
Compiler: gcc/g++ 4.4.5
C-Library: glibc 2.11.2

As test case we started two threads, one high-priority thread which allocates memory – in test case A from the memory pool (figure 7) and in test case B from the heap (figure 8). The second thread acts as a noise generator; it allocates and frees memory chunks of different size on the heap to bother glibc's allocator.

The execution time for an allocation in the high-priority thread is measured in µs and illustrated in figure 7 and figure 8.

FIGURE 7: Test case A: pool allocation.

FIGURE 8: Test case B: heap allocation.

Both test cases run for approximately 10 hours and produce about 10⁹ records.

Test case A shows that 90% of pool allocations are accomplished within 5 µs, and 99.999% of pool allocations within 10 µs. The curve is limited to 60 µs on the time axis, which is the worst execution time. Our approach therefore satisfies the timing requirements of hard real-time systems.

Compared to pool allocation, the execution time of a heap allocation is evenly distributed between 80 µs and 900 µs. Its worst execution time is not bounded.

5 Conclusions

The described approach provides a generic solution which meets both: high system reliability by automatic memory deallocation and deterministic execution time by avoiding heap operations.

We have tested our approach on Linux and Windows. As a generic solution it can be easily migrated to other operating systems which do not provide operating-system-level memory allocation with deterministic execution time.


References

[1] Gianluca Insolvibile: Advanced memory allocation, 2003, Linux Journal Issue 109

[2] http://en.wikipedia.org/wiki/Malloc

[3] http://en.wikipedia.org/wiki/Object_pool_pattern

[4] Scott Meyers: Effective C++, Third Edition, 2005, Pearson Education, Inc.

[5] C++ Deeply Embedded, 2010, Hilf GmbH.

[6] http://www.boost.org/


pW/CS - Probabilistic Write / Copy-Select (Locks)

Nicholas Mc Guire
Distributed & Embedded Systems Lab, Lanzhou University
Tianshui South Road 222,Lanzhou,P.R.China
[email protected],[email protected]

Abstract
The initial problem of protecting data for concurrent access was relatively simple, the goal was ex-
clusive access to a shared resource. Elaborate semantical variations of the atomicity theme have been
developed over the past decades, followed by an ever-increasing focus on scalability. At the same time a
continuously increasing complexity of operating systems and applications has resulted in a steady growth
of the complexity of the locking subsystem - one could speculate that the locking complexity has been
growing faster than the overall complexity of operating systems, but it would be hard to put numeric
evidence on this claim - suffice it to state that the development of Linux in the transition from 2.2 to 2.4
and to the now current 3.X series of kernels has been very much dominated by locking issues related to
scalability [8], [7].
At the same time we have seen that locking semantics has become more complex, priority inheri-
tance/priority ceiling, fine grain locking, lock types dependent on global state [5] and large lock depen-
dencies (or lock chains [13]) becoming common. This growth in complexity has dramatically impacted the
development of real-time OS like the Preempt-RT real-time extension to the Linux kernel - not too surpris-
ing considerable efforts related to real-time are lock related [4],[6]. The paradigm has roughly remained the
same - explicit mutual exclusion to critical regions and atomicity of access in a functionally deterministic
manner along with hardware support for more elaborate atomic instructions (i.e. cmove,cmpx16).
This approach has serious drawbacks:
• it is hard to make locking scalable
• detecting and fixing locking problems is becoming more difficult
• the performance impact of locking - notably on real-time - is problematic
• The worst-case behavior lies in only a minuscule sub-state-space that is hard to actually reach during testing
• Timing wise the worst case is always the loaded system, thus reliable prediction of load impact is
limited.
The question is - is there an alternative? Notably one that scales with growing complexity? It is
not yet time to give a simple yes or no answer, but we believe that we can state that for some locking
problems there are solutions that can actually inherently scale with growing complexity. The problem
simply has to be approached from a different perspective.
Operating systems have been traditionally modeled as deterministic constructs - code is deterministic
- but in system scope this simply does not hold. Non-determinism at the temporal level paired with
preemptible operating systems inherently leads to the inability to predict the global state of an operating
system even in the near future (let's say a few seconds into the future). Thus modeling a task as running on
a ”random” global state - the operating system - allows a new perspective for access to shared resources.
Taking one step back, locking was not introduced to provide exclusive access; locking was introduced
to ensure consistency of access to a shared resource - locking being one way this can be done in a
straightforward manner. At times where memory was a scarce resource this approach made a lot of
sense - with RAM readily available, though with significant access performance differences depending on
physical location, alternative solutions for consistent access to shared resources may make more sense -
one of these methods is probabilistic locking.
In this paper we present the motivation for a simple probabilistic lock - arguably the term lock
is inappropriate - but we retain it as it serves the same purpose as the traditional locks - guarantee


consistency of shared data. This lock is not a one-fits-all solution to the problem of shared data in
concurrent systems - but rather it should be seen as an attempt to change the perspective and view
contemporary systems as what they are - inherently random systems - and capitalize on this notion to
resolve the scalability problem.

1 Introduction

Computer science has been much focused on deterministic methods - notably when it comes to synchronization methods we rely on correctness proofs to assure that the methods are sound. While this does guarantee that these methods will not fail as long as we actually are able to model them (that is, we know all involved locks), allowing us to exclude inconsistency of data, they are causing serious problems in the transition to multi-core systems - they don't scale well. A further issue with the deterministic approach is that the worst case is expected under heavy load, and thus testing only has a limited significance in certifying correctness, as the state space that would need to be covered by testing is simply too large.

Real-time systems have constraints on access patterns, i.e. SCHED_FIFO makes it impossible for a tight-loop sequence of preemptions to happen on a single CPU (unless intentionally coded) - thus the measures to ensure consistency could be relaxed. Unfortunately, historically it seems that locking was developed for the general case of arbitrary preemption patterns and then these general results were specialized for the real-time case. The effect being that, though real-time locking should be in principle simpler than the general non-real-time case, it is in fact more complex.

The question we ask is simply whether the methods in use are actually solving problems that exist, or whether they are instead heavily involved in solving non-existing problems with considerable overhead to do so. Even worse - on very large systems forcing unnecessary determinism might well be one of the main problems. Is it reasonable to assume arbitrarily defined task sequences or arbitrary preemption sequences at the temporal level? For any real life system this makes little sense - for a full fledged general purpose OS it makes absolutely no sense. Even more, the inherent randomness of modern CPUs [9] makes it close to impossible to actually achieve synchronous sequences of concurrent access to unprotected global objects, even if one were to maliciously attempt to do so. Methods like WCET estimation are becoming (actually are at this point) prohibitively pessimistic and thus practically not usable for multicore systems.

The algorithm described in this paper contains a race - and it can be quite trivially shown under what conditions the race exists (i.e. a SPIN model would reveal this). We will introduce a lock-less/0-wait algorithm to read a register (of in principle arbitrary size) while concurrently writing it. We will show that this algorithm is reliable with the reader and the writer being non-atomic and then argue that, while theoretically unsafe with a non-atomic reader and non-atomic writer, it is practically - that is statistically - safe with an arbitrary reliability target. Thus the trade-off is spatial replication vs interprocess synchronization time.

The main contribution of this article is to demonstrate that the growing complexity of modern systems (complex hardware and software) needs answers to core questions that, rather than fighting complexity, capitalize on it and result in robust systems under real-world conditions.

1.1 Race Condition

A race is an access pattern on a shared object that can't be judged from the context of the involved tasks only. It is important to note that unprotected shared access in itself does not suffice to create inconsistency - essentially the occurrence of inconsistency is bound to access patterns. Thus there are two options:

• unify the context - i.e. add a shared lock to join the context

• de-couple context - i.e. randomize access to minimize joined context

The first is the "traditional" deterministic locking scheme; the second is not actually that new, but maybe just not yet presented in the context of locking. The goal is to design synchronization that scales with complexity rather than trying to enforce simplicity at the local level by increasing the global complexity.

The properties of masking locks build on the notion of inherent non-determinism of concurrency in modern CPUs. Basically this non-determinism at

task level is precisely the cause for race conditions in the first place; if modern systems were strictly synchronous at the global level then we could pre-determine any access patterns and consequently protect. Sources of non-determinism are plentiful in modern systems: not only asynchronous interrupts, but also non-deterministic cache replacement strategies, ECC RAM and flash (the latter with correction rates in the order of 1 out of 100 accesses projected [11]), complex dependencies of instruction execution times, etc. All of this leads to non-deterministic timing - that is, execution time jitter - and, paired with preemptibility, to a non-deterministic global state from the perspective of the individual thread of execution.

In safety related or HA systems random faults have traditionally been masked by replication and redundancy - we take a similar approach here but at a much smaller scale - the critical object is a single data object and the "fault" is the writing process. We start with a well studied and simple class - a single writer multiple reader construct - similar to the one introduced by Peterson in his influential paper "Concurrent reading while writing" [15], whereby the assumptions about the read and write operations are very much relaxed to reflect the nature of modern super-scalar multicores, that is, no memory barriers or volatile data types are assumed. The design goals for race masking are:

• lock-less / 0-wait

• hand-shake-free

• non-atomic reader/writer

• constant number of steps for read and write (O(1))

• concurrent multiple reader, single writer

• never later than a locking version

• reader and writer crash safe

  – none can be blocked

  – no reader will receive an inconsistent or old value if the writer crashes

  – no bounds on the number of crashing processes

• an arbitrary probability of success can be provided (level of replication)

• failure probability decreases with increasing system complexity.

• failure probability decreases with increasing system load.

• failure probability decreases with the number of participating processes.

Scalability is not mentioned here simply because we don't yet have a good model to actually describe and analyze scalability, but clearly scalability is a prime target. While we list non-atomic read-write, it should be noted that we are assuming that single 32 bit entities are written atomically - that is, a write of a word to a memory/register location will never permit an inconsistent concurrent read - either the old value is read in its entirety or the new value is read in its entirety but no "mix" of the two - any sane architecture will guarantee that (at least at present).

1.2 Concept

The concept is embarrassingly trivial: the writer simply writes to the shared object regardless of the state of any reader. Obviously this would not be safe for a single object - as with safety related systems where random faults must be covered - we simply view the concurrent threads as "randomly" accessing the data object and the writer is viewed as the "fault-injector".

FIGURE 1: fault-injection model

Thus the therapy is simply replication. Rather than writing one protected region, we simply write N unprotected regions, and with the inherent randomness of complex hardware/software systems we can provide an arbitrarily high probability that at least one of the regions is consistent at any time and thus can be retrieved by the readers. Summarized this simply means:

• probabilistic guarantee of success that can be set to an arbitrary value

• write operation: write replicated registers unprotected

• read operation: copy replicated registers and select


hence the name probabilistic Write/Copy-Select (pW/CS) lock.

Underlying principle:

The principles are roughly modeled along the lines of loosely coupled replicated systems to mitigate random faults in safety related systems:

• temporal serialization is replaced by spatial "concurrency"

• atomicity is replaced by a probability of success.

• atomicity of single object updates must be guaranteed (that is, the write of a single 32bit word must be guaranteed to be consistent - the single load or store is consistent - which should hold on any CPU I hope).

Replication is done to guarantee that at least one register is consistent and complete at any time (with an arbitrarily selectable probability) for a given assumed maximum of synchronous preemptions of reader and writer thread. The value available is always the last complete value written (though an in-progress write may be incomplete) - in any case a reader always gets access to the last consistent register copy, thus never later than a locking solution would provide.

So pW/CS addresses consistency of concurrently accessed data - it does not address completeness nor ordering issues - it is the lowest level primitive for sharing non-atomic resources without introducing a joint context constraint.

1.3 Categorization

In Lamport's taxonomy this is a regular 1-writer algorithm:

    A regular variable is a safe variable in which a Read that overlaps one or more Writes returns either the value of the most recent Write preceding the Read or of one of the overlapping Writes.

Lamport's taxonomy [3] might not be applicable to a probabilistic locking scheme, but it fulfills the criteria quite nicely. As this is a low-level primitive only, the motivation to build on such definitions is to allow deducing high-level constructs (i.e. monitors seem a quite natural option) to build on this primitive.

2 Register layout and protocol

In this section we describe the reader and writer protocols as well as the replicated register layout.

Replicated register set layout

The principle layout is simply a set of N registers with 2N markers guarding it, so an N-register for pW/CS protection would be:

[marker,reg1,marker]...[marker,regN,marker]

Note that the markers are to be unique if unbounded reader delays are permitted; if reader delays are bounded then the marker's type space must be sufficient to cover uniqueness within the reader delay for un-delayed writers (or register aliasing could occur - i.e. a roll-over of a marker if the marker were only a char). In the proof of concept implementation the largest inherently atomically writable object, a 32bit value, is used as marker.

Read and write protocol

The readers and writers have a simple protocol to follow; basically the direction of access is inverted. The selection of direction is of course arbitrary - the essence is only that readers and writers access in opposite directions. Now on weakly ordered architectures this might not hold (or might require a larger number of replicas) - but as the approach in itself is non-deterministic this does not matter, as no consistency assumptions are actually made.

FIGURE 2: access model

• writer protocol:

  – update replicas (left to right)

  – update protocol:


    ∗ update leading (left) marker
    ∗ update register
    ∗ update the trailing (right) marker

• reader protocol:

  – copy register set (right to left)

  – select consistent register

  – selection protocol:

    for(reg=right,reg<left,reg--){
        if(markers identical){
            select register
        }
    }
    return register

Protection comes from write and read being in opposite directions. It is not possible to get inconsistent data even with a single pass - it is though possible to get no data (all data is found to be inconsistent). The probability of all data being inconsistent can though be brought down to an arbitrarily low value with sufficient replication.

Note that there are possibilities for "smarter" protocols than the above brute-force one. For the proof-of-concept implementation this simple minded approach was shown to work just fine - and as it allows simple modeling it is what we are currently using.

3 Race-masking with implicit reader locking

The initial motivation for looking into race masking was to allow lock-less coordination of real-time and non-real-time tasks on a real-time enhanced GNU/Linux system. Essentially this section is to show that the introduction of real-time priorities will also only improve the situation but never aggravate it, in the sense that the probability of success is never reduced.

A non-probabilistic variant is by implicit priority locking of the reader - if the reader has higher priority than the writer then it is not possible for the writer to preempt the reader and thus it is guaranteed that the reader will be able to copy the entire buffer uninterrupted - in this case it can also be guaranteed that the reader gets at least one consistent copy if N ≥ 3 replicas of the register are used.

This is nothing really new: Lamport suggested this in 1985 [3], noting the idea actually stems from 1977, but uses it in a deterministic algorithm to implement a multivalent regular register. Dijkstra proposes a non-deterministic selection in his paper titled "Guarded Commands, Nondeterminacy and Formal Derivation of Programs" 1975 [1], from which we use the idea of guards to protect a set of in principle non-deterministically selected actions (copying of the register). Interestingly enough, Hoare in "Communicating Sequential Processes" [2] describes a number of situations based on Dijkstra's guarded commands that resemble the pW/CS locks proposed here, though the context is quite different. Ultimately non-determinism has been proposed in many publications, though we are not aware of examples where this non-determinism is actually utilized - this alone is the novelty of the proposed design here and we believe it is potentially useful in resolving scalability problems in at least some situations.

4 General race-masking with probabilistic locking

If the requirement of well set priorities and thus one-sided non-preemptibility is dropped then there is a possibility that the read will return with none of the registers in a detectable consistent state. That is, actually we don't know the state of the register - we infer positively that the register is consistent if the markers are identical. At the same time we can not positively infer an inconsistency of the register in case of markers being unequal. But taking the inconsistency of the markers as indication of inconsistency of the registers is a pessimistic assumption in all cases and thus safe.

To fail, the access to the registers must be strictly in lock-step order for readers and writers - for N replicas N*2+1 lock-step accesses would be needed to result in all registers being inconsistent. Note that the probability of such a lock-step behavior does increase with the size of the critical region (actually the uninterrupted time spent in the critical region).

A collision (all registers in an intermediate state) would require 2N+1 synchronous preemptions - so 13 synchronous preemptions for a system using 6 replicas. With synchronous preemption we mean that the preemption of the reader must occur after a complete register with markers was read every time, and the preemption of the writer must happen in the middle of the register region every time, plus that last read must also be preempted to ensure that no register is read in a consistent state.

Such an aliasing for N = 3 requires synchronous


preemption of reader and writer in at least 6 consecutive cases - this means a sixfold synchronous race condition is needed to result in inconsistent data - what is the probability of such a scenario if single race conditions are hard to reproduce?

In fact the race condition could be extended to an arbitrary number of race hits needed to result in inconsistent data, and thus one can provide an arbitrary probability of success (at the expense of a larger number of replications).

This solution could be described as, somewhat paradoxically, a "safe race" - safe to an arbitrary probability of successful reading of at least one consistent register.

The current proof-of-concept is for a register consisting of 3 integer values, but is extensible to any data structure - we note though that the prime interest is in resolving synchronization of small data objects, where race occurrence is very unlikely and thus traditional locking excessively wasteful.

The writer is simply an unconditional write to the register set.

    do{
        /* unconditional write */
        ui[i].w_enter++;
        ui[i].period = period;
        ui[i].duty = duty;
        ui[i].bit = 1;
        ui[i].w_exit++;
        i++;
    }while(i < NUM_REPLICA);

The reader copies the register set in reverse order and then runs a selection loop on it:

    while(!exit_cond){
        i = NUM_REPLICA-1;
        do{
            pwm[i].w_enter = ui[i].w_enter;
            pwm[i].period = ui[i].period;
            pwm[i].duty = ui[i].duty;
            pwm[i].bit = ui[i].bit;
            pwm[i].w_exit = ui[i].w_exit;
            i--;
        }while( i >= 0);

        for(i=0;i<NUM_REPLICA;i++){
            if(pwm[i].w_enter - pwm[i].w_exit == 0){
                /* consistent register set found */
            }
        }
    }

The selection can then simply set the pointer to the first valid entry found; note that the first found is the last written, thus the most current of the N replicated registers, so the selection can stop once a consistent register set was found. To ensure this, the individual replicas must be on cache line boundaries - if they fit in a single cache line then the ordering implemented in the software would not necessarily be honored by the hardware.

5 Properties

5.1 Assessment of the randomness hypothesis

The most interesting issue in the experiments was to determine if the data can bolster the claim of a random fault scenario. If this assumption is false then obviously the underlying model would not be valid and consequently the conclusions also not - at least not at this point in time.

The random fault model is basically claiming that the writer actually has the same properties as a random fault injection - even though it is obviously systematic in nature, its timing is suspected to be truly random. If this holds then the mitigation of the fault also holds - with some constraints of course that will be developed a bit later.

To assess that the failures are actually random we take two main data samples into account:

• timing distribution

• distribution of single buffer inconsistencies

From this data, presented below, we can conclude that the writer process actually exhibits properties of a random fault (SEU).

FIGURE 3: timing distribution of one reader

FIGURE 4: buffer inconsistency distribution of one reader

5.2 Probability of failure

In the above example a 6-tuple replica was in use; for N registers 2*N+1 synchronous preemptions are required - what is the probability of this happening?

To see this we look at the distribution of the race occurrence on a single unprotected global integer over the loop length. 10000 runs are done and then the occurrence of races is plotted, showing the approximation of the race probability.

FIGURE 5: race on single unprotected global variable with two threads over the loop length

The actual occurrence of a race is almost perfectly normally distributed: if one creates 1000 instances of two racing threads for a given fixed loop length and records the number of races that occurred, the distribution is close to a perfect Gauss curve.

FIGURE 6: distribution of race probability for given loop length (E7400 QuadCore)

To calculate the probability of success we need a probability of a race condition in the first place, thus the probability of one register actually being read inconsistent.

5.3 Failure rates of pW/CS

To estimate the failure rate we implemented a pW/CS protected data object and ran tests where we record the distribution of inconsistent registers (actually inconsistent markers, which is a conservative indication of inconsistent registers) and plot the distribution for different scenarios. From this data we derive a model of the distribution and estimate the probabilities involved.

FIGURE 7: buffer inconsistency distribution idle system (24 core AMD)


plete model yet, partially because the test-case is


loaded system 2,8,256 reader threads (24core AMD) quite artificial and it needs to be demonstrated that
1e+09
"pWCS_nobind_rw−yield_2t_load16.log"
"pWCS_nobind_rw−yield_8t_load16.log"
this actually holds for a real problem. The perfor-
1e+08 "pWCS_nobind_rw−yield_256t_load16.log" mance assessment is done by looking at the time
1e+07
it takes to access the shared data object and plot-
1e+06
ting this time as a histogram - it shows that the
samples

100000
time distribution is very favorable for the probabilis-
10000
tic lock even on a loaded system (note that readers
1000 and writer are SCHED OTHER not RR or FIFO).
100

10
The comparison is done between code using
1
pW/CS and code using a normal pthread nutex to
0 1 2 3 4 5 6 protect the shared object.
inconsisten buffers

FIGURE 8: buffer inconsistency distribu-


tion load 16 (24 core AMD) time in microseconds on idle system

"8t_pWCS"
"8t_lock"
inconsistent buffers

"2t.dist"
"256t.dist" 1e+10
1e+09
1e+08
1e+07
1e+09 1e+06
100000
1e+08 10000
1000
1e+07 100
1e+06 10
1
100000 0
10000 1
1000 2
100 0
10 100 3
200 4
1 300 samples
400 5
500 6
20 time in microseconds 600
6 4 samples 700 7
0 0.5 12108
1 1.5 2 2.5 3 1614
system load

FIGURE 9: buffer inconsistency distribu- FIGURE 10: timing idle system (4 core In-
tion load sweep from 0 to 16, 2 threads vs 256 tel)
threads (24 core AMD)

Note that the idle is the worst case (as expected).


Further this code has a close to minimal loop body
thus the probability of a preemption occurring in the
time in microseconds on loaded system
critical section is very high and can be expected to be
smaller in almost ever other case. Again this is quite "8t_pWCS_load"
"8t_lock_load"
the opposite of what you have in traditional locking
where ”keep it simple” is considered best-practice - 1e+09
1e+08
with probabilistic locks increased complexity of ac- 1e+07
1e+06
100000
cess to the shared data is actually an advantage. The 10000
1000
100
more erratic the system is the lower the probability 10
1
0
of N lock-step preemptions leading to all buffers be- 1
0 100 2
ing inconsistent. 200 300 3
400 500 4 samples
600 700 5
Notably current tests have shown that larger and time in microseconds 800 900
1000 7
6

thus more complex systems perform better than sim-


pler systems - though we must note that we only had
very limited access to large systems so the tests were
FIGURE 11: timing loaded system (4 core
generally only short runs and sometimes under not
Intel)
well specified load conditions.

5.4 Performance of pW/CS Notably running the same test on a larger sys-
tem - a 24 core AMD (2 CPUs) one can clearly
The performance evaluation is quite preliminary, see that the difference between the probabilistic ap-
partially because we don’t have a sufficiently com- proach and the deterministic approach widens.

202
Real-Time Linux Concepts

- from the conceptual side we believe that this is a


time in microseconds on idle system scalable solution though.
"pWCS_p16_g8_idle.log"
"lock_p16_g8_idle.log"

5.6 Failure behavior


1e+09
1e+08
1e+07
1e+06
100000
The maybe most interesting behavior of pW/CS is
10000
1000
100 that a failure of one of the participating threads is
10
1
0
simply irrelevant the writer can at best leave one
0 100 2
1 data replica inconsistent , a reader would go entirely
3
200 300
400 4 samples
unnoticed as it does not alter the state of the shared
500 600 5
time in microseconds
700 800
900 6 object at any time.
1000 7
pW/CS obviously does not have any double lock-
FIGURE 12: timing idle system (24 core ing issues as there is no actual lock involved.
AMD)
The prototype implementation used counters to
check consistency, this is simple to implement but
time in microseconds on loaded system strictly not sufficient to guarantee correctness of the
"pWCS_p16_g8_load.log"
"lock_p16_g8_load.log"
shared object, a better solution, though somewhat
more involved computationally is to use a simple
CRC to ensure consistency - first tests are running
1e+09
1e+08
1e+07 but were not ready on time for this paper (we need
1e+06
100000
10000
1000
reasons to publish more papers on this any way...)
100
10
1
0
1
0 100
200 3
2 5.7 Testability
300 400 4 samples
500 600 5
700 800 6
time in microseconds 900 1000 7 Traditional locking needs to be tested under load
conditions, which are not only hard to generalize but
FIGURE 13: timing loaded system (24 core never can cover all possible situations. Probabilistic
AMD) approaches on the other hand can be designed to
have their worst case probability of collision in the
If one can generalize these results is though still idle system - that is a high load improves the prob-
open due to the very limited test base we are able to ability of a un-synchronized access and thus also de-
utilize for this work. creases the probability of a collision which requires a
complex synchronized access pattern. Thus in prin-
ciple probabilistic locks are fully testable by testing
5.5 Fairness on the idle system. We would like to emphasis that
we don’t yet see this proof-of-concept as verified to
Fairness of any synchronization object is a critical carry this property though we do think that it is
issue. While locks don’t exhibit much unfairness on possible to build synchronization that exhibits the
small systems (single to 4 core systems) and mild property of ”idle is the worst case”.
load scenarios, the lock-fairness can become a major
issue on 16 or 32++ way systems. Any new locking
proposal thus must exhibit fairness and scalability
(note that fairness can also be achieved ”brute-force” 6 Possible advantages
at the price of performance and/or scalability - i.e.
server type approaches). While ”deterministic” code can’t be exhaustively
tested, and it is common that while testing a number
pW/CS lock can’t lead to reader nor writer star-
of profiles, maybe for extensive periods of time, that
vation - if it does then the CPU was overloaded to
the problem surfaces in either an untested profile or
begin with - but the locking regime it self does not
simply as a matter of time.
contribute to un-fairness or starvation. with respect
to scalability we are not yet sure if we can give it a The root problem is that we can’t reliably pro-
thumbs-up, tests have been limited to a few systems duce the ”worst-case” situation on a system, not even
up to now and only one was a 16 core (nehalem) and on a fairly simple hardware/software system, thus
one 24 core AMD system, thus it is to early to say leaving the occurrence of the worst-case to the field.

203
pW/CS - Probabilistic Write / Copy-Select (Locks)

With other words the problem becomes more likely empted/blocked - the reader will always have access
with high-load situations and we can’t test all possi- to the latest consistent buffer the writer was able to
ble combinations of high-load situations. provide.
What is now behind the problem is that the race
condition becomes a rare but possible problem - a
specific global state of the system - if it occurs we fail. 6.1 Next steps
In this sense the failure is deterministic (functional
view) but its dependency is relatively complex so it The current proof-of-concept implementation is sub-
is hard to test. On the other hand the state of the optimal in that it creates a N-replica copy for each
good case (looking at the successful execution of the reader. This is an unnecessary overhead in that it
synchronization object) is well defined ”determinis- would be at most suitable to create such a ”scratch-
tic” in most states - but the state space is very large pad” per NUMA-node on a NUMA system, for non-
so it is hard to achieve coverage. All we need to do is NUMA a direct selection from the writers replicas
turn it around - make the race depend on a complex is also an option. Further on the implementation
deterministic global state and make the good case side, the currently used counters should be replaced
independent of a particular state - that is - the good by a stronger consistency check - with the increas-
case should have a very large state space coverage, ing overhead of accessing remote CPUs the overall
and the bad case a small and testable state-space. local computation that can be expended if commu-
nication can be reduced to hand-shake-free (write-
The mitigation proposed here is to use synchro- and-forget) semantics are conciderable, equally the
nization that exhibits its worst-case behavior on the expendable spatial overhead is considerable before
idle system. A probabilistic lock will have it highest approaching a lock based implementation (on a 4x4
probability of aliasing, and thus failing, in an idle system that we had access to temporarily a imple-
system, and the higher or more erratic the load situ- mentatoin using 100 replicas was still faster than a
ation is the better it gets - because the probability of locking version for a shared object of 16 bytes !).
synchronous preemption that cause the possible col- More work needs to be done to understand where the
lision decreases. Thus we regain the ability to test break-even point would lie and consequently where
synchronization potentially as we can now reliably this approach would be suitable.
provide the worst-case with one single profile - the
idle system. At the same time the good case does not The second large area of future work is in the
have a single deterministic global state constellation modeling of this concept. The current approach
but rather happens in a large number of independent of a quite brute-force implementation to get a bet-
states (that is only one register must be consistent ter understanding of the approach and its potential
from N). is hardly suitable for actual deployment in a real
system if no formal model is available for assess-
A second aspect of raciness is the temporal di- ment. Unfortunately the available models don’t fit
mension - in traditional systems one could observe the approach well. Maybe with the exception of Di-
that something that worked well for a long time sud- jkstras guarded commands and non-deterministc if
denly fails because of optimization or faster hardware construct. The only real diffference being that while
- we had the ”implicit ordering” simply by the ex- Dijkstras guarded commands evaluate the guards to
ecution flow that protected the unprotected critical determin if execution should take place, pW/CS un-
region. Now the probabilistic lock has exactly the conditionally coplies the replicated register instance
opposite qualities, the fast the system gets the lower and then uses the ”guards” to determin if the selec-
the probability of the reader not achieving a consis- tion should take place or if the replica is abandoned -
tent read before being preempted, and this also holds currently we intend to utilizing Dijkstras constructs
for optimization of compilers - so again we can test to model pW/CS.
the worst case - slow system, unoptimized code - it
can only get better for the probability of the race
condition not occurring in all N replicas of the regis-
ter set. 7 Conclusion
Finally the issues of Amdahl’s law, the longest
serialized portion of code can quickly dominate the With ever growing complexity, designing determin-
overall performance. As pW/CS has no serializa- istic while optimal systems is becoming increasingly
tion of readers and the writer there is no impact on hard (or actually impossible). In this paper we pro-
concurrent threads by individual threads being pre- pose to look at potentially capitalizing on the grow-
ing complexity rather than fighting it - by utilizing

204
Real-Time Linux Concepts

the inherent randomness of complex systems in com- August 1975 Communications of the ACM
bination with probabilistic locking methods. Volme 18 Number 8
We demonstrate the feasibility of this approach [2] Communicating Sequencial Processes, C.A.R
with an admittedly naive implementation of a criti- Hoare, August 1978 Communications of the
cal section shared between a concurrently executing ACM,Volume 21, Number 8
readers and writer of arbitrary priority. The results
indicate that with growing complexity of the system, [3] On Interprocess Communication, Leslie Lamport,
with higher system load, and with increased number December 1985 – Part I: Basic formalism, Part II:
of readers the probability of failing is reduced. Fur- Algorithms. SRI International
ther faster systems have a higher probability of suc-
cess than slower systems, and equally optimization [4] ELC: A PREEMPT RT roadmap,Jake Edge
of compiler plays to our advantage. on a talk by Thomas Gleixner, April 2011
http://lwn.net/Articles/440064/
We are aware that this is too early to call this a
sound and reliable result but the preliminary inves- [5] Linux Kernel Development (3ed Edition), Robert
tigation does indicate that the proposed path - stop Love, July 2010 Addison-Wesley
fighting complexity, use it ! - is worth investigation
in more detail. [6] migrate disable infrastructure, Peter Zijlstra, July
2011 Linux 3.0-rc7-rt0
The most notable obstacle to utilize such ap-
proaches in our opinion is the lack of appropriate [7] fasync() BKL pushdown, Jonathan Corbet, June
models for probabilistic approaches. This clearly 2008, http://lwn.net/Articles/287083/
will be our next steps in this effort to capitalize
on the inadvertable trend of growing hardware and [8] hrtimers and beyond transforma-
software complexity. Further a systematic tradeoff tion of the Linux time(r) system,
study, comparing traditional locking options in re- Thomas Gleixner, Douglas Niehaus, 2006
lation to system complexity will be on our TODO http://www.kernel.org/pub/linux/kernel/
list. people/tglx/hrtimers/ols2006-
hrtimers.pdf
The main conclusion from this work though is
simply that locking may not be the best solution for [9] Analysis of inherent randomness of the Linux
concurrent access to shared objects - rethinking the kernel, Nicholas Mc Guire, Peter Odhiambo Okech,
problem in the context of modern super-scalar mul- September 2009 DSLab Lanzhou University
ticore systems might well be worth the effort.
[10] Completely Fair Scheduler,Ingo Molnr, October
2007, Linux 2.6.23
Acknowledgment [11] LEC(TM) for Flash Memory,Lyric Semiconduc-
tor,Undated,www.lyricsemiconductor.com
We would like to thank Silicon Graphics GmbH, Ger- http://gigaom.com/2010/08/16/lyric-
many, specifically Mr. Heinz M oser for supporting semiconducto/
this research effort by providing us with suitable mul-
ticore system for development and testing. This sup- [12] sched fair.c, Ingo Molnar,Peter Zijlstra, et. al,
port allowed us to perform an initi al evaluation of linux-2.6/kernel/sched fair.c
the scalability properties of pW/CS. [13] Lockdependency Validator,Ingo Mol-
sources used for this project are available on re- nar,Arjan van de Ven, May 2006
quest under the GPL V2 from DSLab [14] and will be http://lwn.net/Articles/185605/
released as soon as they are cleaned up
[14] pWCS, gauss,c,dist.c,Nicholas Mc Guire,
2011,http://dslab.lzu.edu.cn:8080/members/
hofrat/pWCS/
References
[15] Concurrent reading while writing, Gary L. Peter-
[1] Guarded Commands, Nondeterminancy and For- son, James E. Burns, 1983. ACM Transactions on
mal Derivation of Programs, Edsger W. Dijkstra, Programming Languages and Systems

205
pW/CS - Probabilistic Write / Copy-Select (Locks)

206
Real-Time Linux Concepts

On the implementation of real-time slot-based task-splitting


scheduling algorithms for multiprocessor systems

Paulo Baltarejo Sousa


CISTER-ISEP Research Center, Polytechnic Institute of Porto
Rua Dr. António Bernardino de Almeida 431, 4200-072 PORTO, Portugal
[email protected]

Konstantinos Bletsas
CISTER-ISEP Research Center, Polytechnic Institute of Porto
Rua Dr. António Bernardino de Almeida 431, 4200-072 PORTO, Portugal
[email protected]

Eduardo Tovar
CISTER-ISEP Research Center, Polytechnic Institute of Porto
Rua Dr. António Bernardino de Almeida 431, 4200-072 PORTO, Portugal
[email protected]

Björn Andersson
Software Engineering Institute, Carnegie Mellon University
Pittsburgh, USA
[email protected]

Abstract
In this paper we discuss challenges and design principles of an implementation of slot-based task-
splitting algorithms into the Linux 2.6.34 version. We show that this kernel version is provided with
the required features for implementing such scheduling algorithms. We show that the real behavior of
the scheduling algorithm is very close to the theoretical. We run and discuss experiments on 4-core and
24-core machines.

1 Introduction partitioned and semi-partitioned.


Global scheduling algorithms store tasks in one
Nowadays, multiprocessors implemented on a single global queue, shared by all processors. Tasks can mi-
chip (called multicores) are mainstream computing grate from one processor to another; that is, a task
technology and it is expected that the number of can be preempted during its execution and resume
cores per chip continue increasing. They may provide its execution on another processor. At any moment
great computing capacity if appropriate scheduling the m (assuming m processors) highest-priority tasks
algorithms are devised. Real-time scheduling algo- are selected for execution on the m processors. Some
rithms for multiprocessors are categorized as: global, algorithms of this category can achieve a utilization

207
Real-time slot-based task-splitting scheduling algorithms for multiprocessor systems

bound of 100%, but generate too many preemptions. executes prior runtime and besides assigning tasks
to processors is also responsible for computing all
Partitioned scheduling algorithms partition the
parameters required by the dispatching algorithm.
task set and assign all tasks in a partition to the
The dispatching algorithm works over the timeslot
same processor. Hence, tasks cannot migrate be-
and selects tasks to be executed by processors.
tween processors. Such algorithms involve few pre-
emptions but their utilization bound is at most 50%. The Sporadic-EKG (S-EKG) [4] extends the pe-
riodic task set model of EKG [7] to sporadic task
Semi-partitioning (also known as task-splitting)
set models. This approach assures that the number
scheduling algorithms assign most tasks (called non-
of split tasks is bounded (there are at most m − 1
split tasks) to just one processor but some tasks
split tasks), each split task executes on only two pro-
(called split tasks) are assigned to two or more pro-
cessors and the non-split tasks execute on only one
cessors. Uniprocessor dispatchers are then used on
processor. The beginning and end of each times-
each processor but they are modified to ensure that
lot are synchronized across all processors. The end
a split task never executes on two or more processors
of a timeslot of processor p contains a reserve and
simultaneously.
the beginning of a timeslot of processor p + 1 con-
Several multiprocessor scheduling algorithms tains another reserve, and these two reserves sup-
have been implemented and tested using vanilla ply processing capacity for a split task. As non-split
Linux kernel. LitmusRT [1] provides a modu- tasks execute only on one processor they are sched-
lar framework for different scheduling algorithms uled according to the uniprocessor EDF scheduling
(global-EDF, pfair algorithms). Kato et al. [2] cre- algorithm. A detailed description of that algorithm
ated a modular framework, called RESCH, for using with an example can be found at [8].
other algorithms than LitmusRT (partitioned, semi-
While EKG versions are based on the task, the
partitioned scheduling). Faggioli et al. [3] imple-
NPS-F [5,6] uses an approach based on bins. Each
mented global-EDF in the Linux kernel and made it
bin is assigned one or more tasks and there is a one
compliant with POSIX interfaces.
to one relation between each bin and each notional
In this paper we address the Real-time TAsk- processor. Then, the notional processor schedules
Splitting scheduling algorithms (ReTAS) frame- tasks of each bin under the EDF scheduling pol-
work [11] that implements a specific type of semi- icy. The time is split into equal-duration timeslots
partitioned scheduling: slot-based task-splitting and each timeslot is composed by one or more time
multiprocessor scheduling algorithms [4, 5, 6]. Slot- reserves. Each notional processor is assigned one
based task-splitting scheduling algorithms assign reserve in one physical processor. However, up to
most tasks to just one processor and a few to only m − 1 notional processors could be assigned to two
two processors. They subdivide the time into equal- reserves, which means that these notional proces-
duration timeslots and each timeslot processor is sors are implemented upon two physical processor
composed by one or more time reserves. These re- reserves, while the remaining notional processors are
serves are used to execute tasks. Reserves used for implemented upon one physical processor reserve.
split tasks, which execute on two processors, must
There is one fundamental difference between S-
be carefully positioned to avoid overlapping in time.
EKG and NPS-F algorithms. NPS-F can potentially
The remainder of this paper is structured as fol- generate a higher number of split tasks than S-EKG.
lows. Section 2 provides a description of the main Another difference is related to the dispatching al-
features of the slot-based task-splitting scheduling al- gorithm. The S-EKG allows non-split tasks to be
gorithms. Section 3 discusses some challenges and executed on the split task reserves (in the case when
design principles to implement this kind of algo- these tasks are not ready to be executed) while NPS-
rithms. A detailed description of our implementa- F does not; that is, each notional processor executes
tion is presented in Section 4 while in Section 5 we only on its reserve(s).
discuss the discrepancy between theory and practice.
Fig. 1 shows a generic execution timeline pro-
Finally, in Section 6 conclusions are drawn.
duced by these scheduling algorithms. The time is
divided into equal-duration timeslots of length S.
Each timeslot is divided up to 3 reserves: x[p], y[p]
2 Slot-based task-splitting and N [p]. Reserve x[p] is located in the beginning of
the timeslot and is reserved to execute the task or no-
Slot-based task-splitting algorithms have two impor- tional processor split between processors p and p − 1.
tant components: (i) the task assigning; and (ii) the Reserve y[p] is located in the end of the timeslot and
dispatching algorithm. The task assigning algorithm

208
Real-Time Linux Concepts

is reserved to execute the task or notional processor its reserve on processor p, it has to immediately re-
split between processors p and p + 1. The remain- sume execution on its reserve on processor p+1. Due
ing part (N [p]) of the timeslot is used to execute to many sources of unpredictability (e.g. interrupts)
non-split tasks or notional processors that execute in a real operating system, this precision is not pos-
on only one processor. sible. Consequently, this can prevent the dispatcher
of processor p+ 1 to select the split task because pro-
cessor p has not yet relinquished that task. In order
to handle this issue, one option could be that pro-
cessor p + 1 sends an inter-processor interrupt (IPI)
to processor p to relinquish the split task, and an-
other could be that processor p + 1 sets up timer x
time units in future to force the invocation of its dis-
patcher. Two reasons have forced us to choose the
latter. First, we know that if a dispatcher has not yet
relinquished the split task it was because something
is preventing it from doing so, such as, the execu-
tion of an interrupr service routine (ISR). Second,
FIGURE 1: Execution timeline example. the use of IPIs will create some dependency between
processors that could embarrass the scalability of the
In the remainder of this paper, we will discuss the dispatcher.
implementation of S-EKG an NPS-F algorithms in
4-core and 24-core machines supported by the Linux
2.6.34 kernel version.

3 Challenges and design prin- 4 Implementation of slot-based


ciples for implementing slot- task-splitting
based task-splitting
In [9]a set of challenges and a set of design princi- 4.1 Assumptions about the architec-
ples for the S-EKG implementation were discussed. ture
However, in this paper we will implement NPS-F as
well and for this reason we will need to adapt the
We assume identical processors, which means that
design principles. In this seciton, we will do so as
(i) all processors have the same instruction set
follows: each processor should have its own runqueue
and data layout (e.g. big-endian/little-endian) and
(the queue that stores ready jobs). The runqueue of
(ii) all processors execute at the same speed.
each processor should map the ready tasks with its
reserves; that is, which ready tasks are allowed to We also assume that the execution speed of a
execute on each reserve. Since some tasks may ex- processor does not depend on activities on another
ecute on two processors, what is the best approach processor (for example whether the other processor
for that? If tasks are stored locally on each proces- is busy or idle or which task it is busy executing)
sor, whenever a task migrates from one processor to and also does not change at runtime. In practice,
another processor, it requires locking both processor this implies that (i) if the system supports simulta-
runqueues for moving that task from one runqueue to neous multithreading (Intel calls it hyperthreading)
the other runqueue. However, in the case of the NPS- then this feature must be disabled and (ii) features
F this could imply moving more than one task. Since that allow processors to change their speed (for ex-
the frequency of migration may be high, it turns out ample power and thermal management) must be dis-
that this is not the best approach; so we adopted a abled.
different approach. We defined a runqueue per no-
We assume that each processor has a local timer
tional processor so each notional processor stores all
providing two functions. One function allows read-
ready tasks assigned to it. Then, we map each no-
ing the current real-time (that is not calendar time)
tional processor to processor reserves.
as an integer. Another function makes it possible to
As it is intuitive from observing two consecutive set up the timer to generate an interrupt at x time
timeslots in Fig. 1, whenever a split task consumes units in the future, where x can be specified.

209
Real-time slot-based task-splitting scheduling algorithms for multiprocessor systems

4.2 Why vanilla Linux kernel? sidered a job. Note that the first job of each task
appears in the system at time0 + offset (time0 is
The vanilla Linux kernel 2.6.34 was chosen to imple- set equal to all tasks in the system) and the remain-
ment the scheduling algorithms S-EKG [4] and NPS- ing jobs are activated according to the period. The
F [5,6]. That kernel version provides the required delay until function sleeps a task until the absolute
mechanisms to satisfy the previously mentioned de- time specified by arrival.
sign principles: (i) each processor holds its own run-
queue and it is easy to add new fields to it; (ii) it a r r i v a l := t i m e 0 + o f f s e t ;
while ( t r u e )
has already implemented red-black trees that are bal- {
anced binary trees whose nodes are sorted by a key delay until ( arrival ) ;
execute () ;
and most the operations are done in O(log n) time; a r r i v a l := a r r i v a l + p e r i o d ;
(iii) it has the high resolution timers infrastructure }

that offers a nanosecond time unit resolution, and Listing 1: ReTAS task pseudo-algorithm.
timers can be set on a per-processor basis; (iv) it
is very simple to add new system calls and, finally, In the Linux operating system a process is an
(v) it comes with the modular scheduling infrastruc- instance of a program in execution. To manage
ture that easily enables adding a new scheduling pol- all processes, the kernel uses an instance of struct
icy to the kernel. task struct data structure for each process. In or-
der to manage ReTAS tasks, some fields were added
to the struct task struct data structure (see List-
4.3 ReTAS implementation ing 2). notional cpu id field is used to associate
the task with the notional processor. Fields cpu1
The vanilla Linux kernel 2.6.34 has three native and cpu2 are used to set the logical identifier of pro-
scheduling modules: RT (Real-Time); CFS (Com- cessor(s) in which the task will be executed on. The
pletely Fair Scheduling) and Idle. Those modules absolute deadline and also the arrival of each job
are hierarchically organized by priority in a linked are set on the deadline and arrival fields of the
list; the module with highest priority is the RT, the retas job param data structure, respectively.
one with the lowest is the Idle module. Starting with
the highest priority module, the dispatcher looks for s tr u ct r e t a s t a s k {
int n o t i o n a l c p u i d ;
a runnable task of each module in a decreasing order s tr u ct r e t a s t a s k p a r a m t a sk p a r a m {
of priority. unsigned long long d e a d l i n e ; // D i
} t a sk p a r a m ;
We added a new scheduling policy module, called s tr u ct r e t a s j o b p a r a m j o b p a r a m {
unsigned long long d e a d l i n e ; // d i j
ReTAS, on top of the native Linux module hierarchy, unsigned long long a r r i v a l ; // a i j
thus becoming the highest priority module. That } job param ;
i n t cpu1 ;
module implements the S-EKG and NPS-F schedul- i n t cpu2 ;
ing algorithms. The ReTAS implementation con- ...
};
sists on a set of modifications to the Linux 2.6.34
kernel in order to support the S-EKG and NPS-F s tr u ct t a s k s t r u c t {
...
scheduling algorithms and also the cluster version s tr u ct r e t a s t a s k r e t a s t a s k ;
of the NPS-F, called C-NPS-F [5]. These schedul- };

ing policies are identified by the SCHED S EKG and Listing 2: Fields added to struct task struct
SCHED NPS F macros. kernel data structure.
Since the assigning algorithm is executed prior
to runtime, in the next sections we will focus only on
the kernel implementation; that is, on the dispatch- 4.3.2 Notional processors
ing algorithms.
As mentioned before, ReTAS tasks are assigned to
notional processors. Therefore, notional processors
4.3.1 ReTAS tasks act as a runqueue. Each notional processor is an in-
stance of struct notional cpu data structure (see
To differentiate these tasks from other tasks present Listing 3), which is identified by a numerical iden-
in the system, we refer to these tasks as ReTAS tifier (id). Field cpu is set with the logical identi-
tasks. Listing 1 shows the pseudo-algorithm of Re- fier of the physical processor that, in a specific time
TAS tasks. They are periodic tasks and are always instant, is executing a task from that notional pro-
present in the system. Each loop iteraction is con- cessor. The purpose of the flag will be explained in

210
Real-Time Linux Concepts

Section 4.3.5. Each notional processor organizes all 4.3.4 ReTAS scheduling module
ready jobs in a red-black tree, whose root is the field
root tasks, according to the job absolute deadline. In the vanilla Linux kernel each processor holds a
The lock field is used to serialize the insertion and runqueue of all runnable tasks assigned to it. The
remotion operations over the red-black tree specially scheduling algorithm uses this runqueue to select
for notional processors that are executed by two pro- the “best” process to be executed. The information
cessors. edf field points to the task with the earli- for these processes is stored in a per-processor data
est deadline stored in the red-black tree. Note that structure called struct rq (Listing 5). Many func-
notional cpus is a vector defined as global variable. tions that compose the Linux’s modular scheduling
framework have an instance of this data structure as
s tr u ct n o t i o n a l c p u { argument. Listing 5 shows the new data structures
int id ;
a t o m i c t cpu ;
required by the ReTAS scheduling module added to
atomic t f la g ; the Linux native struct rq. The purpose of the
raw spinlock t lock ;
s tr u ct r b r o o t r o o t t a s k s ;
struct timeslot data structure was described in
s tr u ct t a s k s t r u c t ∗ e d f ; the previous section.
...
};
... s tr u ct r e t a s r q {
int p ost sc h e d u l e ;
s tr u ct n o t i o n a l c p u n o t i o n a l c p u s [ s tr u ct t i m e s l o t t i m e s l o t ;
NR NOTIONAL CPUS ] ; s tr u ct r e l e a s e r e l e a s e ;
s tr u ct r e s c h e d c p u r e s c h e d c p u ;
Listing 3: struct notional cpu data structure. };

s tr u ct r q {
...
s tr u ct r e t a s r q retas rq ;
};
4.3.3 Timeslot reserves Listing 5: struct retas rq added to struct rq.

Each processor needs to know the composition


According to the rules of the Linux’s modu-
of its timeslot. So, per-processor an instance
lar scheduling framework, each module must imple-
of the struct timeslot (Listing 4) is defined.
ment a set of functions specified in the sched class
Fields timeslot length, begin curr timeslot,
data structure. Listing 6 shows the definition
reserve length and timer are used to set the time
of retas sched class, which implements the Re-
division into time reserves. They are also used to
TAS module. The first field (next), is a pointer
identify in each time reserve a given time instant t
to the next sched class in the hierarchy. Since
falls in. When a timer expires, the timer callback
retas sched class is declared as the highest pri-
sets the current task to be preempted and this au-
ority scheduler module that field points to the
tomatically triggers the invocation of the dispacther.
rt sched class, which implements the two POSIX
And taking into account the current reserve, the dis-
real-time policies (SCHED FIFO and SCHED RR). The
pacther (that will be described on the next section)
other fields are functions that act as callbacks to spe-
tries to pick a task from either the notional proces-
cific events.
sor pointed by notional cpu (for the first option)
or notional processor pointed by alt notional cpu s t a t i c const s tr u ct s c h e d c l a s s
(for the second option). retas sched class = {
. next = &r t s c h e d c l a s s ,
. enqueue task = enqueue task retas ,
s tr u ct t i m e s l o t r e s e r v e { . dequeue task = dequeue task retas ,
s tr u ct n o t i o n a l c p u ∗ n o t i o n a l c p u ; // f i r s t . check preempt curr = check preempt curr retas
option ,
s tr u ct n o t i o n a l c p u ∗ a l t n o t i o n a l c p u ; // . pick next task = pick next task retas ,
second o p t i o n ...
unsigned long long r e s e r v e l e n g t h ; };
};
Listing 6: retas sched class scheduling class.
s tr u ct t i m e s l o t {
unsigned long long t i m e s l o t l e n g t h ;
unsigned long long b e g i n c u r r t i m e s l o t ; The enqueue task retas (Listing 7) is called
s tr u ct t i m e s l o t r e s e r v e n o t i o n a l c p u s [ whenever a ReTAS job enters into a runnable state.
NR NCPUS PER CPU ] ;
... It receives two pointers, one for the runqueue of
s tr u ct h r t i m e r t i m e r ; the processor that is running this code (rq) and an-
};
other to the task that is entering in a runnable state
Listing 4: struct timeslot data structure. (p). Then, it updates the job absolute deadline by
suming the job arrival time (this field is updated

211
Real-time slot-based task-splitting scheduling algorithms for multiprocessor systems

through the job release procedure that will be de-


s t a t i c void
scribed Section 4.3.6) to the task relative deadline, c h e c k p r e e m p t c u r r r e t a s ( s tr u ct r q ∗ rq , s tr u ct
and inserts it into the red-black tree of its notional t a s k s t r u c t ∗p , i n t sy n c )
{
processor. Additionally, it checks if this job (in the s tr u ct t a s k s t r u c t ∗ t=NULL ;
case of being a split task) could be executed by other i n t cpu ;
t=g e t e d f t a s k ( g e t c u r r e n t n o t i o n a l c p u (&rq−>
processor; that is, if it is a split task could happen retas rq . timeslot ) ) ;
that when a job is released on this processor could if (! t){
t=g e t e d f t a s k ( g e t c u r r e n t a l t n o t i o n a l c p u (&
correspond to its reserve on the other processor. If rq−>r e t a s r q . t i m e s l o t ) ) ;
that is the case, then an IPI is sent to the other pro- }
i f ( t ){
cessor, using the resched cpu function. i f ( t != rq−>c u r r ) {
cpu=i s e x e c u t i n g o n o t h e r c p u ( t−>r e t a s t a s k .
s t a t i c void n o t i o n a l c p u i d , rq−>cpu ) ;
e n q u e u e t a s k r e t a s ( s tr u ct r q ∗ rq , s tr u ct i f ( cpu==−1){
t a s k s t r u c t ∗p , i n t wakeup , b o o l f l a g ) s e t t s k n e e d r e s c h e d ( rq−>c u r r ) ;
{ }
i n t cpu ; else {
p−>r e t a s t a s k . j o b p a r a m . d e a d l i n e=p−>r e t a s t a s k s e t r e s c h e d c p u t i m e r e x p i r e s ( rq ) ;
. j o b p a r a m . a r r i v a l+p−>r e t a s t a s k . }
t a sk p a r a m . d e a d l i n e ; }
i n s e r t t a s k (& n o t i o n a l c p u s [ p−>r e t a s t a s k . }
notional cpu id ] , p) ; }
cpu=c h e c k f o r r u n n i n g o n o t h e r c p u s ( p , rq−>cpu
);
Listing 9: check preempt curr retas function.
i f ( cpu!=−1){
r e s c h e d c p u ( cpu ) ;
} The pick next task retas function selects the
return ; job to be executed by the current processor (see List-
}
ing 10). This function is called by the dispatcher
Listing 7: enqueue task retas function. whenever the currently running task is marked to
be preempted or when a task finishes its execution.
When a ReTAS job is no longer runnable, then First, it tries to get highest priority ReTAS job from
the dequeue task retas function is called that un- the first notional processor and, if there is no job it
does the work of the enqueue task retas function checks the second notional processor. If there is one
(see Listing 8); that is, it removes the task from the ReTAS job ready to be executed (and it is not the
notional processor. current executing job) then, the next step is to lock
the notional processor to that processor (this is done
s t a t i c void in the lock notional cpu function and this locking
d e q u e u e t a s k r e t a s ( s tr u ct r q ∗ rq , s tr u ct
t a s k s t r u c t ∗p , i n t s l e e p ) mechanism will be described in Section 4.3.5). If this
{ notional processor is locked it sets up a local timer to
r e m o v e t a s k (& n o t i o n a l c p u s [ p−>r e t a s t a s k .
notional cpu id ] , p) ; expire some time later and returns NULL, otherwise
return ; it returns the pointer to that job.
}

Listing 8: dequeue task retas function. s t a t i c s tr u ct t a s k s t r u c t ∗


p i c k n e x t t a s k r e t a s ( s tr u ct r q ∗ r q )
{
As the name suggests, the check preempt curr
s tr u ct t a s k s t r u c t ∗ t=NULL ;
retas function (Listing 9) checks whether the cur- i n t cpu ;
rently running task must be preempted or not. This t=g e t e d f t a s k ( g e t c u r r e n t n o t i o n a l c p u (&rq−>
retas rq . timeslot ) ) ;
function is called following the enqueuing or de- i f ( ! t ) { // i t i s assumed t h a t t h e s e t a s k s ( o f
queuing of a task and it checks if there is any a l t n o t i o n a l c p u ) e x e c u t e o n l y on t h i s cpu
t=g e t e d f t a s k ( g e t c u r r e n t a l t n o t i o n a l c p u (&
ReTAS jobs to be executed. If so, it checks if rq−>r e t a s r q . t i m e s l o t ) ) ;
that job is available; that is, if it is not being }
i f ( t ){
executed by another processor (to handle this is- cpu=l o c k n o t i o n a l c p u ( t−>r e t a s t a s k .
sue, we use the atomic t cpu field defined on n o t i o n a l c p u i d , rq−>cpu ) ;
i f ( cpu !=−1){
the struct notional cpu) it sets a flag that in- s e t r e s c h e d c p u t i m e r e x p i r e s ( rq , t−>
dicates to the dispatcher that the currently run- retas task . notional cpu id ) ;
t=NULL;
ning task must be preempted, otherwise it sets up goto p i c k r e t ;
a local timer (defined in the struct resched cpu }
}
resched cpu) to expire some time later (throught pick ret :
set resched cpu timer expires, which will trig- return t ;
}
ger, at timer expiration, the invocation of the dis-
patcher). Listing 10: pick next task retas function.

212
Real-Time Linux Concepts

4.3.5 Locking a notional processor p o s t s c h e d u l e r e t a s ( s tr u ct r q ∗ r q )


{
i n t i , ncpu=−1;
Whenever there is a shared resource there is the i f ( rq−>r e t a s r q . p o s t s c h e d u l e ) {
i f ( rq−>c u r r −>p o l i c y==SCHED S EKG | | rq−>c u r r
need to create a synchronization mechanism to se- −>p o l i c y==SCHED NPS F ) {
rialize the access to that resource. In this imple- ncpu=rq−>c u r r −>r e t a s t a s k . n o t i o n a l c p u i d ;
}
mentation a notional processor can be shared by f o r ( i =0; i <rq−>r e t a s r q . t i m e s l o t .
up two physical processors. So, to serialize the n r n o t i o n a l c p u s ; i ++){
i f ( l i k e l y ( rq−>r e t a s r q . t i m e s l o t .
access to those notional processors two functions notional cpus [ i ] . notional cpu ) ){
are used (see Listing 11): lock notional cpu and i f ( rq−>r e t a s r q . t i m e s l o t . n o t i o n a l c p u s [ i ] .
n o t i o n a l c p u −>i d != ncpu ) {
unlock notional cpu. The locking is done in the u n l o c k n o t i o n a l c p u ( rq−>r e t a s r q . t i m e s l o t .
pick next task retas function. As it can be seen n o t i o n a l c p u s [ i ] . n o t i o n a l c p u −>i d , rq−>
cpu ) ;
from Listing 11, first it identifies the physical proces- }
sor that is executing tasks from that notional proces- }
}
sor. Next, it tries to add one to the atomic t flag }
variable using the atomic add unless kernel func- rq−>r e t a s r q . p o s t s c h e d u l e = 0 ;
}
tion. But this operation only succeds if the value is
not one, that is, if it is zero, otherwise, it fails. In the Listing 12: post schedule retas function.
first case, success, it atomically adds one to the flag
and sets the cpu variable with the logical identifier s t a t i c i n l i n e void
of the current processor. And this way it locks the c o n t e x t s w i t c h ( s tr u ct r q ∗ rq , s tr u ct
t a s k s t r u c t ∗ prev ,
notional processor to that processor. In the second s tr u ct t a s k s t r u c t ∗ n e x t )
case, failure, the notional processor could be locked {
...
by other processor or by the current processor. If it i f ( prev−>p o l i c y==SCHED S EKG | | prev−>p o l i c y
is locked by the current processor nothing changes, ==SCHED NPS F | |
next−>p o l i c y==SCHED S EKG | | next−>p o l i c y==
otherwise the logical number of the processor is re- SCHED NPS F )
turned and the dispatcher cannot pick any job from rq−>r e t a s r q . p o s t s c h e d u l e = 1 ;
...
this notional processor. s w i t c h t o ( prev , next , p r e v ) ;
...
In order to unlock the notional processor, when- }
ever a ReTAS job is the prev or the next task in the
a s m l i n k a g e void
context of the context swicth function, which is in- s c h e d s c h e d u l e ( void )
vocated by schedule function (see Listing 13), it will {
s tr u ct t a s k s t r u c t ∗ prev , ∗ n e x t ;
enforce the execution of the post schedule retas s tr u ct r q ∗ r q ;
function (see Listing 12) to unlock the notional pro- ...
p u t p r e v t a s k ( rq , p r e v ) ;
cessor. Unlocking means setting the flag variable next = p i c k n e x t t a s k ( rq ) ;
equal to zero and the cpu variable equal to −1. ...
c o n t e x t s w i t c h ( rq , prev , n e x t ) ; /∗ u n l o c k s
t h e r q ∗/
i n t l o c k n o t i o n a l c p u ( i n t ncpu , i n t cpu ) ...
{ p o s t s c h e d u l e r e t a s ( rq ) ;
i n t r e t=a t o m i c r e a d (& n o t i o n a l c p u s [ ncpu ] . cpu ) ; ...
i f ( a t o m i c a d d u n l e s s (& n o t i o n a l c p u s [ ncpu ] . f l a g }
,1 ,1) ){
a t o m i c s e t (& n o t i o n a l c p u s [ ncpu ] . cpu , cpu ) ; Listing 13: context swicth and schedule
r e t=cpu ;
}
functions.
i f ( r e t==cpu )
r e t =−1;
return r e t ;
} 4.3.6 Job release mechanism
void u n l o c k n o t i o n a l c p u ( i n t ncpu , i n t cpu )
{
int x ;
The job release mechanism is supported by the
x=a t o m i c r e a d (& n o t i o n a l c p u s [ ncpu ] . cpu ) ; struct release and is set per-processor. It is com-
i f ( x==cpu ) {
a t o m i c s e t (& n o t i o n a l c p u s [ ncpu ] . cpu , −1) ;
posed by a red-black tree and a timer. The idea is the
a t o m i c s e t (& n o t i o n a l c p u s [ ncpu ] . f l a g , 0 ) ; following: a job is put in the waiting state, next, it
}
}
is inserted into a red-black tree ordered by the abso-
lute arrival time and, finally, a timer is set to expire
Listing 11: Lock and unlock notional processor at the earliest arrival time of all jobs stored into the
functions.
red-black tree. When the timer expires, the job with
earliest arrival time is removed from the red-black
void tree and its state changes to running, consequently,

213
Real-time slot-based task-splitting scheduling algorithms for multiprocessor systems

it becomes ready. The timer is armed to expire at


the earliest arrival time of remaining jobs stored into
the red-black tree. This procedure is triggered by the
delay until system call that specifies the absolute
arrival time of a job. One feature of this mechanism
is that the next release of a job is done on the pro-
cessor where it finishes its execution; that is, where
it executes the delay until system call.

5 From theory to practice

Usually, real-time scheduling algorithms for multi-


processor systems are supported by a set of assump- FIGURE 2: Reserve jitter.
tions that have no correspondence in the real-world.
The evaluation of the discrepancy between theory In theory it is typically assumed that a release
and practice is here addressed taking into account of a job is instantaneous and becomes ready immedi-
two real-world phenomena: jitter and overhead. For ately. In practice, however, something very different
convenience we define jitter as being the time from is observed. RelJi,k (release jitter of job τi,k ) de-
when an event must occur until it actually occurs. notes the difference in time from when the job τi,k
We have identified three sources of jitter: reserve, should arrive until it is inserted in the ready queue
release and context switch. We also define overhead (see Fig. 3). The following steps take place in prac-
as the time that the current executing task is blocked tice. First, a timer expires to wake up the job and,
by something else not related to the scheduling algo- as mentioned before, there is always a drift between
rithm. We have identified two sources for the over- the time instant when the timer should expire and
head: interrupts and release. when it actually expires. Next, the job is removed
from the release queue and inserted into the ready
queue.

5.1 Sources of jitter

Theoretically, when a reserve ends one assumes that


jobs are instantaneously switched, but in practice
this is not true. Fig. 2 shows ResJi,k , which repre-
sents the measured reserve jitter of job τi,k and de-
notes the discrepancy between the time when the job
τi,k should (re)start executing (at the beginning of
the reserve A, where A could be x, N or y) and when
it actually (re)starts. It should be mentioned that
the timers are set up to fire when the reserve should
begin, but, unfortunately, there is always a drift be-
tween this time and the time instant at which the
timer fires. Then, the timer callback executes and,
in most cases, sets the current task to be preempted FIGURE 3: Release jitter.
triggering the invocation of the dispatcher. The dis-
patcher selects a task according to the dispatching A final important source of jitter is the con-
algorithm and switches the current task with the se- text switching. A context switch is the procedure
lected task. for switching processor from one job to another.

214
Real-Time Linux Concepts

In theory, the time required to switch jobs is usu-


ally neglected. However, switching from one pro-
cess to another requires a certain amount of time
for saving and loading registers, among other oper-
ations. CtswJi,k (context switch jitter of job τi,k )
denotes the difference in time from when the job τi,k
should start executing until it actually (re)starts (see
Fig. 4). Note that, we consider the context switch
time incurred by the EDF scheduling policy, because
the context switch time incurred by reserves are ac-
counted for by ResJ.

FIGURE 5: Interrupt overhead.

FIGURE 4: Context switch jitter.


FIGURE 6: Release overhead.

5.3 Experimental setup

5.2 Sources of overhead In order to evaluate the discrepancy between theory


and practice, we have conducted a range of exper-
iments with 4-core (Intel(R) Core(TM) i7 CPU @
Usually, in theory, we ignore interrupts, but in prac- 2.67GHz) and 24-core (AMD OPteron (TM) Proces-
tice they are one of main sources of timing unpre- sor 6168 @ 1.90GHz) machines with real-time tasks
dictability in this kind of system. In practice, when executing empty for-loops. In order to make the en-
an interrupt arises the processor suspends the execu- vironment more controlled, we (i) set runlevel to 1
tion of the current job in order to execute the asso- (that is no windowing system), (ii) disabled network
ciated ISR. IntOi,k (interrupt overhead of job τi,k ) interface and also the filesystem journal mechanism,
denotes the time during which job τi,k is prevented (iii) all interrupts are handled by the last processor
from executing due to the execution of an ISR (see and (iv) setup one non real-time task per core, as
Fig. 5). a background task, to ensure that the kernel idle
threads never start executing.
When a job is executing on a processor, jobs of
other tasks could appear in the system. In our imple- We generated 17 random task sets. The period
mentation, to release a job, the processor stops what and utilization of tasks varied from 5 ms up to 50 ms
it is doing to release that job. RelOi,k (release over- and from 0.1 up to 1, respectively. The number of
head of job τi,k ) denotes the time during which job tasks varied from 6 up to 28 tasks in the case of the
τi,k is prevented from executing due to the releases 4-core machine. The time duration of each experi-
of other jobs (see Fig. 6). ment was approximately 1000 s. All task sets were

215
Real-time slot-based task-splitting scheduling algorithms for multiprocessor systems

scheduled using the S-EKG and NPS-F scheduling al- observed shows that something prevented the release
gorithms (which have comparable jitter/overheads). mechanism of doing that job release. The reason for
Since each experiment took 1000 s, the whole set of this is related to the unprectability of the underlying
experiments took 34000 s. operating system. There are many sources of un-
predictability in a Linux kernel: (i) interrupts are
the events with the highest priority, consequently
5.4 Discussion of results when one arises, the processor execution switches
to handle the interrupt (usually interrupts arise in
We collected the maximum values observed for each an unpredictable fashion); (ii) on Symmetric Multi
type of jitter and also for each type of overhead ir- Processing (SMP) systems there are multiple kernel
respective of the algorithm used (S-EKG or NPS-F). threads running on different processors in parallel,
Table 1 presents the experimental results for: reserve and those can simultaneously operate on shared ker-
jitter (ResJ), release jitter (RelJ), context switch nel data structures requiring serialization on access
jitter (CtswJ), the overhead of interrupt 20 (related to such data; (iii) disabling and enabling preemption
to the hard disk) and the overhead of tick. Note features used in many parts of the kernel code can
that, we do not directly present the release overhead. postpone some scheduling decisions; (iv) the high
Rather, since, the release overhead is part of what is resolution timer infrastructure is based on local Ad-
experienced as release jitter, we simply present the vanced Programmable Interrupt Controller (APIC),
worst-case RelJ (which also accounts for RelO). The disabling and enabling local interrupts can disrupt
column identified with MAXmax gives the maximum the precision of that timer and, finally, (v) the hard-
value observed in all experiments. The third col- ware that Linux typically runs on does not provide
umn (AVGτi ) gives the average value experimented the necessary determinism, which would permit the
by the task that experienced the MAXmax value. The timing behavior of the system to be predictable with
fourth column (MINmax ) gives the minimum of the all latencies being time-bounded and known prior to
collected values (note that is the minimum of the run-time.
maximum values). The last column displays the av- The same reasons could be given to explain the
erage value of the task that experienced the MINmax MAXmax CtswJ value. However, usually the mag-
value. Before analyzing the results, we draw the at- nitude of CtswJ is not too high, because this oper-
tention of the reader to the time unit, µs, which ation is done by the scheduler, which executes in a
means that the impact of those jitterrs/overheads is controlled context.
relatively small. Recall that the period of tasks in
the various experiments varied from 5 ms up to 50 In Table 1, we present the overhead results of two
ms. ISRs: irq20 and tick (tick is a periodic timer inter-
rupt used by the system to do a set of operations,like
The highest ResJ values were constantly experi- for instance invoking the scheduler). The reason for
enced by split tasks. This is due to the task migra- this is, irq20 can be configured to be executed by one
tion mechanism required for split tasks (described in specific processor but tick cannot. In our opinion, it
Section 4). In that mechanism, if a task is not avail- does not make sense to present other values besides
able, a timer is set to expire some time later. The MAXmax , because in these experiments this is a spo-
value chosen for this delay was 5 µs. radic and rare event. In contrast, tick is periodic
The MAXmax RelJ value is too high (31.834 µs), with a frequency of approximately 1 ms. The values
but comparing both AVGτi (0.329 µs and 0.369 µs) observed show that this overhead is very small.

MAXmax (µs) AVGτi (µs) MINmax (µs) AVGτi (µs)


ResJ 8.824 5.255 7.919 5.833
RelJ 31.834 0.329 10.029 0.369
CtswJ 2.218 0.424 0.587 0.305
IntO - irq20 24.226 - - -
IntO - tick 0.571 0.272 0.408 0.243

TABLE 1: Experimental results (4-core


machine).

216
Real-Time Linux Concepts

MAXmax (µs) AVGτi (µs) MINmax (µs) AVGτi (µs)


ResJ 48.087 8.493 19.243 9.498
RelJ 37.264 0.731 11.609 0.848

TABLE 2: Experimental results (24-core


machine).
Because of space limitations, in this paper our References
analysis is focused on 4-core machine results, in [10]
some results of a set of experiments with the 24-core [1] J. M. Calandrino, H. Leontyev, A. Block, U.
machine are presented. Nevertheless, Table 2 shows
C. Devi, and J. H. Anderson, LITMUSRT :
some results related to the ResJ and RelJ on the 24-
A Testbed for Empirically Comparing Real-
core machine. The explanation for the MAXmax val-
Time Multiprocessor Schedulers, in proceedings
ues is the same that was given for the 4-core machine
of 27th IEEE Real-Time Systems Symposium
results. The AVGτi values are in all cases higher than
(RTSS 06), Rio de Janeiro, Brazil, pp. 111–126,
those for the 4-core machines. This is due to the
2006.
speed of the processors: 4-core processors operate at
2.67GHz while 24-core processors operate at 1.9GHz. [2] S. Kato and R. Rajkumar and Y. Ishikawa,
A Loadable Real-Time Scheduler Suite for
Multicore Platforms, in Technical Report
CMU-ECE-TR09-12, 2009. Available on-
6 Conclusions line:http://www.ece.cmu.edu/˜shinpei/papers
/techrep09.pdf.
We have presented the Real-time TAsk-Splitting
scheduling algorithms (ReTAS) framework that [3] D. Faggioli and M. Trimarchi and F. Checconi
implements S-EKG and NPS-F slot-based task- and C. Scordino, An EDF scheduling class for
splitting scheduling algorithms. The main purpose the Linux kernel, in proceedings of 11th Real-
of this framework is to show that slot-based task- Time Linux Workshop (RTLWS 09), Dresden,
splitting scheduling algorithms can be implemented Germany, pp. 197–204, 2009.
in a real-operating system (using the vanilla Linux
kernel) and work in practice. Using this frame- [4] B. Andersson and K. Bletsas, Sporadic Multi-
work we have identified and measured the the real- processor Scheduling with Few Preemptions, in
operating system jitters and overheads. In spite proceedings of 20th Euromicro Conference on
of the unpredictability of the Linux kernel we ob- Real-Time Systems (ECRTS 08), Prague, Czech
served a good correspondence between theory and Republic, pp. 243–252, 2008.
practice. These good results are due to: (i) the con-
[5] K. Bletsas and B. Andersson, Notional proces-
trolled experimental environment; (ii) the use of the
sors: an approach for multiprocessor schedul-
local high-resolution timers and (iii) the fact that
ing, in proceedings of 15th IEEE Real-Time and
these scheduling algorithms only involve very limited
Embedded Technology and Applications Sym-
synchronization on shared system resources between
posium (RTAS 09), San Francisco, CA, USA,
processors.
pp. 3–12, 2009.

[6] K. Bletsas and B. Andersson, Preemption-light


Acknowledgements multiprocessor scheduling of sporadic tasks with
high utilisation bound, in proceedings of 30th
IEEE Real-Time Systems Symposium (RTSS
This work was supported by the CISTER Research 09), Washington, DC, USA, pp. 385–394, 2009.
Unit (608 FCT) and also by the REHEAT project,
ref. FCOMP-01-0124-FEDER-010045 funded by [7] B. Andersson and E. Tovar, Multiprocessor
FEDER funds through COMPETE (POFC - Oper- Scheduling with Few Preemption, in proceedings
ational Programme ’Thematic Factors of Competi- of 12th IEEE International Conference on Em-
tiveness) and by National Funds (PT), through the bedded and Real-Time Computing Systems and
FCT - Portuguese Foundation for Science and Tech- Application (RTCSA 06), Sydney, Australia,
nology. pp. 322–334, 2006.

217
Real-time slot-based task-splitting scheduling algorithms for multiprocessor systems

[8] P. B. Sousa and B. Andersson and E. Tovar, Im- Diego, CA, USA, 2010. Available online:
plementing Slot-Based Task-Splitting Multipro- http://cse.unl.edu/rtss2008/archive/rtss2010/
cessor Scheduling, in proceedings of 6th IEEE WIP2010/5.pdf
International Symposium on Industrial Embed-
ded Systems (SIES 11), Västerås, Sweden, pp. [10] P. B. Sousa K. Bletsas and E. Tovar and B.
256–265, 2011. Andersson, On the implementation of real-time
slot-based task-splitting scheduling algorithms
[9] P. B. Sousa and B. Andersson and E. To- for multiprocessor systems, (extended version of
var, Challenges and Design Principles for this paper) in Technical Report HURRAY-TR-
Implementing Slot-Based Task-Splitting Multi- 110903, 2011.
processor Scheduling, in Work in Progress
(WiP) session of the 31st IEEE Real- [11] P. B. Sousa, ReTAS. Available online:
Time Systems Symposium (RTSS 10), San http://webpages.cister.isep.ipp.pt/˜pbsousa/retas/.

218
Real-Time Linux Concepts

Experience with Sporadic Server Scheduling in Linux: Theory vs.


Practice

Mark J. Stanovich, Theodore P. Baker∗, An-I Andy Wang


Florida State University
Department of Computer Science, Florida, USA
{stanovic,baker,awang}@cs.fsu.edu

Abstract

Real-time aperiodic server algorithms were originally devised to schedule the execution of threads
that serve a stream of jobs whose arrival and execution times are not known a priori, in a way that
supports schedulability analysis. Well-known examples of such algorithms include the periodic polling
server, deferrable server, sporadic server, and constant bandwidth server.
The primary goal of an aperiodic-server scheduling algorithm is to enforce a demand bound for each
thread - that is, an upper bound on the amount of CPU time a thread may compete for in a given time
interval. Bounding the demand of a given thread limits the interference that thread can inflict on other
threads in the system experience in the competition for CPU time. Isolating the CPU-time demands
of threads, known as temporal isolation, is an essential requirement for guaranteed resource reservations
and compositional schedulability analysis in open real-time systems. A secondary goal of an aperiodic
server is to minimize the worst-case and/or average response time while enforcing the demand bound.
The theoretical aperiodic server algorithms meet both goals to varying degrees.
An implementation of an aperiodic server can yield performance significantly worse than its theoretical
counterpart. Average response time is often higher, and even temporal isolation may not be enforced due
to factors not found or considered in the theoretical algorithm. These factors include context-switching
overheads, imprecise clocks and timers, preemption delays (e.g., overruns), and limits on storage available
for bookkeeping.
This paper reports our experience implementing, in Linux, variations of the sporadic-server scheduling
algorithm, originally proposed by Sprunt, Sha, and Lehoczky. We chose to work with sporadic-server
scheduling because it fits into the traditional Unix priority model, and is the only scheduling policy
recognized by the Unix/POSIX standard that enforces temporal isolation. While this paper only considers
sporadic server, some lessons learned extend to other aperiodic servers including those based on deadline
scheduling.
Through our experience, we show that an implemented sporadic server can perform worse than less
complex aperiodic servers such as the polling server. In particular, we demonstrate the effects of an
implementation’s inability to divide CPU time into infinitely small slices and to use them with no overhead.
We then propose and demonstrate techniques that bring the performance closer to that of the theoretical
sporadic-server algorithm. Our solutions are guided by two objectives. The primary objective is that the
server enforce an upper bound on the CPU time demanded. The secondary objective is that the server
provide low average-case response time while adhering to the server’s CPU demand bound. In order to
meet these objectives, our solutions restrict the degree to which the server’s total CPU demand can be
divided. Additionally, we provide mechanisms to increase the server’s ability to provide more continuous
allocations of CPU demand.
Through a network packet service example, we show that sporadic server can be effectively used to
bound CPU demand. Further, the efficiency of jobs served by sporadic server can be improved in terms
of both reduced average-case response time and increased throughput.

∗ Dr. Baker’s contributions to this paper are based on work supported by the National Science Foundation, while working at

the Foundation.

219
Experience with Sporadic Server Scheduling in Linux: Theory vs. Practice

1 Introduction workload and processing resources. The theory can


guarantee that a set of timing constraints will always
The roots of this paper are in experiments we did be satisfied, but only if an actual system conforms to
in 2007 on trying to schedule Linux device-driver the abstract models on which the analysis is based.
execution in a way that conforms to an analyz- Real-time operating systems provide a run-time
able real-time scheduling model[3]. We found that platform for real-time applications, including the
the Unix SCHED SPORADIC scheduling policy is mechanisms and services that schedule the execu-
a potential improvement over SCHED FIFO at any tion of tasks on the processing unit(s). For timing
constant scheduling priority. Then, in subsequent guarantees based on real-time scheduling theory to
studies we discovered that we needed to correct apply to an application implemented using an oper-
some technical defects in the POSIX definition of ating system, there must be a close correspondence
SCHED SPORADIC, which are reported in [5]. The between the virtual execution platform provided by
paper describes our more recent efforts to deal with the OS and the abstract models and scheduling al-
another phenomenon, having to do with preemption gorithms of the theory. The burden of achieving this
overhead and trade-offs between server throughput, correspondence falls on the OS and application de-
server response time, and the ability to guarantee velopers.
deadlines of other real-time tasks.
The OS must provide mechanisms that allow de-
In a broader sense, this paper is about narrowing velopment of applications that conform to the ab-
a gap that has developed between real-time operat- stract models of the theory within bounded toler-
ing systems and real-time scheduling theory. While ances. In the case of a general-purpose operating
a great deal is known about real-time scheduling in system that supports the concept of open systems,
theory, very little of the theory can be applied in such as Linux, the OS must go further, to provide
current operating systems. We feel that closer in- firewall-like mechanisms that preserve conformance
tegration of operating systems implementation and to the models when independently developed appli-
scheduling theory is needed to reach a point where cations or components run alongside one another.
one can build open systems that reliably meet real-
In real-time scheduling theory the arrival of a re-
time requirements.
quest for some amount of work is known as a job, and
After some review of real-time scheduling the- a logical stream of jobs is called a task. Some jobs
ory and aperiodic servers, we discuss our experi- have deadlines. The goal is to find a way to schedule
ences with implementing sporadic server scheduling, all jobs in a way that one can prove that hard dead-
the problem of properly handling preemption over- lines will always be met, soft deadlines will be met
head, and how we addressed the problem. We com- within a tolerance by some measure, and all tasks
pare the performance of serving aperiodic workload are able to make some progress at some known rate.
by a polling server, a sporadic server, and a hybrid To succeed, the theory must make some assumptions
polling-and-sporadic server, using our implementa- about the underlying computer platform and about
tion of the three scheduling algorithms in Linux. We the workload, i.e. the times at which jobs may arrive
conclude with a brief discussion of lessons learned and how long it takes to execute them.
and some further work.
The best-behaved and best understood task
model is a periodic task, whose jobs have a known
worst-case execution time (WCET) and a known
2 Background fixed separation between every pair of consecutive
jobs, called the period. A periodic task also has an
associated deadline, the point in time, relative to the
Any implementor of real-time operating systems
arrival of a job, by which the job must complete exe-
needs to understand the basics of real-time schedul-
cution. These workload parameters, along with oth-
ing theory, in order to understand the implications
ers, can be used to determine whether all jobs can
of implementation decisions. While this paper is not
meet their timing constraints if executed according
primarily about scheduling theory, we try to estab-
to certain scheduling algorithms, including strict pre-
lish some theoretical background as motivation for
emptive fixed-task priority scheduling.
our discussion of implementation issues.
A key concept in the analysis of preemptive
Real-time scheduling theory provides analysis
scheduling is interference. The nominal WCET of a
techniques that can be used to design a system to
job is based on the assumption that it is able to run
meet timing constraints. The analyses are based on
to completion (i.e., until the corresponding thread
abstract scheduling algorithms and formal models of

220
Real-Time Linux Concepts

suspends itself) without interference from jobs of to gather information. While it is desirable to re-
any other task. Showing that a job can complete ceive all network packets, missing a few packets is
within a given time window in the presence of other not catastrophic. The difficulty lies in that the net-
tasks amounts to bounding the amount of proces- work receive path is shared by other tasks on the sys-
sor time the other tasks can steal from it over that tem, some with different deadlines and others with
interval, and then showing that this worst-case inter- no explicit deadlines.
ference leaves enough time for the job to complete.
Assuming a fixed-task-priority model, a prior-
The usual form of interference is preemption by a
ity must be chosen for the bottom level of network
higher priority task. However, lower priority tasks
packet service. Processing the packets at a low or
can also cause interference, which is called priority
background priority does not work well because pro-
inversion or preemption delay. Preemption delays
cessing the packets may be delayed arbitrarily. Ex-
may be caused by critical sections, imprecision in
tended delay in network packet processing means
the OS timer mechanism, or any other failure of the
that a real-time task waiting for the packets may miss
kernel to adhere consistently to the preemptive fixed-
an unacceptably large number of packets. Another
priority scheduling model.
option is to schedule the network packet processing
A system that supports the UNIX real-time API at a high priority. However, the network packet pro-
permits construction of threads that behave like a pe- cessing now can take an unbounded amount of CPU
riodic task. The clock nanosleep() function is one of time, potentially starving other tasks on the system
several that provide a mechanism for suspending ex- and thereby causing missed deadlines. Therefore, a
ecution between one period and the next. Using the scheduling scheme is needed that provides some high-
sched setscheduler() function the application can re- priority time to serve the aperiodic jobs; however, the
quest the SCHED FIFO policy, and assign a priority. high-priority time should be limited, preventing the
By doing this for a collection of periodic tasks, and packet processing from monopolizing the CPU. The
choosing priorities sufficiently high to preempt all bound on CPU time ensures other tasks have access
other threads,1 one should be able to develop an ap- to the CPU in a timely manner.
plication that conforms closely enough to the model
The key to extending analysis techniques devel-
of periodic tasks and fixed task-priority preemptive
oped for periodic tasks to this broader class of work-
scheduling to guarantee the actual tasks meet dead-
loads is to ration processor time. It must be pos-
lines within some bounded tolerance.
Unfortunately, that is not enough. To support a reasonable range of real-time applications one needs to be able to handle a wider range of tasks. For example, a task may request CPU time periodically but the execution time requested may not be bounded, or the arrival of work may not be periodic. If such a task has high enough priority, the interference it can cause for other tasks may be unpredictable or even unbounded, causing other tasks to miss deadlines.

Aperiodic tasks typically have performance requirements that are soft, meaning that if there is a deadline it is stochastic, or occasional deadline misses can be tolerated, or under temporary overload conditions load shedding may be acceptable. So, while the CPU time allocated to the service of aperiodic tasks should be bounded to bound worst-case interference for other tasks, it should be provided in a way that allows the aperiodic task to achieve fast average response time under expected normal circumstances.

One example of an aperiodic task that requires fast average response time can be found in the paper by Lewandowski et al. [3]. In this paper, a real-time task uses the network in its time-critical path to gather information. While it is desirable to receive all network packets, missing a few packets is not catastrophic. The difficulty lies in that the network receive path is shared by other tasks on the system, some with different deadlines and others with no explicit deadlines.

Assuming a fixed-task-priority model, a priority must be chosen for the bottom level of network packet service. Processing the packets at a low or background priority does not work well because processing the packets may be delayed arbitrarily. Extended delay in network packet processing means that a real-time task waiting for the packets may miss an unacceptably large number of packets. Another option is to schedule the network packet processing at a high priority. However, the network packet processing now can take an unbounded amount of CPU time, potentially starving other tasks on the system and thereby causing missed deadlines. Therefore, a scheduling scheme is needed that provides some high-priority time to serve the aperiodic jobs; however, the high-priority time should be limited, preventing the packet processing from monopolizing the CPU. The bound on CPU time ensures other tasks have access to the CPU in a timely manner.

The key to extending analysis techniques developed for periodic tasks to this broader class of workloads is to ration processor time. It must be possible to force even an uncooperative thread to be scheduled in a way that the worst-case interference it causes other tasks can be modeled by the worst-case behavior of some periodic task. A number of scheduling algorithms that accomplish this have been studied, which we refer to collectively as aperiodic servers.

Examples of well-known aperiodic-server scheduling algorithms for use in a fixed-task-priority scheduling environment include the polling and deferrable servers [18], and the sporadic server [2]. There are also several examples for use with deadline scheduling, among which the constant bandwidth server has received considerable attention [17].

All these algorithms bound the amount of CPU time an aperiodic task receives in any time interval, which bounds the amount of interference it can cause other tasks, guaranteeing the other tasks are left a predictable minimum supply of CPU time. That is, aperiodic servers actively enforce temporal isolation, which is essential for an open real-time execution platform.

1 Of course, careful attention must be given to other details, such as handling critical sections.


The importance of aperiodic servers extends beyond the scheduling of aperiodic tasks. Even the scheduling of periodic tasks may benefit from the temporal isolation property.2 Aperiodic-server scheduling algorithms have been the basis for a rather extensive body of work on open real-time systems, appearing sometimes under the names virtual processor, hierarchical, or compositional scheduling. For example, see [4, 9, 10, 11, 12, 13, 14].

In this paper, we limit attention to a fixed-task-priority scheduling environment, with particular attention to sporadic-server scheduling. The primary reason is that Linux for the most part adheres to the UNIX standard and therefore supports fixed-task-priority scheduling. Among the well-known fixed-task-priority aperiodic-server scheduling algorithms, sporadic-server scheduling is theoretically the best. It also happens to be the only form of aperiodic-server scheduling that is recognized in the UNIX standard.

A polling server is one way of scheduling aperiodic workloads. The polling server is a natural extension to the execution pattern of a periodic task. Using a polling server, queued jobs are provided CPU time based on the polling server's budget, which is replenished periodically. If no work is available when the polling server is given its periodic allocation of CPU time, the server immediately loses its budget. Similarly, if the budget is partially used, and no jobs are queued, the polling server gives up the remainder of the budget.3

FIGURE 1: Example usage and replenishment of sporadic server's budget.

A sporadic server is a thread that is scheduled according to one of the variants of the original sporadic server algorithm introduced by Sprunt, Sha, and Lehoczky [2]. While many variants exist, the basic idea is the same. A sporadic server has a budget, replenishment period, and scheduling priority. When the sporadic server uses the CPU, the amount of time used is deducted from its budget. The amount of CPU time consumed is restored to the budget one replenishment period in the future, starting from the instant when the sporadic server requested CPU time and had budget. The operation to restore the budget at a given time in the future, based on the amount of time consumed, is known as a replenishment. Once the server uses all of its budget, it can no longer compete for CPU time at its scheduling priority.4

The objective of the sporadic-server scheduling algorithm is to limit worst-case system behavior such that the server's operation can be modeled, for schedulability analysis of other tasks, as if it were a periodic task. That is, in any given sliding time window, the sporadic server will not demand more CPU time than could be demanded by a periodic task with the same period and budget. A secondary goal of the sporadic server is to provide fast average response time for its jobs.
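As an illustration only (not the implementation evaluated in this paper), the bookkeeping just described can be sketched in C roughly as follows; the constant MAX_REPL, the field names, and the helper function are our own placeholders, and overrun handling and the subtleties noted in footnote 4 are omitted.

#define MAX_REPL 100                     /* capacity of the replenishment list (placeholder) */

struct replenishment {
        long long when_ns;               /* time at which this amount of budget is restored */
        long long amount_ns;             /* amount of budget to restore */
};

struct sporadic_server {
        long long budget_ns;             /* currently available execution budget */
        long long period_ns;             /* replenishment period */
        long long activation_ns;         /* instant the server became eligible with budget */
        int       max_repl;              /* current limit on pending replenishments */
        int       nr_pending;
        struct replenishment pending[MAX_REPL];
};

/* Called when the server stops executing after consuming used_ns of CPU time:
 * deduct the consumption and schedule its return one period after the activation instant. */
static void account_usage(struct sporadic_server *ss, long long used_ns)
{
        ss->budget_ns -= used_ns;
        if (ss->nr_pending < ss->max_repl) {
                ss->pending[ss->nr_pending].when_ns   = ss->activation_ns + ss->period_ns;
                ss->pending[ss->nr_pending].amount_ns = used_ns;
                ss->nr_pending++;
        }
        /* With no budget left, the server no longer competes at its real-time priority. */
}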
With regard to minimizing average response time, a sporadic server generally outperforms a polling server. The advantage with a sporadic server is that jobs can often be served immediately upon arrival, whereas with a polling server jobs will generally have to wait until the next period to receive CPU time. Imagine a job arrival that happens immediately after the polling server's period. The job must wait until the following period to begin service, since the polling server immediately forfeits its budget if there are no jobs available to execute. A sporadic server, on the other hand, can execute the job immediately given that its budget can be retained when the server's queue is empty. The ability to retain budget allows the server to execute more than once during its period, serving multiple jobs as they arrive. Aperiodic servers that can hold on to their budget until needed are known as bandwidth-preserving servers.

2 Even nominally periodic tasks may be subject to faults that cause them to over-run their predicted WCET.
3 A polling server cannot be implemented as a SCHED_FIFO periodic task, because there is no enforcement of the processor time budget.
4 This does not describe the original Sporadic Server algorithm completely, nor does it address a subtle defect in the original algorithm which was corrected in subsequent work. Further, there are many sporadic-server variants, each with their own nuances. These details are omitted to simplify the discussion.

3 Implementation

Several variants of the original sporadic-server algorithm have been proposed, including the POSIX SCHED_SPORADIC [7], and more recently two variants that correct defects in the POSIX version [5, 8]. Differences include how they handle implementation constraints such as limited space to store replenishment operations, overruns, and preemption costs.
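For reference, the POSIX formulation exposes exactly these parameters through struct sched_param. The sketch below shows how a thread's attributes could be filled in where the optional SCHED_SPORADIC policy is available (it is not provided by mainline Linux); the numeric values simply mirror the budget, period, and replenishment limit used in the experiments later in this paper.

#include <pthread.h>

#ifdef _POSIX_SPORADIC_SERVER
static void init_sporadic_attr(pthread_attr_t *attr)
{
        struct sched_param sp = { 0 };

        sp.sched_priority = 80;                       /* server priority (placeholder) */
        sp.sched_ss_low_priority = 10;                /* priority used once the budget is exhausted */
        sp.sched_ss_init_budget.tv_sec = 0;
        sp.sched_ss_init_budget.tv_nsec = 1000000;    /* 1 ms budget */
        sp.sched_ss_repl_period.tv_sec = 0;
        sp.sched_ss_repl_period.tv_nsec = 10000000;   /* 10 ms replenishment period */
        sp.sched_ss_max_repl = 100;                   /* limit on pending replenishments */

        pthread_attr_init(attr);
        pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED);
        pthread_attr_setschedpolicy(attr, SCHED_SPORADIC);
        pthread_attr_setschedparam(attr, &sp);
}
#endif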


The scheduling algorithm followed by our implementation is described in [15], which is an updated version of [5] including corrections for errors in the pseudo-code that were identified by Danish et al. in [4].

Correct operation of a sporadic server results in bounded interference experienced by lower-priority tasks. In order to measure the interference, we used Regehr's "hourglass" technique [6], which creates an application-level process that monitors its own execution time without requiring special operating system support. The hourglass process infers the times of its transitions between executing and not executing by reading the clock in a tight loop. If the time between two successive clock values is small, the assumption is that the process was not preempted. However, if the difference is large, the thread was likely preempted. This technique can be used to find preemption points and thereby determine the time intervals when the hourglass process executed. From this information, the hourglass process can calculate its total execution time.
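The idea can be sketched in a few lines of C (ours, not Regehr's code); the threshold that separates "still running" from "was preempted" is a placeholder value.

#include <time.h>

#define GAP_NS 2000   /* gaps longer than this are counted as preemptions (placeholder) */

static long long now_ns(void)
{
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Spin for duration_ns and return how much CPU time this thread actually received. */
long long hourglass(long long duration_ns)
{
        long long start = now_ns(), last = start, executed = 0;

        while (last - start < duration_ns) {
                long long t = now_ns();
                if (t - last < GAP_NS)
                        executed += t - last;   /* small gap: the thread kept running */
                /* large gap: the thread was preempted, so the time is not counted */
                last = t;
        }
        return executed;
}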
Using the hourglass approach, we were able to evaluate whether an implemented sporadic server actually provides temporal isolation. That is, if we schedule the hourglass task with a priority below that of the sporadic server (assuming there are no other higher-priority tasks in the system), the hourglass task should be able to consume all of the CPU time that remains after the sporadic server used all of its budgeted high-priority time. The CPU time available to the hourglass task should, ideally, be one hundred percent minus the percentage budgeted for the sporadic server, viewed over a large enough time window. Therefore, if we schedule a sporadic server with a budget of 1 millisecond and a period of 10 milliseconds, and there are no other tasks with priority above the sporadic server and hourglass tasks, the hourglass task should be able to consume at least 90% of the CPU time, i.e., 9 milliseconds in any window of size 10 milliseconds. In reality, other activities such as interrupt handlers may cause the interference experienced by the hourglass task to be slightly higher.

To evaluate the response time characteristics of our sporadic server, we measured the response time of datagram packets sent across a network. The response time of a packet is measured by the time difference between sending the packet on one machine, m1, and receiving the packet by another, m2. More specifically, the data portion of each packet sent from m1 contains a timestamp, which is then subtracted from the time the packet is received by the UDP layer on m2.5 In our setup, m1 periodically sends packets to m2. The time between sending packets is varied in order to increase the load experienced by the network receive thread on m2. The receive thread on m2 is scheduled using either the polling server, sporadic server, or SCHED_FIFO [7] scheduling policies.6 In our experiments, m2 is running Linux 2.6.38 with a ported version of the softirq threading found in the 2.6.33 Linux real-time patch. m2 has a Pentium D 830 processor running at 3 GHz with a 2x16KB L1 cache and a 2x1MB L2 cache. 2 GB of RAM are installed. The kernel was configured to use only one core, so all data gathered is basically equivalent to a uniprocessor system.

FIGURE 2: Response time using different scheduling policies.

FIGURE 3: sirq-net-rx thread CPU utilization using different scheduling policies.

5 The clocks for the timestamps on m1 and m2 are specially synchronized using a dedicated serial connection.
6 SCHED_FIFO differs from the others in allowing a thread of sufficiently high priority to execute arbitrarily long without preemption.


Scheduling the Linux network receive thread (i.e., sirq-net-rx) using various scheduling policies affects the average response time of received network packets. One would expect that the polling server would result in higher average response times than SCHED_FIFO or sporadic server, and that sporadic server and SCHED_FIFO should provide similar average response times until sporadic server runs out of budget.

In our experiment, sporadic server and polling server are both given a budget of 1 millisecond and a period equal to 10 milliseconds. The sporadic server's maximum number of replenishments is set to 100. The hourglass task is scheduled using SCHED_FIFO scheduling at a real-time priority lower than the priority of the network receive thread. Each data point is averaged over a 10 second interval of sending packets at varied rates. The CPU utilization and response time for the described experiment are shown in Figures 2 and 3.

One would expect that if the sporadic server and polling server both were budgeted 10% of the CPU, the lower-priority hourglass task should be able to consume at least 90% of the CPU time regardless of the load. However, the data for the experiment shows that the sporadic server is causing much greater than 10% interference. The additional interference is the consequence of preemptions caused by the server. Each time a packet arrives the sporadic server preempts the hourglass task, thereby causing two context switches for each packet arrival. Given that the processing time for a packet is small (2-10 microseconds) the server will suspend itself before the next packet arrives. In this situation, the aggregate time for context switching and other sporadic server overhead, such as using additional timer events and running the sporadic-server-related accounting, becomes significant. For instance, on the receiving machine the context-switch time alone was measured at 5-6 microseconds using the lat_ctx LMbench program [1].

The overhead associated with preemption causes the additional interference that is measured by the lower-priority hourglass task.7

A snapshot of CPU execution time over a 500 microsecond time interval was produced using the Linux Trace Toolkit (LTTng) [16] and is shown in Figure 4. The top bar is the sirq-net-rx thread and the bottom bar is the lower-priority hourglass measuring task. This figure shows that the CPU time of both tasks is being finely sliced. The small time slices cause interference for both the lower-priority and sporadic server threads that would not be experienced if the threads were able to run to completion.

3.1 Accounting for Preemption Overhead

To ensure that no hard deadlines are missed, and even to ensure that soft deadlines are met within the desired tolerances, CPU time interference due to preemptions must be included in the system's schedulability analysis. The preemption interference caused by a periodic task can be included in the analysis by adding a preemption term to the task's worst-case execution time (WCET) that is equal to twice the worst-case context switch cost – one for switching into the task and one for switching out of the task.8 Assuming all tasks on the system are periodic, this is at least a coarse way of including context-switch time in the schedulability analysis.

A sporadic server can cause many more context switches than a periodic task with the same parameters. Rather than always running to completion, a sporadic server has the ability to self-suspend its execution. Therefore, to obtain a safe WCET bound for analysis of interference9, one would have to determine the maximum number of contiguous "chunks" of CPU time the sporadic server could request within any given period-sized time interval. The definition of sporadic-server scheduling given in scheduling theory publications does not place any such restriction on the number of CPU demand chunks and thus imposes no real bound on the WCET. In order to bound the number of preemptions, and thereby bound the time spent context switching, most implemented variations of sporadic server limit the maximum number of pending replenishments, denoted by max_repl. Once max_repl replenishments are pending, a sporadic server will be prevented from executing until one of the future replenishments arrives.

7 The lower-priority thread does not measure much of the cache eviction and reloading that other applications may experience, because its code is very small and typically remains in the CPU's cache. When cache effects are taken into account, the potential interference penalty for each preemption by a server is even larger.
8 This is an intentional simplification. The preemption term should include all interferences caused by the sporadic server preempting another thread, not only the direct context-switch time, but also interferences such as the worst-case penalty imposed by cache eviction and reloading following the switch. For checking the deadline of a task, both "to" and "from" context switches need to be included for a potentially preempting task, but only the "to" switch needs to be included for the task itself.
9 From this point on we abuse the term WCET to stand for the maximum interference that a task can cause for lower-priority tasks, which includes not just the maximum time that the task itself can execute, but also indirect costs, such as preemption overheads.


FIGURE 4: LTTng visualization of CPU execution.

Using max_repl, the maximum number of context switches that a sporadic server can cause per period is two times max_repl. Using this logic, and assuming that the actual context-switch costs are added on top of the server's budget, a worst-case upper bound on the interference that can be caused by a sporadic-server task could be written as:

SS_budget + 2 * max_repl * CS_cost

where CS_cost denotes the worst-case cost of a single context switch.

Accounting for the cost due to preemptions is important in order to ensure system schedulability; however, adding preemption cost on top of the server's budget as above results in over-provisioning. That is, if a sporadic server does not use max_repl number of replenishments in a given period, a worst-case interference bound derived in this way is an over-estimate. At the extreme, when a sporadic server consumes CPU time equal to its budget in one continuous chunk, the interference only includes the cost for two context switches rather than two times max_repl. However, the server cannot make use of this windfall to execute jobs in its queue because the context switch cost was not added to its actual budget.
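To give a rough sense of the magnitude involved (our own arithmetic, using the parameters and the 5-6 microsecond context-switch time reported for the experiments in this paper): with max_repl = 100 and CS_cost of about 5 microseconds, the preemption term 2 * max_repl * CS_cost amounts to roughly 1 ms per 10 ms period, i.e., as large as the 1 ms server budget itself, so an analysis based on this bound would have to provision roughly 20% of the CPU for a server whose nominal utilization is only 10%.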
We believe a better approach is to account for actual context-switch costs while the server is executing, charging context switch costs caused by the server against its actual budget, and doing so only when it actually preempts another task. In this approach the SS_budget alone is used as the interference bound for lower-priority tasks. Accounting for context-switching overhead is performed on-line by deducting an estimate of the preemption cost from the server's budget whenever the server causes a preemption. Charging the sporadic server for preemption overhead on-line reduces over-provisioning, and need not hurt server performance on the average, although it can reduce the effective worst-case throughput of the server if the workload arrives as many tiny jobs (as in our packet service experiment).

FIGURE 5: Accounting for context-switching overhead.

Charging for preemptions on-line requires that the preemption interference be known. Determining an appropriate amount to charge the server for preempting can be very difficult, as it depends on many factors. In order to determine an amount to charge the sporadic server for a preemption, we ran the network processing experiment under a very heavy load and extracted an amount that consistently bounded the interference of the sporadic server to under 10%. While such empirical estimation may not be the ideal way to determine the preemption interference, it gave us a reasonable value to verify that charging for preemptions can bound the interference.

The network experiment was performed again, this time charging the sporadic server a toll of 10 microseconds each time it caused a preemption. Figure 5 shows the results for the experiment and demonstrates that the time interference for other lower-priority tasks can be bounded to 10%, that is, the server's budget divided by the period.
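In terms of the data structure sketched earlier, the on-line charge can be pictured as follows; the 10-microsecond toll is the empirically chosen value mentioned above, and the demotion step is only indicated by a comment.

/* struct sporadic_server as in the earlier sketch */

static const long long preempt_cost_ns = 10000;   /* empirically chosen toll (10 us) */

/* Called whenever the server preempts another runnable task. */
static void charge_preemption(struct sporadic_server *ss)
{
        ss->budget_ns -= preempt_cost_ns;   /* pay for switching out of and back into the preempted task */
        if (ss->budget_ns <= 0) {
                ss->budget_ns = 0;
                /* stop competing at the server's real-time priority until a replenishment arrives */
        }
}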


3.2 Preemption Overhead

FIGURE 6: Accounting for context-switching overhead.

Bounding the interference that an aperiodic workload causes for other tasks is the primary objective of aperiodic server scheduling; however, one would also like to see fast average response time. Figure 6 shows that under heavy load, the average response time of packets when using sporadic-server scheduling is actually worse than that of a polling server with the same parameters. For this experiment, not only is the sporadic server's average response time higher, but, as the load increases, up to 45% of the packets were dropped.10

10 No packets were dropped by the other servers.

The poor performance of the sporadic server is due to a significant portion of its budget being consumed to account for preemption costs, leaving a smaller budget to process packets. If all of the packets arrived at the same time, the processing would be batched and context switching would not occur nearly as often. However, due to the spacing between packet arrivals, a large number of preemptions occur. A polling server on the other hand has a much larger portion of its budget applied to processing packets, and therefore does not drop packets and also decreases the average response time.

Based on the poor performance of a sporadic server on such workloads one might naively jump to the conclusion that, in general, a polling server is a much better choice. Actually, there is a trade-off, in which each form of scheduling has its advantage. Given the same budget and period, a sporadic server will provide much better average-case response time under light load, or even under a moderate load of large jobs, but can perform worse than the polling server for certain kinds of heavy or bursty workloads.

It turns out that the workload presented by our packet service example is a bad one for the sporadic server, in that a burst of packet arrivals can fragment the server budget, and then this fragmentation becomes "locked in" until the backlog is worked off. Suppose a burst of packets arrives, and the first max_repl packets are separated by just enough time for the server to preempt the running task, forward the packet to the protocol stack, and resume the preempted task. The server's budget is fragmented into max_repl tiny chunks. Subsequent packets are buffered (or missed, if the device's buffer overflows), until the server's period passes and the replenishments are added back to its budget. Since there is by now a large backlog of work, the server uses up each of its replenishment chunks as it comes due, then suspends itself until the next chunk comes due. This results in a repetition of the same pattern until the backlog caused by the burst of packets has been worked off. During this overload period, the sporadic server is wasting a large fraction of its budget in preemption overhead, reducing its effective bandwidth below that of a polling server with the same budget and period. There is no corresponding improvement in average response time, since after the initial max_repl fragmentation, the reduced bandwidth will cause the response times to get worse and worse.

3.3 Reducing the Impact of Preemption Overhead

A hybrid server combining the strengths of polling and sporadic servers may be a better alternative than choosing either one. In this approach, a sporadic server is used to serve light loads and a polling server to serve heavy loads.

Sporadic-server scheduling supports a polling-like mode of operation. When the max_repl parameter value is one, only one preemption is permitted per period. Switching to the polling-like mode of operation is just a matter of adjusting max_repl to 1.

When changing modes of operation of the sporadic server in the direction of reducing max_repl, something must be done if the current number of pending replenishments would exceed max_repl. One approach is to allow the number of pending replenishments to exceed max_repl temporarily, reducing it by one each time a replenishment comes due. Another approach is to implement the reduction at once, by coalescing pending replenishments.


This is similar to the classical mode-change scheduling problem, in that one must be careful not to violate the assumptions of the schedulability analysis during the transition. In the case of a sporadic server the constraint is that the server cannot cause any more interference within any time window than would be caused by a periodic task with execution time equal to the server budget and period equal to the server's budget replenishment period, including whatever adjustments have been made to the model to allow for context-switch effects. We call this the sliding window constraint for short.

FIGURE 7: Sporadic server with max_repl >= 4, before switch to polling-like server.

FIGURE 8: After switch to polling-like server, with max_repl = 1 and replenishments coalesced.

In order to maintain the sliding-window constraint during the mode change, one can think in terms of changing the times associated with pending replenishments. Consolidating the replenishment times would allow the creation of a single replenishment with an amount equal to the server's initial budget. To guard against violating the sliding-window constraint, the replenishment time of any replenishment must not be moved earlier in time. One approach is to coalesce all replenishments into the replenishment with a time furthest in the future, resulting in a single replenishment with an amount equal to the server's initial budget, as shown in Figures 7 and 8.
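Continuing the earlier sketches, immediate coalescing can be written as a single pass over the pending list: all amounts are merged into one replenishment placed at the latest pending time, so no budget ever becomes available earlier than it originally would have. This is our sketch of the scheme illustrated in Figures 7 and 8, not the kernel code itself.

/* struct sporadic_server and struct replenishment as in the earlier sketch */

static void coalesce_replenishments(struct sporadic_server *ss)
{
        long long total = 0, latest = 0;
        int i;

        for (i = 0; i < ss->nr_pending; i++) {
                total += ss->pending[i].amount_ns;
                if (ss->pending[i].when_ns > latest)
                        latest = ss->pending[i].when_ns;
        }
        ss->pending[0].when_ns   = latest;   /* never earlier than any original replenishment */
        ss->pending[0].amount_ns = total;    /* restores the full outstanding budget at once */
        ss->nr_pending = 1;
}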
Switching from sporadic server to a polling-like server should be performed if the server is experiencing heavy load. The ideal switching point may be difficult to detect. For instance, a short burst may be incorrectly identified as the onset of a heavy load, and the early switching may cause the server to postpone a portion of its budget that could have been used sooner. Conversely, delaying the switch may mean that time that could have been used to serve incoming jobs is wasted on preemption charges.

While an ideal switching point may not be possible to detect beforehand, one reasonable indicator of a heavy load is when the sporadic server uses all of its budget. That is the point when a sporadic server is blocked from competing for CPU time at its scheduling priority. At this point the server could switch to its polling-like mode of operation.

A possible event to indicate when to switch back to the sporadic server mode of operation is when a sporadic server blocks but still has available budget. This point in time would be considered as entering a period of light load, and the max_repl could be reinstated.

FIGURE 9: Coalescing replenishments under heavy load.

FIGURE 10: Coalescing replenishments under heavy load.


Implementation of the switching mechanism described above is relatively simple. The replenishments are coalesced into one when the server runs out of budget but still has work. The single replenishment limit will remain enforced until the sporadic server is suspended and has budget, a point in time considered to be an indication of light load. So, the polling-like mode of operation will naturally transition back to the original sporadic server mode of operation.
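Expressed against the earlier sketches, the two triggers amount to something like the following; the work_pending flag and the configured replenishment limit are placeholders of ours, not names from the actual implementation.

/* Heavy-load indication: the budget ran out while jobs are still queued. */
static void on_budget_exhausted(struct sporadic_server *ss, int work_pending)
{
        if (work_pending) {
                coalesce_replenishments(ss);   /* from the earlier sketch */
                ss->max_repl = 1;              /* polling-like mode: one preemption per period */
        }
}

/* Light-load indication: the server blocks (queue empty) while budget remains. */
static void on_server_blocked(struct sporadic_server *ss, int configured_max_repl)
{
        if (ss->budget_ns > 0)
                ss->max_repl = configured_max_repl;   /* reinstate the original limit */
}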
Immediately coalescing all replenishments may be too eager. Loads that are between light and heavy may experience occasional or slight overloads that require only slightly more CPU time. In this case, converting all potential preemption charges, by delaying replenishments, into CPU time to serve packets is too extreme. Therefore, to perform better under a range of loads one approach is to coalesce only two replenishments for each overload detection. Using this method allows the sporadic server to naturally find an intermediate number of replenishments to serve packets efficiently without wasting large portions of its budget on preemption charges.

The performance data for the two coalescing methods, immediate and gradual, are shown in Figures 9 and 10. These figures show the advantage of transitioning between the sporadic-server and polling-like modes of operation. Under light load, until approximately 4500 pkts/sec, the sporadic server has the same response time as SCHED_FIFO scheduling. However, once the load is heavy enough the sporadic server is forced to limit the amount of CPU demand and therefore the response time begins to increase to that of a polling server. There is not enough CPU budget to maintain the low average response time of SCHED_FIFO scheduling. The difference between the immediate and gradual coalescing is seen when the restriction on CPU demand begins. The gradual coalescing provides a gradual transition to polling-like behavior, whereas the immediate coalescing has a much faster transition to the polling server's response time performance. The better performance of the gradual coalescing is due to the server making better use of the available budget. With immediate coalescing, when the server transitions to the polling-like mode the CPU utilization drops, as one would expect of a sporadic server where max_repl is set to 1. However, with gradual coalescing the server continues to use its available budget to pay for preemption costs and serve some jobs earlier, which results in lower response times.

4 Conclusion

Any open real-time operating system needs to provide some form of aperiodic-server scheduling policy, in order to permit temporal isolation of tasks, and to provide a real-time virtual processor abstraction that can support fault-tolerant compositional schedulability analysis. The only standard Unix scheduling policy with these properties is Sporadic Server.11

11 While a literal implementation of the abstract description of SCHED_SPORADIC in the Unix standard is not practical and would not support schedulability analysis, we feel that the corrections described in this paper and [15] fall within the range of discretion over details that should be permitted to an implementor.

We have described our experiences implementing and using a variation of the Sporadic Server scheduling algorithm in Linux. Our experience demonstrates that sporadic server scheduling can be an effective way to provide a predictable quality of service for aperiodic jobs while bounding the interference that the server thread can cause other tasks, thereby supporting schedulability analysis. However, this goal cannot be achieved without consideration of some subtle implementation issues that are not addressed in the theoretical formulations of sporadic server that have been published.

Neither the published theoretical versions of Sporadic Server nor the POSIX/Unix formulation consider all the interference effects we found on a real implementation. In particular, fine-grained time slicing degrades the performance of the sporadic server thread and can cause interference for other threads on the system to significantly exceed the assumptions of the theoretical model. This interference is mainly due to a sporadic server being able to use CPU time in arbitrarily small time slices. Such fine time slicing not only increases the interference that the server inflicts on tasks that it preempts, but also degrades the throughput of the server itself. Through network service experiments we showed that the interference caused by a sporadic server can be significant enough to cause other real-time threads to miss their deadlines.

Charging a sporadic server for preemptions is an effective means to limit the CPU interference. The charging for preemptions can be carried out in several ways. We chose an on-line approach where the server is charged when it preempts another thread. Charging the server only when it actually preempts not only bounds the CPU time for other tasks, but allows the server to use its budget more effectively. That is, rather than accounting for the additional interference by inflating the nominal server budget (over the implemented server budget) in the schedulability analysis, we charge the server at run time for the actual number of preemptions it causes.


In this way the server's actual interference is limited to its actual CPU time budget, and we do not need to use an inflated value in the schedulability analysis. Since the preemption charges come out of the server's budget, we still need to consider preemption costs when we estimate the worst-case response time of the server itself. However, if we choose to over-provision the server for worst-case (finely fragmented) arrival patterns it actually gets the time and can use it to improve performance when work arrives in larger chunks.

The ability to use small time slices allows a sporadic server to achieve low average response times under light loads. However, under a load of many small jobs, a sporadic server can fragment its CPU time and waste a large fraction of its budget on preemption charges. A polling server, on the other hand, does not experience this fragmentation effect, but does not perform as well as a sporadic server under light load. To combine the strengths of both servers, we described a mechanism to transition a sporadic server into a polling-like mode, thereby allowing the sporadic server to serve light loads with good response time and serve heavy loads with throughput similar to a polling server. The data for our experiments show that the hybrid approach performs well on both light and heavy loads.

Our recent experiences reinforce what we learned in prior work with sporadic-server scheduling in Linux [5]. There are devils in the details when it comes to reducing a clever-looking theoretical algorithm to a practical implementation. To produce a final implementation that actually supports schedulability analysis, one must experiment with a real implementation, reflect on any mismatches between the theoretical model and reality, and then make further refinements to the implemented scheduling algorithm until there is a match that preserves the analysis. This sort of interplay between theory and practice pays off in improved performance and timing predictability.

We also believe our experience suggests a potential improvement to the "NAPI" strategy employed in Linux network device drivers for avoiding unnecessary packet-arrival interrupts. NAPI leaves the interrupt disabled so long as packets are being served, re-enabling it only when the network input buffer is empty. This can be beneficial if the network device is faster than the CPU, but in the ongoing race between processors and network devices the speed advantage shifts one way and another. For our experimental set-up, the processor was sufficiently fast that it was able to handle the interrupt and the sirq-net-rx processing for each packet before the next arrived, but the preemption overhead for doing this was still a problem. By waiting for several packets to arrive, and then processing them in a batch, the polling server and our hybrid server were able to handle the same workload with much less overhead. However, the logical next step is to force a similar waiting interval on the interrupt handler for the network device.

While we have not experimented with deadline-based aperiodic servers in Linux, it appears that our observations regarding the problem of fitting the handling of context switch overheads to an analyzable theoretical model should also apply to the constant bandwidth server, and that a similar hybrid approach is likely to pay off there.

In future work, we hope to explore additional variations on our approach to achieving a hybrid between polling and sporadic server, to see if we can improve performance under a range of variable workloads. We are considering several different mechanisms, including stochastic ones, for detecting when we should change modes of operation as the system moves between intervals of lighter and heavier load. We also plan to explore other aperiodic servers and determine how much interference preemptions cause. For example, it appears that a constant bandwidth server would suffer the same performance problems as a sporadic server when the workload causes budget fragmentation. We also plan to investigate the preemption interference due to cache eviction and reloading. The threads used in our experiments access relatively small amounts of data and therefore do not experience very large cache interferences. This is not true for all applications, and the cache effects on such applications will need to be bounded. While limiting the number of replenishments does reduce the cache effect, better mechanisms are needed to reduce the ability of a sporadic server to cause cache interferences.

Other questions we are considering include whether it is practically feasible to schedule multiple threads using a single sporadic-server budget, and how well sporadic-server scheduling performs on a multi-core system with thread migration.

References

[1] L. McVoy and C. Staelin. lmbench: Portable tools for performance analysis. In USENIX Annual Technical Conference, pages 279–294, Jan. 1996.


[2] B. Sprunt, L. Sha, and L. Lehoczky. Aperiodic task scheduling for hard real-time systems. Real-Time Systems, 1(1):27–60, 1989.
[3] M. Lewandowski, M. J. Stanovich, T. P. Baker, K. Gopalan, and A.-I. Wang. Modeling device driver effects in real-time schedulability analysis: Study of a network driver. In Real Time and Embedded Technology and Applications Symposium, 2007. RTAS '07. 13th IEEE, pages 57–68, Apr. 2007.
[4] M. Danish, Y. Li, and R. West. Virtual-CPU scheduling in the Quest operating system. Real-Time and Embedded Technology and Applications Symposium, IEEE, 0:169–179, 2011.
[5] M. Stanovich, T. P. Baker, A.-I. A. Wang, and M. G. Harbour. Defects of the POSIX sporadic server and how to correct them. In Real Time and Embedded Technology and Applications Symposium, 2010. RTAS '10. 16th IEEE, pages 35–45, Stockholm, Sweden, Apr. 2010. IEEE Computer Society.
[6] J. Regehr. Inferring scheduling behavior with Hourglass. In Proc. of the USENIX Annual Technical Conf., FREENIX Track, pages 143–156, Monterey, CA, June 2002.
[7] IEEE Portable Application Standards Committee (PASC). Standard for Information Technology - Portable Operating System Interface (POSIX) Base Specifications, Issue 7. IEEE, Dec. 2008.
[8] D. Faggioli, M. Bertogna, and F. Checconi. Sporadic server revisited. In Proceedings of the 2010 ACM Symposium on Applied Computing, SAC '10, pages 340–345, Sierre, Switzerland, 2010. ACM.
[9] R. J. Bril and P. J. L. Cuijpers. Analysis of hierarchical fixed-priority pre-emptive scheduling revisited. Technical Report CSR-06-36, Technical University of Eindhoven, Eindhoven, Netherlands, 2006.
[10] R. I. Davis and A. Burns. Hierarchical fixed priority preemptive scheduling. In Proc. 26th IEEE Real-Time Systems Symposium, pages 376–385, 2005.
[11] G. Lipari and E. Bini. Resource partitioning among real-time applications. In Proc. 15th EuroMicro Conf. on Real-Time Systems, pages 151–158, July 2003.
[12] S. Saewong, R. R. Rajkumar, J. P. Lehoczky, and M. H. Klein. Analysis of hierarchical fixed-priority scheduling. In ECRTS '02: Proceedings of the 14th Euromicro Conf. on Real-Time Systems, page 173, Washington, DC, USA, 2002. IEEE Computer Society.
[13] I. Shin and I. Lee. Compositional real-time scheduling framework with periodic model. ACM Trans. Embed. Comput. Syst., 7(3):1–39, 2008.
[14] Y. C. Wang and K. J. Lin. The implementation of hierarchical schedulers in the RED-Linux scheduling framework. In Proc. 12th EuroMicro Conf. on Real-Time Systems, pages 231–238, June 2000.
[15] M. Stanovich, T. P. Baker, A.-I. A. Wang, and M. G. Harbour. Defects of the POSIX sporadic server and how to correct them. Technical Report TR-091026 (revised), Florida State University Department of Computer Science, http://www.cs.fsu.edu/research/reports/TR-100315.pdf
[16] Linux Trace Toolkit Next Generation, http://lttng.org/
[17] L. Abeni, G. Lipari, and G. Buttazzo. Constant bandwidth vs. proportional share resource allocation. In Proc. IEEE Int. Conf. Multimedia Computing and Systems, Florence, Italy, June 1999.
[18] J. Strosnider, J. P. Lehoczky, and L. Sha. The deferrable server algorithm for enhanced aperiodic responsiveness in real-time environments. IEEE Trans. Computers, 44(1):73–91, Jan. 1995.


How to cope with the negative impact of a processor’s energy-saving features on real-time capabilities?

Carsten Emde
Open Source Automation Development Lab (OSADL) eG
Aichhalder Str. 39, 78713 Schramberg, Germany
[email protected]

Abstract
In the early days of using computers for determinism-critical tasks, processors mostly were suitable
for this purpose, since instruction execution was in sync with the clock frequency. This made it possible
to correctly predict the execution time of a given code segment. With the rapidly increasing need for
processing power, deterministic execution was abandoned in favor of throughput. In consequence, the
peak processing power of today’s state-of-the-art multi-core processor is about 1,000,000 times greater
than that of a standard processor 30 years ago. The worst-case performance, however, only improved by
a factor of about 10 - and even this may require specific configuration of the processor and the operating
system. The main reasons for the lack of progress of the worst-case performance are the introduction of
caches and energy-saving features. While the negative impact of caching on the worst-case performance
could be studied in recent years and can now be handled reasonably well, the details of energy-saving
are less well known. Although energy-saving most of the time boils down to switching off or at least
throttling down unneeded processor components, such mechanisms can be implemented in various ways
and locations and are often undocumented. It was, therefore, the aim of this project to investigate the
latency behavior of modern energy-saving processors and to provide recommendations on how to disable or
circumvent energy-saving.
To investigate the impact of energy-saving, latency measurements no longer were performed in a short
closed loop such as when using the cyclictest utility but with randomly occurring interrupt triggers at
idle state of the processor. This could lead to long latencies which were then attempted to be reduced by
specific processor and Linux kernel configurations.
As a result, we now can recommend a number of individual processor settings and configuration items
of the Linux kernel to optimize the worst-case system latency when modern energy-saving processors are
used. In some cases, however, it was not possible to disable any deleterious effect of energy-saving on a
processor’s latency, although we tried very hard. Thus, we urgently appeal to semiconductor manufactur-
ers to - whatever mechanisms they invent to reduce power consumption - provide a way to switch them
off, if they adversely affect response time. In many cases, power-saving and fast reaction to asynchronous
events exclude each other. It is, however, still possible to obtain a deterministic response while power-
saving is enabled, but it must be taken into account that the worst-case latency of such systems may be
considerably prolonged.

1 Mechanisms of power-saving

There are a number of good reasons to reduce the power consumption of microprocessors:

• Reducing the energy consumption as part of the general ecological imperative

• Allowing the creation of battery-powered devices

• Prolonging the lifetime of systems through reduced thermal stress

• Reducing the need of fans and preventing damage from defective fans

• Reducing the need of dust filters and prevent- For the analysis of the effect of throttling on the
ing damage from filters that were forgotten to worst-case latency of a processor, the cyclictest util-
clean or to replace ity could be used in its original version. The utility
was run as usual on systems with and without en-
The semiconductor industry is using two com- abled power-saving. At least 100 million test cycles
pletely different approaches to provide processors were run to obtain reliable results of the worst-case
with reduced energy consumption: latency.
For the analysis of the effect of idle states on the
• Using smaller structures and more efficient iso- worst-case latency of a processor, the cyclictest util-
lation material to reduce current demand and ity was expanded. In a first step, the -i or –interval=
leakage current option that is used to define the duration of the mea-
surement cycle was allowed to accept a range of a
• Slowing down or switching off parts of the pro- lowest and a highest interval duration. When a range
cessor when idle is specified using this newly defined option format,
an individual duration is determined for every test
While the former approach is generally welcome and interval by using a uniformly distributed logarithmic
normally does not have any impact on the response random value between the lowest and and the high-
time of a processor, the latter is relevant in the con- est duration specified. The interval is displayed in
text of real-time systems that try to achieve a min- the output line to monitor the behavior of this func-
imum worst-case latency. It is, therefore, important tionality.
to analyze a processor with respect of the imple- In a second step, a two-dimensional histogram
mented mechanisms for energy saving when selecting was implemented that is activated when the -h or –
and configuring it for a system that relies on real- histogram= option is specified. This histogram stores
time computing. the frequencies of latency samples per interval dura-
tion. This makes it possible to differentiate recorded
latency values with respect to the duration of the pre-
2 Slowing down or switching ceding idle time of the processor. It is expected that
the latency values would not depend on the duration
off parts of the processor of the test interval in processors without any (or com-
pletely disabled) power-saving mechanism, whereas
To slow down or switch off parts of the processor, in processors with active power-saving, the latency
three different general mechanism are employed: values would be the higher, the longer the preceding
period of quiescence was.
• Throttling

• Sleep states 4 Throttling


• Undisclosed internal mechanisms
4.1 Principle
This article will present methods to analyze the
various mechanisms and present procedures to pre- The clock frequency of many modern processors can
vent them from interfering with a system’s real- be adjusted to reflect a system’s load requirements
time capabilities. In all cases, mainline Linux with which reduces power consumption when the pro-
the PREEMPT RT patches was used, kernel version cessor is less busy or even idle. The Linux ker-
2.6.33.15-rt31. nel can optionally be compiled with the cpufreq
subsystem to manage CPU frequency scaling also
known as throttling. It provides, among others,
the ondemand and the performance scaling gov-
3 Material and methods ernor. Manipulation of the CPU frequency scal-
ing is done through the sys virtual file system
To study the effect of the various power-saving mech- in the /sys/devices/system/cpu/cpuN/cpufreq direc-
anisms, the cyclictest [1] utility was used. It is avail- tory. The available frequencies are listed in the file
able from a git repository [2] but also part of many scaling available frequencies and the available gover-
Linux distributions where the related package usu- nors in the file scaling available governors. Writing
ally is called rt-tests. the name of one of the available governors to the file

232
Real-Time Linux Concepts

scaling governor will select it, e.g. to set CPU #0 to 5 Sleep states
full speed:

cd /sys/devices/system/cpu/cpu0/cpufreq Another mechanism to reduce energy consumption


echo performance >scaling_governor when computing power is not needed are sleep states.
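The same setting can also be applied to every CPU programmatically. The following small C helper is our own illustration (not part of the tool set described in this paper); it simply writes the governor name into each CPU's scaling_governor file and stops when no further CPU directory exists.

#include <stdio.h>

static void set_governor_all(const char *governor)
{
        char path[128];
        int cpu;

        for (cpu = 0; ; cpu++) {
                FILE *f;
                snprintf(path, sizeof(path),
                         "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_governor", cpu);
                f = fopen(path, "w");
                if (!f)
                        break;              /* no such CPU (or no cpufreq support): stop */
                fprintf(f, "%s\n", governor);
                fclose(f);
        }
}

/* Example use: set_governor_all("performance"); */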
A processor may implement a certain number of sleep
states going from snoozing to deep sleep. The deeper
4.2 Effect of throttling setting on la- the sleep state is, however, the longer the processor
tency may need to wakeup and to react to an unpredictable
asynchronous event. It is, therefore, important to
Many of the processors tested show a distinct in- take care of the sleep state setting when the worst-
crease of the worst-case latency, if throttling is en- case latency is crucial.
abled and its clock frequency is allowed to be de-
The sleep states are controlled through BIOS set-
creased to the specified minimum. A typical result
tings and kernel parameters. Unfortunately, there is
is shown in Figure 1 that was obtained on an AMD
not a standardized naming convention of BIOS set-
two-core G-Series processor running either at 1,400
tings that affect sleep states, but every processor and
MHz (performance governor) or at 583 MHz to 1,400
BIOS manufacturer may call the related settings dif-
MHz (ondemand governor) depending on the load.
ferently. There is a kernel command line parameter
that can be used to limit the sleep state to a certain
Worst-case latency maximum, but this does not necessarily prevent the
150
Latency
BIOS from sending the processor to sleep state. If
(μs) in doubt, the kernel parameter processor.max cstate
120 can be used. If set to 1, the kernel will not send the
processor to any sleep state. Figure 2 exemplifies
CPU #1
a long-term latency recording where this parameter
90
CPU #0 was used during certain periods of time. A total
of more than 220 latency plots are stacked horizon-
60 tally with the time running backwards from back to
front. It is clearly visible that the worst-case latency
dropped from about 200 to about 130 microseconds
30
when this parameter was specified.

0
Performance (1,400 MHz) On-demand (583 MHz)

FIGURE 1: Typical worst-case latency


analysis with throttling disabled (performance
scaling governor with a constant clock fre-
quency of 1,400 MHz) and throttling enabled
(ondemand scaling governor with a minimum
clock frequency of 583 MHz)

4.3 Recommendation

If the fastest possible reaction of a processor to exter-


nal events is required, any throttling setting should
be disabled, and the clock frequency of the processor
should be set to full speed. Power consumption will
then most probably reach the specified maximum,
and care must be taken that sufficient cooling capac-
ity is available. However, not all tested processors
revealed a distinct effect of the clock frequency on FIGURE 2: Effect of the kernel command
the worst-case latency. It may, therefore, be advis- line parameter processor.max cstate=1 to pre-
able to determine the worst-case latency under the vent the processor from entering a sleep state
various settings of the CPU frequency scaling to en- (data obtained on an AMD six-core processor
sure that this has the expected effect. Phenom II X6 1090T)

233
How to cope with the negative impact of a processors energy-saving features on real-time capabilities?

5.1 Effect of cycle interval on latency

To investigate whether a processor uses sleep states,


the modified cyclictest utility was used that allows
to specify a range of test intervals. A range of 100

log10 of
2.5
microseconds to 1 second was specified, the number 2.0
1.5
of histogram cells was set to 100 which is equivalent

Frequen
1.0
to 100 microseconds, and a total of 40,000 cycles per 0.5
core was preset: 0.0

cy
10

s)
20 60

(u
55
30 50

al
cyclictest -m -Sp99 -i100-1000000 -l40000 \ 45

rv
40 40

te
Lat 50 35

In
-h100 enc 60 30

cle
y (u 25
s) 70

Cy
20
80 15

of
10

0
90

g1
Figure 3 shows the result of such a measurement 5

*lo
100

10
on an Intel Core i3-2100T processor running at a
maximum clock frequency of 2,500 MHz. All power-
saving features were enabled at BIOS level. The in- FIGURE 4: Frequency of latency samples
terval is scaled logarithmically and multiplied by 10; with respect to the duration of the preceding
thus, the scale value of 20 is equivalent to 102 mi- interval of quiescence (data obtained with a
croseconds and the scale value of 60 is equivalent to modified version of the cyclictest utility), Intel
106 microseconds = 1 second. It can be seen that the Core i3-2100T @2,500 MHz, all power-saving
worst-case latency increases with the increase of the features disabled
duration of the preceding measuring interval. This
is very probably the result of entering a sleep state
when the processor is idle for a certain amount of
time. 5.2 Recommendation

When a minimum worst-case latency must be


achieved, the BIOS settings should be investigated
2.5 for the occurrence of terms such as ”power saving”,
log10 of

2.0 ”energy saving”, ”speed steps”, ”power now”, ”green


1.5
computing” etc. If possible, anything that is suspi-
Frequen

1.0
0.5 cious to reduce the computing speed while in idle
state should be disabled. In addition, the kernel pa-
cy

0.0

10 rameter processor.max cstate=1 could be tried.


s)

20 60
(u

55
30 50
al

45
rv

40 40
te

Lat 50 35
In

enc 60 30
cle

y (u 25
s) 70
Cy

20
80 15
of

10
0

90
g1

5
6 Undisclosed internal mecha-
*lo

100
10

FIGURE 3: Frequency of latency samples


nisms
with respect to the duration of the preceding
interval of quiescence (data obtained with a In addition to throttling and sleep states, there prob-
modified version of the cyclictest utility), Intel ably are undisclosed internal mechanisms that reduce
Core i3-2100T @2,500 MHz, all power-saving the power consumption of a processor and conversely
features enabled affect the worst-case latency. Not all mechanisms,
however, are related to idle state. One of the tested
When the same command was run on the iden- processors, for example, is used in a notebook along
tical hardware but with BIOS power-saving features with a chip set that is optimized for mobile systems.
disabled, a completely different result was obtained. When the modified version of the cyclictest utility
First, the worst-case latency was lower in general; was run on this processor, elevated worst-case la-
secondly, there was no increase of the worst-case la- tency values were obtained, but they were indepen-
tency with increasing interval duration (refer to Fig- dent from the duration of the test interval (refer to
ure 4). Figure 5).

234
Real-Time Linux Concepts

FIGURE 5: Frequency of latency samples with respect to the duration of the preceding interval of quiescence (data obtained with a modified version of the cyclictest utility), Intel Pentium Dual-Core T4500 @2,300 MHz, all power-saving features disabled

7 Conclusion

Power-saving features of state-of-the-art microprocessors certainly have a negative impact on the worst-case latency. If a very fast response to external events is crucial, e.g. in the range of single-digit microseconds, power saving should better be disabled whenever possible. If the constraints on the worst-case latency are less tight, power saving may well coexist with real-time execution, since the determinism of the real-time response is not affected per se. It must, however, be considered that the latency may increase by an order of magnitude.

It is not difficult to predict that battery-driven devices such as smartphones will have an important influence on the development of future processors. And it is well conceivable that the semiconductor industry will develop even more sophisticated mechanisms of power saving that may interfere with real-time execution even more than they already do. It remains to appeal to semiconductor manufacturers to provide a way to switch off whatever mechanisms they invent to reduce power consumption, at least if these mechanisms adversely affect response time. On the other hand, field-programmable gate arrays have become very powerful in recent years; this makes it possible to create 32-bit software processors. Related softcores are already available and can be used under an Open Source license [3]. These processors may be designed individually in such a way that they provide optimum conditions for operating in a real-time environment.

Most of the findings presented in this article are not specific to PREEMPT_RT mainline real-time Linux but may be applicable to real-time extensions and to other operating systems as well.

Some of the measurements were made on processors that are under test at the quality assurance farm of the Open Source Automation Development Lab (OSADL). In addition to continuous latency recordings and cyclic latency plots on a large number of different systems, selected systems are running under particular and documented power-saving conditions. All data are publicly available on the organization's website at osadl.org/QA [4].

References
[1] The Real-Time Linux Kernel Wiki https://rt.wiki.kernel.org/index.php/Cyclictest - may not be accessible
[2] GIT repository of the RT Tests that includes, among others, the cyclictest utility, maintained by Clark Williams http://git.kernel.org/?p=linux/kernel/git/clrkwllms/rt-tests.git;a=summary - may not be accessible
[3] Martin Walter, The SCARTS Hardware/Software Interface, OSADL Academic Works Vol. 2, 2011
[4] The OSADL Quality Assurance Farm https://www.osadl.org/QA/

FLOSS in Safety Critical Systems

On integration of open-source tools for system validation,


example with the TASTE tool-chain

Julien Delange and Maxime Perrotin


European Space Agency
ESA/ESTEC, TEC-SWE
Keplerlaan 1
2200AG Noordwijk, The Netherlands
[email protected], [email protected]

Abstract
The design and implementation of safety-critical systems is very difficult because such systems must maintain a continuously correct operational state while being deployed in hostile environments. An error introduced either during the design or the implementation phase can have significant impacts and consequences. To avoid such issues, failure cases must be clearly identified and handled by software engineers to prevent any propagation from one faulty component to another. For that purpose, good practices and standards are applied during the development process, from the specifications to the implementation.
However, despite all existing efforts, bugs are still introduced. They appear at different levels of the development process: either in the specifications (as in the Mars Climate Orbiter mission, where the failure was due to a mix-up of metric units) or in the implementation (as in the Ariane 5 launch, where a wrong assumption was made about a data type so that the system generated an overflow).
Over the years, several solutions have been designed to address such issues. However, they rely on different system representations and are applicable at different levels of the design process, so that their use can be difficult and may lead to design inconsistencies. These problems have to be avoided and the use of such tools made more consistent.
In this paper, we present our tool-chain for system design, validation, implementation and certification. It relies on a single modeling notation to capture both software and hardware concerns. The use of a single notation ensures specification consistency and avoids the potential errors that arise when different languages are used to specify the same system aspect. We detail the support of this process in The Assert Set of Tools for Engineering (TASTE) development tool-chain.

1 Introduction

Safety-critical systems design and implementation is very difficult: such systems must operate correctly and continuously although they are usually deployed in hostile environments. Misconceptions or errors may have significant consequences and are potentially mission or life critical. To avoid such issues, failure cases must be clearly identified and handled by software engineers to avoid any propagation from one faulty component to another. For that purpose, several guidelines and standards have been designed and are currently used during all system development phases.

Safety-critical systems development is usually split into several phases, as illustrated in figure 1:

1. Validation: from the specifications, engineers/developers check system feasibility and requirements enforcement.

2. Code production: developers implement the system by translating the specifications into code (Ada/C) that can be compiled.

3. Certification: the execution of the implementation is validated against its specifications and/or established standards (DO178B, ECSS, etc.).

Even with such a careful process, errors remain either in the specifications (as in the Mars Climate Orbiter mission, where the failure was due to a mix-up of metric units) or in the implementation (as in the Ariane 5 launch, where a wrong assumption was made about a data type so that the system generated an overflow).
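To make the second kind of error concrete, the following hypothetical C fragment shows how an unchecked conversion of a wide value into a narrower data type silently produces a meaningless result; it is only an illustration of the fault class, not code from the Ariane 5 software.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            /* Physical value that grows beyond the range assumed at design time. */
            int32_t horizontal_bias = 40000;

            /* Unchecked narrowing conversion: INT16_MAX is 32767, so on a typical
             * two's-complement target the stored value wraps to a negative number
             * and no longer represents the measured quantity. */
            int16_t converted = (int16_t)horizontal_bias;

            printf("original: %d  converted: %d\n",
                   (int)horizontal_bias, (int)converted);
            return 0;
    }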


In particular, errors may be introduced at each step:

1. Validation: because it often relies on different system representations, a requirement validated using one notation may not be validated using another one.

2. Code production: developers often introduce errors or bugs either by misunderstanding system specifications or just by making syntax or semantic errors.

3. Certification: this also relies on a manual process where engineers can make errors by misunderstanding specifications.

FIGURE 1: Generic development process work-flow

To cope with these problems, the development process must be made more consistent and automated as much as possible. It would automatically check requirements from the specifications and ensure standards compliance enforcement at lower levels. As a result, this would also reduce development cost (the system is verified by appropriate tools) and ensure the reliability and robustness of the development process due to the automation of each step.

The next section details the identified problems and presents our approach to address them.

2 Problem & approach

2.1 Problem statement

Despite all existing initiatives, errors/bugs are still introduced during development. To reduce them as much as possible, one has to:

1. check and validate specifications automatically

2. verify implementation correctness regarding specifications

Different tools already address these issues. However, they are loosely coupled and rely on different notations that lead to potential semantic issues, so that:

1. a requirement R1 validated using the specification language L1 and the validation tool T1 may not be validated when using the specification language L2 and the validation tool T2.

2. even if the user manages to translate the system specification from one language to another, the process requires a manual translation which is error-prone: the user can introduce specification errors himself (syntax errors, misunderstanding of system specifications, etc.).

As a result, there is a strong need for a more consistent approach that strengthens system development with:

1. A single notation for all system aspects, so that we avoid several representations of the same concepts and thus prevent any specification inconsistency.

2. An automation of the development steps with tools that process specifications and produce development output without human guidance.

2.2 Proposed approach

First, we propose to capture the system architecture with its requirements, properties and constraints using a single modeling language. It specifies both software and hardware concerns with a unique notation, avoiding the usual semantic issues.

From this high-level representation (models), the development steps (the same as in figure 1) are automatically processed by appropriate tools. In particular:


1. Validation: tools automatically process models to check system correctness and feasibility so that designers can fix specification errors before further development efforts.

2. Code Production: code generators transform specifications/models into implementation code (such as Ada or C). This generated code is automatically compiled and linked against a real-time execution platform that supports the system entities (tasks, mutexes, etc.).

3. Certification: the implementation is either simulated or executed on the target to check the correctness of its behavior and its compliance with standards (such as DO178B or ECSS).

We implement this process in The ASSERT Set of Tools for Engineering [1] (TASTE) by using AADL as a single specification language. We also design tools that process AADL models and support each development step. The next sections detail the language, its tailoring to our needs and the specific tools we develop in that context.

3 Overview of TASTE

The ASSERT Set of Tools for Engineering [1] (TASTE) is the outcome of the European ASSERT project [2]. It aims at providing a complete functional tool-chain for safety-critical system development, from the specifications to the certification/validation.

It relies on the Architecture Analysis and Design Language (AADL) [9] to represent both software and hardware concerns, their properties and constraints. First, software aspects are specified by the Data View and the Interface View, two AADL models that represent the system functions (C/Ada code) and the data they share (using ASN.1 [19], a standardized notation for describing data types). Hardware and deployment concerns are described with the Deployment View, an AADL model that describes the execution platform (processors, devices, memories, etc.) and its association with the system functions.

FIGURE 2: TASTE process work-flow (the Data View, Interface View and Deployment View are merged into a Concurrency View and feed requirements validation, code generation and execution analysis)

From these models, our tool-chain automates the development, as illustrated in figure 2:

1. Validation: it checks specification correctness by processing the models and using appropriate tools:

(a) Cheddar [7] or MAST [8], two scheduling analysis tools released as free software

(b) REAL [15], an AADL [9] query tool integrated in Ocarina [10] that checks specification correctness

2. Code production: it transforms AADL models into C code using Ocarina [10], an AADL tool-suite released under GPL licensing terms. In particular, Ocarina is able to automatically generate code that targets real-time embedded platforms (RTEMS [12], VxWorks [13], etc.) and standards (RT-POSIX [14]). The code is then integrated on top of a real-time execution platform: Ocarina [10] currently supports the following free-licensed platforms: RTEMS [12], Xenomai [11] and POK [18].

3. Certification: it executes the code either on the target or on a simulator (such as QEMU) and checks:

• its performance (using gprof, a performance analysis tool included in the GNU binutils suite [3])

239
On integration of open-source tools for system validation, example with the TASTE tool-chain

• reproduces its behavior (by instrument- the the overall development process, making it more
ing the code and produce a Value Change consistent. Extensions mechanisms allow us to tailor
Dump (VCD) [6]) file to be used with the language to our needs:
GTKWave [5])
• Properties extension mechanism is used to
• produces code coverage reports using the
define specific requirements from textual spec-
COUVERTURE tool-set [4] (specific free-
ifications to the AADL model (for example, to
licensed tools from Adacore that aims at
model memory concerns such as stack or heap
supporting code coverage using a specific
size, etc.).
tailored version of QEMU [16]).
• Annex languages mechanism is used to as-
The use of a single notation (AADL [9]), pro- sociate our in-house AADL validation tool
cessed by dedicated tools for each development as- (REAL) to check requirements enforcement.
pect makes the overall process more consistent. In It processes processes models according to its
addition, automation of model processing avoids is- components hierarchy and check for system
sues of usual development process and ensures re- requirements validation (for example: can a
quirements traceability. Finally, while system fea- process P1 with 1Mb of RAM contain three
sibility and requirements are automatically checked threads that require a stack of 800Kb ?).
during the development process, these tools also pro-
vides metrics (such as code coverage) that can be If several modeling languages already exist for
used for system certification. the specification of real-time embedded systems, no
one provides the ability to capture both hardware
Next sections focus on validation and certifica- and software aspects with such a flexibility. That is
tion functions of our tool-chain: why our choice was focused on this language.

• Section 4 describes our system validation func-


tions from using AADL specifications. 4.2 REAL validation tool-set
• Section 5 details the automatic certification REAL (Requirements Enforcement Analysis Lan-
process with respect to the implementation. guage) [15] is a language that associates validation
theorems to AADL components. A dedicated solver
analyzed model components with their theorems and
4 Model Analysis & Validation check their enforcement.
To use REAL, users have to:
The TASTE tool-chain relies on AADL [9] to spec-
ify software (Data View and Interface View) and • Map properties and constraints from the tex-
hardware (Deployment View) aspects. AADL is a tual specification to the AADL model (for ex-
component-based language to specify hardware and ample, execution time for each system func-
software concerns with their execution constraints. tion, period/deadline of each task, etc.).
It has all necessary constructs to express safety- • Design theorem to check requirements feasibil-
critical system concerns and supports mechanisms to ity (for example: functions can be executed
address specific modeling needs. They can be writ- within task period).
ten using either a textual or a graphical notation and
is supported by a large tool-set, from command-line One key aspect is the genericity of this approach:
interface tools (such as Ocarina [10]) to tools with users can keep existing theorems in a library that
advanced graphical interface (like OSATE [9]). would be reused for later projects.
As this article does not aim at providing a full Listings 1 and 2 give an example of the definition
overview of AADL, readers that would like to learn of a REAL theorem and its application on an AADL
more about it can refer to the introduction written model. Listing 1 defines a (incomplete, due to lack
by its designers [20]. of space) model with one main system containing:

• One process component with two tasks


4.1 AADL modeling benefits (thread components). The first one requires
35Kbytes of memory and the other 57Kbytes;
By introducing a single specification notation for
• One memory component with 40000 bytes.
both software and hardware concerns, we strengthen


    process implementation p.i
    subcomponents
      task1 : thread t.i
        { Source_Stack_Size => 10 Kbytes;
          Source_Data_Size  => 20 Kbytes;
          Source_Code_Size  => 5 Kbytes; };
      task2 : thread t.i
        { Source_Stack_Size => 2 Kbytes;
          Source_Data_Size  => 50 Kbytes;
          Source_Code_Size  => 5 Kbytes; };
    end p.i;

    system implementation s.i
    subcomponents
      mem : memory ram.i
        { Word_Count => 10000;
          Word_Size  => 4 bytes; };
      prs : process p.i;
    properties
      Actual_Memory_Binding =>
        ( reference (mem) ) applies to prs;
    end s.i;

Listing 1: AADL model example to be processed by the REAL validator

The REAL theorem (listing 2) checks that, for each AADL process component of the model in listing 1, the amount of memory required by its tasks (lines 10 to 12 of listing 2) is less than the size of its associated memory (lines 14 and 15 of listing 2). Regarding the model of listing 1, this theorem is not validated and the validation tool would issue an error.

     1  theorem check_memory
     2    foreach prs in Process_Set do
     3      t := {x in Thread_Set |
     4            is_subcomponent_of (x, prs)};
     5
     6      m := {x in Memory_Set |
     7            is_bound_to (prs, x)};
     8
     9      check (
    10        (sum (property (t, "Source_Stack_Size")) +
    11         sum (property (t, "Source_Data_Size")) +
    12         sum (property (t, "Source_Code_Size")))
    13        <
    14        (sum (property (m, "word_count") *
    15         (property (m, "word_size"))))
    16      );
    17  end check_memory;

Listing 2: REAL theorem that checks task memory requirements
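To make the intent of the theorem concrete, the following small C sketch performs the same feasibility check by hand on the values of listing 1; the structure and names are illustrative only, they are not generated by REAL or Ocarina.

    #include <stdio.h>

    /* Memory demand of one thread, in bytes (values taken from listing 1). */
    struct thread_mem {
            long stack_size;
            long data_size;
            long code_size;
    };

    int main(void)
    {
            const long kbyte = 1024;
            struct thread_mem tasks[] = {
                    { 10 * kbyte, 20 * kbyte, 5 * kbyte },   /* task1 */
                    {  2 * kbyte, 50 * kbyte, 5 * kbyte },   /* task2 */
            };
            long demand = 0;
            long capacity = 10000L * 4;                 /* Word_Count * Word_Size */

            for (unsigned i = 0; i < sizeof(tasks) / sizeof(tasks[0]); i++)
                    demand += tasks[i].stack_size + tasks[i].data_size +
                              tasks[i].code_size;

            printf("demand  : %ld bytes\n", demand);    /* 92 Kbytes in total  */
            printf("capacity: %ld bytes\n", capacity);  /* only 40000 bytes    */
            printf("theorem %s\n", demand < capacity ? "holds" : "is violated");
            return 0;
    }

Running this check reports the theorem as violated, which is exactly the error the REAL validator raises for the model of listing 1.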
4.3 Scheduling validation

Scheduling is a very intensive topic in the context of embedded and real-time systems. Numerous scheduling analysis techniques and methods have been designed over the years trying to evaluate system scheduling feasibility.

TASTE interfaces AADL specifications with two scheduling analysis tools: Cheddar [7] and MAST [8]. Both are available under the GPL license terms. The next sections give an overview of these tools and explain how AADL models are exported to them.

Overview of Cheddar and MAST

Cheddar [7] is a scheduling analysis tool written in Ada that provides a command-line as well as a graphical interface (shown in figure 3). It validates timing constraints either by simulating system execution or by performing feasibility tests. To do so, the user must describe the system architecture (processors, tasks, scheduling policy, etc.). Cheddar supports state-of-the-art scheduling algorithms (RMS, EDF, LLF), as well as standardized algorithms (like the ones available in POSIX 1003b). Cheddar analysis also takes inter-task dependencies into account with an analysis of different sharing methods such as PIP, PCP or IPCP.

FIGURE 3: Cheddar scheduling validation

As for Cheddar, MAST [8] is a tool (shown in figure 4) that aims at validating the scheduling feasibility of a system. It can analyze a system using several algorithms either for task scheduling (RMS, EDF, etc.) or data locking (PIP, PCP, etc.). On the other hand, MAST takes distributed systems concerns into account, which is especially critical for real-time systems. For example, when a task execution is triggered by incoming data from another task, the analysis has to take scheduling concerns into account, since a delay on the sender side would have an impact. In addition, network-related aspects (such as latency, jitter, etc.) may also impact system execution as a whole. A system specification in MAST takes these


aspects into account, offering a convenient way to Connect scheduling analysis tools with AADL
analyze distributed systems.
To automate scheduling analysis, TASTE transforms
system specifications (Interface View and De-
ployment View) into a new description that can
be processed by MAST or Cheddar. First, it trans-
lates the AADL models into a Concurrency View:
a single AADL model that merges both software and
hardware aspects with all execution entities (tasks,
shared variables, etc.) with their scheduling con-
straints (scheduling algorithm of processors, locking
policy for shared variables, etc.).
Then, an appropriate code generator (Oca-
rina [10]) transforms this concurrency view into a
new representation suitable for Cheddar or MAST.
In fact, it consists in translating AADL language
constructs into an XML representation that can be
processed either by Cheddar or MAST. As a result,
this export function of our tool-chain bring the abil-
FIGURE 4: MAST scheduling analyzer in- ity to automate scheduling analysis with both tools
terface from the same specification (AADL models).

However MAST and Cheddar rely on their own


specification language. Consequently, engineers have
to translate system specifications into a new rep-
resentation dedicated to scheduling analysis. This 5 Implementation analysis
mapping is error-prone: engineers can make a mis-
take so evaluation would be done using wrong as-
sumptions. For that reason, TASTE automates this TASTE automatically creates system implementa-
translation from AADL to a specific representation tion from its Interface View, Deployment View
that would be used by either Cheddar or MAST, as and Data View (AADL models) by generating code
detailed in the next section. that targets Real-Time operating systems. However,
even if this automatic process offers many benefits
(error avoidance, requirements enforcement & trace-
Deployment view Interface view
AADL
ability, etc.), system implementation still has to be
AADL
validated and also met certification requirements.

Concurrency View
AADL

Code 5.1 Performance analysis


generation
Once system implementation is generated, develop-
ers can deploy it on the execution target. Then,
Cheddar Model MAST Model
appropriate tools trace/monitor system behavior to
evaluate its performance. For that purpose, several
Cheddar MAST tools already exist and are released either under pro-
prietary or free-software license.
To assess generated application performance,
Scheduling TASTE uses gprof, an execution profiling program
OK KO
feasability available in the binutils [3] tool-set. Its main advan-
tage is its integration within the GNU compilation
tool-chain: just by adding a flag in the compilation
FIGURE 5: Scheduling validation process options enable application profiling that can be later
of TASTE processed by analysis tools.


ample, the task activation time is correct regarding


specified period and deadline).

FIGURE 7: System behavior description


with GTKWave

FIGURE 6: Gprof interface of TASTE 5.3 Code coverage analysis

TASTE provides its own interface method with Standards such as DO178B [21] (for avionics sys-
gprof, as illustrated in figure 6. It parses profiling tems) or ECSS [22] (for aerospace applications) re-
results by its own and produces an execution report quires that safety-critical systems enforces a prede-
with the execution time and the number of execu- fined code coverage, depending on their criticality
tion for each function. By using this report, engi- level.
neers check execution traces compliance with system To do so, different methods are commonly used,
requirements. but most of the time, they require a manual instru-
mentation or inspection of application code. Code
instrumentation is intrusive: the code under inspec-
5.2 Specifications compliance en- tion is not the one that would be deployed and so,
forcement validation results may not be relevant. In addition,
a manual inspection is still error-prone, due to the
Execution profiling provides metrics and data that human-factor errors.
could detect some erroneous execution case (a func-
tion called too many times, a call that would not Binary QEMU
happen, etc.), but may be not sufficient to check im-
plementation correctness. In particular, implemen-
tation validation requires to check implementation
xcov Exec trace file
consistency with the specifications (AADL models).
This consist in monitoring system events, and check Coverage report
their compliance with the model. For that purpose,
TASTE provides functions to monitor system events FIGURE 8: Work-flow of the couverture
at run-time and create appropriate metrics that can tool-set
be compared with its specifications. To do so, it
instruments generated application with profiling in- To cope with these issues and provide an accu-
structions that produces VCD [6] files at run-time rate coverage analysis, TASTE relies on the COU-
(example of events reported is available in figure 7) VERTURE tool-set [4], a code coverage analyzer re-
with the following metrics: leased under free-software license terms. It relies on
two main tools that produce coverage reports (as
• Task activation time shown in figure 8):

• Data sent/received through tasks port 1. A tailored version of QEMU [16] traces all exe-
cuted instructions when executing the system.
• Shared data usage (semaphore/mutex acquisi-
tion and release) 2. An analysis tool, xcov, compares executed in-
structions with the program under execution
Once produced, programs such as GTKWave [5] and produces a coverage analysis report.
as used to depict system events with a graphical in-
terface and provide the ability to analyze system be- By tracking executed instructions and establish-
havior. It offers the ability to check run-time be- ing a mapping with the source file, xcov produces
havior consistency with system specifications (for ex- a complete coverage report, as shown in figure 9. It


details the execution of each line of code so that de- 1. A sensor for temperature acquisition.
velopers are able to assess if some block could be
2. A filter for bad data detection.
removed or not.
3. An average computer that receives each
However, to evaluate system implementation,
new temperature value from the filter and prints
this coverage functionality would be integrated with
the average temperature.
a test framework that would execute generated ap-
plications with different input values that are repre- Each function is deployed on top a Real-Time
sentative of a real environment. This would provide Operating System, executed on a single processor:
a better assessment of system quality, force each con-
• The sensor function is deployed on a LEON2
dition/decision of the code to be executed and lead
processor with RTEMS.
to a better coverage analysis.
• The filter function is executed on an Intel i386
processor with a Linux operating system.
• The average function is deployed on a LEON2
processor with RTEMS.
Finally, to enable the communication between
functions sensor/filter and and filter/average, the
processors are connected using a SpaceWire bus, as
shown in figure 11.

FIGURE 9: Example of a coverage report


produced by xcov

One particular interest of the COUVERTURE


tool-set is its non-intrusive characteristic: coverage
analysis is representative of the quality of applica-
tions that are finally deployed. In addition, COU-
VERTURE supports several coverage methods that FIGURE 10: Interface view of the system
required by certification standards (Statement Cov-
erage - SC, Decision Coverage - DC or Modified Con-
dition Decision Coverage - MCDC). Using these dif- 6.2 Specification & Validation
ferent methods, we could evaluate and potentially
certify generated systems at different levels. Functional aspects are specified in the Interface
View, as illustrated in figure 10. Functions are de-
scribed with the following characteristics:
6 Case study
• The sensor function is triggered periodically
The following sections illustrate the use of our each second. When activated, it acquires data
TASTE tool-chain through a case-study that de- from the hardware device and sends it to the
ploys several functions into a heterogeneous and dis- filter function.
tributed architecture.
• The filter function is sporadic, meaning that
its activation is triggered by incoming of data
6.1 Overview from the sensor function. The Minimal Inter-
Arrival Time (MIAT) between two new data
This case study consists in simulating a temperature instance is 10ms. However, as the sensor func-
sensor with a basic forecast management system. It tion is executed each second, the filter function
is composed of three functions: would have the same period. When activated,


it filters the data and send it when it is consid- FIGURE 12: Scheduling analysis result for
ered as valid. For the needs of this simulation, the node acquisition
50% of received data are considered as correct
so that this function sends data to the average Finally, from these both models, we can validate
function every two seconds. some of its aspects prior to implementation efforts.
In this case-study, we run a schedulability feasibility
• The average function is also sporadic and ac- using Cheddar [7], as illustrated in figure 11. The
tivated when receiving incoming data from the scheduling feasibility test is based on simulation and
filter. As the filter function sends data ev- is performed for each processor of the system (figure
ery two seconds, this function execution follows 11 illustrates the result for the acq board).
this period.
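The activation patterns just described, a 1 Hz periodic sensor and a data-triggered sporadic filter, can be pictured with a small POSIX sketch. This is only an illustration of the task model of the case study, not the code that TASTE/Ocarina generates, and the value produced by the sensor is a placeholder.

    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  data_ready = PTHREAD_COND_INITIALIZER;
    static double sample;
    static int have_sample;

    /* Periodic "sensor": produces one value per second. */
    static void *sensor(void *arg)
    {
            (void)arg;
            for (;;) {
                    double value = 20.0;              /* acquire from hardware */
                    pthread_mutex_lock(&lock);
                    sample = value;
                    have_sample = 1;
                    pthread_cond_signal(&data_ready);
                    pthread_mutex_unlock(&lock);
                    sleep(1);                         /* 1 s period */
            }
            return NULL;
    }

    /* Sporadic "filter": activated by the arrival of new data. */
    static void *filter(void *arg)
    {
            (void)arg;
            for (;;) {
                    pthread_mutex_lock(&lock);
                    while (!have_sample)
                            pthread_cond_wait(&data_ready, &lock);
                    have_sample = 0;
                    printf("filter received %.1f\n", sample);
                    pthread_mutex_unlock(&lock);
            }
            return NULL;
    }

    int main(void)
    {
            pthread_t t1, t2;
            pthread_create(&t1, NULL, sensor, NULL);
            pthread_create(&t2, NULL, filter, NULL);
            pthread_join(t1, NULL);
            pthread_join(t2, NULL);
            return 0;
    }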

6.3 Implementation validation

Then, TASTE processes system specifications


(Interface view and Deployment view, to gen-
erate binaries that would be executed on the target.
Then, it analyzes the implementation and verifies its
compliance with the specifications in order to check
certification requirements enforcement.
First of all, system profiling is performed us-
ing gprof, as illustrated in figure 13. Profiling
report shows how many time each function has
been executed. In the following example, the re-
sults refer to the execution of the first ten sec-
onds of the filter function. We can see that the
po hi delay until() function (called by each
FIGURE 11: Deployment view of the sys- function at the end of each cycle) has been invoked
tem almost 30 times, which can seem to be inconsistent:
this function is triggered each second so that func-
Then, once these functional aspects have been
tion would be activated at most 10 times. How-
described, their allocation on the platform has to be
ever, the node that hosts it communicates using two
specified using the Deployment View. It defines
SpaceWire bus and each one is using a task that
the hardware to be used and its association with the
polls the bus for incoming data periodically each sec-
previously defined functions of the Interface View.
ond. Consequently, if the functional aspects gener-
The deployment view of the case study (illustrated
ates only one task in the system, the deployment con-
in figure 11) is composed of three processors (two
cerns (device drivers) add additional resources that
LEON using RTEMS - acq board and avg board
have an impact on system execution.
and one intel i386 that uses Linux - filter board)
interconnected using SpaceWire links.

FIGURE 13: Profiling of the filter node us-


ing gprof


FIGURE 15: Behavior analysis of the filter


node using GTKWave, fine grain

Finally, our tool-chain produces a code coverage


analysis report of the generated application. As ex-
plained in section 5.3, safety-critical standards (such
as DO178B [21] or ECSS [22]) require that appli-
cations enforce quality criteria such as a predefined
code coverage value. To do so, TASTE automati-
FIGURE 14: Behavior analysis of the filter
cally produces coverage report using the COUVER-
node using GTKWave, coarse grain
TURE [4] tool-set, as shown in figure 16.
Then, as detailed in section 5.1, TASTE ana- Reports can be produced in different formats
lyzes system implementation to check system behav- (text, HTML, etc.). They detail, for each file and
ior compliance with its specifications. Figures 14 and function, the coverage information, so that engineers
15 report run-time events that occur when execut- assess system quality based on execution metrics. In
ing of the implementation of the node filter board. our case-study, most functions of the RTEMS exec-
Figure 14 reports the events at a coarse grain : we utive are not used so that it significantly decreases
can see that system activity happens on a periodic the code coverage level of produced applications.
basis, each second. Then, figure 15 details the events
at a finer grain, each second:

1. The poller task of the SpaceWire driver


(task 0 on figure 15) is activated. It receives
incoming data from the acq board (that ac-
quires temperature from the sensors).
2. When receiving data, poller task from the
SpaceWire driver (task 0 on fig 15) transfers
data to system port (port 2 0 on fig 15).
3. New data instance triggers the execution of the
filter function and its associated task (task 2
on figure 15) which retrieves the data (so that
the size of the port port 2 0 fallbacks to 0), FIGURE 16: Code Coverage Analysis of
executes its code and waits for new data. the average node using COUVERTURE

We can see that these events are consistent with


system specifications: the SpaceWire driver receives 7 Conclusions & Perspectives
data each second and triggers the execution of the
sporadic function filter. However, we didn’t detail
This article gives an overview of open-source tools
the execution of the task task 1 which seems always
that provide help and guidance for safety-critical sys-
active. In fact, this task corresponds to the poller
tems design. They are used as early as possible to
function connected to the other SpaceWire bus. As
support each step of the development process. Such
it never receives data (this node only sends data
tools are usually loosely-coupled and require man-
through this bus), the associated task is always wait-
ual efforts to be tailored to the development process
ing for incoming data and never go to the sleep mode.
of each system. To cope with these issues, TASTE
makes their use more consistent by linking them with
a single specification notation.
For that purpose, AADL models describes sys-
tem architecture with its execution constraints us-
ing. Then, tools translates this specification notation
to be processed by validation programs that check
architecture correctness and requirements enforce-
ment. This process automates the process, avoiding
issues of usual development methods.


Use of such a tool-chain strengthens the develop- [2] The ASSERT project
ment process and makes it more robust and reliable. http://www.assert-project.net
Moreover, as potential errors are discovered early in
the development process and integration issues would [3] GNU binutils
likely be reduced, development cost are expected to http://www.gnu.org/software/binutils/
decrease significantly.
[4] The Couverture project
Further work would cover other aspects of safety-
critical systems development. In particular, our [5] GTKWave
tool-chain could also support additional guidance for http://gtkwave.sourceforge.net/
safety-critical standards (such as DO178 or ECSS)
enforcement by providing documentation generation [6] Value Change Dump file format
facilities or additional implementation code valida-
[7] Cheddar scheduling analyzer
tion (coding rules to be checked, etc.).
[8] MAST scheduling analyzer
7.1 Perspectives http://mast.unican.es/

[9] Architecture Analysis and Design Language


Automation of AADL models production from usual
http://www.aadl.info
text-based specifications or linking these two nota-
tion would be particularly useful and integrates our [10] Ocarina AADL Toolsuite
tool-chain with traditional design methods. Such a http://ocarina.enst.fr
translation process will require that:
[11] Xenomai - http://www.xenomai.org
• all system entity and its associated require-
ment from the initial specifications are cor- [12] RTEMS - http://www.rtems.com
rectly translated into AADL components
[13] VxWorks - http://www.windriver.com
• there is no specification inconsistency (due to
semantic issues, mapping error, etc.) between [14] POSIX 1003.1b
the AADL model and the initial specifications.
[15] Olivier Gilles and Jérôme Hugues - Validating
Tools that address these issues are currently be- requirements at model-level in Ingnierie Dirige
ing developed. For example, the TOPCASED [23] par les modles (IDM08)
requirements importer tool provides the ability to
connect a requirement from a textual document [16] Fabrice Bellard - QEMU, a fast and portable
(with an extension such as .odt, .pdf, .txt, etc.) dynamic translator in Proceedings of the annual
to a model (potentially AADL). This work is emerg- conference on USENIX Annual Technical Con-
ing and available tools are still considered as exper- ference
imental, but this topic is a particular interest and
would be a major interest to trace a requirement [17] Xtratum - http://www.xtratum.org/
from its description (in the textual document) to the
implementation (the code). [18] Partitioned Operating Kernel - POK
http://pok.safety-critical.net
Then, another idea is to strengthen our tool-
chain by improving its functions. In fact, some [19] Gerald W. Neufeld and Son Vuong - Overview
TASTE functions are limited to several architectures of ASN1 in NetISDN, 1992
or platforms (for example, system analysis that pro-
duces VCD files is limited to Linux platforms). An [20] The Architecture Analysis & Design Language
important improvement would consist in supporting - An introduction
all potential deployment platforms.
[21] Radio Technical Commission for Aeronautics
(RTCA) - DO-178B: Software Considerations in
Airborne Systems and Equipment Certification
References
[22] ECSS-E-40, Space Engineering Software
[1] The ASSERT Set of Tools for Engineering
http://www.assert-project.net/taste [23] TOPCASED - http://www.topcased.org

FLOSS in Safety Critical Systems

Safety logic on top of complex hardware software systems utilizing


dynamic data types.

Nicholas Mc Guire
Distributed and Embedded Systems Lab
SISE, Lanzhou University
[email protected]

Abstract
Utilizing computers for safety critical systems, notably contemporary super-scalar multi-cores, let alone NUMA systems running general purpose operating systems like GNU/Linux, is quite contended in the safety community - their hopes still rest on determinism and KISS. While keeping things simple in the safety related components is undoubtedly preferred, it is questionable if keeping the hardware model simple is realistic - notably with the divergence of reality from model with respect to determinism already being dramatic for widely used general-purpose single-core CPUs. Further, actually covering the impact of all complex software components deterministically is not doable with an economically tolerable effort (whether it is technically doable is a different issue).
The consequence of this belief in determinism is an, in our opinion useless, fight against complexity and non-determinism - two inherent properties of modern hardware/software systems. Quite to the contrary, we propose to utilize the properties of complex systems to enhance safety related systems. This seemingly paradoxical approach can be seen as an attempt to take the bull by the horns, as it seems inevitable that the time of simple CPUs and black-box proprietary operating systems, which continue to entertain the illusion of determinism, is coming to an end.
Safety mechanisms, drawing enhancements from underlying complexity, that we see as potentially suitable for building safety related systems are:
• computation: inherent diversity
• data: mapping the value domain to complex data representations
• time: loose coupling and inherent randomness
and we are quite sure that this little list is incomplete at this point.
In this article we describe an attempt at the second category, called dynamic data types, which essentially combines the value domain with the temporal properties of data to map data to a value in the frequency domain rather than to a value in the time domain. We outline the concept of dynamic data types and a rationale for why it seems a promising approach for the coverage of particular fault classes. Finally we describe how building simple logic utilizing dynamic data types on complex systems can nevertheless yield a safe system and thus allow safety related logic to be co-located with non-safety related general purpose applications and services on a single contemporary system.
Keywords: Safety logic, complex systems, dynamic data types

1 Introduction this absence of noise has great technical advantages,


it has some interesting side-effects with respect to
safety properties. During failure mode and effect
Digital systems have a number of well known advan-
analysis for digital systems one regularly stumbles
tages, notably with respect to signal rectification and
across a number of fault classes seemingly inherent
complexity of signal interaction without degenera-
to digital systems:
tion - simply speaking the absence of noise. Though


• permanent bit faults (a memory cell, a shorted if this is implemented with common means then
relay, etc) we have a signal level (voltage or digital makes no
difference here) indicating the switch position of
• transient bit faults (the infamous cosmic ray -
up/down, we have an actuator that operates on such
or more earthly EMV issues)
a signal and we have the indicator lights that will
• systematic operational faults (i.e. the FOOF provide operator feedback on current actions.
bug in Pentium I) possible single faults (simple model)
• transient operational faults (i.e. critically low
voltage)
• up stuck:
• temporal faults (i.e. clock drifts or clock up- actuator ”sees” up pushed and lifts the door,
dates) it is fed back to the operator via indicator light
but the operator might not be able to close it,
This is a bit course grained but serves well for and the actuator can’t actually determine that
the discussion here. It also should be noted that fail- the input is unintended or invalid. The haz-
ures are only one possible cause of hazards, so this ard is probably limited as the indicator is cor-
is in principle incomplete [5]. rect and humans thus can respond according
All of these fault classes are then mitigated by to preset procedures to achieve a ”safe state”
different technologies (see [1] IEC 61508 part 7 for - long term consequences of ”up” being per-
a overview of available technologies) , starting from sistent are obviously only meaningfully inter-
redundancies and different levels, by adding diversity preted in a particular context.
in hardware or software, and by introducing pro- • down stuck:
tocols that mitigate against consequences of these actuator ”sees” down permanent and closes the
faults (i.e. using sequence numbers, CRC, etc.[4]). door - feedback to the operator indicates closed
All of this surrounding safety related mitigation’s of door - safe state assumed - thus no hazard, but
the respective fault classes may well be technically availability is impacted.
suitable, but the question to ask first is - why do
these faults exist in the first place - and if they are • actuator stuck:
inherent to digital systems can they not be mitigated the actuator does not respond to input, it stays
at a generic level rather than growing the complexity in what ever state it happens to be - this may
at the application level ? be visible in the indicator light but the opera-
tor would not see that the actuator is damaged
There have been some indirect efforts to mitigate in the signal - an additional diagnostic input
these issues at a generic level, think of the many IEC would be needed (i.e. indicator for the actua-
61131 [3] runtime systems allowing to focus again on tor operation status it self). Depending on the
the relatively simple logic and handling the safety actuator stuck position this may or may not be
related issues below-the-hood. The assumption be- safe - again context would be needed.
ing that for a simple set of operations a complete
list of potential faults can be established and miti- • indicator stuck green: - this would not allow
gated. The methods though used for this mitigation the operator to detect a potential hazardous
again are relatively specific and in general limited to situation and exposure to a hazard could oc-
ladder-logic or boolean-logic constructs - so the ques- cur based on procedures (green indicating en-
tion remains - what is the root cause and can it be tering permitted). The problem would not be
mitigated at a principle level? detected until an operator attempts to change
state by requesting ”up” and not getting any
response (indicator light changing to red) - but
2 Naive case study it would rely on human observation and even
if detected the diagnostics would require addi-
tional means.
This case study might seem quite trivial but from
a safety perspective we believe it demonstrates the • indicator stuck red:
potential advantage of the concept introduced in this impacts availability primarily.
paper - dynamic data types.
Note that we are not considering any temporal
switch <---> actuator <---> indicator lights issues here - there are of course a whole set of tempo-
(up/down) (red/green) ral failure modes as well, nor are we concerned with


the (and often dominant) non-technical failures like violation of value domain constraints. That is to say
management and safety culture failures. if a register contains an integer and a bit is flipped
it still contains an integer and is within range - the
bit-flip is not noticeable in the data-type. The rea-
2.1 Adding diagnostics son for this is that the value domain is dense (even
though it is discrete - dense in the sense that there are
As noted above there are a number of faults that no ”holes” in the representation) - conversely analog
could be diagnosed if additional information were signals had to be constructed with a granularity to
available from additional sensors - but the obvious ensure detectability of divergence, so the analog sig-
disadvantage of this is that additional sensors mean nal spectrum, effectively usable, could not be dense.
reduced availability on the one hand and the prob-
SEU
lem of accumulated faults for any indicators of ”rare-
events” - notably the later is problematic from a Single event upsets are random alterations of
safety perspective. A further issue is simply that some state of the system that can’t be predicted,
the system intended to be kept simple starts becom- they only can be covered by some form of fault-
ing more complex than would be needed for the pure tolerance or fault detection/reaction - they are in
logic. principle not testable as they are not visible until
they actually occurred. In case where the affected
The above example exhibits two main problems:
resource may stay in the altered state, the alteration
may be detectable if additional information about
• lack of indication of ”invalid” state for all com- the expected state is available.
ponents

• inability to identify the cause without addi- logic high .---- perceived value
tional sensors |
logic low------- - - intended value
^
If we look at the potentially hazardous situations
SEU
then these are associated with the problem of single
points of failure as well as the inability to signal ”in-
valid state”. Omission
Before considering methods to implement redun- Omission faults are due to the inavailability of
dancy, signal-diversity, safety communication, etc. some resource. They are in principle detectable,
we would like to look at the root causes and then though the effort for this may be high - note that
derive requirements that could potentially eliminate omission faults may also have systematic character-
them all together. istics (i.e. wire cross-talk).

logic high . - - - intended value


3 Root cause |
logic low ------------ perceived value
^
Paradoxically, from a safety perspective, one of the
bit failure output is a legal value
problems of digital systems is the absence of noise -
wire breakage
strictly speaking it is not absence, it just is a ”rare”
phenomena and thus perceived as being absent - we
don’t notice the rare event of bit-flips on the wire, So the SEU or omission occurrence is not visible
either it is covered by detection and retransmission in the output (think of this as a register or mem-
(i.e. ECC RAM, or noise cancellation on ethernet ory cell in which 0 was originally stored - due to
wires) or it leads to a failure that we then ”fix” by wire cross-talk it is flipped to 1 and then read as 1),
rebooting our PC... the bit failure on the other hand could also be due
to accumulated faults or physical failures of mem-
ory/register/wire but stay unnoticed until activated
3.1 Value domain by accessing. Testing for this fault is also not sim-
ple as it need not have caused a permanent damage,
The assumed absence of noise on the signals in digital so writing to the SEU affected location ”clears” the
systems has the irritating consequence that the rare fault aside from the problem of some paths not be-
event of bit-level alteration (SEUs) does not lead to a ing easily testable until you need them (think of fault


handling or emergency shutdown procedures). With |


other words the problem is that this type of fault | |
is neither detectable in the value domain, nor is it logic low ------------------’
mitigated in the data type. T1 T1’
^
This is in fact inherently a property of digital
clock fault output is late but not
systems, and a number of mitigation’s have been pro-
detectable by the externally
posed. Some of these mitigation’s are:
perceived value.

• inverse replicated logic (2-out-of-2)

• diverse replicated logic (2-out-of-2, N-out-of-M 4 dynamic data types


for availability reasons)
In this section we introduce a novel concept (at least
• periodic testing and statistical SEU occurrence in the context of safety related logic) of encapsulating
models ”protecting” the intermediate intervals. the general signal in a way that does not exhibit the
above mentioned limitations, and would not require
• safe data objects, protected by additional rep-
mitigation by additional means but rather enforces
resentations (i.e. CRCs)
inherent mitigation of faults. At this point we are not
(yet) claiming that the fault coverage is actually com-
All of these protections have one thing in com- plete - but we do think that the majority of relevant
mon, they tackle the problem at the symptomatic faults (notably single fault hypothesis) are covered
level, and all of them have the potential for common- without impacting availability by false-negative.
cause faults, though the probabilities can be argued
The system model behind dynamic data types
to be sufficiently low (if reality correlates with these
roughly is to view the input signals as streaming
arguments is not the topic here).
through the system and accumulating changes based
A further serious problem, from a safety perspec- on the operations done on them. There is no re-
tive, of digital systems is that the absence of an active liance on the correctness of operation, there only is a
signal, due to failure, may also be a legal value - i.e. reliance of the effects of operation being unique and
a solid 0. visible in the signal. This is in some ways similar
The root cause of the problem though lies in to coded mono-processors [7] that focus on the op-
the data type it self not becoming invalid by the al- erations rather than on the data, and ensure that
teration, and thus on-the-fly modification, and even operations carry such a unique property. The basic
permanent modification (i.e. a stuck relay) are not operations that are currently being studied are:
visible/detectable in the data-type it self.
• Signal expansion: i.e. adding of signals - ex-
tending the signals content to contain the dif-
3.2 Time domain ference and sum of the inputs.
• Signal reduction: i.e. filtering of signals - re-
A similar situation is given for the time domain. Ba- ducing the spectrum it represents.
sically the primitive data types don’t have any en-
coding of the time-axis. Even if we pair values with • Delaying the signal: i.e. phase shifting - alter-
some for of timestamps (also just a value represent- ing the temporal correlation to other signals in
ing some point in time) we don’t substantially change the system.
the issue, we just have added a level of complexity
for a correlated fault to occur (that is change of the
value and synchronous change of the time-stamp to 4.1 Signal Requirements
fit the expected value - think of a zero-crossing detec-
tion for sinusoidal signal used frequency monitoring The requirements listed here are only from a safety
on power-lines, the failure now would require a value perspective and not from a functional perspective.
and time-stamp fault to lead to a false positive, so further more they are most likely not yet complete
the probability is reduced but it is not eliminated). as this project is in an early stage.

Clock fault • all valid states must be active states - inactiv-


ity of any component may not lead to a legal
logic high . - - - - - - .------ value


• all elements may only react to valid states (im- frequencies, with a tolerated range. In this example
plying active states) we mapped:
• SEUs shall not impact the value domain (gen-
eral resilience against SEUs) Logic Dynamic representation
• signals must encapsulate temporal properties TRUE 6 Hz
along with value properties FALSE 12 Hz

• Operations on signals may not include a single-


point-of-failure (i.e. comparison to fixed value Each value needs a specific tolerance - so the
or fixed offset addition or filter parameters of value domain which is mapped to the frequency do-
which modification of only one leads to a false- main is sparse, and thus tolerant against a certain
positive filter) deviation (i.e. transient SEUs). Further signal inter-
action is by multiplication, thus any signal omission
To satisfy these - admittedly very crude - require- would lead to multiplication with a 0 value and thus
ments, we propose mapping logical values to discreet no valid output.

FIGURE 1: response to input omission


failure
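The effect illustrated in Figure 1 can be reproduced numerically: when two tone-coded inputs are multiplied, the product contains only the sum and difference frequencies of the inputs, and an omitted (all-zero) input suppresses the output entirely. The following sketch only assumes the TRUE = 6 Hz / FALSE = 12 Hz mapping introduced above and estimates the energy at a few candidate frequencies with a direct correlation; it is an illustration of the principle, not the implementation used for the figures, and the sampling rate and duration are arbitrary.

    #include <math.h>
    #include <stdio.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    #define FS 1000.0                  /* sampling rate in Hz   */
    #define N  1000                    /* one second of samples */

    /* Energy of x[] at frequency f, via correlation with sin/cos. */
    static double energy_at(const double *x, double f)
    {
            double re = 0.0, im = 0.0;
            for (int n = 0; n < N; n++) {
                    re += x[n] * cos(2.0 * M_PI * f * n / FS);
                    im += x[n] * sin(2.0 * M_PI * f * n / FS);
            }
            return (re * re + im * im) / N;
    }

    int main(void)
    {
            double a[N], b[N], prod[N];
            const double freqs[] = { 0.0, 6.0, 12.0, 18.0, 24.0 };

            for (int n = 0; n < N; n++) {
                    a[n] = sin(2.0 * M_PI * 6.0 * n / FS);   /* input A = TRUE */
                    b[n] = sin(2.0 * M_PI * 6.0 * n / FS);   /* input B = TRUE */
                    /* b[n] = 0.0; */  /* omission fault: output energy vanishes */
                    prod[n] = a[n] * b[n];
            }

            /* sin(6 Hz)*sin(6 Hz) = (cos(0) - cos(12 Hz))/2:
             * energy appears at 0 Hz and 12 Hz only, nowhere else. */
            for (unsigned i = 0; i < sizeof(freqs) / sizeof(freqs[0]); i++)
                    printf("%5.1f Hz: %8.3f\n", freqs[i], energy_at(prod, freqs[i]));
            return 0;
    }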

Stuck at failures are a bit more critical, select- ing (in this case butterworth filters) allows covering
ing proper frequencies in the generated intermedi- both omission and stuck-at faults.
ate spectrum in combination with band-pass filter-


FIGURE 2: response to stuck-at failure of


input

4.2 Core concepts of dynamic data • compromising dynamic data types would re-
types quire arbitrarily complex correlated alterations
of data/code to yield a valid signal
From a safety perspective dynamic data types con-
• increased system complexity reduces the prob-
cept can be summarized as
ability of a false-positive signal generation

• variable values are bound to a frequency spec- Essentially the goal of dynamic data types is
trum and so conserve temporal information of to ”scale” with system complexity growth which we
the signal consider inevitable.

• only active states are legal (recognizable) states


- that is signal omission or stuck-at does not 4.3 Hardware elements
yield a valid signal
The concept of dynamic data-types does not only
• combinations of signals lead to valid signals or allow implementing computer based logic on top of
no signal complex hardware/software systems safety but also
would lend it self to low-complexity system that
• Complex logic can be described by signal com- can be fully integrated with computer based systems
position without reduction of generality.


positive voltage
True : 1050Hz sin-wave level
IN A False: 1350Hz detection detection
2.1kHz

True : 1050Hz
False: 1350Hz Output
True/False
IN B voting
Frequency
logic
Switch
Multiply

True : 1050Hz
False: 1350Hz
negative voltage
sin-wave level
detection detection

FIGURE 3: Realization of logic element as


low-complexity hardware
Also as the whole point of dynamic data types half-waves that provide a diverse representation of
is to encapsulate not only state information but also the signal - thus any stuck at at best could trigger one
status and temporal information in the data repre- side but not both - finally utilizing both half-waves to
sentation this only makes sense if the end-elements, generate the output action. Naturally such an end-
the actuators in the system, also can utilize these element is quite problem specific thus we can’t gener-
inherent safety properties. alize this, but this scheme does show how the transi-
tion from the digital logic representation/processing
A basic model of an end-element for the use in
to final elements can be done retaining the technical
dynamic data types, is outlined here. The basic con-
safety properties of the hardware/software system.
cept is to process the bandpass signal as separate

positive voltage
sin-wave level
detection detection
1050kHz

Computation
True : 1050Hz voting
Node Actuator
False: 1350Hz logic

negative voltage
sin-wave level
detection detection

FIGURE 4: Realization of safe end-


element
While this is still active research, this prelimi- one can find relatively simple safety logic involved
nary hardware implementation does show the gener- in monitoring operations of complex systems. Ba-
ality of the concept allowing to establish a uniform sically the majority of the control is not safety re-
safe signaling model within a safety related system. lated - think of this as a CNC machine where the
only real safety related issue is if the protection do
rs are closed during operations or in case of operation
with open doors (for positioning and maintenance)
4.4 Composable logic the movement of components must be must be lim-
ited to very low speeds - the essence of this exam-
A further issue for any control system is compos- ple is that the actual safety conditions are relatively
ability. While we are restricting the model here to simple and it is typically resolved by monitoring the
boolean logic at present extensions do seem possi- system and intervening in case of constraint violation
ble. Never the less the first step is to demonstrate rather than relying on the entire control software to
composability of basic logic elements. be safety related and ”bug free”. This relatively sim-
Even though the presented example is not a ple logic components (i.e. doorALock & doorBLock
wildly impressive level of complexity - many times ) can now be monitored by a dedicated safety re-

255
Safety logic on top of complex hardware software systems utilizing dynamic data types.

lated system - hat is some MCU or could be run on either be in the tolerance of the signaling system (fil-
the available computing capacity provided adequate ter bandpass tolerances) or will cause the system to
isolation could be enforced as stated in IEC 61508- enter an identifiable invalid state and lead to a safety
3 Clause 7.4.2.8/7.4.2.9 [2]. To provide such simple reaction. For systematic faults it may be a bit more
logic one can use dynamic data types that allow cre- complicated to argue such an approach but equally
ating reliable logic from signal summations followed the impact of any systematic fault at the lower level
by digital filtering to extract the intended logic oper- (i.e. operating system, system libraries) would have
ation. This is achieved by multiplying the inputs and to generate a very complex and synchronous response
the resulting frequency summation/difference again and not merely a local fault like the infamous FOOF
by a constant helper frequency. bug [6].
The claim here thus is that such relatively low-
Inputs complexity safety logic elements can be run on a
A B A*B H1 Spectrum of A*B*H1 unsafe OS while retaining the safety properties of
6 6 0,12 27 , ,15, ,27, ,39 the logic. It should be stressed though that we are
6 12 6,18 27 ,9 , ,21, ,33, ,45 not claiming any mitigation of systematic faults in
12 6 6,18 27 ,9 , ,21, ,33, ,45 the implementation of the safety logic it self - rather
12 12 0,24 27 3 , , , ,27, , , ,51 only mitigation of systematic faults in the underly-
| XOR | ing generic components is claimed. With the overall
complexity of the safety logic and the signal process-
ing beneath it being relatively low we think that it
The output frequency spectrum now allows to
is an absolutely realistic target to implement such a
use a bandpass to extract A XOR B from the above
logic according to suitable standard procedures and
spectrum. By changing the input H1 (a helper fre-
under control of adequate safety management to en-
quency) the frequency spectrum generated by multi-
sure with adequate probability that the residual fail-
plying the two inputs with the helper frequency al-
ure rate of the safety logic is sufficiently low. As an
lows extraction of A AND B. By following the input
example the DDT logic presented here takes two in-
multiplication stage by a digital filter (in our case we
puts A and B and a control input H and provides an
use a 4th or 5th order FIR filter) one then can ex-
output of A o B and NOT A o B, with the operation
tract the desired compositional logic term from the
being AND or XOR depending on the setting of H.
signal.

Inputs H
A B A*B H1 Spectrum of A*B*H1
6 6 0,12 9 3 , 9, ,21, ,
6 12 6,18 9 3 , 9,15, ,27,
A AoB
12 6 6,18 9 3 , 9,15, ,27,
12 12 0,24 9 , 9,15, , ,33 DDT−
| AND |
Logic
B AoB
While this may seem quite a complex way of
doing this, it is precisely the complexity introduced
that is the protective mechanism against systematic FIGURE 5: composable logic element
and stochastic faults in the system as the signals logic structure
value is encoded in the frequency of the signal and
thus to transform a signal into a different logical sig- Note that the frequencies selected here during
nal (i..e. a signal that should be N Hz representing simulation, technically are not sensible, but the prin-
on to a frequency of N Hz representing off) would ciples don’t change when transformed to other ranges
require a highly complex sequence of synchronous of the spectrum - any real-life system would be op-
faults to appear - all faults that appear randomly will erating in the kHz range.

256
FLOSS in Safety Critical Systems

FIGURE 6: composable logic element based


on DDT
Inspecting the generated frequency spectrum one tem can impact the processing of dynamic data types
can extract further logic operations on the inputs at arbitrary points, but as there is no single point
from the output spectrum through adjusted filters of failure in the transformation of the input stream
as well as additional logic inputs. At this point we to the output stream, no such single fault could im-
neither know the scalability of this composition nor pact the logical correctness - it can of course diminish
do we know its limits - but the current state of ex- availability - which is not of concern here. Regard-
periments is sufficient to already produce some ba- ing multiple faults, there is of course the possibility
sic boolean safety logics that could be practicable of accumulated dormant faults emerging and these
for safety related applications running on complex then striking at some mode change - i.e. in the above
(non-safety) hardware software platforms. example when switching the mode-input from AND
logic to XOR logic - but any such fault would have
to again generate a complex sequence of values to
lead to a false positive output which is not impos-
4.5 Masking through complexity sible but unlikely - how unlikely it is is still under
active investigation.
As noted this is in the context of efforts to utilize
complexity of underlying hardware/software systems As the data stream though always depends on
to enhance safety rather than fighting them. With multiple manipulations (one could view dynamic
the the mapping presented in the last section we ob- data types as a N-value diverse data representation)
tain a process to construct outputs that is not suscep- SEUs are fully mask (both in signals as well as pa-
tible to faults in the underlying software/hardware rameters for i.e. the filters) and at the same time
systems because any false-positive requires an arbi- would be detectable by independent signal monitor-
trarily long correlated sequence of faults to lead to ing. In many ways what is being proposed here is
the false positive. Any fault in the underlying sys- the re-invention of quite simplistic analogue technol-

257
Safety logic on top of complex hardware software systems utilizing dynamic data types.

ogy in a digital world - but with the advantage of components as well as the frequency/spectrum-gap
retaining the signal transmission properties and the selection) are directly safety critical, such a system
ability to manipulate the signals in software. can be allowed to run on a un-safe operating sys-
tem/hardware due to the inherent detectability of
induced faults. Thus the required independence from
5 Conclusion the co-located software components can potentially
be provided.
Essentially we believe that safety needs methods to While we see dynamic data types as a potential
build on complexity - ideally appreciate complexity solution to some of the typical safety logic needs, we
of underlying systems, without compromising safety are well aware of this not yet being a mature con-
principles. We see little point in a continuous fight cept - rather we hope that we will find opportunities
against ever increasing complexity of system software to study this approach in more detail. Independent
and hardware - the question of ”should it be done in of the suitability of the dynamic data type concept
software” is legitimate in many cases, but there is though, we believe that it is time to start thinking
also little point in denying the trend towards com- about how to utilize complexity and inherent non-
plexer software/hardware systems and aggregation determinism for safety related systems rather than
of functions of different safety levels in systems. If continuing to rely on an abstract model of comput-
the safety community does not find answers to the ing that is crumbling - if not vanishing.
complexity growth that fit current technologies and
goes on recommending simple systems with simple
or no operating system, this will eventually backfire References
as we will see the expertise and experience as well
as field data of these systems fade into negligence - [1] IEC 61508-7 Ed 1 Functional safety of electri-
resulting in degraded safety in the long run. cal/electronical/programmable electronic safety-
We are not claiming that dynamic data types are related systems Part 7: Overview of tech-
THE answer to the problem, but we do believe that niques and measures, International Electrotech-
they are an example of the potential that lies in the nical Commission, 1998
paradigm change when switching to ”capitalize on
[2] IEC 61508-3:1998 Ed 1 Functional safety
complexity” rather than ”fight complexity”.
of electrical/electronical/programmable electronic
As Leveson notes that software can affect system safety-related systems Part 3: Software require-
safety in two principle ways: ments, International Electrotechnical Commis-
sion,1998
• it can exhibit behavior in terms of output [3] IEC 61131,
value and timing that contributes to the sys-
tem reaching a hazardous state [4] EN 50159-1 Railway applications - Communica-
tion, signalling and processing systems Part 1:
• it can fail to recognize and handle hardware Safety-related communication in closed transmis-
failures that it is required to control or respond sion systems, CENELEC, 2003
to in some way.
[5] Safeware: System Safety and Computers, Nancy
As safety is a system property dynamic data Leveson, 1995 Addison-Wesley
types can’t in principle guarantee software safety - [6] Pentium F00F bug ,Wikipedia, May 2010
but we believe dynamic data-types can give reliable http://en.wikipedia.org/wiki/0xF00FC7C8a
mitigation’s to both of the above mentioned potential
of software to impact system safety. Thus while the [7] The Coded Microprocessor Ccertification, Ozello
implementation of dynamic data types (the software Patrick, 1992,SafeComp

258
FLOSS in Safety Critical Systems

POK, an ARINC653-compliant operating system


released under the BSD license

Julien Delange
European Space Agency
Keplerlaan 1, 2201AG Noordwijk, The Netherlands
[email protected]

Laurent Lec
MakeMeReach
23 rue de Cléry, 75002 Paris, France
[email protected]

Abstract
High-integrity systems must be designed to ensure reliability and robustness properties. They must
operate continuously, even when deployed in hostile environment and exposed to hazards and threats.
To avoid any potential issue during execution, they are developed with specific attention. For that
purpose, specific standards define methods and rules to be checked during the development process.
Dedicated execution platforms must also be used to reduce potential errors. For example, in the avionics
domain, the DO178-B standard defines the quality criteria (in terms of performance, code coverage, etc.)
to be met according to the software assurance level. ARINC653 specifies services for the design of safe
systems of avionics systems by using partitioning mechanisms.
However, despite those specific methods and tools, errors are still introduced in high-integrity systems
implementation. In fact, their complexity due to the large number of collocated functions complicates
their analysis, design or even configuration & deployment. In addition, an error may lead to a safety or
security threats, which is especially critical for such systems.
In addition, existing tools and software are released under either commercial or proprietary terms.
This does not ease identification and fix of potential security/safety issues while also reducing the potential
users audience.
In this paper, we present POK, a kernel released under the BSD license that supports software iso-
lation with time & space partitioning for high-integrity systems implementation. Its configuration is
automatically generated from system specifications to avoid potential error related to traditional code
production processes. System specifications, written using AADL models are also analyzed to detect any
design error prior implementation efforts.

1 Introduction tive behavior. However, they often operate in hostile


environments so that they must be especially tested,
validated to ensure absence of potential error.
Safety-critical systems are used in domains (such as
military, medicine, etc.) where security and safety Last years, processing power of such real-time
are a special interest. They have strong requirements embedded systems increased significantly so that
in terms of time (enforcement of strong deadlines) many functions previously implemented using dedi-
and resources consumption (low memory footprint, cated hardware are now provided by software, which
no dead code, etc.). These systems must ensure a facilitates updates and reduces development costs.
continuous operational state so that each potential In addition, this additional processing power gives
failure must be detected before generating any defec- the ability to collocate several functions on a same

259
POK, an ARINC653-compliant operating system released under the BSD license

computation node, reducing the hardware to deploy purpose, functions are separated within partitions.
and thus, the overall system complexity. These are under supervision of a dedicated kernel
that provides partitions execution separation using
However, by collocating several functions on the
time & space isolation:
same processor brings new problems. In particular,
this integration must ensure that each function will
be executed as it was deployed on a single proces- • Time isolation means that processing capac-
sor. In other words, the execution run-time must ity is allocated to each partition so that it is
provide an environment similar to the one provided executed as if it was deployed on a single pro-
by a single processor. In addition, impacts between cessor.
collocated functions must be analyzed, especially • Space isolation means that each partition is
for safety-critical systems where integrated functions contained in a dedicated memory segment so
may be classified at different security/safety levels. that one partition cannot access the segment
Next section details the problem and our pro- of another. However, partitions can communi-
posed approach. It also presents other work related cate through dedicated channels under kernel
to this topic. supervision so that only authorized channels
are granted at run-time.

2 Problems & Approach Partitions integration analysis is especially im-


portant since a partitioned kernel may collocate sev-
The following paragraphs details identified problems eral functions classified at different security/safety
related to high-integrity systems design and devel- levels. Communication channels between partitions
opment with the increasing number of functions and must be verified to check that system architecture
their integration on the same processor. Then, it does not break safety/security requirements. For ex-
details our approach and each of its functionality. ample, in the context of safety, we have to check that
a partition at a low safety level cannot block a device
used by a partition at a higher security level. On the
2.1 Problems other hand, on security level, we have to check that
a partition cannot establish covert channel to read
High-integrity systems host an increasing number of or write data from/to partition classified at higher
functions. In addition, due to new hardware and per- security levels.
formance improvements, the trend is to exploit ad- Thus, automatic kernel and partitions configura-
ditional computing capacity and collocate more soft- tion would be a particular interest because these sys-
ware components on the same processor. However, tems host partitions that provide critical functions
this introduces new problems, especially when sys- and must be carefully designed. In consequence, we
tems have to enforce safety or security requirements: have to avoid any manual code production process,
because it is error prone and have important eco-
1. The platform must provide an environment for
nomic and safety impacts [6]. This is also especially
system functions so that they can be executed
critical when creating the most important code - the
as if they run on a single processor.
one that configures the isolation policy.
2. Functions collocated on the same processor
must be analyzed so that an error on a func-
tion cannot impact the others. For example, 2.2 Approach
this would ensure that a function classified at
a given assurance level cannot impact another To cope with identified problems, we propose the fol-
at a higher one. lowing approach focused on the following aspects:

3. The execution platform configuration must be


1. A dedicated run-time, POK, that ensures
error-free to ensure functions isolation and re-
time & space partitioning so that functions are
quirements enforcement. Otherwise, any po-
executed as if they run on a single processor.
tential mistake would lead to misbehavior and
make the whole system unstable. 2. An analysis framework that analyzes sys-
tem functions and detects potential errors.
First problem relates to function isolation: while
collocating several functions on the same proces- 3. An automatic partitions and kernel con-
sor, designers must ensure their isolation. For that figuration tool from validated specifications.

260
FLOSS in Safety Critical Systems

This ensures enforcement of validated safety the ARINC653 avionics standard [2] specifies the ser-
and security requirements at run-time. vices and the API that would be provided by such a
system. The MILS [14] approach, dedicated to secu-
Use of these three steps altogether makes a com- rity, also details required services to integrate several
plete development process that would ease integra- functions on the same processor while ensuring a se-
tion of heterogeneous functions, as illustrated in fig- curity policy.
ure 1. First, the designer describes its functions with On our side, the POK operating system provides
its properties (criticality/security levels, execution services required by both ARINC53 & MILS: time
time, etc.) using an appropriate specification lan- & space partitioning support, real-time scheduling,
guage (AADL). Then, we analyze system architec- device drivers isolation, etc. It supports ARINC653
ture to ensure that each of their requirements would API and provides a POSIX adaptation layer to ease
be enforced while integrating (step 1 on 1). From this application use on POK. Finally, it is released under
analysis, deployment & configuration code is auto- the BSD license so that users are willing to use it as
matically produced so that partitions and kernels are free-software and can easily improve it, depending on
correctly configured according to their requirements their needs.
(step 2 on 1). Finally, generated code is integrated
with the POK execution platform that supports time On system modeling and analysis side, no frame-
& space isolation so that functions are integrated and work provides the capability to both capture and
separated as specified (step 3 on 1). analyze partitioned architecture requirements. How-
ever, this is very important for incoming projects
when system architecture must be analyzed/vali-
AADL models dated and development automated, especially be-
cause traditional development methods costs are still
Specifications validation (1) increasing [6] and lead to security/safety issues.
Similarly, no tool automates configuration and
Code generator (2) deployment of partitioned kernel from their spec-
ifications. Usually, developers make it manually
Configuration Plate-forme (POK) by translating system requirements into configura-
tion/deployment code. This is still error-prone and
(3)
a fault can have a significant impact (missed tim-
Implementation
ing constraint/deadline, communication channel not
allowed, etc.). Automating configuration is still an
FIGURE 1: Overall development approach emerging need but would likely to take importance
while system functions are still increasing.
As a result, our approach ensures integration of
several functions on the same processor while pre-
serving all their requirements in terms of timing, se- 3 The POK execution platform
curity or safety.

The following sections gives a general presentation


2.3 Related Work of the POK kernel, detailing its services and imple-
mentation internals.
Partitioning operating systems that support time &
space partitioning already exist. However, most ex-
isting solutions are released as commercial and pro- 3.1 Overview
prietary software, such as Vxworks [8] or LynxOS [9].
On the free-software side, the Xtratum [13] project POK is an operating system that isolates partitions
aims at providing partitioning support by isolating in terms of:
functions in RTEMS [11] instances. However, it iso-
lates functions using virtualisation mechanisms (us-
• time by allocating a fixed time-slice for parti-
ing a dedicated hypervisor), whereas POK relies on
tions tasks execution with its own scheduling
a traditional kernel model.
policy.
In addition, several standardization attempts
were initiated over the years, to define the concepts • space by associating a unique memory seg-
behind partitioned operating systems. Among them, ment to each partition for its code and data.

261
POK, an ARINC653-compliant operating system released under the BSD license

However, partitions may need to communicate. functions to partitions (for supporting schedul-
For that purpose, POK provides the inter-partitions ing algorithms for example).
ports mechanism that defines interfaces to exchange
• Fault Handling catches errors/exceptions
data between partitions. These are defined in ker-
(for example: divide by zero, segmentation
nel configuration so that only specified channels are
fault, etc.) and calls the appropriate handler
allowed at run-time. Consequently, system designer
(partition or task that generates the fault).
has to specify the communication policy prior (which
partition can communicate with another) executing • Time Isolation allocates time to each parti-
the system. tion according to kernel configuration.
Moreover, partitions may require to communi- • Space Isolation switches from one memory
cate with the external environment and so, access segment to another when another partition is
devices. However, sharing a device between two par- selected to be executed.
titions leads to potential security or safety issues: a
partition at a low security level may read data pre- • Inter-partitions communication exchanges
viously written by a partition classified at a higher data from one partition to another. Inter-
level. To prevent this kind of issue, POK requires partitions communication is supervised by the
that each device is associated to one partition and kernel so that only explicitly defined commu-
the binding between partitions and devices must be nications channels can be established and ex-
explicitly defined during the configuration. change data.

Partition-level services aims at supporting appli-


Partition Level

Cipher algorithms Device drivers code cations execution. It provides relevant services to
create execution entities (tasks, mutexes, etc ...) and
(libpok)

Maths functions libc POSIX ARINC653


helps developers:
Kernel Tasking Intra-partition
interface communications • Kernel interface accesses kernel services
(writing to/reading from inter-partitions com-
Time Space Inter-partitions munication ports, creating tasks, etc.). It re-
Kernel level

isolation isolation communication lies on software interrupts (syscalls) to request


specific kernel services.
Fault Time Memory
Handling Management Management • Tasking functions handles task-related as-
pects (thread management, mutex/semaphore
Hardware Abstraction Layer locking, etc.).
• Intra-partition communication provides
FIGURE 2: Services of Kernel and Parti-
functions for data exchange within a parti-
tion layers
tion. It supports state-of-the-art communi-
cation patterns: events or data-flow oriented,
Finally, POK is available under BSD licence
semaphores, events, etc. Intra-partitions ser-
terms, supports several architectures (x86, PowerPC,
vices are the same than the one provided in
SPARC/LEON) and has already been used by differ-
the ARINC653 [2] standard.
ent research projects [15, 5, 18].
• Libc, Maths functions, POSIX & AR-
INC653 layers are the mapping of well known
3.2 POK services standards in POK. It aims at adapting other
APIs to help developers for reusing existing
Figure 2 depicts the services of each layer (kernel/- code in POK partitions. For example, by using
partition). The kernel contains the following: this compatibility layer, developers can reuse
existing software that performs POSIX calls.
• The Hardware Abstraction Layer provides
a uniform view of the hardware. • Device drivers code is a set of functions to
support devices within partitions. It relies on
• Memory Management handles memory to the kernel-interface service when privileged in-
create or delete memory segments. struction (see section 3.5) are used.

• Time Management counts the time elapsed • Cipher algorithms is a set of reusable func-
since system start-up and provides time-related tions to crypt/decrypt data within a partition.

262
FLOSS in Safety Critical Systems

3.3 Time isolation policy overlap each other and a partition cannot access to
memory address located outside its associated seg-
The time partitioning policy requires a predictable ment.
method to ensure enforcement of time-slices alloca-
Segments properties (address, size, allocation)
tion. For that purpose, POK schedules partitions
are defined at configuration time and cannot be mod-
using a round-robin protocol: each of them is exe-
ified during run-time. To avoid any access outside
cuted periodically for a fixed amount of time. This
their segments, partitions are executed in a user con-
introduces a scheduling hierarchy: partitions are first
text. They can only use a restricted set of instruc-
scheduled and then, tasks within the partition are
tions1 , as for user processes on regular operating sys-
scheduled according to the internal partition schedul-
tems. To use privileged instructions2 , they must call
ing policy.
kernel services and performs system calls (see POK
services - kernel interface in 3.2).
RMS RMS To guarantee space isolation, POK relies on the
Level 1

policy policy Memory Management Unit (known as the MMU ), a


dedicated hardware component usually embedded in
Round-Robin Round-Robin
policy policy
the processor. Depending on the architecture, dif-
ferent protection mechanisms can be used (segmen-
Partition 1 Partition 2 Partition 1 Partition 2
tation, pagination, etc.). For the x86 platform, it
Level 0

relies on the segmentation. This protection pattern


10 ms has many advantages: it is very simple to program
(the developer has to define memory areas using the
FIGURE 3: Partitions scheduling example Global Descriptor Table - GDT [12]) and is pre-
using POK dictable (other memory protection mechanisms re-
quires to perform special operations which execution
In order to ensure predictable communications time are difficult to bound). When switching from a
(in terms of time), inter-partitions communication partition to another, the POK kernel ensures that:
data are flushed at the major frame, which is the
end of the partition scheduling period (sum of all par- 1. Execution context from the previous partition
titions periods). In other words, once all partitions is clean (the elected partition cannot read data
are executed, inter-partitions communications data produced by the previous).
are flushed and is available to recipients partitions
2. The memory segment associated with the
during their next execution period.
elected partition is loaded.
An example of such a scheduling policy is pro-
3. Caches are flushed so that the execution con-
vided in figure 3 with two partitions: one sched-
text is clean as if the partition was freshly
ules its task using the Rate Monotonic Scheduling
elected on a clear environment.
(RMS) policy while the other uses Earliest Dead-
line First (EDF). Each partition is executed during
100ms and inter-partitions communication ports are
flushed each 200ms.
Partition 1 Partition 2
Within partitions, POK supports several POK kernel
scheduling policies: the basic FIFO algorithm or
other state-of-the-art methods such as EDF or LLF.
FIGURE 4: Inter-partitions communica-
However, support for advanced scheduling algo-
tion within POK
rithms is still considered as experimental.
Finally, POK provides the inter-partition com-
munication services, which aim at establishing and
3.4 Space isolation and inter- monitoring data exchange from one partition to an-
partitions communication other, as shown in figure 4, where partition 1 send
data to partition 2 under the supervision of the POK
Partitions are contained in a distinct memory seg- kernel. These channels must be defined at configu-
ment, which means that each of them has a unique ration time so that the kernel allows only explicitly
space to store its code and data. Segments cannot defined communication at run-time.
1 On Intel architectures, such code is usually executed in ring 3
2 On Intel architectures, privileged instructions are available for code executed in ring 0

263
POK, an ARINC653-compliant operating system released under the BSD license

In the example of figure 4, any attempt to send


data from partition 2 to partition 1 will fail. As
explained in section 3.3, inter-partitions communi- Partition Partition
cation are flushed when the major time frame is (application) (driver)
reached. To do so, the POK kernel copies the data of
each initialized source ports to their connected des- POK kernel
tination ports, copying memory areas from one par-
tition to another. However, to ensure that only rele-
vant data are copied, the size of each inter-partition Application partition segment
port is also defined at configuration time so that no
additional data can be read by the destination par- Driver partition segment
tition. I/O segment

FIGURE 6: Input/Output operations


3.5 Device drivers implementation within POK

Compared to other approaches, POK does not exe- Finally, when executing within a partition, a
cute device drivers within the kernel. Instead, their driver requires to have access to privileged instruc-
code is isolated within a single partition to: tions to control hardware. For that purpose, POK
provides access to these privileged instructions only
to the relevant partition (the one that accesses the
• Avoid any impact from a device driver error driver). In that case, this additional access must
(safety reason). If the driver is executed in be defined at configuration time so that the kernel
the kernel context, a crash would lead to un- grants/refuses access to privileged instructions ac-
expected impacts, such as crashing the whole cording to its configuration policy. An example is
kernel or other partitions. shown in figure 6: in this system, two partitions
are executed: one application partition and a driver
• Ensure data isolation (security reason): if the partition that controls a device. The system defines
device is shared by several partitions, one clas- three memory areas: one for each partition and an
sified at a low security level may read or write additional area mapped to the device memory. The
data on the device from another one classified driver partition may have access to this specific addi-
at higher level. tional memory segment to control the hardware but
the application partition would not be able to access
it.

Partition Partition Partition


reader driver writer 3.6 Configuration flexibility
POK kernel One major advantage of POK compared to other par-
titioned operating systems is its fine-grained configu-
FIGURE 5: Model for sharing a device us- ration policy: each service of POK, at each level (par-
ing POK tition of kernel) can be configured and finely tuned.
This would have several advantages: first, it reduces
the set of functionality to the minimum and ensures
However, some partitions may need to share a
a reduced memory footprint. In addition, reducing
device. Even if they cannot have their own driver in-
the number of functions and potentially useless code
stance, these partitions may communicate with the
increases the code coverage of the whole system (for
one that executes the driver. In that case, the con-
partitions and kernel).
nection between partitions must be explicitly defined
at configuration time and then, be enforced by the However, this flexibility is particularly interest-
security policy. This is illustrated in figure 5: the ing when the system is automatically configured, ac-
device that reads the device receives data from the cording to its specification/description. Otherwise,
partition driver while the writer has an outgoing con- the developer has to write the configuration man-
nection to it. Connection to the driver must be ex- ually, which would be time-consuming and error-
plicitly defined so that the partition driver sends data prone. Hopefully, POK provides such configuration
only to relevant communication ports. generation facility through its AADL code generator.

264
FLOSS in Safety Critical Systems

To auto-generate kernel and partitions configuration, (AADL models) so that verified properties are cor-
designer must then first model its system with its re- rectly translated into code.
quirements. This is explained in the next sections.
Next sections present the language chosen
(AADL [7]), its tailoring for partitioned architecture

Validation failure
AADL models modeling and its use for specifications validation.

Model validation rules 4.1 AADL


Validation success

AADL-to-ARINC 653 The Architecture Analysis and Design Language


code generator (Ocarina) (AADL) [7] is a modeling language that defines
Code production a set of components to model a system architec-
ture. It provides hardware (processor, virtual
Generated Generated processor, memory, etc.) and software components
Partition layer
Configuration Configuration (thread, data, subprogram, etc.) as well as mecha-
for partition 1 for partition n nisms to associate them (for example, model the as-
Partition Partition sociation of a thread to a processor or a memory).
services (libpok) services (libpok) The language itself can be extended using:
Kernel

Generated Kernel Kernel


layer

configuration services (POK) 1. Properties to specify components require-


ments or constraints (for example, the period
FIGURE 7: Development and prototyping of a task). The core language provides a set of
process using AADL models predefined properties but system designers can
define their own.
4 Partitioned System Model- 2. Annex language to associate a piece of code
ing of another language to a component. By do-
ing so, it provides a way to interface AADL
components with third-party language for the
To define a system architecture that uses POK, its association of other system aspects.
specification is written with an appropriate modeling
language to:
AADL provides both graphical and textual no-
tations. The graphical one is more convenient for
1. Specify system functions using a rigorous no-
system designers while the text-based one is easier
tation (better than a simple text-based docu-
to process by system analyzers/code generators. A
ment)
complete description of the language, written by its
2. Analyze system properties and requirements to authors, is available in [1].
check for potential errors before implementa-
tion efforts. (for example, check system archi-
tecture against a security policy). 4.2 Tailoring the AADL for parti-
tioned architectures
3. Generate configuration & deployment code us-
ing automated tools so that we avoid hand-
The AADL provides a convenient way to describe
written code and its related problems.
both software and hardware architectures but one
may need guidance to know how to use it to describe
For our needs, we make a specific process (as il-
partitioned architectures and especially how to use
lustrated in figure 7) for the design and implementa-
AADL components to describe POK run-time enti-
tion or partitioned systems supported by POK. First
ties (partitions, memory segments, etc.) and their re-
of all, engineers create AADL [7] models that specify
quirements/constraints (partitions scheduling, mem-
system constraints and requirements. These specifi-
ory segments size, etc.).
cations are processed by validation tools that check
for potential error/fault that could generate a prob- For that purpose, we design modeling patterns,
lem either at run-time or even for system implemen- which consist of a set of modeling rules to be used
tation. Finally, code generator automatically cre- by system designers to represent partitioned archi-
ates configuration & deployment code for both ker- tectures. The main idea is to restrict the set of com-
nel and partitions according to system requirements ponents to be used in a model and which properties

265
POK, an ARINC653-compliant operating system released under the BSD license

must be associated to the model in order to make a example: resource dimensions with the analysis of
complete description of a partitioned architecture. memory segments size with respect to the associated
partitions requirements - size of tasks size, etc.) or
These modeling patterns has been first design for
validation of a specific constraint (for example, par-
POK [15] and then, be standardized as an annex of
titions scheduling).
the AADL standard [3]. Most important patterns de-
fine how to use AADL to model a partition (process Once models are validated and the architecture
bound to a memory and a virtual processor to considered as correct, our code generator, Ocarina,
model the partition, its allocated memory segment produces configuration & deployment code for both
and its execution context), intra-partition communi- the kernel and its associated partitions. Next sec-
cation (connections of AADL ports between thread tions detail this process.
located within the same process) or inter-partition
communication (connection of AADL ports between
process components). A full description of these
modeling patterns is available in [15, 3, 5].
Next section presents our tools that check models 5 Automatic Configuration &
to validate partitioned architectures requirements. Deployment of Partitioned
Architectures
4.3 System validation
Once system requirements and constraints have been
Once system architecture is specified using AADL validated, the Ocarina AADL tool-suite processes
models, analysis tools process models and check their models and generates configuration & deployment
compliance with several requirements/guidelines. In code for both kernel and partitions, as shown in fig-
the context of POK, we design the following rules: ure 7. Next sections describe the process [5], high-
lighting its benefits regarding safety-critical systems
• Modeling patterns enforcement rules needs (predictability, safety assurance, etc.) and con-
(rules described in previous paragraph). straints (code coverage, etc.).
This ensures that models are complete, with all
required components/properties and they can
be processed to configure the kernel and parti-
tions with code generators.
5.1 Code generation Overview
• Handling of potential safety issues rule.
This one checks that all potential error will be
Our code generator process [5] consists in analyzing
recovered and report to the user which error
AADL models, browsing its components hierarchy
is not handled by a partition or a task. For
and, for each of them, generates appropriate code to
example, if the designer didn’t specify a recov-
create and configure system entities (as depicted in
ery subprogram when a memory fault is raised
figure 7). The process creates the kernel configura-
while execution a partition, the analyzer will
tion code (partitions time slots, memory segments as-
report an error.
signment for each partition, etc.) and partitions con-
• Security analysis rules. This aims at check- figuration code (services to be used, required mem-
ing the architecture against a security policy. ory, intra-partition communication policy, etc.).
Depending on its own classification level and For example, when the code generator visits a
the one of the data they produce or receive, a memory component, it generates code to create a new
partition may be compliant with a specific se- memory segment. Then, it inspects the model to re-
curity policy. On the other hand, the security trieve the associated partition and configure the ker-
policy defines which operations are legal so that nel to associate the appropriate partition with this
our tool can automatically checks for architec- segment. Each AADL entity and property is used
ture compliance with them. Actually, our tool to produce system configuration code so that the re-
checks state-of-the-art security policies such as sulting process creates a complete code, almost ready
Bell-Lapadula [17] or Biba [16]. to be compiled. Next section explains which part of
the code is automatically generated and which still
Other validation can be issued on the models to requires some manual code to have a complete exe-
check either the correctness of the architecture (for cutable system.

266
FLOSS in Safety Critical Systems

5.2 Kernel and partitions configura- As for the configuration code generation, the be-
tion havior code of application is generated from a prede-
fined set of AADL components:
At first, the code generator browses components hi-
erarchy to create the kernel and partitions configu- • A thread component specifies a task that
ration code. For that purposes, it analyzes the fol- potentially communicates using its features.
lowing AADL components: For each thread, Ocarina generates code that
gets the data from its IN features, performs
• processor: for the kernel configuration. The the call sequence to its subprogram and sends
processor components specifies the time slots produced output using its OUT features.
allocated for each partition so that they are
translated into C code to configure the kernel • A data component represents a data located
time isolation policy. in a partition, shared among its tasks and pro-
tected against potential race conditions using
• virtual processor: for partition run-time locking mechanisms (mutex, semaphore, etc.).
configuration. Based on the properties of this For each task that uses the data, the code gen-
component, the code generator produces code erator produces code that locks it, modifies it
that configures partitions services (need for and finally releases it.
memory allocation, POSIX or ARINC653 API
support, etc.) • A subprogram component references object
code implemented either in C, Ada or even as-
• process and its features: the process con- sembly languages. This is just a reference to
tains all partitions resources (thread and low-level implementation so when visiting such
data) so that necessary amount of resources component, the code generator creates a func-
is allocated in the kernel and its partition. tion call to the object code. Finally, it also
process features represent inter-partitions configures the build system in order to inte-
communication ports. When analyzing such grate the object code in the partition binary.
entities, the code generator configures these
ports and their connection within the kernel
Then, almost all the code of the system is pro-
so that only communication channels specified
duced by Ocarina. The user has to provide the
in the model would be granted at run-time.
application-level code, the one referenced by AADL
• memory: as it represents a memory segment, subprogram components and automatically called by
the code generator produces configuration code the behavior code generated for each partition. Next
that instantiates a new memory segment in the section details the benefits of this process with re-
main system memory and associates it to its spect to high-integrity systems requirements.
bound process component (that corresponds
to a partition). In consequence, at run-time,
each partition will be associated with a mem- 5.4 Benefits of Code Generation
ory segment that has the properties (address,
size, etc.) specified in the model. First of all, the use of such a process requires to
specify system architecture using a modeling lan-
Once this configuration code has been generated, guage, which makes the whole process more rigorous
Ocarina also generates the behavior code of the sys- than just using a text-based specification document.
tem, the one executed by each task. Next section Moreover, the process brings the following benefits:
details this step.
1. Early error detection

5.3 Behavior code generation 2. Syntax/semantic error avoidance

The behavior part corresponds to the code that uses 3. Specifications requirements enforcement
partitions resources to execute application functions.
It consists in getting/putting data from/to the ex- By using validated models as a language source
ecution environment (for example, by using inter- for system implementation, the development pro-
partitions communication ports), calling application cess detects specification errors at the earliest when
functions and managing tasks execution (put a task such problems are difficult to track and usually de-
in the sleep mode when its period is completed, etc.). tected during tests (at best) or production (at worst)

267
POK, an ARINC653-compliant operating system released under the BSD license

phases. By identifying these errors prior to the im- Finally, the recovery policy requires to stop the ker-
plementation, we save a significant number of prob- nel when an error is raised at kernel-level and restart
lems and save development costs. other components (partition, task) when one of them
triggers an error/exception.
Then, by automating code production with code
generators such as Ocarina, we rely on established
generation patterns that output the same block of
code according a predefined AADL block. Use of
snd_thr
such code patterns avoids all error related to hand- recv_thr
written code that usually introduce syntax/semantic
errors that are difficult to track3 and require code recv_prs snd_prs
analysis tools and reviews to be found.
Finally, a particular interest is the enforcement of
the specifications. Implementation compliance with
seg1
the specifications is usually checked during manual seg2 part1 part2
code review. However, this is long, costly and also ram
pok_kernel
error-prone [6] since its relies on a manual inspec- case_study_osal
tion. By automating the code production from the
specifications and by using code generation patterns,
implementation code ensures specifications enforce- FIGURE 8: AADL model of the case-study
ment and so, would reduce development costs while
improving system safety and robustness.
6.2 AADL modeling

The graphic version of the case-study is illustrated


6 Case-Study in figure 8. It maps the requirements previously de-
scribed with specific execution components:
We illustrate the POK dedicated design process
through a basic example with two partitions: one • Partitions use process (recv prs and
that sends an integer to another which do some pro- snd prs for application aspects), virtual
cessing and outputs its result. processor (part1 and part2 of pok kernel
for run-time concerns) and memory (segments
specification) components. Arrows on the
6.1 Overview graphical model explicit their association.
• Tasks of each partition are defined with a
To focus on the design process, we limit the behavior thread component within each partition (ei-
of the system to a basic example: the sender parti- ther recv thr) or snd thr).
tion executes a task each second, increments an in-
teger (starting from 0) and sends the result to the • Queuing ports are added on each partition
receiver partition using an inter-partition communi- and connected within their containing system
cation channel. Then, the receiver partition runs a (case study osadl)It explicitly defines com-
task activated each 500ms that retrieves the value munication channel that can be requested by
sent by the other partition and triggers a divide by partitions at run-time.
zero, depending on the received value. This, the sec-
Definition of system recovery policy uses AADL
ond partition would execute the dedicated handler
properties, included only in the textual represen-
that recover from the fault it generates.
tation. The following listing provides an exam-
System architecture is made of two partitions ple of the definition of partition run-times with
each one executed during a 500ms time slot (so that their recovery strategy (the Recovery Errors and
the major frame is 1 second) and stored in a segment Recovery Actions properties). It also includes a
of 120Kbytes. Each partition contains a periodic processor component that represents the parti-
thread (either the sender or the receiver ) that calls tioning kernel with the two partition environments
one subprogram (the one that sends or receives the (part1 and part2) and its associated time isolation
integer). Partitions communicate through two queu- policy (Slots, Slots Allocation and Major Frame
ing ports, connection with a inter-partition channel. properties).
3 for example, the C code if (c = 1) statement; is often an error even if legal and would likely be if (c == 1) statement;

268
FLOSS in Safety Critical Systems

1 v i r t u a l p r o c e s s o r i m p l e m e n t a t i o n p a r t i t i o n . common 4 is subcomponent of (y , s )};


properties partitionsmem :=
3 S c h e d u l e r => RR ; 6 {x i n Memory Set |
A d d i t i o n a l F e a t u r e s=> ( c o n s o l e , l i b c s t d i o ) ; is s u b c o m p o n e n t o f ( x , mainmem ) } ;
5 Recovery Errors => ( I l l e g a l R e q u e s t , 8
Partition Init ); c h e c k ( ( C a r d i n a l ( mainmem ) > 0 ) and
7 Recovery Actions => ( P a r t i t i o n R e s t a r t , 10 ( Property Exists
Partition Stop ); ( p a r t i t i o n s m e m , ” Word Count ” ) )
9 end p a r t i t i o n . common ; 12 );
end C o n t a i n s M e m o r i e s ;
11
p r o c e s s o r implementation p o k k e r n e l . i
13 subcomponents Second validation theorem (see listing below)
p a r t 1 : v i r t u a l p r o c e s s o r p a r t i t i o n . common ; checks the major frame compliance (see section 3.3
15 p a r t 2 : v i r t u a l p r o c e s s o r p a r t i t i o n . common ;
properties for its definition) with partitions time slots. To do
17 M a jo r Fr a m e => 1000ms ; so, the theorem processes each processor compo-
Slots => ( 5 0 0 ms , 500ms ) ;
19 Slots Allocation => nent of the system and checks that the value of the
( reference ( part1 ) , reference ( part2 ) ) ; Major Frame property is equal to the sum of the
21 Recovery Errors => ( K e r n e l I n i t ,
Kernel Scheduling ); Slots value.
23 Recovery Actions => ( K e r n e l S t o p , 1 th eo r em s c h e d u l i n g m a j o r f r a m e
Kernel Stop ) ; f o r e a c h cpu i n p r o c e s s o r s e t do
25 end p o k k e r n e l . i ; 3 c h e c k ( p r o p e r t y e x i s t s ( cpu , ” M a jo r Fr a m e ” ) and
( ( f l o a t ( p r o p e r t y ( cpu , ” M a jo r Fr a m e ” ) ) =
5 sum ( p r o p e r t y ( cpu , ” S l o t s ” ) ) ) ) ) )
The following listing also illustrates the defini- end s c h e d u l i n g m a j o r f r a m e ;
tion of the complete system with its processor, par-
titions, memory and the connection of the inter- Other validation theorems can be designed and
partitions queuing ports. added to the process to automatically check for en-
1 s y s te m i m p l e m e n t a t i o n o s a l . i forcement of other requirements or also verify specific
subcomponents
3 cpu : processor pok kernel . i ; modeling patterns. Interested readers may refer to
sender : process snd prs . i ; the POK distribution available on the official POK
5 r e c e i v e r : process recv prs . i ;
mem : memory ram . i ; website [10]: it contains a complete REAL [19] the-
7 connections orem library to check all modeling patterns related
p o r t s e n d e r . p d a t a o u t −> r e c e i v e r . p d a t a i n ;
9 properties to partitioned architectures.
a c t u a l p r o c e s s o r b i n d i n g =>
11 ( r e f e r e n c e ( cpu . p a r t 1 ) ) a p p l i e s to s e n d e r ;
a c t u a l p r o c e s s o r b i n d i n g =>
13 ( r e f e r e n c e ( cpu . p a r t 2 ) ) a p p l i e s to r e c e i v e r ; 6.4 Code generation & metrics
a c t u a l m e m o r y b i n d i n g =>
15 ( r e f e r e n c e (mem . s e g 1 ) ) a p p l i e s to s e n d e r ;
a c t u a l m e m o r y b i n d i n g => Configuration and deployment code is automatically
17 ( r e f e r e n c e (mem . s e g 2 ) ) a p p l i e s to r e c e i v e r ;
end o s a l . i ; generated from AADL specifications. As models
have been previously validated, produced output is
expected to enforce system requirements.
6.3 AADL model validation Among all generated files, one especially impor-
tant is deployment.h, which defines constant and
We validate system architecture using the Require- macro that configure kernel services and set resources
ments Enforcement Analysis Language (REAL [19]) dimensions (amount of ports, partitions, etc.). The
tool. It analyzes AADL components hierarchy and following listing provides an overview of the file gen-
validates it against theorems. By using this tool, we erated from the model of this case-study.
validate model structure and compliance with model- /∗
2 ∗ other con figuration d i r e c t i v e s
ing patterns. We illustrate that using two theorems. ∗ ...
4 ∗/
The first one (see listing below) checks compli- #d e f i n e POK NEEDS CONSOLE 1
ance of system architecture regarding space isola- 6 #d e f i n e POK CONFIG NB PARTITIONS 2
#d e f i n e POK CONFIG SCHEDULING SLOTS { 5 0 0 , 5 0 0 }
tion and memory segments definition. It processes 8 #d e f i n e POK CONFIG SCHEDULING SLOTS ALLOCATION {0 ,1}
memory components contained in the main system #d e f i n e POK CONFIG SCHEDULING NBSLOTS 2
10 #d e f i n e POK CONFIG SCHEDULING MAJOR FRAME 1000
(that corresponds to the main memory - like RAM) #d e f i n e POK CONFIG NB PORTS 2
and ensures that each one contains memory sub-
components to specify memory segment with their We can check and verify that configuration direc-
size (by checking the definition of the Word Count tives enforces model requirements: two partitions are
property). defined, scheduling slots of each partitions (500ms)
th eo r em C o n t a i n s M e m o r i e s is correctly mapped, as well as the major frame (1s).
2 f o r e a c h s i n s y s t e m s e t do
mainmem := { y i n Memory Set | The code generation process not only configures

269
POK, an ARINC653-compliant operating system released under the BSD license

the kernel but produces partitions configuration and N) is set back to 0, showing that the partition binary
behavior code. Developers only have to write ap- has been re-loaded.
plication code, which corresponds to the functional
To assess the memory consumption of generated
part of the system. In this case-study, it consists
systems, we also report generated kernel and parti-
of two functions: one that outputs an integer and
tions sizes (see table below). Partitions size is sim-
stores it as a function argument (the one used on the
ilar: they contain the same functionality and differ
sender side) and another that takes one integer as
only by their application code. Both of them have a
argument and process it (receiver side). The code
small size: 11kB for a complete system that embeds
provided by the developer is shown in the following
run-time functions for the support of user applica-
listing, demonstrating that code production automa-
tion. This demonstrates the lightweight aspect of the
tion reduces manual code production activities.
approach. Kernel size is also very small, especially
In the following application code, the receiver for such a system that provides critical functions re-
part raises a division by zero exception when result garding safety and security issues.
of (t + 1)%3 == 0 (line 16 of the application code).
According to the recovery policy, when such a condi-
Component Size
tion is met, the partition restarts. To show graphi-
Kernel 26 kB
cally that the partition is correctly restarted, we also
Partition 1 11 kB
output the number of times the function is executed
using variable step. Its initial value is stored in the Partition 2 11 kB
data from the partition binary so that when reload-
ing the partition, the initial value is set again in the
variable. 7 Conclusions & Perspectives
1 void user send ( int ∗ t )
{
3 static int n = 0; This article presents POK, a BSD-licensed operating
5 p r i n t f ( ” Sen t v a l u e %d\n” , n ) ;
system that supports partitioning with time & space
n = n + 1; isolation. It also provides layers to ease deployment
7 ∗t = n ;
}
of existing code that uses established standards such
9 as POSIX or ARINC653.
s ta ti c int step = 0;
11 Beyond the operating system itself, POK relies
void u s e r r e c e i v e ( in t t )
13 { on a complete tool-chain to automate its configura-
int d; tion & deployment and ease partitioned systems de-
15
d = ( t + 1) % 3 ; velopment. It aims at specifying system architecture
17 printf ( ” Step %d\n” , s t e p ++); and properties using a modeling language, AADL
printf ( ” R e c e i v e d v a l u e %d\n ” , t ) ;
19 printf ( ” Computed v a l u e %d\n ” , t / d ) ; and verifying its requirements using dedicated anal-
} ysis tools that process these specifications. Then,
from this validated specifications, our tool-chain au-
Generated application is compiled for Intel (x86)
tomatically generates code that configures/deploys
architecture and produces the following output dur-
kernel/partitions and execute application code pro-
ing execution:
vided by the user. This ensures specifications re-
quirements enforcement and avoid all errors related
... to usual development process.
Step 3
Received value 5
[KERNEL] Raise divide by zero error
Step 0
7.1 Perspectives
Received value 0
Computed value 0 The domain of partitioned architecture is still emerg-
Sent value 8 ing and there is many potential open perspectives.
... On the kernel side, there is a need for more hardware
support (devices, architectures, etc.) and a wider
support of existing standards, as for the ARINC653
One may notice that when the faulty condition of
layer (for example, to support the second part of the
the application code ((t + 1)%3 == 0, line 16 of the
standard).
user application code) is reached (in that case, when
receiving value 5), the receiver partition is restarted. On the modeling and analysis part, there is a
Initial value of variable step (printed in the line Step strong need to connect AADL models with other

270
FLOSS in Safety Critical Systems

system representation or specifications. In particu- Infrastructure for Software Testing. Technical


lar, the production of AADL models could be au- report, 2002.
tomated from text-based specifications and model
components/entities could be associated to external [7] SAE. Architecture Analysis & Design Language
specifications such as DOORS. This would ease re- v2.0 (AS5506), September 2008.
quirements traceability, which is a special interest for [8] Windriver VxWorks -
the design of high-integrity systems, when designers http://www.windriver.com
have to ensure that high-levels requirements are cor-
rectly mapped in the implementation. [9] LynuxWorks LynxOS -
http://www.lynuxworks.com/rtos/
[10] POK - http://pok.safety-critical.net
References
[11] RTEMS - http://www.rtems.com
[1] Peter H. Feiler, David Gluch and John Hudak. [12] Global Descriptor Table - Wikipedia
The Architecture Analysis and Design Language
(AADL): An Introduction. Technical report, 02 [13] Xtratum - http://www.xtratum.org
2006.
[14] John Rushby - MILS Policy Architectures
[2] Airlines Electronic Engineering. Avionics Ap-
[15] Julien Delange, Laurent Pautet and Peter
plication Software Standard Interface. Technical
Feiler. Validating safety and security require-
report, Aeronautical Radio, INC, 1997.
ments for partitioned architectures. In 14th
[3] SAE Architecture Analysis and Design Lan- International Conference on Reliable Software
guage (AADL) Annex Volume 2 Technologies - Ada Europe, June 2009

[4] Fabrice Bellard. Qemu, a fast and portable dy- [16] Biba, K.J. - Integrity considerations for secure
namic translator. In ATEC 05: Proceedings of computer systems. Technical report, MITRE
the annual conference on USENIX Annual Tech- [17] Bell, D.E., LaPadula, L.J. - Secure computer
nical Conference, pages 4141. system: Unified exposition and multics inter-
pretation. Technical report, The MITRE Cor-
[5] Julien Delange, Laurent Pautet and Fabrice
poration (1976)
Kordon. Code Generation Strategies for Par-
titioned Systems. In 29th IEEE Real-Time [18] Julien Delange - Intégration de la securité et de
Systems Symposium (RTSS08), pages 5356, la sureté de fonctionnement dans la construction
Barcelona, Spain, December 2008. IEEE Com- d’intergiciels critiques - PhDThesis
puter Society.
[19] Olivier Gilles and Jérôme Hugues - Validating
[6] National Institute of Standards and Technology requirements at model-level in Ingnierie Dirige
(NIST). The Economic Impacts of Inadequate par les modles (IDM08)

271
POK, an ARINC653-compliant operating system released under the BSD license

272
FLOSS in Safety Critical Systems

D-Case Editor: A Typed Assurance Case Editor

Yutaka Matsuno
The University of Tokyo, Japan
JST, CREST
[email protected]

Abstract
System assurance has become an important issue in many system domains, especially in safety-critical
domain. Recently, assurance cases[3] have been getting much attentions for the purpose. We demonstrate
D-Case Editor [10], which is an assurance cases editor being developed in DEOS (Dependable Embedded
Operating System for Practical Uses) project funded by Japan Science and Technology Agency. D-Case
Editor has been implemented as an Eclipse plug-in using Eclipse GMF framework. Its characteristics are
(1) supporting GSN (Goal Structuring Notation) [8], (2) GSN pattern library function and prototype type
checking function [9], and (3) consistency checking function by an advanced proof assistant tool [13]. To
achieve these characteristics, we have exploited types in several ways. In this paper, we briefly introduce
assurance cases, and demonstrate the functions of D-Case Editor. Because it has been implemented on
Eclipse, it is interesting to make a tool chain with existing development tools of Eclipse. D-Case Editor
is available as an open source in the following web page: http://www.il.is.s.u-tokyo.ac.jp/deos/dcase/.

1 Introduction effective way is a critical issue for organisations. Pat-


terns and their supporting constructs are proposed in
System assurance has become a great importance GSN for the reuse of existing assurance cases, which
in many industrial sectors. Safety cases (assurance includes parameterized expressions.
cases for safety of systems) are required to submit Assurance cases have been recognized as a key
to certification bodies for developing and operating method for dependability of systems. However, cur-
safety critical systems, e. g., automotive, railway, de- rently there have been not so much tools for assur-
fense, nuclear plants and sea oils. There are several ance cases (very few in open source.) A notable tool
standards, e. g. EUROCONTROL [5], Rail Yellow is ASCE tools [1], which has been widely used in sev-
Book [12] and MoD Defence Standard 00-56, which eral areas such as defense, safety critical area, and
mandate the use of safety cases. medical devices.
There are several definition for assurance cases To make assurance case more familiar to devel-
[3]. We show one of such definitions as follows[1]. opers who are using open sources tools, we have re-
leased D-Case Editor, an open source assurance case
“a documented body of evidence that editor implemented on Eclipse GMF. The web page
provides a convincing and valid argument is http://www.il.is.s.u-tokyo.ac.jp/deos/dcase/. The
that a system is adequately dependable characteristics are as follows.
for a given application in a given envi-
ronment.”
1. Supporting GSN (Goal Structuring Notation)
Assurance cases are often written in structured [8],
documents using a graphical notation to ease the dif- 2. GSN pattern library function and prototype
ficulty of writing and certifying them. Goal Struc- type checking function [9], and
turing Notation (GSN) is one of such notations [8].
Writing assurance cases and reusing them in a cost 3. Consistency checking function by an advanced

273
D-Case Editor: A Typed Assurance Case Editor

proof assistant tool [13]. 2.2 GSN Patterns

Writing and certifying assurance cases are difficult


To achieve these characteristics, we have exploited because they tend to be huge and complex, and they
types in several ways. For example, we introduce require domain specific knowledge of target systems.
types for variables used in GSN patterns [4]. Our To ease the difficulties, it has been recognized that
intention is to make assurance cases to be shared assurance case patterns should be collected and avail-
among various tools for wider use. Introducing types able for reuse, similarly to design patterns in object
is one attempt for the purpose. oriented languages. There have been several publicly
available GSN patterns ([7, 14, 4]).
The structure of this paper is as follows. In Sec-
tion 2, we introduce assurance cases and patterns, Figure 2 is a simple example of GSN patterns in
and some standardization efforts for assurance cases. [4]. The top-level goal of system safety (G1) is re-
Section 3 introduces several functions of D-Case Ed- expressed as a number of goals of functional safety
itor. Section 4 states a few concluding remarks. (G2) as part of the strategy identified by S1. In order
to support this strategy, it is necessary to have identi-
fied all system functions affecting overall safety (C1)
e.g. through Functional Hazard Analysis (FHA). In
addition, it is also necessary to put forward (and
2 Background Knowledge develop) the claim that either all the identified func-
tions are independent, and therefore have no inter-
actions that could give rise to hazards (G4) or that
2.1 Goal Structuring Notation (GSN) any interactions that have been identified are non-
hazardous (G3).
Goal Structuring Notation (GSN) is introduced by Figure 2 includes main GSN extensions for GSN
Tim Kelly and his colleagues at University of York patterns, as defined in [6]:
[8]. It is a graphical notation for assurance cases.
GSN is widely used for safety cases. Some safety • Parameterized expressions. {System X} and
cases written in GSN are publicly available [2]. We {Function Y } are parametarised expressions.
briefly explain constructs and their meanings in We can instantiate X and Y by appropriate
GSN. Arguments in GSN are structured as trees with (possibly safety critical) system and function,
a few kinds of nodes, including: goal nodes for claims respectively.
to be argued for, strategy nodes for reasoning steps
that decompose a goal into sub goals, and evidence • Uninstantiated. Triangles (△) attached to
nodes for references to direct evidences that respec- nodes indicate that the nodes contain unin-
tive goals hold. Figure 1 is a simple example of GSN. stantiated paramerarised expressions. To in-
The root of the tree must be a goal node, called top stantiate the GSN pattern as an assurance
goal, which is the claim to be argued (G 1 in Figure case, we need to instantiate the expressions.
1.) For G 1, a context node C 1 is attached to com- • 1 to many expressions (multiplicity). Number
plement G 1. Context nodes are used to describe of functions are different by the target system.
the context (environment) of the goal attached to. We can instantiate the number of functions (n)
A goal node is decomposed through a strategy node for the target system.
(S 1) into sub goal nodes (G 2 and G 3). The strat-
egy node contains an explanation, or reason, for why • Choice. By this extension, we can choose ap-
the goal is achieved when the sub goals are achieved. propriate goals for the target system.
S 1 explains the way of arguing (argue over each pos-
sible fault: A and B). When successive decomposi-
tions reach a sub goal (G 2) that has a direct evi- 2.3 Assurance Cases and Standard-
dence of success, an evidence node (E 1) referring to ization Efforts
the evidence is added. Here we use a result of fault
tree analysis (FTA) as the evidence. For the sub Two major graphical notations for assurance cases
goal (G 3) that is not decomposed nor supported by are GSN and CAE (Claims, Arguments, and Evi-
evidences, a node (a diamond) of type undeveloped dence) [1]. There are two standardization efforts for
is attached to highlight the incomplete status of the assurance cases; the system assurance task force at
case. The assurance case in Figure 1 is written with the OMG (Object Management Group) and GSN
D-Case Editor. standardization effort [6]. OMG has standardized

274
FLOSS in Safety Critical Systems

Goal:G_1 Context:C_1

C/S Logic is free Risk Analysis Result:


from possible faults Possible faults are
fault A and fault B

Strategy:S_1

Argue over each


possible fault

Goal:G_2 Goal:G_3

C/S Logic is free C/S Logic is free


from fault A from fault B

Evidence:E_1 Undeveloped:U_1

FTA analysis No evidence is


Result currently given

Figure 1: A simple GSN Example

Figure 2: An example of GSN patterns [4]

275
D-Case Editor: A Typed Assurance Case Editor

the meta-model for assurance cases called ARM (Ar- types enum, int, double, and string, respectively.
gument Metamodel) [11] by which both notations are Furthermore, these types are given useful restrictions
in fact interchangeable. The main aim of the ARM is such that the value of CPU (this variable is intended
to align two major notations and facilitates the tool as the CPU resource usage rate of the target sys-
support. Unfortunately it only reflects main con- tem) is restricted within 0 − 100%. Users of D-Case
structs between the two, and some specific features, Editor can assign values to these variables via the
which are not compatible are missing from it. For parameter setting window. If a user mis-assigned a
instance, patterns are not included in the ARM. value (e.g., 150 for CPU), then D-Case Editor reports
the type error. As far as we know, there is not any
assurance case editor which has such parameterized
3 Overview of D-Case Editor expressions and type checking mechanism. We plan
to implement the type checking mechanism in Sec-
tion 3.
Figure 3 shows a screen shot of D-Case Editor. Users
can draw GSN diagrams in the canvas. In the right,
there is a pattern library. From the library, users can
choose already existing, good assurance case patterns 4 Concluding Remarks
and fragments, and copy to the canvas. Current D-
Case Editor has the following functions (some func- We have presented our assurance case editor, called
tions are omitted in current version.) Consistency D-Case Editor. It has been implemented as an
checking with an advanced proof assistant tool [13] Eclipse plug-in using Eclipse GMF, and released as
will be available soon. an open source. We hope that D-Case Editor would
contribute to make assurance cases more familiar to
• Checks on the graph structure of D-Case (e.g. developers by making a tool chain of D-Case Editor
no-cycle, no-evidence directly below a strategy, with Eclipse and other development tools. We plan
etc.) to comply to OMG ARM [11] and other international
standards related to assurance cases in next release.
• External info via url can be attached to a goal.

• “Patterns” with typed parameters can be reg-


istered and recalled with parameter instantia- References
tions.
[1] http://www.adelard.com/web/hnav/ASCE/choosing-
• Graphical diff to compare two D-Cases. asce/cae.html.
• A “ticket” in Redmine, a project management
[2] http://dependability.cs.virginia.edu.
web application, can be attached to a goal; the
ticket’s status can be reflected graphically in [3] Workshop on Assurance Cases: Best Prac-
D-Case (color change.) tices,Possible Obstacles, and Future Opportuni-
ties, DSN 2004, 2004.
• Monitoring: a url to be polled by Editor can be
attached to a node; the answer is dynamically [4] Robert Alexander, Tim Kelly, Zeshan Kurd,
reflected in D-Case (color change.) and John McDermid. Safety cases for advanced
control software: Safety case patterns. Tech-
• Scoring: calculates a weighted score for a D-
nical report, Department of Computer Science,
Case indicating how much of it is completed.
University of York, 2007.
• connection with uml2tools: generating a D-
Case subtree for a component diagram data. [5] European Organisation for the Safety of Air
Navigation. Safety case development manual.
European Air Traffic Management, 2006.
Among these functions, we show how patterns with
typed parameters can be registered and recalled with [6] GSN contributors. DRAFT GSN standard ver-
parameter instantiations. Current implementation is sion 1.0, 2010.
limited that variables and types can only be declared
in the top level of the GSN term. Declaration of [7] Tim Kelly and John McDermid. Safety case
variables and types are written in an XML file, as patterns - reusing successful arguments. In
shown in Figure 4. In Figure 4, variables STATUS, IEE Colloquium on Understanding Patterns and
CPU, USAGE, and MESSAGE are declared, and given Their Application to System Engineering, 1998.

276
FLOSS in Safety Critical Systems

Figure 3: A Screen Shot of D-Case Editor

<?xml version="1.0" encoding="UTF-8"?>


<!-- all element names and attributes are case sensitive -->
<dataType>
<parameter name="STATUS" type="enum">
<items>
<item value="NORMAL"/>
<item value="ERROR"/>
<item value="RUNNING"/>
<item value="SATISFIED"/>
</items>
</parameter>

<parameter name="CPU" type="int">


<range min="0" max="100"/>
</parameter>

<parameter name="USAGE" type="double">


<range min="0.00" max="999.99" digit="2" inc="0.02" />
</parameter>

<parameter name="MESSAGE" type="string">


<length min="0" max="20"/>
</parameter>
</dataType>

Figure 4: Variables and Type Declarations XML file for D-Case Editor

277
D-Case Editor: A Typed Assurance Case Editor

[8] Tim Kelly and Rob Weaver. The goal structur- Symposium on High-Assurance Systems Engi-
ing notation - a safety argument notation. In neering (HASE), pages 170–171, 2010.
Proc. of the Dependable Systems and Networks
2004, Workshop on Assurance Cases, 2004. [11] OMG. Argument metamodel (ARM). OMG
Document Number Sysa/10-03-15.
[9] Yutaka Matsuno and Kenji Taguchi. Parame- [12] Railtrack. Yellow book 3. Engineering Safety
terised argument structure for gsn patterns. In Management Issue3, Vol. 1, Vol. 2, 2000.
Proc. IEEE 11th International Conference on
Quality Software (QSIC 2011), 2011. Short Pa- [13] Makoto Takeyama. Programming assurance
per (6 pages). cases in agda. In ICFP, page 142, 2011.
[14] Robert Andrew Weaver. The Safety of Software
[10] Yutaka Matsuno, Hiroki Takamura, and Yutaka - Constructing and Assuring Arguments. PhD
Ishikawa. A dependability case editor with pat- thesis, Department of Computer Science, Uni-
tern library. In Procs. IEEE 12th International versity of York, 2003.

278
FLOSS in Safety Critical Systems

A FLOSS Library for the Safety Domain

Peter Krebs, Andreas Platschek, Hans Tschürtz


Vienna Institute for Safety & Systems Engineering
FH Campus Wien - University of Applied Sciences
Favoritenstrasse 226, Vienna, Austria
{peter.krebs, andreas.platschek, hans.tschuertz}@fh-campuswien.ac.at

Abstract
Safety-critical software is usually implemented under the constraints of one or more standards which
demand evidence that these constraints were honoured. This leads to higher implementation effort and
require in-depth knowledge on the programming languages and interfaces used by each individual pro-
grammer – often to avoid making the same mistakes over and over again.
To facilitate development under such conditions, a library of frequently used functions and algorithms
which adhere to certain safety constraints complemented by specific evidence suitable for proof against a
standard would be of great help. Inside this paper we present such a library written in ANSI-C, named
”safety lib”, which emerged as a by-product of an application developed for Safety Integrity Level (SIL)
2 certification according to IEC 61508 at the Vienna Institute for Safety & Systems Engineering.
The main intention of this paper is to show the benefits of using such a library in safety-critical
development and the reasons for its planned release under a FLOSS license. Furthermore, we want to
invite everyone to use the safety lib and participate in its development to improve both its code and
evidence base.
Our hypothesis is that the joint development of a library for safety-critical applications can not only
save development and certification costs, but – even more important – increase safety through better and
more intense reviews carried out by a community instead of just individual developers.

1 Introduction [1], will pose a number of new problems. This espe-


cially concerns two important trends in the design of
modern systems: the ability to communicate over a
Implementing a safety-critical system in software re- network via standardised and widely-available pro-
quires additional effort from the developers com- tocols such as TCP/IP and the utilisation of concur-
pared to the usual, non-safety, development process. rency for enhanced performance and responsiveness.
Depending on the specific domain and the standards Both aspects significantly increase the complexity of
followed, this usually entails the adherence to certain a software and the probability of residual faults due
implementation constraints and the availability of to the difficulty in the design and testing of asyn-
evidence to prove (with sufficiently high confidence) chronous and indeterministic behaviour. This is of-
that the software was actually built under these re- ten aggravated by the available interfaces and their
quirements. Defining and obtaining suitable proof peculiarities which provide further opportunities to
is in fact one of the most time-consuming and ex- implement an incorrect or ’unstable’ system.
pensive tasks in the development of a safety-critical The combined problem of increased software
system. complexity and evidence for safety can not be solved
The implementation of simple, well-understood by a single measure. However, to facilitate the imple-
algorithms is rarely of concern. However, the in- mentation and satisfy the need for proof at the same
creasing demand for safety-critical functions purely time, a viable approach would be to coalesce com-
implemented in software, as shown for example by plex and often used functions in a qualified software

279
A FLOSS library for the safety domain

library. Ideally, this ’safety library’ would be easy is a set of constraints common to most of them:
to (re-)use, well-tested and complemented by a suf-
ficient body of evidence to not only rise the safety of Coding guidelines - A coding guideline restricts
the system itself but also to speed up a certification the functionality of a given programming lan-
process against a specific standard. guage to a certain subset by excluding func-
Inside this paper we present such a library writ- tions and constructs which are deemed un-
ten for ANSI-C, named ’safety lib’, to achieve three safe. A well-known example are the MISRA-
objectives: First, we want to point out an approach C guidelines for the ANSI-C language which
to satisfy the demands of safety standards in a mean- disallow a number of standard library func-
ingful way. Second, by pointing out the problems tions (such as malloc() or printf()). It can be
our library helps to solve during implementation, we expected, that the adherence to a given cod-
hope to raise the awareness of certain ill-understood ing guideline will force a programmer to aban-
and often dangerous practices when implementing a don some ’standard solutions’ in favour of code
safety-critical system in C. Third, it is our intent to that complies to the restrictions (e. g. adding
leverage the know-how of the community to improve pre/postconditions to standard function calls,
the safety of the library and its evidence through ex- avoidance of dynamic memory allocation).
tensive reviews and enhancements. To this end we
Coding style guide - In contrast to a guideline, a
are preparing the release of the safety lib under a
coding style guide only affects the format of
FLOSS license and invite everyone to participate in
the source code and does not restrict the fea-
its further development.
ture set of a language. The primary purpose is
to enforce a consistent style of the code in or-
1.1 Content der to improve readability for code reviews and
maintainability in case of changing developers.
The first part gives an overview on the requirements A style guide dictates, for example, the length
imposed by certain safety standards on the imple- and indentation of code lines.
mentation of a safety-critical SW application and Modularity - Splitting the code up into small, self-
how they affect the source code. Furthermore, the contained components is beneficial for review-
impact on safety of portability and code reuse is dis- ing and testing and facilitates the assessment
cussed. of the impact of changes on the overall safety.
The second part explains how a software library The most common approach to ensure the
can provide evidence for the above mentioned re- modularity of code is by enforcing thresholds
quirements and opportunities for improving safety at for code complexity metrics such as lines of
the same time. After this, the safety lib is introduced code (LOC) and cyclomatic complexity[2].
via a number of examples that demonstrate certain
dangerous and unsafe implementation practices and Defined interfaces - Software modules and their
the approach of the library in preventing problems. functions should be consistent, easy to use and
This is complemented by some technical facts and an unambiguous in the meaning of their parame-
outline of concrete evidence the safety lib offers. ters. This can be achieved by enforcing limits
for the number of parameters, avoiding overly
The remainder of this paper deals with possible generic functions and via documentation of the
alternatives and the approach to improve the func- interfaces.
tionality and safety of the library via the FLOSS
community. Static analysis - Static analysers work directly on
the source code and can detect problems which
are not considered by the translator such as
2 Constraints out-of-bounds access or locking errors. In addi-
tion, manual code reviews provide coverage of
problems which can not be adequately checked
Developing a safety-critical system inevitably in- by a tool.
volves one or more standards and the effort to prove
that the requirements of those standards are met. Testing - Testing complements static analysis by
Regarding software implementation the requirements asserting the correct behaviour of a software
usually consist of a number of constraints for the de- during runtime. As there is no way to exhaus-
sign and coding of the source code. While certain tively test an even moderately complex sys-
standards might require specific methodology, there tem, test coverage metrics are usually defined

280
FLOSS in Safety Critical Systems

to gauge the thoroughness of a test suite (e. g. creased effort of generic programming and the
statement/branch coverage). lack of certain non-standard features.

Reuse of software elements can speed up the de-


Table 1 lists the clauses of the important safety velopment process and can be a strong safety
standards IEC 61508[3], EN 50128[4], DO-178B[5], argument - when the reused component can
ISO 26262[6] and ISO 13849[7] pertaining to the be proven to be safe enough. The necessary
above mentioned points. data for this ’proven in use’ argument is, how-
In addition to the normative requirements a soft- ever, often difficult to retrieve especially with
ware should exhibit additional properties: closed-source/proprietary code. FLOSS soft-
ware plays an important role here as not only
Standard Clauses the code itself but also ancillary data (such as
bug reports) is publicly available.
IEC 61508 CG/CSG: Table A.3, Point 3
(Part 3) M: Table B.9, Points 1/2/3/4
DI: Table B.9, Point 6
SA: Table B.8 3 The Library Approach
T: Tables B.2 and B.3
EN 50128 CG: Table A.4, Point 10 Fulfilling the mentioned subset of requirements for a
(Annex A) CSG: Table A.12, Point 2 piece of safety-critical software obviously requires ef-
M: Table A.20, Points 1/2/3 fort proportional to the amount of code. In order to
DI: Table A.20, Point 5 reduce the workload, parts of the code can be sub-
SA: Table A.19 stituted by pre-existing one in the form of a code
T: Table A.13 library which already complies to the safety con-
DO-178B CG/CSG: 11.8 straints. This approach is actually sanctified by most
M: - safety standards which refer to this pre-existing code
DI: - as ’Commercial/Components off the Shelf’ (COTS)
SA: - and basically provide two ways to assess if a COTS
T: 6.4 component is acceptable:
ISO 26262 CG/CSG: 5.4.6
(Part 6) M: 7.4.3 Proven in use refers to a history of operational
DI: - use without known safety-related failures for
SA: 8.4.5 a specified minimum time 1 .
T: 9
ISO 13849 CG/CSG: Annex J, Clause 4 Verification Evidence proofs that the pre-
M: 4.6.2, 4.6.3 existing code adheres to the safety require-
DI: 4.6.3, Point g ments of a standard for a given safety integrity
SA: 4.6.2 level (or similar metric).
T: 4.6.2, 4.6.3
CG = Coding guideline, CSG = Coding style guide, Considering a software library developed by a 3rd
M = Modularity, DI = Defined interfaces, SA = Static party, both aspects can be covered as follows:
analysis, T = Testing

• The history of defects (e. g. bug reports) shows


TABLE 1: Safety standards and their re-
whether there are known safety-related issues
quirements for implementation.
in a specific version.

Portability enables the usage of the software on dif- • The source code itself can be analysed and re-
ferent systems without the need to modify large viewed for the presence/absence of faults.
parts. This not only saves time and money but
is also beneficial for safety as changes to an ex- • Additional data like test results, metric reports
isting code might easily introduce new bugs. and analyser output provide safety-specific ev-
One way to achieve this is to rely on standard- idence.
ised interfaces as much as possible - e. g. by ex-
clusively using functions defined by the POSIX The first two points are usually not feasible with
standard. The downside of portability is the in- closed-source/proprietary libraries and make the use
1 For example, IEC 61508, part 7, clause C.2.10.1 states the minimum operating time as 1 year.

281
A FLOSS library for the safety domain

of FLOSS software in safety-critical systems attrac- the following, we discuss a number of these prob-
tive. However, most FLOSS libraries do not provide lems to further their awareness and describe how the
the specific evidence needed to justify their usage in safety lib tries to solve them.
the context of a safety standard. While it is theoret-
ically possible to extract the necessary data – given
the availability of the source code – time and bud- 4.1 Undefined behaviour
get constraints usually prevent this. Furthermore,
FLOSS code is rarely written in accordance to safety C is a language which provides lots of freedom both
requirements. Instead, developers might opt to write to the programmer and the compiler. This freedom
safety-critical code from the scratch, effectively rein- comes at a price, as it is not guaranteed how cer-
venting the wheel over and over and wasting the tain source code constructs actually behave during
benefits provided through the FLOSS approach – runtime. The C90 standard[8] includes a list in An-
namely a large number of potential reviewers/testers nex G, Clause G.2 of such constructs said to invoke
and availability of defect history. ’undefined behaviour’ – the most famous examples
To cope with this problem, a software library is are probably division by zero, dereferencing a NULL
needed which fulfils the following requirements: pointer and accessing an array out-of-bounds. Unde-
fined behaviour should not be confused with unspeci-
fied and implementation-defined behaviour which are
• The source code must be fully available.
much more benign (but can be still problematic).
• The history of changes and bugs must be ob- The danger of undefined behaviour is the unpre-
tainable. dictability of the program’s execution. Depending
on the compiler, the state of the execution environ-
• All library code must comply to the common
ment and other arbitrary factors it might crash, fail
set of safety requirements imposed by the stan-
silently or actually show no erroneous behaviour at
dards.
all. Due to compiler optimisation a program might
• Sufficient evidence to proof the compliance of even be affected prior to executing the actual op-
the code must be available. eration which invoked the undefined behaviour, as
demonstrated by [9].
• Modifications and corrections must follow a de-
Examining already existing source code for un-
fined process.
defined behaviour can be very difficult. An alterna-
• The library must be portable to a large number tive approach is to exclude programming constructs
of platforms. which may lead to undefined behaviour a priori –
this is the motivation behind coding guidelines. The
• The interfaces must be fully documented. safety lib adopts this by adhering to the MISRA-
C:2004 coding guidelines[10] which prohibits invoca-
In the remaining parts of the paper we present tion of undefined behaviour in general and for specific
a software library written in ANSI-C, named the cases such as:
’safety lib’, developed in accordance with these re-
quirements. • Using identifiers that do not differ in the first
n characters2

4 A safe library for C • Shifting a value by more than the number of


bits it possesses minus 1 (e. g. shifting a 16-
The safety lib was originally part of a Voice-over- bit integer by more than 15).
IP (VoIP) client software developed according to
• Using the value of an automatic variable before
IEC 61508 for SIL 2. As such, its original purpose
a value has been assigned.
was to mainly provide a safe interface to socket and
thread synchronisation functions. However, its scope
has increased steadily to incorporate other problem- Especially the execution of certain standard li-
atic aspects of ANSI-C and the POSIX function set. brary functions with invalid arguments is a classical
While many of these aspects are common program- source of problems. For example, the inability of the
ming knowledge, some might not be well-known. In string handling functions in respecting array bounds
2 The number n of significant characters is implementation defined and usually 6 and 31 for identifiers with external and

internal linkage, respectively.

282
FLOSS in Safety Critical Systems

can lead to out-of-bounds access when the terminat- 4.2 Complicated Interfaces
ing NUL-character is not present in the source string.
The error might be prevented by using strncpy(). Due to historic growth the POSIX 2001 standard[11]
However, if there is no terminating NUL in the first contains a number of functions whose interfaces are
n characters to copy, the destination string will not rather complicated to use correctly. This especially
be terminated either – only for subsequent functions concerns the socket API used in network program-
to fail as in listing 1. ming, which suffers from the need to combine several
different socket types and provide support for IPv6.
#define BUFSIZE 19 Applications requiring network traffic must take care
... to use the correct type of socket address structure in
resolving addresses and creating sockets or risk un-
char b u f [ BUFSIZE ] = { ’ \0 ’ } ; stable network behaviour.
s t r n c p y ( buf , \ Listing 3 shows how to get the IP address of
” undefined behaviour ” , \ a remote peer connected via TCP. However, this
BUFSIZE ) ;
code does not work with IPv6 which would need an-
/∗ w i l l l i k e l y p r i n t 19 or c r a s h ∗/ other address structure type to store its larger ad-
p r i n t f ( ” S i z e o f s t r i n g i n b u f f e r : %u\n” , dress (namely sockaddr in6). Instead of limiting it-
s t r l e n ( buf ) ) ; self to one of the types applications should use a
Listing 1: Undefined behaviour invoked through sockaddr storage structure suitable for both IPv4 and
standard library string handling functions IPv6 addresses as recommended in [12]. This in-
volves a conversion to yet another type (sockaddr)
in order for getpeername() to work, since IPv6 sup-
The primary method to avoid these and other port was added after the API was defined. Similar
classes of mistakes is to provide wrappers for stan- confusing behaviour can be found for other functions
dard functions which ensure that certain pre- and dealing with sockets and address structures (e. g. the
postconditions hold during execution. For the output of getnameinfo() is inconsistent across plat-
above example a safe version would both ensure forms, inet pton() returning 0 for error).
that a given destination buffer size is not exceeded
and that the resulting string is terminated in all i n t s o c k f d c o n n e c t e d = −1;
struct s o c k a d d r i n p e e r a d d r ;
cases. Listing 2 shows the same situation using sockl en t peer addrlen = sizeof ( peer addr ) ;
the safety lib’s safe strncpy() function which incor- char p e e r a d d r s t r [INET ADDRSTRLEN + 1U ] ;
porates these checks to avoid undefined behaviour3 .
...
#define BUFSIZE 19
/∗ c a l l s t o s o c k e t ( ) , b i n d ( ) and a c c e p t ( )
o m i t t e d ∗/
...
...
char b u f [ BUFSIZE ] = { ’ \0 ’ } ;
getpeername ( s o c k f d c o n n e c t e d , \
i f (1 == s a f e s t r n c p y ( buf , \
( struct s o c k a d d r ∗ ) &p e e r a d d r , \
BUFSIZE , \
&p e e r a d d r l e n ) ;
” undefined behaviour” ) )
{
i n e t n t o p (AF INET , \
/∗ no t e r m i n a t i n g NUL d e t e c t e d among
&p e e r a d d r . s i n a d d r , \
BUFSIZE c h a r a c t e r s ∗/
peer addr string , \
p r i n t f ( ” S t r i n g to o l o n g \n” ) ;
( s o c k l e n t ) (INET ADDRSTRLEN + 1U) ) ;
}
else
p r i n t f ( ” IP o f p e e r : %s \n” , p e e r a d d r s t r ) ;
{
/∗ w i l l al w ay s p r i n t l e s s e q u a l 18 ∗/
p r i n t f ( ” S i z e o f s t r i n g i n b u f : %u\n” , \ Listing 3: Getting the IP address of a TCP peer in
s t r l e n ( buf ) ) ; the traditional way
}

Listing 2: Using safe strncpy() to avoid undefined To facilitate the usage of sockets in a protocol
behaviour independent, portable and non-confusing manner,
the safety lib includes wrappers for socket handling
3 A similar function already exists with strlcpy() on several platforms. However, strlcpy() is not defined by POSIX and so

not universally available – for example, glibc does not implement the function.

283
A FLOSS library for the safety domain

functions which enable the programmer to work di- 4.4 Thread synchronisation
rectly with addresses in textual and binary form for
IPv4 and IPv6. For example, listing 4 demonstrates Sometimes it is necessary to force multiple threads
the above scenario using the safe get peer address() to execute in a certain order or to let them wait for a
function instead. specific event to occur. The principal way to achieve
this with the POSIX pthread API is to use a condi-
i n t s o c k f d c o n n e c t e d = −1; tion variable (condvar). A condvar basically puts the
char p e e r a d d r s t r [ INET ADDRSTRLEN + 1U ] ;
current thread to sleep until it receives a signal from
uint16 t peer srcport ;
enum i p v e r s i o n p e e r i p v e r s i o n ; another thread and is usually associated with a pred-
icate that determines if it is necessary to wait. Fur-
... thermore, each condvar is paired with a mutex that
/∗ c a l l s t o s o c k e t ( ) , b i n d ( ) and a c c e p t ( )
is atomically unlocked and locked when the thread
o m i t t e d ∗/ starts and ends it sleep respectively. The proper use
of a condvar requires certain steps:
...

s af e g et p ee r ad dre s s ( sockfd connected , \ • Before the predicate is checked, the mutex


peer addrstr , \ must be locked.
(INET6 ADDRSTRLEN + 1U) , \
&p e e r s r c p o r t , \
&p e e r i p v e r s i o n , \
• If the predicate evaluates to false, the thread
FALSE) ; blocks on the condvar and releases the mutex.

p r i n t f ( ” IP o f p e e r : %s \n” , p e e r a d d r s t r ) ; • After getting signalled and waking up, the mu-


p r i n t f ( ” S o u r c e p o r t : %u\n” , p e e r s r c p o r t ) ; tex is automatically locked and the predicate
i f ( IP V6 == p e e r i p v e r s i o n )
must be checked again. This is necessary, be-
{ cause condvars are subject to ’spurious wake-
p r i n t f ( ” C o n n e c ti o n u s e s IPv6 \n” ) ; ups, i. e. a thread sleeping on a condvar might
} return without actually receiving a signal (see
Listing 4: Getting the IP address of a TCP peer and [13] for details on this behaviour).
more with safe get peer address() • If the predicate now evaluates to true, the mu-
tex must be unlocked and the thread continues.

To reduce the possibility of errors when imple-


4.3 Dynamic allocation menting the above procedure, the safety lib provides
an extended version of POSIX condvars, termed ’sig-
nal gate’, with the following semantics:
Safety standards are quite clear in prohibiting dy-
namic allocation/deallocation of memory during run-
time4 . To avoid the associated dangers such as mem- • Spurious wake-ups are handled internally and
ory leak there is no explicit dynamic allocation in the never lead to undesired thread continuation.
safety lib (although some utilised standard library or
• A thread signalling another one can assign an
POSIX functions allocate memory temporarily, e. g. ID and a priority (high or low) to the signal.
getaddrinfo()).
Dynamic allocation is often used to implement • A thread waiting on a signal gate can set a
abstract data types (ADT) such as lists. The mask of valid signal IDs. Only signals with a
safety lib provides an API to create ADTs using only matching ID can ’unlock’ the gate and allow
statically allocated memory. In this way, there is no the thread to continue.
loss of flexibility in storing data when the maximum
• If one or more signals are sent to a signal gate
storage consumption is known.
before a thread actually waits on it, the sig-
In case dynamic allocation is absolutely needed nals are buffered in the order of their arrival
a simple custom memory allocator is provided which and processed according to their priority (high
acts on a block of statically allocated storage. Both before low). This is not possible with raw cond-
the allocation and coalescing algorithms can be eas- vars as signals are not buffered in this case but
ily replaced if necessary. simply lost.
4 See IEC 61508 - Annex B, Table B.1, EN 50128 - Annex A, Table A.12, ISO 26262 - Part 6, 8.44, Table.9.

284
FLOSS in Safety Critical Systems

• As with condvars, a timed wait can be used


which automatically unlocks when no signal
OS/Distro HW Arch gcc -v
was received for a certain time.
Debian 6 x86 4.4.5
Debian 6 x86 64 4.4.5
The signal gate primitive was extensively used in Debian 6 mipsel 4.4.5
the design and implementation of a multi threaded Debian 6 ppc 4.4.5
application and proved to be very useful and robust. Ubuntu 10.04 x86 4.4.3
FreeBSD 8.2 x86 4.2.1
Debian GNU/Hurd 6 x86 4.4.6
OpenSolaris 2009.06 x86 3.4.3
OpenIndiana build 151a x86 3.4.3
4.5 Signals
TABLE 2: SW/HW-Platforms on which
the safety lib was tested.
POSIX signals are a classical tool for asynchronous
inter process communication and used explicitly or To fulfil the requirement for modularity, the li-
indirectly in many applications. However, signals brary code respects the following code metric limits:
pose a safety risk as they can occur practically any
time without regard for synchronisation and require • Maximum LOC per function: 200
great care to process correctly (e. g. functions called
in a signal handler must be re-entrant in order not • Maximum cyclomatic complexity per function:
to deadlock, unhandled signals may shut down the 20
application ungracefully). Standards therefore dis-
courage their use5 in a safety-critical software. • Maximum number of function parameters: 7
(with 1 exception)
The abolishment of signals is especially problem-
atic when an application should run in the back- Functions that work with persistent objects (e.
ground as a daemon process since most APIs (such g. lists, signal gates) are fully thread-safe by using
as daemon() or start-stop-daemon) rely on signals pthread mutexes. All remaining functions are re-
for communication. The safety lib therefore pro- entrant.
vides a framework for creating and controlling dae-
mons without signals, instead using POSIX message
queues for relaying start/stop/restart commands to
the background process. This allows for ignoring
6 Evidence provided by the
most signals, reducing the risk of harmful interrup- safety lib
tions.
As important as the actual code is the evidence for
fulfilling the safety requirements mentioned at the
beginning. The subsequent list gives the specific
proofs that are provided by the safety lib:
5 Technical Aspects
• Coding guidelines/static analysis –
MISRA:2004 requires two pieces of evidence
As mentioned before, the safety lib is designed to be to proof the conformance of code to its rules:
portable to a majority of POSIX 2001 conformant A ’compliance matrix’ explaining how each
systems. This was ensured by successfully build- rule was actually enforced and a set of ’devia-
ing and testing the library on a variety of platforms tion requests’ which justify any deviation from
which are listed in Table 2. It should be mentioned, a rule. Both of these objects were created for
that the networking API requires a dual-stack sys- the safety lib. As for the actual checking, the
tem to build (i. e. the network stack must support library code was analysed with the commercial
both IPv4 and IPv6). The library is optimised for static analysis tool FlexeLint[14] supported by
32-bit platforms but should work without problems manual code reviews targeting rules that can
on 64-bit systems, despite not actually using 64-bit not be assessed by the tool. The reports of the
variables internally. analyser and the reviews are also available.
5 See IEC 61508 - Annex B, Table B.1, Point 4, EN 50128 - Annex B, Clause 62, ISO 26262 - Part 6, 7.4, Table 4.

285
A FLOSS library for the safety domain

• Coding style guide – The code follows a self- • Cyclone – As a rather different approach,
defined style guide loosely based on the Linux Cyclone[21] is a dialect of C which adds safety
kernel coding style[15]. No suitable checking checks during compilation and runtime, pre-
tool was available, so manual reviews and their venting certain types of errors. Despite the
reports act as substitute. great potential, it is unfortunately no longer
maintained and would require justification for
• Modularity – As mentioned before, metric the compiler which is most likely not proven-
limits were enforced on the source code which in-use.
were checked with cccc[16]. The metric reports
generated by the tool provide the necessary ev- To summarise, the alternatives lack either the
idence. necessary evidence, do not adhere to the require-
• Defined interfaces – All functions with ex- ments of safety standards or have a too narrow scope.
ternal linkage are documented inline in a con-
sistent manner detailing the function pur-
pose, meaning of parameters and return values. 8 Future Work
Doxygen[9] markup was used to automatically
generate the documentation from source. While the decision that the safety lib should be re-
leased under a FLOSS license was a quick one, the
• Testing – The library code was tested by au- details still have to be clarified. The following list in-
tomated unit tests using a modified version of cludes some of the decisions to be made and the pre-
the CUnit[18] framework to achieve a mini- conditions that have to be established before we can
mum statement/branch coverage of 90%/80% actually release our code – the intention of this pa-
per unit and 93%/84% on average for the whole
per is to get feedback from potential users that may
library. The code coverage was measured with help us to make those decisions which will greatly
gcov and graphical reports were generated with influence the future of the safety lib.
lcov[19].
Deciding on a FLOSS License – It can be as-
sumed that most companies would not be inter-
7 Alternatives ested in linking to a library that ”forces” them
to release their application under the same li-
Implementing safe versions of standard functions and cense. Because of this, we are currently tend-
programming idioms is an established practice. Al- ing towards LGPL [22] or a similar model, as
though there already exist solutions in a similar vein this allows the usage of the library in conjunc-
to the safety lib, they are insufficient for direct adop- tion with a proprietary software, while ensur-
tion into a safety-critical software. This section dis- ing that improvements to the safety lib itself
cusses some of the freely available alternatives and are released as FLOSS. The intention is to pre-
their deficiencies. vent ’grab and run’ in the interest of everyone
who really wants to use and participate in the
development of the safety lib, while still allow-
• Safe C Library – This library implements
ing contributing parties to use the safety lib
alternative versions of the standard library’s
without having to license their whole applica-
string handling and memory allocation func-
tion under a FLOSS license.
tions defined by ISO/IEC TR 24731[20]. These
extensions mainly add bounds checking and Define a Development Life Cycle – A general
have the benefit of standardisation and com- requirement of the various safety standards not
pleteness in regard to the standard library but yet discussed is the need for a well defined de-
nothing else. Furthermore, no additional safety velopment life cycle. To satisfy this demand
evidence is available. in a way that embraces the needs of software
development in the safety domain as well as
• Safe C String Library – A library providing the needs for a community driven open-source
safe string manipulation using a custom string development we will need to define such a life
type with length information. The drawbacks
cycle in detail prior to release.
of this approach are the increased effort in port-
ing legacy applications and the usage of dy- Evidence Management – The basis for a success-
namic allocation for storing strings. Evidence ful management of evidence that can be used
is missing as well. during the certification process is to have a

286
FLOSS in Safety Critical Systems

technical framework that supports the collec- within one single organisation. In a first step, this
tion and management of evidence and semi- includes more code reviews by developers from dif-
automatically produces the documents that ferent organisations and from different industries to-
can be provided to the certification authori- wards a higher chance of discovering subtle bugs –
ties. In our current development, we are using especially by running the code on a variety of hard-
Codestriker[23] to document code reviews and ware and software platforms in various applications.
Bugzilla[24] to report bugs. Both of those tools
In the future, pure testing and code reviews will
store the collected data in SQL databases from
be less and less effective to master growth in code size
which a review report and a report of known
and complexity and the usage of formal methods for
problems can be pulled with small effort.
code verification will become more and more impor-
Legal Issues – Just as important as the manage- tant for this case – even for systems with a low safety
ment of evidence is the question of legal prob- integrity level. Unfortunately, most of these formal
lems with the collected data. This includes method techniques are time intensive tasks demand-
copyright and licensing issues as well an esti- ing expert knowledge. Here, the big advantage of
mation of the impact on a possible certifica- joint development manifests itself by the chance for
tion process against a standard. Basically, this those developers new to the formal analysis to learn
means that the evidence itself will need to be from those who have experience, and to get their first
released under an open license as well. The steps in formal verification checked by experts.
first that comes in mind here is the FDL [25]
but CC [26] might be an option as well.
8.1 Community Development

To allow efficient community-driven development, the safety lib has already been pulled out of the subversion repository of the project it evolved from and has been moved into a separate git repository. Care has been taken to preserve traceability in the form of the change history back to the very beginning. As soon as the safety lib is released under a FLOSS license, this repository will be made accessible to the public.

As described above, it is crucial to decide on a development life-cycle for the safety lib. This will also concern the way patches find their way into the official repository and how contributions to the evidence databases can be committed. As these kinds of things are new to us and, as far as we know, unique in the safety world, we do not want to rush into things but to define a strategy up front that not just pleases us, but makes the safety lib interesting for everyone developing software for safety-critical systems on POSIX compliant systems.

8.2 Joint Evidence Collection

The evidence collected so far, as described in previous sections, only builds the basis of the evidence that can be provided in a joint development. We would also like to emphasise again that we do not think that the savings in implementation costs are the foremost benefit of an open-source safety library. Rather, the real advantage will lie in a thorough set of evidence at a level which is very hard to achieve. In the future, pure testing and code reviews will be less and less effective to master the growth in code size and complexity, and the usage of formal methods for code verification will become more and more important for this case, even for systems with a low safety integrity level. Unfortunately, most of these formal method techniques are time-intensive tasks demanding expert knowledge. Here, the big advantage of joint development manifests itself in the chance for developers new to formal analysis to learn from those who have experience, and to get their first steps in formal verification checked by experts.

9 Conclusions

Safety standards impose a lot of constraints on the implementation of software deemed for certification. A set common to the most important standards can be defined from these constraints, which demands evidence that all requirements are fulfilled by the code. The acquisition of this evidence is time-consuming and often not even possible, especially when depending on closed-source software.

In this paper we proposed the usage of an open-source library developed under these normative constraints as a way to raise the safety of an application and satisfy the need for evidence. A proof-of-concept library, the safety lib, is presented as a foundation for a generic safety library jointly developed by the FLOSS community.

The next steps depend on the feedback of the community. If there is sufficient interest, a development life cycle needs to be defined which enables contributions both to the code and the evidence base without violating safety constraints.

10 Acknowledgements

The safety lib was developed by the 'Stadt Wien Kompetenzteam für Safety Network Engineering (SNET)', which is supported by MA27 – EU-Strategie und Wirtschaftsentwicklung in the course of the funding programme 'Stiftungsprofessuren und Kompetenzteams für die Wiener Fachhochschul-Ausbildungen'.


We dedicate this paper and the safety lib to the late DI Herbert Haas, who led the 'Stadt Wien Kompetenzteam für Safety Network Engineering (SNET)'. He originally proposed the idea of a software library for safety-critical applications and his encouragement was invaluable to us during the development.

References

[1] This Car Runs on Code, 2009, http://spectrum.ieee.org/green-tech/advanced-cars/this-car-runs-on-code
[2] A Complexity Measure, Thomas J. McCabe, 1976, http://www.literateprogramming.com/mccabe.pdf
[3] IEC 61508 Edition 2.0 – Functional safety of electrical/electronic/programmable electronic safety-related systems, 2010, IEC
[4] BS EN 50128:2001 – Railway applications - Communications, signalling and processing systems - Software for railway control and protection systems, 2001, CENELEC
[5] RTCA/DO-178B – Software Considerations in Airborne Systems and Equipment Certification, 1992, RTCA
[6] ISO/DIS 26262 – Road vehicles - Functional safety, 2009, ISO
[7] ISO 13849 Second Edition – Safety of machinery - Safety-related parts of control systems, 2006, ISO
[8] ANSI/ISO 9899-1990, 1990, ANSI, ISO
[9] Embedded in Academia: A Guide to Undefined Behavior in C and C++, Part 3, http://blog.regehr.org/archives/232
[10] MISRA-C:2004 – Guidelines for the use of the C language in critical systems, 2004, MISRA Consortium
[11] IEEE Std 1003.1-2001 – Standard for Information Technology - Portable Operating System Interface (POSIX), 2001, The Open Group, IEEE
[12] UNIX Network Programming, Volume 1, Third Edition: The Sockets Networking API, W. Richard Stevens, Bill Fenner, Andrew M. Rudoff, 2003
[13] Programming with POSIX Threads, David R. Butenhof, 1997, Addison-Wesley
[14] http://www.gimpel.com
[15] Linux kernel coding style, http://kernel.org/doc/Documentation/CodingStyle
[16] http://sourceforge.net/projects/cccc
[17] http://www.doxygen.org
[18] http://cunit.sourceforge.net
[19] http://ltp.sourceforge.net/coverage/lcov.php
[20] ISO/IEC TR 24731: Extensions to the C Library, 2011, http://www.open-std.org/jtc1/sc22/wg14/www/projects
[21] Cyclone: A Type-Safe Dialect of C, 2005, http://www.cs.umd.edu/~mwh/papers/cyclone-cuj.pdf
[22] The Lesser GPL, http://www.gnu.org/licenses/lgpl.txt
[23] Codestriker Webpage, http://codestriker.sourceforge.net
[24] Bugzilla Webpage, http://www.bugzilla.org
[25] The Free Documentation License, http://www.gnu.org/licenses/fdl.txt
[26] Creative Commons License, http://creativecommons.org


"Open Proof" for Railway Safety Software


A Potential Way Out of Vendor Lock-in, Advancing to Standardization, Transparency, and Software Security

Klaus-Rüdiger Hase
Deutsche Bahn AG
Richelstrasse 3, 80634 München, Germany
Klaus-Ruediger.Hase {at} DeutscheBahn {dot} com

Abstract
Open Proof (OP) is a new approach for safety and security critical systems and a further development of the Open Source Software (OSS) movement, applying OSS licensing concepts not just to the final software products themselves, but also to the entire life cycle and to all software components involved, including tools, documentation for specification, verification, implementation and maintenance, and in particular including safety case documents. A potential field for applying OP could be the European Train Control System (ETCS), the new signaling and Automatic Train Protection (ATP) system intended to replace some 20 national legacy signaling systems all over the European Union. The OP approach might help manufacturers, train operators, infrastructure managers as well as safety authorities alike to eventually reach the ambitious goal of a unified, fully interoperable and still affordable European train control and signaling system, facilitating fast and reliable cross-border rail traffic at state-of-the-art safety and security levels.

Keywords: ATC, ATP, Critical Software, Embedded Control, ETCS, EUPL, FLOSS, Open Proof,
openETCS, Train Control, Standardization.

1 Introduction

The European Train Control System (ETCS, [1]) is intended to replace several national legacy signaling and train control systems all across Europe. The system consists of facilities in the infrastructure and of on-board units (OBU). Especially for the ETCS on-board equipment the degree of functional complexity to be implemented is expected to be significantly higher than in conventional systems. In terms of technology, this is mostly done by software in so-called embedded control system implementations. While electronic hardware is getting continuously cheaper, the high complexity of the safety-critical software has caused significant cost increases for development, homologation and maintenance of this technology. This has raised questions for many railway operators with respect to the economy of ETCS in general.

The key element for improving that situation seems to be a greater degree of standardization, in particular for the ETCS onboard equipment, on various levels: hardware, software, methods and tools. Standardization by applying open source licensing concepts will be the focus of this paper.

1.1 From National Diversity to European Standard

Looking back into the history of signaling and automatic train protection (ATP) for mainline railway systems, a major change in technology has taken place in the past 40 years.

289
Open Proof for Railway Safety Software

In the early days of ATP almost all functions were implemented in hardware, starting with purely mechanical systems, advancing to electromechanical components and later on using solid state electronics like gates, amplifiers and other discrete components. Software was not an issue then. Beginning in the late 1970s an increasing number of functions were shifted into software, executed by so-called microcomputers. Today the actual functions of such devices are almost entirely determined by software. The dramatic performance increase of microcomputers in the past 30 years on the one hand and the rising demand for more functionality on the other hand have caused a significant increase in the complexity of these embedded control systems, as such devices are usually called. Furthermore, the development from purely monitoring safety protection systems, like the German INDUSI (later called PZB: Punktförmige Zug-Beeinflussung) or similar systems in other European countries, which only monitor speed at certain critical points and eventually stop the train if the driver has missed a halt signal or has exceeded a safe speed level, to more or less (semi-)automatic train control (ATC) systems, like the German continuous train control system LZB (Linien-Zug-Beeinflussung), has increasingly shifted safety responsibility from the infrastructure into the vehicle control units. Displaying signal commands inside the vehicle on computer screens, so-called cab signaling, has resulted in greater independence from adverse weather conditions.

FIGURE 1: Europe's challenge is to substitute more than 20 signaling and ATP systems by just one single system, ETCS, in order to provide border-crossing interstate rail transit all over the European Union.

All over Europe there are more than 20 different, mostly incompatible signaling and train protection systems in use (figure 1). For internationally operating high-speed passenger trains or cargo locomotives up to 7 different sets of equipment have been installed, just to operate in three or four countries. Since each of those systems has its own antennas to sense signals coming from the wayside and its own data processing units and display interfaces, space limitations make it simply impossible to equip a locomotive for operation in all EU railway networks, not to mention the prohibitive cost figures for such equipment. Furthermore, some of the systems have been in use for more than 70 years and may not meet today's expected safety level. Some are reaching their useful end of life, causing obsolescence problems.

For a unified European rail system it is very costly to maintain this diversity of signaling systems forever, and therefore the European Commission has set new rules by so-called Technical Specifications for Interoperability (TSI) with the goal to implement a unified European Train Control System, which is part of the European Rail Traffic Management System (ERTMS), consisting of ETCS, GSM-R (a cab radio system based on the GSM public standard enhanced by certain rail-specific extensions) and the European Traffic Management Layer (ETML). Legacy ATP or so-called Class B systems are supposed to be phased out within the next decades.

1.2 ETCS: A New Challenge for Europe's Railways

Before launching the ETCS program, national operational rules for railway operation were very closely linked with the technical design of the signaling and train protection systems. That is going to change radically with ETCS. One single technology has to serve several different sets of operational rules and even safety philosophies.

The experience of Deutsche Bahn AG after German reunification has made very clear that it will take several years or even decades to harmonize operational rules all over Europe. Even under nearly ideal conditions (one language, one national safety board and even within one single organization) it was a slow and laborious process to convert different rules and regulations back into one set of unified operational rules. After 40 years of separation into two independent railway organizations (Deutsche Reichsbahn in the east and Deutsche Bundesbahn in the west), it took almost 15 years for Deutsche Bahn AG to get back to one single unified signaling handbook for the entire infrastructure of what is today DB Netz AG.

Therefore, it seems unrealistic to assume that there will be one set of operational rules for all ETCS lines all over Europe any time soon (which does not mean that these efforts should not be started as soon as possible, but without raising far too high expectations about when this will be accomplished). That means, in order to achieve interoperability by using a single technical solution, this new system has to cope with various operational regimes for the foreseeable future.


Beside this, for more than a decade there will be hundreds of transition points between track sections equipped with ETCS and sections with one of several different legacy systems. This will cause an additional increase of functional complexity for onboard devices.

1.3 Technology is not the Limiting Factor

With state-of-the-art microcomputer technology, from a technological point of view this degree of complexity will most likely not cause any performance problems, since the enormous increase in performance of microcomputer technology in recent years can provide more than sufficient computing power and storage capacity at an acceptable cost level to master complex algorithms and a huge amount of data.

The real limiting factor here is human brain power. In the end it is the human mind which has to specify these functions consistently and completely, then provide for correct designs, implement them and ultimately make sure that the system is fit for its purpose and can prove its safety and security. The tremendous increase in complexity, absorbing large numbers of engineering hours, is one reason why we are observing cost figures for R&D, testing and homologation of the software components in various ETCS projects that have surpassed all other cost positions for hardware design, manufacturing and installation. This has caused a substantial cost increase for the new onboard equipment compared with legacy systems of similar functionality and performance.

Normally we would expect from any new technology a much better price-to-performance ratio than from the legacy technology to be replaced. The fact that this is obviously not the case for ETCS makes it less attractive for those countries and infrastructure managers who have already implemented a reliably performing and sufficiently safe signaling and train control system. In addition, there is no improvement expected for ETCS with respect to performance and safety compared with service-proven systems like LZB and PZB. In order to reach the goal of EU-wide interoperability soon, the EU Commission has implemented legal measures regulating member states' policies for infrastructure financing and vehicle homologation. While in the long run ETCS can lower the cost for infrastructure operators, especially for Level 2 implementations making conventional line signals obsolete, the burden of the cost increase stays with the vehicle owners and operators. Therefore it became an important issue for vehicle operators to identify potential cost drivers and options for cost reduction measures, so as not to endanger the well-intentioned general goal of unrestricted interoperability.

2 Software in ETCS Vehicle Equipment

As discussed above, state-of-the-art technology requires almost all safety-critical as well as non-safety-related functions to be implemented in software. The end-user will normally receive this software not as a separate package, but integrated in his embedded control device. Therefore software is usually only provided in a format directly executable by the built-in microprocessor, a pattern of ones and zeros, and therefore not well suited for humans to understand the algorithm. The original source code (a comprehensible documentation format of this software, which is used to generate the executable code by a compiler) and all needed software maintenance tools are usually not made available to the users. Manufacturers are doing this because they believe that they can protect their high R&D investment this way.

2.1 Impact of Closed Source Software

However, concealment of the software source code documentation has increasingly been considered problematic, not only for economical reasons for the users, but more and more for safety and security reasons as well. Economically it is unsatisfactory for the operators to remain completely dependent on the original equipment manufacturer (OEM), no matter whether software defects have to be fixed or functions have to be adapted due to changing operational requirements. For all services linked to these embedded control systems there is no competitive market, since the bundling of non-standard electronic hardware together with closed source or proprietary software makes it practically and legally impossible for third parties to provide such services. This keeps prices at high levels.


While malfunctions and vulnerability of software products, allowing malware (malicious software: viruses, trojans, worms etc.) to harm the system, can be considered quality deficiencies which can practically not be discovered in proprietary software by users or independent third parties, the vendor lock-in due to contractual restrictions and limiting license agreements is generally foreseeable, but due to generally accepted practices in this particular market hardly to be influenced by individual customers (e.g. railway operators). Especially the security vulnerability of software must be considered a specific characteristic of proprietary or closed source software. So-called End User License Agreements (EULA) usually do not allow end-users to analyze, copy or redistribute the software freely and legally. Even analysis and improvement of the software for the user's own purposes is almost generally prohibited in most EULAs. While on the one hand customers who are playing by the rules are barred from analyzing and finding potential security gaps or hazardous software parts, and are therefore not able to contribute to software improvements, not even for obvious defects, the same legal restrictions on the other hand do not prevent bad guys from disassembling (a method of reverse-engineering) and analyzing the code using freely available tools in order to search for security gaps, occasionally (or better: mostly) being successful in finding unauthorized access points or so-called backdoors. Backdoors implemented intentionally by irregularly working programmers, or introduced just due to lax quality assurance enforcement or simply by mistake, cause serious threats in all software projects. In most cases intentionally implemented backdoors are hard to find with conventional review methods and testing procedures. In a typical proprietary R&D environment only limited resources are allocated in commercial projects for this type of security check, and such backdoors therefore most likely stay undiscovered. That backdoors cannot be considered a minor issue has been discussed in various papers [2, 3, 4, 5] and has already been identified as a serious threat by the EU Parliament, which initiated resolution A5-0264/2001 in the aftermath of the Echelon interception scandal, resulting in the following recommendations [6]:

... Measures to encourage self-protection by citizens and firms:

29. Urges the Commission and Member States to devise appropriate measures to promote ... and ... above all to support projects aimed at developing user-friendly open-source encryption software;

30. Calls on the Commission and Member States to promote software projects whose source text is made public (open-source software), as this is the only way of guaranteeing that no backdoors are built into programmes;

31. Calls on the Commission to lay down a standard for the level of security of e-mail software packages, placing those packages whose source code has not been made public in the least reliable category; ...

This resolution was mainly targeting electronic communication with private or business related content, which most likely will not hurt people or endanger their lives. However, a recent attack by the so-called STUXNET worm [7], a new type of highly sophisticated and extremely aggressive malware which in particular was targeting industrial process control systems via their tools chain, even in safety-critical applications (chemical and nuclear facilities), hit systems which are very similar in terms of architecture and software design standards to signaling and interlocking control systems. This incident has demonstrated that we have to consider such an impact in railway control and command systems as well, commercially and technically.

2.2 Software Quality Issues in ETCS Projects

Despite a relatively short track record of ETCS in revenue service we have already received reports on accidents caused by software defects, like the well documented derailment of cargo train No. 43647 on 16 October 2007 on the Lötschberg base line in Switzerland [8]. German Railways has been spared so far from software errors with severe consequences, possibly due to a relatively conservative migration strategy. During the past 40 years, software was only introduced very slowly, in small incremental steps, into safety-critical signaling and train protection systems and carefully monitored over years of operation before being rolled out in larger quantities. Software was more or less replacing hard-wired circuits of relatively low complexity, based on well service-proven functional requirement specifications, over a period of four decades. With ETCS, however, a relatively large step will be taken: virtually all new vehicles have to be equipped with ETCS from 2015 on, enforced by European legal requirements, despite the fact that no long-term experience has been gained. The ongoing development of the functional ETCS specification as well as project-specific adaptations to national or line-specific conditions have resulted in numerous different versions of ETCS implementations that are not fully interoperable. Up to now, there is still no single ETCS onboard equipment on the market that could be used on all lines in Europe which are said to be equipped with ETCS. That means that the big goal of unrestricted interoperability would have been missed, at least until 2010.


The next major release of the System Requirements Specification (SRS 3.0.0), also called "baseline 3", is expected to eliminate those shortcomings. Baseline 3 has another important feature: other than all previous SRS versions, which were published under the copyright of UNISIG, an association of major European signaling manufacturers, SRS 3.0.0 has been published as an official document by the European Railway Agency (ERA), a governmental organization implemented by the European Commission. This gives the SRS the status of a public domain document. That means everyone in Europe is legally entitled to use that information in order to build ETCS compliant equipment.

2.3 Quality Deficiencies in Software Products

Everyone who has ever used software products knows that almost all software has errors, and no respectable software company claims that their software is totally free of defects. There are various opportunities to make mistakes during the life cycle of software production and maintenance: starting with system analysis, system requirement specification, functional requirement specification, etc., down to the software code generation, integration, commissioning, operation and maintenance phases. A NASA study on flight software complexity [12] shows the contribution to bug counts which can be expected in different steps of software production (figure 2).

FIGURE 2: Propagation of residual defects (bugs) as a result of defect insertion and defect removal rates during several stages of the software production process, according to NASA research on high assurance software for flight control units, per 1000 lines of code (TLOC) [12].

Figure 3 characterizes the actual situation in the European signaling industry, with several equipment manufacturers working in parallel, using the same specification document in natural-language precision, giving room for interpretation, combined with different ways and traditions of making mistakes, resulting in a low degree of standardization, even for those components which cannot be used for product differentiation (core functionality according to UNISIG subset 026 [1]). Since all or at least most of the documents are created by humans, there is always the human factor involved, causing ambiguities and therefore divergent results. Herbert Klaeren refers in his lecture [9] to reports which have found an average of 25 errors per 1000 lines of programming code (TLOC) for newly produced software. The book Code Complete by Steve McConnell has a brief section about errors to be expected. He basically says that there is a wide range [10]:

(a) Industry average: "about 15 - 50 errors per 1000 lines of delivered code." He further says this is usually representative of code that has some level of structured programming behind it, but probably includes a mix of coding techniques.

(b) Microsoft applications: "about 10 - 20 defects per 1000 lines of code during in-house testing, and 0.5 defects per TLOC in released products [10]." He attributes this to a combination of code-reading techniques and independent testing.

(c) "Harlan Mills pioneered a so-called 'clean room development', a technique that has been able to achieve rates as low as 3 defects per 1000 lines of code during in-house testing and 0.1 defects per 1000 lines of code in released products (Cobb and Mills 1990 [11]). A few projects - for example, the space-shuttle software - have achieved a level of 0 defects in 500,000 lines of code using a system of formal development methods, peer reviews, and statistical testing."

FIGURE 3: Divergent interpretation of a common public domain ETCS System Requirement Specification (SRS) document, due to the human factor on the side of all parties involved, causing different software solutions with deviant behavior of products from different manufacturers, which results in interoperability deficiencies and costly subsequent improvement activities.
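To make the kind of ambiguity meant here concrete, consider an invented natural-language clause such as "the service brake shall be applied when the permitted speed is exceeded": it leaves tolerance, debouncing and release behaviour open, and each implementer resolves those points differently. Even a crude formalization forces the hidden decisions into the open. The requirement wording and the code below are purely illustrative and are taken from no ETCS subset:

/* Hypothetical, simplified supervision step -- not ETCS code.
 * The formalized version makes the implicit choices explicit:
 * a fixed tolerance, no debouncing, brake released below the limit. */
#include <stdint.h>
#include <stdbool.h>

#define SPEED_TOLERANCE_CM_S  55u   /* assumed tolerance, roughly 2 km/h */

static bool service_brake_demand(uint32_t v_measured_cm_s,
                                 uint32_t v_permitted_cm_s)
{
    return v_measured_cm_s > v_permitted_cm_s + SPEED_TOLERANCE_CM_S;
}

Two vendors reading the same prose clause can legitimately arrive at different tolerances or hysteresis rules; a formal specification pins these choices down once for everyone.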


However, the U.S. space shuttle software program came at a cost level of about US $1,000 per line of code (3 million LOC, 3 billion US $ [9], cost basis 1978), which is not typical for the railway sector, where the range is more between 30 per LOC for non-safety applications and up to 100 for SIL 3-4 quality levels (SIL: Safety Integrity Level).

2.4 Life Cycle of Complex Software

While on the one hand electronic components are becoming increasingly powerful, yet lower in cost, on the other hand the cost levels of complex software products are increasingly rising, not only because the amount of code lines needs tremendous man power, but because for those lines of code a proof of correctness, also called a safety case, has to be delivered in order to reach approval for operation in revenue service. Some manufacturers have already reported software volumes of over 250 TLOC for the ETCS core functionality defined by ETCS SRS subset 026 [1]. It is very difficult to obtain reliable statistics about errors in safety-related software products, because almost all software manufacturers hide their source code using proprietary license agreements. However, we can assume that software in other mission-critical systems, like communication servers, may have the same characteristics with respect to bug counts. One of those rare published examples was taken from an AT&T communication software project, which in terms of size is in the same order of magnitude as today's ETCS onboard software packages (figure 4) [9], [13]. One particular characteristic in figure 4 is quite obvious: the size of the software is continuously growing from version to version, despite the fact that this software was always serving the same purpose. Starting with a relatively high bug count of almost 1000 bugs in less than 150 TLOC, the software matures after several release changes and reaches a residual bug density which is less than a tenth of the initial bug density. During its early life the absolute number of bugs oscillates, and it stabilizes in the more mature life period. At a later phase of the life cycle the absolute number of bugs grows slightly despite a decreasing bug density. The late life-cycle bug density is mainly determined by the effectiveness of the quality measures taken on the one hand and the number of functional changes and adaptations built in since the last release on the other hand. The actual number of bugs is often unknown and can only be determined subsequently.

FIGURE 4: Bug fixing history and growth statistics of an AT&T communication server software package over a life cycle of 17 releases [9], [13].

Even though extensive testing has been and still is proposed as an effective method for detecting errors, it has become evident that by testing alone the correctness of software cannot be proven, because tests can only detect those errors the test engineer is looking for [14]. This means ultimately that there is no way to base a safety case on testing alone, because the goal is not to find errors, but to prove the absence of errors. One of the great pioneers of software engineering, Edsger W. Dijkstra, has put it into the following words [15]:

Program testing can be a very effective way to show the presence of bugs, but is hopelessly inadequate for showing their absence.

We even have to admit that at the current state of software technology there is no generally accepted single method to prove the correctness of software; that means there is no way to prove the absence of bugs, at least not for software in a range of 100 TLOC or more. The only promising strategy for minimizing errors is:

• The development of functional and design specifications (architecture) has to be given top priority and needs adequate resources,

• The safety part of the software has to be kept as small as possible, and

• The software life-cycle has to undergo a review process that is as broad as possible, manifold, and continuous.

Successful software projects require for the first point, the specification, at least between 20% and 40% [12] of the total development cost, depending on the size of the final software package.


Trying to save at this point will almost certainly result in inflated costs during later project phases (see figure 5).

FIGURE 5: Fraction of the overall project budget spent on specification (architecture) versus fraction of budget spent on rework + architecture, which defines a so-called sweet spot where it reaches its minimum [12]. However, this cost function does not take into account any potential damages which might result from fatalities caused by software bugs.

Formal modeling methods and close communication with the end-user may be helpful in this stage, especially when operational scenarios can be modeled formally as well in order to verify and validate the design. Specification, modeling and reviews closely involving the customer may even require several cycles in order to come to a satisfactory result.

FIGURE 6: Graph taken from a NASA study on flight software complexity [12], suggesting a reasonable limit for software size by determining a level beyond which failure becomes a certainty (NCSL: Non-Commentary Source Lines = LOC as used in this paper).

Large railway operators, in contrast, may have several hundreds of trains, representing the same level of material value (but carrying hundreds of passengers), operating at the same time. Assuming that a mission-critical failure in software of a particular size shows up once in 1000 years would mean, for a space mission duration of 1 year, a probability of a mission failure of about 0.1% due to software (equal distribution assumed). For the railway operator, however, who operates 1000 trains at the same time, having software of the same size and quality on board may cause a mission failure event about once a year, making drastically clear that code size matters. The third point, in turn, is very difficult to get implemented within conventional closed source software projects, simply because highly qualified review resources are always very limited in commercial projects. Therefore errors are often diagnosed at a late stage in the process. Their removal is expensive and time-consuming. That is why big software projects often fail due to schedule overruns and cost levels out of control, and are often even abandoned altogether. Nevertheless, continuous further development and consistent use of quality assurance measures can result in a remarkable process of maturation of software products, which is demonstrated by the fact that in our example in figure 4 the bug density has been reduced by more than an order of magnitude (initially above 6 bugs/TLOC down to below 0.5 bugs/TLOC). On the other hand, in a later stage of the life cycle, due to the continuous growth of the number of code lines, which seems to go faster than the reduction of the bug density, a slight increase of the total number of bugs can be observed. Given a certain methodology and set of quality assurance measures on the one hand and a number of change requests to be implemented per release on the other hand, this will result in a certain number of bugs that remain in the software. Many of those bugs stay unrecognized forever. Some, however, are known errors, but their elimination is either not possible for some reason or can only be repaired at an unreasonably high level of cost. The revelation of the unknown bugs can take several thousand unit operation years (number of units times number of years of operation) and must be considered a random process. That means for the operator that even after many years of flawless operation, unpleasant surprises have to be expected at any time. In Europe, in a not too distant future, up to 50,000 trains will operate with ETCS, carrying millions of passengers daily, plus unnumbered trains with hazardous material. Then the idea is rather scary that in any of those European Vital Computers (EVC), the core element of the ETCS vehicle equipment, between 100 and 1000 undetected errors are most likely left over, even after successfully passing safety case assessments and after the required authorization has been granted.


Even if we assume that only one out of 100 bugs might eventually cause a hazard [12], that still means 1 to 10 mission-critical defects per unit. Furthermore, there will be several different manufacturer-specific variants of fault patterns under way.
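These orders of magnitude can be reproduced with a rough back-of-the-envelope estimate; the residual defect density of roughly 0.5 to 4 defects per TLOC is an assumption derived from the released-product figures quoted in section 2.3, not a measured ETCS value:

\[
N_{\mathrm{residual}} \approx d_{\mathrm{residual}} \cdot S \approx (0.5 \ldots 4)\,\frac{\text{defects}}{\text{TLOC}} \times 250\ \text{TLOC} \approx 125 \ldots 1000
\]
\[
N_{\mathrm{hazardous}} \approx \frac{N_{\mathrm{residual}}}{100} \approx 1 \ldots 10 \ \text{per on-board unit}
\]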
2.5 New Technologies Have to Have At Least the Same Level of Safety

According to German law (and equally in most other EU states), as defined in the EBO (Eisenbahn-Bau- und Betriebsordnung: German railway building and operation regulations), any new technology has to maintain at least the same safety level as provided by the preceding technology [16]. Assuming that the more complex ETCS technology requires on average about ten times more software code than legacy technology like LZB and PZB, and given the fact that PZB and LZB have already reached a very mature stage of their integrated software after almost three decades of continuous development, it seems very unlikely that the "at least same level of safety" can be proven by using the same technical rules and design practices for a relatively immature ETCS technology. In addition, due to less service experience, the criticality of deviations from expected reaction patterns is difficult to assess. This raises the question whether proprietary software, combined with a business model that sells the software together with the hardware device, which then (as up to now) will be operated largely without any defined maintenance strategy, might be inadequate for such a gigantic European project: a project eventually replacing all legacy, but well service-proven signaling and train protection systems with one single unified, but less service-proven technology, especially when independent verification and system validation is only provided on rare occasions by a very limited number of experts .... or ... then again only after a critical incident has taken place? Instead, a broad and continuous peer review scheme with full transparency in all stages of the life cycle of ETCS on-board software products would be highly recommended, in particular during the critical specification, verification and validation phases, following the so-called Linus's Law, which according to Eric S. Raymond states that:

given enough eyeballs, all bugs are shallow.

More formally:

Given a large enough beta-tester and co-developer base, almost every problem will be characterized quickly and the fix will be obvious to someone.

The rule was formulated and named by Eric S. Raymond in his essay The Cathedral and the Bazaar [17]. Presenting the code to multiple developers with the purpose of reaching consensus about its acceptance is a simple form of the software reviewing process. Researchers and practitioners have repeatedly shown the effectiveness of the reviewing process in finding bugs and security issues [18].

2.6 Changing Business Model for Software: From Sales to Service

Such problems can be solved by changing the business model. Looking at software as a service rather than a commodity or product (in its traditional definition) is justified by the fact that software grows continuously in size (but does not necessarily increase the value to the user), as shown in figure 4. Over a life-cycle of 17 releases, the total size of that software grew by more than 300%. Furthermore, about 40% of the modules of the first release had to be fixed or rewritten, due to one or more bugs per module. That means only 60% of the original modules were reusable for the second or later releases. It is fair to assume that half of the original 60% of virtually bug-free code had to be adapted or otherwise modified for functional enhancements. This results in not more than 30% of remaining code, which is about 50 TLOC of the original code, having a chance to survive unchanged up to release No. 17. Biggerstaff [19] and Rix [20] suggest that these assumptions might even be too optimistic, as long as no specific measures have been taken in order to support reusability of code. It can be assumed that a potential sales price of all versions would be at the same level, since all versions serve in principle the same functions. That means that during its life cycle only 10% (50 TLOC out of about 500 TLOC) of the final code was left unchanged from the first version in this particular example. In other words: 90% of the work (code lines) was added by software maintenance and continuous further development efforts over the observed life cycle, which can be considered as servicing the software. In order to make this a viable business, users and software producers have to contract so-called service agreements for a certain period of time. This kind of business model can be a win-win for users and manufacturers alike. Manufacturers generate continuous cash flow, allowing them to maintain a team of experts over an extended period of time, dedicated to continuously improving the software. Users in exchange have guaranteed access to a qualified technical support team ensuring fast response in the case of critical software failures.

Proprietary software makes it mostly impossible for the user to switch software service providers later on, but leaves users in a vendor lock-in situation with no competition on the software service market. Competition, however, is the most effective driver for quality improvement and cost efficiency.

Considering commercial, technical, safety and security aspects, the risks associated with complex closed source software should be reason enough for the railway operators to consider alternatives, in particular when a large economic body, like the European Union, defines a new technological standard. Watts Humphrey, a fellow of the Software Engineering Institute and a recipient of the 2003 National Medal of Technology (US), has put the general problem of growing software complexity in these words [21]:

While technology can change quickly, getting your people to change takes a great deal longer. That is why the people-intensive job of developing software has had essentially the same problems for over 40 years. It is also why, unless you do something, the situation won't improve by itself. In fact, current trends suggest that your future products will use more software and be more complex than those of today. This means that more of your people will work on software and that their work will be harder to track and more difficult to manage. Unless you make some changes in the way your software work is done, your current problems will likely get much worse.

3 Proposal: Free / Libre Open Source Software for ETCS

A promising solution for the previously described difficulties could be to provide an open source ETCS onboard system, making the embedded software source code and relevant documentation open to the entire railway sector. Open Source Software, Free Software or Libre Software, more often called Free/Libre Open Source Software, short FLOSS [22], is software that:

• Can be used for any purpose,

• Can be studied by analyzing the source code,

• Can be improved and modified, and

• Can be distributed with or without modifications.

This basic definition of FLOSS is identical to the Four Freedoms with which the Free Software Foundation (FSF, USA, [23]) has defined free software and is in line with the open source definition formulated by the Open Source Initiative (OSI) [24].

3.1 Public License for a European Project

A potential candidate for a license agreement text could be the most widely used General Public License (GPL), occasionally called GNU Public License, which has been published by the Free Software Foundation [23]. However, this license text (and several similar license texts as well) is based on the Anglo-American legal system, and in Europe the applicability and enforceability of certain provisions of the GPL are considered critical by many legal experts. The European Union recognized this problem some time ago and has issued the European Union Public License text [25], which not only is available in 22 official EU languages, but is adapted to the European legal systems, so that it meets essential requirements for copyright and legal liability issues. The EU Commission recommends and uses this particular license for its own European eGovernment Services project (iDABC [26]). A key feature of the aforementioned license types is the so-called strong "Copy Left" [27]. The Copy-Left requires a user who modifies, extends or improves the software and distributes it for commercial or non-profit purposes to make the source code of the modified version available to the community as well, under the same or at least equivalent license conditions as applied to the original software. That means everybody will get access to all improvements and further developments of the software in the future. The distribution in principle has to be done free of charge; however, add-on services for a fee are permissible. For embedded control systems this means that software-hardware integration work, vehicle integration, homologation and authorization costs can be charged to the customer, and service level agreements for a fee are allowed within the EUPL as well. By applying such a license concept to the core functionality of the ETCS vehicle function as defined and already published in UNISIG subset 026 of the SRS v3.0.0 [1], all equipment manufacturers as well as end-users would be free to use this ETCS software partly or as a whole in their own hardware products or own vehicles. Due to the fact that a software package of substantial value would then be available to all parties, there would not be much incentive any more for newcomers to start their own ETCS software development project from scratch; they would more likely participate in the OSS project and utilize the effect of cost sharing. Also established manufacturers, who may already have a product of their own, might consider sharing in for all further add-on functions by trying to provide an interface between the OSS software modules and their own existing software.


This will result in some kind of an informal or even formally set-up consortium of co-developing firms, and a so-called open source eco-system around this core is most likely to evolve. This has been demonstrated by many similar FLOSS projects. The effect of cooperation between otherwise competing firms, based on a common standard core product, is often called co-competition. In analogy with other similar open projects the name openETCS has been suggested for such a project. The occasionally expressed concern that such a model would squander costly acquired intellectual property of the manufacturers to competitors does not really hit the point, because on the one hand the essential functional knowledge, which is basically concentrated in the specification, has already been published by UNISIG and ERA within the SRS and cannot be used as a unique selling point. On the other hand, implementation know-how for specific hardware architectures and vehicle integration as well as service knowledge will not be affected and has the potential to become part of the core business for the industry. In addition, for the pioneering manufacturer, opening up his own software could not be a better investment if this software becomes part of an industrial standard, which is very likely (if others do not quickly follow this move), as demonstrated several times in the software industry. Not only that, but since safety-related software products are closely related to the design process, tools and quality assurance measures, the pioneering OSS supplier would automatically turn his way of designing software into an industrial standard as well (process standardization). Late followers would simply have to accept those procedures and may end up with more or less high switching costs, giving the pioneer a head start. Even in the case that one or two competitors would do the same thing quickly, those companies could form a consortium sharing their R&D cost and utilizing the effect of quality-improving feedback from third parties, thereby improving their competitive position compared to those firms sticking with a proprietary product concept. The UNU-MERIT study on FLOSS [22] has shown cost-lowering (R&D average of 36%) and quality-improving effects of open source compared with closed source product lines.

3.2 ETCS Vehicle On-Board Units with openETCS

Software that comes with a FLOSS license and a Copy-Left represents some kind of a gift with a commitment, namely such that the donor has almost a claim to receive any improvements made and further distributed by the recipient. That means all the technical improvements which have been based on collective experiences of other users/developers and integrated into product improvements need to be distributed so that even the original investor gets the benefits. Recalling the fact that during the life cycle of a large software product, as shown in figure 4, more than 90% of the code and improvements were made after the first product launch, sharing the original software investment with a community (eco-system) becomes a smart investment for railway operators and manufacturers alike, by simply reducing their future upgrade and maintenance costs significantly. Rather than starting to develop a new open source software package from scratch, the easiest and fastest way for a user to reach that goal would be simply to select one of the already existing and (as far as possible) service-proven products from the market and put it under an appropriate open source license. There are numerous examples from the IT sector, such as the software development tool Eclipse, the successor to IBM's Visual Age for Java 4.0, whose source code was released in 2001 [28], the Internet browser Mozilla Firefox (formerly Netscape Navigator), the office communication software OpenOffice (formerly StarOffice) and many more.

3.3 Tools and Documents Need to be Included

In the long term it will not be enough only to make the software in the on-board equipment open. Tools for specification, modeling and simulation as well as for software development, testing and documentation are also essential for securing quality and lowering life cycle cost. Meeting the request for more competition in the after-sales software service business and avoiding vendor lock-in effects requires third parties to be in a position to maintain software, prepare safety case documents, and get the modified software authorized again without depending on proprietary information. This is a request no one would seriously deny for other safety-critical elements, e.g. in the mechanical parts section of a railway vehicle, like load-bearing car body parts or wheel discs. The past has shown that software tools become obsolete quite often due to new releases, changes of operating systems, or tool suppliers simply going out of business, leaving customers alone with little or no support for their proprietary products. Railway vehicles are often in revenue service for more than 40 years, electronic equipment is expected to be serviced for at least 20 years, and tools need to be up to the required technical level for the whole period.


The aircraft industry, with similar or sometimes even longer product life-cycles, realized this decades ago, starting in the 1980s with an ADA compiler specifically designed for developing high assurance software for embedded control design projects, originally initiated by the US Air Force and developed by New York University (GNAT: GNU NYU Ada Translator), which is available in the public domain and further developed by AdaCore and the GNU Project [3], [23], and a somewhat more sophisticated tools chain, called TOPCASED, initiated by AIRBUS Industries [29]. TOPCASED represents a tool set, based on ECLIPSE (another OSS software development tools platform [28]), for safety-critical flight control applications with the objective to cover the whole life cycle of such software products, including formal specification, modeling, software generation, verification and validation, based on FLOSS in order to guarantee long-term availability. TOPCASED seems to be a reasonable candidate for a future openETCS reference tools platform, since it is a highly flexible open framework, adaptable in various ways to meet a wide range of requirements. Today manufacturers in the rail segment are using a mix of proprietary and open source tools, since some software development tools like ADA and other products from the GNU Compiler Collection (GCC) [24] have already been used in several railway projects. Even FLOSS tools not specifically designed for safety applications, like Bugzilla for bug tracing and record keeping, have already found their way into SIL 4 R&D programs for railway signaling [30]. The importance of qualified and certified tools is rising, since it became obvious that poor quality tools or even malware-infected tools can have a devastating effect on the quality of the final software product. Section 6.6 of the proposed prEN 50128:2009 norm [31], modification and change control, requires taking care of the software development tools chain and processes, which in the future formally have to comply with the requirements for the respective SIL level of the final product. Recent news about the STUXNET attack, a type of malware (worm) specifically designed to target industrial process control computers via their tools chain (maintenance PCs with closed source operating systems), has made pretty clear that no one can be lulled into security, not even with control and monitoring systems designed for safety-critical embedded applications [7]. Ken Thompson, one of the pioneers of the B language, a predecessor of C, and of UNIX operating system design, has demonstrated in his Reflections on Trusting Trust [2] that compilers can be infected with malicious software parts in a way that the resulting executable software (e.g. an operating system) generated by this compiler out of a given clean (meaning: free of malware) source code can be infected with a backdoor, almost invisible to the programmer. It took several years of research until David A. Wheeler suggested in his dissertation thesis (2009) a method called Diverse Double-Compiling [32], based on open source tools, for countering the so-called Thompson hack. Therefore Wheeler suggests on his personal website:

Normal mathematicians publish their proofs, and then depend on worldwide peer review to find the errors and weaknesses in their proofs. And for good reason; it turns out that many formally published math articles (which went through expert peer review before publication) have had flaws discovered later, and had to be corrected later or withdrawn. Only through lengthy, public worldwide review have these problems surfaced. If those who dedicate their lives to mathematics often make mistakes, it's only reasonable to suspect that software developers who hide their code and proofs from others are far more likely to get it wrong. ... At least for safety-critical work making FLOSS (or at least world-readable) code and proofs would make sense. Why should we accept safety software that cannot undergo worldwide review? Are mathematical proofs really more important than software that protects people's lives? [3]

3.4 Open Proof: the Ultimate Objective for openETCS

Wheeler's statement confirms the need for an open source tools chain covering the software production and documentation process for verification and validation, extending the open source concept in total into an Open Proof (OP) methodology [33]. OP should then be the ultimate objective for an openETCS project, in order to make the system as robust as possible for reliability as well as for safety and security reasons. An essential precondition for any high quality product is an unambiguous specification. Until this day only a written, more or less structured text in natural language is the basis for ETCS product development, leaving more room for divergent interpretation (figure 3) than desirable. A potential solution for avoiding ambiguities right at the beginning of the product development process could be the conversion of the functional requirement specification into a formal, that means mathematical, description. As recommended by Jan Peleska in his Habilitationsschrift (post-doctoral thesis) [34]:

... how the software crisis should be tackled in the future:

• The complexity of today's applications can only be managed by applying a combination of methods, each of them specialised to support specific development steps in an optimised way during the system development process.

• The application of formal methods should be supported by development standards, i.e., explanations or "recipes" showing how to apply the methods in the most efficient way to a specific type of development task. ...

• The application of formal methods for the development of dependable systems will only become cost-effective if the degree of re-usability is increased by means of re-usable (generic) specifications, re-usable proofs, code and even re-usable development processes.

Despite the fact that several attempts have been made in the past, a comprehensive Formal Functional Requirement Specification (FFRS) has never been completed for ETCS, due to lack of resources and/or funding. Based on proprietary software business concepts there is obviously no positive business case for suppliers for an FFRS. Formal specification work does not have to be started from scratch, because there are already a number of partial results from a series of earlier works, although different approaches, methods and tools have been used [35], [36], [37]. Evaluating those results and applying a method successfully used in several open source projects, known as the so-called Stone Soup development methodology, might be able to bring all those elements and all experts involved together in order to contribute to such a project at relatively low cost [3], [38].

3.5 Formal Methods to Validate the Specification for openETCS

In the first step of formalization only a generic, purely functional and therefore not implementation-related specification has to be developed. This can mainly be done in the academic sector and by R&D institutes. However, railway operators have to feed in their operational experience, in order to make sure that man-machine interactions and case studies for test definitions cover real-life operational scenarios and not only synthetic test cases of solely academic interest.

FIGURE 7: Proposed openETCS based on ERA's baseline 3 SRS natural English specification text (= prose), converted into a formal functional specification to define software for modeling as well as embedded control integration, allowing equipment manufacturers to integrate the openETCS kernel software via an API into their particular EVC hardware design.

For verification purposes a test case database needs to be derived from the functional specification and supplemented by a response pattern database, which defines the expected outcome of a certain test case. That database needs to be open to all parties and should even collect all real-world cases of potentially critical situations, in particular those cases which have already caused safety-relevant incidents. That means this type of formalized database will keep growing and continuously being completed, to make sure that all lessons learned are on record for future tests. State-of-the-art formal specification tools do not only provide formatting support for unambiguous graphical and textual representation of a specification document, but also provide a modeling platform to execute the model in a more or less dynamical way. This modeling can be used to verify the correctness and integrity of the ETCS specification itself not only statically, but also dynamically. In addition, transitions to and from Class B systems need to be specified formally as well, and that might depend on national rules, even in those cases where the same Class B system is used (e.g. for PZB STMs hot stand-by functions are handled differently in Germany and Austria).

300
FLOSS in Safety Critical Systems

executable software code. Even without existing real software for several month or even years, while hav-
target hardware, those elements can be used to sim- ing an average survival time of days or few weeks, at
ulate the ETCS behavior and modeling critical oper- the most, in the case of well managed OSS projects
ational test cases in a so called Software-in-the-Loop [3], [4], [5], [32]. Figure 8 demonstrates the princi-
modeling set-up. Once the specification of the func- ple information and source code flow for a typical
tionality has been approved and validated, the code FLOSS development set-up.
generation can be done for the EVC embedded con-
trol system. Standardization can be accomplished
by providing an Application Programmer Interface
(API) similar to the approach successfully applied
in the automotive industry within the AUTOSAR
project [39] or for industrial process control systems
based on open Programmable Logic Control (PLC)
within the PLCopen project [40] including safety
critical systems. In addition to the software speci-
fication, generation, verification and validation tools
chain also tools for maintenance (parameter setting, FIGURE 8: The classical Stone Soup De-
system configuration, software upload services) have velopment Methodology often applied in Open
to be included in the OSS concept, as shown in figure Source Software projects according to [3],
7. where the User in most cases is also active
as Developer, which need to be adapted to the
rail sector, where Users my be more in a re-
porting rather developing role. Only trusted
3.6 How FLOSS can meet Safety and developers are privileged to make changes to
Security Requirements the source code in the trusted repository, all
others have read only access.
For many railway experts, not familiar with open
source development methodology, open source is of- It is not in question that well acknowledged and
ten associated with some kind of chaotic and arbi- mandatory rules and regulations according to state
trary access to the software source code by ama- of the art R&D processes and procedures (e.g. EN
teur programmers (hackers), completely out of con- 50128) have to be applied to any software part in
trol and therefore not suited for any kind of qual- order to get approval from safety authorities be-
ity software production. This may have been an fore going into revenue service. While open source
issue of the past and still being in existence with eco-systems in the IT industry are generally driven
some low level projects, adequate for their purpose. by users, having the expertise and therefore being
However since OSS license and R&D methodologies in a position to contribute to the software source
concepts have successfully been applied to unnum- code themselves, so it seems unlikely for the railway
bered serious business projects, even for the highest segment to find many end users of embedded con-
safety and security levels for governmental admin- trol equipment for ETCS (here: railway operators
istration, e.g. within the iDABC, European eGov- or railway vehicle owners), who will have this level
ernment Services Project [26] as well as commercial, of expertise. Therefore the classical OSS develop-
avionics [29] and military use [24], a concept based ment concept and organization has to be adapted to
on a group of qualified and so called Trusted De- the railway sector. Figure 9 shows a proposal for
velopers (figure 8) having exclusively access to a so an open source software development eco-system for
called Trusted Repository, which on the other hand openETCS utilizing a neutral organization to coordi-
can be watched and closely monitored by a large nate the so-called ”co-competition” business model
community of developers, being able to post bug re- for cooperating several competing equipment inte-
ports and other findings visible to the whole commu- grators and distributors for ETCS onboard products
nity, has made this so called bazaar process [17] to a and services based on a common FLOSS standard
much more robust methodology compared with any core module, adapted to the needs of the railway
other proprietary development scheme. According signaling sector providing high assurance products
to several research projects, OSS projects in general to be authorized by safety authorities (NSA, NoBo).
tend to find malicious code faster than closed source The concept as shown in figure 9 assumes a license
projects, which is indicated for example in the av- with Copy-Left, requiring in general distributing the
erage life time of so called backdoors, a potential source code free of charge, even if code has been
security threat, which might exist in closed source added or modified and further distributed, so that

301
Open Proof for Railway Safety Software

the community can re-use the improvements as well. policy), by offering the identical software under two
That means that only certain added values can be (or more) different license agreements (figure 10).
sold for a fee. Typical added values can be service for One might be the European EUPL, a Copy-Left type
software maintenance (bug-fixing), software adapta- FLOSS license and the other one can be a For-Fee-
tion for specific applications, integration into embed- License (without Copy-Left), which does not require
ded control hardware and integration into the vehicle publishing all modification. In exchange ac certain
system, test and homologation services, training for fee has to be paid, which may also provide for war-
personnel and so forth. ranty and other services. Combined with a sched-
uled release scheme (e.g. defining a fixed release
day per year or any other reasonable frequency), all
new modules will be available only under the For-
Fee-License first, until R&D costs have been paid
off by those users, who want to make use of the
new functionality, while all others can stick with the
older, but free of charge software versions. Once the
new features are paid off, those particular software
modules can then be set under the FLOSS license
(EUPL). That allows fair cost sharing for all early
implementers and does not leave an undesired bur-
den on those users, who can live with- out the ad-
ditional functions for a while, but still being able to
FIGURE 9: Proposal for an openETCS upgrade later on.
eco-system using a neutral organization to co-
ordinate a so-called ”co-competition” business
model, showing flow of software source code,
bug reports and ad-on services provided for a
fee.

For further development of the software, espe-


cially for the development of new complex add-on
functions, costly functional improvements, etc., it
might be difficult to find funding, since a Copy-Left
in the FLOSS license requires to publish that soft-
ware free of charge, when distributed.

FIGURE 11: Applying a dual-licensing


model helps even those non-OSS ETCS sup-
pliers participating in cost sharing for add-on
modules even if the usage of the openETCS
software kernel is not possible due to technical
or licensing incompatibilities, but by provid-
ing a Mini-API some or all future new add-on
functions can be integrated.

Since those upgrades will be provided by service


level agreements through OEMs or software service
providers, customers have the choice to either opt for
FIGURE 10: Dual-Licensing concept pro- low cost, but later upgrade service or higher priced
viding a cost sharing scheme for financing new early implementing services, whatever fits best to
functions (Add-on SW Modules) and improve- their business needs. The dual-licensing scheme has
ments for those users who need it, by keeping an additional advantage, allowing even those ETCS
cost low for those not requiring enhanced func- suppliers, who are not able, due to technical limi-
tionality. tations or legal restrictions caused by their legacy
system design or other reasons, to put their soft-
Therefore many OSS projects are using a so ware under an OSS license, never the less being able
called dual licensing policy (or even multi license to participate in the cost sharing effects for further

302
FLOSS in Safety Critical Systems

add-on functional development. In most cases it is an open source reference system, based on an
technically much easier to implement a small API, unambiguous specification, which means using
interfacing just for the add-on functions, rather than formal methods, in order to deliver a reference
providing a fully functional API for the whole kernel onboard system as soon as possible, which can
(figure 11). be used to compare various products on the
market in a simulated as well as real world
If the non-OSS supplier wants to make use of
infrastructure test environment. This device
those Add-on-SW-Modules from the library, he can-
needs to be functionally correct, however does
not use the OSS-licensed software, but can com-
not to be a vital (or fail-safe) implementation.
bine any proprietary software with alterative licensed
software, not including a Copy-Left provision. Be- • 2. At least one or better more manufacturers
sides commercial matters also technical constrains have to be convinced to share-in into an open
have to be taken into account when combining soft- source software based business approach by
ware parts, developed for different architectural de- simply converting their existing and approved
signs. A concept of hardware virtualization has al- proprietary ETCS onboard product into an
ready been discussed to overcome potential security open source software product by just switch-
issues [43]. ing to a FLOSS license agreement, prefer-
ably by using the European Union Public Li-
cense (EUPL), including interface definition
3.7 How to Phase-in an OSS Ap- and safety case documentation. No technical
proach into a Proprietary Envi- changes are required.
ronment? • 3. Once a formally specified reference FLOSS
package has been provided, implemented on a
Even though the original concept of the ETCS goes non-vital reference hardware architecture, ac-
far back into the early 1990 years projecting an open cording to step 1, in a future step by step ap-
white-box design of interchangeable building blocks, proach all add-on functions and enhancements
independent from certain manufacturers, based on a and future major software releases should be
common specification and mainly driven by the rail- based on formal specifications, allowing a mi-
way operators organized in the UIC (Union Inter- gration of the original manufacturers software
national des Chemin de Fer = International Union design solution into the formal method based
of Railways), software was not a central issue and approach, due to the openness of the prod-
open source software concepts were in its infancy uct(s) from step 2.
[41], [42]. Since then a lot of conceptual effort and
detailed product development work has been done,
but the white box approach has never been adapted
by the manufacturing industry. Despites various dif-
ficulties and shortcomings, as mentioned earlier, the
European signal manufacturers have developed sev-
eral products, more or less fit for its purpose and
it would be unwise to ignore this status of develop-
ment and start a brand new development path from
scratch. This would just lead to another product
competing in an even more fragmented market rather
than promoting an effective product standard. In
addition, it needs at least one strong manufacturer
with undoubted reputation and a sound financial ba-
sis in combination with a sufficient customer base to FIGURE 12: Interaction between
enforce a standard in a certain market. Therefore openETCS project providing formally speci-
starting a new product line, by having the need to fied non-vital reference OBU for validating
catch up with more than a decade of R&D efforts is proprietary as well into OSS converted in-
not an option. dustrial products and for future migration to
Based on this insight, a viable strategy has to a fully formally specified openETCS software
act in two ways: version to implemented in a market product.

Figure 12 demonstrates this two path approach


• 1. Ground work has to be started to provide with a conventional roll-out scheme, as planned by a

303
Open Proof for Railway Safety Software

supplier, based on proprietary designs (upper half) EU Commission [22] has identified a potential av-
and major mile stones for the openETCS project, erage cost reduction of 36% for the corresponding
providing a non-vital OBU based on formal spec- R&D by the use of FLOSS. As a result, a signifi-
ification and later migrating to a formally specified cantly lower cost of ownership for vehicle operators
vendor specific implementation of the kernel software would accelerate the ETCS migration on the vehicle
(lower half). side.
Trying to implement an independent formal open
source software package without the backing of at
least one strong manufacturer, will most likely fail 3.9 Benefits for the ETCS Manufac-
if no approved and certified product can be used to turers
start with. The only promising way to accomplish
the crucial second step in this concept is by using The core of the ETCS software functionality de-
a tender for a sufficiently attractive (large enough) fined by UNISIG subset 026, to be implemented
ETCS retrofit project by adding a request for an OSS in each EVC, is a published and binding standard
license for the software to be delivered. The EU requirement and therefore not suitable for defining
commission has provided a guideline for such OSS an Unique Selling Proposition (USP). As a result it
driven tenders, the so called OSOR Procurement makes perfectly sense from the perspective of man-
Guide (Guideline on public procurement of Open ufacturers, to share the development cost and the
Source Software, issued March 2010, [26]). As an risk for all R&D of the ETCS core functionality
example, figure 12 shows the time line for an ETCS even with their competitors, often practiced in other
retrofit project for high speed passenger trains to be industrial sectors (e.g. automotive). The involve-
equipped by 2012 with an pre-baseline 3 proprietary ment of several manufacturers in the development of
software in 2012, to be added by an open source li- openETCS will help to enhance the quality in terms
cense as soon as the first baseline 3 software package of security and reliability (stability and safety) of
is expected to be released. the software, because different design traditions and
experiences can easily complement each other. As
a FLOSS-based business model can no longer rely
3.8 Economical Aspects of openETCS on the sales of the software as such, the business
for Europes Railway Sector focus has to be shifted to services around the soft-
ware and even other add-on features to the product.
That means the business has to evolve into service
A free of charge, high-quality ETCS vehicle software
contracts for product maintenance (further develop-
product on the market, makes it less attractive, un-
ment, performance enhancements and bug fixes). It
der economical aspects, to start a new software de-
thereby helps the ETCS equipment manufacturers
velopment or even further development of a different
to generate a dependable long-term cash flow, fund-
but functionally identical proprietary software prod-
ing software maintenance teams even long after the
uct. This will lead sooner or later to some kind of
hardware product has been discontinued and to cover
cooperation of competing ETCS equipment suppli-
long term maintenance obligations for the product
ers, a co-competition with all those suppliers who
even by third parties, helping to reserves scarce soft-
can and will adapt their own products by provid-
ware development resources for future product R&D.
ing an API to their particular system. Due to the
With respect to the scarcity of well educated software
fact that very different design and safety philosophies
engineers from Universities, FLOSS has the side ef-
have been evolved in the past years, some of the man-
fect, that openETCS can and most likely will become
ufacturers have to decide either to convert their sys-
subject to academic research, generating numerous
tems or share-in into the co-competition grouping,
master and dissertation thesiss and student research
or otherwise stick with costly proprietary software
projects.
maintenance on their own. As figure 4 demonstrates
clearly that the increase of the software volume over
time may exceed the original volume by a factor of
3. It is unlikely to assume that the development of 3.10 Benefits for Operators and Vehi-
the ETCS vehicle software will run much differently. cle Owners
Then it will be very obvious that for a relatively lim-
ited market, of perhaps up to 50,000 rail cars to be The use of openETCS is a better protection for the
equipped with ETCS in Europe, a larger number vehicle owners investment, because an obsolescence
of parallel software product development lines will problem on the hardware side does not necessarily
hardly be able to survive. A study funded by the mean discontinued software service. Modification

304
FLOSS in Safety Critical Systems

of the ETCS kernel can also be developed by inde- lected by applying the same quality criteria. This
pendent software producers. This enables competi- supports the impression that FLOSS does tend to
tion on after-sales services and enhancements, be- have a higher quality.
cause not only the software sources but also associ-
ated software development tools are accessible to all
parties. As shown above, due to the complexity of
the software, malfunctions of the system may show
4 Conclusion
up many years, even decades after commissioning.
Conventional procure- ment processes are therefore The major goal of unified European train control,
not suitable, since they provide only a few years of signaling, and train protection system, ETCS, has
warranty coverage for those kinds of defects. These led to highly complex functionality for the onboard
concepts imply that customers would be able to find units, which converts into a level of complexity for
all potential defects within this limited time frame, the safety critical software not seen on rail vehicles
just by applying reasonable care and observation of before. A lack of standardization on various levels,
the product by the user, which does not match ex- different national homologation procedures and a di-
periences with complex software packages with more versity of operational rules to be covered, combined
than 100,000 lines of code. This finding suggests that with interfacing to several legacy systems during a
complex software will need care during the whole life- lengthy transitional period has to be considered as
cycle. Since software matures during long term qual- a major cost driver. Therefore, even compared with
ity maintenance, means that during early usage, or some of the more sophisticated legacy ATP and ATC
after major changes, the software may need more in- systems in Europe ETCS has turned out to be far
tensive care whereas in its later period of use, service more expensive without providing much if any ad-
intensity may slow down. But as long as the software ditional performance or safety advantages. Due to
is in use, a stand-by team is needed to counter un- ambiguities in the system requirement specification
foreseeable malfunctions, triggered by extremely rare (SRS) various deviations have been revealed in sev-
operational conditions. As the ETCS onboard soft- eral projects, so that even the ultimate goal of full in-
ware can be considered as mission critical, operators teroperability has not yet been accomplished. There-
are well advised to maintain a service level agreement fore the development of ETCS has to be considered
to get the systems up and running again, even after as work in progress, resulting in many software
worst case scenarios. Railway operators and vehicle upgrades to be expected in the near and distant fu-
owners are usually not be able to provide that soft- ture. Since almost all products on the market are
ware support for themselves. They usually rely on based on proprietary software, this means a low de-
services provided by the OEM. However due to slow- gree of standardization for the most complex compo-
ing service intensity after several years of operation, nent as well as life-long dependency to the original
this service model may not match the OEMs cost equipment manufacturers with high cost of owner-
structure in particular after the hardware has been ship for vehicle holders and operators. Therefore an
phased out. In those cases OEMs are likely to in- open source approach has been suggested, not only
crease prices or even to discontinue this kind of serve. covering the embedded control software of the ETCS
A typical escrow agreement for proprietary software onboard unit itself, but including all tools and doc-
might help, but has its price too, because alternative uments in order to make the whole product life cy-
service providers have first to learn how to deal with cle as transparent as possible optimizing economy,
the software. Only a well established FLOSS-eco- reliability, safety and security alike. This concept is
system can fill in the gap at reasonable cost for the called open proof a new approach for the railway sig-
end user, and that is only possible with FLOSS. DBs naling sector. A dual licensing concept is suggested,
experience with FLOSS is very positive in general. based on the European Union Public License with a
For more than a decade, DB is using FLOSS in vari- copy left provision on the one hand, combined with a
ous ways: In office applications, for the intranet and non-copy left for-fee-license on the other hand to pro-
DBs official internet presence and services on more vide a cost sharing effect for participating suppliers
than 2000 servers world-wide and even in business and service providers. By offering a trusted reposi-
critical applications. The original decision in favor tory, a dedicated sources code access policy in com-
of FLOSS was mainly driven by expected savings on bination with a release schedule policy, economical
license cost. However looking back, quality became a as well as safety and security considerations can be
more important issue over time, since FLOSS appli- taken into account. A two step approach, providing
cation have had never caused a service level breach, a formally specified non-vital reference system and a
which cannot be said for proprietary software, se- procurement program, asking for converting existing
commercial products from closed source into open

305
Open Proof for Railway Safety Software

source, and later merging those two approaches, is http://www.europarl.europa.eu/comparl/ tem-


expected to enhance quality and safety parameters in pcom/echelon/pdf/rapport echelon en.pdf
the long run. A neutral independent and mainly not-
for-profit organization is suggested to manage the [7] FINANCIAL TIMES Europe: Stuxnet worm
project involving all major stake holders to define causes worldwide alarm, by Joseph Menn
the future product strategy. The whole openETCS and Mary Watkins, Published: Sept. 24,
project has to be considered as a business conversion 2010, Pages 1 and 3 or online version:
project from a purely competitive sales oriented mar- http://www.ft.com/cms/s/0/cbf707d2-c737-
ket into a co-competitive service market, enhancing 11df-aeb1-00144feab49a.html
cooperation on standards by enabling competition
on implementation and services. It is well under- [8] Schweizerische Eidgenossenschaft, UUS:
stood that such a change cannot be accomplished Schlussbericht der Unfalluntersuchungsstelle
even by one of the largest railway operators alone. Bahnen und Schiffe ber die Entgleisung
Therefore several EU railway organizations, as there von Gterzug 43647 der BLS AG vom Di-
are: ATOC (UK), DB (D), NS (NL), SNCF (F) and enstag, 16. Oktober 2007, in Frutigen.
Trenitalia (I) have already signed a Memorandum of http://www.uus.admin.ch//pdf/07101601 SB.pdf
Understanding promoting the openETCS concept in
the framework of an international project. [9] Klaeren, Herbert: Skriptum soft-
waretechnik, Universitt Tbingen, Okt.
2007, http://www-pu.informatik.uni-
tuebingen.de/users/klaeren/sweinf.pdf
References
[10] McConnell, Steve: Code Complete, 2nd ed.
[1] The European Railway Agency (ERA): 2004, Microsoft Press; Redmond, Washington
ERTMS Technical Documentation, System 98052-6399, USA, ISBN 0-7356-1967-0
requirements Specification - Baseline 3,
SUBSET-026 v300 published 01/01/2010, [11] Richard H. Cobb, Harlan D. Mills: Engineer-
http://www.era.europa.eu/Document- ing Software under Statistical Quality Control.,
Register/Pages/SUBSET-026v300.aspx IEEE Software 7(6): 44-54 (1990)

[2] Thompson, Ken: Reflections on Trusting Trust [12] Dvorak, Daniel L., (Editor): NASA Study
; Reprinted from Communication of the ACM, on Flight Software Complexity, Final Re-
Vol. 27, No. 8, August 1984, pp. 761-763. port, California Institute of Technology,
http://cm.bell-labs.com/who/ken/trust.html 2008, Report: http://www.nasa.gov/pdf/
418878main FSWC Final Report.pdf, Pre-
[3] Wheeler, David A.: High Assurance (for Secu-
sentation: http://pmchallenge.gsfc.nasa.gov/
rity or Safety) and Free-Libre / Open Source
docs/2009/presentations/Dvorak.Dan.pdf
Software (FLOSS); updated 20/11/2009;
http://www.dwheeler.com/essays/high- [13] Ostrand, T. J. et al: Where the Bugs
assurance-floss.html Are. In: Rothermel, G. (Hrsg.): Proceedings
[4] Wysopal, Chris; Eng, Chris: Static De- of the ACM SIGSOFT International Sympo-
tection of Application Backdoors, Vera- sium on Software Testing and Analysis, Vol.
code Inc., Burlington, MA USA, 2007, 29, 2004, Pages 86-96; http://portal.acm.org/
http://www.veracode.com/images/stories/static- ; see also: http://www-pu.informatik.uni-
detection-of-backdoors-1.0.pdf tuebingen.de/users/klaeren/sw.pdf (German)

[5] Poulsen, Kevin, Borland Interbase back- [14] Randell, B.: The NATO Software
door exposed, The Register, Jan. 2001, Engineering Conferences, 1968/1969:
http://www.theregister.co.uk/2001/01/12/ http://homepages.cs.ncl.ac.uk/brian.randell/NATO/
borland interbase backdoor exposed
[15] Dijkstra, Edsger W.: The Humble Pro-
[6] EUROPEAN PARLIAMENT: REPORT on grammer, ACM Turing Lecture 1972.
the existence of a global system for the in- http://userweb.cs.utexas.edu/users/
terception of private and commercial com- EWD/ewd03xx/EWD340.PDF
munications (ECHELON interception system),
(2001/2098(INI)), Part 1: Motion for a [16] Sukale, Margret: Taschenbuch der Eisenbahnge-
resolution: A5-0264/2001, 11. July 2001. setze, Hestra-Verlag, 13.Auflage 2002

306
FLOSS in Safety Critical Systems

[17] Raymond, Eric Steven: The Cathedral and [30] Duhoux, Maarten: Respecting EN 50128
the Bazaar, version 3.0, 11 Sept. 2000 change control requirements using BugZilla
http://www.catb.org/ esr/writings/cathedral- variants, Signal+Draht, Heft 07+08/2010,
bazaar/cathedral-bazaar/ar01s04.html EurailPress http://www.eurailpress.de/sd-
archiv/number/07 082010-1.html
[18] Pfleeger, Charles P.; Pfleeger, Shari Lawrence:
Security in Computing. Fourth edition. ISBN 0- [31] DIN EN 50128; VDE 0831-128:2009-10;
13-239077-9 Railway applications - Communication,
signal- ling and processing systems -
[19] Biggerstaff, Ted J.: A Perspective of Gen- Software for railway control and protec-
erative Reuse, Technical Report, MSR- tion systems; version prEN 50128:2009;
TR-97- 26,1997, Microsoft Corporation Beuth Verlag, Germany, http://www.vde-
http://research.microsoft.com/pubs/69632/tr- verlag.de/previewpdf/71831014.pdf (index
97-26.pdf only)
[20] Rix, Malcolm: ”Case Study of a Suc- [32] Wheeler, David A.: Countering the
cessful Firmware Reuse Program,” Trusting Trust through Diverse Double-
WISR (Workshop on the Institution- Compiling (DDC), 2009 PhD dissertation,
alization of Reuse), Palo Alto, CA,. George Mason University, Fairfax, Virginia
ftp://gandalf.umcs.maine.edu/pub/WISR/wisr5/ http://www.dwheeler.com/trusting-trust/
proceedings/ .
[33] Open Proof: http://www.openproofs.org/
[21] Watts S. Humphrey; Winning with Software:
An Executive Strategy, 2001 by Addison- Wes- [34] Jan Peleska: Formal Methods and the De-
ley, 1st Edition; ISBN-10: 0-201-77639-1 velopment of Dependable Systems, Habilita-
tionsschrift, Bericht Nr. 9612,Universitt
[22] UNU-MERIT, (NL): Economic impact of Bremen, 1996. http://www.informatik.uni-
open source software on innovation and bremen.de/agbs/jp/papers/habil.ps.gz
the competitiveness of the Information and
Communication Technologies (ICT) sector [35] Anne E. Haxthausen, Jan Peleska and Sebas-
http://ec.europa.eu/enterprise/sectors/ict/files/ tian Kinder: A formal approach for the con-
2006-11-20-flossimpact en.pdf struction and verification of railway control
systems, Journal: Formal Aspects of Comput-
[23] Free Software Foundation, Inc.; 51 Franklin ing. Published online: 17 December 2009.
Street, Boston, MA 02110-1301, USA: DOI: 10.1007/s00165-009-0143-6 Springer,
http://www.gnu.org/philosophy/free-sw.html , ISSN 0934-5043 (Print) 1433-299X (Online)
[24] David A. Wheeler: Open Source Soft- http://springerlink.metapress.com/content/
ware (OSS or FLOSS) and the U.S. De- l3707144674h14m5/fulltext.pdf
partment of Defense, November 4, 2009; [36] Lorenz Dubler, Michael Meyer zu Hrste,
http://www.dwheeler.com/essays/dod-oss.ppt Gert Bikker, Eckehard Schnieder; For-
[25] European Commission, European Union Pub- male Spezifikation von Zugleitsystemen
lic License - EUPL v.1.1, Jan. 9, 2009. mit STEP, iVA, Techn. Univ. Braun-
http://ec.europa.eu/idabc/en/document/7774 schweig, 2002; http://www.iva.ing.tu-
bs.de/institut/projekte/Handout STEP.pdf
[26] European Commission, iDABC, European
eGovernment Services; OSOR; Guideline on [37] Padberg, J. and Jansen, L. and Heckel, R. and
public procurement of Open Source Soft- Ehrig, H.: Interoperability in Train Control Sys-
ware, March 2010, http://www.osor.eu/idabc- tems: Specification of Scenarios Using Open
studies/OSS-procurement-guideline Nets; in Proc. IDPT 1998 (Integrated De- sign
and Process Technology), Berlin 1998, pages 17
[27] Wikipedia, terminology: Copyleft - 28
http://en.wikipedia.org/wiki/Copyleft
[38] Gary Rathwell: Stone Soup Development
[28] Eclipse Foundation, About the Eclipse Founda- Methodology: Last updated December 5, 2000
tion, http://www.eclipse.org/org/#about http://www.pera.net/Stonesoup.html
[29] TOPCASED: The Open Source Toolkit for Crit- [39] AUTOSAR (AUTomotive Open System ARchi-
ical Systems; http://www.topcased.org/ tecture); http://www.autosar.org/

307
Open Proof for Railway Safety Software

[40] PLCopen; Molenstraat 34, 4201 CX Gorinchem, control system for the European railways, Aug.
NL,: http://www.plcopen.org/ 1993, 2nd. Rev. Oct. 1995
[41] UIC/ERRI A200: ETCS, European Train Con-
[43] Johannes Feuser, Jan Peleska: Security
trol System, Overall Project Declaration in-
in Open Model Software with Hardware
cluding the contribution to be made by UIC,
Virtualization The Railway Control Sys-
Utrecht, NL, Jan. 1992,
tem Perspective. Univ. Bremen, 2010
[42] UIC/ERRI A200: Brochure ETCS, European http://opencert.iist.unu.edu/Papers/2010-
Train Control System, The new standard train paper-2-B.pdf

308
FLOSS in Safety Critical Systems

Migrating a OSEK run-time environment to the OVERSEE


platform

Andreas Platschek
OpenTech EDV Research GmbH
Augasse21, A-2193 Bullendorf
[email protected]

Georg Schiesser
OpenTech EDV Research GmbH
Augasse21, A-2193 Bullendorf
[email protected]

Abstract
As virtualization techniques are being used in the automotive industry, in order to save hardware,
reduce power consumption and allow the reuse of legacy applications, as well as allow the fast development
and integration of new applications, the need for a run-time environment that is suitable and in wide use
in the automotive industry emerges. The requirements for such an run-time environment are defined in
the most widely used specification in this industry - OSEK/VDX.
One key feature the OVERSEE project is taking advantage of, is that co-locating a OSEK run-time
environment and a full-featured GPOS GNU/Linux eliminates many limitations of OSEK/VDX by the
extension through virtualization and notably allowing to mitigate some of the serious shortcomings in
the security area by resolving these issues at the architectural level rather than trying to patch up the
limited OSEK OS. This may well constitute a general trend to specialize operating systems and operate
powerful hardware as an assortment of specialized FLOSS systems collaborating to provide different
services, including full backwards compatibility to legacy operating systems.
Currently, several FLOSS implementation of this specification are available under different FLOSS
license models and with a different degree of compliance. This paper gives an overview of the available
implementations, a rational for the chosen implementation as well as a description of the efforts for the
migration to XtratuM.

1 Introduction and that allows to write highly portable applica-


tions which only depend on an OSEK compliant API.
Furthermore the OSEK communication specification
In the effort to reduce costs by saving hardware and provides a well specified API for internal as well as
reuse of legacy code, the automotive industry is rely- external communication, turning OSEK compliant
ing on well specified and standardized operating sys- operating systems into highly portable, scalable op-
tems. The OSEK specification (Open Systems and erating systems that support re-usability of legacy
the Corresponding Interfaces for Automotive Elec- OSEK compliant applications.
tronics) has been around since 1993 and after merg-
Although it’s successor [7] is going to be the fu-
ing with the VDX (Vehicle Distributed Executive) it
ture of the industry, OSEK/VDX will be around for
has grown to the most important operating system
quite some time, since it is the basis for AUTOSAR:
specification in the automotive industry.
”The OS shall provide an API that is backward
OSEK’s main goals are to specify an operating
compatible to the API of OSEK OS. New require-
system that is suitable for the automotive industry,

309
Migrating a OSEK run-time environment to the OVERSEE platform

ments shall be integrated as an extension of the func- and well defined behavior of OSEK/VDX compliant
tionality provided by OSEK OS.” [BSW097] Existing operating systems, allow high portability of applica-
OS, AUTOSAR, Requirements on Operating System tions developed for such an operating system.
V2.1.0 R4.0 Rev 1
The following summarizes the essential points of
All this explains, why support for an OSEK com- OSEK/VDX, for more details, please refer to the
pliant run-time environments is an indispensable re- homepage [4], where all parts can be downloaded free
quirement for a software platform - like the one de- of charge, since it is an open standard.
veloped in the OVERSEE [1] project - that targets
the automotive industry. A high level view of this
software platform can be seen in figure 1. Task Management OSEK/VDX distinguishes be-
tween two different types of tasks, basic tasks
(BT) and extended tasks (ET). While a BT
can only release the processor if it terminates,
or if it is preempted by a higher priority task or
an interrupt service routine (ISR), an ET can
also go into a waiting state, allowing the sched-
uler to dispatch a lower priority task, without
terminating the higher priority task. An ex-
ample for this would be, if the ET is waiting
for some kind of event to happen. Instead of
just polling and wasting CPU time, it can go
into the waiting state, in this state it is not
scheduled, before the event is signaled (more
on signals below).
OSEK/VDX provides a Task state Model ([4],
FIGURE 1: High Level Architecture section 4.2) that describes the states a task
can be in, and the transitions between those
This paper will give an introduction to the tasks. The task state model for extended tasks
OSEK/VDX operating system specification, and de- is shown in figure 2. For basic tasks the task
scribe the efforts that were necessary to allow the state model is essentially the same, but without
execution of FreeOSEK [2] a FLOSS implementa- the waiting state.
tion of OSEK/VDX in a virtualized environment,
namely the XtratuM hypervisor allowing to run sev- The states a task can be in are the following:
eral FreeOSEK run-time environments in parallel
with other run-time environments like Linux parti- • running - a task in the running state
tions or LithOS [9] partitions, while guaranteeing the is currently active and executed. At all
independence between those run-time environments. times only one task can be in the running
state. (OSEK/VDX is specified for single
core CPUs only, multi-core solutions are
2 OSEK/VDX covered by newer versions of AUTOSAR)
• ready - all schedulable tasks are in the
In the following, the open operating system speci-
ready state, waiting for their turn to tran-
fication OSEK/VDX [8] is summarized, looking at
sition into the running state.
the highest conformance class ECC2 (extended con-
formance class). The lower conformance classes are • suspended - tasks in the suspended task
subsets of ECC2, the relation between the confor- are currently inactive and wait for their
mance classes can be found in [4], Figure 3-3. activation to become ready.
• waiting - extended tasks that are waiting
2.1 OSEK OS for some event to happen can decide to go
into the waiting state instead of wasting
The most important part of OSEK/VDX to under- CPU time. A task in the waiting state
stand the context of this paper is OSEK OS. It spec- will be released from the waiting state as
ifies a operating system, well suited for the needs soon as the desired event has happened.
of the automotive industry. The standardized API

310
FLOSS in Safety Critical Systems

where it was before the ISR has been


called (no influence on task management).
• category2 ISRs are allowed to use oper-
ating system services that are concerned
with handling interrupts (enable, disable,
etc.), these ISRs prepare the system for
a RTE to run a dedicated user routine
(comparable to Bottom Halves). After a
category2 ISR has been executed, the ex-
ecution does not return to the last point
before the interrupt, instead the scheduler
is invoked, in order to check if a dedicated
user routine (bottom halve) has a higher
priority than the current running task.

Depending on the scheduling policy, the point


of rescheduling is either, when the event is
set (fully preemptive) or at the next point of
rescheduling in non-preemptive mode (listed
above in Task Management).

Events are a means of synchronization. They are


FIGURE 2: OSEK Task State Model only available for extended tasks, since they
are used to transition tasks into and out of
In the OSEK/VDX task management, the the waiting state. Events are objects assigned
scheduling policy is assigned by the system to tasks, and uniquely identified through their
integrator. A systems scheduling policy can name and the task they belong to. At task
be configured to be fully preemptive, non- activation of an extended task, all the events
preemptive or mixed (both preemptable as well are cleared automatically. Events can be set by
as non-preemptable tasks are running at the any task (also basic tasks) as well as category2
same time). The scheduling decision itself is ISRs, to change the task state of the events
based on priority scheduling, with static prior- owner from the waiting to the ready state, but
ities (where 0 is the lowest priority and bigger only the owner of the event is allowed to clear
numbers denote higher priorities). Depending the event after-wards.
on the conformance class, one or more tasks of
the same priority can exist at the same time. Depending on the scheduling policy, the point
If the system is configured to allow preemptive of rescheduling is either, when the event is
tasks, a priority ceiling protocol is provided to set (fully preemptive) or at next point of
prevent priority inversion. rescheduling in non-preemptive mode (listed
above in Task Management).
If preemption is disabled, only voluntary pre-
emption of tasks is possible, rescheduling hap- Resource Management In order to allow the con-
pens only in the following cases: current task execution model described above,
a resource management has to be provided, in
• the running task terminates successfully order to assure
• explicit call of the scheduler by the run-
ning task • mutually exclusive access to resources
• the running task transitions into the wait- • prevent priority inversion
ing state • detect and prevent deadlocks
Interrupts - OSEK/VDX distinguishes between 2 • and access to a resource must never lead
types of interrupts: to a transition into a waiting state

• category1 ISRs do not use operating sys- All these problems are high probable error
tem services, and after they are finished, sources, the goal of the OSEK resource man-
execution continues exactly at the point agement system is to do everything possible to

311
Migrating a OSEK run-time environment to the OVERSEE platform

prevent them from the operating system side. off. The predefined value can be specified ei-
To reach these goals, the following mechanisms ther relative to the actual counter value (rel-
are specified by OSEK/VDX: ative alarm) or as an absolute value (absolute
alarm).
OSEK Priority Ceiling Protocol [4], sec-
The counter value can be incremented by all
tion 8.5, introduces the OSEK Priority
kinds of sources, of course this could be a real-
Ceiling Protocol, used to avoid priority
time clock, but it could also be any other in-
inversion and deadlocks between tasks.
terrupt source that increments the counter.
This protocol provides a ceiling prior-
ity for each resource (this ceiling prior- While any number of alarms can be assigned to
ity is statically assigned at system gener- the same counter, each alarm has exactly one
ation), which shall be set to priority of the counter and exactly one alarm-callback routine
highest-prior task using the resource. assigned at system generation time.
If a task with a lower priority accesses the Error Handling OSEK/VDX defines hook rou-
resource, it’s own priority is risen to the tines which can be used for a variety of tasks.
resources
priority temporarily. After the task re- Hook Routines are part of the operating
leases the resource, it’s priority is set back system, although implemented by the ap-
to it’s old priority. This way, it is not plications developer. They can be seen as
possible that the task is preempted by an a possibility for the application developer
higher prior task that competes for the to extend the functionality of the operat-
same resource, while the lower prior task ing system. The hook routines are called
is holding the resource. by the OS at pre-configured events, which
Section 8.6 of [4] introduces an optional events depends on the implementation of
extension of the OSEK Priority Ceiling the operating system itself. Since hook
Protocol, that includes ISRs. routines are part of the OS, they have
higher priority than all tasks, and they
Restrictions when using Resources
can not be interrupted by category2 ISRs.
OSEK/VDX defines restrictions on the
While the interface for hook routines are
system calls that may be used, while a
standardized, functionality is not and is
task is holding a resource. The calls
up to the application developer.
forbidden while holding a resource are
TerminateTask, ChainTask, Schedule and Error Handling OSEK/VDX distinguishes
WaitEvent. As can be inferred from the between two categories of errors - appli-
names, those calls that invoke the sched- cation errors and fatal errors. In case of
uler and might lead to the scheduling of a fatal error, the integrity of the operat-
another task are the ones prohibited while ing systems internal data can no longer
holding a resource. be guaranteed, and the operating systems
This is a simple an effective way of as- shuts down. If an application error oc-
suring the mutual exclusivity of resources, curs, a system call could not be serviced
furthermore it helps to prevent deadlocks properly, but the internal data of the op-
between tasks. erating system is still assumed to be cor-
rect. If a system service routine returns
Scheduler as a Resource If a task wants to an error code, an error hook routine is
prevent itself from being preempted, it called. This hook routine has to be pro-
can lock the scheduler. If a task chooses vided by the user, who has the respon-
to do so, the scheduler is still invoked, but sibility to bring his application back on
not allowed to schedule any other tasks. track.
Interrupts are received and processed in-
dependently of the state of the scheduler. System Startup/Shutdown All low level
(hardware) initialization is up to the ap-
Alarms are special (time-dependent) events, offered plication developer, the specifications of
by the OSEK OS, to activate tasks after a the OSEK/VDX concern only the plat-
counter has experienced. A counter in OSEK form independent parts and start with the
is represented a counter value measured in call to StartOS.
ticks, if the counter reaches a predefined value, Shutdowns are a little more complicated,
the alarm expires and the alarm-event is set since each task has to be informed of the

312
FLOSS in Safety Critical Systems

shutdown, so it can bring potential actu- OSEK FTCom - Fault-Tolerant Communication


ators into a safe state. Therefore before provides a standardized time-triggered net-
the system can actually shutdown, a shut- working variant that in order to achieve bet-
down hook is called. ter fault-tolerance than with the standard
Debugging is done via a PreTaskHook and a OSEK/VDX networking layer.
PostTaskHook, which are called on task
switches. These hooks can be used for de-
2.3 AUTOSAR
bugging and measurement purposes.
While OSEK/VDX is currently the most used op-
Standardized API in [4], sections 12 and 13, the
erating system standard in the automotive industry,
system services provided by the API of an
its successor AUTOSAR [7] is on it’s way to take
OSEK/VDX compliant operating system are
over. The main reason for this is definitely the fact
specified. This API must be the only way for
that the newest release - AUTOSAR 4.0 - is the first
the application to use the above described op-
operating systems standard taking multi-core CPU’s
erating subsystems, like alarms, events, etc.
into account. Since the days of single-core CPU’s are
counted, this a real important topic that will shake
2.2 Other parts of OSEK/VDX the safety-community over the next years.
For the OVERSEE project, AUTOSAR is inves-
OSEK/VDX consists of multiple parts, OSEK OS tigated, but not strictely followed, the reason is sim-
described previously is the most important one for ply it’s size and the fact that AUTOSAR is based
the porting efforts of FreeOSEK to XtratuM, while on OSEK/VDX, so every application written for an
the other parts do not really play a role in this con- OSEK/VDX compliant operating system can also be
text (except for some sections of OSEK Com). But executed on a AUTOSAR compliant operating sys-
for completeness, here is a short list of all parts: tem. Nevertheless OVERSEE’s design decisions are
loosely based on the AUTOSAR architecture.
OSEK COM - Communication Layer specifies
a message based communication for (inter pro-
cessor) communication - it shows a stunning 3 XtratuM
resemblance with ARINC653 interpartition
communication, but describes a communica- XtratuM is a type II (bare metal) hypervisor tar-
tion system for internal and external commu- geting safety related composable systems. The main
nication. guidelines for design come from one of the key IMA
standards, ARINC 653 [3]. XtratuM is an active
OSEK NM - Network Management provides a FLOSS project being developed at Instituto de In-
standardized way of configuring networks of
formatica Industrial, Universidad Politecnica de Va-
OSEK/VDX nodes, initialization of network-
lencia. While the OVERSEE project is focused on
ing peripherals, network start-up, network security aspects the goal is to provide a platform
monitoring and a lot more, everything that is
that in principle can also satisfy safety requirements.
needed to start, maintain and diagnose a net- There is a strong sharing of core demands on the low-
work of nodes running OSEK/VDX compliant est OS layer with respect to safety and security, and
nodes.
while safety and security have sometimes conflicting
OSEK OIL - OSEK Interpretation Language demands at higher levels these differences are not
specifies a standardized configuration mecha- present at the lowest level of a hypervisor [10]. The
nism for OSEK/VDX compliant nodes. The key to unify the requirements at the lowest level of
configuration files as defined by OSEK OIL are safety and security is to provide a sound:
per node (single CPU nodes only), and do not
include network configuration. • Temporal isolation

OSEK Time - Time-Triggered OS specifies a • Spatial isolation


time-triggered variant of OSEK VDX, the dif-
ferences are e.g. time-triggered scheduling and allowing to build high-level services on top that
the like. It is also possible to run a mixed only allows explicitly permitted sharing of resources
variant, were a standard OSEK OS is run in as well as communication. XtratuM thus is inten-
time slots of the time-triggered OS. tionally reduced close to the bare minimum that is

313
Migrating a OSEK run-time environment to the OVERSEE platform

needed to allow high-level services to operate in there • Basic partition management functions: Much
respective OS environments and still give strong of the partition management is related to the
guarantees with respect to independence. initialization and shutdown phase of a parti-
tion. The essence of the interface is that it
minimizes the state information that needs to
3.1 XM Hypercall Interface be handled by the hypervisor - leaving more or
less all state related work to the partition.
XtratuM offers a relatively narrow interface of Hy-
– XM suspend partition: This is a basic
percalls to it’s partitions. This simplified things a
function that is only used in supervisor
lot for our porting efforts. In this section we will
mode to manage a partition. It is used to
only briefly outline hypercalls that were used in this
porting effort, for a full list of available hypercalls we block a partition (waiting on a resource)
or temporarily stop a partition if errors
refer you to the XtratuM Reference Manual [11] The
are detected.
intention of this section is to show the interface size
used in the XtratuM guest management for a actual – XM resume partition: Simply the oppo-
example. site to the above partition suspension.
– XM shutdown partition: As the hypervi-
sor does not have information about the
• Time services: XtratuM provides an indepen-
internal state of a partition shutdown is
dent virtual time to each domain on which the
provided as an asynchronous notification.
guest-OS then can implement high-level timing
Basically a partition is sent a request to
services. In this sense the low-level services can
shut down via a dedicated interrupt and
be seen as mimicking hardware timing services.
after cleaning up any internal state will
– XM get time: Time entities in Xtra- then terminate it self.
tuM are of microsecond granularity, and – XM reset partition: Conversely to
are maintained relative to the last sys- the XM shutdown partition, the
tem reset. There are two basic clocks XM reset partition is a forced shutdown
in the system. Clocks in XtratuM are of a partition whereby a warm and cold
strictly monotonic. Clocks are main- reset is differentiated, a warm reset pre-
tained for the system (XM HW CLOCK) serves some of the partitions initialized
as well as for the partitions execution resources (i.e. open ports and memory
(XM EXEC CCLOCK) areas) while a cold reset clears this all
– XM set timer: Interval timer service and thus can have side-effects on other
(providing one-shot behavior by setting partitions via communication channels no
the interval to 0). The expire time is an longer being served.
absolute time with respect to either hard- – XM halt partition: A halted partition is
ware clock or execution clock. To a par- set into an inactive state but no recla-
tition the expired timer is signaled as a mation of resources (spatial or temporal)
virtual timer interrupt (emulating a hard- are done (that is left to the partition re-
ware timer). set) in this state the partition is sim-
ply no longer scheduled by the hypervi-
• Interrupt services: Signaling to partitions is sor. The XM halt partition called by non-
provided via virtual interrupts, it is up to the supervisor partitions can only pass self as
guest-OS to then assign suitable meaning and the target of the halt.
response to the events. Note the absence of – XM idle self: This allows a partition to
a interrupt request hypercall - as all resources suspend it self within its time slot. The
are allocated statically in XtratuM there is no partition will only be re-woken on its next
need for a request irq. time-slot or if a NMI is received within its
current time slot. This can be used to im-
– XM enable irqs: globally disable inter-
plement donation schemes for system par-
rupt delivery to this partition
titions.
– XM disable irqs: globally enable inter-
rupt delivery • Basic system management functions: Note
that these are not directly related to the guest-
– XM set irqmask: used for masking OS as these calls are related to privileged do-
(blocking) and unmasking of interrupts mains - they are listed here for completeness.

314
FLOSS in Safety Critical Systems

– XM shutdown partition: a partition shutdown is delivered as an asynchronous notification; basically a partition is sent a request to shut down via a dedicated interrupt and, after cleaning up any internal state, will then terminate itself.

– XM reset partition: Conversely to XM shutdown partition, XM reset partition is a forced shutdown of a partition whereby a warm and a cold reset are differentiated: a warm reset preserves some of the partition's initialized resources (i.e. open ports and memory areas) while a cold reset clears all of this and thus can have side-effects on other partitions via communication channels no longer being served.

– XM halt partition: A halted partition is set into an inactive state, but no reclamation of resources (spatial or temporal) is done (that is left to the partition reset); in this state the partition is simply no longer scheduled by the hypervisor. XM halt partition called by non-supervisor partitions can only pass self as the target of the halt.

– XM idle self: This allows a partition to suspend itself within its time slot. The partition will only be re-woken on its next time slot or if an NMI is received within its current time slot. This can be used to implement donation schemes for system partitions.

• Basic system management functions: Note that these are not directly related to the guest-OS, as these calls are related to privileged domains - they are listed here for completeness.

– XM halt system: The halt partition call (also described above) is used by system partitions to manage the system as a whole as well as individual partitions. Only supervisor partitions can halt other partitions. This is used to prepare a partition reset as well as mode switching.

– XM reset system: Brute force halt of the entire board; after this only a hardware reset can reboot the system. No precautions are taken to put any partition into a sane state, thus this is only the last step in a system shutdown as well as in extreme emergency situations.

• Low level communication related functions: In practical implementations one does not actually use the low level object class functions but the wrappers provided for the commonly used objects (sampling and queuing ports as specified in ARINC 653). These wrappers thus are the actual hypercalls that will be issued, though they are rarely used in guest-OS code directly.

– XM read object: read the object, verifying access permissions and other low-level properties. Used in all reading functions like XM receive queuing message, XM read sampling message, etc.

– XM write object: write the object. This is used e.g. in XM write sampling/queuing message, XM send queuing message.

– XM ctrl object: used to create and manage objects with specific properties as well as to query these objects (e.g. retrieve the id of the object). This hypercall is used in object management functions like XM create sampling/queuing port, XM get sampling/queuing port status, etc.

While the overall hypercall set is a bit more elaborate than listed here, the essential calls used to implement the OSEK guest-OS are listed, showing how small such a guest-OS interface can actually be constructed if the abstraction level is pulled down far enough. A full description of the interface is out of scope for this paper though.

4 FreeOSEK

FreeOSEK [2] is an OSEK implementation started by Mariano Cerdeiro. It originally ran on ARM and on POSIX compliant platforms (this is just a simulation environment, running FreeOSEK as a user-space process, intended to allow everyone to test it on a normal Linux desktop).

FreeOSEK is licensed under the GPLv3 with link exception. This means that you can link your code into FreeOSEK and can still license your code under whatever license you want (free or proprietary).

According to the FreeOSEK homepage, they currently run about 80% of the OSEK OS conformance tests, and of those about 95% pass. In addition, FreeOSEK is tested using the static code checking tool splint.

Fortunately, big parts of FreeOSEK are generic C code (e.g. the task scheduler) and only the parts that directly deal with hardware had to be adapted (see section 5 for details).

While OSEK OS is almost complete, OSEK COM is more or less non-existent in FreeOSEK, but this is no big problem for us, as we will see later in section 5.4, since most of the functionality needed for OSEK COM compliant communication is already provided by XtratuM.

5 Porting Efforts

The following section describes the efforts that have to be taken to run FreeOSEK as a run-time environment in a XtratuM partition.

It also includes a description of which steps have already been achieved successfully, and gives insight into the parts that will need more work. To anticipate the most important thing first: as of this writing, FreeOSEK can be used as an XtratuM run-time environment, but more work will be needed to make a fully compliant version possible; most notably, in the task management and communication subsystems some (re)work will be necessary.

5.1 Adaptation of the Build System

The first step to running FreeOSEK inside of an XtratuM partition was to adapt FreeOSEK's build system, so that the resulting binary would be accepted by XtratuM. The most important point here is that FreeOSEK must not be compiled as an executable binary; instead it has to be compiled as a relocatable object that can be linked into an XtratuM partition - if necessary even into multiple partitions - at a memory address that is specified at configuration time in the XtratuM configuration file.
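As a rough illustration of what such a relocatable partition image boils down to, the fragment below sketches a partition entry point that does nothing but start the OSEK system. The entry symbol name PartitionMain follows the usual libxm example convention and AppMode1 matches the OIL configuration shown in section 5.3; both names, as well as the header name, are assumptions and not taken from the FreeOSEK sources.

#include "os.h"   /* OSEK OS API as provided by FreeOSEK (assumed header name) */

/* Entry point of the partition image; XtratuM loads the linked object at the
 * address given in its configuration file and schedules the partition
 * according to the static cyclic plan. */
void PartitionMain(void)
{
    /* StartOS() never returns; it activates the autostart tasks and alarms
     * of the given application mode. */
    StartOS(AppMode1);
}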

After this stage it is already possible to boot into FreeOSEK and to put some xprintf's into the init code (xprintf is a library function of libxm wrapping XM write console, giving the application programmer a way to use formatted printing). Since most of the initialization code is generic (e.g. loading the data of the application's tasks), this is already done without any changes to the FreeOSEK code base. The next point that really needed attention was the x86 specific code for the task switches.

5.2 Task Management

In order to assure flawless scheduling of tasks, it has to be assured that for each possible point of rescheduling the transition from the old to the new task is done properly.

Which actions have to be performed during dispatching depends on the event that led to the rescheduling - that is, on the point of rescheduling itself.

OSEK OS lists the following 4 points of rescheduling for non-preemptive scheduling:

• Task Termination

• explicit activation of a successor task

• explicit call of the scheduler

• a transition into a waiting state takes place

Let's have a quick look at those four points of rescheduling. The first two can be handled really easily: for those two, the task context of the old task does not have to be saved, since it terminates before the new task is scheduled. Therefore, all that was needed to get a basic version of FreeOSEK running on XtratuM was to set the stack pointer to the stack of the new task and jump into the task itself (a minimal sketch of this dispatch follows at the end of this subsection). This way, simple examples that activate non-preemptive tasks and chain non-preemptive tasks can already be run.

If preemptive scheduling is desired, the following extended list of points of rescheduling has to be considered:

• Task Termination

• explicit activation of a successor task

• activation of a task at task level

• explicit call of the scheduler

• a transition into a waiting state takes place

• setting a task to a waiting state

• release of a resource at the task level

• return from interrupt level to task level

In order to allow preemption of tasks (either voluntarily by going into waiting states or involuntarily by hitting one of the points of rescheduling from the above list), the context has to be saved before and restored after rescheduling. This part of the task management is not clean yet and will need some rework before it can be considered done. For a proof of concept as required by the OVERSEE project, other parts of OSEK are more important and will therefore be handled before finishing up task management.
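The following is a minimal sketch of the simple dispatch described above for the non-preemptive cases, where the outgoing task has terminated and only the new task's stack has to be installed before jumping to its entry point. The structure names and the 32-bit x86 inline assembly are illustrative and do not reflect the actual FreeOSEK internals.

/* Illustrative only - not the actual FreeOSEK task control block. */
typedef struct {
    void (*entry)(void);   /* task entry point from the generated OIL tables */
    void *stack_top;       /* initial stack pointer of the task */
} tcb_t;

/* Dispatch after task termination / chaining: the old context is
 * deliberately not saved, since the old task has terminated. */
static void dispatch_after_termination(tcb_t *next)
{
    __asm__ volatile(
        "mov %0, %%esp\n\t"   /* switch to the new task's stack (32-bit x86) */
        "jmp *%1\n\t"         /* jump into the task itself */
        :
        : "r"(next->stack_top), "r"(next->entry));
}

For the preemptive points of rescheduling the same place would additionally have to save and restore the full register context, which is exactly the part that still needs rework.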

5.3 Counters and Alarms

As described above, one way a task can be activated is if an alarm has expired. Each alarm is triggered by exactly one counter.

Counters can be incremented by all kinds of events, but one of the most common ones are timers, in order to allow timed activation of tasks. All that had to be done to allow alarms that wake up tasks was to add an IRQ handler which is triggered by the virtualized XM timer interrupts. Inside of this IRQ handler a counter is incremented, using the OSEK defined IncrementCounter() call (a sketch of such a handler is shown after the configuration example). The virtualized timer is configured in the initialization code of FreeOSEK.

Now one or more alarms can be associated with the counter in the OIL configuration file of the application, to make those alarms go off as soon as the counter has reached a limit.

An example of such a configuration could look like this (only the part that deals with counters and alarms):

COUNTER HardwareCounter {
    MAXALLOWEDVALUE = 100000;
    TICKSPERBASE = 1000;
    MINCYCLE = 1;
    TYPE = HARDWARE;
    COUNTER = HWCOUNTER0;
};

COUNTER SoftwareCounter {
    MAXALLOWEDVALUE = 100000;
    TICKSPERBASE = 100;
    MINCYCLE = 1;
    TYPE = SOFTWARE;
};

ALARM IncrementSWCounter {
    COUNTER = HardwareCounter;
    ACTION = INCREMENT {
        COUNTER = SoftwareCounter;
    };
    AUTOSTART = TRUE {
        APPMODE = AppMode1;
        ALARMTIME = 1;
        CYCLETIME = 1;
    };
};

ALARM ActivateTaskA {
    COUNTER = SoftwareCounter;
    ACTION = ACTIVATETASK {
        TASK = TaskA;
    };
    AUTOSTART = FALSE;
};

The first section describes a hardware counter that is incremented by the ticks from the virtualized XtratuM timer interrupts. This counter is used to increment a software counter via the alarm IncrementSWCounter. If this software counter has an overflow, the ActivateTaskA alarm is triggered, and the OSEK task TaskA goes from state suspended to state ready.
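For completeness, a task activated this way could look as follows; TASK() and TerminateTask() are the standard OSEK OS API implemented by FreeOSEK, while the task body itself is of course hypothetical.

#include "os.h"    /* OSEK OS API (assumed header name) */

TASK(TaskA)
{
    /* application work driven by the SoftwareCounter overflow goes here */

    TerminateTask();   /* point of rescheduling: task termination */
}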
This looks like a waste of resources, but if you want different alarms triggered by the same hardware timer, you have to configure multiple software counters, which then activate the various tasks.
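A sketch of the IRQ handler mentioned above is shown below. IncrementCounter() is the OSEK call named in the text and HardwareCounter is the counter from the OIL example; the ISR() declaration and the way the handler is attached to the virtualized XtratuM timer interrupt are assumptions about the FreeOSEK/XtratuM glue code.

#include "os.h"    /* OSEK OS API (assumed header name) */

/* Entered on every virtualized XM timer interrupt (wiring not shown). */
ISR(XMTimerTick)
{
    /* One tick of the virtual timer advances the hardware counter; the
     * alarms attached to it (IncrementSWCounter, ...) are then evaluated
     * by the kernel. */
    IncrementCounter(HardwareCounter);
}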
5.4 Interpartition Communication

Communication in OSEK is defined in [5], which states the main goals of the specification as follows:

"It is the aim of the OSEK COM specification to support the portability, re-usability and interoperability of application software. The API hides the differences between internal and external communication as well as different communication protocols, bus systems and networks." [OSEK Communication Specification 3.0.3, 1.1 Requirements]

From this paragraph we can already deduce that connecting FreeOSEK to the message passing interpartition communication system provided by XtratuM conforms to the specification. More importantly, the latter part - stating that communication protocols as well as communication media should be transparent to the application - implies that it has to be possible to run a legacy OSEK compliant application that uses OSEK COM. Specifically, it can run in a FreeOSEK run-time environment communicating via the XtratuM interpartition communication system instead of, let's say, a CAN bus, without even knowing it and without the need of changing a single line of application code.

One further thing we can take into account is the ARINC 653 compliant interpartition communication system provided by XtratuM, and the resemblance of the OSEK COM system to the ARINC 653 interpartition communication system. This similarity in communication mechanisms leads to a huge simplification of the external communication, which can be done by means of wrapper functions for the XtratuM hypercalls that configure the XtratuM (ARINC 653) ports to behave the way expected by FreeOSEK and allow an OSEK COM compliant interface (a sketch of such a wrapper is shown below). Things like FIFO buffers for queueing messages do not have to be implemented, since they are already implemented in the XtratuM core.
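To illustrate the wrapper idea, the sketch below maps an OSEK COM style send operation onto an XtratuM sampling port. The hypercall prototypes, the port name and the simplified wrapper signature are assumptions for illustration only; the real interfaces are documented in [5] and in the XtratuM reference manual [11].

/* Prototypes assumed for illustration - see the XtratuM reference manual [11]. */
extern int XM_create_sampling_port(const char *name, unsigned max_size, unsigned direction);
extern int XM_write_sampling_message(int port_desc, const void *msg, unsigned size);
#define XM_SOURCE_PORT 0              /* placeholder for the direction flag */

static int speed_port = -1;

/* OSEK COM style send of a "vehicle speed" message, going to a XtratuM
 * sampling port instead of, say, a CAN controller.  In a real port the
 * port would be created once during communication start-up. */
int SendMessage_VehicleSpeed(const void *data, unsigned len)
{
    if (speed_port < 0)
        speed_port = XM_create_sampling_port("vehicleSpeed", 64, XM_SOURCE_PORT);
    if (speed_port < 0)
        return -1;                    /* port not present in the XM configuration */
    return XM_write_sampling_message(speed_port, data, len);
}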
6 Conclusion

Even if the FreeOSEK run-time environment is far from being perfect, the implementation shows the feasibility of running an OSEK compliant operating system as a XtratuM run-time environment, fulfilling one of OVERSEE's main missions - the reuse of existing automotive applications in a security enhanced environment with minimum effort.

Furthermore, the theoretical mapping between OSEK compliant communication and ARINC 653 compliant communication could be proven valid and compatible, to the point where a legacy OSEK compliant application can be moved into a XtratuM run-time environment with a virtualized communication system replacing a legacy physical communication system, without the need of adapting the application itself.

The next steps in the port of FreeOSEK to XtratuM will be the cleanup of the context switch, in order to allow fully preemptive task scheduling - this is also the prerequisite for most of the MODISTARC [6] tests, which are already implemented in FreeOSEK and help to show compliance with the OSEK/VDX specifications - and finally the integration of the FreeOSEK port into the overall OVERSEE architecture Proof-of-Concept framework.

7 Acknowledgments

This paper has been produced in the context of the OVERSEE project (FP7-ICT-2009-4, Project ID 248333).

References

[1] OVERSEE, 2010, http://oversee-project.com

[2] FreeOSEK, http://opensek.sourceforge.net

[3] ARINC 653 - Avionics Application Software Standard Interface, Airlines Electronic Engineering Committee, October 2003

[4] OSEK Operating System Specification 2.2.3, 2005, OSEK/VDX Consortium

[5] OSEK/VDX Communication Version 3.0.3, 2004, OSEK/VDX Consortium

[6] Methods and Tools for the Validation of OSEK/VDX based Distributed Architectures, 1999, OSEK/VDX Consortium

[7] AUTOSAR - Automotive Open System Architecture, http://autosar.org/

[8] OSEK/VDX Consortium, OSEK/VDX Homepage, http://osek-vdx.org/

[9] M. Masmano et al.: LithOS: an ARINC-653 guest operating system for XtratuM, 2010

[10] John Rushby: Partitioning in Avionics Architectures: Requirements, Mechanisms, and Assurance, 1999

[11] Miguel Masmano, Ismael Ripoll, Alfons Crespo, Patricia Balbastre: XtratuM Hypervisor for INTEL x86 - Volume 4: Reference Manual, June 2011
