Proceedings 2011
13th Real Time Linux Workshop
Organizing committee
Local staff at Czech Technical University Prague
Michal Sojka (Faculty of Electrical Engineering, Department of Control Engineering)
Pavel Píša (Faculty of Electrical Engineering, Department of Control Engineering)
Petr Hodač (SUT - Středisko un*xových technologií, UN*X Technologies Center)
Real-Time Linux Foundation Working Group and OSADL
Prof. Nicholas Mc Guire (Lanzhou University, China)
Andreas Platschek (OpenTech, Austria)
Dr. Carsten Emde (OSADL, Germany)
Program committee
Roberto Bucher, RTAI Maintainer, Switzerland
Alfons Crespo, University Valencia, Spain
Carsten Emde, OSADL, Germany
Oliver Fendt, Siemens, Corporate Technology, Germany
Gerhard Fohler, Technische Universität Kaiserslautern, Germany
Thomas Gleixner, Linutronix, Germany
Nicholas Mc Guire, Lanzhou University, China
Hermann Härtig, TU Dresden, Germany
Zdeněk Hanzálek, Czech Technical University, Prague
Paul E. McKenney, IBM
Jan Kiszka, Siemens, Germany
Miguel Masmano, Universidad Politecnica de Valencia, Spain
Odhiambo Okech, University of Nairobi, Kenya
Pavel Píša, Czech Technical University, Prague
Andreas Platschek, OpenTech, Austria
Zhou Qingguo, Lanzhou University, China
Ismael Ripoll, University Valencia, Spain
Georg Schiesser, OpenTech, Austria
Stefan Schönegger, Bernecker + Rainer, Austria
Michal Sojka, Czech Technical University, Prague
Martin Terbuc, University of Maribor, Slovenia
Mathias Weber, Roche Diagnostics Ltd., Switzerland
Bernhard Zagar, Johannes Kepler University, Austria
Peter Zijlstra, Red Hat, Netherlands
Prague 2011
Title: Proceedings of the 13th Real Time Linux Workshop
ISBN: 978-3-0003-6193-7
Preface
After several Real-Time Linux Workshops in Europe (Vienna 1999, Milan 2001, Valencia 2003, Lille 2005,
Linz 2007, Dresden 2009), in America (Orlando 2000, Boston 2002, Guadalajara 2008) and in Asia (Singapore
2004, Lanzhou 2006), and after reaching Africa for the first time in 2010 (Nairobi), the Thirteenth Real-Time
Linux Workshop comes to Prague, Czech Republic, this year.
The event is still driven by a simple goal: to bring together developers and users, present new
developments, discuss ‘real’ user demand, get to know those anonymous people who exist only as e-mail
folders in your mailing-list archive and, last but not least, encourage the spirit of a community.
Free Libre Open-Source Software (FLOSS) is a fast-growing technology pool, and we can observe this well
in the breadth of development presented at this year's Real-Time Linux Workshop. FLOSS has not only
reached traditional automation and control, it is increasingly reaching into technical areas that were
almost unthinkable for "non-commercial" entities: safety-critical systems. This development is underpinned
by advances in FLOSS tools for formal and semi-formal verification. In other words, FLOSS covers the
entire area from educational material, traditional automation and control, and robotics to the aerospace
and automotive industries. While no single workshop can ever claim to cover it all, we do hope to have
collected a representative snapshot of this sprawling community.
Thank you very much for attending the Real Time Linux Workshop. We hope that your expectations
are met during this workshop, whether you attend as a developer, as a user or as a newcomer to real-time Linux.
Acknowledgements
No Real Time Linux community, no Real Time Linux users, no Real Time Linux Workshop. Therefore, our
thanks go to the Real Time Linux community for the work done in Open Source software development as an
international cooperation.
All authors and attendees, thanks a lot for your contribution in any respect.
In particular, we want to express our thanks to the sponsors of the 13th Real-Time Linux Workshop:
Last but not least, thanks to everybody having contributed to this workshop and not explicitly mentioned
above.
Contents
Performance Evaluation and Enhancement of Real-Time Linux
  Real-Time Performance of L4Linux
  Adam Lackorzynski, Janis Danisevskis, Jan Nordholz and Michael Peter
  Tiny Linux Project: Section Garbage Collection Patchset
  Sheng Yong, Wu Zhangjin and Zhou Qingguo
  Performance Evaluation of openPOWERLINK
  Yang Minqiang, Li Xuyuan, Nicholas Mc Guire and Zhou Qingguo
  Improving Responsiveness for Virtualized Networking Under Intensive Computing Workloads
  Tommaso Cucinotta, Fabio Checconi and Dhaval Giani
  Evaluation of RT-Linux on different hardware platforms for the use in industrial machinery control
  Thomas Gusenleitner and Gerhard Lettner
  openPOWERLINK in Linux Userspace: Implementation and Performance Evaluation of the Real-Time Ethernet Protocol Stack in Linux Userspace
  Wolfgang Wallner and Josef Baumgartner
  Timing Analysis of a Linux-Based CAN-to-CAN Gateway
  Michal Sojka, Pavel Píša, Ondřej Špinka, Oliver Hartkopp and Zdeněk Hanzálek
  Evaluation of embedded virtualization on real-time Linux for industrial control system
  Sanjay Ghosh and Pradyumna Sampath
Real-Time Linux Concepts
  Turning Krieger's MCS Lock into a Send Queue or, a Case for Reusing Clever, Mostly Lock-Free Code in a Different Area
  Benjamin Engel and Marcus Völp
  DYNAMIC MEMORY ALLOCATION ON REAL-TIME LINUX
  Jianping Shen, Michael Hamal and Sven Ganzenmüller
  pW/CS - Probabilistic Write / Copy-Select (Locks)
  Nicholas Mc Guire
  On the implementation of real-time slot-based task-splitting scheduling algorithms for multiprocessor systems
  Paulo Baltarejo Sousa, Konstantinos Bletsas, Eduardo Tovar and Björn Andersson
  Experience with Sporadic Server Scheduling in Linux: Theory vs. Practice
  Mark J. Stanovich, Theodore P. Baker and An-I Andy Wang
  How to cope with the negative impact of a processor's energy-saving features on real-time capabilities?
  Carsten Emde
FLOSS in Safety Critical Systems
  On integration of open-source tools for system validation, example with the TASTE tool-chain
  Julien Delange and Maxime Perrotin
  Safety logic on top of complex hardware software systems utilizing dynamic data types.
  Nicholas McGuire
  POK, an ARINC653-compliant operating system released under the BSD license
  Julien Delange and Laurent Lec
  D-Case Editor: A Typed Assurance Case Editor
  Yutaka Matsuno
  A FLOSS library for the safety domain
  Peter Krebs, Andreas Platschek and Hans Tschürtz
  Open Proof for Railway Safety Software
  Klaus-Rüdiger Hase
  Migrating an OSEK run-time environment to the OVERSEE platform
  Andreas Platschek and Georg Schiesser
Real-Time Linux in Education
Alberto Guiggiani
Università di Firenze, Dipartimento di Sistemi e Informatica
Via di S. Marta 3, Florence, Italy
[email protected]
Michele Basso
Università di Firenze, Dipartimento di Sistemi e Informatica
Via di S. Marta 3, Florence, Italy
[email protected]
Massimo Vassalli
National Research Council of Italy, Institute of Biophysics
Via De Marini 6, Genoa, Italy
[email protected]
Francesco Difato
Italian Institute of Technology (IIT), Dept. of Neuroscience and Brain Technologies
Via Morego 30, Genoa, Italy
[email protected]
Abstract
In this paper we present the Realtime Suite, a project that aims to provide all the tools and guides
needed to set up a Linux-RTAI real-time machine within a CACSD environment. It is aimed in particular
at researchers and students approaching the world of real-time control applications for the first time,
and at people searching for an open-source alternative to commercial solutions.
Realtime Suite: a step-by-step introduction to the world of real-time signal acquisition and conditioning.
tively. The next step was to edit the source code with the objective of avoiding conflicts and compile
errors. Lastly, we packed everything into the so-called Realtime Suite, alongside documentation with
simple step-by-step instructions and examples. In addition, we configured the suite on a Virtual Machine
(VM) that works out of the box, which is useful for testing purposes.

In order to verify the efficacy of the proposed approach in developing an actual real-time application,
a Realtime Suite based system has been installed [9] to realize a feedback control for nanometer-precision
specimen tracking in an optical tweezers system with piezoelectric actuators.

The paper is organized as follows. Section 2 presents an overview of the software included in the
Realtime Suite package. Section 3 covers the remaining two components of the Realtime Suite: a
step-by-step tutorial and the preconfigured virtual machine. Section 4 focuses on the results of
performance tests that evaluate achievable sampling rates and jitter, while the last section presents an
application in which a Realtime Suite based machine has been used to control specimen position in an
optical tweezers setup.

the various components of the real-time machine is shown in figure 1.

2.1 RTAI and Comedi

RTAI [2] (Real Time Application Interface) is a Linux extension that allows the execution of tasks with
strict temporal constraints, enabling hard real-time (HRT) control algorithm implementations. Development
started back in 1999 with the work of Paolo Mantegazza at Politecnico di Milano. The Suite includes RTAI
version 3.8.1 alongside Linux kernel 2.6.32 patched from LinuxCNC [10].

Comedi [3] is a set of drivers provided as Linux kernel extensions that enable communication with a broad
range of commercial data acquisition boards. A collection of libraries is provided, with APIs that allow
the real-time target to interface with the device.

2.2 Scicoslab and RTAI-Lib
chine, semaphores for task synchronization, and input/output blocks, like signal generators, scopes and
meters. The RTAI-Lib palette blocks are shown in figure 2.

2.3 QRTAILab

QRTAILab [7] is an RTAI-Linux graphical user interface for managing real-time targets running on the
same machine. It creates a virtual oscilloscope to monitor signals and allows online modification of the
target parameters. It was developed starting from the source code of xrtailab, a program that is part of
the RTAI-Lab package, using the Qt libraries. Compared with xrtailab, QRTAILab is much lighter [12]
CPU-wise when connecting to complex targets.

2.4 Remote interface: RTAI-XML

One of the major concerns that becomes evident when designing real-time control system architectures is
the intrinsic duality between hard real-time (HRT) and soft real-time (SRT) components. While the former
require the programmer to focus on timing constraints, latencies and sampling rates, SRT components like
human-machine interfaces (HMI) require flexibility, user-friendliness and efficient data handling. In order
to separate those two worlds, a web services approach can be taken. Web services [13] allow two pieces of
software to communicate through a network using a standard object access protocol. Only the
communication language is shared, leaving freedom of implementation.

RTAI-XML [8][14] brings a web services approach to the world of Linux real-time control applications.
The Realtime Suite includes the RTAI-XML server component, which is compiled on the RT machine in the
last steps of the tutorial. This server component acts as an intermediary between the real-time targets
and a remote procedure call framework. Using XML, it bridges the target signals and parameters over the
network, making them accessible from remote clients like jRTAILab [15], a Java implementation of
xrtailab, or any other application-oriented client such as the one presented in Section 5.

3 Setup Tutorial and Realtime Suite VM

The Realtime Suite project was born with the objective of guiding less experienced people into the world
of Linux real-time applications. To achieve this, the software package described in the previous section
comes with a step-by-step tutorial that accompanies the user through all the phases needed to build a
working real-time control machine, from the initial system setup to the compilation and execution of the
first real-time target. Most of the steps involve running console commands to compile and install the
various components, but everything is explained and is feasible for users without advanced programming
skills. The tutorial is a revised version of the one proposed by Bucher, Mannori and Netter [16].

In addition to the software package and its related tutorial, a virtual machine has been released. It is
based on Ubuntu Linux 10.04 and includes all the software of the Realtime Suite, compiled and ready to
use. It is available through the RTAI-XML project website [8] in the Open Virtual Machine format (.ova)
and can be executed on a wide range of host machines with the open-source virtualization software
VirtualBox [17]. Due to the limitations of a virtualized system, it cannot replace a physical RTAI machine
in actual signal acquisition and conditioning, but it can be a handy tool in the design phase of Scicos
control algorithms or when testing remote RTAI-XML clients.

4 Performance Tests

This section shows the results of two kinds of experimental tests performed on a machine configured with
the Realtime Suite. The machine was a commercial personal computer with the following specifications:
CPU: Intel Core2Duo 6300 @ 1.86 GHz
System RAM: 1 GB
Video card: ATI Radeon HD 4350
HDD: WD Caviar Blue SATA @ 7200 RPM
DAQ board: National Instruments NI-PCI 6229

4.1 Jitter

The first test evaluates the precision of the sampling tick with respect to sampling rate and task
priority. To measure it, two separate real-time targets run concurrently: the first with high priority and
a high sampling rate (2.5 kHz), the second with low priority and a low sampling rate (125 Hz). Each target
records its effective sampling timestamps and compares them with the expected timestamps, calculated by
adding the task period to a counter. The absolute value of their difference is the instantaneous jitter.
The two tasks are kept running until the maximum jitter reaches a stable value. Results are shown
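The jitter computation described in Section 4.1 can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the authors' code: the names jitter_stats, jitter_init and jitter_sample are hypothetical, and timestamps are assumed to be integer nanoseconds, e.g. as read with clock_gettime(CLOCK_MONOTONIC) in a POSIX task or with the corresponding RTAI timer call inside an RTAI target.

```c
#include <assert.h>
#include <stdint.h>

/* Jitter bookkeeping for one periodic task; all times in nanoseconds
   (hypothetical helper, not part of RTAI). */
typedef struct {
  int64_t expected;   /* expected timestamp of the next sample */
  int64_t period;     /* task period */
  int64_t max_jitter; /* largest instantaneous jitter seen so far */
} jitter_stats;

static void jitter_init(jitter_stats *s, int64_t start, int64_t period)
{
  assert(period > 0);
  s->expected = start;
  s->period = period;
  s->max_jitter = 0;
}

/* Record one effective sampling timestamp: returns the instantaneous jitter
   |actual - expected| and advances the expected timestamp by one period. */
static int64_t jitter_sample(jitter_stats *s, int64_t actual)
{
  int64_t diff = actual - s->expected;
  int64_t jitter = diff < 0 ? -diff : diff;
  if (jitter > s->max_jitter)
    s->max_jitter = jitter;
  s->expected += s->period; /* counter-based expectation, as in Section 4.1 */
  return jitter;
}
```

For the 2.5 kHz target the period is 400 000 ns, for the 125 Hz target 8 000 000 ns. Accumulating the expected timestamp from the start time, rather than from the last actual timestamp, keeps one late sample from shifting all later expectations.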
Figure 5 shows the section of the C# interface that provides a graphical front-end for editing parameters
on the fly, e.g. the proportional and integral gains of the control algorithm running on the real-time
machine.

This application shows the functionalities of a control machine built with the Realtime Suite. We were
able to acquire multiple analog channels (nine in this application) sampled with a bandwidth of 2 kHz,
and to condition them with custom control algorithms, using limited hardware (a commercial PC and a DAQ
board). In addition, thanks to RTAI-XML, we could separate the implementation of the soft real-time HMI
components from the business logic components.

With the Realtime Suite, we have configured a ready-to-use software package that is useful for
researchers and users approaching real-time applications for the first time. Everything is based on
open-source projects supported by active developers and communities. For the cost of a personal computer
with a supported data acquisition board, and a few hours of work, it is possible to build a real-time
machine capable of running custom control targets and sampling signals with a bandwidth of a few kHz. The
inclusion of the RTAI-XML project adds flexibility in control architecture design: it allows separating
the HRT components (signal acquisition, control algorithms, ...) from the SRT components (user interface,
data manipulation), so that appropriate strategies for interfacing these two distinct worlds can be
developed.

[7] QRTAILab, a user interface for RTAI, Online, http://qrtailab.sourceforge.net
[8] RTAI-XML, Online, http://www.rtaixml.net
[9] A. Guiggiani, B. Torre, A. Contestabile, F. Benfenati, M. Basso, M. Vassalli, F. Difato, Long-range and long-term interferometric tracking by static and dynamic force-clamp optical tweezers, Optics Express, in print 2011
[10] Linux CNC, Online, http://www.linuxcnc.org
[11] Scicos, Online, http://www.scicos.org
[13] W3C: Web Services, Online, http://www.w3.org/2002/ws/
[14] M. Basso, R. Bucher, M. Romagnoli and M. Vassalli, Real-Time Control with Linux: A Web Services Approach, In Proc. 44th IEEE Conference on Decision and Control - European Control Conference, Seville (Spain), pp. 2733-2738, 12–15 Dec., 2005
[15] jRTAILab, a client for RTAI-XML, Online, http://www.rtaixml.net/client-applications/jrtailab
[16] R. Bucher, S. Mannori, T. Netter, RTAI-Lab tutorial: Scicoslab, Comedi, and real-time control, 2010
[17] VirtualBox, Online, http://www.rtaixml.net
[18] A. Ashkin, J. Dziedzic, J. Bjorkholm, and S. Chu, Observation of a single-beam gradient force optical trap for dielectric particles, Optics Letters, 11: 288-290, 1986
[19] K. Svoboda and S. Block, Force and velocity measured for single kinesin molecules, Cell, 77(5): 773-784, 1994
Klaus Weichinger
BIOE Open Hardware Automation System Developer
3300 Greinsfurth, Austria
[email protected] - http://bioe.sourceforge.net
Abstract
Today, nonlinear model-based control methods are an essential part of many control applications.
To provide a complete open framework for educational purposes, this contribution extends common
open-source software (Scilab/Scicos, Maxima and the rt-preempt Linux real-time system) with a low-cost
do-it-yourself open hardware interface and a web-based monitoring system embedded into Scicos blocks.
The simple concept of the open hardware interface called Bioe (Basic Input Output Elements) allows
the real-time application to interact with general analog and digital signals as well as with more complex
devices (e.g. resistive touch panels, RC servos, I2C acceleration sensors). Furthermore, a prototype of a
web-based monitoring system using Ajax is presented. It consists of an HTTP web server embedded into a
Scicos block, so that existing Scicos code generation packages for rt-preempt Linux can be used without
modifications.
To demonstrate the applicability and usability of the proposed framework, a nonlinear model-based
control law for a mechatronic multi-input multi-output system is derived with the concept of input/output
linearization and realized with the proposed open framework.
A Nonlinear Model-Based Control realized with an Open Framework for Educational Purposes
input multiple-output (MIMO) mechatronic system consists of the well-known mass-spring system with
viscous friction, actuated by a double-acting hydraulic piston (DAP). Besides the position control of the
mass, the sum pressure of the DAP has to be stabilized at a constant value. The method of input/output
exact linearization [11, 12] is used to derive the control law, and a hardware-in-the-loop (HIL)
simulation is used to test the control law and to demonstrate the usage of the Bioe system and the
web-based monitoring system.

Usability and reliability of an open RCP framework are basic requirements for using open hardware and
software for educational purposes, which in turn is a very important precondition for introducing open
RCP frameworks into industrial applications.

the functionality to deal with different types of signals and to provide the information via the Bioe bus
through 16 end-points (EP0, EP1, ..., EP15). Each end-point consists of a 16-bit receive register and a
16-bit transmit register ($RX_{adr,ep}$ and $TX_{adr,ep}$, with the Bioe address $adr \in \{0, \dots, 15\}$
and the end-point number $ep \in \{0, \dots, 15\}$). These end-points are the common interface between the
real-time application and the physical signal. The end-points are accessed with transactions. During a
transaction, a 16-bit value is written from the PC to the register $RX_{adr,ep}$ and, simultaneously, the
value of the Bioe module register $TX_{adr,ep}$ is read back to the PC.
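To make the end-point addressing concrete, the following C sketch models the 16 × 16 end-point registers of one Bioe bus in memory, together with a single transaction. It is a hypothetical illustration (bioe_bus and bioe_transaction are not part of the Bioe software): on the real bus the write and the read happen simultaneously over the parallel port, which the model reproduces by storing the new RX value and returning the module's TX value in one call.

```c
#include <assert.h>
#include <stdint.h>

#define BIOE_ADDRESSES 16 /* adr in {0, ..., 15} */
#define BIOE_ENDPOINTS 16 /* ep  in {0, ..., 15} */

/* Hypothetical in-memory model of all end-point registers on one Bioe bus */
typedef struct {
  uint16_t rx[BIOE_ADDRESSES][BIOE_ENDPOINTS]; /* RX_{adr,ep}: written by the PC */
  uint16_t tx[BIOE_ADDRESSES][BIOE_ENDPOINTS]; /* TX_{adr,ep}: provided by the module */
} bioe_bus;

/* One transaction: store `out` in RX[adr][ep] and return TX[adr][ep],
   mirroring the simultaneous write/read of the real bus. */
static uint16_t bioe_transaction(bioe_bus *bus, unsigned adr, unsigned ep, uint16_t out)
{
  assert(adr < BIOE_ADDRESSES && ep < BIOE_ENDPOINTS);
  uint16_t in = bus->tx[adr][ep];
  bus->rx[adr][ep] = out;
  return in;
}
```

How a module maps its end-points to physical signals depends on its interface type (DTN, Table 1) and is not modeled here.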
are realized for Bioe, and each interface type can be activated with a unique DTN. After the
initialization of the interface, the corresponding interface function is processed and the information is
converted and exchanged between the end-points and the hardware interface, as illustrated in figure 2.
Table 1 gives an overview of the provided interfaces:

DTN   Interface type description
001   16 digital outputs
002   16 digital inputs
003   8 digital inputs, 8 digital outputs
050   two 10-bit PWM, four 10-bit ADC
090   square signal generator
091   square signal frequency measurement
100   incremental decoder
101   incremental decoder with HCTL2022
105   16x2 LCD driver, four push buttons
106   ADC, PWM, RC servomotor and incremental decoder
107   UART interface
120   RC5 infrared receiver
127   4-wire resistive touch interface
128   Wii Nunchuk interface (I2C)
129   Wii Remote IR camera interface (I2C)

and connectors, communication speed and the microprocessor performance. For the communication with
transactions a 4-bit parallel bus is used, and each transaction contains the device address, the
end-point number, the register values and a very simple checksum to detect transmission errors.

PIN   Name   Description
1     Vcc    +5 V supply
2     GND    ground supply
3     CS     Chip-Select (by Master)
4     CLK    Clock (by Master)
5     DO0    Bit 0 Master→Slave
6     DO1    Bit 1 Master→Slave
7     DO2    Bit 2 Master→Slave
8     DO3    Bit 3 Master→Slave
9     DI0    Bit 0 Slave→Master
10    DI1    Bit 1 Slave→Master
11    DI2    Bit 2 Slave→Master
12    DI3    Bit 3 Slave→Master
13    NC     not in use
14    NC     not in use
TABLE 2: Description of the 14-pole Bioe bus flat cable

Performance measurements with different PCs show a transaction time $T_{trans}$ between 25 µs and 50 µs
(the time during which CS is high; see figure 3). This parameter depends on the PC's hardware, especially
the type of parallel port (parallel port directly on the motherboard, a PCIe extension module, ...).
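The paper states only that each transaction carries the device address, the end-point number, the register values and a very simple checksum over the 4-bit bus. The C sketch below assumes, purely for illustration, one nibble per clock cycle on DO0..DO3 in the order address, end-point, four data nibbles (most significant first), followed by a checksum equal to the sum of the preceding nibbles modulo 16. The real Bioe frame layout and checksum may differ, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed frame layout (illustrative only):
   [0] device address, [1] end-point number, [2..5] 16-bit value MSB first,
   [6] checksum = sum of nibbles 0..5 modulo 16 */
#define FRAME_NIBBLES 7

static uint8_t frame_checksum(const uint8_t frame[FRAME_NIBBLES])
{
  unsigned sum = 0;
  for (int i = 0; i < FRAME_NIBBLES - 1; i++)
    sum += frame[i];
  return (uint8_t)(sum & 0xF);
}

static void frame_pack(uint8_t frame[FRAME_NIBBLES], unsigned adr, unsigned ep, uint16_t value)
{
  assert(adr < 16 && ep < 16);
  frame[0] = (uint8_t)adr;
  frame[1] = (uint8_t)ep;
  for (int i = 0; i < 4; i++) /* split the register value into nibbles, MSB first */
    frame[2 + i] = (uint8_t)((value >> (12 - 4 * i)) & 0xF);
  frame[FRAME_NIBBLES - 1] = frame_checksum(frame);
}

/* Returns 1 if the frame is consistent, 0 if a transmission error was detected */
static int frame_ok(const uint8_t frame[FRAME_NIBBLES])
{
  return frame_checksum(frame) == frame[FRAME_NIBBLES - 1];
}

/* Reassemble the 16-bit register value from the four data nibbles */
static uint16_t frame_value(const uint8_t frame[FRAME_NIBBLES])
{
  return (uint16_t)((frame[2] << 12) | (frame[3] << 8) | (frame[4] << 4) | frame[5]);
}
```

A modulo-16 sum of this kind detects any single corrupted nibble, but not, for example, two errors that cancel each other out. Whatever the exact layout, the measured transaction time bounds the I/O per control cycle: if transactions are issued one after another, a 2 ms cycle leaves room for at most 2000 µs / 50 µs = 40 transactions on a slow parallel port, or 80 at 25 µs, before any computation time is accounted for.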
to implement an HTTP server within the real-time application, to start the server as a non-real-time
thread and to exchange signals and parameters between the real-time application and the HTTP server (see
section 3.1). A REST-style architecture [14] is used for Ajax-based communication, in which the signals
are exchanged as an XML-formatted byte stream. A web application or another kind of application uses this
interface to establish a connection with the real-time application in order to scope signals and to
modify parameters. For the client-side communication and a basic web-based monitoring system, see
section 3.2.

FIGURE 7: Simplified Structure Diagram of the Rtxmlserver (diagram labels: Control-Thread, BIOE bus, Real-Time, Rtxmlserver, Create Thread, FIFO based Signal Manager, Server-Thread, Non-Real-Time, Data Process, RTXML-Request Handler, XML Stream, HTTP Server, Record, File-System, HTTP-Request)

This thread handles the data exchange between a remote client monitoring system and the real-time task.
For the communication between the real-time thread and the server thread, an object called
RTSignalManager was implemented. The Scicos signal blocks for the Rtxmlserver use this manager to
register their signals and parameters, and during operation each signal value is passed with its
time-stamp via thread-safe FIFOs to the Rtxmlserver. At the moment the Rtxmlserver provides access via
HTTP requests (a reduced HTTP server is implemented according to RFC 2616 [15]; TCP and UDP servers are
planned).

```c
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

static pthread_t thrd; /* server thread, created from the Scicos block (e.g. with pthread_create) */

static void *server_task(void *p)
{
  struct sched_param param;
  (void)p; /* unused thread argument */
  param.sched_priority = 10; /* fixed SCHED_FIFO priority 10 for the server thread */
  /* On Linux, pid 0 refers to the calling thread */
  if (sched_setscheduler(0, SCHED_FIFO, &param) == -1)
  {
    exit(-1);
  }
  // ... process the server
  return NULL;
}
```

LISTING 1: Schematic structure of the Rtxmlserver Scicos block

The HTTP server supports GET requests to access the file system and to load a web page. In addition, a
PUT request with the URI /rtxml.pipe passes the RTXML-Request within the HTTP request message body to the
RTXML-Request-Handler. The handler parses the XML byte stream and performs the corresponding actions,
similar to SAX [16]. The XML-formatted RTXML-Response is sent back to the client within the message body
of the PUT response. A complete description of the RTXML-Requests and RTXML-Responses is beyond the scope
of this contribution, but sub-section 3.3 should give an idea of the structure of the XML-formatted
messages.

The concept of starting a non-real-time thread within a Scicos block (see listing 1) can be applied for
other purposes such as image processing. In this case, acquiring the image from a camera and processing
it can be done within a non-real-time task, and the results are passed to the real-time thread. This
approach reduces the effort of starting the application and of sharing the data via inter-process
communication.

3.2 Client Side Application

The prototype of a client-side monitoring system shown in figure 8 is an Ajax-based web application using
JavaScript and jQuery (see [17]). Ajax (with jQuery) and the HTTP request method PUT are
used to establish and configure a connection with the Rtxmlserver and to exchange the signals and
parameters. This functionality is encapsulated within the JavaScript Rtxmlclient class, which prepares and
stores the data in arrays. These arrays can be used for visualization purposes, e.g. with a JavaScript
plotting library that displays the signals continuously without refreshing the whole web page. The jQuery
UI library [17] was used to improve the look and feel of the visualization and to obtain a behavior
comparable to a native application.

(Diagram labels: Web-Browser, Rtxmlclient JavaScript-Class using Ajax, HTTP-Request, RT-Task, Data Container)

with all sampled information for offline analysis, the Rtxmlserver can record all data points into a local
*.csv file, and this file can be downloaded with the web browser.

FIGURE 9: Screenshot of a web-based monitoring example
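The RTSignalManager passes each signal value with its time-stamp from the real-time thread to the Rtxmlserver through thread-safe FIFOs. Their implementation is not described here; one common choice that never blocks the real-time side is a lock-free single-producer/single-consumer ring buffer, sketched below in C11 under that assumption. The names sample_t, sample_fifo, fifo_push and fifo_pop are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define FIFO_SIZE 256
static_assert((FIFO_SIZE & (FIFO_SIZE - 1)) == 0, "FIFO_SIZE must be a power of two");

/* One time-stamped signal sample */
typedef struct {
  double time;
  double value;
} sample_t;

/* Lock-free single-producer/single-consumer ring buffer:
   the real-time thread only pushes, the server thread only pops. */
typedef struct {
  sample_t buf[FIFO_SIZE];
  atomic_size_t head; /* total number of pushed samples (written by producer) */
  atomic_size_t tail; /* total number of popped samples (written by consumer) */
} sample_fifo;

static void fifo_init(sample_fifo *f)
{
  atomic_init(&f->head, 0);
  atomic_init(&f->tail, 0);
}

/* Real-time side: never blocks; returns 0 and drops the sample if the FIFO is full. */
static int fifo_push(sample_fifo *f, sample_t s)
{
  size_t head = atomic_load_explicit(&f->head, memory_order_relaxed);
  size_t tail = atomic_load_explicit(&f->tail, memory_order_acquire);
  if (head - tail == FIFO_SIZE)
    return 0;
  f->buf[head & (FIFO_SIZE - 1)] = s;
  atomic_store_explicit(&f->head, head + 1, memory_order_release); /* publish the sample */
  return 1;
}

/* Server side: returns 0 if no sample is available. */
static int fifo_pop(sample_fifo *f, sample_t *out)
{
  size_t tail = atomic_load_explicit(&f->tail, memory_order_relaxed);
  size_t head = atomic_load_explicit(&f->head, memory_order_acquire);
  if (head == tail)
    return 0;
  *out = f->buf[tail & (FIFO_SIZE - 1)];
  atomic_store_explicit(&f->tail, tail + 1, memory_order_release); /* free the slot */
  return 1;
}
```

Dropping a sample when the FIFO is full, instead of waiting, keeps the timing of the control thread independent of how quickly the server thread drains the queue.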
to truncation. The HEX-ASCII string leads to a 100% increase in the number of bytes to transmit. In the
future, the use of Base64 coding (RFC 4648, [18]) is planned, because it increases the number of bytes to
send by only about 33%. Furthermore, it should be checked whether the XML parsing can be improved.

```xml
<R>
  <!-- Connection-Number -->
  <C>69</C>
  <!-- Provide Signals and Parameters -->
  <P><I>5</I><N>Param1</N></P>
  <P><I>6</I><N>Param2</N></P>
  <S><I>1</I><N>Input</N></S>
  <S><I>2</I><N>Output</N></S>
  <!-- Signal Samples Hex-Formatted -->
  <!-- time, id1, signal1, id2, signal2, .. -->
  <S><V>AF43.....4B2C</V></S>
  <S><V>AF43.....4B3A</V></S>
</R>
```

$L$ is a geometrical parameter of the system and $L_0$ is the unstressed length of the spring. Here and in
the following, let us assume for simplicity $L = L_0$.

FIGURE 10: Schematic of the hydraulic system (labels: hydraulic actuator, mass m, position x, spring c with length L0, length L, viscous friction, volume flows QA and QB)
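The overhead figures quoted above follow directly from the two encodings: HEX-ASCII emits two characters per byte, Base64 (RFC 4648) four characters per started group of three bytes. The C sketch below encodes a sample buffer both ways so the lengths can be compared; the helper names are hypothetical and not taken from the Rtxmlserver sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HEX-ASCII: two characters per byte; `out` needs 2*n + 1 chars. */
static void hex_encode(const uint8_t *in, size_t n, char *out)
{
  static const char digits[] = "0123456789ABCDEF";
  assert(in != NULL || n == 0);
  for (size_t i = 0; i < n; i++) {
    out[2 * i] = digits[in[i] >> 4];
    out[2 * i + 1] = digits[in[i] & 0xF];
  }
  out[2 * n] = '\0';
}

/* Base64 as in RFC 4648 (standard alphabet, '=' padding);
   `out` needs base64_len(n) + 1 chars. */
static void base64_encode(const uint8_t *in, size_t n, char *out)
{
  static const char tbl[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  size_t o = 0;
  for (size_t i = 0; i < n; i += 3) {
    uint32_t b = (uint32_t)in[i] << 16; /* pack up to three bytes into 24 bits */
    if (i + 1 < n) b |= (uint32_t)in[i + 1] << 8;
    if (i + 2 < n) b |= in[i + 2];
    out[o++] = tbl[(b >> 18) & 0x3F];
    out[o++] = tbl[(b >> 12) & 0x3F];
    out[o++] = (i + 1 < n) ? tbl[(b >> 6) & 0x3F] : '=';
    out[o++] = (i + 2 < n) ? tbl[b & 0x3F] : '=';
  }
  out[o] = '\0';
}

static size_t hex_len(size_t n) { return 2 * n; }                /* +100 % */
static size_t base64_len(size_t n) { return 4 * ((n + 2) / 3); } /* about +33 % */
```

For 300 bytes of samples this gives 600 characters in hex versus 400 in Base64.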
figure 1) was realized with a minimum of effort, and the signals have small offsets. The pressures $p_A$
and $p_B$ are very sensitive to signal offsets, because an incorrectly measured pressure acts like an
external disturbance force, and this disturbs a reduced observer for the velocity $v$. To fix this
problem, the system (2) is extended by an unknown but constant disturbance force $F_d$ with the dynamics
$\dot F_d = 0$.

The following observer design (see, e.g., [12]) treats the general system

$$\dot x = v, \qquad \dot v = \frac{1}{m}\left(-c\,x - d\,v + F + F_d\right), \qquad \dot F_d = 0$$

with the position $x$, the velocity $v$ and the disturbance force $F_d$. To obtain the equations for the
DAP system, $F = A\,p_A - \alpha A\,p_B$ has to be used. The position $x$ and the force $F$ can be
determined by measurements and calculations, so a reduced observer is designed to estimate the velocity
$v$ and the disturbance force $F_d$. In the first step the following state transformation

with $e = [e_v \;\; e_{F_d}]^T$. Nevertheless, the proof of the stability of the observer does not allow
for conclusions concerning the stability of the overall system because of the nonlinear system (2).

4.3 Implementation

To realize the HIL simulation, Ubuntu 10.04 LTS [22] with the rt-preempt patched 2.6.31-11-rt kernel from
the Lucid repository was installed. ScicosLab 4.4.1 [4] with the rt-preempt code generation package [9]
is used for the simulation and real-time code generation. The Bioe and Rtxmlserver blocks were installed
into the ScicosLab directory structure. A performance measurement of a dummy simulation with a cycle time
of $T_s = 2$ ms and a Bioe with DTN250 (see table 1) was performed on an Intel(R) Pentium(R) M 1.4 GHz
notebook. During the two-hour test, a maximum cycle time of $T_{s,max} = 2.08$ ms and a minimum of
$T_{s,min} = 1.928$ ms were detected. Note that the measured time variations include the rt-preempt
patched Linux notebook, the Bioe bus and the Bioe element software.
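A reduced observer for $v$ and $F_d$ can be sketched with the standard reduced-order Luenberger construction; this may differ from the state transformation used in the observer design above, which is not reproduced here. With the measured position $x$ and $z = [v \;\; F_d]^T$, the system reads $\dot x = A_{12} z$, $\dot z = A_{21} x + A_{22} z + B_2 F$ with $A_{12} = [1 \;\; 0]$, $A_{21} = [-c/m \;\; 0]^T$, $A_{22} = \begin{bmatrix} -d/m & 1/m \\ 0 & 0 \end{bmatrix}$ and $B_2 = [1/m \;\; 0]^T$. The estimate $\hat z = \xi + L x$ with $\dot\xi = (A_{22} - L A_{12})\hat z + A_{21} x + B_2 F$ yields the error dynamics $\dot e = (A_{22} - L A_{12})\,e$, independent of $F$. The C sketch below simulates plant and observer with explicit Euler steps, using the parameters listed in Section 4.4 ($m = 1$ kg, $d = 1$ N s/m, $c = 1$ N/m, $A = 10^{-4}$ m², $\alpha = 0.7$). The gains $l_v = 3$, $l_{F_d} = 4$ and all function names are illustrative assumptions; with $m = d = 1$ they give the characteristic polynomial $s^2 + (1 + l_v)s + l_{F_d} = (s+2)^2$.

```c
#include <assert.h>

/* Plant parameters as listed in Section 4.4 */
#define M_KG  1.0    /* mass m */
#define D_NSM 1.0    /* viscous friction d */
#define C_NM  1.0    /* spring stiffness c */
#define A_M2  1.0e-4 /* piston area A */
#define ALPHA 0.7    /* area ratio alpha */

/* Illustrative observer gains L = [LV, LFD]^T (assumption) */
#define LV  3.0
#define LFD 4.0

typedef struct { double x, v, fd; } plant_state;
typedef struct { double xi_v, xi_fd; } observer_state; /* xi = z_hat - L x */

/* Piston force of the DAP from the two chamber pressures: F = A pA - alpha A pB */
static double dap_force(double pA, double pB) { return A_M2 * pA - ALPHA * A_M2 * pB; }

/* One explicit Euler step of the mass-spring system with constant disturbance fd */
static void plant_step(plant_state *p, double F, double dt)
{
  assert(dt > 0.0);
  double acc = (-C_NM * p->x - D_NSM * p->v + F + p->fd) / M_KG;
  p->x += dt * p->v;
  p->v += dt * acc; /* fd is constant: its derivative is zero */
}

/* Recover the estimates z_hat = xi + L x from the measured position */
static void observer_estimate(const observer_state *o, double x, double *v_hat, double *fd_hat)
{
  *v_hat = o->xi_v + LV * x;
  *fd_hat = o->xi_fd + LFD * x;
}

/* xi_dot = (A22 - L A12) z_hat + A21 x + B2 F, using only the measured x and F */
static void observer_step(observer_state *o, double x, double F, double dt)
{
  double v_hat, fd_hat;
  observer_estimate(o, x, &v_hat, &fd_hat);
  double dxi_v = (-D_NSM / M_KG - LV) * v_hat + fd_hat / M_KG - C_NM / M_KG * x + F / M_KG;
  double dxi_fd = -LFD * v_hat;
  o->xi_v += dt * dxi_v;
  o->xi_fd += dt * dxi_fd;
}
```

Call observer_step with the same position sample before each plant_step. Because the error dynamics do not involve $F$, the same gains work for any force input, including one computed from measured pressures with dap_force().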
FIGURE 11: Numerical simulation of the DAP control realized with Scicos ($T_p = 1$ ms, $T_s = 5$ ms) (diagram blocks: Reference Position/Pressure, EALin-Control+Observer, DAP-System, Analog Interface (S/H, Noise, ADC, DAC), Sample-Time, Plot-Output, File-Output)
f(x, u, p) with the system input u and the equations g(x, u, p) for the system output y.

Such a code-generation package is an important component of a complete RCP framework because it avoids the repeated implementation of equations and the search for typing errors, and it makes it easy to keep the analytic calculations and the simulation code synchronized. The created blocks for the DAP system and the control law can be used both for numerical simulations and for real-time code generation with Scicos.

To obtain realistic results, the numerical simulation in figure 11 takes effects such as quantization, signal ranges and noise into account. The numerical results agree very well with the HIL measurements; they are discussed in the following section.

4.4 Results

The following system parameters were used for the numerical and HIL simulations: m = 1 kg, d = 1 Ns/m, c = 1 N/m, A = 1 · 10^-4 m^2, α = 0.7, VA0 = 5 · 10^-6 m^3, […] the parameter values are not realistic because they were chosen so that the adverse influences of noise, quantization, offsets and signal limitations of the low-cost analog interface are reduced. The eigenvalues of the observer dynamic matrix Aobs are placed at −2 with the coefficients kv = −3 and kFd = −4. Finally, the poles of the error dynamics (3) are placed at −30 with α12 = 90, α11 = 2700, α10 = 27000, and the pole of the error dynamics (4) is placed at −10 with α20 = 10.

The results are summarized in figure 12. The […] changed dramatically in order to obtain the step response of the system. In the third plot the volume flows QA into the DAP are compared, and the fourth plot compares the real piston velocity (with noise) with the estimated velocity from the numerical simulation.

[Figure 12 panels (time axis 0 to 5): Piston Position x / m (Reference, Simulation, HIL); Sum-Pressure psum / Pa (Reference, Simulation, HIL); Volume-Flow (Simulation); Velocity v / m/s (Simulation v, Simulation vobs)]
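As a quick check of the pole placement described above, the characteristic polynomials can be expanded. This assumes (the error-dynamics equations (3) and (4) are not part of this excerpt) that α12, α11, α10 and α20 are the coefficients of monic characteristic polynomials of third and first order:

$$
s^3 + \alpha_{12} s^2 + \alpha_{11} s + \alpha_{10} = s^3 + 90\,s^2 + 2700\,s + 27000 = (s + 30)^3,
\qquad
s + \alpha_{20} = s + 10 .
$$

Under that assumption, all three poles of (3) coincide at −30 and the single pole of (4) lies at −10, matching the values stated in the text.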
A Nonlinear Model-Based Control realized with an Open Framework for Educational Purposes
5 Conclusions and Perspectives

The HIL simulation of an industrially motivated control demonstrated that rt-preempt patched Linux kernels, ScicosLab, Maxima, the presented open hardware Bioe and the web-based monitoring system prototype form a complete open rapid control prototyping framework that can be used for educational purposes. Thanks to the web-based approach, no additional software except a modern web browser is required to interact with the real-time system. Furthermore, the web interface can be adapted to the needs of the application by editing the web pages with a simple text editor.

Improvements of the web-based monitoring system and the release of a ready-to-use version with more human interface demos for the community are planned. Another point of interest is to use this framework to realize a distributed parameter system benchmark example and to compare the results with measurements. For this purpose, more complex Bioe interfaces will be used, and 2D/3D visualizations are planned.
[6] FSM Labs, Inc.: RTLinux3.1 Getting Started [20] Kugi, A.: Non-linear Control Based on Physical
with RTLinux, 2001. Models, volume 260 of Lecture Notes in Con-
trol and Information Sciences. Springer-Verlag,
[7] Xenomai: Real-Time Framework for Linux
London, 2001.
Website:
http://www.xenomai.org. Web. 20 Aug. 2011. [21] Murrenhoff, H.: Grundlagen der Fluidtechnik,
[8] Maxima, a Computer Algebra System; Web- Teil 1: Hydraulik. Shaker Verlag, Aachen, 2007.
site: http://maxima.sourceforge.net. Web. 20
Aug. 2011. [22] Ubuntu Linux Distribution Website:
http://www.ubuntu.com. Web. 20 Aug. 2011.
[9] Bucher, R.: rt-preempt Code Generation Pack-
age for ScicosLab [23] Campbell, S.; Chancelier J.; Nikoukhah,
http://www.dti.supsi.ch/b̃ucher. Web. 20 Aug. R.: Modeling and Simulation in Scilab/Scicos.
2011. Springer. 2006.
16
Real-Time Linux Applications
Petr Kacmarik
Department of Radio Engineering K13137, CTU FEE Prague
Technicka 2, 166 27 Praha 6, Czech Republic
[email protected]
Pavel Kovar
Department of Radio Engineering K13137, CTU FEE Prague
Technicka 2, 166 27 Praha 6, Czech Republic
[email protected]
Ondrej Jakubov
Department of Radio Engineering K13137, CTU FEE Prague
Technicka 2, 166 27 Praha 6, Czech Republic
[email protected]
Frantisek Vejrazka
Department of Radio Engineering K13137, CTU FEE Prague
Technicka 2, 166 27 Praha 6, Czech Republic
[email protected]
Abstract
The Witch Navigator (WNav) is an open source GNSS (Global Navigation Satellite System) receiver project whose hardware is implemented as an ExpressCard hosted in a PC running Linux.
Using a PC makes it easy to implement signal processing algorithms, since almost none of the restrictions of a specific embedded platform apply (concerning memory requirements or real data types and their arithmetic). As a consequence, WNav is especially suitable for researchers and students, because the signal processing algorithms can be implemented in a manner similar to high-level simulations. Furthermore, developers can rely on the wide and well-known collection of development tools for Linux on the x86 architecture. Unlike similar projects, WNav is capable of achieving performance comparable to professional GNSS receivers.
The WNav receiver is equipped with two front ends which can process any civil GNSS signals on two frequencies simultaneously. The whole receiver task is distributed between the device driver, a user-space real-time process and other auxiliary processes. The real-time needs are satisfied by the RT_PREEMPT kernel patch.
The paper describes the overall concept of WNav, with a focus on the kernel part (device driver) and the real-time user-space process, provides information about process synchronization and presents the achieved performance.
The first obvious milestone is to develop a fully functional GPS L1 C/A receiver, which validates the selected concept. The achieved results and experience with this legacy signal are presented in the paper as well.
The Witch Navigator A Software GNSS Receiver Built on Real-Time Linux
consequence, the receiver is not capable of direct distance measurement but rather measures the so-called pseudorange ρi, which differs from the true distance di by an unknown bias b. The system of equations then has the following form:

$$\rho_i - b = \sqrt{(x - x_i)^2 + (y - y_i)^2 + (z - z_i)^2} \qquad (2)$$

Since there are 4 unknown parameters (3 position coordinates and the bias b), the receiver needs to perform at least 4 measurements to find the solution.

How is the time measurement obtained from the signal? The receiver generates signals identical to those of the satellites. These locally generated signals are called signal replicas. The signal replicas are kept synchronized with the received signals. The time (delay) information is then carried in the replica parameters needed for their generation, so it is available in the receiver.

Let's consider the following simplified model of the received signal

$$s_i(t) = A\, d(t - \tau_i)\, c_i(t - \tau_i)\, \exp\big(j(\omega_{o,i} t + \varphi_0)\big) + n(t), \qquad (3)$$

where A is the signal amplitude, d(t) represents the navigation message, ci(t) is a pseudorandom code (PRN code) and n(t) is additive noise. The parameter τi represents the signal delay, ωo,i is a frequency offset and ϕ0 an arbitrary carrier phase at t = 0. The corresponding signal replica has the form

$$r_i(t) = c_i(t - \hat{\tau}_i)\, \exp(-j \hat{\omega}_{o,i} t), \qquad (4)$$

where τ̂i is an estimate of the received signal delay. τ̂i is the key parameter needed for forming the pseudorange ρi.

The specific correlation property of ci(t) makes it possible to keep si(t) and ri(t) synchronized, i.e. the estimation error τi − τ̂i is kept small.

In fact, the parameter τi can carry only a fractional part of the pseudorange ρi due to the periodicity of ci(t). Since ci(t) has a period of 1 ms, τi can only be measured in the range of 0 to 1 ms, which corresponds to 0 to about 300 km in distance (1 ms × 3 · 10^8 m/s). The pseudorange ρi is extended using the bits of the navigation message d(t) and the so-called Z-count (a time mark imprinted in d(t)).

The signal tracking is the stage of signal processing where a locally generated replica is kept synchronized with the received signal. The receiver usually uses two cooperating feedback systems for this task: the DLL for code synchronization and the PLL for carrier synchronization.

The simplified block diagram of the signal tracking is shown in Fig. 1. The key parts of both the DLL and the PLL are their detectors. The detector is a block whose output is proportional to the tracking error of a parameter, i.e. to τ − τ̂ for the DLL or to ϕ − ϕ̂ for the PLL. The detector output drives (through the loop filter) the respective NCO in the signal replica generator in order to minimize the error. A contemporary GNSS receiver has several such DLL/PLL blocks, each of which tracks one satellite signal.

[Figure 1 diagram: the received signal si(t) enters the DLL detector (τ error, DLL loop filter) and the PLL detector (φ error, PLL loop filter); the loop filters drive the code NCO and the carrier NCO of the replica generator, which outputs ri(t)]
FIGURE 1: Simplified block diagram of GNSS signal tracking – DLL and PLL structure

Now we move towards the real implementation of the signal tracking as used in the WNav receiver. The modified block diagram of the algorithm is shown in Fig. 2.

The detector in a GNSS receiver consists of several correlators (WNav employs just Early and Late correlators in one DLL/PLL block) and a discriminator block. The correlator is a block which computes the mutual energy of si(t) and ri(t) over a specified interval (given by the period of ci(t), i.e. 1 ms in the GPS L1 case). This interval is denoted as the integration time and, in the WNav project, the instants at which the Early and Late integrations end are denoted as the Early and Late PRN TIC (E PRN TIC and L PRN TIC), respectively. The next block, the discriminator, is usually a non-linear memoryless block. Note that the DLL and PLL detectors share common correlators, while their discriminator blocks differ. The DLL and PLL detectors are marked as the blue and red areas, respectively.
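The Early/Late correlator and discriminator structure just described can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not WNav's FPGA logic: the integration covers one code period of a toy 16-chip code at 4 samples per chip (instead of the 1023-chip, 1 ms GPS L1 C/A code), the DLL uses a normalized early-minus-late power discriminator, and, since WNav has no prompt correlator, the PLL discriminator is applied to the sum E + L in the small-angle Q/I form of the Costas discriminator. All function names are hypothetical.

```c
#include <assert.h>

/* One complex baseband sample or correlator output (I and Q parts). */
typedef struct { double i, q; } iq_t;

/* Sample k of a periodic +/-1 PRN code with nc chips and spc samples per
 * chip; negative k wraps around the code period. */
static int code_at(const int *code, int nc, int spc, int k)
{
    int n = nc * spc;
    int m = ((k % n) + n) % n;
    return code[m / spc];
}

/* Correlator: accumulate received samples times the code replica shifted
 * by `shift` samples (+ = early, - = late) over one code period, i.e. one
 * integration time. Carrier frequency wipe-off is assumed to be done
 * already; only a constant residual phase remains in s. */
static iq_t correlate(const iq_t *s, const int *code, int nc, int spc, int shift)
{
    assert(nc > 0 && spc > 0);
    iq_t acc = {0.0, 0.0};
    for (int k = 0; k < nc * spc; k++) {
        int c = code_at(code, nc, spc, k + shift);
        acc.i += s[k].i * c;
        acc.q += s[k].q * c;
    }
    return acc;
}

static double power(iq_t x) { return x.i * x.i + x.q * x.q; }

/* DLL discriminator: normalized early-minus-late power. It is 0 when the
 * replica is aligned and negative when the received code lags the replica
 * (the late correlator then collects more energy). */
static double dll_disc(iq_t e, iq_t l)
{
    double pe = power(e), pl = power(l);
    return (pe - pl) / (pe + pl);
}

/* PLL discriminator: Q/I of E + L, the small-angle form of the Costas
 * atan(Q/I) discriminator; E + L stands in for a prompt correlator.
 * Assumes the in-phase sum is non-zero. */
static double pll_disc(iq_t e, iq_t l)
{
    iq_t p = {e.i + l.i, e.q + l.q};
    return p.q / p.i;
}
```

With the replica aligned, the Early and Late powers are equal and the DLL output is zero, while the PLL output equals tan of the residual carrier phase; delaying the received code by a quarter chip makes the DLL output negative.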
all DLL/PLL blocks are sampled at a slightly higher rate than the E/L PRN TIC, and at this time all register values are exchanged between the FPGA and the PC. The software on the PC side then recognizes whether new correlator output values were received as a consequence of the E/L PRN TIC of a particular DLL/PLL block. The resampling runs at an 800 µs rate and in the WNav project is denoted as the TIC event. We will discuss the TIC event again later, since it is an important event which drives the FPGA-PC communication.

Signal tracking cannot be an initial stage of signal processing. The DLL/PLL assumes that the

[Figure 2 diagram: the received signal si(t) is correlated with the Early and Late replicas rE,i(t) and rL,i(t); the common correlators feed the DLL and PLL discriminators and loop filters, which drive the code NCO and carrier NCO of the generator, whose states give the measured code and measured carrier; cL,i(t) denotes the late code replica]
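Why a TIC period slightly shorter than the 1 ms integration time suffices can be checked with a small simulation: every 800 µs exchange interval contains at most one Early/Late PRN TIC of a channel, so no correlator output is overwritten before the PC reads it. The counter-based bookkeeping below is a hypothetical stand-in; the text does not say how the PC side actually recognizes new values, and the function names are illustrative only.

```c
#include <assert.h>

#define TIC_US     800L   /* FPGA <-> PC exchange period (TIC event) */
#define PRN_TIC_US 1000L  /* GPS L1 C/A code period = integration time */

/* Number of integration ends (PRN TICs) of one channel at or before time
 * t_us; the first one occurs at phase_us, with 0 <= phase_us < PRN_TIC_US. */
static long prn_tics_upto(long phase_us, long t_us)
{
    return t_us < phase_us ? 0 : (t_us - phase_us) / PRN_TIC_US + 1;
}

/* Simulate n exchanges with period tic_us. Returns how many exchanges
 * delivered a new correlator output; *max_new receives the largest number
 * of integrations completed between two consecutive exchanges (a value
 * above 1 would mean an output was overwritten before being read). */
static long simulate_exchanges(long phase_us, long tic_us, long n, long *max_new)
{
    assert(tic_us > 0 && phase_us >= 0 && phase_us < PRN_TIC_US);
    long with_new = 0;
    *max_new = 0;
    for (long j = 1; j <= n; j++) {
        long fresh = prn_tics_upto(phase_us, j * tic_us)
                   - prn_tics_upto(phase_us, (j - 1) * tic_us);
        if (fresh > *max_new)
            *max_new = fresh;
        if (fresh > 0)
            with_new++;
    }
    return with_new;
}
```

Over 1000 TICs (800 ms) every one of the 800 integrations is delivered exactly once, whatever the code phase; with a hypothetical 1.2 ms exchange period, some intervals would contain two integration ends and one output would be lost.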
3.1 Receiver Hardware

The WNav hardware consists of a peripheral device plugged into the PC workstation. The device is implemented as an ExpressCard/54 (L-shape). Two different WNav device prototypes can be seen in Fig. 3.

FIGURE 3: Two different versions of the WNav ExpressCard device prototypes

The WNav device consists of an analog part and an FPGA part. The FPGA part is responsible for digital signal processing and for communication via PCI Express (PCIe).

[Figure 4 diagram: two RF channels (CH1, CH2), each with a MAX 2120 direct-conversion receiver with AGC and a MAX 1192 2×8-bit I/Q ADC clocked by adc_clk; 20 MHz TCXO; Spartan 6 xc6slx45t with I2C and PCIe lanes; linear regulators 1.2 V (core), 1.2 V (PCIe) and 2.5 V (AUX); configuration flash with programming connector]
FIGURE 4: Functional block diagram of the WNav ExpressCard peripheral device.

The device is equipped with two RF (Radio Frequency) inputs with MMCX connectors; thus, signals from two antennas can be processed simultaneously. The analog receiver part uses the direct conversion concept (MAX 2120), so the complex envelope (I and Q components) is fed into an A/D converter (MAX 1192). 8-bit samples at a sampling rate of 20 MHz are used for each component. Further signal processing is accomplished in the FPGA. WNav is currently built on a Xilinx Spartan 6 FPGA (XC6SLX45T).

The key elements of the FPGA part are the DLL/PLL correlator blocks described above (see Fig. 2). These blocks are organized into groups of six and, on the higher level, there are four such groups in WNav. Thus the WNav receiver is capable of tracking 6 × 4 = 24 satellite signals simultaneously.

Besides the correlator blocks, the FPGA part contains other blocks, such as a snapshot memory for signal acquisition and an I2C block which controls the direct-conversion tuners.

All input and output registers and the snapshot memory are arranged so that they are accessible from the PC side through memory-mapped I/O. The device offers one I/O memory region which is common for read and write operations. However, read and write operations at an identical address access different memory cells in the device (in general, a value written to the device cannot be read back from the same address). The arrangement of the input and output registers in the I/O memory space, as viewed from the PC side, can be seen in Fig. 6 and Fig. 7. To make the registers visible from the PC side, the FPGA part contains a communication block which forms and processes TLP (Transaction Layer Packet) packets and translates them into I/O memory operations.

The memory region for the read operation is large due to the snapshot memory size. To meet the strict timing requirement, the transfer from the device to the PC is accomplished with DMA, and the communication block is equipped with a simple DMA controller.

The communication with the device is synchronized with the TIC event. The TIC event occurs every 800 µs, and at this instant new values can be read and written through I/O memory. The TIC event is propagated to the PC side by an MSI (Message Signaled Interrupt), which is generated at the end of the DMA transfer. The time relation of the TIC event and the interrupt is depicted in Fig. 5.

A correctly plugged-in WNav device appears in the list of PCI devices:

$ lspci -v
...
18:00.0 RAM memory: Xilinx Corporation Zomojo Z1
        Subsystem: Xilinx Corporation Zomojo Z1
        Physical Slot: 1
        Flags: bus master, fast devsel, latency 0, IRQ 44
        Memory at e4000000 (32-bit, non-prefetchable) [size=1M]
        Capabilities: [40] Power Management version 3
        Capabilities: [48] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [58] Express Endpoint, MSI 00
        Capabilities: [100] Device Serial Number 00-00-00-00-00-00-00-00
        Kernel driver in use: wnav

3.2 Device Driver – Kernel Module

The device driver was written based on information in [5, 6]. Other up-to-date information was obtained using a Linux identifier search server [7].

The WNav device driver is implemented as a character device driver. When the module is loaded, the plugged WNav card is accessible through the device file /dev/wnav0. The kernel messages printed when the device was plugged in are shown here (one WNav device was detected):

wnav: wnav_module_init() BEGIN
wnav: wnav_pci_probe() BEGIN
wnav: /dev/wnav0 created for device
wnav 0000:18:00.0: enabling device (0000 -> 0002)
wnav 0000:18:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
wnav: resource start: 0x000000e4000000
wnav: resource end: 0x000000e40fffff
wnav: resource length:0x00000000100000
wnav: resource flags: 0x00000000040200
wnav: +--> IORESOURCE_IO 0
wnav: +--> IORESOURCE_MEM 1
wnav: +--> IORESOURCE_IRQ 0
wnav: +--> IORESOURCE_DMA 0
wnav: +--> IORESOURCE_PREFETCH 0
wnav: +--> IORESOURCE_READONLY 0
wnav: +--> IORESOURCE_CACHEABLE 0
wnav: +--> IORESOURCE_RANGELENGTH 0
wnav: +--> IORESOURCE_SHADOWABLE 0
wnav 0000:18:00.0: setting latency timer to 64
wnav 0000:18:00.0: irq 44 for MSI/MSI-X
wnav: recognized MSI IRQ: 44
wnav: probe function ends with success
wnav: +--> device id (minor): 0
wnav: +--> strct wnav_dev ptr: 0xf44f4000
wnav: wnav_pci_probe() END
wnav: module init ends with success
wnav: Number of recognized WNav devices: 1
wnav: +--> &dev[0] 0xf44f4000
wnav: +--> &dev[1] 0x (null)
wnav: +--> &dev[2] 0x (null)
wnav: +--> &dev[3] 0x (null)
wnav: +--> &dev[4] 0x (null)
wnav: +--> &dev[5] 0x (null)
wnav: +--> &dev[6] 0x (null)
wnav: +--> &dev[7] 0x (null)
wnav: wnav_module_init() END

Internally, the module data are stored in two structures. At the top, there is a structure struct wnav_dri, which gathers common data for the entire driver. The second structure, struct wnav_dev, gathers data for each particular device (for one plugged WNav card) in the system; it is assumed that more WNav cards can be plugged into one PC. The struct wnav_dri contains an array of pointers to struct wnav_dev as one of its items.

Most of the struct wnav_dev items are filled in by the wnav_pci_probe() function, which is invoked after the device is plugged in. The important items are the addresses for accessing the I/O memory of the WNav device. The hardware address baddr_hw is obtained from pci_resource_start() and is mapped using pcim_iomap() to obtain the virtual address baddr_vir, which is used for access from the driver side. The next two addresses are related to the DMA transfer: the dma_hw address, which has to be sent to the device (the DMA controller in the FPGA needs it), and the dma_vir address, which is used to access the DMA region from the driver side. Both of them are obtained as the result of a pci_alloc_consistent() call.

The device driver counterpart in user space is the RT process wnav_core, which is mainly responsible for the channel services, i.e. closing the feedback loops of the DLLs/PLLs. The driver implements the following system calls: open(), close(), read() and write().

The RT process calls read() and write() periodically, and between these two calls the channel services are accomplished (more wnav_core details will be provided in 3.3.1). The time relation of the FPGA, the driver and the RT process is depicted in Fig. 5. The behavior of the read() and write() system calls depends on the device status stored in the variable stat (an item of struct wnav_dev). The stat can be one of the following: WN_READ, WN_WRITE and WN_DONE.

We describe the driver function according to Fig. 5. Consider that we are at the instant of a TIC. The RT process is sleeping now (it is in the wait queue), since it called read() while the status was WN_DONE. After the TIC, new values are available through I/O memory. However, the I/O memory is not meant to be accessed at this point with functions such as ioread32() or iowrite32(). Instead, a DMA transfer is initiated after the TIC event, and the entire read block, as shown in Fig. 6, is then available in kernel space (in the driver). The end of the transfer is signaled by the MSI interrupt. The interrupt handler changes the status from WN_DONE to WN_READ. Since the status WN_READ is the condition for waking up the RT process, the RT process is removed from the wait queue and can now continue reading. The read block is then copied into user space with the copy_to_user() function. Besides the data from the FPGA (output registers, snapshot memory block), TMARK and FFLAG blocks are also added. The TMARK and
FFLAG blocks contain timing information and fault flags from the previous read() and write() cycle. This is a way to make these useful data available for performance debugging in user space (we will mention both of them later). When the entire read block has been transferred into user space, the status is changed from WN_READ to WN_WRITE. A subsequent attempt to read ends with an error. Then the RT process performs the channel services and prepares data for writing. The arrangement of the write block can be seen in Fig. 7. Since the write block is significantly smaller than the read block, no DMA transfer mechanism is implemented, for simplicity. The write() system call is implemented in two stages. First, the write block is copied into kernel space with copy_from_user(). Next, depending on the write block contents, the data are copied from kernel space to the device with iowrite32(). When all data are written, the status is changed from WN_WRITE to WN_DONE. A subsequent attempt to write ends with an error. When the RT process calls read() at this time, it is put into the wait queue because the status is WN_DONE.

[Figure 5 (time relation of FPGA, driver and RT process): rows HW (FPGA) and kernel space (module); the DMA transfer is followed by wnav_irq_handler(); status sequence WN_DONE, WN_READ, WN_WRITE, WN_DONE; RT process activities: sleeping, copy_to_user(), channel services, copy_from_user() and iowrite32(), sleeping]
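The status handling of read(), write() and the MSI interrupt handler described above forms a small state machine. The following user-space model sketches it; the enum values follow the names in the text, while the functions and result codes are hypothetical, and the wait queue, DMA and copy operations are reduced to comments.

```c
#include <assert.h>

/* Device status values named in the text (item `stat` of struct wnav_dev). */
enum wn_stat { WN_READ, WN_WRITE, WN_DONE };

/* Outcome of a modelled system call (hypothetical names). */
enum wn_res { WN_RES_OK, WN_RES_SLEEP, WN_RES_ERROR };

/* MSI interrupt at the end of the DMA transfer: a fresh read block is in
 * kernel memory. If the previous cycle has not finished, the status is
 * left unchanged here (how the driver reports such an overrun is not
 * modelled). */
static void model_irq(enum wn_stat *st)
{
    if (*st == WN_DONE)
        *st = WN_READ;          /* condition that wakes the sleeping reader */
}

/* read(): the caller sleeps in the wait queue until WN_READ, then the read
 * block is handed over (copy_to_user()) and the status moves on. */
static enum wn_res model_read(enum wn_stat *st)
{
    switch (*st) {
    case WN_DONE:
        return WN_RES_SLEEP;    /* wait for the next TIC */
    case WN_READ:
        *st = WN_WRITE;
        return WN_RES_OK;
    default:
        return WN_RES_ERROR;    /* second read within one cycle */
    }
}

/* write(): accepted once per cycle, after the read block was consumed;
 * stands for copy_from_user() followed by iowrite32() to the device. */
static enum wn_res model_write(enum wn_stat *st)
{
    assert(*st == WN_READ || *st == WN_WRITE || *st == WN_DONE);
    if (*st != WN_WRITE)
        return WN_RES_ERROR;
    *st = WN_DONE;
    return WN_RES_OK;
}
```

One TIC cycle then reads as: read() sleeps in WN_DONE, the interrupt moves the status to WN_READ, read() succeeds once, write() succeeds once, and the device is back in WN_DONE.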
[Figure 6 (read block as seen from the PC): snapshot (acquisition) memory at 0x0000 to 0x7cfc (WN_SIZE_ACQMEM = 0x7d00); Early/Late code and carrier correlator measurements from 0x7d00 (2 × 0x80 = 0x100); output registers; corr. ctr. at 0x7f00, corr. space at 0x7f08, TIC count, i2c stat. at 0x7f10; struct time_mark (TMARK) and struct fault_flag (FFLAG) appended by the driver; sizes WN_SIZE_RDBUFF and WN_SIZE_ALLDMA]
[Figure 7 (write block): header and NCO control registers at 0x0000 to 0x00fc; corr. ctr. reg. at 0x8000, corr. space reg. at 0x8004, DMA address at 0x8008, I2C ctr. reg. at 0x800c; PRN memory banks 0 to 3 at 0x10000, 0x14000, 0x18000 and 0x1c000, each 0x100 × 0x40 = 0x4000 bytes; size WN_SIZE_WRBUFF]
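Several numbers in the memory map can be cross-checked against the hardware parameters given earlier. The sketch below assumes that one snapshot block holds 800 µs of 8-bit I and 8-bit Q samples at 20 MHz, which gives exactly WN_SIZE_ACQMEM = 0x7d00; the remaining offsets are read off Figs. 6 and 7, and all macro and function names other than WN_SIZE_ACQMEM are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Receiver parameters quoted in the text. */
#define SAMPLE_RATE_MHZ   20u   /* MAX 1192 sampling rate per I/Q component */
#define TIC_PERIOD_US     800u  /* interval between two TIC events */
#define BYTES_PER_SAMPLE  2u    /* 8-bit I + 8-bit Q */

/* One snapshot block = 800 us of samples; Fig. 6 gives 0x7d00. */
#define WN_SIZE_ACQMEM (SAMPLE_RATE_MHZ * TIC_PERIOD_US * BYTES_PER_SAMPLE)

/* Offsets read off Figs. 6 and 7 (these macro names are hypothetical). */
#define RD_CORR_MEAS      0x7d00u  /* Early/Late code+carrier, 2 x 0x80 */
#define RD_CORR_CTR       0x7f00u
#define RD_CORR_SPACE     0x7f08u
#define RD_I2C_STAT       0x7f10u
#define WR_CORR_CTR       0x8000u
#define WR_CORR_SPACE     0x8004u
#define WR_DMA_ADDR       0x8008u
#define WR_I2C_CTR        0x800cu
#define WR_PRN_BASE       0x10000u
#define WR_PRN_BANK_SIZE  (0x100u * 0x40u)  /* 0x4000 bytes per bank */
#define WR_PRN_BANKS      4u
#define WN_BAR_SIZE       0x100000u         /* lspci: [size=1M] */

_Static_assert(WN_SIZE_ACQMEM == 0x7d00u, "snapshot = 800 us at 20 MHz, I+Q");
_Static_assert(RD_CORR_MEAS == WN_SIZE_ACQMEM, "measurements follow snapshot");
_Static_assert(WR_PRN_BASE + WR_PRN_BANKS * WR_PRN_BANK_SIZE <= WN_BAR_SIZE,
               "PRN banks fit into the 1 MiB region");

/* Start address of PRN memory bank `bank` (0..3) in the write space. */
static uint32_t prn_bank_addr(unsigned bank)
{
    assert(bank < WR_PRN_BANKS);
    return WR_PRN_BASE + bank * WR_PRN_BANK_SIZE;
}
```

The compile-time checks encode the arithmetic: 20 samples/µs × 800 µs × 2 bytes = 32000 = 0x7d00, and the four 0x4000-byte PRN banks end at 0x1fffc, well inside the 1 MiB region reported by lspci.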
3.3 User Space Processes

Our goal now is to describe the processes which run in the user space context. We describe just the coarse concept. Detailed information is available on the project homepage [12], where the source code documented with the Doxygen [11] tool is placed.

The WNav project, in the first approach, consists of three main user space processes. We already mentioned the RT process labeled wnav_core, which is mainly responsible for the channel services. The second one is a user interface, which offers a look inside WNav and also provides WNav's control facility. This process is labeled wnav_monitor. The third one is a process responsible for position, velocity and time (PVT) estimation and is labeled
wnav_pvt. All three processes can be run separately. Of course, to allow a meaningful operation of wnav_monitor or wnav_pvt, wnav_core has to be running as well.

The Inter-Process Communication (IPC) in the WNav project is based on shared memory. As synchronization objects for the shared resources (items in shared memory) we simply rely on integer variables which are treated atomically (we used the gcc built-in atomic functions). The top shared-memory structure struct wnav_shamem can be used as an outline of which data are shared among the processes:

struct wnav_shamem
{
    /* --- IO memory --- */
    struct acq_mem acq;
    struct rd_restbuff rd_rest;
    struct wr_buff wr;
    /* --- --- */
    struct prn_gener prn;
    struct wnav_tuner tuner;
    struct wnav_core wcore;
    /* --- channels (correlators) --- */
    wnav_corr_t
        corr_all[WN_CORR_NO_GRP][WN_CORR_IN_GRP];
    /* --- monitor --- */
    struct wna_heap heap;
    struct wnc_llist llist;
    /* --- pvt process --- */
    struct pvt_share pvt;
};

3.3.1 RT Process: wnav_core

The RT process is responsible for several tasks which are accomplished in an infinite loop. We describe them in the next paragraphs.

First, the data are read from the device file /dev/wnav0. This task is covered by the function do_read(). The reading is accomplished in two steps (there are two read() system calls in do_read()), see Fig. 6. In the first step, the block of snapshot memory is read. Such a block contains signal samples over 800 µs (the interval between two successive TICs). The block is stored as one element of an array in struct acq_mem. This array organizes successive snapshot memory blocks into a region with signal samples over a longer interval, equal to several multiples of 800 µs. The second step of reading gets the code and carrier measurements and the correlator outputs, but also debugging and performance related information from the kernel driver. These data are stored in struct rd_restbuff.

The next task of the RT process is connected with the receiver control through the user interface. This task is accomplished in the function wnc_llist_perfrmv().

In the shared memory, a queue of tasks (struct wnc_llist) is implemented. This queue is filled from the user interface. Here, in the RT process, the task from the queue is performed, and when the task is finished, it is removed from the queue.

The next job of the RT process is the channel services. The signal processing related data (filter status, channel status, accumulated correlator outputs, etc.) are stored in the structure wnav_corr_t. An array of wnav_corr_t with the same organization as the DLL/PLL blocks in the FPGA (i.e. 4 × 6) is an item of struct wnav_shamem. The signal processing task is driven by the channel status, item chst in wnav_corr_t. Based on the available correlator outputs from struct rd_restbuff and the data in wnav_corr_t, new values for the code and carrier NCOs are prepared.

Another important task of the RT process is the export of code measurements for pseudorange forming. These data are stored in struct pvt_share and are then read by the PVT process.

The final task is to write the data back to the FPGA through the device file /dev/wnav0. The task is accomplished in the function do_write(). It may be divided into two parts, see Fig. 7. If just new NCO values and possibly one register value have to be written, the writing is accomplished with one write() system call. All needed data are prepared in struct wr_buff. When the PRN code has to be written in addition, the write is divided into two steps: first the header and NCO values are written, and then a segment of PRN code is written.

3.3.2 Process of User Interface: wnav_monitor

The user interface is based on the Ncurses library [10], see the screenshot in Fig. 8.

The first task of the user interface is to give a look inside the receiver. The displayed information is organized into pages. In the screenshot, the channel status related data are displayed. There are other pages, such as an acquisition page, a tuner status page, a fault flag page and a help page.

The second task of the user interface is receiver control. The user can control the receiver by typing commands into a command line (see the last line in the screenshot). A command consists of its identification (string) and an argument list, e.g. ACQ 0 0 starts the acquisition for the first channel in the first group (channel with coordinates 0, 0), or TUNER 0 1575.42 4.0 10.0 sets
the frequency, bandwidth and gain for the first tuner (with id 0). Since most tasks need some writing to and reading from the FPGA, the monitor process just converts commands into task objects and puts them into the queue. The queue is internally implemented with struct wnc_llist in the shared memory. The function wnc_llist_add() parses a command and puts the task into this queue. The task objects created this way are then retrieved in wnav_core by the wnc_llist_perfrmv() function.

algorithm is based on Eq. 2, which is solved using the least squares method.

For proper function, the process wnav_pvt needs data from the RT process wnav_core. The communication is accomplished using struct pvt_share placed in the shared memory.

The items of struct pvt_share can be seen in the following code:
struct pvt_share {
uint8_t count_m125;
uint8_t core_idx;
uint8_t pvt_idx;
enum praw_stat stat[PVT_RAW_CNT];
PVT_RAW raw[PVT_RAW_CNT];
};
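The least-squares solution of Eq. 2 mentioned above can be sketched with the textbook Gauss-Newton iteration: linearize the ranges around the current estimate, solve the 4 × 4 normal equations for the position and bias corrections, and repeat. This is an assumption about the method, not the wnav_pvt code; the function names are hypothetical, and satellite clock, atmospheric and Earth-rotation corrections are ignored.

```c
#include <assert.h>
#include <stddef.h>

static double absd(double a) { return a < 0.0 ? -a : a; }

/* Square root by Newton iteration from above (keeps the sketch free of
 * libm so that it builds with a plain `cc`). */
static double nr_sqrt(double a)
{
    if (a <= 0.0)
        return 0.0;
    double x = a > 1.0 ? a : 1.0;      /* start at or above sqrt(a) */
    for (int k = 0; k < 200; k++) {
        double nx = 0.5 * (x + a / x);
        if (nx >= x)
            break;                      /* no further decrease: converged */
        x = nx;
    }
    return x;
}

/* Solve A v = r (4 x 4) by Gaussian elimination with partial pivoting.
 * A and r are overwritten. Returns -1 for a singular matrix. */
static int solve4(double A[4][4], double r[4], double v[4])
{
    for (int c = 0; c < 4; c++) {
        int p = c;
        for (int i = c + 1; i < 4; i++)
            if (absd(A[i][c]) > absd(A[p][c]))
                p = i;
        if (A[p][c] == 0.0)
            return -1;
        for (int j = 0; j < 4; j++) {
            double t = A[c][j]; A[c][j] = A[p][j]; A[p][j] = t;
        }
        double t = r[c]; r[c] = r[p]; r[p] = t;
        for (int i = c + 1; i < 4; i++) {
            double f = A[i][c] / A[c][c];
            for (int j = c; j < 4; j++)
                A[i][j] -= f * A[c][j];
            r[i] -= f * r[c];
        }
    }
    for (int i = 3; i >= 0; i--) {
        double s = r[i];
        for (int j = i + 1; j < 4; j++)
            s -= A[i][j] * v[j];
        v[i] = s / A[i][i];
    }
    return 0;
}

/* Gauss-Newton least squares for Eq. (2): rho_i = |p - p_i| + b.
 * sat: n satellite positions, rho: n pseudoranges (metres).
 * est: in = initial guess {x, y, z, b}, out = solution.
 * Returns the number of iterations used, or -1 on failure. */
static int pvt_solve(const double sat[][3], const double *rho, size_t n,
                     double est[4], int max_iter)
{
    assert(sat != NULL && rho != NULL && est != NULL);
    if (n < 4)
        return -1;                       /* 4 unknowns need >= 4 ranges */
    for (int it = 0; it < max_iter; it++) {
        double N[4][4] = {{0.0}}, g[4] = {0.0}, d[4];
        for (size_t i = 0; i < n; i++) {
            double dx = est[0] - sat[i][0];
            double dy = est[1] - sat[i][1];
            double dz = est[2] - sat[i][2];
            double range = nr_sqrt(dx * dx + dy * dy + dz * dz);
            if (range == 0.0)
                return -1;
            /* Row of the Jacobian: unit line-of-sight vector and 1 for b. */
            double h[4] = {dx / range, dy / range, dz / range, 1.0};
            double res = rho[i] - (range + est[3]);  /* measured - predicted */
            for (int a = 0; a < 4; a++) {
                g[a] += h[a] * res;
                for (int c = 0; c < 4; c++)
                    N[a][c] += h[a] * h[c];          /* normal matrix H^T H */
            }
        }
        if (solve4(N, g, d) != 0)
            return -1;
        double step = 0.0;
        for (int a = 0; a < 4; a++) {
            est[a] += d[a];
            step += absd(d[a]);
        }
        if (step < 1e-4)                 /* sub-millimetre update: done */
            return it + 1;
    }
    return -1;                           /* no convergence */
}
```

Starting from the Earth's centre (all zeros) is usually adequate because the satellites are far away compared with the receiver's distance from it; with more than four satellites, the same normal equations give the least-squares fit.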
                                  PC workstation 1               PC workstation 2
PC workstation type               –                              HP Compaq nc6320
CPU model name                    Intel(R) Core(TM)2 CPU         Genuine Intel(R) CPU
CPU MHz                           1866.669 MHz                   1833.337 MHz
CPU cores                         2                              2
cache L1/L2                       32/4096 KB                     64/2048 KB
bogoMIPS                          3732.89                        3657.46
address size: physical/virtual    36/48                          32/32
system memory                     2G                             1G
distribution                      Fedora 12 (Constantine)        Fedora 15 (Lovelock)
kernel                            2.6.33.7-rt29 SMP PREEMPT RT   3.0.1-rt11 SMP PREEMPT RT
hardware platform                 x86_64                         i386
4 WNav Receiver Testing

The WNav receiver was intensively tested on two different PC platforms. The first was a desktop PC (gaming computer), further labeled PC 1; the second was a laptop, further labeled PC 2. The PC parameters are listed in Tab. 1. On both PC platforms we were able to put the WNav receiver into operation.

The test configuration for PC 1 can be seen in Fig. 9. We used a Spirent GSS6560 simulator [13] as the GPS L1 C/A signal source. During the testing, the important receiver parameters were logged for later analysis and visualization. See Fig. 10, where

interrupt thread and RT process real-time priority and CPU affinity.

Unfortunately, we have not yet gathered enough information to reliably answer which tuning mechanisms or parameters have a key impact on the WNav receiver. Kernel and operating system tuning for WNav is a challenge for the future. Clearly, there will always be a trade-off between the PC hardware performance and the amount of work needed for kernel and operating system tuning.

[Figure 10 panels: Early correlator I and Q outputs vs. time [s] (24.8 to 26.2 s); E & L power (×10^5) vs. time [s] (11.5 to 13.5 s)]
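The tuning knobs mentioned here, real-time priority and CPU affinity, are standard Linux facilities. A minimal sketch of how a process like wnav_core might lock its memory and switch itself to SCHED_FIFO is shown below; the function names and the priority are hypothetical (the paper does not state the values used in the tests), and the calls typically need root or the CAP_SYS_NICE and CAP_IPC_LOCK capabilities.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <string.h>
#include <sys/mman.h>

/* Clamp a requested priority into the SCHED_FIFO range of this system
 * (1..99 on Linux). */
static int fifo_prio_clamp(int prio)
{
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);
    assert(lo != -1 && hi != -1);
    return prio < lo ? lo : (prio > hi ? hi : prio);
}

/* Hypothetical set-up for a periodic RT process such as wnav_core:
 * lock all pages (no page faults inside the 800 us cycle) and switch the
 * calling process to fixed-priority FIFO scheduling.
 * Returns 0 on success or -errno of the first failing call. */
int rt_setup(int prio)
{
    struct sched_param sp;

    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        return -errno;

    memset(&sp, 0, sizeof(sp));
    sp.sched_priority = fifo_prio_clamp(prio);
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        return -errno;
    return 0;
}
```

CPU affinity can be set from the shell without code changes, for example `taskset -c 1 ./wnav_core`, and the priority of a threaded interrupt handler (a kernel thread named irq/<n>-<name> on an RT_PREEMPT kernel) with `chrt -f -p <prio> <pid>`.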
The WNav project is developed completely with free tools. This applies to both the hardware (PCB design, FPGA programming) and the software. WNav is an open source project. Its source code, documentation and other related materials will be available on the project's homepage [12].

Further project development has two obvious directions. The first is development in terms of GNSS, i.e. introducing algorithms for new GNSS signals and systems (so far, just GPS L1 C/A has been implemented). The second is development in terms of software implementation, i.e. improving inefficient implementations of the algorithms, gathering information on how to configure the kernel and operating system, maintaining the code, keeping the documentation up to date, etc. We hope that in both directions we will benefit from the feedback of other potential users of the WNav receiver.

Acknowledgments

The authors would like to thank Spirent Communications for lending the GPS simulator GSS6560, which was used for the verification and testing of the final tracking and PVT algorithms.

References

[1] Kaplan, E., Hegarty, Ch.: Understanding GPS: Principles and Applications. Second Edition. Artech House Mobile Communications. 2005.
[2] Misra, P., Enge, P.: Global Positioning System: Signals, Measurements, and Performance. Second Edition. Ganga-Jamuna Press. 2006.
[3] Jakubov, O., Kovar, P., Kacmarik, P., Vejrazka, F.: The Witch Navigator – A Low Cost GNSS Software Receiver for Advanced Processing Techniques. Radioengineering, Volume 19, Issue 4, pp. 536–543, 2010.
[4] Kovar, P., Kacmarik, P., Vejrazka, F.: Interoperable GPS, GLONASS and Galileo Software Receiver. IEEE Aerospace and Electronic Systems Magazine, Volume 26, Issue 4, pp. 24–30, April 2011.
[5] Corbet, J., Rubini, A., Kroah-Hartman, G.: Linux Device Drivers, Third Edition. O'Reilly Media. February 2005.
[6] Venkateswaran, S.: Essential Linux Device Drivers. Prentice Hall. 2008.
[7] Linux Cross Reference, Identifier Search. [Online]. Available: http://lxr.free-electrons.com/ident
[8] The RT-kernel Wiki page. [Online]. Available: http://rt.wiki.kernel.org/
[9] Red Hat Realtime Tuning Guide. 2011. [Online]. Available: http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_MRG/1.3/html/Realtime_Tuning_Guide/index.html
[10] The Ncurses (new curses) library homepage. [Online]. Available: http://www.gnu.org/software/ncurses/ncurses.html
[11] Doxygen home page. [Online]. Available: http://www.stack.nl/~dimitri/doxygen/index.html
[12] Witch Navigator home page. [Online]. Available: http://www.witchnav.cz/
[13] GSS6560 Multi-Channel Fully Flexible GPS/SBAS Simulation System. Spirent product description page. [Online]. Available: http://www.spirentfederal.com/GPS/Products/GSS6560/Overview/
Real-Time Linux Applications
Manikandan Ramachandran
Infosys Technologies Limited
Electronic City, Hosur Road, Bengaluru 560100, India
mani [email protected]
Aviral Pandey
Motorola Mobility
2450 Walsh Avenue, Santa Clara, CA 95051, USA
[email protected]
Abstract
Media systems generally have many CPU-intensive as well as time-critical processes. Vanilla Linux 2.6 with preemption enabled does provide a solution for this kind of system. However, if the system is interrupt-intensive, as ours is, vanilla Linux 2.6 performance is not expected to be good, since by definition an interrupt preempts all tasks, however high their priority. The RT patch seems to address exactly this issue by providing an option to handle interrupts in process context, but the solution does not seem to fit a customized Linux. Quite a few architectural changes had to be made to reap the benefits of the RT patch.
This paper describes the various challenges faced in integrating Ingo Molnar's RT patch into a customized PowerPC Linux package, and explains how those challenges were overcome. It describes how LTTng can be used to identify bottlenecks, and concludes by comparing the performance of the application run on vanilla and RT Linux.
Case Study: Challenges and Benefits in Integrating Real Time patch in PowerPc Based Media System
The Linux operating system is fast becoming the de facto embedded operating system. Looking back, the community has made tremendous progress: from being stuck with the big kernel lock, to a truly preemptible kernel in 2.6, and on to a fairly predictable kernel with the RT patch. However, there are still a few challenges in using Linux for commercial systems that require real-time capabilities.
This paper takes one such case and walks through the challenges of integrating the RT patch along with a slew of other patches. After all the integration effort, the real performance of the system did not live up to expectations. The paper digs deeper into one performance issue and proposes a solution to fix it.
The most important goal of the product software design is that whenever there is an interrupt from the video hardware, the kernel has to schedule the video processing tasks with almost zero latency. In theory, assigning a high real-time priority should take care of this requirement; however, in a few instances the Linux kernel has been shown to fail to honor task priorities because of various other kernel dependencies.
Linux Trace Toolkit next generation (LTTng) is an open source tool that helps to find performance bottlenecks on Linux-based systems. Using LTTng, the following system bottlenecks were identified:
3.1 Multiple IDE interrupts
2” shows that there is more than one IDE interrupt while the application process is woken up by the video interrupt. ”Marker 3” just shows the events that occurred around ”Marker 1”.
3.2 Softirqs preempt high priority process
In Figure 3, the marker points to a context switch from the application process to the softirq daemon. In this case a few microseconds are lost from the application process context.
3.3 Spinlocks and Preemption
In our system there are two scenarios in which spinlock usage is inevitable:
• Custom drivers handling interrupts or providing mutual exclusion for a critical resource.
• Kernel usage of spinlocks. One common instance is the use of spinlocks in the ”printk” function.
The idea of the new design is that whenever there is a video interrupt, the media process is woken up with zero or little latency. Another expectation from the RT patch is that the feedback mechanism used in the IDE driver to throttle IDE requests can be removed, and that the RT patch would inherently take care of preempting low-priority tasks, including tasks that handle interrupts.
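As a userspace illustration of the two mechanisms this design relies on, the sketch below starts a hypothetical media task under SCHED_FIFO and protects its shared state with a priority-inheritance mutex. It is an assumption-laden example, not the authors' code: under PREEMPT_RT most kernel spinlocks become sleeping rt_mutexes with priority inheritance, and a PTHREAD_PRIO_INHERIT mutex is the closest userspace counterpart. The names media_task, frame_lock, frames_processed and run_media_task, and the priority value, are illustrative only.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical shared state of a media task, guarded by a PI mutex. */
static pthread_mutex_t frame_lock;
static int frames_processed;

/* Mutex with priority inheritance: a low-priority holder is boosted while
 * a higher-priority thread waits for the lock. */
static int init_pi_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t ma;
    int rc = pthread_mutexattr_init(&ma);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_setprotocol(&ma, PTHREAD_PRIO_INHERIT);
    if (rc == 0)
        rc = pthread_mutex_init(m, &ma);
    pthread_mutexattr_destroy(&ma);
    return rc;
}

/* Thread attributes for SCHED_FIFO at `prio`, clamped to the valid range.
 * EXPLICIT_SCHED is needed, otherwise the creator's policy is inherited. */
static int init_fifo_attr(pthread_attr_t *attr, int prio)
{
    struct sched_param sp;
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);
    int rc = pthread_attr_init(attr);
    if (rc != 0)
        return rc;
    sp.sched_priority = prio < lo ? lo : (prio > hi ? hi : prio);
    rc = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED);
    if (rc == 0)
        rc = pthread_attr_setschedpolicy(attr, SCHED_FIFO);
    if (rc == 0)
        rc = pthread_attr_setschedparam(attr, &sp);
    if (rc != 0)
        pthread_attr_destroy(attr);
    return rc;
}

/* Stand-in for the video processing done after a video interrupt. */
static void *media_task(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&frame_lock);
    frames_processed++;
    pthread_mutex_unlock(&frame_lock);
    return NULL;
}

/* Runs media_task once at SCHED_FIFO `prio`. Without RT privileges
 * pthread_create fails with EPERM and the default policy is used instead.
 * Returns the number of processed frames, or -1 on error. */
static int run_media_task(int prio)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc;

    frames_processed = 0;
    if (init_pi_mutex(&frame_lock) != 0)
        return -1;
    if (init_fifo_attr(&attr, prio) != 0) {
        pthread_mutex_destroy(&frame_lock);
        return -1;
    }
    rc = pthread_create(&tid, &attr, media_task, NULL);
    if (rc == EPERM)
        rc = pthread_create(&tid, NULL, media_task, NULL);
    pthread_attr_destroy(&attr);
    if (rc == 0)
        pthread_join(tid, NULL);
    pthread_mutex_destroy(&frame_lock);
    return rc == 0 ? frames_processed : -1;
}
```

run_media_task(80) returns 1 after one pass. Creating a SCHED_FIFO thread normally requires CAP_SYS_NICE or a suitable RLIMIT_RTPRIO, which is why the sketch falls back to the default policy on EPERM instead of failing.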
version, so we chose real-time patch version 2.6.33.9 and LTTng version [version]. The patching process went quite smoothly, but we had the following run-time issues:
1 Unable to boot the system in SMP mode: The cause of this issue was found to be a call to ”kzalloc” from ”setup_cpu”. Apparently this issue is not seen in the non-real-time kernel. We worked around it by allocating the memory statically rather than using kzalloc or kmalloc.
2 Dependency on BSP code: When using the real-time patch on top of Marvell's BSP code, we found a lot of recursive locking issues. We identified those issues and fixed them.
3 Handling IRQs in a thread: This is one of the toughest challenges we are dealing with. At first glance, after booting the real-time kernel one would think that interrupts are handled in interrupt threads, but we found out the hard way that interrupts continue to be handled in interrupt context unless a few specific changes are made [4]. This issue is the main point of this paper, and it is discussed in detail in section ”Performance Analysis”.
6 Performance Measure
After applying the real-time patch we tried to get a raw performance measure of our system using ”cyclictest” [5], as well as the actual application performance in a controlled test environment. From cyclictest, we noticed that the performance of the system was similar to that of the vanilla kernel. However, with our custom product performance test we found that the performance of the application had deteriorated considerably. The following section describes our test scenario and results.
• Thread to run 10000 tight loops
• Interval between loops: 10000 microseconds
We conducted this test in two phases. In the first phase, all applications were stopped and we made sure CPU utilization was below 1% on both CPUs. We then ran cyclictest first on the vanilla kernel and then on the kernel with the real-time patch. Results of the test are given in Figure 4.
FIGURE 4: Cyclic Test with No Load
In the second phase, both CPUs of the system were heavily loaded using a script that just runs a tight loop. Multiple instances of this script were run concurrently until the CPU utilization of the system reached 97%. Cyclictest was run in this loaded scenario. Figure 5 gives the result of cyclictest on both kernels.
FIGURE 5: Cyclic Test with Load
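The measurement principle behind cyclictest can be sketched as follows: a thread sleeps until absolute deadlines spaced by a fixed interval and records how late each wake-up is. This is a simplified illustration, not cyclictest itself. The real tool also runs the measuring thread at a real-time priority, locks memory and offers many more options. The struct and function names here are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Wake-up latency summary in microseconds (hypothetical struct). */
struct latency_stats {
    long min_us;
    long max_us;
    double avg_us;
};

/* Advance t by ns nanoseconds, keeping tv_nsec normalised. */
static void timespec_add_ns(struct timespec *t, long ns)
{
    t->tv_nsec += ns;
    while (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec++;
    }
}

/* a - b in nanoseconds. */
static int64_t timespec_diff_ns(const struct timespec *a, const struct timespec *b)
{
    return (int64_t)(a->tv_sec - b->tv_sec) * NSEC_PER_SEC
           + (a->tv_nsec - b->tv_nsec);
}

/* Sleep `loops` times until absolute deadlines `interval_us` apart and
 * record how late each wake-up is. Returns 0 on success, -1 on error. */
static int measure_wakeup_latency(int loops, long interval_us,
                                  struct latency_stats *out)
{
    struct timespec next, now;
    int64_t sum_ns = 0, min_ns = INT64_MAX, max_ns = 0;

    if (loops <= 0 || interval_us <= 0 ||
        clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return -1;
    for (int i = 0; i < loops; i++) {
        int rc;
        timespec_add_ns(&next, interval_us * 1000L);
        /* Absolute deadlines keep the period from drifting; retry on signals. */
        while ((rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
                                     &next, NULL)) == EINTR)
            ;
        if (rc != 0 || clock_gettime(CLOCK_MONOTONIC, &now) != 0)
            return -1;
        int64_t lat = timespec_diff_ns(&now, &next);
        if (lat < min_ns)
            min_ns = lat;
        if (lat > max_ns)
            max_ns = lat;
        sum_ns += lat;
    }
    out->min_us = (long)(min_ns / 1000);
    out->max_us = (long)(max_ns / 1000);
    out->avg_us = (double)sum_ns / loops / 1000.0;
    return 0;
}
```

With the parameters listed above, measure_wakeup_latency(10000, 10000, &s) would run for about 100 seconds; calling it once on an idle system and once under load mirrors the two test phases.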
7 Application Performance Analysis
FIGURE 6: CPU0 load comparison
After making the kernel real-time enabled, the expectation was that the IRQ thread would be preempted by the high-priority media application process. However, as demonstrated in the previous section, the performance of the application was poor with the real-time patched kernel. There are two possible explanations for this behavior:
1. The scheduler is not honoring process priorities.
2. IDE interrupts are still handled in interrupt context, as in the vanilla kernel.
We ruled out case 1 by running ”rt-migrate-test”, written by Steven Rostedt [6]. This is a simple program that creates multiple threads with various priorities and then checks whether the scheduler honors the priority of each thread. The test passed both in the normal scenario and on a system heavily loaded with IDE interrupts and the media application process.
To check whether IDE interrupts are still handled in interrupt context, we patched the real-time kernel with LTTng. With the patched kernel, the media application was then stressed while multiple IDE interrupts were generated.
in thread, one has to use ”request_threaded_irq” and split the handling of the IRQ into two parts using two handler functions. The first handler runs in interrupt context; it should disable the source of the interrupt and return IRQ_WAKE_THREAD so that the kernel wakes the corresponding interrupt thread. The second handler does the actual handling of the interrupt, which is done in thread context.
Having identified the bottleneck, we are in the process of converting the IDE handler and other similarly CPU-intensive interrupt handlers to run in thread context.
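The split described above can be imitated in userspace to show the control flow. In the sketch below, which is only an analogy (kernel code would register both handlers with request_threaded_irq and return IRQ_WAKE_THREAD from the primary one), a short primary handler masks a simulated interrupt source and wakes a worker thread through a semaphore; the thread does the processing and then re-enables the source. All names, including the SIM_* return codes, are hypothetical.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical return codes that only mimic the names of irqreturn_t. */
enum sim_irqreturn { SIM_IRQ_NONE, SIM_IRQ_WAKE_THREAD };

struct sim_irq {
    atomic_bool source_enabled; /* models the device's interrupt-enable bit */
    sem_t wake;                 /* wakes the handler thread */
    atomic_int handled;         /* events processed in thread context */
    int expected;               /* events the thread will wait for */
};

/* Primary part, kept short: mask the source and wake the thread. */
static enum sim_irqreturn primary_handler(struct sim_irq *irq)
{
    bool was_enabled = true;
    if (!atomic_compare_exchange_strong(&irq->source_enabled, &was_enabled, false))
        return SIM_IRQ_NONE;    /* source is masked, nothing to do now */
    sem_post(&irq->wake);
    return SIM_IRQ_WAKE_THREAD;
}

/* Threaded part: preemptible work, then re-enable the source. */
static void *thread_handler(void *arg)
{
    struct sim_irq *irq = arg;
    for (int i = 0; i < irq->expected; i++) {
        while (sem_wait(&irq->wake) != 0)
            ;                   /* retry on failure, e.g. EINTR */
        atomic_fetch_add(&irq->handled, 1);   /* stand-in for IDE request work */
        atomic_store(&irq->source_enabled, true);
    }
    return NULL;
}

/* Raise n events and return how many were handled in thread context. */
static int simulate_irqs(int n)
{
    struct sim_irq irq;
    pthread_t tid;

    atomic_init(&irq.source_enabled, true);
    atomic_init(&irq.handled, 0);
    irq.expected = n;
    if (sem_init(&irq.wake, 0, 0) != 0)
        return -1;
    if (pthread_create(&tid, NULL, thread_handler, &irq) != 0) {
        sem_destroy(&irq.wake);
        return -1;
    }
    for (int raised = 0; raised < n; ) {
        if (primary_handler(&irq) == SIM_IRQ_WAKE_THREAD)
            raised++;
        else
            sched_yield();      /* source masked: let the thread run */
    }
    pthread_join(tid, NULL);
    sem_destroy(&irq.wake);
    return atomic_load(&irq.handled);
}
```

simulate_irqs(n) returns n: because the source stays masked until the thread has finished its work, each event is handled exactly once and the primary part never does more than flip a flag and post a semaphore.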
9 Future Work
To make a system nearly real-time capable, one has to understand its bottlenecks. LTTng is a great tool that clearly brings out hard-to-understand system behavior. As demonstrated in this case, LTTng was used extensively to get to the bottom of many performance issues. We highly recommend using LTTng to understand system performance issues, and making a few system software architecture changes to take advantage of the tools provided by the real-time patch. To summarize, the real-time patch is not a ”cure-all” for all of a system's real-time woes; instead it provides great tools that can be used as the first step to make a system real-time capable.
References
[8] High resolution timers and dynamic ticks design notes [linux/Documentation/timers/highres.txt]
[9] rt-latency-howto.txt [http://people.redhat.com/williams/latency-howto/rt-latency-howto.txt]
A Appendix
Following is the legend for LTTV output. Source: ”http://ltt.openrapids.net/lttv-doc/user_guide/x81.html”.
Markus Klotzbuecher
Katholieke Universiteit Leuven
Celestijnenlaan 300B, Leuven, Belgium
[email protected]
Herman Bruyninckx
Katholieke Universiteit Leuven
Celestijnenlaan 300B, Leuven, Belgium
[email protected]
Abstract
Control and Coordination in industrial robot applications operating under hard real-time constraints is traditionally implemented in languages such as C/C++ or Ada. We present an approach to using Lua, a lightweight and single-threaded extension language that has been integrated into the Orocos RTT framework. Using Lua has several advantages: increasing robustness through automatic memory management and by preventing pointer-related programming errors, supporting inexperienced users by offering a simpler syntax, and permitting dynamic changes to running systems. However, to achieve deterministic temporal behavior, the main challenge is dealing with allocation and recuperation of memory. We describe a practical approach to real-time memory management for the use case of Coordination. We carry out several experiments to validate this approach qualitatively and quantitatively and provide robotics engineers with the insights and tools to assess the impact of using Lua in their applications.
Hard real-time Control and Coordination of Robot Tasks using Lua
The major challenge of using Lua in a hard real-time context is dealing with allocation and recuperation of memory. Previously we sketched two strategies to address this: either running in a zero-allocation mode with the garbage collector deactivated, or in a mode permitting allocations from a preallocated memory pool using an O(1) allocator, with active but controlled garbage collection [3]. In practice, especially when interacting with C/C++ code, it may be inconvenient to avoid collections entirely; hence we now consider it necessary to run the garbage collector.
The rest of this paper is structured as follows. The next section gives an overview of related work. Section 3 describes how we address the issue of memory management in a garbage-collected language used for coordination. Section 4 describes four experiments with the two goals of demonstrating the approach and giving an overview of the worst-case timing behavior to be expected. Robustness is discussed in the context of the last experiment, a coordination statechart. We conclude in Section 5.
2 Related work
The Orocos RTT framework [4] provides a hard real-time safe scripting language and a simple state machine. While both are much appreciated by the user community, the limited expressivity of the state machine model (e.g. the lack of hierarchical states) and the comparatively complex implementation of both the scripting language and the state machines have been recognized as shortcomings. This work is an effort to address this.
The real-time Java community has broadly addressed the topic of using Java in hard real-time applications [5]. The goal is to use Java as a replacement for C/C++ to build multi-threaded real-time systems. To limit the impact of garbage collection, parallel and concurrent collection techniques are used [6]. For our use case of building domain-specific coordination languages we chose to avoid this complexity, as coordination can be defined without language-level concurrency. In return, this permits taking advantage of the deterministic behavior of a single-threaded scripting language.
The Extensible Embeddable Language (EEL) [7] is a scripting language designed for use in real-time applications such as audio processing or control applications. Hence, it seems an interesting alternative to Lua. Lua was ultimately chosen because of its significantly larger user community.
3 Approach
To achieve deterministic allocations, Lua was configured to use the Two-Level Segregate Fit (TLSF) [8] O(1) memory allocator. This way memory allocations are served from a pre-allocated, fixed pool. Naturally, this raises the issue of how to determine the required pool size such that the interpreter will not run out of memory. We address this in two ways. Firstly, by examining memory management statistics, the worst-case memory consumption of a particular application can be determined and an appropriate size set. Due to the single-threaded nature of Lua, a simple coverage test can give high confidence that this value will not be exceeded in subsequent runs. Secondly, to achieve robust behavior, the current memory use is monitored online and appropriate actions are defined for the (unlikely) case of a memory shortage. What actions are appropriate depends on the respective application.
This leads to the second challenge for using Lua in a hard real-time context, namely garbage collection. In previous work [3] we suggested avoiding garbage collection entirely by excluding a set of operations that resulted in allocations. However, in practical applications that transfer data between the scripting language and C/C++ this is not always possible. Consequently, the garbage collector cannot be disabled for long periods and must be invoked either automatically or manually to prevent running out of memory. For achieving high determinism, it is necessary to stop automatic collections and to explicitly invoke incremental collection steps when the respective application permits this. Only in this way can an automatic collection at an undesirable time be avoided.
The Lua garbage collector is incremental, meaning that it may execute the garbage collection cycle in smaller steps. This is a necessary prerequisite for achieving low garbage collection latencies, although of course no guarantee; ultimately the latency depends on various factors such as the amount of live data, the properties of the live data (in Lua, for instance, tables are collected atomically, hence large tables will increase the worst-case duration of an incremental collection step) and the amount of memory to be freed. The control and coordination applications we have in mind generally tend to produce little garbage, because the scripting language is primarily used to combine calls to C/C++ code in meaningful ways. Even so, to achieve high
robustness, the worst-case duration of the collection steps can be monitored to deal robustly with possible timing violations.
The following summarizes the basic approach. First, the desired functionality is implemented and executed with a freely running garbage collector. This serves to determine the maximum memory use, from which the necessary memory pool size can be inferred by adding a safety margin (e.g. the maximum use times 2). Next, the program is optimized to stop the garbage collector in critical paths, and incremental steps are executed explicitly. The worst-case timing of these steps is benchmarked, as is the overall memory consumption. The program is then executed again with the goal of confirming that the explicitly executed garbage collection is sufficient to avoid running low on memory.
4 Experiments
In this section we describe the experiments carried out to assess worst-case latencies and the overhead of Lua compared to C/C++ implementations. All tests are executed using Xenomai [9] (v2.5.6 on Linux-2.6.37) on a Dell Latitude E6410 with an Intel i7 quad core CPU and 8 GiB of RAM, with real-time priorities, current and future memory locked in RAM, and under load.² Apart from the cyclictest, all tests are implemented using the Orocos RTT [4] framework. The source code is available here [15].
The purpose of this test is to compare the average and worst-case latencies between the Lua and C versions and to investigate the impact of the garbage collector in different modes.
Results The following table summarizes the results of the cyclictest experiments. Each field contains two values, the average (“a”) and worst-case (“w”) latency in microseconds, obtained after fifteen minutes of execution.
sleep time   500      1000     2000     5000     10000
             a, w     a, w     a, w     a, w     a, w
C            0, 35    0, 31    0, 45    1, 35    1, 30
Lua/free     2, 41    2, 39    3, 39    3, 45    5, 46
Lua/off      2, 38    2, 39    3, 38    3, 43    5, 38
Lua/ctrl     2, 38    2, 42    3, 37    3, 36    5, 46
Comparing the C cyclictest with the Lua variants indicates, as expected, that there is an overhead of using the scripting language. The differences between the three garbage collection modes are less visible. The table below shows the average of the worst-case latencies in microseconds, expressed as a ratio to the average worst case of C. Note that the average of a worst-case latency is only meaningful for revealing the differences between the four tests, but not in absolute terms. A better approach might be to base the average on the 20% worst-case values.
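The suggestion to base the comparison on the 20% worst-case values can be made concrete with a small helper that reports the average, the worst case and the mean of the largest 20% of the latency samples of one run. This is a sketch under the assumption that "20% worst-case values" means the largest fifth of the recorded samples; the struct and function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Summary of one run's latency samples (hypothetical struct). */
struct latency_summary {
    double average;
    long worst;
    double worst20_mean;   /* mean of the largest 20% of the samples */
};

/* qsort comparator: larger values first. */
static int cmp_desc(const void *a, const void *b)
{
    long x = *(const long *)a, y = *(const long *)b;
    return (x < y) - (x > y);
}

/* Returns 0 on success, -1 for an empty input or allocation failure. */
static int summarize_latencies(const long *samples, size_t n,
                               struct latency_summary *out)
{
    if (n == 0)
        return -1;
    long *sorted = malloc(n * sizeof *sorted);
    if (sorted == NULL)
        return -1;
    memcpy(sorted, samples, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, cmp_desc);

    size_t k = n / 5 > 0 ? n / 5 : 1;   /* use at least one sample */
    double sum = 0.0, worst_sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += (double)sorted[i];
        if (i < k)
            worst_sum += (double)sorted[i];
    }
    out->average = sum / (double)n;
    out->worst = sorted[0];
    out->worst20_mean = worst_sum / (double)k;
    free(sorted);
    return 0;
}
```

For the ten samples 1 to 10 the helper reports an average of 5.5, a worst case of 10 and a worst-20% mean of 9.5, so a single outlier influences the last figure less than it influences the worst case.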
4.2 Event messages round trip
The second experiment measures the timing of time-stamped event messages sent from a requester to a responder component, as shown in Figure 1. The test simulates a simple yet common coordination scenario in which a Coordinator reacts to an incoming event by raising a response event, and serves to measure the overhead of calls into the Lua interpreter. The test is constructed using the Orocos RTT framework and is implemented using event-driven ports connected by lock-free connections. The two components are deployed in different threads. Three timestamps are recorded: the first before sending the message, the second at the responder side and the third on the requester side after receiving the response. The test is executed using two different responder components, implemented in Lua and C++.
[Figure 1: Req records timestamp t1 and sends the event; Resp stores timestamp t2, sends the response and executes gcstep; Req records timestamp t3 on receipt.]
FIGURE 1: Sequence diagram of event round trip test.
For the Lua responder, this application takes advantage of the fact that the requester component will wait for 500us before sending the next message, and executes an incremental garbage collection step after sending each response. If this assumption could not be made, the worst-case garbage collection delay would have to be added to the response time (as is the case for experiment 4.3).
Results The following table summarizes the average (“a”) and worst-case (“w”) duration of this experiment for the request (t2 − t1), response (t3 − t2) and total round trip time (t3 − t1); all values in microseconds.
       req       resp      total      Lua/C (total)
       a, w      a, w      a, w       a, w
C      9, 37     7, 18     16, 50     -
Lua    15, 47    11, 59    26, 106    1.63, 2.12
On average, receiving a response from the Lua component is 1.6 times slower than using the C responder. The worst case is about 2.1 times slower. Of the 1 MiB memory pool, a maximum of 34% was used. It is worth noting that for the initial version of this benchmark, the response times were approximately eight times slower. Profiling revealed that this was caused by inefficient access to the timestamp message; switching to a faster foreign function interface yielded the presented results.
4.3 Cartesian Position Tracker
The following two experiments illustrate more practical use cases. The first experiment compares a Lua and a C++ implementation of a so-called “Cartesian position tracker”, typical in robotics, running at 1 kHz, by measuring the duration of the controller update function. In contrast to the previous example, the incremental garbage collection step is executed during the controller update and hence contributes to its worst-case execution time.
The following listing shows the simplified code of the update function. Note that the diff function is a call to the Kinematics and Dynamics Library (KDL) [11] C++ library, hence the controller is not implemented in pure Lua. This is perfectly acceptable, as the goal is not to replace compiled languages but to improve the simplicity and flexibility of using the primitives these offer.
   pos_msr = rtt.Variable("KDL.Frame")   -- measured pose
   pos_dsr = rtt.Variable("KDL.Frame")   -- desired pose
   vel_out = rtt.Variable("KDL.Twist")   -- commanded twist
   local vel, rot = vel_out.vel, vel_out.rot

   function updateHook()
      if pos_msr:read(pos_msr) == 'NoData' or
         pos_dsr:read(pos_dsr) == 'NoData' then
         return
      end
      diff(pos_msr, pos_dsr, vel_out, 1)   -- KDL: twist from the pose error

      -- scale each twist component by its gain (K is defined elsewhere)
      vel.X = vel.X * K[0]
      vel.Y = vel.Y * K[1]
      vel.Z = vel.Z * K[2]
      rot.X = rot.X * K[3]
      rot.Y = rot.Y * K[4]
      rot.Z = rot.Z * K[5]

      vel_out:write(vel_out)
      luagc.step()   -- one incremental GC step per control cycle
   end
Note that for Lua versions prior to 5.2, invoking the incremental garbage collector (collectgarbage('step')) restarts automatic collection, hence collectgarbage('stop') must be invoked immediately after the first statement. The custom luagc.step function executes both statements.
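The sizing and monitoring procedure from Section 3 (measure the peak use with a freely running collector, size the pool with a safety margin such as the peak times 2, and watch the current use online) can be sketched with an allocator callback that has the shape of Lua's lua_Alloc, which a host passes to lua_newstate. This sketch only enforces a byte budget on top of realloc/free; it is not TLSF and does not serve memory from a preallocated pool as in the experiments. The struct, the helper names and the threshold parameter are assumptions.

```c
#include <stddef.h>
#include <stdlib.h>

/* Byte budget for one interpreter (hypothetical struct). */
struct mem_budget {
    size_t limit;   /* pool size in bytes, e.g. 1 MiB */
    size_t used;    /* bytes currently allocated */
    size_t peak;    /* highest value of `used` observed */
};

/* Same shape as lua_Alloc: free when nsize == 0, otherwise (re)allocate.
 * Requests that would exceed the budget return NULL, which Lua reports
 * as a memory error. */
static void *budget_alloc(void *ud, void *ptr, size_t osize, size_t nsize)
{
    struct mem_budget *b = ud;
    /* With ptr == NULL, osize is not a block size (Lua 5.2+ passes a type tag). */
    size_t old = ptr != NULL ? osize : 0;

    if (nsize == 0) {
        free(ptr);
        b->used -= old;
        return NULL;
    }
    if (nsize > old && nsize - old > b->limit - b->used)
        return NULL;            /* would exceed the budget */
    void *p = realloc(ptr, nsize);
    if (p == NULL)
        return NULL;
    b->used = b->used - old + nsize;
    if (b->used > b->peak)
        b->peak = b->used;
    return p;
}

/* Pool size from a measured peak and a safety factor (e.g. 2). */
static size_t suggest_pool_size(size_t peak, size_t factor)
{
    return peak * factor;
}

/* Nonzero once usage reaches `percent` of the budget, e.g. to raise mem_low. */
static int memory_low(const struct mem_budget *b, unsigned percent)
{
    return b->used * 100 >= (size_t)percent * b->limit;
}
```

A host would create the interpreter with lua_newstate(budget_alloc, &budget), read budget.peak after a coverage run to choose the pool size, and raise an application-level event such as mem_low once memory_low returns nonzero for a chosen percentage.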
Results The following table summarizes the worst-case execution times in microseconds. The average execution time is approximately 14 times, and the worst-case duration 7 times, slower than the C version. The worst-case garbage collection time measured was 29us; of the 1 MiB memory pool, a maximum of 34% was in use.
type   duration (avg, max)   Lua/C (total)
       a, w                  a, w
C      5, 19                 -
Lua    68, 128               13.6, 6.7
The second real-world example is a coordination Statechart that is implemented using the Reduced Finite State Machine (rFSM) domain specific language [12], a lightweight Statechart execution engine implemented in pure Lua. The goal is to coordinate the operation of grasping an object in an uncertain position. The grasping consists of two stages: approaching the object in velocity control mode, and switching to force control for the actual grasp operation when contact is made. This statechart is shown in Figure 2.
[Figure 2: state grasping containing approach (entry: luagc.step(), en_vel_ctrl(), approach_object()) and grasp (entry: en_force_ctrl(), grasp(), luagc.start()); transition approach to grasp on e_contact; events e_grasp_ok and e_grasp_failed.]
FIGURE 2: Coordinating the grasping of an object.
The real-time constraints of this example depend largely on the approach velocity: if the transition to the grasp state is taken too late, the object might have been knocked over. To avoid the overhead of garbage collection in this hot path, the collector is disabled when entering the approach state and enabled again in grasp after the respective controllers have been enabled.
Besides the actual grasping, it is necessary to monitor the memory use to avoid running out of memory. With an appropriately sized memory pool and sufficient garbage collection steps, such a shortage should not occur. Nevertheless, to guarantee robust and safe behavior this condition must be taken into account and the robot put into a safe state. This is shown in Figure 3.
As the grasping task can only take place while enough memory is available, it is defined as a substate of operational. The structural priority rule of the Statechart model [13] then guarantees that the transition to mem_low always has higher priority than any transition in the grasping state machine.
Identifying the required memory pool size currently has to be done by empirically measuring the maximum required memory of a state machine and adding a safety margin. To avoid this, it would be desirable to infer the expected memory use from the state machine description. Predicting the static memory used by the state machine graph is straightforward; the run-time memory use of the rFSM core is also predictable (it consists mainly of traversing and transforming the FSM graph), as it depends on few factors such as the longest possible transition and the maximum number of events to be expected within a time step. However, predicting the memory use of the user-supplied programs would require a more detailed analysis or simulation, which is currently out of the scope of this work; but in robotics, most user-supplied programs are in C/C++ anyway.
Results The previously described grasping coordination Statecharts are tested by raising the events that effect the transitions from grasping, approach to grasp. The timing is measured from receiving the e_contact event until completing the entry of the grasp state. After this, the same sequence of events is repeated. The functions for enabling the controller
are left empty, hence the pure overhead of the FSM execution is measured. Running the test repeatedly for five minutes indicates a worst-case transition duration between approach and grasp of 180us. The memory pool size was set to 1 MiB and the TLSF statistics report a maximum use of 58%. To test the handling of low-memory conditions, in a second experiment the collector is not started in the grasp state. As a result no memory is recovered, eventually leading to a low-memory condition and a transition to the mem_low state. For this test the maximum memory use was, as expected, 70%.
This test does not take into account the latencies of transporting an event to the state machine. For example, when using the Orocos RTT event-driven ports, the experiments from Section 4.2 can complement this one. Moreover, it should be noted that so far no effort has been put into minimizing rFSM transition latencies; we expect some improvement from optimizing these in future work.
Robustness considerations As described, basic robustness of coordination state machines is achieved by monitoring memory and current real-time latencies. However, the system-level concern of coordination unfortunately combines the two characteristics of (i) requiring higher robustness than functional computations and (ii) being subject to frequent late modifications during system integration, the latter of course being liable to introduce new errors. The combination of scripting language and rFSM model can mitigate this effect in two ways. Firstly, the scripting language inherently prevents fatal errors caused by memory corruption, thereby making it impossible to crash the application. Secondly, rFSM statecharts execute Lua user code in safe mode⁵. This way errors are caught and converted to events that again can be used to stop the robot in a safe way.
5 Conclusions
priate validation should be repeated for each critical use. In particular, when real-time allocation and collection is involved, run-time validation of real-time constraints must be considered an integral part of the application.
The major shortcoming of the current approach is that worst-case memory use can be difficult to predict. To deal with this we currently allocate additional safety margins. As the overall memory usage of the Lua language is comparatively small, such a measure will be acceptable for many systems, save the very resource constrained.
To conclude, we believe the results demonstrate the feasibility of our approach of using a scripting language for hard real-time control and coordination, which permits significantly improving the robustness and safety of a system. The price of these improvements is (i) increased yet bounded worst-case latencies, (ii) computational overhead, and (iii) the need for additional precautions such as manual scheduling of garbage collection. In summary, we believe this constitutes a modern and practical approach to building hard real-time systems that shifts the focus from lowest possible latency to sufficient latency while maximizing reliability.
Future work will take place in two directions. On the high level we are investigating how to automatically generate executable domain specific languages from formal descriptions. Implementation-wise we intend to investigate if and how the presented worst-case timing behavior can be improved by using the LuaJIT [14] implementation, a high-performance just-in-time compiler for Lua.
Acknowledgments This research was funded by the European Community under grant agreements FP7-ICT-231940 (Best Practice in Robotics) and FP7-ICT-230902 (ROSETTA), and by K.U.Leuven's Concerted Research Action Global real-time optimal control of autonomous robots and mechatronic systems. The authors also gratefully acknowledge the support by Willow Garage, in the context of the PR2 Beta program.
Martin Leonhartsberger
Institute for Measurement Technology, Johannes Kepler University
Altenbergerstrasse 69, 4040 Linz, Austria
[email protected]
Bernhard G. Zagar
Institute for Measurement Technology, Johannes Kepler University
Altenbergerstrasse 69, 4040 Linz, Austria
[email protected]
Abstract
Most real-time applications in the context of automatic control and data acquisition require user interfaces to change parameters or to extract data at runtime. Especially when prototyping control processes, it is common to use CACSD programs such as Matlab or Scicos for code generation. While the system is still in the prototyping stage, the customer demands to start evaluation using a software tool. Very often those customers are not able to work with CACSD tools or other tools such as xRTAILab because of missing licenses or lack of knowledge of Linux. To close the gap between the real-time target and data evaluation, a remote control framework is introduced. Our approach is to use RTAI-XML as an XML-RPC server. It runs on the real-time system and is able to connect to the CACSD-generated code. This server instance can be contacted through a platform-independent Java framework, which allows quick prototyping of simple-to-use user interfaces without interfering with the control system development. It is possible to transfer scopes and to change parameters on running targets. A successful application is shown using the example of a low-cost rheometer.
Platform independent remote control and data exchange with real-time targets
target through the framework. With this combination, rapid prototyping of the control process is possible in parallel with the GUI implementation, as long as the state machine interface parameters are designed properly.
2 Used tools
An RTAI [5] patched 2.6.32.11 kernel is used on our reference system. RTAI was started by Paolo Mantegazza at the Politecnico di Milano; it introduces a real-time scheduler into the kernel, which is then able to execute hard real-time code both in user and kernel space. Precompiled kernels for a quick start can be found, for example, in [6]. For communication with the data acquisition hardware a COMEDI [7] driver is used. COMEDI drivers are low-level kernel drivers that provide their functionality through a common interface for different data acquisition hardware. Comedilib, a user space library, is used to extend the block modeling capabilities of Simulink [1] and Scicos [2] for code generation in a later step. All tools except Simulink are available as open source.
new target parameter values. This is especially needed to change feedback gains or the state.
Get Signal Structure: receives a structure of the signals exposed on the target. They can be scalars or arrays.
View Selected Signal: starts a continuous transfer of a signal, not over XML-RPC but over an explicit socket connection.
Write Signal Data on File: stores a signal on the server so that it can be requested at a later stage. It is not used in this project, but may be very useful for unattended recordings.
Disconnect: shuts down the session and releases the target lock.
2.2 Java Framework and state machine
For remote controlling a target over RTAI-XML, a Java framework was developed during this work.
Scicos. The Java classes, designed with the Model View Controller pattern, are briefly presented in the UML chart in Fig. 1.
Class RTAITarget: This class provides an object for a target. setParameterByName(..) allows a single parameter to be changed, or the whole parameter set can be updated at once with setParameters(..). getScopeByName(String sName) returns an object of type RTAIScope, which is described in the next item. Functions for sending and receiving parameters allow the target to be updated at a chosen time, once all parameters have been set; alternatively, the update can be triggered with the function parameter updateImmediately, which is available for all "set" functions. Listeners notify attached objects about changes in the parameters.
Class RTAIScope: This class provides an object for a single scope instance. The whole scope history is saved in this object after the socket transfer of the data has been triggered with scope.start(). Listeners likewise update attached objects about new data.
Interface RTAITargetListener: Provides the interface which target listener objects have to implement.
Interface RTAIScopeListener: Provides the interface which scope listener objects have to implement.
Interface XMLRPCClientInterface: Provides an interface with all remote procedure calls specified by RTAI-XML. Since different XML-RPC implementations can be used (in our implementation the Apache classes have been used), this is done generically so that the implementation can be changed if necessary or desired.
Class XMLRPCClientImpl: The actual implementation of the XMLRPCClientInterface.
Now that parameters can be changed, the next step is to implement a workflow, for example by using states. This could be done by Stateflow objects in Simulink or by additional C programs running directly on the target, for example. Still, a client program for evaluating the results would be necessary as well, and the required workflows very often change; therefore we suggest implementing a state machine in a program on the client. Every automatic control block and calculation can be switched on and off with a constant in the CACSD model. Initially all processes are switched off. The remote program then switches the parts on or off as defined in a state model. We have successfully used this approach to implement different workflows for one device.
To show the usage of the framework, a very small demonstration program has been written. The comments in the code explain the different steps. The source of the framework itself is not printed because of its length, but it will shortly be available as a SourceForge project under the GPL.
2.2.1 A short example
We create a small example, which is shown in Fig. 2. The demo target was derived from the Comedilib demos. It demonstrates the output of a signal to an RTAI scope and to a COMEDI device. After code generation and setup of all necessary components on the real-time target, it is possible to change parameters with a few lines of Java code on a platform-independent remote machine.
FIGURE 2: Simple Simulink example
import java.util.HashMap;
import java.util.Iterator;
import java.util.Vector;

// RTAITarget, RTAIScope, ClientXmlInterface and ClientXmlRpcRtaiImpl are classes of the
// framework described above; their package names are not given here.
public class AppMinimal {
    public static void main(String[] args) throws Exception {
        // create connection to RTAI-XML server (server address, port, target identifier)
        ClientXmlInterface remoteClient = new ClientXmlRpcRtaiImpl("10.0.0.10", 29500, "STEST");
        // create a target
        RTAITarget target = new RTAITarget(remoteClient);
        try {
            // connect to target and start it
            target.connect();
            target.start();
            // list all available parameters
            System.out.println("Parameters available on target:");
            HashMap v = target.getParameters();
            Iterator<Vector> m = v.values().iterator();
            while (m.hasNext()) {
                Vector o = m.next();
                System.out.println(o);
            }
            // list all available scopes
            System.out.println("Scopes available on target:");
            Vector v1 = target.getScopes();
            for (int n = 0; n < v1.size(); n++) {
                System.out.println(v1.get(n));
            }
            // update a parameter on the target, in this case a source block
            target.updateParameter("test/Gain", "Value", 100., true);
            // get a scope and let the transfer run briefly
            RTAIScope sc = new RTAIScope(target, "U");
            sc.startTransfer();
            Thread.sleep(20); // value from the original listing; Thread.sleep takes milliseconds
            sc.stopTransfer();
            System.out.println("Scope data from U: " + sc.getData().toString());
            // stop target and disconnect it
            target.stop();
            target.disconnect();
        } catch (Exception e) {
            // in case of any exception, catch it, stop and close the target
            e.printStackTrace();
            target.stop();
            target.disconnect();
        }
    }
}
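The RTAITarget description above distinguishes between staging parameter values and sending them later in one batch, or sending them at once with updateImmediately, with listeners notified of the changes. The following self-contained sketch illustrates only that staging-and-notify pattern; ParameterStage, ParameterSink (a stand-in for the XML-RPC call to RTAI-XML), ParameterListener and the parameter names are hypothetical and are not the framework's actual API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the XML-RPC call that sends parameters to RTAI-XML.
interface ParameterSink {
    void send(Map<String, Double> parameters);
}

// Hypothetical listener, similar in spirit to RTAITargetListener.
interface ParameterListener {
    void parametersSent(Map<String, Double> batch);
}

class ParameterStage {
    private final ParameterSink sink;
    // Insertion-ordered; setting the same name twice keeps only the latest value.
    private final Map<String, Double> pending = new LinkedHashMap<>();
    private final List<ParameterListener> listeners = new ArrayList<>();

    ParameterStage(ParameterSink sink) {
        this.sink = sink;
    }

    void addListener(ParameterListener listener) {
        listeners.add(listener);
    }

    // Stage a value; it is sent right away only if updateImmediately is true.
    void set(String name, double value, boolean updateImmediately) {
        pending.put(name, value);
        if (updateImmediately) {
            commit();
        }
    }

    // Send all staged values in one call, then notify the listeners.
    void commit() {
        if (pending.isEmpty()) {
            return;
        }
        Map<String, Double> batch = new LinkedHashMap<>(pending);
        pending.clear();
        sink.send(batch);
        for (ParameterListener listener : listeners) {
            listener.parametersSent(batch);
        }
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        List<Map<String, Double>> sent = new ArrayList<>();
        ParameterStage stage = new ParameterStage(sent::add);
        stage.addListener(batch -> System.out.println("sent: " + batch));
        stage.set("test/Gain:Value", 100.0, false);  // staged only
        stage.set("test/Offset:Value", 0.5, false);  // staged only
        stage.commit();                              // one call carrying both values
        stage.set("test/Gain:Value", 50.0, true);    // sent immediately
        System.out.println("calls to the target: " + sent.size());
    }
}
```

A LinkedHashMap keeps the staged values in the order they were set and collapses repeated writes to the same parameter into one. If the remote call can fail, a real implementation would keep the batch until the call has succeeded instead of clearing it first.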
FIGURE 4: Mechanical setup, front view
The rheometer consists of a rotating and a static shaft. The rotor is carried by a small metal ball bearing (which is kept centered by a small magnet in the stator). Stabilization is provided by a magnet on the top, which also lifts the rotor to keep friction down. The distance needs to be trimmed to a point where the magnetic force stabilizes the rotor but does not actually lift it completely. Only a reduction of the weight on the ball is feasible, which has the positive side effect of reducing friction forces. On top of the rotor a tripod is mounted. Each foot of this tripod has a magnet glued into epoxy resin. Each of these magnets is centered between two coils in a Helmholtz-type configuration. The constant magnetic field between those coils causes the magnets to move; the combination is an electrodynamic actuator. AC currents cause the rotor to oscillate. Both parts, rotor and stator, are heated to a selectable temperature.
3.2 Electronic design
Distance sensor: A TCRT reflex sensor was selected in [11, p. 45]. It measures the distance between a vertical bolt (Fig. 3 - 17,18) mounted on the rotor and a fixed position on the rheometer. From the geometry, the rotation angle can be determined.
Stator temperature sensor: A PT100 sensor (Fig. 4 - 10) is used to determine the temperature in the stator mounting clamp. This temperature is further used to control the Peltier elements.
Rotor temperature sensor: An Analog Devices AD592CN sensor (Fig. 4 - 6) is used in the rotor head to determine the temperature of the heating filament. Later on, this temperature is used to correct the temperature loss caused by the poor heat transfer capacity of the probe.
The following actuators are used:
Peltier elements: Two Peltier elements (Fig. 4 - 12) mounted on the stator clamp heat and cool, respectively, the stator and the probe.
Heating filament: Because of the poor heat transfer, tempering the stator is not sufficient. A heating filament (Fig. 4 - 7) is inserted into the rotor head to enable exact temperature control in the probe. An extensive study of the heat propagation model can be found in [12].
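The conversion of the two temperature signals is not described here. As a sketch under stated assumptions: the PT100 is read as a resistance and evaluated with the standard IEC 60751 Callendar-Van Dusen coefficients (valid from 0 °C upward), and the AD592CN, a current-output sensor with a nominal 1 µA/K, is read as a voltage across a sense resistor whose value (10 kΩ in the example) is hypothetical. Individual sensor calibration offsets are ignored.

```java
// Sketch of the rheometer temperature conversions (assumed signal conditioning).
class TemperatureSensors {
    // IEC 60751 Callendar-Van Dusen coefficients for a PT100, valid for T >= 0 degC.
    static final double R0 = 100.0;     // ohm at 0 degC
    static final double A = 3.9083e-3;  // 1/degC
    static final double B = -5.775e-7;  // 1/degC^2

    // Invert R(T) = R0 * (1 + A*T + B*T^2) for T >= 0 degC.
    static double pt100Celsius(double resistanceOhm) {
        double c = 1.0 - resistanceOhm / R0;
        return (-A + Math.sqrt(A * A - 4.0 * B * c)) / (2.0 * B);
    }

    // AD592: nominal output current of 1 uA per kelvin.
    static final double AD592_AMPS_PER_KELVIN = 1e-6;

    // Voltage across the (hypothetical) sense resistor -> degC.
    static double ad592Celsius(double senseVoltage, double senseResistorOhm) {
        double kelvin = senseVoltage / (senseResistorOhm * AD592_AMPS_PER_KELVIN);
        return kelvin - 273.15;
    }

    public static void main(String[] args) {
        System.out.printf("stator (PT100, 138.5055 ohm): %.2f degC%n", pt100Celsius(138.5055));
        System.out.printf("rotor (AD592, 2.9815 V over 10 kohm): %.2f degC%n",
                ad592Celsius(2.9815, 10_000.0));
    }
}
```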
complex structures in the form of workflows. Very often, and especially in the rheometer case, program states and workflows are needed. After a measurement point has been recorded and evaluated, the workflow has to move on to the next one. When moving on, parameters have to be changed according to the workflow and its requirements. Using the framework from Section 2.2, a state flow and a GUI (Fig. 6) were developed for an industrial customer.
controllers (running on the embedded target) depending on the actual requirements of the measurement steps. Additionally, it is possible to change some of the parameters if necessary. All additional parameters can still be changed by a control engineer using q/x/jRTAI-Lab from the RTAI project [5] in parallel with the running GUI.
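The client-side state machine proposed in Section 2.2 (all blocks switched off initially, then switched on or off per state through constants in the CACSD model) can be sketched as follows for a workflow with several measurement points. The block names, the states and the BlockSwitch interface (standing in for a parameter update through the framework) are hypothetical; the rheometer's actual states are not given in the paper.

```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical enable constants in the generated CACSD model.
enum ModelBlock { TEMPERATURE_CONTROL, EXCITATION, ACQUISITION }

// Hypothetical stand-in for writing one enable constant on the target.
interface BlockSwitch {
    void setEnabled(ModelBlock block, boolean enabled);
}

// Each state lists the blocks that must be on; all other blocks are switched off.
enum WorkflowState {
    IDLE(EnumSet.noneOf(ModelBlock.class)),
    SETTLE(EnumSet.of(ModelBlock.TEMPERATURE_CONTROL)),
    MEASURE(EnumSet.allOf(ModelBlock.class)),
    DONE(EnumSet.noneOf(ModelBlock.class));

    final Set<ModelBlock> enabled;

    WorkflowState(Set<ModelBlock> enabled) {
        this.enabled = enabled;
    }
}

class MeasurementWorkflow {
    private final BlockSwitch target;
    private final int points;  // number of measurement points to record
    private int recorded = 0;
    private WorkflowState state;

    MeasurementWorkflow(BlockSwitch target, int points) {
        this.target = target;
        this.points = points;
        apply(WorkflowState.IDLE);  // initially every block is switched off
    }

    WorkflowState state() {
        return state;
    }

    // Called by the client when the current step has finished
    // (e.g. temperature settled, or a point recorded and evaluated).
    void step() {
        switch (state) {
            case IDLE:
                apply(WorkflowState.SETTLE);
                break;
            case SETTLE:
                apply(WorkflowState.MEASURE);
                break;
            case MEASURE:
                recorded++;
                apply(recorded < points ? WorkflowState.SETTLE : WorkflowState.DONE);
                break;
            case DONE:
                break;
        }
    }

    // Write every enable constant so that the target matches the state table.
    private void apply(WorkflowState next) {
        for (ModelBlock block : ModelBlock.values()) {
            target.setEnabled(block, next.enabled.contains(block));
        }
        state = next;
    }

    public static void main(String[] args) {
        Map<ModelBlock, Boolean> model = new LinkedHashMap<>();
        MeasurementWorkflow workflow = new MeasurementWorkflow(model::put, 2);
        while (workflow.state() != WorkflowState.DONE) {
            workflow.step();
            System.out.println(workflow.state() + " " + model);
        }
    }
}
```

Keeping the on/off pattern in a table per state means that a changed workflow only changes the table and the transitions on the client, not the generated real-time code.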
4 Conclusions
Don W. Carr, Juan Villalvazo Naranjo, Rubén Ruelas, Benjamin Ojeda Magaña
Universidad de Guadalajara
José Guadalupe Zuno 48, Col. Los Belenes, C.P. 45101, Zapopan, Jalisco, México
[email protected]
Abstract
We have developed a system based on GNU/Linux, Apache, PostgreSQL, Zend Framework, the REACT control
engine, and various free software projects for remote data collection, analysis, and control of grain silos.
This system allows grain operators to monitor and manage grain silos from a distance. It includes a
GNU/Linux-based computer installed locally at the silo and running the REACT control engine, and a
GNU/Linux server in the cloud running the Zend Framework and the PostgreSQL database. Communication
with the server is via HTTP GET/POST.
Using GNU/Linux and other Free Software for Remote Data Collection, Analysis and Control
The temperature columns are typically placed so that there is at most 5 meters to the next column, or 2.5 meters from the wall to the nearest sensor; thus there is a temperature measurement no farther than 2.5 meters from any point in the silo. Vertically, the temperature sensors have been placed every 1.5 meters, but we will switch to every 0.5 meters to be able to estimate the height of the grain more accurately (how we estimate the height of the grain is described later). The number of temperatures measured inside a silo can run into the hundreds, depending on the size of the silo. Every 10 minutes, the outside temperature, the relative humidity, all of the temperatures in the columns inside the silo, and the status of the ventilation fans are logged locally for backup and also communicated to the remote web server. At the same time, we check for new control parameters and download them if necessary.
A graph of temperature and relative humidity taken near Poncitlan, Jalisco, during one of the initial tests is shown in Figure 1.
3 How the monitoring is done
On site, we use one hardened GNU/Linux computer running our REACT control engine [1], [2], [3] to read the values from all of the data acquisition devices. We are currently using the Technologic Systems TS-7300 single board computer, but we are also testing the TS-7500 and TS-7553 and plan to switch to one of these for future projects, as they are cheaper and more compact. None of these have moving parts, and they withstand temperatures up to 70 degrees Celsius. They all have SD card slots for gigabytes of local storage. All of the data acquisition devices that we use support the MODBUS protocol, and we communicate via RS-232, RS-485, and 900 MHz radios. The GNU/Linux computer is connected to the Internet via a cellular modem, a DSL modem, or the local network, depending on availability.
The temperature and relative humidity are read using a small PCB with a micro-controller and an interface for a GE Chipcap-D temperature/relative humidity sensor. The GE Chipcap-D is soldered to a small PCB that must be located close to the micro-controller and protected from the weather. Discussing the optimum location of this system is beyond the scope of this paper.
The temperatures inside a silo are read using Dallas 1-Wire temperature sensors connected via Category 5 UTP cable. A micro-controller can communicate with up to 16 Dallas 1-Wire cable runs via a 16:1 multiplexer. Each cable run can be up to 50 meters long and serves one column of temperatures inside the silo. Each column must be hung from the ceiling of the silo and bound together with a 1/4 inch steel cable using shrink-fit tubing. The steel cable is necessary because of the extreme forces that can be generated when the body of grain moves as the silo is being unloaded. The Dallas 1-Wire temperature sensors must be soldered every 0.5 meters or 1.5 meters, depending on the desired accuracy of the volume measurements.
Finally, to turn the ventilation fans on and off, we need a relay board with two relays: one that is normally open and wired in parallel with the ON button, and one that is normally closed and wired in series with the OFF button. The ON relay is pulsed to turn the ventilation fan on, and the OFF relay is pulsed to turn it off. This allows both automatic and manual control of the ventilation fans.
The architecture for an installation using 900 MHz radios is shown below. The SBC7300 is the hardened GNU/Linux computer, the CL4490 is a 900 MHz radio, the X505 is a master controller for ventilation fans that can connect multiple relay boards, the X105 is a slave device for cases in which more ventilation fans need to be connected, and the W100 is a weather station. The T200 is the PCB for reading up to 16 temperature columns, and it can be daisy-chained via an RS-485 port.
FIGURE 2: Temperature and Relative Humidity Data from Poncitlan, Mexico
4 The Server
The server where all of the data is logged can be located anywhere in the world where there is an Internet connection. The actual servers we use for this project are located in Karlsruhe, Germany, and run Debian GNU/Linux with the Zend Framework and the PostgreSQL database installed. The monitored data at each silo is uploaded every 10
minutes via HTTP POST, and the ventilation parameters are checked for changes every 10 minutes via HTTP GET. If more than 20 minutes pass without communication from a particular site, we mark it as offline. We use only HTTPS for these requests, and each request must be accompanied by the correct security key for the silo, or the request is rejected. Note that the ventilation parameters for each silo can be set from anywhere in the world with Internet access, and they will be transferred to and applied at the local site within 10 minutes. Thus, the team managing the silos only needs to log into the server for all management functions of the various silos. On site at each silo, we will not require grain conservation experts, only technical staff to maintain the equipment. Experts on grain certification/quality must still visit all of the silos periodically to carry out quality analyses and certify the quality of the grain using portable laboratory equipment.
necessary, they will be able to disable ventilation entirely via the web page for the silo in question. To start, the managers must enter the range of humidities/temperatures in which to ventilate the grain. For corn, for instance, if you ventilate the grain in the range of 70-75% relative humidity, you will end up with grain that is 13.5-14% humidity by weight. We plan to automate the process further by letting the manager enter only the target humidity of the grain; the algorithm will then automatically try to hit that target. We should note that the parameter of the grain that matters more than all others is humidity. If the grain is too humid, mold and other agents will destroy it. If it is too dry, the grain will crack and break. After humidity, we want the grain as cool as possible. If the grain is just above zero degrees Celsius, it can be stored for years. If it is over 30 degrees Celsius, it will only last a matter of months. In hotter regions like the state of Sinaloa, Mexico, the temperature is typically much higher than we would like when the humidity is in the correct range, and thus the grain stored in Sinaloa must be used within a relatively short period of time.
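The ventilation rule described above (ventilate only while the outside air is within the range set by the managers, with the option to disable ventilation entirely) combines naturally with the relay board, on which either the ON or the OFF relay is pulsed. A minimal sketch, assuming a hypothetical RelayBoard interface and an illustrative temperature range; only the 70-75 % relative humidity range for corn comes from the text.

```java
// Hypothetical interface to the fan relay board; the actual wiring
// (X505 controller, radio link) is not modeled here.
interface RelayBoard {
    void pulseOn();   // pulse the normally-open relay wired in parallel with the ON button
    void pulseOff();  // pulse the normally-closed relay wired in series with the OFF button
}

class VentilationControl {
    // Ranges entered by the silo managers through the web page.
    final double minRh, maxRh, minTempC, maxTempC;
    // Managers can disable ventilation entirely for a silo.
    boolean enabled = true;

    VentilationControl(double minRh, double maxRh, double minTempC, double maxTempC) {
        this.minRh = minRh;
        this.maxRh = maxRh;
        this.minTempC = minTempC;
        this.maxTempC = maxTempC;
    }

    // True when the outside air lies within both manager-defined ranges.
    boolean shouldVentilate(double outsideRh, double outsideTempC) {
        return enabled
                && outsideRh >= minRh && outsideRh <= maxRh
                && outsideTempC >= minTempC && outsideTempC <= maxTempC;
    }

    // Called once per 10-minute cycle; a relay is pulsed only if the fan state must change.
    void update(double outsideRh, double outsideTempC, boolean fanRunning, RelayBoard relays) {
        boolean wanted = shouldVentilate(outsideRh, outsideTempC);
        if (wanted && !fanRunning) {
            relays.pulseOn();
        } else if (!wanted && fanRunning) {
            relays.pulseOff();
        }
    }

    public static void main(String[] args) {
        RelayBoard printer = new RelayBoard() {
            public void pulseOn() { System.out.println("pulse ON relay"); }
            public void pulseOff() { System.out.println("pulse OFF relay"); }
        };
        // 70-75 % RH is the corn example from the text; the temperature range is illustrative.
        VentilationControl corn = new VentilationControl(70.0, 75.0, 0.0, 25.0);
        corn.update(72.0, 18.0, false, printer);  // in range, fan off
        corn.update(80.0, 18.0, true, printer);   // too humid, fan on
        corn.update(72.0, 18.0, true, printer);   // in range, fan already on
    }
}
```

Because the decision compares the desired state with the logged fan status, a fan started manually outside the configured range would be switched off again at the next cycle; whether manual operation should override the rule is a design choice the text leaves open.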
To estimate how much grain is in a silo, we basically need to know the level of the grain in the silo. From this, using simple geometry, we can calculate how many cubic meters of grain are in each silo. We can then estimate the number of metric tons of grain in the silo using the approximate number of metric tons of grain per cubic meter. So the question becomes: can we estimate the level of grain in the silos? The answer lies in the temperature readings. In the silo, the temperature above the grain swings widely with the outside temperature, sunlight on the silo, cloud cover, and weather conditions in general. Grain, however, is a natural insulator, and temperatures inside the grain remain very stable.
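The level-estimation algorithm itself is not spelled out here. One possible sketch, under explicit assumptions: the grain level is taken as the height of the highest sensor in a column, counting upward from the floor, whose temperature swing over an observation window stays below a threshold (1 °C, hypothetical); the silo is treated as a flat-filled cylinder (conical sections or a heaped grain surface would need a different volume formula); and a bulk density of 0.72 t/m³ is used purely for illustration.

```java
class GrainLevel {
    // Temperature swing (max - min) of one sensor over the observation window.
    static double swing(double[] readings) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double r : readings) {
            min = Math.min(min, r);
            max = Math.max(max, r);
        }
        return max - min;
    }

    // heights[i] (metres, ascending from the floor) is the position of the sensor whose
    // readings are readings[i]. Walking upwards, each sensor whose swing stays below the
    // threshold is taken to be buried in grain; the first one that swings more is in air.
    static double estimateLevel(double[] heights, double[][] readings, double thresholdC) {
        double level = 0.0;
        for (int i = 0; i < heights.length; i++) {
            if (swing(readings[i]) >= thresholdC) {
                break;
            }
            level = heights[i];
        }
        return level;
    }

    // Flat grain surface in a cylindrical silo: V = pi * r^2 * h.
    static double volumeM3(double diameterM, double levelM) {
        double r = diameterM / 2.0;
        return Math.PI * r * r * levelM;
    }

    // Metric tons from the volume and an assumed bulk density in t/m^3.
    static double tonnes(double volumeM3, double bulkDensity) {
        return volumeM3 * bulkDensity;
    }

    public static void main(String[] args) {
        double[] heights = {0.5, 1.0, 1.5, 2.0, 2.5};
        double[][] day = {
            {21.0, 21.2}, {21.1, 21.3}, {21.0, 21.4},  // stable: inside the grain
            {14.0, 31.0}, {12.0, 35.0}                 // large daily swings: air above
        };
        double level = estimateLevel(heights, day, 1.0);  // hypothetical 1 degC threshold
        double volume = volumeM3(10.0, level);             // hypothetical 10 m diameter
        System.out.printf("level %.1f m, volume %.1f m^3, about %.1f t%n",
                level, volume, tonnes(volume, 0.72));      // illustrative 0.72 t/m^3
    }
}
```

The resolution of this estimate equals the vertical sensor spacing, which is why the text plans to move from 1.5 m to 0.5 m spacing.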
6 Conclusions
We have developed a novel system using GNU/Linux and other free software projects that allows silos to be managed and monitored from a distance, so that there is no need for grain conservation experts at each site, and so that all stakeholders have access to the quality and the quantity of the grain in each silo. The system allows managers to set the humidity and temperature parameters under which the grain will be ventilated, to monitor the progress and effectiveness of the ventilation via the quality analyses done periodically on site, and to disable ventilation when the goals have been reached. Thanks to GNU/Linux and the many other free software projects, we were able to complete this project much more quickly and at a lower cost.
References
[1] Don W. Carr, R. Ruelas, Ramón Reynoso, Anne Santerre, 2006, An Extensible Object-Oriented Instrument Controller for Linux, The Eighth Real-Time Linux Workshop, Lanzhou, Gansu, China.
[2] Donald Wayne Carr Finch, Rubén Ruelas, Raúl Aquino Santos, Apolinar González Potes, 2007, Interlocks For An Object-Oriented Control Engine, The 3rd Balkan Conference in Informatics, Sofia, Bulgaria.
[3] Donald Wayne Carr Finch, Rubén Ruelas, Apolinar González Potes, Raúl Aquino Santos, 2007, REACT: An Object-Oriented Control Engine for Rapidly Building Control Systems, Conference on Electronics, Robotics, and Automotive Mechanics, Cuernavaca, México.
Don W. Carr, Rubén Ruelas, Benjamin Ojeda Magaña, Adriana Corona Nakamura
Universidad de Guadalajara
José Guadalupe Zuno 48, Col. Los Belenes, C.P. 45101, Zapopan, Jalisco, México
[email protected]
Abstract
REACT is a control engine that runs on GNU/Linux, written in C/C++, for creating SCADA and
DCS systems, or just simple controllers for a single machine or laboratory equipment. The design was
based on the author's experience with large SCADA systems for natural gas pipelines, natural gas
distribution systems, and water distribution systems, with software for testing military, industrial, and
light vehicle transmissions, and with research on control systems in general. REACT was designed as a
general-purpose control engine that scales the way GNU/Linux scales: from tiny embedded devices in the
field all the way up to large computers. Thus, REACT can run as the central software on a server
as a SCADA system, and it can also run on small hardened devices as the software for an RTU or a DCS
controller.
A Status Report on REACT, a Control Engine that runs on top of GNU/Linux
TCP/IP, various PC data acquisition cards, Dallas 1-Wire via the OWFS project, a simple ASCII protocol that we developed, and some simulators, also developed by us, that load as drivers. This driver model allows us to write a simple simulator that loads as a driver for use during testing, and then to switch easily to the real driver in the field. It also allows us to keep the code compact for memory-constrained embedded systems.
3 REACT Object model
We needed an object model to implement all of the control engine objects, often referred to as point types, tagnames, or just tags. In fact, we identify all control objects by their tagname, which serves as a unique ID. As you will see, a unique ID is needed to refer to objects from scripts, displays, etc. We have four basic types of objects: 1) input objects that receive process values via a device driver, 2) output objects that send/write process values via a device driver, 3) control objects that access process values via input objects and send process values via an output object, and 4) objects that only calculate values, do data logging, provide user interfaces, etc. Currently, all of the object types must be hard-coded into REACT, but we are in the process of switching them to be loaded on demand at load time, if they are needed for a particular project. We have started by implementing dynamic loading for one object type (analog input).
4 REACT Configuration
Like many projects, we started out storing all configuration in a directory full of delimited text files. However, this is prone to corruption, and it is tedious to copy all of the files to the target system and to retrieve all of the configuration files for backup after local changes have been made. For this reason, we are moving all of the configuration into SQLite, which eliminates the corruption problems by using transactions and simplifies copying the configuration to a target system and backing it up, since SQLite stores the complete database with all tables in a single file. SQLite is also extremely compact and can itself be loaded when needed and then unloaded, using dlopen(), dlsym(), dlerror(), and dlclose().
Faced with the need to write all of the functions to read/write the configuration for ALL object types, we quickly realized that this was repetitive and could be done using code generation from a configuration file that names the database fields and their types, and the corresponding field names in the objects. We also quickly realized that this same configuration file could be used to convert existing projects from delimited text files to database files. Finally, the existing user interface was a text editor for the delimited text files, and we would need a new user interface to edit the object configurations and configure REACT. With these same configuration files, if we add a few prompts and a few other simple things, we can generate either a web-based user interface for editing the configuration or a curses-based text interface for editing remotely via ssh. We have implemented the code generator for converting existing projects and for reading/writing the configurations from REACT, and we are working on generating a web interface so that we can offer an online configuration editor. For now, we use the SQLite console application to edit configuration parameters remotely via ssh.
The configuration file for discrete outputs is shown in Figure 1, and the user interface generated from this file in Figure 2.
FIGURE 1: Config File used to Auto-Generate code for Discrete Outputs
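The code-generation idea above, one field description driving the database schema, the write code, and the prompts of an editing interface, can be sketched as follows. The pipe-separated spec format, the field names and the generated SQL are hypothetical (the contents of Figure 1 are not reproduced here), and REACT itself is written in C/C++, so this Java sketch only illustrates the principle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class ConfigCodeGen {
    // One field of the specification: database column, SQL type, object member, UI prompt.
    static final class FieldSpec {
        final String dbField, sqlType, objectField, prompt;

        FieldSpec(String dbField, String sqlType, String objectField, String prompt) {
            this.dbField = dbField;
            this.sqlType = sqlType;
            this.objectField = objectField;
            this.prompt = prompt;
        }
    }

    // Parse "dbField|sqlType|objectField|prompt" lines; '#' starts a comment line.
    static List<FieldSpec> parse(String spec) {
        List<FieldSpec> fields = new ArrayList<>();
        for (String raw : spec.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] p = line.split("\\|");
            if (p.length != 4) {
                throw new IllegalArgumentException("bad spec line: " + line);
            }
            fields.add(new FieldSpec(p[0].trim(), p[1].trim(), p[2].trim(), p[3].trim()));
        }
        return fields;
    }

    // Schema for the object type's configuration table.
    static String createTable(String table, List<FieldSpec> fields) {
        return "CREATE TABLE " + table + " ("
                + fields.stream().map(f -> f.dbField + " " + f.sqlType)
                        .collect(Collectors.joining(", "))
                + ");";
    }

    // Parameterized INSERT for writing one object's configuration.
    static String insertStatement(String table, List<FieldSpec> fields) {
        String columns = fields.stream().map(f -> f.dbField).collect(Collectors.joining(", "));
        String marks = fields.stream().map(f -> "?").collect(Collectors.joining(", "));
        return "INSERT INTO " + table + " (" + columns + ") VALUES (" + marks + ");";
    }

    // Prompts for a text or web editing form, paired with the object member they fill.
    static List<String> prompts(List<FieldSpec> fields) {
        return fields.stream().map(f -> f.prompt + " -> " + f.objectField)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String spec = String.join("\n",
                "# hypothetical discrete-output spec",
                "tag|TEXT PRIMARY KEY|tag|Tag name",
                "description|TEXT|description|Description",
                "channel|INTEGER|channel|Output channel");
        List<FieldSpec> fields = parse(spec);
        System.out.println(createTable("discrete_output", fields));
        System.out.println(insertStatement("discrete_output", fields));
        prompts(fields).forEach(System.out::println);
    }
}
```

Adding a field to an object type then means adding one spec line; the schema, the write statement and the editing prompts all follow from it.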
Gerstorfer, Gregor
Institute for Measurement Technology, Johannes Kepler University Linz, Austria
Altenbergerstr. 69, 4040 Linz
[email protected]
Zagar, Bernhard G.
Institute for Measurement Technology, Johannes Kepler University Linz, Austria
Altenbergerstr. 69, 4040 Linz
[email protected]
Abstract
Because the goal is a low-cost profile measurement system, the use of an open source control system
suggests itself. In this work, the authors combine a commercially available compact disc (CD) pickup
head, the National Instruments NI-PCI 6221 data acquisition card, and a PC running RTAI Linux. The
setup is used to scan along a CD and measure the distance between tracks. The setup will be used by
students in a lab course to get to know rapid control prototyping systems and open source alternatives
to expensive industrial systems.
Development of an optical profile measurement system under RTAI Linux using a CD pickup head
and convert the data. In this section, the pickup head used (Mitsumi PXR-550X), taken from a commercially available CD drive, is investigated.
FIGURE 1: The working principle of a CD pickup head, focused case.
Figure 1 shows the principle of a CD pickup head, which consists of a laser diode operating at a wavelength of 780 nm, a linearly polarizing beam splitter, a collimator, an objective lens, and an astigmatic lens in front of the detector array. The emitted laser beam is polarized by the polarizing beam splitter, collimated, and, after passing the objective lens, reflected at the specimen's surface. The objective lens can be moved by the so-called voice coil motors (VCM). The back-scattered light travels back through the objective lens, passes the beam splitter, and is directed via the astigmatic lens to the detector array. The detector array is a four-quadrant photodiode; the arrangement of the quadrants can be seen in Fig. 2. Because of the characteristics of the astigmatic lens, the beam shape at the detector array is either elliptic or circular, depending on whether the reflection of the beam takes place in the focal plane of the objective lens (circular beam) or out of the focal plane (elliptic shape). The orientation (NE-SW or NW-SE) and size of the elliptic beam pattern depend on the location of and distance from the focal plane.
[Figure 2 plots: RF in V and FE in V versus z in µm; beam shapes on quadrants A, B, C, D labeled Too Close, In Focus, Too Far]
FIGURE 2: Bottom: The FE signal's characteristic s-curve with corresponding beam shapes at the detector array. Top: The RF signal.
The four elements of the detector array are named A, B, C and D. Each of them delivers a photocurrent depending on the illuminance in the respective area. These currents are subsequently converted into voltages A, B, C and D. From these voltages, the so-called focus error (FE) signal can be derived:
$$\mathrm{FE} = (A + C) - (B + D) \qquad (1)$$
When a specimen with a reflective surface is moved towards the pickup head, the measured FE signal shows a characteristic s-shape with an approximately linear range over an input range of ±3 µm, see Fig. 2. The reflection takes place exactly in the focal plane when the FE signal becomes zero. Besides the FE signal, the total illumination integrated over all four photodiodes, the RF signal, is also of interest:
$$\mathrm{RF} = A + B + C + D \qquad (2)$$
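Equations (1) and (2) translate directly into code. The sketch below also anticipates the calibration described in Section 3: given a sweep with known axial positions z and the measured FE, it locates the zero crossing between the FE extrema and fits the sensitivity dz/dFE by least squares over a window around it. It assumes that z is already known in µm for every sample (the conversion from VCM voltage, the paper's Eqn. 3, is not reproduced) and that the fit window lies inside the approximately linear range; the least-squares fit is our choice, not necessarily the authors' method, and the example voltages and the synthetic s-curve are hypothetical.

```java
class FocusSignals {
    // Eq. (1): focus error signal from the four quadrant voltages.
    static double focusError(double a, double b, double c, double d) {
        return (a + c) - (b + d);
    }

    // Eq. (2): RF signal, the total illumination over all four quadrants.
    static double rf(double a, double b, double c, double d) {
        return a + b + c + d;
    }

    // Zero crossing on the steep central part of the s-curve, i.e. between the
    // FE maximum and minimum, linearly interpolated between samples.
    static double zeroCrossing(double[] zUm, double[] fe) {
        int iMax = 0, iMin = 0;
        for (int i = 1; i < fe.length; i++) {
            if (fe[i] > fe[iMax]) iMax = i;
            if (fe[i] < fe[iMin]) iMin = i;
        }
        int lo = Math.min(iMax, iMin), hi = Math.max(iMax, iMin);
        for (int i = lo; i < hi; i++) {
            if (fe[i] == 0.0) return zUm[i];
            if ((fe[i] > 0) != (fe[i + 1] > 0)) {
                double t = fe[i] / (fe[i] - fe[i + 1]);
                return zUm[i] + t * (zUm[i + 1] - zUm[i]);
            }
        }
        throw new IllegalArgumentException("no zero crossing between FE extrema");
    }

    // Least-squares slope dz/dFE (um/V) from samples within halfWindowUm of the zero crossing.
    static double sensitivity(double[] zUm, double[] fe, double halfWindowUm) {
        double z0 = zeroCrossing(zUm, fe);
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        int n = 0;
        for (int i = 0; i < zUm.length; i++) {
            if (Math.abs(zUm[i] - z0) > halfWindowUm) continue;
            sx += fe[i];
            sy += zUm[i];
            sxx += fe[i] * fe[i];
            sxy += fe[i] * zUm[i];
            n++;
        }
        if (n < 2) throw new IllegalArgumentException("too few samples in the linear range");
        return (n * sxy - sx * sy) / (n * sxx - sx * sx);
    }

    public static void main(String[] args) {
        System.out.printf("FE = %.3f V, RF = %.3f V%n",
                focusError(0.9, 0.6, 0.8, 0.5), rf(0.9, 0.6, 0.8, 0.5));
        // Synthetic s-curve with focus at z = 80 um, sampled every 1 um.
        int n = 161;
        double[] z = new double[n], fe = new double[n];
        for (int i = 0; i < n; i++) {
            z[i] = i;
            double u = (z[i] - 80.0) / 3.0;
            fe[i] = 1.2 * u * Math.exp(-u * u / 2.0);
        }
        System.out.printf("dz/dFE about %.2f um/V%n", sensitivity(z, fe, 1.0));
    }
}
```

Restricting the search to the region between the FE maximum and minimum avoids the zero crossings in the defocused tails of the s-curve, the same effect that, as described later, confused the focus controller.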
The RF signal corresponding to the FE signal mentioned above is shown in the upper part of Fig. 2.
3 Profile measurement with a CD pickup head
The fact that the FE signal shows an approximately linear range in its s-curve is used for profile measurement: a specimen is placed exactly in the focal plane and is then moved in any direction within the focal plane to obtain a scan. The variation of the FE signal is then the measure of the profile. Since some surfaces reflect better than others, and the FE signal's amplitude depends on the amount of back-scattered light, a proportionality factor between the FE signal and the profile roughness has to be determined. This requires assuming that the reflective properties of a specimen are approximately constant in the investigated area.
At the beginning of a profile measurement, a voltage sweep from −0.5 V up to 0.5 V must first be applied to the VCM, and the corresponding FE signal has to be acquired, so that the actual sensitivity of the FE signal to the specimen's roughness, given its reflective properties, can be derived. Some measurement results based on this idea are presented later.
4 The setup using RTAI Linux
This work also focuses on the control hardware and software, because the students in the course should get to know and use open source software.
[Figure: setup overview; labels: Host, Master, MATLAB/Simulink, Realtime Workshop, RTAILab, TCP/IP]
During the installation of RTAI, a folder named MATLAB is created. This folder is copied into the MATLAB directory of the PC where, after a small setup, the real-time task can be developed in Simulink. Since the configured RTAI kernel contains Comedi support, several Comedi blocks are included. With the Real-Time Workshop (included in Simulink), the code is generated and can be transferred to the Master. Subsequently, the code has to be compiled (the generated code contains a makefile), and it can be started as long as the required real-time modules are loaded. The Ubuntu installation in the Host's virtual machine has the same RTAI-patched kernel and loaded modules as the Master. With QRtaiLab [13], a connection to the real-time task can be established to view scopes and to change parameters of the real-time task.
In this work, the NI PCI 6221 data acquisition card measures the voltages A, B, C and D (see Section 3) as well as the voltage $u_\mathrm{a}$ it applies to the VCM.
5 Measurement example
As a measurement example, we first measure the profile of a CD. CDs are used as the specimen because they are easily available, their reflective characteristics are good, and they are one of the few available specimens with a microstructure on them.
First, the CD is placed near and parallel to the pickup head. Then the objective lens is moved in the axial direction by the voice coil motor (a delta voltage is applied to the VCM). This is done to find the lens position at which the reflection takes place in the focal plane, and to acquire the data for the calibration of the FE signal (see Section 3). Fig. 5 shows a screenshot of QRtaiLab measuring the signals A, B, C and D and the VCM delta voltage. The sampling time is $t_\mathrm{s} = 0.1$ ms.
FIGURE 5: Screenshot of QRtaiLab plotting acquired signals.
Fig. 6 shows a zoomed-in view of the processed FE signal and the corresponding VCM voltage. The linear range of the FE signal crosses zero when the VCM voltage is $u_\mathrm{a} = 415$ mV. Furthermore, the sensitivity of the FE signal can be computed by considering Eqn. 3:
$$\frac{z}{\mathrm{FE}} = 2.54\,\frac{\mu\mathrm{m}}{\mathrm{V}} \qquad (4)$$
[Figure 6 plots: FE (top) and VCM voltage u in V (bottom) versus t in s]
FIGURE 6: FE signal acquired while applying the reference delta voltage (top). Corresponding VCM voltage (bottom).
Subsequently, the VCM voltage is fixed to the zero-crossing voltage and slightly adjusted so that the FE signal becomes zero. Now the CD is moved in the radial direction along the pickup head. This is accomplished by a translation stage (Oriel Encoder Mike 18011) with a velocity of $v = 1\,\mu\mathrm{m/s}$. The acquired data is shown in Fig. 7; please note that the plot
Real-Time Linux Applications
already shows the profile roughness with respect to the travelled distance, both in µm.

(Plot: "Profile of a CD" over x in µm, x from 0 to 30 µm.)

FIGURE 7: Measured profile of a part of a CD.

The measured track pitch is 1.6 µm, which corresponds to the specification of the CD standard. The profile shows a roughness of at most 2 µm.

The setup has some shortcomings. It is hard to position the pickup head and the CD perfectly parallel. Furthermore, the calibration of the FE signal is difficult because the reflective characteristics are never constant. Another problem occurred when a controller was implemented. The linear range of the FE signal is very narrow. As a result, the controller is not able to hold the lens in focus and instead switches between the negative and the positive defocus, where the FE signal also goes to zero. Despite these shortcomings, a microstructure can be measured.

6 Conclusions

The development of a profile measurement system based on an optical pickup head using RTAI Linux was presented. The optical and electronic components of a CD pickup head and its working principle were introduced. The control and the data acquisition from a remote PC delivered measurement results which demonstrated the functionality of the setup. Based on this work, students will work with an open-source rapid control prototyping system in a lab course. With the introduced setup, the costs for additional software were zero, and students should learn to use similar setups for their own projects.

Acknowledgment

(ACCM).

References

[1] Development of a low-cost autofocusing probe for profile measurement, Kuang-Chao Fan, Chih-Liang Chu, Jong-I Mou, (12) 2001, Measurement Science and Technology
[2] Measurement of Cantilever Displacement Using a Compact Disk/Digital Versatile Disk Pickup Head, En-Te Hwu, Kuang-Yuh Huang, Shao-Kang Hung, Ing-Shou Hwang, 45 (2006), Japanese Journal of Applied Physics
[3] DVD pickup heads for optical measurement applications, Stefan Kostner, Michael J. Vellekoop, 125 (2008), Elektrotechnik und Informationstechnik (e&i)
[4] https://www.rtai.org
[5] Development of a low-cost measurement system for cutting edge profile detection, Gregor Gerstorfer, Bernhard G. Zagar, 2011, Chinese Optics Letters
[6] http://www.mathworks.com
[7] http://www.ubuntu.com
[8] http://www.virtualbox.org
[9] http://www.kernel.org
[10] https://www.rtai.org/RTAILAB/RTAI-KubuntuJaunty-ScicosLab-Qrtailab.txt
[11] http://qrtailab.sourceforge.net/rtai_installation.html
[12] http://www.comedi.org
[13] http://qrtailab.sourceforge.net
Development of an optical profile measurement system under RTAI Linux using a CD pickup head
Pavel Píša (1, 2)
[email protected]
Petr Smolík (1, 3)
[email protected]
František Vacek (1)
[email protected]
Martin Boháček (1)
[email protected]
Jan Štefan (1)
[email protected]
Pavel Němeček (1)
[email protected]

(1) Czech Technical University in Prague, Department of Control Engineering, Karlovo náměstí 13, 121 35 Praha 2, Czech Republic
(2) PiKRON s.r.o., Kaňkovského 1235, 182 00 Praha 8, Czech Republic
(3) AGROSOFT Tábor s.r.o., Harantova 2213, 390 02 Tábor, Czech Republic
Abstract

The uLan protocol is a multi-master communication protocol aimed at small RS-485 control networks. It provides deterministic media access arbitration and has been open in design since its origin. An open-source implementation of the protocol has been available for many years. This article focuses on adapting the protocol for distributed home appliances, that is, the interconnection and control of switches, lights and HVAC components.

Resource-restricted control nodes made one task challenging: implementing flexible and persistent configuration of direct data and event routing between distributed nodes, without requiring a permanently operating commanding master. Devices lack the resources to examine each other's often large object/property dictionaries. Instead, a mechanism has been implemented to map property values into process data message slots. These message slots act as (virtual) wires. They are set up by configuration tools running on a PC, which has enough resources to examine the connected devices and to build and visualize the full object/property model.

Examples of devices developed with this concept are presented at the end of the article. They are accompanied by tools that help with fast prototyping of new devices and with testing them in a PC environment. Compiling embedded device code as native Linux binaries is straightforward, because the uLAN driver implementation is portable. It provides the same API when compiled for system-less nodes, GNU/Linux or Windows.
Process Data Connection Channels in uLan Network
cases assisted by CPU code in ISR) only with whole-character time granularity, the arbitration is based on switching between Tx zero and Rx for a whole character time (sometimes implemented by sending a break character). Unlike in the I2C case, the arbitration has to finish before the target address and data are sent in fully driven transceiver Tx mode. The arbitration sequence is derived from the node's own (module) address, which ensures a unique dominant/recessive sequence for each node.

uLAN is targeted at control applications which require acknowledgement of data reception. Communication exchanges can be simplified by a direct reply from the addressed device within a single arbitration cycle. A direct reply frame follows immediately after the end of the initial frame, without media arbitration. The master releases the bus after the last frame belonging to the given session. Many other standards use this technique, but in uLan the mechanism is generic enough that the master side needs no knowledge of specialized command formats. The required or expected sequence of frames in a single-message session can be prepared at application level and passed to the driver.

A single frame consists of the destination address (DAdr) with the address bit set, the source address (SAdr) and the command (Com), followed by the frame data characters. The end of data is delimited by one of four control characters describing the kind of frame end. A simple frame consistency check byte (XorSum) follows. The frame is directly acknowledged if the frame end kind specifies that. A direct reply frame can then follow if the frame end indicates it as well.

Data frame format:
DAdr (or uL_Beg) | SAdr | Com | 0 to MaxBlock data bytes | uL_End, uL_Arq, uL_Prq or uL_Aap | XorSum

FIGURE 1: uLan Frame Format

Bus request and release

chronously (UART is used). Some delays can be caused by interrupt-processing latencies, and some delays are even required for safe transceiver Rx/Tx switching without spikes. The minimal time is specified as 4 character (byte) transfer times $T_{chr}$.

The waiting time of the first phase, $T_{arbW}$, is not the same for all nodes. This ensures some distribution of the channel capacity between multiple nodes. The wait time is computed as

$$T_{arbW} = \big(((LAdr - Adr - 1) \bmod 16) + 4\big) \cdot T_{chr} \qquad (1)$$

where $LAdr$ is the address of the last node which won arbitration and now releases the bus, $Adr$ is the address of the node preparing for bus use, and $T_{chr}$ is the time to transfer one character. This setup ensures strict cycling of media access priority between nodes with messages prepared in their Tx queues, as long as only addresses up to 16 are assigned. If more nodes are used, cycling between aliasing nodes is not ensured deterministically, but the scheme at least helps with a stochastic distribution.

The second phase ensures that the node with the lower own address wins arbitration when two or more nodes finish the first phase at the same time. The arbitration is based on sending three further dominant-level break characters. These are separated from the initial one by the precomputed time intervals $T_{arb,0}$, $T_{arb,1}$ and $T_{arb,2}$:

$$T_{arb,i} = \big(((Adr \gg 2i) \bmod 4) + 1\big) \cdot T_{chr} \qquad (2)$$

If activity from another node is detected during its inactive interval, the node abandons arbitration and restarts from the first phase. Direct binary coding, i.e. sending the node's own address as a sequence of dominant/recessive character intervals, was not selected, because precise timing would be a problem given ISR response times. Adding one dominant start bit and one recessive stop bit around each arbitration bit would result in an even longer phase-two sequence ($3 \cdot T_{chr} \cdot 8 = 24 \cdot T_{chr}$).
$$T_{arbAll} \in \langle 4 + 3 \cdot 2,\; 20 + 3 \cdot 5 \rangle \cdot 11 \cdot T_b \qquad (3)$$

$$T_{arbAll} \in \langle 10,\; 35 \rangle \cdot 11 \cdot T_b \qquad (4)$$

Higher Level Layers
be read from the incoming queue in parts, the OIDs are interpreted directly, and the reply message is built again in "driver space" buffers. The second advantage is that the reply identifies which objects' data it contains. This allows multiple data requests from different controlling nodes or applications to be in flight at the same time.

An example of a system utilizing many uLAN services is the CHROMuLAN HPLC control system developed by Jindrich Jindrich and PiKRON Ltd.

(Figure: CHROMuLAN Control System with Device 1 and Device 2 connected over uLan.)

are not required for most tasks of home automation systems. That is why the team preparing a new home automation project at the Department of Control Engineering proposed using uLan for heating monitoring and control, light switching and doorbells.

The uLOI layer supports device configuration and state monitoring by higher-level systems. However, the polling cycle used by a higher-level system is a significant disadvantage for home automation. Home appliances have to be equipped with a system which allows direct communication between nodes in response to incoming events. This avoids the latencies caused by the polling cycle. It also allows the system to provide at least basic functionality even if the higher-level control system fails.

It would be possible to use uLOI object messages for direct data writes or reads from one appliance to objects located in another one. However, this would require each appliance to know the structure of the others. It would also require either a complex, memory-hungry retrieval of OID lists and types, or an inflexible system in which other devices' OIDs are stored in fixed form in the firmware.
Res Lo | Res Hi | Ext len (el) | Ext         | CID        | data len (dl) | data     | CID ...
1 byte | 1 byte | 1 byte       | 0..el bytes | 2 bytes LE | 1 (2) byte    | dl bytes |
a single CID, use broadcast to distribute data to multiple destination devices, or even use more devices as data sources for the same CID. When a device receives a PDO message, it processes the data identified by every CID according to the configured mapping. CIDs and their data for which no mapping is found are simply skipped. Only compatibility of data types between the mapped source and destination OIDs is required, and sometimes even this requirement can be relaxed to some degree. If the destination type is shorter than the source, the remaining bytes are skipped; the opposite case is illegal in the current implementation. Predefined constant data can be sent in response to event activation as well.

Command UL_CMD_PDO (0x50) is specified for PDO messages. The message format starts with two bytes reserved for future static extensions. They are followed by one byte which can be used for dynamic PDO message header extensions in the future. These bytes should be sent as zero in the current protocol version. Each data block is preceded by its CID and data length. In the current implementation the maximal length of an individual data block is 127 bytes, encoded in a single byte. The format allows extension to two bytes in the future if needed.

Control of Data Mapping into Channels

All configuration and mapping of PDO data sources, and all processing of received PDO messages, is done through the device object dictionary (uLOI). Exchanged data and the meta-data stored in mapping tables use the same format as uLOI layer property/data access.

The core components are the ULOI_PICO and ULOI_POCO mapping tables, which share the same structure. They are accessible as regular uLOI arrays of four-field structures. Each array entry specifies a mapping between a CID and object dictionary entries. Simple one-to-one mappings are specified directly in the entry by OID number. A complex mapping can instead specify an offset into a block of meta-data bytes. This makes it possible to serialize the data of multiple objects/OIDs under one CID, to add an execute command after CID data reception and distribution into uLDOI objects, etc. Another possibility is to process the same received data by multiple mappings for the same CID. A special form is also supported, which embeds 3 bytes (OID + single byte) or 4 bytes (OID + 2 bytes) directly into a ULOI_PICO or ULOI_POCO mapping table entry.

Events to Process Messages Mapping

The ULOI_PEV2C array specifies which CID-identified transfers should be initiated when a given event number is activated. One event can be specified multiple times to trigger multiple CID transfers. Each ULOI_PEV2C array entry maps an event number to a CID and carries some flags that further control CID processing.

5 Example Applications

DAMIC Home Automation Components

The concept of the uLAN PDO connection channels is used in a set of components and appliances developed at the Department of Control Engineering. The set covers heating, ventilation, air-conditioning (HVAC), light control and other home automation tasks:

FIGURE 4: uACT 2i2ct - uLan Actuator and Temperature Sensor
uLAN-Admin
and send. The application allows this network configuration to be saved. The network configuration is transferable to real devices. The virtual device can control the real devices connected to the uLAN bus and vice versa.

FIGURE 7: uLAN-genmod - Application Main window with Two Devices

6 Conclusion

The uLAN protocol and its surrounding infrastructure have been used in many applications for years. These include:

- two generations of HPLC instrument sets (a third generation is in preparation now);
- several agricultural control systems and components;
- other serious production-grade and hobbyist projects, e.g. the HISC private house control network, based solely on uLOI, whose components were designed around 2005.

The uLAN uLCN/PDO design started in 2008, and its current version is complete and well tested. The approach is similar to the CANopen dictionary and PDO concept. It is more flexible, however, better suited to wider data types and generic arrays, and it inherits the flexibility of the underlying uLOI layer. The uLOI layer provides network introspection capabilities that are much better than those of many other standards. Yet after the initial device model retrieval phase, the metadata overhead for data exchange is kept very small.

The PDO mapping system has been tested on the CTU-developed home automation components during the DAMIC project. Initial versions of open-source management software based on the Qt library are being developed as well. The uLan driver and fully portable interface libraries make it possible to test even GNU/Linux builds of components and their interaction. A Qt-based component builder and dictionary source generator is in development. It is intended to help newcomers test the capabilities and to speed up the design of new nodes.

The uLCN/PDO mapping extension and the openness of the project make it an excellent candidate for smaller hobbyist home automation projects. Small nodes have minimal requirements (only a UART with software parity control), so such designs can be based on a cheap Cortex-M3 or even smaller MCUs. The design of the higher communication layers can be used even in combination with different link technologies, or at least serve as an inspiration for similar projects.

The uLan project is alive thanks to the participation of several companies and university members. The current version of the code, used in multiple commercially sold products, is available from the uLAN project SourceForge Git repository and file release archives.
Real-Time Linux Infrastructure and Tools
Abstract

This paper presents the application of RT-Preempt Linux in a virtual commissioning scenario. In this scenario, a proprietary Programmable Logic Controller (PLC) is connected to a real-time simulation model. The model runs on a separate Linux personal computer, which simulates, for example, the hardware of a production machine. The controller and the simulation computer are connected through the sercos III automation bus.

The simulation computer uses a sercos III PCI card as communication hardware in combination with a user space IO (UIO) driver. This allows the simulation model and the sercos III driver to run as real-time processes on the simulation computer. The sercos III driver was adapted to imitate the bus interface of a custom sercos III bus coupler and to provide easy integration into the PLC engineering system. Moreover, variables in the PLC can be coupled to input and output values of the simulation model.

This virtual commissioning method can reduce the time to market of a machine, because the PLC code can be written and tested in parallel to the construction of the hardware.
Application of RT-Preempt Linux and Sercos III for Real-time Simulation
2 Theory & State of the Art

This chapter gives a short introduction to the technologies and software systems used in this project.

2.1 RT-Preempt Linux and User Space IO Drivers

Linux with real-time kernel preemption (RT-Preempt) is an enhancement to the Linux kernel. The aim of this patch is to enable real-time capabilities in the Linux kernel. The RT-Preempt patch allows user space programs to run in real time [2], [3].

The User Space IO (UIO) driver model enables drivers to run in the user space of a Linux system [4]. UIO drivers are a convenient way to implement drivers for non-standard and rarely used hardware which does not fit into the regular kernel subsystems. The memory of a device is mapped to addresses which are accessible from user space memory segments. Interrupts can be handled by a user-space thread. In addition, a small interrupt handler in kernel space is necessary to wake that thread. With this functionality it is possible to write drivers for special-purpose devices without having to handle complex in-kernel structures. UIO drivers are often used for field-bus networking devices on systems running RT-Preempt Linux.

2.2 Serial Real-Time Communication System (sercos) III

The automation bus sercos III is an Ethernet-based field-bus system which can be used in a wide range of automation applications. Sercos III is standardised by the association sercos International e.V. [5]. In the following, the term sercos is used as an abbreviation for sercos III. Sercos is based on standard Ethernet and uses Ethernet frames to communicate on the bus. A sercos network consists of a bus master and several slave devices (Figure 1).

FIGURE 1: Sercos III ring with master and slave devices.

Each device is equipped with two Ethernet ports. The preferred bus topology is a ring structure, since a ring provides more redundancy than a star topology. A line topology with one or two lines (i.e. a broken ring) can be used as well. Sercos uses a sophisticated device model which classifies every bus component into different classes of functionality. The device model distinguishes between servo drives, IO devices and other automation hardware.

Furthermore, a parameter model was introduced to describe the functional interfaces of field-bus devices. Every device has a set of sercos parameters which characterise its interface. Parameters are accessed by unique identification numbers (IDN). Each parameter contains a description of the parameter as a string, several attributes, and the parameter data with fixed or variable length.

Sercos uses a start-up phase with five communication phases (CP), usually called CP0 to CP4. Once the start-up has passed the early phases and reached CP4, real-time communication is active, devices and connections are set up, and real-time data can be transmitted. Furthermore, sercos devices can be described in the sercos Device Description Markup Language (SDDML), which is based on the Extensible Markup Language (XML).

2.3 Passive sercos III PCI Card

Custom and PC-based sercos slaves can be built by equipping PCs with, for example, sercos III PCI networking cards from the company Automata [6]. The card contains standard Ethernet communication hardware and an FPGA which connects it to the PCI bus. A proprietary driver is necessary to bring the card into operation. This Sercos Slave Driver (SSLV) is written OS-independently and contains a hardware abstraction layer which can easily be ported to other operating systems. The card is called "passive" because it can only be used together with a driver that executes the sercos networking stack and a real-time operating system.

This project utilises a port of the SSLV to RT-Preempt Linux. The SSLV runs as a UIO driver in user space. The user space part of the driver is supported by a small kernel module called uio_sercos3, which is part of the mainline kernel. Figure 2 shows a rough overview of the SSLV.
FIGURE 2: Automata sercos III Slave Driver (SSLV); according to [7].

The driver consists of two parts: a small kernel module called uio_sercos3 and the user space application of the SSLV. The user space part is split into two threads. The first is a UIO interrupt handler thread, which is executed with a high real-time priority. The second is the UserTask, the regular user space part of the SSLV. The UserTask has a lower real-time priority and needs to be executed at least once per sercos communication cycle. Moreover, the SSLV contains a database of IDNs which can be accessed both from the bus and from the UserTask.

2.4 MLP VEP and IndraWorks

This project uses a proprietary Programmable Logic Controller (PLC) produced by the company Bosch Rexroth [8]. The MLP VEP is a PC-based PLC which is equipped with several sercos ports and acts as the sercos master device on the bus. It is programmed and configured with the engineering tool IndraWorks. It can execute PLC programs written in the five languages specified in IEC 61131-3 [9] and has additional Motion Logic Control (MLC) functionality.

2.5 Virtuos

Virtuos [10], [1] is simulation software which executes mechatronic and other models in real time. Virtuos consists of three software parts, Virtuos-M, Virtuos-V and Virtuos-S, which provide different functionality to the user. Virtuos-M and Virtuos-V are used for modelling and visualisation of simulation models. Virtuos-S is used as a simulation solver which can compute simulation

3 Problem Definition

Virtual commissioning of a production machine requires a real-time simulation model of the hardware. This virtual machine model can usually be executed by Virtuos on a PC which also executes a PLC or another type of controller software. However, the project specification required an MLP VEP PLC, a proprietary controller which cannot execute a Virtuos model. Moreover, it provides no standard interface for connecting it to a virtual machine model. A new method is therefore needed to connect the PLC to the model. The PLC is a proprietary device that has to be programmed with proprietary software, so there is no simple way to extend it with custom real-time tasks.

The solution to this problem is to move the simulation model to a PC and let it communicate with the PLC. Since the communication between PLC and simulation model has to run in real time, a field-bus can be used to connect the PLC to the simulation (Figure 3).

FIGURE 3: PLC and simulation PC.

Since the MLP VEP PLC offers direct access to the sercos field-bus, sercos is used as the field-bus in this project. The same field-bus will later be used in the production machine, which also simplifies the integration of other hardware connected to the PLC afterwards.

4 Approach

Sercos uses a device model which can represent different types of automation devices. However, no device has an interface that matches the complexity of
Virtuos simulation model. A simulation model produces and consumes a large amount of data in every simulation cycle. In this project it is sufficient to exchange floating-point and integer values, since the simulation consists of a mechatronic model. The interface between the PLC and the simulation was therefore defined as a set of integer and floating-point values. To integrate the simulation model into the bus system, the interface of the model was extended to resemble the interface of a (very large) bus coupler (see Figure 4).

FIGURE 4: Simulation model hidden behind the interface of a bus-coupler device

5 System Design

This chapter introduces the design of the simulation system. Figure 5 shows the overall system structure.

the PLC. The field-bus system is also configured from this system. The right-hand side shows the simulation computer with an RT-Preempt patched Linux kernel. This system is also equipped with a passive sercos PCI communication card. PLC and simulation PC are connected via sercos. For debugging purposes, an Ethernet wiretap can be inserted, as shown in the figure. The SSLV and the simulation model both run on the simulation PC. Apart from the RT kernel, this system is a standard Linux PC, so additional software can be used for debugging as well. The SSLV handles sercos communication through the PCI card. It also provides IPC interfaces for communicating with the simulation model. In addition, the SSLV can record debugging information into a FIFO buffer in real time. Third-party programs can easily read this information, or it can be saved for later analysis.

FIGURE 6: Communication concept
5.2 Controlling the Simulation from data exchange with Virtuos. As a first step, the ser-
the PLC cos interface of the IDN database of the SSLV was
enhanced to emulate the bus-interface of a standard
Within the programming system of the PLC, IO off the shelf bus-coupler with just 16 bits of IO-data.
ports of devices can be mapped to variables. Vari- This is an error prone process since there is no de-
ables can be connected to either input or output scription which IDNs are retrieved and evaluated by
ports. Afterwards, IO-operations can be done by the PLC during the start-up phases. The fields for
setting bit-masks in the PLC program. In addition, cyclic real-time data were extended afterwards to the
field-bus devices can be added to the IndraWorks size of 512 bytes as specified in the custom SDDML
project from a device database. The database can be extended by device descriptions. For sercos devices this can be achieved by using files in the SDDML language. Since the bus-coupler used in this project is not a standard off-the-shelf bus-coupler, an SDDML file was written which describes a very large bus-coupler. The file was added to the device database of IndraWorks so that it can be used within the PLC program.
Moreover, a data structure was created which combines all the IO data that has to be sent or received in one communication cycle. The structure contains a fixed number of 32-bit integer and 64-bit floating-point variables (see Listing 1).
TYPE io_type:
STRUCT
    reals: ARRAY [0..31] OF LREAL;
    integers: ARRAY [0..63] OF DINT;
END_STRUCT
END_TYPE
Listing 1: Specification of a data packet in the PLC
Since the size of the structure in bytes equals the size of the bus-coupler's IO data, the complete structure can be connected to the IO configuration at once. To support input and output data, two structures were added, one to the input and one to the output of the device. Consequently, a regular PLC program can be used to perform calculations as well as input and output operations.
file of the device. In the final configuration 512 bytes of data are transferred from the PLC to the simulation, and the same amount from the simulation to the PLC, in every communication cycle. Data composition and decomposition is also done by the communication thread. Listing 2 shows the specification of a data packet in C source code:
typedef struct {
    double doubles[32];
    int ints[64];
} io_type;
Listing 2: Specification of a data packet in the SSLV
Luckily, the compiler for the PLC code and the GNU C compiler use the same method of storing data. To decompose the data packet back into structures of variables, a pointer to a byte array can be used: the pointer has to be cast into a pointer of type io_type, and vice versa. To connect the SSLV to the running simulation and to synchronise with it, a separate Virtuos-IO (VIO) thread is used. The VIO thread is responsible for exchanging data with the running Virtuos simulation and for triggering simulation steps from outside. For synchronisation purposes two semaphores are deployed. Figure 7 shows the execution model as a simple Gantt diagram (without the running communication thread).
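The data exchange and two-semaphore handshake described above can be sketched in C under several assumptions: POSIX unnamed semaphores stand in for the two semaphores, a toy function stands in for a Virtuos simulation step (the paper does not show Virtuos's interface), and memcpy() replaces the pointer cast, which produces the same bytes while avoiding strict-aliasing and alignment pitfalls. Sleeping until the next cycle start with clock_nanosleep() and an absolute time is our assumption about how the VIO thread keeps in step with the bus period. The names vio_cycle, simulation_thread and run_vio_demo are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <string.h>
#include <time.h>

/* Data packet of Listing 2: 32 doubles + 64 ints = 512 bytes on common
 * Linux ABIs (8-byte double, 4-byte int, no padding). */
typedef struct {
    double doubles[32];
    int ints[64];
} io_type;

_Static_assert(sizeof(io_type) == 512, "io_type must match the 512-byte IO image");

static sem_t start_sem;          /* VIO -> simulation: begin one step   */
static sem_t end_sem;            /* simulation -> VIO: step finished    */
static io_type sim_in, sim_out;  /* hypothetical simulation IO image    */
static int stop_requested;       /* published to the simulation via start_sem */

/* Hypothetical stand-in for the simulation: one toy step per "start" post. */
static void *simulation_thread(void *arg) {
    (void)arg;
    for (;;) {
        sem_wait(&start_sem);
        if (stop_requested)
            break;
        for (int i = 0; i < 32; i++) sim_out.doubles[i] = 2.0 * sim_in.doubles[i];
        for (int i = 0; i < 64; i++) sim_out.ints[i] = sim_in.ints[i] + 1;
        sem_post(&end_sem);
    }
    return NULL;
}

/* One VIO cycle: decompose the raw 512-byte input, run one simulation
 * step, compose the raw output. memcpy() replaces the pointer cast. */
static void vio_cycle(const unsigned char rx[512], unsigned char tx[512]) {
    memcpy(&sim_in, rx, sizeof sim_in);
    sem_post(&start_sem);   /* signal the beginning of a simulation step */
    sem_wait(&end_sem);     /* wait for the end of the simulation step   */
    memcpy(tx, &sim_out, sizeof sim_out);
}

/* Runs `cycles` VIO cycles with the given period, feeding each reply back
 * as the next input; returns ints[0] of the last reply. */
static int run_vio_demo(int cycles, long period_ns) {
    pthread_t sim;
    unsigned char rx[512], tx[512];
    io_type packet;
    struct timespec next;

    memset(&packet, 0, sizeof packet);
    sem_init(&start_sem, 0, 0);
    sem_init(&end_sem, 0, 0);
    stop_requested = 0;
    pthread_create(&sim, NULL, simulation_thread, NULL);

    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int n = 0; n < cycles; n++) {
        memcpy(rx, &packet, sizeof rx);      /* stand-in for bus input  */
        vio_cycle(rx, tx);
        memcpy(&packet, tx, sizeof packet);  /* stand-in for bus output */
        /* Sleep until the next cycle start instead of starting at once. */
        next.tv_nsec += period_ns;
        while (next.tv_nsec >= 1000000000L) {
            next.tv_nsec -= 1000000000L;
            next.tv_sec++;
        }
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
    }

    stop_requested = 1;
    sem_post(&start_sem);                    /* wake the simulation so it exits */
    pthread_join(sim, NULL);
    sem_destroy(&start_sem);
    sem_destroy(&end_sem);
    return packet.ints[0];
}
```

For example, run_vio_demo(5, 1000000L) runs five 1 ms cycles; because the toy step adds 1 to every integer and the reply is fed back, it returns 5. In the system described here the VIO thread is separate from the communication thread; in this sketch the calling thread plays both roles to keep it short.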
Application of RT-Preempt Linux and Sercos III for Real-time Simulation
semaphore has the purpose to signal the beginning of a simulation step; the "end" semaphore signals the end of a simulation step. The VIO thread is started at T_N. When the VIO thread has completed its data transfer to the simulation, the simulation is started. The simulation executes one simulation cycle and signals the end of the cycle to the VIO thread. Since the exact simulation time varies from application to application, the VIO thread does not start the next data transfer immediately but sleeps until T_{N+1}, to stay in step with the other parts of the system.
6 Conclusion
This paper presents how RT-Preempt Linux can be used for real-time simulation and the virtual commissioning of production machines. A proprietary PLC is connected to a simulation PC which executes the real-time simulation model. The automation bus sercos III is used to transfer data in a deterministic manner between the PLC and the simulation PC. To adapt the simulation model to the field-bus, its interface is hidden behind the interface of a bus-coupler device. For this purpose a sercos III PCI networking card is utilised. The driver of the card is enhanced to emulate the interface of a bus-coupler and to transfer data between the bus and the simulation model. The simulation model is executed by the simulation software Virtuos on the simulation PC. With this setup, PLC programs for controlling production machines, which need to run on their (proprietary) and unmodified target hardware, can be tested by means of simulated mechatronic hardware. Accordingly, the time to market of a production machine can be reduced by parallelising development tasks, since programs which control or depend on mechanical hardware can be tested without the real hardware being available.
lation may cover more than one field-bus device at once. As a result, it will be feasible to switch between simulated hardware and real hardware without the need for any changes in the PLC.
8 Acknowledgement
The authors would like to thank the German Research Foundation (DFG) for its financial support of the projects within the Graduate School of Excellence advanced Manufacturing Engineering (GSaME) at the University of Stuttgart.
References
[1] Hardware in the loop simulation of production systems dynamics, Sascha Röck, 2011, Prod. Eng. Res. Devel., German Academic Society for Production Engineering (WGP), Springer Verlag, Germany.
[2] Realtime Linux, Open Source Automation Development Lab (OSADL), https://www.osadl.org/Realtime-Linux.projects-realtime-linux.0.html, 2011.
[3] Real-Time Linux Wiki, https://rt.wiki.kernel.org.
[4] UIO drivers in the context of RT kernels, Hans-Jürgen Koch, Germany, Twelfth Real-Time Linux Workshop, 2010, Kenya.
[5] sercos International e.V., www.sercos.org, Germany.
[6] AUTOMATA GmbH & Co. KG, www.automataweb.com, Germany.
[7] Sercos III Slave Driver API Documentation V1.1, Automata GmbH, 2011, Germany.
Real-Time Linux Infrastructure and Tools
Andrea Claudi
Università Politecnica delle Marche, Department of Ingegneria dell’Informazione (DII)
Via Brecce Bianche, 60131 Ancona, Italy
Abstract
Testing is a key step in the software development cycle. Without full and comprehensive testing of the system, the cost of fixing errors and bugs can significantly increase development costs.
The first efforts to introduce real-time features into the Linux kernel are now more than ten years old. Nevertheless, no comprehensive testsuite is able to assess the functionality of the real-time features of the Linux kernel and of the real-time nanokernels that rely on it, or their conformance to real-time operating system standards.
In this paper we propose Lachesis, an automated testsuite derived from the LTP (Linux Test Project) real-time tests. Lachesis is designed with portability and extensibility as its main goals, and it can be used to test the real-time features and performance of Linux, PREEMPT_RT, RTAI and Xenomai. It also provides some tests for the SCHED_DEADLINE patch. Lachesis is under active development, and more tests are planned to be added in the near future.
Lachesis: a testsuite for Linux based real-time systems
rapid evolution. We need more efficient, effective and comprehensive test methods, able to ensure proper software behaviour in the wide range of situations where systems can be deployed. For example, tests are critical in an environment where a malfunctioning system can seriously damage machinery or structures, or even endanger human life.
Nowadays there are many automatic testsuites, covering an increasingly wide range of kernel features. Many of these testsuites make it possible to functionally test file systems and the network stack, to evaluate the efficiency of memory management and of communication between processes, and to assess standards compliance. Very few of them perform functional testing of real-time features, and none of them tests the performance of real-time features or their conformance to standards.
1.1 Paper contributions
In this paper we propose Lachesis, an automated testsuite for Linux based real-time systems, derived from the LTP real-time tests. Lachesis' main goals are:
• to provide extensive and comprehensive testing of real-time Linux kernel features
• to provide a common test environment for different Linux based real-time systems
• to provide a set of functional, regression, performance and stress tests, either by developing them or by porting them from other testsuites
• to design and experiment with a series of build tests
2 Taxonomy of testing methodologies
In software engineering, testing is the process of validation, verification and reliability measurement that ensures the software works as expected and meets its requirements.
The Linux kernel has been tested since its introduction. In the early stages of development tests were ad hoc and very informal: every developer individually tested the portion of code he developed, with his own methodologies and techniques; tests frequently followed end-user bug reports, and were aimed at resolving the problem by identifying the section of code causing it.
Over the years the kernel grew to support many different architectures and platforms. Testing became increasingly difficult and time-consuming, but remained critical for kernel reliability, robustness and stability.
For this reason different testing methodologies and techniques for the Linux kernel were tried out and used. Broadly, these methods can be grouped into seven categories [7].
2.1 Built-in debugging options
This kind of test must not be run simultaneously with functional and performance tests. It consists of a series of debugging options (CONFIG_DEBUG_* in the Linux kernel) and fault insertion routines that allow the kernel to test itself.
examine the code statically with tools like sparse, LClint [8] (later renamed splint), and BLAST [9].
2.4 Functional and unit tests
Functional and unit tests are conceived to examine one specific system functionality. The code implementing a feature is tested in isolation, to ensure that it meets the requirements of the specific operation it implements. Crashme [10] is an example of this kind of test.
Regression tests are designed to uncover new errors and bugs in existing functionality after changes are made to the software, such as the introduction of new features or patches correcting old bugs. The goal of this kind of test is to ensure that a change did not introduce new errors.
Regression testing methods vary, but in general they consist of rerunning previously run tests and evaluating system behaviour, checking whether new errors appear or old errors re-emerge.
2.6 Performance tests
Performance tests measure the relative performance of a specific workload on a certain system. They produce data sets and comparisons between tests, making it possible to identify performance changes or to confirm that no change has happened. In this category we can include kernbench, a tool for CPU performance tests; iobench, a tool for disk performance tests; and netperf, a tool to test network performance.
2.7 Stress tests
Stress tests push the system to the limits of its capabilities, trying to identify anomalous behaviour. A test of this kind can be conceived as a highly parallel task, such as a completely parallelized matrix multiplication. A performance test running under heavy memory pressure (such as running with a small physical memory), or in a highly resource-competitive environment (competing with many other tasks to access the CPU, for example), can become a stress test.
3 Automated testsuites
Testing is expensive, both in terms of cost and time. Automation is a good way to reduce the economic and human effort spent on testing. A number of automated testing environments for the Linux kernel have been proposed, each with its own strengths and weaknesses.
In the following paragraphs the most important test suites for the Linux kernel are presented, together with the goals and basic concepts that guide their design.
IBM autobench is an open source test harness¹ conceived to support build and system boot tests, along with support for profiling [7]. It is written in a combination of perl and shell scripts, and it is fairly comprehensive.
Autobench can set up a test execution environment, perform various tests on the system, and write logs of statistical data. Tests can be executed in parallel, but test control support is basic and the user has almost no control over the way tests are executed. Error handling covers the success or failure of the tests, but it is a very complex activity and must be done explicitly in all cases. In addition, the use of different languages limits the testsuite's extensibility and maintainability.
The IBM autobench project has been inactive since 2004, when the last version was released.
3.2 Autotest
Autotest is an open source test harness capable of running as a standalone client. It can also easily be plugged into an existing server harness [11]. Autotest provides a large number of tests, including functional, stress, performance, regression and kernel build tests. It also supports various profilers.
Autotest is written in Python, which enables an object-oriented and clean design; this makes the testsuite easy to extend and maintain. By including Python syntax in the job control file, for example, users can take more control over test execution. On the other hand, Python is not widely used in the real-time community, and is not suited for developing real-time tests or applications.
¹ A test harness is an automated test framework designed to perform a series of tests on a program unit under different operating conditions and loads, monitor the behaviour of the system, and compare the test results with a given range of acceptable values.
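The test-harness definition in footnote 1 (run a test, then compare its result with a range of acceptable values) can be illustrated with a minimal C sketch that also emits one machine-parsable line per test. The test_case structure, the key=value output format and the two stand-in tests are hypothetical; they are not taken from autobench, Autotest or any other suite discussed here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A test returns a measured value (e.g. a latency in microseconds). */
typedef double (*test_fn)(void);

typedef struct {
    const char *name;
    test_fn run;
    double min_ok;   /* lower bound of the accepted range */
    double max_ok;   /* upper bound of the accepted range */
} test_case;

/* Runs one test and writes a machine-parsable line such as
 * "name=demo value=12.000 result=PASS". Returns 1 on PASS, 0 on FAIL. */
static int run_case(const test_case *tc, char *line, size_t len) {
    double v = tc->run();
    int pass = v >= tc->min_ok && v <= tc->max_ok;
    snprintf(line, len, "name=%s value=%.3f result=%s",
             tc->name, v, pass ? "PASS" : "FAIL");
    return pass;
}

/* Runs all cases, prints each result line, returns the number of failures. */
static int run_all(const test_case *cases, size_t n) {
    char line[128];
    int failures = 0;
    for (size_t i = 0; i < n; i++) {
        if (!run_case(&cases[i], line, sizeof line))
            failures++;
        puts(line);
    }
    return failures;
}

/* Hypothetical stand-in tests returning fixed "measurements". */
static double fake_latency_ok(void)  { return 12.0; }
static double fake_latency_bad(void) { return 250.0; }
```

With an accepted range of 0 to 50, the first stand-in yields "name=ok value=12.000 result=PASS" and the second a FAIL line, so run_all() reports one failure. A line-oriented key=value format is one simple way to make results machine parsable.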
Autotest has built-in error handling support. Tests produce machine parsable logs; their exit status is consistent and a descriptive message is provided. A parser is built into the server harness, with the task of summarizing test execution results from different testers and formatting them in an easy-to-read form.
Autotest includes very few tests to examine the Linux kernel from a real-time point of view; almost all of them are functional tests. Moreover, it does not include any compliance test on real-time standards.
3.3 Crackerjack
Crackerjack is a testsuite whose main goal is regression testing [12]. It provides:
• automatic assessment of kernel behaviours
• test results storage and analysis
• incompatibilities notification
• test result and expected test result management (register, modify, remove)
Crackerjack was initially developed to test Linux kernel system calls, but over time it has been revised to ease future extension to other operating systems.
It is implemented using Ruby on Rails. This makes it easy to modify, ensuring a low maintenance cost and simplifying the development of new tests. However, as with Python, Ruby is not suited for developing real-time tests or applications.
Crackerjack integrates a branch tracer for the Linux kernel, called btrax. Btrax is a tool for analysing program effectiveness. Crackerjack uses btrax to trace the branch executions of the target program, to analyse the trace log file, and to display data about coverage and execution path. btrax makes use of the branch trace capabilities of Intel processors, recording how much code was tested.
Crackerjack does not support conformance, performance or stress tests, and does not include any functional test on real-time features.
3.4 rt-tests
rt-tests [13] is a popular testsuite developed to test the PREEMPT_RT patch to the Linux kernel. It is developed by Thomas Gleixner and Clark Williams, and it is used in the OSADL lab for continuous testing across various hardware architectures. rt-tests includes ten different tests for real-time features.
cyclictest [14], for example, is a well known test that measures the latency of cyclic timer interrupts. Through command line options, the user can choose to pin the measurement thread to a particular core of a multi-core system, or to run one thread per core. Cyclictest works by creating one or more user-space periodic threads, the period being specified by the user. The accuracy of the measurement is ensured by using different timing mechanisms.
Another interesting test is hackbench. It is both a benchmark and a stress test for the Linux kernel scheduler. Hackbench creates a specified number of pairs of schedulable entities which communicate via sockets. It measures how long it takes for each pair to send data back and forth.
rt-tests is a good and well established suite to test Linux kernel real-time features. However, it is conceived primarily to test the PREEMPT_RT patch set, so it is quite difficult to extend it to other Linux based real-time systems. For example, it contains a couple of tests based on a driver for the Linux kernel; in systems such as Xenomai, the use of such a driver causes a mode change in which the real-time task switches from the Xenomai to the Linux environment. Thus, the task experiences much longer latencies.
Test results are output as a statistical summary, rather than as a boolean "PASS" or "FAIL". Unfortunately rt-tests does not provide any mechanism for collecting the results and presenting them in a machine parsable form.
3.5 LTP - Linux Test Project
LTP (Linux Test Project) is a functional and regression testsuite [15]. It contains more than 3000 test cases covering much of the functionality of the kernel, and the number of tests keeps growing. LTP is written almost entirely in C, except for some shell scripts.
In recent years LTP has been increasingly used by kernel developers and testers, and today it is almost a de facto standard for testing the Linux kernel [16] [17]. Linux distributors use LTP, too, and contribute enhancements, bug fixes and new tests back to the suite.
LTP excels at testing basic Linux kernel functionality, generating sufficient stress from the test cases. LTP is able to test and stress filesystems, device drivers, memory management, the scheduler, disk I/O, networking, system calls and IPC, and provides a good number of scripts to generate heavy load on the system.
It also provides some additional testsuites such as pounder, kdump, open-hpi, open-posix, code coverage [18], and others.
LTP lacks support for profiling, build and boot tests. Even though it contains a complete set of tests, LTP is not a general heavyweight testing client.
LTP also lacks support for machine parsable logs. Test results can be formatted as HTML pages, but they are either "PASS" or "FAIL", which makes it harder for the tester to understand the reasons behind failures.
LTP has a particularly interesting real-time testsuite that provides functional, performance and stress tests on the Linux kernel real-time features. To the best of our knowledge, LTP is one of the few testsuites that provide such a comprehensive and full-featured set of tests for Linux real-time functionality.
4 Lachesis
All the analysed testsuites seem to suffer from some key failings when it comes to testing the real-time features of the Linux kernel.
Many of them seem to give little consideration to real-time features in the Linux kernel. Most of them (with the notable exception of LTP and rt-tests) do not offer any real-time functional, performance or stress test.
Usually it is simple to design and develop a new test inside an existing testsuite. However, it is very difficult to extend an entire testsuite in order to test new real-time nanokernels. In fact, the system calls analysed in tests may differ syntactically from one kernel to another while providing the same functionality. Moreover, some nanokernels, such as RTAI and Xenomai, provide real-time features through additional system calls, for which specific tests should be developed.
Another problem is the lack of machine parsable results. There is no standard way to consistently communicate results to the user; often there is no detail on the reason that led a test to fail.
Lastly, every testsuite has grown rapidly and chaotically in response to the evolution of the Linux kernel. For this reason they are not easy to understand, maintain and extend.
The lack of a comprehensive testsuite meeting the needs outlined above led us to develop Lachesis.
The ambitious goal of Lachesis is to provide a common test environment for different Linux based real-time systems. Therefore it seems reasonable to start from an existing, accepted and widely used testsuite, to adopt its principles and apply them to a new testsuite, conceived with other goals and priorities.
We chose LTP as a starting point for Lachesis. There are many reasons behind this choice. First, LTP is one of the few testsuites able to provide a set of tests for Linux kernel real-time features, and a large number of testers use it. Second, LTP has a well established and clean architecture. It makes use of two main libraries: librttest, which provides an API to create, signal, join and destroy real-time tasks, and libstats, which provides an API for some basic statistical analysis and some functions to save data for subsequent analysis. Last, LTP provides a logging infrastructure, an important and desirable feature for Lachesis, too.
We believe there is little point in comparing test results with absolute values statically built into the testsuite, since these values should change whenever the hardware under test changes. So, unlike LTP, Lachesis provides a boolean pass/no-pass output only for functional tests; for performance tests, by contrast, it outputs a statistical summary of the results.
4.1 Architecture
Lachesis is designed to analyse a variety of Linux based real-time systems; therefore it provides a straightforward method to build tests for different kernels. During configuration, Lachesis probes the system to determine which nanokernels or real-time patches are present, and instructs the compiler to produce a different executable for each system to be tested. A set of scripts is provided to execute tests sequentially; when these scripts are launched, tests are executed one after another and their results are stored in logs for subsequent analysis.
librttest had to be rewritten to support both RTAI and Xenomai primitives. Lachesis keeps the librttest API, extending it to provide advanced real-time features typical of Linux-based nanokernels. Basic real-time features are provided by encapsulating real-time specific functions into the pre-existing API, thus concealing them from the user. For example, the create_task() primitive was modified to take into account the corresponding primitives for Xenomai and RTAI.
As a result, it is possible to write a single test for a specific real-time feature and use it to test all supported systems, thus increasing the testsuite's portability
measures the time a task waits to lock a mutex. The test creates a high-priority task, a low-priority task and some medium-priority tasks; the highest-priority task tries to lock a mutex shared with all the other tasks. The test is repeated 100 times.
2) func_deadline1 is a functional test we developed, intended to be used only on SCHED_DEADLINE patched kernels. It creates a task set with total utilization U = 1, using the UUniFast [20] algorithm so as not to bias the test's results. The task set is scheduled, and the test is passed if no deadline is missed.
3) func_deadline2 is a functional test we developed, intended to be used only on SCHED_DEADLINE patched kernels. It creates a task set with U > 1, using the UUniFast algorithm so as not to bias the test's results. The task set is scheduled, and the test is passed if at least one deadline is missed.
4) func_gettime verifies clock_gettime() behaviour. It creates a number of tasks, some of them put to sleep, others ready to be scheduled. The test is passed if the total execution time of the sleeping tasks is close to zero. This is a functional test.
5) func_mutex creates a number of tasks that walk through an array of mutexes. Each task holds up to a maximum number of locks at a time. When the last task has finished, it tries to destroy all mutexes. The test is passed if all mutexes can be destroyed, none of them being held by a terminated task. This is a functional test.
6) func_periodic1 creates three groups of periodic tasks, each group with a different priority. Each task performs some computation, then sleeps until its next period, 6000 times. The test is passed if no period is missed. This is a functional test.
7) func_periodic2 is a functional test we developed, intended to be used on kernels that support primitives for periodic scheduling. It creates a task set with U = 1, using the UUniFast algorithm so as not to bias the test's results. The task set is scheduled, and the test is passed if no period is missed.
8) func_periodic3 is a functional test we developed, intended to be used on kernels that support primitives for periodic scheduling. It creates a task set with U > 1, using the UUniFast algorithm so as not to bias the test's results. The task set is scheduled, and the test is passed if at least one period is missed.
9) func_pi checks whether priority inheritance support is present in the running kernel. This is a functional test.
10) func_preempt verifies that the system is able to ensure correct preemption between tasks. It creates 26 tasks at different priorities, each of them trying to acquire a mutex. The test is passed if all tasks are appropriately preempted in one loop. This is a functional test.
11) func_prio verifies priority-ordered wakeup from waiting. It creates a number of tasks with increasing priorities, and a master task; each of them waits on the same mutex. When the master task releases its mutex, the other tasks can run. The test is passed if the tasks wake up in the correct priority order. This is a functional test.
12) func_sched verifies scheduler behaviour using a football analogy. Two kinds of tasks are created: defence tasks and offence tasks. Offence tasks run at the lowest priority and try to increment the value of a shared variable (the ball). Defence tasks have a higher priority and should block the offence tasks so that they never execute; in this way the ball position should never change. The highest-priority task (the referee) ends the game after 50 seconds. The test is passed if the shared variable is still zero at the end of the test. This is a functional test.
13) jitter_sched measures the maximum execution jitter obtained when scheduling two different tasks. The execution jitter of a task is the largest difference between the execution times of any of its jobs [19]. The first task measures the time it takes to do a fixed amount of work; it is periodically interrupted by a higher-priority task that simply wakes up and goes back to sleep. The test is repeated 1000 times. This is a performance test.
14) latency_gtod is a performance test. It measures the time elapsed between two consecutive calls of the gettimeofday() primitive. The test is repeated a million times, in batches of ten thousand.
15) latency_hrtimer schedules one timer task and many busy tasks. The busy tasks run at a lower priority than the timer task; they busy-wait, then yield the CPU. The timer task measures the time it takes to return from a nanosleep() call. The test is repeated 10000 times, and is passed if the latency of the highest-priority task is not increased by the low-priority tasks. This is a performance test.
16) latency_kill schedules two tasks with different priorities. The lower-priority task sends a kill signal to the higher-priority task, which terminates. The test measures the latency between the start and the termination of the higher-priority task. The test is repeated 10000 times, and we expect a latency below tens of microseconds. This is a performance test.
17) latency_rdtsc is a performance test. It measures the average latency between two reads of the TSC register, using the rdtscll() primitive. The test is repeated a million times.
18) latency_sched is intended to measure the latency involved in periodic scheduling on systems that do not support primitives for periodic scheduling. A task executes, then sleeps for a certain amount of time; at the beginning of the new period the task is rescheduled. We measure the difference between the expected and the effective start time of the task. We expect this difference to be below tens of µs and, additionally, we expect the task not to miss any period. This is a performance test.
19) latency_signal schedules two tasks with the same priority. One task sends a signal, the other receives it. The test measures the time elapsed between sending and receiving the signal. The test is repeated a million times. We expect a latency below tens of µs. This is a performance test.
20) stress_pi stresses the Priority Inheritance protocol. It creates 3 real-time tasks and 2 non-real-time tasks, locking and unlocking a mutex 5000 times per period. We expect the real-time tasks to make more progress on the CPU than the non-real-time tasks. This is a stress test.
5 Conclusions and future work
In this paper we have presented Lachesis, a unified and automated testsuite for Linux based real-time systems. Lachesis tries to meet the need for a software tool to test the real-time features of Linux and Linux-based systems, with the following qualities:
• it supports tests on Linux, RTAI, Xenomai, PREEMPT_RT and SCHED_DEADLINE real-time features, through a standard test API
• it provides a series of functional, performance and stress tests to ensure the functionality of the examined kernels
• it provides a series of tests for periodic and deadline tasks
• it is easy to use: each feature to be tested is associated with a script, which runs tests and logs the results for every testable system
• it includes a set of bash scripts that help to execute tests in the correct order and under the correct conditions.
Several real-time tests were ported to Lachesis from other testsuites in a simple and straightforward way. In many cases there was no need to change the code except to add some macro calls at the beginning and at the end of the test's code.
Our extension to the librttest API made it possible to develop some new tests for threads with fixed periods or deadlines. These tests are useful to evaluate jitter and latency in periodic task scheduling. Similar tests can be developed in a very short time.
Unfortunately, Lachesis is far from complete. First, its test coverage is very low. Second, the tests included in Lachesis are somewhat too general to be really useful in development. So Lachesis needs to expand its test coverage with more specific tests, and librttest needs to take into account more low-level primitives, to make it possible to develop more significant tests.
For these reasons, we believe it is very important to integrate the rt-tests testsuite into Lachesis. As underlined previously, rt-tests is very specific with respect to Lachesis, so it is quite difficult to figure out how to extend these tests to other real-time nanokernels. We expect that a substantial extension to the librttest API is necessary to reach this goal.
Up to now Lachesis has been tested and used only on the x86 architecture. Given that we use only high-level kernel primitives, we are quite confident that the testsuite can be ported to other architectures with little or no effort. Recently we ported Xenomai 2.5.5.2 and RTAI 3.8 to a Marvell ARM9 board², and we plan to use Lachesis to test the functionality and the performance of these ports.
However, the variety of Linux based real-time systems that Lachesis is already able to test shows that it is portable and easy to use. We plan to exploit these qualities by porting Lachesis to other systems, such as IRMOS [2], SCHED_SPORADIC [21], XtratuM [22] and PartiKle [23], and to other architectures.
Beyond this, we plan to develop some kernel-space tests for real-time nanokernels and to build a system to parse results and format them as XML. The quality of test results can also be improved by detailing possible reasons behind a test failure.
Lachesis is currently under active development, and can be downloaded from bitbucket.org³.
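Several of the functional tests listed above (func_deadline1, func_deadline2, func_periodic2 and func_periodic3) build their task sets with the UUniFast algorithm [20]. A minimal C sketch of UUniFast follows; it is our illustration, not Lachesis code. One substitution is stated plainly: instead of computing r^(1/k) with pow(), the sketch takes the maximum of k independent uniform numbers, which has the same distribution (its CDF is x^k) and avoids a dependency on libm. rand() is used only for brevity, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Uniform random number in [0, 1). */
static double uniform01(void) {
    return (double)rand() / ((double)RAND_MAX + 1.0);
}

/* Sample distributed like r^(1/k) for r uniform in [0, 1): the maximum of
 * k independent uniforms has CDF x^k, i.e. the same distribution. */
static double root_uniform(int k) {
    double m = 0.0;
    for (int j = 0; j < k; j++) {
        double r = uniform01();
        if (r > m) m = r;
    }
    return m;
}

/* UUniFast: fills u[0..n-1] with task utilizations that sum to total_u,
 * uniformly distributed over all such vectors. */
static void uunifast(int n, double total_u, double *u) {
    double sum_u = total_u;
    for (int i = 1; i < n; i++) {
        double next_sum = sum_u * root_uniform(n - i);
        u[i - 1] = sum_u - next_sum;   /* utilization of task i */
        sum_u = next_sum;              /* utilization left for the rest */
    }
    u[n - 1] = sum_u;
}

/* Derives worst-case execution times C_i = u_i * T_i from the periods. */
static void wcets_from_utilizations(int n, const double *u,
                                    const double *period, double *wcet) {
    for (int i = 0; i < n; i++)
        wcet[i] = u[i] * period[i];
}
```

For example, periods of 10 and 20 ms with utilizations 0.5 and 0.25 give WCETs of 5 ms each. An overload task set in the spirit of func_deadline2 or func_periodic3 can be obtained by passing a total utilization above 1.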
² ARM Marvell 88F6281, equipped with a Marvell Feroceon processor clocked at 1.2 GHz, ARMv5TE instruction set.
³ https://bitbucket.org/whispererindarkness/lachesis
Abstract
Linux has become a popular foundation for systems with real-time requirements such as industrial control applications. In order to run such workloads on Linux, the kernel needs to provide certain properties, such as low interrupt latencies. For this purpose, the kernel has been thoroughly examined, tuned, and verified. This examination covers all aspects of the kernel, including the device drivers necessary to run the system.
However, hardware may change and therefore require device driver updates or replacements. Such an update might require reevaluation of the whole kernel because of the tight integration of device drivers into the system and the manifold ways in which they can potentially interact. This approach is time-consuming and might require revalidation by a third party. To mitigate these costs, we propose to run device drivers in user-space applications. This allows us to rely on the unmodified and already analyzed latency characteristics of the kernel when updating drivers, so that only the drivers themselves need to be evaluated.
In this paper, we present the Device Driver Environment (DDE), which uses the UIO framework, supplemented by some modifications, to run any recent PCI driver from the Linux kernel in user space without modifications. We report on our implementation, discuss problems related to DMA from user space and evaluate the achieved performance.
Generic User-Level PCI Drivers
these drivers still need to be rewritten from scratch using UIO. In this paper we propose an alternative
technique: using UIO and other available kernel mechanisms, we implement a Device Driver Environment
(DDE), a library providing a kernel-like interface at the user level. This approach allows reusing
unmodified in-kernel drivers by simply wrapping them with the library and running them at the user level.

In the following section, we introduce the general idea of the DDE and inspect the UIO framework with
respect to its support of a generic user-level driver layer. We then discuss our implementation of a DDE
for Linux in Section 3. Thereafter, we analyze the special needs of Direct Memory Access (DMA) from user
space in Section 4 and present a solution that requires only minimal kernel support. In Section 5 we
evaluate our DDE implementation with an in-kernel e1000e network device driver running as a user-space
application.

order to run them as user-level applications without modification. In this section we give an overview
of the DDE approach and analyze Linux' UIO framework regarding its capabilities of supporting generic
user-level device drivers.

2.1 The DDE Approach

Our approach for reusing in-kernel device drivers in user space is depicted in Figure 1. The source code
of an unmodified native Linux device driver is linked against a wrapper library, the Device Driver
Environment. The wrapper provides all functions that the driver expects the Linux kernel to provide. The
DDE reimplements these functions solely using mechanisms provided by a device driver abstraction layer,
called DDEKit.
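To illustrate the wrapping idea, the sketch below shows how a user-level environment might provide a few kernel-style primitives on top of POSIX threads and the C library, so that driver code written against the kernel API compiles unchanged. The names mimic Linux interfaces (spinlock_t, spin_lock_init, spin_lock, spin_unlock, kmalloc, kfree, GFP_KERNEL), but the mapping is a hypothetical simplification and not the actual DDE or DDEKit code. In particular, DDEKit/Linux backs kmalloc with its own SLAB implementation rather than calling malloc directly.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical user-level stand-ins for kernel interfaces (not DDE's code). */
typedef unsigned int gfp_t;
#define GFP_KERNEL 0u

typedef struct {
    pthread_mutex_t m;   /* a spinning kernel lock becomes a blocking mutex */
} spinlock_t;

static inline void spin_lock_init(spinlock_t *l) { pthread_mutex_init(&l->m, NULL); }
static inline void spin_lock(spinlock_t *l)      { pthread_mutex_lock(&l->m); }
static inline void spin_unlock(spinlock_t *l)    { pthread_mutex_unlock(&l->m); }

/* kmalloc/kfree forwarded to the C library; allocation flags are ignored. */
static inline void *kmalloc(size_t size, gfp_t flags) { (void)flags; return malloc(size); }
static inline void kfree(const void *p) { free((void *)p); }

/* A "driver" fragment that only uses the kernel-style API above. */
struct demo_dev {
    spinlock_t lock;
    unsigned long irq_count;
};

struct demo_dev *demo_probe(void)
{
    struct demo_dev *d = kmalloc(sizeof(*d), GFP_KERNEL);
    if (!d)
        return NULL;
    spin_lock_init(&d->lock);
    d->irq_count = 0;
    return d;
}

void demo_irq(struct demo_dev *d)
{
    spin_lock(&d->lock);
    d->irq_count++;
    spin_unlock(&d->lock);
}
```

Mapping spinlocks onto blocking mutexes is a natural choice in user space, where the lock holder can be preempted at any time. Whether DDEKit makes exactly this choice is not stated in the text above.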
synchronization mechanisms such as locks, semaphores, and even condition variables need to be present.

Furthermore, many drivers need a notion of time, which Linux drivers usually obtain by looking at the
magic jiffies variable. Hence, DDEKit needs to support this. Apart from these features, in order to be
useful, DDEKit also provides means for printing messages and a link-time mechanism for implementing
prioritized init-calls, that is, functions that are automatically run during application startup before
the program's main function is executed.

3.2 I/O Ports and Memory

3.3 Interrupt Handling

For managing interrupts, DDEKit/Linux makes use of the UIO interrupt handling mechanism, which supports
generic interrupt handling through the uio_pci_generic module for all PCI devices supporting the PCI
specification v2.3 or higher.

Once the driver requests an IRQ for a device, DDEKit locates the generic UIO driver's sysfs node
(/sys/bus/pci/drivers/.../new_id). It then writes the PCI device's vendor and device IDs into this file
and thereby makes uio_pci_generic responsible for handling this device's interrupts.

Thereafter, a new interrupt handler thread is started. This thread performs a blocking read on the UIO
file that was generated when attaching uio_pci_generic to the device. Whenever the read returns, at least
one interrupt event has occurred and the handler function registered by the driver is executed.

The interrupt handler thread is the only one waiting on the UIO device file for interrupts. After a
successful return from the blocking read, the sysfs node for the device's PCI config space
(/sys/class/uio/.../config) is written to disable IRQs while the interrupts are handled. In order to
avoid interrupt storms in the kernel while the user-level driver is executing its handler, the disabled
interrupt is only turned back on right before the interrupt thread becomes ready to wait for the next
interrupt by reading the UIO device.

3.5 Timing

Linux device drivers use timing in two flavors. First, the jiffies counter is incremented with every
clock tick. DDEKit/Linux emulates jiffies as a global variable. During startup, a dedicated jiffies
thread is started that uses the libC's nanosleep to sleep for a while and thereafter adapts the jiffies
counter accordingly. For the drivers we experimented with so far, it has proven sufficient not to tick
with HZ frequency as the Linux kernel would, but instead to update the jiffies counter only every 10th
HZ tick. This might be adapted once a driver needs a finer granularity. Furthermore, as device drivers
run as independent instances in user space, this can be configured separately for every device driver
according to its needs, and the jiffies counting overhead can even be removed completely for drivers
that don't need this time source.
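The interrupt path of Section 3.3 can be sketched as follows: a dedicated thread blocks in read() on the UIO file, keeps the interrupt masked while the handler runs, and unmasks it right before the next read. This is not DDEKit's code, and the function names are hypothetical. The sketch assumes uio_pci_generic semantics, where the INTx Interrupt Disable bit (bit 10 of the PCI command register) is toggled through the device's sysfs config file. The kernel's UIO documentation uses the same mechanism.

```c
#define _XOPEN_SOURCE 700
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Byte 5 of PCI config space is the high byte of the 16-bit command
 * register at offset 4. The INTx "Interrupt Disable" bit (bit 10 of the
 * command register) is therefore bit 2 (0x04) of that byte. */
#define PCI_COMMAND_HIGH_OFFSET 5
#define INTX_DISABLE_BIT 0x04u

/* Return command_high with the Interrupt Disable bit set or cleared. */
uint8_t intx_update(uint8_t command_high, int disable)
{
    return disable ? (uint8_t)(command_high | INTX_DISABLE_BIT)
                   : (uint8_t)(command_high & ~INTX_DISABLE_BIT);
}

/* Read-modify-write the command high byte through the sysfs config file. */
static int intx_set(int configfd, int disable)
{
    uint8_t v;
    if (pread(configfd, &v, 1, PCI_COMMAND_HIGH_OFFSET) != 1)
        return -1;
    v = intx_update(v, disable);
    return pwrite(configfd, &v, 1, PCI_COMMAND_HIGH_OFFSET) == 1 ? 0 : -1;
}

/* Body of a hypothetical per-device interrupt thread.
 * uiofd:    /dev/uioX opened O_RDONLY
 * configfd: the device's PCI config file in sysfs, opened O_RDWR */
int irq_thread_loop(int uiofd, int configfd, void (*handler)(void *), void *arg)
{
    uint32_t icount;   /* total interrupt count reported by UIO */
    for (;;) {
        /* Unmask INTx only right before waiting for the next interrupt. */
        if (intx_set(configfd, 0) < 0)
            return -1;
        /* Blocks until at least one interrupt has occurred. */
        if (read(uiofd, &icount, sizeof icount) != (ssize_t)sizeof icount)
            return -1;
        /* Keep INTx masked while the user-level handler runs. uio_pci_generic
         * already masks it in its kernel handler; the explicit write mirrors
         * the step described in the text. */
        if (intx_set(configfd, 1) < 0)
            return -1;
        handler(arg);
    }
}
```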
The second way Linux drivers use timing is through the add_timer group of functions, which allows
programming deferred events. DDEKit/Linux provides an implementation by spawning a dedicated timer thread
for every driver instance. This thread manages a list of pending timers and uses a semaphore to block
with a timeout until the next timer is due. If the blocking semaphore acquisition returns with a timeout,
the next pending timer is handled by executing its handler function. Otherwise, another thread has
modified the timer list by either adding or removing a timer. In this case the timer thread recalculates
the time until the next trigger and goes back to sleep.

3.6 Memory Management

Running in user space means that DDEKit/Linux may use LibC's malloc and free functions for internal
memory management needs. However, this does not suffice for implementing Linux' memory management
functions. Linux' kmalloc is internally already implemented using SLABs or one of their equivalents. Our
implementation currently provides a specific SLAB implementation in DDEKit, but we plan to use Linux'
original memory allocator in the future and only back it with page-granularity memory allocations
provided by DDEKit.

Additionally, Linux drivers may use the get_free_pages group of functions to allocate memory with page
granularity. DDEKit/Linux supports page-granularity allocations through a function that uses mmap in
order to allocate page-aligned memory.

A remaining problem is that drivers commonly acquire DMA-able memory in order to allow large amounts of
data to be copied without CPU interaction. This is impossible when relying solely on user-level
primitives. This means that an imple-

1. The region's physical address needs to be available, as DMA does not use virtual addresses.
2. It needs to be physically contiguous so that no virtual-to-physical address translations need to be
   done during the DMA transfer.
3. It needs to be pinned, that is, the region or parts of it must not be swapped out during the DMA
   transfer.

None of these criteria are met by user-level memory allocation routines such as malloc, posix_memalign
or mmap, because they work on purely virtual addresses and the underlying kernel is free to map those
pages anywhere it wants.

As kernel support is necessary for handling DMA, we implemented a small kernel module providing an
interface to the in-kernel DMA API. The module supports two modes: copy mode provides a simple
translation layer between user and kernel pages for DMA, and zero-copy mode uses an IOMMU to improve DMA
performance.

4.1 Copy-DMA

Our kernel module for supporting DMA from user space closely collaborates with the uio_core as shown in
Figure 3. The uio_dma module is notified by the uio_core when a device is bound to it and creates an
additional device node /dev/uio-dma, which user-level drivers can use to obtain DMA-able memory for a
specific device².

[Figure 3: User-Level Device Driver, uio_core, uio_dma]
5.1 Real-Time Operation

To evaluate the influence of running PCI drivers in user space on the system's real-time behavior, we
used the cyclictest utility provided by OSADL [20]. Figure 5 shows the maximum latencies for several
scenarios we tested. Each group has four bars corresponding to threads running on the 4 CPUs in our test
machine. The group labelled no_load shows the latencies for running cyclictest on the idle system,
running with idle=poll to mitigate power-management effects. For the group labelled hi_load we set each
CPU's load to 100% and reran cyclictest. Thereafter, we added network load to the system by running the
IPerf UDP benchmark [8] between the test machine and a remote PC. The groups labelled *_e1000e show the
latencies when the network is driven by the in-kernel e1000e driver. The groups labelled *_user give
latencies obtained for running the experiment with the e1000e driver in user space using DDE.

[Figure 5: cyclictest latencies without IOMMU (y-axis: maximum latency in microseconds)]

FIGURE 6: Maximum cyclictest latencies for the IOMMU scenario

In addition to the experiments also present in the no-IOMMU case, we added two more bar groups labelled
*_user_map. These groups show the maximum latencies obtained when using the zero-copy version of the
uio_dma module.

In both setups we see that the maximum latencies using user-level device drivers are within the bounds
of the other measurements. Although we observe a peak in the latency for hi_load_user, this peak is
within the bounds of the unmodified measurements (e.g., hi_load in the previous experiment). We conclude
that, in these experiments, running device drivers in user space using DDE had no noticeable influence
on the real-time capabilities of the system.
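The quantity plotted in Figures 5 and 6 is the wake-up latency of periodic threads. Such a thread sleeps until an absolute deadline and records how late it actually resumed. The measurement loop can be sketched as follows, assuming CLOCK_MONOTONIC and clock_nanosleep() with TIMER_ABSTIME. This is not the cyclictest source. It also omits the real-time priority setup (e.g. SCHED_FIFO), CPU pinning and memory locking that a meaningful measurement on a PREEMPT_RT system would need. The function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* a - b in nanoseconds. */
long long ts_diff_ns(const struct timespec *a, const struct timespec *b)
{
    return (long long)(a->tv_sec - b->tv_sec) * NSEC_PER_SEC
           + (a->tv_nsec - b->tv_nsec);
}

static void ts_add_ns(struct timespec *t, long ns)
{
    t->tv_nsec += ns;
    while (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec++;
    }
}

/* Wake up `loops` times with period `interval_ns` and return the largest
 * observed wake-up latency in nanoseconds, or -1 on error. */
long long max_wakeup_latency_ns(int loops, long interval_ns)
{
    struct timespec next, now;
    long long max = 0;
    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return -1;
    for (int i = 0; i < loops; i++) {
        ts_add_ns(&next, interval_ns);
        int rc;
        /* clock_nanosleep() returns an error number instead of setting errno. */
        while ((rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL)) == EINTR)
            ;
        if (rc != 0)
            return -1;
        clock_gettime(CLOCK_MONOTONIC, &now);
        long long lat = ts_diff_ns(&now, &next);
        if (lat > max)
            max = lat;
    }
    return max;
}
```

Using an absolute deadline avoids accumulating drift. Each period is measured against where the thread should have woken up, not against the previous wake-up.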
[Figure: IPerf throughput (y-axis: throughput in MBit/s; series: Kernel stack, User lwIP stack)]

Profiling DDE

When we initially ran the e1000e driver in user space, performance was far less convincing than in the
experiments described in Section 5.2. Using Valgrind's [19] Callgrind profiler, we were able to
investigate where the performance went. The result caught us by surprise: DDE manages a list of
virtual-to-physical mappings for all allocated memory. This list is used to implement the virt_to_phys
lookup mechanism. It is implemented as a linked list, and the assumption was that this would suffice,
because there would never
instruction-level tracing and symbolic execution in order to generate device-specific code that can be
dropped into existing per-OS device driver skeletons [3]. While this approach eases device driver reuse,
applying it to a real-time kernel has the same drawbacks as native in-kernel drivers: the generated
drivers still need to be revalidated every time an update is applied.

7 Conclusion

In this paper we presented a Device Driver Environment that allows executing generic Linux in-kernel PCI
drivers as user-level applications on top of Linux. This is achieved by implementing the DDE as a wrapper
library that provides the facilities expected by in-kernel drivers in user space, using off-the-shelf
kernel mechanisms such as UIO and sysfs. With the help of a small kernel module, our framework also
supports DMA from user space.

Using this framework, we were able to run the widely used e1000e network interface driver in user space
on a PREEMPT_RT kernel. Experiments using cyclictest showed that the real-time latencies of the system
were not noticeably influenced by running the driver in user space. Furthermore, it was possible to use
common Linux program analysis tools such as the GDB debugger and Valgrind to profile and debug drivers.

The DDEKit for Linux is available for download at http://os.inf.tu-dresden.de/ddekit/.

Acknowledgments

We would like to thank several people whose hard work in recent years has made the design and
implementation of the Device Driver Environment possible. Thank you, Christian Helmuth, Thomas Friebel,
and Dirk Vogt. Carsten Weinhold provided valuable hints on improving this paper.

This work was partially supported by the German Research Association (DFG) within the Special Purpose
Program 1500, project title ASTEROID.

References

[1] Linux RT project. http://www.kernel.org/pub/linux/kernel/projects/rt/.
[2] Silas Boyd-Wickizer and Nickolai Zeldovich. Tolerating malicious device drivers in linux. In
Proceedings of the 2010 USENIX conference on USENIX annual technical conference, USENIX ATC'10,
pages 9–9, Berkeley, CA, USA, 2010. USENIX Association.
[3] Vitaly Chipounov and George Candea. Reverse engineering of binary device drivers with RevNIC. In
EuroSys '10: Proceedings of the 5th European Conference on Computer Systems, pages 167–180, New York,
NY, USA, 2010. ACM.
[4] Andy Chou, Junfeng Yang, Benjamin Chelf, Seth Hallem, and Dawson Engler. An empirical study of
operating systems errors. In SOSP '01: Proceedings of the Eighteenth ACM Symposium on Operating Systems
Principles, pages 73–88, New York, NY, USA, 2001. ACM.
[5] Jonathan Corbet. UIO: user-space drivers. https://lwn.net/Articles/232575/, 2007.
[6] Intel Corp. Network adapter driver for Gigabit PCI based network connections for Linux.
http://downloadcenter.intel.com, 2010.
[7] Zheng Da. DDE for GNU/HURD. http://www.gnu.org/software/hurd/dde.html.
[8] Jon Dugan and Mitch Kutzko. IPerf TCP/UDP bandwidth benchmark. http://sourceforge.net/projects/iperf/,
2011.
[9] Adam Dunkels. Minimal TCP/IP implementation with proxy support. Technical Report T2001:20, SICS –
Swedish Institute of Computer Science, February 2001. Master's thesis.
[10] Thomas Friebel. Uebertragung des Device-Driver-Environment-Ansatzes auf Module des
BSD-Betriebssystemkerns [Transferring the Device Driver Environment approach to modules of the BSD
operating system kernel]. Master's thesis, TU Dresden, 2006.
[11] Vinod Ganapathy, Matthew J. Renzelmann, Arini Balakrishnan, Michael M. Swift, and Somesh Jha. The
design and implementation of microdrivers. In ASPLOS'08: Proceedings of the Thirteenth International
Conference on Architectural Support for Programming Languages and Operating Systems, pages 168–178,
Seattle, Washington, USA, March 2008. ACM Press, New York, NY, USA.
http://doi.acm.org/10.1145/1346281.1346303.
[12] TU Dresden OS Group. DDE/DDEKit for Fiasco+L4Env. http://wiki.tudos.org/DDE/DDEKit, 2006.
[13] Jorrit N. Herder, Herbert Bos, Ben Gras, Philip Homburg, and Andrew S. Tanenbaum. Failure
resilience for device drivers. In DSN '07: Proceedings of the 37th Annual IEEE/IFIP International
Conference on Dependable Systems and Networks, pages 41–50, Washington, DC, USA, 2007. IEEE Computer
Society.
[14] Antti Kantee. Rump device drivers: Shine on you kernel diamond.
http://ftp.netbsd.org/pub/NetBSD/misc/pooka/tmp/rumpdev.pdf, 2010.
[15] Genode Labs. Genode dde kit. http://genode.org/documentation/api/dde_kit_index.
[16] Ben Leslie, Peter Chubb, Nicholas Fitzroy-Dale, Stefan Götz, Charles Gray, Luke Macpherson, Daniel
Potts, Yueting Shen, Kevin Elphinstone, and Gernot Heiser. User-level device drivers: Achieved
performance. Journal of Computer Science and Technology, 20, 2005.
[17] Joshua LeVasseur, Volkmar Uhlig, Jan Stoess, and Stefan Götz. Unmodified device driver reuse and
improved system dependability via virtual machines. In Proceedings of the 6th Symposium on Operating
Systems Design and Implementation, pages 17–30, 2004.
[18] Martin Mares. PCI Utilities. http://mj.ucw.cz/pciutils.html, 2010.
[19] Nicholas Nethercote and Julian Seward. Valgrind: a framework for heavyweight dynamic binary
instrumentation. In Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and
Implementation, PLDI '07, pages 89–100, New York, NY, USA, 2007. ACM.
[20] OSADL. Cyclic test utility.
https://www.osadl.org/Realtime-test-utilities-cyclictest-and-s.rt-test-cyclictest-signaltest.0.html,
2011.
[21] PCI SIG. PCI Local Bus Specification.
http://www.pcisig.com/specifications/conventional/conventional_pci_23/, 2002.
[22] Bernhard Poess. Binary device driver reuse. Master's thesis, Universitaet Karlsruhe, 2007.
[23] Fred Schneider, Dan Williams, Patrick Reynolds, Kevin Walsh, and Emin Gun Sirer. Device driver
safety through a reference validation mechanism. In Proceedings of the 8th USENIX Symposium on
Operating Systems Design and Implementation, OSDI '08, December 2008.
[24] Michael M. Swift, Brian N. Bershad, and Henry M. Levy. Improving the reliability of commodity
operating systems. SIGOPS Oper. Syst. Rev., 37(5):207–222, 2003.
[25] Andrew Tanenbaum, Raja Appuswamy, Herbert Bos, Lorenzo Cavallaro, Cristiano Giuffrida, Tomáš
Hrubý, Jorrit Herder, Erik van der Kouwe, and David van Moolenbroek. MINIX 3: Status Report and Current
Research. ;login: The USENIX Magazine, 35(3), June 2010.
Pavel Píša
Czech Technical University in Prague, Department of Control Engineering
Karlovo náměstí 13, 121 35 Praha 2, Czech Republic
[email protected]
Rostislav Lisový
Czech Technical University in Prague, Faculty of Electrical Engineering
Karlovo náměstí 13, 121 35 Praha 2, Czech Republic
[email protected]
Abstract
The article describes the implementation of UIO and Comedi drivers for the Humusoft MF624 and MF614
data acquisition cards. The basic functions (D/A and A/D converters, digital inputs/outputs) of the
Humusoft MF624 card were also implemented in the Qemu emulator. This makes it possible to experiment with
driver implementation without physical access to the cards, and without the risk of data loss that
arises when drivers are developed and tested on the same primary Linux kernel instance. The article can
help newcomers in the area gain the knowledge required to implement support for other similar cards and
hardware emulation of these cards. The matching real and virtual setup can be used in operating system
courses as a practical introduction to implementing simple drivers, and it helps students understand how
computers interface with the real world.
COMEDI and UIO drivers for PCI Multifunction Data Acquisition
FIGURE 1: UIO driver structure

Driver uio_pci_generic

When dealing with any device compliant with PCI 2.3, it is also possible to use the uio_pci_generic
driver in the kernel instead of programming a specific one. This driver makes all memory regions of the
device available to user space.

Binding to the device is done by writing the Vendor and Device ID into the
/sys/bus/pci/drivers/uio_pci_generic/new_id file.

The interrupt handler uses the Interrupt Disable bit in the PCI command register and the Interrupt Status
bit in the PCI status register. Because neither the MF614 nor the MF624 is PCI 2.3 compliant, it is not
possible to use this driver for them.

Implementing the Kernel Part

When writing a UIO driver for a PCI device, the initialization function of the module registers a struct
pci_driver in the standard way¹, and the probe function handles the initialization of the UIO-related
structures. The main structure holding all data of a particular UIO driver is struct uio_info. Its
simple initialization (including registration) is shown below:

/* I/O port region described by BAR 1 */
info->port[0].porttype = UIO_PORT_X86;
info->port[0].start = pci_resource_start(dev, 1);
info->port[0].size = pci_resource_len(dev, 1);

uio_register_device(&dev->dev, info);
pci_set_drvdata(dev, info);

Structure uio_mem is used for memory-mapped I/O regions, whereas structure uio_port is used for I/O
ports (for each of these structures, struct uio_info contains a statically allocated array of 5
elements).

Interface to User-space

Communication with the kernel part of the UIO driver is possible through the /dev/uioX file (where X is
the instance number of the driver). Several syscalls can be used when interfacing with this file:

open() opens the device and returns a file descriptor used for the other syscalls.

read() blocks until an interrupt occurs (the value read is the total number of interrupts seen by the
device).

mmap() is used to map memory of the device to user space. The offset value passed to mmap() determines
which memory area of the device is mapped: for the n-th area, the offset should be
n*sysconf(_SC_PAGESIZE).
1 For more information about PCI driver development see [1], available online at https://lwn.net/Kernel/LDD3/
irqcontrol() is used for enabling (called with the parameter set to (int) 1) or disabling ((int) 0)
interrupts. From user space it is triggered by write()-ing this 32-bit value to /dev/uioX.

Defining your own mmap(), open() and release() functions is optional. When irqcontrol() is needed, it has
to be implemented for each device.

Information related to a particular driver instance can be found in the /sys/class/uio/uioX directory.
Most of the files are read-only. The subdirectory maps contains information about the MMIO regions
mapped by the driver, and the subdirectory portio covers I/O port regions.

When using UIO and mmap() with the MF624 card (whose memory regions are 32 or 128 bytes long), there is
an issue with the return value of this syscall: the returned pointer is page-aligned, so it is necessary
to add the low bits of the physical address (the page offset) of each memory region to it. The physical
address can be obtained from the addr file located in /sys/class/uio/uioX/maps/mapX. The region offset
is equal to addr & (sysconf(_SC_PAGESIZE) - 1).

4 Comedi Driver

A UIO driver is a versatile solution, suited mainly for uncommon devices. In our case, for a DAQ card, a
special Linux kernel subsystem designated for DAQ card drivers can be used instead. It is called Comedi
(Linux control and measurement device interface). It provides library functions for user and kernel
space, making development and use of DAQ devices easier. It consists of three different parts:

Comedi is a part of the Linux kernel. It consists of individual device drivers, including the Comedi
driver, which provides a basic set of functions used by the device drivers.

Comedilib is a user-space library providing a unified interface through which user-space applications
access devices supported by Comedi.

Kcomedilib is also a part of the Linux kernel. It provides the same API as Comedilib, but is intended
for real-time applications running in kernel space.

Implementing the Driver

Each Comedi driver has to be registered in the list of active Comedi drivers. This is done by invoking
the comedi_driver_register() function. The only parameter passed to this function is a pointer to a
struct comedi_driver. The most important fields of this structure are:

const char *driver_name; /* "my_driver" */
struct module *module; /* THIS_MODULE */
int (*attach) (struct comedi_device *, struct comedi_devconfig *);
int (*detach) (struct comedi_device *);

Unlike in a UIO or generic PCI driver, the main initialization function is not probe() (of struct
pci_driver) but attach() (of struct comedi_driver), which is invoked by the Comedi subsystem.

The attach() function is responsible not only for common PCI device initialization but also for the
initialization of struct comedi_device (which is accessible through a pointer passed to attach()). The
most important step is to allocate and initialize each subdevice of the DAQ card (in Comedi's
nomenclature, a subdevice represents one particular function of the device, e.g. ADC, digital output,
etc.). Allocation is done by the Comedi function alloc_subdevices(struct comedi_device *dev, unsigned int
num_subdev). Each struct comedi_subdevice is then accessible in the array subdevices, which is part of
struct comedi_device. Example of initialization of a subdevice representing the ADC:

s = dev->subdevices + 0;
s->type = COMEDI_SUBD_AI;
s->subdev_flags = SDF_READABLE | SDF_GROUND;
s->n_chan = 8;
s->maxdata = (1 << 14) - 1;
s->range_table = &range_bipolar10;
s->len_chanlist = 8;
s->insn_read = mf624_ai_rinsn;
s->insn_config = mf624_ai_cfg;

Interface to User-space

After successful compilation and loading of a particular Comedi driver, there should be a /dev/comediX
file (where X is the instance number of the driver). Comedi library functions are used to communicate
with this file: comedi_open() for opening the device, comedi_data_read() and comedi_data_write() for
reading/writing ADCs/DACs, and comedi_dio_read() and comedi_dio_write() for reading/writing digital
inputs/outputs.
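For the ADC subdevice initialized above (maxdata = (1 << 14) - 1 and range_bipolar10, i.e. -10 V to +10 V), converting a raw sample into volts is a linear mapping. Comedilib offers comedi_to_phys() for this purpose. The hypothetical helper below only sketches the same linear arithmetic without the library, and it ignores the out-of-range handling that comedi_to_phys() can apply.

```c
#include <stdint.h>

/* Hypothetical helper: linear raw-to-physical conversion for a range
 * [min, max] sampled with values 0..maxdata. */
double raw_to_phys(uint32_t raw, uint32_t maxdata, double min, double max)
{
    return min + (max - min) * (double)raw / (double)maxdata;
}

/* Values matching the MF624 ADC subdevice configured in the text. */
#define MF624_AI_MAXDATA ((1u << 14) - 1)   /* 14-bit converter: 16383 */
#define MF624_AI_MIN (-10.0)                /* range_bipolar10 */
#define MF624_AI_MAX (10.0)
```

In a Comedilib program, raw would typically come from comedi_data_read(dev, subdevice, channel, range, AREF_GROUND, &raw) on the device returned by comedi_open().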
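Returning to the UIO mmap() issue described in the UIO part of this paper: region n is mapped by passing n*sysconf(_SC_PAGESIZE) as the offset. The region's in-page offset, derived from /sys/class/uio/uioX/maps/mapX/addr, is then added to the returned pointer. A sketch under these assumptions follows. The function names are hypothetical and error handling is minimal.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Offset of a region's first byte within its page. */
uintptr_t uio_page_offset(uintptr_t phys_addr, long page_size)
{
    return phys_addr & ((uintptr_t)page_size - 1);
}

/* Read the physical start address of map n of /dev/uio<uio> from sysfs. */
static int uio_read_map_addr(int uio, int n, uintptr_t *addr)
{
    char path[64];
    unsigned long long v;
    snprintf(path, sizeof path, "/sys/class/uio/uio%d/maps/map%d/addr", uio, n);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ok = fscanf(f, "%llx", &v) == 1;   /* %x also accepts a 0x prefix */
    fclose(f);
    if (!ok)
        return -1;
    *addr = (uintptr_t)v;
    return 0;
}

/* Map region n (len bytes) of an open /dev/uio<uio> and return a pointer
 * to the region's first register, or NULL on failure. */
volatile uint8_t *uio_map_region(int fd, int uio, int n, size_t len)
{
    long pg = sysconf(_SC_PAGESIZE);
    uintptr_t addr;
    if (pg <= 0 || uio_read_map_addr(uio, n, &addr) != 0)
        return NULL;
    uintptr_t off = uio_page_offset(addr, pg);
    /* The n-th region is selected by passing n pages as the mmap offset. */
    void *base = mmap(NULL, len + off, PROT_READ | PROT_WRITE, MAP_SHARED,
                      fd, (off_t)n * pg);
    if (base == MAP_FAILED)
        return NULL;
    /* mmap() returns the start of the page; add the in-page offset. */
    return (volatile uint8_t *)base + off;
}
```

Accesses then go through the returned volatile pointer, so that the compiler does not cache or reorder register reads and writes.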
There are already applications using the Comedi API², so in some cases there is no need to implement a
user-space application from scratch.

5 Qemu Virtual Hardware

Qemu is an open-source processor emulator. Unlike common virtualization solutions, it is able to emulate
x86, x86-64, ARM and other widespread processor architectures. For the purposes of this work it was used
to implement a virtual Humusoft MF624 DAQ card.

void (*)(void). For registering a new PCI device, it is necessary to call pci_qdev_register(), passing a
pointer to a PCIDeviceInfo as its parameter. The most important fields of this Qemu-specific data type
are the pointers to the init and exit functions, with the prototype int (*)(PCIDevice *).

The PCI device specific initialization consists of:

• Initializing the configuration space of the PCI device, e.g. setting the Vendor and Device IDs, device
  class, interrupt pin, etc.
• Registration of the I/O memory used by the device.
Stefan Richter
ABB Corporate Research, Industrial Software Systems
Segelhofstr. 1K, Baden-Dättwil, Switzerland
[email protected]
Michael Wahler
ABB Corporate Research, Industrial Software Systems
Segelhofstr. 1K, Baden-Dättwil, Switzerland
[email protected]
Atul Kumar
ABB Corporate Research, Industrial Software Systems
Whitefield Road, Bangalore, India
[email protected]
Abstract
State-of-the-art real-time control systems execute multiple concurrent control applications using op-
erating system mechanisms such as processes, mutexes, or message queues. Such mechanisms leave a
high degree of freedom to developers but are often hard to deal with: they incur runtime overhead, e. g.,
context switches between threads, and often require tedious and costly fine-tuning, e. g., of process and
thread priorities. Reuse is often made more difficult by the tight coupling of software to specific hardware
or to other software.
In this paper, we present a software architecture and execution framework for cyclic control appli-
cations that simplifies the construction of real-time control systems while increasing predictability and
reducing runtime overhead and coupling. We present the concepts of this framework as well as imple-
mentation details of our RTLinux-based prototype.
A Framework for Component-Based Real-Time Control Applications
such as priorities, proper synchronization, possible deadlock scenarios etc. Often, application
engineers in the power and automation domain lack formal education in computer science, making the task
even harder for them.

Reusability: While all processes seem to be fairly independent from each other, they do depend on the
protocol between the sample manager and the protection functions. This protocol needs to be implemented
by both sides. Further, developers must also decide whether the threads are supposed to run in a
light-weight way (same process) or in a heavy-weight way (different processes), and must adapt the usage
of inter-application communication mechanisms accordingly.

OS overhead: If there are fewer CPU cores than threads, context switches between threads have to occur
several times in each cycle, whenever the flow of execution requires one thread to wait for another
thread's output. Because of the short cycle times in many embedded systems, context switches may have a
significant impact on the system behavior (see Section 4).

Communication overhead: The communication must rely on means of inter-process communication, which are
considered to be slow (e. g., message passing) or to potentially compromise data integrity (e. g.,
shared memory).

systems and that white-box components contain too much detail for this task. The decomposition of
components into blocks (see Section 2.1) in our approach follows up on their plea for gray-box
components.

Reusability has been addressed in a component-based real-time system by Wang et al. [15]. They propose a
component-based resource overlay that isolates the underlying resource management from applications in
order to separate the concerns of application designers and component providers. Complementing the work
presented in this paper, they focus on the proper allocation of resources to real-time components.

2 Component Framework

To address the issues in Section 1.2, we designed a component framework with a runtime concept
comprising four structural elements: component, function block, port, and channel. In Section 2.1 and
Section 2.2, we describe these concepts in greater detail. Our component framework further encompasses a
concept for executing fully deterministic static but replaceable schedules. Application schedules are
explained in Section 2.3, and their execution is presented in Section 2.4. In Section 2.5, we discuss how
the concepts introduced in this section address the aforementioned issues.
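As a rough illustration of these concepts, the sketch below executes a static, non-preemptive schedule of function blocks in a single thread. A channel is modelled as a plain copy from one block's output port to another block's input port, scheduled between the two block executions. Blocks run to completion in a fixed order, so the consumer always reads data that its producer has finished writing, without any lock. All types and names are hypothetical and only approximate the framework of Sections 2.1 to 2.4; they are not the authors' API.

```c
#include <stddef.h>

/* Hypothetical function block: runs to completion, one input and one
 * output port holding plain values. */
struct block {
    const char *name;
    double in;                          /* input port */
    double out;                         /* output port */
    void (*step)(struct block *self);
};

/* Hypothetical channel from one block's output port to another's input. */
struct channel {
    struct block *from;
    struct block *to;
};

/* One slot of a static schedule: run a block or transmit a channel. */
struct slot {
    struct block *run;
    struct channel *xfer;
};

/* Execute one cycle of the static schedule in order. A channel's consumer
 * is scheduled after its producer has finished, so a plain copy needs no
 * synchronization. */
void run_cycle(const struct slot *sched, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (sched[i].run)
            sched[i].run->step(sched[i].run);
        else if (sched[i].xfer)
            sched[i].xfer->to->in = sched[i].xfer->from->out;
    }
}

/* Example blocks: a ramp "sampler" and a "scaler" that doubles its input. */
void sampler_step(struct block *b) { b->out += 1.0; }
void scaler_step(struct block *b)  { b->out = 2.0 * b->in; }
```

For example, the schedule {run sampler, transmit channel, run scaler} yields a scaler output of 2.0 after the first cycle and 4.0 after the second.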
simplicity, such blocks and their implications for the system are out of the scope of this paper.
Synchronization (e. g., mutex, semaphore, or message queue) is used for initiating the exe-

Table 1 lists the results of our performance measurements. Each measurement is the average result
3 On the other hand, research indicates that message passing can be faster than shared memory on multi-core machines [1].
of hundreds of measurements. The tolerance between best case and worst case is about 10% to 15%.

                                   “fast”     “safe”
    (1) Channel transmission      0.02 µs    1.97 µs
    (2) Block control             0.46 µs    6.79 µs
    Sum                           0.48 µs    8.76 µs

TABLE 1: Performance measurement results

In total, it takes the framework about 0.48 µs to schedule a block and a channel in the high-performance implementation and 8.76 µs in the high-safety implementation. In measurements performed by Li et al. [7] on a comparable system, the average context switch took around 3.8 µs. This indicates that most of the overhead of a safe implementation is caused by context switches and not by the feature that is actually required for safety, address space separation.

5 Conclusions

We have presented a component-based software framework that enforces the structuring of cyclic real-time control software systems vertically and horizontally. By decoupling different aspects, such as application logic, control flow, or communication, our approach can be expected to simplify the construction of complex control systems and to reduce the implementation effort.

In addition, static scheduling and non-preemptible execution of function blocks increase the system's determinism, and thus its predictability, compared to interleaved execution of threads and dynamic scheduling. Moreover, if safety is of lesser concern than cost, our approach allows system engineers to adjust the system to their needs, because the number of context switches, and thus the system overhead, can be reduced.

We have shown how the abstractions offered by the framework can be implemented on RTLinux and provided performance measurements for two different implementations.

5.1 Discussion

In domains such as power and automation systems, components are required to be separated from each other at runtime to prevent the propagation of faults across component boundaries. We have shown that it is feasible to construct systems that are either safe or fast. The prevailing concept of processes and threads, however, makes it difficult to construct systems that are both safe and fast. We argue that a small and easy-to-implement modification of this concept will overcome this limitation: instead of a thread always running within the same process, i.e., address space, we propose to allow threads to change the address space during runtime.

With this modification, we could statically schedule the execution of blocks in one thread per CPU core. During execution of this schedule, the thread would enter and leave address spaces in correspondence to the blocks' components. An obvious advantage is the lack of context switches, because blocks always run to completion and leave the stack empty after execution.

Moreover, inter-process communication could be implemented efficiently by using shared memory without synchronization mechanisms. Imagine two blocks A and B in different components that are connected by a channel such that A sends data to B. Since B only gets started after A has finished its execution, B will always read consistent data from the channel/shared memory. This is still true if the same thread is executing A and B, or if A and B are executed on different cores.

Note that it is also possible to implement an operating system based on our component/block paradigm instead of the process/thread paradigm. We currently consider this idea less feasible. The main reason is that in cyclic control systems there are also sporadic tasks without real-time requirements, e.g., an FTP server. Such low-priority tasks run in the slack time of the real-time cycle, and their execution typically requires the slack time of more than one cycle. They therefore need to be preemptible, and the operating system would have to provide some preemption mechanism similar to the one in the process/thread paradigm.

5.2 Future Work

We see considerable potential for automatic tools that can assist in system design. For instance, component boundaries do not have to be drawn arbitrarily. Instead, an automatic tool can formally derive component boundaries according to logical constraints. In our example, the three components have to exist because the two protection functions have to be separated: if one of them fails, the other one is not affected. In addition, the sample manager needs to
Real-Time Linux Infrastructure and Tools
be separate because it must not be affected by faults from either protection function in order to still be able to serve the other one. However, if there is only one protection function, it can be merged with the sample manager because a fault in any block renders the whole system dysfunctional.

In Section 2.4, we showed how the system schedule maps all blocks to be executed onto the same CPU core. This approach can be extended to multiple cores by providing one mapping for each core such that every block is statically assigned to one core. This allows for executing our framework in multi-core, multi-CPU, or even distributed scenarios. Synchronization between the different cores is simplified to well-defined synchronization points because the dependencies between the blocks are explicitly specified in the application schedules.

The dispatcher in our framework can be extended such that blocks are not only executed in sequence, but at precise points in time. This is an effective means for reducing the system's jitter.

Microkernel-based Embedded Systems,” Journal of Systems and Software, vol. 80, no. 5, pp. 687–699, 2007.

[6] E. Lee and D. Messerschmitt, “Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing,” IEEE Transactions on Computers, vol. 36, no. 1, pp. 24–35, 1987.

[7] C. Li, C. Ding, and K. Shen, “Quantifying the cost of context switch,” Proceedings of the 2007 workshop on Experimental computer science, Article 2, San Diego.

[8] J. W. S. Liu, “Real-Time Systems,” Prentice-Hall, New Jersey, 2000, ISBN 0-13-099651-3.

[9] C. D. Locke, “Software Architecture for Hard Real-Time Applications: Cyclic Executives vs. Fixed Priority Executives,” Journal of Real-Time Systems, no. 4, pp. 37–53, 1992.

[10] M. Naik, C.-S. Park, K. Sen, and D. Gay, “Effective Static Deadlock Detection,” ICSE’09.
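The argument in Section 5.1 that a shared-memory channel needs no lock, because the static schedule guarantees that block B starts only after block A has run to completion, can be illustrated with a minimal C sketch. It assumes a single core and a single address space; the address-space switching proposed in the text is not modeled, and the names channel_t, block_a, block_b, core0_schedule and run_cycle are hypothetical, not the framework's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical channel from block A to block B: a plain shared buffer
 * without any lock. Its consistency relies only on the static schedule. */
typedef struct {
    unsigned seq;   /* cycle in which the sample was produced */
    double sample;  /* payload; a consistent read has sample == 0.5 * seq */
} channel_t;

typedef void (*block_fn)(void);

static channel_t a_to_b;
static unsigned cycle_no;
static unsigned b_seen_seq;   /* what B observed in the last cycle */
static bool b_consistent;     /* whether B saw a fully written sample */

/* Block A: writes both fields of the channel. */
static void block_a(void)
{
    a_to_b.seq = cycle_no;
    a_to_b.sample = 0.5 * cycle_no;
}

/* Block B: reads the channel. It can never observe a half-written sample
 * because A has always finished before B is dispatched. */
static void block_b(void)
{
    b_seen_seq = a_to_b.seq;
    b_consistent = (a_to_b.sample == 0.5 * a_to_b.seq);
}

/* Static schedule of one CPU core, fixed offline: A before B. */
static const block_fn core0_schedule[] = { block_a, block_b };
#define CORE0_LEN (sizeof core0_schedule / sizeof core0_schedule[0])

/* One cycle: every block runs to completion in table order; no block is
 * preempted by another block. */
static void run_cycle(const block_fn *schedule, size_t n)
{
    assert(schedule != NULL);
    for (size_t i = 0; i < n; i++)
        schedule[i]();
    cycle_no++;
}
```

Calling run_cycle(core0_schedule, CORE0_LEN) once per period is the whole dispatcher. Extending this to several cores, as described above, would amount to one such table per core, with cross-core dependencies turned into explicit synchronization points.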
Performance Evaluation and Enhancement of Real-Time Linux
Abstract
Lately, the greatly improved real-time properties of Linux have piqued the interest of the automation industry. Linux holds a lot of promise, as its support by a large, active developer community ensures an array of supported platforms. At the same time, automation vendors can tap into a huge reservoir of sophisticated software and skilled developers. The open nature of Linux, however, raises questions about the resilience of Linux-based systems against attacks, particularly in installations that are connected to the internet.
While Linux has a good security record in general, security vulnerabilities have been recurring and will continue to do so for the foreseeable future. As such, it is expedient to supplement Linux's security mechanisms with the stronger isolation afforded by virtual machines. However, virtualization introduces an additional layer in the system software stack, which may impair the responsiveness of its guest operating systems.
In this paper, we show that L4Linux, an encapsulated Linux running on a microkernel, can be improved such that it exhibits real-time behavior very close to that of the corresponding mainline Linux version. In fact, we only measured a small constant increase of the response times, which should be negligible for most applications. Our results show that it is practically possible to run security-critical tasks and control applications in dedicated virtual machines, thus greatly improving the system's resilience against attackers.
Real-Time Performance of L4Linux
ones on the same controller, necessitating physical separation, which incurs costs. A certifiable kernel holds the promise of providing isolation strong enough that coexistence on one controller becomes viable.

Development. The diversity of current software may be difficult to leverage if development is constrained to one operating system. It would be much better if the respective class-leading solutions could be used while preserving the investments in existing software assets.

Timer. The current version of Fiasco uses a 1 kHz clock to drive its scheduler. Accordingly, timeouts are limited to that granularity as well, which is insufficient for high-resolution timers.

Communication. L4Linux operates in an environment where some services are provided by other tasks. Requesting them involves the platform's communication means, in the case of Fiasco primarily synchronous IPC. Waiting for an outstanding reply may delay the dispatch of an incoming event.
As a user-level task, L4Linux has no direct access to the page tables, which are under the exclusive control of the microkernel. To isolate its processes, L4Linux makes use of L4 tasks. L4 allows two tasks to share memory and provides the original owner with the means to later revoke that sharing. L4Linux uses that memory sharing facility to provision its processes with memory. Whenever Linux modifies the page table of one of its processes, this change is reflected in a corresponding change in the memory configuration of the process's task.

While under normal operation page table updates are propagated individually, the destruction of a process is handled differently. To avoid the overhead of a microkernel syscall for each single page table invalidation, the destruction is performed by a single system call. The destruction of a task address space requires the microkernel to iterate over the page directory and return the page tables to its internal memory pools. Although task destruction can be preempted, it will not be aborted once started. As such, its full duration adds to the worst-case times of timing-critical paths in L4Linux.

We added a thread to L4Linux which disposes of the tasks of perished processes. No longer executing long-latency syscalls itself, the main L4Linux thread remains responsive to incoming events.

4 Evaluation

To evaluate our design we conducted a number of experiments. Our test machine contained a 2.7 GHz Athlon 64 X2 5200+ processor, an nVidia-MCP78-

• cyclictest is a simple benchmark to measure the accuracy of OS sleep primitives. The tool is part of the kernel.org “rt-tests” benchmark suite¹. To achieve accurate results, we executed it at the highest real-time priority and chose clock_nanosleep() as the sleep function, as it allows an absolute specification of the wakeup time and thus ignores the time spent setting up the actual timer event.

• hackbench transmits small messages between a fixed set of processes and thus exerts stress on the OS scheduler. Its current development version is also hosted in git².

• Finally, the compilation of a Linux kernel both causes a considerable amount of hard disk interrupt activity and creates and destroys a large number of processes. We used a standard 2.6 series Linux source tree and the default i386 configuration as baseline.

4.1 Throughput

To get a feeling for the setup we were about to experiment with, we started out with some throughput measurements. While these are by no means representative due to the restriction to very little memory and only one core, they serve as a good starting point and already hint at some of the expected results.

As every switch between a userspace task and the L4Linux server involves a round-trip into the microkernel and a switch to a different address space, L4Linux suffers from frequent TLB and cache misses.
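The deferred destruction described above (a separate thread that disposes of the tasks of perished processes, so that the main L4Linux thread no longer executes the long-latency syscall itself) follows a common worker-thread pattern. The sketch below shows that pattern with POSIX threads in user space and assumes nothing about L4 or Fiasco: slow_destroy_task() merely stands in for the non-abortable task-destruction system call, and all names are hypothetical. As in the text, the context that requests a destruction still waits for its completion, but only that context blocks.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* One pending destruction request; it lives on the requester's stack. */
typedef struct destroy_req {
    int task_id;
    bool done;
    pthread_cond_t done_cv;
    struct destroy_req *next;
} destroy_req;

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t work_cv = PTHREAD_COND_INITIALIZER;
static destroy_req *pending;   /* pending requests (LIFO for brevity) */
static bool stopping;
static int destroyed;          /* number of completed destructions */

/* Stand-in for the long, non-abortable task-destruction system call. */
static void slow_destroy_task(int task_id)
{
    (void)task_id;
}

/* The reaper thread: performs destructions on behalf of other contexts. */
static void *reaper_main(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    for (;;) {
        while (pending == NULL && !stopping)
            pthread_cond_wait(&work_cv, &lock);
        if (pending == NULL)              /* stopping, queue drained */
            break;
        destroy_req *req = pending;
        pending = req->next;
        pthread_mutex_unlock(&lock);
        slow_destroy_task(req->task_id);  /* long operation, lock not held */
        pthread_mutex_lock(&lock);
        destroyed++;
        req->done = true;
        pthread_cond_signal(&req->done_cv);
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Hand a task to the reaper and wait until it is gone; only the caller
 * blocks, every other thread stays free to handle events. */
static void destroy_task_deferred(int task_id)
{
    destroy_req req = { .task_id = task_id, .done = false, .next = NULL };
    pthread_cond_init(&req.done_cv, NULL);
    pthread_mutex_lock(&lock);
    req.next = pending;
    pending = &req;
    pthread_cond_signal(&work_cv);
    while (!req.done)
        pthread_cond_wait(&req.done_cv, &lock);
    pthread_mutex_unlock(&lock);
    pthread_cond_destroy(&req.done_cv);
}

/* Ask the reaper to finish outstanding work and exit, then join it. */
static void reaper_stop(pthread_t reaper)
{
    pthread_mutex_lock(&lock);
    stopping = true;
    pthread_cond_signal(&work_cv);
    pthread_mutex_unlock(&lock);
    pthread_join(reaper, NULL);
}
```

The design choice mirrors the text: the long operation moves off the thread that must stay responsive, while the requester keeps simple synchronous semantics.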
1 RT-Tests Repository: git://git.kernel.org/pub/scm/linux/kernel/git/clrkwllms/rt-tests.git
2 Hackbench Repository: https://github.com/kosaki/hackbench
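The measurement principle of cyclictest described above (sleep until an absolute wakeup time with clock_nanosleep() and compare the actual wakeup time with the requested one) can be sketched in a few lines of C. This is an illustration under assumptions, not cyclictest's code: it uses CLOCK_MONOTONIC, stays in the default scheduling class (the benchmark itself was run at the highest real-time priority, which requires privileges), and the helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Advance *t by ns nanoseconds (ns >= 0), keeping tv_nsec normalized. */
static void timespec_add_ns(struct timespec *t, long ns)
{
    t->tv_nsec += ns;
    while (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec++;
    }
}

/* Difference a - b in nanoseconds. */
static int64_t timespec_diff_ns(const struct timespec *a,
                                const struct timespec *b)
{
    return (int64_t)(a->tv_sec - b->tv_sec) * NSEC_PER_SEC
         + (int64_t)(a->tv_nsec - b->tv_nsec);
}

/* Wake up `loops` times with period `interval_ns`, each time at an absolute
 * deadline, and return the largest observed wakeup latency in ns. */
static int64_t max_wakeup_latency_ns(int loops, long interval_ns)
{
    struct timespec next, now;
    int64_t max_lat = 0;

    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < loops; i++) {
        timespec_add_ns(&next, interval_ns);
        /* TIMER_ABSTIME: sleep until `next` itself, so the time spent
         * computing the deadline and arming the timer does not shift it.
         * clock_nanosleep() returns an error number rather than setting
         * errno; after a signal, retry with the same absolute deadline. */
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL)
               == EINTR)
            ;
        clock_gettime(CLOCK_MONOTONIC, &now);
        int64_t lat = timespec_diff_ns(&now, &next);
        if (lat > max_lat)
            max_lat = lat;
    }
    return max_lat;
}
```

A real measurement would additionally run the loop under SCHED_FIFO, for example via sched_setscheduler(), and record a latency histogram rather than only the maximum.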
To highlight the effect of this disadvantage, we created an “intermediate” version of native Linux without support for global pages and large pages, and with explicit TLB flushes on every kernel entry and exit path.

The hackbench benchmark (cf. Fig. 2) shows the stripped-down version of Linux almost halfway between native Linux and L4Linux, which demonstrates that the impact of repeated cache misses is quite severe when tasks are rapidly rescheduled. The compilation of a Linux kernel (cf. Fig. 3) displays, as expected, only a mild slowdown for the non-caching

FIGURE 3: Duration of kernel compilation. [Plot: time (s) for Linux, TLB Flush and L4Linux.]

Another possible source of large latencies is long-running, non-abortable system calls. One particularly long operation, especially in the Fiasco-based setup, is the destruction of an address space, as this requires assistance from the microkernel. To see this problem in effect, we created two latency histograms with cyclictest under concurrent operation of the hackbench benchmark (cf. Fig. 4) and a kernel compilation (cf. Fig. 5), respectively. While the maxima of both histograms are in the expected order of magnitude, the former shows outliers up to 100 µs and the latter (due to its constant destruction of processes) even an almost constant distribution of “long” latencies beyond 40 µs.

FIGURE 5: Latencies measured with hackbench as load. [Plot: number of samples (#, logarithmic) over latency (µs, 0 to 100) for L4Linux and L4Linux (no deferred destruction).]

As outlined in Section 3.3, we therefore externalized the destruction of address spaces to a separate L4 thread. While the execution context issuing the destruction request still waits for the destruction to complete, L4Linux as a whole is then interruptible during the wait and can react to external events.

Taking all these findings into consideration, we finally compared native Linux and our improved L4Linux implementation directly using our established benchmark combinations
termine the recipient attached to it and pass the interrupt on. This operation induces a very constant amount of overhead: we measured a delay of 3.65 µs
Wu Zhangjin
Tiny Lab - Embedded Geeks
http://tinylab.org
[email protected]
Sheng Yong
Distributed & Embedded System Lab, SISE, Lanzhou University, China
Tianshui South Road 222, Lanzhou, P.R.China
[email protected]
Abstract
Linux is widely used in embedded systems, which typically have limited storage and hence require size optimization. To reduce the kernel size, and building on the previous work of the “Section Garbage Collection Patchset”, this paper details its principle, presents some new ideas, documents the porting steps, reports test results on the four most popular architectures (ARM, MIPS, PowerPC and X86), and finally proposes future work which may enhance or derive from this patchset.
Tiny Linux Project: Section Garbage Collection Patchset
to its own section. For instance, a function called unused_func() goes to the .text.unused_func section. Then, ld provides the --gc-sections option to check the references and determine which dead functions or data should be removed, and the --print-gc-sections option of ld prints the functions or data being removed, which is helpful for debugging.

Without -ffunction-sections, gcc puts all functions into the .text section (indicated by the .text directive in the assembly):

    $ echo 'unused(){} main(){}' | gcc -S -x c -o - - \
      | grep .text
            .text

Otherwise, each function has its own section (indicated by the .section directive in the assembly):

    $ echo 'unused(){} main(){}' \
      | gcc -ffunction-sections -S -x c -o - - | grep .text
            .section .text.unused,"ax",@progbits
            .section .text.main,"ax",@progbits

1. The section for a function should be named with a .text prefix so that the linker is able to merge all of the .text sections; otherwise, it will not
be able to merge the sections, or not conveniently, and may in the end even increase the size of the executable because of the accumulated section alignment.

2. The section name for a function should be prefixed with .text. instead of the default .text prefix used by gcc, which would break the core rule.

And we must note that: ‘You will not be able to use “gprof” on all systems if you specify this option and you may have problems with debugging if you specify both this option and -g.’ (gcc man page)

    $ echo 'unused(){} main(){}' | \
      gcc -ffunction-sections -p -x c -o test -
    <stdin>:1:0: warning: -ffunction-sections disabled; \
      it makes profiling impossible

2.2 Assemble: Translate assembly files to binary objects

In an assembly file, it is still possible to put a function or data item into a specific section with the .section directive (.text is equivalent to .section ".text"). Since -ffunction-sections and -fdata-sections do not work for assembly files, and there is no way for them to determine the function or data items, a .section directive must be added manually before each function or data item in assembly files written from scratch (not translated from C). Otherwise, the functions or data will be put into the same .text or .data section. The section names given should also be unique to follow the core rule of section garbage collection.

The following commands change the section name of the ‘unused’ function in the assembly file and show that this works:

    $ echo 'unused(){} main(){}' \
      | gcc -ffunction-sections -S -x c -o - - \
      | sed -e "s/unused/test/g" \
      | gcc -c -x assembler - -o test
    $ objdump -d test | grep .section
    Disassembly of section .text.test:
    Disassembly of section .text.main:

2.3 Link: Link binary objects to target executable

At the linking stage, based on a linker script, the linker determines which sections should be merged and included in the final executable. When linking, the -T option of ld can be used to indicate the path of the linker script; if no such option is used, a default linker script is used, which can be printed with ld --verbose.

Here is a basic linker script:

    OUTPUT_FORMAT("elf32-i386", "elf32-i386", "elf32-i386")
    OUTPUT_ARCH(i386)
    ENTRY(_start)
    SECTIONS
    {
      .text :
      {
        *(.text .stub .text.* .gnu.linkonce.t.*)
        ...
      }
      .data :
      {
        *(.data .data.* .gnu.linkonce.d.*)
        ...
      }
      /DISCARD/ : { *(.note.GNU-stack) *(.gnu.lto_*) }
    }

The first two commands specify the output format and the target architecture, the ENTRY command indicates the entry point of the executable, and the SECTIONS command deals with sections.

The entry point (above, _start, the standard C entry defined in crt1.o) is the root of the whole executable: all other symbols (functions or data) referenced directly or indirectly by the entry must be kept in the executable to ensure that it runs without failure. Besides, undefined symbols (defined in shared libraries) may also need to be kept with the EXTERN command. Note that the --entry and --undefined options of ld work the same as the ENTRY and EXTERN commands of the linker script, respectively.

--gc-sections follows the above rule to determine which sections should be retained and then passes them to the SECTIONS command for the remaining merging and inclusion. The above linker script merges all sections prefixed by .text, .stub and .gnu.linkonce.t into the final .text section; the .data section merging is similar. The remaining sections are not merged and are kept as their own sections; some of them can be removed by the /DISCARD/ instruction.

Let's see how --gc-sections works. First, without it:

    $ echo 'unused(){} main(){}' | gcc -x c -o test -
    $ size test
       text    data     bss     dec     hex filename
        800     252       8    1060     424 test

Second, with --gc-sections (passed to ld with the -Wl option of gcc):

    $ echo 'unused(){} main(){}' | gcc -ffunction-sections \
      -Wl,--gc-sections -x c -o test -
    $ size test
       text    data     bss     dec     hex filename
        794     244       8    1046     416 test

This shows that the size of the .text section is reduced, and --print-gc-sections proves that the dead ‘unused’ function is really removed:
    /usr/bin/ld: Removing unused section '.data' in file '.../crt1.o'
    /usr/bin/ld: Removing unused section '.data' in file '.../crtbegin.o'
    /usr/bin/ld: Removing unused section '.text.unused' in file '/tmp/cclR3Mgp.o'

The previous section garbage collection patchset targets the -rc version of 2.6.35; it did add the core support of section garbage collection to Linux, but it still has some limitations.

Now, let's analyze the basic support of the section garbage collection patchset for Linux and then list the existing limitations.

The basic support of the gc-sections patchset for Linux includes:

A better pattern may be the following:

    *(.text.[^.]*)

Note that both of the above patterns are only supported by the latest ld; please use versions newer than 2.21.0.20110327. Otherwise, they do not work and will, on the contrary, generate a bigger kernel image, because every such section will be
linked to its own section in the final executable, and the size will increase heavily because of the required alignment of every section.

• Support objects with more than 64k sections
The type of the section count (the e_shnum member of elf{32,64}_hdr) is u16, so its maximum is 65535. The old modpost tool (used to postprocess module symbols) can only handle objects with fewer than 64k sections and hence may fail on kernel images from huge kernel builds (allyesconfig, for example) compiled with -ffunction-sections. Therefore, the modpost tool is fixed to support objects with more than 64k sections, following the document “IA-64 gABI Proposal 74: Section Indexes”: http://www.codesourcery.com/public/cxx-abi/abi/prop-74-sindex.html.

• Invocation of -ffunction-sections/-fdata-sections and --gc-sections
To get a working kernel with -ffunction-sections and -fdata-sections:

    $ make KCFLAGS="-ffunction-sections -fdata-sections"

Then, to also garbage-collect the sections, the patchset added

    LDFLAGS_vmlinux += --gc-sections

to the top-level Makefile.

The above support did produce a working kernel with section garbage collection on X86 platforms, but it still has the following limitations:

1. It lacks testing and is not fully compatible with some major kernel features, such as Ftrace and Kgcov.

2. The current usage of the section attribute itself still breaks the core rule of section garbage collection, because many functions or data items may be put into the same section (e.g. init); this needs to be fixed.

3. Assembly is not handled carefully, so dead sections in assembly may also be retained in the final kernel image.

4. Compressed kernel images are not supported; dead sections in them may also be retained in the final compressed kernel image.

5. Invoking gc-sections requires passing the gcc options to ‘make’ through environment variables, which is not convenient.

6. Kernel modules do not get enough attention; they may also include dead symbols which should be removed.

7. Only the X86 platform is supported, which is not enough for other popular embedded platforms such as ARM, MIPS and PowerPC.

To overcome the above limitations, we added improvements in our gc-sections project, described in the following section.

3.2 Improvement of the previous gc-sections patchset

Our gc-sections project is also based on mainline 2.6.35 (exactly 2.6.35.13) and brings the following improvements:

1. Ensure the other kernel features work with gc-sections
Ftrace requires the __mcount_loc section to store the mcount call sites; Kgcov requires the .ctors section to do gcov initialization. These two sections are not referenced directly, so they will be removed by --gc-sections and hence must be kept explicitly with the KEEP instruction. Besides, more sections listed in include/asm-generic/vmlinux.lds.h or other architecture-specific header files are in a similar situation and must be kept explicitly too.

    /* include/asm-generic/vmlinux.lds.h */
    ...
    -       *(__mcount_loc)                 \
    +       KEEP(*(__mcount_loc))           \
    ...
    -       *(.ctors)                       \
    +       KEEP(*(.ctors))                 \
    ...

2. The section name defined by the section attribute should be unique
Symbol names must be globally unique (otherwise gcc reports a symbol redefinition), so to keep every section name unique, one could encode the symbol name in the section name. __FUNCTION__ (or __func__ in Linux) is available to get a function's name, but there is no way to get a variable's name, which means there is no general method to get the symbol name. Instead, another method is used: encoding the section name with the line number and a file-global counter. The combination of these two minimizes duplication of section names (although duplicates may still occur) and also reduces the total size cost of the section names.
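The modpost fix for objects with more than 64k sections relies on the escape values defined by the gABI proposal cited above: when the real count does not fit into the 16-bit e_shnum, e_shnum is 0 and the count is stored in sh_size of section header 0; likewise, SHN_XINDEX in e_shstrndx or in a symbol's st_shndx redirects to sh_link of section header 0 or to an SHT_SYMTAB_SHNDX table. A minimal C sketch of these lookups, assuming a Linux system that provides <elf.h> and 64-bit ELF input, could look as follows; the function names are hypothetical and this is not modpost's actual code.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Number of section headers. If it does not fit into the 16-bit e_shnum,
 * e_shnum is 0 and the real count is in sh_size of section header 0. */
static size_t elf64_section_count(const Elf64_Ehdr *eh, const Elf64_Shdr *sh0)
{
    if (eh->e_shnum == 0 && eh->e_shoff != 0)
        return (size_t)sh0->sh_size;
    return eh->e_shnum;
}

/* Index of the section-name string table. SHN_XINDEX means the real index
 * is in sh_link of section header 0. */
static size_t elf64_shstrndx(const Elf64_Ehdr *eh, const Elf64_Shdr *sh0)
{
    if (eh->e_shstrndx == SHN_XINDEX)
        return sh0->sh_link;
    return eh->e_shstrndx;
}

/* Section index of symbol number sym_index. SHN_XINDEX in st_shndx means
 * the real index is stored in the SHT_SYMTAB_SHNDX table at the same
 * position as the symbol (symtab_shndx is NULL if there is no such table). */
static size_t elf64_sym_shndx(const Elf64_Sym *sym, size_t sym_index,
                              const Elf64_Word *symtab_shndx)
{
    if (sym->st_shndx == SHN_XINDEX && symtab_shndx != NULL)
        return symtab_shndx[sym_index];
    return sym->st_shndx;
}
```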
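The naming scheme proposed in item 2 of Section 3.2 (line number plus a file-global counter) can be sketched with the C preprocessor. The sketch assumes GCC, whose __COUNTER__ extension provides a per-translation-unit counter; the macro names GC_UNIQUE_SECTION and gc_section are hypothetical and are not the patchset's actual macros.

```c
#include <assert.h>
#include <string.h>

/* Two-level stringification so that __LINE__ and __COUNTER__ are expanded
 * to numbers before they are turned into string literals. */
#define GC_STR2(x) #x
#define GC_STR(x) GC_STR2(x)

/* Build a section name "<prefix>.l<line>.c<counter>". __COUNTER__ yields
 * 0, 1, 2, ... on successive expansions within one source file. */
#define GC_UNIQUE_SECTION(prefix) \
    prefix ".l" GC_STR(__LINE__) ".c" GC_STR(__COUNTER__)

/* Hypothetical attribute macro: each object tagged with it gets a section
 * of its own instead of sharing one section with many other objects. */
#define gc_section(prefix) __attribute__((section(GC_UNIQUE_SECTION(prefix))))

static int gc_section(".data") limit_a = 1;   /* in .data.l<line>.c<n> */
static int gc_section(".data") limit_b = 2;   /* in a different section */

static int gc_section(".text") sum_limits(void)
{
    return limit_a + limit_b;
}

/* Two names generated on the same line still differ thanks to the counter. */
static const char *const same_line[2] = { GC_UNIQUE_SECTION(".text"), GC_UNIQUE_SECTION(".text") };
```

Because the counter changes on every expansion, two objects declared on the same line still get different section names, which the line number alone could not guarantee; the prefixes keep the .text./.data. form required by the core rule.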
But the other directly used .section directives require a better solution. Fortunately, we can use the same method proposed above, that is:

5. Support compressed kernel image
The compressed kernel image often includes a compressed vmlinux and an extra bootstrapper; the bootstrapper decompresses the compressed
kernel image and boots it. The bootstrapper may also include dead code, but since its Makefile does not inherit the make rules from either the top-level Makefile or the Makefile of a specific architecture, it must be taken care of independently.

Just as we mentioned in Section 2.3, the section storing the kernel image must be kept with the KEEP instruction, and the -ffunction-sections, -fdata-sections, --gc-sections and --print-gc-sections options should also be added for compiling and linking the bootstrapper.

6. Take care of the kernel modules
Currently, all kernel modules share a common linker script, scripts/module-common.lds, which is not friendly to --gc-sections, because some architectures may need to discard some specific sections. Therefore, an architecture-specific module linker script should be added to arch/ARCH/, and the following lines should be added to the top-level Makefile:

    # Makefile
    +LDS_MODULE = \
    +       -T $(srctree)/arch/$(SRCARCH)/module.lds
    +LDFLAGS_MODULE = \
    +       $(if $(wildcard arch/$(SRCARCH)/module.lds),\
    +       $(LDS_MODULE))
     LDFLAGS_MODULE += \
            -T $(srctree)/scripts/module-common.lds

Then, every architecture can add its architecture-specific parts to its own module linker script, for example:

    # arch/mips/module.lds
    SECTIONS {
      /DISCARD/ : {
        *(.MIPS.options)
        ...
      }
    }

To remove the dead code in kernel modules, the common module linker script may need to be enhanced to keep the functions called by module_init() and module_exit(), since these two are the init and exit entries of the modules. Besides, other specific sections (e.g. .modinfo, version) may need to be kept explicitly. This idea is not implemented in our gc-sections project yet.

7. Port to the other architecture-based platforms
Our gc-sections project adds gc-sections support for platforms based on the top 4 architectures (ARM, MIPS, PowerPC and X86), and all of them have been tested.

The architecture- and platform-specific parts are small, but some basic steps should be followed to minimize the time cost; the steps for porting to a new platform are covered in the next section.

3.3 The steps of porting the gc-sections patchset to a new platform

To make gc-sections work on a new platform, the following steps should be followed (using ARM as an example).

1. Prepare the development and testing environment, including a real machine (e.g. a dev board) or an emulator (e.g. qemu), a cross-compiler, a file system, etc.
For ARM, we chose qemu 0.14.50 as the emulator and versatilepb as the test platform; the cross-compiler (gcc 4.5.2, ld 2.21.0.20110327) is provided by Ubuntu 11.04, the file system is installed by debootstrap, and the ramfs is available from http://d-i.debian.org/daily-images/armel/.

2. Check whether the GNU toolchain supports -ffunction-sections, -fdata-sections and --gc-sections; if not, add toolchain support first.
The following command shows that the GNU toolchain for ARM does support gc-sections; otherwise, it would fail.

    $ echo 'unused(){} main(){}' | arm-linux-gnueabi-gcc \
      -ffunction-sections -Wl,--gc-sections \
      -S -x c -o - - | grep .section
            .section .text.unused,"ax",%progbits
            .section .text.main,"ax",%progbits

3. Add -ffunction-sections and -fdata-sections at the proper place in the architecture- or platform-specific Makefile.

    # arch/arm/Makefile
    ifndef CONFIG_FUNCTION_TRACER
    KBUILD_CFLAGS += -ffunction-sections
    endif
    KBUILD_CFLAGS += -fdata-sections

4. Fix potential compatibility problems (e.g. disable -ffunction-sections when Ftrace is required).
The Ftrace compatibility problem is fixed above; no other compatibility problem has been found so far.

5. Check whether there are sections which are unreferenced but used, and keep them.
The following three sections are kept for ARM:
9. Make sure the main kernel features (e.g. Ftrace, Kgcov, Perf and Oprofile) work normally

    MIPS   malta     pcnet   4.5.2   2.21
    PPC    g3beige   pcnet   4.4.5   2.20.1.20100303
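Step 5 of the porting procedure and the KEEP changes in Section 3.2 concern sections that no code references by name but that are still walked at run time, like __mcount_loc or .ctors. The sketch below, assuming GCC and GNU ld on an ELF target, builds such a linker-collected table: entries are placed in a section named hooks and iterated through the __start_hooks/__stop_hooks symbols that GNU ld synthesizes for sections whose names are valid C identifiers. The names struct hook, DEFINE_HOOK and run_hooks are hypothetical.

```c
#include <assert.h>

/* One entry of a hypothetical table of initialization hooks. */
struct hook {
    const char *name;
    int (*fn)(void);
};

/* Put one entry into the section "hooks". `used` keeps the compiler from
 * dropping the otherwise unreferenced object; the section name is a valid
 * C identifier, so GNU ld synthesizes __start_hooks and __stop_hooks. */
#define DEFINE_HOOK(id, func)                                              \
    static const struct hook hook_entry_##id                               \
        __attribute__((used, section("hooks"), aligned(sizeof(void *)))) = \
        { #id, func }

/* Boundary symbols generated by the linker around the "hooks" section. */
extern const struct hook __start_hooks[];
extern const struct hook __stop_hooks[];

static int hook_one(void) { return 1; }
static int hook_two(void) { return 2; }

DEFINE_HOOK(one, hook_one);
DEFINE_HOOK(two, hook_two);

/* Walk the table: no code refers to hook_entry_one or hook_entry_two by
 * name, so their section looks unreferenced to a section garbage collector. */
static int run_hooks(void)
{
    int sum = 0;
    for (const struct hook *h = __start_hooks; h < __stop_hooks; h++)
        sum += h->fn();
    return sum;
}
```

In a standalone program, GNU ld retains such a section under --gc-sections because of the linker-synthesized __start_/__stop_ references; in the kernel, the corresponding boundary symbols are defined in the linker script itself, which is why the input sections have to be wrapped in KEEP() there.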
Abstract
Because of the low cost and high bandwidth of Ethernet, there have been many investigations into implementing real-time communication over Ethernet. To overcome the intrinsic non-determinism of Ethernet, a number of real-time Ethernet variants have emerged, but industrial users still hesitate to choose among them. openPOWERLINK is an open source industrial Ethernet solution which follows POWERLINK, a real-time Ethernet protocol. Given the good performance and wide platform support of RT-PREEMPT, a soft PLC consisting of openPOWERLINK and RT-PREEMPT might be a cheap and easy solution for many deployment cases. In this paper, we evaluate synchronicity, jitter, cycle time and other relevant quality indicators of openPOWERLINK on RT-PREEMPT in distributed systems, which are commonly used to implement tightly coordinated controllers, data acquisition or synchronization systems. To allow evaluating such typical scenarios with inexpensive hardware, we used the parallel port to simulate an input/output unit of a PLC. We designed benchmark cases to evaluate the related indicators and present the results as reference data. Further, with the increased use of COTS and FLOSS components for safety-related systems, which are often distributed/replicated systems, the results of this evaluation may also be of interest to designers of safety-related systems. openPOWERLINK provides basic capabilities suitable for building replicated or redundant systems (e.g. TMRs).
Keywords: openPOWERLINK, EPL, Real Time Ethernet, VIAC3
Performance Evaluation of openPOWERLINK
tween cost and performance. Moreover, thanks to its good synchronization, EPL has attracted much attention from industrial users.
To provide a reference for industrial users and related developers, we built a simple distributed system with openPOWERLINK [9], the open-source implementation of EPL. Several demo and benchmark applications were implemented for the evaluation. In the following sections, we introduce our work and present the evaluation results.
2 Background: EPL and openPOWERLINK
Bringing together Ethernet, CANopen, and a newly developed stack for real-time data communication, POWERLINK integrates features and abilities from three different worlds. In contrast to a number of competing products, POWERLINK stays very close to the Ethernet standard, retaining the original Ethernet features and thus reducing the cost of industrial deployment. It extends Ethernet with a mixed polling and time-slicing mechanism named SCNM (refer to figure 1).
Fig. 1: EPL Cycle (Managing Node (MN): SoC, PReq to CN 1, PReq to CN 2, SoA, ASnd; Controlled Nodes (CN): PRes from CN 1, PRes from CN 2, ASnd; isochronous phase, asynchronous phase and idle phase)
There are two kinds of nodes in EPL, namely the managing node (MN) and the controlled node (CN). The MN, which acts as the master in the EPL network, polls the CNs cyclically. This polling takes place in the isochronous phase of the EPL cycle. The isochronous phase is immediately followed by an asynchronous phase for communication that is not time-critical, e.g. TCP/IP communication. The isochronous phase starts with a Start of Cyclic frame, on which all nodes synchronize. This schedule design avoids the collisions that usually occur on standard Ethernet and ensures the determinism of the hard real-time communication. It is implemented in the EPL data link layer. The SoC¹ packet is sent to synchronize the nodes and to indicate the start of the isochronous phase of a new cycle. SoA starts the asynchronous phase [16].
EPL integrates CANopen, a robust and proven protocol widely used throughout the automation world, which greatly simplifies setting up networks because of its extensive standardization. CANopen is one of the most popular higher-layer protocols for CAN-based networks. Therefore a number of device and application profiles are under development or already available, which are used, for example, in door control, elevators, ships, trains, municipal vehicles and railways, as well as in medical applications. Besides these standardized profiles, another big advantage of CANopen is that it is used in a wide range of proprietary systems and applications, which eases their integration and their transition to open source.
openPOWERLINK is an open source industrial Ethernet solution provided by SYSTEC electronic [10]. It contains the Ethernet POWERLINK protocol stack for the Managing Node (master) and for the Controlled Nodes (slaves). It is released under the BSD License.
Fig. 2: Abstract Model of openPOWERLINK (layers shown: EPL application layer with object dictionary, PDO, SDO and NMT; EPL data link layer; HTTP, FTP and other application layers over transport and network layers for asynchronous data; MAC; physical layer (PHY))
The EPL stack is divided into two parts: low-prioritized processes above the Communication Abstraction Layer (CAL), called the EPL user part, and high-prioritized processes below the CAL, called the EPL kernel part. Processes that must be handled in every EPL cycle have high priority, e.g. the Data Link Layer (DLL), PDO processing and the core NMT state machine. All other processes, e.g. SDO, have low priority. The high-prioritized processes can be moved to a separate CPU (e.g. on an SMP machine) to meet the real-time requirements [16].
3 Performance Evaluation
3.1 Performance Indicators
There are many indicators that should be considered when evaluating real-time Ethernet. In this paper we focus on the most important performance indicators (PIs), namely synchronicity, minimum cycle time, latency and jitter, and scalability over the number of end nodes. Some other PIs
¹ EPL frame types: SoA: Start of Asynchronous, SoC: Start of Cyclic, PReq: Poll Request, PRes: Poll Response
Performance Evaluation and Enhancement of Real-Time Linux
3.2 Setup
We built a three-node distributed system using VIA boards, with little modification of the default configuration of openPOWERLINK. The motivation for the three-node setup is that a common setup for safety-related systems is triple modular redundancy (TMR); thus this is one of the target profiles we are interested in for a real-time Ethernet solution. There was one MN and two CNs. The three nodes were connected via a 10Mbps hub or a 100Mbps switch (the ideal solution, a 100Mbps hub, was not available). A parallel-port cable was attached to each CN. Suitable pins at the other end of each cable were connected to an oscilloscope. We used two channels of the oscilloscope to display the signals from the two CNs, one channel for each CN. The structure of the setup is shown in figure 3.
Fig. 3: System Setup
The following are the details of the system.
Fig. 4: System Latency of RT-PREEMPT
We ran cyclictest with different intervals (500, 1000 and 2000 microseconds), and the worst-case latency was less than 80 microseconds. The latency distributions over frequency are quite stable.
3.3 Synchronicity
As described above, there was one MN and two CNs in our setup. To meet our requirement, we configured the related entries of the object dictionary and let the MN periodically send 8-bit real-time data to each of the two CNs. The two CNs output the data to their parallel port when they receive the next SoC packet. To simplify our programming, the MN sent 0x00 and 0xFF to the two CNs respectively and swapped them in the next cycle. The oscilloscope snapshot below shows the output of the CNs.
Implementation of the MN includes the following main parts.
• Define object dictionary
The cycle time, which is commonly one of the most critical indicators, depends on the system jitter, the number of nodes and the transmission hardware (NIC and Ethernet). Ethernet packet capture tools like Wireshark cannot meet the high-precision timing requirement of such a system analysis. Thus we need to record the time stamps via a high-resolution timer. The MN and CNs use independent clocks, so we cannot compare time stamps from different nodes without time synchronization. Our solution is to record the related time stamps on the same machine, so that we can obtain the durations of the different phases. The following figure shows a simplified communication period.
Fig. 9: Timing over 10Mbps HUB
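As a minimal sketch of the measurement approach described above, the following C fragment takes CLOCK_MONOTONIC time stamps on a single machine and plugs measured phase durations into the cycle-time model of Equation (3) below. The function names (stamp_now, elapsed_us, epl_cycle_time_us) are our own and are not part of openPOWERLINK; the per-node idle margin T_o is an input the user has to choose, as discussed with Equation (3).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Take a time stamp from the monotonic high-resolution clock. */
struct timespec stamp_now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts;
}

/* Duration between two time stamps in microseconds. Both stamps must be
 * taken on the same machine: MN and CN clocks are not synchronized. */
double elapsed_us(const struct timespec *start, const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec) * 1e6
         + (double)(end->tv_nsec - start->tv_nsec) / 1e3;
}

/* Cycle-time model of Equation (3): T_C = T_soc + T_rr*n + T_a + T_o*n
 *   t_soc: SoC delivery time including the gap
 *   t_rr : PReq/PRes round time per CN
 *   t_a  : asynchronous phase
 *   t_o  : idle margin per CN (absorbs system jitter)
 *   n    : number of CNs                                               */
double epl_cycle_time_us(double t_soc, double t_rr, int n,
                         double t_a, double t_o)
{
    assert(n >= 0 && t_soc >= 0.0 && t_rr >= 0.0 && t_a >= 0.0 && t_o >= 0.0);
    return t_soc + t_rr * n + t_a + t_o * n;
}
```

For example, the MN could call stamp_now() when it sends a PReq and again when the matching PRes arrives; elapsed_us() of the two stamps then gives one sample of T_rr without any cross-node clock comparison.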
the timeout value which fits our specific environment and setup. A new equation can be derived from Equation 1:
$$T_C = T_{soc} + T_{rr} \cdot n + T_a + T_o \cdot n \qquad (3)$$
in which $T_a$ is the time for the asynchronous phase (which can be calculated in the same way as $T_{rr}$), and $T_o \cdot n$ is the idle time that ensures that the jitter of the system does not affect the regular cyclic communication. Note that the probability of the worst-case latencies occurring at the same time is very low, so we do not need a safety margin as large as $T_o \cdot n$. In other words, you might need to find a balance between cycle time and failure rate (here meaning the tolerable rate at which the actual cycle time exceeds the configured value).
Once the timeout, the specification of the communication (RT and non-RT data amount) and the number of nodes are fixed, we can use the data and the equation above to obtain the time for delivering the SoC (including the gap), the round time for PReq and PRes, and the time for the asynchronous and idle phases. Going a step further, we can roughly estimate the achievable cycle length of the EPL network with a 10Mbps hub or a 100Mbps switch.
Compared to the system jitter, the jitter of the switch is negligible in our case. We can take the average values of the parameters as the reference to estimate the achievable cycle time (note that we obtained the data under overload conditions). Normally the latency of a 100Mbps switch is about 10 microseconds (the precise value depends on the specific hardware), so you may use a 100Mbps hub to avoid the latency introduced by the switch and get better performance.
3.5 Other indicators
Up to 240 CNs can be connected in various configurations. You may use hubs (recommended for their small latency) or switches (whose latency must be taken into account) in more than one level with different topologies. Unlike the fixed topology of EtherCAT, EPL has a flexible topology. A mixed tree and line structure is available when a large number of nodes is used. Owing to its intrinsic Ethernet properties, the EPL network can easily be connected via gateways to non-real-time networks.
The throughput is closely related to the amount of data sent during one cycle and to the minimal cycle time. With these data, we can easily estimate the relationship between throughput, payload and cycle time. As Ethernet has enough bandwidth to fulfill the RTE and non-RTE throughput requirements, we do not pursue this issue further in this paper.
4 Conclusion
Given the strong potential of introducing Ethernet into distributed real-time control systems, we took several important performance indicators of real-time Ethernet into consideration when evaluating openPOWERLINK. Users can easily set up EPL on a Linux machine without any special hardware or particular topology and get relatively good hard real-time communication. This is convenient for users and saves considerable cost.
In this paper, we studied openPOWERLINK on a few significant indicators, benchmarked the critical phases in the EPL communication cycle and gave a reference model for industrial users and related researchers and developers. The data we obtained indicate that the cycle time and synchronization performance of the openPOWERLINK-on-RT-PREEMPT solution meet the requirements of process control systems and most motion control systems [2]. The data also show that the system jitter is quite large, which has a great impact on the EPL cycle time and synchronicity. To use the solution in high-precision motion control systems, some points need to be optimized, such as the software architecture, the code and RT-PREEMPT. Besides optimizing those points and implementing a real distributed control system with integrated mechanical equipment, our future work includes finding a way to effectively reuse the Ethernet card drivers of the Linux kernel.
5 References
[1] Jean-Dominique Decotignie, Ethernet-Based Real-Time and Industrial Communications, Proceedings of the IEEE, 2005.
Tommaso Cucinotta
Scuola Superiore Sant’Anna
Pisa, Italy
[email protected]
Fabio Checconi
IBM Research, T.J. Watson
Yorktown Heights, NY, USA
[email protected]
Dhaval Giani
Scuola Superiore Sant’Anna
Pisa, Italy
[email protected]
Abstract
In this paper the problem of providing network response guarantees to multiple Virtual Machines
(VMs) co-scheduled on the same set of CPUs is tackled, where the VMs may have to host both responsive
real-time applications and batch compute-intensive workloads. When trying to use a real-time reservation-
based CPU scheduler for providing stable performance guarantees to such a VM, the compute-intensive
workload would be scheduled better with high time granularities, to increase performance and reduce
system overheads, whilst the real-time workload would need lower time granularities in order to keep the
response-time under acceptable levels. The mechanism that is proposed in this paper mixes both concepts,
allowing the scheduler to dynamically switch between fine-grain and coarse-grain scheduling intervals
depending on whether the VM is performing network operations or not. A prototype implementation of
the proposed mechanism has been realized for the KVM hypervisor when running on Linux, modifying
a deadline-based real-time scheduling strategy for the Linux kernel developed previously. The gathered
experimental results show that the proposed technique is effective in controlling the response-times of the
real-time workload inside a VM while at the same time it allows for an efficient execution of the batch
compute-intensive workload.
Improving Responsiveness for Virtualized Networking Under Intensive Computing Workloads
1 Introduction and Related Work
Virtualization is increasingly gaining momentum as the enabling technology for the management of physical resources in data centers and Infrastructure-as-a-Service (IaaS) providers in the domain of Cloud Computing. Indeed, virtualization enhances the flexibility in managing physical resources, thanks to its capability to virtualize the hardware so as to host multiple Virtual Machines (VMs) executing potentially different Operating Systems, and its capability to live-migrate them as needed without interrupting the provided service, except for a very short down-time. Virtualized systems are also capable of exhibiting performance nearly equal to that experienced on bare metal, thanks to the hardware virtualization extensions provided by modern processors.
As a consequence of virtualization, multiple under-utilized servers can easily be consolidated onto the same physical host. This reduces the number of physical hosts required to support a given number of virtualized OSes, leading to advantages in terms of infrastructure running costs and energy impact.
However, once multiple VMs are deployed on the same physical resources, their individual performance is at risk of becoming greatly unstable, unless proper mechanisms are used. A VM that temporarily saturates the processing, networking, or storage access capacity of the underlying physical resources immediately impacts the performance of the other VMs sharing the same resources. This is a potentially critical issue for IaaS providers, whose Service-Level Agreements (SLAs) with customers include proper QoS specifications.
The problem of providing stable performance to individual VMs has been studied in the past. For example, Gupta et al. [8] introduce in the Xen hypervisor1 a CPU scheduling strategy that accounts for the consumption of the device driver domain(s) due to the operations of the individual VMs. In [11], an extension to the Xen credit-based scheduler is proposed to improve its behavior in the presence of multiple different applications with I/O-bound workloads. Also, Liao et al. [9] propose to modify the Xen CPU scheduler, by making it cache-aware, and the networking infrastructure, to improve the performance of virtualized I/O on 10Gbps Ethernet.
For the KVM hypervisor2, Cucinotta et al. [3, 4, 5] investigated the use of hierarchical deadline-based real-time CPU scheduling [2] for the Linux kernel in order to stabilize the performance of individual compute-intensive VMs, tackling the problem of network-intensive VMs later [6].
The latter works rely on a reservation-based scheduler [2] for the CPU (a hard-reservation variant of the Constant Bandwidth Server [1]) that allows configuring the scheduling guarantees for a given VM in terms of a budget (Q) and a period (P). The scheduler guarantees that each VM is scheduled for Q time units every period of P time units, under the usual assumption of non-saturation for EDF ($\sum_i Q_i / P_i \le 1$, see [10] for details). The reservation period can be specified independently for each VM, and it constitutes the time granularity over which the CPU allocation is granted to the VM.
A shorter period improves the responsiveness of the VMs at the cost of higher scheduling overheads, thus being beneficial for time-sensitive workloads. On the other hand, a longer period leads to lower scheduling overheads, so it is beneficial for batch and high-performance workloads, at the cost of potentially longer time intervals during which the VM is unresponsive (in the worst case, a VM might have to wait as long as 2(P − Q) before being scheduled again). However, for VMs embedding both batch computing activities (including both the main VM functionality and typical bookkeeping OS activities, such as updating indexes) and time-sensitive tasks (e.g., reporting on the progress of batch tasks, or realizing independent features), neither configuration fits very well, as highlighted by Dunlap in the discussion about future work on the upcoming new Xen Credit Scheduler [7].
In this paper we propose a novel mechanism for scheduling VMs with both compute-intensive and network-responsive workloads. In the absence of external requests, the VM progresses with its (long) period configuration (e.g., hundreds of ms) and can perform batch computing activities with scheduling overheads reduced to the minimum. However, the occurrence of external requests allows the VM to be woken up by the scheduler within a much shorter interval (e.g., ms or tens of ms), to perform relatively short activities configured at a higher priority inside the VM, so as to respond very quickly to external events.
1 More information at: http://www.xen.org/.
2 More information at: http://www.linux-kvm.org/.
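As a minimal sketch of the two quantities discussed above, the following C fragment checks the EDF non-saturation condition $\sum_i Q_i / P_i \le 1$ for a set of reservations pinned on one core, and computes the worst-case interval during which a VM with parameters (Q, P) may not run: P − Q when running alone, 2(P − Q) when co-scheduled. The struct and function names are hypothetical and are not part of the IRMOS scheduler or of KVM; time units are whatever the caller uses consistently.

```c
#include <assert.h>
#include <stddef.h>

/* A CPU reservation (Q, P): budget Q granted every period P (same time unit). */
struct reservation {
    double q;  /* budget Q */
    double p;  /* period P */
};

/* EDF non-saturation condition on one core: sum_i Q_i / P_i <= 1. */
int edf_admissible(const struct reservation *r, size_t n)
{
    double u = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(r[i].p > 0.0 && r[i].q >= 0.0 && r[i].q <= r[i].p);
        u += r[i].q / r[i].p;
    }
    return u <= 1.0;
}

/* Worst-case interval without CPU: P - Q when the VM runs alone,
 * 2(P - Q) when it is co-scheduled with other VMs under EDF. */
double worst_unresponsive(struct reservation r, int co_scheduled)
{
    double gap = r.p - r.q;
    return co_scheduled ? 2.0 * gap : gap;
}
```

For instance, a batch VM configured with (30, 100) ms has a worst-case gap of 70 ms when alone and 140 ms when co-scheduled, which is the responsiveness problem the proposed mechanism addresses.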
2 Approach
The mechanism proposed in this paper applies to virtual machines scheduled under a reservation-based real-time scheduler like the one presented in [2]. For the sake of simplicity, the focus is on single-core VMs scheduled according to a partitioned EDF policy (so one or more VMs are pinned on each physical core and scheduled on it).
Each VM can be configured with a set of scheduling parameters denoted by (Q, P), meaning that Q time units are granted to the VM every P time units. The interest in having Q/P < 1, and thus the possibility to co-schedule multiple VMs on the same processor and core, comes from the fact that the infrastructure provider may want to “partition” the large computing power of a single powerful core into multiple VMs with lower computing capabilities and rent them separately, or merely from the fact that the hosted VMs have an expected workload (e.g., due to requests coming from the network) that cannot saturate the computing power of the underlying physical core, thus enabling the provider to perform server consolidation. The Q value constitutes both a guarantee and a limitation (i.e., we are using hard reservations). This ensures that the performance of each VM is not affected (too much) by how intensively other VMs are computing [5, 4].
Roughly speaking, at equal Q/P ratios, the chosen value for P regulates the responsiveness of the associated VM. It is easy to see that, if the VM is running alone, its schedule is as shown in Figure 1, and the non-responsiveness time interval for the VM may be as long as P − Q. However, the worst-case condition when the VM is co-scheduled with other VMs is the one shown in Figure 2, with the budget granted to the VM at the beginning of a P time window (for example, because at that time all other VMs were idle), and at the end of the immediately following time window (for example, due to the wake-up of a VM at the beginning of this second time window, with a deadline slightly shorter than that of the first VM, under theoretical saturation of the EDF scheduler).
FIGURE 1: Example schedule of a VM with generic scheduling parameters of (Q, P), when running alone, exhibiting a non-responsiveness time interval of P − Q.
FIGURE 2: Example schedule of a VM with generic scheduling parameters of (Q, P), when co-scheduled, exhibiting a non-responsiveness time interval of 2(P − Q).
Also, P controls the scheduling overheads imposed on the system. In fact, the scheduler forces a context switch at least once every interval equal to the minimum P value across all the reservations configured on the core.
This kind of scheduler allows heterogeneous virtualized workloads to safely coexist as long as they belong to different VMs. One can easily configure a short P value for a VM with a real-time workload that needs to be responsive, and a long P for a VM that performs mainly batch computations. However, mixing such types of workloads in the same VM may lead to problems. One can configure the responsive activities in the VM to run at a higher priority than the batch computing ones (i.e., by exploiting the priority-based scheduling available on every OS). However, the non-responsive periods of the VM will still largely dominate the response time of the real-time task(s). So, in order to keep such response times low, the normal option would be to use small P values, which incurs high scheduling overheads even while the VM is doing its batch computing activities without any request from the outside triggering the real-time functionality.
In order to resolve this problem, in this paper we propose the following mechanism (see Figure 3). The VM is normally attached to a reservation configured with scheduling parameters (Q, P), with a period P tuned for the batch computing case, i.e., relatively large, for example in the range of hundreds of milliseconds. In addition, a second “spare” reservation is configured in the system with parameters (Qs, Ps) tuned for the operation of the real-time activity, i.e., Ps is relatively small, for example in the range of tens of milliseconds or shorter, and Qs is sufficient to complete an activation of the real-time activity. Now, whenever the VM receives a network packet and its current budget is exhausted (i.e., it is in the non-responsiveness time frame), the VM
is temporarily attached to the “spare” reservation. Having a much shorter deadline, the spare reservation forces the VM to be scheduled and to receive Qs execution time units on the processor within the Ps deadline from the packet receive time; this causes the VM to run, receive the packet and possibly activate the real-time activity that performs some fast computation (and possibly sends a response packet). If the real-time activity cannot complete within the first activation of the spare reservation, it is resumed during the subsequent activations, so it receives additional Qs time units during the following Ps time window, and so on, until the replenishment of the original reservation budget, at which time the VM relinquishes the spare reservation. With a proper tuning of the Qs and Ps parameters, a VM configured for batch computing activities should exhibit a tremendously improved response time to sporadic requests coming from the network, at the cost of keeping some extra capacity unused in the system.
FIGURE 3: Example schedule of a VM with generic scheduling parameters of (Q, P), and a spare reservation of (Qs, Ps) which is dynamically activated and attached to the same VM on a new packet arrival. Although the budget of the VM was exhausted at packet arrival time, the VM can complete a short real-time activity of duration Qs within the spare reservation period Ps.
The requirements of the real-time workload are assumed to be relatively small, and in any case the additional reservation to be attached dynamically to a VM cannot be too large in terms of utilization (budget over period), because it needs to remain unused for all the time in which the VM does not access the network. For example, it might require a CPU utilization of 10% or lower. This should allow the real-time activity triggered by the received network packet to complete, assuming it is configured in the VM to run at a higher priority than other activities. For example, the VM may perform kernel-level activities inside the networking driver and stack, and relatively short userspace activities, which may be running in a task that was waiting for the packet arrival.
Finally, in order to avoid keeping a spare reservation for each and every VM hosted on the same physical host, we propose to use a pool of spare reservations which can be used for the purpose illustrated above. The idea is that, exploiting statistical multiplexing of the networking traffic patterns among independent VMs, one can assume that the probability of all the VMs requiring a spare reservation at the same time is very small. This way, the additional utilization reserved for spare reservations can be kept limited.
Therefore, a pool of a few reservations with short periods will be ready to be used for boosting the reservations (with longer periods) of VMs when they receive packets from the external world but their normal budget is exhausted due to compute-intensive activities. This allows for a very quick reaction time of the VMs.
3 Implementation Details
In order to validate the proposed approach we implemented a proof of concept in the Linux kernel, using the KVM hypervisor to execute the VMs. We started from the IRMOS scheduler [2], modifying it to include support for reservations providing “spare” bandwidth, and introducing the glue code needed to use this new feature.
From the interface point of view, each reservation may have the property of providing spare bandwidth to the reservations needing it, and/or the property of using spare bandwidth from reservations providing it. The system administrator controls the parameters of the reservations and the dependencies between users and providers of spare bandwidth using the CGROUP filesystem interface.
To recognize the events that are related to VM I/O, and consequently activate the spare bandwidth mechanism, we modified the networking code. In our modified kernel, when a packet arrives we check its destination and, if it is headed towards a Virtual Machine, we retrieve its server using a simplified hash table. If the server has run out of bandwidth, we set a flag to mark that it needs to access its spare reservation. Setting the flag may also imply requeueing the running tasks belonging to the same VM, as they may need to access the spare bandwidth too.
When a task is activated, besides performing a regular activation, the scheduler checks whether the task belongs to a virtual machine and whether the VM's server needs spare bandwidth; if this is the case, the task is not only enqueued in its own server, as would be
done anyway, but it is also enqueued in the server providing the spare reservation.
The flag set on the VM's server needs to be reset, and this may happen under two conditions. The first is when the emergency bandwidth has been in use for a certain duration, empirically determined not as a function of time, but rather of the chances the server has had to execute its tasks. The second is when the original server has its bandwidth restored.
4 Experimental Results
The approach presented in the previous section was validated through an experiment conducted on a prototype implementation of the mechanism, evaluated on a Linux 2.6.35 kernel patched with the IRMOS real-time scheduler [2], running on an Intel Core 2 Duo P9600 CPU configured to run at a fixed 2.66 GHz frequency. The VM was configured with the CPU thread running at a real-time priority lower than the one used for all its other threads. We were unable to use the full implementation described in Section 3, and we used only a subset of it, handling
In order to show the advantages of the technique, the ping times for reaching the VM have been mea-
der to evaluate the worst-case latency experienced by ping, the VM was pinned on the first physical core of the host, while a user-space tool, pinned on the other core, was used to spin-wait for budget exhaustion of the associated reservation and to issue a ping request at that time. As highlighted in Section 2, the minimum ping time that can be observed in this case is theoretically P − Q = 60ms (far higher values were actually observed). However, the mechanism introduced in this paper attaches the spare reservation to the VM at the ping packet receive time, so the VM has a chance to run for Qs = 1ms within the deadline of Ps = 10ms (and for an additional 1ms in each subsequent 10ms time window, until the replenishment of the original reservation budget), thus responding to the request much more quickly.
The obtained ping times with the VM running under the real-time scheduler are shown in Figure 4. As can be seen, when using the spare reservation mechanism (bottom curve), the experienced ping times are greatly reduced compared to when it is not used (top curve).
(Figure 4 plot: ping times in ms; data sets ping-notrick.dat and ping-trick.dat)
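The packet-driven attach and release logic of Sections 2 and 3 can be illustrated with a small user-space model. This is a sketch under our own assumptions, not the IRMOS/KVM kernel code: the names vm_server, spare_pool, on_packet, on_replenish and spare_response_bound are hypothetical, the pool size of 2 is arbitrary, only the replenishment-based reset of the flag is modelled (the duration-based reset is omitted), and times are integer microseconds.

```c
#include <assert.h>

#define SPARE_POOL_SIZE 2  /* hypothetical: a few spares shared by all VMs */

/* Hypothetical per-VM server state (times in microseconds). */
struct vm_server {
    long budget;       /* remaining budget of the main (Q, P) reservation */
    int  needs_spare;  /* flag set when a packet arrives with budget exhausted */
    int  spare_idx;    /* attached spare reservation, or -1 if none */
};

/* Pool of short-period (Qs, Ps) reservations. */
struct spare_pool {
    int in_use[SPARE_POOL_SIZE];
};

/* Packet arrival for this VM: if its main budget is exhausted, flag the
 * server and attach a free spare reservation. Returns 1 if a spare is
 * attached after the call, 0 otherwise. */
int on_packet(struct vm_server *vm, struct spare_pool *pool)
{
    if (vm->budget > 0 || vm->spare_idx >= 0)
        return vm->spare_idx >= 0;   /* can run now, or already boosted */
    vm->needs_spare = 1;
    for (int i = 0; i < SPARE_POOL_SIZE; i++) {
        if (!pool->in_use[i]) {
            pool->in_use[i] = 1;
            vm->spare_idx = i;
            return 1;
        }
    }
    return 0;  /* pool exhausted: wait for the main budget replenishment */
}

/* Replenishment of the main budget: reset the flag, release the spare. */
void on_replenish(struct vm_server *vm, struct spare_pool *pool, long q)
{
    vm->budget = q;
    vm->needs_spare = 0;
    if (vm->spare_idx >= 0) {
        pool->in_use[vm->spare_idx] = 0;
        vm->spare_idx = -1;
    }
}

/* Response-time bound for a job needing c us of CPU when served only by a
 * spare (qs, ps) attached at arrival with a full budget: it receives qs per
 * ps window, so it completes within ceil(c / qs) windows of length ps. */
long spare_response_bound(long c, long qs, long ps)
{
    assert(c > 0 && qs > 0 && ps >= qs);
    return ((c + qs - 1) / qs) * ps;
}
```

With the experiment's values Qs = 1ms and Ps = 10ms, this sketch bounds a reaction needing at most 1ms of CPU by 10ms, against the theoretical P − Q = 60ms without the spare reservation. The bound assumes the spare reservations pass the EDF admission test and ignores the extra help from the main budget's replenishment.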
October 3, 2011
Abstract
Using Linux and Open Source in an industrial environment is becoming more and more common, in
part to ensure participation with daily improvements and compatibility with future development. One
of the most important requirements in the environment of industrial machinery control is real-time behavior, so
we decided to evaluate RT-Linux on different hardware platforms. To generate a realistic load which is
comparable to real machinery control, a simplified version of a machinery control, called testplc, was
developed and used in the hardware assessments conducted. The results of this evaluation should give a
clear statement about the applicability of each hardware platform for the machinery control area.
1 Introduction
2 Method
Evaluation of RT-Linux on different hardware platforms for the use in industrial machinery control
ery control. It is very interesting to implement a small test control application modelled on the real control application. The concrete results give us a better understanding of which influences lead to a scheduling malfunction, and the properties of multi-core CPUs become visible in the latency histograms.
Indeed, a machinery control application in real life has to deal with locking, interrupts, priority inheritance and other complicated matters, but if this simple test control application shows problems in scheduling and runtime, a real machinery control application will never work.
2.1 Test Control Application
The test control application is implemented in pure C with POSIX threads. Five of these threads form the core of this control application, with different cycle times and priorities to simulate the real machinery control. The timing is controlled with absolute timestamps starting from a global start time. In every cycle the period time is added to the absolute time. Each thread reads the current system time after returning from clock_nanosleep and then calculates the difference between the scheduled time and the actual time. We call this difference latency and insert the value into a histogram. The parameters for clock_nanosleep are clockId=CLOCK_MONOTONIC and flags=TIMER_ABSTIME. For safety reasons we introduce a latency threshold to detect a scheduling fault.
Every thread has a number of functions to execute, for example calculating 100 double multiplications, 500 double multiplications, sorting lists, calculating pi and communicating 1024 bytes over UDP to a server. The UDP communication is only done in the 10000 microsecond cycle time task to avoid network problems. This communication generates a huge number of interrupts to stress the scheduler.
For measurement we take the time of 500 double multiplications in nanoseconds, which is also stored in a histogram. The timing and priority configurations of these threads are
2. 500 us cycle time with priority 75
3. 1000 us cycle time with priority 70
4. 2000 us cycle time with priority 65
5. 10000 us cycle time with priority 60
This test control application gives us two histograms as result. The first one is a latency histogram of all five threads within a range of 0 to 300 microseconds. The second one is a timing histogram of the 500 double multiplications within a range of 0 to 20000 nanoseconds.
2.2 RT-Linux
To get comparable results, equal kernel configurations must be used for the evaluation on each hardware platform. As already mentioned in chapter 2, the kernel used was patched with the required RT-Patch, which is available at www.kernel.org. To ensure real-time behaviour, both the kernel configuration (see section 2.2.1) and the runtime configuration (see section 2.2.2) are important and were adapted.
2.2.1 Kernel configuration
The following items in the kernel configuration were considered:
Processor type and features
[ * ] High Resolution Timer Support
[ * ] Symmetric multi-processing support
Preemption Model
(*) Complete Preemption (Real-Time)
[ * ] Generic x86 support
Timer frequency
(x) 1000 HZ
Power management and ACPI options
[ * ] Power Management Support
[ ] CPU Frequency Scaling
Kernel Hacking
[ * ] Tracers
[ * ] Scheduling Latency Tracer
[ * ] Scheduling Latency Histogram
[ * ] Missed Timer Offset Histogram
To ensure real-time behavior during runtime, the Real-Time group scheduling must be modified. Therefore the contents of the files /proc/sys/kernel/sched_rt_period_us and /proc/sys/kernel/sched_rt_runtime_us have to be set equal. The standard content for sched_rt_period_us is 1000000 (1s) and for
Performance Evaluation and Enhancement of Real-Time Linux
3 Results
3.1.1 Description
3.1.2 Result
The testplc takes about 80 % of the whole CPU performance. In this case the latency times vary over more than 100 microseconds, and the peaks look like serialized tasks ordered by their configured priority. One of the main reasons for this picture of latency times in Figure 1 may be the poor performance and the missing ability for parallel computing. The first three, highest-priority tasks do not exceed the maximum worst-case latency of 300 microseconds, whereas the 2000 and 10000 microsecond tasks exceed the worst-case latency of 300 microseconds very often, as shown by the peaks on the right of the histogram, where the overruns are accumulated.

3.2.1 Description
The testsystem T7500 is based on a Kontron mainboard with an Intel(R) Core(TM)2 Duo CPU T7500 clocked at 2.20 GHz and 3 GB of memory.

3.2.2 Result
The testplc takes about 15 % of the whole CPU performance. In this case the latency times, as shown in Figure 3, do not vary much and concentrate within 10 microseconds. This plot shows the benefit of parallel computing: the two fastest tasks set their CPU affinity so that each of them uses one core.
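How a task can bind itself to one core is sketched below. The paper does not say which call the test PLC uses; this sketch assumes sched_setaffinity on the calling thread (pthread_setaffinity_np would be an alternative), and the helper names first_allowed_cpu and pin_self_to_cpu are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Lowest-numbered CPU the calling thread is currently allowed to run on, or -1. */
int first_allowed_cpu(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof set, &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* Restrict the calling thread to a single core (pid 0 = caller on Linux).
 * Returns 0 on success, -1 with errno set on failure. */
int pin_self_to_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof set, &set);
}
```

On the dual-core T7500, the two fastest tasks could call pin_self_to_cpu(0) and pin_self_to_cpu(1) respectively before entering their cyclic loops, so that neither has to wait for the other.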
Evaluation of RT-Linux on different hardware platforms for the use in industrial machinery control
Figure 3: Testsystem T7500 latency histogram
Figure 4: Testsystem T7500 multiplication time
Figure 5: Testsystem D525 latency histogram
Figure 6: Testsystem D525 multiplication time
3.3.2 Result
The testplc takes about 32 % of the whole CPU performance. In this case the latency times in Figure 5 cover a range of up to 200 microseconds. Only a few peaks exceed 300 microseconds.

The testplc takes about 25 % of the whole CPU performance. In this case the latency times of the three highest-priority tasks in Figure 7 do not vary much and concentrate within 30 microseconds. The lower the priority, the higher the variation in the latency time, but even the latency of the lowest-priority task has so far stayed below 150 microseconds.
Figure 8: CP255 multiplication time
The results in Figure 7 were produced using a vendor-specific rt-kernel based on version 2.6.33.9.

3.5 Testsystem OMAP4
3.5.1 Description
The testsystem OMAP4 is based on an OMAP4 Panda board with an ARMv7 Processor rev 2 (v7l) clocked at 1.00 GHz and 1 GB of memory.
3.5.2 Result

4 Conclusion
The recorded histograms show the different real-time behaviours of the evaluated hardware platforms. For the Z530 and OMAP testsystems the real-time capability cannot be confirmed, as the latency time shows a variation of more than 300 microseconds. In contrast to the testsystem Z530, the histograms from the testsystem T7500 show a very small range of latency variation of only 40 microseconds for the tasks with the lowest priority. Although the testsystem CP255 is optimized for real time, its results are not as good as the values from testsystem T7500; we assume the CPU performance is the reason for this behaviour. The four higher-priority tasks show a variation of only 15 microseconds. As the implementation considers the requirements of a real machine control and the evaluation gives a good overview of the real-time capability, we consider the presented evaluation method an extension to the OSADL QA farm [2] evaluation.
Wolfgang Wallner
Josef Baumgartner
Abstract
The RT-Preempt patch for Linux turns the Linux operating system into a real-time operating system and is therefore an ideal platform for implementing a real-time Ethernet protocol stack like openPOWERLINK. The initial implementation of the openPOWERLINK stack on x86 Linux was developed as a kernel module. This solution completely bypasses the Linux network stack and achieves maximum performance through the use of its own network interface drivers. However, this limits the protocol stack to the few openPOWERLINK network interface drivers currently available and also makes it very dependent on the kernel version in use. To circumvent these drawbacks, the whole protocol stack was implemented as a Linux user space library. As most of the necessary real-time features are also available in user space, and many applications do not need the performance level of the kernel space implementation, this solution is adequate for many applications.
This paper describes the porting of the openPOWERLINK stack to user space and examines the performance of the user space implementation. To this end, the influence of the user space implementation on the network jitter and on the generated system load is analyzed and compared with the kernel space implementation. Because the long-term goal is to integrate the lower-level layers of the openPOWERLINK stack into the mainline Linux kernel, this paper furthermore discusses how the protocol stack could be split into a kernel part, to be integrated into the Linux kernel, and a user part, provided as a user space library.
openPOWERLINK in Linux Userspace
network have to support the timing rules, standard Ethernet devices may not be connected directly to a POWERLINK domain.

1.2 POWERLINK Layer Model

Figure 1 shows the generic Ethernet POWERLINK layer model. The POWERLINK protocol is located at OSI layers 2 and 7 (Data Link Layer (DLL) and application layer). The characteristic timing that is used to circumvent the non-real-time attributes of standard Ethernet (mainly CSMA/CD) belongs to the DLL. The POWERLINK specification defines that the CANopen interface is used as the application layer. Using CANopen as the application layer makes it easy to integrate classical CANopen applications into POWERLINK. The CANopen concepts of Device Profiles, the Object Dictionary, Service Data Objects (SDOs), Process Data Objects (PDOs) and Network Management (NMT) are all reused in POWERLINK. This is why POWERLINK is often referred to as "CANopen over Ethernet".

FIGURE 1: Overview of the POWERLINK Protocol Layers.

1.3 Communication Principle (Data Link Layer)

sion of the Start of Cyclic (SoC) frame by the MN. The SoC frame is sent as a multicast and can be received and processed by all other POWERLINK stations in the network. No application data is transported in the SoC; it is only used for synchronization.

Immediately after transmitting the SoC, the MN addresses each CN in the network with a Poll Request frame (PReq). Each CN responds with a Poll Response (PRes). This frame is sent as multicast and can therefore be received by the MN as well as by all other CNs in the network. Thus, the PRes not only carries input data from the CN to the MN, but also allows cross-communication among the CNs. Direct cross-communication considerably reduces the time needed for data exchange between stations, since the data need not be copied in the MN.

A CN only transmits when it receives a directly addressed request (PReq) from the MN. The MN waits for the response from the CN. This prevents collisions on the network and enables deterministic timing.

A fixed time is reserved in the network cycle for asynchronous data. Asynchronous data differs from cyclic data in that it need not be configured in advance. Asynchronous data is generated on demand by a POWERLINK station; examples are visualization data, diagnostic data, etc. One asynchronous frame can be sent per POWERLINK cycle. The CNs can signal the MN in the poll response frame that they would like to send asynchronous data. The MN determines which station is allowed to send and shares this information in the Start of Asynchronous (SoA) frame. Any Ethernet frame can be sent as an asynchronous frame (ARP, IP, etc.). However, a maximum length (MTU = Maximum Transmission Unit) must not be exceeded.

The most important timing characteristic in an Ethernet POWERLINK network is the cycle time, which is measured between the starts of two consecutive SoC frames. The worst-case jitter of the cycle time is a quality attribute of the MN. A typical POWERLINK communication cycle is shown in Figure 2.
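The frame order within one cycle, as described above, can be made concrete with a small C model. It is a toy sketch, not part of openPOWERLINK: the type and function names are hypothetical, timing and payloads are ignored, and only a CN is modeled as the sender of the asynchronous frame (the text notes that any station may be granted the slot).

```c
#include <assert.h>
#include <stddef.h>

enum pl_frame { PL_SOC, PL_PREQ, PL_PRES, PL_SOA, PL_ASYNC };

struct pl_event {
    enum pl_frame type;
    int node;   /* CN node ID the frame concerns; 0 = none */
};

/* Build the on-wire frame order of one cycle: SoC, PReq/PRes per CN, SoA,
 * then at most one asynchronous frame from the CN granted in the SoA.
 * async_node == 0 means no asynchronous frame in this cycle. */
size_t pl_build_cycle(const int *cn_ids, size_t n_cn, int async_node,
                      struct pl_event *out, size_t cap)
{
    size_t k = 0;
    assert(cap >= 2 * n_cn + 3);                          /* worst-case length */
    out[k++] = (struct pl_event){PL_SOC, 0};              /* multicast, sync only */
    for (size_t i = 0; i < n_cn; i++) {
        out[k++] = (struct pl_event){PL_PREQ, cn_ids[i]}; /* MN polls one CN */
        out[k++] = (struct pl_event){PL_PRES, cn_ids[i]}; /* CN replies (multicast) */
    }
    out[k++] = (struct pl_event){PL_SOA, async_node};     /* announces the async slot */
    if (async_node != 0)
        out[k++] = (struct pl_event){PL_ASYNC, async_node};
    return k;
}
```

For three CNs with the asynchronous slot granted to CN 2, the cycle consists of nine frames: SoC, three PReq/PRes pairs, SoA and one asynchronous frame. The strict poll/response pairing is what keeps the medium collision-free.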
+ Still enough performance for many production applications

Section 2.3 sketched the general porting procedure; this section describes the design decisions that were made for the Linux user space implementation.
chosen as the basis for the Linux platform. As libpcap uses RAW sockets on Linux, there is nearly no performance difference.

3.3.4 Timers

The implementations of both LRTs and HRTs use the POSIX timer API. Using POSIX timers for the needed HRTs is possible because of the high-resolution timers that were introduced by Thomas Gleixner and Ingo Molnar and have been part of the Linux kernel since 2.6.16. The new timer system no longer depends on the periodic tick of the operating system and allows nanosecond resolution. However, the resolution depends on the available timer hardware of the system. On the Intel x86 architecture, different clock sources are available (hpet, tsc, acpi_pm) which provide a usable timer resolution in the microsecond range. These high-resolution timers can be used to increase the precision of POSIX user space timers, which is exactly what we needed.

A detailed overview of the new architecture is given in the paper Hrtimers and Beyond: Transforming the Linux Time Subsystems [3].

4 Performance Evaluation

The Ethernet POWERLINK network in the figure is highlighted in orange; the other connections, shown in white, indicate standard non-real-time Ethernet. As measuring network times with tools like Wireshark on a standard desktop PC suffers from larger jitter in the timestamps of individual frames, a B&R Network Analyzer X20ET8819 was used. The B&R Network Analyzer is equipped with two network ports, one for POWERLINK and one for standard Ethernet. It is able to capture frames on the POWERLINK network and timestamp them with a 20 ns resolution. It packs the timestamped POWERLINK frames into UDP packets and sends them out on the Ethernet interface for further analysis. The PC that was used to collect the captured POWERLINK frames, later run statistical analyses and create the test reports is shown in the upper right corner. To generate network stress, another external device was needed. For this purpose, another Linux PC was used, which is shown in the upper left corner.
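The SoC jitter evaluation performed on the analyzer timestamps can be illustrated with a short C sketch. It assumes that jitter is defined as the deviation of each measured SoC-to-SoC interval from the configured cycle time; the paper does not spell out its exact definition, its own evaluation used a test program and GNU R, and the names soc_jitter and struct jitter_stats are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct jitter_stats {
    int64_t min_dev_ns;   /* most negative deviation from the nominal cycle */
    int64_t max_dev_ns;   /* most positive deviation */
    int64_t worst_abs_ns; /* largest absolute deviation */
};

/* soc_ts: SoC reception timestamps in ns, in capture order.
 * Returns 0 on success, -1 if fewer than two timestamps are given. */
int soc_jitter(const int64_t *soc_ts, size_t n, int64_t cycle_ns,
               struct jitter_stats *st)
{
    if (n < 2)
        return -1;                              /* need at least one interval */
    st->min_dev_ns = INT64_MAX;
    st->max_dev_ns = INT64_MIN;
    for (size_t i = 1; i < n; i++) {
        assert(soc_ts[i] > soc_ts[i - 1]);      /* timestamps must increase */
        int64_t dev = (soc_ts[i] - soc_ts[i - 1]) - cycle_ns;
        if (dev < st->min_dev_ns)
            st->min_dev_ns = dev;
        if (dev > st->max_dev_ns)
            st->max_dev_ns = dev;
    }
    int64_t neg = -st->min_dev_ns, pos = st->max_dev_ns;
    st->worst_abs_ns = neg > pos ? neg : pos;
    return 0;
}
```

With a 1 ms cycle and SoC timestamps at 0, 1000020, 1999980 and 3000040 ns, the intervals deviate by +20, -40 and +60 ns, so the worst-case jitter is 60 ns. The 20 ns timestamp resolution of the analyzer bounds how finely such deviations can be resolved.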
at 1.06 GHz, 1 GByte DDR2 PC2-5300 DRAM and a 40 GB hard disk drive. The Intel 945GME chipset contains the Graphics Media Accelerator GMA 950. The APC is equipped with two on-board network interfaces, which use different Ethernet chips. One of these interfaces is based on the Intel 82573L, while the other uses a Realtek 8111B. A third interface was added as a PCI card, based on the Realtek 8139 chip.

These connections were used as follows:

• Intel 82573L: Used as POWERLINK interface. This interface was configured to have no IP address, to avoid interference between the POWERLINK and the Linux network stack.
• Realtek 8111B: Connected to the corporate network, used for TCP/IP communication.
• Realtek 8139: Directly connected to the PC that serves as flood ping generator (used for the network stress test).

The operating system used on the MN was Ubuntu 10.04 LTS (Lucid Lynx). We used the latest stable real-time kernel, 2.6.33.7-rt30, as listed on the OSADL webpage [8]. The thread priorities were adjusted to the following values (demo_pi_console is the name of the demo application used):

Thread            Priority  Description
sirq-hrtimer/0    -81       High-res timer
sirq-hrtimer/1    -81       High-res timer
demo_pi_console   -76       High-res timer
irq/29-eth1       -71       Network interface
sirq-net-rx/0     -61       Network handling
sirq-net-rx/1     -61       Network handling
demo_pi_console   -56       Shared Buffer K→U
demo_pi_console   -51       Shared Buffer U→K
demo_pi_console   -51       Edrv (PCAP)
demo_pi_console   -50       Low-res timer
demo_pi_console   -21       Startup thread

TABLE 1: Thread priorities used by the user space implementation.

The priorities of the system threads were increased using the tool chrt before the POWERLINK application was started. To adjust the priorities of the stack-internal threads, the sched_priority parameter was set through the POSIX scheduling API at run time.

As stated earlier in Section 2.3, the interrupt latencies for timer IRQs need to be as low as possible to increase precision. This is why the timer-related threads are set to the highest priorities. For the same reason, the priorities of network-related threads have been increased. It is important that the real-time related openPOWERLINK threads all have a higher priority than the other system threads. The internal priority relation between the different openPOWERLINK threads is based on the stack architecture (timer threads higher than network threads, the kernel-to-user shared buffer thread higher than the user-to-kernel shared buffer thread, ...). The low-res timer threads have the same priority as the system SIRQs, as they are not critical to the real-time behaviour. The startup thread has a very low priority, because it is mainly used for initialization and has nothing to do during cyclic operation.

Controlled Nodes (CNs)

A network of standard B&R POWERLINK bus couplers (X20BC0083 [6]) was used as CNs. As the application on the MN should simulate a real-world implementation, these CNs were equipped with input and output modules and exchanged new data in every cycle. These CNs were addressed as standard CANopen DS401: Generic I/O modules.

Network Analyzer

To generate high-precision time stamps for the observed POWERLINK frames, a B&R network analyzer (X20ET8819) was used. This device is equipped with two Ethernet ports. One of these interfaces is used as a pure POWERLINK input port to analyze the received frames. It latches the time of reception with a precision of 20 ns. This information is packed into UDP packets and sent out on the second Ethernet port. On the PC this information can be received and further processed, e.g. to create high-precision Wireshark traces. In our case these measurements were evaluated in our test program to measure the SoC jitter.

Flood ping generator

A standard desktop PC running Linux was used to create large numbers of network IRQs on the MN by sending flood pings.

Measurement PC

Another standard desktop PC running Linux was used to dump the timing measurements sent by the network analyzer, do statistical calculations on them using GNU R, and create the test reports with LaTeX.
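Setting a stack-internal thread's priority at run time, as described for Table 1, can be sketched as follows. This assumes the calling thread switches itself to SCHED_FIFO with sched_setscheduler; the helper name set_self_fifo_priority is hypothetical, and the call needs CAP_SYS_NICE or root. The valid range is queried rather than hard-coded (on Linux it is 1 to 99 for SCHED_FIFO).

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <string.h>

/* Switch the calling thread to SCHED_FIFO with priority `prio`.
 * Returns 0 on success or an errno value (e.g. EPERM without CAP_SYS_NICE). */
int set_self_fifo_priority(int prio)
{
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);
    assert(lo >= 0 && hi >= lo);
    if (prio < lo || prio > hi)
        return EINVAL;                       /* reject before calling the kernel */
    struct sched_param sp;
    memset(&sp, 0, sizeof sp);
    sp.sched_priority = prio;
    /* pid 0 applies the policy to the calling thread on Linux */
    return sched_setscheduler(0, SCHED_FIFO, &sp) == 0 ? 0 : errno;
}
```

Following the ordering of Table 1, a timer thread would request a higher value than the shared-buffer and Edrv threads, and the startup thread the lowest. System threads such as sirq-hrtimer/0 can be raised from the shell with chrt, e.g. `chrt -f -p <prio> <pid>`.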
4.2 Results

Precision

For a comparison between the measured jitter values of the kernel space and the user space implementation, see Figure 5. The influence of the different load scenarios is very similar for both the user space and the kernel space implementation. Note, however, the different scales: in the range of 100 µs for user space and in the range of 40 µs for kernel space. High scheduling load has the greatest impact on the network latencies in both implementations.

Performance

The measured CPU load of the user space implementation in different configurations is visualized in Figure 6. The kernel space and user space implementations are compared in the following table (CPU load is given in percent of a single CPU core):

FIGURE 6: CPU load of the pcap-based POWERLINK stack in different configurations.

5 Conclusion and Future Work

The measured values of performance and precision of the user space implementation are inferior to those of the kernel space variant, which was expected. While high-performance applications still need to be served by the kernel space implementation, the experiments have shown that the user space variant can be used for many applications with lower requirements. A noticeable benefit of the user space implementation is its portability. Through the use of the pcap library, it can be used with any Ethernet chip that is supported by the mainline Linux kernel. In combination with
FIGURE 5: Boxplots showing the measured jitter values during different load scenarios. (a) shows the jitter values of the kernel-based implementation, while (b) shows the results of the user space version using pcap.
[5] APC 810 User’s Manual, Version 1.20, October 2009, Bernecker + Rainer Industrie-Elektronik Ges.m.b.H, Austria.
[12] hackbench homepage, http://devresources.linux-foundation.org/craiger/hackbench
Michal Sojka¹, Pavel Píša¹, Ondřej Špinka¹, Oliver Hartkopp², Zdeněk Hanzálek¹
¹ Czech Technical University in Prague
Technická 2, 121 35 Praha 6, Czech Republic
{sojkam1,pisa,spinkao,hanzalek}@fel.cvut.cz
² Volkswagen Group Research
Brieffach 1777, 38436 Wolfsburg, Germany
Abstract
In this paper, we thoroughly analyze the timing properties of a CAN-to-CAN gateway built with the Linux kernel CAN subsystem. The latencies induced by this gateway are evaluated under many combinations of conditions, such as when traffic filtering is used, when the gateway is configured to modify the routed frames, when various types of load are imposed on the gateway, or when the gateway runs on different kernels (both rt-preempt and vanilla are included). From the detailed results, we derive the general characteristics of the gateway. Some of the results apply not only to the special case of CAN-to-CAN routing, but also to the whole Linux networking subsystem, because many mechanisms in the Linux networking stack are shared by all protocols.
The overall conclusion of our analysis is that the gateway is in pretty good shape, and our results were used to support merging the gateway into Linux mainline.
Timing Analysis of a Linux-Based CAN-to-CAN Gateway
The results of this testing are reported in this paper. The complete data set, consisting of gigabytes of data and more than one thousand graphs, as well as the source code of our testing tools, is available for download in our public repositories [4, 5]. This allows others interested in this topic to independently review our results and methods, as well as to use them as a basis for their own experiments. Our methods and results are relevant not only for the special case of CAN-to-CAN routing but, since the Linux networking subsystem forms the core of many other protocols, also for other networks including Ethernet, Bluetooth, Zigbee etc.

The paper is organized as follows: the next section describes the setup of our testbed and how we measured the gateway latencies. Section 3 summarizes the main results found during our testing. We give our conclusion in Section 4.

2 Testbed Setup

The testbed used for gateway latency measurements is depicted in Figure 1 and consists of a standard PC and the gateway. The PC is a Pentium 4 running at 2.4 GHz with 2 GB RAM, equipped with a Kvaser PCI quad-CAN SJA1000-based adapter. The gateway is an embedded board based on the MPC5200B (PowerPC) microcontroller running at 400 MHz. Two CAN buses connect the PC with the gateway. The PC generates the CAN traffic on one bus and observes the traffic routed via the gateway on the other bus. The gateway is also connected to the PC via Ethernet (using a dedicated adapter in the PC). This connection serves for booting the gateway via the TFTP and NFS protocols, for configuring it via SSH, and also for generating Ethernet load to see how it influences the gateway latencies.

FIGURE 1: Testbed configuration.

The software configuration is kept as simple as possible so that the results are not disturbed by unrelated activities. The gateway runs only a Linux kernel, a Dropbear SSH server and, obviously, the gateway itself. On the PC, a stripped-down Debian distribution is used. The tasks that generate the test traffic and measure the gateway latency are assigned the highest real-time priority, and their memory is locked in order to prevent page faults. SocketCAN was used as the CAN driver on both the gateway and the PC.

2.1 Measurement Methodology

To measure the gateway latency, we generate CAN traffic in the PC and send it out from the can0 interface. As can be seen in Figure 1, this interface is directly wired to the can1 interface of the PC as well as to one interface of the gateway. The can1 interface is used to receive the frames in order to determine the exact time when each frame actually appears on the bus². This is necessary in order to exclude various delays such as the queuing time in the can0 transmit queue. When a frame is received on the can1 interface, it is timestamped by the driver in its interrupt handler. These timestamps are sufficiently precise for our measurements.

² We could also use kernel-provided TX timestamps for this, but somehow this did not work in our setup.

The frames routed through the gateway are received on the can2 interface of the PC. Again, these frames are timestamped in the same way as described in the previous paragraph. The total latency is then calculated by simply subtracting the timestamps measured on the can2 and can1 interfaces (see Figure 2). It is worth noting that both timestamps are obtained using the same clock (in our case the time-stamp counter register of the PC's CPU), which ensures that the results are not influenced by the offset of non-synchronized clocks.

To calculate the latency, we need to determine which received frame corresponds to which transmitted one, and this mechanism must be able to cope with possible frame losses or frame modifications in the gateway. For this purpose, the first two bytes of the data payload are used to store a unique number that is never modified by the gateway. This number serves as an index into a lookup table, which stores the timestamps relevant to the particular frame. This allows for easy detection of frame losses: when the corresponding entry in the lookup table contains just one timestamp after a certain timeout, which is set to 1 s by default, the frame is considered lost.
[Figure: timeline with the labels "Total latency", "Duration", "GW latency" and "time"]
we subtract from the total latency the duration of the frame transmission, where we take into account the stuff bits inserted by the CAN link layer.

The sources of our testing tools and the individual test cases can be found in our git repository [4].

Our goal was to measure the properties of the gateway under a wide range of conditions. These included:

1. The gateway configuration, such as frame filters, frame modifications, etc.
2. Additional load imposed on the gateway system. The following types of load were considered: no load; CPU load, i.e. running hackbench³ on the gateway; Ethernet load, i.e. running ping -f -s 60000 -q gw on the PC, with gw being the IP address of the gateway.
3. Type of CAN traffic. We tested the gateway with three kinds of traffic: one frame at a time, where the next frame was sent only after the previously sent frame had been received from the gateway; 50% bus load, where frames were sent with a fixed period equal to two times the transmission duration; and finally 100% bus load (flood), where frames were sent as fast as possible.
4. Linux kernel version used on the gateway. The following versions were tested: 2.6.33.7, 2.6.33.7-rt29, 2.6.36.2, 3.0.4 and 3.0.4-rt14.

2.3 Presentation of Results

FIGURE 3: How a latency profile is constructed.

The advantage of using latency profiles is that the worst-case behavior (bottom right part of the graph) is "magnified" by the logarithmic scale. More formally, the properties of the latency profile are as follows: given two points $(t_1, m_1)$ and $(t_2, m_2)$ from a latency profile, where $t_1 < t_2$, we can say that $m_1 - m_2$ frames had a latency in the range $(t_1, t_2)$. Additionally, the rightmost point $(t_w, m_w)$ means that there were exactly $m_w$ frames with the worst-case latency of $t_w$.

2.4 Measurement Precision

We conducted a few experiments to evaluate the precision of measuring the latencies with our setup. First, we measured the total frame latencies by two means: (1) by the PC as described above and (2) by an independent CAN analyzer by Vector⁴. In the other experiments we used only method 1 (PC), as it allows for full automation of the measurement, whereas

³ The hackbench repository is at http://git.kernel.org/?p=linux/kernel/git/tglx/rt-tests.git.
⁴ We used the analyzer called CANalyzer (http://www.vector.com/vi_canalyzer_en.html).
bus to another without any modifications.
FIGURE 5: Influence of Ethernet load generator on measurement precision (no GW involved).

when the RX soft-irq is scheduled, it runs in a loop and tries to process all frames sitting in receive buffers (either in hardware or in software). Figure 7 shows this effect nicely. If we compare the latencies when the CAN traffic was generated with the one-frame-at-a-time and flood methods, it can be seen that in the former case the overhead of scheduling the RX soft-irq is always included (the latency profile starts at 35 µs), whereas in the latter
case, the overhead is reduced. Whenever the gateway receives a frame just as it finishes processing the previous frame, it does not exit the soft-irq and continues processing the new frame. Therefore, the best-case latencies are much lower in that case (the latency of 0 is, of course, caused by measurement inaccuracies). The worst case is about the same in both cases.

FIGURE 7: The effect of batched frame processing. Conditions: GW kernel 2.6.33.7, load: none, payload: 4 bytes.

Figure 8 shows the effect of loading the gateway.

3.4 Frame Filtering

The SocketCAN gateway allows for filtering the frames based on their IDs. There are two kinds of filter implementations. The first implementation (used for all EFF⁵ frames) puts the filtering rules into a linked list. Whenever a frame is received, this list is traversed, and when a match is found, the frame is routed to the requested interface. The second implementation is optimized for matching single SFF⁶ IDs. Since there are only 2048 distinct SFF IDs, a table indexed directly by the ID can be used, so the destination interface is found without traversing a potentially long list.
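The difference between the two lookup strategies can be sketched in C. The structures below are simplified and hypothetical (in the kernel, the receive lists live in the CAN core and hold more state); the point is the O(1) array index for single SFF IDs versus the O(n) list walk for EFF rules. The match test `(id & mask) == (rule_id & mask)` follows the usual SocketCAN filter semantics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SFF_IDS 2048                        /* 11-bit identifiers */

struct eff_rule {                           /* hypothetical list node */
    uint32_t id, mask;
    int dst_if;
    const struct eff_rule *next;
};

struct gw_filters {
    int sff_dst[SFF_IDS];                   /* destination per SFF ID, -1 = not routed */
    const struct eff_rule *eff_list;
};

void gw_filters_init(struct gw_filters *f)
{
    for (int i = 0; i < SFF_IDS; i++)
        f->sff_dst[i] = -1;
    f->eff_list = NULL;
}

/* Returns the destination interface or -1; *steps counts list nodes visited. */
int gw_lookup(const struct gw_filters *f, uint32_t id, int is_eff, int *steps)
{
    *steps = 0;
    if (!is_eff) {
        assert(id < SFF_IDS);
        return f->sff_dst[id];              /* O(1): index by ID */
    }
    for (const struct eff_rule *r = f->eff_list; r != NULL; r = r->next) {
        (*steps)++;                         /* O(n): walk the list */
        if ((id & r->mask) == (r->id & r->mask))
            return r->dst_if;
    }
    return -1;
}
```

The cost of an EFF lookup grows with the number of rules traversed before a match (or with all of them when nothing matches), which is consistent with the text's warning about long lists.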
essence, most of the cost comes from copying the socket buffer before modifying it. The difference between the various modifications is negligible.

FIGURE 11: Different kernels. Conditions: traffic: one frame at a time, load: none, payload: 4 bytes.

Second, the latency of rt-preempt kernels is higher than that of any of the non-rt kernels. This is expected, because the preemptibility of the kernel has its costs.

⁷ The configs of our gateway kernels can be found at https://rtime.felk.cvut.cz/gitweb/can-benchmark.git/tree/HEAD:/kernel/build/shark

FIGURE 13: Kernel-space vs. User-space gateway. Conditions: kernel: 2.6.33.7, load: none, traffic: one frame at a time, payload: 2 bytes.

Figure 13 shows that the time needed to route a frame in user space is about three times longer than with the kernel gateway.
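To make the comparison concrete, a user-space gateway of the kind measured in Figure 13 can be sketched with raw SocketCAN sockets. This is an assumption about its structure, not the authors' tool: it simply reads each frame from one socket and writes it unmodified to another. The helper names open_can_socket and forward_frames are hypothetical.

```c
#include <assert.h>
#include <linux/can.h>
#include <net/if.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a raw SocketCAN socket bound to `ifname` (e.g. "can0"); -1 on error. */
int open_can_socket(const char *ifname)
{
    unsigned int ifindex = if_nametoindex(ifname);
    if (ifindex == 0)
        return -1;                        /* no such interface */
    int s = socket(PF_CAN, SOCK_RAW, CAN_RAW);
    if (s < 0)
        return -1;
    struct sockaddr_can addr;
    memset(&addr, 0, sizeof addr);
    addr.can_family = AF_CAN;
    addr.can_ifindex = (int)ifindex;
    if (bind(s, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(s);
        return -1;
    }
    return s;
}

/* Copy frames from rx to tx unmodified; max_frames < 0 means forever.
 * Returns 0 when done, -1 on a read/write error or short transfer. */
int forward_frames(int rx, int tx, long max_frames)
{
    struct can_frame frame;
    assert(rx >= 0 && tx >= 0);
    for (long i = 0; max_frames < 0 || i < max_frames; i++) {
        if (read(rx, &frame, sizeof frame) != (ssize_t)sizeof frame)
            return -1;
        if (write(tx, &frame, sizeof frame) != (ssize_t)sizeof frame)
            return -1;
    }
    return 0;
}
```

Usage would be `forward_frames(open_can_socket("can0"), open_can_socket("can1"), -1)` after checking both descriptors. Every routed frame passes through a socket receive queue, a wake-up of the user-space process and two system calls, which plausibly accounts for part of the higher latency compared with routing inside the kernel.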
reader is referred to our web site [5].

FIGURE 14: Kernel-space vs. User-space gateway under heavy traffic. Conditions: kernel: 2.6.33.7 (non-rt and rt), load: none, traffic: flood, payload: 2 bytes.

Figure 14 shows the gateway latencies with flood CAN traffic. One can see that the user-space gateway under the non-rt kernel drops some frames (the gap at the top) and exhibits latencies of up to 100 ms. The latencies are caused mainly by queuing frames in the receiving socket queues. Since the user space had no chance to run for a long time, the queue becomes long and eventually some frames are dropped. The kernel simply has "higher priority" than anything in user space. With the -rt kernel, the situation is different. The priorities of both user and kernel threads can be set to (almost) arbitrary values, which allows the latencies of the user-space gateway to be reduced to 2 ms.

3.8 Multihop Routing

In the last experiment, we modified the kernel gateway to allow routing a single frame multiple times via virtual CAN devices. This allows us to split the overall latency into two parts. The first part is the overhead of interrupt handling and soft-irq scheduling, and the second part is the processing of the frame in the CAN subsystem. The latter part can be derived from Figure 15 by looking at the difference between consecutive lines (and dividing it by two). We find that CAN subsystem processing takes about 10 µs. The rest (from ca. 60 µs to 130 µs) is the overhead of the rest of the system.

FIGURE 15: Latencies of multi-hop routing via virtual CAN interfaces.

4 Conclusion

This paper presented the timing analysis of a Linux-based CAN-to-CAN gateway and studied the influence of various factors (like CPU and bus load, kernel versions etc.) on frame latencies. The results indicate that the gateway itself introduces no significant overhead under real-life bus loads and working conditions and can reliably work as part of a distributed embedded system. Our results were used to support merging the gateway into Linux mainline. The gateway should appear in the Linux 3.2 release.

On the other hand, it must be noted that especially excessive Ethernet traffic or improperly constructed frame filters can lead to significant performance penalties and possible frame losses. The CAN subsystem, which forms the core of the examined CAN gateway, is inherently prone to problems under heavy loads, not only on the CAN bus but also on other networking devices, as was already demonstrated in our previous work [6]. Nevertheless, the described gateway is a standard and easy-to-use solution, integrated in the Linux kernel mainline, and therefore represents the framework of choice for most developers.

It was also clearly demonstrated that the kernel-space solution works much better than the user-space solution, and that it can be beneficial to use standard non-rt kernels (provided that the gateway runs in kernel space). This avoids the greater overhead and resulting performance penalty of rt kernels, provided that the standard kernel is properly configured.

Finally, our benchmarks revealed a few problems in -rt kernels. We will investigate these problems in our future work.

References

[1] T. Nolte, H. Hansson, and L. L. Bello, “Automotive communications-past, current and future,” in 10th IEEE Conference on Emerging Technologies and Factory Automation (ETFA), Catania, Italy, 2005, pp. 992–1000.
[2] N. Navet, Y. Song, F. Simonot-Lion, and C. Wilwert, “Trends in automotive communication systems,” Proceedings of the IEEE, vol. 93(6), pp. 1204–1223, 2005.
[3] “The SocketCAN project website,” http://developer.berlios.de/projects/socketcan.
[4] M. Sojka, “CAN Benchmark git repository,” 2010. [Online]. Available: http://rtime.felk.cvut.cz/gitweb/can-benchmark.git
[5] M. Sojka and P. Píša, “Netlink-based CAN-to-CAN gateway timing test results,” 2011. [Online]. Available: http://rtime.felk.cvut.cz/can/benchmark/2.1/
[6] M. Sojka and P. Píša, “Timing analysis of Linux CAN drivers,” in Eleventh Real-Time Linux Workshop. Homagstr. 3 - 5, D-72296 Schopfloch: Open Source Automation Development Lab, 2009, pp. 147–153.
172
Performance Evaluation and Enhancement of Real-Time Linux
Sanjay Ghosh
Industrial Software Systems, ABB Corporate Research
Bhoruka Tech Park, Whitefield Road, Bangalore, India
[email protected]
Pradyumna Sampath
Industrial Communications, ABB Corporate Research
Bhoruka Tech Park, Whitefield Road, Bangalore, India
[email protected]
Abstract
Real-time control applications in industrial control systems have long been trusted to run on specially designed, dedicated embedded hardware such as PLCs and controllers. The essential non-real-time functions of automation systems, such as engineering and HMI, are executed separately on independent hardware. Over the last decade, advances in RTOSes, technologies such as virtualization, and the availability of more powerful COTS hardware have enabled a recent trend in which industrial PCs are proposed to replace the hardware controller units. This paper describes the evaluation of our prototype control system on general-purpose hardware based on Linux rt-preempt. In our setup, the control logic is executed by a runtime control engine alongside the standard engineering framework, standard HMI, data acquisition, EtherCAT based IO and general PC utility applications on a uniprocessor system, without impacting the deterministic performance. The paper discusses in detail our performance evaluation results and the methodology used, in terms of the test setup, the boundary conditions, and the parameters measured under load conditions typical of real industrial applications.
1 Introduction and Background

The automation industry has seen a trend toward control systems based on commodity hardware instead of standard controller hardware. The key advantages of this approach are greater flexibility of configuration and ease of customization. In fact, given their high processing and memory capabilities, PCs can significantly outperform most commercial controllers currently used in industry in terms of the hardware resources available to perform tasks. Therefore, control applications running on a resource-rich PC-based control may consume a smaller share of the total available CPU than on a traditional controller, leaving significant resources to run other essential non-real-time applications. Additionally, PCs are available in more configurations and form factors than controller hardware, which helps meet diverse user needs. However, a commercial PC running a desktop operating system does not guarantee the deterministic hard real-time behavior required for discrete manufacturing. Hard real-time controllers, by their nature, are meant to provide fast, deterministic and repeatable scan times without being affected by background activities of the operating system. Typical discrete manufacturing applications require deterministic, repeatable scan times as short as one millisecond. Therefore the control engineering community, as it moves to PC-based control, expects it to deliver a similar level of performance and reliability. The challenge that remains for PC based control is
Evaluation of embedded virtualization on real-time Linux for industrial control system
on one hand to enable all the benefits discussed above and on the other hand to achieve the reliability required of a controller.

2 Industrial Control on commodity hardware
scarcely needed during operation of a factory automation system. The resource demands of the control engine execution should not starve the other non-real-time tasks, especially the engineering and the HMI.

2.2 Technology alternatives for PC based control

In order to host the control engine, we identified and selected Linux rt-preempt [12] over other available real-time extensions to the Linux kernel, such as RTAI and RT-Linux, mainly based on the requirements of our domain [13]. Another reason for choosing rt-preempt is that many of the features of this patch have already been partly merged into the mainline kernel. It is important to ensure the availability of community support that is sustainable over long product life cycles. The rt-preempt patch implements real-time behavior by allowing nearly the entire kernel to be preempted, with the exception of a few very small regions of code. Further, with the inclusion of high-resolution timers (hrtimers), hard real-time behavior can be achieved in rt-preempt [14]. Thus, the control engine in the PC based control is run as a user-space task on Linux rt-preempt with appropriate real-time priorities. In order to achieve co-existence of real-time and non-real-time tasks, a significant degree of isolation in terms of memory space is required. The Linux rt-preempt kernel, with its implementation of a virtual memory model, provides this feature together with real-time responses. Combining virtualization and real-time enables several use cases for embedded systems. Full virtualization options (such as Intel VT-x) are now commonly available in commodity hardware with the x86 [10] architecture.

We also selected the Kernel Based Virtual Machine (KVM) [15], [16], which utilizes the hardware virtualization extensions, to enable virtualization in our PC based control prototype. Since Linux 2.6.20, KVM, an active open source project with a strong developer community, has been part of the mainline kernel as a kernel module. KVM runs unmodified guest operating systems on the host OS, providing each virtual machine with its own private virtualized hardware: a network card, disk, graphics adapter, etc. It relies on the host OS for tasks such as scheduling, native interrupt handling, memory management and hardware management, using a kernel-space device driver (/dev/kvm), and is hence categorized as a type-2 hypervisor. It uses a user-space component, QEMU [17], for all device emulation. In addition to the default kernel and user modes, KVM adds an operating mode, called guest mode in Linux, which in turn has its own kernel and user modes [17]. The resulting guest VM's physical memory can be mapped to the virtual memory of the host hypervisor. Because each VM guest runs as a corresponding host process, standard settings such as priority and affinity can be configured for that process in order to flexibly influence the scheduling of virtual machines at runtime [2]. This is another advantage that PC based control using hardware-assisted virtualization on a native operating system has over traditional control.

2.3 Prototype PC based control

Based on the requirements for the PC based control mentioned in Section 2.1 and the technology solution discussed in Section 2.2, we built a prototype PC based control. For the control engine, we selected one of the control engines that support the Linux operating system on x86 hardware. The programming IDE runs on the Windows guest OS, is based on the open international standard IEC 61131, and is supplied by the same vendor. The control engine also supports visualization on the device, where the HMI is executed as a user-space process on the host operating system. The control engine executes with a high real-time priority on the Linux rt-preempt based host, while the engineering and other PC utility applications are executed inside one or more Windows based guests with comparatively lower process priorities. The execution model of the prototype PC based control is shown in Figure 1.

FIGURE 1: Execution Model

Figure 2 below shows the communication model of the prototype PC based control. The network configuration between the host operating system (Linux rt-preempt) and the guest operating system (Windows) uses the local Ethernet based soft bridge
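Sections 2.2 and 2.3 state that the control engine runs as a user-space task with a high real-time priority on the rt-preempt host, while each KVM guest is an ordinary host process whose priority and affinity can be adjusted. The C sketch below shows one common way to express both settings with standard Linux calls (`mlockall`, `sched_setscheduler`, `setpriority`, `sched_setaffinity`). It is an illustration under assumptions, not the prototype's actual configuration: the helper names, the priority 80 and the nice value 19 are hypothetical, and on Linux these calls act on a single thread ID, so for a QEMU guest they would be applied to each vCPU thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <unistd.h>

/* Clamp a requested priority into the SCHED_FIFO range reported by the
 * kernel (1..99 on Linux). */
int clamp_fifo_priority(int prio)
{
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);
    if (prio < lo)
        return lo;
    if (prio > hi)
        return hi;
    return prio;
}

/* Control engine side (hypothetical helper): lock current and future pages
 * so that a cycle never waits for a page fault, then switch the calling
 * thread to SCHED_FIFO. Returns 0 or the errno of the failing call
 * (typically EPERM or ENOMEM without the required privileges or limits). */
int make_control_engine_realtime(int prio)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        return errno;
    struct sched_param sp = { .sched_priority = clamp_fifo_priority(prio) };
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        return errno;
    return 0;
}

/* Guest side (hypothetical helper): assumes the guest's vCPU thread runs
 * under the default SCHED_OTHER policy. A high nice value gives it a low
 * share of the CPU, and the CPU set restricts where it may run; the
 * SCHED_FIFO control engine always preempts it. */
int demote_guest_process(pid_t tid, int nice_value, const cpu_set_t *cpus)
{
    if (setpriority(PRIO_PROCESS, (id_t)tid, nice_value) != 0)
        return errno;
    if (cpus != NULL && sched_setaffinity(tid, sizeof(*cpus), cpus) != 0)
        return errno;
    return 0;
}
```

For example, the control engine would call `make_control_engine_realtime(80)` once at startup (this needs CAP_SYS_NICE and CAP_IPC_LOCK or a sufficient RLIMIT_MEMLOCK), while a management process would call `demote_guest_process()` on the guest's thread IDs.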
the standard guidelines from PLCopen. These scripts are written in the IEC 61131 Structured Text language [19] and are used as standard programs to benchmark the performance of PLCs. The application engineering interface used in our prototype allows application programming in the standard IEC 61131 languages. There are two types of benchmarking scripts: application-oriented tests and language-oriented tests. Application-oriented benchmarks measure the whole control cycle, i.e. from receiving an input signal, through the internal processing, to writing an output signal. They comprise different types of applications, and mixtures of them, that are typically used in factory automation. Language-oriented benchmarks evaluate the computational performance of a controller while executing all the available language constructs of the IEC 61131-3 languages. We evaluated our prototype PC based control using seven of the benchmarking programs, including both application-oriented and language-oriented tests. Very similar results were observed for all of these tests. For the sake of conciseness, in this paper we present only the results of one of the experiments, i.e. the language-oriented test for control statements. This test evaluates the performance of the PC based control when executing one thousand instances of different control statements, e.g. IF, CASE, FOR, WHILE, etc. The repetition is coded inside the test project scripts using loops. All experiments were performed for more than two hours, with the control engine continuously executing the test program under the different test conditions. The application engineering tool used in the prototype also allows developing visualization applications. The visualization program executes on the control engine along with the application test program, but with an execution cycle time typically two orders of magnitude larger than that of the control application. We created a simple visualization program that shows six visualization objects on the screen, linked to six different variables (i.e. six OPC tags) in the application test program so that their status is monitored and updated.

When evaluating the performance of any real-time system, it is usual practice to load or stress the system under test in order to observe its performance under such conditions. To stress the Linux rt-preempt host, we ran the open source tool ‘Stress’ 1.0.4 [20] as a background process, which imposes a configurable amount of CPU, memory, I/O, and disk stress on the system. An example of the command line used for executing the stress program is as follows:

stress --cpu <c> --io <i> --vm <m> --timeout <t> &

Similarly, in order to load or stress the Windows based guest VMs, the Windows freeware HeavyLoad 3.0.0.159 [21] was used. To stress the system resources, HeavyLoad writes a large test file to the temp folder, allocates physical and virtual memory, and draws patterns in its window.

3.3 Evaluation Parameters

For the performance evaluation of the PC based control, we mainly focus on measuring the upper bound, i.e. the worst-case latency, of the cyclic execution of the control application. The key to appropriate performance evaluation is accurately measuring the cycle time. The actual cycle time is the time span between the start and the end of a test cycle, excluding possible overheads such as task startup and IO access time. However, system-specific overheads (like the timer tick and the task scheduler) are included in the cycle time measurement by default. Therefore, the time measurements are instrumented within the application test program. In every iteration, at the start of the operations, the current system timestamp is stored in a variable. Then the operations are performed as per the application logic. Right at the end of all operations, the system time is read again and the time elapsed while executing the operations is calculated. This is termed the Execution Cycle Time, or more commonly the Cycle Time. Based on these per-cycle measurements, the average, minimum and maximum execution cycle times are calculated. Another measurement parameter of interest is the Jitter in the execution of the control logic. This is the measure of how early or how late the execution cycle starts with reference to the desired start time. Prior to performing the actual performance evaluation experiments, as a first step, it is also important to measure the processing capability of the control system. To estimate the processing capability of the control system for the particular control logic, we execute the test control program with different interval cycle times and the watchdog enabled. The watchdog in this context is a monitor built into the runtime engine which raises an exception when the actual execution time of the IEC 61131 application exceeds the designated interval time. The smallest interval at which the watchdog does not trigger is taken as the minimum possible interval cycle time of the system under test for that particular control application. Further, for all the performance evaluation experiments on the system using that test control application, the interval cycle time is configured to be the minimum
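The measurement scheme of Section 3.3 (a timestamp at the start and at the end of each cycle, minimum/average/maximum execution cycle time, and start jitter relative to the desired release time) can be illustrated with a small POSIX C cyclic task. This is a hypothetical sketch, not the instrumentation inside the IEC 61131 test program: it assumes `CLOCK_MONOTONIC` with absolute-time `clock_nanosleep`, the names are illustrative, and the `overruns` counter only mimics the runtime engine's watchdog by flagging cycles whose execution time exceeds the interval.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Per-run statistics (hypothetical layout). */
struct cycle_stats {
    long cycles;
    long overruns;          /* cycles whose execution time exceeded the interval */
    int64_t exec_min_ns;
    int64_t exec_max_ns;
    int64_t exec_sum_ns;    /* average = exec_sum_ns / cycles */
    int64_t jitter_max_ns;  /* largest |actual start - desired start| */
};

static int64_t ts_to_ns(const struct timespec *t)
{
    return (int64_t)t->tv_sec * NSEC_PER_SEC + t->tv_nsec;
}

static struct timespec ns_to_ts(int64_t ns)
{
    struct timespec t;
    t.tv_sec = (time_t)(ns / NSEC_PER_SEC);
    t.tv_nsec = (long)(ns % NSEC_PER_SEC);
    return t;
}

/* Placeholder for the control logic executed in one cycle. */
void sample_control_logic(void)
{
    volatile unsigned acc = 0;
    for (unsigned i = 0; i < 1000; i++)
        acc += i;
}

/* Run `work` n times with a fixed interval of period_ns nanoseconds. */
void run_cycles(long n, int64_t period_ns, void (*work)(void), struct cycle_stats *s)
{
    struct timespec now, t0, t1, wake;
    *s = (struct cycle_stats){ .exec_min_ns = INT64_MAX };

    clock_gettime(CLOCK_MONOTONIC, &now);
    int64_t release = ts_to_ns(&now) + period_ns;  /* desired start of cycle 1 */

    for (long i = 0; i < n; i++) {
        wake = ns_to_ts(release);
        int rc;
        do {  /* sleep until the absolute release time; retry after signals */
            rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &wake, NULL);
        } while (rc == EINTR);

        clock_gettime(CLOCK_MONOTONIC, &t0);       /* start timestamp */
        work();
        clock_gettime(CLOCK_MONOTONIC, &t1);       /* end timestamp */

        int64_t exec = ts_to_ns(&t1) - ts_to_ns(&t0);
        int64_t jitter = ts_to_ns(&t0) - release;
        if (jitter < 0)
            jitter = -jitter;

        s->cycles++;
        s->exec_sum_ns += exec;
        if (exec < s->exec_min_ns) s->exec_min_ns = exec;
        if (exec > s->exec_max_ns) s->exec_max_ns = exec;
        if (jitter > s->jitter_max_ns) s->jitter_max_ns = jitter;
        if (exec > period_ns) s->overruns++;       /* what a watchdog would flag */

        release += period_ns;  /* fixed release grid, independent of wake-up lateness */
    }
}
```

Calling `run_cycles(1000, 1000000, sample_control_logic, &s)` would run a 1 ms cycle for about one second; the average execution cycle time is then `s.exec_sum_ns / s.cycles`. Because the wake-up uses absolute deadlines, a late cycle does not shift the release times of the following cycles.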
where the necessary non-real-time tasks also run alongside these real-time processes, smooth execution of the control application was found to be possible only for interval cycle times of 500 µs and above. However, typical factory automation applications require scan times as short as one millisecond. In both of the above mentioned scenarios, it was observed that the average execution time the system requires for one cycle of the test control logic was less than 225 µs, i.e. below even 50% of the interval cycle time. As the control engine and the IO communication run as high real-time priority processes, these tasks are never preempted by other non-real-time tasks during their execution. It is during the remaining slack time that the non-real-time processes are scheduled, if required.

FIGURE 3: Execution Model

4.2 Test 2: Latency and jitter evaluation

This experiment evaluates how reliably the PC based control prototype runs the control applications and the real-time IO communication in the presence of the necessary non-real-time tasks running on the same hardware. For this experiment, we executed the test control application on the PC based prototype under seven different test conditions, identified as the setup configurations C1, C2, ..., C7. These configurations are defined by the tasks being run on the system:

• Base Configuration (Base): Control engine + IO communication + Visualization on host + Windows VM1 running application engineering tool
• C1: Base
• C2: Base + Low stress on host
• C3: Base + Moderate stress on host
• C4: Base + Heavy stress on host
• C5: Base + Very Heavy stress on host
• C6: Base + Very Heavy stress on host + Max possible load on Windows VM1
• C7: Base + Very Heavy stress on host + Max possible load on Windows VM1 + Windows VM2 running text processing application

Table 3 below presents the measurements of the execution cycle times and jitter for the test control application and the scan times for the EtherCAT based output observed using an oscilloscope.

The measurements presented in Table 3 show that the PC based control was able to accommodate the execution of typical non-real-time tasks required in a control system on the same hardware without compromising the real-time guarantees of the control applications. Further, it was observed that even when heavy loads were deliberately applied to the host system as well as to the guest systems, there is only a negligible change in the real-time execution behavior of the control application.

In one of the scenarios, the reliability of the isolation between the Linux rt-preempt host and the Windows based guest partitions was also evaluated. Even during deliberate crashing or rebooting of the Windows guests, the control engine and the real-time IO continued to perform unaffected in terms of cycle times and jitter.

5 Conclusion and Future Work

Our experiments with the prototype PC based control device have demonstrated that it is possible to achieve deterministic responses using Linux rt-preempt along with KVM as the host RTOS. Irrespective of the concurrently running application load on the Windows guests, the deterministic behavior of the user-space applications running on the rt-preempt kernel is not affected. The choice of real-time tasks and their priorities must still be carefully managed, and the host and guest must still follow the traditional separation of concerns, i.e. real-time and non-real-time respectively. The results of the tests performed also indicate that such a system may be conceivable for certain industrial application domains, but may be inappropriate for applications
which demand more stringent real-time constraints (such as closed-loop motion control).

Going forward, we believe that there is potential for further work in this area. One such activity might involve the confluence of multi-core, virtualization and real-time. Comparative studies of SMP virtualization and AMP virtualization for real-time systems are an area where industry and academia might see benefit.

References

[3] Peter Wurmsdobler, “Slower is easier”, Industrial Computing, pp. 49-51, Nov 2001.
[4] Kushal Koolwal, “Investigating latency effects of the Linux real-time Preemption Patches (PREEMPT_RT) on AMD's GEODE LX Platform”, RTLWS11, 2009.
[5] A. Heursch, D. Grambow, A. Horstkotte, and H. Rzehak, “Steps towards a fully preemptable Linux kernel”, Proceedings of the 27th IFAC/IFIP/IEEE Workshop on Real-Time Programming, May 2003.
[6] Carsten Emde, “Long-term monitoring of apparent latency in PREEMPT_RT Linux real-time systems”, RTLWS12, 2010.
[7] Henning Schild, Adam Lackorzynski, and Alexander Warg, “Faithful Virtualization on a Real-Time Operating System”, RTLWS11, 2009.
[8] Baojing Zuo et al., “Performance Tuning Towards a KVM-based Low Latency Virtualization System”, Proceedings of 2nd International Conference on Information Engineering and Computer Science (ICIECS), 2010.
[9] Robert Kaiser, Stephan Wagner, and Alexander Zuepke, “Safe and Cooperative Coexistence of a SoftPLC and Linux”, Sysgo AG white paper, 2007.
[10] Intel Corporation, “Intel 64 and IA-32 Architectures Software Developer's Manual”, Vol. 3B, pp. 3-8, March 2010.
[11] Gernot Heiser, “The role of virtualization in embedded systems”, Proceedings of the 1st workshop on Isolation and integration in embedded systems, pp. 11-16, April 2008.
[14] Steven Rostedt and Darren Hart, “Internals of the RT Patch”, Proceedings of the Linux Symposium, 2007.
[15] Kernel Based Virtual Machine - http://www.linux-kvm.org/
[16] Avi Kivity et al., “KVM: The Linux Virtual Machine Monitor”, Proceedings of the Linux Symposium, 2007.
[17] QEMU - open source processor emulator - http://www.qemu.org
[18] PLCopen Benchmark home page - http://www.plcopen.org/pages/tc3_certification/benchmarking/index.htm
[19] IEC 61131 “Programmable controllers - Part 3: Programming languages”, ed. 2.0, 2003.
[20] Stress project home page - http://weather.ou.edu/~apw/projects/stress/
[21] HeavyLoad download page - http://www.softpedia.com/get/System/Benchmarks/HeavyLoad.shtml
Real-Time Linux Concepts
Abstract
Lock- and wait-free data structures can be constructed in a generic way. However, when complex operations are involved, their practical use is rather limited due to high performance overheads and, in some settings, object lifecycles that are difficult to fulfil.
While working on a synchronous inter-processor communication (IPC) path for multicore systems, we stumbled upon a clever piece of code that fulfilled most of the properties that this path requires of its send queue. Unfortunately, this piece of code was neither a data-structure publication nor in any way related to send queues. Reporting on our experience in translating Krieger's MCS-style reader-writer lock into a send queue for cross-processor IPC, we would like to make the point that, sometimes, searching for code can uncover a valuable treasure chest, even in largely different areas.
Turning Krieger's MCS Lock into a Send Queue
functionality that we required. In the following, we summarize the key requirements of send queues, motivate our choice of a non-blocking implementation and highlight the difficulties of using non-blocking code in the kernel. Then we briefly outline the original reader-writer lock by Krieger and present our modifications to turn it into a send queue. We evaluate our results and relate them to others, before introducing our idea of a treasure chest of non-blocking building blocks.

2 Send Queue Requirements

The primary purpose of a send queue is to order incoming requests to prevent sender starvation. In processor-local IPC paths, the order of send operations is governed by the sequence in which threads are picked by the scheduler. This completely eliminates the need for a send queue. For example, with time and priority inheritance, servers may receive time from the currently highest-prioritized thread blocked on them. Using this time, they can complete potentially pending requests in order to process the request of the time provider [6]. The crucial feature ensuring this ordering is the ability to donate time to other threads, an operation which is easily done locally but very hard to apply across processor boundaries. Therefore, for cross-processor IPC, the ordering of incoming requests needs another mechanism: a send queue.

When designing IPC primitives, multiple, partially contradicting targets, like performance and flexibility, need close attention. Individual send and receive operations allow for a high degree of freedom, but add complexity. Threads may invoke multiple servers by sending multiple messages before entering a receive state, or they may receive from another client before replying to a previous one. On the other hand, allowing only calls (i.e., atomic send and receive operations) to invoke servers and restricting servers to reply only to the last caller, possibly after calling other servers in the course of handling the caller's request, limits the flexibility of IPC but simplifies synchronization and is sufficient for most use cases. Processor-local versions of such a call-reply style IPC path can be very fast because they do not need any form of synchronization [5].

The operations required of a send queue are enqueuing to the tail of the list and dequeuing the head when replying. Depending on whether these operations are used to block threads only in the case of a contended server or in every IPC, enqueuing into an empty list and dequeuing the head must be fast.

In addition, send queues should also support a dequeue operation from the middle of the list, allowing threads to cancel their not-yet-started IPC. Situations where callers have to abort an IPC include the deletion of a thread and timeouts (e.g., set to react to certain error situations).

One of the key benefits of a synchronous IPC path is the possibility to preallocate all message buffers to avoid memory allocation during the IPC. The immediate consequence for the send queue is that all operations have to operate on preallocated memory as well. In particular, once a call is finished, the caller's message buffer and all metadata used during the IPC must be ready for use in subsequent IPC calls. Especially for generic constructions of lock-free data structures, these properties are difficult to fulfil. Still, non-blocking implementations have their benefits: they typically scale to higher CPU numbers than lock-based variants and, in our case, the effort and overhead required for a fair lock protecting the send queue are of the same order as the effort and overhead of a mostly lock-free send queue.

3 Krieger's MCS-style Reader Writer Lock

By swinging a single pointer to the tail of a list, MCS locks implicitly arrange lock-acquiring threads in FIFO order. The thread at the head of the list becomes the lock holder either when it enqueues into an empty queue (i.e., tail = 0 prior to the enqueue operation) or when the previous lock holder releases the lock by clearing a field on which the next thread in the list spins.

FIGURE 1: Data structure and dequeue operation from Krieger's MCS lock

Eliminating contention on the active reader counter of Mellor-Crummey and Scott's original reader-writer lock [8], Krieger et al. [7] introduce a second pointer to each queue element to enqueue readers into a lazily updated doubly-linked list. Finished readers dequeue from the middle of the list using spin locks that are only taken for the purpose of
protecting neighboring list nodes during a dequeue. The last active reader releases a subsequent writer, if present. Figure 1 illustrates this algorithm and the data structures used.

The key features of Krieger's MCS lock that make it a perfect starting point for send queues are:

1. an implicit FIFO ordering by threads atomically swinging the tail pointer to their list element and lazily enqueuing afterwards;
2. an extremely fast enqueue operation for the uncontended case (essentially just an atomic swap of the tail pointer);
3. in most situations, an extremely fast dequeue operation of the head element (essentially only the release of the next thread and an atomic compare-and-swap if the queue is empty afterwards); and,
4. the possibility to dequeue from the middle of the queue.

4 Turning Krieger's Lock into a Send Queue

Our goal is to translate Krieger's MCS lock into a mostly lock-free send queue for a call-reply style IPC path. In general, there are two principal ways to use such a send queue: enqueue callers only if the callee is contended, or enqueue all callers that are sending or waiting to send to the callee. In the first case, exclusive control over the callee must be taken to determine whether it is ready to receive new requests. If not, the caller blocks in the send queue while waiting for the callee to be ready to receive its request. Otherwise, the caller (in a caller-driven implementation) or the callee (in a callee-driven implementation) starts transferring the message. However, because locking the callee to obtain exclusive control ideally involves a fair lock that facilitates local spinning, we expect comparable costs for locking the callee and for enqueuing into an MCS-style send queue.

To avoid the additional overhead of locking the callee in the uncontended case (i.e., when the callee is already receiving), we maintain the following invariant:

Invariant I: A callee with an empty send queue is always receiving and implicitly locked by the first thread entering the send queue.

The first part of Invariant I is ensured automatically by the call-reply style IPC path as long as ongoing requests are not aborted (Section 4.1 below discusses how the spirit of this invariant can be maintained in case of aborts). The second part is ensured by leaving the callee locked in situations where the last thread dequeues itself from the send queue.

    type SQ_Item = record
        next : ^SQ_Item
        pred : ^SQ_Item
        lock : precedence_spinlock
    type Status = enum {EMPTY, NOT_EMPTY, OTHER}
    type send_queue = class
        head : ^SQ_Item
        tail : ^SQ_Item
        method enqueue(I : ^SQ_Item) : Status
            pred : ^SQ_Item := swap(tail, I)
            if pred = nil
                head := I
                release_precedence(I->lock)
                return EMPTY
            I->pred := pred
            // store fence
            pred->next := I
            release_clear_precedence(I->lock)
            return NOT_EMPTY

FIGURE 2: Types and enqueue operation

Figure 2 shows the pseudocode for enqueuing into the send queue². As in MCS locks, the function enqueue starts by atomically swinging the tail pointer of the send queue to the list element of the invoking caller and remembering the old value of tail. If this old value is nil, the queue was empty and the function may return after updating the head pointer. Otherwise, enqueue first sets the predecessor pointer of its element before completing the enqueue operation by updating the predecessor's next pointer. As pointed out by Krieger, this write sequence prevents a race with a concurrent dequeue operation from the middle of the list and may require an additional fence. The role of the precedence spinlock and the validity of head will be discussed after we have introduced the two dequeue functions.

Krieger's MCS lock does not distinguish between dequeuing from the middle of the list and dequeuing the head element, since finished readers have to retract from the queue lock no matter where they are. For a send queue, however, the former operation is only invoked when a thread cancels its IPC, whereas dequeuing the head is used in every reply to a caller, thereby completing the IPC.
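To make the enqueue operation of Figure 2 and the head dequeue summarized in item 3 of the key-feature list above more concrete, the following C11 sketch uses `<stdatomic.h>`: `atomic_exchange` swings the tail pointer, and a compare-and-swap on `tail` detects that the queue has become empty. It is a simplified, hypothetical illustration and not the authors' kernel code: the precedence spinlock, interrupt disabling and the cancel path are omitted, and all names (`sq_item`, `sq_enqueue`, `sq_dequeue_head`, ...) are invented for this sketch. As in the text, `head` is deliberately left stale when the queue becomes empty.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical counterpart of SQ_Item (precedence spinlock omitted). */
typedef struct sq_item {
    struct sq_item *_Atomic next;
    struct sq_item *_Atomic pred;
} sq_item;

typedef enum { SQ_EMPTY, SQ_NOT_EMPTY, SQ_OTHER } sq_status;

typedef struct {
    sq_item *_Atomic head;  /* valid only for the owner of the callee (Invariant II) */
    sq_item *_Atomic tail;
} send_queue;

void sq_init(send_queue *q)
{
    atomic_init(&q->head, NULL);
    atomic_init(&q->tail, NULL);
}

sq_status sq_enqueue(send_queue *q, sq_item *item)
{
    /* The element is not yet visible to other threads, so plain resets suffice. */
    atomic_store_explicit(&item->next, NULL, memory_order_relaxed);
    atomic_store_explicit(&item->pred, NULL, memory_order_relaxed);

    /* A single atomic swap of the tail fixes our FIFO position. */
    sq_item *pred = atomic_exchange_explicit(&q->tail, item, memory_order_acq_rel);
    if (pred == NULL) {
        /* Queue was empty: we are the new head. */
        atomic_store_explicit(&q->head, item, memory_order_release);
        return SQ_EMPTY;
    }
    /* Set our pred pointer first, then link ourselves into the predecessor.
     * The release store plays the role of the store fence in Figure 2. */
    atomic_store_explicit(&item->pred, pred, memory_order_relaxed);
    atomic_store_explicit(&pred->next, item, memory_order_release);
    return SQ_NOT_EMPTY;
}

sq_status sq_dequeue_head(send_queue *q, sq_item *item)
{
    sq_item *next = atomic_load_explicit(&item->next, memory_order_acquire);
    if (next == NULL) {
        sq_item *expected = item;
        /* If we are still the tail, the queue becomes empty; head stays stale. */
        if (atomic_compare_exchange_strong_explicit(&q->tail, &expected, NULL,
                                                    memory_order_acq_rel,
                                                    memory_order_acquire))
            return SQ_EMPTY;
        /* An enqueuer swapped the tail but has not linked itself yet. In the
         * kernel this wait is bounded (interrupts disabled); here it is not. */
        while ((next = atomic_load_explicit(&item->next, memory_order_acquire)) == NULL)
            ;
    }
    atomic_store_explicit(&next->pred, NULL, memory_order_relaxed);
    atomic_store_explicit(&q->head, next, memory_order_release);
    atomic_store_explicit(&item->next, NULL, memory_order_relaxed);
    return SQ_NOT_EMPTY;
}
```

Because every element is caller-provided, no memory is allocated on either path, which matches the preallocation requirement from Section 2.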
2 For a better comparison, we adopt the pseudocode introduced by Mellor-Crummey and Scott [8], which is also used in
method dequeue_head(I : ^SQ_Item) : Status
  if acquire_precedence(I->lock) = false
    return OTHER
  if I->next = nil
    if !compare_swap(tail, I, nil)
      repeat while I->next = nil
  next : ^SQ_Item := I->next
  if next != nil
    next->pred := nil
    head := next
    I->next := nil
    return NOT_EMPTY
  I->next := nil
  return EMPTY

FIGURE 3: Dequeue head operation

Figure 3 shows the pseudocode for dequeueing the head element. Again we defer the discussion of the precedence spin lock to Section 4.1. Like in all MCS-style locks, a nil next pointer can indicate one of two situations: either the dequeue operation is about to remove the last thread from the list (in this case, tail = I holds), or a thread is about to enqueue into the list but did not yet manage to update the predecessor's next pointer. The atomic compare-and-swap checks which of the two states the list is in and clears tail in the first case. Otherwise, the dequeuing thread spins until the next pointer is set. This spinning is bounded because enqueue executes in the kernel with interrupts disabled. After returning from this loop, either the list is empty or the next pointer is set and the dequeuing thread can update the head pointer to the corresponding thread, which is now the new head of the list.

Notice that the head pointer is not always current. In particular, dequeue_head does not update head when the list becomes empty. The invariant that maintains correctness of this implementation is the following:

Invariant II: Head is valid only for the thread that is under exclusive control of the callee.

An immediate consequence of this invariant is that threads may not poll head until they reach the front of the send queue. The following sequence illustrates this race:

1. Thread A enqueues and dequeues from the list. Assuming the list was empty, head now refers to A because dequeue_head will not clear the head pointer, as doing so would race with B's concurrent enqueue operation.

2. Thread B starts enqueuing itself but is delayed after updating the tail pointer.

3. Thread A returns, enqueues itself to the list and finds itself to be the head because B did not yet update this pointer after Step 1. If now a thread C enqueues itself and both A and B dequeue themselves using dequeue_head, A will set C's predecessor to nil, but B's later dequeue will set the already departed A as the new head.

Although the list structure itself is maintained, any head-dependent action by C will now block forever or work on the wrong head A. For call-reply style IPC paths, the restriction that Invariant II imposes on the use of the head pointer is no problem, because head is used only by the thread having exclusive control of the callee and in one of the following two situations: (1) to pull in the next message after a reply; and (2) to identify the thread to reply to. In a callee-driven implementation, the callee is active anyway when these situations occur. In a caller-driven implementation, the exclusive owner of the callee takes over control of the thread at the head of the send queue to push its message to the receiver.

4.1 Cancel

method dequeue_middle(I : ^SQ_Item) : Status
  pred : ^SQ_Item = I->pred
  // chase and lock pred
  repeat while pred != nil
    if try_acquire(pred->lock)
      if pred = I->pred
        break
      release(pred->lock)
    pred := I->pred
  if pred
    acquire_precedence(I->lock)
    pred->next := nil
    if I->next = nil
      if !compare_swap(tail, I, pred)
        repeat while I->next = nil
    next : ^SQ_Item := I->next
    if next != nil
      next->pred := pred
      pred->next := next
    release(pred->lock)
    I->pred := nil
    I->next := nil
  return dequeue_head(I)

FIGURE 4: Dequeue middle operation

Figure 4 shows the pseudocode for dequeuing threads from the middle of the send queue, which is required for canceling waiting threads. Except for the precedence spinlocks, the above code directly resembles Krieger's reader unlock operation. The first while loop chases the predecessor pointer of the dequeuing caller's list element to lock it for the subsequent dequeue operation. In our code base, this loop is preemption reactive in the sense that it will abort
Real-Time Linux Concepts
the dequeue operation if a preemption is pending. We have omitted this test for reasons of simplicity.

Having locked the predecessor, the caller acquires the lock of its own list element to prevent concurrent dequeues from modifying the prev and next pointers. Like in all MCS-style locks, compare-and-swap is used to update the tail pointer in situations where the tail element of the list is dequeued. Otherwise, the dequeue operation waits for a pending enqueue operation to update the next pointer of the to-be-dequeued element. Together with enqueue and dequeue_head, dequeue_middle maintains the invariant that dequeued list elements are always precedence locked. In the following we describe the role of these precedence locks in greater detail.

The primary purpose of the spin lock is to protect the prev and next pointers from concurrent dequeues. In our version, we grant dequeue_head precedence over concurrent dequeue operations from the middle of the list, which are invoked as part of an IPC cancel operation and, unlike dequeue_head, are not [...] dequeuing from the middle can obtain the lock only if it is free and the precedence bit is clear. There[...]

[...] threads and their corresponding send queue items may not be deallocated immediately upon their destruction. Instead, we reuse a read-copy update (RCU) like deferred destruction scheme [9] that was already available in Nova [5].

Although a callee is not necessarily receiving, the spirit of Invariant I holds trivially in a callee-driven implementation because a caller enqueuing into an empty list will simply activate the callee no matter what state it is in. After a cancel, this may result in the callee completing its prior operation or performing some cleanup before entering the receive state, during which it will pull the caller's message.

4.2 Evaluation

[Plot: enqueue + dequeue time in cycles (axis 3000 to 6000) for single core, single socket and cross socket; caption not recoverable]
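The head-dequeue logic of Figure 3 can be sketched with C11 atomics as follows. This is a simplified illustration under stated assumptions, not the authors' Nova code: it omits the precedence spin locks, dequeue_middle and preemption handling, and the names sq_item, send_queue, sq_init, sq_enqueue and sq_dequeue_head are hypothetical. Like the pseudocode, it uses compare-and-swap on tail to tell "I am the last element" apart from "an enqueuer has swapped tail but not yet linked itself", and it deliberately leaves head stale when the list becomes empty.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical send-queue item: only the links used by dequeue_head. */
typedef struct sq_item {
    _Atomic(struct sq_item *) next;
    struct sq_item *pred;
} sq_item;

typedef enum { SQ_EMPTY, SQ_NOT_EMPTY } sq_status;

typedef struct {
    _Atomic(sq_item *) tail;
    sq_item *head;   /* only trustworthy for the callee's owner (Invariant II) */
} send_queue;

static void sq_init(send_queue *q)
{
    atomic_init(&q->tail, NULL);
    q->head = NULL;
}

/* MCS-style enqueue: swap ourselves in as the new tail, then link the
   predecessor. Between the two steps a dequeuer can see pred->next == NULL. */
static void sq_enqueue(send_queue *q, sq_item *I)
{
    atomic_init(&I->next, NULL);          /* I is not yet published */
    sq_item *pred = atomic_exchange(&q->tail, I);
    I->pred = pred;
    if (pred == NULL)
        q->head = I;                      /* queue was empty: I becomes head */
    else
        atomic_store(&pred->next, I);
}

/* Figure 3 without the precedence lock; I must currently be the head. */
static sq_status sq_dequeue_head(send_queue *q, sq_item *I)
{
    if (atomic_load(&I->next) == NULL) {
        sq_item *expected = I;
        /* tail == I: I is the last element, so the queue becomes empty. */
        if (atomic_compare_exchange_strong(&q->tail, &expected, NULL))
            return SQ_EMPTY;              /* head intentionally left at I */
        /* An enqueuer already swapped tail; wait until it links itself
           (bounded in the paper because enqueue runs with IRQs disabled). */
        while (atomic_load(&I->next) == NULL)
            ;
    }
    sq_item *next = atomic_load(&I->next);
    next->pred = NULL;
    q->head = next;
    atomic_store(&I->next, NULL);
    return SQ_NOT_EMPTY;
}
```

After the last element leaves, head still points at it; this is exactly the staleness that Invariant II tolerates, since only the thread controlling the callee ever reads head.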
Jianping Shen
Institut Dr. Foerster GmbH und Co. KG
In Laisen 70, 72766, Reutlingen, Germany
[email protected]
Michael Hamal
Institut Dr. Foerster GmbH und Co. KG
In Laisen 70, 72766, Reutlingen, Germany
[email protected]
Sven Ganzenmüller
Institut Dr. Foerster GmbH und Co. KG
In Laisen 70, 72766, Reutlingen, Germany
[email protected]
Abstract
Dynamic memory allocation is used in many real-time systems. In such systems there are many objects that are referenced by different threads. Their number and lifetime are unpredictable, so they should be allocated and deallocated dynamically. Heap operations, however, conflict with the main demand of real-time systems that all operations in high-priority threads must be deterministic. In this paper we provide a generic solution, a combination of the memory pool pattern with a shared pointer, which meets both requirements: high system reliability through automatic memory deallocation, and deterministic execution time by avoiding heap operations.
DYNAMIC MEMORY ALLOCATION ON REAL-TIME LINUX
2.1 Execution Time of Memory Allocation
[...] ordered double linked lists (shown in figure 2).
We put 3.1 and 3.2 together and provide here the final solution.
1. The memory pool preallocates memory for the user data and its reference counter.
This solution works without memory allocation at runtime. The process and the shared pointer both use preallocated memory from the memory pool. In our approach we call the shared pointer RtSharedPtr.
$t_p$ = execution time for a memory acquisition at a memory pool
$t_m$ = execution time of mutex lock + unlock
$thread_{max}$ = maximum thread number
$t_{all}$ = complete latency for a memory acquisition
Worst case: full concurrent access, all threads access the memory pool at the same time.
$t_{all,max} = (t_p + t_m) \cdot thread_{max}$
The mutex protection enforces that all parallel memory pool accesses are serialised. This could be a performance bottleneck. A workaround is to create more memory pools to improve parallel memory acquisition.
We will reach the maximal performance. The execution time is minimal and constant.
$t_{all} = t_p$
Obviously, in a single-threaded environment we should create the memory pool as thread-local to get the best performance.
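A worked instance of the two latency expressions may help to see the gap between them. The timings below are hypothetical illustration values ($t_p = 2\,\mu s$, $t_m = 0.5\,\mu s$, $thread_{max} = 8$), not measurements from this paper, and $t_{all} = t_p$ is read here as the case without any mutex cost:

```latex
\begin{aligned}
t_{all,max} &= (t_p + t_m)\cdot thread_{max} = (2\,\mu s + 0.5\,\mu s)\cdot 8 = 20\,\mu s \\
t_{all}     &= t_p = 2\,\mu s
\end{aligned}
```

With these assumed numbers, the contended worst case is ten times the uncontended cost, which is the motivation for splitting one shared pool into several pools or a thread-local pool.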
4 The Implementation

template <typename F>
class RtSharedPtr;

template <typename T>
class RtMemoryPool
{
public:
    RtMemoryPool(int size, bool tLocal, const QString& name);
    RtSharedPtr<T> sharedPtrAlloc();
    ...
};

The preallocated memory blocks are organized as a linked list. Each block contains the user data, its reference counter and a pointer to the next block. For a memory acquisition the pool always returns the head block (see figure 5).

FIGURE 6: Return Memory

Instead of a raw pointer, the memory pool returns a shared pointer to the allocated memory block. The shared pointer uses the reference count in the allocated memory block and keeps a pointer to the memory pool.

// usage
RtMemoryPool<SomeType> memPool(100, "MyPool");
RtSharedPtr<SomeType> sptr = memPool.sharedPtrAlloc();

If the template type is a class, the memory pool will call the class constructor⁴ to initialize the memory block. Accordingly, the class destructor will be called when the memory block returns to the pool. If the object holds some resources⁵, they are released by the class destructor to prevent resource leaks.
destructor.
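The pool behaviour described above (a preallocated linked list of blocks, each carrying its own reference counter, where acquisition pops the head block and the last release pushes the block back) can be sketched in C. This is not the authors' C++ RtMemoryPool/RtSharedPtr: explicit pool_ref/pool_unref calls stand in for copying and destroying the shared pointer, the payload is a placeholder, and all names and the capacity are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define POOL_CAPACITY 4                 /* hypothetical pool size */

typedef struct block {
    struct block *next;                 /* link to the next free block */
    int refcount;                       /* reference counter kept in the block */
    double payload;                     /* placeholder for the user data */
} block;

typedef struct {
    block blocks[POOL_CAPACITY];        /* preallocated once, no heap use later */
    block *head;                        /* head of the free list */
    pthread_mutex_t lock;               /* serialises pool access */
} mem_pool;

static void pool_init(mem_pool *p)
{
    for (size_t i = 0; i < POOL_CAPACITY; i++) {
        p->blocks[i].next = (i + 1 < POOL_CAPACITY) ? &p->blocks[i + 1] : NULL;
        p->blocks[i].refcount = 0;
    }
    p->head = &p->blocks[0];
    pthread_mutex_init(&p->lock, NULL);
}

/* Constant-time acquisition: always hand out the head block. */
static block *pool_acquire(mem_pool *p)
{
    pthread_mutex_lock(&p->lock);
    block *b = p->head;
    if (b != NULL) {
        p->head = b->next;
        b->next = NULL;
        b->refcount = 1;
    }
    pthread_mutex_unlock(&p->lock);
    return b;                           /* NULL: pool exhausted */
}

/* "Copying the shared pointer": one more owner. */
static void pool_ref(mem_pool *p, block *b)
{
    pthread_mutex_lock(&p->lock);
    b->refcount++;
    pthread_mutex_unlock(&p->lock);
}

/* "Destroying a shared pointer": the last owner returns the block. */
static void pool_unref(mem_pool *p, block *b)
{
    pthread_mutex_lock(&p->lock);
    assert(b->refcount > 0);
    if (--b->refcount == 0) {
        b->next = p->head;
        p->head = b;
    }
    pthread_mutex_unlock(&p->lock);
}
```

Both acquisition and release are O(1) list operations under one mutex, matching the $t_p + t_m$ cost model; a C++ version would additionally run the object's destructor in pool_unref before the block goes back on the list.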
The modified shared pointer can work in two modes.
[...] priority thread is measured in µs and illustrated in figure 7 and figure 8.
Nicholas Mc Guire
Distributed & Embedded Systems Lab, Lanzhou University
Tianshui South Road 222, Lanzhou, P.R. China
[email protected],[email protected]
Abstract
The initial problem of protecting data against concurrent access was relatively simple: the goal was exclusive access to a shared resource. Elaborate semantic variations of the atomicity theme have been developed over the past decades, followed by an ever increasing focus on scalability. At the same time, the continuously increasing complexity of operating systems and applications has resulted in a steady growth of the complexity of the locking subsystem. One could speculate that locking complexity has been growing faster than the overall complexity of operating systems, but it would be hard to put numeric evidence behind this claim; suffice it to state that the development of Linux in the transition from 2.2 to 2.4 and on to the now current 3.X series of kernels has been very much dominated by locking issues related to scalability [8], [7].
At the same time, locking semantics have become more complex: priority inheritance/priority ceiling, fine-grained locking, lock types dependent on global state [5] and large lock dependencies (or lock chains [13]) have become common. This growth in complexity has dramatically impacted the development of real-time OSes like the Preempt-RT real-time extension to the Linux kernel; not surprisingly, considerable real-time efforts are lock related [4], [6]. The paradigm has roughly remained the same: explicit mutual exclusion for critical regions and atomicity of access in a functionally deterministic manner, along with hardware support for more elaborate atomic instructions (e.g. cmove, cmpx16).
This approach has serious drawbacks:
• it is hard to make locking scalable
• detecting and fixing locking problems is becoming more difficult
• the performance impact of locking, notably on real-time, is problematic
• the worst-case behavior occupies only a minuscule sub-state-space that is hard to actually reach during testing
• timing-wise, the worst case is always the loaded system, so reliable prediction of load impact is limited.
The question is: is there an alternative, notably one that scales with growing complexity? It is not yet time to give a simple yes or no answer, but we believe we can state that for some locking problems there are solutions that inherently scale with growing complexity. The problem simply has to be approached from a different perspective.
Operating systems have traditionally been modeled as deterministic constructs. Code is deterministic, but at system scope this simply does not hold. Non-determinism at the temporal level paired with preemptible operating systems inherently leads to the inability to predict the global state of an operating system even in the near future (let's say a few seconds ahead). Thus modeling a task as running on a "random" global state, the operating system, allows a new perspective on access to shared resources.
Taking one step back, locking was not introduced to provide exclusive access; it was introduced to ensure consistency of access to shared resources, locking being one straightforward way to achieve this. At a time when memory was a scarce resource this approach made a lot of sense. With RAM readily available, though with significant access performance differences depending on physical location, alternative solutions for consistent access to shared resources may make more sense. One of these methods is probabilistic locking.
In this paper we present the motivation for a simple probabilistic lock. Arguably the term lock is inappropriate, but we retain it as it serves the same purpose as traditional locks: to guarantee
pW/CS - Probabilistic Write / Copy-Select (Locks)
consistency of shared data. This lock is not a one-size-fits-all solution to the problem of shared data in concurrent systems. Rather, it should be seen as an attempt to change the perspective, to view contemporary systems as what they are (inherently random systems), and to capitalize on this notion to resolve the scalability problem.
[...] task level is precisely the cause of race conditions in the first place: if modern systems were strictly synchronous at the global level, we could predetermine any access pattern and protect it accordingly. Sources of non-determinism are plentiful in modern systems: not only asynchronous interrupts, but also non-deterministic cache replacement strategies, ECC RAM and flash (the latter with correction rates projected in the order of 1 out of 100 accesses [11]), complex dependencies of instruction execution times, etc. All of this leads to non-deterministic timing (that is, execution time jitter) and, paired with preemptibility, to a non-deterministic global state from the perspective of the individual thread of execution.

In safety-related or high-availability (HA) systems, random faults have traditionally been masked by replication and redundancy. We take a similar approach here, but at a much smaller scale: the critical object is a single data object and the "fault" is the writing process. We start with a well studied and simple class, a single-writer multiple-reader construct similar to the one introduced by Peterson in his influential paper "Concurrent reading while writing" [15], whereby the assumptions about the read and write operations are relaxed considerably to reflect the nature of modern super-scalar multicores; that is, no memory barriers or volatile data types are assumed. The design goals for race masking are:

• lock-less / 0-wait
• hand-shake-free
• non-atomic reader/writer
• constant number of steps for read and write (O(1))
• an arbitrary probability of success can be provided (level of replication)
• failure probability decreases with increasing system complexity
• failure probability decreases with increasing system load
• failure probability decreases with the number of participating processes

Scalability is not mentioned here simply because we don't yet have a good model to describe and analyze it, but scalability is clearly a prime target. While we list non-atomic reads and writes, it should be noted that we assume single 32-bit entities are written atomically: a write of a word to a memory/register location will never permit an inconsistent concurrent read. Either the old value or the new value is read in its entirety, but never a "mix" of the two; any sane architecture guarantees that (at least at present).

1.2 Concept

The concept is embarrassingly trivial: the writer simply writes to the shared object irrespective of the state of any reader. Obviously this would not be safe for a single object. As with safety-related systems, where random faults must be covered, we simply view the concurrent threads as "randomly" accessing the data object and the writer as the "fault injector".

[Figure: Shared Val; Reg 1, Reg 2, Reg 3; Copy Reg1, Copy Reg2, Copy Reg3]

• probabilistic guarantee of success that can be set to an arbitrary value
• write operation: write replicated registers unprotected
• read operations: copy replicated registers and select
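The claim that the success probability can be tuned by the level of replication can be made concrete with a back-of-the-envelope model. This model is not from the paper: it assumes, hypothetically, that each of the 2N+1 synchronous preemptions that Section 4 says are needed to leave all N replicas inconsistent occurs independently with some probability p. That independence is an idealized form of the randomness hypothesis examined in Section 5.1, and the function names are illustrative.

```c
#include <assert.h>

/* Section 4: defeating all N replicas needs 2N+1 synchronous preemptions. */
static unsigned sync_preemptions_needed(unsigned n_replicas)
{
    return 2u * n_replicas + 1u;
}

/* Hypothetical model: each required preemption happens independently with
   probability p, so all of them line up with probability p^(2N+1). */
static double failure_probability(double p, unsigned n_replicas)
{
    assert(p >= 0.0 && p <= 1.0);
    double prob = 1.0;
    for (unsigned k = 0; k < sync_preemptions_needed(n_replicas); k++)
        prob *= p;
    return prob;
}

/* Smallest replica count whose modelled failure probability is <= target. */
static unsigned replicas_for_target(double p, double target)
{
    assert(p > 0.0 && p < 1.0 && target > 0.0);
    unsigned n = 1;
    while (failure_probability(p, n) > target)
        n++;
    return n;
}
```

With p = 0.5, going from one to three replicas drops the modelled failure probability from 0.125 to about 0.008. Real preemptions are neither independent nor equally likely, which is exactly what the experiments in Section 5 probe.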
[...] though Lamport's taxonomy [3] might not be applicable to a probabilistic locking scheme, pW/CS fulfills its criteria quite nicely. As this is only a low-level primitive, the motivation for building on such definitions is to allow deriving high-level constructs on top of this primitive (monitors, for example, seem a quite natural option).

FIGURE 2: access model

• writer protocol:
  – update replicas (left to right)
  – update protocol:
    ∗ update leading (left) marker
    ∗ update register
    ∗ update the trailing (right) marker
• reader protocol:
  – copy register set (right to left)
  – select consistent register
  – selection protocol:

for(reg = right; reg >= left; reg--){
    if(markers identical){
        select register
    }
}
return register

[...] protection by write and read being in opposite directions. It is not possible to get inconsistent data even with a single pass; it is, though, possible to get no data (all data is found to be inconsistent). The probability that all data is inconsistent can, however, be brought down to an arbitrarily low value with sufficient replication.

Note that there are possibilities for "smarter" protocols than the above brute-force one. For the proof-of-concept implementation this simple-minded approach was shown to work just fine, and as it allows simple modeling it is what we are currently using.

3 Race-masking with implicit reader locking

The initial motivation for looking into race masking was to allow lock-less coordination of real-time and non-real-time tasks on a real-time enhanced GNU/Linux system. Essentially, this section shows that the introduction of real-time priorities can only improve the situation and never aggravate it, in the sense that the probability of success is never reduced.

A non-probabilistic variant is implicit priority locking of the reader: if the reader has higher priority than the writer, the writer cannot preempt the reader, and thus the reader is guaranteed to copy the entire buffer uninterrupted. In this case it can also be guaranteed that the reader gets at least one consistent copy if N ≥ 3 replicas of the register are used.

This is nothing really new: Lamport suggested this in 1985 [3], noting that the idea actually stems from 1977, but uses it in a deterministic algorithm to implement a multivalent regular register. Dijkstra proposes a non-deterministic selection in his 1975 paper "Guarded Commands, Nondeterminacy and Formal Derivation of Programs" [1], from which we take the idea of guards protecting a set of, in principle, non-deterministically selected actions (copying of the register). Interestingly enough, Hoare in "Communicating Sequential Processes" [2] describes a number of situations based on Dijkstra's guarded commands that resemble the pW/CS locks proposed here, though the context is quite different. Ultimately, non-determinism has been proposed in many publications, but we are not aware of examples where this non-determinism is actually utilized. This alone is the novelty of the proposed design, and we believe it is potentially useful in resolving scalability problems in at least some situations.

4 General race-masking with probabilistic locking

If the requirement of well-set priorities, and thus one-sided non-preemptibility, is dropped, then there is a possibility that the read will return with none of the registers in a detectably consistent state. Strictly speaking, we don't know the state of the register: we infer positively that the register is consistent if the markers are identical. At the same time, we cannot positively infer an inconsistency of the register when the markers are unequal. But taking unequal markers as an indication of an inconsistent register is a pessimistic assumption in all cases and thus safe.

To fail, the accesses to the registers must be in strict lock-step order for readers and writers: for N replicas, 2N+1 lock-step accesses would be needed for all registers to be inconsistent. Note that the probability of such lock-step behavior does increase with the size of the critical region (actually the uninterrupted time spent in the critical region).

A collision (all registers in an intermediate state) would require 2N+1 synchronous preemptions, so 13 synchronous preemptions for a system using 6 replicas. By synchronous preemption we mean that the preemption of the reader must occur after a complete register with markers was read, every time; the preemption of the writer must happen in the middle of the register region, every time; and the last read must also be preempted to ensure that no register is read in a consistent state.

Such an aliasing for N = 3 requires synchronous
preemption of reader and writer in at least 6 consecutive cases; that is, a sixfold synchronous race condition is needed to result in inconsistent data. What is the probability of such a scenario if single race conditions are already hard to reproduce?

In fact, the race condition could be extended so that an arbitrary number of race hits is needed to result in inconsistent data, and thus one can provide an arbitrary probability of success (at the expense of a larger number of replicas).

This solution could be described, somewhat paradoxically, as a "safe race": safe to an arbitrary probability of successfully reading at least one consistent register.

The current proof-of-concept is for a register consisting of 3 integer values, but it is extensible to any data structure. We note, though, that the prime interest is in resolving synchronization of small data objects, where race occurrence is very unlikely and traditional locking is thus excessively wasteful.

The writer is simply an unconditional write to the register set.

do{
    /* unconditional write */
    ui[i].w_enter++;
    ui[i].period = period;
    ui[i].duty = duty;
    ui[i].bit = 1;
    ui[i].w_exit++;
    i++;
}while(i < NUM_REPLICA);

The reader copies the register set in reverse order and then runs a selection loop on it:

while(!exit_cond){
    i = NUM_REPLICA-1;
    do{
        /* copy against the write direction: trailing marker first,
           leading marker last */
        pwm[i].w_exit = ui[i].w_exit;
        pwm[i].bit = ui[i].bit;
        pwm[i].duty = ui[i].duty;
        pwm[i].period = ui[i].period;
        pwm[i].w_enter = ui[i].w_enter;
        i--;
    }while(i >= 0);

    for(i=0;i<NUM_REPLICA;i++){
        if(pwm[i].w_enter - pwm[i].w_exit == 0){
            /* consistent register set found */
            break;
        }
    }
}

The selection can then simply set the pointer to the first valid entry found. Note that the first one found is the last written and thus the most current of the N replicated registers, so the selection can stop once a consistent register set has been found. To ensure this, though, the individual replicas must be on cache line boundaries; if they fit in a single cache line, the ordering implemented in the software would not necessarily be honored by the hardware.

5 Properties

5.1 Assessment of the randomness hypothesis

The most interesting issue in the experiments was to determine whether the data can bolster the claim of a random fault scenario. If this assumption is false, then obviously the underlying model would not be valid and consequently neither would the conclusions, at least not at this point in time.

The random fault model basically claims that the writer has the same properties as a random fault injection: even though it is obviously systematic in nature, its timing is suspected to be truly random. If this holds, then the mitigation of the fault also holds, with some constraints, of course, that will be developed a bit later.

To assess whether the failures are actually random, we take two main data samples into account:

• timing distribution
• distribution of single buffer inconsistencies

From this data, presented below, we can conclude that the writer process actually exhibits properties of a random fault (single event upset, SEU).

[Plot: loaded system, 8 reader threads (i7), data set "reader1_8t_pWCS_load_i7.dist"; samples (log scale) vs. time in microseconds (0 to 300)]
FIGURE 3: timing distribution of one reader
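A self-contained C sketch of the write/copy-select protocol, following the writer and reader loops above, may make the marker logic easier to experiment with. The assumptions are ours, not the paper's: every field is a C11 atomic accessed with the default sequentially consistent ordering, so that concurrent use is well-defined C (the paper itself relies on plain accesses and argues that no barriers are needed, which this sketch does not attempt to reproduce); replicas are padded to an assumed 64-byte cache line; and the type and function names are hypothetical. The reader copies each replica against the write direction (trailing marker first, leading marker last) and then selects the leftmost replica whose markers agree.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NUM_REPLICA 3                   /* N >= 3 as in Section 3 */

/* One replica on its own (assumed 64-byte) cache line. */
typedef struct {
    _Alignas(64) atomic_uint w_enter;   /* leading (left) marker */
    atomic_int period;
    atomic_int duty;
    atomic_int bit;
    atomic_uint w_exit;                 /* trailing (right) marker */
} replica;

typedef struct { int period, duty, bit; } pwm_value;

/* Writer: unconditional and lock-free; replicas left to right, and within
   a replica: leading marker, payload, trailing marker. */
static void pwcs_write(replica *r, pwm_value v)
{
    for (int i = 0; i < NUM_REPLICA; i++) {
        atomic_fetch_add(&r[i].w_enter, 1u);
        atomic_store(&r[i].period, v.period);
        atomic_store(&r[i].duty, v.duty);
        atomic_store(&r[i].bit, v.bit);
        atomic_fetch_add(&r[i].w_exit, 1u);
    }
}

/* Reader: copy replicas right to left, each one against the write direction,
   then pick the leftmost copy whose markers agree. Returns false when every
   copy looked inconsistent ("no data"), never torn data. */
static bool pwcs_read(replica *r, pwm_value *out)
{
    struct { unsigned enter, exit; pwm_value v; } copy[NUM_REPLICA];

    for (int i = NUM_REPLICA - 1; i >= 0; i--) {
        copy[i].exit     = atomic_load(&r[i].w_exit);
        copy[i].v.bit    = atomic_load(&r[i].bit);
        copy[i].v.duty   = atomic_load(&r[i].duty);
        copy[i].v.period = atomic_load(&r[i].period);
        copy[i].enter    = atomic_load(&r[i].w_enter);
    }
    for (int i = 0; i < NUM_REPLICA; i++) {
        if (copy[i].enter == copy[i].exit) {    /* markers identical */
            *out = copy[i].v;
            return true;
        }
    }
    return false;
}
```

Driving pwcs_write from one thread and pwcs_read from several others gives the concurrent setup of the paper. A writer preempted inside replica k shows up as w_enter != w_exit for that replica only, so the reader falls back to another replica instead of returning a mix of old and new values.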
[Plot: samples (log scale) vs. inconsistent buffers (0 to 5)]
[Plot: "race occurance" (0 to 400) vs. sample # (610 to 670)]
[Plot: values 0 to 6000 vs. loop length (0 to 2000)]
FIGURE 5: race on single unprotected global variable with two threads over the loop length
[Plot: idle system, 2, 8 and 256 reader threads (24 core AMD), data sets "pWCS_nobind_rw-yield_2t_load0.log", "pWCS_nobind_rw-yield_8t_load0.log", "pWCS_nobind_rw-yield_256t_load0.log"; samples (log scale) vs. inconsistent buffers (0 to 6)]
[Plot: samples (log scale) vs. inconsistent buffers (0 to 6)]
[Plot: data sets "2t.dist" and "256t.dist"; samples (log scale) vs. inconsistent buffers and system load]
FIGURE 9: buffer inconsistency distribution, load sweep from 0 to 16, 2 threads vs. 256 threads (24 core AMD)

5.4 Performance of pW/CS

The performance evaluation is quite preliminary, partially because we don't have a sufficiently com[...]

[...] time distribution is very favorable for the probabilistic lock even on a loaded system (note that readers and writer are SCHED_OTHER, not RR or FIFO).

The comparison is done between code using pW/CS and code using a normal pthread mutex to protect the shared object.

[Plot: data sets "8t_pWCS" and "8t_lock"; samples (log scale) vs. time in microseconds (0 to 700)]
FIGURE 10: timing idle system (4 core Intel)

Notably, running the same test on a larger system, a 24 core AMD (2 CPUs), one can clearly see that the difference between the probabilistic approach and the deterministic approach widens.
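To make the comparison concrete, the deterministic baseline (the same three-integer object guarded by a normal pthread mutex) could look roughly like the following sketch. The paper does not show its benchmark code, so the type and function names are hypothetical; the point is only that the writer and every reader serialize on one lock.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical lock-based counterpart of the three-integer register. */
typedef struct {
    pthread_mutex_t lock;
    int period, duty, bit;
} locked_pwm;

static void locked_init(locked_pwm *s)
{
    pthread_mutex_init(&s->lock, NULL);
    s->period = s->duty = s->bit = 0;
}

/* The writer excludes every reader for the duration of the update ... */
static void locked_write(locked_pwm *s, int period, int duty, int bit)
{
    pthread_mutex_lock(&s->lock);
    s->period = period;
    s->duty = duty;
    s->bit = bit;
    pthread_mutex_unlock(&s->lock);
}

/* ... and each reader serializes against the writer and the other readers. */
static void locked_read(locked_pwm *s, int *period, int *duty, int *bit)
{
    pthread_mutex_lock(&s->lock);
    *period = s->period;
    *duty = s->duty;
    *bit = s->bit;
    pthread_mutex_unlock(&s->lock);
}
```

Because every call takes the same mutex, a writer that is preempted while holding it stalls all readers; in the pW/CS variant a preempted writer leaves at most the replica it was writing inconsistent.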
In other words, the problem becomes more likely in high-load situations, and we can't test all possible combinations of high-load situations.

The underlying issue is that the race condition becomes a rare but possible event: a specific global state of the system which, if it occurs, makes us fail. In this sense the failure is deterministic (functional view), but its dependency is relatively complex, so it is hard to test. On the other hand, the state of the good case (looking at the successful execution of the synchronization object) is well defined, "deterministic", in most states, but the state space is very large, so it is hard to achieve coverage. All we need to do is turn it around: make the race depend on a complex deterministic global state and make the good case independent of a particular state. That is, the good case should have a very large state-space coverage, and the bad case a small and testable state space.

The mitigation proposed here is to use synchronization that exhibits its worst-case behavior on the idle system. A probabilistic lock will have its highest probability of aliasing, and thus failing, in an idle system, and the higher or more erratic the load situation is, the better it gets, because the probability of the synchronous preemptions that cause the possible collision decreases. Thus we potentially regain the ability to test synchronization, as we can now reliably provide the worst case with one single profile: the idle system. At the same time, the good case does not depend on a single deterministic global state constellation but rather occurs in a large number of independent states (only one register out of N must be consistent).

A second aspect of raciness is the temporal dimension. In traditional systems one could observe that something that worked well for a long time suddenly fails because of optimization or faster hardware: the "implicit ordering" of the execution flow had been protecting the unprotected critical region. The probabilistic lock has exactly the opposite qualities: the faster the system gets, the lower the probability of the reader not achieving a consistent read before being preempted, and this also holds for compiler optimization. So again we can test the worst case (slow system, unoptimized code); it can only get better in terms of the probability of the race condition not occurring in all N replicas of the register set.

Finally, there is the issue of Amdahl's law: the longest serialized portion of code can quickly dominate the overall performance. As pW/CS has no serialization of the readers and the writer, there is no impact on concurrent threads from individual threads being preempted or blocked; the reader will always have access to the latest consistent buffer the writer was able to provide.

6.1 Next steps

The current proof-of-concept implementation is suboptimal in that it creates an N-replica copy for each reader. This is unnecessary overhead: at most it would be appropriate to create such a "scratch-pad" per NUMA node on a NUMA system, and for non-NUMA systems a direct selection from the writer's replicas is also an option. Further, on the implementation side, the currently used counters should be replaced by a stronger consistency check. With the increasing overhead of accessing remote CPUs, the amount of local computation that can be expended is considerable if communication can be reduced to hand-shake-free (write-and-forget) semantics; equally, the spatial overhead that can be expended before approaching a lock-based implementation is considerable (on a 4x4 system that we had temporary access to, an implementation using 100 replicas was still faster than a locking version for a shared object of 16 bytes!). More work needs to be done to understand where the break-even point lies and consequently where this approach would be suitable.

The second large area of future work is the modeling of this concept. The current approach of a quite brute-force implementation, used to get a better understanding of the approach and its potential, is hardly suitable for actual deployment in a real system if no formal model is available for assessment. Unfortunately, the available models don't fit the approach well, with the possible exception of Dijkstra's guarded commands and non-deterministic if construct. The only real difference is that while Dijkstra's guarded commands evaluate the guards to determine whether execution should take place, pW/CS unconditionally copies the replicated register instance and then uses the "guards" to determine whether the selection should take place or the replica is abandoned. We currently intend to utilize Dijkstra's constructs to model pW/CS.

7 Conclusion

With ever growing complexity, designing systems that are both deterministic and optimal is becoming increasingly hard (or actually impossible). In this paper we propose to look at potentially capitalizing on the growing complexity rather than fighting it, by utilizing
204
Real-Time Linux Concepts
the inherent randomness of complex systems in combination with probabilistic locking methods.
We demonstrate the feasibility of this approach with an admittedly naive implementation of a critical section shared between concurrently executing readers and a writer of arbitrary priority. The results indicate that with growing system complexity, higher system load, and an increased number of readers, the probability of failure is reduced. Furthermore, faster systems have a higher probability of success than slower systems, and compiler optimization likewise plays to our advantage.
We are aware that it is too early to call this a sound and reliable result, but the preliminary investigation does indicate that the proposed path (stop fighting complexity, use it!) is worth investigating in more detail.
The most notable obstacle to utilizing such approaches, in our opinion, is the lack of appropriate models for probabilistic approaches. Developing such models will clearly be our next step in this effort to capitalize on the inevitable trend of growing hardware and software complexity. Furthermore, a systematic trade-off study comparing traditional locking options in relation to system complexity is on our TODO list.
The main conclusion from this work, though, is simply that locking may not be the best solution for concurrent access to shared objects; rethinking the problem in the context of modern superscalar multicore systems might well be worth the effort.
Acknowledgment
We would like to thank Silicon Graphics GmbH, Germany, specifically Mr. Heinz Moser, for supporting this research effort by providing us with a suitable multicore system for development and testing. This support allowed us to perform an initial evaluation of the scalability properties of pW/CS. The sources used for this project are available on request under the GPL V2 from DSLab [14] and will be released as soon as they are cleaned up.
References
[1] Guarded Commands, Nondeterminacy and Formal Derivation of Programs, Edsger W. Dijkstra, Communications of the ACM, Volume 18, Number 8, August 1975.
[2] Communicating Sequential Processes, C. A. R. Hoare, Communications of the ACM, Volume 21, Number 8, August 1978.
[3] On Interprocess Communication (Part I: Basic Formalism, Part II: Algorithms), Leslie Lamport, SRI International, December 1985.
[4] ELC: A PREEMPT_RT roadmap, Jake Edge, on a talk by Thomas Gleixner, April 2011, http://lwn.net/Articles/440064/
[5] Linux Kernel Development (3rd Edition), Robert Love, Addison-Wesley, July 2010.
[6] migrate_disable infrastructure, Peter Zijlstra, Linux 3.0-rc7-rt0, July 2011.
[7] fasync() BKL pushdown, Jonathan Corbet, June 2008, http://lwn.net/Articles/287083/
[8] hrtimers and beyond: transformation of the Linux time(r) system, Thomas Gleixner, Douglas Niehaus, 2006, http://www.kernel.org/pub/linux/kernel/people/tglx/hrtimers/ols2006-hrtimers.pdf
[9] Analysis of inherent randomness of the Linux kernel, Nicholas Mc Guire, Peter Odhiambo Okech, DSLab Lanzhou University, September 2009.
[10] Completely Fair Scheduler, Ingo Molnar, Linux 2.6.23, October 2007.
[11] LEC(TM) for Flash Memory, Lyric Semiconductor, undated, www.lyricsemiconductor.com, http://gigaom.com/2010/08/16/lyric-semiconducto/
[12] sched_fair.c, Ingo Molnar, Peter Zijlstra, et al., linux-2.6/kernel/sched_fair.c
[13] Lock Dependency Validator, Ingo Molnar, Arjan van de Ven, May 2006, http://lwn.net/Articles/185605/
[14] pWCS, gauss.c, dist.c, Nicholas Mc Guire, 2011, http://dslab.lzu.edu.cn:8080/members/hofrat/pWCS/
[15] Concurrent reading while writing, Gary L. Peterson, James E. Burns, ACM Transactions on Programming Languages and Systems, 1983.
pW/CS - Probabilistic Write / Copy-Select (Locks)
Konstantinos Bletsas
CISTER-ISEP Research Center, Polytechnic Institute of Porto
Rua Dr. António Bernardino de Almeida 431, 4200-072 PORTO, Portugal
[email protected]
Eduardo Tovar
CISTER-ISEP Research Center, Polytechnic Institute of Porto
Rua Dr. António Bernardino de Almeida 431, 4200-072 PORTO, Portugal
[email protected]
Björn Andersson
Software Engineering Institute, Carnegie Mellon University
Pittsburgh, USA
[email protected]
Abstract
In this paper we discuss the challenges and design principles of an implementation of slot-based task-splitting scheduling algorithms in Linux kernel version 2.6.34. We show that this kernel version provides the features required to implement such scheduling algorithms, and that the real behavior of the scheduling algorithms is very close to their theoretical behavior. We run and discuss experiments on 4-core and 24-core machines.
Real-time slot-based task-splitting scheduling algorithms for multiprocessor systems
bound of 100%, but generate too many preemptions.
Partitioned scheduling algorithms partition the task set and assign all tasks in a partition to the same processor. Hence, tasks cannot migrate between processors. Such algorithms involve few preemptions, but their utilization bound is at most 50%.
Semi-partitioning (also known as task-splitting) scheduling algorithms assign most tasks (called non-split tasks) to just one processor, but some tasks (called split tasks) are assigned to two or more processors. Uniprocessor dispatchers are then used on each processor, but they are modified to ensure that a split task never executes on two or more processors simultaneously.
Several multiprocessor scheduling algorithms have been implemented and tested using the vanilla Linux kernel. LitmusRT [1] provides a modular framework for different scheduling algorithms (global-EDF, pfair algorithms). Kato et al. [2] created a modular framework, called RESCH, for using algorithms other than those of LitmusRT (partitioned and semi-partitioned scheduling). Faggioli et al. [3] implemented global-EDF in the Linux kernel and made it compliant with POSIX interfaces.
In this paper we address the Real-time TAsk-Splitting scheduling algorithms (ReTAS) framework [11], which implements a specific type of semi-partitioned scheduling: slot-based task-splitting multiprocessor scheduling algorithms [4, 5, 6]. Slot-based task-splitting scheduling algorithms assign most tasks to just one processor and a few to only two processors. They subdivide time into equal-duration timeslots, and each timeslot of each processor is composed of one or more time reserves. These reserves are used to execute tasks. Reserves used for split tasks, which execute on two processors, must be carefully positioned to avoid overlapping in time.
The remainder of this paper is structured as follows. Section 2 describes the main features of the slot-based task-splitting scheduling algorithms. Section 3 discusses some challenges and design principles for implementing this kind of algorithm. A detailed description of our implementation is presented in Section 4, while in Section 5 we discuss the discrepancy between theory and practice. Finally, in Section 6 conclusions are drawn.
2 Slot-based task-splitting
Slot-based task-splitting algorithms have two important components: (i) the task-assigning algorithm; and (ii) the dispatching algorithm. The task-assigning algorithm executes prior to runtime and, besides assigning tasks to processors, is also responsible for computing all parameters required by the dispatching algorithm. The dispatching algorithm works over the timeslot and selects the tasks to be executed by the processors.
The Sporadic-EKG (S-EKG) [4] extends the periodic task set model of EKG [7] to sporadic task set models. This approach ensures that the number of split tasks is bounded (there are at most m − 1 split tasks), that each split task executes on only two processors, and that the non-split tasks execute on only one processor. The beginning and end of each timeslot are synchronized across all processors. The end of a timeslot of processor p contains a reserve and the beginning of a timeslot of processor p + 1 contains another reserve, and these two reserves supply processing capacity for a split task. As non-split tasks execute on only one processor, they are scheduled according to the uniprocessor EDF scheduling algorithm. A detailed description of that algorithm with an example can be found in [8].
While the EKG versions are task-based, NPS-F [5, 6] uses an approach based on bins. Each bin is assigned one or more tasks, and there is a one-to-one relation between each bin and each notional processor. Each notional processor then schedules the tasks of its bin under the EDF scheduling policy. Time is split into equal-duration timeslots, and each timeslot is composed of one or more time reserves. Each notional processor is assigned one reserve in one physical processor. However, up to m − 1 notional processors could be assigned to two reserves, which means that these notional processors are implemented upon two physical processor reserves, while the remaining notional processors are implemented upon one physical processor reserve.
There is one fundamental difference between the S-EKG and NPS-F algorithms: NPS-F can potentially generate a higher number of split tasks than S-EKG. Another difference is related to the dispatching algorithm. S-EKG allows non-split tasks to be executed on the split-task reserves (in the case when the split tasks are not ready to be executed), while NPS-F does not; that is, each notional processor executes only on its reserve(s).
Fig. 1 shows a generic execution timeline produced by these scheduling algorithms. Time is divided into equal-duration timeslots of length S. Each timeslot is divided into up to three reserves: x[p], y[p] and N[p]. Reserve x[p] is located at the beginning of the timeslot and is reserved to execute the task or notional processor split between processors p and p − 1. Reserve y[p] is located at the end of the timeslot and is reserved to execute the task or notional processor split between processors p and p + 1. The remaining part (N[p]) of the timeslot is used to execute non-split tasks or notional processors that execute on only one processor.
FIGURE 1: Execution timeline example.
In the remainder of this paper, we will discuss the implementation of the S-EKG and NPS-F algorithms on 4-core and 24-core machines running the Linux 2.6.34 kernel.
its reserve on processor p, it has to immediately resume execution on its reserve on processor p + 1. Due to the many sources of unpredictability (e.g. interrupts) in a real operating system, this precision is not possible. Consequently, the dispatcher of processor p + 1 may be prevented from selecting the split task because processor p has not yet relinquished that task. To handle this issue, one option would be for processor p + 1 to send an inter-processor interrupt (IPI) to processor p to relinquish the split task; another would be for processor p + 1 to set up a timer x time units in the future to force the invocation of its dispatcher. Two reasons led us to choose the latter. First, we know that if a dispatcher has not yet relinquished the split task, it is because something is preventing it from doing so, such as the execution of an interrupt service routine (ISR). Second, the use of IPIs would create dependencies between processors that could hamper the scalability of the dispatcher.
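To make the timeslot layout of Fig. 1 concrete, the following C sketch places reserve lengths onto consecutive processors and then asks which reserve (x[p], N[p] or y[p]) is active at a given time. It is only an illustration under stated assumptions: integer time units, and a simple McNaughton-style wrap-around placement that ignores the task-assignment rules and reserve sizing that the real S-EKG and NPS-F algorithms apply [4, 5, 6]. The names struct slot_layout, wrap_around and active_reserve are hypothetical and are not taken from ReTAS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout of one timeslot on processor p (integer time units):
 * x[p] at the start of the slot, y[p] at the end, N[p] region in between. */
struct slot_layout {
    long x; /* share of the entity split between p-1 and p */
    long n; /* total length of reserves that live only on p */
    long y; /* share of the entity split between p and p+1 */
};

enum reserve { RES_X, RES_N, RES_Y };

/* Wrap-around placement of len[0..count-1] onto processors whose slots have
 * length S. A reserve that does not fit in what is left of processor p is cut
 * in two: the first part becomes y[p], the rest becomes x[p+1].
 * Returns the number of processors used, or -1 if more than max_cpus. */
static int wrap_around(const long *len, size_t count, long S,
                       struct slot_layout *out, int max_cpus)
{
    int p = 0;
    long used = 0; /* time already allocated on processor p */

    if (max_cpus < 1)
        return -1;
    out[0] = (struct slot_layout){0, 0, 0};
    for (size_t k = 0; k < count; k++) {
        long l = len[k];
        assert(l > 0 && l <= S);
        if (used + l <= S) {            /* fits entirely on p */
            out[p].n += l;
            used += l;
        } else if (used == S) {         /* p is exactly full: open p+1 */
            if (++p >= max_cpus)
                return -1;
            out[p] = (struct slot_layout){0, l, 0};
            used = l;
        } else {                        /* split between p and p+1 */
            long first = S - used;      /* end of p's slot */
            long rest = l - first;      /* start of (p+1)'s slot */
            out[p].y = first;
            if (++p >= max_cpus)
                return -1;
            out[p] = (struct slot_layout){rest, 0, 0};
            used = rest;
            /* y[p] = [S-first, S) and x[p+1] = [0, rest) are disjoint in time
             * because first + rest = l <= S. */
            assert(first + rest <= S);
        }
    }
    return p + 1;
}

/* Reserve of a processor that is active at absolute time t >= time0, where
 * time0 is the common, synchronized start of all timeslots. */
static enum reserve active_reserve(long t, long time0, long S,
                                   const struct slot_layout *l)
{
    long off = (t - time0) % S;
    if (off < l->x)
        return RES_X;
    if (off >= S - l->y)
        return RES_Y;
    return RES_N;
}
```

For example, with S = 100 and reserve lengths {60, 60, 50, 30}, the second reserve is split into y[0] = 40 and x[1] = 20. Each split moves placement to the next processor, so at most (processors used − 1) entities are split, which matches the m − 1 bound quoted above.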
4.2 Why vanilla Linux kernel?
The vanilla Linux kernel 2.6.34 was chosen to implement the scheduling algorithms S-EKG [4] and NPS-F [5, 6]. That kernel version provides the mechanisms required to satisfy the previously mentioned design principles: (i) each processor holds its own runqueue, and it is easy to add new fields to it; (ii) it already implements red-black trees, which are balanced binary trees whose nodes are sorted by a key and on which most operations take O(log n) time; (iii) it has the high-resolution timers infrastructure, which offers nanosecond time-unit resolution, and timers can be set on a per-processor basis; (iv) it is very simple to add new system calls; and, finally, (v) it comes with the modular scheduling infrastructure, which easily enables adding a new scheduling policy to the kernel.
4.3 ReTAS implementation
The vanilla Linux kernel 2.6.34 has three native scheduling modules: RT (Real-Time), CFS (Completely Fair Scheduler) and Idle. Those modules are hierarchically organized by priority in a linked list; the module with the highest priority is RT, and the one with the lowest is Idle. Starting with the highest-priority module, the dispatcher looks for a runnable task in each module in decreasing order of priority.
We added a new scheduling policy module, called ReTAS, on top of the native Linux module hierarchy, thus making it the highest-priority module. That module implements the S-EKG and NPS-F scheduling algorithms. The ReTAS implementation consists of a set of modifications to the Linux 2.6.34 kernel to support the S-EKG and NPS-F scheduling algorithms and also the cluster version of NPS-F, called C-NPS-F [5]. These scheduling policies are identified by the SCHED_S_EKG and SCHED_NPS_F macros.
Since the assigning algorithm is executed prior to runtime, in the next sections we focus only on the kernel implementation, that is, on the dispatching algorithms.
4.3.1 ReTAS tasks
To differentiate these tasks from other tasks present in the system, we refer to them as ReTAS tasks. Listing 1 shows the pseudo-algorithm of ReTAS tasks. They are periodic tasks and are always present in the system. Each loop iteration is considered a job. Note that the first job of each task appears in the system at time0 + offset (time0 is the same for all tasks in the system) and the remaining jobs are activated according to the period. The delay_until function sleeps a task until the absolute time specified by arrival.

    arrival := time0 + offset;
    while (true)
    {
        delay_until(arrival);
        execute();
        arrival := arrival + period;
    }

Listing 1: ReTAS task pseudo-algorithm.
In the Linux operating system, a process is an instance of a program in execution. To manage all processes, the kernel uses an instance of the struct task_struct data structure for each process. To manage ReTAS tasks, some fields were added to the struct task_struct data structure (see Listing 2). The notional_cpu_id field is used to associate the task with its notional processor. Fields cpu1 and cpu2 hold the logical identifiers of the processor(s) on which the task will be executed. The absolute deadline and the arrival of each job are stored in the deadline and arrival fields of the retas_job_param data structure, respectively.

    struct retas_task {
        int notional_cpu_id;
        struct retas_task_param {
            unsigned long long deadline;  // D_i
        } task_param;
        struct retas_job_param {
            unsigned long long deadline;  // d_i,j
            unsigned long long arrival;   // a_i,j
        } job_param;
        int cpu1;
        int cpu2;
        ...
    };

    struct task_struct {
        ...
        struct retas_task retas_task;
    };

Listing 2: Fields added to struct task_struct kernel data structure.
4.3.2 Notional processors
As mentioned before, ReTAS tasks are assigned to notional processors. Therefore, notional processors act as runqueues. Each notional processor is an instance of the struct notional_cpu data structure (see Listing 3) and is identified by a numerical identifier (id). Field cpu is set to the logical identifier of the physical processor that, at a given time instant, is executing a task from that notional processor. The purpose of the flag field will be explained in Section 4.3.5. Each notional processor organizes all its ready jobs in a red-black tree, whose root is the field root_tasks, according to the jobs' absolute deadlines. The lock field is used to serialize insertion and removal operations on the red-black tree, especially for notional processors that are executed by two processors. The edf field points to the task with the earliest deadline stored in the red-black tree. Note that notional_cpus is an array defined as a global variable.

    struct notional_cpu {
        int id;
        atomic_t cpu;
        atomic_t flag;
        raw_spinlock_t lock;
        struct rb_root root_tasks;
        struct task_struct *edf;
        ...
    };
    ...
    struct notional_cpu notional_cpus[NR_NOTIONAL_CPUS];

Listing 3: struct notional_cpu data structure.
4.3.3 Timeslot reserves
4.3.4 ReTAS scheduling module
In the vanilla Linux kernel each processor holds a runqueue of all runnable tasks assigned to it. The scheduling algorithm uses this runqueue to select the "best" process to be executed. The information for these processes is stored in a per-processor data structure called struct rq (Listing 5). Many functions that compose Linux's modular scheduling framework take an instance of this data structure as an argument. Listing 5 shows the new data structures required by the ReTAS scheduling module that were added to the native Linux struct rq. The purpose of the struct timeslot data structure was described in the previous section.

    struct retas_rq {
        int post_schedule;
        struct timeslot timeslot;
        struct release release;
        struct resched_cpu resched_cpu;
    };

    struct rq {
        ...
        struct retas_rq retas_rq;
    };

Listing 5: struct retas_rq added to struct rq.
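The module walk described in Section 4.3 (ReTAS first, then RT, CFS and Idle) can be pictured with a small user-space C sketch. It is loosely modelled on the kernel's struct sched_class and its pick_next_task hook, but struct sched_module, pick_next_task and the toy runqueue below are hypothetical illustrations, not kernel or ReTAS code.

```c
#include <assert.h>
#include <stddef.h>

struct task { const char *name; };

/* Hypothetical stand-in for a scheduling module (the kernel counterpart is
 * struct sched_class). pick_next returns NULL if the module has no runnable
 * task on this processor's runqueue. */
struct sched_module {
    const char *name;
    struct task *(*pick_next)(void *rq);
};

/* Walk the modules from highest to lowest priority and return the first
 * runnable task found, or NULL if no module offers one. */
static struct task *pick_next_task(const struct sched_module *mods, size_t n,
                                   void *rq)
{
    assert(mods != NULL && n > 0);
    for (size_t i = 0; i < n; i++) {
        struct task *t = mods[i].pick_next(rq);
        if (t != NULL)
            return t;
    }
    return NULL;
}

/* Toy per-processor runqueue: at most one ready task per module. */
struct toy_rq {
    struct task *retas, *rt, *cfs;
    struct task idle;
};

static struct task *retas_pick(void *rq) { return ((struct toy_rq *)rq)->retas; }
static struct task *rt_pick(void *rq)    { return ((struct toy_rq *)rq)->rt; }
static struct task *cfs_pick(void *rq)   { return ((struct toy_rq *)rq)->cfs; }
static struct task *idle_pick(void *rq)  { return &((struct toy_rq *)rq)->idle; }

/* ReTAS sits above RT, CFS and Idle, so its ready jobs are always picked first. */
static const struct sched_module modules[] = {
    { "ReTAS", retas_pick },
    { "RT",    rt_pick    },
    { "CFS",   cfs_pick   },
    { "Idle",  idle_pick  },
};
```

Because the idle module always returns a task, the walk never falls through in practice; placing ReTAS at index 0 is what gives it precedence over the native RT module.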
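The role of the notional processor of Listing 3 (ready jobs ordered by absolute deadline, a lock serializing insertion and removal, and quick access to the earliest-deadline job) can be illustrated with the following user-space C sketch. It deliberately swaps in a binary min-heap for the kernel red-black tree and a pthread mutex for the raw spinlock. It also derives each next job from Listing 1 as a_i,j+1 = a_i,j + T_i and d_i,j+1 = a_i,j+1 + D_i, where the period T_i is passed as a parameter because Listing 2 elides where it is stored. All names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_JOBS 64 /* hypothetical capacity of one notional processor */

/* Job record mirroring retas_job_param (times in ns). */
struct job {
    int task_id;
    unsigned long long arrival;  /* a_i,j */
    unsigned long long deadline; /* d_i,j = a_i,j + D_i */
};

/* User-space stand-in for struct notional_cpu: a min-heap ordered by
 * absolute deadline replaces the red-black tree rooted at root_tasks. */
struct notional_rq {
    pthread_mutex_t lock;
    struct job heap[MAX_JOBS];
    size_t len;
};

static void nrq_init(struct notional_rq *q)
{
    pthread_mutex_init(&q->lock, NULL);
    q->len = 0;
}

static void swap_jobs(struct job *a, struct job *b)
{
    struct job t = *a;
    *a = *b;
    *b = t;
}

/* Insert a ready job; returns 0 on success, -1 if the queue is full. */
static int nrq_insert(struct notional_rq *q, struct job j)
{
    pthread_mutex_lock(&q->lock);
    if (q->len == MAX_JOBS) {
        pthread_mutex_unlock(&q->lock);
        return -1;
    }
    size_t i = q->len++;
    q->heap[i] = j;
    while (i > 0 && q->heap[(i - 1) / 2].deadline > q->heap[i].deadline) {
        swap_jobs(&q->heap[i], &q->heap[(i - 1) / 2]); /* sift up */
        i = (i - 1) / 2;
    }
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Remove the earliest-deadline job (what the edf field points to).
 * Returns 0 on success, -1 if the queue is empty. */
static int nrq_pop_edf(struct notional_rq *q, struct job *out)
{
    pthread_mutex_lock(&q->lock);
    if (q->len == 0) {
        pthread_mutex_unlock(&q->lock);
        return -1;
    }
    *out = q->heap[0];
    q->heap[0] = q->heap[--q->len];
    for (size_t i = 0;;) { /* sift down */
        size_t l = 2 * i + 1, r = l + 1, m = i;
        if (l < q->len && q->heap[l].deadline < q->heap[m].deadline) m = l;
        if (r < q->len && q->heap[r].deadline < q->heap[m].deadline) m = r;
        if (m == i) break;
        swap_jobs(&q->heap[i], &q->heap[m]);
        i = m;
    }
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Next job of a periodic task: arrival advances by the period and the
 * absolute deadline is the new arrival plus the relative deadline D_i. */
static struct job next_job(struct job prev, unsigned long long period,
                           unsigned long long rel_deadline)
{
    struct job j = { prev.task_id, prev.arrival + period, 0 };
    j.deadline = j.arrival + rel_deadline;
    return j;
}
```

A heap gives O(log n) insertion and removal like the red-black tree, but the kernel tree also supports removing arbitrary nodes efficiently, which this sketch does not attempt.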
scheduled using the S-EKG and NPS-F scheduling algorithms (which have comparable jitter/overheads). Since each experiment took 1000 s, the whole set of experiments took 34000 s.
5.4 Discussion of results
We collected the maximum values observed for each type of jitter and for each type of overhead, irrespective of the algorithm used (S-EKG or NPS-F). Table 1 presents the experimental results for: reserve jitter (ResJ), release jitter (RelJ), context switch jitter (CtswJ), the overhead of interrupt 20 (related to the hard disk) and the overhead of the tick. Note that we do not directly present the release overhead (RelO). Rather, since the release overhead is part of what is experienced as release jitter, we simply present the worst-case RelJ (which also accounts for RelO). The column identified with MAXmax gives the maximum value observed in all experiments. The third column (AVGτi) gives the average value experienced by the task that experienced the MAXmax value. The fourth column (MINmax) gives the minimum of the collected values (note that this is the minimum of the maximum values). The last column displays the average value of the task that experienced the MINmax value. Before analyzing the results, we draw the reader's attention to the time unit, µs, which means that the impact of these jitters/overheads is relatively small. Recall that the period of tasks in the various experiments varied from 5 ms up to 50 ms.
The highest ResJ values were consistently experienced by split tasks. This is due to the task migration mechanism required for split tasks (described in Section 4). In that mechanism, if a task is not available, a timer is set to expire some time later. The value chosen for this delay was 5 µs.
The MAXmax RelJ value is comparatively high (31.834 µs), but comparing it with both AVGτi values observed (0.329 µs and 0.369 µs) shows that something prevented the release mechanism from performing that job release on time. The reason for this is related to the unpredictability of the underlying operating system. There are many sources of unpredictability in a Linux kernel: (i) interrupts are the events with the highest priority; consequently, when one arises, the processor switches execution to handle the interrupt (and interrupts usually arise in an unpredictable fashion); (ii) on Symmetric Multi-Processing (SMP) systems there are multiple kernel threads running on different processors in parallel, and those can simultaneously operate on shared kernel data structures, requiring serialization of access to such data; (iii) disabling and enabling preemption, as done in many parts of the kernel code, can postpone some scheduling decisions; (iv) the high-resolution timer infrastructure is based on the local Advanced Programmable Interrupt Controller (APIC), so disabling and enabling local interrupts can disrupt the precision of that timer; and, finally, (v) the hardware that Linux typically runs on does not provide the necessary determinism, which would permit the timing behavior of the system to be predictable with all latencies being time-bounded and known prior to run-time.
The same reasons could be given to explain the MAXmax CtswJ value. However, the magnitude of CtswJ is usually not very high, because this operation is done by the scheduler, which executes in a controlled context.
In Table 1, we present the overhead results of two ISRs: irq20 and tick (tick is a periodic timer interrupt used by the system to perform a set of operations, for instance invoking the scheduler). The reason for this is that irq20 can be configured to be executed by one specific processor, but tick cannot. In our opinion, it does not make sense to present values other than MAXmax for irq20, because in these experiments it is a sporadic and rare event. In contrast, tick is periodic, with a period of approximately 1 ms. The values observed show that this overhead is very small.
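The four Table 1 columns are easy to misread, so the following C sketch spells out one reading of their definitions: per task, take the maximum and the mean of its samples; MAXmax is the largest per-task maximum and AVGτi the mean of the task that produced it; MINmax is the smallest per-task maximum, reported together with that task's mean. It assumes the samples are pooled per task; the names struct task_stats, summarize_task and summarize_tasks are hypothetical and are not the authors' tooling.

```c
#include <assert.h>
#include <stddef.h>

/* Per-task summary of one metric (e.g. ResJ in microseconds). */
struct task_stats { double max; double avg; };

/* One Table-1-style row computed across all tasks. */
struct table_row {
    double max_max;    /* MAXmax: largest per-task maximum */
    double avg_at_max; /* AVG of the task that produced MAXmax */
    double min_max;    /* MINmax: smallest per-task maximum */
    double avg_at_min; /* AVG of the task that produced MINmax */
};

/* Maximum and mean of one task's samples. */
static struct task_stats summarize_task(const double *s, size_t n)
{
    assert(n > 0);
    struct task_stats st = { s[0], 0.0 };
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (s[i] > st.max)
            st.max = s[i];
        sum += s[i];
    }
    st.avg = sum / (double)n;
    return st;
}

/* Pick the tasks with the largest and smallest maxima. */
static struct table_row summarize_tasks(const struct task_stats *t, size_t n)
{
    assert(n > 0);
    size_t hi = 0, lo = 0;
    for (size_t i = 1; i < n; i++) {
        if (t[i].max > t[hi].max) hi = i;
        if (t[i].max < t[lo].max) lo = i;
    }
    struct table_row r = { t[hi].max, t[hi].avg, t[lo].max, t[lo].avg };
    return r;
}
```

Reporting AVGτi next to MAXmax is what supports the argument above: a large maximum paired with a sub-microsecond mean points to a rare disturbance rather than a systematic cost.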
[8] P. B. Sousa, B. Andersson and E. Tovar, Implementing Slot-Based Task-Splitting Multiprocessor Scheduling, in Proceedings of the 6th IEEE International Symposium on Industrial Embedded Systems (SIES 11), Västerås, Sweden, pp. 256–265, 2011.
[9] P. B. Sousa, B. Andersson and E. Tovar, Challenges and Design Principles for Implementing Slot-Based Task-Splitting Multiprocessor Scheduling, in Work in Progress (WiP) session of the 31st IEEE Real-Time Systems Symposium (RTSS 10), San Diego, CA, USA, 2010. Available online: http://cse.unl.edu/rtss2008/archive/rtss2010/WIP2010/5.pdf
[10] P. B. Sousa, K. Bletsas, E. Tovar and B. Andersson, On the implementation of real-time slot-based task-splitting scheduling algorithms for multiprocessor systems (extended version of this paper), Technical Report HURRAY-TR-110903, 2011.
[11] P. B. Sousa, ReTAS. Available online: http://webpages.cister.isep.ipp.pt/~pbsousa/retas/
Abstract
Real-time aperiodic server algorithms were originally devised to schedule the execution of threads
that serve a stream of jobs whose arrival and execution times are not known a priori, in a way that
supports schedulability analysis. Well-known examples of such algorithms include the periodic polling
server, deferrable server, sporadic server, and constant bandwidth server.
The primary goal of an aperiodic-server scheduling algorithm is to enforce a demand bound for each
thread - that is, an upper bound on the amount of CPU time a thread may compete for in a given time
interval. Bounding the demand of a given thread limits the interference that other threads in the system experience from that thread in the competition for CPU time. Isolating the CPU-time demands
of threads, known as temporal isolation, is an essential requirement for guaranteed resource reservations
and compositional schedulability analysis in open real-time systems. A secondary goal of an aperiodic
server is to minimize the worst-case and/or average response time while enforcing the demand bound.
The theoretical aperiodic server algorithms meet both goals to varying degrees.
An implementation of an aperiodic server can yield performance significantly worse than its theoretical
counterpart. Average response time is often higher, and even temporal isolation may not be enforced due
to factors not found or considered in the theoretical algorithm. These factors include context-switching
overheads, imprecise clocks and timers, preemption delays (e.g., overruns), and limits on storage available
for bookkeeping.
This paper reports our experience implementing, in Linux, variations of the sporadic-server scheduling
algorithm, originally proposed by Sprunt, Sha, and Lehoczky. We chose to work with sporadic-server
scheduling because it fits into the traditional Unix priority model, and is the only scheduling policy
recognized by the Unix/POSIX standard that enforces temporal isolation. While this paper only considers
sporadic server, some lessons learned extend to other aperiodic servers including those based on deadline
scheduling.
Through our experience, we show that an implemented sporadic server can perform worse than less
complex aperiodic servers such as the polling server. In particular, we demonstrate the effects of an
implementation’s inability to divide CPU time into infinitely small slices and to use them with no overhead.
We then propose and demonstrate techniques that bring the performance closer to that of the theoretical
sporadic-server algorithm. Our solutions are guided by two objectives. The primary objective is that the
server enforce an upper bound on the CPU time demanded. The secondary objective is that the server
provide low average-case response time while adhering to the server’s CPU demand bound. In order to
meet these objectives, our solutions restrict the degree to which the server’s total CPU demand can be
divided. Additionally, we provide mechanisms to increase the server’s ability to provide more continuous
allocations of CPU demand.
Through a network packet service example, we show that sporadic server can be effectively used to
bound CPU demand. Further, the efficiency of jobs served by sporadic server can be improved in terms
of both reduced average-case response time and increased throughput.
∗ Dr. Baker’s contributions to this paper are based on work supported by the National Science Foundation, while working at
the Foundation.
Experience with Sporadic Server Scheduling in Linux: Theory vs. Practice
suspends itself) without interference from jobs of any other task. Showing that a job can complete within a given time window in the presence of other tasks amounts to bounding the amount of processor time the other tasks can steal from it over that interval, and then showing that this worst-case interference leaves enough time for the job to complete.
The usual form of interference is preemption by a higher-priority task. However, lower-priority tasks can also cause interference, which is called priority inversion or preemption delay. Preemption delays may be caused by critical sections, imprecision in the OS timer mechanism, or any other failure of the kernel to adhere consistently to the preemptive fixed-priority scheduling model.
A system that supports the UNIX real-time API permits the construction of threads that behave like a periodic task. The clock_nanosleep() function is one of several that provide a mechanism for suspending execution between one period and the next. Using the sched_setscheduler() function, the application can request the SCHED_FIFO policy and assign a priority. By doing this for a collection of periodic tasks, and choosing priorities sufficiently high to preempt all other threads,1 one should be able to develop an application that conforms closely enough to the model of periodic tasks and fixed-task-priority preemptive scheduling to guarantee that the actual tasks meet deadlines within some bounded tolerance.
1 Of course, careful attention must be given to other details, such as handling critical sections.
Unfortunately, that is not enough. To support a reasonable range of real-time applications, one needs to be able to handle a wider range of tasks. For example, a task may request CPU time periodically but the execution time requested may not be bounded, or the arrival of work may not be periodic. If such a task has high enough priority, the interference it can cause for other tasks may be unpredictable or even unbounded, causing other tasks to miss deadlines.
Aperiodic tasks typically have performance requirements that are soft, meaning that if there is a deadline it is stochastic, or occasional deadline misses can be tolerated, or under temporary overload conditions load shedding may be acceptable. So, while the CPU time allocated to the service of aperiodic tasks should be bounded in order to bound worst-case interference for other tasks, it should be provided in a way that allows the aperiodic task to achieve fast average response time under expected normal circumstances.
One example of an aperiodic task that requires fast average response time can be found in the paper by Lewandowski et al. [3]. In that paper, a real-time task uses the network in its time-critical path to gather information. While it is desirable to receive all network packets, missing a few packets is not catastrophic. The difficulty is that the network receive path is shared by other tasks on the system, some with different deadlines and others with no explicit deadlines.
Assuming a fixed-task-priority model, a priority must be chosen for the bottom level of network packet service. Processing the packets at a low or background priority does not work well, because processing of the packets may be delayed arbitrarily. Extended delay in network packet processing means that a real-time task waiting for the packets may miss an unacceptably large number of packets. Another option is to schedule the network packet processing at a high priority. However, the network packet processing can then take an unbounded amount of CPU time, potentially starving other tasks on the system and thereby causing missed deadlines. Therefore, a scheduling scheme is needed that provides some high-priority time to serve the aperiodic jobs; however, the high-priority time should be limited, preventing the packet processing from monopolizing the CPU. The bound on CPU time ensures that other tasks have access to the CPU in a timely manner.
The key to extending analysis techniques developed for periodic tasks to this broader class of workloads is to ration processor time. It must be possible to force even an uncooperative thread to be scheduled in such a way that the worst-case interference it causes other tasks can be modeled by the worst-case behavior of some periodic task. A number of scheduling algorithms that accomplish this have been studied, which we refer to collectively as aperiodic servers.
Examples of well-known aperiodic server scheduling algorithms for use in a fixed-task-priority scheduling environment include the polling and deferrable servers [18], and the sporadic server [2]. There are also several examples for use with deadline scheduling, among which the constant bandwidth server has received considerable attention [17].
All these algorithms bound the amount of CPU time an aperiodic task receives in any time interval, which bounds the amount of interference it can cause other tasks, guaranteeing that the other tasks are left a predictable minimum supply of CPU time. That is, aperiodic servers actively enforce temporal isolation, which is essential for an open real-time execution platform.
The importance of aperiodic servers extends beyond the scheduling of aperiodic tasks. Even the scheduling of periodic tasks may benefit from the temporal isolation property.2 Aperiodic server scheduling algorithms have been the basis for a rather extensive body of work on open real-time systems, appearing sometimes under the names virtual processor, hierarchical, or compositional scheduling. For example, see [4, 9, 10, 11, 12, 13, 14].
In this paper, we limit attention to a fixed-task-priority scheduling environment, with particular attention to sporadic-server scheduling. The primary reason is that Linux for the most part adheres to the UNIX standard and therefore supports fixed-task-priority scheduling. Among the well-known fixed-task-priority aperiodic-server scheduling algorithms, sporadic-server scheduling is theoretically the best. It also happens to be the only form of aperiodic-server scheduling that is recognized in the UNIX standard.
A polling server is one way of scheduling aperiodic workloads. The polling server is a natural extension of the execution pattern of a periodic task. Using a polling server, queued jobs are provided CPU time based on the polling server's budget, which is replenished periodically. If no work is available when the polling server is given its periodic allocation of CPU time, the server immediately loses its budget. Similarly, if the budget is partially used and no jobs are queued, the polling server gives up the remainder of the budget.3
The amount of CPU time consumed is restored to the budget one replenishment period in the future, starting from the instant when the sporadic server requested CPU time and had budget. The operation to restore the budget at a given time in the future, based on the amount of time consumed, is known as a replenishment. Once the server uses all of its budget, it can no longer compete for CPU time at its scheduling priority.4
The objective of the sporadic-server scheduling algorithm is to limit worst-case system behavior such that the server's operation can be modeled, for schedulability analysis of other tasks, as if it were a periodic task. That is, in any given sliding time window, the sporadic server will not demand more CPU time than could be demanded by a periodic task with the same period and budget. A secondary goal of the sporadic server is to provide fast average response time for its jobs.
With regard to minimizing average response time, a sporadic server generally outperforms a polling server. The advantage of a sporadic server is that jobs can often be served immediately upon arrival, whereas with a polling server jobs will generally have to wait until the next period to receive CPU time. Imagine a job arrival that happens immediately after the start of the polling server's period. The job must wait until the following period to begin service, since the polling server immediately forfeits its budget if there are no jobs available to execute. A sporadic server, on the other hand, can execute the job immediately, given that its budget can be retained when the server's queue is empty. The ability to retain budget allows the server to execute more than once during its period, serving multiple jobs as they arrive. Aperiodic servers that can hold on to their budget until needed are known as bandwidth-preserving servers.
nal algorithm which was corrected in subsequent work. Further, there are many sporadic-server variants, each with their own
nuances. These details are omitted to simplify the discussion.
Real-Time Linux Concepts
The scheduling algorithm followed by our implementation is described in [15], which is an updated version of [5] including corrections for errors in the pseudo-code that were identified by Danish et al. in [4].
Correct operation of a sporadic server results in bounded interference experienced by lower-priority tasks. In order to measure the interference, we used Regehr’s “hourglass” technique [6], which creates an application-level process that monitors its own execution time without requiring special operating system support. The hourglass process infers the times of its transitions between executing and not executing by reading the clock in a tight loop. If the time between two successive clock values is small, the assumption is that the process was not preempted. However, if the difference is large, the thread was likely preempted. This technique can be used to find preemption points and thereby determine the time intervals when the hourglass process executed. From this information, the hourglass process can calculate its total execution time.
Using the hourglass approach, we were able to evaluate whether an implemented sporadic server actually provides temporal isolation. That is, if we schedule the hourglass task with a priority below that of the sporadic server (assuming there are no other higher-priority tasks in the system), the hourglass task should be able to consume all of the CPU time that remains after the sporadic server used all
dow of size 10 milliseconds. In reality, other activities such as interrupt handlers may cause the interfer-
m1 contains a timestamp, which is then subtracted from the time the packet is received by the UDP layer on m2 .5 In our setup, m1 periodically sends packets to m2. The time between sending packets is varied in order to increase the load experienced by the network receive thread on m2. The receive thread on m2 is scheduled using either the polling server, sporadic server, or SCHED_FIFO [7] scheduling policies.6 In our experiments, m2 is running Linux 2.6.38 with a ported version of softirq threading found in the 2.6.33 Linux real-time patch. m2 has a Pentium D 830 processor running at 3 GHz with a 2x16 KB L1 cache and a 2x1 MB L2 cache. 2 GB of RAM are installed. The kernel was configured to use only one core, so all data gathered are essentially equivalent to those of a uniprocessor system.
[Figure: response time (milliseconds, log scale) vs. sent packets (1000 pkts/sec) for SCHED_FIFO, sporadic server, and polling server]
Scheduling the Linux network receive thread (i.e., sirq-net-rx) using various scheduling policies affects the average response time of received network packets. One would expect that the polling server would result in higher average response times than SCHED_FIFO or sporadic server, and that sporadic server and SCHED_FIFO should provide similar average response times until sporadic server runs out of budget.
In our experiment, sporadic server and polling server are both given a budget of 1 millisecond and a period equal to 10 milliseconds. The sporadic server’s maximum number of replenishments is set to 100. The hourglass task is scheduled using SCHED_FIFO scheduling at a real-time priority lower than the priority of the network receive thread. Each data point is averaged over a 10-second interval of sending packets at varied rates. The CPU utilization and response time for the described experiment are shown in Figures 2 and 3.
One would expect that if the sporadic server and polling server were each budgeted 10% of the CPU, the lower-priority hourglass task should be able to consume at least 90% of the CPU time regardless of the load. However, the data for the experiment show that the sporadic server causes much more than 10% interference. The additional interference is the consequence of preemptions caused by the server. Each time a packet arrives, the sporadic server preempts the hourglass task, thereby causing two context switches for each packet arrival. Given that the processing time for a packet is small (2-10 microseconds), the server will suspend itself before the next packet arrives. In this situation, the aggregate time for context switching and other sporadic-server overhead, such as using additional timer events and running the sporadic-server-related accounting, becomes significant. For instance, on the receiving machine the context-switch time alone was measured at 5-6 microseconds using the lat_ctx LMbench program [1].
The overhead associated with preemption causes the additional interference that is measured by the lower-priority hourglass task.7
A snapshot of CPU execution time over a 500 microsecond time interval was produced using the Linux Trace Toolkit (LTTng) [16] and is shown in Figure 4. The top bar is the sirq-net-rx thread and the bottom bar is the lower-priority hourglass measuring task. This figure shows that the CPU time of both tasks is being finely sliced. The small time slices cause interference for both the lower-priority thread and the sporadic-server thread that would not be experienced if the threads were able to run to completion.
3.1 Accounting for Preemption Overhead
To ensure that no hard deadlines are missed, and even to ensure that soft deadlines are met within the desired tolerances, CPU time interference due to preemptions must be included in the system’s schedulability analysis. The preemption interference caused by a periodic task can be included in the analysis by adding a preemption term to the task’s worst-case execution time (WCET) that is equal to twice the worst-case context switch cost: one for switching into the task and one for switching out of the task.8 Assuming all tasks on the system are periodic, this is at least a coarse way of including context-switch time in the schedulability analysis.
A sporadic server can cause many more context switches than a periodic task with the same parameters. Rather than always running to completion, a sporadic server has the ability to self-suspend its execution. Therefore, to obtain a safe WCET bound for analysis of interference,9 one would have to determine the maximum number of contiguous “chunks” of CPU time the sporadic server could request within any given period-sized time interval. The definition of sporadic-server scheduling given in scheduling-theory publications does not place any such restriction on the number of CPU demand chunks and thus imposes no real bound on the WCET. In order to bound the number of preemptions, and thereby bound the time spent context switching, most implemented variations of sporadic server limit the maximum number of pending replenishments, denoted by max_repl. Once max_repl replenishments are pending, a sporadic server will be prevented from executing until one of the future replenishments arrives.
7 The lower-priority thread does not measure much of the cache eviction and reloading that other applications may experience,
because its code is very small and typically remains in the CPU’s cache. When cache effects are taken into account, the potential
interference penalty for each preemption by a server is even larger.
8 This is an intentional simplification. The preemption term should include all interferences caused by the sporadic server preempting another thread: not only the direct context-switch time, but also interferences such as the worst-case penalty imposed by cache eviction and reloading following the switch. For checking the deadline of a task, both “to” and “from” context switches need to be included for each potentially preempting task, but only the “to” switch needs to be included for the task itself.
9 From this point on we abuse the term WCET to stand for the maximum interference that a task can cause for lower-priority tasks, which includes not just the maximum time that the task itself can execute, but also indirect costs, such as preemption overheads.
[Figure: CPU utilization; surviving label: SS_budget + 2 ∗ max_repl]
3.2 Preemption Overhead
It turns out that the workload presented by our packet service example is a bad one for the sporadic server, in that a burst of packet arrivals can fragment the server budget, and then this fragmentation becomes “locked in” until the backlog is worked off. Suppose a burst of packets arrives, and the first
[Figure: response time (milliseconds) for SCHED_FIFO, sporadic server, and polling server]
similar to the classical mode-change scheduling problem, in that one must be careful not to violate the assumptions of the schedulability analysis during the transition. In the case of a sporadic server, the constraint is that the server cannot cause any more interference within any time window than would be caused by a periodic task with execution time equal to the server budget and period equal to the server’s budget replenishment period, including whatever adjustments have been made to the model to allow for context-switch effects. We call this the sliding-window constraint for short.
may be incorrectly identified as the onset of a heavy load, and the early switching may cause the server to postpone a portion of its budget that could have been used sooner. Conversely, delaying the switch may mean that time that could have been used to serve incoming jobs is wasted on preemption charges.
While an ideal switching point may not be possible to detect beforehand, one reasonable indicator of a heavy load is when the sporadic server uses all of its budget. That is the point when a sporadic server is blocked from competing for CPU time at its scheduling priority. At this point the server could switch to its polling-like mode of operation.
A possible event to indicate when to switch back to the sporadic-server mode of operation is when a sporadic server blocks but still has available budget. This point in time would be considered as entering a period of light load, and max_repl could be reinstated.
[Plot: y-axis 0 to 0.25; x-axis sent packets (1000 pkts/sec)]
FIGURE 8: After switch to poll-like server, with max_repl = 1 and replenishments coalesced.
FIGURE 9: Coalescing replenishments under heavy load.
In order to maintain the sliding-window constraint during the mode change, one can think in terms of changing the times associated with pending replenishments. Consolidating the replenishment times would allow the creation of a single replen-
[Figure: response time (milliseconds) for SCHED_FIFO, polling server, sporadic server coalesce (immediate), and sporadic server coalesce (gradual)]
this way the server’s actual interference is limited to its actual CPU time budget, and we do not need to use an inflated value in the schedulability analysis. Since the preemption charges come out of the server’s budget, we still need to consider preemption costs when we estimate the worst-case response time of the server itself. However, if we choose to over-provision the server for worst-case (finely fragmented) arrival patterns, it actually gets the time and can use it to improve performance when work arrives in larger chunks.
The ability to use small time slices allows a sporadic server to achieve low average response times under light loads. However, under a load of many small jobs, a sporadic server can fragment its CPU time and waste a large fraction of its budget on preemption charges. A polling server, on the other hand, does not experience this fragmentation effect, but does not perform as well as sporadic server under light load. To combine the strengths of both servers, we described a mechanism to transition a sporadic server into a polling-like mode, thereby allowing sporadic server to serve light loads with good response time and serve heavy loads with throughput similar to a polling server. The data for our experiments show that the hybrid approach performs well on both light and heavy loads.
Our recent experiences reinforce what we learned in prior work with sporadic-server scheduling in Linux [5]. There are devils in the details when it comes to reducing a clever-looking theoretical algorithm to a practical implementation. To produce a final implementation that actually supports schedulability analysis, one must experiment with a real implementation, reflect on any mismatches between the theoretical model and reality, and then make further refinements to the implemented scheduling algorithm until there is a match that preserves the analysis. This sort of interplay between theory and practice pays off in improved performance and timing predictability.
We also believe our experience suggests a potential improvement to the “NAPI” strategy employed in Linux network device drivers for avoiding unnecessary packet-arrival interrupts. NAPI leaves the interrupt disabled so long as packets are being served, re-enabling it only when the network input buffer is empty. This can be beneficial if the network device is faster than the CPU, but in the ongoing race between processors and network devices the speed advantage shifts back and forth. For our experimental set-up, the processor was sufficiently fast that it was able to handle the interrupt and the sirq-net-rx processing for each packet before the next arrived, but the preemption overhead for doing this was still a problem. By waiting for several packets to arrive, and then processing them in a batch, the polling server and our hybrid server were able to handle the same workload with much less overhead. The logical next step is to force a similar waiting interval on the interrupt handler for the network device.
While we have not experimented with deadline-based aperiodic servers in Linux, it appears that our observations regarding the problem of fitting the handling of context-switch overheads to an analyzable theoretical model should also apply to the constant bandwidth server, and that a similar hybrid approach is likely to pay off there.
In future work, we hope to explore additional variations on our approach to achieving a hybrid between polling and sporadic server, to see if we can improve performance under a range of variable workloads. We are considering several different mechanisms, including stochastic ones, for detecting when we should change modes of operation as the system moves between intervals of lighter and heavier load. We also plan to explore other aperiodic servers and determine how much interference preemptions cause. For example, it appears that a constant bandwidth server would suffer the same performance problems as a sporadic server when the workload causes budget fragmentation. We also plan to investigate the preemption interference due to cache eviction and reloading. The threads used in our experiments access relatively small amounts of data and therefore do not experience very large cache interferences. This is not true for all applications, and the cache effects on such applications will need to be bounded. While limiting the number of replenishments does reduce the cache effect, better mechanisms are needed to reduce the ability of a sporadic server to cause cache interference.
Other questions we are considering include whether it is practically feasible to schedule multiple threads using a single sporadic-server budget, and how well sporadic-server scheduling performs on a multi-core system with thread migration.
References
[1] L. McVoy and C. Staelin. lmbench: Portable tools for performance analysis. In USENIX Annual Technical Conference, pages 279–294, Jan. 1996.
[2] B. Sprunt, L. Sha, and J. P. Lehoczky. Aperiodic task scheduling for hard real-time systems. Real-Time Systems, 1(1):27–60, 1989.
[3] M. Lewandowski, M. J. Stanovich, T. P. Baker, K. Gopalan, and A.-I. Wang. Modeling device driver effects in real-time schedulability analysis: Study of a network driver. In Real Time and Embedded Technology and Applications Symposium, 2007. RTAS ’07. 13th IEEE, pages 57–68, Apr. 2007.
[4] M. Danish, Y. Li, and R. West. Virtual-CPU scheduling in the Quest operating system. Real-Time and Embedded Technology and Applications Symposium, IEEE, 0:169–179, 2011.
[5] M. Stanovich, T. P. Baker, A.-I. A. Wang, and M. G. Harbour. Defects of the POSIX sporadic server and how to correct them. In Real Time and Embedded Technology and Applications Symposium, 2010. RTAS ’10. 16th IEEE, pages 35–45, Stockholm, Sweden, Apr. 2010. IEEE Computer Society.
[6] J. Regehr. Inferring scheduling behavior with Hourglass. In Proc. of the USENIX Annual Technical Conf. FREENIX Track, pages 143–156, Monterey, CA, June 2002.
[7] IEEE Portable Application Standards Committee (PASC). Standard for Information Technology - Portable Operating System Interface (POSIX) Base Specifications, Issue 7. IEEE, Dec. 2008.
[8] D. Faggioli, M. Bertogna, and F. Checconi. Sporadic server revisited. In Proceedings of the 2010 ACM Symposium on Applied Computing, SAC ’10, pages 340–345, Sierre, Switzerland, 2010. ACM.
[9] R. J. Bril and P. J. L. Cuijpers. Analysis of hierarchical fixed-priority pre-emptive scheduling revisited. Technical Report CSR-06-36, Technical University of Eindhoven, Eindhoven, Netherlands, 2006.
[10] R. I. Davis and A. Burns. Hierarchical fixed priority preemptive scheduling. In Proc. 26th IEEE Real-Time Systems Symposium, pages 376–385, 2005.
[11] G. Lipari and E. Bini. Resource partitioning among real-time applications. In Proc. 15th EuroMicro Conf. on Real-Time Systems, pages 151–158, July 2003.
[12] S. Saewong, R. R. Rajkumar, J. P. Lehoczky, and M. H. Klein. Analysis of hierarchical fixed-priority scheduling. In ECRTS ’02: Proceedings of the 14th Euromicro Conf. on Real-Time Systems, page 173, Washington, DC, USA, 2002. IEEE Computer Society.
[13] I. Shin and I. Lee. Compositional real-time scheduling framework with periodic model. ACM Trans. Embed. Comput. Syst., 7(3):1–39, 2008.
[14] Y. C. Wang and K. J. Lin. The implementation of hierarchical schedulers in the RED-Linux scheduling framework. In Proc. 12th EuroMicro Conf. on Real-Time Systems, pages 231–238, June 2000.
[15] M. Stanovich, T. P. Baker, A.-I. A. Wang, and M. G. Harbour. Defects of the POSIX sporadic server and how to correct them. Technical Report TR-091026 (revised), Florida State University Department of Computer Science, http://www.cs.fsu.edu/research/reports/TR-100315.pdf
[16] Linux Trace Toolkit Next Generation, http://lttng.org/
[17] L. Abeni, G. Lipari, and G. Buttazzo. Constant bandwidth vs. proportional share resource allocation. In Proc. IEEE Int. Conf. Multimedia Computing and Systems, Florence, Italy, June 1999.
[18] J. Strosnider, J. P. Lehoczky, and L. Sha. The deferrable server algorithm for enhanced aperiodic responsiveness in real-time environments. IEEE Trans. Computers, 44(1):73–91, Jan. 1995.
Carsten Emde
Open Source Automation Development Lab (OSADL) eG
Aichhalder Str. 39, 78713 Schramberg, Germany
[email protected]
Abstract
In the early days of using computers for determinism-critical tasks, processors were mostly suitable for this purpose, since instruction execution was in sync with the clock frequency. This made it possible to correctly predict the execution time of a given code segment. With the rapidly increasing need for processing power, deterministic execution was abandoned in favor of throughput. As a consequence, the peak processing power of today’s state-of-the-art multi-core processors is about 1,000,000 times greater than that of a standard processor 30 years ago. The worst-case performance, however, has only improved by a factor of about 10, and even this may require specific configuration of the processor and the operating system. The main reasons for the lack of progress in worst-case performance are the introduction of caches and energy-saving features. While the negative impact of caching on worst-case performance has been studied in recent years and can now be handled reasonably well, the details of energy-saving are less well known. Although energy-saving most of the time boils down to switching off or at least throttling down unneeded processor components, such mechanisms can be implemented in various ways and locations and are often undocumented. It was, therefore, the aim of this project to investigate the latency behavior of modern energy-saving processors and to provide recommendations on how to disable or circumvent energy-saving.
To investigate the impact of energy-saving, latency measurements were no longer performed in a short closed loop, as when using the cyclictest utility, but with randomly occurring interrupt triggers while the processor was idle. This could lead to long latencies, which we then attempted to reduce through specific processor and Linux kernel configurations.
As a result, we can now recommend a number of individual processor settings and configuration items of the Linux kernel to optimize the worst-case system latency when modern energy-saving processors are used. In some cases, however, it was not possible to eliminate every deleterious effect of energy-saving on a processor’s latency, although we tried very hard. Thus, we urgently appeal to semiconductor manufacturers to provide a way to switch off whatever mechanisms they invent to reduce power consumption, if these adversely affect response time. In many cases, power-saving and fast reaction to asynchronous events exclude each other. It is, however, still possible to obtain a deterministic response while power-saving is enabled, but it must be taken into account that the worst-case latency of such systems may be considerably prolonged.
How to cope with the negative impact of a processors energy-saving features on real-time capabilities?
• Reducing the energy consumption as part of the general ecological imperative
• Reducing the need for fans and preventing damage from defective fans
• Reducing the need for dust filters and preventing damage from filters that were not cleaned or replaced in time
The semiconductor industry is using two completely different approaches to provide processors with reduced energy consumption:
• Using smaller structures and more efficient isolation material to reduce current demand and leakage current
• Slowing down or switching off parts of the processor when idle
While the former approach is generally welcome and normally does not have any impact on the response time of a processor, the latter is relevant in the context of real-time systems that try to achieve a minimum worst-case latency. It is, therefore, important to analyze a processor with respect to the implemented mechanisms for energy saving when selecting and configuring it for a system that relies on real-time computing.
2 Slowing down or switching off parts of the processor
To slow down or switch off parts of the processor, three different general mechanisms are employed:
• Throttling
For the analysis of the effect of throttling on the worst-case latency of a processor, the cyclictest utility could be used in its original version. The utility was run as usual on systems with and without enabled power-saving. At least 100 million test cycles were run to obtain reliable results of the worst-case latency.
For the analysis of the effect of idle states on the worst-case latency of a processor, the cyclictest utility was expanded. In a first step, the -i or --interval= option that is used to define the duration of the measurement cycle was allowed to accept a range of a lowest and a highest interval duration. When a range is specified using this newly defined option format, an individual duration is determined for every test interval by using a uniformly distributed logarithmic random value between the lowest and the highest duration specified. The interval is displayed in the output line to monitor the behavior of this functionality.
In a second step, a two-dimensional histogram was implemented that is activated when the -h or --histogram= option is specified. This histogram stores the frequencies of latency samples per interval duration. This makes it possible to differentiate recorded latency values with respect to the duration of the preceding idle time of the processor. It is expected that the latency values would not depend on the duration of the test interval in processors without any (or with completely disabled) power-saving mechanism, whereas in processors with active power-saving, the latency values would be higher the longer the preceding period of quiescence was.
scaling governor will select it, e.g. to set CPU #0 to full speed:
[Figure: Performance (1,400 MHz) vs. On-demand (583 MHz)]
4.3 Recommendation
5 Sleep states
microseconds to 1 second was specified, the number of histogram cells was set to 100, which is equivalent to 100 microseconds, and a total of 40,000 cycles per core was preset:
cyclictest -m -Sp99 -i100-1000000 -l40000 \
  -h100
Figure 3 shows the result of such a measurement on an Intel Core i3-2100T processor running at a maximum clock frequency of 2,500 MHz. All power-saving features were enabled at BIOS level. The interval is scaled logarithmically and multiplied by 10; thus, the scale value of 20 is equivalent to 10^2 microseconds and the scale value of 60 is equivalent to 10^6 microseconds = 1 second. It can be seen that the worst-case latency increases with the duration of the preceding measuring interval. This is very probably the result of entering a sleep state when the processor is idle for a certain amount of time.
[Plot: log10 of frequency vs. latency (us) and 10*log10 of cycle interval (us)]
FIGURE 4: Frequency of latency samples with respect to the duration of the preceding interval of quiescence (data obtained with a modified version of the cyclictest utility), Intel Core i3-2100T @2,500 MHz, all power-saving features disabled
5.2 Recommendation
cious to reduce the computing speed while in idle state should be disabled. In addition, the kernel pa-
[Plot: log10 of frequency vs. latency (us) and 10*log10 of cycle interval (us)]
6 Undisclosed internal mechanisms
[Plot: log10 of frequency vs. latency (us) and 10*log10 of cycle interval (us)]
It is also quite conceivable that the semiconductor industry will develop even more sophisticated mechanisms of power saving that may interfere with real-
mains to appeal to semiconductor manufacturers to provide a way to switch off whatever mechanisms they invent to reduce power consumption, at least if they adversely affect response time. On the other hand, field-programmable gate arrays have be-
FLOSS in Safety Critical Systems
Abstract
Designing and implementing safety-critical systems is very difficult because they must remain continuously in a correct operational state even though they are deployed in hostile environments. An error during either the design or the implementation phase can have significant impacts and consequences. To avoid such issues, failure cases must be clearly identified and handled by software engineers to prevent any propagation from one faulty component to another. For that purpose, good practices and standards are applied during the development process, from the specifications to the implementation.
However, despite all existing efforts, bugs are still introduced. They are introduced at different levels of the development process: either in the specifications (as in the Mars Climate Orbiter mission, whose failure was due to a mix-up between metric and imperial units) or in the implementation (as in the Ariane 5 launch, where a wrong assumption was made about a data type so that the system generated an overflow).
Over the years, several solutions have been designed to address such issues. However, they rely on different system representations and are applicable at different levels of the design process, so that their use could be difficult and may lead to design inconsistencies. Consequently, these problems have to be avoided and the use of these solutions made more consistent.
In this paper, we present our tool-chain for system design, validation, implementation and certification. It relies on a modeling notation to capture both software and hardware concerns. The use of a single notation ensures specification consistency and avoids the potential errors that arise when different languages are used to specify the same system aspect. We detail the support of this process in The Assert Set of Tools for Engineering (TASTE) development tool-chain.
On integration of open-source tools for system validation, example with the TASTE tool-chain
238
FLOSS in Safety Critical Systems
[Figure: tool-chain work-flow (Types mapping, Configuration & deployment, Operating system, C/Ada code, OK/KO validation results)]
3. Certification: implementation is either simulated or executed on the target to check its behavior correctness and standards (such as
• reproduces its behavior (by instrumenting the code and producing a Value Change Dump (VCD) file [6] to be used with GTKWave [5]);
• produces code coverage reports using the COUVERTURE tool-set [4] (specific free-licensed tools from AdaCore that aim at supporting code coverage using a specifically tailored version of QEMU [16]).
The use of a single notation (AADL [9]), processed by dedicated tools for each development aspect, makes the overall process more consistent. In addition, automating model processing avoids issues of the usual development process and ensures requirements traceability. Finally, while system feasibility and requirements are automatically checked during the development process, these tools also provide metrics (such as code coverage) that can be used for system certification.
The next sections focus on the validation and certification functions of our tool-chain:
the overall development process, making it more consistent. Extension mechanisms allow us to tailor the language to our needs:
• The properties extension mechanism is used to define specific requirements from textual specifications in the AADL model (for example, to model memory concerns such as stack or heap size).
• The annex languages mechanism is used to associate our in-house AADL validation tool (REAL) to check requirements enforcement. It processes models according to their component hierarchy and checks that system requirements are met (for example: can a process P1 with 1 MB of RAM contain three threads that each require an 800 KB stack?).
While several modeling languages already exist for the specification of real-time embedded systems, none provides the ability to capture both hardware and software aspects with such flexibility. That is why we chose this language.
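The REAL example above (three threads needing 800 KB stacks each inside a process limited to 1 MB) reduces to summing requirements along the component hierarchy and comparing the sum with the enclosing budget. The Python sketch below illustrates that kind of check under simplifying assumptions: the `Thread` and `Process` classes and `check_stack_budget` are hypothetical stand-ins for AADL components, not REAL's syntax or API, and only stack memory is counted.

```python
from dataclasses import dataclass, field

KB = 1024
MB = 1024 * KB

@dataclass
class Thread:
    name: str
    stack_size: int  # bytes required for the thread stack

@dataclass
class Process:
    name: str
    memory: int  # bytes of RAM available to the process
    threads: list = field(default_factory=list)

def check_stack_budget(process):
    """Return (ok, required): ok is True when all thread stacks fit in the process memory."""
    required = sum(t.stack_size for t in process.threads)
    return required <= process.memory, required

# Hypothetical model mirroring the question in the text.
p1 = Process("P1", memory=1 * MB,
             threads=[Thread(f"T{i}", stack_size=800 * KB) for i in range(1, 4)])
ok, required = check_stack_budget(p1)
print(f"{p1.name}: requires {required // KB} KB, has {p1.memory // KB} KB -> {'OK' if ok else 'KO'}")
# P1: requires 2400 KB, has 1024 KB -> KO
```

A single 800 KB thread would pass the same check, so the rule catches the integration error rather than any individual component.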
aspects into account, offering a convenient way to analyze distributed systems.
FIGURE 4: MAST scheduling analyzer interface
Connect scheduling analysis tools with AADL
To automate scheduling analysis, TASTE transforms the system specifications (Interface View and Deployment View) into a new description that can be processed by MAST or Cheddar. First, it translates the AADL models into a Concurrency View: a single AADL model that merges both software and hardware aspects and contains all execution entities (tasks, shared variables, etc.) with their scheduling constraints (scheduling algorithm of processors, locking policy for shared variables, etc.).
Then, an appropriate code generator (Ocarina [10]) transforms this Concurrency View into a new representation suitable for Cheddar or MAST. In practice, it translates AADL language constructs into an XML representation that can be processed by either Cheddar or MAST. As a result, this export function of our tool-chain makes it possible to automate scheduling analysis with both tools from the same specification (AADL models).
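The export step described above maps each execution entity of the Concurrency View onto an element of the analyzer's input format. A minimal sketch of that idea follows, using Python's standard `xml.etree.ElementTree`. The `Task` record, the task values and the element/attribute names (`system`, `processor`, `task`, `period`, ...) are hypothetical; they do not reproduce the real Cheddar or MAST XML schemas that Ocarina targets.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    processor: str
    period_ms: int
    wcet_ms: int
    priority: int

def export_concurrency_view(tasks, scheduler="fixed_priority"):
    """Group tasks by processor and serialize them as a simple (hypothetical) XML document."""
    root = ET.Element("system")
    processors = {}
    for task in tasks:
        if task.processor not in processors:
            processors[task.processor] = ET.SubElement(
                root, "processor", name=task.processor, scheduler=scheduler)
        ET.SubElement(processors[task.processor], "task", name=task.name,
                      period=str(task.period_ms), wcet=str(task.wcet_ms),
                      priority=str(task.priority))
    return ET.tostring(root, encoding="unicode")

# Made-up timing values; the case study does not give them.
xml_text = export_concurrency_view([
    Task("sensor", "acq_board", 1000, 50, 2),
    Task("filter", "filter_board", 1000, 100, 1),
])
print(xml_text)
```

Generating both analyzer inputs from one in-memory model is what keeps the two analyses consistent; only the serializer would differ per tool.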
TASTE provides its own interface with gprof, as illustrated in figure 6. It parses the profiling results itself and produces an execution report with the execution time and the number of executions of each function. Using this report, engineers check that execution traces comply with system requirements.
5.2 Specifications compliance enforcement
Execution profiling provides metrics and data that can detect some erroneous execution cases (a function called too many times, a call that should not happen, etc.), but may not be sufficient to check implementation correctness. In particular, implementation validation requires checking that the implementation is consistent with the specifications (AADL models). This consists in monitoring system events and checking their compliance with the model. For that purpose, TASTE provides functions to monitor system events at run-time and create appropriate metrics that can be compared with the specifications. To do so, it instruments the generated application with profiling instructions that produce VCD [6] files at run-time (an example of reported events is shown in figure 7) with the following metrics:
• Task activation time
• Data sent/received through task ports
• Shared data usage (semaphore/mutex acquisition and release)
Once produced, these files can be displayed by programs such as GTKWave [5], which depict system events in a graphical interface for analyzing system behavior. This makes it possible to check run-time behavior consistency with system specifications (for example
Standards such as DO178B [21] (for avionics systems) or ECSS [22] (for aerospace applications) require safety-critical systems to achieve a predefined code coverage, depending on their criticality level.
To do so, different methods are commonly used, but most of the time they require manual instrumentation or inspection of the application code. Code instrumentation is intrusive: the code under inspection is not the one that will be deployed, so validation results may not be relevant. In addition, manual inspection remains error-prone due to human factors.
FIGURE 8: Work-flow of the COUVERTURE tool-set (Binary, QEMU, Exec trace file, xcov, Coverage report)
To cope with these issues and provide an accurate coverage analysis, TASTE relies on the COUVERTURE tool-set [4], a code coverage analyzer released under free-software license terms. It relies on two main tools that produce coverage reports (as shown in figure 8):
1. A tailored version of QEMU [16] traces all executed instructions when executing the system.
2. An analysis tool, xcov, compares executed instructions with the program under execution and produces a coverage analysis report.
By tracking executed instructions and mapping them to the source files, xcov produces a complete coverage report, as shown in figure 9.
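The mapping from executed instructions back to source lines can be illustrated with a small sketch. Here the address-to-line table and the execution trace are made-up inputs (in practice they would come from debug information and the QEMU trace), and the function computes plain line coverage only; it is not xcov's implementation.

```python
def line_coverage(addr_to_line, executed_addrs):
    """Map executed instruction addresses to source lines and report line coverage.

    addr_to_line: dict {instruction address: (file, line)} for every instruction
    executed_addrs: iterable of addresses observed in the execution trace
    """
    all_lines = set(addr_to_line.values())
    covered = {addr_to_line[a] for a in executed_addrs if a in addr_to_line}
    uncovered = sorted(all_lines - covered)
    ratio = len(covered) / len(all_lines) if all_lines else 1.0
    return ratio, uncovered

# Hypothetical program: four source lines compiled to five instructions.
addr_to_line = {
    0x1000: ("filter.c", 10),
    0x1004: ("filter.c", 10),
    0x1008: ("filter.c", 11),
    0x100C: ("filter.c", 12),  # only reached when the data is invalid
    0x1010: ("filter.c", 13),
}
trace = [0x1000, 0x1004, 0x1008, 0x1010]
ratio, uncovered = line_coverage(addr_to_line, trace)
print(f"line coverage: {ratio:.0%}, never executed: {uncovered}")
```

Because the trace comes from the emulator rather than from instrumented code, the binary being measured is the one that would be deployed, which is the property the text contrasts with intrusive instrumentation.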
It details the execution of each line of code so that developers can assess whether some blocks could be removed.
However, to evaluate the system implementation, this coverage functionality should be integrated with a test framework that executes the generated applications with different input values representative of a real environment. This would provide a better assessment of system quality, force each condition/decision of the code to be executed and lead to a better coverage analysis.
1. A sensor for temperature acquisition.
2. A filter for bad data detection.
3. An average computer that receives each new temperature value from the filter and prints the average temperature.
Each function is deployed on top of a Real-Time Operating System, executed on a single processor:
• The sensor function is deployed on a LEON2 processor with RTEMS.
• The filter function is executed on an Intel i386 processor with a Linux operating system.
• The average function is deployed on a LEON2 processor with RTEMS.
Finally, to enable communication between the sensor/filter and filter/average functions, the processors are connected using a SpaceWire bus, as shown in figure 11.
it filters the data and sends it when it is considered valid. For the needs of this simulation, 50% of the received data are considered correct, so that this function sends data to the average function every two seconds.
• The average function is also sporadic and is activated when receiving incoming data from the filter. As the filter function sends data every two seconds, this function's execution follows this period.
FIGURE 12: Scheduling analysis result for node acquisition
Finally, from both models, we can validate some aspects of the system prior to implementation efforts. In this case study, we run a schedulability feasibility test using Cheddar [7], as illustrated in figure 12. The scheduling feasibility test is based on simulation and is performed for each processor of the system (figure 12 illustrates the result for the acq board).
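A simulation-based feasibility test, as described above for Cheddar, can be sketched by simulating the schedule over one hyperperiod and recording every deadline miss. This is a simplified illustration, not Cheddar's algorithm. It assumes periodic tasks released together at time 0, deadlines equal to periods, integer time units, preemptive rate-monotonic priorities and no blocking. The task parameters below are hypothetical, since the case study does not give execution times.

```python
from functools import reduce
from math import gcd

def hyperperiod(periods):
    return reduce(lambda a, b: a * b // gcd(a, b), periods)

def simulate_rm(tasks):
    """tasks: list of (name, period, wcet) in integer time units.

    Simulates preemptive rate-monotonic scheduling over one hyperperiod with all
    tasks released at time 0 and deadlines equal to periods.
    Returns a list of (task name, deadline) for every missed deadline.
    """
    order = sorted(tasks, key=lambda t: t[1])      # shorter period -> higher priority
    horizon = hyperperiod([p for _, p, _ in tasks])
    remaining = {name: 0 for name, _, _ in tasks}
    misses = []
    for now in range(horizon):
        for name, period, wcet in order:
            if now % period == 0:
                if remaining[name] > 0:            # previous job not finished in time
                    misses.append((name, now))
                remaining[name] = wcet             # release a new job
        for name, _, _ in order:                   # run the highest-priority ready job
            if remaining[name] > 0:
                remaining[name] -= 1
                break
    # Jobs still pending at the end of the hyperperiod miss their deadline as well.
    misses += [(name, horizon) for name, _, _ in tasks if remaining[name] > 0]
    return misses

print(simulate_rm([("sensor", 10, 3), ("monitor", 20, 5)]))   # feasible task set
print(simulate_rm([("a", 4, 3), ("b", 6, 2)]))                # overloaded task set
```

Simulation gives an exact answer for the modeled task set rather than a sufficient-only bound, at the price of running over the whole hyperperiod.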
Use of such a tool-chain strengthens the development process and makes it more robust and reliable. Moreover, as potential errors are discovered early in the development process and integration issues are likely to be reduced, development costs are expected to decrease significantly.
Further work would cover other aspects of safety-critical systems development. In particular, our tool-chain could also provide additional guidance for enforcing safety-critical standards (such as DO178 or ECSS), by providing documentation generation facilities or additional implementation code validation (coding rules to be checked, etc.).
7.1 Perspectives
[2] The ASSERT project, http://www.assert-project.net
[3] GNU binutils, http://www.gnu.org/software/binutils/
[4] The Couverture project
[5] GTKWave, http://gtkwave.sourceforge.net/
[6] Value Change Dump file format
[7] Cheddar scheduling analyzer
[8] MAST scheduling analyzer, http://mast.unican.es/
Nicholas Mc Guire
Distributed and Embedded Systems Lab
SISE, Lanzhou University
[email protected]
Abstract
Utilizing computers for safety critical systems, notably contemporary superscalar multi-cores, let alone NUMA systems running general purpose operating systems like GNU/Linux, is highly contested in the safety community - its hopes still rest on determinism and KISS. While keeping things simple in the safety related components is undoubtedly preferred, it is questionable whether keeping the hardware model simple is realistic - notably as the divergence of reality from the model with respect to determinism is already dramatic for widely used general-purpose single-core CPUs. Further, deterministically covering the impact of all complex software components is not doable with an economically tolerable effort (whether it is technically doable is a different issue).
The consequence of this belief in determinism is, in our opinion, a useless fight against complexity and non-determinism - two inherent properties of modern hardware/software systems. Quite to the contrary, we propose to utilize the properties of complex systems to enhance safety related systems. This seemingly paradoxical approach can be seen as an attempt to take the bull by the horns, as it seems inevitable that the time of simple CPUs and black-box proprietary operating systems, which continue to entertain the illusion of determinism, is coming to an end.
Safety mechanisms drawing enhancements from underlying complexity that we see as potentially suitable for building safety related systems are:
• computation: inherent diversity
• data: mapping the value domain to complex data representations
• time: loose coupling, inherent randomness
and we are quite sure that this short list is incomplete at this point.
In this article we describe an attempt at the second category, called dynamic data types, which essentially combine the value domain with the temporal properties of data to map data to a value in the frequency domain rather than to a value in the time domain. We outline the concept of dynamic data types and a rationale for why it seems a promising approach for covering particular fault classes. Finally, we describe how building simple logic utilizing dynamic data types on complex systems can nevertheless yield a safe system, and thus allow safety related logic to be co-located with non-safety related general purpose applications and services on a single contemporary system.
Keywords: Safety logic, complex systems, dynamic data types
Safety logic on top of complex hardware software systems utilizing dynamic data types.
• permanent bit faults (a memory cell, a shorted relay, etc.)
• transient bit faults (the infamous cosmic ray - or more earthly EMC issues)
• systematic operational faults (e.g. the F00F bug in the Pentium I)
• transient operational faults (e.g. critically low voltage)
• temporal faults (e.g. clock drifts or clock updates)
This is a bit coarse grained but serves well for the discussion here. It should also be noted that failures are only one possible cause of hazards, so this list is in principle incomplete [5].
All of these fault classes are then mitigated by different technologies (see [1] IEC 61508 part 7 for an overview of available technologies), starting from redundancy at different levels, adding diversity in hardware or software, and introducing protocols that mitigate the consequences of these faults (e.g. using sequence numbers, CRC, etc. [4]). All of these safety related mitigations of the respective fault classes may well be technically suitable, but the question to ask first is: why do these faults exist in the first place - and if they are inherent to digital systems, can they not be mitigated at a generic level rather than growing the complexity at the application level?
There have been some indirect efforts to mitigate these issues at a generic level; think of the many IEC 61131 [3] runtime systems that allow focusing again on the relatively simple logic while handling the safety related issues under the hood. The assumption is that for a simple set of operations a complete list of potential faults can be established and mitigated. The methods used for this mitigation, though, are again relatively specific and in general limited to ladder-logic or boolean-logic constructs - so the question remains: what is the root cause, and can it be mitigated at a fundamental level?
2 Naive case study
This case study might seem quite trivial, but from a safety perspective we believe it demonstrates the potential advantage of the concept introduced in this paper - dynamic data types.
switch <---> actuator <---> indicator lights
(up/down)                   (red/green)
If this is implemented with common means, then we have a signal level (voltage or digital makes no difference here) indicating the up/down position of the switch, an actuator that operates on such a signal, and indicator lights that provide operator feedback on current actions.
Possible single faults (simple model):
• up stuck: the actuator "sees" up pushed and lifts the door; this is fed back to the operator via the indicator light, but the operator might not be able to close it, and the actuator can't actually determine that the input is unintended or invalid. The hazard is probably limited as the indicator is correct and humans thus can respond according to preset procedures to achieve a "safe state" - long term consequences of "up" being persistent can obviously only be meaningfully interpreted in a particular context.
• down stuck: the actuator "sees" down permanently and closes the door - feedback to the operator indicates a closed door - safe state assumed - thus no hazard, but availability is impacted.
• actuator stuck: the actuator does not respond to input and stays in whatever state it happens to be in - this may be visible in the indicator light, but the operator would not see in the signal that the actuator is damaged - an additional diagnostic input would be needed (e.g. an indicator for the actuator operation status itself). Depending on the actuator stuck position this may or may not be safe - again context would be needed.
• indicator stuck green: this would not allow the operator to detect a potentially hazardous situation, and exposure to a hazard could occur based on procedures (green indicating entering permitted). The problem would not be detected until an operator attempts to change state by requesting "up" and not getting any response (indicator light changing to red) - but this would rely on human observation, and even if detected the diagnostics would require additional means.
• indicator stuck red: primarily impacts availability.
Note that we are not considering any temporal issues here - there is of course a whole set of temporal failure modes as well - nor are we concerned with
the (often dominant) non-technical failures like management and safety culture failures.
2.1 Adding diagnostics
As noted above, there are a number of faults that could be diagnosed if additional information were available from additional sensors - but the obvious disadvantage is that additional sensors mean reduced availability on the one hand, and the problem of accumulated faults for any indicators of "rare events" on the other - notably the latter is problematic from a safety perspective. A further issue is simply that the system, which was intended to be kept simple, starts becoming more complex than would be needed for the pure logic.
The above example exhibits two main problems:
• lack of indication of an "invalid" state for all components
• inability to identify the cause without additional sensors
If we look at the potentially hazardous situations, these are associated with the problem of single points of failure as well as the inability to signal "invalid state".
Before considering methods to implement redundancy, signal diversity, safety communication, etc., we would like to look at the root causes and then derive requirements that could potentially eliminate them altogether.
violation of value domain constraints. That is to say, if a register contains an integer and a bit is flipped, it still contains an integer and is within range - the bit flip is not noticeable in the data type. The reason for this is that the value domain is dense (even though it is discrete - dense in the sense that there are no "holes" in the representation). Conversely, analog signals had to be constructed with a granularity that ensures detectability of divergence, so the effectively usable analog signal spectrum could not be dense.
SEU
Single event upsets are random, unpredictable alterations of some state of the system; they can only be covered by some form of fault tolerance or fault detection/reaction, and they are in principle not testable as they are not visible until they have actually occurred. Where the affected resource may stay in the altered state, the alteration may be detectable if additional information about the expected state is available.
logic high        .---- perceived value
                  |
logic low ---------- - - intended value
                  ^
                 SEU
Omission
Omission faults are due to the unavailability of some resource. They are in principle detectable, though the effort for this may be high - note that omission faults may also have systematic characteristics (e.g. wire cross-talk).
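The density argument above can be made concrete with a toy comparison: in a 16-bit integer every bit pattern is a legal value, so a single bit flip silently yields another in-range value, while a sparse encoding leaves "holes" into which a flip falls and becomes detectable. The Python sketch below only illustrates that argument. The two code words chosen for the sparse boolean (0x5A and 0xA5, which differ in all eight bits) are an arbitrary assumption, and this static encoding is not the dynamic data type approach, which moves the value into the frequency domain instead.

```python
TRUE_CODE, FALSE_CODE = 0x5A, 0xA5   # assumed code words, Hamming distance 8

def flip(value, bit):
    """Simulate a single event upset on one bit."""
    return value ^ (1 << bit)

def decode_sparse(word):
    """Return True/False for a valid code word, None for an invalid ("hole") pattern."""
    if word == TRUE_CODE:
        return True
    if word == FALSE_CODE:
        return False
    return None

# Dense domain: a 16-bit integer stays in range after any single bit flip.
speed = 1200
dense_results = [0 <= flip(speed, b) <= 0xFFFF for b in range(16)]

# Sparse domain: every single bit flip of a code word falls into a hole.
sparse_results = [decode_sparse(flip(TRUE_CODE, b)) for b in range(8)]

print(all(dense_results))                       # flips pass a range check unnoticed
print(all(r is None for r in sparse_results))   # flips are detected as invalid
```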
• all elements may only react to valid states (implying active states)
• SEUs shall not impact the value domain (general resilience against SEUs)
• signals must encapsulate temporal properties along with value properties
frequencies, with a tolerated range. In this example we mapped:
Logic    Dynamic representation
TRUE     6 Hz
FALSE    12 Hz
Stuck-at failures are a bit more critical; selecting proper frequencies in the generated intermediate spectrum in combination with band-pass filtering (in this case Butterworth filters) allows covering both omission and stuck-at faults.
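The mapping TRUE = 6 Hz, FALSE = 12 Hz can be illustrated with a short Python sketch. Decoding checks which of the two tones is present; a signal with neither tone (omission, or stuck at a constant level) or with both is reported as invalid (`None`). As an assumption made for brevity, the Goertzel algorithm (a single-bin DFT) stands in for the Butterworth band-pass filters mentioned above, and the sample rate and detection threshold are arbitrary choices.

```python
import math

FS = 100                   # sample rate in Hz (assumed)
TRUE_HZ, FALSE_HZ = 6, 12  # mapping from the table above

def encode(value, duration_s=1.0):
    """Represent a boolean as a sine wave at its assigned frequency."""
    freq = TRUE_HZ if value else FALSE_HZ
    n = int(FS * duration_s)
    return [math.sin(2 * math.pi * freq * i / FS) for i in range(n)]

def goertzel_amplitude(samples, freq):
    """Amplitude of the `freq` component, computed with the Goertzel algorithm."""
    n = len(samples)
    k = round(n * freq / FS)                     # nearest DFT bin
    coeff = 2 * math.cos(2 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    power = s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2
    return 2 * math.sqrt(max(power, 0.0)) / n

def decode(samples, threshold=0.5):
    """Return True/False for a valid signal, None for an invalid one."""
    has_true = goertzel_amplitude(samples, TRUE_HZ) > threshold
    has_false = goertzel_amplitude(samples, FALSE_HZ) > threshold
    if has_true != has_false:
        return has_true
    return None  # neither tone (omission, stuck-at level) or both tones present

print(decode(encode(True)), decode(encode(False)))
print(decode([1.0] * FS), decode([0.0] * FS))   # stuck-at high, omission
```

A constant level carries no energy at either tone, so stuck-at and omission faults both land in the invalid state instead of being read as a legal value.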
4.2 Core concepts of dynamic data types
From a safety perspective, the dynamic data types concept can be summarized as follows:
• variable values are bound to a frequency spectrum and so conserve the temporal information of the signal
• compromising dynamic data types would require arbitrarily complex correlated alterations of data/code to yield a valid signal
• increased system complexity reduces the probability of false-positive signal generation
Essentially, the goal of dynamic data types is to "scale" with system complexity growth, which we consider inevitable.
[Figure: DDT logic signal chain (inputs IN A and IN B with True: 1050 Hz, False: 1350 Hz; Multiply; Frequency Switch 2.1 kHz; positive/negative voltage and sin-wave level detection; voting logic; Computation Node; Actuator; Output True/False)]
lated system - that is, some MCU - or could be run on the available computing capacity, provided adequate isolation can be enforced as stated in IEC 61508-3 Clause 7.4.2.8/7.4.2.9 [2]. To provide such simple logic, one can use dynamic data types, which allow creating reliable logic from signal summations followed by digital filtering to extract the intended logic operation. This is achieved by multiplying the inputs and then multiplying the resulting sum/difference frequencies again by a constant helper frequency.
Inputs
A     B     A*B      H1    Spectrum of A*B*H1
6     6     0, 12    27    15, 27, 39
6     12    6, 18    27    9, 21, 33, 45
12    6     6, 18    27    9, 21, 33, 45
12    12    0, 24    27    3, 27, 51
XOR (all frequencies in Hz)
The output frequency spectrum now allows a band-pass filter to extract A XOR B from the above spectrum. By changing the input H1 (a helper frequency), the frequency spectrum generated by multiplying the two inputs with the helper frequency allows extraction of A AND B. By following the input multiplication stage with a digital filter (in our case a 4th or 5th order FIR filter), one can then extract the desired compositional logic term from the signal.
Inputs
A     B     A*B      H1    Spectrum of A*B*H1
6     6     0, 12    9     3, 9, 21
6     12    6, 18    9     3, 9, 15, 27
12    6     6, 18    9     3, 9, 15, 27
12    12    0, 24    9     9, 15, 33
AND (all frequencies in Hz)
While this may seem quite a complex way of doing this, it is precisely the complexity introduced that is the protective mechanism against systematic and stochastic faults in the system: the signal's value is encoded in the frequency of the signal, and thus transforming a signal into a different logical signal (e.g. turning a signal of N Hz representing on into a signal of M Hz representing off) would require a highly complex sequence of synchronous faults to appear. All faults that appear randomly will either be within the tolerance of the signaling system (filter band-pass tolerances) or will cause the system to enter an identifiable invalid state and lead to a safety reaction. For systematic faults it may be a bit more complicated to argue for such an approach, but equally, any systematic fault at the lower level (e.g. operating system, system libraries) would have to generate a very complex and synchronous response, and not merely a local fault like the infamous F00F bug [6].
The claim here thus is that such relatively low-complexity safety logic elements can be run on an unsafe OS while retaining the safety properties of the logic. It should be stressed, though, that we are not claiming any mitigation of systematic faults in the implementation of the safety logic itself - rather, only mitigation of systematic faults in the underlying generic components is claimed. With the overall complexity of the safety logic and the signal processing beneath it being relatively low, we think that it is an absolutely realistic target to implement such logic according to suitable standard procedures and under control of adequate safety management, to ensure with adequate probability that the residual failure rate of the safety logic is sufficiently low. As an example, the DDT logic presented here takes two inputs A and B and a control input H and provides an output of A o B and NOT A o B, with the operation being AND or XOR depending on the setting of H.
[Figure: DDT-Logic block with inputs A, B, H and outputs A o B, NOT A o B]
FIGURE 5: composable logic element logic structure
Note that the frequencies selected here during simulation are technically not sensible, but the principles don't change when transposed to other ranges of the spectrum - any real-life system would operate in the kHz range.
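The two spectrum tables can be checked numerically. The Python sketch below encodes A and B with the 6 Hz/12 Hz mapping, multiplies them with the helper frequency H1 and tests for the pass-band frequency that, according to the spectra above, only appears for the desired logic term: 9 Hz for XOR (H1 = 27 Hz) and 21 Hz for AND (H1 = 9 Hz). As an assumption, an ideal single-bin DFT replaces the 4th/5th order FIR filter used by the authors, and the sample rate and detection threshold are arbitrary.

```python
import math

FS = 200                    # sample rate in Hz (assumed); 1 s of samples -> 1 Hz DFT bins
TRUE_HZ, FALSE_HZ = 6, 12   # logic mapping used in the tables above

# Helper frequency H1 and pass-band (marker) frequency per operation, read from the
# spectra above: with H1 = 27 Hz, 9 Hz appears only when A != B; with H1 = 9 Hz,
# 21 Hz appears only when A = B = TRUE.
OPERATIONS = {"XOR": (27, 9), "AND": (9, 21)}

def tone(freq_hz):
    return [math.sin(2 * math.pi * freq_hz * n / FS) for n in range(FS)]

def amplitude_at(samples, freq_hz):
    """Single-bin DFT: an idealized band-pass filter centred on freq_hz."""
    re = sum(x * math.cos(2 * math.pi * freq_hz * n / FS) for n, x in enumerate(samples))
    im = sum(x * math.sin(2 * math.pi * freq_hz * n / FS) for n, x in enumerate(samples))
    return 2 * math.hypot(re, im) / len(samples)

def ddt_gate(a, b, op, threshold=0.1):
    """Evaluate A op B by mixing frequency-encoded inputs with H1 and band-pass filtering."""
    h1, marker = OPERATIONS[op]
    fa = TRUE_HZ if a else FALSE_HZ
    fb = TRUE_HZ if b else FALSE_HZ
    mixed = [x * y * z for x, y, z in zip(tone(fa), tone(fb), tone(h1))]
    return amplitude_at(mixed, marker) > threshold

for op in OPERATIONS:
    for a in (False, True):
        for b in (False, True):
            print(op, a, b, ddt_gate(a, b, op))
```

The product of three sines splits into components of amplitude 1/4 at each sum/difference frequency, so the marker tone is either clearly present or absent; only the choice of H1 changes which logic function is selected.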
ogy in a digital world - but with the advantage of retaining the signal transmission properties and the ability to manipulate the signals in software.
5 Conclusion
Essentially, we believe that safety needs methods that build on complexity - ideally appreciating the complexity of underlying systems - without compromising safety principles. We see little point in a continuous fight against the ever increasing complexity of system software and hardware. The question of "should it be done in software" is legitimate in many cases, but there is also little point in denying the trend towards more complex software/hardware systems and the aggregation of functions of different safety levels in one system. If the safety community does not find answers to the complexity growth that fit current technologies and goes on recommending simple systems with simple or no operating systems, this will eventually backfire, as we will see the expertise and experience, as well as the field data, of these systems fade into negligence - resulting in degraded safety in the long run.
We are not claiming that dynamic data types are THE answer to the problem, but we do believe that they are an example of the potential that lies in the paradigm change of switching to "capitalize on complexity" rather than "fight complexity".
As Leveson notes, software can affect system safety in two principal ways:
• it can exhibit behavior in terms of output value and timing that contributes to the system reaching a hazardous state
• it can fail to recognize and handle hardware failures that it is required to control or respond to in some way.
As safety is a system property, dynamic data types can't in principle guarantee software safety - but we believe dynamic data types can provide reliable mitigations for both of the above-mentioned ways in which software can impact system safety. Thus, while the implementation of dynamic data types (the software components as well as the frequency/spectrum-gap selection) is directly safety critical, such a system can be allowed to run on an unsafe operating system/hardware due to the inherent detectability of induced faults. Thus the required independence from the co-located software components can potentially be provided.
While we see dynamic data types as a potential solution to some of the typical safety logic needs, we are well aware that this is not yet a mature concept - rather, we hope that we will find opportunities to study this approach in more detail. Independent of the suitability of the dynamic data type concept, though, we believe that it is time to start thinking about how to utilize complexity and inherent non-determinism for safety related systems rather than continuing to rely on an abstract model of computing that is crumbling - if not vanishing.
References
[1] IEC 61508-7 Ed 1, Functional safety of electrical/electronic/programmable electronic safety-related systems, Part 7: Overview of techniques and measures, International Electrotechnical Commission, 1998
[2] IEC 61508-3:1998 Ed 1, Functional safety of electrical/electronic/programmable electronic safety-related systems, Part 3: Software requirements, International Electrotechnical Commission, 1998
[3] IEC 61131
[4] EN 50159-1, Railway applications - Communication, signalling and processing systems, Part 1: Safety-related communication in closed transmission systems, CENELEC, 2003
[5] Safeware: System Safety and Computers, Nancy Leveson, Addison-Wesley, 1995
[6] Pentium F00F bug, Wikipedia, May 2010, http://en.wikipedia.org/wiki/0xF00FC7C8a
[7] The Coded Microprocessor Certification, Patrick Ozello, SafeComp, 1992
Julien Delange
European Space Agency
Keplerlaan 1, 2201AG Noordwijk, The Netherlands
[email protected]
Laurent Lec
MakeMeReach
23 rue de Cléry, 75002 Paris, France
[email protected]
Abstract
High-integrity systems must be designed to ensure reliability and robustness properties. They must operate continuously, even when deployed in hostile environments and exposed to hazards and threats. To avoid any potential issue during execution, they are developed with particular care. For that purpose, specific standards define methods and rules to be checked during the development process. Dedicated execution platforms must also be used to reduce potential errors. For example, in the avionics domain, the DO-178B standard defines the quality criteria (in terms of performance, code coverage, etc.) to be met according to the software assurance level. ARINC653 specifies services for the design of safe avionics systems using partitioning mechanisms.
However, despite those specific methods and tools, errors are still introduced in the implementation of high-integrity systems. In fact, their complexity, due to the large number of collocated functions, complicates their analysis, design and even configuration & deployment. In addition, an error may lead to safety or security threats, which is especially critical for such systems.
Moreover, existing tools and software are released under either commercial or proprietary terms. This hinders the identification and fixing of potential security/safety issues while also reducing the potential user audience.
In this paper, we present POK, a kernel released under the BSD license that supports software isolation with time & space partitioning for the implementation of high-integrity systems. Its configuration is automatically generated from system specifications to avoid potential errors related to traditional code production processes. System specifications, written using AADL models, are also analyzed to detect any design error prior to implementation efforts.
POK, an ARINC653-compliant operating system released under the BSD license
computation node, reducing the hardware to deploy and thus the overall system complexity.
However, collocating several functions on the same processor brings new problems. In particular, this integration must ensure that each function is executed as if it were deployed on its own processor. In other words, the execution run-time must provide an environment similar to the one provided by a dedicated processor. In addition, interactions between collocated functions must be analyzed, especially for safety-critical systems where integrated functions may be classified at different security/safety levels.
The next section details the problem and our proposed approach. It also presents other work related to this topic.
purpose, functions are separated within partitions. These are under the supervision of a dedicated kernel that provides execution separation between partitions using time & space isolation:
• Time isolation means that processing capacity is allocated to each partition so that it is executed as if it were deployed on its own processor.
• Space isolation means that each partition is contained in a dedicated memory segment so that one partition cannot access the segment of another. However, partitions can communicate through dedicated channels under kernel supervision, so that only authorized channels are granted at run-time.
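Time and space isolation, as described above, can be pictured with a small model in the spirit of ARINC653: a fixed major frame divided into partition slots, one memory segment per partition, and an explicit set of authorized inter-partition channels. The Python sketch below is a hypothetical illustration of the three checks a partitioning kernel enforces. The partition names, slot lengths, addresses and the `PartitionConfig` class are assumptions, not POK's configuration format or API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    base: int
    size: int

    def contains(self, addr):
        return self.base <= addr < self.base + self.size

@dataclass
class PartitionConfig:
    slots: list       # [(partition name, slot length in ms)], one major frame
    segments: dict    # {partition name: Segment}
    channels: set     # {(source partition, destination partition)}

    @property
    def major_frame(self):
        return sum(length for _, length in self.slots)

    def active_partition(self, t_ms):
        """Time isolation: the partition owning the CPU at time t (cyclic schedule)."""
        t = t_ms % self.major_frame
        for name, length in self.slots:
            if t < length:
                return name
            t -= length
        raise RuntimeError("unreachable: t always falls inside the major frame")

    def may_access(self, partition, addr):
        """Space isolation: a partition may only touch its own segment."""
        return self.segments[partition].contains(addr)

    def may_send(self, src, dst):
        """Only channels declared in the configuration are granted at run-time."""
        return (src, dst) in self.channels

cfg = PartitionConfig(
    slots=[("p_sensor", 20), ("p_filter", 30), ("p_log", 50)],
    segments={"p_sensor": Segment(0x100000, 0x10000),
              "p_filter": Segment(0x110000, 0x10000),
              "p_log": Segment(0x120000, 0x10000)},
    channels={("p_sensor", "p_filter")},
)
print(cfg.active_partition(125))   # 125 % 100 = 25 ms into the frame -> p_filter
```

Because every decision is a lookup in a static table, the behavior of one partition cannot change when another one runs, which is what allows functions of different criticality to share a processor.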
This ensures enforcement of validated safety and security requirements at run-time.
Using these three steps together forms a complete development process that eases the integration of heterogeneous functions, as illustrated in figure 1. First, the designer describes the functions and their properties (criticality/security levels, execution time, etc.) using an appropriate specification language (AADL). Then, we analyze the system architecture to ensure that each of these requirements will be enforced after integration (step 1 in figure 1). From this analysis, deployment & configuration code is automatically produced so that partitions and kernel are correctly configured according to their requirements (step 2 in figure 1). Finally, the generated code is integrated with the POK execution platform, which supports time & space isolation, so that functions are integrated and separated as specified (step 3 in figure 1).
FIGURE 1: Overall development approach (AADL models, specifications validation (1), code generator (2), configuration and POK platform (3), implementation)
As a result, our approach ensures the integration of several functions on the same processor while preserving all their requirements in terms of timing, security or safety.
the ARINC653 avionics standard [2] specifies the services and the API that such a system should provide. The MILS [14] approach, dedicated to security, also details the services required to integrate several functions on the same processor while enforcing a security policy.
On our side, the POK operating system provides the services required by both ARINC653 & MILS: time & space partitioning support, real-time scheduling, device driver isolation, etc. It supports the ARINC653 API and provides a POSIX adaptation layer to ease porting applications to POK. Finally, it is released under the BSD license so that users can adopt it as free software and easily improve it, depending on their needs.
On the system modeling and analysis side, no framework provides the capability to both capture and analyze partitioned architecture requirements. However, this capability is very important for upcoming projects in which the system architecture must be analyzed/validated and development automated, especially because the costs of traditional development methods are still increasing [6] and these methods lead to security/safety issues.
Similarly, no tool automates the configuration and deployment of partitioned kernels from their specifications. Usually, developers do this manually by translating system requirements into configuration/deployment code. This is error-prone, and a fault can have a significant impact (missed timing constraint/deadline, unauthorized communication channel, etc.). Automating configuration is still an emerging need, but it is likely to gain importance as the number of system functions keeps increasing.
3 The POK execution platform
However, partitions may need to communicate. For that purpose, POK provides the inter-partition ports mechanism, which defines interfaces to exchange data between partitions. These ports are defined in the kernel configuration so that only specified channels are allowed at run-time. Consequently, the system designer has to specify the communication policy (which partition can communicate with which other) before executing the system.
Moreover, partitions may need to communicate with the external environment and thus access devices. However, sharing a device between two partitions leads to potential security or safety issues: a partition at a low security level may read data previously written by a partition classified at a higher level. To prevent this kind of issue, POK requires that each device be associated with one partition, and the binding between partitions and devices must be explicitly defined during configuration.
• Time Management counts the time elapsed since system start-up and provides time-related functions to partitions (for example, to support scheduling algorithms).
• Fault Handling catches errors/exceptions (for example: divide by zero, segmentation fault, etc.) and calls the appropriate handler (of the partition or task that generated the fault).
• Time Isolation allocates time to each partition according to the kernel configuration.
• Space Isolation switches from one memory segment to another when another partition is selected for execution.
• Inter-partition communication exchanges data from one partition to another. Inter-partition communication is supervised by the kernel so that only explicitly defined communication channels can be established and exchange data.
cations execution. It provides relevant services to create execution entities (tasks, mutexes, etc.) and
• Cipher algorithms is a set of reusable functions to encrypt/decrypt data within a partition.
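The kernel-supervised inter-partition communication described above (ports declared in the kernel configuration, with only specified channels allowed at run-time) can be pictured as a static channel table that the kernel consults before each transfer. The following is a minimal sketch under that assumption; the partition identifiers, channel_t and channel_allowed are hypothetical names and do not reproduce POK's actual configuration API.

```c
/* Minimal sketch: kernel-side check of inter-partition channels.
 * All names here are hypothetical, not POK's real configuration API. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { PART_SENDER, PART_RECEIVER, PART_LOGGER, N_PARTITIONS } partition_id_t;

/* A directed channel: data may only flow from src to dst. */
typedef struct {
    partition_id_t src;
    partition_id_t dst;
} channel_t;

/* Declared at configuration time and never modified at run-time. */
static const channel_t channels[] = {
    { PART_SENDER, PART_RECEIVER },
};

/* Grants a transfer only if the configuration declares a channel
 * from src to dst; every other transfer is refused. */
bool channel_allowed(partition_id_t src, partition_id_t dst)
{
    assert(src < N_PARTITIONS && dst < N_PARTITIONS);
    for (size_t i = 0; i < sizeof channels / sizeof channels[0]; i++) {
        if (channels[i].src == src && channels[i].dst == dst)
            return true;
    }
    return false;
}
```

With this table, a transfer from PART_SENDER to PART_RECEIVER is granted, while the reverse direction and any transfer from PART_LOGGER are refused because no channel declares them.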
3.3 Time isolation policy
The time partitioning policy requires a predictable method to enforce the allocation of time slices. For that purpose, POK schedules partitions using a round-robin protocol: each of them is executed periodically for a fixed amount of time. This introduces a scheduling hierarchy: partitions are scheduled first, and then tasks within the partition are scheduled according to the partition's internal scheduling policy.
To guarantee space isolation, POK relies on the
overlap each other and a partition cannot access memory addresses located outside its associated segment.
Segment properties (address, size, allocation) are defined at configuration time and cannot be modified at run-time. To avoid any access outside their segments, partitions are executed in a user context. They can only use a restricted set of instructions¹, as user processes do on regular operating systems. To use privileged instructions², they must call kernel services and perform system calls (see POK services - kernel interface in 3.2).
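The two isolation policies above can be illustrated together with a small sketch: a fixed sequence of time slots repeated every major frame (round-robin partition scheduling), and a per-partition memory segment whose bounds are checked on access. This is a minimal sketch under stated assumptions: it models only the first scheduling level (partitions, not the tasks inside them), and the slot durations, segment addresses and function names are hypothetical values chosen for illustration, not POK's configuration.

```c
/* Minimal sketch of time & space isolation with hypothetical values;
 * not POK's configuration format or kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t slot_ms;   /* fixed time slice in each major frame */
    uintptr_t seg_base; /* start of the partition's memory segment */
    size_t seg_size;    /* size of the segment in bytes */
} partition_cfg_t;

/* Fixed at configuration time; slots are served in this order and the
 * segments do not overlap. */
static const partition_cfg_t partitions[] = {
    { 20, 0x100000u, 0x10000u },
    { 30, 0x110000u, 0x10000u },
    { 50, 0x120000u, 0x20000u },
};

#define N_PARTITIONS (sizeof partitions / sizeof partitions[0])

/* Length of one major frame: the sum of all slots. */
static uint32_t major_frame_ms(void)
{
    uint32_t total = 0;
    for (size_t i = 0; i < N_PARTITIONS; i++)
        total += partitions[i].slot_ms;
    return total;
}

/* Round-robin: the partition owning the CPU at time t_ms (since boot)
 * depends only on the offset of t_ms inside the repeating major frame. */
size_t partition_at(uint32_t t_ms)
{
    uint32_t frame = major_frame_ms();
    assert(frame > 0);
    uint32_t offset = t_ms % frame;
    for (size_t i = 0; i < N_PARTITIONS; i++) {
        if (offset < partitions[i].slot_ms)
            return i;
        offset -= partitions[i].slot_ms;
    }
    assert(false); /* unreachable: offset is always < frame */
    return 0;
}

/* Space isolation: an access of len bytes at addr is legal only if it
 * lies entirely inside the segment of partition p. */
bool access_allowed(size_t p, uintptr_t addr, size_t len)
{
    assert(p < N_PARTITIONS);
    const partition_cfg_t *c = &partitions[p];
    return addr >= c->seg_base && len <= c->seg_size &&
           addr - c->seg_base <= c->seg_size - len;
}
```

With a 100 ms major frame, partition 0 runs during [0, 20) ms, partition 1 during [20, 50) ms and partition 2 during [50, 100) ms of every frame; the bounds check is written to avoid overflow in addr + len.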
Compared to other approaches, POK does not execute device drivers within the kernel. Instead, their code is isolated within a single partition to:
• Avoid any impact from a device driver error (safety reason). If the driver were executed in the kernel context, a crash would lead to unexpected impacts, such as crashing the whole kernel or other partitions.
• Ensure data isolation (security reason): if the device were shared by several partitions, one classified at a low security level could read or write data on the device from another one classified at a higher level.
Finally, when executing within a partition, a driver needs access to privileged instructions to control hardware. For that purpose, POK provides access to these privileged instructions only to the relevant partition (the one that executes the driver). This additional access must be defined at configuration time so that the kernel grants or refuses access to privileged instructions according to its configuration policy. An example is shown in figure 6: in this system, two partitions are executed: one application partition and a driver partition that controls a device. The system defines three memory areas: one for each partition and an additional area mapped to the device memory. The driver partition may access this additional memory segment to control the hardware, but the application partition cannot access it.
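The configuration of the figure 6 example can be sketched as two static tables: which partition may map which memory segment, and which partition is granted privileged I/O instructions. The sketch below assumes hypothetical identifiers (PART_APP, PART_DRIVER, SEG_DEVICE, may_map_segment, etc.); it only illustrates the policy described above and is not POK's actual configuration code.

```c
/* Minimal sketch of the figure 6 policy; all names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

enum { PART_APP, PART_DRIVER, N_PARTS };
enum { SEG_APP, SEG_DRIVER, SEG_DEVICE, N_SEGS };

/* seg_access[p][s] is true when partition p may map segment s.
 * The device segment is bound to the driver partition only. */
static const bool seg_access[N_PARTS][N_SEGS] = {
    [PART_APP]    = { [SEG_APP] = true },
    [PART_DRIVER] = { [SEG_DRIVER] = true, [SEG_DEVICE] = true },
};

/* Privileged I/O instructions are granted per partition. */
static const bool io_privileged[N_PARTS] = {
    [PART_APP]    = false,
    [PART_DRIVER] = true,
};

bool may_map_segment(int p, int s)
{
    assert(p >= 0 && p < N_PARTS && s >= 0 && s < N_SEGS);
    return seg_access[p][s];
}

bool may_use_io(int p)
{
    assert(p >= 0 && p < N_PARTS);
    return io_privileged[p];
}

/* Configuration rule: a segment is bound to exactly one partition. */
bool device_binding_exclusive(int s)
{
    assert(s >= 0 && s < N_SEGS);
    int owners = 0;
    for (int p = 0; p < N_PARTS; p++)
        if (seg_access[p][s])
            owners++;
    return owners == 1;
}
```

A configuration checker could run device_binding_exclusive over every device segment before building the system, rejecting any configuration in which a device is shared.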
To auto-generate the kernel and partition configuration, the designer must first model the system together with its requirements. This is explained in the next sections.
(AADL models) so that verified properties are correctly translated into code.
The next sections present the chosen language (AADL [7]), its tailoring for modeling partitioned architectures, and its use for validating specifications.
must be associated with the model in order to make a complete description of a partitioned architecture.
These modeling patterns were first designed for POK [15] and then standardized as an annex of the AADL standard [3]. The most important patterns define how to use AADL to model a partition (a process bound to a memory and a virtual processor, which model the partition, its allocated memory segment and its execution context), intra-partition communication (connections of AADL ports between threads located within the same process) and inter-partition communication (connections of AADL ports between process components). A full description of these modeling patterns is available in [15, 3, 5].
The next section presents our tools that check models to validate the requirements of partitioned architectures.
4.3 System validation
Once the system architecture is specified using AADL models, analysis tools process the models and check their compliance with several requirements/guidelines. In the context of POK, we designed the following rules:
• Modeling pattern enforcement rules (the rules described in the previous paragraph). These ensure that models are complete, with all required components/properties, and that they can be processed by code generators to configure the kernel and partitions.
• Handling of potential safety issues rule. This rule checks that every potential error is recovered and reports to the user which errors are not handled by a partition or a task. For example, if the designer did not specify a recovery subprogram for a memory fault raised while executing a partition, the analyzer reports an error.
• Security analysis rules. These check the architecture against a security policy. Depending on its own classification level and the level of the data it produces or receives, a partition may or may not comply with a specific security policy. The security policy defines which operations are legal, so that our tool can automatically check the architecture's compliance with them. Currently, our tool checks state-of-the-art security policies such as Bell-LaPadula [17] or Biba [16].
Other validations can be performed on the models to check either the correctness of the architecture (for example, resource dimensioning, with the analysis of memory segment sizes with respect to the requirements of the associated partitions, such as task sizes, etc.) or a specific constraint (for example, partition scheduling).
Once the models are validated and the architecture is considered correct, our code generator, Ocarina, produces configuration & deployment code for both the kernel and its associated partitions. The next sections detail this process.
5 Automatic Configuration & Deployment of Partitioned Architectures
Once system requirements and constraints have been validated, the Ocarina AADL tool-suite processes the models and generates configuration & deployment code for both kernel and partitions, as shown in figure 7. The next sections describe the process [5], highlighting its benefits regarding the needs (predictability, safety assurance, etc.) and constraints (code coverage, etc.) of safety-critical systems.
5.1 Code generation Overview
Our code generation process [5] consists of analyzing AADL models, browsing their component hierarchy and, for each component, generating the appropriate code to create and configure system entities (as depicted in figure 7). The process creates the kernel configuration code (partition time slots, memory segment assignment for each partition, etc.) and the partition configuration code (services to be used, required memory, intra-partition communication policy, etc.).
For example, when the code generator visits a memory component, it generates code to create a new memory segment. Then, it inspects the model to retrieve the associated partition and configures the kernel to associate the appropriate partition with this segment. Each AADL entity and property is used to produce system configuration code, so that the process produces complete code that is almost ready to be compiled. The next section explains which part of the code is automatically generated and which part still requires manual code to obtain a complete executable system.
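The generation step described in 5.1 (visit each memory component, create a segment, and bind it to the partition found in the model) can be illustrated by the minimal sketch below. It is not Ocarina's implementation: the memory_component_t type, the emit_segments function and the emitted initializer format are hypothetical, and the model is reduced to an in-memory array. The sketch also rejects a memory component with no bound partition, in the spirit of processing only complete models.

```c
/* Minimal sketch of model-to-code generation for memory segments.
 * Types, names and output format are hypothetical, not Ocarina's. */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Simplified view of an AADL memory component and its properties. */
typedef struct {
    const char *name;     /* component name in the model */
    unsigned long base;   /* base address property */
    unsigned long size;   /* size property, in bytes */
    const char *bound_to; /* partition (process) bound to it, "" if none */
} memory_component_t;

/* Emits one C initializer line per memory component into out.
 * Returns the number of segments emitted, or -1 if a component is not
 * bound to any partition (incomplete model) or out is too small. */
int emit_segments(char *out, size_t cap,
                  const memory_component_t *mem, size_t n)
{
    assert(out != NULL && (mem != NULL || n == 0));
    size_t used = 0;
    if (cap > 0)
        out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (mem[i].bound_to == NULL || strlen(mem[i].bound_to) == 0)
            return -1; /* every segment must belong to a partition */
        int w = snprintf(out + used, cap - used,
                         "{ .base = 0x%lx, .size = 0x%lx, .partition = %s }, /* %s */\n",
                         mem[i].base, mem[i].size, mem[i].bound_to, mem[i].name);
        if (w < 0 || (size_t)w >= cap - used)
            return -1; /* output would be truncated */
        used += (size_t)w;
    }
    return (int)n;
}
```

For a component seg1 at base 0x100000 with size 0x10000 bound to part1, the emitted line is `{ .base = 0x100000, .size = 0x10000, .partition = part1 }, /* seg1 */`.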
5.2 Kernel and partitions configuration
At first, the code generator browses the component hierarchy to create the kernel and partition configuration code. For that purpose, it analyzes the following AADL components:
• processor: for the kernel configuration. The processor component specifies the time slots allocated to each partition, which are translated into C code to configure the kernel time isolation policy.
• virtual processor: for partition run-time configuration. Based on the properties of this component, the code generator produces code that configures partition services (need for memory allocation, POSIX or ARINC653 API support, etc.).
• process and its features: the process contains all partition resources (threads and data) so that the necessary amount of resources is allocated in the kernel and its partition. Process features represent inter-partition communication ports. When analyzing such entities, the code generator configures these ports and their connections within the kernel so that only the communication channels specified in the model are granted at run-time.
• memory: as it represents a memory segment, the code generator produces configuration code that instantiates a new memory segment in the main system memory and associates it with its bound process component (which corresponds to a partition). As a consequence, at run-time, each partition is associated with a memory segment that has the properties (address, size, etc.) specified in the model.
Once this configuration code has been generated, Ocarina also generates the behavior code of the system, the code executed by each task. The next section details this step.
The behavior part corresponds to the code that uses partition resources to execute application functions. It consists of getting/putting data from/to the execution environment (for example, by using inter-partition communication ports), calling application functions and managing task execution (putting a task in sleep mode when its period is completed, etc.).
As for the configuration code, the behavior code of the application is generated from a predefined set of AADL components:
• A thread component specifies a task that potentially communicates using its features. For each thread, Ocarina generates code that gets the data from its IN features, performs the call sequence to its subprograms and sends the produced output using its OUT features.
• A data component represents data located in a partition, shared among its tasks and protected against potential race conditions using locking mechanisms (mutex, semaphore, etc.). For each task that uses the data, the code generator produces code that locks it, modifies it and finally releases it.
• A subprogram component references object code implemented in C, Ada or even assembly language. This is just a reference to a low-level implementation, so when visiting such a component, the code generator creates a function call to the object code. Finally, it also configures the build system in order to integrate the object code in the partition binary.
Thus, almost all the code of the system is produced by Ocarina. The user has to provide the application-level code, which is referenced by AADL subprogram components and automatically called by the behavior code generated for each partition. The next section details the benefits of this process with respect to the requirements of high-integrity systems.
5.4 Benefits of Code Generation
First of all, such a process requires specifying the system architecture using a modeling language, which makes the whole process more rigorous than just using a text-based specification document. Moreover, the process brings the following benefits:
1. Early error detection
3. Specifications requirements enforcement
By using validated models as the source for system implementation, the development process detects specification errors at the earliest stage, whereas such problems are otherwise difficult to track and are usually detected during the test (at best) or production (at worst) phases. By identifying these errors prior to implementation, we avoid a significant number of problems and save development costs.
Then, by automating code production with code generators such as Ocarina, we rely on established generation patterns that output the same block of code for a given predefined AADL block. Using such code patterns avoids the errors related to hand-written code, which usually introduces syntax/semantic errors that are difficult to track³ and that require code analysis tools and reviews to be found.
Finally, of particular interest is the enforcement of the specifications. Implementation compliance with the specifications is usually checked during manual code review. However, this is long, costly and also error-prone [6] since it relies on manual inspection. By automating code production from the specifications and by using code generation patterns, the implementation code enforces the specifications and thus reduces development costs while improving system safety and robustness.
Finally, the recovery policy requires stopping the kernel when an error is raised at kernel level and restarting the other components (partition, task) when one of them triggers an error/exception.
FIGURE 8: AADL model of the case-study (components: case_study_osal, pok_kernel, part1, part2, snd_prs, recv_prs, snd_thr, recv_thr, seg1, seg2, ram)
6.2 AADL modeling
the kernel but produces the partition configuration and behavior code. Developers only have to write application code, which corresponds to the functional part of the system. In this case-study, it consists of two functions: one that outputs an integer and stores it in a function argument (used on the sender side) and another that takes one integer as argument and processes it (receiver side). The code provided by the developer is shown in the following listing, demonstrating that automating code production reduces manual coding activities.
In the following application code, the receiver part raises a division by zero exception when (t + 1) % 3 == 0: d is then zero and the last printf divides t by d. According to the recovery policy, when such a condition is met, the partition restarts. To show that the partition is correctly restarted, we also output the number of times the function is executed using the variable step. Its initial value is stored in the data of the partition binary so that, when the partition is reloaded, the initial value is set again in the variable.

    #include <stdio.h>

    void user_send (int *t)
    {
       static int n = 0;

       printf ("Sent value %d\n", n);
       n = n + 1;
       *t = n;
    }

    static int step = 0;

    void user_receive (int t)
    {
       int d;

       d = (t + 1) % 3;
       printf ("Step %d\n", step++);
       printf ("Received value %d\n", t);
       printf ("Computed value %d\n", t / d);
    }

The generated application is compiled for the Intel (x86) architecture and produces the following output during execution:

    ...
    Step 3
    Received value 5
    [KERNEL] Raise divide by zero error
    Step 0
    Received value 0
    Computed value 0
    Sent value 8
    ...

One may notice that when the faulty condition of the application code ((t + 1) % 3 == 0) is reached (in this case, when receiving value 5), the receiver partition is restarted. The initial value of the variable step (printed in the line Step N) is set back to 0, showing that the partition binary has been re-loaded.
To assess the memory consumption of generated systems, we also report the generated kernel and partition sizes (see table below). The partition sizes are similar: the partitions contain the same functionality and differ only by their application code. Both of them have a small size: 11 kB for a complete system that embeds run-time functions supporting the user application. This demonstrates the lightweight aspect of the approach. The kernel size is also very small, especially for a system that provides critical functions regarding safety and security issues.

    Component      Size
    Kernel         26 kB
    Partition 1    11 kB
    Partition 2    11 kB

7 Conclusions & Perspectives
This article presents POK, a BSD-licensed operating system that supports partitioning with time & space isolation. It also provides layers to ease the deployment of existing code that uses established standards such as POSIX or ARINC653.
Beyond the operating system itself, POK relies on a complete tool-chain to automate its configuration & deployment and ease the development of partitioned systems. The system architecture and properties are specified using a modeling language, AADL, and its requirements are verified using dedicated analysis tools that process these specifications. Then, from these validated specifications, our tool-chain automatically generates code that configures/deploys the kernel and partitions and executes the application code provided by the user. This ensures enforcement of the specification requirements and avoids errors related to the usual development process.
7.1 Perspectives
The domain of partitioned architectures is still emerging and there are many open perspectives. On the kernel side, there is a need for more hardware support (devices, architectures, etc.) and wider support of existing standards, as for the ARINC653 layer (for example, to support the second part of the standard).
On the modeling and analysis part, there is a strong need to connect AADL models with other
[4] Fabrice Bellard. QEMU, a fast and portable dynamic translator. In ATEC '05: Proceedings of the Annual Conference on USENIX Annual Technical Conference, pages 41–41.
[5] Julien Delange, Laurent Pautet and Fabrice Kordon. Code Generation Strategies for Partitioned Systems. In 29th IEEE Real-Time Systems Symposium (RTSS'08), pages 53–56, Barcelona, Spain, December 2008. IEEE Computer Society.
[6] National Institute of Standards and Technology (NIST). The Economic Impacts of Inadequate
[16] K. J. Biba. Integrity considerations for secure computer systems. Technical report, MITRE.
[17] D. E. Bell and L. J. LaPadula. Secure computer system: Unified exposition and Multics interpretation. Technical report, The MITRE Corporation, 1976.
[18] Julien Delange. Intégration de la sécurité et de la sûreté de fonctionnement dans la construction d'intergiciels critiques. PhD thesis.
[19] Olivier Gilles and Jérôme Hugues. Validating requirements at model-level. In Ingénierie Dirigée par les Modèles (IDM'08).
Yutaka Matsuno
The University of Tokyo, Japan
JST, CREST
[email protected]
Abstract
System assurance has become an important issue in many system domains, especially in the safety-critical domain. Recently, assurance cases [3] have been attracting much attention for this purpose. We demonstrate D-Case Editor [10], an assurance case editor being developed in the DEOS (Dependable Embedded Operating System for Practical Uses) project funded by the Japan Science and Technology Agency. D-Case Editor has been implemented as an Eclipse plug-in using the Eclipse GMF framework. Its characteristics are (1) support for GSN (Goal Structuring Notation) [8], (2) a GSN pattern library function and a prototype type checking function [9], and (3) a consistency checking function using an advanced proof assistant tool [13]. To achieve these characteristics, we have exploited types in several ways. In this paper, we briefly introduce assurance cases and demonstrate the functions of D-Case Editor. Because it has been implemented on Eclipse, it is interesting to build a tool chain with existing Eclipse development tools. D-Case Editor is available as open source at the following web page: http://www.il.is.s.u-tokyo.ac.jp/deos/dcase/.
D-Case Editor: A Typed Assurance Case Editor
[Figure: example GSN diagram with nodes Goal:G_1, Context:C_1, Strategy:S_1, Goal:G_2, Goal:G_3, Evidence:E_1, Undeveloped:U_1]
the meta-model for assurance cases called ARM (Argument Metamodel) [11], through which both notations are in fact interchangeable. The main aim of the ARM is to align the two major notations and to facilitate tool support. Unfortunately, it only reflects the main constructs shared by the two, and some specific features that are not compatible are missing from it. For instance, patterns are not included in the ARM.
3 Overview of D-Case Editor
Figure 3 shows a screenshot of D-Case Editor. Users can draw GSN diagrams in the canvas. On the right, there is a pattern library. From the library, users can choose existing, good assurance case patterns and fragments and copy them to the canvas. The current D-Case Editor has the following functions (some functions are omitted in the current version). Consistency checking with an advanced proof assistant tool [13] will be available soon.
• Checks on the graph structure of D-Case (e.g. no cycles, no evidence directly below a strategy, etc.)
• External information can be attached to a goal via a URL.
types enum, int, double, and string, respectively. Furthermore, these types are given useful restrictions; for example, the value of CPU (this variable is intended as the CPU resource usage rate of the target system) is restricted to 0–100%. Users of D-Case Editor can assign values to these variables via the parameter setting window. If a user assigns an invalid value (e.g., 150 for CPU), D-Case Editor reports a type error. As far as we know, no other assurance case editor has such parameterized expressions and a type checking mechanism. We plan to implement the type checking mechanism in Section 3.
4 Concluding Remarks
We have presented our assurance case editor, called D-Case Editor. It has been implemented as an Eclipse plug-in using Eclipse GMF and released as open source. We hope that D-Case Editor will help make assurance cases more familiar to developers through a tool chain combining D-Case Editor with Eclipse and other development tools. We plan to comply with OMG ARM [11] and other international standards related to assurance cases in the next release.
Figure 4: Variables and Type Declarations XML file for D-Case Editor
[8] Tim Kelly and Rob Weaver. The goal structuring notation - a safety argument notation. In Proc. of the Dependable Systems and Networks 2004, Workshop on Assurance Cases, 2004.
[9] Yutaka Matsuno and Kenji Taguchi. Parameterised argument structure for GSN patterns. In Proc. IEEE 11th International Conference on Quality Software (QSIC 2011), 2011. Short Paper (6 pages).
[10] Yutaka Matsuno, Hiroki Takamura, and Yutaka Ishikawa. A dependability case editor with pattern library. In Proc. IEEE 12th International Symposium on High-Assurance Systems Engineering (HASE), pages 170–171, 2010.
[11] OMG. Argument metamodel (ARM). OMG Document Number Sysa/10-03-15.
[12] Railtrack. Yellow Book 3. Engineering Safety Management Issue 3, Vol. 1, Vol. 2, 2000.
[13] Makoto Takeyama. Programming assurance cases in Agda. In ICFP, page 142, 2011.
[14] Robert Andrew Weaver. The Safety of Software - Constructing and Assuring Arguments. PhD thesis, Department of Computer Science, University of York, 2003.
Abstract
Safety-critical software is usually implemented under the constraints of one or more standards, which demand evidence that these constraints were honoured. This leads to higher implementation effort and requires in-depth knowledge of the programming languages and interfaces used by each individual programmer – often to avoid making the same mistakes over and over again.
To facilitate development under such conditions, a library of frequently used functions and algorithms that adhere to certain safety constraints, complemented by specific evidence suitable for demonstrating compliance with a standard, would be of great help. In this paper we present such a library written in ANSI-C, named 'safety lib', which emerged as a by-product of an application developed for Safety Integrity Level (SIL) 2 certification according to IEC 61508 at the Vienna Institute for Safety & Systems Engineering.
The main intention of this paper is to show the benefits of using such a library in safety-critical development and the reasons for its planned release under a FLOSS license. Furthermore, we want to invite everyone to use the safety lib and participate in its development to improve both its code and its evidence base.
Our hypothesis is that the joint development of a library for safety-critical applications can not only save development and certification costs, but – even more important – increase safety through better and more thorough reviews carried out by a community instead of just individual developers.
A FLOSS library for the safety domain
library. Ideally, this 'safety library' would be easy to (re-)use, well-tested and complemented by a sufficient body of evidence not only to raise the safety of the system itself but also to speed up a certification process against a specific standard.
In this paper we present such a library written for ANSI-C, named 'safety lib', to achieve three objectives: First, we want to point out an approach to satisfy the demands of safety standards in a meaningful way. Second, by pointing out the problems our library helps to solve during implementation, we hope to raise awareness of certain ill-understood and often dangerous practices when implementing a safety-critical system in C. Third, it is our intent to leverage the know-how of the community to improve the safety of the library and its evidence through extensive reviews and enhancements. To this end, we are preparing the release of the safety lib under a FLOSS license and invite everyone to participate in its further development.
1.1 Content
The first part gives an overview of the requirements imposed by certain safety standards on the implementation of a safety-critical SW application and how they affect the source code. Furthermore, the impact of portability and code reuse on safety is discussed.
The second part explains how a software library can provide evidence for the above-mentioned requirements and, at the same time, opportunities for improving safety. After this, the safety lib is introduced via a number of examples that demonstrate certain dangerous and unsafe implementation practices and the library's approach to preventing problems. This is complemented by some technical facts and an outline of the concrete evidence the safety lib offers.
The remainder of this paper deals with possible alternatives and the approach to improving the functionality and safety of the library via the FLOSS community.
2 Constraints
Developing a safety-critical system inevitably involves one or more standards and the effort to prove that the requirements of those standards are met. Regarding software implementation, the requirements usually consist of a number of constraints on the design and coding of the source code. While certain standards might require a specific methodology, there is a set of constraints common to most of them:
Coding guidelines - A coding guideline restricts the functionality of a given programming language to a certain subset by excluding functions and constructs which are deemed unsafe. A well-known example is the set of MISRA-C guidelines for the ANSI-C language, which disallow a number of standard library functions (such as malloc() or printf()). It can be expected that adherence to a given coding guideline will force a programmer to abandon some 'standard solutions' in favour of code that complies with the restrictions (e.g. adding pre/postconditions to standard function calls, avoidance of dynamic memory allocation).
Coding style guide - In contrast to a guideline, a coding style guide only affects the format of the source code and does not restrict the feature set of a language. The primary purpose is to enforce a consistent code style in order to improve readability for code reviews and maintainability in case of changing developers. A style guide dictates, for example, the length and indentation of code lines.
Modularity - Splitting the code up into small, self-contained components is beneficial for reviewing and testing and facilitates the assessment of the impact of changes on the overall safety. The most common approach to ensure the modularity of code is to enforce thresholds for code complexity metrics such as lines of code (LOC) and cyclomatic complexity [2].
Defined interfaces - Software modules and their functions should be consistent, easy to use and unambiguous in the meaning of their parameters. This can be achieved by enforcing limits on the number of parameters, avoiding overly generic functions and documenting the interfaces.
Static analysis - Static analysers work directly on the source code and can detect problems which are not considered by the translator, such as out-of-bounds access or locking errors. In addition, manual code reviews provide coverage of problems which cannot be adequately checked by a tool.
Testing - Testing complements static analysis by asserting the correct behaviour of software during runtime. As there is no way to exhaustively test even a moderately complex system, test coverage metrics are usually defined
FLOSS in Safety Critical Systems
to gauge the thoroughness of a test suite (e.g. statement/branch coverage).

Portability enables the usage of the software on different systems without the need to modify large parts of it. This not only saves time and money but is also beneficial for safety, as changes to existing code might easily introduce new bugs. One way to achieve this is to rely on standardised interfaces as much as possible, e.g. by exclusively using functions defined by the POSIX standard. The downside of portability is the increased effort of generic programming and the lack of certain non-standard features.

• The source code itself can be analysed and reviewed for the presence/absence of faults.

• Additional data like test results, metric reports and analyser output provide safety-specific evidence.

The first two points are usually not feasible with closed-source/proprietary libraries and make the use
1 For example, IEC 61508, part 7, clause C.2.10.1 states the minimum operating time as 1 year.
A FLOSS library for the safety domain
of FLOSS software in safety-critical systems attractive. However, most FLOSS libraries do not provide the specific evidence needed to justify their usage in the context of a safety standard. While it is theoretically possible to extract the necessary data, given the availability of the source code, time and budget constraints usually prevent this. Furthermore, FLOSS code is rarely written in accordance with safety requirements. Instead, developers might opt to write safety-critical code from scratch, effectively reinventing the wheel over and over and forgoing the benefits provided by the FLOSS approach, namely a large number of potential reviewers/testers and the availability of a defect history.

To cope with this problem, a software library is needed which fulfils the following requirements:

• The source code must be fully available.
• The history of changes and bugs must be obtainable.
• All library code must comply with the common set of safety requirements imposed by the standards.
• Sufficient evidence to prove the compliance of the code must be available.
• Modifications and corrections must follow a defined process.
• The library must be portable to a large number of platforms.
• The interfaces must be fully documented.

In the remaining parts of the paper we present a software library written in ANSI-C, named the 'safety lib', developed in accordance with these requirements.

the following, we discuss a number of these problems to raise awareness of them and describe how the safety lib tries to solve them.

4.1 Undefined behaviour

C is a language which provides lots of freedom both to the programmer and to the compiler. This freedom comes at a price, as it is not guaranteed how certain source code constructs actually behave at runtime. The C90 standard[8] includes, in Annex G, Clause G.2, a list of such constructs, which are said to invoke 'undefined behaviour'. The most famous examples are probably division by zero, dereferencing a NULL pointer and accessing an array out of bounds. Undefined behaviour should not be confused with unspecified and implementation-defined behaviour, which are much more benign (but can still be problematic).

The danger of undefined behaviour is the unpredictability of the program's execution. Depending on the compiler, the state of the execution environment and other arbitrary factors, the program might crash, fail silently or show no erroneous behaviour at all. Due to compiler optimisation, a program might even be affected before the operation that invokes the undefined behaviour is actually executed, as demonstrated by [9].

Examining existing source code for undefined behaviour can be very difficult. An alternative approach is to exclude programming constructs which may lead to undefined behaviour a priori; this is the motivation behind coding guidelines. The safety lib adopts this approach by adhering to the MISRA-C:2004 coding guidelines[10], which prohibit invocation of undefined behaviour in general as well as in specific cases such as:

• Using identifiers that do not differ in the first n characters²
can lead to out-of-bounds access when the terminating NUL character is not present in the source string. The error might be prevented by using strncpy(). However, if there is no terminating NUL within the first n characters to copy, the destination string will not be terminated either, only for subsequent functions to fail, as in listing 1.

    #define BUFSIZE 19
    ...
    char buf[BUFSIZE] = {'\0'};
    strncpy(buf,
            "undefined behaviour",
            BUFSIZE);
    /* will likely print 19 or crash */
    printf("Size of string in buffer: %u\n",
           (unsigned int) strlen(buf));

Listing 1: Undefined behaviour invoked through standard library string handling functions

The primary method to avoid these and other classes of mistakes is to provide wrappers for standard functions which ensure that certain pre- and postconditions hold during execution. For the above example, a safe version would ensure both that a given destination buffer size is not exceeded and that the resulting string is terminated in all cases. Listing 2 shows the same situation using the safety lib's safe_strncpy() function, which incorporates these checks to avoid undefined behaviour³.

    #define BUFSIZE 19
    ...
    char buf[BUFSIZE] = {'\0'};
    if (1 == safe_strncpy(buf,
                          BUFSIZE,
                          "undefined behaviour"))
    {
        /* no terminating NUL detected among
           BUFSIZE characters */
        printf("String too long\n");
    }
    else
    {
        /* will always print a value less than or equal to 18 */
        printf("Size of string in buf: %u\n",
               (unsigned int) strlen(buf));
    }

Listing 2: Using safe_strncpy() to avoid undefined behaviour

³ A similar function already exists with strlcpy() on several platforms. However, strlcpy() is not defined by POSIX and is therefore not universally available; for example, glibc does not implement the function.

4.2 Complicated Interfaces

Due to historic growth, the POSIX 2001 standard[11] contains a number of functions whose interfaces are rather complicated to use correctly. This especially concerns the socket API used in network programming, which suffers from the need to combine several different socket types and to provide support for IPv6. Applications requiring network traffic must take care to use the correct type of socket address structure when resolving addresses and creating sockets, or they risk unstable network behaviour.

Listing 3 shows how to get the IP address of a remote peer connected via TCP. However, this code does not work with IPv6, which needs another address structure type (namely sockaddr_in6) to store its larger address. Instead of limiting themselves to one of the types, applications should use a sockaddr_storage structure, which is suitable for both IPv4 and IPv6 addresses, as recommended in [12]. This involves a conversion to yet another type (sockaddr) in order for getpeername() to work, since IPv6 support was added after the API was defined. Similarly confusing behaviour can be found in other functions dealing with sockets and address structures (e.g. the output of getnameinfo() is inconsistent across platforms, and inet_pton() returns 0 rather than -1 when its input is not a valid address string).

    int sockfd_connected = -1;
    struct sockaddr_in peer_addr;
    socklen_t peer_addrlen = sizeof(peer_addr);
    char peer_addr_str[INET_ADDRSTRLEN + 1U];
    ...
    /* calls to socket(), bind() and accept()
       omitted */
    ...
    getpeername(sockfd_connected,
                (struct sockaddr *) &peer_addr,
                &peer_addrlen);
    inet_ntop(AF_INET,
              &peer_addr.sin_addr,
              peer_addr_str,
              (socklen_t) (INET_ADDRSTRLEN + 1U));
    printf("IP of peer: %s\n", peer_addr_str);

Listing 3: Getting the IP address of a TCP peer in the traditional way

To facilitate the usage of sockets in a protocol-independent, portable and non-confusing manner, the safety lib includes wrappers for socket handling functions which enable the programmer to work directly with addresses in textual and binary form for IPv4 and IPv6. For example, listing 4 demonstrates the above scenario using the safe_get_peer_address() function instead.

    int sockfd_connected = -1;
    char peer_addr_str[INET_ADDRSTRLEN + 1U];
    uint16_t peer_srcport;
    enum ip_version peer_ipversion;
    ...
    /* calls to socket(), bind() and accept()
       omitted */
    ...

4.4 Thread synchronisation

Sometimes it is necessary to force multiple threads to execute in a certain order or to let them wait for a specific event to occur. The principal way to achieve this with the POSIX pthread API is to use a condition variable (condvar). A condvar basically puts the current thread to sleep until it receives a signal from another thread, and it is usually associated with a predicate that determines whether it is necessary to wait. Furthermore, each condvar is paired with a mutex that is atomically unlocked when the thread starts its sleep and locked again when the sleep ends. The proper use of a condvar requires certain steps:
• Coding style guide - The code follows a self-defined style guide loosely based on the Linux kernel coding style[15]. No suitable checking tool was available, so manual reviews and their reports act as a substitute.

• Modularity - As mentioned before, metric limits were enforced on the source code and checked with cccc[16]. The metric reports generated by the tool provide the necessary evidence.

• Defined interfaces - All functions with external linkage are documented inline in a consistent manner, detailing the function's purpose and the meaning of its parameters and return values. Doxygen[17] markup was used to automatically generate the documentation from the source.

• Testing - The library code was tested by automated unit tests using a modified version of the CUnit[18] framework to achieve a minimum statement/branch coverage of 90%/80% per unit and 93%/84% on average for the whole library. The code coverage was measured with gcov, and graphical reports were generated with lcov[19].

7 Alternatives

Implementing safe versions of standard functions and programming idioms is an established practice. Although solutions in a similar vein to the safety lib already exist, they are insufficient for direct adoption into safety-critical software. This section discusses some of the freely available alternatives and their deficiencies.

• Safe C Library - This library implements alternative versions of the standard library's string handling and memory allocation functions as defined by ISO/IEC TR 24731[20]. These extensions mainly add bounds checking and have the benefit of standardisation and completeness with regard to the standard library, but nothing else. Furthermore, no additional safety evidence is available.

• Safe C String Library - A library providing safe string manipulation using a custom string type with length information. The drawbacks of this approach are the increased effort in porting legacy applications and the usage of dynamic allocation for storing strings. Evidence is missing as well.

• Cyclone - As a rather different approach, Cyclone[21] is a dialect of C which adds safety checks during compilation and at runtime, preventing certain types of errors. Despite its great potential, it is unfortunately no longer maintained and would require a justification for its compiler, which is most likely not proven-in-use.

To summarise, the alternatives either lack the necessary evidence, do not adhere to the requirements of safety standards or have too narrow a scope.

8 Future Work

While the decision that the safety lib should be released under a FLOSS license was a quick one, the details still have to be clarified. The following list includes some of the decisions to be made and the preconditions that have to be established before we can actually release our code. The intention of this paper is to get feedback from potential users that may help us make those decisions, which will greatly influence the future of the safety lib.

Deciding on a FLOSS License - It can be assumed that most companies would not be interested in linking to a library that "forces" them to release their application under the same license. Because of this, we are currently tending towards the LGPL [22] or a similar model, as this allows the usage of the library in conjunction with proprietary software while ensuring that improvements to the safety lib itself are released as FLOSS. The intention is to prevent 'grab and run' in the interest of everyone who really wants to use and participate in the development of the safety lib, while still allowing contributing parties to use the safety lib without having to license their whole application under a FLOSS license.

Define a Development Life Cycle - A general requirement of the various safety standards not yet discussed is the need for a well-defined development life cycle. To satisfy this demand in a way that embraces both the needs of software development in the safety domain and the needs of community-driven open-source development, we will need to define such a life cycle in detail prior to release.

Evidence Management - The basis for a successful management of evidence that can be used during the certification process is to have a
technical framework that supports the collection and management of evidence and semi-automatically produces the documents that can be provided to the certification authorities. In our current development, we are using Codestriker[23] to document code reviews and Bugzilla[24] to report bugs. Both of these tools store the collected data in SQL databases, from which a review report and a report of known problems can be pulled with little effort.

Legal Issues - Just as important as the management of evidence is the question of legal problems with the collected data. This includes copyright and licensing issues as well as an estimation of the impact on a possible certification process against a standard. Basically, this means that the evidence itself will need to be released under an open license as well. The first license that comes to mind here is the FDL [25], but CC [26] might be an option as well.

8.1 Community Development

To allow efficient community-driven development, the safety lib has already been pulled out of the Subversion repository of the project it evolved from and moved into a separate git repository. Care has been taken to preserve traceability in the form of the change history back to the very beginning. As soon as the safety lib is released under a FLOSS license, this repository will be made accessible to the public.

As described above, it is crucial to decide on a development life cycle for the safety lib. This will also concern the way patches find their way into the official repository and how contributions to the evidence databases can be committed. As these kinds of things are new to us and, as far as we know, unique in the safety world, we do not want to rush into things but rather define a strategy up front that not only pleases us but makes the safety lib interesting for everyone developing software for safety-critical systems on POSIX-compliant systems.

within one single organisation. In a first step, this includes more code reviews by developers from different organisations and different industries, giving a higher chance of discovering subtle bugs, especially by running the code on a variety of hardware and software platforms in various applications.

In the future, pure testing and code reviews will become less and less effective at mastering the growth in code size and complexity, and the usage of formal methods for code verification will become more and more important, even for systems with a low safety integrity level. Unfortunately, most formal method techniques are time-intensive tasks demanding expert knowledge. Here, the big advantage of joint development manifests itself in the chance for developers new to formal analysis to learn from those who have experience and to get their first steps in formal verification checked by experts.

9 Conclusions

Safety standards impose a lot of constraints on the implementation of software intended for certification. From these constraints, a set common to the most important standards can be defined, which demands evidence that all requirements are fulfilled by the code. The acquisition of this evidence is time-consuming and often not even possible, especially when depending on closed-source software.

In this paper we proposed the usage of an open-source library developed under the normative constraints as a way to raise the safety of an application and satisfy the need for evidence. A proof-of-concept library, the safety lib, is presented as a foundation for a generic safety library jointly developed by the FLOSS community.

The next steps depend on the feedback of the community. If there is sufficient interest, a development life cycle needs to be defined which enables contributions both to the code and to the evidence base without violating safety constraints.
We dedicate this paper and the safety lib to the late DI Herbert Haas, who led the 'Stadt Wien Kompetenzteam für Safety Network Engineering (SNET)'. He originally proposed the idea of a software library for safety-critical applications, and his encouragement was invaluable to us during the development.

[11] IEEE Std 1003.1-2001 - Standard for Information Technology - Portable Operating System Interface (POSIX), 2001, The Open Group, IEEE

[12] UNIX Network Programming Volume 1, Third Edition: The Sockets Networking API, W. Richard Stevens, Bill Fenner, Andrew M. Rudoff, 2003
Klaus-Rüdiger Hase
Deutsche Bahn AG
Richelstrasse 3, 80634 München, Germany
Klaus-Ruediger.Hase {at} DeutscheBahn {dot} com
Abstract
Open Proof (OP) is a new approach for safety- and security-critical systems and a further development of the Open Source Software (OSS) movement, applying OSS licensing concepts not just to the final software products themselves, but also to the entire life cycle and all software components involved, including tools, documentation for specification, verification, implementation and maintenance, and in particular safety case documents. A potential field of application for OP could be the European Train Control System (ETCS), the new signaling and Automatic Train Protection (ATP) system intended to replace some 20 national legacy signaling systems all over the European Union. The OP approach might help manufacturers, train operators, infrastructure managers and safety authorities alike to eventually reach the ambitious goal of a unified, fully interoperable and still affordable European Train Control and Signaling System, facilitating fast and reliable cross-border rail traffic at state-of-the-art safety and security levels.
Keywords: ATC, ATP, Critical Software, Embedded Control, ETCS, EUPL, FLOSS, Open Proof,
openETCS, Train Control, Standardization.
Open Proof for Railway Safety Software
other discrete components. Software was not an issue then. Beginning in the late 1970s, an increasing number of functions were shifted into software executed by so-called microcomputers. Today the actual functions of such devices are almost entirely determined by software. The dramatic performance increase of microcomputers over the past 30 years on the one hand and the rising demand for more functionality on the other hand have caused a significant increase in the complexity of these embedded control systems, as such devices are usually called. Furthermore, there has been a development from purely monitoring safety protection systems, like the German INDUSI (later called PZB: Punktförmige Zug-Beeinflussung) or similar systems in other European countries, which only monitor speed at certain critical points and eventually stop the train if the driver has missed a halt signal or has exceeded a safe speed level, to more or less (semi-)Automatic Train Control (ATC) systems like the German continuous train control system called LZB (Linien-Zug-Beeinflussung), which has increasingly shifted safety responsibility from the infrastructure into the vehicle control units. Displaying signal commands on computer screens inside the vehicle, so-called cab signaling, has resulted in greater independence from adverse weather conditions.

FIGURE 1: Europe's challenge is to substitute more than 20 signaling and ATP systems by just one single system, ETCS, in order to provide border-crossing interstate rail transit all over the European Union.

All over Europe there are more than 20 different, mostly incompatible signaling and train protection systems in use (figure 1). For internationally operating high-speed passenger trains or cargo locomotives, up to 7 different sets of equipment have been installed just to operate in three or four countries. Since each of these systems has its own antennas to sense signals coming from the wayside and its own data processing units and display interfaces, space limitations make it simply impossible to equip a locomotive for operation in all EU railway networks, not to mention the prohibitive cost of such equipment. Furthermore, some of the systems have been in use for more than 70 years and may not meet today's expected safety level. Some are reaching the end of their useful life, causing obsolescence problems.

For a unified European rail system it is very costly to maintain this diversity of signaling systems forever. Therefore, the European Commission has set new rules, the so-called Technical Specifications for Interoperability (TSI), with the goal of implementing a unified European Train Control System as part of the European Rail Traffic Management System (ERTMS). ERTMS consists of ETCS; GSM-R, a cab radio system based on the public GSM standard enhanced by certain rail-specific extensions; and the European Traffic Management Layer (ETML). Legacy ATP systems, so-called Class B systems, are supposed to be phased out within the next decades.

1.2 ETCS: A new Challenge for Europe's Railways

Before the launch of the ETCS program, national operational rules for railway operation were very closely linked with the technical design of the signal and train protection systems. That is going to change radically with ETCS: one single technology has to serve several different sets of operational rules and even safety philosophies.

The experience of Deutsche Bahn AG after German reunification has made very clear that it will take several years or even decades to harmonize operational rules all over Europe. Even under nearly ideal conditions (one language, one national safety board and even one single organization), converting different rules and regulations back into one set of unified operational rules was a slow and laborious process. After 40 years of separation into two independent railway organizations (Deutsche Reichsbahn in the east and Deutsche Bundesbahn in the west), it took almost 15 years for Deutsche Bahn AG to get back to one single unified signaling handbook for the entire infrastructure of what is today DB Netz AG.

Therefore, it seems unrealistic to assume that there will be one set of operational rules for all ETCS lines all over Europe any time soon. (This does not mean that these efforts should not be started as soon as possible, only that expectations about when this will be accomplished should not be raised too high.) Consequently, in order to achieve interoperability by using a single technical solution, the new system has to cope with various operational regimes for the foreseeable future. Besides this, for more than a decade there will be hundreds of transition points between track sections equipped with ETCS and sections equipped with one of several different legacy systems. This will cause an additional increase in the functional complexity of onboard devices.

Therefore it became an important issue for vehicle operators to identify potential cost drivers and options for cost reduction measures, so as not to endanger the well-intentioned general goal of unrestricted interoperability.
erally accepted practices in this particular market, hardly to be influenced by individual customers (e.g. railway operators). In particular, security vulnerability must be considered a specific characteristic of proprietary or closed-source software. So-called End User License Agreements (EULA) usually do not allow end users to analyze, copy or redistribute the software freely and legally. Even analysis and improvement of the software for the user's own purposes is prohibited in almost all EULAs. On the one hand, customers who play by the rules are barred from analyzing and finding potential security gaps or hazardous software parts and are therefore unable to contribute to software improvements, not even for obvious defects. On the other hand, the same legal restrictions do not prevent bad guys from disassembling (a method of reverse engineering) and analyzing the code with freely available tools in order to search for security gaps, and occasionally (or better: mostly) succeeding in finding unauthorized access points or so-called backdoors. Backdoors implemented intentionally by irregularly working programmers, or introduced through lax quality assurance enforcement or simply by mistake, cause serious threats in all software projects. In most cases, intentionally implemented backdoors are hard to find with conventional review methods and testing procedures. In a typical proprietary R&D environment, only limited resources are allocated in commercial projects for this type of security check, so backdoors most likely stay undiscovered. That backdoors cannot be considered a minor issue has been discussed in various papers [2, 3, 4, 5]; they have also been identified as a serious threat by the EU Parliament, which initiated resolution A5-0264/2001 in the aftermath of the Echelon interception scandal, resulting in the following recommendations [6]:

... Measures to encourage self-protection by citizens and firms:

29. Urges the Commission and Member States to devise appropriate measures to promote ... and ... above all to support projects aimed at developing user-friendly open-source encryption software;

30. Calls on the Commission and Member States to promote software projects whose source text is made public (open-source software), as this is the only way of guaranteeing that no backdoors are built into programmes;

31. Calls on the Commission to lay down a standard for the level of security of e-mail software packages, placing those packages whose source code has not been made public in the least reliable category; ...

This resolution mainly targeted electronic communication with private or business-related content, which most likely will not hurt people or endanger their lives. However, a recent attack by the so-called STUXNET worm [7], a new type of highly sophisticated and extremely aggressive malware, specifically targeted industrial process control systems via their tool chain, even in safety-critical applications (chemical and nuclear facilities). These systems are very similar to signaling and interlocking control systems in terms of architecture and software design standards. This incident has demonstrated that we have to consider such impacts in railway control and command systems as well, both commercially and technically.

2.2 Software Quality Issues in ETCS Projects

Despite the relatively short track record of ETCS in revenue service, we have already received reports on accidents caused by software defects, like the well-documented derailment of cargo train No. 43647 on 16 October 2007 on the Lötschberg base line in Switzerland [8]. German Railways has so far been spared from software errors with severe consequences, possibly due to a relatively conservative migration strategy. During the past 40 years, software was introduced only very slowly, in small incremental steps, into safety-critical signaling and train protection systems and carefully monitored over years of operation before being rolled out in larger quantities. Over a period of four decades, software more or less replaced hard-wired circuits of relatively low complexity, based on well service-proven functional requirement specifications. With ETCS, however, a relatively large step will be taken: virtually all new vehicles have to be equipped with ETCS from 2015 on, as enforced by European legal requirements, despite the fact that no long-term experience has been gained. The ongoing development of the functional ETCS specification as well as project-specific adaptations to national or line-specific conditions have resulted in numerous different versions of ETCS implementations that are not fully interoperable. Up to now, there is still no single ETCS onboard equipment on the market that could be used on all lines in Europe that are said to be equipped with ETCS. This means that the big goal of unrestricted interoperability has been missed, at least until 2010. The next major release of the System Requirements Specification (SRS 3.0.0), also called "baseline 3", is expected to eliminate these shortcomings. Baseline 3 has another important feature: unlike all previous SRS versions, which were published under the copyright of UNISIG, an association of major European signaling manufacturers, SRS 3.0.0 has been published as an official document by the European Railway Agency (ERA), a governmental organization established by the European Commission. This gives the SRS the status of a public domain document, which means that everyone in Europe is legally entitled to use that information in order to build ETCS-compliant equipment.

2.3 Quality Deficiencies in Software Products

Everyone who has ever used software products knows that almost all software has errors, and no respectable software company claims that its software is totally free of defects. There are various opportunities to make mistakes during the life cycle of software

subset 026 [1]). Since all, or at least most, of the documents are created by humans, the human factor is always involved, causing ambiguities and therefore divergent results. In his lecture [9], Herbert Klaeren refers to reports which have found an average of 25 errors per 1000 lines of code (TLOC) in newly produced software. The book Code Complete by Steve McConnell has a brief section about the errors to be expected. He basically says that there is a wide range [10]:

(a) Industry Average: "about 15 - 50 errors per 1000 lines of delivered code." He further says this is usually representative of code that has some level of structured programming behind it, but probably includes a mix of coding techniques.

(b) Microsoft Applications: "about 10 - 20 defects per 1000 lines of code during in-house testing, and 0.5 defect per TLOC in released products [10]." He attributes this to a combination of code-reading techniques and independent testing.

(c) "Harlan Mills pioneered a so called 'clean
production and maintenance: Starting with System room development’, a technique that has been able
Analysis, System Requirement Specification, Func- to achieve rates as low as 3 defects per 1000 lines
tional Requirement Specification, etc., down to the of code during in-house testing and 0.1 defect per
software code generation, integration, commission- 1000 lines of code in released product (Cobb and
ing, operation and maintenance phases. A NASA Mills 1990 [11]). A few projects - for example, the
Study on Flight Software Complexity [12] shows con- space-shuttle software - have achieved a level of 0 de-
tribution to bug counts, which can be expected in fects in 500,000 lines of code using a system of formal
different steps of software production (figure 2). development methods, peer reviews, and statistical
testing.”
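The defect densities quoted above translate into expected residual defect counts by simple proportion: defects = (lines of code / 1000) × density per TLOC. The Python sketch below does this arithmetic for the McConnell and Cobb/Mills figures. The 200,000-line code size is a hypothetical example chosen for illustration, not a number taken from this paper, and the calculation says nothing about how the defects are distributed across the code.

```python
# Residual-defect arithmetic for the densities quoted above.
# Densities are defects per 1000 lines of code (TLOC), as cited from
# McConnell [10] and Cobb and Mills [11]. The code size used in the demo
# is a hypothetical example, not a figure from this paper.

DENSITIES_PER_TLOC = {
    "industry average, delivered": (15.0, 50.0),
    "Microsoft, in-house testing": (10.0, 20.0),
    "Microsoft, released": (0.5, 0.5),
    "cleanroom, in-house testing": (3.0, 3.0),
    "cleanroom, released": (0.1, 0.1),
}


def expected_defects(lines_of_code: int, density_per_tloc: float) -> float:
    """Expected defect count: (LOC / 1000) * defects per TLOC."""
    return lines_of_code / 1000.0 * density_per_tloc


def defect_range(lines_of_code: int, low: float, high: float) -> tuple:
    """Expected defect counts for the low and high end of a density range."""
    return expected_defects(lines_of_code, low), expected_defects(lines_of_code, high)


if __name__ == "__main__":
    loc = 200_000  # hypothetical size of an onboard software package
    for label, (low, high) in DENSITIES_PER_TLOC.items():
        lo_count, hi_count = defect_range(loc, low, high)
        print(f"{label:30s} {lo_count:8.0f} .. {hi_count:8.0f}")
```

With these inputs, even the released-product density of 0.5 defects per TLOC leaves about 100 defects in a 200,000-line package, which is why code size weighs so heavily in the reasoning that follows.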
save at this point will almost certainly result in inflated costs during later project phases (see figure 5).

FIGURE 5: Fraction of the overall project budget spent on specification (architecture) versus fraction of budget spent on rework + architecture, which defines a so called sweet spot where it reaches its minimum [12]. However, this cost function does not take into account any potential damages which might result from fatalities caused by software bugs.

Formal modeling methods and close communication with the end-user may be helpful in this stage, especially when operational scenarios can be modeled formally as well in order to verify and validate the design. Specification, modeling and reviews closely involving the customer may even require several cycles in order to come to a satisfactory result.

FIGURE 6: Graph taken from a NASA Study on Flight Software Complexity [12] suggesting a reasonable limit for software size, i.e. a level beyond which failure becomes a certainty (NCSL: Non-Commentary Source Lines = LOC as used in this paper).

Large railway operators, by contrast, may have several hundred trains, representing the same level of material value (but carrying hundreds of passengers), operating at the same time. Assuming that a mission critical failure in software of a particular size may show up once in 1000 years, this would mean, for a space mission duration of 1 year, a probability of mission failure due to software of about 0.1% (equal distribution assumed). For the railway operator, however, who operates 1000 trains at the same time, having the same size and quality of software on board may cause a mission failure event about once a year, making drastically clear that code size matters. The third point, however, is very difficult to implement within conventional closed source software projects, simply because highly qualified review resources are always very limited in commercial projects. Therefore errors are often diagnosed at a late stage in the process. Their removal is expensive and time consuming. That is why big software projects often fail due to schedule overruns and costs running out of control, and are often even abandoned altogether. Nevertheless, continuous further development and consistent use of quality assurance measures can result in a remarkable maturation of software products, as demonstrated by the fact that in our example in figure 4 the bug density has been reduced by more than an order of magnitude (initially above 6 bugs/TLOC, down to below 0.5 bugs/TLOC). On the other hand, in a later stage of the life cycle, a slight increase of the total number of bugs can be observed, because the number of code lines seems to grow faster than the bug density falls. Given a certain methodology and set of quality assurance measures on the one hand and a number of change requests to be implemented per release on the other, a certain number of bugs will remain in the software. Many of those bugs stay unrecognized forever. Some, however, are known errors whose elimination is either not possible for some reason or possible only at an unreasonably high cost. The revelation of the unknown bugs can take several thousand unit operation years (number of units times number of years of operation) and must be considered a random process. For the operator, that means that even after many years of flawless operation, unpleasant surprises have to be expected at any time. In Europe, in the not too distant future, up to 50,000 trains will operate with ETCS, carrying millions of passengers daily, plus innumerable trains with hazardous material. It is then rather scary that in each of those European Vital Computers (EVC), the core element of the ETCS vehicle equipment, between 100 and 1000 undetected errors are most likely left over, even after successfully passing safety case assessments and after the required authorization has been granted. Even if we assume that only one out of 100 bugs might eventually cause a hazard [12], that still means 1 to 10 mission critical
defects per unit. Furthermore, there will be several different manufacturer-specific variants of fault patterns under way.

tance is a simple form of the software reviewing process. Researchers and practitioners have repeatedly shown the effectiveness of the reviewing process in finding bugs and security issues [18].
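The fleet-size argument made earlier in this section (one failure per 1000 unit-years, a single 1-year space mission versus 1000 trains in concurrent operation, and 1 in 100 residual bugs being hazardous) reduces to a few lines of arithmetic. The Python sketch below assumes, as the text does, that failures are spread evenly over operating time, so the expected number of events is unit-years divided by the mean time between failures; the function and constant names are illustrative only.

```python
# Fleet-size arithmetic from the text. Assumption (as stated there): failures
# are spread evenly over operating time, so the expected number of events is
# unit-years divided by the mean time between failures (MTBF).

MTBF_YEARS = 1000.0        # "may show up once in 1000 years"
HAZARD_FRACTION = 1 / 100  # "only one out of 100 bugs might eventually cause a hazard"


def expected_events(units: int, years: float, mtbf_years: float = MTBF_YEARS) -> float:
    """Expected failure events for `units` operating for `years` each.

    For a single unit and a short mission this is also, approximately,
    the probability of a failure during the mission.
    """
    return units * years / mtbf_years


def hazardous_defects(residual_defects: float, fraction: float = HAZARD_FRACTION) -> float:
    """Residual defects expected to be able to cause a hazard."""
    return residual_defects * fraction


if __name__ == "__main__":
    space = expected_events(units=1, years=1.0)    # one spacecraft, 1-year mission
    rail = expected_events(units=1000, years=1.0)  # 1000 trains running at the same time
    print(f"space mission: {space:.1%} failure probability per mission")
    print(f"railway fleet: {rail:.0f} failure event(s) per year")
    # 100 .. 1000 undetected errors per EVC -> hazardous defects per unit
    print(hazardous_defects(100), hazardous_defects(1000))
```

A Poisson model would give 1 − e^(−0.001) for the single mission, which is still about 0.1%; the linear approximation only becomes inaccurate once the expected count approaches 1, which is exactly the railway fleet case, where it should be read as an event rate rather than a probability.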
service market. Competition, however, is the most effective driver for quality improvement and cost efficiency.

Considering commercial, technical, safety and security aspects, the risks associated with complex closed source software should be reason enough for the railway operators to consider alternatives, in particular when a large economic body like the European Union defines a new technological standard. Watts Humphrey, a fellow of the Software Engineering Institute and a recipient of the 2003 National Medal of Technology (US), has put the general problem of growing software complexity in these words [21]:

While technology can change quickly, getting your people to change takes a great deal longer. That is why the people-intensive job of developing software has had essentially the same problems for over 40 years. It is also why, unless you do something, the situation won't improve by itself. In fact, current trends suggest that your future products will use more software and be more complex than those of today. This means that more of your people will work on software and that their work will be harder to track and more difficult to manage. Unless you make some changes in the way your software work is done, your current problems will likely get much worse.

3 Proposal: Free / Libre Open Source Software for ETCS

A promising solution for the previously described difficulties could be to provide an Open Source ETCS onboard system, making the embedded software source code and relevant documentation open to the entire railway sector. Open Source Software, Free Software or Libre Software, more often called Free/Libre Open Source Software (FLOSS) [22], is software that:

• Can be used for any purpose,
• Can be studied by analyzing the source code,
• Can be improved and modified and
• Can be distributed with or without modifications.

This basic definition of FLOSS is identical to the Four Freedoms with which the Free Software Foundation (FSF, USA, [23]) has defined free software, and is in line with the open source definition formulated by the Open Source Initiative (OSI) [24].

3.1 Public License for a European Project

A potential candidate for a license agreement text could be the most widely used General Public License (GPL), occasionally also called the GNU Public License, which has been published by the Free Software Foundation [23]. However, this license text (and several similar license texts as well) is based on the Anglo-American legal system. In Europe, the applicability and enforceability of certain provisions of the GPL are considered critical by many legal experts. The European Union recognized this problem some time ago and has issued the European Union Public License text [25], which is not only available in 22 official EU languages but is also adapted to the European legal systems, so that it meets essential requirements for copyright and legal liability issues. The EU Commission recommends and uses this particular license for its own European eGovernment Services project (iDABC [26]). A key feature of the aforementioned license types is the so-called strong "Copy-Left" [27]. The Copy-Left requires a user who modifies, extends or improves the software and distributes it for commercial or non-profit purposes to also make the source code of the modified version available to the community under the same, or at least equivalent, license conditions as applied to the original software. That means everybody will get access to all improvements and further developments of the software in the future. In principle, the distribution has to be done free of charge; however, add-on services for a fee are permissible. For embedded control systems, that means that software-hardware integration work, vehicle integration, homologation and authorization costs can be charged to the customer, and service level agreements for a fee are also allowed within the EUPL. By applying such a license concept to the core functionality of the ETCS vehicle function, as defined and already published in UNISIG subset 026 of the SRS v3.0.0 [1], all equipment manufacturers as well as end-users would be free to use this ETCS software partly or as a whole in their own hardware products or their own vehicles. Since a software package of substantial value would then be available to all parties, there would not be much incentive any more for newcomers to start their own ETCS software development project from scratch; they would more likely participate in the OSS project and utilize the effect of cost sharing. Established manufacturers, who may already have a product of their own, might also consider sharing in all further add-on functions by providing an interface between the OSS software modules and their own existing software. This
will result in some kind of informal or even formally set up consortium of co-developing firms, and a so called open source eco-system around this core is most likely to evolve. This has been demonstrated by many similar FLOSS projects. The effect of cooperation between firms that otherwise operate in competition, based on a common standard core product, is often called co-competition. In analogy with other similar open projects, the name openETCS has been suggested for such a project. The occasionally expressed concern that such a model would squander the costly acquired intellectual property of the manufacturers to competitors does not really hit the point, because on the one hand the essential functional knowledge, which is basically concentrated in the specification, has already been published by UNISIG and ERA within the SRS and cannot be used as a unique selling point. On the other hand, implementation know-how for specific hardware architectures and vehicle integration, as well as service knowledge, will not be affected and has the potential to become part of the core business for the industry. In addition, for the pioneering manufacturer, opening up his own software could hardly be a better investment if this software becomes part of an industrial standard, which is very likely (if others do not quickly follow this move), as demonstrated several times in the software industry. Not only that, but since safety related software products are closely related to the design process, tools and quality assurance measures, the pioneering OSS supplier would automatically make his way of designing software an industrial standard as well (process standardization). Late followers would simply have to accept those procedures and may end up with higher switching costs, giving the pioneer a head start. Even if one or two competitors did the same thing quickly, those companies could form a consortium sharing their R&D costs and utilizing the effect of quality-improving feedback from third parties, thereby improving their competitive position compared to those firms sticking with a proprietary product concept. The UNU-MERIT study on FLOSS [22] has shown cost-lowering (R&D average of 36%) and quality-improving effects of open source compared with closed source product lines.

3.2 ETCS Vehicle On-Board Units with openETCS

Software that comes with a FLOSS license and a Copy-Left represents some kind of gift with a commitment, namely in that the donor has almost a claim to receive any improvements made and further distributed by the recipient. That means all the technical improvements which have been based on collective experiences of other users/developers and integrated into product improvements need to be distributed, so that even the original investor gets the benefits. Recalling that during the life cycle of a large software product, as shown in figure 4, more than 90% of the code and improvements were made after the first product launch, sharing the original software investment with a community (eco-system) becomes a smart investment for railway operators and manufacturers alike, simply by reducing their future upgrade and maintenance costs significantly. Rather than starting to develop a new open source software package from scratch, the easiest and fastest way for a user to reach that goal would simply be to select one of the already existing and (as far as possible) service-proven products from the market and put it under an appropriate open source license. There are numerous examples from the IT sector, such as the software development tool Eclipse, the successor to IBM's VisualAge for Java 4.0, whose source code was released in 2001 [28], the Internet browser Mozilla Firefox (formerly Netscape Navigator), the office communication software OpenOffice (formerly StarOffice), and many more.

3.3 Tools and Documents Need to be Included

In the long term it will not be enough to make only the software in the on-board equipment open. Tools for specification, modeling and simulation, as well as for software development, testing and documentation, are also essential for securing quality and lowering life cycle cost. Meeting the demand for more competition in the after-sales software service business and avoiding vendor lock-in effects requires third parties to be in a position to maintain software, prepare safety case documents, and get the modified software authorized again without depending on proprietary information. No one would seriously deny such a request for other safety-critical elements, e.g. for mechanical parts of a railway vehicle such as load-bearing car body parts or wheel discs. The past has shown that software tools quite often become obsolete due to new releases, changing operating systems, or tool suppliers simply going out of business, leaving customers alone with little or no support for their proprietary products. Railway vehicles are often in revenue service for more than 40 years, electronic equipment is expected to be serviced for at least 20 years, and tools need to be kept at the required technical level for the whole period. The aircraft industry
with similar or sometimes even longer product life-cycles realized this decades ago, starting with an Ada compiler in the 1980s, specifically designed for developing high assurance software for embedded control design projects, originally initiated by the US Air Force and developed by New York University (GNAT: GNU NYU Ada Translator), which is freely available and further developed by AdaCore and the GNU Project [3], [23], and a somewhat more sophisticated tool chain called TOPCASED, initiated by AIRBUS Industries [29]. TOPCASED represents a tool set based on ECLIPSE (another OSS software development tools platform [28]) for safety critical flight control applications, with the objective of covering the whole life cycle of such software products, including formal specification, modeling, software generation, verification and validation, based on FLOSS in order to guarantee long term availability. TOPCASED seems to be a reasonable candidate for a future openETCS reference tools platform, since it is a highly flexible open framework, adaptable in various ways to meet a wide range of requirements. Today manufacturers in the rail segment use a mix of proprietary and open source tools, since some software development tools like GNAT Ada and other products from the GNU Compiler Collection (GCC) [24] have already been used in several railway projects. Even FLOSS tools not specifically designed for safety applications, like Bugzilla for bug tracking and record keeping, have already found their way into SIL 4 R&D programs for railway signaling [30]. The importance of qualified and certified tools is rising, since it became obvious that poor quality tools or even malware-infected tools can have a devastating effect on the quality of the final software product. Clause 6.6 of the proposed prEN 50128:2009 standard [31], modification and change control, requires that care be taken of the software development tool chain and processes, which in the future will formally have to comply with the requirements for the respective SIL level of the final product. Recent news about the STUXNET attack, a type of malware (worm) specifically designed to target industrial process control computers via their tool chain (maintenance PCs with closed source operating systems), has made it pretty clear that no one can be lulled into security, not even with control and monitoring systems designed for safety-critical embedded applications [7]. Ken Thompson, one of the pioneers of the B language (a predecessor of C) and of the UNIX operating system design, demonstrated in his "Reflections on Trusting Trust" [2] that compilers can be infected with malicious software parts in such a way that the executable software (e.g. an operating system) generated by this compiler from a given clean (meaning: free of malware) source code is infected with a backdoor, almost invisible to the programmer. It took several years of research until David A. Wheeler suggested in his dissertation (2009) a method called Diverse Double-Compiling [32], based on open source tools, for countering the so called Thompson hack. Accordingly, Wheeler argues on his personal website:

Normal mathematicians publish their proofs, and then depend on worldwide peer review to find the errors and weaknesses in their proofs. And for good reason; it turns out that many formally published math articles (which went through expert peer review before publication) have had flaws discovered later, and had to be corrected later or withdrawn. Only through lengthy, public worldwide review have these problems surfaced. If those who dedicate their lives to mathematics often make mistakes, it's only reasonable to suspect that software developers who hide their code and proofs from others are far more likely to get it wrong. ... At least for safety-critical work making FLOSS (or at least world-readable) code and proofs would make sense. Why should we accept safety software that cannot undergo worldwide review? Are mathematical proofs really more important than software that protects people's lives? [3]

3.4 Open Proof: the ultimate Objective for openETCS

Wheeler's statement confirms the need to include an open source tool chain, covering the software production and documentation process for verification and validation, in the overall open source concept, providing an Open Proof (OP) methodology [33]. OP should then be the ultimate objective for an openETCS project, in order to make the system as robust as possible for reliability, safety and security reasons. An essential precondition for any high quality product is an unambiguous specification. To this day, only a more or less structured text written in natural language is the basis for ETCS product development, leaving more room for divergent interpretation (figure 3) than desirable. A potential solution for avoiding ambiguities right at the beginning of the product development process could be the conversion of the functional requirement specification into a formal, that is mathematical, description, as recommended by Jan Peleska in his Habilitationsschrift (post-doctoral thesis) [34]:

... how the software crisis should be tackled in the future:
executable software code. Even without existing real target hardware, those elements can be used to simulate the ETCS behavior and to model critical operational test cases in a so called Software-in-the-Loop modeling set-up. Once the specification of the functionality has been approved and validated, the code generation can be done for the EVC embedded control system. Standardization can be accomplished by providing an Application Programmer Interface (API), similar to the approach successfully applied in the automotive industry within the AUTOSAR project [39] or for industrial process control systems based on open Programmable Logic Controllers (PLC) within the PLCopen project [40], including safety critical systems. In addition to the software specification, generation, verification and validation tool chain, tools for maintenance (parameter setting, system configuration, software upload services) also have to be included in the OSS concept, as shown in figure 7.

3.6 How FLOSS can meet Safety and Security Requirements

For many railway experts not familiar with open source development methodology, open source is often associated with some kind of chaotic and arbitrary access to the software source code by amateur programmers (hackers), completely out of control and therefore not suited for any kind of quality software production. This may have been an issue of the past and may still exist in some low level projects, adequate for their purpose. However, OSS license concepts and R&D methodologies have successfully been applied to innumerable serious business projects, even at the highest safety and security levels for governmental administration, e.g. within the iDABC European eGovernment Services project [26], as well as for commercial, avionics [29] and military use [24]. A concept based on a group of qualified, so called Trusted Developers (figure 8) having exclusive access to a so called Trusted Repository, which on the other hand can be watched and closely monitored by a large community of developers who are able to post bug reports and other findings visible to the whole community, has made this so called bazaar process [17] a much more robust methodology than any proprietary development scheme. According to several research projects, OSS projects in general tend to find malicious code faster than closed source projects. This is indicated, for example, by the average lifetime of so called backdoors, a potential security threat, which might exist in closed source software for several months or even years, while having an average survival time of days or a few weeks at most in the case of well managed OSS projects [3], [4], [5], [32]. Figure 8 demonstrates the basic flow of information and source code for a typical FLOSS development set-up.

FIGURE 8: The classical Stone Soup Development Methodology often applied in Open Source Software projects according to [3], where the User in most cases is also active as Developer, which needs to be adapted to the rail sector, where Users may be more in a reporting rather than a developing role. Only trusted developers are privileged to make changes to the source code in the trusted repository; all others have read-only access.

It is not in question that well acknowledged and mandatory rules and regulations according to state of the art R&D processes and procedures (e.g. EN 50128) have to be applied to any software part in order to get approval from safety authorities before going into revenue service. While open source eco-systems in the IT industry are generally driven by users who have the expertise and are therefore in a position to contribute to the software source code themselves, it seems unlikely that the railway segment will find many end users of embedded control equipment for ETCS (here: railway operators or railway vehicle owners) who have this level of expertise. Therefore the classical OSS development concept and organization has to be adapted to the railway sector. Figure 9 shows a proposal for an open source software development eco-system for openETCS, utilizing a neutral organization to coordinate the so-called "co-competition" business model for several cooperating but competing equipment integrators and distributors of ETCS onboard products and services, based on a common FLOSS standard core module adapted to the needs of the railway signaling sector, providing high assurance products to be authorized by safety authorities (NSA, NoBo). The concept as shown in figure 9 assumes a license with Copy-Left, generally requiring the source code to be distributed free of charge, even if code has been added or modified and further distributed, so that
the community can re-use the improvements as well. That means that only certain added values can be sold for a fee. Typical added values are services for software maintenance (bug-fixing), software adaptation for specific applications, integration into embedded control hardware and integration into the vehicle system, test and homologation services, training for personnel and so forth.

FIGURE 9: Proposal for an openETCS eco-system using a neutral organization to coordinate a so-called "co-competition" business model, showing the flow of software source code, bug reports and add-on services provided for a fee.

policy), by offering the identical software under two (or more) different license agreements (figure 10). One might be the European EUPL, a Copy-Left type FLOSS license, and the other one a For-Fee-License (without Copy-Left), which does not require publishing all modifications. In exchange, a certain fee has to be paid, which may also provide for warranty and other services. Combined with a scheduled release scheme (e.g. defining a fixed release day per year or any other reasonable frequency), all new modules will at first be available only under the For-Fee-License, until the R&D costs have been paid off by those users who want to make use of the new functionality, while all others can stick with the older, but free of charge, software versions. Once the new features are paid off, those particular software modules can then be placed under the FLOSS license (EUPL). That allows fair cost sharing among all early implementers and does not put an undesired burden on those users who can live without the additional functions for a while, while still allowing them to upgrade later on.
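The dual-license release policy described above can be modeled as a small rule: a module stays under the For-Fee-License until the fees collected for it cover its R&D cost, and it moves to the EUPL only on a scheduled release day. The Python sketch below is a hypothetical illustration of that rule; the `Module` class, the cost figures and the simple "fees collected ≥ R&D cost" test are assumptions made for the example, not part of any openETCS specification or license text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Module:
    """A software module under the hypothetical dual-license scheme."""
    name: str
    rd_cost: float              # R&D cost to recover (illustrative units)
    fees_collected: float = 0.0

    @property
    def license(self) -> str:
        # For-Fee until the R&D cost is paid off, EUPL afterwards.
        return "EUPL" if self.fees_collected >= self.rd_cost else "For-Fee"


def release_day(modules: List[Module], fees_since_last_release: Dict[str, float]) -> Dict[str, str]:
    """Book the fees paid since the last release day and return each module's license.

    Fees are only booked here, on the fixed release day, so a module's
    license can only change on a release day.
    """
    for module in modules:
        module.fees_collected += fees_since_last_release.get(module.name, 0.0)
    return {module.name: module.license for module in modules}


if __name__ == "__main__":
    modules = [
        Module("core", rd_cost=0.0),        # core already released under EUPL
        Module("add_on_a", rd_cost=100.0),  # new add-on function, initially For-Fee
    ]
    print(release_day(modules, {"add_on_a": 60.0}))  # {'core': 'EUPL', 'add_on_a': 'For-Fee'}
    print(release_day(modules, {"add_on_a": 50.0}))  # {'core': 'EUPL', 'add_on_a': 'EUPL'}
```

Users who do not need the new function simply keep running the older EUPL release; the sketch does not model version selection, warranty terms or how fees are split among early implementers.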
add-on functional development. In most cases it is technically much easier to implement a small API, interfacing just for the add-on functions, rather than providing a fully functional API for the whole kernel (figure 11).

If the non-OSS supplier wants to make use of those Add-on-SW-Modules from the library, he cannot use the OSS-licensed software, but can combine any proprietary software with alternatively licensed software that does not include a Copy-Left provision. Besides commercial matters, technical constraints also have to be taken into account when combining software parts developed for different architectural designs. A concept of hardware virtualization has already been discussed to overcome potential security issues [43].

3.7 How to Phase-in an OSS Approach into a Proprietary Environment?

Even though the original concept of the ETCS goes far back into the early 1990s, projecting an open white-box design of interchangeable building blocks independent of particular manufacturers, based on a common specification and mainly driven by the railway operators organized in the UIC (Union Internationale des Chemins de Fer = International Union of Railways), software was not a central issue and open source software concepts were in their infancy [41], [42]. Since then a lot of conceptual effort and detailed product development work has been done, but the white box approach has never been adopted by the manufacturing industry. Despite various difficulties and shortcomings, as mentioned earlier, the European signal manufacturers have developed several products, more or less fit for their purpose, and it would be unwise to ignore this status of development and start a brand new development path from scratch. This would just lead to another product competing in an even more fragmented market rather than promoting an effective product standard. In addition, it takes at least one strong manufacturer with an undoubted reputation and a sound financial basis, in combination with a sufficient customer base, to enforce a standard in a certain market. Therefore starting a new product line, with the need to catch up with more than a decade of R&D efforts, is not an option.

Based on this insight, a viable strategy has to act in two ways:

an open source reference system, based on an unambiguous specification, which means using formal methods, in order to deliver a reference onboard system as soon as possible, which can be used to compare various products on the market in a simulated as well as real world infrastructure test environment. This device needs to be functionally correct, but does not need to be a vital (or fail-safe) implementation.

• 2. At least one, or better several, manufacturers have to be convinced to share in an open source software based business approach by simply converting their existing and approved proprietary ETCS onboard product into an open source software product, just by switching to a FLOSS license agreement, preferably the European Union Public License (EUPL), including interface definition and safety case documentation. No technical changes are required.

• 3. Once a formally specified reference FLOSS package has been provided, implemented on a non-vital reference hardware architecture according to step 1, all add-on functions and enhancements and future major software releases should, in a step-by-step approach, be based on formal specifications, allowing a migration of the original manufacturer's software design solution into the formal method based approach, due to the openness of the product(s) from step 2.

FIGURE 12: Interaction between the openETCS project, providing a formally specified non-vital reference OBU for validating proprietary as well as OSS-converted industrial products, and future migration to a fully formally specified openETCS software version to be implemented in a market product.
Open Proof for Railway Safety Software
supplier, based on proprietary designs (upper half), and major milestones for the openETCS project, providing a non-vital OBU based on formal specification and later migrating to a formally specified vendor-specific implementation of the kernel software (lower half).

Trying to implement an independent formal open source software package without the backing of at least one strong manufacturer will most likely fail if no approved and certified product can be used to start with. The only promising way to accomplish the crucial second step in this concept is a tender for a sufficiently attractive (large enough) ETCS retrofit project that includes a request for an OSS license for the software to be delivered. The EU Commission has provided a guideline for such OSS-driven tenders, the so-called OSOR Procurement Guide (Guideline on public procurement of Open Source Software, issued March 2010 [26]). As an example, figure 12 shows the time line for an ETCS retrofit project for high-speed passenger trains, to be equipped by 2012 with pre-baseline 3 proprietary software, with an open source license to be added as soon as the first baseline 3 software package is released.

3.8 Economical Aspects of openETCS for Europe's Railway Sector

A free of charge, high-quality ETCS vehicle software product on the market makes it economically less attractive to start a new software development, or even to continue developing a different but functionally identical proprietary software product. Sooner or later this will lead to some kind of cooperation among competing ETCS equipment suppliers, a co-competition with all those suppliers who can and will adapt their own products by providing an API to their particular system. Because very different design and safety philosophies have evolved in past years, some of the manufacturers will have to decide either to convert their systems or to join the co-competition grouping, or otherwise to stick with costly proprietary software maintenance on their own. As figure 4 clearly demonstrates, the software volume may grow over time to exceed the original volume by a factor of 3. There is little reason to assume that the development of the ETCS vehicle software will run much differently. It then becomes obvious that, for a relatively limited market of perhaps up to 50,000 rail cars to be equipped with ETCS in Europe, a larger number of parallel software product development lines will hardly be able to survive. A study funded by the EU Commission [22] has identified a potential average cost reduction of 36% for the corresponding R&D through the use of FLOSS. As a result, a significantly lower cost of ownership for vehicle operators would accelerate the ETCS migration on the vehicle side.

3.9 Benefits for the ETCS Manufacturers

The core of the ETCS software functionality defined by UNISIG subset 026, to be implemented in each EVC, is a published and binding standard requirement and therefore not suitable for defining a Unique Selling Proposition (USP). It therefore makes perfect sense from the manufacturers' perspective to share the development cost and the risk of all R&D on the ETCS core functionality, even with their competitors, as is often practiced in other industrial sectors (e.g. automotive). The involvement of several manufacturers in the development of openETCS will help to enhance the quality of the software in terms of security and reliability (stability and safety), because different design traditions and experiences can easily complement each other. As a FLOSS-based business model can no longer rely on sales of the software as such, the business focus has to shift to services around the software and to other add-on features of the product. That means the business has to evolve into service contracts for product maintenance (further development, performance enhancements and bug fixes). This helps the ETCS equipment manufacturers to generate a dependable long-term cash flow, funding software maintenance teams even long after the hardware product has been discontinued, and allows long-term maintenance obligations for the product to be covered even by third parties, helping to reserve scarce software development resources for future product R&D. Given the scarcity of well-educated software engineers from universities, FLOSS has the side effect that openETCS can, and most likely will, become a subject of academic research, generating numerous master's and doctoral theses and student research projects.

3.10 Benefits for Operators and Vehicle Owners

The use of openETCS better protects the vehicle owners' investment, because an obsolescence problem on the hardware side does not necessarily mean discontinued software service. Modification
FLOSS in Safety Critical Systems
of the ETCS kernel can also be developed by independent software producers. This enables competition on after-sales services and enhancements, because not only the software sources but also the associated software development tools are accessible to all parties. As shown above, due to the complexity of the software, malfunctions of the system may show up many years, even decades, after commissioning. Conventional procurement processes are therefore not suitable, since they provide only a few years of warranty coverage for those kinds of defects. These processes imply that customers would be able to find all potential defects within this limited time frame just by applying reasonable care and observation of the product, which does not match experience with complex software packages of more than 100,000 lines of code. This finding suggests that complex software will need care during its whole life cycle. Since software matures during long-term quality maintenance, it may need more intensive care during early usage or after major changes, whereas in its later period of use the service intensity may decrease. But as long as the software is in use, a stand-by team is needed to counter unforeseeable malfunctions triggered by extremely rare operational conditions. As the ETCS onboard software can be considered mission critical, operators are well advised to maintain a service level agreement to get the systems up and running again, even after worst-case scenarios. Railway operators and vehicle owners are usually not able to provide that software support themselves. They usually rely on services provided by the OEM. However, due to the decreasing service intensity after several years of operation, this service model may not match the OEM's cost structure, in particular after the hardware has been phased out. In those cases OEMs are likely to increase prices or even to discontinue this kind of service. A typical escrow agreement for proprietary software might help, but it has its price too, because alternative service providers first have to learn how to deal with the software. Only a well-established FLOSS ecosystem can fill this gap at reasonable cost for the end user. DB's experience with FLOSS is very positive in general. For more than a decade, DB has been using FLOSS in various ways: in office applications, for the intranet and DB's official internet presence and services on more than 2000 servers world-wide, and even in business-critical applications. The original decision in favor of FLOSS was mainly driven by expected savings on license cost. However, looking back, quality became a more important issue over time, since FLOSS applications have never caused a service level breach, which cannot be said for proprietary software selected by applying the same quality criteria. This supports the impression that FLOSS does tend to have a higher quality.

4 Conclusion

The major goal of a unified European train control, signaling, and train protection system, ETCS, has led to highly complex functionality for the onboard units, which translates into a level of complexity for the safety-critical software not seen on rail vehicles before. A lack of standardization on various levels, different national homologation procedures and a diversity of operational rules to be covered, combined with interfacing to several legacy systems during a lengthy transitional period, has to be considered a major cost driver. Therefore, even compared with some of the more sophisticated legacy ATP and ATC systems in Europe, ETCS has turned out to be far more expensive without providing much, if any, additional performance or safety advantage. Due to ambiguities in the system requirement specification (SRS), various deviations have been revealed in several projects, so that even the ultimate goal of full interoperability has not yet been accomplished. Therefore the development of ETCS has to be considered work in progress, with many software upgrades to be expected in the near and distant future. Since almost all products on the market are based on proprietary software, this means a low degree of standardization for the most complex component as well as a life-long dependency on the original equipment manufacturers, with a high cost of ownership for vehicle holders and operators. Therefore an open source approach has been suggested, covering not only the embedded control software of the ETCS onboard unit itself but also all tools and documents, in order to make the whole product life cycle as transparent as possible, optimizing economy, reliability, safety and security alike. This concept, called open proof, is a new approach for the railway signaling sector. A dual licensing concept is suggested, based on the European Union Public License with a copyleft provision on the one hand, combined with a non-copyleft for-fee license on the other hand, to provide a cost sharing effect for participating suppliers and service providers. By offering a trusted repository and a dedicated source code access policy in combination with a release schedule policy, economical as well as safety and security considerations can be taken into account. A two-step approach, providing a formally specified non-vital reference system and a procurement program asking for the conversion of existing commercial products from closed source into open
[2] Thompson, Ken: Reflections on Trusting Trust; reprinted from Communications of the ACM, Vol. 27, No. 8, August 1984, pp. 761-763. http://cm.bell-labs.com/who/ken/trust.html
[3] Wheeler, David A.: High Assurance (for Security or Safety) and Free-Libre / Open Source Software (FLOSS); updated 20/11/2009; http://www.dwheeler.com/essays/high-assurance-floss.html
[4] Wysopal, Chris; Eng, Chris: Static Detection of Application Backdoors, Veracode Inc., Burlington, MA, USA, 2007, http://www.veracode.com/images/stories/static-detection-of-backdoors-1.0.pdf
[5] Poulsen, Kevin: Borland Interbase backdoor exposed, The Register, Jan. 2001, http://www.theregister.co.uk/2001/01/12/borland_interbase_backdoor_exposed
[6] EUROPEAN PARLIAMENT: REPORT on the existence of a global system for the interception of private and commercial communications (ECHELON interception system), (2001/2098(INI)), Part 1: Motion for a resolution: A5-0264/2001, 11 July 2001.
[12] Dvorak, Daniel L. (Editor): NASA Study on Flight Software Complexity, Final Report, California Institute of Technology, 2008, Report: http://www.nasa.gov/pdf/418878main_FSWC_Final_Report.pdf, Presentation: http://pmchallenge.gsfc.nasa.gov/docs/2009/presentations/Dvorak.Dan.pdf
[13] Ostrand, T. J. et al.: Where the Bugs Are. In: Rothermel, G. (ed.): Proceedings of the ACM SIGSOFT International Symposium on Software Testing and Analysis, Vol. 29, 2004, pages 86-96; http://portal.acm.org/ ; see also: http://www-pu.informatik.uni-tuebingen.de/users/klaeren/sw.pdf (German)
[14] Randell, B.: The NATO Software Engineering Conferences, 1968/1969: http://homepages.cs.ncl.ac.uk/brian.randell/NATO/
[15] Dijkstra, Edsger W.: The Humble Programmer, ACM Turing Lecture 1972. http://userweb.cs.utexas.edu/users/EWD/ewd03xx/EWD340.PDF
[16] Sukale, Margret: Taschenbuch der Eisenbahngesetze, Hestra-Verlag, 13th edition, 2002
[17] Raymond, Eric Steven: The Cathedral and the Bazaar, version 3.0, 11 Sept. 2000, http://www.catb.org/~esr/writings/cathedral-bazaar/cathedral-bazaar/ar01s04.html
[18] Pfleeger, Charles P.; Pfleeger, Shari Lawrence: Security in Computing. Fourth edition. ISBN 0-13-239077-9
[19] Biggerstaff, Ted J.: A Perspective of Generative Reuse, Technical Report MSR-TR-97-26, 1997, Microsoft Corporation, http://research.microsoft.com/pubs/69632/tr-97-26.pdf
[20] Rix, Malcolm: "Case Study of a Successful Firmware Reuse Program", WISR (Workshop on the Institutionalization of Reuse), Palo Alto, CA, ftp://gandalf.umcs.maine.edu/pub/WISR/wisr5/proceedings/
[21] Humphrey, Watts S.: Winning with Software: An Executive Strategy, Addison-Wesley, 2001, 1st Edition; ISBN-10: 0-201-77639-1
[22] UNU-MERIT (NL): Economic impact of open source software on innovation and the competitiveness of the Information and Communication Technologies (ICT) sector, http://ec.europa.eu/enterprise/sectors/ict/files/2006-11-20-flossimpact_en.pdf
[23] Free Software Foundation, Inc.; 51 Franklin Street, Boston, MA 02110-1301, USA: http://www.gnu.org/philosophy/free-sw.html
[24] Wheeler, David A.: Open Source Software (OSS or FLOSS) and the U.S. Department of Defense, November 4, 2009; http://www.dwheeler.com/essays/dod-oss.ppt
[25] European Commission: European Union Public License - EUPL v.1.1, Jan. 9, 2009, http://ec.europa.eu/idabc/en/document/7774
[26] European Commission, IDABC, European eGovernment Services; OSOR: Guideline on public procurement of Open Source Software, March 2010, http://www.osor.eu/idabc-studies/OSS-procurement-guideline
[27] Wikipedia, terminology: Copyleft, http://en.wikipedia.org/wiki/Copyleft
[28] Eclipse Foundation: About the Eclipse Foundation, http://www.eclipse.org/org/#about
[29] TOPCASED: The Open Source Toolkit for Critical Systems; http://www.topcased.org/
[30] Duhoux, Maarten: Respecting EN 50128 change control requirements using BugZilla variants, Signal+Draht, issue 07+08/2010, EurailPress, http://www.eurailpress.de/sd-archiv/number/07_082010-1.html
[31] DIN EN 50128; VDE 0831-128:2009-10; Railway applications - Communication, signalling and processing systems - Software for railway control and protection systems; version prEN 50128:2009; Beuth Verlag, Germany, http://www.vde-verlag.de/previewpdf/71831014.pdf (index only)
[32] Wheeler, David A.: Countering Trusting Trust through Diverse Double-Compiling (DDC), 2009 PhD dissertation, George Mason University, Fairfax, Virginia, http://www.dwheeler.com/trusting-trust/
[33] Open Proof: http://www.openproofs.org/
[34] Peleska, Jan: Formal Methods and the Development of Dependable Systems, habilitation thesis, Report No. 9612, Universität Bremen, 1996, http://www.informatik.uni-bremen.de/agbs/jp/papers/habil.ps.gz
[35] Haxthausen, Anne E.; Peleska, Jan; Kinder, Sebastian: A formal approach for the construction and verification of railway control systems, Formal Aspects of Computing, published online 17 December 2009, DOI: 10.1007/s00165-009-0143-6, Springer, ISSN 0934-5043 (Print), 1433-299X (Online), http://springerlink.metapress.com/content/l3707144674h14m5/fulltext.pdf
[36] Dubler, Lorenz; Meyer zu Hörste, Michael; Bikker, Gert; Schnieder, Eckehard: Formale Spezifikation von Zugleitsystemen mit STEP, iVA, Techn. Univ. Braunschweig, 2002; http://www.iva.ing.tu-bs.de/institut/projekte/Handout_STEP.pdf
[37] Padberg, J.; Jansen, L.; Heckel, R.; Ehrig, H.: Interoperability in Train Control Systems: Specification of Scenarios Using Open Nets; in Proc. IDPT 1998 (Integrated Design and Process Technology), Berlin 1998, pages 17-28
[38] Rathwell, Gary: Stone Soup Development Methodology, last updated December 5, 2000, http://www.pera.net/Stonesoup.html
[39] AUTOSAR (AUTomotive Open System ARchitecture); http://www.autosar.org/
[40] PLCopen; Molenstraat 34, 4201 CX Gorinchem, NL: http://www.plcopen.org/
[41] UIC/ERRI A200: ETCS, European Train Control System, Overall Project Declaration including the contribution to be made by UIC, Utrecht, NL, Jan. 1992.
[42] UIC/ERRI A200: Brochure ETCS, European Train Control System, The new standard train control system for the European railways, Aug. 1993, 2nd Rev. Oct. 1995
[43] Feuser, Johannes; Peleska, Jan: Security in Open Model Software with Hardware Virtualization - The Railway Control System Perspective. Univ. Bremen, 2010, http://opencert.iist.unu.edu/Papers/2010-paper-2-B.pdf
Andreas Platschek
OpenTech EDV Research GmbH
Augasse 21, A-2193 Bullendorf
[email protected]
Georg Schiesser
OpenTech EDV Research GmbH
Augasse 21, A-2193 Bullendorf
[email protected]
Abstract
As virtualization techniques are being used in the automotive industry in order to save hardware, reduce power consumption and allow the reuse of legacy applications, as well as the fast development and integration of new applications, the need emerges for a run-time environment that is suitable for, and in wide use in, the automotive industry. The requirements for such a run-time environment are defined in the most widely used specification in this industry, OSEK/VDX.

One key feature the OVERSEE project takes advantage of is that co-locating an OSEK run-time environment and a full-featured GPOS (GNU/Linux) eliminates many limitations of OSEK/VDX through extension via virtualization, notably allowing some of the serious shortcomings in the security area to be mitigated by resolving these issues at the architectural level rather than trying to patch up the limited OSEK OS. This may well constitute a general trend to specialize operating systems and to operate powerful hardware as an assortment of specialized FLOSS systems collaborating to provide different services, including full backwards compatibility with legacy operating systems.

Currently, several FLOSS implementations of this specification are available under different FLOSS license models and with different degrees of compliance. This paper gives an overview of the available implementations, a rationale for the chosen implementation, and a description of the efforts for the migration to XtratuM.
Migrating a OSEK run-time environment to the OVERSEE platform
ments shall be integrated as an extension of the functionality provided by OSEK OS.” [BSW097] Existing OS, AUTOSAR, Requirements on Operating System V2.1.0 R4.0 Rev 1

All this explains why support for an OSEK-compliant run-time environment is an indispensable requirement for a software platform, like the one developed in the OVERSEE [1] project, that targets the automotive industry. A high-level view of this software platform can be seen in figure 1.

FIGURE 1: High Level Architecture

This paper gives an introduction to the OSEK/VDX operating system specification and describes the efforts that were necessary to allow the execution of FreeOSEK [2], a FLOSS implementation of OSEK/VDX, in a virtualized environment, namely the XtratuM hypervisor. This allows several FreeOSEK run-time environments to run in parallel with other run-time environments such as Linux partitions or LithOS [9] partitions, while guaranteeing the independence between those run-time environments.

2 OSEK/VDX

In the following, the open operating system specification OSEK/VDX [8] is summarized, looking at the highest conformance class, ECC2 (extended conformance class). The lower conformance classes are subsets of ECC2; the relation between the conformance classes can be found in [4], Figure 3-3.

2.1 OSEK OS

The most important part of OSEK/VDX for understanding the context of this paper is OSEK OS. It specifies an operating system well suited to the needs of the automotive industry. The standardized API and well-defined behavior of OSEK/VDX-compliant operating systems allow high portability of applications developed for such an operating system.

The following summarizes the essential points of OSEK/VDX; for more details, please refer to the homepage [4], where all parts can be downloaded free of charge, since it is an open standard.

Task Management OSEK/VDX distinguishes between two different types of tasks, basic tasks (BT) and extended tasks (ET). While a BT can only release the processor if it terminates, or if it is preempted by a higher-priority task or an interrupt service routine (ISR), an ET can also go into a waiting state, allowing the scheduler to dispatch a lower-priority task without terminating the higher-priority task. An example of this is an ET waiting for some kind of event to happen: instead of polling and wasting CPU time, it can go into the waiting state, in which it is not scheduled until the event is signaled (more on signals below).

OSEK/VDX provides a task state model ([4], section 4.2) that describes the states a task can be in and the transitions between those states. The task state model for extended tasks is shown in figure 2. For basic tasks the task state model is essentially the same, but without the waiting state.

The states a task can be in are the following:
• running - a task in the running state is currently active and executed. At all times only one task can be in the running state. (OSEK/VDX is specified for single-core CPUs only; multi-core solutions are covered by newer versions of AUTOSAR.)
• ready - all schedulable tasks are in the ready state, waiting for their turn to transition into the running state.
• suspended - tasks in the suspended state are currently inactive and wait for their activation to become ready.
• waiting - extended tasks that are waiting for some event to happen can decide to go into the waiting state instead of wasting CPU time. A task in the waiting state will be released from the waiting state as soon as the desired event has happened.
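The task state model described above can be sketched as a small transition function. This is a minimal C sketch under our own assumptions: the identifiers `task_transition`, `TS_*` and `TR_*` are hypothetical and are not part of the OSEK/VDX API; the transition names follow those of the OSEK state diagram (activate, start, preempt, terminate, wait, release). A real OSEK kernel performs these transitions inside its system services (e.g. ActivateTask, TerminateTask, WaitEvent) rather than exposing them directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The four task states of the OSEK/VDX task state model. */
typedef enum { TS_SUSPENDED, TS_READY, TS_RUNNING, TS_WAITING } task_state_t;

/* Transitions, named after those of the OSEK state diagram (hypothetical enum). */
typedef enum {
    TR_ACTIVATE,  /* suspended -> ready */
    TR_START,     /* ready     -> running */
    TR_PREEMPT,   /* running   -> ready */
    TR_TERMINATE, /* running   -> suspended */
    TR_WAIT,      /* running   -> waiting (extended tasks only) */
    TR_RELEASE    /* waiting   -> ready */
} transition_t;

typedef struct {
    task_state_t state;
    bool extended; /* true: extended task (ET), false: basic task (BT) */
} task_t;

/* Applies transition tr to task t. Returns false and leaves the state
 * unchanged if the transition is not allowed in the current state. */
static bool task_transition(task_t *t, transition_t tr)
{
    assert(t != NULL);
    task_state_t from, to;
    switch (tr) {
    case TR_ACTIVATE:  from = TS_SUSPENDED; to = TS_READY;     break;
    case TR_START:     from = TS_READY;     to = TS_RUNNING;   break;
    case TR_PREEMPT:   from = TS_RUNNING;   to = TS_READY;     break;
    case TR_TERMINATE: from = TS_RUNNING;   to = TS_SUSPENDED; break;
    case TR_WAIT:      from = TS_RUNNING;   to = TS_WAITING;   break;
    case TR_RELEASE:   from = TS_WAITING;   to = TS_READY;     break;
    default:           return false;
    }
    if (t->state != from)
        return false;
    if (to == TS_WAITING && !t->extended)
        return false; /* basic tasks have no waiting state */
    t->state = to;
    return true;
}
```

For example, a running basic task rejects `TR_WAIT` and can only be preempted or terminate, while a running extended task may enter `TS_WAITING` and returns to `TS_READY` only via `TR_RELEASE`.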
• category1 ISRs do not use operating system services, and after they are finished, execution continues exactly at the point

All these problems are highly probable error sources; the goal of the OSEK resource management system is to do everything possible to
prevent them from the operating system side. To reach these goals, the following mechanisms are specified by OSEK/VDX:

OSEK Priority Ceiling Protocol [4], section 8.5, introduces the OSEK Priority Ceiling Protocol, used to avoid priority inversion and deadlocks between tasks. This protocol provides a ceiling priority for each resource (statically assigned at system generation), which shall be set to the priority of the highest-priority task using the resource. If a task with a lower priority accesses the resource, its own priority is temporarily raised to the resource's ceiling priority. After the task releases the resource, its priority is set back to its old priority. This way, the task cannot be preempted by a higher-priority task that competes for the same resource while the lower-priority task is holding the resource.

Section 8.6 of [4] introduces an optional extension of the OSEK Priority Ceiling Protocol that includes ISRs.

Restrictions when using Resources OSEK/VDX defines restrictions on the system calls that may be used while a task is holding a resource. The calls forbidden while holding a resource are TerminateTask, ChainTask, Schedule and WaitEvent. As can be inferred from the names, these are the calls that invoke the scheduler and might lead to the scheduling of another task. This is a simple and effective way of assuring the mutual exclusivity of resources; furthermore, it helps to prevent deadlocks between tasks.

Scheduler as a Resource If a task wants to prevent itself from being preempted, it can lock the scheduler. If a task chooses to do so, the scheduler is still invoked, but is not allowed to schedule any other task. Interrupts are received and processed independently of the state of the scheduler.

Alarms are special (time-dependent) events offered by the OSEK OS to activate tasks when a counter reaches a given value. A counter in OSEK is represented by a counter value measured in ticks; if the counter reaches a predefined value, the alarm expires and the alarm event is set off. The predefined value can be specified either relative to the actual counter value (relative alarm) or as an absolute value (absolute alarm).

The counter value can be incremented by all kinds of sources: this could of course be a real-time clock, but it could also be any other interrupt source that increments the counter.

While any number of alarms can be assigned to the same counter, each alarm is assigned exactly one counter and exactly one task or alarm-callback routine at system generation time.

Error Handling OSEK/VDX defines hook routines which can be used for a variety of tasks.

Hook Routines are part of the operating system, although implemented by the application developer. They can be seen as a possibility for the application developer to extend the functionality of the operating system. The hook routines are called by the OS at pre-configured events; which events these are depends on the implementation of the operating system itself. Since hook routines are part of the OS, they have higher priority than all tasks, and they cannot be interrupted by category2 ISRs. While the interface of the hook routines is standardized, their functionality is not and is up to the application developer.

Error Handling OSEK/VDX distinguishes between two categories of errors: application errors and fatal errors. In case of a fatal error, the integrity of the operating system's internal data can no longer be guaranteed, and the operating system shuts down. If an application error occurs, a system call could not be serviced properly, but the internal data of the operating system is still assumed to be correct. If a system service routine returns an error code, an error hook routine is called. This hook routine has to be provided by the user, who has the responsibility to bring the application back on track.

System Startup/Shutdown All low-level (hardware) initialization is up to the application developer; the OSEK/VDX specifications concern only the platform-independent parts and start with the call to StartOS. Shutdowns are a little more complicated, since each task has to be informed of the
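The OSEK Priority Ceiling Protocol described above can be illustrated with a minimal C sketch. The types and the functions `resource_init`, `get_resource` and `release_resource` are hypothetical stand-ins for OSEK's GetResource/ReleaseResource services; the ceiling is computed in `resource_init` to mimic system generation, priorities follow the convention that a larger number means a higher priority, and the rescheduling a real kernel would perform on release is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Larger number = higher priority (assumed convention for this sketch). */
typedef struct {
    int base_prio;    /* statically configured priority */
    int current_prio; /* effective priority, raised while holding resources */
} task_t;

typedef struct {
    int ceiling;    /* priority of the highest-priority user of the resource */
    int saved_prio; /* holder's priority before it acquired the resource */
    task_t *holder; /* NULL while the resource is free */
} resource_t;

/* "System generation": the ceiling is the highest base priority among the
 * tasks that are configured to access the resource. */
static void resource_init(resource_t *r, const int user_prios[], size_t n)
{
    r->ceiling = 0;
    for (size_t i = 0; i < n; i++)
        if (user_prios[i] > r->ceiling)
            r->ceiling = user_prios[i];
    r->saved_prio = 0;
    r->holder = NULL;
}

/* Acquire: raise the caller's priority to the ceiling of the resource. */
static void get_resource(resource_t *r, task_t *t)
{
    assert(r->holder == NULL);          /* occupied by at most one task */
    assert(t->base_prio <= r->ceiling); /* caller must be a configured user */
    r->holder = t;
    r->saved_prio = t->current_prio;
    if (r->ceiling > t->current_prio)
        t->current_prio = r->ceiling;
}

/* Release: restore the priority the task had before acquiring it. In this
 * sketch nested resources must be released in reverse order (LIFO). */
static void release_resource(resource_t *r, task_t *t)
{
    assert(r->holder == t);
    t->current_prio = r->saved_prio;
    r->holder = NULL;
}
```

Because the holder runs at the ceiling priority, no other task that uses the same resource can preempt it, which is why no waiting queue is needed for resources. Locking the scheduler, described above, can be modeled with the same mechanism as a resource whose ceiling is the highest task priority.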
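The restriction on system calls while holding a resource, described above, amounts to a simple entry check. The following minimal C sketch is hypothetical: the `SVC_*` service identifiers, `check_service` and the resource counter are illustrative names of our own. In OSEK itself the corresponding services report this case with the status E_OS_RESOURCE.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical identifiers for some OSEK system services. */
typedef enum {
    SVC_ACTIVATE_TASK, SVC_TERMINATE_TASK, SVC_CHAIN_TASK,
    SVC_SCHEDULE, SVC_WAIT_EVENT, SVC_SET_EVENT
} service_t;

/* Hypothetical status values; OSEK's own names are E_OK and E_OS_RESOURCE. */
typedef enum { STATUS_OK, STATUS_RESOURCE } status_t;

typedef struct {
    unsigned resources_held; /* resources currently occupied by the task */
} task_ctx_t;

static void occupy(task_ctx_t *t) { t->resources_held++; }

static void vacate(task_ctx_t *t)
{
    assert(t->resources_held > 0); /* cannot release what is not held */
    t->resources_held--;
}

/* The services listed in the text as forbidden while holding a resource. */
static bool forbidden_with_resource(service_t s)
{
    return s == SVC_TERMINATE_TASK || s == SVC_CHAIN_TASK ||
           s == SVC_SCHEDULE || s == SVC_WAIT_EVENT;
}

/* Entry check a kernel could perform before executing service s. */
static status_t check_service(const task_ctx_t *t, service_t s)
{
    if (t->resources_held > 0 && forbidden_with_resource(s))
        return STATUS_RESOURCE;
    return STATUS_OK;
}
```

With one resource occupied, `check_service(&t, SVC_WAIT_EVENT)` yields `STATUS_RESOURCE`, while services outside the list, such as `SVC_ACTIVATE_TASK`, pass the check.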
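Counters and relative or absolute alarms, as described above, can be sketched in C as follows. The names `counter_tick`, `set_rel_alarm` and `set_abs_alarm` are hypothetical; their increment/start and cycle parameters are modeled on OSEK's SetRelAlarm and SetAbsAlarm services (a cycle of 0 meaning a single-shot alarm). The expiry action is a plain function pointer standing in for task activation, event setting or an alarm-callback routine, and `counter_tick` is assumed to be called by whatever source drives the counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t value;     /* current counter value in ticks */
    uint32_t max_value; /* the counter wraps to 0 after max_value */
} counter_t;

typedef struct {
    counter_t *counter;   /* each alarm is bound to exactly one counter */
    void (*action)(void); /* expiry action, e.g. activating a task; may be NULL */
    uint32_t expiry;      /* counter value at which the alarm expires */
    uint32_t cycle;       /* 0: single-shot alarm, otherwise re-arm period */
    bool armed;
    unsigned expirations; /* number of times the alarm has expired */
} alarm_t;

/* Adds delta ticks to base, wrapping at the counter's maximum value. */
static uint32_t wrap_add(const counter_t *c, uint32_t base, uint32_t delta)
{
    uint64_t range = (uint64_t)c->max_value + 1u;
    return (uint32_t)(((uint64_t)base + delta) % range);
}

/* Relative alarm: expires 'increment' ticks after the current counter value. */
static void set_rel_alarm(alarm_t *a, uint32_t increment, uint32_t cycle)
{
    assert(increment > 0 && increment <= a->counter->max_value);
    a->expiry = wrap_add(a->counter, a->counter->value, increment);
    a->cycle = cycle;
    a->armed = true;
}

/* Absolute alarm: expires when the counter reaches the value 'start'. */
static void set_abs_alarm(alarm_t *a, uint32_t start, uint32_t cycle)
{
    assert(start <= a->counter->max_value);
    a->expiry = start;
    a->cycle = cycle;
    a->armed = true;
}

/* Called by the source driving the counter (timer or any other interrupt). */
static void counter_tick(counter_t *c, alarm_t alarms[], size_t n)
{
    c->value = wrap_add(c, c->value, 1);
    for (size_t i = 0; i < n; i++) {
        alarm_t *a = &alarms[i];
        if (!a->armed || a->counter != c || a->expiry != c->value)
            continue;
        a->expirations++;
        if (a->action != NULL)
            a->action();
        if (a->cycle == 0)
            a->armed = false; /* single-shot: disarm after expiry */
        else
            a->expiry = wrap_add(c, a->expiry, a->cycle); /* cyclic re-arm */
    }
}
```

For a counter with `max_value` 9 at value 0, a relative alarm with increment 3 expires on the third tick; an absolute alarm with start 5 and cycle 4 then expires at counter values 5, 9, 3 (after wrapping), 7, and so on.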
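The split between a standardized hook interface and application-defined hook functionality, described above for error handling, can be sketched in C. Everything here is hypothetical: `os_return`, `os_activate_task` and the `OS_*` codes are illustrative names, not OSEK's ErrorHook or StatusType. Only application errors are modeled; fatal errors, which shut the OS down, are left out. The guard flag mirrors OSEK's rule that the error hook is not called again for a failing service invoked from within the error hook itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status codes; 0 means success, as E_OK does in OSEK. */
typedef enum { OS_OK = 0, OS_E_ID, OS_E_STATE } os_status_t;

/* The hook interface is fixed by the OS; the body is written by the
 * application developer and registered at system generation. */
typedef void (*error_hook_t)(os_status_t error);

static error_hook_t error_hook = NULL;
static bool in_error_hook = false; /* suppresses nested hook calls */

#define NUM_TASKS 4

/* Every service returns through this function: application errors are
 * reported to the error hook, and the OS itself keeps running. */
static os_status_t os_return(os_status_t status)
{
    if (status != OS_OK && error_hook != NULL && !in_error_hook) {
        in_error_hook = true;
        error_hook(status);
        in_error_hook = false;
    }
    return status;
}

/* Hypothetical service: fails with OS_E_ID for an unknown task id.
 * The activation itself is not modeled in this sketch. */
static os_status_t os_activate_task(int task_id)
{
    if (task_id < 0 || task_id >= NUM_TASKS)
        return os_return(OS_E_ID);
    return os_return(OS_OK);
}

/* Example application hook: record the error so the application can
 * bring itself back on track. */
static unsigned app_error_count;
static os_status_t app_last_error = OS_OK;

static void app_error_hook(os_status_t error)
{
    assert(error != OS_OK); /* the hook is only invoked for errors */
    app_error_count++;
    app_last_error = error;
}
```

After registering `app_error_hook` as `error_hook`, a call such as `os_activate_task(7)` returns `OS_E_ID` to the caller and also invokes the hook once, while successful calls leave the hook untouched.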
needed to allow high-level services to operate in their respective OS environments and still give strong guarantees with respect to independence.

3.1 XM Hypercall Interface

XtratuM offers a relatively narrow interface of hypercalls to its partitions. This simplified things a lot for our porting efforts. In this section we will only briefly outline the hypercalls that were used in this porting effort; for a full list of available hypercalls we refer you to the XtratuM Reference Manual [11]. The intention of this section is to show the size of the interface used in the XtratuM guest management for an actual example.

• Time services: XtratuM provides an independent virtual time to each domain, on which the guest-OS can then implement high-level timing services. In this sense the low-level services can be seen as mimicking hardware timing services.
– XM_get_time: Time entities in XtratuM are of microsecond granularity and are maintained relative to the last system reset. There are two basic clocks in the system, one maintained for the system (XM_HW_CLOCK) and one for the partition's execution (XM_EXEC_CLOCK). Clocks in XtratuM are strictly monotonic.
– XM_set_timer: Interval timer service (providing one-shot behavior by setting the interval to 0). The expiry time is an absolute time with respect to either the hardware clock or the execution clock. The expired timer is signaled to a partition as a virtual timer interrupt (emulating a hardware timer).
• Interrupt services: Signaling to partitions is provided via virtual interrupts; it is up to the guest-OS to then assign suitable meaning and response to the events. Note the absence of an interrupt request hypercall: as all resources are allocated statically in XtratuM, there is no need for a request_irq.
– XM_enable_irqs: globally enable interrupt delivery to this partition
– XM_disable_irqs: globally disable interrupt delivery to this partition
– XM_set_irqmask: used for masking (blocking) and unmasking of interrupts
• Basic partition management functions: Much of the partition management is related to the initialization and shutdown phase of a partition. The essence of the interface is that it minimizes the state information that needs to be handled by the hypervisor, leaving more or less all state-related work to the partition.
– XM_suspend_partition: This is a basic function that is only used in supervisor mode to manage a partition. It is used to block a partition (waiting on a resource) or to temporarily stop a partition if errors are detected.
– XM_resume_partition: Simply the opposite of the above partition suspension.
– XM_shutdown_partition: As the hypervisor does not have information about the internal state of a partition, shutdown is provided as an asynchronous notification. Basically, a partition is sent a request to shut down via a dedicated interrupt and, after cleaning up any internal state, will then terminate itself.
– XM_reset_partition: In contrast to XM_shutdown_partition, XM_reset_partition is a forced shutdown of a partition, whereby a warm and a cold reset are differentiated: a warm reset preserves some of the partition's initialized resources (i.e. open ports and memory areas), while a cold reset clears all of them and thus can have side-effects on other partitions via communication channels no longer being served.
– XM_halt_partition: A halted partition is set into an inactive state, but no reclamation of resources (spatial or temporal) is done (that is left to the partition reset); in this state the partition is simply no longer scheduled by the hypervisor. Non-supervisor partitions calling XM_halt_partition can only pass themselves as the target of the halt.
– XM_idle_self: This allows a partition to suspend itself within its time slot. The partition will only be re-woken on its next time slot or if an NMI is received within its current time slot. This can be used to implement donation schemes for system partitions.
• Basic system management functions: Note that these are not directly related to the guest-OS, as these calls are related to privileged domains; they are listed here for completeness.
– XM_halt_system: The halt partition call (also described above) is used by system partitions to manage the system as a whole as well as individual partitions. Only supervisor partitions can halt other partitions. This is used to prepare a partition reset as well as mode switching.
– XM_reset_system: Brute-force system halt of the entire board; after this, only a hardware reset can reboot the system. No precautions are taken to put any partition into a sane state, so this is only the last step in a system shutdown, or a measure for extreme emergency situations.

• Low-level communication-related functions: In practical implementations one does not actually use the low-level object class functions directly, but rather the wrappers provided for the commonly used objects (sampling and queuing ports as specified in ARINC 653). These wrappers thus issue the actual hypercalls, which are rarely used directly in guest-OS code.
– XM_read_object: reads the object, verifying access permissions and other low-level properties. It is used in all reading functions such as XM_receive_queuing_message, XM_read_sampling_message, etc.
– XM_write_object: writes the object. It is used e.g. in XM_write_sampling/queuing_message and XM_send_queuing_message.
– XM_ctrl_object: used to create and manage objects with specific properties as well as to query these objects (e.g. retrieve the id of the object). This hypercall is used in object management functions such as XM_create_sampling/queuing_port, XM_get_sampling/queuing_port_status, etc.

While the overall hypercall set is somewhat more elaborate than listed here, the essential calls used to implement the OSEK guest-OS are listed, showing how small such a guest-OS interface can actually be if the abstraction level is pulled down far enough. A full description of the interface is, however, out of scope for this paper.

4 FreeOSEK

FreeOSEK [2] is an OSEK implementation started by Mariano Cerdeiro. It originally ran on ARM and on POSIX-compliant platforms (the latter is just a simulation environment, running FreeOSEK as a user-space process, intended to allow everyone to test it on a normal Linux desktop).

FreeOSEK is licensed under the GPLv3 with linking exception. This means that you can link your code with FreeOSEK and still license your code under whatever license you want (free or proprietary).

According to the FreeOSEK homepage, it currently runs about 80% of the OSEK OS conformance tests, and about 95% of those pass. In addition, FreeOSEK is checked with the static code checking tool splint.

Fortunately, large parts of FreeOSEK are generic C code (e.g. the task scheduler), and only the parts that deal directly with hardware had to be adapted (see section 5 for details).

While OSEK OS is almost complete, OSEK Com is more or less non-existent in FreeOSEK. This is no big problem for us, as we will see later in section 5.4, since most of the functionality needed for OSEK Com compliant communication is already provided by XtratuM.

5 Porting Efforts

The following section describes the effort required to run FreeOSEK as a run-time environment in an XtratuM partition.

This also includes a description of which steps have already been achieved successfully, and gives insight into the parts that will need more work. To anticipate the most important point: as of this writing, FreeOSEK can be used as an XtratuM run-time environment, but more work will be needed to make a fully compliant version possible; most notably, the task management and communication subsystems will need some (re)work.

5.1 Adaptation of the Build System

The first step to running FreeOSEK inside an XtratuM partition was to adapt FreeOSEK's build system so that the resulting binary would be accepted by XtratuM. The most important point here is that FreeOSEK must not be compiled as an executable binary; instead it has to be compiled as a relocatable object that can be linked into an XtratuM partition (if necessary even into multiple partitions) at a memory address that is specified at configuration time in the XtratuM configuration file.
After this stage it is already possible to boot into FreeOSEK and to put some xprintf¹ calls into the init code. Since most of the initialization code is generic (e.g. loading the data of the application's tasks), this already works without any changes to the FreeOSEK code base. The next point that really needed attention was the x86-specific code for the task switches.

¹ xprintf is a library function of libxm wrapping XM_write_console, giving the application programmer a way to use formatted printing.

5.2 Task Management

To ensure flawless scheduling of tasks, it has to be guaranteed that for each possible point of rescheduling the transition from the old to the new task is done properly.

Which actions have to be performed during dispatching depends on the event that led to the rescheduling, that is, on the point of rescheduling itself.

OSEK OS lists the following 4 points of rescheduling for non-preemptive scheduling:

• Task Termination
• explicit activation of successor task
• explicit call of the scheduler
• a transition into a waiting state takes place

Let's have a quick look at those four points of rescheduling. The first two can be handled really easily: for those two, the task context of the old task does not have to be saved, since the old task terminates before the new task is scheduled. Therefore, all that was needed to get a basic version of FreeOSEK running on XtratuM was to set the stack pointer to the stack of the new task and jump into the task itself. This way, simple examples that activate non-preemptive tasks and chain non-preemptive tasks can already be run.

If preemptive scheduling is desired, the following extended list of points of rescheduling has to be considered:

• Task Termination
• explicit activation of successor task
• activation of a task at task level
• explicit call of the scheduler
• a transition into a waiting state takes place
• setting a task to a waiting state
• release of a resource at the task level
• return from interrupt level to task level

In order to allow preemption of tasks (either voluntarily, by going into waiting states, or involuntarily, by hitting one of the points of rescheduling from the above list), the context has to be saved before and restored after rescheduling. This part of the task management is not clean yet and will need some rework before it can be considered done. For a proof of concept as required by the OVERSEE project, other parts of OSEK are more important and will therefore be handled before task management is finished.

5.3 Counters and Alarms

As described above, one way a task can be activated is when an alarm expires. Each alarm is triggered by exactly one counter.

Counters can be incremented by all kinds of events, but one of the most common ones are timers, which allow timed activation of tasks. All that had to be done to allow alarms that wake up tasks was to add an IRQ handler that is triggered by the virtualized XM timer interrupts. Inside this IRQ handler a counter is incremented using the OSEK-defined IncrementCounter() call. The virtualized timer is configured in the initialization code of FreeOSEK.

Now one or more alarms can be associated with the counter in the OIL configuration file of the application, to make those alarms go off as soon as the counter has reached a limit.

An example of such a configuration could look like this (only the part that deals with counters and alarms):

COUNTER HardwareCounter {
   MAXALLOWEDVALUE = 100000;
   TICKSPERBASE = 1000;
   MINCYCLE = 1;
   TYPE = HARDWARE;
   COUNTER = HWCOUNTER0;
};
COUNTER SoftwareCounter {
   MAXALLOWEDVALUE = 100000;
   TICKSPERBASE = 100;
   MINCYCLE = 1;