Ke Yu
June 2010
Abstract
Thirdly, we propose a RTOS-centric real-time embedded software simulation
model. It provides a systematic approach for building modular software (including
both application tasks and RTOS) simulation models in SystemC. It flexibly sup-
ports mixed timing application task models. The functions and timing overheads
of the RTOS model are carefully designed and accounted for. We show that the
RTOS-centric model is both convenient and accurate for real-time software simu-
lation.
Fourthly, we integrate TLM communication interfaces in the software models,
which extend the proposed RTOS-centric software simulation model for SW/HW
inter-module TLM communication modelling.
As a whole, this thesis focuses on RTOS and real-time software modelling and
simulation in the context of SystemC-based SLD and provides guidance to soft-
ware developers about how to utilise this approach in their real-time software de-
velopment. The various aspects of research work in this thesis constitute an inte-
grated software Processing Element (PE) model, interoperable with existing TLM
hardware and communication modelling.
Table of Contents
Abstract ...................................................................................................................................iii
Chapter 2 Literature Review: Transaction-Level Modelling and System-Level RTOS
Simulation ................................................................................................................ 27
2.1 Transaction-Level Modelling and Simulation ..................................................... 28
2.1.1 Abstraction Levels and Models in TLM ................................................. 30
2.1.2 Communication Modelling in TLM ........................................................ 35
2.1.3 Embedded Software Development with TLM ........................................39
2.2 The SystemC Language ...................................................................................... 43
2.2.1 SystemC Language Features ...................................................................44
2.2.2 SystemC Discrete Event Simulation Kernel ........................................... 46
2.2.3 A SystemC SW/HW System Example .................................................... 51
2.3 RTOS Modelling and Simulation in System-level Design ..................................54
2.3.1 Coarse-Grained Timed Abstract RTOS Modelling .................................55
2.3.2 Fine-Grained Timed Native-Code RTOS Simulation ............................. 58
2.3.3 ISS-based RTOS Simulation ...................................................................60
2.3.4 The Proposed RTOS Simulation Model ................................................. 61
2.4 Summary ............................................................................................................. 62
Chapter 3 Mixed Timing Real-Time Embedded Software Modelling and Simulation ........... 65
3.1 Issues in Software Timing Simulation ................................................................ 68
3.1.1 Annotation-Dependent Time Advance.................................................... 68
3.1.2 Fine-Grained Time Annotation ............................................................... 70
3.1.3 Multiple-Grained Time Annotation ........................................................ 71
3.1.4 Result Oriented Modelling ......................................................................72
3.2 The Mixed Timing Approach .............................................................................. 75
3.2.1 Separating and Mixing Timing Issues ..................................................... 76
3.2.2 TLM Software Computation Modelling ................................................. 77
3.2.3 Defining Software Models ......................................................................80
3.2.4 Techniques for Improving Simulation Performance ............................... 87
3.2.5 Application Software Performance Estimation .......................................90
3.2.6 RTOS Performance Estimation ............................................................... 93
3.2.7 Timing Issues in Software Simulation .................................................... 95
3.3 The Live CPU Model .......................................................................................... 99
3.3.1 The HW Part of the SW Processing Element Model .............................. 99
3.3.2 The Virtual Registers Model .................................................................101
3.3.3 The Interrupt Controller Model ............................................................. 102
3.3.4 The Live CPU Simulation Engine ......................................................... 103
3.4 Evaluation Metrics ............................................................................................ 109
3.4.1 Simulation Performance Metric ............................................................ 110
3.4.2 Simulation Accuracy Metrics ................................................................ 110
3.5 Experimental Results ........................................................................................ 112
3.5.1 Performance Evaluation ........................................................................ 113
3.5.2 Accuracy Evaluation ............................................................................. 119
3.6 Summary ........................................................................................................... 121
Chapter 4 A Generic and Accurate RTOS-Centric Software Simulation Model ................. 125
4.1 Motivation and Contribution ............................................................................. 126
4.2 Research Context and Assumptions .................................................................. 127
4.3 The Embedded Software Stack Model .............................................................. 129
4.4 Common RTOS Concepts and Features ............................................................ 132
4.4.1 “Real-Time” Features of Embedded Applications ................................ 132
4.4.2 RTOS Kernel Structures ....................................................................... 134
4.4.3 RTOS Requirements and Modelling Guidance .................................... 136
4.5 The Real-Time Embedded Software Simulation Model ................................... 150
4.5.1 Simulation Model Structure .................................................................. 150
4.5.2 Application Software Modelling........................................................... 155
4.5.3 RTOS Task/Thread and Process Modelling.......................................... 159
4.5.4 Multi-Tasking Management Modelling ................................................ 165
4.5.5 Scheduler Modelling ............................................................................. 172
4.5.6 Task Synchronisation and Communication Modelling ......................... 180
4.5.7 Interrupt Handling Modelling ............................................................... 188
4.5.8 HAL Modelling .................................................................................... 194
4.5.9 General Modelling Methods for RTOS Services .................................. 197
4.6 Evaluation Metrics ............................................................................................ 202
4.6.1 Simulation Performance Metrics .......................................................... 202
4.6.2 Simulation Accuracy Metrics ............................................................... 203
4.7 Experimental Results ........................................................................................ 204
4.7.1 Multi-Tasking Simulation with µC/OS-II RTOS ................. 204
4.7.2 Interrupt Simulation with RTX RTOS .................................................. 207
4.8 Summary ........................................................................................................... 210
Chapter 5 Extending the Software PE Model with TLM Communication Interfaces .......... 213
5.1 Integrating OSCI TLM-2.0 Interfaces ............................................................... 215
5.1.1 The OSCI TLM-2.0 Standard ............................................................... 215
5.1.2 TLM Constructs in the Software PE Model.......................................... 216
5.1.3 The TLM System-on-Chip Model ........................................................ 218
5.2 Experiments ...................................................................................................... 221
5.2.1 Performance Study of TLM Models ..................................................... 221
5.2.2 DMA-Based I/O Simulation ................................................................. 223
5.3 Summary ........................................................................................................... 226
Chapter 6 Conclusions and Future Work ................................................................................. 227
6.1 Summary of Contributions ................................................................................ 227
6.2 Conclusions .......................................................................................................229
6.2.1 The Mixed Timing Approach ............................................................... 229
6.2.2 The Live CPU Model ............................................................................ 230
6.2.3 The RTOS-Centric Real-Time Software Simulation Model ................. 230
6.2.4 Extending Software Models for TLM Communication ........................ 231
6.3 Future Work ......................................................................................................232
6.3.1 Improving Timing Modelling Techniques ............................................ 232
6.3.2 Enriching RTOS Model Features .......................................................... 232
6.3.3 Multi-Processor RTOS Modelling ........................................................ 233
List of Tables
Table 4-16. SystemC implementation code of the sem_wait() function ........................................184
Table 4-17. Mutex services in the RTOS model and some RTOSs............................................... 186
Table 4-18. POSIX-like mutex APIs in the RTOS model .............................................................. 186
Table 4-19. Message queue services in the RTOS model and some RTOSs .................................188
Table 4-20. POSIX-like message queue APIs in the RTOS model ............................................... 188
Table 4-21. Time advance methods for RTOS services ................................................................ 201
Table 4-22. Accuracy loss of the RTOS-centric simulation compared with ISS........................... 207
Table 4-23. Simulation speed comparison..................................................................................... 208
Table 4-24. Interrupt handling in the RTOS-centric simulator...................................................... 209
Table 4-25. Timing accuracy losses .............................................................................................. 210
Table 5-1. TLM implementation in the software PE model .......................................................... 217
Table 5-2. LT and AT targets ........................................................................................................219
Table 5-3. Implementation of the DMA controller........................................................................220
List of Figures
Figure 3-14. Hardware part of the software PE model ..................................................................100
Figure 3-15. Interrupt Controller Model ........................................................................................ 103
Figure 3-16. Real CPU execution and Live CPU simulation ........................................................ 104
Figure 3-17. Operations of the Live CPU Simulation Engine ....................................................... 106
Figure 3-18. Simulation time results ............................................................................................. 115
Figure 3-19. Simulation time comparison ..................................................................................... 117
Figure 3-20. Comparison of varying fixed-step lengths ................................................................ 118
Figure 3-21. Interrupt handling experiment ................................................................................... 120
Figure 4-1. Software part of the software PE model .....................................................................127
Figure 4-2. Embedded software stack and its abstract model ........................................................ 130
Figure 4-3. Timing parameters of a real-time task ........................................................................133
Figure 4-4. Block diagrams of two RTOS kernel approaches ....................................................... 135
Figure 4-5. Two definitions of interrupt latency and task switching latency ................................ 138
Figure 4-6. The classical three-state task state machine ................................................................ 140
Figure 4-7. Structure of the software PE model ............................................................................ 150
Figure 4-8. SystemC implementation of the software PE simulation model .................................154
Figure 4-9. Defining a RTOS task model ...................................................................................... 160
Figure 4-10. Initialising TCBs .......................................................................................................163
Figure 4-11. Task state machines: reprint A [8] [11], B [12] ........................................................ 166
Figure 4-12. The proposed four-state extensible task state machine ............................................. 167
Figure 4-13. A priority-descending doubly linked task queue ...................................................... 169
Figure 4-14. Priority setting in the RTOS task model ...................................................................173
Figure 4-15. FPS scheduler working flow ..................................................................................... 175
Figure 4-16. Tick scheduling model .............................................................................................. 177
Figure 4-17. Calculating absolute deadlines of tasks in simulation ................................................ 179
Figure 4-18. Message queue control block ..................................................................................... 187
Figure 4-19. RTOS-assisted (non-vectored) interrupt handling model .......................................... 191
Figure 4-20. Vector-based interrupt handling model.....................................................................193
Figure 4-21. TIMA laboratory’s HAL modelling work ................................................................ 194
Figure 4-22. Context switch service .............................................................................................. 196
Figure 4-23. Unmatched RTOS service execution and simulation traces .....................................199
Figure 4-24. Evaluating the timing accuracy by comparing traces ............................................... 203
Figure 4-25. Experiment setup ......................................................................................................204
Figure 4-26. Simulation speed comparison ................................................................................... 205
Figure 4-27. Simulation output comparison .................................................................................. 206
Figure 4-28. Simulation timing accuracy comparison ...................................................................206
Figure 4-29. Interrupt handling experiment ................................................................................... 208
Figure 4-30. RTX interrupt handling in the ISS ............................................................................ 209
Figure 4-31. Simulation timing accuracy comparison ...................................................................210
Figure 5-1. TLM communication interface of the software PE model .......................................... 213
Figure 5-2. OSCI TLM-2.0 essentials ........................................................................................... 216
Figure 5-3. Combining software PE model with TLM interfaces and SoC models ...................... 218
Figure 5-4. The DMA controller model ........................................................................................ 220
Figure 5-5. Simulation performance results .................................................................................. 223
Figure 5-6. The simulation log of the DMA experiment ............................................................... 225
Figure 5-7. Simulation timeline .................................................................................................... 226
List of Acronyms
HW Hardware
HAL Hardware Abstraction Layer
HDL Hardware Description Language
HdS Hardware-dependent Software
I/O Input/Output
IC Integrated Circuit
IMC Interface Method Call
IP Intellectual Property
IPC Inter-Process Communication
IPCP Immediate Priority Ceiling Protocol
IRQ Interrupt Request
ISA Instruction Set Architecture
ISCS Instruction Set Compiled Simulation
ISR Interrupt Service Routine
ISS Instruction Set Simulation
ITRS International Technology Roadmap for Semiconductors
LT Loosely-Timed
MMU Memory Management Unit
NRE Non-Recurring Engineering
NRT Non-Real-Time
OCP-IP Open Core Protocol International Partnership
OSCI Open SystemC Initiative
OS Operating System
PCB Printed Circuit Board
PCP Priority Ceiling Protocol
PE Processing Element
PIP Priority Inheritance Protocol
POSIX Portable Operating System Interface
PV Programmer's View
PVT Programmer's View Timed
RM Rate Monotonic
ROM Read Only Memory
ROM Result Oriented Modelling
RR Round-Robin
RT-CORBA Real-Time Common Object Request Broker Architecture
RTES Real-Time Embedded System
RTL Register-Transfer Level
RTOS Real-Time Operating System
RTS Real-Time System
RTSJ Real-Time Specification for Java
RTX Real Time eXecutive
SHaRK Soft Hard Real-time Kernel
SW Software
SLDL System-Level Design Language
SoC System-on-Chip
TCB Task Control Block
TLM Transaction-Level Modelling
UML Unified Modelling Language
VHDL Very-high-speed integrated circuit Hardware Description Language
WCET Worst-Case Execution Time
µITRON micro Industrial The Real-time Operating system Nucleus
Acknowledgements
I am most grateful to my supervisor Dr. Neil Audsley for his constant and
valuable support and guidance during my PhD study at the University of York.
I would also like to thank my assessors Professor Andy Wellings and Dr.
Leandro Soares Indrusiak for their advice and help in my research.
I give all my love to my parents Yu Shiliang and Song Yipu for their endless
love. This PhD thesis is also my sincere gift to them.
I am full of gratitude to Ms. Zhang Jing. She gave invaluable spiritual support
to me during the bittersweet PhD years.
I would like to express my thanks to all colleagues and friends in Real-Time
Systems Research Group. In particular, I thank Dr. Chang Yang, Dr. Shi Zheng,
Dr. Gao Rui, Dr. Zhang Fengxiang, Dr. Kim Min Seong, Lin Shiyao, and Mrs Sue
Helliwell for their help and the experience they shared with me. I also thank Qian
Jun, Shen Jie, Yao Yining, Dr. Liu Yang, and Dr. Chen Jingxin for our friendship
and the cheerful times we shared in the UK.
Declaration
The research work presented in this thesis was independently and originally
undertaken by me between October 2005 and June 2010 with advice from my su-
pervisor Dr. Neil Audsley. Three conference papers have been published:
K. Yu and N. Audsley, "A Mixed Timing System-level Embedded Software
Modelling and Simulation Approach," in 6th International Conference on Embed-
ded Software and Systems 2009, (ICESS '09), 2009. [13] This paper received the
best paper award at the conference.
K. Yu and N. Audsley, "A Generic and Accurate RTOS-centric Embedded
System Modelling and Simulation Framework," in 5th UK Embedded Forum
2009 (UKEF '09), 2009. [14]
K. Yu and N. Audsley, "Combining Behavioural Real-time Software Model-
ling with the OSCI TLM-2.0 Communication Standard," in 7th International Con-
ference on Embedded Software and Systems 2010, (ICESS '10), 2010. [15]
Chapter 1
Introduction
actions with the real world” and “timing requirements of these interactions” are its
two essential characteristics [17]. A RTS receives physical events from the real-
world environment. These events are processed inside the RTS, which then responds with appropriate actions. Timing requirements mean that the corresponding
output must be generated from the input within a finite and specified timing
bound, giving the deterministic timing behaviour. The correctness of a RTS de-
pends not only on the computation result, but also on the time when the result is
produced. “Real-time” does not mean “as fast as possible”, but emphasises “on time”: an output that arrives too late, or too early, is equally incorrect. The vast majority
of embedded systems have real-time requirements, and most real-time systems are
embedded in products. At their intersection are Real-Time Embedded Systems
(RTES). The Operating System (OS) used in a RTES is usually a Real-Time Op-
erating System (RTOS), which supports the construction of RTSs [16]. RTESs
and RTOSs are the general context for this thesis.
From the perspective of system design, an embedded system is constructed
from various hardware and software components. As illustrated in Figure 1-1,
they can be classified into four reference layers [18]. The architecture of an em-
bedded system represents an abstraction model including all embedded compo-
nents. It introduces relationships between abstract hardware and software ele-
ments without implementation details.
All embedded systems have a hardware layer, which contains electronic components and circuits located on a Printed Circuit Board (PCB) or on an Integrated
Circuit (IC). Although some time-critical or power-hungry portions of a system
can be implemented with customised application-specific hardware (e.g., Applica-
tion-Specific Integrated Circuits (ASICs), Field-Programmable Gate Arrays
(FPGAs)), most embedded systems mainly function through software running on
embedded General-purpose Programmable Processors (GPPs) (e.g., Central Proc-
essing Units (CPUs) or Digital Signal Processors (DSPs)). With the development
of the microelectronics industry, Systems-on-Chips (SoCs) have emerged as the
state-of-the-art implementation of embedded systems. A SoC is an integrated cir-
cuit combining multiple GPPs, customised cores, memories, peripheral interfaces,
as well as communication fabric, all on a single silicon chip, which provides sub-
stantial computation capability for handling complex concurrent real-world events.
Comparing the different embedded hardware solutions as indicated above, appli-
cation-specific hardware offers high computing performance and low power con-
sumption at the expense of limited programming flexibility, whilst GPPs offer
higher design flexibility and lower Non-Recurring Engineering (NRE) costs, but
with a relatively low computing capability [16].
In general, embedded software can be grouped into three layers: the application
software layer, the middleware layer, and the system software layer. The applica-
tion functions of an embedded system consist of a task or a set of tasks.
Middleware is an optional layer under application software but on top of sys-
tem software. Middleware provides general services for applications, such as
flexible scheduling [19], distributed computing (e.g., Real-Time Common Object
Request Broker Architecture (RT-CORBA) [20]), and Java application environ-
ment (e.g., Real-Time Specification for Java (RTSJ) [21]). Using middleware
technologies helps to reduce application complexity, simplify application migration, and ensure correct implementation of reusable functions.
The system software layer is sandwiched between upper-level software and
bottom-layer hardware. It usually contains device drivers, boot firmware and
RTOS, which closely interact with the hardware platform. This kind of software is
also called Hardware-dependent Software (HdS) [22]. Device drivers, e.g., a
Board Support Package (BSP) for a given platform, are the interface between any
software and underlying hardware. They are the software libraries that take charge
of initialising hardware and managing direct access to hardware for higher layers
of software [18]. Boot firmware, e.g., the Basic I/O System (BIOS), carries out
the initial self-test process for an embedded system and initiates the RTOS. It is
usually stored in the Read-Only Memory (ROM).
Regarding the RTOS, it is unnecessary and cost-inefficient to introduce a RTOS in some small embedded devices, where an infinite-loop program that polls for Input/Output (I/O) events may work well [23]. However, in or-
der to satisfy the complex functional requirements and timing constraints for con-
current real-time software execution, the RTOS has become an essential compo-
nent in most embedded systems. Here, concurrent real-time software execution
refers to situations in which, under the control of a RTOS, multiple tasks either share a
uniprocessor in interleaving steps or execute on multiple processors in parallel. A
RTOS is needed to provide convenient interfaces and comprehensive control
mechanisms to let applications utilise and share hardware and software resources
effectively and reliably. The kernel is the core element of a RTOS and contains
the most essential functions. Most kernels provide task priorities, dynamic pre-emptive scheduling, synchronisation primitives, timing services, and interrupt handling services [24] [25] [26]. Other OS features such as
memory management, file systems, device I/O etc. are often optional in a RTOS
in order to maintain its compactness and scalability. As a central part of the real-
time embedded software stack, a RTOS’s own timing behaviour also needs to be
predictable and computable. Designers must know some important RTOS timing
properties, for example, the context switch time, Worst-Case Execution Times
(WCETs) of system calls, the interrupt handling latency, and the maximum inter-
rupts disabled time. With these properties, they can analyse and evaluate the real-time per-
formance of the whole system.
The research in this thesis will investigate how to model RTOS kernel func-
tional and timing behaviours in order to support high-level real-time software
simulation in a uniprocessor system.
1.2 Challenges in Embedded System Design
In recent years, the complexity of embedded software has increased rapidly.
According to the International Technology Roadmap for Semiconductors (ITRS)
2007 Edition (ITRS 2007), embedded software design has emerged as “the most
critical challenge of SoC productivity” [4]. For many products of consumer elec-
tronics, the amount of software per product is thought to double every two years [27]. The General Motors Information Systems CTO predicts that the average car, with one million lines of software code in 1990, will run one hundred million lines by 2010 [28]. Figure 1-2 shows growing trends of embedded soft-
ware complexity in motor and mobile phone industries.
Figure 1-2. Embedded software size increases in industry (reprint [5] [10])
Figure 1-4. Gaps between the design complexity and productivity (reprint [4])
flow (see Figure 1-3), software development cannot proceed until the hardware prototype is available. This means that software designers often face
imminent product delivery deadlines [30].
There is also a big gap between ever-growing semiconductor fabrication capa-
bility and the design productivity of embedded systems (including both HW and
SW aspects) [31]. The ITRS 2007 presents a summary about hardware and soft-
ware design gaps and Figure 1-4 is the pictorial illustration [4]. In Figure 1-4, re-
garding the HW design aspect, the cutting-edge embedded HW advancements and
design methodologies, e.g., multi-core/processor components and Intellectual
Property (IP) reuse, have somewhat narrowed the distance between HW design
productivity and HW technology capabilities. Unfortunately, while SW complexity is already enormous, these HW advances further increase demand for HdS development. As shown in the figure, SW productivity lags
further behind the steeply increasing SW complexity. An industrial report even
indicates that rapidly increasing software design efforts may exceed the cost of
hardware development when IC technologies evolve from deep submicron-scale
to nano-scale [29].
1.3 System-Level Design Methodologies
Motivated by the challenges outlined above, since the 1990s, System-Level
Design (SLD), or so-called Electronic System-Level design (ESL), and corre-
sponding System-Level Design Languages (SLDLs) have been developed as ena-
bling tools for embedded system specification, simulation, implementation and
verification [32].
In the view of the Electronic Design Automation (EDA) industry, SLD is positioned
at “a new level of abstraction above the familiar register-transfer level” [4]. This
definition reflects a hardware-centric viewpoint. A more complete definition em-
phasises “the concurrent hardware and software design interaction” as a guiding
concept in a SLD process [17], that is, the HW/SW codesign [33] philosophy is
inherent in SLD methodologies.
and synthesis in recent years. In this thesis, SystemC is the research tool for soft-
ware modelling and simulation.
accepted SLD approach Transaction-Level Modelling (TLM) [3]. TLM methods
often define a number of intermediate computation and communication models
for simulation in a design flow. At each level, models include necessary func-
tional and timing details for a specific design stage. An important TLM research
topic is the trade-off between simulation performance and the accuracy of differ-
ent models. The research in this thesis is also concerned with this trade-off.
Figure 1-3. A typical system-level design flow: a system specification (application functionality and requirements) is refined into an executable specification model (e.g., untimed); hardware/software partitioning and mapping during architecture exploration yield behavioural, cycle-approximate hardware and software models connected by communication channels; further refinement through high-level synthesis, software generation, and communication (interface) synthesis produces cycle-accurate models, RTL hardware implementations in HDLs (e.g., Verilog, VHDL), communication topologies and protocols, and target-compilable software in C/C++, which link to logic synthesis, integration, and physical design
[52]) can also be used to produce formal or executable models. These models can
describe behaviour of a system and may become a vehicle for next-step system
refinement.
The architecture exploration phase, also called the hardware/software partitioning
and mapping phase, is concerned with how to distribute system functions between
hardware and software, i.e., Design Space Exploration (DSE). This phase can be
further divided into the pre-partitioning step, the partitioning step, and the post-
partitioning step, according to a detailed design flow explanation in [32]. Usually,
this design phase starts from a unified abstract TLM model, which comprises a set
of PEs for computation and channels for communication. These PE models are
explored for implementation in either HW (i.e., application-specific hardware logic) or
SW (i.e., programs running on a GPP), and channel models are tried with various
abstract communication topologies and protocols. These TLM models are succes-
sively refined, with timing information and implementation details added. Various
alternatives are simulated in order to evaluate and analyse diverse system characteristics,
e.g., functional correctness, scheduling decisions, real-time performance,
power consumption, chip area, and communication bandwidth. Once a system’s
functions have been partitioned and mapped onto hardware and software
elements, a golden architecture model [46] comes into being and the implementation
step can begin. This thesis studies RTOS and real-time software
behavioural modelling and simulation, which can be seen as post-partitioning
TLM software PE computation research within the architecture exploration
phase. Our research is relevant to current SLD and TLM research, in
terms of comparable abstract modelling styles, fast simulation performance,
reasonable accuracy, and interoperability with other system-level abstract
hardware and communication models.
In the architecture implementation phase, the previous architectural models are
transformed into lower-level models through automated synthesis for final product
implementation and manufacturing. On the hardware side, emerging high-level
synthesis technologies (sometimes also referred to as Electronic System-Level
synthesis, system synthesis, or behavioural synthesis) aim to synthesise
HW models written in high-level languages (e.g., C, C++, SpecC, SystemC)
into synthesisable RTL descriptions, which are the input to the existing
“RTL to Layout” design flow [32]. This automated high-level synthesis process
connects system-level design with the current design flow in order to produce
actual integrated circuits. Although there is a substantial body of research in
this domain, automatic high-level synthesis is still considered immature [53]
and has “never gained industrial relevance” [54]. In SLDL-based system-level
design, communication synthesis (also known as interface synthesis) aims to map
TLM channels or similar high-level interfaces to a set of synthesisable cycle-
accurate software protocols and RTL descriptions of target communication to-
pologies [55]. There are several approaches regarding bus-based communication
synthesis [56] [57] and on-chip communication network synthesis [58] [59].
More complete surveys on this topic can be found in [54] and [17]. In high-level
software synthesis (namely target software generation), embedded software (including
the applications, the RTOS and other HdS) implementation models (i.e.,
C/C++ code that is ready to be compiled into binaries for a target instruction set)
can be generated from TLM software PE models written in SLDLs [60] [61]. Sev-
eral approaches have investigated embedded software target code generation, in
which SLDL functions or generic RTOS services in TLM models are mapped and
translated to the Application Program Interface (API) of a specific RTOS [43] [62]
[63] [64] [65].
1.4.1 SystemC
SystemC is the most commonly used C++ based SLDL. It has been developed
by the Open SystemC Initiative (OSCI) since 1999 [38]. The initial
SystemC versions 0.9 and 1.0 concentrated on describing hardware-centric RTL
features, with the goal of replacing Verilog and VHDL as a new HDL and realising
high-level synthesis. From version 2.0 onwards, its focus shifted to high-level
computation and communication modelling, making it an effective SLDL.
It was approved as an IEEE standard in 2006 [66] and is currently
the de facto industry standard for ESL specification, modelling, simulation,
verification and synthesis.
The syntax of SystemC is based on the standard C++ language. It is not a brand
new language but a set of C++ libraries together with a discrete-event simulation
kernel that is also built with C++. A mixture of software programs written with
SystemC and C++ can be compiled by a standard C++ compiler (e.g., GCC or
Visual C++) and linked with SystemC libraries in order to generate an executable
simulation program.
A module (SC_MODULE), namely a class, is the basic SystemC language con-
struct to describe an independent functional component. It contains a variety of
elements to define behaviour and structure of a model, e.g., data variables, com-
putation processes, communication ports and interfaces, etc. SystemC supports
the hierarchical model structure, which means a parent module can include
instantiations of other modules as member data. This characteristic helps to break
a large system down into manageable sub-models. The main SystemC mechanisms
for inter-module communication are channels (sc_channel), which can
be either a simple signal (sc_signal) or a complex hierarchical structure such
as the Advanced Microcontroller Bus Architecture (AMBA) bus [67]. The com-
munication methods implemented by channels are named interfaces, which are
abstract classes declaring pure virtual methods. A module accesses a channel
through a port by calling interface methods. In this way, computation and com-
munication can be explicitly separated and modelled in SystemC.
SystemC uses a discrete-event simulation kernel, which relies on a co-operative
(so-called co-routine) execution model [68]. It supports neither priority
assignment nor pre-emption. Only one SystemC process can execute at a time.
The executing process cannot be pre-empted or interrupted by either the kernel or
another process. A process only yields control to the kernel by calling wait-for-time
and wait-for-event functions of its own accord. When two processes are ready at
the same simulation time, it is non-deterministic which process the kernel will choose
to run. This characteristic suits parallel hardware operations and outperforms
a pre-emptive simulation kernel in simulation speed because of lower
context-switch overheads [69]. However, it is not suitable for concurrent
real-time software simulation, which requires pre-emptive and deterministic
scheduling services. This deficiency can be problematic when importing legacy
real-time software into SystemC. Some research pessimistically abandoned
real-time software simulation in SystemC [70]. Nevertheless, many researchers
have proposed remedies that mitigate this problem, e.g., extending the SystemC
language with process control constructs [71], revising the SystemC simulation
kernel [69] [68], or implementing RTOS functions on top of the SystemC library
[72] [73]. This thesis presents a more complete solution in the last direction.
1.4.2 SpecC
SpecC is a system specification and description language that extends the
standard C language [39]. The SpecC language and associated design
methodologies were originally developed at the University of California, Irvine,
beginning in the mid-1990s and continuing to the present day. In contrast to
SystemC, SpecC introduces new keywords to the C language, so it requires a special
SpecC Reference Compiler [74]. Many design concepts (e.g., the separation of
communication and computation) and language constructs (e.g., modular structure
descriptions) of SpecC were shared or adopted in the development of SystemC.
Likewise, both SpecC and SystemC can fulfil specification, verification and
synthesis tasks at multiple levels in SLD and TLM. Their similarities and
differences are introduced and compared in [44].
1.4.3 SystemVerilog
Arising from the semiconductor and electronic design industry, SystemVerilog
is a hardware description and verification language based on extensions of Ver-
ilog [75]. In addition to features available in classical Verilog, SystemVerilog
provides new verification and object-oriented programming facilities, such as
assertions, coverage, constrained random generation, and built-in synchronisation
primitives and classes. Although SystemVerilog offers both internal object-oriented
software features and a direct programming interface for calling external C
functions, its scope is mostly confined to hardware design, simulation and
verification [76] [32].
[Figure 1-6: an interpretive ISS model — the input program binary in the target memory space is repeatedly fetched, decoded, dispatched and executed, updating the general and special registers.]
structions being executed on a target machine, as shown in Figure 1-6. The main
advantage of ISS simulation is fine-grained functional and timing accuracy, so
various ISS simulators are traditionally used by software programmers to debug
cross-compiled target programs in place of real hardware. In system-level
design, ISS simulators can also serve as references against which corresponding
cycle-approximate simulators are evaluated. However, simulation performance is a
drawback of the ISS approach, because its interpretive simulation process incurs a
large overhead. Typically, ISS simulators run on the order of 100K cycles per second [78],
which is not a satisfactory speed for simulating large amounts of software in
system-level design [79]. Moreover, an ISS simulator needs a detailed ISA-level
processor simulation model, which may not be available at the desired high level of
abstraction in early design stages.
Host-compiled ISS is an improved approach that addresses the performance
disadvantage of traditional interpretive ISS methods [80]. The central
idea of this technique is to translate the target machine’s instructions into the host
machine’s at software compile time. This binary-to-binary translation avoids the large
run-time overhead of the interpretive simulation process, resulting in faster
simulation. The host compilation ISS research in [80] reports a speedup of three
orders of magnitude over interpretive ISS. Unfortunately, this approach also
has some deficiencies. It assumes that software
does not change at run time; as a result, it is not suited to self-modifying code [80].
Poor portability is another problem, because a compiled ISS is not applicable to
processors with different instruction sets [77] [81]. The Instruction Set Compiled
Simulation (ISCS) [81] technique combines the performance of a compilation-
based approach with the flexibility of an interpretive ISS, by moving the decode
step to compile time and carrying out various compile time optimisations. It
claims a 70% simulation performance improvement compared with the best-
known results in its domain. However, it still faces challenges in terms of
long compilation times and large memory usage [77]. In general, the simulation
performance of ISS approaches is perceived as a bottleneck for a rapid design
space exploration at the system level [79] [82].
[Figure 1-7: SLDL-based timed software simulation — software processes (e.g., process 1 wait(2), process 2 wait(7), process 3 wait(4), process 4 wait(3)) execute natively with target-delay annotations; the SLDL simulation kernel evaluates and schedules the processes and progresses time at pre-defined wait(t) synchronisation points within the SLDL simulation framework.]
scheduling, and external TLM communication [79]. We will adapt them to reflect
our software/RTOS-centric research perspective in the following section.
1.6.1 Timed Software Simulation
As shown in Figure 1-7, in SLDL-based timed software simulation, embedded
software (both applications and the RTOS) is organised (wrapped) into several
concurrent processes in a SLDL simulation framework. These processes natively
execute on the host under the supervision of a co-operative SLDL simulation ker-
nel. Since the desired timing behaviour of target software execution cannot be di-
rectly represented in native software execution, estimated software execution
costs (time delays) on the target are manually or automatically annotated to corre-
sponding code segments of simulation processes. These time delays are executed
by SLDL wait(delay) statements in order to suspend the calling process, pass con-
trol to the kernel, and advance the simulator clock. In this way, the timing behaviour
of real software execution on the target machine is simulated.
According to the above description, in this co-operative SLDL execution
model, a number of wait(delay) statements are annotated into software processes
when building the model. They in effect predefine synchronisation points between
software processes and the SLDL kernel. At simulation runtime, software processes
can only yield the running status at these points, and simulated time advances
according to the annotated delays with no possibility of interruption. This
annotation-dependent software time advance method makes it hard to model a
pre-emptive real-time system. Intuitive but halfway solutions tackle this problem
by using more wait() statements with fine-grained delays to advance SW time
[87], or by inserting imperative synchronisation points [3]. However, timing
accuracy is only moderately improved, at the cost of large modelling overheads
(more annotation and synchronisation) and simulation overheads (frequent
simulation kernel context switches).
A RTOS simulation model is a key element for handling dynamic scheduling and timing
issues in behavioural real-time software simulation [72] [77]. This is because of the
RTOS’s crucial role in embedded real-time software layers, in terms of task
management, pre-emptive scheduling, inter-task communication and synchronisation,
etc. However, current SLDL simulation frameworks and related RTOS simulation
models do not, in general, support RTOS simulation adequately. Several
problems in this area affect the functional and timing accuracy of models,
as well as their simulation performance.
For example, from the perspective of maximising flexibility of system-level
design, designers may want to simulate multiple types of application models to-
gether. Current RTOS modelling research does not address this issue sufficiently
and is incapable of integrating abstract task models (i.e., void or simple task func-
tions with coarse-grained execution time estimates) and native-code task models
(i.e., fully functional tasks with fine-grained delay annotations) in one simulator.
Besides, from the perspective of practical RTOS simulation, some RTOS models
provide simplistic task management and limited synchronisation services,
which are inadequate for imitating the behaviour of a real multitasking RTOS.
Furthermore, low timing accuracy is a common yet critical problem in
some RTOS modelling approaches, owing to the lack of modelling of RTOS
services’ timing overheads and of proper time advance.
they do not support run-time process pre-emption or interruption. Consequently, it
is essential to implement a HW/SW synchronisation method for SLDL-based
software simulation, which behaves like an interrupt controller in a real CPU in
order to monitor external events and interrupt the executing SLDL process. In
addition, this mechanism should minimise the synchronisation frequency so as to
reduce simulation time overhead, which current approaches do not yet achieve
well.
orders of magnitude faster than traditional ISS simulations and is also
better than some related behavioural software simulation methods.
b. Flexibility is a desired benefit of software behavioural modelling and
simulation for the sake of trade-offs. The proposed approach can apply
varying modelling levels and degrees to different software models in
terms of functional accuracy, timing accuracy, observability of execution
traces, and simulation performance.
c. Regarding the timing accuracy of software time advance, the proposed
approach should avoid the conventional “annotation-dependent”
uninterruptible time advance; rather, it should support interruptible
time advance.
d. Although the timing accuracy of behavioural software simulation is
restricted by its high modelling level, it should still be sufficient to
generate a timed software execution trace that matches a corresponding
ISS simulation.
2) Build an abstract CPU model, which can simulate HW/SW interactions and
support high-level interruptible software timing simulation.
a. The HW/SW timing synchronisation (i.e., interrupt handling) problem
must be solved, since it is related to interruptible software time advance.
b. A limited amount of abstract hardware modelling is needed to support
hardware-dependent software service models, e.g., context switching,
interrupt servicing, and real-time clock services.
c. The organisation of software models and hardware models should
mimic the typical structure of an embedded system, and be extensible
for future development.
3) Capture essential and common RTOS features and build a generic RTOS
model, in order to flexibly support early and practical simulation of real-
time software in SystemC-based system-level design.
a. The RTOS model should provide generic and standardised multi-
tasking, scheduling and synchronisation services as well as other nec-
essary OS functions.
b. In order to enhance modelling flexibility on application tasks, the
RTOS simulation model should support both coarse-grained timed abstract
task models and fine-grained timed native applications in a hybrid
simulation.
c. The RTOS model should achieve accurate simulation in terms of both
timing accuracy and functional results.
4) Incorporate limited TLM communication into software models for transac-
tion-based inter-module communication modelling, in order to make soft-
ware models interoperable with existing TLM modelling and simulation
concepts and techniques.
timing software behavioural simulation. Also, the Live CPU Model includes an
interrupt controller and some virtual registers, which are actively involved in
HW/SW synchronisation modelling and hardware-dependent software modelling.
By this means, the theoretical interrupt modelling latency and software time advance
stopping latency can reach zero simulated time, which represents an ideal
resolution.
In terms of Objective 3, the third part of the research focuses on the develop-
ment of a generic and accurate SystemC-based RTOS-centric real-time software
simulation framework. It integrates mixed timing application models, the RTOS,
and the Live CPU Model in a software PE model. The software core is the generic
RTOS simulation model. It supplies a set of fundamental and practical services
including multi-tasking management, scheduling services, synchronisation and
inter-task communication mechanisms, clock services, context switch and soft-
ware interrupt handling services, etc. These functions are summarised and ab-
stracted from a survey of some popular RTOS standards and products. To build a
predictable RTOS timing model, the timing overheads of various RTOS services
are considered in the models, which is an advantage over some similar works.
The dynamic execution scenarios of real-time embedded software can be exposed
by tracing diverse system events and values in simulation, e.g., RTOS kernel calls,
RTOS runtime overheads, task execution times, dynamic scheduling decisions,
task synchronisation and communication activities, interrupt handling latencies,
context switch times, and other user-concerned properties. With this RTOS-
centric simulation framework, real-time embedded software designers can quickly
and accurately simulate and evaluate the behaviour of both abstract and native
real-time applications and the RTOS during the early design phases.
Objective 4 is fulfilled by integrating the de facto standard OSCI TLM-2.0 [88]
communication interfaces into the real-time software PE simulation model developed
in the second and third parts of the research. This work also defines a SoC TLM
model, which not only integrates the software PE model but also includes other
typical TLM initiator, target, and interconnect models. This part of the work
extends the software simulation models to the TLM modelling community.
1.7 Organisation of the Thesis
The remainder of this thesis is organised as follows:
Chapter 2 Literature Review: Transaction-Level Modelling and System-
Level RTOS Simulation
This chapter will introduce current TLM research, describe the SystemC SLDL,
and survey RTOS modelling and simulation research in the context of system-
level design.
This chapter will start with an overview of important concepts and techniques
in TLM design, including various topics such as abstraction levels, accu-
racy/performance trade-off, and typical simulation frameworks. After that, some
important SystemC language constructs and the OSCI reference simulator will be
introduced, along with their relevance to the real-time software simulation with
which this thesis is concerned. Finally, this chapter will survey related system-level RTOS model-
ling and simulation research. The existing approaches will be classified and dis-
cussed based on their modelling granularities, functional features, and application
areas in system-level design flows.
Chapter 3 Mixed Timing Real-Time Embedded Software Modelling and
Simulation
This chapter will propose a SLDL-based mixed timing software behavioural
modelling and simulation approach and an associated Live CPU Model for fast,
flexible and accurate real-time software behavioural modelling and simulation.
At first, this chapter will introduce the problematic annotation-dependent time
advance method in SLDL-based software simulation and survey some remedy ap-
proaches. It will then describe the mixed timing approach, by defining two types
of software models for TLM software computation modelling and discussing
various issues in timing modelling and timing simulation. Afterwards, the compo-
nents and operations of the Live CPU Model will be introduced in detail. Finally,
evaluation metrics and experiments will also be presented in order to evaluate the
research in this chapter.
Chapter 4 A Generic and Accurate RTOS-centric Software Simulation
Model
This chapter will introduce a SystemC-based generic and accurate RTOS-
centric real-time software simulation model. It can support flexible and practical
real-time software simulation in early design phases.
Firstly, this chapter will present the research context and assumptions. An ab-
stract embedded software stack will be defined as the research target. It will then
survey common RTOS concepts and requirements as guidance for the subsequent
research. Afterwards, details of the RTOS-centric real-time software simulation
model will be described. This research will include three main parts, i.e., the over-
all structure of all simulation models, application software modelling, and RTOS
modelling. RTOS modelling is the core part and will be introduced from both the
functional modelling aspect and the timing modelling aspect. The chapter will af-
terwards explain evaluation metrics regarding simulation performance, functional
accuracy and timing accuracy of the proposed RTOS-centric simulator. Accord-
ingly, experiments will be carried out in order to demonstrate these aspects.
Chapter 5: Extending the Software PE model with TLM Communication
Interfaces
This chapter will extend software simulation models with TLM communication
interfaces by utilising the OSCI TLM-2.0 library. This aims to extend our
software modelling and simulation research into the promising TLM modelling
domain.
It will first briefly introduce related concepts of the OSCI TLM-2.0 library.
Then it will describe how to integrate TLM communication constructs into the
Live CPU Model. Afterwards, a simple SoC TLM model will be presented in or-
der to integrate the Live CPU Model and reveal how various typical system com-
ponents are defined for co-simulation with behavioural RTOS-centric software
models. Finally, an experiment will study the simulation performance of the SoC
simulation model, whilst another DMA I/O experiment will demonstrate the in-
teroperable simulation capability of the combined software and TLM models.
Chapter 6: Conclusions and Future Work
The last chapter will summarise contributions, conclude chapters, and suggest
future research directions.
Chapter 2
The increasing design costs and shrinking time-to-market of today’s embedded
systems industry create a pressing need for new design methodologies. System-level
design techniques have been proposed that use high-level abstraction methods to design
hardware and software concurrently in a unified environment. In this research
domain, system-level modelling and simulation are key techniques to describe,
validate, analyse and verify complex systems. In various system-level modelling
and simulation approaches, the SystemC-based Transaction-Level Modelling
(SystemC-TLM) has become a de facto standard. Based on the essential TLM
principle “separating computation from communication”, developers can divide
system modelling and simulation into two main aspects, i.e., the computation as-
pect and the communication aspect.
In the general context of embedded systems modelling, the computation can be
further divided into the software aspect (i.e., software running on a CPU) and the
hardware aspect (i.e., application-specific hardware logic). In this thesis, we
specifically concentrate on modelling and simulating real-time software at a high
level, namely the software PE model. The HW/SW timing synchronisation in the
unified event-driven SystemC simulation environment is addressed, which is cru-
cial for modelling interrupts and greatly affects both simulation timing accuracy
and performance. Because of the benefits of dynamic scheduling and multi-tasking
execution of concurrent real-time applications, RTOS behavioural modelling is
increasingly relevant to both fast simulation and validation of different software
implementation alternatives in the early stages of design. Various RTOS design
space exploration activities (e.g., assigning task priorities, deciding scheduling
strategies and designing application-specific OS services) also require an early
and efficient test bed in order to be carried out. Consequently, the RTOS model is
regarded as the heart of behavioural real-time software modelling and simulation
research in this thesis.
This chapter starts with some basics of current TLM research and work exam-
ples in Section 2.1. As the programming language and research environment of
this thesis, SystemC language constructs and the OSCI reference event-driven
simulator kernel are introduced in Section 2.2, along with their relevance to, and
inadequacies for, modelling and simulation of real-time software. In Section 2.3,
an overview is presented on related RTOS modelling and simulation research in
the context of system-level and TLM design. These works motivate our study in
this thesis. The HW/SW timing synchronisation approaches and problems in Sys-
temC simulation are also introduced in several paragraphs within this chapter.
Section 2.4 will summarise this chapter.
[Figure: a bar chart comparing simulation speeds of RTL, cycle-accurate, and TLM models.]
that it can support early development and validation of hardware dependent soft-
ware. Developers can co-simulate software with hardware models in a single-
source SLDL-based simulation framework, almost as soon as the initial architec-
ture specification is determined [90]. In this thesis, from a software research
perspective, TLM refers to high-level interaction between different software and
hardware modules. It includes behavioural software modelling/simulation, high-
level hardware modelling/simulation, and transaction-based communication be-
tween them.
However, the higher abstraction levels of TLM models also indicate less mod-
elling detail and some loss of accuracy. The accuracy of TLM simulation, in terms
of both data accuracy and timing accuracy, is necessarily sacrificed to some extent
due to coarse-grained data transfers and larger time-advancing steps. Of course,
with the goal of rapidly describing the system architecture and validating applica-
tions, requirements are relaxed in terms of accuracy of bit-level data or cycle-
accurate timing. Usually, coarse-grained and reasonably accurate assumptions are
made, e.g., packet-level transmission and cycle-approximate timing. Trading ac-
curacy issues against simulation speed [91], or preserving accuracy whilst gaining
in simulation performance [92], are popular TLM research topics in terms of effi-
ciency and flexibility. We are also concerned with them in this thesis and will pre-
sent some studies in the next chapter. At this point, the term “cycle-approximate
timing” (or the similar term “approximate-timed” [7]) indicates that a procedure
(either a computation action or a communication transaction) in a model is as-
signed with timing information that spans multiple clock cycles, and that the
simulation clock can be progressed with multiple clock cycles in each step. De-
spite the fact that this term is broadly used as a temporal resolution in the TLM
taxonomy, its exact timing granularity is vague. Different researchers’
interpretations often reflect their own modelling interests and optimisation intentions,
which may make it difficult to compare the performance and accuracy
of different TLM works quantitatively and on a like-for-like basis.
In order to present a general idea of the existing research on TLM, three main
topics are introduced here:
Abstraction levels of TLM: A fundamental essence of transaction-level
modelling is to raise the level of abstraction by hiding low-level implemen-
tation detail. Some important concepts and popular definitions on TLM ab-
straction levels will be addressed.
Communication exploration: A variety of transaction-based communication
modelling approaches have been developed in both academia and industry
to define how system components communicate. Research on communication
modelling and simulation is a contributing factor to most current
TLM achievements. Here, a brief introduction to related work is presented
in order to reveal this essential TLM aspect.
Embedded software development in TLM: If TLM comprises the two portions
“communication” and “computation”, then software modelling is
surely a paramount topic of the TLM computation portion.
ble to indicate the range of TLM levels. Without much dispute, most researchers
agree that TLM abstraction levels are relatively “higher” than the RTL used in
traditional design. Also, TLM abstraction levels are considered to be “lower” than
functional (algorithmic) models. Functional models are not defined as TLM mod-
els, although their abstraction level is sufficiently high [88]. This is because
a functional model usually includes a single software thread only, e.g., in the form
of a C function or a SLDL process. It does not bear two essential features of a
TLM model: concurrent multitasking computation and inter-process communica-
tion [88].
Conventionally, TLM abstract models are organised with respect to some crite-
ria, including:
Timing accuracy: This is a first-class characteristic regarding the accuracy
of a model. It refers to how timing information is assigned to a model,
e.g., per line of code, per code block, or per task, and to the resolution of
that timing information, e.g., untimed, cycle-approximate, or cycle-by-cycle.
Functional accuracy: This refers to how a model captures the function of a
target system.
[Figure 2-1: a conjunctional view of TLM abstraction taxonomies, comparing computation and communication timing degrees (untimed, loosely-timed, cycle-approximate, cycle-accurate) across models such as the specification, component-assembly, bus-arbitration, Programmers View, bus-functional, cycle-accurate computation and implementation models.]
For instance, some high-level simulators only abstract timing
properties (e.g., execution time, period, and deadline) of a software model in
order to enhance simulation speed, but without modelling its functional be-
haviour. The functional accuracy can be evaluated by comparing the outputs
of the model with a reliable reference by giving them the same inputs.
Communication data granularity: This criterion regards what data structures
are transmitted through communication channels, for example, an applica-
tion packet, a bus packet, or a word.
A number of publications [3] [88] [7] [93] feature definitions of TLM
abstraction levels. In the following, Sections 2.1.1.1 to 2.1.1.4 present
some examples. Figure 2-1 provides a combined view of these TLM abstraction
taxonomies by comparing the timing accuracy of their computation and
communication aspects.
The most acknowledged TLM abstraction level taxonomy was proposed by the
OSCI TLM working group [3] [88]. The OSCI TLM specification defines two
general levels for TLM modelling: the Programmers View (PV) level and the
Programmers View Timed (PVT) level (see Figure 2-1). The PV models are char-
acterised by the Loosely-Timed (LT) coding style and the blocking transport inter-
face, in which each transaction is associated with two timing points, correspond-
ing to the start and the end of a blocking transport. It is appropriate for software
programmers who require a functional virtual hardware platform with sufficient
timing information in order to run an operating system and application software.
A PVT model is identical to the PV level model in terms of functionality, but each
PVT transaction is annotated with multiple timing points and uses the non-
blocking transport interface, namely the Approximately-timed (AT) coding style.
It enables architecture exploration and also performance analysis of the applica-
tion system. This OSCI TLM abstraction level view reflects a communication-
centric hardware design perspective, although some software designers, with the
aim of promoting interoperable TLM modelling, are seeking its application for
computation modelling [6].
2.1.1.2 Donlin’s Extended TLM Abstraction Levels
Another early and classical TLM taxonomy is introduced by Cai and Gajski in
[7], which concludes that communication and computation are equally important
yet orthogonal aspects of TLM research. Referring to Figure 2-1, these two as-
pects are illustrated as two axes according to degrees of timing accuracy in a sys-
tem modelling graph. They identify three timing degrees, i.e., untimed, approxi-
mate-timed (so-called cycle-approximate), and cycle-timed (so-called cycle-
accurate). Moreover, the authors define six abstraction models in the graph and
explore their usage in embedded system design flows, starting from the specifica-
tion stage and ending at the implementation stage. Among the six models, four
(the shaded circles in the figure) are classified as TLM models, i.e., the compo-
nent-assembly model, the bus-arbitration model, the bus-functional model, and the
cycle-accurate computation model. The solid arrows in the figure represent a
typical TLM system design flow, whilst the dotted arrows stand for some possible
design routes depending on different design intentions, e.g., communication-focused
or computation-focused.
Various TLM models at different degrees of accuracy bring a potential for mul-
tiple-level or mixed-level modelling in which designers can trade off modelling
accuracy and simulation performance according to different strategies.
In Chapter 2 of [3], the researchers propose a general idea for TLM mixed-
level modelling by combining untimed TLM models and standalone timed TLM
models. This allows for concurrently developing pure functional models (by ar-
chitecture teams) and timing models (by micro-architecture teams) with dissimilar
modelling purposes. Multiple timing scenarios with different resolutions can co-
exist in a unified simulation model, and simulation speed can be optimised by dy-
namically switching untimed and timed models at runtime.
For bus communication modelling, Schirner and Dömer quantitatively analyse
the simulation speed and timing accuracy of three abstract communication models,
i.e., the conventional TLM model, the arbitrated TLM model, and the cycle-accurate
and pin-accurate bus-functional model [92]. They configure them with varying data
granularities and arbitration handling methods in order to trade off simulation
accuracy and performance. Focusing on software computation modelling, they define
five abstraction levels for processor modelling (i.e., the application level, the
task scheduling level, the firmware level, the processor TLM level, and the
processor functional model) and quantify the accuracy loss and simulation speed-up
of each model [79].
For processor and communication design co-exploration, an integrated design
methodology is presented in [95]. It combines multi-level processor hardware
models (e.g., instruction-accurate and cycle-accurate) and communication models
(TLM buses and RTL buses), by which the processor design team can co-operate
with the communication team early in the design flow.
2.1.1.5 Summary
The different views of TLM abstraction levels and related models have com-
mon notions of hardware and communication modelling. Each TLM abstraction
level can be seen as a limited design space for exploring and validating some
functional and timing issues with corresponding models. Multiple TLM abstrac-
tion levels thus constitute a wide design space, namely a design flow, for succes-
sive model refinement through the addition of design detail.
The OSCI TLM standard is gaining a high level of popularity and sustained
development in both industry and academia. It provides two distinct levels
(i.e., LT and AT) for communication models depending on their timing degrees and
synchronisation methods. The relevance of this modelling idea to the proposed
software modelling approach will be examined in Section 3.2.2. The mixed modelling
idea is widely advocated for accuracy and speed trade-offs in both the OSCI
TLM standard and the research surveyed in Section 2.1.1.4. Specifically, it is also
a guiding concept of the mixed timing software modelling approach that is
presented in Section 3.2. The recent OSCI TLM standard Version 2.0 provides
standard interfaces for creating bus communication models. Chapter 5 will inves-
tigate combining these API interfaces with the proposed software models in order
to advance interoperability between TLM communication and our native-code
software simulation models.
[Figure: An example SoC platform model: an ARM7 CPU model (ISS or high-level model) running tasks, DSP/custom hardware, dual-port RAM, ROM/Flash, a UART, timers, and an arbiter on an AHB bus, with an AHB-to-APB bus bridge to an APB bus.]
[Figure: Inter-module communication in SystemC: Process A1 in Module A calls pA->write() through port pA, and Process B2 in Module B calls pB->read() through port pB, on a channel implementing the write() and read() interfaces.]
As the key element of the TLM IMC communication, a channel can have vary-
ing complexity across different designs. In a SystemC-TLM specification, a channel
can be implemented in two styles, i.e., the primitive channel and the hierarchical
channel. A primitive channel contains no processes or modules and aims to provide
simple and fast communication. The SystemC language reference manual [66]
defines several built-in primitive channels (all derived from a base class
sc_prim_channel), e.g., sc_signal (to model a simple wire carrying a
digital electronic signal), sc_fifo (to model a first-in-first-out buffer),
sc_mutex (to model a mutual exclusion lock) and sc_semaphore (to model a
software semaphore), etc. Hierarchical channels are in fact modules and
can contain instances of other modules, processes, ports and nested channels.
They are used to model complex customised communications, such as buses or
networks.
In order to promote model interoperability between different communication
modelling and architecture design communities, some standards have been proposed
for the SystemC TLM communication paradigm. The following are two predominant
standards.
The OSCI TLM Working Group, which was founded in 2003, has published a
series of OSCI TLM standards. The up-to-date OSCI-TLM library version 2.0 [88]
[99] introduces a set of well-defined core APIs, data structures, initiators, targets,
the generic payload, and the base protocol for transaction-based communications.
The core interfaces support two types of transport, i.e., the blocking transport (a
transaction can suspend its parent process) and the non-blocking transport (a
transaction is atomic and does not suspend its parent process). The generic pay-
load is primarily intended for modelling a typical memory-mapped bus, which is
abstracted away from the details of any specific bus protocols. An extension
mechanism is also offered to model specific bus protocols or non-bus protocols by
users. The Open Core Protocol International Partnership (OCP-IP) consortium is
another active TLM standardisation organisation. It has proposed and maintained
a SystemC TLM modelling kit since 2002 [100] [101], defining a stack of com-
munication layers including four abstraction levels, i.e., Message Layer (L-3),
Cycle-approximate Transaction Layer (L-2), Cycle-accurate Transfer Layer (L-1),
and the RTL Layer (L-0). Its latest version, which is built on top of OSCI-TLM
v2.0, provides an interoperable standard for SystemC component models with
OCP protocol features.
A number of TLM modelling and simulation approaches have been proposed
for the design of complex communication systems. The following are some repre-
sentative works.
Gajski’s group presents examples of TLM communication research mainly
based on the SpecC language. The literature [102] describes a general TLM com-
munication modelling style for SoC design. For Network-on-Chip synthesis, they
define some successive system communication abstraction layers and correspond-
ing design models to refine abstract message-passing down to a cycle-accurate,
bus-functional implementation [58]. For AMBA AHB bus modelling, they propose
a Result Oriented Modelling (ROM) technique that remedies the accuracy drawbacks
of conventional TLM models and achieves high speed by omitting internal states
and applying end-result corrections [103].
In 2002, Pasricha pointed out the potential of the SystemC TLM modelling
approach for early architecture exploration and developed communication
channels for fast simulation in embedded software development [90]. In order to
bridge the gap between high-level TLM models and bus cycle-accurate models,
Pasricha et al. present an intermediate TLM abstraction level “Cycle Count Accu-
rate at Transaction Boundaries” (CCATB) for communication exploration, which
improves simulation speed by keeping cycle-level timing accuracy only at trans-
action boundaries [104].
Kogel et al. propose a series of multiple-level SystemC-TLM co-simulation
and virtual architecture mapping methodologies for architectural exploration of
NoC, SoC, and MPSoC [105] [106] [95]. Klingauf et al. describe the TRAnsac-
tion INterchange (TRAIN) architecture for mapping abstract transaction-level
communication channels onto a synthesisable MPSoC implementation by virtual
transaction layers [55]. They also propose a generic interconnect fabric for TLM
communication modelling that aims to support flexible buses, multiple TLM ab-
straction levels, and various TLM standard APIs [107].
[Figure: A co-simulation structure in which an instruction set simulator, executing the SW application binary and the RTOS port binary, connects to TLM HW modules through TLM channels.]

Communication channels (untimed and timed) are provided to support two TLM
abstraction levels: untimed channels are for a faster verification of applications
before partitioning, while timed channels are used for cycle-accurate modelling.
Cross-compiled binary code of software application, the OS, and drivers executes
in the ISS. For MPSoC design space exploration, the MPARM approach inte-
grates multiple SystemC-based ARM processor models (ISS simulators in Sys-
temC wrappers), the AMBA bus model, and memory models [109]. The TLM
channels implement the bus communication architecture in a master-slave style.
[Figure: A software generation flow: an untimed specification model (modules and processes) is refined into TLM models in which tasks run under a scheduler or RTOS model in a SW module alongside a HW module; code generation then cross-compiles the target C/C++ code into target binary.]

Specification models are composed of typical SLDL elements such as modules,
processes, interfaces, channels, and ports.
These processes run in parallel and communicate with each other by means of
transaction style channels. Through iterative simulation and partition, untimed
specification models are transformed into PV or PVT TLM models. At the TLM
architecture exploration stage, a simple scheduler or a RTOS model may be inte-
grated to assist sequential software simulation. In order to generate software im-
plementation code towards a specific operating system, a RTOS-specific library
(e.g., RTEMS [59], QNX [63]) is introduced to replace the RTOS model with be-
haviourally equivalent RTOS functions, and SLDL processes are mapped to real
RTOS tasks. Finally, SLDL-based software code is cross-compiled into executa-
ble binary code for a target processor.
These approaches reveal a system-level design point of view and make a valu-
able contribution to co-design and co-synthesis flows. However, such a design
flow is still not straightforward. The first obstacle resides in transforming
specification models described in an SLDL into RTOS-based TLM software execution
models. The hardware-style channel communication mechanism used in specifications
is not well suited to real-time software design and may sacrifice conventional
software implementation productivity and legacy code reuse. Besides, it is known
that the SystemC library bears the weakness of not supporting priority assignment
and pre-emptive scheduling, so the built-in SystemC kernel scheduler and syn-
chronisation primitive channels are not applicable for real-time software model-
ling. Consequently, the idea in [62] that simply replaces SystemC library elements
with target RTOS functions may not be appropriate. A usual solution is to inte-
grate a RTOS model on top of the SLDL in order to supply necessary dynamic
real-time software services, which is also the method used in this thesis. Another
problem is the increased size of the binary code, because the generated software
code includes overhead from some SLDL language constructs [62] [59]. For
resource-limited embedded systems, efficient optimisation techniques may be
required to reduce the interference from the SLDL library in the target code.
While some research activities have addressed software development in the
overall system-level design flow, some methodologies and techniques have recently
emerged that specifically focus on the need for abstract modelling of a software
PE (i.e., software running on a CPU) in the context of TLM [79] [111]. This topic
can be seen as a mixture of two aspects: abstract processor modelling (the hard-
ware aspect) and behavioural software simulation (the software aspect). Figure
2-6 depicts features of a TLM software PE model and some possible modelling
options.
From the hardware designers’ angle, the motivation is to abstract physical
processor features into functional elements in order to simulate high-level soft-
ware models in the execution environment and connect software models with the
rest of the system. In [111], Bouchhima et al. present an abstract CPU model aim-
ing for timed MPSoC HW/SW co-simulation. It provides a set of Hardware Ab-
straction Layer (HAL) APIs for upper-layer software models and an interface for
connecting other system components. This CPU model captures an architectural
view of a processor, which includes subsystems like an execution unit for HW
multiprocessing, a data unit wrapping any devices and memory elements, an access
unit containing address space, and a synchronisation unit behaving as an interrupt
controller.

[Figure 2-6: Features of a TLM software PE model and possible modelling options. Software aspect: task models and an RTOS model on a software processing element (CPU), with open choices of timing granularity, functions, and generic or specific software abstraction. Hardware aspect: a hardware abstraction (execution unit, data unit, sync unit, interrupt and I/O ports), which may be implicit or explicit and may or may not model interrupts.]

In a subsequent work [6], they introduce a SW TLM communication refinement
approach named "SW bus" to enable SW tasks to access logical
resources of HW TLM models. In [79], Schirner et al. develop a high-level proc-
essor model to support software simulation. The abstract processor model is mod-
elled in a layered approach including five increasing feature levels, i.e., the appli-
cation layer, the OS layer, the HAL layer, the TLM hardware layer, and the bus
functional hardware layer. This model enables incremental and flexible descrip-
tion of the software subsystem at different design stages.
If we turn to a software developers’ perspective, a software processing element
model should consist of various software models at appropriate levels of abstrac-
tion for behavioural software simulation. Timed software simulation, RTOS
scheduling, and interrupt handling are three key aspects to evaluate research in
this area. In a large number of embedded systems, a RTOS provides a useful
abstraction interface between real-time applications and the processor hardware
abstraction. Consequently, most software processing element modelling approaches
integrate an RTOS model in order to supervise the native execution of
applications, which is known as RTOS modelling [12, 43, 73, 87, 112, 113, 114,
115]. In respect of
the research in this thesis, which concentrates on RTOS modelling, a more
complete survey will be given in Section 2.3. In Figure 2-6, timing granularity
and functional accuracy are used as dimensions to guide and compare software
models, offering choices for the abstraction levels of the task models and the
RTOS model. Also in the figure, the hardware abstraction model is illustrated by
a dotted frame; this reflects the current situation whereby some software
modelling approaches include neither interrupt handling nor interoperability with
hardware models, i.e., hardware abstraction is implicit in the high-level PE model.
work can provide a homogeneous programming and co-simulation environment,
by which users can write both software and hardware models in a unified common
language and natively compile them as a single process on the host computer. The
SystemC execution model uses a discrete-event simulation kernel to schedule
model processes (a set of C++ macros) so as to mimic functional behaviour and
time progress of a target system.
In this section, we start with a brief introduction to SystemC language
features relevant to software modelling. We then look at the SystemC co-operative
execution model, which closely affects real-time software simulation. Finally, an
example of a simple SW/HW system model is presented in order to illustrate the
structure of a SystemC model.
[Figure: The SystemC library architecture: user applications build on top of add-on libraries (the TLM library, verification library, mixed-signal library, and other IP libraries), which in turn build on the SystemC library comprising the core language, predefined channels, and utilities.]
The Simulation Kernel
It schedules SystemC processes in response to an event or a time delay. The
exact execution mechanism will be described in Section 2.2.2.
Language Utilities
These utility classes provide auxiliary services such as tracing value
changes, reporting exceptions, and mathematical functions.
Data Types
In addition to supporting native C++ types, SystemC defines some data types
for hardware modelling, for instance, integer types within and beyond 64-bit
width (e.g., sc_int<WIDTH>, sc_bigint<WIDTH>), fixed point data types
(e.g., sc_fixed, sc_ufixed, etc.) and four-valued logic types (e.g.,
sc_logic, sc_lv<WIDTH>, etc.). Because SystemC data types are defined in
classes with inevitable overheads, it is recommended to use C++ native types or
simple SystemC integer types for best performance if possible [116].
The Core Language
This category of classes provides main modelling functions regarding model
hierarchy, execution units, concurrency, synchronisation and communication, etc.
A module (SC_MODULE) is the basic SystemC building block, namely an
object of a C++ class. The model of a computing system is composed of
several interconnected hierarchical modules. A module is the container of a
variety of modelling elements such as processes, events, ports, channels,
member module instances and data members.
A process is the basic SystemC execution unit (a macro) that is encapsu-
lated in a SC_MODULE instance in order to perform computation of a sys-
tem. There are three types of process to wrap a function: the method process
(SC_METHOD), the thread process (SC_THREAD) and the clocked thread
process (SC_CTHREAD). The main difference between them is that a
method process runs atomically from beginning to end once triggered, whereas
the thread and clocked thread processes can be suspended and resumed by
directly or indirectly calling wait() functions, which can be used to simulate
the time cost of a real activity. The SC_CTHREAD process, a variation of
SC_THREAD, is only statically sensitive to a single clock and mainly used
in high-level synthesis [116].
Ports (class sc_port), exports (class sc_export), interfaces (abstract
base class sc_interface) and channels (classes, possibly modules,
implementing one or more interfaces) are the main language
constructs used to model the inter-module communication of a system by means of
the aforementioned interface method call approach.
An event (class sc_event) is used to synchronise processes. The immedi-
ate or pending notification of an event (event.notify()) can trigger
(resume) the process that is waiting on it immediately or at a future time
point. An event can also be cancelled (event.cancel()) when it is at a
pending notification status. Compared to the interface method call method,
using an event is a lightweight synchronisation and communication method
to ease modelling costs. By flexibly changing the opportunity to notify or
cancel an event during simulation, users can change a process’s suspending
time at run-time.
Predefined channels
SystemC contains a number of predefined channels with affiliated methods and
ports, which implement some straightforward communication schemes (intro-
duced in Section 2.1.2). Note that although the mutual exclusion and the sema-
phore synchronisation methods are provided as predefined channels in SystemC,
their characteristics differ from what they usually are in the real-time software
context. We will address this issue later in Section 2.2.2.2.
2.2.2.1 The Co-operative Simulation Engine
The current SystemC execution model (after Version 2.1) can be implemented
(compiled) using three thread libraries on different host OS platforms, i.e., the
QuickThread package for UNIX-like OSs, the Fiber thread package for Windows
OS, and the more portable POSIX pthread library [117]. Whatever the
implementation, the co-operative multitasking policy remains the same: only one
process is dispatched by the scheduler to run at any time point, and the running
process cannot be pre-empted by another. If the running process is a thread type,
it transfers control to the scheduler by calling a wait() function or by exiting;
a method process only yields control when its function body finishes.
Figure 2-8 illustrates the operating cycle of the kernel. Note that the
initial elaboration phase (i.e., before the start of simulation), in which
SystemC modules are constructed, is not included in the figure because it is not
part of the simulation cycle.
Initialisation: This is the first phase after a SystemC simulation starts, i.e., af-
ter calling the function sc_start() in the main model program. All modelling
processes without a special declaration of dont_initialize() are put into a
ready pool.

[Figure 2-8: The operating cycle of the SystemC kernel: initialisation makes all eligible processes ready to run; the evaluation phase executes ready processes; an update phase follows; if processes become ready again the kernel repeats evaluation, otherwise it advances time; when no ready process remains, simulation ends.]
Evaluation: At the evaluation stage, ready processes execute sequentially; if
there are no runnable processes, the simulation ends. The execution order of the
processes is unspecified in the SystemC specification. Under co-operative
execution, a process leaves the running state either by voluntarily calling a
wait() statement or by finishing its function body. There are two kinds of
wait() statements:
The wait(time) function blocks a process for an uninterruptible time
duration; the kernel resumes the process after the specified time. This will
also be referred to as the wait-for-delay method hereafter.
The wait(event) function blocks a process until the specified event occurs.
This will also be referred to as the wait-for-event method hereafter.
Because processes may also notify some events immediately in execution and
thus cause other processes to be ready to run at once, the evaluation stage will it-
erate until no process is runnable. Besides, executing a process may access primi-
tive channels and change the signal value, which will consequently result in the
updating of data at the next update phase.
Update: In order to model, within the sequential SystemC simulation, the
phenomenon that combinational electronic signals change values instantaneously
and in parallel, SystemC uses an evaluate-update method to guarantee that all
signals are synchronised. At the update phase, the kernel calls the update()
method of each channel that previously requested an update, renewing the signal
with its new value. If this action notifies an event that wakes up a process, or
the kernel finds events that are due to notify blocked processes, the kernel
enters the evaluation phase again. This procedure, from evaluation to update and
back, is known as a delta cycle; it does not advance the simulation clock because
everything happens at the same simulated time point.
Time advance: When there is no runnable process, the kernel advances the
simulation clock to the earliest time point specified by a time delay or by the
nearest pending event notification. Some processes may thereby become runnable,
and a new evaluation phase begins.
2.2.2.2 Advantages and Disadvantages for Real-time Software Modelling
Consequently, in order to model and simulate real-time software in the
SystemC environment, designers should avoid, or otherwise carefully use, the
aforementioned error-prone features.
Simulation speed mainly depends on how many events are involved in the
simulation: the more events, the lower the speed.
[Figure: The example SW/HW system model: a SW PE module containing SC_THREAD processes sw_isr and sw_output synchronised by an sc_event, connected via an sc_signal<int> channel (read()/write() through in_port and out_port) to a HW module whose SC_METHOD process hw_gen drives the signal.]
This example covers several basic SystemC modelling issues: concurrent
processes, sequential software execution, co-operative scheduling, event-based
synchronisation, interface method call communication, and static and dynamic
sensitivity. The SystemC code of this example includes three parts: the hardware
module in Table 2-2, the software module in Table 2-3, and the main function in
Table 2-4.
Referring to Table 2-2, the function of the hardware module is simply embod-
ied in a SC_METHOD (hw_gen), which executes repeatedly after a randomised
interval (see line 14). In each execution, it writes a random integer TXD to the
output port by calling the method on the port.
Referring to Table 2-3, there are two SC_THREAD type processes in the soft-
ware processing element module. At line 12, the sw_isr process is sensitive to
the value change of the in_port and then receives data from it. Once sw_isr
finishes execution, it notifies the event evt_sw in order to make the other
process sw_output ready (see line 28). The two processes use wait(time)
statements to simulate their execution time cost (lines 26 and 41). Since it is
assumed that there is only one conceptual SW PE, the two processes need to
execute sequentially. A flag variable is used to guarantee that only one software
process can be in the running state (i.e., within a delay interval) at a time.

#001 int sc_main(int argc, char **argv) //Main function
#002 {
#003 sc_signal<int> sig;
#004 HW hw_i("HW_module");
#005 SW sw_i("SW_module");
#006 hw_i.out_port(sig);
#007 sw_i.in_port(sig);
#008 sc_start(100, SC_US);
#009 return(0);
#010 }
Referring to the main function in Table 2-4, modules and channels are created
and instantiated (lines 3-6). Corresponding ports on both HW and SW modules
are connected by the channel object sig (lines 6, 7) in the elaboration phase. A
call to the function sc_start() begins the simulation, which runs for
100 microseconds of target time (line 8).
It should be noted that, in this example, the two software processes execute
according to SystemC's native co-operative scheduling policy and use the
uninterruptible wait(time) function to advance the target clock. That is, a
software process runs to completion and cannot pre-empt the other. As a
result, if a hardware signal arrives while a software process is executing, the
software Interrupt Service Routine (ISR) cannot serve the hardware interrupt.

[Figure 2-10: A simulation timeline of the example: hw_gen produces the values 1, 7, and 9; sw_isr and sw_output alternate, and arrivals during software execution are marked "Missed".]
Figure 2-10 shows this phenomenon, in which interrupts are missed at time points
3 µs and 6 µs. In Chapter 3, we will present the solution to this problem.
[Figure: Mapping of timing abstraction levels onto the design flow: untimed models in the system analysis phase, approximate-timed models in the system exploration phase, and cycle-accurate models in the implementation phase.]
Abstract RTOS modelling and simulation focus on early design phases, such as
system specification, system analysis and SW/HW pre-partitioning stages. At this
time, the target platform is undetermined and software code has not been
implemented. Also, it is not possible to presume specific RTOS API services in
the system-level simulation framework before enough decisions have been made
regarding the system architecture. However, the general structures and execution
mechanisms of the RTOS model should still remain close to those of real RTOSs,
in order to ensure that the RTOS model has practical usability for real-time
software design. Abstract RTOS modelling is expected to provide extensible
real-time system modelling capabilities and to be quick to change in evolving
simulation loops.
In this approach, software applications are normally organised as a collection
of abstract tasks associated with coarse-grained temporal properties, e.g., period,
deadline, offset, and execution times [112] [72]. Periodic, aperiodic, and
sporadic tasks are typically defined explicitly by different timing
characteristics, which provide the main information the RTOS needs in order to
handle a task. A
qualified abstract RTOS model needs to at least provide priority-based pre-
emptive scheduling services and basic primitives to control the “start” and “termi-
nation” of a task. This feature is essential for a practically usable RTOS model in
order to overcome the previously-mentioned limitations of underlying SLDL
bases. A task’s execution cost is usually modelled by the wait-for-delay statement.
The delay interval of every task instance (i.e., a job) is either statically annotated
by estimation or dynamically randomised by some statistical theories, e.g. uni-
form distribution [8]. The “delay-measurement and back-annotation” timing
method is also proposed in [113] [43], but it is applied at a coarse-grained timing
granularity (i.e., task-level). Inter-task synchronisation for resource sharing,
communication services and interrupt handling are usually not adequately consid-
ered in this kind of model. The advantage of this method is the fast simulation
speed, since applications and RTOS are highly abstract models. The main draw-
backs of this method are low timing accuracy (coarse time annotations for appli-
cations and inadequate modelling of RTOS timing overhead) and incomplete
modelling capability of RTOS functionalities. Besides, in most existing research,
there is a lack of SW/HW interaction modelling, and hardware parts of a CPU
subsystem are not explicitly modelled either. This means that software application
tasks and the abstract RTOS model form the software PE model by themselves.
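To make this task-modelling style concrete, the following plain C++ sketch (all names and values are invented for illustration; it is not taken from any cited model) captures an abstract task purely by its coarse-grained timing properties and derives a job's execution cost either from a static WCET annotation or from a uniform distribution:

```cpp
#include <cassert>
#include <random>

// Hypothetical abstract task descriptor: coarse-grained timing
// properties only, with no functional code.
struct AbstractTask {
    long period_us;    // release period
    long deadline_us;  // relative deadline
    long bcet_us;      // best-case execution time estimate
    long wcet_us;      // worst-case execution time estimate
};

// Execution cost of one job: either the static WCET annotation, or a
// value drawn uniformly between BCET and WCET, mimicking the
// statistically randomised annotation style described above.
long job_delay_us(const AbstractTask& t, bool randomise, std::mt19937& rng) {
    if (!randomise)
        return t.wcet_us;                            // static annotation
    std::uniform_int_distribution<long> dist(t.bcet_us, t.wcet_us);
    return dist(rng);                                // statistical annotation
}
```

In an SLDL model, the returned value would then be passed to a wait-for-delay statement to consume the job's execution time.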
Gerstlauer et al. present an early SpecC-based abstract RTOS model in order to
integrate software scheduling support in the TLM model refinement flow [43]
[122]. This RTOS model provides 16 basic primitives to support task management
and scheduling. RTOS timing overheads are not sufficiently considered. Besides,
it uses the imperfect wait-for-delay time advance method, so interrupt handling
cannot be accurately modelled and the timing accuracy is limited by the minimal
resolution of time annotations. A subsequent work [123] resolves this initial
HW/SW synchronisation problem by using an improved wait-for-delay method
named “Result Oriented Modelling”. More recently, Zabel et al. [124] use the SystemC
SLDL to implement an abstract RTOS model where most parts are based on the
work of [43]. They solve the HW/SW timing synchronisation problem by using
the SystemC wait-for-event method, which is also utilised in our research in this
thesis.
Early work by Madsen et al. presents a SystemC-based abstract RTOS model
[112], which is further extended for MPSoC simulation [8] and NoC simulation
[125]. The basic idea is to decompose an embedded system model into three com-
pact sub-models: the task graph model, the scheduler model, and the link commu-
nication model. The scheduler model provides both fixed-priority scheduling (e.g.,
rate-monotonic priority assignment) and dynamic-priority scheduling (e.g., EDF)
services by using three primitives (i.e., run, pre-empt, and resume) to manage
tasks. The task model is characterised by coarse-grained temporal information or
estimates, e.g., WCET, BCET, period, deadline and offset, but without any func-
tionality code. This RTOS model is a good basis for high-level system exploration,
but it also has some limitations. Firstly, RTOS service overheads are not included
in the model. Furthermore, its task state machine model is different from that usu-
ally found in a typical real-time kernel, and the task model is also too simple to
mimic a real system. Finally, its link communication model heavily relies on the
SystemC Master-Slave message-based communication library for both software
internal and inter-module communications, whose behaviours are different from
common RTOS synchronisation and communication mechanisms.
Hessel et al. describe an abstract RTOS model in SystemC SLDL for use in the
embedded systems refinement flow [113]. Both the structure and implementation
of this RTOS model are similar to Madsen’s model; hence, it shares the same
weaknesses: simplistic task modelling and incomplete RTOS service modelling.
Moigne et al. propose a generic RTOS model for real-time systems simulation
[114]. This work has the advantage of considering timing overheads of three
RTOS services, i.e. context-load time, context-save time and scheduling algorithm
duration. Nevertheless, this work does not address task functionality modelling,
interrupt handling and synchronisation modelling.
Hastono et al. use an abstract RTOS model for real-time scheduling assess-
ments [126] and embedded software simulation [72]. The RTOS model provides
basic task management services similar to the models of Gerstlauer and Madsen.
Various static and dynamic scheduling policies, e.g., event-driven, time-triggered,
fixed-priority RMS, dynamic-priority EDF, etc. are integrated in order to evaluate
and compare different task scheduling decisions. The functionality of a task is de-
composed into non-pre-emptive atomic actions and pre-emption is assumed to
happen only at boundaries of atomic actions. Consequently, this pre-emption
model cannot simulate interrupts realistically.
Hartmann et al. present an abstract RTOS simulation model as a part of their
SystemC-based system synthesis design flow [127]. They model software on a
generic run-time system rather than directly modelling existing RTOS services,
i.e., all conventional software synchronisation and inter-task communication
mechanisms are modelled by the shared objects method. The intention is to inherit
their previous hardware modelling work and thus allow a seamless high-level
SW/HW specification environment.
simulation methods in [121] [128] [130], respectively. In [121] [128], they build a
software simulation model (including OS, application software, and a bus func-
tional model) annotated with timing delays and run it as a host Unix process,
whilst the hardware part is modelled in SystemC SLDL. The communication be-
tween software and hardware is implemented with Unix IPC methods, such as
shared memory and signals. In order to solve the HW/SW synchronisation problem,
they propose a “variable timing granularity” method to simulate interrupts by
trading off the simulation performance with the timing accuracy. In [130], they
use a different way to model the software part, where application tasks are sched-
uled by an OS model by using the multi-threading functionality of the host OS,
and then the whole software part is integrated into a SystemC HW/SW co-
simulation framework. Both a pre-emptive FIFO-based scheduler and a real eCos
RTOS are implemented in the OS model library. With the same RTL model on
the HW side, compared to cycle-accurate ISS software simulation, the co-
simulation performance with native RTOS simulation is reported to be three orders
of magnitude faster, whilst achieving 86% of the ISS’s simulation accuracy. In gen-
eral, from the RTOS modelling aspect, this research has the advantage of consid-
ering various detailed RTOS service overheads and accurately modelling HW/SW
interactions (e.g., interrupt handling and memory access). However, their models
sometimes utilise the underlying host OS services, which may deteriorate the
portability and negate SLDL’s intent as a homogeneous modelling framework.
A SystemC-based native simulation model for a commercial Texas Instruments
RTOS is presented by He et al. in [87]. It models common RTOS services such as
task management, priority-based scheduling, task synchronisation, I/O, and inter-
process communication, with timing overheads estimated from the target proces-
sor’s benchmark sheet. This simulator uses an event time-stamp prediction
method for interrupt modelling, based on the assumption that application tasks
can report the times of their future synchronisation events to the kernel. This tight
constraint requires prior analysis of the whole system and may hence restrict the
simulator’s usability.
A HW/SW co-simulator that includes a special-purpose μITRON 4.0 RTOS
model is introduced in [129]. It natively simulates a complete μITRON RTOS
model with application software on the host computer. For the HW aspect, C/C++
or HDL HW models can be included in the simulator and can communicate with
the software simulator by using Windows IPC methods. This work has the
drawback that its simulated clock relies on the host OS clock, i.e., it is untimed
from the perspective of target software simulation. Furthermore, host IPC methods
may introduce extra and unpredictable simulation overhead.
Chung et al. describe a generic SystemC-based RTOS model oriented towards
MPSoC simulation in [131]. Its generic RTOS and POSIX-like API models allow
native application code to execute with RTL/TLM HW models. However, its
RTOS task-state model lacks support for real-time synchronisation mechanisms.
It also uses a polling method to check interrupt events in every clock cycle, which
makes the interrupt latency depend on the length of a simulation clock cycle, i.e.,
it is an “annotation-dependent” HW/SW timing synchronisation approach.
Posadas et al. develop a comprehensive POSIX compliant RTOS simulation
model on top of SystemC in [12] and apply a dynamic delay annotation method
by assigning each C++ operator with a corresponding target-platform execution
cost. In [132], they address the global variable accessing problem and propose
three joint solutions. Their first method is a fine-grained annotation technique (see
Section 3.1.2); the second method can guarantee a correct functional simulation
result but still has the delayed interrupt handling deficiency due to its wait-for-
delay method (see Section 3.1.1); the third method is satisfactory and similar to a
method used in this thesis (see Section 3.2.3.2), but it focuses on abstract software
programming models by providing a special primitive channel to protect global
variables.
software source code is cross-compiled and simulated in a cycle-accurate instruc-
tion set simulator that represents the target processor’s behaviour. The ISS is usu-
ally wrapped in an SLDL module. A real RTOS is often ported to the ISS to
supervise the application software. Other SLDL-based HW component models are con-
nected with the ISS-wrapper model by the SLDL communication backplane to
achieve a co-simulation. This co-simulation approach is similar to the traditional
cycle-accurate embedded system co-simulation approach, which uses HDLs to
model hardware components at RTL level and uses the ISS to execute software.
Compared with the conventional approach, this unified system-level HW/SW co-
simulation approach can enhance design productivity by raising the abstraction
level of HW models and thus gain some simulation speedup. However, it somewhat
contradicts the system-level design concept of raising the abstraction level for
more efficient design space exploration, because it does not change the software
simulation method.
Chevalier et al. integrate a µC/OS-II RTOS on an ARM ISS which is wrapped
by a SystemC model [108]. Their modelling framework constructs a conversion
interface between the SystemC API and the µC/OS-II API in order to let the RTOS
schedule SystemC-based application software processes. Benini et al. build a Sys-
temC-based multi-processor co-simulation platform [109] that uses SystemC to
wrap several cycle-accurate ARM ISSs running multiple cross-compiled
µClinux kernels and software applications.
To trade off simulation speed against accuracy, the approaches in [120] and [133]
run application software on an ISS whilst building an RTOS model on top of the
SLDL. However, [120] only supplies task pre-emption services and considers
limited RTOS timing overheads.
Compared with existing research, the proposed RTOS simulation model em-
bodies the mixed timing software modelling idea (see Section 3.2) by supporting
hybrid abstract task models and native-code task models in a single simulator, in
order to enhance modelling flexibility and broaden the application domain.
Furthermore, the generic RTOS model’s functionality is determined by survey-
ing popular RTOS products and standards. It aims to support more realistic
software simulation than simpler RTOS models. Most importantly, high simulation
performance and good timing accuracy are preserved at the same time in the
RTOS simulation model, thanks to the underlying Live CPU Model.
The details of this model will be described in Chapter 4.
2.4 Summary
In this chapter, some basic concepts in transaction-level modelling research
have been introduced. The focus is to survey current abstraction levels, timing de-
grees, and communication modelling in the TLM research context, in order to in-
spire our research on real-time software behavioural modelling and simulation
that can be seen as the TLM software computation aspect. However, we observed
that existing TLM abstraction levels and models are inappropriate and insufficient
for real-time software modelling. Thus, in the next chapter, we will define
some real-time embedded software simulation models in the context of SystemC
based TLM research.
Subsequently, SystemC language constructs and the co-operative simulation
kernel were introduced. A SystemC-based HW/SW system example model was
presented. This demonstrates how the use of uninterruptible wait-for-delay state-
ments may lead to missing external interrupts in simulation, which highlights a
problem to be solved.
Some state-of-the-art RTOS modelling approaches and simulation models for
SLDL-based system-level design were also surveyed. They are classified into
three categories depending on timing and functional accuracy levels. Among them,
the abstract RTOS modelling approach and the native-code RTOS modelling ap-
proach are of concern to this thesis. We aim to propose a generic mixed timing
RTOS simulation model with improved features in terms of timing accuracy,
functionality, and modelling flexibility.
Chapter 3
In this chapter, the mixed timing approach mainly seeks answers to the above
three key requirements from the timing perspective of modelling and simulation,
but also considers software functional modelling. Separating timing issues in
modelling and preserving high timing accuracy in simulation are two characteris-
tics of this approach. The conventional annotation-dependent SLDL-based
software modelling and simulation flow is treated as two partially separated stages¹:
1) The timing modelling step mainly refers to annotating target platform exe-
cution costs (time delays) and defining time advance points in software task
code, when SLDL-based software task models are being built.
2) The timing simulation step mainly refers to advancing the target simulated
clock according to those annotated time delays, when these SLDL-based
software task models are dynamically simulated upon a SLDL simulation
engine.
This approach allows flexibility in software timing modelling, achieves good
timing accuracy in software timing simulation, and maintains a high simulation
speed. It has the following basic features:
It utilises multiple-grained software timing information and variable annota-
tion methods for software models at the modelling stage (Section 3.2). It
allows model builders and simulation users to draw on a variety of available
timing estimation sources, and to build mixed timing simulation models with
varying timing precision, trading off modelling workload against accuracy.
It preserves high hardware interrupt handling and software pre-emption tim-
ing accuracy within a certain bound at the timing simulation stage. The Live
CPU Model (in Section 3.3) is introduced to supervise software timing
simulation and monitor external interrupts. Excluding cases where interrupts
are disabled (e.g., critical section code), the Live CPU
Model can interrupt the current software simulation (i.e., stop its delay time
advance) as soon as an IRQ is caught, and resume the remaining time advance
for the pre-empted task at the correct time point, just like real CPU execution.
Compared to some conventional pre-emption simulation approaches that
trade off simulation speed for accuracy, the simulation performance of the
proposed approach is not sacrificed whilst timing accuracy is sustained.

¹ It is necessary to point out that the separation of timing issues in modelling and in simulation is
“partial”, because these two aspects cannot be totally decoupled in back-annotated timed software
simulation.
It offers varying system simulation similarity and run-time information ob-
servability. By configuring the Live CPU Simulation Engine with the vari-
able-step and the fixed-step time advance methods, the users can make
trade-offs between simulation similarity, information observability and
simulation performance (in Section 3.3.4).
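The core of this interruptible time advance can be reduced to simple arithmetic. The following plain C++ sketch is an illustrative model only; the function signature and the `irq_time` convention are assumptions, not the actual Live CPU Model implementation. It stops a delay slice when an IRQ arrives and records the unconsumed remainder so the pre-empted task can later resume at the correct point:

```cpp
#include <cassert>

// Result of advancing one annotated delay slice.
struct AdvanceResult {
    long stop_time;  // simulated time at which the advance stops
    long remaining;  // unconsumed part of the slice (0 if fully elapsed)
};

// now: current simulated time; delay: annotated slice length;
// irq_time: arrival time of the next pending IRQ, or -1 if none.
AdvanceResult advance_slice(long now, long delay, long irq_time) {
    long end = now + delay;
    if (irq_time >= 0 && irq_time < end)
        return {irq_time, end - irq_time};  // pre-empted mid-slice
    return {end, 0};                        // slice fully consumed
}
```

After the ISR completes, the pre-empted task simply advances again with the stored `remaining` value, so the total consumed delay is unchanged by the interruption.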
Figure 3-1 illustrates the mixed timing software modelling and simulation ap-
proach. In the figure, delay time slices of various granularities, e.g., task-level,
function-level, and source-code-line-level, can be annotated to the same software
model at the timing modelling stage. The Live CPU Model uses these differently
sized time annotations to progress the target simulated clock. In this mixed
timing approach, the granularity of a time annotation does not affect the dynamic
timing accuracy of HW/SW synchronisation (i.e., interrupt handling) in timing
simulation. Interrupt handling does not need to wait until a delay slice has fully
elapsed, i.e., until a delay boundary is reached. On the contrary, an ISR can pre-
empt the currently running software task as soon as an external interrupt happens,
as at the time point t1. After the ISR finishes execution at time t2, the
pre-empted software task is resumed and the remaining part of the previously
interrupted delay annotation slice is also continued.
[Figure 3-1: mixed timing modelling and simulation. The same software model may be annotated at task level, at function level (e.g., 250 ms, 200 ms, 100 ms), or at source code line level (e.g., 1 ms, 1.5 ms, 5 ms, 2 ms, 4 ms, 10 ms).]
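To make the annotation cases concrete, the fragment below (illustrative C++; the slice values are invented and merely assumed to describe the same code region) checks that a coarser and a finer annotation of one region can carry the same total delay, differing only in how many potential time-advance points they expose:

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Sum of the annotated delay slices for one code region (microseconds).
long total_delay_us(const std::vector<long>& slices) {
    return std::accumulate(slices.begin(), slices.end(), 0L);
}
```

For example, a single task-level slice {550000} and function-level slices {250000, 200000, 100000} describe the same 550 ms of execution, but the latter offers three points at which simulated time can visibly advance.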
In the remainder of this chapter, some problems and approaches regarding tim-
ing issues in existing SLDL-based software modelling and simulation will be sur-
veyed (Section 3.1). Section 3.2 describes the mixed timing approach in detail, in
terms of various timing techniques for software modelling and simulation. The
Live CPU Model is introduced in Section 3.3, which is not only important for tim-
ing accurate pre-emptive software simulation but also meaningful for extending
the software processing element model to the TLM modelling context. Finally,
evaluation metrics and experiments are presented in Section 3.4 and Section 3.5
respectively, in order to demonstrate benefits of the proposed approach. Section
3.6 will summarise this chapter.
[Figure 3-2 (B): wrong concurrency in a uniprocessor system. After the HW IRQ at t0, the ISR and task2 execute in parallel.]
An interrupt event should be processed as soon as possible once it occurs, as in a
normal real-time system.
In simulation, once a wait-for-delay statement is invoked, the value of software
delay time will be totally consumed without a possibility of interruption. Conse-
quently, task2 can only execute after the wait-for-delay statement of task1 is fin-
ished. In such cases, if an interrupt event is raised by a hardware module during
this delay duration, e.g., at time t0 in the example, one of two problematic
simulation phenomena may arise, depending on the modelling method.
Figure 3-2 (A) shows the first possible problem: “delayed interrupt handling”.
Because the wait-for-delay statement of the running task2 cannot be interrupted,
the ISR can only start when the current delay time slice finishes at time t1. It can
be observed that the ISR is wrongly postponed rather than serving the interrupt
request at the expected time point. Under such circumstances, both software tick
scheduling and the HW/SW synchronisation (i.e., interrupt handling) can only oc-
cur at the boundaries of delay annotations. Simulation time advance depends
on the granularity of annotation. In simulation, both the pre-emption latency and
the interrupt latency t_il are unrealistically restricted by the lengths of the delays
defined at the modelling stage. In the worst case, the latency equals
the largest time delay value. This time advance method makes it hard to model a
pre-emptive real-time system or a realistic interrupt handling procedure.
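This "delayed interrupt handling" effect can be quantified. In the sketch below (illustrative C++, assuming for simplicity that delay slices all have equal length and begin at time zero), an ISR can only start at the next slice boundary, so the interrupt latency is bounded by the annotation granularity:

```cpp
#include <cassert>

// Earliest possible ISR start when wait-for-delay is uninterruptible:
// the next delay boundary at or after the IRQ arrival time.
long isr_start(long irq_time, long slice_len) {
    return ((irq_time + slice_len - 1) / slice_len) * slice_len;
}

// Resulting interrupt latency under the same assumptions.
long interrupt_latency(long irq_time, long slice_len) {
    return isr_start(irq_time, slice_len) - irq_time;
}
```

With 10-unit slices, an IRQ at time 3 is served only at time 10, a latency of 7; in the worst case the latency approaches the full slice length, exactly the annotation-dependent behaviour described above.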
Considering the second case in Figure 3-2 (B), the model programmer may
choose to start the ISR as soon as it is raised. However, this brings a critical prob-
lem in that the ISR and the existing task execute in parallel in simulation, i.e., they
are both at the RUNNING state from the perspective of CPU scheduling. Obvi-
ously, in a uniprocessor system, this situation cannot occur. For this simulation
problem, programmers therefore need to correct the affected time delay in order to
serialise software execution with the right timing behaviour. This problem resembles
the conventional optimistic co-simulation that may require time rollback and re-
execution.
In the following Sections 3.1.2 - 3.1.4, three existing techniques that aim to
remedy this annotation-dependent time advance problem will be introduced.
More importantly, we will present our complete solutions, the “mixed timing
approach” and the “Live CPU Model”, in the rest of this chapter.
[Figure: fine-grained delay annotation. Task delays are split into smaller wait(t) slices; the ISR is still delayed until the next slice boundary, giving an interrupt latency t_il between the HW IRQ at t0 and the ISR start at t1.]
ware simulators manage their local clocks separately and exchange timing infor-
mation via inter-process communication. It is known that IPC overheads may con-
tribute a large portion of simulation time and affect the simulation performance.
The HW/SW timing synchronisation in [121] can be seen as a compromise of the
classic conservative algorithm [134]. Therefore, HW/SW timing synchronisation
accuracy may not be guaranteed when coarse-grained timing annotations are
used.
To solve the problem in Figure 3-2 (B), Schirner et al. introduce their time cor-
rection method, Result Oriented Modelling (ROM), for SLDL-based pre-emptive
software simulation [123]. It still uses the uninterruptible wait-for-delay statement
for time annotation and clock progress, but it can virtually interrupt a wait-for-delay
statement in order to enable pre-emption at any time point. In the case of an
interrupt event, the ROM-based RTOS model first records the pre-emption timing
information. Then, after both the original wait-for-delay statement and the interrupt
disturbance have finished, it issues a new corrective wait-for-delay statement for
the affected time advance step.
Figure 3-4 illustrates two possible interrupt handling scenarios in the ROM ap-
proach. In case (A), the application task2 begins to run at t0 and then calls a wait-
for-delay statement spanning 8 time units from t0 to t3, so as to mimic its execution
timing cost. This step is called an “initial prediction” in ROM, because it simply
assumes that task2 occupies the CPU exclusively during this wait-for-delay time
interval. However, at t1, a hardware interrupt request is detected. Thus, the RTOS
scheduler dispatches a corresponding ISR as the new RUNNING task to pre-empt
the lower-priority task2. Here, the RTOS model changes the OS status of task2 from
RUNNING to READY, and records the pre-emption time stamp in the Task Con-
trol Block (TCB) of task2. Afterwards, the ISR executes some functions and be-
gins its wait-for-delay statement. During the interval from t1 to t2, although
both the ISR and task2 are suspended by wait-for-delay statements, their task
states are distinct in the sense of RTOS task management. When the ISR finishes
at t2, the RTOS changes the OS status of task2 to RUNNING again. More importantly,
[Figure 3-4: two interrupt handling scenarios in ROM. (A) The pre-empted task wakes up later than the finish of the ISR: pre-emption amount t2 - t1 = 3, corrective wait(3). (B) The pre-empted task wakes up earlier than the finish of the ISR: pre-emption amount t2 - t1 = 2, corrective wait(2).]
the RTOS calculates how long task2 has been pre-empted, namely t2 - t1, as its
new delay time interval. The initial prediction of task2 ends at t3, and the new
corrective wait-for-delay statement is then issued immediately.
The scenario of Figure 3-4 (B) is slightly more complex than the previous case.
In this example, the initial prediction of the pre-empted task2 finishes at t2, which
is earlier than the ISR’s wait-for-delay finishing time t3. This means that task2 will
wake up and needs to be processed immediately so that it does not execute its
subsequent model code. The RTOS model first calculates the pre-emption interval
of task2 as t2 - t1 and then suspends task2 indefinitely. The ISR finishes at t3, at
which point task2 is scheduled by the RTOS to resume. A new wait-for-delay
statement that uses the previously calculated pre-emption interval as its delay
parameter is issued in order to revise the time advance for task2.
In summary, a ROM simulation procedure contains three steps: 1) Execution of
an initial wait-for-delay statement; 2) Collection of any disturbing events and up-
date of delay information; 3) Making a corrective wait-for-delay statement. With
this approach, correct sequential software execution can be realised for a uniproc-
essor system model. Good timing accuracy of HW/SW synchronisation and
software pre-emption is achieved by virtually pre-empting wait-for-delay
statements in SLDL-based simulation.
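The ROM correction in both scenarios reduces to the same arithmetic. The helpers below are an illustrative C++ reconstruction of the published mechanism, not Schirner et al.'s code; the parameter names and the choice of the stolen interval's end point are assumptions:

```cpp
#include <cassert>

// Pre-emption amount recorded by the RTOS model: the disturbance
// occupied the CPU from t1 (IRQ arrival) to t2 (end of stolen interval).
long rom_correction(long t1, long t2) { return t2 - t1; }

// The corrective wait starts once both the initial prediction
// (t0 + initial) and the disturbance (ending at t_done) have finished.
long rom_correction_start(long t0, long initial, long t_done) {
    long pred_end = t0 + initial;
    return pred_end > t_done ? pred_end : t_done;
}

// Final completion time of the pre-empted task.
long rom_finish(long t0, long initial, long t1, long t2, long t_done) {
    return rom_correction_start(t0, initial, t_done) + rom_correction(t1, t2);
}
```

Using invented numbers in the style of case (A): an initial prediction wait(8) from t0 = 0, an IRQ at t1 = 2, and a stolen interval ending at t2 = 5 give a corrective wait(3) issued at time 8, so the task finishes at 11, i.e., the initial prediction plus the pre-emption amount.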
The “black box” simulation concept is another point emphasised by
ROM [135]. It presents only the adjusted end results (e.g., termination time
and final state) of a simulated process rather than modelling and revealing its
internal state changes to users. For example, if multiple interrupts happen during
a wait-for-delay interval of a software task, ROM will collect the disturbances
together and issue only one corrective wait-for-delay statement. This “black box”
concept has positive and negative aspects:
1) It brings the advantage of speeding up simulation by hiding intermediate
states, whilst maintaining time advance accuracy by accounting for
interference from hardware interrupts.
2) In ROM, it is difficult to maintain similarity of intermediate state changes to
a real execution in certain circumstances. This is an inevitable compromise.
Because ROM uses the inherently uninterruptible wait-for-delay functions,
there is no way to cancel or postpone a wait-for-delay statement once it be-
gins. Hence, the time point at which a model process wakes up from a wait-
for-delay duration cannot be changed either. This may introduce defects in
simulation traces and incur additional simulation overhead. In
ROM, a pre-empted task may wake up at unexpected time points whenever
its wait-for-delay period finishes. Referring to Figure 3-4 (B), for
instance, task2 wakes up at t2 and calls for processing from the RTOS
model. However, from the perspective of OS multitasking management,
task2 should not actively trigger the OS to process it at this time point,
because it has been pre-empted. This phenomenon results in an unnec-
essary RTOS processing procedure, an SLDL simulation kernel context
switch, and consequential simulation overhead.
3) The ROM approach aims to collect all interrupts that happen during a wait-
for-delay time advance interval and then launches a new wait-for-delay state-
ment for the affected task to correct its delay time. In the best case, only
one corrective wait-for-delay statement is needed to revise an affected
time advance step. However, the possibility should be taken into account
that another pre-emption event may happen during a corrective wait-for-
delay interval, which means that one more successive corrective wait-for-
delay statement is required. Figure 3-5 shows such an example. In fact, the
exact number of wait-for-delay statements varies with the number of
pre-emption events and where they happen, which are determined
dynamically in simulation. Correcting successively interrupted time
advance steps may be very costly in some conditions.

[Figure 3-5: successive corrective predictions. An initial prediction wait(8) is interrupted by three HW IRQs, requiring corrective predictions wait(4), wait(2), and wait(1).]
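The cost of cascading corrections can be sketched as follows. This is an illustrative, simplified C++ model (not ROM's implementation) in which each pre-emption whose arrival falls inside the task's current predicted interval forces one additional corrective wait of the pre-emption's length:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Count the wait-for-delay statements needed to complete `work` time
// units starting at `start`, given pre-emptions as (arrival, length)
// pairs sorted by arrival time.
int waits_needed(long start, long work,
                 const std::vector<std::pair<long, long>>& preemptions) {
    int waits = 1;             // the initial prediction
    long end = start + work;   // current predicted finish time
    for (const auto& p : preemptions) {
        if (p.first < end) {   // pre-emption hits the current wait
            end += p.second;   // a corrective wait pushes the finish out
            ++waits;
        }
    }
    return waits;
}
```

With an initial prediction of 8 units and three pre-emptions of 4, 2, and 1 units (numbers invented to mirror the style of Figure 3-5), four wait-for-delay statements are needed in total.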
Consequently, a simulation speedup can be expected due to fewer costly
simulation kernel context switches.
The mixed timing approach is a general approach oriented to SLDL-based real-
time software (including tasks and the RTOS) behavioural modelling and simula-
tion. According to the aforementioned taxonomy of system-level software and
RTOS simulation research in Section 2.3, it can be applied to both coarse-grained
timed abstract software modelling and fine-grained timed native software model-
ling. In this section, this modelling and simulation approach is implemented by
typical SystemC language constructs, mainly the wait-for-event method (see Sec-
tion 2.2). Given the similarity between SystemC and SpecC, the approach could
readily be generalised to the SpecC context.
1) Timing issues in modelling: this aspect is concerned with timing issues
that are statically determined at the model building stage. It covers the vari-
ous jobs that add time delays to software computation models, e.g., defining
the timing styles of models, choosing sources of timing information, applying
variable annotation granularities, annotating timing information into model
code, and inserting time advance points in models.
2) Timing issues in simulation: this refers to timing issues that arise dynami-
cally at simulation runtime. It covers the jobs that use time delays
for simulation time advance, e.g., simulating target timing behaviour for
software models, progressing the simulation clock, and processing interrupts.
In the following, this mixed timing approach is explained with regard to vari-
ous issues in relation to aspects of timing modelling (Sections 3.2.2 - 3.2.6) and
timing simulation (Section 3.2.7). Besides, the Live CPU Model is an essential
basis of this approach (Section 3.3).
Figure 3-6. Related SW modelling abstraction level definitions (reprint [6] [9])
[Figure content: HW TLM levels (PVT and PV, with specific and generic bus/arbitration models) set against SW modelling levels (untimed specification, computation TLM service layer with synchronous/asynchronous RPC, host-compiled, and ISS), along an accuracy versus speed axis.]
processors. However, this work does not specifically distinguish various TLM ab-
straction levels. In general, bearing the current status of TLM research in mind,
most TLM abstraction level definitions have focused on modelling abstractions
for communication and hardware design, and may not be appropriate for software
modelling.
According to the basic assumption of OS-based task modelling and simulation
in this thesis, it is not recommended to use TLM communication techniques in
software modelling, since they are not common methods in conventional real-time
software development. This idea is contrary to [6], which uses OSCI TLM commu-
nication services for joint HW and SW communication exploration.
Note that it is nontrivial to utilise existing TLM concepts directly. Here we
need to define appropriate behavioural software abstraction levels/models and
introduce their relationships with existing TLM communication modelling concepts.
ling part, in Section 2.3, system-level software (RTOS) behavioural modelling
and research is classified into two general categories depending on their timing
accuracy: coarse-grained timed abstract models and fine-grained timed native-
code models.
This section compares characteristics of the mixed timing software models and
the OSCI communication modelling standard (see Figure 3-7):
Both modelling approaches decompose a model’s functionality into several
basic entities, i.e., tasks (or finer-grained functions) for software modelling
in our approach, and transactions with corresponding transport functions for
TLM communication modelling. If more accurate modelling is needed, a
basic entity can be divided into finer-grained entities, i.e., multiple functions
inside a task or multiple basic blocks inside a function, and correspondingly
the multiple phases that take place during a transaction’s lifetime.
We define two comparable timing abstraction levels for models. The coarse-
grained timed level and the fine-grained timed level for software modelling
are comparable to the LT coding style and the AT coding style for TLM
communication. We propose that the coarse-grained timed level uses two
time points to represent the execution cost of a task or a function, i.e., the
beginning and the end of execution. The LT coding style also defines two
time points for each transaction to denote calling to and returning from the
transmission, respectively.

Figure 3-7. OSCI TLM-2.0 models and proposed TLM software models

Accordingly, the concept of the fine-grained
timed level is also parallel to the OSCI AT communication coding style.
This is because they both use multiple timing points inside a basic func-
tional unit, namely, multiple annotations and timing synchronisation points.
Besides, neither the untimed level nor the cycle-accurate level is
recommended in our software modelling or in the OSCI TLM standard,
because modelling real-time software and contemporary bus
communication systems clearly requires a notion of timing.
Based on the above comparison, our software modelling proposal is similar to
the OSCI TLM-2.0 communication modelling standard in terms of its modelling
concepts of timing granularity and functional granularity. Since both are
implemented in the SystemC simulation environment, they also exhibit similar
trends in modelling accuracy and simulation performance. This means that models
at a corresponding level are "harmonious" with each other, without resulting in
undesired extreme behaviour in the context
of TLM co-simulation. We will explain software model definitions in detail in
Section 3.2.3.
In addition, each hardware computation model (e.g., a hardware peripheral
device) needs to be annotated with delays to accompany the software timing
models. Each TLM inter-module communication action is also assigned
corresponding communication delays. However, these two parts are not the focus
of this thesis.
2) Different system design teams may focus on modelling different system as-
pects according to their respective design circumstances. For example,
modelling computation and modelling communication are two distinct
working directions in the context of embedded systems modelling and
simulation. Likewise, RTOS designers and application software programmers
pay attention to different aspects of SW modelling. It is both infeasible
and costly to build all sub-models at the same timing accuracy level.
Therefore, in order to increase flexibility of software validation, a mixed tim-
ing approach is an efficient and practical solution. At some certain early and mid-
dle design stages, with the advance of the development and change of validating
intention, software designers can build and simulate behavioural software models
at various functional and timing levels in a unified SystemC framework.
There are two difficult issues in system-level software modelling and simula-
tion: timing accuracy and simulation performance. It is well known that the
granularity of annotation is a dominant factor in timing accuracy, since it
largely determines whether the execution cost of a code segment is "accurately"
reflected in the model. For example, given a code segment containing dy-
namic data-dependent loops, a single coarse-grained time annotation for the whole
code segment is very likely to be less accurate than several fine-grained time an-
notations for each loop. On the other hand, simulation performance is also a major
issue concerning simulation users in the early design phases. Simulation models
need to process many annotation statements intervening between functional codes,
which necessarily result in simulation overheads. Moreover, a delay annotation
statement is always implemented as a wait-for-delay statement or accompanied by a
wait-for-event statement in order to progress the simulated target clock. Such
statements result in context switches between the SystemC simulation kernel and
software model processes. Consequently, fine-grained time annotations may lead
to more simulation overhead as a side-effect. The mixed timing approach
proposes using different annotation granularities in software models, and thus
enables model programmers to trade timing accuracy for simulation performance
in simulations.
There are already some typical annotation granularities mentioned in existing
annotation-based software simulation research, e.g., the assembly instruction level,
the source line level, the basic block level, the function level, and the task level
[121]. This thesis uses some of them in research and presents guidelines for using
some appropriate timing annotation granularities in the two types of software be-
havioural models, i.e., abstract software models and native-code software models.
Currently, time annotations are manually inserted into software models and auto-
matic annotation is beyond the focus of this thesis. Research examples in this area
can be found in [136] [137].
#001 void task1(){
#002     while(1){
#003         //No code or
#004         functional_code;
#005         DELAY(fixed_value);
#006         //or
#007         DELAY(random_value);
#008         wait-for-event;
#009     }
#010 }

(A) Pseudo code of task-level time annotation

#001 void func1()
#002 {
#003     ...
#004     ...
#005     DELAY(t1);
#006     wait-for-event;
#007 }

(B) Pseudo code of function-level time annotation
Task-level (Table 3-1 (A)) and function-level (Table 3-1 (B)) time annotation levels are proposed for abstract software models. Each annotation statement corresponds to an
execution unit, i.e., a task or a function. The delay time information can either be
given as a fixed value representing the WCET at the model building stage, or be
randomised between a lower bound (i.e., the BCET) and an upper bound (i.e., the
WCET) for each job of a task in simulation time.
An annotation value is inserted by the DELAY() function (e.g., line 5 in Table
3-1 (A)), which passes the delay value to the Live CPU Model and triggers it for
an interruptible time advance. A wait-for-event statement is inserted after a delay
statement (e.g., line 8 in Table 3-1 (A)), in order to yield control to the SystemC
simulation kernel and let the task wait to be resumed after the delay. It defines a
time advance point (also referred to as a timing synchronisation point). From the
multitasking OS point of view, calling the wait-for-event statement and returning
from it mark the beginning and the end of “execution duration” of a software
model along the target simulation timeline. From the perspective of SystemC
simulation, a piece of “execution duration” is in fact a piece of “waiting duration”
of a SystemC process.
As shown in Figure 3-8, because an abstract software model is assumed to be
independent and does not access shared variables, its execution duration can be
freely interrupted by higher-priority IRQs, i.e., any asynchronous interrupt events
can stop its time advance step. Although a delay value is only annotated once, it
can be divided into many slices due to ISRs. This models a correct timing order of
execution.
Figure 3-8. Interruptible time advance: the independent execution cost of a low-priority task model is split into slices by high-priority ISRs triggered by successive IRQs along the timeline.
The details of the wait-for-event method, the interruptible time advance method,
and the DELAY() function will be introduced in Sections 3.2.7, 3.3.4 and 4.5.8.1.
When a large quantity of software application code has been developed and a
RTOS has been either supplied as an off-the-shelf product or developed in-house,
native-code software models can be built. The available software code is wrapped
in some software task models that are also implemented as SLDL processes.
These task models can be further divided into statement segments or atomic basic
blocks whose performance is measurable or estimable with relatively high accu-
racy. These native-code application software tasks can utilise the APIs of a RTOS
model, which may model specific services of a real RTOS and is annotated with
corresponding timing delay information.
Timing accuracy becomes a major concern in native-code software simulation.
The desired target timing behaviour cannot be directly represented in native-code
software execution. Hence, software execution costs (time delays) on the target
platform need to be either analysed by a static analysis method or dynamically
evaluated in a measurement-based method, and then be manually or automatically
annotated to corresponding code statements in task models. Fine-grained
statement segment level annotations and basic block level annotations are
advocated for this type of software model.
#001 void func1(){
#002     if(condition)
#003     {                    // a compound statement
#004         ...
#005     }
#006     DELAY(t1);
#007     wait-for-event;
#008
#009     int temp;
#010     temp = 100;          // several statements
#011     temp++;
#012     DELAY(t2);
#013     wait-for-event;
#014 }

(A) Pseudo code of statement segment level time annotation

#001 void func1()
#002 {
#003     DELAY(t1);           // annotation before code
#004     wait-for-event;
#005     int temp = 0;        // basic block 1
#006     if(condition)
#007     {
#008         temp++;          // basic block 2
#009         DELAY(t2);
#010         wait-for-event;
#011     }                    // annotation after code
#012 }

(B) Pseudo code of basic block level time annotation
In the example code shown in Table 3-2 (A), a statement segment is either a
compound statement or several sequential statements. A compound statement is
defined as a sequence of source statements enclosed by a pair of curly braces
[139]. In modelling, several sequential assignment or number operation state-
ments are also treated as a statement segment for convenience of annotation.
However, a statement segment should not include access to an OS service, which
should be treated as another segment.
A basic block is a sequence of code that has only one entry point and only one
exit point [140]. In Table 3-2 (B), the annotation statement of a basic block may
be placed in one of two positions, i.e., before or after the basic block. In
modelling, the placement of the annotation statement depends on how to "glue" the
time annotation near its code block, in order to make native-code execution syn-
chronise with corresponding target-time advance steps as much as possible.
Multiple DELAY() functions and wait-for-event time advance points are inserted
in native-code software models. Their respective behaviour is the same as in the
aforementioned abstract software models.
In native-code models, software code segments may access global shared
variables that may be affected by external interrupts. If a code segment and its
annotation are defined improperly, a wrong simulation trace and result may be
generated. As shown in Figure 3-9 (A), in real software execution, a task independently
executes code segment 1 from time t0. At time t1, an IRQ happens and pre-empts
the task. An ISR writes a value to a global variable. Afterwards, the task resumes
and its code segment 2 reads the global variable to obtain an updated value.
Figure 3-9 (B) shows a possible corresponding simulation trace, in which the
task code segment (with its corresponding annotation) includes both code segment
1 and 2. This means that the task not only executes some independent functions
but also reads the global variable at t0, and its total delay begins accordingly. The
IRQ still happens at t1, then pre-empts the task, and writes the global variable. Al-
though the time advance of the task can be interruptible and maintained correctly
in terms of the simulation time order, the functional simulation result is possibly
wrong because the software task gets an outdated value of the global variable.
The solutions to this problem are straightforward:
1) In software models, global variables should be protected by mutual exclusion
in order to avoid race conditions. This is effectively a common convention
in software programming.

Figure 3-9. (A) Real execution: the IRQ at t1 pre-empts the task, and the ISR writes a new value to the shared variable before the task resumes. (B) A possible simulation trace: the task's merged delay (delay1 + delay2) begins at t0, so the task reads the shared variable before the ISR writes it.
2) In terms of native-code simulation, a code segment should not include both
independent functions and an access to a global variable. In other words, an
access to a global variable should be placed in a separate segment that is as
short as possible. Given the first solution, this requirement is not difficult
to implement in modelling, because a global variable segment is always marked
by calls to OS mutual exclusion services.
Fine-grained time annotations can improve timing accuracy in case there are
data-dependent conditional or looping statements in code, but too many intrusive
annotations not only require more modelling work but also decrease simulation
speed. Similarly, defining many time advance points (so-called timing synchroni-
sation points) can make the simulated clock progress smoothly. However, it
also decreases simulation performance. Consequently, two techniques regarding
timing annotations and time advance points are utilised in order to improve simu-
lation performance.
#001 while (a < 10000)
#002 {
#003     DELAY(tbb10);
#004     wait-for-event;
#005     a++;
#006     DELAY(tbb11);
#007     wait-for-event;
#008 }
#009 DELAY(tbb10);
#010 wait-for-event;

(A) Precise basic block level time annotations

#001 while (a < 10000)
#002 {
#003     a++;
#004     DELAY(tbb10+tbb11);
#005     wait-for-event;
#006 }
#007 DELAY(tbb10);
#008 wait-for-event;

(B) Merging time annotation statements
analysis for application software. This tool can organise assembly code into basic
blocks (see Figure 3-10 (B)) and generate a control flow graph (see Figure 3-10
(C)). Referring to the figure, there are two basic blocks in the program, i.e., the
“Block 10” of the “while” statement and the “Block 11” of the looping body.
If this program is annotated with basic block level timing delays, then three an-
notation statements are needed, as shown in Table 3-3(A). Because the two basic
blocks “Block 10” and “Block 11” (line 1 and line 5 of Table 3-3 (A)) execute
sequentially most of the time, except when jumping out of the while loop, their time an-
notations tbb10 and tbb11 can be merged into one annotation as shown on line 4 of
Table 3-3 (B).
This technique advances the annotation level from the basic block level to the
statement segment level, which is a mixed timing annotation technique and widely
used in our research. Normally, merging multiple annotation statements should
sacrifice as little timing accuracy as possible. For instance, the DE-
LAY(tbb10) statement (line 9 of Table 3-3 (A)) corresponds to the "compare and
jump out" execution of the while statement and should not be combined into
the annotation statement inside the loop body. Otherwise, target time advance
steps cannot match the native-code execution flow. However, if model builders
intentionally make tradeoffs between accuracy and performance, it is also accept-
able that some tiny one-shot annotations can be omitted.
The second technique to increase the simulation speed is to reduce the number
of wait-for-event statements in models, i.e., reducing the number of time advance
points. The basic idea is inspired by the “lazy synchronisation” method introduced
by Hartmann et al. [127], in which this method is used in proprietary abstract
software modelling. Here, we refine it for native-code software simulation models.
As introduced before, a time advance point refers to a timing synchronisation
point where a software model process yields control to the SLDL simulation ker-
nel in order to let it advance the simulated clock.
In discussions and figures in Section 3.2.3, the annotation statement DELAY()
and the wait-for-event method are used together. A DELAY() function performs
two jobs, i.e., injecting an annotation value into the Live CPU Model and invoking
it to advance the timing delay value at once. In fact, in the proposed mixed
timing approach, a delay annotation function does not need to perform the two
jobs together. And, a wait-for-event method does not necessarily follow each
time annotation statement either.
As shown on line 5 and line 9 in Table 3-4, the lightweight DELAY_WR()
function only processes an annotation value in terms of storing and accumulating
it in a variable (see Virtual Registers in Section 3.3.2) in the Live CPU Model, but
it does not invoke the Live CPU Model to progress the simulated clock immedi-
ately. It is especially appropriate for use in data-dependent loops in order to re-
duce time advance overheads.
The dual-function DELAY() and the wait-for-event statements are also impor-
tant at specific points in model code (e.g., lines 12 and 13 in Table 3-4). Some
rules are defined to indicate where time advance points are essential. In modelling,
these situations include:
1) In application tasks, time advance points are necessary before calling and
returning from RTOS system functions. These points define the boundary
of a task and a RTOS function, and allow switches to be made between
them.
2) If the current running application task will terminate execution, then a time
advance point is necessary. This point defines the boundary between differ-
ent tasks.
3) In any critical sections (whether in tasks or in RTOS functions) where
interrupts are disabled, time advance points are necessary in order to
progress the target clock.
This technique essentially separates annotation points from time advance
points. This is a native capability of the mixed timing approach because of the
underlying annotation-independent time advance method. Fewer invocations of the
Live CPU Model and fewer context switches of the SystemC kernel speed up
simulation. At the same time, fine-grained timing annotations can still be used
in order to accurately reflect the timing cost of software
models’ execution traces.
Previously, it has been noted that behavioural software modelling and simulation
need timing information of software execution on the target platform. Software
instrumentation and performance estimation are prerequisites of all
back-annotation based behavioural simulation. This is a broad and non-trivial
research domain, which is far beyond the focus of this thesis. Example research in
this domain can be found in [84] and [142]. In Sections 3.2.5 and 3.2.6, some
related performance estimation methods are introduced in brief rather than
presented as in-depth research. Model builders and simulation users can
determine and apply appropriate time estimation methods in practice.
3.2.5.1 Static Timing Analysis Method
A typical static analysis method is the WCET analysis² [143]. It aims to compute
an upper bound on the execution time of a program by analysing the code without
actually running it. A WCET analysis includes three steps:
The program flow analysis extracts possible execution sequences of a
program at the basic block level. This step should cover all possible
paths in order to guarantee safe coverage.
The low-level analysis calculates the execution time of each basic block on
a given target hardware architecture. The complexity of this step lies in
accounting for the performance-enhancing features of modern processors,
such as caches, pipelines, etc.
The calculation step combines path information and low-level execution
times in order to derive a WCET.
WCET results might be used as a source of time annotations in our mixed timing
software modelling.
For abstract software models, the assumption is that much software code is
not yet available; hence, specific WCET analysis cannot be carried out. For
native-code models, model programmers can use conventional WCET analysis to
obtain software timing information. In our view, since the source code is
available for simulation, we prefer to annotate statements at a fine
granularity, which means that basic-block WCET information is more useful than
function-level or task-level WCET results, which may be over-pessimistic. Colin et al.
perform WCET analysis on the RTEMS RTOS with the intention of studying
the predictability of RTOS timing behaviour [144]. This research reveals the
possibility of obtaining timing information of RTOS services by the static analysis
approach.
We can use time estimates of tasks and functions to build simulation models in
order to capture the initial approximate timing behaviour of a system. These time
² The BCET analysis is the related problem of finding the lower execution-time bound of a program.
estimates can be generated either from functional specifications or from a
random function. Regarding the latter technique, for simple cases that do not
have strict requirements on the distribution of generated numbers, the rand()
pseudo-random function from the C Standard General Utilities Library (header
file stdlib.h) is used. If the probability densities of task periods and
computation times are specified, the well-acknowledged UUNIFAST algorithm can
be used to generate task sets uniformly distributed in a given space
[145].
3.2.6 RTOS Performance Estimation
For early and abstract modelling research in which neither the RTOS nor the
target platform is fixed, simulation users may be interested in the relative
magnitudes of RTOS timing costs and in comparing simulation results of several
different design alternatives. It is not necessary to assign precise timing
estimates for every RTOS activity. RTOS system services can instead be
annotated with the scaling parameter method of [2]. This relates the execution
cost of each RTOS action to a scaling parameter (S), which reflects the
relative timing magnitudes of different RTOS services depending on their
typical computational complexities. Table 3-5 shows execution
times of some typical RTOS services in terms of the scaling parameter S. Note
that in an individual modelling case, the programmer can correct the scaling factor
of a specific RTOS function depending on available timing information.
Table 3-5. Basic RTOS actions and their relative execution times [2].
RTX-RTOS on LPC2138 ARM7 CPU @ 60 MHz (unit: µs);
code executed from internal flash with the Memory Accelerator Module

Action                               Time   Action                            Time
Initialize system                    34.9   Task switch                       7.1 – 10.5
Create defined task, no task switch  14.3   Send semaphore (no task switch)   2.7
Create defined task, switch task     16.7   Send message (no task switch)     5.3
Delete task                          9.6    Interrupt response for IRQ ISR    0.8
For instance, the QNX Neutrino RTOS [147] is provided with average kernel
benchmark results based on different hardware platforms such as Intel Pentium4
processors, XScale processors, and TI OMAP processors. And, referring to Table
3-6, the RTX RTOS is also provided with timing specifications on a specific ARM
platform [1]. If benchmark documents are not available for some specific
platforms and RTOS versions, development kits or benchmark suites are sometimes
supplied by their vendors, in order to let users measure timing costs by themselves.
not feasible to annotate the RTOS model at the basic block level or at the state-
ment level.
(A) Progress the clock and consume the delay time without interrupt disturbance: (1) delay time = td; (2) wait-for-event; (3) event_1 is released after td, so td is consumed in total.
(B) Progress the clock and consume the delay time with interrupt disturbance: after td1 is consumed, an ISR cancels the pending event_1; the remainder td2 is consumed later, when event_1 is released after td2.
CPU Simulation Engine can run periodically to update run-time changing vari-
ables, such as value of timers, software delay slices, execution budgets, etc. The
increasing number of time advance steps may also increase simulation times.
Hence, the Live CPU Simulation Engine can blend variable-step and fixed-step
time advance methods in simulation if simulation users want to trade off
simulation performance against intermediate observability.
b. The resolution of stopping software delay duration:
i. General requirement: It refers to the latency to stop the current
target simulation clock advance step, in the case that an interrupt
happens. It should be as small as possible, i.e., zero-time in theory,
in order to mimic the real situation.
ii. Features of the proposed approach: Because the proposed inter-
ruptible time advance method relies on the Live CPU Simulation
Engine, when an interrupt happens, the simulated clock is pro-
gressed to this time point. At the same time, the consumed part of a
software delay is immediately calculated and the remaining delay
part is saved. Consequently, the resolution of stopping a software
delay duration is zero-time, i.e., without artificial latency.
2) Maintaining execution delay information of software models:
a. General requirement: Every software model has some delay informa-
tion representing its running cost on the target architecture. These de-
lays must be accurately consumed in terms of the quantity and order.
b. Features of the proposed approach: According to the time advance
methods introduced earlier, a software model’s timing delay informa-
tion is securely kept on a per-task basis and correctly consumed in its
time advance in simulation. In the case of a pre-emption, the delay
information of a task is updated, and its remaining part can resume in
a future time advance.
3) Timing accuracy of handling interrupts:
a. General requirement: This is mainly revealed by the interrupt latency,
which is the time from the raising of an external interrupt signal until the
beginning of a software interrupt handler. The simulated interrupt
latency should be similar to the real situation in terms of predictability
and functionality.
b. Features of the approach: The interrupt handling approach is based
on a combination of the timely hardware interrupt catching model and
the zero-latency software delay stopping method. The Live CPU Model
can sense external interrupt requests when it consumes software delays
at the same time. Since both hardware models and software models
execute in the discrete-event SystemC simulation framework with a
unified global clock, there is no additional HW/SW synchronisation la-
tency that may appear in asynchronous co-simulation. Hardware-
initiated interrupt handling can begin immediately and can be propa-
gated to a software handler without delay. The theoretical minimum in-
terrupt latency is zero-time in simulation, and the worst-case interrupt
latency is bounded by the longest interrupt-disabled time, which is fully
configurable by model builders. This timing behaviour is the same as that
of a real-time system running on a real CPU.
3) The Live CPU Simulation Engine takes charge of advancing software
simulation time (in Section 3.3.4).
Based on these components, this abstract Live CPU Model is actively involved
in high-level software simulation. In the following, they will be introduced in de-
tail.
Virtual Registers

For SW simulation time advance:
Register Name   Description
CPU_REG[0]      Delay Register: delay value of current code block
CPU_REG[1]      Total delay of current task job
CPU_REG[2]      Absolute deadline of current task job
CPU_REG[3]      Consumed delay time
CPU_REG[4]      Start-time Stamp: start time of current delay
CPU_REG[5]      Task slice suspension time
…               …

For system status and flags setting:
Register Name   Description
CPSR            Current Program Status Register
SPSR            Saved Program Status Register
ICRR            Interrupt Controller Raw Status Register
ICSR            Interrupt Controller Status Register
ICMR            Interrupt Controller Mask Register
…               …
and the newly dispatched task’s timing information in its TCB is loaded into
these registers.
As illustrated in the right part of Table 3-8, some 8-bit Virtual Registers
hold system runtime status and help the Interrupt Controller Model to han-
dle interrupts. For example, the Current Program Status Register (CPSR) is
mainly used to distinguish the execution mode of the Live CPU Model, i.e.,
the normal software simulation mode or the interrupt request mode. The
Interrupt Controller Raw Status Register (ICRR), the Interrupt Controller
Status Register (ICSR), and the Interrupt Controller Mask Register (ICMR) contain
original interrupt request information, interrupt service information, and in-
terrupt masking configuration, respectively.
It is acknowledged that the interrupt latency, interrupt response time, and
interrupt recovery time are timing properties of particular concern in a
real-time embedded system. The Interrupt Controller Model provides a
hardware-level foundation
to model a usual HW/SW cooperative interrupt handling mechanism, which usu-
ally has three bottom-up layers: the HW interrupt controller, the RTOS interrupt
handler, and application ISRs. As illustrated in Figure 3-15, the main function of
the Interrupt Controller Model is encapsulated in the cpu_ic() SC_METHOD
process. It monitors a set of sc_ports, which are further connected to various
interrupt sources (e.g., peripheral devices) by IRQ lines.
Figure 3-15. The Interrupt Controller Model: the cpu_ic() process inside the Live CPU Model monitors irq_port[n], connected by IRQ lines (irq_line0 … irq_line_i) to IRQ source modules (IRQ_source_0 … IRQ_source_i); the ICRR, ICSR and ICMR Virtual Registers support interrupt handling.
In order to deal with multiple simultaneous interrupts from various devices and
bound the interrupt latency, the Interrupt Controller Model can prioritise, mask or
disable interrupt sources by setting corresponding register bits in ICRR, ICSR and
ICMR. When a hardware device raises an IRQ by asserting a signal through its
interrupt request line, the Interrupt Controller Model can catch the signal immedi-
ately and call a software interrupt handler, which could be either a RTOS kernel
interrupt handler function or a vectored ISR depending on a specific interrupt
handling scheme. This software handler will subsequently invoke the Live CPU
Simulation Engine to stop the current delay process. Depending on the specific
implementation, a software handler can be pre-emptible or non-pre-emptible.
#001 SC_METHOD(cpu_sim_engine);
#002 dont_initialize();
#003 sensitive << evt_rtos_start_call_cpu_sim_engine
#004 << evt_apps_call_cpu_sim_engine
#005 << evt_rtos_service_call_cpu_sim_engine
#006 << evt_tick_isr_2_cpu
#007 << evt_interrupt_handler_enter_2_cpu
#008 << evt_cpu_advance_total
#009 #ifdef _CPU_DYNAMIC_FIXED
#010 << m_cpu_clk.posedge_event()
#011 #endif
The basic modelling idea of the Live CPU Simulation Engine is to use the
SLDL wait-for-event mechanism instead of the uninterruptible wait-for-delay
mechanism. The Live CPU Simulation Engine is implemented as a SC_METHOD
process. It coordinates its execution and controls time advance of various software
tasks by corresponding events (i.e., objects of the SystemC sc_event class).
Table 3-9 shows the static sensitivity list of the Live CPU Simulation Engine. The
events on lines 3-7 are externally called by software models to trigger execution
of the Live CPU Simulation Engine, the event on line 8 is internally used by the
Live CPU Simulation Engine to trigger itself for time advance, and lines 9-11
configure the running mode of the Live CPU Simulation Engine if it needs to run
periodically, i.e., the fixed-step time advance method.
Referring to Figure 3-16 (A), most real CPUs execute software cycle-by-cycle
according to an execution mechanism that includes four fundamental stages: fetch
instructions, decode instructions, execute instructions, and store (write back)
results. Inspired by this classical mechanism, the Live CPU Simulation Engine
instead executes software models' delay times over four comparable conceptual
stages: fetch delay time, decode delay time, advance simulation (delay) time,
and update status (see Figure 3-16 (B)).

Figure 3-16. (A) The classical four-stage CPU instruction execution mechanism; (B) the delay-time execution stages of the Live CPU Simulation Engine.
Referring to Figure 3-17, the Live CPU based software time advance process
can be described over five steps along the target simulation timeline. There are
two possible software time advance cases, i.e., without interrupt interference (see
Figure 3-17 (A)), or with interrupt interference (see Figure 3-17 (B)). In the
following descriptions, Steps (A), (B), (C), and (D) of the two cases are
identical; their difference resides in Step (E).

[Figure 3-17: Live CPU based software time advance. In both cases, a SW code block executes in zero-time at t0 (A); its delay annotation is injected into the Live CPU and the code block waits for an event (B); the engine stores t ns in the Delay Register and maintains the registers (C); the engine plans to trigger itself after t ns and returns (D). In case (A), the engine executes again when the t ns delay expires, consumes the value in the DR, and resumes the SW task (E). In case (B), an IRQ at t1 makes the OS save the task's context and load an ISR; the OS calls the CPU Engine and cancels the old event, and the engine starts immediately and begins the new ISR (E).]

1) Step (A): Preliminary to advancing software simulation time by the Live
CPU Simulation Engine, a software task is firstly loaded into the Live CPU
Model by an OS context switch operation. Then a software code block,
which could either be a whole task, a function, a statement segment, or a
basic block, executes in zero-target-time at time t0.
2) Step (B): After the software code block finishes execution, an explicit time
advance point is reached. Here, there are a delay annotation function
and a SystemC wait(event) statement, as introduced in Section 3.2.3.
a. The delay annotation function generates a delay value which may have
different timing units (e.g., second, millisecond, microsecond, etc.) and
meanings (e.g., task level delay or basic block level delay) for model-
ling convenience. The value is written into a temporary variable in the
Live CPU Model, i.e., delay information is fetched, and the Live CPU
Simulation Engine is triggered to be ready-to-run.
b. The software code block then keeps waiting for its exclusive SystemC
sc_event object that will be released by the Live CPU Simulation
Engine at a future time point. This sc_event object represents the
“address of code block to run” in our simulation. Its importance is simi-
lar to the program counter in a real CPU.
c. From the perspective of the internal SystemC scheduler, the SystemC
process to which the software code unit belongs yields control to the
SystemC simulation kernel, and the Live CPU Simulation Engine process
will be selected to run next. However, from the perspective of
OS scheduling, this software task is still in the RUNNING state.
d. Note that, when using the simple single-purpose annotation function
DELAY_WR() in Section 3.2.4.2, only the delay value is stored for
prospective time advance, but the Live CPU Simulation Engine is not
triggered and there is no wait(event) statement. Hence, the soft-
ware model will continue executing until a time advance point is
reached.
3) Step (C): Because the input delay information may arrive in various formats,
it must be transformed into standard-form data for use with time advance. The
Live CPU Simulation Engine therefore decodes the delay information into a
double-precision floating-point number at the nanosecond timing scale. The
decoded result "t ns" is stored in the Delay Register (DR), which belongs to
the virtual register set of the Live CPU Model. At the same time, the current
time stamp t0, which can be obtained by the SystemC function
sc_time_stamp(), is also recorded in another virtual register.
4) Step (D): Subsequently, the Live CPU Simulation Engine starts the "simu-
lation (delay) time advance" step at t0. This stage consists of two operations:
the Live CPU Simulation Engine plans to wake itself up at a future time point
and then returns. The CPU Engine's sleeping duration represents the execution
cost of a software model. Depending on the execution mode of the Live CPU
Simulation Engine, there are three possible cases:
a. If the Live CPU Simulation Engine works in a pure variable-step time
advance mode, it plans to progress the delay time t in the DR in a sin-
gle step. It sets the internal event to trigger itself at the coming time
point t0+t. Then it returns control back to the simulation kernel in order
to advance the simulation time by the duration of t.
b. If the Live CPU Simulation Engine is set with a fixed-step time ad-
vance mode, it runs periodically in order to decrement and update the
delay value in DR until the delay value is totally exhausted, whilst, the
simulation clock is progressed period-by-period.
c. If the Live CPU Simulation Engine is configured with both the vari-
able-step and the fixed-step modes, it not only plans to wake up at the
final time point, but also periodically decrements the delay value.
5) Step (E): In this stage, the Live CPU Simulation Engine updates the simu-
lation status by maintaining delay time and resuming or beginning a soft-
ware task. There are two possible situations depending on whether an inter-
rupt happens:
a. Assuming a simple case where there is no interruption or pre-emption
during the t time duration as illustrated in Figure 3-17 (A), thus the
Live CPU Simulation Engine wakes up at time t0+t. It consumes the
value in DR and then issues the event related to the current RUNNING
task so as to make it continue executing. Upon that, the above execu-
tion cycle is repeated.
b. A main target of the mixed timing approach is to solve the non-
interruptible problem of SystemC software simulation. It is important
to consider the interference from an unexpected interrupt event during
ongoing software delay duration. As shown in Figure 3-17 (B), before
the time advance duration t expires, an IRQ happens at t1 that is earlier
than the time point t0+t projected in Step (D). Given that the interrupt
handling mechanism of the system is not intentionally disabled, the In-
terrupt Controller Model thus catches the IRQ immediately and then
invokes the software OS interrupt handling function to serve this IRQ,
i.e., the current RUNNING task will be pre-empted by a higher-priority
ISR. The OS interrupt handling function saves the remaining portion of
the delay time slice and other timing information in Virtual Registers to
the pre-empted task’s TCB for future use. The remaining portion of the
delay time is calculated as t_remain = t - (t1 - t0), where t is the initial
value of the DR and t1 is the current time stamp. The OS interrupt handling
function then dispatches (i.e., loads its context to Virtual Registers) an
appropriate ISR as the next-to-run software task and calls the Live CPU
Simulation Engine by notifying an event to replace the previously-
planned wake-up event. The Live CPU Simulation Engine faces fresh
values in the Virtual Registers and sends an event to allow the ISR to
run immediately. Consequently, the software ISR executes its func-
tional code and repeats the above time advance process. In this way,
both software time advance and hardware interrupt handling are simu-
lated accurately.
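The Delay Register book-keeping across Steps (C)-(E) can be sketched in plain C++ (the class and method names are illustrative; the actual engine is a SystemC SC_METHOD process):

```cpp
// Hypothetical names: a minimal sketch of the Delay Register book-keeping,
// not the thesis's actual implementation.
struct VirtualRegisters {
    double delay_ns = 0.0;   // Delay Register (DR)
    double stamp_ns = 0.0;   // time stamp register (t0)
};

class LiveCpuEngine {
public:
    // Step (C): decode an annotated delay into nanoseconds and latch t0.
    // unit_in_ns converts the annotation's timing unit to nanoseconds.
    void decode_delay(double value, double unit_in_ns, double now_ns) {
        regs_.delay_ns = value * unit_in_ns;
        regs_.stamp_ns = now_ns;
    }
    // Step (D), variable-step mode: the planned wake-up point t0 + t.
    double planned_wakeup_ns() const {
        return regs_.stamp_ns + regs_.delay_ns;
    }
    // Step (E.b): an IRQ pre-empts the task at t1; the remaining slice
    // t_remain = t - (t1 - t0) is saved to the task's TCB for resumption.
    double preempt(double t1_ns) const {
        return regs_.delay_ns - (t1_ns - regs_.stamp_ns);
    }
private:
    VirtualRegisters regs_;
};
```

For example, a 5 µs delay latched at t0 = 7 ms plans a wake-up at 7.005 ms; an IRQ at 7.002 ms leaves 3 µs to be resumed later.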
functional and timing models. The simulation performance and simulation accu-
racy aspects are addressed in this section in order to evaluate experiments in Sec-
tion 3.5.
Note that although the ISS simulator is also a software-based simulation ap-
proach, it executes cross-compiled software binaries for a target hardware
platform. In the context of high-level software simulation, functional and
timing behaviours of an ISS are commonly deemed the same as real software execution on
a corresponding processor.
3.4.2.1 Functional Accuracy
Functional accuracy refers to whether, for a given test program, behavioural
simulation models can reproduce similar functions and generate correct results
compared to real software execution. Based on the definition in Section
3.2.3.1, abstract software models do not sufficiently reflect this property if
they do not aim to include enough functional code. Regarding native-code
software simulation models, this property can be evaluated by comparing their
simulation results to those of an ISS simulation.
However, evaluating functional accuracy is not an emphasis in this chapter,
because it is not difficult to guarantee that a single task model executes
correct modelling functions. In particular, a native-code task model may have
the same code as a real task. Functional accuracy of concurrent multi-tasking
software models will be addressed in Chapter 4, when a complete RTOS model is
introduced.
The second part is addressed in definitions of software models in Section 3.2.3.
It should be noted that inaccurate annotations may be intentional choices by
simulation users for the sake of fast simulation performance and ease of modelling.
The third part is a notable advantage of the mixed timing approach in terms of
supporting interruptible software time advance by the Live CPU Simulation En-
gine. However, in this chapter, without involving many task switches and RTOS
services in simulation, this aspect cannot be evaluated thoroughly.
Still, referring to Section 3.2.7.3, there are three basic features related to
simulation timing accuracy that can be evaluated:
1) The resolution of stopping a software time advance step
2) Timing accuracy of handling interrupts
3) Maintaining execution delay information of software models
The first point can be evaluated by measuring how fast a time advance step can
be stopped in the proposed simulation approach. The second point can, for now,
be simplified to the interrupt latency. In fact, it refers to the same feature as
the first point. The third point can be evaluated by observing whether a task’s time
advance can be resumed properly after it is pre-empted.
3.5.1 Performance Evaluation
In Section 3.2.3, the abstract and native-code software models were introduced.
Because they have distinct functional and time annotation characteristics,
their simulation performance necessarily differs. Furthermore, in Section
3.2.4, two techniques are introduced to improve simulation performance by ad-
justing time annotation and advance statements in code. This section presents
some tests to evaluate simulation performance of these different models and mod-
elling techniques. In order to concentrate on the above-mentioned aspects and
eliminate the possibility that software functional complexity may dominate simu-
lation performance, the test program includes a single task implementing a selec-
tion sort algorithm. This algorithm involves typical data-dependent if conditional
operations and for loop operations, which require fine-grained time annotations if
the timing accuracy is a concern. Although RTOS services are not called by the
task, limited RTOS services (without delay annotations) are still executed in order
to initialise the software simulation system.
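A sketch of what such a task might look like, with illustrative segment-level annotation hooks inside the data-dependent loops (the DELAY hook and the per-segment costs are assumptions, not the thesis's measured values):

```cpp
#include <vector>
#include <cstdint>
#include <utility>

// Hypothetical annotation hook: accumulates estimated target-time cost in ns.
static uint64_t g_annotated_ns = 0;
inline void DELAY(uint64_t ns) { g_annotated_ns += ns; }

// Selection sort with fine-grained annotations inside the data-dependent
// loops, so the accumulated delay tracks the actual iteration counts.
void selection_sort(std::vector<int>& a) {
    for (std::size_t i = 0; i + 1 < a.size(); ++i) {
        std::size_t min = i;
        for (std::size_t j = i + 1; j < a.size(); ++j) {
            DELAY(50);                  // illustrative per-comparison cost
            if (a[j] < a[min]) min = j;
        }
        DELAY(120);                     // illustrative per-swap segment cost
        std::swap(a[i], a[min]);
    }
}
```

Because the annotations sit inside the loops, the total annotated time is data-dependent, which is exactly why coarse task-level annotation cannot be accurate for this algorithm.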
As shown in Table 3-10, the same program is simulated in six cases:
Two abstract software models: the first abstract software model does not
implement the actual function of the sort algorithm, whilst the second
abstract model does. Each is annotated with one time annotation statement
and one time advance point at the task level.
Three native-code models: they all implement the sort function and have
four fine-grained segment-level annotation statements, which are
approximately timing accurate regarding the data-dependent loops.
Native-code models 1 and 2 are both implemented with the proposed mixed
timing method and the interruptible Live CPU based time advance method.
Their difference: native-code model 1 defines two time advance points,
utilising the reduced time advance technique in Section 3.2.4.2, whereas
native-code model 2 defines four time advance points, placed inside the
data-dependent loops.
Table 3-10:
- Proposed abstract model 1: without functions; coarse-grained function-level time annotation (1 statement); coarse-grained function-level time advance (1 statement).
- Proposed abstract model 2: with functions; coarse-grained function-level time annotation (1 statement); coarse-grained function-level time advance (1 statement).
- Proposed native-code model 1: with functions; fine-grained segment-level time annotation (4 statements); reduced time advance (2 statements).
- Interruptible native-code model 2: with functions; fine-grained segment-level time annotation (4 statements); fine-grained segment-level time advance (4 statements).
- Uninterruptible native-code model 3: with functions; fine-grained segment-level time annotation (4 statements); fine-grained segment-level time advance (4 statements).
- ISS: final code; cycle-accurate ARM7TDMI-S (LPC2124 @ 60 MHz).
[Figure 3-18: simulation time comparison (logarithmic scale) of the six cases.]

Host simulation time (µs): abstract model 1: 1882.75; abstract model 2: 3237.126; native-code model 1: 3527.092; native-code model 2: 1845550; native-code model 3: 802926.1; ISS: 815000.
Total execution counts of annotation statements: 1, 1, 125749, 125749, 125749 (for the five proposed models respectively).
Total time advance steps: 1, 1, 2, 125749, 125749.
1) More annotation statements do not add much simulation time. Comparing
native-code model 1 with abstract model 2, 125749 times more annotation
statement executions result in less than 10% simulation overhead.
2) Time advance steps (i.e., executions of the Live CPU Model) affect
simulation performance greatly. Comparing native-code model 2 with
native-code model 1, 62875 times more time advance steps incur about 500
times more simulation time.
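These two observations can be checked directly against the host-time data above (the values are copied from the measurements; the helper names are illustrative):

```cpp
// Host simulation times in microseconds, from the Figure 3-18 data row.
const double abstract_model_2    = 3237.126;  // 1 annotation, 1 advance step
const double native_code_model_1 = 3527.092;  // 125749 annotations, 2 steps
const double native_code_model_2 = 1845550.0; // 125749 annotations and steps

// Finding 1: relative overhead of 125749x more annotation executions.
double annotation_overhead() {
    return native_code_model_1 / abstract_model_2 - 1.0;
}
// Finding 2: slowdown caused by 62875x more time advance steps.
double time_advance_slowdown() {
    return native_code_model_2 / native_code_model_1;
}
```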
1 ms to progress the target clock by a step of 1 ms. Once the delay value
falls below 1 ms, the engine runs every 1 µs to advance the target
clock by a step of 1 µs. This achieves 1 µs time advance resolution.
3) Model C: uses a mixed fixed-step and variable-step time advance method.
It progresses a delay slice in an interruptible variable-length step and also
runs every 1 ms to advance the target clock by a step of 1 ms. The time ad-
vance resolution is only restricted by the timing resolution of SystemC
simulation engine.
4) Model D: uses a variable-step time advance method. It progresses a delay
slice in an interruptible variable-length step. The time advance resolution is
only restricted by the timing resolution of SystemC simulation engine.
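The gap in engine activation counts between these modes can be sketched with a back-of-the-envelope helper (hypothetical function names):

```cpp
#include <cmath>

// Engine activations needed to advance one delay slice of delay_ns:
// a variable-step engine fires once per slice, whereas a fixed-step
// engine fires every step_ns until the slice is exhausted.
long activations_variable_step() { return 1; }

long activations_fixed_step(double delay_ns, double step_ns) {
    return static_cast<long>(std::ceil(delay_ns / step_ns));
}
```

For a 1000 ms delay slice, a 1 ms fixed step activates the engine 1000 times where the variable-step engine fires once, which is why the variable-step models run much faster.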
The same test program is run on the KEIL ARM ISS for the same duration of 1000
ms. The target processor is an NXP LPC2378 running at 40 MHz. The µC/OS-II
RTOS [149] is ported to this ISS to manage tasks.
The obtained simulation speed results are shown in Figure 3-19. Compared to ISS
simulation, the mixed timing models achieve a drastic performance improvement,
with the biggest speedup exceeding 3000 times. Unsurprisingly, the variable-step
approach is also faster than the fixed-step time advance approach. Model D
achieves a considerable speedup (over 600 times) compared to model A. This is
because the fixed-step approach progresses the target clock much more frequently
than the variable-step approach, which is reflected by higher running counts of the
Live CPU Simulation Engine.
Models B and C use combined time advance methods. From their simulation
results, it can be inferred that finer periodic time advance steps result in
more simulation overheads. In order to reveal the relation between step lengths
and simulation speeds of the fixed-step time advance method, three additional
tests are carried out with periodic steps of 2 ms, 5 ms and 10 ms, meaning the
Live CPU Simulation Engine is activated to advance the target clock every 2 ms,
5 ms, and 10 ms respectively.
Figure 3-20 shows that simulation times and Live CPU running counts steadily
decrease as the fixed-step period grows larger. This characteristic can be used
to tune the Live CPU Simulation Engine and trade off simulation performance
against simulation observability in different situations. Besides, the periodic
fixed-step time advance method can represent the behaviour of handling the
periodic real-time clock interrupt of a RTOS, in which the Live CPU Simulation
Engine is triggered periodically. According to the simulation results, finer
real-time clock interrupt periods incur extra but not excessive overheads, which can be used
as a reference to determine the period of the clock interrupt in a RTOS model.
Experimental tests in Section 3.5.1.1 are also studied here. According to the
analysis in Section 3.4.2.2, regarding a simple software model, its timing accuracy
depends on its performance estimation and delay annotation granularity.
Performance is measured on the ISS and back-annotated into the native-code
software models. Timing delays are annotated at the segment level. Consequently,
good timing accuracy should be expected. As shown in Table 3-11, in terms of
the same test program, native-
code models consume very similar target time to the ISS simulator. This table also
demonstrates that reducing time advance points does not affect timing accuracy of
independent software models.
                        Native-code model 1   Native-code model 2     ISS
Simulated time (µs)          6986.115              6986.115         6977.51
Accuracy loss                 0.12%                 0.12%              -
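The accuracy-loss figures follow from a simple relative-error computation against the ISS simulated time (an illustrative helper, not thesis code):

```cpp
#include <cmath>

// Relative timing accuracy loss (in percent) of a behavioural model's
// simulated time against the reference ISS simulated time.
double accuracy_loss_percent(double model_us, double iss_us) {
    return std::fabs(model_us - iss_us) / iss_us * 100.0;
}
```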
Referring to the three basic features related to simulation timing accuracy in
Section 3.4.2.2, an interrupt experiment is executed in order to evaluate them
in simulation.
This experiment includes five IRQs (IRQ1-5) and five associated ISRs (ISR1-5),
which are assigned ascending priorities. Each IRQ occurs randomly 500 times
within 10 seconds of simulated time. A normal task runs in the background and
can be interrupted by any IRQ and pre-empted by their ISRs. The software system is
configured so that interrupts are always enabled and the Live CPU Simulation
Engine can stop the current time advance as soon as a higher-priority interrupt
happens. Therefore, at any simulation time point, the interrupt latency of the
highest-priority IRQ should always be zero, and all other IRQs can only be
postponed by higher-priority ISRs.

[Figure 3-21: excerpt of the experiment timeline (t = 7011-7053 µs) showing IRQ1-IRQ4 raising, ISR execution, and context switches (marked C); legend: t_il = interrupt latency time, t_iresp = interrupt response time, t_ireco = interrupt recovery time; traced events include interrupt_handler_enter, interrupt_handler_exit, and context_switch.]
Figure 3-21 shows a part of the timeline of this experiment, drawn according to
the actual simulation log. It illustrates the three basic timing-related
features of concern, i.e., immediate stopping of time advance, resumable time
advance, and zero-time interrupt latency. It also demonstrates some functions
of the Interrupt Handler Model.
Referring to this simulation trace, at t=7011 µs, IRQ2 and IRQ3 happen
simultaneously. Since the Live CPU Model controls software time advance and
monitors IRQ lines, the current software time advance step is stopped and an
IRQ is handled immediately; this interrupt latency is zero-time. Because the
priority of IRQ3 is higher than that of IRQ2, the Interrupt Controller Model
ignores IRQ2 and begins to service IRQ3. Afterwards, RTOS interrupt services
and ISR3 execute sequentially. At t=7022 µs, a higher-priority IRQ4 happens and
invokes nested interrupt service by pre-empting ISR3. Note that IRQ1 is raised
during ISR4 execution; however, it is ignored by the Interrupt Controller Model
due to its lower priority. After the completion of ISR4, lower-priority ISRs
are handled successively according to their priorities. Among them, ISR3 is
released first to continue its remaining delay and finishes at t=7041 µs.
In order to quantify the interrupt latency in simulation, we measure the
interrupt latencies of these five IRQs in this experiment. The theoretical
maximum interrupt latency of an IRQ can be computed as the sum of all
higher-priority ISR time costs:

    t_il,max(IRQ_i) = Σ_{j : priority(IRQ_j) > priority(IRQ_i)} C_ISR_j

where C_ISR_j denotes the execution time cost of ISR_j.
Table 3-12 compares the measured maximum interrupt latencies with the
calculated theoretical values. As expected, the highest-priority IRQ5 is always
serviced without any delay. Other IRQs are also serviced with zero-time latency
whenever no higher-priority ISR is active in the system. In cases where an IRQ
is delayed by higher-priority ISRs, its maximum interrupt latency does not
exceed the theoretical worst-case value either.
          Counts of    Counts of    ISR time    Theoretical     Measured
          zero-time    delayed      cost (µs)   maximum         maximum
          latency      latency                  latency (µs)    latency (µs)
IRQ5         500           0           500           0               0
IRQ4         441          59            10         500             494
IRQ3         440          60            10         510             488
IRQ2         448          52            10         520             502
IRQ1         444          56            10         530             488
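The theoretical maxima in Table 3-12 can be reproduced from the ISR costs (a sketch using the table's values; the names are illustrative):

```cpp
#include <vector>
#include <cstddef>

// ISR time costs in microseconds for IRQ1..IRQ5 (index 0..4, in ascending
// priority order), taken from Table 3-12.
static const std::vector<double> isr_cost = {10.0, 10.0, 10.0, 10.0, 500.0};

// Theoretical maximum latency of IRQ i (1-based): the sum of all
// higher-priority ISR time costs.
double max_latency(int i) {
    double sum = 0.0;
    for (std::size_t j = static_cast<std::size_t>(i); j < isr_cost.size(); ++j)
        sum += isr_cost[j];
    return sum;
}
```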
3.6 Summary
In this chapter, a SystemC-based mixed timing software behavioural modelling
and simulation approach and the Live CPU Model have been introduced.
In the context of TLM software computation modelling, two types of software
timing models were proposed for use in different software modelling stages. Also,
they can be mixed in simulation for modelling flexibility. By isolating the timing
modelling aspect from the timing simulation aspect, various timing annotation
granularities (i.e., task-level, function-level, segment-level, and basic block-level),
functional accuracy levels (i.e., abstract and native-code), and time advance meth-
ods (i.e., variable-step and fixed-step) can be utilised on mixed timing
software models for the sake of fast simulation performance, modelling
flexibility, simulation observability, and reasonable accuracy.
The proposed SystemC-based Live CPU Model can achieve interruptible soft-
ware time advance and zero-time delayed interrupt handling latency in software
simulation. The HW/SW synchronisation problem is solved without the need for
fine-grained time annotation and advance, avoiding annotation-dependent
software time advance that may result in uninterruptible
software timing simulation. The Live CPU Model supports multiple execution
modes, which trade off simulation speed against simulation observability. The
Live CPU Model also provides an essential Interrupt Controller Model, a
real-time clock, and some Virtual Registers to assist software simulation. In
the context of a software PE model, the Live CPU Model behaves as the
conceptual hardware part and can readily be extended with SW/HW interfaces for
inter-module communication.
Regarding the requirement of fast performance, a representative test program
shows that the proposed mixed timing software models achieve about 200 to 3000
times speedup³ over an ARM ISS simulator and the conventional fine-grained
uninterruptible behavioural software model. The proposed abstract and
native-code software models also show distinct simulation performance, as
expected. Various execution modes of the Live CPU Simulation Engine are tested
in order to present their effects on simulation performance. In general, more
time advance points in models inevitably incur more simulation overheads.
In this chapter, the twofold timing accuracy of the simulation approach was
measured in experiments. Firstly, focusing on the timing accuracy of single
task execution with fine-grained segment-level annotations, the proposed
native-code models incur only a 0.12% timing accuracy loss. Secondly, the basic
time advance stopping latency and interrupt latency are evaluated by measuring
interrupt latencies in simulation
³ The variation in simulation speedup is mainly due to two reasons: firstly, different experi-
ments and test settings affect the simulation speed of a specific experiment; secondly, experiments
were carried out at different times when the overall functionality and complexity of the proposed
software simulator were evolving, which affected simulation speeds. In general, compared to the
KEIL ARM ISS, the proposed simulation approach has two or three orders of magnitude speedups
in this thesis.
tests. The result accords with the theoretical value, i.e., zero-time latency. The re-
sumable time advance method is demonstrated in a simulation case.
Chapter 4
4.1 Motivation and Contribution
Within the system-level RTOS modelling and simulation research area, there
still exist some unaddressed aspects and issues for improvement. These relate to
the functionality, timing accuracy, and simulation performance of simulation
models. For example, from the perspective of maximising flexibility of system-
level software modelling, designers may want to simulate multiple abstraction-
level software models in one simulation framework. Current RTOS modelling re-
search does not address integrating coarse-grained timed abstract task models (i.e.,
associated with best-case and worst-case execution times) and fine-grained timed
native-code application software (i.e., associated with multiple delay annotations)
in one simulator. Besides, from the perspective of practical RTOS engineering,
some RTOS models provide simplistic task management and limited synchronisa-
tion services, which are inadequate to imitate the behaviour of a real multi-tasking
RTOS. Furthermore, the low timing accuracy is a common, yet critical, problem
borne by many RTOS modelling approaches. On the one hand, this is due to the
lack of inclusion of RTOS services’ timing overheads in modelling. On the other
hand, some SLDL-based modelling methods rely excessively on the uninterrupti-
ble SLDL wait-for-delay time advance mechanism (see Section 3.1.1); conse-
quently, task switches and HW/SW synchronisation can only happen at limited
pre-defined time advance points.
In this chapter, a SystemC-based system-level RTOS-centric real-time embed-
ded software simulation model is presented. Its objectives are fast simulation and
behavioural evaluation of real-time embedded software with good flexibility and
reasonable accuracy in early design phases. Dynamic execution scenarios of a
modelled target system can be exposed by tracing diverse system events and val-
ues in simulation, e.g., RTOS kernel calls, RTOS runtime overheads, task execu-
tion times, dynamic scheduling decisions, task synchronisation and communica-
tion activities, interrupt handling latencies, context switch overheads, and other
properties. The whole simulation framework integrates multi-tasking applications,
RTOS, Live CPU and other hardware component models in a unified SystemC
prototyping environment. The core is a generic RTOS simulation model, which
supplies a set of fundamental and typical services including task management,
scheduling services, synchronisation, inter-task communication, clock services,
context switch and interrupt handling services, etc. These services refer to several
commercial RTOS products and specifications in order to supply general and
standard functions. With the aim of building a timed RTOS simulation model,
timing overheads of various RTOS services and application tasks are also
considered in the models.
All models in the simulation framework are implemented on top of the Sys-
temC library. The basic SystemC core language and the OSCI referenced simula-
tion kernel are used without modification.
In the remainder of this chapter, Section 4.3 introduces a typical embedded
software stack and considers its inclusion within our simulation model. Section
4.4 presents background knowledge of real-time applications and the RTOS. Sec-
tion 4.5 describes the RTOS-centric software modelling approach in detail. Sec-
tions 4.6 and 4.7 introduce evaluation metrics and experiments to demonstrate the
simulation performance, function, and accuracy of RTOS-centric real-time soft-
ware models. Finally, the chapter is summarised in Section 4.8.
[Figure: the software Processing Element (CPU) model in the SystemC environment - its software aspect is the behavioural software simulation model; its hardware aspect is abstracted by the Live CPU Model.]
so software simulation is guaranteed with reasonable timing accuracy and good
HW/SW synchronisation (i.e., interrupt handling) timing accuracy. The whole
software PE model is the research context, i.e., multi-tasking real-time applica-
tions and a RTOS run in a uniprocessor embedded system model.
Due to the high abstraction level of the software simulation approach in this
thesis, advanced CPU architectures such as multiple-level caches and pipelines
are not considered, i.e., their effects on software execution times are not explicitly
modelled. However, according to the software performance estimation methods
discussed in Sections 3.2.5 and 3.2.6, a KEIL ARM ISS without cache is used to
measure software performance for back annotations of our software models in this
thesis. In terms of other specific ISSes, caches may or may not be supported when
the ISS executes software instructions, which means that caches can still affect
timing accuracy of software time annotations. Hence, timing accuracy losses of
software execution times - between the proposed behavioural software simulation,
the referenced ISS, and the real hardware platform - are inevitable. Recalling the
research intention of this thesis for fast and accurate software simulation, it is as-
sumed that the referenced ISS is accurate enough to support and evaluate our be-
havioural software simulation.
As introduced in Section 3.3.1, the memory subsystem for actual software
execution (e.g., RAM) is not included in the Live CPU Model because it is not
necessary for behavioural (i.e., abstract or native-code) software simulation.
Hence, target software memory environments such as stack, heap, and memory
protection, and RTOS memory management services such as swapping, paging,
allocation, segmentation, and virtual memory, are also out of the modelling focus.
Nowadays, there are many general RTOS concepts, popular RTOS standards,
and specific RTOS products. This thesis aims to present a generic RTOS model
for behavioural real-time software simulation. It should be representative yet
without a loss of generality. The selection and determination of functions and re-
quirements of the RTOS model are made with reference to both some classical
RTOS literature [25] [26], and some current RTOS specifications and products,
including:
The Didactical C Kernel (DICK) [25]: this is a small real-time kernel that
introduces basic and important issues for designing a hard real-time kernel
and hence informs our simulation model from the theoretical aspect.
Real-Time extensions of the POSIX (Portable Operating System Interface)
standard 1003.1 (referred to as RT-POSIX hereafter) [150]: this is a very
broad and successful API standard particularly facilitating handling multi-
threading and multiprocessing real-time applications. RT-POSIX is scalable
with four subsets (namely Real-Time Profile PSE51 (minimal), PSE52 (con-
troller), PSE53 (dedicated), and PSE54 (multi-purpose)) for different-scale
systems. The RTOS model in this thesis refers to the PSE51 profile for
small embedded systems.
μITRON (Micro Industrial TRON, where TRON stands for "The Real-time
Operating system Nucleus") 4.0 standard [151]: this standard is oriented to
small/medium-size embedded systems. Over 40% of RTOSs used in Japan are based
on this standard [129]. It inspires the task state machine in the proposed
RTOS model.
μC/OS-II [149], ThreadX [152], and Keil RTX (Real Time eXecutive) [1]:
these are representative, popular small-size RTOSs. Their functions and kernel structures influence the proposed RTOS model from a practical engineering perspective.
QNX Neutrino [147]: this is a RT-POSIX-compliant, multiprocessor-enabled
high-end RTOS. It implements basic thread and real-time services in the
microkernel and can be extended to support multiple processes by adding
optional components.
[Figure: The embedded software stack and its simulation model counterparts: an application software layer (tasks), a middleware layer (distributed computing, servers), and a system software layer (Hardware-dependent Software) containing the RTOS with its API, the Hardware Abstraction Layer (HAL), device drivers, and firmware.]
(e.g., boot firmware, context switches, processor mode change, and interrupt con-
figuration functions) [153]. The intention of the HAL is to ease porting of HdS across different hardware architectures by separating HdS into a hardware-independent part (e.g., most RTOS services) and a hardware-dependent part (i.e., the HAL). Hence, in software development, the hardware-independent part is reusable across architectures to some extent, and only the hardware-dependent part needs hardware-specific development. Furthermore, by using HAL
APIs, upper-level application software can utilise abstract hardware resources
early in the design flow before the hardware architecture is fixed and finished,
which embodies a reuse concept.
One issue is where the HAL should appear in the embedded software stack
model. Firstly, consider the hardware and processor resources available. In Sec-
tion 3.3, the Live CPU Model has been introduced as the underlying hardware
model for software simulation. It can provide essential hardware resource for
modelling interrupt-based HW/SW interaction and clock services. In addition, in
the forthcoming Chapter 5, the Live CPU Model will be extended with TLM in-
terfaces for inter-module communication modelling. Based on these foundations, it is necessary to provide a HAL model in the software modelling stack that offers a set of low-level hardware-related functions. By this means, application software and RTOS models can utilise and configure the Live CPU Model for timing simulation and can access other hardware resources. These HAL functions include context switches, interrupt handling, critical section control, and TLM transfers. To simplify model structures, the HAL model is implemented as a number of member functions inside the SystemC module of the RTOS model. The external behaviour and interface of the generic HAL model are similar to those of a typical embedded software stack. However, the exact functionality of some parts of the HAL model is applicable for simulation purposes only, which means it differs from the HAL code that would finally be implemented.
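The HAL-as-member-functions arrangement might be sketched as follows in plain C++ (all names are hypothetical and for illustration only; the actual model wraps such primitives in the SystemC module of the RTOS model and drives the Live CPU Model rather than merely tracking state):

```cpp
// Hypothetical sketch of HAL primitives folded into the RTOS model class.
// In the real model these would configure the Live CPU Model; here they
// only track state for illustration.
class RtosModelHal {
public:
    // Critical section control, with nesting support.
    void enter_critical() { ++critical_nesting_; }
    void exit_critical()  { if (critical_nesting_ > 0) --critical_nesting_; }
    bool in_critical() const { return critical_nesting_ > 0; }

    // Context switch stub: remembers which task currently "owns" the CPU
    // and returns the previously running task.
    int context_switch(int next_task) {
        int prev = current_task_;
        current_task_ = next_task;
        ++context_switches_;
        return prev;
    }
    int current_task() const { return current_task_; }
    long context_switches() const { return context_switches_; }

private:
    int  critical_nesting_ = 0;
    int  current_task_ = -1;   // -1 denotes the idle state
    long context_switches_ = 0;
};
```

Because these primitives are ordinary member functions, application task and RTOS service models can call them directly, mirroring the function-call interface of a real HAL.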
4.4 Common RTOS Concepts and Features
In a real-time system, the RTOS must be able to handle hard real-time applications to fulfil strict deadline requirements. In addition, because there are different types of applications in the real world, a RTOS may need to support a hybrid set of NRT, hard, and soft real-time applications. In real-time systems research, some approaches have been proposed to not only guarantee the timing constraints of hard real-time tasks, but also optimise the average performance of NRT and soft real-time tasks. For example, hierarchical scheduling schemes use global (so-
called kernel-level or system-level) and local (so-called user-level or subsystem-
level) schedulers to schedule various applications and their inclusive tasks by dif-
ferent scheduling algorithms [26] [155]. There have been some attempts to con-
sider this hybrid application problem both on top of existing RTOSs and in the
design of new RTOS research kernels, e.g., hierarchical scheduling extension on
top of VxWorks [156] and Soft Hard Real-time Kernel (SHaRK) [157]. However,
as indicated in [26] “most OSs schedule all applications according to the same
scheduling algorithm at any given time” – currently, most popular commercial
and open source RTOSs do not have explicit special mechanisms to effectively
support NRT, soft, and hard real-time applications running in the same environ-
ment.
A real-time task has timing properties that must be considered in RTOS management and scheduling. Referring to Figure 4-3, the typical timing parameters of a real-time task consist of [21] [25]:
Arrival time (a): also called release time, which means the time point when
a task is ready to execute.
Offset (O): the time length between time point 0 and the arrival time. In
RTOS execution, it captures the fact that different tasks may not become
ready to run simultaneously after the system is started up.
Worst-Case Execution Time (WCET): the longest possible execution time
of a task.
Best-Case Execution Time (BCET): the shortest possible execution time of
a task.
Execution time (E): the actual execution time of a task, i.e., the time length between the start time (s) and the finish time (f). It should lie in the range between the BCET and the WCET. Note that s could be later than a, because a ready task may need to wait for other higher-priority tasks to finish.
[Figure 4-3. Timing parameters of a real-time task: offset (O), arrival time (a), start time (s), finish time (f), absolute deadline (d), relative deadline (D), and the possible execution time bounded by BCET and WCET.]
Absolute deadline (d): a time point before which a real-time task must com-
plete its execution, otherwise undesired consequences will happen.
Relative deadline (D): the time length between the arrival time and the abso-
lute deadline, i.e., d = a + D.
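The parameters above can be collected in a small record. The following sketch (field names are illustrative, not drawn from the thesis model) also encodes the relations d = a + D and BCET ≤ E ≤ WCET:

```cpp
// Illustrative timing record for one job of a real-time task.
// All times are in integer ticks; field names are hypothetical.
struct TaskTiming {
    long arrival;       // a: release time
    long offset;        // O: arrival time measured from time point 0
    long start;         // s: actual start time (s >= a)
    long finish;        // f: actual finish time
    long bcet;          // best-case execution time
    long wcet;          // worst-case execution time
    long rel_deadline;  // D: relative deadline

    long exec_time() const { return finish - start; }            // E = f - s
    long abs_deadline() const { return arrival + rel_deadline; } // d = a + D

    // A job is plausible if BCET <= E <= WCET; it meets its deadline
    // if f <= d.
    bool plausible() const {
        return bcet <= exec_time() && exec_time() <= wcet;
    }
    bool meets_deadline() const { return finish <= abs_deadline(); }
};
```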
It is very common that a real-time task needs to repeat its execution regularly or irregularly. Based on their activation patterns, tasks can be classified into three types:
Periodic: a task executes once in every regular time interval, i.e., a period (T). Each execution is called an instance or a job. In RTOS execution, a periodic task can be triggered either by an external periodic event or by the clock tick timer.
Aperiodic: a task may execute once or many times, but its activation rate is not constant. In RTOS execution, aperiodic tasks are usually used to handle interrupt events.
Sporadic: an aperiodic task with a minimum time interval between any two consecutive jobs.
While different operating systems vary in terms of what components they con-
tain, the kernel is the core part of a RTOS. A RTOS kernel must at least provide
basic functions with respect to task management, interrupt handling, intertask
synchronisation and communication [25]. Some large kernels may also wrap addi-
tional system software modules such as drivers and file systems, but this is not
common in RTOSs. In fact, many RTOSs can be seen as kernels because of their limited functionality and small size. Application tasks can access ker-
nel functions and data through a series of source-level API functions. In some
embedded systems, the kernel and application software may have their own mem-
ory address spaces for the purpose of memory protection. In real execution, a call
to an API function is known as a system call (see Figure 4-4), which is effectively
[Figure 4-4. Two RTOS kernel structure approaches: (A) the monolithic kernel, with the scheduler, IPC, virtual memory, file system, network, and device drivers all in kernel space; (B) the microkernel, with only the scheduler, basic synchronisation/IPC, and a message server in kernel space, while file system, network, virtual memory, and device driver servers run in user space.]
The monolithic kernel is a conventional RTOS design approach and is popular
for small or deeply embedded applications [17]. Referring to Figure 4-4 (A), it
implements all OS services (e.g., scheduler, task management, synchronisation,
inter-process communication (IPC), memory management, interrupt handlers) and
some system software modules (device drivers, file systems and network stacks)
in the kernel space. That is to say, the monolithic kernel itself equals the entire
RTOS subsystem. RTOS service functions directly call each other as they need.
The main advantage of the monolithic kernel approach is straightforward usage
and fast performance due to simple function calls [158]. However, the tight inte-
gration of many components in the kernel is error-prone, so that a bug in one
module can bring down the whole system [160]. VxWorks [161] and μC/OS-II are
often cited as monolithic kernel RTOS examples.
As shown in Figure 4-4 (B), the microkernel approach only provides a few es-
sential OS services in the kernel space such as task management, scheduler, basic
synchronisation/IPC, and a message manager [158] [162]. Other services are usu-
ally provided as normal server processes running in the user mode. A message
passing system is introduced to support communication between these server
processes [163] [164]. Applications request services from these servers via system
calls through a client-server method. The loose-coupling modularity and clear
separation between kernel services and user-level services make a microkernel
RTOS more reliable and compact. However, when processing a system call, the client-server model may incur extra run-time context switches from an application's memory space to the server's memory space, resulting in considerable message communication overheads. On balance, the microkernel approach is seen as a promising method for complex and scalable RTOSs such as the QNX Neutrino RTOS.
they can be embodied in the RTOS simulation model. The following subsections discuss key aspects related to this problem.
A good RTOS should not only provide efficient OS services, but also keep its own time consumption and response times predictable and accountable [26]. Ideally, a RTOS should guarantee a fixed execution time for each service, or at least a known upper bound, under all system load circumstances. Taking the scheduler as an example, the μC/OS-II RTOS looks up a table to find the highest-priority task, so its task scheduling time is constant regardless of the number of tasks created [149]. In contrast, the Olympus real-time kernel moves tasks between two queues when it makes a scheduling decision; hence the overhead of its scheduler varies with the number of queue operations. Its worst-case execution time can be computed from the number of tasks in the system [165].
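The constant-time lookup idea can be sketched as follows: ready tasks set a bit in a priority bitmap, and the highest ready priority is found in a bounded number of operations regardless of how many tasks are ready. This is a simplified illustration of the technique, not the actual μC/OS-II code (which uses a pre-computed lookup table):

```cpp
#include <cstdint>

// Simplified priority bitmap for up to 64 priority levels (0 = highest).
// Finding the highest ready priority costs a fixed number of steps,
// independent of the number of ready tasks.
struct ReadyBitmap {
    uint64_t bits = 0;

    void set_ready(int prio)   { bits |=  (1ULL << prio); }
    void clear_ready(int prio) { bits &= ~(1ULL << prio); }

    // Returns the highest ready priority (lowest set bit), or -1 if idle.
    int highest_ready() const {
        if (bits == 0) return -1;
        int prio = 0;
        uint64_t b = bits;
        // Constant cost: a 6-step binary search over 64 bits.
        if ((b & 0xFFFFFFFFULL) == 0) { prio += 32; b >>= 32; }
        if ((b & 0xFFFFULL)     == 0) { prio += 16; b >>= 16; }
        if ((b & 0xFFULL)       == 0) { prio += 8;  b >>= 8;  }
        if ((b & 0xFULL)        == 0) { prio += 4;  b >>= 4;  }
        if ((b & 0x3ULL)        == 0) { prio += 2;  b >>= 2;  }
        if ((b & 0x1ULL)        == 0) { prio += 1; }
        return prio;
    }
};
```

A queue-based scheduler, by contrast, would walk or re-link task queues, so its overhead grows with the number of queue operations, as in the Olympus example above.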
In addition to predictability, “fast real-time performance”, or analogously “rapid real-time response”, is the RTOS feature of greatest concern to many real-time embedded software developers [166]. This reflects the real-world requirement that a real-time system promptly processes interrupt events within a bounded amount of time. Failure to respond may result in a failure of the real-time embedded system.
Two foremost timing properties (latencies) are usually used to evaluate the re-
sponse capability of a RTOS, namely interrupt latency and task switching latency
[167] [168]. Typically, they are on the order of a few to a few tens of microseconds [26]. Figure 4-5 shows two different interrupt handling schemes (i.e., the RTOS-
assisted scheme and the vector-based scheme) used by two RTOS products [149]
[166]. They are good examples in terms of diverse views on interrupt latency and
task switching latency:
Interrupt latency is usually defined as the elapsed time between the occurrence of an interrupt and the entry (first instruction) of the corresponding software interrupt handler. In the RTOS-assisted interrupt handling scheme in Figure 4-5 (A), the interrupt handler consists of two parts, i.e., the kernel handler and the user ISR handler. There, the interrupt latency refers to the elapsed time between the interrupt event and the beginning of the kernel handler, and the term interrupt response time denotes the longer elapsed time between the interrupt event and the beginning of the user ISR. In contrast, in the vector-based interrupt handling scheme in Figure 4-5 (B), the user ISR is the only interrupt handler in charge and can be activated directly. Therefore, the interrupt latency is the same as the interrupt response time, as the figure shows.
[Figure 4-5. Two definitions of interrupt latency and task switching latency: (A) the RTOS-assisted interrupt handling scheme; (B) the vector-based interrupt handling scheme.]
Task switching latency is sometimes used interchangeably with the term context switching latency [168] [149] [167]. In Figure 4-5 (A), task switching latency comprises two portions, i.e., the time to save the currently executing task's context and the time to load another task's context. It is shorter than the interrupt time. However, in Figure 4-5 (B), task switching latency refers to the time elapsed from the interrupt event to the beginning of a task that is activated because of the interrupt. It is greater than the interrupt latency (response). Comparing the two definitions of task (context) switching latency, the first reflects the point of view of the processor context switch, since it effectively refers to the time consumed by the processor to save and load the register context; whereas the second reflects an OS context switch viewpoint, because it accounts for the total switching time used by the RTOS to save and load tasks. To eliminate the ambiguity, this thesis uses the first definition.
4.4.3.2 Multi-Tasking Management
[Figure 4-6. A minimal RTOS task state machine: READY, RUNNING, and WAITING states, with transitions activate (to READY), dispatch (READY to RUNNING), preempt (RUNNING to READY), wait (RUNNING to WAITING), signal (WAITING to READY), and terminate.]
dress), then there is only one kind of concurrent entity in the system. Hence, the terms task and thread can be used interchangeably to refer to a basic concurrent activity, but the term process is not appropriate in this context.
After applications are divided into multiple tasks, the RTOS enables them to execute (namely, occupy the processor resource) in an interleaved manner in order to finish their jobs and meet their respective deadlines. Task management services are im-
plemented inside or utilised by various higher-level scheduling, synchronisation,
and RTOS initialisation services. Typical multi-tasking primitive functions in-
clude creating tasks, suspending tasks, resuming tasks, and terminating tasks, etc.
These functions control state transitions of tasks during their execution. Figure 4-6
shows a classical minimal RTOS task state machine that has three basic states:
RUNNING, READY, and WAITING [25]. During RTOS execution, a task must be in exactly one of these states:
RUNNING: in a uniprocessor system, only one task can enter this state and
execute at a time. If the RUNNING task is pre-empted, then it enters the
READY state.
READY: Tasks in this state are eligible for execution, but cannot execute immediately because another task is currently in the RUNNING state. All READY tasks are organised by the RTOS kernel in a queue, named the ready queue [25]. The scheduler regularly checks the ready queue and the RUNNING task (if there is one) according to the scheduling policy in use, in order to dispatch a new task to run when the policy permits.
Table 4-1. Multi-tasking models in some standards and RTOSs

              Task/Thread  Process   Task state machine
RT-POSIX      pthread      Optional  Implementation-dependent
µITRON 4.0    task         N/A       RUNNING, READY, 3 WAITING states, DORMANT, NON-EXISTENT
µC/OS-II      task         N/A       3 basic states and 2 additional states
ThreadX       thread       N/A       3 basic states and 2 additional states
Keil RTX      task         N/A       RUNNING, READY, 7 WAITING states, INACTIVE
QNX Neutrino  pthread      Optional  3 basic states and up to 18 additional states
Additional states or sub-states are commonly introduced to extend the task state machines of different RTOSs, in order to support more task state transitions and RTOS services. Table 4-1 surveys the multi-tasking/threading/processing models and task state machines of some RTOS standards and products.
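The minimal three-state machine of Figure 4-6 can be sketched as an enum with guarded transitions. This is an illustrative fragment only (real RTOSs add further states, as Table 4-1 shows):

```cpp
// Minimal three-state task model following Figure 4-6.
enum class TaskState { Ready, Running, Waiting };

struct Task {
    // "activate" brings a new task into the READY state.
    TaskState state = TaskState::Ready;

    bool dispatch() {            // READY -> RUNNING
        if (state != TaskState::Ready) return false;
        state = TaskState::Running; return true;
    }
    bool preempt() {             // RUNNING -> READY
        if (state != TaskState::Running) return false;
        state = TaskState::Ready; return true;
    }
    bool wait() {                // RUNNING -> WAITING
        if (state != TaskState::Running) return false;
        state = TaskState::Waiting; return true;
    }
    bool signal() {              // WAITING -> READY
        if (state != TaskState::Waiting) return false;
        state = TaskState::Ready; return true;
    }
};
```

Each transition is guarded, so an illegal request (e.g., pre-empting a task that is not RUNNING) is rejected rather than corrupting the state.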
that they become READY. Certainly, pre-emptive scheduling policies are prefer-
able in RTOSs, because they can respond to external events in a timely manner.
In priority-based scheduling, all tasks are assigned priorities according to crite-
ria such as the period (so-called rate) or the deadline. Priority-based scheduling is
natively pre-emptive as long as a higher-priority task is allowed to pre-empt a
lower-priority task. Depending on whether the priorities of tasks are assigned be-
fore execution or are dynamically assigned in execution, there are two types of
priority-based scheduling policies, i.e., Fixed-Priority Scheduling (FPS) and Dy-
namic-Priority Scheduling (DPS). The FPS scheme is easy to implement in RTOS
design because the RTOS kernel needs only to maintain a priority queue or a pri-
ority table in execution.
In FPS research, a priority assignment is an important problem and can be seen
as a prerequisite of FPS. However, it is not directly related to RTOS design and
implementation, because users mostly specify priorities of their application tasks.
Rate Monotonic (RM) priority ordering is the most common priority assignment algorithm for periodic FPS systems. In RM, tasks are assigned fixed priority levels inversely proportional to their periods, i.e., the task with the shortest period is assigned the highest priority. Under the simple periodic task model assumptions, RM has been shown to be optimal among all FPS policies, which means that if RM cannot schedule a task set, then no other FPS algorithm can do so either [171]. There also exist other priority assignment policies, such as Deadline Monotonic priority ordering [172] and the Optimal Priority Assignment [173], which have been proven optimal under their specific assumptions. A review of this topic can be found in [174].
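RM assignment itself is a one-line rule: sort tasks by period and give shorter periods higher priorities. A sketch, under the common convention that 0 is the highest priority (the struct and function names are illustrative):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct PeriodicTask {
    long period;   // T
    int  priority; // assigned by RM; 0 = highest
};

// Rate Monotonic: fixed priorities inversely proportional to periods.
void assign_rm_priorities(std::vector<PeriodicTask>& tasks) {
    std::sort(tasks.begin(), tasks.end(),
              [](const PeriodicTask& a, const PeriodicTask& b) {
                  return a.period < b.period;
              });
    for (std::size_t i = 0; i < tasks.size(); ++i)
        tasks[i].priority = static_cast<int>(i);
}
```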
In the context of FPS, multiple tasks may share the same priority level in a RTOS; a FIFO algorithm can be used as a supplementary policy in this case. However, this is not ideal, because a task may monopolise the CPU for a very long time while other equal-priority tasks are starving. The Round-Robin (RR) scheduling policy tackles this problem: it allocates each task a maximum amount of execution time, a so-called time slice or quantum. Once a RUNNING task has exhausted its quantum, it is moved to
the tail of its priority queue by the scheduler and the head task of the priority
queue will be dispatched.
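The RR rule within one priority level can be sketched as a queue rotation: when the running (head) task exhausts its quantum, it moves to the tail and the next head task is dispatched. A minimal illustration, with task IDs standing in for task control blocks:

```cpp
#include <deque>

// One priority level's ready queue under Round-Robin scheduling.
struct RoundRobinQueue {
    std::deque<int> queue;  // head = currently running task
    int quantum;            // time slice, in ticks
    int remaining;          // ticks left for the running task

    explicit RoundRobinQueue(int q) : quantum(q), remaining(q) {}

    int running() const { return queue.empty() ? -1 : queue.front(); }

    // Advance one tick; rotate the queue when the quantum is exhausted.
    void tick() {
        if (queue.size() < 2) return;      // nothing to rotate to
        if (--remaining == 0) {
            queue.push_back(queue.front());
            queue.pop_front();
            remaining = quantum;
        }
    }
};
```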
The Sporadic Server scheduling algorithm is introduced to improve the average
response time of aperiodic tasks in fixed-priority systems [175]. It creates a high-
priority server (a task) for serving aperiodic tasks. The server is allocated an
amount of processor time, i.e., execution capacity. Aperiodic tasks execute at the
priority level of the server and consume its execution capacity. If aperiodic task execution exhausts the capacity, the server priority is decreased to a low level, which means aperiodic tasks then execute at a low priority and cannot frequently pre-empt other periodic tasks. The execution capacity is replenished periodically according to the replenishment period.
In terms of DPS policies, the Earliest Deadline First (EDF) scheduling policy is
probably the most notable one. In EDF, priorities are inversely proportional to
absolute deadlines of tasks and are assigned dynamically. It has been demonstrated that EDF is optimal for uniprocessor systems in terms of fully utilising the processor bandwidth [171]. EDF also has other theoretical advantages over FPS, namely lower schedulability analysis complexity and fewer context switches [25]. However, a major disadvantage prevents EDF from being implemented in common commercial RTOSs: EDF requires the
RTOS kernel to track and update absolute deadlines and priorities of tasks at each
job activation, which increases the complexity of the kernel and brings run-time
overheads [176]. Consequently, nowadays the EDF scheduler is mostly provided
in some research RTOS kernels such as SHaRK [157] and MaRTE OS [177].
SHaRK implements the EDF scheduler as an external scheduling module, which
is used by its core generic kernel to schedule tasks. In order to popularise applica-
tion-defined scheduling and standardisation, MaRTE OS uses RT-POSIX inter-
faces to introduce a user-thread-level EDF scheduler. However, the literature [26] has argued that the latter method may incur expensive overheads from excessive system calls for tracking system time and setting tasks' priorities.
Although there is a fair amount of research on real-time scheduling strategies
today ([25] surveys many details), FPS is the most common priority-based and
[Table 4-2. Scheduling policies in some standards and RTOSs, surveying support for pre-emptive FPS, FIFO, Round Robin, and Sporadic Server scheduling.]
Table 4-3. Priority levels in some standards and RTOSs

              Priority levels                        Priority range  Lowest priority  Sharable
                                                     of user tasks   of user tasks    priority
RT-POSIX      Each scheduling policy has at least 32  User-defined   User-defined     Yes
µITRON 4.0    At least 16                             User-defined   1                Yes
µC/OS-II      64                                      4-59           59               N/A
ThreadX       32                                      0-31           31               Yes
Keil RTX      255                                     1-254          1                Yes
QNX Neutrino  256                                     1-63           1                Yes
assurance4. The author deems that 256 priority levels can perform very well, even for the most complex FPS systems.
According to the survey in Table 4-3, priority levels range from 16 to 256
across different RTOS specifications and products, and there are no special re-
strictions on what priority levels are available to user tasks and which priority
represents the lowest level.
Tasks may contend for shared resources (e.g., registers, variables, data struc-
tures, memory areas) in order to communicate or process data in execution. It is
necessary to guarantee that operations on a shared resource are carried out in a
consistent and protected manner, which means that a shared resource can only be
used by one task at a time, i.e., achieving mutually exclusive access. The code segment that modifies a mutually exclusive resource is called a critical section, and its instructions need to execute without interruption. Like general-
purpose OSs, almost all RTOSs provide some conventional lock-based synchroni-
sation mechanisms (e.g., mutexes and semaphores)5 to implement mutually exclu-
4
It is reported that, compared to a RM system with 100,000 priority levels, the relative schedula-
bility of the system with 256 levels is merely reduced to 0.9986 [26].
5
Semaphores can be divided into two classes depending on their value: the general-form counting semaphore with a non-negative integer value, and the simple-form binary semaphore with values of zero or one. The latter can be used for mutual exclusion. Mutexes can be seen as a specialised binary semaphore used for mutual exclusion only [17].
sive access to shared resources through the atomic primitives wait and signal (also called P/V or lock/unlock operations).
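The wait/signal pair on a counting semaphore can be sketched as follows. This is a single-threaded illustration of the counter logic only; a real kernel would execute these operations atomically and actually block and resume the calling tasks:

```cpp
#include <deque>

// Counting semaphore counter logic. In a real RTOS, wait() on a zero
// count would block the caller; here we only record the task id in a
// wait queue for illustration.
struct Semaphore {
    int count;
    std::deque<int> waiters;  // task ids blocked on this semaphore

    explicit Semaphore(int initial) : count(initial) {}

    // wait (P / lock): take one unit, or join the wait queue.
    bool wait(int task_id) {
        if (count > 0) { --count; return true; }  // acquired
        waiters.push_back(task_id);               // caller would block here
        return false;
    }

    // signal (V / unlock): wake one waiter, or return the unit.
    int signal() {
        if (!waiters.empty()) {
            int woken = waiters.front();
            waiters.pop_front();
            return woken;   // this task would become READY
        }
        ++count;
        return -1;          // no task was waiting
    }
};
```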
When a mutually exclusive resource is held by a task using a lock-based syn-
chronisation method, then other competing tasks that want to acquire the resource
cannot get the resource immediately and are said to be blocked. The notable prior-
ity inversion phenomenon refers to the situation where a high-priority task is
blocked on a resource that is already locked by a low-priority task. The problem
becomes more severe considering the low-priority task may in turn be pre-empted
by one or more intermediate-priority tasks, which is referred to as transitive
blocking. In these cases, the high-priority task cannot enter its critical section and must wait for both the low-priority task and an uncertain number of intermediate-priority tasks to finish. This means that the duration of priority inversion is unbounded; hence, the finish time of the high-priority task is unpredictable.
In order to solve the unbounded priority inversion problem, real-time systems
research has proposed some resource access control protocols. The Priority Inheri-
tance Protocol (PIP) [179], the derived Priority Ceiling Protocol (PCP), and the
further-improved Immediate Priority Ceiling Protocol (IPCP) [180] are three of
the most well-known protocols applied to FPS systems. They can be bracketed
into the same PIP protocol family because of their close relevance. The basic idea
behind them is similar: the priority of the task that incurs a blocking is temporarily
changed to a higher priority that is inherited according to some algorithms; then
the task can execute through its critical section without being pre-empted by a
medium-priority task; and finally the task’s priority is restored after it exits the
critical section. However, due to their difference in protocol definitions, these
three protocols differ in their properties. In general, PIP suffers from potentially long blocking durations, chained blocking, and deadlock, but it does not require prior knowledge of the resources shared by tasks and hence is easy to implement at the user level on top of an existing RTOS. Compared to PIP, PCP can prevent deadlock and chained blocking. However, it requires the programmer to define a ceiling priority for each shared resource, and the OS kernel needs
to keep track of ceiling values and task priorities, which adds both implementation complexity and run-time overheads [17].

Table 4-4. Resource access protocols in some standards and RTOSs

              Priority Inheritance Protocol  Immediate Priority Ceiling Protocol
RT-POSIX      Yes                            Yes
µITRON 4.0    Yes                            Yes
µC/OS-II      Yes                            N/A
ThreadX       Yes                            N/A
Keil RTX      Yes                            N/A
QNX Neutrino  Yes                            N/A

Furthermore, IPCP improves the
PCP in terms of being easier to implement and with low overheads. Exact defini-
tions, analysis and comparisons of these protocols can be found in [25] and [181].
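The core of PIP can be illustrated in a few lines: when a higher-priority task blocks on a mutex, the holder temporarily inherits the blocked task's priority, and its original priority is restored on release. This sketch is illustrative only; a real implementation also handles nested locks and transitive inheritance:

```cpp
#include <algorithm>

// Minimal Priority Inheritance sketch. Lower number = higher priority.
struct PipTask {
    int base_priority;
    int active_priority;  // may be boosted by inheritance
};

struct PipMutex {
    PipTask* holder = nullptr;

    // Returns true if the lock was acquired. Otherwise the caller
    // blocks, and the holder inherits the caller's priority if it
    // is higher than the holder's current one.
    bool lock(PipTask& t) {
        if (holder == nullptr) { holder = &t; return true; }
        holder->active_priority =
            std::min(holder->active_priority, t.active_priority);
        return false;  // caller blocks on the mutex
    }

    // Release: restore the holder's original priority.
    void unlock() {
        if (holder) {
            holder->active_priority = holder->base_priority;
            holder = nullptr;
        }
    }
};
```

While boosted, the low-priority holder can run through its critical section without being pre-empted by medium-priority tasks, which bounds the blocking of the high-priority task.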
Some RTOS standards and commercial products have implemented one or more of the above protocols. Because the classical PIP family protocols assume that there is only one unit of each shared resource, they are naturally employed on the mutex synchronisation mechanism, which provides mutual
exclusion to a single-unit resource. Table 4-4 summarises the resource access con-
trol protocols utilised in some standards and RTOSs, where PIP is the most com-
mon protocol and IPCP is also provided in RT-POSIX and µITRON specifica-
tions, although the PCP does not appear in the survey.
Although binary semaphores can also be used for mutual exclusion, access
control protocols are not applied to them in most OS specifications6 [150] [152]
[1]. Instead, semaphores are mainly used for event notification and thread syn-
chronisation through an embedded counter, i.e., in the form of a counting sema-
phore. In addition, it is worth noting that some access control protocols (derivatives of PIP and PCP [26]) can support safe access to multiple-unit resources, which makes them usable with counting semaphores. However, they have not attracted much interest from RTOS designers.
6
The RTEMS RTOS [182] is an exception in that it supports PIP and PCP on binary semaphores. In fact, RTEMS provides mutex functionality through its binary semaphores.
4.4.3.6 Summary of RTOS Features in the Model
In this thesis, the features of the proposed RTOS simulation model are mainly determined based on the surveys in Sections 4.4.3.1 to 4.4.3.5 above. In order to be practically useful for current system-level embedded software development, this research prefers to model common characteristics and services of the surveyed RTOS standards and products, rather than invent and integrate many proprietary features and theories. However, the two surveyed RTOS standards (i.e., RT-POSIX and μITRON) and four RTOS products (i.e., μC/OS-II, ThreadX, RTX, and QNX Neutrino) together cover a wide range of RTOS services and features, which are too broad to all be included in the generic RTOS simulation model. Instead, the three small-size RTOSs (i.e., μC/OS-II, ThreadX, RTX) are used as the focus for RTOS modelling.
Regarding the predictable and responsive timing behaviour of a RTOS (Section
4.4.3.1), the RTOS modelling approach attempts to model common RTOS situa-
tions from two aspects.
1) Firstly, this thesis considers the timing latencies introduced in Section
4.4.3.1, providing annotations for all related RTOS services’ timing over-
heads in simulation models. Normally, the timing overhead of a RTOS ser-
vice is annotated at the service/function level or statement segment level if
possible. Usually, the timing accuracy of the RTOS model is sufficient if the execution time of each service can be obtained before simulation starts, i.e., its value is fixed and known. However, if a service's timing overhead is determined dynamically during execution, a simple calculation function can be inserted in the model to sum the aggregated timing overhead. Otherwise, a degradation of timing accuracy occurs, which may not be acceptable for the real-time systems being modelled (i.e., if there are hard deadlines). Section 4.5.9.2 will describe the general method of modelling a RTOS service with timing information.
2) Secondly, the thesis aims to simulate the common interrupt handling proc-
esses and other services of real RTOSs, in order to represent timing behav-
iour of a system in simulation accurately. In this thesis, the modelling and
simulation approach supports the above-mentioned two interrupt handling
schemes (Figure 4-5). Section 4.5.7 will introduce them in detail. Note that
some over-complex or proprietary RTOS functions are not implemented in
the models. Consequently, their timing behaviour cannot be represented in
simulation.
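The annotation scheme of point 1) can be sketched as follows: each modelled service consumes either a fixed, pre-characterised overhead or a simple function of its run-time workload (e.g., the number of queue operations), accumulated on a simulated clock. The sketch below is a plain C++ stand-in for the SystemC time-advance mechanism, with illustrative tick costs:

```cpp
// Stand-in for SystemC time annotation: each service call adds its
// overhead to a simulated clock. The costs are illustrative ticks,
// not measured values.
struct AnnotatedKernel {
    long sim_time = 0;

    static constexpr long FIXED_SCHEDULE_COST = 4;  // table-lookup style
    static constexpr long PER_TASK_QUEUE_COST = 2;  // queue-based style

    // Fixed overhead, known before simulation starts.
    void schedule_fixed() { sim_time += FIXED_SCHEDULE_COST; }

    // Overhead computed at run time from the current workload.
    void schedule_queue_based(int ready_tasks) {
        sim_time += PER_TASK_QUEUE_COST * ready_tasks;
    }
};
```

The fixed variant corresponds to services whose cost is obtainable before simulation; the queue-based variant corresponds to the dynamically determined overheads summed by a calculation function, as described in point 1).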
4.5 The Real-Time Embedded Software Simulation
Model
[Figure: The software PE simulation model. The application software layer contains task models, each either an abstract software model (characterised by timing parameters such as period and wcet) or a native-code software model (containing functional code).]
4.5.1.1 Software Layers
In the whole software PE simulation model, two software layers (i.e., the appli-
cation software layer and the RTOS layer) constitute the software part, namely a
software stack model.
From the top down, the application software is divided into several execution
entities (i.e., tasks) and each entity can be modelled as an abstract software model
or a native-code software model. The former focuses on quickly simulating timing
properties of applications. The model is characterised by a set of timing parameter
as introduced in Section 4.4.1. Whereas the latter aims to simulate functions of
applications by using functional code close to actual implementation at the ex-
pense of simulation speed reduction.
No matter which way application software is modelled, each application task
model is projected onto a RTOS-level task/thread model, which is runnable in the
SystemC simulation environment. The task/thread abstraction is handled as the
software scheduling entity in RTOS kernel multi-tasking management, which is
true for most RTOSs in the research context. Process models can optionally be created in modelling, but they do not actively compete for resources.
System calls are implemented mainly as function calls to APIs of the RTOS model, i.e., application tasks call member methods of the RTOS kernel module (an object of a C++ class). This function-call style is similar to the real situation in a RTOS. The modelling method also has a useful side effect of protecting RTOS kernel data structures, because data access is guarded by C++ object-oriented access control; this mirrors the natural distinction between the user space and the kernel space in a real-time embedded software stack. Based on the essential services provided in the RTOS kernel model, its APIs can be partially configured to mimic different RTOS standards and products. Because the APIs of various proprietary RTOSs may differ considerably from each other in functionality and function parameters, exact compatibility with a specific RTOS is not a goal of this thesis. Rather, generality, ease of use, and reasonable accuracy are desired for system-level behavioural software simulation.
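The encapsulation idea above can be sketched in plain C++ (this is an illustrative sketch of ours, not the thesis's actual class; all names are hypothetical): system calls are public member functions of a kernel object, while kernel data structures stay private and thus out of reach of application code, mirroring the user-space/kernel-space boundary.

```cpp
#include <cassert>
#include <map>
#include <string>

class RtosKernel {
public:
    // Public member functions play the role of system-call APIs.
    int create_task(const std::string& name, int priority) {
        (void)name;                      // name kept only for illustration
        int tid = next_tid_++;
        priorities_[tid] = priority;     // kernel-private bookkeeping
        return tid;
    }
    bool set_priority(int tid, int priority) {
        auto it = priorities_.find(tid);
        if (it == priorities_.end()) return false;  // invalid task id
        it->second = priority;
        return true;
    }
    int priority_of(int tid) const { return priorities_.at(tid); }
private:
    // Kernel data: application code cannot touch these fields directly.
    std::map<int, int> priorities_;
    int next_tid_ = 0;
};
```

Application tasks would hold a pointer to such an object and invoke its methods, just as the task models in this chapter call member methods of the RTOS module.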
As introduced in Section 4.4.2, the kernel structure is a first-class concern when designing and modelling a RTOS. The RTOS kernel model encapsulates all its data and functions in a single class, a structure akin to the monolithic approach. However, since modelling extended OS components such as device drivers, file systems, and network stacks is outside the scope of this thesis, the presented RTOS model contains only fundamental services (e.g., multi-tasking management, scheduling, inter-task synchronisation, and interrupt handling) that are commonly provided by a microkernel. From this lightweight (i.e., limited and essential) service modelling perspective, the RTOS model is also similar to a microkernel.
Given the high abstraction level of the software and hardware simulation models, real memory space management and the processor MMU are not modelled. Hence, the potential advantages and disadvantages of monolithic and microkernel structures are neither revealed nor evaluated in behavioural RTOS modelling. This choice brings benefits in modelling simplicity and simulation speed, but limits functionality; addressing it remains future work.
In the RTOS kernel, some HAL primitives directly interact with the Live CPU
Model for advancing software simulation time and setting system states. System
clock interrupts and other external hardware interrupts can invoke associated in-
terrupt handlers in the RTOS kernel module.
In Figure 4-7, the hardware layer is represented by the Live CPU Model introduced in Section 3.3. It is the hardware part of the software PE model and the basis of the upper software layers. Its main purpose is to support and assist behavioural software simulation from two perspectives: supporting pre-emptible time advance and modelling hardware I/O. In some conventional system-level real-time software and RTOS simulations (e.g., [113] [114] [126]), the application software model and the RTOS model constitute a PE, and there is in fact no hardware model in the PE. Unlike them, the Live CPU Model executes software delay annotations in a way that is conceptually comparable to the way a real CPU executes instructions. The Live CPU Model also monitors the real-time clock and external interrupts and can start, stop, and resume a software delay time advance without any undesired latency.
If some other SystemC-based hardware modules need to be combined with the
software PE model for further HW/SW co-simulation, they can be connected to
the Live CPU Model by either SystemC interface method call channels or specific
TLM interfaces (see Chapter 5).
As indicated by the lowest layer of Figure 4-7, all models in the SystemC-
based real-time software simulation framework are implemented in SystemC and
C++. Figure 4-8 illustrates how various components of the software PE simulation
model are implemented and relate to each other in SystemC. Depending on their
functionality and creator, they can be divided into two classes:
Software PE related models (See upper half of Figure 4-8): There are inher-
ent hardware and system software components in the software PE model,
which provide standard services for simulating user applications. Simulation
users can directly use these default services in their software simulation
models. Additionally, users can modify them or add new models (services)
depending on the necessity. In implementation, each model in this category
is implemented as a SystemC SC_MODULE in a separate header file. There are three types of SC_MODULEs: the Live_CPU module, the RTOS module, and the task module, which represent the Live CPU Model, the RTOS kernel model, and the RTOS task models, respectively.
User application models and simulation related programs (See lower half
of Figure 4-8): This part contains models and programs that are defined by
simulation users in order to simulate specific software applications in the
software PE environment. Referring to apps_main.cpp in Figure 4-8,
an application task model is given as a segment of C/C++ code, which in-
cludes an application task body function, global variables to be shared by
multiple tasks, and possible timing parameters of this application task. Re-
ferring to simulation_main.cpp in Figure 4-8, objects of various
hardware and software models are created and connected with each other in
the sc_main() function so as to constitute a whole SystemC simulation
program. Specifically, in the research context of “a uniprocessor system”,
there should be a single Live_CPU object, a single RTOS object, several
RTOS task objects, and several user application functions.
Models and objects are organised and invoked in a straightforward hierarchy,
according to their logical relationship - namely, the RTOS runs on top of the CPU
and software tasks run on top of both the RTOS and the CPU. Referring to Figure
4-8, the dotted lines, and pseudo code in simulation_main.cpp and
apps_main.cpp demonstrate their interrelationship. An object of the
Live_CPU model is created and then used as an argument to create a RTOS ob-
ject (See the dotted line (A)). By this means, various RTOS functions can make
use of CPU resources. Similarly, as illustrated by dotted lines (B) and (C), both the Live_CPU object and the RTOS object are passed to application task body functions as arguments. The meaning of this is twofold: firstly, a task body function executes on the CPU; secondly, it can call services provided by the RTOS.

[Figure 4-8. The software PE simulation model in SystemC (reconstructed sketch):

  // rtos_main.h -- the RTOS kernel module
  SC_MODULE(RTOS)
  {
    SC_HAS_PROCESS(RTOS);
    RTOS(... , CPU *cpu_i[CPU_NUM]);
    SC_THREAD(rtos_init)
    SC_THREAD(... )
    // Service functions, variables
    ...
  };

  // task.h -- the RTOS task module
  SC_MODULE(task)
  {
    task(... , void (*func)(RTOS *, CPU *), ... );
    SC_THREAD(create_task_routine)
    SC_THREAD(run_task_routine)
  };

  // Live CPU module
  SC_MODULE(Live_CPU)
  {
    SC_HAS_PROCESS(Live_CPU);
    Live_CPU(name ... );
    SC_METHOD(cpu_sim_engine)
    SC_METHOD(cpu_ic)
    // Virtual registers, port connections
    ...
  };

  // simulation_main.cpp
  {
    Live_CPU CPU0("CPU0");                        // Create a CPU object
    Live_CPU *cpu_i[CPU_NUM] = {&CPU0};
    RTOS RTOS_i("RTOS_i", cpu_i);                 // Create an RTOS object (A)
    task task1(... , func1, task1_struct, ... );  // Create task objects (D)
    task task2(... , func2, task2_struct, ... );
    task task3(... , func3, task3_struct, ... );
    sc_start(t, SC_SEC);                          // Start SystemC simulation
  }

  // apps_main.cpp
  Task parameters:  task1_struct{type, state, wcet, ...},
                    task2_struct{type, state, wcet, ...},
                    task3_struct{type, state, wcet, ...}
  Task body functions (B)(C):  void func1(RTOS *, CPU *){...},
                               void func2(RTOS *, CPU *){...},
                               void func3(RTOS *, CPU *){...}
  Shared variables:  rtos_sem sem0; rtos_msg mes0; int array[];]
Then, an object of the RTOS task module is created, because it is the schedul-
ing entity of both the RTOS kernel and the underlying SystemC simulation kernel.
Referring to the dotted line (D), a user-application task body is wrapped to a
RTOS task object with a one-to-one correlation, in order to be involved in a
RTOS-based software simulation.
The modular structure introduced above makes the software PE simulation model simple, reusable, and extensible. The simple structure of the whole simulation model is representative yet abstract enough to stand for a real-time embedded software system. The interdependency between different sub-models and sub-modules is reduced as far as possible through carefully and explicitly defined interfaces. The inherent software scheduling model (i.e., the RTOS) and execution model (i.e., the Live CPU Model) are distinct and independent from user-developed application task models; hence the reusability of the RTOS model and the Live CPU Model is preserved to some extent. Referring to the code line *cpu_i[CPU_NUM] in the constructor of the RTOS module in Figure 4-8, the RTOS module can accept several CPU objects, which means that the RTOS model retains the potential to be extended into a multi-processor RTOS model in future development.
The abstract task model applies to situations where application software code has not yet been completed for modelling, or where the simulation user has little interest in functional simulation. Such task models are primarily intended to simulate the timing behaviour of real-time software with the assistance of a RTOS model.
Table 4-5 shows an abstract periodic task model. Referring to lines 1-10, some user-defined task identity information (e.g., task type, initial state, etc.) and timing properties of the model are stored in a data structure variable that will later be used to create a RTOS TCB during the task creation process. Note that Table 4-5 does not show all necessary user-defined items; they are shown in bold in Table 4-8. The timing behaviour of a task is characterised by a set of parameters, e.g., BCET, WCET, relative deadline, period, and offset. In simulation, the BCET and WCET can be used to generate an intermediate random value that serves as the execution time of a specific task instance; otherwise, the WCET is used as the execution time of every task instance, because worst-case behaviour is usually of greater concern in real-time system simulation. The relative deadline is converted into an absolute deadline during execution, in order to facilitate deadline-driven scheduling or to monitor whether a task misses its deadline. The period explicitly specifies how often a task should be regularly activated. The offset specifies the initial waiting time of the first task job in execution. In simulation, the RTOS kernel model can track the period, the offset, and instance numbers of an abstract periodic task in order to support periodic task execution (see Section 4.5.4). Note that not all of the above four parameters are required for an abstract task model. For example, an aperiodic task that services an interrupt does not have a period parameter.
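How these parameters drive a simulation can be sketched in plain C++ (an illustrative sketch with names of our own choosing, not the model's code): a per-job execution time is drawn between the BCET and WCET, the relative deadline is converted to an absolute one at release, and the period and offset fix each job's activation time.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Timing parameters of an abstract task (illustrative struct).
struct TaskTiming {
    uint64_t bcet, wcet;          // best-/worst-case execution time
    uint64_t relative_deadline;
    uint64_t period;
    uint64_t offset;              // initial waiting time of the first job
};

// Draw a job execution time in [bcet, wcet]; without randomness the WCET
// would simply be used as the pessimistic default.
uint64_t job_execution_time(const TaskTiming& t, std::mt19937& rng) {
    std::uniform_int_distribution<uint64_t> d(t.bcet, t.wcet);
    return d(rng);
}

// Activation time of the k-th job: job 0 starts at the offset.
uint64_t activation_time(const TaskTiming& t, uint64_t job_index) {
    return t.offset + job_index * t.period;
}

// Relative deadline converted to an absolute deadline at release time.
uint64_t absolute_deadline(const TaskTiming& t, uint64_t release_time) {
    return release_time + t.relative_deadline;
}
```

For example, a task with offset 5 and period 100 has its fourth job (index 3) activated at time 305, with an absolute deadline 50 time units later if its relative deadline is 50.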
Lines 11-21 of Table 4-5 show the body function of an abstract periodic task model. A RTOS model object and a Live CPU Model object are passed to the function body, in order to let the task use RTOS functions and CPU resources, although the task model usually contains little or no functional code. Optionally, the simulation user can specify a probabilistic function in order to generate a random execution time for each task job; this method is also used in research with a similar purpose [8] [72]. The time advance method (i.e., lines 16 and 17) was introduced in Chapter 3; note that the event object at line 17 is exclusive to this task model. As noted before, this time advance process is interruptible and the task model is pre-emptible.
At lines 18 and 19 of Table 4-5, a RTOS function task_wait_cycle() is called to notify the RTOS kernel that this periodic task has reached its end and waits for its next execution cycle (period). Accordingly, the RTOS kernel will take actions to process this request, which will be introduced in Section 4.5.4.3. An abstract aperiodic task model instead calls another RTOS function, task_wait_suspend(), which suspends the task indefinitely until it is invoked by an interrupt again.
Note that if a task is not independent, i.e., it cooperates with other tasks or competes with them for shared resources, then specific RTOS synchronisation or communication services must be called in the body function. In this case, the native-code task model is more applicable.
#001 // Defining parameters of a task in a struct
#002 {
#003 THREAD_TYPE task_type;
#004 THREAD_STATE task_state;
#005 unsigned __int64 relative_deadline;
#006 unsigned __int64 delay_time;
#007 … … }
#008
#009 // Task body function
#010 void task(RTOS *rtos_i_ptr, CPU *cpu_i)
#011 {
#012 while(1){
#013 func_block1(); // The 1st block does some functions
#014 DELAY(B1_DELAY); // Pass B1_DELAY to Live CPU
#015 wait(event); // Wait for time advance
#016 … … // The 2nd block does some functions
#017 … … // The 3rd block does some functions
#018 rtos_i_ptr->sleep(500); // Call RTOS API: sleep()
#019 wait(event); // Wait for next execution
#020 }
#021 }
Table 4-6. An example native-code task model
Referring to Table 4-6, the wait-for-event time advance method is briefly repeated here. A timing delay annotation (line 14) interleaves with a code block (line 13) in the function body. The DELAY() function at line 14 injects a delay value B1_DELAY into the Live CPU Model. The granularity of a delay annotation depends on the choice of the simulation user, for example, the basic-block level or statement-segment level. Unlike [72] [43], delay annotation statements in the native-code task model do not define fixed pre-emption points for HW/SW synchronisation. Their main purpose is to notify the Live CPU Model how much computation time a code block needs, and then let the task wait for an event that will be released when the delay time has been consumed. The event object at line 15 is exclusively used in this task model. Interruption and pre-emption can happen at any necessary (i.e., an interrupt event occurs) and possible (i.e., system-wide interrupts are enabled) time point during a delay duration.
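The idea that a delay annotation does not fix pre-emption points can be sketched in plain C++ (a conceptual sketch of ours, not the Live CPU Model's code): a delay is consumed in small steps of simulated time, and an interrupt may cut it short at any step, not only at annotation boundaries.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>

// Consume an annotated delay on a simulated clock. Returns the amount of
// simulated time actually consumed before an interrupt fired (or the full
// delay if none did). `irq_pending` is a hypothetical interrupt check.
uint64_t consume_delay(uint64_t delay, uint64_t step,
                       uint64_t& clock,
                       const std::function<bool(uint64_t)>& irq_pending) {
    uint64_t consumed = 0;
    while (consumed < delay) {
        uint64_t s = std::min(step, delay - consumed);
        clock += s;                 // advance the simulated clock
        consumed += s;
        if (irq_pending(clock))     // pre-emption point inside the delay
            break;
    }
    return consumed;
}
```

With no pending interrupt the whole annotated delay is consumed; with an interrupt arriving mid-delay, the time advance stops part-way, which is the behaviour the wait-for-event method relies on.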
Compared to the two task examples from the ThreadX RTOS [152] and the μC/OS-II RTOS [149] in Table 4-7, the body function of a native-code task model does not differ much from the entry function of a real RTOS task in terms of code structure. That is, a loop contains the C/C++ main functional code, and a RTOS system function is called at the end of the loop body in order to suspend the task. The periodic execution of a task can be achieved by calling the RTOS time delay function, e.g., line 18 in Table 4-6 and lines 8 and 19 in Table 4-7. This coherence facilitates using the simulation model with conventional RTOS applications. The differences mainly reside in two points in the native-code model: firstly, time annotations and synchronisation points are inserted for time advance; secondly, a RTOS service is invoked through a pointer to a RTOS model object.

#001 // An example body function of a ThreadX thread
#002 void data_capture_process(ULONG thread_input)
#003 {
#004   while(1){
#005     temp_memory[frame_index][0] = tx_time_get();
#006     temp_memory[frame_index][1] = 0x1234;
#007     frame_index = (frame_index + 1) % MAX_TEMP_MEMORY;
#008     tx_thread_sleep(1);
#009   }
#010 }
#011 // An example body function of a μC/OS-II task
#012 void TaskClk(void *pdata)
#013 {
#014   char s[40];
#015   pdata = pdata;
#016   for(;;){
#017     PC_GetDateTime(s);
#018     PC_DispStr(60,23,s,DISP_FGND_BLUE+DISP_BGND_CYAN);
#019     OSTimeDly(OS_TICKS_PER_SEC);
#020   }
#021 }
Table 4-7. Two task examples in ThreadX RTOS and μC/OS-II RTOS
Given that application software has been divided into task body functions and
that their timing parameters are provided, it is necessary to create RTOS-level
task models in order to let the RTOS kernel organise these execution entities. As
mentioned before, the task/thread concept is chosen as a RTOS scheduling unit.
Based on the survey in Section 4.4.3.2, such a multi-tasking model is common
and powerful enough to organise real-time embedded applications in various
RTOS products. In the modelling approach, a RTOS task/thread model is imple-
mented as an object of the SystemC task module. Figure 4-9 shows the defini-
tion of a RTOS task and its relationship to an application task model.
Note that there is a clear separation between a user task model and a RTOS task model, in terms of both the modelling concept and the SystemC implementation.

[Figure 4-9. Definition of a RTOS task model and its relationship to an application task model (reconstructed sketch):

  // RTOS task model definition
  SC_MODULE(task)
  {
    SC_HAS_PROCESS(task);
    task(sc_module_name name, *tcb,
         void (*func)(RTOS *, CPU *), ... );
    SC_THREAD(create_task_routine);
    dont_initialize();
    sensitive << rtos_i_ptr->event_0;
    SC_THREAD(run_task_routine);
    dont_initialize();
    sensitive << TSB[tid].event[0];
  };

  // Application task model definition
  // Task parameter structures
  tcb = {… … tid, type, state, … …}
  tsb = {… … , event[], … … }
  // Task body function
  void entry_function(RTOS *rtos_i_ptr, CPU *cpu_i)
  {
    while(1)
    { … … }
  }]

The creation of a unified RTOS task object only utilises the task information and the function body defined by a user in one application task model (see Section 4.5.2), which means it is not necessary to define a variety of RTOS task modules in order to accommodate different applications.
In the modelling approach, the implementation of a RTOS task involves two data structures and two operations: the Task Control Block (TCB), the Task Service Block (TSB), initialising a TCB, and wrapping a function body.
Definition of the Task Control Block
Every RTOS needs a TCB structure for each task in order to store task-specific
properties and manage the task through the TCB during run-time. Table 4-8 de-
scribes the TCB fields within the RTOS model. Among them, the bold fields can
be provided in user-defined application models (see Section 4.5.2). All TCB fields
can generally be classified within three categories:
The task ID and status section: Fields in this section are related to statically-
assigned identifiers and dynamically-changed states;
The task timing information section: This section stores some timing pa-
rameters of a real-time task as well as time advance information;
The pointers section: Some pointers are provided in order to correlate mes-
sage and synchronisation event control blocks to a task, and they are also
used to maintain task scheduling queues.
Although the contents of the TCB are an internal affair in the design of a RTOS simulation model, a certain degree of similarity between the model's TCB and a real RTOS's TCB is still helpful in allowing simulation users to inspect and understand the state of tasks in simulation.

Table 4-8 (excerpt). TCB fields in the RTOS model:

  Task ID and status section:
  rtos_tcb_cpu_id         The CPU to which a task belongs
  rtos_tcb_pid            Process identifier
  rtos_tcb_tid            Task identifier
  rtos_tcb_thread_type    Task type
  rtos_tcb_thread_state   Task state
  rtos_tcb_wait_flag      Sub-state of the WAITING state
  rtos_tcb_base_prio      Initial (base) priority
  rtos_tcb_cur_prio       Current (effective) priority

  Pointers section:
  *rtos_tcb_…             Pointer to an event control block
  *rtos_tcb_msg           Pointer to a message
  *rtos_tcb_back          Pointer to the previous TCB in a scheduling queue
  *rtos_tcb_next          Pointer to the next TCB in a scheduling queue

  Time information section (fields referenced elsewhere in the text):
  thread_sleep_length     Sleep/delay time of the task
  thread_abs_dln          Absolute deadline of the task
  context[CONTEXT_LENGTH] Processor-related context for software time advance

Comparing the RTOS model's TCB to those of μC/OS-II and ThreadX, their task ID/status sections and pointers sections are mostly alike. The significant differences include:
1) The RTOS model’s TCB omits memory stack setting fields, which how-
ever do exist in the TCB of a real RTOS. This is because the RTOS model
does not aim to model software execution memory space.
2) Regarding the timing information section, the RTOS model’s TCB has
some real-time task-related timing fields; whereas, a real RTOS’s TCB
does not normally contain them. The proposed TCB is based on the consid-
eration that these real-time parameters are necessary for abstract task mod-
elling and real-time system simulation.
3) The context[CONTEXT_LENGTH] field is essential for software time advance in our timed software simulation method. Its six sub-fields constitute the "processor-related context" of a task model. Their values need to be written to and read from the virtual CPU_REGs of the Live CPU Model at each context switch. The context-switch process will be introduced in Section 4.5.8.2. Note that a real RTOS TCB does not need these fields, but instead contains the real program counter, stack pointer, and other data registers.
Definition of the Task Service Block
The TSB is a user-defined data structure associated with each RTOS task model. Its main purpose is to store simulation-related configuration parameters and statistical information of a task that are not normally contained in a real TCB, in order to simplify the TCB structure. The most useful field of a TSB is an sc_event object array: each sc_event object is exclusive to one task function body (as shown in Table 4-5 and Table 4-6), and the Live CPU Model controls the time advance of each task model via the wait-for-event method and these sc_event objects. The job sequence number and the initial offset are two other notable TSB fields; they record how many instances a task has executed and the task's initial offset, which are used to calculate the activation time of an abstract periodic task.
Initialising a TCB
In the model implementation, a vacant TCB array (rtos_tcb_array[]) is defined before the TCB initialisation process. Referring to Figure 4-9, the SC_THREAD(create_task_routine) takes charge of initialising a TCB item in the array. A unique task ID connects the existing TCB item to this initialisation process. The create_task_routine uses the task properties provided in the user-defined data structure (see Table 4-5 and Table 4-6) and initialises all necessary fields of the corresponding task's TCB. Note that both the offset in Table 4-5 and the delay_time in Table 4-6 correspond to the thread_sleep_length subfield in Table 4-8, which represents a task possibly being delayed for some time after its creation, i.e., with an offset. The thread_abs_dln subfield in Table 4-8 refers to the absolute deadline of a task; if necessary, it can be computed as the sum of the task creation time and its relative deadline.
Figure 4-10 illustrates the timeline of the TCB initialisation process in a real RTOS execution scenario. Normally, task creation happens just after the RTOS kernel has been initialised. The RTOS kernel initialisation necessarily consumes some simulated time; consequently, there is a time offset from the zero time of the simulated clock to the initialisation of the first TCB. Furthermore, every task creation activity sequentially progresses the target clock. In our approach, the practical timing behaviour of this execution order is modelled by two techniques:
[Figure 4-10. Timeline of the TCB initialisation process: after the RTOS initialisation, which starts at time 0, the user code creates task1, task2, and task3 in sequence, and TCB1, TCB2, and TCB3 are initialised at successive points on the timeline.]
⁷ We assume that all tasks are created just after the RTOS kernel initialisation but before the start of the OS multi-tasking service. Interrupts are disabled at that time. Hence, the use of wait-for-delay is allowed here.
Of the two kinds of SystemC processes, SC_METHOD and SC_THREAD, the latter is selected as the wrapper because it can be suspended and resumed during execution; this behaviour is essential for modelling task pre-emption and simulation time advance.
In Figure 4-9, the SC_THREAD(run_task_routine) behaves as such a
wrapper to encapsulate a task entry function. It is sensitive to a sc_event stored
in its TSB.
In some complex RTOSs (e.g., QNX and other RT-POSIX compliant ones),
applications are managed in both the process model and the task/thread model,
where a process contains at least one thread and provides a memory space for all its threads. This two-level structure brings advantages such as better modularity because of distinct process containers, less interdependency since each process has its own definition, and greater reliability because threads are protected in separate memory spaces [184].
It is important to reiterate that the multi-tasking model in this thesis is based on a single-level task/thread abstraction, without modelling memory management functions; modelling processes is outside the research scope. However, in order to preserve the extensibility of the software PE simulation model, a simple Process Control Block (PCB) model is defined as well (see Table 4-9). In the modelling approach, a process can be created by modelling the RT-POSIX spawn() function, which creates a child process by directly specifying an executable to load; its implementation is very similar to the previously mentioned task/thread creation method. A process and its threads are related to each other through the pcb_child_tcb_array[] field in the PCB and the pid field in the TCB.

Table 4-9. Fields of the Process Control Block model:

  Field                          Description
  pcb_pid                        Process identifier
  pcb_uid                        User identifier
  pcb_gid                        Group identifier
  pcb_child_tcb_array[NUM]       Array of child task/thread tids
  pcb_process_base_priority      Initial (base) priority
  pcb_process_current_priority   Current (effective) priority
  start_address                  Starting address of the process's memory space
  end_address                    Ending address of the process's memory space
  *pcb_back                      Pointer to the previous PCB in a scheduling queue
  *pcb_next                      Pointer to the next PCB in a scheduling queue
The task state machine is the basis of both multi-tasking management and scheduling services in our RTOS kernel model. The task state machines implemented in some real RTOS products were surveyed in Section 4.4.3.2.
Note that the task state machines implemented in some existing RTOS modelling research use terms and structures that are confusing or uncommon in practical RTOSs. For example, as shown in Figure 4-11 (A), [8] and [11] implement a similar task state machine including four states⁸: Idle, Ready, Executing, and Pre-empted. Two points of this model are worth discussing:
1) In a normal RTOS kernel, if a task is pre-empted, then it usually enters the
“READY” state. However, in Figure 4-11 (A), a special Pre-empted state is
defined as different from its Ready state, which may be unnecessary.
2) In a normal RTOS kernel, if a task is blocked due to waiting on a synchro-
nisation method (namely a resource) or explicit self-suspension, then it
usually goes to the “WAITING” state. In Figure 4-11 (A), a task enters the
Pre-empted state when it is waiting on data, whereas it enters the Idle state
for self-suspension. The two states cannot simply be interpreted as synonyms of the classical "WAITING" state, because of the ambiguous meaning of the Pre-empted state: in that model it covers the function of both the "READY" state and the "WAITING" state, which diverges from classical RTOS concepts.
Figure 4-11 (B) shows a seven-state RTOS task state model presented in [12].
Just as its authors indicated, it is similar to the task state machine commonly used
in UNIX systems. Hence, although it is complete and expressive, it may not be applicable to small, compact RTOSs. Two observations can be made:
⁸ Hereafter, only the first letters of the states in the referenced RTOS task state models are capitalised. Distinctively, the states in the surveyed RTOS products in Section 4.4.3.2 and in our RTOS model are spelled in capital letters.
Figure 4-11. Task state machines: (A) reprinted from [8] [11]; (B) reprinted from [12]
The task state machine divides the classical "RUNNING" state into a User mode and a Super User mode, which is not common in RTOSs.
Following from the above feature, the task state machine also divides the classical "READY" state into two states, i.e., Ready and Waiting. This makes the state machine redundant for RTOS modelling.
Research in [114] implements the three-state (i.e., READY, RUNNING,
WAITING) task state model depicted in Figure 4-6. This canonical structure is
also the basis for research in this thesis. Furthermore, based on the survey in Table
4-1, a four-state extensible task state machine is proposed to contain more states
in order to be more representative and correspond to specific kernel services of
some RTOSs. Figure 4-12 shows its structure and task state transitions. The main
modelling idea behind this is as follows:
1) Add a TERMINATED state, because it appears to be useful in many RTOS products. For example, it is referred to as, or is similar to, the INACTIVE state in RTX [1], the DORMANT state in μITRON [151] and μC/OS-II [149], and the COMPLETED and TERMINATED states in ThreadX [152]. The TERMINATED state is the exit point of a task in the system, that is, the task has finished and cannot execute again.
2) Subdivide the WAITING state into seven sub-states, i.e., WAITING_SUS, WAITING_SEM, WAITING_MUT, WAITING_QUE, WAITING_EVT, WAITING_DLY, and WAITING_CYC. As shown in Figure 4-12, each sub-state corresponds to a specific cause of waiting (e.g., suspension, semaphore, mutex, message queue, event, delay, or periodic cycle).
[Figure 4-12. The proposed task state machine (excerpt): a task enters the system via task creation and exits via the TERMINATED state, which it reaches either by self-terminating on finishing execution or by being terminated or deleted by another task.]
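The proposed state machine can be sketched in plain C++ (a simplification of ours based on the surrounding text, not an exhaustive reproduction of Figure 4-12): READY and RUNNING exchange through dispatch and pre-emption, RUNNING blocks into one of the WAITING sub-states, a wake-up returns a waiting task to READY, and any task can be moved to TERMINATED.

```cpp
#include <cassert>

enum class TaskState {
    READY, RUNNING,
    WAITING_SUS, WAITING_SEM, WAITING_MUT, WAITING_QUE,
    WAITING_EVT, WAITING_DLY, WAITING_CYC,
    TERMINATED
};

bool is_waiting(TaskState s) {
    return s >= TaskState::WAITING_SUS && s <= TaskState::WAITING_CYC;
}

// Checks whether a single transition is legal under the simplified rules:
// READY <-> RUNNING (dispatch / pre-emption), RUNNING -> WAITING_* (block),
// WAITING_* -> READY (wake-up), and any state -> TERMINATED (exit/deletion).
bool legal_transition(TaskState from, TaskState to) {
    if (to == TaskState::TERMINATED) return true;
    if (from == TaskState::READY)   return to == TaskState::RUNNING;
    if (from == TaskState::RUNNING) return to == TaskState::READY || is_waiting(to);
    if (is_waiting(from))           return to == TaskState::READY;
    return false;   // TERMINATED is final
}
```

A kernel model can call such a check before every state change as a cheap debugging guard against illegal transitions.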
The RTOS normally manages tasks by organising their TCBs in several queues
[25] [26]. Usually, there are two pointers in a TCB by which multiple TCBs link
to each other (See *rtos_tcb_back and *rtos_tcb_next in Table 4-8).
As mentioned in Section 4.4.3.2, a ready queue and a waiting queue are necessary
for maintaining tasks at the READY state and WAITING state, respectively. In
addition, the TERMINATED state needs a separate queue. Because there is only
one RUNNING task in the uniprocessor system at any time, the RUNNING state does not need a queue; instead, a RTOS_RUNNING_TCB pointer indicates the TCB of the current RUNNING task. If this RTOS model is extended to a multi-processor platform in future research, the RUNNING state can have multiple RTOS_RUNNING_TCB pointers.
The exact implementation of a queue varies between RTOSs. In μC/OS-II, the ready queue is effectively implemented as a table with two variables: an integer and an integer array [149]; their bits represent task states and task IDs, respectively. The RTOS kernel looks up the table to find the highest-priority READY task, and removes a task from the ready list by clearing a bit of the integer variable. Such an implementation saves limited memory space, improves lookup speed, and keeps the lookup execution time constant; however, it is not very user-friendly or easy to visualise. In QNX, the ready queue is implemented as 256 separate queues, one linked list per priority level [162]. This structure is well organised, with an insertion time complexity of O(1), but its implementation complexity is relatively high for modelling.
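The single priority-ordered doubly linked TCB list chosen in this thesis can be sketched in plain C++ (field and function names here are illustrative, not the model's): insertion walks the list in O(n) and places a new TCB after existing TCBs of equal priority, giving FIFO order among equals as one possible policy.

```cpp
#include <cassert>
#include <vector>

struct TCB {
    int tid;
    int prio;                // larger value = higher priority (assumption)
    TCB* back = nullptr;     // previous TCB in the queue
    TCB* next = nullptr;     // next TCB in the queue
};

struct TaskQueue {
    TCB* head = nullptr;

    // O(n) priority-descending insertion; equal priorities stay FIFO.
    void insert(TCB* t) {
        TCB* cur = head;
        TCB* prev = nullptr;
        while (cur && cur->prio >= t->prio) { prev = cur; cur = cur->next; }
        t->back = prev;
        t->next = cur;
        if (prev) prev->next = t; else head = t;
        if (cur) cur->back = t;
    }

    // Debugging helper: task IDs from head (highest priority) to tail.
    std::vector<int> tids() const {
        std::vector<int> out;
        for (TCB* c = head; c; c = c->next) out.push_back(c->tid);
        return out;
    }
};
```

Overloading the insertion routine (e.g., placing a new TCB before its equals instead of after) would yield a LIFO policy among same-priority tasks, which corresponds to the customisation point mentioned in the text.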
In this thesis, in order to balance implementation complexity in modelling against operating time complexity in simulation, a basic task queue is implemented as a single priority-descending⁹ doubly linked list (see Figure 4-13). All tasks in the same state are inserted into the queue according to their priorities, with a time complexity of O(n); same-priority tasks are adjacent. This is similar to the ready thread list of ThreadX [152]. Basic primitives are provided to manipulate and debug a queue, for example, inserting a TCB, deleting a TCB, returning the head of the queue, reporting the number of TCBs in the queue, and printing one or all TCBs in the queue. The ready queue, waiting queue, and terminated queue all inherit from this base task queue class but may derive different functions from it. For example, a simulation user can implement a specific policy for how same-priority TCBs are ordered in a queue, e.g., FIFO or LIFO, by overloading the insertion primitive.
⁹ In order to support EDF scheduling, a task queue in the model can also be ordered ascendingly by tasks' absolute deadlines; corresponding primitives have been implemented in the model.
[Figure 4-13. The priority-descending doubly linked task queue: list.first points to the highest-priority TCB; TCBs at each priority level (from level i, high, down to level k, low) are linked through their *tcb_next and *tcb_back pointers, with NULL terminating each end.]
¹⁰ These two sets of services represent a narrow definition of "multi-tasking services". Other specific task services such as scheduling and synchronisation will be introduced in the following related sections.
Services implemented in the RTOS model | ThreadX | μc/OS-II | RTX
Create a task | tx_thread_create | OSTaskCreate | os_tsk_create
Task state transition services | ... | ... | ...
Table 4-10. Task services in the RTOS model and some RTOSs
In order to model a specific RTOS product, services of the RTOS model may also need to be
adapted. The careful definitions of the task state machine and the TCB structure provide a sound base from which existing services can be revised or new ones added to the RTOS model with few obstacles.
Supporting periodic execution of abstract tasks is a notable task service of the
RTOS model. It is shown as the service “Transfer from RUNNING to WAIT-
ING_CYC” in Table 4-10 and is implemented as the function
task_wait_cycle() in Table 4-11. Upon being called by a task model (see
the example in Section 4.5.2.1), this function firstly calculates the task’s next activation time according to its first release time and the number of instances that are stored in its TCB. The next activation time is then converted to a sleep value rela-
tive to the current time stamp and is set in the thread_sleep_length sub-
field of the task’s TCB. Finally, the task_wait_cycle() function moves the
task to the WAITING_CYC state to let it wait for its next activation. Afterwards,
clock interrupts check whether the task should be awakened (see Section 4.5.5.3).
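The computation inside task_wait_cycle() can be sketched as follows (a simplified stand-in assuming integer nanosecond timestamps; the struct and its fields are illustrative analogues of the TCB sub-fields, not the model’s actual code):

```cpp
#include <cstdint>
#include <cassert>

// Hypothetical per-task record mirroring the TCB sub-fields used here.
struct PeriodicTask {
    uint64_t first_release;  // first activation time (ns)
    uint64_t period;         // task period (ns)
    uint64_t started;        // number of instances already started
};

// Sketch of the task_wait_cycle() computation: derive the next activation
// time from the first release time and the instance count, then convert it
// to a sleep length relative to the current timestamp (the analogue of the
// thread_sleep_length sub-field).
uint64_t sleep_length(const PeriodicTask &t, uint64_t now) {
    uint64_t next_activation = t.first_release + t.started * t.period;
    return (next_activation > now) ? next_activation - now : 0;
}
```

A zero return models a task that has already overrun into its next cycle and should be reactivated at the next clock tick.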
In terms of SystemC implementation, according to the specification in Table
4-10, task services are implemented in the RTOS module as normal member func-
tions rather than separate SystemC processes (See Table 4-11). They are invoked
by tasks through a pointer to the RTOS object. In order to be general, they require
minimal input parameters. Depending on needs, a task service can output status
values indicating a success or a failure, as well as other specified information.
Note that task state transition services usually result in a rescheduling action by
the RTOS scheduler.
#001 SC_MODULE(RTOS)
#002 {
#003 SC_HAS_PROCESS(RTOS);
#004 RTOS(sc_module_name name, CPU *cpu_i[CPU_NUM]);
#005 … …
#006 /*Task state transition-related services*/
#007 unsigned int task_create(void);
#008 unsigned int task_terminate(unsigned int tid);
#009 unsigned int task_delete(unsigned int tid);
#010 unsigned int task_give_up_CPU(void);
#011 unsigned int task_resume_sus(unsigned int tid);
#012 unsigned int task_resume_dly(unsigned int tid);
#013 unsigned int task_sleep(unsigned __int64 t);
#014 unsigned int task_wait_suspend(void);
#015 unsigned int task_wait_cycle(void);
#016 /*TCB-related services*/
#017 rtos_tcb* task_tcb_get_pointer(void);
#018 unsigned int task_tcb_get_info(rtos_tcb *source, rtos_tcb *dest);
#019 unsigned int task_change_prio(unsigned int tid);
#020 unsigned int task_change_time_slice(rtos_tcb *tcb,
#021 unsigned __int64 new_slice);
#022 };
Notably, the creation of a task module object in the sc_main() function does not contradict using the task_create() function in task body functions. The former plays the functional role of creating a task in SystemC simulation, but it cannot be used in an application task, nor can it reflect the timing overhead of a dynamic task creation at simulation runtime. The latter is functionally a dummy; however, it complies with the traditional RTOS programming method by modelling the task creation API of a specific RTOS. This is undertaken
in order to support conventional real-time software simulation. In addition, in case
of a dynamic creation in simulation, it can be annotated with timing consumption
of a task creation service, and hence can represent its timing behaviour at a correct
timing point when it is called. This dual task creation technique utilises the SystemC modular modelling approach and supports native-code real-time software models.
In this thesis, the task_create() function can model both the static task
creation and dynamic task creation, provided that all related task module ob-
jects have been statically created. This “pseudo” dynamic task creation could be
seen as a limitation of the modelling method. The reason for this is that a task is
created by creating a SystemC SC_MODULE, but the SystemC standard does not
natively support “dynamic creation or modification of the module hierarchy dur-
ing simulation” [66].
The Priority Assignment is the basis of scheduling in the RTOS model for FPS
scheduling. Figure 4-14 depicts the priority setting of the RTOS model. This prioritisation system is fully configurable by defining some constants, as shown in the figure. In general, at least 256 levels should be available, with the exact number depending on a specific configuration.
The lowest priority level 0 (i.e., the smallest number) is always assigned to the special IDLE task11. Some of the highest priority levels (i.e., the largest numbers)
are currently reserved without use. In the whole priority range, all ISR priorities
are higher than normal task priorities. In the RTOS model, these ISRs represent a special kind of aperiodic task that users can define, but they are distinct from user-defined normal aperiodic tasks, which belong to the normal real-time tasks in this model. The specific priority ordering algorithm for normal real-time tasks is de-
pendent on the simulation user’s choice and is unimportant to RTOS modelling
research here (See Section 4.4.3.3 for an introduction to some classical priority
ordering algorithms). If there are non-real-time tasks in the system, then they
should be allocated priorities lower than all other real-time tasks.
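A possible configuration of these priority bands, using the constant names from Figure 4-14 (the concrete values and the band boundaries are assumptions for illustration, not the model’s defaults):

```cpp
#include <cassert>

// Hypothetical priority-band configuration: larger number = higher priority.
enum : unsigned {
    RTOS_LOWEST_PRIORITY  = 0,    // reserved for the IDLE task
    TASK_LOWEST_PRIORITY  = 1,    // non-real-time tasks start here
    TASK_HIGHEST_PRIORITY = 199,  // top of the normal task range (assumed)
    ISR_LOWEST_PRIORITY   = 200,  // ISRs sit above all tasks (assumed)
    ISR_HIGHEST_PRIORITY  = 255   // tick_isr at the top in the default setting
};

// Band checks a scheduler or debugger might use.
inline bool is_isr_priority(unsigned p) {
    return p >= ISR_LOWEST_PRIORITY && p <= ISR_HIGHEST_PRIORITY;
}
inline bool is_task_priority(unsigned p) {
    return p >= TASK_LOWEST_PRIORITY && p <= TASK_HIGHEST_PRIORITY;
}
```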
[Figure 4-14. The priority configuration of the RTOS model: level 0 (RTOS_LOWEST_PRIORITY) holds the IDLE task; above it lie the NRT tasks’ priority range (from TASK_LOWEST_PRIORITY), the normal RT tasks’ priority range (up to TASK_HIGHEST_PRIORITY), and the ISRs’ priority range (from ISR_LOWEST_PRIORITY).]

11 The IDLE task is always ready to run. It is dispatched when there are not any other runnable tasks in the system, which actually means that the CPU is idle.

Note that in the TCB depicted in Table 4-8, there are two priority fields, i.e.,
rtos_tcb_base_prio and rtos_tcb_cur_prio, which represent the ba-
sic (initial) priority of a task and the current (effective) priority of a task, respec-
tively. In RTOS execution, a task’s current (effective) priority is used by the
scheduler because it is updated in case of a priority change operation.
The basic algorithm of FPS is to compare the current priority of the RUNNING
task and the current priority of the first task in the ready queue. The result of the
comparison is the basis on which to make a scheduling decision. Regarding FIFO
and RR algorithms in the scheduler model, their theories and usages were intro-
duced in Section 4.4.3.3. The RTOS model follows the classical concepts and can
choose one of the two algorithms for all tasks in the system.
monitor, and a conceptual software executing engine in [72] [113] [185]. In
the RTOS model in this thesis, the scheduler simply performs the software function of choosing the next-to-run task and then calls the task switch service. The low-level task switch service and the Live CPU Model collectively finish the remaining work to activate the next-to-run task.
Referring to Figure 4-15, the working flow of the FPS scheduler model can be
described as follows:
1) Once the scheduler is triggered, it compares the current priority of the
RUNNING task and the current priority of the first task in the ready queue.
There may be three results:
2) If the current RUNNING task’s priority is higher, then the scheduler needs
to check whether the RUNNING task is blocked by a condition. If it is
blocked, then the RUNNING task is moved to the waiting queue, and the
first READY task is chosen as the new next-to-run task. Otherwise, the
scheduler just exits. No task switch is necessary, and the RUNNING task
continues execution.

[Figure 4-15. Working flow of the FPS scheduler model: scheduler() is invoked in time-driven or event-driven mode; it compares the priority of the RUNNING task (Priorun) with the priority of the first task in the ready queue (Prioready), then branches on whether the RUNNING task is blocked and, under RR, whether it has consumed its time slice.]
3) If the first READY task’s priority is higher, then it is removed from the
ready queue and chosen as the new next-to-run task. The old RUNNING
task is inserted into the ready queue, which means it is pre-empted. Then
the scheduler calls the task switch service and finishes.
4) When their priorities are equal and if the FIFO algorithm is set up, then the
RUNNING task continues executing and the scheduler just exits. If the RR
algorithm is chosen, the scheduler checks whether the RUNNING task’s
time slice is exhausted. If it is, then the first READY task is dispatched as
the new RUNNING task and the old RUNNING task is inserted into the
ready queue. If the RUNNING task’s time slice is not yet exhausted, then the scheduler just exits.
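The four-step flow above can be condensed into a decision function (an illustrative sketch with hypothetical names; the real scheduler additionally moves TCBs between queues and calls the task switch service):

```cpp
#include <cassert>

enum class Policy   { FIFO, RR };
enum class Decision { KEEP_RUNNING, SWITCH_TO_READY, MOVE_TO_WAITING };

// Sketch of the FPS scheduling decision: compare the RUNNING task's current
// priority against the head of the ready queue (larger number = higher).
Decision schedule(int prio_run, bool run_blocked,
                  int prio_ready, Policy policy, bool slice_exhausted) {
    if (prio_run > prio_ready)          // RUNNING task still has highest priority
        return run_blocked ? Decision::MOVE_TO_WAITING
                           : Decision::KEEP_RUNNING;
    if (prio_ready > prio_run)          // first READY task pre-empts
        return Decision::SWITCH_TO_READY;
    // Equal priorities: FIFO keeps the RUNNING task; RR rotates only when
    // the time slice is exhausted.
    if (policy == Policy::RR && slice_exhausted)
        return Decision::SWITCH_TO_READY;
    return Decision::KEEP_RUNNING;
}
```

In the MOVE_TO_WAITING case the first READY task is also chosen as the new next-to-run task; the sketch only returns the decision itself.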
[Figure: timeline of a system tick. In each tick_timer_clk period, the hardware tick triggers the tick_isr ISR, whose rtos_time_tick() service and the scheduler() run in the RTOS before user task execution resumes.]
some common RTOS products. In [176] [26], two implementation methods of an
EDF scheduler in OSs are discussed:
1) Implementing an EDF scheduler on top of usual RTOS kernel with a lim-
ited number of priority levels: The kernel maps absolute deadlines to pri-
orities and allows changing priority at runtime. However, this method is
“not easy nor efficient” [176]. [176] shows an example situation: at exe-
cution runtime, if two task jobs have been allocated two adjacent priority
levels according to their absolute deadlines, then it is not easy to allocate
a priority to the third task job that has an intermediate absolute deadline.
The only solution suggested in [176] is to remap the two existing jobs to new nonadjacent priority levels. In the worst case, all jobs in the ready queue may need priority remapping, and the incurred overhead could be excessive.
2) Implementing an EDF scheduler on top of a deadline-based RTOS kernel:
The ready queue of the RTOS kernel orders tasks according to increasing
absolute deadlines. This method is believed to be a “better alternative”
[26] because it needs a relatively small modification of the kernel struc-
ture and its services. Basic queue operations such as insertion, deletion, and returning the queue head all behave similarly to their priority-based counterparts. The absolute deadline of a task actually serves as the “priority” of a task in this model. This implementation method requires
that the absolute deadline of a task is calculated at each release time and
recorded in its TCB.
It is noticed that the EDF scheduler in [72] is implemented with the first (priority reassignment) method. However, in this thesis, the EDF model follows the second
method. In fact, various implementation elements of this model have already been
referred to in the above paragraphs of this section.
In the TCB definition in Table 4-8, the task relative deadline should be speci-
fied by the user and stored in the rtos_tcb_relative_deadline field, and
the task absolute deadline is stored in the thread_abs_dln sub-field. In
Section 4.5.4.2, the priority-based task queue class is introduced and it has the
possibility of becoming an absolute deadline-based queue.
It is well known that: task absolute deadline (d) = task release time (a) + task
relative deadline (D). The task relative deadlines are defined by users, i.e., they
are known. The difficulty of modelling an EDF scheduler in the RTOS model is
mainly dependent on how to determine task release times, by which task absolute
deadlines can be calculated. Referring to Figure 4-17, the proposed implementa-
tion method is described as follows:
For periodic tasks, it is required that each of them should enter the WAIT-
ING_CYC state to wait for next activation when it finishes its current execution
cycle. The method is carried out by two RTOS services:
The task creation service (in Section 4.5.3.1) uses the task creation time (or that time plus an offset) as a of the first job of a task. Hence, d of the first job is obtained.
The RTOS time tick service (in Section 4.5.5.3) takes charge of calculating
d for subsequent jobs of a task. If the tick service moves a task from the
WAITING_CYC state to the READY state, then it means that a task has en-
tered its new cycle. This time point is deemed as the approximate a with a
possible but acceptable sleeping time jitter.
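The d = a + D bookkeeping above can be sketched as follows (the field names are illustrative stand-ins for rtos_tcb_relative_deadline and thread_abs_dln, assuming integer nanosecond timestamps):

```cpp
#include <cstdint>
#include <cassert>

// d = a + D: the absolute deadline is the release time plus the
// user-defined relative deadline.
struct EdfTask {
    uint64_t relative_deadline;  // D, specified by the user (ns)
    uint64_t abs_deadline = 0;   // d, recomputed at each release
};

// Called when a task is released: at creation time for the first job, or by
// the tick service when a task moves from WAITING_CYC to READY.
void on_release(EdfTask &t, uint64_t release_time) {
    t.abs_deadline = release_time + t.relative_deadline;
}

// Comparison used by a deadline-ascending ready queue: the task with the
// earlier absolute deadline effectively has the higher "priority".
bool earlier_deadline(const EdfTask &a, const EdfTask &b) {
    return a.abs_deadline < b.abs_deadline;
}
```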
There are two kinds of aperiodic tasks in the model, namely ISRs and normal
aperiodic tasks. ISRs provide first-level (i.e., early) simple software interrupt services, while normal aperiodic tasks provide second-level (i.e., later) software interrupt services.

[Figure 4-17. Deadline calculation for aperiodic tasks: an ISR uses its calling time as a, with D predefined by users and d calculated by the RTOS kernel interrupt handler or by the ISR itself; a normal aperiodic task also uses the ISR calling time as a, with d calculated by its associated ISR.]

The calculation of d is twofold:
1) ISRs are either directly invoked by the hardware interrupt controller or by a
RTOS kernel interrupt handler. In the case of the former mode, an ISR uses
its beginning time as a in order to calculate its d; in the case of the latter
mode, a RTOS kernel interrupt handler will use its calling time as a of the
ISR and calculate its d.
2) Normal aperiodic tasks are initiated by an associated ISR through synchro-
nisation methods. Hence, a precedent ISR can use this calling time as the a
of a subsequent normal aperiodic task, with d calculated from it.
Apart from the above points, other implementation details of the EDF scheduler are similar to those of the FPS scheduler. Bear in mind that the EDF scheduler is implemented in the RTOS kernel under some restrictive conditions on both application models and the RTOS model; consequently, it does not correspond to a practical RTOS product. The research in this thesis does not aim to implement many RTOS functions in this EDF model.
for mutexes in order to avoid the priority inversion problem. Their usage, for in-
stance whether a specific synchronisation or communication function is allowed
to be used in ISRs, is consistent with referenced RTOSs [149] [152].
Referring to Table 4-12, the RTOS model uses a universal Event Control Block
(ECB), in common with μC/OS-II RTOS [149], to control different synchronisa-
tion and communication entities (referred to as event objects hereafter) at the ker-
nel level. These different types of event objects share some fields and primitives
of the ECB method. This implementation technique brings reusability to RTOS
modelling. In addition, each event object type has its own respective fields and application interface functions.
An ECB represents the various characteristics of an event object. As shown in
Table 4-12, all types of event objects own an ECB ID, an ECB type property, a
pointer to its respective resource, and a suspension task list. Besides this, a mutex
or a semaphore event object also needs a counter field. More particularly, a mutex
ECB records the original priority of its owning task for the PIP protocol, and a
ceiling priority field is reserved for the PCP protocol. The suspension task list is
based on the STL list template class [139]. An element (i.e., struct
tid_priority_block) of the list includes two essential properties of a task,
i.e., a task ID and the current priority. The task suspension list can be ordered by
either FIFO or priority, which is able to model optional features provided by some
RTOSs, e.g., ThreadX. By default, the highest-priority task is placed at the head
of the suspension list in the RTOS model.
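The PIP behaviour noted above (recording the owning task’s original priority in the mutex ECB) can be sketched as follows; the types and function names are hypothetical simplifications, not the model’s actual API:

```cpp
#include <cassert>
#include <algorithm>

// Sketch of priority inheritance: when a higher-priority task blocks on a
// mutex, the owner temporarily inherits that priority; unlock restores the
// original priority recorded in the ECB.
struct PipMutex {
    int owner = -1;               // owning task ID, -1 if free
    int owner_original_prio = 0;  // saved at lock, restored at unlock
};

struct Task { int tid; int cur_prio; };

void lock(PipMutex &m, Task &t) {
    m.owner = t.tid;
    m.owner_original_prio = t.cur_prio;  // record for the PIP protocol
}

// Called when `blocked` fails to acquire the mutex held by `owner`.
void inherit(Task &owner, const Task &blocked) {
    owner.cur_prio = std::max(owner.cur_prio, blocked.cur_prio);
}

void unlock(PipMutex &m, Task &t) {
    t.cur_prio = m.owner_original_prio;  // drop any inherited priority
    m.owner = -1;
}
```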
In Table 4-12, five basic primitive functions are implemented to manage an
ECB, i.e., creating an ECB, deleting an ECB, waiting for an event object (namely
a P operation), waiting for an event object with a timeout, and signalling an event
object (namely a V operation). These kernel functions are called by different syn-
chronisation and communication application functions accordingly.
In order to explain these primitives, Table 4-13 shows an example code of the
“waiting for an event object” function (sync_wait()) and the “signalling an
event object” function (sync_signal()). The processing sequences of the two
functions are similar in terms of including three sequential steps, i.e., operating
the ECB task suspension list, operating the task’s TCB, and operating the RTOS
task queues. However, their exact functions differ. The sync_wait() function
firstly inserts a blocked task in the ECB task suspension list (lines 5, 6), then re-
cords the ECB in this task’s TCB (line 7), and finally puts this task in the RTOS
waiting queue (line 8). In contrast, the sync_signal() function firstly re-
moves the unblocked task from the ECB task suspension list (line 16), then clears
blocking information from this task’s TCB (lines 18, 19), and finally moves the
task from the RTOS waiting queue to the ready queue (lines 23, 24).
In the RTOS model, a counting semaphore includes a 32-bit counter (i.e., the
rtos_ecb_counter field in an ECB). Its value represents how many tasks are
allowed to access the protected resource. Its usage complies with normal situa-
tions in RTOSs:
A semaphore does not have a notion of ownership, and any tasks can wait
(i.e., P) or post (i.e., V) a semaphore.
A positive counter value means resources are available, while a zero value
means the resource is unavailable.
The wait operation decrements the counter value by one. If the counter value is already zero, then a wait operation blocks the calling task (i.e., at the WAITING_SEM state) and puts it into the suspension task list.
A post operation will increment the counter by one or unblock the highest-
priority task in the suspension task list.
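The semaphore rules above can be sketched as follows (an illustration only: the model’s suspension list is priority-ordered by default, whereas this sketch uses a plain FIFO of task IDs):

```cpp
#include <deque>
#include <cassert>

// Sketch of the counting-semaphore rules: a positive counter means the
// resource is available; a wait on a zero counter blocks the caller.
struct Semaphore {
    int counter;                // analogue of the rtos_ecb_counter field
    std::deque<int> suspended;  // blocked task IDs (FIFO here)

    // P: returns true if the caller proceeds, false if it must block.
    bool wait(int tid) {
        if (counter > 0) { --counter; return true; }
        suspended.push_back(tid);  // caller enters WAITING_SEM
        return false;
    }

    // V: unblocks one suspended task, or banks the count.
    // Returns the unblocked task ID, or -1 if none was waiting.
    int post() {
        if (!suspended.empty()) {
            int tid = suspended.front();
            suspended.pop_front();
            return tid;
        }
        ++counter;
        return -1;
    }
};
```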
Table 4-14 enumerates seven semaphore services supported in the RTOS
model and corresponding services provided by three referenced RTOS products,
which shows that the proposed RTOS model has a good coverage of typical semaphore services.
Table 4-14. Semaphore services in the RTOS model and some RTOSs
#001 SC_MODULE(RTOS)
#002 {
#003 ... ...
#004 int sem_init(rtos_ecb *psem, int pshared, int c_value);
#005 int sem_destroy(rtos_ecb *psem);
#006 int sem_wait(rtos_ecb *psem);
#007 int sem_timedwait(rtos_ecb *psem, unsigned __int64 nanoseconds);
#008 int sem_post(rtos_ecb *psem);
#009 int sem_trywait(rtos_ecb *psem);
#010 int sem_getvalue(rtos_ecb *psem, int *value);
#011 ... ...
#012 };
tion. Note that only part of the original code is displayed in the figure due to a
page limit. The sem_wait() function is used inside a task body function when
the task wants to acquire a semaphore count. In case of a positive semaphore
value, the semaphore counter is simply decreased by one (line 6); in case of a zero
semaphore value, the aforementioned sync_wait() primitive is called (line
14) to block the calling task. Afterwards, the scheduler() function (intro-
duced in Section 4.5.5.2) is invoked to make a rescheduling decision (line 15). On
line 17, because the scheduler() function should have already selected
(namely dispatched) a new task as the next-to-run task, the Live CPU Model is
thus triggered by notifying a sc_event in order to execute the new task. Then,
on line 19, the calling task is blocked by a wait-for-event statement. This sc_event will be notified at a future time point when the task is unblocked.
Mutex services in the RTOS model | ThreadX | μc/OS-II | RTX
Initialise a mutex | tx_mutex_create | OSMutexCreate | os_mut_init
Destroy a mutex | tx_mutex_delete | OSMutexDel | -
Lock (P) a mutex | tx_mutex_get | OSMutexPend | os_mut_wait
Lock (P) a mutex with a timeout | tx_mutex_get | OSMutexPend | os_mut_wait
Lock a mutex without blocking | - | OSMutexAccept | -
Unlock (V) a mutex | tx_mutex_put | OSMutexPost | os_mut_release
- | tx_mutex_info_get | OSMutexQuery | -
Table 4-17. Mutex services in the RTOS model and some RTOSs
#001 SC_MODULE(RTOS)
#002 {
#003 ... ...
#004 int pthread_mutex_init(rtos_ecb *pmutex, int *attr);
#005 int pthread_mutex_destroy(rtos_ecb *pmutex);
#006 int pthread_mutex_lock(rtos_ecb *pmutex);
#007 int pthread_mutex_timedlock(rtos_ecb *p, unsigned __int64 timeout);
#008 int pthread_mutex_unlock(rtos_ecb* pmutex);
#009 int pthread_mutex_trylock(rtos_ecb *pmutex);
#010 ... ...
#011 };
Table 4-17 enumerates six mutex services supported in the RTOS model,
which also have an approximate equivalence to corresponding services provided
by the three referenced RTOS products. Table 4-18 shows default mutex inter-
faces implemented in the proposed RTOS, which partially refer to the RT-POSIX
standard. The modelling method of a specific mutex service is similar to the semaphore modelling technique in the last section and hence is not repeated here.
The send operation inserts a message pointer into the message queue. If the
queue is full, the calling task will be blocked (i.e., at the WAITING_QUE
state) and put into the ECB suspension task list.
The receive operation retrieves and removes a message pointer from the
message queue. If the message queue is empty, the calling task will be
blocked and put into the ECB suspension task list.
The unblocking conditions of send and receive operations are similar to the previously mentioned semaphore and mutex behaviours and hence are omitted here.
In implementation, a message queue needs a special second-level control block
(i.e., rtos_mqcb) in addition to its ECB. Its structure partially refers to μC/OS-
II RTOS [149]. As shown in Figure 4-18, a message queue control block stores
various control information regarding a message queue and is involved in send
and receive operations. The read and write pointers move in the same direction
from the start address to the end address of the pointer array, i.e., messages are
First-In-First-Out.
[Figure 4-18. Message queue control block: send() writes and receive() reads message pointers from a pointer array, with both pointers moving in the same direction.]
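The FIFO pointer movement can be sketched as a ring buffer of message pointers (the names here loosely mirror the rtos_mqcb fields but are illustrative, not the model’s actual structure):

```cpp
#include <cassert>
#include <cstddef>

// Sketch of a message queue control block: a fixed pointer array with read
// and write indices advancing in the same direction (FIFO), wrapping from
// the end address back to the start address.
struct MsgQueue {
    static constexpr std::size_t SIZE = 4;
    void *slots[SIZE] = {};
    std::size_t read = 0, write = 0, count = 0;

    bool send(void *msg) {        // fails when full -> caller would block
        if (count == SIZE) return false;
        slots[write] = msg;
        write = (write + 1) % SIZE;
        ++count;
        return true;
    }

    bool receive(void **msg) {    // fails when empty -> caller would block
        if (count == 0) return false;
        *msg = slots[read];
        read = (read + 1) % SIZE;
        --count;
        return true;
    }
};
```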
Message queue services in the RTOS model | ThreadX | μc/OS-II | RTX
Initialise a message queue | tx_queue_create | OSQCreate | os_mbx_init
Destroy a message queue | tx_queue_delete | OSQDel | -
Receive a message | tx_queue_receive | OSQPend | os_mbx_wait
Receive a msg. with a timeout | tx_queue_receive | OSQPend | os_mbx_wait
Send a message | tx_queue_send | OSQPost | os/isr_mbx_send
Send a message with a timeout | tx_queue_send | OSQPost | os/isr_mbx_send
- | tx_queue_flush | OSQFlush | -
- | tx_queue_front_post | OSQPostFront | -
- | tx_queue_info_get | OSQQuery | os/isr_mbx_check
- | - | OSQAccept | isr_mbx_receive
Table 4-19. Message queue services in the RTOS model and some RTOSs
#001 SC_MODULE(RTOS)
#002 {
#003 ... ...
#004 int mq_open(void **start, int size,
#005 rtos_ecb *pecb, rtos_mqcb *pmqcb);
#006 int mq_close(rtos_ecb *pecb);
#007 int mq_receive(rtos_ecb *pecb, void* msg_ptr,
#008 MQ_SIZE_T msg_len, unsigned int* msg_prio);
#009 int mq_timedreceive(rtos_ecb *pecb, void *msg_ptr,
#010 unsigned __int64 nanoseconds);
#011 int mq_send(rtos_ecb *pecb, void *msg_ptr);
#012 int mq_timedsend(rtos_ecb *pecb, void *msg_ptr, MQ_SIZE_T msg_len,
#013 unsigned int msg_prio, unsigned __int64 nanoseconds);
#014 ... ...
#015 };
Table 4-19 lists message queue services in the RTOS model and referenced
RTOSs. Currently, six basic functions have been included in the model and other
additional RTOS-specific functions can be implemented in future work. In Table
4-20, RT-POSIX-like interfaces are utilised again as the wrapper of message
queue functions in the RTOS model. Note that a standard RT-POSIX message has a priority property, whereas the proposed RTOS model does not support this feature. Hence, the priority argument in these APIs is currently meaningless.
schemes including the simple non-nested method; complex grouped and priori-
tised methods; and the vector interrupt controller method [67]. Based on this sur-
vey, three notable common characteristics are extracted and should be considered
in modelling:
1) Nested: A non-nested scheme handles individual interrupts sequentially.
When an interrupt is being serviced, other interrupts are disabled; hence in-
terrupt latency is substantially high. In contrast, a nested scheme allows the
handling of another interrupt during the current interrupt handler. In a sim-
ple nested scheme, interrupts may not be prioritised, which means the new-
est interrupt can block an existing one.
2) Prioritised: Interrupts are assigned priorities that indicate their stringency.
A higher-priority interrupt is serviced in precedence to a lower-priority in-
terrupt, which also means a lower-priority interrupt is ignored if it happens
during a higher-priority interrupt handling process. Depending on specific
implementation, either a hardware interrupt controller or a low-level soft-
ware handler (i.e., in RTOS or drivers) can achieve interrupt prioritisation.
3) Vectored: In a non-vectored interrupt handling scheme, the entry point of
all software ISRs remains the same, i.e., either a RTOS kernel interrupt
handling function or a similar low-level software handler, which takes
charge of determining which ISR should serve the raised IRQ and then loading
the ISR into the program counter of the CPU for execution. In a vector-
based scheme, the hardware vector interrupt controller has an array (i.e., a
vector) of ISR addresses. Hence, a specific software ISR can be invoked by
the hardware directly, which means a smaller interrupt latency. These two
schemes are illustrated in Figure 4-5 and referred to as the RTOS-assisted
scheme and the vector-based scheme, respectively.
In order to model typical interrupt schemes in this research, the RTOS model
provides a modular interrupt handling model. It splits interrupt handling functions
in the software PE model into several cooperative HW and SW components.
Through configuration of these components, the interrupt handling model can
flexibly support the above-mentioned nesting, prioritisation, non-vectored, and vectored features.
In the software PE model, one hardware component is related to interrupt han-
dling, i.e., the Interrupt Controller Model in the Live CPU Model (see Section
3.3.3). It is the lowest-level component in the interrupt handling stack and con-
nected to various hardware sources by IRQ lines. Its function and structure have already been introduced in detail, so they are not repeated here. Just remember that an essential function of the Interrupt Controller Model is to invoke upper-level software interrupt handlers.
Regarding the software parts of the interrupt handling stack, it contains the following components:
RTOS kernel-level interrupt handler functions (i.e., inter-
rupt_handler_enter() and interrupt_handler_exit()):
Depending on a specific interrupt scheme, they are invoked by either the
hardware Interrupt Controller Model or user-level ISRs; their functions may vary, but in general they can prioritise and mask interrupts and call ISRs if necessary.
User-defined ISRs (also known as immediate interrupt services [26]): they
are attached to corresponding interrupts and programmed by users to pro-
vide simple and non-blocking functions (e.g., post a semaphore) in order to
serve an IRQ promptly. They are assigned higher priorities than are normal
user tasks, as shown in Figure 4-14, among which the tick timer ISR
tick_isr (introduced in Section 4.5.5.3) has the highest priority in the
default setting. Note that a lower-priority ISR can be pre-empted by a
higher-priority ISR. Depending on the configuration of the interrupt model,
a user-defined ISR can be invoked by the RTOS kernel interrupt handler in-
directly or the hardware Interrupt Controller Model directly.
User-level aperiodic tasks (also known as scheduled interrupt services
[26]): they are normal real-time tasks in the RTOS model. Because user-
defined ISRs are typically too simple to include all necessary interrupt han-
dling functions, subsequent aperiodic tasks are always necessary so as to
complete interrupt handling [26]. Their priorities reside in the range of normal real-time tasks and are usually set to be higher than those of periodic tasks. These aperiodic tasks stay in the WAITING state in normal times and are
triggered by synchronisation functions that are operated by user-defined
ISRs.
The RTOS model can currently support two typical interrupt handling schemes
as shown in Figure 4-5, i.e., the RTOS-assisted (non-vectored) scheme and the
vector-based scheme. In either scheme, nested, prioritised, and maskable handling functions can all be supported.
Figure 4-19 depicts the process of the RTOS-assisted (non-vectored) interrupt
handling scheme. In this scheme, the RTOS kernel-level interrupt handler in-
terrupt_handler_enter() is the entry point for all ISRs. It is imple-
mented as a SystemC SC_THREAD, which is sensitive to a related sc_event in
the Live CPU Model. The handling process includes the following functions and
transition steps:
1) In Step 1, the Interrupt Controller Model releases the sc_event when it
finds an IRQ.
2) Upon being triggered, the RTOS interrupt entry handler firstly identifies
the external IRQ source and masks other lower-priority IRQs (i.e., ignores
their occurrence during this handling process) by setting interrupt-related
virtual registers in the Live CPU Model.

[Figure 4-19. The RTOS-assisted (non-vectored) interrupt handling scheme: IRQ lines feed the Interrupt Controller in the Live CPU Model, which notifies the SC_THREAD interrupt_handler_enter in the RTOS kernel (Step 1); the entry handler dispatches a user-defined ISR (Step 2); the finished ISR triggers the SC_THREAD interrupt_handler_exit (Step 3), which dispatches an aperiodic task or resumes the pre-empted task (Step 4).]

The entry handler then pre-empts
the RUNNING task and inserts it into the RTOS ready queue. Possibly, the
pre-empted “task” may be another lower-priority ISR, and thus the entry
handler will operate an IRQ_NEST_LIST and a
RTOS_IRQ_NEST_COUNT counter in order to record this nested situation
for later recovery. The entry handler also notifies the Live CPU Simulation
Engine in order to stop a time advance of the pre-empted task (details are
introduced in Section 3.3.4.2). Finally, in Step 2, the entry handler sets the
corresponding ISR as the next-to-run task, invokes a context switch, and
triggers the Live CPU Simulation Engine to start. Note that this prioritised
and masked interrupt handling process guarantees that the priority of the new ISR is higher than that of both the pre-empted task and all other READY tasks in the system; consequently, it is not necessary to invoke the RTOS scheduler() function here.
3) Then, an ISR is driven by the Live CPU Simulation Engine to execute its
function. This may unblock a WAITING aperiodic task and make it
READY. When the ISR finishes, it triggers another kernel handler in-
terrupt_handler_exit() in Step 3.
4) This exit handler checks whether there are any nested IRQs. If there are,
their execution will be resumed sequentially according to their priorities.
5) Finally, in the last Step 4, the exit handler sets the highest-priority READY
task to run next, calls a context switch, and activates the Live CPU Simula-
tion Engine.
Figure 4-20 illustrates the vector-based interrupt handling model. Regarding
the hardware part of this model, the Interrupt Controller Model is able both to identify the IRQ source and to determine the relevant ISR from a vector table. The vec-
tor table is defined as constants in the model, indicating the mapping between IRQ
numbers and ISR’s task IDs. The Interrupt Controller Model also takes charge of
masking lower-priority IRQs when it identifies an IRQ.
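The vector table lookup can be sketched as follows (the mapping values from IRQ numbers to ISR task IDs are purely illustrative):

```cpp
#include <cassert>

// Sketch of the vector table: constants mapping IRQ line numbers to the
// task IDs of their user-defined ISRs (hypothetical values).
const int VECTOR_TABLE[] = {
    /* IRQ 0: tick timer */ 100,
    /* IRQ 1             */ 101,
    /* IRQ 2             */ 102,
};

// The Interrupt Controller Model resolves the ISR directly from the table,
// so the software entry handler is bypassed and interrupt latency shrinks.
int isr_for_irq(int irq) {
    const int n = sizeof(VECTOR_TABLE) / sizeof(VECTOR_TABLE[0]);
    if (irq < 0 || irq >= n) return -1;  // unmapped IRQ
    return VECTOR_TABLE[irq];
}
```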
In the software part, the interrupt_handler_enter() function is im-
plemented as a normal RTOS function rather than a SystemC process, because it
is no longer the interrupt service entry point and so does not need to be triggered
[Figure 4-20. The vector-based interrupt handling scheme (Steps 1-4): IRQ lines feed the Interrupt Controller in the Live CPU Model, which invokes user-defined ISRs directly; the RTOS kernel functions interrupt_handler_enter() and interrupt_handler_exit() are called as normal functions rather than SystemC processes.]
4.5.8 HAL Modelling
In Section 4.3, the concepts and functions of the HAL were briefly outlined. As an example, researchers from the TIMA laboratory have presented work on HAL modelling for native-code software simulation in SoC and MPSoC designs [153] [186] [154]. Referring to Figure 4-21, their research includes low-level im-
plementation details of both software subsystems (e.g., assembly HAL code) and
hardware subsystems (bus functional and RTL hardware models). Their simula-
tion models apply to the later implementation phases, where HAL API functions
need to be implemented for specific processors.
Compared to this detailed HAL modelling method, most conventional abstract RTOS modelling work targets early system exploration phases and includes neither hardware models nor a HAL model, i.e., it relies on implicit software PE modelling and offers inadequate interrupt handling.
Differing from the implementation-oriented HAL model and the abstract
RTOS model, this thesis proposes a lightweight conceptual HAL model inside the
RTOS module, which supplies some essential hardware-related functions and data
structures for upper-level RTOS and application task models. These functions in-
clude both proprietary services supporting the proposed software PE simulation
model and conventional low-level system software primitives. This section introduces three of them, i.e., the delay information injecting service, the context switch service, and the interrupt-related service. Note that transaction-based I/O communication services will be addressed in Chapter 5.
[Figure 4-21. (A) Timed software models and the HAL model: application SW, OS, and the HAL simulation model, each with delays, accessing memory via a BFM to RTL hardware; (B) HAL implementation of the context switch on the ARM7 processor:
    __ctx_switch
        STMIA r0!, {r0-r14}  ; save current task
        LDMIA r1!, {r0-r14}  ; restore new task
        SUB   pc, lr, #0     ; return
        END]
[Figure 4-22. The context switch service: ctx_save() and ctx_load() in the RTOS module move the task timing context (block_exec_time, thread_exec_time, thread_abs_dln, thread_used_time, thread_cur_sta_time, thread_sleep_length) between the TCB and the virtual registers of the Live CPU Model.]
CPU Model (in Section 3.3.2). As shown in Figure 4-22, the service is imple-
mented as two functions, i.e., ctx_save() and ctx_load():
Upon being called, ctx_save() uses the values of the virtual registers to calculate how much time has elapsed since the saved task began its time advance, and records the current time stamp. The updated results are used in a later execution of the Live CPU Simulation Engine, which was introduced in Section 3.3.4.2. Afterwards, ctx_save() saves the updated software timing context to the task's TCB.
The ctx_load() function loads a task’s timing context from its TCB
into the virtual registers of the Live CPU Model.
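A plain C++ sketch of the two functions follows, under the assumption that the virtual registers hold a remaining delay and a start stamp; field names beyond those listed in Figure 4-22 are our own simplification.

```cpp
#include <cstdint>

// Illustrative sketch of the context switch service (not the thesis code).
struct TaskTimingContext {            // subset of the timing context in the TCB
    uint64_t block_exec_time = 0;     // remaining delay of the current block
    uint64_t thread_cur_sta_time = 0; // time stamp recorded at the last save
};

struct VirtualRegisters {             // inside the Live CPU Model
    uint64_t remaining_delay = 0;     // delay still to be consumed
    uint64_t start_stamp = 0;         // when the current time advance began
};

// ctx_save(): compute how much time has elapsed since the task began its
// time advance, update the remaining delay, and store it in the TCB.
inline void ctx_save(TaskTimingContext& tcb, const VirtualRegisters& vr,
                     uint64_t now) {
    uint64_t elapsed = now - vr.start_stamp;
    tcb.block_exec_time =
        (elapsed < vr.remaining_delay) ? vr.remaining_delay - elapsed : 0;
    tcb.thread_cur_sta_time = now;    // record the current time stamp
}

// ctx_load(): restore a task's timing context into the virtual registers.
inline void ctx_load(const TaskTimingContext& tcb, VirtualRegisters& vr,
                     uint64_t now) {
    vr.remaining_delay = tcb.block_exec_time;
    vr.start_stamp = now;
}
```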
To complete a context-switch process, the RTOS model needs to provide a
method to activate the Live CPU Simulation Engine in order to let it execute
software time delays. This function is implemented by releasing some appropriate
sc_events that are included in the Live CPU module and listened to by the
Live CPU Simulation Engine (in Section 3.3.4).
The two most important interrupt-related kernel functions (i.e., the entry handler and the exit handler) were described in Section 4.5.7.2. Several auxiliary functions are also provided, as detailed below:
Disabling interrupt: is an essential service in various RTOSs that protects short
critical sections in kernel functions. It is implemented as a pair of functions, i.e.,
enter_critical() and exit_critical(). After executing the former
function, all system interrupts are disabled and can be re-enabled by invoking the
latter function.
Clearing an interrupt source: the interrupt_clear() function can be
called by ISRs to clear a specific IRQ source (according to its IRQ ID number) by
resetting corresponding bits of the Interrupt Controller Raw Status and Status reg-
isters in the Live CPU Model (in Section 3.3.2).
Unmasking interrupts: during an ISR's execution, its equal- and lower-priority interrupts are automatically masked by the entry handler under the prioritised interrupt handling scheme. After the ISR finishes its function, it must call the interrupt_unmask_equal_lower_irq() function to unmask these affected interrupts.
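These helpers can be sketched as follows. The bit layout and the nesting counter are our own assumptions; the text only names the function pairs and the registers they touch.

```cpp
// Illustrative sketch of the interrupt-related helper services.
struct IrqState {
    int critical_nesting = 0;    // >0 means system interrupts disabled
    unsigned raw_status = 0;     // pending IRQ bits (Raw Status register)
    unsigned mask = 0;           // bit i set = IRQ i masked
};

// Disabling interrupts: nested pairs protect short critical sections.
inline void enter_critical(IrqState& s) { ++s.critical_nesting; }
inline void exit_critical(IrqState& s)  { if (s.critical_nesting) --s.critical_nesting; }
inline bool interrupts_enabled(const IrqState& s) { return s.critical_nesting == 0; }

// interrupt_clear(): reset the pending bit of one IRQ source.
inline void interrupt_clear(IrqState& s, int irq_id) {
    s.raw_status &= ~(1u << irq_id);
}

// interrupt_unmask_equal_lower_irq(): called by an ISR after it finishes,
// assuming bit i masks IRQ i and smaller numbers mean higher priority.
inline void interrupt_unmask_equal_lower_irq(IrqState& s, int irq_id) {
    for (int i = irq_id; i < 32; ++i) s.mask &= ~(1u << i);
}
```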
As indicated before, the presented RTOS model aims to provide services simi-
lar to those in real RTOSs, in terms of both their formation (normal C++ functions)
and usage (function calls). Most services are implemented as RTOS class member methods and are called by application task models through a pointer to their parent RTOS object. The main benefits of modelling RTOS services as normal functions are:
It is more straightforward to input arguments and return values in a normal
function, whereas a SystemC process does not easily support them.
It is similar to real-time programming conventions and interfaces, in that an RTOS service model can be adapted to a specific RTOS API by changing its input, output, and function if necessary.
A normal C++ function executes much faster than a SystemC process, be-
cause it does not incur a context switch in the SystemC simulation kernel.
In this respect, SystemC language constructs are used as plain C++ constructs in the modelling, and the RTOS model is effectively implemented in C++ in the manner of a conventional OS design. Note that normal C++ RTOS functions can execute in a SystemC simulation but cannot represent the timing overheads of the target RTOS. This problem will be addressed in Section 4.5.9.2.
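The calling convention, though not the real service set, can be sketched as plain C++ member functions invoked through a pointer to the parent RTOS object; the service names below are placeholders, not the thesis API.

```cpp
#include <vector>

// Sketch of RTOS services as class member methods; a semaphore service is
// reduced to a counter, and blocking is elided for brevity.
class RtosModel {
public:
    int sem_init(int count) {
        sems.push_back(count);
        return static_cast<int>(sems.size()) - 1;   // semaphore ID
    }
    bool sem_wait(int id) {          // non-blocking in this sketch
        if (sems[id] > 0) { --sems[id]; return true; }
        return false;                // a real model would block the task
    }
    void sem_post(int id) { ++sems[id]; }
private:
    std::vector<int> sems;
};

// An application task model holds a pointer to its parent RTOS object and
// invokes services as ordinary function calls.
struct TaskModel {
    RtosModel* rtos;
    bool try_take(int sem_id) { return rtos->sem_wait(sem_id); }
};
```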
Certainly, some services and functions in the RTOS model are also implemented as SystemC processes to take advantage of the SystemC language. The selection is based on the following considerations:
Some RTOS services only execute once in a predetermined cooperative or-
der in simulation, thus it is convenient to implement them as SystemC proc-
esses and use simple wait-for-delay statements to advance the simulated
clock. For example, the RTOS initialisation service (i.e.,
SC_THREAD(rtos_init) in model implementation) and the RTOS
multi-tasking start function (i.e., SC_THREAD(rtos_start) in model
implementation) only need to execute at RTOS startup before the beginning
of pre-emptive multi-tasking execution.
Some RTOS services are activated by other SC_MODULEs through static-sensitivity sc_events; consequently they are best implemented as SystemC processes. Examples are the RTOS kernel interrupt entry and exit handlers in Section 4.5.7.2.
To conclude, the internal communication methods in the RTOS model are con-
ventional and simple in terms of real-time software programming, i.e., by function
calls. The SystemC sc_event mechanism is mainly used for inter-module and
limited SystemC process-related notifications. The Interface Method Call approach does not appear inside the RTOS model; it is, however, used in other parts of this research: in hardware modelling (i.e., the Live CPU Model) and in inter-module communication modelling (i.e., the TLM communication model in Chapter 5).
4.5.9.2 Modelling RTOS Timing Overheads
[Figure 4-23. Timing annotation of an RTOS API service: a single delay value annotates the whole service, whereas the real service comprises internal function blocks (a, b, c) that may be interleaved with IRQ handling at different points along the timeline.]
Both RTOS services and internal functions may be fully interruptible (there is no critical section in the code), fully uninterruptible (the whole code is a critical section), or partially interruptible (with parts of the code in one or several critical sections). Although a service in the RTOS model can generate similar results to the corresponding service in a real RTOS, the simulation trace may be quite different from the real execution trace in terms of the exact function blocks included (see Figure 4-23). Hence, service-level timing annotations are deemed sufficiently accurate for modelling RTOS service timing behaviour in this thesis. In fact, unless there is a deep enough understanding of the target RTOS code and the RTOS model is thoroughly adapted for the target RTOS, there is no easy way to refine the timing accuracy of RTOS services to a finer level.
Thus, is it possible to use the same time advance method as application tasks for RTOS services? This is not straightforward, for several reasons:
1) RTOS services do not have native control blocks that can store their delay
information.
2) In this thesis, RTOS services are modelled as functions rather than inde-
pendent executable entities. They do not have separate SystemC process
wrappers to support their execution on top of the SystemC simulation ker-
nel.
3) Many RTOS services are re-entrant, for example, a wait-semaphore func-
tion may be invoked in several concurrent tasks and can be blocked in the
middle. If a single sc_event object is used in a RTOS service for the
wait-for-event time advance method, once the Live CPU Model releases
this sc_event, then multiple execution instances of a RTOS service may
be triggered at the same time. This may result in race conditions.
Within this thesis, the RTOS service time annotation and advance problems are solved by a lightweight approach, after investigating the common characteristics of RTOS service time annotations in the model. The service-level annotation assumption implies that it is difficult to implement partially interruptible time advance for a service: the RTOS model does not necessarily have target-like function blocks inside a service, nor does it support the insertion of several very accurate interruptible and un-interruptible annotations for such blocks. Thus, the RTOS service timing modelling problem is simplified, with the time advance method for RTOS services needing to cover two simulation situations:
1) Time advance in a single step, i.e., uninterruptible;
2) Time advance divided into several steps in case of interruptions, i.e., inter-
ruptible.
The approach is therefore divided into two methods, i.e., the interruptible
method and the un-interruptible method.
The interruptible RTOS time advance method means that the time advance duration of a service can be interrupted and resumed later. This requires users to annotate RTOS services when they build application task models. Rather than a RTOS service maintaining its own delay information, the delay value is annotated in the calling application task, which acts as an agent to advance the simulated clock for the invoked RTOS service. See Table 4-21 (A) for an example, in which a semaphore initialisation function executes at line 4; its interruptible timing overhead SEM_INIT_FUNC_DELAY_TIME is injected into the Live CPU Model on line 5, and a wait-for-event statement is then inserted on line 6 as normal.
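The agent pattern of Table 4-21 (A) can be sketched as follows, with the Live CPU Simulation Engine reduced to a counter. The delay value and the engine interface are assumptions; only the constant's name comes from the text.

```cpp
#include <cstdint>

// Assumed overhead of the semaphore initialisation service, in time units;
// the value 12 is illustrative, not a measured figure.
constexpr uint64_t SEM_INIT_FUNC_DELAY_TIME = 12;

// Stand-in for the Live CPU Simulation Engine: the wait-for-event advance
// may be split by IRQs in the real model, but is atomic in this sketch.
struct LiveCpuEngine {
    uint64_t pending_delay = 0;
    uint64_t simulated_clock = 0;
    void inject_delay(uint64_t d) { pending_delay += d; }
    void wait_for_event() {
        simulated_clock += pending_delay;
        pending_delay = 0;
    }
};

// The calling task annotates the service overhead itself, acting as the
// agent that advances the simulated clock for the invoked RTOS service.
inline void task_calls_sem_init(LiveCpuEngine& cpu) {
    /* sem_init(...) executes functionally here, in zero simulated time */
    cpu.inject_delay(SEM_INIT_FUNC_DELAY_TIME);  // annotate in the caller
    cpu.wait_for_event();                        // advance the simulated clock
}
```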
The un-interruptible RTOS service advance method relates to RTOS critical-
section services and functions during which system-wide interrupts are totally dis-
abled. Usually, these services are internal RTOS functions, e.g., the context
switch service and the scheduler service, neither of which is directly visible to
user task models. Hence, their annotations need to be inserted inside the RTOS
module. Since it is not necessary to worry about interruptions during the delay
duration, a simple wait-for-delay statement is used to annotate and advance the
simulated time (see Table 4-21 (B)). This method also avoids invoking the Live
CPU Simulation Engine and decreases SystemC kernel engine switches. Hence, it
can improve simulation speed.
We note that the above methods may produce mismatched time advances for critical sections inside a RTOS service. For example, referring to Figure 4-23, a real RTOS service may include critical sections that differ from those in the RTOS service model, and these critical sections may execute at different times along the timeline. In a real execution, an IRQ that arrives during a critical section may be ignored or delayed because system interrupts are temporarily disabled; in simulation, an IRQ arriving at the same absolute time point may be processed immediately because no critical section is active at that moment. Given the previously stated assumption on RTOS timing overhead modelling, this limitation is considered acceptable.
4.6.2 Simulation Accuracy Metrics
Since the proposed RTOS model provides a set of practical functions to sup-
port native-code real-time tasks, functional accuracy of some typical RTOS ser-
vices can be represented in simulation. Both simulation traces and results can be
compared to the ISS counterpart.
ISS simulator. These observation points are chosen as important state transition
points in concurrent multi-tasking execution, e.g., task switching points, RTOS
service invoking points, task completion points, etc.
Figure 4-26. Simulation speed comparison

Target simulated time (ms)          500     1000    2000    5000     10000
ISS simulation time (ms)            11500   22960   47130   116130   225020
RTOS-centric simulation time (ms)   24      47      94      233      458
and segment level respectively. All tests are executed on an x86 PC at 1.86GHz.
In order to compare the speed of RTOS-centric simulator with the standard ISS
simulation, we let each simulator simulate for 500 ms, 1000 ms, 2000 ms, 5000
ms and 10000 ms target time. During the longest 10000 ms simulation, the three
tasks can repeat about 110, 100, and 19 iterations respectively.
Not surprisingly, as a behavioural software simulator, the RTOS-centric simulator is much faster than the ISS simulation. Figure 4-26 reveals the simulation performance of the RTOS-centric simulation: it is nearly 500 times faster than the ISS simulator.
Regarding functional accuracy, the RTOS-centric simulator generates simulation sequences and results at the right times compared with real execution. In the experiment, we input the same stimuli, i.e., keyboard signals and voltages (dummy values), into both the μVision ARM ISS and the RTOS-centric simulator. We observe
A/D converting results, which are generated after various multi-tasking interac-
tions between application tasks and the RTOS. Figure 4-27 (A) shows the func-
tional results generated at two time points in the ISS simulator. Note that the time
is displayed with the unit of second. Figure 4-27 (B) shows part of the trace file of
the RTOS-centric simulator. It can be observed that the RTOS-centric simulator
produces similar functional results at very close time points to the ISS simulator,
which demonstrates its functional correctness.
AT 501934350 ns:
In 500ms, samples: 5
AT 503937510 ns:
|Sample NO.1: 2200mv
…… ……
AT 511950150 ns:
|Sample NO.5: 2200mv
…… ……
AT 1016351670 ns:
|In 500ms, samples:5
…… ……
AT 1026367470 ns:
|Sample NO.5: 2200mv
According to the method introduced in Section 4.6.2.2, Figure 4-28 shows tim-
ing accuracy comparison between the ISS simulator and the RTOS-centric simula-
tor. The X-axis is 22 observation points (e.g., task switching points or RTOS ser-
vice entry points) in simulation flows and the Y-axis is the simulated target time
of each observation point, which ranges from 0 to 600 ms, i.e., covering a full operation cycle of the system. In the figure, the two simulators' curves are in close accordance, which intuitively shows good accuracy.
Table 4-22 shows the timing accuracy losses of the RTOS-centric simulation
compared with the ISS simulation at these 22 comparison points. Results in the
table show that accuracy losses of the RTOS-centric simulator are marginal in this
experiment, i.e., 14 out of 22 points are less than 0.7% and all are less than 4.5%.
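The excerpt does not state the loss formula explicitly; assuming the natural relative-error definition at each observation point, the metric can be computed as:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed accuracy-loss metric for Section 4.6.2.2's observation points:
// loss_i = |t_rtos_i - t_iss_i| / t_iss_i, where t_iss and t_rtos are the
// simulated target times recorded at each point by the two simulators.
inline std::vector<double> accuracy_loss(const std::vector<double>& t_iss,
                                         const std::vector<double>& t_rtos) {
    std::vector<double> loss;
    for (std::size_t i = 0; i < t_iss.size(); ++i)
        loss.push_back(std::fabs(t_rtos[i] - t_iss[i]) / t_iss[i]);
    return loss;
}
```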
Referring to Table 4-22, note that there are some sudden changes in the timing accuracy along the RTOS-centric simulation timeline, where the accuracy loss abruptly spikes or dips within a certain range.
[Figure 4-28. Timing accuracy comparison between the ISS simulation and the RTOS-centric simulation at the 22 observation points.]
[Table 4-22. Accuracy loss of the RTOS-centric simulation compared with ISS at comparison points #1-#22.]
This phenomenon can be discussed from two angles. Firstly, the generic RTOS model is not implemented identically to the real μC/OS-II RTOS in terms of its internal functions and associated timing
overheads. Hence, the timing behaviours of various RTOS services differ between our simulation and the ISS simulation. This inevitable inaccuracy can both contribute to the timing accuracy loss and unintentionally remedy the accumulated loss. Secondly, application tasks are annotated with segment-level timing costs, which also have inherent inaccuracy compared to ISS simulation; the consequence is similar to the RTOS aspect.
IRQ can be handled immediately once it occurs; secondly, the ISR and tasks involved can coordinate correctly in terms of both functionality and timing.
The KEIL ARM ISS simulates a 60MHz LPC2129 processor with the vectored
interrupt controller. The RTOS-centric simulator is also configured with this vec-
tor-based interrupt handling mode. The task models and RTOS model are anno-
tated with timing costs that are measured from the ISS simulator at the segment
level and function level respectively.
Firstly, we run ISS and RTOS-centric simulators for 100 ms target time and re-
peat 10 times in order to compare their simulation performance. The results are
shown in Table 4-23. Not surprisingly, the RTOS-centric simulator achieves a
considerable speedup compared to the ISS simulator.
Secondly, we compare interrupt handling processes in the ISS simulator and
the RTOS-centric simulator. We raise the ARM external interrupt at (almost) the same target time points in both simulators (i.e., at 0.01003332 s in ISS and 0.10033290 s in the RTOS-centric simulator), while the task counter_task() is
currently executing. ISS and RTOS-centric simulation outputs are shown in Figure 4-30 and Table 4-24, respectively.

Table 4-23. Simulation performance comparison (100 ms target time, averaged over 10 runs)
                          Average simulation time (μs)    Speedup
ISS                       14174000
RTOS-centric simulator    16425.88                        862.9066

[Figure 4-30 (A). Before the IRQ event, counter_task() is executing.]

The two outputs show that a series of events, i.e., interrupt raising, CPU catching, task pre-emption, and ISR entry, is executed and simulated in the same order in both simulators. This means that our RTOS-centric simulator can model the realistic interrupt handling method of the RTX RTOS.
Thirdly, in order to evaluate the timing accuracy of our RTOS-centric simula-
tion in this experiment, we still use the “observation points” method introduced in
Section 4.6.2.2. The result is shown in Figure 4-31. The X-axis is 18 observation
points in simulation flows, which represent entries of RTOS services and task job
completions. The Y-axis is the simulated target clock time of each observation
point, ranging from 0 to 14 ms, which covers a full interrupt cycle of the system.
In the figure, the two simulation curves coincide, showing the good timing accuracy of our simulator. As shown in Table 4-25, the calculated timing accuracy losses are marginal at these 18 observation points in this experiment. This is mainly due to careful fine-tuning of the RTOS simulator and the relatively simple functions of the test program.
[Table 4-25. Timing accuracy losses at the 18 comparison points.]
4.8 Summary
This chapter has presented a generic RTOS-centric real-time embedded soft-
ware simulation model. It allows modelling and simulating application tasks, the
RTOS, and the CPU processing element in a unified SystemC-based framework.
It can help designers evaluate both the functional and timing effects of a projected real-time embedded software design quickly and early.
It can flexibly model application tasks by supporting hybrid abstract software
models and delay-annotated native-code application task models. It improves the
functionality of the RTOS model by providing various generic and practical ser-
vices selected from common RTOS standards and products. It achieves reasonably accurate simulation, in terms of both functional and timing accuracy, by modelling RTOS services with their normal structures and formations and by accounting for the timing overheads of various RTOS services. The underlying Live CPU Model also provides the RTOS-centric software models with interruptible time advance.
Experiments show the fast performance, sufficient functionality, and marginal timing accuracy loss of the RTOS-centric simulation approach compared to cycle-accurate ISS simulation of two real RTOS products. The reasons are mainly threefold: firstly, as introduced in this chapter, the RTOS simulation model's structure
is elaborate and its functional and timing behaviours are carefully modelled; sec-
ondly, the RTOS simulation model is adapted to model the two RTOSs; thirdly,
delay information of both applications and RTOS services is measured on the ISS
before being used in RTOS-centric simulation.
Chapter 5
Communication Interfaces
In a real embedded system, the software subsystem runs on top of a CPU sub-
system. These software and hardware subsystems collectively constitute a soft-
ware PE model. Previous chapters have investigated behavioural modelling and
simulating real-time software and RTOS in the context of a software PE model, as
shown in Figure 5-1.
In the SystemC-based high-level software modelling and simulation approach,
the hardware aspect of the software PE model is abstracted and encapsulated into
a Live CPU Model (in Section 3.3). It provides abstract yet essential hardware
control functions (e.g., interrupt controller, virtual register, and real-time clock,
[Figure 5-1. The software Processing Element (CPU) model: the RTOS-centric software simulation model constitutes the software aspect, while TLM interfaces provide the inter-module TLM communication aspect.]
etc.) to upper-level software. In particular, it supports interruptible SystemC-based
software timed simulation through the Live CPU Simulation Engine. In Chapter 4,
the RTOS-centric real-time software simulation model is described as the soft-
ware aspect of the software PE model. It can supply various practical and flexible
RTOS services in order to support abstract and native-code real-time application
task models.
Within software modelling and simulation research, transaction-level model-
ling has frequently been considered. TLM is a promising system-level modelling
paradigm to improve productivity in the design of integrated embedded systems,
e.g., SoC. TLM models are expected to serve as interoperable references across different design teams with different aims, such as fast embedded system architecture exploration, functional verification, and, of interest to this thesis, early embedded software modelling and simulation. SystemC is the research tool of this thesis and also the most popular SLDL in the TLM design area today [3].
Based on the essential TLM principle “separating computation from communica-
tion”, TLM research can be divided into two aspects: the computation aspect and
the communication aspect [7]. In this thesis, the proposed software models reside in the domain of the TLM software computation aspect. In Section 3.2.2, some software TLM computation models were defined with inspiration from the OSCI TLM-2.0 standard [88], which is the official SystemC TLM communication modelling standard.
With the aim of extending the software simulation models to the wider TLM communication modelling domain, this chapter considers the integration of existing TLM communication interfaces in the software PE model. These added interfaces and structures support SW-to-HW and HW-to-HW inter-module
communication modelling with existing software models. OSCI TLM-2.0 stan-
dard interfaces are selected due to their popularity. As depicted in Figure 5-1, this
TLM interface modelling work can be seen as an add-on module in terms of the
whole software PE model. By this means, the software PE model can be inte-
grated in an abstract TLM embedded system model that includes the CPU, mem-
ory, bus, and peripheral devices, which will improve functionality and extend this
research. Note that this chapter does not aim to propose any new or complex TLM
communication modelling methods, because the scope of this thesis is on software
modelling and simulation.
[Figure 5-2. OSCI TLM-2.0 socket-based communication: an initiator module passes transaction objects through its initiator socket to a target module's target socket along the forward path, with responses returned along the backward path, optionally through an interconnect module.]
class TLM_LT_COMPONENT : public SimpleLTInitiator1
{
    int LT_write(unsigned uiId, unsigned uiData);
    int LT_read(unsigned uiId);
    ...
};

class TLM_AT_COMPONENT : public SimpleATInitiator1
{
    int AT_write(unsigned int uiId, unsigned int uiData);
    int AT_read(unsigned int uiId);
    ...
};
ware components in a real design. Besides, because HAL services are included in
the RTOS module (see Section 4.5.8), the RTOS model needs to provide TLM
APIs for application tasks.
In order to support both LT-style blocking and AT-style non-blocking commu-
nication, two initiator classes are derived from the simple socket interfaces of the
OSCI TLM-2.0 library. Referring to Table 5-1 (A), they both supply simple write
and read functions. In the current model, a write function needs two arguments for a transport, i.e., a target ID indicating the destination module and a datum to be transferred, whereas a read function needs only a target ID argument, with the obtained datum returned by the function. Exact transport addresses are maintained internally by these interfaces. The model can also support user-defined addresses in future refinements.
Based on the above interface classes, an LT initiator component is instantiated and bound to an LT socket inside the Live CPU Model, as are an AT initiator component and an AT socket (see Figure 5-3 and Table 5-1 (B)). Afterwards, in Table 5-1 (C), some RTOS HAL services wrap the TLM interfaces provided by the Live CPU Model. Finally, as shown in Table 5-1 (D), an application task can invoke the RTOS communication services to transfer data to a target.
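The layering of Table 5-1 (B)-(D) can be sketched as follows, with the TLM transport reduced to a map lookup; a real model would drive the TLM-2.0 initiator socket, and the stub and HAL names here are our own.

```cpp
#include <map>

// Stand-in for the LT initiator component bound inside the Live CPU Model;
// the "bus" map replaces the actual TLM-2.0 transport.
struct LtInitiatorStub {
    std::map<unsigned, unsigned> bus;     // target ID -> last datum written
    int LT_write(unsigned uiId, unsigned uiData) { bus[uiId] = uiData; return 0; }
    int LT_read(unsigned uiId) { return static_cast<int>(bus[uiId]); }
};

// RTOS HAL services wrap the TLM interfaces provided by the Live CPU Model,
// so application tasks only ever call RTOS communication services.
class RtosHal {
public:
    explicit RtosHal(LtInitiatorStub* cpu_if) : cpu(cpu_if) {}
    int hal_write(unsigned target, unsigned datum) { return cpu->LT_write(target, datum); }
    int hal_read(unsigned target) { return cpu->LT_read(target); }
private:
    LtInitiatorStub* cpu;                 // interface of the Live CPU Model
};
```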
Figure 5-3. Combining software PE model with TLM interfaces and SoC models
For generality, as shown in Figure 5-3, a simple SoC topology is presented as the reference model to extend the software PE for TLM modelling. It includes the following modules.
The Live CPU Model is the main software PE initiator. In addition, an optional hardware IP module can be integrated as another initiator for customised hardware computation. It supports both LT and AT interfaces, like the software PE model (see Table 5-1 (A)). This HW IP module can also be connected to the Interrupt Controller Model in the Live CPU Model by a standard SystemC primitive channel, in order to trigger a software interrupt handler when an interrupt event occurs.
5.1.3.3 Combined Initiator/Target Module
Referring to Table 5-3, two sockets, three main methods, and four virtual regis-
ters implement a model of a typical DMA mechanism. The b_transport()
method, which is inherited from the standard TLM LT target interface, listens to
220
the target socket and waits for configuration information from requestors. Upon
receipt, the configuration information (i.e., source address, destination address,
size of transfer, and control bits) is saved in virtual registers of the DMA control-
ler. Then the DMA_transfer() function begins to read data from source loca-
tions and then writes them to destinations. When an entire DMA transfer is fin-
ished, the DMA_irq() method will interrupt the Live CPU Model.
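A minimal sketch of this DMA mechanism follows, with memory reduced to a flat byte array; the register and method names follow the text, while everything else is our simplification of the model.

```cpp
#include <cstring>
#include <vector>

// Sketch of the combined initiator/target DMA module: four virtual registers
// are programmed via a b_transport-like call, DMA_transfer() copies the data,
// and DMA_irq() stands in for interrupting the Live CPU Model.
struct DmaController {
    unsigned src = 0, dst = 0, size = 0, ctrl = 0;   // virtual registers
    bool irq_raised = false;

    // Target side: a requestor writes the configuration information
    // (source address, destination address, size of transfer, control bits).
    void configure(unsigned s, unsigned d, unsigned n, unsigned c) {
        src = s; dst = d; size = n; ctrl = c;
    }

    // Initiator side: read data from source locations, write to destinations.
    void DMA_transfer(std::vector<unsigned char>& mem) {
        std::memmove(&mem[dst], &mem[src], size);
        DMA_irq();                    // notify completion of the whole transfer
    }

    void DMA_irq() { irq_raised = true; }
};
```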
5.1.3.4 Interconnection
5.2 Experiments
In this section, some case studies are presented in order to demonstrate the per-
formance and capability of the integrated software PE model and TLM communication models. They are based on the TLM SoC model introduced above. All experiments run on a 1.86GHz x86 PC.
through the TLM bus, whilst another software task reads data from that memory
module. The RTOS model mimics the μC/OS-II RTOS and is annotated with the corresponding execution overheads of a 48MHz ARM7 processor.
The benchmark model is configured and run in six scenarios as follows:
1) Scenario 1 (Pure SystemC + LT TLM): This is an original SystemC TLM
model without the RTOS model. The SystemC native kernel scheduler pro-
vides co-operative scheduling without prioritisation and pre-emption. Soft-
ware tasks are implemented as SystemC SC_THREADs. Coarse-grained
time annotation and the LT style are used for software and TLM communi-
cation models, respectively. This case can represent behaviour of an origi-
nal SystemC TLM simulation.
2) Scenario 2 (Pure SystemC + AT TLM): The only difference from Scenario
1 is that AT TLM communication is used in this case. It supports more tim-
ing phases in a transaction than the LT TLM model.
3) Scenario 3 (Abstract SW + LT TLM): The software PE model (including
the Live CPU Model, RTOS model, and task models) and the LT style
TLM model are integrated in this case. Two coarse-grained timed abstract
software tasks are controlled by the RTOS model and utilise RTOS syn-
chronisation and timing services.
4) Scenario 4 (Abstract SW + AT TLM): Being different from Scenario 3, the
AT TLM communication method is used in this case.
5) Scenario 5 (Native-code SW + LT TLM): In this case, software tasks are annotated with fine-grained time delays, of which there are about 1000 times more than in the abstract model. Other properties are the same as in Scenario 3.
6) Scenario 6 (Native-code SW + AT TLM): This case includes both fine-
grained timed software model and the AT TLM communication model.
The model of each scenario is executed ten times so as to obtain an average re-
sult. In each run, a thread repeats about ten jobs, with two thousand transactions
being transferred on the bus.
The simulation results are shown in Figure 5-5. Not surprisingly, pure functional SystemC models achieve the fastest simulation speed due to their simplicity. As expected, native-code software models and AT TLM models always give worse simulation performance.
Figure 5-5. Simulation performance results
The proposed models in this thesis have sufficient capability (i.e., DMA-enabled memory transfer and full-functional interrupt handling) to model the two mechanisms.
This experiment implements the RSA cryptography algorithm in the software
initiator and uses DMA to transfer encrypted and decrypted messages across a
memory target module and a peripheral target device. Specifically, this experi-
ment includes following modules and components:
The software PE initiator module includes two software tasks and the RTOS
model. Here, the task_encypt() task encrypts randomly-generated messages and saves them in a memory module, while the other task, task_dma_transfer(), invokes the DMA controller to transfer the ciphered messages and secret keys from memory locations to a hardware decipherer device. The two tasks are synchronised by a semaphore.
A memory model serves as a target module and is accessed by the software PE and the DMA module.
A hardware peripheral device RSA_IP is a target module and acts as a de-
cipherer.
The DMA model is a combined initiator/target. It raises an interrupt to the Live CPU Model when it finishes a transfer.
The simple_bus interconnection.
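The modular-exponentiation core that such an RSA task would run can be sketched as follows; the key values here are illustrative choices, not those used in the experiment.

```cpp
#include <cstdint>

// Square-and-multiply modular exponentiation, the core of RSA encryption
// and decryption for small toy moduli.
inline uint64_t pow_mod(uint64_t base, uint64_t exp, uint64_t mod) {
    uint64_t result = 1;
    base %= mod;
    while (exp) {
        if (exp & 1) result = result * base % mod;
        base = base * base % mod;
        exp >>= 1;
    }
    return result;
}

// Illustrative toy key pair (assumed, not from the thesis):
// n = 61 * 997 = 60817, phi = 60 * 996 = 59760, e = 7, d = 7^-1 mod 59760.
constexpr uint64_t RSA_N = 60817, RSA_E = 7, RSA_D = 51223;

inline uint64_t rsa_encrypt(uint64_t msg) { return pow_mod(msg, RSA_E, RSA_N); }
inline uint64_t rsa_decrypt(uint64_t c)   { return pow_mod(c, RSA_D, RSA_N); }
```

Messages must be smaller than the modulus, matching the four-to-five-digit message and cipher values visible in the trace of Figure 5-6.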
Various modelling characteristics of the software PE model and the simple SoC model, such as software processing, DMA transfers, and CPU interruption, are included in this experiment. We expect to observe successfully recovered messages after several transfers across the TLM simple bus, and a low frequency of I/O-related interrupts thanks to the DMA method.
Figure 5-6 shows parts of the SystemC simulation trace of this experiment, organised in five sequential blocks to illustrate one working cycle. In the 1st block, the task task_encrypt() encrypts the original messages, saves them in a target memory module, and then unblocks the task task_dma_transfer(). As shown in the 2nd block, the second task then uses TLM primitives (i.e., the CPU TLM interfaces) to program the DMA module to initiate transfers. After the transactions are transferred, the hardware peripheral device model decrypts these messages correctly (in the 3rd block). For example, the trace excerpt from the encryption task reads:

//SW task: task_encrypt_data() encrypts:
AT 1472170 ns: |Message to be ciphered: 9614
               |Ciphered message: 3307
AT 2272170 ns: |Message to be ciphered: 1454
               |Ciphered message: 35894
AT 3072170 ns: |Message to be ciphered: 5878
               |Ciphered message: 2726

Afterwards, the device initiates a DMA transfer to write the decrypted data back to the memory module (in the 4th block). Finally, in the last block, the DMA controller interrupts the CPU to notify the completion of the transfer, and the Interrupt Controller in the Live CPU Model recognises its IRQ number and notifies the RTOS model for software interrupt handling.
Figure 5-7 shows the simulation timeline (tasks task1 and task2, the ISR, and the RTOS on the software side; the DMA controller and the RSA_IP device on the hardware side, over 0 to 5000 µs). DMA transfers relieve the software system of frequent and time-consuming context switches, since the interrupt source only triggers once in each system working cycle. This not only means that the running speed (total cycles to finish a specific job) of an I/O-intensive model can be improved by utilising DMA, but also suggests the possibility of utilising the CPU more efficiently by executing additional software functions during the DMA transfer.
5.3 Summary
The software PE model has been extended with TLM communication interfaces by utilising the OSCI TLM-2.0 library and has been integrated into a simple SoC demonstration model that includes common TLM initiator and target modules. The favourable extensibility of the Live CPU Model and of the software PE modelling approach is also reflected in this work. One experiment shows the co-simulation performance of the combined software PE and TLM models and indicates the marginal overheads of the software PE model in simple TLM simulation. Another experiment simulates DMA-based I/O using the proposed SoC TLM models, which demonstrates the TLM HW/SW co-simulation capability of the extended software PE model.
Because of the highly abstract features and functions of the conceptual TLM models in this chapter, the timing accuracy and functional completeness of the integrated software PE and TLM communication models cannot be easily judged. Nevertheless, academic research [6] as well as industrial tools [188][189] have successfully used OSCI TLM as a sound basis for in-depth communication modelling, which inspires us to improve the software PE model for TLM-based HW/SW co-simulation in future research.
Chapter 6
This hypothesis was refined into four objectives in Section 1.6.4. Correspond-
ingly, this thesis has contributed in four aspects:
1) The mixed timing real-time software modelling and simulation approach:
a. It identifies the key aspects for real-time software timing modelling and
simulation in the SystemC simulation environment.
b. It defines two types of software models for early real-time software
simulation, according to the granularity of function and timing, and de-
scribes their relevance to existing TLM abstract models.
c. It proposes to use various modelling and simulation techniques for fast,
flexible and reasonably accurate behavioural simulation.
2) The Live CPU Model:
a. It proposes an abstract hardware CPU model inside a modular high-
level software PE model, which ideally supports interruptible software
time advance in SystemC simulation.
b. It extends modelling capability of the mixed timing software modelling
approach for HW/SW interactions.
3) The RTOS-centric real-time software simulation model:
a. It provides a systematic approach for building and simulating real-time
software (including both application tasks and RTOS) modular simula-
tion models in SystemC, which represent the software aspect of the
modular high-level software PE model.
b. It identifies essential RTOS features that are necessary for practical
RTOS modelling and implements them in a SystemC based RTOS
model.
c. This RTOS-centric approach can simulate mixed timing application task models in fast and accurate behavioural simulation, which reasonably approximates the functional and timing behaviour of a target software system.
4) Extending software models for TLM communication:
a. It integrates standard TLM communication interfaces into the modular
high-level software PE model.
b. This work also proposes a SoC TLM model, which not only integrates
the software PE model but also defines other typical TLM initiator, tar-
get, and interconnection models.
6.2 Conclusions
6.2.2 The Live CPU Model
In Chapter 3, we present the SystemC-based Live CPU Model as the concep-
tual hardware part of the modular software PE simulation model. The Live CPU
Model consists of three components, i.e., the Live CPU Simulation Engine, the
Interrupt Controller Model, and the Virtual Registers. It could also be extended
with SW/HW interfaces for inter-module communication.
The Live CPU Simulation Engine is the basis of the wait-for-event time advance method for the upper-level mixed timing software models. It consumes the delay annotations of software models in an interruptible and resumable way, by which the simulated target clock is accurately progressed. It also supports mixed variable-step and fixed-step execution modes, which enable trade-offs between simulation speed and simulation observability.
The Interrupt Controller Model monitors external HW interrupts that are connected to the Live CPU module. It is the first-level component in the software PE model to handle interrupts and supports prioritised and maskable handling functions. Once it finds an interrupt, it can immediately notify the Live CPU Simulation Engine to stop the current software time advance in order to handle the interrupt, i.e., with a zero-time interrupt latency.
Some Virtual Registers are modelled to assist software simulation in terms of task timing information, context switching, and flag setting.
In general, the Live CPU Model is a novel idea that introduces a conceptual hardware CPU model into generic high-level software simulation. It separates functions between software modules and hardware modules and makes the whole simulation framework more structured and extensible.
This generic RTOS-centric real-time embedded software simulation model has
a modular structure. It clearly separates application tasks, the RTOS, and the Live
CPU Model into different modules and integrates them for simulation. The under-
lying Live CPU Model enables accurate time advance for RTOS-centric software
models. Application tasks are modelled according to the mixed timing approach,
which means that hybrid abstract task models and delay-annotated native-code
task models can co-exist in one simulator, in order to enhance modelling flexibil-
ity. The RTOS model provides essential and generic services including
task/process modelling, multi-tasking management, scheduling services, task synchronisation and communication, interrupt handling, and HAL services. This rich set of services can be invoked by tasks through normal function calls, which enables convenient and practical real-time software simulation at early design phases. In addition to these rich functional features, the timing overheads of various RTOS services are also considered and added into the simulation models.
Experimental results have shown fast performance, high functional accuracy,
and small timing accuracy losses of RTOS-centric simulation, compared to cycle-
accurate ISS simulation.
onstrates the TLM HW/SW co-simulation capability of the extended software PE
model.
could be integrated in the RTOS model. Their functions can be implemented by
wrapping corresponding C++ dynamic memory operators and building memory
control blocks. Their timing overheads can be annotated with the costs of memory
management services in the target RTOS.
Bibliography
[11] F. Hessel, V. M. D. Rosa, C. E. Reif, C. Marcon, and T. G. S. d. Santos,
"Scheduling Refinement in Abstract RTOS Models," ACM Transactions
on Embedded Computing Systems (TECS), vol. 5, pp.342-354, 2006.
[24] Q. Li and C. Yao, Real-Time Concepts for Embedded Systems: CMP, 2003.
[25] G. Buttazzo, Hard Real-Time Computing Systems: Predictable Scheduling
Algorithms and Applications, 2nd ed.: Springer-Verlag New York Inc,
2004.
[28] R. McMillan, "GM CTO predicts cars will run on 100 million lines of
code," in IDG News Service, 21 October 2004.
[40] K. Keutzer, A. R. Newton, J. M. Rabaey, and A. Sangiovanni-Vincentelli,
"System-Level Design: Orthogonalization of Concerns and Platform-
Based Design," Computer-Aided Design of Integrated Circuits and
Systems, IEEE Transactions on, vol. 19, pp.1523-1543, 2000.
[53] M. Thompson, H. Nikolov, T. Stefanov, A. D. Pimentel, C. Erbas, S. Polstra, and E. F. Deprettere, "A Framework for Rapid System-Level Exploration, Synthesis, and Programming of Multimedia MP-SoCs," in 5th IEEE/ACM International Conference on Hardware/Software Codesign and System Synthesis, Salzburg, Austria: ACM, 2007.
[63] M. Krause, O. Bringmann, and W. Rosenstiel, "Target software generation:
an approach for automatic mapping of SystemC specifications onto real-
time operating systems," Design Automation for Embedded Systems, vol.
10, pp.229-251, 2005.
[70] P. Destro, F. Fummi, and G. Pravadelli, "A Smooth Refinement Flow for
Co-Designing HW and SW Threads," in Proceedings of the conference on
Design, automation and test in Europe: EDA Consortium San Jose, CA,
USA, 2007, pp. 105-110.
[74] "The SpecC Reference Compiler," Center for Embedded Computer
Systems UC Irvine, 2006.
[86] S. Yoo and K. Choi, "Synchronization Overhead Reduction in Timed
Cosimulation," in Proc. of Int. High Level Design Validation. 6th
International Workshop on Hardware/Software Co-Design,
CODES/CASHE98, 1997, pp. 157-164.
[87] Z. He, A. Mok, and C. Peng, "Timed RTOS Modeling for Embedded
System Design," in 11th IEEE Real Time and Embedded Technology and
Applications Symposium (RTAS'05), 2005, p. 448.
[88] OSCI TLM Work Group, "OSCI TLM-2.0 Language Reference Manual
(Software Version: TLM 2.0.1)," http://www.systemc.org/, 2009.
[97] F. Doucet, R. K. Shyamasundar, I. H. Krüger, S. Joshi, and R. K. Gupta,
"Reactivity in SystemC Transaction-Level Models," in Lecture Notes in
Computer Science 4899, Proceedings of Third International Haifa
Verification Conference (HVC 2007), Haifa, Israel, 2008, pp. 34-50.
[98] T. Grötker, S. Liao, G. Martin, and S. Swan, System Design with SystemC:
Kluwer Academic Publishers Norwell, MA, USA, 2002.
[99] OSCI TLM Work Group, "Requirements specification for TLM 2.0,"
http://www.systemc.org/, 2007.
[103] G. Schirner and R. Dömer, "Fast and Accurate Transaction Level Models
Using Result Oriented Modeling," in Proceedings of the 2006 IEEE/ACM
international conference on Computer-aided design, 2006, p. 368.
Modelling," in Proceedings of the 43rd annual Design Automation
Conference San Francisco, CA, USA: ACM, 2006.
[114] R. L. Moigne, O. Pasquier, and J. P. Calvez, "A Generic RTOS Model for
Real-time Systems Simulation with SystemC," in Conference on Design,
automation and test in Europe - Volume 3: IEEE Computer Society, 2004.
[115] W.-T. Sun and Z. Salcic, "Modeling RTOS for Reactive Embedded
Systems," in 20th International Conference on VLSI Design held jointly
with 6th International Conference on Embedded Systems (VLSID'07),
2007, pp. 534-539.
[116] D. C. Black and J. Donovan, SystemC: From the Ground Up: Springer, 2005.
[118] F. Fummi, M. Loghi, G. Perbellini, and M. Poncino, "SystemC Co-
Simulation for Core-Based Embedded Systems," Design Automation for
Embedded Systems, vol. 11, pp.141-166, 2007.
[120] Y. Yi, D. Kim, and S. Ha, "Fast and Time-Accurate Cosimulation with OS
Scheduler Modeling," Design Automation for Embedded Systems, vol. 8,
pp.211-228, June, 2003.
[129] S. Honda, T. Wakabayashi, H. Tomiyama, and H. Takada, "RTOS-Centric
Hardware/Software Cosimulator for Embedded System Design," in 2nd
IEEE/ACM/IFIP international conference on Hardware/software codesign
and system synthesis, Stockholm, Sweden, 2004.
[131] M.-K. Chung, S. Yang, S.-H. Lee, and C.-M. Kyung, "System-Level
HW/SW Co-Simulation Framework for Multiprocessor and Multithread
SoC," in VLSI Design, Automation and Test, 2005. (VLSI-TSA-DAT). 2005
IEEE VLSI-TSA International Symposium on, 2005, pp. 177-179.
[139] S. Lippman and J. Lajoie, C++ Primer: Addison Wesley Longman
Publishing Co., Inc. Redwood City, CA, USA, 1998.
[154] K. Popovici and A. Jerraya, "Hardware Abstraction Layer Introduction and
Overview," in Hardware Dependent Software -- Principles and Practice,
W. Ecker, W. Müller, and R. Dömer, Eds.: Springer Science + Business
Media B.V., 2009, pp. 67-94.
[159] P. A. Laplante, Real-Time Systems Design and Analysis, 3rd ed.: Wiley-
IEEE Press, 2004.
[168] S. Baskiyar and N. Meghanathan, "A Survey of Contemporary Real-time
Operating Systems," Informatica vol. 29, pp.233-240, 2005.
[175] B. Sprunt, L. Sha, and J. Lehoczky, "Aperiodic Task Scheduling for Hard-
Real-Time Systems," Real-Time Systems, vol. 1, pp.27-60, 1989.
[182] "Real-Time Executive for Multiprocessor Systems (RTEMS)," OAR
Corporation, http://www.rtems.com/.