SystemC: Methodologies and Applications
edited by
Wolfgang Müller
Paderborn University, Germany
Wolfgang Rosenstiel
Tübingen University, Germany
and
Jürgen Ruf
Tübingen University, Germany
No part of this eBook may be reproduced or transmitted in any form or by any means, electronic,
mechanical, recording, or otherwise, without written consent from the Publisher.
Foreword ix
Preface xiii
Chapter 1
A SystemC Based System On Chip Modelling and Design Methodology 1
Yves Vanderperren, Marc Pauwels, Wim Dehaene, Ates Berna, Fatma Özdemir
1.1. Introduction 1
1.2. An Overview of the Methodology 2
1.3. Requirements Capture and Use Case Analysis 3
1.4. Modelling Strategies 5
1.5. Iterative Development, Model Refinement and Verification 20
1.6. Conclusions 25
Chapter 2
Using Transactional Level Models in a SoC Design Flow 29
Alain Clouard, Kshitiz Jain, Frank Ghenassia, Laurent Maillet-Contoz,
Jean-Philippe Strassen
2.1. Introduction 29
2.2. Overview of the System to RTL Design Flow 31
2.3. TLM, a Complementary View for the Design Flow 33
2.4. TLM Modeling API 44
2.5. Standard Usage of the SystemC API 49
2.6. Advanced TLM API Usages 51
2.7. Example of a Multimedia Platform 52
2.8. Example of ARM Processor Subsystem 58
2.9. Conclusions 63
Chapter 3
Refining a High Level SystemC Model 65
Bernhard Niemann, Frank Mayer, Francisco Javier Rabano Rubio,
Martin Speitel
3.1. Introduction and Motivation 65
3.2. The OFDM Demodulator 66
3.3. High Level SystemC Model 68
3.4. Refinement to ANSI C 79
3.5. Further Refinement — Operator Grouping 87
3.6. Summary 93
3.7. Conclusions 95
Chapter 4
An ASM Based SystemC Simulation Semantics 97
Wolfgang Müller, Jürgen Ruf, Wolfgang Rosenstiel
4.1. Introduction 97
4.2. Related Works 98
4.3. Abstract State Machines 99
4.4. SystemC Basic Concepts 101
4.5. SystemC Operations 106
4.6. SystemC Scheduler 113
4.7. Example 118
4.8. Conclusions 126
Chapter 5
SystemC as a Complete Design and Validation Environment 127
Alessandro Fin, Franco Fummi, Graziano Pravadelli
5.1. Introduction 127
5.2. Methodology Overview 128
5.3. Design Error Modeling 129
5.4. High Level Validation of SystemC Design 136
5.5. Efficient Fault Simulation of a SystemC Design 144
5.6. Experimental Results 151
5.7. Concluding Remarks 155
Chapter 6
System Level Performance Estimation 157
Nuria Pazos, Winthir Brunnbauer, Jürgen Foag, Thomas Wild
6.1. Introduction 157
6.2. State of the Art 160
6.3. Methodology 161
6.4. Implementation Procedure 173
6.5. Methodology Verification 184
6.6. Case Study: Results and Evaluation 186
6.7. Conclusions and Outlook 189
Chapter 7
Design of Protocol Dominated Digital Systems 191
Robert Siegmund, Uwe Proß, Dietmar Müller
7.1. Introduction 191
7.2. Specification of Data Communication Protocols 193
7.3. An SVE Model of the USB 2.0 Protocol 203
7.4. Synthesis from SVE Protocol Specifications 213
7.5. Summary 215
Chapter 8
Object Oriented Hardware Design and Synthesis Based on SystemC 2.0 217
Eike Grimpe, Wolfgang Nebel, Frank Oppenheimer, Thorsten Schubert
8.1. Introduction 217
8.2. Related Work 219
8.3. High Level Hardware Modeling with SystemC 220
References 325
Index 343
Foreword
¹ Thorsten Grötker, Stan Liao, Grant Martin and Stuart Swan, System Design with SystemC, Kluwer
Academic Publishers, 2002.
Grant Martin
Berkeley
Thorsten Grötker
Aachen
March 2003
Preface
We put great effort into the selection of authors and articles to present a high
quality survey on the state of the art in the area of system design with Sys-
temC. Organised into 11 self-contained readings, the book presents the work of
leading SystemC experts in the domains of modelling, analysis, and syn-
thesis. The different approaches give a comprehensive overview of SystemC
methodologies and applications for HW/SW designs including mixed signal
designs. We know that any collection lacks completeness. This collection
mainly results from presentations at European SystemC User Group meetings
(www-ti.informatik.uni-tuebingen.de/~systemc). We believe that it gives a
representative overview of current work in academia and industry and serves
as a state-of-the-art reference for SystemC methodologies and applications.
No book could ever be written without the help and valuable contributions
of many people. First of all we would like to thank Mark de Jongh, Cindy Zitter,
and Deborah Doherty from Kluwer who helped us through the process. Many
thanks also go to the contributing authors for their great cooperation through
the last weeks. For the review of the individual articles and valuable comments,
we acknowledge the work of Axel Braun (Tübingen University), Rolf Drech-
sler (Bremen University), Görschwin Fey (Bremen University), Uwe Glässer
(Simon Fraser University), Daniel Große (Bremen University), Prakash Mohan
Peranandam (Tübingen University), Achim Rettberg (C-LAB), Axel Sieben-
born (Tübingen University), Alain Vachoux (EPFL), as well as many other
colleagues from C-LAB, Paderborn University, and Tübingen University.
Wolfgang Müller
Paderborn
Wolfgang Rosenstiel, Jürgen Ruf
Tübingen
March 2003
***
Wolfgang Müller dedicates this book to Barbara, Maximillian, Philipp, and Tabea.
Wolfgang Rosenstiel dedicates this book to his family and the SystemC community.
Jürgen Ruf dedicates this book to his wife Esther and his children Nellie and Tim.
Chapter 1

A SystemC Based System On Chip Modelling and Design Methodology

Yves Vanderperren, Marc Pauwels, Wim Dehaene, Ates Berna, Fatma Özdemir

² Katholieke Universiteit Leuven, Department Elektrotechniek–ESAT–MICAS
³ STMicroelectronics Turkey (previously with Alcatel Microelectronics)
Abstract This paper describes aspects of the process and methodologies used in the devel-
opment of a complex System On Chip. SystemC played a key role in supporting
the technical work based on a defined refinement process from early architec-
tural modelling to detailed cycle accurate modelling elements which enabled
early co-simulation and validation work. In addition to SystemC, significant use
was made of the Unified Modelling Language, and process and methodology
associated with it, to provide visual, structured models and documentation of the
architecture and design as it developed.
1.1 Introduction
The challenges presented to the design community by the ever greater poten-
tial of System On Chip technology are well documented [de Man, 2002, Scan-
lon, 2002]. SystemC provides the designer with an executable language for
specifying and validating designs at multiple levels of abstraction. We decided
to adopt the use of SystemC as an integral part of the design process for a recent
System On Chip (SoC) development¹. The product under development was a
Wireless LAN chipset. In order to address a global market it was required to
support more than one networking standard sharing very similar physical layer
¹ within the Wireless Business Unit of Alcatel Microelectronics, acquired by STMicroelectronics in 2002.
The process presented here grew from these central ideas and is explained in
more detail in the following sections. It should be made clear that this method-
ology did not attempt to modify the existing processes for detailed design and
implementation. Rather it provides a comprehensive approach to the manage-
ment of the transition from customer requirements through architectural design
to subsystem specification (the entry point for detailed design) which is vital to
control technical risks in a complex, multi-site development project.
Identification of the success criteria and needs for each user and stake-
holder (in other words what do they need to get out of the project in order
for them to consider it a success).
Identification of the key product features and Use Cases. Features are the
high level capabilities of the system that are necessary to deliver benefits
to the users.
The process of agreeing the contents of the Vision Document can prove
cathartic. It is not uncommon for several different views of the project re-
quirements to co-exist within the project and early alignment around a clearly-
expressed set of goals is an important step. The Vision document forces con-
sideration and alignment on fundamental drivers for the development and key
goals.
² A stakeholder has an interest in the success of the project, but may not necessarily use the system directly.
³ Key measures of effectiveness can be quantified here: non-functional parameters which are essential for
product success (power consumption, battery life, throughput, cost) can be identified to focus attention.
showing the interactions between the system and its environment. UML pro-
vides a Use Case Diagram notation which assists in visualising the relationships
between Use Cases and the system actors. Use Cases not only describe the pri-
mary system response (the expected behaviour of the system in order to satisfy
the user goals), but also encourage early consideration of the secondary system
response (i.e., what can go wrong in the successful scenario). It is important at
this stage to avoid any assumptions about the internal design of the system—it
remains the archetypal black box.
Use Cases will not hold all the requirements but only describe the behavioural
part. This approach to the analysis of functional requirements is in no way
restricted to software dominated systems, but provides a very general method
for structuring functional requirements of any system. The requirements for a
complete System On Chip design may therefore be analysed in these terms.
and insight on the high level subsystem interactions. A timed functional de-
scription of the system (TF) is obtained by adding information on throughput
and latency targets to the model, which allows a broad timing overview to be
constructed and initial estimates to be obtained for required queue sizes, pipeline
delays, etc. In the case of the Wireless LAN project, the UTF modelling stage of
the physical layer was skipped since the Matlab modelling was considered to
have provided adequate support to the architectural partitioning. These mod-
els can be used as the baseline specification for detailed design in their own
right. The SystemC model can be further refined to provide cycle accurate
(CA) simulation of communication busses or taken as far as RTL equivalent
cycle accurate simulation of logic blocks. At any stage the complete design can
be simulated using a mixture of these different levels, providing flexibility to
explore the design at different levels as appropriate to the implementation risk.
Co-simulation of SystemC and VHDL is also possible with many tools. This
may be carried out in order to verify project designed VHDL in the system envi-
ronment, or in order to incorporate 3rd party IP delivered in the form of VHDL.
Examples of co-simulations are presented in Section 1.5.3. Of course, as the
level of abstraction is lowered, the accuracy and confidence of the results go
up, but so does the time required for simulation. A SystemC model provides
very useful flexibility to construct verification environments which effectively
trade off these two concerns.
It should be stressed that the npram channel is not limited to modelling mem-
ories. It provides a flexible container which may be implemented in various
ways (e.g., register banks).
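As an illustration only (the project's actual npram code is not shown here), such a container might be sketched in SystemC as a small read/write interface with interchangeable implementations; the names npram_if and npram_array below are assumptions.

// Hypothetical sketch of an npram-like storage channel: a read/write
// interface whose implementation may be a memory array, a register bank, etc.
#include <systemc.h>
#include <vector>

template <class T>
class npram_if : virtual public sc_interface {
public:
    virtual T    read (unsigned addr) const = 0;
    virtual void write(unsigned addr, const T& data) = 0;
};

// One possible implementation: a plain array, usable as a memory or a register bank.
template <class T>
class npram_array : public npram_if<T>, public sc_prim_channel {
public:
    npram_array(const char* name, unsigned size)
        : sc_prim_channel(name), storage(size) {}

    T    read (unsigned addr) const          { return storage[addr]; }
    void write(unsigned addr, const T& data) { storage[addr] = data; }

private:
    std::vector<T> storage;
};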
ified within the complete system. Compilation and linking occur on the host
computer. In the next phase the Instruction Set Simulator (ISS) of the custom
processor can be integrated. Cycle accurate simulations of the cross-compiled
application code are then performed. As said before and illustrated in figure
1.4, the complete system can be simulated at any stage using a mixture of the
different levels, providing flexibility to explore the design at different levels as
appropriate to the implementation risk. Encapsulation of firmware code within
the SystemC model allows fast simulation but requires clean coding style, since
the compiler of the host computer is used, not the target processor compiler.
After cross-compilation the system performance can be evaluated with Sys-
temC / ISS co-simulation. Depending on the level of refinement of the various
sections, wrappers may be required for appropriate encapsulation. Indeed the
npram does not perform conversion (e.g., in terms of data type or access details)
between heterogeneous sections. As explained earlier, the same channel is used
to model memories, register banks, and FIFOs. The instances differ from each other
at TF by their content (data types and sizes). At CA level the channels model
arrays of bit true values.
Figure 1.5. Scaling factor: a link between floating and fixed point values
tional bug fix, for example, there is no need to update both a floating
and a separate fixed point model. This is also a major advantage in an
iterative process where successive versions of the executable model with
increased functionality are made.
reuse — A single data type with dynamic specification of the fixed point
parameters allows clean module interfaces and instances of the same
module which can have different fixed point specifications.
Figures 1.6 and 1.7 show the interface of a given sc_module using the stan-
dard available data types and using the combined data type.
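The figures themselves are not reproduced in this excerpt. As a rough, assumed sketch of the idea behind such a combined data type (the real fx_double implementation may differ substantially), one instance-configurable type could carry a floating point reference value next to its quantized counterpart and optionally check the chosen scaling:

// Illustrative sketch only: a value type that keeps a floating point
// reference value next to a run-time configurable fixed point copy,
// so a single model can serve both floating and fixed point simulation.
#include <cmath>
#include <iostream>

class fx_double {
public:
    fx_double(int wl = 16, int iwl = 8, bool check = true)
        : m_wl(wl), m_iwl(iwl), m_check(check), m_flt(0.0), m_fix(0.0) {}

    fx_double& operator=(double v) {
        m_flt = v;
        m_fix = quantize(v);
        if (m_check && std::fabs(v) >= std::ldexp(1.0, m_iwl - 1))
            std::cerr << "fx_double: value exceeds the chosen scaling" << std::endl;
        return *this;
    }

    double flt() const { return m_flt; }   // floating point view
    double fix() const { return m_fix; }   // fixed point view

private:
    double quantize(double v) const {
        const double lsb = std::ldexp(1.0, -(m_wl - m_iwl)); // weight of one LSB
        return std::floor(v / lsb) * lsb;                    // truncation quantization
    }

    int    m_wl, m_iwl;   // total and integer word lengths, set per instance
    bool   m_check;       // scale checking, switched off once the model is verified
    double m_flt, m_fix;  // floating point reference and its fixed point counterpart
};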
Of course these benefits come at the cost of a simulation speed penalty of about
25% with respect to plain fixed point simulation. As most of this overhead
is due to the scale checking, this feature is switched off once the model is
functionally tested and correct. The executable can then be used as a tool to
explore alternative fixed point decisions with a speed penalty of 12.5%⁴.
Though the fixed point design exploration was done manually in the context
of the Wireless LAN project, this data type does not prevent its use as
modelling support for automated fixed point exploration methods, as proposed
in [Sung and Kum, 1995].
⁴ Using g++ 2.95, SystemC 2.0, SunOS 5.7.
Figure 1.6. Fixed point design using the C++ and SystemC data type
Figure 1.7. Fixed point design using the fx_double data type
⁵ However design patterns are often independent of programming languages and do not lead to direct code
re-use. Frameworks define semi-complete applications that embody domain specific object structures and
functionality: complete applications can be composed by inheriting from and/or instantiating framework
components. In contrast, class libraries are less domain specific and provide a smaller scope of reuse.
of the model and allows details of low level interactions between software
subsystems and hardware entities to be analysed.
A Step Further: the Error Control Example. The method and approach
described in the preceding sections were developed during the project. Since its
completion further work has been ongoing to explore possible improvements
to the process for future projects.
One of the limitations of the approach described above was the need to select
an architecture first and then begin the modelling process. If poor architectural
decisions were taken at the beginning, significant rework of the models could
be required. One of the strengths offered by SystemC is support for higher
levels of abstraction in modelling. This section describes a short trial exercise
which has been carried out to further assess the potential this capability offers to
develop early untimed abstract models of the system. This allows the required
functionality of the system to be modelled free from any assumptions or deci-
sions about the implementation architecture. Once this model is constructed it
can be used to explore and give insight to a number of architectural options.
The ‘system’ which was chosen as the basis for the trial, the Error Control
[ETSI, 2000a], provided some challenges both in terms of understanding the
complex requirements and in finding efficient architectural partitioning between
hardware and software implementation.
The initial modelling phase has as its objective the construction of an ex-
ecutable model which provides the correct black box behaviour against the
Error Control specification. This model should make no assumptions and place
no constraints on the way in which the system will be implemented—known
classically as an analysis or essential model [Rational, 2001, Ward and Mellor,
1985]. The reasons for constructing such a model are:
It permits clear separation of concerns between the specification of what
the system has to do and how it will be implemented.
It allows the early development of a system level test suite, which can be
reused as the design progresses.
Once again the use of a UML tool supported well the development of the
SystemC model. Figure 1.11 shows part of the black box view of the system and
its environment. Note that the UML icon capability has been used to enhance
the readability of the diagram—SystemC ports are represented as squares, inter-
faces as circles and channels as boxes containing a double-headed arrow. The
system interfaces are all modelled as SystemC channels, allowing the services
provided by the channels to be separated from their implementation. The test
bench module communicates with the system module via these channels.
The internal structure of the Error Control system (fig. 1.12) is implemented
as a set of C++ classes which provide the required functionality. These classes
are developed using ANSI standard C++ and invoke each other’s methods using
synchronous procedure calls. They are deliberately coded to avoid any use of
SystemC structuring concepts. This allows these ‘analysis’ level classes to
later be mapped and remapped into SystemC modules as the architecture is
developed.
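A minimal sketch of this separation, with hypothetical class and port names, might look as follows: the ring buffer management is plain ANSI C++, and only a thin SystemC wrapper is added once an architecture is chosen.

// Hypothetical illustration: an 'analysis level' class in plain ANSI C++,
// later wrapped by a thin SystemC module once an architecture is chosen.
#include <systemc.h>

class RingBufferManager {               // no SystemC constructs at analysis level
public:
    explicit RingBufferManager(unsigned size) : m_size(size), m_head(0) {}
    unsigned allocate() {               // hand out the next logical slot
        unsigned slot = m_head;
        m_head = (m_head + 1) % m_size;
        return slot;
    }
private:
    unsigned m_size, m_head;
};

SC_MODULE(ErrorControl) {               // later mapping onto a SystemC module
    sc_fifo_in<int>  request_in;
    sc_fifo_out<int> slot_out;

    SC_CTOR(ErrorControl) : m_mgr(1024) {   // notional 1024 element ring buffer
        SC_THREAD(run);
    }

    void run() {
        for (;;) {
            request_in.read();                      // blocking read of a request
            slot_out.write((int) m_mgr.allocate()); // delegate to the analysis class
        }
    }

private:
    RingBufferManager m_mgr;
};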
The exercise of analysing the Error Control requirements in this fashion
provided many useful insights:
Complexities in the requirements were extracted much earlier than would
otherwise have been the case. Domain and system experts can confidently
specify important algorithms early in the project.
The exercise of coding the required functionality allowed the consider-
ation and discovery of clean and logical structuring of the requirements
applying the classical principles of cohesion and coupling. A good ex-
ample of this is the separation of concerns between the ‘logical’ problem
of managing a notional 1024 element ring buffer, and the lower level
problem of mapping this logical buffer to a physical area of memory with
less than 1024 element capacity. The EC example code captures this
distinction nicely and provides a useful basis for a possible architectural
split.
Very early consideration of system test scenarios is forced by the early ex-
ercising of the abstract model—not left to a late stage in the programme to
be invented by a team who were never involved in the original conception
and design.
The process outlined here is seen as a potential ‘front end’ to the material
discussed previously. The use of an early analysis model allows a considered
and informed approach to the adoption of an architecture. Once the architecture
is selected, all of the techniques described earlier in the paper are, of course,
available.
The Wireless LAN project iterations are shown in figure 1.13. The iteration
plan may seem sluggish to software developers used to significantly more rapid
cycles, but independent teams within the project were free to execute their own,
more frequent iterations within the scope of a single project iteration.
The feasibility step covers risks related to algorithms using the Matlab plat-
form and is aimed at assessing the signal processing performance. Iteration 0
concentrates on the mapping from the algorithms to an optimal system architec-
ture. The obtained SystemC executable specification answers key questions
and allows the design process to start in the next iterations. In order to lower the
risks associated with the transition from SystemC to VHDL, a limited VHDL
activity already started during iteration 0. Once validated on a limited scope,
the transition from SystemC to VHDL is applied to the whole design. In a similar
way a residual algorithmic activity may happen during iteration 0, covering re-
maining details which did not prevent architectural brainstorming from starting.
During iteration 0 sections of the design which were considered high risk were
modelled at CA level and co-simulated with TF level blocks. During iterations
1 and 2 (detailed design) a number of co-simulation scenarios were planned,
involving the use of an Instruction Set Simulator to support co-verification of
VHDL, SystemC, and target software. The SystemC model is used as a golden
reference for the implementation and the validation of the final product during
iteration 3.
The Matlab model serves as an algorithmic reference for the physical layer
SystemC model, which in turn becomes a reference for the later stages of
the design, on the path down to implementation. Test benches may even be
incorporated into product code to provide Built In Self Test functionality to the
final product (horizontal test bench reuse). The control processor has a test
mode, on top of its normal operation mode, which allows each section to be tested
via three steps (figure 1.1): load test patterns (1); execute (2); obtain and analyse
the results (3).
SystemC Verification from Matlab and Use Cases. The Matlab model is
the algorithmic reference for the physical layer system. In order to compare the
performance of the SystemC model against the original algorithms‚ a bridge
between the Matlab tool and the SystemC environment was required. The
interface between Matlab and SystemC was achieved by using Matlab APIs
[MathWorks, 2000a] from within SystemC. In this manner the input stimuli
for the SystemC model were extracted from Matlab and output results of the
SystemC model written to a Matlab file for comparison and analysis in the
Matlab environment.
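The bridge code itself is not shown in the chapter; the fragment below is only one plausible way to fetch stimuli through the Matlab Engine C API (the script and variable names are assumptions). Results would be written back in a similar way, e.g. with engPutVariable, for comparison in the Matlab environment.

// Sketch of one way to pull stimuli into a SystemC test bench through the
// Matlab Engine C API (engine.h); the script and variable names are assumptions.
#include <engine.h>
#include <vector>

std::vector<double> load_stimuli(const char* script, const char* var)
{
    std::vector<double> samples;
    Engine* ep = engOpen("");                  // start a local Matlab session
    if (!ep) return samples;

    engEvalString(ep, script);                 // e.g. "stim = generate_stimuli();"
    mxArray* a = engGetVariable(ep, var);      // fetch the stimulus vector
    if (a) {
        const double* p = mxGetPr(a);
        samples.assign(p, p + mxGetNumberOfElements(a));
        mxDestroyArray(a);
    }
    engClose(ep);
    return samples;
}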
Figure 1.15 illustrates the framework for the SystemC verification. The
bridge between Matlab and SystemC allows:
A seamless design development from Matlab to SystemC, from algo-
rithmic to architectural level.
A mathematical and graphical analysis environment for SystemC.
The configuration file of the SystemC model allows the user to specify at
will:
SystemC with VHDL. Figure 1.16 depicts the SystemC / VHDL co-simulation
for the architecture given in figure 1.1. A section implemented as dedicated
logic (S4) is replaced by its VHDL counterpart, in order to validate the
SystemC with ISS. In figure 1.17 the SystemC model of the general pur-
pose processor firmware is substituted with its Instruction Set Simulator (ISS)
counterpart. This follows the same principles as explained earlier in the context
of the custom processors. The aim is to verify the firmware implementation
before any of the VHDL models is ready for the hardware part.
SystemC with VHDL and ISS. In the eventuality that the firmware is ready
before the hardware parts are transferred into VHDL descriptions, a SystemC /
VHDL / ISS co-simulation can be run (figure 1.18). Firmware code debugging
can be started before all the hardware components are ready, in order to reduce
overall design cycle time.
VHDL with ISS. This step accurately validates the hardware/firmware com-
munication and follows conventional lines.
1.6 Conclusions
From its conception the project team recognised the need to adopt new and
often untried practices in order to manage successfully the development of a
complex design.
The combination of tools and techniques used for the development proved to
support each other well and offer many advantages over traditional specification
techniques. Executable modelling of the architecture allowed real confidence
in the quality of specification to be gained early in the project—many speci-
fication errors were undoubtedly discovered and rectified much earlier in the
process than would conventionally have been expected. On top of this the com-
munication of the specification to the detailed design teams was smoother and
less error-prone, both because of the formal nature of the specification and the
involvement of the designers in the modelling process.
Acknowledgments
The authors would like to thank Jacques Wenin for his support during the
project and his far sighted appreciation of the value of process improvement,
and Trevor Moore for his valuable contribution to the presented work.
Furthermore‚ the methodology presented was derived from the work done
in the Wireless LAN project; it would never have crystallized without the daily
work and contributions of the system, hardware, and software teams in Belgium
and Turkey.
Chapter 2

Using Transactional Level Models in a SoC Design Flow

Alain Clouard, Kshitiz Jain, Frank Ghenassia, Laurent Maillet-Contoz,
Jean-Philippe Strassen
Abstract Embedded software accounts for more than half of the total development time of a
system on a chip (SoC). The complexity of the hardware is becoming so high that
the definition of the chip architecture and the verification of the implementation
require new techniques. In this chapter we describe our proposed methodology
for supporting these new challenges as an extension of the ASIC flow. Our main
contribution is the identification and systematic usage in an industrial environ-
ment of an abstraction layer that describes SoC architecture to enable three critical
activities: early software development, functional verification and architecture
analysis. The models are also referred to as Transaction Level Models (TLM)
because they rely on the concept of transactions to communicate. Examples of
a multimedia platform and of an ARM subsystem highlight practical benefits of
our approach.
2.1 Introduction
Multi-million gate circuits currently under design with the latest CMOS tech-
nologies not only include hardwired functionalities but also embedded software
running most often on more than one processor. This complexity is driving the
need for extensions of the traditional ‘RTL to Layout’ design and verification
flow. In fact, these chips are complete systems: system on a chip.
Systems on chip, as the name implies, are complete systems composed of
processors, busses, hardware accelerators, I/O peripherals, analog/RF devices,
memories, and the embedded software. Less than a decade ago these compo-
nents were assembled on boards; nowadays they can be embedded in a single
circuit. This added complexity has two major consequences: (i) the mandatory
reuse of many existing IPs to avoid redesigning the entire chip from scratch for
each new generation; and (ii) the use of embedded software to provide major
parts of the expected functionality of the chip. These two evolutions lead to the
concept of platform based design. As extensively described in the literature,
a platform is based on a core subsystem which is common to all circuits that
are built as derivatives of the platform. The platform is then customized for
specific needs (specialized functionalities, cost, power, etc.) by either adding
or removing hardware and software components. The embedded software ac-
counts for more than half of the total expected functionality of the circuit and,
most often, most of the modifications that occur during the design of a chip
based on an existing platform are software updates. An obvious consequence
of this statement is that the critical path for the development of such a circuit
is the software, not the hardware. Enabling software development to start very
early in the development cycle is therefore of paramount importance to reduce
the time to market. At the same time it is worthwhile noticing that adding
a significant amount of functionality to an existing core platform may have a sig-
nificant impact on the real time behavior of the circuit, and many applications
that these chips are used in have strong real time constraints (e.g. automotive,
multimedia, telecom). It is therefore equally important to be able to analyze the
impact of adding new functionality to a platform with respect to the expected
real time behavior. This latter activity relates to the performance analysis of the
defined architecture. The functional verification of IPs that compose the sys-
tem as well as their integration has also become crucial. The design flow must
support an efficient verification process to reduce the development time and
also to avoid silicon re-spins which could jeopardize the return on investment
of the product under design. We will also describe how the proposed approach
strengthens the verification process. At STMicroelectronics one direction to
address these two issues is to extend the CAD solution proposed to product
divisions, known as Unicad, beyond the RTL entry point; this extension is re-
ferred to as the System to RTL flow. As the current ASIC flow mainly relies
on three implementation views of a design, namely the layout, gate and RTL
levels, the extended flow adds two new views: TLM and algorithmic. In the
remainder of this chapter we first introduce the System to RTL design and ver-
ification flow with its objectives and benefits. We then focus on the TLM view
with the description of the modeling style and also the usage of these models to
support critical activities of platform based design: early embedded software
development, early architecture analysis and reference model for the functional
verification. Finally, we present how we used this approach for the design of
a multimedia multiprocessor platform as well as the modeling approach of an
ARM based subsystem.
SoC architecture (i.e. SoC-A or SoC TLM platform) captures all infor-
mation required to program the embedded software of the circuit;
All views have complementary objectives and indeed avoid the usual conflict
between the need for accurate descriptions that lead to slow simulation speeds
and the request to simulate real time embedded application software. The SoC
microarchitecture view is aimed at:
Low level embedded software debug and validation, in the context of the
real (simulated) hardware. Our goal is to enable the debug of the device
drivers and their integration into the target operating system before the
first breadboard or even the hardware emulator is available;
SoC functional verification. We aim at verifying that the IPs, once inte-
grated, still provide the expected functionality and that the parallel ex-
ecution and communication of the blocks do not corrupt their behavior.
Some blocks are processors which run software. The verification activ-
ity must take this fact into account and validate all possible interactions
between the hardware and software components;
Finally, the SoC functionality, as its name implies, specifies the expected
behavior of the circuit as seen by the user. It is used as an executable specifica-
tion of the functionality, and is most often composed of algorithmic software.
Performance figures are specified separately, often as a paper specification.
The proposed views can be integrated gracefully into the SoC development
process, as explained below. The functionality is independent of the architecture
so that its corresponding view can be started as soon as there is a need for a
product specification.
While software and architecture teams are using the same reference SoC TLM
model for firmware development and coarse grain architecture analysis, the RTL
development takes place, leading to a SoC RTL Platform, as depicted in Figure
2.2. By the time this platform is available, some tasks more closely related to
the hardware implementation are ready to start: fine grain architecture analysis,
hardware verification and low level software integration with hardware. These
tasks are conducted concurrently with the emulation setup, the synthesis and
back end implementation of the standard ASIC design flow.
In the end, when the first breadboards are ready, the software (drivers, firm-
ware and a simplified application) is also completed with good confidence and
the SoC has gone through a thorough verification process that increases the
probability of achieving first time silicon/system success.
Speed: models used for the above activities require millions of simulated
cycles in some cases and it is not acceptable to wait for even a day to
complete a simulation run, especially in the case of interactive sessions!
SoC TLM for early usage with a relatively lightweight development cost.
This abstraction level sits in between the cycle accurate, bit true model and
the untimed algorithmic models and is, in our opinion, the adequate trade
off between speed (at least 1000 times faster than RTL in our experience,
as depicted in Figure 2.3) and accuracy, when complemented with a SoC
RTL platform;
SoC RTL platform for fine grain, cycle accurate simulations at the cost
of slower simulation speed and later availability.
Let us now define the terms and notions required to understand the TLM mod-
eling approach. A system provides functionality with adequate performance to
a user. It is composed of a mix of hardware and software.
A SoC is a special case of a system in which both hardware and software
are embedded into a single circuit. A system model is composed of a set of
elements, named modules, which:
Communicate data between each other (and potentially with the test
bench) to perform the system functionality. This may be a character,
data of a full image, etc. Each data set exchange is called a transaction.
Two consecutive transactions may transfer data sets of variable size. The
size corresponds to the amount of data being exchanged between two
system synchronizations, as defined below;
Synchronize between each other (and potentially with the test bench).
System synchronization is an explicit action between at least two modules
that need to coordinate some behavior distributed over the modules that
rely on this synchronization. Such coordination is required to ensure
predictable system behavior. An example of such synchronization is an
interrupt raised by a DMA to notify a transfer completion.
The simulation speed up achieved by a TLM model compared to the equiv-
alent RTL model is directly related to the mean number of cycles between
two system synchronizations, an interval which therefore includes one transaction.
The presence of explicit system synchronizations is a mandatory condition for
a TLM model to behave correctly because it is the only means of synchronizing
all the concurrent events that occur in the system. In our experience, all sys-
tems comply with this rule. Figure 2.3 provides approximate ratios in terms of
simulation speed up and modeling efforts.
For specific purposes it might be needed to refine a TLM model to include
some information that relates to the microarchitecture. In this case we name the
initial model that only captures the architecture TLM-A (TLM Architecture);
the refined model is named TLM-MA (TLM Microarchitecture). Of course, for
efficiency reasons a methodology is provided to ensure that the source code for
the TLM IP can be compiled to build either the TLM-A or the TLM-MA
model with no modification of the source code. In this chapter, unless explicitly
mentioned, the term TLM corresponds to TLM-A. In terms of abstraction levels
such a model lies in between the architecture and microarchitecture levels, and
its precise definition is therefore much more fuzzy.
simulation, resulting in slow execution speed that only enables reduced parts of
the embedded software to be run. Also, major hardware modifications are too
costly because the design is too advanced.
SoC TLM platforms can be delivered shortly after the architecture is spec-
ified. An important consequence is that it becomes available for software de-
velopment very early in the hardware design phase. True hardware/software
co-development is then possible. Hardware and software teams can work to-
gether using the TLM platform as the ‘contract’ between the two teams. The
TLM models then become the reference models that software developers use
to run their code. The same models are used as golden reference models by the
hardware teams.
The TLM model can therefore be used in place of the manual process under-
taken by the verification engineer to generate expected results, as seen in Figure
2.4. Applying the test scenarios on the TLM model will automatically generate
the expected (also named reference) output. Of course, a recurrent question is
how to make sure that the executable specification complies with the written
specification, and therefore what is the level of confidence that the generated
SoC System Level Verification. In a typical design flow, once IPs are
verified independently, they must also be verified in the SoC context. Functional
integration tests are developed to verify features such as:
The same approach as for block level verification holds. The SoC TLM platform
is used to generate expected system reference output for a given scenario.
A SystemC test bench will then load the trace of transaction and signal values
to drive the TLM model. The output is then compared with the reference data
from the RTL test bench. Of course, this approach is limited to test benches that
do not need to dynamically adapt the generated test vectors to some response
from the design under test.
While reusing test vectors that were not initially built to run on a pure func-
tional model, some limitations may force manual intervention or more sophis-
ticated comparison methods to be implemented:
synchronization nor part of the bus protocol (e.g. reset, clock) have no direct
equivalent in the TLM model. A specific stub (either in VHDL, Verilog, or
SystemC) may be required.
place of the simple ‘first in first out’ policy¹ of the TLM-A model. Typical
models which fall into this category are busses and memory controllers.
The subsection 2.8.2 illustrates this approach.
In both cases the timing delays are modeled with calls to the SystemC wait
statement.
Depending on the design phase, the timing information can either come from
the architect or from analysis of the equivalent RTL simulation. For a true top
down flow the architect, based on some preliminary benchmarking experiments,
will initially insert it. Once the RTL becomes available, timing information will
be extracted from the RTL simulation and used to annotate the equivalent TLM
model. This activity is called back annotation.
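As a minimal illustration of such annotation (the module and the 40 ns delay are invented for the example; in practice the value would come from the architect or be back annotated from RTL simulation), a target thread simply wraps its functional behaviour with a wait on the annotated servicing time:

// Minimal illustration: a target that models its servicing delay with a
// SystemC wait; the delay value is an arbitrary example.
#include <systemc.h>

SC_MODULE(TimedTarget) {
    sc_fifo_in<int>  req;
    sc_fifo_out<int> rsp;

    SC_CTOR(TimedTarget) { SC_THREAD(serve); }

    void serve() {
        for (;;) {
            int data = req.read();        // functional behaviour (as in TLM-A)
            wait(sc_time(40, SC_NS));     // annotated servicing delay (TLM-MA)
            rsp.write(data);
        }
    }
};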
¹ Such a policy is optimized for speed because it removes the need for a delta cycle.
ysis results may be displayed in other diagram formats: pie charts, his-
tograms, etc. (see Figure 2.7).
In addition, SysProbe enables the user to see which lines of embedded soft-
ware source code have created specific transactions which the user has selected
in the transaction viewer. This is a powerful ‘hardware back to software’ bug
tracking and analysis feature.
General requirements:
The IP interface should enable the following activities:
Reuse of IP-A models: architectural IP models should be plugged into a
SoC-A model and ready to simulate with no code modification, regardless
of their microarchitectural implementation. A consequence is the ability
to combine IP-A models whose implementations do not use the same
protocol (i.e. AHB, STBus, etc.);
Intuitive API: enable intuitive usage of the API for the required modeling activity
so that the learning curve is minimized and potential misuse is avoided;
Control: IP-A and IP-MA models differ in the communication control layer
they expect. An IP-A model only expects to read and write data whereas an
IP-MA model expects to control the handshaking mechanism that is necessary
for transferring the data. In this respect the control sequence is more complex
for microarchitectural (MA) models.
Timing: no timing information is necessary for IP-A models. They only capture
the sequence of events occurring in the system as well as the data computation. Apart
from simple IPs (i.e. IP-A and IP-MA are almost identical), IP-A models are
not suited to represent timing behavior.
On the contrary, IP-MA models capture how data is exchanged and also
the internal structure of the IPs. They are consequently sufficiently detailed to
enable accurate timing annotation.
TLM API.
TLM transport:
The TLM API enables the transport of transactions from IP to IP. A trans-
action is used to exchange information between two or more IPs. A transaction
might involve bi-directional exchange of information.
A transaction conveys the following information:
Initiator data set: data sent by the initiator of the transaction exchange;
Target data set: data sent back by the target of the transaction exchange;
Initiator access port;
Target access port.
A corresponding structure follows:
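The original structure definition is not reproduced in this excerpt; the fragment below is only a plausible reconstruction of such a transport layer, with all type and field names assumed rather than taken from the actual STMicroelectronics API.

// Plausible reconstruction (all names assumed) of the transaction structure
// and transport call described in the text; not the actual API.
class tlm_transaction_data { };              // empty base, refined per protocol

enum tlm_status { TLM_OK, TLM_ERROR };

struct tlm_transaction {
    tlm_transaction_data* initiator_data;    // data sent by the initiator
    tlm_transaction_data* target_data;       // data sent back by the target
    unsigned              initiator_port;    // initiator access port
    unsigned              target_port;       // target access port
    unsigned long long    address;           // used by channels to route the request
};

class tlm_transport_if {                     // generic transport layer
public:
    virtual tlm_status transport(tlm_transaction& t) = 0;
    virtual ~tlm_transport_if() {}
};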
where the parameter t is a reference to the transaction structure and the return
value provides status information (i.e. transaction completed successfully or
not and maybe further details in case of failure).
This API can be considered as a generic transport layer. The actual con-
tent (i.e. the meaning) of the transaction is considered as another layer to
define on top of this API. The mechanism to define this content is as follows:
tlm_transaction_data is an empty class. The actual structure being trans-
ferred is a class instance whose class derives from tlm_transaction_data:
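Again the original listing is missing here; a hypothetical derived payload might look like this (field names are assumptions):

// Illustrative only: a protocol level payload derived from the empty
// tlm_transaction_data base class.
#include <vector>
#include <stdint.h>

class tlm_transaction_data { };                     // the empty base class

struct tlm_rw_data : public tlm_transaction_data {  // simple read/write payload
    bool                  is_write;                 // direction of the access
    uint64_t              address;                  // target address
    std::vector<uint8_t>  bytes;                    // data set of variable size
};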
TLM protocol:
Relying on the transport API, it is possible to define transaction structures that
reflect the protocols required to exchange data. Structures corresponding to the
following TLM protocols are available as examples and also with the aim of
covering most needs for modeling SoC on-chip communication mechanisms.
AMBA for support of IP-MA models based on the AMBA protocol. Single
transfers, bursts, and locked transfers are supported.
STBus for support of IP-MA models based on the STBus protocol. Re-
quest/acknowledge and separation between requests and responses are
supported, supporting sequences as described in Figure 2.2. Such a generic
protocol can also be derived for specific ones such as the STBus.
OCP for support of IP-MA models based on the OCP protocol. Features
are very similar to those of the STBus.
4 [optional] A wait statement is executed before the user code starts execut-
ing again, therefore providing a mechanism to model concurrency on each
transaction ‘boundary’. This feature is mandatory for cases when trans-
actions might represent system synchronizations (e.g. polling). When a
transaction ends with a call to a wait statement, we name it a scheduled
transaction. It is otherwise named unscheduled transaction;
As depicted in Figure 2.10, the proposed mechanism enables the code of the
transaction transfer to be executed in the initiator thread, therefore reducing the
total number of required threads (and the related number of context switches).
Obviously no restriction is imposed on the receiving module regarding the usage
of SystemC processes to model the IP (i.e. the other part of the IP behavior
may require the usage of processes).
When timing is annotated on the models, the following timing attributes
of a transaction can be identified: initiation time; grant time; reception time;
servicing delay; and acknowledgement.
Routing. The address field of the transaction is used to route the request
to the relevant module servicing the transaction. Channels can either rely on
their own information to route the request or they can use a decode method (if
provided in the API) to access relevant information located in the targets. The
memory map information can either be distributed over the IP models (each
model knows its address range) or centralized in the channel. Intermediate
solutions can be proposed: information is distributed but cached in the channel
at initialization.
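A small, assumed sketch of the centralized variant is given below; a distributed memory map would instead ask each target model to decode the address.

// Sketch (names assumed) of address based routing with a centralized memory
// map held in the channel; decode() returns the target that services the address.
#include <map>
#include <stdint.h>

struct address_range { uint64_t base; uint64_t size; };

class simple_router {
public:
    void map_target(int target_id, address_range r) { m_map[target_id] = r; }

    int decode(uint64_t address) const {
        for (std::map<int, address_range>::const_iterator it = m_map.begin();
             it != m_map.end(); ++it) {
            if (address >= it->second.base &&
                address <  it->second.base + it->second.size)
                return it->first;
        }
        return -1;                        // no target claims this address
    }

private:
    std::map<int, address_range> m_map;
};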
the channel while another will access it. In this case both IPs will initiate the
access to the channel.
This scenario can be used if the data needs to be stored in the channel (e.g.
FIFO, mailboxes). This approach can be suited for example for data flow
modeling. In this case, the transaction operations are simple read/writes, as
proposed in TLM-A.
By using the same TLM API the mapping process from pure functional
models (connected via FIFOs for example) to SoC-A and then refinement to
SoC-MA levels will probably be facilitated because automation capabilities can
be reused from one level to another.
of the coding and decoding tasks according to the current status of the system.
Hence a task will be executed only if the relevant data is available, and will
be suspended otherwise. Because of the internal pipeline of operations several
tasks may be activated at the same time. Synchronization is achieved by writ-
ing/reading command and status registers of the different operators. Second,
data exchanges between the operators and the memory are blocking operations.
The synchronization scheme of the platform ensures that an operator will re-
sume its computation only once the previous transaction is completed from the
system point of view.
In the platform we model the data exchanges with arbitrary sizes, with respect
to the system semantics of the exchange. Transactions between the camera and
the grabber transfer images line by line, whilst transactions between the MSQ
and the other operators are 32 bits wide.
2.7.4 Monitoring
Monitoring at the SoC architecture level is well suited to assisting developers
in the understanding of the data exchanges in the system. It helps for embedded
software debugging (understanding write and read operations on command/s-
tatus registers), and it is mandatory for understanding the temporal behavior of
the different concurrent processes which are executed in the different modules.
In order to avoid a heavy instrumentation of all the models, which is tedious
The modeling choices inside the IPs enable a significant gain in terms of
model size: TLM models are 10× smaller than RTL. Consequently they are
easy to write and fast to simulate. The simulation speed is 1400× faster for
TLM compared to the RTL model on a SUN Ultra 10 workstation (330 MHz,
256 MB memory).
The RTL model simulates one coded image in one hour, whilst TLM mod-
els in SystemC require 2.5s per coded image. Before the availability of the
Accellera SCE-MI interface [Accellera, 2002a] on the Mentor Celaro hard-
We explain the abstraction choices made while modeling the ARM PrimeCell
Direct Memory Access Controller (DMAC PL080) of the PrimeXsys Wireless
Platform [ARM, 2002]. This DMA controller is a good example of TLM
modeling choices because the controller needs a parallel style of modeling to
capture correctly the whole behavior. Also some of this hardware parallelism
is not controlled by software but is hardware logic only.
usage of the TLM model as a golden reference model for the RTL verification
and also for co-simulation activities.
AHB slave Port: This interface is used to access the DMAC registers. One
should take care that the (a priori asynchronous) update of DMAC TLM slave
registers by the external embedded software does not corrupt the internal
behavior that relies on them. SystemC runs with non-pre-emptable threads.
The SystemC simulation can be controlled in each thread to yield control only
at known pre-emption points. It is the responsibility of the designer of the
module to ensure that pre-emption points are carefully defined. Modeling the
system level synchronizations helps to achieve both a safe system design and a
correspondingly correct SystemC simulation.
AHB Master Ports: The real DMAC has two master ports, allowing two
simultaneous read and write operations. In a TLM platform used for architecture
analysis, both ports are required in order to provide realistic traffic generation.
Even in a TLM platform used only for embedded software development there
is a need for modeling the two ports because each one has specific registers that
are accessed by the embedded software.
Three Interrupt Signals: They represent system events and therefore need to
be modeled.
Register bank: The registers are modeled as data members of the TLM module's
class definition. For DMAC TLM the size of all the registers can be made
equal, e.g. to 32 bits (the maximum register size or the host machine word size,
whichever is larger).
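For illustration only (the register map size and access scheme are assumptions, not the actual DMAC model), such a register bank might be held as plain 32 bit members or an array inside the TLM module class:

// Illustration only: DMAC style registers held as 32 bit data members (here an
// array) of the TLM module class.
#include <systemc.h>
#include <cstring>
#include <stdint.h>

SC_MODULE(DmacTlm) {
    SC_CTOR(DmacTlm) { std::memset(regs, 0, sizeof(regs)); }

    // The AHB slave port handler ends up in simple accessors like these.
    uint32_t read_reg(uint32_t offset) const {
        return (offset / 4 < NUM_REGS) ? regs[offset / 4] : 0;
    }
    void write_reg(uint32_t offset, uint32_t value) {
        if (offset / 4 < NUM_REGS) regs[offset / 4] = value;
    }

private:
    static const uint32_t NUM_REGS = 64;  // covers the assumed register map
    uint32_t regs[NUM_REGS];              // every register modeled as 32 bits wide
};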
As a general rule for defining how to model the internal TLM behavior of
an IP, we can accept any deviation from the internal functionality of the block as
long as the functionality of the block remains the same from the software point
of view. These abstractions help reduce the modeling effort and simulation time
of the block.
The DMA includes two internal arbiters (one per bus port). The arbiter
implements a priority-based algorithm, i.e. the highest priority channel with
pending transactions will be allowed to transfer data. When a channel suspends
transferring data because no input data is available, the arbiter will grant access
to another channel of lower priority. This last feature enables the optimization of
data transfers. It does not need to be part of the TLM model to ensure the correct
execution of the embedded software. However, this model cannot be simply
annotated with timing information to reflect its correct timing behavior. A
correct interleaving of data transfers between active channels must be modeled
in this case. The addition of such a scheduling scheme will refine the model into
a TLM-MA level suited for architecture analysis. The model can be compiled
with or without the added functionality.
methods that call the SystemC wait function with a delay equivalent to the given
number of cycles for that situation (this duration is one of the 256 values). The
timed TLM simulation of the memory controller provides the adequate cycle
count estimation compared to RTL simulation.
The resulting simulation platform comprises a mix of TLM-A and TLM-MA
models with sufficient timing accuracy to be used for performance analysis.
Such a trace can be viewed in Figure 2.18. The upper diagram is RTL and the
lower diagram is TLM.
In each diagram the upper line represents transactions on the instruction bus
(AHB-I), the lower line displays transactions of the data bus (AHB-D). Both
busses are connected to a single Static Memory Controller (SMC) with separate
ports. Data is stored in a flash while instructions are stored in a ROM. Hence the
two AHB busses are concurrently trying to gain access to memory banks via the
SMC. When such conflicts occur, the instruction fetch transaction takes longer.
This is visible on both the RTL and the timed TLM simulation. In fact, thanks to
back-annotation from RTL of the memory controller the accesses have the same
duration as in the RTL simulation while running at the TLM simulation speed.
As a conclusion, with time annotated TLM models, architects can now estimate
SoC performance with light modeling efforts and fast simulations which include
both hardware and software, with easily modifiable hardware models, without
having the RTL of the whole SoC, and without modeling at cycle accurate (low)
level.
2.9 Conclusions
As highlighted in the chapter, we have introduced a new abstraction level in
our SoC design flow to address the needs of early embedded software develop-
ment, architecture analysis, and functional verification. This level captures the
architecture of the IP (or SoC) and complements the RTL level which models
the microarchitecture. The combined usage of these two views avoids the usual
conflict between the need for accurate descriptions and the request to simulate
at high speed the real time embedded application software. This flow extension
has successfully been applied to SoC design and verification activities in an in-
dustrial context involving very large teams with restricted manpower dedicated
to the introduction of new techniques.
Acknowledgments
This chapter is an overview of the system level design flow developed at
STMicroelectronics. We would like to thank all our colleagues who contributed
to the developments described in the chapter. In particular, we thank Eric
Auger and Nicolas Mareau for their major contributions to the development of
‘RTL platforms’, Antoine Perrin and Sudhanshu Chadha for the development
of SysProbe and Rohit Jindal for his contribution to TLM models. We also
gratefully acknowledge the support of Srikant Modugula for the coordination
between the teams of Crolles (France) and Noida (India). Thanks to Jean-
Claude Bauer and Murielle Icard-Paret for their active contribution to the review
and improvement of the chapter. Last, but not least, we would like to thank
Philippe Magarshack for his continuous support and encouragement during the
development of the flow.
Chapter 3

Refining a High Level SystemC Model

Bernhard Niemann, Frank Mayer, Francisco Javier Rabano Rubio, Martin Speitel
Abstract The objective of this paper is to present a possible flow, using SystemC, to make
the transition from a high level data flow description towards an implementable
model. The main focus is behavior refinement, in other words the modification
of the module descriptions until they can be mapped to a given architecture,
satisfying the implementation constraints. A customizable processor core that
executes an ANSI C version of the system model is utilized. Only operations
that require a lot of computational power are implemented as custom hardware
modules to extend the core functionality. A simple operator counting technique
is used inside the high level model to obtain run time estimates for hardware-
software partitioning.
Outline. This article consists of four main parts. The first two sections
deal with the description of the OFDM system and the system model used for
this work. After a brief presentation of the OFDM demodulator and its basic
concepts a thorough description of the high level data flow model is given.
The principles of data flow modeling and the techniques used to allow for easy
refinement in later stages are presented.
The refinement process is presented in the last two sections. They consist of a
description of the technique used to get from the data flow model to a first ANSI
C model and a discussion of the refinement process used to increase the speed
of that ANSI C model. The steps carried out to replace the modules and ‘First
In First Out’ (FIFO) channels of the data flow model by C language constructs
are described. Going one step back to the SystemC data flow model, the flow
from early performance analysis to implementation of the operator grouping is
described.
bile reception and difficult environments with rapidly changing channel condi-
tions. Examples of such systems are the wireless LAN standard 802.11a, or
broadcasting systems such as XM Radio [XM Satellite Radio, 2003] and Sirius
Satellite Radio [Sirius Satellite Radio, 2003] in the U.S.A.
A common problem of these channels is deep narrow band fading due to
multipath reception in an urban environment and narrow band interference.
The basic idea of an OFDM system is to divide the available frequency band
into small sub-bands. The distribution of the information over the available
subcarriers results in a rather low datarate for each subcarrier. Every sub-band
or subcarrier is modulated using Quadrature Amplitude Modulation (QAM).
The QAM scheme depends on the signal to noise ratio in the subcarrier, and
typically varies from Binary Phase Shift Keying (BPSK) up to 64 QAM.
To have the best spectrum efficiency the different modulated subcarriers
have to be orthogonal. By using the Inverse Fast Fourier Transform (IFFT) for
modulation the subcarriers are implicitly chosen to be orthogonal. Using an
FFT in the receiver, the information carried by the OFDM signal is efficiently
restored.
The critical part of an OFDM demodulation is the synchronization of the in-
coming signal. The FFT processing must be synchronized to the boundaries of
the OFDM symbols. The system is sensitive to phase noise and frequency
offset. Therefore symbol tracking and the correction of the frequency offset
and deviation is crucial.
The OFDM demodulator performs the following processing steps:
mixing to base band, down sampling and filtering of the received signal;
fine frequency synchronization (frequency offset correction);
synchronization of the OFDM symbols;
subcarrier separation by FFT computation;
QAM demapping, error phase offset compensation and Viterbi metrics
generation;
error correction.
Figure 3.1 shows a block diagram of the system. The Numerical Controlled
Oscillator (NCO) at the input is used for frequency and phase correction and has
to shift the required frequency band exactly into the base band. It is controlled
by the fine frequency module, which determines the correction signals to the
NCO. The discriminator characteristics are implemented in the fine frequency
module. After the NCO the incoming data samples are filtered and down-
sampled by a factor of two.
The next module is the frame and symbol synchronization, in which the
boundaries of each OFDM symbol are detected and the data blocks for the FFT
are defined. Within the FFT module the subcarriers of the OFDM symbol are
separated. The outputs of the FFT module are complex values for each sub-
carrier, representing the transmitted signal information. The post-processing
demaps the OFDM symbols, compensates the error phase offset, and generates
the Viterbi metrics. The last processing stage comprises error correction,
interleaving, and de-multiplexing of the data stream, which are not shown in
this article.
Several models of computation (MOCs) can be described with SystemC 2.0.
One of the supported MOCs is data flow modeling, as used for
the implementation of the high level model.
The module has an input port inp of type sc_fifo_in<T> and an output port
outp of type sc_fifo_out<T>. Both ports are public members of the module,
because they need to be accessed from outside the module at instantiation, in
order to connect FIFO channels.
As the constructor should take an argument to determine the offset value,
the SC_HAS_PROCESS macro has to be used before defining the code of the con-
structor [OSCI, 2002a]. Within the constructor the member function main()
is registered as an SC_THREAD process. An empty destructor is given for com-
pleteness but could have also been omitted in this simple example.
The process code follows a scheme used for all the modules of the high level
model. First of all, note that the process code is enclosed by an infinite loop,
which is required by the SC_THREAD process type [OSCI, 2002b] [Niemann,
2001]. Inside this loop the first action is to read the value of the input port
to a temporary variable using blocking read. The temporary variable tmp is
defined outside the infinite loop to avoid the run time overhead of re-defining
the variable at each loop iteration. The next line represents the data processing
of the module, which is performed using the temporary variable (of course,
in more complex modules, data processing will require much more than one
simple line of code). After all processing has been done, the result is written to
the output port using a blocking write operation.
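As an illustration, a minimal module following this scheme might look as shown below; the module name, the data type and the offset computation are chosen for illustration only and are not taken from the original model.

#include <systemc.h>

// Hypothetical scalar data type; the real model uses typedefs from a common header.
typedef float T_DATA;

SC_MODULE(offset_adder)              // illustrative module name
{
    sc_fifo_in<T_DATA>  inp;         // FIFO input port
    sc_fifo_out<T_DATA> outp;        // FIFO output port

    SC_HAS_PROCESS(offset_adder);    // needed because the constructor takes an argument

    offset_adder(sc_module_name name, T_DATA offset)
        : sc_module(name), m_offset(offset)
    {
        SC_THREAD(main);             // register main() as an SC_THREAD process
    }

    ~offset_adder() {}               // empty destructor, given for completeness

    void main()
    {
        T_DATA tmp;                  // defined outside the loop to avoid re-definition
        while (true)                 // infinite loop required for an SC_THREAD
        {
            tmp = inp.read();        // blocking read from the input FIFO
            tmp += m_offset;         // the actual data processing (one line here)
            outp.write(tmp);         // blocking write to the output FIFO
        }
    }

  private:
    T_DATA m_offset;
};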
The input signals of the high level model are two streams of floating point
samples, interpreted as one stream of complex floating point samples. After
the samples have passed the nco_mix and the lp_filter modules, synchronization
is performed by the sym_frame_syn module and the samples are aligned and
assembled into data blocks.
All the further processing (fft, post_proc and fine_freq) is block based.
The OFDM demodulator contains one feedback loop for frequency control.
A feedback loop in a pure FIFO based data flow model would be implemented
using a delay module inside the feedback loop. For the OFDM demodulator,
the delay module would have to be placed between fine_freq and nco_mix,
because otherwise the nco_mix module would try to read a sample from an
empty FIFO in the feedback loop and the complete simulation would be blocked.
The approach used in the model presented here is to use a signal instead of a
FIFO for the connection of fine_freq and nco_mix. This has the advantage
that the frequency correction is applied to the nco_mix module as soon as it is
available inside the fine_freq module. Moreover, as reading from a signal
never blocks, a delay module is not necessary in this case.
To enable a smooth refinement some simple rules were followed during the
implementation of the high level model:
Use typedefs for the data types of all input and output ports;
Only use one process within each module;
Structure the algorithm into different sub-tasks and use member functions
to encapsulate those tasks.
Apart from the six algorithmic blocks, there are two modules which control
the data flow through the system (split and data_block2smp). Table 3.3
lists the properties of these modules. The data_block2smp module is used to
convert blocks of data back to a stream of samples, which can then be written
to a file. The split module is necessary because in SystemC an sc_fifo<>
channel may only be connected to one input port. However, the output of the
FFT is needed in the post-processing module as well as in the feedback loop
to calculate the frequency offset estimates for the NCO. The splitter modules
simply copy data of one input port to multiple output ports (in this case two).
The split modules as implemented in this model can handle any data type that
has an assignment operator (operator=()) defined.
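A sketch of such a splitter with two outputs is given below; the class and port names are illustrative and the real implementation may differ.

// Sketch of a 1-to-2 splitter for any data type T with an assignment operator.
template <class T>
SC_MODULE(split2)
{
    sc_fifo_in<T>  inp;
    sc_fifo_out<T> outp1;
    sc_fifo_out<T> outp2;

    SC_CTOR(split2)
    {
        SC_THREAD(main);
    }

    void main()
    {
        T tmp;
        while (true)
        {
            tmp = inp.read();    // read once from the single input FIFO
            outp1.write(tmp);    // copy to the first output (uses operator=())
            outp2.write(tmp);    // copy to the second output
        }
    }
};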
The user defined types T_FFT and T_FREQ are used for the ports of the
modules. Those types are defined inside the file ofdm_demod_pkg.h, which is
included by all modules. The following code snippet, for example, sets both
types to float.
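A sketch consistent with this description (the exact file contents are an assumption):

// Excerpt from ofdm_demod_pkg.h (illustrative): both port types map to float.
typedef float T_FFT;
typedef float T_FREQ;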
The equality operator compares each element of the left hand side with the
corresponding element of the right hand side and returns true only if all
elements match.
The switch from sample based to block based processing requires some
consideration regarding the appropriate FIFO size. In SystemC 2.0 the size of
the FIFOs is determined by the user and not by the simulation kernel. If the
sizes are chosen inappropriately this will result in unnecessary context switches.
A context switch occurs whenever a module tries to read a sample from an
empty FIFO or a module tries to write a sample to a full FIFO, in other words,
whenever the module is blocked. In the sym_frame_syn module, at least the
number of samples comprising one complete block of data is read before the
block is written to the output port. To be able to read all samples required for
one complete data block from the input without a context switch, the connected
FIFOs need to have a size of at least the block size.
Therefore, the following sizing rule was applied to the demodulator model.
Let S be the size of one data block and N the size of the FIFOs for block based
communication,
This sizing rule ensures that a context switch occurs only after the processing
of N complete blocks of data. The performance gain — in terms of simulation
time — resulting from correct FIFO sizing was up to 50% for the OFDM
demodulator, compared to worst case sizing (all FIFOs have size one).
Given these prerequisites, the idea is to replace the data type used in the data
path (double in our model) by a user defined template class specialized by
the data type (profile< double >). This profiling class provides overloaded
implementations of all operators used in the data path. These implementations
call the operators of the underlying data type and additionally dump the type of
operation, the name of the module, and the name of the process to a file.
The implementation of the operator+= is shown in the following code
snippet. It can be seen that apart from performing the add operation, a function
pout() is called with the name of the operator (op+=). This function performs
the dumping of the necessary information to a file.
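A simplified sketch of such a wrapper class is given below; the file name used by pout() is an assumption, and the real implementation also records the module and process names.

#include <fstream>

// Sketch of the profiling wrapper class (simplified).
template <class T>
class profile
{
  public:
    profile() : m_val(T()) {}
    profile(const T& v) : m_val(v) {}

    profile& operator+=(const profile& rhs)
    {
        m_val += rhs.m_val;      // perform the add on the underlying data type
        pout("op+=");            // record that an operator+= was executed
        return *this;
    }

    // further operators (+, -, *, =, ==, ...) would follow the same pattern

  private:
    void pout(const char* op)
    {
        std::ofstream f("profile.log", std::ios::app);   // file name assumed
        f << op << std::endl;
    }

    T m_val;
};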
Figure 3.5. Operator histogram of the low pass filter module for one frame
Figure 3.6. Operator histogram of the FFT module for one frame
The output of a profiling run for the low pass filter module is shown below.
Of course, this data has to be post-processed in order to obtain the number of
operator calls for each type of operator. This is done by a simple Perl script that
takes the output of a profiling run and converts it into a table containing the type
of the operator and the number of invocations. This file may then, for example,
be used as input to a spreadsheet program to create a graphical representation
of the data obtained.
Histograms for all modules can be generated which show how often an op-
erator is called for a given module. Two such histograms are shown in figure 3.5 for
the lp_filter module and in figure 3.6 for the fft module. The number of
processed samples is the same for both figures and corresponds to one frame.
A frame is comprised of several OFDM symbols and has a preamble associated
with it. The details of the frame structure are beyond the scope of this article.
3.4.1 Introduction
The ANSI C description can be seen as an intermediate step in the refine-
ment process towards a real time implementation. It may be used as a common
starting point for implementation on various platforms consisting of a micropro-
cessor and some custom hardware. The advantage of this model is that it allows
for early integration on the target platform and avoids the run time overhead of
a C++ or SystemC based implementation.
As will be shown in 3.4.2, the functions performing the algorithms, and there-
fore the number of operations executed by the model, remain the same during
this refinement step. At the end of this section, in 3.4.4, a comparison is made
between the run time of the model on the target processor, an ARCtangent™-
A4 (further called A4), and the estimates obtained from the operator counting
technique.
The code snippet below shows the implementation of the NCO_MIX() func-
tion. The arguments passed to the function are a pointer to the buffer for the
real input data, a pointer to the buffer for the imaginary input data and a pointer
for the increment of the NCO, which is determined by the fine_freq module.
There are no explicit pointers to the output buffers used by the NCO_MIX func-
tion. This is because the processing is done in place, in other words the samples
in the input buffer are overwritten by the output samples. The same applies to
the COMP_MIXING function, which used to have two additional arguments for
the output data in the high level model (see page 74).
The only task handled by the NCO_MIX() function in the example is to call
the two functions that were previously called by the process main() of the
nco_mix module. Access to the sc_fifo<> ports of the module has also been
removed from the original process code, because communication between the
modules is handled by the framework in the ANSI C implementation.
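A sketch consistent with this description is given below; the NCO() helper, both function signatures and the block length are assumptions.

/* Sketch of the ANSI C NCO_MIX() function. Processing is done in place:
   the input buffers are overwritten with the output samples. */
#define BLOCK_LEN 1024                       /* assumed number of samples     */

void NCO(double *re, double *im, const double *increment);          /* assumed */
void COMP_MIXING(double *re, double *im,
                 const double *nco_re, const double *nco_im);       /* assumed */

void NCO_MIX(double *in_re, double *in_im, double *increment)
{
    static double nco_re[BLOCK_LEN];         /* oscillator output, real part  */
    static double nco_im[BLOCK_LEN];         /* oscillator output, imag. part */

    NCO(nco_re, nco_im, increment);          /* generate the oscillator signal */
    COMP_MIXING(in_re, in_im, nco_re, nco_im);   /* complex mixing, in place   */
}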
Figure 3.7. Structure of the framework used by the ANSI C implementation of the OFDM
demodulator
A layered structure separates the application specific functions from the device specific functions.
READ_FRAME() only calls a generic function to acquire data from the input
device, which has to be re-implemented for the given device.
After having read a frame the processing takes place. The FSM, used to
schedule the function calls for the different blocks of the demodulator, is imple-
mented by OFDM_DEMOD_MOD(). The function returns control to the framework
after each algorithmic block and is consecutively called by the framework until
all blocks have been run on the data stored in the buffer.
The WRITE_FRAME() function, as the last stage of the processing phase,
writes the processed samples to the output device. It has the same layered
structure as already discussed for the function READ_FRAME().
The framework continues with the shutdown phase after all frames available
on the input device have been processed. The memory that was dynamically al-
located by init_OFDM_DEMOD() is now de-allocated by done_OFDM_DEMOD().
The I/O devices are finally closed using the function done_IO().
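A highly simplified sketch of such a framework main loop is given below; init_IO(), frames_available(), the return value convention of OFDM_DEMOD_MOD() and all signatures are assumptions, while the remaining function names are those used in the text.

void init_IO(void);                 /* assumed counterpart of done_IO()       */
void init_OFDM_DEMOD(void);
int  frames_available(void);        /* assumed */
void READ_FRAME(void);
int  OFDM_DEMOD_MOD(void);          /* assumed to return nonzero when all
                                       algorithmic blocks have been executed  */
void WRITE_FRAME(void);
void done_OFDM_DEMOD(void);
void done_IO(void);

int main(void)
{
    init_IO();                      /* open the input and output devices      */
    init_OFDM_DEMOD();              /* allocate the demodulator buffers       */

    while (frames_available())      /* processing phase                       */
    {
        READ_FRAME();               /* acquire one frame from the input device */
        while (!OFDM_DEMOD_MOD())   /* run the FSM block by block until done   */
            ;
        WRITE_FRAME();              /* write processed samples to the output  */
    }

    done_OFDM_DEMOD();              /* shutdown phase: free dynamic memory    */
    done_IO();                      /* close the I/O devices                  */
    return 0;
}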
Using this technique, run time estimates for various operations on the A4
have been obtained. Table 3.5 shows the run times for some operations for the
data type double. As the A4 processor used in this project does not have a
floating point unit, all operations were carried out using floating point emulation.
Therefore the run time required by those operations is much higher than the run
time of the corresponding fixed point or integer operations. Finally, it should
be noted that the run time required for one floating point operation depends
on the input data. The cycle count presented in this article for the various
operations has been obtained using data that results in a high run time, and
therefore presents a worst case estimate.
The first example that will be studied is the low pass filter. A histogram
obtained from counting the operations in the low pass filter has already been
shown in figure 3.5 for processing a complete frame. To make things simpler the
comparison has been carried out using only 2,872 samples instead of a profile
for a complete frame. The reason for choosing exactly 2,872 samples is based
on the internal implementation details of the OFDM demodulator architecture
and lies beyond the scope of this article.
Table 3.6 shows a straightforward calculation of the run time estimate for the
low pass filter by simply multiplying the number of operations by the estimated
run time. It is immediately evident that this seems to deliver a useless result,
because the estimate exceeds the real run time by nearly 60 percent.
Of course, a different implementation of the half band filter could have been
chosen for the high level model where the operations with zero arguments are
avoided. However, in this project a generic implementation of a Finite Impulse
Response (FIR) filter was used in the system-level model and directly re-used
in the ANSI C model.
Because of the large computational power required by the low pass filter and
the complex mixer, these two modules are candidates for being implemented in
hardware. In the following it is assumed that the A4 processor only handles the
block based processing comprising sym_frame_syn, fine_freq, fft and
post_proc. Measurements using an A4 evaluation platform have shown that a
theoretical clock frequency of about 70 GHz would be required to run the block
based processing in real time. This is far from being feasible.
Results from measuring the run time for operations for various data types
have shown that a speed up by a factor of 15 is realistic for a conversion from
double precision floating point to fixed point. That would result in about 4–
5 GHz necessary clock speed. Experience from previous projects shows that
this could be reduced by a further factor of 2–4 if hand optimized code with
assembly language for the critical parts is used. To further reduce the required
clock speed, more modules could be implemented in hardware, for example the
post_proc module. That approach will not be further discussed in this paper.
3.5.1 Introduction
As mentioned earlier, the target processor is the ARCtangent™-A4 [ARC
Int., 2001]. This customizable 32 bit processor core is a soft macro and can
be implemented on virtually all ASIC and FPGA technologies. The user may
select — under the control of a vendor provided configuration tool — different
hardware options for the processor core and generate a human-readable VHDL
or Verilog source database that includes all modules of the core. Configurable
options include the system interfaces, size and organization of instruction and
data caches as well as details of the instruction set.
The performance figures in tables 3.6 to 3.8 are derived from measurements
on the baseline processor. In addition, hardware extensions for a barrel shifter,
a normalize unit, and a 32 bit multiplier may easily be added. These extensions are
known to the compiler engine and suitable instructions are inferred when the
code is translated. Except for address calculations, which may be carried out in
parallel, the parallelism in the core is limited to one operation at a time. Using the
mentioned extensions and assuming fixed point data types and an optimal scheduling
done by the compiler, each simple operation (add, sub, multiply, assignment)
executes in one cycle.
Custom operator extensions exist in parallel to the original processor ALU and
implement the logic for the additional operator(s). Data paths and control logic
are shared with the core processor; execution is under the control of the ordinary
program code.
In this approach to HW/SW partitioning, SystemC is used:
to re-code the data flow model, making use of the additional operators;
SystemC offers very fast turnaround times because only minor modifications
to the original data flow model are required to implement and test different
operator sets.
Fixed point Model. To derive a fixed point model a block floating point
approach is used: All inputs are scaled by a constant factor to the range
[–1, 1[ and treated as numbers consisting only of fractions. 15 bit precision is
used whilst the 16th bit represents the sign; the real and imaginary parts are
packed into one 32 bit word. During FFT processing the intermediate results are
statically re-scaled and rounded to avoid overflows and to preserve reasonable
resolution. After processing, the correct floating point values are recovered by
inverse scaling with the input scale factor and the product of all per-stage
scale factors; using standard floating point representation this is done by adding
a constant value to the original exponents. For reasons of simplicity, in this ar-
ticle the well known radix-2 FFT operation is presented. An ANSI C code
snippet for the butterfly operation is shown below.
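A generic sketch of such a butterfly in 16 bit fixed point arithmetic is shown below; the scaling by one half per stage follows the block floating point scheme described above, but the code is an illustration rather than the original implementation.

/* Radix-2 decimation-in-time butterfly in 16 bit fixed point (Q15 fractions);
   x0, x1 are the data values, (wr, wi) the twiddle factor; the results
   overwrite x0 and x1. */
void butterfly(short *xr0, short *xi0, short *xr1, short *xi1,
               short wr, short wi)
{
    /* complex product t = w * x1, rescaled from Q15*Q15 back to Q15 */
    long tr = ((long)wr * *xr1 - (long)wi * *xi1) >> 15;
    long ti = ((long)wr * *xi1 + (long)wi * *xr1) >> 15;

    /* butterfly outputs, scaled by 1/2 to avoid overflow */
    long yr0 = ((long)*xr0 + tr) >> 1;
    long yi0 = ((long)*xi0 + ti) >> 1;
    long yr1 = ((long)*xr0 - tr) >> 1;
    long yi1 = ((long)*xi0 - ti) >> 1;

    *xr0 = (short)yr0;  *xi0 = (short)yi0;
    *xr1 = (short)yr1;  *xi1 = (short)yi1;
}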
Having later translation back to ANSI C in mind, the custom operators are
implemented as ordinary functions. The functions for operators op21 and op22
are shown in the code snippet below:
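A possible sketch is given below, assuming, purely for illustration, that op21 computes the real part and op22 the imaginary part of the complex product of two values whose 16 bit real and imaginary parts are packed into one 32 bit word each.

typedef long packed_cplx;             /* upper 16 bits: real, lower 16: imag. */

static short re16(packed_cplx c) { return (short)(c >> 16); }
static short im16(packed_cplx c) { return (short)(c & 0xFFFF); }
static packed_cplx pack16(short re, short im)
{
    return (packed_cplx)(((unsigned long)(unsigned short)re << 16)
                         | (unsigned short)im);
}

long op21(packed_cplx x, packed_cplx w)    /* real part of w * x (Q15)        */
{
    return ((long)re16(w) * re16(x) - (long)im16(w) * im16(x)) >> 15;
}

long op22(packed_cplx x, packed_cplx w)    /* imaginary part of w * x (Q15)   */
{
    return ((long)re16(w) * im16(x) + (long)im16(w) * re16(x)) >> 15;
}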
Using the two custom operator functions op21() and op22(), the original
FFT butterfly code may be rewritten as follows:
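Under the same illustrative assumptions as above, the grouped butterfly might read:

/* FFT butterfly recoded with the hypothetical op21()/op22() grouping;
   packing helpers and the scaling by 1/2 follow the sketches above. */
void butterfly_grouped(packed_cplx *x0, packed_cplx *x1, packed_cplx w)
{
    long tr = op21(*x1, w);                 /* real part of w * x1            */
    long ti = op22(*x1, w);                 /* imaginary part of w * x1       */

    long yr0 = (re16(*x0) + tr) >> 1;       /* upper butterfly output         */
    long yi0 = (im16(*x0) + ti) >> 1;
    long yr1 = (re16(*x0) - tr) >> 1;       /* lower butterfly output         */
    long yi1 = (im16(*x0) - ti) >> 1;

    *x0 = pack16((short)yr0, (short)yi0);
    *x1 = pack16((short)yr1, (short)yi1);
}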
3.5.5 Results
A setup identical to the one in chapter 3 is used for operator counting. Table
3.11 gives the results for different combinations of the 4 operator groups defined
in figure 3.9.
As can be seen from the table, case B has the highest operator (and thus clock
cycle) count but the smallest area requirements. Case D provides the best speed
up, although this also corresponds to the highest area. Depending on the cost
function — trade off between speed and area — that is applied, case B may be
of interest, whilst case C, compared to case D, doubles the processing time while
providing only a 10% area gain.
Another important fact that affects the selection of optimal operators is the
number and width of available data paths. In the target processor 2 independent
32 bit data paths for 2 input arguments are routed to the ALU, while only one 32
bit bus exists for the result. Using 16 bit precision and considering all arguments
for the different operators as variable, actually only op1, op21 and op22 may
be implemented, while op3 or op4 would require 6x16 bit (xr0, xi0, xr1, xi1,
wr, wi) and thus exceed the capacity of the input data paths. Moreover, op4
produces results of 4x16 bits (yr0, yi0, yr1, yi1), thus exceeding the capacity
of the data bus used for results.
If op3 and op4 are only used for FFT processing, wi and wr may be assumed
to be semi-static. In fact, wi and wr are constants (twiddle factors), usually
taken from pre-calculated sine and cosine tables. They vary from butterfly to
butterfly, but are independent of the data that is processed by the FFT. In other
words, wi and wr may be generated internally in op3 or op4, for example by
table lookup, and there is no need to explicitly provide wi and wr as arguments.
This limits the input width to a total of 64 bits and enables the implementation
of op3 in the given target hardware. The implementation of op4 would require
additional modifications of the processor core that are beyond the scope of this
article.
3.6 Summary
Modeling and refinement using SystemC was demonstrated in the preceding
sections by means of an OFDM demodulator example. Here the techniques
used to take advantage of the modeling capabilities offered by SystemC are
briefly summarized.
The high level model is used to explore the performance of various algorithms
in terms of bit error rate as a function of signal to noise ratio. Furthermore,
it serves as a basis for further refinement and as golden reference model for
all subsequent steps. It must be possible to simulate a large amount of data
within reasonable time in order to get a reliable measure for the performance.
Moreover, algorithms should be easily interchangeable to allow for seamless
algorithmic exploration. Finally, it should be possible to get to lower levels of
abstraction as easily as possible.
To achieve the abovementioned objectives we have applied the following
modeling principles:
3.7 Conclusions
In this article we have presented refinement of a high level SystemC model
towards embedded software, dedicated hardware, and CPU instruction set ex-
tensions.
Most individual steps we presented could have been carried out with conven-
tional hardware and software design methods using traditional languages like
C, C++ and VHDL. But, with SystemC, a single language, environment, and
methodology may be applied throughout the complete refinement process. In
addition, SystemC offers some unique features that enable a smooth transition
from higher levels of abstraction to the lower ones.
In the high level model we have made use of the encapsulation in modules, the
connection by FIFO channels and the synchronization and scheduling offered
by the built in simulation kernel. SystemC, moreover, helped in retrieving
the number and type of operations for each module of the high level model. In
contrast to commercial data flow modeling systems using proprietary languages,
the SystemC model can be run on a large variety of platforms without the
need to install additional software.
During the transition to ANSI C we had the advantage of a unified develop-
ment environment. Moreover, the task could be split into small, manageable steps
that could always be verified against the high level model.
New complex operators were easily implemented and verified using SystemC
and the high level model. The performance gain of these operations could also
be evaluated using SystemC. The last step would have been to implement those
custom operators in hardware using SystemC for RT level description. This step
was not carried out because the feasibility of hardware implementations using
SystemC has already been proven (see, for example, [Niemann and Speitel,
2001]).
Chapter 4
An ASM Based SystemC Simulation Semantics
Tübingen University, Department of Computer Engineering
Abstract We present a formal definition of the event based SystemC V2.0 simulation se-
mantics by means of distributed Abstract State Machines (ASMs). Our definition
provides a rigorous and concise, yet readable, definition of the SystemC spe-
cific operations and their interaction with the simulation scheduler that covers
channel updates, notify, notify_delayed, wait, and next_trigger operations. We
present the semantics in the form of rules by means of distributed ASMs reflecting
the lines of the SystemC V2.0 Standard Manuals and reference implementation.
The semantics introduced is defined to complement the language reference man-
ual with a precise definition reflecting an abstract model of the SystemC reference
implementation, which can be used for advanced applications and for investigat-
ing interoperabilities with other languages.
4.1 Introduction
SystemC is the emerging de facto standard for system level modeling and
design from the Open SystemC Initiative (OSCI). In 2002, SystemC received a
major revision and upgrade to Version 2.0 which provides the stable basis for fu-
ture versions. With this upgrade the main principles were clarified by a general-
ization, e.g., of the underlying event based synchronizations and channel based
communication concepts. SystemC V2.0 currently comes with well-written
manuals [OSCI, 2002a, OSCI, 2002b], a stable reference implementation, and
complementary text books, e.g., [Grötker et al., 2002]. However, the precise
meaning of several parts of the underlying concepts cannot be easily captured
since natural language descriptions often lack precision. Thus a precise seman-
tics is mandatory for advanced SystemC applications in simulation, synthesis,
and formal verification.
In this chapter, we present a concise and rigorous, yet intuitive, semantic
definition of SystemC Version 2.0 by Gurevich’s distributed Abstract State Ma-
chines [Gurevich, 1995]. ASMs allow us to define the semantics following the
lines of the SystemC manuals and the reference implementation. The definition
given herein is a significant major revision of the semantics given in [Müller
et al., 2001] towards the event-based interaction of the simulation scheduler and
the user defined processes (methods and threads). We develop a mathematical
definition of SystemC in terms of a SystemC Algebra with the precise seman-
tics of channel updates as well as wait(), notify(), notify_delayed(),
and next_trigger() operations.
We mainly introduce our formal semantics as a precise and concise, yet
intuitive definition which complements the SystemC language reference man-
ual and the reference implementation. Its main application is as a basis for
language studies and interoperability with other languages as well as for
reasoning about SystemC models, i.e., for formal verification and synthesis.
Studies in language interoperability are supported by our ASM definition
since comparable definitions are available for VHDL, Verilog, and SpecC, as
outlined in the next section.
The remainder of this chapter is organized as follows. Section 2 discusses
related works. In Section 3, we briefly introduce what is needed from distributed
ASMs without going into theoretical details. Section 4 gives an overview
of the general principles of SystemC before Sections 5 and 6 introduce the
semantics of the individual SystemC specific operations as well as the SystemC
scheduler. Section 7 gives an example that executes on the introduced Abstract
State Machines before the conclusion closes the chapter.
1
Formerly known as Evolving Algebras
At each step the guards evaluate to a set of function updates (block), each of
the form f(t1, ..., tn) := t, where f is a function name and t1, ..., tn and t are terms.
2
Note here that Gurevich [Gurevich, 1995] does not introduce a special symbol for separating updates in a
block. We use a comma as an explicit block separator in this chapter.
3
This modification has no impact on the theory of ASMs; rather, it helps us to keep the definition of the
SystemC simulation kernel simpler.
Executing the constructor means to spawn and execute the rule for each
element in Universe simultaneously, i.e., the constructor basically spawns
n rules, where n is the number of elements in Universe. The following example
demonstrates the application of this constructor. It defines a rule which specifies
that each non-empty list l from the universe LIST is replaced by the list's tail, i.e.,
deleting the first list element. Here l refers to any valid instance of LIST.
4
SystemC actually introduces threads and cthreads (clocked threads) where the latter are mainly introduced
for the matter of efficient simulation and synthesis. Since SystemC V2.0 unifies the behavioral semantics of
threads and cthreads, cthreads become a specialization of threads, so that we can consider the semantics of
threads here without the loss of generality.
For the enhanced ASM semantics including watching conditions, the reader is
referred to [Müller et al., 2001].
In SystemC V2.0 the interaction between the scheduler and the user defined
processes is completely based on EVENTs. Processes have explicit notification
(i.e., generation of events) by notify and notify_delayed. Suspensions on time-
outs and channel updates generate internal events. The list of pending events
determines the trigger upon which processes become ready with respect to the
current simulation time. This defines an underlying discrete SystemC time
model in which EVENTs are ordered by their time components.
The individual tasks of a scheduler are given as phases which are divided into
steps. After initialization the simulation scheduler continuously iterates through
two different phases: evaluate and update (cf. Fig. 4.4). The evaluate phase
defines the invocation of processes and evaluation of their statements within
one delta cycle (see (1) in Fig. 4.4/5). When no more processes are ready, the
scheduler advances to the next delta cycle and proceeds to the update phase. In
update, it decides either on the next delta cycle without time advancement (see
(2) in Fig. 4.4/5), or to proceed to the next time cycle (see (3) in Fig. 4.4/5). The
decision is based on the execution of different steps as given in Fig. 4.5, where
Steps 2–3 correspond to the evaluate phase and Steps 4–8 to the update phase.
In update, the scheduler first checks for update_requests on CHANNELs, which
were scheduled by processes when future value changes on CHANNELs are
requested. For each update_request the corresponding update() is executed on
a specific CHANNEL, where update() can be individually overloaded for each
CHANNEL type. For signals the predefined update() assigns the new value to
the current one and schedules an update event to the pending events. Here, no
new event of the signal is generated; rather, the already existing default event,
which is predefined for each signal, is scheduled. After the update of
channels the scheduler checks for events at the current time and returns to the
evaluate phase when an event is detected. Otherwise the simulation time is
advanced, which activates events for the new time before returning to the
evaluate phase.
For our ASM definition, the rules in the next two sections constitute two
ASM modules. In the next section, we first define rules for the PROCESS_ASM
module, which give the semantics of distinguished SystemC operations by the
ASM rules with labels P1–P9. Thereafter the SCHEDULER_ASM module is
defined by the rules labelled S1–S10. When instantiating a program we create
one agent instance of the latter for the simulation scheduler and one ASM agent
instance of the PROCESS_ASM module for each user defined process.
For a signal c, request_update(c) stores the request by inserting the signal into
the update_array5.
5
The name is taken from the V2.0.1 reference implementation. For our semantics we have defined it as a set
to reduce complexity in the ASM rules.
4.5.2 Notify
Events are implicitly generated when updating a primitive channel such as
a signal. Alternatively, an event can be explicitly sent by e.notify(). The
notify operation without any parameter generates an immediate notification in
the current delta cycle. All other notify operations generate events for future
cycles by either delta delay or timed notification. For that, each event holds the
set of all processes which are sensitive to that event (processes(e)), i.e., which
are resumed when the scheduler elaborates the events. Thus for an immediate
notification all these processes are immediately set to the status ready.
Depending on the value of its parameter, the notify operation has effect on future
delta cycles or time cycles. We organize the event management through the
global set pendingEvents, which holds the set of current and future events.
Therefore it is important to keep the notification time for each event as time(e).
According to the rules of the current language reference manual, the notify
operation only has effect under specific conditions. When notified, the event is
inserted into pendingEvents only when it is not already an element of that set.
Then its time is set to an absolute time by setting it to the specified time plus the
current time. In the case that the event has already received a notification, its
time is overwritten if the new time is smaller than or equal to the already assigned
time. That means that previous notifications on that event are cancelled when the
already scheduled notification is later than the current notification request. In
other words, when new notification requests are later than the already scheduled
ones, they are simply ignored. Finally, after executing the operation the process
proceeds to the next operation by advancing the program counter.
4.5.3 Notify_Delayed
In addition to notify(), SystemC introduces a notify_delayed() oper-
ation. Like the former, the latter is available without parameters and
with a time specification, yielding a similar behavior. In contrast to notify,
notify_delayed cannot set an immediate event on the current delta cycle, i.e.,
notify_delayed without parameters sets a notification for the next delta cycle.
In addition notify_delayed generates a run time error and simulation stops
when the event is already inserted into pendingEvents6. Except for those two
differences, the semantics is the same as given in rule P3. Therefore, it is
sufficient to give only the semantics of a notify_delayed with time
specification here.
6
Note that this is implementation specific behavior of the SystemC V2.0.1 simulator and may be subject to
modification.
4.5.4 Wait
The previously outlined operations schedule events. Processes can be defined
to be sensitive to those events through the different variations of the wait and
the next_trigger operations. Wait operations are for the synchronization of
threads, next_trigger for methods.
For both operations we have to define the semantics of several different cases
where we first distinguish static and dynamic sensitivity. It is possible to define
a sensitivity list when declaring a process in the constructor of a module. That
sensitivity list is denoted as the static sensitivity list of a process, i.e., on the
notification of at least one of the events in that list, the process resumes from
suspended. As an example, consider the following definition, which declares
a process in the constructor of a module (SC_CTOR(m)) together with a static
sensitivity list containing two events, so that the process is sensitive to both of
them.
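A minimal sketch (the module name m, the events e1 and e2, and the process p are illustrative names):

SC_MODULE(m)
{
    sc_event e1, e2;

    void p()                     // the thread body
    {
        while (true)
        {
            wait();              // wait on the static sensitivity list (e1, e2)
            // ... react to the notification ...
        }
    }

    SC_CTOR(m)
    {
        SC_THREAD(p);
        sensitive << e1 << e2;   // static sensitivity list of p
    }
};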
The static sensitivity list is assigned for the whole lifetime of the process
in the elaboration phase. Static sensitivity lists are referred to in the body of
a thread by a wait operation without parameters. When executing such a wait
operation, a process Self changes its status to suspended and sets its trigger to
staticSensitivity which is required to identify that the process resumes based on
the evaluation of the static sensitivity list. Recall that each event holds a list
of the processes that are sensitive to it (processes(e)).
In the case of a wait operation with parameters the objects given in the pa-
rameters temporarily replace the ones in the static sensitivity list of the process.
Because the temporary list is active only for the next invocation of the process,
it is denoted as a dynamic sensitivity list. Dynamic sensitivity lists can either
range over timeouts or events or both.
When being sensitive on a timeout the process first creates a new event e
and schedules this event at the given time by setting time(e). Self is inserted into
the list of processes of the event in order to be identified for later invocation when
the event becomes active, i.e., when the timer expires. The event is inserted
into the set of pendingEvents. The event expression of the process is kept in
eventExpr(Self) for later evaluation, as outlined below. Thereafter, the
process sets its trigger to dynamicSensitivity and changes status to suspended.
For ‘or lists’ the process resumes when at least one of the given events is
notified. When events are combined by &, the process resumes after all events
have been notified. The ‘and notification’ may range over different delta cycles
so that the notification history has to be managed.
We generalize all three cases and define their semantics by one ASM rule
over an event expression EventExpr, which can be of one of the three types7.
First, the rule sets its trigger to dynamicSensitivity. Thereafter, the process
Self is inserted into the list of processes which have to be resumed. This is
done for each event in the event expression of the wait operation. The function
eventCollection(EventExpr) returns the set of all events used in EventExpr.
Then the used EventExpr is stored since the scheduler has to access these events,
e.g., for removing Self from all events of an or list, if one of these events has been
triggered. Finally, the rule sets the process status to suspended and advances
the program counter.
The meaning is that the process either waits for the truth of the event ex-
pression or for the timeout. We do not give an extra ASM rule here since the
semantics is a generalization of event expressions towards timeouts so that this
would simply need a combination of the ASM rules P7 and P8.
Considering the wait_until(expr) operation, we do not define an extra
ASM rule since it can be translated to:
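One common equivalent formulation (an assumption for illustration, not quoted from the reference implementation) is roughly:

// Keep waiting on the clock until the delayed expression expr becomes true.
do {
    wait();
} while (!expr);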
7
Our definition abstracts from these details and can therefore also be applied to future versions allowing the
mixing of & and | operators
4.5.5 Next_Trigger
Whereas the wait operation provides a synchronization for threads, methods
have the next_trigger operation for their invocation based on static and dy-
namic sensitivity lists. Corresponding to the different wait operations outlined
in the previous section, we can distinguish exactly the same variations for the
next_trigger operation:
The semantics of those operations is not to stop the execution of the method
after executing the next_trigger, but rather to continue until the last state-
ment of the method. In the case of executing more than one next_trigger
operation, only the last one determines the next trigger.
METHODs are not allowed to have a wait method suspending their execu-
tion and managing their sensitivity. Therefore we first set METHODs (see next
section, Rule S2) to be static sensitive by default. If a call to next_trigger
arises we remove the process from the list of suspended processes, which are
kept for each event. If this call is not the first call then we have to remove this
process from processes(e) of all dynamic events 8. Afterwards the new sensi-
tivity is managed and the program counter is incremented. Since the definition
of the next_trigger method is similar to the definition of the wait statements,
we only give a very generic rule for the handling of event expressions.
8
obtained by eventCollection(eventExpr(Self))
When no process is executing, the scheduler goes through different
steps (see Fig. 4.5) by setting the ASM function step. Thereafter suspended
METHODs and THREADs are individually resumed until they become suspended
again. THREADs change to suspended on reaching a wait operation, METHODs
after their last statement.
The 8 steps of the scheduler are defined in the language reference manual as
follows.
Step 1: initialize
Step 2: resume a ready process
Step 3: if there are ready processes goto 2, otherwise goto 4
Step 4: update all channels with update requests
Step 5: if there are events at current time goto 2, otherwise goto 6
Step 6: if there are no more pendingEvents for future time,
STOP simulation, otherwise goto 7
Step 7: advance the current simulation time; if the maximum simulation time is
exceeded then STOP, otherwise goto 8
Step 8: set processes ready to run for the new current time
Note here that the language reference manual and [Grötker et al., 2002]
denote Step 1 as the initialization phase, Steps 2–3 as the evaluate phase, and
Steps 4–8 as the update phase (see also Fig. 4.4 and 4.5). In our specification
we use this fine grained partition into steps since we need a sufficiently detailed
model of the scheduler.
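A highly simplified sketch of this evaluate/update loop is given below; all helper functions are hypothetical placeholders for the scheduler actions described in the text, and the maximum simulation time check of Step 7 is omitted.

void initialize();
bool ready_process_exists();
void resume_one_ready_process();
void update_requested_channels();
bool events_at_current_time();
void trigger_processes_for_current_time();
bool pending_events_exist();
void advance_time_to_next_pending_event();

void simulate()
{
    initialize();                                   // Step 1
    while (true)
    {
        while (ready_process_exists())              // Steps 2-3: evaluate phase
            resume_one_ready_process();

        update_requested_channels();                // Step 4: update phase

        if (events_at_current_time())               // Step 5
        {
            trigger_processes_for_current_time();   //   next delta cycle
            continue;
        }
        if (!pending_events_exist())                // Step 6
            break;                                  //   STOP simulation

        advance_time_to_next_pending_event();       // Step 7
        trigger_processes_for_current_time();       // Step 8
    }
}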
The initialization of most of the ASM functions should be intuitively clear.
Therefore we only sketch the initializations informally. We suppose step :=
checkProcesses and the current simulation time to be set to 0. For processes with
dont_initialize(), we execute an implicit wait() at the beginning, i.e.,
these processes become sensitive to their static sensitivity lists and are suspended.
All other processes are considered to be in status executing and immediately
start their execution.
For each clock, we introduce one extra SystemC module with one thread as
it is shown in Figure 4.6. Since these modules also inherit from the interface
All other ASM functions are assumed to be undef and sets and lists to be
empty.
After initialization the SCHEDULER_ASM module agent executes as follows.
Compared to the previously introduced steps of the scheduler, owing to tech-
nical ASM specific matters we have to combine Steps 4–8 into two ASM rules,
which check for events and either stop the simulation or advance the time. In
order to keep the ASM specification readable, we use ASM macros as place-
holders for the individual sequential steps ResumeProcess, CheckReadyToRun,
UpdateChannels, CheckEvents, and AdvanceTime, so that we finally arrive at
the following definition.
The first step is defined by ResumeProcess and checks for all PROCESS
with status ready. As defined by SystemC, only one process is arbitrarily
selected from the set of processes fulfilling that condition. When such a
process exists, it is resumed by setting its status to executing. Additionally, the
trigger of each METHOD is reset to staticSensitivity. Thereafter the scheduler
proceeds to the next step by setting step to checkReadyToRun. Note that the
scheduler does not immediately execute the next step since the process first has
to execute, i.e., the condition AllProcessesExecuting first has to become true
again.
The next step given by CheckReadyToRun checks if there are more processes
ready to run. If there is at least one the scheduler goes back to resume it.
Otherwise the scheduler proceeds to update the channels.
In the third step UpdateChannels takes all update requests which are held in
the update_array and performs an update() on each of them.
The update function is different for each channel type. For signals the update
is defined by the following ASM macro definition. Only in the case when the
current value of the signal is different from the new requested value, it is updated
by its new value and a corresponding event (default_event) is inserted into the
global set of pendingEvents. For that event its time has to be set to the current
time since the processes suspended on the signal update have to be triggered
for the next delta cycle at current time. Note here that those two additional
events for Boolean signals (one notifying the positive and one notifying the
negative edge) shall be inserted at this point. We leave this unspecified since it
would introduce unnecessary complexity.
After the step UpdateChannels the scheduler checks for events by Check-
Events. If there are no further events, i.e., pendingEvents is empty, the sim-
ulation stops. Otherwise if there are pending events at the current time, the
corresponding processes are triggered and set to ready which is defined within
the macro TriggerProcesses( ). Thereafter the scheduler proceeds to resume
those. In the case that there are no events at the current time, the scheduler
proceeds to advanceTime.
In the case of static sensitivity we check if the event is in the sensitivity list
of the process before setting it to ready and deleting the process from all the
process lists of all events in its sensitivity list.
The minimum is rounded off with respect to the time resolution set by the
user, which is given by the SystemC function sc_get_time_resolution(). When not
explicitly modified, the default time resolution is one picosecond.
4.7 Example
This section gives a detailed insight into the execution of our ASM model of
the SystemC V2.0 simulator on the ASM virtual machine. We execute a simple
SystemC bit counter on the defined ASM agents. The counter was originally
introduced in [Delgado Kloos and Breuer, 1995] where it served as a running
VHDL example for all articles in the book. There it was also used to outline the
ASM based semantics of the VHDL’93 simulator. We give the corresponding
SystemC code in Fig. 4.7. Fig. 4.8 additionally sketches the structure of the
different modules, signals, ports, and outlines their connections. We presume
that the reader has basic knowledge of SystemC V2.0. Otherwise we refer to
the SystemC introduction given in [Grötker et al., 2002].
In the example sc_main defines processes p3.assign as an instance of
SystemC module m2 and clk as an instance of clock (Fig. 4.8). clk is defined
with a period of 10 ns, a duty cycle of 50 percent (the percentage where the clk is
true), and a start time of 0 ns. The last clk parameter defines that the first edge
at the start time is a negative one. Recall that in SystemC a clock is a subclass
of a signal and a module, so that a clock can be considered as a thread, which
determines its own value. In the following execution we refer to the code given
in Figure 4.6 on Page 114 which defines the clock’s behavior by the means of
a SC_THREAD. Note also that m2 and m1 both have a static sensitivity list, just
after the declaration of the corresponding method and thread, respectively. The
clock clk triggers the p3 module via the X port which directly connects to the
port X of p1. The outputs of p1 and p2 are connected to signals s0 and s1
which both combine to the value of output port Y which directly connects to
signal y in sc_main.
When compiling and executing the given SystemC program, SystemC first
elaborates the module structure and instantiates and connects p1, p2, p3, and
clk as described before. For the initialization we have to instantiate one ASM
agent as an instance of the ASM SCHEDULER_ASM module (cf. Section 4.6)
and four agents of the ASM PROCESS_ASM module (cf. Section 4.5), one for
each user defined process (p3.assign, p1.behaviour, p2.behaviour) and
an extra one for the clock clk.
Table 4.1 gives an overview of the waveform of all signals over the considered
simulation time, from 0 ns to 22 ns. It shows that clk starts with false and toggles
every 5 ns between t (true) and f (false). The last row gives the integer value of
the primary counter output where s0 and s1 are bit components represented as
bool.
4.7.1 Initialization
After elaboration of the program, i.e., executing it from the beginning
of sc_main to the call of sc_start(), the simulation scheduler is invoked
starting with the initialization phase. Owing to the rules of SystemC initial-
ization, we also initialize our ASM functions. If not stated otherwise, we set
functions to undef and assume lists and sets to be empty. The simulation time res-
olution and the default simulation time unit are set to their default values, i.e.,
SC_PS and SC_NS, respectively. The current simulation time is initialized by
the value 0.
The universe PROCESS has four distinguished elements: p1.behaviour,
p2.behaviour, p3.assign, and clk.clock. Since all module instances have only
one process, we abbreviate the processes by p1, p2, p3, and clk in the remainder
of this section. The domain EVENT is initialized with four default events, one
for each signal including the one for the clock.
For each signal we initialize the ASM function event with the corresponding
default event in order to retrieve events for signals, e.g., event(clk) returns the
default event of signal clk. Additionally we set sensitivityList for all processes
as given in Table 4.2 and assume all triggers to be of staticSensitivity.
Since dont_initialize() is set, the status of p1 and p2 both become
suspended and both are inserted into the process lists of the events in their static
sensitivity lists. Additionally their program counters are assigned to their first
statement and set to the beginning of the while loop. The status of the other
processes, clk and p3, are both set to executing. Thereafter clk starts executing
an instance of the code given in Fig. 4.6, i.e., it initializes val by false, executes
write(val), and suspends on a wait(start_time) with dynamicSensitivity. The write
statement does not schedule an update request since the new value equals the
initial value of the clock. The wait statement schedules a timeout by creating a
new event with time 5 ns and inserting it into pendingEvents. Furthermore,
eventExpr(clk) is set to this event by the wait statement. p3 executes Y.write()
upon which no update request is set since the current value of Y equals its new
value. p3 suspends after the last statement with trigger=staticSensitivity so that
at the end of the initialization we arrive at a list with one pending event.
The list of four processes and their values after the initialization is summa-
rized in Table 4.2, where the sensitivity list of each process is determined by
the (default) events associated with the signals in the processes' sensitivity, or
signals reachable via the ports in the processes' sensitivity, with respect to the
previously introduced table of their default events.
After initialization the scheduler runs iteratively through evaluate and update
phases and advances delta and time cycles as given in Fig. 4.4 and Fig. 4.5.
The remainder of this section sketches the individual steps of the simulation
scheduler with respect to the simulation time.
4.7.2 Time 0 ns
Evaluate. In the first simulation cycle the scheduler starts with the evaluate
phase by executing ResumeProcess (Rule S2) and CheckReadyToRun (Rule
S3). Since there are no processes with status ready, it immediately proceeds to
the update phase and to UpdateChannels (Rule S4).
Update. Since there are no update requests and no events at the current
simulation time, the scheduler advances through the steps CheckReadyToRun,
UpdateChannels, CheckEvents and AdvanceTime (Rules S2–S10). In Advance-
Time (Rule S10) it sets the current time to the next greater time in the list of
pending events, i.e., 5 ns. The final TriggerProcesses in Rule S10
considers process clk to be ready since its timeout event is pending at the new
current time. Owing to the dynamic sensitivity of clk, and since this event is in
the eventCollection of eventExpr(clk), it performs TriggerDynamic(clk), upon
which the status of clk is set to ready.
4.7.3 Time 5 ns
Evaluate. After advancing to 5 ns, the scheduler returns to ResumeProcess
where clk is invoked as a ready process. clk first enters the while loop and inverts
its value (to true) and then assigns it by executing the write statement. The latter
schedules an update request, i.e., clk is inserted into the global update_ array
since the new value is different from the current one. Thereafter clk suspends
on wait(10 * 0.5), which schedules a timeout event at 10 ns.
Update. In UpdateChannels the update request on clk is performed. In the
next step CheckEvents checks for events at the current time and thus matches
the default event of clk. As a result the status of p1 is set to ready by
TriggerStatic. This macro also removes p1 from the process lists of the events
in its sensitivity list.
Evaluate. After returning to ResumeProcess we identify one ready process.
p1 is set to executing and finally suspends on its first wait statement with trigger
= staticSensitivity. Now p1 is inserted into the process list of the clk default
event again.
Update. Since no more processes are ready, no update requests are available,
and no events are pending for the current time, the scheduler goes through several
steps and finally advances the simulation time to 10 ns, which is given by the
pending timeout event of clk.
4.7.4 Time 10 ns
Evaluate. After advancing to 10 ns, the scheduler sets clk to executing in
ResumeProcess, which inverts its value to false and performs a write operation
which executes an update request on clk before it suspends with a 5 ns timeout
when executing the wait operation. The first one schedules an event at 10 ns,
the latter one at 15 ns.
Update. Thereafter UpdateChannels evaluates the update request on clk,
which results in scheduling the default event of clk at 10 ns for triggering the
process p1.
Evaluate. The ready process p1 resumes, since X.read() now returns false it
exits its first loop, and it suspends on a wait(1, SC_NS), which schedules a
timeout event for 11 ns.
Update. No more update requests and events are available at the current time,
so the simulation time is advanced to 11 ns and p1 becomes ready again.
4.7.5 Time 11 ns
Evaluate. At time 11 ns, p1 becomes executing in ResumeProcess. It writes
a true to s0 via its output port. After inserting s0 into the update_array it
suspends with static sensitivity on the wait statement in the second while loop.
This includes p1 again in the process list of its static sensitivity event.
Update. In UpdateChannels the update request schedules the (default) event
of s0 with processes p2 and p3, since default_event(s0) is in the sensitivity
lists of both.
4.7.6 Time 15 ns
Evaluate. The next cycle starts with ResumeProcess, which executes clk.
clk sets its value to true and changes the value by the write operation, yielding
an update request. Thereafter it suspends with a timeout, generating a timeout
event for 20 ns.
Update. In UpdateChannels the update request executes an update on clk
and schedules the clk default event for process p1 at the current time. CheckEvents
finally sets p1 to ready.
4.7.7 Time 20 ns
Evaluate. ResumeProcess sets clk to executing, which changes its value to
false with an update request on clk before it suspends on a wait and schedules
a timeout event for 25 ns.
Update. In UpdateChannels the update request on clk sets the process p1 to
be ready since it is sensitive to the clk default event.
Evaluate. When returning to the step ResumeProcess the ready process
resumes. p1 exits the loop since X.read() returns false. Next it suspends at
wait(1, SC_NS) and generates a timeout event for time 21 ns. The set of pending
events now has two entries.
Update. Since there are neither current update requests nor pending events
at the current time, the scheduler advances the time to 21 ns, so that p1 is set to
ready since it is associated with the scheduled timeout event.
4.7.8 Time 21 ns
Evaluate. p1 is set to executing which assigns false to s0 via its output port.
This generates an update request since s0 changes from true to false. Thereafter
p1 returns to the first while loop and suspends on the first wait statement with
dynamic sensitivity.
Update. In UpdateChannels the update request causes an update of signal
s0 and schedules its default event for processes p3 and p2 at time 21 ns.
4.7.9 Time 22 ns
Evaluate. At the beginning of this simulation cycle, p2 executes and assigns
true to s1 via its output port, which generates an update request on s1 before
suspending on the wait operation in the next while loop.
Update. Thereafter the update request updates s1 to true and schedules the
(default) event of s1 with process p3. The next step thus sets p3
to ready, so that p3 resumes and executes.
Evaluate. p3 is resumed and writes to Y, which generates an update request
on y with new value 2. Note here that the value of y is composed of the two bit
components s0=false and s1=true. The value of y is finally updated to 2 by
UpdateChannels. We can skip the scheduling of the corresponding event since
default_event(y) is not in the sensitivity list of any process. Thus AdvanceTime
determines to advance to 25 ns, which completes our investigations.
4.8 Conclusions
This article introduces the formal definition of the SystemC V2.0 simulation
semantics through the specification of SystemC specific statements and their in-
teraction with the simulation scheduler by means of ASM rules based on the
formal ASM theory. The definition is fully compliant with the notion given
in the SystemC manual and presents an abstract model of the SystemC V2.0.1
reference implementation. It is a formal but intuitive description of the Sys-
temC V2.0, which may complement the SystemC language reference manual.
Since the definition given in this chapter follows the patterns of the ASM
definition of VHDL’93 [Börger et al., 1995] and SpecC [Müller et al., 2002],
our formal definition provides a suitable basis to study interoperabilities with
VHDL’93 and SpecC.
Acknowledgments
The work described herein was partly funded by the German Research Coun-
cil (DFG) within the priority program Integration of Specification Techniques
with Engineering Applications under grants number KR 1869/3-2 and within
the DFG Research Center 614 (Self-Optimizing Systems of Mechanical En-
gineering). We also appreciate the valuable comments of Uwe Glässer who
reviewed this article.
Chapter 5
Abstract Synthesis tools for SystemC descriptions are mature enough to cover the design
flow from the system level to the gate level, whilst SystemC centered validation
methodologies are still under development. This chapter presents a complete
validation framework for SystemC designs based on a mix of functional testing
and model checking. It is based on a fault model and a test generation strat-
egy that are applicable through the whole design flow from system level to gate
level. This is achieved by exploiting the SystemC 2.0 simulation efficiency and
by performing an effective test patterns inheritance procedure. Fault simulation
and automatic test pattern generation (ATPG) have been efficiently integrated
into a unique C++ executable code linked to the SystemC model. In the case
of mixed SystemC/HDL designs, co-simulation is avoided by using an automatic HDL to SystemC translator, which produces a uniform executable SystemC
model. Perturbed (faulty) SystemC descriptions are generated by injecting high-level faults into the code, which are then detected by using an ATPG. Undetected faults may be either hard-to-detect faults or design errors; they are therefore further investigated by using model checking on the synthesized SystemC code. In this way the intrinsic capability of SystemC 2.0 to cover the whole design flow is further extended by a SystemC based validation environment that links a SystemC model with tools aimed at identifying design errors at all abstraction levels.
5.1 Introduction
The SystemC language has become a new standard in the EDA field and
many designers have started to use it to model complex systems. SystemC has
been mainly adopted to define abstract models of hardware/software compo-
nents, since they can be easily integrated for rapid prototyping. However, it
can also be used to describe modules at a higher level of detail, e.g., RT level
hardware descriptions and assembly software modules. Thus it would be pos-
sible to imagine a SystemC based design flow [Fin et al., 2001a], in which the
system description is translated from one abstraction level to the following one
by always using SystemC representations. The adoption of a SystemC based
design flow would be particularly efficient for validation purposes. In fact, it
allows one to define a homogeneous functional validation procedure, applica-
ble to all design phases, based on the same fault model and on the same test
generation strategy.
Testing SystemC descriptions is still an open issue, since the language is
new and researchers are looking for efficient fault models and coverage metrics,
which can be applied uniformly to hardware and software modules. Previous
automated approaches to functional test generation have targeted statement cov-
erage [Malaiya et al., 1995, Corno et al., 1997], i.e., coverage of the statements
in a VHDL or C design description, or coverage of transitions in a finite state
machine model of a design [Cheng and Krishnakumar, 1996]. An automated
approach was used to obtain functional tests that targeted both statement cov-
erage and path coverage [Yu et al., 2002]. However, very few approaches have
been presented in the literature directly targeting SystemC code [Ferrandi et al.,
2002b, Harris and Zhang, 2001] and they are all focused on RTL characteristics
of the language.
The proposed validation methodology tries to cover the whole design flow of
a digital system from the system level of description to its structural representa-
tion by using the same fault model and test generation technique. The accurate
description level and simulation performance reached by a SystemC 2.0 descrip-
tion allow the definition of an efficient validation procedure. In fact, by exploiting SystemC simulation performance, fault simulation and automatic test pattern generation can be directly integrated into a single C++ executable. By contrast, these are very time consuming activities with traditional hardware description languages.
The chapter is organized as follows. Section 5.2 explains the overall method-
ology. Section 5.3 describes how design errors can be modeled by using a high-
level fault model. Section 5.4 describes the proposed validation environment
based on the combination of three different well known validation techniques
(fault simulation, ATPG and model checking) implemented in SystemC. Sec-
tion 5.5 presents some approaches to perform efficient fault simulation. Ex-
perimental results are shown in Section 5.6, whilst Section 5.7 is devoted to
concluding remarks.
Efficient High Level Fault Simulation. More and more SystemC code is used to model and simulate digital designs; however, a large number of designs are still based on more traditional hardware description languages, such as VHDL or Verilog. Thus techniques for efficiently integrating these languages with SystemC are a hot topic, ranging from module integration to the porting of EDA tools. This chapter explains different alternatives to address the SystemC versus VHDL integration problem with respect to fault simulation (FS) and ATPG.
Figure 5.2 and Figure 5.3 show, respectively, how a bit fault and a condition fault modify the description of a SystemC DUV. The bit coverage fault model directly covers all language characteristics of SystemC 1.2, which models RTL descriptions. By contrast, it must be extended to cover the new SystemC 2.0 language features concerning events and channels.
Events are faulted in two different ways: by changing the optional (time) parameter of the notify method and by avoiding the event notification.
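As a hedged illustration of these two event perturbations, the fragment below shows how an injected fault might alter a notify call; fault_id and the concrete delay values are hypothetical, and the chapter's injector generates such code automatically from the IIR model rather than by hand.

    #include <systemc.h>

    // Hand-written illustration of the two event faults described above. The
    // fault_id member and the concrete delays are assumptions for the example only.
    SC_MODULE(producer) {
        sc_event data_ready;
        int      fault_id;   // selects the active fault (0 = fault free)

        void run() {
            // ... compute something ...
            if (fault_id == 1)
                data_ready.notify(5, SC_NS);       // fault 1: the optional time parameter is changed
            else if (fault_id == 2)
                ;                                  // fault 2: the notification is suppressed
            else
                data_ready.notify(SC_ZERO_TIME);   // fault free behaviour
        }

        SC_CTOR(producer) : fault_id(0) {
            SC_THREAD(run);
        }
    };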
Bit coverage can be easily related to the other metrics, developed in the
software engineering field [Myers, 1999] and commonly used in functional
testing.
Statement coverage. Any statement manipulates at least one variable or signal. The bit failures are injected into all variables and signals on the left hand and right hand sides of each assignment. Thus at least one test vector is generated for every statement. To reduce the proposed fault model to statement coverage it is thus sufficient to inject only one bit failure into one of the variables (signals) composing a statement. In conclusion, the bit coverage metric induces an ATPG to produce a larger number of test patterns than statement coverage does, and it guarantees that all statements are covered.
Branch coverage. The branch coverage metric implies the identification of patterns which verify the execution of both the true and the false (if present) path of each branch. Modeling our condition failures implies the identification of patterns which differentiate the true behavior of a branch from the false behavior, and vice versa. This differentiation is performed by sticking the branch condition at true (false) and by finding patterns that execute the false (true) branch, thus executing both paths. In conclusion, the proposed bit coverage metric includes the branch coverage metric.
SCRAM SystemC Analyzer. This module allows one to convert the Sys-
temC description into the IIR format. SystemC up to version 1.2 can be
directly converted into the IIR model, whilst new classes are added to IIR
to describe SystemC 2.0 features;
Control module generator. This module has been developed to allow the testing of either each single SystemC module of a composite architecture or a set of them. Analyzing the modules' interconnections, it produces a new architecture description by inserting a new SystemC module, called switch, over each interconnecting signal. This feature allows signals to be propagated only to the target modules during simulation. The interface to the switches is driven by the switch manager (see Section 5.4);
Fault list generator. This object analyzes the IIR description to extract
the fault list for the DUV according to the bit coverage fault model;
Fault injector. This module gets the fault list either from the previous component or from a file and produces an IIR description with the injected faults;
to activate the SA0-1 fault on the targeted statement. The SA0-1 masks allow the removal of a specific fault if it is untestable, redundant, or already detected. Figure 5.7 shows an example of fault free and faulty SystemC descriptions. Moreover, it illustrates how the faults are recursively inserted into complex statements such as an if-then-else statement. Figure 5.8 illustrates the fault injection architecture. It also shows an extra constant called delta, which has to be added to each component to allow the activation of a target fault even in descriptions with multiple instances of the same component.
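The exact code emitted by the fault injector is not reproduced in this extract; the following hand-written sketch only illustrates the general idea of a condition stuck-at fault selected through a fault identifier and a per-instance delta offset, with all names (fault_id, delta) assumed.

    #include <systemc.h>

    // Sketch of an injected condition fault (stuck-at-true / stuck-at-false) inside
    // an if-then-else. The real injector derives fault numbering from the IIR model;
    // fault_id and delta are illustrative stand-ins.
    SC_MODULE(duv_faulty) {
        sc_in<bool>  a, b;
        sc_out<bool> y;
        int fault_id;   // selects the active fault (0 = fault free)
        int delta;      // offset distinguishing several instances of this component

        void behaviour() {
            bool cond = a.read() && b.read();
            if (fault_id == delta + 1) cond = true;    // condition stuck-at-true
            if (fault_id == delta + 2) cond = false;   // condition stuck-at-false
            if (cond)
                y.write(true);
            else
                y.write(false);
        }

        SC_CTOR(duv_faulty) : fault_id(0), delta(0) {
            SC_METHOD(behaviour);
            sensitive << a << b;
        }
    };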
identify design errors. The architecture of the ATPG is described in the follow-
ing subsection.
The exmode signal sets the behavior of the switch: propagate or isolate. It is possible to test all the components of a sub-architecture and their internal connections by setting all the switches on its border to isolation mode and by enabling all of its internal switches. If the exmode signal is set to propagation mode, the value on the sigin port is copied to the sigout port without modification. The value of the selected signal can be either read or written by using the sigin and sigout signals.
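A minimal sketch of what such a switch might look like is given below; it assumes a boolean exmode (true for propagation, false for isolation), shows the case of a bool signal, and omits the test ports through which the switch manager and the ATPG drive or observe the isolated signal.

    #include <systemc.h>

    // Illustrative switch inserted over an interconnecting signal. In propagation
    // mode the value on sigin is forwarded to sigout; in isolation mode sigout is
    // left to the test environment (those extra ports are omitted here).
    SC_MODULE(signal_switch) {
        sc_in<bool>  exmode;   // propagate (true) or isolate (false)
        sc_in<bool>  sigin;    // value coming from the driving module
        sc_out<bool> sigout;   // value forwarded to the target module

        void transfer() {
            if (exmode.read())
                sigout.write(sigin.read());
        }

        SC_CTOR(signal_switch) {
            SC_METHOD(transfer);
            sensitive << exmode << sigin;
        }
    };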
ATPG engine. This module produces the test vectors for the DUV. Its structure is customized to the DUV by analyzing the IIR structure. Input test vectors have to be of the same type as the DUV input ports. The module applies the same input vector to both the faulty and the fault free descriptions; then, when the results are ready, the fault detector compares them. If the results differ for at least one of the observed output signals, the targeted fault has been detected; otherwise a new test vector must be applied;
Fault detector. This component has to be configured for the DUV.
Its configuration is obtained by analyzing the IIR structure to check the data
type of the observed output signals;
Switch manager. This block is DUV independent. It selects the
subarchitecture to test by setting the switches inside the two DUV de-
scriptions.
The structure of the ATPG allows the combination of many ATPG engines (e.g., genetic algorithms, BDDs, static code analysis) as described in the next section.
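As a rough, hedged illustration of this flow, the plain C++ sketch below applies the same vector to the fault free and the faulty description and asks the fault detector whether the observed outputs differ; all functions are placeholders, since in the real framework these pieces are generated from the IIR description and linked into a single executable together with the SystemC DUV.

    #include <vector>

    // Schematic of the ATPG main loop described above; generate_vector,
    // apply_and_simulate and outputs_differ are placeholders.
    struct TestVector { std::vector<bool> bits; };

    TestVector generate_vector()               { return TestVector(); }  // candidate vector (random, GA, ...)
    void apply_and_simulate(const TestVector&) {}                        // drive both DUV copies, run the kernel
    bool outputs_differ()                      { return false; }         // compare the observed output signals

    bool try_to_detect(int max_vectors) {
        for (int i = 0; i < max_vectors; ++i) {
            TestVector v = generate_vector();
            apply_and_simulate(v);     // same vector to the fault free and the faulty DUV
            if (outputs_differ())      // results differ on at least one output?
                return true;           // targeted fault detected
        }
        return false;                  // fault undetected: hard to detect, redundant, or a design error
    }

    int main() { return try_to_detect(1000) ? 0 : 1; }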
quired to cover hard-to-detect faults. In the proposed SystemC based validation framework, the ATPG is aided by a genetic engine which is applied to extend randomly generated test patterns. Since SystemC is based on C++, it easily allows the definition of a highly accurate genetic algorithm. The genetic algorithm developed for functional test generation has the following characteristics (a sketch of such a loop is given after the list).
Gene definition and representation. Genes are test vectors. Each one is
GENE_SIZE long, where GENE_SIZE is the sum of primary input widths;
Population. It is composed of 32 genes. Its size is constant during the whole genetic evolution. The population evolves for at most MAX_GENERATION generations, or stops earlier if the GA reaches full coverage;
Fitness function. Several fitness functions have been tested before one was chosen for accuracy and performance. The chosen function evaluates the ratio between the number of faults detected by gene X and the number of faults left undetected by the previous generation. Both numbers are updated during fault simulation, so the fitness function can be evaluated efficiently;
Selection strategy. The roulette wheel selection scheme has been adopted. Experiments show its efficiency, owing to its elitist factor: the top 5 genes from the previous generation take part in the gene set elaborated by the genetic operators;
Crossover operator. A GENE_SIZE-point crossover is applied. The probability of each crossover point is evaluated by a ranking derived from linkage learning theory. This solution implicitly derives the points which, on average, generate better new individuals;
Mutation operator. This operator flips gene bits. It allows one to detect faults from the same cluster. Its application probability increases when no coverage improvement has been obtained during the last generations; otherwise it decreases in order to better explore fault clusters.
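The compact sketch below shows, under stated assumptions, how such a loop could look in plain C++: GENE_SIZE, the population of 32, the elite of 5 and MAX_GENERATION follow the description above, while fault_simulate, the roulette selection and the linkage based crossover are reduced to trivial stand-ins.

    #include <algorithm>
    #include <cstdlib>
    #include <vector>

    // Sketch of the GA driven test generation loop; fault_simulate is a placeholder
    // for the fault simulation that computes how many new faults a gene detects.
    const int GENE_SIZE = 64;        // sum of the primary input widths (example value)
    const int POP_SIZE = 32;
    const int ELITE = 5;
    const int MAX_GENERATION = 100;

    struct Gene { std::vector<bool> bits; double fitness; };

    int fault_simulate(const Gene&) { return std::rand() % 10; }   // stand-in

    Gene random_gene() {
        Gene g;
        g.fitness = 0.0;
        for (int i = 0; i < GENE_SIZE; ++i) g.bits.push_back(std::rand() % 2 != 0);
        return g;
    }

    void evolve() {
        std::vector<Gene> pop;
        for (int i = 0; i < POP_SIZE; ++i) pop.push_back(random_gene());
        int remaining = 1000;                            // faults still to detect (example value)

        for (int gen = 0; gen < MAX_GENERATION && remaining > 0; ++gen) {
            for (std::size_t i = 0; i < pop.size(); ++i) {
                int detected = fault_simulate(pop[i]);   // both numbers updated during fault simulation
                remaining -= detected;
                pop[i].fitness = remaining > 0 ? double(detected) / remaining : 1.0;
            }
            std::sort(pop.begin(), pop.end(),
                      [](const Gene& a, const Gene& b) { return a.fitness > b.fitness; });
            std::vector<Gene> next(pop.begin(), pop.begin() + ELITE);   // elitist factor: keep the top 5
            while ((int)next.size() < POP_SIZE) {
                Gene child = pop[std::rand() % POP_SIZE];    // stand-in for roulette selection + crossover
                int idx = std::rand() % GENE_SIZE;
                child.bits[idx] = !child.bits[idx];          // bit flip mutation
                next.push_back(child);
            }
            pop.swap(next);
        }
    }

    int main() { evolve(); }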
terms. This is possible when all signals are updated synchronously with
one or more clock signals. In this context the behavior of asynchronous
signals becomes synchronous.
This section investigates the potential of the second approach. The manual re-modeling of VHDL components into SystemC modules is not feasible, owing to the long time required to perform the translation, which is inherently prone to errors. Thus we have defined a set of rules aimed at automatically translating VHDL descriptions into SystemC modules with equivalent behavior under the assumption of cycle based simulation. In this way it is possible to obtain homogeneous representations, and thus to perform fault simulation instead of fault co-simulation sessions.
This translation has been implemented on top of the Savant environment as a
further feature of the AMLETO environment (see Section 5.3.3). The SystemC
code generator is based on the intermediate IIR format that is built from VHDL
code. All identified translation rules have been implemented as C++ methods
manipulating the IIR description.
The VHDL to SystemC translator. Translation rules for all the VHDL constructs considered have been inserted into a database of rules and implemented in the vhdl2sc tool. Owing to lack of space, only partial examples are reported in the following. For instance, Table 5.1 shows the rule for translating a fixed size integer.
Both VHDL functions and procedures have to be translated into C++ methods. In the case of procedures, method parameters have to be passed by reference in order to model the side effects of VHDL procedures in the calling process or procedure.
A namespace has to be defined to translate each VHDL package. It can include functions, procedures, variables, and constants. This approach allows the declaration of different objects with the same name in separate namespaces, thus behaving like a package. The translator is currently able to convert VHDL to SystemC targeting either version 1.2 or version 2.0.
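As an illustration of these two rules, the sketch below shows what a small VHDL package containing a procedure with an inout parameter might look like after translation; the names and the exact code emitted by vhdl2sc are assumptions, not actual tool output.

    #include <systemc.h>

    // Hand-written illustration of the translation rules above (not vhdl2sc output).
    // A VHDL package "my_pkg" becomes a namespace; its procedure, whose inout
    // parameter must keep its side effect in the caller, becomes a method taking a
    // reference; a function becomes an ordinary value returning method.
    namespace my_pkg {
        const int WIDTH = 8;                      // package constant

        void incr(int& v, int amount) {           // procedure: inout parameter by reference
            v = v + amount;
        }

        sc_int<WIDTH> saturate(int v) {           // function: returns a value
            if (v > 127)  return 127;
            if (v < -128) return -128;
            return v;
        }
    }

    int sc_main(int, char*[]) {
        int counter = 0;
        my_pkg::incr(counter, my_pkg::WIDTH);     // side effect visible in the caller
        return 0;
    }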
On the other hand, the same functionality mapped onto the emulator will run at a frequency which does not depend directly on the number of events. This configuration allows one to maintain a constantly updated view of the design configuration (signal and register values). The ATPG tool samples the design primary outputs after applying each test vector. This feature represents its main drawback, since the overhead due to the communication between the workstation and the emulator is high.
The time required to download the whole test sequence also has to be taken into account. Co-emulation should be chosen for any architecture satisfying this constraint, which is the case for most of the available emulator systems. The master/slave co-emulation mode is more suitable at the initial stage of the design flow, when deep design behavior has to be verified.
Table 5.3 shows the delay introduced by the function calls in the simulation
of VHDL and SystemC descriptions stimulated with 100,000 input vectors.
Results show that the function call overhead in simulation time is higher in SystemC than in VHDL because of the lower complexity of SystemC processes compared with VHDL ones. However, Table 5.4 shows that the simulation times of SystemC are clearly better than the VHDL ones. Moreover, further experiments show that when the number of processes rises, the advantage of SystemC over VHDL increases as well. Note that changing the C++ compilation optimizations and the way VHDL functions are translated into SystemC (e.g., static functions, in-class methods and inline methods) affects SystemC simulation efficiency by up to 44%.
The last set of experiments measures the impact of component instantiation in SystemC and VHDL. A structural VHDL description with a parametric number of components is selected and translated into SystemC. Simulation results are shown in Table 5.5. It is evident that the higher the number of components, the lower the advantage of SystemC over VHDL.
Table 5.6 shows the fault coverage achieved and the ATPG characteristics
related to the case study. The low fault coverage (69.9%) obtained shows a quite
low testability of the analyzed design. The majority of the undetected faults
are located within the quantization module mod.Q. By exploiting the controllability and observability capabilities offered by the control switch architecture (Section 5.4), the inverse quantization module (mod.Q) is isolated from the saturation and the mismatch control modules (mod.S).
The ATPG environment has been built by using the SystemC language
in order to obtain a unique executable code composed of the DUV and
the ATPG.
Chapter 6
1 Institute for Integrated Circuits, TU-Munich, Germany
2 Infineon, System Architecture, Data Comm., San Jose, CA, USA
Abstract The design of emerging networking architectures opens up a wide range of alternatives. If the design starts at a higher level of abstraction, the process towards
the selection of the optimal target architecture, as well as the partitioning of the
functionalities, can be considerably accelerated. The present work introduces a
novel system level performance estimation methodology based on SystemC. It
will be shown that the rebuilding effort is considerably lower when applying the
proposed methodology rather than building up a structural model of the target
architecture at a lower level of abstraction. It makes the exploration of several partitioning alternatives of a system specification onto a target architecture feasible.
6.1 Introduction
High level estimation techniques are of great importance for design decisions,
such as guiding the process of hardware–software partitioning. In this case a
compromise between accuracy and computation time determines the feasibility
of those estimation techniques.
The acceleration of the partitioning process, which allows the exploration of
several partitioning alternatives, requires a fast performance estimation of the
functionalities without losing accuracy. Time consumption of the whole design
process could be drastically reduced if high level estimation methodologies were applied before major design decisions are taken.
Figure 6.1 illustrates the two main different approaches that can be followed
when pursuing a first estimation of the performances achieved by a certain
partition alternative. On the one hand, one can build up a structural model of the target architecture according to the selected partition of the functionalities. This delivers a precise estimation, but the time and effort it takes to try a new alternative are considerable. On the other hand, one can describe the functionalities in terms of a Conditional Process Graph [Pop et al., 2000] and add the relevant information of the target architecture to the graph. This information comprises,
first, the performance of each function mapped to the selected processing unit
and, second, the characteristics of the communication architecture. In this
case the simulation time and effort towards a re-partitioning are less costly and
consequently more partition alternatives can be simulated and evaluated. After
choosing a partitioning that meets the performance constraints, the way towards
synthesis (back end) goes through a structural model of the target architecture.
This step is costly in terms of design, but fortunately it has to be done only once,
after selecting the best partition alternative that meets the constraints through
the loop described above.
The chapter is organised as follows: The first section introduces the moti-
vation and the application scenario. Secondly, the Annotated SystemC Con-
ditional Synchronisation Graph is introduced. It encloses the Functional, Ar-
chitectural, and Communication models. The system performance estimation
scheme is then illustrated by means of selected examples from the networking
world. Later on, the chapter explains the steps towards the implementation and
the integration process into the design flow. The performance results achieved
by the proposed methodology are compared to the outcomes of a configurable
cycle accurate hardware–software co-simulation platform for verification pur-
poses. The next section shows and evaluates the results concerning computation
time, re-building effort, and accuracy. The chapter ends with the conclusions.
between components of the system and often assume systems where the compu-
tations and communications can be statically scheduled [Gasteier and Glesner,
1999]. The communication time estimations used in these systems are either over-optimistic, since they ignore dynamic effects such as wait time due to bus contention, or over-pessimistic, since they assume a worst case scenario for bus contention. The latter case is addressed, for instance, in [Dey and Bommu, 1997], which presents a worst case performance analysis of a system described as a set of concurrent communicating processes.
6.3 Methodology
In networking applications the processing of the packets crossing a network
equipment depends on the values of certain fields within the packet/cell headers.
This kind of system is suitable to be modelled using a Conditional Process
Graph (CPG) [Pop et al., 2000] which is able to capture both data and control
dependencies. Other possible graph based representations, such as the Control
Data Flow Graph (CDFG) [Knudsen and Madsen, 1999], are not so adequate for
describing networking functionalities. They are mainly intended for handling internal loops, which are rarely found in networking applications.
The CPG proposed by Petru Eles describes each input functionality as a
node. The nodes are linked in the defined order to achieve the complete system
specification. Further on, each process is mapped to a given processing element.
Once the mapping is fixed, the execution time of each computation node, which depends on the processing unit to which it is mapped, is fetched from an external library and annotated into the corresponding node. Moreover, the mapping and scheduling information define the parallel or sequential processing of the nodes.
The last model, the Communication ASCSG, accepts as inputs the structure
of the communication architecture, the related parameters for each communi-
cation medium (bandwidth and frequency) and the management policy imple-
mented when accessing a shared communication medium and when a processing unit is shared by several threads. Furthermore, the mapping of the communication nodes is performed: the external ones are mapped onto the communication media and the internal ones onto the multi-threading software processors.
During the generation of the models mentioned above different issues have
to be considered. The definition of each issue and the proposed approaches to
solve them are further presented.
Convergence of Paths. Within a process graph it can occur that the outputs of two or more computation nodes, originating from different branches which are not mutually exclusive, converge at a single node. At this point it cannot be assured that the upstream nodes deliver their outcomes with the same latency. Therefore the meeting node has to wait until all required inputs are available. In order to avoid the loss of intermediate results, a fifo channel (the SystemC primitive channel sc_fifo), as depicted in Figure 6.8, has to bind each upstream node to the meeting one.
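A minimal SystemC sketch of this construction is shown below: two upstream nodes feed a meeting node through sc_fifo channels, so that blocking reads tolerate differing latencies without losing intermediate results; module and port names are illustrative only.

    #include <systemc.h>

    // Illustrative meeting node: it blocks on each sc_fifo input until the value is
    // available, so upstream nodes may deliver their results with different latencies.
    SC_MODULE(meeting_node) {
        sc_fifo_in<int>  from_a;   // result of upstream node A
        sc_fifo_in<int>  from_b;   // result of upstream node B
        sc_fifo_out<int> result;

        void run() {
            for (;;) {
                int a = from_a.read();     // blocks until node A has delivered
                int b = from_b.read();     // blocks until node B has delivered
                result.write(a + b);       // placeholder for the node's computation
            }
        }

        SC_CTOR(meeting_node) { SC_THREAD(run); }
    };

    // Binding (e.g., inside sc_main):
    //   sc_fifo<int> fa, fb, fout;
    //   meeting_node m("m");
    //   m.from_a(fa); m.from_b(fb); m.result(fout);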
because multiple interacting synchronous processes must have at least one timing control statement in every path.
has to implement the interface functions in order to keep the original graph unchanged.
an external unit, it has to wait until the request is served. The thread then suspends its execution and surrenders the processing unit to another thread. Once the request has been served, the external unit signals this to the corresponding thread, which is then ready to be woken up again. Even so, it has to wait until the processing unit becomes free and the Context Event Arbiter (CEA) grants it access.
In order to implement this characteristic, the context switch events have to be signalled to the Context Event Arbiter, whose modelling will be explained within the specification of the Communication model.
The initiator processing unit first signals its intention to the Command Bus Arbiter (CBA) through the external communication node; the CBA will then grant access to the target communication medium;
The contexts ready to run signal their status (waiting for a context switch event) to the Context Event Arbiter (CEA) through an internal communication node;
media. Therefore each one accesses the Command Bus Arbiter of the commu-
nication medium to which it is mapped.
The target architecture comprises masters (or transmission initiators) and
slaves (or transmission receivers). When several masters are able to access
the same slave, the latter is referred to as a shared element. In a bus based SoC
architecture the access to the shared elements takes place through a common bus.
Inside each shared communication medium a command fifo is implemented,
where the requests stay until they can be served.
ing executable file, which is finally executed in order to perform the functional
validation of the specification and the extraction of the performance values for
the selected alternative.
is generated, which further on is parsed for the creation of the functional exe-
cutable file.
Figure 6.16 depicts the write and read accesses of two instances of two computation nodes, mapped to two different software processors (PE), to a shared one mapped to an accelerator (AC). For each data item to be transferred, an access node (Acc) has to be included. It accesses the corresponding shared element (shel) through a port which is able to call the interface functions implemented in the shared element.
Figure 6.16. Access to a shared computation node through a shared communication medium
The required parameters for its customisation are the bandwidth, the frequency, and the number of words to be transmitted at once.
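The sketch below illustrates, in plain SystemC, how an access node can reach a shared element through a port bound to an interface, as described above; the interface, module and method names are hypothetical, since the real elements are generated automatically from the graph.

    #include <systemc.h>

    // Hypothetical interface of a shared element (shel); the access node calls its
    // functions through a port, so the original graph nodes stay unchanged.
    class shel_if : virtual public sc_interface {
    public:
        virtual void write_data(int value) = 0;
        virtual int  read_data() = 0;
    };

    class shared_element : public sc_module, public shel_if {
    public:
        explicit shared_element(sc_module_name n) : sc_module(n), data(0) {}
        void write_data(int value) { data = value; }   // requests would be queued/arbitrated here
        int  read_data()           { return data; }
    private:
        int data;
    };

    // Access node (Acc) inserted for each data item to be transferred.
    SC_MODULE(access_node) {
        sc_port<shel_if> shel;     // port able to call the shared element's interface functions

        void run() {
            shel->write_data(42);          // write access on behalf of the computation node
            int v = shel->read_data();     // read access
            (void)v;
        }

        SC_CTOR(access_node) { SC_THREAD(run); }
    };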
1 Only required in the case of an external communication node.
The steps towards the Architectural model imply the mapping of every in-
stance of each computation node to the selected functional unit of the target
architecture. This is done by making the computation node hold more informa-
tion inside its data structure. This information includes, for each instance, the
processing unit number to which it is mapped, its priority, and the correspond-
ing execution time, which is picked up from the Mapping Processing Times
library.
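The kind of per-instance record implied here could look like the following sketch; the field names are assumptions based on the information listed above.

    #include <systemc.h>
    #include <vector>

    // Hypothetical per-instance record held by a computation node in the
    // Architectural model; field names are assumptions, not the tool's actual types.
    struct NodeInstanceInfo {
        int     processing_unit;   // number of the processing unit the instance is mapped to
        int     priority;          // priority on a (possibly multi-threaded) processor
        sc_time exec_time;         // execution time from the Mapping Processing Times library
        bool    shared;            // set when the node is detected to be a shared computation node
    };

    struct ComputationNode {
        std::vector<NodeInstanceInfo> instances;   // one entry per instance of the node
    };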
With these parameters a process detects whether shared computation nodes exist and, if so, sets the corresponding flag inside their data structures. Furthermore, a list of the required access nodes and another of the
corresponding shared elements are created. The required information for their
customisation is extracted from both the access initiator and the shared compu-
tation node.
In practice, the architectural configuration file extends the previous functional configuration file. The Module section additionally comprises the map-
ping information of each computation node instance and the lists of access
nodes and shared elements along with their details. This architectural con-
figuration file is then parsed and fed to the data structures defined in Fig-
ures 6.18 and 6.19. Finally, these linked lists are simply traversed and processed iteratively in order to generate the simulatable Architectural model (main_architectural.cpp).
The last step pursuing the Communication model carries out the insertion of
internal and external communication nodes and the corresponding arbitration
mechanisms to access the shared processors and media respectively. For these
additional elements two new linked lists are created. The mapping of the internal
and external transmissions to the target software processors and communication
media, respectively, gives the input information required by their customisation.
The communication configuration file extends the Module section with these
two new linked lists of communication nodes and arbitration mechanisms. This
configuration file is again parsed and fed to the data structures defined in Fig-
ures 6.18, 6.19 and 6.20. Once again, these linked lists are simply traversed and processed iteratively in order to generate the simulatable Communication model (main_communication.cpp).
the transfer initiator. These elements are introduced and customised according
to the requirements.
The achieved Communication model for this partition alternative with two
software processors and one hardware block accelerating the classification func-
tion is depicted in Figure 6.23.
results are shown in Table 6.2, together with other architecture–partition alternatives consisting of two/six embedded processors and zero/one/two accelerators. The average throughput at the output side is measured by stimulating the system with an input throughput of 915 Mbyte/s.
graph shows its dynamic behaviour and delivers the pursued system perfor-
mance estimation.
The implementation of the method is based on the system level language SystemC. Its support in terms of communication mechanisms and rules for process activation, together with the fast simulation times achieved when modelling at transaction level, makes this language very suitable for the implementation and automation of the proposed procedure.
Chapter 7
Abstract In this article an efficient methodology is proposed for the SystemC based de-
sign of digital systems which are protocol dominated, such as communication
networks based on serial protocols. In the development process of such systems
a large amount of design effort has to be put into specification and implemen-
tation of protocol related hardware and software. The methodology presented
here enables an abstract specification and simulation of complex frame-oriented
protocols as well as synthesis of controller hardware from such specifications in
the context of a SystemC system model. The protocol specification language is
implemented as an extension library to SystemC 2.0. Using the USB 2.0 pro-
tocol as a modeling example we demonstrate the efficiency and benefits of the
proposed methodology.
7.1 Introduction
Complex ‘frame oriented’ serial hardware communication protocols such as
USB or FireWire, but also dedicated on-chip protocols, play an ever increasing
role in the design of distributed digital systems for wireless and automotive
applications, for highly integrated SoC and embedded systems. However, cur-
rent design methodologies and tools have insufficiently addressed the high level
specification, verification, design space exploration, and synthesis of such pro-
tocols in the system context. Hardware and software components related to
protocol processing are in most cases heavily control dominated, and therefore
their manual design and verification at a low level of abstraction is tedious and
error prone.
A few synthesis oriented tools such as Synopsys Protocol Compiler [Syn-
opsys, 1998] or PROGRAM [Öberg et al., 1996] exist which can transform an
abstract high level protocol specification into RTL controller hardware. The
drawback is that the specification means used are not part of a system descrip-
tion language, and therefore corresponding protocol specifications cannot be
integrated into a HDL or SystemC based system model.
These arguments motivated the development of a SystemC library for high
level specification and simulation of hardware communication protocols in the
context of a SystemC model as well as synthesis from such specifications in
order to tightly integrate such protocols into the SystemC based design of digital
systems.
item may describe a complete USB frame, a data field within this frame or just
a single bit of information in the serial USB bit stream. System modules (which
can be either hardware or software components) exchange data and synchro-
nize with each other through transmission of such communication items over
abstract channels. Figure 7.1 visualizes the model of such a communication
item.
An SVE communication item has a number of attributes which reflect typical
properties of communication. The FROM, TO and BETWEEN attributes express
that communication is directed. The direction is specified in terms of commu-
nication endpoints between which a particular item is transmitted. The TAKES
attribute reflects the property that the transmission of an item consumes a cer-
tain amount of time. This time attribute can be set to an absolute value or to
a time span, which introduces controlled non-determinism into protocol modeling. This is especially useful in early stages of design.
In the SVE library two different types of communication items are defined,
describing different levels of communication abstraction. The most abstract
item is the frame item. Frame items describe a packet oriented data transfer
between system modules, and are subdivided into transaction and message
items. Transaction items are used for the specification of a multi-directional
information transfer, e.g. protocols that involve a turnaround in the direction of
transfer such as handshaking protocols. Message items specify a unidirectional
information transfer between system modules. An example specification of a
frame item using the SVE library is given in Fig. 7.2. This item specifies a
packet containing a byte address and eight bytes of data.
An item specification consists of a declaration part which defines the item
parameters and an item constructor. Parameters can be scalars or arrays, and
are defined as SV_Param<type> or SV_ParamArray<type>, respectively. The
template type argument can be any C++, SystemC, or user defined type. For
a parameter array, the second template argument defines the size of that array.
In Fig. 7.2 the frame item Packet has two parameters, a scalar address value
and an array of eight data values. In the item constructor the direction and
timing attributes are set, and all variables of type SV_Param that represent item
parameters are registered as such. The order in which they are registered is
used for positional mapping of actual to formal parameters when an item is
instantiated. Other variables of type SV_Param that are declared in an item
but not registered as a parameter serve as local variables used to store the item
state. Finally, for transaction and message items a transmission protocol can be declared in the form of a composition of other items (see section 2.2 of this article).
The second type of item provided with the SVE library is the PHYMAP item.
This item provides a means for mapping of abstract transactions and messages
and their parameter values to sequences of physical signal states. Fig. 7.3 shows
an example for a PHYMAP item specification. In this example the parameter
XD_val is mapped to the physical signal XD. In the declarative part of a PHYMAP
item, parameters as well as references to the SystemC signals to which the
parameter values are to be mapped are declared. Finally, in the PHYMAP item
constructor the referenced signals are associated with the item parameters.
Figure 7.5. Item composition schemes provided with the SVE library
The two more complex composition schemes provided with the SVE library
are the REPEAT and SELECT schemes. The first variant of the REPEAT scheme
specifies a statically bounded loop for repeated transmission of an item. In this
scheme a loop variable of type SV_Param, which is named i in the example, and
the lower and upper bounds of the loop must be specified. The decompositional
and compositional behaviors of this scheme correspond to those of the serial
scheme. The loop variable holds the current loop count and can be used to
select elements from a parameter array that is passed as an actual parameter to
the item instance in the loop body. The static REPEAT scheme can therefore
efficiently describe a reversible parallel to serial and vice versa data conversion.
The second variant of the REPEAT scheme specifies a repeated transmission of an
item, which is terminated as soon as a condition becomes false. This condition
must be evaluated separately with the aid of a user defined C++ function and the
result assigned to a variable of type SV_Param<bool>. This form of REPEAT
scheme is mostly used to control the transmission of an item (e.g., suspend
and resume transmission) through handshake signals. The decompositional
behavior of this scheme would generate item A as long as the condition i is
true. The compositional behavior would try to match item A as long as the
condition is true. In case the condition is true and item A could not be matched
a protocol error would be generated during simulation.
Finally, there is a third variant of the REPEAT scheme. This scheme declares
a repeated transmission of an item that is terminated by the transmission of an
item distinct from the first one. The decompositional behavior of this scheme
would generate item A a number of N times, followed by generation of item B.
The compositional behavior, however, is that every time item A is matched the loop variable is incremented, starting from zero, until item B is matched, so that after a match of item B the loop count parameter contains the number of matched items A. If neither item A nor item B can be matched, a protocol error is
generated. An application for this scheme is the modeling of serial protocols
that contain a variable number of data fields delimited by an end field, but there
is no explicit information on the number of data fields in the protocol stream.
Here the compositional behavior automatically extracts both the transmitted
data and data length information from the stream.
The SELECT scheme provided by the SVE library is used to selectively trans-
mit a particular item chosen from a set of alternative items. The selection is
based on the value of an item parameter. In Fig. 7.8 the parameter p is used to
choose one from three alternative items. The SELECT scheme has a template
argument that is used to specify the type of the selection parameter. The de-
compositional behavior of the SELECT scheme generates the item specified in
the branch that is selected by the parameter p. The compositional behavior of
this scheme (left example in Fig. 7.8) would try to match one of the specified
alternative items, and in the case of a successful match it would set parameter p
to the corresponding value and otherwise generate a protocol error. The com-
positional behavior can be modified such that for parameter p a certain value is
expected, and hence only the item selected by this value is to be matched (right
example in Fig. 7.8). The fact that the selection parameter needs to match a
certain value instead of being established is expressed with the SV_MATCH prefix.
a separate file, as shown in Fig. 7.10. The constructor contains the code for
allocation of all items and registration of external signals. Allocated items
are then accessible in compositions of other items via their references in the
interface unit. The constructor also defines the set of communication endpoints between which items can be transferred.
and receive items via these ports. For each port it must be specified in the module constructor which of the communication endpoints given in the interface unit it implements. A port must implement exactly one communication
endpoint. The code example in Fig. 7.11 shows the specification of two SystemC
modules which are connected via an SVE channel and which communicate
using the Packet message item.
Figure 7.12. Automatic translation between levels of communication abstraction using an SVE
protocol specification
pair of voltage levels on the two wires. Each one of the speed modes shown in
Table 7.1 has a distinct mapping of bit information to a pair of voltage levels
on the USB wires (see also section 3.2).
Packet Selection. USB transfers data between host and function in units
of packets. USB packets have a variable format. The format depends on the
packet type, which is specified by a packet identifier field (PID) at the beginning
of the packet. The USB protocol defines a total of 16 different packet types,
which are divided into token packets, data packets, handshake packets, and
special packets. A simplified overview of the USB packet structure is shown in
Fig. 7.15. The SVE specification of a USB packet is shown in Fig. 7.16.
Bit Field of Fixed Length. The USB PID field is a bit field of fixed length.
Fig. 7.17 illustrates the format of the PID field. The eight-bit PID value is
transmitted from LSB to MSB. The corresponding SVE description is shown
in Fig. 7.18. The composition of this message is quite simple. The static repeat composition is used to describe the PID field in the form of a serial bit stream which
is formatted LSB to MSB. During execution of the decompositional behavior,
the value of the PID field is transmitted bit by bit with item LogicalBit. The
compositional behavior would try to match item LogicalBit eight times. It
would then assign the bit value received with this item to the corresponding bit
in the PID field. After eight successful matches of the LogicalBit item the
value of the PID is reconstructed.
Bit Stuffing. USB does not use an explicit clock line to transmit synchro-
nization information between host and function. Instead, the communication
endpoints use autonomous phase locked loop based clock generators, which
need to be synchronized in certain intervals. The synchronization information
can be obtained from transmitted data as long as a sufficient number of signal
transitions take place on the USB wires in a certain time interval. In order
to ensure sufficient signal transitions, bit stuffing is applied to the USB data stream: a zero bit is inserted by the sender, and removed by the receiver, after six consecutive one bits in the data stream. In conjunction with
NRZI encoding (see the following section) a signal transition is guaranteed on
the USB wires at least on every seventh transmitted bit. Fig. 7.19 illustrates
the usage of bit stuffing in conjunction with NRZI coding in the USB protocol.
The composition of the bit stuffing message is shown in Fig. 7.20.
Figure 7.20. Specification of the USB logical bit message which implements bit stuffing
The decompositional behavior of this message first transmits the actual bit
value contained in the parameter bitval using message BusStateNRZI. This
message performs the NRZI encoding of the USB bit stream and is described
in the next section. After the bit has been transmitted, a SELECT composi-
tion is used to either increment or reset the ones bit counter, depending on
the value of the currently transmitted bit. For incrementing and resetting this
counter two sequential behaviors have been defined in the message in form of
item methods and have been embedded into the message composition using
the SV_SEQBEHAVIOUR keyword. A second SELECT composition tests the bit counter to check whether a total of six one bits have been transmitted in sequence. In this case
a stuffing zero bit will be transmitted next using message BusStateNRZI and
the bit counter will be cleared.
The compositional behavior first attempts to match the message BusState-
NRZI and in the case of success it establishes the bit value transmitted with
this message in parameter bitval. Depending on this value, the ones’ bit
counter is then incremented or cleared as in the decompositional behavior.
However, the difference is that if six consecutive ones bits have been received,
the compositional behavior next attempts to match the message BusState-
NRZI. This message must transmit the stuffed zero bit. If this message can be
matched successfully, then the bit stream transmitted so far conformed to the bit
stuffing rule. Otherwise the compositional behavior would indicate a protocol
error. The specification of the bit stuffing message shows how efficiently quite complex compositional and decompositional behaviors can be expressed using the declarative SVE syntax.
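For comparison with the declarative SVE message, the plain C++ sketch below implements the same two USB rules imperatively: NRZI encoding, in which a 0 bit toggles the line state and a 1 bit leaves it unchanged, and the insertion of a stuffed zero after six consecutive one bits. It is an illustration only and not part of the SVE library.

    #include <vector>

    // Bit stuffing followed by NRZI encoding of a logical bit stream: a stuffed 0
    // is inserted after six consecutive 1s, and every 0 causes a line transition.
    std::vector<bool> stuff_and_encode(const std::vector<bool>& data, bool line = true) {
        std::vector<bool> wire;
        int ones = 0;
        for (bool bit : data) {
            if (!bit) line = !line;      // NRZI: a 0 bit causes a transition
            wire.push_back(line);
            ones = bit ? ones + 1 : 0;
            if (ones == 6) {             // bit stuffing: transmit an extra 0 bit
                line = !line;            // the stuffed 0 forces a transition
                wire.push_back(line);
                ones = 0;
            }
        }
        return wire;
    }

    int main() {
        std::vector<bool> bits(8, true);                     // eight 1 bits in a row
        std::vector<bool> encoded = stuff_and_encode(bits);
        return encoded.size() == 9 ? 0 : 1;                  // 8 data bits + 1 stuffed bit
    }

This guarantees a transition at least on every seventh transmitted bit, which is exactly the property that the decompositional and compositional behaviors of the SVE message generate and check, respectively.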
Bus State Mapping. In a USB cable there are, amongst others, two wires,
called D+ and D-, which represent the physical medium used for data transmis-
sion. Each one of the USB speed modes uses its own mapping of the USB
Figure 7.21. Specification of the USB message which performs NRZI encoding
bus states to voltage levels on the USB wires D+ and D-. The bus state map-
ping defined in the USB 2.0 specification [Compaq et al., 2000] is shown in
Table 7.2. Fig. 7.22 shows the specification of message item BusStateJ. Item BusStateK is specified in an analogous fashion.
The composition of this message is quite simple and contains just a SELECT
composition of three instances of the PHYMAP item USBPhy with different ac-
tual parameters. This PHYMAP item forms the interface to the physical USB
wires. The actual parameters specify the appropriate signal voltage levels for
the desired bus speed mode. The selection of these voltage levels is based on
a variable BusSpeed, which is in this case not an item parameter but a global
variable defined in the interface unit. The last thing that remains to be presented for a complete protocol model is the specification of the USB PHYMAP item.
This specification is shown in Fig. 7.23. The item specification contains the
references to the two USB wires D+ and D– , which are associated with the item
The abstraction level of the protocol has been varied through removal of
lower level items from the protocol specification. Fig. 7.26 shows the relative
simulation time of the system model for various protocol abstraction levels.
In the simulation, the transmission time of a USB token packet was measured
for a USB protocol specification that describes packet level only, a specifica-
tion that describes packet and field level and a specification that describes the
protocol from packet to bus state level. The experiment showed that the simulation time increases rapidly as the protocol specification is refined. A refinement
down to field level increases the simulation time by a factor of 3. When the
USBPacket item is specified down to the detailed bus state level, simulation times are about 12 times higher compared to a simulation using the abstract packet level specification.
Figure 7.25. Implementation of the USB communication link to simulate clock recovery
In the next step host and device modules were refined in order to simulate
clock recovery from the USB data stream (see Fig. 7.25). Clock recovery is
needed to synchronize the sampling clock generators in host and device. For
this purpose corresponding synchronization modules that generate the sam-
pling clock for the USB wires have been added to host and device modules.
Two instances of the USB protocol specification are used to serve as protocol
generator and protocol consumer, respectively. Host and device are connected
via the USB wires D+, D- to which also the synchronization modules of host and
device are connected (Fig. 7.25). The relative simulation time for this model is
also shown in Fig. 7.26 and is about 86 times higher compared to the abstract
model of the USB communication link.
Figure 7.26. Simulation time of the USB protocol at different abstraction levels
Figure 7.27. Synthesis of protocol controller hardware from an SVE protocol specification
and decoding, respectively. The generated SystemC RTL models were synthe-
sized into a gate level implementation for a target technology and a clock
frequency of 62.5 MHz, using Synopsys Design Compiler. Table 7.3 shows the
synthesis results for the controller implementations.
7.5 Summary
An efficient methodology for integration of abstract specifications of hard-
ware communication protocols into SystemC 2.0 system level models has been
presented. The methodology is based on a declarative protocol specification
language (SVE) which has been implemented as an extension library to Sys-
temC 2.0. The advantages of declarative SVE protocol specifications have been
demonstrated by means of a model of the USB 2.0 protocol. Such specifications
are concise and intuitive to create and understand by designers, and effectively
raise the level of abstraction for protocol modeling. Because such specifica-
tions are reversibly executable they can be used as protocol generators, protocol
checkers or can perform automatic translation between levels of communication
abstraction in a mixed abstraction level design.
Furthermore, the synthesis approach for such declarative protocol specifica-
tions has been sketched. This approach enables synthesis of a protocol generator
as well as a protocol consumer controller in the form of RTL models from the
same protocol description and their automated integration into the communi-
cating behavioral specifications. Synthesizing two controller implementations
with completely distinct behaviors from one protocol specification is a novel approach in high level synthesis that not only raises design productivity considerably but also ensures that the generated implementations conform to the initial protocol specification.
Acknowledgments
We would like to thank the German National Merit Foundation for partly
sponsoring this work.
Chapter 8
1 OFFIS Research Institute, Oldenburg, Germany
2 Carl v. Ossietzky University of Oldenburg, Oldenburg, Germany
Abstract In this article we will briefly discuss the good prospects that SystemC opens
for high level, and especially object oriented, hardware modeling. But we will
also highlight the fundamental problems that SystemC's higher level language constructs pose for hardware synthesis. We will further present how, based
on SystemC, object oriented hardware modeling and a clear synthesis semantics
can be combined, leading to an extended SystemC/C++ synthesis subset. And
last, we will outline two synthesis strategies that allow direct and automated
hardware synthesis from object oriented hardware specifications based on the
extended synthesis subset mentioned above.
8.1 Introduction
Object orientation [Booch, 1993] currently is the dominant paradigm in soft-
ware engineering. The basic idea of building a system by decomposing it into a
set of associated objects can be found in abstract methodologies, system analy-
sis, and programming languages. Objects combine the behavior and the data of
a component into a conceptual and even a physical entity. The interaction between
objects is limited by an interface that abstracts from the particular implementa-
tion of the objects. Two key prerequisites were essential for the overwhelming
success of object orientation in software design. First, object oriented methodol-
ogy provides adequate solutions for major design challenges, e.g., the structure
and the components of complex systems can be mapped naturally onto compo-
sitions of objects. Second, efficient tools — and here especially compilers for
object oriented languages — fostered designers’ productivity.
For several years now the dominant paradigm in hardware design has been
register transfer level (RTL). With its typical building blocks such as register,
arithmetic/logical unit, data path, and finite state machine it is already a signifi-
cant abstraction from the lower levels such as transistor or gate level which were
used before. From a structural perspective systems at RTL are compositions of
blocks connected and synchronized via signals. Only through the productivity boost gained from the shift from gate level to RTL could the opportunities offered by the rapid progress in semiconductor manufacturing be realized in the past.
For a few years now there has also been the possibility of behavioral level modeling, which adds some further abstraction to RTL. With behavioral level modeling the designer
can focus more on algorithmic design. The main automated steps of behavioral
synthesis are allocation and binding of resources and the scheduling of opera-
tions to clock cycles. Although this reduces the effort for the designer necessary
in order to achieve an efficient hardware implementation of a given algorithm,
the expressiveness of behavioral specifications does not go far beyond RTL.
Today hardware system complexity has reached a point where the current abstraction level significantly limits productivity. The modeling and implementation of communication and synchronization scenarios in actual SoCs already
generates a growing portion of the design effort. Anticipating the progress in
semiconductor manufacturing in the near future we believe that RT and behav-
ioral level design will become as inadequate for hardware design as assembler
coding already is for the majority of software systems.
Similarities between past challenges of software design and those in hardware
design practice provide a clear indication that object orientation could again help to solve them. Object orientation can especially help to model the complex
structure of modern systems and describe communication within and between
hardware entities at an adequate level of abstraction. Stepwise refinement and
generalization are essential concepts for efficient IP re-use in hardware design
which are, in contrast to VHDL [Ashenden, 2002] or Verilog [Thomas and
Moorby, 1991], well supported by object oriented features such as inheritance
and polymorphism.
SystemC™ [OSCI, 2002a, Bhasker, 2002, Grötker et al., 2002] and Sys-
temVerilog [Accellera, 2002b] are two prominent examples of hardware design languages already trying to improve designers' productivity by higher levels of abstraction in general, and object orientation in particular. While these con-
cepts are extremely useful for the modeling of complex hardware systems they
lack a clear synthesis semantics and thus can not be translated automatically
into hardware.
In the remainder of this article we will first give a brief overview of related
work. Next, we will shortly discuss the benefits of SystemC for modeling and
the fundamental problems regarding refinement and synthesis of its higher level
language elements. Afterwards we will present the SystemC based SystemC
1 In fact, it is an additional class library, like the Master–Slave Library, combined with a modeling/coding guideline.
synthesis subsets. A further reason is that features like the SystemC chan-
nels, primitive as well as hierarchical ones, unfortunately lack a clear synthesis
semantics2. It is not clearly specified how to synthesize a channel and what
hardware should be synthesized from it.
Take, for example, the primitive channel sc_fifo. It is a nice example for
a data container in the object oriented sense, but its behavior would be quite
unrealistic for a real world hardware implementation, mainly because it is un-
timed. As a model for real hardware sc_fifo just lacks adequate functional
and timing accuracy, and therefore does not correctly reflect the behavior of an
actual implementation. Although synthesis support for it could in principle be integrated into a tool, this would be an interpretation of what sc_fifo is (for instance, a dual ported register file) rather than a direct synthesis from its description. And it would be a proprietary solution, since other synthesis tools may map it to another kind of hardware, and it would prevent designers from implementing their own synthesizable channels.
It is a general problem of the channels in SystemC v2.0 that the way channels are conceptually implemented prevents, or at least complicates, a direct synthesis from their specification. Refinement of a channel down to a cycle accurate
level usually requires some information about the way a channel works and
how it should be refined. This information can not be derived directly from its
implementation, making an automation of the refinement hardly possible. One
reason for that might be that SystemC as a system level description language
does not necessarily address only hardware, and that channels in general, and
sc_fifo, in particular, represent even more abstract communication mecha-
nisms. But SystemC is also clearly a hardware description language, and the
reasoning above will not really satisfy a hardware designer who makes use of
channels in his design and who would like to combine the modeling power of
channels with automated hardware synthesis.
In summary, SystemC provides a very good starting point for combining hard-
ware design with object orientation, but it also reveals some significant drawbacks
regarding hardware synthesis starting from higher levels of abstraction. In the
following we will present a possible approach to how the level of abstraction
for hardware modeling, based on SystemC and in combination with new syn-
thesis techniques, can be raised without having to abandon automatic synthesis.
2 sc_signal is excluded from this consideration, since it is a basic primitive for hardware modeling with a
well defined synthesis semantics rather than just another channel.
3 Since this is not fully standardized, this basically means the subset that is most commonly supported by
existing tools.
8.4.1 Polymorphism
In the absence of pointers the question arises of how polymorphism, which
is, of course, one of the most powerful object oriented features, can be realized,
since this mechanism is completely based on pointers in C++. In other pro-
gramming languages, like Ada95 [Burns and Wellings, 1998, Barnes, 1998],
the support for polymorphism does not necessarily depend on pointers or even
references. An alternative realization is, for example, by means of tagged ob-
jects. A tagged object can dynamically change its type, or, more precisely, its
class type, during run time. The information about the class to which such an object
4 This holds at least for traditional ASIC design. Things look different for reprogrammable chips, but for
this kind of technology the borders of hardware and software are blurring anyway. However, it is arguable
whether reprogrammable logic will eventually completely replace hard wired logic in the foreseeable future.
A global object is a class template which requires two parameters: the first
parameter must be a scheduler class, which is used to arbitrate concurrent
accesses; the second parameter is any user defined class, implementing the
intrinsic behavior of the object. An exemplary declaration looks as follows:
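(In the sketch below, global_object, round_robin and Fifo are placeholder names rather
than the actual identifiers of the SystemC Plus library.)

    // a global object arbitrated by a round robin scheduler whose
    // behavior is a FIFO storing up to eight elements of type integer
    global_object< round_robin, Fifo<int, 8> > my_fifo;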
The SystemC Plus class library already includes a few pre-defined schedulers,
but designers can also define their own scheduling classes, following some
coding guidelines. The central element of a scheduler is the schedule()
function, which must be overridden accordingly by each user defined scheduler
class. The schedule() function is automatically invoked each time a service
is externally requested on a global object in order to determine the next request
that will be granted. Basically it is a normal synthesizable member function of
the scheduler class, from which the scheduler is actually synthesized without
any additional information that has to be specified by the user.
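A sketch of a user defined scheduler class illustrates the idea; the exact interface
expected by the library, in particular the arguments of schedule(), is an assumption:

    class my_round_robin
    {
      unsigned last;                           // client granted last time
    public:
      my_round_robin() : last(0) {}

      // invoked on every externally requested service; returns the
      // number of the client whose request is granted next
      unsigned schedule(const bool pending[], unsigned num_clients)
      {
        for (unsigned i = 1; i <= num_clients; ++i) {
          unsigned c = (last + i) % num_clients;
          if (pending[c]) { last = c; return c; }
        }
        return last;                           // no request pending
      }
    };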
The intrinsic functionality of a global object is given by the class that is
passed as the second parameter. In the above example it is a FIFO buffer, a class
template itself, that can store up to eight elements of type integer. Although the
second parameter can be any ordinary user defined class, one additional coding
guideline has to be taken into consideration: each of its member functions
that is called by a client process must be declared as a so called guarded
method. Taking the class declaration of the FIFO class that is used above, this
looks as follows:
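(In the sketch below, GUARDED(...) stands in for the corresponding SystemC Plus macro,
which attaches a guard condition to a method so that a call is only granted while the
condition holds; the macro name and class layout are assumptions.)

    template<class T, unsigned SIZE>
    class Fifo
    {
      T        buffer[SIZE];
      unsigned count, rd, wr;
    public:
      Fifo() : count(0), rd(0), wr(0) {}

      // put() may only be executed while the FIFO is not full
      GUARDED(void put(const T& x), count < SIZE)
      {
        buffer[wr] = x; wr = (wr + 1) % SIZE; ++count;
      }

      // get() may only be executed while the FIFO is not empty
      GUARDED(T get(), count > 0)
      {
        T x = buffer[rd]; rd = (rd + 1) % SIZE; --count;
        return x;
      }
    };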
Global calls are always blocking. That means a client is blocked at the
call site until its request is finally granted. Note that the global object based
communication mechanism in the SystemC Plus approach cannot guarantee
freedom from deadlocks and starvation.
Though a global call looks a bit 'uglier', and may even feel a bit less object
oriented, than a usual SystemC channel access owing to the use of a macro,
it is still method based and easy to use whilst providing all the features that
have been listed previously. And, as another advantage, the temporal behavior
of an actual hardware implementation is accurately reflected in simulation also
taking the necessary communication between a client and a global object into
account. That means that each global call, even if immediately granted, will
block a client for some clock cycles, since handling the communication and the
execution of the called method naturally consumes some time in real hardware.
Thus the user receives an early but realistic impression of the temporal behavior
of the modeled hardware even before performing synthesis, while keeping all
the benefits of high level modeling.
In order not to limit communication via global objects to processes
located in the same module, global objects of the same type, which
means instantiated with the same template arguments, can be bound to each
other like ports:
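(In the sketch below, Producer and Consumer are assumed to be modules that declare
global objects out_fifo and in_fifo of the same type as fifo.)

    SC_MODULE(Top)
    {
      global_object< round_robin, Fifo<int, 8> > fifo;   // shared instance
      Producer producer;
      Consumer consumer;

      SC_CTOR(Top) : producer("producer"), consumer("consumer")
      {
        // the global objects of the submodules are bound to the
        // shared instance, analogous to port binding
        producer.out_fifo(fifo);
        consumer.in_fifo(fifo);
      }
    };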
Thanks to the fixed access mechanism for global objects, formed by the
communication protocol, scheduling and guard mechanism, and its separation
from the user defined behavior, the behavior itself can be easily extended and
modified by the user without having to track the synthesis semantics each time.
To summarise, global objects have some clear advantages over SystemC
channels for hardware modeling and synthesis. A global object accurately
reflects the behavior of the hardware that is synthesized from it, thanks to built
in mechanisms which cannot be circumvented by the user. Global objects
possess a scheduling mechanism which guarantees mutual exclusive access.
The integrated guard mechanism makes the use of additional semaphores or
monitors obsolete. A global object can have an arbitrary number of clients,
without having to explicitly specify that number, and global objects can be
bound to each other, thus enabling communication throughout a whole design
hierarchy. All this is provided in a way that clearly separates the user
defined behavior of a global object, which is directly synthesized, from its basic
functionality, which rather serves as a framework for the behavior. Most of these
points could also be achieved with SystemC channels, but only completely
handcrafted and without any special conceptual support. And, in particular, the
last mentioned point regarding synthesis does not seem to be realizable with
channels as provided in SystemC v2.0.
Figure 8.2. Synthesis flow starting from object oriented hardware specifications
behavioral and RT level synthesis tools. Hence the first and most important
step in object oriented synthesis is to eliminate, or, more precisely, to replace,
all objects. As alluded to previously, we will present two approaches that lead
to this goal.
The first approach is to split objects into individual variables. Leaving mem-
ber functions aside at first — they will be discussed in the next section — an
object can be simply regarded as a set of variables, namely its data members.
Thus an object declaration can be replaced by a set of individual variable dec-
larations, each of them representing one of the original data members. Access
to data members is replaced by access to the extracted variables. Assignments
from and to objects are transformed into a sequence of individual assignments
from and to each variable that has been extracted from the original objects.
Signals and ports of class type are split and treated in the same way. The ac-
cess specifiers public, protected, and private, although being useful for
modeling, obviously have no meaning for synthesis5, and are therefore simply
ignored.
5 Here we presume that the design being synthesized is semantically and syntactically correct, i.e., can be
successfully compiled with an ANSI compliant C++ compiler.
Inheritance, even multiple inheritance, does not pose a special problem for
synthesis. Simply speaking, inheritance represents a kind of compiler
supported, sophisticated copy&paste, and the same effect can be achieved by
copying all data members that are inherited by a class directly into the inheriting
class. This is performed before an object is split, thus ensuring that each class
instance will include a full set of data members, including all inherited
ones.
Take, for example, the following simple piece of code:
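(Sketch: the class name Data and the single data member assumed for Base are illustrative;
m_flag and m_number are the member names referred to later in this section.)

    // class Base is assumed to declare a single data member int m_base
    class Data : public Base
    {
    public:
      bool m_flag;
      int  m_number;
    };

    // usage inside a process body
    Data d;
    d.m_flag   = true;
    d.m_number = 42;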
After splitting the used object the code would look as follows:
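(Continuing the sketch above, the object d is replaced by one variable per data member,
including the member inherited from Base.)

    int  d_m_base;                 // inherited from Base
    bool d_m_flag;
    int  d_m_number;

    d_m_flag   = true;
    d_m_number = 42;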
By iteratively repeating object splitting, nested objects, i.e., objects that them-
selves contain objects as members, are also completely flattened. If an object
contains arrays6 as members, these members are preserved as arrays and are
not split further into their single elements. But each multi-dimensional array
is mapped to a one-dimensional array during synthesis. Preserving arrays and
mapping them to one dimension keeps the possibility of mapping arrays to
memories in later synthesis steps. If the elements of an array are objects them-
selves, they are mapped to bit vectors, as presented in the following. At the end
of the complete process, only variables of scalar type and arrays with elements
of scalar type are left, which should be processable by any behavioral or RT
synthesis tool.
The second approach for replacing objects is to map them to bit vectors, which
means that each object is represented by a single bit vector. One motivation
for this kind of mapping is that object splitting is not always a straightforward
process, especially not where objects are treated as a whole, for instance, as
function arguments or return values (refer also to the following section). But
the main motivation is that this kind of mapping allows one to preserve arrays of
objects, which means arrays with class instances as elements, instead of split-
ting them into hundreds or even thousands of individual variables. This is, in
particular, useful for allowing an efficient mapping of arrays to memories, since
each element can be completely mapped to exactly one directly addressable
memory entry.
In order to map an object to a bit vector, first, the bit width of each data
member is determined, for example, 32 bits for a member of type integer, 1 bit
for a Boolean member. The sum of the individual bit widths will give the size
of the bit vector that replaces the complete object. The original data members
are mapped onto slices of this bit vector, as illustrated in Figure 8.3.
Coming back to the previous example we now transform the used object into
a bit vector:
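(Continuing the sketch, and assuming 32 bit integers, the three data members are packed
into one bit vector of 65 bits: 1 bit for m_flag and 32 bits each for m_number and the
inherited m_base; the slice layout shown is an illustrative assumption.)

    sc_bv<65> d_bv;                    // |  m_base  | m_number | m_flag |
                                       //  bits 64..33  32..1       0

    // member accesses become bit and slice accesses
    d_bv[0]           = true;                  // d.m_flag   = true;
    d_bv.range(32, 1) = sc_bv<32>(42);         // d.m_number = 42;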
6 Arrays of bits and logic values, namely, bit vectors and logic vectors, are excluded from this consideration,
since here they are regarded as their own data types, comparable to scalar data types.
either passing the set of variables or the bit vector — dependent on the chosen
object mapping — that replace an object as additional parameters to a member
function. Member access within the function body is replaced by access to
the respective formal parameter or slice. As a result of this transformation all
member function declarations are replaced by ‘normal’ non-member function
declarations in the global name space, and all member function calls are replaced
by calls to the respective transformed function declaration.
For illustration we will first augment a previous example by some member
function declarations and calls to them (note that the declaration of the class
Base is left out):
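(Sketch with illustrative method names; class Base is left out as stated.)

    class Data : public Base
    {
    public:
      bool m_flag;
      int  m_number;

      void reset()  { m_flag = false; m_number = 0; }
      int  next()   { m_flag = true;  return ++m_number; }
    };

    // calls inside a process body
    Data d;
    d.reset();
    int v = d.next();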
The transformation of this code in case of object splitting would give the
following result:
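(Continuing the sketch: each member function becomes an ordinary function in the global
name space that receives the extracted variables as additional reference parameters; the
inherited member m_base is not used by these functions.)

    void Data_reset(bool& m_flag, int& m_number)
    {
      m_flag = false;
      m_number = 0;
    }

    int Data_next(bool& m_flag, int& m_number)
    {
      m_flag = true;
      return ++m_number;
    }

    // the calls are transformed accordingly
    int  d_m_base;                 // inherited from Base, unused here
    bool d_m_flag;
    int  d_m_number;

    Data_reset(d_m_flag, d_m_number);
    int v = Data_next(d_m_flag, d_m_number);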
This time, applying the same optimizations for eliminating dead members
would lead to a bit vector of reduced size. But what makes optimization more
complex in this case is that for different functions the same object may be
represented by bit vectors of varying size, dependent on the set of members
that is actually used by a function. Additionally, the absolute position of a slice
representing the same member may vary. For instance, if one function uses
the members m_flag and m_number and another one uses only the member
m_number, the start index of the slice representing m_number would be 1 in the
first case and 0 in the latter case (provided that the optimizations are applied).
7 It is presumed that the split object has more than one data member.
8.5.4 Polymorphism
A polymorphic object is different from a normal object, because it must
be able to store objects of different class types, which means with different
sets of members, and its type must be dynamically determinable in order to
realize dynamic dispatching of methods. The latter requirement is achieved
by adding an ‘artificial’ data member — the so called tag — to a polymorphic
object for synthesis. The tag enumerates the different possible class types
of a polymorphic object, and its current value determines the actual type of the
object. Each class type whose instances are assigned to a
polymorphic object is represented by a unique value, and each time an object
is assigned, the tag is set to the value that represents the class type of the source
object. If the source of the assignment is a polymorphic object itself, the tag
value is simply copied.
By means of the tag the realization of dynamic dispatching in a way that can
be processed by behavioral and logic synthesis is no longer a big problem. It can,
for instance, be handled by a switch statement with the tag as condition. Taking
again the small example of polymorphism which was used in section 3, each
dynamically dispatched call of the function execute() is basically replaced
by the following code fragment during synthesis (the tag value is represented
by an enumeration literal):
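(Sketch, assuming that the polymorphic example from section 3 uses a base class Base with
two derived classes Derived1 and Derived2, and that the member functions have already been
transformed as described in the previous section.)

    switch (obj_tag)                 // tag added to the polymorphic object
    {
      case TAG_BASE:
        Base_execute(/* extracted members of obj */);
        break;
      case TAG_DERIVED1:
        Derived1_execute(/* extracted members of obj */);
        break;
      case TAG_DERIVED2:
        Derived2_execute(/* extracted members of obj */);
        break;
    }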
8 This must not be mixed up with SystemC channels. In this case ‘channel’ is just the denomination for a
certain set of signals which is used for communication.
The final wait for termination of the remote method execution seems to be
redundant, since no output parameters are returned. But it is necessary to ensure
that the one-signal handshake is correctly completed if the method execution
takes more than one cycle.
What has been presented in this article is not the endpoint and still holds
a lot of potential for improvement. And there are, of course, various critical
points which have to be taken into account but could not be addressed in this
short article. One is, for example, the resource efficiency, in terms of area
and timing, of the hardware that is synthesized from object oriented
specifications. But there is a lot of optimization potential, and in most cases it
should be possible to reduce to a minimum, or even completely eliminate, any
additional overhead that may be introduced by object orientation.
Chapter 9
Embedded Software Generation from SystemC for Platform Based Design
Abstract The current trend in embedded system design is towards an increasing share
of the embedded SW development cost in the total embedded system design cost.
There is a clear need to reduce SW generation cost while maintaining reliability
and design quality. SystemC represents a step forward in ensuring these goals. In
this chapter the application of SystemC to embedded SW generation is discussed.
The state of the art of the existing techniques for SW generation is analyzed and
their advantages and drawbacks presented. In addition, methods for systematic
embedded software generation which reduce the software generation cost in a
platform based HW/SW co-design methodology for embedded systems based on
SystemC are presented. SystemC supports a single-source approach, that is, the
use of the same code for system level specification and verification, and, after
HW/SW partitioning, for HW/SW co-simulation and embedded SW generation.
9.1 Introduction
Since the 1970s microprocessors have become common devices in electronic
systems. In fact, specific processors, microcontrollers, and DSPs have been de-
veloped for this market. In these classical embedded systems, functionality is
shifted to software [Lee, 2000]; that is, they include basic hardware (microcon-
troller/DSP, RAM, FLASH, peripherals, power unit, sensors and actuators) and
nearly all the functionality is implemented in software. Existing general pur-
pose software design techniques are not suitable because embedded software
has a strong interaction with the environment, has very limited resources and
strong constraints (cost, timing, power, etc.), and is often developed by en-
gineers with little background in computer science [Lee, 2000]. During the
1970s and 1980s embedded software was usually hand written in C or assem-
bler code, linked with a hand written real time kernel and integrated with the
hardware as a final step [K. Shumate, 1992], [Balarin et al., 1997].
Throughout the 1990s the complexity of embedded systems and the time
to market pressure increased constantly, so the previously described me-
thodology became non-viable. Additionally, implementation technology evol-
ved towards increasingly complex integrated circuits that allowed the integra-
tion of the complete embedded system on a chip (SoC). A SoC normally
includes microprocessors, memory blocks, peripheral modules, and hardware
components that implement specific system functions (e.g., time-critical pro-
cesses). A key aspect of SoC design is the close relationship between
hardware and software parts, which means that they need to be designed at the
same time (co-design). In order to design these complex embedded systems
on time, new co-design techniques have been proposed. Originally these were
based on a ‘describe and synthesize’ methodology [Gajski et al., 1994b]. Figure
9.1 shows a classical co-design flow [Balarin et al., 1997] in which, from a high
level language specification (HardwareC [Gupta, 1995], SpecCharts [Gajski
et al., 1994b], Esterel [Balarin et al., 1997], ...), the source code (C) of the
embedded software is synthesized. Examples of classical co-design tools are
Vulcan [Gupta, 1995], Polis [Balarin et al., 1997], Cosyma [Straunstrup and
Wolf, 1997] or Castle [Lopez et al., 1993]. In these systems the input de-
scription is compiled into models that describe the system behavior. Typically
two models are used to describe software: state oriented and activity oriented
models [Lopez et al., 1993].
State oriented models use a set of states, transitions between them, and ac-
tions associated with states and transitions to model the system. Examples of
this category are the CFSM [Balarin et al., 1997], Petri nets [Sgroi et al., 1999]
and HCFSM [Gajski et al., 1994b] representations. In order to synthesize the
software some representations (strongly oriented to hardware) have to be re-
fined. For example, the CFSM has to be implemented onto an S graph (software
graph) before software generation [Suzuki and Sangiovanni-Vincentelli, 1996].
Other representations (e.g., Petri nets, or PN) minimize this problem. The PN
model [Sgroi et al., 1999] represents data computation using one type of node
(transitions) and non-FIFO channels between computations using another type
of node (places). Data dependent control (if then else or while do loops) is
represented by places, called choices, with multiple output transitions, one for
each possible resolution of the control. Data are represented by tokens passed
by transitions through places.
In the activity oriented models the system is described as a set of operations
related by data or execution dependencies. They are frequently used in DSP
applications. Examples of this model are dataflow graphs and control flow
graphs [Gajski et al., 1994b].
After the input description is compiled into one of the above models, the hard-
ware/software partition is performed taking into account the design constraints
and the estimated system performance. After this step the software synthesis be-
gins with the scheduling of the concurrent software tasks on a shared resource:
the CPU [Sgroi et al., 1999]. This schedule can be static (defined during the
software synthesis process) or dynamic (defined at execution time by an RTOS).
The term ‘software synthesis’ is often used for static schedulers that have to
schedule the specification while increasing execution speed and reducing memory
usage [Sgroi et al., 1999] [Jiang and Brayton, 2002]. The main disadvantage of
these systems is the lack of a code reuse methodology and the problems caused
by modifications of the input description (software debugging difficulties, poor
code, etc.).
In order to generate the embedded processor program the source code (gen-
erated after scheduling) has to be compiled with a retargetable compiler [Mar-
wedel and Goossens, 1995]. Today some of the retargetable code generation
effort is focused on the porting of open source compilers such as GNU binutils
[Abbaspour and Zhu, 2002] or gcc.
Additionally some techniques modify the processor architecture and com-
piler algorithms in order to improve memory usage, speed, and power perfor-
mance. After compilation the embedded software has to be downloaded to
the system. The code is normally stored in expensive on-chip
memory, thus the code size might seriously affect the cost of the system. To
reduce this cost, several memory size reduction (e.g., [Kjeldsberd et al., 2001],
[Passerone et al., 2001]) and code compression techniques have been proposed
(e.g., [Lekatsas et al., 2000]). The basic idea is to store compressed code in
the system memories and decompress the code when it is loaded into the CPU.
As embedded processors often include cache memories, specific memory and
speed optimizations for them are required, because the correct design of these
cache memories has a high impact on system performance. Several techniques
have been proposed for improving cache usage [Jain et al., 2001], [Chiou et al.,
1999]. These techniques define the cache implementation as well as the nec-
essary software support. Additionally, some techniques can improve memory
size with dynamically allocated data structures [Ellervee et al., 2000], increase
the memory bandwidth [Gebotys, 2001], and reduce the address calculation
overhead [Gupta et al., 2000]. Power consumption is also a very important issue in
current systems on chip, and some software techniques allow it to be reduced
[Fornaciari et al., 2001], [Shiue, 2001].
In the late 1990s the ‘describe and synthesize’ methodology presented in
Figure 9.1 was not able to survive the system on chip revolution, since it did
not fulfil the time to market constraints. A new methodology, based on the
‘specify–explore–refine’ paradigm [Gajski et al., 1994b], was proposed [Chang
et al., 1999]. The methodology points out that design reuse and platform based
design are the key aspects of the process. A hardware platform is a family of
architectures satisfying a set of constraints imposed to allow reuse of hardware
and software components [Sangiovanni-Vincentelli, 2002b]. Figure 9.2 shows
the design flow of the methodology [Keutzer et al., 2000].
This methodology has been adopted in commercial co-design tools with soft-
ware generation capability, such as the Cadence Virtual Component Co-Design
Environment (VCC) [Cadence Design Systems, 2002] or CoWare [CoWare,
2002]. The basic idea is to assign the functions to be implemented to compo-
nents of the platform or IP libraries [Villar, 2001]. The system behavior is
normally fixed but some methodologies allow implementing different behaviors
(adaptive systems) [Ernst et al., 2002]. The mapping is an iterative process
in which performance simulation tools evaluate the mapped architecture and
guide the next assignment. After mapping, the re-used hardware and software
components are added to the design, and the rest of the system is implemented
in a similar way to the methodology shown in Figure 9.1.
A finish node represents the end of process execution, but not the end of the
simulation. It is reached when the process closing } or a return primitive inside
the process is reached.
relaxed timed systems [Villar, 2002]. The majority of the languages currently
used in HW description, simulation, and design fall into this category. So
Simulink, VHDL-AMS and SPICE represent examples of strict, continuous
time frameworks. VHDL and Verilog represent examples of strict, discrete
event languages. They represent the most typical paradigm in HW design effi-
ciently supporting the design steps from the RT level description down to the
gate level implementation. Synchronous languages like Lustre, Esterel, and
Signal are examples of relaxed, discrete time languages.
Based on the definition of relaxed, discrete event systems, sequential pro-
gramming languages like C/C++ or Pascal would fall into this category.
SystemC. The first versions of SystemC were based on a relaxed timed, dis-
crete time computational model including time transactions. It was discrete
time because there was an underlying master clock from which the other clocks
triggering sequential processes and, therefore, advancing time were derived.
The main time tag did not represent physical time. Combinational
processes may provoke time transactions.
This computational model changed with version 2.0. In order to facilitate
the integration of IP blocks modeled at gate level with logic delays, the main
time tag now represents exact time in sc_time_units. In the opposite direction, the
synchronization mechanisms wait and notify, based on events, were added,
increasing the flexibility of the communication and synchronization among
9.3 SW Generation
Bearing in mind the increasing design costs, where SW is becoming more
important, several methodologies (presented in section 9.1) systematically or
automatically generate the binary code to be executed on the final platform. In
the context of an integrated design framework, a SW generation methodology
starting from the system level and preserving the efficiency of the implementa-
tion must be provided.
overcoming the need to maintain the coherence between the specification and
other intermediate representations.
This approach comes from the observation that the system level specifi-
cation code and the SW code maintain a high correlation [Fernández et al.,
2002]. Moreover, in C-based methodologies system level descriptions and the
generated SW code present similar constructing elements in order to support
concurrency, synchronization, and communication. For instance, a SystemC
process declaration has a direct relation to the corresponding thread declaration
in an RTOS API.
For synchronization and interface implementation this relationship is not so
direct. SystemC channels normally use notify and wait constructions to syn-
chronize the data transfer and process execution while the RTOS normally sup-
ports several different mechanisms for these tasks (interrupts, mutexes, flags,
...). Thus every SystemC channel can be implemented with different RTOS
functions. However, this correspondence (one system channel,
several possible implementations) gives a certain freedom for an efficient
implementation of interfaces, which might strongly affect the performance of
the system implementation.
In [Grötker et al., 2002] a solution which requires the substitution of each
C++ SystemC construct with an equivalent C code is proposed. Below is an
example of the way this substitution is performed:
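(The principle can be sketched as follows; the queue implementation of the C branch is
deliberately simplified, without blocking, and the macro names are illustrative.)

    /* the same source compiles against SystemC or against plain C,
       selected by the SYSTEMC preprocessor variable */
    #ifdef SYSTEMC
      sc_fifo<int> queue;                       /* SystemC primitive channel */
      #define QUEUE_WRITE(v) queue.write(v)
      #define QUEUE_READ()   queue.read()
    #else
      static int q_buf[16];                     /* equivalent (simplified) C code */
      static int q_rd = 0, q_wr = 0;
      #define QUEUE_WRITE(v) (q_buf[(q_wr++) % 16] = (v))
      #define QUEUE_READ()   (q_buf[(q_rd++) % 16])
    #endif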
This approach uses a single C++ description for specification and SW code
compilation (the main goal of the single source approach), and it will be ex-
plained in more detail in the rest of this chapter. The methodology described
above has to be extended in order to cover the SW/HW partition. Thus the
SYSTEMC variable has to be replaced by two variables: LEVEL, which defines the
description abstraction level, and IMPLEMENTATION, which specifies the partition
(SW or HW) to which the modules are assigned.
The LEVEL variable can take three values: SPECIFICATION, ALGORITHM
and GENERATION. When the first value is taken the input code is seen as the
system level specification before partition, thus it is a standard SystemC descrip-
tion in which the standard systemc.h library maps the language constructions
into the SystemC simulation kernel.
After partition the abstraction level is modified and the LEVEL variable takes
the ALGORITHM or GENERATION values. In this case the IMPLEMENTATION
variable identifies the modules that have been assigned to SW or to HW.
When the LEVEL variable takes the value ALGORITHM the SW part is
executed with the platform RTOS support while the HW part is executed with the
SystemC simulation kernel support (thus the HW part is still a standard SystemC
description not affected by the LEVEL value). In Figure 9.6 this is represented
as two paths, the top path, which represents the SW partition behavior, and the
bottom path which represents the HW partition behavior. The full specification
will run on the simulation host, although the RTOS can be implemented in
different ways. If the RTOS is targeted on the host, a specific implementation
One of the main advantages of this methodology is that the development of the
SC2RTOS library is relatively easy because the number of SystemC elements
that have to be redefined is very small. In addition, its file structure can be
similar to the SystemC library structure, thus the porting of the library can be
systematically performed. Figure 9.7 shows the redefined SystemC elements.
They are classified in terms of the element functionality. The figure also shows
the type of RTOS functions that replace the SystemC elements. The specific
RTOS function that replaces the SystemC constructions depends on the selected
RTOS. If a POSIX type API is chosen (for example, EL/IX [Garnett, 2000]),
different RTOSs will be supported with the same SC2RTOS library.
The implementation library will also declare the same elements, but with a
SW oriented implementation. The following example shows the elements of
the previous example but declared in the SC2RTOS library. In this case RTOS
functions are not needed, so this portion of the library is totally independent of
the RTOS.
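A sketch of what this RTOS independent portion of SC2RTOS.h could contain (the concrete
definitions below are assumptions; the real library may differ):

    // structural SystemC constructions redefined without any kernel
    #define SC_MODULE(name)  struct name
    #define SC_CTOR(name)    name(const char* /* module name, unused */)

    // ports reduced to plain pointers to the communicated data
    template<class T> struct sc_in  { const T* p; T    read() const      { return *p; } };
    template<class T> struct sc_out { T*       p; void write(const T& v) { *p = v;    } };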
For example, Figure 9.8 shows the eCos implementation of the SC2RTOS
(SC2eCos library). This library has an object (execution_context) function-
ally equivalent to the sc_simcontext class in the SystemC library. This object
maintains a list in which every element stores the data that an eCos thread
requires to execute. The execution context provides a method (register_
thread_process) which registers a new system process in the process list,
allocating the necessary data structures and calling the eCos process declara-
tion call (cyg_thread_create). The method is called during SystemC pro-
cess declaration by means of the redefined SC_THREAD macro. Thus, this
implementation is transparent to the user who is still using SystemC syntax.
The execution_context object has another entry function (uc_start) which
starts the execution of all system threads by using the eCos cyg_thread_resume
call. The sc_start macro is also redefined in order to call uc_start with a
SystemC syntax.
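A much simplified sketch of this mechanism follows; the real SC2eCos code handles module
member functions and uses a dynamic process list, so the fixed size table, priority and
stack size below are assumptions:

    #include <cyg/kernel/kapi.h>

    class execution_context
    {
      enum { MAX_PROCESSES = 16 };
      struct thread_data {
        cyg_handle_t handle;
        cyg_thread   thread;
        char         stack[4096];
      } threads[MAX_PROCESSES];
      int count;

    public:
      execution_context() : count(0) {}

      // called from the redefined SC_THREAD macro
      void register_thread_process(cyg_thread_entry_t* entry, cyg_addrword_t arg)
      {
        thread_data& t = threads[count++];
        cyg_thread_create(10 /* priority */, entry, arg, (char*)"sc_thread",
                          t.stack, sizeof(t.stack), &t.handle, &t.thread);
      }

      // called from the redefined sc_start macro
      void uc_start()
      {
        for (int i = 0; i < count; ++i)
          cyg_thread_resume(threads[i].handle);
      }
    };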
Figure 9.9 shows a possible channel library implementation. The main idea
consists of describing the channel class as a wrapper from which the implemen-
tation class is called (this process can be called ‘channel mapping’). A C
technique based on function pointers can be used for channel mapping. Another
possibility is a C++ technique consisting of an interface class object pointing to the imple-
values SW_SW, SW_HW, or HW_SW) which can be provided manually (as a
parameter of the channel class constructor method) or detected automatically.
Each channel implementation class uses communication mechanisms provided
by the underlying RTOS. For example, in the SC2eCos library the synchro-
nization can be implemented through several RTOS objects (flags, mutexes,
mailboxes, etc...). A highly suitable mechanism is the eCos flag, which enables
a quick development of the implementation library, since it is a primitive simi-
lar to SystemC events. While in general the SW/SW channel implementation
only depends on the RTOS API, the HW/SW or SW/HW channel implementa-
tion is also dependent on the platform architecture. This is because the methods
of the HW/SW and SW/HW implementation classes will access the specific and
generic peripheral HW of the platform. In [Fernández et al., 2002] an approach
is given using memory mapping and interrupt techniques to implement
channel synchronization and communication mechanisms.
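A sketch of the wrapper technique follows; all names are illustrative, only the SW/SW
implementation, based on an eCos flag, is shown, and it supports only one outstanding value:

    #include <cyg/kernel/kapi.h>

    enum channel_mapping { SW_SW, SW_HW, HW_SW };

    template<class T>
    struct channel_if {                         // common implementation interface
      virtual void write(const T&) = 0;
      virtual T    read()          = 0;
      virtual ~channel_if() {}
    };

    template<class T>
    class sw_sw_channel : public channel_if<T>
    {
      cyg_flag_t flag;                          // eCos flag used for synchronization
      T          value;
    public:
      sw_sw_channel() { cyg_flag_init(&flag); }
      void write(const T& v) { value = v; cyg_flag_setbits(&flag, 1); }
      T read()
      {
        cyg_flag_wait(&flag, 1, CYG_FLAG_WAITMODE_AND | CYG_FLAG_WAITMODE_CLR);
        return value;
      }
    };

    template<class T>
    class channel                               // wrapper seen by the user code
    {
      channel_if<T>* impl;
    public:
      channel(channel_mapping m) : impl(0)
      {
        if (m == SW_SW) impl = new sw_sw_channel<T>();
        // SW_HW and HW_SW would select platform dependent implementations
      }
      void write(const T& v) { impl->write(v); }
      T    read()            { return impl->read(); }
    };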
assigned to the HW part). All the statements which introduce additional in-
formation in this description (such as the partition) have been highlighted in
bold. For example, the general.h file will include, according to the LEVEL
parameter, the SystemC library (systemc.h) or the support library SC2RTOS
(SC2RTOS.h). The LEVEL variable is defined in the top.cc file, but it could
also be introduced as a compilation switch or a variable in the Makefile com-
pilation script. As shown in Figure 9.10, the implementation library has to be
included before a module is declared. This allows the compiler to use the correct
implementation of the SystemC constructions in every module declaration. Ev-
ery specific instance of a channel defines an additional argument that specifies
the type of communication (HW_SW, SW_HW, SW_SW, or HW_HW).
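A sketch of a possible general.h (the numeric values of the level macros are arbitrary
assumptions):

    #define SPECIFICATION 1
    #define ALGORITHM     2
    #define GENERATION    3

    #if LEVEL == SPECIFICATION
    #include "systemc.h"       /* standard SystemC simulation kernel */
    #else
    #include "SC2RTOS.h"       /* RTOS based implementation library  */
    #endif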
A simple design example was developed in [Herrera et al., 2003] in order
to evaluate these techniques. The example was a car Anti-Lock Brak-
ing System (ABS) [CoWare, 2002], with about 200 lines of SystemC code,
including 5 modules with 6 concurrent processes. The system was
implemented in an ARM based platform that included an ARM7TDMI proces-
sor, 1Mbyte RAM, 1Mbyte Flash, two 200 Kgate FPGAs, a small configuration
CPLD and an AMBA bus. The open source eCos operating system was selected
as embedded RTOS.
In order to generate software for this platform–OS pair, the SystemC to eCos
(SC2eCos) library was defined. This library is quite small, about 800 C++
source code lines, and it basically includes the concurrency and communication
support classes which replace the SystemC kernel (see Figure 9.11). This library
uses the basic techniques previously explained (it also includes a channel library
In this work the resulting memory layout was also obtained. One of the
most interesting conclusions was that there was a minimum memory size of
53.2 Kbyte that could be considered constant (independent of the application).
This fixed component could be divided into two parts: a 31 Kbyte contribution
owed to the default configuration of the eCos kernel, which includes the sched-
uler, interrupts, timer, priorities, monitor, and debugging capabilities, and a
22.2 Kbyte contribution necessary for supporting dynamic memory manage-
ment, as the C++ library makes use of it. The variable component mainly
increases with the introduction of the new system functionality.
9.7 Conclusions
This chapter shows the existing approaches for embedded software genera-
tion, focusing on a single-source approach from SystemC for platform based
design. The methods for SW generation based on SW synthesis have been
outlined and the single-source approach as an efficient embedded software
generation method based on SystemC has been presented. Focusing on the
single-source generation method, the required specification methodology, the
MoC, and the Design Framework, where this SW generation technique must
take place, have been described. This design strategy allows the designer to
concentrate the design effort on a single specification, thus preventing errors,
raising the abstraction level, allowing early verification, and therefore gaining
in productivity.
SW generation from SystemC can be simply based on the substitution and
redefinition of SystemC class library construction elements. Those elements
are replaced by typical RTOS functions and C++ supporting structures. The use
of a C++ cross-compiler serves to optimally reach the single-source paradigm
while a C cross-compiling strategy might achieve the optimum code size and
speed efficiency. One important advantage is that the method is independent
of the selected RTOS and any of them can be supported by simply writing the
corresponding library. This targeting can be made wider if a generic RTOS API
such as POSIX is used.
Experimental results for the C++ cross-compiling approach demonstrate
quick targeting and efficient code generation, with a minimum memory foot-
print of 53.2 Kbyte when the eCos RTOS was used. The real overhead with
respect to a pure eCos implementation is relatively low, especially taking into
account the advantages obtained: support of SystemC as unaltered input for the
methodology processes, and reliable support of a robust and application indepen-
dent RTOS, namely eCos, which includes extensive support for debugging,
monitoring, etc.
Finally, it has been shown how the current evolution of the language will affect
this methodology. In general, the language will evolve in coherence with the concepts
explained, easing the inclusion of some of them; for example, the development
of RTOS emulation libraries on top of SystemC 3.0 will facilitate co-simulation at the
algorithmic level using only the SystemC kernel.
Chapter 10
SystemC-AMS:
Rationales, State of the Art, and Examples
1 Fraunhofer IIS/EAS, Dresden, Germany
2 Professur Technische Informatik, University of Frankfurt, Germany
3 Continental Teves AG oHG, Germany
Abstract Many technical systems consist of digital and analog subsystems in which some
of the digital parts are controlled by software. In addition, the environment
of the systems which have to be designed may comprise analog components.
Mixed-signal simulation, i.e. the combined digital and analog simulation, is very
important in the design of such heterogeneous systems. Unfortunately, state of
the art mixed-signal simulators are orders of magnitude too slow for an efficient
system simulation. Furthermore, the co-simulation of mixed-signal hardware
with complex software, usually written in C or C++, is insufficiently supported.
A promising approach to overcoming these difficulties is the application of
SystemC and its extension to the Analog and Mixed-Signal domain (SystemC-
AMS). The available SystemC 2.0 class libraries permit the simulation of digital
systems at high levels of abstraction. The generic modeling of communica-
tion and synchronization in SystemC allows designers to use different models
of computation appropriate for different applications and levels of abstraction.
Therefore the SystemC 2.0 class libraries can be extended easily to analog and
mixed-signal simulation by the implementation of new classes and methods.
In this chapter the requirements for the analog and mixed-signal extension of
SystemC coming from important application areas (telecommunication, automo-
tive, wireless radio frequency communication) are analyzed. For these applica-
tion domains we present simple but efficient solvers written in SystemC 2.0. Fi-
nally, we give a brief overview of the first prototype implementation of SystemC-
AMS.
10.1 Introduction
Many technical systems consist of digital and analog subsystems in which
some of the digital parts are controlled by software. Which modeling languages
and simulators can be applied most effectively in the design process of such
systems? VHDL, VerilogHDL, and SystemC are the best known candidates for
this task but they have specific deficiencies.
SystemC supports a wide range of Models of Computation (MoC) in the
digital domain and is very well suited to the design of digital HW/SW systems.
Different levels of abstraction (from functional level down to register transfer
level) may be considered in the refinement process during the first system design
steps. In many applications the digital HW blocks and SW algorithms interact
with analog system parts and the continuous time environment. Owing to
the complexity of these interactions and the influence of the analog parts with
respect to the overall system behavior, it is essential to simulate the analog parts
together with the digital ones. Furthermore, an efficient analysis of the analog
system parts will become increasingly important with smaller chip structures
(deep submicron effects) and reduction of supply voltages. This requires an
integrated system design and verification environment (from specification level
down to circuit level). Higher levels of abstraction should be preferred to save
computation time and lower levels have to be used for other subsystems to
achieve accurate simulation results.
VHDL-AMS and Verilog-AMS are modeling languages well suited to de-
sign tasks like analog circuit design, design of small blocks like PLLs or A/D
converters, and other mixed-signal circuits with high accuracy requirements
but with a low complexity. However, for the design of very complex integrated
circuits (including several processors, embedded software, other digital hard-
ware, and analog blocks) these languages lack concepts for the description
at higher levels of abstraction. There are no possibilities of increasing the sim-
ulation performance significantly by the use of application oriented, optimized
models of computation. The coupling with software (written in C or C++) is
cumbersome and leads to an additional decrease of simulation performance.
Tools like Matlab/Simulink provide the user with powerful possibilities for the
description of analog and mixed-signal systems at system level, but they do
not support modeling of digital hardware/software systems or the simulation of
analog subsystems on the electrical circuit level.
To fill the gap, this chapter introduces application specific methods for the
description of analog components embedded in a large system consisting of
complex digital hardware and software blocks. The main idea is the application
of SystemC and its extension to Analog and Mixed-Signal problems. This
leads to the proposal of ‘SystemC-AMS’. The SystemC simulation environment
provides class libraries which can be used for the simulation on different levels
Analog systems are specified by differential equations (DE) or, more gen-
erally, by differential-algebraic equations (DAE). They can be classified into
conservative and non-conservative systems. Signals in non-conservative sys-
tems are directed. In an abstract sense they model the directed communica-
tion between analog modules. For example, block diagrams represent non-
conservative systems.
The signals in conservative systems represent physical quantities associated
with an across value and a through value. In the case of an electrical network
the across value represents the voltage between a wire and a reference node and
the through value is the current through a wire. Thus Kirchhoff’s laws have to
be satisfied. Signals in conservative systems are bidirectional.
Unlike discrete signals, analog signals (in both conservative and non-conser-
vative systems) are not assumed to be piecewise constant. In general they
consist of piecewise differentiable segments; the value of an analog signal
changes continuously with (continuous) time as shown, e.g., in figure 10.1. For
simulation we approximate such signals by, for example, a linear interpolation
between sample points.
We furthermore distinguish between static and dynamic modules. A module
is static if it models a purely combinational dependency between inputs and
outputs. An example for a static module is the behavioral model of a very
idealized operational amplifier with frequency-independent gain and voltage
limitation. A module is dynamic if we need differential equations to model its
behavior. An example of a dynamic module is a low-pass filter.
For all these analog systems there are no special constructs in SystemC to
model the analog behavior. The distinction of the different types does not cover
all situations which arise in modeling and simulation of mixed signal systems.
Analog modules are often designed in such a way that the signals at their
interfaces are non-conservative (directed) signals whereas the internal signals of
the models are conservative. This design style makes the combining of modules
easier to understand because the interactions between the modules are assumed
to be unidirectional.
systems we will find all of these block types. It is a big disadvantage of the con-
ventional analog simulators that they are not able to differentiate between these
block types and not able to select the most appropriate simulation algorithms
for each block. Instead they are oriented towards the general case: stiff nonlinear
differential-algebraic equations to be solved with very small time steps. This
decreases the simulation speed by orders of magnitude compared with a more
flexible simulation strategy. It is a main topic of this chapter to show that an ex-
tension of SystemC to SystemC-AMS gives the basis for such a more effective
simulation strategy.
The implementation of the different model types and simulation algorithms
in SystemC is described in the next sections, together with additional consider-
ations of application specific problems (e.g., telecommunication, automotive).
10.2.2 Synchronization
Analog and digital modules communicate and exchange data via signals
at their ports or terminals. Therefore the digital (discrete event) simulator
and the analog solver have to be synchronized. In general two directions of
synchronization have to be considered: the activation of the analog solver
caused by digital events, and the generation of digital events by the analog
parts, for example when an analog signal crosses a given threshold.
In the first case a digital event initiates the synchronization at a known point
in time. For the second case, the analog simulator has to calculate the ex-
act time of the threshold crossing and to schedule a digital event at this time. In
VHDL-AMS such a mixed-signal simulation cycle is part of the standard (IEEE
1076.1). In the SystemC simulation environment this synchronization is not
yet provided. An inefficient synchronization scheme can slow down the sim-
ulation performance significantly, e.g., by repeated backtracking of the analog
solver.
At system level the communication between analog modules can be modeled
very often with non-conservative signals. Such non-conservative modeling of
the interaction between modules is independent from the description of the
module itself, which can also be a conservative system.
If a system is oversampled a delay T of one time step in feedback loops
can be accepted. In this case we can compute the outputs of an analog system
from known inputs in the signal flow direction without iterations. Although
this results in a limited precision, it is a pragmatic rule that an oversampling
ratio of 10 leads to a similar accuracy as with circuit simulators such as SPICE
or VHDL-AMS. This type of synchronization has many advantages for system
level simulation:
The sc_fifo uses a blocking read and write interface. A call of the read
method removes a sample from the FIFO and returns the value. A call of the
write method adds a sample value to the FIFO. The read method suspends
the calling process until a value is available. The write method suspends the
calling process until a value can be written into the FIFO.
In this way the analog modules will be simulated in signal flow order. Note
that if one out port drives more than one in port, this simple data flow realization
needs a fork module which copies the in port value to several out ports. This
can be simplified with the implementation of a specific SystemC channel.
Each module has to compute its dynamic or static function independently
of all other instances, using only the output values of the previously computed
instances and its own internal state. For example, we can define an analog
signal class sca_quantity as follows:
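(The original definition is more elaborate; as a minimal sketch, an analog quantity can
simply reuse sc_fifo so that its blocking read and write enforce the data flow order. The
port typedef names are assumptions.)

    #include <systemc.h>

    typedef sc_fifo<double>     sca_quantity;   // analog signal (channel)
    typedef sc_fifo_in<double>  sca_in;         // analog in port
    typedef sc_fifo_out<double> sca_out;        // analog out port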
As a very simple example of an analog module the following code shows the
implementation of a sinusoidal source:
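(Sketch, using the port names assumed above; amplitude, frequency and the constant time
step are module parameters with arbitrary default values.)

    #include <cmath>

    SC_MODULE(sin_src)
    {
      sca_out out;                       // directed (non-conservative) output

      double amplitude, frequency, timestep;
      double t;                          // local time of this source

      void generate()
      {
        const double pi = 3.14159265358979;
        while (true) {
          out.write(amplitude * std::sin(2.0 * pi * frequency * t));
          t += timestep;                 // the source advances the time
        }
      }

      SC_CTOR(sin_src)
        : amplitude(1.0), frequency(1.0e3), timestep(1.0e-6), t(0.0)
      {
        SC_THREAD(generate);
      }
    };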
This signal source introduces the time, which is, in contrast to the ‘untimed’
static data-flow model of computation, mandatory for modeling analog signals.
A delay of one time step will be realized by writing the out port sample before
reading the in port sample. The delay time is determined by the time distance
between the samples, which were generated by the signal sources. An explicit
delay module can be realized as follows:
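(Sketch, again using the assumed port names; the first value written is the initial
condition init_value.)

    SC_MODULE(delay_1)
    {
      sca_in  in;
      sca_out out;
      double  init_value;

      void process()
      {
        double state = init_value;
        while (true) {
          out.write(state);              // write first: one time step delay
          state = in.read();             // then read the current input sample
        }
      }

      SC_CTOR(delay_1) : init_value(0.0)
      {
        SC_THREAD(process);
      }
    };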
Note that we can also set the value init_value in order to determine initial
conditions. The definition of non-conservative analog signals shows that the
extension of SystemC to analog and mixed-signal systems does not need new
language constructs, as for example in VHDL-AMS compared with VHDL.
Instead we add new methods or classes to the existing SystemC class library.
In the following we describe methods that allow the efficient simulation of
analog modules.
by a very simple and fast linear integration algorithm with constant time steps.
Static nonlinear functions can be modeled directly with SystemC. It makes no
difference if they are computed in a discrete event or in a continuous time model
of computation.
    A · dx(t)/dt + B · x(t) = q(t)                              (10.1)

where A and B are square matrices, x(t) is the state vector, and q(t) is a vector of indepen-
dent variables. Equation 10.1 can also be used for the computation of transfer
functions, pole zero or state space representations. With the backward Euler
formula dx(t)/dt ≈ (x(t) − x(t−T))/T and the constant time step T, equation 10.1
becomes a system of linear equations, (A/T + B) · x(t) = q(t) + (A/T) · x(t−T),
which is solved at every time step.
Figure 10.3. Low pass filter and its simple non-conservative behavioral model
First, we must transform the conservative model of the low pass to a non-
conservative model. Furthermore, we split the filter model into a linear
dynamic and a nonlinear static part. Figure 10.3 shows the resulting
non-conservative model.
Note that T is the constant time step and z^-1 denotes a delay of one time step.
Equation 10.2 is a discrete time transfer function with the corresponding
filter coefficients. In the time domain we can compute the output voltage from
the input voltage by a difference equation in the previous output and the current
input sample.
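The concrete coefficients and difference equation depend on the circuit of figure 10.3;
assuming, as an illustration, a first order RC low pass with time constant tau = R·C, the
backward Euler discretization with time step T gives

    v_out[n] = ( tau · v_out[n-1] + T · v_in[n] ) / ( tau + T )

with the corresponding discrete time transfer function

    H(z) = b0 / ( 1 + a1 · z^-1 ),   b0 = T / (T + tau),   a1 = -tau / (T + tau).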
The network shown in figure 10.4 has, in general, a state space representation
with an internal state vector.
From the partial fraction expansion we can compute the impulse response.
Provided there are no multiple poles, we obtain:
Equation 10.3 describes the output of a transfer function with a Dirac impulse
as input. However, discrete event signals are piecewise constant. Each
constant signal segment is the integration of a Dirac impulse over time. The
integration and the previous values are modeled by a complex variable state.
As a SystemC module we obtain:
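(Sketch for a single pole p with residue r, using the port names assumed earlier; complex
conjugate pole pairs contribute twice the real part of such a term. The update below is
exact for an input that is constant over the time step T.)

    #include <complex>

    SC_MODULE(pole_term)
    {
      sca_in  in;
      sca_out out;

      std::complex<double> p, r;         // pole and residue
      double               T;            // constant time step
      std::complex<double> state;        // complex internal state

      void process()
      {
        const std::complex<double> ep = std::exp(p * T);
        while (true) {
          double u = in.read();
          // exact solution of dx/dt = p*x + r*u for u constant over T
          state = ep * state + (r / p) * (ep - 1.0) * u;
          out.write(state.real());
        }
      }

      SC_CTOR(pole_term)
        : p(-1.0e4, 0.0), r(1.0e4, 0.0), T(1.0e-6), state(0.0, 0.0)
      {
        SC_THREAD(process);
      }
    };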
where a(t) is the modulating signal, φ(t) the modulation phase, and f_c the carrier
frequency, so that the pass band signal has the form s(t) = a(t) · cos(2π f_c t + φ(t)).
The term which includes the carrier frequency can be separated
from the signal part which contains the transmitted information. The signal
part that contains the information, a(t) · e^(jφ(t)), is independent of the carrier frequency.
This signal is called the complex low pass equivalent, or the complex en-
velope. For the base band signal the carrier frequency is converted to zero.
With i(t) = a(t) · cos(φ(t)) and q(t) = a(t) · sin(φ(t)) we obtain the complex envelope
i(t) + j · q(t). The resulting base band signal is thus represented by two signal parts, i(t)
and q(t). The amplitude and phase of the carrier signal can be computed from
these signals at each point in time.
Like the signals, the models of RF blocks must also be transformed from the
pass band to the complex base band representation, e.g. a band pass filter in
pass band simulation corresponds to a low pass in base band simulation. The
equivalent low pass processing can be defined more formally using the Hilbert
transform [Jeruchim et al., 1992].
For modeling such modules in SystemC, the methodologies described
for signal processing systems can be applied. Instead of the data type double,
the C++ Standard Template Library (STL) type complex<double> will be
used. The sampling frequency then only has to correspond to the base band
bandwidth and not to the pass band (carrier) frequency.
The main disadvantage of this simulation approach is that effects outside the
signal bandwidth cannot be represented in the base band signal, e.g. harmonics
of the carrier frequency. Unfortunately such effects could have an impact on
the performance of further receiver components.
The view layer transforms the user description into a system of equations. For
example, Modified Nodal Analysis (MNA) is applied to an electrical network
to create matrices which represent the system of equations.
The solver layer provides different analog solvers, e.g., for solving DAEs
or computing transfer functions. Solvers are instantiated from the view
layer. A solver is able to solve a system of equations provided by certain
view layers.
also an interface for the scheduler (e.g., the macros for method registration
like SCA_SDF_RUN). The classes sca_linear_solver and sca_ac_solver regis-
ter the available solvers and provide an interface to them. In the above
example a linear time domain solver and a small signal frequency domain (AC)
solver will be available. The solver interfaces provided will be used by the
view layer. A behavior view layer will provide methods for the description of
transfer functions or state space systems. Thus the sdf_module class provides
modeling facilities similar to those described in sections 2.2 and 3.1 — however, in
a more comfortable way.
The following example illustrates the model of a simple low-pass filter using
the modeling style described above. The optional frequency domain imple-
mentation will be used by the AC solver which calculates the overall frequency
domain behavior using this implementation and the structure of the system
which will be accessed via the sca_module class interfaces.
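The listing can only be sketched here: apart from the class sdf_module and the registration
macro SCA_SDF_RUN mentioned above, all names (the port types, the ac_proc hook and the
helper ac_transfer) are assumptions about the prototype API rather than its actual interface:

    class low_pass : public sdf_module
    {
    public:
      sca_sdf_in<double>  in;
      sca_sdf_out<double> out;

      double tau;                        // time constant of the filter
      double T;                          // constant time step
      double state;                      // previous output value

      // time domain processing: backward Euler difference equation
      void sig_proc()
      {
        state = (tau * state + T * in.read()) / (tau + T);
        out.write(state);
      }

      // optional frequency domain implementation, used by the AC solver
      void ac_proc(const std::complex<double>& s)
      {
        ac_transfer(out, in, 1.0 / (1.0 + s * tau));
      }

      low_pass(double tau_, double T_) : tau(tau_), T(T_), state(0.0)
      {
        SCA_SDF_RUN(sig_proc);           // register the time domain method
      }
    };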
The next example demonstrates the definition of a module base class which
supports the description of conservative linear electrical networks. For con-
servative systems a global system of equations has to be set up and solved.
Every conservative component has to contribute to the global system of equa-
tions. Thus the conservative view (sca_conservative_mna_view) must pro-
vide methods for describing each component's contribution to the system of
equations, set up one system of equations for the connected components, instantiate
one solver for this system of equations, and embed this solver in the defined
synchronization domain.
A resistor, for example, would modify the matrix stamps of the system of
equations as follows:
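(The concrete code depends on the interface of the conservative view; as a sketch, with
add_to_matrix as a placeholder for that interface, a resistor R between nodes n1 and n2
adds the standard MNA conductance stamp.)

    void resistor::build_equations()
    {
      const double G = 1.0 / R;          // conductance
      add_to_matrix(n1, n1, +G);
      add_to_matrix(n2, n2, +G);
      add_to_matrix(n1, n2, -G);
      add_to_matrix(n2, n1, -G);
    }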
10.5 Conclusions
At system level a very high simulation performance is required. The required
performance can be achieved by the combination of different problem and
application specific simulation techniques.
In section 3 we have discussed a number of simple but efficient methods that
allow the simulation of behavioral models, or executable specifications, of
analog systems. Most of these methods also work with SystemC 2.0. In sec-
tion 4 we have shown possible extensions that allow us to integrate different
Acknowledgments
The authors thank the colleagues of Infineon Technologies MDCA Villach,
especially Gerhard Noessing and Herbert Zojer, for supporting the activities
of AMS modeling with SystemC and the numerous practical inputs and hints.
Furthermore, the authors thank Uwe Knoechel from the Fraunhofer Institute
IIS for contributing to the RF modeling part. We would especially like to thank
the reviewers for their criticism and their valuable hints for the improvement of
this chapter.
Chapter 11
Modeling and Refinement of Mixed-Signal Systems with SystemC
Christoph Grimm
University of Frankfurt, Germany
Abstract This chapter describes methods for the simulation and the design of complex
signal processing systems with SystemC. The design starts with a block diagram
which can be simulated in SystemC. Refinement steps transform the block
diagram into a more detailed description, in which the execution of functional
blocks is controlled by methods that explicitly determine step widths and in
which ports, for example clock and enable ports, are added. The refinement
permits the efficient interactive exploration of the design space of
mixed-signal systems.
11.1 Introduction
Sophisticated tools for synthesis at the RT and behavioral levels permit
a specify and synthesize design flow for digital systems. Together with the
reuse of complex IP blocks this allows designers to keep up with the increasing
complexity of digital hardware. However, the design of heterogeneous systems
is far more complex and the tools are less mature. Therefore the design of such
systems is more interactive, and languages and associated methodologies for
modeling and design are more important.
SystemC, and even more so SpecC [Gajski et al., 1997], support the interac-
tive design of hardware/software systems by a refine and evaluate methodol-
ogy [Gajski et al., 1994a]. An untimed functional model is modified by ‘small’
design steps. These design steps successively add the run times of a hardware
or software realization to untimed processes and change the semantics from a
purely sequential execution to a partially concurrent execution. Although the
design steps require only small modifications of the model, their effect on the
system is tremendous: components are shifted from software to hardware, or
vice versa, without having to design all the necessary interfaces. This allows
designers the efficient, interactive design of digital hardware/software systems.
However, many systems include a variety of different analog components
[Vachoux et al., 2003b]: control systems in the automotive domain, or line
drivers and RF frontends in the telecom domain, for example, are typical
applications. Their realization includes software running on DSPs, dedicated
hardware for digital signal processing, sample rate converters, PLLs, A/D
converters, analog filtering and signal conditioning, and even power electronics
and mechanical components. Unfortunately, up to now the refine and evaluate
approach seems to be restricted to the design of hardware/software systems.
Partitioning analog/digital/software;
Section 2 of this chapter introduces basic methods for the simulation of signal
processing systems. Section 3 gives hints and examples for the implementation
of these methods in SystemC. Most of the source code is taken from the ASC
library [Grimm et al., 2003], an early prototype of SystemC-AMS [Einwich
et al., 2002b, Vachoux et al., 2003b, Vachoux et al., 2003a], and is simplified
significantly in order to illustrate the basic methods and concepts. Section 4
describes the use of SystemC for the refinement of an executable specification
to different architectures of signal processing applications. Section 5 gives an
overview of a case study.
The value ranges V of the signals are either the real numbers or discrete
approximations thereof. The time bases T of the signals are the real numbers,
modeling continuous time, or discrete approximations of continuous time.
Signal processing functions are in general specified by differential and
algebraic equations (DAEs) or difference equations. At system level, most
notably linear dynamic and nonlinear static functions are used to specify
signal processing functions. We call blocks with signal processing functions
signal processing blocks. Contiguous networks of signal processing blocks are
called a (signal processing) cluster. Signal processing blocks are connected
via directed edges that connect output signals with input signals. The meaning
of a connection from an output signal to an input signal is a mathematical
equation, for which we use the operator ‘==’, and not an assignment (operators
‘=’, ‘:=’):
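Written out (the symbols x and y are chosen here for illustration; the original formula did not survive extraction), a connection from an output signal y to an input signal x means

\[ x(t) == y(t) \quad \text{for all } t \in T, \]

a constraint that holds at every point of the common time base, not a one time assignment of the current value.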
Figure 11.1. A block diagram of one block with SystemC processes, and a cluster of three
signal processing blocks. A coordinator controls the simulation of the signal processing cluster.
The first method requires that all blocks in a cluster are modeled by DAEs,
procedural assignments, or explicit equations, and that all equations are
solved by one ‘solver’. Therefore the combination of different methods for the
simulation of analog systems is not supported by this method.
The second method is used in Simulink, SPW, COSSAP, and also in the AMS
extensions of SystemC. It permits a very efficient simulation of heterogeneous
signal processing systems, because the static data flow model of computation is
used for computing the outputs of the cluster. Because every block is simulated
independently of the others, this approach allows designers to mix different
models of computation in a cluster. A drawback of this method is its restricted
convergence: relaxation should only be applied to simulate weakly coupled
systems. This requirement is usually met by block diagrams used at system
level. Table 11.1 compares the methods for the simulation of block diagrams.
Figure 11.2. Continuous time signal, discrete time approximation and a discrete event signal
Figure 11.3. A simple synchronization between signal processing cluster and discrete event
processes
has been modified at that time, that is, its value before or at the first delta
cycle of the discrete event simulation. The signal processing cluster then has
a known input from the last change until the actual point in simulated time and
can compute the output values up to this point. This synchronization scheme
cannot have deadlocks, because the signal processing cluster always has known
inputs from the past and computes all necessary inputs for the discrete
processes at the actual simulated time.
Together with the relaxation based computation of the cluster, which is im-
plemented using a static data flow model of computation, we get the following
overall algorithm for the simulation of block diagrams:
simulation_of_cluster
Before simulation:
Break up cyclic dependencies by inserting a delay;
Sort all blocks in signal flow direction.
During simulation:
Every cluster time step do:
Read input values from DE signals;
Compute the signal processing blocks in the determined order;
Write output values to DE signals;
Simulate all DE delta cycles at the current time.
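A minimal sketch of this loop in plain C++ follows; the class and function names (sp_block, coordinator, step) are illustrative assumptions and are not taken from the ASC library.

#include <vector>

// Hypothetical block interface: each signal processing block exposes one
// step of its static data flow computation.
struct sp_block {
    virtual void read_de_inputs()   = 0;  // sample coupled discrete event signals
    virtual void compute()          = 0;  // evaluate the block's transfer behavior
    virtual void write_de_outputs() = 0;  // drive coupled discrete event signals
    virtual ~sp_block() {}
};

// Hypothetical coordinator: assumes cycles have already been broken by
// delays and that 'blocks' is sorted in signal flow direction.
class coordinator {
  public:
    void register_block(sp_block* b) { blocks.push_back(b); }

    // One cluster activation, executed every cluster time step; the
    // discrete event kernel then processes all delta cycles at that time.
    void step() {
        for (sp_block* b : blocks) b->read_de_inputs();
        for (sp_block* b : blocks) b->compute();
        for (sp_block* b : blocks) b->write_de_outputs();
    }

  private:
    std::vector<sp_block*> blocks;
};

In a SystemC 2.0 implementation, step() would typically be called from a periodic SystemC process so that the discrete event kernel regains control after every cluster activation.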
Such a port always calls methods of the continuous time signal interface (a
consolidated sketch of this signal class and its access methods is given below).
Note that most of the event handling is not needed and could be omitted. If no
events on continuous time signals are required, even the inheritance from the
primitive channel class (sc_prim_channel) is not needed and can be omitted.
However, we keep the inheritance from the primitive channel class, because it
allows us to use the update routine for the synchronization between the signal
processing cluster and the discrete kernel.
Discrete modules read the signal via ports, which call the method read().
The implementation of read() just returns the current value of the signal.
Note that at this place we can also add a request to synchronize the signal
processing cluster (that is, to compute its actual output!) before returning the
current value. The same applies to signal processing modules and their ports,
with the difference that these ports always call the method read_a().
When discrete event processes of SystemC 2.0 read or write inputs of the signal
processing cluster, the write() method waits for an update cycle. This allows
the signal processing cluster to always access the value from before the first
update cycle, as discussed in section 11.2.
Note that after the discrete event part has written such a signal we can request
the coordinator to do a new computation of the signal processing cluster. In the
update methods the new values are finally written to the current value (at this
point in time the signal processing cluster has already been simulated).
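The individual listings referred to above are not reproduced here. The following consolidated sketch, with assumed names (asc_signal, coordinator_if) that are not the original library code, shows how read(), read_a(), write() and update() can cooperate on top of SystemC's sc_prim_channel.

#include <systemc.h>

// Hypothetical coordinator hook used by the channel to trigger a
// computation of the signal processing cluster on demand.
struct coordinator_if {
    virtual void compute_cluster_up_to_now() = 0;
    virtual ~coordinator_if() {}
};

// Sketch of a continuous time signal channel bridging the signal
// processing cluster and the discrete event kernel.
class asc_signal : public sc_prim_channel
{
  public:
    explicit asc_signal(coordinator_if* coord = 0)
        : cur_val(0.0), new_val(0.0), coord(coord) {}

    // Used by ports of discrete modules: returns the current value
    // (optionally, the coordinator could first bring the cluster up to date).
    double read() const { return cur_val; }

    // Used by ports of signal processing modules.
    double read_a() const { return cur_val; }

    // Called from the discrete event side: the new value becomes visible
    // only in the update phase, so the cluster always sees the value from
    // before the first update cycle.
    void write(double v) {
        new_val = v;
        request_update();
        if (coord) coord->compute_cluster_up_to_now();  // optional re-computation
    }

    // Called from the signal processing cluster, which has already been
    // simulated at this point in time.
    void write_a(double v) { new_val = v; request_update(); }

  protected:
    // Update phase of the SystemC kernel: commit the new value.
    virtual void update() { cur_val = new_val; }

  private:
    double cur_val, new_val;
    coordinator_if* coord;
};

Keeping the commit of new values in update() is the design choice discussed above: it reuses the evaluate/update mechanism of the SystemC kernel for the synchronization between cluster and discrete processes.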
Figure 11.5. In the dynamical mathematical model the coordinator tries to approximate the
continuous time behavior as precisely as possible by choosing delays as small as possible and
executing all blocks in the signal flow’s direction
Figure 11.6. In the computation accurate model the design activities A/D partitioning and
determination of sample rates have replaced the coordinator by a discrete controller that
determines them by means of discrete processes
the signal processing method via the coordinator interface; these signal
processing functions are therefore no longer assigned to a coordinator. The
functions that are intended to be realized in analog are still assigned to the
coordinator, which controls their execution in order to get a good
approximation of their continuous time behavior. Also, a very simple model of
an A/D converter has been introduced in order to model quantization, sampling,
and delay of the converter. The converter still communicates via the interface
of the signal. Note that there is no D/A converter — it is not mandatory for a
simulation, but it can be added like the A/D converter.
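A minimal sketch of such a converter model follows, assuming an N bit converter with one clock cycle of conversion delay; the module, port and parameter names are assumptions, and plain SystemC 2.0 ports are used here instead of the chapter's abstract signal classes.

#include <systemc.h>

// Very simple A/D converter model: samples the analog input value on the
// rising clock edge, quantizes it to 'bits' bits over [-vref, +vref], and
// outputs the code one clock cycle later (conversion delay).
SC_MODULE(simple_adc)
{
    sc_in<double> ain;    // analog input value (non-conservative)
    sc_in<bool>   clk;    // sampling clock
    sc_out<int>   code;   // quantized output code

    SC_HAS_PROCESS(simple_adc);
    simple_adc(sc_module_name n, int bits = 8, double vref = 1.0)
        : sc_module(n), bits(bits), vref(vref), pending(0)
    {
        SC_METHOD(sample);
        sensitive << clk.pos();
        dont_initialize();
    }

  private:
    void sample() {
        code.write(pending);                 // one cycle conversion delay
        double v = ain.read();
        if (v >  vref) v =  vref;            // limitation
        if (v < -vref) v = -vref;
        int full_scale = (1 << (bits - 1)) - 1;
        pending = static_cast<int>(v / vref * full_scale);  // quantization
    }

    int    bits;
    double vref;
    int    pending;
};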
Figure 11.7. In this pin accurate model the abstract communication and synchronization is
translated to ‘physical’ signals by adapters
Figure 11.7 shows the resulting structure. The modules have additional in-
terfaces which are provided by adapter classes. These adapter classes translate
the abstract communication and synchronization into ‘physical’ signals.
Figure 11.8. PWM controller example: The mathematical model used as executable specifi-
cation
The modules get the coordinator as a parameter when they are instantiated
in the constructor of the module pwm:
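As a rough structural sketch only (the submodule and member names are assumptions, not the chapter's code), passing the coordinator down at instantiation looks like this:

#include <systemc.h>

class coordinator;  // controls execution of the analog signal processing blocks

// Hypothetical submodules whose constructors take the coordinator so that
// their signal processing functions can be registered with it.
SC_MODULE(pi_controller) {
    pi_controller(sc_module_name n, coordinator* c) : sc_module(n), coord(c) {}
    coordinator* coord;
};
SC_MODULE(plant_model) {
    plant_model(sc_module_name n, coordinator* c) : sc_module(n), coord(c) {}
    coordinator* coord;
};

// Top level module: the coordinator is handed down to the submodules when
// they are instantiated in the constructor of pwm.
SC_MODULE(pwm) {
    pi_controller pi;
    plant_model   plant;

    pwm(sc_module_name n, coordinator* coord)
        : sc_module(n), pi("pi", coord), plant("plant", coord) {}
};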
The complete listing of the example is in the appendix. Figure 11.9 shows
a step response simulated with the executable specification. The executable
specification, which specifies the ideal dynamical behavior, nearly fits on a
single page. Nevertheless, after full refinement one ends up with several
hundred pages of source code for a complete controller. In the following we
give an overview of some of the refinement steps.
After validating the functional correctness of this design step we replace the
signal by a rather complex model of an A/D converter. Furthermore, we
replace the continuous time PI controller by a discrete time PI controller,
which now works with a fixed sample rate. We also introduce a new block,
controller, which explicitly calls the signal processing functions of the
components to be realized in digital: the adder and the PI controller. The
component to be realized in analog remains under the control of a coordinator.
We also validate the impact of quantization and limitation on the behavior of
the system: the corresponding values can easily be modeled as properties of
each signal. This also applies to signals and states in each block, especially
the discrete controller.
Up to now we have most notably modeled the partitioning of functions onto
analog or digital blocks, quantization, and the timing/scheduling of operations.
By the refinement of interfaces we add a number of ports to each block. Note
that this refinement no longer modifies the overall system behavior. It just
specifies an implementation of the functional description evaluated in the
refinement of computation. In the given example the discrete controller then
realizes:
A bus interface via which it can communicate with other components;
A controller that receives a high frequency clock and generates clock and
enable signals for all digital blocks.
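A minimal sketch of such clock and enable generation follows, assuming a single divided enable rate; the module name, divider value and port names are assumptions.

#include <systemc.h>

// Divides a fast input clock and produces a one-cycle enable pulse every
// 'divide' cycles; digital blocks are clocked with clk_fast and activated
// by their enable signal.
SC_MODULE(enable_gen)
{
    sc_in<bool>  clk_fast;
    sc_out<bool> enable;

    SC_HAS_PROCESS(enable_gen);
    enable_gen(sc_module_name n, unsigned divide = 8)
        : sc_module(n), divide(divide), count(0)
    {
        SC_METHOD(divide_clock);
        sensitive << clk_fast.pos();
        dont_initialize();
    }

  private:
    void divide_clock() {
        count = (count + 1) % divide;
        enable.write(count == 0);   // high for one fast clock cycle
    }

    unsigned divide, count;
};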
Figure 11.11 shows the detailed design of the PWM controller. Note that the
simulation of the discrete controller and of the detailed communication reduces
the simulation performance by orders of magnitude, in exchange for pin accurate
modeling of the communication.
The design shown in figure 11.11 is the starting point for the synthesis of
the digital modules at RT level and for the circuit level design of the analog
part, respectively.
11.6 Conclusions
The methods described permit the modeling of analog and signal processing
systems in SystemC 2.0 at a high level of abstraction, comparable to Simulink.
Furthermore, they enable the design of such systems by a refine/evaluate
approach. Features required for such a design methodology are:
Signals which allow designers to connect non-conservative, continuous
time signals with discrete time or discrete event signals in a direct way.
This supports interactive design, because there is no need to model
converters explicitly. Furthermore, it allows the designer to model
one part of the design by a rather abstract block diagram with non-
conservative signals, connected with a design of other blocks which al-
ready use the discrete event signals of SystemC 2.0;
A well defined and generic interface between the coordinator and the signal
processing blocks, which permits one to refine the computation. In this context
the use of relaxation methods for distributed simulation permits designers
to combine different (and changing!) models of computation for the modeling
of a system at different levels of abstraction.
Acknowledgments
The author would like to thank Alain Vachoux, Karsten Einwich, Peter
Schwarz, and Klaus Waldschmidt for many interesting and valuable discus-
sions.
References
[Börger and Stärk, 2003] Börger, E. and Stärk, R. (2003). Abstract State Ma-
chines - A Method for High-Level System Design and Analysis. Springer
Verlag, Berlin/Heidelberg/New York.
[Burns and Wellings, 1998] Burns, A. and Wellings, A. (1998). Concurrency
in Ada. Cambridge University Press, Cambridge.
[Cadence Design Systems, 2002] Cadence Design Systems (2002). Virtual
Component Co-Design (VCC). www.cadence.com/datasheets.
[Carroll and Ellis, 1995] Carroll, M. D. and Ellis, M. A. (1995). Designing
and Coding Reusable C++. Addison-Wesley, Boston.
[Cesario et al., 2002] Cesario, W., Baghdadi, A., Gauthier, L., Lyonnard, D.,
Nicolescu, G., Paviot, Y., Yoo, S., Jerraya, A., and Diaz-Nava, M. (2002).
Component-Based Design Approach for Multicore SoCs. In Proc. of the
Design Automation Conference (DAC’02), New Orleans, USA. IEEE CS
Press, Los Alamitos.
[Chakraborty and Gosh, 1988] Chakraborty, T.J. and Gosh, S. (1988). On Be-
havior Fault Modeling for Combinational Digital Designs. In IEEE Inter-
national Test Conference (ITC’88), Washington, USA. IEEE CS Press, Los
Alamitos.
[Chang et al., 1999] Chang, H., Cooke, L., Hunt, M., Martin, G., McNelly, A.,
and Todd, L. (1999). Surviving the SoC Revolution: A Guide to Platform-
Based Design. Kluwer Academic Publishers, Boston/Dordrecht/London.
[Cheng et al., 1999] Cheng, K.T., Huang, S.Y., and Dai, W.J. (1999). Fault Em-
ulation: A New Methodology for Fault Grading. IEEE Trans. on Computer
Aided Design, 18(10).
[Cheng and Krishnakumar, 1996] Cheng, K.T. and Krishnakumar, A.S.
(1996). Automatic Generation of Functional Vectors Using the Extended
Finite State Machine Model. ACM Trans. on Design Automation of Elec-
tronic Systems, 1(1).
[Chevallaz et al., 2002] Chevallaz, C., Mareau, N., and Gonier, A. (2002). Ad-
vanced Methods for SoC Concurrent Engineering. In Proc. of Design, Au-
tomation and Test in Europe — Designer Forum (DATE ’02), Paris, France.
IEEE CS Press, Los Alamitos.
[Chiou et al., 1999] Chiou, D., Jain, P., Rudolph, L., and Devadas, S. (1999).
Application-Specific Memory Management for Embedded Systems Using
Software-Controlled Caches. In Proc. of the Design Automation Conference
(DAC’99), New Orleans, USA. IEEE CS Press, Los Alamitos.
[Civera et al., 2001] Civera, P., Macchiarulo, L., Rebaudengo, M., Reorda,
M. Sonza, and Violante, M. (2001). Exploiting Circuit Emulation for Fast
Hardness Evaluation. IEEE Trans. on Nuclear Science, 48(6).
[Clouard et al., 2002] Clouard, A., Mastrorocco, G., Carbognani, F., Perrin,
A., and Ghenassia, F. (2002). Towards Bridging the Gap between SoC
Transactional and Cycle-Accurate Levels. In Proc. of Design, Automation
and Test in Europe — Designer Forum (DATE’02), Paris, France. IEEE CS
Press, Los Alamitos.
[Compaq et al., 2000] Compaq, Hewlett-Packard, Intel, Lucent, Microsoft,
NEC, and Philips (2000). Universal Serial Bus Specification, Revision 2.0.
[Corno et al., 1997] Corno, F., Prinetto, P., and Reorda, M.S. (1997). Testabil-
ity Analysis and ATPG on Behavioral RT-Level VHDL. In IEEE Interna-
tional Test Conference (ITC’97), Washington, USA.
[CoWare, 2002] CoWare (2002). Homepage. www.coware.com.
[de Man, 2002] de Man, H. (2002). On Nanoscale Integration and Gigascale
Complexity in the Post.com world. In Design, Automation and Test in Europe
(DATE’02), Paris, France. IEEE CS Press, Los Alamitos.
[Delgado Kloos and Breuer, 1995] Delgado Kloos, C. and Breuer, P. T. (1995).
Formal Semantics for VHDL. Kluwer Academic Publishers, Boston/Lon-
don/Dordrecht.
[Desmet et al., 2000] Desmet, D., Verkest, D., and de Man, H. (2000). Oper-
ating System Based Software Generation for Systems-On-Chip. In Proc. of
the Design Automation Conference (DAC’00), Los Angeles, USA. IEEE CS
Press, Los Alamitos.
[Dey and Bommu, 1997] Dey, S. and Bommu, S. (1997). Performance Anal-
ysis of a System of Communicating Processes. In Proc. of the International
Conference on Computer Aided Design (ICCAD’97), San Jose, USA. IEEE
CS Press, Los Alamitos.
[Einwich et al., 2001] Einwich, K., Clauss, Ch., Noessing, G., Schwarz, P., and
Zojer, H. (2001). SystemC Extensions for Mixed-Signal System Design. In
Proc. of the Forum on Specification & Design Languages (FDL’01), Lyon,
France.
[Einwich et al., 2002a] Einwich, K., Grimm, Ch., Vachoux, A., Martinez-
Madrid, N., Moreno, F.R., and Meise, Ch. (2002a). Analog and Mixed
Signal Extensions for SystemC. White Paper of the OSCI SystemC-AMS
Working Group.
[Einwich et al., 2002b] Einwich, K., Schwarz, P., Grimm, Ch., and Wald-
schmidt, K. (2002b). Mixed-Signal Extensions for SystemC. In Proc. of the
Forum on Specification & Design Languages (FDL’02), Marseille, France.
[Ellervee et al., 2000] Ellervee, P., Miranda, M., Catthoor, F., and Hemani, A.
(2000). System Level Data Format Exploration for Dynamically Allocated
Data Structures. In Proc. of the Design Automation Conference (DAC’00),
Los Angeles, USA. IEEE CS Press, Los Alamitos.
[Ernst et al., 2002] Ernst, R., Richter, K., Haubelt, C., and Teich, J. (2002).
System Design for Flexibility. In Proc. of Design, Automation and Test in
Europe (DATE’02), Paris, France. IEEE CS Press, Los Alamitos.
[ETSI, 2000a] ETSI (2000a). Broadband Radio Access Networks (BRAN),
HIPERLAN Type 2, Data Link Control (DLC) Layer, Part 1: Basic Data
Transport Functions, ETSI TS 101 761-1.
[ETSI, 2000b] ETSI (2000b). Broadband Radio Access Networks (BRAN),
HIPERLAN Type 2, System Overview, ETSI TR 101 683 (V1.1.2).
[Fernández et al., 2002] Fernández, V., Herrera, F., Sánchez, P., and Villar, E.
(2002). Conclusiones: Metodología Industrial Para Codiseño de Sistemas
Embebidos HW/SW. Deliverable Document: DFFEDER1FD97-0791, Uni-
versity of Cantabria, Cantabria, Spain.
[Ferrandi et al., 2002a] Ferrandi, F., Fummi, F., and Sciuto, D. (2002a). Test
Generation and Testability Alternatives Exploration of Critical Algorithms
for Embedded Applications. IEEE Trans. on Computers, 51(2).
[Ferrandi et al., 2002b] Ferrandi, F., Rendine, M., and Sciuto, D. (2002b).
Functional Verification for SystemC Descriptions using Constraint Solv-
ing. In Proc. of Design, Automation and Test in Europe (DATE’02), Paris,
France. IEEE CS Press, Los Alamitos.
[Fin et al., 2002] Fin, A., Fummi, F., Galavotti, M., Pravadelli, G., Rossi, U.,
and Toto, F. (2002). Mixing ATPG and Property Checking for Testing
HW/SW Interfaces. In Proc. of the IEEE European Test Workshop (ETW’02),
Corfu, Greece. IEEE CS Press, Los Alamitos.
[Fin et al., 2001a] Fin, A., Fummi, F., Martignano, M., and Signoretto, M.
(2001a). SystemC: A Homogeneous Environment to Test Embedded Sys-
tems. In Proc. of the IEEE International Symposium on Hardware/Software
Codesign (CODES’01), Copenhagen, Denmark.
[Fin et al., 2001b] Fin, A., Fummi, F., and Pravadelli, G. (2001b). AMLETO:
A Multi-Language Environment for Functional Test Generation. In IEEE
[Gajski et al., 1994a] Gajski, D., Vahid, F., and Narayan, S. (1994a). A System-
Design Methodology: Executable-Specification Refinement. In Proc. of the
European Design Automation Conference (EURO-DAC’94), Paris, France.
IEEE CS Press, Los Alamitos.
[Gajski et al., 1997] Gajski, D., Zhu, J., and Dömer, R. (1997). The SpecC+
Language. Technical Report ICS-TR-97-15, University of California, Irvine,
Irvine, USA.
[Gajski et al., 2000] Gajski, D., Zhu, J., Dömer, R., Gerstlauer, A., and Zhao,
S. (2000). SpecC: Specification Language and Methodology. Kluwer Aca-
demic Publishers, Boston/Dordrecht/London.
[Gajski et al., 1994b] Gajski, Daniel D., Vahid, Frank, and Narayan, Sanjiv
(1994b). Specification and Design of Embedded System. Prentice Hall,
Englewood Cliffs.
[Gauthier et al., 2001] Gauthier, L., Yoo, S., and Jerraya, A. (2001). Automatic
Generation and Targeting of Application Specific Operating Systems and
Embedded System Software. In Proc. of Design, Automation and Test in
Europe (DATE’01), Munich, Germany. IEEE CS Press, Los Alamitos.
[Glässer et al., 1999] Glässer, U., Gotzhein, R., and Prinz, A. (1999). Towards
a New Formal SDL Semantics Based on Abstract State Machines. SDL ’99
- The Next Millenium, 9th SDL Forum Proceedings.
[Gosh, 1988] Gosh, S. (1988). Behavior-Level Fault Simulation. IEEE Design
&Test of Computers, 5(3).
[Gosh and Chakraborty, 1991] Gosh, S. and Chakraborty, T.J. (1991). On Be-
havior Fault Modeling for Digital Designs. Journal of Electronic Testing:
Theory and Applications (JETTA), 2(2).
[Gracia et al., 2001] Gracia, J., Baraza, J.C., Gil, D., and Gil, P.J. (2001). Com-
parison and Application of Different VHDL-Based Fault Injection Tech-
niques. In Proc. of the IEEE International Symposium on Defect and Fault
Tolerance in VLSI Systems (DFT’01), San Francisco, USA.
[Grimm et al., 2003] Grimm, Ch., Heupke, W., Meise, Ch., and Waldschmidt,
K. (2003). Refinement of Mixed-Signal Systems with SystemC. In Proc.
of Design, Automation and Test in Europe (DATE’03), Munich, Germany.
IEEE CS Press, Los Alamitos.
[Grimm et al., 2002] Grimm, Ch., Oehler, P., Meise, Ch., Waldschmidt, K.,
and Fey, W. (2002). AnalogSL: A Library for Modeling Analog Power
Drivers with C++. In System on Chip Design Languages. Kluwer Academic
Publishers, Boston/London/Dordrecht.
[Grimpe et al., 2002] Grimpe, E., Biniasch, R., Fandrey, T., Oppenheimer, F.,
and Timmermann, B. (2002). SystemC Object-Oriented Extensions and Syn-
thesis Features. In Proc. of the Forum on Specification & Design Languages
(FDL’02), Marseille, France.
[Grötker, 2002] Grötker, T. (2002). Modeling Software with SystemC 3.0. In
6th European SystemC Users Group Meeting.
[Grötker, 2002] Grötker, T. (2002). Reducing the SoC Design Cycle:
Transaction-Level SoC Verification with SystemC. www.synopsys.com.
[Grötker et al., 2002] Grötker, T., Liao, S., Martin, G., and Swan, S. (2002).
System Design with SystemC. Kluwer Academic Publishers, Boston/Lon-
don/Dordrecht.
[Gupta, 1995] Gupta, R. (1995). Co-Synthesis of Hardware and Software for
Digital Embedded Systems. Kluwer Academic Publishers, Boston/London/
Dordrecht.
[Gupta et al., 2000] Gupta, S., Miranda, M., Catthoor, F., and Gupta, R. (2000).
Analysis of High-level Address Code Transformations for Programmable
[Harris and Zhang, 2001] Harris, I.G. and Zhang, Q. (2001). A Validation
Fault Model for Timing-Induced Functional Errors. In IEEE International
Test Conference (ITC’01), Baltimore, USA.
[Hashmi and Bruce, 1995] Hashmi, M.M.K. and Bruce, A.C. (1995). Design
and Use of a System Level Specification and Verification Methodology. In
Proc. of the European Design Automation Conference (EDAC’95), Brighton,
UK.
[Haverinen et al., 2002] Haverinen, A., Leclercq, M., Weyrich, N., and Win-
gard, D. (2002). SystemC-Based SoC Communication Modeling for the
OCP Protocol. www.ocpip.org.
[Herrera et al., 2003] Herrera, F., Posadas, H., Sánchez, P., and Villar, E.
(2003). Systematic Embedded Software Generation from SystemC. In Proc.
of Design, Automation and Test in Europe (Date ’03), Munich, Germany.
IEEE CS Press, Los Alamitos.
[Jacobson et al., 1992] Jacobson, I., Christerson, M., Jonsson, P., and Oever-
gaard, G. (1992). Object-Oriented Software Engineering: A Use Case Driven
Approach. Addison-Wesley, Boston.
[Jain et al., 2001] Jain, P., Devadas, S., Engels, D., and Rupoldh, L. (2001).
Software-Assisted Cache Replacement Mechanisms for Embedded System.
In Proc. of the International Conference on Computer Aided Design (IC-
CAD’01), San Jose, USA. IEEE CS Press, Los Alamitos.
[Jenn et al., 1994] Jenn, E., Arlat, J., Rimen, M., Ohlsson, J., and Karlsson, J.
(1994). Fault Injection into VHDL Models: The MEFISTO Tool. In IEEE
International Symposium on Fault-Tolerant Computing (FTCS’94), Seattle,
USA.
[Jeruchim et al., 1992] Jeruchim, M.C., Balaban, P., and Shanmugan, K.S.
(1992). Simulation of Communication Systems. Plenum Press, New York.
[Jiang and Brayton, 2002] Jiang, Y. and Brayton, R. (2002). Software Synthe-
sis from Synchronous Specifications Using Logic Simulation Techniques. In
Proc. of the Design Automation Conference (DAC’02), New Orleans, USA.
IEEE CS Press, Los Alamitos.
[Keutzer et al., 2000] Keutzer, K., Malik, S., Newton, R., Rabacy, J., and
Sangiovanni-Vincentelli, A. (2000). System Level Design: Orthogonaliza-
tion of Concerns and Platform Based Design. IEEE Trans. On Computer-
Aided Design of Circuits and Systems, 19(12).
[Kjeldsberg et al., 2001] Kjeldsberg, P. G., Catthoor, F., and Aas, E. (2001).
Detection of Partially Simultaneously Alive Signals in Storage Requirement
Estimation for Data Intensive Applications. In Proc. of the Design Automa-
tion Conference (DAC’01), Las Vegas, USA. IEEE CS Press, Los Alamitos.
[Knudsen and Madsen, 1998] Knudsen, P.V. and Madsen, J. (1998). Integrat-
ing Communication Protocol Selection with Hardware/Software Codesign.
In Proc. of the International Symposium on System Level Synthesis (ISSS ’98),
Hsinchu, Taiwan.
[Knudsen and Madsen, 1999] Knudsen, P.V. and Madsen, J. (1999). Graph-
Based Communication Analysis for Hardware/Software Codesign. In Proc.
of the International Workshop on HW/SW Codesign, Rome, Italy.
[Kuhn et al., 1998] Kuhn, T., Rosenstiel, W., and Kebschull, U. (1998). Object
Oriented Hardware Modeling and Simulation Based on Java. In Proc. of the
International Workshop on IP Based Synthesis and System Design, Grenoble,
France.
[Lahiri et al., 1999] Lahiri, K., Raghunathan, A., and Dey, S. (1999). Fast
Performance Analysis of Bus-Based System-On-Chip Communication Ar-
chitectures. In Proc. of the International Conference on Computer Aided
Design (ICCAD’99), San Jose, USA.
[Lahiri et al., 2001] Lahiri, K., Raghunathan, A., and Dey, S. (2001). System-
Level Performance Analysis for Designing On-Chip Communication Ar-
chitectures. IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems, 20(6).
[Lee, 2000] Lee, E.A. (2000). What’s Ahead for Embedded Software? IEEE
Computer, 33(9).
[Lee and Sangiovanni-Vincentelli, 1998] Lee, E.A. and Sangiovanni-Vincen-
telli, A. (1998). A Framework for Comparing Models of Computation.
IEEE Trans. On Computer-Aided Design of Integrated Circuits and Systems,
17(12).
[Lekatsas et al., 2000] Lekatsas, H., Henkel, J., and Wolf, W. (2000). Code
Compression as a Variable in Hardware/Software Co-Design. In Proceedings
of the International Symposium on Hardware/Software Co-Design
(CODES’00), San Diego, USA.
[Leveugle, 2000] Leveugle, R. (2000). Fault Injection in VHDL Descriptions
and Emulation. In Proc. of the IEEE International Symposium on Defect
and Fault Tolerance in VLSI Systems (DFT’00), Antwerpen, Belgium.
[Lieverse et al., 2001] Lieverse, P., Wolf, P.V., and Deprettere, E. (2001). A
Trace Transformation Technique for Communication Refinement. In Inter-
national Workshop on HW/SW Codesign, Copenhagen, Denmark.
[Liu et al., 1999] Liu, J., Wu, B., Liu, X., and Lee, E. A. (1999). Interoperation
of Heterogeneous CAD Tools in Ptolemy II. In Symposium on Design, Test
and Microfabrication of MEMS/MOEMS (DTM’99), Paris, France.
[Lopez et al., 1993] Lopez, J.C., Hermida, R., and Geisselhardt, W. (1993).
Advanced Techniques for Embedded System Design and Test. Kluwer Aca-
demic Publishers, Boston/London/Dordrecht.
[Lyons, 1997] Lyons, Richard G. (1997). Understanding Digital Signal Pro-
cessing. Addison-Wesley, Boston.
[Malaiya et al., 1995] Malaiya, Y.K., Li, M.N., Bieman, J.M., and Karcich, R.
(1995). Generation of Design Verification Tests from Behavioral VHDL Pro-
grams Using Path Enumeration and Constraint Programming. IEEE Trans.
on VLSI Systems, 3(2).
[MathWorks, 1993] The Math Works (1993). The Matlab EXPO: An Introduc-
tion to MATLAB, SIMULINK, and the MATLAB Application Toolboxes.
[Müller et al., 2002] Müller, W., Dömer, R., and Gerstlauer, A. (2002). The
Formal Execution Semantics of SpecC. In Proc. of the International Sym-
posium on System Synthesis (ISSS’02), Kyoto, Japan.
[Müller et al., 2001] Müller, W., Ruf, J., Hofmann, D., Gerlach, J., Kropf, Th.,
and Rosenstiel, W. (2001). The Simulation Semantics of SystemC. In Proc.
of Design, Automation and Test in Europe (DATE’01), Munich, Germany.
IEEE CS Press, Los Alamitos.
[Myers, 1999] Myers, G.J. (1999). The Art of Software Testing. Wiley & Sons,
New York.
[Nagel et al., 2001] Nagel, P., Leyh, M., and Speitel, M. (2001). Using Behav-
ioral Compiler and FPGA Prototyping for the Development of an OFDM
Demodulator. In User Papers of Eleventh Annual Synopsys Users Group
Conference (SNUG), San Jose, USA.
[Niemann and Speitel, 2001] Niemann, B. and Speitel, M. (2001). The Ap-
plication of SystemC Compiler in an Industrial Project — Modeling a GPS
Receiver Using SystemC. In User Papers of Synopsys Users Group Europe
(SNUG), Munich, Germany.
[Nikkei, 2001] Nikkei (2001). Performance Analysis of a SoC Interconnect
using SysProbe. Nikkei Electronics Magazine.
[Öberg et al., 1996] Öberg, J., Kumar, A., and Hemani, A. (1996). Grammar-
based hardware synthesis of data communication protocols. In Proc. of the
9th International Symposium on System Synthesis (ISSS’96), La Jolla, USA.
[OMG, 2001] OMG (2001). Unified Modeling Language Specification v1.4.
www.omg.org.
[OSCI, 2000] OSCI (2000). SystemC Version 1.0 User’s Guide. Synopsys Inc,
CoWare Inc, Frontier Inc. www.systemc.org.
[OSCI, 2002a] OSCI (2002a). SystemC Version 2.0 Functional Specification.
Synopsys Inc, CoWare Inc, Frontier Inc. www.systemc.org.
[OSCI, 2002b] OSCI (2002b). SystemC Version 2.0 User’s Guide. Synopsys
Inc, CoWare Inc, Frontier Inc. www.systemc.org.
[OSCI, 2002a] OSCI (2002a). SystemC Version 2.0 User’s Guide. Update for
SystemC 2.0.1. Open SystemC Initiative. www.systemc.org.
[OSCI, 2002b] OSCI (2002b). Version 2.0.1 Master-Slave Communica-
tion Library - A SystemC Standard Library. Open SystemC Initiative.
www.systemc.org.
[OSCI, 2002c] OSCI WG Verification (2002c). SystemC Verification Standard
Specification V1.0b. www.systemc.org.
[Passerone et al., 2001] Passerone, C., Watanabe, Y., and Lavagno, L. (2001).
Generation of Minimal Size Code for Schedule Graphs. In Proc. of Design,
Automation and Test in Europe (DATE’01), Munich, Germany. IEEE CS
Press, Los Alamitos.
[Pazos et al., 2002] Pazos, N., Brunnbauer, W., Foag, J., and Wild, T.
(2002). System-Based Performance Estimation of Multi-Processing, Multi-
Threading SoC Networking Architecture. In Proc. of the Forum on Specifi-
cation & Design Languages (FDL’02), Marseille, France.
[Petzold, 1983] Petzold, L. R. (1983). A Description of DASSL: A Differential-
Algebraic System Solver. In Stepleman, R. S., editor, Scientific Computing.
North-Holland, Amsterdam.
[Pop et al., 2000] Pop, P., Eles, P., and Peng, Z. (2000). Performance Esti-
mation for Embedded Systems with Data and Control Dependencies. In
International Workshop on HW/SW Codesign, San Diego, USA.
[Pross, 2002] Pross, U. (2002). Modellierung und Systemsimulation des Uni-
versal Serial Bus Protokoll. Master’s thesis, Technische Universität Chem-
nitz, Chemnitz, Germany.
[Radetzki, 2000] Radetzki, M. (2000). Synthesis of Digital Circuits from
Object-Oriented Specifications. PhD thesis, University of Oldenburg, Old-
enburg, Germany.
[Radetzki et al., 1998] Radetzki, M., Putzke-Röming, W., and Nebel, W.
(1998). Objective VHDL: The Object-Oriented Approach to Hardware
Reuse. In Roger, J.-Y., Stanford-Smith, B., and Kidd, P. T., editors, Ad-
vances in Information Technologies: The Business Challenge, Amsterdam,
Netherlands. IOS Press.
[Rational, 2001] Rational (2001). Rational Unified Process Whitepaper: Best
Practices for Software Development Teams.
[Rowson and Sangiovanni-Vincentelli, 1997] Rowson, J.A. and Sangiovanni-
Vincentelli, A. (1997). Interface-Based Design. In Proc. of the Design
Automation Conference (DAC’97), Anaheim, USA. IEEE CS Press, Los
Alamitos.
[Salem and Shams, 2003] Salem, A. and Shams, A. (2003). The Formal Se-
mantics of Synchronous SystemC. In Proc. of Design, Automation and Test
in Europe (DATE’03), Munich, Germany. IEEE CS Press, Los Alamitos.
[Sangiovanni-Vincentelli, 2002a] Sangiovanni-Vincentelli, A. (2002a). Defin-
ing Platform-Based Design. EEDesign of EETimes.
[Sangiovanni-Vincentelli, 2002b] Sangiovanni-Vincentelli, A. (2002b). The
Context for Platform-Based Design. IEEE Design & Test of Computer,
19(6).
[Sasaki, 1999] Sasaki, H. (1999). A Formal Semantics for Verilog-VHDL
Simulation Interoperability by Abstract State Machine. In Proc. of Design,
Automation and Test in Europe (DATE’99), Munich, Germany. IEEE CS
Press, Los Alamitos.
[Sasaki et al., 1997] Sasaki, H., Mizushima, K., and Sasaki, T. (1997). Se-
mantic Validation of VHDL-AMS by an Abstract State Machine. In
IEEE/VIUF International Workshop on Behavioral Modeling and Simula-
tion (BMAS’97), Arlington, USA.
338
[Sayinta et al., 2003] Sayinta, A., Canverdi, G., Pauwels, M., Alshawa, A., and
Dehaene, W. (2003). A Mixed Abstraction Level Co-Simulation Case Study
Using SystemC for System-On-Chip Verification. In Proc. of Design, Au-
tomation and Test in Europe (DATE’03), Munich, Germany. IEEE CS Press,
Los Alamitos.
[Scanlon, 2002] Scanlon, T. (2002). Global Responsibilities in System-On-
Chip Design. In Proc. of Design, Automation and Test in Europe (DATE’02),
Paris, France. IEEE CS Press, Los Alamitos.
[Schumacher, 1999] Schumacher, G. (1999). Object-Oriented Hardware Spec-
ification and Design with a Language Extension to VHDL. PhD thesis,
University of Oldenburg, Oldenburg, Germany.
[Semeria and Ghosh, 1999] Semeria, L. and Ghosh, A. (1999). Methodology
for Hardware/Software Co-verification in C/C++. In Proc. of the High-Level
Design Validation and Test Workshop (HLDVT’99), Oakland, USA.
[Sgroi et al., 1999] Sgroi, M., Lavagno, L., Watanabe, Y., and Sangiovanni-
Vincentelli, A. (1999). Synthesis of Embedded Software Using Free-Choice
Petri Nets. In Proc. of the Design Automation Conference (DAC’99), New
Orleans, USA. IEEE CS Press, Los Alamitos.
[Shiue, 2001] Shiue, W.T. (2001). Retargeable Compilation for Low Power.
In Proc. of the International Symposium on Hardware/Software Co-Design
(CODES’01), Copenhagen, Denmark.
[Siegmund and Müller, 2001] Siegmund, R. and Müller, D. (2001). SystemC-
SV: Extension of SystemC for Mixed Multi Level Communication Modeling
and Interface-based System Design. In Proc. of Design, Automation and Test
in Europe (DATE’01), Munich, Germany. IEEE CS Press, Los Alamitos.
[Siegmund and Müller, 2002] Siegmund, R. and Müller, D. (2002). A Novel
Synthesis Technique for Communication Controller Hardware from declar-
ative Data Communication Protocol Specifications. In Proc. of the Design
Automation Conference (DAC’02), New Orleans, USA. IEEE CS Press, Los
Alamitos.
[Sieh et al., 1997] Sieh, V., Tschäche, O., and Balbach, F. (1997). VERIFY:
Evaluation of Reliability Using VHDL-Models with Embedded Fault De-
scriptions. In IEEE International Symposium on Fault-Tolerant Computing
(FTCS’97), Seattle, USA.
[Sigwarth et al., 2002] Sigwarth, C., Mayer, F., Schlicht, M., Kilian, G., and
Heuberger, A. (2002). ASIC Implementation of a Receiver Chipset for the
Broadcasting in Long-, Medium- and Shortwave Bands (DRM). In Proc.
[Turner, 1993] Turner, K. J., editor (1993). Using Formal Description Tech-
niques. An Introduction to ESTELLE, LOTOS and SDL. John Wiley & Sons,
New York.
[University of Cincinnati, 1999] University of Cincinnati, Dept. of ECECS
(1999). Savant Programmer’s Manual.
[Vachoux et al., 2003a] Vachoux, A., Grimm, Ch., and Einwich, K. (2003a).
Analog and Mixed-Signal Modeling with SystemC-AMS. In International
Symposion on Circuits and Systems 2003 (ISCAS ’03), Bangkok, Thailand.
[Vachoux et al., 2003b] Vachoux, A., Grimm, Ch., and Einwich, K. (2003b).
SystemC-AMS Requirements, Design Objectives and Rationale. In Proc.
of Design, Automation and Test in Europe (DATE ’03), Munich, Germany.
IEEE CS Press, Los Alamitos.
[Valderrama et al., 1996] Valderrama, C.A., Nacabal, F., Paulin, P., Jerraya,
A.A., Attia, M., and Cayrol, O. (1996). Automatic Generation of Interfaces
for Distributed C-VHDL Cosimulation of Embedded Systems: An Industrial
Experience. In IEEE Workshop on Rapid System Prototyping (RSP ’96),
Thessaloniki, Greece.
[van Nee and Prasad, 2000] van Nee, R. and Prasad, R. (2000). OFDM for
Wireless Multimedia Communications. Artech House Publishers, Boston.
[Vanderperren et al., 2002] Vanderperren, Y., Dehaene, W., and Pauwels, M.
(2002). A Method for the Development of Combined Floating- and Fixed-
Point SystemC Models. In Proc. of the Forum on Specification & Description
Languages (FDL’02), Marseille, France.
[Vanmeerbeeck et al., 2001] Vanmeerbeeck, G., Schaumont, P., Vernalde, S.,
Engels, M., and Bolsens, I. (2001). Hardware/Software Partitioning for Em-
bedded Systems in OCAPI-xl. In Proc. of the IEEE International Symposium
on Hardware/Software Codesign (CODES’01), Copenhagen, Denmark.
[Vercauteren et al., 1996] Vercauteren, S., Lin, B., and de Man, H (1996). A
Strategy for Real-Time Kernel Support in Application-Specific HW/SW
Embedded Architectures. In Proc. of the Design Automation Conference
(DAC’96), Las Vegas, USA. IEEE CS Press, Los Alamitos.
[Vernalde et al., 1999] Vernalde, S., Schaumont, P., and Bolsens, I. (1999). An
Object Oriented Programming Approach for Hardware Design. In IEEE
Workshop on VLSI, Orlando, USA.
[Villar, 2001] Villar, E. (2001). Design of Hardware/Software Embedded Sys-
tems. Servicio de Publicaciones de la Universidad de Cantabria, Cantabria,
Spain.
Index
update, 104, 106, 115 coverage, 133
update request, 104, 106 node, 163
CHOOSE, 101 conservative system, 276
client process, 230 constraints, 247, 249–250, 253–254, 257, 259
client/server architecture, 244 constructor, 80, 264, 70, 75–76
cluster, 301 copy, 76
signal processing, 301 default, 76
co-design, 189, 248, 251, 259, 270 context switch, 77
flow, 248 control system, 289
tools, 250 controllability, 137
co-emulation, 149 controller, 287
master-slave, 149 converter, 296
multiple-fault, 150 coordinator, 301
sequence-based, 150 generic, 306
co-simulation, 40, 144, 262, 272 source code, 306
instruction set simulator, 11, 21, 25 COSSAP, 293, 300
platform, 184 COSYNE, 215
VHDL, 8, 21, 24 counter example, 142
CoCentric System Studio, 71–72 CPG, 158, 161
codec, 52 crosscompiler, 261, 263, 272
combinational loop, 278 cycle accurate, 8, 10–11, 21, 58
command bus arbiter, 164
data flow
command event arbiter, 164
static, 280, 303
communicate, 35
scheduler, 307
communication item, 193
data types
attributes, 193
fixed point, 11–12, 84, 88, 92
composition, 195–196
floating point, 84, 87, 90
compositional behaviour, 196, 213
typedef, 73, 77, 94
constructor, 194
user defined, 76
declarative specification, 195
debug, 42
decompositional behaviour, 196, 213
delays, 278
embedded sequential behaviors, 199
denotational semantics, 99
frame, 194
description
message, 194 input, 249
parameter, 194 RT, 258, 270
PHYMAP, 195 single C++, 261
transaction, 194 system level, 251–252, 260
communication media, 172 design errors, 129
communication node, 164 design rule checking, 9
command bus arbiter, 171 design
context event arbiter, 171 example, 269
external, 164, 171 flow, 251
internal, 164, 171 framework, 253, 259
complex mixer, 73–75, 80, 87 platform based, 250–251
computation node, 163 system level, 252
convergence paths, 167 difference equation, 277
multiple instances, 169 differential equation (DE), 276
scheduling, 171 differential algebraic equation (DAE), 276
sequence, 167 digital filter, 283
shared element, 169 Dirac impulse, 288
implementation, 169 discrete event, 279
computational model, 253, 257 discrete process, 275
SystemC, 257–258 discrete time transfer function, 284
untimed, 258 DMA, 59
concurrency, 265, 269 dont_initialize, 103, 113, 118
support, 265 dynamic dispatching, 225
condition synthesis, 242
dynamic module, 276 functional verification, 37