2005 Book DesignOfEmbeddedControlSystems PDF
Design of Embedded
Control Systems
TJ223.M53D47 2005
629.8—dc22
2004062635
All rights reserved. This work may not be translated or copied in whole or in part without the written
permission of the publisher (Springer Science+Business Media, Inc., 233 Spring Street, New York,
NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in
connection with any form of information storage and retrieval, electronic adaptation, computer software,
or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks and similar terms, even if they are not
identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to
proprietary rights.
Printed in the United States of America (TB/IBT)
springeronline.com
About the Editors
A. Zakrevskij
Contents
Foreword
12. Design of Embedded Control Systems Using Hybrid Petri Nets
Thorsten Hummel / Wolfgang Fengler
Index
Section I
Specification of Concurrent
Embedded Control Systems
Chapter 1
USING SEQUENTS FOR DESCRIPTION
OF CONCURRENT DIGITAL SYSTEMS
BEHAVIOR
Arkadij Zakrevskij
United Institute of Informatics Problems of the National Academy of Sciences of Belarus,
Surganov Str. 6, 220012, Minsk, Belarus; e-mail: [email protected]
Key words: logical control; behavior level; simple event; sequent automaton; PLA implementation;
concurrency; correctness.
1. INTRODUCTION
The development of modern technology results in complex engineering systems
consisting of many digital units that work in parallel and often asynchronously.
In many cases they exchange information by means of binary signals represented
by Boolean variables, and logical control devices (LCDs) are used to maintain
proper interaction between them. The design of such a device begins with defining
the desired behavior of the considered system and formulating a corresponding
logical control algorithm (LCA) that must be implemented by the control device.
The well-known Petri net formalism is often used for this purpose.
It is worth noting, however, that the main results of the theory of Petri nets
were obtained for pure Petri nets, which are nothing more than sets of ordered
pairs of elements of some finite set, interpreted in a special way. To use a
Petri net for LCA representation, some logical conditions and operations must
be added. That is why various extensions of Petri nets have been proposed.
Their common feature is that some logical variables are assigned to elements
of the Petri net structure: places, transitions, arcs, and even tokens. This makes
it possible to represent rather complicated LCAs by extended Petri nets, but at
the cost of losing the vital theoretical support.
These considerations motivated the development of a new approach to LCA
representation [11], where Petri nets were applied together with cause-effect
relations between simple discrete events (represented by elementary conjunctions).
In that approach only the simplest kind of Petri net is considered, where
arithmetic operations (used for counting the current number of tokens in a place)
are replaced by set operations, which are more convenient when solving the logical
problems of verification and implementation of control algorithms.
Following this approach, the special language PRALU was proposed
for LCA representation and used as the input language in an experimental
CAD system for LCDs [12]. A fully automated technology of LCD design was
suggested, beginning with the representation of an LCA in PRALU and using
an intermediate formal model called the sequent automaton [3–8]. A brief review of
this model is given below.
Besides these, many more events of other types may be taken into consid-
eration. Generally, every subset of BS(W ) may be interpreted as an event that
occurs when some element from BS(W ) is realized; i.e., when the variables from
W possess the corresponding combination of values. In this general case the
event is called complicated and could be presented by the characteristic Boolean
function of the regarded subset. Therefore, the number of complicated events
coincides with the number of arbitrary Boolean functions of |W| variables.
From the practical point of view, the following two types of events deserve
special consideration: basic events and simple events.
Basic events are represented by literals (symbols of variables or their negations)
and occur when these variables take on the corresponding values. For example,
basic event $a$ occurs when variable $a$ equals 1, and event $\bar{c}$ occurs when
$c = 0$. The number of different basic events is $2|W|$.
Simple events are represented by elementary conjunctions and occur when
these conjunctions take value 1. For example, event $a\bar{b}f$ occurs when $a = 1$,
$b = 0$, and $f = 1$. The number of different simple events is $3^{|W|}$, including the
trivial event, when the values of all variables are arbitrary.
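The counts given above can be checked by brute force. The following is a minimal sketch, assuming an illustrative variable set W = {a, b, c} and a hypothetical helper `simple_events`; a conjunction is stored as a dict that maps each variable it mentions to its required value.

```python
from itertools import product

def simple_events(variables):
    """Yield every elementary conjunction over `variables` as a dict.

    Each variable is required to be 1, required to be 0, or absent
    (absent variables are simply left out of the dict).
    """
    for choice in product((1, 0, None), repeat=len(variables)):
        yield {v: c for v, c in zip(variables, choice) if c is not None}

W = ["a", "b", "c"]                                    # illustrative variable set
events = list(simple_events(W))
basic = [e for e in events if len(e) == 1]             # single literals
elementary = [e for e in events if len(e) == len(W)]   # all variables fixed

print(len(basic), len(elementary), len(events))        # 6 8 27
```

For |W| = 3 this gives 2|W| = 6 basic events and 3^|W| = 27 simple events; the empty dict plays the role of the trivial event.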
Evidently, the class of simple events absorbs elementary events and basic
events. Therefore, elementary conjunction $k_i$ is the general form for representing
events $i$ of all three introduced types; it contains symbols of all variables
in the case of an elementary event and only one symbol when a basic event
is regarded. One event $i$ can realize another event $j$, which means that the latter
always occurs when the former occurs. It follows from the definitions that this
happens when conjunction $k_i$ implies conjunction $k_j$; in other words, when
$k_j$ can be obtained from $k_i$ by deleting some of its letters. For example, event
$ab\bar{c}de$ realizes events $a\bar{c}d$ and $b\bar{c}e$, event $a\bar{c}d$ realizes basic
events $a$, $\bar{c}$, and $d$, and so on. Hence, several different events can occur
simultaneously, provided they are not orthogonal.
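The realization and orthogonality relations reduce to simple checks on the letters of the conjunctions. The sketch below assumes conjunctions are stored as dicts from variable name to required value; the function names and the sample events are illustrative, not taken from the source.

```python
def realizes(ki, kj):
    """True if event ki realizes event kj: every literal of kj occurs in ki."""
    return all(ki.get(var) == val for var, val in kj.items())

def orthogonal(ki, kj):
    """True if some variable occurs with opposite values in ki and kj."""
    return any(var in kj and kj[var] != val for var, val in ki.items())

k1 = {"a": 1, "b": 1, "c": 0, "d": 1, "e": 1}   # a b c' d e
k2 = {"a": 1, "c": 0, "d": 1}                   # a c' d
k3 = {"c": 1}                                   # basic event c

print(realizes(k1, k2))    # k2 is k1 with letters b and e deleted
print(orthogonal(k1, k3))  # c' and c cannot hold together
```

Two events that are not orthogonal can occur at the same time, which is exactly the case where `orthogonal` returns False.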
3. SEQUENT AUTOMATON
The behavior of a digital system is defined by the rules for changing its state.
A standard form for describing such rules was suggested by the well-developed
classical theory of finite automata, which considers relations between the sets of
input, inner, and output states. Unfortunately, this model becomes inapplicable
for digital systems with many Boolean variables (hundreds or more). That is
why a new formal model called the sequent automaton was proposed [3–5]. It takes
into account the fact that interaction between variables from $W$ takes place
within comparatively small groups and has a functional character, and it provides
means for describing both the control unit of the system and the object of
control, that is, the body of the system.
4. EQUIVALENCE TRANSFORMATIONS
AND CANONICAL FORMS
Let us say that sequent $s_i$ is satisfied in some engineering system if event
$f_i$ is always followed by event $k_i$, and that sequent $s_i$ realizes sequent $s_j$ if the
latter is satisfied automatically when the former is satisfied.
Affirmation 1. Sequent $s_i$ realizes sequent $s_j$ if and only if $f_j \Rightarrow f_i$ and
$k_i \Rightarrow k_j$, where $\Rightarrow$ is the symbol of formal implication.
For instance, sequent $ab \lor c \vdash uv$ realizes sequent $abc \vdash u$. Indeed,
$abc \Rightarrow ab \lor c$ and $uv \Rightarrow u$.
If two sequents $s_i$ and $s_j$ realize each other, they are equivalent. In that case
$f_i = f_j$ and $k_i = k_j$.
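Affirmation 1 can be tested mechanically for small examples by enumerating all assignments. The following sketch assumes that the left and right parts of a sequent are given as Python predicates over an assignment dict; `implies` and `sequent_realizes` are hypothetical helper names.

```python
from itertools import product

def implies(f, g, variables):
    """Check f => g by enumerating all assignments of `variables`."""
    for values in product((0, 1), repeat=len(variables)):
        env = dict(zip(variables, values))
        if f(env) and not g(env):
            return False
    return True

def sequent_realizes(si, sj, variables):
    """Affirmation 1: si realizes sj iff f_j => f_i and k_i => k_j."""
    (fi, ki), (fj, kj) = si, sj
    return implies(fj, fi, variables) and implies(ki, kj, variables)

V = ["a", "b", "c", "u", "v"]
# s1: ab v c |- uv        s2: abc |- u
s1 = (lambda e: (e["a"] and e["b"]) or e["c"], lambda e: e["u"] and e["v"])
s2 = (lambda e: e["a"] and e["b"] and e["c"], lambda e: e["u"])

print(sequent_realizes(s1, s2, V))  # True
print(sequent_realizes(s2, s1, V))  # False
```

The enumeration is exponential in the number of variables, so this is only a check for toy examples, not a practical procedure.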
The relations of realization and equivalence can be extended to sequent
automata $S$ and $T$. If $S$ includes in some form all demands contained in $T$, then
$S$ realizes $T$. If two automata realize each other, they are equivalent.
These relations are easily defined for elementary sequent automata $S^e$ and
$T^e$, which consist of elementary sequents. The left part of such a sequent
represents an elementary event in $BS(X \cup Z)$, and the right part represents a
basic event (for example, $a\bar{b}cde \vdash q$, where it is supposed that
$X \cup Z = \{a, b, c, d, e\}$). $S^e$ realizes $T^e$ if it contains all sequents contained
in $T^e$. $S^e$ and $T^e$ are equivalent if they contain the same sequents. It follows
that the elementary sequent automaton is a canonical form.
There exist two basic equivalences, formulated as follows.
Affirmation 2. Sequent $f_i \lor f_j \vdash k$ is equivalent to the pair of sequents
$f_i \vdash k$ and $f_j \vdash k$.
Affirmation 3. Sequent $f \vdash k_i k_j$ is equivalent to the pair of sequents
$f \vdash k_i$ and $f \vdash k_j$.
According to these affirmations, any sequent can be decomposed into a
series of elementary sequents (which cannot be decomposed further). This
transformation makes it possible to compare any sequent automata, checking them
for the binary relations of realization and equivalence. Affirmations 2 and 3 can be
used for equivalence transformations of sequent automata by elementary operations
of two kinds: splitting sequents (replacing one sequent by a pair) and merging
sequents (replacing a pair of sequents by one, if possible).
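The splitting operations can be sketched as follows, assuming the left part of a sequent is given as a DNF (a list of conjunctions stored as dicts) and the right part as a single conjunction. The function name and the data layout are illustrative assumptions; the decomposition enumerates all elementary events, so it is only meant for small examples.

```python
from itertools import product

def elementary_sequents(dnf, right, variables):
    """Split the sequent `dnf |- right` into elementary sequents.

    dnf:   list of conjunctions (dicts var -> 0/1), the left part
    right: one conjunction (dict var -> 0/1), the right part
    Returns a set of (left, literal) pairs, where `left` is a full
    assignment of `variables` (a tuple) and `literal` is (var, value).
    """
    result = set()
    for values in product((0, 1), repeat=len(variables)):
        env = dict(zip(variables, values))
        # Affirmation 2: each term of the disjunction acts on its own.
        if any(all(env[v] == x for v, x in term.items()) for term in dnf):
            # Affirmation 3: each literal of the right part acts on its own.
            for literal in right.items():
                result.add((values, literal))
    return result

X = ["a", "b"]
whole = elementary_sequents([{"a": 1}, {"b": 1}], {"u": 1}, X)   # a v b |- u
parts = (elementary_sequents([{"a": 1}], {"u": 1}, X)
         | elementary_sequents([{"b": 1}], {"u": 1}, X))         # a |- u, b |- u
print(whole == parts)  # True: both describe the same elementary automaton
```

Comparing the resulting sets with `==` or `<=` then corresponds to checking equivalence or realization of the two automata.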
The elementary sequent automaton is useful for theoretical constructions but
can turn out to be quite uneconomical for real control systems.
Therefore two more canonical forms are introduced.
The point sequent automaton $S^p$ consists of sequents in which all left
parts represent elementary events (in $BS(X \cup Z)$) and are different. The
corresponding right parts show the responses. This form can be obtained from
elementary sequent automaton $S^e$ by merging sequents with equal left parts.
The functional sequent automaton $S^f$ consists of sequents in which all right
parts represent basic events in $BS(Z \cup Y)$ and are different. So the sequents have
the form $f_i^1 \vdash u_i$ or $f_i^0 \vdash \bar{u}_i$, where variables $u_i \in Z \cup Y$, and the
corresponding left parts are interpreted as switching functions for them: ON
functions $f_i^1$ and OFF functions $f_i^0$. $S^f$ can be obtained from $S^e$ by merging
sequents with equal right parts.
Note that both forms $S^p$ and $S^f$ can also be obtained from arbitrary sequent
automata by disjunctive decomposition of the left parts of the sequents (for the
point sequent automaton) or conjunctive decomposition of the right parts (for
the functional one).
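Both canonical forms amount to grouping elementary sequents by one of their parts. The sketch below assumes elementary sequents are stored as pairs (elementary event, literal); the sample data and function names are hypothetical.

```python
from collections import defaultdict

def point_form(elementary):
    """Merge elementary sequents with equal left parts."""
    merged = defaultdict(dict)
    for left, (var, val) in elementary:
        merged[left][var] = val
    return dict(merged)

def functional_form(elementary):
    """Merge elementary sequents with equal right parts.

    Maps each literal (var, value) to the set of elementary events that
    switch it: value 1 gives the ON function, value 0 the OFF function.
    """
    merged = defaultdict(set)
    for left, literal in elementary:
        merged[literal].add(left)
    return dict(merged)

# Illustrative elementary automaton over inputs (a, b) and outputs u, v.
Se = {((1, 0), ("u", 1)), ((1, 0), ("v", 0)), ((1, 1), ("u", 1))}

print(point_form(Se))       # one response per elementary event
print(functional_form(Se))  # one switching function per literal
```

Here the set of elementary events collected for a literal is the ON or OFF function written as a list of minterms; minimizing it is a separate step not shown here.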
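$$
A = \begin{array}{cccccc}
a & b & c & p & q & r \\ \hline
1 & - & - & - & 0 & - \\
- & 0 & 1 & 1 & - & - \\
0 & 1 & - & - & 1 & 1 \\
- & - & 0 & - & - & 0 \\
- & - & 0 & 1 & 0 & -
\end{array},
\qquad
B = \begin{array}{ccccccc}
p & q & r & u & v & w & z \\ \hline
- & 1 & - & - & 1 & - & 1 \\
- & - & 0 & 1 & - & 0 & - \\
1 & 0 & - & - & 1 & - & 0 \\
0 & - & - & - & - & 1 & - \\
- & 1 & 1 & 0 & - & 1 & -
\end{array}
$$
represent the following system of simple sequents regarded as a simple sequent
automaton:
$$
\begin{aligned}
a\bar{q} &\vdash qvz, \\
\bar{b}cp &\vdash \bar{r}u\bar{w}, \\
\bar{a}bqr &\vdash p\bar{q}v\bar{z}, \\
\bar{c}\bar{r} &\vdash \bar{p}w, \\
\bar{c}p\bar{q} &\vdash qr\bar{u}w.
\end{aligned}
$$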
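The correspondence between the rows of the two ternary matrices and the sequents can be sketched in a few lines. The encoding of rows as strings, the helper name `conjunction`, and the use of a prime for negation are assumptions made for the illustration.

```python
A_COLS = "abcpqr"
B_COLS = "pqruvwz"
A = ["1---0-", "-011--", "01--11", "--0--0", "--010-"]
B = ["-1--1-1", "--01-0-", "10--1-0", "0----1-", "-110-1-"]

def conjunction(row, cols):
    """Render a ternary row as a conjunction; x' denotes the negation of x."""
    return "".join(c + ("'" if v == "0" else "")
                   for c, v in zip(cols, row) if v != "-")

# Row i of A is the left part and row i of B the right part of sequent i.
sequents = [f"{conjunction(a, A_COLS)} |- {conjunction(b, B_COLS)}"
            for a, b in zip(A, B)]
for s in sequents:
    print(s)
```

The first printed line is `aq' |- qvz` and the last is `c'pq' |- qru'w`, matching the first and last sequents of the system.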
It has been noted [1] that, to a certain extent, simple sequents resemble the
sequents of the theory of logical inference introduced by Gentzen [2]. The latter
are defined as expressions
$$A_1, \ldots, A_n \rightarrow B_1, \ldots, B_m,$$
which connect arbitrary logic formulae $A_1, \ldots, A_n, B_1, \ldots, B_m$ and are
interpreted as implications
$$A_1 \land \ldots \land A_n \rightarrow B_1 \lor \ldots \lor B_m.$$
The main difference is that any simple sequent $k_i' \vdash k_i''$ represents not a purely
logical but a cause-effect relation: event $k_i''$ is generated by event $k_i'$ and appears
after it, so we cannot mix variables from $k_i'$ with variables from $k_i''$.
But sometimes we may discard this time aspect and consider terms $k_i'$ and $k_i''$
on the same level; for instance, when looking for stable states of the regarded
system. In that case, sequent $k_i' \vdash k_i''$ can be formally replaced by implication
$k_i' \rightarrow k_i''$ and subjected to further Boolean transformations, leading to equivalent
sets of Gentzen sequents and the corresponding sets of standard disjuncts usual in
the theory of logical inference.
For example, the system of simple sequents
ab |− cd , a b |− cd, a b |− c
may be transformed into the following system of disjuncts
a ∨ b ∨ c, a ∨ b ∨ d , a ∨ b ∨ c, a ∨ b ∨ d, a ∨ b ∨ c.
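The transformation of a simple sequent into disjuncts follows from rewriting the implication: a conjunction implies a conjunction of literals exactly when it implies each literal separately. A minimal sketch, assuming both parts are dicts from variable to value and that they use disjoint variables; the sample sequent is hypothetical.

```python
def disjuncts(left, right):
    """Turn the implication left -> right into a list of disjuncts.

    left, right: conjunctions as dicts var -> 0/1 over disjoint variables.
    left -> (l1 & l2 & ...) is equivalent to the clauses
    (not left) | l1, (not left) | l2, ...
    Each clause is a dict var -> 0/1 read as a disjunction of literals.
    """
    negated_left = {var: 1 - val for var, val in left.items()}
    return [{**negated_left, var: val} for var, val in right.items()]

# Hypothetical sequent a b |- c d' gives a' v b' v c and a' v b' v d'.
for clause in disjuncts({"a": 1, "b": 1}, {"c": 1, "d": 0}):
    print(" v ".join(v + ("" if x else "'") for v, x in clause.items()))
```

A sequent whose right part has n literals therefore produces n disjuncts, which is why three sequents can yield five disjuncts.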
[Figure 1-1: PLA with input lines a, b, c and feedback lines p, p', q, q', r, r'; output lines p, p', q, q', r, r', u, u', v, v', w, w', z, z'.]
Figure 1-1. PLA implementation of a simple sequent automaton.
It is seen from here that the problem of constructing a simple sequent
automaton with a minimum number of rows is similar to that of minimizing
a system of Boolean functions in the class of DNFs, which is known to be a hard
combinatorial problem. An approach to solving it was suggested in Refs. 7 and 8.
The considered model turned out to be especially convenient for the
representation of programmable logic arrays (PLAs) with memory on RS flip-flops. It
is also used in methods of automaton implementation of parallel algorithms for
logical control described by expressions in PRALU [11].
Consider the simple sequent automaton shown in the above example. It is
implemented by the PLA represented in Fig. 1-1. It has three inputs (a, b, c)
supplied with inverters (NOT elements) and four outputs (u, v, w, z) supplied
with RS flip-flops. So its input and output lines are doubled. The six input
lines intersect with five inner ones, and transistors are placed at some points of
intersection. Their disposition can be presented by a Boolean matrix that is
easily obtained from matrix A and determines the AND plane of the PLA. In a
similar way the OR plane of the PLA is found from matrix B and realized on
the intersection of the inner lines with the 14 output lines.
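One plausible way to obtain such a Boolean matrix is to double every ternary column into a direct line and an inverted line. This is a sketch under that assumption: the text does not state the exact mapping, and a real PLA technology may use the opposite polarity. The function name `and_plane` is hypothetical.

```python
def and_plane(matrix):
    """Map ternary rows to Boolean rows over doubled lines (x, x').

    Assumed convention: a 1 in column x marks the intersection with
    line x, a 0 marks the intersection with line x', and '-' leaves
    both intersections empty.
    """
    plane = []
    for row in matrix:
        bits = []
        for v in row:
            bits += [int(v == "1"), int(v == "0")]
        plane.append(bits)
    return plane

A = ["1---0-", "-011--", "01--11", "--0--0", "--010-"]   # columns a b c p q r
for inner_line in and_plane(A):
    print(inner_line)
```

Each of the five printed rows corresponds to one inner line of the PLA; the same function applied to matrix B would give a candidate OR plane with 14 columns.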
$f_i^1 \vdash u_i$ and $f_j^1 \vdash u_j$,
$f_i^0 \vdash \bar{u}_i$ and $f_j^1 \vdash u_j$,
$f_i^1 \vdash u_i$ and $f_j^0 \vdash \bar{u}_j$,
$f_i^0 \vdash \bar{u}_i$ and $f_j^0 \vdash \bar{u}_j$,
ACKNOWLEDGMENT
This research was supported by ISTC, Project B-986.
REFERENCES
1. M. Adamski, Digital Systems Design by Means of Rigorous and Structural Method.
Wydawnictwo Wyzszej Szkoly Inzynierskiej, Zielona Gora (1990) (in Polish).
2. G. Gentzen, Untersuchungen über das logische Schließen. Mathematische Zeitschrift,
39, 176–210, 405–431 (1934–35).
3. V.S. Grigoryev, A.D. Zakrevskij, V.A. Perchuk, The Sequent Model of the Discrete Au-
tomaton. Vychislitelnaya Tekhnika v Mashinostroenii. Institute of Engineering Cybernet-
ics, Minsk, 147–153 (March 1972) (in Russian).
4. V.N. Zakharov, Sequent Description of Control Automata. Izvestiya AN SSSR, No. 2
(1972) (in Russian).
5. V.N. Zakharov, Automata with Distributed Memory. Energia, Moscow (1975) (in Russian).
6. A.D. Zakrevskij, V.S. Grigoryev, A system for synthesis of sequent automata in the basis
of arbitrary DNFs. In: Problems of Cybernetics. Theory of Relay Devices and Finite
Automata. VINITI, Moscow, 157–166 (1975) (in Russian).
7. A.D. Zakrevskij, Optimizing Sequent Automata. Optimization in Digital Devices Design.
Leningrad, 42–52 (1976) (in Russian).
8. A.D. Zakrevskij, Optimizing transformations of sequent automata. Tanul. MTA SeAKJ,
63, Budapest, 147–151 (1977) (in Russian).
9. A.D. Zakrevskij, Logical Synthesis of Cascade Networks. Nauka Moscow (1981) (in
Russian).
10. A.D. Zakrevskij, The analysis of concurrent logic control algorithms. In: L. Budach, R.G.
Bukharaev, O.B. Lupanov (eds.), Fundamentals of Computation Theory. Lecture Notes in
Computer Science, Vol. 278. Springer-Verlag, Berlin Heidelberg New York London Paris
Tokyo, 497–500 (1987).
11. A.D. Zakrevskij, Parallel Algorithms for Logical Control. Institute of Engineering Cyber-
netics, Minsk (1999) (in Russian).
12. A.D. Zakrevskij, Y.V. Pottosin, V.I. Romanov, I.V. Vasilkova, Experimental system of
automated design of logical control devices. In: Proceedings of the International Work-
shop “Discrete Optimization Methods in Scheduling and Computer-Aided Design”, Minsk
pp. 216–221 (September 5–6, 2000).
Chapter 2
FORMAL LOGIC DESIGN OF
REPROGRAMMABLE CONTROLLERS
Marian Adamski
University of Zielona Góra, Institute of Computer Engineering and Electronics,
ul. Podgorna 50, 65-246 Zielona Góra, Poland; e-mail: [email protected]
Abstract: The goal of the paper is to present a formal, rigorous approach to the design of
logic controllers, which are implemented as independent control units or as cen-
tral control parts inside modern reconfigurable microsystems. A discrete model of
a dedicated digital system is derived from the control interpreted Petri net behav-
ioral specification and considered as a modular concurrent state machine. After
hierarchical and distributed local state encoding, an equivalent symbolic descrip-
tion of a sequential system is reflected in field programmable logic by means of
commercial CAD tools. The desired behavior of the designed reprogrammable
logic controller can be validated by simulation in a VHDL environment.
Key words: Petri nets; logic controllers; hardware description languages (HDL); field pro-
grammable logic.
1. INTRODUCTION
The paper covers some effective techniques for computer-based synthesis
of reprogrammable logic controllers (RLCs), which start from a given behavioral
specification based on an interpreted Petri net. It is shown how to implement
parallel (concurrent) controllers [1,4,8,14] in field programmable logic (FPL). The
symbolic specification of the Petri net is considered in terms of its local state
changes, which are represented graphically by means of labeled transitions,
together with their input and output places. Such simple subnets of control
interpreted Petri nets are described in the form of decision rules: logic assertions
in propositional logic, written in the Gentzen sequent style [1,2,12].
Formal expressions (sequents), which describe both the structure of the net
and the intended behavior of a discrete system, may be verified formally in
the context of mathematical logic and Petri net theory. For professional
validation by simulation and effective synthesis, they are automatically transformed
into intermediate VHDL programs, which are accepted by industrial CAD
tools.
The main goal of the proposed design style is to continuously preserve
the direct, self-evident correspondence between modular interpreted Petri nets,
symbolic specification, and all considered hierarchically structured implemen-
tations of modeled digital systems, implemented in configurable or reconfig-
urable logic arrays.
The paper presents an extended outline of the proposed design methodology,
which was previously presented in the DESDes’01 Conference Proceedings [3]. The
modular approach to specification and synthesis of concurrent controllers is
applied, and a direct hierarchical mapping of Petri nets into FPL is demonstrated.
The author assumes that the reader has a basic knowledge of Petri nets [5,9,10,13,14].
The early basic ideas related to concurrent controller design are reported in
the chapter in Ref. 1. The author’s previous work on reprogrammable logic
controllers has been summarized in various papers [2,4,6,8]. Several important aspects
of Petri net mapping into hardware are covered in books [3,13,14]. The
implementation of Petri net based controllers from VHDL descriptions can be found in
Refs. 2, 6, and 13. Some arguments for using Petri nets instead of linked sequential
state machines are given in Ref. 9.
occur independently and concurrently. The global states of the controller,
included in the equivalent SFSM model, can eventually be deduced as maximal
subsets of the local states that hold simultaneously (configurations). They
correspond to all the different maximal sets of marked places that are obtained
during the complete execution of the net. They are usually presented in compact
form as vertices of the Petri net reachability graph [5,10,13]. It should be stressed
that the explicitly obtained, behaviorally equivalent transition systems are usually
complex, both for maintenance and for effective synthesis. The methodology proposed
in the paper makes it possible to obtain a correctly encoded and implemented
transition system directly from a Petri net, without knowing its global state
set.
The novel tactic presented in this paper is based on a hierarchical
decomposition of Petri nets into self-contained and structurally ordered modular subsets,
which can be easily identified and recognized by the common parts of their
internal state code. The total codes of the related modular Petri net subnets, which
are represented graphically as macroplaces, can be obtained by means of a
simple hierarchical superposition (merging) of the appropriate codes of individual
places. Conversely, the code of a particular place includes specific parts
that precisely identify all the hierarchically ordered macroplaces containing
that place. In this way any separate part of a behavioral
specification can be immediately recognized at the proper level of abstraction
and easily found in the regular cell structure (logic array). It can be efficiently
modified, rejected, or replaced during the validation or redesign of the digital
circuit.
Boolean expressions called predicate labels or guards describe the external
conditions under which transitions can be enabled. One of the enabled transitions
occurs (it fires). Every immediate combinational Moore-type output signal y is
linked with some sequentially related places, and it is activated when one of
these places holds a token. Immediate combinational Mealy-type output signals
are also related to proper subsets of sequentially related places, but they also
depend on relevant (valid) input signals or internal signals. A Mealy-type
output is active if the place holds a token and the associated logic condition
is true.
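The two output rules can be written down directly. In this sketch the Moore mapping from outputs to places is read off the labels of Figs. 2-1 and 2-2 and should be treated as an assumption; the Mealy output and its condition are purely hypothetical, since the source gives no Mealy example.

```python
# Places that activate each Moore-type output (assumed from the figure labels).
MOORE = {"y0": {1}, "y1": {2}, "y2": {4}, "y3": {7},
         "y4": {7}, "y5": {8}, "y6": {9}}

def moore_outputs(marking):
    """Outputs that are active because one of their places holds a token."""
    return {y for y, places in MOORE.items() if places & marking}

def mealy_output(marking, places, condition, inputs):
    """Active if one of `places` is marked and `condition(inputs)` is true."""
    return bool(places & marking) and bool(condition(inputs))

marking = {7, 8}                      # places p7 and p8 hold tokens
print(sorted(moore_outputs(marking)))
# Hypothetical Mealy output tied to place p7 and input x5.
print(mealy_output(marking, {7}, lambda x: x["x5"], {"x5": 1}))
```

A marking is represented simply as the set of marked place numbers, which is sufficient because the net is assumed to be safe.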
The implemented Petri net should be determined (without conflicts), safe,
reversible, and without deadlocks [5,7]. For several practical reasons, synchronous
hardware implementations of Petri nets [4,6,7] are preferred. They can be realized
as dedicated digital circuits with an internal state register and possible output
registers, which are usually synchronized by a common clock. It is assumed
here that all enabled concurrent transitions can fire independently in any order,
but nearly immediately.
In the example under consideration (Fig. 2-1), Petri net places P = {p1–p9}
stand for the local states {P1–P9} of the implemented logic controller. The
Petri net transitions T = {t1–t8} symbolize all the possible local state changes
{T1–T8}. The Petri net places are hierarchically grouped as nested modular
macroplaces MP0–MP7. The Petri net describes a controller with inputs x0–x6
and outputs y0–y6. The controller contains an internal state register with
flip-flops Q1–Q4. The state variables structurally encode places and macroplaces
to be implemented in hardware.
[Figure 2-1: places 1–9 with colors [1] and [2], nested in macroplaces MP0–MP7 and labeled with state literals over Q1–Q4 and outputs y0–y6; transition guards: t1: x0, t2: x1, t3: x3, t5: x5*x6, t6: /x2*/x4, t7: /x5, t8: /x6.]
Figure 2-1. Modular, hierarchical and colored control interpreted Petri net.
The direct mapping of a Petri net into field programmable logic (FPL) is
based on a self-evident correspondence between a place and a clearly defined
bit-subset of a state register. A place of the Petri net is assigned only to a
particular part of the register block (only to selected variables from the internal
state register Q1–Q4). Local state changes begin at the edge of the clock signal,
and the superposition of excitations always gives the predicted final global state
in the state register. High-active input values are
denoted as xi, and low-active input values as /xi.
The net can be SM-colored during the specification process, showing
the paths of the recognized intended sequential processes (state machine subnets).
These colors help the designer to validate, intuitively and formally, the
consistency of all sequential processes in the developed discrete state model [4].
The colored subnets usually replicate Petri net place invariants. The invariants
of the top-level subnets can be hierarchically determined by the invariants of their
subnets. If a given net or subnet has not been previously colored by a designer
during specification, it is possible to perform the coloring procedure by means
of an analysis of configurations. Any two concurrent places or macroplaces, which
are marked simultaneously, cannot share the same color. This means that a coloring
of the net can be obtained by coloring the concurrency graph, applying
well-known methods taken from graph theory [1,2]. For some classes of Petri
nets, the concurrency relation can be found without deriving the whole
global state space [4,7,13]. The colors ([1], [2]) that paint the places in Fig. 2-1
separate two independent sequences of local state changes. They are easy to
find as closed chains of transitions in which selected input and output places
are painted with identical colors.
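Coloring by analysis of configurations can be sketched with a greedy graph coloring. The configurations below are the place pairs shown for the global states in Fig. 2-2; greedy coloring is one of the well-known methods and is swapped in here as an illustration, not as the procedure used by the author.

```python
from itertools import combinations

# Configurations: sets of simultaneously marked places (labels of Fig. 2-2).
CONFIGS = [{1}, {2, 4}, {3, 4}, {2, 5}, {3, 5}, {6, 7}, {7, 8}, {6, 9}, {8, 9}]

def concurrency_graph(configs):
    """Two places are concurrent if some configuration marks both."""
    edges = set()
    for cfg in configs:
        edges |= {frozenset(pair) for pair in combinations(sorted(cfg), 2)}
    return edges

def greedy_coloring(places, edges):
    """Give each place the smallest color not used by a concurrent place."""
    color = {}
    for p in sorted(places):
        used = {color[q] for q in color if frozenset((p, q)) in edges}
        color[p] = next(c for c in range(1, len(places) + 1) if c not in used)
    return color

edges = concurrency_graph(CONFIGS)
colors = greedy_coloring(range(1, 10), edges)
print(colors)
```

For these configurations the result assigns color 1 to places 2, 3, 6, 8 and color 2 to places 4, 5, 7, 9, in agreement with the labels [1] and [2] of Fig. 2-1. Place 1, which belongs to both sequential processes in the figure, receives a single color in this simplified sketch.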
The equivalent interpreted SM Petri net model, derived from equivalent
transition system description (interpreted Petri net reachability graph of the
logic controller), is given in Fig. 2-2.
The distribution of Petri net tokens among places, before the firing of any
transition, can be regarded as the identification of the current global state M.
Marking M after the firing of any enabled transition is treated as the next global
state @M. From the present global internal state M, the modeled controller
goes to the next internal global state @M, generating the desired combinational
immediate output signals y and registered output signals @y.
[Figure 2-2: global states and their outputs: M1 = p1 (y0), M2 = p2*p4 (y1, y2), M3 = p3*p4 (y2), M4 = p2*p5 (y1), M5 = p3*p5, M6 = p6*p7 (y3, y4), M7 = p7*p8 (y3, y4, y5), M8 = p6*p9 (y6), M9 = p8*p9 (y5, y6); arcs are labeled with transitions t1–t8 and their guards.]
Figure 2-2. Global states of control interpreted Petri net. Transition system modeled as
equivalent state machine Petri net.
There are 9 places describing global states M1–M9 and 13 transitions be-
tween 13 pairs of global states. Such an implicit formal structure really exists in
hardware, although its internal structure could be unknown, because eventually
deduced global state codes are immediately read from the state register as a
consistent superposition of local state codes. Since a Moore-type output should
be stable during the entire clock period, it can also be produced as a registered
Moore-type output @y. The registered Moore-type output signals should be
predicted before the local state changes.
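The numbers of global states and transitions can be reproduced by exploring the reachability graph of the net. The net structure below is reconstructed from the labels of Fig. 2-1 and from the rules quoted in the text, so it is an assumption; guards are ignored, that is, every input combination is considered possible.

```python
from collections import deque

# Transitions as (input places, output places); assumed structure.
NET = {
    "t1": ({1}, {2, 4}),
    "t2": ({2}, {3}),
    "t3": ({4}, {5}),
    "t4": ({3, 5}, {6, 7}),
    "t5": ({7}, {9}),
    "t6": ({6}, {8}),
    "t7": ({8}, {6}),
    "t8": ({6, 9}, {1}),
}

def reachability(net, initial):
    """Breadth-first exploration of the markings of a safe Petri net."""
    start = frozenset(initial)
    states, edges, queue = {start}, set(), deque([start])
    while queue:
        m = queue.popleft()
        for name, (pre, post) in net.items():
            if pre <= m:                          # transition is enabled
                nxt = frozenset((m - pre) | post)
                edges.add((m, name, nxt))
                if nxt not in states:
                    states.add(nxt)
                    queue.append(nxt)
    return states, edges

states, edges = reachability(NET, {1})
print(len(states), len(edges))   # 9 13
```

Under these assumptions the exploration yields 9 markings and 13 arcs, which agrees with the counts stated for global states M1–M9.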
usual meanings. The symbol * in the vector denotes an “explicitly don’t know”
value (0 or 1, but not “don’t care”). In expressions, the symbol / denotes the
operator of logic negation, and the symbol * represents the operator of logic
conjunction. An example [3] of a heuristic hierarchical local state assignment [Q1,
Q2, Q3, Q4] is as follows:
P1[1,2] = 0 - - -    QP1 = /Q1
P2[1]   = 1 0 0 *    QP2 = Q1*/Q2*/Q3
P3[1]   = 1 0 1 *    QP3 = Q1*/Q2*Q3
P4[2]   = 1 0 * 0    QP4 = Q1*/Q2*/Q4
P5[2]   = 1 0 * 1    QP5 = Q1*/Q2*Q4
P6[1]   = 1 1 0 *    QP6 = Q1*Q2*/Q3
P7[2]   = 1 1 * 0    QP7 = Q1*Q2*/Q4
P8[1]   = 1 1 1 *    QP8 = Q1*Q2*Q3
P9[2]   = 1 1 * 1    QP9 = Q1*Q2*Q4
The global state encoding is correct if all vertices of the reachability graph
have different codes. The total code of the global state (a vertex of the reachabil-
ity graph) can be obtained by merging the codes of the simultaneously marked
places. Taking as an example some global states (vertices of the reachability
graph; Fig. 2-2), we obtain
QM3 = QP3 ∗ QP4 = Q1 ∗ /Q2 ∗ Q3 ∗ /Q4;
QM4 = QP2 ∗ QP5 = Q1 ∗ /Q2 ∗ /Q3 ∗ Q4.
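The merging of place codes and the correctness check of the global encoding can be sketched as follows. The place codes are taken from the assignment above and the global states from the labels of Fig. 2-2; the function name `merge` and the string representation are assumptions, and both '-' and '*' are treated here simply as open bits.

```python
PLACE_CODE = {  # bits [Q1, Q2, Q3, Q4]
    1: "0---", 2: "100*", 3: "101*", 4: "10*0", 5: "10*1",
    6: "110*", 7: "11*0", 8: "111*", 9: "11*1",
}
GLOBAL_STATES = {
    "M1": [1], "M2": [2, 4], "M3": [3, 4], "M4": [2, 5], "M5": [3, 5],
    "M6": [6, 7], "M7": [7, 8], "M8": [6, 9], "M9": [8, 9],
}

def merge(codes):
    """Superpose place codes bit by bit; fail on conflicting fixed bits."""
    merged = []
    for bits in zip(*codes):
        fixed = {b for b in bits if b in "01"}
        if len(fixed) > 1:
            raise ValueError("conflicting codes")
        merged.append(fixed.pop() if fixed else "-")
    return "".join(merged)

state_code = {m: merge([PLACE_CODE[p] for p in places])
              for m, places in GLOBAL_STATES.items()}
print(state_code["M3"], state_code["M4"])                  # 1010 1001
print(len(set(state_code.values())) == len(state_code))    # all codes differ
```

The codes 1010 and 1001 correspond to QM3 = Q1*/Q2*Q3*/Q4 and QM4 = Q1*/Q2*/Q3*Q4, and all nine global states receive different codes, so the encoding passes the stated correctness condition.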
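[Figure: hierarchy of macroplaces and the state literals that select them: /Q1 selects P1 and Q1 selects MP7; inside MP7, /Q2 selects MP5 (with MP1 and MP2) and Q2 selects MP6 (with MP3 and MP4).]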
its transition status lines. The great advantage of using transition status lines
is the self-evident possibility of reducing the complexity of the next-state and
output combinational logic. The registered output signals, together with the
next local state codes, may be generated in very simple combinational structures
that share several common AND terms.
The simplified rule-based specification, intended especially for controllers
with JK state and output registers, omits from the right side of the sequents those
state coding signals that keep their values stable during the occurrence of a
transition [1]. Taking into account the concept of transition status lines and
introducing into the specification only the changing registered Moore-type output
signals, the specification may be rewritten as follows:
/Q1*X0 |- T1;
T1 |- @Q1*@/Q2*@/Q3*@/Q4*@/Y0*@Y1*@Y2;
(...)
Q1*Q2*Q3*/X5 |- T7;
T7 |- @/Q3*@/Y5;
Q1*Q2*/Q3*Q4*/X6 |- T8;
T8 |- @/Q1*@/Y6*@Y0.
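The effect of such rules on one clock edge can be illustrated with a small interpreter. Only the three transitions quoted above are encoded; the elided rules are left out. The dict layout, the function name `clock_tick`, and the convention that missing signals read as 0 are assumptions of this sketch.

```python
RULES = {  # transition: (condition literals, registered updates)
    "T1": ({"Q1": 0, "X0": 1},
           {"Q1": 1, "Q2": 0, "Q3": 0, "Q4": 0, "Y0": 0, "Y1": 1, "Y2": 1}),
    "T7": ({"Q1": 1, "Q2": 1, "Q3": 1, "X5": 0}, {"Q3": 0, "Y5": 0}),
    "T8": ({"Q1": 1, "Q2": 1, "Q3": 0, "Q4": 1, "X6": 0},
           {"Q1": 0, "Y6": 0, "Y0": 1}),
}

def clock_tick(signals):
    """Apply all enabled rules at once, as on one active clock edge."""
    fired = [t for t, (cond, _) in RULES.items()
             if all(signals.get(n, 0) == v for n, v in cond.items())]
    nxt = dict(signals)
    for t in fired:
        nxt.update(RULES[t][1])   # signals not mentioned keep their values
    return fired, nxt

start = {"Q1": 0, "Q2": 0, "Q3": 0, "Q4": 0, "X0": 1, "Y0": 1}
fired, after = clock_tick(start)
print(fired, after["Q1"], after["Y0"], after["Y1"], after["Y2"])
```

Signals that a rule does not mention keep their previous values, which mirrors the JK-register style of the simplified specification.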
7. CONCLUSIONS
The paper presents the hierarchical Petri net approach to synthesis, in which
the modular net is structurally mapped into field programmable logic. The hier-
archy levels are preserved and related with some particular local state variable
subsets. The proposed state encoding technique saves a number of macrocells
and secures a direct mapping of Petri net into an FPL array. A concise, under-
standable specification can be easily locally modified.
The experimental Petri net to VHDL translator has been implemented on
top of standard VHDL design tools, such as ALDEC Active-HDL. VHDL
syntax supports several conditional statements, which can be used to describe
the topology and an interpretation of Petri nets.
ACKNOWLEDGMENT
The research was supported by the Polish State Committee for Scientific
Research (Komitet Badań Naukowych) grant 4T11C 006 24.
REFERENCES
1. M. Adamski, Parallel controller implementation using standard PLD software. In: W.R.
Moore, W. Luk (eds.), FPGAs. Abingdon EE&CS Books, Abingdon, England, pp. 296–
304 (1991).
2. M. Adamski, SFC, Petri nets and application specific logic controllers. In: Proc. of the
IEEE Int. Conf. on Systems, Man, and Cybern., San Diego, USA, pp. 728–733 (1998).
3. M. Adamski, M. Wegrzyn (eds.), Discrete-Event System Design DESDes’01, Technical
University of Zielona Gora Press Zielona Góra, ISBN: 83-85911-62-6 (2001).
4. K. Bilinski, M. Adamski, J.M. Saul, E.L. Dagless, Petri Net based algorithms for parallel
controller synthesis. IEE Proceedings-E, Computers and Digital Techniques, 141, 405–
412 (1994).
5. R. David, H. Alla, Petri Nets & Grafcet. Tools for Modelling Discrete Event Systems.
Prentice Hall, New York (1992).
6. J.M. Fernandes, M. Adamski, A.J. Proença, VHDL generation from hierarchical Petri
net specifications of parallel controllers. IEE Proceedings-E, Computer and Digital Tech-
niques, 144, 127–137 (1997).
7. M. Heiner, Petri Net based system analysis without state explosion. In: Proceedings of
High Performance Computing’98, April 1998, SCS Int., San Diego, pp. 394–403 (1998).
8. T. Kozlowski, E.L. Dagless, J.M. Saul, M. Adamski, J. Szajna, Parallel controller synthesis
using Petri nets. IEE Proceedings-E, Computers and Digital Techniques, 142, 263–271
(1995).
9. N. Marranghello, W. de Oliveira, F. Damianini, Modeling a processor with a Petri net
extension for digital systems. In: Proceedings of Conference on Design Analysis and
Simulation of Distributed Systems–DASD 2004, Part of the ASTC, Washington, DC, USA
(2004).
10. T. Murata, Petri Nets: Properties, analysis and applications. Proceedings of the IEEE, 77
(4), 541–580 (1989).
11. J.S. Sagoo, D.J. Holding, A comparison of temporal Petri net based techniques in the spec-
ification and design of hard real-time systems. Microprocessing and Microprogramming,
32, 111–118 (1991).
12. M.E. Szabo (Ed.), The collected papers of Gerhard Gentzen. North-Holland Publishing
Company, Amsterdam (1969).
13. A. Yakovlev, L. Gomes, L. Lavagno (eds.), Hardware Design and Petri Nets. Kluwer
Academic Publishers, Boston (2000).
14. A.D. Zakrevskij, Parallel Algorithms for Logical Control. Institute of Engineering Cyber-
netics of NAS of Belarus, Minsk (1999) (Book in Russian).
Chapter 3
HIERARCHICAL PETRI NETS FOR DIGITAL
CONTROLLER DESIGN
Grzegorz Andrzejewski
University of Zielona Góra, Institute of Computer Engineering and Electronics, ul. Podgórna
50, 65-246 Zielona Góra, Poland; e-mail: [email protected]
Abstract: This paper presents a model for the formal specification of reactive systems. It is
a kind of interpreted Petri net, extended by important properties: hierarchy, history,
and time dependencies. The syntax is defined and the principles of graphical
representation are characterized. Semantics and dynamic behavior are illustrated
by a small practical example: an automatic washer controller.
1. INTRODUCTION
Reactive systems interact strongly with their environment. Their essence consists
in shaping appropriate control signals in response to changes in communication
signals. Control signals are usually called output signals, and communication
signals input signals. Very frequently a response depends not only on the current
inputs but also on the system’s history. The system is then called an automaton,
and its basic model is known as the finite state machine (FSM). When the system
is more complicated, however, this model may be difficult to depict. The problem
may be solved by using a hierarchy, which makes it possible to consider the modeled
system at many abstraction layers. Models such as Statecharts and SyncCharts are
extensions of the FSM with hierarchy1,7,8.
Concurrency is the next important problem. Very often some processes must work
simultaneously. In practice, this is realized by
a net of related automata synchronized by internal signals. It is not easy to
2. SYNTAX
The following nomenclature is used: capital letters from the Latin alphabet
represent names of sets, whereas small letters stand for elements of these sets.
Small letters from the Greek alphabet represent functions belonging to the
model. Auxiliary functions are denoted by two characteristic small letters from
the Latin alphabet.
Def. 1 A hierarchical Petri net (HPN) is defined as a tuple:
HPN = (P, T, F, S, T- , χ, ψ, λ, α, ε, τ), (1)
where
1. P is a finite nonempty set of places. In “flat” nets the capacity function
κ : P → N ∪ {∞} describes the maximum number of tokens in place p. For
reactive systems the capacity equals 1: for each place p ∈ P, κ(p) = 1.
2. T is a finite nonempty set of transitions. The union P ∪ T is called the
set of nodes and denoted by N.
3. F is a finite nonempty set of arcs, such that F = Fo ∪ Fe ∪ Fi, where
Fo ⊂ (P × T) ∪ (T × P) is called the set of ordinary arcs, Fe ⊂ (P × T) is
called the set of enabling arcs, and Fi ⊂ (P × T) is called the set of inhibit
arcs. In flat nets the weight function, which maps F to N, describes the
maximum number of tokens that can be moved at the same time through arc f.
For reactive systems the weight of every arc f ∈ F is at most 1. Accordingly,
the arcs can be specified more precisely: every ordinary arc f ∈ Fo has
weight 1, and every enabling or inhibit arc f ∈ Fe ∪ Fi has weight 0.
p ∈ χ∗(p),
χ(p) ⊆ χ∗(p),
p′ ∈ χ∗(p) ⇒ χ(p′) ⊆ χ∗(p).
7. ψ : P → {true, false} is a Boolean history function, assigning a history
attribute to every place p such that χ(p) ≠ Ø. For basic places the function
is not defined.
8. λ : N → 2^S is a labeling function, assigning expressions created from
elements of set S to nodes from N. The following rules are suggested: places
may be labeled only by subsets of Y ∪ L (an action label means an action
assigned to a place); the label of a transition may be composed of the
following elements:
cond – created on the set X ∪ L ∪ {false, true}; a Boolean expression
imposed as a condition on transition t and built with the operators not, or,
and and; the absence of a cond label means cond = true;
abort – created like cond but standing in a different logical relation to
the general enabling condition of transition t; represented graphically
by # at the beginning of the expression; the absence of an abort label means
abort = false;
action – created on the set Y ∪ L; an action assigned to transition t,
represented graphically by / at the beginning of the expression.
9. α : P →{true, false} is an initial marking function, assigning the attribute
of an initial place to every place p ∈ P. Initial places are graphically dis-
tinguished by a dot (mark) in circles representing these places.
10. ε: P → {true, false} is a final marking function, assigning the attribute of a
final place to every place p ∈ P. Final places are graphically distinguished
by × inside circles representing these places.
11. τ : N → T- is a time function, assigning numbers from the discrete scale of
time to each element from the set of nodes N .
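The flat part of the definition (places, transitions, and the three kinds of arcs) can be illustrated with a minimal Python sketch. The class name `FlatNet`, its field names, and the enabling rule are assumptions made for illustration: the rule used here is the usual interpretation in which ordinary and enabling arcs require a token in the input place and inhibit arcs require its absence. Hierarchy, history, labels, and time are left out.

```python
from dataclasses import dataclass, field

@dataclass
class FlatNet:
    """Hypothetical container for P, T and F = Fo | Fe | Fi (capacity 1 per place)."""
    places: set
    transitions: set
    ordinary: set = field(default_factory=set)   # Fo: (place, transition) or (transition, place)
    enabling: set = field(default_factory=set)   # Fe: (place, transition)
    inhibit: set = field(default_factory=set)    # Fi: (place, transition)

    def weight(self, arc):
        # Ordinary arcs move one token; enabling and inhibit arcs move none.
        if arc in self.ordinary:
            return 1
        if arc in self.enabling or arc in self.inhibit:
            return 0
        raise KeyError(arc)

    def enabled(self, t, marking):
        # marking: set of marked places (at most one token per place).
        needs = {p for (p, x) in self.ordinary | self.enabling
                 if x == t and p in self.places}
        forbids = {p for (p, x) in self.inhibit if x == t}
        return needs <= marking and not (forbids & marking)


net = FlatNet(
    places={"p1", "p2", "p3"},
    transitions={"t1"},
    ordinary={("p1", "t1"), ("t1", "p2")},
    inhibit={("p3", "t1")},
)
print(net.enabled("t1", {"p1"}))        # input place marked, inhibiting place empty
print(net.enabled("t1", {"p1", "p3"}))  # blocked by the inhibit arc
```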
∀ p ∈ P_t^{in(o)}: τ(p) = 0 (4-f)
∀ p ∈ P_t^{in(o)} ∀ s ∈ action(p): s := false (5-c)
∀ p ∈ P_t^{in(o)}: χ(p) ≠ Ø ⇒ ∀ p′ ∈ χ∗(p) ∀ s ∈ action(p′): s := false (5-d)
∀ p ∈ P_t^{out}: χ(p) ≠ Ø ⇒ ∀ p′ ∈ χ∗(p): ac(la(p′) = true and
∀ p ∈ P_t^{out} ∀ s ∈ action(p): s := true (5-i)
∀ p ∈ P_t^{out}: χ(p) ≠ Ø ⇒ ∀ p′ ∈ χ∗(p): ac(p′) = true ⇒ ∀ s ∈ action(p′): s := true (5-j)
where ac( p, -te ) is the state of possession (or not) of a token by place p at an
instant, in which the token left place la(p).
Note: Actions e–j are performed when τ(t) = 0. Action k is performed
during the entire activity time of transition t.
The most important notions are additionally defined as follows.
Let a hierarchical Petri net and a place p from the set of places of
this net be given.
3. SEMANTICS
3.1 Synchronism
One of the basic assumptions accepted in the HPN model is synchronism.
Changes of the internal state of a net occur as a result of input changes at strictly
determined instants given by a discrete time scale. This entails the possibility of
simultaneous execution of more than one transition (if the net fulfills the
persistence property). Additionally, it makes it possible to simplify formal
verification methods and the practical realization of the model5.
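As a rough illustration of a synchronous step, the following Python sketch fires every enabled transition in the same tick. It assumes a persistent net (no two enabled transitions compete for the same token) and ignores guards, hierarchy, and time; the function name and the dictionary layout are chosen for this example only.

```python
def synchronous_step(net, marking):
    """net: {transition: (pre_places, post_places)}; marking: frozenset of marked places.

    Fires all enabled transitions at the same discrete instant.
    Assumes persistence: enabled transitions never share an input place.
    """
    enabled = [t for t, (pre, _) in net.items() if pre <= marking]
    consumed = set().union(*(net[t][0] for t in enabled))
    produced = set().union(*(net[t][1] for t in enabled))
    return frozenset((marking - consumed) | produced), sorted(enabled)


net = {
    "t1": ({"p1"}, {"p2"}),
    "t2": ({"p3"}, {"p4"}),
}
new_marking, fired = synchronous_step(net, frozenset({"p1", "p3"}))
print(sorted(new_marking), fired)
```

Both transitions fire in one step here because their input places are disjoint, which is the situation the persistence condition guarantees.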
3.2 Hierarchy
The hierarchy property is realized by net decomposition, in which other
nets are coupled with distinguished places. These places are called macroplaces,
and the assigned nets are called subnets. A subnet is a basic net if no macroplaces
are assigned to it. A subnet is a final net if it contains no macroplaces. The
marking of a macroplace is the activation condition of the corresponding subnet.
This concept allows not only much clearer designs of highly complex structures
but also testing of selected behavioral properties for each subnet separately.
3.3 History
Often the internal state on selected hierarchy levels must be remembered.
In an HPN this is realized by ascribing the history attribute {H} to a selected
macroplace. When a token leaves the macroplace, all token locations in the
corresponding subnet are remembered. After the macroplace is activated again,
tokens are restored to the most recently active places.
For the convenience of the user, it is also possible to ascribe the history
attribute to all subordinated nets. The operator {H*} is used for this purpose.
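The effect of the history attribute can be sketched in a few lines of Python. The class `Macroplace` and its methods are hypothetical names used only for this illustration; the sketch covers a single macroplace with attribute {H} and does not model the recursive operator {H*}.

```python
class Macroplace:
    """Hypothetical model of one macroplace and the marking of its subnet."""

    def __init__(self, initial, history=False):
        self.initial = frozenset(initial)   # initial places of the subnet
        self.history = history              # True models the {H} attribute
        self.saved = None                   # remembered token locations
        self.marking = frozenset()          # empty while the macroplace is inactive

    def activate(self):
        # With history, resume from the remembered places; otherwise start afresh.
        if self.history and self.saved is not None:
            self.marking = self.saved
        else:
            self.marking = self.initial

    def deactivate(self):
        # The token leaves the macroplace: remember the subnet marking if {H} is set.
        if self.history:
            self.saved = self.marking
        self.marking = frozenset()


mp = Macroplace({"P5"}, history=True)
mp.activate()
mp.marking = frozenset({"P6"})   # the subnet has progressed from P5 to P6
mp.deactivate()
mp.activate()
print(sorted(mp.marking))        # resumes in P6 rather than P5
```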
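[Figure: graphical representation of a macroplace MP with history attribute {H*}: body of the macro with places P1, P2, and P3, abort condition #x1, and local signals L.]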
places P2 and P3. P1 is an initial place, and places P2 and P3 are final places.
Macroplace P1 can be deprived of activity by means of the abort condition x1.
A simple example is a simplified control system for the initial washing phase
of an automatic washer (Fig. 3-2).
After the washing program is turned on, valve V1 is opened and water flows in.
Filling continues until level L1 is reached. Meanwhile, once level L2 is exceeded
(heater H is fully submerged), the heater is turned on if the temperature is below
the required value (TL1). After valve V1 is closed, the washing process starts:
the washing cylinder is turned alternately left and right for 10 s with a 5-s break.
The process of keeping the temperature constant is active for the whole washing
cycle. The cycle ends after 280 s: the cylinder is stopped, the heater is turned off,
and valve V2 is opened for water removal.
Such a system can be described by means of hierarchical Petri nets (Fig. 3-3).
[Figure: hierarchical Petri net of the washer controller, with macroplaces P2 {H} and P6 {H}, transitions t1 to t11, abort conditions #(not start) and #s1, outputs V1, V2, H, and CR, conditions c1 = L2 and (not TL1) and c2 = (not L2) or TL2, and time labels <10sec>, <5sec>, and <280sec>.]
4. CONCLUSION
The model offers a convenient means for the formal specification of reactive
systems. An equivalent textual notation (the HPN format) has also been worked out.
Rules for assigning external functions (e.g., in ANSI C or VHDL) to nodes of
the net are the subject of ongoing research.
Further work will focus on creating tools for automatic analysis and synthesis
of the model on a hardware/software codesign platform within the ORION
software package developed by the author.
ACKNOWLEDGMENT
The research was supported by the Polish State Committee for Scientific
Research (Komitet Badań Naukowych) grant 4T11C 006 24.
REFERENCES
1. C. André, Synccharts: A visual representation of reactive behaviors. Technical Report RR
95-52, I3S, Sophia-Antipolis, France (1995).
Chapter 4
WCET PREDICTION FOR EMBEDDED PROCESSORS
USING AN ADL
Abstract: A method for analyzing and predicting the timing properties of a program
fragment is described. First, an architectural description language (ADL)
implemented to describe a processor’s architecture is presented, followed by a
new static worst-case execution time (WCET) estimation method. The timing
analysis starts by compiling a program that describes the processor’s
architecture, followed by disassembling the program fragment. After the
assembler program is sectioned into basic blocks, call graphs are generated, and
these data are later used to evaluate the pipeline hazards and cache misses that
penalize real-time performance. Some experimental results of using the developed
tool to predict the WCET of code segments on Intel microcontrollers are presented.
Finally, some conclusions and future work are presented.
Key words: architectural description language (ADL); worst-case execution time (WCET);
language paradigm; timing scheme, timing analysis.
1. INTRODUCTION
Real-time systems are characterized by the need to satisfy numerous timing and
logical constraints that govern their correctness. Therefore, predicting a tight
worst-case execution time (WCET) of a code segment is a must to guarantee
the system’s correctness and performance. The simplest approach to estimating
the execution time of a program fragment is, for each arithmetic instruction, to
count the number of times it appears in the code, express the contribution of this
instruction in terms of clock cycles, and update the total clock cycles with this
40 Chapter 4
[Figure: tool architecture (label MLPDAP). Modules shown: Front End with Source Editor, Error Processor, Tokenizer, Lexical Analyser, Parser, Hexadecimal File Editor, Symbol Table Manager, and Icode Manager; Executor (Back End) with Simulator, Disassembler, WCET Estimator, Assembler, and User Interface Manager. Modules are connected by Get, Put, Search, Update, GetChar, GetLine, PutLine, and Start operations.]
The disassembling process consists of four phases and has as input an ex-
ecutable file containing the code segment that one wants to measure and the
compiled version of an ADL program. The disassembling process starts at the
start-up code address (start-up code is the bootstrap code executed immediately
after the reset or power-on of the processor) and follows the execution flow of
the program:
1. Starting at the start-up code address, it follows all possible execution paths
till reaching the end address of the “main” function. At this stage, all function
calls are examined and their entry code addresses are pushed into an auxiliary
stack.
2. From the entry address of the “main” function, it checks the main function
code for interrupt activation.
3. For each active interrupt, it gets its entry code address and pushes it into the
auxiliary stack.
4. It pops each entry address from the auxiliary stack and disassembles it,
following the function’s execution paths.
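The four phases can be approximated by a worklist traversal that keeps function and interrupt entry addresses on an auxiliary stack. The Python sketch below is a simplification under stated assumptions: the program is a hypothetical dictionary with one instruction per address, the mnemonics "call", "jmp", "br", and "ret" are placeholders rather than a real instruction set, and interrupt entries are passed in by the caller instead of being discovered in the "main" function.

```python
def disassemble(program, start, interrupt_entries=()):
    """program: {address: (mnemonic, target_or_None)}, one instruction per address.

    Returns the set of reached addresses and the discovered function entries.
    """
    visited, entries = set(), []
    stack = [start] + list(interrupt_entries)   # auxiliary stack of entry addresses
    while stack:
        work = [stack.pop()]                    # follow all paths from this entry
        while work:
            addr = work.pop()
            if addr in visited or addr not in program:
                continue
            visited.add(addr)
            op, target = program[addr]
            if op == "call":
                if target not in entries:       # remember the callee for later
                    entries.append(target)
                    stack.append(target)
                work.append(addr + 1)
            elif op == "jmp":
                work.append(target)
            elif op == "br":                    # conditional branch: both paths
                work.extend([target, addr + 1])
            elif op != "ret":
                work.append(addr + 1)
    return visited, entries


program = {
    0: ("call", 10), 1: ("br", 3), 2: ("nop", None), 3: ("ret", None),
    10: ("nop", None), 11: ("ret", None),
    20: ("ret", None),                          # reached only as an interrupt entry
}
reached, funcs = disassemble(program, 0)
print(sorted(reached), funcs)
```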
The execution of the simulation module is optional, and the associated process
is described by a set of operations introduced using a function named
“SetAction.” For instance, the simulation of an instruction, including its
effect on the flag register, is described using SetAction calls that
specify a sequence of operations. Running the simulation process before the
estimation process will produce a less pessimistic worst-case timing analysis
since it can
1. rectify the execution time of instructions that depend on data locations, such
as stack, internal, or external memory;
2. solve the indirect address problem by checking if it is a jump or a function
call (function call by address);
3. estimate the iteration number of a loop.
The WCET estimator module requires direct interaction with the user, as
some parameters cannot be derived directly from the program code. Note
that the number of occurrences of an interrupt and the maximum number of
iterations of a potentially infinite loop can hardly be determined from the
program code alone. The WCET estimation process is
divided into two phases:
1. First, the code segment to be measured is decomposed into basic blocks;
2. For each basic block, the lower and upper execution times are estimated
using the shortest path method and a timing scheme1 .
The shortest path algorithm with the basic block graph as input is used
to estimate the lower and the upper bounds on the execution time of the
WCET Prediction for Embedded Processors Using an ADL 43
code segment. For the estimation of the upper bound, the multiplicative
inverse of the upper execution time of each basic block is used. A basic block
is a sequence of assembler instructions such that only the first instruction
can be prefixed by a label and only the last one can be a control transfer
instruction.
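The basic block definition translates directly into a splitting rule: a labeled instruction starts a new block, and a control transfer instruction ends the current one. The following Python sketch shows this rule on a toy instruction list; the mnemonic set `CONTROL_TRANSFER` and the `(label, mnemonic)` layout are assumptions for illustration, not the tool's internal representation.

```python
CONTROL_TRANSFER = {"jmp", "br", "call", "ret"}   # hypothetical mnemonic set

def split_basic_blocks(instructions):
    """instructions: list of (label_or_None, mnemonic). Returns a list of blocks."""
    blocks, current = [], []
    for label, op in instructions:
        if label is not None and current:
            blocks.append(current)      # a label may only prefix the first instruction
            current = []
        current.append((label, op))
        if op in CONTROL_TRANSFER:
            blocks.append(current)      # a control transfer must be the last instruction
            current = []
    if current:
        blocks.append(current)
    return blocks


code = [
    (None, "ld"), (None, "add"),
    ("L1", "cmp"), (None, "br"),
    (None, "st"), (None, "ret"),
]
for block in split_basic_blocks(code):
    print(block)
```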
The decomposition phase is carried out following the steps given below:
1. rearrangement of the code segment to guarantee the visual cohesion of a basic
block (note that ordering instructions by address makes the visualization
of the control flow between basic blocks more difficult, because of long
jump instructions that can occur between basic blocks; to guarantee visual
cohesion, all sequences of instructions are rearranged by memory address,
excluding those located at long jump labels, which are inserted from the
last buffer index);
2. characterization of the conditional structure through the identification of the
instruction sequences that compose the “if” and “else” bodies;
3. characterization of the loop structure through the identification of the
instruction sequences that compose the loop body, control, and transfer of
control (it is essential to distinguish between “while/for” and “do while”
loops, since their timing schemes are different);
4. building a basic block graph showing all the execution paths between basic
blocks;
5. finding the lower and upper execution time for each basic block.
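Once lower and upper execution times are known for each basic block, bounds for a whole code segment can be obtained by a path search over the basic block graph. The Python sketch below is one possible formulation under stated assumptions: the graph is acyclic (loops are assumed to have been collapsed using user-supplied iteration bounds), every block reaches the exit block, and the upper bound is computed directly as a longest path instead of through inverted weights. All names are hypothetical.

```python
from functools import lru_cache

def path_bounds(graph, lower, upper, entry, exit_):
    """graph: {block: tuple of successor blocks} (acyclic).
    lower/upper: per-block execution time bounds in cycles.
    Returns (best-case, worst-case) time from entry to exit_.
    """
    @lru_cache(maxsize=None)
    def best(node):
        if node == exit_:
            return lower[node], upper[node]
        lows, highs = zip(*(best(succ) for succ in graph[node]))
        # Shortest continuation for the lower bound, longest for the upper bound.
        return lower[node] + min(lows), upper[node] + max(highs)

    return best(entry)


graph = {"A": ("B", "C"), "B": ("D",), "C": ("D",), "D": ()}
lower = {"A": 2, "B": 3, "C": 5, "D": 1}
upper = {"A": 2, "B": 4, "C": 9, "D": 1}
print(path_bounds(graph, lower, upper, "A", "D"))
```

In this example the lower bound follows the path A, B, D and the upper bound follows A, C, D.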
Figure 4-3. Hazard detection and correction algorithm based on Proebsting’s technique.
The pipeline analysis of a given basic block must always take into account
the influence of the predecessor basic blocks (note that dependences among
instructions can cause pipeline hazards, introducing a delay in instruction
execution); otherwise, the execution time is underestimated.
Therefore, the hazard detection stage of a given basic block always
incorporates the pipeline state associated with the predecessor basic blocks
over the execution paths. The resource vector that describes the pipeline
state is iteratively updated by inserting pipeline stalls to correct data
and/or structural hazards when the next instruction is issued. If both hazards
occur simultaneously, the correction process starts with the hazard that occurred
first and then checks whether the second still remains. The issuing
of a new instruction is always preceded by an update of the previous
pipeline state, achieved by shifting the current pipeline resource vector one
cycle forward (Fig. 4-4).
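The shift-and-stall update of the resource vector can be sketched for structural hazards as follows. This is a plain reservation-table check written for illustration, not the automaton-based technique of the cited work, and data hazards are not modeled. The stage names and the two reservation tables are assumptions.

```python
def issue(state, table):
    """state: list of sets, state[k] = resources busy k cycles from now.
    table: list of sets, table[k] = resources the new instruction needs in cycle k.
    Returns (new_state, stalls).
    """
    state = state[1:]                       # advance the pipeline by one cycle
    stalls = 0
    while any(need & state[k] for k, need in enumerate(table) if k < len(state)):
        state = state[1:]                   # stall: wait one more cycle and retry
        stalls += 1
    merged = [set(s) for s in state]
    merged += [set() for _ in range(len(table) - len(state))]
    for k, need in enumerate(table):
        merged[k] |= need                   # reserve resources for the new instruction
    return merged, stalls


mul = [{"IF"}, {"ID"}, {"EX"}, {"EX"}, {"WB"}]   # hypothetical two-cycle execute stage
add = [{"IF"}, {"ID"}, {"EX"}, {"WB"}]

state, stalls_mul = issue([], mul)
state, stalls_add = issue(state, add)
print(stalls_mul, stalls_add)
```

The second instruction needs one stall because its execute cycle would otherwise overlap the second execute cycle of the first instruction.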
Pipelined architectures usually provide special techniques to correct
the execution flow when a control hazard occurs. For instance, the delayed
transfer of control technique gives the hardware an extra machine cycle to
decide the branch. Also, special hardware is used to determine the branch label
and the condition value at the end of the instruction’s decode stage. Consequently,
the execution of the delay instructions does not depend on the branch
decision; they are always carried out. Therefore, we model the control hazard
as being caused by every kind of branch instruction, and we add the sum of the
execution times of all instructions in the delay slot to the basic block execution
time.
vary and depend on the presence of the instruction and data in the caches.
Furthermore, to know exactly whether the execution of a given instruction causes
a cache miss or hit, it is necessary to carry out a global analysis of the program.
Note that an instruction’s behavior can be affected by memory references that
happened a long time ago. Moreover, the estimation of the WCET becomes harder
for modern processors, as the behaviors of the cache and the pipeline depend on
each other. Therefore, we propose the following changes to the algorithm that
takes into account the pipeline effects:
1. Classify the cache behavior4 for any data and instruction as cache hit or
cache miss before the analysis of the pipeline behavior.
2. Before the issuing of an instruction, verify if there is any cache miss related to
the instruction; if there is, first apply the miss penalty and then the detection
and correction of pipeline hazards.
3. EXPERIMENTAL RESULTS
For the moment, we present some results using the Intel 8xC196
microcontrollers, as they are the only ones whose user’s guide provides all the
needed execution time information. We hope to present soon the results
of experiments with modern processors such as Texas Instruments DSPs, the Intel
8xC296, PICs, and so on. Figure 4-5 shows the result achieved by a direct
measurement of a program composed of two functions: main() and func().
This program was instrumented to allow a direct measurement with a digital
oscilloscope through pin 6 of port 2 (P2.6).
In a first stage, the WCET estimator builds the call graph shown in the lower
right quadrant of Fig. 4-6; then func(), identified by the label C 2192, is
processed, producing a similar screen (Fig. 4-7). The upper right quadrant
presents information such as the execution time of individual basic blocks, the
basic block control flow, and the function execution time. The lower right
quadrant can present the assembly code translated by the disassembler from the
executable code, the call graph, and the simulator state. The upper left quadrant
Figure 4-6. WCET = 61µs was estimated for the code segment measured in Fig. 4-4.
4. CONCLUSIONS
A user-friendly tool for WCET estimation was developed, and the results
obtained on some Intel microcontrollers were very satisfactory. To complete
the evaluation of our tool, we will carry out more tests using other classes of
processors, such as DSPs, PICs, and some Motorola microcontrollers. Full
use of this tool requires some processor information, such as the execution
time of each instruction in the processor instruction set, which is sometimes
not provided in the processor user’s guide. In such a case, to time an individual
instruction, we recommend using a logic analyzer to trigger on the
opcode at the target instruction location and on the opcode and location of the
next instruction.
On the basis of previous studies of some emergent ADLs8,9 , we are de-
veloping a framework to generate accurate simulators with different levels of
abstraction by using information embedded in our ADL. With this ADL it
will be possible to generate simulators, and other tools, for microprocessors
REFERENCES
1. Alan C. Shaw, Deterministic timing schema for parallel programs, Technical Report 90-05-
06, Department of Computer Science and Engineering, University of Washington, Seattle
(1990).
2. K. Nilsen, B. Rygg, Worst-case execution time analysis on modern processor. ACM
SIGPLAN Notices, 30 (11), 20–30 (1995).
3. Y. Steven Li et al., Efficient microarchitecture modeling and path analysis for real-time
software. Technical Report, Department of Electrical Engineering, Princeton University.
4. C. Healy, M. Harmon, D. Whalley, Integrating the timing analysis of pipelining and instruc-
tion caching. Technical Report, Computer Science Department, Florida State University.
5. Sung-Soo Lim, C. Yun Park et al., An accurate worst case timing analysis for RISC pro-
cessors. IEEE Transactions on Software Engineering, 21 (7), 593–604 (1995).
6. P. Sorenson, J. Tremblay, The Theory and Practice of Compiler Writing. McGraw-Hill,
New York, ISBN 0-07-065161-2 (1987).
7. C.W. Fraser, T. Proebsting, Detecting pipeline structural hazards quickly. In: Proceedings
of the 21st Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming
Languages, pp. 280–286 (January 1994).
8. S. Pees et al. LISA – Machine description language for cycle-accurate models of pro-
grammable DSP architectures. In: ACM/IEEE Design Automation Conference (1999).
9. Ashok Halambi, et al. EXPRESSION: A language for architecture exploration through
compiler/simulator retargetability. In: Design Automation and Test in Europe (DATE) Con-
ference (1999).
Chapter 5
VERIFICATION OF CONTROL PATHS
USING PETRI NETS
Abstract: This work introduces a hardware design methodology based on Petri nets that
is applied to the verification of digital control paths. The main purpose is to
design control paths that are modeled and verified formally by means of Petri net
techniques.
Key words: Petri nets; digital system design; hardware design methodology; property check-
ing; verification and validation.
1. INTRODUCTION
As the complexity of digital systems grows rapidly, there is rising interest in
applying modeling and verification techniques for discrete-event systems. Owing
to their universality, Petri nets can be applied for system modeling, simulation,
and functional verification of hardware. For digital system modeling and
verification, Petri net theory shows significant advantages:
• Inherent ability to model sequential and concurrent events with conflict,
join, fork, and synchronization behaviors
• Ability to model system structure and system functionality as static and
dynamic net behavior
• Ability to model digital systems at different levels of abstraction using
hierarchical net concepts
• Ability to apply Petri nets as a graphical tool for functional and timed system
simulation
• Ability to verify system properties by analysis of static and dynamic net
properties
Figure 5-1. Petri net based modeling and verification of digital systems.
Fig. 5-2, HDL input as well as graphical or textual Petri net input are proposed.
Regarding recent attempts to develop a general textual interchange format6 for
Petri nets to link Petri net editors, simulation tools, and analysis tools, an entity
for textual Petri net input is very valuable. High-level Petri nets (HLPN), such
as hierarchical Petri nets or colored Petri nets, can be applied if it is possible
to unfold the HLPN to a FCPN. Once a Petri net model is created, simulation
and analysis methods can be applied to verify the Petri net model functionally.
Because of the graphical concepts of Petri nets, design errors can be detected
easily by functional simulation. Also, functional simulation can be used to test
subnets of the Petri net model successively. A simulation run is represented as
a sequence of state transitions, and thus as a Petri net. The created occurrence
sequence is analyzed further to evaluate system behavior. Beyond simulation,
an exhaustive analysis of structural and behavioral properties leads to a formal
design verification. Behavioral properties are studied by invariant analysis and
reachability analysis. In the next step, the Petri net model can be optimized using
advanced state space analysis, Petri net reduction techniques, and net symmetries.
Some conditions must therefore be met before a Petri net model can be
optimized. Net model optimization techniques strongly affect an efficient
implementation of the Petri net model7. The Petri net model is then subdivided into
small subnets that can be mapped to dedicated hardware modules. The timing
behavior of the chosen hardware modules determines the timing behavior of
the designed digital system. Therefore, it is possible to verify a modeled digital
system functionally and then implement it as a self-timed design8 or as a syn-
chronous clocked design. This is clearly not the case in a conventional hardware
Petri net analysis has to ensure boundedness. In the Petri net model, boundedness
means that the modeled design has a finite state space; control paths with an
infinite number of states are not realistic. If every place of the Petri net
contains at most one token, the logical values “high” and “low” are distinguishable
for the memory cells that are represented by places. The digital system is thereby
modeled transparently. Other assignment schemes require decoding. Liveness is the
next Petri net property necessary for functional verification, and it is interpreted
as the capability to perform a state transition in any case: from any reachable
marking, every transition of the Petri net can eventually be fired. Therefore, if
liveness is preserved, the Petri net, and thus the Petri net model, is not
deadlocked. If, in a live Petri net, the initial state is reachable from every
reachable marking, then the Petri net is reversible.
To reflect the structure of a sequential control path, the Petri net should have
state machine structure (SM structure). In this case no transition of the Petri net
is shared: every transition has exactly one predecessor place and one successor
place. Therefore, in a state machine, generation and consumption of tokens are
avoided. For modeling concurrent control paths, both marked graph structures
(MG structure), with shared transitions and forking and synchronizing events, and
SM structures are required. Thus the Petri net model of a concurrent control path
should have free-choice structure.
Places with more than one successor transition generate conflict situations. If
several posttransitions of a marked place are enabled and one transition fires,
then all other transitions are disabled. Hence the Petri net is not persistent, and
it is not predictable which transition will fire. Transition guards are able to
resolve conflicts, because they represent an additional firing condition that is
required to perform a state transition. Transitions in conflict can therefore be
made unique using transition guards, so that no behavioral ambiguity remains. When
a pre-place of a transition also appears as its post-place, there is a self loop in
the Petri net. Self loops can give a structural expression to external signal
processing. It has to be clarified in the modeling process which self loops, if
any, are desired to emphasize external signal processing. Such self loops can then
be detected and assigned to that situation, and all others are marked as modeling
errors and should be removed.
For concurrent control path models, it is not advisable to start with a state space
analysis. Because of the huge number of states that a concurrent control path may
have, it is more convenient to analyze structural properties first. Tests for
marking conservation, coverability with place invariants, and strong connectedness
run fast. From these net model properties, boundedness is decided.
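The structural classes mentioned above can be tested directly on the presets and postsets of a net. The Python sketch below uses the standard textbook definitions of state machine, marked graph, and free-choice structure; the dictionary layout (`pre[t]` and `post[t]` as sets of places) is an assumption for this example and is unrelated to the file formats of any particular analysis tool.

```python
def is_state_machine(pre, post):
    """Every transition has exactly one input place and one output place."""
    return all(len(pre[t]) == 1 and len(post[t]) == 1 for t in pre)

def is_marked_graph(pre, post, places):
    """Every place has exactly one input transition and one output transition."""
    return all(
        sum(p in post[t] for t in post) == 1 and sum(p in pre[t] for t in pre) == 1
        for p in places
    )

def is_free_choice(pre, places):
    """If a place has several output transitions, it is their only input place."""
    for p in places:
        consumers = [t for t in pre if p in pre[t]]
        if len(consumers) > 1 and any(pre[t] != {p} for t in consumers):
            return False
    return True


# A fork/join net: t1 forks p1 into p2 and p3, t2 synchronizes them back into p1.
places = {"p1", "p2", "p3"}
pre = {"t1": {"p1"}, "t2": {"p2", "p3"}}
post = {"t1": {"p2", "p3"}, "t2": {"p1"}}
print(is_state_machine(pre, post), is_marked_graph(pre, post, places),
      is_free_choice(pre, places))
```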
4. ANALYSIS STRATEGIES
Table 5-1 summarizes all derived analysis strategies12 with regard to the detected
modeling errors and the affected Petri net properties. Strategies S1 . . . S9 are
applied to verify sequential control paths. Primed strategies are derived from
nonprimed ones with only minor changes and can be applied for pipelined control
path verification. If a strategy (S1 . . . S9) exists only as a nonprimed version,
it can be applied for both types of control paths. Strategies S10 . . . S12 are
derived to verify pipelined control paths.
Table 5-1. Analysis strategies
An automated verification of Petri net models using S1 . . . S12 can be performed
according to Figures 5-3 and 5-4. Fig. 5-3 shows the verification of sequential
control paths, whereas Fig. 5-4 illustrates pipelined control path verification.
All analysis strategies can be implemented efficiently using PNK and INA. As
examples, strategies S1 and S2 are listed.
4.1 Strategy S1
1. If Petri net N is unbounded, then
(a) determine all shared transitions ti with |ti•| > 1 using •pj ∀ pj ∈ P ⇒
generate list {tiP};
(b) determine all source transitions ti with •ti = ∅ using •ti ∀ ti ∈ T.
2. Check whether [(ti = •pj, ti ∈ {tiP}) ∨ (ti is a source transition)] ∀ pj ∈ P ⇒
the pretransition ti of place pj produces tokens (ti ∈ {tiP} or •ti = ∅), and
the unboundedness of pj is caused by ti.
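The search for token-producing transitions in strategy S1 can be sketched as follows: collect the transitions with more than one post-place and the transitions without a pre-place, then report them per affected place. This Python fragment only illustrates the idea on preset and postset dictionaries; it is not the PNK or INA implementation, and the function name is hypothetical.

```python
def token_producers(pre, post):
    """pre[t], post[t]: sets of input and output places of transition t.

    Returns {place: transitions that may cause its unboundedness}.
    """
    shared = {t for t in post if len(post[t]) > 1}    # more than one post-place
    sources = {t for t in pre if not pre[t]}          # no pre-place at all
    suspects = {}
    for t in shared | sources:
        for p in post[t]:
            suspects.setdefault(p, set()).add(t)
    return suspects


pre = {"t1": set(), "t2": {"p1"}, "t3": {"p2"}}
post = {"t1": {"p1"}, "t2": {"p2", "p3"}, "t3": {"p1"}}
print(token_producers(pre, post))
```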
4.2 Strategy S2
1. Check liveness ⇒ generate list of live transitions {tiL}.
2. Compute strongly connected components (SCC) in RN ⇒ generate tuple
{SCCi, pj}.
3. Determine shared places and sharing transitions for each SCC using preset
and postset of pj ∀ pj ∈ P: (|•pj| > 1) ∨ (|pj•| > 1) ⇒ generate tuple
{SCCi, pj, tk}.
4. Compare all tuples {SCCi, pj, tk}. Multiple-occurring tk form transitions
between different SCCs: SCCi → SCCj (SMi → SMj).
5. Compare all tk and {tiL}. If tk ∈ {tiL} ⇒ tk is live. Otherwise tk is a dead
transition between two different SCCs.
6. Detect live SCCs. For each SCC, check whether (•pj ∈ {tiL} ∀ pj ∈ SCCi) ∨
(pj• ∈ {tiL} ∀ pj ∈ SCCi) ⇒ SCCi is live. Dead SCCs include at least one
dead transition in the preset or postset of its places.
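Step 2 of strategy S2 relies on strongly connected components. A compact way to compute them is Tarjan's algorithm, sketched below in Python for an arbitrary directed graph given as an adjacency dictionary. Which graph is fed in (the net graph of places and transitions in this example) is an assumption of the sketch; the recursive form is adequate for small models only.

```python
def strongly_connected_components(graph):
    """graph: {node: iterable of successor nodes}. Returns a list of node sets."""
    index, low, on_stack, stack, result = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:              # v is the root of a component
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            result.append(comp)

    for v in graph:
        if v not in index:
            visit(v)
    return result


# Places and transitions as nodes: a cycle p1 -> t1 -> p2 -> t2 -> p1,
# plus a transition t3 that only feeds p1.
net_graph = {"p1": ["t1"], "t1": ["p2"], "p2": ["t2"], "t2": ["p1"], "t3": ["p1"]}
components = strongly_connected_components(net_graph)
print(components)
```

In this example t3 forms its own component, so it would be reported as a node outside the strongly connected part of the net.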
Strategies S1 and S1′ detect shared transitions with more than one post-place
or transitions without a pre-place that cause unbounded places, and hence an
unbounded Petri net. S2 and S3 are applied to detect and localize dead transitions
that are caused by a lack of strong connectedness or by shared transitions with
more than one pre-place. It is possible that the analyzed Petri net is 1-bounded
and live but shows no state machine structure. In this case, all generated tokens
are consumed within a marked graph, that is, a Petri net structure in which no place
is shared. Strategy S4 is applied to localize such shared transitions. Similarly,
S4′ finds nonconservative transitions by evaluating place invariants. By means
of strategies S5 . . . S7, dead transitions caused by one-sided nodes are detected,
as mentioned in Table 5-1. In analysis strategies S8 . . . S9, transition guards
are used to interpret conflicts and self loops within the Petri net model. The last
strategies, S10 . . . S12, are applied to localize liveness and safeness problems
in conservative, strongly connected, and place invariant covered net models.
From the proposed analysis strategies, it is possible to derive modeling
guidelines that affect the modeling process and assist the designer in creating a
system model that reflects the desired functionality. In Table 5-2, analysis results
and their interpretation for the Petri net model are summarized.
Table 5-2. Verified properties and their interpretation in the Petri net model
5. APPLICATION
In computer architecture, the DLX microprocessor is a well-known
example13 . In a case study, the control path of the sequential and the con-
current DLX is designed as a Petri net model, whereas data path and memory
test bench are provided by VHDL models14 . In all, 52 instructions covering all
instruction types are modeled. Additionally reset, error, exception, and inter-
rupt handling is considered. The Petri net model corresponding to the sequential
control path contains 243 nodes. To implement the whole instruction set, 64
control states are required. Complex sequential control paths, such as control
path of a sequential microprocessor, consist of a system of strongly connected
state machines. This includes decomposability into partial nets with state ma-
chine structure. When a Petri net is 1-bounded and live and has state machine
structure, then a very transparent Petri net model is created. Every state in the
Petri net model is assigned to a state of the control path. Thus, the reachability
tree is rather small and represents exactly 64 control states. Using the derived
analysis strategies S1 . . . S9, all modeling errors could be detected, localized,
and removed. The analyzed Petri net properties and their interpretation for the
Petri net model make it possible to decide the functional correctness of the
model formally. The Petri net model that corresponds to the pipelined control path
contains 267 nodes and, compared with the sequential case, a similar proportion
of places and transitions. However, the state space of this Petri net model is of
order O(10^5). The processor pipeline with five stages is modeled using a free-choice
structure that shows only conservative transitions. Hence the fork degree of a
transition equals its synchronization degree. There are control places to ensure
mutual exclusion between pipeline stages. Under these modeling conditions,
especially, place invariants are convenient to verify the correct pipeline behav-
ior. A Petri net model of 5 pipeline stages leads to 4 place invariants that cover
the whole net model, each a set of places that contains a constant weighted sum
of markings. If there are fewer than 4 invariants, the net model may be
unbounded. If the weighted sum of markings is not equal to 1,
then there is a liveness or safeness problem. Because of structural preanalysis, it
is easy to detect and locate modeling errors. The analysis strategies S1 . . . S12
detected and localized all occurring modeling errors. Thus the Petri net model
of the pipelined control path was formally verified, too.
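The place-invariant check described above can be illustrated with a minimal Python sketch. It assumes an ordinary net (all arc weights equal to 1) given as two dictionaries, `pre` and `post`, that map each transition to its input and output places. The two-place net and all place and transition names are hypothetical and are not taken from the DLX model.

```python
def is_place_invariant(weights, pre, post):
    """Return True if no transition changes sum(weights[p] * M(p)).

    weights: dict place -> integer weight (places not listed weigh 0)
    pre, post: dict transition -> list of input / output places
    """
    for t in pre:
        gained = sum(weights.get(p, 0) for p in post[t])
        lost = sum(weights.get(p, 0) for p in pre[t])
        if gained != lost:
            return False
    return True


def weighted_sum(weights, marking):
    """Weighted token count of a marking (dict place -> tokens)."""
    return sum(w * marking.get(p, 0) for p, w in weights.items())


# Hypothetical pipeline-stage fragment: the stage is either free or busy.
pre = {"t_in": ["free"], "t_out": ["busy"]}
post = {"t_in": ["busy"], "t_out": ["free"]}
weights = {"free": 1, "busy": 1}

print(is_place_invariant(weights, pre, post))   # the weighted sum is preserved
print(weighted_sum(weights, {"free": 1}))       # value of the invariant at the initial marking
```

A weighted sum equal to 1 corresponds to the case the text describes as free of liveness and safeness problems for the places covered by the invariant; a transition that produces more weighted tokens than it consumes makes the check fail.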
6. CONCLUSIONS
This work introduces a Petri net based hardware design methodology for
modeling and verification of digital systems. Modeling digital systems using
free-choice Petri nets (FCPN) and control engineering interpreted Petri nets
(CEI PN) leads to highly transparent and simple structured Petri net models.
Using Petri net analysis techniques, functional verification of Petri net models
is obtained by analysis of Petri net properties and a suitable interpretation
of the Petri net model. For functional verification of control paths, analysis
strategies are provided. Using these analysis strategies, it is possible to detect
and localize modeling errors automatically. As an outcome of applied analysis
strategies, some modeling guidelines are derived. Adhering to these modeling
guidelines makes it possible to approach a functionally correct design already in early
modeling cycles. The methodology is applied to the functional verification of
microprocessor control paths.
REFERENCES
1. S. Patil, Coordination of asynchronous events. PhD Thesis, Department of Electrical
Engineering, MIT, Cambridge, MA (1970).
2. D. Misunas, Petri-nets and speed-independent design. Communications of the ACM, 16 (8),
474–479 (1973).
3. J. Cortadella et al., Hardware and Petri nets: Application to asynchronous circuit design.
Lecture Notes in Computer Science, 1825, 1–15, (Springer 2000).
4. R. König, L. Quäck, Petri-Netze in der Steuerungstechnik. Oldenbourg, München (1988).
5. S. Olcoz, J.M. Colom, A Petri net approach for the analysis of VHDL descriptions. Lecture
Notes in Computer Science, 683, 1–15 (Springer 1993).
6. M. Jüngel, E. Kindler, M.Weber, The Petri net markup language. In: Philippi Stefan (ed.),
Fachberichte Informatik Universität Koblenz-Landau, Nr. 7-2000, pp. 47–52 (2000).
7. W. Erhard, A. Reinsch, T. Schober, Formale Verifikation sequentieller Kontrollpfade mit
Petrinetzen, Berichte zur Rechnerarchitektur,Universität Jena, Institut für Informatik, 7
(2), 1–42 (2001).
8. W. Erhard, A. Reinsch, T. Schober, First steps towards a reconfigurable asynchronous sys-
tem. In: Proceedings of 10th IEEE International Workshop on Rapid System Prototyping
(RSP), Clearwater, FL, pp. 28–31 (1999).
9. E. Kindler, M. Weber, The Petri net kernel: An infrastructure for building Petri net tools.
Lecture Notes in Computer Science, 1643, 10–19 (Springer 1999).
10. INA Integrated Net Analyzer. Humboldt Universität zu Berlin, Institut für Informatik,
(July 7, 2003); http://www.informatik.hu-berlin.de/lehrstuehle/automaten/ina.
11. K.Wagner, Petri Netz basierte Implementierung einer formalen Verifikation sequentieller
Kontrollpfade. Studienarbeit, Universität Jena, Institut für Informatik, pp. 1–22 (2002)
(unpublished).
12. T. Schober, Formale Verifikation digitaler Systeme mit Petrinetzen. Dissertation, Univer-
sität Jena, pp. 1–114 (2003).
13. J.L. Hennessy, D.A. Patterson, Computer Architecture: A Quantitative Approach. Morgan
Kaufmann Publisher, San Francisco (1996).
14. The DLXS processor design, University Stuttgart, Institut of Parallel and Distributed
Systems (April 28, 1998); http://www.informatik.uni-stuttgart.de/ipvr/ise/projekte/dlx/.
Chapter 6
MEMORY-SAVING ANALYSIS OF PETRI NETS
Andrei Karatkevich
University of Zielona Góra, Institute of Computer Engineering and Electronics,
ul. Podgórna 50, 65-246 Zielona Góra, Poland; e-mail: [email protected]
Abstract: An approach to Petri net analysis by state space construction is presented in the
paper. It reduces the amount of memory required by removing
from memory the information on some of the intermediate states. Applicability of the
approach to deadlock detection and some other analysis tasks is studied. In addition,
a method of breaking cycles in oriented graphs is described.
1. INTRODUCTION
Petri nets1 are a popular formal model of a concurrent discrete system,
widely applied for specifying and verifying control systems, communication
protocols, digital devices, and so on. Analysis of such nets is a time- and
memory-consuming task, because even a simple net may have a huge number
of reachable states caused by its concurrent nature (the so-called state explosion
problem2,3 ).
Nevertheless, state space search remains one of the main approaches to Petri
net analysis (deadlock detection, for example). There are various methods for
handling the state explosion problem, such as lazy state space constructions, which
build reduced state spaces instead of the complete ones2. Among such methods,
Valmari’s stubborn set method4,5 is the best known.
Theoretically, a huge amount of memory is not necessary to solve Petri net
analysis problems; there is a polynomial-space algorithm for deadlock detection
in a safe Petri net, but it is inapplicable in practice because of its excessive
time consumption3. Generally, the known algorithms that solve verification
tasks in a relatively small memory are extremely slow3.
2. PRELIMINARIES
A Petri net1 is a triple Σ = (P, T, F), where P is a set of places, T is a set
of transitions, P ∩ T = ∅, and F ⊆ (P × T) ∪ (T × P). For t ∈ T, •t denotes
{p ∈ P | (p, t) ∈ F} and t• denotes {p ∈ P | (t, p) ∈ F}; •t and t• are the sets
of input and output places of t, respectively. ∀t ∈ T: •t ≠ ∅, t• ≠ ∅. A similar
notation is used for places (•p, p•). A Petri net can also be considered as an
oriented bipartite graph. A state (marking) of a net is defined as a function
M: P → {0, 1, 2, . . .}. It can be considered as a number of tokens situated in
the net places. M(p) denotes the number of tokens in place p at M. M′ > M
denotes that ∀p ∈ P: M′(p) ≥ M(p) and ∃p ∈ P: M′(p) > M(p). The initial state
M0 is usually specified.
A transition t is enabled and can fire if all its input places contain tokens.
Transition firing removes one token from each input place and adds one token
to each output place, thus changing the current state. If t is enabled in M and its
firing transforms M into M′, then that is denoted as M[t⟩M′. This notation and
the notion of transition firing can be generalized to firing sequences (sequential
firing of transitions such that each transition is enabled in the state created by
the firing of the previous transition). If a firing sequence σ leads from state M to M′,
it is denoted as M[σ⟩M′. A state that can be reached from M by a firing sequence
is called reachable from M; the set of reachable states is denoted as [M⟩. A
transition is live if there is a reachable marking in which it is enabled; otherwise
it is dead. A state in which no transitions are enabled is called a deadlock. A net
is live if in all the reachable markings, all the transitions of the net are live. A
net is safe if in any reachable marking no place contains more than one token.
A net is bounded if ∃n: ∀p ∈ P ∀M ∈ [M0⟩: M(p) ≤ n (there is an upper bound
on the number of tokens for all the net places in all reachable markings).
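The enabling and firing rule defined above can be written down in a few lines. The following Python sketch is only an illustration under stated assumptions: markings are dictionaries from place names to token counts, `pre` and `post` map each transition to its input and output places, and the three-place net is hypothetical.

```python
def enabled(pre, marking, t):
    """t is enabled if every input place of t holds at least one token."""
    return all(marking.get(p, 0) > 0 for p in pre[t])


def fire(pre, post, marking, t):
    """Return the marking reached by firing t; the input marking is not modified."""
    if not enabled(pre, marking, t):
        raise ValueError(f"transition {t} is not enabled")
    new = dict(marking)
    for p in pre[t]:
        new[p] -= 1                      # remove one token from each input place
    for p in post[t]:
        new[p] = new.get(p, 0) + 1       # add one token to each output place
    return new


def is_deadlock(pre, marking):
    """A deadlock is a state in which no transition is enabled."""
    return not any(enabled(pre, marking, t) for t in pre)


# Hypothetical net: p1 -> t1 -> p2 -> t2 -> p3
pre = {"t1": ["p1"], "t2": ["p2"]}
post = {"t1": ["p2"], "t2": ["p3"]}
m0 = {"p1": 1}
m1 = fire(pre, post, m0, "t1")
m2 = fire(pre, post, m1, "t2")
print(m1, m2, is_deadlock(pre, m2))
```

Returning a new dictionary instead of updating the marking in place keeps earlier states intact, which is convenient when several successors of the same state have to be generated.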
A reachability graph is a graph G = (V, E) representing the state space of a
net: V = [M0⟩; e = (M, M′) ∈ E ⇔ M[t⟩M′ (then t marks e). The reachability
graph is finite if and only if the net is bounded. A strongly connected component
(SCC) of a reachability graph is a maximal strongly connected subgraph. A
terminal component of a graph G is its SCC such that each edge which starts
in the component also ends in it3,5.
A set TS of the transitions of a Petri net at state M is a stubborn set if
(1) every disabled transition in TS has an empty input place p such that all
3. ON-THE-FLY REDUCTION OF
REACHABILITY GRAPH
Consider the problem of detecting deadlocks. To solve it there is no need
to keep in memory the whole (even reduced) reachability graph with all
intermediate (non-deadlock) states. But it is evident that removing all of them
can lead to infinite looping (if the reachability graph has cycles). So, some of
the intermediate states should be kept. Which ones? The following affirmations
provide the answer.
Affirmation 1. (Lemma 3 from Ref. 7). For every cycle C in a reachability
graph, there is a cycle C′ in the net graph such that every transition belonging
to C′ marks an arc in C.
Affirmation 2. (Lemma 1 from Ref. 7). Let M[σ⟩M′, where M′ > M. Then
there is a cycle C in the net graph such that every transition belonging to C
appears in σ.
An algorithm is presented, which is a modification of the well-known algo-
rithm of reachability graph building.
Algorithm 1
Input: Petri net Σ = (P, T, F), initial state M0.
Output: Graph G(V, E).
1 V := {M0}, E := ∅, D := {M0}. Tag M0 as “new.”
2 Select Q ⊆ T such that for every cycle in the net graph, at least one
transition belongs to Q.
3 While “new” states exist in V, do the following:
3.1 Select a new state M.
3.2 If no transitions are enabled at M, tag M as “deadlock.”
3.3 While there exist enabled transitions at M, do the following for each
enabled transition t at M:
3.3.1 Obtain the state M′ that results from firing t at M.
Algorithm 2
Input: Petri net Σ = (P, T, F), initial state M0.
Output: Graph G(V, E).
1 V := {M0}, E := ∅, D := {M0}. Tag M0 as “new.”
2 Select Q ⊆ T such that for every cycle in the net graph, at least one
transition belongs to Q.
3 While “new” states exist in V, do the following:
3.1 Select a new state M.
3.2 If no transitions are enabled at M, tag M as “deadlock.”
3.3 While there exist enabled transitions at M, do the following for each
enabled transition t at M:
3.3.1 Obtain the state M′ that results from firing t at M.
3.3.2 If on the path from M0 to M′ there exists a marking M″ such that
M′ > M″, then communicate “The net is unbounded” and go to 4.
3.3.3 If M′ ∉ V, add M′ to V and tag M′ as “new.”
3.3.4 Add the arc (M, M′) to E, mark (M, M′) by t.
3.3.5 If t ∈ Q, add M′ to D.
3.4 If M is not a “deadlock” and M ∉ D, do the following:
3.4.1 For every pair of arcs (a, M) ∈ E and (M, b) ∈ E, add to E arc (a, b),
and mark (a, b) by all the transitions marking (a, M) and (M, b).
3.4.2 Remove M from V and all the incident arcs from E.
4 The end.
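The state-removal idea of the algorithm above can be sketched in Python as follows. This is a simplified illustration, not the author's implementation: it stores only the vertex set V and the set D, omits the arcs (steps 3.3.4 and 3.4.1), omits the unboundedness test of step 3.3.2 (the net is assumed to be bounded), and treats a state as no longer "new" once it has been processed. The function name, the data layout, and the returned peak size are assumptions made for the example; Q is supplied by the caller.

```python
from collections import deque


def find_deadlocks(places, pre, post, m0, q):
    """Deadlock search that forgets processed states not entered by a Q-transition.

    places: list of place names (fixes the order inside marking tuples)
    pre, post: dict transition -> list of input / output places
    m0: dict place -> tokens; q: set of transitions hitting every net cycle
    Returns (set of dead markings, peak number of stored states).
    """
    idx = {p: i for i, p in enumerate(places)}

    def fire(m, t):
        m2 = list(m)
        for p in pre[t]:
            m2[idx[p]] -= 1
        for p in post[t]:
            m2[idx[p]] += 1
        return tuple(m2)

    start = tuple(m0.get(p, 0) for p in places)
    stored = {start}      # V: states currently kept in memory
    keep = {start}        # D: states that are never removed
    new = deque([start])  # states tagged "new" (BFS order)
    deadlocks, peak = set(), 1
    while new:
        m = new.popleft()
        ts = [t for t in pre if all(m[idx[p]] > 0 for p in pre[t])]
        if not ts:
            deadlocks.add(m)          # step 3.2: deadlocks stay in memory
            continue
        for t in ts:
            m2 = fire(m, t)
            if m2 not in stored:      # step 3.3.3
                stored.add(m2)
                new.append(m2)
            if t in q:                # step 3.3.5
                keep.add(m2)
        peak = max(peak, len(stored))
        if m not in keep:             # step 3.4: forget the intermediate state
            stored.discard(m)
    return deadlocks, peak
```

Because a removed state can be reached and processed again, the sketch trades time for memory, which matches the remark in the example section that some states may be considered more than once.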
Figure 6-1. A Petri net (a) and its reachability graph (b).
5. EXAMPLE
Consider a Petri net (Fig. 6-1a). Its reachability graph is shown in Fig. 6-1b.
It has 13 nodes. In Fig. 6-2 graph G is shown; Fig. 6-2a presents it at a stage of
Algorithm 1 execution when its number of nodes is maximal (supposing search
in BFS order); Fig. 6-2b presents its final form. The maximal number of nodes of
G is 6; every marking has been considered only once. The situation is different
if the state space is searched in DFS order; then the maximal number of nodes
is also 13, but some of the states have been considered two or even three times.
It is also interesting to compare an RRG built using the stubborn set method
and graph G built by Algorithm 1a for this example. The RRG contains seven
nodes, and G has maximally three nodes.
6. CONCLUDING REMARKS
6.1 Complexity of the method
How much do we gain (and lose) with the proposed method?
An exact analytical evaluation of the space and time complexity of the
method turns out to be a complex task even for the nets with a very restricted
Figure 6-2. Intermediate (a) and final (b) graph G constructed by Algorithm 1 for the net
shown in Fig. 6-1a.
ACKNOWLEDGMENT
The research was supported by the Polish State Committee for Scientific
Research (Komitet Badań Naukowych) grant 4T11C 006 24.
where e is an arc, w(e) is the weight, id and od are input and output degrees, respectively,
and init(e) and ter(e) are the initial and terminal nodes of the arc e, respectively. Then the arcs
are sorted (in order of nondecreasing weights) and added to the acyclic oriented graph being
constructed, excluding the arcs, adding of which would create a cycle. The algorithm processes
each strongly connected component separately.
This process is similar to the process of building a minimal spanning tree in Kruskal’s
algorithm18, but, of course, a greedy algorithm cannot guarantee the optimal solution in this case.
The intuition behind the algorithm is the following: if the initial node of an arc has many incoming
arcs and few outgoing arcs, and its terminal node has many outgoing arcs and few incoming arcs,
then it is likely that it belongs to many cycles and is one (or one of the few) common arc of those
cycles. Therefore, it is better not to add such an arc to the acyclic graph being built; that is why
a bigger weight is assigned to it.
Figure 6-3. Examples of breaking cycles in oriented graphs by the algorithm described in the
appendix. Dashed arcs are removed.
The time complexity of the algorithm is O((|V| + |E|)^2). For many examples it gives exact
or close to exact solutions (see Fig. 6-3).
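The greedy construction described in this appendix can be sketched in Python as follows. The weight formula of the original algorithm is not reproduced in this text, so `degree_weight` below is a hypothetical stand-in that merely follows the stated intuition (a larger weight when the initial node has many incoming and few outgoing arcs and the terminal node has many outgoing and few incoming arcs). Unlike the described algorithm, the sketch handles the whole graph at once instead of processing each strongly connected component separately.

```python
from collections import Counter


def degree_weight(arcs):
    """Hypothetical weight: id(init) - od(init) + od(ter) - id(ter)."""
    indeg, outdeg = Counter(), Counter()
    for u, v in arcs:
        outdeg[u] += 1
        indeg[v] += 1
    return lambda e: (indeg[e[0]] - outdeg[e[0]]) + (outdeg[e[1]] - indeg[e[1]])


def break_cycles(nodes, arcs, weight):
    """Add arcs in order of nondecreasing weight, skipping arcs that close a cycle."""
    succ = {v: set() for v in nodes}

    def reaches(src, dst):
        stack, seen = [src], set()
        while stack:
            v = stack.pop()
            if v == dst:
                return True
            if v not in seen:
                seen.add(v)
                stack.extend(succ[v])
        return False

    kept, removed = [], []
    for u, v in sorted(arcs, key=weight):
        if reaches(v, u):            # u -> v would close a cycle (covers u == v)
            removed.append((u, v))
        else:
            succ[u].add(v)
            kept.append((u, v))
    return kept, removed


nodes = ["a", "b", "c", "d"]
arcs = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
kept, removed = break_cycles(nodes, arcs, degree_weight(arcs))
print(removed)
```

With the reachability test done by a depth-first search for every arc, the running time of this sketch is quadratic in the size of the graph in the worst case.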
REFERENCES
1. T. Murata, Petri nets: Properties, analysis and applications. Proceedings of the IEEE, 77
(4), 548–580 (April 1989).
2. M. Heiner, Petri net-based system analysis without state explosion. High Performance
Computing ’98, Boston, pp. 1–10 (April 1998).
3. A. Valmari, The state explosion problem. In: W. Reisig, G. Rozenberg (eds.), Lectures
on Petri Nets I: Basic Models, LNCS Tutorials, LNCS 1491, Springer-Verlag pp. 429–528,
(1998).
4. A. Valmari, State of the art report: Stubborn sets. Petri Nets Newsletter, 46, 6–14 (1994).
5. C. Girault, R. Valk, Petri Nets for System Engineering: A Guide to Modeling, Verification,
and Applications, Springer-Verlag (2003).
6. A. Karatkevich, M. Adamski, Deadlock analysis of Petri nets: Minimization of memory
amount. In: Proceedings of 3rd ECS Conference, Bratislava, pp. 69–72 (2001).
7. A.G. Karatkevich, Dynamic reduction of Petri net reachability graphs. Radioelektronika
i Informatika, 18 (1), 76–82, Kharkov (2002) (in Russian).
8. A. Karatkevich, On algorithms for decyclisation of oriented graphs. In: DESDes’01.
Przytok, Poland, pp. 35–40 (2001).
9. R. Janicki, M. Koutny, Optimal simulation, nets and reachability graphs. Technical Report
No. 01-09, McMaster University, Hamilton (1991).
10. A. Karatkevich, Optimal simulation of α-nets. In: SRE-2000. Zielona Góra, pp. 205–210
(2000).
11. A. Lempel, Minimum feedback arc and vertex sets of a directed graph. IEEE Transactions
on Circuit Theory, CT-13 (4), 399–403 (December 1966).
12. N. Deo, Graph Theory with Applications to Engineering and Computer Science. Prentice-
Hall, New Jersey (1974).
13. B. Thelen, Investigation of Algorithms for Computer-Aided Logic Design of Digital Cir-
cuits. University of Karlsruhe (1988) (in German).
14. H.-J. Mathony, Universal logic design algorithm and its application to the synthesis of
two-level switching circuits. In: Proceedings of the IEE, 136 (3), 171–177 (1989).
15. J. Bieganowski, A. Karatkevich, Heuristics for Thelen’s prime implicant method. In: MSK,
Krakow, pp. 71–76 (November 2003) (in Polish).
16. A. Wȩgrzyn, A. Karatkevich, J. Bieganowski, Detection of deadlocks and traps in Petri
nets by means of Thelen’s prime implicant method, AMCS, 14 (1), 113–121 (2004).
17. O. Coudert, J.K. Madre, New ideas for solving covering problems. In: Proceedings of the
32nd ACM/IEEE Conference on Design Automation, San Francisco, California, United
States, pp. 641–646 (1995).
18. T.H. Cormen, Ch.E. Leiserson, R.L. Rivest, Introduction to Algorithms. MIT (1994).
Chapter 7
SYMBOLIC STATE EXPLORATION
OF UML STATECHARTS FOR
HARDWARE DESCRIPTION
Grzegorz Labiak
University of Zielona Góra, Institute of Computer Engineering and Electronics,
ul. Podgórna 50, 65-246 Zielona Góra, Poland; e-mail: [email protected]
Abstract: The finite state machine (FSM) and Petri net theories have elaborated many
techniques and algorithms that enable the employment of formal methods in the
fields of synthesis, testing, and verification. Many of them are based on symbolic
state exploration. This paper focuses on the algorithm of the symbolic state space
exploration of controllers specified by means of statecharts. Statecharts are a new
technique for specifying the behaviour of controllers, which, in comparison with
FSM and Petri nets, is enriched with notions of hierarchy, history, and exception
transitions. The paper presents the statechart diagrams as a means of digital circuit
specification.
1. INTRODUCTION
Statecharts are a visual formalism for the specification of reactive systems,
which is based on the idea of enriching state transition diagrams with notions
of hierarchy, concurrency, and broadcast communication1,2,3,4 . It was invented
as a visual formalism for complex systems by David Harel1 . Today, as a part of
UML technology, it is widely used in many fields of modern engineering5 . The
presented approach features such characteristics as Moore and Mealy automata,
history, and terminal states. There are many algorithms based on a state transi-
tion graph traversal for finite state machines (FSMs) which have applications in
the area of synthesis, test, and verification4,6,7,8,9 . It seems to be very promising
to use well-developed techniques from the FSM and Petri net theory in the field
of synthesis10 , testing, and the verification of controllers specified by means
3. SEMANTICS
A digital controller specified with a statechart and realized as an electronic
circuit is meant to work in an environment that prompts the controller by means
of events. It is assumed that every event (incoming, outgoing, and internal) is
bound with a discrete time domain. The controller reacts to the set of accessible
events in the system by firing a set of enabled transitions, called a microstep.
Because of feedback, execution of a microstep entails generating further events
and causes the firing of subsequent microsteps. Events triggered during the current
microstep do not influence the transitions being realized but are only allowed
to affect the behavior of the controller in the next tick of discrete time: that is, in
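[Figure 7-1: statechart diagram; graphic not reproduced. Recoverable labels: states s1, s11, s12, s2, s3, s4, s5, s6, s7; transitions t1: a+c, t2: b, t3: a*c, t4: a*!b, t5: c*!b/{b}, t6: b; actions exit/a, entry/b, do/c, do/d; history attribute H; Ez = {a, b, c, d}, X = {a}, Y = {d}.]
[Figure 7-2: hierarchy tree; graphic not reproduced. Nodes by level: s1; s11, s12; s2, s3, s4, s5; s6, s7.]
[Figure 7-3: simple diagram with waveforms; graphic not reproduced. States START, ACTION (entry/entr, do/d, exit/ext), STOP; transitions t1: i/{t1}, t2: t1/{t2}.]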
In Fig. 7-3 a simple diagram and its waveforms illustrate the assumed dy-
namics features. When transition t1 is fired (T = 350), event t1 is broadcast and
becomes available for the system at the next instant of discrete time (T = 450).
The activity moves from the START to the ACTION state. Now transition t2
becomes enabled. Its source state is active and predicates imposed on it (event
t1 ) are met. So, at instant of time T = 450 the system shifts its activity to the
STOP state and triggers event t2 , which does not affect any other transition. The
step is finished.
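The delayed visibility of broadcast events can be reproduced with a small simulation. The Python sketch below is a deliberately reduced model under stated assumptions: a flat machine with a single active state (no hierarchy, concurrency, or history), one transition fired per tick, and the entry, do, and exit actions of the ACTION state left out. The function name and the tuple layout of a transition are hypothetical.

```python
def run(transitions, state, external, ticks):
    """Simulate discrete time; events emitted in one tick are visible in the next.

    transitions: list of (name, source, trigger, target, emitted_events)
    external: dict tick -> set of events supplied by the environment
    Returns (final state, trace of (tick, fired transition, new state)).
    """
    pending = set()                      # events broadcast during the previous tick
    trace = []
    for tick in range(ticks):
        available = pending | external.get(tick, set())
        pending = set()
        for name, src, trigger, dst, emitted in transitions:
            if state == src and trigger in available:
                state = dst
                pending |= set(emitted)  # becomes available at the next tick
                trace.append((tick, name, state))
                break
    return state, trace


# Transitions read from the diagram of Fig. 7-3.
transitions = [
    ("t1", "START", "i", "ACTION", ["t1"]),
    ("t2", "ACTION", "t1", "STOP", ["t2"]),
]
final, trace = run(transitions, "START", {0: {"i"}}, ticks=3)
print(final, trace)
```

The event t1 emitted when transition t1 fires is consumed one tick later by transition t2, and the event t2 emitted by the second firing finds no transition that it could trigger.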
4. HARDWARE MAPPING
The main assumption of a hardware implemented behavior described with
statechart diagrams is that the systems specified in this way can directly be
mapped into programmable logic devices. This means that elements from a
diagram (for example, states or events) are to be in direct correspondence with
resources available in a programmable device— mainly flip-flops and the pro-
grammable combinatorial logic. On the basis of this assumption and taking
into account the assumed dynamic characteristics, the following principles of
hardware implementation have been formulated:
• Each state is assigned one flip-flop. Activity of the flip-flop means that the
associated state can be active or, in the case of a state with a history attribute,
that its past activity is remembered; the activity of a state is established on the basis
of the activity of the flip-flops assigned to its superordinate states (in the sense of the
hierarchy tree).
The diagram from Fig. 7-1 consists of 10 state flip-flops (s1, s11, s12, s2, s3,
s4, s5, s6, s7, se) and three event flip-flops (e1, e2, e3). The flip-flop denoted as
e1 corresponds to the exit event e1 assigned to state s2, e2 corresponds to the entry
action (e2) of state s5, and e3 corresponds to the transition action (broadcasting
of the e2 event) bound with the firing of transition t5 (from state s7 to state s6). Global
states comprise all information about the statechart, about both currently active
states and their past activity.
An activity of a state flip-flop does not mean activity of a state bound with
the flip-flop. Logic 1 on flip-flop output means actual or recent state activity.
Hence, state activity is established on the basis of activity of flip-flops assigned
to the superordinate states. The state is said to be active when every flip-flop
bound with the states belonging to the path (in the sense of a hierarchy tree)
carried from the state to the root state (located on top of a hierarchy) is asserted.
Formally, a state activity condition is calculated according to the following
definition:
Definition 2. State activity condition
State activity condition, denoted as activecond(s), is calculated as follows:
$$\mathrm{activecond}(s) = \prod_{s_i \in \mathrm{path}(\mathrm{root}_z,\, s)} s_i \qquad (1)$$
where s_i is a signal from the flip-flop output and path(root_z, s) is the set of states
belonging to the path between root_z and s in the hierarchy tree.
For example, activecond(s6) = s1 ∗ s11 ∗ s3 ∗ s6 (cf. Fig. 7-2, where this path is
thickened).
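Definition 2 can be illustrated with a short Python sketch. It assumes that the hierarchy tree is stored as a `parent` dictionary and that flip-flop outputs are Boolean values; both representations and the function names are choices made for the example. Only the branch s1, s11, s3, s6 named in the text is encoded.

```python
def path_to_root(parent, s):
    """States on the path from s up to the root of the hierarchy tree."""
    path = [s]
    while parent[s] is not None:
        s = parent[s]
        path.append(s)
    return path


def activecond(parent, flipflops, s):
    """Logical product of the flip-flop outputs along the path from the root to s."""
    return all(flipflops[x] for x in path_to_root(parent, s))


# Branch of the hierarchy tree taken from the example activecond(s6).
parent = {"s1": None, "s11": "s1", "s3": "s11", "s6": "s3"}

all_set = {"s1": True, "s11": True, "s3": True, "s6": True}
print(activecond(parent, all_set, "s6"))

# The flip-flop of s6 still stores 1 (remembered activity), but s3 is not asserted.
remembered = {"s1": True, "s11": True, "s3": False, "s6": True}
print(activecond(parent, remembered, "s6"))
```

The second case shows why an asserted state flip-flop alone does not imply that the state is active: the product also requires every superordinate flip-flop to be asserted.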
Definition 3. Configuration
Configuration is a set of currently active states obtained in the consequence of
iterative executing of the system, starting from default global state G 0 .
A configuration carries only the information about the current state of the
system and is calculated by means of the state activity condition (cf. Defini-
tion 2). For example, configuration s1 s11 s12 s2 s̄3 s̄4 s̄5 s̄6 s̄7 se corresponds to global
state s1 s11 s12 s2 s̄3 s̄4 s̄5 s6 s̄7 se e1 ē2 ē3 . In Fig. 7-2 states belonging to the default con-
figuration C0 are grayed.
From the point of view of symbolic analysis techniques it is essential to
express the concept of the set of states. The notion of the characteristic function,
well known in algebra theory, can be applied.
Definition 4. Characteristic function
A characteristic function χ_A of a set of elements A ⊆ U is a Boolean function
χ_A : U → {0, 1} defined as follows:
$$\chi_A(x) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{otherwise.} \end{cases} \qquad (2)$$
possible global states, and Fig. 7-6 the characteristic function of all configura-
tions for the statechart from Fig. 7-1.
symbolic_traversal_of_Statechart(Z, initial_marking) {
    χ[G0⟩ = current_marking = initial_marking;
    while (current_marking != ∅) {
        next_marking = image_computation(Z, current_marking);
        current_marking = next_marking * ¬χ[G0⟩;
        χ[G0⟩ = current_marking + χ[G0⟩;
    }
}
Starting from the default global state and the set of signals, symbolic state
exploration methods enable the computation of the entire set of next global states
in one formal step. Burch et al. and Coudert et al. were the first to independently
propose the approach to the image computation8,9 . Two main methods are the
transition relation and transition function. The latter is the method implemented
by the author. The symbolic state space algorithm of statechart Z is given in
Fig. 7-7.
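The fixed-point structure of the traversal can be shown with explicit sets standing in for characteristic functions. The Python sketch below is an illustration only: Python sets replace BDDs, set difference and union replace the Boolean operations on characteristic functions, and `image` is a hypothetical successor function supplied by the caller rather than the image computation of the described system.

```python
def traverse(image, initial):
    """Breadth-first fixed point: iterate until no new states are found.

    image: function mapping a set of states to the set of their successors
    initial: iterable of initial states
    """
    reached = set(initial)
    current = set(initial)
    while current:
        next_states = image(current)
        current = next_states - reached   # keep only states not seen before
        reached |= current                # accumulate the reachable set
    return reached


# Toy successor relation on the integers modulo 6: x -> 2x mod 6.
def double_mod_6(states):
    return {(2 * x) % 6 for x in states}


print(traverse(double_mod_6, {1}))
```

Each iteration processes the whole frontier at once, which mirrors the way a symbolic method computes all successor global states in one formal step.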
The variables in italics represent characteristic functions of corresponding
sets of configurations. All logical variables are represented by BDDs. Several
subsequent global states are simultaneously calculated using the characteristic
function of current global states and transition functions. This computation is
realized by the image computation function. The set of subsequent global states
is calculated from the following equations:
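$$\mathit{next\_marking} = \exists s\, \exists x \left[ \mathit{current\_marking} * \prod_{i=1}^{n} \left( s_i' \odot \left( \mathit{current\_marking} * \delta_i(s, x) \right) \right) \right] \qquad (4)$$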
In the characteristic function χ[C0⟩ from Fig. 7-6, si denotes the activity of the ith
state. The statechart from Fig. 7-1 describes behavior that comprises 9 global
states and 6 configurations.
7. SYSTEM HICOS
Up till now, there have not been many CAD programs that employ statechart
diagrams in the design of digital circuits implemented in programmable devices. The
most prominent is STATEMATE Magnum by I-Logix15, where the modeled
behavior is ultimately described in an HDL (VHDL or Verilog) with the
use of case and sequential instructions like process or always.
System HiCoS16 (Fig. 7-8) automatically converts a statechart description
of behavior into a register transfer level description. The input model is described
in its own textual representation (statecharts specification format), which
is equivalent to the graphical form, and is then transformed into Boolean
equations. Boolean equations are internally represented by means of BDDs13,17.
Next, a reachability graph can be built, or the designed model can be
implemented in programmable logic devices through RTL VHDL.
[Figure 7-8: system HiCoS; graphic not reproduced. Blocks: Statecharts, SSF, Boolean Equations (BDD), Reachability Graph (BDD), VHDL RTL, FPGA.]
8. CONCLUSION
A visual formalism proposed by David Harel can be effectively used to
specify the behavior of digital controllers. Controllers specified in this way
can subsequently be synthesized in FPGA circuits. In this paper it has been
shown that state space traversal techniques from the FSM and Petri nets theory
can be efficiently used in the fields of statechart controllers design. Within the
framework of the research, a software system called HiCoS has been developed,
where the algorithms have been successfully implemented.
ACKNOWLEDGMENT
The research was supported by the Polish State Committee for Scientific
Research (Komitet Badań Naukowych) grant 4T11C 006 24.
REFERENCES
1. D. Harel, Statecharts, A visual formalism for complex systems. Science of Computer
Programming, Vol. 8. North-Holland, Amsterdam, pp. 231–274 (1987).
2. G. Labiak, Implementacja sieci Statechart w reprogramowalnej strukturze FPGA. Mat. I
Krajowej Konf. Nauk. Reprogramowalne Uklady Cyfrowe, Szczecin, pp. 169–177 (1998).
3. A. Maggiolo-Schettini, M. Merro, Priorities in Statecharts, Dipartimento di Informatica,
Università di Pisa, Corso Italia.
4. M. Rausch, B.H. Krogh, Symbolic verification of stateflow logic. In: Proceedings of the
4th Workshop on Discrete Event System, Cagliari, Italy, pp. 489–494 (1998).
5. UML 1.3 Documentation, Rational Software Corp. 9‘ 9, http://www.rational.com/uml
6. K. Biliński, Application of Petri Nets in parallel controllers design. PhD. Thesis, University
of Bristol, Bristol (1996).
7. J.R. Burch, E.M. Clarke, K.L. McMillan, D. Dill, Sequential circuit verification using
symbolic model checking. In: Proceedings of the 27th Design Automation Conference,
pp. 46–51 (June 1990).
8. O. Coudert, C. Berthet, J.C. Madre, Verification of sequential machines using Boolean
functional vectors. In: IMEC-IFIP International Workshop on Applied Formal Methods
for Correct VLSI Design, pp. 111–128 (November 1989).
9. A. Ghosh, S. Devadas, A.R. Newton, Sequential Logic Testing and Verification. Kluwer
Academic Publisher, Boston (1992).
10. M. Adamski , SFC, Petri nets and application specific logic controllers. In: Proceedings of
the IEEE International Conference on Systems, Man and Cybernetics, San Diego, USA,
pp. 728–733 (November 1998).
11. M. von der Beeck, A Comparison of Statecharts Variants, LNCS, Vol. 860. Springer,
pp. 128–148 (1994).
12. G. Labiak, Wykorzystanie hierarchicznego modelu współbieżnego automatu w
projektowaniu sterowników cyfrowych. PhD Thesis, Warsaw University of Technology, Warsaw
(June, 2003).
13. S.-I. Minato, Binary Decision Diagrams and Applications for VLSI CAD. Kluwer Aca-
demic Publisher, Boston (1996).
14. G. Labiak, Symbolic states exploration of controllers specified by means of statecharts.
In: Proceedings of the International Workshop DESDes’01, Przytok, pp. 209–214 (2001).
15. http://www.ilogix.com/products/magnum/index.cfm
16. http://www.uz.zgora.pl/~glabiak
17. F. Somenzi, CUDD: CU decision diagram package, http://vlsi.colorado.edu/~fabio/CUDD/cuddIntro.html
Chapter 8
CALCULATING STATE SPACES OF
HIERARCHICAL PETRI NETS USING BDD
Piotr Miczulski
University of Zielona Góra, Institute of Computer Engineering and Electronics,
ul. Podgórna 50, 65-246 Zielona Góra, Poland; e-mail: [email protected]
Abstract: The state space of a hierarchical Petri net can be presented as a hierarchical
reachability graph. However, the hierarchical reachability graph can be described
with the help of logic functions. On the other hand, binary decision diagrams
(BDD) are efficient data structures for representing logic functions. Because of the
exponential growth of the number of states in Petri nets, it is difficult to process the
whole state space. Therefore the abstraction method of selected macromodules
gives the possibility of analysis and synthesis for more complex systems. The
goal of the paper is to show a method for representing the state space in the
form of a connected system of binary decision diagrams as well as its calculation
algorithm.
Key words: hierarchical Petri nets; state space calculation algorithm; binary decision diagram;
connected system of binary decision diagrams.
1. INTRODUCTION
Hierarchical interpreted Petri nets are a hierarchical method of describing
concurrent processes. They enable the design of a complex system through
abstracting some parts of the net. This is possible when the abstracted part of
the net is formally correct; i.e., it is safe, live, and persistent1. This approach
can also be used at the level of the state space of a digital circuit. One of
the possible representations of a state space is a hierarchical reachability
graph. It describes the state space on various hierarchy levels. The hierarchical
reachability graph can be represented in the form of logic functions2, in which
86 Chapter 8
the logic variables correspond to places and macroplaces of a Petri net. The
number of them equals the number of places and macroplaces. However, the
efficient methods of representing logic functions are decision diagrams, e.g.,
binary decision diagrams (BDDs), zero-suppressed binary decision diagrams
(ZBDDs), or Kronecker functional decision diagrams (KFDDs). Therefore,
each level of the hierarchy can be represented as a logic function (decision
diagram), and the set of these functions creates the system of the connected
decision diagrams. They describe the hierarchical state space.
In this paper, a method of calculating the hierarchical state space with the help
of operations on logic functions and decision diagrams is presented. The
symbolic traversal method of the state space for flat Petri nets was presented
by Biliński³. The application of this method to hierarchical Petri nets and
the description of a hierarchical reachability graph in the form of a
connected system of decision diagrams are new.
[Figure 8-1. Hierarchical Petri net with macroplaces M1-M5, places p1-p11, and transitions t1-t12 (signal labels SG, OB, KOB, S1-S3, O1-O3, K1-K3).]
the textual format PNSF2 for a specification of the flat and hierarchical Petri
nets4 . In the next step, the computer program loads this specification to internal
object data structures. During loading the textual or graphical description of a
Petri net (or nets), the basic validation of the Petri net structure is made. The
class diagram of the designed Petri net object library (PNOL) is described in
the UML notation, in Fig. 8-2.
After the structure of the Petri net has been read, the state space of the digital
controller can be calculated on the basis of the internal data structure of
the Petri net and with the help of BDDs.
Figure 8-2. The class diagram of the Petri net object library (classes CObject,
CPetriNetObject, CPetriNetModule, CPlace, CTransition, CCommonArc, CInhibitorArc,
CPredicate, CClock, CMainClock, CRegisterClock).
0 and 1, and non-sink nodes, each labeled with a Boolean variable. Each non-
sink node has two output edges labeled 0 and 1 and represents the Boolean
function that equals the function of its edge 0 when the node variable is 0 and
the function of its edge 1 when the variable is 1. The construction of a
standard BDD is based on the Shannon expansion of a Boolean function⁵,⁶. An
ordered binary decision diagram (OBDD) is a BDD in which all the variables are
ordered and every path from the root node to a sink node visits the variables in
the same order. A reduced ordered binary decision diagram (ROBDD) is an OBDD in
which each node represents a distinct logic function. The size of a ROBDD
depends strongly on the variable ordering. Many heuristics have been developed
to optimize the size of BDDs⁵,⁶. In this paper, all binary decision diagrams are
reduced and ordered.
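The construction by Shannon expansion and the effect of variable ordering can be illustrated with a minimal sketch. The class `BDD`, its methods, and the example function are assumptions made for illustration only; they are not the data structures of the Petri net object library.

```python
class BDD:
    """Tiny ROBDD builder: every node (var, low, high) is stored only once."""

    def __init__(self, order):
        self.order = list(order)         # fixed variable ordering
        self.unique = {}                 # (var, low, high) -> node id
        self.nodes = {0: None, 1: None}  # ids 0 and 1 are the sink nodes

    def mk(self, var, low, high):
        if low == high:                  # both edges agree: node is redundant
            return low
        key = (var, low, high)
        if key not in self.unique:       # share isomorphic subgraphs
            node_id = len(self.nodes)
            self.nodes[node_id] = key
            self.unique[key] = node_id
        return self.unique[key]

    def build(self, f, level=0, env=None):
        """Shannon expansion: f = not(x) * f[x=0] + x * f[x=1]."""
        env = env or {}
        if level == len(self.order):
            return int(bool(f(env)))
        var = self.order[level]
        low = self.build(f, level + 1, {**env, var: 0})
        high = self.build(f, level + 1, {**env, var: 1})
        return self.mk(var, low, high)

    def size(self):
        return len(self.nodes) - 2       # number of non-sink nodes


def f(v):
    # hypothetical example function a*b + c*d
    return (v["a"] and v["b"]) or (v["c"] and v["d"])


good = BDD(["a", "b", "c", "d"])
good.build(f)
bad = BDD(["a", "c", "b", "d"])
bad.build(f)
print(good.size(), bad.size())
```

The same function needs fewer non-sink nodes under the first ordering than under the second, which is the ordering sensitivity mentioned above.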
The whole state space of the hierarchical Petri net (Fig. 8-1) can be described
as one logic function:
$$\chi(p_1, p_2, \ldots, p_{11}) = \sum_{M \in R} \; \prod_{i=1}^{11} \tilde{p}_i^{M}, \qquad \tilde{p}_i^{M} = \begin{cases} p_i & \text{if } p_i \text{ is marked in } M, \\ \overline{p_i} & \text{otherwise,} \end{cases}$$
where $R$ is the set of reachable markings, so that the sum contains one 11-literal product term per reachable marking.
This means that the modeled controller may be in one of 29 states. The BDD
for this function has 24 non-sink nodes. We can reduce the number
of nodes by creating a connected system of BDDs (see Fig. 8-3). Each decision
diagram describes the characteristic function of the appropriate hierarchy level.
In this case there are six decision diagrams, which together have 19 non-sink
nodes. This reduction follows from the fact that the size of a decision diagram
depends, among other things, on the number of logic variables of the function.
Besides, each logic function can be represented by a different kind of decision
diagram. The most suitable kind depends on the function: for one logic function
a binary decision diagram is better, whereas for another function a
zero-suppressed binary decision diagram is better.
[Figure 8-3. Connected system of six binary decision diagrams, one per hierarchy level (top level and macroplaces M1-M5).]
Figure 8-4. Variable shifters in the connected system of the decision diagrams.
In many cases we can find similar parts of the Petri net that differ only in
the naming of the places, i.e., they are isomorphic.
The state spaces of such parts of the Petri net can be represented
by the same BDD with special edge attributes called variable shifters
(see Fig. 8-4). These edge attributes were described by Minato⁶. They allow
several decision diagrams to be composed into one graph by storing the difference
of the indices. A variable shifter on an edge indicates that a number should be
added to the indices of all descendant nodes. The variable index is not stored
with each node, because the variable shifter gives the distance between
the pair of nodes. Edges that point to a terminal node carry no variable
shifter. The variable shifters on edges that point to the root of a
BDD indicate the absolute index of the root node. The other variable shifters,
which are assigned to the internal edges of the BDD, indicate the difference
between the index of the start node and that of the end node. We can use these
edges to reduce the number of nodes in the connected system of decision
diagrams, which allows more complex systems to be designed.
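A small sketch may make the idea of variable shifters concrete. It assumes, for illustration only, that the macroplaces M2, M3, and M4 cover places p3-p5, p6-p8, and p9-p11 and that each has the characteristic function "exactly one of its three places is marked"; the node names and the dictionary layout are hypothetical.

```python
# One shared graph. A node stores no variable index, only its two edges
# (low edge, high edge). An edge is (shifter, target). Terminal targets are
# the integers 0 and 1 and carry no shifter (None).
NODES = {
    "r": ((1, "a"), (1, "b")),        # first place of the macroplace
    "a": ((1, "c"), (1, "d")),        # second place, no token seen so far
    "b": ((1, "d"), (None, 0)),       # second place, one token seen
    "c": ((None, 0), (None, 1)),      # third place, no token seen so far
    "d": ((None, 1), (None, 0)),      # third place, one token seen
}

# Root edges: here the shifter is the absolute index of the root variable.
ROOTS = {"M2": (3, "r"), "M3": (6, "r"), "M4": (9, "r")}


def evaluate(root_edge, marking):
    """Follow the shared graph, recovering indices from the shifters."""
    index, node = root_edge
    while node not in (0, 1):
        shift, target = NODES[node][marking[index]]
        if shift is not None:
            index += shift            # relative distance to the next node
        node = target
    return node


marking = {i: 0 for i in range(1, 12)}
marking[4] = 1                        # only p4 is marked
print(evaluate(ROOTS["M2"], marking), evaluate(ROOTS["M3"], marking))
```

Five shared nodes serve all three macroplaces; without shifters each macroplace would need its own copy of the graph.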
Figure 8-5. Calculation algorithm of the state space of a hierarchical Petri net.
[Figure 8-6. Flowchart of the characteristic function calculation for a macroplace: a new marking is generated while current marking != 0.]
splits it into a hierarchical structure of macroplaces. In the next step, the
algorithm recursively calculates the characteristic function of each macroplace.
Each such logic function, represented in the form of a decision diagram,
describes the state space of one macroplace. The calculated decision diagram is
then joined to the connected system of decision diagrams, which represents the
whole state space of the hierarchical Petri net. During this process we can also
check whether the Petri net is formally correct.
One of the more important steps of this algorithm is the calculation of
the characteristic function, which describes the state space of a macroplace
(Fig. 8-6). The symbolic traversal algorithm is taken from Biliński³. In this
method, the next marking is calculated using the characteristic function and the
transition function. Transition functions are the logic functions
associated with places and defined as a functional vector of Boolean functions:
$\Delta = [\delta_1(P, X), \delta_2(P, X), \ldots, \delta_n(P, X)]$, where
$\delta_i(P, X)$ is the transition function of place $p_i$; $P$ and $X$ are the
sets of places and input signals, respectively.
The function $\delta_i$ has value 1 when place $p_i$ has a token in the next
iteration; otherwise it equals 0. Every function $\delta_i$ consists of two
parts: a part describing the situation when the place $p_i$ will receive a token
and a part describing the situation when the place will keep a token. For
example, place $p_7$ will have a token in the next iteration if place $p_6$ has
a token and input signal $S_2$ is active (transition $t_6$ will fire) or place
$p_7$ already has a token and input signal $K_2$ is inactive (transition $t_7$
is disabled); thus the function $\delta_7$ can be defined as follows:
$$\delta_7 = p_6 * S_2 + p_7 * \overline{K_2}.$$
The computation of the set of markings that can be reached from the current
marking (current_marking) in one iteration is carried out according to the
following equation:
$$\mathit{next\_marking} = \exists_{p}\, \exists_{x} \Big( \mathit{current\_marking} * \prod_{i=1}^{n} \big[ p'_i \odot \big( \mathit{current\_marking} * \delta_i(p, x) \big) \big] \Big)$$
where $p$, $p'$, and $x$ denote the present state, the next state, and the input
signal, respectively, and the symbols $\odot$ and $*$ represent the logic
operators XNOR and AND, respectively.
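The traversal can be sketched with explicit sets of markings instead of BDDs; the quantification over $p$ and $x$ then becomes two loops. Only $\delta_7$ comes from the text. The other two transition functions, the restriction to places p6-p8, and all names in the code are assumptions made for this illustration.

```python
from itertools import product

PLACES = ("p6", "p7", "p8")
INPUTS = ("S2", "K2")

# Transition functions: "receives a token" part OR "keeps a token" part.
DELTA = {
    "p6": lambda p, x: p["p6"] and not x["S2"],                             # assumed
    "p7": lambda p, x: (p["p6"] and x["S2"]) or (p["p7"] and not x["K2"]),  # from the text
    "p8": lambda p, x: (p["p7"] and x["K2"]) or p["p8"],                    # assumed
}


def image(current):
    """Set of markings reachable from the set `current` in one iteration."""
    nxt = set()
    for marking in current:                                  # exists p
        p = dict(zip(PLACES, marking))
        for values in product((False, True), repeat=len(INPUTS)):  # exists x
            x = dict(zip(INPUTS, values))
            # p'_i XNOR delta_i(p, x) holds exactly when p'_i = delta_i(p, x)
            nxt.add(tuple(bool(DELTA[q](p, x)) for q in PLACES))
    return nxt


def state_space(initial):
    reached, frontier = {initial}, {initial}
    while frontier:                    # stop when no new marking appears
        frontier = image(frontier) - reached
        reached |= frontier
    return reached


print(sorted(state_space((True, False, False))))
```

A symbolic implementation performs the same fixed-point iteration, but stores `reached` and `frontier` as characteristic functions rather than as enumerated sets.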
5. SUMMARY
The application of a connected system of binary decision diagrams makes it
possible to reduce the number of nodes of the decision diagrams. In addition,
the application of hierarchical Petri nets makes designing parallel digital
controllers easier. This means that we can process more complex digital circuits
on various abstraction levels without processing the whole state space. The
connected system of decision diagrams can also be used for formal verification
of hierarchical Petri nets.
ACKNOWLEDGMENT
The research was supported by the Polish State Committee for Scientific
Research (Komitet Badań Naukowych) grant 4T11C 006 24.
REFERENCES
1. T. Murata, Petri nets: properties, analysis and applications. In: Proceedings of the IEEE, 77
(4), 541–580 (1989).
2. E. Pastor, O. Roig, J. Cortadella, M. R. Badia, Petri net analysis using Boolean manipulation.
In: Application and Theory of Petri Nets, Proceedings of 15th International Conference,
Lecture Notes in Computer Science, Vol. 815. Springer-Verlag (1994).
3. K. Biliński, Application of Petri nets in parallel controller design. PhD. Thesis, University
of Bristol (1996).
4. M. Wȩgrzyn, Hierarchical implementation of concurrent logic controllers by means of
FPGAs. PhD. Thesis, Warsaw University of Technology (1998).
5. R. Drechsler, Binary Decision Diagram. Theory and Implementation. Kluwer Academic
Publishers, pp. 9–30 (1998).
6. S. Minato, Binary Decision Diagrams and Applications for VLSI CAD. Kluwer Academic
Publishers, pp. 9–22 (1996).
Chapter 9
A NEW APPROACH TO SIMULATION
OF CONCURRENT CONTROLLERS
1. INTRODUCTION
Dynamic expansion of electronics in daily life and constantly developing
digital technology are making life easier, better, and safer. Increasing demands
for digital circuits and controllers create a new working field for engineers.
Designers encounter new problems and have to search for new, optimal methods of
designing programmable logic controllers (PLCs).
Two kinds of models, HDL languages and Petri nets, are currently used most
frequently for the design and modeling of digital circuits and concurrent
controllers. Testing a concurrent controller model is very important,
because a good system should not contain events that can never occur.
There exist many different methods of system verification, but one of the
simplest is simulation. This method enables the behavioral correctness of the
model, and thereby of the real circuit, to be checked. Furthermore, simulation
allows many errors to be detected and removed at an early stage of design. The
2. BACKGROUND
In this section basic information about and definitions of Petri nets and the HDL
language are presented.
PN = (P, T, F),
where
• P is a finite set of places;
• T is a finite set of transitions; and
• F is a finite set of arcs.
In a graphical form of Petri nets each place represents a local state of a digital
circuit. A currently active place is marked. Every marked place represents an
active local state. The set of places, which are marked at the same time, defines
the global state of the controller. Each transition describes logical condition of
controller’s state change.
In the flow graph, each place has a unique identifier and an ordered list of
output signals, which are activated in this state (Moore outputs). Each
transition, besides its unique name tag, has an enabling function to change the
currently active state. This function consists of two parts: the logical
condition of the state change and the firing results. The logical condition is a
product of inputs, outputs, predicates, and places that are connected to this
transition by inhibitor and enabling arcs. When the logical condition is
satisfied and all input places of the transition have markers, the transition
fires at the nearest rising edge of the clock signal. As a result, the markers
move from the input places to the output places of the transition and the
appropriate output signals are activated (Mealy outputs).
In large projects, it is more useful to create models of Petri nets with the
use of a hierarchical approach. In this way any part of the net can be replaced
by the symbol of a macronode. Each macroplace or macrotransition includes other
places, transitions, arcs, and other macronodes. They form subnets and
represent lower levels of the hierarchy. When the macronode becomes active
(gets a marker), the control is passed to the subnet and the starting places of
the subnet are marked. The marker stays at this macronode until all output
places of the subnet are marked. Then the control is passed back to the main
net.
[Figure 9-1. Technological process (mixer with console). Console signals: AU, emergency stopping of system; REP, initialization of work; AUT, normal cycle of work. Signal groupings: AC1 = AC2 = AC, C1 = C2 = C, V1 = P = VP, V3 = V5 = V35.]
Figure 9-2 presents Petri nets for the logic circuit that controls the described
technological process. The controller is described by two nets. Each coherent
net is analyzed separately. In the literature there are some ways to analyze such
nets using enabling arcs. However, because the local state P14 occurs
in the conditions of many transitions (e.g., t2, t3, t4), an additional signal
M14 is introduced, since many enabling arcs going out from place P14 would
decrease the legibility of the net.
[Figure 9-2. Petri nets (a) and (b) of the controller of the described technological process.]
Because of this fact a new XML format, PNSF3 (Petri net specification for-
mat 3), was proposed5 . This format is based on PNSF2, which was developed
at the University of Zielona Góra.
PNSF3 is one of such textual descriptions of concurrent controller models.
In this format XML markup notation is used for storing information about the
<MACRO_PLACE>
   description of macroplace used in other units ....
</MACRO_PLACE>
<MACRO_TRANSITION>
   description of macrotransition used in other units
</MACRO_TRANSITION>
<PART>
   description of independent unit ...
</PART>
</PNSF3>
Verilog-based modeling of Petri nets is not known. Verilog syntax can support
intermediate-level models, which makes it possible to describe the highest
level of the system and its interfaces first and then to add greater detail.
The Verilog language describes a digital system as a set of modules. Each
of these modules has an interface to other modules, as well as a description of
its contents⁴.
The basic Verilog statement for describing a process is the always statement.
The always statement continuously repeats its body, never exiting or stopping. A
behavioral model may contain one or more always statements.
The initial statement is similar to the always statement, except that it is
executed only once. The initial statement provides a means of initiating input
waveforms and other simulation variables before the actual description begins
its simulation. The initial and always statements are the basic constructs for
describing concurrency⁴.
Because Verilog can model concurrency, e.g., using the always construct,
Petri nets can be effectively described in the Verilog language.
The conditions of transitions are input signals or internal signals (for subnet
synchronization). Each place is represented by a flip-flop (i.e., the concurrent
one-hot method is used for place encoding), which holds the local state of the
controller. During initialization of the simulation, and whenever the reset
signal is active, the initially marked places are set to logical 1 and the other
places are set to 0.
In Fig. 9-6 a part of a Verilog model is presented. At the beginning, a
file called defines.h is included, in which the real names of objects
(input/output signals, places, and transitions) are assigned to the names used
in the model. In the example presented, the signals Nmin, V1, and TM1 are
assigned as follows:
`include "defines.h"
module Petri_Net_example (reset, clk, INs, REGOUTs, OUTs);
//Declarations
input reset, clk;
input [9:0] INs;
output [13:0] OUTs;
output [1:0] REGOUTs; reg [1:0] REGOUTs ;
reg [14:0] States;
wire [0:12] T;
// Conditions for transitions
assign `T1 = `M14;
assign `T2 = `M14 & `Nmin;
assign `T3 = `M14 & `FT1;
...
//Combinatorial outputs ...
assign `V1 = (`M14 & `NLIM==1)?`P6 :0;
assign `V2 = (`M14==1)?`P7:0;
assign `V3 = `P12;
assign `P = ((`M14 & `NLIM)==1)?`P6:0 ;
...
//... and registered outputs
assign `M14 = `P14;
always @(posedge clk)
begin
if (reset) `TM1 <= 0;
else `TM1 <= `T1&`P1 | `T8&`P9&`P10&`P11 ;
end
always @(posedge clk)
begin
if (reset) `TM2 <= 0;
else `TM2 <= `T9&`P12 ;
end
//Exciting functions for flip-flops (places)
always @(posedge clk)
begin
if (reset) `P1 <= 1;
else `P1 <= (`P1 &~`T1 )|(`T13&`P13);
end
always @(posedge clk)
begin
if (reset) `P2 <= 0;
else `P2 <= (`P2 & ~`T2 ) | (`P1 & `T1) ;
end
...
In a module, the interface of the designed controller is declared, i.e., the
number, types, and sizes of the input/output ports. Then there are two blocks of
assign statements. The first block defines the conditions, with respect to input
signals, for the firing of transitions. The second one defines the logical
outputs for each active local state of the controller. The next blocks are a
group of always statements, which calculate the current state of the flip-flops.
Each always statement controls one place of the net. During the work of the
controller, the excitation function of the flip-flop for the given state
(place) can be calculated as follows:
$$P_x = P_x^{set} + P_x^{hold}$$
and
$$P_x^{set} = \sum_{t_i \in {}^{\bullet}P_x} C(t_i), \qquad P_x^{hold} = p_x * \prod_{t_i \in P_x^{\bullet}} {\sim}C(t_i)$$
where
• $^{\bullet}P_x$ is the set of the input transitions of a place $P_x$;
• $P_x^{\bullet}$ is the set of the output transitions of a place $P_x$;
• $C(t_i)$ is the firing condition of the transition $t_i$; and
• $\sum$, $\prod$, $*$, and $\sim$ are the logical operations OR, AND, AND, and NOT, respectively.
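The set/hold form of the excitation function can be checked with a short behavioral sketch. The function name `next_marking`, the dictionaries, and the net fragment (t1 from P1 to P2, t2 from P2 to P3, t13 from P13 to P1) are assumptions chosen to resemble the listing; here C(t) is taken to include the marking of the input places of t.

```python
def next_marking(marking, pre, post, cond):
    """One clock tick: P_x = P_x_set + P_x_hold for every place.

    marking: place -> bool
    pre, post: transition -> list of input / output places
    cond: transition -> bool, value of the input condition in this cycle
    """
    # C(t): the input condition holds and every input place of t is marked
    fires = {t: cond[t] and all(marking[p] for p in pre[t]) for t in pre}
    nxt = {}
    for place, marked in marking.items():
        # set part: OR over the input transitions of the place
        p_set = any(fires[t] for t in post if place in post[t])
        # hold part: token present AND no output transition fires
        p_hold = marked and not any(fires[t] for t in pre if place in pre[t])
        nxt[place] = p_set or p_hold
    return nxt


# Hypothetical fragment of the net
pre = {"t1": ["P1"], "t2": ["P2"], "t13": ["P13"]}
post = {"t1": ["P2"], "t2": ["P3"], "t13": ["P1"]}
marking = {"P1": True, "P2": False, "P3": False, "P13": False}

step = next_marking(marking, pre, post, {"t1": True, "t2": False, "t13": False})
print(step)
```

For P2 this reduces to (P2 AND NOT T2) OR (P1 AND T1), the same shape as the corresponding always block in the listing.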
Figure 9-7 shows a window from Aldec Active-HDL simulator with the
waveforms as the simulation result (the vector of the output signals is presented
in detail).
In the previous section it was mentioned that PNSF3 does not keep any
information about element arrangement. Therefore, it is very difficult to write
an XSL file that transforms it directly into SVG graphics. To eliminate these
problems, a special program should be used to automate this process.
The first and most important step in creating the SVG file is to arrange
all elements on the screen correctly. Next, the Animate Script is generated on the
basis of information about connections, conditions, and results of execution for
all transitions. The script consists of a few functions, which control the processes
of the animation. Then special functions and control panels are generated.
The additional functions enable communication with the user, while the
control panels enable setting input signals and watching the current state of the
circuit.
5. CONCLUSIONS
In the paper a new XML application for modeling and simulation of digital
circuits, especially concurrent controllers, was presented. The proposed PNSF3
is one such textual form for describing Petri net models.
The most important advantages of the proposed PNSF3 are as follows:
extreme flexibility, platform independence, precise specification, human
readability, ease of preparation, parsability, transformation, and validation.
No special and expensive tools are necessary to prepare documents in PNSF3.
However, the disadvantage is the difficulty of conversion into graphic
formats. This problem can be solved by a special application, which generates the
graphical form of Petri nets as an SVG file on the basis of a PNSF3
specification.
On the other hand, for simulation of logic controllers that are described
by Petri nets, HDL-based models are very efficient. In the paper, modeling
in Verilog-HDL was presented. The Verilog always construct was
used for effective description of concurrency. For concurrent place encoding,
the one-hot method is used; a dedicated flip-flop holds the local state of the
controller.
ACKNOWLEDGMENT
The research has been financially supported by the (Polish) Committee of
Scientific Research in 2004–2006 (grant 3 T11C 046 26).
REFERENCES
1. E. Best, C. Fernandez, Notations and terminology on Petri net theory, Petri Net Newsletter,
23, 21–46 (1986).
2. T. Kropf, Introduction to Formal Hardware Verification. Springer-Verlag, Berlin (1999).
3. R.B. Lyngsø, T. Mailund, Textual interchange format for high-level Petri nets. In: Workshop
on Practical Use of Coloured Petri Nets and Design. Aarhus University, pp. 47–64 (June
1998).
4. D.E. Thomas, P. Moorby, The Verilog Hardware Description Language. Kluwer Academic
Publishers, Boston (1998).
5. A. Wegrzyn, P. Bubacz, XML application for modeling and simulation of concurrent con-
trollers. In: Proceedings of the International Workshop on Discrete-Event Systems Design,
DESDes2001, Przytok, Poland, pp. 215–221 (June 2001).
6. M. Wegrzyn, M. Adamski, J.L. Monteiro, The application of reconfigurable logic to con-
troller design. Control Engineering Practice, Special Section on Custom Processes, IFAC,
6, 879–887 (1998).
7. Extensible Markup Language (XML) 1.0, W3C Recommendation; http://www.w3.org.
8. Scalable Vector Graphics (SVG) 1.0, W3C Candidate Recommendation, http://www.w3.org.
Section III
Synthesis of Concurrent
Embedded Control Systems
Chapter 10
OPTIMAL STATE ASSIGNMENT OF
SYNCHRONOUS PARALLEL AUTOMATA
Yury Pottosin
National Academy of Sciences of Belarus, Institute of Engineering Cybernetics, Surganov Str.,
6, 220012, Minsk, Belarus; e-mail: [email protected]
Abstract: Three algorithms for assignment of partial states of synchronous parallel automata
are considered. Two of them are original; the third one is taken for comparison.
One of them is exact; i.e., the number of coding variables obtained by this algo-
rithm is minimal. It is based on covering a nonparallelism graph of partial states
by complete bipartite subgraphs. Two other algorithms are heuristic. One of the
heuristic algorithms uses the same approach as the exact one. The other is known
as iterative. The results of application of these algorithms on some pseudoran-
dom synchronous parallel automata and the method for generating such objects
are given.
1. INTRODUCTION
The parallel automaton is a functional model of a discrete device and is rather
convenient for representing the parallelism of interacting branches of a
controlled process¹⁹. The main distinction between a parallel automaton and a
sequential one (finite state machine) is that the latter can be in only one
state at any moment, while the parallel automaton can be in several partial
states simultaneously. A set of partial states in which a parallel automaton can
be simultaneously is called a total state. Any two partial states in which an
automaton can be simultaneously are called parallel.
A parallel automaton is described by a set of strings of the form
$\mu_i : -w_i \to v_i \to \nu_i$, where $w_i$ and $v_i$ are elementary
conjunctions of Boolean variables that define the condition of transition and
the output signals, respectively, and $\mu_i$ and $\nu_i$ are labels that
represent sets of partial states of the parallel automaton¹⁹. Every such string
should be understood as follows. If the total state of the parallel automaton
contains all the partial states from $\mu_i$ and the event $w_i$ has been
realized in the input variable space, then the automaton passes to the total
state that differs from the initial one by containing the partial states from
$\nu_i$ instead of those from $\mu_i$. The values of the output variables in
this case are set so that $v_i = 1$.
If $-w_i$ and $\to v_i$ are removed from the string, it can be interpreted as a
transition $(\mu_i, \nu_i)$ in a Petri net. Therefore, the set of such reduced
strings can be considered as a Petri net that is a "skeleton" of the given
parallel automaton. Here we consider only those parallel automata whose skeleton
is an α-net¹⁹, which is a subclass of live and safe expanded nets of free choice
studied in Ref. 6.
In state assignment of a parallel automaton, partial states are encoded by
ternary vectors in the space of introduced internal variables that can take
values 0, 1, or “−”, with orthogonal vectors being assigned to nonparallel
states and nonorthogonal vectors to parallel states²,¹⁸,¹⁹. Two ternary vectors
are orthogonal if there is a component that has opposite values (0 and 1) in
these vectors. It is natural to minimize the dimension of the space, which
minimizes the number of memory elements (flip-flops) in the circuit
implementation of the automaton.
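The orthogonality test is simple enough to state as a one-line sketch. Representing ternary vectors as strings over '0', '1', and '-' is an assumption made for illustration.

```python
def orthogonal(u, v):
    """True if some component is 0 in one vector and 1 in the other."""
    return any({a, b} == {"0", "1"} for a, b in zip(u, v))


print(orthogonal("1--0", "1--1"), orthogonal("1--0", "-01-"))
```

The first pair differs in its last component, so those codes can be given to nonparallel states; the second pair has no conflicting component, so those codes suit parallel states.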
The methods to solve the state assignment problem for synchronous parallel
automata are surveyed in Ref. 4. Two heuristic algorithms are considered here.
One of them is based on the iterative method³; the other reduces the minimization
of the number of memory elements to the problem of covering a nonparallelism
graph of partial states by complete bipartite subgraphs¹⁰. To solve the covering
problem, the algorithm uses a heuristic technique. The third algorithm
considered here is exact; i.e., the number of coding variables (memory elements)
obtained by this algorithm is minimal. It also finds a cover of a nonparallelism
graph of partial states by complete bipartite subgraphs, though using an exact
technique¹². These three algorithms were used to encode partial states of a
number of synchronous parallel automata obtained as pseudorandom objects.
The pseudorandom parallel automata with given parameters were generated
by a special computer program. The method for generating such objects is
described. The results of this experiment make it possible to judge the quality
of the algorithms. Similar experiments are described in Ref. 21, where another
approach was investigated and the pseudorandom objects were Boolean matrices
interpreted as partial state orthogonality matrices of parallel automata.
2. EXACT ALGORITHM
Below, we refer to this algorithm as Algorithm A. It is based on covering a
nonparallelism graph G of partial states by complete bipartite subgraphs.
2.4 Example
Let a parallel automaton be given by the following set of strings:
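$1 : -\overline{x}_1 x_2 \to y_1 \overline{y}_2 \to 10$
$10 : -\overline{x}_2 \to 2.3.4$
$2 : \to y_1 \to 5$
$3.5 : -x_2 \to 8$
$4 : -\overline{x}_1 \to \overline{y}_1 \to 7$
$4 : -x_1 \to y_2 \to 9$
$7 : -\overline{x}_2 \to 9$
$8.9 : \to \overline{y}_2 \to 6$
$6 : -x_1 \to y_1 \overline{y}_2 \to 1$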
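The parallelism relation of this example can be derived from its skeleton alone. The following sketch enumerates the reachable total states of the skeleton and lists the pairs of partial states that never occur together. Taking partial state 1 as the initial total state and the names `SKELETON`, `total_states`, and `nonparallel_pairs` are assumptions made for illustration.

```python
# Skeleton transitions (mu, nu): conditions and outputs are dropped.
SKELETON = [({1}, {10}), ({10}, {2, 3, 4}), ({2}, {5}), ({3, 5}, {8}),
            ({4}, {7}), ({4}, {9}), ({7}, {9}), ({8, 9}, {6}), ({6}, {1})]


def total_states(initial):
    """All total states reachable from `initial` in the skeleton."""
    seen, stack = {initial}, [initial]
    while stack:
        state = stack.pop()
        for mu, nu in SKELETON:
            if mu <= state:                      # all states of mu are active
                successor = frozenset((state - mu) | nu)
                if successor not in seen:
                    seen.add(successor)
                    stack.append(successor)
    return seen


def nonparallel_pairs(states):
    """Pairs of partial states that never belong to the same total state."""
    partial = sorted(set().union(*states))
    parallel = {frozenset((a, b)) for s in states for a in s for b in s if a != b}
    return {(a, b) for i, a in enumerate(partial) for b in partial[i + 1:]
            if frozenset((a, b)) not in parallel}


states = total_states(frozenset({1}))            # assumed initial total state
pairs = nonparallel_pairs(states)
inner = {e for e in pairs if not {1, 6, 10} & set(e)}
print(len(states), sorted(inner))
```

Partial states 1, 6, and 10 are parallel to no other state, so only the pairs among the remaining states are of interest; they are the edges of the subgraph that has to be covered.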
Maximum complete                      Edges of G1                   Nonredundant covers of G1
bipartite G1 subgraphs   v2v8 v3v8 v2v5 v4v7 v4v9 v5v8 v7v9         1  2  3  4  5  6
v2,v8; v5                  0    0    1    0    0    1    0          1  1  1  0  0  0
v2,v3,v5; v8               1    1    0    0    0    1    0          1  1  1  1  1  1
v4; v7,v9                  0    0    0    1    1    0    0          1  1  0  1  1  0
v4,v9; v7                  0    0    0    1    0    0    1          1  0  1  1  0  1
v2; v5,v8                  1    0    1    0    0    0    0          0  0  0  1  1  1
v4,v7; v9                  0    0    0    0    1    0    1          0  1  1  0  1  1
      v1  v2  v3  v4  v5  v6  v7  v8  v9  v10
v1     0   1   1   1   1   1   1   1   1   1
v2     1   0   0   0   1   1   0   1   0   1
v3     1   0   0   0   0   1   0   1   0   1
v4     1   0   0   0   0   1   1   0   1   1
v5     1   1   0   0   0   1   0   1   0   1
v6     1   1   1   1   1   0   1   1   1   1
v7     1   0   0   1   0   1   0   0   1   1
v8     1   1   1   0   1   1   0   0   0   1
v9     1   0   0   1   0   1   1   0   0   1
v10    1   1   1   1   1   1   1   1   1   0
     z1  z2  z3  z4
2     1   −   −   0
3     1   −   −   −
4     −   0   1   −
5     1   −   −   1
7     −   1   0   −
8     0   −   −   1
9     −   1   1   −
The rows of this matrix are marked by the indices of the partial states of the
given automaton, except those having no parallel states. The intervals defined
by the rows of the matrix occupy almost the whole space Z formed by Boolean
variables z 1 , z 2 , z 3 , and z 4 . Only one element of it, 0000, is vacant. Therefore,
to place the remaining partial states, 1, 6, and 10, in Z, it must be widened.
Having added variable z 5 , we obtain the final coding matrix with the minimum
length of codes as follows:
     z1  z2  z3  z4  z5
1     0   0   0   0   0
2     1   −   −   0   0
3     1   −   −   −   0
4     −   0   1   −   0
5     1   −   −   1   0
6     0   0   0   0   1
7     −   1   0   −   0
8     0   −   −   1   0
9     −   1   1   −   0
10    1   0   0   0   1
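The final coding matrix can be checked mechanically against the nonparallelism graph: codes of nonparallel states must be orthogonal and codes of parallel states must not be. The sketch below assumes a string representation of the codes ('-' for the don't-care value); the names `CODES`, `G1_EDGES`, `NO_PARALLEL`, and `coding_is_valid` are hypothetical.

```python
# Rows of the final coding matrix (z1 ... z5)
CODES = {1: "00000", 2: "1--00", 3: "1---0", 4: "-01-0", 5: "1--10",
         6: "00001", 7: "-10-0", 8: "0--10", 9: "-11-0", 10: "10001"}

# Edges of G1 from the example; states 1, 6, and 10 have no parallel states.
G1_EDGES = {(2, 8), (3, 8), (2, 5), (4, 7), (4, 9), (5, 8), (7, 9)}
NO_PARALLEL = {1, 6, 10}


def orthogonal(u, v):
    """True if some component is 0 in one vector and 1 in the other."""
    return any({a, b} == {"0", "1"} for a, b in zip(u, v))


def nonparallel(a, b):
    return (a in NO_PARALLEL or b in NO_PARALLEL
            or (min(a, b), max(a, b)) in G1_EDGES)


def coding_is_valid(codes):
    """Orthogonal codes exactly for the nonparallel pairs of states."""
    states = sorted(codes)
    return all(orthogonal(codes[a], codes[b]) == nonparallel(a, b)
               for i, a in enumerate(states) for b in states[i + 1:])


print(coding_is_valid(CODES))
```

Changing a single component, for example giving state 9 the code of state 4, makes the check fail, because the nonparallel states 4 and 9 would no longer be separated.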
3. HEURISTIC ALGORITHMS
Because the covering problem is NP-hard⁵, it cannot always be solved
in acceptable time. Therefore, heuristic algorithms that in many cases obtain
the shortest cover have been developed.
3.1 Algorithm B
The method realized in Algorithm B reduces the problem to covering the
state nonparallelism graph G of a given automaton by complete bipartite
we try to restore the B-cover by adding vertices from V and edges from E to
the remaining subgraphs. If this fails, we introduce a new complete
bipartite subgraph containing all uncovered edges.
This procedure can be repeated many times. The stopping condition may be
the following: the number m of B-cover elements and the number of
once-covered edges no longer decrease. When this condition is satisfied, the
process is finished, and the B-cover obtained after the last execution of the
procedure is the result.
To illustrate Algorithm B, let us take the parallel automaton from Section
2.4. The adjacency matrix of the state nonparallelism graph of this automaton
is given above. After renumbering according to the rule above, it is
      v1  v2  v3  v4  v5  v6  v7  v8  v9  v10
v1     0   0   0   0   0   0   1   1   1   1
v2     0   0   0   0   1   0   1   1   1   1
v3     0   0   0   1   0   1   0   1   1   1
v4     0   0   1   0   0   1   0   1   1   1
v5     0   1   0   0   0   0   1   1   1   1
v6     0   0   1   1   0   0   0   1   1   1
v7     1   1   0   0   1   0   0   1   1   1
v8     1   1   1   1   1   1   1   0   1   1
v9     1   1   1   1   1   1   1   1   0   1
v10    1   1   1   1   1   1   1   1   1   0
v 1 , v 2 ; v 7 , v 8 , v 9 ; v 4 , v 5 ; v 8 , v 9 ; v 3 ; v 9 ;
B 10
= v 3 , v 6 , v 9 ; v 4 , v 8 , v 10 ; v 2 , v 7 , v 10 ; v 5 , v 8 , v 9 ; v 3 , v 8 , v 9 ; v 6 ;
v 1 , v 2 ; v 7 , v 8 , v 9 , v 10 ; v 4 , v 5 ; v 8 , v 9 , v 10 ; v 3 ; v 9 ; v 7 ; v 10 .
3.2 Algorithm C
To appreciate the efficiency of the proposed algorithms we consider
Algorithm C based on the heuristic iterative method suggested in Ref. 3. The iter-
ative method assumes the definition of parallelism relation and an initial coding
matrix for partial states (the initial matrix may be empty). The matrix is extended
in the process of coding by introducing additional coding variables, which
makes it possible to separate nonparallel partial states in certain pairs. To sep-
arate two states means to put opposite values (0 and 1) to some coding variable
in the codes of these states. The method consists in iterative executions of two
procedures: introducing a new coding variable and defining its values in codes
of nonseparated yet non-parallel partial states. These procedures are executed
until all nonparallel states have been separated. Minimizing the number of intro-
duced coding variables also minimizes the Hamming distance between codes of
states related by transitions. The aim of this is the minimization of the number
of switchings of RS-type fl ip-flops in circuit realization of a parallel automaton.
Introducing a new coding variable is accompanied by separating the maximal
number of not yet separated nonparallel partial states by this variable. For
this purpose, at each step of defining the values of the current variable, a
state is chosen to be encoded by this variable. The chosen state should be
separated from the maximal number of states already encoded by this variable;
that is, the number of already encoded states that are not yet separated from
the chosen one must be maximal. A new coding variable is introduced if the inner
variables already introduced do not separate all nonparallel partial states
from each other.
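The two procedures described above can be sketched in Python as follows. This is only a simplified greedy illustration, not the exact Algorithm C of Ref. 3: the function names, the seeding rule, and the tie-breaking are assumptions, and the Hamming-distance criterion is left out. A value of None stands for a don't-care entry.

```python
from itertools import combinations


def orthogonal(u, v):
    """Two codes are orthogonal if some variable is 0 in one and 1 in the other."""
    return any(x is not None and y is not None and x != y for x, y in zip(u, v))


def greedy_state_coding(states, parallel_pairs):
    """Return {state: [0 | 1 | None, ...]}; None marks a don't-care entry."""
    parallel = {frozenset(p) for p in parallel_pairs}
    # Pairs of nonparallel states that still have to be separated.
    open_pairs = {frozenset(p) for p in combinations(states, 2)} - parallel
    codes = {s: [] for s in states}
    while open_pairs:
        # Procedure 1: introduce a new variable, seeded with one open pair
        # (hypothetical seeding rule: the lexicographically smallest pair).
        a, b = min(sorted(p) for p in open_pairs)
        column = {a: 0, b: 1}
        # Procedure 2: repeatedly assign the value that separates most open pairs.
        while True:
            best = None  # (gain, state, value)
            for s in states:
                if s in column:
                    continue
                for value in (0, 1):
                    others = [t for t, v in column.items() if v != value]
                    if any(frozenset((s, t)) in parallel for t in others):
                        continue  # parallel states must stay nonorthogonal
                    gain = sum(frozenset((s, t)) in open_pairs for t in others)
                    if gain and (best is None or gain > best[0]):
                        best = (gain, s, value)
            if best is None:
                break
            column[best[1]] = best[2]
        for s in states:
            codes[s].append(column.get(s))
        # Keep only the pairs that this variable did not separate.
        open_pairs = {p for p in open_pairs
                      if len({column.get(s) for s in p} - {None}) < 2}
    return codes
```

Each new variable separates at least its seed pair, so the loop terminates; the number of variables found this way is not guaranteed to be minimal.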
5. EXPERIMENTAL RESULTS
Algorithms A, B, and C are implemented as computer programs, and the
corresponding modules are included as components in ISAPR, which is a research
CAD system14. The program for generating pseudorandom parallel automata is
included in ISAPR as well. This program was used to generate several parallel
automata. The results of partial state assignment are shown in Table 10-2. One
of the automata whose partial states were encoded, RAZ, was not generated by
the program mentioned above; it was obtained from a real control algorithm.

Table 10-2. Experimental results (p, t, and s are parameters of α-nets; b is the
number of maximum complete bipartite subgraphs of G2)
As was noted, only the parameters of the α-net were considered, i.e., the number
of places p, the number of transitions t, and the number of sentences s.
Besides these, the number of maximum complete bipartite subgraphs in the graph
G of nonparallelism of partial states of the given automaton may be of interest.
Algorithm A uses a method that decomposes graph G into two subgraphs G1 and G2,
with G1 being complete. The maximum complete bipartite subgraphs were found in
G2. The calculations were performed on an AT-type computer with a 386 processor.
6. CONCLUSION
A technique for investigating algorithms for state assignment of parallel
automata is described in this paper. The experimental data show that Algorithms
B and C are competitive with each other, although Algorithm C is faster than
Algorithm B. Algorithm A is intended for automata of small dimension. It can be
used as a standard algorithm that helps to evaluate the quality of solutions
obtained by heuristic algorithms.
REFERENCES
1. S.M. Achasova, O.L. Bandman, Correctness of Concurrent Computing Processes. Nauka,
Siberian Division, Novosibirsk (1990) (in Russian).
20. A.D. Zakrevskij, Parallel Algorithms for Logical Control. Institute of Engineering
Cybernetics, National Academy of Sciences of Belarus, Minsk (1999) (in Russian).
21. A. Zakrevskij, I. Vasilkova, A quick algorithm for state assignment in parallel automata. In:
Proceedings of the Third International Conference on Computer-Aided Design of Discrete
Devices (CAD DD’99), Vol. 1. Institute of Engineering Cybernetics, National Academy
of Sciences of Belarus, Minsk, pp. 40–44 (1999).
Chapter 11
OPTIMAL STATE ASSIGNMENT OF
ASYNCHRONOUS PARALLEL AUTOMATA
Ljudmila Cheremisinova
Institute of Engineering Cybernetics of National Academy of Sciences of Belarus, Surganov
Str., 6, 220012 Minsk, Belarus; e-mail: [email protected]
Key words: asynchronous parallel automata; state assignment; parallelism; critical races.
1. INTRODUCTION
Successful control of a multicomponent system depends greatly on the efficiency
of the synchronization among its processing elements. The control functions of
such a system are concentrated in one block, a logic control device, which
should provide proper synchronization of the interaction between the
components. In order to represent clearly the interaction involved in a
concurrent engineering system, it is necessary to describe formally its
functional and structural properties.
As a functional model of a discrete control device to be designed, the model
of a parallel automaton is proposed1,8,9. This model can be considered as an
extension of a sequential automaton (finite state machine) that allows parallel
processes to be represented. The parallel automaton is a more complicated and
less studied model than the classical sequential automaton. An essential
difference from the sequential automaton is that a parallel automaton can be in
more than one state simultaneously. That is why the states of a parallel
automaton were called partial8. Partial states in which a parallel automaton
can be at the same moment are called parallel8. Any transition of the automaton
defines parallel partial state changes.
The design of asynchronous automata has been an active area of research
for the last 40 years. There has been a renewed interest in asynchronous design
because of its potential for high performance. However, design of asynchronous
automata remains a cumbersome problem because of difficulties of ensuring
correct dynamic behavior.
An important step in the hardware implementation of a control device is the
state assignment. It is at the heart of the automaton synthesis problem
(especially for the asynchronous mode of realization). Despite great efforts
devoted to this problem, no satisfactory solutions have been proposed. State
assignment for a parallel automaton differs from that for a sequential one in
that there are parallel states (they are compatible in the sense that the
automaton can be in several of them at the same time). That is why it was
suggested in Ref. 8 to encode partial states with ternary vectors that should be
nonorthogonal for parallel partial states but orthogonal for nonparallel ones.
In this way an initial parallel automaton is transformed from its abstract form
into a structural form: a sequent parallel automaton9 or a system of Boolean
functions that can be directly implemented in hardware.
The problem of state assignment becomes harder when an asynchronous
implementation of a parallel automaton is considered. The condition on codes
mentioned above is necessary but not sufficient in that case. The additional
condition to be fulfilled is to avoid the influence of races between memory
elements (flip-flops) during hardware operation. One of the ways to avoid this
is to order the switching of memory elements so as to eliminate critical races.
The problem of race-free state assignment of asynchronous parallel automata
is considered here. The goal is to encode the partial states of a parallel
automaton using a minimal number of coding variables while avoiding critical
races during automaton operation. An exact algorithm for finding a minimal
solution of the problem is suggested. The algorithm allows reducing the
computational effort of state encoding. The same problem is considered in
Ref. 5, where another approach is suggested. The method is based on covering a
family of complete bipartite subgraphs, which define the constraints of absence
of critical races, by a minimal number of maximal complete bipartite subgraphs
of the partial state nonparallelism graph.
2. RACE-FREE IMPLEMENTATION OF
ASYNCHRONOUS AUTOMATON
The asynchronous sequential automaton behaves as follows. Initially, the
automaton is stable in some state. After the input state changes, the output
contains all the partial states from $S_k$ and the variables in the conjunction
term $X_{kl}$ assume values at which $X_{kl} = 1$, then as the result of the
transition the automaton goes to the next global state that differs from the
initial one in that it contains the partial states from $S_l$ instead of those
from $S_k$. More than one generalized transition may take place at some moment
when a parallel automaton functions. These transitions define changing
different subsets of parallel partial states. There are no races on such a pair
of transitions.
In the case of a parallel automaton there are generalized transitions instead
of elementary ones. A generalized transition $t_{kl}: S_k \to S_l$ consists of
$|S_k| \cdot |S_l|$ elementary transitions $s_{ki} \to s_{lj}$, where
$s_{ki} \in S_k$ is nonparallel to $s_{lj} \in S_l$. Let us introduce the set
$T(t_{kl}, t_{pq})$ of pairs of elementary transitions $s_{ki} \to s_{lj}$ and
$s_{pi} \to s_{qj}$ between pairwise nonparallel partial states taken from
$S_k$, $S_l$, $S_p$, and $S_q$ generated by the pair of competing transitions
$t_{kl}: S_k \to S_l$ and $t_{pq}: S_p \to S_q$. For a compatible pair
$t_{kl}, t_{pq}$ of generalized transitions, we have $T(t_{kl}, t_{pq}) = \emptyset$.
The partial states from $\{s_2, s_4, s_5, s_7, s_8, s_9\}$ and $\{s_3, s_6\}$ are
pairwise parallel, and so are the partial states from $\{s_4, s_7\}$ and
$\{s_5, s_8\}$. One can see, for example, that the pair $t_1, t_8$ of generalized
transitions is competing. The generalized constraint $U_{1,8}$ induced by that
pair consists of three simple constraints:
$u^{p}_{1,8.1} = (\{s_1, s_2; s_7\}$ and $\{s_1, s_7; s_2\})$,
$u^{p}_{1,8.2} = (\{s_1, s_2; s_8\}$ and $\{s_1, s_8; s_2\})$, and
$u^{p}_{1,8.3} = (\{s_1, s_3; s_6\}$ and $\{s_1; s_3, s_6\})$.
Thus for this automaton we have the following set of generalized constraints
$U_k$ (in the form of dichotomies) derived from the pairs of competing
generalized transitions:
1. $\{s_1, s_2; s_9\}$ and $\{s_1; s_2, s_9\}$;
2. ($\{s_1, s_2; s_4\}$ and $\{s_1; s_2, s_4\}$) or ($\{s_1, s_2; s_5\}$ and $\{s_1; s_2, s_5\}$);
3. ($\{s_1, s_3; s_6\}$ and $\{s_1; s_3, s_6\}$);
4. $\{s_1, s_2; s_5, s_8\}$;
5. ($\{s_1, s_2; s_7\}$ and $\{s_1, s_7; s_2\}$) or ($\{s_1, s_2; s_8\}$ and $\{s_1, s_8; s_2\}$) or ($\{s_1, s_3; s_6\}$ and $\{s_1, s_6; s_3\}$);
6. $\{s_2, s_9; s_4, s_7\}$;
7. ($\{s_2, s_9; s_4\}$ and $\{s_4, s_2; s_9\}$) or ($\{s_2, s_9; s_5\}$ and $\{s_2, s_5; s_9\}$);
8. $\{s_1, s_7; s_2, s_9\}$ or $\{s_1, s_8; s_2, s_9\}$;
9. $\{s_2, s_9; s_5, s_8\}$;
10. $\{s_1, s_7; s_2, s_4\}$ or $\{s_1, s_8; s_2, s_5\}$;
11. ($\{s_1, s_7; s_4\}$ and $\{s_1; s_4, s_7\}$).
Example 2. For the automaton under consideration we can see that the
generalized constraint ($\{s_1, s_7; s_2, s_9\}$ or $\{s_1, s_8; s_2, s_9\}$)
induced by the pair $t_3, t_8$ of competing transitions implicates the
elementary constraint $\{s_1; s_2, s_9\}$ from the simple constraint
($\{s_1, s_2; s_9\}$ and $\{s_1; s_2, s_9\}$) induced by the pair $t_1, t_2$ of
competing transitions.
The irredundant constraint matrix U for the automaton under consideration
is shown in the second column in Table 11-1. Its first column gives the structure
Table 11-1. Encoding constraints, their boundary vectors, and compatibility relation among
them
The ith entry of ukz defines whether the state si may be encoded with kth
coding variable, and if yes (ith entry is not “+”), it shows what may be the
value of that variable in the code. For example, for the automaton under consid-
eration, boundary vectors for matrix U rows are displayed in the third column
in Table 11-1.
Now we define some operations over 4-valued vectors that extend those over
ternary vectors (keeping in view vectors of the same length). A 4-valued vector
b is an inversion of a vector a if, whenever the ith entry of a is 1, 0, –, +,
the ith entry of b is 0, 1, –, +, respectively. The 4-valued vectors a and b
(one of them can be ternary) are orthogonal if for at least one index i one
vector has the ith entry equal to $\sigma \in \{1, 0\}$ and the other has it
equal to $\bar{\sigma}$ or “+”. The weight of a vector is defined as the sum of
the weights of its entries, where the weight of an entry equal to 1, 0, –, + is
1, 1, 0, 2, respectively. A 4-valued vector a covers a
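The inversion, orthogonality, and weight operations defined above can be illustrated with a short Python sketch. The string representation (with "-" for the don't-care value) and the function names are assumptions made for illustration; the orthogonality test assumes that the second value in the definition is the complement of σ.

```python
INVERSE = {"1": "0", "0": "1", "-": "-", "+": "+"}
WEIGHT = {"1": 1, "0": 1, "-": 0, "+": 2}


def invert(a):
    """Entry-wise inversion: 1 -> 0, 0 -> 1, '-' and '+' stay unchanged."""
    return "".join(INVERSE[x] for x in a)


def orthogonal(a, b):
    """True if at some index one vector has a value s in {0, 1} and the
    other has the complement of s or '+' (assumed reading of the definition)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")

    def clash(x, y):
        return x in ("0", "1") and y in (INVERSE[x], "+")

    return any(clash(x, y) or clash(y, x) for x, y in zip(a, b))


def weight(a):
    """Sum of entry weights: 1 and 0 weigh 1, '-' weighs 0, '+' weighs 2."""
    return sum(WEIGHT[x] for x in a)
```

For example, under these assumptions `invert("10-+")` gives `"01-+"` and `weight("10-+")` gives 4.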
cardinality, such that for any generalized constraint Ui ∈ U there exists an im-
plicant in V implicating it. The second part of the problem is reduced to a
covering problem of Boolean matrix7 , as in the case with Quine table.
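The covering step can be illustrated by a small exhaustive search in Python. This is a generic sketch of minimum row cover for a Boolean matrix, not the procedure of Ref. 7; the function name and the example matrix are hypothetical, and the search is only practical for small tables.

```python
from itertools import combinations


def minimum_row_cover(matrix):
    """Return the indices of a smallest set of rows that together have a 1
    in every column, or None if no cover exists."""
    if not matrix:
        return []
    n_cols = len(matrix[0])
    # Try covers of increasing size, so the first one found is minimal.
    for size in range(1, len(matrix) + 1):
        for rows in combinations(range(len(matrix)), size):
            if all(any(matrix[r][c] for r in rows) for c in range(n_cols)):
                return list(rows)
    return None


# Hypothetical 4 x 4 table: rows are candidate implicants, columns are constraints.
example = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
]
cover = minimum_row_cover(example)
```

Here no single row covers all four columns, and rows 0 and 2 together do, so the search stops at size 2.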
 1   1−1−−0−−−   −−−1−−−−−−−−−−−
 2   1−0−−1−−−   −−−−−−11−−−−−−−−
 3   1000000−−   −11−1−−−−−−−−−−1−
 4   1−0−−1−−−   −−1−1−−−−−−−−−−1−
 5   10−0−−1−1   −−1−−−−1−1−−11−−
 6   10−0−−1−0   −−1−−−−1−−1−11−−
 7   10−−0−−11   −11−−−11−1−−1−−1
 8   10−−0−−10   −11−−−11−−1−1−−1
 9   10−00−0−1   −11−−−−−−1−−−−1−
10   11−−1−−00   1−−−−−−−−1−−−−−1
11   11−11−−−0   1−−−−−−−−1−−−−−−
12   10−11−110   −−−−−−111−11−−−−
13   11−00−0−0   11−−−11−−−−−−−1−
14   11−00−001   −1−−−11−1−−1−−1−
15   11−1−−0−0   1−−−−−1−−1−−−−−−
Then the sixth constraint $u_{3,1.2}$ is chosen (it is compatible with three
constraints), and so on.
$U_9 = u_{12}$ $(u_{6,1})$;
$U_{10} = u_{13} \vee u_{14}$ $(u_{7,1} \vee u_{7,2})$;
$U_{11} = u_{15} \vee u_{16}$ $(u_{8,1} \vee u_{8,2})$;
$U_{12} = u_{17}$ $(u_{9,1})$;
$U_{13} = u_{18} \vee u_{19}$ $(u_{10,1} \vee u_{10,2})$;
$U_{14} = u_{20}$ $(u_{11,1.1})$;
$U_{15} = u_{21}$ $(u_{11,1.2})$;
$U_{16} = u_{22}$ $(u_{12,1})$.
These conjunctive members are assigned to the columns of the Quine table
shown in the third column of Table 11-2. The minimum number of rows that cover
the Quine table is 5. One of the minimal covers, which gives an encoding for the
automaton considered, consists of rows 3, 6, 10, 13, and 1. So we find the
following 5-component codes of partial states that ensure the absence of
critical races when the automaton operates:
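\[
V = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & - & - \\
1 & 0 & - & 0 & - & - & 1 & - & 0 \\
1 & 1 & - & 0 & 0 & - & 0 & 0 & 1 \\
1 & 1 & - & - & 1 & - & - & 0 & 0 \\
1 & - & 1 & - & - & 0 & - & - & -
\end{bmatrix}
\]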
It should be noted that some entries of the matrix with values 0 or 1 can be
replaced with the value “don’t care” because maximal implicants are used. In
our case the irredundant form of the coding matrix is
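\[
V = \begin{bmatrix}
1 & - & 0 & - & - & 0 & - & - & - \\
1 & 0 & - & 0 & - & - & 1 & - & 0 \\
1 & 1 & - & 0 & 0 & - & 0 & 0 & 1 \\
1 & 1 & - & - & 1 & - & - & 0 & 0 \\
1 & - & 1 & - & - & 0 & - & - & -
\end{bmatrix}
\]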
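As a sanity check, the two coding matrices can be tested against the eleven generalized constraints listed earlier. The Python sketch below assumes the usual reading of a dichotomy {A; B}: it is satisfied by a coding variable (a matrix row) that takes one value on every state of A and the opposite value on every state of B. The data layout and function names are illustrative assumptions; columns correspond to s1 through s9, and "-" stands for don't care.

```python
V_FULL = ["1000000--", "10-0--1-0", "11-00-001", "11--1--00", "1-1--0---"]
V_IRREDUNDANT = ["1-0--0---", "10-0--1-0", "11-00-001", "11--1--00", "1-1--0---"]

# Each generalized constraint is a list of alternatives ("or"); each
# alternative is a list of dichotomies (A, B) that must all hold ("and").
CONSTRAINTS = [
    [[((1, 2), (9,)), ((1,), (2, 9))]],
    [[((1, 2), (4,)), ((1,), (2, 4))], [((1, 2), (5,)), ((1,), (2, 5))]],
    [[((1, 3), (6,)), ((1,), (3, 6))]],
    [[((1, 2), (5, 8))]],
    [[((1, 2), (7,)), ((1, 7), (2,))], [((1, 2), (8,)), ((1, 8), (2,))],
     [((1, 3), (6,)), ((1, 6), (3,))]],
    [[((2, 9), (4, 7))]],
    [[((2, 9), (4,)), ((4, 2), (9,))], [((2, 9), (5,)), ((2, 5), (9,))]],
    [[((1, 7), (2, 9))], [((1, 8), (2, 9))]],
    [[((2, 9), (5, 8))]],
    [[((1, 7), (2, 4))], [((1, 8), (2, 5))]],
    [[((1, 7), (4,)), ((1,), (4, 7))]],
]


def row_satisfies(row, dichotomy):
    """One coding variable separates block A from block B of the dichotomy."""
    left, right = dichotomy
    left_values = {row[s - 1] for s in left}
    right_values = {row[s - 1] for s in right}
    return (len(left_values) == 1 and len(right_values) == 1
            and "-" not in left_values | right_values
            and left_values != right_values)


def all_satisfied(matrix):
    """Every generalized constraint has an alternative whose dichotomies
    are each satisfied by some row of the coding matrix."""
    return all(
        any(all(any(row_satisfies(row, d) for row in matrix) for d in alt)
            for alt in constraint)
        for constraint in CONSTRAINTS
    )
```

Under this reading both matrices pass the check, while dropping the last row breaks constraint 3, since no remaining variable separates {s1, s3} from s6.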
5. CONCLUSION
The suggested method solves the encoding problem exactly: it ensures that
the number of variables used to encode partial states is minimal. Unfortunately,
the problems considered are computationally hard. The growth of the computation
time with the size of the problem is a practical limitation on using the method
in computer-aided design systems. It can be used for solving encoding problems
of moderate size obtained after decomposing the whole big problem. Besides,
the method can be useful for estimating the efficiency of heuristic encoding
techniques3.
REFERENCES
1. M. Adamski, M. Wegrzyn, Field programmable implementation of current state machine.
In: Proceedings of the Third International Conference on Computer-Aided Design of
Discrete Devices (CAD DD’99), Vol. 1. Institute of Engineering Cybernetics of the Belarus
Academy of Sciences, Minsk, pp. 4–12 (1999).
2. L.D. Cheremisinova, Implementation of parallel digital control algorithms by asynchronous
automata. Automatic Control and Computer Sciences, 19 (2), 78–83 (1985).
3. L.D. Cheremisinova, Race-free state assignment of a parallel asynchronous automaton.
Upravlyajushchie Sistemy i Mashiny, 2, 51–54 (1987) (in Russian).
4. L.D. Cheremisinova, PLC implementation of concurrent control algorithms. In: Proceedings
of the International Workshop “Discrete Optimization Methods in Scheduling and
Computer-Aided Design”. Republic of Belarus, Minsk, pp. 190–196, Sept. 5–6 (2000).
5. Yu.V. Pottosin, State assignment of asynchronous parallel automata with codes of minimum
length. In: Proceedings of the International Workshop “Discrete Optimization Methods in
Scheduling and Computer-Aided Design”. Republic of Belarus, Minsk, pp. 202–206,
Sept. 5–6 (2000).
6. S.H. Unger, Asynchronous Sequential Switching Circuits. Wiley-Interscience, New York
(1969).
7. A.D. Zakrevskij, Logical Synthesis of Cascade Networks. Nauka, Moscow (1981) (in Russian).
8. A.D. Zakrevskij, Parallel automaton. Doklady AN BSSR, 28 (8), 717–719 (1984) (in
Russian).
9. A.D. Zakrevskij, Parallel Algorithms for Logical Control. Institute of Engineering
Cybernetics of NAS of Belarus, Minsk (1999) (in Russian).
Chapter 12
DESIGN OF EMBEDDED CONTROL SYSTEMS
USING HYBRID PETRI NETS
Abstract: The paper describes the challenges of modeling embedded hybrid control systems
at a higher abstraction level. It discusses the problems of modeling and analyzing
such systems and suggests the use of hybrid Petri nets and time interval Petri nets.
Modeling an exemplary embedded control system with a special hybrid Petri net
class using an object-oriented modeling and simulation tool and the extension of
hybrid Petri nets with the concept of time intervals for analyzing time constraints
shows the potential of this approach.
Key words: embedded control systems; hybrid Petri nets; time interval Petri nets.
1. INTRODUCTION
The design of complex embedded systems makes high demands on the
design process because of the strong combination of hardware and software
components and the observance of strong time constraints. These demands rise
rapidly if the system includes components of different time and signal concepts.
This means that there are systems including both event parts and continuous
parts. Such systems are called heterogeneous or hybrid systems.
The behavior of such hybrid systems cannot be covered homogeneously
by the well-known specification formalisms of the different hardware or soft-
ware parts because of the special adaptation of these methods to their respec-
tive field of application and the different time and signal concepts the several
components are described with. A continuous time model usually describes
continuous components, whereas digital components are described by discrete
events.
There are different approaches to describing both kinds of behavior and their
interaction. On the one hand, the different components can be described by
their special formalisms. On the other hand, a homogeneous description
formalism can be used to model the complete system with its different time and
signal concepts, and that is the approach we favor.
Therefore, we have investigated modeling methods that can describe the be-
havior of such systems homogeneously at a high abstraction level independently
from their physical or technical details. Apart from considering the heterogene-
ity, the modeling method must cope with the high complexity of the modeled
system. This demand requires support for modularization and partitioning and
hierarchical structuring capabilities. To meet the challenges of strong time con-
straints the tool used should have time analysis capabilities.
In the following sections, a graph-based formal modeling approach is pre-
sented. It is based on a special Petri net class, which has extended capabilities
for the modeling of hybrid systems. To model the hybrid systems, we have used
an object-oriented modeling and simulation tool based on this Petri net class.
This tool can be used for modeling hybrid systems from an object-oriented point
of view. It can be used for modeling and simulating components or subsystems
and offers capabilities for hierarchical structuring.
By extending the used Petri net class with the concept of time intervals, an
analysis method for time constraints could be implemented.
hybrid Petri nets to model embedded hybrid systems. The Petri net class used,
hybrid dynamic nets (HDN), and its object-oriented extension are described
in Refs. 3 and 4. This class is derived from the above-mentioned approach of
David and Alla and defines the firing speed as a function of the marking of the
continuous net places.
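A marking-dependent firing speed can be illustrated with a minimal Python sketch. It is not the HDN tool of Refs. 3 and 4: the function name, the single-transition net, the Euler integration, and the speed law v = m(p1) are all assumptions chosen for illustration.

```python
def simulate(marking, pre, post, speed, dt, steps):
    """Euler simulation of one continuous transition.

    marking: {place: real-valued marking}
    pre, post: {place: arc weight} for input and output places
    speed: function mapping the current marking to a firing speed
    """
    m = dict(marking)
    for _ in range(steps):
        v = max(0.0, speed(m))
        # Never consume more than the input places currently hold.
        v = min([v] + [m[p] / (w * dt) for p, w in pre.items()])
        for p, w in pre.items():
            m[p] -= w * v * dt
        for p, w in post.items():
            m[p] += w * v * dt
    return m


# Hypothetical net: p1 drains into p2 at a speed equal to the marking of p1.
final = simulate({"p1": 1.0, "p2": 0.0}, {"p1": 1.0}, {"p2": 1.0},
                 speed=lambda m: m["p1"], dt=0.001, steps=1000)
```

With this speed law the marking of p1 decays approximately exponentially, while the total marking of p1 and p2 stays constant.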
Components or subsystems are modeled separately and abstracted into
classes. Classes are templates, which describe the general properties of objects.
They are grouped into class libraries. Classes can be used to create objects,
which are called instances of these classes. If an object is created by a class, it
inherits all attributes and operations defined in this class.
One of the important advantages of this concept is the ability to describe a
larger system by decomposition into interacting objects. Because of the proper-
ties of objects, the modification of the system model could be achieved easily.
The object-oriented concept unites the advantages of the modules and hierar-
chies and adds useful concepts such as reuse and encapsulation.
has to be considered because of the change of the net behavior with different
time intervals.
The analysis method consists of two steps. First, for a given transition
sequence the reachability graph is calculated without considering any
time restrictions. In the next step, a state class for the given transition
sequence is built with consideration of the time intervals. Every class
includes all states with the same marking components. The states of one class
differ in their time components. A change in the marking component leads to a
class change.
The method of class building is based on the investigation of the firing
capabilities of all transitions as time progresses. The resulting system of
inequalities of every class can be solved with methods of linear optimization
considering additional conditions. Thereby, for the given transition sequence,
different time characteristics can be found:
• Worst case: maximum-minimum analysis,
• Observance of time constraints (deadline),
• Livelock.
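For intuition, consider the simplest special case, sketched below in Python: the transitions of the sequence fire strictly one after another, each becomes enabled when its predecessor fires, and there are no conflicts. Under these assumptions (which are ours, not the general method) the inequalities reduce to sums of interval bounds, and no linear optimization is needed. The function names and the interval values are illustrative.

```python
def sequence_bounds(intervals):
    """Earliest and latest completion time of a purely sequential
    transition sequence with firing intervals [lo, hi]."""
    for lo, hi in intervals:
        if not 0 <= lo <= hi:
            raise ValueError(f"invalid interval [{lo}, {hi}]")
    earliest = sum(lo for lo, _ in intervals)
    latest = sum(hi for _, hi in intervals)
    return earliest, latest


def meets_deadline(intervals, deadline):
    """Worst-case check: even the latest completion time must not exceed the deadline."""
    return sequence_bounds(intervals)[1] <= deadline


# Illustrative intervals for two consecutive transitions.
bounds = sequence_bounds([(0.1, 0.3), (3.0, 5.0)])
```

In the general case, with concurrently enabled transitions, the bounds have to be derived from the state classes as described above.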
4. MODELING AN EMBEDDED
CONTROL SYSTEM
The application example we have chosen to explore the possibilities of using
hybrid Petri nets for modeling embedded hybrid control systems and analyzing
time-related aspects of such a system is an integrated multi-coordinate drive8.
This is a complex mechatronic system including a so-called multicoordinate
measuring system. Figure 12-1 shows this incremental, incident light measuring
system consisting of three scanning units fixed in the stator and a cross-grid
measure integrated into the stage. The two y-systems allow determining the
angle of rotation ϕ. The current x, y1 , and y2 position is determined by the cycle
detection of its corresponding sine and cosine signals. The full cycle counter
keeps track of completed periods of the incremental measuring system. This is
a precondition for the following deep interpolation. The cycle counter of these
signals is a function of the grid constant and the shift between the scanning grids
and the measure. The cycle counter provides a discrete position, and in many
cases this precision is sufficient for the motive control algorithm. To support a
very precise position control with micrometer or nanometer resolution, it must
be decided which possibility of increasing the measure precision is the most
[Figure: Petri net with the elements Forward (fw, m1), Stop (stop, m2), and Backward (bw, m3), the place Position (P, pos), and the outputs Sine (sin) and Cosine (cos), driven by sin(pos)-sin and cos(pos)-cos.]
Components with the same functionalities are abstracted into classes, put
into a class library, and instantiated while modeling. The modeling of a multi-
hierarchical system is possible as well.
[Figure: hierarchical net with the components update_1 (upd), minmax_s, meas_1, and position_1; the minmax components have the ports akt, upd, min, max, mean, and amp; output: normalized cosine (ncos).]
has to detect the moving direction of the carrier and with it the increasing or
decreasing of the cycle number. The original measuring system used a look-up
table, but this was very hard to model with Petri nets. Therefore, we changed
this into logic rules and used these to model the subnet “position 1.”
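The kind of logic rules meant here can be illustrated in Python. The sketch below is an assumption about how such rules might look, not the rule set of the subnet: the quadrant is derived from the signs of the sine and cosine signals, and the cycle count changes only when the signal pair moves between the fourth and the first quadrant. All names are hypothetical.

```python
import math


def quadrant(sin_value, cos_value):
    """Quadrant 0..3 of the signal pair, derived from the signs only."""
    if cos_value > 0 and sin_value >= 0:
        return 0
    if cos_value <= 0 and sin_value > 0:
        return 1
    if cos_value < 0 and sin_value <= 0:
        return 2
    return 3


def update_cycle_count(count, previous_q, current_q):
    """Forward motion crosses from quadrant 3 to 0, backward motion from 0 to 3."""
    if previous_q == 3 and current_q == 0:
        return count + 1
    if previous_q == 0 and current_q == 3:
        return count - 1
    return count


def track(positions):
    """Count completed periods for a sequence of positions given in grid periods.
    Consecutive samples must be less than a quarter period apart."""
    count, previous_q = 0, None
    for x in positions:
        q = quadrant(math.sin(2 * math.pi * x), math.cos(2 * math.pi * x))
        if previous_q is not None:
            count = update_cycle_count(count, previous_q, q)
        previous_q = q
    return count
```

Moving the carrier from 0.1 to 2.6 periods in steps of 0.05, for example, yields a count of 2; moving back to -0.4 periods afterwards brings the count to -1.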
[Figure: subnet computing the Average (mean), Maximum (max), Minimum (min), and Amplitude (amp) of a signal, using the differences akt-max and akt-min.]
[Figure: top-level model with the signal components X-signal, Y1-signal, and Y2-signal (ports signal.fw, signal.stop, signal.bw, signal.sin, signal.cos, signal.pos), a disturbance component (ports Dist.XSin, Dist.XCos, Dist.Y1Sin, Dist.Y1Cos, Dist.Y2Sin, Dist.Y2Cos and their rnd counterparts, Dist.dTime, Dist.dNorm, Dist.dAmp), the measuring components axismeas-X, axismeas-Y1, and axismeas-Y2 (ports axmess.sin, axmess.cos, axmess.nsin, axmess.ncos, axmess.pos), the direction controls Up, Down, Left, Right, and Stop, and the outputs measured Y-position (ypos) and Y-divergence (dy).]
For example, the middle top diagram in Fig. 12-7 shows an extreme example
of a simulation with disturbances. It shows a clear exceeding of the zero line of
the cosine signal. Nevertheless, the normal values are correctly calculated and
the position of the machine is correctly displayed.
5. CONCLUSION
Our investigation has shown the advantages of using hybrid Petri nets for
homogeneous modeling of an embedded hybrid system. The object-oriented
approach of the hybrid Petri net class used makes possible a clear modeling of
complex hybrid systems.
The analysis of time-related properties of complex embedded systems offers
the chance to check, in early stages of the design flow, whether the modeled
system meets given time constraints.
Things that have to be done in future are the extension and completion of
the system model and the integration of the modeling process in a complete
design flow. Here we focus our future work on connecting our approach to
[Figure: time interval Petri net spanning the measurement system and the computer system, with places P1 to P12 and transitions T1 (read sensor signals), T2 (read scaling values), T3 (determine quadrant, interval [0.1, 0.3]), and T4 (determine period count, interval [3.0, 5.0]).]
ACKNOWLEDGMENT
This research work was supported by the DFG (Deutsche Forschungsgemeinschaft,
German Research Association) as part of the investigation project
“Design and Design Methodology of Embedded Systems” with the subject
“Design of Embedded Parallel Control Systems for Integrated Multi-Axial Motive
Systems” under grant FE373/13-1/2.
REFERENCES
1. H. Alla, R. David, J. Le Bail, Hybrid Petri nets. In: Proceedings of the European Control
Conference, Grenoble (1991).
2. H. Alla, R. David, Continuous Petri nets. In: Proceedings of the 8th European Workshop
on Application and Theory of Petri nets, Saragossa (1987).
3. R. Drath, Modeling hybrid systems based on modified Petri nets. PhD Thesis, TU Ilmenau
(1999) (in German).
4. R. Drath, Hybrid object nets: An object-oriented concept for modeling complex hybrid
systems. In: Hybrid Dynamical Systems. Third International Conference on Automation of
Mixed Processes, ADPM’98, Reims (1998).
5. L. Popova-Zeugmann, Time Petri nets. PhD Thesis, Humboldt-Universität zu Berlin (1989)
(in German).
6. L. Popova-Zeugmann, On Parametrical Sequences in Time Petri Nets. Humboldt-
University Berlin, Institute of Informatics.
7. C.A. Petri, Communication with Automata. Schriften des IIM Nr. 2, Institut für
Instrumentelle Mathematik, Bonn (1962) (in German).
8. E. Saffert, C. Schäffel, E. Kallenbach, Control of an integrated multi-coordinate drive. In:
Mechatronics’96, Vol. 1, Guimaraes, Portugal, Proceedings, pp. 151–156, September 18–20
(1996).
Section IV
Implementation of Discrete-Event
Systems in Programmable Logic
Chapter 13
STRUCTURING MECHANISMS
IN PETRI NET MODELS
From specification to FPGA-based implementations
Abstract: This chapter addresses the use of modular model structuring mechanisms for the
design of embedded systems, using reactive Petri nets. Relevant characteristics
of reactive Petri nets are briefly presented. One graphical hierarchical structur-
ing mechanism named horizontal decomposition is presented. This mechanism
relies on the usage of macronodes, which have subnets associated with them
and can be seen as a generalization of widely known mechanisms available in
several Petri nets tools. Three types of macronodes are used: macroplace, macro-
transition, and macroblock. The model execution is accomplished through the
execution of the model’s flat representation. Additional folding mechanisms are
proposed through the introduction of a vector notation associated with nodes
and external signals. A running example of a controller for a 3-cell first-in-first-
out system is used illustrating the several modular construction mechanisms.
High-level and low-level Petri net models are used and compared for this pur-
pose. A modular composition operation is presented and its use in the con-
troller’s design is exemplified. Finally, an overview of distinct field programmable
gate array (FPGA)-based implementation strategies for the referred controller is
discussed.
Key words: Petri nets; structuring mechanisms; modular design; programmable logic devices;
field programmable gate arrays.
1. INTRODUCTION
Petri nets are a well-known formal specification language offering numer-
ous and important advantages for the modeling of concurrent and distributed
3. AN EXAMPLE
In this section we present a running example that allows us to illustrate the
structuring and composition mechanisms that we will introduce. It is a 3-cell
first-in-first-out (FIFO) system with four associated conveyor belts. This is a
simplified version of an example presented elsewhere14 .
Each conveyor (except the last one) has sensors to detect objects on its inputs
and outputs. These sensors have associated variables IN[1..4] and OUT[1..3].
Each conveyor also has movement control, through the variables MOVE[1..3].
Figure 13-1 presents the referred layout.
Depending on the designer preferences, several modeling styles can be used,
e.g., starting with a high-level Petri net model, as in Fig. 13-2, or using low-level
nets, as in Fig. 13-3. In Fig. 13-2, as a simplified notation, the ith element of
the input/output signal vector x, x[i], is referred to as xi. As a matter of fact,
Fig. 13-3 can be seen as a (slightly modified) unfolding of the model in Fig. 13-
2. In Fig. 13-3, we can easily identify three model parts associated with different
token colors in Fig. 13-2. Places and transitions in Fig. 13-3 exhibit the names
used in Fig. 13-2 and additionally receive a vector index. This will be presented
later on.
The colored model in Fig. 13-2 easily accommodates the modeling associated
with the system’s expansion. For example, if one wants to model a 25-cell
FIFO system, the changes are easy to make, since only the initial marking
at pa and p6 and the guards at t2 and t3 have to be changed accordingly.
On the other hand, the equivalent low-level Petri net model will expand
through several pages, although as a result of the duplication of a specific
submodel. Yet, for an implementation based on “elementary platforms” (those
without sophisticated computational resources), it is probably preferable to
start from the low-level model, as it can be used directly as an implementation
specification. This is the case for hardware implementations, namely the ones
supported by FPGAs (or other reconfigurable devices), where each node can be
[Figure: CONTROLLER block with the signals IN[1..N+1], OUT[1..N], and MOVE[1..N]; number of cells N = 3.]
[Figure: colored Petri net model of the FIFO controller with places pa, p2 to p7, and pb (labels: Conveyor free, Object processing, Object feeding, Removing object, Free cell, Conveyor stopped and busy, Conveyor moving), transitions t1 to t8 with guards on i and conditions on INi, INi+1, and OUTi, token colors <1>, <2>, <3>, and the output rule IF OUTi=1 THEN MOVEi.]
[Figure: low-level Petri net model consisting of three identical parts <1>, <2>, and <3>, each with places pa[i], pb[i], p2[i], p3[i], p5[i], p6[i], p7[i] and transitions t1[i], t3[i], t5[i], t7[i], t8[i], with conditions on OUT[i] and IN[i+1]; the first part also carries the condition IN[1]=1.]
[Figure: module Cell[i] with places pa, pb, p2, p3, p5, p6, p7, interface places ppa and ppb, and transitions t1, t3, t5, t7, t8, with conditions on OUT[i] and IN[i+1].]
Interface nodes:
pa - Conveyor free
pb - Conveyor moving
ppa - Next conveyor free
ppb - Next conveyor moving
Other places interpretation:
p2 - Product processing in the cell
p3 - Feeding
p5 - Redraw
p6 - Cell free
p7 - Conveyor stop and busy
[Figure: folded model with the macroblock vector b[i=1..3] = Cell[i], interface places pa, pb, ppa, and ppb, place p1, transition t1 (IN[1]=1), and arc inscriptions depending on k (k-1 for k∉{1}, k+1 for k∉{3}).]
formalization of the net composition, including the use of macronodes for
hierarchical structuring, can be supported by a small set of operations
(defined elsewhere1,13). The basic operation is named net addition.
Net addition is an intuitive operation that works in two steps. In the first step,
two nets are considered as a single net (a disjoint union). In the second step,
several pairs of nodes (either places or transitions) are fused. Figure 13-7 exem-
plifies net addition. The addition of the two upper nets, Start and Cell, results
in net StartCell. Roughly speaking, the addition was accomplished through the
fusion of two pairs of places: pp1 from the Start net with pa from the Cell net
and pp2 from the Start net with pb from the Cell net.
[Figure 13-7: the nets Start and Cell (top) and their sum, the net StartCell (bottom).]
The operation can be represented algebraically as
StartCell = Cell ⊕ Start (pa/pp1 → pa, pb/pp2 → pb) (1)
where the places preceding the arrows are the interface nodes. The nodes after
the arrows are the resulting merged nodes. For example, the notation pa/pp1→
pa means that nodes pa and pp1 are merged (fused) together giving origin to a
new node named pa.
Also, the whole model presented in Fig. 13-3 can be straightforwardly pro-
duced through the following expression, where Place represents a subnet with
no transitions and a single place p1 containing one token* :
System = Place ⊕ Cell[3] ⊕ Cell[2] ⊕ Cell[1] ⊕ Start
(Cell[1].pa/pp1 → pa[1], Cell[1].pb/pp2 → pb[1],
Cell[1].ppb/Cell[2].pb → pb[2], Cell[1].ppa/Cell[2].pa → pa[2],
Cell[2].ppb/Cell[3].pb → pb[3], Cell[2].ppa/Cell[3].pa → pa[3],
Cell[3].ppb/Cell[3].ppa/p1 → p1) (2)
First, the expression clearly shows the nets and the net instances that are
added together. Then the expression shows the node fusions used to “glue” the
nets together. We call this list of node fusion expressions a net collapse and
each of the elements in the list a named fusion set. These two concepts, as well
as net addition, are informally defined in section 5.
4. HORIZONTAL DECOMPOSITION
The horizontal decomposition mechanism is defined in the “common” way
used in top-down and bottom-up approaches, supporting refinements and
abstractions, and is based on the concept of module. The module is modeled in
a separate net, stored in a page; every page may be used several times in the
same design as a net instance. The pages with references to the modules are
referred to as superpages (upper-level pages), while the pages containing the
module model are referred to as subpages (lower-level pages). The nodes of
the net model related to hierarchical structuring are called macronodes.
Three types of macronodes are used: macroplace, macrotransition (also used
by hierarchical colored Petri nets [18]), and macroblock. Every macronode has an
* Note that, for simplification purposes, in Fig. 13-3, the nodes named Cell[i].n appear as
n[i].
5. NET ADDITION
In this section we informally present net addition and the two main associated
concepts: net collapse and named fusion sets. The respective formal definitions
can be found elsewhere [3], together with a more comprehensive presentation. An
implementation of these concepts is also presented elsewhere [2]. It is proposed in
accordance with the Petri Net Markup Language (PNML) [5], which is currently
the major input for a transfer format for the International Standard ISO/IEC
15909 on high-level Petri nets.
For simplification purposes, we base our presentation on a low-level
marked Petri net. This implies that all the nonautonomous extensions as
well as the high-level features are not mentioned. In fact, for net inscriptions,
only the net marking is taken into account: the respective place markings are
added when places are fused. This assumes the existence of an addition
operation defined for net markings. All other net inscriptions can be handled
in a similar manner as long as the respective addition operation is defined on
them.
We now present informal definitions for a few concepts implicit in a net
addition operation.
Given a net with a set of places P and a set of transitions T, we say that a
fusion set is a subset of P or a subset of T. A fusion set with n nodes x1, x2, ..., xn
is denoted as x1/x2/.../xn.
As the objective of fusion sets is the specification of a new fused node,
we also define named fusion sets. A named fusion set is simply a fusion set
(x1/x2/.../xn) plus the name (rn) for the node resulting from the fusion of all
the nodes in the fusion set. It is denoted as x1/x2/.../xn → rn. This notation
was already used in (1) and (2). We also say that Nodes(x1/x2/.../xn → rn) =
{x1, x2, ..., xn} and Result(x1/x2/.../xn → rn) = rn.
Fusion sets and named fusion sets constitute the basis for the net collapse
definition: a net collapse is simply a list of named fusion sets. For example, (1)
uses a net collapse containing two named fusion sets.
The application of a net collapse CO to a net N is named a net collapse
operation. Informally, each named fusion set nfs, in a net collapse CO, fuses
all the nodes in Nodes(nfs) into the respective single node Result(nfs). A net
addition between two or more nets is defined as a net disjoint union of all the
nets, followed by a net collapse operation on the resulting net.
Finally, the interested reader should refer to Refs. 2 and 3, where an addi-
tional reverse operation named net subtraction is also defined. This allows the
undoing of a previous addition operation in a structured way.
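The two steps of net addition described in this section can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of Refs. 2 and 3: the class Net, the functions disjoint_union, collapse and net_add, and the small Start and Cell nets used below are all hypothetical and far simpler than the nets of Fig. 13-7. Node names are prefixed with the net name to keep the union disjoint, and markings are added when places are fused.

```python
from dataclasses import dataclass


@dataclass
class Net:
    places: dict        # place name -> number of tokens (the marking)
    transitions: set    # transition names
    arcs: set           # (source node, target node) pairs


def disjoint_union(nets):
    """Step 1: consider all nets as a single net, prefixing node names."""
    places, transitions, arcs = {}, set(), set()
    for name, net in nets.items():
        def q(node, prefix=name):
            return f"{prefix}.{node}"
        for p, tokens in net.places.items():
            places[q(p)] = tokens
        transitions |= {q(t) for t in net.transitions}
        arcs |= {(q(s), q(d)) for s, d in net.arcs}
    return Net(places, transitions, arcs)


def collapse(net, named_fusion_sets):
    """Step 2: fuse the nodes of every named fusion set into its result node."""
    rename = {n: result for nodes, result in named_fusion_sets for n in nodes}

    def r(node):
        return rename.get(node, node)

    places = {}
    for p, tokens in net.places.items():
        places[r(p)] = places.get(r(p), 0) + tokens   # markings are added
    transitions = {r(t) for t in net.transitions}
    arcs = {(r(s), r(d)) for s, d in net.arcs}
    return Net(places, transitions, arcs)


def net_add(nets, named_fusion_sets):
    """Net addition: disjoint union followed by a net collapse."""
    return collapse(disjoint_union(nets), named_fusion_sets)


# Hypothetical toy nets (not the nets of Fig. 13-7).
start = Net({"p1": 1, "pp1": 0, "pp2": 0}, {"t1"}, {("p1", "t1"), ("t1", "pp1")})
cell = Net({"pa": 1, "pb": 0}, {"t8"}, {("pa", "t8"), ("t8", "pb")})

start_cell = net_add(
    {"Start": start, "Cell": cell},
    [(("Cell.pa", "Start.pp1"), "pa"), (("Cell.pb", "Start.pp2"), "pb")],
)
```

The net collapse passed to net_add mirrors the shape of expression (1): each entry pairs the nodes to be fused with the name of the resulting node.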
6. VECTORS EVERYWHERE
The example presented already illustrated the use of net instances, which
are specified by the net name followed by an index number between square
brackets (e.g. Cell[2]). This is called the net instance name. The rationale for
this notation comes from viewing each net as a template from which any number
of instances can be created. Two instances are made different by the respective
instance numbers. All the node identifiers and all the net inscriptions of a
given net instance become prefixed by the net instance name (e.g. Cell[2].p6).
This guarantees that all the elements in two net instances of a given net are
disjoint.
The use of net instances is made particularly useful if we allow the definition
of net vectors and iterators across the net vector elements. A net vector is simply
a list of net instances: Cell[1..3] = (Cell[1], Cell[2], Cell[3]).
A vector iterator is defined as an integer variable that can be specified
together with a vector declaration. This is used in the specification of net col-
lapses. In this way we can specify collapse vectors, which allow the gluing of
an arbitrary number of net instances. As an example, (3) generalizes (2) to any
number of cells ((2) corresponds to NCells = 3):
System = Place ⊕ Cell[1..NCells] ⊕ Start
(Cell[1].pa/pp1 → pa[1], Cell[1].pb/pp2 → pb[1],
(Cell[i].ppb/Cell[i + 1].pb → pb[i + 1],
Cell[i].ppa/Cell[i + 1].pa → pa[i + 1])
[i : 1..NCells − 1],
Cell[NCells].ppb/Cell[NCells].ppa/p1 → p1) (3)
Especially important is the fact that (3) allows the specification of a similar
system with any number of cells. Note the use of an iterator variable i in the
specification of the addition collapse.
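To make the role of the iterator concrete, the following Python sketch expands the collapse of (3) into a flat list of named fusion sets for a given number of cells. The function names system_collapse and show are hypothetical. The final fusion is taken to close the chain at the last cell, Cell[NCells]; this is an assumption made here so that the expansion works for any number of cells.

```python
def system_collapse(ncells):
    """Expand the collapse of expression (3) into a list of named fusion sets.

    Each entry is (nodes, result): the nodes to fuse and the resulting name.
    """
    fusions = [
        (("Cell[1].pa", "pp1"), "pa[1]"),
        (("Cell[1].pb", "pp2"), "pb[1]"),
    ]
    for i in range(1, ncells):              # iterator [i : 1..NCells - 1]
        fusions.append(((f"Cell[{i}].ppb", f"Cell[{i + 1}].pb"), f"pb[{i + 1}]"))
        fusions.append(((f"Cell[{i}].ppa", f"Cell[{i + 1}].pa"), f"pa[{i + 1}]"))
    # Assumption: the last cell closes the chain through place p1.
    fusions.append(((f"Cell[{ncells}].ppb", f"Cell[{ncells}].ppa", "p1"), "p1"))
    return fusions


def show(fusion):
    """Render a named fusion set in the x1/x2/.../xn -> rn notation."""
    nodes, result = fusion
    return "/".join(nodes) + " -> " + result


for fusion in system_collapse(3):
    print(show(fusion))
```

For three cells the expansion yields seven named fusion sets, the same number that is written out explicitly in (2).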
Vectors are also extremely useful for the input events. If we define an input
event vector with NIE elements ie[NIE], we can then use a vector element ie[i]
as an event. If the index value i is specified as a (colored) transition variable, it
can then be used to bind i. This is another way for the execution environment to
interfere with the net execution. In this sense, transition firing depends not only
on the marking of the connected places (as is common in a Petri net), constrained
by the guard expressions, but also on the occurrence of the associated events [12].
8. CONCLUSIONS
In this paper, Petri net-based digital system design was addressed. Modular
design was supported by the concept of the net addition operation. A hierar-
chical structuring mechanism, named horizontal decomposition, was presented
on the basis of the concept of module, which can be represented by a special
kind of node, named a macronode, and was complemented by the vectorial
representation of nodes and signals. Their usage was successfully validated
through an example of a controller for a low-to-medium complexity system,
which was implemented on the basis of programmable logic devices (normally
REFERENCES
1. João Paulo Barros, Luís Gomes, Modifying Petri net models by means of crosscutting
operations. In: Proceedings of the 3rd International Conference on Application of Con-
currency to System Design (ACSD’2003); Guimarães, Portugal (18–20 June 2003).
2. João Paulo Barros, Luís Gomes, Operational PNML: Towards a PNML support for model
construction and modification. In: Workshop on the Definition, Implementation and Ap-
plication of a Standard Interchange Format for Petri Nets; Satellite event at the Interna-
tional Conference on Application and Theory of Petri Nets 2004, Bologna, Italy (June 26,
2004).
3. João Paulo Barros, Luís Gomes, Net model composition and modification by net opera-
tions: A pragmatic approach. In: Proceedings of the 2nd IEEE International Conference
on Industrial Informatics (INDIN’04), Berlin, Germany (24–26 June, 2004).
4. Luca Bernardinello, Fiorella De Cindio; A survey of basic net models and modular net
classes. In: Advances in Petri Nets 1992; Lecture Notes in Computer Science, G. Rozenberg
(ed.). Springer-Verlag (1992).
5. Jonathan Billington, Søren Christensen, Kees van Hee, Ekkart Kindler, Olaf Kummer,
Laure Petrucci, Reinier Post, Christian Stehno, Michael Weber, The Petri net markup
language: Concepts, technology, and tools. In: W. van der Aalst, E. Best (eds.), Proceedings
of the 24th International Conference on Application and Theory of Petri Nets, LNCS, Vol.
2679, p. 483–505, Eindhoven, Holland, Springer-Verlag. (June 2003).
6. Peter Buchholz, Hierarchical high level Petri nets for complex system analysis. In: R.
Valette (ed.) Application and Theory of Petri Nets 1994, Proceedings of 15th International
Conference, Zaragoza, Spain, Lecture Notes in Computer Science Vol. 815, pp. 119–138,
(1994) Springer-Verlag.
7. Ken Chapman; “Picoblaze”; www.xilinx.com
8. Søren Christensen, N.D. Hansen. Coloured Petri nets extended with channels for syn-
chronous communication. Application and Theory of Petri Nets 1994, Proceedings of
15th International Conference, Zaragoza, Spain, Lecture Notes in Computer Science Vol.
815, (1994), pp. 159–178.
9. R. David, H. Alla, Petri Nets & Grafcet; Tools for Modelling Discrete Event Systems.
Prentice Hall International (UK) Ltd; ISBN 0-13-327537-X (1992).
10. Claude Girault, Rüdiger Valk, Petri Nets for Systems Engineering—A Guide to Modelling,
Verification, and Applications, Springer-Verlag, ISBN 3-540-41217-4 (2003).
11. Luis Gomes, Redes de Petri Reactivas e Hierárquicas—Integração de Formalismos no
Projecto de Sistemas Reactivos de Tempo-Real (in Portuguese). Universidade Nova de
Lisboa 318 pp. (1997).
12. Luis Gomes, João Paulo Barros, Using hierarchical structuring mechanisms with Petri nets
for PLD based system design. In: Workshop on Discrete-Event System Design, DESDes’01,
27–29 June 2001; Zielona Gora, Poland, (2001).
13. Luis Gomes, João Paulo Barros, On structuring mechanisms for Petri nets based system
design. In: Proceedings of ETFA’2003—2003 IEEE Conference on Emerging Technologies
and Factory Automation Proceedings, September, 16–19, 2003, Lisbon, Portugal; IEEE
Catalog Number 03TH86961 ISBN 0-7803-7937-3.
14. Luis Gomes, Adolfo Steiger-Garção, Programmable controller design based on a synchro-
nized colored Petri net model and integrating fuzzy reasoning. In: Application and Theory
of Petri Nets’95; Giorgio De Michelis, Michel Diaz (eds.); Lecture Notes in Computer
Science; Vol. 935; Springer, Berlin, pp. 218–237 (1995).
15. David Harel, Statecharts: A visual formalism for complex systems. Science of Computer
Programming, 8, 231–274 (1987).
16. Xudong He, John A. N. Lee. A methodology for constructing predicate transition net
specifications. Software-Practice and Experience, 21(8), 845–875 (August 1991).
17. P. Huber, K. Jensen, R.M. Shapiro, Hierarchies in Coloured Petri Nets. In: Advances
in Petri Nets 1990, Lecture Notes in Computer Science, Vol. 483, G. Rozenberg (ed.);
pp. 313–341 (1990).
18. Kurt Jensen, Coloured Petri Nets: Basic Concepts, Analysis Methods and Practical Use,
Vol. 1. Springer-Verlag, ISBN 3-540-55597-8 (1992).
19. Manuel Silva, Las Redes de Petri: En la Automática y la Informática. Editorial AC, Madrid,
ISBN 84 7288 045 1 (1985).
Chapter 14
IMPLEMENTING A PETRI NET
SPECIFICATION IN A FPGA USING VHDL
Abstract: This paper discusses how the FPGA architectures affect the implementation of
Petri net specifications. Taking into consideration the observations from that study,
a method is developed for obtaining VHDL descriptions amenable to synthesis,
and tested against other standard methods of implementation. These results have
relevance in the integration of access technologies to high-speed telecommuni-
cation networks, where FPGAs are excellent implementation platforms.
1. INTRODUCTION
In many applications it is necessary to develop control systems based on
Petri nets [1]. When a complex system is going to be implemented in a small
space, the best solution may be to use a FPGA.
FPGA architectures [2] are divided into many programmable and configurable
modules that can be interconnected with the aim of optimizing the use of the
device surface. It is necessary to remember that the main problem of PLDs,
PALs, and PLAs is the poor use of the device surface, that is, the low percentage
of logic gates used. This occurs because this kind of programmable device has
only one matrix for AND operations and another matrix for OR operations.
FPGAs are different because they are composed of small configurable logic
blocks (CLBs) that work like sequential systems. CLBs are composed of a RAM
memory and one or more macrocells. Each CLB RAM memory is programmed
with the combinational system that defines the behavior of the sequential system.
These points can be followed in many cases when implementing a Petri net.
There are two kinds of elements in a Petri net: places and transitions. The circuit
implementation of these elements is relatively easy, as shown in the schematics
of a place and a transition in Fig. 14-1. Each one of these elements can be
programmed in one or more CLBs following the model shown in Fig. 14-1.
That would not be the most compact and efficient design, but it would be the
simplest.
Each place and transition can be implemented on a CLB. The main problem
is the low number of inputs in a CLB, so it is sometimes necessary to use more
than one CLB for an element. For a place, the T inputs are the signals generated
by the preceding transitions, the R inputs are connected to the outputs of the
next transitions, and LS is the output signal of the place. For a transition, the E
inputs are the system inputs involved in the transition, the L inputs are the
signals generated by the preceding places, and TS is the output of the transition.
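The signal roles just listed can be illustrated with a small behavioral sketch in Python. It is one plausible reading of the signals, not the schematic of Fig. 14-1, which is not reproduced here. The assumptions are: a place is set by any active T input, reset by any active R input (reset taking priority), and otherwise holds its value; a transition output TS is active when all its L inputs and all its E inputs are active. The function names and the two-place example net are hypothetical.

```python
def place_next(ls, t_inputs, r_inputs):
    """Next value of a place bit LS (assumed set/reset behavior).

    t_inputs: outputs of the preceding transitions (set the place).
    r_inputs: outputs of the next transitions (reset the place).
    """
    if any(r_inputs):
        return False
    return ls or any(t_inputs)


def transition_out(l_inputs, e_inputs):
    """Output TS: all preceding places marked and all system inputs true."""
    return all(l_inputs) and all(e_inputs)


def step(marking, a, b):
    """One clock step of a hypothetical two-place cycle p0 -> t0 -> p1 -> t1 -> p0.

    t0 is guarded by system input a, t1 by system input b.
    """
    p0, p1 = marking
    t0 = transition_out([p0], [a])
    t1 = transition_out([p1], [b])
    return (place_next(p0, [t1], [t0]), place_next(p1, [t0], [t1]))


marking = (True, False)
marking = step(marking, a=True, b=False)   # token moves from p0 to p1
marking = step(marking, a=True, b=False)   # nothing enabled, marking holds
marking = step(marking, a=False, b=True)   # token moves back to p0
```

Each call to place_next or transition_out corresponds to the logic that would be mapped onto one CLB (or several, when the number of inputs is too high).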
For obtaining the most compact and efficient design, it is necessary to make
the following transformations.
In the models of Fig. 14-1, each place of the Petri net is associated with one
state bit (one-hot encoding). This is not the most compact solution because
most of the designs do not need every combination of bits for defining all the
states of the system. For instance, in many cases a place being active implies
that a set of places will not be active. Coding the place bits with a reduced
number of bits will be a good solution because the number of CLBs decreases.
For instance, if a Petri net with six places always has only one place active, it
is enough to use only 3 bits for coding the active place number (binary state
encoding).
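The saving in state bits can be checked with a short Python calculation. This sketch assumes, as in the example above, that exactly one place is active at any time; the function name state_bits is hypothetical.

```python
def state_bits(num_places, encoding):
    """Number of state bits needed when exactly one place is active at a time."""
    if encoding == "one-hot":
        return num_places                         # one bit per place
    if encoding == "binary":
        # Smallest b with 2**b >= num_places (at least one bit).
        return max(1, (num_places - 1).bit_length())
    raise ValueError(f"unknown encoding: {encoding}")


six_one_hot = state_bits(6, "one-hot")
six_binary = state_bits(6, "binary")
```

For the six-place net of the example this gives 6 bits for one-hot encoding and 3 bits for binary state encoding.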
The other transformation consists of implementing the combinational circuit
of the global system and dividing the final sequential system (combinational
circuit and memory elements).
These transformations are used for making compact and fast designs, but
they have some limitations.
When a compacted system is divided, too many CLBs may have to be
used because of the low number of inputs on each CLB (four or five inputs). This
obstacle sometimes forces the use of more CLBs than dividing a noncompacted
system would.
Verifying or updating a particular signal of the Petri net in a compacted
system may be difficult. It is necessary to take into account the transformations
that were applied and to supply the inverse transformations for monitoring the signal.
This problem shows up in failure-tolerant systems. Systems of this kind
need to verify their signals while they are running, and this verification may be more
complex if the system has been compacted previously.
To avoid the mentioned problems, this paper proposes a solution that consists
in implementing the system using special blocks composed of one place and a
transition. With this kind of blocks compact systems can be achieved, preserving
the Petri net structure. Figure 14-2 shows an example of Petri net divided into
five blocks. Each block is implemented in a CLB.
3. IMPLEMENTATION
With this kind of implementation of Petri net-based systems, every CLB
is composed of a place connected to a transition. The place can be activated
Figure 14-2. Example of Petri net divided into blocks for implementation in FPGA.
Figure 14-3 shows logical and electronic schematics of these blocks. The
place and the transition are interconnected through two signals in the block.
These signals are not always connected to the exterior. This detail allows a
reduction of the CLB connections in a FPGA. In many cases a given CLB
does not have enough inputs for including a block of this kind. In those cases,
it is necessary to use auxiliary CLBs for implementing the block. However,
it is unusual to find a Petri net on which every place is preceded by a high
number of transitions (or to find a transition preceded by many places). Usually,
most places and transitions in a practical Petri net are connected to one or two
transitions or places, respectively (except common resources or synchronism
points). Figures 14-4 and 14-5 show some examples in which there is an element
preceded by many others.
There are cases in which the number of CLB inputs is not enough to include
a place or a transition in the CLB. Figure 14-6 shows a logical schematic for
expanding the block inputs. The logic gates connected outside the place-transition
block are used for increasing the number of inputs. In this figure, four
CLBs are necessary for implementing the block. Three of them are auxiliary
blocks whose function is to concentrate a number of inputs into one signal.
Figure 14-4. Examples of different block interconnections for implementing several places
(above) or several transitions (below) with one other element.
Figure 14-5. Example of interconnection for implementing several places and transitions.
Figure 14-7. Schematic of the connections for the Petri net of Fig. 14-2 with configurable
blocks.
5. EXAMPLE
Figure 14-7 shows the block interconnection of a Petri net-based system
on a FPGA. The given example corresponds to the net of Fig. 14-2.
Each block is a CLB of the FPGA, and it is not necessary to include auxiliary
blocks for increasing the number of inputs of the elements of the net. There
is only one place with two input signals in the net of Fig. 14-2. The rest of the
places have only one input signal. If some element had more than two inputs,
it would be necessary to use the structures of Fig. 14-4, and then the number of
CLBs would increase. The results of different design methodologies using
a sample FPGA are summarized in Table 14-1.
Table 14-1 (column headings only): Design method | Design process | FPGA resources in use | Device frequency achieved
6. CONCLUSIONS
In this paper, the implementation of Petri net-based systems on FPGAs has
been discussed. The main problem consists of using places and transitions with a
different number of inputs, including the case when there are more inputs than a
configurable block of a FPGA provides. To address this, a method has been developed based on
two circuit models, one for places and the other for transitions. With these
models a new block has been presented that contains a place interconnected
with a transition. The purpose of this block is to reduce the interconnections
between CLBs in a FPGA and, therefore, to reduce the number of inputs on
each block (especially the feedback signals necessary to reset preceding places).
This method is optimal for Petri nets in which most places and transitions are
preceded by one or two (but no more) transitions or places. Furthermore, some
possibilities have been shown for the interconnection of blocks that increase the
number of inputs in elements of a Petri net. The main purpose of this method
is to integrate the maximum number of elements of a Petri net in a FPGA.
ACKNOWLEDGMENTS
This work was financed by the European Commission and the Comisión
Interministerial de Ciencia y Tecnología (Spain) through research grant TIC
1FD97-2248-C02-02 in collaboration with Versaware SL (Vigo, Spain).
REFERENCES
1. R. Zurawski, M.C. Zhou, Petri nets and industrial applications: A tutorial. IEEE Transac-
tions on Industrial Electronics (December 1994).
2. FLEX8000 HandBook ALTERA Corp. (1994).
3. E. Soto, E. Mandado, J. Farina, Lenguajes de descripcion hardware (HDL): El lenguaje
VHDL. Mundo Electronico (April 1993).
4. John Nemec, Stoke the fires of FPGA design. Keep an FPGA’s architecture in mind and
produce designs that will yield optimum performance and efficiency. Electronic Design
(October 25, 1994).
Chapter 15
FINITE STATE MACHINE IMPLEMENTATION
IN FPGAs
Hana Kubátová
Department of Computer Science and Engineering, Czech Technical University in Prague,
Karlovo nám. 13, 121 35 Prague 2, Czech Republic; e-mail: [email protected]
Abstract: This paper deals with the possibility of the description and decomposition of the
finite state machine (FSM). The aim is to obtain better placement of a designed
FSM to the selected FPGA. It compares several methods of encoding of the FSM
internal states with respect to the space (the number of CLB blocks) and time
characteristics. It evaluates the FSM benchmarks and seeks for such qualitative
properties to choose the best method for encoding before performing all FOUN-
DATION CAD system algorithms, since this process is time consuming. The new
method for encoding the internal FSM states is presented. All results are verified
by experiments.
Key words: finite state machine (FSM); hardware description language (HDL); state transition
graph (STG); encoding method; decomposition; benchmark.
1. INTRODUCTION
Most research reports and other materials devoted to searching for the “opti-
mal” encoding of the internal states of a FSM are based on the minimum number
of internal states and sometimes also on the minimum number of flip-flops used
in their hardware implementation. The only method to get really optimal results
is testing all possibilities [1]. But sometimes “wasting” the internal states or
flip-flops is a better solution because of the speed of the designed circuit and
mapping to a regular structure. Most encoding methods are not suitable for
these structures, e.g., different types of FPGAs or CPLDs. Therefore it is de-
sirable to compare several types of sequential circuit benchmarks to search for
the relation between the type of this circuit (the number of the internal states,
inputs, outputs, cycles, branching) and the encoding method with respect to
their implementation in a XILINX FPGA.
Our research group has been working on encoding and decomposition methods
for FSMs. We worked with the CAD system XILINX FOUNDATION
v2.1i during our first experiments and later with XILINX ISE. We used
benchmarks from the Internet in KISS2 format, some encoding algorithms
from the JEDI program, and the system SIS 1.2 [10]. First of all, we classified the FSM
benchmarks to know their quantitative characteristics: the number of internal
states, inputs, outputs, and transitions (i.e., the number of arcs in the state transition
graph (STG)), the maximum number of input arcs and the maximum number of
output arcs to and from STG nodes, etc. We compared eight encoding methods. Four of them,
“one-hot,” “minimum-length” (binary), “Johnson,” and “Gray,” are implemented
in the FOUNDATION CAD system. The “Fan-in” and “Fan-out” oriented algorithms,
the algorithm “FAN” connecting the Fan-in and Fan-out ones [1, 5], and the “two-hots”
method were implemented additionally. The second group of our experiments was focused
on FSM decompositions. Our original method called a “FEL-code” is based
on these experimental results and is presented in this paper. The final results
(the number of CLB blocks and maximum frequency) were obtained for the
concrete FPGA implementation (Spartan XCS05-PC84).
2. METHODS
2.1 Encoding methods
A one-hot method uses the same number of bits as the number of internal
states, where the great number of internal variables is its main disadvantage.
The states that have the same next state for the given input should be given
adjacent assignments (“Fan-out oriented”). The states that are the next states
of the same state should be given adjacent assignments (“Fan-in oriented”).
The states that have the same output for a given input should be given adjacent
assignments, which will help to cover the 1’s in the output Karnaugh maps
(“output oriented” method).
A very popular and frequently used method is the “minimum-length” code
(usually called “binary”), which uses the minimum number of internal
variables. The Gray code has the same characteristics and, in addition, assigns
adjacent codes to a sequence of states.
The “two-hots” method uses a combination of two 1’s to distinguish between
all states.
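The codes described above can be generated with a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the particular assignment of code words to states is one possible choice (the assignments used in our experiments may differ), and the two-hots generator assumes the shortest word length that provides enough combinations.

```python
from itertools import combinations


def bits(value, width):
    """Binary string of a value, zero-padded to the given width."""
    return format(value, f"0{width}b")


def min_width(n):
    """Smallest number of bits that distinguishes n states."""
    return max(1, (n - 1).bit_length())


def one_hot(n):
    return [bits(1 << i, n) for i in range(n)]


def minimum_length(n):
    return [bits(i, min_width(n)) for i in range(n)]


def gray(n):
    return [bits(i ^ (i >> 1), min_width(n)) for i in range(n)]


def johnson(n):
    width = (n + 1) // 2              # a w-bit Johnson counter has 2*w states
    codes, value = [], 0
    for _ in range(n):
        codes.append(bits(value, width))
        inverted_lsb = 1 - (value & 1)
        value = (value >> 1) | (inverted_lsb << (width - 1))
    return codes


def two_hots(n):
    width = 2
    while width * (width - 1) // 2 < n:   # need C(width, 2) >= n code words
        width += 1
    codes = [bits((1 << i) | (1 << j), width)
             for i, j in combinations(range(width), 2)]
    return codes[:n]


for name, method in [("one-hot", one_hot), ("minimum-length", minimum_length),
                     ("Gray", gray), ("Johnson", johnson), ("two-hots", two_hots)]:
    print(f"{name:15s}", " ".join(method(6)))
```

For six internal states the sketch uses 6 bits for one-hot, 3 bits for minimum-length, Gray, and Johnson, and 4 bits for two-hots.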
Table 15-1. Examples of codes for six internal states
First partial results based on several encoding methods (Table 15-1) and
benchmark characteristics were presented in Refs. 6, 7, 8, and 9. We have
found that the most successful methods are “minimum-length” and “one-hot.”
Minimum-length encoding is better than one-hot encoding for small FSMs
and for FSMs that fulfill the following condition: the STG describing the FSM
should be complete or nearly complete. If the ratio of the average output degree
of a node to the number of states is greater than 0.7, then it is better to use the
minimum-length code. Conversely, one-hot encoding is better when this
ratio is low. Let us define this property of the FSM as the characteristic
AN:
$$\mathrm{AN} = \frac{\text{Average output edges}}{\text{Number of states} - 1} \qquad (1)$$
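The characteristic AN and the resulting choice of encoding can be sketched in Python as follows. The function names are hypothetical. Two assumptions are made here: parallel arcs between the same pair of states are counted once, and self-loops are ignored, so that a complete STG yields AN = 1. The border value 0.7 is the experimentally determined one quoted in the text.

```python
def an_characteristic(transitions):
    """AN of equation (1) for an STG given as (state, next_state) pairs.

    Assumptions: parallel arcs count once and self-loops are ignored.
    """
    states = {s for edge in transitions for s in edge}
    if len(states) < 2:
        raise ValueError("AN needs at least two states")
    edges = {(s, d) for s, d in transitions if s != d}
    average_output_edges = len(edges) / len(states)
    return average_output_edges / (len(states) - 1)


def suggest_encoding(transitions, border=0.7):
    """Pick the encoding suggested by AN and the border value."""
    if an_characteristic(transitions) > border:
        return "minimum-length"
    return "one-hot"


states = ["s0", "s1", "s2", "s3"]
complete = [(a, b) for a in states for b in states if a != b]
ring = [(states[i], states[(i + 1) % 4]) for i in range(4)]

print(an_characteristic(complete), suggest_encoding(complete))
print(an_characteristic(ring), suggest_encoding(ring))
```

A complete four-state STG gives AN = 1 and therefore the minimum-length code, while a four-state ring gives AN = 1/3 and therefore the one-hot code.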
The value AN = 0.7 was experimentally verified on benchmarks and on
our specially generated testing FSMs: Moore-type FSMs with a prescribed
number of internal states and a prescribed number of transitions from
each internal state. Each testing FSM has an STG with a strictly defined number
of edges from all states; this number of output edges must be the same
for every internal state. The resulting format is the KISS2 format; e.g., the 4.kiss
testing FSM has an STG with four edges from each internal state (node).
The next-state connections were generated randomly to overcome the XILINX
FOUNDATION optimization for the counter design. The relationship between
one-hot and minimum-length encoding methods is illustrated in Fig. 15-1. The
comparison was made for FSMs with 30 internal states and different branching.
The number of transitions from all states is expressed by AN (X axis). The
Y axis expresses the percentage success of the methods with respect to space
(the minimum number of CLBs over all encoding methods for every testing
FSM is 100%). It can be seen that for less branching (smaller AN) the one-hot
code is always better, up to the border AN = 0.7.
Figure 15-1. The comparison of minimum-length and one-hot encoding with respect to AN.
as several level, cascade, and general, where the number of levels depends on
the FSM properties. The global number of states of the decomposed FSM is
equal to the original number of FSM internal states. The internal states are di-
vided into groups (sets of internal states) with many connections between states
(for such “strongly connected” states the minimum-length encoding is better to
use); see Fig. 15-3. The internal state code is composed of the minimum-length
part (a serial number of the state in its set in binary notation) and the one-hot
part (a serial number of a set in one-hot notation). The number of bits of the
minimum-length part is equal to b, where 2^b is greater than or equal to the maximum
number of states in the sets, and the number of bits of the one-hot part corresponds to
the number of sets.
The global algorithm can be described as follows:
1. Place all FSM internal states Qi into the set S0.
2. Select the state Qi (from S0) with the greatest number of transitions to other
distinct states from S0. Take Qi away from S0; Qi becomes the first member
of the new set Sgroup.
3. Construct the set Sneighbor of neighboring internal states of all members of
Sgroup. Compute the score expressing the suitability of placing a state Qj
into Sgroup for all states from Sneighbor. Add the state with the highest score to
Sgroup.
4. The score is a sum of
(a) the number of transitions from Qj to all states from Sgroup, multiplied by
the constant 10;
(b) the number of states from Sgroup to which a transition from
Qj exists, multiplied by the constant 20;
(c) the number of transitions from Qj to all neighboring internal states of
Sgroup (i.e., to all states from Sneighbor), multiplied by the constant 3;
(d) the number of states from Sneighbor to which a transition from Qj
exists, multiplied by the constant 6;
(e) the number of transitions from all internal states from Sgroup to Qj,
multiplied by the constant 10;
(f) the number of states from Sgroup from which a transition to
Qj exists, multiplied by the constant 20;
(g) the number of transitions from all neighboring states of Sgroup (placed
in Sneighbor) to Qj, multiplied by the constant 3;
(h) the number of neighboring states in Sneighbor from which a transition
to Qj exists, multiplied by the constant 6.
5. Compute the AN characteristic (1) for Sgroup.
6. When this value is greater than the “border value” (an input parameter of
this algorithm; in our experiments it is usually 0.7), the state Qj becomes a
real member of Sgroup. Now continue with step 3. When the value is less than
the border value, state Qj is discarded from Sgroup and this set is closed.
Now continue with step 2.
7. When all internal states are placed into sets Si and S0 is empty,
construct the internal state code.
8. The code consists of the binary part (a serial number of the state in its set
in binary notation) and the one-hot part (a serial number of a set in one-hot
notation). The number of binary part bits is equal to b, where 2^b is greater
than or equal to the maximum number of states in the sets. The number of
one-hot part bits is equal to the number of sets Si.
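The score of step 4 and the AN test of steps 5 and 6 can be sketched in Python. This is an interpretation, not the original program. The function names are hypothetical, and the STG below is a hypothetical reconstruction chosen so that the scores match the values 84 and 78 of the worked example that follows. Two assumptions are made: the candidate Qj is itself a member of Sneighbor, so its self-loops count in terms (c), (d), (g), and (h); and AN of a group counts distinct edges inside the group without self-loops.

```python
def placement_score(qj, group, neighbors, transitions):
    """Score of step 4 for candidate state qj.

    transitions: list of (source, target) pairs, one entry per STG transition.
    """
    to_group = [d for s, d in transitions if s == qj and d in group]
    to_neigh = [d for s, d in transitions if s == qj and d in neighbors]
    from_group = [s for s, d in transitions if d == qj and s in group]
    from_neigh = [s for s, d in transitions if d == qj and s in neighbors]
    return (10 * len(to_group) + 20 * len(set(to_group))         # (a), (b)
            + 3 * len(to_neigh) + 6 * len(set(to_neigh))         # (c), (d)
            + 10 * len(from_group) + 20 * len(set(from_group))   # (e), (f)
            + 3 * len(from_neigh) + 6 * len(set(from_neigh)))    # (g), (h)


def group_an(group, transitions):
    """AN (1) restricted to a group: distinct inner edges, no self-loops."""
    inner = {(s, d) for s, d in transitions
             if s in group and d in group and s != d}
    return len(inner) / len(group) / (len(group) - 1)


# Hypothetical STG: st0 has two self-loop transitions, st2 and st3 one each.
stg = [("st0", "st0"), ("st0", "st0"), ("st0", "st1"), ("st1", "st0"),
       ("st1", "st2"), ("st2", "st1"), ("st2", "st2"), ("st2", "st3"),
       ("st3", "st2"), ("st3", "st3")]

score_st0 = placement_score("st0", {"st1"}, {"st0", "st2"}, stg)
score_st2 = placement_score("st2", {"st1"}, {"st0", "st2"}, stg)
an_two = group_an({"st0", "st1"}, stg)
an_three = group_an({"st0", "st1", "st2"}, stg)
```

With a border value of 0.7, the group {st0, st1} is accepted, whereas adding st2 drops AN below the border and closes the set.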
[Figure residue: state transition graph of the example FSM with states st0 to st3; only fragments of the edge labels survive.]
Choose the state with the highest value and construct the new set S1 :
S0 = {st0, st2, st3}, S1 = {st1}
3. Construct the set Sneighbor of neighboring internal states of all members
of S1 :
S0 = {st0, st2, st3}, S1 = {st1}, Sneighbor = {st0, st2}
Compute the score for all states from Sneighbor:
st0score = 1·10 + 1·20 + 2·3 + 1·6 + 1·10 + 1·20 + 2·3 + 1·6 = 84
st2score = 1·10 + 1·20 + 1·3 + 1·6 + 1·10 + 1·20 + 1·3 + 1·6 = 78
Choose the state with the highest score and add it to S1 :
S0 = {st2, st3}, S1 = {st0, st1}, Sneighbor = {st2}
4. Compute AN (1) for the elements from S1 :
AN = 1.0.
AN is greater then 0.7; therefore, the state Q j becomes a real member of S1 .
Now continue with step 3.
5. Try to add the state st2 into S1 and compute AN. Because AN = 0.66, state
st2 is discarded from S1 and this set is closed. Now continue by step 2.
At the end all internal states are placed into two groups:
S1 = {st0, st1}, S2 = {st2, st3}
Now the internal state code is connected from the one-bit binary part and
the two-bit one-hot parts:
st0 . . . 0/01 st1 . . . 1/01 st2 . . . 0/10 st3 . . . 1/10
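The final encoding step and the score sums of the example can be checked with a short Python sketch. It is only an illustration under stated assumptions: the function names `fel_code` and `score` are hypothetical, the weights are simply read off the printed sums, and the choice of which one-hot bit belongs to which set follows the example (S1 gives 01, S2 gives 10).

```python
def fel_code(groups):
    """Return '<binary part>/<one-hot part>' codes for states grouped into sets.

    groups: list of sets S1, S2, ... given as lists of state names.
    """
    largest = max(len(group) for group in groups)
    # Smallest b such that 2**b >= number of states in the largest set.
    b = (largest - 1).bit_length()
    n_sets = len(groups)
    codes = {}
    for set_index, group in enumerate(groups):
        # One-hot part: a single 1 whose position is the serial number of the set.
        one_hot = format(1 << set_index, f"0{n_sets}b")
        for serial, state in enumerate(group):
            # Binary part: serial number of the state inside its set.
            binary = format(serial, f"0{b}b") if b else ""
            codes[state] = f"{binary}/{one_hot}"
    return codes


def score(counts, weights=(10, 20, 3, 6, 10, 20, 3, 6)):
    """Weighted sum as printed in the example (weights are read off that sum)."""
    return sum(c * w for c, w in zip(counts, weights))


codes = fel_code([["st0", "st1"], ["st2", "st3"]])
st0_score = score((1, 1, 2, 1, 1, 1, 2, 1))
st2_score = score((1, 1, 1, 1, 1, 1, 1, 1))
```

With the two groups of the example, `codes` reproduces the assignment 0/01, 1/01, 0/10, 1/10, and the two sums evaluate to 84 and 78.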
3. EXPERIMENTS
Because a conversion program between the KISS2 format and VHDL was needed, the converter K2V DOS was implemented6 (written in C++ and compiled with GCC for DOS). The K2V DOS program also provides information about FSMs, e.g., the node degree, the number of states, the number of transitions, etc. The VHDL description of an FSM created by the K2V DOS program can be written in different ways (with different results):
• One big process sensitive to both the clock signal and the input signals (one case statement is used in this process; it selects the active state; in each branch of the case there are if statements defining the next states and outputs). This is the same method that was used for the conversion between STG and VHDL by the XILINX FOUNDATION11 .
• Three processes (next-state-proc for the implementation of the next-state function, state-dff-proc for the asynchronous reset and the D flip-flops, and output-proc for the implementation of the FSM output function). To
The K2V DOS program can also generate our special testing FSMs (for a more precise setting of the characteristic AN, see Section 2.1). It can generate different FSM internal state encodings by the minimum-length, Gray, Johnson, one-hot, two-hot, Fan-in, Fan-out, FAN, and FEL-code methods. All benchmarks were processed by the DECOMP program to generate all possible types of decompositions (in KISS2 format, so that the same batch could be used for the FPGA implementation).
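As a reminder of what three of the listed encodings look like, the following Python sketch generates minimum-length, Gray, and one-hot codes for a given number of states. It is not the K2V DOS implementation; the function names and the order in which codes are assigned to states are assumptions made for illustration.

```python
def code_width(n_states):
    """Number of bits of a minimum-length code for n_states states (at least 1)."""
    return max(1, (n_states - 1).bit_length())


def minimum_length(n_states):
    """Plain binary codes: state i gets the binary representation of i."""
    width = code_width(n_states)
    return [format(i, f"0{width}b") for i in range(n_states)]


def gray(n_states):
    """Reflected Gray codes: consecutive states differ in exactly one bit."""
    width = code_width(n_states)
    return [format(i ^ (i >> 1), f"0{width}b") for i in range(n_states)]


def one_hot(n_states):
    """One flip-flop per state: exactly one bit is set in each code."""
    return [format(1 << i, f"0{n_states}b") for i in range(n_states)]


encodings = {
    "minimum-length": minimum_length(4),
    "Gray": gray(4),
    "one-hot": one_hot(4),
}
```

For four states the minimum-length and Gray codes need two flip-flops, while the one-hot code needs four; this is the trade-off between register count and next-state logic that the experiments compare.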
• The one-hot encoding method is better in the other cases and mostly generates faster circuits (but the XILINX FOUNDATION uses optimization methods for “one-hot” encoding).
• The FEL-code method is universal, as it combines the advantages of both the one-hot and minimum-length methods. This method is heuristic; the parameters (AN and the score evaluations) have been verified experimentally.
• The other tested encoding methods provide worse results in most cases and have no practical significance.
• For FSM implementations in which the majority of CLB blocks are used (e.g., 90%), the one-hot method gives better results, mainly with respect to the maximum working frequency, due to easier wiring.
• In most cases no type of FSM decomposition is advantageous, because of the large amount of information exchange; the parallel decomposition is the best one (if it exists).
• A different strategy for searching for the partitions could be used for FPGA implementation: the best FSM partition is not the one with the minimum number of internal states but the one with the minimum sets of input and output symbols.
The experimental results obtained with the recent XILINX ISE CAD system have not yet been sufficiently compared with those presented above, since many qualitative changes were incorporated into this tool, such as new types of target platforms and new design algorithms. According to our latest, not yet published results, we can conclude that the one-hot and minimum-length methods still remain the most successful. The average improvement (for all benchmarks, but only for the working frequency) is presented in Table 15-2. It can be stated that the encoding methods offered by the CAD system are better than the outside methods and that AN = 0.7 is a suitable value.
ACKNOWLEDGMENTS
This paper is based on the results of several student projects supervised by
the author for four years. This research was in part supported by the grants “De-
REFERENCES
1. P. Ashar, S. Devadas, A.R. Newton, Sequential Logic Synthesis. Kluwer Academic Publishers, Boston, Dordrecht, London (1992).
2. L. Józwiak, J.C. Kolsteren, An efficient method for sequential general decomposition of
sequential machines. Microprocessing and Microprogramming, 32 657–664 (1991).
3. A. Chojnacki, L. Jozwiak, An effective and efficient method for functional decomposition of Boolean functions based on information relationship measures. In: Proceedings of 3rd DDECS 2000 Workshop, Smolenice, Slovakia, pp. 242–249 (2000).
4. L. Jozwiak, An efficient heuristic method for state assignment of large sequential machines.
Journal of Circuits, Systems and Computers, 2 (1) 1–26 (1991).
5. K. Feske, S. Mulka, M. Koegst, G. Elst, Technology-driven FSM partitioning for synthesis of large sequential circuits targeting lookup-table based FPGAs. In: Proceedings 7th Workshop Field-Programmable Logic and Applications (FPL ’97), Lecture Notes in Computer Science, 1304, London, UK, pp. 235–244 (1997).
6. H. Kubátová, T. Hrdý, M. Prokeš, Problems with the Encoding of the FSM with the Relation to its Implementation by FPGA. In: ECI2000 Electronic Computers and Informatics, International Scientific Conference, Herl’any, pp. 183–188 (2000).
7. H. Kubátová, Implementation of the FSM into FPGA. In: Proceedings of the Interna-
tional Workshop on Discrete-Event System Design DESDes ’01, Oficyna Wydawnicza
Politechnika Zielona Góra, Przytok, Poland, pp. 141–146 (2001).
8. H. Kubátová, How to obtain better implementation of the FSM in FPGA. In: Proceedings
of the 5th IEEE Workshop DDECS 2002, Brno, Czech Republic, pp. 332–335 (2002).
9. H. Kubátová, M. Bečvář, FEL-Code: FSM internal state encoding method. In: Proceedings of 5th International Workshop on Boolean Problems. Technische Universität Bergakademie, Freiberg, pp. 109–114 (2002).
10. ftp://ftp.mcnc.org/pub/benchmark/Benchmark.dirs/LGSynth93/LGSynth93.tar
11. The Programmable Logic Data Book. XILINX Tenth Anniversary (1999), http://www.xilinx.com
Chapter 16
BLOCK SYNTHESIS OF
COMBINATIONAL CIRCUITS
Abstract: Circuit realization on a single programmable logic array (PLA) may be unac-
ceptable because of the large number of terms in sum-of-products; therefore a
problem of block synthesis is considered in this paper. This problem is to realize a
multilevel form of Boolean function system by some blocks, where each block is
a PLA of smaller size. A problem of block synthesis in gate array library basis is
also discussed in this paper. The results of experimental research of influence of
previous partitioning of Boolean function systems on circuit complexity in PLA
and gate array library basis are presented in this paper.
1. INTRODUCTION
There are different ways of implementation of the control logic of custom
digital VLSI circuits. The most important ways are realization of two-level
AND/OR circuits in programmable logic array (PLA) basis9 and realization of
multilevel circuits in library gates basis7 . Each of them has its advantages and
disadvantages. The advantage of PLA circuits is simplicity of layout design,
testing, and modification, because the circuits are regular. There are effective
methods and programs of PLA area minimization9,4 . The disadvantage of two-
level PLA circuits is the large chip area in comparison with the area required
for a multilevel library gates circuit. But the synthesis of a multilevel circuit is a
very difficult task5 ; moreover, such circuits are harder for testing and topological
design than PLA circuits. The implementation of a circuit in a single PLA may
be unacceptable because of the large size of the PLA. Therefore a problem of
block synthesis is considered in this paper. The results of experimental research
are given.
k n m
SAND,OR = 2 ∗ ∗ 9∗ ∗ +10 ∗ , (3)
8 2 4
and SBND is the area of the PLA boundary (load transistors, buffers, etc.), determined by the formula
$$S_{BND} = 34,992 \cdot \lceil k/8 \rceil + 85 \cdot \lceil n/2 \rceil + 49,104 \cdot \lceil m/4 \rceil + (\lceil m/4 \rceil - 1) \cdot 12,276 + 60,423. \tag{4}$$
The value of $S^{top}_{PLA}$ determined by formulas (3) and (4) is the number of real layout cells that the PLA layout is composed of1 .
Table 16-1. Comparison of two realizations of multilevel representation for PLAs: the first is
realization in one PLA; the second is realization in several PLAs, obtained by partition
algorithm
Circuit name n m L S k L S h L S
REFERENCES
1. P.N. Bibilo, Symbolic layout of VLSI array macros with an SCAS silicon compiler. I.
Russian Microelectronics, 27 (2) 109–117 (1998).
2. P.N. Bibilo, N.A. Kirienko, Partitioning a system of logic equations into subsystem under
given restrictions. In: Proceedings of the Third International Conference on Computer-
Aided Design of Discrete Devices (CAD DD’99), Vol. 1, Republic of Belarus, Minsk,
pp. 122–126 (1999).
3. P.N. Bibilo, V.G. Litskevich, Boolean Network Covering by Library Elements. Control
Systems and Computers, Vol. 6, Kiev, Ukraine, pp. 16–24 (1999) (in Russian).
4. R.K. Brayton, G.D. Hachtel, C.T. McMullen, A.L. Sangiovanni-Vincentelli, Logic Minimization Algorithms for VLSI Synthesis. Kluwer Academic Publishers, Boston (1984).
5. R.K. Brayton, R. Rudell, A.L. Sangiovanni-Vincentelli, A.R. Wang, MIS: A multiple-level logic optimization system. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, CAD-6 1062–1081 (1987).
6. L.D. Cheremisinova, V.G. Litskevich, Realization of Operations on Systems of Logic Equa-
tions and SOPs. Logical Design. Institute of Engineering Cybernetics of National Academy
of Sciences of Belarus, Minsk, pp. 139–145 (1998) (in Russian).
7. F. Mailhot, G. De Micheli, Algorithms for technology mapping based on binary decision
diagrams and on Boolean operations. IEEE Transactions on Computer-Aided Design of
Integrated Circuits and Systems, 12 (5) 599–620 (1993).
8. N.R. Toropov, Minimization of Systems of Boolean Functions in DNF. Logical Design.
Institute of Engineering Cybernetics of National Academy of Sciences of Belarus, Minsk,
pp. 4–19 (1999) (in Russian).
9. A.D. Zakrevskij, Logic Synthesis of the Cascade Circuits. Nauka, Moscow (1981) (in
Russian).
Chapter 17
THE INFLUENCE OF FUNCTIONAL
DECOMPOSITION ON MODERN DIGITAL
DESIGN PROCESS
Abstract: General functional decomposition has been gaining more and more importance
in recent years. Though it is mainly perceived as a method of logic synthesis
for the implementation of Boolean functions into FPGA-based architectures, it
has found applications in many other fields of modern engineering and science.
In this paper, an application of balanced functional decomposition in different
tasks of modern digital designing is presented. The experimental results prove
that functional decomposition as a method of synthesis can help implementing
circuits in CPLD/FPGA architectures. It can also be efficiently used as a method
for implementing FSMs in FPGAs with Embedded ROM Memory Blocks.
1. INTRODUCTION
Decomposition has become an important activity in the analysis and design
of digital systems. It is fundamental for many fields of modern engineering and
science1,2,3,4 . Functional decomposition breaks down a complex system into a network of smaller and relatively independent cooperating subsystems, in such a way that the behavior of the original system is preserved, i.e., function F is decomposed into subfunctions G and H in the form described by the formula F = H (A, G(B)).
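A small Python sketch may help to make the formula F = H(A, G(B)) concrete. The function F below is a hypothetical three-input example chosen only for illustration (it does not come from the chapter), with one free variable in A and two bound variables in B; the checker compares both sides of the formula on every input vector.

```python
from itertools import product


def F(a, b1, b2):
    """Hypothetical original function of three variables."""
    return a ^ (b1 & b2)


def G(b1, b2):
    """Subfunction of the bound variables B = {b1, b2}."""
    return b1 & b2


def H(a, g):
    """Subfunction of the free variable A = {a} and the output of G."""
    return a ^ g


def is_serial_decomposition(f, h, g, n_free, n_bound):
    """Check F(A, B) == H(A, G(B)) exhaustively over all binary input vectors."""
    for bits in product((0, 1), repeat=n_free + n_bound):
        free, bound = bits[:n_free], bits[n_free:]
        if f(*free, *bound) != h(*free, g(*bound)):
            return False
    return True


valid = is_serial_decomposition(F, H, G, n_free=1, n_bound=2)
invalid = is_serial_decomposition(F, H, lambda b1, b2: b1 | b2, n_free=1, n_bound=2)
```

Here G compresses two bound variables into one intermediate signal, so H needs only two inputs instead of three; a wrongly chosen G (the second call) fails the check.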
New methods of logic synthesis based on the functional decomposition were
recently developed5,6,7 . One of the promising decomposition-based methods is
the so-called balanced decomposition8 .
Figure 17-1. Schematic representation of a) the serial and b) the parallel decomposition.
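[Figure: decomposition example with blocks G1, G2, H1, H2 and outputs y0, y1; panels a) G1, b) H1, c) G2, d) H2.]
[Figure: FSM implementation with an address modifier; labels: X, x, q, a, Address modifier, c < b, Register, (a + c) < (x + q), ROM, F, Q.]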
Altera’s FLEX family EAB (Embedded Array Block) has 2048 bits of memory, and the FLEX10K10 device contains 3 such EABs. Functional decomposition can be used to implement FSMs that exceed that size.
Any FSM defined by a given transition table can be implemented as in Fig. 17-3, using an address modifier. The process may be considered as a decomposition of the memory block into two blocks: a combinational address modifier and a smaller memory block. An appropriately chosen strategy of balanced decomposition may reduce the required memory size at the cost of additional logic cells for the address modifier. This makes it possible to implement an FSM that exceeds the available memory by using embedded memory blocks together with additional programmable logic.
Example: FSM implementation with the use of the concept of address modifier.
a)                           b)        x1x2
                                   00  01  10  11
      v1  v2  v3  v4               v1  v2  v3  v4   q2q3  q1
s1    s1  s2  s4  –          s1    s1  s2  s4  –     00    0
s2    –   –   s5  s4         s2    –   –   s5  s4    01    0
s3    s3  s2  s1  s3         s4    s2  –   s4  s1    10    0
s4    s2  –   s4  s1         s3    s3  s2  s1  s3    11    0
s5    s3  s1  s4  s2         s5    s3  s1  s4  s2    01    1
[Figure: ROM implementation with an address modifier; labels: x1, x2, q1, q2, q3, g, REGISTER, ROM; address modifier function g = x1q3 + q2.]
Let us consider the FSM described in Table 17-3a. Its outputs are omitted, as they have no influence on the method. This FSM can be implemented using a ROM with 5 address bits, which requires a memory of 32 words. To implement this FSM in a ROM with 4 address bits, an address modifier is required.
Let us implement the given FSM in the structure shown in Fig. 17-4, with 3 free variables and one output variable from the address modifier. Such an implementation requires a memory of 16 words and additional logic to implement the address modifier.
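The memory sizes quoted above follow directly from the number of address bits, as the following Python sketch shows. The variable names are illustrative assumptions; the bit counts (two inputs x1, x2, three state bits q1, q2, q3, three free variables, one modifier output) are taken from the example.

```python
def rom_words(address_bits):
    """Number of words addressed by a ROM with the given number of address bits."""
    return 2 ** address_bits


x_bits, q_bits = 2, 3                 # inputs x1, x2 and state bits q1, q2, q3
free_vars, modifier_outputs = 3, 1    # free variables and the single modifier output g

direct_words = rom_words(x_bits + q_bits)                  # ROM addressed by all inputs and state bits
modified_words = rom_words(free_vars + modifier_outputs)   # ROM behind the address modifier
saving = direct_words - modified_words
```

The address modifier halves the ROM in this example; the price is the extra combinational logic that computes the modifier output from the bound variables.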
To find the appropriate state/input encoding and partitioning, the FSM’s state transition table is divided into 8 subtables (encoded by the free variables), each of them having no more than two different next states, which can be encoded with one variable: the address modifier output variable (to achieve this, rows s3 and s4 changed places with each other). Next the appropriate state and input encoding is introduced (Table 17-3b).
The truth table of the address modifier, as well as the content of the ROM, can be computed by applying the serial functional decomposition method.
A more detailed description of the method can be found in Ref. 11.
4. EXPERIMENTAL RESULTS
The balanced decomposition was applied to the implementation, in FPGA architectures, of several “real life” examples: combinational functions and combinational parts of FSMs. We used the following examples:
• bin2bcd1: binary to BCD converter for binary values from 0 to 99,
• bin2bcd2: binary to BCD converter for binary values from 0 to 355,
• rd88: sbox from a Rijndael implementation,
• DESaut: combinational part of the state machine used in a DES algorithm implementation,
bin2bcd1 13 165 38 30 30
bin2bcd2 39 505 393 225 120
rd88 262 326 452 – –
DESaut 28 46 35 25 30
5B6B 41 92 41 100 49
count4 11 74 68 17 11
For the comparison the following synthesis tools were used: MAX+PlusII v10.2 Baseline, QuartusII WebEdition v3.0 SP1, FPGA Express 3.5, Leonardo Spectrum v1999.1, and DEMAIN. The logic networks produced by all synthesis tools were implemented in the EPF10K10LC84-3, an FPGA device from Altera’s FLEX family.
Table 17-4 compares our method, based on balanced decomposition as implemented in the tool DEMAIN, with the methods of the other tools. The table presents the number of logic cells needed for the implementation of the given examples. The results of implementation in the FPGA architecture show that the method based on balanced decomposition provides better results than the other tools used in the comparison. This is especially noticeable in the synthesis of those parts of digital systems that can be represented by a truth table description, as in the case of the BIN2BCD converter. In the implementation of this example obtained with the DEMAIN software, the number of logic cells required is over 10 times smaller than in the case of MAX+PlusII and 2 times smaller than in the case of Leonardo Spectrum. It is also much better than the implementation based on a behavioral description14 , which requires 41 logic cells of the EPF10K10 device. This is over 3 times worse than the solution obtained with DEMAIN.
The presented results lead to the conclusion that the influence of the bal-
anced decomposition on efficiency of practical digital systems implementation
would be particularly significant when the designed circuit contains complex
combinational blocks. This is a typical situation when implementing crypto-
graphic algorithms, where so-called substitution boxes are usually implemented
as combinational logic.
DEMAIN has been used in the implementation of such algorithms, allowing significant improvement in logic resource utilization as well as in performance. Implementation of the data path of the DES (Data Encryption Standard) algorithm with MAX+PlusII requires 710 logic cells and allows encrypting data with a throughput of 115 Mbits/s. Application of balanced functional decomposition in the optimization of selected parts of the algorithm reduces the number of required logic cells to 296 while increasing the throughput to 206 Mbits/s14 .
The balanced functional decomposition was also used in the implementa-
tion process of the Rijndael algorithm targeted to low-cost Altera programmable
devices15 . Application of DEMAIN software allowed implementing this algo-
rithm in FLEX10K200 circuit very efficiently with throughput of 752 Mbits/s.
For comparison, implementation of Rijndael in the same programmable struc-
ture developed at TSI (France) and Technical University of Kosice (Slovakia)
allowed throughput of 451 Mbits/s, at George Mason University (USA)—316
Mbits/s, and at Military University of Technology (Poland)—248 Mbits/s16 .
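The figures quoted for DES and Rijndael can be turned into ratios with a few lines of Python. This is plain arithmetic on the numbers given in the text; the dictionary keys are shorthand labels introduced here, not names used by the authors.

```python
# DES data path: logic cells and throughput (Mbits/s) before and after decomposition.
des_cells_before, des_cells_after = 710, 296
des_mbits_before, des_mbits_after = 115, 206

cell_reduction = des_cells_before / des_cells_after   # how many times fewer logic cells
des_speedup = des_mbits_after / des_mbits_before      # how many times higher throughput

# Rijndael throughput in Mbits/s as reported for the same programmable structure.
rijndael_mbits = {
    "DEMAIN": 752,
    "TSI and TU Kosice": 451,
    "George Mason University": 316,
    "Military University of Technology": 248,
}
rijndael_ratio = {
    name: rijndael_mbits["DEMAIN"] / value
    for name, value in rijndael_mbits.items()
    if name != "DEMAIN"
}
```

Rounded to two decimals, the DES design uses about 2.4 times fewer logic cells at about 1.79 times the throughput, and the Rijndael throughput is between roughly 1.67 and 3.03 times that of the other reported implementations.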
Since, upon the encoding of the FSM states, the implementation of such
FSM architectures involves the technology mapping of the combinational part
into target architecture, the quality of such an implementation strongly depends
on the combinational function implementation quality.
In Table 17-5 the comparison of different FSM implementations is presented.
Each sequential machine was described by a transition table with encoded states.
We present here the number of logic cells and memory bits required (i.e., area
of the circuit) and the maximal frequency of clock signal (i.e., speed of the
circuit) for each method of FSM implementation.
The columns under the FF MAX+PlusII heading present results obtained
by the Altera MAX+PlusII system in a classical flip-flop implementation of
FSM. The columns under FF DEMAIN heading show results of implemen-
tation of the transition table with the use of balanced decomposition. The
ROM columns provide the results of ROM implementation; the columns under
AM ROM heading present the results of ROM implementation with the use of
an address modifier. It can be easily noticed that the application of balanced
Table 17-5. Implementation of FSM: 1) FSM described with special AHDL construction, 2) decomposition with the minimum number of logic levels, 3) decomposition not possible, 4) not enough memory to implement the project
5. CONCLUSIONS
Balanced decomposition produces very good results in combinational func-
tion implementation in FPGA-based architectures. However, results presented
in this paper show that balanced functional decomposition can be efficiently
and effectively applied beyond the implementation of combinational circuits
in FPGAs. Presented results, achieved using different algorithms of multilevel
synthesis, show that the possibilities of decomposition-based multilevel syn-
thesis are not fully explored.
Implementation of sequential machines in FPGA-based architectures through balanced decomposition produces devices that are not only smaller (fewer logic cells utilised) but also, often more importantly, faster than those obtained by the commercial MAX+PlusII tool. Balanced decomposition can also be used to implement large FSMs in an alternative way, using ROM. This kind of implementation requires far fewer logic cells than the traditional flip-flop implementation; therefore, it can be used to implement “nonvital” FSMs of the design, saving logic cell resources for more important parts of the circuit. However, a large FSM may require too much memory. With the concept of the address modifier, memory usage can be significantly reduced.
The experimental results shown in this paper demonstrate that the synthesis
method based on functional decomposition can help in implementing sequential
machines using flip-flops, as well as ROM memory.
Application of this method allows significant improvement in logic resource utilization as well as in performance. Implementation of the data path of the DES (Data Encryption Standard) algorithm with MAX+PlusII requires 710 logic cells and allows encrypting data with a throughput of 115 Mbits/s. Application of balanced decomposition reduces the number of logic cells to 296 while increasing the throughput to 206 Mbits/s.
These features make balanced decomposition a universal method that can be successfully used in digital circuit design with FPGAs.
ACKNOWLEDGMENT
This paper was supported by the Polish State Committee for Scientific Re-
search financial grant for years 2003–2004, number 4 T11D 014 24.
REFERENCES
1. J.A. Brzozowski, T. Luba, Decomposition of Boolean functions specified by cubes. Jour-
nal of Multiple-Valued Logic and Soft Computing, 9, 377–417 (2003). Old City Publishing,
Inc., Philadelphia.
2. C. Scholl, Functional Decomposition with Application to FPGA Synthesis. Kluwer Aca-
demic Publishers, 2001.
3. T. Luba, Multi-level logic synthesis based on decomposition. Microprocessors and Mi-
crosystems, 18 (8), 429–437 (1994).
4. R. Rzechowski, L. Jóźwiak, T. Luba, Technology driven multilevel logic synthesis based
on functional decomposition into gates. In: EUROMICRO’99 Conference, Milan, pp. 368–
375 (1999).
5. M. Burns, M. Perkowski, L. Jóźwiak, An efficient approach to decomposition of multi-
output Boolean functions with large set of bound variables. EUROMICRO’98 Conference,
Vol. 1, Vasteras, pp. 16–23 (1998).
6. S.C. Chang, M. Marek-Sadowska, T.T. Hwang, Technology mapping for TLU FPGAs
based on decomposition of binary decision diagrams. IEEE Transactions on CAD, 15
(10), 1226–1236 (October, 1996).
7. L. Jóźwiak, A. Chojnacki, Functional decomposition based on information relationship
measures extremely effective and efficient for symmetric Functions. EUROMICRO’99
Conference, Vol. 1, Milan, pp. 150–159 (1999).
8. T. Luba, H. Selvaraj, M. Nowicka, A. Kraśniewski, Balanced multilevel decomposition
and its applications in FPGA-based synthesis, In: G. Saucier, A. Mignotte (eds.), Logic
and Architecture Synthesis. Chapman & Hall (1995).
9. M. Nowicka, T. Luba, M. Rawski, FPGA-based decomposition of Boolean functions.
Algorithms and implementation. In: Proceedings of Sixth International Conference on
Advanced Computer Systems, Szczecin, pp. 502–509.
10. M. Rawski, L. Jóźwiak, T. Luba, Functional decomposition with an efficient input support
selection for sub-functions based on information relationship measures. Journal of Systems
Architecture, 47, 137–155 (2001).
11. M. Rawski, T. Luba, FSM Implementation in embedded memory blocks using concept
of decomposition. In: W. Ciazynski, et al. (eds.): Programmable Devices and Systems.
Pergamon—Elsevier Science, pp. 291–296 (2002).
12. T. Luba, H. Selvaraj, A general approach to Boolean function decomposition and its
applications in FPGA-based synthesis, VLSI design. Special Issue on Decompositions in
VLSI Design, 3 (3–4), 289–300 (1995).
13. http://www.altera.com/products/devices/dev-index.jsp.
Abstract: The development of embedded systems requires both tools and methods which
help the designer to deal with the higher complexity and tougher constraints due
to the different hardware support, often distributed topology and time require-
ments. Proper tools and methods have a major impact on the overall costs and
final product quality. We have applied the Object-Oriented Real-Time Techniques
(OORT) method, which is oriented toward the specification of distributed real-
time systems, to the implementation of the Multiple Lift System (MLS) case
study. The method is based on the UML, SDL and MSC languages and supported
by the ObjectGEODE∗ toolset. This paper summarizes the method and presents
our experience in the MLS system development, namely the difficulties we had
and the success we have achieved.
Key words: Embedded Systems Specification; Software Engineering; Discrete-Event Sys-
tems Control; Simulation; Targeting.
1. INTRODUCTION
Embedded systems are very complex because they are often distributed, run on different platforms, have temporal constraints, etc. Their development demands high quality under increasingly tight economic constraints; it is therefore necessary to minimize errors and maintenance costs and to deliver the systems within short deadlines.
To achieve these goals it is necessary to verify a few conditions: decrease
the complexity through hierarchical and graphical modeling for high flexibility
[Figure: phases of the OORT method; labels: Requirements Analysis, Object Analysis, UML Class Diagrams, Behavioural Design, SDL Process Diagrams, Data Modelling, UML Class Diagrams, Implementation, Test.]
2. REQUIREMENTS ANALYSIS
In the requirements analysis phase, the system environment is modeled and
the user requirements are specified. The analyst must concentrate on what the
system should do. The environment where the system will operate is described
by means of UML class diagrams—object modeling. The functional behavior
of the system is specified by MSCs, organized in a hierarchy of scenarios—use
case modeling. The system is viewed from the exterior as a black box, with
which external entities (system actors) interact. Both the object model and the
use case model must be independent from the solutions chosen to implement
the system in the next phases.
[Figure: UML class diagram of the MLS environment. Classes: Building (Address : string, Telephone : string); MLS; Floor (Number : FloorType); FloorAccess (Number : FloorType); Lift (Number : ElevatorType, Floor : FloorType, Direction : DirectionType, State : StateType; ServeDestination(Floor : FloorType), ServeCall(Ca : Call), EndOperation(); constraint {1 <= Floor <= NF}); Supervisor (SaveAlarm(Info : Alarme), ReceivedDestination(Dest : Destination), ServedDestination(Dest : Destination), ScheduledCall(Ca : Call), ReceivedCall(Ca : Call), ServedCall(Ca : Call), LiftState(State : LiftState), StopSystem()); PotentialPassenger (PushCallButton(Direction : DirectionType)); Passenger (PushDestinationButton(Floor : FloorType), PushDoorButton(Comand : DoorComandType)); User; Operator (BlockDoor()). Associations: Lodges, Transportation, MonitoringAndControl; multiplicities NF and NL.]
[Figure: scenario hierarchy; labels: Trips, Trip, CrossFloor, DoorOpened.]
Passenger and Operator. Generally, there is one module for some basic
system composition, one for each of the actors and others to express interesting
relationships. More information about the analysis of the MLS can be found in
Douglass13 .
3. ARCHITECTURAL DESIGN
In this phase, the system designers specify a logical architecture for the
system (as opposed to a physical architecture). The SDL language covers all
aspects of the architecture design.
[Figure: MSC CrossFloor; instances IE_Lift_1, MLS_1, IE_Floor_1_Lift_1, SP_Lift_1, Antonio; message NewFloor (2).]
[Figure: SDL diagram of system MLS; labels: Floors, FloorAccess (NF):Floor, Lift, Lifts(NL):Lift, Central.]
Figure 18-6. SDL Interconnection Diagram of the Top Level MLS Hierarchy.
4. DETAILED DESIGN
The description of concurrent and passive objects that constitute the system
architecture is done in the detailed design phase. In other words, it is specified
how the system implements the expected services, which should be independent
of the final platform where the system will run.
[Figure: SDL process diagram; labels: Opened, Closing, Closed, Opening, Blocked, NoAck, EXPORTED Close, EXPORTED Open, PROCEDURE Close.]
5. TEST DESIGN
In this phase, the communication between all the elements of the system
architecture is specified by means of detailed MSCs. The detailed MSCs contain
the sequences of messages exchanged between the architectural elements. They
are built by refining the abstract MSC of each terminal scenario from the use
case model, according to the SDL architecture model. Consequently, the test
design activity can be executed parallel to the architecture design and provide
requirements for the detailed design phase.
In the intermediate architecture levels, the detailed MSCs represent integration tests between the concurrent objects. The last refinement step corresponds to unit tests that describe the behavior of processes (the terminal SDL architecture level). Fig. 18-9 illustrates this.
[Figure: UML class diagram fragment; labels: LiftState (Floor : FloorType), Call (Floor : FloorType), CallDataBase, multiplicity 0..*.]
The process level MSCs can be further enriched by including in each process
graphical elements with more detailed behavior, such as states, procedures, and
timers. Fig. 18-10 shows the integration test corresponding to the abstract MSC
of Fig. 18-4, and Fig. 18-11 represents the respective unit test for one of the
internal blocks.
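[Figure: refinement of MSCs along the architecture; labels: System, Scenario Hierarchy (S1 to S5), Abstract MSCs (System; messages m1 to m3), Detailed MSCs at block level (Block1, Block2; messages m1 to m5) and at process level (Process1, Process2, Process3; messages m2, m4 to m8).]
[Figure: detailed MSC between block /MLS/Central and block /MLS/Lifts_1; message NewFloor; action 'Window := UpdateState (Window,1,(. 3,Down,Moving.))'.]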
While the use case model reflects the user perspective of the system, the test design should be extended to cover aspects related to the architecture, such as performance, robustness, security, flexibility, etc.
6. SIMULATION
SDL is a formal language, and therefore permits a trustable simulation14
of the models. The simulation of an SDL model is a sequence of steps, firing
transitions from state to state.
The ObjectGEODE simulator15 executes SDL models, comparing them with MSCs that state the expected functionalities and anticipated error situations, and it generates MSCs of the actual system behavior. It provides three operation modes: interactive, in which the user acts as the system environment (providing stimuli) and monitors the system’s internal behavior; random, in which the simulator randomly picks one of the transitions that can fire; and exhaustive, in which the simulator automatically explores all the possible system states.
[Figure: detailed MSC for block Central; instances Supervisor_1 (process /MLS/Central/Supervisor) and Gnome_1 (process /MLS/Central/Gnome); action 'Window := UpdateState (Window,1,(. 3,Down,Moving .))'.]
Figure 18-11. Detailed MSC with Unit Test of Block Central for CrossFloor.
The interactive mode can be used to do the first tests, to verify some important
situations in particular. This way, the more generic system behavior can be
corrected and completed. This mode is specially suited for rapid prototyping,
to ensure that the system really works.
The system was very useful to detect flaws in ADTs whose operators were
specified in textual SDL, as, for example, the heavy computational ADTs re-
sponsible the calls dispatching. As the simulator has transition granularity, it
is not possible to go step by step through the operations executed inside one
transition. The errors are detected after each transition, whenever an unexpected
state is reached, or a variable has an unpredicted value. Obviously, this is not
an adequate way to simulate a large number of cases.
After a certain level of confidence in the overall application behavior is
achieved, it can be tested for a larger number of scenarios, in order to detect
dynamical errors such as deadlocks, dead code, unexpected signals, signals
without receiver, overflows, etc. This is done in the random mode, to verify if
the system is being correctly built—system verification.
Although this could be done with exhaustive simulation, it would not be
efficient. The exhaustive mode requires considerable computer resources for
a long time, and it generates a large amount of information. It is not something
to be done every day. The exhaustive simulation allows the validation of the
system, i.e., checking the system against the requirements. We can verify whether it
implements the expected services by detecting interactions that do not follow
some defined properties, or interaction sequences that are not expected.
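The difference between the random and the exhaustive mode can be illustrated with a minimal sketch. It is not the ObjectGEODE simulator: the toy transition table, the state names and both function names are assumptions made only for this illustration.

```python
import random
from collections import deque

# Hypothetical toy model: each state lists the transitions that can fire from it.
TRANSITIONS = {
    "idle": [("call", "moving")],
    "moving": [("arrive", "door_open"), ("fault", "stuck")],
    "door_open": [("close", "idle")],
    "stuck": [],  # no transition can fire here: a deadlock
}

def random_walk(transitions, start, max_steps, seed=0):
    """Random mode: at each step fire one randomly chosen enabled transition."""
    rng = random.Random(seed)
    state, trace = start, [start]
    for _ in range(max_steps):
        enabled = transitions[state]
        if not enabled:  # deadlock reached, nothing left to fire
            break
        _, state = rng.choice(enabled)
        trace.append(state)
    return trace

def explore_all(transitions, start):
    """Exhaustive mode: visit every reachable state and report deadlocks."""
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        for _, nxt in transitions[state]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    deadlocks = sorted(s for s in seen if not transitions[s])
    return seen, deadlocks
```

A random walk may or may not reach the deadlock within its step budget, whereas the exhaustive exploration always finds it at the price of visiting every reachable state.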
7. TARGETING
The ObjectGEODE automatic code generator translates the SDL specifica-
tion to ANSI C code, which is independent of the target platform in which the
system will run.
The SDL semantics (including the communication, process instance
scheduling, time management, and shared variables) is implemented by a dy-
namic library which abstracts the platform from the generated code. It is also
responsible for the integration with the executing environment, namely the
RTOS.
In order to generate the application code, it is necessary to describe the
target platform where the system will be executed. This is done by means of a
mapping between the SDL architecture and the C code implementation.
The SDL architecture consists of a logical architecture of structural objects
(system, blocks, processes, etc.), in which the lower level objects (the
processes) implement the behavior of the described system. The physical
implementation consists of a hierarchy of nodes and tasks. A node corresponds to one
processing unit with a multitasking OS, and a task is a unit of parallelism of the
OS. One task can be mapped to one of the following SDL objects: system, Task/System
(TS) mapping; block, Task/Block (TB) mapping; process, Task/Process (TP)
mapping; process instance, Task/Instance (TI) mapping.
[Figure: SDL Specification and Architecture Definition; Generated C Code and User Code;
Dynamic Library; RTOS; Physical Architecture.]
In the TI mapping, the complete application is managed by the target OS.
In the TP mapping, the OS is in charge of the interaction between processes,
whilst the management of the process instances inside the task is done by the
ObjectGEODE’s SDL virtual machine (VM).
In the TB mapping, the OS manages the communication between blocks,
while the SDL VM executes the SDL objects inside each block. Finally, the TS
mapping is the only possible option for nonmultitasking operating systems, for
which the SDL VM manages all the application. For the MLS, the TP mapping
was chosen.
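The effect of the four mappings on the number of OS tasks can be sketched as follows. This is only an illustration under stated assumptions: the architecture is taken to be flat (blocks containing processes), the number of instances per process is taken to be fixed, and the class names, the function name and the example architecture are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Process:
    name: str
    instances: int  # assumed fixed here; SDL instances may be created dynamically

@dataclass
class Block:
    name: str
    processes: list = field(default_factory=list)

def os_task_count(blocks, mapping):
    """Number of OS tasks needed when one task maps one SDL object."""
    if mapping == "TS":   # one task for the whole system
        return 1
    if mapping == "TB":   # one task per block
        return len(blocks)
    if mapping == "TP":   # one task per process
        return sum(len(b.processes) for b in blocks)
    if mapping == "TI":   # one task per process instance
        return sum(p.instances for b in blocks for p in b.processes)
    raise ValueError(f"unknown mapping: {mapping}")

# Hypothetical architecture, used only to exercise the function.
example = [
    Block("Central", [Process("Supervisor", 1), Process("Gnome", 3)]),
    Block("Floor", [Process("Panel", 4)]),
]
```

For this example the counts are 1 (TS), 2 (TB), 3 (TP) and 8 (TI), which shows how the share of scheduling left to the SDL virtual machine shrinks as the mapping becomes finer.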
After the automatic code generation, the code of any parts interacting with or
directly depending on the physical platform has to be supplied, preferably in
the most suitable language. The ADT operators that do not interact with external
devices can be coded algorithmically in SDL, and thus the respective C code will
be generated. For each ADT operator one C function interface is automatically
generated.
Fig. 18-12 illustrates the simplified application generation scheme. If some
parts of the SDL model are to be implemented in hardware, Daveau16 provides
a partition and a synthesis methodology.
8. CONCLUSION
The UML, MSC, and SDL, being continuously improved as international
standards, facilitate the protection of development investment. The presented
work shows the validity of these languages and their combined use in the
implementation of embedded systems.
It is feasible to simulate a formal language like SDL, because it is defined by
a clear set of mathematical rules. The ObjectGEODE provides three simulation
modes suitable for different levels of system correctness. They can be applied
to make early validations and to increase the frequency of an iterative develop-
ment process. This allows cost reduction by decreasing the number of missed
versions, i.e., it helps the designers to get closer to the “right at first time”.
The SDL application is scalable, because its logical architecture is indepen-
dent of the physical architecture. The mapping between objects and hardware
is defined only in the targeting phase. Furthermore, with the ObjectGEODE
toolset the implementation is automatic, thus limiting the manual coding to the
target dependent operations. The generated application is optimized for the tar-
get platform by means of the mapping defined by the developer. Any change in
the physical architecture only requires a change in the mapping, so the system
specification and its logical architecture remain the same.
The adoption of a methodology based on OO graphical languages helps
the designer to organize the development tasks, and build the application as a
consistent combination of parts. Object-oriented visual modeling is more flex-
ible, easier to maintain, and favors reutilization. These advantages are empha-
sized when appropriate tool support exists during all the engineering process
phases. In fact, it is a critical factor for success, namely for simulation and
targeting.
REFERENCES
1. Verilog, ObjectGEODE Method Guidelines. Verilog, SA (1996).
2. Object Management Group, Unified Modeling Language Specification v1.3 (March 2000);
http://www.omg.org.
3. ITU-T, Recommendation Z.120: Message Sequence Chart (October 1996);
http://www.itu.int.
4. ITU-T, Recommendation Z.100: Specification and Description Language (March 1993);
http://www.itu.int.
5. ITU-T, Recommendation Z.100 Appendix 1: SDL Methodology Guidelines (March 1994);
http://www.itu.int.
6. ITU-T, Recommendation Z.100 Addendum 1 (October 1996); http://www.itu.int.
7. UML Revision Task Force, Object Modeling with OMG UML Tutorial Series (November
2000); http://www.omg.org/technology/uml/uml tutorial.htm.
8. ITU-T, Recommendation Z.100 Supplement 1: SDL + Methodology: Use of MSC and
SDL (with ASN.1) (May 1997); http://www.itu.int.
9. E. Rudolph, P. Graubmann, J. Grabowski, Tutorial on message sequence charts. Computer
Networks and ISDN Systems 28 (12) (1996).
10. O. Faergemand, A. Olsen, Introduction to SDL-92. Computer Networks and ISDN Systems,
26 (9) (1994).
11. A. Olsen, O. Faergemand, B. Moller-Pedersen, R. Reed, J.R.W. Smith, Systems Engineer-
ing Using SDL-92. North Holland (1994).
12. E. Yourdon, Object-Oriented Systems Design: An Integrated Approach. Prentice Hall
(1994).
13. B.P. Douglass, Real-Time UML: Developing Efficient Objects for Embedded Systems.
Addison-Wesley (1998).
14. V. Encontre, How to use modeling to implement verifiable, scalable, and efficient real-
time application programs. Real-Time Engineering (Fall 1997).
15. Verilog, ObjectGEODE SDL Simulator Reference Manual. Verilog, SA (1996).
16. J.M. Daveau, G.F. Marchioro, T. Ben-Ismail, A.A. Jerraya, Cosmos: An SDL based
hardware/software codesign environment. In J.-M. Bergé, O. Levia, J. Rouillard (eds.),
Hardware/Software Co-design and Co-Verification, Kluwer Academic Publishers (1997)
pp. 59–87.
Chapter 19
OPTIMIZING COMMUNICATION
ARCHITECTURES FOR PARALLEL
EMBEDDED SYSTEMS
Vaclav Dvorak
Dept. of Computer Science and Engineering, University of Technology Brno, Bozetechova 2,
612 66 Brno, Czech Republic; e-mail: [email protected]
1. INTRODUCTION
The design of mixed hw/sw systems for embedded applications has been an
active research area in recent years. Hw/sw cosynthesis and cosimulation have
been mainly restricted to a single processor and programmable arrays attached to
it, which were placed incidentally on a single chip (SoC). A new kind of system,
application-specific multiprocessor SoC, is emerging with frequent applications
in small-scale parallel systems for high-performance control, data acquisition
and analysis, image processing, wireless, networking processors, and game
computers. Typically several DSPs and/or microcontrollers are interconnected
with an on-chip communication network and may use an operating system.
The performance of most digital systems today is limited by their com-
munication or interconnection, not by their logic or memory. This is why we
As the number of processors on the chip will be, at least in the near future,
typically lower than ten, we do not have to worry about scalability of these ar-
chitectures. Therefore the bus interconnection will not be seen as too restrictive
in this context.
Some more scalable architectures such as SMP with processors and memory
modules interconnected via a multistage interconnection network (the so-called
“dancehall” organization) or a hw-supported distributed shared memory will
not be considered as candidates for small-scale parallel embedded systems or
SoCs.
Let us note that the choice of architecture can often also be dictated by a
particular application to be implemented in parallel, e.g., broadcasting data to
processors, if not hidden by computation, may require a broadcast bus for speed,
but on the contrary, all-to-all scatter communication of intermediate results will
be serialized on the bus and potentially slower than on a direct communication
network. The next generation of internet routers and network-processor SoCs
may require unconventional approaches to deliver ultrahigh performance over
an optical infrastructure. The Octagon topology3, suggested recently to meet these
challenges, was supposed to outperform shared bus and crossbar on-chip
communication architectures. However, it can be easily shown that this topology
with the given routing algorithm3 is not deadlock-free. Some conclusions like
the previous one, or a preliminary estimation of performance or its lower bound,
can be supported by back-of-the-envelope calculations; other evaluations are
more difficult due to varying message lengths or the irregular nature of
communications. This is where simulation fits in.
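A back-of-the-envelope comparison of the kind mentioned above can be sketched as follows for the all-to-all scatter. The model is an assumption made for illustration: every one of the p(p-1) messages is serialized on the bus, whereas in an all-port fully connected network every node sends its p-1 messages at the same time. The 100 MB/s bus rate, the 12 MB/s link rate and the 5 µs link start-up cost are the values quoted for the case study; the 128-byte message size and the function names are hypothetical.

```python
def message_time_us(size_bytes, bandwidth_mb_s, startup_us=0.0):
    """Transfer time of one message; 1 MB/s equals 1 byte per microsecond."""
    return startup_us + size_bytes / bandwidth_mb_s

def aas_bus_time_us(p, size_bytes, bus_mb_s, startup_us=0.0):
    """Lower bound on a bus: all p*(p-1) messages are sent one after another."""
    return p * (p - 1) * message_time_us(size_bytes, bus_mb_s, startup_us)

def aas_fully_connected_time_us(p, size_bytes, link_mb_s, startup_us=0.0):
    """Lower bound with all-port direct links: the p-1 messages of every node
    travel simultaneously, so one message time suffices (p does not matter)."""
    return message_time_us(size_bytes, link_mb_s, startup_us)

bus = aas_bus_time_us(p=8, size_bytes=128, bus_mb_s=100.0)
direct = aas_fully_connected_time_us(p=8, size_bytes=128, link_mb_s=12.0,
                                     startup_us=5.0)
```

With these numbers the bus needs about 72 µs and the direct network about 16 µs, even though a single link is roughly eight times slower than the bus. Arbitration, contention and overlap with computation are ignored, so both figures are only lower bounds.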
With reference to the presented case study, we will investigate the following
(on-chip) communication networks:
1. fully connected network
2. SF hypercube
3. WH hypercube
4. Multistage interconnection network MIN (Omega)
5. Atomic bus
in shared address space than in message passing, since the events of interest
are not explicit in the shared variable program. In the shared address space,
performance modeling is complicated by the very same properties that make
developing a program easier: naming, replication and coherence are all im-
plicit, i.e., transparent for the programmer, so it is difficult to determine how
much communication occurs and when, e.g., when cache mapping conflicts are
involved4.
Sound performance evaluation methodology is essential for credible com-
puter architecture research to evaluate hw/sw architectural ideas or trade-offs.
Transaction Level Modeling (TLM)5 has been proposed as a higher modeling
abstraction level for faster simulation performance. At the TLM level, the sys-
tem bus is captured as an abstract ‘channel’, independent of a particular bus
architecture or protocol implementation. A TLM model can be used as a pro-
totype of the system and for early functional system validation and embedded
software development. However, these models do not fully exploit the poten-
tial for speedup when modeling systems for exploring on-chip communication
tradeoffs and performance. On the other hand, commonly used shared-memory
simulators such as rsim, Proteus, Tango, limes, or MulSim6, despite their
sophistication, are not suitable for message passing systems.
This made us reconsider the simulation methodology for shared-memory
multiprocessors. Here we suggest using a single CSP-based simulator for both
message passing and shared address space. It is based on simple
approximations and leaves the speed vs. accuracy tradeoff to the user, who can
control the level of detail and accuracy of simulation.
The CSP-based Transim tool runs simulations written in the Transim
language7, a subset of Occam 2 with various extensions. Transim is naturally
intended for message passing in distributed memory systems. Nevertheless,
it can also be used for simulation of shared memory bus-based (SMP)
systems: bus transactions in SMP are modeled as communications between
node processes and a central process running on an extra processor. Transim
also supports shared variables used in modeling locks and barriers. Until now,
only an atomic bus model has been tested; the split-transaction bus requires
more housekeeping and its model will be developed in the near future.
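The idea of modeling an atomic bus as a central process that serves one node request at a time can be sketched in a few lines. This is not Transim code; the function name, the request format and the first-come-first-served order are assumptions made for the illustration.

```python
def simulate_atomic_bus(requests):
    """Serve bus transactions one at a time, in order of issue time.

    requests: iterable of (issue_time, node, duration), all in bus cycles.
    Returns a list of (node, start, finish) in service order.
    """
    bus_free_at = 0
    schedule = []
    # Sorting the tuples orders requests by issue time, ties by node number.
    for issue, node, duration in sorted(requests):
        start = max(issue, bus_free_at)   # wait until the bus is released
        finish = start + duration         # atomic: the bus is held throughout
        schedule.append((node, start, finish))
        bus_free_at = finish
    return schedule

# Three nodes issue a 4-cycle transaction in consecutive cycles.
schedule = simulate_atomic_bus([(0, 0, 4), (1, 1, 4), (2, 2, 4)])
```

Here node 1 waits three cycles and node 2 waits six, which is the serialization a central bus process imposes. A split-transaction bus would need separate bookkeeping for requests and replies, which is the extra housekeeping mentioned above.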
The input file for the Transim simulator contains descriptions of the software,
the hardware, and the mapping of one onto the other. In the software description,
control statements are used in the usual way; computations (integer only) do not
consume simulated time. This is why all pieces of sequential code are complemented,
or in the case of floating-point code replaced, by special timing constructs SERV( ).
The argument of SERV( ) specifies the number of CPU cycles taken by the task. The
granularity of simulation is therefore selectable from individual instructions to
large pieces of code. Explicit overhead can be represented directly by the WAIT( )
construct. Data-dependent computations can be simulated by a SERV construct with a
random number of CPU cycles. Some features of an RT distributed operating
Table 19-1. The lower bound on total exchange (AAS) communication times
5. PARAMETERS OF SIMULATED
ARCHITECTURES AND RESULTS
OF SIMULATION
Table 19-2. Parallel FFT execution times in µs for six analyzed architectures
Six architectures simulated in the case study are listed in Table 19-2 together
with the execution times. The CPU clock rate is 200 MHz in all six cases; the
external channel speed of 100 Mbit/s (12 MB/s) is used for serial links in all
message-passing architectures, whereas the bus transfer rate for SMP is 100 MB/s.
Downloading and uploading of input data and results were assumed to continue
in the background in all processors simultaneously at a rate 8 times higher than
the link speed, which is almost equivalent to the bus speed in the SMP case. In
message-passing architectures the AAS communication was overlapped with
submatrix transposition as much as possible. The optimum routing algorithm for the SF
hypercube and AAS communication requires p/2 steps and uses the schedule tables
shown in Fig. 19-1. For example, two nodes with mutually complemented address bits
(relative address RA = 7) will exchange messages in steps 2, 3, and 4, and the
path will start in dimension 1, then continue in dimension 0, and finally end in
dimension 2. In the case of the WH hypercube, dimension-ordered routing is used in
every step i, i = 1, 2, . . . , p-1, in which the src-node and dst-node with the relative
address RA = src ⊕ dst = i exchange messages without any conflict, over
disjoint paths.
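The claim that the WH hypercube schedule is conflict-free can be checked with a short sketch. Two assumptions are made here: dimension-ordered routing corrects the differing address bits from dimension 0 upwards, and links are full-duplex, so that the two directions of a link count as separate channels. The function names are hypothetical.

```python
def ecube_links(src, dst, dims):
    """Directed links (node, dimension) used by dimension-ordered routing."""
    links, node = [], src
    for d in range(dims):
        if ((node ^ dst) >> d) & 1:   # address bit d still differs
            links.append((node, d))   # leave 'node' along dimension d
            node ^= 1 << d
    return links

def step_is_conflict_free(step, dims):
    """In step i every node src sends to dst = src XOR i; no link is reused."""
    used = set()
    for src in range(1 << dims):
        for link in ecube_links(src, src ^ step, dims):
            if link in used:
                return False
            used.add(link)
    return True

# p = 8 nodes (3 dimensions): steps i = 1, ..., p-1
all_ok = all(step_is_conflict_free(i, 3) for i in range(1, 8))
```

For p = 8 every one of the seven steps passes the check. The reason is that in a given step all paths apply the same bit corrections in the same order, so two different sources never occupy the same node at the same stage.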
The small cluster of (digital signal) processors, referred to as COSP in
Table 19-2, uses a centralized router switch (MIN of Omega type) with sw/hw
overhead of 5 µs, the same as a start-up cost of serial links, and WH routing. The
algorithm for AAS uses a sequence of cyclic permutations, e.g., (01234567),
(0246)(1357), . . . , (07654321) for p = 8. All these permutations are blocking
and require up to log p = 3 passes through the MIN.
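The three permutations listed above are consistent with cyclic shifts, in which node j sends to node (j + k) mod p for k = 1, ..., p-1. Assuming that reading, the whole sequence can be generated as follows; the function name is hypothetical and the compact notation without separators only works for p up to 10.

```python
def cycle_notation(p, shift):
    """Cycle notation of the permutation j -> (j + shift) mod p."""
    seen, cycles = set(), []
    for start in range(p):
        if start in seen:
            continue
        cycle, node = [], start
        while node not in seen:       # follow the cycle until it closes
            seen.add(node)
            cycle.append(node)
            node = (node + shift) % p
        cycles.append("(" + "".join(map(str, cycle)) + ")")
    return "".join(cycles)

# All p-1 permutations of the AAS sequence for p = 8
sequence = [cycle_notation(8, k) for k in range(1, 8)]
```

The first, second and last elements of the generated sequence reproduce the permutations quoted in the text.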
Finally, a bus-based shared memory system with coherent caches (SMP)
has a 100 MB/s bus bandwidth, a 50 MHz bus clock, and a miss penalty of
20 CPU clocks. We will assume an atomic bus for simplicity and a fair bus
Figure 19-1. Optimum schedule for AAS in all-port full-duplex 2D- and 3D SF hypercubes.
[Plot: time in µs (0 to 450) versus processor count (2, 4, 8); series: full, COSP, SMP,
SFcube, WHcube.]
6. CONCLUSIONS
The performance study of the parallel FFT benchmark on a number of
architectures using the Transim tool proved to be a useful exercise. Even though the
results of simulations have not been compared with real computations, they
can certainly serve to indicate serious candidate architectures that satisfy
certain performance requirements. The approximations hidden in the simulation
limit the accuracy of real-time performance prediction, but the level of detail
in simulation is given by the user, by how much time they are willing to spend
on building the model of hw and sw. For example, modeling the split-transaction
bus or the contention in interconnection network for WH routing could be quite
difficult. The latter was not attempted in this case study since the FFT bench-
mark requires only regular contention-free communication. This, of course,
generally will not be the case. Nevertheless, simulation enables fast varying of
sw/hw configuration parameters and studying of the impact of such changes on
performance, free from the second-order effects. In this context, the CSP-based
Transim simulator and language proved to be very flexible, robust, and easy to
use. Future work will continue to include other benchmarks and analyze the
accuracy of performance prediction.
ACKNOWLEDGMENTS
This research has been carried out under the financial support of the Research
intention no. CEZ: J22/98: 262200012—“Research in information and control
systems” (Ministry of Education, CZ) and the research grant GA 102/02/0503
“Parallel system performance prediction and tuning” (Grant Agency of Czech
Republic).
REFERENCES
1. J. Duato, S. Yalamanchili, L. Ni, Interconnection Networks—An Engineering Approach.
Morgan Kaufmann Publishers (2003).
2. J. Silc, B. Robic, T. Ungerer, Processor Architecture: From Dataflow to Superscalar and
Beyond. Springer-Verlag (1999), ISBN 3-540-64798-8.
3. F. Karim, A. Nguyen, An Interconnect Architecture for Networking Systems on Chips. IEEE
Micro, pp. 36–45, Sept.–Oct. 2002.
4. D.E. Culler et al., Parallel Computer Architecture. Morgan Kaufmann Publ., p. 189
(1999).
5. T. Grötker, S. Liao, G. Martin, S. Swan, System Design with SystemC. Kluwer Academic
Publishers (2002).
6. http://heather.cs.ucdavis.edu/~matloff/mulsim.html
7. E. Hart, TRANSIM–Prototyping Parallel Algorithms, (2nd edition). University of Westmin-
ster Press London, (1994).
8. A. Zomaya, Parallel and Distributed Computing Handbook. McGraw Hill, p. 344
(1996).
Chapter 20
REMARKS ON PARALLEL BIT-BYTE CPU
STRUCTURES OF THE PROGRAMMABLE
LOGIC CONTROLLER
Abstract: The paper presents some hardware solutions for the bit-byte CPU of a PLC,
which are oriented toward maximum optimization of data exchange between the CPU
processors. The optimization aims at maximum utilization of the possibilities
given by the two-processor architecture of the CPUs. The key point is preserving
the high speed of instruction processing by the bit-processor and the high
functionality of the byte-processor. The optimal structure should enable the
processors to work in parallel as much as possible and minimize situations in
which one processor has to wait for the other.
Key words: PLC; CPU; Bit-Byte structure of CPU; control program; scan time.
1. INTRODUCTION
One of the main parameters (features) of Programmable Logic Controllers
(PLC) is the execution time of one thousand control commands (scan time). This
parameter evaluates the quality of a PLC. Consequently, an important task is the
design and construction of a CPU whose structure enables
fast control program execution. The most developed CPUs of PLCs of many
well-known manufacturers are constructed as multiprocessor units. Particular
processors in such units execute the tasks assigned to them. In this way we
obtain a unit which makes parallel operation of several processors possible.
For such a CPU the main problem to be solved is the method of assigning tasks
to particular processors and finding a CPU structure capable of realising
such task assignment in practice, as shown by Michel (1990).
be executed more often than the other segments. To avoid a situation where
two or more tasks are triggered at the same moment it would be necessary to
assign the priorities to the control program segments. The described method of
programmable controller operation changes the approach to preparing a control
program but it seems to the authors that in such a programmable controller the
problems with, for example, timers will be easier to handle. It is not necessary to
observe the moment when the time interval is completed. At the end of the time
interval counting, the suitable segment may be called and executed. It means that
the currently executed program segment should be interrupted and this depends
on the priorities assigned to the particular program segments.
This type of programmable controller CPU will be the subject of future
work, while in this paper a few proposals of programmable controller bit-byte
CPU structures are presented. These are CPUs with serial-cyclic program
execution, but they are structurally prepared for event-triggered operation.
inputs and outputs is impossible. In this situation all inputs and outputs must
be scanned and updated sequentially as fast as possible. If we wish to achieve
good control parameters, the bits operation should be done very quickly.
The creation of a specialised bit-processor, which can carry out bit operations
very quickly, is quite reasonable. If there is a need to process byte data, for
example from AD converters or external timers, the use of an additional 8-, 16-, or
even 32-bit processor or microcontroller is required. The general structure of
the device is presented by Getko (1983).
The solution consists of two processors. Each of them has its own instruction
set. An instruction decoder recognises for which processor an instruction was
fetched and sends activation signals to it.
The basic parameter under consideration was program execution speed. Pro-
gram execution speed is mainly limited by access latency of both processors to
the internal (e.g., counter timers) and external (e.g., inputs and outputs) process
variables. The program memory and the instruction fetch circuitry also influence
system performance. In order to support conflict-free cooperation of both
processors and maintain their concurrent operation, the following assumptions
were made:
• Both processors have separate data memories, but the process image memory is
shared between them. It seems a better solution when each processor has an
independent bus to the in/out signals, which form disjoint sets for the two
processors. In order to remove conflicts in access to common resources (process
memory), two independent bus channels to the process memory are implemented, one
for each processor. This solution increases cost but simplifies the processor
cooperation protocol, especially by eliminating the arbitration process
during access to common resources;
• The presented circuit has only one program memory, in which instructions for
both the bit and byte processors are stored. The byte processor usually executes
a subprogram which has a set of input parameters. The byte processor
would highly reduce instruction transfer performance by accessing the program
memory in order to fetch the invariant part of the subroutine that is currently
executed. Donandt (1989) proposed a solution that implements a separate
program memory for the byte processor. In the presented case we decided to
implement two program memories for the byte processor. In the common program
memory, subprogram calls are stored with appropriate parameter sets. In the
local program memory of the byte processor, the bodies of subroutines
that can be called from the controller program memory are stored. This allows saving
program memory by replacing subprograms of the byte processor with subprogram
calls. Subprograms implement specific instructions of the PLC, which
are not typical for general purpose byte processors;
• In order to reduce access time to in/out signals, the process memory was
replaced by registers that are located in the modules. The content refresh cycle of
these registers is executed after the completion of the calculation loop.
In some cases a register update can be executed on programmer demand.
The presented solution gives fast access to in/out modules with relatively low
requirements for the hardware part. It is also possible to bypass the registers and
access module signals directly.
The bit processor, which operates as the master unit in the CPU, allows speeding
up operation of the controller. There are also disadvantages connected with this
architecture, which can be easily compensated. The main limitation is the
microprogrammable architecture of the bit processor, which has limited abilities in
comparison to a standard microprocessor or microcontroller. A proper cooperation
protocol between the two processors allows those limitations to be eliminated.
Separate bus channels as well as separate program and data memories assure
concurrent operation of the bit and byte processors without requiring an
arbitration process.
The structure of the designed controller must be as simple and cheap as
possible. There must be minimal influence on the execution speed of the byte as
well as the bit processor (all processors should be able to operate with the
highest possible throughput). It is obvious that not all assumptions can be fully
satisfied.
The following assumptions were made in order to support concurrent operation
of the two processors:
• separate address buses for bit and byte processors;
• two data buses: for the bit-processor and for the microcontroller;
• two control buses, separate for the microcontroller and the bit-processor.
[Block diagram: Bit Processor, Byte Processor, Main Program Memory, Byte Processor
Program Memory, Standard Program Memory, Instruction Buffer; signals GO and NEXT.]
[Block diagram: Common Data Memory, Bit Processor Program Memory, Byte Processor
Program Memory, Bit Processor, Arbitration Circuit, Byte Processor, Bit Processor Data
Memory, Byte Processor Data Memory, Binary In/Out Modules, Byte In/Out Modules.]
Figure 20-2. Block diagram of the two processors CPU with common memory.
A modification of the above solution, referring to the first conception, is the
unit where the bit-processor generates pulses activating the sequential tasks in
the byte-processor. These tasks are stored in suitable areas of the byte-processor
memory.
Finally, the CPU structure presented in Fig. 20-2 was accepted. This
structure was additionally equipped with a system of fast data exchange
that keeps PLC programming easy. This system, in simple words, ensures that
the processors do not wait to finish their operations but execute the
next commands up to the moment when a command of waiting for the result of
[Figure: condition flag exchange between the Bit Processor and the Byte Processor; labels:
Fb, FbB, FB, FBb, WRFbB, RESFbB, TRFbB, READ_FbB, READYFbB, EMPTYFbB, WRFBb, RESFBb,
TFBb, WRITE_FBb, READYFBb, EMPTYFBb, Command Buffer, EMPTYBUF, EDGEBUF, GO, NEXT.]
[Flowchart: a bit instruction that modifies Fb and a byte instruction that uses FbB,
with tests of READYFbB and EMPTYFbB, the transfers Fb to FbB and FbB to FB, and the
setting and clearing of EMPTYFbB and READYFbB.]
state of the EMPTYFBb line. Until this line is set, a new condition result must
not be written to the FBb register, as it still contains valid data that should be
received by the bit processor.
Such exchange of the conditional flags does not require postponing
program execution. Proper data transfer is maintained by handshake registers
that control data flow between the processors and also synchronize program
execution. Passing a condition result from one processor to the other must
always be executed by a pair of instructions. When data is passed from the
bit processor to the byte processor, those are TFbB and READ_FbB. Transfer in the
opposite direction requires execution of the instructions WRITE_FBb and TRFBb.
In Fig. 20-4 the inter-processor condition passing algorithms are presented.
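The handshake described above behaves like a one-slot mailbox guarded by an EMPTY and a READY line. The following software sketch models one direction of the exchange only; it is not the hardware design, and the class name, method names and non-blocking return values are assumptions made for the illustration.

```python
class FlagChannel:
    """One-slot condition flag register with EMPTY/READY handshake lines."""

    def __init__(self):
        self.value = 0
        self.empty = True    # register may be overwritten
        self.ready = False   # register holds data not yet received

    def try_write(self, flag):
        """Sender side: write only when the previous flag has been received."""
        if not self.empty:
            return False     # still holds valid data; the sender must retry
        self.value = flag
        self.empty, self.ready = False, True
        return True

    def try_read(self):
        """Receiver side: read only when a new flag is available."""
        if not self.ready:
            return None      # nothing new; the receiver must retry
        self.empty, self.ready = True, False
        return self.value

channel = FlagChannel()
first_write = channel.try_write(1)    # accepted, register was empty
second_write = channel.try_write(0)   # refused, flag 1 not yet received
received = channel.try_read()         # receiver takes flag 1
```

Because a refused write or read only has to be repeated at the point where the flag is actually needed, both processors can keep executing other instructions in the meantime, which matches the pairing of a transfer instruction on one side with a read instruction on the other.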
5. CONCLUSION
Studies on data exchange optimization between the processors of the
bit-byte CPU of the PLC have shown the great capabilities and the possible
applications of this architecture.
As can be seen from the given considerations, the proposed PLC structure
(or, to be more precise, the organization of information exchange between both
processors of a PLC central unit) allows for execution of control programs
consisting of bit commands and/or word commands.
REFERENCES
M. Chmiel, E. Hrynkiewicz, Parallel bit-byte CPU structures of programmable logic con-
trollers. In: International Workshop ECMS, Liberec, Czech Republic, pp. 67–71 (1999).
M. Chmiel, W. Cia̧żyński, A. Nowara, Timers and counters applied in PLCs. In: International
Conference PDS, Gliwice, Poland, pp. 165–172 (1995a).
M. Chmiel, L. Drewniok, E. Hrynkiewicz, Single board PLC based on PLDs. In: International
Conference PDS, Gliwice, Poland, pp. 173–180 (1995b).
J. Donandt, Improving response time of Programmable Logic Controllers by use of a Boolean
Coprocessor, IEEE Comput. Soc. Press., Washington, DC, USA, 4, pp. 167–169 (1989).
Z. Getko, Programmable Systems of Binary Control, Elektronizacja, WKiL, Warsaw (in Pol-
ish), 18:5–13 (1983).
E. Hrynkiewicz, Based on PLDs programmable logic controller with remote I/O groups. In:
International Workshop ECMS, Toulouse, France, pp. 41–48
G. Michel, Programmable Logic Controllers, Architecture and Applications. John Wiley &
Sons, West Sussex, England (1990).
Chapter 21
FPGA IMPLEMENTATION OF
POSITIONAL FILTERS
Dariusz Caban
Institute of Engineering Cybernetics, Wroclaw University of Technology, Janiszewskiego 11-17,
50-370 Wroclaw, Poland; e-mail: [email protected]
Abstract: The paper reports on some experiments with implementing positional digital
image filters using field programmable devices. It demonstrates that a single
field programmable device may be used to build such a filter. By using extensive
pipelining in the design, the filter can achieve performance of 50 million pixels per
second (using Xilinx XC4000E devices) and over 120 MHz (in case of Spartan-
3 devices). These results were obtained using automatic synthesis from VHDL
descriptions, avoiding any direct manipulation in the design.
1. INTRODUCTION
The paper reports on the implementation of a class of filters used in image
processing. The filtering is realised on a running, fixed size window of pixel
values. Positional filtering is obtained by arranging the values in an ordered
sequence (according to their magnitude) and choosing one that is at a certain
position (first, middle, last, or any other). Thus, the class of filters encompasses
median, max, and min filtering, depending on the choice of this position.
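The definition above translates directly into a software reference model. The sketch below sorts the window and picks the value at the chosen position; it only illustrates the filter function, not the hardware algorithm. For brevity it runs a one-dimensional window over a row of pixels, whereas an image filter would use a two-dimensional window such as 3 × 3; both function names are hypothetical.

```python
def positional_filter(window, position):
    """Return the value at the given position of the sorted window.

    position 0 gives the min filter, len(window) - 1 the max filter,
    and len(window) // 2 the median filter for an odd-sized window.
    """
    ordered = sorted(window)
    return ordered[position]

def filter_row(pixels, size, position):
    """Apply the positional filter on a running window of fixed size."""
    return [positional_filter(pixels[i:i + size], position)
            for i in range(len(pixels) - size + 1)]

window = [7, 2, 9, 4, 5, 1, 8, 3, 6]            # e.g. a flattened 3 x 3 window
minimum = positional_filter(window, 0)
median = positional_filter(window, len(window) // 2)
maximum = positional_filter(window, len(window) - 1)
```

Sorting every window is convenient as a golden model for checking a hardware implementation, but it says nothing about how the hardware finds the value.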
There are various algorithms used in positional filtering1,2. These are
roughly classified into three groups: compare-and-multiplex3, threshold
decomposition4, and bit-wise elimination5,6,7. All these can be used with the currently
available, powerful FPGA devices. However, the bit-wise elimination method
seems most appropriate for the cell array organisation.
Some specific positional filters have commercial VLSI implementations.
There is no device that can be configured to realise any position filtering. Even
if only median, min, or max filtering is required, it may be advantageous to use
FPGA devices, as they offer greater versatility and ease of reengineering. Of
[Figure: bit-slice processor (BSP) with signals M(r), M(r-1), S(r), S(r-1), and P(r).]
where
x_i(r) is the r-th bit of the i-th pixel in the filtering window,
P_nk(r) is the r-th bit of the value at the k-th position (k-quantile),
M_i(r) and S_i(r) are the modifying functions.
[Figure: labels x(6), x(7), P(0), P(6), P(7).]
This architecture has very short propagation paths between registers and
hence ensures highest pixel processing rates. There is latency between the
input signals and the output equal to the number of bits in the pixel represen-
tations. Normally, in image processing applications this is not a problem. Just
the image synchronisation signals need to be shifted correspondingly. It may
be unacceptable, though, if image filtering is just a stage in a real-time control
application.
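Shifting a synchronisation signal by the filter latency amounts to passing it through a shift register with one stage per pixel bit. A minimal software model is sketched below; the class name and the 8-bit pixel width are assumptions made for the illustration.

```python
from collections import deque

class SyncDelay:
    """Shift register that delays a sync signal by 'latency' clock cycles."""

    def __init__(self, latency):
        # One stage per cycle of latency; latency must be at least 1.
        self.stages = deque([0] * latency, maxlen=latency)

    def clock(self, sync_in):
        """Advance one clock cycle and return the delayed sync value."""
        sync_out = self.stages[0]
        self.stages.append(sync_in)   # maxlen drops the oldest stage
        return sync_out

delay = SyncDelay(latency=8)          # assumed 8-bit pixels, so 8 cycles
outputs = [delay.clock(1 if t == 0 else 0) for t in range(12)]
```

A pulse applied in cycle 0 appears at the output in cycle 8, so the delayed synchronisation signal lines up again with the filtered pixel stream.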
The circuit complexity, expressed in terms of the number of cells used (CLBs
or Virtex slices), results from the number and complexity of bit-slice processors
(complexity of the combinatorial logic) and from the number of registers used
in pipelining. The first increases linearly with the size of pixel representation.
On the other hand the number of registers used in pipelining increases with
the square of this representation. In case of the XC4000 architecture, the pixel
representation of 8 bits is the limit, above which the complexity of circuit is
determined solely by the pipelining registers (all the combinatorial logic fits in
the lookup tables of cells used for pipelining).
The synthesis tools had problems in attaining optimal solutions for the syn-
thesis of thresholding functions in the case of the cells implemented in XC4000
devices (this was not an issue in case of min and max positional filters). Most
noticeably, the design obtained when the threshold function was described as
a set of minterms required 314 CLBs in case of 8-bit pixel representation. By
using a VHDL description that defined the function as a network of intercon-
nected 4-input blocks, the circuit complexity was reduced to the reported 288
cells. The reengineered threshold function had a slight effect on the complexity
of the 12-bit filter and none on the 16-bit one.
The most noticeable improvement in using the Virtex-2 devices for posi-
tional filter implementations was in the operation speed: approximately 50 MHz
in case of the XC4000E devices and 80–90 MHz in case of Virtex-2. Some
other architectural improvements are also apparent. The increased functional-
ity of Virtex slices led to much more effective implementations of pipelining
registers: the FPGA Express synthesizer implemented them as shift registers
instead of unbundled flip-flops, significantly reducing the slice usage. Improved
lookup table functionality eliminated the problem of efficient decomposition of
threshold function, as well (at least in the case of the 3 × 3 filtering window).
The results obtained for the Spartan-3 family were similar to the
corresponding Virtex-2 ones. The improved performance resulted from the higher
speed grades of the devices in use. It should be noted that these results were
obtained using the synthesis tools integrated within the Xilinx ISE 6.2i
package, since the available version of FPGA Express could not handle the
devices. The synthesis results do not vary significantly; the resultant slice
count is better in some cases and worse in others.
5. CONCLUSIONS
The presented implementation results show that FPGA devices have attained
the speed grades that are more than adequate for implementing positional image
filters of very high resolution. Furthermore, it is no longer necessary to inter-
connect multiple FPGA devices or limit the circuit complexity by reducing the
pixel representations. In fact, the capabilities of Virtex-2 and Spartan-3 devices
exceed these requirements both in terms of performance and cell count.
The proposed bit-wise elimination algorithm with pipelining is appropriate
for the cell architecture of FPGA devices. The only problem is the latency,
which may be too high in case of long pixel representations. By limiting the
pipelining to groups of 2, 3, or more bit-slice processors it is possible to trade
off latency against performance.
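The trade-off can be sketched numerically. The following Python fragment is an
illustration only: it assumes that a register is placed after every group of
bit-slice processors, so the latency in clock cycles equals the number of
groups. The function name is hypothetical.

```python
import math


def pipeline_stages(pixel_bits, slices_per_stage):
    """Number of pipeline stages (latency in clock cycles) when registers
    are inserted only after every `slices_per_stage` bit-slice processors."""
    if slices_per_stage < 1:
        raise ValueError("slices_per_stage must be at least 1")
    return math.ceil(pixel_bits / slices_per_stage)


# Latency in cycles for a 16-bit pixel with groups of 1, 2, 3, and 4 slices.
stages = [pipeline_stages(16, g) for g in (1, 2, 3, 4)]
```

Fewer stages mean lower latency, but each stage then contains a longer
combinatorial path, so the attainable clock frequency drops.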
Positional filtering is just a stage in complex image processing. The analysed
filter implementations leave a lot of device resources unused. This is so, even in
the case of XC4000E packages, where the cell utilization for representations of
8 bits or more is between 60 and 97%. The cells are mostly used for registering,
and the lookup tables are free. These may well be used to implement further
stages of image processing.
It is very important that the considered implementations were obtained
directly by synthesis from functional descriptions expressed in VHDL.
This makes feasible the concept of reconfigurable filters, where the user de-
scribes the required filtering algorithms in a high-level language, and these are
programmed into the filter. However, the design tools have not yet reached
REFERENCES
1. M. Juhola, J. Katajainen, T. Raita, Comparison of algorithms for standard median filtering.
IEEE Transactions on Signal Processing, 39 (1), 204–208 (1991).
2. D.S. Richards, VLSI median filters. IEEE Transactions on Acoustics, Speech and Signal
Processing, 38 (1), 145–153 (1990).
3. S. Ranka, S. Sahni, Efficient serial and parallel algorithms for median filtering. IEEE
Transactions on Signal Processing, 39 (6), 1462–1466 (1991).
4. J.P. Fitch, E.J. Coyle, N.C. Gallagher, Median filtering by threshold decomposition. IEEE
Transactions on Acoustics, Speech and Signal Processing, 32 (6), 553–559 (1984).
5. M.O. Ahmad, D. Sundararajan, A fast algorithm for two-dimensional median filtering.
IEEE Transactions on Circuits and Systems, 34 (11), 1364–1374 (1987).
6. C.L. Lee, C.W. Jen, Binary partition algorithms and VLSI architecture for median and
rank order filtering. IEEE Transactions on Signal Processing, 41 (9), 2937–2942 (1993).
7. C.-W. Wu, Bit-level pipelined 2-D digital filters for real-time image processing. IEEE
Transactions on Circuits and Systems for Video Technology, 1 (1), 22–34 (1991).
8. D. Caban, J. Jarnicki, A reconfigurable filter for digital images processing (in Polish).
Informatyka, 6, 15–19 (1992).
9. D. Caban, Hardware implementations of a real time positional filter. In: Proceedings of 5th
Microcomputer School Computer Vision and Graphics, Zakopane, pp. 195–200 (1994).
10. S.C. Chan, H.O. Ngai, K.L. Ho, A programmable image processing system using FPGAs.
International Journal of Electronics, 75 (4), 725–730 (1993).
11. D. Caban, W. Zamojski, Median filter implementations. Machine Graphics & Vision, 9 (3),
719–728 (2000).
Chapter 22
A METHODOLOGY FOR DEVELOPING IP
CORES THAT REPLACE OBSOLETE ICS
An industrial experience
e-mail: [email protected]
3 Technical University of Bielsko-Biala, Department of Electrical Engineering, ul. Willowa 2,
Abstract: This paper presents a proven methodology for the development and
productization of virtual electronic components. The methodology consists of a
rigorous approach to the development of the component specification, reverse
engineering of the behavior of reference circuits by means of a hardware
simulator, application of industry-standard rules to the coding of the RTL model
in a hardware description language, and extensive testing and verification
activities leading to high-quality synthesizable code and to a working FPGA
prototype. In the final stage, called productization, a series of deliverables
is produced to ensure effective reuse of the component in different (both FPGA
and ASIC) target technologies.
Key words: Virtual Components; IP Cores; Hardware Description Languages; high level de-
sign; quality assurance.
We will discuss these stages one by one in this chapter, focusing on details
related to our experiences. In addition we will present in more detail the use
of hardware modeling in the specification and verification process of virtual
components.
form an environment that allows the user to simulate systems that consist of
integrated circuits for which no behavioral models are available. These circuits
are modeled by real chips and may interact within the mentioned environment
with software-based testbench and other parts for which simulation models
exist.
The hardware modeler proved to be useful in our company during specifi-
cation and verification stages. The greatest role of hardware models is related
to reverse engineering, when the goal is documenting functionality of existing
catalogue parts (usually obsolete). Reverse engineering is essential during de-
velopment of the specification of IP cores meant to be functionally equivalent to
those parts. Hardware modeling resolves many ambiguities, which are present
in the referenced chip documentation.
The hardware modeler is also useful for testing the FPGA prototypes of
virtual components, independent of whether it was used during the specification
stage.
It enables the import of the testing results obtained with a hardware model into
its waveform viewer in order to compare them with simulated behavior of the
core under development.
6. VERIFICATION PROCESS
6.1 Test suite development
Test suite development is based on the specification. The specification is
analyzed and all the functional features of the core that should be tested for
the original device are enumerated. The test development team starts with the
development of tests that are needed to resolve ambiguities in the available
documentation of the chip with which the core has to be compliant.
Most of the functional tests are short programs written in the assembly
language of the processor that is modeled. Each test exercises one or several
instructions of the processor. For instructions supporting several addressing
modes, tests are developed to check all of them. After a test routine is
assembled, the resulting object code is translated into formats that may be
used to initialize models of program memory in the testbenches (in both the
CADAT and VHDL environments). We have developed a set of utility procedures
that automate this process.
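A utility of this kind might look like the following Python sketch. It is not
the authors' tool: the output format (one two-digit hexadecimal byte per line)
and the names to_hex_image, mem_size, and fill are assumptions chosen for
illustration, and the formats actually required by the CADAT and VHDL
testbenches are not given in this chapter.

```python
def to_hex_image(object_code, mem_size=None, fill=0x00):
    """Convert raw object code into a text memory image, one byte per line.

    `object_code` is a bytes-like object. If `mem_size` is given, the image
    is padded with `fill` bytes up to that size.
    """
    data = bytes(object_code)
    if mem_size is not None:
        if len(data) > mem_size:
            raise ValueError("object code does not fit in program memory")
        data += bytes([fill]) * (mem_size - len(data))
    return "".join(f"{byte:02X}\n" for byte in data)


# Placeholder bytes standing in for the object code of one assembled test.
image = to_hex_image(bytes([0x74, 0x55, 0xF5]), mem_size=4)
```

A testbench memory model can then read such a file line by line at the start of
simulation.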
In order to test processor interaction with its environment (i.e., I/O oper-
ations, handling of interrupts, counting of external events, response to reset
signal) a testbench is equipped with a stimuli generator.
we also use FSM coverage (state, arc, and path) metrics to ensure that control
parts of the circuit are tested exhaustively.
Incompleteness of the test suite may result in leaving bugs in untested parts
of the code. On the other hand code coverage analysis also helps to reveal (and
remove) redundancy of the test suite.
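The state and arc metrics, and the redundancy check, can be sketched in Python
as follows. This is a simplified illustration and not the coverage tool used by
the authors: each test is represented by the sequence of FSM states it visits,
and the state names, test names, and function names are hypothetical.

```python
def arcs_of(trace):
    """Set of state transitions (arcs) exercised by one state trace."""
    return set(zip(trace, trace[1:]))


def fsm_coverage(states, arcs, traces):
    """Return (state coverage, arc coverage) as fractions between 0 and 1."""
    seen_states, seen_arcs = set(), set()
    for trace in traces:
        seen_states.update(trace)
        seen_arcs |= arcs_of(trace)
    return (len(seen_states & set(states)) / len(states),
            len(seen_arcs & set(arcs)) / len(arcs))


def redundant_tests(traces_by_test):
    """Names of tests whose arcs are all exercised by the remaining tests.

    Note: two identical tests flag each other, so remove one at a time.
    """
    redundant = []
    for name, trace in traces_by_test.items():
        others = set()
        for other_name, other_trace in traces_by_test.items():
            if other_name != name:
                others |= arcs_of(other_trace)
        if arcs_of(trace) <= others:
            redundant.append(name)
    return redundant


states = ["IDLE", "FETCH", "EXEC"]
arcs = [("IDLE", "FETCH"), ("FETCH", "EXEC"),
        ("EXEC", "FETCH"), ("EXEC", "IDLE")]
tests = {"t1": ["IDLE", "FETCH", "EXEC", "FETCH"],
         "t2": ["IDLE", "FETCH", "EXEC"]}
state_cov, arc_cov = fsm_coverage(states, arcs, tests.values())
```

In this example all states are visited but the arc from EXEC back to IDLE is
not, and test t2 adds nothing that t1 does not already exercise.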
7. SUBBLOCK DEVELOPMENT
The main part of the macro development effort is the actual design of the
subblocks defined during the specification phase. At the moment we have no
access to tools that check the compliance of the code to a given set of rules
and guidelines. We follow the design and coding rules defined in Ref. 2. We
check the code with the VN-Check tool from TransEDA to ensure that the rules
are followed. Violations are documented.
For certain subblocks we develop separate testbenches and tests. However,
the degree to which the module is tested separately depends on its interaction
with surrounding subblocks. As we specialize in microprocessor core develop-
ment it is generally easier to interpret the results of simulation of the complete
core than to interpret the behavior of its control unit separated from other parts
of the chip. The important aspect here is that we have access to the results of
the test run on the hardware model that serves as a reference.
On the other hand certain subblocks like arithmetic-logic unit or peripherals
(i.e., Universal Asynchronous Receiver/Transmitters (UARTs) and timers) are
easy to test separately and are tested exhaustively before integration of the
macro starts.
Synthesis is carried out with tools for FPGA design: Synplify, FPGA Express,
and Leonardo. We run synthesis with each tool, looking for the best possible
results in both area-oriented and performance-oriented optimizations.
8. MACRO INTEGRATION
Once the subblocks are tested and synthesized they may be integrated. Then
all the tests are run on the RTL model and the results are compared with those
of the hardware model. As soon as compliance is confirmed (which may require
a few iterations back to subblock coding and rerunning the tests on the
integrated macro), the macro is synthesized for Xilinx and Altera chips and the
tests are run again on the structural model.
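The comparison step can be pictured with a short Python sketch. It assumes,
purely for illustration, that both the hardware model and the simulated model
produce one dictionary of signal values per clock cycle; the signal names and
the function name first_mismatch are hypothetical.

```python
def first_mismatch(reference, simulated):
    """Compare two per-cycle traces and return the first difference.

    Each trace is a list of dicts mapping signal names to sampled values.
    Returns (cycle, signal, reference_value, simulated_value), or None when
    the traces agree on every signal present in the reference.
    """
    for cycle, (ref, sim) in enumerate(zip(reference, simulated)):
        for signal in sorted(ref):
            if sim.get(signal) != ref[signal]:
                return (cycle, signal, ref[signal], sim.get(signal))
    if len(reference) != len(simulated):
        # One trace ended early: report the cycle at which it stopped.
        return (min(len(reference), len(simulated)), None, None, None)
    return None


hardware = [{"addr": 0, "data": 0x74}, {"addr": 1, "data": 0x55}]
rtl = [{"addr": 0, "data": 0x74}, {"addr": 1, "data": 0x54}]
difference = first_mismatch(hardware, rtl)
```

Reporting only the first difference keeps the iteration loop short: the
designer fixes that discrepancy in the subblock code and reruns the test.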
9. PROTOTYPING
The next step in the core development process is building of a real prototype
that could be used for testing and evaluation of the core.
At present we target two technologies: Altera and Xilinx. Our cores are avail-
able to users of Altera and Xilinx FPGAs through AMPP and AllianceCORE
programs. Shortly we will implement our cores in Actel technologies, as well.
Placing and routing of a core in a given FPGA technology is realized with
vendor-specific software. The tests are run again on the SDF-annotated struc-
tural model. We developed a series of adapter boards that interface FPGA
prototype to a system in which a core may be tested or evaluated.
The simplest way to test the FPGA prototype is to replace an original ref-
erence chip used in the hardware modeler with it. This makes it possible to
compare the behavior of the prototype with the behavior of the original chip.
However, for some types of tests even a hardware modeler does not provide the
necessary speed. These tests can only be executed in a prototype hardware sys-
tem at full speed. Such an approach is a must when one needs to test a serial link
with a vast amount of data transfers, or to perform floating point computations
for thousands of arguments. Our experience shows that even after an exhaustive
testing program, some minor problems with the core remain undetected until
it runs real-life application software.
For this reason we have developed a universal development board (Fig. 22-
2). It can be adapted to different processor cores by replacement of onboard
programmable devices and EPROMs. An FPGA adapter board (see Fig. 22-1)
containing the core plugs into this evaluation board. An application program
may be uploaded to the on-board RAM over a serial link from a PC.
Development of this application program is done by a separate design team.
This team effectively plays the role of an internal beta site, which reveals
problems in using the core before it is released to the first customer.
The FPGA adapter board can also be used to test the core in the application
environment of a prototype system. Such a system should contain a
microcontroller or microprocessor that is to be replaced with our core in the
integrated version of the system. The adapter board is designed in such a way
that it may be plugged into the microprocessor socket of the target system.
Using this technique we ran prototypes of our cores in a ZX Spectrum
microcomputer (CZ80cpu core) and a SEGA Video Game (C68000 core), in which
they replaced the original Zilog® and Motorola® processors.
10. PRODUCTIZATION
The main goal of the productization phase is to define all deliverables that
are necessary to make the use of the virtual component in the larger design easy.
We develop and run simulation scripts with the ModelSim and NC-Sim simulators
to make sure that the RTL model simulates correctly with them.
Although we develop cores in VHDL, we also translate them into Verilog to make
them available to customers who work only with Verilog HDL. The RTL model
is translated automatically, while the testbench is translated manually. The
equivalence of the Verilog and VHDL versions is exhaustively tested.
Synopsys Design Compiler scripts are generated with the help of the FPGA
Compiler II. Synthesis scenarios for high performance and for minimal cost are
developed.
For the FPGA market an important issue is developing all the deliverables
required by Altera and Xilinx from their partners participating in the AMPP and
AllianceCORE third-party IP programs.
User documentation is also completed at productization stage (an exhaustive,
complete, and updated specification is very helpful when integrating the core
into a larger design).
11. EXPERIENCES
The methodology described in this paper was originally developed in the
years 1999 and 2000 during the design of a few versions of an 8051-compatible
microcontroller core (Ref. 1). It was then successfully applied to the
development of IP cores compatible with such popular chips as the Microchip
PIC® 1657 microcontroller, the Motorola 68000 16-bit microprocessor and 56002
digital signal processor, the Zilog® Z80 8-bit microprocessor and its
peripherals, the TI® 32C025 DSP, and the Intel® 80186 16-bit microcontroller.
After accommodating certain improvements of this methodology, we doc-
umented it in our quality management system which passed the ISO 9001
compliance audit in 2003. Presently we are looking at complementing it with
functional coverage and constrained random verification techniques.
REFERENCES
1. M. Bandzerewicz, W. Sakowski, Development of the configurable microcontroller core. In:
Proceedings of the FDL’99 Conference, Lyon (1999).
2. M. Keating, P. Bricaud, Reuse Methodology Manual (2nd ed.). Kluwer Academic Publishers
(1999).
3. J. Haase, Virtual components – From research to business. In: Proceedings of the FDL’99
Conference, Lyon (1999).
4. Maciej Pyka, Wojciech Sakowski, Wlodzimierz Wrona, Developing the concept of hard-
ware modeling to enhance verification process in virtual component design. In: Proceedings
of IEEE Workshop on Design and Diagnostics of Electronic Circuits and Systems, Poznan
(2003).