Abstract

In this report we discuss the main models of computation, the basic types of architectures, and the language features needed to specify systems. We also give an overview of a generic methodology for designing systems that include software and hardware parts from executable specifications.
Contents

1 Models
  1.1 Model and architecture definition
  1.2 Model taxonomy
  1.3 Finite-state machine
  1.4 Finite-state machine with datapath
  1.5 Petri net
  1.6 Hierarchical concurrent finite-state machine
  1.7 Programming languages
  1.8 Program-state machine
2 Architectures
  2.1 Controller architecture
  2.2 Custom Datapath architecture
  2.3 FSMD architecture
  2.4 CISC architecture
  2.5 RISC architecture
  2.6 VLIW architecture
  2.7 Parallel architecture
3 Languages
  3.1 Introduction
  3.2 Characteristics of system models
  3.3 Concurrency
  3.4 State transitions
  3.5 Hierarchy
  3.6 Programming constructs
  3.7 Behavioral completion
  3.8 Exception handling
  3.9 Timing
  3.10 Communication
  3.11 Process synchronization
  3.12 SpecC+ Language description
4 Generic codesign methodology
  4.1 System specification
  4.2 Allocation
  4.3 Partitioning and the model after partitioning
  4.4 Scheduling and the scheduled model
  4.5 Communication synthesis and the communication model
  4.6 Analysis and validation flow
  4.7 Backend
5 Conclusion and Future Work
6 Index
List of Figures

1 Conceptual views of an elevator controller
2 Implementation architectures
3 FSM model for the elevator controller. (y)
4 State-based FSM model for the elevator controller. (y)
5 FSMD model for the elevator controller. (y)
6 A Petri net example. (y)
7 Petri net representations
8 Statecharts: hierarchical concurrent states. (y)
9 An example of a program-state machine. (y)
10 A generic controller design
11 An example of a custom datapath. (z)
12 Simple datapath with one accumulator
13 Two different datapaths for FIR filter
14 Design model
15 CISC with microprogrammed control. (y)
16 RISC with hardwired control. (y)
17 An example of VLIW datapath. (y)
18 A heterogeneous multiprocessor
19 Some typical configurations
20 Data-driven concurrency
21 Pipelined concurrency
22 Control-driven concurrency
23 State transitions between arbitrarily complex behaviors. (y)
24 Structural hierarchy. (y)
25 Sequential behavioral decomposition
26 Behavioral decomposition types
27 Code segment for sorting.
28 Behavioral completion
29 Exception types
30 Timing diagram
31 Communication model.
32 Examples of communication
33 Integer channel.
34 A simple synchronous bus protocol
35 Protocol description of the synchronous bus protocol.
36 Control synchronization
37 Data-dependent synchronization in Statecharts
38 A graphical SpecC+ specification example.
39 A textual SpecC+ specification example.
40 Component wrapper specification.
41 Source code of the component wrapper specification.
42 Common configurations before and after channel inlining
43 Timing specification of the SRAM read protocol.
44 Timing implementation of the SRAM read protocol.
45 Generic methodology.
46 Conceptual model of specification
47 Conceptual model after partitioning
48 Conceptual model after scheduling
49 Conceptual model after communication synthesis
Essential Issues in Codesign
Daniel D. Gajski, Jianwen Zhu, Rainer Dömer
Department of Information and Computer Science
University of California, Irvine
Irvine, CA 92697-3425, USA
1 Models

1.1 Model and architecture definition

A language can capture many different models, and a model can be captured in many different languages.

The purpose of a conceptual model is to provide an abstracted view of a system. Figure 1, for example, shows two different models of an elevator controller, whose English description is given in Figure 1(a). The difference between these two models is that Figure 1(b) represents the controller as a set of programming statements, whereas Figure 1(c) represents the controller as a finite state machine in which the states indicate the direction of the elevator movement.

As you can see, each of these models represents the same controller in a different way, thereby exposing its different characteristics. For example, the state-machine model is best suited to represent a system's temporal behavior, as it allows a designer to explicitly express the modes and mode-transitions caused by external or internal events. The algorithmic model, on the other hand, has no explicit states or modes.
"If the elevator is stationary and the floor
requested is equal to the current floor, loop
then the elevator remains idle. if (rfloor = cfloor) then
d := idle;
If the elevator is stationary and the floor elsif (rfloor < cfloor) then
requested is less than the current floor, d := down;
then lower the elevator to the requested floor. elsif (rfloor > cfloor) then
d := up;
If the elevator is stationary and the floor end if;
requested is greater than the current floor, end loop;
then raise the elevator to the requested floor."
(a) (b)
(c)
Figure 1: Conceptual views of an elevator controller: (a) desired functionality in English, (b) programming model,
(c) state-machine model. (y)
The next step, then, is to transform the model into an architecture, which serves to define the model's implementation by specifying the number and types of components as well as the connections between them. In Figure 2, for example, we see two different architectures, either of which could be used to implement the state-machine model of the elevator controller in Figure 1(c). The architecture in Figure 2(a) is a register-level implementation, which uses a state register to hold the current state and combinational logic to implement state transitions and values of output signals. In Figure 2(b), we see a processor-level implementation that maps the same state-machine model into software, using a variable in a program to represent the current state and statements in the program to calculate state transitions and values of output signals. In this architecture, the program is stored in the memory and executed by the processor.

Models and architectures are conceptual and implementation views on the highest level of abstraction. Models describe how a system works, while architectures describe how it will be manufactured. The design process, or methodology, is the set of design tasks that transform a model into an architecture. At the beginning of this process, only the system's functionality is known. The designer's job, then, is to describe this functionality in some language which is based on the most appropriate models. As the design process proceeds, an architecture will begin to emerge, with more detail being added at each step in the process. Generally, designers will find that certain architectures are more efficient in implementing certain models. In addition, design and manufacturing technology will have a great influence on the choice of an architecture. Therefore, designers have to consider many different implementation alternatives before the design process is complete.

1.2 Model taxonomy

System designers use many different models in their various hardware or software design methodologies. In general, though, these models fall into five distinct categories: (1) state-oriented; (2) activity-oriented; (3) structure-oriented; (4) data-oriented; and (5) heterogeneous. A state-oriented model, such as a finite-state machine, is one that represents the system as a set of states and a set of transitions between them, which are triggered by external events. State-oriented models are most suitable for control systems, such as real-time reactive systems, where the system's temporal behavior is the most important aspect of the design. An activity-oriented model, such as a dataflow graph, is one that describes a system as a set of activities related by data or execution dependencies. This model is most applicable to transformational systems, such as digital signal processing systems, where data passes through a set of transformations at a fixed rate. Using a structure-oriented model, such as a block diagram, we would describe a system's physical modules and the interconnections between them. Unlike state-oriented and activity-oriented models, which primarily reflect a system's functionality, the structure-oriented models focus mainly on the system's physical composition. Alternatively, we can use a data-oriented model, such as an entity-relationship diagram, when we need to represent the system as a collection of data related by their attributes, class membership and interactions. This model is most suitable for information systems, such as databases, where the function of the system is less important than the data organization of the system. Finally, a designer could use a heterogeneous model, one that integrates many of the characteristics of the previous four models, whenever he needs to represent a variety of different views in a complex system.

In the rest of this section we will describe some frequently used models.

1.3 Finite-state machine

A finite-state machine (FSM) is an example of a state-oriented model. It is the most popular model for describing control systems, since the temporal behavior of such systems is most naturally represented in the form of states and transitions between states. Basically, the FSM model consists of a set of states, a set of transitions between states, and a set of actions associated with these states or transitions.

The finite state machine can be defined abstractly as the quintuple

    <S, I, O, f, h>

where S, I, and O represent a set of states, a set of inputs and a set of outputs, respectively, and f and h represent the next-state and the output functions. The next-state function f is defined abstractly as a mapping S × I → S. In other words, f assigns to every pair of state and input symbols another state symbol. The FSM model assumes that transitions from one state to another occur only when the input symbols change. Therefore, the next-state function f defines what the state of the FSM will be after the input symbols change.

The output function h determines the output values in the present state. There are two different types of finite state machine, which correspond to two different definitions of the output function h. One type is a state-based or Moore-type FSM, for which h is defined as a mapping S → O. In other words, an output symbol is assigned to each state of the FSM and output during the time the FSM is in that particular state. The other type is an input-based or Mealy-type FSM, for which h is defined as the mapping S × I → O. In this case, an output symbol in each state is defined by a pair of state and input symbols, and it is output while the state and the corresponding input symbols persist.

According to our definition, each set S, I, and O may have any number of symbols. However, in reality we deal only with binary variables, operators and memory elements. Therefore, S, I, and O must be implemented as a cross-product of binary signals or memory elements, whereas functions f and h are defined by Boolean expressions that will be implemented with logic gates.

Figure 3: FSM model for the elevator controller. (y) [state diagram with states S1, S2, S3 and arcs labeled with request/output pairs such as r2/u1, r1/d1, r3/n; not reproducible in text]

In Figure 3, we see an input-based FSM that models the elevator controller in a building with three floors, as described in Section 1.1. In this model, the set of inputs I = {r1, r2, r3} represents the floor requested. For example, r2 means that floor 2 is requested. The set of outputs O = {d2, d1, n, u1, u2} represents the direction and number of floors the elevator should go. For example, d2 means that the elevator should go down 2 floors, u2 means that the elevator should go up 2 floors, and n means that the elevator should stay idle. The set of states represents the floors. In Figure 3, we can see that if the current floor is 2 (i.e., the current state is S2) and floor 1 is requested, then the output will be d1.

In Figure 4 we see the state-based model for the same elevator controller, in which the value of the output is indicated in each state. Each state has been split into three states representing each of the output signals that the state machine in Figure 3 will output when entering that particular state.

Figure 4: State-based FSM model for the elevator controller. (y) [state diagram with states S11/d2, S21/d1, S31/n, S13/n, S23/u1, S33/u2 and others; not reproducible in text]
In practical terms, the primary difference between these two models is that the state-based FSM may require quite a few more states than the input-based model. This is because in an input-based model, there may be multiple arcs pointing to a single state, each arc having a different output value; in the state-based model, however, each different output value would require its own state, as is the case in Figure 4.

1.4 Finite-state machine with datapath

In cases when a FSM must represent integer or floating-point numbers, we could encounter a state-explosion problem, since, if each possible value for a number requires its own state, then the FSM could require an enormous number of states. For example, a 16-bit integer can represent 2^16 or 65536 different states. There is a fairly simple way to eliminate the state-explosion problem, however, as it is possible to extend a FSM with integer and floating-point variables, so that each variable replaces thousands of states. The introduction of a 16-bit variable, for example, would reduce the number of states in the FSM model by a factor of 65536.

In order to formally define a FSMD [Gaj97], we must extend the definition of a FSM introduced in the previous section by introducing sets of datapath variables, inputs and outputs that will complement the sets of FSM states, inputs and outputs.

As we mentioned in the previous section, an FSM is a quintuple

    <S, I, O, f, h>

Each state, input and output symbol is defined by a cross-product of Boolean variables. More precisely,

    I = A1 × A2 × ... × Ak
    S = Q1 × Q2 × ... × Qm
    O = Y1 × Y2 × ... × Yn

We obtain an FSMD by extending the above FSM definition by adding the set of datapath variables, inputs and outputs. More formally, we define a variables set

    V = V1 × V2 × ... × Vq

which defines the state of the datapath by defining the values of all variables in each state.

In the same fashion, we can separate the set of FSMD inputs into a set of FSM inputs IC and a set of datapath inputs ID. Thus,

    I = IC × ID

where IC = A1 × A2 × ... × Ak as before and ID = B1 × B2 × ... × Bp.

Similarly, the output set consists of FSM outputs OC and datapath outputs OD. In other words,

    O = OC × OD

where OC = Y1 × Y2 × ... × Yn as before and OD = Z1 × Z2 × ... × Zr. However, note that Ai, Qj and Yk represent Boolean variables, while Bi, Vi and Zi are Boolean vectors which in turn represent integers, floating-point numbers and characters. For example, in a 16-bit datapath, Bi, Vi and Zi would be 16 bits wide, and if they were positive integers, they would be able to assume values from 0 to 2^16 − 1.

Except for very trivial cases, the size of the datapath variables and ports makes specification of the functions f and h in a tabular form very difficult. In order to be able to specify variable values in an efficient and understandable way in the definition of an FSMD, we will specify variable values with arithmetic expressions.

We define the set of all possible expressions, Expr, over the set of variables V, to be the set of all constants K of the same type as the variables in V, the set of variables V itself, and all the expressions obtained by combining two expressions with arithmetic, logic, or rearrangement operators.
More formally,

    Expr(V) = K ∪ V ∪ {(ei ∘ ej) | ei, ej ∈ Expr, ∘ is an acceptable operator}

Using Expr(V) we can define the values of the status signals as well as the transformations in the datapath. Let

    STAT = {statk = ei Δ ej | ei, ej ∈ Expr(V), Δ ∈ {≤, <, =, ≠, >, ≥}}

be the set of all status signals, which are described as relations between variables or expressions of variables. Examples of status signals are Data ≠ 0, (a − b) > (x + y) and (counter = 0) AND (x > 10). The relations defining status signals are either true, in which case the status signal has value 1, or false, in which case it has value 0.

Note, again, that the variables in OC are Boolean variables and that the variables in OD are Boolean vectors.

Using this kind of FSMD, we could model the elevator controller example in Figure 3 with only one state, as shown in Figure 5. This reduction in the number of states is possible because we have designated a variable cfloor to store the state value of the FSM in Figure 3 and rfloor to store the values of r1, r2 and r3.

Figure 5: FSMD model for the elevator controller. (y) [single state S1 with the arcs (cfloor != rfloor) / cfloor := rfloor; output := rfloor − cfloor and (cfloor = rfloor) / output := 0]
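As an illustration (our C++ sketch, not part of the report), the one-state FSMD of Figure 5 replaces the control states of Figure 3 with the datapath variable cfloor; the status signal cfloor != rfloor selects the arc, and the arc's expressions update the variables:

    #include <iostream>

    // FSMD of Figure 5: a single control state S1 plus datapath
    // variables. cfloor holds the current floor, output the signed
    // number of floors to move.
    struct ElevatorFSMD {
        int cfloor = 1;
        int output = 0;

        void step(int rfloor) {             // one clock cycle in state S1
            if (cfloor != rfloor) {         // (cfloor != rfloor) / ...
                output = rfloor - cfloor;   // e.g. +2 means up two floors
                cfloor = rfloor;
            } else {                        // (cfloor = rfloor) / output := 0
                output = 0;
            }
        }
    };

    int main() {
        ElevatorFSMD e;
        e.step(3); std::cout << e.output << '\n';  // 2: up two floors
        e.step(3); std::cout << e.output << '\n';  // 0: stay idle
        e.step(1); std::cout << e.output << '\n';  // -2: down two floors
    }

A 16-bit cfloor would stand in for up to 2^16 control states, which is exactly the state-explosion argument made above.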
1.5 Petri net

More formally, a Petri net is a quintuple

    <P, T, I, O, u>   (1)

where P = {p1, p2, ..., pm} is a set of places, T = {t1, t2, ..., tn} is a set of transitions, and P and T are disjoint. Further, the input function, I: T → P+, defines all the places providing input to a transition, while the output function, O: T → P+, defines all the output places for each transition. In other words, the input and output functions specify the connectivity of places and transitions. Finally, the marking function u: P → N defines the number of tokens in each place, where N is the set of non-negative integers.

Figure 6: A Petri net example. (y)

    Net = (P, T, I, O, u)
    P = {p1, p2, p3, p4, p5}
    T = {t1, t2, t3, t4}
    I: I(t1) = {p1}           O: O(t1) = {p5}        u: u(p1) = 1
       I(t2) = {p2, p3, p5}      O(t2) = {p3, p5}        u(p2) = 1
       I(t3) = {p3}              O(t3) = {p4}            u(p3) = 2
       I(t4) = {p4}              O(t4) = {p2, p3}        u(p4) = 0
                                                         u(p5) = 1

Figure 7 shows how typical system behaviors can be modeled with Petri nets. In Figure 7(a), we see the modeling of sequencing, in which transition t1 fires after transition t2. In Figure 7(b), we see the modeling of non-deterministic branching, in which two transitions are enabled but only one of them can fire. In Figure 7(c), we see the modeling of synchronization, in which a transition can fire only after both input places have tokens. Figure 7(d) shows how one would model resource contention, in which two transitions compete for the same token, which resides in the place in the center. In Figure 7(e), we see how we could model concurrency, in which two transitions, t2 and t3, can fire simultaneously. More precisely, Figure 7(e) models two concurrent processes, a producer and a consumer; the token located in the place at the center is produced by t2 and consumed by t3.

Figure 7: Petri net representations [diagrams not reproducible in text]

Petri net models can be used to check and validate certain useful system properties such as safeness and liveness. Safeness, for example, is the property of Petri nets that guarantees that the number of tokens in the net will not grow indefinitely; in a safe Petri net the number of tokens cannot become unbounded. Liveness, on the other hand, is the property of Petri nets that guarantees a deadlock-free operation, by ensuring that there is always at least one transition that can fire.
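The firing rule implied by these definitions is easy to execute: a transition t is enabled when every place in I(t) holds a token, and firing it removes one token from each input place and deposits one in each output place. The C++ sketch below (ours, not the report's) plays this token game on the net of Figure 6:

    #include <cstdio>
    #include <map>
    #include <string>
    #include <vector>

    // A Petri net <P, T, I, O, u>. Places are named; each transition
    // stores its input places I(t) and output places O(t); the map u
    // is the current marking (tokens per place).
    struct PetriNet {
        struct Trans { std::vector<std::string> in, out; };
        std::map<std::string, Trans> t;
        std::map<std::string, int> u;

        bool enabled(const std::string& name) const {
            for (const auto& p : t.at(name).in)
                if (u.at(p) < 1) return false;   // an input place is empty
            return true;
        }
        bool fire(const std::string& name) {     // the "token game"
            if (!enabled(name)) return false;
            for (const auto& p : t.at(name).in)  --u[p];  // consume tokens
            for (const auto& p : t.at(name).out) ++u[p];  // produce tokens
            return true;
        }
    };

    int main() {
        PetriNet net;                  // the example net of Figure 6
        net.u = {{"p1", 1}, {"p2", 1}, {"p3", 2}, {"p4", 0}, {"p5", 1}};
        net.t["t1"] = {{"p1"}, {"p5"}};
        net.t["t2"] = {{"p2", "p3", "p5"}, {"p3", "p5"}};
        net.t["t3"] = {{"p3"}, {"p4"}};
        net.t["t4"] = {{"p4"}, {"p2", "p3"}};

        net.fire("t2");   // consumes p2, p3, p5; produces p3, p5
        net.fire("t3");   // moves a token from p3 to p4
        for (const auto& [p, n] : net.u)
            std::printf("u(%s) = %d\n", p.c_str(), n);
    }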
1.6 Hierarchical concurrent finite-state machine

... it can quickly become incomprehensible with any increase in system complexity.

Figure 8: Statecharts: hierarchical concurrent states. (y) [diagram not reproducible in text]
1.8 Program-state machine

... C finishes or not. Since PSMs can represent a system's states, data, and activities in a single model, they are more suitable than HCFSMs for modeling systems which have complex data and activities associated with each state. A PSM can also overcome the primary limitation of programming languages, since it can model states explicitly. It allows a modeler to specify a system using hierarchical state-decomposition until he/she feels ...

2 Architectures

... models can be used to describe a system's functionality ... by specifying how the system will actually be implemented. The goal of an architecture, then, is to describe ...

2.1 Controller architecture

Figure 10: A generic controller design [block diagram with Inputs, a State register built from flip-flops, Next-state logic, Output logic and Outputs; not reproducible in text]

The register, usually called the State register, is designed to store the states in S, while the two combinational blocks, referred to as the Next-state logic and the Output logic, implement functions f and h. Inputs and Outputs are representations of the Boolean signals that are defined by the sets I and O. Since the inputs and outputs are Boolean signals, in either case, this architecture is well-suited to implementing controllers that do not require complex data manipulation. The controller synthesis consists of state minimization ...
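In software terms (a loose analogy of ours, not the report's own code), the controller of Figure 10 is a clocked loop around one register: two pure functions stand in for the Next-state and Output combinational logic, and the assignment to the state variable models the clock edge. The logic functions below are arbitrary placeholders:

    #include <cstdint>
    #include <iostream>

    using State   = std::uint8_t;    // contents of the State register
    using Inputs  = std::uint8_t;    // Boolean input signals, one per bit
    using Outputs = std::uint8_t;    // Boolean output signals, one per bit

    // Placeholder combinational blocks implementing f and h.
    State   next_state_logic(State s, Inputs in) { return (s + (in & 1u)) % 3; }
    Outputs output_logic(State s)                { return Outputs(1u << s); }

    int main() {
        State state_reg = 0;                 // reset state
        for (Inputs in : {1, 1, 0, 1}) {     // one pass per clock cycle
            std::cout << int(output_logic(state_reg)) << '\n';
            state_reg = next_state_logic(state_reg, in);  // clock edge
        }
    }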
2.2 Custom Datapath architecture

... one operand will always be the content of the Accumulator, which could also be output through a tri-state driver. The Accumulator is a shift register with a parallel load. This datapath's schematic is shown in Figure 12(a), and in Figure 12(b) we have shown the 9-bit control word that specifies the values of the control signals for the Selector, the ALU, the Accumulator and the output drivers.

Figure 12: Simple datapath with one accumulator: (a) schematic, (b) control word. [not reproducible in text]

... image processing and multimedia. A datapath architecture often consists of high-speed arithmetic units, connected in parallel and heavily pipelined in order to achieve a high throughput.

Figure 13: Two different datapaths for FIR filter. [diagrams with multipliers, adders and pipeline stages; not reproducible in text]
... When each operation in an algorithm is implemented by its own unit, as in Figure 13, we do not need a control for the system, since data simply flows from one unit to the next, and ...

2.3 FSMD architecture

... As shown in Figure 14(a), the datapath has two types of I/O ports. One type of I/O port is the data port, which is used by the outside environment to send and receive data to and from the ASIC. The data could be of type integer, floating-point, or characters, and it is usually packed into one or more words. The data ports are usually 8, 16, 32 or 64 bits wide. The other type of I/O port is the control port, which is used by the control unit to control the operations performed by the datapath and to receive information about the status of selected registers in the datapath.

Figure 14: Design model: (a) high-level block diagram, (b) register-transfer-level block diagram. (z)

As shown in Figure 14(b), the datapath takes the operands from storage units, performs the computation in the combinatorial units and returns the results to storage units during each state, which is usually equal to one clock cycle.

As mentioned in the previous section, the selection of operands, operations and the destination for the result is controlled by the control unit by setting proper values of the datapath control signals. The datapath also indicates through status signals when a particular value is stored in a particular storage unit or when a particular relation between two data values stored in the datapath is satisfied.

Similar to the datapath, a control unit has a set of input and a set of output signals. Each signal is a Boolean variable that can take a value of 0 or 1. There are two types of input signals: external signals and status signals. External signals represent the conditions in the external environment to which the FSMD architecture must respond. On the other hand, the status signals represent the state of the datapath. Their value is obtained by comparing values of selected variables stored in the datapath. There are also two types of output signals: external signals and datapath control signals. External signals indicate to the environment that the FSMD architecture has reached a certain state or finished a particular computation. The datapath controls, as mentioned before, select the operation for each component in the datapath.

FSMD architectures are used for various ASIC designs. Each ASIC design consists of one or more FSMD architectures, although two implementations may differ in the number of control units and datapaths, the number of components and connections in the datapath, the number of states in the control unit and the number of I/O ports. The FSM controller and the DSP datapath mentioned above are two special cases of this kind of architecture. In addition, the FSMD is also the basic architecture for general-purpose processors, since each processor includes both a control unit and a datapath.
2.4 CISC architecture

The primary motivation for developing an architecture of complex-instruction-set computers (CISC) was to reduce the number of instructions in compiled code, which would in turn minimize the number of memory accesses required for fetching instructions. The motivation was valid in the past, since memories were expensive and much slower than processors. The secondary motivation for CISC development was to simplify compiler construction, by including in the processor instruction set complex instructions that mimic programming language constructs. These complex instructions would reduce the semantic gap between programming and machine languages and simplify compiler construction.

Figure 15: CISC with microprogrammed control. (y) [block diagram with Control unit, Microprogram memory and Datapath; not reproducible in text]

... one register to another. Since the MicroPC is concurrently incremented to point to the next control word, this procedure will be repeated for each control word in the sequence. Finally, when the last control word is being executed, a new instruction will be fetched from the Memory, and the entire process will be repeated.

From this description, we can see that the number of control words, and thus the number of clock cycles, can vary for each instruction. As a result, instruction pipelining can be difficult to implement in CISCs. In addition, the relatively slow microprogram memory requires a clock cycle to be longer than necessary. Since instruction pipelines and short clock cycles are necessary for fast program execution, CISC architectures may not be well-suited for high-performance processors.

Although a variety of complex instructions could be ...
2.5 RISC architecture

... The larger the register file is, the smaller the number of load and store instructions in the code. When the RISC executes an instruction, the instruction pipe begins by fetching an instruction into the Instruction register. In the second pipeline stage, the instruction is then decoded and the appropriate operands are fetched from the Register file. In the third stage, one of two things occurs: the RISC either executes the required operation in the ALU or, alternatively, computes the address for the Data cache. In the fourth stage, the data is stored in either the Data cache or the Register file. Note that the execution of each instruction takes approximately only four clock cycles, which means that the instruction pipeline is short and efficient, losing very few cycles in the case of data or branch dependencies.

... the RISC compiler will need to use a sequence of RISC instructions in order to implement complex operations. At the same time, of course, although these features require more sophistication in the compiler, they also give the compiler a great deal of flexibility in performing aggressive optimization.

Finally, we should note that RISC programs tend to require 20% to 30% more program memory, due to the lack of complex instructions. However, since simpler instruction sets can make compiler design and running time much shorter, the efficiency of the compiled code is ultimately much higher. In addition, because of these simpler instruction sets, RISC processors tend to require less silicon area and have a shorter design cycle than their CISC counterparts.
2.6 VLIW architecture

... What is interesting to note here is that, ideally, the VLIW in Figure 17 would provide four times the performance we could get from a processor with a single functional unit, under the assumption that the code executing on the VLIW had four-way parallelism, which enables the VLIW to execute four independent instructions in each clock cycle. In reality, however, most code has a large amount of parallelism interleaved with code that is fundamentally serial. As a result, a VLIW with a large number of functional units might not be fully utilized. The ideal conditions would also require us to assume that all the operands were in the register file, with 8 operands being fetched and four results stored back on every clock cycle, in addition to four new operands being brought from the memory to be available for use in the next clock cycle. It must be noted, however, that this computation profile is not easy to achieve, since some results must be stored back to memory and some results may not be needed in the next clock cycle. Under these conditions, the efficiency of a VLIW datapath might be less than ideal.

Finally, we should point out that there are two technological limitations that can affect the implementation of a VLIW architecture. First, while register files with 8-16 ports can be built, the efficiency and performance of such register files tend to degrade quickly when we go beyond that number. Second, since VLIW program and data memories require a high communication bandwidth, these systems tend to require expensive high-pin packaging technology as well. Overall, these are the reasons why VLIW architectures are not as popular as RISC architectures.

2.7 Parallel architecture

In the design of parallel processors, we can take advantage of spatial parallelism by using multiple processing elements (PEs) that work concurrently. In this type of architecture, each PE may contain its own datapath with registers and a local memory. Two typical types of parallel processors are the SIMD (single instruction, multiple data) and the MIMD (multiple instruction, multiple data) processors.

In SIMD processors, usually called array processors, all of the PEs execute the same instruction in a lock-step manner. To broadcast the instructions to all the PEs and to control their execution, we generally use a single global controller. Usually, an array processor is attached to a host processor, which means that it can be thought of as a kind of hardware accelerator for tasks that are computationally intensive. In such cases, the host processor would load the data into each PE, and then collect the results after the computations are finished. When it is necessary, PEs can also communicate directly with their nearest neighbors.

The primary advantage of array processors is that they are very convenient for computations that can be naturally mapped on a rectangular grid, as in the case of image processing, where an image is decomposed into pixels on a rectangular grid, or in the case of weather forecasting, where the surface of the globe is decomposed into n-by-n-mile squares. Programming one grid point in the rectangular array processor is quite easy, since all the PEs execute the same instruction stream. However, programming any data routing through the array is very difficult, since the programmer would have to be aware of the position of each data item on every clock cycle. For this reason, problems like matrix triangulations or inversions are difficult to program on an array processor.

Array processors, then, are easy to build and easy to program, but only when the natural structure of the problem matches the topology of the array processor. As a result, they cannot be considered general-purpose machines, because users have difficulty writing programs for general classes of problems.

An MIMD processor, usually called a multiprocessor system, differs from an SIMD in that each PE executes its own instruction stream. In this kind of architecture, the program can be loaded by a host processor, or each processor can load its own program from a shared memory. Each processor can communicate with every other processor within the multiprocessor system, using one of two communication mechanisms. In a shared-memory multiprocessor, all the processors are connected to a shared memory through an interconnection network, which means that each processor can access any data in the shared memory. In a message-passing multiprocessor, on the other hand, each processor tends to have a large local memory, and sends data to other processors in the form of messages through an interconnection network. The interconnection network for a shared memory must be fast, since it is very frequently used to communicate small amounts of data, like a single word. In contrast, the interconnection network used for message passing tends to be much slower, since it is used less frequently and communicates long messages, including many words of data. Finally, it should be noted that multiprocessors are much easier to program, since they are task-oriented instead of instruction-oriented. Each task runs independently and can be synchronized after completion, if necessary. Thus, multiprocessors make program and data partitioning, code parallelization and compilation much simpler than array processors.
Such a multiprocessor, in which the interconnection network consists of several buses, is shown in Figure 18. Each processing element (PE) consists of a processor or ASIC and a local memory connected by the local bus. The shared or global memory may be either ... ; ... handshaking between the two PEs is performed via ...

Figure 18: A heterogeneous multiprocessor [block diagram with processing elements Proc1/LM1 and Proc2/LM2, an Arbiter and a shared SBus; not reproducible in text]

Figure 19: Some typical configurations [block diagram with PE1, PE2, PE3, Sbus1, Sbus2, Arbiter1 and Arbiter2; not reproducible in text]

3 Languages
3.1 Introduction

... designers to work on. Increasingly, designers need to conceptualize the system using an executable specification language, which is capable of capturing the functionality of the system in a machine-readable and simulatable form.

Such an approach has several advantages. First, simulating an executable specification allows the designer to verify the correctness of the system's intended functionality. In the traditional approach, which started with a natural-language specification, such verification would not be possible until enough of the design had been completed to obtain a simulatable system description (usually gate-level schematics). The second advantage of this approach is that the specification can serve as an input to codesign tools, which, in turn, can be used to obtain an implementation of the system, ultimately reducing design times by a significant amount. Third, such a specification can serve as comprehensive documentation, providing an unambiguous description of the system's intended functionality. Finally, it also serves as a good medium for the exchange of design information among various users and tools. As a result, some of the problems associated with system integration can be minimized, since this approach would emphasize well-defined system components that could be designed independently by different designers.

The increasing design complexity associated with systems-on-a-chip also makes an executable modeling language extremely desirable, where an intermediate implementation can be represented and validated before proceeding to the next synthesis step. For the same reason, we need such a modeling language to be able to describe design artifacts from previous designs and intellectual properties (IP) provided by other sources.

Since different conceptual models possess different characteristics, any given specification language can be well or poorly suited for that model, depending on whether it supports all or just a few of the model's characteristics. To find the language that can capture a given conceptual model directly, we would need to establish a one-to-one correlation between the characteristics of the model and the constructs in the language.

3.2 Characteristics of system models

In this section, we will present some of the characteristics most commonly found in modeling systems. In presenting these characteristics, part of our goal will be to assess how useful each characteristic is in capturing one or more types of system behavior.

3.3 Concurrency

Any system can be decomposed into chunks of functionality called behaviors, each of which can be described in several ways, using the concepts of processes, procedures or state machines. In many cases, the functionality of a system is most easily conceptualized as a set of concurrent behaviors, simply because representing such systems using only sequential constructs would result in complex descriptions that can be difficult to comprehend. If we can find a way to capture concurrency, however, we can usually obtain a more natural representation of such systems. For example, consider a system with only two concurrent behaviors that can be individually represented by the finite-state machines F1 and F2. A standard representation of the system would be a cross product of the two finite-state machines, F1 × F2, potentially resulting in a large number of states.
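The state blow-up is easy to see in code. In this small sketch (ours, not from the report), two independent three-state machines are flattened into a single FSM whose state is the pair of their states, giving |S1| · |S2| = 9 states even though the two behaviors never interact:

    #include <iostream>
    #include <utility>

    // Two independent 3-state machines F1 and F2, each just cycling.
    int f1(int s) { return (s + 1) % 3; }
    int f2(int s) { return (s + 2) % 3; }

    // Flat cross-product machine F1 x F2: 3 * 3 = 9 states, and in
    // general |S1| * |S2|, although the behaviors never interact.
    std::pair<int, int> f(std::pair<int, int> s) {
        return { f1(s.first), f2(s.second) };
    }

    int main() {
        std::pair<int, int> s{0, 0};
        for (int i = 0; i < 4; ++i) {
            std::cout << "(" << s.first << "," << s.second << ")\n";
            s = f(s);
        }
    }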
A more elegant solution, then, would be to use a conceptual model that has two or more concurrent finite-state machines, as do the Statecharts [Har87] and many other concurrent languages.

Concurrency representations can be classified into two groups, data-driven or control-driven, depending on how explicitly the concurrency is indicated. Furthermore, a special class of data-driven concurrency called pipelined concurrency is of particular importance to signal processing applications.

Data-driven concurrency: Some behaviors can be clearly described as sets of operations or statements without specifying any explicit ordering for their execution. In a case like this, the order of execution would be determined only by the data dependencies between them. In other words, each operation will perform a computation on input data, and then output new data, which will, in turn, be input to other operations. Operation executions in such dataflow descriptions depend only upon the availability of data, rather than upon the physical location of the operation or statement in the specification. Dataflow representations can be easily derived from programming languages using the single assignment rule, which means that each variable can appear exactly once on the left-hand side of an assignment statement.

Figure 20: Data-driven concurrency: (a) dataflow statements, (b) dataflow graph generated from (a). (y)

(a) 1: q = a + b
    2: y = p + x
    3: p = (c − d) * q

(b) [dataflow graph not reproducible in text]

Consider, for example, the single assignment statements in Figure 20(a). As in any other data-driven execution, it is of little consequence that the assignment to p follows the statement that uses the value of p to compute the value of y. Regardless of the sequence of the statements, the operations will be executed solely as determined by the availability of data, as shown in the dataflow graph of Figure 20(b). Following this principle, we can see that, since a, b, c and d are primary inputs, the add and subtract operations in statements 1 and 3 will be carried out first. The results of these two computations will provide the data required for the multiplication in statement 3. Finally, the addition in statement 2 will be performed to compute y.

Pipelined concurrency: The dataflow description in the previous section can be viewed as a set of operations which consume data from their inputs and produce data on their outputs. Since the execution of each operation is determined by the availability of its input data, the degree of concurrency that can be exploited is limited by data dependencies. However, when the same dataflow operations are applied to a stream of data samples, we can use pipelined concurrency to improve the throughput, that is, the rate at which the system is able to process the data stream. Such throughput improvement is achieved by dividing the operations into groups, called pipeline stages, which operate on different data sets in the stream. By operating on different data sets, pipeline stages can run concurrently. Note that each stage will take the same amount of time, called a cycle, to compute its results.

For example, Figure 21(a) shows a dataflow graph operating on the data set a(n), b(n), c(n), d(n) and x(n), while producing the data set q(n), p(n) and y(n), where the index n indicates the nth data item in the stream, called data sample n. Figure 21(a) can be converted into a pipeline by partitioning the graph into three stages, as shown in Figure 21(b).

In order for the pipeline stages to execute concurrently, storage elements such as registers or FIFO queues have to be inserted between the stages (indicated by thick lines in Figure 21(b)). In this way, while the second stage is processing the results produced by the first stage in the previous cycle, the first stage can simultaneously process the next data sample in the stream. Figure 21(c) illustrates the pipelined execution of Figure 21(b), where each row represents a stage and each column represents a cycle. In the third column, for example, while the first stage is adding a(n+2) and b(n+2) and subtracting c(n+2) and d(n+2), the second stage is multiplying (a(n+1) + b(n+1)) and (c(n+1) − d(n+1)), and the third stage is finishing the computation of the nth sample by adding (a(n) + b(n)) × (c(n) − d(n)) to x(n).

Figure 21: Pipelined concurrency: (a) original dataflow, (b) pipelined dataflow, (c) pipelined execution. [diagrams not reproducible in text]
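The following C++ sketch (ours; the stage partitioning follows Figure 21(b)) simulates the three-stage pipeline for y(n) = (a(n) + b(n)) * (c(n) − d(n)) + x(n). The structs r1 and r2 play the role of the inter-stage registers, and all registers are updated together to model the common clock:

    #include <cstdio>

    int main() {
        const int N = 4;
        int a[N] = {1, 2, 3, 4}, b[N] = {1, 1, 1, 1};
        int c[N] = {5, 6, 7, 8}, d[N] = {1, 2, 3, 4}, x[N] = {10, 10, 10, 10};
        int y[N];

        struct Reg { int q, p; };       // inter-stage pipeline registers
        Reg r1 = {0, 0}, r2 = {0, 0};   // (the thick lines in Figure 21(b))

        for (int cycle = 0; cycle < N + 2; ++cycle) {
            // Stage 3: finish sample n-2 by adding x.
            if (cycle >= 2) y[cycle - 2] = r2.p + x[cycle - 2];
            // Stage 2: multiply the sum and difference of sample n-1.
            Reg nr2 = { r1.q, r1.q * r1.p };
            // Stage 1: add and subtract for sample n.
            Reg nr1 = {0, 0};
            if (cycle < N) nr1 = { a[cycle] + b[cycle], c[cycle] - d[cycle] };
            // Clock edge: all stages latch their results simultaneously.
            r1 = nr1; r2 = nr2;
        }
        for (int n = 0; n < N; ++n)
            std::printf("y(%d) = %d\n", n, y[n]);   // (a+b)*(c-d)+x
    }

Each loop iteration is one cycle; the three stage blocks correspond to the rows of Figure 21(c), and the iterations to its columns.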
Control-driven concurrency: The key concept in control-driven concurrency is the control thread, which can be defined as a set of operations in the system that must be executed sequentially. As mentioned above, in data-driven concurrency, it is the dependencies between operations that determine the execution order. In control-driven concurrency, by contrast, it is the control thread or threads that determine the order of execution. In other words, control-driven concurrency is characterized by the use of explicit constructs that specify multiple threads of control, all of which execute in parallel.

Control-driven concurrency can be specified at the task level, where constructs such as fork-joins and processes can be used to specify the concurrent execution of operations. Specifically, a fork statement creates a set of concurrent control threads, while a join statement waits for the previously forked control threads to terminate. The fork statement in Figure 22(a), for example, spawns three control threads A, B and C, all of which execute concurrently. The corresponding join statement must wait until all three threads have terminated, after which the statements in R can be executed. In Figure 22(b), we can see how process statements are used to specify concurrency. Note that, while a fork-join statement starts from a single control thread and splits it into several concurrent threads, as shown in Figure 22(c), a process statement represents the behavior as a set of concurrent threads, as shown in Figure 22(d). For example, the process statements of Figure 22(b) create three processes A, B and C, each representing a different control thread. Both fork-join and process statements may be nested, and both approaches are equivalent to each other in the sense that a fork-join can be implemented using nested processes and vice versa.

Figure 22: Control-driven concurrency: (a) fork-join statement, (b) process statement, (c) control threads for fork-join statements, (d) control threads for process statement. (y)

(a) sequential behavior X
      begin
        Q();
        fork A(); B(); C(); join;
        R();
      end behavior X;

(b) concurrent behavior X
      begin
        process A();
        process B();
        process C();
      end behavior X;

(c), (d) [control-thread diagrams not reproducible in text]
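In a modern language the fork-join of Figure 22(a) maps directly onto threads. The sketch below (ours, using C++ std::thread) forks the three control threads A, B and C and joins them before R executes:

    #include <cstdio>
    #include <thread>

    void Q() { std::puts("Q"); }
    void A() { std::puts("A"); }   // three concurrent control threads
    void B() { std::puts("B"); }
    void C() { std::puts("C"); }
    void R() { std::puts("R"); }

    int main() {
        Q();                                // single control thread
        std::thread ta(A), tb(B), tc(C);    // fork A(); B(); C();
        ta.join(); tb.join(); tc.join();    // join: wait for all three
        R();                                // resumes only after the join
    }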
3.4 State transitions

Systems are often best conceptualized as having various modes, or states, of behavior, as in the case of controllers and telecommunication systems. For example, a traffic-light controller [DH89] might incorporate different modes for day and night operation, for manual and automatic functioning, and for the status of the traffic light itself.

In systems with various modes, the transitions between these modes sometimes occur in an unstructured manner, as opposed to a linear sequencing through the modes. Such arbitrary transitions are akin to the use of goto statements in programming languages. For example, Figure 23 depicts a system that transitions between modes P, Q, R, S and T, with the sequencing determined solely by certain conditions. Given a state machine with N states, there can be N × N possible transitions among them.

Figure 23: State transitions between arbitrarily complex behaviors. (y) [state diagram with states P, Q, R, S, T and transitions labeled u, v, w, x, y, z; not reproducible in text]

In systems like this, transitions between modes can be triggered by the detection of certain events or certain conditions. For example, in Figure 23, the transition from state P to state Q will occur whenever event u happens while in P. In some systems, actions can be associated with each transition, and a particular mode or state can have an arbitrarily complex behavior or computation associated with it. In the case of the traffic-light controller, for example, the behavior in one mode might depend on the time of day and the traffic density. ...

3.5 Hierarchy

... such as declaration types, variables and subprogram names. Since a lack of hierarchy would make all such objects global, it would be difficult to relate them to their particular use in the model, and it could hinder our efforts to reuse these names in different portions of the same model.

There are two distinct types of hierarchy, structural hierarchy and behavioral hierarchy, both of which are commonly found in conceptual views of systems.

Structural hierarchy: A structural hierarchy is one in which a system specification is represented as a set of interconnected components. Each of these components, in turn, can have its own internal structure, which is specified with a set of lower-level interconnected components, and so on. Each instance of an interconnection between components represents a set of communication channels connecting the components. The advantage of a model that can represent a structural hierarchy is that it can help the designer to conceptualize new components from a set of existing components.

Figure 24: Structural hierarchy. (y) [diagram not reproducible in text]
... into its corresponding control logic represented as a set of gates.

Behavioral hierarchy: The specification of a behavioral hierarchy is defined as the process of decomposing a behavior into distinct subbehaviors, which can be either sequential or concurrent.

The sequential decomposition of a behavior may be represented as either a set of procedures or a state machine. In the first case, a procedural sequential decomposition of a behavior is defined as the process of representing the behavior as a sequence of procedure calls. Even in the case of a behavior that consists of a single set of sequential statements, we can still think of that behavior as comprising a procedure which encapsulates those statements. A procedural sequential decomposition of behavior P is shown in Figure 25(a), where behavior P consists of a sequential execution of the subbehaviors represented by procedures Q and R. Behavioral hierarchy would be represented here by nested procedure calls. Recursion in procedures allows us to specify a dynamic behavioral hierarchy, which means that the depth of the hierarchy will be determined only at run time.

Figure 25: Sequential behavioral decomposition.

(a) behavior P
      variable x, y;
    begin
      Q(x);
      R(y);
    end P;

(b) [state-machine diagram with behaviors Q (subbehaviors Q1, Q2, Q3) and R (subbehaviors R1, R2) and transitions e1 through e8; not reproducible in text]

In the second case, a state-machine decomposition distinguishes simple, group and hierarchical transitions. A simple transition is similar to that which connects states in an FSM model, in that it causes control to be transferred between two states that both occupy the same level of the behavioral hierarchy. In Figure 25(b), for example, the transition triggered by event e1 transfers control from behavior Q1 to Q2. Group transitions are those which can be specified for a group of states, as is the case when event e5 causes a transition from any of the subbehaviors of Q to the behavior R. Hierarchical transitions are those (simple or group) transitions which span several levels of the behavioral hierarchy. For example, the transition labeled e6 transfers control from behavior Q3 to behavior R1, which means that it must span two hierarchical levels. Similarly, the transition labeled e7 transfers control from Q to state R2, which is at a lower hierarchical level.

For a sequentially decomposed behavior, we must explicitly specify the initial subbehavior that will be activated whenever the behavior is activated. In Figure 25(b), for example, Q is the first subbehavior that is active whenever its parent behavior P is activated, since a solid triangle points to this first subbehavior. Similarly, Q1 and R1 would be the initial subbehaviors of behaviors Q and R, respectively.

The concurrent decomposition of behaviors allows subbehaviors to run in parallel or in pipelined fashion.

Figure 26: Behavioral decomposition types: sequential, concurrent and pipelined. [diagrams not reproducible in text]
In Figure 26(b), A, B and C run in parallel, which means that they will start when X starts, and when all of them finish, X will finish, just like the fork-join construct discussed in Section 3.3. In Figure 26(c), A, B and C run in pipelined mode, which means that they represent pipeline stages which run concurrently, where A supplies data to B and B to C, as discussed in Section 3.3.

3.6 Programming constructs

Many behaviors can best be described as sequential algorithms. Consider, for example, the case of a system intended to sort a set of numbers stored in an array, or one designed to generate a set of random numbers. ...

3.7 Behavioral completion

In the finite-state machine model, we usually designate an explicitly defined set of states as final states. This means that, for a state machine, completion will have occurred when control flows to one of these final states, as shown in Figure 28(a).

In cases where we use programming language constructs, a behavior will be considered complete when the last statement in the program has been executed. For example, whenever control flows to a return statement, or when the last statement in the procedure is executed, a procedure is said to be complete.

... transition to program-state Y. Similarly, program-state B will be said to have completed whenever control flows along the TOC arc labeled e4 from program-state Y ...

Figure 28: Behavioral completion [diagrams not reproducible in text]
3.9 Timing

Timing is particularly important for real-time systems, whose performance is measured in terms of how well the implementation respects the timing constraints. A favorite example of such systems would be an aircraft controller, where failure to respond to an abnormal event within a predefined time limit will lead to disaster.

Figure 30: Timing diagram.

3.10 Communication

In general, systems consist of several interacting behaviors which need to communicate with each other in order to cooperate. Thus a general communication model is necessary for system specification.

In traditional programming languages, the standard forms of communication between functions are shared variable access and parameter passing in procedure calls. These mechanisms provide communication in an abstract form: the way the communication is performed is predefined and hidden from the programmer. For example, functions communicate through global variables, which share a common memory space, or via parameter passing. In the case of local procedure calls, parameter passing is implemented by exchanging information on the stack or through processor registers. In the case of remote procedure calls, parameters are passed via the complex protocol of marshaling/unmarshaling and sending/receiving data through a network.

While these mechanisms are sufficient for standard programming languages, they poorly address the needs of systems-on-a-chip descriptions, where the way the communication is performed is often custom and impossible to predefine. For example, in telecommunication applications the major task of modeling the system is actually describing custom communication procedures. Hence, it is very important for a system description language to provide

(a) a mechanism to separate the specification of computation and communication;

(b) a mechanism to declare abstract communication functions in order to describe what they are and how they can be used;

(c) a mechanism to define a custom communication implementation which describes how the communication is actually performed.

In order to find a general communication model, the structure of a system must be defined. A system's structure consists of a set of blocks which are interconnected through a set of communication channels. While the behavior in the blocks specifies how the computation is performed and when the communication is started, the channels encapsulate the communication implementation. In this way, blocks and channels effectively separate the specification of communication and computation.

Each block in a system contains a behavior and a set of ports through which the behavior can communicate. Each channel contains a set of communication functions and a set of interfaces. An interface declares a subset of the functions of the channel which can be used by the connected behaviors. So while the declaration of the communication functions is given in the interfaces, the implementation of these functions is specified in the channel.

Figure 31: Communication model: blocks B1 and B2 are connected through their ports P1 and P2 to the interfaces I1 and I2 of channel C.
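This separation of interface and implementation has a rough analogue in ordinary C++. In the following hedged sketch (the names IData, CQueue, B1 and B2 are invented, and this is not SpecC+ syntax), the blocks see only the interface type through their ports, while the channel encapsulates both the medium and the implementation:

    #include <queue>

    // Interface: declares the communication functions only.
    struct IData {
        virtual void send(int d) = 0;
        virtual int  receive()   = 0;
        virtual ~IData() = default;
    };

    // Channel: encapsulates the medium and the implementation.
    class CQueue : public IData {
        std::queue<int> storage;            // communication medium
    public:
        void send(int d) override { storage.push(d); }
        int  receive() override {           // assumes data is available
            int d = storage.front(); storage.pop(); return d;
        }
    };

    // Blocks communicate only through their interface-typed ports.
    struct B1 { IData &port; void main() { port.send(42); } };
    struct B2 { IData &port; void main() { int y = port.receive(); (void)y; } };

    int main() {
        CQueue c;                           // the channel
        B1 b1{c}; B2 b2{c};                 // ports bound to the channel
        b1.main(); b2.main();
    }

Replacing CQueue with any other class that implements IData leaves B1 and B2 untouched, which is exactly the property the block/channel model aims for.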
For example, the system shown in Figure 31 contains two blocks B1 and B2, and a channel C. Block B1 communicates with the left interface I1 of channel C via its port P1. Similarly, block B2 accesses the right interface I2 of channel C through its port P2. Note that blocks B1 and B2 can easily be replaced by other blocks, as long as the port types stay the same. Similarly, channel C can be exchanged with any other channel whose methods specify how data is transferred over the channel. This model also encourages the separation of computation and communication, since the functionality responsible for communication can be confined in the channel specification and will not be mixed with the description used for computation.
Figure 32: Examples of communication: (a) shared memory, (b) channel, (c) hierarchical channel.

(a) Shared memory: behaviors B1 and B2 communicate through a shared variable M:

    B1:  int x;  ...  M = x;  ...
    B2:  int y;  ...  y = M;  ...

(b) Channel: B1 and B2 communicate by calling the methods of channel C, which encapsulates the transfer:

    B1:  int x;  ...  C.send(x);        ...
    C:   void send(int d) { ... }   int receive(void) { ... }
    B2:  int y;  ...  y = C.receive();  ...

(c) Hierarchical channel: channel C1 contains a subchannel C2, accessed by B1 and B2 through ports P1 and P2.
3.11 Process synchronization

Figure: control-dependent synchronization. The synchronization point follows from the control structure itself, as in the fork-join construct:

    behavior X
    begin
      Q();
      fork A(); B(); C(); join;
      R();
    end behavior X;

Here the join is the synchronization point: A, B and C start together after Q completes, and R starts only after all three have completed.

Figure: data-dependent synchronization: (a) by a common event e, on which behaviors A and B both transition; (b) by a common variable, where A initializes x:=0, later sets x:=1, and B moves from B1 to B2 on the condition e(x=1); (c) by status detection, where B proceeds once it detects that A has entered state A2.

Figure 38: Graphical SpecC+ specification example: actor B contains the concurrent actors X (with sequential children X1 and X2 and arcs e1 and e2) and the leaf actor Y, whose body computes the maximum of an array (max = array[j]; ...; m = max) between its ports i and o.
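A hedged C++ sketch of the common-variable scheme above (the names A, B and x follow the figure; everything else is assumed): B blocks until A has set the shared variable, mirroring the condition e(x=1):

    #include <condition_variable>
    #include <mutex>
    #include <thread>

    std::mutex m;
    std::condition_variable e;   // plays the role of event e
    int x = 0;                   // the common variable (A1: x := 0)

    void A() {
        { std::lock_guard<std::mutex> lk(m); x = 1; }  // A2: x := 1
        e.notify_all();
    }

    void B() {
        std::unique_lock<std::mutex> lk(m);
        e.wait(lk, []{ return x == 1; });              // B waits for e(x=1)
        // B2 runs here, after the synchronization point
    }

    int main() {
        std::thread tb(B), ta(A);
        ta.join(); tb.join();
    }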
SpecC+ supports structural hierarchy in the sense that it captures a system as a hierarchy of actors. Each actor is either a composite actor or a leaf actor.

Composite actors are decomposed hierarchically into a set of child actors. For structural hierarchy, the child actors are interconnected via the communication channels by child actor instantiation statements, similar to component instantiation in VHDL. For example, actor X is instantiated in line 57 of Figure 39 by mapping its ports a and c to the port (p) and communication channel (ch) defined in its parent actor B. For behavioral hierarchy, the child actors can either be concurrent, in which case all child actors are active whenever the parent actor is active, or sequential, in which case the child actors are only active one at a time. In Figure 38, actors B and X are composite actors. Note that while B consists of the concurrent child actors X and Y, X consists of the sequential child actors X1 and X2.

Leaf actors are those that exist at the bottom of the hierarchy, whose functionality is specified with imperative programming constructs. In Figure 38, for example, Y is a leaf actor.

SpecC+ also supports state transitions, in the sense that we can represent the sequencing between child actors by means of a set of transition arcs. In this language, an arc is represented as a 3-tuple <T, C, N>, where T represents the type of transition, C represents the condition triggering the transition, and N represents the next actor to which control is transferred by the transition. If no condition is associated with the transition, it is assumed to be "true" by default.

SpecC+ supports two types of transition arcs. A transition-on-completion arc (TOC) is traversed whenever the source actor has completed its computation and the associated condition evaluates as true. A leaf actor is said to have completed when its last statement has been executed. A sequentially decomposed actor is said to be complete only when it makes a transition to a special predefined completion point, indicated by the name complete in the next-actor field of a transition arc. In Figure 38, for example, we can see that actor X completes only when child actor X2 completes and control flows from X2 to the complete point when cond2 is true (as specified by the arc <TOC, cond2, complete> in line 37 of Figure 39). Finally, a concurrently decomposed actor is said to be completed when all of its child actors have completed. In Figure 38, for example, actor B completes when all the concurrent child actors X and Y have completed.

     1  typedef int TData[16];
     2
     3  interface IData( void ) {
     4    TData read( void );
     5    void write( TData d );
     6  };
     7
     8  channel CData( void ) implements IData {
     9    bool valid;
    10    event s;
    11    TData storage;
    12
    13    TData read( void ) {
    14      if( !valid ) s.wait();
    15      return storage;
    16    }
    17    void write( TData d ) {
    18      storage = d; valid = 1; s.notify();
    19    }
    20  };
    21
    22  actor X1( in TData i, out TData o ) { ... };
    23  actor X2( in TData i, IData o ) {
    24    void main( void ) {
    25      ...
    26      o.write(...);
    27    }
    28  };
    29
    30  actor X( in int a, IData c ) {
    31    TData s;
    32    X1 x1( a, s );
    33    X2 x2( s, c );
    34
    35    psm main( void ) {
    36      x1 : ( TI, cond1, x2 );
    37      x2 : ( TOC, cond2, complete );
    38    }
    39  };
    40
    41  actor Y( IData c, out int m ) {
    42    void main( void ) {
    43      int max, j;
    44      TData array;
    45
    46      array = c.read();
    47      max = 0;
    48      for( j = 0; j < 16; j ++ )
    49        if( array[j] > max )
    50          max = array[j];
    51      m = max;
    52    }
    53  };
    54
    55  actor B( in TData p, out int q ) {
    56    CData ch;
    57    X x( p, ch );
    58    Y y( ch, q );
    59
    60    csp main( void ) {
    61      par { x.main(); y.main(); }
    62    }
    63  };

Figure 39: A textual SpecC+ specification example.
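The <T, C, N> arc representation can be mimicked in ordinary C++ (a sketch with invented names, not the SpecC+ execution semantics): each actor names its outgoing arcs, and a simple sequencer traverses TOC arcs after the actor's body returns. TI preemption is deliberately not modeled here:

    #include <functional>
    #include <map>
    #include <string>
    #include <vector>

    enum class ArcType { TOC, TI };

    // One transition arc <T, C, N>: type, triggering condition, next actor.
    struct Arc {
        ArcType type;
        std::function<bool()> cond;
        std::string next;                  // next actor, or "complete"
    };

    // Toy sequencer for TOC arcs only: run the current actor's body to
    // completion, then take the first arc whose condition holds.
    void run_psm(std::map<std::string, std::function<void()>> body,
                 std::map<std::string, std::vector<Arc>> arcs,
                 std::string current) {
        while (current != "complete") {
            body[current]();
            bool moved = false;
            for (const Arc& a : arcs[current])
                if (a.type == ArcType::TOC && a.cond()) {
                    current = a.next; moved = true; break;
                }
            if (!moved) break;             // no enabled arc: stop
        }
    }

    int main() {
        auto always = []{ return true; }; // condition "true" by default
        std::map<std::string, std::function<void()>> body = {
            {"x1", []{ /* leaf actor X1 */ }},
            {"x2", []{ /* leaf actor X2 */ }},
        };
        std::map<std::string, std::vector<Arc>> arcs = {
            {"x1", {{ArcType::TOC, always, "x2"}}},
            {"x2", {{ArcType::TOC, always, "complete"}}},
        };
        run_psm(body, arcs, "x1");
    }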
Unlike the TOC arc, a transition-immediately arc (TI) is traversed instantaneously whenever the associated condition becomes true, regardless of whether the source actor has or has not completed its computation. For example, in Figure 38, the arc <TI, cond1, x2> terminates X1 whenever cond1 is true and transfers control to actor X2. In other words, a TI arc effectively terminates all lower-level child actors of the source actor.

Transitions are represented in Figure 38 with directed arrows. In the case of a sequentially decomposed actor, an inverted bold triangle points to the first child actor. An example of such an initial child actor is X1 of actor X. The completion of sequentially decomposed actors is indicated by a transition arc pointing to the completion point, represented as a bold square within the actor. Such a completion point is found in actor X (transition from X2 labeled e2). TOC arcs originate from a bold square inside the source child actor, as does the arc labeled e2. TI arcs, in contrast, originate from the perimeter of the source child actor, as does the arc labeled e1.

SpecC+ supports both data-dependent synchronization and control-dependent synchronization. In the first method, actors can synchronize using a common event. For example, in Figure 38, actor Y is the consumer of the data produced by actor X via channel ch, which is of type CData (Figure 39). In the implementation of CData at line 8 of Figure 39, an event s is used to make sure Y can get valid data. Furthermore, the fact that X and Y are concurrent actors enclosed in B automatically provides control-dependent synchronization: both are activated when B is activated, and B completes only when both have completed.

A channel, then, consists of a set of function declarations, given in its interfaces, and a set of function implementations. For example, the channel CData encapsulates the media s and storage and an implementation of the methods read and write. The interface and the channel are related by the implements keyword. A channel related to an interface in this way is said to implement this interface, meaning the channel is obligated to implement the set of functions prescribed by the interface. For example, CData has to implement read and write, since they appear in IData. It is possible that several channels implement the same interface, which implies that they can provide different implementations of the same set of functions.

Interfaces are usually used as port data types in the port declarations of an actor (as port c of actor Y at line 41 of Figure 39). A port of interface type will be bound to a particular channel which implements such an interface during actor instantiation. For example, port c of actor Y is mapped to channel ch of actor B when actor Y is instantiated.

The fact that a port of interface type is not bound to a real channel until actor instantiation is called late binding. Such a late binding mechanism helps to improve the reusability of an actor description, since it is possible to plug in any channel as long as it implements the same interface.

Figure 40: Component wrapper example: inside actor ASystem, a behavior (word reg[8]; ... read_word(0x1, &reg[0]); ...) accesses a memory through the read_word method of a wrapper channel such as CSramWrapper, which drives the memory pins (cs, ras, cas, ...).
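Late binding has a direct analogue in object-oriented languages. In the following hedged C++ sketch (all names invented), the actor is written against the interface only, and either channel can be plugged in when the actor is instantiated:

    #include <cstdio>

    struct IAdd {                       // interface type used by the port
        virtual int add(int a, int b) = 0;
        virtual ~IAdd() = default;
    };

    struct CFast : IAdd { int add(int a, int b) override { return a + b; } };
    struct CLogged : IAdd {             // same interface, different implementation
        int add(int a, int b) override {
            std::printf("add(%d,%d)\n", a, b);
            return a + b;
        }
    };

    struct Actor {
        IAdd &port;                     // bound only at instantiation: late binding
        void main() { std::printf("%d\n", port.add(1, 2)); }
    };

    int main() {
        CFast fast; CLogged logged;
        Actor a1{fast}, a2{logged};     // plug in either channel
        a1.main(); a2.main();
    }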
In the example of Figures 40 and 41, actor AAsic uses IRam as its port, so that its behavior can make function calls to the methods read_word and write_word without knowing how these methods are actually implemented. There are two types of memories available in the library, represented by the actors ASram and ADram respectively, the descriptions of which provide their behavioral models. Obviously, the static RAM ASram and the dynamic RAM ADram have different pins and timing protocols, which can be encapsulated, together with the component actors themselves, in channels called wrappers, such as CSramWrapper and CDramWrapper in Figure 40. When the actor AAsic is instantiated in actor ASystem (lines 52 and 53 in Figure 41), the port of type IRam will be resolved to either CSramWrapper or CDramWrapper.

The improvement in reusability from this style of specification is twofold. First, the encapsulation of communication protocols in the channel specification makes these channels highly reusable, since they can be stored in a library and instantiated at will. If these channel descriptions are provided by component vendors, the error-prone effort spent on understanding data sheets and interfacing the components can be greatly reduced. Secondly, actor descriptions such as AAsic can be stored in the library and easily reused without any change, regardless of changes to the other components with which they interface.

It should be noted that while the methods of an actor represent the behavior of the actor itself, the methods of a channel represent the behavior of their callers. In other words, when the described system is implemented, the methods of the channels will be inlined into the connected actors. When a channel is inlined, the encapsulated media get exposed and its methods are moved to the callers. In the case of a wrapper, the encapsulated actors also get exposed.

Figure 42 shows some typical configurations. In Figure 42(a), two synthesizable components A and B (e.g., actors to be implemented on an ASIC) are interconnected via a channel C, for example a standard bus. Figure 42(b) shows the situation after inlining: the methods of the channel C are inserted into the actors and the bus wires are exposed. In Figure 42(c), a synthesizable component A communicates with a fixed component B (e.g., an off-the-shelf component) through a wrapper W. When W is inlined, as shown in Figure 42(d), the fixed component B and the signals get exposed. In Figure 42(e), again, a synthesizable component A communicates with a fixed component B using a predefined protocol, which is encapsulated in the channel C. However, B has its own built-in protocol, which is encapsulated in the wrapper W, so a protocol transducer T sits between them; Figure 42(f) shows this configuration after inlining.

     1  interface IRam( void ) {
     2    void read_word( word a, word *d );
     3    void write_word( word a, word d );
     4  };
     5
     6  actor AAsic( IRam ram ) {
     7    word reg[8];
     8
     9    void main( void ) {
    10      ...
    11      ram.read_word( 0x0001, &reg[0] );
    12      ...
    13      ram.write_word( 0x0002, reg[4] );
    14    }
    15  };
    16
    17  actor ASram( in signal<word> addr,
    18               inout signal<word> data,
    19               in signal<bit> rd, in signal<bit> wr ) {
    20    ...
    21  };
    22
    23  actor ADram( in signal<word> addr,
    24               inout signal<word> data,
    25               in signal<bit> cs, in signal<bit> we,
    26               out signal<bit> ras, out signal<bit> cas ) {
    27    ...
    28  };
    29
    30  channel CSramWrapper( void ) implements IRam {
    31    signal<word> addr, data;  // address, data
    32    signal<bit> rd, wr;       // read/write select
    33    ASram sram( addr, data, rd, wr );
    34
    35    void read_word( word a, word *d ) { ... }
    36    void write_word( word a, word d ) { ... }
    37    ...
    38  };
    39
    40  channel CDramWrapper( void ) implements IRam {
    41    signal<word> addr, data;  // address, data
    42    signal<bit> cs, we;       // chip select, write enable
    43    signal<bit> ras, cas;     // row, col address strobe
    44    ADram dram( addr, data, cs, we, ras, cas );
    45
    46    void read_word( word a, word *d ) { ... }
    47    void write_word( word a, word d ) { ... }
    48    ...
    49  };
    50
    51  actor ASystem( void ) {
    52    CSramWrapper ram;     // can be replaced by
    53    // CDramWrapper ram;  // this declaration
    54    AAsic asic( ram );
    55
    56    void main( void ) { ... }
    57  };

Figure 41: Source code of the component wrapper specification.
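Channel inlining can be pictured with an ordinary function call (a hedged C++ sketch; bus_write and the wire variables are invented): before inlining, the actor calls the channel method; after inlining, the method body, and with it the wires, appear in the actor itself:

    // Before inlining: the actor calls through the channel.
    struct Bus {                 // channel encapsulating the wires
        int addr_wire = 0, data_wire = 0;
        void bus_write(int a, int d) { addr_wire = a; data_wire = d; }
    };
    struct ActorBefore {
        Bus &bus;
        void main() { bus.bus_write(0x1, 42); }
    };

    // After inlining: the method body is moved into the actor and
    // the wires are exposed as ordinary shared variables.
    int addr_wire = 0, data_wire = 0;
    struct ActorAfter {
        void main() { addr_wire = 0x1; data_wire = 42; }
    };

    int main() {
        Bus b; ActorBefore a1{b}; a1.main();
        ActorAfter a2; a2.main();
    }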
Figure 42: Typical channel and wrapper configurations (a)-(f). Legend: synthesizable component, fixed component, protocol transducer, inlined component, channel, wrapper.

     1  void read_word( word a, word *d ) {
     2    do {
     3      t1: { addr = a; }
     4      t2: { rd = 1; }
     5      t3: { }
     6      t4: { *d = data; }
     7      t5: { addr.disconnect(); }
     8      t6: { rd = 0; }
     9      t7: { break; }
    10    }
    11    timing {
    12      range( t1; t2; 0; );
    13      range( t1; t3; 10; 20 );
    14      range( t2; t3; 10; 20 );
    15      range( t3; t4; 0; );
    16      range( t4; t5; 0; );
    17      range( t5; t7; 10; 20 );
    18      range( t6; t7; 5; 10 );
    19    }
    20  };

Figure 43: Timing specification of the SRAM read protocol.

The specification in Figure 43 consists of two parts. The first part lists all the events of the diagram. Events are specified as a label and its associated piece of code, which describes the change in signal values. The second part specifies the timing constraints between these events by means of the range statements of the SpecC+ language.
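Each range(ta; tb; min; max) constrains the skew between two events. A hedged C++ sketch of what such a constraint means (the timestamps and names are invented): given measured event times, the constraint holds if the separation lies in [min, max], with an open bound when the maximum is omitted:

    #include <cassert>
    #include <map>
    #include <string>

    // Measured occurrence times of the labeled events (ns, invented values).
    std::map<std::string, double> T = {
        {"t1", 0}, {"t2", 0}, {"t3", 15}, {"t4", 15},
        {"t5", 15}, {"t6", 18}, {"t7", 27},
    };

    // range(a; b; lo; hi): event b must follow event a by lo..hi ns.
    // A negative hi stands for an omitted (unbounded) maximum.
    bool range(const std::string& a, const std::string& b,
               double lo, double hi = -1) {
        double skew = T[b] - T[a];
        return skew >= lo && (hi < 0 || skew <= hi);
    }

    int main() {
        assert(range("t1", "t2", 0));        // range( t1; t2; 0; )
        assert(range("t1", "t3", 10, 20));   // range( t1; t3; 10; 20 )
        assert(range("t5", "t7", 10, 20));
        assert(range("t6", "t7", 5, 10));
    }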
4 Generic codesign methodology

4.1 System specification

We have described the characteristics needed for specifying systems in Section 3.1. The system specification should describe the functionality of the system without premature engagement in the implementation. It should be logically as close as possible to the conceptual model of the system, so that it is easy to maintain and modify. It should also be executable, so that the specified functionality is verifiable. The behavior model described in Section 3.12 is a good candidate, since it is a simple model which meets these requirements.
Figure 45: The generic codesign methodology: the synthesis flow (allocation, partitioning, scheduling and communication synthesis, down to manufacturing) and the accompanying analysis and validation flow.
In the example shown in Figure 46, the system itself is specified as the top behavior B0, which contains an integer variable shared and a boolean variable sync. There are three child behaviors, B1, B2 and B3, with sequential ordering, in behavior B0. While B1 and B3 are atomic behaviors specified by a sequence of imperative statements, B2 is a composite behavior consisting of two concurrent behaviors B4 and B5. B5 in turn consists of B6 and B7 in sequential order. While most of the actual behavior of an atomic behavior is omitted in the figure for space reasons, we do show a producer-consumer example relevant for the later discussion: B6 writes the variable shared and signals sync, while B4 waits on sync and then reads shared.

Figure 46: Conceptual model of the specification: (a) control-flow view, (b) atomic behaviors.

4.2 Allocation

The allocation task selects a set of processing elements (PEs), such as PE0 and PE1 in Figure 47, on which the behaviors of the specification will be implemented.
(b) In general, controlling behaviors are needed and must be added for child behaviors assigned to different PEs than their parents. For example, in Figure 47, the behaviors B1_ctrl and B4_ctrl are inserted in order to control the execution of B1 and B4, respectively.

(c) In order to maintain the functional equivalence between the partitioned model and the original specification, synchronization between the PEs is inserted. In Figure 47, the synchronization variables B1_start, B1_done, B4_start and B4_done are added, so that the execution of B1 and B4, which are assigned to PE1, can be controlled by their controlling behaviors B1_ctrl and B4_ctrl through inter-PE synchronization.

Figure 47: Conceptual model after partitioning.

However, the model after partitioning is still far from implementation: there are concurrent behaviors in each PE that have to be serialized, and the communication between the PEs is still expressed through shared variables that cannot be implemented directly.

4.4 Scheduling

Given a set of behaviors and possibly a set of performance constraints, the scheduling task serializes the behaviors on each PE. A static scheduling strategy fixes the execution order at design time, provided that an estimate of the execution time of each behavior can be obtained.

Figure 48: Conceptual model after static scheduling: (a) scheduling decision (PE0: B6, B7, B3; PE1: B1, B4), (b) conceptual model in which, for example, B1 signals B6_start when it finishes and B3 waits on B3_start.
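The start/done handshake that the controlling behaviors and the inserted synchronization implement can be sketched with C++20 binary semaphores (the names B1, B1_ctrl, B1_start and B1_done follow Figure 47; everything else is assumed):

    #include <semaphore>
    #include <thread>

    std::binary_semaphore B1_start{0}, B1_done{0};

    void B1() {                 // runs on PE1
        B1_start.acquire();     // wait( B1_start )
        /* body of B1 */
        B1_done.release();      // signal( B1_done )
    }

    void B1_ctrl() {            // runs on PE0, preserving the original order
        B1_start.release();     // signal( B1_start )
        B1_done.acquire();      // wait( B1_done )
    }

    int main() {
        std::thread pe1(B1), pe0(B1_ctrl);
        pe0.join(); pe1.join();
    }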
This strategy eliminates the context-switching overhead completely, but may suffer from inter-PE synchronization delays, especially in the case of inaccurate performance estimation. On the other hand, the strategy based on dynamic scheduling does not have this problem, because whenever a behavior is blocked on inter-PE synchronization, the scheduler will select another behavior to execute. Therefore the selection of the scheduling strategy should be based on the trade-off between context-switching overhead and CPU utilization.

The model generated after static scheduling removes the concurrency among behaviors inside the same PE. As shown in Figure 48, all child behaviors in PE0 are now sequentially ordered. In order to maintain the partial order across the PEs, synchronization between them must be inserted. For example, B6 is synchronized by B6_start, which will be asserted by B1 when it finishes.

Note that B1_ctrl and B4_ctrl in the model after partitioning are eliminated by the optimization carried out by static scheduling. It should also be mentioned that in this section we define the tasks, rather than the algorithms, of codesign. Good algorithms are free to combine several tasks. For example, an algorithm can perform partitioning and static scheduling at the same time, in which case intermediate results, such as B1_ctrl and B4_ctrl, are not generated at all.

4.5 Communication synthesis and the communication model

Up to this stage, the communication and synchronization between concurrent behaviors are accomplished through shared variable accesses. The task of this stage is to resolve the shared variable accesses into an appropriate inter-PE communication scheme at the implementation level. Several communication schemes exist:

(a) The designer can choose to assign a shared variable to a shared memory. In this case, the communication synthesizer will determine the location of the variables assigned to the shared memory. Given the location of the shared variables, the synthesizer then has to change all accesses to the shared variables in the model into statements that read or write the corresponding addresses. The synthesizer also has to insert interfaces for the PEs and shared memories to adapt to the different protocols on the buses.

(b) The designer may also choose to assign a shared variable to the local memory of one particular PE. In this case, accesses to this shared variable in the models of other PEs have to be changed into function calls to message-passing primitives such as send and receive. Again, interfaces have to be inserted to make the message passing possible.

(c) Another option is to maintain a copy of the shared variable in all the PEs that access it. In this case, all the statements that perform a write on this variable have to be modified to implement a broadcasting scheme, so that all the copies of the shared variable remain consistent. Necessary interfaces also need to be inserted to implement the broadcasting scheme.

The generated model after communication synthesis, as shown in Figure 49, differs from the previous models in the following ways:

(a) New behaviors for interfaces, shared memories and arbiters are inserted at the highest level of the hierarchy. In Figure 49 the added behaviors are IF0, IF1, IF2, Shared_mem and Arbiter.

(b) The shared variables from the previous model are all resolved: they exist either in shared memory or in the local memories of one or more PEs. The communication channels of the different PEs now become the local buses and system buses. In Figure 49, we have chosen to put all the global variables in Shared_mem; hence all the global declarations in the top behavior are moved into the behavior Shared_mem. The new global variables in the top behavior are the buses lbus0, lbus1, lbus2 and sbus.

(c) If necessary, a communication layer is inserted into the runtime system of each PE. The communication layer is composed of a set of inter-PE communication primitives in the form of driver routines or interrupt service routines, each of which contains a stream of I/O instructions which in turn talk to the corresponding interfaces. The accesses to the shared variables in the previous model are transformed into function calls to these communication primitives. For the simple case of Figure 49, the communication synthesizer will determine the addresses for all the global variables, for example shared_addr for the variable shared, and all accesses to the variables are exchanged with reads and writes of the corresponding addresses. For example, shared = local + 1 becomes *shared_addr = local + 1.
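The address-resolution step in (c) amounts to replacing a variable by an access through a pointer to its assigned location. A hedged C++ sketch (the address 0x40000000 is an invented example; on the target platform it would be backed by the shared memory, so this only illustrates the transformation):

    #include <cstdint>

    // Hypothetical address assigned to the variable `shared` by the
    // communication synthesizer (illustrative value only).
    volatile int32_t* const shared_addr =
        reinterpret_cast<volatile int32_t*>(0x40000000);

    void producer_step(int32_t local) {
        // Before communication synthesis:  shared = local + 1;
        *shared_addr = local + 1;          // after: write to the shared memory
    }

    int32_t consumer_step() {
        // Before:  local = shared - 1;
        return *shared_addr - 1;           // after: read from the shared memory
    }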
Figure 49: Conceptual model after communication synthesis: (a) communication synthesis decision, (b) conceptual model, (c) atomic behaviors. In (b), the top behavior contains the buses lbus0, lbus1, lbus2 and sbus, the PE behaviors (B0, B1, B1_ctrl, B2, B5, B6, B3, ...), and the inserted behaviors IF0, IF1, IF2, Shared_mem and Arbiter. The atomic behaviors in (c) include, for example:

    B1() {
      wait( *B1_start_addr );
      ...
      signal( *B1_done_addr );
    }

    B1_ctrl() {
      signal( *B1_start_addr );
      wait( *B1_done_addr );
    }

    B4() {
      int local;
      wait( *B4_start_addr );
      wait( *sync_addr );
      local = (*shared_addr) - 1;
      ...
      signal( *B4_done_addr );
    }

    B4_ctrl() {
      signal( *B4_start_addr );
      wait( *B4_done_addr );
    }

    B6() {
      int local;
      ...
      *shared_addr = local + 1;
      signal( *sync_addr );
    }

    Shared_mem() {
      int shared;
      bool sync;
      bool B1_start, B1_done;
      bool B4_start, B4_done;
      ...
    }

The remaining behaviors (B3, B7, IF0, IF1, IF2, Arbiter) are simple statement sequences.
4.6 Analysis and validation flow

Before each design step, which takes an input design model and generates a more detailed design model, the input design model has to be functionally verified. It also needs to be analyzed, either statically or dynamically with the help of the simulator or estimator, in order to obtain an estimation of the quality metrics, which will be evaluated by the synthesizer to make good design decisions. This motivates the set of tools to be used in the analysis and validation flow of the methodology. An example of such a tool set consists of

(a) a static analyzer,
(b) a simulator,
(c) a debugger,
(d) a profiler, and
(e) a visualizer.

The static analyzer associates each behavior with quality metrics such as program size and program performance, in case it is to be implemented as software, or metrics of hardware area and hardware performance, if it is to be implemented as an ASIC. To achieve a fast estimation with satisfactory accuracy, the analyzer relies on probabilistic techniques and on knowledge of backend tools such as the compiler and the high-level synthesizer.

The simulator serves the dual purpose of functional validation and dynamic analysis. Simulation is achieved by generating an executable simulation model from the design model. The simulation model runs on a simulation engine which, in the form of a runtime library, provides an implementation of simulation tasks such as simulation time advance and synchronization among concurrent behaviors.

Simulation can be performed at different accuracy levels. Common accuracy models are functional, cycle-based and discrete-event simulation. A functionally accurate simulation compiles and executes the design model directly on a host machine without paying special attention to simulation time. A clock-cycle-accurate simulation executes the design model in a clock-by-clock fashion. A discrete-event simulation incorporates an even more sophisticated timing model of the components, such as gate delay. Obviously there is a trade-off between simulation accuracy and simulator execution time.

It should be noted that, while most design methodologies adopt a fixed-accuracy simulation at each design stage, applying a mixed accuracy model is also possible. For example, consider a behavior representing a piece of software that performs some computation and then sends the result to an ASIC. While the part of the software which communicates with the ASIC needs to be simulated at cycle level, so that tricky timing problems become visible, it is not necessary to simulate the computation part with the same accuracy.

The debugger provides the simulation with breakpoint and single-step ability. This makes it possible to examine the state of a behavior dynamically. A visualizer can graphically display the hierarchy tree of the design model, as well as make dynamic data visible in different views and keep them synchronized at all times. All these capabilities are invaluable for quickly locating and fixing design errors.

The profiler is a good complement to the static analyzer for obtaining dynamic information such as branching probability. Traditionally, this is achieved by instrumenting the design description, for example by inserting a counter at every conditional branch to keep track of the number of branch executions.
4.7 Backend

At the stage of the backend, as shown in the lower part of Figure 45, the leaf behaviors of the design model will be fed into different tools in order to obtain their implementations. If a behavior is assigned to a standard processor, it will be fed into a compiler for this processor. If a behavior is to be mapped onto an ASIC, it will be synthesized by a high-level synthesis tool. If a behavior is an interface, it will be fed into an interface synthesis tool.

A compiler translates the design description into machine code for the target processor. A crucial component of a compiler is its code generator, which emits machine code from the intermediate representation generated by the parser part of the compiler. A retargetable compiler is a compiler whose code generator can emit code for a variety of target processors. An optimizing compiler is a compiler whose code generator fully exploits the architecture of the target processor, in addition to applying standard optimization techniques such as constant propagation. Modern RISC processors, DSP processors and VLIW processors depend heavily on optimizing compilers to take advantage of their specific architectures.

The high-level synthesizer translates the design model into a netlist of register-transfer level (RTL) components, defined in Section 2.3 as an FSMD architecture. The tasks involved in high-level synthesis include allocation, scheduling and binding.
Allocation selects the number and type of the RTL components from the library. Scheduling assigns time steps to the operations in the behavioral description. Binding maps variables in the description to storage elements, operators to functional units, and data transfers to interconnect units. All these tasks try to optimize appropriate quality metrics subject to design constraints.
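As a small illustration of these three tasks (a hypothetical example, not tied to any particular tool), consider implementing x = a*b + c*d with an allocation of one multiplier and one adder:

    Allocation:  1 multiplier (MUL1), 1 adder (ADD1), registers r1, r2, r3
    Scheduling:  step 1:  t1 := a * b
                 step 2:  t2 := c * d
                 step 3:  x  := t1 + t2
    Binding:     t1 -> r1, t2 -> r2, x -> r3;
                 both multiplications -> MUL1, the addition -> ADD1

With two multipliers allocated instead, steps 1 and 2 could be scheduled in the same control step, trading area for performance, which is exactly the kind of quality-metric optimization mentioned above.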
We define an interface as a special type of ASIC which links the PE with which it is associated (via its native bus) to the other components of the system (via the system bus). Such an interface implements the behavior of a communication task, which is generated by a communication synthesis tool to implement the shared variable accesses. Note that a transducer, which translates a transaction on one bus into one or a series of transactions on another bus, is just a special case of the above interface definition. An example of such a transducer translates a read cycle on a processor bus into a read cycle on the system bus. The communication tasks between different PEs are implemented jointly by the driver routines and interrupt service routines implemented in software and by the interface circuitry implemented in hardware. While the partitioning of a communication task into software and hardware parts, and the model generation for the two parts, is the job of communication synthesis, the task of generating an RTL design from the interface model is the job of interface synthesis. Thus interface synthesis is a special case of high-level synthesis.
The characteristic that distinguishes an interface circuit from a normal ASIC is that its ports have to conform to predefined protocols. These protocols are often specified in the form of timing diagrams in vendors' data sheets. This poses new challenges to the interface synthesizer, for two reasons:

(a) the protocols impose a set of timing constraints on the minimum and maximum skews between events that the interface produces and other, possibly external, events, which the interface has to satisfy;

(b) the protocols provide a set of timing delays on the minimum and maximum skews between external events and other events, of which the interface may take advantage.

5 Conclusion

In this chapter we presented the essential issues in codesign, which comprises a set of design tasks for refining the design and the models representing the refinements. System codesign starts by specifying the system in one of the specification languages, based on some conceptual model. Conceptual models were defined in Section 1 and implementation architectures in Section 2, while the features needed in executable specifications were given in Section 3. After a specification is obtained, the designer must select an architecture, allocate components, and perform partitioning, scheduling and communication synthesis to generate the architectural behavioral description. After each of the above tasks, the designer may validate her decisions by generating appropriate simulation models and validating the quality metrics, as explained in Section 4.

Presently, very little research has been done in the codesign field. The current CAD tools are mostly simulator backplanes. Future work must include the definition of specification languages, automatic refinement of different system descriptions and models, and the development of tools for architectural exploration, algorithms for partitioning, scheduling and synthesis, and backend tools for custom software and hardware synthesis, including IP creation and reuse.

Acknowledgements

We would like to acknowledge the support provided by UCI grant #TC20881 from Toshiba Inc. and grants #95-D5-146 and #96-D5-146 from the Semiconductor Research Corporation.

We would also like to acknowledge Prentice-Hall Inc., Upper Saddle River, NJ 07458, for permission to reprint figures from [GVNG94] (annotated by †) and figures from [Gaj97] (annotated by ‡), and for the partial use of text appearing in Chapters 2 and 3 of [GVNG94] and Chapters 6 and 8 of [Gaj97].

We would also like to thank Jie Gong, Sanjiv Narayan and Frank Vahid for valuable insights and early discussions about models and languages. Furthermore, we want to acknowledge Jon Kleinsmith, En-shou Chang and Tatsuya Umezaki for contributions in language requirements and model development.
References

[AG96] K. Arnold, J. Gosling. The Java Programming Language. Addison-Wesley, 1996.

[BCJ+97] F. Balarin, M. Chiodo, A. Jurecska, H. Hsieh, L. Lavagno, C. Passerone, A. Sangiovanni-Vincentelli, E. Sentovich, K. Suzuki, B. Tabbara. Hardware-Software Co-Design of Embedded Systems: A Polis Approach. Kluwer Academic Publishers, 1997.

[COB95] P. Chou, R. Ortega, G. Borriello. "Interface Co-synthesis Techniques for Embedded Systems". In Proceedings of the International Conference on Computer-Aided Design, 1995.

[CGH+93] M. Chiodo, P. Giusto, H. Hsieh, A. Jurecska, L. Lavagno, A. Sangiovanni-Vincentelli. "A formal specification model for hardware/software codesign". Technical Report UCB/ERL M93/48, U.C. Berkeley, June 1993.

[DH89] D. Drusinsky, D. Harel. "Using Statecharts for hardware description and synthesis". In IEEE Transactions on Computer-Aided Design, 1989.

[EHB93] R. Ernst, J. Henkel, T. Benner. "Hardware-software cosynthesis for microcontrollers". In IEEE Design and Test, Vol. 12, 1993.

[FLLO95] R. French, M. Lam, J. Levitt, K. Olukotun. "A General Method for Compiling Event-Driven Simulation". In Proceedings of the 32nd Design Automation Conference, June 1995.

[FH92] C. W. Fraser, D. R. Hanson, T. A. Proebsting. "Engineering a Simple, Efficient Code Generator Generator". In ACM Letters on Programming Languages and Systems, 1(3), Sept. 1992.

[Gaj97] D. D. Gajski. Principles of Digital Design. Prentice Hall, 1997.

[GCM92] R. K. Gupta, C. N. Coelho Jr., G. De Micheli. "Synthesis and simulation of digital systems containing interacting hardware and software components". In Proceedings of the 29th ACM/IEEE Design Automation Conference, 1992.

[GDWL91] D. D. Gajski, N. D. Dutt, C. H. Wu, Y. L. Lin. High-Level Synthesis: Introduction to Chip and System Design. Kluwer Academic Publishers, Boston, Massachusetts, 1991.

[GVN94] D. D. Gajski, F. Vahid, S. Narayan. "A system-design methodology: Executable-specification refinement". In Proceedings of the European Conference on Design Automation, 1994.

[GVNG94] D. Gajski, F. Vahid, S. Narayan, J. Gong. Specification and Design of Embedded Systems. Prentice Hall, New Jersey, 1994.

[Har87] D. Harel. "Statecharts: A visual formalism for complex systems". Science of Computer Programming 8, 1987.

[HHE94] D. Henkel, J. Herrmann, R. Ernst. "An approach to the adaption of estimated cost parameters in the COSYMA system". Third International Workshop on Hardware/Software Codesign, Grenoble, 1994.

[Hoa85] C. A. R. Hoare. Communicating Sequential Processes. Prentice-Hall International, Englewood Cliffs, New Jersey, 1985.

[HP96] J. L. Hennessy, D. A. Patterson. Computer Architecture: A Quantitative Approach, 2nd edition. Morgan Kaufmann, 1996.

[KL95] A. Kalavade, E. A. Lee. "The extended partitioning problem: Hardware/software mapping and implementation-bin selection". In Proceedings of the 6th International Workshop on Rapid Systems Prototyping, 1995.

[Lie97] C. Liem. Retargetable Compilers for Embedded Core Processors: Methods and Experiences in Industrial Applications. Kluwer Academic Publishers, 1997.

[LM87] E. A. Lee, D. G. Messerschmitt. "Static Scheduling of Synchronous Data Flow Graphs for Digital Signal Processors". In IEEE Transactions on Computer-Aided Design, 1987, pp. 24-35.
[LMD94] B. Landwehr, P. Marwedel, R. Dömer. "OSCAR: Optimum Simultaneous Scheduling, Allocation and Resource Binding Based on Integer Programming". In Proceedings of the European Design Automation Conference, 1994.

[LS96] E. A. Lee, A. Sangiovanni-Vincentelli. "Comparing Models of Computation". In Proceedings of the International Conference on Computer-Aided Design, San Jose, CA, Nov. 10-14, 1996.

[MG95] P. Marwedel, G. Goossens. Code Generation for Embedded Processors. Kluwer Academic Publishers, 1995.

[NM97] R. Niemann, P. Marwedel. "An Algorithm for Hardware/Software Partitioning Using Mixed Integer Linear Programming". In Design Automation for Embedded Systems, 2, Kluwer Academic Publishers, 1997.

[Pet81] J. L. Peterson. Petri Net Theory and the Modeling of Systems. Prentice-Hall, Englewood Cliffs, New Jersey, 1981.

[PK93] Z. Peng, K. Kuchcinski. "An Algorithm for Partitioning of Application Specific Systems". In Proceedings of the European Conference on Design Automation, 1993.

[Rei92] W. Reisig. A Primer in Petri Net Design. Springer-Verlag, New York, 1992.

[Stau94] J. Staunstrup. A Formal Approach to Hardware Design. Kluwer Academic Publishers, 1994.

[Str87] B. Stroustrup. The C++ Programming Language. Addison-Wesley, Reading, 1987.

[TM91] D. E. Thomas, P. R. Moorby. The Verilog Hardware Description Language. Kluwer Academic Publishers, 1991.

[YW97] T. Y. Yen, W. Wolf. Hardware-Software Co-Synthesis of Distributed Embedded Systems. Kluwer Academic Publishers, 1997.
Index

Accumulator, 11
Action, 4
Actor, 28
  composite, 29
  leaf, 29
Application-Specific Architecture, 10
Architecture, 3
Array Processor, 16
Behavior, 18
  leaf, 22
Behavioral Hierarchy, 22
Block, 25
Channel, 25
CISC, 13
Communication, 30
Communication Medium, 26
Compiler, 39
  optimizing, 39
  retargetable, 39
Completion Point, 23
Complex-Instruction-Set Computer, 13
Concurrency, 8
  pipelined, 19
Concurrent Decomposition, 22
Control construct, 9
  branching, 9
  looping, 9
  sequential composition, 9
  subroutine call, 9
Controller, 10
Cycle, 19
Data Sample, 19
Data Type
  basic, 8
  composite, 9
Dataflow, 19
Debugger, 39
Design Exploration, 35
Design Process, 3
Exception, 24
Executable Modeling, 18
Executable Specification, 18
Final State, 23
Finite-State Machine, 4
  hierarchical concurrent, 8
Fire, 7
FSM, 4
  input-based, 4
  Mealy-type, 4
  Moore-type, 4
  state-based, 4
FSMD, 5
Function, 9
General-Purpose Processor, 10
HCFSM, 8
Hierarchy, 8
  behavioral, 28
  structural, 29
High Level Synthesizer, 39
Inlining, 31
Interface, 17, 25
Interface Synthesis, 40
Language
  Executable Modeling, 18
  Executable Specification, 18
Late Binding, 30
Liveness, 7
Method, 26
Methodology, 3
MIMD, 10, 16
Mode, 20
Model, 1
  activity-oriented, 3
  data-oriented, 4
  heterogeneous, 4
  state-oriented, 3
  structure-oriented, 4
Multiprocessor, 16
  message-passing, 16
  shared-memory, 16
Multiprocessor System, 16
Parallel Processor, 10, 16
PE, 17
Petri net, 6
Pipeline Stage, 19
Place, 6
Port, 25
Procedure, 9
Process, 20
Processing Element, 17
Profiler, 39
Program-State, 9
  composite, 9
  concurrent, 9
  leaf, 9
  sequential, 9
Program-State Machine, 9
Programming Language, 8
  declarative, 8
  imperative, 8
Protocol, 17, 24
PSM, 9
Reduced-Instruction-Set Computer, 14
RISC, 10, 14
Safeness, 7
Sequential Decomposition, 22
  procedural, 22
  state-machine, 22
SIMD, 10, 16
Simulator, 39
Single Assignment Rule, 19
State, 4, 8
State Transition, 29
Statement, 9
Static Analyzer, 39
Structure, 25
Subbehavior
  initial, 22
Substate, 8
  concurrent, 8
Synchronization
  by common event, 27
  by common variable, 27
  by status detection, 28
  control-dependent, 30
  data-dependent, 30
  initialization, 27
  shared-memory based, 27
System Bus, 17
TI, 9, 30
Timing, 32
Timing Constraint, 24, 32
Timing Delay, 24, 32
Timing Diagram, 24
TOC, 9, 29
Token, 6
Transition, 4, 6, 8
  group, 22
  hierarchical, 22
  simple, 22
Transition Immediately, 9, 30
Transition on Completion, 9, 29
Very-Long-Instruction-Word Computer, 15
Visualizer, 39
VLIW, 10, 15