
Digital Design with Implicit State Machines

Fengyun Liu
EPFL, Switzerland
[email protected]
Aleksandar Prokopec
Oracle Labs, Switzerland
[email protected]
Martin Odersky
EPFL, Switzerland
[email protected]

Abstract
Claude Shannon, in his famous thesis (1938), revolutionized circuit design by showing that Boolean algebra subsumes all ad-hoc methods that are used in designing switching circuits, or combinational circuits as they are commonly known today. But what is the calculus for sequential circuits? Finite-state machines (FSM) are close, but not quite, as they do not support arbitrary parallel and hierarchical composition like that of Boolean expressions. We propose an abstraction called implicit state machine (ISM) that supports parallel and hierarchical composition. We formalize the concept and show that any system of parallel and hierarchical ISMs can be flattened into a single flat FSM without exponential blowup. As one concrete application of implicit state machines, we show that they serve as an attractive abstraction for digital design and logic synthesis.

2012 ACM Subject Classification Replace ccsdesc macro with valid one

Keywords and phrases Finite-state machines, hierarchical FSM

Digital Object Identifier 10.4230/LIPIcs.CVIT.2016.23

1 Introduction

Claude Shannon [26] revolutionized circuit design by showing that Boolean algebra subsumes all ad-hoc methods that are used in designing switching circuits, or combinational circuits as they are commonly known today. In contrast to combinational circuits, which only contain stateless gates, sequential circuits may also contain stateful elements, like registers. But what is the calculus for sequential circuits? Finite-state machines (FSM) are close, but not quite.
A good abstraction in programming should be composable. In a Boolean expression a ∨ b, the sub-expressions a and b can be arbitrary Boolean expressions. We may also put two Boolean expressions side by side to achieve parallel composition. Essentially, any combinational circuit design will eventually result in a Boolean expression, regardless of whether the design language is VHDL, Verilog, or Chisel [1]. The composability of Boolean expressions ensures that any combinational circuit can be represented.
If we turn to sequential circuits, which may contain state elements and cycles, what is the calculus that all sequential circuits can compile to, like Boolean algebra for combinational circuits? Finite-state machines are close to fulfilling the role, but not quite. Classic FSMs support neither hierarchical nor parallel composition. The milestone paper by Benveniste and Berry [2] mentioned the lack of support for hierarchical design and concurrency as a major drawback of FSMs.
Conceptually, we may compose FSMs side by side or in a nested way, which leads to parallel and hierarchical FSMs. In a hierarchical FSM, the behavior of the outer FSM depends on that of the inner FSM, and the inner FSM has privileged access to the current state of the outer FSM. Parallel FSMs run side by side and respond to inputs concurrently.

If one FSM can be in states a or b, and the other in states c or d, then their parallel composition may be in states ac, ad, bc, or bd.
There have been proposals for programming with hierarchical and parallel FSMs [7, 8, 12, 19], but so far no proposal addresses the two problems below:

How to support parallel and hierarchical composition of FSMs in a declarative language?

How to transform a complex system of FSMs into a flat FSM?

While experts in logic verification and synthesis usually work with flat FSMs for their simplicity and expressiveness, digital designers primarily work with hierarchical FSMs to decompose the complexity of a system. It is unknown how to support hierarchical and parallel composition of FSMs in a language, and then transform it into a flat FSM to facilitate formal verification such as model checking [5], and optimizations such as state encoding [10, 30].
The flattening of hierarchical and parallel FSMs generally results in exponential blowup in the size of their representation; e.g., flattening 32 parallel 2-state FSMs would result in a flat FSM with 2^32 states. Existing programming models with FSMs require one case for each state in the code [7, 8, 12, 19]; consequently, the exponential blowup cannot be avoided in such languages. This creates a gap between a complex system of parallel and hierarchical FSMs and a flat FSM. Despite their simplicity and mathematical elegance, we still do not know how to make FSMs a first-class construct for programming, optimization and verification, due to the lack of efficient composability and flattening.
To bridge the gap, we propose a novel abstraction, called implicit state machine (ISM), that supports arbitrary parallel and hierarchical composition of FSMs. Implicit state machines do not mandate states to be explicitly specified in the program, which avoids the exponential blowup when flattening a complex system of FSMs. This flexible composability makes implicit state machines an elegant first-class programming construct for digital design, and the avoidance of exponential blowup in flattening makes implicit state machines an attractive intermediate language for compilation, optimization and verification.
From the perspective of circuit design, the flattening keeps the area and the delay, the two optimization goals of logic synthesis, unchanged. The result implies that any synchronous sequential circuit is equivalent to a circuit with all state elements at the boundary and a big combinational core at the center. We conjecture this result will lead to more optimization opportunities. For example, combinational techniques may now be used to optimize the whole circuit, whereas previously it was only convenient to optimize combinational fragments using the fundamental techniques. It may also give rise to novel hardware architectures. For example, FPGAs would no longer need to scatter state elements (e.g. D flip-flops) in their layout.
Our contributions are listed below:

We introduce the concept of implicit state machines, and formalize the concept in a declarative calculus. Implicit state machines support parallel and hierarchical composition, and we may optimize and reason about the code by equational reasoning.
We show that any parallel and hierarchical FSMs can be flattened into a flat implicit state machine in polynomial time and code size. As far as we know, this is the first abstraction for hierarchical and parallel FSMs that avoids exponential blowup in flattening.
To the best of our knowledge, we are the first to theorize that any synchronous sequential circuit is equivalent to a circuit with all state elements at the boundary and a big combinational core at the center, with the same area and delay.
We create an embedded DSL in Scala based on implicit state machines, and initial experiments show positive results when implicit state machines are used as a programming model and an intermediate representation for logic synthesis.

2 Implicit State Machines

2.1 Introduction

Finite-state machines are widely used in the design and verification of reactive and real-time systems, which include critical systems that control nuclear plants, airplanes, trains, cars, etc. As a mathematical model, finite-state machines can precisely and succinctly characterize the behaviors of such systems, which forms the basis for formally verifying that the systems work reliably in accordance with the specification.
Mathematically, a finite-state machine is usually represented as a quintuple (I, S, s0, σ, O):

I is the set of inputs
S is the set of states
s0 ∈ S is the initial state
σ : I × S → S × O maps the input and the current state to the next state and the output
O is the set of outputs

An FSM can also be represented graphically by a state-transition diagram, as the following figure shows:

[State-transition diagram: three states q1, q2, q3, with start state q1; each edge is labeled input/output, e.g. 0/1, 1/1, 0/0, 1/0.]

In the state machine above, q1 is the initial state, and each edge denotes a transition: the label 0/1 on an edge means the transition happens when the input is 0, and it outputs 1 when the transition occurs.
Implicit state machines are based on a reflection on the essence of FSMs: a mapping from input and state to the next state and output. The first insight towards implicit state machines is that the mapping function does not have to be represented as a set whose size correlates with the size of the state space, as is the case in existing languages for programming with FSMs [12, 8, 7, 19]. In a declarative language, the mapping functionality can be represented by any expression. This gives us a tentative representation as follows:

    λx : I × S. (t1, t2)  :  I × S → S × O

The body (t1, t2) enforces that the output and next state are implemented as two functions. This imposes unnecessary constraints. If we introduce tuples in the language, we can replace (t1, t2) just by t:

    λx : I × S. t  :  I × S → S × O

The second insight is that the state is neither an input to an FSM nor an output of an FSM, but a self-reference. It leads us to the following representation with the state variable s:

    λx : I. fsm { s ⇒ t }  :  I → O

In the above, the term t still has the type S × O, but seen from outside, a state machine just maps input to output, which corresponds to our intuition.


The last insight is that the inputs do not need to be represented explicitly; they can be captured from the lexical scope:

    fsm { s ⇒ t }  :  O

We still miss the initial state, so we use the value v to denote the initial state of the FSM:

    fsm { v | s ⇒ t }  :  O

Voilà! Suppose we are working in the domain of digital circuits; a one-bit D flip-flop with an input signal d can be represented as follows:

    fsm { 0 | s ⇒ (d, s) }

It takes the value d as the next state, and outputs the last state on every clock. We may compose several such flip-flops to implement a shift register for a given input d:

    let q1 = fsm { 0 | s => (d, s) } in
    let q2 = fsm { 0 | s => (q1, s) } in
    let q3 = fsm { 0 | s => (q2, s) } in
    let q4 = fsm { 0 | s => (q3, s) } in
    (q1, q2, q3, q4)

An equivalent flat FSM that implements the 4-bit shift register is shown below:

    fsm { (0, 0, 0, 0) | s => ((d, s.1, s.2, s.3), s) }

Implicit state machines are just expressions, thus they may appear anywhere that an expression is allowed. In particular, we may nest them to get another equivalent implementation of the shift register:

    fsm { 0 | q1 =>
      let q2 = fsm { 0 | s => (q1, s) } in
      let q3 = fsm { 0 | s => (q2, s) } in
      let q4 = fsm { 0 | s => (q3, s) } in
      (d, (q1, q2, q3, q4))
    }

In the following, we formalize implicit state machines in a calculus.

2.2 Syntax

The syntax of the language is presented below:

t ::=                                terms
    a, b, c                          external input
    x, y, z, s                       variables
    let x = t in t                   let binding
    β                                Boolean value
    t ∗ t                            1-bit and
    t + t                            1-bit or
    !t                               1-bit not
    (t, . . . , t)                   tuple
    t.i                              projection
    fsm { v | s ⇒ t }                implicit state machine

β ::= 0 | 1                          Boolean values
v ::= β | (v, . . . , v)             values
i ::= 0, 1, 2, . . .                 indexes

Beyond the basic elements of Boolean algebra, we also introduce let-bindings, which are a basic abstraction and reuse mechanism. Tuples and projections are introduced for parallel composition and decomposition. In a projection t.i, the index i must be a statically known number. For implicit state machines, we require that the initial state is a value.
A circuit usually has external inputs, which are represented by the variables a, b, c. By convention, we use x, y, z for let-bindings, and s for the binding in implicit state machines.
We choose Boolean algebra as the domain theory, but it could also be another mathematical structure, like a group or an abelian group. Our transform does not assume properties of the mathematical structure, as long as we may substitute equals for equals [29].
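As a small example exercising the syntax (our own), the following term ANDs two external inputs, accumulates the result into a one-bit state via OR, and outputs the negated state:

    let x = a ∗ b in fsm { 0 | s ⇒ (x + s, !s) }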

2.3 Semantics

The semantics of the language is defined with the help of a state map σ and an environment ρ. The state map σ maps each state variable to a state value; the environment ρ maps each external signal to a value. The big-step operational semantics is defined with the following reduction relation:

    t ⟶(σ,ρ) v | σ′

It means that given the current state σ and environment ρ, the term t evaluates to the value v with the next state σ′. The semantics follows the synchronous hypothesis [2], which assumes that the computation of the response to an input takes no time. For synchronous digital circuits, it means that the system produces an output at each clock tick. The reduction rules are defined in Figure 1. We explain the rules below:
E-Value. If the term is already a value, do nothing. There are no nested state machines, thus the mapping for the next state is the empty set.
E-Input. Look up the external variable a in the environment ρ.
E-Let. First evaluate t1 to the value v1, then evaluate t2 with x replaced by v1.
E-Tuple. Evaluate each component in parallel to a value, and accumulate the mappings for the next state.
E-Project. First evaluate the term to a tuple value, then return the corresponding component.
E-And. Evaluate the two components in parallel to Boolean values, then call the helper function and to compute the resulting Boolean value β. As each component may contain implicit state machines, accumulate the mappings for the next state.

  ─────────────────────                                        (E-Value)
     v ⟶(σ,ρ) v | ∅

          v = ρ(a)
  ─────────────────────                                        (E-Input)
     a ⟶(σ,ρ) v | ∅

    t1 ⟶(σ,ρ) v1 | σ′      [x ↦ v1]t2 ⟶(σ,ρ) v2 | σ″
  ────────────────────────────────────────────────────         (E-Let)
        let x = t1 in t2 ⟶(σ,ρ) v2 | σ′ ∪ σ″

     t1 ⟶(σ,ρ) v1 | σ1    …    tn ⟶(σ,ρ) vn | σn
  ────────────────────────────────────────────────────         (E-Tuple)
    (t1, …, tn) ⟶(σ,ρ) (v1, …, vn) | σ1 ∪ ⋯ ∪ σn

        t ⟶(σ,ρ) (v1, …, vi, …, vn) | σ′
  ──────────────────────────────────────────                   (E-Project)
             t.i ⟶(σ,ρ) vi | σ′

    t1 ⟶(σ,ρ) β1 | σ′    t2 ⟶(σ,ρ) β2 | σ″    β = and(β1, β2)
  ──────────────────────────────────────────────────────────── (E-And)
               t1 ∗ t2 ⟶(σ,ρ) β | σ′ ∪ σ″

    t1 ⟶(σ,ρ) β1 | σ′    t2 ⟶(σ,ρ) β2 | σ″    β = or(β1, β2)
  ──────────────────────────────────────────────────────────── (E-Or)
               t1 + t2 ⟶(σ,ρ) β | σ′ ∪ σ″

        t ⟶(σ,ρ) β | σ′      β′ = not(β)
  ──────────────────────────────────────────                   (E-Not)
             !t ⟶(σ,ρ) β′ | σ′

      v = σ(s)      [s ↦ v]t ⟶(σ,ρ) (v1, v2) | σ′
  ────────────────────────────────────────────────────         (E-Fsm)
    fsm { v | s ⇒ t } ⟶(σ,ρ) v2 | { s ↦ v1 } ∪ σ′

Figure 1 Big-step operational semantics.

E-Or. Similar to the above, but use the helper function or to compute the resulting value.
E-Not. Similar to the above, but use the helper function not to compute the resulting value.
E-Fsm. First look up the value of the current state in the state map σ. Then evaluate the body of the state machine to a pair value (v1, v2). The output is v2, and the next state is v1.

The reduction relation only defines one-step semantics. The semantics of a system is defined by the trace for a given input sequence ρ0, ρ1, ⋯. We define it formally below:

Definition 1 (Trace). The trace of a system t with respect to an input sequence ρ0, ρ1, ⋯ is the sequence o0, o1, ⋯ such that

    t ⟶(σ0,ρ0) o0 | σ1
    …
    t ⟶(σi,ρi) oi | σi+1
    …

In the above, σ0 is the initial state of the FSMs as specified in t.
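As a worked example of the rules, consider the D flip-flop t = fsm { 0 | s ⇒ (d, s) } with σ0 = { s ↦ 0 } and the input sequence ρ0 = { d ↦ 1 }, ρ1 = { d ↦ 0 }, ρ2 = { d ↦ 1 }. Rule E-Fsm gives

    t ⟶(σ0,ρ0) 0 | { s ↦ 1 }      t ⟶(σ1,ρ1) 1 | { s ↦ 0 }      t ⟶(σ2,ρ2) 0 | { s ↦ 1 }

so the trace is 0, 1, 0: the output is the input delayed by one clock tick.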



2.4 Type System

We introduce a simple type system to ensure that the system is sound, i.e. it never gets stuck. The type system is presented in Figure 2. In the system, there are only two types: Bool for Boolean values and (T1, . . . , Tn) for tuples. We explain the typing rules below:

T ::= Bool | (T, . . . , T)

  ─────────────────    (T-Bool)
    Γ ⊢ β : Bool

      a : T ∈ Γ
  ─────────────────    (T-Input)
      Γ ⊢ a : T

      x : T ∈ Γ
  ─────────────────    (T-Var)
      Γ ⊢ x : T

     Γ ⊢ t : Bool
  ─────────────────    (T-Not)
     Γ ⊢ !t : Bool

    Γ ⊢ t1 : Bool    Γ ⊢ t2 : Bool
  ────────────────────────────────────    (T-And)
          Γ ⊢ t1 ∗ t2 : Bool

    Γ ⊢ t1 : Bool    Γ ⊢ t2 : Bool
  ────────────────────────────────────    (T-Or)
          Γ ⊢ t1 + t2 : Bool

    Γ ⊢ t1 : T1    …    Γ ⊢ tn : Tn
  ────────────────────────────────────    (T-Tuple)
    Γ ⊢ (t1, …, tn) : (T1, …, Tn)

    Γ ⊢ t : (T1, …, Ti, …, Tn)
  ──────────────────────────────    (T-Project)
           Γ ⊢ t.i : Ti

    Γ ⊢ t1 : T1    Γ, x:T1 ⊢ t2 : T2
  ─────────────────────────────────────    (T-Let)
      Γ ⊢ let x = t1 in t2 : T2

    Γ ⊢ v : T1    Γ, s:T1 ⊢ t : (T1, T2)
  ─────────────────────────────────────────    (T-Fsm)
       Γ ⊢ fsm { v | s ⇒ t } : T2

Figure 2 Type system.

T-Bool. The type of a Boolean value is always Bool.
T-Input. For inputs, their types are predefined in the environment.
T-Var. For variables, their types also appear in the environment.
T-Not. The term t must be Bool.
T-Tuple. If each component has a type, then the tuple has the corresponding tuple type.
T-Project. If the term t has a tuple type, then the projection has the type of the corresponding component.
T-And. If each component has the type Bool, the result also has the type Bool.
T-Or. The same as above.
T-Let. If the bound term has the type T1, and the body of the let-binding has the type T2 under the environment Γ extended with the binding x:T1, then the let-binding has the type T2. Note that this rule forbids the usage of x in t1, which prevents undesired cycles.
T-Fsm. If the initial value has the type T1, and the body has the type (T1, T2) under the environment Γ extended with the binding s:T1, then the FSM has the type T2.
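As a small worked derivation, consider the D flip-flop fsm { 0 | s ⇒ (d, s) } with d : Bool ∈ Γ. By T-Bool, Γ ⊢ 0 : Bool; by T-Input and T-Var together with T-Tuple, Γ, s:Bool ⊢ (d, s) : (Bool, Bool); hence by T-Fsm, Γ ⊢ fsm { 0 | s ⇒ (d, s) } : Bool.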

We need an auxiliary definition of value map typing:

                        Γ ⊢ ξ     Γ ⊢ v : T
    Γ ⊢ ∅          ─────────────────────────────
                    Γ, α : T ⊢ ξ ∪ { α ↦ v }

In the above, α ranges over inputs a and state variables s, and ξ ranges over the input map ρ and the state map σ.

Theorem 2 (Soundness). If Γ ⊢ t : T, and if for each ρi in the input sequence ρ0, ρ1, . . . we have Γ ⊢ ρi, then there exists a trace corresponding to the input sequence.

The proof follows from the following lemma by induction on the length of the input sequence:

Lemma 3. Given Γ ⊢ t : T, Γ ⊢ ρ, Γ ⊢ σ, and Γ ⊢ σ0, where σ0 is the initial state map as specified in t, then t ⟶(σ,ρ) v | σ′ with Γ ⊢ v : T and Γ ⊢ σ′.

Proof sketch. By induction on the typing judgment Γ ⊢ t : T.

2.5 Flattening

In this section, we present a transform that translates any system of parallel and hierarchical implicit state machines into a flat implicit state machine. The transformation is defined in Figure 3. It consists of two major steps:

Lifting. This step lifts FSMs to the top level.
Flattening. This step merges FSMs into a single FSM.

For the purposes of the transformation, we first define the FSM-free fragment of the language, which is represented by e. Lifting results in a lifted normal form (N), where all FSMs are nested at the top of the program, with an FSM-free fragment in the middle.
The relation t1 ⇝L t2 says that the term t1 takes a lifting step to t2. Lifting is defined with the help of the lifting context L. The lifting context specifies that the transform proceeds left-to-right and top-down. The actual lifting happens with the function ⟦·⟧, which transforms the source program to the expected form. We explain the concrete transform rules below:

fsm { v | s ⇒ e1 } ∗ t2. The FSM absorbs t2 into its body. The symmetric case, and the cases for OR and NOT, are similar.
let x = fsm { v | s ⇒ e1 } in t2. It pulls the let-binding into the body. The case in which the FSM is in the body of the let-binding is similar.
fsm { v | s ⇒ e }.i. It pulls the projection into the body of the FSM.
(ē, fsm { v | s ⇒ e }, t̄). It pulls the tuple into the body of the FSM.

Once all FSMs are nested at the top level after lifting, flattening takes place. The relation t1 ⇝F t2 says that the term t1 takes a flattening step to t2. Flattening is defined with the help of the flattening context F. The flattening context specifies that flattening happens from the inside towards the outside. The actual merging step is quite straightforward: it just combines the initial states v1 and v2, and merges s1 and s2 into s.
We use the notation t1 ⇝ t2 to mean that t1 takes either a lifting step (⇝L) or a flattening step (⇝F) to t2. We write t1 ⇝* t2 to mean zero or more such transform steps. For simplicity of presentation, we omit the formal definitions.

FSM-free Fragment

    e ::= v | e ∗ e | e + e | !e | (e, . . . , e) | e.i | let x = e in e | x | s | a

Lifted Normal Form

    N ::= e | fsm { v | s ⇒ N }

Lifting

    L ::= [·] | L ∗ t | e ∗ L | L + t | e + L | !L | L.i | (e1, . . . , L, . . . , tn) |
          fsm { v | s ⇒ L } | let x = L in t | let x = e in L

            ⟦t⟧ = fsm { v | s ⇒ t′ }
    ──────────────────────────────────────────
      L[t] ⇝L L[fsm { v | s ⇒ t′ }]

    ⟦fsm { v | s ⇒ e1 } ∗ t2⟧          = fsm { v | s ⇒ let x = e1 in (x.1, x.2 ∗ t2) }
    ⟦e2 ∗ fsm { v | s ⇒ e1 }⟧          = fsm { v | s ⇒ let x = e1 in (x.1, e2 ∗ x.2) }
    ⟦fsm { v | s ⇒ e1 } + t2⟧          = fsm { v | s ⇒ let x = e1 in (x.1, x.2 + t2) }
    ⟦e2 + fsm { v | s ⇒ e1 }⟧          = fsm { v | s ⇒ let x = e1 in (x.1, e2 + x.2) }
    ⟦! fsm { v | s ⇒ e }⟧              = fsm { v | s ⇒ let x = e in (x.1, !x.2) }
    ⟦let x = fsm { v | s ⇒ e1 } in t2⟧ = fsm { v | s ⇒ let s1, x = e1 in (s1, t2) }
    ⟦let x = e1 in fsm { v | s ⇒ e2 }⟧ = fsm { v | s ⇒ let x = e1 in e2 }
    ⟦fsm { v | s ⇒ e }.i⟧              = fsm { v | s ⇒ let x = e in (x.1, x.2.i) }
    ⟦(ē, fsm { v | s ⇒ e }, t̄)⟧        = fsm { v | s ⇒ let x = e in (x.1, (ē, x.2, t̄)) }

Flattening

    F ::= [·] | fsm { v | s ⇒ F }

            ⟦N⟧ = fsm { v | s ⇒ e }
    ──────────────────────────────────────────
      F[N] ⇝F F[fsm { v | s ⇒ e }]

    ⟦fsm { v1 | s1 ⇒ fsm { v2 | s2 ⇒ e2 } }⟧ =
        fsm { (v1, v2) | s ⇒ let s1, s2 = s in
                             let x = e2 in ((x.2.1, x.1), x.2.2) }

Figure 3 Flattening of nested FSMs. We write let x, y = t1 in t2 as syntactic sugar for let z = t1 in let x = z.1 in let y = z.2 in t2.
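As a hedged worked example of these rules (our own), consider a two-stage shift register in nested form. Lifting pulls the inner let-binding into the inner FSM, and flattening then merges the two machines:

    fsm { 0 | q1 ⇒ let q2 = fsm { 0 | s ⇒ (q1, s) } in (d, (q1, q2)) }
      ⇝L  fsm { 0 | q1 ⇒ fsm { 0 | s ⇒ let n, q2 = (q1, s) in (n, (d, (q1, q2))) } }
      ⇝F  fsm { (0, 0) | s ⇒ ((d, s.1), (s.1, s.2)) }

(after inlining the administrative let-bindings introduced by the rules). The flat state pairs q1 with q2, matching the flat shift register of Section 2.1.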

Theorem 4 (Complexity). If the term t contains FSMs, then there exists e such that t ⇝* fsm { v | s ⇒ e } in O(m·n) steps, where m is the size of the term t and n is the number of state machines in the code.

Proof sketch. During lifting, each step moves some code that pre-exists in t inside another FSM. Thus, the worst case is O(m·n). During flattening, each step eliminates one FSM, thus flattening takes n steps. Therefore, the complexity is O(m·n).

A tighter bound is O(d·n), where d is the maximum depth of an FSM from the root (if we see a term t as an abstract syntax tree) and n is the number of FSMs. However, as lifting introduces let-bindings that change the height of the tree, it is technically more complex to establish this bound, and we thus leave it to future work.
Meanwhile, the complexity also establishes a bound on the resulting code size after flattening: each lifting and flattening step increases the code size by a small constant (usually an additional let-binding and tuple), thus the code size increase is also bounded by O(m·n).

Corollary 5 (Code Size). If the term t contains FSMs, and there exists e such that t ⇝* fsm { v | s ⇒ e }, then the code size increase of e compared to t is bounded by O(m·n), where m is the size of the term t and n is the number of state machines in the code.

Theorem 6 (Semantics Preservation). If t ⇝ t′, then t and t′ have the same trace for any given input sequence ρ0, ρ1, ⋯.

It follows from the following lemmas by induction on the length of the trace:

Lemma 7. If t ⇝L t′ and t ⟶(σ,ρ) v | σ1, then t′ ⟶(σ,ρ) v | σ1.

Proof sketch. First perform induction on the lifting contexts, then perform case analysis on the concrete transform rules.

Lemma 8. Suppose N ⇝F N′, i.e. the flattening ⟦fsm { v1 | s1 ⇒ fsm { v2 | s2 ⇒ e2 } }⟧ of two state machines, and let f = λσ. { s ↦ (σ(s1), σ(s2)) } ∪ (σ \ { s1, s2 }) with f(σ) = σ′. If N ⟶(σ,ρ) v | σ1, then N′ ⟶(σ′,ρ) v | σ1′ and f(σ1) = σ1′.

Proof sketch. Perform induction on the flattening contexts. Note that for the initial states σ0 and σ0′ specified in N and N′ respectively, f(σ0) = σ0′ holds trivially.

2.6 Discussion: Are Implicit State Machines FSMs?

The mathematical definition of an FSM requires the transition function to be a pure function, i.e. a function that always returns the same result given the same input. However, this is generally not the case for implicit state machines, as an implicit state machine may contain a nested implicit state machine, which makes the transition function stateful, or impure. Conversely, if an implicit state machine does not contain any nested ISM, then its body is a pure Boolean function, which makes the ISM an FSM in the mathematical sense.
From this perspective, flattening plays another important role: it transforms a possibly non-FSM implicit state machine into an FSM. This also reflects a natural design choice of implicit state machines: in order to support hierarchical state machines, we need to give up the requirement that the transition function is pure.
Also note that while implicit state machines do not mandate states to be explicitly represented in the program, they do not forbid it either. This means that programmers can continue to program with explicit states when necessary. This can be done with a switch on the state of the FSM (in pseudocode):

    fsm { 0 | s =>
      when (s == 0) t1
      when (s == 1) t2
      when (s == 2) t3
      otherwise t4
    }

In the above, we use the when construct to define one transition for each state. We implement when as syntactic sugar in our DSL and use it to decode controller instructions (Section 4).
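For one-bit signals, the desugaring can be sketched (our own illustration, not necessarily the DSL's exact implementation) with the classic multiplexer identity of Boolean algebra:

    when (c) t1 otherwise t2   ≡   (c ∗ t1) + (!c ∗ t2)

A chain of when cases nests this multiplexer, selecting the first branch whose condition holds.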
Note that outside the setting of formal verification and the theory of computation, the term finite-state machine is sometimes used in programming to loosely mean any machine that has a finite set of states. In the rest of the paper, when there is no danger of misunderstanding, we use the term FSM in this loose sense.

3 Programming Model for Digital Design

The hardware design community is yearning for a better programming language [16, 20, 21]. We believe introducing implicit state machines as a programming model will improve the situation.

3.1 Declarative Programming

It is well known in the programming language community that a declarative language enjoys many advantages over an imperative language. The mainstream languages for digital design, such as VHDL and Verilog, are in imperative style.
A declarative language is easier to work with than an imperative one. Declarative programs are easier to compose and reason about, as we may substitute equals for equals [29]: given an equation x = t in the program, we may safely substitute the variable x with the code t without changing the semantics of the program. In contrast, such substitution is generally problematic in imperative programs. Consequently, it is much easier to perform semantics-preserving transformations and optimizations on declarative programs than on imperative programs.
Imperative programming with states faces the problem of double assignment. In the Verilog code example below, the variable a is assigned twice when c is true:
    always @ (posedge clk) begin
      if (enable) begin
        a <= c & d; b <= c | d;
        if (c) a <= b;  // double assignment of a if c is true
      end else a <= d;  // b not assigned in else branch
    end

Most languages take the last assignment as effective in the case of double assignment. That such code is supported at all is a little counter-intuitive, as all registers are refreshed exactly once on each clock tick in synchronous digital circuits. Worse, a double assignment may be a mistake made by the programmer, which the compiler is helpless to address.
Such problems are inherent in imperative programming with states. However, a stateful computation does not need to be in imperative style. The synchronous dataflow model in Lustre [6] and Signal [4] is one piece of evidence for this. Yet it was unknown how to make programming with FSMs declarative, as they are stateful computations by nature, and past proposals on programming with FSMs are all in imperative style [19, 8]. With implicit state machines, we show how to program with FSMs in declarative style.
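For contrast, here is a hedged sketch (our own, in the calculus with the when/otherwise sugar of Section 2.6, assuming b is bound to the sibling register in the enclosing scope) of the register a from the Verilog example. The next state is a single expression, so a double assignment cannot even be written:

    fsm { 0 | a ⇒
      ( when (enable) { when (c) { b } otherwise { c ∗ d } } otherwise { d },
        a )
    }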
It is reported that dataflow programming is a good fit for dataflow-dominated applications, while FSM-based imperative programming is more suitable for control-dominated applications [3, 8]. The FSM extension to Lustre [8] comes from the need to support both styles in the same language, in which FSMs desugar to a core dataflow calculus. Our calculus of implicit state machines can be seen as another synergy of dataflow programming and imperative programming. The expression-oriented nature of the calculus makes dataflow programming easy. Meanwhile, an implicit state machine with an explicit case for each state is a good fit for control-dominated applications.

3.2 Scalable Abstraction

It is well known that abstraction is the way to control complexity and build complex systems. Boolean algebra saves digital designers from transistors and resistors. It is a pity that Chisel [1, 18, 15], the latest hardware construction language to gain traction, still promotes programming with wires and connections. If we examine Chisel, VHDL, and Verilog closely, it is not clear what core calculus plays the role that the lambda calculus plays for functional programming.
With implicit state machines, we eliminate wires, connections, registers and flip-flops from hardware design. We cannot imagine what else could be removed further; otherwise mathematicians would have discovered the simpler formalism and replaced FSMs with it.
Implicit state machines are a scalable abstraction. They may succinctly describe the most basic building blocks of digital design, such as D flip-flops, as well as complex systems via hierarchical and parallel composition. Any synchronous digital system that may be characterized by an FSM can be programmed with implicit state machines, because the transition function of implicit state machines can be both stateful and stateless; the latter corresponds to the transition function of FSMs.
Explicit state machines, i.e. state machines with one separate case for each state, are implicit state machines by definition. It means programmers can freely choose to program with explicit states or implicit states. Some circuits are simpler to program implicitly, such as the D flip-flop. The D flip-flop representation with implicit state machines only takes one line:

    fsm { 0 | s => (d, s) }

However, an explicit representation as a truth table would take several lines (s is the current state, s′ the next state, Q the output):

    s  d  |  s′  Q
    0  0  |  0   0
    0  1  |  1   0
    1  0  |  0   1
    1  1  |  1   1

The D flip-flop is so simple that digital designers seldom think of it as an FSM when programming. Programming with FSMs in Verilog and VHDL is just a design methodology; with implicit state machines, it becomes a reality.

3.3 Acyclic by Construction

It is common to compose FSMs in digital design, as hierarchical decomposition is a widely used method to break down a complex system. In Verilog and VHDL, the FSM is not a primitive programming construct. FSMs are usually encoded with registers in separate modules, and then the modules are composed. Such composition, however, is dangerous, as combinational cycles may arise from the composition of FSMs [25, 2]. The combinational cycles resulting from FSM composition are illustrated in Figure 4.
Despite the fact that combinational cycles have been studied theoretically [27, 22], in practice they represent mistakes in the design, and CAD tools for synthesis and verification require circuits without combinational cycles as input.

[Figure diagram: each panel shows combinational logic with an input and an output, plus a state element fed by the next state and feeding back the current state.]

Figure 4 FSM composition. (A) An FSM in a circuit, where the combinational logic is acyclic. (B) The connection of two FSMs results in combinational cycles. (C) The connection does not result in combinational cycles, as the feedback to the upper FSM only goes to the state element, which breaks the loop.

In our calculus, there are no combinational cycles by construction. To compose two FSMs as in Figure 4B, a digital designer has to write the following code:

    fsm { v3 | s3 =>
      let o1 = fsm { v1 | s1 => t1 } in
      let o2 = fsm { v2 | s2 => t2 } in
      (t3, (o1, o2))
    }

In the code above, another FSM is created with the state name s3, which is the shared state that decouples the combinational loop.
In the case where the connection, as in Figure 4C, does not result in combinational cycles, i.e. the feedback only goes into state elements but not outputs, there is no need to create an additional FSM:

    fsm { v1 | s1 =>
      let o1 = t1 in
      let o2 = fsm { v2 | s2 => t2 } in
      (t3, (o1, o2))
    }

In the above, the next state and output of the inner FSM, i.e. t2, may depend on o1. Meanwhile, the next state of the outer FSM, i.e. t3, may depend on o2. The code is guaranteed to be acyclic by construction.

3.4 Logic Synthesis

De Micheli [9] mentioned that sequential synthesis is hindered by combinational boundaries: typical optimizations extract combinational logic from the register-separated circuit network and optimize the combinational fragments only. The flattening of FSMs can transform any circuit into an equivalent circuit with state elements at the boundary and a big combinational core in the center. We conjecture such a transformation will facilitate optimizations as well as enable more optimization opportunities. We leave the conjecture to be substantiated by future research.
An expert in logic synthesis might wonder: what is the impact of flattening on area and delay, the two goals of logic optimization? The answer is that they are unchanged. The reason is that during flattening we only introduce let-bindings, which neither create additional gates nor change the number of gates on any path. With implicit state machines, experts in logic synthesis no longer need to worry about combinational boundaries.
Already in 1991, Malik [23] envisioned the possibility of applying combinational techniques to optimizing sequential circuits by pushing registers to the boundary of the circuit network and cutting the loops when needed. The approach taken by Malik is based on a technique called retiming [17], which changes the timing behavior of the circuit by moving registers around in the circuit network.
Our approach essentially follows the same spirit. However, we achieve the same goal without changing the timing behavior of the circuit. The optimization opportunities enabled by retiming are different from ours, but retiming can be expressed on top of implicit state machines. For example, consider the circuit network below:

[Circuit diagram: sub-circuit C1 (delay 3) feeds two registers; the register outputs go through an AND gate (delay 1) into sub-circuit C2 (delay 5).]

The circuit above shows that two outputs of the sub-circuit C1 go to two different registers. The outputs of the two registers go to an AND gate, whose output in turn goes to the sub-circuit C2. The critical path of the circuit has a delay of 6. The critical path is the path in the circuit that has the maximum delay from an input signal or a register read to an output signal or a register write. The clock period of a synchronous circuit has to be larger than the delay of the critical path.
Using retiming, we can push the two registers after the AND gate, which results in the following network:

[Circuit diagram after retiming: the two outputs of C1 (delay 3) go directly into the AND gate (delay 1); a single register after the AND gate feeds C2 (delay 5).]

Now the critical path of the circuit has a delay of 5 instead of 6, and it saves one register. If we represent the circuit C1 by the term t1 and the circuit C2 by the term t2, then the circuit before the retiming optimization can be expressed as follows:

    let x = t1 in
    let y = fsm { (0, 0) | s =>
      (x, s.1 & s.2)
    }
    in t2

In the above, x represents the two output signals of the circuit C1, and the input signal to the circuit C2 is represented by the variable y. The circuit after the retiming optimization can be expressed as follows:

    let x = t1 in
    let y = fsm { 0 | s =>
      (x.1 & x.2, s)
    }
    in t2

If the AND gate in the original circuit is an XOR gate, then we also need to change the initial state of the transformed FSM above.
Seen from another perspective, retiming transforms are just applications of the laws of implicit state machines. In addition to the transformations presented in lifting and flattening, the following transformations may also serve as laws, because they are semantics-preserving:

    ⟦let x = t1 in t2⟧ = [x ↦ t1]t2                                      inlining
    ⟦fsm { v | s ⇒ (v, t) }⟧ = let s = v in t                            stable state
    ⟦fsm { v | s ⇒ (s, t) }⟧ = let s = v in t                            const state
    ⟦fsm { v | s ⇒ (t1, v2) }⟧ = v2                                      const output
    ⟦fsm { v | s ⇒ (t1, t2) }⟧ = t2          if s is not free in t2      fake state
    ⟦fsm { v | s ⇒ (t1, t2) }⟧ = let x = t1 in fsm { v | s ⇒ (x, t2) }
                                             if s is not free in t1      simple state
    ⟦fsm { v | s ⇒ (x, t2) }⟧ = fsm { v′ | s ⇒ ([s ↦ x]t2, s) }
                                             if t2 ⟶(σ0,∅) v′            retiming

The essence of retiming is succinctly expressed by the last rule, except for the subtlety about the initial state: it requires that t2 evaluates to a value v′ given the initial states σ0 of all FSMs in the program. The empty environment enforces that t2 may not depend on external inputs. Otherwise, we do not see how to preserve semantics in the transform.
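As a worked application of the retiming law to the example above, take fsm { (0, 0) | s ⇒ (x, s.1 & s.2) }, so t2 = s.1 & s.2. Substituting s ↦ x gives x.1 & x.2, and evaluating t2 at the initial state s = (0, 0) yields v′ = 0 & 0 = 0. The law thus produces fsm { 0 | s ⇒ (x.1 & x.2, s) }, which is exactly the retimed circuit.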

4 Implicit State Machine in Scala

To test the feasibility of implicit state machines as a programming model, we implemented an embedded DSL in Scala for hardware construction.

4.1 A Quick Glance

The following code shows how we may implement a half adder in our DSL:

    def halfAdder(a: Signal[Bit], b: Signal[Bit]): Signal[Vec[2]] = {
      val s = a ^ b
      val c = a & b
      c ++ s
    }

In the code above, the type Signal[Bit] means that a is a signal of 1 bit. The type Signal[Vec[2]] means a signal of width 2. Here we take advantage of literal types in Scala, which support the usage of a literal constant as a type. The type Bit means the same as Vec[1]:

    type Bit = Vec[1]

The DSL supports common bit-wise operations like XOR (^), AND (&), OR (|), ADD (+), SUB (-), SHIFT (<< and >>), and MUX (if/then/else). The operator ++ concatenates two bit vectors to form a bigger bit vector. All these operations are supported in Verilog [28], and they follow the same semantics as in Verilog.
We may compose two half adders to create a full adder, which takes a carry cin as input:
    def full(a: Signal[Bit], b: Signal[Bit], cin: Signal[Bit]): Signal[Vec[2]] = {
      val ab = halfAdder(a, b)
      val s = halfAdder(ab(0), cin)
      val cout = ab(1) | s(1)
      cout ++ s(0)
    }

In the above, we make two calls to halfAdder. Each call creates a copy of the half adder circuit to be composed in the full adder. It returns the carry and the sum. We may compose them further to create a 2-bit adder:
    def adder2(a: Signal[Vec[2]], b: Signal[Vec[2]]): Signal[Vec[3]] = {
      val cs0 = full(a(0), b(0), 0)
      val cs1 = full(a(1), b(1), cs0(1))
      cs1(1) ++ cs1(0) ++ cs0(0)
    }

To actually generate a representation of the circuit, we need to specify the input signals:

    val a = variable[Vec[2]]("a")
    val b = variable[Vec[2]]("b")
    val circuit = adder2(a, b)

Now we may generate Verilog code for the circuit:

    circuit.toVerilog("Adder", a, b)

For testing purposes, we can call the interpreter to get the result for a specific input:

    val add2 = circuit.eval(a, b)
    val Value(c1, s1, s0) = add2(Value(1, 0) :: Value(0, 1) :: Nil)
    assertEquals(c1, 0)
    assertEquals(s1, 1)
    assertEquals(s0, 1)

You might be wondering: what about a generic adder that generates circuits for a given width? This can be implemented with a recursion on the number of bits:
     1  def adderN[N <: Num](lhs: Signal[Vec[N]], rhs: Signal[Vec[N]])
     2      : Signal[Bit ~ Vec[N]] = {
     3    val n: Int = lhs.size
     4    def recur(index: Int, cin: Signal[Bit], acc: Signal[Vec[_]]) =
     5      if (index >= n) cin ~ acc.as[Vec[N]]
     6      else {
     7        val cs: Signal[Vec[2]] = full(lhs(index), rhs(index), cin)
     8        recur(index + 1, cs(1), (cs(0) ++ acc.as[Vec[N]]).asInstanceOf)
     9      }
    10
    11    recur(0, lit(false), Vec().as[Vec[_]])
    12  }

In the code above, the type Signal[Bit ~ Vec[N]] means a signal that is a pair: the left part is one bit, and the right part is a bit vector of length N. To construct a signal of such a type, we just connect two signals with ~, as at line 5. At line 8, we use several type casts, due to the fact that Scala currently does not support arithmetic operations at the type level.
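As a hedged usage sketch (our own, reusing the variable API shown earlier; the result shape follows adderN's signature):

    // Hypothetical instantiation of the generic adder at width 8.
    val a8   = variable[Vec[8]]("a")
    val b8   = variable[Vec[8]]("b")
    val add8 = adderN(a8, b8)   // Signal[Bit ~ Vec[8]]: carry ~ sum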

4.2 Sequential Circuits

We show how to create sequential circuits with the example of a moving average. The moving average filter we are going to implement is specified below:

    Y_i = (X_i + 2·X_{i−1} + X_{i−2}) / 4
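For example, with the delay elements initialized to zero and the input sequence 4, 8, 12, the outputs are (4 + 0 + 0)/4 = 1, then (8 + 2·4 + 0)/4 = 4, then (12 + 2·8 + 4)/4 = 8.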

For the input X_i, the output Y_i also depends on the previous values X_{i−1} and X_{i−2}. The FSM that delays a given signal by one clock can be implemented as follows:

    def delay[T <: Type](sig: Signal[T], init: Value): Signal[T] =
      fsm("delay", init) { (last: Signal[T]) =>
        sig ~ last
      }

In the code above, we declare an implicit state machine with the specified initial state init. The body of the FSM is a pair sig ~ last, where the first part becomes the next state, and the second part becomes the output. This is exactly the D flip-flop.
Now we may create the circuit for the moving average:

    def movingAverage(in: Signal[Vec[8]]): Signal[Vec[8]] = {
      let(delay(in, 0.toValue(8))) { z1 =>
        let(delay(z1, 0.toValue(8))) { z2 =>
          (z2 + (z1 << 1) + in) >> 2.W[2]
        }
      }
    }

In the code above, we first create an instance of the delay circuit and bind it to the variable z1. Then we delay the signal z1 and bind it to z2. Finally, the computation is expressed on bit vectors.
Note that it is tempting to implement the same circuit without using the let-bindings:

    def movingAverage(in: Signal[Vec[8]]): Signal[Vec[8]] = {
      val z1 = delay(in, 0.toValue(8))
      val z2 = delay(z1, 0.toValue(8))
      (z2 + (z1 << 1) + in) >> 2.W[2]
    }

The circuit, though it functions the same, needs more gates to implement. The reason is that, in our DSL, the variable definition z1 represents the D flip-flop circuit (not the signal), and each usage of the variable z1 creates a copy of the circuit. Since it is used twice, the circuit is duplicated. The way to avoid duplication is to use let-bindings, which serve the same role as wires: a bound variable may be used multiple times, just like a wire may forward the same signal to multiple gates.
The adder example in the previous section also suffers from this problem. However, to our surprise, the version without let-bindings is optimized better by synthesis tools in our testing. This problem is common in meta-programming, i.e. writing a program that generates another program (possibly in another language). We believe linear type systems might be useful in such settings to ensure that method call results are used linearly: as a method usually synthesizes a piece of code, duplicated usage or no usage is usually a mistake. Meanwhile, method arguments should be non-linear, i.e., they may be used multiple times.

4.3 Optimizations

The synthesized code for the moving average example initially looks like the following (in a notation close to the calculus):

    let x: Vec[8] = fsm { 0 | delay => a ~ delay }
    in
    let x1: Vec[8] = fsm { 0 | delay1 => x ~ delay1 }
    in (x1 + (x << 1) + a) >> 2

After lifting of FSMs, we get the following code:

    fsm { 0 | delay =>
      fsm { 0 | delay1 =>
        let x6: Vec[8] ~ Vec[8] = a ~ delay
        in
        let x: Vec[8] = x6.2
        in
        let x8: Vec[8] ~ Vec[8] =
          let x7: Vec[8] ~ Vec[8] = x ~ delay1
          in
          let x1: Vec[8] = x7.2
          in (x7.1 ~ x1 + (x << 1) + a) >> 2
        in x8.1 ~ x6.1 ~ x8.2
      }
    }

As expected, a lot of unnecessary let-bindings are introduced, and the flattening of FSMs will introduce several more. To eliminate such bindings, we first transform the code into A-normal form (ANF), then perform detupling, which reduces pairs to bit vectors, and finally inline trivial let-bindings. In the end, we get the following compact code:

    fsm { 0 | state =>
      a ++ state(15..8) ++ ((state(7..0) + (state(15..8) << 1) + a) >> 2)
    }

Here state(15..8) holds z1 and state(7..0) holds z2. Eventually, the generated Verilog code looks like the following:
    module Filter (CLK, a, out);
      input CLK;
      input [7:0] a;
      output [7:0] out;
      wire [7:0] out;
      reg [15:0] state;

      assign out = ( ( ( state[7:0] + ( state[15:8] << 1'b1 ) ) + a ) >> 2'b10 );

      initial begin
        state = 16'b0000000000000000;
      end

      always @ (posedge CLK)
        state <= { a, state[15:8] };
    endmodule

In the Verilog code above, only the following line updates the state of the FSM; the other lines compute the next state and output:

    always @ (posedge CLK)
      state <= { a, state[15:8] };

This is typical of the code generated by our DSL compiler: all the code is combinational except one line, no matter how complex the circuit is. Is the generated Verilog efficient? Out of curiosity, we implemented the moving average filter in Chisel:

    class MovingAverage3 extends Module {
      val io = IO(new Bundle {
        val in = Input(UInt(8.W))
        val out = Output(UInt(8.W))
      })
      val z1 = RegNext(io.in)
      val z2 = RegNext(z1)
      io.out := (io.in + (z1 << 1.U) + z2) >> 2.U
    }

Chisel generates the following Verilog code after removing comments and the reset input:

    module MovingAverage3(
      input clock,
      input [7:0] io_in,
      output [7:0] io_out
    );
      reg [7:0] z1;
      reg [7:0] z2;
      wire [8:0] _GEN_0;
      wire [8:0] _T_12;
      wire [8:0] _GEN_1;
      wire [9:0] _T_13;
      wire [8:0] _T_14;
      wire [8:0] _GEN_2;
      wire [9:0] _T_15;
      wire [8:0] _T_16;
      wire [8:0] _T_18;
      assign _GEN_0 = {{1'd0}, z1};
      assign _T_12 = _GEN_0 << 1'h1;
      assign _GEN_1 = {{1'd0}, io_in};
      assign _T_13 = _GEN_1 + _T_12;
      assign _T_14 = _GEN_1 + _T_12;
      assign _GEN_2 = {{1'd0}, z2};
      assign _T_15 = _T_14 + _GEN_2;
      assign _T_16 = _T_14 + _GEN_2;
      assign _T_18 = _T_16 >> 2'h2;
      assign io_out = _T_18[7:0];
      always @(posedge clock) begin
        z1 <= io_in;
        z2 <= z1;
      end
    endmodule


Running the synthesis tool Yosys¹ on both files, we get the following result:

                                  wires   wire bits   public wires   public wire bits   cells
    Chisel (original)               73       147           11               85            85
    Chisel (after correction)       59       106            8               55            73
    Our DSL                         55        84            4               33            73

For all columns, lower is better. The most important is the last column, cells, which gives the number of gates required to implement the circuit. The column wires gives the total number of wires in the synthesized design; the column wire bits gives the total number of wires in bits, as wires may be wider than 1 bit. The column public wires counts the wires that exist in the original design, i.e. not created by Yosys; the column public wire bits is similar.
The difference between the first two lines comes from the fact that Chisel handles << by incrementing the width of the result, which increases wires and gates. Our DSL follows the semantics of Verilog, i.e. it keeps the result the same width as the shifted bit vector. After correcting the semantics of <<, Chisel uses the same number of gates as our DSL, and our DSL still performs better on wire bits. This shows that, at least for simple circuits, our DSL compiler generates circuits on par with an industry-level DSL.

4.4 Case Study: Microcontroller

To further test the usability of the DSL, we implemented a 2-stage accumulator-based microcontroller. The microcontroller supports 20 instructions:

NOP, ADD, ADDI, SUB, SUBI, SHL, SHR, LD, LDI, ST, AND, ANDI, OR, ORI, XOR, XORI, BR, BRZ, BRNZ, EXIT

NOP is the no-op. EXIT is used for testing.
Arithmetic operations have two versions: those with immediate operands (such as ADDI and ORI) and those with indirect operands (such as ADD and OR).
SHL and SHR always have immediate operands.
LD loads a datum from memory into the accumulator. LDI puts the immediate operand in the accumulator. ST stores the value in the accumulator to a memory address.
BR is an unconditional jump. BRZ jumps to the operand address if the accumulator is zero. BRNZ is the opposite of BRZ.

The controller interfaces with a bus, which makes the requested data available on the bus in the next clock cycle:

    type BusOut = Vec[8] ~ Bit ~ Bit ~ Vec[32] // addr ~ read ~ write ~ writedata
    type BusIn = Vec[32] // read data

The signature of the microcontroller generator is as follows:

    def processor(prog: Array[Int], busIn: Signal[BusIn]): Signal[BusOut ~ Debug]

It takes a program prog to store in an on-chip instruction memory, which is different from the external memory connected via the bus. Note that the output type is BusOut ~ Debug, where we add Debug for testing purposes:

¹ https://github.com/YosysHQ/yosys

    type Debug = Vec[32] ~ Vec[_] ~ Vec[16] ~ Bit // acc ~ pc ~ instr ~ exit

Note that the width of the program counter PC is unspecified, because it depends on the size of the given program. If the program size is 62, then the width is 6.
At the high level, the microcontroller is an FSM which contains three architectural state elements:

    fsm("processor", pc0 ~ acc0 ~ pending0) { (state: Signal[PC ~ ACC ~ INSTR]) =>
      val pc ~ acc ~ pendingInstr = state
    }

The variable pc refers to the program counter, acc is the accumulator register, and pendingInstr is the instruction from the last cycle waiting for data from the external memory. The types ACC and INSTR are aliases of Vec[32] and Vec[16] respectively. The type PC is an alias of Vec[addrWidth.type], where addrWidth is a local variable computed from the program size.
The skeleton of the implementation is as follows:
    let("pcNext", pc + 1.W[addrWidth.type]) { pcNext =>
      let("instr", instrMemory(addrWidth, prog, pc)) { instr =>
        let("stage2Acc", stage2(pendingInstr, acc, busIn)) { acc =>
          when (opcode === ADDI.W[8]) {
            val acc2 = acc + operand
            next(acc = acc2)
          } /* ... */ } } }

It first increments the program counter pc and binds the result to pcNext. Then it binds the current instruction to instr. Next, it gets the updated value of the accumulator from the pending instruction. At the circuit level, the three operations are executed in parallel. Finally, the instruction is decoded and executed in a series of when constructs. The when construct is syntactic sugar created from the built-in multiplexer that supports selecting one of two n-bit inputs by a single-bit control. Eventually, each branch calls the local method next with appropriate arguments:
    def next(
      pc: Signal[PC] = pcNext,
      acc: Signal[ACC] = acc,
      pendingInstr: Signal[INSTR] = 0.W[16],
      out: Signal[BusOut] = defaultBusOut,
      exit: Boolean = false
    ): Signal[(PC ~ ACC ~ INSTR) ~ (BusOut ~ Debug)] = {
      val debug = acc ~ (pc.as[Vec[_]]) ~ instr ~ exit
      (pc ~ acc ~ pendingInstr) ~ (out ~ debug)
    }

As can be seen above, the method next defines default values for all arguments, such that each branch need only specify the parameters that differ. For example, the following is the code for the unconditional jump BR and the indirect addition ADD:

    } .when (opcode === BR.W[8]) {
      next(pc = jmpAddr)
    } .when (opcode === ADD.W[8]) {
      next(out = loadBusOut, pendingInstr = instr)
    }

The implementation of the method stage2 just checks the pending instruction and computes the updated accumulator value from the bus input. If the pending instruction is NOP, it simply returns the current value of the accumulator.
The on-chip instruction memory is implemented by generating nested conditional expressions. Each condition tests whether the input address is equal to a memory address; if so, the instruction at that address is returned in the same clock cycle (the conditionals are combinational circuits):
    def instrMemory(addrWidth: Int, prog: Array[Int],
        addr: Signal[Vec[addrWidth.type]]): Signal[Vec[16]] = {
      val default: Signal[Vec[16]] = 0.W[16]
      (0 until (1 << addrWidth)).foldLeft(default) { (acc, curAddr) =>
        when[Vec[16]] (addr === curAddr.W[addrWidth.type]) {
          if (curAddr < prog.size) prog(curAddr).W[16]
          else default
        } otherwise {
          acc
        }
      }
    }

We test the implementation with small assembly programs. Despite the allure of successfully running simple assembly programs, we are aware that the microcontroller is still too simple and may not match quality standards. Our next goal is to implement RISC-V cores and compare them with state-of-the-art open-source implementations by standard metrics.
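For illustration, here is a hypothetical test program of the kind we run, written with the instruction semantics listed above (the binary encoding is elided):

    LDI 3     // acc = 3
    ADDI 4    // acc = acc + 4
    ST 0      // store acc to memory address 0 via the bus
    EXIT      // signal the end of the test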

5 Related Work

Statecharts [12] is a visual formalism which supports hierarchical states and orthogonal states. Its formal semantics is subtle, and was given only several years after its first introduction [14, 24, 11, 13]. Hierarchical states do not automatically give rise to the hierarchical FSMs required for hierarchical module composition in circuit design. In a sense, hierarchical states and hierarchical FSMs are two orthogonal concepts, as hierarchical FSMs do not imply hierarchical states either. Implicit state machines do not support hierarchical states natively, but such an extension is conceptually possible, though what it should look like and whether it is useful in digital design is open to debate. Implicit state machines just do not mandate one separate case for each state in the program, but do not forbid them, hierarchical or not.
An extension of hierarchical FSMs [8] was experimented with in Lucid Synchrone [7] and
integrated into the declarative dataflow language Lustre [6]. The extension is in an imperative
style and desugars to a core dataflow calculus. Since these state machines need to define a
transition for each state separately, their code representation suffers from exponential blowup
after flattening.
Caisson [19] is an imperative language for digital design that supports nested states and
parameterized states. The language contains both registers and FSMs as primitive constructs.
In contrast, our approach is more fundamental in that it makes implicit state machines
the only primitive construct.
Malik et al. [23] proposed using combinational techniques to optimize sequential circuits by
pushing registers to the boundary of the circuit network and cutting loops where needed.
The approach is based on a technique called retiming [17], which changes the timing
behavior of a circuit by moving registers around in the circuit network. We achieve the
same goal without changing the timing behavior of the circuit; the retiming optimization
can still be expressed on top of implicit state machines.

6 Conclusion
It is well known that Boolean algebra is the calculus for combinational circuits. In this paper,
we propose implicit state machines as the calculus for sequential circuits. Implicit state
machines do not mandate a separate case for each state in the specification of an FSM.
Compared to classic FSMs, implicit state machines support arbitrary parallel and hierarchical
composition, which is crucial for real-world programming.
Compared to explicit state machines, which require a separate case for each state, implicit
state machines enjoy a remarkable property: any system of parallel and hierarchical implicit
state machines may be flattened to a single implicit state machine without exponential blowup.
For digital circuits, this means that any sequential circuit can be transformed into an equivalent
circuit with state elements at the boundary and a large combinational core in the center. This
creates more optimization opportunities for digital circuits, and logic synthesis experts no
longer need to worry about combinational boundaries.
There are two directions for future work. First, implicit state machines, owing to their
composability, will make integrated and compositional specification of complex systems
easier. Meanwhile, flattening may also flatten the specifications, which can then be fed into
off-the-shelf verification tools together with the flattened FSMs. In this sense, implicit state
machines bridge the gap between complex systems and verification tools.
Second, implicit state machines may lead to new hardware architectures. For example,
in current FPGA architectures, state elements are scattered across the chip to support
different kinds of sequential circuits. This architecture is still not flexible enough, and
resources are wasted when the distribution of state elements diverges too much from that
of the circuit to be implemented on the FPGA. One possibility is to centralize all state
elements, as any circuit is equivalent to a circuit with state elements at the boundary and
a combinational core, with the same delay and area.

References
1 Jonathan Bachrach, Huy Vo, Brian C. Richards, Yunsup Lee, Andrew Waterman, Rimas Avizienis, John Wawrzynek, and Krste Asanovic. Chisel: Constructing hardware in a Scala embedded language. In DAC Design Automation Conference 2012, pages 1212–1221, 2012.
2 A. Benveniste and G. Berry. The synchronous approach to reactive and real-time systems. Proceedings of the IEEE, 79(9), September 1991. URL: http://ieeexplore.ieee.org/document/97297/, doi:10.1109/5.97297.
3 A. Benveniste, P. Caspi, S.A. Edwards, N. Halbwachs, P. Le Guernic, and R. de Simone. The synchronous languages 12 years later. Proceedings of the IEEE, 91(1), January 2003. URL: http://ieeexplore.ieee.org/document/1173191/, doi:10.1109/JPROC.2002.805826.
4 Albert Benveniste, Paul Le Guernic, and Christian Jacquemot. Synchronous programming with events and relations: the SIGNAL language and its semantics. Science of Computer Programming, 16(2), September 1991. URL: http://www.sciencedirect.com/science/article/pii/016764239190001E, doi:10.1016/0167-6423(91)90001-E.
5 Jerry R. Burch, Edmund M. Clarke, Kenneth L. McMillan, David L. Dill, and Lain-Jinn Hwang. Symbolic model checking: 10^20 states and beyond. Information and Computation, 98(2), 1992.
6 P. Caspi, D. Pilaud, N. Halbwachs, and J. A. Plaice. LUSTRE: A declarative language for real-time programming. In Proceedings of the 14th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, POPL '87, Munich, West Germany, 1987. ACM. URL: http://doi.acm.org/10.1145/41625.41641, doi:10.1145/41625.41641.
7 Paul Caspi, Grégoire Hamon, and Marc Pouzet. Synchronous functional programming: The Lucid Synchrone experiment. 2008.

8 Jean-Louis Colaço, Bruno Pagano, and Marc Pouzet. A conservative extension of synchronous data-flow with state machines. In Proceedings of the 5th ACM International Conference on Embedded Software, EMSOFT '05, Jersey City, NJ, USA, 2005. ACM Press. URL: http://portal.acm.org/citation.cfm?doid=1086228.1086261, doi:10.1145/1086228.1086261.
9 G. De Micheli. Synchronous logic synthesis: algorithms for cycle-time minimization. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 10(1), January 1991. URL: http://ieeexplore.ieee.org/document/62792/, doi:10.1109/43.62792.
10 Giovanni De Micheli, Robert K. Brayton, and Alberto Sangiovanni-Vincentelli. Optimal state assignment for finite state machines. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 4(3):269–285, 1985.
11 Willem-Paul de Roever, Gerald Lüttgen, and Michael Mendler. What is in a step: New perspectives on a classical question. In Zohar Manna and Doron A. Peled, editors, Time for Verification, volume 6200. Springer, Berlin, Heidelberg, 2010. URL: http://link.springer.com/10.1007/978-3-642-13754-9_15, doi:10.1007/978-3-642-13754-9_15.
12 David Harel. Statecharts: a visual formalism for complex systems. Science of Computer Programming, 8(3), June 1987. URL: https://linkinghub.elsevier.com/retrieve/pii/0167642387900359, doi:10.1016/0167-6423(87)90035-9.
13 David Harel and Hillel Kugler. The Rhapsody semantics of statecharts (or, on the executable core of the UML). In Hartmut Ehrig et al., editors, Integration of Software Specification Techniques for Applications in Engineering, volume 3147 of Lecture Notes in Computer Science. Springer, Berlin, Heidelberg, 2004. URL: http://link.springer.com/10.1007/978-3-540-27863-4_19, doi:10.1007/978-3-540-27863-4_19.
14 David Harel and Amnon Naamad. The STATEMATE semantics of statecharts. ACM Transactions on Software Engineering and Methodology, 5(4), October 1996. URL: http://doi.acm.org/10.1145/235321.235322, doi:10.1145/235321.235322.
15 A. Izraelevitz, J. Koenig, P. Li, R. Lin, A. Wang, A. Magyar, D. Kim, C. Schmidt, C. Markley, J. Lawson, and J. Bachrach. Reusability is FIRRTL ground: Hardware construction languages, compiler frameworks, and transformations. In 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pages 209–216, November 2017. doi:10.1109/ICCAD.2017.8203780.
16 M. Keating. The Simple Art of SoC Design. Springer, 2011.
17 Charles E. Leiserson and James B. Saxe. Retiming synchronous circuitry. Algorithmica, 6(1-6), June 1991. URL: http://link.springer.com/10.1007/BF01759032, doi:10.1007/BF01759032.
18 Patrick S. Li, Adam M. Izraelevitz, and Jonathan Bachrach. Specification for the FIRRTL language. Technical Report UCB/EECS-2016-9, EECS Department, University of California, Berkeley, February 2016. URL: http://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-9.html.
19 Xun Li, Mohit Tiwari, Jason K. Oberg, Vineeth Kashyap, Frederic T. Chong, Timothy Sherwood, and Ben Hardekopf. Caisson: A hardware description language for secure information flow. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '11, 2011.
20 Dan Luu. Verilog is weird. https://danluu.com/why-hardware-development-is-hard/. Accessed: 2019-12-24.
21 Dan Luu. Writing safe Verilog. https://danluu.com/pl-troll/. Accessed: 2019-12-24.
22 S. Malik. Analysis of cyclic combinational circuits. In Proceedings of the 1993 International Conference on Computer Aided Design (ICCAD), pages 618–625, 1993.
23 Sharad Malik, Ellen M. Sentovich, and Robert K. Brayton. Retiming and resynthesis: Optimizing sequential networks with combinational techniques. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 10(1), 1991.

24 A. Pnueli and M. Shalev. What is in a step: On the semantics of statecharts. In Takayasu Ito and Albert R. Meyer, editors, Theoretical Aspects of Computer Software, Lecture Notes in Computer Science, Berlin, Heidelberg, 1991. Springer. doi:10.1007/3-540-54415-1_49.
25 Daniel Sanchez. Minispec reference guide. https://6004.mit.edu/web/_static/fall19/resources/references/minispec_reference.pdf, 2019. Accessed: 2019-12-24.
26 Claude E. Shannon. A symbolic analysis of relay and switching circuits. Transactions of the American Institute of Electrical Engineers, 57:713–723, 1938.
27 T.R. Shiple, V. Singhal, R.K. Brayton, and A.L. Sangiovanni-Vincentelli. Analysis of combinational cycles in sequential circuits. In 1996 IEEE International Symposium on Circuits and Systems (ISCAS 96), volume 4, Atlanta, GA, USA, 1996. IEEE. URL: http://ieeexplore.ieee.org/document/542093/, doi:10.1109/ISCAS.1996.542093.
28 IEEE Computer Society. IEEE Standard for Verilog Hardware Description Language. IEEE, 2005.
29 Harald Søndergaard and Peter Sestoft. Referential transparency, definiteness and unfoldability. Acta Informatica, 27:505–517, 1990.
30 Lin Yuan, Gang Qu, Tiziano Villa, and Alberto Sangiovanni-Vincentelli. An FSM reengineering approach to sequential circuit synthesis by state splitting. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 27(6):1159–1164, 2008.
