Delay Insensitive Asynchronous Design
Delay-insensitive design styles fall mainly into two categories according to the level of
abstraction applied [30]: transistor-level and gate-level.
Transistor-level delay-insensitive design styles usually follow Martin's methods [36] for
designing at the transistor level, building optimized and usually state-holding circuits
through formal transformations from logic descriptions. This design style produces
circuits with minimum transistor count [37, 38] and has a specific language and design tool
developed for it [36]. However, because its abstraction is at the transistor level, it is not as
widely supported and automated as gate (logic) level design styles.
Gate-level delay-insensitive design styles set the level of abstraction at the logic design level,
provided that a standard cell library composed of special logic gates is used for circuit
implementation, either totally or partially, alongside ordinary Boolean logic. Such a
library contains logic elements which resemble the Muller C gates [31], in that they hold
their states when certain input conditions are not attained. These are called threshold-logic
gates, of which the best known, and the one incorporated into an automated CAD flow, is Null
Convention Logic (NCL) [16]. In gate-level delay-insensitive design, mutually exclusive
symbol representations are frequently used instead of Boolean representation, even though
Boolean gates are still partially used. There is an increasing degree of automated tool support
for design and verification of gate-level delay-insensitive circuits, due to their suitability for
system-on-chip design constraints.
True delay-insensitive circuits are very hard to realize and are therefore very rare. However, being
the closest approximation, dual-rail threshold logic gates are widely referred to as building
blocks for delay-insensitive circuits in the literature. These circuits are actually
quasi-delay-insensitive, meaning that their functionality relies on the isochronic fork assumption,
which states that all branches of a wire fork have equal delays, or at least those on small circuit scales.
Dual-rail threshold logic gates implement a logic function when a certain input
condition, namely the threshold, is met; otherwise they hold their states. They have been
developed concurrently under different names by different parties for gate-level
delay-insensitive design [33, 34, 35]. The best known is Null Convention Logic (NCL),
developed and commercialized by Theseus Logic Inc. in 1996 to address the
delay-insensitive asynchronous design space [16, 39, 40]. In the NCL style, completion information is
not sent explicitly but is embedded in the data representation, and circuits are constructed entirely
from gates of an NCL-type cell library. The basic principles characterizing dual-rail
threshold logic gates are explained in the following subsections.
A dual-rail signal has two mutually exclusive data paths, D0 and D1, and implements three
logic states {NULL, DATA0, DATA1}, as given in Table 2.1. State DATA1 (D0 = 0
and D1 = 1) represents Boolean logic 1, state DATA0 (D0 = 1 and D1 = 0) represents Boolean
logic 0, and state NULL (D0 = 0 and D1 = 0) indicates that the result is not available yet. The
validity of the output can therefore be determined without a time reference. As the two rails are
mutually exclusive, (D0 = 1 and D1 = 1) is an illegal state.
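The encoding described above can be sketched in a few lines of Python. This is only an illustrative model: the names DualRail, encode, decode, and is_complete are assumptions introduced here and do not come from any NCL tool or library.

```python
from enum import Enum


class DualRail(Enum):
    """Dual-rail states written as (D0, D1) rail pairs (illustrative names)."""
    NULL = (0, 0)   # result not available yet
    DATA0 = (1, 0)  # Boolean logic 0
    DATA1 = (0, 1)  # Boolean logic 1


def encode(bit):
    """Map a Boolean value, or None for 'no data yet', to a dual-rail state."""
    if bit is None:
        return DualRail.NULL
    return DualRail.DATA1 if bit else DualRail.DATA0


def decode(d0, d1):
    """Return 0 or 1 for DATA and None for NULL; reject the illegal (1, 1) pair."""
    if d0 and d1:
        raise ValueError("illegal dual-rail state: D0 = 1 and D1 = 1")
    if not d0 and not d1:
        return None
    return 1 if d1 else 0


def is_complete(word):
    """A word holds valid DATA once every (D0, D1) pair has exactly one rail set."""
    return all(d0 != d1 for d0, d1 in word)
```

Because validity is carried by the rails themselves, is_complete needs no clock or timing reference: it only inspects the current rail values.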
Dual-rail threshold logic circuits are constructed from primitive modules known as
threshold gates with hysteresis [41]. A typical thmn gate, with 1 ≤ m ≤ n, has n inputs, of
which at least m have to become DATA for the output to assert a DATA value. This
is the threshold behavior. Conversely, all n inputs have to return to NULL for
the output to assert NULL. Otherwise the threshold gate maintains its current state,
displaying hysteresis behavior. Specifically, a thnn gate functions like an n-input
C-element, while a th1n gate functions like an n-input OR gate. Two typical gates from the
dual-rail threshold logic library and their truth tables are given in Figure 2.1.
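[Figure 2.1: symbols of the th33 and th13 gates (inputs a, b, c; output z) with their truth
tables. An output entry of z means that the gate holds its previous output.]

      th33                th13
  a  b  c | z         a  b  c | z
  1  1  1 | 1         1  1  1 | 1
  1  1  0 | z         1  1  0 | 1
  1  0  1 | z         1  0  1 | 1
  1  0  0 | z         1  0  0 | 1
  0  1  1 | z         0  1  1 | 1
  0  1  0 | z         0  1  0 | 1
  0  0  1 | z         0  0  1 | 1
  0  0  0 | 0         0  0  0 | 0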
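The truth tables of Figure 2.1 can be reproduced with a small behavioral model. The sketch below is an assumption-laden illustration, not a library API: the class name ThresholdGate is hypothetical, the initial output is assumed to be NULL, and the gate is assumed to reset only when all inputs are back at 0, which is what both truth tables show.

```python
class ThresholdGate:
    """Behavioral model of a thmn gate with hysteresis (illustrative only)."""

    def __init__(self, m, n):
        if not 1 <= m <= n:
            raise ValueError("threshold must satisfy 1 <= m <= n")
        self.m = m
        self.n = n
        self.z = 0  # assumed initial state: NULL

    def update(self, *inputs):
        """Apply new input values and return the resulting output."""
        if len(inputs) != self.n:
            raise ValueError(f"expected {self.n} inputs")
        asserted = sum(1 for x in inputs if x)
        if asserted >= self.m:
            self.z = 1  # threshold met: assert DATA
        elif asserted == 0:
            self.z = 0  # every input is back at NULL: assert NULL
        # in all other cases the previous output is held (hysteresis)
        return self.z
```

With ThresholdGate(3, 3) the output only changes on 111 and 000, matching the C-element rows of the th33 table, while ThresholdGate(1, 3) behaves like a 3-input OR gate.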
The most basic approach to logic design with dual-rail threshold logic gates is to
produce a sum of minterms for each rail of the dual-rail output, in DIMS (Delay
Insensitive Minterms Summation) style [32, 33]. A DIMS-style full adder built from
dual-rail threshold logic gates is illustrated in Figure 2.2.
[Figure: the dual-rail inputs a0/a1, b0/b1, c0/c1 feed eight th33 gates, one per minterm
(a0b0c0 through a1b1c1); four th14 gates combine the minterms into Sum0, Sum1, Cout0, and Cout1.]
Figure 2.2 DIMS Adder Structure built with Dual-Rail Threshold Logic Gates
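A behavioral sketch of such a DIMS full adder is given below. It assumes one th33 gate per minterm and one th14 gate per output rail, with minterms assigned to rails by the usual full-adder parity and majority functions. The helper th and the class DimsFullAdder are hypothetical names introduced for illustration, and the gate model assumes reset only when all inputs are 0.

```python
from itertools import product


def th(m, inputs, prev):
    """thmn gate with hysteresis: set at >= m ones, reset at all zeros, else hold."""
    ones = sum(inputs)
    if ones >= m:
        return 1
    if ones == 0:
        return 0
    return prev


class DimsFullAdder:
    """DIMS-style full adder: eight th33 minterm gates and four th14 output gates."""

    def __init__(self):
        # One th33 state bit per minterm, keyed by the (a, b, c) value it detects.
        self.minterm = {abc: 0 for abc in product((0, 1), repeat=3)}
        self.out = {"sum0": 0, "sum1": 0, "cout0": 0, "cout1": 0}

    def step(self, a, b, c):
        """a, b, c are (D0, D1) rail pairs; returns (sum, cout) as rail pairs."""
        for va, vb, vc in list(self.minterm):
            # Index 0 of a pair is rail D0, index 1 is rail D1.
            rails = (a[va], b[vb], c[vc])
            self.minterm[(va, vb, vc)] = th(3, rails, self.minterm[(va, vb, vc)])
        groups = {"sum0": [], "sum1": [], "cout0": [], "cout1": []}
        for (va, vb, vc), fired in self.minterm.items():
            total = va + vb + vc
            groups[f"sum{total % 2}"].append(fired)    # parity selects the sum rail
            groups[f"cout{total // 2}"].append(fired)  # majority selects the carry rail
        for name, ins in groups.items():
            self.out[name] = th(1, ins, self.out[name])  # th14 acts as a 4-input OR
        o = self.out
        return (o["sum0"], o["sum1"]), (o["cout0"], o["cout1"])
```

In this model both outputs stay NULL until all three inputs carry DATA, and they hold DATA until all three inputs have returned to NULL, because each minterm gate is a C-element over one rail of every input.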
There are other approaches which allow for some degree of Boolean optimization and hence
do not require complete minterms, but rely on C-gates to guarantee delay insensitivity [34,
35].
2.2.4 Transistor Level Design of Dual Rail Threshold Logic Gates
[Figure: static gate schematics with supply VDD, inputs A1, A2, ..., An, and output Y, built from
four transistor networks labeled Go To NULL, Hold NULL, Go To DATA, and Hold DATA.]
a. Structure of M-of-N threshold gate [41] b. Structure of N-of-N threshold gate [41]
Figure 2.3 Static implementation of Dual-Rail threshold gates with hysteresis
After constructing a dual-rail threshold gate according to the given static implementation
rules, further circuit optimizations can be employed to decrease the transistor count and
circuit area or to improve gate response times [30].
Each dual-rail threshold logic circuit requires at least two registration stages: one at the
output to detect the completion of a DATA/NULL value, and one at the input to request the
next NULL/DATA value. More registration stages can be introduced to divide the
functional blocks in a pipelined fashion, as seen in Figure 2.3.
[Figure: pipeline from DATAin to DATAout in which DI latches (n-2), (n-1), (n), and (n+1)
alternate with DI combinational logic blocks.]
[Figure: handshake cycle phases: NULL evaluation, NULL ack, DATA evaluation, DATA ack.]
In a dual-rail threshold logic pipeline, separate pipeline registration stages can be completely
eliminated by embedding registration into the last level of combinational
logic. Since each dual-rail threshold logic gate can inherently hold its state like a register,
the REQ input from the next stage can be fed into the last level of combinational gates of each
pipeline stage as an extra input, and the threshold of these gates can
be increased by 1 to include the REQ input. This reduces the gate count and the DATA-to-DATA
cycle time (TDD), and improves the throughput of the pipeline.
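As a rough illustration of embedded registration, the sketch below turns a th14 output gate of a DIMS stage into a th25 gate by adding REQ as a fifth input. It rests on several assumptions that are not stated in the text: REQ = 1 is taken to mean that the next stage requests DATA and REQ = 0 that it requests NULL, at most one minterm input is high at a time (as in a DIMS stage), and the function names are hypothetical.

```python
def th(m, inputs, prev):
    """thmn gate with hysteresis: set at >= m ones, reset at all zeros, else hold."""
    ones = sum(inputs)
    if ones >= m:
        return 1
    if ones == 0:
        return 0
    return prev


def or_stage_with_req(minterms, req, prev):
    """th14 output gate with REQ added as a fifth input and the threshold raised to 2.

    Assumes the four minterm inputs are mutually exclusive, so the threshold of 2
    can only be reached when one minterm and REQ are both high.
    """
    return th(2, list(minterms) + [req], prev)
```

Under these assumptions the gate passes DATA only once the next stage has requested it, and releases DATA only once NULL has arrived and been requested, which is the behavior a separate registration stage would otherwise provide.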
Dual-rail threshold logic circuits need to obey certain criteria to maintain delay
insensitivity. These can be summarized as follows:
(i) Completeness of Input requires that the outputs of a combinational circuit may not all
transition from NULL to DATA until all inputs have transitioned from NULL to DATA, and
may not all transition from DATA to NULL until all inputs have transitioned from DATA to
NULL. For circuits with multiple outputs, Seitz's Weak Conditions for Completeness of
Input [44] allow some outputs to transition without having a complete input set, as long as
not all outputs can transition before all inputs arrive.
(ii) Observability requires that every input and internal wire transition in the circuit
causes a transition in at least one of the outputs [30, 40]. Transitions that are not used in
determining the outputs, called orphans, are not allowed to propagate through gate
boundaries.
2.4 Pipelining Criteria
Dual-rail threshold logic circuits lend themselves easily to pipelining, but pipelining
requires an additional criterion to be obeyed for delay insensitivity. To maintain proper
control flow in a pipelined dual-rail threshold logic circuit, so that NULL and DATA
waves do not interact within a pipeline stage and violate delay insensitivity, the
evaluation time of the ACK output of each pipeline stage should not be greater than the arrival
time of the REQ input to that stage, which is fed back from the next pipeline stage
as its ACK output, as formulated in (1):
$$\mathrm{Time}[\mathrm{input}, \mathrm{ACK}]_{n} \le \mathrm{Time}[\mathrm{input}, \mathrm{REQ}]_{n} = \mathrm{Time}[\mathrm{input}, \mathrm{ACK}]_{n+1} \qquad (1)$$
Due to their ease of pipelining, Dual-rail Threshold logic circuits could be intrinsically
transformed into systolic arrays for increased throughput in data processing. In systolic
arrays, data exchange is localized to adjacent systoles so global data paths are eliminated.
With asynchronous design, global control paths (clock signals) are also eliminated and
replaced with local handshaking signals. A delay-insensitive bit-level pipelined systolic
array with embedded registration is shown in Figure 2.6.
[Figure 2.6: DI Systole blocks connected in an array.]
Bit-level pipelining in systolic arrays has the advantage of reducing the latency of the circuit
to the latency of a single systole, so that the speed of a single systole determines the overall
throughput of the systolic array and the throughput can be kept
constant as the array dimensions increase. However, bit-level pipelining introduces an additional
criterion for delay insensitivity, called Completion Completeness [45], when
bit-wise completion is used at the registration stages and the combinational parts of the circuit
conform only to the Weak Condition for Completeness of Input.
Completion Completeness is based on the fact that a dual-rail threshold logic registration
stage, which acknowledges either a DATA output or a NULL output, can only assure the
completeness of the output, not the completeness of the input [45]. This may cause
consecutive DATA/NULL wavefronts to interact and violate delay-insensitive operation when
bit-wise completion is adopted instead of word-wise completion to increase the throughput
of the dual-rail threshold logic pipeline and the combinational parts conform only to the
Weak Condition for Completeness of Input. The reason is that, in bit-wise completion, the
completion signal of each output bit is sent only to the dual-rail threshold logic registers that
took part in the calculation of that output bit, so an output bit does not reflect all input
transitions individually.