12a Timing Optimization
Timing Optimization
Outline:
Definitions and problem statement
Overview of techniques (motivated by adders)
Timing Optimization
Factors determining delay of circuit:
Underlying circuit technology
Parasitics
Wire loads
Layout
Problem Statement
Given:
Initial circuit function description
Library of primitive functions
Performance constraints (arrival/required times)
Generate:
an implementation of the circuit using the primitive
functions, such that:
1. performance constraints are met
2. circuit area is minimized
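Whether performance constraints are met can be checked by propagating arrival times forward and required times backward; the slack of a node is required minus arrival. The following is a minimal sketch, assuming a unit gate delay and a netlist given as a dictionary from each gate to its fanins. The function name `slacks`, the delay model, and the signal names are illustrative assumptions, not part of the lecture.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def slacks(fanins, input_arrival, output_required, gate_delay=1):
    """Return slack (required - arrival) for every node.

    fanins: gate name -> list of fanin names (assumed non-empty);
            names that are not keys are primary inputs.
    input_arrival: arrival time of every primary input.
    output_required: required time of every primary output.
    """
    order = list(TopologicalSorter(fanins).static_order())  # fanins first

    # Forward pass: arrival = latest fanin arrival + gate delay.
    arrival = dict(input_arrival)
    for v in order:
        if v in fanins:
            arrival[v] = max(arrival[u] for u in fanins[v]) + gate_delay

    # Backward pass: required = earliest requirement imposed by any fanout.
    required = {v: float("inf") for v in order}
    required.update(output_required)
    for v in reversed(order):
        if v in fanins:
            for u in fanins[v]:
                required[u] = min(required[u], required[v] - gate_delay)

    return {v: required[v] - arrival[v] for v in order}


# Hypothetical two-gate circuit: g1 = f(a, b), g2 = f(g1, c).
netlist = {"g1": ["a", "b"], "g2": ["g1", "c"]}
result = slacks(netlist, {"a": 0, "b": 0, "c": 0}, {"g2": 2})
```

A negative slack on any node means the constraint is violated; nodes with zero slack lie on a critical path.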
[Figure: design flow with levels Behavior, Gate netlist, and Layout, and the steps: behavioral optimization (scheduling), partitioning (retiming), logic synthesis (technology independent, technology mapping), timing driven place and route. Annotations: Touati (LT-trees), Singh.]
4. Resizing
Focus here on circuit re-structuring
Circuit re-structuring
Approaches:
Local:
Mimic optimization techniques in adders
Global:
Reduce depth of entire circuit
Partial collapsing
Boolean simplification
Re-structuring methods
Performance measured by
1. levels,
2. sensitizable paths,
3. technology dependent delays
Tree height reduction (Singh 88)
Partial collapsing and simplification (Touati 91)
Generalized select transform (Berman 90)
Sensitizable paths
[Figure: tree height reduction example (Singh 88). Nodes h through n are annotated with arrival times; the inputs are b, c, d, e, f. The critical region is collapsed and some logic is duplicated. New delay = 5.]
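The idea behind tree height reduction can be illustrated on a single associative operator (for example, a wide AND built from 2-input gates). The sketch below is not Singh's algorithm; it is a common greedy that combines the two earliest-arriving signals first, assuming a unit gate delay. The function names and the arrival times are illustrative assumptions.

```python
import heapq


def balanced_arrival(arrivals, gate_delay=1):
    """Output arrival when the two earliest signals are always combined first."""
    heap = list(arrivals)  # assumed non-empty
    heapq.heapify(heap)
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        # A 2-input gate fires one delay after its later input.
        heapq.heappush(heap, max(a, b) + gate_delay)
    return heap[0]


def chain_arrival(arrivals, gate_delay=1):
    """Output arrival of a left-to-right chain of 2-input gates."""
    t = arrivals[0]
    for a in arrivals[1:]:
        t = max(t, a) + gate_delay
    return t


# Hypothetical arrival times: one late input (5) placed first in the chain.
times = [5, 2, 0, 0]
before = chain_arrival(times)     # late signal passes through three gates
after = balanced_arrival(times)   # late signal passes through one gate
```

With these numbers the chain arrives at time 8 and the restructured tree at time 6, because the late input is moved next to the output.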
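[Figure: generalized bypass transform, GBX (McGeer 91). The path f_m = f, f_{m+1}, ..., f_n = g is bypassed by a multiplexer whose control input is the Boolean difference dg/df. The s-a-0 fault on the control input is redundant.]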
GBX gives little area increase, BUT it creates an untestable
fault (on the control input to the multiplexor)
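The role of the Boolean difference in the bypass can be checked on a small function. The sketch below assumes a hypothetical g = (f AND b) OR c, which is positive unate in f, so whenever dg/df = 1 the output equals f and a multiplexer may pass f directly. The helper names and the example function are assumptions for illustration only.

```python
from itertools import product


def cofactor(func, var, value):
    """Return func with input `var` fixed to `value`."""
    return lambda env: func({**env, var: value})


def boolean_difference(func, var):
    """d(func)/d(var) = func[var=1] XOR func[var=0]."""
    hi, lo = cofactor(func, var, 1), cofactor(func, var, 0)
    return lambda env: hi(env) ^ lo(env)


def g(env):
    # Hypothetical example function, positive unate in f.
    return (env["f"] & env["b"]) | env["c"]


dg_df = boolean_difference(g, "f")


def g_bypassed(env):
    # Multiplexer controlled by dg/df: select f directly when g is
    # sensitive to f, otherwise keep the original (slow) output.
    return env["f"] if dg_df(env) else g(env)


# Exhaustive equivalence check over all input assignments.
equivalent = all(
    g_bypassed(dict(zip("fbc", bits))) == g(dict(zip("fbc", bits)))
    for bits in product((0, 1), repeat=3)
)
```

When the long path through f is false, dg/df never evaluates to 1 in operation, which is consistent with the stuck-at-0 fault on the control input being untestable.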
KMS transform: (remove false paths without increasing delay)
1.
2.
3. f'_j (the duplicate of f_j) fans out to every fanout of f_j except f_{j+1}, and f_j fans out only to f_{j+1}
4. Set the f_0 input to f_1 to its controlling value and propagate the constant (possible because
the path is false and does not fan out)
KMS results
[Figure: circuit at successive steps of the KMS transform; node labels f_k, f_{k+1}, f_m, f_{m+1}, f_n.]
Delay is not increased
End of lecture 20
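[Figure: generalized select transform, GST (Berman 90). The logic computing out from inputs a, b, c is evaluated with a=0 and with a=1, and a selects between the two results.]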
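The select transform can be read as a Shannon expansion about the selecting input: both cofactors are computed in parallel and the input only drives the final multiplexer. A minimal sketch follows, assuming a hypothetical function out = (a OR b) AND c; the names `original` and `select_transform` are illustrative and not from the lecture.

```python
from itertools import product


def original(a, b, c):
    # Hypothetical example: a passes through two gate levels.
    return (a | b) & c


def select_transform(func):
    """Rebuild func so that its first input only controls a final mux."""
    def transformed(a, *rest):
        when0 = func(0, *rest)  # copy of the logic with a = 0
        when1 = func(1, *rest)  # copy of the logic with a = 1
        return when1 if a else when0  # the multiplexer
    return transformed


fast = select_transform(original)

# Exhaustive equivalence check over all input assignments.
same = all(
    fast(a, b, c) == original(a, b, c)
    for a, b, c in product((0, 1), repeat=3)
)
```

In hardware the two copies would be simplified separately (here they reduce to b AND c and to c), which is where the area cost of duplication comes from.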
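GST vs GBX
[Figure: the same function h of inputs a and b implemented with GBX and with GST. GBX uses a multiplexer controlled by the Boolean difference dh/da; GST uses copies of the logic with a=0 and a=1 and a multiplexer controlled by a.]
Note: Boolean difference dh/da = h_a XOR h_a', the XOR of the cofactors of h with respect to a=1 and a=0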
Select transform appears to be more area efficient
But the Boolean difference is generally formed more efficiently in practice
No delay/speedup advantage for either transform
Need one MUX per fanout in GST, only one MUX in GBX
[Figure: GST on a circuit with input a (cofactors a=0 and a=1), input b, and two outputs out1 and out2.]
Technology independent delay reductions
Generally THR, GBX, GST (critical path based methods)
work OK, but not great
Why are technology independent delay reductions hard?
Lack of fast and accurate delay models
[Figure: plot with axis labels "better" and "slower".]
Clustering/partial-collapse
Traditional critical-path based methods require
Well defined critical path
Good delay/slack information
Problems:
Possible solutions:
Clustering/partial-collapse
Two-level circuits are fast
Details
Fast: O(m x k)
Example of clustering
[Figure: example networks with cluster size k=3; nodes are annotated with labels 0, 1, and 2.]
Result: Lawler's algorithm gives minimum depth circuit
Typically,
1. we decompose the initial circuit into 2-input NANDs and inverters.
2. then cluster size k reflects the # of 2-input NANDs to be collapsed together.
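The labeling phase of Lawler's clustering can be sketched as follows. This is a minimal version under stated assumptions: primary inputs are not labeled or counted, gates fed only by primary inputs get label 0, and a gate keeps the maximum fanin label p only if fewer than k gates in its transitive fanin already carry label p. The function name and the netlist are hypothetical, and the transitive fanin sets are built naively rather than in the O(m x k) time an efficient implementation would achieve.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def lawler_labels(fanins, k):
    """Label gates so that same-label gates feeding a gate fit in a cluster of size k.

    fanins: gate name -> list of fanin names; names that are not keys
            are primary inputs.
    """
    label = {}
    tfi = {}  # gate -> set of gates in its transitive fanin
    for v in TopologicalSorter(fanins).static_order():  # fanins first
        if v not in fanins:
            continue  # primary input: not labeled
        gate_fanins = [u for u in fanins[v] if u in fanins]
        tfi[v] = set()
        for u in gate_fanins:
            tfi[v] |= tfi[u] | {u}
        if not gate_fanins:
            label[v] = 0
            continue
        p = max(label[u] for u in gate_fanins)
        same = sum(1 for u in tfi[v] if label[u] == p)
        # v joins the label-p cluster only if it still fits within k nodes.
        label[v] = p if same < k else p + 1
    return label


# Hypothetical chain of four 2-input gates.
chain = {
    "g1": ["a", "b"],
    "g2": ["g1", "c"],
    "g3": ["g2", "d"],
    "g4": ["g3", "e"],
}
labels_k3 = lawler_labels(chain, 3)
labels_k2 = lawler_labels(chain, 2)
depth_k3 = max(labels_k3.values()) + 1  # number of cluster levels
```

For the chain, k=3 collapses the first three gates into one level and leaves the fourth on a second level, so the clustered depth is 2.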
Choosing k
I(k) is minimized
Break ties using d(k)
Minimize d(k)
[Figure: plot of d(k) and I(k); marked values 1 and k0.]
Area recovery
Area increase is due to node duplication
Two solutions:
Relabeling procedure:
Attempt to increase node labels without exceeding cluster size
In reverse topological order
Start : assign
Increase label(u) if
1. new-label(u) <= label(v) for each fanout v and
2. new-label(u) = new-label(v) for each fanout v only if label(u) = label(v) before relabeling, and
3. no cluster size is violated
Relabeling example
[Figure: node labels (0 and 1) before and after relabeling.]
Full_simplify
Redundancy removal
Conclusions
Variety of methods for delay optimization