12a Timing Optimization


Logic Restructuring for Timing Optimization
Outline:
  Definitions and problem statement
  Overview of techniques (motivated by adders)
    Tree height reduction (THR)
    Generalized bypass transform (GBX)
    Generalized select transform (GST)
    Partial collapsing (?)

Timing Optimization
Factors determining the delay of a circuit:
  Underlying circuit technology
    Circuit type (e.g. domino, static CMOS, etc.)
    Gate type
    Gate size
  Logical structure of the circuit
    Length of computation paths
    False paths
  Buffering
  Parasitics
    Wire loads
    Layout

Problem Statement
Given:
Initial circuit function description
Library of primitive functions
Performance constraints (arrival/required times)
Generate:
an implementation of the circuit using the primitive functions, such that:
1. performance constraints are met
2. circuit area is minimized

Current Design Process
Behavioral description
  ↓ Behavior optimization (scheduling)
Logic and latches
  ↓ Partitioning (retiming)
Logic equations
  ↓ Logic synthesis: technology independent, then technology mapping
    (inputs: gate library, perf. constraints, delay models)
Gate netlist
  ↓ Timing driven place and route
Layout

Technology mapping for delay
[Figure: a function tree driving a buffer tree.]

Overview of Solutions for Delay
1. Circuit re-structuring
  Rescheduling operations to reduce the time of computation
2. Implementation of function trees (technology mapping)
  Selection of gates from the library
    Minimum delay (load-independent model, Kukimoto)
    Minimize delay and area (Jongeneel, DAC00), combining Lehman-Watanabe and Kukimoto
3. Implementation of buffer trees
  Touati (LT-trees)
  Singh
4. Resizing

Focus here: circuit re-structuring.

Circuit re-structuring
Approaches:
  Local: mimic optimization techniques used in adders
    Carry lookahead (THR, tree height reduction)
    Conditional sum (GST transformation)
    Carry bypass (GBX transformation)
  Global: reduce the depth of the entire circuit
    Partial collapsing
    Boolean simplification

Re-structuring methods
Performance measured by:
  1. levels,
  2. sensitizable paths,
  3. technology dependent delays.
Level based optimizations:
  Tree height reduction (Singh 88)
  Partial collapsing and simplification (Touati 91)
  Generalized select transform (Berman 90)
Sensitizable paths:
  Generalized bypass transform (McGeer 91)

Re-structuring for delay: tree-height reduction
[Figure: tree-height reduction on an example network (internal nodes h through n, inputs b through f, arrival times annotated). The critical region is collapsed and the logic it shares with the rest of the network is duplicated.]
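The effect of height reduction can be illustrated in the simplest setting: one associative operator (say a chain of 2-input ANDs) under an assumed unit delay per 2-input gate. This is a simplified sketch, not Singh's collapse-and-redecompose procedure: it only compares the output arrival time of a ripple-style chain with that of a tree rebuilt by always combining the two earliest-arriving signals. The function names are hypothetical.

```python
import heapq
from typing import List


def chain_arrival(arrivals: List[int]) -> int:
    """Output arrival time of a left-to-right chain of 2-input gates.

    Assumes every 2-input gate has unit delay.
    """
    t = arrivals[0]
    for x in arrivals[1:]:
        t = max(t, x) + 1
    return t


def reduced_tree_arrival(arrivals: List[int]) -> int:
    """Output arrival time after rebuilding the same associative function as a
    tree that repeatedly combines the two earliest-arriving signals."""
    heap = list(arrivals)
    heapq.heapify(heap)
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        heapq.heappush(heap, max(a, b) + 1)  # one more unit-delay gate
    return heap[0]


if __name__ == "__main__":
    print(chain_arrival([0] * 8), reduced_tree_arrival([0] * 8))  # 7 3
    print(chain_arrival([5, 0, 0, 0]), reduced_tree_arrival([5, 0, 0, 0]))  # 8 6
```

In the second case the late input (arrival 5) sits at the start of the chain and is pushed through every gate; the rebuilt tree lets it enter at the last gate instead.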

Restructuring for delay: path reduction
[Figure (Singh 88): path reduction on the example network. The critical region is collapsed and shared logic is duplicated; the new delay is 5.]

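Generalized bypass transform (GBX)
Make the critical path false, speeding up the circuit:
  bypass the logic of the critical path(s).
[Figure (McGeer 91): the path fm = f → fm+1 → … → fn = g is bypassed. The output g is taken from a multiplexor whose select input is the Boolean difference $\partial g/\partial f$; a stuck-at-0 fault on that select input is redundant.]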
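The key quantity in GBX is the Boolean difference $\partial g/\partial f = g|_{f=0} \oplus g|_{f=1}$, which is 1 exactly when g is sensitive to f. The sketch below is a functional check only, under an assumed multiplexor orientation chosen to match the slide's note: when the select is 0 the original logic is used (g does not depend on f there, so the long path from f cannot be sensitized), and when it is 1, g = f XOR g|f=0, a short path from f. A select stuck at 0 then still yields g, which is the untestable fault. The helper names and the 2-bit ripple-carry example are illustrative assumptions.

```python
from itertools import product
from typing import Callable, Tuple

# A function g(f, rest): f is the path signal, rest holds all other inputs.
BoolFn = Callable[[int, Tuple[int, ...]], int]


def boolean_difference(g: BoolFn) -> Callable[[Tuple[int, ...]], int]:
    """dg/df = g|f=0 XOR g|f=1; 1 exactly where g is sensitive to f."""
    return lambda rest: g(0, rest) ^ g(1, rest)


def bypassed(g: BoolFn) -> BoolFn:
    """Multiplexor with select dg/df (assumed orientation, see lead-in)."""
    dg = boolean_difference(g)

    def g_new(f: int, rest: Tuple[int, ...]) -> int:
        if dg(rest):
            return f ^ g(0, rest)  # select = 1: short path from f
        return g(f, rest)          # select = 0: original logic, independent of f here

    return g_new


def maj(a: int, b: int, c: int) -> int:
    return (a & b) | (a & c) | (b & c)


def carry2(cin: int, rest: Tuple[int, ...]) -> int:
    """Carry out of a 2-bit ripple-carry adder; cin plays the role of f."""
    a0, b0, a1, b1 = rest
    return maj(a1, b1, maj(a0, b0, cin))


if __name__ == "__main__":
    dg = boolean_difference(carry2)
    g_new = bypassed(carry2)
    for rest in product((0, 1), repeat=4):
        a0, b0, a1, b1 = rest
        # For the adder, dg/dcin is the AND of the propagate signals.
        assert dg(rest) == (a0 ^ b0) & (a1 ^ b1)
        for cin in (0, 1):
            assert g_new(cin, rest) == carry2(cin, rest)
    print("bypassed carry is equivalent")
```

On the adder, the select reduces to the product of the propagate signals, which is the familiar carry-bypass structure.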

GBX and KMS transform
GBX adds little area, BUT it creates an untestable fault (on the control input of the multiplexor).
KMS transform (removes false paths without increasing delay):
1. Let fk be the last node on the false path that fans out.
2. Duplicate the false path {f1, …, fk} -> {f1', …, fk'}.
3. fj' fans out to every fanout of fj except fj+1, and fj fans out only to fj+1.
4. Set the input from f0 to f1 to its controlling value and propagate the constant (allowed because the path is false and no longer fans out).
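Steps 1 to 3 are pure rewiring and can be sketched on a hypothetical netlist representation (gate name mapped to its list of fanins; primary inputs appear only as fanins). The names `kms_duplicate` and `fanouts` are illustrative, primes mark the copies, and the sketch assumes the given path is already known to be false; it does not check that, and it does not perform step 4 (constant propagation).

```python
from typing import Dict, List, Tuple

# Hypothetical netlist: gate name -> list of fanin names.
Netlist = Dict[str, List[str]]


def fanouts(net: Netlist, u: str) -> List[str]:
    return [w for w, ins in net.items() if u in ins]


def kms_duplicate(net: Netlist, path: List[str]) -> Tuple[Netlist, int]:
    """Steps 1-3 of the KMS transform for a false path [f0, f1, ..., fn]."""
    n = len(path) - 1
    # Step 1: fk is the last node on the path (before fn) that fans out.
    k = max((j for j in range(1, n) if len(fanouts(net, path[j])) > 1), default=0)
    g = {u: list(ins) for u, ins in net.items()}
    for j in range(1, k + 1):
        fj, nxt, prev = path[j], path[j + 1], path[j - 1]
        dup = fj + "'"
        # Step 2: copy fj; the copy reads the copy of f(j-1) (f0 is not copied).
        g[dup] = [prev + "'" if (j > 1 and v == prev) else v for v in g[fj]]
        # Step 3: every fanout of fj except f(j+1) now reads the copy,
        # so the original fj drives only f(j+1).
        for w in fanouts(net, fj):
            if w != nxt:
                g[w] = [dup if v == fj else v for v in g[w]]
    return g, k


if __name__ == "__main__":
    net = {
        "f1": ["f0", "s1"],
        "f2": ["f1", "s2"],
        "f3": ["f2", "s3"],
        "h": ["f1", "s3"],  # side fanout of f1
    }
    g, k = kms_duplicate(net, ["f0", "f1", "f2", "f3"])
    print(k, g["h"])  # 1 ["f1'", 's3']
```

After the rewiring, f1 through f3 form a chain with no side fanouts, which is what makes step 4 safe.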

KMS results
1. The function of every node except f1, …, fk is unchanged.
2. Added k-1 nodes.
3. The added area is linear in the length of the false paths; in practice the area increase is small.

KMS (Keutzer, Malik, Saldanha 90)
[Figure: the path segment fm, fm+1, …, fk is duplicated, and the original path continues through fk+1 to fn. Delay is not increased.]

End of lecture 20


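Generalized select transform (GST)
Late signal feeds multiplexor
[Figure (Berman 90): a circuit computing out from a late signal a and inputs b, c is rewritten so that copies of the logic with a = 0 and a = 1 are computed from b and c, and a drives the select input of a multiplexor producing out.]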

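GST vs GBX
[Figure: GST and GBX applied to a function h with late input a. GST selects between the a = 0 and a = 1 copies with a; GBX uses a multiplexor controlled by the Boolean difference. Note: Boolean difference $\partial h/\partial a = h_a \oplus h_{\bar a}$.]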

GST vs GBX
Select transform appears to be more area efficient,
but the Boolean difference is generally formed more efficiently in practice.
No delay/speedup advantage for either transform.
GST needs one MUX per fanout; GBX needs only one MUX.
[Figure: GST on a node with two outputs, out1 and out2; each output needs its own multiplexor selecting between the a = 0 and a = 1 copies.]
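The select transform is a Shannon expansion on the late signal: both cofactors are computed from the early inputs only, and a merely drives the multiplexor select. The sketch below checks this functionally and compares arrival times under an assumed unit-delay model in which the late input a enters at the top of a logic cone of a given depth and the multiplexor adds one level. The function names and the example `h` are hypothetical.

```python
from itertools import product
from typing import Callable, Tuple

BoolFn = Callable[[int, Tuple[int, ...]], int]  # h(a, rest); a is the late input


def select_transform(h: BoolFn) -> BoolFn:
    """Compute h|a=0 and h|a=1 from the early inputs, then mux on a."""
    def out(a: int, rest: Tuple[int, ...]) -> int:
        h0 = h(0, rest)  # copy of the logic with a = 0
        h1 = h(1, rest)  # copy of the logic with a = 1
        return h1 if a else h0
    return out


def arrival_original(t_a: int, t_rest: int, depth: int) -> int:
    """Assumed model: a passes through all `depth` levels of the cone."""
    return max(t_a, t_rest) + depth


def arrival_gst(t_a: int, t_rest: int, depth: int) -> int:
    """Cofactor copies are ready at t_rest + depth; the mux adds one level."""
    return max(t_a, t_rest + depth) + 1


def example_h(a: int, rest: Tuple[int, ...]) -> int:
    b, c = rest
    return (a & b) ^ c


if __name__ == "__main__":
    out = select_transform(example_h)
    for a in (0, 1):
        for rest in product((0, 1), repeat=2):
            assert out(a, rest) == example_h(a, rest)
    print(arrival_original(10, 0, 4), arrival_gst(10, 0, 4))  # 14 11
```

A node with several fanouts would need one such multiplexor per output, which is the area cost noted above.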

Technology independent delay reductions
Generally, THR, GBX, and GST (critical-path based methods) work reasonably well, but not great.
Why are technology independent delay reductions hard?
  Lack of fast and accurate delay models. Going down the list, models get better but slower:
  1. # levels: fast but crude
  2. # levels + correction term (fanout, wires, …): a little better, but still crude (what coefficients to use?)
  3. Technology mapped: reasonable, but very slow
  4. Place and route: better, but extremely slow
  5. Silicon: best, but infeasibly slow (except for FPGAs)
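The first two models are easy to compute on a netlist. The sketch below uses a hypothetical gate-to-fanins representation; model 2 assumes a made-up linear gate delay, alpha + beta * (number of gate fanouts), with placeholder coefficients (primary-output loads are ignored). Picking real values for these coefficients is exactly the open question raised above.

```python
from typing import Dict, List

# Hypothetical netlist: gate -> fanins; primary inputs appear only as fanins.
Netlist = Dict[str, List[str]]


def topo_order(net: Netlist) -> List[str]:
    order: List[str] = []
    seen = set()

    def visit(u: str) -> None:
        if u in seen or u not in net:  # skip primary inputs and visited gates
            return
        seen.add(u)
        for v in net[u]:
            visit(v)
        order.append(u)

    for u in net:
        visit(u)
    return order


def depth_levels(net: Netlist) -> int:
    """Model 1: number of gate levels on the longest path."""
    level: Dict[str, int] = {}
    for u in topo_order(net):
        level[u] = 1 + max((level.get(v, 0) for v in net[u]), default=0)
    return max(level.values(), default=0)


def depth_with_fanout(net: Netlist, alpha: float = 1.0, beta: float = 0.5) -> float:
    """Model 2 (assumed form): gate delay = alpha + beta * fanout count.
    alpha and beta are placeholders, not calibrated values."""
    nfo = {u: 0 for u in net}
    for ins in net.values():
        for v in ins:
            if v in nfo:
                nfo[v] += 1
    arr: Dict[str, float] = {}
    for u in topo_order(net):
        start = max((arr.get(v, 0.0) for v in net[u]), default=0.0)
        arr[u] = start + alpha + beta * nfo[u]
    return max(arr.values(), default=0.0)


if __name__ == "__main__":
    net = {"g1": ["a", "b"], "g2": ["g1", "a"], "g3": ["g1", "b"]}
    print(depth_levels(net), depth_with_fanout(net))  # 2 3.0
```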


Clustering/partial-collapse
Traditional critical-path based methods require:
  A well defined critical path
  Good delay/slack information
Problems:
  Good delay information comes from the mapper and layout
  Delay estimates and models are weak
Possible solutions:
  Better delay modeling at the technology independent level
  Make speedup insensitive to actual critical paths and mapped delays

Clustering/partial-collapse
Two-level circuits are fast.
Collapsing the whole circuit to two levels has drawbacks:
  Huge area penalty
  Huge capacitive loading on inputs (can be much slower)
To avoid the huge area penalty:
  Identify clusters of nodes; each cluster has some fixed size
  Perform collapse of each cluster
  Simplify each node
Details:
  How to choose the clusters?
  How to choose the cluster size?
  How to simplify each node?

Lawler's clustering algorithm
Optimal in delay for a given cluster size
May duplicate nodes (hence a possible area penalty)
  Not optimal w.r.t. duplication
  Use a heuristic
Fast: O(m × k)
  m = number of edges in the network
  k = maximum cluster size

Clustering algorithm - overview
1. Label phase (inputs to outputs; k is the cluster size):
  If node u is an input, label(u) := 0
  Else L := max label of the fanins of u
    If (# nodes in TFI(u) with label = L) >= k, then label(u) := L+1
    else label(u) := L
2. Cluster phase (outputs to inputs):
  If node u is an output, L := infinity
  Else L := max label of the fanouts of u
  If label(u) < L, create a new cluster with root u whose members are all the nodes in TFI(u) with label = label(u)
3. Collapse phase (order independent):
  Collapse all nodes in a cluster into a single node
Note: a node may be in several clusters (this causes area increase).
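The label and cluster phases above can be sketched directly. This is a minimal sketch on a hypothetical gate-to-fanins netlist; it assumes TFI(u) excludes u itself and that only gates (not primary inputs) count toward the cluster size and belong to clusters. It recomputes transitive fanins naively, so it does not achieve the O(m × k) bound, and the collapse phase is omitted.

```python
from typing import Dict, List, Set

# Hypothetical netlist: gate -> fanins; names that are never keys are primary inputs.
Netlist = Dict[str, List[str]]


def topo_order(net: Netlist) -> List[str]:
    order: List[str] = []
    seen: Set[str] = set()

    def visit(u: str) -> None:
        if u in seen or u not in net:
            return
        seen.add(u)
        for v in net[u]:
            visit(v)
        order.append(u)

    for u in net:
        visit(u)
    return order


def tfi(net: Netlist, u: str) -> Set[str]:
    """Gates in the transitive fanin of u, excluding u itself."""
    found: Set[str] = set()
    stack = list(net[u])
    while stack:
        v = stack.pop()
        if v in net and v not in found:
            found.add(v)
            stack.extend(net[v])
    return found


def label_phase(net: Netlist, k: int) -> Dict[str, int]:
    label: Dict[str, int] = {}
    for u in topo_order(net):  # inputs to outputs
        L = max((label.get(v, 0) for v in net[u]), default=0)  # inputs have label 0
        same = sum(1 for v in tfi(net, u) if label[v] == L)
        label[u] = L + 1 if same >= k else L
    return label


def cluster_phase(net: Netlist, outputs: Set[str],
                  label: Dict[str, int]) -> Dict[str, Set[str]]:
    fo: Dict[str, List[str]] = {u: [] for u in net}
    for w, ins in net.items():
        for v in ins:
            if v in fo:
                fo[v].append(w)
    clusters: Dict[str, Set[str]] = {}
    for u in reversed(topo_order(net)):  # outputs to inputs
        if u in outputs:
            L = float("inf")
        else:
            L = max((label[w] for w in fo[u]), default=label[u])
        if label[u] < L:
            clusters[u] = {u} | {v for v in tfi(net, u) if label[v] == label[u]}
    return clusters


if __name__ == "__main__":
    net = {"g1": ["a", "b"], "g2": ["c", "d"], "g3": ["g1", "g2"], "g4": ["g3", "a"]}
    print(label_phase(net, 2))  # {'g1': 0, 'g2': 0, 'g3': 1, 'g4': 1}
```

With k = 2 the clusters are {g3, g4}, {g2}, and {g1}, giving two levels of clusters.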


Example of clustering
[Figure: clustering example with k = 3; nodes are labeled 0, 1, and 2.]
Result: Lawler's algorithm gives a minimum-depth circuit.
Typically:
1. we decompose the initial circuit into 2-input NANDs and inverters;
2. the cluster size k then reflects the # of 2-input NANDs to be collapsed together.

Choosing k
I(k): number of levels, given k
d(k): duplication ratio = (number of gates in the clustered network) / (number of gates in the original network)
Determine k0 where k0/d(k0) ~ 2.0
For every k from 2 to k0, compute d(k) and I(k)
  Use exhaustive enumeration: label and cluster (without collapse) for each k
  Each iteration is O(|E| k)
Choose k such that:
  I(k) is minimized
  Ties are broken using d(k): minimize d(k)
[Figure: d(k) and I(k) plotted against k, for k from 1 to k0.]
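The selection rule itself is small. The sketch below assumes the per-k numbers have already been produced by running the label and cluster phases for each k, and it interprets "number of gates in the clustered network" as the sum of cluster sizes (a gate counted once per cluster containing it); that interpretation and the helper names are assumptions.

```python
from typing import Dict, Iterable, Set, Tuple


def duplication_ratio(clusters: Iterable[Set[str]], n_original: int) -> float:
    """d(k): gates summed over all clusters divided by gates in the original network."""
    return sum(len(c) for c in clusters) / n_original


def choose_k(stats: Dict[int, Tuple[int, float]]) -> int:
    """stats maps k -> (I(k), d(k)); minimize I(k), break ties by smaller d(k)."""
    return min(stats, key=lambda k: (stats[k][0], stats[k][1]))


if __name__ == "__main__":
    # Made-up numbers, only to exercise the tie-breaking rule.
    print(choose_k({2: (5, 1.1), 3: (4, 1.3), 4: (4, 1.2)}))  # 4
```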


Area recovery
Area increase is due to node duplication, which occurs when a node is in multiple clusters.
Two solutions:
1. Break clusters into smaller pieces off the critical path
2. After cluster and collapse, recover area

Relabeling procedure:
Attempt to increase node labels without exceeding the cluster size.
Proceed in reverse topological order.
Start: for each primary output O_i, assign
  $\text{new-label}(O_i) = \max_{j \in PO} \text{label}(O_j)$
Increase label(u) if
1. new-label(u) <= label(v) for each fanout v, and
2. new-label(u) = new-label(v) for each fanout v only if label(u) = label(v) before relabeling, and
3. no cluster size is violated.

Relabeling example
[Figure: node labels (0 and 1) before and after relabeling.]

Post-collapse area recovery
Do algebraic factorization, but undo the factorization if depth increases
Full_simplify: only consider node v as a possible fanin of a node (v introduced by using don't cares) if level of v < level of node
Redundancy removal

Conclusions
Variety of methods for delay optimization
  No single technique dominates (KJ Singh PhD thesis)
When applied to a ripple-carry adder, we get:
  Carry-lookahead adder (THR)
  Carry-bypass adder (GBX)
  Carry-select adder (GST)
  ? (partial collapse)
All techniques ignore false paths when assessing the delay and critical regions
Can use the KMS transform to eliminate false paths without increasing delay (at the cost of some area)
