Lec2 Timing Up
Note: some of the figures in this slide set are adapted from the slide set
of “Digital Integrated Circuits” by Rabaey et al., Copyright 2002
1 EE7605 Lecture 2
Outline
• Motivations and definitions
• Hold and Setup time constraints
• Clock non-idealities
– Clock skew problem (positive and negative skew).
– Worst-case delay and contamination time evaluation
– Clock jitter.
• Clock distribution networks:
– H-tree network, grid, etc.
• Some industrial examples:
– Strategies of clock distribution in complex digital
systems and µP.
• Conclusion and discussions (How to counter Skew &
jitter)
System Timing
• Clocking is very important to ensure that improper
values are never stored.
• Flip-flop-based pipeline system:
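[Figure: register A → combinational logic (delay Td) → register B, both driven by the same clock; Tq is marked at the output of register A and Ts at the input of register B]
Minimum clock period: Tc = Tq + Td + Ts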
Primary inputs change after clock (φ) edge.
Primary inputs must stabilize before next clock edge.
Rules allow changes to propagate through
combinational logic for next cycle.
Flip-flop outputs hold current-state values for next-state
computation
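As a rough illustration of the relation Tc = Tq + Td + Ts, the following Python sketch computes the minimum clock period of one register-to-register stage and the corresponding maximum frequency. The function names and the picosecond values are invented for the example and do not come from the lecture.

```python
def min_clock_period_ps(t_q, t_d, t_s):
    """Minimum period Tc = Tq + Td + Ts, with all values in picoseconds."""
    return t_q + t_d + t_s


def max_frequency_ghz(t_q, t_d, t_s):
    """Highest usable clock frequency in GHz (1000 / Tc when Tc is in ps)."""
    return 1000.0 / min_clock_period_ps(t_q, t_d, t_s)


# Hypothetical stage: 100 ps clock-to-Q, 800 ps of logic, 100 ps setup.
tc = min_clock_period_ps(100, 800, 100)
print(tc, "ps ->", max_frequency_ghz(100, 800, 100), "GHz")
```

Any reduction in the logic delay Td translates directly into a shorter period, while Tq and Ts are fixed overhead of the registers.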
Chip to Chip Timing
Timing Definition-Latch Parameters
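[Figure: latch (D, Q, Clk) and its timing diagram, marking the clock period T, the minimum pulse width PWm, the setup time tsu, the hold time thold, the clock-to-Q delay tc-q and the D-to-Q delay td-q]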
Register Parameters
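Delays can be different for rising and falling data transitions.
[Figure: register (D, Q, Clk) and its timing diagram, marking the clock period T, the setup time tsu, the hold time thold and the clock-to-Q delay tc-q]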
Clock period
• For each clock cycle, the cycle period must be longer
than the sum of:
– the combinational delay;
– the memory element propagation delay.
• The period depends on the longest path.
• Unbalanced delays
– Logic with unbalanced delays leads to inefficient
use of logic: the slowest path sets the period, so faster paths
sit idle for part of every cycle.
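To make the cost of unbalanced delays concrete, here is a small Python sketch. It assumes a pipeline in which every stage shares one clock, so the period is the longest stage logic delay plus the memory element delay. The names `clock_period`, `idle_time` and all numbers are hypothetical.

```python
def clock_period(logic_delays, t_reg):
    """Period set by the slowest stage plus the memory element delay."""
    return max(logic_delays) + t_reg


def idle_time(logic_delays, t_reg):
    """Time per cycle during which each stage's logic has nothing left to do."""
    period = clock_period(logic_delays, t_reg)
    return [period - t_reg - d for d in logic_delays]


# Two hypothetical pipelines with the same total logic delay (1200 ps).
balanced = [400, 400, 400]
unbalanced = [200, 800, 200]

for name, stages in (("balanced", balanced), ("unbalanced", unbalanced)):
    print(name, clock_period(stages, 100), idle_time(stages, 100))
```

Both pipelines contain the same amount of logic, but the unbalanced one needs a much longer period and leaves two of its three stages idle for most of each cycle.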
DFF Implementation (falling-edge triggered)
Master/slave latch arrangement
[Figure: master D latch followed by slave D latch, each with inputs D and G and outputs Q and Q’; signals D, C, Ds, Cs, Q and Q’ are labeled]
DFF Internal Operation
[Figure: timing waveforms of D, C, Ds, Cs and Q, showing the master sampling phases and the transfers to the slave]
Edge-triggered Flip Flop using Latches
[Figure: master and slave latches built from 2:1 multiplexers (inputs 0 and 1) controlled by CLK; D is the input, QM the master output and Q the slave output]
Master-Slave Register
[Figure: master-slave register built from inverters I1 to I6 and transmission gates T1 to T4, with input D, intermediate node QM, output Q and clock CLK]
Flip-Flop: Timing Definitions
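[Figure: waveforms of φ, In and Out versus t; the input data must be stable from tsetup before to thold after the clock edge, and the output data becomes stable tpFF after the clock edge]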
Setup time: time before the clock edge during which the data input must be stable.
Hold time: time after the clock edge during which the data input must remain
stable.
Clock-to-Q delay = tpFF
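The setup and hold definitions describe a forbidden window around the clock edge. The sketch below, with hypothetical names and times in picoseconds, lists the data transitions that fall inside that window and would therefore violate one of the two constraints.

```python
def transitions_in_window(clock_edge, data_transitions, t_setup, t_hold):
    """Return the data transition times that fall strictly inside the
    interval (clock_edge - t_setup, clock_edge + t_hold)."""
    start = clock_edge - t_setup
    end = clock_edge + t_hold
    return [t for t in data_transitions if start < t < end]


# Hypothetical example: clock edge at 1000 ps, 100 ps setup, 50 ps hold.
bad = transitions_in_window(1000, [700, 950, 1020, 1200], 100, 50)
print(bad)
```

Transitions before the window satisfy the setup time, and transitions after it satisfy the hold time; only those inside the window are a problem.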
Clk-Q Delay
[Figure: simulated waveforms of CLK, D and Q (volts versus time in nsec), marking the clock-to-Q delays tc−q(lh) and tc−q(hl)]
Clock Race
The setup time race
The hold time race
Setup Time
[Figure: simulated waveforms of D, CLK, QM, Q and node I2−T2 (volts versus time in nsec) for (a) Tsetup = 0.21 nsec and (b) Tsetup = 0.20 nsec]
More Precise Setup Time
[Figure: (a) waveforms of Clk, D and Q versus t; (b) plot of tC−Q against the data-to-clock time tD−C, marking tSu, tH and the 1.05 tC−Q level]
Setup/Hold Time Illustrations
Circuit before clock arrival (Setup-1 case)
[Figure sequence, four frames: latch input stage with transmission gate TG1 (clocks CN and CP), inverters Inv1 and Inv2, and nodes D, D1, SM and QM; plot of the Clk-Q delay TClk-Q versus the setup time TSetup-1; Data and Clock waveforms versus time with the clock edge at t=0]
Setup/Hold Time Illustrations
Hold-1 case
[Figure sequence, five frames: inverter Inv1 with clocks CN and CP and a node labeled 0; plot of TClk-Q versus the hold time THold-1; Clock and Data waveforms versus time with the clock edge at t=0]
Hold time violation
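[Figure: register M1 → logic → register M2; clk reaches M1 after delay Tc1 and M2 after delay Tc2, and the data reaches M2 at Td2. The timing diagram of clk, Tc1, Td2 and Tc2 shows the boundary between old data and new data and the resulting hold time violation]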
Register M2 samples the new data when it is supposed to sample the old data. This
happens when the clock arrival Tc2 lags behind the data arrival Td2, which is more likely
with a long delay on the clock line and short delays through the registers and logic.
The worst case corresponds to the minimum logic delay.
Hold time condition
tc-q + tlogic,min must exceed a threshold set by the hold time of the FF
(thold itself when the clock skew is zero).
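A minimal sketch of this check in Python follows. It assumes the threshold is thold plus any clock skew seen by the receiving register, which is the usual form of the constraint; the function names and the picosecond values are hypothetical.

```python
def hold_slack(t_cq, t_logic_min, t_hold, skew=0):
    """Positive slack means the hold condition is met (times in ps)."""
    return t_cq + t_logic_min - (t_hold + skew)


def padding_needed(t_cq, t_logic_min, t_hold, skew=0):
    """Extra minimum-path delay required to just satisfy the hold condition."""
    return max(0, -hold_slack(t_cq, t_logic_min, t_hold, skew))


# Hypothetical fast path: 40 ps clock-to-Q, 10 ps of logic, 80 ps hold time.
print(hold_slack(40, 10, 80), padding_needed(40, 10, 80))
```

A negative slack is typically repaired by inserting delay on the short path, which is what `padding_needed` estimates.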
How fast can we run
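[Figure: register M1 → logic → register M2; clk reaches M1 after delay Tc1 and M2 after delay Tc2. The timing diagram illustrates the setup time requirement using Tq1, the maximum logic delay Tlmax and Tsetup2: in one case there is still a margin, in the problem case there is a setup time violation]
Minimum cycle time: T = tc-q + tsu + tlogic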
Hold and setup time violations
• The earliest that data appears at the input of register M2 is
at time Tc1 + Tq1, assuming zero delay in the logic block.
• The clock appears at register M2 at time Tc2.
• Assume zero setup and hold times. If Tc2 lags the data
change (Tc2 > Tc1 + Tq1), module M2 will store the data
from the current cycle rather than the previous cycle. This
is a hold-time violation; in practice it may be caused by
Tc1 and Tq1 being close to zero while a delay is introduced
into the Tc2 clock line.
• If the delay (Tc1 + Tq1) − Tc2 is larger than the cycle time Tc,
the data will arrive late at M2. This causes a setup-time
violation, which occurs when the circuit is too slow for
the clock cycle used. While Tc2 may be artificially increased
to allow more time for the data to set up, the constraint
Tc2 < (Tc1 + Tq1) then becomes harder to meet, and data delays may
have to be artificially added to meet it.
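The two conditions in the bullets above can be sketched as a small classifier. It uses the same simplifications as the text (zero setup time, zero hold time, zero logic delay); the function name and the example numbers are hypothetical.

```python
def classify(tc1, tq1, tc2, t_cycle):
    """Classify the M1 -> M2 transfer under the simplified model above.

    tc1, tc2: clock delays to M1 and M2; tq1: clock-to-Q delay of M1;
    t_cycle: clock cycle time Tc. All values share one time unit.
    """
    data_arrival = tc1 + tq1
    if tc2 > data_arrival:
        return "hold violation"   # M2's clock lags the data change
    if data_arrival - tc2 > t_cycle:
        return "setup violation"  # data arrives after the next edge at M2
    return "ok"


print(classify(0, 10, 50, 1000))     # long clock delay to M2
print(classify(1200, 10, 0, 1000))   # data far too late for the cycle
print(classify(0, 100, 50, 1000))    # both conditions satisfied
```

Increasing `tc2` moves a design away from the setup limit but toward the hold limit, which is the trade-off described in the last bullet.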
Clock Non-idealities
• Clock skew
– Spatial variation in temporally equivalent clock
edges; deterministic + random, tSK
• Clock jitter
– Temporal variations in consecutive edges of the
clock signal; modulation + random noise
– Cycle-to-cycle (short-term) tJS
– Long term tJL
• Variation of the pulse width
– Important for level sensitive clocking
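To relate these definitions to measured edge times, the sketch below estimates skew and jitter from timestamps. Definitions of jitter vary between sources; this example assumes that skew is the arrival-time difference of the same edge at two points, that cycle-to-cycle jitter is the largest change between adjacent periods, and that long-term jitter is the largest deviation of any edge from its ideal position. All names and numbers are hypothetical.

```python
def skew(arrival_a, arrival_b):
    """Spatial difference: same clock edge observed at two locations."""
    return arrival_b - arrival_a


def periods(edges):
    """Measured periods between consecutive rising edges."""
    return [b - a for a, b in zip(edges, edges[1:])]


def cycle_to_cycle_jitter(edges):
    """Largest change between two adjacent periods."""
    p = periods(edges)
    return max(abs(b - a) for a, b in zip(p, p[1:]))


def long_term_jitter(edges, nominal_period):
    """Largest deviation of an edge from its ideal position."""
    return max(abs((e - edges[0]) - n * nominal_period)
               for n, e in enumerate(edges))


# Hypothetical rising-edge times (ps) of a nominal 1000 ps clock at one point.
edges = [0, 1000, 2010, 3000, 4005]
print(cycle_to_cycle_jitter(edges), long_term_jitter(edges, 1000))
print(skew(1000, 1040))  # the same edge seen 40 ps later elsewhere on the die
```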
Clock Skew and Jitter
[Figure: two clock waveforms, showing the skew tSK between them and the cycle-to-cycle jitter tJS on one of them]
Clock Uncertainties
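[Figure: clock distribution path annotated with numbered sources of uncertainty; the surviving labels are devices (2), interconnect (3), power supply (4) and capacitive load (6)]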
Sources of skew and Jitter
• Systematic errors are nominally identical from chip to chip and
are predictable, while random errors are due to manufacturing
variations that are difficult to model.
• Clock-signal generation: a high-frequency signal is generated
from a low-frequency one (VCO); this is sensitive to
device noise, power supply variations, and substrate coupling.
• Manufacturing device variations: matching of devices in the
buffers along multiple clock paths is critical.
• Interconnect variations: vertical and lateral dimension
variations cause the interconnect capacitance and resistance to vary.
One source of the problem: inter-layer dielectric (ILD) thickness variations.
• Environmental variations: temperature and power supply.
Temperature gradients across the chip can be large as a
consequence of clock gating. Device parameters (Vth and µ)
depend on temperature, so the clock delay can vary from path
to path. Does temperature contribute to skew or jitter?
• Capacitive coupling: any coupling between a clock wire and
an adjacent signal results in timing uncertainty.
The Clock Skew Problem
Clock rates as high as 2 GHz in CMOS! (T = 0.5 ns)
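[Figure: pipeline In → CL1 → R1 → CL2 → R2 → CL3 → R3 → Out; the clock φ reaches the registers at different times ti, shown as the skewed waveforms clk1 and clk2]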
Positive Skew
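[Figure: positive skew. Timing diagram of CLK1 (edges 1 and 3) and CLK2 (edges 2 and 4), with CLK2 delayed by δ; the intervals TCLK, TCLK + δ and δ + th are marked]
[Figure: In → R1 (D, Q) → combinational logic → R2 (D, Q), annotated with tc−q, tc−q,cd, tlogic, tlogic,cd, tsu and thold]
[Figure: pipeline In → CL1 → R1 → CL2 → R2 → CL3 → R3 → Out with clocks clk1 and clk2]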
Negative Skew
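[Figure: negative skew. Timing diagram of CLK1 (edges 1 and 3) and CLK2 (edges 2 and 4), with CLK2 ahead of CLK1 by |δ|; the intervals TCLK and TCLK + δ are marked]
[Figure: In → R1 (D, Q) → combinational logic → R2 (D, Q), clocked by tCLK1 and tCLK2 derived from clk, annotated with tc−q, tc−q,cd, tlogic, tlogic,cd, tsu and thold]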
Positive and Negative Skew
[Figure (a): pipeline of CL and R stages with the clock φ routed along the data path]
(a) Positive skew (clock is routed in the same direction as the data flow).
• Skew has to be strictly controlled and kept below its maximum
allowed value; otherwise the circuit will malfunction.
Reducing the clock frequency does not help.
[Figure: second pipeline clocked by φ, for the negative-skew case]
• When the skew is negative, the race condition can never happen. The
circuit operates correctly independent of the skew.
• However, negative skew hurts throughput: the skew
reduces the time available for the actual computation, so the clock
period has to be increased by |δ|.
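The two effects of skew can be sketched in Python. The inequalities used here are the usual textbook form and are an assumption, since the slides only show the figure labels TCLK + δ and δ + th: the setup side requires T + δ ≥ tc−q + tlogic + tsu, and the race (hold) side requires δ + thold ≤ tc−q,cd + tlogic,cd. Function names and numbers are hypothetical.

```python
def min_period(t_cq, t_logic, t_su, skew):
    """Smallest period satisfying T + skew >= t_cq + t_logic + t_su."""
    return t_cq + t_logic + t_su - skew


def race_free(t_cq_cd, t_logic_cd, t_hold, skew):
    """True when skew + t_hold <= t_cq_cd + t_logic_cd.
    Note that the clock period does not appear in this condition."""
    return skew + t_hold <= t_cq_cd + t_logic_cd


# Hypothetical path (ps): t_cq=100, t_logic=700, t_su=50,
# contamination delays t_cq_cd=80 and t_logic_cd=40, t_hold=30.
for delta in (-60, 0, 60, 100):
    print(delta, min_period(100, 700, 50, delta), race_free(80, 40, 30, delta))
```

With these numbers a skew of −60 ps lengthens the minimum period by 60 ps but can never cause a race, while a skew of +100 ps shortens the period yet fails the race check no matter how slowly the clock runs.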
How to counter Clock Skew?
• Routing the clock in the opposite direction to the data can relieve the
race problem of clock skew, but it hampers
performance. Also, the data flow of a circuit is sometimes
not unidirectional.
[Figure: datapath with feedback between In and Out, built from registers (REG) and logic and fed by a clock distribution block φ; some paths see negative skew, others positive skew]
[Figure: circuit with REG and MUX blocks clocked by φ]
(…) delay, ts = setup time
tq = reg clock-to-q delay
T = clock period
Assume input signals arrive early enough. The max bound on the skew is
tq + tg + tm − thold > δ
(Need to evaluate tlogic,min and tlogic,max.)
The equilibrium requirement at the time of latching
imposes another constraint on the skew:
tq + 5tg + tm + ts < T + δ
Combining these constraints, we have
tq + tg + tm − thold > δ > tq + 5tg + tm + ts − T
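The combined constraint defines a window of allowed skew. The sketch below evaluates that window; it assumes that tg is a gate delay and tm a multiplexer delay (their definitions are not in the surviving text), that the shortest path contains one gate and the longest five, and it uses made-up picosecond values.

```python
def skew_window(t_q, t_g, t_m, t_s, t_hold, period,
                gates_min=1, gates_max=5):
    """Return (lower, upper) such that lower < skew < upper.

    upper = t_q + gates_min * t_g + t_m - t_hold        (race limit)
    lower = t_q + gates_max * t_g + t_m + t_s - period  (setup limit)
    """
    upper = t_q + gates_min * t_g + t_m - t_hold
    lower = t_q + gates_max * t_g + t_m + t_s - period
    return lower, upper


# Hypothetical values in ps.
lower, upper = skew_window(t_q=100, t_g=80, t_m=60, t_s=50,
                           t_hold=40, period=700)
print(lower, "< skew <", upper)
```

Shortening the period raises the lower bound, and the window closes completely once the lower bound reaches the upper bound.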
Example – Propagation and
contamination delay evaluation
• Propagation and contamination delay are not always
easy to evaluate due to false paths.
[Figure: gate network with inputs A, B, C, D and In1, gates OR1, OR2, AND1, AND2 and AND3, output Out, and two highlighted paths PATH1 and PATH2]
Critical Path
• The longest delay path is known as the critical path, since
that path limits the system performance.
• The critical path not only tells us the system cycle
time; it also points out which part of the combinational logic
must be changed to improve system performance.
• Speed up gates on the critical path by increasing
transistor sizes, reducing wiring capacitance, or
redesigning logic along the critical path to use a faster
gate configuration.
• Speeding up the system may require modifying several
sections of logic, since the critical path can have
multiple branches. Identify the critical path, identify
a cutset of the graph that represents the critical path,
then determine the edge (gate) to speed up.
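One common way to find the critical path is a longest-path search over the gate graph in topological order. The Python sketch below shows the idea on a tiny invented netlist; the node names and delays are hypothetical, and the search is purely topological, so it does not detect false paths.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(delays, fanin):
    """Return (total_delay, path) of the longest path.

    delays: node -> delay of that node
    fanin:  node -> list of nodes driving it (primary inputs may be omitted)
    """
    arrival = {}
    prev = {}
    for node in TopologicalSorter(fanin).static_order():
        preds = fanin.get(node, [])
        # Latest-arriving predecessor decides when this node can switch.
        best = max(preds, key=lambda p: arrival[p], default=None)
        arrival[node] = delays[node] + (arrival[best] if best is not None else 0)
        prev[node] = best
    end = max(arrival, key=arrival.get)
    total = arrival[end]
    path = []
    while end is not None:
        path.append(end)
        end = prev[end]
    return total, path[::-1]


# Hypothetical netlist: inputs a and b, gates g1..g3, output gate "out".
delays = {"a": 0, "b": 0, "g1": 2, "g2": 3, "g3": 1, "out": 1}
fanin = {"g1": ["a", "b"], "g2": ["g1"], "g3": ["b"], "out": ["g2", "g3"]}
print(critical_path(delays, fanin))
```

Speeding up `g3` in this example would not help, because it is not on the reported path; only changes to `g1`, `g2` or `out` reduce the total.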
False Path
Example of False Path
[Figure: example circuit with signals a, b, c, d, e, y and z illustrating a false path]
Impact of Jitter
[Figure: clock CLK with nominal period TCLK whose edges can move by −tjitter to +tjitter; registers (REGS) with input In and combinational logic, annotated with tc-q, tc-q,cd, tlogic, tlogic,cd, tsu, thold and tjitter]
Longest Logic Path in
Edge-Triggered Systems
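[Figure: setup time condition. Clock Clk with period T; the latest point of launching is offset by TJI + δ, followed by TClk-Q and the longest logic delay TLM, with TSU marked before the earliest arrival of the next cycle]
If launching edge is late and receiving edge is early, the data will not be too late if:
[Figure: clock Clk with TH marked after the nominal clock edge; data must not arrive before this time]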
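The inequality that completes the sentence about late launching and early receiving edges did not survive extraction. As a hedged sketch only, the code below assumes the commonly used form TClk-Q + TLM + TSU + δ + TJI,launch + TJI,capture ≤ T, where δ is taken as the amount by which the launching clock is late relative to the receiving clock and each edge has its own jitter term. The function names, the separate jitter arguments and all values are assumptions, not the slide's own formula.

```python
def min_cycle_time(t_clk_q, t_lm, t_su, skew, t_ji_launch, t_ji_capture):
    """Assumed worst case: launching edge late, receiving edge early (ps)."""
    return t_clk_q + t_lm + t_su + skew + t_ji_launch + t_ji_capture


def setup_met(period, t_clk_q, t_lm, t_su, skew, t_ji_launch, t_ji_capture):
    """True when the longest path still fits in the given period."""
    return period >= min_cycle_time(t_clk_q, t_lm, t_su, skew,
                                    t_ji_launch, t_ji_capture)


# Hypothetical values (ps): 100 clock-to-Q, 600 logic, 50 setup,
# 30 skew, 20 jitter on each edge.
print(min_cycle_time(100, 600, 50, 30, 20, 20))
print(setup_met(800, 100, 600, 50, 30, 20, 20))
```

Under these assumptions, skew and jitter together consume 70 ps of a cycle that would otherwise need only 750 ps.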
Clock Distribution to bound skew
More realistic H-tree
[Restle98]
Clock Network with Distributed Buffering
[Figure: main clock driver (CLOCK) feeding secondary clock drivers, each serving a local area of modules]
Local clock delays are equalized through careful routing of the
clock signals combined with a hierarchical clock-buffering scheme.
The Grid System
[Figure: clock grid fed by four GCLK drivers]
• No RC matching
• Large power
Example: DEC Alpha Evolution
Example: DEC Alpha 21164
Clock Drivers
Example: DEC Alpha 21164
Use Clock grid instead of clock tree
Clock Frequency: 300 MHz - 9.3 Million Transistors
Total Clock Load: 3.75 nF
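To get a feel for what a 3.75 nF clock load means at 300 MHz, the sketch below applies the standard dynamic power estimate P = C · Vdd² · f. The supply voltage is not given on the slide, so the 3.3 V used here is an assumption, and the result is only an order-of-magnitude illustration.

```python
def clock_power_watts(c_load_farads, vdd_volts, freq_hz):
    """Dynamic power of a full-swing clock net: C * Vdd^2 * f."""
    return c_load_farads * vdd_volts ** 2 * freq_hz


# Load and frequency from the slide; Vdd = 3.3 V is assumed for illustration.
p = clock_power_watts(3.75e-9, 3.3, 300e6)
print(round(p, 2), "W")
```

Even under this simple model the clock network alone dissipates on the order of ten watts, which is why clock grids are associated with large power.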
Example: DEC Alpha 21164
Example: DEC Alpha 21264
600 MHz – 0.35 micron CMOS
tcycle = 1.67 ns, trise = 0.35 ns, tskew = 50 ps
[Figure: global clock waveform; PLL]
Example: DEC Alpha 21264
Example: DEC Alpha 21264
[Figure: two maps with scales in ps, one from 5 ps to 50 ps and one from 300 ps to 345 ps, both in 5 ps steps]
Hybrid Grid
Example : Intel IA-64 Itanium
Intel IA-64 Itanium clock
distribution topology
Global Clock Distribution
• Distribute two clocks
– Core clock and reference clock
– Using two identical and balanced H-trees on the top two metal layers
• To reduce capacitive noise coupling and to ensure a good
inductive return path, the H-tree is fully shielded laterally with
Vcc/Vss.
Regional clock distribution
• Distributed array of deskew buffers (DSK) to compensate for
within-die process variations
• Regional clock grid driven by modular Regional Clock Drivers
– 30 clock regions
– M4 for x-direction, M5 for y-direction
– Full support for scan and clock gating
Local Clock distribution
Take-away message – Dealing with
skew and jitter
• Balance clock paths from a central distribution source to
individual clocking elements: the effective load of each path,
including wiring and transistors, must be equalized.
• The use of a local grid can reduce skew, but at increased power.
• Be very careful with gated clocks: gating creates a
data-dependent clock, which increases jitter.
• If data flows in one direction, route the clock in the opposite
direction. This eliminates races at the cost of performance.
• Shielding clock wires from adjacent signals helps reduce noise.
• Variation in temperature across the die causes variations in
clock buffer delay. Use temperature compensation techniques.
• Add on-chip decoupling capacitors to reduce high-frequency
power supply fluctuations.
• Extensive simulation should be performed to check circuit
operation at the corners (δT, δV, δW, δL, δC, δµn, δVth).
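A corner sweep over the listed parameters can be organized as below. This is only a sketch: it assumes each parameter is checked at two extremes, and `toy_check` is a placeholder standing in for a real circuit simulation, which the lecture does not specify.

```python
from itertools import product

# Parameter names taken from the list above (mu_n stands for µn).
PARAMS = ["T", "V", "W", "L", "C", "mu_n", "Vth"]


def corners(params, levels=("min", "max")):
    """Every combination of extreme values, one dict per corner."""
    return [dict(zip(params, combo))
            for combo in product(levels, repeat=len(params))]


def failing_corners(params, passes):
    """Corners for which the supplied pass/fail check returns False."""
    return [c for c in corners(params) if not passes(c)]


def toy_check(corner):
    # Placeholder rule, NOT a real timing model: pretend the circuit
    # fails whenever temperature is at max and supply voltage at min.
    return not (corner["T"] == "max" and corner["V"] == "min")


all_corners = corners(PARAMS)
print(len(all_corners), len(failing_corners(PARAMS, toy_check)))
```

With seven parameters at two levels each, the sweep covers 2^7 = 128 corners, which shows how quickly exhaustive corner simulation grows.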