Lecture 15
[Adapted from Prof. Mary Jane Irwin’s slides, Rabaey’s Digital Integrated
Circuits, Second Edition, ©2003 J. Rabaey, A. Chandrakasan, B. Nikolic]
Timing Classifications
Synchronous systems
- All memory elements in the system are updated simultaneously
  using a globally distributed periodic synchronization signal (i.e.,
  a global clock signal)
- Functionality is ensured by strict constraints on clock signal
  generation and distribution, which minimize
  - Clock skew (spatial variations in clock edges)
  - Clock jitter (temporal variations in clock edges)
Asynchronous systems
- Self-timed (controlled) systems
- No need for a globally distributed clock, but have asynchronous
  circuit overheads (handshaking logic, etc.)
Hybrid systems
- Synchronization between different clock domains
- Interfacing between asynchronous and synchronous domains
Review: Synchronous Timing Basics
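[Figure: In feeds register R1 (D/Q), then combinational logic, then register R2 (D/Q); clk arrives at R1 at tclk1 and at R2 at tclk2. Register parameters: tc-q, tsu, thold, tcdreg. Logic parameters: tplogic, tcdlogic.]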
Under ideal conditions (i.e., when tclk1 = tclk2)
T ≥ tc-q + tplogic + tsu
thold ≤ tcdlogic + tcdreg
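As a rough numerical illustration of the two ideal-case constraints, the following Python sketch computes the minimum clock period and checks the hold condition. The function names and the delay values (in ns) are hypothetical and chosen only to exercise the formulas; they do not come from the lecture.

```python
def min_clock_period(t_cq, t_plogic, t_su):
    """Smallest T satisfying T >= tc-q + tplogic + tsu (ideal clock, no skew)."""
    return t_cq + t_plogic + t_su


def hold_met(t_hold, t_cdlogic, t_cdreg):
    """True when thold <= tcdlogic + tcdreg (ideal clock, no skew)."""
    return t_hold <= t_cdlogic + t_cdreg


# Hypothetical delays in ns (assumed values, not measured data).
T_min = min_clock_period(t_cq=0.25, t_plogic=2.0, t_su=0.25)
hold_ok = hold_met(t_hold=0.125, t_cdlogic=0.25, t_cdreg=0.125)
print(f"T_min = {T_min} ns, hold met: {hold_ok}")
```

Note that the setup constraint depends on the clock period while the hold constraint does not, which is why a hold violation cannot be fixed by slowing the clock.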
Under real conditions, the clock signal can have both
spatial (clock skew) and temporal (clock jitter) variations
- Skew is constant from cycle to cycle (by definition); skew can be
  positive (clock and data flowing in the same direction) or negative
  (clock and data flowing in opposite directions)
- Jitter causes T to change on a cycle-by-cycle basis
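To make the effect of jitter on the cycle time concrete, here is a minimal sketch. It assumes the usual worst case in which each clock edge can move by up to tjitter in either direction, so two consecutive edges can be as close as T - 2*tjitter. That bound is an assumption added for illustration and is not spelled out in the text here; the function name and the values (in ns) are hypothetical.

```python
def min_period_with_jitter(t_cq, t_plogic, t_su, t_jitter):
    """Nominal T needed so that the shortest cycle, assumed to be
    T - 2*t_jitter, still satisfies T_short >= tc-q + tplogic + tsu."""
    return t_cq + t_plogic + t_su + 2 * t_jitter


# Hypothetical delays in ns (assumed values).
T_nominal = min_period_with_jitter(t_cq=0.25, t_plogic=2.0, t_su=0.25, t_jitter=0.125)
print(f"nominal period must be at least {T_nominal} ns")
```

Under this assumption jitter always costs performance, since the nominal period has to be padded by the full edge-to-edge uncertainty.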
Sources of Clock Skew and Jitter in Clock Network
[Figure: clock network with numbered sources of variation: 1 clock generation (PLL), 2 clock drivers, 3 interconnect, 4 power supply, 5 temperature, 6 capacitive load, 7 capacitive coupling]
Skew
- manufacturing device variations in clock drivers
- interconnect variations
- environmental variations (power supply and temperature)
Jitter
- clock generation
- capacitive loading and coupling
- environmental variations (power supply and temperature)
Positive Clock Skew
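[Figure: positive clock skew. Clock and data flow in the same direction: In feeds R1 (D/Q), then combinational logic, then R2 (D/Q); clk passes through a delay between tclk1 (at R1) and tclk2 (at R2). Waveforms mark clock edges 1 to 4, the period T, T + δ, and δ + thold, with δ > 0.]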
T: T + δ ≥ tc-q + tplogic + tsu so T ≥ tc-q + tplogic + tsu - δ
thold : thold + δ ≤ tcdlogic + tcdreg so thold ≤ tcdlogic + tcdreg - δ
δ > 0: Improves performance, but makes thold harder to
meet. If thold is not met (race conditions), the circuit
malfunctions independent of the clock period!
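The two inequalities above can be checked numerically with a short Python sketch. It takes δ as the clock arrival time at R2 minus that at R1 (tclk2 - tclk1), an assumed sign convention that is consistent with the inequalities. The function name and the delay values (in ns) are hypothetical.

```python
def timing_with_skew(t_cq, t_plogic, t_su, t_hold, t_cdlogic, t_cdreg, delta):
    """Return (minimum period, hold slack) for a clock skew delta.

    Setup: T >= tc-q + tplogic + tsu - delta
    Hold:  thold <= tcdlogic + tcdreg - delta  (slack >= 0 means the hold is met)
    """
    t_min = t_cq + t_plogic + t_su - delta
    hold_slack = t_cdlogic + t_cdreg - delta - t_hold
    return t_min, hold_slack


# Hypothetical delays in ns (assumed values, not from the lecture).
delays = dict(t_cq=0.25, t_plogic=2.0, t_su=0.25,
              t_hold=0.125, t_cdlogic=0.25, t_cdreg=0.125)

for delta in (0.0, 0.125, 0.5):
    t_min, slack = timing_with_skew(delta=delta, **delays)
    status = "met" if slack >= 0 else "VIOLATED (race)"
    print(f"delta={delta}: T_min={t_min} ns, hold slack={slack} ns, hold {status}")
```

With these numbers, increasing δ shortens the minimum period but shrinks the hold slack, and at δ = 0.5 ns the hold constraint fails regardless of the period chosen. Passing a negative δ to the same function shows the opposite trade: a longer minimum period and more hold slack.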
Negative Clock Skew
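[Figure: negative clock skew. Clock and data flow in opposite directions: In feeds R1 (D/Q), then combinational logic, then R2 (D/Q); clk passes through a delay between tclk2 (at R2) and tclk1 (at R1). Waveforms mark clock edges 1 to 4, the period T, and T + δ, with δ < 0.]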
[Figure: clock waveform with edge uncertainty between -tjitter and +tjitter]
[Figure: clock gating. Labels: Clock, Idle condition, Gated clock]
Clock Grid Network
Distributed buffering reduces absolute delay and makes
clock gating easier, but is sensitive to variations in the
buffer delay
The secondary buffers isolate the local clock nets from the
upstream load and amplify the clock signals degraded by the
RC network
- decreases absolute skew
- gives steeper clocks
Only have to bound the skew within the local logic area
[Figure: Clock drives the main clock buffer, which drives secondary clock buffers, each feeding a local logic area]
DEC Alpha 21164 (EV5) Example
300 MHz clock (9.3 million transistors on a 16.5x18.1
mm die in 0.5 micron CMOS technology)
- single phase clock
The critical instruction and execution units all see the clock
within 65 ps
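To put the 65 ps figure in perspective, the short calculation below compares it with the clock period at 300 MHz. Treating the 65 ps window as a skew bound is an interpretation made for this illustration; the variable names are hypothetical.

```python
f_clk_hz = 300e6   # clock frequency stated for the 21164
skew_s = 65e-12    # clock arrival window for the critical units (assumed to be the skew bound)

period_s = 1.0 / f_clk_hz
skew_fraction = skew_s / period_s

print(f"period = {period_s * 1e9:.2f} ns")
print(f"skew window = {skew_fraction * 100:.2f}% of the period")
```

The period is about 3.33 ns, so the 65 ps window is roughly 2% of a cycle.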
Dealing with Clock Skew and Jitter
To minimize skew, balance clock paths using H-tree or
matched-tree clock distribution structures.
If possible, route data and clock in opposite directions; this
eliminates races at the cost of performance.
Using gated clocks to reduce dynamic power consumption
makes jitter worse.
Shield clock wires (route power lines – VDD or GND – next to
clock lines) to minimize/eliminate coupling with neighboring
signal nets.
Use dummy fills to reduce skew by reducing variations in
interconnect capacitances due to interlayer dielectric
thickness variations.
Beware of temperature and supply rail variations and their
effects on skew and jitter. Power supply noise fundamentally
limits the performance of clock networks.
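The first recommendation, balancing clock paths with an H-tree, can be illustrated with a small geometric sketch. It builds an idealized H-tree recursively and reports the wire length from the root to every leaf. This is only a model of the routing geometry under assumed dimensions; it ignores buffers, loads, and process variation, and the function name and sizes are hypothetical.

```python
def h_tree_leaves(x, y, half, levels, path=0.0):
    """Return (x, y, wire_length_from_root) for each leaf of an idealized H-tree.

    At each level the wire runs horizontally by `half`, then vertically by
    `half`, to reach four quadrant centers; the arm length halves per level.
    """
    if levels == 0:
        return [(x, y, path)]
    leaves = []
    for dx in (-half, half):
        for dy in (-half, half):
            leaves += h_tree_leaves(x + dx, y + dy, half / 2,
                                    levels - 1, path + 2 * half)
    return leaves


# Hypothetical geometry: root at the origin, first arm length 4 units, 3 levels.
leaves = h_tree_leaves(0.0, 0.0, 4.0, 3)
lengths = {length for _, _, length in leaves}
print(f"{len(leaves)} leaves, distinct root-to-leaf wire lengths: {lengths}")
```

Every leaf ends up at the same wire distance from the root, which is the property that makes the nominal skew of an ideal H-tree zero; in a real design the remaining skew comes from the variation sources listed earlier.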
Major Components of a Computer
[Figure: Processor (Control, Datapath), Memory, and Devices (Input, Output)]
Control
- Finite state machines (PLA, ROM, random logic)
Interconnect
- Switches, arbiters, buses
Memory
- Caches, TLBs, DRAM, buffers
MIPS 5-Stage Pipelined (Single Issue) Datapath
[Figure: five-stage datapath (Fetch, Decode, Execute, Memory, WriteBack) with PC, instruction cache (I$), register file, sign extend (16 to 32 bits), shift left 2, two adders, ALU, data cache (D$), multiplexers, and the pipeline stage isolation registers IF/Dec, Dec/Exec, Exec/Mem, and Mem/WB. Timing signals shown below the datapath: clk, Icache precharge, Dcache precharge, RegWrite.]
Datapath Bit-Sliced Organization
[Figure: bit-sliced datapath with slices Bit 0 to Bit 3. Labels: Control Flow, From I$. Blocks in each slice: pipeline registers, register file, multiplexers, shifter, adder.]