VLSI Design I

CMOS Sequential Logic


Clocking Strategies

Overview
single and double phase clock systems
Latch and FF timing

Goal: You are familiar with static and dynamic latches/FFs as well as with single- and double-phase clocking, clock distribution, clock skew and PLL clocking techniques.
MicroLab, VLSI-10 (1/23)

JMM v1.4
Sequential Logic
Use #1: Get better utilization from
idle combinational logic blocks.
Pipeline the system so that new
computations start before the old ones
complete. Add registers to keep
computations separate.

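Use #2: Convert parallel operations to a sequence of (faster, smaller) serial operations.
[Figure: a parallel 8-bit operator on inputs A and B producing C, next to a serial version built from a 1-bit operator (+) on A and B producing C.]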

Use #3: Need to process a sequence of inputs and want to reuse the same hardware (finite state machine).
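To illustrate Use #2, here is a minimal behavioural sketch in Python. It assumes the parallel operation is an n-bit addition: one 1-bit full adder is reused for n clock cycles, and the variable `carry_reg` stands in for the single flip-flop that carries the carry from one cycle to the next. The function names are hypothetical.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def bit_serial_add(a: int, b: int, n_bits: int = 8) -> int:
    """Add two n-bit numbers with one full adder over n clock cycles.

    Bits are presented LSB first; carry_reg models the flip-flop that
    stores the carry between cycles.
    """
    carry_reg = 0                 # carry flip-flop, reset to 0
    result = 0
    for cycle in range(n_bits):   # one clock cycle per bit
        a_bit = (a >> cycle) & 1
        b_bit = (b >> cycle) & 1
        s, carry_reg = full_adder(a_bit, b_bit, carry_reg)
        result |= s << cycle
    return result                 # final carry-out is dropped: n-bit result


print(bit_serial_add(100, 27))    # 127
print(bit_serial_add(200, 100))   # 44 (300 mod 256)
```

The trade-off is the one the slide names: one small adder and one register instead of an 8-bit adder, at the cost of 8 clock cycles per result.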

Latches and Flip-Flops
[Figure: level-sensitive latch (inputs D, G; output Q): Q follows D while G is active, Q stable otherwise. Edge-sensitive flip-flop (inputs D, clk; output Q): Q takes its value from D at the clock edge, Q stable otherwise.]

A static latch will hold data while G is inactive, however long that may be. A dynamic latch will hold data while G is inactive, but only “for a while”, after which the saved value may decay.
Do static latches dissipate static power?
How long is “for a while”?
Which one should I use?
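A rough answer to “how long is ‘for a while’?” treats the dynamic storage node as a capacitor discharged by a constant leakage current, so the hold time is about t = C·ΔV / I_leak. The Python sketch below assumes that constant-current model; the node capacitance, tolerable droop and leakage current are illustrative values, not figures from these notes.

```python
def retention_time(c_node: float, delta_v: float, i_leak: float) -> float:
    """Time (s) until a leaking node droops by delta_v (V).

    Assumes a constant leakage current i_leak (A) discharging c_node (F):
    i = C dV/dt  =>  t = C * dV / i.
    """
    if i_leak <= 0:
        raise ValueError("leakage current must be positive")
    return c_node * delta_v / i_leak


# Illustrative numbers only: 10 fF node, 0.5 V tolerable droop, 1 pA leakage
t = retention_time(10e-15, 0.5, 1e-12)
print(f"{t * 1e3:.1f} ms")  # 5.0 ms
```

A dynamic latch must be refreshed (clocked) well within this time, which is why dynamic designs usually come with a minimum clock frequency.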
Latch Timing Constraints #1
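[Figure: latch a (D, Q, G) → logic cloud CLa → latch b → logic cloud CLb → next latch, all enables driven from CLK; the timing diagram marks paths t1a and t2b against the hold (H) and setup (S) windows around the CLK edges.]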

Do I have to check ALL these constraints?
t1a = tnqa + tnla > thb
t1b = tnqb + tnlb > tha
t2a = txqa + txla < tc0 - tsb
t2b = txqb + txlb < tc1 - tsa
th = hold time
ts = setup time
tn = min delay from invalid input to invalid output
tx = max delay from valid input to valid output
tl = delay for combinational logic from input to output
tq = delay for memory element from G to Q

tc0 = low period of clock cycle tc
tc1 = high period of clock cycle tc (tc = tc0 + tc1)
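Once the delays are known, the four inequalities can be checked mechanically. The Python sketch below evaluates them for one latch pair. The `LatchStage` record, the function name and all numbers are hypothetical, and tc1 is taken as the remaining part of the clock period (tc = tc0 + tc1).

```python
from dataclasses import dataclass


@dataclass
class LatchStage:
    """One latch plus the logic cloud it drives; all times in ns (hypothetical)."""
    tnq: float  # min G-to-Q delay of the latch
    txq: float  # max G-to-Q delay of the latch
    tnl: float  # min delay of the following combinational logic
    txl: float  # max delay of the following combinational logic
    ts: float   # setup time of this latch
    th: float   # hold time of this latch


def check_latch_pair(a: LatchStage, b: LatchStage, tc0: float, tc1: float) -> dict:
    """Evaluate t1a, t1b (hold) and t2a, t2b (setup) for latches a and b."""
    return {
        "t1a": a.tnq + a.tnl > b.th,        # earliest change from a must respect b's hold
        "t1b": b.tnq + b.tnl > a.th,        # earliest change from b must respect a's hold
        "t2a": a.txq + a.txl < tc0 - b.ts,  # path a -> b must meet b's setup within tc0
        "t2b": b.txq + b.txl < tc1 - a.ts,  # path b -> next a must meet a's setup within tc1
    }


a = LatchStage(tnq=0.2, txq=0.4, tnl=0.3, txl=1.5, ts=0.2, th=0.1)
b = LatchStage(tnq=0.2, txq=0.4, tnl=0.1, txl=2.0, ts=0.2, th=0.1)
for name, ok in check_latch_pair(a, b, tc0=2.5, tc1=2.5).items():
    print(name, "ok" if ok else "VIOLATED")
```

With these numbers only t2b fails: the slow cloud CLb (2.0 ns) plus 0.4 ns G-to-Q does not fit into tc1 - tsa = 2.3 ns.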


Latch Timing Constraints #2

t1a = tnqa + tnla > thb
t1b = tnqb + tnlb > tha
t2a = txqa + txla < tc0 - tsb
t2b = txqb + txlb < tc1 - tsa
Questions for latch-based designs:
- how much time is there for useful work (i.e. for combinational logic delay)? Adding t2a and t2b for identical latches (tc = tc0 + tc1):
  txla + txlb < tc - 2(ts + txq)
- what is the maximal clock frequency? With identical halves:
  tc = 1/f > 2(txq + txl + ts)
- does it help to guarantee a minimum tn, for example, by requiring a minimum number of gates in each cloud?
- Suppose the maximum clock skew is tSKEW. How does that affect the equations above? Clock skew measures the difference in arrival of CLK at two cascaded latches (not necessarily any two latches!).
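The two closed-form answers above are easy to evaluate. This Python sketch simply codes the two inequalities for identical latches and identical logic halves; the function names and the delay values are hypothetical, and clock skew is not modelled.

```python
def latch_logic_budget(tc: float, ts: float, txq: float) -> float:
    """Total max logic delay txla + txlb that fits in one period tc.

    From adding t2a and t2b with identical latches: txla + txlb < tc - 2(ts + txq).
    """
    return tc - 2 * (ts + txq)


def latch_min_period(txq: float, txl: float, ts: float) -> float:
    """Smallest clock period with identical halves: tc > 2(txq + txl + ts)."""
    return 2 * (txq + txl + ts)


tc_min = latch_min_period(txq=0.4, txl=2.0, ts=0.2)          # ns
print(f"tc > {tc_min:.1f} ns, f < {1e3 / tc_min:.0f} MHz")   # tc > 5.2 ns, f < 192 MHz
print(f"{latch_logic_budget(tc=6.0, ts=0.2, txq=0.4):.1f} ns of logic per cycle")  # 4.8
```

Every latch on the loop costs txq + ts of overhead, which is why the budget shrinks by 2(ts + txq) per cycle.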
Static Latches
Basic idea: Want storage node to be isolated from whatever user does to Q. Need gain around this loop to make latch static. Would like fast CLK-to-Q, small setup and zero hold times.
Obvious implementation: [Figure: 2:1 multiplexer selected by CLK (inputs 0 and 1) driving Q, with Q fed back to one input.]
Oops… feedback not isolated from Q. Could add additional output inverters...
[Figure: improved static latch with input D, clocks CLK and CLKN, and output Q.]
Good! Input goes only to FET gates.
Should we buffer CLK 0, 1 or 2 times?

Latch Timing
[Figure: latch with internal nodes 1 and 2, clocked by CLK.]
setup time = how long D input has to be stable before CLK transition.
hold time = how long D input has to be stable after CLK transition.
[Figure: D waveform with setup window ts before and hold window th after the CLK edge.]
So, what node should we use to measure setup and hold times? And what should we measure?
Other time of interest: CLK-to-Q.
Dynamic Latches
Suppose in the interest of speed we were willing to give up the “static guarantee” and take our chances with dynamic latches, i.e., remove feedback path...
[Figure: dynamic latch with input D, clock CLK and output Q. Annotations: “Eliminate when Q fanout is small (1)”; “Can combine other logic with inverter”; “local or global clock inverter?”]
Can we do without the CLK inverter too?
DEC did without it on the 21064 but put it back in for the 21164.
[Figure: dynamic latch variants with input D, clocks CLK and CLKN, and output Q.]
Delete the PFET driven by CLKN and then add an NFET driven by CLK in Q’s pulldown path to handle what happens when D goes from 1 to 0.

Flip-flops (registers)
Using alternating positive and negative dynamic latches with a single clock gives great speed and small area, but…
- lots of worries about clock skew
- must balance logic delays to minimize wastage
- need latch size checks (check optimisations!)

What about those of us who don’t have buildings full of engineers to sweat the details? Use D flip-flops and address all the problems once!

[Figure: D flip-flop built from a master latch and a slave latch (D, Q, G) enabled on opposite CLK phases, with a timing diagram of D, CLK and Q.]
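The master-slave construction can be checked with a behavioural model. The Python sketch below (class names are hypothetical) builds a positive-edge flip-flop from two level-sensitive latches enabled on opposite clock phases and compares it with a single transparent latch. Each call to `update` is one simulation step; all delays are ignored, so it says nothing about the timing hazards listed above.

```python
class DLatch:
    """Level-sensitive D latch: transparent while its enable g is 1."""

    def __init__(self) -> None:
        self.q = 0

    def update(self, d: int, g: int) -> int:
        if g:              # transparent: Q follows D
            self.q = d
        return self.q      # opaque: Q keeps its old value


class MasterSlaveFF:
    """Positive-edge D flip-flop: master open while CLK = 0, slave open while CLK = 1."""

    def __init__(self) -> None:
        self.master = DLatch()
        self.slave = DLatch()

    def update(self, d: int, clk: int) -> int:
        m = self.master.update(d, 1 - clk)   # master transparent when CLK is low
        return self.slave.update(m, clk)     # slave transparent when CLK is high


samples = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0), (1, 1), (0, 1)]  # (CLK, D)
ff, latch = MasterSlaveFF(), DLatch()
print("FF   :", [ff.update(d, clk) for clk, d in samples])     # [0, 1, 1, 1, 0, 0, 0]
print("latch:", [latch.update(d, clk) for clk, d in samples])  # [0, 1, 0, 0, 0, 1, 1]
```

The latch output follows D whenever CLK is high. The flip-flop only changes at the steps where CLK rises, and it takes the value D had just before the edge.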
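Flip-flop Implementations
Obvious implementation:
[Figure: flip-flop with input D, clock CLK and output Q.]
Use “jamb” latches to lighten CLK load:
“Weak” feedback inverters (long-channel n and p devices) get overridden.
[Figure: jamb-latch flip-flop with input D, clock CLK and output Q.]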

Flip-Flop Timing
[Figure: flip-flop → logic cloud CL → flip-flop, both clocked by CLK; the timing diagram marks paths t1 and t2 relative to the CLK edges.]

t1 = tnq + tnl > th
t2 = txq + txl < tc - ts

Questions for register-based designs:
- how much time is there for useful work (i.e. for combinational logic delay)?
- does it help to guarantee a minimum tn? How about designing registers so that tnq > th?
- Suppose the maximum clock skew is tSKEW. How does that affect the equations above?
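Here is a minimal Python sketch of the register-based checks t1 and t2. Skew handling is an assumption, not the slide's answer: a nonzero `tskew` is charged against both checks as a worst-case margin, which is one common conservative convention. Function names and numbers are hypothetical.

```python
def check_register_path(tnq: float, txq: float, tnl: float, txl: float,
                        ts: float, th: float, tc: float,
                        tskew: float = 0.0) -> tuple[bool, bool]:
    """Return (hold_ok, setup_ok) for one flip-flop -> logic -> flip-flop path.

    With tskew = 0 these are t1 = tnq + tnl > th and t2 = txq + txl < tc - ts.
    A nonzero tskew is subtracted from both margins (assumed worst case; the
    sign of real skew decides which of the two checks it actually hurts).
    """
    hold_ok = tnq + tnl > th + tskew
    setup_ok = txq + txl < tc - ts - tskew
    return hold_ok, setup_ok


def register_min_period(txq: float, txl: float, ts: float, tskew: float = 0.0) -> float:
    """Smallest tc that satisfies the setup check."""
    return txq + txl + ts + tskew


# Hypothetical path whose shortest route is a direct wire (tnl = 0)
# and whose longest route has 3 ns of logic (txl = 3.0); times in ns.
print(check_register_path(0.3, 0.5, 0.0, 3.0, 0.2, 0.25, tc=4.0))              # (True, True)
print(check_register_path(0.3, 0.5, 0.0, 3.0, 0.2, 0.25, tc=4.0, tskew=0.1))  # (False, True)
print(f"min tc = {register_min_period(0.5, 3.0, 0.2):.1f} ns")
```

With no logic on the short path, hold depends only on tnq > th. That is why the slide asks about guaranteeing a minimum tn, and why a little skew can break an otherwise safe design.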

Dynamic Flip-Flops
I’ll have the Christer Svensson special please!
[Figure: dynamic flip-flop with input D, internal nodes 1 and 2, clock CLK and output QN.]

CLK is low:
- node 1 follows not(D)
- node 2 pulled up
- QN is “floating” with its old value

CLK is high:
- node 2 = “0” if node 1 = “1”, otherwise it stays “1” ⇒ node 2 = not(node 1) shortly after CLK↑
- QN = not(node 2) ⇒ stable soon after CLK↑
- node 1 can be pulled down if D goes to “0” (capacitive coupling), but node 2 won’t change!
Single-Phase Clocked Systems
RTL #1: [Figure: chain of three D flip-flops, all clocked by CLK.]
latch #2: [Figure: chain of three D latches, all gated by CLK.]

Simplest clocking methodology is to use a single clock in conjunction with a register. Clocks are generated with global clock buffers. CLK and its complement (CLKN) are generated locally.
[Figure: local buffers derive clk and its complement from clk-in; buffers are necessary for large loads.]
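Clock Skew
[Figure: three flip-flops in a chain; CLK reaches the later stages through delay elements.]

- if a clock net is heavily loaded, there might be a race between clock and data -> clock skew
- special attention has to be paid to the design of the clock tree. CAD tools are able to design balanced clock trees.
- two methods to avoid clock skew:
[Figure, method 1: CLK enters at the first register of the chain and is delayed toward the later registers.]
[Figure, method 2: CLK enters at the last register and is delayed toward the earlier registers.]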
Two-Phase Clocked Systems (latch)
[Figure: chain of three latches gated alternately by PHI1 and PHI2; waveforms of the “non-overlapping two-phase clocks” phi1 and phi2.]

- a problem in single-phase clocked systems is the generation and distribution of nearly perfect overlapping clocks.
- in two-phase clocked systems this is solved by non-overlapping clocks
- non-overlapping clocks can be generated with latch structures
[Figure: clk drives two cross-coupled ≥1 (NOR) gates whose outputs are phi1 and phi2.]
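The latch-structure generator can be simulated in discrete time. This Python sketch assumes the two ≥1 gates in the figure are NOR gates cross-coupled as phi1 = NOR(clk, phi2) and phi2 = NOR(not clk, phi1). Each gate has a delay of `gate_delay` time steps and the clock inverter is treated as ideal. The function name is hypothetical.

```python
def nonoverlap_clocks(clk: list[int], gate_delay: int = 1) -> tuple[list[int], list[int]]:
    """Discrete-time model of a cross-coupled NOR non-overlapping clock generator.

    phi1 = NOR(clk, phi2) and phi2 = NOR(not clk, phi1); each NOR gate sees its
    inputs gate_delay steps late. Outputs start at 0 before the first delay.
    """
    if gate_delay < 1:
        raise ValueError("gate_delay must be at least one time step")
    n = len(clk)
    phi1 = [0] * n
    phi2 = [0] * n
    for t in range(gate_delay, n):
        s = t - gate_delay                         # inputs as they were gate_delay ago
        phi1[t] = int(not (clk[s] or phi2[s]))
        phi2[t] = int(not ((1 - clk[s]) or phi1[s]))
    return phi1, phi2


clk = ([0] * 6 + [1] * 6) * 2
phi1, phi2 = nonoverlap_clocks(clk)
for name, wave in (("clk ", clk), ("phi1", phi1), ("phi2", phi2)):
    print(name, "".join("-" if v else "_" for v in wave))
```

Because each phase can only rise after the other has been low for one gate delay, phi1 and phi2 are never high in the same time step. The dead time between them grows with `gate_delay`, which is how real generators set the non-overlap interval.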

Two-Phase Clocked Systems (FF)
[Figure: chain of three flip-flops clocked alternately by CLK and its complement: “non-overlapping two-edge clocks”.]

- in properly designed two-edge clocked systems clock skew problems are drastically reduced
- Disadvantage: 50% speed reduction
- typical application: FSM on rising edge, data-path on falling edge
- designs with several FSMs and data-paths need thorough design

Clock Distribution
Two main techniques for clock distribution exist:
- a single large buffer (see Alpha processor)
- a distributed clock tree approach
[Figure: clock tree distributing clk to a stack of n-bit datapaths; delays have to match between stages.]

- there is no such thing as a design-free clocking strategy in today’s high-performance processes
- clock buffers should be surrounded by power pads due to their large power consumption
[Figure: clock driver placed between vdd and gnd pads, driving several clk lines.]

Phase Locked Loop Clock Technique
Phase locked loops (PLL) are used to generate internal clocks on chips for two main reasons:
- to synchronize the internal clock of a chip with an external clock
- to operate the internal clock at a higher rate than the external clock input
[Figure: without a PLL, data out lags the external clock by the clock-route delay plus the pad delay (dclk + dpad); with a PLL in the clock path, the clock-route delay dclk is compensated and only dpad remains.]


PLL #2
[Figure: block diagram: fosc → phase detector (up/down outputs) → charge pump → filter → voltage-controlled oscillator (VCO) → n x fosc, with a divider by n feeding ffeed back to the phase detector; waveforms of fosc, ffeed, up, down and Ufilter.]
- The phase detector produces a sequence of up/down pulses, which are used to switch a charge pump.
- The charge pump charges/discharges a capacitor with voltage or current pulses.
- A filter is used to limit the rate of change of the capacitor voltage. The result is a slowly changing voltage that depends on the phase/frequency difference between the reference fosc and the divided VCO output ffeed.
- The VCO increases/decreases its frequency of operation depending on its input voltage.
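To see how the loop settles at n x fosc, here is a minimal behavioural sketch in Python. It is a linearised phase-domain model, not a circuit. The phase detector is assumed ideal and linear, the charge pump plus filter is modelled as a proportional-integral controller, and the VCO frequency is f0 + kv·v_ctrl. All parameter names and values are hypothetical and were chosen only to make the loop stable.

```python
import math


def simulate_pll(f_ref=1.0, n=4, f0=3.0, kv=1.0, kp=0.4456, ki=0.159,
                 dt=1e-3, t_end=40.0):
    """Linearised model of a frequency-multiplying PLL; returns the final VCO frequency.

    Assumptions (not from the notes): linear phase detector, PI loop filter
    standing in for charge pump + filter, VCO with f_vco = f0 + kv * v_ctrl.
    """
    theta_ref = 0.0   # phase of the reference fosc (rad)
    theta_vco = 0.0   # phase of the VCO output (rad)
    integ = 0.0       # integral state: charge stored on the filter capacitor
    f_vco = f0
    for _ in range(int(t_end / dt)):
        err = theta_ref - theta_vco / n    # phase error against ffeed = VCO / n
        integ += ki * err * dt             # charge pump charges/discharges the capacitor
        v_ctrl = kp * err + integ          # filter output voltage
        f_vco = f0 + kv * v_ctrl           # VCO frequency follows the control voltage
        theta_ref += 2 * math.pi * f_ref * dt
        theta_vco += 2 * math.pi * f_vco * dt
    return f_vco


print(f"locked VCO frequency: {simulate_pll():.3f}")  # about 4.000 = n x fosc
```

The VCO starts at its free-running 3 Hz. The integrator keeps raising the control voltage until the divided output is in phase with fosc, and at that point the VCO runs at exactly n x fosc.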
Static Timing Analysis
Do I have to check ALL the constraints? Yup, for every pair of connected registers/latches AND for all possible data values!

We need a CAD tool: static timing analyser. Here’s how it works:
Step 1: “Level-ize” all signal nodes.
Start by assigning all register outputs and top-level inputs a level of 0. For all other gates: levelOUTPUT = max(levelINPUT) + 1.

Step 2: Compute min/max signal delays.
For each successive node level, compute min and max time for all nodes on that level (see next slide for details). This is a “data independent” computation. Might need case analysis to avoid false paths.

Step 3: Check setup and hold constraints.
Use min times of register inputs to check hold time. Use max times and tCLK to check setup time, or use max time + tSETUP to determine min tCLK.
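The three steps can be sketched in a few dozen lines of Python. The netlist, gate delays and clock-to-Q windows below are hypothetical. Each gate has one fixed (min, max) delay instead of separate rise/fall values, and there is no false-path case analysis.

```python
# Hypothetical netlist: node -> (input nodes, (min, max) gate delay in ns)
GATES = {
    "n1": (["q1", "q2"], (0.10, 0.30)),
    "n2": (["q2"], (0.05, 0.15)),
    "n3": (["n1", "n2"], (0.10, 0.25)),
    "d1": (["n3", "q1"], (0.08, 0.20)),
}
# Register outputs are level 0; their arrival window is the clock-to-Q delay (tnq, txq).
REG_OUTPUTS = {"q1": (0.2, 0.4), "q2": (0.2, 0.4)}
REG_INPUTS = ["d1"]


def static_timing(gates, reg_outputs, reg_inputs, t_clk, t_setup, t_hold):
    level = {node: 0 for node in reg_outputs}
    arrival = dict(reg_outputs)   # node -> (earliest, latest) change time

    # Step 1: level-ize; levelOUTPUT = max(levelINPUT) + 1
    remaining = dict(gates)
    while remaining:
        ready = [g for g, (ins, _) in remaining.items() if all(i in level for i in ins)]
        if not ready:
            raise ValueError("combinational loop or undriven node")
        for g in ready:
            level[g] = max(level[i] for i in remaining[g][0]) + 1
            del remaining[g]

    # Step 2: min/max arrival times, level by level
    for g in sorted(gates, key=lambda node: level[node]):
        ins, (dmin, dmax) = gates[g]
        arrival[g] = (min(arrival[i][0] for i in ins) + dmin,
                      max(arrival[i][1] for i in ins) + dmax)

    # Step 3: hold uses earliest arrival, setup uses latest arrival
    hold_ok = {d: arrival[d][0] > t_hold for d in reg_inputs}
    setup_ok = {d: arrival[d][1] < t_clk - t_setup for d in reg_inputs}
    min_t_clk = max(arrival[d][1] for d in reg_inputs) + t_setup
    return level, arrival, hold_ok, setup_ok, min_t_clk


level, arrival, hold_ok, setup_ok, min_t_clk = static_timing(
    GATES, REG_OUTPUTS, REG_INPUTS, t_clk=2.0, t_setup=0.2, t_hold=0.1)
print(level["d1"], [round(x, 2) for x in arrival["d1"]], hold_ok, setup_ok, round(min_t_clk, 2))
```

The analysis is data independent: it propagates timing windows, not logic values. That is why it covers “all possible data values” at once, and also why it can report false paths that no real input can sensitize.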

Stage Delay Computation
Look at each gate and use knowledge of input timing and rise/fall timing to compute the earliest and latest time the output could change, for both rising and falling output transitions.

[Figure: clocked inverter stage between VDD and GND with input IN, clocks CLK and CLKN, output OUT (load COUT), internal node 1 (capacitance C1) in the pull-down and internal node 2 (capacitance C2) in the pull-up.]
IN↑ ⇒ OUT↓: min ⇒ node 1 = 0V, fast; max ⇒ node 1 = VDD, slow
IN↓ ⇒ OUT↑: min ⇒ node 2 = VDD, fast; max ⇒ node 2 = 0V, slow
Other transitions: CLK↑, CLK↓, CLKN↑, CLKN↓

Use the Penfield-Rubinstein model to compute td,in-out = sum(Ri × Ci) over all nodes “i” in the stage, where Ri is the total “effective resistance” to the power rail and Ci is non-zero if the node capacitor needs to be charged/discharged. Multiply by a degrading factor to account for the rise/fall time of the input.
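For a single series stack, the sum(Ri × Ci) rule reduces to a running sum. In the Python sketch below, each node's Ri is the resistance between that node and the rail, and a node that is already at its final value gets Ci = 0. The transistor resistances, capacitances and the `degrade` factor are hypothetical values, and only a single chain (no branching RC tree) is handled.

```python
def stage_delay(r_chain: list[float], c_nodes: list[float], degrade: float = 1.0) -> float:
    """Penfield-Rubinstein-style delay estimate for one series transistor stack.

    r_chain[k]: effective resistance (ohm) of the k-th conducting transistor,
                counted from the power rail towards the output.
    c_nodes[k]: capacitance (F) of the node just above transistor k; use 0 for a
                node that is already at its final value (nothing to charge).
    degrade:    hypothetical factor for the input rise/fall time.
    """
    if len(r_chain) != len(c_nodes):
        raise ValueError("need one node capacitance per transistor")
    td = 0.0
    r_to_rail = 0.0
    for r, c in zip(r_chain, c_nodes):
        r_to_rail += r          # Ri: total resistance between node i and the rail
        td += r_to_rail * c     # accumulate Ri * Ci
    return degrade * td


# Hypothetical two-NFET pull-down: 10 kOhm each, C1 = 5 fF inside, COUT = 20 fF
fast = stage_delay([10e3, 10e3], [0.0, 20e-15])    # node 1 already at 0 V
slow = stage_delay([10e3, 10e3], [5e-15, 20e-15])  # node 1 starts at VDD
print(f"min {fast * 1e9:.2f} ns, max {slow * 1e9:.2f} ns")  # min 0.40 ns, max 0.45 ns
```

This mirrors the IN↑ ⇒ OUT↓ case: the min/max spread comes entirely from whether the internal node 1 has to be discharged as well.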

Coming Up...
Next topic…
Data operators

Readings for next time…
Weste:
- Sections 5.5 thru 5.5.6 (latch, FF)
- 5.5.8 thru 5.5.11 (clock strategy)
- 5.5.15 and 5.5.16 (clock strategy)

Self-study…
Weste:
- PLL section 9.3.5.3

Exercises: VLSI-10

Ex vlsi10.1 (difficulty: easy): calculate the peak current and power consumption of a 100 MHz clock driver with rise and fall times of 1 ns driving 30k register bits at 100 fF each with Vdd = 3.3 V.
Result: Ipeak = 9.9 A, Pd = 2.18 W
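The numbers can be checked with two textbook formulas. The peak current is I = C·dV/dt with the whole load switching during one 1 ns edge, and the dynamic power is P = C·Vdd²·f. The Python sketch below assumes a linear voltage ramp and full-swing switching of every bit once per cycle; the function name is hypothetical. It reproduces Ipeak = 9.9 A. The plain C·Vdd²·f formula, however, gives about 3.27 W rather than the 2.18 W quoted above (2.18 W is two thirds of 3.27 W), so the slide's power figure presumably rests on an assumption that is not stated here.

```python
def clock_driver_load(n_bits: int, c_per_bit: float, vdd: float,
                      f_clk: float, t_rise: float) -> tuple[float, float, float]:
    """Return (total capacitance F, peak current A, dynamic power W)."""
    c_total = n_bits * c_per_bit          # total switched capacitance
    i_peak = c_total * vdd / t_rise       # I = C dV/dt for a linear ramp
    p_dyn = c_total * vdd ** 2 * f_clk    # P = C Vdd^2 f
    return c_total, i_peak, p_dyn


c, i_peak, p = clock_driver_load(30_000, 100e-15, 3.3, 100e6, 1e-9)
print(f"C = {c * 1e9:.1f} nF, Ipeak = {i_peak:.1f} A, P = {p:.2f} W")
# C = 3.0 nF, Ipeak = 9.9 A, P = 3.27 W
```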
