Digital Design & Computer Arch.

Lecture 8: Timing and Verification

Prof. Onur Mutlu

ETH Zürich
Spring 2021
19 March 2021
Required Readings (This Week)
 Hardware Description Languages and Verilog
 H&H Chapter 4 in full

 Timing and Verification


 H&H Chapters 2.9 and 3.5 + (start Chapter 5)

 By tomorrow, make sure you are done with


 P&P Chapters 1-3 + H&H Chapters 1-4

2
Required Readings (Next Week)
 Von Neumann Model, LC-3, and MIPS
 P&P, Chapter 4, 5
 H&H, Chapter 6
 P&P, Appendices A and C (ISA and microarchitecture of LC-3)
 H&H, Appendix B (MIPS instructions)

 Programming
 P&P, Chapter 6

 Recommended: Digital Building Blocks


 H&H, Chapter 5

3
Assignment: Required Lecture Video
 Why study computer architecture? Why is it important?
 Future Computing Platforms: Challenges & Opportunities

 Required Assignment
 Watch one of Prof. Mutlu’s lectures and analyze either (or both)
 https://www.youtube.com/watch?v=kgiZlSOcGFM (May 2017)
 https://www.youtube.com/watch?v=mskTeNnf-i0 (Feb 2021)

 Optional Assignment – for 1% extra credit


 Write a 1-page summary of one of the lectures and email us
 What are your key takeaways?
 What did you learn?
 What did you like or dislike?
 Submit your summary to Moodle – Deadline: April 5
4
Extra Assignment: Moore’s Law (I)
 Paper review
 G.E. Moore. "Cramming more components onto integrated
circuits," Electronics magazine, 1965

 Optional Assignment – for 1% extra credit


 Write a 1-page review
 Upload PDF file to Moodle – Deadline: April 5

 I strongly recommend that you follow my guidelines for (paper) review (see next slide)

5
Extra Assignment 2: Moore’s Law (II)
 Guidelines on how to review papers critically

 Guideline slides: pdf ppt


 Video: https://www.youtube.com/watch?v=tOL6FANAJ8c

 Example reviews on “Main Memory Scaling: Challenges and Solution Directions” (link to the paper)
 Review 1
 Review 2

 Example review on “Staged memory scheduling: Achieving high performance and scalability in heterogeneous systems” (link to the paper)
 Review 1

6
Agenda
 Wrap Up FSMs in Verilog

 Timing in combinational circuits

 Timing in sequential circuits

 Circuit Verification

7
Wrap Up: FSMs in Verilog

8
Recall: Finite State Machines (FSMs)
 Each FSM consists of three separate parts:
 next state logic
 state register
 output logic

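[Figure: FSM block diagram. M inputs feed the next state logic, which produces the k-bit next state; the state register, clocked by CLK, holds the k-bit state; the output logic produces N outputs.]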

9
Recall: Finite State Machines (FSMs)
FSMs comprise:
 Sequential circuits
 State register(s)
 Store the current state and
 Load the next state at the clock edge

 Combinational circuits
 Next state logic
 Determines what the next state will be
 Output logic
 Generates the outputs
[Figure: the state register, clocked by CLK, takes the next state S' and holds the current state S; combinational logic (CL) blocks compute the next state and the outputs.]

10
FSM Example 1: Divide the Clock Frequency by 3

The output Y is HIGH for one clock cycle out of every 3. In other
words, the output divides the frequency of the clock by 3.

11
Implementing FSM Example 1: Definitions
module divideby3FSM (input clk,
input reset,
output q);

reg [1:0] state, nextstate;

parameter S0 = 2'b00;
parameter S1 = 2'b01;
parameter S2 = 2'b10;

 We define state and nextstate as 2-bit reg


 The parameter declarations are optional; they make the code easier to read

12
Implementing FSM Example 1: State Register
[Figure: state register clocked by CLK; input S' (next state), output S (current state)]

// state register
always @ (posedge clk, posedge reset)
if (reset) state <= S0;
else state <= nextstate;

 This part defines the state register (memorizing process)


 Sensitive to only clk, reset
 In this example, reset is active when it is ‘1’ (active-high)

13
Implementing FSM Example 1: Next State Logic
[Figure: FSM block diagram (next state logic, state register, output logic)]

// next state logic


always @ (*)
case (state)
S0: nextstate = S1;
S1: nextstate = S2;
S2: nextstate = S0;
default: nextstate = S0;
endcase

14
Implementing FSM Example 1: Output Logic

[Figure: FSM block diagram (next state logic, state register, output logic)]

// output logic
assign q = (state == S0);

 In this example, output depends only on state


 Moore type FSM
15
Implementation of FSM Example 1
module divideby3FSM (input clk, input reset, output q);
  reg [1:0] state, nextstate;

  parameter S0 = 2'b00; parameter S1 = 2'b01; parameter S2 = 2'b10;

  always @ (posedge clk, posedge reset) // state register
    if (reset) state <= S0;
    else       state <= nextstate;

  always @ (*) // next state logic
    case (state)
      S0: nextstate = S1;
      S1: nextstate = S2;
      S2: nextstate = S0;
      default: nextstate = S0;
    endcase

  assign q = (state == S0); // output logic
endmodule
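
A small self-checking testbench can confirm the divide-by-3 behavior. The following is only a sketch: the testbench name, the 10 ns clock period, and the number of checked cycles are assumptions, and it expects the divideby3FSM module above to be compiled alongside it.

```verilog
`timescale 1ns/1ps
module divideby3FSM_tb;               // hypothetical testbench name
  reg  clk, reset;
  wire q;
  integer i, errors;

  divideby3FSM dut (.clk(clk), .reset(reset), .q(q));

  // free-running clock, 10 ns period (first rising edge at 5 ns)
  always begin clk = 0; #5; clk = 1; #5; end

  initial begin
    errors = 0;
    reset = 1; #12; reset = 0;        // hold reset across the first rising edge
    for (i = 0; i < 9; i = i + 1) begin
      #1;                             // sample between clock edges
      // state cycles S0, S1, S2, so q should be 1 on every third check
      if (q !== (i % 3 == 0)) begin
        $display("check %0d failed: q = %b", i, q);
        errors = errors + 1;
      end
      @(posedge clk);                 // advance the FSM by one state
    end
    $display("done, %0d errors", errors);
    $finish;
  end
endmodule
```

Because q is HIGH only in S0, the check expects q = 1 on checks 0, 3, and 6 and q = 0 otherwise.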

16
FSM Example 2: Smiling Snail
 Alyssa P. Hacker has a snail that crawls down a paper tape
with 1’s and 0’s on it
 The snail smiles whenever the last four digits it has crawled
over are 1101
 Design Moore and Mealy FSMs of the snail’s brain

Moore

Mealy

17
Implementing FSM Example 2: Definitions
module SmilingSnail (input clk,
input reset,
input number,
output smile);

reg [1:0] state, nextstate;

parameter S0 = 2'b00;
parameter S1 = 2'b01;
parameter S2 = 2'b10;
parameter S3 = 2'b11;

[Figure: state transition diagram; edges are labeled number/smile]

18
Implementing FSM Example 2: State Register
// state register
always @ (posedge clk, posedge reset)
if (reset) state <= S0;
else state <= nextstate;

 This part defines the state register (memorizing process)

 Sensitive to only clk, reset

 In this example reset is active when ‘1’ (active-high)

19
Implementing FSM Example 2: Next State Logic
// next state logic
always @ (*)
case (state)
S0: if (number) nextstate = S1;
else nextstate = S0;
S1: if (number) nextstate = S2;
else nextstate = S0;
S2: if (number) nextstate = S2;
else nextstate = S3;
S3: if (number) nextstate = S1;
else nextstate = S0;
default: nextstate = S0;
endcase

20
Implementing FSM Example 2: Output Logic

// output logic
assign smile = (number & state == S3);

 In this example, output depends on state and input


 Mealy type FSM

 We used a simple combinational assignment

21
Implementation of FSM Example 2
module SmilingSnail (input clk,
                     input reset,
                     input number,
                     output smile);

  reg [1:0] state, nextstate;

  parameter S0 = 2'b00;
  parameter S1 = 2'b01;
  parameter S2 = 2'b10;
  parameter S3 = 2'b11;

  // state register
  always @ (posedge clk, posedge reset)
    if (reset) state <= S0;
    else       state <= nextstate;

  always @ (*) // next state logic
    case (state)
      S0: if (number) nextstate = S1;
          else        nextstate = S0;
      S1: if (number) nextstate = S2;
          else        nextstate = S0;
      S2: if (number) nextstate = S2;
          else        nextstate = S3;
      S3: if (number) nextstate = S1;
          else        nextstate = S0;
      default:        nextstate = S0;
    endcase

  // output logic
  assign smile = (number & state==S3);

endmodule
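
The listing above is the Mealy version. For comparison, a Moore version of the same 1101 detector could look like the sketch below. The module name, the fifth state S4, and the 3-bit encoding are assumptions made for illustration; they are not taken from the slides.

```verilog
// Sketch of a Moore version of the snail FSM (names and encoding are assumptions).
module SmilingSnailMoore (input clk,
                          input reset,
                          input number,
                          output smile);

  reg [2:0] state, nextstate;

  parameter S0 = 3'b000; // no useful digits seen yet
  parameter S1 = 3'b001; // last digit:  1
  parameter S2 = 3'b010; // last digits: 11
  parameter S3 = 3'b011; // last digits: 110
  parameter S4 = 3'b100; // last digits: 1101 (smile)

  // state register
  always @ (posedge clk, posedge reset)
    if (reset) state <= S0;
    else       state <= nextstate;

  // next state logic
  always @ (*)
    case (state)
      S0: nextstate = number ? S1 : S0;
      S1: nextstate = number ? S2 : S0;
      S2: nextstate = number ? S2 : S3;
      S3: nextstate = number ? S4 : S0;
      S4: nextstate = number ? S2 : S0; // "11011" ends in "11"
      default: nextstate = S0;
    endcase

  // output logic: depends only on the state (Moore)
  assign smile = (state == S4);
endmodule
```

The Moore machine needs one more state than the Mealy machine, and its smile output appears one clock edge later, because the output changes only after the state register has captured the final 1.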

22
What Did We Learn?
 Basics of describing sequential circuits in Verilog

 The always statement


 Needed for defining memorizing elements (flip-flops, latches)
 Can also be used to define combinational circuits

 Blocking vs Non-blocking statements


 = (blocking) assigns the value immediately
 <= (non-blocking) assigns the value at the end of the current time step, after all right-hand sides have been evaluated

 Describing FSMs in Verilog


 Next state logic
 State assignment
 Output logic
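
The difference between blocking and non-blocking assignments is easiest to see with a swap. The sketch below is a simulation-only demo; the module and signal names are made up for illustration.

```verilog
module blocking_vs_nonblocking;  // hypothetical demo module
  reg clk;
  reg a, b;   // updated with non-blocking assignments
  reg c, d;   // updated with blocking assignments

  always @(posedge clk) begin
    a <= b;   // both right-hand sides are sampled first,
    b <= a;   // so a and b swap
  end

  always @(posedge clk) begin
    c = d;    // c is overwritten immediately,
    d = c;    // so d just receives the value it already had
  end

  initial begin
    clk = 0; a = 0; b = 1; c = 0; d = 1;
    #5 clk = 1;                                    // one rising edge
    #1 $display("a=%b b=%b c=%b d=%b", a, b, c, d); // prints a=1 b=0 c=1 d=1
    $finish;
  end
endmodule
```

This is why the state register examples use <= while the combinational next state logic uses =.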
23
Now:
Timing and Verification

24
What Will We Learn Today?
 Timing in combinational circuits
 Propagation delay and contamination delay
 Glitches

 Timing in sequential circuits


 Setup time and hold time
 Determining how fast a circuit can operate

 Circuit Verification
 How to make sure a circuit works correctly
 Functional verification
 Timing verification

25
Tradeoffs in Circuit Design

26
Circuit Design is a Tradeoff Between:
 Area
 Circuit area is proportional to the cost of the device

 Speed / Throughput
 We want faster, more capable circuits

 Power / Energy
 Mobile devices need to work with a limited power supply
 High performance devices dissipate more than 100 W/cm²

 Design Time
 Designers are expensive in time and money
 The competition will not wait for you

27
Requirements and Goals Depend On Application

28
Circuit Timing
 Until now, we investigated logical functionality

 What about timing?


 How fast is a circuit?
 How can we make a circuit faster?
 What happens if we run a circuit too fast?

 A design that is logically correct can still fail because of real-world implementation issues!

29
Part 1:
Combinational Circuit Timing

30
Digital Logic Abstraction
 “Digital logic” is a convenient abstraction
 Output changes immediately with the input

[Figure: ideal inverter waveforms; when A switches from 1 to 0, Y switches from 0 to 1 at the same instant]

31
Combinational Circuit Delay
 In reality, outputs are delayed from inputs
 Transistors take a finite amount of time to switch

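[Figure: inverter with input A and output Y; in the timing diagram, the transition on Y lags the transition on A by the delay]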

32
Real Inverter Delay Example

Image source: Sandoval-Ibarra, F., and E. S. Hernández-Bernal. "Ring CMOS NOT-based oscillators:
Analysis and design." Journal of applied research and technology, 2008.
33
Circuit Delay and Its Variation
 Delay is fundamentally caused by
 Capacitance and resistance in a circuit
 Finite speed of light (not so fast on a nanosecond scale!)

 Anything affecting these quantities can change delay:


 Rising (i.e., 0 -> 1) vs. falling (i.e., 1 -> 0) inputs
 Different inputs have different delays
 Changes in environment (e.g., temperature)
 Aging of the circuit

 We have a range of possible delays from input to output

34
Delays from Input to Output Y
 Contamination delay (tcd): delay until Y starts changing
 Propagation delay (tpd): delay until Y finishes changing

[Figure: example circuit and timing diagram showing the effect of changing input ‘A’; cross-hatching means the value is changing]
35
Calculating Longest & Shortest Delay Paths
We care about both the longest and shortest delay paths in a circuit (we will see why later in the lecture)
[Figure: example circuit with inputs A, B, C, D, internal nodes n1 and n2, and output Y; the critical path and the short path are marked]
 Critical (Longest) Path: tpd = 2 tpd_AND + tpd_OR
 Shortest Path: tcd = tcd_AND
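
As a worked instance of the two formulas, assume (purely for illustration; the slides give no gate delays here) that an AND gate has a propagation delay of 100 ps and a contamination delay of 60 ps, and that the OR gate has a propagation delay of 120 ps:

```latex
\begin{aligned}
t_{pd} &= 2\,t_{pd\_AND} + t_{pd\_OR} = 2(100\ \text{ps}) + 120\ \text{ps} = 320\ \text{ps} \\
t_{cd} &= t_{cd\_AND} = 60\ \text{ps}
\end{aligned}
```

Under these assumed numbers, the output may start changing as early as 60 ps after an input changes and is guaranteed to have settled after 320 ps.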
36
Calculating Longest Delay Path (Critical Path)

 Critical (Longest) Path: tpd = 2 tpd_AND + tpd_OR


 Shortest Path: tcd = tcd_AND
37
Calculating Shortest Delay Path

 Critical (Longest) Path: tpd = 2 tpd_AND + tpd_OR


 Shortest Path: tcd = tcd_AND
38
Example tpd for a Real NAND-2 Gate

 Heavy dependence on voltage and temperature!


Source: Nexperia 2-input NAND (74HC00) Datasheet, Section 39
Example Worst-Case tpd
 Two different implementations of a 4:1 multiplexer
[Figure: table of gate delays and the schematics of Implementation 1 and Implementation 2]

 Different designs lead to very different delays


40
Aside: A Third 4:1 MUX Implementation

41
Disclaimer: Calculating Long/Short Paths
It’s not always this easy to determine the long/short paths!
 Not all input transitions affect the output
 Can have multiple different paths from an input to output

 In reality, circuits are not all built equally


 Different instances of the same gate have different delays
 Wires have nonzero delay (increasing with length)
 Temperature/voltage affect circuit speeds
 Not all circuit elements are affected the same way
 Can even change the critical path!

 Designers assume “worst-case” conditions and run many statistical simulations to balance yield/performance

42
Combinational Timing Summary
 Circuit outputs change some time after the inputs change
 Caused by finite speed of light (not so fast on a ns scale!)
 Delay is dependent on inputs, environmental state, etc.

 The range of possible delays is characterized by:


 Contamination delay (tcd): minimum possible delay
 Propagation delay (tpd): maximum possible delay

 Delays change with:


 Circuit design (e.g., topology, materials)
 Operating conditions

43
Output Glitches

44
Glitches
 Glitch: one input transition causes multiple output transitions

[Figure: example circuit in its initial state]

45
Glitches
 Glitch: one input transition causes multiple output transitions
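[Figure: example circuit with A = 0 and C = 1. When B falls (1 -> 0), the change reaches the output through a fast path (2 gates, via n2) and a slow path (3 gates, via n1), so Y glitches: 1 -> 0 -> 1]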
49
Optional: Avoiding Glitches Using K-Maps
 Glitches are visible in K-maps
 Recall: K-maps show the results of a change in a single input
 A glitch occurs when moving between prime implicants

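[Figure: K-map and circuit for Y = A'B' + BC with A = 0 and C = 1. When B falls (1 -> 0), the circuit moves from prime implicant BC to prime implicant A'B', and Y glitches: 1 -> 0 -> 1]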

50
Optional: Avoiding Glitches Using K-Maps
 We can fix the issue by adding in the consensus term
 Ensures no transition between different prime implicants

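[Figure: K-map and circuit for Y = A'B' + BC + A'C with A = 0 and C = 1. When B falls (1 -> 0), the consensus term A'C holds the output, so Y stays at 1 (1 -> 1)]

The consensus term A'C has no dependence on B
=> No glitch!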
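
Adding the consensus term must not change the logic function. A quick exhaustive check can confirm this; the sketch below assumes the example function is Y = A'B' + BC with consensus term A'C, and the module and signal names are hypothetical.

```verilog
module consensus_check;  // hypothetical module name
  reg a, b, c;
  wire y_orig, y_fixed;
  integer i, mismatches;

  assign y_orig  = (~a & ~b) | (b & c);             // original function
  assign y_fixed = (~a & ~b) | (b & c) | (~a & c);  // with consensus term

  initial begin
    mismatches = 0;
    for (i = 0; i < 8; i = i + 1) begin
      {a, b, c} = i[2:0];   // walk through all 8 input combinations
      #1;                   // let the continuous assignments update
      if (y_orig !== y_fixed) mismatches = mismatches + 1;
    end
    $display("mismatches = %0d", mismatches);  // prints mismatches = 0
    $finish;
  end
endmodule
```

This zero-delay model only checks functional equivalence; it does not show the glitch itself. Note also that synthesis tools may optimize away a logically redundant term unless they are instructed to keep it.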

51
Avoiding Glitches
 Q: Do we always care about glitches?
 Fixing glitches is costly
 More chip area
 More power consumption
 More design effort
 The circuit is eventually guaranteed to converge to the right value regardless of glitchiness

 A: No, not always!
 If we only care about the long-term steady state output, we can safely ignore glitches
 Up to the designer to decide if glitches matter in their application
 When examining simulation output, it is important to recognize glitches

52
Part 2:
Sequential Circuit Timing

53
Recall: D Flip-Flop
 Flip-flop samples D at the active clock edge
 It outputs the sampled value to Q
 It “stores” the sampled value until the next active clock edge

[Figure: D flip-flop symbol with data input D, clock input CLK, and output Q]

 The D flip-flop is made from combinational elements


 D, Q, CLK all have timing requirements!
54
D Flip-Flop Input Timing Constraints
 D must be stable when sampled (i.e., at active clock edge)
[Figure: D flip-flop and timing diagram; D must be stable from tsetup before the clock edge until thold after it, a window of width ta]
 Setup time (tsetup): time before the clock edge that data must be stable (i.e., not changing)
 Hold time (thold): time after the clock edge that data must be stable
 Aperture time (ta): time around the clock edge that data must be stable (ta = tsetup + thold)
55
Violating Input Timing: Metastability
 If D is changing when sampled, metastability can occur
 Flip-flop output is stuck somewhere between ‘1’ and ‘0’
 Output eventually settles non-deterministically
Example Timing Violations (NAND RS Latch)
[Figure: CLK and Q waveforms; Q lingers in a metastable state and then converges non-deterministically]
Source: W. J. Dally, Lecture notes for EE108A, Lecture 13: Metastability and Synchronization Failure (When Good Flip-Flops go Bad), 11/9/2005.
56
Flip-Flop Output Timing
[Figure: D flip-flop and timing diagram marking tccq and tpcq after the clock edge]
 Contamination delay clock-to-q (tccq): earliest time after the clock edge that Q starts to change (i.e., is unstable)
 Propagation delay clock-to-q (tpcq): latest time after the clock edge that Q stops changing (i.e., is stable)
57
Recall: Sequential System Design

 Multiple flip-flops are connected with combinational logic


 Clock runs with period Tc (cycle time)

 Must meet timing requirements for both R1 and R2!

58
Ensuring Correct Sequential Operation
 Need to ensure correct input timing on R2

 Specifically, D2 must be stable:


 at least tsetup before the clock edge
 at least until thold after the clock edge

[Figure: clock waveform with tsetup before and thold after the clock edge, together spanning ta]

59
Ensuring Correct Sequential Operation
 This means there is both a minimum and maximum delay between two flip-flops
 CL too fast -> R2 thold violation
 CL too slow -> R2 tsetup violation
[Figure: (a) registers R1 and R2, both clocked by CLK, with combinational logic (CL) between Q1 and D2; (b) timing diagram of CLK, Q1, and D2 over one clock period Tc, marking the potential R2 tHOLD and tSETUP violations]
60
Setup Time Constraint
 Safe timing depends on the maximum delay from R1 to R2
 The input to R2 must be stable at least tsetup before the clock edge.

Tc > tpcq + tpd + tsetup
 tpd: useful work
 tpcq + tsetup: wasted work
 Sequencing overhead: amount of time wasted each cycle due to sequencing element timing requirements
[Figure: registers R1 and R2 connected through combinational logic (CL), and a timing diagram of CLK, Q1, and D2 showing tpcq, tpd, and tsetup within one clock period Tc]
65
tsetup Constraint and Design Performance

 Critical path: path with the longest tpd

Tc > tpcq + tpd + tsetup


 Overall design performance is determined by the critical path tpd
 Determines the minimum clock period (i.e., max operating frequency)
 If the critical path is too long, the design will run slowly
 If critical path is too short, each cycle will do very little useful work
 i.e., most of the cycle will be wasted in sequencing overhead
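
The setup constraint can be read in two directions: for a given circuit it bounds the clock period, and for a given clock period it bounds the combinational delay that fits between two registers. A short rearrangement, using only the symbols defined on these slides (the subscript "min" is added here for illustration):

```latex
\begin{aligned}
T_c &> t_{pcq} + t_{pd} + t_{setup} \\
t_{pd} &< T_c - (t_{pcq} + t_{setup}) \\
f_{max} &= \frac{1}{T_{c,min}} = \frac{1}{t_{pcq} + t_{pd} + t_{setup}}
\end{aligned}
```

The term in parentheses is the sequencing overhead: it is paid every cycle regardless of how much logic sits between the registers.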

66
Hold Time Constraint
 Safe timing depends on the minimum delay from R1 to R2
 D2 (i.e., R2 input) must be stable for at least thold after the clock edge

tccq + tcd > thold
tcd > thold - tccq
We need to have a minimum combinational delay!
[Figure: registers R1 and R2 connected through combinational logic (CL), and a timing diagram of CLK, Q1, and D2 showing tccq, tcd, and thold]

70
Hold Time Constraint
 Safe timing depends on the minimum delay from R1 to R2
 D2 (i.e., R2 input) must be stable for at least thold after the clock edge

tccq + tcd > thold
tcd > thold - tccq
Does NOT depend on Tc!
Very hard to fix thold violations after manufacturing: must modify circuits!
[Figure: registers R1 and R2 connected through combinational logic (CL), and a timing diagram of CLK, Q1, and D2 showing tccq, tcd, and thold]
71
Sequential Timing Summary
tccq / tpcq   clock-to-q delay (contamination/propagation)
tcd / tpd     combinational logic delay (contamination/propagation)
tsetup        time that FF inputs must be stable before next clock edge
thold         time that FF inputs must be stable after a clock edge
Tc            clock period

[Figure: two copies of the R1 -> CL -> R2 circuit with timing diagrams; one shows the hold time constraint (tccq, tcd, thold), the other shows the setup time constraint (tpcq, tpd, tsetup) within one clock period Tc]


72
Example: Timing Analysis
[Figure: circuit with inputs A, B, C, D coming from registers clocked by CLK, and combinational logic producing X' and Y', which are registered as X and Y]

Timing Characteristics
tccq = 30 ps
tpcq = 50 ps
tsetup = 60 ps
thold = 70 ps

per gate
tpd = 35 ps
tcd = 25 ps

tpd = 3 x 35 ps = 105 ps
tcd = 25 ps

Check setup time constraints:
Tc > tpcq + tpd + tsetup
Tc > (50 + 105 + 60) ps = 215 ps
fmax = 1/Tc = 4.65 GHz

Check hold time constraints:
tccq + tcd > thold ?
(30 + 25) ps > 70 ps ? FAIL
78
Example: Fixing Hold Time Violation
Add buffers to the short paths:
[Figure: the same circuit with buffers added to the short paths]

Timing Characteristics
tccq = 30 ps
tpcq = 50 ps
tsetup = 60 ps
thold = 70 ps

per gate
tpd = 35 ps
tcd = 25 ps

tpd = 3 x 35 ps = 105 ps
tcd = 2 x 25 ps = 50 ps

Check setup time constraints:
Tc > tpcq + tpd + tsetup
Tc > (50 + 105 + 60) ps = 215 ps
fc = 1/Tc = 4.65 GHz
Note: no change to max frequency!

Check hold time constraints:
tccq + tcd > thold ?
82
Example: Fixing Hold Time Violation
Add buffers to the short paths:
[Figure: the same circuit with buffers added to the short paths]

Timing Characteristics
tccq = 30 ps
tpcq = 50 ps
tsetup = 60 ps
thold = 70 ps

per gate
tpd = 35 ps
tcd = 25 ps

tpd = 3 x 35 ps = 105 ps
tcd = 2 x 25 ps = 50 ps

Check setup time constraints:
Tc > tpcq + tpd + tsetup
Tc > (50 + 105 + 60) ps = 215 ps
fc = 1/Tc = 4.65 GHz

Check hold time constraints:
tccq + tcd > thold ?
(30 + 50) ps > 70 ps ? PASS
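
The setup and hold checks in these examples are plain arithmetic, so they are easy to script. The following simulation-only sketch recomputes them with the example's numbers; the module, parameter, and task names are assumptions made for illustration.

```verilog
module timing_check;  // hypothetical module name
  // values from the example, in picoseconds
  parameter T_CCQ = 30, T_PCQ = 50, T_SETUP = 60, T_HOLD = 70;
  parameter T_PD  = 105;               // 3 gates x 35 ps on the critical path

  integer tc_min;

  // hold check: tccq + tcd > thold
  task check_hold(input integer t_cd);
    begin
      if (T_CCQ + t_cd > T_HOLD)
        $display("tcd = %0d ps: hold PASS", t_cd);
      else
        $display("tcd = %0d ps: hold FAIL", t_cd);
    end
  endtask

  initial begin
    // setup check: Tc > tpcq + tpd + tsetup
    tc_min = T_PCQ + T_PD + T_SETUP;
    $display("Tc > %0d ps, fmax = %.2f GHz", tc_min, 1000.0 / tc_min);
    check_hold(25);   // original short path: one gate
    check_hold(50);   // short path with an added buffer: two gates
    $finish;
  end
endmodule
```

The sketch should report the 215 ps bound, a hold failure for tcd = 25 ps, and a pass for tcd = 50 ps, matching the slide values. The added buffer changes only the hold check because it lengthens the short path and leaves the critical path untouched.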
84
Clock Skew
 To make matters worse, clocks have delay too!
 The clock does not reach all parts of the chip at the same time!
 Clock skew: time difference between two clock edges

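[Figure: a clock source drives point A directly and point B through a long, slow clock path; in the timing diagram, the clock edge arrives at point B later than at point A, and the difference is the clock skew]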

85
Clock Skew Example
 Example of the Alpha 21264 clock skew spatial distribution

P. E. Gronowski+, "High-performance Microprocessor Design," JSSC’98.
86


Clock Skew: Setup Time Revisited
 Safe timing requires considering the worst-case skew
 Clock arrives at R2 before R1

 Leaves as little time as possible for the combinational logic

Signal must arrive at D2 earlier!

This effectively increases tsetup:

Tc > tpcq + tpd + tsetup + tskew

Tc > tpcq + tpd + tsetup,effective (where tsetup,effective = tsetup + tskew)


87
Clock Skew: Hold Time Revisited
 Safe timing requires considering the worst-case skew
 Clock arrives at R2 after R1

 Increases the minimum required delay for the combinational logic

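Signal must arrive at D2 later!

This effectively increases thold:

tcd + tccq > thold + tskew

tcd + tccq > thold,effective (where thold,effective = thold + tskew)
[Figure: timing diagram showing tccq and tcd after the R1 clock edge, and thold measured from the R2 clock edge, which arrives tskew later]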
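
To see how much skew costs, the numbers from the timing analysis example (tpcq = 50 ps, tpd = 105 ps, tsetup = 60 ps, tccq = 30 ps, thold = 70 ps) can be reused with a skew value. The 50 ps skew below is an assumption chosen for illustration; it does not come from the slides.

```latex
\begin{aligned}
T_c &> t_{pcq} + t_{pd} + t_{setup} + t_{skew} = (50 + 105 + 60 + 50)\ \text{ps} = 265\ \text{ps}
  \;\Rightarrow\; f_{max} = 1/T_c \approx 3.77\ \text{GHz} \\
t_{ccq} + t_{cd} &> t_{hold} + t_{skew} = (70 + 50)\ \text{ps} = 120\ \text{ps}
  \;\Rightarrow\; t_{cd} > 90\ \text{ps}
\end{aligned}
```

Under this assumed skew, the maximum frequency drops from 4.65 GHz to about 3.77 GHz, and a short path with tcd = 50 ps would no longer satisfy the hold constraint.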
88
Clock Skew: Summary
 Skew effectively increases both tsetup and thold
 Increased sequencing overhead
 i.e., less useful work done per cycle

 Designers must keep skew to a minimum


 Requires intelligent “clock network” across a chip
 Goal: clock arrives at all locations at roughly the same time

Source: Abdelhadi, Ameer, et al. "Timing-driven variation-aware nonuniform clock mesh synthesis." GLSVLSI’10.

89
Part 3:
Circuit Verification

90
How Do You Know That A Circuit Works?
 You have designed a circuit
 Is it functionally correct?
 Even if it is logically correct, does the hardware meet all
timing constraints?

 How can you test for:


 Functionality?
 Timing?

 Answer: simulation tools!


 Formal verification tools (e.g., SAT solvers)
 HDL timing simulation (e.g., Vivado)
 Circuit simulation (e.g., SPICE)

91
Testing Large Digital Designs
 Testing can be the most time consuming design stage
 Functional correctness of all logic paths
 Timing, power, etc. of all circuit elements

 Unfortunately, low-level (e.g., circuit) simulation is much slower than high-level (e.g., HDL, C) simulation

 Solution: we split responsibilities:


 1) Check only functionality at a high level (e.g., C, HDL)
 (Relatively) fast simulation time allows high code coverage
 Easy to write and run tests
 2) Check only timing, power, etc. at low level (e.g., circuit)
 No functional testing of low-level model
 Instead, test functional equivalence to high-level model
 Hard, but easier than testing logical functionality at this level
Adapted from “CMOS VLSI Design 4e”, Neil H. E. Weste and David Money Harris ©2011 Pearson
92
Testing Large Digital Designs
 We have tools to handle different levels of verification
 Logic synthesis tools guarantee equivalence of high-level logic and synthesized circuit-level description


 Timing verification tools check all circuit timings
 Design rule checks ensure that physical circuits are buildable

 The task of a logic designer is to:


 Provide functional tests for logical correctness of the design
 Provide timing constraints (e.g., desired operating frequency)

 Tools and/or circuit engineers will decide if it can be built!

Adapted from “CMOS VLSI Design 4e”, Neil H. E. Weste and David Money Harris ©2011 Pearson
93
Part 4:
Functional Verification

94
Functional Verification
 Goal: check logical correctness of the design

 Physical circuit timing (e.g., tsetup/thold) is typically ignored


 May implement simple checks to catch obvious bugs
 We’ll discuss timing verification later in this lecture

 There are two primary approaches


 Logic simulation (e.g., C/C++/Verilog test routines)
 Formal verification techniques

 In this course, we will use Verilog for functional verification

95
Testbench-Based Functional Testing
 Testbench: a module created specifically to test a design
 Tested design is called the “device under test (DUT)”

[Figure: the testbench contains a test pattern generator that drives the DUT inputs, and output checking logic that reads the DUT outputs]

 Testbench provides inputs (test patterns) to the DUT


 Hand-crafted values
 Automatically generated (e.g., sequential or random values)
 Testbench checks outputs of the DUT against:
 Hand-crafted values
 A “golden design” that is known to be bug-free
96
Testbench-Based Functional Testing
 A testbench can be:
 HDL code written to test other HDL modules
 Circuit schematic used to test other circuit designs

 The testbench is not designed for hardware synthesis!


 Runs in simulation only
 HDL simulator (e.g., Vivado simulator)
 SPICE circuit simulation
 Testbench uses simulation-only constructs
 E.g., “wait 10ns”
 E.g., ideal voltage/current source
 Not suitable to be physically built!

97
Common Verilog Testbench Types

Testbench       Input/Output Generation   Error Checking
Simple          Manual                    Manual
Self-Checking   Manual                    Automatic
Automatic       Automatic                 Automatic

98
Example DUT
 We will walk through different types of testbenches to test
a module that implements the logic function:
y = (~b ∙ ~c) + (a ∙ ~b)
// performs y = ~b & ~c | a & ~b
module sillyfunction(input a, b, c,
output y);
wire b_n, c_n;
wire m1, m2;

not not_b(b_n, b);
not not_c(c_n, c);

and minterm1(m1, b_n, c_n);
and minterm2(m2, a, b_n);
or out_func(y, m1, m2);
endmodule
99
Useful Verilog Syntax for Testbenching
module example_syntax();
reg a;

// like "always" block, but runs only once at sim start
initial
begin
a = 0; // set value of reg: use blocking assignments
#10; // wait (do nothing) for 10 ns
a = 1;
$display("printf() style message!"); // print message
end
endmodule

100
Simple Testbench

101
Simple Testbench
module testbench1(); // No inputs, outputs
reg a, b, c; // Manually assigned
wire y; // Manually checked

// instantiate device under test


sillyfunction dut (.a(a), .b(b), .c(c), .y(y) );

// apply hardcoded inputs one at a time


initial begin
a = 0; b = 0; c = 0; #10; // apply inputs, wait 10ns
c = 1; #10; // apply inputs, wait 10ns
b = 1; c = 0; #10; // etc .. etc..
c = 1; #10;
a = 1; b = 0; c = 0; #10;
end
endmodule

102
Simple Testbench: Output Checking
 Most common method is to look at waveform diagrams
 Thousands of signals over millions of clock cycles

 Too many to just printf()!

[Figure: example waveform diagram; the horizontal axis is time]
 Manually check that output is correct at all times

103
Simple Testbench
 Pros:
 Easy to design
 Can easily test a few, specific inputs (e.g., corner cases)

 Cons:
 Not scalable to many test cases
 Outputs must be checked manually outside of the simulation
 E.g., inspecting dumped waveform signals
 E.g., printf() style debugging

104
Self-Checking Testbench

105
Self-Checking Testbench
module testbench2();
reg a, b, c;
wire y;

sillyfunction dut(.a(a), .b(b), .c(c), .y(y));

initial begin
a = 0; b = 0; c = 0; #10; // apply input, wait 10ns
if (y !== 1) $display("000 failed."); // check result
c = 1; #10;
if (y !== 0) $display("001 failed.");
b = 1; c = 0; #10;
if (y !== 0) $display("010 failed.");
end
endmodule
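
Hardcoding every expected value is error-prone. One possible variation, sketched below, loops over all eight input combinations and computes the expected value from the logic expression itself. The testbench name and the signals expected, i, and errors are assumptions for illustration, and the sketch expects the sillyfunction module above to be compiled alongside it.

```verilog
module testbench2_loop();  // hypothetical testbench name
  reg a, b, c;
  wire y;
  reg expected;
  integer i, errors;

  sillyfunction dut(.a(a), .b(b), .c(c), .y(y));

  initial begin
    errors = 0;
    for (i = 0; i < 8; i = i + 1) begin
      {a, b, c} = i[2:0];                  // apply next input combination
      #10;                                 // wait 10 time units
      expected = (~b & ~c) | (a & ~b);     // reference expression
      if (y !== expected) begin
        $display("%b%b%b failed: y = %b, expected %b", a, b, c, y, expected);
        errors = errors + 1;
      end
    end
    $display("%0d tests completed with %0d errors", i, errors);
    $finish;
  end
endmodule
```

The reference expression plays the role of a very small golden model; if it is written incorrectly, the testbench will report false failures, so it still has to be reviewed with the same care as the DUT.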

106
Self-Checking Testbench
 Pros:
 Still easy to design
 Still easy to test a few, specific inputs (e.g., corner cases)
 Simulator will print whenever an error occurs

 Cons:
 Still not scalable to millions of test cases
 Easy to make an error in hardcoded values
 You make just as many errors writing a testbench as actual code
 Hard to debug whether an issue is in the testbench or in the DUT

107
Self-Checking Testbench using Testvectors
 Write testvector file
 List of inputs and expected outputs
 Can create vectors manually or automatically using an
already verified, simpler “golden model” (more on this later)
 Example file:
$ cat testvectors.tv
000_1
001_0
010_0
011_0
100_1
101_1
110_0
111_0
Format: input_output

108
Testbench with Testvectors Design
 Use a “clock signal” for assigning inputs, reading outputs
 Test one testvector each “clock cycle”

[Figure: one clock cycle. Inputs are applied on the rising edge; outputs are checked on the falling edge.]
 Note: “clock signal” simply separates inputs from outputs
 Allows us to observe the inputs/outputs in waveform diagrams

 Not used for checking physical circuit timing (e.g., t_setup/t_hold)
 We’ll discuss circuit timing verification later in this lecture
109
Testbench Example (1/5): Signal Declarations
 Declare signals to hold internal state

module testbench3();
reg clk, reset; // clock and reset are internal
reg a, b, c, yexpected; // values from testvectors
wire y; // output of circuit
reg [31:0] vectornum, errors; // bookkeeping variables
reg [3:0] testvectors[10000:0];// array of testvectors

// instantiate device under test


sillyfunction dut(.a(a), .b(b), .c(c), .y(y) );

H&H Section 4.9, Example 4.39

110
Testbench Example (2/5): Clock Generation
// generate clock
always // no sensitivity list, so it always executes
begin
clk = 1; #5; clk = 0; #5; // 10ns period
end

111
Testbench Example (3/5): Read Testvectors into Array
// at start of test, load vectors and pulse reset
initial // executes only once
begin
$readmemb("example.tv", testvectors); // read vectors
vectornum = 0; errors = 0; // initialize
reset = 1; #27; reset = 0; // assert reset, wait 27ns, release
end

// Note: $readmemh reads testvector files written in hexadecimal

112
Testbench Example (4/5): Assign Inputs/Outputs
// apply test vectors on rising edge of clk
always @(posedge clk)
begin
{a, b, c, yexpected} = testvectors[vectornum];
end

 Apply {a, b, c} inputs on the rising edge of the clock

 Get yexpected for checking the output on the falling edge

 Rising/falling edges are chosen only by convention
 You can use any part of the clock signal
 Your H&H textbook uses this convention

113
Testbench Example (5/5): Check Outputs
always @(negedge clk)
begin
if (~reset) // don’t test during reset
begin
if (y !== yexpected)
begin
$display("Error: inputs = %b", {a, b, c});
$display(" outputs = %b (%b exp)",y,yexpected);
errors = errors + 1;
end

// increment array index and read next testvector


vectornum = vectornum + 1;

if (testvectors[vectornum] === 4'bx)


begin
$display("%d tests completed with %d errors",
vectornum, errors);
$finish; // End simulation
end
end
end

114
Self-Checking Testbench with Testvectors
 Pros:
 Still easy to design
 Still easy to test a few, specific inputs (e.g., corner cases)
 Simulator will print whenever an error occurs
 No need to change hardcoded values for different tests

 Cons:
 May be error-prone depending on source of testvectors
 More scalable, but still limited by reading a file
 Might have many more combinational paths to test than will fit in
memory

115
Automatic Testbench

116
Golden Models
 A golden model represents the ideal circuit behavior
 Must be developed, and might be difficult to write
 Can be done in C, Perl, Python, Matlab or even in Verilog

 For our example circuit:

module golden_model(input a, b, c,
output y);
assign y = ~b & ~c | a & ~b;// high-level abstraction
endmodule

 Simpler than our earlier gate-level description


 Golden model is usually easier to design and understand
 Golden model is much easier to verify

117
Automatic Testbench
 The DUT output is compared against the golden model

[Figure: testbench block diagram. Test Pattern Generation drives the same inputs into the DUT and the Golden Model; a Check Equality block compares their outputs.]

 Challenge: need to generate inputs to the designs


 Sequential values to cover the entire input space?
 Random values?
118
Automatic Testbench: Code
module testbench1();
... // variable declarations, clock, etc.

// instantiate device under test


sillyfunction dut (a, b, c, y_dut);
golden_model gold (a, b, c, y_gold);

// instantiate test pattern generator


test_pattern_generator tgen (a, b, c, clk);

// check if y_dut is ever not equal to y_gold


always @(negedge clk)
begin
if(y_dut !== y_gold)
$display(...)
end
endmodule
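The testbench above instantiates a `test_pattern_generator` without showing it. A minimal sketch of what such a module could look like follows; it is an assumption, not the lecture's implementation. It takes the port order from the instantiation (`a, b, c, clk`), assumes `clk` is generated by the testbench, and uses a hypothetical `RANDOM` parameter to switch between sequential and random patterns.

```verilog
module test_pattern_generator #(parameter RANDOM = 0)
                               (output reg a, b, c, input clk);
  // A single process owns {a, b, c}, so there is no ordering race
  // between initialization and the first clock edge.
  initial begin
    {a, b, c} = 3'b000;
    forever begin
      @(posedge clk);                  // new pattern on every rising edge
      if (RANDOM)
        {a, b, c} = $random;           // low 3 bits of a pseudo-random value
      else
        {a, b, c} = {a, b, c} + 1'b1;  // 000, 001, ..., 111, then wraps to 000
    end
  end
endmodule
```

Because this module drives `a`, `b`, and `c`, the testbench would have to declare them as `wire` rather than `reg`. Sequential patterns cover this 3-input space in 8 cycles; for wide inputs, random patterns are usually the only practical choice.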

119
Automatic Testbench
 Pros:
 Output checking is fully automated
 Could even compare timing using a golden timing model
 Highly scalable to as much simulation time as is feasible
 Leads to high coverage of the input space
 Better separation of roles
 Separate designers can work on the DUT and the golden model
 DUT testing engineer can focus on important test cases
instead of output checking

 Cons:
 Creating a correct golden model may be (very) difficult
 Coming up with good testing inputs may be difficult

120
However, Even with Automatic Testing…
 How long would it take to test a 32-bit adder?
 Such an adder has 64 input bits = 2^64 possible input combinations
 If you test one input in 1ns, you can test 10^9 inputs per second
 or 8.64 x 10^13 inputs per day (10^9 x 86,400 seconds)
 or 3.15 x 10^16 inputs per year
 we would still need about 585 years to test all possibilities

 Brute force testing is not feasible for most circuits!


 Need to prune the overall testing space
 E.g., formal verification methods, choosing ‘important cases’

 Verification is a hard problem
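One common way to prune the space is to combine a few hand-picked corner cases with a batch of random inputs. The sketch below illustrates this for a 32-bit adder; all names (`adder32`, `adder32_tb`, `check`) are hypothetical, the DUT body is a behavioral stand-in for a real design, and the choice of 1000 random vectors is arbitrary.

```verilog
// Stand-in DUT: replace with the real (e.g., gate-level) adder under test.
module adder32(input [31:0] a, b, output [31:0] s, output cout);
  assign {cout, s} = a + b;
endmodule

module adder32_tb();
  reg  [31:0] a, b;
  wire [31:0] s;
  wire        cout;
  reg  [32:0] expected;
  integer     i, errors;

  adder32 dut(.a(a), .b(b), .s(s), .cout(cout));

  // Apply one operand pair, wait, and compare against a 33-bit reference sum.
  task check(input [31:0] x, input [31:0] y);
    begin
      a = x; b = y; #10;
      expected = {1'b0, x} + {1'b0, y};
      if ({cout, s} !== expected) begin
        $display("Error: %h + %h = %h (%h expected)", x, y, {cout, s}, expected);
        errors = errors + 1;
      end
    end
  endtask

  initial begin
    errors = 0;
    // Corner cases: zero, carry rippling through all bits, largest operands.
    check(32'h00000000, 32'h00000000);
    check(32'hFFFFFFFF, 32'h00000001);
    check(32'hFFFFFFFF, 32'hFFFFFFFF);
    check(32'h7FFFFFFF, 32'h00000001);
    check(32'h80000000, 32'h80000000);
    // Random sample of the remaining input space.
    for (i = 0; i < 1000; i = i + 1)
      check($random, $random);
    $display("%0d errors", errors);
    $finish;
  end
endmodule
```

Passing such a test raises confidence but proves nothing about the untested inputs, which is where formal verification methods come in.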

121
Part 5:
Timing Verification

122
Timing Verification Approaches
 High-level simulation (e.g., C, Verilog)
 Can model timing using “#x” statements in the DUT
 Useful for hierarchical modeling
 Insert delays in FF’s, basic gates, memories, etc.
 High level design will have some notion of timing
 Usually not as accurate as real circuit timing

 Circuit-level timing verification


 Need to first synthesize your design to actual circuits
 No single general approach: very design-flow specific
 Your FPGA/ASIC/etc. technology has special tool(s) for this
 E.g., Xilinx Vivado (what you’re using in lab)
 E.g., Synopsys/Cadence Tools (for VLSI design)
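For the high-level simulation approach, the "#x" delays could look like the sketch below. The module names and the delay values (2 ns for the gate, 1 ns clock-to-Q) are made up for illustration; real values would come from the target technology.

```verilog
`timescale 1ns/1ps

// AND gate with an assumed 2 ns propagation delay.
module and2_delayed(input a, b, output y);
  assign #2 y = a & b;
endmodule

// D flip-flop with an assumed 1 ns clock-to-Q delay.
module dff_delayed(input clk, input d, output reg q);
  always @(posedge clk)
    q <= #1 d;  // d is sampled at the edge, q changes 1 ns later
endmodule
```

Such delays are for simulation only; synthesis tools ignore them, which is one reason this kind of model is usually less accurate than circuit-level timing analysis.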

123
The Good News
 Tools will try to meet timing for you!
 Setup times, hold times
 Clock skews
 …

 They usually provide a ‘timing report’ or ‘timing summary’


 Worst-case delay paths
 Maximum operation frequency
 Any timing errors that were found

124
The Bad News
 The tool can fail to find a solution
 Desired clock frequency is too aggressive
 Can result in setup time violation on a particularly long path
 Too much logic on clock paths
 Introduces excessive clock skew
 Timing issues with asynchronous logic

 The tool will provide (hopefully) helpful errors


 Reports will contain paths that failed to meet timing
 Gives a place from where to start debugging

 Q: How can we fix timing errors?

125
Meeting Timing Constraints
 Unfortunately, this is often a manual, iterative process
 Meeting strict timing constraints (e.g., high performance
designs) can be tedious

 Can try synthesis/place-and-route with different options


 Different random seeds
 Manually provided hints for place-and-route

 Can manually optimize the reported problem paths


 Simplify complicated logic
 Split up long combinational logic paths
 Recall: fix hold time violations by adding more logic!
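Splitting a long combinational path usually means inserting a register in the middle of it (pipelining). The following sketch uses a hypothetical multiply-add datapath; the module names and bit widths are assumptions chosen only to show the transformation.

```verilog
// Before: multiply and add sit on one combinational path between registers.
module mac_single(input             clk,
                  input      [15:0] a, b,
                  input      [31:0] c,
                  output reg [31:0] y);
  always @(posedge clk)
    y <= a * b + c;
endmodule

// After: a register between the multiplier and the adder splits the path in two.
module mac_pipelined(input             clk,
                     input      [15:0] a, b,
                     input      [31:0] c,
                     output reg [31:0] y);
  reg [31:0] prod, c_d;
  always @(posedge clk) begin
    prod <= a * b;       // stage 1: multiply
    c_d  <= c;           // delay c so it stays aligned with its product
    y    <= prod + c_d;  // stage 2: add
  end
endmodule
```

The pipelined version can run at a shorter clock period because each stage contains less logic, at the cost of one extra cycle of latency and extra registers.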

126
Meeting Timing Constraints: Principles
 Let’s go back to the fundamentals

 Clock cycle time is determined by the maximum logic delay we can accommodate without violating timing constraints

 Good design principles


 Critical path design: Minimize the maximum logic delay
 Maximizes performance
 Balanced design: Balance maximum logic delays across different
parts of a system (i.e., between different pairs of flip flops)
 No bottlenecks + minimizes wasted time
 Bread and butter design: Optimize for the common case, but
make sure non-common-cases do not overwhelm the design
 Maximizes performance for realistic cases
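The link between maximum logic delay and clock cycle time can be written out with the setup-time constraint from H&H Section 3.5. Here $t_{pcq}$ is the flip-flop clock-to-Q propagation delay, $t_{pd}$ is the propagation delay of the combinational logic between two flip-flops, and $t_{setup}$ is the setup time; clock skew is ignored. The numbers in the example are invented for illustration.

```latex
\[
  T_c \;\ge\; t_{pcq} + t_{pd} + t_{setup}
\]
% Example with assumed values:
% t_pcq = 50 ps, t_pd = 900 ps, t_setup = 50 ps
\[
  T_c \;\ge\; 50\,\text{ps} + 900\,\text{ps} + 50\,\text{ps} = 1000\,\text{ps}
  \quad\Longrightarrow\quad
  f_{max} = \frac{1}{T_c} = 1\,\text{GHz}
\]
```

Under these assumed values, shaving 100 ps off the critical path's $t_{pd}$ would allow $T_c \ge 900$ ps, which is why the critical path is the first place to optimize.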
127
Lecture Summary
 Timing in combinational circuits
 Propagation delay and contamination delay
 Glitches

 Timing in sequential circuits


 Setup time and hold time
 Determining how fast a circuit can operate

 Circuit Verification
 How to make sure a circuit works correctly
 Functional verification
 Timing verification

128
