Sunburst Design

Verilog Nonblocking
Assignments with Delays -
Myths & Mysteries
Clifford E. Cummings
Sunburst Design, Inc.
www.sunburst-design.com
2 of 67

Agenda

• IEEE 1364 reference model & event queue


• Review 8 Guidelines to avoid "death by Verilog!"
• 0-delay models - Nonblocking assignments happen first
• Inertial & transport delays
• Delay line modeling with transport delays
• Benchmark VCS simulations with and without #1 delays
• VCS switches: +nbaopt and +rad
• Multiple common clocks - are there race conditions?
• Mixing blocking & nonblocking assignments
• Mixed RTL & gate simulations
• Miscellaneous SDF notes & testbench tricks
• Flawed guidelines - better guidelines - Conclusions
3 of 67
IEEE Std 1364 - Section 5.4 (Confusing)

The Verilog simulation reference model

T refers to the current simulation time, and all events are held in the event
queue, ordered by simulation time.

    while (there are events) {
      if (no active events) {
        if (there are inactive events) {
          activate all inactive events;        // activate #0 events
        } else if (there are nonblocking assign update events) {
          // activate LHS of nonblocking assignments
          activate all nonblocking assign update events;
        } else if (there are monitor events) {
          activate all monitor events;         // activate $monitor & $strobe
        } else {
          // If no events left in the current time,
          // advance to the next simulation time
          advance T to the next event time;
          activate all inactive events for time T;
        }
      }
      E = any active event;                    // Execute the active events
      if (E is an update event) {
        update the modified object;
        add evaluation events for sensitive processes to event queue;
      } else { /* shall be an evaluation event */
        evaluate the process;
        add update events to the event queue;
      }
    }
4 of 67
IEEE Std 1364 - Section 5.4
Simplified Verilog simulation reference model

First: set T=0 / set nets=HiZ / set variables=X /
activate always blocks / activate initial blocks

    while (there are events) {
      if (there are active events) {
        E = any active event;                  // Execute all active events
        if (E is an update event) {
          update the modified object;
          // schedule newly triggered events
          add evaluation events for sensitive processes to event queue;
        }
        else { // this is an evaluation event, so ...
          evaluate the process;
          add update events to the event queue;
        }
      }
      // ... after all active events have executed ...
      // ... update LHS of nonblocking assignments
      else if (there are nonblocking update events) {
        activate all nonblocking update events;
      }
      // execute $monitor and $strobe commands before advancing time
      else {
        // Advance to the next simulation time and start over
        advance T to the next event time;
        activate all inactive events for time T;
      }
    }
5 of 67

IEEE1364-1995 Verilog
Stratified Event Queue

• Active Events (these events may be scheduled in any order)
  – Blocking assignments
  – Evaluate RHS of nonblocking assignments
  – Continuous assignments
  – $display command execution
  – Evaluate inputs and change outputs of primitives
• Inactive Events
  – #0 blocking assignments (*Guideline #8: do not use #0 delays)
• Nonblocking Events
  – Update LHS of nonblocking assignments
• Monitor Events
  – $monitor command execution
  – $strobe command execution
• Other specific PLI commands

* Guidelines on slide 7
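
The ordering of the queue regions can be observed with a few lines of code. The following is a minimal sketch, not taken from the slides: the module name event_regions_demo is hypothetical, and the #0 delay is used only to make the inactive region visible (Guideline #8 still applies to real designs).

```verilog
// Hypothetical demo module: shows in which region each statement observes "a".
module event_regions_demo;
  reg a;

  initial begin
    a = 0;                                    // blocking: a is updated in the active region
    a <= 1;                                   // NBA: RHS evaluated now, LHS updated in the nonblocking region
    $strobe ("monitor region:  a=%b", a);     // scheduled now, executes at the end of the time step
    $display("active region:   a=%b", a);     // executes immediately
    #0 $display("inactive region: a=%b", a);  // resumes after the active events, before the NBA update
  end
endmodule
```

Under the reference model above, both $display calls should report a=0, because the nonblocking update has not happened yet, and the $strobe should report a=1.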
6 of 67

IEEE1364-1995 Verilog
Stratified Event Queue

• Active Events (these events may be scheduled in any order)
  – Blocking assignments
  – Evaluate RHS of NBAs
  – Continuous assignments
• Nonblocking Events
  – Update LHS of nonblocking assignments
  – Can trigger nested events in the same time step:
• Active Events (these events may be scheduled in any order)
  – Blocking assignments
  – Evaluate RHS of NBAs
  – Continuous assignments
• Nonblocking Events
  – Update LHS of nonblocking assignments
• Monitor Events
  – $monitor command execution
  – $strobe command execution
7 of 67

8 Guidelines to avoid Coding Styles that Kill!

Follow these guidelines and remove 90-100% of the Verilog race conditions

• In general, following specific coding guidelines can eliminate Verilog race conditions:
Guideline #1: Sequential logic - use nonblocking assignments
Guideline #2: Latches - use nonblocking assignments
Guideline #3: Combinational logic in an always block - use blocking assignments
Guideline #4: Mixed sequential and combinational logic in the same always block - use nonblocking assignments
Guideline #5: Do not mix blocking and nonblocking assignments in the same always block
Guideline #6: Do not make assignments to the same variable from more than one always block
Guideline #7: Use $strobe to display values that have been assigned using nonblocking assignments
Guideline #8: Do not make #0 procedural assignments
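
Guidelines #1, #6 and #7 can be illustrated together with a two-stage shift register. This is a sketch under assumptions: the module names pipe2 and tb_pipe2, the 10-unit clock period, and the hierarchical reference u1.q1 are all chosen for illustration and do not come from the slides.

```verilog
// Hypothetical two-stage pipeline: each register is assigned from exactly
// one always block (Guideline #6) using nonblocking assignments (Guideline #1).
module pipe2 (output reg q2, input d, clk);
  reg q1;
  always @(posedge clk) q1 <= d;
  always @(posedge clk) q2 <= q1;   // samples the old q1, whichever block runs first
endmodule

module tb_pipe2;
  reg  d, clk;
  wire q2;

  pipe2 u1 (.q2(q2), .d(d), .clk(clk));

  initial begin
    clk = 0;
    forever #5 clk = ~clk;
  end

  initial begin
    d = 1;
    // Guideline #7: $strobe prints after the nonblocking updates of the time step
    repeat (3) @(posedge clk)
      $strobe("%0t d=%b q1=%b q2=%b", $time, d, u1.q1, q2);
    #1 $finish;
  end
endmodule
```

Because both registers use nonblocking assignments, q2 should follow d two clock edges later regardless of the order in which the simulator executes the two always blocks.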
8 of 67

Are There Exceptions to the Guidelines?
Probably!
• How to judge valid exceptions:
  – Does the exception make the simulation significantly faster?
  – Does the exception make the code more understandable?
  – Does the exception make the coding effort much easier?

Faster! ... More understandable! ... Easier!

• If not, the exception is probably not a good exception

9 of 67

For 0-Delay RTL Models
Nonblocking Assignments finish first!

• ??? - The Verilog event queue schedules blocking assignments before nonblocking assignments
• Using clk-edge-based simulation techniques ...
  – negedge clk: (1) blocking - a and b stimulus change; (2) blocking - d changes
  – posedge clk: (1) nonblocking - q changes; (2) blocking - y changes

[Figure: inputs a and b drive combinational logic producing d; a flip-flop (clk, rst_n) registers d to q; q drives combinational logic producing y]

All combinational outputs settle out immediately after the posedge clk
10 of 67

Simple sblk1 Example

    module sblk1 (
      output reg q2,
      input a, b, clk, rst_n);
      reg q1, d1, d2;

      // Combinational logic
      always @(a or b or q1) begin
        d1 = a & b;
        d2 = d1 | q1;
      end

      // 2 flip-flops
      always @(posedge clk or negedge rst_n)
        if (!rst_n) begin
          q2 <= 0;
          q1 <= 0;
        end
        else begin
          q2 <= d2;
          q1 <= d1;
        end
    endmodule

Simple testbench

    module tb;
      reg a, b, clk, rst_n;

      // Clock oscillator
      initial begin
        clk = 0;
        forever #10 clk = ~clk;
      end

      // Instance
      sblk1 u1 (.q2(q2), .a(a), .b(b),
                .clk(clk), .rst_n(rst_n));

      // Stimulus
      initial begin
        a = 0; b = 0;
        rst_n <= 0;
        @(posedge clk);
        @(negedge clk) rst_n = 1;
        a = 1; b = 1;
        @(negedge clk) a = 0;
        @(negedge clk) b = 0;
        @(negedge clk) $finish;
      end
    endmodule

[Figure: sblk1 schematic - a and b are ANDed to form d1; d1 is registered as q1; d1 ORed with q1 forms d2; d2 is registered as q2; both flip-flops use clk and rst_n]
11 of 67

sblk1 Input Stimulus & Input Combinational Logic Timing

External combinational inputs typically change on negedge clks

[Figure: sblk1 schematic and waveforms for rst_n, clk, a, b, d1, q1, d2, q2]
12 of 67

sblk1 Input Stimulus & Input Combinational Logic Event Scheduling

Blocking assignment (clk = ~clk; // testbench negedge clk)

• Active Events
  – Triggers testbench stimulus commands @negedge clk
  – Triggers combinational inputs on the device under test (combinational blocking assignments)
• Nonblocking Events
  – Empty (no nonblocking assignments to update)
• Monitor Events
  – $monitor command execution (if any)
  – $strobe command execution (if any)

Advance to next event
(should be a posedge clk blocking assignment: the blocking assignment in the testbench clock oscillator)
13 of 67

sblk1 Sequential (Clocked) Logic Timing

Sequential logic outputs change on posedge clks

[Figure: sblk1 schematic and waveforms for rst_n, clk, a, b, d1, q1, d2, q2]
14 of 67

sblk1 Other Combinational Logic Timing

Internal combinational outputs change on posedge clks (after sequential logic)

[Figure: sblk1 schematic and waveforms for rst_n, clk, a, b, d1, q1, d2, q2]
15 of 67

sblk1 Sequential & Combinational Logic Event Scheduling

Blocking assignment (clk = ~clk; // testbench posedge clk)

• Active Events
  – Triggers evaluation of RHS of sequential logic NBAs
• Nonblocking Events
  – Update LHS of sequential logic nonblocking assignments
  – Activate and execute NBA events
• Active Events
  – Triggers combinational logic blocking assignments (after the NBAs in the same time step)
• Nonblocking Events
  – Empty (no additional nonblocking assignments to update)
• Monitor Events
  – $monitor command execution (if any)
  – $strobe command execution (if any)

Advance to next event
  – negedge clk - triggers stimulus input events
  – posedge clk - triggers sequential logic events
16 of 67

Inertial & Transport Delays for gate-level simulations
(Reference material - see the paper for more details)

• Command line switches for gate-level simulations
  – Reject pulses shorter than x% of the propagation delay
        +pulse_r/x
  – Display unknowns (errors) for pulses greater than x% of the propagation delay but shorter than y% of the propagation delay
        +pulse_e/y
  – Enable transport delays for gate-level simulation
        +transport_path_delays
17 of 67

Simple Test Buffer with 5ns propagation delay
(Reference material - see the paper for more details)

Simple delay buffer model (delaybuf.v)

    `timescale 1ns/1ns

    module delaybuf
      (output y, input a);

      buf u1 (y, a);        // Verilog buffer primitive

      specify               // 5ns specify block delay from a-to-y
        (a*>y) = 5;
      endspecify
    endmodule

Simple testbench (tb.v)

    `timescale 1ns/1ns

    module tb;
      reg a;
      integer i;

      delaybuf i1 (.y(y), .a(a));   // instance

      initial begin                 // Stimulus
        a=0;
        #10 a=~a;
        for (i=1;i<7;i=i+1)
          #(i) a=~a;
        #20 $finish;
      end
    endmodule

[Figure: buffer from a to y with 5ns delay]
18 of 67

Inertial & Transport Delays commands & displays
(Reference material)

Pure inertial delays:

    vcs -RI +v2k tb.v delaybuf.v +pulse_r/100 +pulse_e/100

Pure transport delays:

    vcs -RI +v2k tb.v delaybuf.v +pulse_r/0 +pulse_e/0 +transport_path_delays

19 of 67

Error & Mixed Delays commands & displays
(Reference material)

Pure unknown (error) delays:

    vcs -RI +v2k tb.v delaybuf.v +pulse_r/0 +pulse_e/100

Mixed delays (r/40 & e/80):

    vcs -RI +v2k tb.v delaybuf.v +pulse_r/40 +pulse_e/80

• Pulses shorter than 40% of 5ns are filtered out
• Pulses between 40% & 80% of 5ns are passed as X's
• Pulses greater than 80% of 5ns are passed
20 of 67

Delay Line Model
Transport Delays

Both models use Verilog-2001 enhanced coding style

Delay line model with two output taps

    `timescale 1ns / 1ns

    module DL2
      (output reg y1, y2,
       input in);

      always @(in) begin
        y1 <= #25 in;
        y2 <= #40 in;
      end
    endmodule

Parameterized delay line model with two output taps

    `timescale 1ns / 1ns

    module DL2
      #(parameter TAP1 = 25,
                  TAP2 = 40)
      (output reg y1, y2,
       input in);

      always @(in) begin
        y1 <= #TAP1 in;
        y2 <= #TAP2 in;
      end
    endmodule

[Figure: DL2 block - in drives y1 through a 25ns delay and y2 through a 40ns delay]

• RHS nonblocking delays are transport delays
• These events are scheduled into future nonblocking assign update event queues
• NOTE: Synthesis tools ignore delays - cannot synthesize delay lines
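
To see the transport behavior, the first DL2 model can be driven with a pulse that is much narrower than either tap delay. The testbench below is a sketch under assumptions: the name tb_dl2 and the stimulus times are invented for illustration, and the DL2 model is repeated so the example is self-contained.

```verilog
`timescale 1ns / 1ns

// DL2 as shown on the slide, repeated here for completeness.
module DL2
  (output reg y1, y2,
   input in);

  always @(in) begin
    y1 <= #25 in;
    y2 <= #40 in;
  end
endmodule

// Hypothetical testbench: applies a 5ns pulse to the delay line.
module tb_dl2;
  reg  in;
  wire y1, y2;

  DL2 u1 (.y1(y1), .y2(y2), .in(in));

  initial begin
    $monitor("%0t in=%b y1=%b y2=%b", $time, in, y1, y2);
    in <= 0;        // nonblocking at time 0, so the always block is already waiting on @(in)
    #100 in = 1;    // pulse starts at 100ns
    #5   in = 0;    // pulse ends at 105ns (5ns wide)
    #100 $finish;
  end
endmodule
```

Since nonblocking assignments with RHS delays behave as transport delays, the 5ns pulse should not be filtered: it should appear on y1 from 125ns to 130ns and on y2 from 140ns to 145ns.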
21 of 67

Benchmark Circuit #1
20K bits of Sequential Logic

20 x 1000-bit registers

[Figure: pipeline of 20 registers, each 1000 bits wide: d -> qq1 -> qq2 -> ... -> qq18 -> qq19 -> q, all clocked by clk and reset by rst_n]
22 of 67

Benchmark Circuit #2
20K bits Sequential / 40K bits Combinational

20 x 1000-bit registers with inverted-inputs and inverted-outputs

[Figure: pipeline of 20 registers, each 1000 bits wide, with an inverter on each register input (d -> dd) and output (qq -> q): d -> qq1 -> qq2 -> ... -> qq19 -> q, all clocked by clk and reset by rst_n]
23 of 67
Benchmark Models
(with inverters and without inverters)

DFF models with no inverters

    module dff (q, d, clk, rst_n);
      parameter SIZE=100;
      output [SIZE-1:0] q;
      input  [SIZE-1:0] d;
      input  clk, rst_n;
      reg    [SIZE-1:0] q;

      always @(posedge clk or negedge rst_n)
        if (!rst_n) q <= 0;
        else        q <= d;
    endmodule

DFF models with inverters

    module dffi (q, d, clk, rst_n);
      parameter SIZE=100;
      output [SIZE-1:0] q;
      input  [SIZE-1:0] d;
      input  clk, rst_n;
      reg    [SIZE-1:0] qq;
      wire   [SIZE-1:0] dd;

      // Invert inputs to flip-flops and
      // invert outputs from flip-flops
      assign q  = ~qq;
      assign dd = ~d;

      always @(posedge clk or negedge rst_n)
        if (!rst_n) qq <= 0;
        else        qq <= dd;
    endmodule
24 of 67

Benchmark RTL Code
With and Without Delays

#1 Nonblocking assignments with no delays

    always @(posedge clk or negedge rst_n)
      if (!rst_n) q <= 0;
      else        q <= d;

#2 Nonblocking assignments with #1 delays

    always @(posedge clk or negedge rst_n)
      if (!rst_n) q <= #1 0;
      else        q <= #1 d;

#3 Blocking assignments with #1 delays (BAD)

    always @(posedge clk or negedge rst_n)
      if (!rst_n) q = #1 0;
      else        q = #1 d;

#4 Nonblocking assignments with macro-added #0 delays

    `define D #0
    always @(posedge clk or negedge rst_n)
      if (!rst_n) q <= `D 0;
      else        q <= `D d;

#5 Nonblocking assignments with macro-added no delays

    `define D
    always @(posedge clk or negedge rst_n)
      if (!rst_n) q <= `D 0;
      else        q <= `D d;
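
One way to switch between the macro-based variants without editing the source is to guard the definition of `D with a second macro. This is only a sketch: the macro name NBA_DELAY and the module name dff_d are hypothetical and do not appear in the slides.

```verilog
// NBA_DELAY is a hypothetical control macro: when it is defined at compile
// time (for example through a simulator's +define+ option), every `D expands
// to #1; otherwise `D expands to nothing and the model has no delays.
`ifdef NBA_DELAY
  `define D #1
`else
  `define D
`endif

module dff_d (q, d, clk, rst_n);
  parameter SIZE = 100;
  output [SIZE-1:0] q;
  input  [SIZE-1:0] d;
  input  clk, rst_n;
  reg    [SIZE-1:0] q;

  always @(posedge clk or negedge rst_n)
    if (!rst_n) q <= `D 0;
    else        q <= `D d;
endmodule
```

With this arrangement the same source file should cover both benchmark style #2 (macro defined) and style #5 (macro left undefined).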
25 of 67

Large Pipeline Benchmark Circuit
(top-level model)

Command line options to select different DFF models with or without delays:

    +define+DFF="dff"
    +define+DFF="dff1"
    +define+DFF="dff1b"
    +define+DFF="dff0"
    +define+DFF="dff_"

    module dffpipe (q, d, clk, rst_n);
      parameter SIZE=1000;
      output [SIZE-1:0] q;
      input  [SIZE-1:0] d;
      input  clk, rst_n;

      wire [SIZE-1:0] qq1, qq2, qq3, qq4, qq5, qq6, qq7, qq8, qq9;
      wire [SIZE-1:0] qq10, qq11, qq12, qq13, qq14, qq15, qq16, qq17, qq18, qq19;

      // 20 registers, 1000 bits each
      `DFF #(SIZE) u1  (.q( qq1), .d(   d), .clk(clk), .rst_n(rst_n));
      `DFF #(SIZE) u2  (.q( qq2), .d( qq1), .clk(clk), .rst_n(rst_n));
      `DFF #(SIZE) u3  (.q( qq3), .d( qq2), .clk(clk), .rst_n(rst_n));
      `DFF #(SIZE) u4  (.q( qq4), .d( qq3), .clk(clk), .rst_n(rst_n));
      ...
      `DFF #(SIZE) u17 (.q(qq17), .d(qq16), .clk(clk), .rst_n(rst_n));
      `DFF #(SIZE) u18 (.q(qq18), .d(qq17), .clk(clk), .rst_n(rst_n));
      `DFF #(SIZE) u19 (.q(qq19), .d(qq18), .clk(clk), .rst_n(rst_n));
      `DFF #(SIZE) u20 (.q(   q), .d(qq19), .clk(clk), .rst_n(rst_n));
    endmodule
26 of 67

Benchmark Results - Circuit #1
Pipeline with no inverters
Linux Laptop

IBM ThinkPad T21, Pentium III-850MHz, 384MB RAM, Redhat Linux 6.2
VCS Version 6.2 - Simulation ended at Time: 800002150 ns

DFF pipeline (no inverters)                                  CPU Time (seconds)   Speed compared to no-delay model
No delays                                                    292.920              Baseline no-delay model
Nonblocking #1 delays ( <= #1 )                              376.460              29% slower
Blocking #1 delays ( = #1 NOT RECOMMENDED)                   358.240              22% slower
Nonblocking #0 delays ( <= `D and `define D #0 )             307.630              5% slower
Nonblocking blank delays ( <= `D and `define D <no_value> )  292.880              ~same speed
27 of 67

Benchmark Results - Circuit #1
Pipeline with no inverters
SUN Workstation

SUN Ultra 80, UltraSPARC-II 450MHz, 1GB RAM, Solaris 8
VCS Version 6.2 - Simulation ended at Time: 800002150 ns

DFF pipeline (no inverters)                                  CPU Time (seconds)   Speed compared to no-delay model
No delays                                                    438.090              Baseline no-delay model
Nonblocking #1 delays ( <= #1 )                              839.270              92% slower
Blocking #1 delays ( = #1 NOT RECOMMENDED)                   548.110              25% slower
Nonblocking #0 delays ( <= `D and `define D #0 )             447.770              2% slower
Nonblocking blank delays ( <= `D and `define D <no_value> )  437.960              ~same speed
28 of 67

Benchmark Results - Circuit #2
Pipeline with inverters
Linux Laptop

IBM ThinkPad T21, Pentium III-850MHz, 384MB RAM, Redhat Linux 6.2
VCS Version 6.2 - Simulation ended at Time: 800002150 ns

DFF pipeline with inverters                                  CPU Time (seconds)   Speed compared to no-delay model
No delays                                                    390.140              Baseline no-delay model
Nonblocking #1 delays ( <= #1 )                              462.230              18% slower
Blocking #1 delays ( = #1 NOT RECOMMENDED)                   458.750              18% slower
Nonblocking #0 delays ( <= `D and `define D #0 )             390.320              ~same speed
Nonblocking blank delays ( <= `D and `define D <no_value> )  390.630              ~same speed
29 of 67

Benchmark Results - Circuit #2
Pipeline with inverters
SUN Workstation

SUN Ultra 80, UltraSPARC-II 450MHz, 1GB RAM, Solaris 8
VCS Version 6.2 - Simulation ended at Time: 800002150 ns

DFF pipeline with inverters                                  CPU Time (seconds)   Speed compared to no-delay model
No delays                                                    668.170              Baseline no-delay model
Nonblocking #1 delays ( <= #1 )                              1,112.130            66% slower
Blocking #1 delays ( = #1 NOT RECOMMENDED)                   777.440              16% slower
Nonblocking #0 delays ( <= `D and `define D #0 )             744.160              11% slower
Nonblocking blank delays ( <= `D and `define D <no_value> )  673.95               1% slower
30 of 67

Command Line Switches
+nbaopt and +rad

• VCS command line switch to remove delays from the RHS of nonblocking assignments
      +nbaopt
  – +nbaopt can be used to remove #1 delays

• VCS command line switch to improve overall simulation performance
      +rad
  – +rad is actually a family of optimizations that will make improvements to non-timing designs
  – Speeds up logic and event propagation
  – +rad does not affect delay scheduling
  – +rad is not just for cycle-based simulations

Synopsys reports some designs achieve large speedups with +rad
(typically the uglier the code, the larger the speedup)
31 of 67

Benchmark Results - Circuit #1
With +nbaopt Command Switch

Linux Laptop
IBM ThinkPad T21, Pentium III-850MHz, 384MB RAM, Redhat Linux 6.2
VCS Version 6.2 - including the +nbaopt command switch

DFF pipeline (no inverters)        CPU Time (seconds)   Speed compared to no-delay model
No delays                          293.770              Baseline no-delay model
Nonblocking #1 delays ( <= #1 )    311.070              6% slower

SUN Workstation
SUN Ultra 80, UltraSPARC-II 450MHz, 1GB RAM, Solaris 8
VCS Version 6.2 - including the +nbaopt command switch

DFF pipeline (no inverters)        CPU Time (seconds)   Speed compared to no-delay model
No delays                          439.000              Baseline no-delay model
Nonblocking #1 delays ( <= #1 )    448.630              2% slower
32 of 67

Benchmark Results - Circuit #1
With +rad Command Switch
Linux Laptop

IBM ThinkPad T21, Pentium III-850MHz, 384MB RAM, Redhat Linux 6.2
VCS Version 6.2 - (using +rad switch)

DFF pipeline (no inverters)                                  CPU Time (seconds)   Speed compared to no-delay model
No delays                                                    233.540              Baseline no-delay model
Nonblocking #1 delays ( <= #1 )                              293.250              26% slower
Blocking #1 delays ( = #1 NOT RECOMMENDED)                   289.940              24% slower
Nonblocking #0 delays ( <= `D and `define D #0 )             229.290              2% faster
Nonblocking blank delays ( <= `D and `define D <no_value> )  233.100              ~same speed
33 of 67

Benchmark Results - Circuit #2
With +rad Command Switch
Linux Laptop

IBM ThinkPad T21, Pentium III-850MHz, 384MB RAM, Redhat Linux 6.2
VCS Version 6.2 - (using +rad switch)

DFF pipeline with inverters                                  CPU Time (seconds)   Speed compared to no-delay model
No delays                                                    233.710              Baseline no-delay model
Nonblocking #1 delays ( <= #1 )                              294.480              25% slower
Blocking #1 delays ( = #1 NOT RECOMMENDED)                   288.910              23% slower
Nonblocking #0 delays ( <= `D and `define D #0 )             228.410              3% faster
Nonblocking blank delays ( <= `D and `define D <no_value> )  234.510              2% faster
34 of 67

Multiple Common Clocks & Race Conditions

    module blk2_2 (
      output reg q2,
      input a, b,
      input clk1a, clk1b, rst_n);

      reg q1;

      // Combinational assignments
      wire d1 = a & b;
      wire d2 = q1 | d1;

      // clk1a block (q1 register)
      always @(posedge clk1a or negedge rst_n)
        if (!rst_n) q1 <= 0;
        else        q1 <= d1;

      // clk1b block (registered q2 output)
      always @(posedge clk1b or negedge rst_n)
        if (!rst_n) q2 <= 0;
        else        q2 <= d2;
    endmodule

[Figure: a and b are ANDed to form d1; d1 is registered by clk1a as q1; q1 ORed with d1 forms d2; d2 is registered by clk1b as q2; both flip-flops are reset by rst_n]
35 of 67

Multiple Common Clocks

Sequential logic outputs change on posedge clks

If there is no skew between clk1a and clk1b and ...

    always @(clk1a)
      clk1b <= clk1a;

No race conditions!

[Figure: blk2_2 schematic and waveforms for rst_n, clk1a, clk1b, a, b, d1, q1, d2, q2]
36 of 67

Do Not Mix Assignments!
(This guideline is often challenged)

Guideline #5: Do not mix blocking and nonblocking assignments in the same always block

• This is a VHDL-like coding style in Verilog
• No simulation advantage

• Reasons to avoid this coding style
  – Understanding event scheduling can be confusing
  – Mis-ordering statements -or- multiple NBAs will cause problems
  – Inputs, outputs and clocks all change simultaneously in a waves display
37 of 67

Mixed Assignment Example #1

    module blk1a (
      output reg q,
      output y,
      input a, b, c,
      input clk, rst_n);

      // Named block to permit local declarations
      always @(posedge clk or negedge rst_n) begin: logic
        reg d;                 // combinational intermediate signal
        if (!rst_n) q <= 0;
        else begin
          d = a & b;
          q <= d;              // clocked output signal
        end
      end

      assign y = q & c;
    endmodule

[Figure: a and b are ANDed to form d; d is registered (clk, rst_n) as q; q ANDed with c forms y]
38 of 67

Mixed Assignment Example #1
(synthesis result)

Synthesizes okay

[Figure: synthesized logic - a and b are ANDed to form d; d is registered (clk, rst_n) as q; q ANDed with c forms y]
39 of 67

Yucky Waveform Display!

• The combinational d-signal does not update when the a and b inputs go high
• clk changes ... d input changes ... the q output changes at the same time !!

[Figure: blk1a schematic and waveforms for rst_n, clk1b, a, b, c, d, q, y]
40 of 67

Mixed Assignment Example #1
(with x-assignments)

    module blk1a (
      output reg q,
      output y,
      input a, b, c,
      input clk, rst_n);

      // Named block to permit local declarations
      always @(posedge clk or negedge rst_n) begin: logic
        reg d;                 // combinational intermediate signal
        if (!rst_n) q <= 0;
        else begin
          d = a & b;
          q <= d;
          d = 1'bx;            // extra X-assignment to avoid waveform confusion (???)
        end
      end

      assign y = q & c;
    endmodule

[Figure: a and b are ANDed to form d; d is registered (clk, rst_n) as q; q ANDed with c forms y]
41 of 67

Is This Really Much Better??

    begin: logic
      reg d;
      if (!rst_n) q <= 0;
      else begin
        d = a & b;
        q <= d;
        d = 1'bx;
      end
    end

• clk changes ... d input changes ... the q output changes
• d input always appears to be unknown

[Figure: blk1a schematic and waveforms for rst_n, clk1b, a, b, c, d, q, y]
42 of 67

Mixed Assignment Example #2

    module blk2a (
      output reg q, q2,
      output y,
      input a, b, c,
      input clk, rst_n);

      reg d;                   // signal d declared at the module level

      always @(posedge clk or negedge rst_n)
        if (!rst_n) q <= 0;
        else begin
          d = a & b;           // combinational intermediate signal
          q <= d;
        end

      assign y = q & c;

      always @(d) q2 = d;      // buffered d-output
    endmodule

[Figure: desired logic - a and b are ANDed to form d; d is registered (clk, rst_n) as q; q ANDed with c forms y; d is buffered to q2]
43 of 67

Mixed Assignment Example #2
(synthesis result)

Oops! q2 is now a registered output
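
One way to obtain the desired logic while staying within Guideline #5 is to move the combinational signal out of the clocked always block. The following is a sketch, not taken from the slides; the module name blk2b is hypothetical.

```verilog
// Hypothetical rewrite of blk2a: d is a purely combinational net, so the
// clocked always block contains only nonblocking assignments.
module blk2b (
  output reg q,
  output     q2, y,
  input      a, b, c,
  input      clk, rst_n);

  wire d = a & b;            // combinational intermediate signal

  always @(posedge clk or negedge rst_n)
    if (!rst_n) q <= 0;
    else        q <= d;

  assign y  = q & c;
  assign q2 = d;             // buffered d-output, combinational in simulation
endmodule
```

Here d updates whenever a or b changes rather than only on clock edges, so it also displays as a normal combinational signal in a waveform viewer.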
44 of 67

Mixed RTL & Gate-Level Simulations

• Are there any problems with mixed RTL and gate-level simulations?

Partitioned ASIC design (3 RTL partitions), all clocked by clk and reset by rst_n:

    mod1.v (inputs a1, a2)    mod2.v (inputs b1, b2)    mod3.v (inputs c1, c2)
    a1a <= a1;                b1a <= b1;                c1a <= c1;
    a2a <= a2;                b2a <= b2;                c2a <= c2;
    a1b <= ...;               b1b <= ...;               c1b <= ...;
    a2b <= ...;               b2b <= ...;               c2b <= ...;
    b1  <= ...;               c1  <= ...;               d1  <= ...;
    b2  <= ...;               c2  <= ...;               d2  <= ...;
45 of 67

Mixed RTL & Gate-Level Simulations
Where are the problems?

Guideline: Add a #1 (or more) delay to RTL statements that drive gate-level models

    always @(posedge clk or negedge rst_n)
      if (!rst_n)...; // reset regs
      else begin
        a1a <= a1;
        a2a <= a2;
        a1b <= ...;
        a2b <= ...;
        b1  <= #1 ...;
        b2  <= #1 ...;
      end

Partitioned design, all clocked by clk and reset by rst_n:

    mod1.v (RTL model)        mod2.vg (gates model)     mod3.v (RTL model)
    a1a <= a1;                Tsetup = 1.3ns            c1a <= c1;
    a2a <= a2;                Thold  = 0.6ns            c2a <= c2;
    a1b <= ...;                                         c1b <= ...;
    a2b <= ...;                                         c2b <= ...;
    b1  <= #1 ...;                                      d1  <= ...;
    b2  <= #1 ...;                                      d2  <= ...;
46 of 67

RTL Simulations
(With Multiple Balanced Clock Sources)

Multiple clock drivers (no nonblocking assignments and no skew): clk1 drives clk1a, clk1b and clk1c

    mod1.v (clk1a)            mod2.v (clk1b)            mod3.v (clk1c)
    a1a <= a1;                b1a <= b1;                c1a <= c1;
    a2a <= a2;                b2a <= b2;                c2a <= c2;
    a1b <= ...;               b1b <= ...;               c1b <= ...;
    a2b <= ...;               b2b <= ...;               c2b <= ...;
    b1  <= ...;               c1  <= ...;               d1  <= ...;
    b2  <= ...;               c2  <= ...;               d2  <= ...;

No simulation problems
47 of 67

RTL Simulations
(With Instantiated PLL Clock Source)

Instantiated clock driver (PLL) with skewed clock outputs: clk1a (#0.1 skew), clk1b (#0.0), clk1c (#0.2)

    mod1.v (clk1a)            mod2.v (clk1b)            mod3.v (clk1c)
    a1a <= a1;                b1a <= b1;                c1a <= c1;
    a2a <= a2;                b2a <= b2;                c2a <= c2;
    a1b <= ...;               b1b <= ...;               c1b <= ...;
    a2b <= ...;               b2b <= ...;               c2b <= ...;
    b1  <= #1 ...;            c1  <= #1 ...;            d1  <= #1 ...;
    b2  <= #1 ...;            c2  <= #1 ...;            d2  <= #1 ...;

Must add delays to RTL outputs
48 of 67

Gate-Level Simulations
(With Instantiated PLL Clock Source)

Instantiated clock driver (PLL) with skewed clock outputs: clk1a (#0.1 skew), clk1b (#0.0), clk1c (#0.2)

[Figure: gate-level mod1.v, mod2.v and mod3.v connected a1/a2 -> b1/b2 -> c1/c2 -> d1/d2, clocked by clk1a, clk1b and clk1c and reset by rst_n]

Gate-level models have intrinsic delays (no problems)
49 of 67

Problem - Vendors That Use Blocking Assignments

• What happens when an incorrectly-coded vendor model interacts with a correctly-coded RTL design?
  – 1st Examine: Bad-vendor1 model driving a good RTL model
  – 2nd Examine: Good RTL model driving a bad vendor2 model

    vendor1_b1                  myrtl_nb1                   vendor2_b1
    Bad vendor1 coding style    Good RTL model coding style Bad vendor2 coding style
    a -> blocking assignments -> b -> nonblocking assignments -> c -> blocking assignments -> d

All three models share clk and rst_n.
50 of 67
Vendor Model Drives Correct RTL Model w/ Blocking Assignments

Error in vendor1 coding style

    module vendor1_b0 (
      output reg b,
      input a, clk, rst_n);

      always @(posedge clk or negedge rst_n)
        if (!rst_n) b = 0;
        else        b = a;
    endmodule

    `timescale 1ns / 1ns
    module vendor1_b1 (
      output reg b,
      input a, clk, rst_n);

      always @(posedge clk or negedge rst_n)
        if (!rst_n) b = #1 0;
        else        b = #1 a;
    endmodule

[Figure: a -> vendor1_b1 (b = #1 a) -> b -> myrtl_nb1 (c <= #1 b) -> c -> vendor2_b1 (d = #1 c) -> d; shared clk and rst_n]
51 of 67
Correct RTL Model w/ Nonblocking Assignments

Correct sequential coding style

    module myrtl_nb0 (
      output reg c,
      input b, clk, rst_n);

      always @(posedge clk or negedge rst_n)
        if (!rst_n) c <= 0;
        else        c <= b;
    endmodule

    `timescale 1ns / 1ns
    module myrtl_nb1 (
      output reg c,
      input b, clk, rst_n);

      always @(posedge clk or negedge rst_n)
        if (!rst_n) c <= #1 0;
        else        c <= #1 b;
    endmodule

[Figure: a -> vendor1_b1 (b = #1 a) -> b -> myrtl_nb1 (c <= #1 b) -> c -> vendor2_b1 (d = #1 c) -> d; shared clk and rst_n]
52 of 67
Correct RTL Model Drives Vendor Model w/ Blocking Assignments

Error in vendor2 coding style

    module vendor2_b0 (
      output reg d,
      input c, clk, rst_n);

      always @(posedge clk or negedge rst_n)
        if (!rst_n) d = 0;
        else        d = c;
    endmodule

    `timescale 1ns / 1ns
    module vendor2_b1 (
      output reg d,
      input c, clk, rst_n);

      always @(posedge clk or negedge rst_n)
        if (!rst_n) d = #1 0;
        else        d = #1 c;
    endmodule

[Figure: a -> vendor1_b1 (b = #1 a) -> b -> myrtl_nb1 (c <= #1 b) -> c -> vendor2_b1 (d = #1 c) -> d; shared clk and rst_n]
53 of 67

Bad-Vendor1 Driving Good-RTL
(Equivalent code after compiled & flattened designs)

Scenario #1 & Scenario #2 both have potential race conditions

Scenario #1: vendor1_b0 (b = a) drives myrtl_nb0 (c <= b)

    always @(posedge clk ...)
    ... begin
      b = a;             // Fails
      c <= b; ...

    always @(posedge clk ...)
    ... begin
      c <= b;            // Passes
      b = a; ...

Scenario #2: vendor1_b1 (b = #1 a) drives myrtl_nb0 (c <= b)

    always @(posedge clk ...)
    ... begin
      b = #1 a;          // Fails
      c <= b; ...

    always @(posedge clk ...)
    ... begin
      c <= b;            // Passes
      b = #1 a; ...
54 of 67

Bad-Vendor1 Driving Good-RTL
(Equivalent code after compiled & flattened designs)

Scenario #3 & Scenario #4 also both have potential race conditions

Scenario #3: vendor1_b0 (b = a) drives myrtl_nb1 (c <= #1 b)

    always @(posedge clk ...)
    ... begin
      b = a;             // Fails
      c <= #1 b; ...

    always @(posedge clk ...)
    ... begin
      c <= #1 b;         // Passes
      b = a; ...

Scenario #4: vendor1_b1 (b = #1 a) drives myrtl_nb1 (c <= #1 b)

    always @(posedge clk ...)
    ... begin
      b = #1 a;          // Fails
      c <= #1 b; ...

    always @(posedge clk ...)
    ... begin
      c <= #1 b;         // Passes
      b = #1 a; ...
55 of 67

Good-RTL Driving Bad-Vendor2
(Equivalent code after compiled & flattened designs)

Scenario #5 & Scenario #6 both simulate with no race conditions

Scenario #5: myrtl_nb0 (c <= b) drives vendor2_b0 (d = c)

    always @(posedge clk ...)
    ... begin
      c <= b;            // Passes
      d = c; ...

    always @(posedge clk ...)
    ... begin
      d = c;             // Passes
      c <= b; ...

Scenario #6: myrtl_nb0 (c <= b) drives vendor2_b1 (d = #1 c)

    always @(posedge clk ...)
    ... begin
      c <= b;            // Passes
      d = #1 c; ...

    always @(posedge clk ...)
    ... begin
      d = #1 c;          // Passes
      c <= b; ...
56 of 67

Good-RTL Driving Bad-Vendor2
(Equivalent code after compiled & flattened designs)

Scenario #7 & Scenario #8 both simulate with no race conditions

Scenario #7: myrtl_nb1 (c <= #1 b) drives vendor2_b0 (d = c)

    always @(posedge clk ...)
    ... begin
      c <= #1 b;         // Passes
      d = c; ...

    always @(posedge clk ...)
    ... begin
      d = c;             // Passes
      c <= #1 b; ...

Scenario #8: myrtl_nb1 (c <= #1 b) drives vendor2_b1 (d = #1 c)

    always @(posedge clk ...)
    ... begin
      c <= #1 b;         // Passes
      d = #1 c; ...

    always @(posedge clk ...)
    ... begin
      d = #1 c;          // Passes
      c <= #1 b; ...
57 of 67

Urban Legend & Verilog Folklore

• "Adding #1 to my nonblocking assignments makes up for vendor coding problems"
  Nope! ... Sorry!

Vendor1 Model   RTL Model    Vendor2 Model   Race Condition?
b = a           c <= b                       potential race condition
b = #1 a        c <= b                       potential race condition
b = a           c <= #1 b                    potential race condition
b = #1 a        c <= #1 b                    potential race condition
                c <= b       d = c           NO race condition
                c <= b       d = #1 c        NO race condition
                c <= #1 b    d = c           NO race condition
                c <= #1 b    d = #1 c        NO race condition
58 of 67

Benchmark Circuit #3
Only add delays to I/O registers

20 x 1000-bit registers
• Added #1 delays to iof models (first and last registers)
• No delays on dff models (internal registers)

[Figure: pipeline of 20 registers, each 1000 bits wide: d -> qq1 -> qq2 -> ... -> qq18 -> qq19 -> q, all clocked by clk and reset by rst_n]
59 of 67

Benchmark Results - Circuit #3
With delays only on the I/O flip-flops

Linux Laptop
IBM ThinkPad T21, Pentium III-850MHz, 384MB RAM, Redhat Linux 6.2
VCS Version 6.2 - #1 delays only added to the 2,000 I/O flip-flops

Model                                                    CPU Time (seconds)   Speed compared to no-delay model
No delays                                                292.920              Baseline no-delay model
Nonblocking #1 delays ( <= #1 )                          376.460              29% slower
Nonblocking #1 delays only on the 2,000 I/O flip-flops   375.710              28% slower

SUN Workstation
SUN Ultra 80, UltraSPARC-II 450MHz, 1GB RAM, Solaris 8
VCS Version 6.2 - #1 delays only added to the 2,000 I/O flip-flops

Model                                                    CPU Time (seconds)   Speed compared to no-delay model
No delays                                                438.090              Baseline no-delay model
Nonblocking #1 delays ( <= #1 )                          839.270              92% slower
Nonblocking #1 delays only on the 2,000 I/O flip-flops   833.720              90% slower
60 of 67

Gate Simulations With SDF Delays
(Reference material - see the paper for more details)

• Why run gate-level simulations with SDF timing delays?
  Isn't it good enough to do
  (1) functional simulations,
  (2) static timing analysis (STA), and
  (3) equivalence check the gates model to the RTL model?

  – Full system simulation
  – Equivalence checking software costs money
  – Final regression simulations with SDF timing delays verify STA and equivalence checked models
61 of 67

Testbench Tricks
Resets

Race condition

    initial begin
      rst_n = 0;
      ...
    end

    always @(posedge clk or negedge rst_n)
      ...

No race condition

    initial begin
      rst_n <= 0;
      ...
    end

    always @(posedge clk or negedge rst_n)
      ...
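
A complete, runnable version of the no-race reset can look like the sketch below. The module name tb_reset, the single flip-flop q and the 10-unit clock period are assumptions made for illustration; only the rst_n <= 0 idiom comes from the slide.

```verilog
// Hypothetical testbench: shows that a nonblocking reset at time 0 is
// always seen by the flip-flop's always block.
module tb_reset;
  reg clk, rst_n, q;

  initial begin
    clk = 0;
    forever #5 clk = ~clk;
  end

  // Device under test: one flip-flop with asynchronous active-low reset
  always @(posedge clk or negedge rst_n)
    if (!rst_n) q <= 0;
    else        q <= 1;

  initial begin
    rst_n <= 0;    // NBA: rst_n changes only after every always block is waiting
    $strobe("%0t rst_n=%b q=%b", $time, rst_n, q);
    @(posedge clk);
    @(negedge clk) rst_n = 1;   // release reset away from the active clock edge
    #20 $finish;
  end
endmodule
```

Because the update of rst_n is deferred to the nonblocking region, the negedge rst_n event should reach the always block regardless of the order in which the simulator starts the procedural blocks, so the $strobe at time 0 should report rst_n=0 and q=0.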
62 of 67

Testbench Tricks
Clock Oscillator

Common clock oscillator (clk=0 at time 0)

    `define cycle 10
    ...
    initial begin
      clk = 0;
      forever #(`cycle/2) clk = ~clk;
    end

clk=1 at time 0 - No race condition

    `define cycle 10
    ...
    initial begin
      clk <= 1;
      forever #(`cycle/2) clk = ~clk;
    end
63 of 67

Testbench Tricks
Change stimulus on clock edges

If the clock period changes, the testbench still stays the same

Simple testbench

    module tb;
      reg a, b, clk, rst_n;

      // Free-running clock oscillator
      initial begin
        clk = 0;
        forever #10 clk = ~clk;
      end

      sblk1 u1 (.q2(q2), .a(a), .b(b),
                .clk(clk), .rst_n(rst_n));

      initial begin
        a = 0; b = 0;               // Initialize a & b
        rst_n <= 0;                 // Reset at time 0 for one clock cycle
        @(posedge clk);
        @(negedge clk) rst_n = 1;   // (negedge clk) release reset, set a and b
        a = 1; b = 1;
        @(negedge clk) a = 0;       // (negedge clk) change a
        @(negedge clk) b = 0;       // (negedge clk) change b
        @(negedge clk) $finish;     // (negedge clk) finish simulation
      end
    endmodule
64 of 67

The Bergeron Guidelines
Four flawed guidelines

• Four "Guidelines for Avoiding Race Conditions:"

  – If a register is declared outside of the always or initial block, assign to it using a nonblocking assignment. Reserve the blocking assignment for registers local to the block

    Better - Guideline #5: Do not mix blocking and nonblocking assignments in the same always block

  – Assign to a register from a single always or initial block

    Better - Guideline #6: Do not make assignments to the same variable from more than one always block
65 of 67

The Bergeron Guidelines
Four flawed guidelines (cont.)

• Four "Guidelines for Avoiding Race Conditions:" (cont.)

  – Use continuous assignments to drive inout pins only. Do not use them to model internal combinational functions. Prefer sequential code instead

    Procedural blocks are more prone to Verilog race conditions
    For simple Boolean expressions, use continuous assignments
    To group assignments or to include case, if-else, for-loops, use procedural blocks

  – Do not assign any value at time 0

    Initialize everything at time 0
    Use testbench nonblocking assignment tricks to avoid race conditions
66 of 67

8 Important Guidelines

• In general, following specific coding guidelines can eliminate Verilog race conditions:
Guideline #1: Sequential logic - use nonblocking assignments
Guideline #2: Latches - use nonblocking assignments
Guideline #3: Combinational logic in an always block - use blocking assignments
Guideline #4: Mixed sequential and combinational logic in the same always block - use nonblocking assignments
Guideline #5: Do not mix blocking and nonblocking assignments in the same always block
Guideline #6: Do not make assignments to the same variable from more than one always block
Guideline #7: Use $strobe to display values that have been assigned using nonblocking assignments
Guideline #8: Do not make #0 procedural assignments
67 of 67

Conclusions

• Follow the 8 important coding guidelines
• Do not mix blocking and nonblocking assignments
• Either code NBAs with no delays or with macro #1 delays

  No delays

    always @(posedge clk or negedge rst_n)
      if (!rst_n) q <= 0;
      else        q <= d;

  -or-

    `define D #1
    `define D
    always @(posedge clk or negedge rst_n)
      if (!rst_n) q <= `D 0;
      else        q <= `D d;

  Simulations run up to 100% faster without #1 delays

• Mixed RTL & gates simulations require RTL-output delays
• Remember +nbaopt and +rad for faster VCS simulation
• Request: please give us a +nba1 switch!
