00 Cummings Slidesf
Verilog Nonblocking
Assignments with Delays -
Myths & Mysteries
Clifford E. Cummings
Sunburst Design, Inc.
[email protected]
www.sunburst-design.com
Nonblocking Events: Update LHS of nonblocking assignments
Guideline #4: Mixed sequential and combinational logic in the same always block - use nonblocking assignments
Guideline #5: Do not mix blocking and nonblocking assignments in the same always block
Guideline #6: Do not make assignments to the same variable from more than one always block
Guideline #7: Use $strobe to display values that have been assigned using nonblocking assignments
Guideline #8: Do not make #0 procedural assignments
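Guideline #7 can be illustrated with a small sketch. The module name strobe_demo and the signal a are hypothetical and do not come from the slides. The sketch relies only on standard Verilog scheduling: $display runs with the active events, before nonblocking updates, while $strobe runs at the end of the time step, after them.

```verilog
`timescale 1ns / 1ns
// Hypothetical demo module (not from the slides): compares $display and
// $strobe after a nonblocking assignment made in the same time step.
module strobe_demo;
  reg a;

  initial begin
    a = 1'b0;                          // blocking: a is 0 immediately
    a <= 1'b1;                         // nonblocking: update is scheduled for later in this time step
    $display("$display: a = %b", a);   // executes before the nonblocking update
    $strobe ("$strobe : a = %b", a);   // executes after all updates in this time step
    #1 $finish;
  end
endmodule
```

With this ordering, $display should report a = 0 (the value before the nonblocking update) and $strobe should report a = 1.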
[Figure: inputs a and b drive combinational logic feeding flip-flop input d; flip-flop output q drives combinational logic producing y; clk and rst_n drive the flip-flop]
All combinational outputs settle out immediately after the posedge clk
[Figure: schematic and waveforms for rst_n, clk, a, b, d1, q1, d2, q2]
Nonblocking Events: Empty (no nonblocking assignments to update)
sblk1 Sequential (Clocked) Logic Timing
Sequential logic outputs change on posedge clks
[Figure: schematic and waveforms for rst_n, clk, a, b, d1, q1, d2, q2]
sblk1 Other Combinational Logic Timing
Internal combinational outputs change on posedge clks (after sequential logic)
[Figure: schematic and waveforms for rst_n, clk, a, b, d1, q1, d2, q2]
Nonblocking Events: Update LHS of sequential logic nonblocking assignments
Nonblocking Events: Empty (no additional nonblocking assignments to update)
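The event ordering described for sblk1 can be followed on a small model. The following is a minimal sketch that matches the signal names in the sblk1 figures (a, b, d1, d2, q1, q2). The logic functions d1 = a & b and d2 = d1 | q1 are assumptions, and the slides' own source for sblk1 is not reproduced in this text.

```verilog
// A minimal sketch matching the signal names on the sblk1 slides.
// ASSUMPTION: d1 = a & b and d2 = d1 | q1; the original module body is not
// part of the extracted slide text.
module sblk1 (
  output reg q2,
  input      a, b, clk, rst_n);

  reg q1, d1, d2;

  // Combinational logic: blocking assignments, complete sensitivity list
  always @(a or b or q1) begin
    d1 = a & b;
    d2 = d1 | q1;
  end

  // Sequential logic: nonblocking assignments
  always @(posedge clk or negedge rst_n)
    if (!rst_n) begin
      q2 <= 1'b0;
      q1 <= 1'b0;
    end
    else begin
      q2 <= d2;
      q1 <= d1;
    end
endmodule
```

On a posedge clk, the sequential block samples d1 and d2 and schedules the updates of q1 and q2. When q1 is updated, the combinational block re-evaluates d2 in the same time step. No further nonblocking assignments are scheduled, so the nonblocking event queue is then empty.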
[Figure: pipeline of 20 x 1000-bit registers with stage outputs qq1, qq2, ..., qq19, q; all stages driven by clk and rst_n]
Benchmark Models (with inverters and without inverters)

DFF models with no inverters:

module dff (q, d, clk, rst_n);
  parameter SIZE=100;
  output [SIZE-1:0] q;
  input  [SIZE-1:0] d;
  input             clk, rst_n;
  reg    [SIZE-1:0] q;

  always @(posedge clk or negedge rst_n)
    if (!rst_n) q <= 0;
    else        q <= d;
endmodule

DFF models with inverters:

module dffi (q, d, clk, rst_n);
  parameter SIZE=100;
  output [SIZE-1:0] q;
  input  [SIZE-1:0] d;
  input             clk, rst_n;
  ...

#4 Nonblocking assignments with macro-added #0 delays:

`define D #0
always @(posedge clk or negedge rst_n)
  if (!rst_n) q <= `D 0;
  else        q <= `D d;

#5 Nonblocking assignments with macro-added no delays:

`define D
always @(posedge clk or negedge rst_n)
  if (!rst_n) q <= `D 0;
  else        q <= `D d;
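One way to switch between the #0 and blank definitions of `D from the command line is sketched below. The macro name ZERO_DELAY and the module name dff_d are hypothetical. The slides only show the two `define D variants, not how they were selected.

```verilog
// ASSUMPTION: the hypothetical macro ZERO_DELAY picks the definition of `D.
// The slides show both `define D variants but not the selection mechanism.
`ifdef ZERO_DELAY
  `define D #0
`else
  `define D
`endif

// Hypothetical flip-flop model using the `D macro in its nonblocking assignments
module dff_d (q, d, clk, rst_n);
  parameter SIZE = 100;
  output [SIZE-1:0] q;
  input  [SIZE-1:0] d;
  input             clk, rst_n;
  reg    [SIZE-1:0] q;

  always @(posedge clk or negedge rst_n)
    if (!rst_n) q <= `D 0;   // expands to "q <= #0 0;" or "q <= 0;"
    else        q <= `D d;   // expands to "q <= #0 d;" or "q <= d;"
endmodule
```

Compiling with ZERO_DELAY defined (for example through a simulator's macro-definition option) would give the #0 variant, and compiling without it would give the blank variant.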
wire [SIZE-1:0] qq1, qq2, qq3, qq4, qq5, qq6, qq7, qq8, qq9;
wire [SIZE-1:0] qq10, qq11, qq12, qq13, qq14, qq15, qq16, qq17, qq18, qq19;
...
// 20 registers, 1000 bits each
`DFF #(SIZE) u17 (.q(qq17), .d(qq16), .clk(clk), .rst_n(rst_n));
`DFF #(SIZE) u18 (.q(qq18), .d(qq17), .clk(clk), .rst_n(rst_n));
`DFF #(SIZE) u19 (.q(qq19), .d(qq18), .clk(clk), .rst_n(rst_n));
`DFF #(SIZE) u20 (.q( q), .d(qq19), .clk(clk), .rst_n(rst_n));
endmodule
IBM ThinkPad T21, Pentium III-850MHz, 384MB RAM, Redhat Linux 6.2
VCS Version 6.2 - Simulation ended at Time: 800002150 ns

DFF pipeline (no inverters)                                    CPU Time (seconds)   Speed compared to no-delay model
Nonblocking #1 delays ( <= #1 )                                376.460              29% slower
Blocking #1 delays ( = #1 NOT RECOMMENDED)                     358.240              22% slower
Nonblocking #0 delays ( <= `D and `define D #0 )               307.630              5% slower
Nonblocking blank delays ( <= `D and `define D <no_value> )    292.880              ~same speed
SUN Ultra 80, UltraSPARC-II 450MHz, 1GB RAM, Solaris 8

DFF pipeline (no inverters)                                    CPU Time (seconds)   Speed compared to no-delay model
Nonblocking #1 delays ( <= #1 )                                839.270              92% slower
Blocking #1 delays ( = #1 NOT RECOMMENDED)                     548.110              25% slower
Nonblocking #0 delays ( <= `D and `define D #0 )               447.770              2% slower
Nonblocking blank delays ( <= `D and `define D <no_value> )    437.960              ~same speed
IBM ThinkPad T21, Pentium III-850MHz, 384MB RAM, Redhat Linux 6.2
VCS Version 6.2 - Simulation ended at Time: 800002150 ns

DFF pipeline with inverters                                    CPU Time (seconds)   Speed compared to no-delay model
Nonblocking #1 delays ( <= #1 )                                462.230              18% slower
Blocking #1 delays ( = #1 NOT RECOMMENDED)                     458.750              18% slower
Nonblocking #0 delays ( <= `D and `define D #0 )               390.320              ~same speed
Nonblocking blank delays ( <= `D and `define D <no_value> )    390.630              ~same speed
Configuration                                                  CPU Time (seconds)   Speed compared to no-delay model
Nonblocking #1 delays ( <= #1 )                                1,112.130            66% slower
Blocking #1 delays ( = #1 NOT RECOMMENDED)                     777.440              16% slower
Nonblocking #0 delays ( <= `D and `define D #0 )               744.160              11% slower
Nonblocking blank delays ( <= `D and `define D <no_value> )    673.95               1% slower
Speeds up logic and event propagation
+rad does not affect delay scheduling
+rad is not just for cycle-based simulations
Linux Laptop: IBM ThinkPad T21, Pentium III-850MHz, 384MB RAM, Redhat Linux 6.2
VCS Version 6.2 - including the +nbaopt command switch

DFF pipeline (no inverters)                                    CPU Time (seconds)   Speed compared to no-delay model
Nonblocking #1 delays ( <= #1 )                                311.070              6% slower

SUN Workstation: SUN Ultra 80, UltraSPARC-II 450MHz, 1GB RAM, Solaris 8
VCS Version 6.2 - including the +nbaopt command switch

DFF pipeline (no inverters)                                    CPU Time (seconds)   Speed compared to no-delay model
Nonblocking #1 delays ( <= #1 )                                448.630              2% slower
IBM ThinkPad T21, Pentium III-850MHz, 384MB RAM, Redhat Linux 6.2
VCS Version 6.2 - (using +rad switch)

DFF pipeline (no inverters)                                    CPU Time (seconds)   Speed compared to no-delay model
Nonblocking #1 delays ( <= #1 )                                293.250              26% slower
Blocking #1 delays ( = #1 NOT RECOMMENDED)                     289.940              24% slower
Nonblocking #0 delays ( <= `D and `define D #0 )               229.290              2% faster
Nonblocking blank delays ( <= `D and `define D <no_value> )    233.100              ~same speed
IBM ThinkPad T21, Pentium III-850MHz, 384MB RAM, Redhat Linux 6.2
VCS Version 6.2 - (using +rad switch)

DFF pipeline with inverters                                    CPU Time (seconds)   Speed compared to no-delay model
Nonblocking #1 delays ( <= #1 )                                294.480              25% slower
Blocking #1 delays ( = #1 NOT RECOMMENDED)                     288.910              23% slower
Nonblocking #0 delays ( <= `D and `define D #0 )               228.410              3% faster
Nonblocking blank delays ( <= `D and `define D <no_value> )    234.510              2% faster
module blk2_2 (
  output reg q2,
  input      a, b,
  input      clk1a, clk1b, rst_n);

  reg  q1;
  wire d1 = a & b;       // Combinational assignments
  wire d2 = q1 | d1;

  // clk1a block (q1 register)
  always @(posedge clk1a or negedge rst_n)
    if (!rst_n) q1 <= 0;
    else        q1 <= d1;
  ...

No race conditions!
[Figure: schematic and waveforms for clk1a, clk1b, rst_n, a, b, d1, q1, d2, q2]
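The blk2_2 listing above stops after the clk1a block. A complete version might look like the following sketch. The clk1b block that registers q2 from d2 is an assumption, inferred from the port list and the d2/q2 labels in the figure.

```verilog
// Sketch of a complete blk2_2. The clk1b block is an ASSUMPTION inferred
// from the port list and the figure labels; it is not in the extracted text.
module blk2_2 (
  output reg q2,
  input      a, b,
  input      clk1a, clk1b, rst_n);

  reg  q1;
  wire d1 = a & b;       // combinational assignments
  wire d2 = q1 | d1;

  // clk1a block (q1 register), as shown on the slide
  always @(posedge clk1a or negedge rst_n)
    if (!rst_n) q1 <= 1'b0;
    else        q1 <= d1;

  // ASSUMED clk1b block (q2 register)
  always @(posedge clk1b or negedge rst_n)
    if (!rst_n) q2 <= 1'b0;
    else        q2 <= d2;
endmodule
```

Because both registers use nonblocking assignments and the combinational signals are driven by continuous assignments, q2 samples d2 before the update of q1 propagates, even when clk1a and clk1b change in the same time step.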
Guideline #5: Do not mix blocking and nonblocking assignments in the same always block
Synthesizes okay
[Figure: inputs a and b produce d, which is registered as q; q and c produce y; clk and rst_n drive the register]
The combinational d-signal does not update when the a and b inputs go high
[Figure: schematic and waveforms for rst_n, clk, a, b, c, d, q, y]

begin: logic
  reg d;
  if (!rst_n) q <= 0;
  else begin
    d = a & b;
    q <= d;
    d = 1'bx;
  end
end

clk changes ...
[Figure: schematic and waveforms for rst_n, clk, a, b, c, d, q, y]
module blk2a (
  output reg q, q2,
  output     y,
  input      a, b, c,
  input      clk, rst_n);

  reg d;                 // Signal d declared at the module level

  always @(posedge clk or negedge rst_n)
    if (!rst_n) q <= 0;
    else begin
      d = a & b;         // Combinational intermediate signal
      q <= d;
    end

  assign y = q & c;

  always @(d) q2 = d;    // Buffered d-output
endmodule

[Figure: desired logic, with a and b producing d, d registered as q, q and c producing y, and d buffered to q2]
Oops! q2 is now a registered output
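One way to get the desired combinational q2 while following Guideline #5 is to move the intermediate signal d out of the clocked block. The module name blk2b below is hypothetical, and this is a sketch of one possible restructuring, not necessarily the fix presented in the talk.

```verilog
// Hypothetical restructuring of blk2a (the name blk2b is not from the slides).
// d is computed by a continuous assignment, so q2 stays combinational.
module blk2b (
  output reg q,
  output     q2, y,
  input      a, b, c,
  input      clk, rst_n);

  wire d = a & b;        // combinational intermediate signal

  // Clocked block contains only nonblocking assignments
  always @(posedge clk or negedge rst_n)
    if (!rst_n) q <= 1'b0;
    else        q <= d;

  assign y  = q & c;
  assign q2 = d;         // buffered d-output, updates whenever a or b changes
endmodule
```

Here d follows a and b at all times instead of changing only when the clocked block executes, so q2 no longer behaves like a registered output.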
RTL Simulations (With Multiple Balanced Clock Sources)
[Figure: clk1 drives balanced clocks clk1b and clk1c; rst_n resets the design]
No simulation problems

RTL Simulations (With Instantiated PLL Clock Source)
[Figure: PLL clock outputs clk1b (#0.0) and clk1c (#0.2)]

Gate-Level Simulations (With Instantiated PLL Clock Source)
[Figure: PLL driven by clk1a, #0.1 skew, clock outputs clk1b (#0.0) and clk1c (#0.2); signals a2, b2, c2, d2; rst_n]
Gate-level models have intrinsic delays (no problems)
Vendor Model Drives Correct RTL Model w/ Blocking Assignments

module vendor1_b0 (
  output reg b,
  input      a, clk, rst_n);

  // Error in vendor1 coding style
  always @(posedge clk or negedge rst_n)
    if (!rst_n) b = 0;
    else        b = a;
endmodule

`timescale 1ns / 1ns
module vendor1_b1 (
  output reg b,
  input      a, clk, rst_n);
  ...

[Figure: three-stage chain a, b, c, d with stage assignments b = #1 a, c <= #1 b, d = #1 c; clk and rst_n drive all stages]
Correct RTL Model w/ Nonblocking Assignments

module myrtl_nb0 (
  output reg c,
  input      b, clk, rst_n);

  // Correct sequential coding style
  always @(posedge clk or negedge rst_n)
    if (!rst_n) c <= 0;
    else        c <= b;
endmodule

`timescale 1ns / 1ns
module myrtl_nb1 (
  output reg c,
  input      b, clk, rst_n);
  ...

[Figure: three-stage chain a, b, c, d with stage assignments b = #1 a, c <= #1 b, d = #1 c; clk and rst_n drive all stages]
Correct RTL Model Drives Vendor Model w/ Blocking Assignments

module vendor2_b0 (
  output reg d,
  input      c, clk, rst_n);

  // Error in vendor2 coding style
  always @(posedge clk or negedge rst_n)
    if (!rst_n) d = 0;
    else        d = c;
endmodule

`timescale 1ns / 1ns
module vendor2_b1 (
  output reg d,
  input      c, clk, rst_n);
  ...

[Figure: three-stage chain a, b, c, d with stage assignments b = #1 a, c <= #1 b, d = #1 c; clk and rst_n drive all stages]
Scenario #2: vendor1_b1 (b = #1 a) drives myrtl_nb0 (c <= b)

always @(posedge clk ...)       Fails
... begin
  b = #1 a;
  c <= b; ...

always @(posedge clk ...)       Passes
... begin
  c <= b;
  b = #1 a; ...

Scenario #4: vendor1_b1 (b = #1 a) drives myrtl_nb1 (c <= #1 b)

always @(posedge clk ...)       Fails
... begin
  b = #1 a;
  c <= #1 b; ...

always @(posedge clk ...)       Passes
... begin
  c <= #1 b;
  b = #1 a; ...

Scenario #6: myrtl_nb0 (c <= b) drives vendor1_b1 (d = #1 c)

always @(posedge clk ...)       Passes
... begin
  c <= b;
  d = #1 c; ...

always @(posedge clk ...)       Passes
... begin
  d = #1 c;
  c <= b; ...

Scenario #8: myrtl_nb1 (c <= #1 b) drives vendor1_b1 (d = #1 c)

always @(posedge clk ...)       Passes
... begin
  c <= #1 b;
  d = #1 c; ...

always @(posedge clk ...)       Passes
... begin
  d = #1 c;
  c <= #1 b; ...
No delays on dff models
Linux Laptop: IBM ThinkPad T21, Pentium III-850MHz, 384MB RAM, Redhat Linux 6.2
VCS Version 6.2 - #1 delays only added to the 2,000 I/O flip-flops

Configuration                                                  CPU Time (seconds)   Speed compared to no-delay model
Nonblocking #1 delays ( <= #1 )                                376.460              29% slower
Nonblocking #1 delays only on the 2,000 I/O flip-flops         375.710              28% slower

SUN Workstation: SUN Ultra 80, UltraSPARC-II 450MHz, 1GB RAM, Solaris 8
VCS Version 6.2 - #1 delays only added to the 2,000 I/O flip-flops

Configuration                                                  CPU Time (seconds)   Speed compared to no-delay model
Nonblocking #1 delays ( <= #1 )                                839.270              92% slower
Nonblocking #1 delays only on the 2,000 I/O flip-flops         833.720              90% slower
Reference Material: Gate Simulations With SDF Delays
See the paper for more details
Race condition:

initial begin
  rst_n = 0;
  ...
end

No race condition:

initial begin
  rst_n <= 0;
  ...
end

No race condition (clk=1 at time 0):

`define cycle 10
...
initial begin
  clk <= 1;
  forever #(`cycle/2) clk = ~clk;
end
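The time-0 initialization style above can be combined into a small testbench sketch. The module name tb, the inline flip-flop, and the stimulus timing are all hypothetical. Only the nonblocking initial assignments to rst_n and clk and the `cycle macro come from the slides.

```verilog
`timescale 1ns / 1ns
`define cycle 10

// Hypothetical testbench (names and timing are assumptions, not from the slides)
module tb;
  reg clk, rst_n, d;
  reg q;

  // Hypothetical design under test: a single flip-flop with asynchronous reset
  always @(posedge clk or negedge rst_n)
    if (!rst_n) q <= 1'b0;
    else        q <= d;

  // Clock starts at 1 at time 0. The nonblocking assignment delays the first
  // clk event until all procedural blocks have started and are waiting.
  initial begin
    clk <= 1'b1;
    forever #(`cycle/2) clk = ~clk;
  end

  // Reset asserted at time 0 with a nonblocking assignment, so the negedge
  // rst_n event is seen by every always block.
  initial begin
    rst_n <= 1'b0;
    d     <= 1'b1;
    #(`cycle * 2 + 1) rst_n <= 1'b1;   // release reset away from a clock edge
    #(`cycle * 3) $finish;
  end
endmodule
```

With blocking assignments at time 0, whether the always block sees the first rst_n or clk event would depend on the order in which the simulator starts the procedural blocks.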
Better: Guideline #5: Do not mix blocking and nonblocking assignments in the same always block
Better: Guideline #6: Do not make assignments to the same variable from more than one always block
– Use continuous assignments to drive inout pins only. Do not use them to model internal combinational functions. Prefer sequential code instead.
Procedural blocks are more prone to Verilog race conditions
Guideline #4: Mixed sequential and combinational logic in the same always block - use nonblocking assignments
Guideline #5: Do not mix blocking and nonblocking assignments in the same always block
Guideline #6: Do not make assignments to the same variable from more than one always block
Guideline #7: Use $strobe to display values that have been assigned using nonblocking assignments
Guideline #8: Do not make #0 procedural assignments