Session 2
Outline
ASIC/FPGA Design Flow
[Figure: ASIC design flow. Specifications → RTL coding → pre-layout simulation (with test bench) → synthesis (with standard cells and timing constraints) → timing analysis → APR → back annotation → post-layout timing analysis → logic verification → tapeout. Each "Pass?" check either continues (Yes) or loops back to an earlier step (No).]
1. HDL Coding 2. Simulation 3. Synthesis 4. Placement & routing 5. Timing Analysis & Verification
In this course we learn all the above steps in detail for ASIC
1. HDL Coding
HDL allows us to describe the functionality of a logic circuit in a language that is:
Easy to understand
Easy to share
Able to hide complicated implementation details
The designer can focus on the design functionality rather than on the detailed circuit design
2. Simulation by Testbenches
After HDL coding, the code has to be tested using “testbenches” (Verification).
Simulation tools:
VCS (Synopsys)
ModelSim (Mentor Graphics)
NC-Verilog (Cadence)
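A minimal sketch of what such a testbench can look like. The design under test (and2) and the testbench name (and2_tb) are hypothetical and not part of the course material; the testbench applies all four input combinations and reports any mismatch.

```verilog
// Hypothetical design under test: a 2-input AND gate.
module and2 (input a, input b, output y);
  assign y = a & b;
endmodule

// Self-checking testbench: it has no ports, drives the DUT inputs,
// and compares the DUT output with the expected value.
module and2_tb;
  reg  a, b;
  wire y;
  integer i, errors;

  and2 dut (.a(a), .b(b), .y(y));

  initial begin
    errors = 0;
    for (i = 0; i < 4; i = i + 1) begin
      {a, b} = i[1:0];   // apply one input combination
      #10;               // wait for the output to settle
      if (y !== (a & b)) begin
        errors = errors + 1;
        $display("Mismatch: a=%b b=%b y=%b", a, b, y);
      end
    end
    $display("Test finished with %0d error(s)", errors);
    $finish;
  end
endmodule
```

The same pattern (instantiate the design, drive inputs from an initial block, compare outputs) should carry over to any of the simulators listed above.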
3. Synthesis
Synthesis tool:
Analyzes a piece of Verilog code and converts it into optimized logic gates
This conversion is done according to the “language semantics”
We have to learn these language semantics, i.e., Verilog code.
3. Synthesis
If a designer can design 150 gates a day, a 10-million-gate design takes about 66,667 designer-days, or almost 20 years for a team of 10 designers (about 6,667 days each)! This assumes that complexity grows only linearly as the design gets bigger.
3. Synthesis
Synthesis tool:
Input:
HDL Code
“Technology library” file: standard cells (identified by process node, e.g., 90 nm)
o Basic gates (AND, OR, NOR, …)
o Macro cells (adders, muxes, memory, flip-flops, …)
Constraint file (timing, area, power, loading requirements, optimization algorithm)
Output:
A gate-level “Netlist” of the design
Timing files (.sdf)
3. Synthesis Tools
[Figure: the synthesis tool takes the HDL code, the technology library, and the constraints as inputs and produces a gate-level netlist.]
Logic Verification
Simulate and test the very final netlist after APR
Timing analysis using testbenches
Send the final design (GDS file) for fabrication
Outline
Introduction: Digital Logic Design
Conventional Approach:
Schematic Entry good for fairly small designs
(Draw K-maps, optimize the Boolean logic, draw the schematic)
[Figure: example schematic with inputs A, B, C, D, and Clk and output Y (10 gates).]
Solution:
Describe the design in text using a Hardware Description Language (HDL)
Describe only the design “behavior”, not the detailed gate-level logic
Gate-level logic is generated automatically by a “synthesis” tool
Introduction: Why HDL?
Complicated designs can be easily described by HDL
Designers can make decisions about cost, performance, power, and area
earlier in the design process
Advantages of HDL Coding
There are several benefits to using an HDL to describe your design:
Verilog:
Developed by Philip Moorby in 1985 as a proprietary language
Open to public by Cadence Design Systems in 1990
IEEE standard in 1995 and revised in 2001
Comparison of VHDL and Verilog
VHDL:
Commissioned in 1981 by the Department of Defense
Initially created for ASIC synthesis
Strong support for package management and large designs
ADA-like software language
Design is composed of entities, each of which can have multiple architectures
Gate-level, dataflow, and behavioral modeling; synthesizable subset
Harder to learn and use
Verilog:
Created by Gateway Design Automation in 1985
Initially an interpreted language for gate-level simulation
No special extensions for large designs
C-like software language
Design is composed of modules, which have just one implementation
Gate-level, dataflow, and behavioral modeling; synthesizable subset
Easy to learn and use
Verilog in Three Flavors
There are three types of Verilog coding:
Behavioral (most descriptive):
Describes a system by the flow of data between its functional blocks
Defines signal values when they change
Structural (least descriptive):
Shows detailed design components, nets, and interconnects
Uses technology-specific, low-level components
Used to pass netlist information between design tools (e.g., from DC to APR)
RTL (Register Transfer Level; somewhat descriptive; the focus of this course):
Describes how data transfers between registers and inputs/outputs
Describes a system by the flow of data and control signals between and within its functional blocks
Defines signal values with respect to a clock
Verilog Coding Styles
RTL (our focus):
    module RTL (A, B, C, D, Out);
      input A, B, C, D;
      output Out;
      reg Out;
      always @ (A or B or C or D)
      begin
        if (A & B & ~D)
          Out = C;
        else if (A & D & ~C)
          Out = B;
        else
          Out = 0;
      end
    endmodule
Behavioral:
    module behavior (A, B, C, D, Out);
      input A, B, C, D;
      output Out;
      reg Out;
      always @ (A or B or C or D)
      begin
        if (A & B & ~D)
          Out = #5 C;
        else if (A & D & ~C)
          Out = #3 B;
        // === (case equality) is needed to match x and z; == would return x here
        else if ((A === 1'bx) | (B === 1'bx) |
                 (C === 1'bx) | (D === 1'bz))
          Out = #7 1'bx;
        else if ((A === 1'bz) | (B === 1'bz))
          Out = #7 1'bz;
        else
          Out = #3 0;
      end
    endmodule
Structural:
    module structural (A, B, C, D, Out);
      input A, B, C, D;
      output Out;
      wire n30;
      EO U9 ( .A(D), .B(C), .Z(n30) );
      AN3 U8 ( .A(A), .B(n30), .C(B), .Z(Out) );
    endmodule
Verilog in Three Flavors : Behavioral
Verilog in Three Flavors : RTL
Verilog in Three Flavors : Structural
Verilog Coding Styles: Levels of Abstraction
Trade-offs:
One language for all levels:
Verilog Coding Styles: Design Style
Verilog, like any other hardware description language, permits a design in either
Bottom-up or Top-down methodology.
Bottom-Up Design
• The traditional method of electronic design is bottom-up. Each design is
performed at the gate-level using the standard gates. With the increasing
complexity of new designs this approach is nearly impossible to maintain.
New systems consist of ASIC or microprocessors with a complexity of
thousands of transistors. These traditional bottom-up designs have to give
way to new structural, hierarchical design methods.
Top-Down Design
• A real top-down design allows early testing, easy change of different
technologies, a structured system design and offers many other
advantages. But it is very difficult to follow a pure top-down design. Due
to this fact most designs are a mix of both methods, implementing some
key elements of both design styles.
Verilog for Synthesis (RTL)
In this course we focus on RTL coding
RTL coding is the closest one to the actual hardware implementation
RTL code uses a subset of the Verilog language
Not all Verilog constructs are synthesizable
We cover most of the Verilog constructs that are needed for logic synthesis
Simulation of the RTL code is also covered
We learn how to write “good” Verilog code for synthesis
Lots of examples on the synthesized RTL!
[Figure: registers clocked by Clk with combinational logic between In and Out; the critical path is marked.]
Verilog Applications
Verilog Fundamentals : Comment
Comments are used for documentation
input
output module
Signal
Net
wire:
For interconnecting logic elements (LEs)
To connect an output of a logic element to the input of another LE
tri
Circuit nodes that are connected in a tri-state fashion
Variable
reg (unsigned in general)
Corresponds to a circuit node (not necessarily a register!)
Allow a circuit to be described in terms of its behavior
Retains its value until it is overwritten by a subsequent assignment
integer (signed in general)
Used for loop counters
Signal
Explicit “wire” declarations are optional because Verilog assumes that signals are nets by default.
The “reg” declaration is required!
Example:
    module DUT (A, B, C);   // don't forget the semicolon
      input  [1:0] A;
      output       B;
      inout  [2:0] C;
      // body code
    endmodule
Signal
Example:
    module DUT (s, Out);
      // Ports
      input  [3:0] s;
      output [2:0] Out;
      // Signals
      wire [2:0] Out;      // wire (for interconnection)
      reg  [2:0] Count;
      integer k;           // loop counter
      // Code body: count the number of 1s in s.
      // Procedural statements must sit inside an always block.
      always @ (s)
      begin
        Count = 0;
        for (k=0; k<4; k=k+1)
          if (s[k])
            Count = Count + 1;   // ";" at the end of each statement
      end
      assign Out = Count;
    endmodule
Signal
The keyword “reg” does NOT necessarily denote a storage element or register.
“reg” only models the behavior of a circuit.
May or may not be synthesized as a register.
Verilog Fundamentals : Signal Range
Signals in Verilog can be:
Scalar: representing a node
    reg C;
    wire B;
Signal values: 0, 1, Z, X (unknown)
[Figure: a signal declaration consists of Type, Range (scalar or vector), Name, and Value.]
Verilog Fundamentals : Signal Strength
    Strength   Kind
    pull       Driving
    large      Storage
    weak       Driving
    medium     Storage
    small      Storage
    highz      High impedance (weakest)
Verilog Fundamentals : Signal Value
Vector: <# of bits>'<base><number>
Example:
    module DUT (s, Out);
      parameter n = 3;
      parameter S0 = 4'b1010;
      input  [n-1:0] s;
      output [n:0]   Out;
    endmodule
Verilog Fundamentals : Memories
Memory:
A two-dimensional array of bits
Declared in Verilog as a two-dimensional variable (reg)
Example: A 4-byte memory:
    reg [7:0] R [3:0];   // 4 rows (R[0] to R[3]), each 8 bits wide
    // R[2][5] selects bit 5 of row 2
A three-dimensional array may also be declared.
Example:
    reg [7:0] M [3:0][1:0];
If an 8-bit A is declared, then a legal assignment is:
    reg [7:0] A;
    A = M[3][0];
Verilog Fundamentals : Operators
Example:
Bitwise:
    Operation           Result
    1010 & 1100         1000
    1010 | 1100         1110
    ~1010               0101
    1101 ^ 0100         1001
Logical (a non-zero operand counts as logical “1”):
    Operation           Result
    1010 && 1100        1
    2'b11 || 2'b00      1
    !0010               0
    X || 1              1
    X && 0              0
Reduction:
    Operation           Result
    &111                1
    ^0100               1
Relational (A = 2'b10):
    Operation           Result
    B = (A == 2'b10)    B = 1
Shift (results shown for a 6-bit A = 001100):
    Operation           Result
    D = A << 2          D = 110000
    F = A >> 3          F = 000001
Concatenation (A = 2'b11, B = 3'b010):
    Operation           Result
    {A, B}              5'b11010
    {3{A}}              6'b111111
    {B, B}              6'b010010
    {{3{A}}, {2{B}}}    12'b111111010010
Be generous with {}
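The operator tables above can be checked in simulation with a short sketch like the following. The module name operators_demo and the signal names p and q are illustrative only; each $display argument is evaluated at its own (self-determined) width, so the printed values should agree with the tables.

```verilog
// Prints the result of several operators for the operand values
// used in the tables above.
module operators_demo;
  reg [3:0] p, q;
  reg [1:0] A;
  reg [2:0] B;

  initial begin
    p = 4'b1010;  q = 4'b1100;
    A = 2'b11;    B = 3'b010;
    $display("p & q   = %b", p & q);     // bitwise AND
    $display("p | q   = %b", p | q);     // bitwise OR
    $display("~p      = %b", ~p);        // bitwise NOT
    $display("p && q  = %b", p && q);    // logical AND (1-bit result)
    $display("&p      = %b", &p);        // reduction AND (1-bit result)
    $display("{A, B}  = %b", {A, B});    // concatenation
    $display("{3{A}}  = %b", {3{A}});    // replication
  end
endmodule
```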
Verilog Fundamentals : Operators
Conditional: (? :)
    D = S ? B : C;
D = B if S = 1; D = C if S = 0 (a 2-input multiplexer with select S).
4-input multiplexer (MUX) with select inputs S1 and S2:
    D = ({S1,S2}==2'b00) ? F :
        ({S1,S2}==2'b01) ? E :
        ({S1,S2}==2'b10) ? C : B;     // B is the default (2'b11)
Equivalent form with the last case written explicitly:
    D = ({S1,S2}==2'b00) ? F :
        ({S1,S2}==2'b01) ? E :
        ({S1,S2}==2'b10) ? C :
        ({S1,S2}==2'b11) ? B : B;
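As a sketch, the nested conditional operator can be wrapped in a complete module as follows. The module name mux4 and the 8-bit data width are assumptions made for illustration; the signal names follow the slide.

```verilog
// 4-input multiplexer built from nested conditional operators.
// {S1,S2} selects F (00), E (01), C (10), or B (11, the default).
module mux4 (
  input  [7:0] B, C, E, F,
  input        S1, S2,
  output [7:0] D
);
  assign D = ({S1, S2} == 2'b00) ? F :
             ({S1, S2} == 2'b01) ? E :
             ({S1, S2} == 2'b10) ? C : B;
endmodule
```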
Verilog Fundamentals : Operators (All in One)
Verilog Fundamentals : Module-Revisited
Any circuit or subcircuit is declared as a “module” in Verilog.
There are three types of ports:
input: type “wire”
output: type “wire” or “reg”
inout: type “wire”
    module DUT (A, B, C);
      input A;
      output [3:0] B;
      inout C;
      // signals
      // body code
    endmodule
The output and reg declarations of a port can be combined:
    output [3:0] B;
    reg [3:0] B;
is equivalent to
    output reg [3:0] B;
Verilog Fundamentals : Module Ports
Inside view of the module:
input port: wire (net)
output port: wire or reg
inout port: wire (net)
Outside view of the module:
input port: driven by a wire or reg
output port: drives a wire (net)
inout port: connected to a wire (net)
Verilog Fundamentals : Module-Revisited
In Verilog-2001 the port list can directly follow the module declaration
Continuous assignment (assign statement):
    assign C = x & y;
For vectors, one bitwise assignment is equivalent to the bit-by-bit assignments:
    assign C[1] = A[1]&B[1];
    assign C[2] = A[2]&B[2];
    assign C[3] = A[3]&B[3];
Example: full adder body, written in two equivalent ways:
    assign S = x ^ y ^ Cin;
    assign Cout = (x & y)|(x & Cin)|(y & Cin);
    endmodule
and
    assign {Cout, S} = x + y + Cin;
    endmodule
Truth table:
    x  y  Cin | Cout  S
    0  0  0   |  0    0
    0  0  1   |  0    1
    0  1  0   |  0    1
    0  1  1   |  1    0
    1  0  0   |  0    1
    1  0  1   |  1    0
    1  1  0   |  1    0
    1  1  1   |  1    1
Concurrent Statements
Example: Signed vs. unsigned addition:
In Verilog, “+” on plain (unsigned) vectors performs unsigned addition
Signed addition has to be specified explicitly using sign extension
    module Adder_sign (X, Y, S_unsigned, S_signed);
      input [3:0] X, Y;
      output [4:0] S_unsigned, S_signed;
      assign S_unsigned = X + Y;
      assign S_signed = {{X[3]},X} + {{Y[3]},Y};       // sign extension
    endmodule
Parameterized version:
    module Adder_sign (X, Y, S_unsigned, S_signed);
      parameter n = 4;
      input [n-1:0] X, Y;
      output [n:0] S_unsigned, S_signed;
      assign S_unsigned = X + Y;
      assign S_signed = {{X[n-1]},X} + {{Y[n-1]},Y};   // sign extension
    endmodule
wire #2 S;
assign #5 S = x&y;
A given variable should never be assigned a value in more than one always block, because always blocks are concurrent with respect to one another.
Blocking assignment (=):
Evaluated and assigned in a single step
Sequential nature
Assignment ordering IS important
S=4 “blocks” a=S from being evaluated: a=S has to wait for S=4 to be evaluated first
Non-blocking assignment (<=):
Evaluated and assigned in two steps
1. All RHSs are evaluated in parallel
2. Assignments to LHSs are performed together
They are all evaluated at once
Assignment ordering is NOT important
S<=4 and a<=S are evaluated in parallel
Blocking vs. Non-Blocking Assignments
Example: Swap bytes in words
[Figure: word B with bytes B[15:8] and B[7:0]; register schematics with signals in, y1, y2, and Clk.]
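A minimal sketch of the byte swap using non-blocking assignments. The module name swap_bytes and the Load and In ports are hypothetical additions so that the register can be given a starting value; only the swap of B[15:8] and B[7:0] comes from the slide.

```verilog
// Swaps the two bytes of a 16-bit register on every clock edge
// (when Load is low). Load and In are assumed ports for initialization.
module swap_bytes (
  input             Clk,
  input             Load,
  input      [15:0] In,
  output reg [15:0] B
);
  always @ (posedge Clk)
    if (Load)
      B <= In;
    else begin
      // Both right-hand sides are sampled before either half is updated,
      // so the bytes exchange places without a temporary variable.
      B[15:8] <= B[7:0];
      B[7:0]  <= B[15:8];
    end
endmodule
```

With blocking assignments the second statement would read the already-updated upper byte, and both bytes would end up holding the same value.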
Overall Code Parallelism
Statements inside an always block are evaluated sequentially
However, all always blocks are evaluated concurrently
All continuous assignments are evaluated concurrently too
    assign a=b&c;
    always @ (c,b)
    begin
      d = c^b;
    end
    assign e=b|c;
Verilog Assignments at a Glance
Verilog assignments:
Continuous: using an assign statement
    assign a=b;
Procedural: inside an always block
Blocking:
    always @ (*)
    begin
      // statements use =
    end
Non-blocking:
    always @ (*)
    begin
      // statements use <=
    end
assign cannot be used inside an always block because assign is used for nets.
Nets cannot be assigned inside an always block (only reg or integer variables can).
Outline
Logic Circuits Category
Logic Circuits:
Combinational logic: (realized by assign and always)
Output depends on inputs
Inputs propagate to the output through gates, with some delay
e.g., adders, Mux, multiplier, all logic gates
Sequential Logic: (realized only by always)
Output depends on inputs and circuit history
Circuit history is kept using flip-flops, registers or latches
e.g., Finite State Machines (FSM), shift registers, Flip Flops (FF)
Sequential logic has two flavors:
Synchronous: all registers controlled by a global clock
Asynchronous: based on the handshaking process
Logic Circuits Category
A general system consists of both combinational and sequential circuits
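[Figure: In → combinational logic → registers → combinational logic → Out, clocked by Clk. The combinational blocks are built with assign or always; the registers are built with always. The critical path is marked.]
The critical path of the combinational logic determines the maximum operating frequency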
Combinational logic can be realized using assign and always constructs
Sequential logic can only be realized using always blocks.
Combinational Logic
Combinational logic can be realized using assign and always constructs
Example: Full Adder:
When using an always block for combinational logic, “blocking” assignments are used
An always block is re-evaluated whenever one of the signals in its sensitivity list changes
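The full adder can be sketched in both styles as follows. The module names FullAdder_assign and FullAdder_always are illustrative; the equations are the ones used for the full adder elsewhere in these notes.

```verilog
// Full adder as combinational logic using continuous assignments.
module FullAdder_assign (input x, y, Cin, output S, Cout);
  assign S    = x ^ y ^ Cin;
  assign Cout = (x & y) | (x & Cin) | (y & Cin);
endmodule

// The same full adder using an always block with blocking assignments.
// Every input appears in the sensitivity list, and both outputs are
// assigned on every evaluation, so no latch is inferred.
module FullAdder_always (input x, y, Cin, output reg S, Cout);
  always @ (x, y, Cin)
  begin
    S    = x ^ y ^ Cin;
    Cout = (x & y) | (x & Cin) | (y & Cin);
  end
endmodule
```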
Blocking Assignment for Combinational Logic
Use only blocking assignments for combinational logic. Why?
Example: Accumulator: (Assume Count == 0)
Why?
1. Because powerful statements like if-else and loop constructs can only
be used inside an always block
Comes with more clarity and more concise description than assign
2. Multiple outputs can be assigned within a single always block
Sequential Logic
Sequential circuits have memory (i.e., remembers the past)
The current state is held in memory and the next state is computed through
the combinational logic
In a synchronous system, a global clock signal orchestrates the flow of the
data and the sequence of events
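[Figure: synchronous sequential circuit. The inputs and the current state (CS) feed combinational logic that computes the next state (NS); registers/flip-flops (FFs) clocked by Clk hold the current state; a second combinational block produces the outputs.]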
Sequential Logic
Sequential logic can only be realized using an always block
Consists of :
Flip flops that are normally controlled by:
Positive edge of the clock (posedge) always @ (posedge Clk)
Negative edge of the clock (negedge) always @ (negedge Clk)
Have posedge or negedge in the sensitivity list
Any variable assigned a value is the output of a flip-flop
Latches
Transfer the input to the output when the clock is “1” and hold the stored value otherwise
Finite State Machine (FSM)
When using the always block for the sequential Logic, “Non-blocking”
assignments are used
Sequential Logic: Flip-Flop
Example: Flip-flop with asynchronous Reset:
[Figure: flip-flop driven by a high-fanout Reset signal.]
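A sketch of a D flip-flop with asynchronous reset, next to the synchronous-reset variant for comparison. The module names and the active-high reset polarity are assumptions made for illustration.

```verilog
// D flip-flop with asynchronous, active-high reset (polarity assumed).
// Reset appears in the sensitivity list, so Q clears without a clock edge.
module dff_async_rst (input D, Clk, Reset, output reg Q);
  always @ (posedge Clk or posedge Reset)
    if (Reset)
      Q <= 1'b0;
    else
      Q <= D;
endmodule

// D flip-flop with synchronous reset: Reset is sampled only at the clock edge.
module dff_sync_rst (input D, Clk, Reset, output reg Q);
  always @ (posedge Clk)
    if (Reset)
      Q <= 1'b0;
    else
      Q <= D;
endmodule
```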
Sequential Logic
Example: D-Latch:
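    module Latch(D, Clk, Q);
      input D, Clk;
      output reg Q;
      always @ (D, Clk)
        if (Clk)
          Q<=D;
    endmodule
Example:
[Figure: timing diagram comparing the outputs of a D-latch, a flip-flop with synchronous reset, and a flip-flop with asynchronous reset for the same Clk and Reset inputs.]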
Both result in a latch
wire [n:0] d;
reg [n:0] q;
...
always @ (posedge Clk)
q<=d;
Signal
The keyword “reg” does NOT necessarily denote a storage element or register.
“reg” simply means a variable that can hold a value
May or may not be synthesized as a register.
Sequential Logic
When using always block for sequential Logic, “Non-blocking”
assignments are used. Why?
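Blocking assignment (race condition):
    always @ (posedge Clk)
      y1=in;
Non-blocking assignment:
    always @ (posedge Clk)
      y1<=in;
[Figure: waveforms of Clk, in, y1, and y2 for both versions; y2 is indeterminate (?) in the blocking version.]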
Sequential Logic
When using always blocks for sequential Logic, “Non-blocking”
assignments are used. Why?
Example: Shift register
Incorrect!
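The shift-register case can be sketched as follows. The module names shift_reg_bad and shift_reg_good are illustrative, and both assignments are placed in a single always block, which is an assumption about how the slide's example was written.

```verilog
// Incorrect: with blocking assignments y1 is updated first, so y2
// receives the new value of y1 and both registers follow 'in'
// in the same clock cycle (no shifting takes place).
module shift_reg_bad (input Clk, in, output reg y2);
  reg y1;
  always @ (posedge Clk)
  begin
    y1 = in;
    y2 = y1;
  end
endmodule

// Correct: with non-blocking assignments both right-hand sides are
// sampled before any update, so y2 receives the previous value of y1.
module shift_reg_good (input Clk, in, output reg y2);
  reg y1;
  always @ (posedge Clk)
  begin
    y1 <= in;
    y2 <= y1;
  end
endmodule
```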
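[Figure: In → combinational logic → register → Out, clocked by Clk, with a timing diagram marking Tsu, Thold, Tcq, and Tlogic.]
Setup-time condition:
$T_{Clk} > T_{cq} + T_{logic} + T_{su}$
$T_{logic} < T_{Clk} - T_{su} - T_{cq}$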
System Timing Parameters : Minimum Delay
Hold-time Condition:
If violated, the circuit does not work (even at lower frequencies) (why?)
[Figure: In → combinational logic → register → Out, clocked by Clk, with a timing diagram marking Tsu, Thold, Tcq,cd, and Tlogic,cd.]
$T_{cq,cd} + T_{logic,cd} > T_{hold}$
Procedural Statements
Example: 2-to-1 multiplexer (in1 selected when s = 0, in2 when s = 1), written with if-else:
    module Mux21 (in1, in2, s, out);
      input in1, in2, s;
      output reg out;
      always @ (in1, in2, s)
        if (s==0)
          out = in1;
        else
          out = in2;
    endmodule
and with a default assignment followed by an if:
    module Mux21 (in1, in2, s, out);
      input in1, in2, s;
      output reg out;
      always @ (in1, in2, s)
      begin
        out = in1;
        if (s==1)
          out = in2;
      end
    endmodule
If-else statements
An if-else construct inside an always block has a sequential nature when used with blocking assignments. “Sequential” here refers to the order in which statements take effect for synthesis, not necessarily to sequential logic in the actual hardware implementation
This means ordering is important
Example:
    always @ (*)
    begin
      out = in1;
      if (s==1)
        out = in2;
    end
synthesizes to a 2-to-1 multiplexer (out = in2 when s = 1, otherwise in1), whereas
    always @ (*)
    begin
      if (s==1)
        out = in2;
      out = in1;
    end
reduces to out = in1, because the last assignment takes effect.
When realizing combinational logic with always block using if-else or case
constructs care has to be taken to avoid latch inference after synthesis
The latch is inferred when “incomplete” if-else or case statements are declared
If there is some logic path through the always block that does not assign a value
to the output, a latch is inferred
Latch Inference in If-else or Case Statements
Example: incomplete if statement (no else branch):
    module DUT (A, B, S, out);
      input A, B, S;
      output reg out;
      always @(*)
      begin
        if (S==1)
          out = A;
      end
    endmodule
Example: incomplete case statement (2'b10 and 2'b11 are not covered):
    module DUT (A, B, S, out);
      input A, B;
      input [1:0] S;
      output reg out;
      always @(A, B, S)
      begin
        case (S)
          2'b00: out = A;
          2'b01: out = B;
        endcase
      end
    endmodule
[Figure: in both cases the synthesized circuit contains a latch (D, Q, Clk) on out.]
Latch Inference in Combinational Logic
To avoid latch inference, make sure to specify all possible cases “explicitly”
Do NOT leave unspecified cases up to the synthesis tool.
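A sketch of latch-free versions of an if statement and a case statement. The module names are illustrative, and the value assigned in the else and default branches (B and 1'b0) is an assumption; any explicit value removes the latch.

```verilog
// Complete if-else: out is assigned on every path through the block.
module no_latch_if (input A, B, S, output reg out);
  always @ (*)
  begin
    if (S == 1)
      out = A;
    else
      out = B;          // covers S == 0 (assumed behavior)
  end
endmodule

// Complete case: the default branch covers 2'b10 and 2'b11.
module no_latch_case (input A, B, input [1:0] S, output reg out);
  always @ (*)
  begin
    case (S)
      2'b00:   out = A;
      2'b01:   out = B;
      default: out = 1'b0;   // assumed value for the remaining selections
    endcase
  end
endmodule
```

An alternative with the same effect is to assign a default value to out at the top of the always block, before the if or case statement.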
Avoid Latch Inference in If-else Statements
module DUT (A, B, S, out); module DUT (A, B, S, out);
Example: input A, B, S; input A, B, S;
output reg out; output reg out;
always @(*) 1 always @(*)
[Figure: a top module built from instances of sub-modules M1, M2, and M3 between its inputs and outputs.]
Using Sub-modules
There are some built-in primitive logic gates in Verilog that can be instantiated
Built-in primitives means there is no need to define a module for these gates
and, or, nor, ….
Example:
    module Myand (In1, In2, out);
      input In1, In2;
      output out;
      reg out;
      always @(In1, In2)     // 3. always block
        out = (In1 & In2);
    endmodule
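For comparison, the built-in primitives mentioned above can be instantiated directly, without defining a module for each gate. The sketch below builds a full adder this way; the module name FullAdder_gates and the internal net names are illustrative.

```verilog
// Full adder built from Verilog's built-in gate primitives.
// For these primitives the first terminal is the output and the
// remaining terminals are inputs.
module FullAdder_gates (input x, y, Cin, output S, Cout);
  wire xy, xc, yc;

  xor g1 (S, x, y, Cin);
  and g2 (xy, x, y);
  and g3 (xc, x, Cin);
  and g4 (yc, y, Cin);
  or  g5 (Cout, xy, xc, yc);
endmodule
```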
Using Sub-modules : Gate-level Primitives
Gate-Level primitives:
      output S, Cout;
      wire S, C;
      assign S = x ^ y ^ Cin;
      assign Cout = (x & y)|(x & Cin)|(y & Cin);
    endmodule
[Figure: ripple-carry adder chain from Cin to Cout with internal carries C[1] to C[3] and sum bits S[0] to S[3].]
      defparam stage1.n = 2;
      RippleCarryAdderI stage1 (.Cin(C), .X(X[4:3]), .Y(Y[4:3]), .S(S[4:3]), .Cout(Cout));
    endmodule
Sub-modules Instantiation
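Example:
[Figure: DUT containing three instances of M1 (with length 6, 3, and 10) connected through wires w1, w2, and w3 between inputs IN_0 to IN_2 and outputs OUT_0 and OUT_2. M1 has ports in1, in2, out1, and out2 and declares parameter length = 10.]
      defparam stage0.length = 6;
      M1 stage0 (IN[0], IN[1], w1, w2);                               // ports connected by position
      defparam stage1.length = 3;
      M1 stage1 (.in1(w1), .in2(IN[2]), .out2(w3), .out1(OUT[2]));    // ports connected by name
    endmodule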
Function Construct
A function may be used to write modular code without defining separate modules
A function can have multiple inputs but has no output ports; it returns a single value through its own name
    // Identifiers cannot contain hyphens, so the function is named my4_to_1MUX
    function my4_to_1MUX;
      input [0:3] W;
      input [1:0] s;
      if (s==0) my4_to_1MUX = W[0];
      else if (s==1) my4_to_1MUX = W[1];
      else if (s==2) my4_to_1MUX = W[2];
      else if (s==3) my4_to_1MUX = W[3];
    endfunction
    always@ (W, S)
    begin
      M[0] = my4_to_1MUX(W[0:3],S[1:0]);
      M[1] = my4_to_1MUX(W[4:7],S[1:0]);
      M[2] = my4_to_1MUX(W[8:11],S[1:0]);
      M[3] = my4_to_1MUX(W[12:15],S[1:0]);
      if (S[3:2]==0) Out= M[0];
      else if (S[3:2]==1) Out= M[1];
      else if (S[3:2]==2) Out= M[2];
      else if (S[3:2]==3) Out= M[3];
    end
    endmodule
Function Construct with multiple-bit output
Example:
module test_fcn (a, b, c, Out); module test_fcn (a, b, c, Out);
input a, b, c; input a, b, c;
output reg [2:0] Out; output [2:0] Out;
A task can only be called from inside an always (or initial) block
    // Identifiers cannot start with a digit or contain hyphens,
    // so the task is named mux4_to_1
    task mux4_to_1;
      input [0:3] W;
      input [1:0] s;
      output Result;
      begin
        if (s==0) Result = W[0];
        else if (s==1) Result = W[1];
        else if (s==2) Result = W[2];
        else if (s==3) Result = W[3];
      end
    endtask
    always@ (W, S)
    begin
      mux4_to_1(W[0:3], S[1:0], M[0]);
      mux4_to_1(W[4:7], S[1:0], M[1]);
      mux4_to_1(W[8:11], S[1:0], M[2]);
      mux4_to_1(W[12:15], S[1:0], M[3]);
      mux4_to_1(M[0:3], S[3:2], Out);
    end
    endmodule
HDL for Synthesis (Priority logic)
The order in which assignments are written in an always block may affect the logic that is synthesized (both conditions in if and else if can be true!)
Example: s0 has priority over s1:
    always @ (s0, s1, d0, d1)
    begin
      Q = 0;
      if (s0) Q = d0;
      else if (s1) Q = d1;
    end
Different logic: s1 has priority over s0:
    always @ (s0, s1, d0, d1)
    begin
      Q = 0;
      if (s1) Q = d1;
      else if (s0) Q = d0;
    end
[Figure: each version synthesizes to a chain of two 2-to-1 multiplexers; the order of the multiplexers in the chain differs.]
Neither of the above infers a latch. Why?
Example: Up & Down Counters
4-bit unsigned down-counter with synchronous set:
    module D_counter (C, S, Q);
      input C, S;
      output [3:0] Q;
      reg [3:0] tmp;
      always @(posedge C)
      begin
        if (S)
          tmp <= 4'b1111;
        else
          tmp <= tmp - 1'b1;
      end
      assign Q = tmp;
    endmodule
4-bit up-counter with asynchronous reset and modulo maximum:
    module U_counter (C, CLR, Q);
      parameter
        MAX_SQRT = 4,
        MAX = (MAX_SQRT*MAX_SQRT);
      input C, CLR;
      output [MAX_SQRT-1:0] Q;
      reg [MAX_SQRT-1:0] cnt;
      always @ (posedge C or posedge CLR)
      begin
        if (CLR)
          cnt <= 0;
        else
          cnt <= (cnt + 1) % MAX;
      end
      assign Q = cnt;
    endmodule
Accumulator
Accumulates multiple successive k-bit values and stores the sum in a k-bit register
The number of successive values (Num) is given as an input
[Figure: an adder sums In and Out; the result Sum is registered into Out (with Reset and Clk). A down counter loaded with Num generates the enable signal En.]
    module Accumulator (In, Num, Clk, Rst, Out);
      parameter k = 8;
      parameter m = 4;
      input [k-1:0] In;
      input [m-1:0] Num;
      input Clk, Rst;
      output reg [k-1:0] Out;
      wire [k-1:0] Sum;
      reg [m-1:0] C;
      wire En, Cout;
      defparam stage0.n = k;
      RippleCarryAdderI stage0 (.Cin(1'b0), .X(In), .Y(Out), .S(Sum), .Cout(Cout));
      always@ (posedge Clk, negedge Rst)
        if (Rst == 0)
        begin
          C <= Num;
          Out <= {k{1'b0}};
        end
        else if (En)
        begin
          C <= C-1;
          Out <= Sum;
        end
      assign En = |C;
    endmodule
Outline
Finite State Machine (FSM)
Used to implement control sequencing
An FSM is defined by
set of inputs
set of outputs
set of states
initial state
transition function
output function
[Figure: FSM block diagram. The inputs and the current state (CS) feed combinational logic that computes the next state (NS); flip-flops (FFs) hold CS; a second combinational block produces the outputs.]
      assign ……………       // output calculation
          CS <= A;
        else
          CS <= NS;        // sequential (non-blocking)
    endmodule
FSM
Example: Moore Machine
    module moore (Clk, w, Resetn, z);
      input Clk, w, Resetn;
      output z;                 // output: wire
      reg [1:0] CS, NS;
      parameter A = 2'b00, B = 2'b01, C = 2'b10;
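The rest of this example did not survive in these notes. As a sketch only, a complete Moore machine with the same ports and state encoding could look like the following. The module name moore_sketch and the state transitions are assumptions (here z becomes 1 once w has been 1 at two consecutive clock edges), not the course's original transition table.

```verilog
// Moore FSM sketch with the usual three parts:
// next-state logic, state register, and output logic.
module moore_sketch (Clk, w, Resetn, z);
  input Clk, w, Resetn;
  output z;
  reg [1:0] CS, NS;
  parameter A = 2'b00, B = 2'b01, C = 2'b10;

  // Next-state logic: combinational, blocking assignments.
  // The transitions below are assumed for illustration.
  always @ (w, CS)
    case (CS)
      A:       if (w) NS = B; else NS = A;
      B:       if (w) NS = C; else NS = A;
      C:       if (w) NS = C; else NS = A;
      default: NS = A;
    endcase

  // State register: sequential, non-blocking assignments,
  // with an asynchronous active-low reset to state A.
  always @ (posedge Clk or negedge Resetn)
    if (Resetn == 0)
      CS <= A;
    else
      CS <= NS;

  // Moore output: depends only on the current state.
  assign z = (CS == C);
endmodule
```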
Tri-State Logic in Verilog
Tri-state buffer:
Y = A when EN = 1; Y = Z (high impedance) when EN = 0
    module tri_buffer (A, Y, EN);
      input A, EN;
      output Y;
      assign Y = (EN) ? A : 1'bZ;
    endmodule
2. Half-duplex communication:
3. Bus multiplexing:
[Figure: two 8-bit inputs a and b drive the shared bus Out[7:0] through tri-state buffers selected by s.]
Tri-State Applications
Example: Adder with four options
    module tri_adder (a, b, c, d, S_ab, S_cd, Out);
      input S_ab, S_cd;
      input [7:0] a, b, c, d;
      output [8:0] Out;
      wire [7:0] p, q;
    endmodule
input Si, L, R;
input [7:0] In;
output [7:0] Out;
The lengths WI (integer part, including the sign bit) and WF (fractional part) are chosen based on the dynamic range of the variables
Total length: WI + WF
Verilog Operations: Fixed-Point Simulation
Typical word lengths:
Sign bit: 0: positive; 1: negative
Good for representing quantized numbers in the range: $[-2^{W_I-1},\; 2^{W_I-1} - 2^{-W_F}]$
Resolution: $2^{-W_F} = 1/2^{W_F}$
Example:
in (3,3), 011101 represents 3.625 (smallest number: 0.125)
in (3,5), 10111000 represents -2.25 (smallest number: 0.03125)
Fixed-Point Simulation: Rounding
Eliminates LSB bits
Need to reduce the number of bits due to word growth
For example, if we multiply two 5-bit words, the product will have 10 bits,
i.e., xxxxx × yyyyy = zzzzzzzzzz and we likely don’t want or need all that
precision
Matlab rounding:
round(∙): towards nearest integer
Pos. and neg. numbers are rounded symmetrically about zero
Generally the best possible rounding algorithm
fix(∙): truncates towards zero
Pos. and neg. numbers are rounded symmetrically about zero
floor(∙): rounds towards negative infinity
ceil(∙): rounds towards positive infinity
Fixed-Point Modeling: Casting
Care must be taken when dealing with fixed-point numbers
Casting: To convert a number with a larger bit length to a smaller one
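[Figure: a number A with word lengths (WI, WF) is cast to a shorter number B with word lengths (WI', WF'): excess integer bits are dropped on the MSB side and excess fractional bits on the LSB side.]
Examples (the binary point is shown for readability):
    (10,4) 0000110111.0100  →  (7,2) 0110111.01    (the value fits)
    (10,4) 0001110111.0100  →  (7,2) 0111111.11    (positive overflow: saturated to the largest value)
    (10,4) 1111110111.0100  →  (6,3) 110111.010    (the value fits)
    (10,4) 1101110111.0100  →  (6,3) 100000.000    (negative overflow: saturated to the smallest value)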
Fixed-Point Modeling: Sign Extension
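To convert a number with a smaller bit length to a larger one, sign extension is required.
[Figure: A (WI, WF) is extended to B: the integer part is padded with copies of the sign bit and the fractional part is padded with zeros.]
Example (the binary point is shown for readability):
    (6,3) 110111.010  →  (10,4) 1111110111.0100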
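For the signed addition C[n-1:0] = A[n-1:0] + B[n-1:0], overflow may happen if:
A[n-1]==1 and B[n-1]==1 and C[n-1]==0
A[n-1]==0 and B[n-1]==0 and C[n-1]==1
Examples (n = 4):
    0110 + 0111 = 1101     (two positive operands, negative result)
    1010 + 1001 = 10011    (two negative operands; the 4-bit result 0011 is positive)
    assign C = B + A;
    assign OV = (A[n-1]==1 && B[n-1]==1 && C[n-1]==0)||
                (A[n-1]==0 && B[n-1]==0 && C[n-1]==1);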
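A sketch of the overflow check wrapped in a parameterized module. The module name add_ov is illustrative; the port names follow the expressions above.

```verilog
// n-bit two's-complement adder with an overflow flag.
// OV is set when both operands have the same sign and the
// sign of the n-bit result differs from it.
module add_ov #(parameter n = 4) (
  input  [n-1:0] A, B,
  output [n-1:0] C,
  output         OV
);
  assign C  = A + B;
  assign OV = ( A[n-1] &  B[n-1] & ~C[n-1]) |
              (~A[n-1] & ~B[n-1] &  C[n-1]);
endmodule
```

With the default n = 4, the two examples above (0110 + 0111 and 1010 + 1001) both set OV.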
Value (single precision, normalized): $(-1)^{S} \times 1.M \times 2^{Exp-127}$
The most widely used form of floating-point is IEEE Standard for Binary
Floating-Point Arithmetic (IEEE 754) with two major formats:
Single-precision (32-bit)
Double-precision (64-bit)
Floating-point Implementation
A floating-point implementation incurs more complicated hardware than its fixed-point counterpart.
Consider, for example, a floating-point adder!
Additional logic is needed to perform the various normalization steps of the adder implementation.
Fixed-point vs. Floating-point
The area comparison for floating-point is further complicated because the relationship between multiplier and adder area changes.
In fixed-point, multipliers are generally viewed as N times bigger than adders, where N is the word length.
In floating-point, however, the area of an adder is comparable to that of a multiplier, which undermines the usual assumption at the algorithmic stage that multiplications should be reduced in favor of additions.
The table below gives area and speed figures for floating-point addition and multiplication implemented in a Xilinx Virtex-4 FPGA technology.
Verilog Operations: $signed and $unsigned
A = $signed(B)
Sign-extends B and assigns it to A
bit width(B) < bit width(A)
Example:
    wire [5:0] A;
    assign A = $signed(3'b110);      // A = 111110
A = $unsigned(B)
Zero-fills B and assigns it to A
bit width(B) < bit width(A)
Example:
    wire [5:0] A;
    assign A = $unsigned(3'b110);    // A = 000110
Verilog Operations: Signed Addition
There are two ways to perform signed addition (both give the same result):
1. Sign extension:
    wire [2:0] A, B;
    wire [3:0] SUM;
    assign SUM = {B[2],B} + {A[2],A};
2. Using signed signals:
    wire signed [2:0] A, B;
    wire signed [3:0] SUM;
    assign SUM = B + A;
Example: 1110 (-2) + 0010 (+2) = 10000; discarding the overflow bit gives 0000 (0)
Wrong otherwise:
    wire [2:0] A, B;
    wire [3:0] SUM;
    assign SUM = B + A;
Example: 110 (-2) + 010 (+2) = 1000 (-8) (wrong)
Verilog Operations: Signed Addition with Carry-in
There are two correct ways to perform signed addition with carry-in (both give the same result):
1. Sign extension:
    wire [2:0] A, B;
    wire Cin;
    wire [3:0] SUM;
    assign SUM = {B[2],B} + {A[2],A} + Cin;
2. Using signed signals:
    wire signed [2:0] A, B;
    wire Cin;
    wire signed [3:0] SUM;
    assign SUM = B + A + $signed({1'b0, Cin});
Example: 1110 (-2) + 0010 (+2) + 0001 (Cin) = 10001; discarding the overflow bit gives 0001 (+1)
Verilog Operations: Signed Addition with Carry-in
Incorrect Codes:
    wire signed [2:0] A, B;
    wire Cin;
    wire signed [3:0] SUM;
    assign SUM = B + A + Cin;
If any operand of an operation is unsigned, the entire operation is performed unsigned.
Example: 110 (-2) + 010 (+2) + 1 (Cin) = 1001 (9)
    wire signed [2:0] A, B;
    wire Cin;
    wire signed [3:0] SUM;
    assign SUM = B + A + $signed(Cin);
When Cin=1, $signed(Cin) is sign-extended to match the size of A and B, which is incorrect!
Example: 1110 (-2) + 0010 (+2) + 1111 (Cin) = 1111 (-1)
Verilog Operations: Signed Multiplication
Use signed construct as we used for signed addition:
1. Use Verilog constructs:
Complicated!
Verilog Operations: Signed Multiplication
Multiplication of a signed number and an unsigned number:
Correct:
    wire signed [2:0] A;
    wire [2:0] B;
    wire signed [5:0] PROD;
    assign PROD = A*$signed({1'b0,B});
Example: 110 (-2) × 111 (7) = 110010 (-14)
Incorrect (same declarations as above):
    assign PROD = A*$signed(B);
and
    assign PROD = A*B;
Multiplication by the constant 56 using shifts and additions:
56=32+16+8:
    input [7:0] in;
    wire [13:0] product;
    assign product = {in[7], in, 5'b00000}
                   + {in[7], in[7], in, 4'b0000}
                   + {in[7], in[7], in[7], in, 3'b000};
56=64-8:
    input [7:0] in;
    wire [13:0] product;
    assign product = {in, 6'b000000}
                   - {in[7], in[7], in[7], in, 3'b000};
Verilog Operations: Constant Multiplication
Multiplication with a set of constant numbers may be implemented more efficiently: $P = a \times b$, $b \in \{-7,-5,-3,-1,1,3,5,7\}$
[Figure: multiplexer tree controlled by the bits b[1] (LSB) to b[4] of the 4-bit constant b (0111, 0101, 0011, 0001, 1111, 1101, 1011, 1001), producing P = a × b.]
Verilog Operations: Constant Multiplication
Simpler way for implementation: $P = a \times b$, $b \in \{-7,-5,-3,-1,1,3,5,7\}$
    Multiplier      Area (um2)   Critical Path
    Constant MUL    1800         3.5
    Normal MUL      12000        5.1
[Figure: constant multiplier (C.M.) built from shifted copies of a (0, <<1, <<2, <<3) that are selected by multiplexers on the bits of b and summed by an adder.]
Verilog Operations: Complex Multiplication
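A complex multiplication is equivalent to four real multiplications: $(a + jb)(c + jd) = (ac - bd) + j(ad + bc)$
[Figure: complex multiplier data path with products ac and bd, their sum ac+bd and difference ac-bd (the real part), and the product of a+b and c+d.]
Pipelined Complex Multiplication
Pipelined Implementation:
$(a + jb)(c + jd) = (ac - bd) + j[(a + b)(c + d) - (ac + bd)]$
This form needs only three real multiplications, since $(a + b)(c + d) - (ac + bd) = ad + bc$.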
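A combinational (non-pipelined) sketch of the three-multiplier form. The module name cmul3, the port names re and im, and the default 8-bit operand width are assumptions; the intermediate widths are chosen so that no intermediate result overflows.

```verilog
// Complex multiplication (a + jb)(c + jd) with three real multipliers:
//   re = ac - bd
//   im = (a + b)(c + d) - ac - bd
module cmul3 #(parameter n = 8) (
  input  signed [n-1:0] a, b, c, d,
  output signed [2*n:0] re, im
);
  wire signed [n:0]     ab = a + b;     // one extra bit for the pre-addition
  wire signed [n:0]     cd = c + d;
  wire signed [2*n-1:0] ac = a * c;
  wire signed [2*n-1:0] bd = b * d;
  wire signed [2*n+1:0] m3 = ab * cd;   // third multiplier

  assign re = ac - bd;
  assign im = m3 - ac - bd;
endmodule
```

For example, a = 3, b = -2, c = 4, d = 5 corresponds to (3 - 2j)(4 + 5j) = 22 + 7j. A pipelined version would add registers after the multipliers.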
Squaring
$x^2$ can be done with about half the hardware of a full multiply (for a dedicated squaring block, of course)
[Figure: 4 × 4 partial-product array for squaring x3 x2 x1 x0.]
Diagonal terms (x0·x0, x1·x1, …) can be replaced by the single input bit with no computation for that bit, because x0 AND x0 = x0
Resource-Shared Complex Multiplication
General Architecture:
[Figure: a data path with n inputs and m outputs, steered by a controller; both are clocked by Clk.]
Data Path:
Transfers input data signals into outputs
Normally combinational logic or counters
Controller:
Provides the control signals that determine the direction of data flow
Examples: Reset, set, MUX select signals, …
Sequential logic
Resource-Shared Complex Multiplication
HDL Code:
Control Path:
1. a_r * b_r → pp1_reg
2. a_i * b_i → pp2_reg
3. pp1 - pp2 → p_r_reg; a_r * b_i → pp1_reg
4. a_i * b_r → pp2_reg
5. pp1 + pp2 → p_i_reg
[Figure: single-port synchronous RAM (mem) with inputs Clk, address, data, cs, we, and oe. A write occurs when cs & we; data_out is driven on a read, when cs & !we & oe.]
Verilog Memories: Dual-Port RAM
256-Byte DPRAM:
Two separate read/write operations
[Figure: dual-port synchronous RAM (mem_dual). Port 0 has address0, data0, cs0, we0, oe0, and data_out0; port 1 has address1, data1, cs1, we1, oe1, and data_out1. A write on a port occurs when its cs & we is true; its data_out is driven when cs & !we & oe is true.]
FIFO
First-In/First-Out buffer
Connecting producer and consumer
Decouples rates of production/consumption
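A minimal sketch of a synchronous FIFO along these lines. The module name fifo_sketch, the port names, the synchronous active-high Rst, and the pointer-based full/empty scheme are all assumptions made for illustration, not the course's implementation.

```verilog
// Synchronous FIFO sketch with 2**ADDR entries of WIDTH bits.
// The read and write pointers carry one extra bit so that
// "full" can be told apart from "empty".
module fifo_sketch #(parameter WIDTH = 8, parameter ADDR = 2) (
  input                  Clk, Rst,
  input                  wr_en, rd_en,
  input      [WIDTH-1:0] din,
  output reg [WIDTH-1:0] dout,
  output                 full, empty
);
  reg [WIDTH-1:0] mem [0:(1<<ADDR)-1];
  reg [ADDR:0]    wr_ptr, rd_ptr;

  assign empty = (wr_ptr == rd_ptr);
  assign full  = (wr_ptr[ADDR] != rd_ptr[ADDR]) &&
                 (wr_ptr[ADDR-1:0] == rd_ptr[ADDR-1:0]);

  always @ (posedge Clk)
    if (Rst) begin
      wr_ptr <= 0;
      rd_ptr <= 0;
    end else begin
      if (wr_en && !full) begin          // producer side
        mem[wr_ptr[ADDR-1:0]] <= din;
        wr_ptr <= wr_ptr + 1'b1;
      end
      if (rd_en && !empty) begin         // consumer side
        dout   <= mem[rd_ptr[ADDR-1:0]];
        rd_ptr <= rd_ptr + 1'b1;
      end
    end
endmodule
```

The producer writes whenever full is low and the consumer reads whenever empty is low, so the two sides can run at different rates as long as the buffer neither overflows nor underflows.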
Masked ROM
Data manufactured into the ROM
When there are multiple assignments to the same variable in an always block, the last assignment takes effect
Example:
    module DUT(Count);
      output reg [2:0] Count;
      integer k;
      always @ (*)
      begin
        Count <= 0;
        for (k=0; k<4; k=k+1)
          Count <= Count + k;
      end
    endmodule
is equivalent to
    module DUT(Count);
      output reg [2:0] Count;
      integer k;
      always @ (*)
        Count <= Count + 3;
    endmodule
[Figure: a counter that adds 3.]
Reviews and Notes
Two codes with different simulation results might have the same synthesized circuit
Therefore, to avoid a mismatch between simulation and the synthesized version, the sensitivity list of an always block should include all the signals on the RHS
Coding Styles
Do not mix blocking and non-blocking assignments in an always block
Use parentheses to optimize logic structure
Use meaningful names for signals, variables, and modules
Define if-else and case statements explicitly to avoid latch inference
Multiple procedural assignments (inside an always block) to a single variable are allowed; the last assignment takes effect.
Multiple continuous assignments (assign) to a single net are NOT allowed.
Do not mix edge-sensitive and level-sensitive elements together
Use assign statements for simple combinational logic and always blocks for complex combinational logic
Avoid mixing positive-edge and negative-edge triggered flip-flops in one design, since this complicates timing closure
Coding Styles : Parentheses
Difference b/w HDL and HLL (1)
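In an HLL (high-level language), assignment order is important
In an HDL, for “assign” and “non-blocking” assignments, order is NOT important
Example:
    wire na, nb;
    assign y = na|nb;
[Figure: gate-level schematic with inputs a, b, and s, internal nets na and nb, and output y.]
Example: two assignments to the same net:
    wire na;
    assign na = b&s;
    assign na = a&~s;
HLL result: na = a&~s (the last assignment wins)
HDL: illegal (multiple drivers on one net are only used for tri-state implementation)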