Digital Design With SystemVerilog


Digital Design with SystemVerilog

Prof. Stephen A. Edwards


Columbia University

Spring 2014

Synchronous Digital Design
Combinational Logic
Sequential Logic
Summary of Modeling Styles
Example: Bresenham's Line Algorithm
Testbenches

Why HDLs?
1970s: SPICE transistor-level netlists
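An XOR built from four NAND gates:

    .MODEL P PMOS
    .MODEL N NMOS
    .SUBCKT NAND A B Y Vdd Vss
    M1 Y A Vdd Vdd P
    M2 Y B Vdd Vdd P
    M3 Y A X Vss N
    M4 X B Vss Vss N
    .ENDS
    X1 A B I1 Vdd 0 NAND
    X2 A I1 I2 Vdd 0 NAND
    X3 B I1 I3 Vdd 0 NAND
    X4 I2 I3 Y Vdd 0 NAND

[Figure: transistor-level schematic of the NAND gate (inputs A and B, output Y, supplies Vdd and Vss) and gate-level schematic of the XOR, in which X1 drives I1, X2 drives I2, X3 drives I3, and X4 drives Y.]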

Why HDLs?
1980s: Graphical schematic capture programs

Why HDLs?
1990s: HDLs and Logic Synthesis
    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity ALU is
      port(A:   in  unsigned(1 downto 0);
           B:   in  unsigned(1 downto 0);
           Sel: in  unsigned(1 downto 0);
           Res: out unsigned(1 downto 0));
    end ALU;

    architecture behv of ALU is
    begin
      process (A,B,Sel) begin
        case Sel is
          when "00"   => Res <= A + B;
          when "01"   => Res <= A + (not B) + 1;
          when "10"   => Res <= A and B;
          when "11"   => Res <= A or B;
          when others => Res <= "XX";
        end case;
      end process;
    end behv;

Separate but Equal: Verilog and VHDL

Verilog: More succinct, really messy
VHDL: Verbose, overly flexible, fairly messy
The parts of the languages people actually use are identical
Every synthesis system supports both
SystemVerilog: a newer version of Verilog that supports many more features

Synchronous Digital Design

The Synchronous Digital Logic Paradigm


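Gates and D flip-flops only
No level-sensitive latches
All flip-flops driven by the same clock
No other clock signals
Every cyclic path contains at least one flip-flop
No combinational loops

[Figure: combinational logic (CL) maps INPUTS and the current STATE to OUTPUTS and the NEXT STATE; a register driven by CLOCK feeds NEXT STATE back as STATE.]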

Timing in Synchronous Circuits


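[Figure: flip-flop output Q passes through combinational logic (CL) to flip-flop input D, both clocked by CLK; the timing diagram shows CLK, Q, and D over one clock period tc.]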

tc : Clock period. E.g., 10 ns for a 100 MHz clock

Timing in Synchronous Circuits


[Figure: the same circuit; the timing diagram of CLK, Q, and D marks tp(min,FF) and tp(min,CL) after the clock edge and asks "Sufficient hold time?"]

Hold time constraint: how soon after the clock edge can D start changing? Min. FF delay + min. logic delay

Timing in Synchronous Circuits


[Figure: the same circuit; the timing diagram of CLK, Q, and D marks tp(max,FF) and tp(max,CL) after the clock edge and asks "Sufficient setup time?"]

Setup time constraint: when before the clock edge is D guaranteed stable? Max. FF delay + max. logic delay
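
As a rough illustration of the two constraints, the C sketch below checks one register-to-register path. The flip-flop setup and hold requirements (`t_setup`, `t_hold`) and every struct, field, and function name are assumptions introduced for this example; the slides name only the clock period and the minimum and maximum flip-flop and logic delays.

```c
#include <assert.h>
#include <stdbool.h>

/* Delays of one register-to-register path, all in picoseconds.
   t_setup and t_hold are assumed flip-flop requirements; they are
   not given on the slides. */
struct path_timing {
    long tc;         /* clock period */
    long tp_min_ff;  /* min. flip-flop clock-to-Q delay */
    long tp_max_ff;  /* max. flip-flop clock-to-Q delay */
    long tp_min_cl;  /* min. combinational logic delay */
    long tp_max_cl;  /* max. combinational logic delay */
    long t_setup;    /* D must be stable this long before the edge */
    long t_hold;     /* D must stay stable this long after the edge */
};

/* Setup: the slowest path must settle t_setup before the next edge. */
bool setup_ok(const struct path_timing *p)
{
    assert(p->tc > 0);
    return p->tp_max_ff + p->tp_max_cl + p->t_setup <= p->tc;
}

/* Hold: the fastest path must not disturb D until t_hold after the edge. */
bool hold_ok(const struct path_timing *p)
{
    return p->tp_min_ff + p->tp_min_cl >= p->t_hold;
}
```

For the 100 MHz clock mentioned earlier, `tc` would be 10000 ps. Note that only the setup check depends on the clock period; a hold violation cannot be fixed by slowing the clock.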

Combinational Logic

Full Adder
    // Full adder
    module full_adder(input logic a, b, c,
                      output logic sum, carry);

      assign sum = a ^ b ^ c;
      assign carry = a & b | a & c | b & c;

    endmodule

Annotations:
// Full adder: single-line comment
module: systems are built from modules
full_adder: module name
input: input port
logic: data type, single bit
a: port name
assign: continuous assignment expresses combinational logic
a & b | a & c | b & c: logical expression

[Figure: synthesized schematic producing sum and carry from a, b, and c.]
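
A quick way to convince yourself that the two expressions really add three bits is to model them in software. The C sketch below mirrors the two `assign` statements; the struct and function names are made up for this illustration.

```c
#include <assert.h>

/* Software model of the full adder: every input and output is one bit. */
struct adder_out { unsigned sum, carry; };

struct adder_out full_adder(unsigned a, unsigned b, unsigned c)
{
    assert(a <= 1 && b <= 1 && c <= 1);
    struct adder_out r;
    r.sum   = a ^ b ^ c;                   /* 1 when an odd number of inputs are 1 */
    r.carry = (a & b) | (a & c) | (b & c); /* 1 when at least two inputs are 1 */
    return r;
}
```

For every input combination, `sum + 2 * carry` should equal `a + b + c`.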

Operators and Vectors


    module gates(input logic [3:0] a, b,
                 output logic [3:0] y1, y2, y3, y4, y5);

      /* Five groups of two-input logic gates
         acting on 4-bit busses */
      assign y1 = a & b;    // AND
      assign y2 = a | b;    // OR
      assign y3 = a ^ b;    // XOR
      assign y4 = ~(a & b); // NAND
      assign y5 = ~(a | b); // NOR

    endmodule

Annotations:
[3:0]: four-bit vector, little-endian style
/* ... */: multi-line comment

Reduction AND Operator

    module and8(input logic [7:0] a,
                output logic y);

      assign y = &a; // Reduction AND

      // Equivalent to
      // assign y = a[7] & a[6] & a[5] & a[4] &
      //            a[3] & a[2] & a[1] & a[0];

      // Also
      // ~&a  NAND
      // |a   OR
      // ~|a  NOR
      // ^a   XOR
      // ~^a  XNOR

    endmodule
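
C has no reduction operators, so modeling them makes explicit what each one computes. This is a minimal sketch for an eight-bit operand; the function names are invented for the example.

```c
#include <assert.h>  /* for callers that check the results */
#include <stdint.h>

/* Each function folds all eight bits of a into a single bit. */

/* &a: 1 only if every bit is 1 */
unsigned reduce_and(uint8_t a) { return a == 0xFF; }

/* |a: 1 if any bit is 1 */
unsigned reduce_or(uint8_t a) { return a != 0; }

/* ^a: 1 if an odd number of bits are 1 (parity) */
unsigned reduce_xor(uint8_t a)
{
    unsigned parity = 0;
    for (int i = 0; i < 8; i++)
        parity ^= (a >> i) & 1u;
    return parity;
}
```

The NAND, NOR, and XNOR reductions are the logical complements of these three.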

The Conditional Operator: A Two-Input Mux

    module mux2(input logic [3:0] d0, d1,
                input logic s,
                output logic [3:0] y);

      // Array of two-input muxes
      assign y = s ? d1 : d0;
    endmodule

[Figure: synthesized schematic of four two-input muxes, one per bit of y, each selecting between d0 and d1 under control of s.]

Operators in Precedence Order


!c  -c  &c  ~&c          NOT, Negate, Reduction AND, NAND
|c  ~|c  ^c  ~^c         OR, NOR, XOR, XNOR
a*b  a/b  a%b            Multiply, Divide, Modulus
a+b  a-b                 Add, Subtract
a<<b  a>>b               Logical Shift
a<<<b  a>>>b             Arithmetic Shift
a<b  a<=b  a>b  a>=b     Relational
a==b  a!=b               Equality
a&b  a^&b                AND
a^b  a~^b                XOR, XNOR
a|b                      OR
a?b:c                    Conditional
{a,b,c,d}                Concatenation

An XOR Built Hierarchically


    module mynand2(input logic a, b,
                   output logic y);
      assign y = ~(a & b);
    endmodule

    module myxor2(input logic a, b,
                  output logic y);
      logic abn, aa, bb;

      mynand2 n1(a, b, abn),
              n2(a, abn, aa),
              n3(abn, b, bb),
              n4(aa, bb, y);
    endmodule

Annotations:
logic abn, aa, bb: declare internal wires
n1: a mynand2 connected to a, b, and abn

[Figure: synthesized schematic of the four mynand2 instances n1 to n4 connecting a and b to y.]

A Decimal-to-Seven-Segment Decoder
    module dec7seg(input logic [3:0] a,
                   output logic [6:0] y);

      always_comb
        case (a)
          4'd0:    y = 7'b111_1110;
          4'd1:    y = 7'b011_0000;
          4'd2:    y = 7'b110_1101;
          4'd3:    y = 7'b111_1001;
          4'd4:    y = 7'b011_0011;
          4'd5:    y = 7'b101_1011;
          4'd6:    y = 7'b101_1111;
          4'd7:    y = 7'b111_0000;
          4'd8:    y = 7'b111_1111;
          4'd9:    y = 7'b111_0011;
          default: y = 7'b000_0000;
        endcase
    endmodule

Annotations:
always_comb: combinational logic in an imperative style
case: multiway conditional
4'd5: decimal 5 as a four-bit binary number
7'b101_1011: seven-bit binary vector (_ is ignored)
=: blocking assignment; use in always_comb
default: mandatory

Verilog Numbers

16'h8_0F
16: number of bits
h: base (b, o, d, or h)
8_0F: value; _s are ignored, and the value is zero-padded to the stated width

4'b1010 = 4'o12 = 4'd10 = 4'ha
16'h4840 = 16'b100_1000_0100_0000
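
To make the width, base, and value fields concrete, here is a small C sketch that decodes such a literal. It is a deliberately simplified model, not a full SystemVerilog lexer: the function name is hypothetical, widths are limited to 32 bits, no spaces, x, or z digits are accepted, and values too large for the width are truncated to it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Decode "<width>'<base><digits>", e.g. "16'h8_0F".
   Returns 0 on success and -1 on a malformed literal. */
int parse_sized_literal(const char *s, unsigned *width, uint32_t *value)
{
    assert(s && width && value);
    char *end;
    unsigned long w = strtoul(s, &end, 10);      /* number of bits */
    if (end == s || w == 0 || w > 32 || *end != '\'')
        return -1;

    unsigned radix;
    switch (end[1]) {                            /* base letter */
    case 'b': radix = 2;  break;
    case 'o': radix = 8;  break;
    case 'd': radix = 10; break;
    case 'h': radix = 16; break;
    default:  return -1;
    }

    uint64_t v = 0;
    int ndigits = 0;
    for (const char *p = end + 2; *p; p++) {
        unsigned digit;
        if (*p == '_') continue;                 /* _s are ignored */
        if (*p >= '0' && *p <= '9')      digit = (unsigned)(*p - '0');
        else if (*p >= 'a' && *p <= 'f') digit = (unsigned)(*p - 'a') + 10;
        else if (*p >= 'A' && *p <= 'F') digit = (unsigned)(*p - 'A') + 10;
        else return -1;
        if (digit >= radix) return -1;
        v = v * radix + digit;
        if (v > 0xFFFFFFFFull) return -1;        /* beyond this sketch's range */
        ndigits++;
    }
    if (ndigits == 0) return -1;

    uint32_t mask = (w == 32) ? 0xFFFFFFFFu : ((1u << w) - 1u);
    *width = (unsigned)w;
    *value = (uint32_t)v & mask;                 /* short values are zero-padded */
    return 0;
}
```

Decoding `4'b1010`, `4'o12`, `4'd10`, and `4'ha` with this function yields the same value, ten, in each case.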

Imperative Combinational Logic


    module comb1(input logic [3:0] a, b,
                 input logic s,
                 output logic [3:0] y);

      always_comb
        if (s)
          y = a + b;
        else
          y = a & b;
    endmodule

[Figure: synthesized schematic with adder Add0, four AND gates, and a four-bit mux controlled by s driving y.]

Both a + b and a & b computed, mux selects the result.

Imperative Combinational Logic


    module comb2(input logic [3:0] a, b,
                 input logic s, t,
                 output logic [3:0] y);

      always_comb
        if (s)
          y = a + b;
        else if (t)
          y = a & b;
        else
          y = a | b;
    endmodule

[Figure: synthesized schematic with adder Add0, AND gates, OR gates, and two cascaded stages of muxes controlled by t and s driving y.]

All three expressions computed in parallel. Cascaded muxes implement priority (s over t).

s  t  y
1  -  a+b
0  1  a&b
0  0  a|b

(- means the value of t does not matter.)
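
The same structure can be sketched in C to show what "computed in parallel, then selected" means. The function below is an illustrative model of comb2, not generated from the hardware; the local names are invented.

```c
#include <assert.h>
#include <stdint.h>

/* Model of comb2: a and b are 4-bit values, s and t are one-bit selects. */
uint8_t comb2(uint8_t a, uint8_t b, unsigned s, unsigned t)
{
    assert(a < 16 && b < 16);

    /* All three candidate results exist at once, as in the hardware. */
    uint8_t sum    = (uint8_t)((a + b) & 0xF);  /* 4-bit adder drops the carry */
    uint8_t and_ab = a & b;
    uint8_t or_ab  = a | b;

    /* Two cascaded two-input muxes: s takes priority over t. */
    uint8_t inner = t ? and_ab : or_ab;
    return s ? sum : inner;
}
```

Because the outer mux is controlled by s, the value of t is irrelevant whenever s is 1, which is exactly the priority shown in the table.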

Imperative Combinational Logic


    module comb3(input logic [3:0] a, b,
                 input logic s, t,
                 output logic [3:0] y, z);

      always_comb begin
        z = 4'b0;
        if (s) begin
          y = a + b;
          z = a - b;
        end else if (t) begin
          y = a & b;
          z = a + b;
        end else
          y = a | b;
      end
    endmodule

[Figure: synthesized schematic with arithmetic blocks Add0 and Add1 and two groups of cascaded muxes, one driving y and one driving z.]

Separate mux cascades for y and z. One copy of a + b.

An Address Decoder
    module adecode(input logic [15:0] address,
                   output logic RAM, ROM,
                   output logic VIDEO, IO);

      always_comb begin
        {RAM, ROM, VIDEO, IO} = 4'b0;
        if (address[15])
          RAM = 1;
        else if (address[14:13] == 2'b00)
          VIDEO = 1;
        else if (address[14:12] == 3'b101)
          IO = 1;
        else if (address[14:13] == 2'b11)
          ROM = 1;
      end
    endmodule

Annotations:
{RAM, ROM, VIDEO, IO}: vector concatenation
= 4'b0: default, all zeros
address[15]: select bit 15
address[14:12]: select bits 14, 13, & 12

Omitting defaults for RAM, etc. will give the error "construct does not infer purely combinational logic."
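
To see which address ranges each select line covers, the decoder can be modeled in C. This is an illustrative sketch: the `SEL_*` constants and the packing of the four outputs into one value are conventions chosen for the example.

```c
#include <assert.h>  /* for callers that check the results */
#include <stdint.h>

/* One bit per output, packed in the order {RAM, ROM, VIDEO, IO}. */
enum { SEL_RAM = 8, SEL_ROM = 4, SEL_VIDEO = 2, SEL_IO = 1 };

unsigned adecode(uint16_t address)
{
    unsigned sel = 0;                          /* default: all zeros */
    if (address >> 15)                         /* bit 15 set */
        sel = SEL_RAM;
    else if (((address >> 13) & 0x3) == 0x0)   /* bits 14:13 == 2'b00 */
        sel = SEL_VIDEO;
    else if (((address >> 12) & 0x7) == 0x5)   /* bits 14:12 == 3'b101 */
        sel = SEL_IO;
    else if (((address >> 13) & 0x3) == 0x3)   /* bits 14:13 == 2'b11 */
        sel = SEL_ROM;
    return sel;
}
```

Under this model RAM occupies 0x8000 to 0xFFFF, VIDEO 0x0000 to 0x1FFF, IO 0x5000 to 0x5FFF, and ROM 0x6000 to 0x7FFF; any other address selects nothing, which is why the all-zeros default matters.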

Sequential Logic

A D-Flip-Flop
    module mydff(input logic clk,
                 input logic d,
                 output logic q);

      always_ff @(posedge clk)
        q <= d;

    endmodule

Annotations:
always_ff: introduces sequential logic
@(posedge clk): triggered by the rising edge of clk
q <= d: copy d to q
<=: non-blocking assignment, happens "just after" the rising edge

[Figure: a single D flip-flop, q~reg0, with input d, clock clk, and output q.]

A Four-Bit Binary Counter

    module count4(input logic clk,
                  output logic [3:0] count);

      always_ff @(posedge clk)
        count <= count + 4'd1;

    endmodule

Annotations:
4'd1: width optional but good style

[Figure: adder Add0 feeding the four-bit count register clocked by clk.]

A Decimal Counter with Reset, Hold, and Load


    module dec_counter(input logic clk,
                       input logic reset, hold, load,
                       input logic [3:0] d,
                       output logic [3:0] count);

      always_ff @(posedge clk)
        if (reset)               count <= 4'd0;
        else if (load)           count <= d;
        else if (~hold)
          if (count == 4'd9)     count <= 4'd0;
          else                   count <= count + 4'd1;
    endmodule

[Figure: synthesized schematic with comparator Equal0 (count against 4'h9), adder Add0, cascaded muxes controlled by hold, load, and reset, and the four-bit count register.]
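
The priority among reset, load, and hold is easiest to check with a next-state function. The following C sketch models what count becomes at the next rising clock edge; the function name is invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Value the count register takes at the next rising clock edge. */
uint8_t dec_counter_next(uint8_t count, unsigned reset, unsigned hold,
                         unsigned load, uint8_t d)
{
    assert(count < 16 && d < 16);
    if (reset) return 0;          /* highest priority */
    if (load)  return d;
    if (hold)  return count;      /* no assignment: the register keeps its value */
    return count == 9 ? 0 : (uint8_t)((count + 1) & 0xF);
}
```

Note that load wins over hold, and that a count above 9 (reachable only through load) keeps incrementing and wraps at 15 like a plain four-bit counter.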

Moore and Mealy Finite-State Machines

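[Figure: Inputs and Current State feed the Next State Logic; a register clocked by CLK turns Next State into Current State; the Output Logic computes Outputs from Current State alone.]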

The Moore Form: Outputs are a function of only the current state.

Moore and Mealy Finite-State Machines

[Figure: the same structure, except that the Output Logic computes Outputs from both Current State and Inputs.]

The Mealy Form: Outputs may be a function of both the current state and the inputs. A mnemonic: Moore machines often have more states.

Moore-style: Sequential Next-State Logic


    module moore_tlc(input logic clk, reset,
                     input logic advance,
                     output logic red, yellow, green);

      enum logic [2:0] {R, Y, G} state; // Symbolic state names

      always_ff @(posedge clk)  // Moore-style next-state logic
        if (reset)             state <= R;
        else case (state)
          R: if (advance)      state <= G;
          G: if (advance)      state <= Y;
          Y: if (advance)      state <= R;
          default:             state <= R;
        endcase

      // Combinational output logic
      // separated from next-state logic
      assign red    = state == R;
      assign yellow = state == Y;
      assign green  = state == G;
    endmodule

Mealy-style: Combinational output/next state logic


    module mealy_tlc(input logic clk, reset,
                     input logic advance,
                     output logic red, yellow, green);

      typedef enum logic [2:0] {R, Y, G} state_t;
      state_t state, next_state;

      always_ff @(posedge clk)
        state <= next_state;

      always_comb begin // Mealy-style next state and output logic
        {red, yellow, green} = 3'b0; // Default: all off and
        next_state = state;          // hold state
        if (reset) next_state = R;
        else case (state)
          R: begin red = 1;
                   if (advance) next_state = G; end
          G: begin green = 1;
                   if (advance) next_state = Y; end
          Y: begin yellow = 1;
                   if (advance) next_state = R; end
          default: next_state = R;
        endcase
      end
    endmodule
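
The split between next-state logic and output logic maps naturally onto two pure functions. The C sketch below models the Moore-style controller (moore_tlc) only; the function names are invented, and the enum simply reuses the state names from the slides.

```c
#include <assert.h>

enum tlc_state { R, Y, G };

/* Next-state logic: the state the register holds after the next clock edge. */
enum tlc_state tlc_next(enum tlc_state state, int reset, int advance)
{
    if (reset) return R;
    switch (state) {
    case R:  return advance ? G : R;
    case G:  return advance ? Y : G;
    case Y:  return advance ? R : Y;
    default: return R;               /* recover from an illegal encoding */
    }
}

/* Output logic: a function of the current state only (Moore form). */
void tlc_outputs(enum tlc_state state, int *red, int *yellow, int *green)
{
    assert(red && yellow && green);
    *red    = (state == R);
    *yellow = (state == Y);
    *green  = (state == G);
}
```

In mealy_tlc, by contrast, the lights are all off while reset is asserted, so its outputs could not be written as a function of the state alone.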

Blocking vs. Nonblocking assignment


    module nonblock(input clk,
                    input logic a,
                    output logic d);
      logic b, c;

      always_ff @(posedge clk) begin
        b <= a;
        c <= b;
        d <= c;
      end
    endmodule

Nonblocking assignment: all run on the clock edge.

[Figure: three flip-flops in series, from a through b and c to d~reg0, all clocked by clk.]

    module blocking(input clk,
                    input logic a,
                    output logic d);
      logic b, c;

      always_ff @(posedge clk) begin
        b = a;
        c = b;
        d = c;
      end
    endmodule

Blocking assignment: effect felt by the next statement.

[Figure: a single flip-flop, d~reg0, from a to d, clocked by clk.]
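
The difference can be mimicked in C by controlling when each right-hand side is read. This is only a sketch of the scheduling semantics for these two modules; the struct and function names are invented.

```c
#include <assert.h>  /* for callers that check the results */

struct regs { unsigned b, c, d; };

/* Nonblocking: every right-hand side is read from the old values,
   then all registers update together. */
struct regs nonblock_edge(struct regs old, unsigned a)
{
    struct regs next;
    next.b = a;
    next.c = old.b;
    next.d = old.c;
    return next;
}

/* Blocking: each assignment is visible to the statement after it. */
struct regs blocking_edge(struct regs old, unsigned a)
{
    struct regs next = old;
    next.b = a;
    next.c = next.b;
    next.d = next.c;
    return next;
}
```

Starting from all zeros with a held at 1, `nonblock_edge` needs three clock edges before d becomes 1, while `blocking_edge` sets d after a single edge.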

Summary of Modeling Styles

A Contrived Example
    module styles_tlc(input logic clk, reset,
                      input logic advance,
                      output logic red, yellow, green);

      enum logic [2:0] {R, Y, G} state;

      always_ff @(posedge clk)         // Imperative sequential
        if (reset)         state <= R; // Non-blocking assignment
        else case (state)              // Case
          R: if (advance)  state <= G; // If-else
          G: if (advance)  state <= Y;
          Y: if (advance)  state <= R;
          default:         state <= R;
        endcase

      always_comb begin                // Imperative combinational
        {red, yellow} = 2'b0;          // Blocking assignment
        if (state == R) red = 1;       // If-else
        case (state)                   // Case
          Y: yellow = 1;
          default: ;
        endcase
      end

      assign green = state == G;       // Cont. assign. (comb)
    endmodule

Example: Bresenham's Line Algorithm

Bresenham's Line Algorithm

Objective: Draw a line...
...with well-approximating pixels...
...by maintaining error information...
...encoded using integers

[Figure: a line drawn on a pixel grid, built up over four slides; pixels are labeled with their error (Error = 1/7, Error = 3/7) and, on the last slide, with integer values.]

Approach
1. Understand the algorithm
   I went to Wikipedia; doesn't everybody?
2. Code and test the algorithm in software
   I used C and the SDL library for graphics
3. Define the interface for the hardware module
   A communication protocol: consider the whole system
4. Schedule the operations
   Draw a timing diagram! In hardware, you must know in which cycle each thing happens.
5. Code in RTL
   Always envision the hardware you are asking for
6. Test in simulation
   Create a testbench: code that mimics the environment (e.g., generates clocks, inputs).
7. Test on the FPGA
   Simulating correctly is necessary but not sufficient.

The Pseudocode from Wikipedia


    function line(x0, y0, x1, y1)
      dx := abs(x1-x0)
      dy := abs(y1-y0)
      if x0 < x1 then sx := 1 else sx := -1
      if y0 < y1 then sy := 1 else sy := -1
      err := dx-dy

      loop
        setPixel(x0,y0)
        if x0 = x1 and y0 = y1 exit loop
        e2 := 2*err
        if e2 > -dy then
          err := err - dy
          x0 := x0 + sx
        end if
        if e2 < dx then
          err := err + dx
          y0 := y0 + sy
        end if
      end loop

My C Code
    void line(Uint16 x0, Uint16 y0, Uint16 x1, Uint16 y1)
    {
      Sint16 dx, dy;   // Width and height of bounding box
      Uint16 x, y;     // Current point
      Sint16 err;      // Loop-carried value
      Sint16 e2;       // Temporary variable
      int right, down; // Boolean

      dx = x1 - x0; right = dx > 0; if (!right) dx = -dx;
      dy = y1 - y0; down = dy > 0; if (down) dy = -dy;
      err = dx + dy; x = x0; y = y0;
      for (;;) {
        plot(x, y);
        if (x == x1 && y == y1) break; // Reached the end
        e2 = err << 1;                 // err * 2
        if (e2 > dy) { err += dy; if (right) x++; else x--; }
        if (e2 < dx) { err += dx; if (down) y++; else y--; }
      }
    }
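
For testing without a graphics library, the same loop can write its points into an array instead of calling plot(). The variant below is a hedged sketch along those lines: it replaces the SDL types with standard ones, and `struct point`, `line_points`, and the output-buffer convention are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct point { uint16_t x, y; };

/* Stores up to max points of the line in out and returns the total
   number of points on the line (which may exceed max). */
size_t line_points(uint16_t x0, uint16_t y0, uint16_t x1, uint16_t y1,
                   struct point *out, size_t max)
{
    assert(out != NULL || max == 0);

    int dx = (int)x1 - (int)x0;
    int right = dx > 0;
    if (!right) dx = -dx;          /* dx is now zero or positive */
    int dy = (int)y1 - (int)y0;
    int down = dy > 0;
    if (down) dy = -dy;            /* dy is now zero or negative */
    int err = dx + dy;
    uint16_t x = x0, y = y0;
    size_t n = 0;

    for (;;) {
        if (n < max) out[n] = (struct point){x, y};
        n++;
        if (x == x1 && y == y1) break;  /* reached the end */
        int e2 = err * 2;
        if (e2 > dy) { err += dy; if (right) x++; else x--; }
        if (e2 < dx) { err += dx; if (down) y++; else y--; }
    }
    return n;
}
```

Calling it for the line from (0,0) to (7,4) produces eight points, with x stepping from 0 to 7 and y taking the values 0, 1, 1, 2, 2, 3, 3, 4.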

Module Interface
    module bresenham(input logic clk, reset,
                     input logic start,
                     input logic [10:0] x0, y0, x1, y1,
                     output logic plot,
                     output logic [10:0] x, y,
                     output logic done);

start indicates (x0, y0) and (x1, y1) are valid
plot indicates (x,y) is a point to plot
done indicates we are ready for the next start

Scheduling: Timing Diagram


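[Figure: timing diagram of clk, (x0,y0), (x1,y1), start, state, done, plot, x, y, err, and dx, dy. The state goes IDLE, RUN, IDLE, RUN, IDLE while the module draws a line from (0,0) to (7,4), with dx, dy shown as 7, 4, and then a line from (5,3) to (6,4), with dx, dy shown as 1, 1.]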

RTL: The IDLE state


C code:

    Sint16 dx;
    Sint16 dy;
    Uint16 x, y;
    Sint16 err;
    Sint16 e2;
    int right;
    int down;

    dx = x1 - x0;
    right = dx > 0;
    if (!right) dx = -dx;
    dy = y1 - y0;
    down = dy > 0;
    if (down) dy = -dy;
    err = dx + dy;
    x = x0;
    y = y0;
    for (;;) {
      plot(x, y);

SystemVerilog:

    logic signed [11:0] dx, dy, err, e2;
    logic right, down;

    typedef enum logic {IDLE, RUN} state_t;
    state_t state;

    always_ff @(posedge clk) begin
      done <= 0;
      plot <= 0;
      if (reset) state <= IDLE;
      else case (state)
        IDLE:
          if (start) begin
            dx = x1 - x0; // Blocking!
            right = dx >= 0;
            if (~right) dx = -dx;
            dy = y1 - y0;
            down = dy >= 0;
            if (down) dy = -dy;
            err = dx + dy;
            x <= x0;
            y <= y0;
            plot <= 1;
            state <= RUN;
          end

RTL: The RUN state


C code:

    for (;;) {
      plot(x, y);
      if (x == x1 && y == y1)
        break;
      e2 = err << 1;
      if (e2 > dy) {
        err += dy;
        if (right) x++;
        else x--;
      }
      if (e2 < dx) {
        err += dx;
        if (down) y++;
        else y--;
      }
    }

SystemVerilog:

        RUN:
          if (x == x1 && y == y1) begin
            done <= 1;
            state <= IDLE;
          end else begin
            plot <= 1;
            e2 = err << 1;
            if (e2 > dy) begin
              err += dy;
              if (right) x <= x + 10'd1;
              else       x <= x - 10'd1;
            end
            if (e2 < dx) begin
              err += dx;
              if (down) y <= y + 10'd1;
              else      y <= y - 10'd1;
            end
          end

        default:
          state <= IDLE;
      endcase
    end

Datapath for dx, dy, right, and down


[The slide repeats the IDLE and RUN code from the two RTL slides, abbreviated as I: and R:.]

[Figure: datapath for dx, dy, right, and down. x1 and x0 feed a subtractor; comparing the difference with zero gives right, and a mux chooses between the difference and its negation to give dx. y1 and y0 are treated the same way to give down and dy.]

Datapath for err


[The slide repeats the IDLE and RUN code from the two RTL slides, abbreviated as I: and R:.]

[Figure: datapath for err. Outside RUN, err is loaded with dx + dy. In RUN, e2 = err << 1 is compared with dy and with dx; muxes controlled by e2 > dy and e2 < dx decide whether dy and dx are added to err.]

Datapath for x and y


[The slide repeats the IDLE and RUN code from the two RTL slides, abbreviated as I: and R:.]

[Figure: datapath for x and y. x is loaded from x0 outside RUN; in RUN it is incremented or decremented according to right when e2 > dy. y is handled the same way using y0, down, and e2 < dx. Equality comparisons of x with x1 and y with y1 produce done.]

The Framebuffer: Interface and Constants


    module VGA_framebuffer(
      input logic clk50, reset,
      input logic [10:0] x, y, // Pixel coordinates
      input logic pixel_color, pixel_write,

      output logic [7:0] VGA_R, VGA_G, VGA_B,
      output logic VGA_CLK, VGA_HS, VGA_VS, VGA_BLANK_n, VGA_SYNC_n);

    parameter HACTIVE      = 11'd1280,
              HFRONT_PORCH = 11'd32,
              HSYNC        = 11'd192,
              HBACK_PORCH  = 11'd96,
              HTOTAL       = HACTIVE + HFRONT_PORCH +
                             HSYNC + HBACK_PORCH; // 1600

    parameter VACTIVE      = 10'd480,
              VFRONT_PORCH = 10'd10,
              VSYNC        = 10'd2,
              VBACK_PORCH  = 10'd33,
              VTOTAL       = VACTIVE + VFRONT_PORCH +
                             VSYNC + VBACK_PORCH; // 525

The Framebuffer: Counters and Sync


    // Horizontal counter
    logic [10:0] hcount;
    logic endOfLine;

    always_ff @(posedge clk50 or posedge reset)
      if (reset)          hcount <= 0;
      else if (endOfLine) hcount <= 0;
      else                hcount <= hcount + 11'd1;

    assign endOfLine = hcount == HTOTAL - 1;

    // Vertical counter
    logic [9:0] vcount;
    logic endOfField;

    always_ff @(posedge clk50 or posedge reset)
      if (reset)          vcount <= 0;
      else if (endOfLine)
        if (endOfField)   vcount <= 0;
        else              vcount <= vcount + 10'd1;

    assign endOfField = vcount == VTOTAL - 1;

    assign VGA_HS = !( (hcount[10:7] == 4'b1010) &
                       (hcount[6] | hcount[5]) );
    assign VGA_VS = !( vcount[9:1] == (VACTIVE + VFRONT_PORCH) / 2);

The Framebuffer: Blanking, Memory, and RGB


    assign VGA_SYNC_n = 1; // Sync on R, G, and B. Unused for VGA.

    logic blank;
    assign blank = ( hcount[10] & (hcount[9] | hcount[8]) ) | // 1280
                   ( vcount[9] | (vcount[8:5] == 4'b1111) );  // 480

    logic framebuffer [307199:0]; // 640 * 480
    logic [18:0] read_address, write_address;

    assign write_address = x + (y << 9) + (y << 7); // x + y * 640
    assign read_address = (hcount >> 1) + (vcount << 9) + (vcount << 7);

    logic pixel_read;
    always_ff @(posedge clk50) begin
      if (pixel_write) framebuffer[write_address] <= pixel_color;
      if (hcount[0]) begin
        pixel_read <= framebuffer[read_address];
        VGA_BLANK_n <= ~blank; // Sync blank with read pixel data
      end
    end

    assign VGA_CLK = hcount[0]; // 25 MHz clock

    assign {VGA_R, VGA_G, VGA_B} = pixel_read ? 24'hFF_FF_FF : 24'h0;

    endmodule
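
Bit-level decodes like these are easy to get wrong, so it can help to check them exhaustively in software. The C sketch below models the address computation and the blank signal; the function names are invented, and the counter ranges (hcount below 1600, vcount below 525) come from the HTOTAL and VTOTAL comments.

```c
#include <assert.h>
#include <stdint.h>

/* x + y * 640 computed with shifts and adds, since 640 = 512 + 128. */
uint32_t pixel_address(uint32_t x, uint32_t y)
{
    assert(x < 640 && y < 480);
    return x + (y << 9) + (y << 7);
}

/* The blank signal decoded from individual counter bits. */
unsigned blank(unsigned hcount, unsigned vcount)
{
    unsigned h10  = (hcount >> 10) & 1u;
    unsigned h9   = (hcount >> 9) & 1u;
    unsigned h8   = (hcount >> 8) & 1u;
    unsigned v9   = (vcount >> 9) & 1u;
    unsigned v8_5 = (vcount >> 5) & 0xFu;   /* bits 8 down to 5 */
    return (h10 & (h9 | h8)) | (v9 | (unsigned)(v8_5 == 0xFu));
}
```

Over the full counter ranges, `blank(h, v)` is 1 exactly when `h >= 1280` or `v >= 480`, which matches the 1280 and 480 comments in the module.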

The Hallway Line Generator


    module hallway(input logic clk, reset,
                   input logic VGA_VS,
                   input logic done,
                   output logic [10:0] x0, y0, x1, y1,
                   output logic start, pixel_color);

      // ...

      // Typical state:
      S_TOP:
        if (done) begin
          start <= 1;
          if (x0 < 620)
            x0 <= x0 + 10'd10;
          else begin
            state <= S_RIGHT;
            x0 <= 639;
            y0 <= 0;
          end
        end

Connecting the Pieces


    // SoCKit_Top.sv

    logic [10:0] x, y, x0, y0, x1, y1;
    logic pixel_color;
    logic pixel_write;
    logic done, start;

    VGA_framebuffer fb(.clk50(OSC_50_B3B),
                       .reset(~RESET_n),
                       .*);

    bresenham liner(.clk(OSC_50_B3B),
                    .reset(~RESET_n),
                    .plot(pixel_write),
                    .*);

    hallway hall(.clk(OSC_50_B3B),
                 .reset(~RESET_n),
                 .*);

Annotations:
.reset(~RESET_n): connect the bresenham reset port to an inverted RESET_n
.*: connect the other bresenham ports to wires with the same name, e.g., .x(x), .y(y), ...

Testbenches

Testbenches
A model of the environment; exercises the module.
    // Module to test:
    // Three-bit
    // binary counter
    module count3(input logic clk, reset,
                  output logic [2:0] count);

      always_ff @(posedge clk)
        if (reset) count <= 3'd0;
        else       count <= count + 3'd1;
    endmodule

    module count3_tb;
      logic clk, reset;
      logic [2:0] count;

      count3 dut(.*);

      initial begin
        clk = 0;
        forever
          #20ns clk = ~clk;
      end

      initial begin // Reset
        reset = 0;
        repeat (2)
          @(posedge clk);
        reset = 1;
        repeat (2)
          @(posedge clk);
        reset = 0;
      end
    endmodule

Annotations:
module count3_tb;: no ports
logic clk, reset; logic [2:0] count;: signals for each DUT port
dut: Device Under Test
.*: connect everything
initial: initial block, imperative code runs once
forever: infinite loop
#20ns: delay
repeat (2): counted loop
@(posedge clk): wait for a clock cycle

Running this in ModelSim
