inst.eecs.berkeley.edu/~eecs151

EECS151 : Introduction to Digital Design and ICs


Lecture 20 – Memory and Clock
Bora Nikolić and Sophia Shao
This Maglev Heart Could Keep Cardiac Patients Alive
Aug. 22, 2019. Inside Bivacor’s artificial heart, a levitating disk spins 2000 times per minute to keep blood flowing.

IEEE Spectrum: https://spectrum.ieee.org/biomedical/devices/this-maglev-heart-could-keep-cardiac-patients-alive

EECS151/251A L20 MEMORY AND CLOCK Nikolić, Shao Fall 2019 © UCB 1
Review
• SRAM and regfile cells can have multiple R/W ports
• Memory decoding is done hierarchically
• Wire-limited in large arrays
• Multiple cache levels make memory appear both fast and big
• Direct mapped and set-associative cache

ASIC Memories

ASIC Memory Compilers
• Memory compiler produces front-end views (similar to standard cells, but really large ones)

FPGA Memories

Verilog RAM Specification
//
// Single-Port RAM with Asynchronous Read
//
module ramBlock (clk, we, a, di, do);
  input clk;
  input we;                     // write enable
  input [19:0] a;               // address
  input [7:0] di;               // data in
  output [7:0] do;              // data out
  reg [7:0] ram [1048575:0];    // 8x1Meg
  always @(posedge clk) begin   // Synch write
    if (we)
      ram[a] <= di;
  end
  assign do = ram[a];           // Asynch read
endmodule

What do the synthesis tools do with this?



Verilog Synthesis Notes (FPGAs)
• Block RAMs and LUT RAMs all exist as primitive library elements. However, it is much more convenient to use inference.
• Depending on how you write your Verilog, you will get either a collection of block RAMs, a collection of LUT RAMs, or a collection of flip-flops.
• The synthesizer uses size and read style (sync versus async) to determine the best primitive type to use.
• It is possible to force mapping to a particular primitive by using synthesis directives. Ex: (* ram_style = "distributed" *) reg myReg;
• The synthesizer has limited capabilities (e.g., it can combine primitives for more depth and width, but is limited on porting options). Be careful, as you might not get what you want.
• See User Guide for examples.
• CORE generator memory block has an extensive set of parameters for explicitly instantiated RAM blocks.
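
A minimal sketch of the synthesis directive applied to a whole memory array rather than a single reg. The 64x16 array is small enough that a tool might choose LUT RAM on its own; the attribute requests a block RAM instead. The module and port names are hypothetical, and the set of accepted ram_style values (assumed here to include "block" and "distributed") should be checked against the synthesis tool's user guide.

```verilog
// Hypothetical example: force a small, synchronously read memory into block RAM.
module forced_bram (
    input             clk,
    input             we,     // write enable
    input      [5:0]  a,      // address
    input      [15:0] di,     // data in
    output reg [15:0] dout    // registered data out
);
    // The attribute is attached to the array declaration it should affect.
    (* ram_style = "block" *) reg [15:0] ram [63:0];

    always @(posedge clk) begin
        if (we)
            ram[a] <= di;     // synchronous write
        dout <= ram[a];       // synchronous read (returns the old value on a write)
    end
endmodule
```

Whether the request is honored still depends on the tool and on the read style; an asynchronous read generally cannot be mapped to a block RAM.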



Inferring RAMs in Verilog (FPGA)
// 64X1 RAM implementation using distributed RAM
module ram64X1 (clk, we, d, addr, q);
  input clk, we, d;
  input [5:0] addr;
  output q;
  reg [63:0] temp;
  // Verilog reg array used with "always @ (posedge ..." infers memory array.
  always @ (posedge clk)
    if (we)
      temp[addr] <= d;
  // Asynchronous read infers LUT RAM
  assign q = temp[addr];
endmodule



Dual-read-port LUT RAM (FPGA)
//
// Multiple-Port RAM Descriptions
//
module v_rams_17 (clk, we, wa, ra1, ra2, di, do1, do2);
input clk;
input we;
input [5:0] wa;
input [5:0] ra1;
input [5:0] ra2;
input [15:0] di;
output [15:0] do1;
output [15:0] do2;
reg [15:0] ram [63:0];
always @(posedge clk)
begin
if (we)
ram[wa] <= di;
end
// Multiple references to the same array.
assign do1 = ram[ra1];
assign do2 = ram[ra2];
endmodule



Block RAM Inference (FPGA)
//
// Single-Port RAM with Synchronous Read
//
module v_rams_07 (clk, we, a, di, do);
input clk;
input we;
input [5:0] a;
input [15:0] di;
output [15:0] do;
reg [15:0] ram [63:0];
reg [5:0] read_a;
always @(posedge clk) begin
if (we)
ram[a] <= di;
// Synchronous read (registered read address) infers Block RAM
read_a <= a;
end
assign do = ram[read_a];
endmodule



Block RAM initialization (FPGA)
module RAMB4_S4 (data_out, ADDR, data_in, CLK, WE);
  output [3:0] data_out;
  input [2:0] ADDR;
  input [3:0] data_in;
  input CLK, WE;
  reg [3:0] mem [7:0];
  reg [2:0] read_addr;   // same width as ADDR

  // "data.dat" contains the initial RAM contents; it gets put into the
  // bitfile and loaded at configuration time.
  // (Remake bits to change contents)
  initial
  begin
    $readmemb("data.dat", mem);
  end

  always @(posedge CLK)
    read_addr <= ADDR;

  assign data_out = mem[read_addr];

  always @(posedge CLK)
    if (WE) mem[ADDR] <= data_in;   // non-blocking assignment for the clocked write

endmodule



FIFOs

First-in-first-out (FIFO) Memory
• Used to implement queues.
• These find common use in processor and communication circuits.
• Generally, used to “decouple” actions of producer and consumer:
    starting state:  c b a
    after write:     d c b a
    after read:      d c b
• Producer can perform many writes without consumer performing any reads (or vice versa). However, because of finite buffer size, on average, need equal number of reads and writes.
• Typical uses:
    – Interfacing I/O devices. Example: network interface. Data bursts from network, then processor bursts to memory buffer (or reads one word at a time from interface). Operations not synchronized.
    – Example: Audio output. Processor produces output samples in bursts (during process swap-in time). Audio DAC clocks it out at constant sample rate.
FIFO Interfaces

[Figure: FIFO block with inputs DIN, WE, RE, RST, CLK and outputs DOUT, FULL, HALF FULL, EMPTY]

• After write or read operation, FULL and EMPTY indicate status of buffer.
• Used by external logic to control own reading from or writing to the buffer.
• FIFO resets to EMPTY state.
• HALF FULL (or other indicator of partial fullness) is optional.
• Address pointers are used internally to keep next write position and next read position into a dual-port memory.
• If pointers equal after write ⇒ FULL.
• If pointers equal after read ⇒ EMPTY.

Note: pointer incrementing is done “mod size-of-buffer”
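
The pointer rules above can be sketched as a small single-clock FIFO. This is an illustration under stated assumptions, not the course's reference design: the module name, parameter names, and the choice of a power-of-two depth (so that pointer increment wraps "mod size-of-buffer" for free) are all hypothetical, and HALF FULL is omitted.

```verilog
// Hypothetical single-clock FIFO sketch; DEPTH = 2**LOG_DEPTH entries.
module sync_fifo #(
    parameter WIDTH     = 8,
    parameter LOG_DEPTH = 4
) (
    input              clk,
    input              rst,    // synchronous reset to the EMPTY state
    input              we,     // write request
    input              re,     // read request
    input  [WIDTH-1:0] din,
    output [WIDTH-1:0] dout,   // oldest entry (valid when not empty)
    output reg         full,
    output reg         empty
);
    reg [WIDTH-1:0]     mem [(1<<LOG_DEPTH)-1:0];
    reg [LOG_DEPTH-1:0] wr_ptr, rd_ptr;

    // Requests are ignored when they would overflow or underflow.
    wire do_write = we && !full;
    wire do_read  = re && !empty;

    // LOG_DEPTH-bit increments wrap around automatically.
    wire [LOG_DEPTH-1:0] wr_next = wr_ptr + 1'b1;
    wire [LOG_DEPTH-1:0] rd_next = rd_ptr + 1'b1;

    assign dout = mem[rd_ptr];   // asynchronous read port

    always @(posedge clk)
        if (do_write)
            mem[wr_ptr] <= din;  // synchronous write port

    always @(posedge clk) begin
        if (rst) begin
            wr_ptr <= 0;
            rd_ptr <= 0;
            full   <= 1'b0;
            empty  <= 1'b1;
        end else begin
            if (do_write) wr_ptr <= wr_next;
            if (do_read)  rd_ptr <= rd_next;
            if (do_write && !do_read) begin
                empty <= 1'b0;
                full  <= (wr_next == rd_ptr);  // pointers equal after write
            end else if (do_read && !do_write) begin
                full  <= 1'b0;
                empty <= (rd_next == wr_ptr);  // pointers equal after read
            end
            // Simultaneous read and write leaves the occupancy unchanged.
        end
    end
endmodule
```

Keeping FULL and EMPTY as registers is one way to tell the two "pointers equal" cases apart; carrying an extra pointer bit is a common alternative.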
Xilinx Virtex5 FIFOs
• Virtex5 BlockRAMs include dedicated circuits for FIFOs.
• Details in User Guide (ug190).
• Takes advantage of separate dual ports and independent port clocks.



FPGA Memory Blocks

A SLICEM 6-LUT ...
[Figure: SLICEM 6-LUT. Labels: normal 6-LUT inputs; normal 5/6-LUT outputs; memory data input; memory write address; control output for chaining LUTs to make larger memories.]

Synchronous write / asynchronous read
A 1.1 Mb distributed RAM can be made if all SLICEMs of an LX110T are used as RAM.



SLICEL vs SLICEM ...
[Figure: SLICEL and SLICEM shown side by side]

SLICEM adds memory features to LUTs, + muxes.
Example Distributed RAM (LUT RAM)
Example configuration: Single-port 256b x 1, registered output.
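
A sketch of how such a configuration might be described for inference, assuming a 256-entry, 1-bit-wide array with a registered read. The module name is hypothetical, and whether the tool maps this to the exact primitive shown on the slide is not guaranteed.

```verilog
// Hypothetical single-port 256 x 1 memory with a registered output.
module lutram_256x1_reg (
    input       clk,
    input       we,     // write enable
    input [7:0] addr,   // 8 address bits select one of 256 entries
    input       d,      // 1-bit data in
    output reg  q       // registered data out
);
    // Attribute assumed to request LUT RAM; check the tool's user guide.
    (* ram_style = "distributed" *) reg mem [255:0];

    always @(posedge clk) begin
        if (we)
            mem[addr] <= d;   // synchronous write
        q <= mem[addr];       // output register after the LUT RAM read
    end
endmodule
```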



Distributed RAM Primitives

All are built from a single slice or less.

Remember, though, that the SLICEM LUT naturally has only 1 read and 1 write port.



Block RAM Overview
• 36K bits of data total, can be configured as:
• 2 independent 18Kb RAMs, or one 36Kb RAM.
• Each 36Kb block RAM can be configured as:
• 64Kx1 (when cascaded with an adjacent 36Kb block RAM), 32Kx1,
16Kx2, 8Kx4, 4Kx9, 2Kx18, or 1Kx36 memory.
• Each 18Kb block RAM can be configured as:
• 16Kx1, 8Kx2, 4Kx4, 2Kx9, or 1Kx18 memory.
• Write and Read are synchronous operations.
• The two ports are symmetrical and totally independent (can
have different clocks), sharing only the stored data.
• Each port can be configured in one of the available widths,
independent of the other port. The read port width can be
different from the write port width for each port.
• The memory content can be initialized or cleared by the
configuration bitstream.
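
One way the two independent, separately clocked ports might be written for inference is sketched below, using the 1Kx18 organization of an 18Kb block. The module and port names are hypothetical, and support for this two-clock, two-write-port style varies between synthesis tools, so treat it as an assumption to verify against the tool's documentation.

```verilog
// Hypothetical true dual-port RAM sketch: 1K x 18, one clock per port.
module tdp_ram_1kx18 (
    input             clka,
    input             wea,
    input      [9:0]  addra,
    input      [17:0] dia,
    output reg [17:0] doa,
    input             clkb,
    input             web,
    input      [9:0]  addrb,
    input      [17:0] dib,
    output reg [17:0] dob
);
    // The two ports share only the stored data.
    reg [17:0] ram [1023:0];

    // Port A: synchronous write and synchronous read on clka.
    always @(posedge clka) begin
        if (wea)
            ram[addra] <= dia;
        doa <= ram[addra];
    end

    // Port B: synchronous write and synchronous read on clkb.
    always @(posedge clkb) begin
        if (web)
            ram[addrb] <= dib;
        dob <= ram[addrb];
    end
endmodule
```

Simultaneous writes to the same address from both ports are not resolved by this description; real designs must avoid or arbitrate that case.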
UltraRAM Blocks



Administrivia
• Projects
• Checkpoint 2 is this Friday
• Homework 9 due on Monday, Nov. 25
• No class on Wednesday, Nov. 27
• Last day of classes is Dec. 6

DRAM

3-Transistor DRAM Cell

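[Figure: 3T DRAM cell schematic (transistors M1, M2, M3; storage node X with capacitance CS; write word line WWL; read word line RWL; bit lines BL1 and BL2) with waveforms for WWL, RWL, X, BL1, and BL2; voltage levels marked VDD, VDD - VT, and ΔV]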

No constraints on device ratios


Reads are non-destructive
Value stored at node X when writing a “1” = V_WWL - V_Th

Can work with a logic IC process



1-Transistor DRAM Cell

VBL

VBIT= 0 or (VDD – VT)

Write: CS is charged or discharged by asserting WL and BL.

Read: Charge redistribution takes place between bit line and storage capacitance.
CS << CBL, so the voltage swing is small; typically hundreds of mV.

• To get sufficiently large CS, special IC process is used
• Cell reading is destructive, therefore read operation always is followed by a write-back
• Cell loses charge (leaks away in ms - highly temperature dependent), therefore cells occasionally need to be “refreshed” - read/write cycle
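
The size of the read swing follows from charge conservation when the storage capacitor is connected to the bit line. The expression below is a sketch that assumes the bit line is precharged to a voltage V_PRE before the word line is asserted; V_PRE is a symbol introduced here and does not appear on the slide.

```latex
% Charge before = charge after sharing between C_S and C_BL
C_S V_{BIT} + C_{BL} V_{PRE} = (C_S + C_{BL})\, V_{BL}

% Resulting change in bit-line voltage
\Delta V = V_{BL} - V_{PRE} = (V_{BIT} - V_{PRE})\,\frac{C_S}{C_S + C_{BL}}
```

Because CS << CBL, the ratio CS / (CS + CBL) is small, which is why the bit-line swing is only a fraction of the stored voltage and a sense amplifier is needed.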
Clocking

Example Clock System

Courtesy of IEEE Press, New York. © 2000


Clock Generation: Phase-Locked Loop
• Phase-locked loop (PLL)
• Used for clock frequency synthesis
• Output frequency: N/M × FSYSCLK
• Locks output clock phase to input
clock phase
• Can multiply clock frequency
• Multiple PLLs on a SoC
• One per core/clock domain
• Can be analog or digital
• Low phase noise (jitter)
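
As a worked instance of the frequency-synthesis relation, with N and M taken as integer divider settings. The numbers are purely illustrative and are not from the lecture.

```latex
f_{OUT} = \frac{N}{M}\, f_{SYSCLK}

% Illustrative values: N = 24, M = 2, f_SYSCLK = 100 MHz
f_{OUT} = \frac{24}{2} \times 100\ \text{MHz} = 1.2\ \text{GHz}
```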

Clock Distribution

H-tree

CLK

Clock is distributed in a tree-like fashion
Large chips (blocks) – many levels of buffers
Goal: minimize skew and supply-induced jitter


More realistic H-tree

[Restle98]


The Grid System

[Figure: clock grid; labels: GCLK and Driver, four of each]

• Relaxed design constraints for low skew
• Large power


DEC Alpha 21264 Clocking
• Highest-performance processor of its time



Clock Gating
• Clock consumes 20-40% of power in a chip
• Gate off in inactive blocks
• Multiple levels of clock gating
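
A common way to gate a clock without glitches is a latch-based gating cell, sketched below as a behavioral model. The module and signal names are hypothetical; in an ASIC flow a library-provided integrated clock-gating cell would normally be used rather than hand-written logic like this.

```verilog
// Hypothetical behavioral model of a latch-based clock gate.
module clock_gate (
    input  clk,    // free-running clock
    input  en,     // enable from the block's control logic
    output gclk    // gated clock to the inactive-capable block
);
    reg en_latched;

    // Level-sensitive latch: transparent while clk is low, so en can
    // only change the gate's state when the clock is already low.
    always @(clk or en)
        if (!clk)
            en_latched <= en;

    // While clk is high, en_latched is held, so gclk cannot glitch.
    assign gclk = clk & en_latched;
endmodule
```

Applying this at several levels of the clock tree gives the "multiple levels of clock gating" noted above: a gate near the root shuts off a whole block, while gates near the leaves shut off individual register groups.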

What Happens When We Un-Gate Clock?

Crossing Clock Domains
• Two domains at different frequencies exchange wdata, rdata
• FIFO with two clocks

http://www.sunburst-design.com/papers/CummingsSNUG2002SJ_FIFO1.pdf
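
A two-clock FIFO has to compare a pointer generated in one clock domain with a pointer in the other. A widely used approach is to Gray-code the pointer so that only one bit changes per increment, then pass it through two flip-flops in the destination domain. The sketch below shows only that pointer-crossing step; the module and signal names are hypothetical, and reset logic is left out for brevity.

```verilog
// Hypothetical sketch: move a FIFO pointer from src_clk to dst_clk.
module gray_ptr_sync #(
    parameter W = 4
) (
    input              src_clk,
    input              dst_clk,
    input      [W-1:0] src_bin,   // binary pointer in the source domain
    output reg [W-1:0] dst_gray   // Gray-coded pointer in the destination domain
);
    reg [W-1:0] src_gray;  // registered Gray code, launched from src_clk
    reg [W-1:0] meta;      // first synchronizer stage (may go metastable)

    // Binary-to-Gray conversion, registered so the crossing signal is glitch-free.
    always @(posedge src_clk)
        src_gray <= (src_bin >> 1) ^ src_bin;

    // Two-flop synchronizer in the destination domain.
    always @(posedge dst_clk) begin
        meta     <= src_gray;
        dst_gray <= meta;
    end
endmodule
```

Since at most one bit of src_gray changes per source clock edge, the destination domain sees either the old or the new pointer value, never an unrelated one. Full and empty flags are then computed from the synchronized pointers.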
Summary
• Memory compilers generate SRAM blocks
• Several options for memory on FPGAs: Distributed, BlockRAM, UltraRAM
• Clock generation and distribution is a major part of digital system design
• We just touched on it
