Lec10 Sram1

- SRAM is the focus of today's lecture on internal memory basics.
- An SRAM cell uses 6 transistors (6T cell) to store a single bit.
- SRAM cell arrays are organized into rows and columns, accessed by word lines (WL) and bit lines (BL).
- During reads, the BLs are precharged high and one of each pair is pulled low by the accessed cell's value.
- During writes, the BLs are driven differentially to overwrite the cell state.
- Column multiplexing lets the I/O width differ from the row width; larger memories can be built by cascading smaller memory blocks in width or depth.


EECS150 - Digital Design

Lecture 10 – SRAM (I)

September 27, 2011

Elad Alon
Electrical Engineering and Computer Sciences
University of California, Berkeley

http://www-inst.eecs.berkeley.edu/~cs150

Fall 2011 EECS150 Lecture 10 Page 1

Announcements
• Homework #4 due Thursday

• Homework #5 out Thursday


– Due next Thurs.

Project CPU Pipelining Review
• 3-stage pipeline: I (instruction fetch), X (execute), M (access data memory)
• Pipeline rules:
– Writes/reads to/from DMem use leading edge of “M”
– Writes to RegFile use trailing edge of “M”
– Instruction Decode and Register File access is up to you.
• 1 Load Delay Slot, 1 Branch Delay Slot
– No stalling may be used to accommodate pipeline hazards (in the
final version).
• Other:
– Target frequency to be announced later (50-100MHz)
– Minimize cost
– Posedge clocking only

Memory-Block Basics
• Uses:
Whenever a large collection of state elements is required.
– data & program storage
– general purpose registers
– data buffering
– table lookups
– CL (combinational logic) implementation
• Basic Types:
– RAM - random access memory
– ROM - read only memory
– EPROM, FLASH - electrically programmable read only memory

M X N memory: Depth = M, Width = N.
M words of memory, each word N bits wide (selected by log2(M) address bits).
Memory Components Types:
• Volatile:
– Random Access Memory (RAM):
• SRAM "static" Focus today
• DRAM "dynamic" Focus in ~2 weeks

• Non-volatile:
– Read Only Memory (ROM):
• Mask ROM "mask programmable"
• EPROM "electrically programmable"
• EEPROM "electrically erasable programmable"
• FLASH memory - similar to EEPROM with programmer integrated
on chip

All these types are available as stand-alone chips or as blocks in other chips.

Standard Internal Memory Organization


2-D array of bit cells. Each cell stores one bit of data.

Special circuit tricks are used for the cell array to improve storage density.

• RAM/ROM naming convention:
– examples: 32 X 8, "32 by 8" => 32 8-bit words
– 1M X 1, "1 meg by 1" => 1M 1-bit words
Address Decoding
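A minimal sketch of what the address decoder does, assuming a plain binary-to-one-hot row decoder. The module name `wl_decoder`, its ports and the `en` input are hypothetical, not from the lecture: N address bits select exactly one of 2^N word lines.

```verilog
// Hypothetical row decoder: N address bits -> 2^N one-hot word lines.
module wl_decoder #(parameter N = 3) (
  input  [N-1:0]      addr,
  input               en,   // assumed enable, e.g. asserted only during an access
  output [(1<<N)-1:0] wl    // word lines WL0 .. WL(2^N - 1)
);
  localparam M = 1 << N;
  wire [M-1:0] one = {{(M-1){1'b0}}, 1'b1};

  // Shift a single 1 into position addr; every other word line stays low.
  assign wl = en ? (one << addr) : {M{1'b0}};
endmodule
```

For example, with N = 3 and addr = 5, only WL5 is high (wl = 8'b0010_0000).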


SRAM Internals
[Figure: SRAM internals; word lines WL1, WL2, ..., WLi select rows of the cell array]
SRAM Cell Details
• Most common is the 6-transistor (6T) cell:
[Figure: 6T cell connected to word line WL and complementary bit lines BL, BL_B]
• Notice: no explicit read vs. write signal
– WL activates the cell (and all others on the same row) for
both operations
– Will see shortly how to distinguish reads from writes

SRAM Cell Array

[Figure: 6T cells tiled into an array; word lines (WL0, ..., WL3) run along the rows, and each column shares a complementary bit-line pair BL, BL_B]
SRAM Cell Array: Write

For a write operation, the column bit lines are driven differentially
(e.g., 0 on BL, 1 on BL_B). The driven values overwrite the cell state.

SRAM Cell Array: Read

For a read operation, the column bit lines are both driven to high
voltage (the supply), then released. When the word line activates the cell, the cell pulls down
one bit line or the other, depending on the stored value.
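To tie the write and read descriptions together, here is a behavioral sketch of one column of four cells. It assumes the bit lines can be treated as plain logic values (1 = high, 0 = pulled low) and that the cell update happens at a clock edge. The names (`sram_column`, `cells`, `bl`, `bl_b`, `we`) are hypothetical; this models the logic of the two operations, not the transistor-level circuit.

```verilog
// Behavioral sketch (not a circuit model) of one SRAM column with 4 cells.
module sram_column (
  input        clk,
  input  [1:0] row,   // index of the active word line
  input        we,    // assumed enable for the column's write drivers
  input        din,
  output       dout
);
  reg [3:0] cells;    // stored bit of each cell (its cross-coupled inverter state)
  reg       bl, bl_b; // complementary bit-line pair

  always @(*) begin
    if (we) begin
      // Write: drive the bit lines differentially.
      bl   = din;
      bl_b = ~din;
    end else begin
      // Read: both bit lines precharged high, then released ...
      bl   = 1'b1;
      bl_b = 1'b1;
      // ... and the selected cell pulls one of them low.
      if (cells[row]) bl_b = 1'b0;
      else            bl   = 1'b0;
    end
  end

  // The driven bit lines overwrite the selected cell (modeled at the clock edge).
  always @(posedge clk)
    if (we) cells[row] <= bl;

  // Sense amplifier: reports which bit line was pulled low.
  assign dout = bl & ~bl_b;
endmodule
```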
Column Multiplexing:
• Permits input/output data widths different from row width.
• Enables physical aspect ratio closer to a square
– Why is this important?
[Figure: the same memory organized as 1024x1 and as 256x4]

Technique illustrated for read operation. Similar approach for write.
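A behavioral sketch of the 1024x1 vs. 256x4 idea, assuming the 1024 logical 1-bit words are stored as 256 physical rows of 4 bits (module and signal names are hypothetical). The upper 8 address bits pick a row (word line) and the lower 2 bits drive the 4:1 column multiplexer, so the array is 256 rows by 4 columns instead of 1024 by 1.

```verilog
// 1024x1 memory physically organized as 256 rows x 4 columns.
module ram1024x1_colmux (
  input        clk,
  input        we,
  input  [9:0] addr,
  input        din,
  output       dout
);
  reg  [3:0] rows [0:255];             // each physical row holds 4 bits
  wire [7:0] row_addr = addr[9:2];     // upper bits drive the row decoder (word line)
  wire [1:0] col_addr = addr[1:0];     // lower bits drive the column multiplexer
  wire [3:0] row_data = rows[row_addr];// whole row appears on the 4 column outputs

  assign dout = row_data[col_addr];    // read: column mux selects 1 of 4

  always @(posedge clk)
    if (we) rows[row_addr][col_addr] <= din;  // write: only the selected column changes
endmodule
```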

Logical View: Cascading Memory-Blocks


How to make larger memory blocks out of smaller ones.

Increasing the width. Example: given 1Kx8, want 1Kx16
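A behavioral sketch of the width expansion, assuming a hypothetical `ram1kx8` block with synchronous write and a registered read address. Both blocks share the address and write enable, and each stores one byte of the 16-bit word.

```verilog
// Hypothetical 1Kx8 building block: synchronous write, registered read address.
module ram1kx8 (
  input        clk,
  input        we,
  input  [9:0] addr,
  input  [7:0] din,
  output [7:0] dout
);
  reg [7:0] mem [0:1023];
  reg [9:0] read_addr;
  always @(posedge clk) begin
    if (we) mem[addr] <= din;
    read_addr <= addr;
  end
  assign dout = mem[read_addr];
endmodule

// 1Kx16 from two 1Kx8 blocks placed side by side.
module ram1kx16 (
  input         clk,
  input         we,
  input  [9:0]  addr,
  input  [15:0] din,
  output [15:0] dout
);
  // Same address and write enable for both; the data bus is split by byte.
  ram1kx8 lo (.clk(clk), .we(we), .addr(addr), .din(din[7:0]),  .dout(dout[7:0]));
  ram1kx8 hi (.clk(clk), .we(we), .addr(addr), .din(din[15:8]), .dout(dout[15:8]));
endmodule
```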

Logical View: Cascading Memory-Blocks
How to make larger memory blocks out of smaller ones.

Increasing the depth. Example: given 1Kx8, want 2Kx8
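For the depth expansion, a sketch under the same assumptions (a hypothetical `ram1kx8` block with a registered read address). The extra address bit, addr[10], steers the write enable to one block and selects which block's output reaches `dout`. Because the blocks register their read address, the select bit is registered as well so that it lines up with the data.

```verilog
// Hypothetical 1Kx8 building block: synchronous write, registered read address.
module ram1kx8 (
  input        clk,
  input        we,
  input  [9:0] addr,
  input  [7:0] din,
  output [7:0] dout
);
  reg [7:0] mem [0:1023];
  reg [9:0] read_addr;
  always @(posedge clk) begin
    if (we) mem[addr] <= din;
    read_addr <= addr;
  end
  assign dout = mem[read_addr];
endmodule

// 2Kx8 from two 1Kx8 blocks: addr[10] chooses the block.
module ram2kx8 (
  input         clk,
  input         we,
  input  [10:0] addr,
  input  [7:0]  din,
  output [7:0]  dout
);
  wire [7:0] dout_lo, dout_hi;
  reg        sel_q;  // addr[10] delayed to match the blocks' registered read

  // addr[10] steers the write enable: 0 -> lower 1K words, 1 -> upper 1K words.
  ram1kx8 lo (.clk(clk), .we(we & ~addr[10]), .addr(addr[9:0]), .din(din), .dout(dout_lo));
  ram1kx8 hi (.clk(clk), .we(we &  addr[10]), .addr(addr[9:0]), .din(din), .dout(dout_hi));

  always @(posedge clk)
    sel_q <= addr[10];

  // Output mux picks the block that was addressed.
  assign dout = sel_q ? dout_hi : dout_lo;
endmodule
```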


Multi-ported Memory
• Motivation:
– Consider CPU core register file:
• 1 read or write per cycle limits processor performance.
• Complicates pipelining. Difficult for different instructions to
simultaneously read or write the regfile.
• Common arrangement in pipelined CPUs is 2 read ports and 1 write
port.
– I/O data buffering: disk or network interface
• Dual-porting allows both sides to simultaneously access the memory buffer
[Figure: dual-port memory with independent ports a (Aa, Dina, Douta, WEa) and b (Ab, Dinb, Doutb, WEb); a dual-ported data buffer shared by an I/O interface and the CPU]
Dual-ported Memory Internals
• Add decoder, another set of read/write logic, bit lines, word lines:
[Figure: cell array with two decoders (deca, decb), two sets of r/w logic, two address ports and two data ports]
• Example cell: SRAM
[Figure: SRAM cell with two word lines (WL1, WL2) and two bit-line pairs (b1, b2)]
• Repeat everything but the cross-coupled inverters.
• This scheme extends up to a couple more ports; beyond that, additional transistors are needed.

Adding Ports to Primitive Memory Blocks


Adding a read port to a simple dual port (SDP) memory.

Example: given 1Kx8 SDP, want 1 write & 2 read ports.
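A sketch of the replication approach, assuming a hypothetical 1Kx8 SDP block (`sdp1kx8`) with one write port and one synchronous read port. Every write goes to both copies, so the copies always hold identical contents and each can serve one read port. The cost is twice the storage.

```verilog
// Hypothetical 1Kx8 simple dual-port (SDP) block: 1 write port, 1 read port,
// synchronous read.
module sdp1kx8 (
  input            clk,
  input            we,
  input      [9:0] waddr,
  input      [7:0] din,
  input      [9:0] raddr,
  output reg [7:0] dout
);
  reg [7:0] mem [0:1023];
  always @(posedge clk) begin
    if (we) mem[waddr] <= din;
    dout <= mem[raddr];
  end
endmodule

// 1 write port + 2 read ports built from two SDP copies.
module ram1w2r (
  input        clk,
  input        we,
  input  [9:0] waddr,
  input  [7:0] din,
  input  [9:0] raddr0,
  input  [9:0] raddr1,
  output [7:0] dout0,
  output [7:0] dout1
);
  // Both copies see every write; each copy's read port serves one output.
  sdp1kx8 copy0 (.clk(clk), .we(we), .waddr(waddr), .din(din), .raddr(raddr0), .dout(dout0));
  sdp1kx8 copy1 (.clk(clk), .we(we), .waddr(waddr), .din(din), .raddr(raddr1), .dout(dout1));
endmodule
```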

Adding Ports to Primitive Memory Blocks
How to add a write port to a simple dual port memory.
Example: given 1Kx8 SDP, want 1 read & 2 write ports.
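Adding a write port cannot be done by simple replication, since each copy would miss the other port's writes. One known technique, not necessarily the one in the slide's figure, is a live value table (LVT): each write port gets its own SDP bank, and a small table records which bank last wrote each address. The sketch assumes a hypothetical `sdp1kx8` block with synchronous read; the LVT is a 1024x1 array, and port 1 wins if both ports write the same address in the same cycle.

```verilog
// Hypothetical 1Kx8 SDP block: 1 write port, 1 synchronous read port.
module sdp1kx8 (
  input            clk,
  input            we,
  input      [9:0] waddr,
  input      [7:0] din,
  input      [9:0] raddr,
  output reg [7:0] dout
);
  reg [7:0] mem [0:1023];
  always @(posedge clk) begin
    if (we) mem[waddr] <= din;
    dout <= mem[raddr];
  end
endmodule

// 2 write ports + 1 read port: one bank per write port plus a live value table.
module ram_2w1r (
  input        clk,
  input        we0,
  input  [9:0] waddr0,
  input  [7:0] din0,
  input        we1,
  input  [9:0] waddr1,
  input  [7:0] din1,
  input  [9:0] raddr,
  output [7:0] dout
);
  wire [7:0] dout0, dout1;
  reg        lvt [0:1023]; // which bank holds the newest value of each address
  reg        lvt_q;        // registered to line up with the banks' synchronous read

  sdp1kx8 bank0 (.clk(clk), .we(we0), .waddr(waddr0), .din(din0), .raddr(raddr), .dout(dout0));
  sdp1kx8 bank1 (.clk(clk), .we(we1), .waddr(waddr1), .din(din1), .raddr(raddr), .dout(dout1));

  always @(posedge clk) begin
    if (we0) lvt[waddr0] <= 1'b0;
    if (we1) lvt[waddr1] <= 1'b1;  // later assignment wins on a same-address collision
    lvt_q <= lvt[raddr];
  end

  assign dout = lvt_q ? dout1 : dout0;
endmodule
```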


Virtex-5 LX110T memory blocks:
• Distributed RAM using LUTs among the CLBs.
• Block RAMs in four columns.
SLICEL vs SLICEM ...
[Figure: SLICEL and SLICEM compared]
SLICEM adds memory features to its LUTs, plus muxes.

A SLICEM 6-LUT…
Example Distributed RAM (LUT RAM)
Example configuration:
Single-port 256b x 1,
registered output.

A 128 x 32b LUT RAM has a 1.1ns access time.

Distributed RAM Primitives

All are built from a single slice or less.

Remember, though, that a SLICEM LUT naturally has only 1 read port and 1 write port.
Example Dual Port Configurations


Distributed RAM Timing


Block RAM Overview


• 36K bits of data total, can be configured as:
– 2 independent 18Kb RAMs, or one 36Kb RAM.
• Each 36Kb block RAM can be configured as:
– 64Kx1 (when cascaded with an adjacent 36Kb
block RAM), 32Kx1, 16Kx2, 8Kx4, 4Kx9, 2Kx18, or
1Kx36 memory.
• Each 18Kb block RAM can be configured as:
– 16Kx1, 8Kx2, 4Kx4, 2Kx9, or 1Kx18 memory.
• Write and Read are synchronous operations.
• The two ports are symmetrical and totally
independent (can have different clocks),
sharing only the stored data.
• Each port can be configured in one of the
available widths, independent of the other
port. The read port width can be different
from the write port width for each port.
• The memory content can be initialized or
cleared by the configuration bitstream.
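The configuration list can be checked with a little arithmetic. Assuming, as we understand the Virtex-5 block RAM, that each 36Kb block stores 32Kb of data plus 4Kb of parity, and that the parity bits are only usable in the x9, x18 and x36 widths:

$$
\begin{aligned}
1\text{K}\times 36 &= 1024 \times 36 = 36{,}864 \text{ bits} = 36\,\text{Kb}\\
2\text{K}\times 18 &= 4\text{K}\times 9 = 36\,\text{Kb}\\
32\text{K}\times 1 &= 16\text{K}\times 2 = 8\text{K}\times 4 = 32{,}768 \text{ bits} = 32\,\text{Kb}\\
64\text{K}\times 1 &= 2 \times (32\text{K}\times 1) \quad \text{(two cascaded 36Kb blocks)}
\end{aligned}
$$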

Block RAM Timing

• Note this is in the default mode, “WRITE_FIRST”.


• An optional output register delays the appearance of output
data by one cycle.
• Maximum clock rate roughly 400MHz.
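A minimal sketch of how the write modes differ, written as plain behavioral Verilog (module names are hypothetical). With WRITE_FIRST the output register shows the data being written; with READ_FIRST it shows the old contents of that address. Synthesis tools generally recognize coding patterns like these, but check the XST or Synplify guide for the exact templates they support.

```verilog
// WRITE_FIRST: during a write, the output shows the new data.
module bram_write_first (
  input            clk,
  input            we,
  input      [9:0] addr,
  input      [7:0] din,
  output reg [7:0] dout
);
  reg [7:0] mem [0:1023];
  always @(posedge clk) begin
    if (we) begin
      mem[addr] <= din;
      dout      <= din;       // new data is forwarded to the output
    end else
      dout <= mem[addr];
  end
endmodule

// READ_FIRST: during a write, the output shows the old contents.
module bram_read_first (
  input            clk,
  input            we,
  input      [9:0] addr,
  input      [7:0] din,
  output reg [7:0] dout
);
  reg [7:0] mem [0:1023];
  always @(posedge clk) begin
    if (we) mem[addr] <= din;
    dout <= mem[addr];        // nonblocking write not yet visible: old value is read
  end
endmodule
```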


Verilog Synthesis Notes


• Block RAMs and LUT RAMs all exist as primitive library
elements (similar to FDRSE). However, it is much more
convenient to use inference.
• Depending on how you write your Verilog, you will get either a
collection of block RAMs, a collection of LUT RAMs, or a
collection of flip-flops.
• The synthesizer uses size and read style (synchronous versus asynchronous)
to determine the best primitive type to use.
• It is possible to force mapping to a particular primitive by using
synthesis directives. However, if you write your Verilog
correctly, you will not need to use directives.
• The synthesizer has limited capabilities (e.g., it can combine
primitives for more depth and width, but is limited in porting
options). Be careful, as you might not get what you want.
• See the Synplify User Guide and the XST User Guide for examples.
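As an illustration of a synthesis directive, assuming XST's `ram_style` attribute (Synplify uses its own `syn_ramstyle` attribute instead), the sketch below asks for distributed RAM on a memory whose registered read address would otherwise make it a block RAM candidate. The module name is hypothetical; simulators ignore the attribute, so behavior is unchanged.

```verilog
// 64x16 memory with a registered read address; the attribute (assumed XST
// syntax) requests LUT RAM rather than block RAM.
module ram64x16_lut (
  input         clk,
  input         we,
  input  [5:0]  a,
  input  [15:0] di,
  output [15:0] dout
);
  (* ram_style = "distributed" *)
  reg [15:0] ram [63:0];
  reg [5:0]  read_a;

  always @(posedge clk) begin
    if (we) ram[a] <= di;
    read_a <= a;              // registered read address
  end
  assign dout = ram[read_a];
endmodule
```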

Inferring RAMs in Verilog
// 64X1 RAM implementation using distributed RAM
module ram64X1 (clk, we, d, addr, q);
  input clk, we, d;
  input [5:0] addr;
  output q;

  // A Verilog reg array written inside "always @(posedge ...)" infers a memory array.
  reg temp [63:0];
  always @(posedge clk)
    if (we)
      temp[addr] <= d;

  // Asynchronous read infers LUT RAM.
  assign q = temp[addr];

endmodule


Dual-read-port LUT RAM


//
// Multiple-Port RAM Descriptions
//
module v_rams_17 (clk, we, wa, ra1, ra2, di, do1, do2);
  input clk;
  input we;
  input [5:0] wa;
  input [5:0] ra1;
  input [5:0] ra2;
  input [15:0] di;
  output [15:0] do1;
  output [15:0] do2;
  reg [15:0] ram [63:0];
  always @(posedge clk)
  begin
    if (we)
      ram[wa] <= di;
  end
  // Multiple references to the same array infer multiple read ports.
  assign do1 = ram[ra1];
  assign do2 = ram[ra2];
endmodule

Block RAM Inference
//
// Single-Port RAM with Synchronous Read
//
module v_rams_07 (clk, we, a, di, do);
  input clk;
  input we;
  input [5:0] a;
  input [15:0] di;
  output [15:0] do;
  reg [15:0] ram [63:0];
  reg [5:0] read_a;
  always @(posedge clk) begin
    if (we)
      ram[a] <= di;
    read_a <= a;  // synchronous read (registered read address) infers Block RAM
  end
  assign do = ram[read_a];
endmodule


Block RAM initialization


module RAMB4_S4 (data_out, ADDR, data_in, CLK, WE);
  output [3:0] data_out;
  input [2:0] ADDR;
  input [3:0] data_in;
  input CLK, WE;
  reg [3:0] mem [7:0];
  reg [2:0] read_addr;

  // "data.dat" contains the initial RAM contents; it gets put into the bitfile
  // and loaded at configuration time. (Remake the bits to change the contents.)
  initial
  begin
    $readmemb("data.dat", mem);
  end

  always @(posedge CLK)
    read_addr <= ADDR;

  assign data_out = mem[read_addr];

  always @(posedge CLK)
    if (WE) mem[ADDR] <= data_in;

endmodule

Dual-Port Block RAM
module test (data0, data1, waddr0, waddr1, we0, we1, clk0, clk1, q0, q1);

  parameter d_width = 8; parameter addr_width = 8; parameter mem_depth = 256;

  input [d_width-1:0] data0, data1;
  input [addr_width-1:0] waddr0, waddr1;
  input we0, we1, clk0, clk1;

  reg [d_width-1:0] mem [mem_depth-1:0];
  reg [addr_width-1:0] reg_waddr0, reg_waddr1;
  output [d_width-1:0] q0, q1;

  // Each port reads through its own registered address (synchronous read).
  assign q0 = mem[reg_waddr0];
  assign q1 = mem[reg_waddr1];

  // Port 0, clocked by clk0
  always @(posedge clk0)
  begin
    if (we0)
      mem[waddr0] <= data0;
    reg_waddr0 <= waddr0;
  end

  // Port 1, clocked independently by clk1
  always @(posedge clk1)
  begin
    if (we1)
      mem[waddr1] <= data1;
    reg_waddr1 <= waddr1;
  end

endmodule


Implications on Processor Design


• Register File: Consider distributed RAM (LUT RAM)
– Size is close to what is needed: distributed RAM primitive
configurations are 32 or 64 bits deep. Extra width is easily
achieved by parallel arrangements.
– LUT-RAM configurations offer multi-porting options - useful for
register files.
– Asynchronous read might be useful, providing flexibility in where
to put the register read in the pipeline.
• Instruction / Data Caches : Consider Block RAM
– Higher density, lower cost for large number of bits
– A single 36kbit Block RAM implements 1K 32-bit words.
– Configuration stream based initialization permits a simple “boot
strap” procedure.
• Other Memories? FIFOs? Video “Frame Buffer”? How big?

XUP Board External SRAM
"ZBT" synchronous SRAM, 9 Mb on a 32-bit data bus, with four "parity" bits: 256K x 36 bits (located under the removable LCD).

*ZBT stands for zero bus turnaround. The turnaround is the number of clock cycles it takes to change access to the SRAM from write to read and vice versa. For ZBT SRAMs, the turnaround (the latency between read and write cycles) is zero.
More generally, how does software interface to I/O devices?

XUP Board External DRAM


256 MByte DDR2 DRAM with 400MHz data rate.

*SO-DIMM stands for small outline dual in-line memory module. SO-DIMMs are often used in systems which have space restrictions, such as notebooks.
*DDR2 stands for second generation double data rate. DDR transfers data on both the rising and falling edges of the clock signal.
More generally, how does software interface to I/O devices?
