Lecture 12 SRAM
Zhuo Feng
Outline
Memory Arrays
SRAM Architecture
SRAM Cell
Decoders
Column Circuitry
Multiple Ports
Memory Arrays
Random access memory
  Read/Write Memory (RAM), volatile
    Static RAM (SRAM)
    Dynamic RAM (DRAM)
  Read-Only Memory (ROM), nonvolatile
    Mask ROM
    Programmable ROM (PROM)
    Erasable Programmable ROM (EPROM)
    Electrically Erasable Programmable ROM (EEPROM)
    Flash ROM
Serial access memory
  Shift Registers
    Serial In Parallel Out (SIPO)
    Parallel In Serial Out (PISO)
  Queues
    First In First Out (FIFO)
    Last In First Out (LIFO)
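Array Architecture
2^n words of 2^m bits each
If n >> m, fold by 2^k into fewer rows of more columns
[Figure: array organization. Of the n address bits, n-k go to the row decoder, which drives 2^(n-k) wordlines; the memory cells form 2^(n-k) rows x 2^(m+k) columns, with bitline conditioning on the bitlines; the remaining k bits go to the column decoder, which controls the column circuitry that delivers 2^m bits]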
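The folding arithmetic can be checked with a short Python sketch. The function name fold_array is hypothetical; it only evaluates the row and column counts from the figure for a given n, m and k.

```python
def fold_array(n: int, m: int, k: int) -> tuple[int, int]:
    """Return (rows, columns) for 2**n words of 2**m bits folded by 2**k."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    rows = 2 ** (n - k)  # row decoder uses n - k address bits
    cols = 2 ** (m + k)  # 2**k words of 2**m bits share each row
    return rows, cols


# 2k words x 16 bits (n = 11, m = 4) folded by 2**3
print(fold_array(11, 4, 3))  # (256, 128)
```

Folding leaves the total bit count, 2^(n-k) x 2^(m+k) = 2^(n+m), unchanged; it only trades rows for columns to get a better aspect ratio.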
6T SRAM Cell
Cell size accounts for most of array size
Reduce cell size at expense of complexity
6T SRAM Cell
Used in most commercial chips
Data stored in cross-coupled inverters
Read:
Precharge bit, bit_b
Raise wordline
Write:
Drive data onto bit, bit_b
Raise wordline
[Figure: 6T cell with wordline word and bitlines bit, bit_b]
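The read and write sequences can be sketched at the logic level. This model assumes bit connects to A and bit_b to A_b through the access transistors, and that the cell is sized correctly, so reads never flip it and writes always succeed. The class name SixTCell is hypothetical, and analog effects such as the bump on A during a read are not modeled.

```python
class SixTCell:
    """Logic-level sketch of a 6T cell: node A and its complement A_b.

    Assumes correct sizing: reads are non-destructive and writes succeed.
    """

    def __init__(self, value: int = 0) -> None:
        self.a = value & 1

    @property
    def a_b(self) -> int:
        return 1 - self.a

    def read(self) -> tuple[int, int]:
        bit, bit_b = 1, 1          # precharge both bitlines high
        if self.a == 0:            # raise wordline: the side storing 0
            bit = 0                # pulls its bitline down
        else:
            bit_b = 0
        return bit, bit_b

    def write(self, value: int) -> None:
        bit = value & 1            # drive data onto bit
        bit_b = 1 - bit            # and its complement onto bit_b
        if bit == 0:
            self.a = 0             # bit forces A low; A_b rises high
        elif bit_b == 0:
            self.a = 1             # bit_b forces A_b low; A rises high


cell = SixTCell(0)
print(cell.read())  # (0, 1): bit discharges, bit_b stays high
cell.write(1)
print(cell.read())  # (1, 0)
```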
SRAM Read
Precharge both bitlines high
Then turn on wordline
One of the two bitlines will be pulled down by the cell
Ex: A = 0, A_b = 1
bit discharges, bit_b stays high
But A bumps up slightly
Read stability
A must not flip
N1 >> N2
N3 >> N4
[Figure: 6T cell (pullups P1, P2; pulldowns N1, N3; access transistors N2, N4; nodes A, A_b) and simulated read waveforms of word, bit, bit_b and A; voltage 0 to 1.5 V versus time 0 to 600 ps]
SRAM Write
Drive one bitline high, the other low
Then turn on wordline
Bitlines overpower cell with new value
Ex: A = 0, A_b = 1, bit = 1, bit_b = 0
Force A_b low, then A rises high
Writability
Must overpower feedback inverter
N2 >> P1
N4 >> P2
[Figure: 6T cell (P1, P2, N1 to N4; nodes A, A_b) and simulated write waveforms of word, bit_b, A and A_b; voltage 0 to 1.5 V versus time 0 to 700 ps]
SRAM Sizing
High bitlines must not overpower inverters during reads
But low bitlines must write new value into cell
[Figure: 6T cell on wordline word and bitlines bit, bit_b, with nodes A, A_b; weak pullups, medium access transistors, strong pulldowns]
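The two sizing constraints can be written as an ordering check. This is a hypothetical helper: the arguments are relative drive strengths in arbitrary units, and it encodes only the qualitative rule from the read and write slides (pulldowns stronger than access transistors, access transistors stronger than pullups). Real margins come from circuit simulation.

```python
def check_6t_sizing(pullup: float, access: float, pulldown: float) -> dict[str, bool]:
    """Qualitative 6T sizing check (hypothetical helper, arbitrary strength units)."""
    return {
        # Read stability: pulldowns (N1, N3) must beat access transistors (N2, N4)
        "read_stable": pulldown > access,
        # Writability: access transistors (N2, N4) must beat pullups (P1, P2)
        "writable": access > pullup,
    }


print(check_6t_sizing(pullup=1, access=2, pulldown=4))
# {'read_stable': True, 'writable': True}
```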
[Figure: Read and Write circuitry around an SRAM Cell, with signals word_q1, write_q1, data_s1, bit_v1f and bit_b_v1f]
SRAM Layout
Cell size is critical: 26 x 45 λ (even smaller in industry)
Tile cells sharing VDD, GND, bitline contacts
[Figure: SRAM cell layout showing GND, VDD, WORD and the cell boundary]
Thin Cell
In nanometer CMOS
Avoid bends in polysilicon and diffusion
Orient all transistors in one direction
Also reduces length and capacitance of bitlines
Commercial SRAMs
Five generations of Intel SRAM cell micrographs
Transition to thin cell at 65 nm
Steady scaling of cell area
Decoders
n:2^n decoder consists of 2^n n-input AND gates
One needed for each row of memory
Build AND from NAND or NOR gates
[Figure: 2:4 decoders in Static CMOS and Pseudo-nMOS, inputs A1, A0, outputs word0 to word3]
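A behavioral sketch of the decoder described above, built literally as 2^n AND gates, one per row. The function name decode is hypothetical; address_bits[j] stands for A_j with A0 first.

```python
def decode(address_bits: list[int]) -> list[int]:
    """n:2**n decoder as 2**n n-input AND gates; returns word0..word(2**n - 1)."""
    n = len(address_bits)
    words = []
    for row in range(2 ** n):
        # This row's AND gate takes A_j or its complement, per bit j of the row index
        inputs = [a if (row >> j) & 1 else 1 - a for j, a in enumerate(address_bits)]
        words.append(int(all(inputs)))
    return words


print(decode([1, 0]))  # A0 = 1, A1 = 0 selects word1: [0, 1, 0, 0]
```

Exactly one wordline is high for every address, which is what lets each row's AND gate drive its wordline directly.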
Decoder Layout
Decoders must be pitch-matched to SRAM cell
Requires very skinny gates
[Figure: decoder layout: NAND gate on A3 to A0 between VDD and GND, followed by a buffer inverter driving word]
Large Decoders
For n > 4, NAND gates become slow
Break large gates into multiple smaller gates
[Figure: 4:16 decoder on A3 to A0 driving word0 to word15]
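One common way to break up the gates is predecoding: decode pairs of address bits with small 2:4 decoders, then combine one output from each with a 2-input AND per wordline. The sketch assumes this particular split for a 4:16 decoder; the function names are hypothetical.

```python
def predecode(a1: int, a0: int) -> list[int]:
    """2:4 predecoder: one-hot output for a 2-bit address field."""
    value = (a1 << 1) | a0
    return [int(i == value) for i in range(4)]


def decode4_predecoded(a3: int, a2: int, a1: int, a0: int) -> list[int]:
    """4:16 decoder from two 2:4 predecoders and sixteen 2-input ANDs."""
    high = predecode(a3, a2)  # selects a group of four rows
    low = predecode(a1, a0)   # selects a row within the group
    # word i needs only a 2-input gate instead of a 4-input NAND
    return [high[i >> 2] & low[i & 3] for i in range(16)]


print(decode4_predecoded(0, 1, 1, 0).index(1))  # word6
```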
Column Circuitry
Some circuitry is required for each column
Bitline conditioning
Sense amplifiers
Column multiplexing
Bitline Conditioning
Precharge bitlines high before reads
[Figure: bitline precharge circuits on bit and bit_b]
Sense Amplifiers
Bitlines have many cells attached
Ex: 32-kbit SRAM has 128 rows x 256 cols
128 cells on each bitline
t_pd ∝ (C/I) ΔV
Even with shared diffusion contacts, 64C of diffusion capacitance (big C)
Discharged slowly through small transistors (small I)
Sense amplifiers are triggered on a small voltage swing (reduce ΔV)
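[Figure: differential sense amplifier with transistors P1, P2, N1, N2, N3; inputs bit, bit_b; outputs sense, sense_b]
[Figure: clocked sense amplifier: isolation transistors separate the bitline capacitance from a latch with regenerative feedback, fired by sense_clk; outputs sense, sense_b]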
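The relation t_pd ∝ (C/I) ΔV explains why sensing a small swing pays off: with C and I fixed, the delay scales with the swing the bitline must develop. The sketch uses the first-order model t_pd = C·ΔV/I; the capacitance, current and voltages are hypothetical illustration values, not figures from the lecture.

```python
def bitline_delay(c_bitline: float, i_cell: float, delta_v: float) -> float:
    """First-order delay t_pd = C * delta_V / I (farads, amperes, volts -> seconds)."""
    return c_bitline * delta_v / i_cell


# Hypothetical illustration values, not taken from the lecture
C_BIT = 100e-15  # bitline capacitance, F
I_CELL = 50e-6   # cell read current, A
full_swing = bitline_delay(C_BIT, I_CELL, 1.0)   # wait for a 1.0 V swing
small_swing = bitline_delay(C_BIT, I_CELL, 0.1)  # sense amp fires at 0.1 V
print(full_swing / small_swing)  # about 10: delay scales with the swing
```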
Twisted Bitlines
Sense amplifiers also amplify noise
Coupling noise is severe in modern processes
Try to couple equally onto bit and bit_b
Done by twisting bitlines
Column Multiplexing
Recall that array may be folded for good aspect ratio
Ex: 2k words x 16 bits folded into 256 rows x 128 columns
Must select 16 output bits from the 128 columns
Requires 16 8:1 column multiplexers
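A sketch of reading one word from the folded 256 x 128 array with sixteen 8:1 column multiplexers. The layout is a hypothetical assumption for illustration: the upper 8 address bits select the row, the lower 3 select the column, and output bit b comes from the 8 adjacent columns 8b to 8b+7.

```python
ROWS, COLS, WIDTH = 256, 128, 16
MUX = COLS // WIDTH  # 8 columns per output bit, so 8:1 multiplexers


def read_word(array: list[list[int]], address: int) -> list[int]:
    """Read a 16-bit word (hypothetical address split and column layout)."""
    row = address >> 3          # 8 row-address bits go to the row decoder
    col = address & (MUX - 1)   # 3 column-address bits steer every 8:1 mux
    wordline = array[row]       # all 128 cells on the row drive their bitlines
    return [wordline[MUX * b + col] for b in range(WIDTH)]
```

All 128 cells on the selected row are read, and the multiplexers keep only 16 of them.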
Tree Decoder Mux
No external decoder logic needed
[Figure: 8:1 tree mux selecting Y from B0 to B7, with address bits A0, A1, A2 controlling successive levels]
[Figure: 4:1 pass-gate mux on B0 to B3, selected by a decoder on A1, A0]
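A behavioral sketch of the tree mux: each level halves the candidates using one address bit, so the address drives the selection directly with no decoder. It assumes A0 controls the level nearest the inputs, as the figure labels suggest; the function name is hypothetical.

```python
def tree_mux(inputs: list[int], select: list[int]) -> int:
    """2**k:1 tree multiplexer; select[0] is A0 (first level), select[1] is A1, ..."""
    if len(inputs) != 2 ** len(select):
        raise ValueError("need 2**k inputs for k select bits")
    level = list(inputs)
    for bit in select:
        # One 2:1 stage per pair: pass the odd input when this address bit is 1
        level = [level[i + 1] if bit else level[i] for i in range(0, len(level), 2)]
    return level[0]


# B0..B7 = 0..7; A2 A1 A0 = 1 0 1 selects B5
print(tree_mux(list(range(8)), [1, 0, 1]))  # 5
```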
Dual-Ported SRAM
Simple dual-ported SRAM
Two independent single-ended reads
Or one differential write
[Figure: dual-ported cell with wordlines wordA and wordB on bitlines bit and bit_b]
Do two reads and one write by time multiplexing
Read during ph1, write during ph2
Large SRAMs
Large SRAMs are split into subarrays for speed
Ex: UltraSparc 512 KB cache
4 subarrays of 128 KB
Each has 16 banks of 8 KB
256 rows x 256 cols per bank
60% subarray area efficiency
Also space for tags & control
[Shin05]
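The UltraSparc numbers are self-consistent, as a quick count of data cells shows. The helper name is hypothetical, and it counts only the bank cells, not tags or control.

```python
def sram_capacity_bytes(subarrays: int, banks: int, rows: int, cols: int) -> int:
    """Data capacity in bytes: subarrays x banks per subarray x rows x cols bits."""
    return subarrays * banks * rows * cols // 8


KB = 1024
print(sram_capacity_bytes(1, 1, 256, 256) // KB)   # 8   (one bank)
print(sram_capacity_bytes(1, 16, 256, 256) // KB)  # 128 (one subarray)
print(sram_capacity_bytes(4, 16, 256, 256) // KB)  # 512 (whole cache)
```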
Shift Register
Shift registers store and delay data
Simple design: cascade of registers
Watch your hold times!
[Figure: cascade of registers clocked by clk from Din to Dout, annotated 8]
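A cycle-level sketch of the register cascade: each clock edge moves every value one stage along, so Din reaches Dout after as many edges as there are stages. The class name is hypothetical, and hold-time hazards are a circuit issue that a model like this cannot show.

```python
from collections import deque


class ShiftRegister:
    """Cascade of registers; each clock edge shifts data one stage toward Dout."""

    def __init__(self, stages: int) -> None:
        self.regs = deque([0] * stages, maxlen=stages)

    def clock(self, din: int) -> int:
        """Apply one clock edge and return Dout after the edge."""
        self.regs.appendleft(din)  # Din enters stage 0; maxlen drops the last stage
        return self.regs[-1]


sr = ShiftRegister(3)
print([sr.clock(x) for x in [1, 2, 3, 4, 5]])  # [0, 0, 1, 2, 3]
```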
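[Figure: shift register built from a dual-ported SRAM: readaddr and writeaddr counters (reset values 00...00 and 11...11, set by reset) address the SRAM; Din is written and Dout is read on each clk]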
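A sketch of the SRAM-based shift register: data stays in place and only the read and write counters move. Assigning 00...00 to the read counter and 11...11 to the write counter is an assumption made for this sketch, not read off the figure. With both counters incrementing every cycle, a value written now is read depth - 1 cycles later. The class name is hypothetical.

```python
class SramShiftRegister:
    """Delay line in a dual-ported SRAM: move the counters, not the data."""

    def __init__(self, depth: int) -> None:
        if depth < 2:
            raise ValueError("depth must be at least 2")
        self.mem = [0] * depth
        self.depth = depth
        self.read_addr = 0           # assumed reset value 00...00
        self.write_addr = depth - 1  # assumed reset value 11...11

    def clock(self, din: int) -> int:
        dout = self.mem[self.read_addr]   # read port
        self.mem[self.write_addr] = din   # write port, always a different address
        # Both counters wrap around modulo the SRAM depth
        self.read_addr = (self.read_addr + 1) % self.depth
        self.write_addr = (self.write_addr + 1) % self.depth
        return dout


sr = SramShiftRegister(4)
print([sr.clock(x) for x in [1, 2, 3, 4, 5, 6]])  # [0, 0, 0, 1, 2, 3]
```

For long delays this stores one bit per compact SRAM cell instead of one bit per register.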
[Figure: programmable delay from Din to Dout built from shift registers SR1, SR2, SR4, SR8, SR16 and SR32, with select inputs delay0 to delay5; clocked by clk]
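A sketch of how binary-weighted shift registers can give a programmable delay. The mapping is a hypothetical assumption: delay_bits[i] = 1 puts the 2^i-stage register in the path and 0 bypasses it, so the delay in cycles is the binary value of the select bits (0 to 63). The class name is also hypothetical.

```python
from collections import deque


class TappedDelayLine:
    """Programmable delay from 1, 2, 4, 8, 16 and 32-stage shift registers."""

    def __init__(self, delay_bits: list[int]) -> None:
        self.delay = sum(2 ** i for i, on in enumerate(delay_bits) if on)
        # Only registers in the path are modeled; bypassed ones are skipped
        self.stages = [
            deque([0] * 2 ** i, maxlen=2 ** i)
            for i, on in enumerate(delay_bits)
            if on
        ]

    def clock(self, din: int) -> int:
        """Apply one clock edge and return Dout."""
        data = din
        for sr in self.stages:
            out = sr[-1]         # oldest value leaves this register
            sr.appendleft(data)  # new value enters; maxlen drops the oldest
            data = out
        return data


line = TappedDelayLine([1, 1, 0, 0, 0, 0])  # SR1 + SR2 in the path
print(line.delay, [line.clock(x) for x in range(1, 8)])
# 3 [0, 0, 0, 1, 2, 3, 4]
```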
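[Figure: serial in, parallel out: Sin shifted into P0 to P3 on clk]
[Figure: parallel in, serial out: P0 to P3 loaded or shifted under shift/load, output Sout, clocked by clk]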
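A behavioral sketch of the two serial/parallel converters. It assumes each clock shifts toward higher indices and that Sout is taken from the last register, so the bit order survives a SIPO-then-PISO round trip. The function names are hypothetical, and the PISO shifts 0s in behind the data.

```python
def sipo(serial_in: list[int], width: int = 4) -> list[int]:
    """Serial in, parallel out: shift bits in from Sin, then read P0..P(width-1)."""
    regs = [0] * width
    for bit in serial_in:
        regs = [bit] + regs[:-1]   # one clock: every register takes its neighbor's value
    return regs


def piso(parallel_in: list[int]) -> list[int]:
    """Parallel in, serial out: load P0..P(n-1), then shift one bit per clock to Sout."""
    regs = list(parallel_in)       # load cycle
    sout = []
    for _ in range(len(regs)):
        sout.append(regs[-1])      # Sout shows the last register
        regs = [0] + regs[:-1]     # shift cycle
    return sout


print(sipo([1, 0, 1, 1]))         # [1, 1, 0, 1]
print(piso(sipo([1, 0, 1, 1])))   # [1, 0, 1, 1]
```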
Queues
Queues allow data to be read and written at different rates.
Read and write each use their own clock and data
Queue indicates whether it is full or empty
Build with SRAM and read/write counters (pointers)
[Figure: Queue block with WriteClk, WriteData, ReadClk, ReadData, FULL and EMPTY]
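A sketch of a FIFO queue built from SRAM storage plus read and write pointers. Counting the pointers modulo 2 x depth is one common way (chosen here, not stated on the slide) to tell FULL from EMPTY when the two addresses are equal. This is a single-clock model; a real queue with separate WriteClk and ReadClk must also synchronize the pointers across clock domains, which is not shown. The class name is hypothetical.

```python
class Fifo:
    """FIFO sketch: SRAM array plus read/write pointers with an extra wrap bit."""

    def __init__(self, depth: int) -> None:
        self.mem = [0] * depth
        self.depth = depth
        self.wptr = 0  # counts modulo 2 * depth
        self.rptr = 0

    @property
    def empty(self) -> bool:
        return self.wptr == self.rptr

    @property
    def full(self) -> bool:
        return (self.wptr - self.rptr) % (2 * self.depth) == self.depth

    def write(self, data: int) -> None:
        if self.full:
            raise OverflowError("write to full queue")
        self.mem[self.wptr % self.depth] = data   # low bits form the SRAM address
        self.wptr = (self.wptr + 1) % (2 * self.depth)

    def read(self) -> int:
        if self.empty:
            raise IndexError("read from empty queue")
        data = self.mem[self.rptr % self.depth]
        self.rptr = (self.rptr + 1) % (2 * self.depth)
        return data


q = Fifo(2)
q.write(7)
q.write(8)
print(q.full, q.read(), q.read(), q.empty)  # True 7 8 True
```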