Memory Architecture
CMOS VLSI Design
Semiconductor Memory
Harris and Weste, Chapter 12
25 October 2018
J. J. Nahas and P. M. Kogge
Modified from slides by Jay Brockman 2008
[Including slides from Harris & Weste, Ed 4, adapted from Mary Jane Irwin and Vijay Narayanan, CSE Penn State, adaptation of Rabaey's Digital Integrated Circuits, ©2002, J. Rabaey et al.]
Outline
Memory features and comparisons
Generic Memory Architecture
– Architecture Overview
– Row and Column Decoders
– Redundancy and Error Correction
Static Random Access Memory (SRAM)
Dynamic Random Access Memory (DRAM)
Flash (EEPROM) Memory
Other Memory Types
Memory Features and Comparisons
Memory Characteristics
Read/Write Attributes
– Read-Only Memory (ROM): Programmed at manufacture
• Being phased out of use in favor of Flash
– Read-Write Memory: Can change value dynamically
• SRAM, DRAM
– Read-Mostly: Can write, but much more slowly than read
• EEPROM (Electrically Erasable, Programmable Read-Only
Memory) (pronounced "Double E Prom")
• Flash (A form of EEPROM)
Volatility: sensitivity to losing power
– Volatile: loses contents when power turned off
– Non-volatile: does not lose contents
Memory Characteristics
Addressability
– Random-Access: provide address to access a “word” of data
• No correlation between successive addresses
– Block oriented: read and write large blocks of data at a time
– Content-addressable: search memory for match against
partial data
– Serial Access: e.g., FIFO, queue, stack
Wearout
– Some memories have a limited number of write cycles before
their characteristics change enough to render them unusable
Cell Size
– Primary determinant of the physical size of the memory array
Lecture Focus
The most widely used technologies
– SRAM
• Local memory
• Registers, Cache
– DRAM
• Main memory for computers and processors
– Flash and other EEPROMs
• Program and data storage
• Operational parameter storage
– Used in controllers for everything from a key fob to
an automobile engine/transmission.
Memory Features
SRAM DRAM Flash
Non‐volitile No No Yes
Cell Size (F^2) 25 to 40 6 to 8 4 to 5
Read Access Time ~ 0.1 ns ~ 10 ns ~ 10 ns
Write Time ~ 0.1 ns ~ 10 ns N.A.
Erase Time N.A. N.A. ~ 10 ms
Program Time N.A. N.A. ~ 10 us
Wearout N.A. N.A. 10k to 100k
erase/program
cycles
Use in Registers Main Memory Program and
Computer/Processor Cache Data Storage
Systems (L1, L2, L3)
Generic Memory Architecture
Notional 2-Dimensional Memory Architecture (2^n words of 2^m bits each)

[Figure: 2D memory array of 2^n x 2^m bit cells. A row decoder on the left drives word lines Word Line 0 through Word Line 2^n-1 across the 2^n rows; sense amps and bit line drivers sit below the 2^m column bit lines; data bits 0 through 2^m-1 emerge at the bottom; the n-bit address feeds the row decoder.]

• Array of 2^n horizontal word lines and 2^m vertical bit lines, with a "memory bit" at each intersection
• Each row line "interrogates" a word of 2^m bits and places the bits on the output

Key logic cells:
• Bit Cell: when its word line is active, places its current value on its bit line
• Row Decoder: converts the n-bit address into 1 of 2^n word lines
• Sense Amps: convert the minute signal on a bit line into a full-fledged digital signal
• Bit Line Driver: drives input data onto a bit line for storage in a cell
Write Access Signal Flow
Transmit the row address "up" through the row decoder.
The decoder at each row determines if it matches the transmitted row address.
If so, it raises its word line.
As the word line signal goes from left to right, each cell it reaches receives its data from its bit line.
Only one cell per column receives data from its bit line.
Questions
1. How many bits of storage?
2. What is the logic function at each row decoder?
3. What memory parameters have a major effect on access time, and why?
4. What memory parameters affect power?
5. What happens if the memory is "tall and skinny," i.e., n >> m?
6. What happens if the memory is "short and fat," i.e., n << m?
More Accurate Array-Structured Memory Architecture

Assume we still want 2^n words of 2^m bits each, but implement a 2D array of 2^(n-k) rows of 2^(k+m) bits each.

[Figure: the n-bit address splits into an (n-k)-bit row address feeding the row decoder and a k-bit column address feeding the column decode and mux; sense amplifiers and bit line drivers sit between the array and the column mux; Din/Dout is 2^m bits.]

• The n-bit address is split into two parts:
  – "Row" (n-k bits)
  – "Column" (k bits)
• The n-k row address bits are used to decode 1 of 2^(n-k) rows
• All cells on the selected row are sensed simultaneously
• The array reads out 2^k * 2^m bits, called an open row
• The k column address bits select one of 2^k words from the open row
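To make the row/column address split concrete, here is a minimal Python sketch of this organization; the class name, parameters, and sizes are illustrative assumptions, not anything from the slides.

```python
# Minimal model of an array-structured memory: 2^n words of 2^m bits,
# stored physically as 2^(n-k) rows of 2^(k+m) bits each.
class ArrayMemory:
    def __init__(self, n, m, k):
        self.n, self.m, self.k = n, m, k
        self.rows = [[0] * (2 ** (k + m)) for _ in range(2 ** (n - k))]

    def _split(self, addr):
        row = addr >> self.k               # upper n-k bits: row address
        col = addr & ((1 << self.k) - 1)   # lower k bits: column address
        return row, col

    def read(self, addr):
        row, col = self._split(addr)
        open_row = self.rows[row]          # all 2^(k+m) bits sensed at once
        word = open_row[col * 2 ** self.m : (col + 1) * 2 ** self.m]
        return word                        # column mux picks 1 of 2^k words

    def write(self, addr, bits):
        row, col = self._split(addr)
        self.rows[row][col * 2 ** self.m : (col + 1) * 2 ** self.m] = bits

# Example: 2^10 words of 8 bits, folded into 2^6 rows of 128 bits (k = 4).
mem = ArrayMemory(n=10, m=3, k=4)
mem.write(0x2A5, [1, 0, 1, 1, 0, 0, 1, 0])
print(mem.read(0x2A5))
```

Note how the read senses an entire open row but returns only the word that the column bits select.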
A Common Solution: Active Pull-up
[Figure: a bit line with a "high resistance" p-type pull-up to Vdd at the top and a bit cell attached below.]
• Use a "high resistance" p-type at the top of the bit line
• Turn the cell on for access
Dynamic Sensing
With all p-types always on, when the n-types turn on you have "short circuit" current, which wastes power.
Observation: a bit line is a long wire
– With significant capacitance
What if we "pulse" the p-types just before reading?
– Called precharging the bit lines
– Bit lines are all "charged" to a high voltage
Now when we activate the n-types:
– Only bits that represent "0" are pulled low
– NO SHORT CIRCUIT CURRENT
"Sensing" is no longer a static voltage level:
– Charge on the bit line decays with time
– Need a sampling sense amplifier to sense the charge
Row and Column Decoders

Row Decoder: n-bit address in, 2^n rows out
[Figure: 2-to-4 pseudo-nMOS NOR row decoder with VDD pull-up devices; each word line WL[0] through WL[3] is pulled down by a different true/complement combination of the address inputs A1 and A0.]
Standardizing Row Decoder
[Figure: the same 2-to-4 decoder redrawn so that each word line WL[0] through WL[3] is generated by an identical NOR gate with VDD pull-up devices, differing only in which address polarities it receives.]
Row Decoders
Collection of 2^n complex logic gates
Organized in a regular and dense fashion
Two styles: the (N)AND decoder and the NOR decoder
Large Decoders
For n > 4, NAND gates become slow
– Break large gates into multiple smaller gates
[Figure: 4-input decoder for address bits A3 A2 A1 A0, generating word0 through word15, with each wide gate split into a tree of smaller gates.]
Predecoding
Many of these gates are redundant
– Factor out common gates into a predecoder (sketched below)
– Saves area
[Figure: address bit pairs (e.g., A3/A2 and A1/A0) are first decoded into 1-of-4 hot predecoded lines; short gates then combine one line from each group to produce word0 through word15.]
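A small Python sketch of the idea (names are illustrative): decode the address in 2-bit groups into 1-of-4 hot lines, then AND one line from each group to form each word line.

```python
# Predecoding sketch: a 4-bit address is decoded in two 2-bit groups.
# Each group produces 4 "1-of-4 hot" predecoded lines; each of the 16
# word lines is then just a 2-input AND of one line from each group.
def predecode_2bit(a1, a0):
    """Return the 1-of-4 hot lines for a 2-bit address group."""
    value = (a1 << 1) | a0
    return [1 if i == value else 0 for i in range(4)]

def decode_4bit(a3, a2, a1, a0):
    hi = predecode_2bit(a3, a2)   # shared by all 16 word lines
    lo = predecode_2bit(a1, a0)   # shared by all 16 word lines
    # word[i] = hi[i // 4] AND lo[i % 4]
    return [hi[i // 4] & lo[i % 4] for i in range(16)]

word = decode_4bit(1, 0, 1, 1)    # address 0b1011 = 11
assert word.index(1) == 11 and sum(word) == 1
```

The two 4-line predecoders replace the sixteen 4-input gates' shared first level, which is where the area saving comes from.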
4-Input Pass-Transistor-Based Column Decoder/Multiplexer
[Figure: bit lines BL0 through BL3 pass through transistors gated by select lines S0 through S3, which are decoded from column address bits A0 and A1.]
Advantages: speed (its tpd does not add to the overall memory access time); only one extra transistor in the signal path
Disadvantage: large transistor count
Redundancy and Error Correction
Weste and Harris Sections 12.8 and 11.7.2
Redundancy
To improve yield, large memory arrays typically have
redundant rows and columns.
During testing, defective bits, rows, and columns are
identified.
An algorithm then determines which rows and/or columns to replace to avoid the defective bits.
Laser programming or fuses are used to program the replacements into the chip.
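As a toy illustration of the replacement step (not the slides' actual algorithm), here is a greedy sketch that repairs whichever row or column currently holds the most unrepaired defects:

```python
# Toy spare-allocation sketch: greedily repair whichever row or column
# currently contains the most unrepaired defective bits.
from collections import Counter

def allocate_spares(defects, spare_rows, spare_cols):
    defects = set(defects)                 # {(row, col), ...} from test
    used_rows, used_cols = [], []
    while defects and (spare_rows or spare_cols):
        row_hits = Counter(r for r, _ in defects)
        col_hits = Counter(c for _, c in defects)
        best_row, nr = row_hits.most_common(1)[0]
        best_col, nc = col_hits.most_common(1)[0]
        if spare_rows and (nr >= nc or not spare_cols):
            spare_rows -= 1
            used_rows.append(best_row)
            defects = {(r, c) for r, c in defects if r != best_row}
        else:
            spare_cols -= 1
            used_cols.append(best_col)
            defects = {(r, c) for r, c in defects if c != best_col}
    return used_rows, used_cols, defects   # leftover defects: chip fails

rows, cols, left = allocate_spares({(3, 5), (3, 9), (7, 5)}, 1, 1)
print(rows, cols, left)   # repairs row 3 and column 5; no defects left
```

Optimal spare allocation is a harder combinatorial problem in general; this greedy pass just shows the flavor.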
Error Correction
Large memory arrays can also have soft errors due to signals being marginal.
– The larger the number of bits, the larger the distribution of signals.
– What is an acceptable error rate for a memory?
  • 1 error in 10^12 reads is not acceptable
  • 1 error in 10^16 reads is acceptable for most applications
    – A few errors per year at a 1 GHz read rate.
  • 1 error in 10^24 reads is needed for applications in financial institutions
A 64-bit word plus eight parity bits (72 bits total) can be used to correct a one-bit error in the 64-bit word and detect two-bit errors.
– The use of 64-bit single error correction would decrease the error rate from 1 error in 10^12 reads to 1 error in 10^16 reads.
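Here is a hedged Python sketch of Hamming SEC-DED (single error correct, double error detect), the class of code described above. It is shown on 8 data bits for brevity; the same construction on 64 data bits needs 7 Hamming bits plus 1 overall parity bit, the 8 parity bits / 72 total quoted above.

```python
# Hamming SEC-DED sketch. Positions 1, 2, 4, 8, ... hold Hamming parity
# bits; position 0 holds the overall parity used for double-error detect.
def secded_encode(data_bits):
    n_parity = 0
    while (1 << n_parity) < len(data_bits) + n_parity + 1:
        n_parity += 1
    total = len(data_bits) + n_parity + 1      # +1 for overall parity
    code = [0] * total
    it = iter(data_bits)
    for pos in range(1, total):
        if pos & (pos - 1):                    # not a power of two: data
            code[pos] = next(it)
    for p in range(n_parity):                  # set Hamming parity bits
        mask = 1 << p
        code[mask] = sum(code[i] for i in range(1, total) if i & mask) % 2
    code[0] = sum(code) % 2                    # overall (DED) parity
    return code

def secded_decode(code):
    syndrome = 0
    for i, bit in enumerate(code[1:], start=1):
        if bit:
            syndrome ^= i                      # XOR of set-bit positions
    overall = sum(code) % 2
    if syndrome and overall:                   # single error: correct it
        code = code[:]
        code[syndrome] ^= 1
    elif syndrome and not overall:
        raise ValueError("double-bit error detected (uncorrectable)")
    return [b for i, b in enumerate(code) if i and i & (i - 1)]

word = [1, 0, 1, 1, 0, 1, 0, 0]
cw = secded_encode(word)
cw[5] ^= 1                                     # inject a single-bit error
assert secded_decode(cw) == word
```

The syndrome is the XOR of the positions of all set bits, so a single flipped bit reports its own address; the overall parity bit distinguishes one error from two.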
SRAM: Static Random Access Memory

[Figure: 2008 45 nm CMOS processor die photo. SRAM is used in the registers, the L1 and L2 caches in the cores, and the shared L3 cache; 90+% of the chip is SRAM.]
SRAM Cell
[Figure: the 6T SRAM cell drawn two ways: cross-coupled inverters M1/M2 and M3/M4 holding Q and its complement between VDD and GND, with access transistors M5 and M6, gated by the word line WL, connecting the cell to bit lines BL and BL_b.]
There are 2 bit lines per column: true and complement.
The sense amp looks for "1-0" or "0-1".
SRAM Read
Precharge both bitlines high
Then turn on the wordline
One of the two bitlines will be pulled down by the cell
Ex: A = 0, A_b = 1: bit is discharged, bit_b stays high
Read stability: A must not flip
– N1 >> N2
[Figure: simulated waveforms of word, bit, bit_b, and A versus time (ps); A bumps up slightly during the read but does not flip.]
SRAM Write
Drive one bitline high, the other low
Then turn on the wordline
Bitlines overpower the cell with the new value
Ex: A = 0, A_b = 1; driving bit high and bit_b low forces A_b low, then A rises
Writability
– N2 >> P1
[Figure: simulated waveforms of word, bit_b, A, and A_b during the write.]
SRAM Sizing
High bitlines must not overpower inverters during reads
But low bitlines must write the new value into the cell
[Figure: 6T cell with relative strengths annotated: weak pull-ups, medium access transistors on the word line, strong pull-downs on nodes A and A_b.]

SRAM Column Example
[Figure: an SRAM column during read and write: bit cells (word_q1, bit_v1f, bit_b_v1f) shared with more cells on the bitline pair, write circuitry driven by write_q1 and data_s1, read outputs out_v1r and out_b_v1r, and the corresponding simulated waveforms.]
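A hedged sketch of those two sizing constraints as simple ratio checks; the threshold values are illustrative assumptions, and real sign-off uses SPICE rather than rules of thumb.

```python
# Read stability and writability as rough ratio rules of thumb.
# Widths are relative; the thresholds below are illustrative only.
def check_sram_sizing(w_pulldown, w_access, w_pullup,
                      min_cell_ratio=1.5, max_pullup_ratio=1.0):
    # Cell (beta) ratio: pull-down vs. access transistor.
    # Must be large enough that a read does not flip the cell.
    cell_ratio = w_pulldown / w_access
    # Pull-up ratio: pull-up vs. access transistor.
    # Must be small enough that the bitline can overwrite the cell.
    pullup_ratio = w_pullup / w_access
    return {
        "read_stable": cell_ratio >= min_cell_ratio,
        "writable": pullup_ratio <= max_pullup_ratio,
    }

# Strong pull-downs, medium access transistors, weak pull-ups:
print(check_sram_sizing(w_pulldown=3.0, w_access=2.0, w_pullup=1.0))
# {'read_stable': True, 'writable': True}
```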
SRAM Layout
Cell size is critical: 26 x 45 λ (even smaller in industry)
Tile cells sharing VDD, GND, and bitline contacts
[Figure: 6T cell layout with the cell boundary, VDD, and WORD lines labeled.]

Thin Cell
In nanometer CMOS:
– Avoid bends in polysilicon and diffusion
– Orient all transistors in one direction
The lithographically friendly or "thin cell" layout fixes this
– Also reduces the length and capacitance of the bitlines
Bitline Conditioning
Precharge bitlines high before reads
[Figure: precharge circuits on the bit/bit_b pair.]
Sense Amplifiers
Bitlines have many cells attached
– Ex: a 32-kbit SRAM has 128 rows x 256 cols
– 128 cells on each bitline
– Even with shared diffusion contacts, 64C of diffusion capacitance (big C)
– Discharged slowly through small transistors (small I)
t_pd ∝ (C/I) * ΔV
Sense amplifiers are triggered on a small voltage swing (reducing ΔV)
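A back-of-the-envelope sketch of that t_pd relationship; all numeric values below are illustrative assumptions, not from the slides.

```python
# t_pd ~ (C / I) * dV: time for a cell to pull the bitline down by dV.
C_bitline = 64 * 2e-15   # 64 cells' diffusion capacitance, ~2 fF each (assumed)
I_cell = 20e-6           # read current through the small access path, ~20 uA (assumed)
dV_full = 1.0            # full-swing discharge, 1.0 V
dV_sense = 0.1           # swing a sense amp can resolve, 0.1 V

t_full = C_bitline / I_cell * dV_full
t_sense = C_bitline / I_cell * dV_sense
print(f"full swing: {t_full*1e12:.0f} ps, sensed swing: {t_sense*1e12:.0f} ps")
# full swing: 6400 ps, sensed swing: 640 ps -> 10x faster with a sense amp
```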
Differential Pair Amp
A differential pair requires no clock
But it always dissipates static power
[Figure: differential pair sense amp: loads P1/P2 generating sense and sense_b, inputs bit and bit_b on N1 and N2, tail current device N3.]

Clocked Sense Amp
[Figure: clocked sense amplifier: isolation transistors disconnect bit and bit_b when sense_clk fires, and regenerative feedback in a cross-coupled stage amplifies the difference onto sense and sense_b.]
Decoder Layout
Decoders must be pitch-matched to the SRAM cell
– Requires very skinny gates
[Figure: decoder layout strip with address inputs A3, A2, A1, A0 and their complements running vertically between VDD and GND rails, producing a word line output.]
Intel 1103: 1 Kbit (1024 x 1) DRAM
[Image: http://www.cpu-museum.com/Thumbs/Intel-P1103_t.jpg]

256 Mb DRAM: 2008 DDR3 DRAM
[Image: http://www.eetasia.com/IMAGES/EEOL_2008APR24_STOR_NP_01a.jpg]
1-Transistor DRAM Cell
[Figure: 1T1C cell: an access transistor gated by the word line connects the storage capacitor to the bit line.]

DRAM Operation
[Figure: waveforms for the read, write, and refresh operations of the 1T1C cell.]
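The read in the operation figure relies on charge sharing between the small cell capacitor and the much larger bit line capacitance. A hedged sketch with assumed values:

```python
# Charge sharing on a DRAM read: the cell capacitor (C_cell) is dumped
# onto the precharged bit line (C_bit). All values below are assumptions.
C_cell = 30e-15      # ~30 fF storage capacitor
C_bit = 300e-15      # ~300 fF bit line
V_dd = 1.2
V_pre = V_dd / 2     # bit line precharged to VDD/2

for stored, V_cell in (("1", V_dd), ("0", 0.0)):
    # Charge conservation:
    # V_final = (C_cell*V_cell + C_bit*V_pre) / (C_cell + C_bit)
    V_final = (C_cell * V_cell + C_bit * V_pre) / (C_cell + C_bit)
    dV = V_final - V_pre
    print(f"stored {stored}: bit line moves {dV*1000:+.0f} mV")
# The sense amp must resolve this roughly +/-50 mV swing, and the read
# destroys the stored charge, so the cell must be written back.
```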
Early Poly-Diffused Capacitor DRAM Cell
[Figure: cross-section and layout. A metal (M1) word line gates the access transistor; the bit line is an n+ diffusion; the storage capacitor is an inversion layer induced under a biased polysilicon plate over the field oxide, adjacent to the polysilicon gate.]
Capacitor Implementations
[Image: DRAM capacitor structures, http://www.ieee.org/portal/cms_docs_sscs/sscs/08Winter/sunami-fig5.jpg]
Making the DRAM Capacitors
[Figure: process cross-section showing the word line, cell plate, capacitor dielectric layer, insulating layer, and the Si substrate.]
DRAM Sense Amplifier
DRAM Refresh
Memory capacitors discharge by themselves in ~10-100 ms.
A read operation senses the capacitor voltage and, using positive feedback, recharges the capacitor.
All bit cells in a DRAM must therefore be read periodically to refresh the capacitor voltage.
– This periodic reading is called a refresh cycle.
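A quick, hedged estimate of what refresh costs in bandwidth; the parameters are assumptions except the 64 ms interval, which is a typical spec value.

```python
# Refresh overhead: every row must be refreshed once per retention period.
rows = 8192              # rows per bank (assumed)
t_retention = 64e-3      # 64 ms refresh interval (typical spec)
t_row_cycle = 50e-9      # ~50 ns to open and restore one row (assumed)

busy = rows * t_row_cycle            # time spent refreshing per interval
overhead = busy / t_retention
print(f"refresh keeps the bank busy {overhead:.2%} of the time")
# refresh keeps the bank busy 0.64% of the time
```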
Flash Memory

Non-Volatile Memories: The Floating-Gate Transistor (FAMOS)
[Figure: device cross-section: the floating gate sits between the control gate and the channel, separated by thin oxide (tox) layers, with n+ source and drain in a p substrate; schematic symbol with G, S, D terminals.]
Storage is determined by the charge on the floating gate
– "0" = negative charge (extra electrons)
– "1" = no charge
Negative charge on the floating gate "screens" the normal gate, raising the threshold
Charge can take years to "leak off" once placed there
Multi-level flash: different charge levels represent different values
– We are "programming" the Vt of the transistor
NAND Flash Memory
[Figure: a NAND "string": a string select transistor (SSL) at the top, a stack of unit cells sharing polysilicon word lines, and a ground select transistor at the bottom. Each unit cell has a floating gate (FG) over the gate oxide, with ONO dielectric between the floating gate and the control gate.]
NAND Flash Memory
[Figure: top view of the NAND array: select transistors, word lines, active area, STI isolation, bit line contacts, and source line contacts. Courtesy Toshiba.]
64 Gb (8 GB) flash:
• 2 independent planes
• 64K columns/plane
• Thus a 64-kbit page
• Each cell holds 4 bits
• Each string = 64 cells
• Each block has 256 pages
• Each plane has 2K blocks
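Those geometry numbers multiply out to the quoted 64 Gb; a quick check:

```python
# Check that the NAND geometry above really gives 64 Gb.
planes = 2
blocks_per_plane = 2 * 1024
pages_per_block = 256
page_bits = 64 * 1024        # 64K columns -> one 64-kbit page

total_bits = planes * blocks_per_plane * pages_per_block * page_bits
print(total_bits == 64 * 2**30)   # True: 64 Gb = 8 GB
```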
Reading Data
[Figure: NAND string during a read, with the SSL, word lines, GSL, and the bit line labeled.]
• Precharge the bit lines
• SSL & GSL set high
• Set all word lines except the desired page high enough to turn their transistors on, regardless of state
• The bit line then discharges or not depending on the state of the selected page's cell
Programming (Writing) Data
[Figure: NAND string during programming, with the SSL, the selected page's word line, the other word lines, the GSL, and the bit line labeled.]
A cell is "programmed" by placing electrons on its floating gate:
– The word line for the selected page is raised very high (e.g., 20 V) to trigger tunneling
– The word lines for all other pages are held at an intermediate level (10 V), guaranteed to turn their transistors on but not tunnel
– The desired bit values are placed on the bit lines
– If a "0" is on the bit line, electrons tunnel onto the floating gate
Block Erasure
[Figure: floating-gate device during erase: the gate is held at ground while the substrate is raised high, so electrons tunnel off the floating gate.]
Non-Volatile Memories: The Floating-Gate Transistor (FAMOS)
[Figure: device cross-section and schematic symbol, as before.]
• ERASE: Raising the substrate to a high positive voltage (e.g., 20 V) with the gate at ground causes tunneling from the floating gate to the substrate, clearing the floating gate of all charge
• PROGRAM: A high positive voltage on the control gate, with the substrate at ground, causes electrons to tunnel from the substrate to the floating gate, raising the effective threshold of the device and representing a "0"
• Different voltages can store different amounts of charge, changing the threshold
• READ state by applying a voltage (below the "0" threshold) to the control gate and seeing if current flows
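To illustrate the multi-level idea, here is a hedged sketch of reading a 2-bit-per-cell device by stepping the control-gate voltage and noting where the cell starts to conduct; the threshold values are invented for illustration.

```python
# Multi-level cell read sketch: a cell's programmed threshold (Vt)
# encodes 2 bits. Step the control-gate voltage up and see where the
# cell first conducts. All voltages are illustrative assumptions.
VT_LEVELS = [1.0, 2.0, 3.0, 4.0]   # Vt for values 3, 2, 1, 0 (more charge -> higher Vt)
READ_STEPS = [1.5, 2.5, 3.5]       # read voltages placed between the Vt levels

def cell_conducts(v_gate, v_t):
    return v_gate > v_t            # transistor turns on above its threshold

def read_mlc(v_t):
    # Count how many read steps fail to turn the cell on.
    off_count = sum(not cell_conducts(v, v_t) for v in READ_STEPS)
    return 3 - off_count           # uncharged cell (lowest Vt) reads as 3

for value, v_t in ((3, 1.0), (2, 2.0), (1, 3.0), (0, 4.0)):
    assert read_mlc(v_t) == value
```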
Summary
[Figure: floating-gate transistor cross-section, repeated.]
Microcontrollers
An example of SRAM combined with Flash memory
Usually include many other functions

STM ARM Cortex-M3 MCU
[Figure: block diagram of an STM ARM Cortex-M3 microcontroller.]
Other Semiconductor Memory Technologies

4 Mb Toggle MRAM
• 35 ns symmetrical read/write
• Unlimited endurance
• Data retention >> 10 years
• 256K x 16-bit organization
• 3.3 V single power supply
• Fast SRAM pinout
• Consumer temperature range
MRAM Attributes
Non-volatility
– fast writing
– no write endurance limitation
Random access
– no refresh
– no erase/program write sequence
Non-destructive read
[Figure: magnetic tunnel junction bit between a top electrode and a bottom electrode above Write Line 1; a selection transistor, ON for sensing, steers the sense current through the bit and to a reference.]
Elements of the Toggle Bit
• Balanced SAF free layer
• Bit oriented 45° to the lines
• Unipolar currents
• Overlapping pulse sequence
• Pre-read / decision write
[Figure: Write Line 1 (current I1, field H1) and Write Line 2 (current I2, field H2) cross the bit; hard axis and easy axis shown. The overlapping pulse sequence (Write Line 1 on at t0, Write Line 2 on at t1, Write Line 1 off at t2, Write Line 2 off at t3) rotates the free-layer moment, toggling the bit.]
2 x 2 Array Addressing
[Figure: 2 x 2 MRAM array; one axis-2 wire (Y) carries current while the other (Z) carries no current, so only the bit at the intersection of the two energized write lines is selected.]
The Radical Fringe: Carbon Nanotubes
[Image: carbon nanotube memory, Scientific American, Feb. 2005]