
"And you have not been given of knowledge except a little." [Quran 17:85]

8 March 2020 (13 Rajab 1441 AH)

Digital IC Design

Lecture 28
Memory Arrays (1)

Dr. Hesham A. Omran


Integrated Circuits Laboratory (ICL)
Electronics and Communications Eng. Dept.
Faculty of Engineering
Ain Shams University
This lecture is mainly based on “CMOS VLSI Design”, 4th edition, by N. Weste and D. Harris and
its accompanying lecture notes
Chip Subsystems
❑ Chip functions generally can be divided into the following categories:
▪ Datapath operators
▪ Memory arrays
▪ Control structures
▪ Special-purpose cells
• I/O
• Power distribution
• Clock generation and distribution
• Analog and RF

28: Memory Arrays (1) 2


Memory Arrays
❑ Random Access Memory
▪ Read/Write Memory (RAM) (Volatile)
• Static RAM (SRAM)
• Dynamic RAM (DRAM)
▪ Read Only Memory (ROM) (Nonvolatile)
• Mask ROM
• Programmable ROM (PROM)
• Erasable Programmable ROM (EPROM)
• Electrically Erasable Programmable ROM (EEPROM)
• Flash ROM
❑ Serial Access Memory
▪ Shift Registers
• Serial In Parallel Out (SIPO)
• Parallel In Serial Out (PISO)
▪ Queues
• First In First Out (FIFO)
• Last In First Out (LIFO)
❑ Content Addressable Memory (CAM)

28: Memory Arrays (1) 3


Outline
❑ Memory Arrays
❑ SRAM Architecture
▪ SRAM Cell
▪ Decoders
▪ Column Circuitry
▪ Multiple Ports
❑ CAM
❑ Serial Access Memories

28: Memory Arrays (1) 4


Memory Arrays
❑ Like sequencing elements, memory cells used in volatile memories can further be divided
into static and dynamic
❑ Dynamic cells
▪ Use charge stored on a floating capacitor through an access transistor
▪ Charge will leak away through the access transistor even while the transistor is OFF
▪ Must be periodically read and rewritten to refresh their state
❑ Static cells
▪ Use some form of feedback to maintain their state
▪ Faster and less troublesome
▪ Require more area per bit than their dynamic counterparts

28: Memory Arrays (1) 5


Array Architecture
❑ 2^n words of 2^m bits each
❑ Ex: 16 4-bit words (n = 4, m = 2)
❑ Only one row is activated by asserting the wordline
▪ The cells on this wordline drive the bitlines
▪ Column circuitry may contain amplifiers or buffers to
sense the data
❑ Good regularity – easy to design
❑ Very high density if good cells are used
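A minimal behavioral sketch of this organization (illustrative Python, with assumed names; not part of the lecture material): 2^n words of 2^m bits, with exactly one wordline asserted per access.

```python
# Behavioral sketch of a 2**n-word x 2**m-bit array read: the row address asserts
# exactly one wordline, and that row's cells drive the bitlines.
N = 4                                  # n = 4 -> 16 words
WORD_BITS = 4                          # 2**m bits per word (m = 2)

memory = [[0] * WORD_BITS for _ in range(2 ** N)]

def read_word(row_address: int) -> list[int]:
    wordlines = [int(row == row_address) for row in range(2 ** N)]
    assert sum(wordlines) == 1         # only one row may be activated at a time
    return memory[wordlines.index(1)]  # column circuitry would sense/buffer these bits

print(read_word(0b0101))               # word 5 of the 16-word array
```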

28: Memory Arrays (1) 6


Folding
❑ 2^n words of 2^m bits each
❑ If n >> m, fold by 2^k into fewer rows of more columns
❑ Larger memories are built from multiple smaller subarrays
▪ Keep wordlines and bitlines reasonably short and fast
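A quick sketch of the folding arithmetic (assumed parameter names):

```python
# Folding a 2**n-word x 2**m-bit array by 2**k trades rows for columns:
# 2**(n-k) rows of 2**(m+k) columns, which improves the aspect ratio.
def folded_shape(n: int, m: int, k: int) -> tuple[int, int]:
    return 2 ** (n - k), 2 ** (m + k)        # (rows, columns)

# Example: 2 kwords x 16 bits (n = 11, m = 4) folded by 2**3
print(folded_shape(11, 4, 3))                # (256, 128)
```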

28: Memory Arrays (1) 7


Outline
❑ Memory Arrays
❑ SRAM Architecture
▪ SRAM Cell
▪ Decoders
▪ Column Circuitry
▪ Multiple Ports
❑ CAM
❑ Serial Access Memories

28: Memory Arrays (1) 8


SRAM
❑ Static RAMs use a memory cell with internal feedback that retains its value as long as
power is applied
❑ An ordinary flip-flop could accomplish this requirement, but the size is quite large
❑ SRAM is
▪ Denser than flip-flops
▪ Compatible with standard CMOS processes
▪ Faster than DRAM
▪ Easier to use than DRAM
❑ Widely used in applications from caches to register files

28: Memory Arrays (1) 9


12T SRAM Cell
❑ Basic building block: SRAM Cell
▪ Holds one bit of information, like a latch
▪ Must be read and written
❑ 12-transistor (12T) SRAM cell
▪ Use a simple latch connected to bitline
▪ 46 x 75 λ unit cell (good for class projects)
[Figure: 12T SRAM cell schematic — a simple latch accessed through write/write_b and read/read_b pass gates on the bit line]

28: Memory Arrays (1) 10


6T SRAM Cell
❑ Cell size accounts for most of array size
▪ Reduce cell size at expense of peripheral circuitry complexity
❑ 6T SRAM Cell: used in most commercial chips
▪ Data stored in cross-coupled inverters
❑ Read
▪ Precharge bit, bit_b
▪ Raise wordline

❑ Write
▪ Drive data onto bit, bit_b
▪ Raise wordline
❑ Main design challenge
▪ Be weak enough to be overpowered during a write
▪ Yet strong enough not to be disturbed during a read

28: Memory Arrays (1) 11


SRAM Read
❑ Precharge both bitlines high
❑ Then turn on wordline
❑ One of the two bitlines will be pulled down by the cell
❑ Ex: A = 0, A_b = 1
▪ bit discharges, bit_b stays high
▪ But A bumps up slightly
❑ Read stability
▪ A must not flip
▪ PD >> Access
▪ N1 >> N2 (and N3 >> N4)
[Figure: 6T cell schematic (P1/P2, N1–N4) and simulated read waveforms of word, bit, bit_b, A, and A_b vs. time (ps)]

28: Memory Arrays (1) 12
SRAM Write
❑ Drive one bitline high, the other low
❑ Then turn on wordline
❑ Bitlines overpower cell with new value
❑ Ex: A = 0, A_b = 1, bit = 1, bit_b = 0
▪ Force A_b low, then A rises high
❑ Writability
▪ Must overpower feedback inverter
▪ Access >> PU
▪ N2 >> P1 (and N4 >> P2)
[Figure: 6T cell schematic (P1/P2, N1–N4) and simulated write waveforms of word, bit_b, A, and A_b vs. time (ps)]
28: Memory Arrays (1) 13
SRAM Sizing
❑ Readability
▪ PD >> Access
❑ Writability
▪ Access >> PU
❑ High bitlines must not overpower inverters during reads
❑ But low bitlines must write new value into cell
[Figure: 6T cell sizing — weak pMOS pull-ups (P1, P2), medium-strength access transistors (N2, N4), strong nMOS pull-downs (N1, N3)]
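The read-stability constraint can be sanity-checked with a crude resistive-divider model. The sketch below is only an illustration under simplifying assumptions (ON transistors treated as resistors with R proportional to 1/W, illustrative widths); it is not the real transistor-level analysis.

```python
# Crude read-disturb estimate: with A = 0, the precharged bitline pulls node A up
# through the access transistor (N2) while the pull-down (N1) holds it low.
VDD = 1.0
W_N1, W_N2, W_P1 = 2.0, 1.0, 0.5          # illustrative widths: PD > Access > PU
R_N1, R_N2 = 1.0 / W_N1, 1.0 / W_N2       # R ~ 1/W (simplification)

v_bump = VDD * R_N1 / (R_N1 + R_N2)       # divider estimate of the bump on node A
print(f"read bump on A ~ {v_bump:.2f} V") # must stay well below the inverter threshold

assert W_N1 > W_N2 > W_P1                 # readability: PD >> Access; writability: Access >> PU
```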

28: Memory Arrays (1) 14


SRAM Column Example
[Figure: SRAM column example — bitline conditioning above a column of cells; the read path senses bit_v1f/bit_b_v1f into out_v1r/out_b_v1r, while the write path drives the bitlines from data_s1 through write_q1; wordline word_q1; timing waveforms for word_q1, bit_v1f, and out_v1r]
28: Memory Arrays (1) 15
SRAM Layout
❑ Cell size is critical: 26 x 45 λ (even smaller in industry)
❑ Tile cells sharing VDD, GND, bitline contacts

[Figure: 6T SRAM cell layout — BIT, BIT_B, GND, VDD, and WORD routing, with the cell boundary marked]

28: Memory Arrays (1) 16


Thin Cell
❑ In nanometer CMOS → Lithographically friendly thin cell
▪ Avoid bends in polysilicon and diffusion
• Diffusion runs strictly in the vertical direction
• Polysilicon runs strictly in the horizontal direction
▪ Orient all transistors in one direction
▪ Also reduces length and capacitance of bitlines

28: Memory Arrays (1) 17


Commercial SRAMs
❑ Five generations of Intel SRAM cell micrographs
▪ Transition to thin cell at 65 nm
▪ Steady scaling of cell area

28: Memory Arrays (1) 18


Decoders
❑ n:2^n decoder consists of 2^n n-input AND gates
▪ One needed for each row of memory
▪ Build AND from NAND or NOR gates

[Figure: 2:4 decoder row gates in static CMOS and pseudo-nMOS styles — A1, A0 (true and complement) generate word0–word3, with transistor sizes annotated]

28: Memory Arrays (1) 19


Decoder Layout
❑ Decoders must be pitch-matched to SRAM cell
▪ Requires very skinny gates

[Figure: pitch-matched decoder layout — a skinny NAND gate and buffer inverter spanning the A3–A0 true/complement input lines, with VDD, GND, and the word output]

28: Memory Arrays (1) 20


Large Decoders
❑ For n > 4, NAND gates become slow
▪ Break large gates into multiple smaller gates
[Figure: 4:16 decoder built from smaller 2-input gates — A3–A0 generate word0 through word15]

28: Memory Arrays (1) 21


Predecoding
❑ Many of these gates are redundant
▪ Factor out common gates into a predecoder
▪ Saves area
▪ Same path effort
[Figure: predecoded 4:16 decoder — A3–A0 feed two 2:4 predecoders producing 1-of-4-hot predecoded lines, which are combined to generate word0 through word15]
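A behavioral sketch (not the lecture's implementation) of predecoding for the 8:256 case discussed next: each address-bit pair is predecoded into a 1-of-4-hot group, and each wordline ANDs one line from every group.

```python
# Predecode an 8-bit row address into four 1-of-4-hot groups, then form the 256
# wordlines as ANDs of one predecoded line per group. Purely illustrative.
from itertools import product

def predecode(addr: int, n_bits: int = 8) -> list[list[int]]:
    groups = []
    for g in range(n_bits // 2):
        pair = (addr >> (2 * g)) & 0b11
        groups.append([int(i == pair) for i in range(4)])      # 1-of-4 hot
    return groups

def wordlines(addr: int, n_bits: int = 8) -> list[bool]:
    groups = predecode(addr, n_bits)
    return [all(groups[g][sel[g]] for g in range(n_bits // 2))
            for sel in product(range(4), repeat=n_bits // 2)]  # 4**4 = 256 wordlines

assert sum(wordlines(0x5A)) == 1        # exactly one wordline asserted per address
```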

28: Memory Arrays (1) 22


Row Decoder Example
❑ Estimate the delays of 8:256 decoders using static CMOS. Assume the decoder has an
electrical effort of H = 10 and that both true and complementary inputs are available.

❑ The decoder consists of 256 8-input AND gates.


❑ B = 256/2 = 128 because each input is used by half the gates.
❑ Assuming the logical effort of the path (G) is close to 1, the path effort is F = GBH = 1280 and the best number of stages is log4(F) = 5.16
❑ Consider a 6-stage design using three levels of 2-input AND gates
❑ G = [(4/3) × (1)]^3 = 64/27
❑ F = GBH = 3034
❑ P = 3 × (2 + 1) = 9
❑ D = N F^(1/N) + P = 31.8τ ≈ 6.4 FO4
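The numbers above can be reproduced with a few lines of Python (my check of the logical-effort arithmetic, taking 1 FO4 ≈ 5τ):

```python
import math

H = 10                     # electrical effort given in the example
B = 256 / 2                # branching effort: each input drives half of the 256 gates
N = 6                      # chosen number of stages (three NAND2 + inverter pairs)
P = 3 * (2 + 1)            # parasitic delay of three NAND2s and three inverters

F_est = 1 * B * H                                          # first-pass estimate with G ~ 1
print(f"best stages ~ log4(F) = {math.log(F_est, 4):.2f}") # ~5.16

G = (4 / 3) ** 3                                           # three NAND2 (g = 4/3) levels
F = G * B * H                                              # ~3034
D = N * F ** (1 / N) + P                                   # ~31.8 tau
print(f"F = {F:.0f}, D = {D:.1f} tau = {D / 5:.1f} FO4")   # ~6.4 FO4
```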

28: Memory Arrays (1) 23


Hierarchical Wordlines
❑ The wordline is heavily loaded and has a high resistance
▪ Long RC flight time for large arrays
❑ Divide the wordline into global and local segments
▪ Local wordlines (lwl) are shorter and drive smaller group of cells
▪ Global wordlines (gwl) are still long
• but have lighter loads and can be constructed with a wider and thicker level of metal

28: Memory Arrays (1) 24


Static to Dynamic Interface
❑ The wordline generally must be qualified with the clock
for proper bitline timing
❑ Take advantage of the 1-hot nature of decoder outputs
▪ Share the clocked nMOS transistor across multiple
final 2-input AND gates
❑ The sleep transistor only needs to be wide enough to
supply current to a single inverter

28: Memory Arrays (1) 25


Column Circuitry
❑ Some circuitry is required for each column
▪ Bitline conditioning
▪ Sense amplifiers
▪ Column multiplexing

28: Memory Arrays (1) 26


Bitline Conditioning
❑ Precharge bitlines high before reads


[Figure: precharge-only bitline conditioning driving bit and bit_b]

❑ Equalize bitlines to minimize voltage difference when using sense amplifiers

[Figure: precharge-and-equalize bitline conditioning]

28: Memory Arrays (1) 27


Bitline Delay
❑ Bitlines have many cells attached
▪ Ex: 32-kbit SRAM has 128 rows x 256 cols
▪ 128 cells on each bitline
❑ t_pd ∝ (C/I) ΔV
▪ Even with shared diffusion contacts, 64C of diffusion capacitance (big C)
▪ Discharged slowly through small transistors (small I)

[Figure: 6T cell (P1/P2, N1–N4) discharging the heavily loaded bit/bit_b bitlines through its word-gated access transistors]

28: Memory Arrays (1) 28


Bitline Delay Example
❑ A subarray of a large memory is organized as 256 words × 136 bits. Estimate the parasitic
delay of the bitline. Assume the driver and access transistors are unit-sized and that wire
capacitance is comparable to diffusion capacitance.

❑ The bitline has 256 cells attached, but pairs of cells are mirrored to share a bitline, so the
diffusion capacitance is 128C.
❑ Wire capacitance is comparable, so the total capacitance is 256C.
❑ The bitline is pulled down through the driver and access transistors in series, with a total
resistance of 2R.
❑ Therefore, the delay is 512RC, or 34.1 FO4 inverter delays.
❑ This is unacceptably large for many applications.
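A back-of-envelope reproduction of the example (my sketch, taking 1 FO4 ≈ 15 RC as in the logical-effort model):

```python
rows = 256
diffusion_C = rows / 2            # mirrored cell pairs share bitline contacts -> 128C
wire_C = diffusion_C              # wire capacitance assumed comparable to diffusion
total_C = diffusion_C + wire_C    # 256C
pulldown_R = 2                    # unit driver + unit access transistor in series -> 2R

delay = pulldown_R * total_C                      # 512 RC
print(f"{delay:.0f} RC = {delay / 15:.1f} FO4")   # ~34.1 FO4 inverter delays
```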

28: Memory Arrays (1) 29


Sense Amplifier
❑ Bitline sensing can be large-signal (between VDD and GND) or small-signal (e.g., 100–300 mV)
❑ Sense amplifiers are triggered on a small voltage swing (reducing ΔV)
+ Saves the delay of waiting for a full bitline swing
+ Reduces energy consumption if the bitline swing is terminated after sensing
- Requires a timing circuit to indicate when the sense amplifier should fire
- Process variation leads to offsets in the sense amplifier that increase the required
bitline swing
❑ Historically, small SRAM arrays such as register files used large-signal sensing while big
SRAM and DRAM arrays used sense amps
❑ The trend is toward large-signal sensing in nanometer processes

28: Memory Arrays (1) 30


Differential Pair Amp
❑ Differential pair requires no clock
❑ But always dissipates static power

[Figure: differential pair sense amplifier — input pair N1/N2 on bit and bit_b, pMOS loads P1/P2, tail transistor N3, outputs sense and sense_b]

28: Memory Arrays (1) 31


Clocked Sense Amp
❑ Clocked sense amp saves power
❑ Requires sense_clk after enough bitline swing
❑ Isolation transistors cut off large bitline capacitance

[Figure: clocked sense amplifier — isolation transistors gated by sense_clk cut off the large bit/bit_b capacitance from a regenerative cross-coupled pair driving sense and sense_b]

28: Memory Arrays (1) 32


Twisted Bitlines
❑ Sense amplifiers also amplify noise
▪ Coupling noise is severe in modern processes
▪ Try to couple equally onto bit and bit_b
▪ Done by twisting bitlines

[Figure: twisted bitline pairs b0/b0_b through b3/b3_b]

28: Memory Arrays (1) 33


Column Multiplexing
❑ Recall that array may be folded for good aspect ratio
❑ Ex: 2 kword x 16 folded into 256 rows x 128 columns
▪ Must select 16 output bits from the 128 columns
▪ Requires 16 8:1 column multiplexers
❑ Column decoding takes place in parallel with row decoding so it does not impact the critical
path
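A one-glance check of the folding and column-mux arithmetic in this example (sketch):

```python
# 2 kwords x 16 bits folded into 256 rows: how many physical columns, and what
# column-mux ratio does each output bit need?
words, word_bits, rows = 2048, 16, 256
cols = words * word_bits // rows           # 128 physical columns
mux_ratio = cols // word_bits              # 8 -> sixteen 8:1 column multiplexers
print(rows, cols, f"{mux_ratio}:1")
```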

28: Memory Arrays (1) 34


Tree Decoder Mux
❑ Column mux can use pass transistors
▪ Use nMOS only, precharge outputs
❑ One design is to use k series transistors for a 2^k:1 mux
▪ No external decoder logic needed

[Figure: 8:1 tree decoder mux — three levels of series nMOS pass transistors controlled by A0–A2 (true and complement) select one of B0–B7 onto Y, routed to the sense amps and write circuits]

28: Memory Arrays (1) 35


Single Pass-Gate Mux
❑ Or eliminate series transistors with separate decoder

[Figure: 4:1 single pass-gate mux — a separate 2:4 decoder of A1, A0 drives one pass transistor per input B0–B3]

28: Memory Arrays (1) 36


Ex: 2-way Muxed SRAM
❑ Large signal sensing using NMOS pass transistor mux
▪ Mux output is precharged high (why?)
❑ 2-way: read and write through the same mux

2

More More
Cells Cells
word_q1

A0
A0

write0_q1 2 write1_q1

data_v1
28: Memory Arrays (1) 37
Multiple Ports
❑ We have considered single-ported SRAM
▪ One read or one write on each cycle
❑ Multiported SRAMs are needed for register files
❑ Examples:
▪ Multicycle MIPS must read two sources or write a result on some cycles
▪ Pipelined MIPS must read two sources and write a third result each cycle
▪ Superscalar MIPS must read and write many sources and results each cycle

28: Memory Arrays (1) 38


Dual-Ported SRAM
❑ Simple dual-ported SRAM
▪ 6T cells but two wordlines
❑ Two independent single-ended reads (one read appears on bit, while the other appears on bit_b)
▪ Or one differential write
❑ Do two reads and one write by time multiplexing
▪ Read during ph1, write during ph2
[Figure: dual-ported 6T cells [n] and [n+1] sharing bit/bit_b, each with separate wordA and wordB wordlines]

28: Memory Arrays (1) 39


Dual-Ported SRAM
❑ The 8T dual-ported SRAM cell enables independent read and write ports
▪ Each additional read port adds a read wordline (rwl) and two transistors
❑ Read operation does not backdrive the state nodes through the access transistor
▪ Simplifies design trade-offs and allows lower-voltage operation
❑ Intel switched from 6T to 8T cells within the cores for its 45 nm line of Core processors

28: Memory Arrays (1) 40


Large SRAMs
❑ Large SRAMs are split into subarrays (banks) for speed
▪ Trade-off between area and speed
❑ Ex: UltraSparc 512 kB cache
▪ Four 128 kB subarrays
▪ Each has sixteen 8 kB banks
▪ 256 rows x 256 cols per bank

[Shin05]

28: Memory Arrays (1) 41


Outline
❑ Memory Arrays
❑ SRAM Architecture
▪ SRAM Cell
▪ Decoders
▪ Column Circuitry
▪ Multiple Ports
❑ CAM
❑ Serial Access Memories

28: Memory Arrays (1) 42


Content-Addressable Memory (CAM)
❑ Extension of ordinary memory (e.g. SRAM)
▪ Read and write memory as a normal SRAM
▪ Also match to see which words contain a key
❑ The match signal is used to address another SRAM
▪ Like searching in a database

[Figure: CAM block diagram — adr, data/key, read, and write inputs; match output]

28: Memory Arrays (1) 43


10T CAM Cell
❑ Add four match transistors to 6T SRAM (56 x 43 λ unit cell)
❑ Multiple CAM cells in the same word are tied to the same matchline.
❑ The matchline is precharged. The key is placed on the bitlines.
❑ If any cell in a given row does not match:
▪ matchline is pulled low.
▪ If all cells match, it remains high.

[Figure: two 10T CAM cells in one word — each 6T cell (cell/cell_b) plus four match transistors comparing the stored value against bit/bit_b, all pulling on the shared matchline]

28: Memory Arrays (1) 44


CAM Cell Operation
❑ Read and write like ordinary SRAM
❑ For matching:
▪ Leave wordline low
▪ Precharge matchlines
▪ Place key on bitlines
▪ Matchlines evaluate
❑ Miss line
▪ Pseudo-nMOS NOR of match lines
▪ Goes high if no words match
[Figure: 4-word CAM array — row decoder on the left, read/write column circuitry and data at the bottom, matchlines match0–match3, and a clocked weak pseudo-nMOS pull-up generating miss]
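A behavioral sketch (illustrative only) of the match operation just described: every stored word is compared against the key in parallel, and miss is the NOR of the matchlines.

```python
def cam_search(stored_words: list[list[int]], key: list[int]):
    matchlines = []
    for word in stored_words:
        matchline = 1                              # precharged high
        if any(cell != k for cell, k in zip(word, key)):
            matchline = 0                          # any mismatching cell pulls it low
        matchlines.append(matchline)
    miss = int(not any(matchlines))                # pseudo-nMOS NOR of the matchlines
    return matchlines, miss

print(cam_search([[0, 1, 1, 0], [1, 0, 1, 1]], [1, 0, 1, 1]))   # ([0, 1], 0)
```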

28: Memory Arrays (1) 45


Outline
❑ Memory Arrays
❑ SRAM Architecture
▪ SRAM Cell
▪ Decoders
▪ Column Circuitry
▪ Multiple Ports
❑ CAM
❑ Serial Access Memories

28: Memory Arrays (1) 46


Serial Access Memories
❑ Serial access memories do not use an address
▪ Shift Registers
▪ Tapped Delay Lines
▪ Serial In Parallel Out (SIPO)
▪ Parallel In Serial Out (PISO)
▪ Queues (FIFO, LIFO)

28: Memory Arrays (1) 47


Shift Register
❑ Shift registers store and delay data
❑ Simple design: cascade of registers
▪ Watch your hold times!

[Figure: 8-stage shift register — cascaded flip-flops clocked by clk, from Din to Dout]

28: Memory Arrays (1) 48


Denser Shift Registers
❑ Flip-flops aren’t very area-efficient
❑ For large shift registers, keep data in SRAM instead
❑ Move read/write pointers to RAM rather than data
▪ Initialize read address to first entry, write to last
▪ Increment address on each cycle
[Figure: SRAM-based shift register — read-address and write-address counters (reset to 00...00 and 11...11) addressing a dual-ported SRAM; Din written each cycle, Dout read each cycle]
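A software sketch (my illustration, with assumed pointer initialization) of the pointer-based shift register: the data stays put in the RAM while the read and write addresses advance every cycle.

```python
class PointerShiftRegister:
    def __init__(self, depth: int):
        self.ram = [0] * depth                 # dual-ported SRAM contents
        self.depth = depth
        self.read_addr = 0                     # initialized to the first entry
        self.write_addr = depth - 1            # initialized to the last entry

    def cycle(self, din: int) -> int:
        dout = self.ram[self.read_addr]        # read the oldest entry
        self.ram[self.write_addr] = din        # write the newest entry
        self.read_addr = (self.read_addr + 1) % self.depth
        self.write_addr = (self.write_addr + 1) % self.depth
        return dout

sr = PointerShiftRegister(depth=8)
print([sr.cycle(i) for i in range(12)])        # [0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4]
```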

28: Memory Arrays (1) 49


Tapped Delay Line
❑ A tapped delay line is a shift register with a programmable number of stages
❑ Set number of stages with delay controls to mux
▪ Ex: 0 – 63 stages of delay

[Figure: tapped delay line — binary-weighted shift-register segments SR32, SR16, SR8, SR4, SR2, SR1 between Din and Dout, each included or bypassed by a mux controlled by delay5–delay0]
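A small sketch (assumed control-bit ordering) of how the six binary-weighted segments give 0–63 stages of delay:

```python
def total_delay_stages(delay_ctrl: int) -> int:
    segments = [32, 16, 8, 4, 2, 1]                # SR32 ... SR1
    return sum(seg for i, seg in enumerate(segments)
               if (delay_ctrl >> (5 - i)) & 1)     # delay5 selects SR32, delay0 selects SR1

print(total_delay_stages(0b101101))                # 32 + 8 + 4 + 1 = 45 stages
```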

28: Memory Arrays (1) 50


Serial In Parallel Out
❑ 1-bit shift register reads in serial data
▪ After N steps, presents N-bit parallel output

[Figure: 4-bit SIPO — serial input Sin shifted by clk into flip-flops with parallel outputs P0–P3]

28: Memory Arrays (1) 51


Parallel In Serial Out
❑ Load all N bits in parallel when shift = 0
▪ Then shift one bit out per cycle

[Figure: 4-bit PISO — parallel inputs P0–P3 loaded when shift/load = 0, then shifted out on Sout]

28: Memory Arrays (1) 52


Queues
❑ Queues allow data to be read and written at different rates.
❑ Read and write each use their own clock, data
❑ Queue indicates whether it is full or empty
❑ May also provide ALMOST-FULL and ALMOST-EMPTY flags
❑ Build with SRAM and read/write counters (pointers)

[Figure: queue block diagram — WriteClk/WriteData in, ReadClk/ReadData out, FULL and EMPTY flags]

28: Memory Arrays (1) 53


FIFO, LIFO Queues
❑ First In First Out (FIFO)
▪ Used to buffer data between two asynchronous streams
▪ Initialize read and write pointers to first element
▪ Queue is EMPTY
▪ On write, increment write pointer
▪ If write almost catches read, Queue is FULL
▪ On read, increment read pointer
❑ Last In First Out (LIFO)
▪ Also called a stack
▪ Use a single stack pointer for read and write
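A behavioral sketch of the FIFO described above, built from a RAM plus read/write pointers (illustrative; real hardware typically compares pointers plus a wrap bit instead of keeping a count):

```python
class Fifo:
    def __init__(self, depth: int):
        self.ram = [None] * depth
        self.depth = depth
        self.read_ptr = self.write_ptr = 0
        self.count = 0                         # occupancy stands in for a wrap bit

    def empty(self) -> bool:
        return self.count == 0

    def full(self) -> bool:
        return self.count == self.depth

    def write(self, data) -> None:
        assert not self.full()                 # caller must respect FULL
        self.ram[self.write_ptr] = data
        self.write_ptr = (self.write_ptr + 1) % self.depth
        self.count += 1

    def read(self):
        assert not self.empty()                # caller must respect EMPTY
        data = self.ram[self.read_ptr]
        self.read_ptr = (self.read_ptr + 1) % self.depth
        self.count -= 1
        return data

fifo = Fifo(depth=4)
for x in "abcd":
    fifo.write(x)
print(fifo.full(), fifo.read(), fifo.read())   # True a b
```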

28: Memory Arrays (1) 54


Thank you!

28: Memory Arrays (1) 55


Sense Amplifier Timing
❑ Clocked sense amplifiers must be activated at just the right time
▪ Too early: ΔV is not yet sufficient; too late: unnecessarily slow
❑ Use replica cells and bitlines to closely track the access path
❑ The replica column has only 1/r of the cells → it discharges r times faster
❑ saen is asserted when the bitline swing is approximately VDD/r
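A one-line numeric illustration (my sketch, with an assumed ratio r) of the VDD/r relationship:

```python
# If the replica bitline carries 1/r of the cells, it discharges ~r times faster,
# so when it completes its swing and fires saen, the main bitline has moved ~VDD/r.
VDD, r = 1.0, 10
print(f"main bitline swing when saen fires ~ {VDD / r:.2f} V")   # ~0.10 V
```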

28: Memory Arrays (1) 56


Multi-Ported SRAM
❑ Adding more access transistors hurts read stability
❑ Multiported SRAM isolates reads from state node
❑ Single-ended bitlines save area (add inverter within the cell for wb)

28: Memory Arrays (1) 57
