
CMOS VLSI Design - A Circuits and Systems Perspective

The document discusses various types of memory cells, including static RAM (SRAM) and dynamic RAM (DRAM), highlighting their structures, operations, and characteristics. SRAM is noted for its speed and ease of use, while DRAM offers higher density but requires periodic refreshing. The organization of memory arrays and the design considerations for SRAM cells, including read and write operations, are also covered.

Uploaded by

rohithhenugala
Copyright
© All Rights Reserved

498 Chapter 12 Array Subsystems

Like sequencing elements, the memory cells used in volatile memories can further be
divided into static structures and dynamic structures. Static cells use some form of feedback
to maintain their state, while dynamic cells use charge stored on a floating capacitor
through an access transistor. Charge will leak away through the access transistor even
while the transistor is OFF, so dynamic cells must be periodically read and rewritten to
refresh their state. Static RAMs (SRAMs) are faster and less troublesome, but require
more area per bit than their dynamic counterparts (DRAMs).
Some nonvolatile memories are indeed read-only. The contents of a mask ROM are
hardwired during fabrication and cannot be changed. But many nonvolatile memories can
be written, albeit more slowly than their volatile counterparts. A programmable ROM
(PROM) can be programmed once after fabrication by blowing on-chip fuses with a special
high programming voltage. An erasable programmable ROM (EPROM) is programmed
by storing charge on a floating gate. It can be erased by exposure to ultraviolet
(UV) light for several minutes to knock the charge off the gate. Then the EPROM can be
reprogrammed. Electrically erasable programmable ROMs (EEPROMs) are similar, but can
be erased in microseconds with on-chip circuitry. Flash memories are a variant of
EEPROM that erases entire blocks rather than individual bits. Sharing the erase circuitry
across larger blocks reduces the area per bit. Because of their good density and easy
in-system reprogrammability, Flash memories have replaced other nonvolatile memories in
most modern CMOS systems.
Memory cells can have one or more ports for access. On a read/write memory, each
port can be read-only, write-only, or capable of both read and write.
A memory array contains $2^n$ words of $2^m$ bits each. Each bit is stored in a memory
cell. Figure 12.2 shows the organization of a small memory array containing 16 4-bit
words (n = 4, m = 2). Figure 12.2(a) shows the simplest design with one row per word and
one column per bit. The row decoder uses the address to activate one of the rows by asserting
the wordline. During a read operation, the cells on this wordline drive the bitlines,
which may have been conditioned to a known value in advance of the memory access. The
column circuitry may contain amplifiers or buffers to sense the data. A typical memory
array may have thousands or millions of words of only 8–64 bits each, which would lead to
a tall, skinny layout that is hard to fit in the chip floorplan and slow because of the long
vertical wires. Therefore, the array is often folded into fewer rows of more columns. After
folding, each row of the memory contains $2^k$ words, so the array is physically organized as
$2^{n-k}$ rows of $2^{m+k}$ columns or bits. Figure 12.2(b) shows a two-way fold (k = 1) with eight
rows and eight columns. The column decoder controls a multiplexer in the column circuitry
to select $2^m$ bits from the row as the data to access. Larger memories are generally
built from multiple smaller subarrays so that the wordlines and bitlines remain reasonably
short, fast, and low in power dissipation.
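The folding arithmetic above can be checked with a short sketch. The function names are hypothetical, and the choice of the low-order k address bits as the column select is an assumption made for illustration; the text does not say which address bits feed the column decoder.

```python
def array_shape(n, m, k):
    """Return (rows, columns) for 2**n words of 2**m bits with 2**k words per row."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    return 2 ** (n - k), 2 ** (m + k)


def split_address(addr, n, k):
    """Split an n-bit word address into (row index, column select).

    Assumption: the low k bits pick the word within a row and the
    remaining n - k bits drive the row decoder.
    """
    if not 0 <= addr < 2 ** n:
        raise ValueError("address out of range")
    return addr >> k, addr & ((1 << k) - 1)


# The 16-word, 4-bit example of Figure 12.2 (n = 4, m = 2).
unfolded = array_shape(4, 2, 0)   # one row per word
folded = array_shape(4, 2, 1)     # two-way fold
```

With these inputs, `unfolded` is `(16, 4)` and `folded` is `(8, 8)`, matching the eight rows and eight columns described for the two-way fold.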
We begin in Section 12.2 with SRAM, the most widely used form of on-chip memory.
SRAM also illustrates all the issues of cell design, decoding, and column circuitry design.
Subsequent sections address DRAMs, ROMs, serial access memories, CAMs, and PLAs.

12.2 SRAM
Static RAMs use a memory cell with internal feedback that retains its value as long as
power is applied. They have the following attractive properties:
• Denser than flip-flops
• Compatible with standard CMOS processes
• Faster than DRAM
• Easier to use than DRAM
For these reasons, SRAMs are widely used in applications from caches to register files to
tables to scratchpad buffers. The SRAM consists of an array of memory cells along with
the row and column circuitry. This section begins by examining the design and operation
of each of these components. It then considers important special cases of SRAMs, including
multiported register files, large SRAMs, and subthreshold SRAMs.

[FIGURE 12.2 Memory array architecture. (a) Unfolded array of $2^n$ rows × $2^m$ columns with row decoder, wordlines, bitlines, bitline conditioning, and column circuitry; n address bits in, $2^m$ data bits out. (b) Folded array of $2^{n-k}$ rows × $2^{m+k}$ columns; n − k address bits go to the row decoder and k bits to the column decoder.]
12.2.1 SRAM Cells
A SRAM cell needs to be able to read and write data and to hold the data as long as the
power is applied. An ordinary flip-flop could accomplish this requirement, but the size
is quite large. Figure 12.3 shows a standard 6-transistor (6T) SRAM cell that can be an
order of magnitude smaller than a flip-flop. The 6T cell achieves its compactness at the
expense of more complex peripheral circuitry for reading and writing the cells. This is a
good trade-off in large RAM arrays where the memory cells dominate the area. The small
cell size also offers shorter wires and hence lower dynamic power consumption.

[FIGURE 12.3 6T SRAM cell, with bitlines bit and bit_b and wordline word]
The 6T SRAM cell contains a pair of weak cross-coupled inverters holding the state
and a pair of access transistors to read or write the state. The positive feedback corrects
disturbances caused by leakage or noise. The cell is written by driving the desired value
and its complement onto the bitlines, bit and bit_b, then raising the wordline, word. The
new data overpowers the cross-coupled inverters. It is read by precharging the two bitlines
high, then allowing them to float. When word is raised, bit or bit_b pulls down, indicating
the data value. The central challenges in SRAM design are minimizing its size and ensuring
that the circuitry holding the state is weak enough to be overpowered during a write,
yet strong enough not to be disturbed during a read.
SRAM operation is divided into two phases. As described in Section 10.4.6, the
phases will be called φ1 and φ2, but may actually be generated from clk and its complement
clkb. Assume that in phase 2, the SRAM is precharged. In phase 1, the SRAM is read or
written. Timing diagrams will label the signals as _q1 for qualified clocks (φ1 gated with
an enable), _v1 for those that become valid during phase 1, and _s1 for those that remain
stable throughout phase 1.
It is no longer common for designers to develop their own SRAM cells. Usually, the
fabrication vendor will supply cells that are carefully tuned to the particular manufacturing
process. Some processes provide two or more cells with different speed/density trade-offs.
Read and write operations and the physical design of the SRAM are discussed in the
subsequent sections.

12.2.1.1 Read Operation Figure 12.4 shows a SRAM cell being read. The bitlines are
both initially floating high. Without loss of generality, assume Q is initially 0 and thus
Q_b is initially 1. Q_b and bit_b both should remain 1. When the wordline is raised, bit
should be pulled down through driver and access transistors D1 and A1. At the same time
bit is being pulled down, node Q tends to rise. Q is held low by D1, but raised by current
flowing in from A1. Hence, the driver D1 must be stronger than the access transistor A1.
Specifically, the transistors must be ratioed such that node Q remains below the
switching threshold of the P2/D2 inverter. This constraint is called read stability.
Waveforms for the read operation are shown in Figure 12.4(b) as a 0 is read onto bit.
Observe that Q momentarily rises, but does not glitch badly enough to flip the cell.
Figure 12.5 shows the same cell in the context of a full column from the SRAM. During
phase 2, the bitlines are precharged high. The wordline only rises during phase 1; hence,
it can be viewed as a _q1 qualified clock (see Section 10.4.6). Many SRAM cells share the
same bitline pair, which acts as a distributed dual-rail footless dynamic multiplexer.
The capacitance of the entire bitline must be discharged through the access transistor.
The output can be sensed by a pair of HI-skew inverters. By raising the switching
threshold of the sense inverters, delay can be reduced at the expense of noise margin.
The outputs are dual-rail monotonically rising signals, just as in a domino gate.

[FIGURE 12.4 Read operation for 6T SRAM cell. (a) Cell schematic with pullups P1 and P2, access transistors A1 and A2, drivers D1 and D2, storage nodes Q and Q_b, bitlines bit and bit_b, and wordline word. (b) Waveforms of word, bit, bit_b, Q, and Q_b versus time.]

12.2.1.2 Write Operation Figure 12.6 shows the SRAM cell being written. Again, assume
Q is initially 0 and that we wish to write a 1 into the cell. bit is precharged high and
left floating. bit_b is pulled low by a write driver. We know on account of the read
stability constraint that bit will be unable to force Q high through A1. Hence, the cell
must be written by forcing Q_b low through A2. P2 opposes this operation; thus, P2 must
be weaker than A2 so that Q_b can be pulled low enough. This constraint is called
writability. Once Q_b falls low, D1 turns OFF and P1 turns ON, pulling Q high as desired.
Figure 12.7(a) again shows the cell in the context of a full column from the SRAM. During
phase 2, the bitlines are precharged high. Write drivers pull the bitline or its
complement low during phase 1 to write the cell. The write drivers can consist of a pair
of transistors on each bitline for the data and the write enable, or a single transistor
driven by the appropriate combination of signals (Figure 12.7(b)). In either case, the
series resistance of the write driver, bitline wire, and access transistor must be low
enough to overpower the pMOS transistor in the SRAM cell. Some arrays use tristate write
drivers to improve writability by actively driving one bitline high while the other is
pulled low.
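The read and write sequences described above can be summarized in a logic-level sketch. This is only a behavioral illustration under the assumption that the ratio constraints hold; it models no voltages, timing, or transistor strengths, and the class name `SramCell6T` is hypothetical.

```python
class SramCell6T:
    """Logic-level model of a 6T cell; Q_b is always the complement of Q."""

    def __init__(self, q=0):
        self.q = q

    def read(self):
        """Return (bit, bit_b) after the wordline is raised."""
        # Both bitlines are precharged high, then left floating.
        bit, bit_b = 1, 1
        # The side of the cell holding 0 pulls its bitline down.
        if self.q == 0:
            bit = 0
        else:
            bit_b = 0
        # Read stability is assumed, so the stored value is unchanged.
        return bit, bit_b

    def write(self, value):
        """Drive one bitline low and raise the wordline."""
        bit, bit_b = (1, 0) if value else (0, 1)
        # The low bitline overpowers the cell through its access
        # transistor (writability is assumed to hold).
        self.q = 1 if bit_b == 0 else 0


cell = SramCell6T(q=0)
before = cell.read()   # bit pulls down because Q = 0
cell.write(1)
after = cell.read()    # bit_b pulls down because Q_b = 0
```

Here `before` is `(0, 1)` and `after` is `(1, 0)`: in each case exactly one bitline falls, which is what the column circuitry senses.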
[FIGURE 12.5 SRAM column read. Schematic: bitline conditioning clocked by φ2, bitline pair bit_v1f and bit_b_v1f shared by the SRAM cell and more cells, wordline word_q1, and HI-skew (H) sense inverters driving out_v1r and out_b_v1r. Waveforms: φ1, φ2, word_q1, bit_v1f, out_v1r.]

12.2.1.3 Cell Stability To ensure both read stability and writability, the transistors
must satisfy ratio constraints. The nMOS pulldown transistor in the cross-coupled
inverters must be strongest. The access transistors are of intermediate strength, and the
pMOS pullup transistors must be weak. To achieve good layout density, all of the
transistors must be relatively small. For example, the pulldowns could be 8/2 λ, the
access transistors 4/2, and the pullups 3/3. The SRAM cells must operate correctly at all
voltages and temperatures despite process variation.
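As a rough check of the example sizes, the ordering pulldown > access > pullup can be tested with a first-order strength estimate. The sketch below assumes drive strength scales as mobility × W/L and that pMOS mobility is about half that of nMOS; both are simplifying assumptions made here, not values from the text, and the function names are hypothetical.

```python
def strength(width, length, mobility=1.0):
    """First-order relative drive strength: mobility * W / L."""
    return mobility * width / length


def satisfies_ratio_constraints(pulldown, access, pullup):
    """True when pulldown > access (read stability) and access > pullup (writability)."""
    return pulldown > access > pullup


# Example sizes from the text, W/L in units of lambda.
pulldown = strength(8, 2)               # nMOS driver
access = strength(4, 2)                 # nMOS access transistor
pullup = strength(3, 3, mobility=0.5)   # pMOS, assumed half the nMOS mobility

ok = satisfies_ratio_constraints(pulldown, access, pullup)
```

Under these assumptions the relative strengths are 4.0, 2.0, and 0.5, so `ok` is `True`. A real design would confirm the margins by simulation across voltage, temperature, and process corners.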
The stability and writability of the cell are quantified by the hold margin, the read
margin, and the write margin, which are determined by the static noise margin of the cell
in its various modes of operation. A cell should have two stable states during hold and
read operation, and only one stable state during write. The static noise margin (SNM)
measures how much noise can be applied to the inputs of the two cross-coupled inverters
before a stable state is lost (during hold or read) or a second stable state is created
(during write).

[FIGURE 12.6 Write operation for 6T SRAM cell. Waveforms of word, bit_b, Q, and Q_b versus time.]

[FIGURE 12.7 SRAM column write. (a) Column schematic: bitline conditioning clocked by φ2, bitline pair bit_v1f and bit_b_v1f shared by the SRAM cell and more cells, wordline word_q1, and a write driver controlled by write_q1 and data_s1. (b) Write driver alternatives using φ1, write_s1, and data_s1, or the combined signals write0_q1 and write1_q1.]

each quadrupling in memory size. In practice, the wire delay depends on the wire width
and thickness and repeater strategy, but can be several times this lower bound. In processes
beyond the 100 nm generation, sense amplifiers will need larger bitline swings because
their offset voltages are not scaling with the supply voltage. This will add several FO4
inverter delays to the bitline-sensing time.
CACTI (Cache Access and Cycle Time) is another model for cache delay
[Wilton96]. [Agarwal01] extends this model to account for process scaling of wires and
transistors. For caches up to 256 KB, the model predicts an access time of a single-ported
direct-mapped cache with a 32-byte block size in a 50 nm process of roughly

$$D = 1.5\sqrt{C} + 13 \tag{12.7}$$

FO4 delays, where C is the capacity in KB. For example, the access time for a 16 KB cache
is approximately 19 FO4 delays. The model also predicts the delay of a six-ported register
file with 64-bit words to vary from 12 to 16 FO4 delays as the capacity increases from
32 to 256 registers.
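Equation 12.7 is easy to evaluate directly. The sketch below is a plain transcription of the formula; the function name is hypothetical, and the range check reflects the statement that the fit applies to caches up to 256 KB.

```python
import math


def cache_access_fo4(capacity_kb):
    """Access time in FO4 delays from Equation 12.7, for capacity in KB."""
    if not 0 < capacity_kb <= 256:
        raise ValueError("model is stated for caches up to 256 KB")
    return 1.5 * math.sqrt(capacity_kb) + 13


delay_16kb = cache_access_fo4(16)
```

For a 16 KB cache this gives 19 FO4 delays, in agreement with the example in the text.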

12.2.7.3 Power Memory power has dynamic and leakage components. The dynamic
power is proportional to the number of cells in a bank and the number of banks that are
activated (typically 1). For large caches, the dynamic power of the datalines to route the
data out of the cache is also significant. This power grows with the wire length, which
depends on the square root of the capacity. The leakage power is proportional to the total
number of cells in the memory. Dynamic and leakage power both grow linearly with the
number of ports. [Evans95] describes SRAM power modeling further.

12.3 DRAM
Dynamic RAMs (DRAMs) store their contents as charge on a capacitor rather than in a
feedback loop. Thus, the basic cell is substantially smaller than SRAM, but the cell must
be periodically read and refreshed so that its contents do not leak away. Commercial
DRAMs are built in specialized processes optimized for dense capacitor structures. They
offer a factor of 10–20 greater density (bits/cm²) than high-performance SRAM built in a
standard logic process [Nakagome03], but they also have much higher latency. DRAM
circuit design is a specialized art and is the topic of excellent books such as [Keeth07].
This section provides an overview of the general issues.
A 1-transistor (1T) dynamic RAM cell consists of a transistor and a capacitor, as
shown in Figure 12.41(a). Like SRAM, the cell is accessed by asserting the wordline to
connect the capacitor to the bitline. On a read, the bitline is first precharged to $V_{DD}/2$.
When the wordline rises, the capacitor shares its charge with the bitline, causing a voltage
change $\Delta V$ that can be sensed, as shown in Figure 12.41(b). The read disturbs the cell
contents at x, so the cell must be rewritten after each read. On a write, the bitline is driven
high or low and the voltage is forced onto the capacitor. Some DRAMs drive the wordline
to $V_{DDP} = V_{DD} + V_t$ to avoid a degraded level when writing a ‘1.’
[FIGURE 12.41 1T DRAM cell read operation. (a) Cell schematic with wordline word, bitline bit, storage node x, and capacitor $C_{cell}$. (b) Waveforms of word, x, and bit, with bit starting at $V_{DD}/2$ and changing by $\Delta V$.]

The DRAM capacitor $C_{cell}$ must be as physically small as possible to achieve good
density. However, the bitline is contacted to many DRAM cells and has a relatively large
capacitance $C_{bit}$. Therefore, the cell capacitance is typically much smaller than the bitline
capacitance. According to the charge-sharing equation, the voltage swing on the bitline
during readout is

$$\Delta V = \frac{V_{DD}}{2} \cdot \frac{C_{cell}}{C_{cell} + C_{bit}} \tag{12.8}$$

We see that a large cell capacitance is important to provide a reasonable voltage swing.
It also is necessary to retain the contents of the cell for an acceptably long time and to
minimize soft errors. For example, 30 fF is a typical target. The most compact way to
build such a high capacitance is to extend into the third dimension. For example,
Figure 12.42 shows a cross-section and SEM image of trench capacitors etched under the
source of the transistor. The walls of the trench are lined with an oxide-nitride-oxide
dielectric. The trench is then filled with a polysilicon conductor that serves as one
terminal of the capacitor attached to the transistor drain, while the heavily doped
substrate serves as the other terminal. A variety of three-dimensional capacitor
structures have been used in specialized DRAM processes that are not available in
conventional CMOS processes.
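A quick numeric evaluation of Equation 12.8 shows how small the readout swing is. The 30 fF cell capacitance comes from the text; the 1.0 V supply and the 300 fF bitline capacitance (ten times the cell) are assumed values chosen only for illustration, and the function name is hypothetical.

```python
def bitline_swing(vdd, c_cell, c_bit):
    """Bitline voltage swing during readout, per Equation 12.8 (volts)."""
    return (vdd / 2) * c_cell / (c_cell + c_bit)


VDD = 1.0          # volts, assumed supply
C_CELL = 30e-15    # farads, typical target from the text
C_BIT = 300e-15    # farads, assumed to be ten times the cell capacitance

delta_v = bitline_swing(VDD, C_CELL, C_BIT)
```

With these assumptions `delta_v` is about 0.045 V, so the sense amplifier must resolve a swing of only a few tens of millivolts.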

12.3.1 Subarray Architectures


[FIGURE 12.42 Trench capacitor. Cross-section labels: bitline, wordline, n+ diffusions, poly plug, oxide-nitride-oxide dielectric, heavily doped p-substrate.]

Like SRAMs described in Section 12.2.5, large DRAMs are divided into multiple
subarrays. The subarray size represents a trade-off between density and performance.
Larger subarrays amortize the decoders and sense amplifiers across more cells and thus
achieve better array efficiency. But they also are slow and have small bitline swings
because of the high wordline and bitline capacitance. A typical subarray size is 256
words by 512 bits, as shown in Figure 12.43. Array efficiencies are typically 50–60%.
A subarray of this size has an order of magnitude higher capacitance on the bitline than
in the cell, so the bitline voltage swing $\Delta V$ during a read is tiny. The array uses a
sense amplifier to compare the bitline voltage to that of an idle bitline (precharged to
$V_{DD}/2$). The sense amplifier must also be compact to fit the tight pitch of the array.
The low-swing bitlines are sensitive to noise. Three

[FIGURE 12.43 DRAM subarray, with bitlines bit0 through bit511 crossing wordlines word0 through word255]
