Semiconductor Memory Design: Chapter Outline
CHAPTER
8
Semiconductor Memory Design
CHAPTER OUTLINE
8.1 Introduction
8.2 MOS Decoders
8.3 Static RAM Cell Design
8.4 SRAM Column I/O Circuitry
8.5 Memory Architecture
8.6 Summary
References
Problems
8.1 Introduction
Modern digital systems require the capability of storing and retrieving large
amounts of information at high speeds. Memories are circuits or systems that store
digital information in large quantity. This chapter addresses the analysis and design
of VLSI memories, commonly known as semiconductor memories. Today, memory
circuits come in different forms including SRAM, DRAM, ROM, EPROM,
E2PROM, Flash, and FRAM. While each form has a different cell design, the basic
structure, organization, and access mechanisms are largely the same. In this chap-
ter, we classify the different types of memory, examine the major subsystems, and
focus on the static RAM design issues. This topic is particularly suitable for our
study of CMOS digital design as it allows us to apply many of the concepts pre-
sented in earlier chapters.
Recent surveys indicate that roughly 30% of the worldwide semiconductor busi-
ness is due to memory chips. Over the years, technology advances have been driven
by memory designs of higher and higher density. Electronic memory capacity in
digital systems ranges from fewer than 100 bits for a simple function to standalone
chips containing 256 Mb (1 Mb = 2^20 bits) or more.1 Circuit designers usually
1 Recently, a memory chip with 1 Gbit of data storage capacity has been announced.
hod83653_ch08.qxd 6/17/03 2:01 PM Page 360
speak of memory capacities in terms of bits, since a separate flip-flop or other sim-
ilar circuit is used to store each bit. On the other hand, system designers usually state
memory capacities in terms of bytes (8 bits); each byte represents a single alphanu-
meric character. Very large scientific computing systems often have memory capac-
ity stated in terms of words (32 to 128 bits). Each byte or word is stored in a partic-
ular location that is identified by a unique numeric address. Memory storage
capacity is usually stated in units of kilobytes (K bytes) or megabytes (M bytes).
Because memory addressing is based on binary codes, capacities that are integral
powers of 2 are most common. Thus the convention is that, for example, 1K byte
= 1,024 bytes and 64K bytes = 65,536 bytes. In most memory systems, only a single
byte or word at a single address is stored or retrieved during each cycle of memory
operation. Dual-port memories are also available that have the ability to read/write
two words in one cycle.
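The addressing conventions above can be summarized in a short sketch (the helper name is illustrative, not from the text):

```python
# Capacity/addressing conventions: binary addressing makes powers of two natural.
def address_bits(num_locations: int) -> int:
    """Minimum address width needed to select one of num_locations."""
    return (num_locations - 1).bit_length()

KBYTE = 1024                          # 1K byte = 1,024 bytes
assert 64 * KBYTE == 65_536           # 64K bytes = 65,536 bytes
print(address_bits(64 * KBYTE))       # a 64K-location memory needs 16 address bits
```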
[Figure 8.1: Organization of memory systems. An n-bit row address drives a row decoder that enables one of 2^n wordlines of cells; an m-bit column address drives a column decoder/MUX that selects among the 2^m bitlines to route cell data.]
in one row depending on the application. The row and column (or group of
columns) to be selected are determined by decoding binary address information.
For example, an n-bit decoder for row selection, as shown in Figure 8.1, has 2^n output lines, a different one of which is enabled for each different n-bit input code. The
column decoder takes m inputs and produces 2^m bitline access signals, of which
1, 4, 8, 16, 32, or 64 may be enabled at one time. The bit selection is done using a
multiplexer circuit to direct the corresponding cell outputs to data registers. In
total, 2^n × 2^m cells are stored in the core array.
An overall architecture of a 64 Kb random-access memory is shown in Figure
8.2. For this example, n = m = 8. Therefore, the core array has a total of 65,536
cells. The memory uses a 16-bit address to produce a single bit output.
Memory cell circuits can be implemented in a wide variety of ways. In princi-
ple, the cells can be based on the flip-flop designs listed in Chapter 5 since their
intended function is to store bits of data. However, these flip-flops require a sub-
stantial amount of area and are not appropriate when millions of cells are needed.
[Figure 8.2: Overall architecture of memory design. A 16-bit address input is split into n = 8 row bits (row decoder driving 2^n = 256 wordlines) and m = 8 column bits (column decoder driving a column MUX over 2^m = 256 bitlines); column pull-ups, a sense amplifier (sense en), a write driver (write en), and read/write control complete the data path.]
In fact, most memory cell circuits are greatly simplified compared to register and
flip-flop circuits. While the data storage function is preserved, other properties
including quantization of amplitudes, regeneration of logic levels, input-output iso-
lation, and fanout drive capability may be sacrificed for cell simplicity. In this way,
the number of devices in a single cell can be reduced to one to six transistors. Fig-
ure 8.2 illustrates a six-transistor memory cell.
At the level of a memory chip shown in Figure 8.2, the desired logic properties
are recovered through use of properly designed peripheral circuits. Circuits in this
category are the decoders, sense amplifiers, column precharge, data buffers, etc.
These circuits are designed so that they may be shared among many memory cells.
Read-write (R/W) circuits determine whether data are being retrieved or stored,
and they perform any necessary amplification, buffering, and translation of voltage
levels. Specific examples are presented in the following sections.
Erasable programmable read-only memories (EPROMs) also have all bits ini-
tially in one binary state. They are programmed electrically (similar to the PROM),
but all bits may be erased (returned to the initial state) by exposure to ultraviolet
(UV) light. The packages for these components have transparent windows over the
chip to permit the UV irradiation. Electrically erasable programmable read-only
memories (EEPROMs, E2PROM, or E-squared PROMs) may be written and erased
by electrical means. These are the most advanced and most expensive form of
PROM. Unlike EPROMs, which must be totally erased and rewritten to change even
a single bit, E2PROMs may be selectively erased. Writing and erasing operations for
all PROMs require times ranging from microseconds to milliseconds. However, all
PROMs retain stored data when power is turned off; thus they are termed non-
volatile.
A recent form of EPROM and E2PROM is termed Flash memory, a name derived
from the fact that blocks of memory may be erased simultaneously. Flash memory
of the EPROM form is written using the hot-electron effect2 whereas E2PROM
Flash is written using Fowler-Nordheim (FN) tunneling.3 Both types are erased
using FN tunneling. Their large storage capacity has made this an emerging mass
storage medium. In addition, these types of memories are beginning to replace the
role of ROMs on many chips, although additional processing is required to manu-
facture Flash memories in a standard CMOS technology.
Memories based on ferroelectric materials, so-called FRAMs or FeRAMs, can
also be designed to retain stored information when power is off. The Perovskite crys-
tal material used in the memory cells of this type of RAM can be polarized in one
direction or the other to store the desired value. The polarization is retained even
when the power supply is removed, thereby creating a nonvolatile memory. How-
ever, semiconductor memories are preferred over ferroelectric memories for most
applications because of their advantages in cost, operating speed, and physical size.
Recently, FRAMs have been shown to be useful nonvolatile memory in certain
applications such as smart cards and may be more attractive in the future due to
their extremely high storage density.
2 Hot electrons are created by applying a high field in the channel region. These electrons enter the oxide and
raise the threshold voltage of a device. Devices with this higher threshold voltage are viewed as a stored “1.”
Devices with the lower threshold voltage represent a stored “0.”
3 Fowler-Nordheim tunneling occurs through thin insulating material such as thin-oxide associated with the
gate. Current flows through the oxide by tunneling through the energy barrier.
[Figure 8.3: Random access memory timing parameters. Waveforms for the clock, the (n + m)-bit address (Ai, Aj), r/w, and data in, showing the access time tAC, the write setup time tw/setup, and the cycle time tcycle.]
there are write setup operations needed before each memory operation, indicated
by tw/setup . The memory clock cycle time, tcycle , is the minimum time needed to com-
plete successive read or write operations.
The cycle time is essentially the reciprocal of the time rate at which address
information is changed while reading or writing at random locations. Minimum
access times for reading and writing are not necessarily the same, but for simplicity
of design, most systems specify a single time for both reading and writing. For semi-
conductor read-write memories, the read access time is typically 50 to 80% of cycle
time.
[Figure 8.4: AND and NOR decoders. (a) An AND-based decoder and (b) a NOR-based decoder, each taking the true and complement forms of A1 and A0 and activating a different output line for each of the four input combinations 00, 01, 10, 11.]
NOR gates that take every possible combination of the inputs. There are two
address bits in this example and we require both the true and complement of each
address bit. The output line activated by each input combination is shown in the
figure. Note that all outputs are normally low except one.
An n-bit decoder requires 2^n logic gates, each with n inputs. For example, with
n = 6, we need 64 NAND6 gates driving 64 inverters to implement the decoder.
From previous chapters, it is clear that gates with more than 3 or 4 inputs create
large series resistances and long delays. Rather than using n-input gates, it is prefer-
able to use a cascade of gates. Typically two stages are used: a predecode stage and a
final decode stage. The predecode stage generates intermediate signals that are used
by multiple gates in the final decode stage. In Figure 8.5, schematics are shown for
two possible alternatives to implement a 6-input AND gate. We could choose three
[Figure 8.5: Predecoder configurations. Two ways to split a 6-input AND gate into a predecode stage followed by a final decode stage: (a) NAND2 predecode gates feeding a NAND3 final gate, and (b) NAND3 predecode gates feeding a NAND2 final gate.]
NAND2 gates and one NAND3 gate to implement each AND6 gate, as shown in
Figure 8.5a. Alternatively, two NAND3 gates and one NAND2 gate of Figure 8.5b
may be used. The better of the two can be determined using logical effort.
The main advantage of two-level decoding is that a large number of intermedi-
ate signals can be generated by the predecode stage and then reused by the final
decoding stage. The result is a reduction in the number of inputs for each gate. Since
this aspect is not clearly depicted in Figure 8.5, a more complete example for 6
address bits is shown in Figure 8.6. In the predecoder, a total of 12 intermediate
signals are generated from the address bits and their complements. Writing A′ for
the complement of A, the predecoder outputs are A0A1, A0A1′, A0′A1, A0′A1′, A2A3, A2A3′, etc.
These signals may now be used by the final decoding stage to generate the 64
required outputs using NAND3/inverter combinations. This corresponds to the
configuration shown in Figure 8.5a. Each predecoder output drives 16 NAND gates
(i.e., 64 NAND3 gates × 3 inputs each / 12 intermediate signals). Therefore, the
branching effort BE = 16. The delay through the NAND2-inverter-NAND3-
inverter stages can be minimized by sizing the gates using logical effort. A similar
kind of two-level decoder can be constructed using the configuration of Figure 8.5b.
Again, the better of the two approaches can be determined using logical effort. It is
important to minimize the delay through the decoder as it may constitute up to
40% of the clock cycle.
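The branching-effort bookkeeping above can be checked numerically (a sketch only; the actual gate sizing would still be done with logical effort):

```python
# Two-level decode for a 6-bit address (NAND2 predecode + NAND3 final decode,
# the Figure 8.5a configuration): verify the branching-effort count.
n = 6
wordlines = 2 ** n                    # 64 final decoder outputs
predecode_signals = (n // 2) * 4      # 3 address pairs x 4 combinations = 12
final_inputs = wordlines * 3          # 64 NAND3 gates with 3 inputs each
branching_effort = final_inputs // predecode_signals
print(branching_effort)               # each predecode output drives 16 NAND3 inputs
```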
[Figure 8.6: Structure of two-level decoder for a 6-bit address. Address bits A0–A5 feed a predecoder that produces 12 intermediate signals (numbered 1–12), which drive the 64 final-decoder outputs.]
[Figure 8.7: Basic SRAM cell and VTC. Two cross-coupled inverters (inverter 1 and inverter 2) behind wordline access devices form the cell; the butterfly-shaped VTC shows the two stable states ("stored 0" and "stored 1") separated by the switching threshold.]
on flip-flops. The VTC conveys the key cell design considerations for read and write
operations. In the cross-coupled configuration, the stored values are represented by
the two stable states in the VTC. The cell will retain its current state until one of the
internal nodes crosses the switching threshold, VS. When this occurs, the cell will
flip its internal state. Therefore, during a read operation, we must not disturb its
current state, while during the write operation we must force the internal voltage to
swing past VS to change the state.
The six transistor (6T) static memory cell in CMOS technology is illustrated
schematically in Figure 8.8. The cross-coupled inverters, M1, M5 and M2 , M6 , act as
the storage element. Major design effort is directed at minimizing the cell area and
power consumption so that millions of cells can be placed on a chip. The steady-
state power consumption of the cell is controlled by subthreshold leakage currents,
so a larger threshold voltage is often used in memory circuits. To reduce area, the
cell layout is highly optimized to eliminate all wasted area. In fact, some designs
replace the load devices, M5 and M6 , with resistors formed in undoped polysilicon.
[Figure 8.8: 6T SRAM cell. Cross-coupled inverters M1/M5 and M2/M6 store the data at nodes q and q̅; access transistors M3 and M4 connect the nodes to bitlines b and b̅ when the wordline is high.]
[Figure 8.9: Wordline and double bitline configuration. Rows of 6T cells share horizontal wordlines (wl1–wl5) driven by the row decoder and clock; each column shares a differential bitline pair b/b̅, each bitline loaded by a capacitance Cbit, and each wordline presents a capacitance Cword to its driver.]
This is called a 4T cell since there are now only four transistors in the cell. To min-
imize power, the current through the resistors can be made extremely small by using
very large pull-up resistances. Sheet resistance of these resistors is 10 MΩ per square
or higher and the area is minimal. Standby currents are kept in the nanoampere
range. Thus, power and area may be reduced at the expense of extra processing
complexity to form the undoped polysilicon resistors. However, the majority of the
designs today use the conventional 6T configuration of Figure 8.8.
The operation of an array of these cells is illustrated in Figure 8.9. The row
select lines, or wordlines, run horizontally. All cells connected to a given wordline
are accessed for reading or writing. The cells are connected vertically to the bitlines
using the pair of access devices to provide a switchable path for data into and out of
the cell. Two column lines, b and b̅, provide a differential data path. In principle, it
should be possible to achieve all memory functions using only one column line and
one access device. Attempts have been made in this direction, but due to normal
variations in device parameters and operating conditions, it is difficult to obtain
reliable operation at full speed using a single access line. Therefore, the symmetri-
cal data paths b and b̅ as shown in Figure 8.9 are almost always used.
Row selection in CMOS memory is accomplished using the decoders described
in the previous section. For synchronous memories, a clock signal is used in con-
junction with the decoder to activate a row only when read-write operations are
being performed. At other times, all wordlines are kept low. When one wordline goes
high, say wl3 in Figure 8.9, all the cells in that row are selected. The access transistors
are all turned on and a read or write operation is performed. Cells in other rows are
effectively disconnected from their respective wordlines.
The wordline has a large capacitance, Cword, that must be driven by the decoder.
It is comprised of two gate capacitances per cell and the wire capacitance per cell:

Cword = (2 × gate cap + wire cap) × (no. of cells in row)     (8.1)
Once the cells along the wordline are enabled, read or write operations are carried
out. For a read operation, only one side of the cell draws current. As a result, a small
differential voltage develops between b and b on all column lines. The column
address decoder and multiplexer select the column lines to be accessed. The bitlines
will experience a voltage difference as the selected cells discharge one of the two bit-
lines. This difference is amplified and sent to output buffers.
It should be noted that the bitlines also have a very large capacitance due to the
large number of cells connected to them. This is primarily due to source/drain
capacitance, but also has components due to wire capacitance and drain/source
contacts. Typically, a contact is shared between two cells. The total bitline capaci-
tance, Cbit , can be computed as follows:
Cbit = (source/drain cap + wire cap + contact cap) × (no. of cells in column)     (8.2)
During a write operation, one of the bitlines is pulled low if we want to store 0,
while the other one is pulled low if we want to store 1. The requirement for a suc-
cessful write operation is to swing the internal voltage of the cell past the switching
threshold of the corresponding inverter. Once the cell has flipped to the other state,
the wordline can be reset back to its low value.
The design of the cell involves the selection of transistor sizes for all six transis-
tors of Figure 8.8 to guarantee proper read and write operations. Since the cell is
symmetric, only three transistor sizes need to be specified, either M1, M3, and M5 or
M2 , M4 , and M6 . The goal is to select the sizes that minimize the area, deliver the
required performance, obtain good read and write stability, provide good cell read
current, and have good soft error immunity (especially due to α-particles).
[Figure 8.10: Design of transistor sizes for read operation. (a) The cell during a read: with q = 0 and q̅ = 1, the cell current Icell flows from the precharged bitline b through M3 and M1, while b and b̅ each carry a load Cbit. (b) Waveforms: after wl rises, a small differential ΔV develops between b and b̅, and the sense amplifier is triggered a delay Δτ after the wordline edge.]
Upon completion of the read cycle, the wordline is returned to zero and the column
lines can be precharged back to a high value.
When designing the transistor sizes for read stability, we must ensure that the
stored values are not disturbed during the read cycle. The problem is that, as cur-
rent flows through M3 and M1, it raises the output voltage at node q which could
turn on M2 and bring down the voltage at node q, as shown in Figure 8.10b. The
voltage at node q may drop a little but it should not fall below VS. To avoid altering
the state of the cell when reading, we must control the voltage at node q by sizing
M1 and M3 appropriately. We can accomplish this by making the conductance of M1
about 3 to 4 times that of M3 so that the drain voltage of M1 does not rise above VTN.
In theory, the voltage should not exceed VS, but this design must be carried out with
due consideration of process variations and noise. In effect, the read stability
requirement establishes the ratio between the two devices.
The other consideration in the read cycle design is to provide enough cell cur-
rent to discharge the bitline sufficiently within 20 to 30% of the cycle time. Since the
cell current, Icell, is very small and the bitline capacitance is large, the voltage will
drop very slowly at b, as shown in Figure 8.10b. The rate of change of the bitline can
be approximated as follows:
Icell = Cbit (dV/dt)

dV/dt = Icell / Cbit
Clearly, Icell controls the rate at which the bitline discharges. If we want a rapid full-
swing discharge, we can make Icell large. However, the transistors M1 and M3 would
have to be larger. Since we have millions of such cells, the area and power of the
memory would be correspondingly larger. Instead, we choose a different approach.
We attach a sense amplifier to the bitlines to detect the small difference, ΔV,
between b and b and produce a full-swing logic high or low value at the output. The
trigger point relative to the rising edge of the wordline, Δt, for the enabling of the
sense amplifier can be chosen by the designer based on the response characteristics
of the amplifier. If the voltage swing ΔV and a target delay Δt are specified according to Figure 8.10b, then
Icell = Cbit ΔV / Δt
This leads to the cell current value which, in turn, determines the final transistor
sizes for M1 and M3. Alternatively, if the transistor sizes are determined to optimize
the cell area, then the corresponding delay is computed as
Δt = Cbit ΔV / Icell
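Both relations are rearrangements of I = C·dV/dt; a quick numerical sketch (function names are illustrative):

```python
# I_cell = C_bit * delta_V / delta_t, rearranged both ways.
def cell_current(c_bit: float, delta_v: float, delta_t: float) -> float:
    """Cell current needed to develop swing delta_v on c_bit within delta_t."""
    return c_bit * delta_v / delta_t

def bitline_delay(c_bit: float, delta_v: float, i_cell: float) -> float:
    """Time for a fixed cell current to develop swing delta_v on c_bit."""
    return c_bit * delta_v / i_cell

# Values used in the chapter's example: 2 pF bitline, 200 mV swing, 2 ns.
print(cell_current(2e-12, 0.2, 2e-9))   # ~200 uA
print(bitline_delay(2e-12, 0.2, 2e-4))  # ~2 ns
```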
We now establish a rule of thumb for transistor sizes for the read cycle using an
example.
We first eliminate Cox from both sides of the equation. Now, setting Vq = 0.1 V and
ignoring body effect, we obtain

(270 cm²/V·s)(W1/0.1 μm)[(1.2 − 0.4 − 0.1/2)(0.1)] / (1 + 0.1/0.6)
    = (8 × 10^6 cm/s)(W3)(1.2 − 0.1 − 0.4)² / [(1.2 − 0.1 − 0.4) + 0.6]

W1/W3 ≈ 1.7
This ratio would be smaller if body effect were taken into account. The actual val-
ues of the widths depend on the desired rate of change of the bitline voltage, the
delay specification, and cell current. If we require a bitline transition of 200 mV in
2 ns, with a total bitline capacitance of 2 pF, then the cell current is
Icell = Cbit ΔV/Δt = (2 pF)(200 mV)/(2 ns) = 200 μA
This is the average cell current through M1 and M3. As a rough estimate, we could
simply use the current through the access transistor when it turns on:
Icell = (8 × 10^6 cm/s)(W3)(1.6 μF/cm²)(1.2 − 0.1 − 0.4)² / [(1.2 − 0.1 − 0.4) + 0.6] = 200 μA

W3 ≈ 0.4 μm
This implies that W1 ≈ 0.7 μm. These two sizes are larger than we would desire if we
were trying to create a 1 Mbit SRAM. However, this example is intended to show
the steps in the design process.
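The arithmetic in this example can be reproduced directly; the sketch below assumes the velocity-saturation current model implied by the text, with μn = 270 cm²/V·s, vsat = 8 × 10^6 cm/s, Cox = 1.6 μF/cm², VDD = 1.2 V, VTN = 0.4 V, EcL = 0.6 V, and L = 0.1 μm:

```python
# Reproduce the read-stability ratio and cell-current sizing numbers.
# Cox cancels in the W1/W3 ratio, as noted in the text.
mu_n  = 270.0        # electron mobility, cm^2/(V*s)
v_sat = 8e6          # saturation velocity, cm/s
c_ox  = 1.6e-6       # gate-oxide capacitance, F/cm^2
vdd, vtn, ec_l = 1.2, 0.4, 0.6   # supply, threshold, Ec*L (volts)
L  = 0.1e-4          # channel length in cm (0.1 um)
vq = 0.1             # allowed rise at node q during a read, volts

# Current per unit width: M1 in triode, M3 velocity-saturated.
i1_per_w = (mu_n / L) * (vdd - vtn - vq / 2) * vq / (1 + vq / ec_l)
i3_per_w = v_sat * (vdd - vq - vtn) ** 2 / ((vdd - vq - vtn) + ec_l)
ratio = i3_per_w / i1_per_w             # = W1/W3 for equal currents
print(round(ratio, 1))                  # ~1.7

# Cell current for a 200 mV swing on a 2 pF bitline in 2 ns, then W3.
i_cell = 2e-12 * 0.2 / 2e-9             # 200 uA
w3_cm = i_cell / (c_ox * i3_per_w)      # access width in cm
print(round(w3_cm * 1e4, 2))            # ~0.41 um, so W1 ~ 1.7 x W3 ~ 0.7 um
```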
In practice, the device sizes are controlled by the RAM cell area constraints. As a rule
of thumb, we typically use
W1/W3 ≈ 1.5     (8.3)
and then optimize the sizes to provide the proper noise margin characteristics.
[Figure 8.11: Write operation and waveforms for 6T SRAM. (a) During a write, one bitline is driven to 0 and the other to VDD; the pull-up M6 and access device M4 momentarily form a pseudo-NMOS inverter whose output node must be pulled below VS. (b) Waveforms: the low-going bitline is driven before wl rises, after which nodes q and q̅ regeneratively exchange states.]
its drain voltage rises to VDD due to the pull-up action of M5 and M3. At the same
time, M2 turns on and assists M4 in pulling output q to its intended low value. When
the cell finally flips to the new state, the row line can be returned to its low standby
level.
The design of the SRAM cell for a proper write operation involves the transis-
tor pair M6-M4. As shown in Figure 8.11a, when the cell is first turned on for the
write operation, they form a pseudo-NMOS inverter. Current flows through the two
devices and lowers the voltage at node q from its starting value of VDD . The design
of device sizes is based on pulling node q below VS to force the cell to switch via the
regenerative action. This switching process is shown in Figure 8.11b. Note that the
bitline b is pulled low before the wordline goes up. This is to reduce the overall delay
since the bitline will take some time to discharge due to its high capacitance.
The pull-up to pull-down ratio for the pseudo-NMOS inverter can be deter-
mined by writing the current equation for the two devices and setting the output to
VS. To be conservative, a value much lower than VS should be used to ensure proper
operation in the presence of noise and process variations. Based on this analysis, a
rule of thumb is established for M6-M4 sizing:
W4/W6 ≈ 1.5     (8.4)
The two ratios provided in Equations (8.3) and (8.4) are only estimates. One should
remember that the actual values will depend on a number of factors such as area,
speed, and power considerations. However, these two rules of thumb can be used to
validate the solution, once obtained.
Exercise 8.1 Compute the ratio of M6 to M4 for the circuit in Figure 8.11, assuming that node q is to be pulled down to VTN, which is well below the switching threshold.
[Figure 8.12: SRAM cell layout. Bitlines b and b̅ run vertically; VDD and Gnd are routed in Metal1; the wordline wl runs horizontally; transistors M1–M6 are arranged so that source/drain contacts are shared with mirrored neighboring cells.]
near the bottom of the cell. VDD is routed in Metal1 at the top of the cell while Gnd
is routed in Metal1 near the middle of the cell. The source/drain contacts are shared
between pairs of neighboring cells by mirroring the cell vertically. The capacitance
of the contacts per cell is therefore half the actual value due to sharing. The cell indi-
cated by the center bounding box is replicated to create the core array. This cell is
approximately 40λ by 30λ. Note that the substrate and well contacts are contained
inside the cell. Removal of substrate and well plugs from the cell would result in a
smaller cell.
The large number of devices connected to the wordline and bitlines gives rise
to large capacitance (and resistance) values as described earlier. The row lines are
routed in both Metal1 and poly to reduce resistance, while the bitlines are routed in
Metal2. Calculations of total capacitance may be carried out using Equations (8.1)
and (8.2).
Problem:
What is the capacitance of the wordline and the bitlines for a 64K SRAM that uses
the cell layout of Figure 8.12 with access transistors that are 0.5 μm/0.1 μm in size?
The contacts on the bitlines are shared between pairs of cells and have a capacitance of 0.5 fF each. Wire capacitance is 0.2 fF/μm. Assume 0.13 μm technology
parameters. The cell layout is 40λ by 30λ. Note that 1 μm = 20λ.
Solution:
If we were to design a 64K SRAM, it would contain a core array of 256 × 256.
Ignoring the resistance for the moment, the row capacitance would be due to the
gate capacitance of 512 access transistors and the wire capacitance of 256 cells:

Cword = 512 × 2 fF/μm × 0.5 μm + 256 × 30λ × 0.2 fF/μm × (1 μm/20λ) ≈ 589 fF
The bitline capacitance per cell due to the source/drain capacitance of the access
transistors is lower than usual since the voltage drop across the junction is close to
VDD . In addition, there is wire capacitance and a half contact capacitance per cell.
The total is
Cbit = 256 × 0.5 fF/μm × 0.5 μm + 256 × 40λ × 0.2 fF/μm × (1 μm/20λ) + 128 × 0.5 fF ≈ 230 fF
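A quick numerical check of the two results, using the values from the problem statement and solution:

```python
# Wordline/bitline capacitance check for the 256 x 256 array
# (Equations 8.1 and 8.2). Lengths in um, capacitances in fF.
lam = 1.0 / 20.0                 # 1 um = 20 lambda -> lambda = 0.05 um
cells = 256
c_gate_per_um = 2.0              # access-gate capacitance, fF per um of width
c_wire_per_um = 0.2              # wire capacitance, fF/um
w_access = 0.5                   # access-transistor width, um

# (8.1): two access gates per cell plus wire over the 30-lambda cell width.
c_word = 2 * cells * c_gate_per_um * w_access \
         + cells * 30 * lam * c_wire_per_um
print(round(c_word))             # ~589 fF

# (8.2): source/drain cap (0.5 fF/um here) + wire over the 40-lambda cell
# height + half of a shared 0.5 fF contact per cell (128 whole contacts).
c_bit = cells * 0.5 * w_access \
        + cells * 40 * lam * c_wire_per_um \
        + (cells // 2) * 0.5
print(round(c_bit))              # ~230 fF
```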
[Figure 8.13: Column pull-up configurations. Precharge (PC) devices at the top of each column pull both bitlines high before a read, and a balance transistor equalizes b and b̅ (it is important to equalize the bitline voltages before reads). Waveforms show the bitlines starting at VDD, or at VDD − VTN for the saturated enhancement load of part (c), and developing a small differential Δv after wl rises.]
current flow, or latch-based voltage-sensing amplifiers, since the bitlines will establish a differential voltage, Δv.
Figure 8.13c is based on the NMOS saturated enhancement load. Therefore, the
maximum possible voltage on the bitline is VDD − VT. When PC is applied to the
balance transistor, it equalizes the two voltage levels. Once the lines are precharged
high, the PC signal is turned off (raised to VDD ) and then wl goes high. At this point,
the pull-ups are still active so current will flow through one of them into the cell
side with the stored “0.” Again, a steady-state output level will be reached by the cor-
responding bitline, as shown in the figure, although this value will be lower than the
pseudo-NMOS case. This type of pull-up is suitable for differential voltage sensing
amplifiers since the bitline voltages initially start at VDD − VTN. This lower voltage
is needed for a proper biasing and output swing of the differential amplifier, as will
be described later.
The PC signal may be generated in a variety of ways, but typically it is produced
by an address transition detection (ATD) circuit. One form of this circuit is shown in
Figure 8.14. The ATD signal is triggered by any transition on the address inputs. The
basic circuit is comprised of a set of XOR gates, each with a delay element on one of
the inputs. When an address line changes, it causes the XOR gate to generate a short
pulse since the inputs differ in value for a short time. Circuits that generate a short
pulse of this nature are called one-shots, which are part of the monostable family of cir-
cuits. The duration of the pulse, τD, is determined by the delay element. The delay line
may be constructed from a simple inverter chain with an even number of inverters. In
the figure, N is an even number and τPinv is the inverter propagation delay.
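The resulting one-shot pulse width follows directly from the inverter chain; a small sketch (the 25 ps per-stage delay below is an assumed value for illustration):

```python
# ATD pulse width: the delay element is a non-inverting (even-length)
# inverter chain, so tau_D = N * tau_Pinv.
def atd_pulse_width(n_inverters: int, t_pinv: float) -> float:
    if n_inverters % 2 != 0:
        raise ValueError("delay element must be non-inverting (even N)")
    return n_inverters * t_pinv

width = atd_pulse_width(8, 25e-12)
print(width)  # ~200 ps one-shot pulse
```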
[Figure 8.14: Address transition detection (ATD) circuit. Each address input A0 … An+m drives an XOR gate both directly and through a delay element of delay τD = NτPinv; any address transition therefore produces a pulse of width τD, and the pulses are combined by a pseudo-NMOS NOR into the ATD signal.]
Once the pulse is generated, it turns on one of the pull-down transistors of the
pseudo-NMOS NOR gate. A negative going pulse is generated at its output. This is
passed through another inverter to generate the actual ATD signal. Many of the tim-
ing signals in SRAMs are derived from this basic circuit so it is required to drive a
very high capacitance. Therefore, the signal should be properly buffered using logi-
cal effort or any other optimization method. Once generated, it can be inverted and
applied to the bitline precharge elements as the PC signal. The address transitions
usually take place before the beginning of a clock cycle and, as a result, the precharge
operation typically occurs at the end of a previous memory cycle.
[Figure 8.15: Column decoding and multiplexing. The m column-address bits feed a column decoder whose 2^m outputs each enable one column's pass gates onto the shared data lines.]
[Figure 8.16: Column selection. (a) Full CMOS transmission gates connect the selected column to both the sense amplifier and the write driver. (b) Split arrangement: PMOS pass devices connect only to the sense amplifier (used for reads) and NMOS pass devices connect only to the write driver (used for writes).]
The optimal design of the column decoder proceeds in the same way as described
earlier for the row decoder. The transmission gates driven by the decoder are also
sized for optimal speed. They are connected to the sense amplifier for read opera-
tions and the write driver for write operations. This is shown in Figure 8.16a. Note
that the use of the CMOS transmission gate presents a routing problem since each
of the signals driving the pass transistors must be complementary (we are driving
both PMOS and NMOS devices).
The routing can be simplified by realizing that the PMOS device is better at
transmitting high signals while the NMOS device is better at transmitting low signals.
Since the bitlines are near VDD during a read, we should turn on the PMOS device
during a read operation and leave the NMOS device off. During a write, one bitline is
pulled to a low voltage. Therefore, we leave the PMOS device off and only turn on the
NMOS device. It is possible to separate the NMOS and PMOS devices and only turn
them on when needed. This is shown in Figure 8.16b. The NMOS devices are only
connected to the write drivers while the PMOS devices are only connected to the sense
amplifiers since they would be turned on during a read operation.
Now that the lines have been separated, there is one other improvement we can
consider. Rather than a single level of multiplexing, it is possible to reduce the over-
all transistor count by using a tree decoding structure as shown in Figure 8.17. In
this example, we have a 4-bit column address which would normally translate to 16
enable lines. Instead, we use two 2-to-4 decoders that select 1-out-of-4 pass transis-
tors at each level. As we add more levels in the tree, the signal path is slower but the
decoder size is reduced. For this example, we have shown two-level tree decoding
[Figure 8.17: Two-level tree decoder for a 4-bit column address. A 2-to-4 decoder on A0/A1 selects one of four pass transistors at the first level, and a second 2-to-4 decoder on A2/A3 selects one of four at the second level.]
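The decoder-size saving can be sketched numerically (a rough gate count only; exact transistor totals depend on the gate implementations):

```python
# Decoder-size comparison for 1-of-16 column selection (4-bit column address).
single_decoder_gates = 2 ** 4         # one 4-to-16 decoder: 16 gates
tree_decoder_gates = 2 * (2 ** 2)     # two 2-to-4 decoders: 8 gates
series_pass_single, series_pass_tree = 1, 2   # the tree adds a series device,
                                              # which is why it is slower
print(single_decoder_gates, tree_decoder_gates)  # 16 vs 8 decoder gates
```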
[Figure 8.18: Write driver circuit. Precharge devices M7/M8 pull the bitlines high; the write driver (M13–M15), enabled by the column-select and write-data (WD) signals from the column decoder, pulls one bitline low to force the addressed cell's q/q̅ nodes to the new state.]
the bitlines slowly drop in voltage and could potentially cause long access times. To
reduce read access time, the memory is designed so that only a small voltage change
on one column line or the other is needed to detect the stored value. Two or more
amplifying stages are used to generate a valid logic output when the voltage differ-
ence between b and b̅ is about 150–200 mV. Thus the column delay is only due to
the time needed to achieve this small voltage change.
Figure 8.19 shows a simplified version of the read circuitry for a CMOS static
memory. One of the precharge circuits of Figure 8.13 is used to pull the column
lines high. In this case, the columns are biased at VDD by transistors M7, M8. Then,
the address, data (not used during read), and clock signals are applied. Again, the
address signals translate into column enable and wordline activation signals. Usu-
ally the column selection and sense enable are activated at the same time. The sense
amplifier depicted in Figure 8.19 is used to provide valid high and low outputs using
the small voltage difference between inputs b and b̄. The precharge circuit must be consistent with the sense amplifier circuit. Otherwise, the sense amplifier may not operate properly.
Figure 8.19
Basic read circuitry.
4 These are called sense amplifiers when used in memory applications; their role is to sense which bitline is dropping in voltage. Normally, such a circuit is used for small-signal voltage gain rather than large-signal sensing applications.
Figure 8.20
Differential voltage sense amplifier.
large-signal applications, we can study its properties from a more digital point of
view.
The circuit can be divided into three components: the current mirror, common-
source amplifier, and the biasing current source. All transistors are initially placed
in the saturation region of operation so that the gain is large. They also use large val-
ues of channel length, L, to improve linearity.
The two transistors, M3 and M4, act to provide the same current to the two branches of the circuit. That is, the current flowing through M3 is mirrored in M4:

I3 = I4

Any difference in the two currents is due to differences in their VDS values. The transistor M5 sets the bias current, ISS, which depends on the bias voltage VCS. At steady state, the current flowing through each branch of the amplifier should be equal to ISS/2.
The two input transistors, M1 and M2, form a source-coupled differential pair.
The two input voltages, Vi1 and Vi2, are connected to the column lines. The biasing
of the circuit must be done carefully to allow the output node to have a large enough
swing. Specifically, the transistors must be on and in the saturation region of operation for high gain. In order to accomplish this, the inputs to M1 and M2 must be set to approximately VDD − VTN rather than VDD. To understand this, consider the case when the inputs are precharged to VDD. To keep the input devices in saturation, their two drain nodes, N1 and out, would be biased at the saturation voltage:

VN1 = Vout ≈ VDD − VTN

The above simplification is possible since the channel lengths are large.
A problem arises if the two nodes, N1 and out, are biased at this value: both p-channel devices would be at the edge of cutoff. In practice, the PMOS threshold voltage is higher in magnitude than the NMOS threshold voltage. Therefore, both M3 and M4 would be completely off in the steady-state condition. Instead, if we biased the inputs of M1 and M2 at VDD − VTN, then

VN1 = Vout ≈ VDD − 2VTN

Now there is enough headroom for the two PMOS devices to be comfortably on and in saturation. This input bias condition requires the use of the column pull-up circuits of Figure 8.13c.
With the biasing established, the sense amplifier operates as follows. Initially, the bias currents in the two branches are equal and the two inputs are at VDD − VTN. When the voltage at one input decreases, the current in that branch decreases. At the same time, the current in the other branch increases to maintain a total of ISS through M5.
We examine the discharging and charging cases in Figure 8.21. Assume that the input of M1 has dropped below VDD − VTN by the prescribed amount needed to turn it off, as in Figure 8.21a. This implies that the currents in M3 and M4 are both zero. However, since the current in M5 is ISS, it follows that this current must be discharging the output capacitance through M2. Therefore, the output voltage is quickly forced to ground. In the other scenario, depicted in Figure 8.21b, the input Vi2 drops by the prescribed amount and M2 is turned off. Then all the current flows through M1, M3, and M5. The current of M3 is mirrored in M4. Since the current in M2 is zero, this mirrored current must flow to the output to charge it to VDD.
Figure 8.21
Detecting “0” and “1” using a differential sense amplifier.
This is called the slew rate (i.e., dV/dt at the output). Rearranging, the delay through the sense amplifier is

Δt = Cout ΔVout / ISS
To reduce the delay, a large ISS can be selected. However, the power dissipation in steady state is given by

P = ISS VDD

Therefore, a tradeoff exists between speed and power dissipation; both are controlled by the choice of ISS. Once a suitable value of ISS is selected, the W/L of the devices can be determined. For the input devices, the W/L determines the VGS − VTN value. This value is the gate overdrive term that establishes the desired bitline swing. As the input transistor sizes increase, the gate overdrive term decreases. Since we require a small gate overdrive, the input devices must be rather large. The sizes of the other transistors are based on the bias voltages needed at the internal nodes. The complete design of such amplifier circuits falls into the realm of analog circuit design; further details can be obtained by consulting the references at the end of the chapter.
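As a worked sketch of this speed–power tradeoff (the 50 fF load, 0.9 V output swing, and 500 ps target echo the numbers used later in Problem P8.8; VDD = 1.8 V is an assumed supply):

```python
C_OUT = 50e-15      # sense-amp output load, 50 fF
DV_OUT = 0.9        # required output swing, V
T_TARGET = 500e-12  # target sensing delay, s
VDD = 1.8           # assumed supply voltage, V

# Delta-t = C_out * dV_out / I_SS, solved for the bias current:
I_SS = C_OUT * DV_OUT / T_TARGET  # 90 uA

# The price is static power that burns whenever the amp is enabled:
P_STATIC = I_SS * VDD             # 162 uW

# Halving the delay target doubles both the current and the power:
I_FAST = C_OUT * DV_OUT / (T_TARGET / 2)
```

The linear coupling between ISS, delay, and static power is why the bias current is chosen just large enough to meet the access-time target and no larger.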
A second option for the sense amplifier is the latch-based circuit shown in Fig-
ure 8.22. The circuit is effectively a cross-coupled pair of inverters with an enabling
transistor, M1. This circuit relies on the (slower) regenerative effect of inverters to
generate a valid high or low voltage. It is a lower power option since the circuit is
not activated until the required potential difference has developed across the bit-
lines. However, it is slower since it requires a large input voltage difference and is not
as reliable in the presence of noise as the previous sense amplifier.
The initial sequence of operations is similar to the differential sense amplifier
described above. The bitlines are precharged to VDD with either Figure 8.13a or Fig-
ure 8.13b. Then the wordline is enabled and one of the bitlines drops in voltage. As
the bitline differential voltage reaches the prescribed amount, the sense enable is acti-
vated. The timing of the sense enable is critical as described later. For now, assume
that it arrives at the proper time. At this point, the bitline difference is fed into the
cross-coupled inverters. One side drops in voltage faster than the other side, since
one side will always have more gate overdrive than the other. When the voltage drops
below VTN on one side, it turns off the pull-down transistor of the opposite side, and
Figure 8.22
Latch-based sense amplifier.
the pull-up transistor acts to raise the voltage to VDD . This regenerative process is
shown in the timing diagram of Figure 8.22. The device sizing follows previously dis-
cussed methods for flip-flops.
The more important issue is the timing of the SenseEnable signal. If the latch is
enabled too early, the latch may not flip in the proper direction due to noise. If it is
enabled too late, then it will add unnecessary delay to the access time. In addition,
process variations will control the actual timing of the signal. In order to guarantee
that the signal will arrive at the proper time in the presence of process variations,
one needs to introduce a replica circuit that mimics the delay of the actual signal
path, shown in Figure 8.23. Here, the upper path emanating from the clock is the
actual signal path to the bitlines. The SenseEnable should arrive as soon as the bit-
line swing reaches the desired value. By creating a second path (the lower path) that
exhibits the same delay characteristics, we can ensure that the SenseEnable arrives
at the correct time.
Figure 8.23
Replica circuit for sense amplifier clock enable.
The critical path for the read cycle starts from the clock and the address inputs
and terminates at the sense amplifier inputs. The signal flows through the decoder and
generates the wordline, which activates the memory cell that drives the bitlines. The
swing on the bitlines is presented to the sense amplifier. This is the point at which we
wish to enable the sense amplifier. The purpose of a replica circuit is to duplicate the
delays along this path with circuits that correspond to each delay element. Essentially,
we want to have a decoder replica that tracks the gate delays in the real decoder, and
a cell replica that tracks the bitline discharge delay of the actual bit cell.
Note that we have placed the memory cell replica before the decoder in Figure
8.23. It is not appropriate to place the memory cell after the decoders in the replica
path since it would have to drive all the sense amplifiers at the bottom of the mem-
ory. Since a small memory cell does not have the needed drive capability, we place
the replica cell ahead of the decoder. We can keep the memory small and still deliver
a full-swing signal as needed by the input to the decoder replica. The buffers of the
decoder replica can be used to drive the sense amplifiers.
One issue for the replica memory cell in this configuration is that we require
the full-swing output to have the same delay as the small swing on the actual bit-
lines. For example, if the actual cell requires 500 ps to transition by 180 mV, then the
replica memory cell would require approximately 5 ns to transition by 1.8 V. This is
not an acceptable delay in the replica path.
The replica cell should, in fact, be a replica column line with only enough cells
to match the timing of the actual column. This is shown in Figure 8.24. For example,
Figure 8.24
Replica cell design.
if we have 256 bits in the true bitline with a swing of 180 mV, then we only require
roughly 26 cells in the replica circuit to produce a full swing of 1.8 V in the same time
interval. The slight round-off error in the number of cells used is not an issue since we will ensure that the replica path is slightly longer than the actual path delay. With the full swing from the replica path cells, we can drive the decoder replica gates. The needed 26 cells can be cut from a section of the additional columns that are always fabricated alongside the main memory array to avoid “edge effects” on each end.
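The cell count follows from requiring equal delay: the replica column's delay scales with its capacitance (its cell count) and with the required swing, and it must produce the full swing in the time the real column produces the small swing. A sketch with the chapter's numbers:

```python
from math import ceil

def replica_cells(rows, small_swing, full_swing):
    """Cells to place on the replica bitline so that its full-rail
    transition matches the time of the small swing on a real bitline
    loaded by `rows` cells (delay is proportional to capacitance,
    i.e., cell count, and to the required voltage swing)."""
    return ceil(rows * small_swing / full_swing)

n = replica_cells(256, 0.18, 1.8)  # 26 cells, matching the text
```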
To ensure that the SenseEnable does not arrive too early for any reason, we
should add an extra gate delay or two to the decoder replica. Designed in this man-
ner, the SenseEnable will arrive at the proper time in the presence of process and
environmental variations.
Figure 8.25
Basic memory architecture.
The row decoder is placed on the right side and the column multiplexer and bit I/O are located below the core array. If we zoom in on a corner region of the memory, as shown in the right-hand side of Figure 8.25, we see that the row decoder is composed of a predecoder and a final decoder. The decoder drives the wordlines horizontally across the array. Each pair of bitlines feeds a 2:1 column multiplexer (in this case), which is connected to the bitline I/O circuits such as the sense amplifiers and write drivers. Each memory cell is mirrored vertically and horizontally to form the array, as indicated in the figure.
Several factors limit the maximum speed of operation. Delays in the address buffers and decoders naturally increase as the number of inputs and outputs increases. Row lines are typically formed in polysilicon and may have substantial delays due to their distributed RC parameters. A metal line may be placed in parallel and contacted to the poly line to reduce this delay (cf. Figure 8.12). Column lines are usually formed in metal, so resistance is not as significant, but the combined capacitance of the line and the many parallel access transistors connected to it results in a large equivalent lumped capacitance on each of these lines. The large capacitances on the wordlines and bitlines also contribute to excess power dissipation.
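The polysilicon wordline behaves as a distributed RC line, whose 50% delay is roughly 0.38·R_total·C_total (the usual Elmore-style rule of thumb). The per-cell resistance and capacitance below are assumed for illustration only:

```python
def wordline_delay(cells, r_per_cell, c_per_cell):
    # Distributed RC line: delay ~= 0.38 * R_total * C_total.
    r_total = cells * r_per_cell
    c_total = cells * c_per_cell
    return 0.38 * r_total * c_total

# Assumed: 20 ohms of poly and 2 fF of load per cell pitch, 256 columns.
t_poly = wordline_delay(256, 20.0, 2e-15)     # about 1 ns
# A parallel metal strap slashes the resistance per cell pitch:
t_strapped = wordline_delay(256, 0.1, 2e-15)  # a few picoseconds
```

The delay drops in direct proportion to the line resistance, which is why the metal strap contacted to the poly line is so effective.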
In order to reduce delay and power, a number of different partitioning approaches have been used. One technique is the divided wordline (DWL) strategy shown in Figure 8.26 for a 64-Kbit SRAM. Part of the 8 row address bits (6 in this case) is used to define global wordlines; a total of 64 global wordlines are created. These lines do not drive memory cells (i.e., the two access transistors within each cell) and therefore have far less capacitance than the regular wordlines. The remaining 2 bits of the address are used to generate local wordlines that actually drive the
Figure 8.26
Divided wordline strategy to reduce power and delay.
cell access transistors. In this example, four blocks are created and accessed using the
local wordlines. The total cell capacitance is reduced by up to a factor of 4. Therefore,
the power will be reduced greatly. In addition, the delay along the wordlines is also
reduced.
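A rough capacitance tally illustrates the benefit. The unit access-transistor load and the local-driver load are assumptions; the 256-column array split into four blocks follows Figure 8.26:

```python
def wordline_load(columns, blocks, c_access=1.0, c_driver=2.0):
    """Relative capacitance switched per row access.
    c_access: load of one cell's access transistors (unit, assumed);
    c_driver: input load of one local-wordline driver (assumed)."""
    flat = columns * c_access                  # undivided wordline: every cell
    global_wl = blocks * c_driver              # global line sees only drivers
    local_wl = (columns // blocks) * c_access  # one block's cells switch
    return flat, global_wl + local_wl

flat, dwl = wordline_load(256, 4)  # 256 vs. 72 units: ~3.6x less switching
```

The cell load on any driven wordline falls by exactly the block factor (256 to 64 here); the residual global-line driver load is why the overall saving is "up to" a factor of 4.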
A similar partitioning strategy can be applied to the bitlines, as shown in Fig-
ure 8.27. An architecture without partitioning is shown in Figure 8.27a. For this
case, neighboring pairs of bitlines are multiplexed to produce 128 outputs (i.e.,
there are 128 sense amplifiers in this example). If the bitlines are partitioned into
two sections, the bitline capacitance is reduced by a factor of 2. The proper cell must
be selected using a two-level multiplexing scheme of Figure 8.27b. To achieve the
same bitline swing as in Figure 8.27a would only require roughly half the time. Fur-
ther partitioning can be carried out with a corresponding increase in the complex-
ity of multiplexing.
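Since the time to develop a fixed swing at a fixed cell current scales with the capacitance on the sensed segment, halving the segment halves the delay. The per-cell capacitance and cell current below are assumptions:

```python
C_CELL = 2e-15  # assumed bitline capacitance contributed per cell, 2 fF
I_CELL = 50e-6  # assumed cell read current, 50 uA
DV = 0.18       # required bitline swing, V

def swing_time(rows_on_segment):
    # Fixed swing, fixed current: time scales with segment capacitance.
    return rows_on_segment * C_CELL * DV / I_CELL

t_single = swing_time(256)  # undivided bitline
t_split = swing_time(128)   # each half after partitioning: exactly half
```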
Figure 8.27
Bitline partitioning to reduce delay. (a) Single-level MUX; (b) two-level MUX.
8.6 Summary
This chapter has focused on the application of material in previous chapters to the
design of an SRAM. Modern memories are, of course, much more complicated but
most of them can be understood with the basic concepts that have been presented
in this chapter. Since memory is very regular in nature, the design process has
reached a point where it can be implemented in software. Today, there are CAD
tools called memory compilers that can generate a memory design within minutes.
While the sizes of the memories that can be generated are still limited to some
degree, the automatic synthesis approach for memories will be used more fre-
quently in the future. This concludes the discussion on SRAM circuits. Other types
of memory circuits are described in Chapter 9.
REFERENCES
1. B. Prince, Emerging Memories: Technologies and Trends, Kluwer Academic Publishers,
Boston, MA, 2002.
2. K. Itoh, VLSI Memory Chip Design, Springer-Verlag, Heidelberg, 2001.
3. J. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits: A Design Perspective, Second Edition, Prentice-Hall, Upper Saddle River, NJ, 2003.
4. S. M. Kang and Y. Leblebici, CMOS Digital Integrated Circuits: Analysis and Design, Third Edition, McGraw-Hill, New York, 2003.
5. H. Veendrick, Deep-Submicron CMOS ICs, Second Edition, Kluwer Academic Publishers,
Boston, MA, 2000.
6. J. P. Uyemura, CMOS Logic Circuit Design, Kluwer Academic Publishers, Boston, MA,
1999.
7. B. Razavi, Design of Analog CMOS Integrated Circuits, McGraw-Hill, New York, 2001.
PROBLEMS
P8.1. What are the main differences between the ROM, SRAM, DRAM,
EPROM, E2PROM, and Flash? Which is the most popular memory for
embedded applications (i.e., on the same chip as the processor logic
blocks)? Describe suitable applications for each one.
P8.2. Draw the circuit equivalent for the 6T SRAM of Figure 8.12. Estimate
the height and width of the cell. Assume that contacts are 4λ by 4λ.
P8.3. Implement an 8-bit decoder using NAND2, NAND3, and NAND4 logic
gates and inverters, following a two-level decoding scheme. You do not
have to design the sizes of the transistors. Why is a two-level scheme pre-
ferred over a multilevel scheme? What is the branching effort of the
decoder from input to output (see Chapter 6 for the definition of this
term)?
P8.4. Consider the SRAM cell of Figure 8.8 with a stored 0 on the left side and
a stored 1 on the right side. Design the transistors of the SRAM such that
node q does not exceed VTN during a read operation and node q drops
below VS during a write operation. The desired cell current during a
read operation is 300 μA. Use 0.18 μm technology parameters.
P8.5. Redesign the SRAM cell of the previous problem by assuming worst-
case conditions for a read operation as follows: the threshold voltage of
M3 is reduced by 10%, the width of M3 is increased by 10%, the threshold
voltage of M2 is decreased by 10%, and the width of M2 is increased by
10%. Explain why this is considered to be worst case for a read opera-
tion. Simulate the circuit in SPICE to demonstrate its operation under
worst-case conditions.
P8.6. Redesign the SRAM cell of the previous problem by assuming worst-
case conditions for a write operation as follows: the threshold voltage of
M4 is increased by 10%, the width of M4 is decreased by 10%, the threshold voltage of M1 is decreased by 10%, and the width of M1 is increased
by 10%. Explain why this is considered to be worst-case for the write
operation. Simulate the circuit in SPICE to demonstrate its operation
under worst-case conditions.
P8.7. Consider the 6T SRAM cell of Figure 8.8. Replace M5 and M6 by poly
resistors that are 100 MΩ in value. Explain how this new 4T cell works
for read and write operations. How does the internal node get pulled to
a high value? Is the new cell static or dynamic?
P8.8. For the sense amplifier shown in Figure 8.20, answer the following questions using 0.18 μm technology parameters:
(a) If the sense amplifier is driving a load of 50 fF in 500 ps, what is the
required value of ISS?
(b) In order to turn off the input transistors with a bitline swing of
100 mV, what values of W/L are needed?
(c) Which of the three column pull-up configurations of Figure 8.13
would be used with this sense amplifier? What is the initial voltage
at the inputs to the sense amp?
(d) Given the size of the input transistors, what is the steady-state volt-
age at the gate node and the resulting size of M5?
(e) Choose the sizes of M3 and M4 to establish a suitable steady-state
output voltage.
Figure D8.1
SRAM layout and timing information.
(c) You can implement the predecoder in either 2 or 4 stages as shown in Fig-
ure D8.2. Which of the two approaches is better? Using logical effort,
decide which architecture you are going to use. For both cases, the final
decode stage is a NAND2-INV combination. Ignore any sideload capaci-
tances.
(d) Next choose the correct design for the decoder, and size the gates to opti-
mize performance. What is your estimate of the delay of the decoder in
terms of FO4 delay (include the delay from the parasitics of the gates in
your estimate)?
(e) Now include the sideload at the outputs of the predecoder stage. The side-
load is due to the wire running vertically in Figure D8.2. Compute the
actual sizes when the sideload is included.
(f) Compare the hand-calculated delay against SPICE.
Figure D8.2
Decoder topology, branching, and wire loads (the vertical predecoder output wire spans 10240λ = 2048 µm).
of the inverter), and (3) Make the transistors as small as possible. One
assumption you could make is that the bitline is around 0 V during the
write operation. For the read operation, assume that the bitline acts like a
supply voltage at VDD . Rules of thumb are available for the design but you
may need to adjust these values to make the cell work properly. All transistors should be specified in integer units of λ. The minimum channel length/width is 2λ.
(b) Estimate VS of the inverters used in the cell. Why is this number important
in the cell design? Use SPICE to verify the dc parameters you calculated.
Provide both VOL for the read operation and VS as measured in HSPICE.
Adjust any device sizes as necessary to achieve the design specifications.
(c) Using hand calculations, estimate the current that the cell can draw from
the bitline when the wordline goes high. Use an average between the initial
value and the steady-state value to estimate this current.
(d) Next, calculate the capacitance of the bitline. There are 256 rows of cells and, from Design Problem 1, each cell is 40λ × 40λ. The column pull-up transistors are each 50λ. Assume that each contact has a capacitance of 1 fF.
(e) Next compute the time required to discharge one of the bitlines by 180 mV.
(f) The last part of the read cycle involves the sense amplifier. The design of the sense amp has been provided in P8.8 using Figure 8.20. Compute the discharge time assuming that the bias current is 300 μA and the output capacitance is 50 fF. Assume a voltage swing at the output of the sense amp of 0.9 V.
(g) The next step of the design is the write circuit of Figure 8.18. To write the
cell, the bitline has to be driven low enough so that the cell switches. Ide-
ally, the write time should be about the same speed as the read time. If we
work to make it faster, we are wasting our effort because the read will be the
limiting factor. If it is slower than the read, then the write is in the critical
path and that is not desirable either. Using a simple RC model, determine
the effective resistance needed to get the bitlines to swing down to VDD /5
in the same amount of time it takes to do the read. Size M13, M14, and M15
so that they have this effective resistance.
(h) As a last step, we need to set the values of the column pull-up transistors of
Figure 8.18. They need to pull the bitlines up after a write but before the
end of the precharge phase. How big do they need to be? The final differ-
ence of the bitlines should be less than 10% of the desired read swings to
avoid confusing the sense amp. Typically, we make them twice the size of
the pull-down devices for a fast precharge. Size these transistors accord-
ingly.
(i) Check your results using SPICE.