Introduction To Flash Memory: Proceedings of The IEEE May 2003
All content following this page was uploaded by Alberto Modelli on 15 November 2013.
Invited Paper
The most relevant phenomenon of this past decade in the field of semiconductor memories has been the explosive growth of the Flash memory market, driven by cellular phones and other types of electronic portable equipment (palmtop, mobile PC, mp3 audio player, digital camera, and so on). Moreover, in the coming years, portable systems will demand even more nonvolatile memories, either with high density and very high writing throughput for data storage applications or with fast random access for code execution in place. The strong consolidated know-how (more than ten years of experience), the flexibility, and the cost make Flash memory a widely used, well-consolidated, and mature technology for most nonvolatile memory applications. Today, Flash sales represent a considerable share of the overall semiconductor market.
Although different types of Flash cells and architectures have been proposed in the past, today two of them can be considered industry standards: the common-ground NOR Flash, which, thanks to its versatility, addresses both the code and data storage segments, and the NAND Flash, optimized for the data storage market.
This paper will mainly focus on the development of the NOR Flash memory technology, with the aim of describing both the basic functionality of the memory cell used so far and the main cell architecture consolidated today. The NOR cell is basically a floating-gate MOS transistor, programmed by channel hot electron (CHE) injection and erased by Fowler–Nordheim (FN) tunneling. The main reliability issues, such as charge retention and endurance, will be discussed, together with the basic physical mechanisms responsible for them. Most of these considerations are also valid for the NAND cell, since it is based on the same floating-gate MOS transistor concept.
Furthermore, an insight into the multilevel approach, where two bits are stored in the same cell, will be presented. In fact, exploiting the multilevel approach at each technology node increases the memory efficiency, almost doubling the density at the same chip size, enlarging the application range, and reducing the cost per bit.
Finally, the NOR Flash cell scaling issues will be covered, pointing out the main challenges. Flash cell scaling has been demonstrated to be feasible and to follow Moore's law down to the 130-nm technology generation. The technology development and the consolidated know-how are expected to sustain the scaling trend down to the 90- and 65-nm technology nodes, as forecast by the International Technology Roadmap for Semiconductors. One of the crucial issues to be solved to allow cell scaling below the 65-nm node is the tunnel oxide thickness reduction, as tunnel thinning is limited by intrinsic and extrinsic mechanisms.
Keywords—Flash evolution, Flash memory, Flash technology, floating-gate MOSFET, multilevel, nonvolatile memory, NOR cell, scaling.
Manuscript received July 1, 2002; revised January 5, 2003.
The authors are with the Central Research and Development Department, Non-Volatile Memory Process Development, STMicroelectronics, 20041 Agrate Brianza, Italy (e-mail: [email protected]).
Digital Object Identifier 10.1109/JPROC.2003.811702
I. INTRODUCTION
The semiconductor market has been increasing continuously over the long term, despite some valleys and peaks, and this growing trend is expected to continue in the coming years (see Fig. 1). A large share of this market, about 20%, is given by semiconductor memories, which are divided into the following two branches, both based on complementary metal–oxide–semiconductor (CMOS) technology (see Fig. 2).
– The volatile memories, like SRAM or DRAM, which, although very fast in writing and reading (SRAM) or very dense (DRAM), lose their data contents when the power supply is turned off.
– The nonvolatile memories, like EPROM, EEPROM, or Flash, which balance less aggressive programming and reading performance (with respect to SRAM and DRAM) with nonvolatility, i.e., the capability to keep the data content even without a power supply.
Thanks to this characteristic, nonvolatile memories offer the system a different opportunity and cover a wide range of applications, from consumer and automotive to computer and communication (see Fig. 3).
The different nonvolatile memory families can be qualitatively compared in terms of flexibility and cost (see Fig. 4). Flexibility means the possibility to be programmed and erased many times in the system with minimum granularity (whole chip, page, byte, bit); cost means process complexity and in particular silicon occupancy, i.e., density or, in simpler words, cell size. Considering the flexibility-cost plane, it turns out that Flash offers the best compromise between these two parameters, since they have the smallest cell size
ation. But the Flash market did not take off until this technology was proven to be reliable and manufacturable. In the late 1990s, Flash technology exploded as the right nonvolatile memory for code and data storage, mainly for mobile applications. Starting from 2000, Flash memory can be considered a really mature technology: more than 800 million units of 16-Mb-equivalent NOR Flash devices were sold in that year.
In Fig. 6, the Flash market is reported and compared with the DRAM and SRAM markets [10]. It can be seen that the Flash market became bigger than the SRAM market in 1999 and has stayed bigger since. Moreover, the Flash market is forecast to exceed $20 billion in three or four years from now, reaching the size of the DRAM market and only smoothly following the oscillating DRAM trend, which is driven by the personal computer market. In fact, portable systems for communications and consumer markets, which are the drivers of the Flash market, are forecast to grow continuously in the coming years.
In the following, we briefly describe the basics of the Flash cell functionality.
A. Basic Concept
A Flash cell is basically a floating-gate MOS transistor (see Fig. 7), i.e., a transistor with a gate completely surrounded by dielectrics, the floating gate (FG), and electrically governed by a capacitively coupled control gate (CG). Being electrically isolated, the FG acts as the storing electrode for the cell device; charge injected in the FG is maintained there, allowing modulation of the "apparent" threshold voltage (i.e., seen from the CG) of the cell transistor. Obviously, the quality of the dielectrics guarantees the nonvolatility, while their thickness allows the cell to be programmed or erased by electrical pulses. Usually the gate dielectric, i.e., the one between the transistor channel and the FG, is an oxide in the range of 9–10 nm and is called "tunnel oxide" since FN electron tunneling occurs through it. The dielectric that separates the FG from the CG is formed by a triple layer of oxide–nitride–oxide (ONO). The ONO thickness is in the range of 15–20 nm of equivalent oxide thickness. The ONO layer was introduced as interpoly dielectric in order to improve the tunnel oxide quality. In fact, the use of thermal oxide over polysilicon implies a growth temperature higher than 1100 °C, impacting the underlying tunnel oxide, and high-temperature postannealing is known to damage the thin oxide quality.
If the tunnel oxide and the ONO behave as ideal dielectrics, then it is possible to schematically represent the energy band diagram of the FG MOS transistor as reported in Fig. 8. It can be seen that the FG acts as a potential well for the charge. Once the charge is in the FG, the tunnel and ONO dielectrics form potential barriers.
The neutral (or positively charged) state is associated with the logical state "1" and the negatively charged state, corresponding to electrons stored in the FG, is associated with the logical "0."
The "NOR" Flash name is related to the way the cells are arranged in an array, through rows and columns in a NOR-like structure. Flash cells sharing the same gate constitute the so-called wordline (WL), while those sharing the same drain electrode (one contact common to two cells) constitute the bitline (BL). In this array organization, the source electrode is common to all of the cells [Fig. 9(a)].
A scanning electron microscope (SEM) cross section along a bitline of a Flash array is reported in Fig. 9(b), where three cells can be observed, sharing the drain contact and the sourceline two by two. This picture can be better understood by considering the layout of a cell (see Fig. 10) and the two schematic cross sections, along the y direction (bitline) and the x direction (wordline). The cell area is given by the x pitch times the y pitch. The x pitch is given by the active area width and space, considering also that the FG must overlap the field oxide. The y pitch is constituted by the cell gate length, the contact-to-gate distance, half contact, and half sourceline. It is evident, as reported in Fig. 9(b), that both contact and sourceline are shared between two adjacent cells.
Fig. 9. (a) NOR Flash array equivalent circuit. (b) Flash memory cell cross section.
B. Reading Operation
The data stored in a Flash cell can be determined by measuring the threshold voltage of the FG MOS transistor. The best and fastest way to do that is by reading the current driven by the cell at a fixed gate bias. In fact, as schematically reported in Fig. 11, in the current–voltage plane two cells, storing logic "1" and "0" respectively, exhibit the same transconductance curve but are shifted by a quantity, the threshold voltage shift (ΔV_T), that is proportional to the stored electron charge Q.
Hence, once a proper charge amount and a corresponding ΔV_T are defined, it is possible to fix a reading voltage in such
Fig. 10. The NOR Flash cell. (a) Basic layout. (b) Updated Flash
product (64-Mb, 1.8-V Dual bank). (c) and (d) are, respectively,
the schematic cross section along bitline (y pitch) and wordline
(x pitch).
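The read principle above (two cells with the same transconductance curve, shifted by ΔV_T and sensed at a fixed gate bias) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the authors' sensing circuit: it assumes the usual floating-gate relation ΔV_T = −Q/C_PP, where C_PP is the control-gate-to-floating-gate (interpoly) capacitance, a square-law drain current that ignores subthreshold conduction, and purely illustrative values (C_PP = 0.1 fF, a 2-V neutral threshold, a 3-V read voltage, a 10-μA reference current). The names FlashCell and read_bit are hypothetical.

```python
from dataclasses import dataclass

Q_E = 1.602e-19  # elementary charge [C]


@dataclass
class FlashCell:
    """Toy floating-gate cell; all parameter values are illustrative assumptions."""
    vt0: float            # threshold of the neutral cell, seen from the CG [V]
    c_pp: float           # assumed CG-FG (interpoly) coupling capacitance [F]
    n_electrons: int = 0  # electrons stored on the floating gate

    @property
    def vt(self) -> float:
        # Stored electrons raise the apparent threshold: dVT = -Q / C_PP with Q = -n*q
        return self.vt0 + self.n_electrons * Q_E / self.c_pp

    def drain_current(self, v_gate: float, k: float = 1e-4) -> float:
        # Same square-law curve for every cell, only shifted by VT;
        # subthreshold conduction is ignored in this toy model.
        overdrive = v_gate - self.vt
        return 0.5 * k * overdrive ** 2 if overdrive > 0 else 0.0


def read_bit(cell: FlashCell, v_read: float, i_ref: float) -> int:
    """Sense at a fixed gate bias: a conducting cell reads 1, a blocked cell reads 0."""
    return 1 if cell.drain_current(v_read) > i_ref else 0


erased = FlashCell(vt0=2.0, c_pp=1e-16)
programmed = FlashCell(vt0=2.0, c_pp=1e-16, n_electrons=1250)
print(f"dVT = {programmed.vt - erased.vt:.2f} V")
print(read_bit(erased, v_read=3.0, i_ref=10e-6), read_bit(programmed, v_read=3.0, i_ref=10e-6))
```

With these assumed values, about 1250 stored electrons shift the threshold by roughly 2 V, so the neutral cell conducts at the 3-V read bias and reads "1", while the charged cell stays off and reads "0".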
C. Data Retention
As in any nonvolatile memory technology, Flash memories are specified to retain data for over ten years. This means that the loss of charge stored in the FG must be as small as possible. In current Flash technology, due to the small cell size, the capacitance is very small, and an operating programmed threshold shift of about 2 V corresponds to a number of electrons of the order of 10^3 to 10^4. A loss of 20% of this number (around 2–20 electrons lost per month) can lead to a wrong read of the cell and then to a data loss.
Fig. 16. Threshold voltage window closure as a function of program/erase cycles on a single cell.
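The electron budget above follows from simple arithmetic. The short sketch below assumes that the tolerated 20% loss is spread evenly over the ten-year specification (120 months) and reproduces the per-month figures; max_loss_per_month is a hypothetical helper name.

```python
# Retention budget check; assumes the tolerated 20% loss is spread
# uniformly over the ten-year specification (120 months).
RETENTION_YEARS = 10
TOLERATED_LOSS = 0.20  # fraction of stored electrons that may be lost


def max_loss_per_month(stored_electrons: float) -> float:
    """Average number of electrons that may leave the FG per month within the budget."""
    months = RETENTION_YEARS * 12
    return stored_electrons * TOLERATED_LOSS / months


for n in (1e3, 1e4):
    print(f"{n:.0e} stored electrons -> {max_loss_per_month(n):.1f} electrons/month")
```

It gives about 1.7 electrons/month for 10^3 stored electrons and 16.7 electrons/month for 10^4, consistent with the "around 2–20 electrons lost per month" quoted in the text.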
Possible causes of charge loss are: 1) defects in the tunnel
oxide; 2) defects in the interpoly dielectric; 3) mobile ion
contamination; and 4) detrapping of charge from insulating
layers surrounding the FG.
The generation of defects in the tunnel oxide can be divided into extrinsic and intrinsic. The former is due to defects in the device structure; the latter to the physical mechanisms that are used to program and erase the cell. The tunnel oxidation technology as well as the Flash cell architecture are key factors for mastering a reliable Flash technology.
The best interpoly dielectric, considering both intrinsic properties and process integration issues, has been demonstrated to be a triple layer composed of ONO. For several generations, all Flash technologies have used ONO as their interpoly dielectric.
Fig. 17. Program and erase time as a function of the number of cycles.
The problem of mobile ion contamination had already been solved in EPROM technology by taking particular care with process control and, above all, by using a high phosphorus content in the intermediate dielectric as a gettering element [17], [18]. The same process control and intermediate dielectric technology have also been implemented in the Flash process, with equally good results.
Electrons can be trapped in the insulating layers surrounding the floating gate during wafer processing, as a result of the so-called plasma damage, or even during the UV exposure normally used to bring the cell into a well-defined state at the end of the process. The electrons can subsequently detrap with time, especially at high temperature. The charge variation results in a variation of the floating gate potential and thus in a decrease of the cell threshold voltage, even if no leakage has actually occurred. This apparent charge loss disappears if the process ends with a thermal treatment able to remove the trapped charge.
The retention capability of Flash memories has to be checked by using accelerated tests that usually adopt screening electric fields and hostile environments at high temperature.
Fig. 18. Anomalous SILC modeling. The leakage is caused by a cluster of positive charge generated in the oxide during erase (left-hand side). Multitrap-assisted tunneling is used to model SILC: the trap parameters are energy and position.
D. Programming/Erasing Endurance
Flash products are specified for 10^5 erase/program cycles. Cycling is known to cause a fairly uniform wear-out of the cell performance, mainly due to tunnel oxide degradation, which eventually limits the endurance characteristics [19]. A typical result of an endurance test on a single cell is shown in Fig. 16. As the experiment was performed by applying constant pulses, the variations of the program and erase threshold voltage levels are described as "program/erase threshold voltage window closure" and give a measure of the tunnel oxide aging. In real Flash devices, where intelligent algorithms are used to prevent window closing, this effect corresponds to an increase in program and erase times (see Fig. 17).
In particular, the reduction of the programmed threshold with cycling is due to trap generation in the oxide and to interface state generation at the drain side of the channel, which are mechanisms specific to hot-electron degradation.
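The difference between the constant-pulse experiment of Fig. 16 and the algorithm-driven behavior of Fig. 17 can be illustrated with a toy model. This is a hedged sketch, not a degradation model: it assumes, purely for illustration, that wear only reduces the threshold shift produced by each program pulse (a hypothetical 0.5 V per pulse for a fresh cell and 0.25 V for a cycled one) and that the verify level is 5 V. The function names are hypothetical.

```python
# Toy contrast between the two ways cycling wear shows up (hypothetical numbers).
# Assumption: wear is modeled only as a smaller threshold shift per program pulse;
# real degradation (oxide traps, interface states) is far more complex.


def vt_after_fixed_pulses(vt_start: float, shift_per_pulse: float, n_pulses: int) -> float:
    """Constant-pulse experiment: the reached threshold drops as the cell wears."""
    return vt_start + shift_per_pulse * n_pulses


def pulses_to_verify(vt_start: float, vt_verify: float, shift_per_pulse: float,
                     max_pulses: int = 1000) -> int:
    """Program-and-verify algorithm: keep pulsing until the verify level is passed."""
    vt = vt_start
    for pulse in range(1, max_pulses + 1):
        vt += shift_per_pulse  # one program pulse
        if vt >= vt_verify:    # verify read after each pulse
            return pulse
    raise RuntimeError("cell did not reach the verify level")


fresh, worn = 0.5, 0.25  # hypothetical per-pulse threshold shifts [V]
print(vt_after_fixed_pulses(2.0, fresh, 6), vt_after_fixed_pulses(2.0, worn, 6))
print(pulses_to_verify(2.0, 5.0, fresh), pulses_to_verify(2.0, 5.0, worn))
```

With a fixed train of six pulses, the worn cell only reaches 3.5 V instead of 5.0 V (window closure), whereas the verify loop still brings it to 5.0 V but needs 12 pulses instead of 6 (a longer programming time).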
butions can be obtained by combining a program-and-verify technique with a staircase ramp (see Fig. 21). In fact, this method should theoretically lead to a distribution width, for any state, not larger than the staircase step amplitude. Indeed, neglecting any error due to sense amplifier inaccuracy or voltage fluctuations, the last programming pulse applied to a cell will shift its threshold voltage above the program verify decision level by an amount at most as large as the step amplitude. It follows that by decreasing the step amplitude, it is possible to increase the programming accuracy. Obviously, this is paid for with a larger number of programming pulses and verify phases and, therefore, with a longer programming time. Hence, the best accuracy/time tradeoff must be chosen for each case, considering the application specification.
However, a high programming throughput, equal to that of 1-b/cell devices, is normally achieved via a large internal program parallelism, which is possible because cells need a low programming current in ML staircase programming. To do that, ML devices operate with a program write buffer whose typical length is 32–64 bytes, i.e., 128–256 cells of data. Also, the evolution to 3–4 b/cell will not have an impact on programming throughput. In fact, program pulses and verify phases increase proportionally with the number of bits per cell, thus keeping the effective byte programming time roughly constant.
Despite a non-negligible programming current, another advantage of using CHE programming for multilevel devices is that it avoids the appearance of erratic bits, which can instead be a potential failure mode affecting FN programming. In fact, erratic bit behavior was observed in the FN erase of standard NOR memories [27] but, by its nature, it should be present in every tunneling process [28].
B. Reading Operation
In order to have a fast reading operation in the NOR cell, a parallel sensing approach can be used [29]. The cell current, obtained in reading conditions, is simultaneously compared with three currents provided by suitable reference cells (see Fig. 22). The comparison results are then converted to a binary code, whose content can be 11, 01, 10, or 00, due to the multilevel nature. In Fig. 23, we report the threshold voltage distribution of a 2-b/cell memory. The 11, 10, and 01 cell distributions give rise to different current distributions, measured at a fixed gate voltage, while cells in the 00 distribution do not drain current, just like the programmed level of a standard 1-b/cell device. A high read data rate, via page or burst mode, is normally supported by a large internal read parallelism.
Fig. 22. Parallel multilevel sensing architecture. MSB = most significant bit; LSB = least significant bit.
A parallel sensing approach does not seem transferable to 3- or 4-b/cell generations because of the exponential increase in the number of comparators, 2^n − 1 for n bits per cell (7 or 15 per cell, respectively), which means an exponential increase in sensing area and current consumption. At this moment, a serial sensing approach, e.g., dichotomic, or a mixed serial-parallel one is considered the most suitable approach. Serial sensing is also useful for a 2-b/cell device when high-speed random access is not necessary, e.g., in Flash Card applications.
C. Data Retention
One of the main concerns about multilevel is the reduced margin toward charge loss, compared with the 1-b/cell approach. We can basically divide the problem of data retention into two different issues.
The first is related to extrinsic charge loss, i.e., to single bits that can randomly behave differently from the average and that usually form a tail in a standard distribution. It is well known that extrinsic charge loss strongly depends on the tunnel oxide retention electric field and that this issue can become more critical if an enhanced cell threshold range has to be used to allocate the 2^n levels [30]. This problem is usually solved with the introduction of an error correction code (ECC), whose correction power must be chosen as a function of the technology and of the specification required for the memory products.
The second one is related to intrinsic charge loss, i.e., to the behavior of the Gaussian part of a cell distribution,
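The multilevel mechanisms described in this section (staircase program-and-verify, parallel versus dichotomic sensing, and write-buffer sizing) can be tied together in a small sketch. It is a hedged illustration, not the authors' implementation: the verify levels, read references, 0.2-V step, and the idealized rule that every staircase pulse shifts V_T by exactly one step are assumptions, and the sense functions compare thresholds directly instead of cell and reference currents. The names program_and_verify, sense_parallel, sense_dichotomic, and buffer_cells are hypothetical.

```python
# Minimal 2-b/cell sketch. Illustrative assumptions only: in the staircase regime
# every pulse raises VT by exactly the step amplitude, the verify and read levels
# are made-up values, and sensing compares VT directly (ideal sense amplifier)
# instead of comparing cell and reference currents.

VERIFY_LEVELS = {"10": 3.0, "01": 4.0, "00": 5.0}  # "11" is the erased state
READ_REFS = (2.7, 3.7, 4.7)                        # 2**n - 1 = 3 references for n = 2
CODES = ("11", "10", "01", "00")                   # ordered by increasing VT


def program_and_verify(vt: float, verify_level: float, step: float) -> tuple[float, int]:
    """Staircase program-and-verify: apply a pulse, verify, repeat until VT >= level."""
    pulses = 0
    while vt < verify_level:
        vt += step   # next staircase pulse raises VT by one step
        pulses += 1  # each pulse is followed by a verify phase
    return vt, pulses


def sense_parallel(vt: float, refs=READ_REFS) -> tuple[int, int]:
    """Parallel sensing: one comparator per reference, all evaluated at once."""
    level = sum(vt > ref for ref in refs)
    return level, len(refs)


def sense_dichotomic(vt: float, refs=READ_REFS) -> tuple[int, int]:
    """Serial (dichotomic) sensing: binary search over the sorted references."""
    lo, hi, comparisons = 0, len(refs), 0
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if vt > refs[mid]:
            lo = mid + 1
        else:
            hi = mid
    return lo, comparisons


def buffer_cells(buffer_bytes: int, bits_per_cell: int) -> int:
    """Number of cells covered by a program write buffer."""
    return buffer_bytes * 8 // bits_per_cell


STEP = 0.2  # staircase step amplitude [V]
for code, level in VERIFY_LEVELS.items():
    finals = [program_and_verify(vt0, level, STEP)[0] for vt0 in (1.2, 1.35, 1.5, 1.65, 1.8)]
    spread = max(finals) - min(finals)  # bounded by STEP
    lvl_par, n_par = sense_parallel(finals[0])
    lvl_dic, n_dic = sense_dichotomic(finals[0])
    print(code, f"spread={spread:.3f} V", CODES[lvl_par], n_par, CODES[lvl_dic], n_dic)

print(buffer_cells(32, 2), buffer_cells(64, 2))
```

In this idealized model every programmed distribution ends up no wider than the step, and both sensing schemes return the same code. The parallel scheme uses 2^n − 1 comparisons (3 here, 15 for 4 b/cell), while the dichotomic search needs only n sequential ones (2 here, 4 for 4 b/cell). The buffer helper reproduces the 128 and 256 cells quoted for a 32–64-byte write buffer at 2 b/cell.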
beyond 64 Mb will be realized, with Flash entering the gigabit era. Sectorization is becoming more complex, and dual- or multiple-bank devices have already been presented. In these devices, different groups of sectors (banks) can be managed differently: one sector belonging to a bank can be read while, at the same time, another one inside a different bank is programmed or erased. Also, following the general trend of reducing the power supply, the device supply is scaling to 1.8 V (with the consequent difficulty of internally generating high voltages from this low supply voltage) and will go down to 1.2 V. Another issue, becoming more and more important, is the high data throughput, in particular considering the density increase. Burst mode is often used in order to speed up the reading operation and quickly download the software content, reaching up to 50 MB/s.
The introduction of the different generations, as well as the reduction of the cell size, has been made possible by developments in Flash technology and process, and in cell architecture.
As far as the process architecture is concerned, all the main technology steps that have allowed the evolution of CMOS technology have also been used for Flash. In Fig. 26, the different cell cross sections are reported as a function of the technology node. For every generation, the main innovative steps introduced are pointed out. It turns out that the evolution of the different generations has been sustained by an increased process complexity, from the one-gate-oxide, one-metal process with standard local oxidation of silicon isolation at the 0.8-μm technology node, to two gate oxides, three metals, and shallow trench isolation at the 0.13-μm node. In between came the introduction of tungsten plugs, of self-aligned silicided junctions and gates, and the wide use of chemical mechanical polishing steps. But one of the most crucial technologies for Flash evolution was the development of high-energy implantation, which allowed the introduction of the triple well architecture (see Fig. 27). With this process module, further development of single-voltage products became possible, allowing easy management of the negative voltage required to erase the cell and, furthermore, making it possible to completely change the erasing scheme of the cell.
Fig. 27. Triple well structure cross section: schematic (left side) and SEM (right side).
In fact, as reported in Fig. 28, the cell programming and erasing voltages have changed from generation to generation, always remaining within CHE programming and FN erasing. The first generation of cells was erased by applying the high voltage to the source junction and then extracting electrons from the FG-source overlap region (source erase scheme). This approach was too expensive in terms of parasitic current, as the working conditions were very close to junction breakdown. Moving to the second generation with single-voltage devices, the voltage drop between the source and the FG was divided by applying a negative voltage to the control gate and lowering the source bias to the external supply voltage (negative gate source erase scheme).
Finally, by exploiting the triple well also for the array, the erasing potential is now divided between the negative CG and the positive bulk (the isolated p-well) of the array, moving the tunneling region from the source to the whole cell channel (channel erase scheme). In this way, electrons are extracted from the FG all along the channel without any further parasitic current contribution from the source junction, reducing the erase current by about three orders of magnitude, a clear benefit for battery saving in portable low-voltage applications.
The NOR Flash cell is forecast to keep scaling following the International Technology Roadmap for Semiconductors (ITRS) [32]. The 130-nm technology node was introduced in 2002–2003 with a cell size of 0.16 μm^2 [33], following the 10-F^2 golden rule for cell area scaling, where F is the technology node. Expressing the memory cell size as a number of F^2 is a usual way to compare different technologies with the same metric; for example, the DRAM cell size is today quoted to be in the range of 6–8 F^2.
Fig. 29. NOR cell scaling. The basic layout has remained unchanged through different generations.
Fig. 30. NOR Flash cell scaling trends for cell area (right y axis) and cell aspect ratio (left y axis). Both values are normalized to the 130-nm technology node.
The next technology step for NOR Flash will be the 90-nm technology node in 2004–2005. The cell size is expected to stay in the range of 10–12 F^2, translating to a cell area of 0.08–0.1 μm^2. As reported again in Fig. 29, the basic cell layout and structure have remained unchanged through the different generations. The area scales through the scaling of both the x and y pitch. Basically, this must be done by simultaneously reducing the active device dimensions (effective length and width) and the passive elements, such as contact dimension, contact-to-gate distance, and so on.
For future technology nodes, i.e., 65 nm in 2007 and 45 nm in 2010, as forecast by the ITRS, Flash cell reduction will face challenging issues. In fact, while the passive elements will follow the standard CMOS evolution, benefiting from all the technology steps and process modules proposed for CMOS logic (like advanced lithography for contact size and copper for metallization at very tight pitch), the active elements will be limited in their scaling. In particular, the effective channel length will be limited by the possibility of further scaling the active dielectrics, i.e., the tunnel oxide and the interpoly ONO. As already presented in Section III, tunnel oxide thickness scaling is limited by intrinsic issues related to Flash cell reliability, in particular charge retention, especially after many writing cycles. Although direct tunneling, which prevents the ten-year retention time, occurs at 6–7 nm, SILC considerations push the tunnel thickness limit to no less than 8–9 nm. Moreover, the effective