Static RAM
INTRODUCTION
[3].
This paper is organized as follows. In Section II, the design of an 8-bit SRAM is presented. Section III explains the measurement of SNM and of write and read delays. Section IV describes Monte Carlo simulations that account for process variations and mismatch. A summary and future work are presented in Section V.
Figure 2. (a) Schematic for 7T SRAM cell. (b) Waveform for 7T cell during data protection by transistor N5.
Figure 4. (a) Row decoder of 8-bit SRAM using the 6T cell. (b) Row decoder of 8-bit SRAM using the 7T cell.
operation | Write Delay (ps) | operation | Read Delay (ps)
SRAM 6T Write '0' | 326 | Read '0' | 389.6
SRAM 7T Write '0' | 239 | Read '0' | 320.2
SRAM 6T Write '1' | 303 | Read '1' | 348.2
SRAM 7T Write '1' | 265 | Read '1' | 229.1
Table III
SRAM 8BIT Write operation | Mean (μ) (ps) | SD (σ) (ps) | σ/μ (%)
SRAM 6T Write '0' | 267.43 | 21.45 | 8.02
SRAM 7T Write '0' | 236.37 | 18.26 | 7.75
SRAM 6T Write '1' | 299.45 | 21.61 | 7.21
SRAM 7T Write '1' | 279.48 | 17.54 | 6.27
Table IV
SRAM 8BIT Read operation | Mean (μ) (ps) | SD (σ) (ps) | σ/μ (%)
SRAM 6T Read '0' | 352.36 | 20.14 | 5.71
SRAM 7T Read '0' | 319.9 | 19.73 | 6.16
SRAM 6T Read '1' | 391.4 | 20.72 | 5.29
SRAM 7T Read '1' | 247.45 | 18.95 | 7.65
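The σ/μ column in Tables III and IV is simply the ratio of the listed standard deviation to the listed mean; a quick sketch recomputing it for the write-delay rows of Table III:

```python
# Recompute the sigma/mu ratio (%) for the Monte Carlo write-delay
# results; the numbers are transcribed from Table III above.
rows = [
    ("SRAM 6T Write '0'", 267.43, 21.45, 8.02),
    ("SRAM 7T Write '0'", 236.37, 18.26, 7.75),
    ("SRAM 6T Write '1'", 299.45, 21.61, 7.21),
    ("SRAM 7T Write '1'", 279.48, 17.54, 6.27),
]
for name, mean_ps, sd_ps, reported_pct in rows:
    ratio_pct = 100.0 * sd_ps / mean_ps
    # Each recomputed ratio matches the reported column to rounding error.
    assert abs(ratio_pct - reported_pct) < 0.05, name
```

The 7T cell shows both smaller mean delays and smaller σ/μ spreads than the 6T cell, consistent with the tables.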
Figure 6. Monte Carlo simulations for the 8-bit SRAM. (a) Simulation result for Read '1' operation using the 7T cell. (b) Simulation result for Read '1' operation using the 6T cell.

REFERENCES
[1]
[12] http://www.vertex42.com/ExcelArticles/mc/MonteCarloSimulation.html
A R Aswatha, Dept of ECE, BMSCE, Bangalore-560019, India
[email protected], [email protected]
Dept of ECE, DSCE, Bangalore-560078, India
[email protected]
Cell [3], as shown in Figure 1(b). They will be designed and
analysed in various configurations with respect to
functionality, power dissipation, area occupancy, stability
and access time.
I. INTRODUCTION

Figure 1. SRAM Cell. (a) Conventional 6T SRAM Cell. (b) New Loadless 4T SRAM Cell.
II.

Figure 2. SNM simulation Setup. (a) For Conventional 6T SRAM Cell. (b) For New Loadless 4T SRAM Cell.

III. PRECHARGE CIRCUITS

IV. SENSE AMPLIFIER

Figure 4. Sense Amplifiers. (a) Latch-type SA with Local Precharge Circuit for 6T SRAM Array. (b) Latch-Type SA with Local Precharge Circuit for New Loadless 4T SRAM Array.
V.

The decoder and the write driver circuits are the same for both types of SRAM arrays. The decoder circuit is presented in Section V-A, and the write driver circuit in Section V-B.
A. Decoder Circuit

A decoder decodes the given input address and enables a particular WL. Among the various types of decoders available, the one used in this paper is the dynamic decoder. Dynamic decoders [6] have the following advantages over other types: (a) fewer transistors are used; (b) the layout is simple and less time-consuming; (c) the power consumption is lower; (d) the speed is also good.
In particular, a dynamic NAND decoder is used in this paper rather than a dynamic NOR decoder, as the former consumes less area and less power than the latter. For an n-word memory, an m : n dynamic NAND decoder is used,

Figure 5. Decoder and Write Driver Circuits. (a) 2:4 Dynamic NAND Decoder. (b) Write Driver.
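The selection logic of an m : n NAND decoder can be sketched in software. The toy model below is only an illustration of the logic, not circuitry from the paper: each row NANDs its m address literals, the single row whose literals are all 1 pulls its NAND output low, and a per-row inverter then raises exactly one word line.

```python
def nand_decode(address_bits):
    """Toy model of an m : 2^m NAND row decoder.

    address_bits[b] is address bit b (LSB first). Each row NANDs the
    address literals matching its row index, so exactly one NAND output
    goes low (active-low), and an inverter per row drives that WL high.
    """
    m = len(address_bits)
    word_lines = []
    for row in range(2 ** m):
        literals = [address_bits[b] if (row >> b) & 1 else 1 - address_bits[b]
                    for b in range(m)]
        nand_out = 0 if all(literals) else 1   # active-low NAND output
        word_lines.append(1 - nand_out)        # row inverter
    return word_lines

# 2:4 decoder as in Figure 5(a): A1 A0 = 1 0 (address 2) selects WL2.
assert nand_decode([0, 1]) == [0, 0, 1, 0]
```

For any address, exactly one word line is asserted, which is the property the row decoder must guarantee.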
VI.

Figure 6. Write-Read Cycle of 1-Bit 6T SRAM. (a) In 130nm CMOS Technology. (b) In 90nm CMOS Technology. (c) In 65nm CMOS Technology.
Cell Ratio: 290, 70, 310, 250, 320, 370

TABLE II.
Cell Ratio: 260, 40, 270, 170, 280, 270

TABLE III.
Cell Ratio: 240, 30, 250, 150, 260, 230
TABLE IV. ACCESS TIMES FOR BOTH THE TYPES OF SRAM ARRAYS WITH CR=3 FOR 6T SRAM AND CR=4 FOR 4T SRAM IN 130NM CMOS TECHNOLOGY.

Metric | 1K-Bit 6T SRAM | 1K-Bit 4T SRAM
Read Access Time | 608 ps | 996 ps
Write Access Time | 145 ps | 118 ps
TABLE V. ACCESS TIMES FOR BOTH THE TYPES OF SRAM ARRAYS WITH CR=3 FOR 6T SRAM AND CR=4 FOR 4T SRAM IN 90NM CMOS TECHNOLOGY.

Metric | 1K-Bit 6T SRAM | 1K-Bit 4T SRAM
Read Access Time | 671 ps | 1290 ps
Write Access Time | 145 ps | 92.2 ps
TABLE VI. ACCESS TIMES FOR BOTH THE TYPES OF SRAM ARRAYS WITH CR=3 FOR 6T SRAM AND CR=4 FOR 4T SRAM IN 65NM CMOS TECHNOLOGY.

Metric | 1K-Bit 6T SRAM | 1K-Bit 4T SRAM
Read Access Time | 781 ps | 1990 ps
Write Access Time | 134 ps | 87.6 ps
TABLE VII. COMPARISON OF TPD FOR DIFFERENT ARRAY CONFIGURATIONS WITH CR=3 FOR 6T SRAM AND CR=4 FOR 4T SRAM IN 130NM CMOS TECHNOLOGY.

Configuration | TPD (6T) (in mW) | TPD (4T) (in mW) | Reduction in TPD
1*1 | 0.089408 | 0.05309 | 40.62%
16 * 16 | 1.4833 | 0.8738 | 41.09%
32 * 32 | 3.1688 | 1.8397 | 41.94%
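The "Reduction in TPD" column is derived from the two power columns; a quick check of the Table VII rows:

```python
# Verify the TPD reduction percentages of Table VII (130 nm);
# values are transcribed from the table above.
table_vii = [
    ("1*1",     0.089408, 0.05309, 40.62),
    ("16 * 16", 1.4833,   0.8738,  41.09),
    ("32 * 32", 3.1688,   1.8397,  41.94),
]
for config, tpd_6t, tpd_4t, reported_pct in table_vii:
    reduction_pct = 100.0 * (tpd_6t - tpd_4t) / tpd_6t
    # Each recomputed reduction matches the table to rounding error.
    assert abs(reduction_pct - reported_pct) < 0.05, config
```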
ACKNOWLEDGMENT
Sandeep R would like to thank I. K. Ravish Kumar,
Project Manager, Intel Technology India Private Limited,
Bangalore, for helpful discussions and providing the tool.
REFERENCES
[1]
Configuration | TPD (6T) (in mW) | TPD (4T) (in mW) | Reduction in TPD
1*1 | 0.048379 | 0.026362 | 45.51%
16 * 16 | 0.82611 | 0.44497 | 46.14%
32 * 32 | 1.7484 | 0.91411 | 47.72%
TABLE IX. COMPARISON OF TPD FOR DIFFERENT ARRAY CONFIGURATIONS WITH CR=3 FOR 6T SRAM AND CR=4 FOR 4T SRAM IN 65NM CMOS TECHNOLOGY.

Configuration | TPD (6T) (in mW) | TPD (4T) (in mW) | Reduction in TPD
1*1 | 0.036 | 0.0189 | 47.50%
16 * 16 | 0.5918 | 0.30994 | 47.63%
32 * 32 | 1.2478 | 0.6497 | 47.93%
TABLE X.

Configuration | 6T SRAM array | New Loadless 4T SRAM array | Reduction
1*1 | 31 | 29 | 6.45%
16 * 16 | 2064 | 1552 | 24.81%
32 * 32 | 7232 | 5184 | 28.32%
VII. CONCLUSION

The New Loadless 4T SRAM cell is designed and analyzed in deep submicron (130nm, 90nm and 65nm) CMOS technologies, which establishes the technology independence of the New Loadless 4T SRAM cell and its consistent performance relative to the Conventional 6T SRAM cell in the deep sub-micron regime. The New Loadless 4T SRAM array consumes less power and occupies less area than the Conventional 6T SRAM array, and the cell operates with high stability for higher values of CR. The most significant feature of this new loadless 4T SRAM cell is that no modification of the fabrication process is needed. Thus it can be used for on-chip caches in embedded microprocessors, high-density SRAMs embedded in logic devices, as well as for stand-alone SRAM applications.
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 19, NO. 7, JULY 2011
Abstract: The threshold voltage (V_TH) drifts induced by negative bias temperature instability (NBTI) and positive bias temperature instability (PBTI) weaken PFETs and high-k metal-gate NFETs, respectively. These long-term V_TH drifts degrade SRAM cell stability, margin, and performance, and may lead to functional failure over the lifetime of use. Meanwhile, the contact resistance of CMOS devices increases sharply with technology scaling, especially in SRAM cells with minimum-size and/or sub-ground-rule devices. The contact resistance, together with NBTI/PBTI, cumulatively worsens SRAM stability and leads to severe SRAM performance degradation. Furthermore, most state-of-the-art SRAMs are designed with power-gating structures to reduce leakage currents in Standby or Sleep mode. The power switches can suffer NBTI or PBTI degradation and have large contact resistances. This paper presents a comprehensive analysis of the impacts of NBTI and PBTI on power-gated SRAM arrays with high-k metal-gate devices, and of their combined effects with contact resistance on SRAM cell stability, margin, and performance. NBTI/PBTI-tolerant sense amplifier structures are also discussed.
Index Terms: Contact resistance, negative bias temperature instability (NBTI), positive bias temperature instability (PBTI), power-gated SRAM, reliability.

ACRONYM
SRAM: Static random access memory.
PBTI: Positive bias temperature instability.
NBTI: Negative bias temperature instability.
Manuscript received October 16, 2009; revised February 06, 2010; accepted
March 30, 2010. First published May 17, 2010; current version published June
24, 2011. This work was supported by the National Science Council of Taiwan,
under Contract NSC 98-2221-E-009 -112, and Ministry of Economic Affairs,
under the Project MOEA 98-EC-17-A-01-S1-124.
The authors are with the Department of Electronics Engineering
and Institute of Electronics, National Chiao-Tung University, Hsinchu
300, Taiwan (e-mail: [email protected]; [email protected];
[email protected]).
Digital Object Identifier 10.1109/TVLSI.2010.2049038
I. INTRODUCTION

NBTI has long been a concern for scaled PFETs. The long-term V_TH drift caused by NBTI has been shown to degrade the stability and performance of SRAM, and may lead to functional failure over the lifetime of use. Recently, with the introduction of high-k metal-gate technology to contain the gate leakage current and to enable scaling of MOSFETs to the 45 nm node and below, PBTI has emerged as a major reliability concern for NFETs due to V_TH instability caused by charge trapping at the interface. These long-term V_TH drifts degrade MOSFET current drive over time, and their effects become more significant with technology and voltage scaling (see Fig. 1) [1].
The transistor performance also degrades with the ever-increasing device contact resistances and series resistances of the channel/source/drain in scaled technologies [2], [3]. Conventionally, the contact and series resistances are second-order effects on device performance. However, with technology scaling, the contact area and the device width decrease, leading to increased contact and series resistances. When the silicide length continuously shrinks and becomes smaller than the transfer length, the contact resistance increases sharply, severely degrading the stability and performance of circuits.
SRAMs in deep sub-100 nm technologies have poor margin and stability due to large leakage and process variation, fundamental limitations such as random dopant fluctuation (RDF), and microscopic effects such as line edge roughness (LER). The combined/cumulative effects of NBTI/PBTI and device contact and series resistance aggravate the already poor margin and stability of SRAMs. Furthermore, many state-of-the-art SRAMs are designed with power-gating structures to reduce static power in Standby or Sleep mode [4]-[7]. The power-gating structures play vital roles in containing leakage current in Standby or Sleep mode and in providing sufficient current for SRAM arrays in Active mode. Unfortunately, power switches also suffer NBTI/PBTI stress and degradation, and become weaker over time. As such, it is crucial to understand the NBTI/PBTI degradation of the power-gating structures, in addition to the cell, and the resulting combined impacts on power-gated SRAMs.
Previous works have shown that the SRAM read static noise margin (RSNM) is degraded by NBTI effects, while the write margin (WM) is improved [8]. RSNM and WM were both degraded
Fig. 1. V_TH drifts induced by NBTI and PBTI using a reaction-diffusion framework calibrated with published data [1]. T_inv for high-k metal gate: nMOS 7.5 Å, pMOS 7.7 Å; for poly gate: nMOS 16.5 Å, pMOS 17.5 Å.
when PBTI and NBTI were considered together, and the degradations were more sensitive to PBTI [9]. SRAM was also shown to degrade with time [10]. However, these papers focused only on analyzing SRAM cells in standard 6-T SRAM array structures. In this paper, we present a comprehensive analysis of the impacts of NBTI and PBTI on power-gated SRAM arrays. Two different types of power-gating structures, header and footer, are analyzed. The resulting impacts on stability, margin, power, performance, virtual supply bounce, and wake-up time are discussed. The effects of contact and series resistances on the SRAM cell, and their combined impacts with NBTI/PBTI on SRAM, are also investigated.
We first describe the details of our simulation model in the predictive technology model (PTM) high-k CMOS 32 nm technology in Section II. Section III shows NBTI/PBTI impacts on power-gated SRAM. The impacts of contact resistance on power-gated SRAM are analyzed in Section IV, and their combined effects with NBTI/PBTI are also studied. Section V compares SRAM sensing structures, including the differential sensing amplifier and the large-signal sensing scheme, and shows that a judicious choice of sense amplifier structure can mitigate NBTI and PBTI effects. The conclusions of the paper are given in Section VI.
II. ANALYSIS MODELS
This section describes the NBTI/PBTI model and contact resistance model used in our analyses. The power-gated SRAM
structure and its operation in this work are also introduced.
A. NBTI and PBTI Model

NBTI causes the threshold voltage V_TH of a PFET to become more negative with time, leading to long-term degradation of current drive. Under negative gate bias (stress phase), holes in the inversion layer interact with and break Si-H bonds at the interface. The H-species diffuse into the oxide, leaving interface traps behind, thus causing an increase in |V_TH|. When stress conditions are removed, H-species diffuse back to the interface and passivate dangling Si bonds, and passivation (or recovery) occurs. Thus, the device lifetime under ac stress is longer than that predicted by dc stress measurements. The corresponding effect for NFETs, namely PBTI, is in general quite small and can be neglected for oxide/poly-gate devices. NFETs with high-k gates, however, exhibit significant charge trapping and thus a long-term V_TH shift as well. The V_TH drift of a PFET (NFET) due to NBTI (PBTI) can be described by the dc reaction-diffusion (RD) framework when the stress signal does not change (i.e., static stress) [8], [11], [12]. If the stress signal changes with time (i.e., alternating stress), the dc RD model can be multiplied by a prefactor to account for the signal (stress) probability, frequency, and duty cycle of the stress signal, and the recovery mechanism; the resulting formula is called the ac RD model [8], [11], [12]. However, according to the results of [12] and [13], the impact of the signal frequency on V_TH drift is relatively insignificant. Thus, we neglect the effect of signal frequency and analyze cases with various signal (stress) probabilities. In the following analysis, the prefactor of the ac RD model is simplified as a function of signal probability. The simplified ac RD model is
ΔV_TH = α(S) · A · t^(1/6)    (1)

where the prefactor α(S) is a function of the signal probability S, and A is a technology-dependent constant. Notice also that the NBTI/PBTI-induced V_TH drift depends strongly on the bias and temperature [11], [12]. Fig. 1 shows the V_TH drifts induced by NBTI and PBTI using the reaction-diffusion framework, calibrated with published data [1]. The V_TH drifts are incorporated into the PTM 32 nm and PTM high-k 32 nm device models.1 Notice that in the model, T_inv of the poly-gate PFET is 17.5 Å, while T_inv of the high-k metal-gate PFET is only 7.7 Å. T_inv of the high-k metal-gate device is almost 2.3 times smaller than that of the poly-gate device. These are consistent with the facts that the best (smallest) T_inv that can be achieved with a SiON/poly-Si gate is around 17-18 Å, limited by gate tunneling leakage, and that state-of-the-art 32 nm high-k metal-gate devices have T_inv around 7.5-8.0 Å. As such, in our model, the V_TH drift of the high-k gate device is more serious than that of the SiON/poly-Si gate device.
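To illustrate how a power-law BTI drift model of this kind behaves over long stress times, the sketch below assumes the RD-framework time exponent n = 1/6 and arbitrary placeholder values for the prefactor and technology constant; none of these numbers are the paper's calibrated values.

```python
# Sketch of a simplified ac RD drift model: dVth = alpha(S) * A * t^n.
# alpha(S) = S, A = 5e-3, and n = 1/6 are illustrative placeholders,
# not the paper's calibrated parameters.
def dvth_drift(t_seconds, signal_probability, A=5e-3, n=1.0 / 6.0):
    alpha = signal_probability        # toy prefactor as a function of S
    return alpha * A * t_seconds ** n

# The drift grows sub-linearly: ten years of stress gives only about
# 10^(1/6), roughly 1.47x, the drift of one year.
one_year = dvth_drift(3.15e7, 0.5)
ten_years = dvth_drift(3.15e8, 0.5)
assert 1.4 < ten_years / one_year < 1.5
```

This sub-linear growth is why BTI-induced drift saturates slowly rather than accumulating linearly over the product lifetime.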
B. Contact Resistance Model

As shown in Fig. 2(a), the source/drain (S/D) series resistance can be divided into the overlap resistance R_ov, the extension resistance R_ext, the deep resistance R_deep, and the silicon-contact diffusion resistance R_co. Conventionally, R_ov, R_ext, and R_deep are included in the device model, but R_co is not. We model the R_co of a transistor as shown in Fig. 2(b). With technology scaling, the sum of R_ov, R_ext, and R_deep decreases, but R_co increases. The formula for the silicon-contact diffusion resistance is given by

R_co = (sqrt(R_sh · ρ_c) / W) · coth(L_c / L_T)    (2)

where R_sh is the sheet resistance per square of the underlying heavily doped silicon layer, in units of Ω/sq, ρ_c is the specific contact resistivity between the metal and the diffusion layer in units of Ω·cm², and L_T is the transfer length,

1[Online]. Available: http://www.eas.asu.edu/~ptm/
which is defined as L_T = sqrt(ρ_c / R_sh) [3]. When L_c is larger than L_T, the contact resistance is only slightly dependent on the contact region. However, when L_c is smaller than L_T, the contact resistance increases sharply if L_c is further scaled down. According to [3], the contact resistance becomes larger than the sum of R_ov, R_ext, and R_deep, and increases sharply, beyond the 45 nm technology node. As the diffusion contact resistance dominates the short-channel device resistance, we focus on its impacts on the SRAM array in the following analysis.
C. Power-Gated SRAM Structures
SRAM power-gating structures can be divided into two
basic types: a header power-gating structure and a footer
structure. Fig. 3(a) shows a column-based header-gated SRAM
structure. In this structure, PH1 is the header power switch
used for leakage reduction, and MH1 is the clamping device
to bias virtual supply (VVDD) for data retention in Standby
or Sleep mode. Fig. 3(b) shows a column-based footer-gated
SRAM structure. MF1 is the footer power switch, and PF1 is
the clamping device to bias virtual ground (VVSS) for data
retention in Standby or Sleep mode. Fig. 4 shows the standard
6T SRAM cells used in our analysis. The sub-array block size is 128 × 128 cells. Parasitic capacitance, inductance, and resistance of the package are included in our analysis. Each power-gated SRAM array is packed with a package model [14],
TABLE I
SIGNAL (STRESS) PROBABILITY CASE SUMMARY
Available: http://www.itrs.net/
Fig. 6. RSNM of header structure impacted by (a) NBTI, (b) PBTI, and
(c) NBTI&PBTI; RSNM of footer structure impacted by (d) NBTI, (e) PBTI,
and (f) NBTI&PBTI.
Fig. 8. Read delay of header structure impacted by (a) NBTI, (b) PBTI, and
(c) NBTI&PBTI; read delay of footer structure impacted by (d) NBTI, (e) PBTI,
and (f) NBTI&PBTI.
Fig. 7. Relation between RSNM and signal (stress) probability when stress
time is 10 s in (a) header and (b) footer structure.
or VVSS (for footer-gating) to an appropriate level for data retention. Because the clamping device is not stressed, as explained in Section II, VVDD (VVSS) is dominated by the V_TH drifts of the SRAM array devices during Standby or Sleep mode. As such, the Standby/Sleep mode VVDD of the header structure increases with the stress time, while the Standby/Sleep mode VVSS of the footer structure decreases with the stress time, as shown in Fig. 10. Additionally, due to the increased equivalent OFF resistances of the SRAM array, the leakages of both header and footer structures decrease, as shown in Fig. 11.
During the wake-up transition, the power switch turns on, leading to virtual supply line bounce due to the large current flowing through the parasitic capacitance, inductance, and resistance of the package and interconnect. After stressing, the V_TH and equivalent resistance of the power switch and SRAM array increase, so the wake-up current of the SRAM array decreases. As a result, the virtual supply line bounce is reduced during the wake-up transition, as shown in Fig. 12. Moreover, when only the power switch is impacted by NBTI or PBTI, the wake-up time increases with stress time due to the higher V_TH of the power switch (Case A of Fig. 13(a) and Case A of Fig. 13(b)). However, if the SRAM cell array also suffers NBTI and/or PBTI stress, the wake-up time decreases, as the Standby/Sleep mode VVDD of the header structure increases
Fig. 12. (a) VVDD bounce of header structure impacted by NBTI&PBTI and
(b) VVSS bounce of footer structure impacted by NBTI&PBTI during wake-up
transition.
Fig. 13. (a) Wake-up time of header structure impacted by NBTI&PBTI and
(b) wake-up time of footer structure impacted by NBTI&PBTI.
Fig. 16. Normalized read delay versus contact resistance. Read delay is normalized with respect to the case with no contact resistance. Read delay degradation at 32 nm node from diffusion contact resistance is around 1% to 4.5%.
Fig. 14. (a) Wake-up time of header structure impacted by NBTI&PBTI and
(b) wake-up time of footer structure impacted by NBTI&PBTI.
Fig. 17. Normalized RSNM under NBTI and PBTI versus contact resistance.
RSNM is normalized with respect to the case with no contact resistance and no
NBTI/PBTI stress. RSNM degradation at 32 nm node caused by the combined
effects of NBTI/PBTI and diffusion contact resistance is around 23% to 26%.
Fig. 18. Normalized read delay under NBTI and PBTI versus contact resistance. Read delay is normalized with respect to the case with no contact
resistance and no NBTI/PBTI stress. Read delay degradation at 32 nm node
caused by the combined effects of NBTI/PBTI and diffusion contact resistance
is around 20% to 24%.
Fig. 20. Normalized WM under NBTI and PBTI versus contact resistance. WM
is normalized with respect to the case with no contact resistance and no NBTI/
PBTI stress. WM degradation at 32 nm node caused by the combined effects of
NBTI/PBTI and diffusion contact resistance is around 10.3% to 10.1%.
Fig. 21. Normalized write delay versus contact resistance. Write delay is normalized with respect to the case with no contact resistance. Write delay degradation at 32 nm node by diffusion contact resistance is around 0.6% to 3.4%.
Fig. 23. Active mode VVDD of a header power-gating structure. Active mode
VVDD degradation at 32 nm node by diffusion contact resistance is around 1.03
to 5.06 mV.
Fig. 22. Normalized write delay under NBTI and PBTI versus contact
resistance, write delay is normalized with respect to the case with no contact
resistance and no NBTI/PBTI stress. Write delay degradation at 32 nm node
caused by the combined effects of NBTI/PBTI and diffusion contact resistance
is around 6.62% to 9.65%.
Fig. 24. Active mode VVSS of a footer power-gating structure. Active mode
VVSS degradation at 32 nm node by diffusion contact resistance is around 1.11
to 2.74 mV.
In contrast with RSNM, a larger diffusion contact resistance improves WM slightly (about 0.5%), as the current charging QB is limited by M2 under the NBTI effect.

Write delay is defined as the latency between the time WL rises to half V_DD and the time Q and QB cross each other. Write delay normally tracks WM, and a better (higher) WM would improve Write delay in general. However, Write delay is also affected by the RC time constant, and larger diffusion contact resistances lead to longer Write delay, as shown in Fig. 21. Additionally, Fig. 22 shows the relation between Write delay and the contact resistance when the cell is under NBTI and PBTI stress. The Write delay can be seen to degrade about 6% with NBTI and PBTI. The Write delay also increases sharply when the diffusion contact resistance is larger than 100 Ω.
C. SRAM Power-Gating Structure

In power-gated SRAM, when diffusion contact resistances increase, the equivalent resistance between VVDD and VDD (header-gated structure) and between VVSS and VSS (footer-gated structure) also increases. This causes a decrease of VVDD in a header-gated structure and an increase of VVSS in a footer-gated structure, as shown in Figs. 23 and 24, respectively. Con-
Fig. 28.
Fig. 26. Standby mode VVSS of footer power-gating structure. Standby mode
VVSS degradation at 32 nm node by diffusion contact resistance is around 0.08
to 0.5 mV.
Hao-I Yang (S'09) received the B.S. and M.S. degrees in electrical engineering from National Cheng Kung University, Tainan, Taiwan, in 2003 and 2005, respectively. He is currently pursuing the Ph.D. degree in electronic engineering at National Chiao Tung University, Hsinchu, Taiwan.
IBM Master Inventor. He was President, Board Director and Chairman of the
Boards of Directors of the Chinese American Academic and Professional Society (CAAPS) from 1986 to 1999. He is a member of the New York Academy
of Science, Sigma Xi, and Phi Tau Phi Society. He has served several times in the
Technical Program Committee of the ISLPED, SOCC, A-SSCC. He served as
the General Chair of 2007 IEEE SoC Conference (SOCC 2007) and the General
Chair of 2007 IEEE International Workshop on Memory Technology, Design,
and Testing (MTDT 2007). Currently, he is serving as Founding Director of
Center for Advanced Information Systems and Electronics Research (CAISER)
of University System of Taiwan, UST and Director of ITRI and NCTU Joint
Research Center. He is also serving as a Supervisor of the IEEE Taipei Section.
cessors for enterprise servers, PowerPC workstations, and game/media processors. Since 1996, he has been leading the efforts in evaluating and exploring
scaled/emerging technologies, such as PD/SOI, UTB/SOI, strained-Si devices,
hybrid orientation technology, and multi-gate/FinFET devices, for high-performance logic and SRAM applications. Since 1998, he has been responsible for
the Research VLSI Technology Circuit Co-design strategy and execution. His
group has also been very active and visible in leakage/variation/degradation tolerant circuit and SRAM design techniques. He took early retirement from IBM
to join National Chiao-Tung University, Hsinchu, Taiwan, as a Chair Professor
in the Department of Electronics Engineering in February 2008. He is currently
the Director of the Intelligent Memory and SoC Laboratory at National ChiaoTung University. He has authored many invited papers in international journals such as International Journal of High Speed Electronics, PROCEEDINGS OF
IEEE, IEEE CIRCUITS AND DEVICES MAGAZINE, and Microelectronics Journal.
He holds 31 U.S. patents with another 11 pending. He has authored or coauthored over 290 papers.
Dr. Chuang was a recipient of an Outstanding Technical Achievement
Award, a Research Division Outstanding Contribution Award, 5 Research
Division Awards, 12 Invention Achievement Awards from IBM, and the
Outstanding Scholar Award from Taiwan's Foundation for the Advancement of
Outstanding Scholarship for 2008 to 2013. He was the co-recipient of the Best
Paper Award at the 2000 IEEE International SOI Conference. He served on the
Device Technology Program Committee for IEDM in 1986 and 1987, and the
Program Committee for Symposium on VLSI Circuits from 1992 to 2006. He
was the Publication/Publicity Chairman for Symposium on VLSI Technology
and Symposium on VLSI Circuits in 1993 and 1994, and the Best Student
Paper Award Sub-Committee Chairman for Symposium on VLSI Circuits from
2004 to 2006. He was elected an IEEE Fellow in 1994 For contributions to
high-performance bipolar devices, circuits, and technology. He has presented
numerous plenary, invited or tutorial papers/talks at international conferences
such as International SOI Conference, DAC, VLSI-TSA, ISSCC Microprocessor Design Workshop, VLSI Circuit Symposium Short Course, ISQED,
ICCAD, APMC, VLSI-DAT, ISCAS, MTDT, WSEAS, and VLSI Design/CAD
Symposium, etc.
IEEE TRANSACTIONS ON ELECTRON DEVICES, VOL. 57, NO. 11, NOVEMBER 2010
2785
Abstract: In this paper, the design space, including fin thickness (Tn), fin height (Hn), fin ratio of bit-cell transistors, and surface orientation, is explored to optimize the stability, leakage current, array dynamic energy, and read/write delay of the FinFET SRAM under layout area constraints. The simulation results, which consider the variations of both Tn and threshold voltage (Vth), show that most FinFET SRAM configurations achieve a superior read/write noise margin when compared with planar SRAMs. However, when two fins are used as pass-gate transistors (PG) in FinFET SRAMs, enormous array dynamic energy is required due to the increased effective gate and drain capacitance. On the other hand, a FinFET SRAM with a one-fin PG in the (110) plane shows a smaller write noise margin than the planar SRAM. Thus, the one-fin PG in the (100) plane is suitable for FinFET SRAM design. The one-fin PG FinFET SRAM with Tn = 10 nm and Hn = 40 nm in the (100) plane achieves a three times larger noise margin when compared with the planar SRAM and consumes a 17% smaller bit-line toggling array energy at a cost of a 22% larger word-line toggling energy. It also achieves a 2.3 times smaller read delay and a 30% smaller write delay when compared with the planar SRAM.
Index Terms: Cell current, FinFET, leakage current, read stability, SRAM, surface orientation, write stability.

I. INTRODUCTION
using the FinFET because the critical issues of SRAM bit-cell scaling, such as the demand for continuous bit-cell size scaling and electrical stability problems, can be resolved [6].
Thus, the optimal design of the SRAM bit-cell with the FinFET
is analyzed in this paper.
For a standard planar CMOS, (100) silicon substrates have
been used generally due to superior electron mobility, which is
higher in the (100) plane than that in the (110) plane. However,
the mobility of a hole in the (100) plane is lower than that
in the (110) plane. For planar device technology, the devices
with a (110) surface orientation have to be fabricated on silicon
substrates with a (110) crystalline orientation, which is not
generally used. However, both the (100) and (110) orientations
for the FinFET can be achieved in the (100) plane because of
the vertical structure. As shown in Fig. 1, the (110)-oriented FinFET can be achieved by simply rotating the transistor layout by 45° in the plane of a (100) wafer [7]. However, there is an inevitable area penalty for using multiorientation, where both (100)- and (110)-oriented FinFETs are used on the same wafer, because the angle between the (100)- and (110)-oriented FinFETs has to be 45° [8]. In addition, multiorientation is not practical due to its complex fabrication process. Thus, single-oriented FinFET SRAM designs (all (110)- or (100)-oriented FinFETs in the (100) plane) are considered, rather than multioriented FinFET SRAM designs. The p-n mobility ratios used in this paper are chosen to express the general characteristics of the (100) and (110) orientations by considering the sensitivity to the modest amount of process-induced strain from [7] and [9].
The different results between the (100) and (110) orientations in
this paper are mainly caused by the p-n mobility ratio. The ratio
largely depends on the surface orientation but can be affected
by other elements, such as the materials and process-induced
strain. Thus, it is noted that the models for the (100) and (110)
orientations in this paper are one of the possible choices to show
the general trend between the orientations rather than represent
the absolute characteristics of the orientations.
The Tn variation of the FinFET is one of the major sources
of the Vth variation along with the RDF. Thus, both the RDF
and the Tn variation are considered for the FinFET SRAM.
While reducing the fin thickness (Tn ) suppresses the SCE,
the fabrication of a thin FinFET is challenging and increases
the variation of Tn , which results in a large Vth variation
[10]. Thus, different variations depending on the Tn value are
applied for the FinFET SRAM.
Because of the vertical nature of the device structure, the
FinFET can achieve a higher effective channel width (hence,
a higher driving strength) per unit planar area by increasing
Fig. 2. (a) Planar SRAM bit-cell layout. (b) FinFET SRAM bit-cell layout.

Fig. 3. FinFET diagram.

TABLE I. FinFET SRAM DESIGN SPACES
Fig. 4. (a) Idsat per unit width and (b) IOFF per unit width when gate length
(Lg ) = 35 nm and oxide thickness (Tox ) = 1 nm.
Fig. 5. Idsat-Vgs curves of planar and FinFET devices with Lg = 35 nm and Tox = 1 nm in the (100) plane.
Fig. 4 describes Idsat and the OFF current IOFF per unit width
of the FinFET and planar devices. As shown in Fig. 4(a), all
the FinFETs using three (Tn , Hn ) combinations achieve a
larger Idsat per unit width than the planar device. In a standard
(100) plane, the Idsat of the NMOS is larger than that of the
PMOS while the Idsat of the PMOS is larger than that of the
NMOS in the (110) plane. Fig. 5 shows the Idsat Vgs curves of
the NMOS and PMOS in three (Tn , Hn ) combinations in the
(100) plane. The Vth is higher in a narrower fin because Vth
roll-off and DIBL are suppressed owing to the improved
short-channel behavior. Thus, when Tn is reduced, the Idsat and IOFF per unit
width of the FinFET become smaller, as shown in Fig. 4. The
FinFET with Tn = 20 nm, which suffers from the SCE, has an
even larger IOFF than that of a planar device. The geometric
effective widths in the (10 nm, 40 nm), (15 nm, 30 nm), and
(20 nm, 20 nm) combinations are 90, 75, and 60 nm, respectively.
Thus, as shown in Fig. 5, the Idsat is largest in the (10 nm, 40 nm)
combination and smallest in the (20 nm, 20 nm) combination.
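The effective widths quoted above follow directly from W = 2Hn + Tn; a one-line check against the three combinations:

```python
# Effective width of a FinFET: W_eff = 2*Hn + Tn
# (two sidewalls of height Hn plus the fin top of thickness Tn).
combos = [(10, 40), (15, 30), (20, 20)]        # (Tn, Hn) in nm, from the text
widths = [2 * hn + tn for (tn, hn) in combos]
print(widths)   # -> [90, 75, 60]
```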
σVt.RDF = Avt / √( Width(= 2Hn + Tn) · Length )    (2)

where Avt is a technology constant proportional to the oxide thickness and the channel doping. In this paper, 2.5 and 1.76 mV·µm are used for the Avt of the planar device and the FinFET, respectively [17]. Even though σVt.RDF is not independent of the Tn variation, as shown in (2), the effect is negligible. Thus, σVt, which includes both σVt.RDF and σVt.Tn, can be expressed by the following equation with the assumption that σVt.RDF and σVt.Tn are independent:

σVt = √( σVt.RDF² + σVt.Tn² )    (3)
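Equations (2) and (3) combine the Pelgrom-style area scaling with a root-sum-of-squares; a sketch using the FinFET Avt quoted above, with the geometry and the Tn-induced sigma as illustrative assumptions:

```python
import math

# Avt = 1.76 mV·um is the FinFET value quoted in the text; the geometry and
# sigma_tn below are assumed for illustration, not taken from the paper.
Avt = 1.76e-3 * 1e-6                  # V·m
Tn, Hn, Lg = 10e-9, 40e-9, 35e-9      # fin thickness, fin height, gate length (m)
width = 2 * Hn + Tn                   # effective width, 90 nm

sigma_rdf = Avt / math.sqrt(width * Lg)            # equation (2), ~31 mV here
sigma_tn = 10e-3                                   # assumed 10 mV from Tn variation
sigma_vt = math.sqrt(sigma_rdf**2 + sigma_tn**2)   # equation (3)
```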
In Fig. 6, the RSNM and WNM of the planar and FinFET
SRAM bit-cells are described. The RSNM and the WNM are
measured with the methods described in [18] and [19], respectively.
The RSNM and the WNM on the y-axis in Fig. 6 are
presented as σ/µ in order to standardize each result. Generally,
the RSNM and WNM of the FinFET SRAM are larger than
those of the planar SRAM because the effect of the small RDF
surpasses that of the Tn variation.
The driving strength of the PG divided by that of the PU
is defined as the alpha ratio. The driving strength of the PD
divided by that of the PG is defined as the beta ratio. The RSNM
is proportional to the beta ratio while the WNM is proportional
to the alpha ratio [20]. In addition, when the driving strength of
the PU relative to that of the PD is high, the trip point of the
inverter becomes high. Thus, the RSNM is also proportional to
the driving strength of the PU divided by that of the PD.
Fig. 7. (a) Currents in SRAM bit-cell during read operation. (b) Butterfly
curves of cases 1 and 3.
IPU = KPU (VDD − VIN − |Vtp|)(VDD − VOUT)    (4)

IPG = KPG (VDD − VOUT − VtnG)(VDD − VOUT)    (5)

IPD = (1/2) KPD (VIN − VtnD)^α    (6)

KPU = µP COX (W/L)PU,  KPG = µN COX (W/L)PG,  KPD = µN COX (W/L)PD    (7)
where µP and µN are the mobilities of the PMOS and NMOS
transistors, respectively. Vtp, VtnG, and VtnD are the threshold
voltages of the PU, PG, and PD, respectively. (W/L)PU,
(W/L)PG, and (W/L)PD are the width/length ratios of the PU, PG,
and PD, respectively, and COX is the oxide capacitance. α is
a number between one and two. To measure the RSNM, it is
assumed that the voltages of the word-line and the bit-line_b are
VDD. The summation of IPU and IPG is the same as IPD during
the read operation. Thus, the following equations are derived
from (4)–(6):
VOUT = VDD + (1/(2 KPG)) [ KPU (VDD − VIN − |Vtp|) − KPG VtnG − √X ]    (8)
dVOUT/dVIN = −(1/(2 KPG)) [ KPU + (1/(2√X)) (dX/dVIN) ]    (9)
where

X = [ KPU (VDD − VIN − |Vtp|) − KPG VtnG ]² + 2 KPG KPD (VIN − VtnD)^α    (10)
dX/dVIN = −2 KPU² (VDD − VIN − |Vtp|) + 2 KPU KPG VtnG + 2α KPG KPD (VIN − VtnD)^(α−1).    (11)
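The closed form (8) can be checked numerically: with the simplified current models of (4)–(6), the pull-up and pass-gate currents balance the pull-down current exactly at the VOUT it predicts. All device parameter values below are invented for illustration:

```python
import math

# Illustrative (assumed) device parameters, not taken from the paper.
VDD = 1.0
KPU, KPG, KPD = 50e-6, 80e-6, 120e-6   # transconductance factors, A/V^2
Vtp, VtnG, VtnD = 0.3, 0.3, 0.3        # threshold voltages, V
ALPHA = 2.0                            # exponent between one and two

def vout(vin):
    # Closed form (8); valid for VIN > VtnD so the PD current term is real.
    a = VDD - vin - abs(Vtp)
    x = (KPU * a - KPG * VtnG) ** 2 + 2 * KPG * KPD * (vin - VtnD) ** ALPHA
    return VDD + (KPU * a - KPG * VtnG - math.sqrt(x)) / (2 * KPG)

def currents(vin, v_out):
    v = VDD - v_out
    ipu = KPU * (VDD - vin - abs(Vtp)) * v      # (4), simplified linear model
    ipg = KPG * (v - VtnG) * v                  # (5), simplified linear model
    ipd = 0.5 * KPD * (vin - VtnD) ** ALPHA     # (6)
    return ipu, ipg, ipd

# KCL at the storage node: IPU + IPG equals IPD at the VOUT given by (8).
ipu, ipg, ipd = currents(0.6, vout(0.6))
assert abs(ipu + ipg - ipd) < 1e-12
```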
Fig. 11.
Fig. 10. (a) Definition of effective gate capacitance (Cge ) and effective
drain capacitance (Cde ). (b) Cge and Cde per unit width for FinFET and
planar devices normalized to planar NMOS Cge .
Fig. 12. Word-line and bit-line toggling energies normalized to bit-line toggling energy of planar SRAM.
Fig. 13. SRAM operation delay. (a) Read delay. (b) Write delay.
case 1 achieves 2.6 times larger Icell and 103.7 times smaller
IOFF when compared with the planar SRAM bit-cell due to
a larger effective width achieved by the vertical structure and
superior electrostatic channel control, respectively. Because of
a larger driving strength and smaller Cge and Cde than a
planar device, the FinFET SRAM in case 1 achieves a 2.3 times
smaller read delay and a 30% smaller write delay. The FinFET
SRAM in case 1 also achieves a 17% smaller bit-line toggling
power at the cost of a 22% larger word-line toggling power when
compared with the planar SRAM.
VII. C ONCLUSION
In this paper, the FinFET SRAMs with possible (Tn , Hn )
combinations, fin ratio, and surface orientation are researched
with regard to speed, stability, leakage current, and array dynamic energy under area constraints. Despite the Tn variation,
the FinFET SRAM achieves a superior read/write noise margin
compared to that of the planar SRAM, owing to the small
RDF. In addition, owing to the strong driving capability with
the aid of the vertical structure, most FinFET SRAM configurations show superior speed performance when compared
with the planar SRAM. However, the cases with NG = 2
consume too much word-line and bit-line toggling energy
because of the greatly increased Cge and Cde. Moreover, the
cases with NG = 1 in the (110) plane show very poor write
stability. In the case of NG = 1 in the (100) plane, the (Tn =
15 nm, Hn = 30 nm) and (Tn = 20 nm, Hn = 20 nm) configurations show too much IOFF . The optimal configuration
with (Tn = 10 nm, Hn = 40 nm) and NG = 1 shows 103.7
times smaller IOFF due to a high Vth and shows three times
larger read and write noise margins when compared with the
planar SRAM bit-cell despite the Tn variation. It also achieves
a 2.3 times smaller read delay and a 30% smaller write delay
when compared with the planar SRAM.
R EFERENCES
[1] K. Kim, K. K. Das, R. V. Joshi, and C.-T. Chuang, "Leakage power analysis of 25-nm double-gate CMOS devices and circuits," IEEE Trans. Electron Devices, vol. 52, no. 5, pp. 980-986, May 2005.
[2] J. Kavalieros, B. Doyle, S. Datta, G. Dewey, M. Doczy, B. Jin, D. Lionberger, M. Metz, W. Rachmady, M. Radosavljevic, U. Shah, N. Zelick, and R. Chau, "Tri-gate transistor architecture with high-k gate dielectrics, metal gate and strain engineering," in VLSI Symp. Tech. Dig., Jun. 2006, pp. 50-51.
[3] H. Shang, L. Chang, X. Wang, M. Rooks, Y. Zhang, B. To, K. Babich, G. Totir, Y. Sun, E. Kiewra, M. Ieong, and W. Haensch, "Investigation of FinFET devices for 32 nm technologies and beyond," in VLSI Symp. Tech. Dig., Jun. 2006, pp. 54-55.
[4] S. A. Tawfik and V. Kursun, "Low-power and compact sequential circuits with independent-gate FinFETs," IEEE Trans. Electron Devices, vol. 55, no. 1, pp. 60-70, Jan. 2008.
[5] D. J. Frank, Y. Taur, M. Ieong, and H.-S. P. Wong, "Monte Carlo modeling of threshold variation due to dopant fluctuations," in VLSI Symp. Tech. Dig., Jun. 1999, pp. 171-172.
[6] A. Bansal, S. Mukhopadhyay, and K. Roy, "Device-optimization technique for robust and low-power FinFET SRAM design in nanoscale era," IEEE Trans. Electron Devices, vol. 54, no. 6, pp. 1409-1419, Jun. 2007.
[7] L. Chang, M. Ieong, and M. Yang, "CMOS circuit performance enhancement by surface orientation optimization," IEEE Trans. Electron Devices, vol. 51, no. 10, pp. 1621-1627, Oct. 2004.
B. M. Han
From 1989 to 1997, he was with Samsung Electronics, where he worked on the DRAM/eDRAM/
GDRAM layout. From 1997 to 2000, he was with
AAC, where he worked on a graphic memory layout. From 2000 to 2004, he was with IDT, where
he worked on the SRAM and CAM layout. Since
2004, he has been with the digital mask design team
of Qualcomm Inc., San Diego, CA. He has more
than 20 years of memory layout experience and is
currently interested in new device technologies.
H. K. Park (S'10) was born in Iksan, Jeollabuk-do, Korea, in 1982. He received the B.S. degree
in electrical and electronic engineering from Yonsei
University, Seoul, Korea, in 2008, where he is currently working toward the M.S. degree.
His current research interests include SRAM stability, subthreshold SRAM bit-cell design, FinFET
SRAM bit-cell design, and FinFET peripheral circuit
design.
I. INTRODUCTION
KOLAR et al.: A 32 nm HIGH-k METAL GATE SRAM WITH ADAPTIVE DYNAMIC STABILITY ENHANCEMENT FOR LOW-VOLTAGE OPERATION
Fig. 3. The effect of WLUD strength on read and write Vccmin for a slow-fast
(SF) and a fast-slow (FS) corner die is shown. The optimal WLUD strength is
determined by where the read and write Vccmin curves intersect, and it differs
for the two die. Without the ability to track process corners, a compromise
WLUD setting must be picked, which is not optimal for either the SF or the FS
die. Data is measured at 10 °C.
for the SF die. Now consider the FS die. The optimal WLUD for
this die is about 17% of Vcc. Clearly what WLUD is optimal for
the FS die is not optimal for the SF die and vice-versa. If there
are no means of determining if a die is in the SF or FS corner, the
optimal WLUD setting is determined by intersection of the RD
Vccmin for the FS die and the WR Vccmin for the SF die. This
value of WLUD is 9% which is not the optimal selection for
either the SF die or the FS die. The die active Vccmin obtained
using this compromise WLUD setting is higher than what could
be achieved if WLUD strength was chosen independently for
each die. The ADWLUD circuit uses the sensor input to track
process skew corner at the die level and selects an optimized
WLUD setting from a fixed set of WLUD strength options for
each die. This yields a substantial Vccmin improvement for that
die, compared to a globally selected optimal setting.
Fig. 4 depicts the concept of temperature tracking with the
ADWLUD system. The silicon measurements are for a single
die at two different temperatures. The read and write Vccmin for
each die is plotted as a function of WLUD strength. The x-intercept of the intersection of the read and write Vccmin curves
is the optimal WLUD setting for this die at a given temperature. As can be seen from the graph, this value is 12% of
Vcc at 10 °C. The optimal WLUD strength at 95 °C is quite
different: about 24% of Vcc. The WLUD setting that is optimal for this die at high temperature is suboptimal for the die
at low temperature. Without a dynamic system for tracking and
responding to this temperature shift, a compromise WLUD of
16% is optimal (determined by the intersection of the read
curve at 95 C and write curve at 10 C). Active Vccmin with
the fixed, single-point WLUD setting is higher than the Vccmin
possible with multiple settings tuned for different temperature
ranges. The ADWLUD sensor is able to track temperature shifts
and dynamically select the optimal WLUD strength setting for
each die across a range of temperatures. This selection process
yields improved active Vccmin for this die at both temperatures,
as seen in Fig. 4.
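The optimum described above amounts to finding where the read and write Vccmin curves cross, since the active Vccmin at any setting is the worse of the two. A sketch with assumed linear curves (all numbers are illustrative, not silicon data):

```python
# Assumed illustrative Vccmin trends (mV) versus WLUD strength (% of Vcc):
def read_vccmin(wlud):
    return 800 - 10 * wlud     # read Vccmin improves with stronger WLUD

def write_vccmin(wlud):
    return 650 + 5 * wlud      # write Vccmin degrades with stronger WLUD

# Active Vccmin at a setting is the worse (higher) of the two curves;
# the optimal WLUD strength sits at their intersection.
candidates = [0.5 * i for i in range(61)]              # 0 .. 30 % of Vcc
best = min(candidates, key=lambda w: max(read_vccmin(w), write_vccmin(w)))
# best == 10.0 here, since 800 - 10*w = 650 + 5*w gives w = 10
```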
Fig. 4. The effect of WLUD strength on read and write Vccmin for a die at 95 °C
and 10 °C is shown. The optimal WLUD strength, determined by where the read
and write Vccmin curves intersect, differs for the die at the two temperatures.
Without the ability to track this temperature shift, a compromise solution for
WLUD must be picked, which is not optimal at either 10 °C or 95 °C. Data is
measured for a typical die.
Fig. 5. The adaptive dynamic word-line under-drive (ADWLUD) circuit consists of the WLUD module, controller and 6T-SRAM bitcell based sensor.
One input to the comparator is the sensor output voltage and the
other is the reference voltage. If Vsensor is less than Vref1, the
comparator output signal activates WLUD pMOS P3, applying
a strong WLUD. Similarly, the other comparator will enable
P4 when Vsensor is less than Vref2. Depending on the magnitude of Vsensor, the controller applies either weak or strong
WLUD. Comparator offset directly contributes to the overall
error in WLUD setting assignment and reduces the Vccmin benefit. The impact of non-idealities in the controller is explored in
the measurement results section. The reference voltage generation circuit consists of a resistive divider with a multiplexer for
controller calibration. One of four different nodes from the resistive divider can be chosen as the reference voltage. Since this
is a ratioed circuit composed of uniform elements, it provides
reference voltages that are independent of process corner and
temperature. Characterization of a statistically significant quantity of silicon material is used to empirically determine optimal
values for Vref1 and Vref2. The empirically determined settings
are then used across all silicon material. The choice of Vref1 and
Vref2 is discussed in more detail in Section IV in the context of
actual silicon results.
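The two-comparator decision just described can be sketched as a simple threshold map from Vsensor to a WLUD strength; the reference values and setting labels below are assumptions for illustration, since the real Vref1 and Vref2 are calibrated empirically from silicon:

```python
# Two-comparator ADWLUD controller sketch (thresholds are assumed values).
def wlud_setting(vsensor, vref1=0.45, vref2=0.55):
    if vsensor < vref1:
        return "strong"    # comparator 1 fires -> enable strong-WLUD pMOS (P3)
    if vsensor < vref2:
        return "weak"      # comparator 2 fires -> enable weak-WLUD pMOS (P4)
    return "none"          # write-limited die: no under-drive applied

assert wlud_setting(0.40) == "strong"
assert wlud_setting(0.50) == "weak"
assert wlud_setting(0.60) == "none"
```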
Fig. 6 shows simulated waveforms of the ADWLUD system
at 5 GHz with a 1 V supply voltage at 95 °C. The diagram describes the timing of the WL-driver wake-up signal (wl_slpen) and
WLVCC with different WLUD settings. The wake-up signal arrives one cycle before WL is asserted, and WLVCC is restored
to the voltage level set by the WLUD strength setting, ranging
from 0 to 20% below Vcc.
Simulations were performed to assess the effect of systematic
and random variation on the WLUD circuit itself. At a given
temperature, the variation across various process corners for
the WLVCC voltage is on the order of 6 mV. In addition, the
random variation component is 10 mV. These variation components are not significant enough to impact the efficacy of ADWLUD, which is confirmed by the silicon measurements presented in the measurement results section.
Fig. 7. The ability of the sensor to track process corners (a) and temperature
shifts (b) is shown. On the x-axis, the difference between read and write Vccmin
data obtained from Si measurements is plotted. This value is well correlated to
the sensor output voltage Vsensor. Vsensor (measured at nominal voltage for
all die) can be used to determine if a die is read or write limited by comparing
Vsensor to the reference voltage.
Fig. 9. The shift in measured Vsensor value when the temperature changes from
10 °C to 95 °C is shown. For the die of interest, the median shift is 80 mV with
a 1-sigma variation of 6 mV.
Fig. 10. The distribution of read-write Vccmin for die measured on a wafer
with no WLUD (a). Applying fixed strong WLUD (b) improves read limited
die while degrading the write limited die significantly. In the sensor-controlled
mode (ADWLUD), read limited die improve but the write limited die do not
degrade significantly (c).
V. CONCLUSIONS
Fig. 13. Die photo of the testchip showing the 3.4 Mb macro that is controlled
by the sensor.
Pramod Kolar (S'01–M'04) received the B.E. degree in electronics and communication engineering
from the National Institute of Technology, Surathkal,
India, in 1998 and the M.S. and Ph.D. degrees in electrical engineering from Duke University, Durham,
NC, in 2002 and 2005, respectively.
He has been with Advanced Design, Logic Technology Development, Intel Corporation, since 2005,
where he works on SRAM bitcell development,
statistical circuit design and yield analysis. He has
published 12 papers in international conferences
and technical journals and holds two U.S. patents. He was a graduate intern at
Qualcomm in 2004.
Dr. Kolar received the Inventor Recognition Award from Semiconductor Research Corporation in 2005.
Henry (Hyunwoo) Nho received the B.S. degree in electrical engineering from Korea Advanced Institute of
Science and Technology, Daejeon, Korea, in 2003,
and the M.S. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, in
2005 and 2008, respectively.
After finishing the Ph.D. in June 2008, he joined
Advanced Design Group, Logic Technology Development, Intel Corporation, where he worked on the
lead vehicle design for future process technology development, and low-power high-performance SRAM
designs for CPU and mobile applications. In January 2010, he joined Mobile
Platform Architecture Group, LG Electronics, as a Senior Research Engineer.
Since then, he has been working on architecture for mobile application processors, optimization of system architecture for mobile devices, and adoption of
innovative technologies into mobile devices.
I. INTRODUCTION
VDD       HD-SRAM   D-SRAM
800 mV    0         0
750 mV    0         1
700 mV    0         1
650 mV    0         5
600 mV    1         10
550 mV    19        115
MEASUREMENT RESULTS
A. Test Chips
We fabricated test chips including 32 kb banks of HD-SRAM
and commercial D-SRAM in a 45nm CMOS process
with 1.1V nominal VDD (Figs. 5 and 6). Each bank uses
identical address decoders, WL and BL drivers, and sense
amplifiers (SAs). HD-SRAM adds gating logic and an
additional WL driver to support two WLs per row, slightly
decreasing array efficiency. We tie one HD-SRAM SA input
to a reference voltage to accommodate single-ended read. The
test chips do not include assist circuits, error correction coding
(ECC), or redundancy, which could be applied to either
design. A BIST performs functionality and performance tests
on each design. Functionality is assessed by performing march
tests with solid, checkerboard, and stripe test patterns.

B. Performance, Power and Leakage

                 HD-SRAM    D-SRAM
Process              45nm CMOS
Bitcell Area         0.37 µm²
VMIN             639 mV     711 mV
Sim. SNM         353 mV     268 mV
Performance      550 MHz    650 MHz
Energy / bit     43 fJ      53 fJ
Leakage / bit    55 pW      64 pW
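The march tests mentioned above walk the array with fixed address-order read/write elements; a minimal sketch in the spirit of a march C- sequence (the element order is a simplified illustration, not the exact BIST algorithm used on the test chips, and `read`/`write` are assumed stand-ins for the array access):

```python
# Simplified march-style memory test over a simulated bit array.
def march_test(mem_size, read, write):
    for a in range(mem_size):                 # up: w0
        write(a, 0)
    for a in range(mem_size):                 # up: r0, w1
        if read(a) != 0:
            return False
        write(a, 1)
    for a in reversed(range(mem_size)):       # down: r1, w0
        if read(a) != 1:
            return False
        write(a, 0)
    return all(read(a) == 0 for a in range(mem_size))   # up: r0

mem = [0] * 32
assert march_test(32, lambda a: mem[a], lambda a, v: mem.__setitem__(a, v))
```

A stuck-at fault in any cell causes one of the read elements to observe the wrong value, so the test returns False for a faulty array.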
Figure 9. Failure maps show bitcell failure locations as VDD is scaled down.
For this test array VMIN is 650mV for HD-SRAM and 800mV for D-SRAM.
ACKNOWLEDGEMENTS
The authors thank STMicroelectronics for fabrication
and support of this project.
Figure 10. A histogram of measured VMIN for 80 test chips shows that HD-SRAM has a 72mV-lower average VMIN and fewer arrays with high VMIN.
REFERENCES
[1]
[2]
[3]
[4]
[5]
[6]
[7]
Figure 11. At nominal VDD, HD-SRAM has a 100x-lower bitcell failure rate
than D-SRAM. Read failures dominate cell stability at nominal VDD and are
aggravated only in this plot in order to observe a significant number of failures.
I. INTRODUCTION
Technology scaling has made the current day ICs faster,
denser. But faster the circuits are, more is the power
consumption and hence reduces the battery life of many of
the portable devices. Reduction in power consumption can
be achieved by many techniques: Logic Optimization,
The 6T SRAM cell, shown in Figure 1, has two cross-coupled inverters ((M1, M2) and (M4, M3)) connected to the
bit lines through the access transistors (M5, M6) [4].
During a write, the bit lines are driven with the value that has
to be written into the cell; the word line WL goes high and
the value is stored in the cell. During a read, the cell drives
the respective bit lines with the value stored in the cell. This
cell has already been studied in the sub-threshold region
[5]. Results have shown that the write ability of the cell fails
due to decreased signal levels and increased variations.
Writing depends on the NMOS winning the ratioed fight with the
PMOS; but as an iso-sized PMOS is stronger than an NMOS in the
sub-VT region, this becomes more challenging and fails [5].
Similarly, the read SNM also degrades heavily because of
the interference from the bit lines, making the cell more prone to
flipping its state. The SNM defines the amount of
noise that the cell can bear before the state of the cell flips.
In [1], the authors show that, for a 6σ probability, the hold SNM at a
particular supply, say 0.3 V, is equal to the RSNM at
twice that supply (0.6 V). So, the 6T SRAM
cell can operate at very low voltages provided that the RSNM problem is
removed. There have been many SRAM design proposals
for sub-threshold operation. Some above-threshold SRAMs which satisfy the above condition for
read, and which can also write at low voltages, can be
successful in the sub-threshold region. Such
SRAM designs were chosen, pushed into the sub-threshold
region of operation, and their performance was
observed and compared.
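The SNM definition above (the largest static noise the cell can bear before its state flips) can be illustrated numerically by injecting a worst-case DC noise into a cross-coupled inverter loop; the tanh inverter transfer curve and all numbers below are assumptions for illustration, not fitted device models:

```python
import math

def vtc(vin, vdd=0.3, gain=8.0):
    # Toy inverter transfer curve (tanh model; gain and shape are assumptions).
    mid = vdd / 2
    return mid * (1 - math.tanh(gain * (vin - mid) / mid))

def holds_state(vn, vdd=0.3):
    # Worst-case series DC noise: pushes the node storing '0' up and the node
    # storing '1' down, then iterates the cross-coupled loop to a fixed point.
    x = 0.0
    for _ in range(200):
        x = vtc(vtc(x + vn, vdd) - vn, vdd)
    return x < vdd / 2          # True if the stored '0' survived

def snm_estimate(vdd=0.3, step=1e-3):
    # Largest noise voltage for which the cell still holds its state.
    vn = 0.0
    while vn < vdd and holds_state(vn + step, vdd):
        vn += step
    return vn
```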
down the read process as the read bit line discharge has to
take place through three stacked transistors.
IV. SIMULATION SETUP
Delay        6T [6]   7T [7]   7T [8]   7T [9]   8T [10]   9T [11]   9T [12]   11T [13]   10T [1]   Proposed
Twrite01     Fails    1015.5   2501     Fails    1035      1131      1039.5    1059       2611.5    1018.3
Twrite10     321      198.15   1495     Fails    200       201       202.5     202.5      460.7     193.19
Tread1       --       367      721      316      --        466       --        --         --        --
Tread0       0.494    --       721      240      210       466       562       203        0.605     209.6
TABLE II
POWER RESULTS OF SRAMS AT 45nm TECHNOLOGY WITH 0.3 V VDD,
POWER IN NANOWATTS (nW)

Power        6T [6]   7T [7]   7T [8]   7T [9]   8T [10]   9T [11]   9T [12]   11T [13]   10T [1]   Proposed
Pwrite01     Fails    2.32     2.02     Fails    2.28      2.5       2.32      2.34       6.48      2.21
Pwrite10     3.146    2.25     2.02     Fails    2.29      2.5       2.28      2.33       6.18      2.23
Pread1       0.171    0.456    0.862    0.640    0.241     0.998     0.253     0.195      0.338     0.244
Pread0       0.665    0.235    0.862    0.659    0.379     0.998     0.439     0.424      0.482     0.382
PHOLD        0.0768   0.161    0.251    0.201    0.158     0.177     0.156     0.158      0.217     0.161
TABLE III
AVERAGE POWER DELAY PRODUCT OF SRAMS (Ws)

Avg PDP    7T [7]       8T [10]      9T [12]      11T [13]     Proposed
Writing    1.3866e-18   1.4109e-18   1.4283e-18   1.4728e-18   1.3477e-18
Reading    1.267e-19    0.651e-19    1.944e-19    0.6282e-19   0.6573e-19
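The write-PDP averages in Table III can be turned directly into the relative savings quoted later in the conclusion; a quick check, with percentages computed as (other − proposed)/other:

```python
# Write PDP averages from Table III (watt-seconds).
write_pdp = {"7T [7]": 1.3866e-18, "8T [10]": 1.4109e-18,
             "9T [12]": 1.4283e-18, "11T [13]": 1.4728e-18}
proposed_write = 1.3477e-18

savings = {name: 100 * (pdp - proposed_write) / pdp
           for name, pdp in write_pdp.items()}
# about 2.8% versus the 7T design and about 8.5% versus the 11T design
```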
V. ANALYSIS OF RESULTS
All the SRAM designs were compared for power and
delay values when subjected to operate in the sub-threshold
region. Some of the designs worked well into the subthreshold region. The designs that did not work were not
taken into comparative study for performance. The further
part of the paper explains about the reasons as if why the
design does work or does not work.
A. Designs that fail
Out of the designs compared, along with the standard 6T
design, the 7T SRAM design proposed in [9] and the single-ended
6T SRAM design proposed in [6] do not work in the
sub-threshold region. The standard 6T design does not work for
the reasons mentioned earlier. The single-ended 6T SRAM
design does not work because inverter 1 is not able to
activate inverter 2 at such low supply voltages. The
same problem exists with the 7T SRAM from [9]. Moreover,
this SRAM uses the traditional read method, which reduces
the RSNM of the circuit and hence makes it more prone to failure.
B. Designs that work
Among the designs used for comparison, the 7T
SRAMs proposed in [7] and [8], the 8T SRAM proposed in [10],
the 9T SRAM designs proposed in [11] and [12], and the 11T
SRAM proposed in [13] work well. All these designs
except the 11T SRAM [13] use separate read and write
parts in the SRAM.
The sub-threshold SRAM proposed in [1] was also taken up
for comparative analysis. Here, in 45nm technology, when
the circuit was simulated, the PMOS header to the supply
had to be upsized for proper functionality. When its results
were compared with those of the other SRAM designs considered
for study, all the other working designs, which use the
separate read and write mechanism, performed better. This
is because the design in [1] uses a floating Vdd and hence
takes longer to write. Similarly, its read has to occur
through more transistors and is hence slower.
The compared designs write correctly, but they
are slower than in above-threshold operation, which is
expected at such a low supply. From the results in Table I
and Table II, we can see that the 7T SRAM from [8] and the 9T
SRAM from [11] have the worst performance with respect
to power and speed, and hence were not considered further
in finding the best performer. This is because of their high bit
line capacitance, and hence higher delay and power, as these
designs have extra pass transistors connected to the bit lines
[8].
C. SRAM Write
Among the designs used for study, successful writes give
almost identical results, as they all use the same 6T SRAM
structure for writing. Since the inverter is
not ratioed, the write '1' operation can occur at such a low
supply for typical-corner transistor models [2]. In Table I,
combining the results of both write '1' and
write '0', the 7T SRAM in [7] emerges as the
fastest. This is because it weakens the feedback by
disconnecting the 6T write part from the ground, and hence
can load the storage nodes easily [6]. It is closely
followed by the 9T SRAM [12], the 11T SRAM [13], and the
8T SRAM [4]. From Table II, the power
consumption differs according to the number of
transistors used, with the 11T SRAM design consuming the most.
The difference in speed between the considered designs is
almost nullified by the difference in power consumption, as
the faster designs use more transistors. This can be
observed from Table III, which lists the average power delay
products: the result for all the selected designs is almost
the same, with minor differences, because
the write mechanism is similar in all the compared cases.
D. SRAM Read
When we compare the results, the designs with separate
read ports and a dedicated read bit line emerge as the winners in
terms of speed and power. For speed, as shown in
Table I, the 11T SRAM [13] is the fastest in reading,
followed by the 8T SRAM [10] and the 7T SRAM [7]. This is
because the read discharge through the storage node has two
paths, and is hence faster. The 8T SRAM is very close in
speed because its read occurs through just two
transistors. Though the 7T SRAM has the similar
PDP SAVINGS OF THE PROPOSED DESIGN (%)

           7T [7]   8T [10]   9T [12]   11T [13]
Writing    2.80     4.48      5.64      8.5
Reading    44.8     -0.9      66.18     -4.6
VI. CONCLUSION
The paper proposes a novel sub-threshold SRAM circuit
along with a study of various SRAM designs in the sub-threshold
region at 45nm technology using HSPICE
simulations and typical-corner transistor models. Operating
an SRAM device in sub-threshold requires sufficient write
ability and a good static noise margin. Among
the designs considered, the successful ones use
the 6T setup for writing, without the necessity of a ratioed
inverter, and a separate setup for read, when typical-corner
transistor models are used. The results show that the
successful designs perform better than the sub-threshold SRAM
proposed in [1]. The 7T SRAM proposed
in [7] has the best write performance and the 8T SRAM
[10] has the best read performance. The 11T SRAM
from [13] is good in speed and power but has reduced
SNM in the read mode. A new 9T SRAM design
combining the advantages of these designs was proposed.
For write, the PDP of the proposed 9T SRAM design is
2.80% less than the 7T SRAM from [7], 4.48% less than the
8T SRAM from [10], 5.64% less than the 9T SRAM design
[12], and 8.5% less than the 11T SRAM [13]. Similarly, the
savings in PDP during read are 44.8% relative to the 7T SRAM
in [7] and 66.18% relative to the 9T SRAM in [12]; it is almost the
same as in the 8T design. Though the PDP of the proposed
design is greater than that of the 11T design, the reduced
RSNM of the 11T design makes it inferior to the proposed design.
REFERENCES
[1] Calhoun, B.H.; Chandrakasan, A.;, "A 256kb Sub-threshold SRAM in
65nm CMOS," Solid-State Circuits Conference, 2006. ISSCC 2006. Digest
of Technical Papers. IEEE International, vol., no., pp.2592-2601, 6-9 Feb.
2006
[2] Moradi, F.; Wisland, D.T.; Aunet, S.; Mahmoodi, H.; Tuan Vu Cao; ,
"65NM sub-threshold 11T-SRAM for ultra low voltage applications," SOC
Conference, 2008 IEEE International , vol., no., pp.113-118, 17-20 Sept.
2008
[3] Wang, A.; Chandrakasan, A.P.; Kosonocky, S.V.; , "Optimal supply
and threshold scaling for subthreshold CMOS circuits ," VLSI, 2002.
[12] Sheng Lin; Yong-Bin Kim; Lombardi, F.; , "A 32nm SRAM design
for low power and high stability," Circuits and Systems, 2008. MWSCAS
2008. 51st Midwest Symposium on , vol., no., pp.422-425, 10-13 Aug.
2008
[13] Singh, A.K.; Prabhu, C.M.R.; Soo Wei Pin; Ting Chik Hou, "A proposed symmetric and balanced 11-T SRAM cell for lower power consumption," TENCON 2009 - 2009 IEEE Region 10 Conference, pp. 1-4, 23-26 Jan. 2009.
[14] Berkeley Predictive Technology Model website, http://www.eas.asu.edu/~ptm/.
Sumana Basu
I. INTRODUCTION
II. BACKGROUND
III.
Figure 6.
IV. SIMULATION RESULTS
same as switching the bit line. Note that for a read operation,
since the bit line is pre-charged to VDD, there is no significant
current flow or voltage change across the access transistor if
the cell contains a '1'. Therefore, the read '1' delay is not defined.
TABLE I.

Operation   Power             Delay
WRITE 0     162 µW / 0 µW     5.1 ns / 6 ns
WRITE 1     162 µW / 81 µW    5.5 ns / 0 ns
READ 0      243 µW / 162 µW   7.5 ns; 8.5 ns @ 600 nm, 6.5 ns @ 800 nm {MRD/MRA}
READ 1      243 µW / 81 µW    --
The main operations of the SRAM cell are write, read,
and hold. The static noise margin matters most during the
hold and read operations [6], specifically during read,
when the wordline is 1 and the bitlines are precharged to 1. The
internal node of the SRAM which stores a 0 will be pulled up
through the access transistor, to a level set by the voltage divider
formed by the access transistor and the driver transistor. This
increase in voltage severely degrades the SNM during the read
operation.
Figure 8. The SNM curve plotted for different VBS, changing from 0.0 V to 1.8 V in steps of 300 mV.
V.
CONCLUSION
REFERENCES
[1]
[2]
[3]
Yen Chen, Gary Chen et al., "A 0.6V dual rail compiler SRAM design on 45nm CMOS technology with adaptive SRAM power for lower Vdd_min VLSIs," IEEE Journal of Solid-State Circuits, vol. 44, no. 4, pp. 1209-1214, Apr. 2009.
[4]
Koichi Takeda et al., "A Read Static Noise Margin Free SRAM cell for Low Vdd and High Speed Applications," IEEE Journal of Solid-State Circuits, vol. 41, no. 1, pp. 113-121, Jan. 2006.
[19] Shilpi Birla, Neeraj Kr. Shukla, Manisha Pattanaik, R.K.Singh, Device
and Circuit Design Challenges for Low Leakage SRAM for Ultra Low
Power Applications, Canadian Journal on Electrical & Electronics
Engineering Vol. 1, No. 7, December 2010,
[5]
[6]
Benton H. Calhoun, Anantha P. Chandrakasan, "A 256-kb 65-nm Sub-threshold SRAM Design for Ultra-Low-Voltage Operation," IEEE Journal of Solid-State Circuits, vol. 42, no. 3, pp. 680-688, Mar. 2007.
[7]
Peter Geens, Wim Dehaene, "A dual port dual width 90nm SRAM with guaranteed data retention at minimal standby supply voltage," 34th European Solid-State Circuits Conference (ESSCIRC 2008), pp. 290-293.
[8]
Farshad Moradi et al., "65nm sub-threshold 11T-SRAM for ultra low voltage applications," IEEE International SOC Conference, 2008, pp. 113-117.
[9]
[22] Prashant Upadhyay, Rajesh Mehra, Niveditta Thakur, "Low Power Design of an SRAM Cell for Portable Devices," Int'l Conf. on Computer & Communication Technology, ICCCT'10.
[23] Yen Hsiang Tseng, Yimeng Zhang, Leona Okamura, and Tsutomu Yoshihara, "A New 7-Transistor SRAM Cell Design with High Read Stability," 2010 International Conference on Electronic Devices, Systems and Applications (ICEDSA 2010).
[11] B. Cheng et al., "The impact of random doping effects on CMOS SRAM cell," in Proc. ESSCIRC, Sep. 2004, pp. 219-222.
[12] E. Seevinck et al., "Static-noise margin analysis of MOS SRAM cells," IEEE J. Solid-State Circuits, vol. SC-22, no. 5, pp. 748-754, Oct. 1987.
[14] L. Chang et al., "Stable SRAM cell design for the 32 nm node and beyond," in Symp. VLSI Technology Dig. Tech. Papers, Jun. 2005, pp. 128-129.
[15] Evelyn Grossar, Michele Stucchi, Karen Maex, Member, IEEE, and
Wim Dehaene, Senior Member, Read stability and write-ability
analysis of SRAM Cells for nanometer Technologies IEEE Journal Of
Solid-State Circuits, Vol. 41, No. 11, November 2006.
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMSI: REGULAR PAPERS, VOL. 59, NO. 10, OCTOBER 2012
2275
I. INTRODUCTION
RECENTLY, portable devices such as smart phones, cellular phones, and video cameras have been gaining popularity and are changing every aspect of our daily lives.
Multimedia data processing including image/video applications is one of the key factors enhancing the ever-increasing
portable device market. However, image/video applications are
very computationally intensive and require a large amount of
embedded memory access, which results in significant power
consumption and thus limits the battery lifetime of portable
devices.
Manuscript received March 01, 2011; revised July 12, 2011 and October 12,
2011; accepted November 24, 2011. Date of publication February 14, 2012; date
of current version September 25, 2012. This work was supported by the Basic
Science Research Program through the National Research Foundation of Korea
(NRF) funded by the Ministry of Education, Science, and Technology (20100004484). This paper was recommended by Associate Editor C.-C. Wang.
J. Kwon, I. Lee, and J. Park are with the School of Electrical Engineering,
Korea University, Seoul 136-701, Korea (e-mail: [email protected]).
I. J. Chang is with the School of Electronics and Information, Kyung Hee
University, Suwon 446-701, Korea.
H. Park is with the Department of Computer Science, Yonsei University,
Seoul 140-749, Korea.
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCSI.2012.2185335
Fig. 3. Comparison between the write failure probability in the SF corner and the read failure probability in the FS corner for minimum size SRAM.
negative read static noise margin (SNM) [3], [13] and negative write margin [14]. Under process variations, the worst process corners in the SRAM bit-cell for read and write operations are the Fast-NMOS and Slow-PMOS (FS) and Slow-NMOS and Fast-PMOS (SF) corners, respectively [7]. The SRAM read failure probability at the FS corner and the write failure probability at the SF corner are simulated using IBM 90-nm CMOS technology, and the results are presented in Fig. 3. Please note that the write failure probability at the SF corner is much smaller than the read failure probability at the FS corner, which implies that in the conventional 6T SRAM bit-cell the worst SRAM failure probability happens in the FS corner. In the following, we will refer to SRAM failure as the read failure under the FS corner, which means that we consider the worst process corner of the 6T SRAM.
As mentioned, most of the SRAM failures are mainly due to random transistor threshold voltage (V_th) variations [15]. The V_th variation is usually caused by the random dopant fluctuation (RDF) effect [16]. Since the effect of RDF is expected to increase with technology scaling [16], [17], the probability of an SRAM cell failure will grow as well with process scale-down. The RDF-based V_th variation in a simple reverse-quadratic model is expressed as follows [18]:
σ_Vth = σ_Vth0 · √( (L_min · W_min) / (L · W) )    (1)
KWON et al.: HETEROGENEOUS SRAM CELL SIZING FOR LOW-POWER H.264 APPLICATIONS
Fig. 5. Block diagram of H.264 video encoder and its embedded SRAMs.
Fig. 4. SRAM failure probabilities for different supply voltages and bit-cell sizes.
where σ_Vth0 is the standard deviation of the V_th variations for a minimum-sized transistor, which has a channel length L_min and a channel width W_min. From the equation above, we can easily notice that the SRAM failure probability decreases with larger transistor size.
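A quick numeric check of this scaling (a sketch: the 30-mV σ_Vth0 and the size ratios below are illustrative assumptions, not values from the paper):

```python
import math

def sigma_vth(sigma_vth0_mv, l_ratio, w_ratio):
    """Reverse-quadratic RDF model: sigma shrinks as 1/sqrt(L*W)
    relative to the minimum-sized device (eq. 1)."""
    return sigma_vth0_mv / math.sqrt(l_ratio * w_ratio)

# Assumed 30 mV sigma for the minimum (1.0x) device.
base = sigma_vth(30.0, 1.0, 1.0)      # minimum-size cell
wider = sigma_vth(30.0, 1.0, 1.35)    # 1.35x-wide cell has smaller sigma
print(base, wider)
```

Widening the cell by 1.35x shrinks σ_Vth by a factor of √1.35, which is the lever the sizing algorithm later trades against area.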
The failure probabilities for different supply voltages and
transistor widths in a SRAM bit-cell are presented in Fig. 4. The
numerical results are obtained using extensive Monte Carlo simulations with IBM 90-nm technology. The number of destructive read operations in SRAM is counted using Monte-Carlo
simulations for 100 000 samples with local intra-die threshold
voltage variations (RDF effects) at the worst global process
corner (FS corner). As the supply voltages and transistor widths
become smaller, the probability of SRAM failure abruptly increases. It should be noted that the increasing failures due to voltage scaling-down can be compensated by a larger transistor width. For example, an SRAM bit-cell with minimum transistor width (1.0x) and a 925-mV supply voltage has a failure probability of 1.07%, the same failure probability as a 1.35x-width bit-cell under an 850-mV supply voltage.
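The counting procedure can be sketched abstractly in a few lines: sample one Gaussian V_th shift per cell, declare a destructive read when the shift exceeds a read margin, and divide. The margin model and all numbers below are illustrative stand-ins, not the paper's SPICE-level criterion.

```python
import random

def mc_read_failure_rate(n_samples, sigma_vth_mv, margin_mv, seed=0):
    """Count destructive reads over n_samples cells.
    A cell 'fails' when its sampled Vth shift eats the whole read
    margin -- a toy stand-in for the SPICE-level failure criterion."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_samples):
        shift = rng.gauss(0.0, sigma_vth_mv)  # aggregated RDF shift
        if shift > margin_mv:                 # margin exhausted
            failures += 1
    return failures / n_samples

# Larger devices (smaller sigma) fail less often at the same margin.
print(mc_read_failure_rate(100_000, 30.0, 70.0))
print(mc_read_failure_rate(100_000, 30.0 / 1.35 ** 0.5, 70.0))
```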
B. Video Quality Degradation for Bit Position
For the embedded SRAM used inside an H.264 system, the SRAM failure increase leads to video quality degradation. The key observation here is that the amplitude of the quality degradation is quite dependent on the bit-cell failure locations. In other words, failures that occur in the SRAM bit-cells storing the higher-order bits (HOBs) of luma/chroma pixels result in considerable video quality degradation, while the lower-order bit (LOB) data, if affected, does not significantly deteriorate the output video quality. In this section, we quantitatively analyze the effect of the SRAM failure positions on the video quality degradation in an H.264 system.
As a video quality measure, we use the PSNR (peak signal-to-noise ratio):

PSNR = 10 · log10( 255² / MSE )    (2)

MSE is the mean square error between the original videos and the impaired videos, whose quality is degraded due to the failures in embedded memory. For an M × N frame with original pixels f(i, j) and impaired pixels f'(i, j), MSE is expressed as

MSE = (1 / (M·N)) · Σ_i Σ_j ( f(i, j) − f'(i, j) )²    (3)
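Equations (2) and (3) translate directly into code; the 255 peak assumes 8-bit pixels, and the two tiny "frames" below are made-up data:

```python
import math

def mse(orig, impaired):
    """Mean square error between two equal-size pixel sequences (eq. 3)."""
    return sum((a - b) ** 2 for a, b in zip(orig, impaired)) / len(orig)

def psnr(orig, impaired):
    """Peak signal-to-noise ratio for 8-bit pixels (eq. 2)."""
    e = mse(orig, impaired)
    if e == 0:
        return float("inf")   # identical frames: infinite PSNR
    return 10.0 * math.log10(255.0 ** 2 / e)

frame = [16, 128, 200, 255]
noisy = [18, 120, 199, 250]
print(round(psnr(frame, noisy), 2))
```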
Fig. 5 presents the H.264 encoder architecture [19]. The motion estimation (ME) block records the displacement with the
reference frame as a motion vector. The motion compensation
(MC) reconstructs the temporal domain frame by using the motion vectors. Here, the difference between the original data and
the reconstructed data is computed and stored in residual frame
buffer. This residual frame data is transformed to the frequency
domain and further quantized in order to reduce the spatial redundancy. The final bit-stream is generated by using the motion
vectors and quantized transform coefficients. The target application in this work is the H.264 baseline profile level 1.3 that
supports common intermediate format (CIF). Since the format
presents 352×288 resolution under a 0.5-Mbps constant bit-rate (CBR) constraint, it allows low clock frequencies with low
supply voltage.
During the H.264 encoding process, we assume that the
six embedded memories (the residual frame buffer, the reconstructed frame buffer, the reference frame buffer, the
inter-prediction buffer, the pipelined buffer for DCT, and the
quantization buffer) are utilized as buffers storing the intermediate results of the frame data, which are highlighted in
Fig. 5. In the following discussions, we assume that the SRAM
failures are equally distributed in those six different buffers,
and the video quality degradation is measured on the H.264
decoder side.
As a measure of the video quality degradation, we define ΔPSNR as the PSNR difference between the original video and the impaired video due to the SRAM failures:

ΔPSNR = PSNR_original − PSNR_impaired    (4)
Fig. 6 shows the ΔPSNR plots when the SRAM failure occurs only at one particular bit position of the 8-bit luma pixel and 4-bit chroma pixel during the encoding process of H.264 systems. During the encoding process, the level 1.3 baseline profile encoder with JM reference software [20] version 16.0, which
Fig. 6. ΔPSNR for different failure locations in embedded SRAM. (a) ΔPSNR changes for 8-bit luma pixel data. (b) ΔPSNR changes for 4-bit chroma pixel data.
supports CIF video format, 30 fps, and 0.5-Mbps bit-rate with adaptive quantization, is used. More than ten reference video samples are used to get the average value of the ΔPSNR. As the failure probability increases at the HOBs, the ΔPSNR reaches around 24.3 dB; on the other hand, the ΔPSNR stays around 1.0 dB for failures in LOBs. Here, when the SRAM failure occurs only at the k-th order bit of the 8-bit pixel, the video quality degradation can be represented as ΔPSNR_k, k = 0, ..., 7. As the failure position becomes closer to bit 7, the ΔPSNR increases abruptly.
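The HOB/LOB asymmetry can be reproduced with a toy experiment that flips bit k of each 8-bit pixel with a small probability and measures the resulting PSNR. The flip probability and the flat test pattern are illustrative assumptions, not the paper's JM-encoder setup.

```python
import math, random

def psnr_after_bit_failures(pixels, bit, fail_prob, seed=1):
    """PSNR of an 8-bit pixel stream after random flips of one bit plane."""
    rng = random.Random(seed)
    impaired = [p ^ (1 << bit) if rng.random() < fail_prob else p
                for p in pixels]
    err = sum((a - b) ** 2 for a, b in zip(pixels, impaired)) / len(pixels)
    return float("inf") if err == 0 else 10 * math.log10(255 ** 2 / err)

pixels = list(range(256)) * 40          # flat toy "frame"
for bit in (0, 3, 7):                   # LOB, middle, HOB
    print(bit, round(psnr_after_bit_failures(pixels, bit, 0.01), 1))
```

Each flip at bit k injects an error of magnitude 2^k, so the PSNR drop grows sharply toward bit 7, mirroring the trend in Fig. 6.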
The analysis in Fig. 6 shows that the amplitudes in video
quality degradation are quite dependent on the bit-cell failure
positions and the supply voltage. Therefore, the SRAM bit-cells
storing the luma/chroma pixel data can be carefully sized so
that the cells storing important video data are less affected by
voltage scaling or process variations. Of course, the failures of
the SRAM cells storing the LOB pixel data may increase; however, the video quality degradation is not significant. In order
to select the appropriate size for each of the SRAM bit-cells,
we propose a priority-based SRAM sizing algorithm in the following section.
(Algorithm 1: initialize the per-bit tables for the given voltage; for each bit position 1 to 7, iterate over the candidate sizes and keep the best combination.)
However, this methodology requires a huge computation overhead since it needs to compute all of the possible bit-cell combinations. In the case of calculating the optimal cell sizes of the 8-bit luma pixels, the complexity is O(N^8), where N is the number of all the probable cases of the bit-cell sizing from minimum to maximum. In our experiment, 17 probable bit-cell sizes can be chosen; therefore, a total of 17^8 sizing cases would need to be considered. In the following subsection, we propose an efficient approach to find a set of bit-cell sizings that guarantees the optimal video quality with a reasonable time complexity.
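The efficient approach the paper develops is a dynamic program over bit positions and a discretized area budget; the sketch below illustrates the idea on a made-up 4-bit instance (the SIZES and DELTA tables are invented, not the paper's measured ΔPSNR values):

```python
# Toy instance: 4 bit positions, 3 discrete sizes (area units) each.
# DELTA[bit][size_index]: quality loss if this bit gets this size.
SIZES = [1, 2, 3]                      # area units per candidate size
DELTA = [                              # HOBs (later rows) hurt more
    [0.2, 0.1, 0.05],                  # bit 0 (LOB)
    [0.8, 0.4, 0.2],
    [2.0, 1.0, 0.5],
    [6.0, 3.0, 1.0],                   # bit 3 (HOB)
]

def optimal_sizing(area_budget):
    """DP over (bit, used area) -> minimum total delta-PSNR."""
    INF = float("inf")
    best = {0: 0.0}                    # used_area -> min cost so far
    for costs in DELTA:
        nxt = {}
        for used, c in best.items():
            for s, dc in zip(SIZES, costs):
                u = used + s
                if u <= area_budget and c + dc < nxt.get(u, INF):
                    nxt[u] = c + dc
        best = nxt
    return min(best.values())

print(optimal_sizing(8))
```

With 8 bits and N sizes per bit, this runs in O(8 · N · A) for A discrete area levels instead of N^8, which is the point of the table-based formulation.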
ΔPSNR_sum(i, A) = min over s of [ ΔPSNR_sum(i−1, A − s) + ΔPSNR(i, s) ]    (5)

where ΔPSNR(i, s) means the ΔPSNR of the i-th order bit with size s. Fig. 7 shows an example of a subproblem sizing (2, 3.8x).
To solve the subproblem sizing (2, 3.8x), the candidates from the previous iteration are checked to find the partial cell sizing that shows the minimum sum of the ΔPSNR. The resulting minimum sum and sizing are stored in the table, and those values will be used in the next step of the iteration.
Algorithm 2 shows an overall flow of the optimal cell sizing selection process under a given area budget. As a first step of the iteration, when the bit index is equal to zero, the table entries are initialized directly, since there is only one candidate. The other table contents are initialized as an infinite number, and all the sizing elements are set to zero. As mentioned, in order to find the optimal solution of a
Fig. 8. Still shot images and PSNR comparisons between the identical and the proposed approach for two different videos with the iso-area condition of 1.3x under 900-mV supply voltage. (a) An image from Foreman video with the identical sizing SRAM. (b) An image from Foreman video with heterogeneous sizing SRAM. (c) An image from City video with the identical sizing SRAM. (d) An image from City video with heterogeneous sizing SRAM.
Fig. 9. Average PSNR comparisons for ten sample videos under different area
constraints at 900-mV supply voltage.
Fig. 10. PSNR curves per frame for sample videos. (a) PSNR curves of 60 frames Soccer video under 900-mV (heterogeneous and identical) and 1.2-V (identical)
supply voltage. (b) PSNR curves of 60 frames Crew video under 900-mV (heterogeneous and identical) and 1.2-V (identical) supply voltage. (c) PSNR curves
of 1000 frames video (Harbour, Canoo, mobile, Football) under 900-mV (heterogeneous and identical) and 1.2-V (identical) supply voltage. (d) PSNR curves of
1000 frames video (Soccer, Crew, Foreman, Bus) under 900-mV (heterogeneous and identical) and 1.2-V (identical) supply voltage.
TABLE II: PSNR FOR FOREMAN VIDEO IN VARIOUS LOW-VOLTAGE OPERATIONS (UNDER THE ISO-AREA CONDITION OF 1.3x)
Fig. 11. Video quality (PSNR) versus output bit-rate graphs for Foreman (a) and Soccer (b) videos.

Fig. 12. PSNR comparison for 4CIF video sequences under the 1.3x iso-area condition at 900-mV supply voltage.
architecture, and is applicable to the various embedded memories, where data stored in the memory have large differences
in importance.
V. CONCLUSION
Fig. 14. PSNR comparisons for four sample videos when the 32-nm PTM model is used with two different supply voltages (750 mV and 800 mV) under the 1.3x iso-area condition.
Both inter-die threshold voltage variation (for both NMOS and PMOS) and intra-die threshold voltage variation (for NMOS and PMOS) are considered [25] for the simulations. At the lower supply
voltage, the proposed approach shows an average PSNR improvement of 5.65 dB over the identical sizing.
B. Layout Example and Layout Issues
A layout example of the heterogeneous SRAM for an 8-bit pixel is presented in Fig. 15. Eight heterogeneously sized bit-cells are placed on one word-line, and every bit-cell has the same height. The width of each bit-cell varies with the bit position, which is decided by the proposed Algorithm 2. As mentioned, only simple modifications are needed in the SRAM bit-cells and peripheral circuitry to adopt the proposed heterogeneous SRAM architecture.
supporting low-voltage operation such as 8T SRAM [4] and
priority-based 8T/6T hybrid SRAM [7] require complex circuitries since two separate word-lines are used for the read and
write operations. Therefore, combining the single-ended 8T
SRAM structure with the double-ended 6T SRAM becomes
a rather cumbersome design challenge. Moreover, in the 8T
SRAM architectures, only a small number of bit-cells are used
for a common bit-line due to the single-ended structure [4],
thus significantly degrading the area efficiency. Our proposed heterogeneous SRAM approach provides an easy-to-design
REFERENCES
[1] C. P. Lin et al., "A 5 mW MPEG4 SP encoder with 2D bandwidth-sharing motion estimation for mobile applications," in Proc. ISSCC Dig. Tech. Papers, Feb. 2006, pp. 1626-1635.
[2] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, "Low-power digital CMOS design," IEEE J. Solid-State Circuits, vol. 27, no. 4, pp. 473-484, Apr. 1992.
[3] I. J. Chang et al., "Fast and accurate estimation of SRAM read and hold failure probability using critical point sampling," IET Circuits, Devices, Syst., vol. 4, no. 6, pp. 469-478, Nov. 2010.
[4] L. Chang et al., "A 5.3 GHz 8T-SRAM with operation down to 0.41 V in 65 nm CMOS," in Symp. VLSI Circuits Dig., Jun. 2007, pp. 252-253.
[5] I. J. Chang et al., "A 32 kb 10T sub-threshold SRAM array with bit-interleaving and differential read scheme in 90 nm CMOS," IEEE J. Solid-State Circuits, vol. 44, no. 2, pp. 650-658, Feb. 2009.
[6] A.-T. Do et al., "An 8T differential SRAM with improved noise margin for bit-interleaving in 65 nm CMOS," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 58, no. 6, pp. 1252-1263, Jun. 2011.
[7] I. Chang, D. Mohapatra, and K. Roy, "A priority-based 6T/8T hybrid SRAM architecture for aggressive voltage scaling in video applications," IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 2, pp. 101-112, Feb. 2011.
[8] K. Osada et al., "16.7 fA/cell tunnel-leakage-suppressed 16 Mb SRAM for handling cosmic-ray-induced multi-errors," in ISSCC Dig. Tech. Papers, Feb. 2003, pp. 302-303.
[9] A. K. Agarwal and S. Nassif, "The impact of random device variation on SRAM cell stability in sub-90-nm CMOS technologies," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 16, no. 1, pp. 86-97, Jan. 2008.
[10] A. Bhavnagarwala, X. Tang, and J. D. Meindl, "The impact of intrinsic device fluctuations on CMOS SRAM cell stability," IEEE J. Solid-State Circuits, vol. 36, no. 4, pp. 658-665, Apr. 2001.
[11] Y. Taur and T. H. Ning, Fundamentals of Modern VLSI Devices. Cambridge, U.K.: Cambridge Univ. Press, 2002.
Jinmo Kwon (S11) received the B.S. and M.S. degrees in electrical engineering from Korea University, Seoul, Korea, in 2009 and 2011, respectively.
He joined the system IC Division of LG Electronics Corporation, Seoul, as a Research Engineer
in 2011. His research interest includes low power
circuit and systems for digital signal processing and
video signal processing.
Lourts Deepak A., Bangalore, India.
Likhitha Dhulipalla, Bangalore, India.

Abstract— Static Random Access Memory (SRAM), unlike dynamic RAM (DRAM), doesn't require any refresh current. Material and process technology act as the driving force for low power application. In this paper, we've addressed leakage.
II. BACKGROUND THEORY

A. 6T SRAM Cell

A static RAM cell is capable of holding a data bit.

I. INTRODUCTION
The range of applications has increased vastly, from consumer goods to industrial uses. A static random access memory (SRAM) can preserve its data as long as power is applied. Moreover, an SRAM cell doesn't need to be refreshed and is faster than dynamic RAM, which suits it to high speed devices and very large scale circuits. The steady miniaturization of the transistor with each new generation of bulk CMOS technology has yielded continual improvement. FinFET-based design offers better control over short channel effects, low leakage, and better yield [2] at 32 nm, which helps to overcome the obstacles in scaling. When the word line is low, it isolates the storage cell through the access transistors.
978-1-4673-0074-2/11/$26.00 @2011IEEE
B. Decoder

The input and output codes are different; decoding maps each input code to a distinct output line. The simplest decoder is an AND gate: the AND gate output will be high when all the inputs are high, and this output is also known as an active high output. A little more complex decoder is the n-to-2^n decoder.
The access transistors and the word and bit lines, WL and BL, are used to write to and read from the cell. In standby mode the access transistors are turned off by driving the word line low. The inverters remain complementary in this state: the PMOS of the left inverter is turned on, its output potential is high, and the PMOS of the second inverter is switched off. The gates of the transistors that connect the bit lines to the inverter nodes are driven by the word line, so if the word line is kept low the cell is disconnected from the bit lines.
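The hold behavior described above can be modeled at the logic level as a latch that is isolated when WL is low and overwritten by a driven bit line when WL is high (a behavioral sketch, not a transistor-level model):

```python
class SramCell6T:
    """Logic-level 6T cell: cross-coupled inverters + WL-gated access."""
    def __init__(self, q=0):
        self.q = q                 # storage node; q_bar is always "not q"

    def access(self, wl, bl=None):
        """With WL low the cell is isolated; with WL high a driven BL
        overwrites the state (write) or the state is observed (read)."""
        if not wl:
            return None            # disconnected from the bit lines
        if bl is not None:         # write: bit line overpowers the latch
            self.q = bl
        return self.q              # read: state appears on the bit line

cell = SramCell6T()
cell.access(wl=1, bl=1)            # write '1'
assert cell.access(wl=0) is None   # standby: isolated, state retained
assert cell.access(wl=1) == 1      # read back '1'
```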
A. 2:4 Decoder

The operation is clarified by the relationship between its inputs S0, S1 and its outputs Y0, Y1, Y2, Y3, mentioned in Table I below:
TABLE I. 2:4 DECODER TRUTH TABLE

S1 S0 | Y0 Y1 Y2 Y3
0  0  | 1  0  0  0
0  1  | 0  1  0  0
1  0  | 0  0  1  0
1  1  | 0  0  0  1
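The truth table corresponds to one AND gate per output, fed by the select lines and their complements; a behavioral sketch:

```python
def decoder_2to4(s1, s0):
    """Active-high 2:4 decoder built from AND gates on S1, S0 and
    their complements; exactly one output line goes high."""
    n1, n0 = 1 - s1, 1 - s0
    y0 = n1 & n0      # selected when S1 S0 = 00
    y1 = n1 & s0      # 01
    y2 = s1 & n0      # 10
    y3 = s1 & s0      # 11
    return [y0, y1, y2, y3]

for s1 in (0, 1):
    for s0 in (0, 1):
        print(s1, s0, decoder_2to4(s1, s0))
```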
As the decoder selects only one row of the array, the other cells may generate glitches; these can be nullified by buffers. Also, a 4-bit OR gate can be used to combine the outputs of the single SRAM cells of each column into a single output data bit. Figure 5 shows the 4x4 SRAM cell array design.
Figure 4: 2:4 decoder simulation result
The cells form a 4x4 SRAM cell array. To address the rows of cells, the read enable (RE) signal activates the read buffer; the ready SRAM cell data traverses towards the read buffer, and thus the data bit is read from memory. The write enable (WE) signal is connected to each row of the array. The input and output data
VI. CONCLUSION

The FinFET based cells are integrated and a 4x4 SRAM cell array has been developed.
For the given word line WL inputs 0010 and 0100 and the given bit line BL inputs 1101 and 0111, the outputs of the SRAM array after being passed through the OR gate are 1101 and 0111, thus following the BL (i.e., the input) and performing the read operation of the cell. The 6T SRAM cell was simulated first; then the 2:4 decoder has also been designed.

REFERENCES

[1] S. M. Kang and Y. Leblebici, CMOS Digital Integrated Circuits, TMH Publishing Company Limited, 2007.
[2] Feng Wang, Yuan Xie, Kerry Bernstein, and Yan Luo, "Dependability analysis of nano-scale FinFET circuits," in Proc. Emerging VLSI Technologies and Architectures, IEEE, 2006.
Sushil Bhushan, Assistant Professor, M-Tech VLSI
Sanjay Sharma, Department of Electronics & Communication, Thapar University, Patiala, Punjab, INDIA.
[email protected]
Abstract— This paper is based on the observation of a CMOS five-transistor SRAM cell (5T SRAM cell) for very high density and low power applications. This cell retains its data with leakage. The 5T SRAM cell uses one word-line and one bit-line, and an extra read line for leakage reduction control. The new cell size is 21.66% smaller than a six-transistor SRAM cell. Simulation results show the proposed cell has correct operation during read/write, and the delay of the new cell is 70.15% smaller than that of a six-transistor SRAM cell. The new 5T SRAM cell contains 72.10% less leakage current with respect to the 6T SRAM memory cell, using Cadence 45 nm technology.

I. INTRODUCTION

On-chip memory cells usually consume a significant fraction (30-60%) of the total power of many VLSI chips, especially in standby mode, and are therefore among the most attractive targets for power reduction. This paper presents an SRAM cell with five transistors to reduce the cell area size. In ordinary programs, the majority of the write and read bits are '0'. Whereas in the conventional SRAM cell, because one of the two bit-lines must be discharged to low regardless of the written value, the power consumption in writing '0' and '1' is generally the same [1]. Also, in the conventional SRAM cell a differential read bit-line is used during the read operation and, consequently, one of the two bit-lines must be discharged regardless of the stored data value [3]. Therefore there are always transitions on the bit lines in both writing '0' and reading '0', and since in cell accesses an overwhelming majority of the write and read bits are '0', these cause high dynamic power consumption during read/write operations.
II.

The SRAM cell current and read static noise margin (SNM) are two important parameters of an SRAM cell. The read SNM of a cell shows the stability of the cell during the read operation, and the SRAM cell current determines the delay of read operations [4]. Both read SNM and SRAM cell current values are highly dependent on the driving capability of the access NMOS transistor: read SNM decreases with increases in driving capability, while SRAM cell current increases [4]. That is, the dependence of the two is an inverse correlation [4]. Thus, in the conventional SRAM cell, the read SNM of the cell and the cell current cannot be adjusted separately.
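The read SNM is commonly estimated as the side of the largest square that fits inside a lobe of the butterfly plot formed by the two inverter transfer curves. The sketch below uses an idealized logistic VTC; the gain and threshold values are illustrative assumptions, not extracted device data.

```python
import math

def vtc(vin, vdd=1.0, vth=0.5, gain=8.0):
    """Idealized inverter voltage transfer curve (logistic approximation)."""
    return vdd / (1.0 + math.exp(gain * (vin - vth) / vdd))

def read_snm(gain=8.0, steps=200):
    """Approximate side of the largest square in the upper butterfly
    lobe: lower-left corner on the mirrored VTC, upper-right under
    the VTC (corner conditions only -- a coarse estimate)."""
    best = 0.0
    for i in range(steps):
        y1 = i / steps
        x1 = vtc(y1, gain=gain)          # corner on the mirrored curve
        for j in range(steps):
            s = j / steps
            if y1 + s <= vtc(x1 + s, gain=gain):
                best = max(best, s)
    return best

print(round(read_snm(), 3))
```

A sharper inverter (higher gain) widens the lobes and raises the estimated SNM, matching the qualitative trade-off described above.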
Figure 1. Figure 2. Figure 3.

This cell is based on the observation that the majority of bits stored in the memory cell are zeroes for both the data and instruction streams. The new cell makes it possible to achieve both low VDD and high-speed operation with no area overhead.
III.

During the idle mode of the cell (when read and write operations don't occur)

Figure 4.
IV.

The read-line is maintained at GND. When write

A. Bit-line driving

B. Cell flipping

1) Data is zero: in this state, the ST node is pulled down to GND by the NMOS.
2) Data is one: in this state, the ST node is pulled up to VDD − VTN by the NMOS, by M2 and M3.
3) Idle mode: at the end of the write operation, the cell will go to idle mode.

Figure 6.

a) Bit-line discharging:
b) Word-line activation:
c)
g) Sensing:
h) Idle mode:

V. CELL AREA
These numbers do not take into account different assumptions about memory cell sizes.

TABLE I. LEAKAGE AT THE ST AND STB NODES FOR WRITE '0' AND WRITE '1' (5T vs. 6T; legible values include 551 nA, 920 nA, 94.11 nA, and 3.60 nA, but the row/column assignment is not recoverable)

Figure 8.
VI. LEAKAGE CURRENT

stored), and in the other state the 5T SRAM cell must retain its data. In idle mode when '1' is stored in the cell, there is positive feedback and M2 in the SRAM cell.

VII. CELL DELAY
TABLE II. CELL DELAY

S.No. | Parameter    | Delay of 5T | Delay of 6T | Better Performance
1     | Delay at STB | 14.24 ps    | 47.72 ps    | 5T
2     | Delay at ST  | 2.453 ns    | 0.839 ns    | 6T
This table shows that the 5T cell delay at the STB node is less than the 6T cell delay at the STB node. Since the output is taken from the STB node, the 5T cell is faster than the 6T cell.

Figure 9. Path from supply voltage to ground when '1' is stored in the cell.
VIII. POWER CONSUMPTION

The power consumption of the SRAM memory cell is considered, along with the power in idle mode of the 5T SRAM cell.
TABLE III. POWER CONSUMED AT THE ST AND STB NODES FOR WRITING '0' AND WRITING '1' (5T vs. 6T; legible values include 9.8 nW, 6.3 nW, 85.01 nW, 0.011 pW, 0.003 pW, 28 pW, and 30 pW, but the row/column assignment is not recoverable)

ACKNOWLEDGEMENTS

IX. CONCLUSION

REFERENCES

[1]
[2]
[3]
[4]
[5] L. Chang et al., "Stable SRAM cell design for the 32 nm node and beyond," in Symp. VLSI Technology Dig., Jun. 2005, pp. 128-129.
[6]
[7]
[8]
[9]
[10]

JOURNALS OF COMPUTERS, VOL. 4, NO. 7, JULY 2009, © 2009 ACADEMY PUBLISHER.
Shishir Rastogi, Assistant Professor, M-Tech VLSI, Gwalior, M.P.
Sanjay Sharma, Department of Electronics & Communication, Thapar University, Patiala, Punjab, INDIA.
[email protected]
Abstract— This paper is based on the observation of various CMOS seven-transistor SRAM cells (7T SRAM cells) for low power applications. These cells retain their data with leakage current and positive feedback, without a refresh cycle. The various 7T SRAM cells use one word-line and one bit-line. The various new 7T SRAM cells contain 72.10% less leakage current than the 6T SRAM cell.

I. INTRODUCTION

On-chip memory cells usually consume a significant fraction (30-60%) of the total power of many VLSI chips, especially in standby mode, and memory cells are the most attractive targets for power reduction with each generation. In ordinary programs, the overwhelming majority of the write and read bits are '0'. This paper also revisits the cell with five transistors, which reduces the cell area size with performance and power consumption improvement.
II.

The SRAM cell current and read static noise margin (SNM) are two important parameters of an SRAM cell. The read SNM of a cell shows the stability of the cell during the read operation, and the SRAM cell current determines the delay time of the SRAM cell [4]. In the conventional cell, read SNM and SRAM cell current values are highly dependent on the driving capability of the access NMOS transistor: read SNM decreases while SRAM cell current increases [4]. Fig. 3 shows the new proposed 7T2 SRAM, and Fig. 4 shows V_idle. This cell is based on a loop-cutting strategy.

Figure 2. Figure 3.
Figure 4. Figure 7.

The above condition is satisfied. The Cadence Virtuoso parameters are obtained from the latest 45-nm technology node [6].
IV.

The read-line is maintained at GND. When write

Figure 5. Figure 6.
V. CELL AREA

The area is obtained by cell sharing with neighboring cells. Therefore the cell size is 21.66% smaller than the 6T.
Jun16,2011
Transi!ntR!lpOnl!
-----
-
",11
.1
---
3 '1"
VI.
LEAKAGE CURRENT
time (n
Figure 9.
mode when ' l' stored in cell, there is positive feedback and M2,
367
VII. CELL DELAY

TABLE II. DELAY

S.No. | SRAM cell | Delay at STB | Delay at ST
1     | 6T        | 47.72 ps     | 50.54 ps
2     | 7T1       | 39.18 ps     | 39.18 ps
3     | 7T2       | 34.53 ps     | 28.96 ps
4     | 7T3       | 31.48 ps     | 31.48 ps
This table shows that 7T3 cell delay in STB node is less
than 6T cell delay in STB node. It means 7T3 is better than 6T
because the output is taken from the STB node and cell delay
for the 7T3 in STB node is less than the 6T in STB node.
VIII. POWER CONSUMPTION

The power consumption of the SRAM memory cell is considered, along with the current in idle mode of the 7T SRAM cell. These numbers do not take into account different assumptions about memory cell sizes.
TABLE I. LEAKAGE CURRENT

S.No. | Transistor | Leakage   | Better
1     | 6T         | 0.9067 µA | 7T3
2     | 7T1        | 0.7092 µA | 7T3
3     | 7T2        | 0.3155 µA | 7T3
4     | 7T3        | 0.1189 µA | 7T3
Power consumption by cell and stored value at the STB and ST nodes:

S.No. | Cell | STB '1'  | STB '0'  | ST '1'   | ST '0'
1     | 6T   | 102.3 nW | 101 nW   | 101.3 nW | 101.5 nW
2     | 7T1  | 58.8 nW  | 59.4 nW  | 52.56 nW | 56.3 nW
3     | 7T2  | 71.2 nW  | 74.3 nW  | 78.56 nW | 81.23 nW
4     | 7T3  | 90.4 nW  | 95.6 nW  | 98.91 nW | 93.93 nW
IX. CONCLUSION

The key observations behind our design are that the cell write operations are approximately 20.34% faster than those of the 6T SRAM cell, which shows that 7T3 is better than 6T for the write operation.
REFERENCES
[1]
[2]
[3]
[4]
[5] L. Chang et al., "Stable SRAM cell design for the 32 nm node and beyond," in Symp. VLSI Technology Dig., Jun. 2005, pp. 128-129.
[6]
[7]
[8]
[9]
Jayram Shrivas
Shyam Akashe
Associate Professor
Electronics & Instrument Engineering Department
Institute of Technology and Management
Gwalior (M.P), INDIA
Email: [email protected]
I.
INTRODUCTION
II.

III.

IV.

The gate leakage current follows a tunneling model of the form

I_gate = A · (V_ox / T_ox)² · e^(−B · T_ox / V_ox)    (1)

where V_ox is the voltage across the gate oxide, T_ox is the oxide thickness, and A and B are fitting constants.
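A gate-leakage model of this form can be evaluated numerically; the constants a, b and the bias points below are illustrative assumptions, not the paper's fitted values.

```python
import math

def gate_leakage(v_ox, t_ox_nm, a=1e-6, b=2.0):
    """Tunneling-style gate leakage: grows with V_ox, drops sharply
    with thicker oxide. a and b are assumed fitting constants."""
    return a * (v_ox / t_ox_nm) ** 2 * math.exp(-b * t_ox_nm / v_ox)

thin = gate_leakage(0.7, 1.2)    # thinner oxide leaks more
thick = gate_leakage(0.7, 1.8)   # thicker oxide leaks less
print(thin, thick)
```

The exponential term dominates: a modest increase in T_ox cuts the leakage by orders of magnitude, which is why high-K dielectrics (allowing a thicker physical oxide) reduce gate leakage so effectively.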
V.

VI.

Process Technology: 45 nm
Supply Voltage: 0.7 V
Leakage Current: 680.53 fA
Power Consumption: 95.28 pW

(2)
IX.
A major reason for the decreasing effectiveness of the SRAM bit-cell sleep technique is the inherent goodness of the high-K gate dielectric used in modern CMOS processes, along with processor design constraints like ITD and the VDCMIN_RET distribution. The decreasing effectiveness of SRAM bit-cell sleep is evident from the fact that only 10-50 mW of leakage power savings is attained from the SRAM bit-cell sleep scheme. For example, leakage power savings from the SRAM bit-cell sleep scheme can be increased by disabling bit-cell sleep at lower VDC in processors supporting the DVS feature; disabling SRAM bit-cell sleep at lower VDC levels enables the sleep design to be optimized for higher VDC. Another factor contributing to bit-cell sleep ineffectiveness is a wide spread in data-retention voltage. Since the spread in VDCMIN_RET is a function of array size, sleep transistor resistance selection can be better optimized given a narrower VDCMIN_RET distribution.
CONCLUSION
ACKNOWLEDGMENT
This work was supported by ITM University Gwalior, in collaboration with Cadence Design Systems, Bangalore.
REFERENCES
[1] Y. Hsiang Tseng, Y. Zhang, L. Okamura, and T. Yoshihara, "A new 7-transistor SRAM cell design with high stability," in Proc. IEEE Int. Conf. on Electronic Devices, Systems and Applications, 2010.
[2] V. Venkatachalam, "Power reduction techniques in microprocessor systems," ACM Computing Surveys, vol. 37, no. 3, pp. 195-237, Sep. 2005.
[3] M. Powell, "Gated-Vdd: A circuit technique to reduce leakage in deep-submicron cache memories," in Proc. ISLPED, 2000, pp. 90-95.
[4] N. Kim, "Circuit and microarchitectural techniques for reducing cache leakage power," IEEE Trans. VLSI, vol. 12, no. 2, pp. 167-182, Feb. 2004.
[5] K. Zhang, "SRAM design on 65 nm CMOS technology with dynamic sleep transistor for leakage reduction," IEEE JSSC, vol. 40, pp. 895-900, Apr. 2005.
[6] F. Hamzaoglu and K. Zhang, "A 3.8 GHz 153 Mb SRAM design in 45 nm high-K metal gate CMOS technology," IEEE JSSC, vol. 44, pp. 148-154, Jan. 2009.
[7] K. Kuhn, "Intel's 45 nm CMOS process technology," Intel Technology Journal, vol. 12, issue 2, June 2008.
With the rapid growth in the market for mobile information terminals such as
smart phones and tablets, the performance of image processing engines (e.g.,
operation speed, accuracy in digital images) has improved remarkably. In these
processors, 2-port SRAM (2P-SRAM) macros [1], in which a read port and a
write port are operated synchronously in a single clock cycle, are widely used.
As shown in Fig. 13.4.1, since the 2P-SRAM is placed in front of large scale logic
circuitry for image processing, a faster access time (e.g., <1ns) is required. In
general, the read-out operation in 2P-SRAM utilizes full-swing of the single read
bitline (BL), so a drastic improvement of the access time is not expected. On the
other hand, the dual-port SRAM (DP-SRAM) makes use of the voltage difference
between BL pair in the read-out operation, which is suitable for the high-speed
operation. In this study, we present a time-sharing scheme using a DP-SRAM
cell to achieve high-speed access in 2P-SRAM macros in such image processors.
There are several conventional methods to realize 2P-SRAM operation within a
single clock cycle. A single-port SRAM (SP-SRAM) can be double-clocked with
consecutive read and write operations, but such 2x operating frequency is generally hard to achieve. Alternatively, the read and write ports of a DP-SRAM can
be operated in parallel; however, this causes a read-disturb issue [2]
(Fig. 13.4.2), in which the cell current (Iread) is degraded when the read and write
ports access the same row simultaneously. To achieve <1ns access time, such
Iread degradation is not acceptable. Our method, using a DP-SRAM cell, performs consecutive read and write operations in a single clock cycle with a small
delay between the WL pulses. The two operations effectively share time in the
overall clock cycle as the rising edge of WLW is delayed until the sense amplifier (SA) has completed the read operation. Our method realizes (i) high-frequency operation (or reduced cycle time) compared with conv. 1 (of Fig. 13.4.1) since
the read and write operations are independently executed by independent
peripheral circuits for each port, and (ii) high-speed access time due to prevention of read-disturb issue in conv. 2.
Figure 13.4.2 illustrates a situation in which data is read out from the left memory cell while data is written to the right cell. Note that both cells are in the
same row. Focusing on the left cell, since the read wordline (WLRn) is activated, the internal node MT discharges BLR (Iread). On the other hand, the write
wordline (WLWn) is also activated to write data to the right cell, so WLWn
acts as a dummy read operation for the left cell. The MT node in the left cell
(storing 0) is pulled up via the precharged BLW, weakening MT's ability to
discharge BLR. Thus, the cell current (Iread) decreases, which is referred to
as the read-disturb issue. Figure 13.4.2 shows the simulated Iread with and without the read-disturb issue. To take the 6-sigma variation into account, we introduce
the worst combination of the local Vth variation referring to the worst-vector
method in [3]. Due to the read-disturb issue, Iread is decreased by 48%, which
means it takes roughly twice as long to achieve the same BL swing as without the read disturb. Therefore, by circumventing the read-disturb issue, we can
accelerate the BL swing by about 50% compared with a conventional DP-SRAM.
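The timing argument above can be sketched numerically: with t = C_BL * dV / Iread, a 48% current loss stretches the swing time by about 1/(1 - 0.48), roughly 1.92x. The capacitance, swing, and current values below are illustrative assumptions, not values from the paper.

```python
# Back-of-envelope sketch of the read-disturb penalty on BL swing time.
C_BL = 50e-15       # assumed bitline capacitance [F]
dV = 0.1            # assumed BL swing needed by the sense amplifier [V]
I_nominal = 20e-6   # assumed undisturbed cell current [A]
I_disturbed = I_nominal * (1 - 0.48)  # 48% Iread degradation from the text

t_nominal = C_BL * dV / I_nominal      # swing time without disturb
t_disturbed = C_BL * dV / I_disturbed  # swing time under read disturb
print(t_disturbed / t_nominal)         # ~1.92, i.e. about twice as long
```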
Another merit of our implementation is shown in Fig. 13.4.3, in which the left cell
is written and read simultaneously. The unselected right cell in the same row is
half-selected due to activation of both WLRn and WLWn and thus sees a strong
dummy read operation. In this case, MT is raised significantly above ground by
BLR and BLW. If MT becomes higher than the threshold voltage of the inverter
in the cell, the storage nodes are flipped, causing data destruction. Figure 13.4.3
236
compares the waveforms of the WLs and the storage nodes for the proposed and
conventional circuits. In the conventional method, the longer the simultaneous-WL-activation period (indicated by the dashed arrow) becomes, the more easily
the storage nodes are flipped, indicating low cell stability. This period for the
time-sharing method is shorter than that of the conventional parallel read/write
DP-SRAM because the write WL is activated only when the read BL swing is
complete. Thus, this implementation realizes high cell stability. In this
study, we applied the conventional 8T DP-SRAM cell layout to our circuit; however, the proposed circuit enables a smaller cell size owing to this advantage.
Figure 13.4.4 shows the circuitry and corresponding waveforms of our implementation. When the TDEC signal generated by CLK is activated, one read wordline, WLRn, is selected to begin the read operation. Note that in the conventional design, this TDEC also activates the WLWn, which results in the read disturb
issue mentioned earlier. To circumvent this, we introduce a new BACK signal,
which is activated by TDEC through a delay element. This delay is designed so that
the SA can detect the worst-case BL swing, including the local Vth variation. The BACK
signal activates the SA and the read-out data is transferred as the output Q. At
the same time, the BACK signal also selects the WLWn so that the write operation is executed after the read-out is completed. In this way, the time-sharing
scheme is achieved and the read-disturb issue is prevented. Note that WTE,
which activates the write driver (WD), is synchronized with TDEC; however, the
data to be written is transferred to the WD in advance, so the peak current due
to concurrent activation of SA and WD is effectively avoided. Simulated waveforms show that WLW1 activation is delayed until SAE is enabled, so that the
read BLs (BLR0 and /BLR0) are discharged without disturbance from WLW1. In
addition, WTE is enabled prior to SAE activation, which helps to reduce peak current as previously mentioned. Simulation results show an operating frequency of 1 GHz (1 ns cycle time) and an access time of 360 ps.
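A back-of-envelope sketch of the timing budget described above; all numbers are illustrative assumptions except the 1 GHz cycle quoted in the text:

```python
# Time-sharing budget: WLWn rises only with BACK, i.e. after the worst-case
# BL swing, so the write uses whatever remains of the cycle.
cycle_ps = 1000           # 1 GHz operation -> 1000 ps cycle (from the text)
t_worst_bl_swing = 250    # assumed worst-case BL swing incl. local Vth variation

t_back = t_worst_bl_swing  # BACK fires only once the swing is sufficient
t_wlw = t_back             # WLWn rises with BACK -> no overlap with sensing
write_window = cycle_ps - t_wlw
print(write_window)        # ps left in the cycle for the write operation
```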
The table in Fig. 13.4.5 summarizes the features of the circuit as designed in a
test-chip in a 28nm high-k metal-gate process. The graph represents a Shmoo
plot at 25 C, showing the relationship between minimum operating voltage
(Vmin) and read access time. The lowest Vmin is 0.56V, while the access time at
1.0V is 500ps. This access time achieves a 5x speed-up in comparison with our
previous data (2.5 ns at 1.0 V using the same DP-SRAM in a 28 nm process [4]).
Figure 13.4.6 compares the data for this scheme with that for the conventional
method (conv. 2 in Fig. 13.4.1.) The simulation result (solid line) is in good
accordance with the measurement data for a wide range of supply voltage VDD.
A 100mV reduction in Vmin is observed due to improvement of cell stability while
the access time at worst-case temperature was improved by 13% to 360ps at
1.2V due to elimination of the read-disturb issue.
Acknowledgements:
We would like to thank Y. Ouchi, O. Kuromiya, and M. Tanaka for their helpful
technical support, and Y. Kihara, T. Sato, and T. Takeda for their management.
References:
[1] T. Suzuki, H. Yamauchi, Y. Yamagami, K. Satomi, and H. Akamatsu, "A stable 2-port SRAM cell design against simultaneously read/write-disturbed accesses," IEEE J. Solid-State Circuits, pp. 2109-2119, Sep. 2008.
[2] Y. Ishii, H. Fujiwara, S. Tanaka, Y. Tsukamoto, K. Nii, Y. Kihara, and K. Yanagisawa, "A 28 nm dual-port SRAM macro with screening circuitry against write-read disturb failure issues," IEEE J. Solid-State Circuits, pp. 2535-2544, Nov. 2011.
[3] Y. Tsukamoto, T. Kida, T. Yamaki, Y. Ishii, K. Nii, K. Tanaka, S. Tanaka, and Y. Kihara, "Dynamic stability in minimum operating voltage Vmin for single-port and dual-port SRAMs," CICC Dig. Tech. Papers, pp. 1-4, Sep. 2011.
[4] Y. Ishii, H. Fujiwara, K. Nii, H. Chigasaki, O. Kuromiya, T. Saiki, A. Miyanishi, and Y. Kihara, "A 28-nm dual-port SRAM macro with active bitline equalizing circuitry against write disturb issue," Symp. VLSI Circuits Dig. Tech. Papers, 2010, pp. 99-100.
I. INTRODUCTION
Multilevel on-chip cache hierarchies have typically been built with Static Random Access Memory (SRAM) technology, the fastest electronic memory technology. Nowadays, alternative technologies are being used and explored, since SRAM presents important shortcomings such as low density and high leakage energy, the latter proportional to the number of transistors. These shortcomings have become important design challenges, to the point that future cache hierarchies are unlikely to be implemented with SRAM technology alone, especially in the context of Chip Multi-Processors (CMPs).
New advances in technology enable caches to be built with other technologies, such as embedded Dynamic RAM (eDRAM) or Magnetic RAM (MRAM). Table I summarizes some features of these technologies.
Table I
FEATURES OF SRAM, EDRAM, AND MRAM TECHNOLOGIES

Technology   Speed                                   Density   Leakage
SRAM         fast                                    low       high
eDRAM        slow                                    high      low
MRAM         slow for reads, very slow for writes    high      very low
II. M OTIVATION
First-level data caches concentrate most of their hits (e.g.,
more than 90%) in the Most Recently Used (MRU) block [13].
Therefore, in hybrid SRAM/eDRAM L1 caches, it is enough
to build a single cache way with the fastest SRAM technology
and force this cache way to store the MRU block for performance purposes [8]. However, it is widely known that data
locality in L2 caches is much poorer than in L1 caches. Thus, this implementation might yield unacceptable performance in L2 caches.
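The MRU observation above can be illustrated with a toy LRU-stack model that counts the fraction of hits landing on the MRU block; the short access trace below is an illustrative assumption, not a benchmark trace.

```python
# Toy LRU-stack cache-set model: on each hit, check whether the block was
# in the MRU (top-of-stack) position, as in the loc-0 statistic of Figure 1.
from collections import deque

def mru_hit_fraction(trace, ways=4):
    stack, hits, mru_hits = deque(), 0, 0
    for addr in trace:
        if addr in stack:
            hits += 1
            if stack[0] == addr:      # hit in the MRU position
                mru_hits += 1
            stack.remove(addr)
        elif len(stack) == ways:
            stack.pop()               # evict the LRU block
        stack.appendleft(addr)        # accessed block becomes MRU
    return mru_hits / hits if hits else 0.0

# A trace with strong temporal locality concentrates hits on the MRU way:
print(mru_hit_fraction([1, 1, 1, 2, 2, 1, 1, 3, 3, 3]))  # 6 of 7 hits are MRU
```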
The two extremes of the design space of hybrid caches are defined by caches implemented with a single technology. We will refer to these as pure SRAM and pure eDRAM caches. The pure SRAM cache provides the maximum performance but higher energy consumption and area; the pure eDRAM cache provides the poorest performance but lower leakage energy consumption and area. Between these points, we can vary the ratio between SRAM and eDRAM banks to favor either performance or energy and area savings.
In this design space, the optimal hybrid SRAM/eDRAM cache design must be identified. That is, the optimal design
Figure 1. Cache hit distribution (%) across stack positions loc-0, loc-1, loc-{2-3}, loc-{4-7}, and loc-{8-15} for each benchmark.
Table II
PURE AND HYBRID CACHES WITH THE CORRESPONDING NUMBER OF WAYS, BANKS, AND RATIO

Cache scheme   SRAM ways   eDRAM ways   SRAM banks   eDRAM banks
16S            16          0            8            0
8S-8D          8           8            4            4
4S-12D         4           12           2            6
2S-14D         2           14           1            7
16D            0           16           0            8

Figure 2. Diagram of the hybrid cache access. Dark boxes represent the accessed parts of the cache. The second stage is performed only on a hit in an eDRAM way detected in the first stage.
Table III
MACHINE PARAMETERS

Microprocessor core
  Issue policy:               Out of order
  Branch predictor type:      Hybrid gshare/bimodal: gshare has 14-bit global
                              history plus 16K 2-bit counters, bimodal has 4K
                              2-bit counters, and the choice predictor has 4K
                              2-bit counters
  Branch predictor penalty:   10 cycles
  Fetch, issue, commit width: 4 instructions/cycle
  ROB size (entries):         128
  # Int/FP ALUs:              4/4

Memory hierarchy
  L1 instruction cache:       64B-line, 16KB, 2-way, 2-cycle
  L1 data cache:              64B-line, 16KB, 2-way, 2-cycle
  L2 unified cache:           64B-line, 1MB, 16-way
  L2 access latency:          Tag array: 2-cycle; SRAM data array: 6-cycle;
                              eDRAM data array: 9-cycle
  Memory access latency:      100-cycle

Figure 3. Hit ratio (%) split into hits in SRAM and eDRAM banks for the 8S-8D, 4S-12D, and 2S-14D schemes.
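Using the L2 latencies from Table III, the average L2 hit latency of a hybrid scheme follows from its SRAM/eDRAM hit split; the hit fraction below is an illustrative assumption, not measured data.

```python
# Average L2 hit latency: tag lookup plus the data-array latency of the
# technology that holds the block (cycle counts from Table III).
TAG_CYCLES, SRAM_DATA, EDRAM_DATA = 2, 6, 9

def avg_l2_hit_latency(sram_hit_frac):
    """sram_hit_frac: fraction of L2 hits served by SRAM ways."""
    return (TAG_CYCLES
            + sram_hit_frac * SRAM_DATA
            + (1 - sram_hit_frac) * EDRAM_DATA)

# Illustrative: if 70% of hits land in the SRAM ways of an 8S-8D cache
print(avg_l2_hit_latency(0.70))  # ~8.9 cycles on average
```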
A. Performance Evaluation

Figure 4. Slowdown (%) of the analyzed schemes with respect to the pure SRAM scheme.
2013 International Conference on Computer Communication and Informatics (ICCCI -2013), Jan. 04 06, 2013, Coimbatore, INDIA
I. INTRODUCTION

NATURE, a hybrid CMOS/nanotechnology reconfigurable architecture, was presented earlier. It facilitates run-time reconfigurability. The NATURE technology was based on CMOS logic and nano RAMs, and used the concepts of temporal logic folding and fine-grain dynamic reconfiguration to increase logic density. The main drawback of this design is that it requires fine-grained distribution of nano RAMs throughout the field-programmable gate array (FPGA) architecture. Since the fabrication process of nano RAMs is not yet mature, this prevents immediate exploitation of NATURE [1]. FPGAs (Field-Programmable Gate Arrays) are future-oriented building blocks that allow complete customization of the hardware at an attractive price, even in
low quantities. FPGA components available today have
usable sizes at an acceptable price. This makes them
effective factors for cost savings and time-to-market when
making individual configurations of standard products. A
time-consuming and expensive redesign of a board can
often be avoided through application-specific integration of
Dr.M.Jagadeeswari
Professor and Head - M.E. VLSI Design
Sri Ramakrishna Engineering College
Coimbatore, India
[email protected]
IP cores in the FPGA, an alternative for the future,
especially for very specialized applications with only small
or medium quantities [7]. Another important aspect is long-term
availability. The advantage of FPGAs and their nearly
unlimited availability lies in the fact that, even if the device
migrates to the next generation, the code remains
unchanged. FPGAs contain programmable logic gate
components called logic blocks and a hierarchy of
reconfigurable interconnects that allow the blocks to be
wired together. The logic blocks also include memory
elements, which may be simple flip-flops or an array of
complete blocks of memory. The drawback of FPGAs is
that the area, power consumption, and delay are
high compared to application-specific integrated
circuits (ASICs) [2]. This drawback is primarily due to the
overheads introduced for reconfigurability. In order to
overcome the drawbacks of the current FPGA, a hybrid
CMOS/nanotechnology reconfigurable architecture, called
NATURE [3], was proposed previously to solve two main
problems: logic density and efficiency of run-time
reconfiguration. This NATURE technology is based on
CMOS logic and nano RAMs. These nano RAMs were
advantageous due to the fact that they provide high-speed
and high density, which enables the concept of temporal
logic folding that is similar to the temporal pipelining
concept [4]. However, nano RAM fabrication
techniques are not yet mature, and the use
of fine-grained distributed nano RAMs incurs extra design
complexity and cost; both are drawbacks of the nano RAM approach.
Therefore, in order to avoid the use of nano RAMs, this
paper presents SRAM-based FPGA architecture. This
architecture overcomes the disadvantages of the nano
RAMs by employing CMOS logic and CMOS devices.
Reduced power consumption can be achieved by
employing low-power 10T non-precharge SRAM blocks.
These blocks are used for storage of configuration bits [5],
which save the charge/precharge power on bitlines during
read operation [1]. The proposed 10T SRAM-based FPGA
architecture achieves reduced reconfiguration delay and
power consumption. Simulation results show significant
performance improvements, with reduced delay
achieved at competitive power consumption. The
remainder of this paper is organized as follows. Section II
presents fundamental facts of the 4T, 6T and 10T SRAM
cells and the previous design of NATURE. Section III
A. 4T SRAM

B. 6T SRAM

C. 10T SRAM
The proposed architecture employs the low-power non-precharge 10T SRAM cell in each memory cell. As
shown in Fig. 1, a 10T SRAM cell includes a conventional
6T SRAM cell, a readout inverter, and a transmission gate
for the read port. These 10T SRAM cells enable both read
and write operations. The write operation is the same as in
a conventional 6T SRAM cell. For the read
operation, the 10T SRAM cell employs its non-precharge
scheme [1]. Since the readout inverter is able to fully
charge/discharge the read bitline, the precharge scheme is
not required. Therefore, the voltage on the bitline does not
switch until the readout datum changes; hence the
readout power is saved, and the delay is also improved
compared to conventional 4T and 6T SRAM cells, since
the time for precharge is eliminated. The area overhead of
the cell relative to the conventional 6T and 4T
SRAM cells is quite high. Since the 10T SRAM design
avoids high switching activity on memory read bitlines
and thus saves most of the charge/precharge power, it is a
promising candidate for low-power applications.
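An illustrative model of the power argument above: with the non-precharge scheme the read bitline toggles only when consecutive read data differ, while a precharge scheme charges and discharges the bitline on every read of a '0'. The data pattern below is an assumed example, not measured activity.

```python
# Compare bitline switching activity for an assumed sequence of read data.
reads = [1, 1, 0, 0, 0, 1, 0, 1]

# Non-precharge 10T read: RBL is driven to the stored value, so it toggles
# only when consecutive read data differ.
nonprecharge_toggles = sum(a != b for a, b in zip(reads, reads[1:]))

# Precharge scheme: the BL is precharged high and discharged on every read
# of a '0' (one discharge plus one re-precharge per such read).
precharge_toggles = 2 * reads.count(0)

print(nonprecharge_toggles, precharge_toggles)  # fewer toggles without precharge
```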
NATURE
LB
Interconnects
speed 10T SRAMs [4] that support fast read operation and
large bit-width. However, loading reconfiguration bits
from the local memory blocks every cycle results in a
delay overhead. Hence, we added a shadow SRAM cell to
each reconfigurable element to further improve
performance, allowing us to hide the reconfiguration delay.
As shown in Fig. 4, each LB is associated with a 10T
SRAM block. This SRAM block stores the configuration
copies for the LB, CB, and SB. The memory is designed to
support 32 configurations for a circuit mapped to the
FPGA. Hence, 32 x the number of configuration bits needs to be
stored in the memory block. As shown in Fig. 8, 32 DFFs
are serially connected to implement a shift register,
and the 32 wordlines (each of 1438 bits) are activated by
the shift register row-by-row. The read bitlines (RBL), as
shown in Fig. 8, then provide the configuration data to
the shadow SRAMs. Conventional FPGAs use only one
SRAM cell to control a reconfigurable switch.
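The configuration-memory addressing described above can be sketched behaviorally; the one-hot shift register below is an illustrative abstraction of the 32-DFF chain, not the paper's circuit. The counts (32 configurations, 1438 bits per wordline) are taken from the text.

```python
# Behavioral sketch: 32 serially connected DFFs circulate a single '1',
# activating one of the 32 configuration wordlines per cycle.
N_CONFIGS, BITS_PER_WL = 32, 1438

def wordline_sequence(cycles):
    """Yield the index of the active wordline for each clock cycle."""
    state = [1] + [0] * (N_CONFIGS - 1)   # one-hot token in the DFF chain
    for _ in range(cycles):
        yield state.index(1)
        state = [state[-1]] + state[:-1]  # shift the token row-by-row

print(list(wordline_sequence(5)))  # wordlines selected in order: 0,1,2,3,4
print(N_CONFIGS * BITS_PER_WL)     # total configuration bits stored
```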
EXPERIMENTAL RESULTS
TABLE I
Area, delay and power comparison between 4T, 6T and 10T SRAM cells

SRAM Cell    Delay       Power
4T SRAM      9.049 ns    0.089 W
6T SRAM      8.924 ns    0.081 W
10T SRAM     8.035 ns    0.015 W

TABLE II
Area, delay and power tradeoffs of LB employed with 10T SRAM cell in Memory Block

SRAM Cell    Area    Delay       Power
10T SRAM     239     8.979 ns    0.052 W
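From Table I, the relative delay and power improvements of the 10T cell over the 6T cell can be checked directly (values quoted from the table):

```python
# Relative improvements of the 10T cell over the 6T cell (Table I values).
delay_6t, delay_10t = 8.924, 8.035   # ns
power_6t, power_10t = 0.081, 0.015   # W

delay_gain = 100 * (delay_6t - delay_10t) / delay_6t
power_gain = 100 * (power_6t - power_10t) / power_6t
print(round(delay_gain, 1))  # ~10.0% faster
print(round(power_gain, 1))  # ~81.5% less power
```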
ACKNOWLEDGEMENT
The authors would like to thank the All India Council for
Technical Education (AICTE), India, for financially supporting
this work under grant 8023/BOR/RID/RPS-48/2009-10. The authors would also like to thank the Management
and Principal of Sri Ramakrishna Engineering College,
Coimbatore, for providing excellent computing facilities
and encouragement.
REFERENCES
[1] Ting-Jung Lin, Wei Zhang, and Niraj K. Jha, "SRAM-Based NATURE: A Dynamically Reconfigurable FPGA Based on 10T Low-Power SRAMs," IEEE Trans. VLSI Systems, accepted for publication.
[2] I. Kuon and J. Rose, "Measuring the gap between FPGAs and ASICs," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 26, no. 2, pp. 203-215, Feb. 2007.
[3] W. Zhang, N. K. Jha, and L. Shang, "A hybrid nano/CMOS dynamically reconfigurable system, Part I: Architecture," ACM J. Emerg. Technol. Comput. Syst., vol. 5, no. 4, pp. 16.1-16.30, Nov. 2009.
[4] A. DeHon, "Dynamically programmable gate arrays: A step toward increased computational density," in Proc. Canadian Wkshp. Field-Program. Devices, 1996, pp. 47-54.
[5] H. Noguchi, Y. Iguchi, H. Fujiwara, Y. Morita, K. Nii, H. Kawaguchi, and M. Yoshimoto, "A 10T non-precharge two-port SRAM for 74% power reduction in video processing," in Proc. IEEE Comput. Soc. Annu. Symp. VLSI, 2007, pp. 107-112.
[6] Md. A. Khan, N. Miyamoto, R. Pantonial, K. Kotani, S. Sugawa, and T. Ohmi, "Improving multi-context execution speed on DRFPGAs," in Proc. IEEE Asian Solid-State Circuits Conf., 2006, pp. 275-278.
[7] S. Trimberger, D. Carberry, A. Johnson, and J. Wong, "A time-multiplexed FPGA," in Proc. IEEE Symp. FPGAs for Custom Comput. Mach., 1997, pp. 22-28.
[8] T. Fujii, K.-I. Furuta, M. Motomura, M. Nomura, M. Mizuno, K.-I. Anjo, K. Wakabayashi, Y. Hirota, Y.-E. Nakazawa, H. Ito, and M. Yamashina, "A dynamically reconfigurable logic engine with a multi-context/multi-mode unified-cell architecture," in Proc. IEEE Int. Solid-State Circuits Conf., 1999, pp. 364-365.
[9] Sapna Singh, Neha Arora, Meenakshi Suthar, and Neha Gupta, "Performance Evaluation of Different SRAM Cell Structures at Different Technologies," International Journal of VLSI Design & Communication Systems (VLSICS), vol. 3, 2012.
[10] W. Zhang, L. Shang, and N. K. Jha, "A hybrid nano/CMOS dynamically reconfigurable system, Part II: Design optimization flow," ACM J. Emerg. Technol. Comput. Syst., vol. 5, no. 3, pp. 13.1-13.31, Aug. 2009.
[11] G. Lemieux and D. Lewis, "Circuit design of routing switches," in Proc. Int. Symp. FPGA, 2002, pp. 19-28.
[12] Sanjay Sharma and Shyam Akashe, "High Density Four-Transistor SRAM Cell With Low Power Consumption," Int. J. Comp. Tech. Appl., vol. 2, pp. 1275-1282, 2011.
[13] Spartan-3E, ver. 3.4, Xilinx, 2006 [Online]. Available: http://direct.xilinx.com/bvdos/publications/ds312.pdf
[14] Narender Gujran and Praveen Kaushik, "A Comparative Study of 6T, 8T and 9T SRAM Cell," International Journal of Latest Trends in Engineering and Technology (IJLTET), vol. 1, 2012.
I. INTRODUCTION
P2, and four NMOS transistors, namely N1, N2, N3, and N4. Fig. 2
shows the 6T SRAM cell write waveform, Fig. 3 shows the
conventional 6T SRAM cell for the read operation, and Fig. 4
shows the 6T SRAM cell read waveform.

A conventional 6T Static Random Access Memory
performs write, read, and standby operations. These
operations are performed with the help of the bit-line pair
(BL and BLB) and the word line (WL). BL and BLB are the
bit-line pair of the SRAM cell, and their values are inverted:
when BL is high, BLB will be zero, and vice versa.

During the write operation of the SRAM, two signals are
produced from the inputs BL and BLB. BL is the input of N3
and BLB is the input of the N4 transistor. If BL = 0 then
BLB will be 1, and if BL = 1 then BLB will be zero.
Simultaneously, the word line goes high, which helps write the
data into the SRAM cell through the access transistors.
III. PROPOSED 9T SRAM CELL
IV.

NML = VIL - VOL    (1)
NMH = VOH - VIH    (2)

where VIL is the maximum input voltage level recognized as a logical 0, VIH is the minimum input voltage level recognized as a logical 1, VOL is the maximum logical 0 output voltage, and VOH is the minimum logical 1 output voltage. The required SNM expression can therefore be defined as

SNM = sqrt(NMH^2 + NML^2)    (3)
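A small Python sketch of Eqs. (1)-(3); the inverter voltage levels used below are illustrative assumptions, not values from the paper.

```python
import math

def static_noise_margin(v_ol, v_oh, v_il, v_ih):
    """Noise margins and combined SNM per Eqs. (1)-(3)."""
    nml = v_il - v_ol                     # Eq. (1): low noise margin
    nmh = v_oh - v_ih                     # Eq. (2): high noise margin
    snm = math.sqrt(nmh ** 2 + nml ** 2)  # Eq. (3)
    return nml, nmh, snm

# Illustrative inverter transfer levels (assumed):
nml, nmh, snm = static_noise_margin(v_ol=0.1, v_oh=1.0, v_il=0.4, v_ih=0.7)
print(round(snm, 3))  # 0.424
```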
Table (1). Leakage current and leakage power of 6T and 9T SRAM cells

SRAM Cell   Leakage Current (pA)   Leakage Power (pW)
6T          13.31                  4.82
9T          4.57                   1.81
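The leakage reduction of the 9T cell relative to the 6T cell follows directly from the table values above:

```python
# Leakage reduction of the 9T cell vs. the 6T cell (Table 1 values).
i_6t, i_9t = 13.31, 4.57   # leakage current, pA
p_6t, p_9t = 4.82, 1.81    # leakage power, pW

current_reduction = 100 * (i_6t - i_9t) / i_6t
power_reduction = 100 * (p_6t - p_9t) / p_6t
print(round(current_reduction, 1))  # ~65.7% lower leakage current
print(round(power_reduction, 1))    # ~62.4% lower leakage power
```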
SRAM          RSNM
6T            415 mV
7T            595 mV
% Increment   43.37%
V. SIMULATION RESULTS
Takahashi, A 16-Mb 400-MHz loadless CMOS four-transistor SRAM macro, IEEE J. Solid-State Circuits, vol. 35, pp. 1631-1640, Nov. 2000.
[3]. S.-M. Yoo, J. M. Han, E. Hag, S. S. Yoon, S.-J. Jeong,
B. C. Kim, J.-H. Lee, T.-S. Jang, H.D. Kim, C. J.
Park, D. H. Seo, C. S. Choi, S.-I. Cho, and C. G. Hwang,
A 256 M DRAM with simplified register control for
low power self refresh and rapid burn-in, in Symp.
VLSI Circuits Dig. Tech. Papers, 1994, pp. 85-86.
[4]. B. H. Calhoun and A. P. Chandrakasan, A 256-kb 65-nm sub-threshold SRAM design for ultra-low-voltage operation, IEEE Journal of Solid-State Circuits, vol. 42, no. 3, pp. 680-688, March 2007.
[5]. International Technology Roadmap for Semiconductors
2005.WWW.ITRS.NET/LINKS/2005ITRS/HOME2005.
HTM
[6]. Evelyn Grossar, Michele Stucchi, Karen Maex, Read
Stability and Write-Ability Analysis of SRAM Cells for
Nanometer Technologies, Solid-State Circuits, IEEE
Journal ,vol. 41 , no. 11, Nov.2006 pp.2577-2588.
[7]. Benton H. Calhoun Anantha P. Chandrakasan Static
Noise Margin Variation for Sub-threshold SRAM in 65
nm CMOS, Solid-State Circuits, IEEE Journal vol. 41,
Jan.2006, Issue 7, pp.1673-1679
[8]. Yeonbae Chung, Seung-Ho Song , Implementation of
low-voltage static RAM with enhanced data stability and
circuit speed, Microelectronics Journal vol. 40, Issue 6,
June 2009, pp. 944-951.
[9]. J.Rabaey, A.Chandrakasan, and B. Nikolic, Digital
Integrated Circuits: A Design Perspective, 2nd ed.
Englewood Cliffs, NJ: Prentice- Hall, 2003.
[10]. Andrei Pavlov and Manoj Sachdev. CMOS SRAM
Circuit Design AND Parametric Test in Nano-scaled
Technologies. Springer, 2008
[11]. Z. Liu, and V. Kursun, Characterization of a Novel
Nine-Transistor SRAM Cell, IEEE Transactions on
Very Large Scale Integration (VLSI) Systems vol. 16,
No. 4, pp. 488-492, April 2008
[12]. S. Lin, Y-B. Kim, F. Lombardi, A Low Leakage 9T
SRAM Cell for Ultra-Low Power Operation, ACM
Great Lakes Symposium on VLSI 2008, May 2008, pp.
123-126.
[13]. A. Chandrakasan, W.J. Bowhill, F. Fox, Design of
High- Performance Microprocessor
Circuits, IEEE
Press, 2000.
[14]. O. Thomas, Impact of CMOS Technology Scaling on
SRAM Standby Leakage Reduction techniques,
ICICDT, May 2006.
[15]. V. De et al., Techniques for leakage power reduction,
in Design of High-Performance Microprocessor Circuit,
Circuits, A. Chandrakasan, W. J. Bowhill, and F. Fox,
Eds. Piscataway, NJ: IEEE, 2001, pp. 285-308.
[16]. E. Seevinck, F. J. List, and J. Lohstroh, Static-noise margin analysis of MOS SRAM cells, IEEE Journal of Solid-State Circuits, vol. 22, issue 5, pp. 748-754, Oct. 1987.
Relocatable and Resizable SRAM Synthesis for Via Configurable Structured ASIC
Hsin-Hung Liu, Rung-Bin Lin, I-Lun Tseng
Department of Computer Science and Engineering
Yuan Ze University
Chung-Li, Taiwan
E-mail: [email protected], [email protected]
memory block realization using six-transistor (6T) memory
cells. Nevertheless, we still adapt a logic-oriented VCLB to
realize an SRAM block.
A typical ASIC may contain a varying number of
memory blocks of various sizes. The memory blocks,
mainly SRAMs, can be placed at any legal locations. They
can be used for cache memories, FIFOs, stacks, etc. They
are customized for area, power, and performance
optimization using a highly crafted SRAM cell and
peripheral circuits. On the contrary, a typical structured
ASIC can have only a fixed number of pre-diffused memory
blocks whose locations and sizes cannot be changed as
shown in Figure 1(a), i.e., they are not relocatable and
resizable. Figure 1(b) shows another implementation of
structured ASIC memory block which employs a
customized VCLB different from the one for realizing logic
blocks. If the VCLB instances are not used for memory
block, they may still be configured into some logic blocks.
Otherwise, they will be wasted. The implementations in both
Figure 1(a) and 1(b) face the same problem of having to
determine the number of memory blocks, their sizes, and
their locations beforehand. Hence, a designer may encounter
a problem of insufficient memory blocks, improper memory
block sizes or locations, etc. These limitations severely
discourage structured ASIC adoption. Clearly, we need
to remove these limitations to make structured ASIC a
prevailing design technology.
Abstract
Memory blocks in a structured ASIC are normally pre-customized with fixed sizes and placed at predefined
locations. The number of memory blocks is also predetermined. This imposes a stringent limitation on the use of
memory blocks, often creating a situation of either
insufficient capacity or considerable waste. To remove this
limitation, in this paper we propose a method to create
relocatable and resizable SRAM blocks using the same via-configurable logic block to implement both logic gates and
6T SRAM cells. We develop an SRAM compiler to
synthesize SRAM blocks of this sort. Our single-port SRAM
array uses only 1/3 the area taken by a flip-flop based
SRAM array. For dual-port SRAM arrays, this ratio is 2/3.
We demonstrate for the first time the feasibility of deploying a
varying number of relocatable and resizable SRAM blocks
on a structured ASIC.
Keywords
Structured ASIC, SRAM, via configurable, regular fabric
1. Introduction
Structured ASIC is a design style that can bridge the
performance, power, area, and design cost gaps between
ASIC and FPGA. It contains arrays of via-configurable logic
blocks (VCLB) with prefabricated transistors and possibly
arrays of routing fabric blocks each formed by predefined
yet via-configurable metal wires [1-4]. A via-configurable
structured ASIC technology employs only customizable via
layers [5-9] and thus has a much lower non-recurring
engineering (NRE) cost than standard ASIC. Its regular
layout structure also enables higher and more predictable
manufacturing yields. As technology scales, layout
regularity becomes even more important. It is encouraging
to see that Jhaveri et al. [10-11] are able to achieve
comparable timing and area utilization for an ARM926EJ
implementation exploiting only a small set of layout
primitives for forming regular layouts, with respect to an
implementation based on a commercial 65-nm standard cell
library.
Structured ASIC research mainly focuses on exploring
VCLB architectures. Only a few works address routing
architecture [6,8,9]. VCLBs are usually handcrafted to
enable standard-cell-like designs for leveraging existing
standard ASIC design tools [7-8]. Although a significant
progress in VCLB architecture designs has been made so far,
VCLBs are mainly optimized for logic block realization
rather than memory block implementation. To the best of
our knowledge, we are the first to design a VCLB for
978-1-4673-4953-6/13/$31.00 2013 IEEE
2. Preliminary
2.1. VCLB for standard-cell-like structured ASIC
In this work we will employ a standard-cell-like
structured ASIC technology presented in [7]. Such a
technology is characterized by a VCLB layout similar to
that of conventional standard cells. The work in [7] presents
a VCLB with 5 pairs of P/N transistors laid over three
diffusion strips. We can use vias between Metal 1 (M1 for
short) and Metal 2 (M2 for short) to configure the VCLB
into various logic gates. We can also abut several VCLB
instances to realize a more complex logic gate or a flip-flop.
However, this VCLB, without any modification, cannot be
used directly to implement a 6T memory cell. Besides a
VCLB, a via-configurable structured ASIC has a predefined
yet via-configurable routing fabric for connecting logic
gates together. We need a structured ASIC router [8]
specifically designed to deal with a predefined routing fabric.
In this work, we use the routing fabric in [8] for our SRAM
implementation. This routing fabric contains repetitive wire
segments on M3 through M5.
5. Experimental Results
We employ M-VCLB to create a structured ASIC library
with more than 100 cells (combinational and sequential cells
together) based on TSMC 0.18um technology. We also use
M-VCLB to create 6T memory cells (M-cell) and leaf cells
employed by our SRAM compiler to produce a relocatable
and resizable SRAM block. We use a commercial tool
MemChar [24] to characterize access time and power
dissipation of our SRAM blocks. For the purpose of
comparison, we also use Artisan's (ARM's) memory
compiler to generate a customized yet non-structured ASIC
SRAM block counterpart. Our M-cell has an area of
94.248um2 whereas a typical 0.18um 6T memory cell has an
area of about 5um2 [25]. The area ratio is about 18.85. Table I
shows area and access time of our single-port SRAM blocks
with a multiplexor width equal to four. Area is normalized
to that of Artisan's non-structured ASIC SRAM block
counterparts. Our SRAM blocks have an area of 1.5 to 14.4 times
and an access time of 1.46 to 2.25 times that of their
counterparts. Timing performance of our SRAM blocks is
counterparts. Timing performance of our SRAM blocks is
competitive. The area gap between our SRAM blocks and
Artisan's increases with increasing memory size. However,
given that the ratio of our M-cell's area to a non-structured
ASIC SRAM cell's area is 18.85, the area efficiency of our
SRAM blocks is relatively good, especially for small SRAM
blocks. Clearly, if we can reduce the M-cell's size, the area
efficiency will improve further. When compared to
flip-flop based SRAM blocks, the area of our SRAM array
is only 1/3 the area of a memory array formed by flip-flops
(each flip-flop using three M-VCLBs based on the work in
[7]). Note that Table I also shows the number of nets routed
by our router and the routing runtime. Table II shows the
internal power of single-port SRAM blocks. For a small
SRAM block, power dissipation is dominated by peripheral
circuits whereas, for a large SRAM block, power is
dominated by memory array access. It remains unclear to us
why Artisan's SRAM blocks have such a small standby
power, independent of their size.
Table III shows area and access time of dual-port SRAM
blocks. Note that MemChar [24] fails to characterize our
dual-port SRAM blocks because we use 3 M-VCLBs to
implement two memory bits. Hence, we do not have internal
power data. The access times of our SRAM blocks are
obtained from HSPICE simulation. The array area is 2/3 the
area of a memory array formed by flip-flops. The area usage
and timing performance are as good as those for single-port
SRAM blocks.
Since we derive M-VCLB by shrinking the transistor
sizes of S-VCLB originally optimized for logic gates, we
would like to see the extent of the impact on timing
performance due to transistor downsizing. We perform the
following experiments. We also use S-VCLB to create a
structured ASIC cell library. Along with Artisan's 0.18um
non-structured ASIC standard cell library (STDL) and the
cell library created using M-VCLB, we have three cell
libraries now. We synthesize some ITC99 (b14~b22) and
ISCAS89 benchmark circuits using these libraries. We use a
method presented in [7] to push the delay envelope
(minimizing the longest path delay as much as possible) of a
circuit synthesized by Synopsys Design Compiler using
each individual cell library. The smallest longest path delay
obtained for each circuit is used as the achievable clock
period. Table IV shows the achievable clock period for each
circuit and their corresponding power dissipation and total
cell area. Columns denoted by M-VCLB (S-VCLB) give
data obtained by employing the cell library based on
M-VCLB (S-VCLB). With M-VCLB, chip performance is
degraded by 64% (= (10.8-6.6)/6.6 * 100%); correspondingly,
power dissipation is also reduced. Despite such degradation,
the clock speed achievable by M-VCLB is only slightly
below half the clock speed achieved by
not contain SRAM blocks.
To illustrate how we use our relocatable and resizable
SRAM blocks in a design, we implement an SoC platform
ORPSoC [26] based on OpenRISC 1200, a 32-bit CPU from
OpenCores. ORPSoC has eight memory blocks. The largest
one has 4K bytes whereas the smallest one has 112 bytes.
We have three implementations:
STDL(A): using Artisan's non-structured ASIC standard cell
library and Artisan's non-structured ASIC SRAM
blocks.
S-VCLB(A): using the structured ASIC cell library based on
S-VCLB and Artisan's non-structured ASIC
SRAM blocks.
M-VCLB(O): using our structured ASIC cell library and
relocatable and resizable SRAM blocks based
on M-VCLB.
Table I. Area, access time, and routing of single-port SRAM blocks
(multiplexor width = 4; area normalized to Artisan's counterparts;
access-time values as extracted):

SRAM Block      Area(Artisan's) Area(Ours) Access(Artisan's) Access(Ours) Nets   Runtime(m)
16X4 (8B)       1               1.5        1.15              1.68         300    0
32X8 (32B)      1               2.5        1.17              1.87         800    0
256X16 (512B)   1               8.3        1.20              2.56         10800  2
256X64 (2KB)    1               11.6       1.33              2.82         42700  16
512X64 (4KB)    1               14.4       1.34              3.01         83100  42

Table II. Internal power of single-port SRAM blocks (values as
extracted):

                Artisan's                Ours
SRAM Block      Read  Write  Standby     Read  Write  Standby
16X4 (8B)       59    60     0.04        22    30     5
32X8 (32B)      70    73     0.04        45    40     10
256X16 (512B)   95    103    0.04        144   154    54
256X64 (2KB)    233   274    0.04        585   547    225
512X64 (4KB)    237   282    0.04        780   859    366
Table III. Area, access time, and routing of dual-port SRAM blocks
(rows in the same block order as Table I; area normalized to
Artisan's counterparts):

SRAM Block      Area(Artisan's) Area(Ours) Access(Artisan's) Access(Ours) Nets    Runtime(m)
16X4 (8B)       1               1.6        1.19              1.80         600     0
32X8 (32B)      1               2.3        1.21              1.80         1700    0
256X16 (512B)   1               6.8        1.28              2.40         22300   8
256X64 (2KB)    1               9.1        1.46              2.90         87000   72
512X64 (4KB)    1               11.1       1.51              3.10         169300  188
Table IV. Achievable clock period, power dissipation, and total cell
area for the benchmark circuits synthesized with the STDL, S-VCLB,
and M-VCLB cell libraries. [Per-circuit rows garbled in extraction;
the final (average) row of each column group reads:]

                     STDL    S-VCLB   M-VCLB
Clock period (ns)    4.8     6.6      10.8
Power (mW)           29      65       49
Total cell area      0.56    2.33     2.48
Table: ORPSoC implementation comparison (column headers lost in
extraction; values as extracted):

STDL(A)      15.00   68.74    3.00    1.28
S-VCLB(A)    17.97   247.85   7.86    6.65
M-VCLB(O)    19.74   320.63   18.36   6.70
6. Conclusions
In this paper we propose a method to create a relocatable
and resizable SRAM block for via-configurable structured
ASIC. This structured ASIC technology is enabled by using
the same VCLB to implement logic blocks and SRAM
blocks. We develop such a VCLB by properly sizing a
VCLB originally optimized for logic gate implementation.
We develop an SRAM compiler to generate relocatable and
resizable SRAM blocks using this VCLB. Our SRAM
blocks, especially the smaller ones, achieve acceptable
timing performance, area usage, and power dissipation. We
also demonstrate an application of our relocatable and
resizable SRAM blocks to designing an SoC platform. In the
future we will work out a VCLB optimized for memory cell
implementation that remains viable for logic gate
implementation. We will also continue to search for VCLBs
that will further narrow the area, power, and performance
gaps between structured ASIC and standard ASIC.
7. References
[1] B. Zahiri, "Structured ASICs: opportunities and
challenges," ICCD, pp. 404-409, 2003.
[2] K. C. Wu and Y. W. Tsai, "Structured ASIC, evolution
or revolution?" ISPD, pp. 103-106, 2004.
Abstract
Most modern microprocessors have multi-level on-chip
caches with a multi-megabyte shared last-level cache (LLC).
With a multi-level cache hierarchy, the total size of on-chip
caches becomes larger. The increased cache size causes the
leakage power and area of the on-chip caches to increase.
Recently, to reduce the leakage power and area of the
SRAM-based cache, the SRAM-eDRAM hybrid cache was
proposed. For SRAM-eDRAM hybrid caches, however,
there has not been any study analyzing the effects of the
reduced area on wire delay, cache access time, and
performance. By replacing half (or three-fourths) of the
SRAM cells with small eDRAM cells in SRAM-eDRAM
hybrid caches, wire length is shortened, which eventually
results in the reduction of wire delay and cache access time.
In this paper, we evaluate SRAM-eDRAM hybrid caches in
terms of energy, area, wire delay, access time, and
performance. We show that the SRAM-eDRAM hybrid
cache reduces the energy consumption, area, wire delay, and
SRAM array access time by up to 53.9%, 49.9%, 50.4%, and
38.7%, respectively, compared to the SRAM-based cache.
Keywords
SRAM-eDRAM hybrid cache, wire delay, access time
1. Introduction
Recent microprocessors have a multi-level cache hierarchy
composed of the first-level (L1), second-level (L2), and
last-level cache (LLC). Since cache access time is crucial for
performance, SRAM cells, which have faster access time
than DRAM cells, are suitable for cache memory. However,
SRAM cells occupy a relatively larger area and consume
much more leakage power than DRAM cells, since SRAM
cells have more transistors. The total size of on-chip caches,
including L1, L2, and LLC, reaches multiple megabytes. An
important point is that cache access time increases with
cache size, since the wire delay grows with the enlarged
area.
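This area-to-wire-delay relation can be sketched with a toy model (an illustrative assumption, not the paper's model): the characteristic h-tree wire length grows roughly with the square root of the cache area, and unbuffered RC wire delay grows with the square of the length, so wire delay scales roughly linearly with area.

```python
import math

def htree_wire_delay(area, r_per_len=1.0, c_per_len=1.0):
    """Toy Elmore-style estimate: characteristic h-tree wire length
    ~ sqrt(area); both wire resistance and capacitance grow with
    length, so the unbuffered RC delay scales ~linearly with area.
    r_per_len and c_per_len are illustrative unit values, not
    technology parameters."""
    length = math.sqrt(area)
    return 0.5 * (r_per_len * length) * (c_per_len * length)

# Halving the area roughly halves the wire delay in this model,
# in line with the ~50% area and ~50% wire-delay reductions reported.
print(htree_wire_delay(0.5) / htree_wire_delay(1.0))  # 0.5
```

With repeated (buffered) wires the delay grows closer to linearly in length; the unbuffered case is used here only to illustrate why the h-tree delay is the most area-sensitive component.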
Cache access time is composed of the following delay
models, based on [7]: i) h-tree wire delay, ii) decoder delay,
iii) wordline delay, iv) bitline delay, and v) sense amplifier
delay. These delay models can be classified into two
categories: i) area-sensitive and ii) process-technology
sensitive. Wire length increases with the cache area, so the
electron transfer time through the wire inevitably increases.
In particular, the h-tree wire delay is the most area-sensitive
delay model. Note that the h-tree wire delay accounts for a
substantial portion of cache access time [7]. On the other
hand, the delay to transfer an electron is largely dependent
on the
978-1-4673-4953-6/13/$31.00 2013 IEEE
2. Related Work
2.1. SRAM-eDRAM Hybrid Cache
3.2. Area

Table: Cache area (values as extracted; percentages relative to
the SRAM-based cache):

Cache        SRAM            1S1D            1S3D
64KB (L1)    0.57 (100.0%)   0.57 (99.3%)    0.55 (95.6%)
8MB (LLC)    13.93 (100.0%)  9.80 (70.4%)    7.88 (56.6%)
16MB (LLC)   28.64 (100.0%)  19.51 (68.1%)   14.49 (50.1%)
For evaluation, we use three different caches depending on
the ratio of the eDRAM array: the SRAM-based cache, the
1S1D hybrid cache, and the 1S3D hybrid cache. We evaluate
these three caches from the perspective of energy
consumption, cache access time, and performance.
Additionally, we compare the SRAM-eDRAM hybrid caches
with NUCA. Table II shows the specification of the two
cache configurations (Config.1 and Config.2) used in our
evaluation.
Table II. Cache configurations used in the evaluation (shared
parameters listed once):

Process technology   32nm
Line size            64B
L1 cache             64KB, 4-way set assoc.
LLC (Config.1)       8MB, 8-way set assoc.
LLC (Config.2)       16MB, 8-way set assoc.
(Figure 8 legend: Group 1 through Group 6.)

Table: Cache access latency (cycles):

Cache         SRAM   1S1D   1S3D
64KB (L1)     3      3      3
8MB (LLC)     21     16     14
16MB (LLC)    25     23     16
6. Acknowledgements
Figure 8: Performance of each group of applications.
The 1S1D/1S3D hybrid caches and NUCA are applied
to L2 cache.
4.4. Performance
While the SRAM array access time is further decreased by
the SRAM-eDRAM hybrid cache as the cache size increases,
as shown in Section 4.3, performance might be degraded by
the access time of the eDRAM array. However, the shortened
SRAM array access time compensates for the increased
average cache access time caused by the eDRAM cells.
When the SRAM-eDRAM hybrid cache is applied to the L1
cache, the performance loss is 2% on average. On the other
hand, when the SRAM-eDRAM hybrid cache is applied to
the LLC, the performance loss from the eDRAM cells is
negligible, since the SRAM array access time considering
the reduced wire delay (as shown in Fig. 7) is about 30%
lower than that computed without considering the reduced
wire delay in the LLC.
Fig. 8 shows the IPC, normalized to the SRAM-based LLC,
for the different groups of applications. For evaluation, we
use the cache configurations described in Table II. The
SRAM-eDRAM hybrid cache and NUCA are applied to the
LLC. In our evaluation, we concentrate on the LLC since the
impact of the SRAM-eDRAM hybrid cache on the L1 cache
was already investigated in [6]. As shown in Fig. 8, the
impact of the SRAM-eDRAM hybrid cache on performance
is not prominent, since L2 cache accesses are much less
frequent than L1 cache accesses. For group 1 and group 6,
the performance improvement is slight (1.0% on average).
Though the performance improvement by the SRAM-eDRAM
hybrid cache is not significant, the energy consumption is
still significantly reduced (as shown in Fig. 6(b)) with
relatively small area. However, performance is expected to
be further improved by the SRAM-eDRAM hybrid cache as
the cache size increases.
5. Conclusion
The SRAM-eDRAM hybrid cache was originally proposed
to reduce the leakage power and area of SRAM cells while
maintaining performance. However, the previous studies did
7. References
[1] C. Kim, D. Burger, and S. W. Keckler, "An adaptive, non-uniform
cache structure for wire-delay dominated on-chip caches,"
Proceedings of the 10th International Conference on Architectural
Support for Programming Languages and Operating Systems (ASPLOS),
vol. 30, pp. 211-222, Dec. 2002.
[2] N. Muralimanohar and R. Balasubramonian, "Interconnect design
considerations for large NUCA caches," Proceedings of the 34th
Annual International Symposium on Computer Architecture (ISCA '07),
vol. 35, pp. 369-380, May 2007.
[3] J. J. Sharkey, D. Ponomarev, and K. Ghose, "M-Sim: a flexible,
multithreaded architectural simulation environment," Technical
Report CS-TR-05-DP01, Department of Computer Science, State
University of New York at Binghamton, Oct. 2005.
[4] S. Thoziyoor, N. Muralimanohar, J. H. Ahn, and N. P. Jouppi,
"CACTI 5.1," Technical Report HPL-2008-20, HP Laboratories, Apr.
2008.
[5] A. Valero, J. Sahuquillo, S. Petit, V. Lorente, R. Canal,
P. Lopez, and J. Duato, "An hybrid eDRAM/SRAM macrocell to
implement first-level data caches," Proceedings of the 42nd Annual
IEEE/ACM International Symposium on Microarchitecture, pp. 213-221,
Dec. 2009.
[6] A. Valero, J. Sahuquillo, V. Lorente, S. Petit, P. Lopez, and
J. Duato, "Impact on performance and energy of the retention time
and processor frequency in L1 macrocell-based data caches," IEEE
Transactions on Very Large Scale Integration (VLSI) Systems,
pp. 1-10, May 2011.
[7] S. J. E. Wilton and N. P. Jouppi, "CACTI: an enhanced cache
access and cycle time model," IEEE Journal of Solid-State Circuits,
vol. 31, no. 5, May 1996.
[8] SimpleScalar toolset. http://www.simplescalar.com/
[9] SPEC CPU2006. http://www.spec.org/cpu2006/
I. INTRODUCTION
In recent microprocessors, the capacity of on-chip memory is
rapidly increasing to improve overall performance. As larger
cache memories are demanded, SRAM plays an increasingly critical
role in modern microprocessor systems [1] and portable devices
like PDAs, cellular phones, portable multimedia devices, etc. To
attain higher speed, SRAM-based cache memories and
system-on-chips (SOCs) are commonly used [2]. Due to device
scaling, SRAM design is facing several challenges in power
consumption, stability, and area. The six-transistor SRAM cell is
conventionally used as the memory cell [3] [4]. Substantial
problems have already been encountered when the conventional
six-transistor (6T) SRAM cell configuration is used. This cell
shows poor stability [3]. It has small hold and read static noise
margins. During the read operation, the stability
RL is at VDD and N4 is OFF because Qb=0. Now BL has no
path to discharge to ground, hence retaining the held charge,
indicating the stored data is 1.
C. Hold operation of 7T SRAM cell
During the hold state, the Q and Qb nodes of the 7T SRAM
cell maintain the stored data as long as power is available
from the power supply. If the stored data is 1, then Q=VDD
and Qb=0V. If the stored data is 0, then Q=0V and Qb=VDD.
III. SIMULATION RESULTS
Figure 2: Proposed 7T SRAM cell
Since the 7T SRAM cell uses only one bit line, the power
required for charging and discharging a second bit line is
eliminated. Using only one bit line reduces the power
required to charge and discharge the bit lines to
approximately half, because only one bit line is charged
during a read operation instead of two. The bit line is charged
during a write operation about half of the time instead of
every time a write operation is required; here we assume
equal probability of writing 0 and 1. The proposed 7T SRAM
cell uses two transistors, N4 and N5, with a read line (RL)
for the read operation.
Table 1: Power comparison between the proposed 7T SRAM cell and
the conventional 6T SRAM cell:

Operation   Conventional 6T    Proposed 7T      Improvement
            SRAM cell (uW)     SRAM cell (uW)   (%)
WRITE 0     9.35               7.29             22.03
WRITE 1     9.35               7.73             17.33
READ 0      8.62               7.11             17.52
READ 1      8.52               6.70             21.36

[Figure: bar chart comparing the power (uW) of the conventional 6T
and proposed 7T SRAM cells for the WRITE 0, WRITE 1, READ 0, and
READ 1 operations.]
the proposed design. The total power consumed includes the
power consumed by the bit lines (BL and BLb), the transistors
of the cell, the word line (WL), and, in the proposed design,
the read line (RL). It is found that the power consumed by the
proposed 7T SRAM cell is less than that of the conventional
6T SRAM cell.
In the write operation of the conventional 6T SRAM cell,
both bit lines are loaded with complementary data and the
charged values are then floated. Depending on the value
loaded, one of the bit lines will be charged. Once the write
operation is finished, it is assumed that the charged value is
discharged. Here we assume equal probabilities of write 0
and write 1 operations, so power dissipation happens twice
during a write operation. In the write operation of the
proposed 7T SRAM cell, the single bit line is charged if the
data to be written is 1 and is not charged if the data to be
written is 0. After the write 1 operation, it is assumed that
the charged value is discharged. Assuming equal probabilities
for write 0 and write 1 operations, the power consumed is
almost half of that of the conventional 6T SRAM cell.
During the read operation of the 6T SRAM cell, both bit
lines are charged to VDD and the charged values are floated.
One of the bit lines discharges depending on the data present
in the cell, while the other bit line is discharged after the
operation is complete, so in a read operation power
dissipation happens four times. The single bit line of the
proposed 7T SRAM cell is charged to VDD during a read
operation. If the stored data is 0, the bit line discharges;
otherwise it is assumed that the bit line discharges after the
read operation. So in a read operation power dissipation
happens two times. Hence the power consumed during the
read operation is half of that of the conventional 6T SRAM
cell.
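The event counting in the paragraphs above can be captured in a small sketch (an illustrative model, not from the paper: each full charge or discharge of a bit line counts as one unit of C*VDD^2 dissipation, and write 0 and write 1 are equally likely):

```python
# Illustrative event-count model: one "event" = one full charge or
# discharge of a bit line (one C*VDD^2 worth of dissipation).
def avg_events(cell, op):
    if cell == "6T":
        # write: a bit line is charged then discharged -> 2 events
        # read: both precharged lines eventually discharge -> 4 events
        return 2 if op == "write" else 4
    if cell == "7T":
        # write: the single bit line is charged only for write '1'
        # (probability 0.5) and later discharged -> 1 event on average
        # read: the one bit line is charged and later discharged -> 2
        return 1 if op == "write" else 2
    raise ValueError(cell)

for op in ("write", "read"):
    print(op, avg_events("7T", op) / avg_events("6T", op))  # 0.5 each
```

Under these assumptions the 7T cell dissipates roughly half the bit-line switching energy of the 6T cell for both operations, matching the halving argument in the text.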
B. Delay calculation
SRAM delays are usually defined as the time taken to switch
the SRAM cell's data nodes from one logic level to the other.
Delay is measured as the time difference between the 10%
and 90% points of the voltage swing.
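This measurement can be sketched on sampled waveform data (a hypothetical helper, not the paper's tooling; it assumes a monotonic swing between the two levels):

```python
def delay_10_90(t, v, v_low, v_high):
    """Delay between the 10% and 90% points of a monotonic voltage
    swing from v_low to v_high, with linear interpolation between
    samples. t and v are equal-length lists of times and voltages."""
    def crossing(th):
        for k in range(1, len(v)):
            if (v[k - 1] - th) * (v[k] - th) <= 0 and v[k] != v[k - 1]:
                # linear interpolation for the crossing time
                return t[k - 1] + (th - v[k - 1]) * (t[k] - t[k - 1]) / (v[k] - v[k - 1])
        raise ValueError("threshold not crossed")
    swing = v_high - v_low
    return abs(crossing(v_low + 0.9 * swing) - crossing(v_low + 0.1 * swing))

# A linear 0 -> 1 V ramp over 1 ns: the 10%-90% delay is 0.8 ns.
print(delay_10_90([0.0, 1.0], [0.0, 1.0], 0.0, 1.0))
```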
Table 2: Delay comparison between the proposed 7T SRAM cell and
the conventional 6T SRAM cell:

Operation   Conventional 6T    Proposed 7T
            SRAM cell (ps)     SRAM cell (ps)
WRITE 0     72.8               85.7
WRITE 1     104.1              139.3
READ 0      25.5               35
READ 1      22.5               0
[Figure: bar chart of SNM (V) for the conventional 6T and proposed
7T SRAM cells, including the RSNM category; extracted bar values
include 0.702, 0.66, 0.66, 0.58, 0.595, 0.55, 0.437, and 0.49.]
[2] Sanjeev K. Jain and Pankaj Agarwal, "A Low Leakage and SNM Free
SRAM Cell Design in Deep Sub-micron CMOS Technology," 19th
International Conference on VLSI Design (VLSID '06), 1063-9667,
2006 IEEE.
[3] Paridhi Athe and S. Dasgupta, "A Comparative Study of 6T, 8T
and 9T Decanano SRAM Cell," 978-1-4244-4683-4/09, 2009 IEEE.
[4] Arash Azizi Mazreah, Mohammad Reza Sahebi, Mohammad Taghi
Manzuri, and S. Javad Hosseini, "A Novel Zero-Aware Four-Transistor
SRAM Cell for High Density and Low Power Cache Application,"
978-0-7695-3489-3/08, 2008 IEEE, DOI 10.1109/ICACTE.2008.
[5] Sheng Lin, Yong-Bin Kim, and Fabrizio Lombardi, Dept. of
Electrical and Computer Engineering, Northeastern University,
Boston, MA, USA, "A 32nm SRAM Design for Low Power and High
Stability," unpublished.
[6] Prashant Upadhyay, Rajesh Mehra, and Niveditta Thakur, "Low
Power Design of an SRAM Cell for Portable Devices," 978-1-4244-9034,
2010 IEEE.
[7] Ming-Hsien Tu, Jihi-Yu Lin, Ming-Chien Tsai, and Shyh-Jye Jou,
"Single-Ended Sub-threshold SRAM With Asymmetrical
Write/Read-Assist," 1549-8328, 2010 IEEE.
[8] B. Alorda, G. Torrens, S. Bota, and J. Segura, "Static-Noise
Margin Analysis during Read Operation of 6T SRAM Cells,"
unpublished.
[9] E. Seevinck, F. J. List, and J. Lohstroh, "Static-Noise Margin
Analysis of MOS SRAM Cells," IEEE Journal of Solid-State Circuits,
vol. SC-22, no. 5, pp. 748-754, Oct. 1987.
[10] Benton H. Calhoun and Anantha Chandrakasan, "Analyzing Static
Noise Margin for Sub-threshold SRAM in 65nm CMOS," MIT, 50 Vassar
St 38-107, Cambridge, MA 02139, USA, unpublished.
[11] Zheng Guo, Andrew Carlson, Liang-Teck Pang, Kenneth T. Duong,
Tsu-Jae King Liu, and Borivoje Nikolic, "Large-Scale SRAM
Variability Characterization in 45 nm CMOS," 0018-9200, 2009 IEEE.
[12] Benton Highsmith Calhoun and Anantha P. Chandrakasan, "A
256-kb 65-nm Sub-threshold SRAM Design for Ultra-Low-Voltage
Operation," 0018-9200, 2007 IEEE.
[13] Koichi Takeda, Yasuhiko Hagihara, Yoshiharu Aimoto, Masahiro
Nomura, Yoetsu Nakazawa, Toshio Ishii, and Hiroyuki Kobatake, "A
Read-Static-Noise-Margin-Free SRAM Cell for Low-VDD and High-Speed
Applications," 0018-9200, 2006 IEEE.
[14] Shilpi Birla, Neeraj Kr. Shukla, Manisha Pattanaik, and
R. K. Singh, "Device and Circuit Design Challenges for Low-Leakage
SRAM for Ultra Low Power Applications," Canadian Journal on
Electrical & Electronics Engineering, vol. 1, no. 7, December 2010.
A New Assist Technique to Enhance the Read and Write Margins of Low Voltage
SRAM cell
Santhosh Keshavarapu
Saumya Jain
Manisha Pattanaik
I.
INTRODUCTION
II.
PROPOSED METHOD
The cell ratio and the pull-up ratio are the two important
factors that decide the read and write margins of SRAM bit
cells. The cell ratio is the ratio of the width of the pull-down
NMOS transistor to the width of the access NMOS transistor.
The pull-up ratio is the ratio of the width of the pull-up
PMOS transistor to the width of the access NMOS transistor.
If the cell ratio is high, the read margin of the cell increases
correspondingly, and by reducing the pull-up ratio the write
margin of the cell increases. These two ratios work against
each other: modifying one factor to improve one margin
degrades the other proportionally. Hence we propose a
mechanism to enhance the read and write margins of the
SRAM bit cell without affecting the other margin.
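The two ratios defined above can be written down directly (the widths below are hypothetical examples; the paper gives no transistor sizes):

```python
def cell_ratio(w_pulldown, w_access):
    """Cell ratio (CR): pull-down NMOS width over access NMOS
    width. A higher CR improves the read margin."""
    return w_pulldown / w_access

def pullup_ratio(w_pullup, w_access):
    """Pull-up ratio (PR): pull-up PMOS width over access NMOS
    width. A lower PR improves the write margin."""
    return w_pullup / w_access

# Hypothetical widths in nm (not from the paper):
print(cell_ratio(240, 120))    # 2.0
print(pullup_ratio(120, 120))  # 1.0
```

Because both ratios share the access-transistor width in the denominator, resizing that one device pulls the read and write margins in opposite directions, which is the trade-off the proposed assist technique avoids.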
In the read operation, if the supply voltage of the cell is
higher than the word line voltage, then the read margin of the
cell increases, and vice versa. Similarly, for the
will not reduce the voltage which is obtained with the two
transistors.
A. Circuit Operation
Figure 6. Read Noise Margin curve for proposed read assist technique
SIMULATION RESULTS
[A]: -40°C  [B]: -20°C  [C]: 0°C  [D]: 20°C  [E]: 40°C
[F]: 60°C  [G]: 70°C
A. Read Margin
For the measurement of the read margin, the conventional
static noise margin (i.e., butterfly curve) approach is used,
and the read noise margin curve for the proposed read assist
technique is shown in Figure 6. The node voltages Q and QB
are the storage nodes. The butterfly curve is obtained by
sweeping each storage node and measuring the other storage
node's voltage.
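Numerically, the SNM of one butterfly lobe is the side of the largest square that fits between the two voltage transfer curves. The brute-force search below is an illustrative approximation (not the paper's method); a full butterfly SNM takes the minimum over both lobes.

```python
def snm_lobe(x, upper, lower):
    """Side of the largest axis-aligned square that fits between two
    monotonically decreasing VTC samples: 'upper' is the VTC of one
    inverter, 'lower' the mirrored VTC of the other, both sampled on
    the common ascending sweep grid x."""
    best = 0.0
    for a in range(len(x)):              # left edge of the square
        for b in range(a + 1, len(x)):   # right edge of the square
            # vertical room between the curves over [x[a], x[b]]:
            # decreasing curves -> tightest at x[b] above, x[a] below
            side = min(x[b] - x[a], upper[b] - lower[a])
            best = max(best, side)
    return best

# Toy lobe: upper curve y = 1 - x, lower curve y = 0.
x = [0.0, 0.25, 0.5, 0.75, 1.0]
upper = [1.0, 0.75, 0.5, 0.25, 0.0]
lower = [0.0, 0.0, 0.0, 0.0, 0.0]
print(snm_lobe(x, upper, lower))  # 0.5
```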
B. Write Margin
The write margin is measured by sweeping the right word
line voltage (WLR) [7][8] of the SRAM cell. The storage
node voltage is plotted on the Y axis and the swept right
word line voltage on the X axis. The write margin curve for
the proposed write assist technique is shown in Figure 11.
Figure 11. Write margin curve for the proposed write assist technique
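This sweep-based measurement can be sketched on sampled data. One common convention, assumed here rather than taken from the paper, reports the margin as VDD minus the WLR voltage at which the storage node crosses VDD/2 (the flip point):

```python
def write_margin(wlr_sweep, q_node, vdd, flip_frac=0.5):
    """Write margin from a word-line (WLR) sweep. Assumed
    convention: margin = VDD - (WLR voltage at which the storage
    node Q first crosses flip_frac * VDD, i.e. the cell flips).
    wlr_sweep: ascending WLR voltages; q_node: Q at each step."""
    for wlr, q in zip(wlr_sweep, q_node):
        if q >= flip_frac * vdd:   # cell has flipped
            return vdd - wlr
    return 0.0                     # cell never flipped: no margin

# Toy sweep at VDD = 1 V: Q flips when WLR reaches 0.6 V.
print(write_margin([0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
                   [0.0, 0.05, 0.1, 0.6, 0.9, 0.95], 1.0))  # 0.4
```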
Figure 15. Write margin curves with different process corners
Figure 16. Comparison of write margin with different techniques
A: 6T(β=1)  B: 6T(β=2)  C: 6T(β=3)  D: proposed read assist with
6T(β=1)  E: dynamic word line voltage with 6T  F: 8T cell
G: proposed write assist with 8T.
CONCLUSION
[Two-column abstract/introduction text garbled in extraction;
recoverable fragments follow.]
... improvement in read and write SNM, respectively ... using the
90nm TSMC CMOS model confirms the ...

I. INTRODUCTION

In recent years, sub-threshold design for low power ... In the
sub-threshold region, the conventional 6T-cell SRAM experiences
poor read and write ability. With various ... operation and
internal write ... a novel scheme is proposed that uses dynamic
... the write time, sacrificing the read time, and augmenting the
read SNM. Although ...

Figure 1. a) Conventional 6T SRAM cell and b) butterfly curve for
hold and read SNM.
... robust stability ... the area ... structure ...

Figure 2. Butterfly curves show a) hold SNM reduction and b) read
SNM reduction by lowering VDD.
II. ...
A. 6T cell failure

[Figure caption fragment: "... cell with all controlling signals
... c) layout of two columns related to the ..."]

The SNM is defined as the length of the side of the largest square
that can be inscribed between the VTC from Fig. 1(a) and the
inverse VTC from inverter 1. In the hold state the cell contains
only two inverters that regenerate the data; this VTC is related
to the hold phase. The other use of SNM is the measurement of cell
stability during the read operation. During the read operation,
two access transistors interact with the inverters and deteriorate
the SNM. Fig. 1(b) shows the read VTC, which contains a smaller
square in comparison to the hold VTC. [Interleaved fragments lost
in extraction: "... consumption is high ...", "... reading time
and due to ...", "... asymmetric, the area ..."]
The impact of decreasing VDD is shown in Fig. 2. Fig. 3(a) shows
the conventional cell [8]. The conventional 9T cell uses two
different paths for read and write operations; WBL and WBLB
perform the write operation, and the single RBL activates the read
phase. Thus, the area and the ...

PROPOSED 9T CELL

A. Read operation

Fig. 4 shows the proposed 9T cell read operation simulation ...
S2 and S3 are ...
[Waveform plot residue from the read-operation figure removed;
surviving fragments: "... read SNM for various VDD ...", "...
while reading 1, BLB will be ...", "... M2 ... the bitline leakage
decrease[s by a] significant amount.", "Fig. 5 shows the ..."]
B. Write operation

[Waveform plot residue removed: proposed 9T cell write-operation
signals over a 10-50 ns time axis.]

Figure 5. Proposed 9T cell write operation simulation results.
[Chart residue for hold SNM, read SNM, and write SNM removed.]

TABLE: comparison of the conventional 6T, conventional 9T [8], and
proposed 9T cells (values as extracted, row/column mapping lost:
SNM-related entries 45, 91, 82, 90, 90, 135, 91, 91, 90; 33, 33,
39; 4.5, 4.5, 3.6; read method: differential, single, single;
area: 1.8A, 1.7A).

[Write SNM plot residue removed; curves at T = 20°C and T = 80°C
versus VDD.] ... Fig. ... shows the write SNM for various VDD of
the cell. Table I shows various ... simulated to show the ...
consumption is 0.2 mm².
shows various
[I]
[4]
accesses
that
improve
the
readability
[6]
by
and
50%
in
read
Computer
ICCD
2005.
K;
K;
[7 ]
mV
SRAM
With
Expanded
Write
and
Read
Margins
for
CONCLUSION
improvement
2005.
Processors,
and
Computers
VLSI in
K;
SRAM
subthreshold
multiple
area
total
Design:
the
REFERENCES
and
and
performance
and
write
[8]
2008. 51st Midwest Symposium on, vol. , no. , pp.422-425, 10-13 Aug.
SNM
Sheng Lin; Yong-Bin Kim; Lombardi, F.; , "A 32nm SRAM design for
low power and high stability," Circuits and Systems, 2008. MWSCAS
2008.
[9]
2013
&
107
Abstract—For the first time in this paper, static random access
memories using 9T, 8T, and 6T SRAM cells are compared using
N-curve and statistical analysis, demonstrating a multi-fold
performance enhancement. The 9T SRAM cell, with extra transistors
compared to the 8T and 6T SRAM cells, gives higher stability
(SVNM, SINM, WTV, and WTI) than the conventional SRAM cells. The
paper analyzes a variety of parameters such as stability (SVNM,
SINM, WTV, and WTI), area, and leakage power consumption. A
comparison-based study of the cell ratio (CR) and the pull-up
ratio (PR) against SVNM is shown. A statistical model has been
developed displaying the power histogram during the write and read
cycles for the 9T SRAM cell. The 9T SRAM cell shows much better
stability, less standby power consumption, and higher area than
its 6T and 8T SRAM cell counterparts. The design is based on the
90 nm CMOS process technology.
I. INTRODUCTION
A. Operation
The conventional 6T SRAM cell has three different modes:
standby mode, write mode, and read mode. In standby mode, no
write or read operation is performed and the circuit is idle; in
read mode, the data is read from the output node onto the bit
lines; and in write mode, the data or contents are updated. To
operate in read mode the SRAM should have readability, and in
write mode it should have write ability. The three modes work as
follows.
Standby: If the word line WL is low (0), the access transistors
M3 and M4 turn off and the bit lines are disconnected from both
access transistors. The cross-coupled inverters continue to
reinforce each other; the current drawn from the supply voltage
in this mode is called standby or leakage current.
Reading: In read mode, the word line WL (1) turns on the
transistors N3 and N4. When both transistors turn on, the values
of Q and QB are transferred to the BL and BLB bit lines,
respectively, but before driving WL high the bit lines BL and
BLB should be pre-charged to VDD. Assume that 1 is stored at Q
and 0 at QB; then no current flows through N4, and current flows
through N3, which discharges BLB through N3 and N0. This voltage
difference means the read
TABLE I

SRAM Cell          CMOS Process   SVNM      SINM      WTV       WTI
Conventional 6T    90nm / 1V      273.9mV   18.40uA   446.6mV   -13.53uA
8T-SRAM            90nm / 1V      273.9mV   18.42uA   446.6mV   -13.55uA
9T-SRAM            90nm / 1V      529.5mV   32.85uA   598mV     -18.27uA
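The four N-curve metrics in Table I can be extracted from sampled N-curve data roughly as follows. This is a sketch following the usual definitions (e.g. Grossar et al. [6]): A, B, C are the three zero crossings of the N-curve; SVNM = V_B - V_A, SINM = peak current between A and B, WTV = V_C - V_B, WTI = negative current peak between B and C. It assumes the sweep starts at crossing A.

```python
def ncurve_metrics(v, i):
    """SVNM, SINM, WTV, WTI from sampled N-curve data: v is the
    ascending swept node voltage, i the injected current. Assumes
    i[0] ~ 0 at the first crossing A, then i goes positive, crosses
    zero at B, goes negative, and returns through zero at C."""
    crossings = [k for k in range(1, len(i))
                 if i[k - 1] > 0 >= i[k] or i[k - 1] < 0 <= i[k]]
    b = crossings[0]                                   # crossing B
    c = crossings[1] if len(crossings) > 1 else len(i) - 1  # crossing C
    svnm = v[b] - v[0]        # static voltage noise margin
    sinm = max(i[:b])         # static current noise margin (peak)
    wtv = v[c] - v[b]         # write trip voltage
    wti = min(i[b:c + 1])     # write trip current (negative peak)
    return svnm, sinm, wtv, wti

# Toy N-curve: SVNM=0.3, SINM=8, WTV=0.3, WTI=-9.
print(ncurve_metrics([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
                     [0.0, 8.0, 4.0, -3.0, -9.0, -2.0, 1.0]))
```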
TABLE II
AREA AND POWER IMPROVEMENT OF THE 9T SRAM CELL COMPARED TO THAT OF
THE CONVENTIONAL SRAM CELLS

SRAM Cell   Leakage power consumption (nW)   Area (mm²)
6T          56.23                            26.1756
8T          55.10                            32.6710
9T          48.43                            34.1381
Fig. 11. SVNM variation with WTV (write trip voltage) and CR for
the 6T SRAM cell.
Fig. 12. Monte Carlo simulation of the 8T SRAM cell.
Fig. 14. Histogram of the write power of the 6T SRAM cell from a
10k-point Monte Carlo simulation.
Fig. 15. Histogram of the read power of the 6T SRAM cell from a
10k-point Monte Carlo simulation.

VI. CONCLUSION
VII. REFERENCES
[1] Virtuoso Advanced Analysis Tools User Guide, product version
5.1.41, 2007.
[2] Jiajing Wang and Satyanand Nalam, "Analyzing static and
dynamic write margin for nanometer SRAMs," ISLPED '08, August
11-13, 2008.
[3] Debasis Mukherjee, Hemanta Kr. Mondal, and B. V. R. Reddy,
"Static noise margin analysis of SRAM cell for high speed
application," IJCSI International Journal of Computer Science
Issues, vol. 7, issue 5, September 2010.
[4] Sung-Mo Kang and Yusuf Leblebici, CMOS Digital Integrated
Circuits: Analysis and Design, 2003 edition, pp. 402-519.
[5] K. Dhanumjaya, M. Sudha, M. N. Giri Prasad, and K. Padmaraju,
"Cell stability analysis of conventional 6T dynamic 8T SRAM cell
in 45nm technology," International Journal of VLSI Design &
Communication Systems (VLSICS), vol. 3, no. 2, April 2012.
[6] E. Grossar et al., "Read stability and write-ability analysis
of SRAM cells for nanometer technologies," IEEE J. Solid-State
Circuits, vol. 41, no. 11, pp. 2577-2588, Nov. 2006.
[7] Jiajing Wang and Amith Singhee, "Statistical modeling for the
minimum standby supply voltage of a full SRAM array," IEEE Trans.,
2007.
[8] S. Dasgupta and Paridhi Athe, "A comparative study of 6T, 8T
and 9T SRAM cell," IEEE Trans., October 4-6, 2009.
[9] Vikas Nehra and Rajesh Singh, "Simulation of 8T SRAM cell
stability at CMOS technology for multimedia applications,"
Canadian Journal on Electronics Engineering, vol. 3, no. 1,
January 2012.
[10] Shilpi Birla, R. K. Singh, and Manisha Pattnaik, "Static
noise margin analysis of various SRAM topologies," IACSIT
International Journal of Engineering and Technology, vol. 3,
no. 3, June 2011.
Proceedings of 2013 IEEE Conference on Information and Communication Technologies (ICT 2013)
I. INTRODUCTION
Embedded memories are popular in the realization of today's
complex systems known as system-on-chips (SOCs). The forecast
for 2013 from the International Technology Roadmap for
Semiconductors (ITRS) [1] [2] states that 90% of the area of
SOCs will be made up of memories, most specifically static
random access memories (SRAMs). Large arrays of fast SRAM help
in expanding system performance. However, this increases the
chip cost. Thus, for area cost optimization, the size of SRAM
cells is minimized. Small SRAM cells are therefore closely
placed, making SRAM arrays the densest circuitry on a chip. Such
areas on the chip can be vulnerable to manufacturing defects and
process variations.
This implies that test cost of memories will make a large
impact on the test cost of the SOCs. The faults in memories
result in reduction of yield. In critical systems these may cause
systems failure. Thus adequate test methods must be
employed in order to minimize the cost while maintaining
Figure 1: 6T CMOS SRAM cell
efficiency thereby increasing the quality of the product. In a
SRAM testing, various fault models such as stuck-at, transition,
The writing operation of a SRAM can be classified as
coupling faults are used. In order to detect these faults, March
tests [3] [4] has been widely used. But these detection transition writing and non transition writing. Transition writing
processes are time consuming. Testing using quiescent current will cause transient current to flow from the power supply to
ground. Thus, if a transient current is detected during a non-transition write, or no transient current is detected during a transition write, the presence of a fault in the cell is confirmed. The peak of the transient current also varies with the presence of different faults, which can be sensed to predict the occurrence of faults.
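The detection rule above can be sketched as a simple decision function; the boolean inputs stand in for the write type and the measured transient, and are illustrative only:

```python
# Sketch of the transient-current detection rule: a transition write in a
# fault-free cell draws a supply transient, a non-transition write does not.
# Any observation contradicting that pattern flags a fault.

def classify_write(transition_write: bool, transient_detected: bool) -> str:
    """Return 'faulty' when the observed transient contradicts the write type."""
    if transition_write and not transient_detected:
        return "faulty"   # expected transient is missing
    if not transition_write and transient_detected:
        return "faulty"   # unexpected transient during a non-transition write
    return "ok"

print(classify_write(True, True))    # prints "ok": fault-free transition write
print(classify_write(False, True))   # prints "faulty": spurious transient
```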
B. Faults introduced
Faults occur in SRAM due to logical or electrical design errors, manufacturing defects, aging of components, destruction of components (due to exposure to radiation), or process variations. Manufacturing defects are defects that were not intended; a manufacturing defect can occur despite careful design.
The size of memory makes physical examination of a SRAM impossible. Thus, the testing mechanism is based on comparing the logical behaviour of a faulty memory against a good memory. To compare the logical behaviour of faulty memories against good ones, the physical failure mechanisms must be modelled as logic fault models. Failures in SRAM occur due to open and bridging faults, as shown in Figure 2.
TABLE I
RESISTANCE INTRODUCED TO MODEL FAULTS

Resistance   Value (Ω)   Nature of fault     Fault model
R1           1M          Open defect 1       TF
R2           1M          Open defect 2       DRF
R3           1M          Open defect 3       S-a-1
R4           1M          Open defect 4       DRF
R5           1M          Open defect 5       SOF
R6           1M          Open defect 6       DRF
R7           10          Bridging defect 1   SAF
R8           10          Bridging defect 2   S-a-1
R9           10          Bridging defect 3   S-a-0
R10          10          Bridging defect 4   CF
R11          10          Bridging defect 5   CF
An open fault [8] [9] occurs where two nodes that are supposed to be connected are left open; it can be modelled as a high resistance connected between those nodes. A bridging fault [10] [11] is modelled as a low-valued resistance connected across two nodes, representing the shorting of nodes that are supposed to be open. Thus, any of the resistances shown above can be introduced into a SRAM to make it faulty. The resistance values used are given in Table I, which also states the nature of each fault. The established logic fault models in SRAM are listed below. The model predicts five functional fault classes: stuck-at fault (SAF), transition fault (TF), stuck-open fault (SOF), coupling fault (CF), and data retention fault (DRF).
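In the spirit of Table I, fault injection can be sketched as appending a resistor to a netlist: a high series resistance for an open, a low resistance for a bridge. The tuple-based netlist format here is a made-up illustration, not a SPICE API:

```python
# Illustrative fault injection following Table I: opens are modelled as 1 MOhm
# series resistors, bridges as 10 Ohm resistors between normally isolated nodes.

FAULT_LIBRARY = {
    "open": 1e6,     # ohms, per Table I open defects R1-R6
    "bridge": 10.0,  # ohms, per Table I bridging defects R7-R11
}

def inject_fault(netlist, name, kind, node_a, node_b):
    """Return a new netlist with the fault resistor appended."""
    return netlist + [(name, node_a, node_b, FAULT_LIBRARY[kind])]

good_cell = [("M1", "BL", "Q", None), ("M2", "BLB", "QB", None)]
faulty_cell = inject_fault(good_cell, "R7", "bridge", "Q", "VDD")
print(faulty_cell[-1])   # prints ('R7', 'Q', 'VDD', 10.0)
```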
A. VDDT Sensor
Since the IDDT current [12] [13] is a very fast transient, it is extremely difficult to sense and process directly. In low-power technologies especially, processing the dynamic supply current is nearly infeasible. Transforming the current to a voltage and then handling the resulting voltage waveform is a possible solution [14]. A VDDT sensor is shown in Figure 3. The output voltage keeps the shape of the dynamic current but is stretched in time, and this stretch in time is useful for further processing. The main advantage is the speed of test: only two write operations are required, compared to March tests, where a minimum of 4 operations is required to detect the fault.
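A first-order model of the current-to-voltage idea: the supply transient charges a sense capacitor with a resistive leak, so the voltage keeps the shape of the current but decays over a much longer RC time constant. The component values below are placeholders, not the paper's design:

```python
# Minimal sketch of a VDDT-style sensor: integrate the transient current i(t)
# onto a capacitor C with leak resistor R, i.e. dv/dt = i/C - v/(R*C).
# All component values are illustrative assumptions.

def vddt_sensor(i, dt=1e-9, r=1e4, c=1e-12):
    """Return the sampled sensor voltage v(t) for current samples i."""
    v, out = 0.0, []
    for ik in i:
        v += dt * (ik / c - v / (r * c))
        out.append(v)
    return out

current = [1e-6] * 5 + [0.0] * 45   # a 5 ns current pulse, then silence
v = vddt_sensor(current)
print(max(v) > v[-1] > 0)           # prints True: the voltage outlives the pulse
```

The output peak is proportional to the injected charge, and the slow decay is what makes the waveform easy to compare against a reference.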
obtained at the output of the comparator. Instead of a single SRAM cell, an array of cells can also be tested with this same circuitry.
The column decoder is used to activate particular BL and BLBAR lines, and the row decoder is used to activate word lines. A block-level representation of the entire memory system is shown in Figure 6. SRAMs in the same row share the same word line, and SRAMs in the same column share a common bit line. A 4×4 array is made; hence 16 six-transistor SRAMs are employed.
The SRAM array testing circuitry is the same as that for a single SRAM cell, shown in the previous section; the difference is that an array of SRAMs takes the place of the cell. A faulty array is compared with a good array: their transient voltages differ, giving a pulse at the output of the comparator. Hence, a pulsed output at the comparator indicates the occurrence of a fault in the SRAM array.
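The comparator stage described above can be sketched sample by sample; the waveforms and threshold below are illustrative, not measured data:

```python
# Sketch of the BIST comparator: compare the transient voltage of the array
# under test against a known-good reference; any sample-wise difference beyond
# a threshold produces a pulse at the output.

def compare(ref, dut, threshold=0.05):
    """Return 1 where |ref - dut| exceeds threshold, else 0 (the output pulse)."""
    return [1 if abs(a - b) > threshold else 0 for a, b in zip(ref, dut)]

good = [0.0, 0.4, 0.9, 0.4, 0.0]
faulty = [0.0, 0.1, 0.3, 0.1, 0.0]   # reduced transient caused by a fault
print(compare(good, faulty))          # prints [0, 1, 1, 1, 0]: pulse flags the fault
print(compare(good, good))            # prints [0, 0, 0, 0, 0]: fault-free
```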
IV. SIMULATION RESULTS
A. Sensor Output
A write operation that flips the data in the SRAM will cause transient current to flow through the SRAM. The presence of a fault in a SRAM changes the amount of transient current flowing through it. Figure 6 shows the difference in transient current between a faulty and a fault-free SRAM. Here, a write operation is performed every 50 ns. The difference in transient current within an individual memory cell is due to 1-to-0 or 0-to-1 writing into the SRAM cell.
Resistance    Fault-free transient   Faulty transient
introduced    current (µA)           current (µA)
R1            -58.767                -11.730
R2            -58.767                -39.201
R3            -58.767                -70.524
R4            -58.767                -85.688
R5            -58.767                -75.570
R6            -58.767                -183.740
R7            -58.767                -82.716
R8            -58.767                -377.510
R9            -58.767                -210.926
R10           -58.767                -0.427
R11           -58.767                -28.773
the presence of a fault in the SRAM, and a zero DC output at V(MAINOUT) is obtained for a fault-free SRAM cell.
V. CONCLUSIONS
In this paper, a BIST using a transient-current approach for fault detection is implemented, and its effectiveness has been tested using a simple memory architecture having single and multiple faults. The presence of a fault is shown as a pulse at the output of the circuit; the absence of a fault is given by a zero DC level at the output. This BIST is also effective for detecting faults in a 4×4 SRAM array.
This BIST architecture may be extended to detect faults in larger SRAM arrays. A novel high-performance BIST can be designed, where high performance includes high speed, low power, and area efficiency.
REFERENCES

Figure 8: Output of BIST circuit
TABLE III
TRANSIENT CURRENT IN SRAM CELL WITH MULTIPLE FAULTS

Resistance    Fault-free transient   Faulty SRAM transient
introduced    current (µA)           current (µA)
R1,R2         -58.767                -21.563
R1,R4         -58.767                -89.001
R4,R5         -58.767                -159.897
R4,R5,R10     -58.767                -47.688
R5,R11        -58.767                -161.580
R6,R8         -58.767                -12.890
R7,R9,R4      -58.767                -19.716
Integrating Embedded Test Infrastructure in SRAM Cores to Detect Aging

W. Prates, L. Bolzani, G. Harutyunyan, A. Davtyan, F. Vargas, Y. Zorian

Electrical Engineering Dept., Catholic University PUCRS, Porto Alegre, Brazil
[email protected], [email protected]
Synopsys, Yerevan, Armenia; Synopsys, CA, USA
[email protected], [email protected], [email protected]

I. INTRODUCTION
block and write circuitry, which increases design complexity more than simply adding an on-chip sensor itself; (c) to perform SRAM aging test in the field, the whole memory context must be previously saved in a spare memory before test execution, because the test procedure overwrites the application data stored in memory. For high-availability applications, long downtime periods may be a serious restriction. Another approach [?] presents a compact on-chip sensor design that tracks NBTI for SRAMs. The sensor is embedded in the SRAM array and takes the form of a [?]T SRAM cell. This approach typically consists of hundreds or thousands of sensors to achieve decent sensing precision. For example, for a [?]-Mbit SRAM, a system of one thousand sensors was designed; each sensor monitors a small subset of cells, resulting in a large area overhead. Additionally, if the sensor is driven into a fault state, i.e., it turns into an aged cell, this does not mean that the neighborhood cells would be defective as well. And the opposite is also true, which is even worse. This condition degrades the approach's reliability.
In [?], the authors presented the first version of the On-Chip Aging Sensor (OCAS) approach. This sensor was designed in a commercial [?]nm technology.
In this paper, we present the integration of the OCAS approach in the design methodology of [?]nm single-port SRAM cores. The goal is to enhance the current Synopsys test-and-repair on-chip infrastructure (STAR: Self-Test and Repair Solution) [?] to detect SRAM aging during system lifetime. The STAR Memory System solution was developed within Synopsys DesignWare, allowing users to create, integrate, and verify embedded memory test and repair infrastructure in systems-on-chip. STAR detects a wide range of realistic faults, such as resistive, performance, static and dynamic, linked and unlinked, and process-variation faults. Based on SPICE simulations, we have investigated two situations: (a) the aging faults and their impact on [?]nm SRAMs, and (b) the OCAS sensitivity to detect such faults in the target [?]nm technology.
978-1-4799-0664-2/13/$31.00 © 2013 IEEE
II. OCAS APPROACH
Fig. [?] depicts the general block diagram of the OCAS approach, indicating the connection between the aging sensor and a SRAM column. As observed, transistor TT is connected between the real VDD and the virtual VDD node, which is used to feed the positive bias to the cells.
- Write the value as read in step [?].
- Drive the CTRL signal to the Evaluation Phase and observe the OCAS output for a pass/fail decision.
- Return the Power Gating signal and the Testing Mode signal to their idle levels, along with the column whose cell was tested.
- If there are more cells to be checked, repeat the process from step [?]; otherwise, stop testing.
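The testing-mode flow described above, in which repeated writes discharge the virtual-VDD node and an evaluation phase compares it against a reference, can be sketched as follows. The voltage step per write and the reference level are illustrative assumptions, not values from this design:

```python
# Hedged sketch of the OCAS testing flow: a healthy cell discharges the
# virtual-VDD node with each precharge-phase write; an aged cell, with its
# reduced pMOS drive, discharges less and so stays above the reference.
# delta_v_per_write, vdd and v_ref are illustrative assumptions.

def ocas_test_cell(delta_v_per_write, n_writes, vdd=1.0, v_ref=0.55):
    """Return 'pass' if the cell discharges virtual VDD down to v_ref."""
    v_virtual = vdd - n_writes * delta_v_per_write   # precharge-phase writes
    return "pass" if v_virtual <= v_ref else "fail"  # evaluation phase

print(ocas_test_cell(delta_v_per_write=0.08, n_writes=8))   # prints "pass"
print(ocas_test_cell(delta_v_per_write=0.02, n_writes=8))   # prints "fail": aged cell
```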
It is important to mention that there is a small circuitry embedded in the OCAS which is used to perform self-test of the sensor before it is activated to monitor the SRAM cells. This small circuit, not shown in Fig. [?], is formed by two resistors R[?] and R[?] in series, in the same configuration as the one formed by resistors R[?] and R[?], but connected to the drain of transistors M[?] and M[?]. The voltage produced at this node is equal to the one produced at the virtual VDD after a sequence of two write operations in a faulty (aged) cell activated during the Testing Mode. So, when the OCAS self-testing is activated, it is expected that Out will indicate an error logic level.
III. INTEGRATING OCAS AND SRAM CORES

Figure [?]: General block diagram of the proposed approach.
Figure [?]: Schematics of the proposed approach (error indication, reference voltage).

IV. EXPERIMENTS

Figure [?]: Block diagram of the OCAS integration in a SRAM core (STAR Memory System: BIST, STAR Control, address/data; aging sensor on the virtual VDD node; memory control, row decoder, aged cell, sense amps and output buffers; ADDR, D, WE, ME, CLK, GND).
Table I: OCAS sensitivity as a function of the number of write operations into the SRAM cell.

Fig. [?] presents the OCAS sensitivity for the value displayed in the second column, second line of Table I (surrounded by a circle). As observed (Fig. [?]a), the [?]-year aging fault was not detected by OCAS after [?] write operations into the cell, but it succeeded in detecting a [?]-year aging fault with such a sequence (Fig. [?]b). For this simulation set, the reference voltage at the virtual VDD node was set to [?] V. This reference value was computed during a previous simulation by observing the voltage value at the virtual VDD node after [?] write operations into the target cell.
(a)
The shift value of [?] was selected by considering literature works that suggest typical Vthp changing from [?] to [?] per year [?].
(b)
Figure [?]: SPICE simulation for the FF corner (T = [?] °C, VDD = [?] V). (a) OCAS fails to detect a [?]-year aging fault injected in a SRAM cell after a sequence of [?] write operations; (b) OCAS succeeds in detecting a [?]-year aging fault after such a sequence of write operations.
(ii) the aging state of the cell to be monitored. The older a memory cell is, the lower its capability to discharge the virtual VDD node during a write operation, because of the reduced current-drive capability of the pMOS transistors, whose threshold voltage has increased. This means that the older a cell is, the easier is OCAS's task of detecting a fault condition.
(a)
(b)
Figure [?]: SPICE simulation for the SS corner (T = [?] °C, VDD = [?] V). (a) OCAS fails to detect a [?]-year aging fault injected in a SRAM cell after a sequence of [?] write operations; (b) OCAS succeeds in detecting a [?]-year aging fault after such a sequence of write operations.
Table II: OCAS sensitivity for TT getting old during system lifetime (corner case vs. aging-fault detection for four write-sequence lengths).
A. Discussions
Hereafter we discuss the parameters that directly affect OCAS's sensitivity when scaling down from [?]nm to [?]nm technology. More specifically, OCAS sensitivity is a function of the number of write operations that have to be performed during the precharge phase of the Testing Mode. The number of write operations depends on the actual virtual-VDD node capacitance, which in turn is a function of (i) the number of cells connected to the column. The larger the virtual-VDD node capacitance (i.e., the larger the number of cells connected to the column), the longer the write-operation sequences needed by OCAS to check the aging state of a given cell. And the reverse is also true: the smaller the virtual-VDD node capacitance (the smaller the number of cells connected to the column), the shorter the write-operation sequences needed.
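The capacitance argument above can be put into a back-of-the-envelope form: if each write removes roughly a fixed charge from the virtual-VDD node, the voltage drop per write shrinks as more cells (more capacitance) hang on the column, so more writes are needed to reach the sensing threshold. All component values below are placeholders:

```python
import math

# Sketch of write-count scaling with column capacitance. c_per_cell, dq and
# dv_target are illustrative assumptions, not values from this design.

def writes_needed(n_cells, c_per_cell=1e-15, dq=2e-15, dv_target=0.05):
    """Writes required to develop dv_target on the virtual-VDD node."""
    c_node = n_cells * c_per_cell        # more cells -> larger node capacitance
    dv_per_write = dq / c_node           # fixed charge per write -> smaller step
    return math.ceil(dv_target / dv_per_write)

print(writes_needed(64))    # prints 2: short column, short write sequence
print(writes_needed(256))   # prints 7: 4x the cells, proportionally more writes
```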
Based on recent SPICE simulations of a SRAM case study designed in a commercial [?]nm technology [?], OCAS's sensitivity was adjusted to [?] millivolts. Since, during the second write operation, a cell aged by at least [?] year (at a Vthp-increase pace of [?]) produces a variation at the virtual-VDD node larger than [?] millivolts in a SRAM column containing [?] cells, OCAS was able to detect any cell aged [?] year or more after the second write operation in such an SRAM. In order to guarantee a similar [?]-millivolt OCAS sensitivity after scaling down to [?]nm technology, the number of write operations had to be increased. As an example, consider Table I: for short write sequences ([?] writes, Fig. [?]), OCAS was able to detect cells aged at least [?] years, but it failed to identify cells aged by [?] years. However, when longer sequences were performed ([?] writes, rightmost column of Table I), OCAS easily detected cells aged [?] year or more. We would like to underline that resistors R[?] and R[?] were included in the design only for simulation purposes. In order to ensure the [?]-millivolt OCAS sensitivity, the reference voltage will be generated outside the chip to avoid variability problems and lack of accuracy. Note that if one desires to connect OCAS to a column containing more than [?] cells, it will probably be necessary to combine larger sequences of write operations than those observed in Table I with a longer OCAS precharge phase (longer than [?] ns at least), in order to obtain a variation of at least [?] millivolts at the virtual-VDD node. So the approach's reliability is unchanged, and the only parameter that must be adjusted is the test duration, since a longer write-operation sequence with a lower application frequency (lower than [?] MHz) is needed.
Finally, it is also important to mention that the number of write operations should be even, in order to guarantee that the original value stored in the memory cell before starting the test is restored before the memory is set back into the Normal Operating Mode. This property is important for periodical off-line testing procedures in the field, in order to guarantee that application memory content is not lost after test.
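The even-write-count property above in miniature: each test write inverts the stored bit, so an even number of writes returns the cell to its original value while an odd number corrupts it:

```python
# Parity argument behind the even-write-count requirement.

def apply_test_writes(bit: int, n_writes: int) -> int:
    for _ in range(n_writes):
        bit ^= 1          # each transition write flips the stored value
    return bit

original = 1
print(apply_test_writes(original, 8) == original)   # prints True: data restored
print(apply_test_writes(original, 7) == original)   # prints False: data corrupted
```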
B. Area Overhead
Considering the case study described in Section III and simulated in Section IV (a SRAM consisting of [?] columns, each column containing [?] cells, plus OCAS), the transistor-count area overhead for the memory due to sensor insertion is almost negligible.
C. Power Consumption
The power consumed per OCAS insertion is computed in two parts: the static power and the dynamic one. The latter has been computed for the sensor operating at [?] MHz. In more detail:
Static power consumption (Vaa = [?] V): leakage current ([?] pA), i.e., [?] pW.
Dynamic power consumption (Vaa = [?] V) is composed of three components:
Figure [?]: Performance degradation due to OCAS insertion: (a) delay of read operations with and without TT; (b) delay of write operations with and without TT.

D. Performance Degradation
Another issue is related to the delay increase (performance degradation) that could result from integrating the OCAS circuitry in a SRAM memory. In other words, transistor TT may limit the power-supply current IDD that flows between VDD and Gnd, and from VDD to the bit node, when a cell is accessed for a read or a write operation. In this case, we performed a set of simulations with and without transistor TT connected to the power-supply line and measured the delay of a read and a write operation into the cell. Fig. [?] depicts this comparison.

REFERENCES
[1] S. Mahapatra, D. Saha, D. Varghese, P. B. Kumar, "On the Generation and Recovery of Interface Traps in MOSFETs Subjected to NBTI, FN and HCI Stress," IEEE Trans. Electron Dev.
[2] Ing-Chao Lin, Chin-Hong Lin, Kuan-Hui Li, "Leakage and Aging Optimization Using Transmission Gate-Based Technique," IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems.
[3] C. Ferri, D. Papagiannopoulou, R. Iris Bahar, A. Calimera, "NBTI-Aware Data Allocation Strategies for Scratchpad-Memory-Based Embedded Systems," IEEE Latin American Test Workshop (LATW).
[4] F. Ahmed, L. Milor, "Reliable Cache Design with On-Chip Monitoring of NBTI Degradation in SRAM Cells using BIST," IEEE VLSI Test Symposium (VTS).
[5] Z. Qi, J. Wang, A. Cabe, S. Wooters, T. Blalock, B. Calhoun, M. Stan, "SRAM-Based NBTI/PBTI Sensor System Design," Design Automation Conference (DAC).
[6] A. Ceratti, T. Copetti, L. Bolzani, F. Vargas, "Investigating the Use of an On-Chip Sensor to Monitor NBTI Effect in SRAM," IEEE Latin American Test Workshop (LATW).
[7] J. Hicks, D. Bergstrom, M. Hattendorf, J. Jopling, J. Maiz, S. Pae, C. Prasad, J. Wiedemer, "[?]nm Transistor Reliability," Intel Technology Journal.
[8] K. Darbinyan, G. Harutyunyan, S. Shoukourian, V. Vardanian, and Y. Zorian, "A robust solution for embedded memory test and repair," IEEE Asian Test Symposium (ATS).
I. INTRODUCTION
Manuscript received December 20, 2013; accepted January 12, 2014. Date
of publication January 31, 2014; date of current version February 20, 2014.
This work was supported by the National Science Foundation ASSIST
Nanosystems ERC under Award EEC-1160483. The review of this letter was
arranged by Editor D. Ha.
R. Pandey, V. Saripalli, V. Narayanan, and S. Datta are with The
Pennsylvania State University, University Park, PA 16802 USA (e-mail:
[email protected]).
J. P. Kulkarni is with the Circuit Research Laboratory, Intel Corporation,
Hillsboro, OR 97124 USA.
Color versions of one or more of the figures in this letter are available
online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/LED.2014.2300193
0741-3106 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Fig. 1. 10T ST2 SRAM (a) read schematic (b) RNM in presence of RTN in
1024 possible cell types (c) (d) Worst case RTN RNM, FinFET and HTFET.
Fig. 2. 10T ST2 SRAM (a) write schematic (b) WNM in presence of RTN in
1024 possible cell types (c) (d) Worst case RTN WNM, FinFET and HTFET.
Fig. 3. 10T ST2 SRAM (a) RNM, and (b) WNM trend with Vcc scaling
in presence of RTN. Percent change in RNM (c) and WNM (d) indicates
HTFET ST2 SRAM is more immune to RTN induced variation.
Fig. 4. (a) RNM of 10T ST2 SRAM compared against 6T SRAM. (b) Average power consumption of a 256×256 SRAM array with 5% activity factor.
(c) Read-access delay. For HTFET SRAM, plots for 2 different trap locations, at the tunnel junction and 2 nm away from the tunnel junction, are also shown.
TABLE I
NORMALIZED PERFORMANCE METRICS WITH RTN, AT VCCMIN
compared to Si-FinFET 6T SRAM, due to delayed saturation in the HTFET output characteristics. The 10T ST2 SRAM, which uses a Schmitt-trigger feedback mechanism to suppress variation, is examined to explore its RTN immunity. For sub-0.225 V operation, HTFET ST2 SRAM surpasses Si-FinFET ST2 SRAM in performance due to the high Ion and Ion/Ioff ratio of HTFET (which improves the effectiveness of the Schmitt feedback [8]). At 0.15 V, HTFET ST2 SRAM offers 15.8% and 17.2% improvement in RNM and WNM, respectively, over Si-FinFET ST2 SRAM, besides exhibiting better tolerance against RTN-induced variation and faster operation with competitive power dissipation. Thus, HTFET ST2 SRAM meets the performance and power requirements of ultra-low-Vcc SRAM applications.
REFERENCES
[1] M. Agostinelli, J. Hicks, J. Xu, et al., "Erratic fluctuations of SRAM cache Vmin at the 90nm process technology node," in IEEE IEDM Tech. Dig., Dec. 2005, pp. 655-658.
[2] N. Tega, H. Miki, R. Zhibin, et al., "Impact of HK/MG stacks and future device scaling on RTN," in Proc. IEEE IRPS, Apr. 2011, pp. 6A.5.1-6A.5.6.
[3] M. Fan, V. P. Hu, Y. Chen, et al., "Analysis of single-trap-induced random telegraph noise and its interaction with work function variation for tunnel FET," IEEE Trans. Electron Devices, vol. 60, no. 6, pp. 2038-2044, Jun. 2013.
[4] J. Wan, C. Le Royer, A. Zaslavsky, et al., "Low-frequency noise behavior of tunneling field effect transistors," Appl. Phys. Lett., vol. 97, no. 24, pp. 243503-1 to 243503-3, 2010.
[5] D. K. Mohata, R. Bijesh, S. Mujumdar, et al., "Demonstration of MOSFET-like on-current performance in arsenide/antimonide tunnel FETs with staggered hetero-junctions for 300mV logic applications," in Proc. IEEE IEDM, vol. 5, Dec. 2011, pp. 33.5.1-33.5.4.
[6] G. Dewey, B. Chu-Kung, J. Boardman, et al., "Fabrication, characterization, and physics of III-V heterojunction tunneling field effect transistors for steep sub-threshold swing," in Proc. IEEE IEDM, vol. 3, Dec. 2011, pp. 33.6.1-33.6.4.
[7] J. P. Kulkarni, K. Kim, S. P. Park, et al., "Process variation tolerant SRAM array for ultra low voltage applications," in Proc. 45th ACM/IEEE DAC, Jun. 2008, pp. 108-113.
[8] V. Saripalli, S. Datta, V. Narayanan, et al., "Variation-tolerant ultra low-power heterojunction tunnel FET SRAM design," in Proc. IEEE/ACM Int. Symp. Nanoscale Archit., vol. 1, Jun. 2011, pp. 45-52.
[9] M.-L. Fan, V. P.-H. Hu, Y.-N. Chen, et al., "Impacts of single trap induced random telegraph noise on FinFET devices and SRAM cell stability," in Proc. IEEE Int. SOI Conf., Oct. 2011, pp. 1-2.
[10] R. Pandey, B. Rajamohanan, H. Liu, et al., "Electrical noise in heterojunction interband tunnel FETs," IEEE Trans. Electron Devices, vol. 61, no. 2, Feb. 2014, to be published.
[11] N. Tega, H. Miki, M. Yamaoka, et al., "Impact of threshold voltage fluctuation due to random telegraph noise on scaled-down SRAM," in Proc. IEEE Int. Rel. Phys. Symp., Apr./May 2008, pp. 541-546.
[12] C. Leyris, S. Pilorget, M. Marin, et al., "Random telegraph signal noise SPICE modeling for circuit simulators," in Proc. 37th Eur. Solid State Device Res. Conf., Sep. 2007, pp. 187-190.
[13] (2009). Cadence Virtuoso Spectre Circuit Simulator [Online]. Available: http://www.cadence.com/products/rf/spectre_circuit/pages/default.aspx
Battery lifetime is the key feature in the growing markets of sensor networks and energy-management systems (EMS). Low-power MCUs are widely used in these systems. For these applications, standby power, as well as active power, is an important contributor to the total energy consumption, because active sensing or computing phases are much shorter than the standby state. Figure 13.4.1 shows a typical power profile of low-power MCU applications. To achieve many years of battery lifetime, the current consumption of the chip must be kept below 1µA during deep-sleep mode. Another key feature of a low-power MCU for such applications is fast wake-up from deep-sleep mode, which is important for low application latency and to keep wake-up energy minimal. For fast wake-up, the system must retain its state and logged information during sleep mode, because several hundred microseconds are needed for reloading such data to memories. Conventional SRAM consumes much higher retention current than the required deep-sleep-mode current, as shown in Fig. 13.4.1. Embedded Flash memories have limited write endurance, on the order of 10^5 cycles, making them difficult to use in applications that frequently power down. Embedded FRAM [1,2] has been used for this purpose, and it can serve as a random-access memory as well as a nonvolatile memory. However, as a random-access memory, its slow operation and high energy consumption [1,2] limit the performance of the MCU and the battery lifetime. Furthermore, the additional process steps for fabricating FRAM memory cells increase the cost of the MCU. SRAM can operate at higher speed with lower energy and without additional process steps, but its high retention current makes it difficult to sustain data in deep-sleep mode. To solve this problem, we develop a low-leakage-current SRAM (XLL SRAM) that reduces retention current by 1000× compared to conventional SRAM and operates with less than 10ns access time. The retention current of XLL SRAM is negligible in the deep-sleep mode because it is much smaller than the deep-sleep-mode current of the MCU, which is dominated by the active current of the real-time clock and control logic circuits. By using XLL SRAM, the store and reload process during mode transitions can be eliminated, and the wake-up time from the deep-sleep mode of the MCU is reduced to a few microseconds. This paper describes a 128kb SRAM with 3.5nA (27fA/b) retention current, 7ns access time, and 25µW/MHz active energy consumption. Its low retention current, high speed, and low-power operation enable the SRAM to remain active in the deep-sleep mode, and also provide fast wake-up, low active energy consumption, and high performance to the MCU.
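A back-of-the-envelope check of the battery-lifetime claim: with an assumed coin-cell capacity of around 220 mAh (an illustrative value, not from the paper), a sub-1µA deep-sleep current supports multi-year operation, and the 3.5 nA SRAM retention current is a negligible share of that budget:

```python
# Lifetime arithmetic for the deep-sleep budget. The 220 mAh battery capacity
# is an assumed value; the 1 uA budget and 3.5 nA retention are from the text.

def lifetime_years(capacity_mah, current_a):
    hours = (capacity_mah * 1e-3) / current_a
    return hours / (24 * 365)

sleep_budget = 1e-6        # 1 uA deep-sleep target
sram_retention = 3.5e-9    # 3.5 nA for the 128kb XLL SRAM

print(round(lifetime_years(220, sleep_budget), 1))       # prints 25.1 (years)
print(round(sram_retention / sleep_budget, 4))           # prints 0.0035 (share)
```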
Since the complexity and required performance of MCUs have been increasing, a more advanced process has to be used. As the process geometry becomes smaller, the leakage current of transistors increases. In addition to channel leakage, other leakage mechanisms become significant, such as gate-oxide leakage and GIDL. To realize XLL SRAM, these three types of leakage current must be suppressed. Low-leakage transistors with long gate length and thick gate oxide have been developed for the SRAM memory cells. Their GIDL is also decreased by introducing a lightly doped region in the active layer, as shown in Fig. 13.4.2(c). Generally, adopting long-gate-length and thick-gate-oxide transistors for memory cells causes an increase in memory macro area and power dissipation. We use several techniques to shrink the memory cell size while avoiding an increase in active power dissipation. The memory cell layout is shown in Figs. 13.4.2(a) and (b). Since the supply voltage of the memory cells is 1.2V, the spaces between the p-type and n-type wells and between the gate poly and the adjacent diffusion area are shrunk compared to the original design rule for transistors with the same gate-oxide thickness. As a result, the memory cell size is reduced by about 20%. The cell height in the vertical direction is extended by adopting long-gate-length transistors; this allows four wordlines to be routed over the memory cell, as shown in Fig. 13.4.2(b). A block diagram of the developed 128kb SRAM is shown in Fig. 13.4.3(a). Peripheral circuits consist of conventional transistors to achieve high performance and to reduce SRAM size. The supply voltage of the peripheral circuits is cut off, and the NMOS source node of the memory cells (VSSB) is reverse biased via source-bias circuits in retention mode, as shown in Fig. 13.4.3(b). As a result, 27fA/b of leakage current is achieved with the
fabricated XLL SRAM at room temperature, as shown in Fig. 13.4.4(a). The leakage current of conventional SRAM is also shown for comparison. The XLL SRAM achieves 1000× lower leakage current at room temperature than the conventional SRAM, due to lower gate leakage current and a larger back-gate-bias effect. A low-power MCU in the deep-sleep mode consumes little energy, so the retention current at room temperature determines the battery lifetime. The leakage current of XLL SRAM is lower than the required deep-sleep-mode current of a low-power MCU, even when the memory capacity increases up to several Mb. Therefore, all of the SRAMs in a low-power MCU can be active in deep-sleep mode, and the state and the logged information can be retained in them. A comparison of our SRAM leakage current to published leakage for SRAMs in 65nm and smaller processes is shown in Fig. 13.4.4(b). It shows that XLL SRAM reduces leakage current by more than 10× compared to FD-SOI SRAMs.
To compensate for the increase in active power due to the relatively large SRAM area, several low-power techniques are adopted. Bitline-charging current is the dominant portion of the active power consumption of SRAM. To reduce the bitline-charging current, we adopt a quarter-array activation scheme (QAAS) and a charge-shared hierarchical bitline (CSHBL) [3,4]. Four wordlines are routed over a memory cell, and one of the four wordlines connects to a memory cell in every 4 columns, as shown in Fig. 13.4.5. An SRAM architecture where two wordlines are routed over a memory cell has been reported [3]. By taking advantage of the extended memory cell height, the number of wordlines passed over a memory cell is doubled. Then 3/4 of the bitlines remain inactive in active cycles, and the bitline-charging current is reduced. The SRAM also employs CSHBL. It has been reported that CSHBL is effective for reducing the active power increase due to random variation of transistors [4]. We find that CSHBL is also effective for reducing the active power increase due to process and temperature variations. Waveforms of signals in CSHBL operation are shown on the right side of Fig. 13.4.5. The local bitlines are fully swung when the corresponding wordline is selected. Before the pass transistors are turned on, the selected wordline falls. Then the stored charge on the local bitlines is transferred to the global bitlines by charge sharing. Since the amount of stored charge on the local bitlines is determined by the capacitance of the local bitlines and the supply voltage, the bitline-charging current of the SRAM stays constant regardless of temperature and process condition. In a conventional SRAM, the bitline level varies substantially with changing temperature or process condition. Designing for the minimum bitline swing that can be sensed by the sense amplifiers in the slowest condition causes the bitline swing to become excessive in fast conditions, which causes excessive bitline-charging power dissipation. Figures 13.4.6(a) and (b) show the bitline-charging current, obtained by measuring the current to the memory cell ground in several process conditions. The charging current of the XLL SRAM decreases by more than 40% compared to an SRAM with conventional bitline architecture and timing-control circuits. Figure 13.4.6(c) shows the measured active energy of the XLL SRAM. The active energy of the XLL SRAM is reduced by about 40% by adopting QAAS and CSHBL, and 25µW/MHz of active energy at 1.2V is achieved. The achieved active energy is only 9% larger than that of the conventional SRAM.
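The charge-sharing arithmetic behind the CSHBL description above: a local bitline of capacitance C_local is charged to Vdd and then shares its charge with a global bitline of capacitance C_global, so the resulting swing depends only on the capacitance ratio and Vdd, not on process or temperature. The capacitance values below are illustrative:

```python
# Charge conservation across the pass transistor: Q = C_local * Vdd is
# redistributed over C_local + C_global (global line assumed predischarged to 0).

def shared_voltage(c_local, c_global, vdd=1.2):
    """Global-bitline voltage after charge sharing."""
    return c_local * vdd / (c_local + c_global)

v = shared_voltage(c_local=20e-15, c_global=100e-15)
print(round(v, 3))   # prints 0.2: a fixed swing, independent of transistor speed
```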
Figure 13.4.7 shows the chip micrograph and key features of the test chip fabricated in a 65nm CMOS process. The memory cell size is 2.159µm², and the macro area of the 128kb XLL SRAM is 0.443mm². The retention current and active energy at 1.2V are 3.5nA (27fA/b) and 25µW/MHz, respectively. The leakage current is negligible compared to the deep-sleep-mode current of a low-power MCU. Because of this, all the SRAMs in the MCU can stay awake in deep-sleep mode, shortening the wake-up time from deep-sleep mode and reducing the energy consumption during the mode transition.
References:
[1] A. Baumann et al., "A MCU platform with embedded FRAM achieving 350nA current consumption in real-time clock mode with full state retention and 6.5µs system wakeup time," VLSI Cir. Symp., pp. 202-203, 2013.
[2] M. Zwerg et al., "An 82µA/MHz Microcontroller with Embedded FeRAM for Energy-Harvesting Applications," ISSCC, pp. 334-335, 2011.
[3] H. Fujiwara et al., "A 20nm 0.6V 2.1µW/MHz 128kb SRAM with No Half Select Issue by Interleave Wordline and Hierarchical Bitline Scheme," VLSI Cir. Symp., pp. 118-119, 2013.
[4] S. Miyano et al., "Highly Energy-Efficient SRAM With Hierarchical Bit Line Charge-Sharing Method Using Non-Selected Bit Line Charges," JSSC, vol. 48, pp. 924-931, April 2013.
Figure 13.5.1: (a) Conventional 6T-SRAM bitcell and (b) the layout view of the high-density SRAM bit-cell with an area of 0.07µm².
Figure 13.5.2: Negative bitline voltage versus required write bitline voltage.
Figure 13.5.3: SRAM design equipped with SCS-NBL write assist scheme.
Figure 13.5.4: SRAM design equipped with WRE-LCV write assist scheme.
Figure 13.5.5: Floorplan of the WRE-LCV and SCS-NBL blocks for write-assist
techniques.
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 33, NO. 4, APRIL 2014
I. Introduction
Manuscript received June 22, 2013; revised November 13, 2013 and January
17, 2014; accepted January 23, 2014. Date of current version March 17, 2014.
The work was supported by the Singapore MOE Tier-1 funding RG 26/10. The
preliminary result was published in ISPD13. This paper was recommended
by Associate Editor C. Sze.
The authors are with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798 (e-mail:
[email protected]).
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCAD.2014.2304704
0278-0070 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
SONG et al.: REACHABILITY-BASED ROBUSTNESS VERIFICATION AND OPTIMIZATION OF SRAM DYNAMIC STABILITY
\[
\frac{d}{dt} q(x(t),t) + f(x^{*},t) + u + \left.\frac{\partial f}{\partial x}\right|_{x=x^{*}} (x - x^{*}) + \frac{1}{2} (x - x^{*})^{T} \left.\frac{\partial^{2} f}{\partial x^{2}}\right|_{x=\xi} \otimes (x - x^{*}) = 0, \quad (2)
\]
\[
\xi \in \{\, x^{*} + \alpha (x - x^{*}) \mid 0 \le \alpha \le 1 \,\}
\]
where x* is the nominal point and x is one neighbor point near x*; ⊗ represents the tensor multiplication. The 2nd-order remainder in (2), i.e., the difference between the nonlinear f(t) and its linear approximation, is called the linearization error, denoted by L.
The SRAM dynamic equation thereby can be depicted in a simplified form by
\[
\frac{d}{dt}\big( q(x^{*},t) + C\,\Delta x \big) + f(x^{*},t) + u^{*}(t) + G\,\Delta x + \Delta u + L = 0 \quad (3)
\]
in which
\[
C = \left.\frac{\partial q}{\partial x}\right|_{x=x^{*}}, \quad G = \left.\frac{\partial f}{\partial x}\right|_{x=x^{*}}; \qquad \Delta x = x - x^{*}, \quad \Delta u = u - u^{*}. \quad (4)
\]
Here, u is decomposed into u* and Δu, in which u* is the noiseless input and Δu is the input noise independent of x. Assume that q(x, t) can be further decomposed into q(x*) and CΔx. Thus, one can obtain
\[
\frac{d}{dt} q(x^{*},t) + f(x^{*},t) + u^{*}(t) = 0 \quad (5a)
\]
\[
\frac{d}{dt} C\,\Delta x + G\,\Delta x + \Delta u + L = 0 \quad (5b)
\]
in which (5a) is the nonlinear differential equation for the nominal point x* and (5b) is the linear equation for the Euclidean deviation from the nominal point x* to the neighbor point x.
Fig. 3. SRAM with threshold-voltage variations modeled by additional current sources for all transistors.
As shown in Fig. 3, the threshold-voltage variation of each transistor is modeled by a first-order noise current model (6): an additional drain-current source I_jd connected between the transistor terminals, nodes a and b. The resulting input deviation
\[
\Delta u = u - u^{*} = [0, \ldots, I_{jd}, \ldots, -I_{jd}, \ldots, 0]^{T} \quad (7)
\]
enters the equation as an independent current source; Δu represents the jth variation current source connected between nodes a and b. The other process variations can also be conveniently considered in a similar way.
What is more, perturbations of multiple device parameters can be considered as well. Suppose that each transistor in the SRAM has a width perturbation ΔW that affects the transconductance g_m by Δg_m. One can have
\[
\Delta g_m = \frac{\partial g_m}{\partial W} \Delta W. \quad (8)
\]
Collecting the Δg_m entries of all transistors gives the conductance-matrix perturbation
\[
\Delta G = \begin{bmatrix} \ddots & & \\ & \dfrac{\partial g_m}{\partial W} \Delta W & \\ & & \ddots \end{bmatrix}. \quad (9)
\]
\[
F(w) \quad \text{s.t.:} \; \ldots \quad (10)
\]
TABLE I
Parameters Used in Reachability-Based Verification
Here, c1 and c2 are the centers of zonotopes P and Q, respectively, and the generators of P and Q are represented by g1(i) and g2(i). A tight zonotope enclosing the convex hull CH(P, Q) of the two zonotopes can be found as
\[
\overline{CH}(P,Q) = \frac{1}{2}\big(c_1 + c_2,\; g_1^{(1)} + g_2^{(1)}, \ldots, g_1^{(e)} + g_2^{(e)},\; c_1 - c_2,\; g_1^{(1)} - g_2^{(1)}, \ldots, g_1^{(e)} - g_2^{(e)}\big). \quad (14)
\]
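The enclosure in (14) is straightforward to implement with a center/generator-matrix representation of zonotopes. A minimal sketch (assuming, as in (14), that both zonotopes carry the same number of generators):

```python
import numpy as np

def ch_enclosure(c1, G1, c2, G2):
    """Tight zonotope enclosing the convex hull CH(P, Q) of two zonotopes
    P = (c1, G1) and Q = (c2, G2), following (14): the new center is the
    midpoint of the two centers; the new generators are the half-sums of
    paired generators, the half-difference of centers, and the
    half-differences of paired generators."""
    c = 0.5 * (c1 + c2)
    G = 0.5 * np.hstack([G1 + G2, (c1 - c2).reshape(-1, 1), G1 - G2])
    return c, G
```

For identical zonotopes the difference generators vanish and the enclosure reduces to the original set.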
A zonotope with center c and generators g(i) is the set
\[
Z = \Big\{\, x : x = c + \sum_{i=1}^{q} [-1,1]\, g^{(i)} \,\Big\}. \quad (15)
\]
The reachable state set and the input set at step k are represented as zonotopes
\[
X_k = \Big\{\, x_k \in \mathbb{R}^{n} : x_k = \bar{x}_k + \sum_{i=1}^{q} [-1,1]\, \hat{x}_k^{(i)} \,\Big\}, \quad (16)
\]
\[
U_k = \Big\{\, u_k \in \mathbb{R}^{n\times 1} : u_k = \bar{u}_k + \sum_{j=1}^{m} [-1,1]\, \hat{u}_k^{(j)} \,\Big\}. \quad (17)
\]
The according iteration equation for zonotope-based verification is thereby built after substituting generator \hat{x}_k^{(i)} by the generator matrix \hat{X}_k = [\hat{x}_k^{(1)}, \ldots, \hat{x}_k^{(q)}], \hat{u}_k^{(j)} by \hat{U}_k = [\hat{u}_k^{(1)}, \ldots, \hat{u}_k^{(m)}], the Jacobian matrix A by the matrix zonotope \mathcal{A}, and the capacitance matrix C by the matrix zonotope \mathcal{C}. As such
\[
\hat{X}_k = \mathcal{A}^{-1} \Big( \frac{\mathcal{C}}{h} \hat{X}_{k-1} \oplus \hat{U}_k \oplus \hat{L}_k \Big), \quad k = 1, \ldots, N. \quad (18)
\]
What is more, for robustness optimization, the matrix zonotopes \mathcal{A} and \mathcal{C} can be built to consider perturbations from multiple device parameters, such as transistor width sizings in the case of SRAMs. In \mathcal{A}, the interval conductance matrix G can be computed using the interval values of transistor widths, similar to (9). As such, the matrix zonotope can be further interpreted in terms of interval-valued matrices by
\[
\mathcal{A} \subseteq \Big[ A^{(0)} - \sum_i |A^{(i)}|,\; A^{(0)} + \sum_i |A^{(i)}| \Big], \quad (19)
\]
\[
A^{(i)} = \frac{\partial A^{(0)}}{\partial W} \Delta W^{(i)} = \Delta G^{(i)}. \quad (20)
\]
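The interval enclosure (19) is a simple entrywise computation once the per-parameter deviation matrices A^{(i)} are available (here assumed given, e.g., computed as ΔG^{(i)} per (20)):

```python
import numpy as np

def interval_enclosure(A0, dA_list):
    """Interval-matrix enclosure of a matrix zonotope per (19):
    [A0 - sum_i |A(i)|, A0 + sum_i |A(i)|], taken entrywise."""
    spread = sum(np.abs(dA) for dA in dA_list)
    return A0 - spread, A0 + spread
```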
The safety distance from the reachable set to the safe region can itself be represented as a zonotope. With the reachable set written as a zonotope with center c and generators g(i), and p_safe a point of the safe region, the safety-distance set is
\[
D = \Big\{\, d \in \mathbb{R}^{n\times 1} : d = p_{safe} - c - \sum_{i=1}^{q} [-1,1]\, g^{(i)} \,\Big\}. \quad (27)
\]
TABLE II
1: for k = 1, ..., N do
2:   X_{k-1} ← (x̄_{k-1}, X̂_{k-1})
3:   compute x̄_k and linearized matrices C_k, G_k
4:   compute system matrix zonotopes 𝒜 and 𝒞
5:   approximate linearization error L̂_k
6:   if IH(L̂_k) ⊆ [−ε, ε] then
7:     X̂_h = 𝒜⁻¹ (𝒞/h) X̂_{k−1}
8:     X̂_i = 𝒜⁻¹ Û_k
9:     X̂_e = 𝒜⁻¹ L̂_k
10:    X̂_k = X̂_h ⊕ X̂_i ⊕ X̂_e
11:    X_k = (x̄_k, X̂_k)
12:  else
13:    X_{k−1} = split(X_{k−1})
14:    continue
15:  end if
16: end for
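Lines 7–11 of the listing amount to mapping the previous generator matrix through A⁻¹(C/h) and stacking the input generators, which realizes the Minkowski sums. A simplified single-step sketch (plain matrices instead of matrix zonotopes, and the linearization-error term omitted — both are simplifying assumptions, not the paper's full algorithm):

```python
import numpy as np

def reach_step(c, G, C, Gc, h, u_c, U):
    """One implicit-Euler reachability step for C dx/dt + Gc x + u = 0.
    The state set is the zonotope (center c, generator matrix G); the
    input set is (u_c, U). With A = C/h + Gc, the update follows (18):
    X_k = A^-1 ((C/h) X_{k-1} (+) U_k), Minkowski sum done by stacking
    generator columns."""
    A = C / h + Gc
    Ainv = np.linalg.inv(A)
    M = Ainv @ (C / h)
    c_next = M @ c - Ainv @ u_c              # input enters with a minus sign
    G_next = np.hstack([M @ G, Ainv @ U])    # stack generators for (+)
    return c_next, G_next
```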
where the coefficient of each generator, i = 1, ..., q, determines the relative position of the point within the zonotope. Note that the safety distance reduces to zero if the zonotope settles in the safe region. As such, one can utilize it to verify the dynamic stability of the SRAM.
B. Sensitivity of Safety Distance
With the use of reachability analysis by zonotope, the trajectory of the SRAM is obtained, and the sensitivity of the safety distance D is evaluated at the final reachable set
\[
x_{final} = c_{final} + \sum_{i=1}^{q} [-1,1]\, g_{final}^{(i)}
\]
by projecting the zonotope generators g_{final}^{(i)} onto the normal vector (p_{safe} - c_{final})^{T} / \|p_{safe} - c_{final}\|_{2}, which points from the zonotope center c_{final} to the safe region p_{safe}.
As such, one can calculate the large-signal sensitivity S(D, w) of the safety distance D with respect to a device parameter w by
\[
S(D, w) := \frac{\partial D}{\partial w} \quad (30)
\]
with, from the implicit-Euler update \big(\tfrac{C}{h} + G\big) x_k = \tfrac{C}{h} x_{k-1} - u_k,
\[
\frac{\partial x_k}{\partial w} = -\Big(\frac{C}{h} + G\Big)^{-1} \frac{\partial}{\partial w}\Big(\frac{C}{h} + G\Big) \Big(\frac{C}{h} + G\Big)^{-1} \Big(\frac{C}{h} x_{k-1} - u_k\Big). \quad (31)
\]
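When the analytic chain rule in (31) is unavailable (e.g., the safety distance comes from a black-box simulation), the large-signal sensitivity can be approximated numerically. A sketch using a central finite difference (the step size dw is an arbitrary assumption):

```python
def sensitivity(D_fn, w, dw=1e-3):
    """Finite-difference approximation of S(D, w) = dD/dw in (30),
    where D_fn maps a device-parameter value w to the safety distance D."""
    return (D_fn(w + dw) - D_fn(w - dw)) / (2.0 * dw)
```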
V. Experimental Results
Fig. 12. Verification of read operation with threshold variation range of 30%.
(a) Read operation succeeds with 6 ns pulse. (b) Read operation fails with
11 ns pulse.
Fig. 11. Verification of write operation with threshold variation range of
10%. (a) Write operation fails with 0.025 ns writing pulse. (b) Write operation
fails with 0.029 ns writing pulse. (c) Write operation succeeds with 0.050 ns
writing pulse.
reachable sets the smallest, and thus the new trajectories can cover the most Monte Carlo curves.
Finally, when the duration increases to 0.050 ns in Fig. 11(c), all possible states finish the write operation without failure. As shown in Fig. 11, the curves of Monte Carlo verification remain within the reachable sets computed by reachability analysis at similar accuracy. This indicates that reachability analysis succeeds in approximating the trajectories of the SRAM for failure verification.
2) Verification of Read Operation: Next, the read operation can also be verified by reachability analysis. The verification result of the read operation is compared for different durations of the input signal while the Vth variation is set to 30%. The duration of the read signal is set to 6 ns and 11 ns.
As shown in Fig. 12, the Monte Carlo curves are plotted in light purple and the enclosing trajectories drawn by reachability analysis are in dark blue. When the signal duration is 6 ns [Fig. 12(a)], all reachable sets recover back to the initial state after the read operation finishes. But when the signal duration rises to 11 ns, most reachable sets head for the opposite state, which means that read failure happens [Fig. 12(b)]. Due to the limited accuracy of the first-order noise current model in (6), the difference between Monte Carlo and the reachability
recovers from read failure. The optimized widths are [148 nm, 343 nm, 217 nm].
Then, we perform stability optimization for write operation
only. We set the initial pair widths as [W1 , W3 , W5 ] =[400 nm,
500 nm, 350 nm] and reduce the pulse width to 0.050 ns.
The stability optimization by large-signal sensitivity calculated
from reachability analysis can certainly help guide the system
trajectory to converge to the safe region within four iterations
(Fig. 14). The optimized widths are [381 nm, 440 nm, 497 nm].
2) Optimization of Read and Write Failure: To optimize
read and write failure simultaneously, initial transistor pair
widths are randomly chosen as W1 = 200 nm, W3 = 400 nm
and W5 = 400 nm. Pulse width is 9 ns for read operation
and is 0.024 ns for write operation. The process of stability
optimization is shown in Fig. 15.
The optimization directions of the trajectories for the read operation and write operation are shown in Fig. 15(a) and (b), respectively. The trajectory for the initial set of transistor widths, before optimization, is labeled as initial. From Fig. 15(b), one can observe that at the beginning, write failure happens as the trajectory converges to the initial state. With the use of the proposed sensitivity-based reachability analysis for dynamic stability optimization, the trajectory of the write operation moves away from the wrongly converged region and finally reaches the target state after six iterations of tuning the transistor pair sizes. Meanwhile, the read operation in Fig. 15(a) is considered, where read failure did not happen at the beginning. As the write operation is optimized, the trajectory for the read operation deviates upward too. As such, the safety distance to the top-left corner (in this case) is decreased. In other words, the write operation is optimized at the expense of the read operation to achieve a lower failure rate for both cases.
The optimized transistor widths obtained by our approach are finally W1 = 192 nm, W3 = 330 nm, and W5 = 586 nm. The yield rate
\[
Y := 1 - \frac{N_{failure}}{N_{total}}
\]
considering both read and write functions is improved from 6.8% to 99.957%. Further improvement of the yield rate can be achieved by introducing larger threshold variations during the optimization.
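The yield figure quoted above is just the failure-count complement; as a one-liner:

```python
def yield_rate(n_failure, n_total):
    """Yield Y = 1 - N_failure / N_total over read and write checks."""
    return 1.0 - n_failure / n_total
```

For example, 932 failures out of 1000 samples gives Y = 6.8%.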
C. Comparisons
1) SRAM Dynamic Stability Verification: A detailed comparison between zonotope-based reachability analysis and
Fig. 15. Optimization procedure for SRAM dynamic stability. (a) Optimization of read operation. (b) Optimization of write operation.
Monte Carlo method is made for the write operation. Because of the huge time consumption of the Monte Carlo method, we use 1000 samples, which usually takes more than one hour for a single round of verification in our experiments. Different durations of the write signal are considered, as well as different threshold-voltage variations in all transistors. Detailed experimental results are listed in Table III, in which pulse refers to the duration of the input signal, and acceleration is the ratio of the time consumption of Monte Carlo to that of reachability analysis.
As shown in Table III, compared with Monte Carlo, reachability analysis achieves a speedup of more than 800× for 1000 samples. When the write signal duration is set to 0.025 ns [Fig. 11(a)] or 0.050 ns [Fig. 11(c)], only one trajectory is generated by reachability analysis. Linearization is performed around one nominal trajectory, which takes up most of the simulation time; thus the time consumption of reachability analysis is only slightly higher than simulating one sample of Monte Carlo verification. When the signal duration is set to 0.029 ns, the reachable sets are split into different parts and two trajectories
TABLE III
Time Consumption of SRAM Verification
are generated. Therefore the runtime of reachability verification doubles and the speedup ratio reduces by half when the signal lasts 0.029 ns and 10% Vth variations are introduced [Fig. 11(b)]. For all experiment cases listed in Table III, reachability analysis achieves similar accuracy to the Monte Carlo method in reporting the failure region.
2) SRAM Dynamic Stability Optimization: The runtime of optimization at each iteration is listed in Table IV, where more than 600× runtime speedup is achieved by our approach.
guide the SRAM design so that operations depart from the unsafe region and converge in the safe region. In addition, compared to the traditional single-parameter small-signal sensitivity optimization, our method converges faster with higher accuracy. Compared to Monte Carlo-based optimization, our method achieves speedups of up to 600× with similar accuracy.
2013 IEEE 11th International Conference on Dependable, Autonomic and Secure Computing
I. INTRODUCTION
SRAM in CMOS processes is widely used in space exploration and high-energy physics experiments. Since a large number of cosmic rays and charged particles (α particles, high-energy protons, neutrons, and so on) exist in these application environments, SRAM is especially sensitive to the ionizing radiation they create, which easily causes storage-bit upsets, decreased data access rate, timing disorder, large parameter shifts, and increased power consumption [1]. The lifetime and stability of SRAM are seriously degraded by radiation effects [2]. Therefore, the research and design of radiation-hardened SRAM is very necessary.
Figure 1. System structure of the radiation-hardened SRAM: a 24Kb×13bit SRAM (memory array, replication array, address decoding, sequential control, driver, precharge and sense amplifiers) together with the EDAC writing circuit (encode), EDAC reading circuit (decode), and EDAC test circuit; the 8 write-data bits are extended with 5 redundancy bits to 13 bits, and reading returns 8 data bits plus a 2-bit error flag.
(a) Schematic of the storage cell (wordline WL, bitlines BL/BLN, transistors PM1–PM4 and NM1–NM2).
A. System Design
The system structure of the designed radiation-hardened SRAM proto chip is shown in Figure 1. To make up for the deficiency of the commercial CMOS process's anti-SEU ability, EDAC (Error Detection and Correction) is integrated in the SRAM to realize real-time processing of the storage data. The encoding algorithm of the EDAC module is the (13, 8) Hamming code, i.e., 8 bits of data and 5 redundancy bits (13 bits in total). When data is written, the 5 redundancy bits are created by the EDAC writing circuit and stored in the SRAM together with the data. When the data is read, the data and the 5 redundancy bits in the SRAM come into the EDAC decoding and error-correction circuit together. The circuit first decides whether the data is correct. If it is right, it is output directly. If one bit is wrong, it is corrected and then output. If two bits are wrong, the error is reported to the system through the error flag. The error flag is made up of two bits: one shows whether the error was corrected, the other shows whether the data is valid. When the number of error bits is greater than or equal to 2, the data is useless. Therefore, the built-in EDAC module can correct or monitor upsets of the storage data and improve the SRAM's ability to tolerate SEU.
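A (13, 8) code with 5 check bits can provide single-error correction and double-error detection, matching the behavior described above (correct one wrong bit, flag two). The paper does not give its exact code construction, so the sketch below uses one common SEC-DED construction — Hamming(12, 8) plus an overall parity bit — as an assumption:

```python
def hamming13_encode(data_bits):
    """Encode 8 data bits into a 13-bit SEC-DED codeword: positions 1-12
    form a classic Hamming(12, 8) code (parity at positions 1, 2, 4, 8),
    and position 13 is an overall parity bit for double-error detection."""
    code = [0] * 13
    data_pos = [3, 5, 6, 7, 9, 10, 11, 12]      # non-power-of-two positions
    for bit, pos in zip(data_bits, data_pos):
        code[pos - 1] = bit
    for p in (1, 2, 4, 8):                      # parity over positions containing bit p
        par = 0
        for pos in range(1, 13):
            if pos & p and pos != p:
                par ^= code[pos - 1]
        code[p - 1] = par
    code[12] = sum(code[:12]) % 2               # overall parity bit
    return code

def hamming13_decode(code):
    """Return (data_bits, status): 'ok', 'corrected' (single error fixed),
    or 'double' (two errors detected, data unusable)."""
    syndrome = 0
    for pos in range(1, 13):
        if code[pos - 1]:
            syndrome ^= pos                     # XOR of positions of set bits
    overall = sum(code[:12]) % 2
    code = list(code)
    if syndrome == 0 and overall == code[12]:
        status = 'ok'
    elif overall != code[12]:                   # odd number of errors: correct one
        if syndrome:
            code[syndrome - 1] ^= 1             # error among positions 1-12
        else:
            code[12] ^= 1                       # error in the overall parity bit
        status = 'corrected'
    else:                                       # nonzero syndrome, parity agrees
        status = 'double'
    data_pos = [3, 5, 6, 7, 9, 10, 11, 12]
    return [code[p - 1] for p in data_pos], status
```

The two statuses 'corrected' and 'double' play the role of the two-bit error flag (error corrected / data invalid).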
III. THE REALIZATION AND VERIFICATION OF THE RADIATION-HARDENED SRAM PROTO CHIP
Figure: Die photos of the four chip versions (Chip A–D), with filler regions and the EDAC writing (E), reading (F), and test (G) circuit blocks marked.
TABLE I. RHBD methods of the chip versions
Method | Chip A | Chip B | Chip C | Chip D
EDAC | Yes | Yes | Yes | Yes
Time control | Replication array | Replication array | Replication array | Replication array
Storage cell (NMOS) | Ring gate | Ring gate | Ring gate | Bar gate
Guard ring | No | N-type | P-type | No
Distance | 2 µm | 2 µm | 2 µm | 0 µm
IV.
TABLE II. Electrical properties of the chip versions
Property | Chip A | Chip B | Chip C | Chip D
Chip area | 5.24 mm² | 5.24 mm² | 5.24 mm² | 1.86 mm²
Frequency | 50 MHz | 50 MHz | 50 MHz | 50 MHz
Access time | 7.6 ns | 7.6 ns | 7.6 ns | 3.9 ns
Static power | 0.49 µW | 0.49 µW | 0.49 µW | 0.41 µW
Dynamic power | 7.38 mW | 7.38 mW | 7.38 mW | 5.40 mW
V. CONCLUSION
Fig. 4. The static current increases with the total ionizing dose.
Study and mechanism of static scanning laser fault isolation on embed SRAM function fail
Changqing Chen, Huipeng Ng, Ghinboon Ang, J. Lam, Zhihong Mai
GLOBALFOUNDRIES Pte. Ltd., Woodlands Industrial Park D Street, Singapore
Changqing.CHEN@globalfoundries.com
2014 IEEE 21st International Symposium on the Physical and Failure Analysis of Integrated Circuits (IPFA)
978-1-4799-3929-9/14/$31.00 © 2014 IEEE

Abstract
As technology keeps scaling down and IC design becomes more and more complex, failure analysis becomes more and more challenging, especially static laser analysis. For foundry FA and process monitoring, SRAM analysis becomes more and more critical. There are two reasons for this. The first is that the SRAM circuit is relatively simple and well known to all; it is also used by the fab as a monitoring structure. The second is that the SRAM percentage on chip keeps increasing, and SRAM can occupy a large share of the chip area for most logic products, which is another reason we use SRAM to monitor our process. SRAM analysis with a bit map is relatively easy for FA. But as DFT becomes more popular and the BIST technique is applied to SRAM, frequently no bit map is provided, and a global fault isolation methodology must be employed in the SRAM FA. In this paper, a static scanning laser methodology was applied to an SRAM FA case for which no bit map was provided. A hot spot was observed at the SRAM block edge for some failed units, but not for others. Combined with SRAM schematic and GDS analysis, the defect was successfully found and the failure mechanism was studied, which successfully links the electrical phenomenon to the physical defect. We also found the process issue from the FA result.

Background information
Since the SRAM percentage on chip keeps increasing, and SRAM is also the key process-development and process-monitoring structure, failure analysis on SRAM is quite critical. FA on SRAM with a bit map is quite common and much more straightforward. But as technology goes forward and DFT and BIST are widely applied in IC design, failure analysis becomes more and more challenging on SRAM without a bit map. A global fault isolation methodology must be employed for this kind of failure mode. But most of the time, DC bias does not show any significant difference between good and failed units, because either the defect location cannot be accessed by DC bias, or the defect only induces a small current change which is concealed by the overall current. Based on our study, global fault isolation can still be applicable in the second situation. In this paper, an embedded SRAM BIST fail was handled. Although comparable IV curves were observed, global fault isolation (TIVA analysis) was still applied for the fault isolation.

Experiment and Discussion
An embedded SRAM BIST fail was observed in one of our products; no bit map capability was built up for this product. The SRAM is a normal 6T SRAM. DC IV measurement was performed on Vdd and Vss; no significant difference was observed. TIVA analysis was performed on several failed units and compared with a good unit. A distinct TIVA spot was observed on some failed units compared with the reference one. One more observation is that all of the TIVA spots locate at the SRAM block edge, as shown in the hot-spot figure. Based on our experience, this kind of solid spot should be a real one: either the defect location itself, or something with some kind of relation to the defect. We selected one unit for PFA from the top down. Nothing abnormal was observed from the top metal down, and PVC also shows no anomaly at the hot-spot location at any layer.

Figure: hot spot of the fault isolation.

Further deprocessing was performed. The poly was exposed by BOE. Gross W extrusion was observed at the spot location. Meanwhile, gross W extrusion was also observed within the SRAM block.

Figure: PFA shows a solid W short at the SRAM edge.

One more PFA observation is that all of the W extrusion locations sit in a specific location: bit line contact to bit line contact short. From the process point of view, it is quite easy to understand the root cause of this defect; that means we can easily link the defect to our process. But how can we link the defect with our electrical result? Why does the hot spot always locate at the SRAM block edge? And why do only some of the failed units have the hot spot?

To answer these questions, in-depth analysis of the circuit and layout of this device was employed. Before that, we selected one failed unit without a hot spot to do random PFA on the SRAM region. As expected, nothing abnormal was observed in the BEOL, but gross W extrusion was also observed in the SRAM block. Then one more question comes out: why can this gross W extrusion induce a hot spot in some units while not in others? What is the reason behind this?

SEM top-down inspection was compared between the sample with a hot spot and the one without. For the sample with a hot spot, there is a solid W short between a bit line contact and the neighboring contact at the SRAM block edge. For the sample without a hot spot, although there is W extrusion, no solid W short happens at the SRAM block edge.

Figure: PFA shows no solid W short at the SRAM edge for the sample without a hot spot.

In-depth circuit and layout analysis shows that the block-edge contact which shorts with the bit line contact is connected with Vss. The Vss in the block edge is highlighted in the GDS layout figure.

Figure: GDS layout of the failed location and analysis.

In the center of the SRAM block, a bit line to bit line short was observed, but this short cannot be accessed by DC bias, since we can only bias Vdd and Vss. But how about the edge bit line, which is short to Vss based on our GDS layout analysis? Can this kind of short be accessed by normal DC bias, and can it induce a hot spot?

To answer this question, we made a detailed study of the SRAM peripheral schematic. There is an equalizer PMOS whose source is connected with Vdd, as highlighted in the figure. Normally this PMOS is turned off, and the bit line cannot be directly accessed by Vdd. But after detailed analysis of this circuit, combined with the GDS layout, we can confirm this PMOS is turned on. The reason is as follows: under DC bias (Vdd and Vss), this PMOS source is connected with Vdd while the gate is floating. But we must bear in mind that this PMOS sits in the NWELL, and the NWELL is connected with Vdd. So even though the gate is floating (or connected somewhere), the transistor is still turned on. For a bit line to bit line short of a center cell, both bit lines are short to Vdd, so there is no current flow and no spot can be triggered.

Figure: traditional SRAM circuit analysis.

But for the edge bit line, the situation is different: it is a bit line short to Vss. As shown in the figure, there is a balance PMOS connected with the bit line. If we pull out a single SRAM cell for circuit mechanism analysis, we can easily find there is a current flow from transistor Mp to Vss when the gate of PMOS Mp is floating. That is the reason why the hot spots always locate at the SRAM edge. If there is no solid short at the SRAM edge, there is no current flow path, so no hot spot can be triggered. That is why there is no hot spot in some of the failed dies.

Figure: SRAM single bit and analysis.

Further cross-section analysis shows the W extrusion short.

Figure: cross-section result shows the W short.

Sample selection is very important: as shown in this paper, not all the failed samples have a hot spot. In this paper, a BIST functional-fail SRAM was analyzed. Since no bit map was provided, the normal global fault isolation method was applied in the analysis, and the defect and root cause were successfully found. This is a good reference for this kind of functional failure analysis.
I. INTRODUCTION
SRAM memories take up to 80% of the total die area and up to 70% of the total power consumption of high-performance processors [1]. Therefore, there is a crucial need for designing high-performance, low-leakage, and robust SRAMs. Unfortunately, in scaled technologies, particularly under scaled supply voltages, the read and write stabilities of SRAMs are affected by process variations. Due to the large number of small-geometry transistors in a memory array, process variations have a significant impact, leading to possible read, write, and access failures, particularly at lower supply voltages. Furthermore, in the conventional 6T SRAM, the conflict between read and write stabilities is an unavoidable design constraint that aggravates the effect of process variations on SRAM stability and performance.
To improve SRAM cell functionality, several solutions have been proposed from the device to the architecture level. For instance, the use of new devices such as FinFETs leads to a significant performance improvement [2-5]. At the cell level, new cells such as 7T, 8T, 9T, 10T, and 11T [6-12] have been proposed. At the architecture level, read and write assist techniques proposed in the literature can improve SRAM robustness and performance while occupying less area compared to cell techniques such as 8T and 10T, and they can be used with any type of SRAM.
The standard 6T-SRAM cell shown in Fig. 1 consists of two back-to-back inverters (two pull-up PMOS transistors and two pull-down NMOS transistors) and two NMOS access transistors connected to the bitlines, with their gates connected to the wordline. During read, the wordline is asserted and the voltage difference between the bitlines is sensed using a sense amplifier. The read cycle is done via the access transistors and pull-down transistors. Stronger pull-down transistors (PDL and PDR) and weaker access transistors improve the read static noise margin (RSNM). On the other side, stronger access transistors and weaker pull-up transistors improve the write margin (WM) of the bit-cell. Through upsizing, the SRAM cell can operate at lower supply voltages (i.e., lower VDDmin) with minimized threshold-voltage variation, at the penalty of increased area.
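The sizing trade-off in the last paragraph is commonly summarized by two first-order ratios — the cell (beta) ratio for read stability and the pull-up ratio for writability. A tiny helper (standard textbook metrics, not taken from this paper; equal channel lengths assumed):

```python
def cell_ratios(w_pd, w_ac, w_pu):
    """First-order 6T sizing metrics: a larger cell ratio (PD/AC width)
    helps RSNM, while a smaller pull-up ratio (PU/AC width) helps the
    write margin."""
    return {"cell_ratio": w_pd / w_ac, "pullup_ratio": w_pu / w_ac}
```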
increase in the area of the SRAM cell compared to the standard 6T-SRAM cell. Therefore, we consider two improvements in our proposed SRAM cell: first, lowering the total area, and second, improving the write margin compared to the conventional 7T-SRAM cell.
Fig. 9. Write time comparison for the proposed SRAM cell, 6T-SRAM, and conventional 7T-SRAM: a) writing 1, b) writing 0.
achieved. To improve the write margin of the cell when Q holds 0, a smaller-sized ACR ameliorates the write margin while having no effect on other parameters such as read margin or access time. Since the write margin of the cell when Q holds 1 is large enough, the size of PUL can be increased while the size of PUR can be decreased. Also, considering that enlarging ACL can improve the write margin, it can be used as an important design factor. During read, larger PDL and NF improve the read noise margin, while a stronger ACL deteriorates the RSNM.
a. Write Margin
Fig. 8. Read static noise margin of the proposed 7T-SRAM and the conventional 7T-SRAM.
voltage of the PMOS transistor PUR; thus, PUR turns on and helps the stored data to flip faster. This leads to an intrinsic improvement in write margin and enables the SRAM cell to work properly at ultra-scaled supply voltages near/sub-threshold. However, as we showed in Fig. 3, for the conventional 7T-SRAM cell, M3 and M5 contest to discharge and charge the storage node Q, respectively, while M2 keeps the QB node low and fights with M4 and M6. This increases the short-circuit power consumption of the conventional 7T-SRAM during write 0.
Assuming a 0 on the storage node Q, as shown in Section II, the conventional 7T-SRAM improves the write margin significantly, due to a mechanism similar to that of the proposed 7T-SRAM cell mentioned in the previous case. However, the minimum write noise margin is defined by the previous case (i.e., when node Q holds a 1), which was slightly worse than the standard 6T-SRAM cell. For the proposed 7T-SRAM cell, however, the write margin improvement is limited by the write margin of the circuit when Q holds a 0. The equivalent circuit of the cell is shown in Fig. 6.b. When the voltage on node QB is lower than Vdd−Vtp, both transistors PDL and NF become weaker, which results in a faster data flip. Note that to improve the write margin of the 7T-SRAM cell, careful sizing is required, which will be discussed in the following sub-section.
Table 1. Transistor widths of the 6T-SRAM, conventional 7T-SRAM, and proposed 7T-SRAM cells.

6T-SRAM cell: ACL 180n, ACR 180n, PUL 150n, PUR 150n, PDL 230n, PDR 230n
Conventional 7T-SRAM: M1, M2 230n; M3 150n; M4 150n; M5 180n; M6 180n; M7 230n
Proposed 7T-SRAM: ACL 180n, ACR 180n, PUL 150n, PUR 120n, PDL, NF 230n, PDR 200n
the node QB. This negative feedback does not allow QB to go lower, and the data-flipping pace becomes slower. This results in a significantly improved read noise margin. On the other hand, when node QB keeps a 0, by assuming a bump in voltage at this node, the discharging path from node Q to ground is weak because the gate of transistor NF is connected to node QB. Therefore, the RSNM of the proposed circuit improves significantly.
Retention (Hold) mode: During hold mode, when both WWL and WL are set to 0, the proposed 7T-SRAM circuit deteriorates the hold static noise margin (HSNM) by at least 20.2% at VDD=0.8V compared to the conventional 7T-SRAM. The degradation of HSNM is attributed to the case in which Q holds 1, while for the other case (Q=0), the maximum noise margin is achieved.
Write time: In order to find the write time, the time between asserting WL and the storage node reaching 80% of its final value is measured. As shown in Fig. 9, the proposed SRAM cell improves the write time by at least 10% and 2% compared to the conventional 7T-SRAM at VDD=1V when writing 1 and 0, respectively. The maximum write time improvement compared to the conventional 7T-SRAM occurs at VDD=0.4V when writing 1 and at VDD=0.3V when writing 0, by 81% and 37%, respectively. In comparison to the 6T-SRAM cell, the proposed 7T-SRAM cell shows at least 3% improvement when writing 1 at VDD=0.3V and 8% when writing 0 at VDD=1V. The maximum improvement in write time compared to the 6T-SRAM occurs at VDD=0.4V, by 17% and 71% for writing 1 and 0, respectively.
As seen in Fig. 9, the conventional 7T-SRAM shows a larger write time when writing 1. In our simulation, the effect of wordline capacitance on write time has been neglected. Since the conventional 7T-SRAM cell suffers from a larger wordline capacitance, taking this into consideration results in further degradation in write time. In contrast, the proposed 7T-SRAM cell has a wordline capacitance similar to the 6T-SRAM cell.
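The 80%-of-final-value measurement described above can be sketched as a small post-processing routine over a simulated transient. This is an illustrative assumption of how such a measurement could be scripted, not the authors' own tooling:

```python
import numpy as np

def write_time(t, v_node, t_wl, frac=0.8):
    """Write time as defined in the text: the delay from asserting the
    wordline (at t_wl) until the storage node reaches `frac` (80%) of
    its final value. `t` and `v_node` form a sampled transient."""
    v_start = np.interp(t_wl, t, v_node)
    v_final = v_node[-1]
    target = v_start + frac * (v_final - v_start)
    mask = t >= t_wl
    if v_final >= v_start:          # rising node (writing 1)
        idx = np.argmax(v_node[mask] >= target)
    else:                           # falling node (writing 0)
        idx = np.argmax(v_node[mask] <= target)
    return t[mask][idx] - t_wl
```

The same routine works for rising and falling storage nodes, so it covers both the write-1 and write-0 cases compared in Fig. 9.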
Sizing: In this part, we investigate the effect of transistor sizing on different parameters of the proposed cell such as RSNM, WM, and HSNM. The sizing information of the 6T-SRAM, the conventional 7T-SRAM, and the proposed 7T-SRAM cells is tabulated in Table 1. For the proposed 7T-SRAM cell, increasing the size of NF, PDR, and PDL improves the RSNM and HSNM, similar to the 6T-SRAM cell, while increasing the size of the access transistors degrades the RSNM of the cell.
IV. CONCLUSIONS
In this paper, a novel 7T-SRAM cell was proposed that improves the write and read noise margins along with the write time of the cell. The proposed circuit can employ any type of write-assist technique to improve the write margin further. The proposed cell improves the write margin of the 6T-SRAM cell by 27% and 14% when writing 0 and 1, respectively. The proposed cell improves the RSNM by 2.2X and 2.5% compared to the standard 6T-SRAM and conventional 7T-SRAM cells, respectively.
REFERENCES
[1] M. Horowitz, "Scaling, power, and the future of CMOS," in IEDM Tech. Dig., pp. 9–15, Dec. 2005.
I. INTRODUCTION
Lowering the power consumption is a continuous task as IC technologies keep advancing. Dynamic power consumption of CMOS circuits is mainly a consequence of charging and discharging the nodes of the circuit. Static power consumption, on the other hand, is always present when there are voltage differences over some components of the circuit. Both of these can be reduced by lowering the supply voltage (Vdd), and voltage scaling has become an effective method for reducing power consumption in commercial CMOS devices [1]. If Vdd is lowered from the nominal value towards the transistor threshold voltage (Vth), transistors keep operating normally, but the currents through devices become smaller and therefore state switching becomes slower. In Near-Threshold Computing (NTC), Vdd is lowered from the nominal value, but it is kept above Vth in order to keep the Ion/Ioff ratio of transistors large enough for reliable operation.
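A rough, assumption-laden sketch of why the Ion/Ioff ratio shrinks as Vdd approaches Vth: if the drain current gains about one decade per subthreshold swing (SS) of gate drive, the on/off ratio over a gate swing of Vdd is roughly 10^(Vdd/SS). The 80 mV/decade figure below is an assumed value, not one taken from the text:

```python
def ion_ioff_decades(vdd_v, ss_mv_per_decade=80.0):
    """Indicative Ion/Ioff estimate: modeling the whole gate swing as
    subthreshold conduction at a fixed SS mV/decade, the on/off ratio
    is about 10**(Vdd/SS). The 80 mV/dec default is an assumption."""
    return 10.0 ** (vdd_v * 1000.0 / ss_mv_per_decade)
```

Under these assumptions, halving Vdd removes half the decades of on/off separation, which is why NTC keeps Vdd above Vth rather than scaling all the way down.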
If NTC is applied to CMOS technologies with a low price point, reasonably priced ASIC devices can be built for low-performance purposes. Also, if the application the circuit is used for can operate at a low clock frequency, voltage scaling can provide significant power benefits.
8T SRAM is by nature a more reliable memory cell structure than 6T SRAM [2]. Therefore, it is a reasonable choice when
TABLE I. SRAM cell transistor dimensions [width/length (nm)].

design: P1 & P2 | N1 & N2 | A1 & A2 | R1 & R2
6T SRAM: 150/200 | 350/200 | 250/200 | -
8T SRAM: 150/200 | 250/200 | 350/200 | 300/200
Fig. 1. 6T (a) and 8T (b) SRAM memory cells
[Fig. 2 residue: write-driver and read-circuit schematics; device sizes shown include 600/200, 300/200, and 150/1000 nm.]
Fig. 2. (a) Write and (b) read circuits used for both SRAM designs [width/length (nm)].
was 0 stored in the cell before and when there was 1 stored in the cell before, for each write, and the mean power was calculated from them. The dynamic
power consumption of a single operation was estimated from
the equation P = E_operation / t,
where t is the period of the clock cycle, which was 1 s in
our case. Eoperation is the energy consumption of SRAM cell
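The mean-power estimate can be sketched numerically. The per-operation energies below are hypothetical placeholders; only the 1 s clock period is the value quoted in the text:

```python
# Sketch of the mean dynamic power estimate P = E_operation / t.
# The per-operation energies are hypothetical placeholders; the 1 s
# clock period is the value quoted in the text.
energies_fj = [1.8, 2.1, 1.9, 2.4]   # assumed per-operation energies (fJ)
t_clk = 1.0                          # clock period t, in seconds
powers_w = [e * 1e-15 / t_clk for e in energies_fj]
mean_power_w = sum(powers_w) / len(powers_w)
```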
Fig. 3. Power consumption of a single SRAM cell, stat parameters, distributions of 10 000 Monte Carlo simulations, over temperatures from -35 °C to 90 °C. Leak power of 6T SRAM (a) and 8T SRAM (b), each measuring two cases: 0 stored, and 1 stored in the cell. Dynamic power of 6T SRAM (c) and 8T SRAM (d), all different initial values and read/write operations concerned. Box edges are at 25th and 75th percentiles; whiskers have a maximum length of 1.5 times the box height (~99.3 % coverage).
Fig. 4. Leak (a) and dynamic (b) power consumption of a single SRAM cell, 10 000 Monte Carlo simulations. Total power of 6T SRAM array in relation
to 8T SRAM array, calculated from means (c) and medians (d) of simulation results; example page of 128 B, 16 bit read or write, activity 100 %.
V. AREA COMPARISON
Abstract: Low Voltage Low Power (LVLP) 8T, 11T, 13T and ZA SRAM cells are designed using the dynamic-logic SRAM cell. The SRAM cells are implemented using the pass-transistor logic technique, with the focus mainly on the read and write operations. The circuits are designed using the DSCH2 circuit editor and their layouts are generated with the MICROWIND3 layout editor. The Layout Versus Simulation (LVS) design has been verified using BSIM4 at 65nm technology with a supply voltage of 0.7V. The simulated SRAM layouts are verified and analyzed. The 8T SRAM gives a power dissipation of 0.145 microwatts, a propagation delay of 37.2 picoseconds, an area of 14 x 8 micrometers and a throughput of 4.037 nanoseconds.
Keywords: power dissipation, delay, throughput, SRAM cell
I. INTRODUCTION
Low-power operation has become of crucial importance in VLSI design. One way of obtaining power reduction is by lowering the power supply, which has proven effective. One of the essentials of IC design techniques is to lower the power of memory circuits with a minimum tradeoff on performance. The VLSI industry is constantly striving towards achieving high-density, high-speed and low-power devices in CMOS technology. As the size of the transistor is reduced to about 70% of its earlier version with each new technology, the density of devices on chip has increased and a reduction in delay time has been obtained to satisfy the demand for high performance.
Memory circuits like the SRAM cover a considerable area in the design of digital ICs. Arun Ramnath Ramani and Ken Choi [1] have shown that it is possible to push the design of low-power SRAMs into the sub-threshold region, comparing various parameters such as speed, power consumption and average power-delay product. Yashwant Singh and D. Bhoolchandani [2] have focused on the design of an SRAM cell with dynamic Vt and dynamic standby voltage to mitigate leakage power dissipation. Simulation results show a significant reduction in power dissipation in the standby mode of the SRAM cell. An 8T-CDC column-decoupled SRAM was designed using a half-select-free design by Rajiv V. Joshi et al. [3], which enabled enhanced voltage scaling capabilities and a 30%–40% power reduction in comparison to standard 6T techniques. A 10T SRAM cell circuit was designed by Takahiko et al. [4] considering the static noise margin (SNM). This circuit was
II. DESIGN METHOD
Fig. 3(a) 8T
Fig. 3(c) 13T
Simulated results for the four SRAM cells (first set):

Type | VO (V) | ID (mA) | PD (W) | Propagation Delay (s) | PDP (J) | A = W x H | Throughput
8T | 0.7 | 0.06 | 0.145 x 10^-6 | 37.16 x 10^-12 | 5.388 x 10^-18 | 14 x 8 µm | 4.372 ns
11T | 0.69 | 0.06 | 0.101 x 10^-6 | 8.70 x 10^-11 | 8.787 x 10^-18 | 17 x 8 µm | 4.087 ns
13T | 0.67 | 0.25 | 1 x 10^-3 | 8.41 x 10^-10 | 8.41 x 10^-13 | 20 x 8 µm | 6.841 ns
ZA | 0.67 | 0.03 | 6 x 10^-9 | 8.50 x 10^-10 | 5.1 x 10^-18 | 16 x 8 µm | 8.85 ns

Simulated results for the four SRAM cells (second set):

Type | VO (V) | ID (mA) | PD (W) | Propagation Delay (s) | PDP (J) | A = W x H | Throughput
8T | 0.7 | 0.06 | 0.145 x 10^-6 | 3.72 x 10^-11 | 5.388 x 10^-18 | 14 x 8 µm | 4.037 ns
11T | 0.68 | 0.06 | 0.101 x 10^-6 | 7.00 x 10^-11 | 7.07 x 10^-18 | 17 x 8 µm | 4.07 ns
13T | 0.69 | 0.25 | 1 x 10^-3 | 9.25 x 10^-10 | 9.25 x 10^-13 | 20 x 8 µm | 6.925 ns
ZA | 0.66 | 0.03 | 606 x 10^-9 | 8.50 x 10^-10 | 5.151 x 10^-16 | 16 x 8 µm | 8.85 ns
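As a sanity check, the tabulated power-delay product is simply the product of the power-dissipation and propagation-delay columns; for the 8T row:

```python
# Power-delay product check against the 8T table row: PDP = PD * delay.
pd_w = 0.145e-6      # power dissipation, 0.145 uW
t_pd_s = 37.16e-12   # propagation delay, 37.16 ps
pdp_j = pd_w * t_pd_s  # close to the tabulated 5.388 x 10^-18 J
```

The other rows multiply out the same way, which confirms how the PDP column was derived.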
Fig. 3(d) ZA
A condition of 0 at the control input defines the pre-charge phase, where the PMOS is conducting while the NMOS is cut off, as seen in the timing diagrams of Fig. 3(a)-3(d). The switching of the control input between 0 and 1, and its effect on the 8T, 11T, 13T and ZA circuits, is seen in the timing diagrams of Fig. 3(a)-3(d). In Part III of the paper, the results obtained using the above techniques achieve low power dissipation, less delay and better speed.
III. RESULTS AND DISCUSSION
A VLSI circuit can be viewed as a complex maze of paths that influence the whole circuit function. When the design is not appropriately simulated, faults can ruin the whole circuit operation. In order to ensure the correctness of the circuit, there are a few ways to minimize these effects. One method of locating a fault is to place analysis data on certain probe points and follow the data flow through the circuit. The output points can later be observed to determine whether the circuit has handled the data in an appropriate manner. The observation of this data as it flows through the computer representation of a circuit is called simulation, and the selection of input data is known as test-vector generation. Our SRAM cell is designed in context with
Simulated results (read/write cases):

Type | VO (V) | ID (mA) | PD (W) | Propagation Delay (s) | PDP (J) | A = W x H | Throughput
8T | 0.56 | 0.056 | 7.817 x 10^-6 | 3.72 x 10^-11 | 2.907 x 10^-16 | 14 x 8 µm | 4.0372 ns
11T | 0.66 | 0.056 | 2.341 x 10^-6 | 3.60 x 10^-11 | 8.427 x 10^-17 | 17 x 8 µm | 4.036 ns
13T | 0.68 | 0.249 | 7 x 10^-6 | 9.09 x 10^-10 | 6.363 x 10^-15 | 20 x 8 µm | 6.909 ns
ZA | 0.67 | 0.249 | 906 x 10^-9 | 9.00 x 10^-10 | 8.154 x 10^-16 | 16 x 8 µm | 8.9 ns

Type | VO (V) | ID (mA) | PD (W) | Propagation Delay (s) | PDP (J) | A = W x H | Throughput
8T | 0.56 | 0.056 | 7.817 x 10^-6 | 3.72 x 10^-11 | 2.907 x 10^-16 | 14 x 8 µm | 4.0372 ns
11T | 0.54 | 0.056 | 9.061 x 10^-6 | 1.04 x 10^-10 | 9.423 x 10^-16 | 17 x 8 µm | 4.104 ns
13T | 0.68 | 0.249 | 7 x 10^-6 | 9.42 x 10^-10 | 6.594 x 10^-15 | 20 x 8 µm | 6.942 ns
ZA | 0.67 | 0.249 | 1.306 x 10^-6 | 9.84 x 10^-10 | 1.285 x 10^-15 | 16 x 8 µm | 8.984 ns
Comparison with prior work:

Design | PD (W) | Propagation Delay (s) | PDP (J)
Proposed circuit | 0.145 x 10^-6 | 37.2 x 10^-12 | 2.907 x 10^-16
Ref [1] | - | - | -
Ref [2] | 0.0687 x 10^-6 | 1018.3 x 10^-12 | 1.347 x 10^-18
Fig. 4(d) ZA
CONCLUSION
This design has been implemented for low-voltage low-power applications, where it tries to reduce the power consumption of the SRAM cell circuit. The output from this simulation does not fully meet expectations, owing to the many problems faced during the simulation process. This shows that, during the design process, all aspects stated in the manual need to be followed as a guideline in order to design a better circuit. According to the dynamic design concept, this SRAM cell achieved lower power consumption, lower delay, higher speed and higher throughput than existing SRAM cells.
REFERENCES
Fig. 4(a) 8T
I. INTRODUCTION
[Fig. 1 residue: cross-sections of the FinFET and bulk MOSFET devices, showing front gate, back gate, source, drain, buried oxide (BOX), and Si substrate.]
II.
Fig. 2. The I-V characteristics of (a) FinFET and (b) bulk CMOS transistors (VGS stepped from 0.4 V to 0.9 V, at 25 °C) used.
The width of the bulk CMOS transistor is 71nm while the fin
height and fin thickness of the FinFET transistor are 28nm and 15nm, respectively, which makes its width equal to that of the bulk CMOS transistor. Table I shows the characteristics of the FinFET transistor used. All simulations have been performed in HSPICE using Predictive Technology Models (PTMs) [11].
TABLE I. FinFET device characteristics.
Technology (nm): 20
Lg (nm): 24
EOT (nm): 1.1
Tfin (nm): 15
Hfin (nm): 28
NSD (cm-3): 3e26
VDD (V): 0.9
SS (mV/decade): 71
DIBL (mV/V): 58

A. I-V characteristic
Fig. 2 shows ID versus VDS for the bulk CMOS and FinFET transistors as VGS changes from 0V to 0.9V. Two features can be derived from the strong-inversion region: the ON-current level and the output resistance. The ON current in the FinFET is higher. Besides, it has a higher output resistance (less channel-length modulation). This is due to the fact that the channel is surrounded on three sides in the FinFET, which provides better gate control in this type of transistor.
C. Subthreshold Swing
Fig. 4 also shows that the subthreshold swing (SS) of the FinFET is 21% lower than that of the bulk CMOS transistor at room temperature. This indicates a stronger dependence of the drain current on VGS in the FinFET transistor. Considering the subthreshold I-V relation, where the drain current changes exponentially with VGS [12], the drain current in the FinFET responds to VGS at a faster pace than in bulk CMOS.
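The SS extraction implied here, the slope of VGS against log10(ID) in the subthreshold region, can be sketched as:

```python
import numpy as np

def subthreshold_swing(vgs_v, id_a):
    """Subthreshold swing SS = dVGS/d(log10 ID), in mV/decade,
    estimated by a linear least-squares fit over subthreshold
    I-V samples (vgs_v in volts, id_a in amperes)."""
    volts_per_decade = np.polyfit(np.log10(id_a), vgs_v, 1)[0]
    return volts_per_decade * 1000.0
```

Applied to the simulated I-V data, this is the quantity reported as 71 mV/decade for the FinFET in Table I.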
Fig. 5 shows the effect of temperature on SS. The temperature is changed from -40°C to 125°C and SS is calculated for both transistors.
Fig. 4. Drain current versus Gate Source voltage for FinFET and bulk CMOS
while VDS is 0.1V and 1.1V
Fig. 3. ION/IOFF ratio versus supply voltage for FinFET and bulk CMOS
transistors
Fig. 7. Drain current versus gate source voltage, while drain voltage is VDD for
channel length from 24nm to 54nm
Fig. 5. Subthreshold swing versus temperature for FinFET and bulk CMOS
Fig. 8. Threshold voltage versus channel length for bulk CMOS and FinFET
Fig. 6. Drain current versus gate source voltage while drain voltage is VDD
Fig. 11. Drain current versus gate-source voltage for different temperatures (-40°C to 125°C) for FinFET and bulk CMOS.
to a common path for read and write operations, there are design trade-offs in the strength of the transistors in the 6T SRAM cell. While reading the cell, pull-down transistors (PD1-PD2) stronger than the access transistors (AC1-AC2) increase the reliability of the cell. Conversely, when performing the write operation, access transistors stronger than the pull-down and pull-up transistors ease the operation. In the hold mode, equal strength for the pull-up and pull-down transistors ensures the best reliability.
Due to the very high process variation and low noise margin of the 6T SRAM, the 8T SRAM cell is used, with separate lines for read and write [15]-[17]. This obviates the trade-off between the read and write cycles. To make it clearer: stronger access transistors are required in order to improve the write margin, while weaker access transistors are needed to improve the read margin. This issue can therefore be resolved by separating the read and write paths. Fig. 12(b) shows the 8T SRAM cell structure. It consists of a 6T SRAM cell together with a read circuit (transistors R1-R2 and the RBL line). The write operation is done by the BL and BLB lines through the access transistors (AC1-AC2). Transistors R1 and R2 are used to place the data stored in node Q on the RBL line during the read operation. This structure obviates the trade-offs between read and write.
H. Temperature
Fig. 11 shows ID versus VGS for temperatures (T) varying from -40°C to 125°C. As shown, T variation in bulk CMOS changes both the performance (ION) and the leakage power consumption (IOFF), whereas in the FinFET it changes only IOFF. However, the OFF-current variation in the FinFET is more severe. Another feature affected by temperature variation is the threshold voltage. Increasing the temperature from -40°C to 125°C decreases the threshold voltage by 10% and 16% for bulk CMOS and FinFET, respectively. This shows a stronger dependence of the threshold voltage on temperature in the FinFET.
III.
45mV and 49mV for NMOS and PMOS have been considered [18], and a Monte Carlo analysis with 1000 iterations is performed. Conditions of 25°C and VDD=0.9V have been used in all simulations. Simulations are done using HSPICE with PTM models, 22nm LP and 20nm LSTP for bulk CMOS and FinFET transistors, respectively.
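The threshold-voltage sampling behind such a Monte Carlo run can be sketched as follows; only the two sigma values come from the text, and everything else (seeding, pairing of samples) is an illustrative assumption:

```python
import random

def sample_vth_shifts(n, sigma_n=0.045, sigma_p=0.049, seed=1):
    """Monte Carlo threshold-voltage shifts in volts: zero-mean
    Gaussians with the sigmas quoted in the text (45 mV for NMOS,
    49 mV for PMOS), one (NMOS, PMOS) pair per iteration."""
    rng = random.Random(seed)
    return [(rng.gauss(0.0, sigma_n), rng.gauss(0.0, sigma_p))
            for _ in range(n)]
```

Each sampled pair would perturb the device thresholds of one simulator run; 1000 such runs give the RSNM and WM distributions discussed below.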
FinFET structure.
3) Supply Voltage Scalability: In order to evaluate supply voltage scalability for low-voltage applications such as biomedical applications, a Monte Carlo analysis is performed while VDD is swept from 0.95V to 0.2V, and the RSNM is calculated. As expected, the RSNM decreases with VDD scaling. Supposing the minimum allowable RSNM is 15% of VDD, the minimum operational supply voltage will be 0.3V and 0.65V for FinFET and bulk CMOS, respectively. This shows that the FinFET transistor is a better candidate for low-voltage applications.
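The 15%-of-VDD acceptance criterion can be written as a small helper; the sweep data in the example is made up purely for illustration:

```python
def min_operational_vdd(sweep, frac=0.15):
    """Smallest supply voltage whose RSNM still meets the
    RSNM >= frac * VDD criterion used in the text. `sweep` is a
    list of (vdd, rsnm) pairs in volts."""
    ok = [vdd for vdd, rsnm in sweep if rsnm >= frac * vdd]
    return min(ok)

# Hypothetical sweep: passes down to 0.3 V, fails at 0.2 V
sweep = [(0.9, 0.30), (0.6, 0.12), (0.3, 0.05), (0.2, 0.02)]
```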
A. 6T SRAM cell
1) RSNM: The read static noise margin (RSNM) is a metric showing the read stability of an SRAM cell. It is defined as the length of the side of the largest square that can fit into the lobes of the butterfly curve. The butterfly curve is obtained by drawing and mirroring the inverter characteristics while the access transistors are ON and the bitlines are precharged to VDD [19]. The sizes of the transistors are shown in Fig. 12. The parameter X is the W/L ratio for bulk CMOS transistors, which is 71nm/24nm (e.g., the size of PD2 in the 6T SRAM is 3X, or W=213nm/L=24nm). When using FinFET transistors, X=1 and the term shows the number of fins (e.g., the number of fins for AC1 is 2).
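A numerical sketch of this largest-square definition, under the simplifying assumptions of a monotone-decreasing read VTC and well-behaved lobes (this is an illustration, not the authors' extraction method): since the upper lobe boundary f is lowest at a square's right edge and the mirrored lower boundary is highest at its left edge, a square of side s anchored at x = a fits iff f(a+s) - f_inv(a) >= s.

```python
import numpy as np

def rsnm(f, vdd, n=400, bisect_iters=40):
    """Largest-square RSNM estimate for a monotone-decreasing
    read-mode VTC y = f(x) mirrored about y = x. For each anchor a,
    bisect on the square side s using the fit test
    f(a+s) - f_inv(a) >= s, then maximize over a."""
    x = np.linspace(0.0, vdd, n)
    y = f(x)
    # Inverse of the decreasing VTC via interpolation (xp must increase)
    f_inv = lambda v: np.interp(v, y[::-1], x[::-1])
    best = 0.0
    for a, fa in zip(x, y):
        lo, hi = 0.0, vdd - a
        for _ in range(bisect_iters):
            s = 0.5 * (lo + hi)
            if f(a + s) - f_inv(a) >= s:
                lo = s
            else:
                hi = s
        best = max(best, lo)
    return best
```

For an ideally steep inverter at VDD = 1 V the lobes approach 0.5 V squares, which this routine recovers approximately.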
Fig. 14 shows the results of the Monte Carlo analysis for the WM. The mean value and variance of the WM are 295mV (209mV) and 3.521 x 10^-5 (1.11 x 10^-4) for FinFET (bulk CMOS). Write margins of 180mV and 278mV are obtained for bulk CMOS and FinFET, respectively, showing a 54% improvement in WM for
Fig. 13. Monte Carlo analysis of RSNM in 6T SRAM cell for threshold
voltage in presence of process variation for (a) FinFET and (b) bulk CMOS
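Converting the quoted write-margin variances to standard deviations puts the two spreads on the same millivolt footing as the means:

```python
import math

# Write-margin Monte Carlo statistics quoted in the text:
# mean 295 mV, variance 3.521e-5 V^2 for FinFET;
# mean 209 mV, variance 1.11e-4 V^2 for bulk CMOS.
sigma_finfet_v = math.sqrt(3.521e-5)    # about 5.9 mV
sigma_bulk_v = math.sqrt(1.11e-4)       # about 10.5 mV
spread_finfet = sigma_finfet_v / 0.295  # relative spread, ~2%
spread_bulk = sigma_bulk_v / 0.209      # relative spread, ~5%
```

So besides the higher mean WM, the FinFET cell also shows roughly half the absolute spread of the bulk CMOS cell.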
CONCLUSIONS

REFERENCES
[1] N. Verma and A. P. Chandrakasan, "A 256 kb 65 nm 8T subthreshold SRAM employing sense-amplifier redundancy," IEEE J. Solid-State Circuits.

[Figure residue: RSNM histograms over 1000 Monte Carlo iterations at T=25°C, VDD=0.9V; FinFET RSNM spans roughly 0.374-0.383 V and bulk MOSFET roughly 0.35-0.362 V.]
Fig. 16. Monte Carlo analysis of RSNM in the 8T SRAM cell for threshold voltage in the presence of process variation for (a) FinFET and (b) bulk CMOS.
Abstract
[Figure residue: write-margin curves comparing "Simple 6T" and "Advanced 6T" cells.]
11.1.1
0-7803-3393-4 $5.00 © 1996 IEEE
IEDM 96-271
[Figure residue: typical SRAM process cross-section showing silicide and poly layers.]
"1'
.__
11)
1.3
1.6
1.9
2.2
2.5
in
1.3
1.6
1.9
2.2
2.5
11 .I .3
IEDM 96-273
Summary
In conclusion, 4T SRAM cells, which have enjoyed
nearly two decades of dominance of the SRAM market, appear
to be very difficult to scale below the 1.8 volt power supply
generation. Because of this, 6T cells will likely begin to
dominate the stand-alone SRAM market as power supplies
scale below 1.8 volts. This may lead to common process
flows for advanced microprocessors and stand-alone SRAM
products.
[Figure residue: stability versus supply voltage (1.3-2.5 V) for 6T (B=1), 4T with TFT load (Ron = 1 GOhm), and 4T (B=2) cells with 4 kOhm LDD resistance.]
Abstract
This paper demonstrates new circuit technologies that enable a 0.25-µm ASIC SRAM macro to be nonvolatile with only a 17% cell-area overhead (NV-SRAM: nonvolatile SRAM). New capacitor-on-metal/via-stacked-plug process technologies make it possible for an NV-SRAM cell to consist of a six-transistor ASIC SRAM cell and two back-up ferroelectric capacitors stacked over the SRAM portion. A Vdd/2 plate-line architecture makes read/write fatigue virtually negligible. A 512-byte test chip has been successfully fabricated to show compatibility with ASIC technologies.
Introduction
Conventional nonvolatile memories using the remanent polarization of ferroelectric capacitors (FeRAMs) appear to hold good promise for low-power, on-chip use because their operational voltage is low (about 2.5 V) and their write endurance is high (over 10^12 cycles), each an improvement over on-chip EEPROMs and Flash memories. Their destructive read operations, however, have limited their read endurance to the same level as their write endurance, preventing wider application. In response to this, we have developed a nonvolatile SRAM (NV-SRAM) technology based on a shadow RAM that uses ferroelectric capacitors. A 512-byte NV-SRAM macro fabricated with 0.25 µm design rules has shown virtually negligible read/write fatigue. The individual cell area is 18.6 µm2, and the total macro area is 0.27 mm2.
NV-SRAM Cell
A. Cell Structure
Figure 1(b) shows the proposed NV-SRAM cell configuration. Each of its cells consists only of a six-transistor SRAM cell and two back-up capacitors. Each capacitor is directly connected to an SRAM cell storage node. A unique feature of the proposed NV-SRAM cell is that the plate lines of the NV-SRAM are set to Vdd/2 to keep the voltage bias across the capacitors low, within -Vdd/2 to +Vdd/2. The coercive voltages of the capacitors are set to a value greater than Vdd/2 to eliminate polarization transitions; this results in fatigue becoming virtually negligible.
(a) Shadow RAM cell using ferroelectric capacitors [1, 2]
(b) NV-SRAM cell (plate line PL normally at Vdd/2)
Figure 1: Memory cell configurations.
0-7803-5809-0/00/$10.00 © 2000 IEEE
B. Cell Layout
Figure 2(a) shows the memory cell layout of an SRAM cell for a 0.25 µm standard ASIC macro, and Figure 2(b) shows that of an NV-SRAM based on this SRAM cell. Figure 3 illustrates the 3-D structure of an NV-SRAM cell. By means of capacitor-on-metal/via-stacked-plug (CMVP) process technologies [3], the capacitors are formed above the SRAM portion after the completion of the standard CMOS process. The added topmost metal layer forms the plate lines. The bottom electrode of each capacitor is connected to a storage node of the SRAM portion via a stacked plug, and the top electrodes in the cell array are connected to a plate line. The NV-SRAM cell is 1.17 times larger than the SRAM cell because of the additional area of the stacked metal/via connecting the storage nodes and the capacitors. The NV-SRAM cell size depends not on capacitor size but on wiring and transistor design rules. For 0.25-µm ASIC design rules, 1-µm-square capacitors, large enough for stable operation, are used. With this simple structure, NV-SRAM cells can be shrunk at the same rate as
C. STORE Operation
STORE operations must be carried out before any voltage cut-off so as not to lose data. Figure 4 shows timing charts indicating the polarization states of the ferroelectric capacitors at the individual stages. In the volatile state, the plate line is set at Vdd/2, and polarization is distributed randomly. During a STORE operation, the plate line is first driven to Vdd, which results in Vdd being applied across the capacitor that is connected to the lower storage node (CapLow) to provide positive remanent polarization. By way of contrast, no bias is applied across the other capacitor (CapHigh) (1). The plate line is then discharged to ground level, which results in -Vdd being applied across CapHigh to provide negative remanent polarization, and the voltage bias being removed from CapLow, which retains positive remanent polarization (2). Finally, the voltage is cut off, and CapHigh and CapLow retain negative and positive remanent polarization, respectively.
D. RECALL Operation
When power is next restored, an automatically-invoked
RECALL operation translates the states of remanent
polarizations kept in the ferroelectric capacitors into voltage
levels in their respective storage nodes in the SRAM portion.
As the supplied voltage rises from ground level to Vdd, while
[Fig. 4 residue: STORE timing chart showing the power supply, plate line, lower and higher storage nodes, and the CapLow/CapHigh bias and polarization waveforms through stages (1)-(4), ending with power off.]
Conclusion
A 512-byte nonvolatile SRAM (NV-SRAM) test macro has been successfully fabricated with a 0.25-µm double-metal-layer CMOS process. The Vdd/2 plate-line architecture contributes to its virtually negligible (>10^12 cycles) fatigue/imprint characteristics. Because of its CMVP 3-D structure, its memory cell occupies an area only 1.17 times larger than that of a standard ASIC SRAM cell produced with the same design rules. The cell is 18.6 µm2 and the total macro area is 0.27 mm2. The read/write cycle time of the NV-SRAM is comparable with a standard SRAM macro because its read/write operations are based on a standard SRAM.
References
[1] S. S. Eaton, et al., "A Ferroelectric Nonvolatile Memory," ISSCC 88.
[3] K. Amanuma, et al., "Capacitor-on-Metal/Via-stacked-Plug (CMVP) Memory Cell for 0.25 um CMOS Embedded FeRAM," IEDM 98.

Organization: 512 Byte
Supply voltage: 2.5 V
Technology: 0.25 µm double-metal CMOS with capacitor-on-metal/via-stacked-plug (CMVP)
Die size: 450 x 600 µm2
Cell size: 3.22 x 5.78 µm2
Capacitor size: 1 x 1 µm2 x 2
Cycle time: 6 ns @ 2.5 V (estimated)
Active power: 2 mW @ 5 MHz
[Figure residue: ferroelectric-capacitor characteristics, including a non-switching curve, with endurance markings out to 1E+10 and 1E+12 cycles on a polarization-versus-voltage plot.]
Abstract: This paper demonstrates new circuit technologies that enable a 0.25-µm ASIC SRAM macro to be nonvolatile with only a 17% cell-area overhead. New capacitor-on-metal/via-stacked-plug process technologies permit a nonvolatile SRAM (NV-SRAM) cell to consist of a six-transistor ASIC SRAM cell and two backup ferroelectric capacitors stacked over the SRAM portion. READ and WRITE operations in this NV-SRAM cell are very similar to those of a standard SRAM, and this NV-SRAM shares almost all the circuit properties of a standard SRAM. Because each memory cell can perform STORE and RECALL individually, both can execute massive-parallel operations. A Vdd/2 plate-line architecture makes READ/WRITE fatigue negligible. A 512-byte test chip was successfully fabricated to show compatibility with ASIC technologies.
Index Terms: Embedded memory, ferroelectric memory, SRAM.
I. INTRODUCTION
the SRAM-cell portion. Then, the plate line is driven from the ground level to Vdd. This is done to apply program voltages of plus and minus Vdd to each of the ferroelectric capacitors. In this conventional cell, C1 must be boosted over Vdd to apply a full Vdd bias to the capacitor connected to the higher storage node.
A typical conventional shadow RAM using ferroelectric capacitors presents three design challenges. The first is concerned
with density; designers must determine the minimum cell size
possible. The second is concerned with the STORE and RECALL
operations; massive-parallel operations are best for quick power-up/down sequences. The third challenge is concerned with the
reliability of stored data, which strongly depends on the program
voltage. A sufficiently high program voltage must be combined
with a low operation voltage.
C. READ and WRITE Operation
To write a value in an NV-SRAM cell, a WRITE operation similar to that in a standard SRAM is carried out. First, a WRITE circuit drives one of the bitlines to Vdd and the other to the ground level in accordance with the value to be written. Then, the wordline is selectively driven to an active level, and the SRAM portion is forced to hold the value. During this WRITE operation, the plate line is kept at Vdd/2.
In contrast to the READ access time, the WRITE cycle time can be extended by the added capacitors. To pull down a higher storage node to the ground level in a WRITE operation, the WRITE circuit also needs to discharge the ferroelectric capacitor on that storage node. The ferroelectric capacitor is about 100 fF, comparable to the bitline capacitance [Fig. 3(b)]. It may require a time budget in the write cycle. Enhancing the WRITE circuit is an effective way to overcome this disadvantage.
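A back-of-the-envelope view of that time budget. Only the ~100 fF ferroelectric-capacitor value comes from the text; the driver resistance and the bitline capacitance below are assumed placeholders:

```python
# RC settling sketch for the WRITE pull-down path. Only the ~100 fF
# ferroelectric-capacitor value is from the text; the driver
# resistance and bitline capacitance are assumed placeholders.
r_write_ohm = 5e3        # assumed effective WRITE-path resistance
c_bitline_f = 100e-15    # assumed bitline capacitance ("comparable")
c_ferro_f = 100e-15      # back-up ferroelectric capacitor (from text)
tau_sram_s = r_write_ohm * c_bitline_f               # plain SRAM
tau_nv_s = r_write_ohm * (c_bitline_f + c_ferro_f)   # NV-SRAM
budget_ratio = tau_nv_s / tau_sram_s                 # crude budget growth
```

In this crude single-RC model, a comparable added capacitance roughly doubles the settling budget, which is the disadvantage a stronger WRITE circuit compensates for.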
The static noise and WRITE margins of an NV-SRAM cell are naturally the same as those of the SRAM cell that forms the SRAM portion of the NV-SRAM cell. This is because these margins define the dc characteristics of the memory cell. That is, they depend only on the resistance and conductance of the elements in the memory cells and are independent of the capacitance of the storage nodes.
D. STORE Operation

E. RECALL Operation
When power is next restored, an automatically invoked RECALL operation translates the states of the remanent polarizations kept in the ferroelectric capacitors into voltage levels in their respective storage nodes in the SRAM portion. As the supplied voltage rises from the ground level to Vdd, while the plate line is fixed at the ground level, the voltage at the storage node connected to CapHigh rises faster than at the other storage node (3). This is because, under these conditions, the differential capacitance of CapLow is higher than that of CapHigh because of the switching charge involved, and the node connected to CapLow has a greater coupling with the grounded plate line. Subsequently, the latch function of the SRAM portion amplifies
Fig. 3. Influence of capacitors on READ/WRITE operations. (Arrows show current paths.) (a) READ operation. (b) WRITE operation.
Fig. 5. … cell.
ACKNOWLEDGMENT
The authors would like to thank the members of the FeRAM
SoC Development Project, NEC Corporation, for their valuable
advice, and M. Yamashina and M. Fukuma for their support.
REFERENCES
Fig. 9. Hysteresis curves.
V. CONCLUSION
A 512-byte nonvolatile SRAM (NV-SRAM) test macro was successfully fabricated by using a 0.25-µm double-metal-layer CMOS process. Its Vdd/2 plate-line architecture contributes to its negligible (>10^12) fatigue/imprint characteristics. Because of the CMVP 3-D structure, the memory cell occupies an area only 1.17 times larger than that of a standard ASIC SRAM cell produced with the same design rules. The cell has an area of 18.6 µm2, and the total macro area is 0.27 mm2. The
Yukihiko Maejima received the B.S. degree in applied physics from Waseda University, Tokyo, Japan,
in 1981, and the M.S. degree in physics from the University of Tokyo in 1983.
He joined NEC Corporation, Kanagawa, Japan, in
1983, and is currently involved in the development of
FeRAM processes.
Mr. Maejima is a member of the Japan Society of
Applied Physics.
Hiromitsu Hada was born in Mie, Japan, on December 26, 1959. He received the B.E. and M.E. degrees in electrical engineering from Mie University,
Mie, in 1982 and 1984, respectively.
In 1984, he joined NEC Corporation, Kawasaki,
Japan, where he worked on CMOS SOI LSIs and
DRAM technology. He is currently involved in
research and development of FeRAM technology at
Silicon Systems Research Laboratories, Sagamihara,
Japan.
Mr. Hada is a member of the Japan Society of Applied Physics.
2009 IEEE Symposium on Industrial Electronics and Applications (ISIEA 2009), October 4-6, 2009, Kuala Lumpur, Malaysia
S. Dasgupta
Electronics and Computer Engineering Dept.
IIT Roorkee
Roorkee, India
[email protected]
better performance, new SRAM cells [5]-[8] have been introduced. In most of these cells, the read and write operations are isolated to obtain a higher noise margin. In this paper, a comparative analysis of the 6T, 8T [5], and 9T [6] SRAM cells has been carried out. The major difference between the 8T and 9T SRAM cells is that the 8T SRAM cell uses a single bit line, while the 9T SRAM cell uses double bit lines, as in the conventional 6T SRAM cell. All simulations have been carried out in 90 nm CMOS technology. The tools used for simulation are CADENCE, MATLAB, and ORIGIN.
II. SRAM CELLS
An SRAM cell must be designed in such a way that it provides a nondestructive read operation and a reliable write operation. In the conventional 6T SRAM cell, this condition is fulfilled by appropriately sizing all the transistors in the cell. Sizing is done according to the cell ratio (CR) [9] and the pull-up ratio (PR) [9] of the cell.
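The sizing rule above can be sketched numerically. The transistor widths and the CR/PR bounds below are illustrative assumptions, not values taken from the paper:

```python
# Sketch: cell ratio (CR) and pull-up ratio (PR) of a 6T SRAM cell.
# All widths and thresholds are illustrative, not the paper's values.

def ratio(w_num_nm: float, w_den_nm: float, l_nm: float = 90.0) -> float:
    """(W/L) of one transistor divided by (W/L) of another (common L)."""
    return (w_num_nm / l_nm) / (w_den_nm / l_nm)

w_driver, w_access, w_pullup = 240.0, 120.0, 120.0  # widths in nm (assumed)

cr = ratio(w_driver, w_access)   # cell ratio: driver vs. access transistor
pr = ratio(w_pullup, w_access)   # pull-up ratio: load vs. access transistor

# Common design guidance: CR above ~1.2 for a nondestructive read, and
# PR below ~1.8 so the access transistor can overpower the load on a write.
print(f"CR = {cr:.2f}, PR = {pr:.2f}")
assert cr > 1.2 and pr < 1.8
```

The same two ratios drive the corner trends reported later: a larger CR raises the read noise margin, while a smaller PR eases the write.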
I. INTRODUCTION
For nearly 40 years, CMOS devices have been scaled down in order to achieve higher speed, higher performance, and lower power consumption. Because of their higher speed, SRAM-based cache memories and systems-on-chip are commonly used. Device scaling, however, raises several design challenges for nanometer SRAM design. Designs now operate with very low threshold voltages and ultra-thin gate oxides, which increases leakage energy consumption. Data stability during read and write operations is also affected. Intrinsic parameter fluctuations [1] such as random dopant fluctuation [2], line edge roughness [3], and oxide thickness fluctuation [4] further degrade the stability of the SRAM cell. In order to obtain a higher noise margin along with
TABLE I
WIDTH OF TRANSISTORS USED FOR SIMULATING THE 8T SRAM CELL

Transistor        Width (nm)
M1, M2, M3, M4    120
M5                600
M7, M8            480
M6                240
TABLE V
SIMULATION RESULT FOR CORNER ANALYSIS OF READ NOISE MARGIN

Process corner   6T SRAM (V)   8T SRAM (V)   9T SRAM (V)
FS               0.073         0.285         0.274
FF               0.084         0.314         0.299
TT               0.077         0.320         0.308
SS               0.087         0.325         0.314
SF               0.116         0.342         0.325
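A quick way to read Table V is to extract the worst-case corner for each cell; a minimal sketch over the tabulated values:

```python
# Worst-case read-noise-margin corner per cell, from Table V (values in volts).
rnm = {
    "FS": {"6T": 0.073, "8T": 0.285, "9T": 0.274},
    "FF": {"6T": 0.084, "8T": 0.314, "9T": 0.299},
    "TT": {"6T": 0.077, "8T": 0.320, "9T": 0.308},
    "SS": {"6T": 0.087, "8T": 0.325, "9T": 0.314},
    "SF": {"6T": 0.116, "8T": 0.342, "9T": 0.325},
}
for cell in ("6T", "8T", "9T"):
    corner = min(rnm, key=lambda c: rnm[c][cell])   # corner with lowest RNM
    print(f"{cell}: worst corner {corner}, RNM = {rnm[corner][cell]:.3f} V")
```

All three cells bottom out at the FS corner, where the fast NMOS access transistor disturbs the stored node most strongly.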
[Butterfly curves (VQn versus VQ, 0 to 1.2 V) at the FS, SF, SS, TT, and FF process corners; the marked read noise margins in the three panels are 0.037 V, 0.228 V, and 0.217 V.]
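The noise margin marked on a butterfly plot is the side of the largest square that fits inside one eye of the curve. A minimal numerical sketch, assuming an illustrative tanh-shaped inverter transfer curve rather than the simulated 90 nm devices:

```python
import numpy as np

# Sketch: read SNM as the side of the largest square inside one eye of the
# butterfly curve. The tanh VTC below is an illustrative stand-in, not data
# from the paper.
VDD, VM, G = 1.2, 0.6, 10.0   # supply, switching threshold, gain (assumed)

def vtc(v):
    """Smooth, strictly decreasing inverter transfer curve."""
    return 0.5 * VDD * (1.0 - np.tanh(G * (v - VM)))

def vtc_inv(x):
    """Inverse of vtc: the mirrored curve in the butterfly plot."""
    return VM + np.arctanh(1.0 - 2.0 * x / VDD) / G

# A square of side s at abscissa x fits in the upper-left eye iff its
# top-right corner stays under vtc and its bottom-left corner stays above
# the mirrored curve: vtc(x + s) - s >= vtc_inv(x).
snm = 0.0
for x in np.linspace(0.01, VDD - 0.02, 200):
    for s in np.linspace(0.0, VDD - x - 0.02, 200):
        if vtc(x + s) - s >= vtc_inv(x):
            snm = max(snm, s)

print(f"read SNM ~ {snm * 1000:.0f} mV")
```

With real simulated VTCs the same square-fitting scan reproduces the margins marked in the figures.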
TABLE VI
SIMULATION RESULT FOR STATISTICAL ANALYSIS OF SRAM CELLS

SRAM cell   µ(RNM) (V)   σ(RNM) (mV)   µ(RNM) - 6σ(RNM) (mV)
6T          0.065        18.85         Negative
8T          0.305        18.82         192.1
9T          0.301        20.92         175.48
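The last column of Table VI can be reproduced directly from the tabulated mean and spread; a negative µ - 6σ means the cell cannot meet a six-sigma stability target:

```python
# Reproducing the mu - 6*sigma screen of Table VI (values from the table).
cells = {              # cell: (mu in V, sigma in V)
    "6T": (0.065, 0.01885),
    "8T": (0.305, 0.01882),
    "9T": (0.301, 0.02092),
}
for name, (mu, sigma) in cells.items():
    margin = mu - 6 * sigma
    verdict = "Negative" if margin < 0 else f"{margin * 1000:.1f} mV"
    print(f"{name}: mu - 6*sigma = {verdict}")
```

The 8T and 9T values land on the 192.1 mV and 175.5 mV entries of the table, while the 6T cell goes negative, matching the reported result.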
TABLE VII
CORNER SIMULATION RESULT FOR WNM

Process   6T SRAM cell        8T SRAM cell        9T SRAM cell
corner    Write 0   Write 1   Write 0   Write 1   Write 0   Write 1
          (V)       (V)       (V)       (V)       (V)       (V)
FS        0.505     0.456     0.399     0.334     0.779     0.774
FF        0.479     0.442     0.445     0.268     0.757     0.762
TT        0.494     0.445     0.454     0.248     0.745     0.739
SS        0.497     0.448     0.465     0.226     0.728     0.725
SF        0.454     0.422     0.506     0.165     0.708     0.711
D. Leakage energy
The subthreshold leakage current is modeled as [11]:

I_sub = A_sub · exp[(V_GS - V_t0 - η'·V_SB + λ·V_DS) / (n'·kT/q)] · [1 - exp(-V_DS / (kT/q))]   (1)

where A_sub = µ0 · C_ox · (W / L_eff) · (kT/q)² · e^1.8   (2)

The tunneling current density is modeled as:

J_Tunnel = (4π·m*·q / h³) · (kT)² · [1 + kT / (2·√E_B)] · exp[(q·φ_S - q·φ_F - E_G/2) / kT] · exp(-E_B)   (3)

where E_B = (4π·T_ox·√(2·m_ox·…)) / h   (4)

E. Layout
Layouts for the different SRAM cells are shown in Figures 16, 17, and 18. Due to the larger width of the pull-down transistors in the 6T SRAM cell, a finger-type layout has been used for it. The area and parasitic capacitance for the different SRAM cells are tabulated in Table VIII.
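Equations (1)-(2) can be evaluated numerically; every device parameter below (mobility, Cox, Vt0, n', η', λ, and the geometry) is an illustrative placeholder, not a value from the paper:

```python
import math

# Numerical sketch of the subthreshold model in Eq. (1)-(2).
# All device parameters are illustrative placeholders.
q, k = 1.602e-19, 1.381e-23
T = 300.0
vt = k * T / q                       # thermal voltage kT/q, ~25.9 mV

def i_sub(vgs, vds, vsb=0.0, *, w=120e-9, leff=90e-9,
          mu0=0.03, cox=1.5e-2, vt0=0.3, n=1.5, eta=0.08, lam=0.05):
    a_sub = mu0 * cox * (w / leff) * vt**2 * math.exp(1.8)   # Eq. (2)
    return (a_sub
            * math.exp((vgs - vt0 - eta * vsb + lam * vds) / (n * vt))
            * (1.0 - math.exp(-vds / vt)))                   # Eq. (1)

# Leakage of an "off" device (VGS = 0) rises with drain bias via the
# lambda*V_DS (DIBL) term in the exponent:
print(f"{i_sub(0.0, 0.5):.3e} A  vs  {i_sub(0.0, 1.0):.3e} A")
```

The λ·V_DS term is what couples leakage energy to the supply voltage, which is why the corner and supply choices earlier in the paper also move the leakage numbers.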
[Butterfly curve (VQn versus VQ) at the FF, SS, TT, and SF process corners; marked margin 0.351 V.]
[Butterfly curves (VQn versus VQ) at the SF, FF, FS, SS, and TT process corners; marked margins 0.160 V and 0.705 V.]
Figure 13. Corner simulation result for 6T SRAM cell for write 0
the cell is not symmetric, and its write time is also higher. On the other hand, the 9T SRAM cell has a higher RNM as well as WNM, and its write time is also small. The leakage current of the 9T SRAM cell is reduced by using dual-threshold-voltage technology.
IV. CONCLUSION
Abstract: The authors show new guidelines for Vdd and threshold voltage (Vth) scaling for both the logic blocks and the high-density SRAM cells from a low-power-dissipation viewpoint. For the logic operation, they have estimated the power and the speed for inverter gates with a fan-out of 3. They find that the optimum Vdd is very sensitive to switching activity in addition to the operation frequency. They propose to integrate two sets of transistors having different Vdd values on a chip. In portions of the chip with high frequency or high switching activity, the use of H transistors, in which Vdd and Vth are moderately scaled, is helpful. On the other hand, in low-switching-activity blocks or relatively low-frequency portions, the use of L transistors, in which Vdd should be kept around 1-1.2 V, is advantageous. A combination of H and L is beneficial to suppress power consumption in the future. They have investigated the yield of SRAM arrays to study the optimum Vdd for SRAM operation. In high-density SRAM, low Vth causes yield loss and an area penalty because of low static noise margin and high bit leakage, especially at high-temperature operation. Vth should be kept around 0.3-0.4 V from an area-size viewpoint. The minimum Vdd for SRAM operation is found to be 0.7 V in this study. It is also found that the supply voltage for SRAM cannot be scaled continuously.

Index Terms: CMOSFET logic devices, CMOS memory integrated circuits, logic devices, power consumption, SRAM chips.
I. INTRODUCTION

N_gate · I_o · V_dd.   (1)
Manuscript received October 11, 2005; revised March 6, 2006. The review
of this paper was arranged by Editor V. R. Rao.
The authors are with the System LSI Division, Semiconductor Company, Toshiba Corporation, Yokohama 235-8522, Japan (e-mail: [email protected]).
Digital Object Identifier 10.1109/TED.2006.874752
Vth. Fig. 1 indicates the delay time per inverter versus the total power consumed by 1 M gates at 105 °C. The implicit variables in the plot are Vdd and Vth. Delay time is calculated by using the simple CV/I equation [5]. The device characteristics are estimated using the device current-voltage (I-V) equations obtained from [5]. The mobility model used here is obtained from [6]. These equations are calibrated with the published 65-nm technology data in [7]. The estimated Tpd using CV/I shows good agreement with the actual Tpd provided by [7]. The clock frequency is chosen to be 2 GHz. The switching activity is 20%. The gate length and the equivalent oxide thickness (EOT) are set to 40 nm and 1.2 nm, respectively.
The standby power is calculated by taking into account the
distribution of Io within a chip. The average value of Io with
log normal distribution is used in the analysis. Vth variations
caused by fluctuation in process as well as random factors are
considered. As the required speed becomes high, Vth should be
lower in order to achieve the required speed. This causes an
increase in the standby power. At a given voltage, the standby
power increases drastically and dominates the total power in
the high-speed region. The dotted line is the boundary where
the dominant power changes from being mostly active power
to being mostly standby power. The transition point moves
toward slower delay if we decrease the Vdd . This suggests
that a low Vdd operation suffers from high standby power if
the required speed is high as well. The relation between the
active power and the standby power is sensitive to the operation
frequency and the switching activity (a). Fig. 2 depicts the
total power as a function of Vdd for 1 M gates for three
different frequencies: 500 MHz, 2 GHz, and 4 GHz. The curves
corresponding to various switching activities from 1% to 100%
are also shown. The clock generator operates at a = 100%.
On the other hand, the memory array seldom switches and
has a switching activity less than 1%. The typical switching
activity in the logic blocks is around 20%. In the case of a
chip operating at a relatively lower frequency (500 MHz in this
case), lower Vdd looks disadvantageous because the standby
power dominates the total power. In this case, Vdd should be
kept high at around 1.1-1.2 V. On the other hand, for the case
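The active/standby trade-off described above can be sketched as follows; the gate capacitance, reference off-current, and subthreshold slope are illustrative stand-ins, not the paper's calibrated 65-nm values:

```python
# Sketch: total power of N gates as active (a*C*Vdd^2*f per gate) plus
# standby (Io*Vdd per gate), with Io rising exponentially as Vth is lowered.
# All constants are illustrative assumptions.

def total_power(vdd, vth, *, n_gates=1e6, c_gate=1e-15, freq=2e9,
                activity=0.20, io_ref=1e-9, s_mv=85.0):
    # Io extrapolated from io_ref at Vth = 0.3 V with an 85 mV/dec slope
    io = io_ref * 10 ** ((0.3 - vth) / (s_mv * 1e-3))
    p_active = n_gates * activity * c_gate * vdd**2 * freq
    p_standby = n_gates * io * vdd
    return p_active, p_standby

for vth in (0.15, 0.30, 0.45):
    pa, ps = total_power(1.0, vth)
    print(f"Vth = {vth:.2f} V: active {pa:.2e} W, standby {ps:.2e} W")
```

Lowering Vth leaves the active term untouched but multiplies the standby term by a decade per subthreshold slope, which is why the dominant-power boundary in Fig. 1 moves with Vdd and Vth.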
MORIFUJI et al.: SUPPLY AND THRESHOLD-VOLTAGE TRENDS FOR SCALED LOGIC AND SRAM MOSFETs
Fig. 3. Proposed guideline and scaling trend for achieving both low power consumption and high-speed operation. In this scaling, the operation frequency is assumed to be 1.4 times that of the previous technology. H and L versions are provided to meet the system requirements of power and speed.
Fig. 6. Total power consumption per 1 M gates calculated in each technology, in the H-only and H + L cases, shown by separating the standby-power and active-power components. The expected power reduction is half of the previous generation.
Fig. 7. Factors affecting the scaling of the SRAM cell are illustrated, such as smaller SNMs and high OFF-currents.
Fig. 8. Yield of 2-MB SRAM arrays as a function of Vdd and Vth operated at room temperature. Vth is varied from 0.15 to 0.45 V. Cell sizes such as 0.56 µm² (beta ratio = 1), 0.598 µm² (beta ratios = 1.5 and 1.67), and 0.6292 µm² (beta ratio = 2.17) are investigated.
arrays while varying the Vdd and Vth. The data are shown at room temperature as well as at a high temperature of 125 °C. Four types of SRAM having different beta ratios and cell sizes are investigated. These are 0.56 µm² (beta ratio: 1), 0.598 µm² (beta ratios: 1.5 and 1.67), and 0.6292 µm² (beta ratio: 2.17). Vth can be tuned by changing the channel doping. The SRAM yield at low-voltage operation improves by lowering the threshold voltage to between 0.15 and 0.25 V. It should be noted that a degradation in the yield is found at high-Vdd operation for Vth = 0.15 V. In the high-Vdd region, the OFF-current of each cell increases and becomes comparable with the cell current, thus causing a failure. On the other hand, SRAM with a low beta ratio significantly degrades yield in low-Vdd operation. From this, a low Vth is disadvantageous from the cell-size viewpoint. This is caused by a degradation in SNM through the narrow-channel effect in the driver and the transfer transistors. For the Vth = 0.35 V case, beta = 1 depicts the best yield. For a higher Vth case (Vth = 0.45 V), yield degrades because the threshold voltage is close to half of the supply voltage, and the eye
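The yield sensitivity discussed above follows from the multiplicative nature of array yield: every one of millions of cells must work. A sketch with an assumed per-cell failure probability, not the paper's measured data:

```python
# Sketch: array yield from a per-cell failure probability, illustrating why
# large bit counts make SRAM far more sensitive to Vth variation than logic.
# p_fail is an assumed stand-in for the SNM/leakage failure rate.

def array_yield(p_fail_per_cell: float, n_bits: int) -> float:
    """Yield of an array that works only if every cell works (no redundancy)."""
    return (1.0 - p_fail_per_cell) ** n_bits

n = 2 * 1024 * 1024 * 8          # bit count of a 2-MB array, as in Fig. 8
for p in (1e-9, 1e-8, 1e-7):
    print(f"p_fail = {p:.0e}: yield = {array_yield(p, n):.3f}")
```

A one-decade increase in the per-cell failure rate, such as the SNM loss at low Vdd described above, collapses the array yield, which is why the SRAM supply cannot be scaled as aggressively as the logic supply.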
REFERENCES
[1] R. H. Dennard, F. H. Gaensslen, V. L. Rideout, E. Bassous, and A. R. LeBlanc, "Design of ion-implanted MOSFETs with very small physical dimensions," IEEE J. Solid-State Circuits, vol. 9, no. 5, pp. 256-268, Oct. 1974.
[2] R. Gonzalez, B. M. Gordon, and M. A. Horowitz, "Supply and threshold voltage scaling for low power CMOS," IEEE J. Solid-State Circuits, vol. 32, no. 8, pp. 1210-1216, Aug. 1997.
[3] The International Technology Roadmap for Semiconductors, ITRS Handbook, 1993. [Online]. Available: http://public.itrs.net/Files/2003ITRS/Home2003.htm
[4] K. Nose and T. Sakurai, "Optimization of Vdd and Vth for low-power and high-speed applications," in Proc. ASP-DAC, Jan. 2000, pp. 469-474.
[5] Y. Taur and T. H. Ning, Fundamentals of Modern VLSI Devices. Cambridge, U.K.: Cambridge Univ. Press, 1998.
[6] K. Chen, C. Hu, P. Fang, M. R. Lin, and D. L. Wollesen, "Predicting CMOS speed with gate oxide and voltage scaling and interconnect loading effects," IEEE Trans. Electron Devices, vol. 44, no. 11, pp. 1951-1957, Nov. 1997.
[7] E. Morifuji, M. Kanda, N. Yanagiya, S. Matsuda, S. Inaba, K. Okano, K. Takahashi, M. Nishigori, H. Tsuno, T. Yamamoto, K. Hiyama, M. Takayanagi, H. Oyamatsu, S. Yamada, T. Noguchi, and M. Kakumu, "High performance 30 nm bulk CMOS for 65 nm technology node (CMOS5)," in IEDM Tech. Dig., Dec. 8-11, 2002, pp. 655-658.
[8] P. Bai, C. Auth, S. Balakrishnan, M. Bost, R. Brain, V. Chikarmane, R. Heussner, M. Hussein, J. Hwang, D. Ingerly, R. James, J. Jeong, C. Kenyon, E. Lee, S. H. Lee, N. Lindert, M. Liu, Z. Ma, T. Marieb, A. Murthy, R. Nagisetty, S. Natarajan, J. Neirynck, A. Ott, C. Parker, J. Sebastian, R. Shaheed, S. Sivakumar, J. Steigerwald, S. Tyagi, C. Weber, B. Woolery, A. Yeoh, K. Zhang, and M. Bohr, "A 65 nm logic technology featuring 35 nm gate lengths, enhanced channel strain, 8 Cu interconnect layers, low-k ILD and 0.57 µm² SRAM cell," in IEDM Tech. Dig., Dec. 13-15, 2004, pp. 657-660.
[9] Z. Luo, A. Steegen, M. Eller, M. Mann, C. Baiocco, P. Nguyen, L. Kim, M. Hoinkis, V. Ku, V. Klee, F. Jamin, P. Wrschka, P. Shafer, W. Lin, S. Fang, A. Ajmera, W. Tan, D. Park, R. Mo, J. Lian, D. Vietzke, C. Coppock, A. Vayshenker, T. Hook, V. Chan, K. Kim, A. Cowley, S. Kim, E. Kaltalioglu, B. Zhang, S. Marokkey, Y. Lin, K. Lee, H. Zhu, M. Weybright, R. Rengarajan, J. Ku, T. Schiml, J. Sudijono, I. Yang, and C. Wann, "High performance and low power transistors integrated in 65 nm bulk CMOS technology," in IEDM Tech. Dig., Dec. 13-15, 2004, pp. 661-664.
[10] E. Morifuji, T. Yoshida, H. Tsuno, Y. Kikuchi, S. Matsuda, S. Yamada, T. Noguchi, and M. Kakumu, "New guideline of Vdd and Vth scaling for 65 nm technology and beyond," in VLSI Symp. Tech. Dig., Jun. 15-17, 2004, pp. 164-165.
[11] M. Kanda, E. Morifuji, M. Nishigoori, Y. Fujimoto, M. Uematsu, K. Takahashi, H. Tsuno, K. Okano, S. Matsuda, H. Oyamatsu, H. Takahashi, N. Nagashima, S. Yamada, T. Noguchi, Y. Okamoto, and M. Kakumu, "Highly stable 65 nm node (CMOS5) 0.56 µm² SRAM cell design for very low operation voltage," in VLSI Symp. Tech. Dig., Jun. 10-12, 2003, pp. 13-14.
[12] K. Agawa, H. Hara, T. Takayanagi, and T. Kuroda, "A bitline leakage compensation scheme for low-voltage SRAMs," IEEE J. Solid-State Circuits, vol. 36, no. 5, pp. 726-734, May 2001.
[13] A. J. Bhavnagarwala, T. Xinghai, and J. D. Meindl, "The impact of intrinsic device fluctuations on CMOS SRAM cell stability," IEEE J. Solid-State Circuits, vol. 36, no. 4, pp. 658-665, Apr. 2001.
[14] N. Yanagiya, S. Matsuda, S. Inaba, M. Takayanagi, I. Mizushima, K. Ohuchi, K. Okano, K. Takahasi, E. Morifuji, M. Kanda, Y. Matsubara, M. Habu, M. Nishigoori, K. Honda, H. Tsuno, K. Yasumoto, T. Yamamoto, K. Hiyama, K. Kokubun, T. Suzuki, J. Yoshikawa, T. Sakurai, T. Ishizuka, Y. Shoda, M. Moriuchi, M. Kishida, H. Matsumori, H. Harakawa, H. Oyamatsu, N. Nagashima, S. Yamada, T. Noguchi, H. Okamoto, and M. Kakumu, "65 nm CMOS technology (CMOS5) with high density embedded memories for broadband microprocessor applications," in IEDM Tech. Dig., Dec. 8-11, 2002, pp. 57-60.
[Fig. 1: FinFET device structure with front gate, back gate, source, drain, and fin height Hfin; labeled dimensions tsi = 8 nm, tox = 1.6 nm, L = 32 nm, 25.6 nm; oxide, heavily doped Si, and lightly doped Si regions.]
[Fig. 3 schematics: six-transistor cells P1, P2, N1-N4 with WL, BL, BLB, Node1, and Node2; all devices sized (1x32)/32 except the SRAM-TG2 pull-down transistors at (2x32)/32. Fig. 2 plot: IDS (A/µm) versus VGS, with about 2.6X higher current in dual-gate mode than in single-gate mode.]
Fig. 3. Two tied-gate FinFET SRAM cells. (a) SRAM-TG1: all six transistors are sized minimum. (b) SRAM-TG2: the pull-down transistors in the cross-coupled inverters have two fins. The size of each transistor is given as (number of fins × fin height)/channel length.
Fig. 2. Drain current characteristics of an N-type IG-FinFET for Vth = 0.25 V and Vth = 0.4 V. The drain-to-source voltage is 0.8 V. T = 70 °C.
[Fig. 4 schematics: SRAM-IG1 uses a single WL; SRAM-IG2 uses separate RW and W word lines; all transistors sized (1x32)/32.]
Fig. 4. The IG-FinFET SRAM cells. (a) SRAM-IG1. (b) SRAM-IG2. The size of each transistor is given as (number of fins × fin height)/channel length.
[Fig. 5 plot: read SNM values of 230 mV, 197 mV, 180 mV, and 120 mV across SRAM-TG1, SRAM-TG2, SRAM-IG1, and SRAM-IG2 at 27 °C and 70 °C. Fig. 7 plot: read delay, read power, write delay, and write power of each cell, normalized to SRAM-TG1.]
Fig. 7. The active mode power consumption and propagation delays of the
SRAM circuits. For each SRAM circuit, the power and delay are normalized
with respect to SRAM-TG1.
Fig. 5. The read SNM of the tied-gate and the independent-gate FinFET
SRAM cells.
D. Process Variations
The effect of process variations on the tied-gate and the
independent-gate SRAM cells is evaluated in this section. 1500
IEEE ICM - December 2007
[Monte Carlo histograms. Leakage power: SRAM-IG1/SRAM-IG2 mean = 12.5 nW, SD = 1.6 nW; SRAM-TG2 mean = 19.3 nW, SD = 2.7 nW; annotations 94.7% and 93.3%. Read SNM: SRAM-TG1 mean = 124 mV, SD = 8.4 mV; SRAM-TG2 mean = 196 mV, SD = 3 mV; SRAM-IG1 mean = 176 mV, SD = 4 mV; SRAM-IG2 mean = 222 mV, SD = 3.5 mV.]
Fig. 10. Layouts of the FinFET SRAM cells. (a) SRAM-TG1. (b) SRAM-TG2. (c) SRAM-IG1. (d) SRAM-IG2. SRAM-TG1, SRAM-IG1, and SRAM-IG2: 0.226 µm². SRAM-TG2: 0.254 µm².
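The Monte Carlo histograms summarized above can be mimicked with a toy first-order model; the nominal 196 mV value, the -0.8 mV/mV sensitivity, and the 10 mV Vth sigma below are illustrative assumptions, not values fitted to the simulations:

```python
import random
import statistics

# Toy Monte Carlo in the spirit of the SNM histograms: draw random Vth
# mismatch and map it to read SNM via an assumed first-order sensitivity.
random.seed(7)
NOMINAL_SNM_MV = 196.0    # nominal read SNM (illustrative)
SENS_MV_PER_MV = -0.8     # SNM shift per mV of Vth mismatch (assumed)
SIGMA_VTH_MV = 10.0       # Vth mismatch spread (assumed)

samples = [NOMINAL_SNM_MV + SENS_MV_PER_MV * random.gauss(0.0, SIGMA_VTH_MV)
           for _ in range(1500)]   # 1500 runs, matching the text

print(f"mean = {statistics.mean(samples):.1f} mV, "
      f"SD = {statistics.stdev(samples):.1f} mV")
```

In this linearized picture the SNM spread is just |sensitivity| times the Vth spread, which is why the stiffer IG/TG2 cells show much tighter histograms than SRAM-TG1.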