Reconfigurable 2T2R ReRAM Architecture For Versatile Data Storage and Computing In-Memory
Authorized licensed use limited to: National Central University. Downloaded on October 19,2023 at 09:10:13 UTC from IEEE Xplore. Restrictions apply.
CHEN et al.: RECONFIGURABLE 2T2R RERAM ARCHITECTURE FOR VERSATILE DATA STORAGE AND CIM 2637
Fig. 4. Proposed 2T2R bit-cell. (a) Schematic, (b) read operation, and
(c) writing operation for writing data “1.”
Fig. 5. TCAM operations. (a) Search example (the right column is a mismatch). (b) Simulated waveforms for match and 1-b-mismatch cases of the 2T2R TCAM; the 1-b mismatch curve is shown only for WDL = 32 since it is dominated by LRS and is similar for different WDLs. (c) Write example (write data = (0, X)).
The 2T2R ReRAM bit-cell requires two cycles for writing data. Fig. 4(c) shows an example of writing data “1.” Since the bit-cell contains two identical 1T1R cells, the same biasing conditions for writing the 1T1R bit-cell can be utilized. However, the shared BL architecture can disturb the ReRAM device programmed in the first cycle. To address this issue, we propose to erase both ReRAM devices by setting them in the first cycle. In the second cycle, either Q or QB is reset to HRS depending on the data to be written. The LRS device is not disturbed in the second cycle since the corresponding SL and BL are grounded.
B. TCAM Operations
TCAM is a critical component in many systems where fast searching is required. The proposed 2T2R ReRAM can operate as a 2T2R TCAM like the ones in [18] and [19] by storing words column-wise (Fig. 5). For TCAM search operation, BL is precharged while SL and SLB are grounded, which is different from the NVM access mode where SL and SLB are precharged and BL is grounded. Search data and inverted search data are applied to WLL and WLR, respectively. If all bits are matched, BL stays at the precharged level or discharges slowly because of the leakage current. If there is a mismatch, one of BL and BLB, or both BL and BLB will discharge quickly through the ReRAM device in LRS. The SA compares the BL voltage against a reference voltage VREF and generates the search result.
Fig. 5(a) explains an example of searching (0, 1). The left column stores (X, 1) and the right column stores (1, 1). Since the first column stores the matched data, BL[1] will be slowly discharged through the leakage current of the bit-cells in HRS, which is recognized as match[1] = “1.” However, BL[2] will be discharged below VREF quickly through the bit-cell in LRS and produce a search result indicated by match[2] = “0.” If more bits are mismatched, the discharge speed will be higher. Therefore, the worst-case scenario is when a 1-bit mismatch occurs. It should be noted that the leakage current during the match case limits the maximum TCAM word-length (WDL). Fig. 5(b) shows the BL voltage waveforms under a 1-bit mismatch case and match cases for different WDLs when BL precharge voltage = 0.5 V. With the increased WDL, it becomes more difficult to distinguish between the match case and 1-bit mismatch case due to the reduced voltage difference.
The write operation for TCAM takes two phases to write a word column-wise. HRS states are written in the first phase and LRS states are written in the second phase. An additional column decoder is necessary to select a column to be written. Since multiple cells in a column share the same BL and SL/SLB, the number of cells that can be written in parallel per cycle in each phase depends on the strength of the BL and SL/SLB drivers [39]. Therefore, it may take multiple cycles to program one TCAM word given the area constraints of TCAM write drivers. However, many TCAM applications such as neuromorphic circuits require infrequent writes, and the proposed TCAM is well-suited for such applications due to its nonvolatile feature that consumes zero standby power [18]. Fig. 5(c) illustrates writing a two-bit string (0, X). In phase 1, BL = 0 and SL = SLB = VRESET. In phase 2, BL = VSET and SL = SLB = 0. The WLL and WLR states in each phase are determined by the data to be written as shown in Table II.
C. LiM Operation
The proposed 2T2R structure can also compute Boolean logic functions between two words (X and Y) in memory. We utilize two address decoders to turn on two rows at the same time and use two single-ended SAs to generate LiM
2640 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 28, NO. 12, DECEMBER 2020
TABLE II
BIASING FOR TCAM WRITE OPERATION
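The column-wise TCAM search and the two-phase write described in the text (phase 1: BL = 0, SL = SLB = VRESET; phase 2: BL = VSET, SL = SLB = 0) can be sketched as a purely functional model. This is a minimal illustration, not the circuit itself. The mapping of the symbols 0/1/X onto the left and right devices in `ENCODING` is an assumption chosen so that the search rule (search data on WLL, inverted search data on WLR, mismatch when an enabled device is in LRS) reproduces the Fig. 5(a) example. The per-phase word-line pattern in `write_phases` is likewise hypothetical, since the body of Table II is not reproduced here.

```python
HRS, LRS = "HRS", "LRS"

# Assumed encoding of a stored ternary symbol as (left device, right device).
# It is inferred from the search rule, not taken from the paper's tables.
ENCODING = {"0": (LRS, HRS), "1": (HRS, LRS), "X": (HRS, HRS)}


def search_wordlines(bit):
    """Search data drives WLL and inverted search data drives WLR."""
    return bit == 1, bit == 0


def search(stored_word, search_bits):
    """Return True (match) when no enabled access transistor reaches an LRS device."""
    for symbol, bit in zip(stored_word, search_bits):
        left, right = ENCODING[symbol]
        wll, wlr = search_wordlines(bit)
        if (wll and left == LRS) or (wlr and right == LRS):
            return False  # BL discharges quickly through the LRS path
    return True  # only HRS leakage, so BL stays near the precharge level


def write_phases(stored_word):
    """Two-phase column write: HRS targets in phase 1, LRS targets in phase 2.

    The (WLL, WLR) pattern is hypothetical: a word line is assumed to be
    enabled only for a device that must be programmed in that phase.
    """
    cells = [ENCODING[symbol] for symbol in stored_word]
    phase1 = {"BL": "0", "SL": "VRESET", "SLB": "VRESET",
              "WL": [(left == HRS, right == HRS) for left, right in cells]}
    phase2 = {"BL": "VSET", "SL": "0", "SLB": "0",
              "WL": [(left == LRS, right == LRS) for left, right in cells]}
    return phase1, phase2


if __name__ == "__main__":
    # Fig. 5(a): searching (0, 1) against columns storing (X, 1) and (1, 1).
    print(search("X1", [0, 1]))  # left column
    print(search("11", [0, 1]))  # right column
    # Fig. 5(c): writing the two-bit string (0, X).
    for phase in write_phases("0X"):
        print(phase)
```

Under the assumed encoding, the left column of the Fig. 5(a) example matches and the right column mismatches on its first bit, in line with the text.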
Fig. 7. IM-DP for BNNs. (a) Sensing scheme. (b) Simplified architecture.
Fig. 11. SA offsets for two small SAs (SA1 and SA2 in Fig. 9), one large SA (SA1 and SA2 connected in parallel), and SA redundancy.
Fig. 15. Block diagram of the 256 × 128 array for simulation.
TABLE IV
OFFSETS OF DIFFERENT SA CONFIGURATIONS
Fig. 14. (a) Normalized tmax(VSM) versus RH/RL. (b) max(VSM) versus RH/RL at different VREAD.
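The trend in Fig. 14 can be reproduced qualitatively with a first-order model. The sketch below assumes a single-pole RC discharge, in which a BL precharged to VREAD decays as VREAD·exp(-t/(R·C)) through either RH or RL. This model, the function names, and the normalization are assumptions for illustration and are not necessarily the paper's own equation for the sensing time. Under this assumption, the margin exp(-t/(RH·C)) - exp(-t/(RL·C)) peaks at t = C·RH·RL·ln(RH/RL)/(RH - RL), and the peak depends only on the ratio k = RH/RL.

```python
import math


def t_max_normalized(k):
    """Sensing time that maximizes the margin, in units of RL*C, for k = RH/RL > 1."""
    return k * math.log(k) / (k - 1)


def max_margin_normalized(k):
    """Peak of exp(-t/(RH*C)) - exp(-t/(RL*C)), as a fraction of VREAD."""
    return k ** (-1.0 / (k - 1)) - k ** (-k / (k - 1))


if __name__ == "__main__":
    # Sweep the resistance ratio; the absolute margin scales linearly with VREAD.
    for k in (2, 4, 10, 50, 100, 150):
        print(f"RH/RL={k:>3}  t_max/(RL*C)={t_max_normalized(k):.2f}  "
              f"max(VSM)/VREAD={max_margin_normalized(k):.3f}")
```

In this simplified model the normalized margin rises steeply at small ratios and flattens at large ones, which is the same qualitative behavior the figure is described as showing.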
Fig. 14(b) shows the relationship between max(VSM) and RH/RL at different VREAD levels. When RH/RL is low, VSM shows a larger sensitivity to RH/RL. When RH/RL is high, VSM is less sensitive to RH/RL. In summary, it is desirable to provide high RH/RL for reliable BL sensing.
V. EVALUATION OF THE PROPOSED 2T2R RERAM
In this section, we evaluate the proposed 2T2R ReRAM under different operating modes. We first analyze the area and energy overheads of the proposed R-CIM architecture using HSPICE and a modified version of DESTINY [42]. Then we present how different optimization techniques help improve the robustness of the R-CIM system and reduce the energy consumption by setting a lower VREAD. The ReRAM device is modeled with Verilog-A and the model is calibrated using the data from [24] to get default LRS = 10 kΩ and HRS = 1 MΩ. The modeled ReRAM devices are integrated with transistors in 40-nm CMOS technology. We design an R-CIM array of 256 × 128 as shown in Fig. 15. Note that WLLs/WLRs are controlled by tristate buffers and signal “TCAM_EN.” In TCAM mode, WLLs/WLRs are driven by TCAM drivers instead of row decoders and the maximum allowed search WDL is 32 as explained in Section IV-D.
The sensing margin must be larger than the SA offset for reliable sensing. The SA offsets from Fig. 10 are summarized in Table IV. We consider SA offset variations of 4-sigma for an acceptable yield. Besides the SA offset, variations of access transistors of the 2T2R cell also need to be considered.
We perform 1000 Monte-Carlo simulations for TCAM search and ReRAM access operations for different VREAD. Fig. 16(a) and (b) show the simulation results for TCAM (VREAD = 0.5 V) and ReRAM access (VREAD = 0.3 V), respectively. The maximum variations of 32-bit match and 1-bit mismatch for TCAM have standard deviations equal to 5.8 and 7.3 mV, respectively. The variations due to access transistors for normal ReRAM read have smaller standard deviations. The transistor variations for LiM are similar to those of ReRAM read because LiM operations only access a small number of cells. The standard deviations for accessing HRS and LRS devices are 0.3 and 6.5 mV, respectively. As a result, TCAM has the worst VSM degradation due to transistor variations of the 2T2R bit-cell.
In the following analysis, results for the achievable VSM are based on the voltage difference developed between the worst Monte-Carlo case of two voltage levels that need to be distinguished, e.g., the lowest curve of 32-bit match and the highest curve of 1-bit mismatch in Fig. 16(a). To achieve the maximum sensing margin, the time to enable sensing is calculated using (7). Note that (7) can be close to the optimal sensing time since the resistance due to access transistors is much smaller compared with the ReRAM resistances. For single-ended sensing, the reference voltage is put in the middle of two different voltage levels after determining the sensing time.
For ReRAM variations, the LRS value is 10 kΩ with a 20% variation through all simulations. We use different HRS values to generate HLRs of 150, 100, and 50 to evaluate VSM with respect to HRS variations. This HRS variation is adopted from [18] that characterizes a 2T2R TCAM and the reported ±2.5σ HRS corresponds to 50% variation with respect to the
Fig. 16. BL discharge variations due to access transistors for (a) TCAM and (b) ReRAM access operations.
Fig. 17. Overhead of the proposed R-CIM. (a) Area. (b) Energy.
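The worst-case margin extraction described for Fig. 16(a) takes the lowest curve of the higher level against the highest curve of the lower level, then places the reference midway between them. It can be sketched as below. The function names and the sample voltages are hypothetical and do not come from the paper's simulations. The sketch also assumes that the full worst-case voltage difference is compared against the 4-sigma SA offset; the text does not state whether VSM denotes the full or the half difference.

```python
def worst_case_margin(high_level_samples, low_level_samples):
    """Return (margin, v_ref) from Monte-Carlo BL voltages sampled at the sensing time.

    high_level_samples: BL voltages of the level that should read high (e.g., match).
    low_level_samples:  BL voltages of the level that should read low (e.g., 1-bit mismatch).
    """
    worst_high = min(high_level_samples)   # lowest curve of the high level
    worst_low = max(low_level_samples)     # highest curve of the low level
    margin = worst_high - worst_low
    v_ref = (worst_high + worst_low) / 2.0  # mid-point reference for single-ended sensing
    return margin, v_ref


def exceeds_sa_offset(margin, sa_offset_sigma, n_sigma=4.0):
    """Assumed criterion: the margin must exceed n_sigma times the SA offset sigma."""
    return margin > n_sigma * sa_offset_sigma


if __name__ == "__main__":
    # Hypothetical BL voltages in volts, for illustration only.
    match_32b = [0.412, 0.405, 0.398]
    mismatch_1b = [0.221, 0.236, 0.229]
    margin, v_ref = worst_case_margin(match_32b, mismatch_1b)
    print(f"margin = {margin * 1e3:.0f} mV, VREF = {v_ref * 1e3:.0f} mV")
    print("reliable:", exceeds_sa_offset(margin, sa_offset_sigma=0.02))
```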
Fig. 18. TCAM robustness evaluation. (a) VSM versus VREAD for different HLRs at room temperature. (b) VSM at different temperatures and HLRs. (c) Search energy (normalized) with respect to VREAD.
increases. With nominal HLR, low VREAD (e.g., 400 mV) still gives enough VSM at high temperatures. However, if considering HRS degradation, a high VREAD (e.g., 600 mV) must be used to provide enough VSM at high temperatures if not employing SA redundancy. On the contrary, our optimization of employing SA redundancy reduces the VSM requirement and allows TCAM to operate at VREAD = 400 mV under high temperatures. Note that TCAM has the worst VSM, requiring higher VREAD. The other operations of the proposed ReRAM architecture can have lower VREAD.
The normalized TCAM search energy (including peripheral logic) with respect to VREAD is presented in Fig. 18(c). When HLR = 150 at room temperature, SA redundancy allows VREAD to be lowered from 300 to 200 mV as depicted in Fig. 18(a). This improves the TCAM search energy by 1.2×. When the HLR decreases because of HRS degradation, the SA
C. LiM Evaluation
We evaluate the robustness of the proposed LiM primitive operations (e.g., in-memory AND/NOR) as well as the performance and energy consumption of the LiM-FA. Fig. 19 shows the achievable VSM versus VREAD for LiM primitive operations. The VSM of LiM primitive operations is much larger than that of the TCAM search operation because of the increased RH/RL ratio obtained from the same HRS and LRS values. For example, when computing the in-memory AND function, RH is obtained from two HRS in parallel while RL is one LRS and one HRS in parallel, which is approximately one LRS. This causes only a 2× reduction in RH/RL compared to the HLR of the ReRAM device. For instance, with LRS = 10 kΩ and HLR = 50 (HRS = 500 kΩ), RH = 250 kΩ and RL ≈ 9.8 kΩ, so RH/RL ≈ 25. Moreover, VSM is much less sensitive to HRS degradation because of the high RH/RL ratio (>15). Based on the proposed SA analysis, the target VSM is 100 mV without employing the SA redundancy, giving the minimum VREAD of 150 mV for HLR = 50.
Table V compares various FA designs. The LiM-FA [10], [12] and CMOS FA [47] are simulated with 40-nm technology. Data for other designs are directly obtained from the relevant articles. Compared with other LiM-FAs [10], [12], the proposed LiM-FA achieves 3.2×, 1.2×, and 1.6× improvements in the delay, the static power, and the dynamic power, respectively. Compared with CMOS FA, the proposed LiM-FA has slightly worse dynamic power due to more levels of transition, but the delay and the static power are 1.34× and 8.9× better with fewer transistors. We also compare our LiM-FA with several FAs based on nonvolatile devices such as magnetic tunnel junctions (MTJs) [44], ferroelectric tunnel junctions (FTJs) [45], and ferroelectric field-effect transistors (FeFETs) [46]. Regarding performance, our LiM-FA is only slightly worse than the FA based on FeFET [46]. This is because the FA in [46] uses the dynamic logic design style with only a pull-down NMOS network, therefore less capaci-
TABLE V
DELAY AND POWER CONSUMPTION OF DIFFERENT FA DESIGNS
TABLE VI
COMPARISON WITH RECENT RERAM AND R-CIM WORKS
different types of variations. Note that the lowest VREAD = 0.1 V for NVM read will affect the read speed [35]. Fortunately, 0.1 V is far less than the SET/RESET voltage of ReRAM devices (>0.5 V) and it leaves a large space for the tradeoff between speed and power while ensuring the ReRAM reliability. As reported in [35], which is also based on ReRAM integrated with 40-nm technology, the read speed is improved by more than 2× when VREAD increases from 0.18 to 0.26 V. The work in [18] uses VREAD = 0.6 V with RH/RL ratio ≅ … during TCAM search. However, with VREAD = 0.6 V, significant read disturbance may occur during TCAM search and degrade the ReRAM reliability [31], [36]. On the contrary, we perform optimizations for TCAM using SA redundancy and lower the required VREAD to 0.4 V. Although a lower VREAD will decrease the search speed, it improves the reliability of ReRAM devices and reduces the search energy. When the robustness of R-CIM systems is of major concern, designers should not just care about speed and power consumption since the reliability of ReRAM devices is also important when performing different CIM operations. Therefore, it is necessary to perform different optimizations to ensure that optimal VREAD can be selected without significantly disturbing ReRAM devices. This work provides a guide for such optimizations.
VI. CONCLUSION
In this article, we proposed a reconfigurable 2T2R ReRAM architecture to support three types of CIM operations: 1) TCAM; 2) LiM; and 3) IM-DP. We proposed a configurable data storage strategy to allow the 2T2R ReRAM to operate as conventional 1T1R ReRAM in situations where CIM is not required. We performed optimizations for the proposed R-CIM using existing and novel design techniques to improve its robustness and efficiency. We quantitatively analyzed the robustness of the proposed R-CIM with respect to the precharge voltage (VREAD) and the ReRAM ON/OFF ratio. With the proposed optimizations, the TCAM search energy can be reduced by 1.6× with better reliability thanks to the lower VREAD. The proposed LiM-FA improves the delay (3.2×), the static power (1.2×), and the dynamic power (1.6×) compared with the state-of-the-art LiM-FA. Combining optimizations with robustness analysis, the same VREAD for ReRAM access can be set in 2T2R and 1T1R configurations. A lower VREAD in 1T1R configuration gives 1.14× lower access energy.
REFERENCES
[1] T.-K.-J. Ting et al., “An 8-channel 4.5 Gb 180GB/s 18ns-row-latency RAM for the last level cache,” in IEEE ISSCC Dig. Tech. Papers, Feb. 2017, pp. 404–405.
[2] J. Wang et al., “A 28-nm compute SRAM with bit-serial logic/arithmetic operations for programmable in-memory vector computing,” IEEE J. Solid-State Circuits, vol. 55, no. 1, pp. 76–86, Jan. 2020.
[3] M. F. Ali, A. Jaiswal, and K. Roy, “In-memory low-cost bit-serial addition using commodity DRAM technology,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 67, no. 1, pp. 155–165, Jan. 2020.
[4] Y. Chen, “ReRAM: History, status, and future,” IEEE Trans. Electron Devices, vol. 67, no. 4, pp. 1420–1433, Apr. 2020.
[5] M. M. S. Aly et al., “The N3XT approach to energy-efficient abundant-data computing,” Proc. IEEE, vol. 107, no. 1, pp. 19–48, Jan. 2019.
[6] T. F. Wu et al., “A 43pJ/cycle non-volatile microcontroller with 4.7μs shutdown/wake-up integrating 2.3-bit/cell resistive RAM and resilience techniques,” in IEEE ISSCC Dig. Tech. Papers, Feb. 2019, pp. 226–228.
[7] S. Jeloka, N. B. Akesh, D. Sylvester, and D. Blaauw, “A 28 nm configurable memory (TCAM/BCAM/SRAM) using push-rule 6T bit cell enabling logic-in-memory,” IEEE J. Solid-State Circuits, vol. 51, no. 4, pp. 1009–1021, Apr. 2016.
[8] W.-H. Chen et al., “A 65 nm 1 Mb nonvolatile computing-in-memory ReRAM macro with sub-16 ns multiply-and-accumulate for binary DNN AI edge processors,” in IEEE ISSCC Dig. Tech. Papers, Feb. 2018, pp. 494–495.
[9] C. Yu, T. Yoo, T. T.-H. Kim, K. C. T. Chuan, and B. Kim, “A 16K current-based 8T SRAM compute-in-memory macro with decoupled read/write and 1-5bit column ADC,” in Proc. IEEE Custom Integr. Circuits Conf. (CICC), Mar. 2020, pp. 1–4.
[10] S. Jain, A. Ranjan, K. Roy, and A. Raghunathan, “Computing in memory with spin-transfer torque magnetic RAM,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 26, no. 3, pp. 470–483, Mar. 2018.
[11] D. Reis, M. Niemier, and X. S. Hu, “Computing in memory with FeFETs,” in Proc. Int. Symp. Low Power Electron. Design, Jul. 2018, pp. 1–6.
[12] S. K. Thirumala, S. Jain, A. Raghunathan, and S. K. Gupta, “Non-volatile memory utilizing reconfigurable ferroelectric transistors to enable differential read and energy-efficient in-memory computation,” in Proc. IEEE/ACM Int. Symp. Low Power Electron. Design (ISLPED), Jul. 2019, pp. 1–6.
[13] M. Bocquet et al., “In-memory and error-immune differential RRAM implementation of binarized deep neural networks,” in IEDM Tech. Dig., Dec. 2018, pp. 20-1–20-6.
[14] C.-X. Xue et al., “A 1 Mb multibit ReRAM computing-in-memory macro with 14.6 ns parallel MAC computing time for CNN based AI edge processors,” in IEEE ISSCC Dig. Tech. Papers, Feb. 2019, pp. 388–389.
[15] W. Wan et al., “A 74 TMACS/W CMOS-RRAM neurosynaptic core with dynamically reconfigurable dataflow and in-situ transposable weights for probabilistic graphical models,” in IEEE ISSCC Dig. Tech. Papers, Feb. 2020, pp. 498–499.
[16] L. Zheng, S. Shin, and S.-M.-S. Kang, “Memristors-based ternary content addressable memory (mTCAM),” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), Jun. 2014, pp. 2253–2256.
[17] M.-F. Chang et al., “Designs of emerging memory based non-volatile TCAM for Internet-of-Things (IoT) and big-data processing: A 5T2R universal cell,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2016, pp. 1142–1145.
[18] D. R. B. Ly et al., “In-depth characterization of resistive memory-based ternary content addressable memories,” in IEDM Tech. Dig., Dec. 2018, pp. 20-1–20-3.
[19] J. Li, R. K. Montoye, M. Ishii, and L. Chang, “1 Mb 0.41 μm² 2T-2R cell nonvolatile TCAM with two-bit encoding and clocked self-referenced sensing,” IEEE J. Solid-State Circuits, vol. 49, no. 4, pp. 896–907, Apr. 2014.
[20] Y. Zha, E. Nowak, and J. Li, “Liquid silicon: A nonvolatile fully programmable processing-in-memory processor with monolithically integrated ReRAM for big data/machine learning applications,” in Proc. Symp. VLSI Circuits, Jun. 2019, pp. C206–C207.
[21] M.-F. Chang et al., “Challenges and circuit techniques for energy-efficient on-chip nonvolatile memory using memristive devices,” IEEE J. Emerg. Sel. Topics Circuits Syst., vol. 5, no. 2, pp. 183–193, Jun. 2015.
[22] P.-Y. Chen and S. Yu, “Compact modeling of RRAM devices and its applications in 1T1R and 1S1R array design,” IEEE Trans. Electron Devices, vol. 62, no. 12, pp. 4022–4028, Dec. 2015.
[23] R. Waser, R. Dittmann, G. Staikov, and K. Szot, “Redox-based resistive switching memories–nanoionic mechanisms, prospects, and challenges,” Adv. Mater., vol. 21, nos. 25–26, pp. 2632–2663, Jul. 2009.
[24] K.-S. Li et al., “Utilizing sub-5 nm sidewall electrode technology for atomic-scale resistive memory fabrication,” in Symp. VLSI Technol. (VLSI-Technol.), Dig. Tech. Papers, Jun. 2014, pp. 1–2.
[25] H. Y. Lee et al., “Low power and high speed bipolar switching with a thin reactive Ti buffer layer in robust HfO2 based RRAM,” in IEDM Tech. Dig., Dec. 2008, pp. 1–4.
[26] W. Kim et al., “Forming-free nitrogen-doped AlOX RRAM with sub-μA programming current,” in IEEE Symp. VLSI Technol. Dig. Tech. Papers, Jun. 2011, pp. 22–23.
[27] A. Grossi et al., “Fundamental variability limits of filament-based RRAM,” in IEDM Tech. Dig., Dec. 2016, pp. 4-1–4-7.
[28] E. Vianello et al., “Resistive memories for ultra-low-power embedded computing design,” in IEDM Tech. Dig., Dec. 2014, pp. 6-1–6-3.
[29] A. Fantini et al., “Intrinsic program instability in HfO2 RRAM and consequences on program algorithms,” in IEDM Tech. Dig., Dec. 2015, pp. 7-1–7-5.
[30] Y. Chen et al., “Balancing SET/RESET pulse for >10^10 endurance in HfO2/Hf 1T1R bipolar RRAM,” IEEE Trans. Electron Devices, vol. 59, no. 12, pp. 3243–3249, Dec. 2012.
[31] W. C. Chien et al., “A forming-free WOx resistive memory using a novel self-aligned field enhancement feature with excellent reliability and scalability,” in IEDM Tech. Dig., Dec. 2010, pp. 19-1–19-2.
[32] B. Govoreanu et al., “10×10 nm² Hf/HfOx crossbar resistive RAM with excellent performance, reliability and low-energy operation,” in IEDM Tech. Dig., Dec. 2011, pp. 31-1–31-6.
[33] P. Jain et al., “A 3.6 Mb 10.1Mb/mm² embedded non-volatile ReRAM macro in 22 nm FinFET technology with adaptive forming/set/reset schemes yielding down to 0.5 V with sensing time of 5ns at 0.7 V,” in IEEE ISSCC Dig. Tech. Papers, Feb. 2019, pp. 212–213.
[34] O. Golonzka et al., “Non-volatile RRAM embedded into 22FFL FinFET technology,” in Proc. Symp. VLSI Technol., Jun. 2019, pp. 230–231.
[35] C.-C. Chou et al., “An N40 256K × 44 embedded RRAM macro with SL-precharge SA and low-voltage current limiter to improve read and write performance,” in IEEE ISSCC Dig. Tech. Papers, Feb. 2018, pp. 478–479.
[36] H. Lv et al., “BEOL based RRAM with one extra-mask for low cost, highly reliable embedded application in 28 nm node and beyond,” in IEDM Tech. Dig., Dec. 2017, pp. 2–4.
[37] M.-F. Chang et al., “Embedded 1Mb ReRAM in 28 nm CMOS with 0.27-to-1 V read using swing-sample-and-couple sense amplifier and self-boost-write-termination scheme,” in IEEE ISSCC Dig. Tech. Papers, Feb. 2014, pp. 332–333.
[38] Q. Liu et al., “A fully integrated analog ReRAM based 78.4TOPS/W compute-in-memory chip with fully parallel MAC computing,” in IEEE ISSCC Dig. Tech. Papers, Feb. 2020, pp. 500–501.
[39] Q. Guo, X. Guo, Y. Bai, and E. Ipek, “A resistive TCAM accelerator for data-intensive computing,” in Proc. 44th Annu. IEEE/ACM Int. Symp. Microarchitecture (MICRO), Dec. 2011, pp. 339–350.
[40] I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, “Binarized neural networks,” in Proc. 30th Conf. Neural Inf. Process. Syst. (NIPS), Dec. 2016, pp. 4107–4115.
[41] N. Verma and A. P. Chandrakasan, “A 65 nm 8T sub-Vt SRAM employing sense-amplifier redundancy,” in IEEE ISSCC Dig. Tech. Papers, Feb. 2007, pp. 328–329.
[42] M. Poremba, S. Mittal, D. Li, J. S. Vetter, and Y. Xie, “DESTINY: A tool for modeling emerging 3D NVM and eDRAM caches,” in Proc. Design, Autom. Test Eur. Conf. Exhib. (DATE), Mar. 2015, pp. 1543–1546.
[43] M. E. Sinangil and A. P. Chandrakasan, “Application-specific SRAM design using output prediction to reduce bit-line switching activity and statistically gated sense amplifiers for up to 1.9× lower energy/access,” IEEE J. Solid-State Circuits, vol. 49, no. 1, pp. 107–117, Jan. 2014.
[44] S. Matsunaga et al., “Fabrication of a nonvolatile full adder based on logic-in-memory architecture using magnetic tunnel junctions,” Appl. Phys. Express, vol. 1, Aug. 2008, Art. no. 091301.
[45] Z. Wang et al., “A physics-based compact model of ferroelectric tunnel junction for memory and logic design,” J. Phys. D, Appl. Phys., vol. 47, no. 4, Dec. 2013, Art. no. 045001.
[46] X. Yin, X. Chen, M. Niemier, and X. S. Hu, “Ferroelectric FETs-based nonvolatile logic-in-memory circuits,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 27, no. 1, pp. 159–172, Jan. 2019.
[47] Deepa and V. S. Kumar, “Analysis of energy efficient PTL based full adders using different nanometer technologies,” in Proc. 2nd Int. Conf. Electron. Commun. Syst. (ICECS), Feb. 2015, pp. 310–315.
Yuzong Chen received the B.Eng. degree in electrical and electronic engineering from Nanyang Technological University, Singapore, in 2019.
He is currently a Project Officer with the Centre for Integrated Circuits and Systems (CICS), Nanyang Technological University. His research interests include resistive random access memory (ReRAM) circuits design and in-memory computing.
Lu Lu (Student Member, IEEE) received the B.E. degree from the School of Computer and Information, Hefei University of Technology, Hefei, China, in 2007, and the M.E. degree from the School of Microelectronics and Solid-State Electronics, Xiamen University, Xiamen, China, in 2010. She is currently working toward the Ph.D. degree at the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore.
Her research interests include low-power SRAM and SRAM-based physical unclonable function (PUF).
Ms. Lu was a recipient of the IEEE SSCS Singapore Chapter Award in 2018.
Bongjin Kim (Member, IEEE) received the B.S. and M.S. degrees from POSTECH, Pohang, South Korea, in 2004 and 2006, respectively, and the Ph.D. degree from the University of Minnesota, Minneapolis, MN, USA, in 2014.
He spent two years with Rambus, Sunnyvale, CA, USA, where he was a Senior Staff Member and worked on the research of high-speed serial link circuits and microarchitectures. He worked as a Postdoctoral Fellow with Stanford University, Stanford, CA, for a year. From 2006 to 2010, he was with Samsung Electronics, Yongin, South Korea, where he performed the research on clock generators for high-speed serial links. He was also a Research Intern with Texas Instruments, Dallas, TX, USA, IBM TJ Watson Research, Yorktown Heights, NY, USA, and Rambus, during his Ph.D., from 2012 to 2014. He joined Nanyang Technological University (NTU), Singapore, in September 2017, as an Assistant Professor. His current research interests include memory-centric computing circuits and architectures, hardware accelerators, and mixed-signal circuit design techniques and methodologies.
Dr. Kim was a recipient of the Prestigious Doctoral Dissertation Fellowship Award for his Ph.D. study, the Low Power Design Contest Award from ISLPED, and the Intel/IBM/Catalyst Foundation Award from CICC Conference. His research works appeared at top circuit conferences and journals including ISSCC, IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, CICC, ESSCIRC, and IEEE JOURNAL OF SOLID-STATE CIRCUITS (JSSC).
Tony Tae-Hyoung Kim (Senior Member, IEEE) received the B.S. and M.S. degrees in electrical engineering from Korea University, Seoul, South Korea, in 1999 and 2001, respectively, and the Ph.D. degree in electrical and computer engineering from the University of Minnesota, Minneapolis, MN, USA, in 2009.
From 2001 to 2005, he was with Samsung Electronics, Hwasung, South Korea, where he performed the research on the design of high-speed SRAM memories, clock generators, and IO interface circuits. From 2007 to 2009, he was with the IBM T. J. Watson Research Center, Yorktown Heights, NY, USA, and Broadcom Corporation, Edina, MN, USA, where he performed the research on circuit reliability, low-power SRAM, and battery-backed memory design. In 2009, he joined Nanyang Technological University, Singapore, where he is currently an Associate Professor. He has authored or coauthored over 160 journal and conference articles and holds 17 U.S. and Korean patents registered. His current research interests include low-power and high-performance digital, mixed-mode, and memory circuit design, ultralow-voltage circuits and systems design, variation and aging-tolerant circuits and systems, and circuit techniques for 3-D ICs.
Dr. Kim received the Best Demo Award at APCCAS2016, the Low Power Design Contest Award at ISLPED2016, the best paper awards at 2014 and 2011 ISOCC, the AMD/CICC Student Scholarship Award at the IEEE CICC2008, the Departmental Research Fellowship from the University of Minnesota in 2008, the DAC/ISSCC Student Design Contest Award in 2008, the Samsung Humantec Thesis Award in 2008, 2001, and 1999, and the ETRI Journal Paper of the Year Award in 2005. He was the Chair of the IEEE Solid-State Circuits Society Singapore Chapter. He has served on numerous conferences as a Committee Member. He serves as an Associate Editor for the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, IEEE ACCESS, and the IEIE Journal of Semiconductor Technology and Science.