
1626 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS, VOL. 71, NO. 3, MARCH 2024

An Overview of Computing-in-Memory Circuits With DRAM and NVM
Sangjin Kim , Graduate Student Member, IEEE, and Hoi-Jun Yoo , Fellow, IEEE

Abstract—Computing-in-memory (CIM) has emerged as an energy-efficient hardware solution for machine learning and AI. While static random access memory (SRAM)-based CIM has been prevalent, growing attention is directed towards leveraging dynamic random access memory (DRAM) and non-volatile memory (NVM) with their unique characteristics, such as high density and non-volatility. This brief reviews the evolving trends in DRAM and NVM-based CIM, which, despite their advantages, face unique challenges not present in SRAM. For instance, the DRAM cell's density comes with leakage and refresh issues, impacting efficiency and computing accuracy. NVM-CIM faces computing accuracy challenges from resistance-based computation with low signal margins and non-linear characteristics. This tutorial discusses the current status and future directions of DRAM-CIM and NVM-CIM research that address the abovementioned challenges.

Index Terms—Computing-in-memory, dynamic random access memory, non-volatile memory, magnetic random access memory, resistive random access memory, phase change memory, hardware for artificial intelligence.

I. INTRODUCTION

NOWADAYS, there is a rising surge of interest in Computing-in-memory (CIM) as a potent energy-efficient hardware solution for machine learning (ML) and artificial intelligence (AI) acceleration [1], [2]. CIM involves the integration of high-efficiency computational logic within the memory array, thereby leading to a substantial reduction in memory and computation energy and a marked enhancement in the energy efficiency of AI/ML applications. Early CIM architectures emerged using NVM, with its straightforward mapping [3], and DRAM, with its high density [4]. However, in recent years, attention to static random access memory (SRAM)-based implementations has increased rapidly [5], [6], [7], [8], [9], [10], [11], [12], [13], [14]. The rationale behind this trend is rooted in SRAM's characteristic of stably retaining data through static storage utilizing cross-coupled inverter logic. Thanks to the stable storage, various computational methods can be used, and their efficiency and accuracy have been steadily improved. Moreover, SRAM boasts the advantages of low power consumption for read and write operations, alongside the availability of high-density pushed-rule cells across various manufacturing processes.

Fig. 1. The advantage and the challenge of the DRAM-CIM and NVM-CIM.

Despite its advantages, SRAM-CIM has inherent drawbacks compared to DRAM-CIM and NVM-CIM that deserve consideration. First, in addition to the 6 transistors of the SRAM cell, SRAM-CIM requires additional transistors for computing logic. Therefore, a relatively high transistor count (6 to 18) [5], [6], [7], [8], [9], [10], [11], [12], [13], [14] is required per unit MAC, and ×1.4 to ×2 [12], [13] more area is consumed. Second, SRAM-CIM is confined to storing only 1 bit in a single cell, which limits the weight density because of the cross-coupled storage node. Finally, SRAM is unsuitable for applications requiring long-term data storage, as it loses data when the power is turned off. This volatility makes it less practical for event-driven applications.

As the field of CIM progresses, these limitations underscore the need to explore alternative memory technologies, and DRAM and NVM-based CIM are being reexamined. DRAM and NVM offer unique advantages and might address some

Manuscript received 1 September 2023; revised 24 October 2023; accepted 14 November 2023. Date of publication 17 November 2023; date of current version 5 March 2024. This work was supported by the Institute of Information and Communications Technology Planning and Evaluation (IITP) Grant funded by the Korean Government (Ministry of Science and Information and Communication Technology (MSIT), Processing-in-Memory (PIM) Semiconductor Design Research Center) under Grant 2022-0-01170. This brief was recommended by Associate Editor Y. Ha. (Corresponding author: Hoi-Jun Yoo.)
The authors are with the School of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, South Korea (e-mail: [email protected]; [email protected]).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TCSII.2023.3333851.
Digital Object Identifier 10.1109/TCSII.2023.3333851
1549-7747 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: Indian Institute of Technology Indore. Downloaded on March 21,2024 at 07:20:17 UTC from IEEE Xplore. Restrictions apply.
challenges of the SRAM-CIM. For example, a DRAM cell can be implemented with even a single transistor and capacitor, achieving high memory density. Also, DRAM can be implemented both on DRAM-dedicated processes for higher density and on logic processes as embedded DRAM (eDRAM) for logic compatibility and high availability. In addition, NVM can achieve low standby power, high density with multi-level cells, and low system power by eliminating the initial data writing. Therefore, NVM-CIM can provide a highly energy-efficient event-driven system by using a power-off mode while maintaining the neural network (NN) model on-chip. In the latest CIM research, implementations using DRAM [15], [16], [17], [18], [19], [20], [21], [22] and NVM [23], [24], [25], [26], [27], [28], [29] have been proposed for higher performance and functionality.

Although DRAM and NVM offer advantages in terms of memory density and functionality, the integration of CIM gives rise to fresh challenges. In contrast to the stability of weight data in SRAM-based implementations, the stored data within a cell becomes less stable in DRAM due to leakage, and less easy to use in NVM, where data is stored as resistance. To handle these new characteristics, new circuits and datapaths have been developed to manage the leakage and noise of DRAM and the non-linearity and low signal margin of NVM.

This tutorial will cover the challenges and trends of the computation methods in DRAM and NVM-based CIM. Also, the future directions and possible challenges of DRAM and NVM-CIM will be discussed.

II. DRAM-CIM

In the case of DRAM cells, the storage node is implemented using only a capacitor instead of the cross-coupled logic of SRAM. This design enables a cell with a minimal structure of just one transistor and one capacitor. However, the charge on the capacitor can gradually degrade due to leakage or be impacted by noise. In a conventional read, a sense amplifier detects this charge as 0 or 1, ensuring accurate data transmission from the cell to the computing logic without errors. In DRAM-CIM, where computing logic is integrated within the DRAM memory, data in the cell can be used directly by computing logic inside the cell array, and therefore faces the challenge of leakage and noise.

A. 1T-1C DRAM

The 1T-1C DRAM cell is the most broadly used because it achieves the highest memory density with a minimal number of transistors per cell. Because the design cost of changing the internal structure of the cell array is high, digital processing-in-memory (digital-PIM) and processing-near-memory (PNM) have been the most widely studied. Digital-PIM [30], [31] integrates the computing logic after the sense amplifier (SA), and PNM integrates the computing logic on the buffer device of a Dual In-line Memory Module (DIMM) [32] or on the logic die of 3D DRAM [4]. However, the energy efficiency and latency improvements of digital-PIM and PNM are limited, since the computing logic is not actually placed inside the cell array and the SA operates for every data read.

In [33], [34], [35], [36], [37], methods of performing boolean operations on the bit line (BL) or SA by activating multiple rows at a time during a read operation were proposed. These approaches use the bitwise operations to support more complex digital operations built from primitive boolean operations. Additionally, more bandwidth is available since the computation is performed inside the cell array, and the computation efficiency can be improved further than in digital-PIM.

Recent research [15], [16] applied analog operation in DRAM-CIM to achieve higher efficiency and parallelism. In [15], the capacitor of the 1T-1C cell is used to create the logic necessary to multiply and accumulate (MAC) the input activation (IA) and weight (W) of the NN model. First, an analog voltage corresponding to the IA value is generated using charge sharing between cell capacitors and written to the cell. Then, charge sharing is performed between the cells according to the weight value to perform MAC with multi-bit IA and W. Also, [16] adopts the spiking neural network (SNN), representing the IA as a spike instead of a scalar. Therefore, the SNN operation consists of a simple integrate-and-fire operation instead of a multiply-and-accumulate operation. [16] implements the integration operation using charge sharing of cell columns and the firing operation as an SA operation.

B. Gain-Cell DRAM

The 1T-1C cell uses a single bit line for read and write operations, resulting in destructive reads. The data in the cell is destroyed by charge sharing during every read or computation, and an SA operation is required to recover it. As a result, 1T-1C DRAM-CIM consumes considerable power through frequent SA operations. The gain-cell structure is devised to separate the read bit line (RBL) and write bit line (WBL) by using an additional transistor. Therefore, although gain cells show lower density than the 1T-1C cell, the non-destructive read path can be used as a computing datapath with less frequent SA operation. Also, DRAM-CIM with gain cells uses 2 to 4 transistors for a single cell, which is still lower than that of SRAM-CIM with 6 to 18 transistors.

The 4T-2C cell proposed in [20] is a pair of 2T-1C gain cells for complementary storage with signed number representation. Each 2T-1C cell pair performs multiplication of a 1-b IA and a ternary W, as shown in Fig. 2(a). The driving transistor is connected to the storage node and drives current onto the RBL only when its cell value (W or its complement W̄) is H and the IA on the word line (WL) is L. Also, by activating 64 rows in parallel, the multiplication results from each cell can be accumulated with a current-based operation. However, process variation may introduce current mismatch among the cells.

To address this issue, as shown in Fig. 2(b), the cell array proposed in [17] removes the row parallelism and integrates a self-detect voltage clipper below every row. After the 1-b multiplication by the cell, the clipper logic clips the RBL voltage to a predefined reference voltage. After that, the RBL voltages of every column are accumulated by charge sharing. This operation uses only column parallelism, but the impact of leakage and variation can be eliminated by the clipping.

To maintain higher row parallelism while reducing the impact of variation and leakage, [19] proposed an operation with a segmented BL. As shown in Fig. 2(c), the RBL is divided into multiple row segments with transmission gates instead of sharing the charge among the columns. Each segment's RBL voltage is accumulated with charge sharing after

TABLE I
COLLECTION OF THE DRAM-PIM

Fig. 2. Operation of the DRAM-CIM with gain cells. (a) 4T-2C current-based. (b) 2T-1C charge sharing-based (BL). (c) 2T-1C charge sharing-based (sub-BL). (d) 3T-1C current-based. (e) 3T-2C capacitive coupling-based.

one row per segment performs the 1-b multiplication. Therefore, all segments can be operated in parallel. Also, [19] discovered that most computation errors, such as leakage-induced error and coupling noise from the control and word lines, are common-mode errors. Therefore, [19] uses one more cell array as a reference cell array, whose output serves as the reference voltage of the analog-digital converter (ADC) to remove the common-mode error.

3T cell designs [18], [21], [22] have been proposed for more functionality or parallelism than 2T cells. The 3T gain-cell proposed in [18] stores a 4-bit weight in a single cell as an analog voltage on a storage capacitor. Also, two transistors are serially connected to the read bit line, as shown in Fig. 2(d). The driving transistor (Q1), connected to the storage node, enables multi-bit weight operation by changing the current amount depending on the analog voltage on the storage node. Also, the other transistor (Q2) supports multi-bit IA operation by changing the turn-on time with the pulse width. As a result, a single cell stores a 4-b weight and performs analog 4b-4b multiplication. However, to store the 4-b analog weight, a large capacitor is required to reduce the impact of leakage and noise, which limits the cell density. Also, the current-based operation limits the accuracy due to the nonlinearity and variation of the transistor.

The 3T-2C cell in Fig. 2(e) achieves higher cell density using a novel leakage-tolerant computing method [21], [22]. It tailors the capacitive coupling-based operation of SRAM-CIM [12] into an operation for DRAM to enable high matching and linearity. Usually, the capacitive coupling-based operation consists of multiplication in the cell and accumulation using a capacitor. The 3T-2C cell in [21], [22] separates the voltage domain of the multiplication logic in the cell, and the multiplication can be done in the digital domain. Therefore, the leakage noise on the storage node cannot affect the 1-b multiplication and accumulation results. Finally, the storage capacitor does not need to be excessively large, resulting in high cell density.

III. NVM-CIM

NVMs, such as resistive random access memory (ReRAM), magnetic random access memory (MRAM), and phase-change memory (PCM), represent data as a resistance in the cell. ReRAM, MRAM, and PCM differ in their mechanisms and performance characteristics. ReRAM uses a metal-insulator-metal structure with a metal oxide, MRAM uses manipulation of the magnetization direction of a magnetic tunnel junction (MTJ), and PCM exploits the reversible phase change between the amorphous and crystalline states of a chalcogenide material. NVM brings unique advantages to CIM: non-volatility, which maintains data when power is off, and higher density compared to SRAM [38]. Due to these advantages, several NVM-CIM architectures [39], [40] have been studied since early times. Additionally, the latest NVM-CIMs [23], [24], [25], [26], [27], [28], [29] support matrix-vector multiplication (MVM) operations to demonstrate NN inference. However, for NVM-CIM NN inference, current-based operation is inevitable, since NVM represents data as resistance. Therefore, the static current during computation, the non-linear voltage characteristics, and the limited sensing margin become the challenges of NVM-CIM. This section summarizes recent progress in circuits and datapaths of NVM-CIM that address these challenges.

A. Current-Based Operation With Complementary Cell

Operation with multiple cells has been proposed to overcome the effect of non-linearity and low signal margin on computation accuracy.

The XNOR-RRAM proposed in [26] uses a pair of 1T-1R ReRAM cells for binarized (+1 and −1) IA and W. Each 1T-1R cell pair performs an XNOR operation with a 1-b IA and a 1-b W, as shown in Fig. 3(a). Depending on the weight value, one cell is set to the LRS and the other to the HRS. Also, only one of the two rows in the cell pair is activated according to the IA value for the XNOR operation, and all pairs are activated in parallel for the accumulation.
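As a rough behavioral model of the complementary-cell XNOR scheme just described, the selection logic can be sketched as follows; the conductance and read-voltage values are illustrative assumptions, not parameters from [26]:

```python
# Behavioral sketch (illustrative, not the actual circuit of [26]) of the
# complementary-cell XNOR scheme: each binarized weight maps to a 1T-1R
# pair (one cell LRS, the other HRS), and the input activation selects
# which of the pair's two rows is activated.
G_LRS, G_HRS = 1e-4, 1e-6   # assumed on/off conductances (siemens)
V_READ = 0.2                # assumed read voltage (volts)

def xnor_bitline_current(inputs, weights):
    """Sum of activated-cell currents on one bit line for +/-1 IA and W."""
    total = 0.0
    for ia, w in zip(inputs, weights):
        # W = +1 -> (row0 = LRS, row1 = HRS); W = -1 -> the opposite
        pair = (G_LRS, G_HRS) if w == +1 else (G_HRS, G_LRS)
        row = 0 if ia == +1 else 1        # IA activates exactly one row
        total += V_READ * pair[row]       # LRS current iff ia == w (XNOR)
    return total

ia = [+1, -1, +1, -1]
w = [+1, -1, -1, -1]
i_bl = xnor_bitline_current(ia, w)       # 3 matching pairs -> 3 LRS currents
matches = sum(1 for a, b in zip(ia, w) if a == b)
```

The point of the model is that the bit-line current counts XNOR matches: each matching IA/W pair contributes one LRS current, each mismatch only an HRS leakage current.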

Therefore, the voltage of the BL is determined by voltage division between the cell resistance and the pull-up (PU) transistor resistance, as shown in Fig. 3(a). However, because of the non-ideal resistance of the BL and the pull-up logic, the transfer function of the BL voltage becomes non-linear. Therefore, [26] uses a flash ADC with non-linear reference voltages for sensing the non-linear voltage transfer function with linear quantization. [27], [28] also adopt the XNOR-based operation for MRAM-CIM. However, [27] extends the 1-b XNOR operation to a multi-bit operation with a 1-b input vector and a 4-b matrix. Therefore, for the multi-bit operation with high linearity, the current-based operation with a current combiner feeds the 4-column output to a single ADC simultaneously. Also, [28] uses a refined bit-cell to support the XNOR operation with an enhanced readout margin.

As shown in Fig. 3(b), [29] proposed a 4T-4R cell with dual complementary coding to overcome the non-linearity of the voltage transfer function. In addition to the previous 2T-2R cell that uses complementary WLs, complementary BLs are also used, and four cells operate as a pair. Therefore, the number of activated LRS and HRS cells on BL and BLB is always the same, and the differential voltage of the BL pair becomes linear. Therefore, unlike [26] with its flash ADC, linear output sensing is possible even with a successive approximation (SAR) ADC. In addition, the 4T-4R cell in [29] can be reconfigured into two 2T-2R cells with higher efficiency and lower accuracy. Therefore, the mode can be adaptively reconfigured depending on the accuracy requirements of the NN model for higher efficiency.

Fig. 3. Operation of NVM. (a) 2T-2R XNOR cell. (b) 4T-4R dual complementary cell. (c) Time-space readout. (d) SLC-MLC hybrid IMC.

B. Time-Based Operation

To overcome the large DC-current flow during the voltage-based operation of NVM-CIM, [23] proposed time-space-based in-memory computing. As shown in Fig. 3(c), [23] uses a series connection of a voltage-to-time converter (VTC) and a time-to-digital converter (TDC) after the in-memory operation. After the parasitic capacitor of the BL is precharged, the current flows through the cells according to the computation results, and the discharge latency is sensed. Therefore, no DC current occurs during the operation, enabling higher efficiency.

C. Hybrid Method

In addition, the hybrid method uses several computational methods together, exploiting a trade-off between computational accuracy and efficiency.

As shown in Fig. 3(d), [24], [25] utilize both single-level cells (SLC) and multi-level cells (MLC) of ReRAM and PCM. The SLC and MLC have a trade-off between memory density, efficiency, and signal margin. The MLC achieves higher weight density and efficiency with multi-bit data in a single cell but shows a lower signal margin. Therefore, [24] implements a hybrid operation with the SLC and MLC of PCM by storing the upper 2 bits of the 8-bit weight in SLC cells for a better signal margin and storing the remaining lower 6 bits in MLC cells for higher efficiency. Also, [25] uses ReRAM-CIM and supports reconfigurable SLC-MLC-hybrid modes. For example, the number of bits stored in the MLC cell can be configured among 0, 2, 4, 6, and 8 bits. Therefore, depending on the requirements of the NN model, the weight density and computing efficiency can be balanced against computing accuracy.

Reference [25] also utilizes both CIM and near-memory computing (NMC) in a single macro. The NVM-NMC eliminates the multiple-row activation and places the computing unit below the cell array to ensure accurate computation while sacrificing throughput and efficiency. Therefore, with the trade-off between computing accuracy and efficiency, CIM and NMC can be used adaptively depending on the accuracy requirements of each layer of the NN.

IV. CHALLENGES AND FUTURE RESEARCH DIRECTIONS

Numerous studies have proposed methods to address the unique challenges arising from computation within DRAM and NVM, such as the impact of leakage and the non-linear signal margin. Despite the recent advancements, the field still faces certain challenges. These challenges highlight the nature of DRAM and NVM-CIM and pave the direction for future research.

1) Process limitations: NVM has limited processes available depending on the memory type, and circuits for eDRAM or DRAM-CIM may not apply to each other's process. Also, the density of a custom CIM cell is considerably lower than that of an in-fab cell array.

2) Refreshing DRAM causes energy and throughput overhead in addition to the accuracy and density challenges. Also, when expanding the design to a large scale, refresh control can become challenging to handle together with computation and memory access.

3) The write operation of NVM requires high voltage and high latency. Also, in some cases of applying NVM to CIM, interactive write-read-verify and multiple voltage sources or pulse width modulation are required.
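The interactive write-read-verify flow mentioned in item 3 can be illustrated with a generic program-and-verify loop. The toy device model, tolerance, and pulse-step behavior below are assumptions for illustration only, not the write scheme of any specific chip:

```python
# Generic sketch of an iterative write-read-verify (program-and-verify)
# loop for an NVM cell: apply a pulse, re-read the cell, and stop once
# the conductance is within tolerance of the target level.
def program_and_verify(read_cell, apply_pulse, g_target, tol=0.05, max_pulses=20):
    """Return the number of pulses needed to bring the cell conductance
    within tol * g_target of the target, re-reading after every pulse."""
    for n in range(1, max_pulses + 1):
        g = read_cell()
        err = g_target - g
        if abs(err) <= tol * g_target:   # verify step: close enough -> done
            return n - 1                  # pulses actually applied
        apply_pulse(err)                  # SET if too low, RESET if too high
    raise RuntimeError("cell failed to verify within pulse budget")

# Toy device model: each pulse moves conductance 60% of the remaining error.
state = {"g": 0.0}
count = program_and_verify(
    read_cell=lambda: state["g"],
    apply_pulse=lambda err: state.__setitem__("g", state["g"] + 0.6 * err),
    g_target=1.0,
)
```

The loop makes the cost argument concrete: every weight update consumes several read-write round trips, which is part of why NVM writes are slow and energy-hungry compared to SRAM or DRAM writes.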

TABLE II
COLLECTION OF THE NVM-PIM

To address these challenges, the following future research directions are possible:

1) CIM with standard high-density bit-cells can maintain high density. Computing logic implemented with peripheral circuits, or combining peripheral circuits with cells, can be a feasible solution.

2) To compensate for the shortcomings of DRAM-CIM or NVM-CIM, SRAM or digital architectures can be used together with them.

3) Lightweight algorithms that reduce writes and refreshes, or compilers for efficient model mapping, can be co-designed.

V. CONCLUSION

This tutorial has covered computing circuits for CIM, specifically focusing on utilizing dynamic random access memory (DRAM) and non-volatile memory (NVM). SRAM-based CIM has dominated previously, benefiting from stable data storage and low power usage, but limitations in transistor count and volatility have driven exploration into DRAM and NVM as promising alternatives for CIM. The continual advancement of DRAM-CIM and NVM-CIM is overcoming these challenges and enhancing energy efficiency and functionality. As the field progresses, it is expected that DRAM and NVM-CIM will play key roles in energy-efficient CIM hardware for AI/ML applications.

REFERENCES

[1] S. Yu, H. Jiang, S. Huang, X. Peng, and A. Lu, "Compute-in-memory chips for deep learning: Recent trends and prospects," IEEE Circuits Syst. Mag., vol. 21, no. 3, pp. 31–56, Aug. 2021, doi: 10.1109/MCAS.2021.3092533.
[2] N. R. Shanbhag and S. K. Roy, "Benchmarking in-memory computing architectures," IEEE Open J. Solid-State Circuits Soc., vol. 2, pp. 288–300, Dec. 2022, doi: 10.1109/OJSSCS.2022.3210152.
[3] C. Xu et al., "Overcoming the challenges of crossbar resistive memory architectures," in Proc. IEEE 21st Int. Symp. High Perform. Comput. Archit. (HPCA), 2015, pp. 476–488, doi: 10.1109/HPCA.2015.7056056.
[4] M. Gao, J. Pu, X. Yang, M. Horowitz, and C. Kozyrakis, "TETRIS: Scalable and efficient neural network acceleration with 3D memory," in Proc. Int. Conf. Archit. Support Program. Lang. Operating Syst. (ASPLOS), 2017, pp. 751–764, doi: 10.1145/3037697.3037702.
[5] A. Biswas and A. P. Chandrakasan, "Conv-RAM: An energy-efficient SRAM with embedded convolution computation for low-power CNN-based machine learning applications," in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), 2018, pp. 488–490, doi: 10.1109/ISSCC.2018.8310397.
[6] S. Yin, Z. Jiang, J.-S. Seo, and M. Seok, "XNOR-SRAM: In-memory computing SRAM macro for binary/ternary deep neural networks," IEEE J. Solid-State Circuits, vol. 55, no. 6, pp. 1733–1743, Jun. 2020, doi: 10.1109/JSSC.2019.2963616.
[7] X. Si et al., "24.5 A twin-8T SRAM computation-in-memory macro for multiple-bit CNN-based machine learning," in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), 2019, pp. 396–398, doi: 10.1109/ISSCC.2019.8662392.
[8] X. Si et al., "A 28nm 64Kb 6T SRAM computing-in-memory macro with 8b MAC operation for AI edge chips," in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), 2020, pp. 246–248, doi: 10.1109/ISSCC19947.2020.9062995.
[9] Z. Jiang, S. Yin, J.-S. Seo, and M. Seok, "C3SRAM: An in-memory-computing SRAM macro based on robust capacitive coupling computing mechanism," IEEE J. Solid-State Circuits, vol. 55, no. 7, pp. 1888–1897, Jul. 2020, doi: 10.1109/JSSC.2020.2992886.
[10] Q. Dong et al., "15.3 A 351TOPS/W and 372.4GOPS compute-in-memory SRAM macro in 7nm FinFET CMOS for machine-learning applications," in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), 2022, pp. 242–244, doi: 10.1109/ISSCC42614.2022.9731681.
[11] J.-W. Su et al., "16.3 A 28nm 384kb 6T-SRAM computation-in-memory macro with 8b precision for AI edge chips," in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), 2021, pp. 250–252, doi: 10.1109/ISSCC42613.2021.9365984.
[12] J. Lee, H. Valavi, Y. Tang, and N. Verma, "Fully row/column-parallel in-memory computing SRAM macro employing capacitor-based mixed-signal computation with 5-b inputs," in Proc. Symp. VLSI Circuits, 2021, pp. 1–2, doi: 10.23919/VLSICircuits52068.2021.9492444.
[13] S. Yin et al., "PIMCA: A 3.4-Mb programmable in-memory computing accelerator in 28nm for on-chip DNN inference," in Proc. Symp. VLSI Technol., 2021, pp. 1–2.
[14] P.-C. Wu et al., "A 28nm 1Mb time-domain computing-in-memory 6T-SRAM macro with a 6.6ns latency, 1241GOPS and 37.01TOPS/W for 8b-MAC operations for edge-AI devices," in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), 2022, pp. 1–3, doi: 10.1109/ISSCC42614.2022.9731681.
[15] S. Xie, C. Ni, A. Sayal, P. Jain, F. Hamzaoglu, and J. P. Kulkarni, "16.2 eDRAM-CIM: Compute-in-memory design with reconfigurable embedded-dynamic-memory array realizing adaptive data converters and charge-domain computing," in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), 2021, pp. 248–250, doi: 10.1109/ISSCC42613.2021.9365932.
[16] S. Kim et al., "A reconfigurable 1T1C eDRAM-based spiking neural network computing-in-memory processor for high system-level efficiency," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), 2023, pp. 1–5, doi: 10.1109/ISCAS46773.2023.10181420.
[17] S. Xie, C. Ni, P. Jain, F. Hamzaoglu, and J. P. Kulkarni, "Gain-cell CIM: Leakage and bitline swing aware 2T1C gain-cell eDRAM compute in memory design with bitline precharge DACs and compact Schmitt trigger ADCs," in Proc. IEEE Symp. VLSI Technol. Circuits, 2022, pp. 112–113, doi: 10.1109/VLSITechnologyandCir46769.2022.9830338.
[18] Z. Chen, X. Chen, and J. Gu, "15.3 A 65nm 3T dynamic analog RAM-based computing-in-memory macro and CNN accelerator with retention enhancement, adaptive analog sparsity and 44TOPS/W system energy efficiency," in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), 2021, pp. 240–242, doi: 10.1109/ISSCC42613.2021.9366045.
[19] S. Ha, S. Kim, D. Han, S. Um, and H.-J. Yoo, "A 36.2 dB high SNR and PVT/leakage-robust eDRAM computing-in-memory macro with segmented BL and reference cell array," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 69, no. 5, pp. 2433–2437, May 2022, doi: 10.1109/TCSII.2022.3159808.
[20] C. Yu, T. Yoo, H. Kim, T. T.-H. Kim, K. C. T. Chuan, and B. Kim, "A logic-compatible eDRAM compute-in-memory with embedded ADCs for processing neural networks," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 68, no. 2, pp. 667–679, Feb. 2021, doi: 10.1109/TCSI.2020.3036209.
[21] S. Kim et al., "16.5 DynaPlasia: An eDRAM in-memory-computing-based reconfigurable spatial accelerator with triple-mode cell for dynamic resource switching," in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), 2023, pp. 256–258, doi: 10.1109/ISSCC42615.2023.10067352.

[22] S. Kim et al., "Scaling-CIM: An eDRAM-based in-memory-computing accelerator with dynamic-scaling ADC for SQNR-boosting and layer-wise adaptive bit-truncation," in Proc. IEEE Symp. VLSI Technol. Circuits, 2023, pp. 1–2, doi: 10.23919/VLSITechnologyandCir57934.2023.10185439.
[23] J.-M. Hung et al., "An 8-Mb DC-current-free binary-to-8b precision ReRAM nonvolatile computing-in-memory macro using time-space-readout with 1286.4-21.6TOPS/W for edge-AI devices," in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), 2022, pp. 1–3, doi: 10.1109/ISSCC42614.2022.9731715.
[24] W.-S. Khwa et al., "A 40-nm, 2M-cell, 8b-precision, hybrid SLC-MLC PCM computing-in-memory macro with 20.5-65.0TOPS/W for tiny-AI edge devices," in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), 2022, pp. 1–3, doi: 10.1109/ISSCC42614.2022.9731670.
[25] W.-H. Huang et al., "A nonvolatile AI-edge processor with 4MB SLC-MLC hybrid-mode ReRAM compute-in-memory macro and 51.4-251TOPS/W," in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), 2023, pp. 15–17, doi: 10.1109/ISSCC42615.2023.10067610.
[26] S. Yin, X. Sun, S. Yu, and J.-S. Seo, "High-throughput in-memory computing for binary deep neural networks with monolithically integrated RRAM and 90-nm CMOS," IEEE Trans. Electron Devices, vol. 67, no. 10, pp. 4185–4192, Oct. 2020, doi: 10.1109/TED.2020.3015178.
[27] P. Deaville, B. Zhang, and N. Verma, "A 22nm 128-kb MRAM row/column-parallel in-memory computing macro with memory-resistance boosting and multi-column ADC readout," in Proc. IEEE Symp. VLSI Technol. Circuits, 2022, pp. 268–269, doi: 10.1109/VLSITechnologyandCir46769.2022.9830153.
[28] H. Cai et al., "A 28nm 2Mb STT-MRAM computing-in-memory macro with a refined bit-cell and 22.4-41.5TOPS/W for AI inference," in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), 2023, pp. 500–502, doi: 10.1109/ISSCC42615.2023.10067339.
[29] W. Xie, H. Sang, B. Kwon, D. Im, S. Kim, S. Kim, and H.-J. Yoo, "A 709.3 TOPS/W event-driven smart vision SoC with high-linearity and reconfigurable MRAM PIM," in Proc. IEEE Symp. VLSI Technol. Circuits, 2023, pp. 1–2, doi: 10.23919/VLSITechnologyandCir57934.2023.10185337.
[30] Y.-C. Kwon et al., "A 20nm 6GB function-in-memory DRAM, based on HBM2 with a 1.2TFLOPS programmable computing unit using bank-level parallelism, for machine learning applications," in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), 2021, pp. 350–352, doi: 10.1109/ISSCC42613.2021.9365862.
[31] D. Kwon et al., "A 1ynm 1.25V 8Gb 16Gb/s/Pin GDDR6-based accelerator-in-memory supporting 1TFLOPS MAC operation and various activation functions for deep learning application," IEEE J. Solid-State Circuits, vol. 58, no. 1, pp. 291–302, Jan. 2023, doi: 10.1109/JSSC.2022.3200718.
[32] Y. Kwon, Y. Lee, and M. Rhu, "TensorDIMM: A practical near-memory processing architecture for embeddings and tensor operations in deep learning," in Proc. 52nd Annu. IEEE/ACM Int. Symp. Microarchit. (MICRO-52), 2019, pp. 740–753, doi: 10.1145/3352460.3358284.
[33] N. Hajinazar et al., "SIMDRAM: A framework for bit-serial SIMD processing using DRAM," in Proc. 26th ACM Int. Conf. Archit. Support Program. Lang. Operat. Syst. (ASPLOS), 2021, pp. 329–345, doi: 10.1145/3445814.3446749.
[34] X. Xin, Y. Zhang, and J. Yang, "ELP2IM: Efficient and low power bitwise operation processing in DRAM," in Proc. IEEE Int. Symp. High Perform. Comput. Archit. (HPCA), 2020, pp. 303–314, doi: 10.1109/HPCA47549.2020.00033.
[35] V. Seshadri et al., "RowClone: Fast and energy-efficient in-DRAM bulk data copy and initialization," in Proc. 46th Annu. IEEE/ACM Int. Symp. Microarchit. (MICRO-46), 2013, pp. 185–197, doi: 10.1145/2540708.2540725.
[36] V. Seshadri et al., "Ambit: In-memory accelerator for bulk bitwise operations using commodity DRAM technology," in Proc. 50th Annu. IEEE/ACM Int. Symp. Microarchit. (MICRO-50), 2017, pp. 273–287, doi: 10.1145/3123939.3124544.
[37] S. Li, D. Niu, K. T. Malladi, H. Zheng, B. Brennan, and Y. Xie, "DRISA: A DRAM-based reconfigurable in-situ accelerator," in Proc. 50th Annu. IEEE/ACM Int. Symp. Microarchit. (MICRO-50), 2017, pp. 288–301, doi: 10.1145/3123939.3123977.
[38] B. Li, B. Yan, and H. Li, "An overview of in-memory processing with emerging non-volatile memory for data-intensive applications," in Proc. Great Lakes Symp. VLSI (GLSVLSI), 2019, pp. 381–386, doi: 10.1145/3299874.3319452.
[39] J. Borghetti, G. S. Snider, P. J. Kuekes, J. J. Yang, and D. R. Stewart, "'Memristive' switches enable 'stateful' logic operations via material implication," Nature, vol. 464, pp. 873–876, Apr. 2010, doi: 10.1038/nature08940.
[40] E. Linn, R. Rosezin, S. Tappertzhofen, U. Böttger, and R. Waser, "Beyond von Neumann—Logic operations in passive crossbar arrays alongside memory operations," Nanotechnology, vol. 23, no. 30, 2012, Art. no. 305205, doi: 10.1088/0957-4484/23/30/305205.
