A Fully Bit-Flexible Computation in Memory Macro Using Multi-Functional Computing Bit Cell and Embedded Input Sparsity Sensing
Abstract— Computation in memory (CIM) overcomes the von Neumann bottleneck by minimizing the communication overhead between memory and processing elements. However, using conventional CIM architectures to realize multiply-accumulate operations (MACs) with flexible input and weight bit precision is extremely challenging. This article presents a fully bit-flexible CIM design with a compact area and high energy efficiency. The proposed CIM macro employs a novel multi-functional computing bit cell design by integrating the MAC and the A/D conversion to maximize efficiency and flexibility. Moreover, an embedded input sparsity sensing and a self-adaptive dynamic range (DR) scaling scheme are proposed to minimize the energy-consuming A/D conversions in CIM. Finally, the proposed CIM macro implementation utilizes an interleaved placement structure to enhance the weight-updating bandwidth and the layout symmetry. The proposed CIM design fabricated in standard 28-nm CMOS technology achieves an area efficiency of 27.7 TOPS/mm2 and an energy efficiency of 291 TOPS/W, demonstrating a highly energy-area-efficient flexible CIM solution.

Index Terms— Area-efficient, bit scalability, computation in memory (CIM), deep neural network (DNN), energy-efficient, in-memory A/D conversion, sparsity sensing.

Manuscript received 18 February 2022; revised 24 July 2022; accepted 16 November 2022. Date of publication 9 January 2023; date of current version 25 April 2023. This article was approved by Associate Editor Kathryn Wilcox. This work was supported in part by the Ministry of Science and Technology, Taiwan, under Grant MOST 110-2218-E-002-034-MBK and Grant MOST 111-2218-E-002-018-MBK; in part by the Intelligent and Sustainable Medical Electronics Research Fund in National Taiwan University; and in part by MediaTek Inc. under Contract MTKC-2022-0125. (Corresponding author: Tsung-Te Liu.)

Chun-Yen Yao was with the Graduate Institute of Electronics Engineering, National Taiwan University, Taipei 10617, Taiwan. He is now with the Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, CA 94720 USA (e-mail: [email protected]).

Tsung-Yen Wu was with the Graduate Institute of Electronics Engineering, National Taiwan University, Taipei 10617, Taiwan. He is now with MediaTek Inc., Taipei 11491, Taiwan (e-mail: [email protected]).

Han-Chung Liang, Yu-Kai Chen, and Tsung-Te Liu are with the Graduate Institute of Electronics Engineering, National Taiwan University, Taipei 10617, Taiwan (e-mail: [email protected]; [email protected]; [email protected]).

Color versions of one or more figures in this article are available at https://doi.org/10.1109/JSSC.2022.3224363.

Digital Object Identifier 10.1109/JSSC.2022.3224363

I. INTRODUCTION

MODERN machine learning (ML) algorithms, such as deep neural networks (DNNs), require substantial parameters and computations mainly composed of high-dimensional multiply-accumulate operations (MACs). The corresponding hardware implementations have become practical because of the higher computing bandwidth and transistor density with the advancement of semiconductor technology. However, further improvement in energy efficiency using the traditional von Neumann architecture is limited, since the computation and storage are separate, and the corresponding data access has dominated the overall energy consumption [1]. This "memory wall" has become the performance bottleneck of efficient implementations for ML algorithms and applications.

Computation in memory (CIM) is proposed to deal with the memory bottleneck by maximizing the utilization rate of the stored data with massively parallel and local processing inside the memory macro. As a result, the CIM architecture can achieve over 10× higher energy efficiency than the state-of-the-art digital accelerators [2]. A typical CIM is commonly accomplished by storing the weight parameters in the memory and then feeding the input activations into the CIM macro to generate the corresponding MAC outputs. This approach is suitable for applications that require only fixed weight and input bit precision [3], [4], [5], [6], [7], [8], [9], [10], [11], [12] but fails to serve the applications demanding the MACs with flexible bit precision. To solve this issue, several CIM works first split the weight parameters and input activations into bit groups with different weighting representations for low-bit MACs. These partial MAC results are then processed by MAC aggregation or near-memory computing (NMC) circuitry to complete full-precision MACs [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25]. Based on this design principle, the CIM design employing 1-b × 1-b MACs together with MAC aggregation offers a promising solution to realize fully bit-flexible multi-bit MACs, which can be explained by the equation below:

$$y = \mathbf{x}\cdot\mathbf{w} = \sum_{n=0}^{N-1} x_n w_n = \sum_{p=0}^{P-1}\sum_{q=0}^{Q-1}\sum_{n=0}^{N-1} (-1)^k\, 2^{p+q}\, x_n[p]\, w_n[q] \quad (1)$$

where x is an N-dimensional input vector whose bit precision in each scalar term $x_n$ is P, w is an N-dimensional weight vector whose bit precision in each scalar term $w_n$ is Q, and k is an integer term to handle negative conditions for two's complement operations. By (1), it is clear that a general MAC consists of only two classes of components: common terms (the summations $\sum_{n=0}^{N-1} x_n[p]\, w_n[q]$) and scaling terms ($(-1)^k 2^{p+q}$). As a result, the CIM design utilizing a 1-b × 1-b MAC scheme to compute the common terms can
0018-9200 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: National Taiwan University. Downloaded on June 27,2024 at 03:00:40 UTC from IEEE Xplore. Restrictions apply.
1488 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 58, NO. 5, MAY 2023
achieve full bit flexibility if all scaling terms are computed via MAC aggregation.

Several recent works [13], [15], [16], [19], [20] have taken advantage of 1-b × 1-b MACs with MAC aggregation to maximize the bit flexibility. Among these designs, analog CIM approaches can potentially achieve a compact CIM macro area without the bulky digital adders required in digital CIM counterparts. However, the analog CIM architecture demands a multi-bit A/D conversion process to reconstruct the mixed-signal MAC results back to digital codes. Due to the required A/D conversion and multi-level reference circuitry, its efficiency can be significantly limited. Moreover, the A/D conversion process consumes a significant amount of energy, seriously impacting the CIM energy efficiency. Finally, since the completion of an ML task requires both standard memory access and CIM operations, the performance of standard memory access is also critical. Although the standard write access and the CIM operation use the same number of rows, a single write is not a multi-row access function, causing low weight-updating bandwidth and severe degradation of the system latency.

To overcome the design challenges above, this work proposes a fully bit-flexible CIM macro with the following design features.

1) A highly compact CIM computing bit cell (CIMC) can support standard read/write access, 1-b × 1-b MAC, reference voltage generation, and in-memory A/D conversion to maximize the area efficiency by reducing the A/D and reference circuitry overheads.

2) An embedded input sparsity sensing and an automatic on-chip reference voltage generation scheme can realize a self-adaptive dynamic range (DR) scaling based on the real-time input sparsity characteristics to minimize the expensive A/D conversions and maximize the energy efficiency.

3) An interleaved CIMC placement structure can simultaneously accelerate the weight-updating process, maintain the symmetric layout implementation, and support the ping-pong operation for higher weight-updating bandwidth.

The proposed fully bit-flexible CIM macro was implemented and verified in standard 28-nm CMOS technology. The proposed CIM design achieves an area efficiency of 27.7 TOPS/mm2, representing an 8.15× improvement compared with the previous work. Moreover, the proposed embedded input sparsity sensing and the self-adaptive DR scaling minimize the expensive A/D conversions, realizing 27.4%–30.2% energy reduction and a measured peak energy efficiency of 383 TOPS/W. Finally, the proposed interleaved CIMC placement topology can further enable a 32.7% reduction in operating cycles, substantially improving the system latency performance.

This article is organized as follows. Section II introduces the previous bit-flexible CIM works. Section III describes the proposed CIM architecture and the operating principle of in-memory A/D conversion. Section IV introduces the proposed embedded input sparsity sensing and self-adaptive DR scaling techniques. Section V describes the proposed interleaved CIMC placement. The chip implementation and measurement results are shown in Section VI, where comparisons with the state-of-the-art results are also provided. Section VII concludes this article.

II. RELATED WORKS

The CIM designs employing 1-b × 1-b MACs with MAC aggregation can be mainly classified into two categories according to the computing scheme: current-based computation [13], [16] and charge-based computation [15]. The current-based CIM design performs one 1-b × 1-b MAC by summing the total discharging currents on the bitlines. This computing scheme features a short CIM delay that benefits from the short bitline discharging time but suffers from high nonlinearity. Okumura et al. [13] proposed a 17T ternary bit cell that consists of two standard 6T SRAM cells and a 5T discharging circuit. The multi-level reference circuitry for the following successive-approximation register (SAR) A/D operation was realized via binary-weighted reference cells replicating the same discharging path of the ternary bit cells. However, only one of the four banks in the MAC operating block can simultaneously access these area-consuming reference cells, significantly deteriorating the memory utilization and the overall area efficiency. Besides, Chiu et al. [16] exploited the small footprint of the standard 6T SRAM and directly built the discharging paths via the access transistors. The corresponding digital codes can then be reconstructed via a self-timing tracking and sensing scheme through four pairs of replica bitlines. However, this approach requires additional compensation bit cells to ensure the correct MAC function. The required number of compensation cells equals the maximum number of activated WLs, causing a huge area overhead for CIM designs. In addition, the required linear-search A/D process slows down the A/D operation and requires more comparisons than a SAR A/D approach, significantly degrading its energy performance.

On the other hand, the charge-based CIM approach demonstrates better linearity than the current-based designs by performing the computations on capacitors. Besides, it features high integration, since the metal-oxide-metal (MOM) capacitors can be placed right above the transistors. Jia et al. [15] exploited this feature and proposed an 8T1C bit cell array for 1-b × 1-b MACs via charge sharing. The outputs are then converted into digital codes via typical capacitor-switching SAR analog-to-digital converters (ADCs) outside the CIM array. However, this approach can cause a large area penalty due to the additional sampling capacitors. Moreover, the sample-and-hold process can result in severe voltage swing reduction [11], [19], seriously deteriorating the sense margin of the comparators and the corresponding CIM performance. In summary, the A/D conversion and the reference circuits have clearly become the performance bottleneck for further efficiency improvement in the CIM designs. Therefore, this work tackles this critical issue by proposing a novel multi-functional computing bit cell architecture that integrates the MAC and A/D conversion together to maximize the CIM efficiency. In addition, the energy-consuming A/D conversions are minimized with the proposed embedded input sparsity sensing and self-adaptive DR scaling to further enhance the CIM energy efficiency.
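The split of (1) into common terms and scaling terms is exactly what a 1-b × 1-b CIM array plus digital aggregation computes. As a sanity check on that decomposition, the following sketch (our own illustrative Python, not the authors' implementation; the function and variable names are invented) rebuilds a signed multi-bit MAC entirely from 1-b × 1-b partial MACs, with power-of-two scaling and the MSB sign factor playing the role of $(-1)^k$:

```python
import numpy as np

def mac_bit_serial(x, w, P, Q):
    """Compute dot(x, w) for signed P-bit x and Q-bit w using only
    1-b x 1-b partial MACs plus shift/sign aggregation, per Eq. (1)."""
    x = np.asarray(x, dtype=np.int64)
    w = np.asarray(w, dtype=np.int64)
    # Two's-complement bit planes: bit p (0 or 1) of each vector element.
    xb = [(x >> p) & 1 for p in range(P)]
    wb = [(w >> q) & 1 for q in range(Q)]
    y = 0
    for p in range(P):
        for q in range(Q):
            partial = int(np.sum(xb[p] * wb[q]))   # common term: 1-b x 1-b MAC
            k = (p == P - 1) + (q == Q - 1)        # MSB planes carry negative weight
            y += ((-1) ** k) * (1 << (p + q)) * partial  # scaling term
    return y

# Cross-check against a full-precision dot product.
rng = np.random.default_rng(0)
P, Q, N = 4, 4, 256
x = rng.integers(-(1 << (P - 1)), 1 << (P - 1), size=N)
w = rng.integers(-(1 << (Q - 1)), 1 << (Q - 1), size=N)
assert mac_bit_serial(x, w, P, Q) == int(np.dot(x, w))
```

Because P and Q are plain loop bounds here, the same 1-b × 1-b "common term" hardware serves any input/weight precision, which is the bit-flexibility argument made above.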
YAO et al.: FULLY BIT-FLEXIBLE CIM MACRO
Fig. 1. (a) Overall architecture of the proposed CIM design. (b) Schematic of CIMC and its supporting functions.
A. Multi-Functional CIMC

The proposed CIMC, as shown in Fig. 1(b), consists of a 6T SRAM for weight storage, a stacked MOM capacitor (Cu) above the transistors, and a 6T AND–OR–INV gate serving as computation logic and capacitor driver. Based on different ways to activate the control signals, including WL, RWL, CTRL, and RST, each CIMC can support the following four functions: 1) standard read/write access; 2) 1-b × 1-b mixed-signal MAC; 3) reference voltage generation; and 4) SAR A/D capacitor switching. As a result, the proposed highly integrated bit cell design can realize the required CIM functions within a compact cell structure. In this way, we avoid the additional ADC and reference circuit overhead in conventional CIM architectures and significantly reduce the overall area by 24.1%, as described in Section VI.

An operation of the proposed CIM consists of two steps. Fig. 2 illustrates an operation example of a 256-D 1-b × 1-b MAC computation in BlockA0. Fig. 3 shows the corresponding timing diagram. In the first step, BlockA0 computes the mixed-signal MAC, while BlockB0 generates the reference voltage Vref. The capacitive coupling mechanism in both blocks can then be expressed as follows:

$$V_{\mathrm{RBL}} = \frac{C_u V_{\mathrm{DD}}}{256\,C_u + C_{\mathrm{RBL}}} \sum_{i=0}^{255} Z_i \quad (2)$$
Fig. 7. Modeled input sparsity of CIFAR-10 task with ResNet-18.

Fig. 9. Concept of self-adaptive DR scaling given (a) dense and (b) sparse input.
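The DR-scaling idea captioned in Figs. 7 and 9 can be modeled in a few lines: the MAC output of a 1-b input vector cannot exceed its number of active bits, so sensing the ones-count bounds the conversion range that the A/D step must cover. The sketch below is our own illustrative model (the function name and the cycle mapping are assumptions, not the on-chip implementation):

```python
import numpy as np

def adaptive_sar_cycles(x, full_dim=256):
    """Sense input sparsity and scale the SAR dynamic range: the MAC
    sum is bounded by the number of active ('1') inputs, so sparse
    inputs need fewer conversion cycles than the full-DR worst case."""
    ones = int(np.sum(x))                 # embedded sparsity sensing
    if ones == 0:
        return 0                          # skip the A/D conversion entirely
    # Required DR covers outputs 0..ones -> ceil(log2(ones + 1)) SAR cycles.
    return int(np.ceil(np.log2(ones + 1)))

rng = np.random.default_rng(2)
dense = (rng.random(256) < 0.9).astype(int)
sparse = (rng.random(256) < 0.1).astype(int)
print(adaptive_sar_cycles(dense), adaptive_sar_cycles(sparse))
```

Under this model, a ReLU-heavy layer such as those profiled in Fig. 7 spends fewer comparison cycles per MAC, which is consistent with the 27.4%–30.2% measured energy reduction reported in Section VI.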
Fig. 11. Interleaved CIMC placement, where the dots denote the wire access to the CIMCs.

Fig. 12. Operating cycle reduction using the proposed interleaved CIMC placement structure.

Fig. 13. Ping-pong operation using the proposed CIM macro.
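A rough cycle-count model conveys the benefit of the ping-pong operation in Figs. 12 and 13: overlapping one block's weight update with the other block's computation hides the update latency. The function and numbers below are our own back-of-the-envelope assumptions for illustration, not the paper's measured breakdown:

```python
def total_cycles(n_tiles, compute_c, update_c, ping_pong):
    """Sequential schedule: every tile pays update + compute in series.
    Ping-pong schedule: each update overlaps the previous tile's compute,
    so only the slower of the two stages paces the pipeline."""
    if not ping_pong:
        return n_tiles * (update_c + compute_c)
    # The first update cannot be hidden; afterward the stages overlap.
    return update_c + n_tiles * max(update_c, compute_c)

base = total_cycles(8, compute_c=100, update_c=60, ping_pong=False)
pp = total_cycles(8, compute_c=100, update_c=60, ping_pong=True)
print(base, pp, 1 - pp / base)
```

With these (assumed) per-tile costs the overlap removes roughly a third of the cycles, the same order as the 32.7% operating-cycle reduction reported for the interleaved placement.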
TABLE II
PERFORMANCE COMPARISON WITH THE STATE-OF-THE-ART CIM DESIGNS
Fig. 15. Area breakdown between the two different CIM macro designs.

Fig. 16. Measured energy improvement with the proposed embedded input sparsity sensing and self-adaptive DR scaling scheme.
of 27.7 TOPS/mm2. Moreover, with the proposed embedded input sparsity sensing and self-adaptive DR scaling scheme, our design achieves a peak energy efficiency of 383 TOPS/W. Given the CIFAR-10 dataset, the proposed design can achieve 91% classification accuracy with ResNet-18.

Fig. 15 compares the area breakdown of the proposed CIM macro with the baseline design. The proposed CIM architecture with in-memory A/D conversion significantly decreases the macro area from 43 200 to 32 800 µm2, representing a 24.1% area reduction. To evaluate the energy improvement with the proposed embedded input sparsity sensing and self-adaptive DR scaling scheme, an additional test mode that can disable the sparsity sensor was implemented in the CIM test chip. Fig. 16 compares the measured MAC energy of the proposed design with the embedded sparsity sensor enabled and disabled. For a CIFAR-10 task with ResNet-18 topology, the proposed design with the input sparsity sensing and self-adaptive DR scaling realizes 27.4% and 30.2% energy reduction at 0.7 and 0.8 V, respectively, effectively minimizing the A/D conversion energy to enhance the overall CIM efficiency.

Table II compares the proposed CIM architecture with the state-of-the-art designs that support flexible input and weight bit precision. The proposed multi-functional CIMC implemented in an advanced CMOS technology process enables a highly integrated and compact CIM design, demonstrating 8.15× higher area efficiency and 5.89× higher energy efficiency than the previous work with the highest area efficiency [19]. Moreover, with the proposed embedded input sparsity sensing and self-adaptive DR scaling scheme, the proposed CIM design achieves similar energy efficiency while realizing 72.1× higher area efficiency than the most energy-efficient CIM [15]. Finally, the proposed CIM macro utilizes an interleaved placement structure to maintain symmetric layout implementation and accelerate the weight-updating process, substantially reducing the overall
latency and benefiting large-scale CIM-based computation systems.

VII. CONCLUSION

This article presents an energy-area-efficient CIM design that can support emerging ML applications with flexible bit precision. The highly compact multi-functional CIMC design with in-memory SAR A/D conversion can maximize the CIM efficiency and flexibility. In addition, the proposed embedded input sparsity sensing and self-adaptive DR scaling scheme minimize the expensive A/D conversions effectively. Finally, the interleaved placement structure is proposed to improve the weight-updating bandwidth and maintain the layout symmetry simultaneously. The measurement results show that the proposed CIM design achieves high energy and area efficiencies of 291 TOPS/W and 27.7 TOPS/mm2, respectively, representing a highly efficient bit-flexible CIM solution.

ACKNOWLEDGMENT

The authors would like to thank Taiwan Semiconductor Manufacturing Company (TSMC), Hsinchu, Taiwan, and the Taiwan Semiconductor Research Institute (TSRI), Hsinchu, for providing chip fabrication and technical support. They would also like to thank Bing-Chen Wu for providing suggestions on this manuscript.

REFERENCES

[1] M. Horowitz, "Computing's energy problem (and what we can do about it)," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2014, pp. 10–14.
[2] N. Verma et al., "In-memory computing: Advances and prospects," IEEE Solid-State Circuits Mag., vol. 11, no. 3, pp. 43–55, Summer 2019.
[3] J. Zhang, Z. Wang, and N. Verma, "In-memory computation of a machine-learning classifier in a standard 6T SRAM array," IEEE J. Solid-State Circuits, vol. 52, no. 4, pp. 915–924, Apr. 2017.
[4] S. K. Gonugondla, M. Kang, and N. R. Shanbhag, "A variation-tolerant in-memory machine learning classifier via on-chip training," IEEE J. Solid-State Circuits, vol. 53, no. 11, pp. 3163–3173, Nov. 2018.
[5] A. Biswas and A. P. Chandrakasan, "CONV-SRAM: An energy-efficient SRAM with in-memory dot-product computation for low-power convolutional neural networks," IEEE J. Solid-State Circuits, vol. 54, no. 1, pp. 217–230, Jan. 2019.
[6] H. Valavi, P. J. Ramadge, E. Nestler, and N. Verma, "A 64-tile 2.4-Mb in-memory-computing CNN accelerator employing charge-domain compute," IEEE J. Solid-State Circuits, vol. 54, no. 6, pp. 1789–1799, Jun. 2019.
[7] H. Kim, Q. Chen, and B. Kim, "A 16K SRAM-based mixed-signal in-memory computing macro featuring voltage-mode accumulator and row-by-row ADC," in Proc. IEEE Asian Solid-State Circuits Conf. (A-SSCC), Nov. 2019, pp. 35–36.
[8] Z. Jiang, S. Yin, J.-S. Seo, and M. Seok, "C3SRAM: An in-memory-computing SRAM macro based on robust capacitive coupling computing mechanism," IEEE J. Solid-State Circuits, vol. 55, no. 7, pp. 1888–1897, Jul. 2020.
[9] Q. Dong et al., "A 351TOPS/W and 372.4GOPS compute-in-memory SRAM macro in 7 nm FinFET CMOS for machine-learning applications," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2020, pp. 242–244.
[10] C. Yu, T. Yoo, T. T.-H. Kim, K. C. T. Chuan, and B. Kim, "A 16K current-based 8T SRAM compute-in-memory macro with decoupled read/write and 1–5 bit column ADC," in Proc. IEEE Custom Integr. Circuits Conf. (CICC), Mar. 2020, pp. 1–4.
[11] Y.-T. Hsu, C.-Y. Yao, T.-Y. Wu, T.-D. Chiueh, and T.-T. Liu, "A high-throughput energy–area-efficient computing-in-memory SRAM using unified charge-processing network," IEEE Solid-State Circuits Lett., vol. 4, pp. 146–149, 2021.
[12] S. Xie, C. Ni, A. Sayal, P. Jain, F. Hamzaoglu, and J. P. Kulkarni, "eDRAM-CIM: Compute-in-memory design with reconfigurable embedded-dynamic-memory array realizing adaptive data converters and charge-domain computing," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2021, pp. 248–250.
[13] S. Okumura, M. Yabuuchi, K. Hijioka, and K. Nose, "A ternary based bit scalable, 8.80 TOPS/W CNN accelerator with many-core processing-in-memory architecture with 896K synapses/mm2," in Proc. Symp. VLSI Technol., Jun. 2019, pp. 248–249.
[14] X. Si et al., "A twin-8T SRAM computation-in-memory unit-macro for multibit CNN-based AI edge processors," IEEE J. Solid-State Circuits, vol. 55, no. 1, pp. 189–202, Jan. 2020.
[15] H. Jia, H. Valavi, Y. Tang, J. Zhang, and N. Verma, "A programmable heterogeneous microprocessor based on bit-scalable in-memory computing," IEEE J. Solid-State Circuits, vol. 55, no. 9, pp. 2609–2621, Sep. 2020.
[16] Y.-C. Chiu et al., "A 4-Kb 1-to-8-bit configurable 6T SRAM-based computation-in-memory unit-macro for CNN-based AI edge processors," IEEE J. Solid-State Circuits, vol. 55, no. 10, pp. 2790–2801, Oct. 2020.
[17] J.-W. Su et al., "A 28 nm 64 Kb inference-training two-way transpose multibit 6T SRAM compute-in-memory macro for AI edge chips," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Sep. 2020, pp. 240–242.
[18] C.-X. Xue et al., "A 22 nm 2 Mb ReRAM compute-in-memory macro with 121–28TOPS/W for multibit MAC computing for tiny AI edge devices," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2020, pp. 244–246.
[19] Z. Chen et al., "CAP-RAM: A charge-domain in-memory computing 6T-SRAM for accurate and precision-programmable CNN inference," IEEE J. Solid-State Circuits, vol. 56, no. 6, pp. 1924–1935, Jun. 2021.
[20] H. Kim, T. Yoo, T. T.-H. Kim, and B. Kim, "Colonnade: A reconfigurable SRAM-based digital bit-serial compute-in-memory macro for processing neural networks," IEEE J. Solid-State Circuits, vol. 56, no. 7, pp. 2221–2233, Jul. 2021.
[21] X. Si et al., "A local computing cell and 6T SRAM-based computing-in-memory macro with 8-b MAC operation for edge AI chips," IEEE J. Solid-State Circuits, vol. 56, no. 9, pp. 2817–2831, Sep. 2021.
[22] J. Yue et al., "A 2.75-to-75.9TOPS/W computing-in-memory NN processor supporting set-associate block-wise zero skipping and ping-pong CIM with simultaneous computation and weight updating," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2021, pp. 238–240.
[23] C.-X. Xue et al., "A 22 nm 4 Mb 8b-precision ReRAM computing-in-memory macro with 11.91 to 195.7TOPS/W for tiny AI edge devices," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2021, pp. 245–247.
[24] J.-W. Su et al., "A 28 nm 384 kb 6T-SRAM computation-in-memory macro with 8b precision for AI edge chips," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2021, pp. 250–252.
[25] Y.-D. Chih et al., "An 89TOPS/W and 16.3TOPS/mm2 all-digital SRAM-based full-precision compute-in memory macro in 22 nm for machine-learning edge applications," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2021, pp. 252–254.
[26] S.-W. M. Chen and R. W. Brodersen, "A 6-bit 600-MS/s 5.3-mW asynchronous ADC in 0.13-µm CMOS," IEEE J. Solid-State Circuits, vol. 41, no. 12, pp. 2669–2680, Dec. 2006.
[27] Q. Fan, Y. Hong, and J. Chen, "A time-interleaved SAR ADC with bypass-based opportunistic adaptive calibration," IEEE J. Solid-State Circuits, vol. 55, no. 8, pp. 2082–2093, Aug. 2020.

Chun-Yen Yao received the B.S. degree in electrical engineering with a minor in mechanical engineering and the M.S. degree in electronics engineering from National Taiwan University, Taipei, Taiwan, in 2019 and 2022, respectively. He is currently pursuing the Ph.D. degree in electrical engineering with the University of California at Berkeley, Berkeley, CA, USA.

His current research interests include low-noise current sensing and adaptive sensing for biomedical applications.

Mr. Yao received the National Taiwan University Outstanding Youth Award in 2021.
Tsung-Yen Wu received the B.S. degree in electrical engineering from National Cheng Kung University, Tainan, Taiwan, in 2019, and the M.S. degree in electronics engineering from National Taiwan University, Taipei, Taiwan, in 2022.

He is currently with MediaTek Inc., Taipei. His current research interests include computing in memory for energy-efficient machine learning applications.

Yu-Kai Chen received the B.S. degree in electrical engineering from National Chung Hsing University, Taichung, Taiwan, in 2021. He is currently pursuing the M.S. degree with the Graduate Institute of Electronics Engineering, National Taiwan University, Taipei, Taiwan.

His research interests include computation in memory and mixed-signal circuit designs.