A Reliable 8T SRAM For High-Speed Searching and Logic-in-Memory Operations
Abstract— To efficiently implement searching and logic functions with SRAM-based in-memory computing (IMC), we need to perform computations on bitlines (BLs) (called compute access) via multiple wordline (WL) activations. However, this may cause prominent read disturbance when the IMC is implemented with the standard 6T SRAM. To address this reliability issue, existing solutions adopt either auxiliary assist circuits or alternative bitcell topologies, but they incur substantial overheads in access speed or array density. In this article, we propose a novel 8T compute SRAM (CSRAM) for reliable and high-speed in-memory searching and compound logic-in-memory computations. Our 8T CSRAM features a pair of pMOS access transistors and split WLs dedicated to the compute access. A thorough circuit-level analysis reveals that the pMOS-based compute access port is essential for significantly mitigating the read disturbance. Moreover, we propose an elevated precharge voltage scheme and a low-skewed inverter-based sense amplifier to improve the sensing speed. We have validated the proposed 8T CSRAM design in a 16 Kb array with a 28-nm CMOS technology. Compared to the state-of-the-art 8T CSRAM, results show that our design is not only reliable but also 3.1 times faster, with a maximum operating frequency of up to 2.44 GHz.

Index Terms— Content addressable memory (CAM), in-memory computing (IMC), read disturbance, SRAM.

Manuscript received August 25, 2021; revised December 18, 2021 and February 4, 2022; accepted March 31, 2022. Date of publication April 20, 2022; date of current version May 23, 2022. This work was supported in part by the National Natural Science Foundation of China under Grant 62074101 and Grant 62150710549, and in part by the Shanghai Science and Technology Commission Funding under Grant 19511131200 and Grant 20ZR1435800. (Corresponding author: Yajun Ha.)
Jian Chen is with the School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China, also with Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China, and also with the School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China.
Wenfeng Zhao is with the Department of Electrical and Computer Engineering, Binghamton University SUNY, Binghamton, NY 13902 USA.
Yuqi Wang, Yuhao Shu, and Weixiong Jiang are with the School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China.
Yajun Ha is with the School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China, and also with Shanghai Engineering Research Center of Energy Efficient and Custom AI IC, Shanghai 201210, China (e-mail: [email protected]).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TVLSI.2022.3164756.
Digital Object Identifier 10.1109/TVLSI.2022.3164756

I. INTRODUCTION

THE surge of data-intensive applications such as artificial intelligence has created an ever-increasing demand for high-throughput and energy-efficient computing architectures. However, the conventional von Neumann architecture needs to move data back and forth between memory and processing elements, which results in limited throughput and substantial energy overhead [1]. In-memory computing (IMC) architectures have been proposed to circumvent the von Neumann bottleneck by reducing data transfers and performing computations directly inside or near the memory. Recently, different levels of the memory hierarchy, including SRAM [2], [3], DRAM [4], and nonvolatile memories (such as RRAM [5], [6], STT-MRAM [7], and Flash [8]), have been explored to implement IMC systems. In this article, we focus on the compute SRAM (CSRAM, an SRAM-based IMC), as it is compatible with commercial CMOS technologies and can exploit existing large on-chip SRAM caches. Well-known analog-based CSRAM designs performing multiplication and accumulation (MAC)/dot-product computation have been presented in [9]–[21]. However, these works only support specific error-resilient applications such as convolutional neural networks (CNNs).

To efficiently implement popular applications such as searching and logic functions with digital-based CSRAM designs, we need to perform accurate bit-wise computations on bitlines (BLs), called compute access, via multiple wordline (WL) activations [22]–[27]. Multiword activation is an essential operation for digital-based CSRAM designs to achieve high throughput and energy efficiency. However, it may cause prominent read disturbance when the IMC is implemented with the standard 6T SRAM bitcell. Because the 6T SRAM has a shared read and write BL, the bitcells of the same column may be flipped as the BL voltage drops to a relatively low value. Therefore, the read disturbance significantly reduces the reliability of CSRAM designs.

Various techniques have been used to mitigate the read disturbance of CSRAM designs. First, a hierarchical 6T CSRAM [28] and an interleaved structure [29] have been proposed to circumvent the read disturbance at the architecture level. Nevertheless, both designs have rigid data layout requirements and are not suitable for searching [i.e., content addressable memory (CAM)] operations. Second, designs in [25], [26] have eliminated the read disturbance issue for logic-in-memory computation by maintaining the BL voltage at a high level. Again, neither scheme supports CAM operations well. Third, other assist techniques and alternative topologies, which aim to address the general
1063-8210 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: Amrita School of Engineering. Downloaded on August 03,2022 at 06:01:49 UTC from IEEE Xplore. Restrictions apply.
770 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 30, NO. 6, JUNE 2022
TABLE I
SUMMARY FEATURES OF PREVIOUS ASSIST SCHEMES FOR CSRAM
read disturbance issue with multiple words simultaneously activated, are summarized in Table I. In [22], a WL underdrive technique is adopted; it efficiently mitigates the read disturbance by weakening the access transistors of the 6T CSRAM, but at the cost of access performance. Agrawal et al. [30] introduce a staggered WL scheme, which activates WLs successively to avoid the short-circuit path. This results in degraded speed and introduces complicated signal control overheads. Bitcells with more transistors have also been investigated, such as 8T [24], [30], 9T [30], and 10T [31]. For the standard 8T, Lin et al. [24] and Agrawal et al. [30] circumvent the read disturbance by utilizing the decoupled 2T read ports combined with different sensing schemes. However, both sensing schemes need to distinguish two logic functions from a single BL, which results in a low sensing margin and harms the access performance. Last, although the 9T [30] and 10T [31] with differential read ports provide reliable compute access, both schemes suffer from an area penalty compared to the 8T CSRAM. In summary, the aforementioned works do not adequately address the trade-off among compute access performance, reliability, and bitcell area when dealing with the read disturbance of CSRAM. It is pivotal to design a novel CSRAM cell and architecture that can operate reliably and with high access performance.

In this article, we propose a novel 8T CSRAM for reliable and high-speed in-memory searching and compound logic operations. Our 8T CSRAM features a pair of pMOS access transistors and split WLs dedicated to the compute access. pMOS-access-based SRAMs have been proved to be feasible, especially for ultralow-voltage 6T [32] and 9T [33] designs. However, these topologies cannot address the new reliability issue of CSRAM, which makes them inapplicable to IMC applications. Our proposed bitcell and its corresponding peripheral circuits have been optimized for IMC applications so that the pMOS access transistor can be effectively utilized in compute access. To the authors’ knowledge, this is the first time that a pMOS-access-based CSRAM has been proposed.

Our main contributions are summarized as follows.
1) First, we propose a novel and reliable 8T CSRAM with differential pMOS-based access transistors and split WLs dedicated to the compute access port. We perform an in-depth circuit-level analysis to reveal that our CSRAM addresses the trade-off between reliability and performance better than previous CSRAM designs.
2) Second, we propose an elevated precharge voltage scheme and a low-skewed inverter-based sense amplifier (SA) to improve the compute access performance.
3) Third, we apply the new 8T CSRAM to realize reliable and high-speed CAM for searching operations. Moreover, the proposed 8T CSRAM is able to perform compound logic-in-memory functions in only one cycle.

The proposed 8T CSRAM is implemented in a 28-nm CMOS technology and has a similar area to a standard 8T bitcell [30]. A 16 Kb CSRAM macro has been post-layout validated and is considerably faster than the state-of-the-art designs.

The remainder of the article is organized as follows. Section II introduces the read disturbance issue of conventional nMOS-access-based CSRAM. Section III presents the design and operating principle of the proposed 8T bitcell. The reliability and access performance analysis of the proposed 8T CSRAM are also covered in this section. Section IV presents the elevated precharge scheme and low-skewed inverter-based sensing scheme to improve the sensing speed. Section V introduces the principle of the CAM functions and the compound logic-in-memory functions based on the proposed 8T CSRAM. Section VI presents the experimental results and analysis of the proposed design, and Section VII concludes the article.

II. READ DISTURBANCE OF CSRAM

Fig. 2 illustrates a 6T CSRAM design which activates two WLs simultaneously (i.e., a compute access). Based on the stored data, the BL will be discharged to three different voltage levels, as shown in Fig. 2(b). By sensing the three voltage
CHEN et al.: RELIABLE 8T SRAM FOR HIGH-SPEED SEARCHING AND LOGIC-IN-MEMORY OPERATIONS 771
Fig. 1. (a) Schematic of conventional 6T CSRAM with two words selected simultaneously. (b) Dynamic behavior of equivalent half circuit during a compute
access. (c) BL transient behavior and formulated direct current during a typical corner. (d) Stored datum “B” is flipped under the worst corner (fast nMOS
slow pMOS).
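The logical side of a two-word compute access can be sketched with a small behavioral model. This is an idealized illustration, not the authors' circuit: it assumes the common 6T convention that a selected cell storing "0" opens a discharge path on BL and a cell storing "1" opens one on the complementary bitline (called BLB here), and it ignores all analog effects, including the read disturbance itself. The function name and the returned keys are hypothetical.

```python
from typing import Dict, List


def compute_access(column_bits: List[int], active_rows: List[int]) -> Dict[str, int]:
    """Idealized model of a compute access on one 6T column.

    Assumption (not from the paper): each selected cell storing 0 adds one
    discharge path on BL, and each selected cell storing 1 adds one on BLB.
    """
    selected = [column_bits[row] for row in active_rows]
    bl_paths = sum(1 for bit in selected if bit == 0)
    blb_paths = sum(1 for bit in selected if bit == 1)
    return {
        # With two active rows this count is 0, 1, or 2, which corresponds
        # to the three BL voltage levels mentioned in the text.
        "bl_discharge_paths": bl_paths,
        "blb_discharge_paths": blb_paths,
        # BL stays high only if every selected cell stores 1 (AND).
        "and": int(bl_paths == 0),
        # BLB stays high only if every selected cell stores 0 (NOR).
        "nor": int(blb_paths == 0),
    }


column = [1, 0, 1, 0]
both_ones = compute_access(column, [0, 2])
mixed = compute_access(column, [0, 1])
both_zeros = compute_access(column, [1, 3])
```

Under these assumptions, the number of BL discharge paths distinguishes the three stored-data cases (both "1", mixed, both "0"), and the case with the most discharge paths pulls BL lowest, which is where the read disturbance risk is highest.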
TABLE II
OPERATION CONDITION OF THE PROPOSED 8T CSRAM
Fig. 4. (a) Schematic of proposed 8T CSRAM with two words selected simultaneously. (b) Dynamic behavior of equivalent half circuit during a compute
access. (c) BL transient behavior and formulated direct current during a typical corner. (d) Stored data are NOT flipped under the worst corner (slow nMOS
fast pMOS).
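Why a weak access device protects the stored datum can be illustrated with a first-order resistive-divider model. This is only a sketch under stated assumptions, not the paper's analysis: the access transistor and the pull-down nMOS are replaced by fixed on-resistances, the bitline is taken as already charged to VDD, and every numeric value below is illustrative rather than taken from the 28-nm design. The function names are hypothetical.

```python
def disturbed_node_voltage(v_bitline: float, r_access: float, r_pulldown: float) -> float:
    """Steady-state voltage of a node storing "0" while the access device is on.

    Resistive-divider approximation: the access device pulls the node toward
    the bitline voltage, the pull-down nMOS holds it at ground.
    """
    return v_bitline * r_pulldown / (r_access + r_pulldown)


def is_flipped(v_node: float, v_trip: float) -> bool:
    """The cell is assumed to flip once the node rises above the inverter trip point."""
    return v_node > v_trip


# Illustrative values only (assumptions, not measured data).
VDD = 0.9            # volts
V_TRIP = 0.45        # volts, assumed inverter trip point
R_PULLDOWN = 5e3     # ohms, assumed pull-down on-resistance

# A weak access device (large on-resistance) barely lifts the node.
weak_access_node = disturbed_node_voltage(VDD, 30e3, R_PULLDOWN)
# A strong access device (small on-resistance) lifts it past the trip point.
strong_access_node = disturbed_node_voltage(VDD, 3e3, R_PULLDOWN)

weak_access_flips = is_flipped(weak_access_node, V_TRIP)
strong_access_flips = is_flipped(strong_access_node, V_TRIP)
```

In this simplified picture, the larger the drive-strength gap between the access device and the pull-down transistor, the closer the disturbed node stays to ground, which matches the qualitative behavior shown in Fig. 4(d).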
Fig. 12. TCAM search example in a 4 × 4 subarray. The bitcells enclosed by purple boxes are used to represent “don’t care” state.
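A logical view of a ternary search such as the one in Fig. 12 can be sketched as follows. The two-bit encoding of each ternary state is a hypothetical choice made for illustration (the paper's actual encoding and the mapping of bits to rows or columns are not reproduced here); only the fact that a TCAM state needs two bitcells comes from the text. The helper names are likewise assumptions.

```python
from typing import List, Tuple

# Hypothetical encoding of one ternary state as (true_bit, complement_bit).
# "X" (don't care) stores 0 in both cells so that it can never mismatch.
ENCODE = {"0": (0, 1), "1": (1, 0), "X": (0, 0)}


def encode_entry(pattern: str) -> List[Tuple[int, int]]:
    """Turn a pattern such as "10X0" into its two-bit-per-state form."""
    return [ENCODE[symbol] for symbol in pattern]


def state_matches(stored: Tuple[int, int], key_bit: int) -> bool:
    """A state mismatches only when the key selects a cell that stores 1."""
    true_bit, complement_bit = stored
    if key_bit == 1:
        return complement_bit == 0
    return true_bit == 0


def tcam_search(entries: List[List[Tuple[int, int]]], key: str) -> List[bool]:
    """Return one match flag per stored entry."""
    key_bits = [int(char) for char in key]
    return [
        all(state_matches(state, bit) for state, bit in zip(entry, key_bits))
        for entry in entries
    ]


# A 4-entry, 4-state example in the spirit of a 4 x 4 subarray.
stored_patterns = ["1010", "10X0", "0XX1", "XXXX"]
table = [encode_entry(pattern) for pattern in stored_patterns]
matches = tcam_search(table, "1000")
```

With this encoding a binary CAM (BCAM) is the special case in which no entry contains "X", and the single-bit-mismatch case discussed for the timing analysis corresponds to an entry that differs from the key in exactly one state.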
Fig. 14. Proposed 16 Kb (128 × 128) array. (a) Overall architecture. (b) Layout view.

driver blocks are used to drive the pMOS access and the nMOS access transistors, respectively. The proposed CSRAM was implemented and post-layout validated in a commercial 28-nm CMOS technology, occupying a total area of 0.025 mm² (80 µm × 310 µm). The array efficiency of this design is 58.8%, which is comparable to industrial standards [28].

A. Reliability

For the compound logic-in-memory functions, the read disturbance is not prominent as there are only two words activated per BL, as shown in Fig. 13. However, for the CAM operation, the read disturbance is more severe as 128 words are simultaneously accessed. Therefore, we only analyze the read disturbance of the CAM operation. Taking one column as an example, the worst case of CAM operation is that only one accessed bitcell stores datum “0,” while the other 127 accessed bitcells store datum “1.” When the 128 bitcells sharing the same column are activated simultaneously, the RBLs will be charged to a high value quickly. In this case, the datum “0” is the most likely to be corrupted.

Fig. 15 presents the results of a 15 k-point Monte-Carlo simulation for the worst-case CAM operation. The Monte-Carlo simulation is performed with both global (process) and local (mismatch) variation, at 27 °C and 80 °C. As can be observed in the left figure of Fig. 15, the RBLs are charged to the VDD level quickly when WLs are activated. Nevertheless, the pseudo-write effect through the pMOS access is weak due to a large driving strength gap between the pMOS access transistor and the pull-down nMOS transistor, as illustrated in Section III-C. Thus, no data flipping occurs at either 27 °C or 80 °C, as shown in Fig. 15, which means the read disturbance is successfully mitigated in this design.

B. Access Performance and Energy Evaluation

Fig. 16 shows the operating frequency of logic and CAM operations with respect to the supply voltage under the typical process corner and 27 °C. The measured frequency of the logic operation is for the case of a four-operand compound logic function, while the measured frequency of the CAM operation is for the worst case, which has only a one-bit mismatch in a column. As can be observed, the logic operation is slightly faster than the CAM operation because the multiword activation of the CAM operation introduces large voltage overshoots caused by the gate–drain capacitance of the access transistors, which has a negative impact on the access performance. At the minimum supply voltage (VDD = 0.5 V), the logic operation frequency is 180 MHz, while the CAM frequency is 123 MHz.

Fig. 16. Operating frequency of CAM and logic operation across VDD.

Fig. 17 illustrates the CAM operation energy and logic operation energy versus the supply voltage. The CAM energy is measured for the case in which all column data are mismatched. At the minimum supply, the BCAM consumes 0.32 fJ/bit. As two bits are needed to represent a state in TCAM, the energy consumption per bit is doubled compared to BCAM. For the four-operand compound logic operation, the measured energy is averaged over all data patterns. The minimum energy consumption is only 5.7 fJ/bit when VDD = 0.5 V.

Fig. 17. (a) BCAM and TCAM energy across VDD. (b) Compound logic operation energy across VDD.

TABLE IV
COMPARISON WITH PREVIOUS CSRAM WORKS

Table IV summarizes a comparison between the proposed 8T CSRAM and state-of-the-art CSRAM designs. As we adopt the logic rule during the layout, the proposed design has a larger capacitance compared to other push-rule-based designs. Therefore, for the CAM operation, the proposed design has an energy overhead. However, the overhead can be effectively alleviated if the proposed 8T is implemented with the push rule. In addition, the proposed design still has an energy benefit for the logic operation as we implement low-swing sensing. Due to the introduction of differential pMOS-based read ports, the proposed CSRAM has a larger sense margin compared to the standard 8T, which uses only a single read port. Thus, a reduced BL swing can also be implemented to boost the access performance. As can be observed from the table, when this work is validated in a post-layout fashion, the maximum frequency of BCAM is up to 1.90 GHz, which is 2.3× faster than the standard 8T [24], which is only validated with presimulation, and the maximum frequency of logic operation is 2.44 GHz, which is 3.1× faster. Considering that the standard 8T in the reference work [24] is implemented in 65-nm technology, we normalized its frequency to 28-nm technology. Even with an ideal transformation, the projected frequency of [24] is about 1.63 and 1.59 GHz for CAM and logic operation, respectively, which is still substantially lower than that of the proposed design.

Compared to the 6T CSRAM, although the proposed 8T CSRAM has around 26% area overhead when both are implemented with the same technology and logic rule, its operating frequency is greatly improved. When compared to a 6T CSRAM adopting staggered WLs [30], our design has
improved the access speed by 7.3×. In addition, as a nominal WL voltage can be applied in this design, the maximum CAM and logic operation frequencies are also 5.1× and 3.1× higher, respectively, than those of a 6T design [22] that adopts a WL underdrive scheme.

VII. CONCLUSION

The read disturbance issue is critical for IMC applications with multiword activation. In this article, we propose a reliable 8T bitcell with extra differential pMOS access transistors, which are dedicated to multiword IMC. Comprehensive analysis shows that a CSRAM adopting pMOS access transistors is more reliable and faster than a conventional CSRAM adopting nMOS access transistors. Moreover, a read-BL elevated precharge circuit and a low-skewed inverter-based sensing scheme are proposed to accelerate the access speed. To illustrate the effectiveness, a 16 Kb CSRAM, which can be configured for CAM and logic-in-memory operations, is designed and post-layout validated. The results show that the operating frequencies of CAM and logic-in-memory operations are 2.3× and 3.1× higher, respectively, than those of the state-of-the-art design based on the standard 8T CSRAM.

REFERENCES

[1] M. Horowitz, “1.1 Computing’s energy problem (and what we can do about it),” in IEEE ISSCC Dig. Tech. Papers, Feb. 2014, pp. 10–14.
[2] N. Verma et al., “In-memory computing: Advances and prospects,” IEEE Solid State Circuits Mag., vol. 11, no. 3, pp. 43–55, Aug. 2019.
[3] S. Aga, S. Jeloka, A. Subramaniyan, S. Narayanasamy, D. Blaauw, and R. Das, “Compute caches,” in Proc. IEEE Int. Symp. High Perform. Comput. Archit. (HPCA), Austin, TX, USA, Feb. 2017, pp. 481–492.
[4] S. Li, D. Niu, K. T. Malladi, H. Zheng, B. Brennan, and Y. Xie, “DRISA: A dram-based reconfigurable in-situ accelerator,” in Proc. 50th Annu. IEEE/ACM Int. Symp. Microarchitecture (MICRO), Oct. 2017, pp. 288–301.
[5] W. H. Chen et al., “A 65 nm 1 Mb nonvolatile computing-in-memory ReRAM macro with sub-16ns multiply-and-accumulate for binary DNN AI edge processors,” in IEEE ISSCC Dig. Tech. Papers, Feb. 2018, pp. 494–496.
[6] Y. Chen, L. Lu, B. Kim, and T. T.-H. Kim, “Reconfigurable 2T2R ReRAM architecture for versatile data storage and computing in-memory,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 28, no. 12, pp. 2636–2649, Dec. 2020.
[7] S. Jain, A. Ranjan, K. Roy, and A. Raghunathan, “Computing in memory with spin-transfer torque magnetic RAM,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 26, no. 3, pp. 470–483, Dec. 2017.
[8] P. Wang et al., “Three-dimensional NAND flash for vector–matrix multiplication,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 27, no. 4, pp. 988–991, Apr. 2019.
[9] X. Si et al., “A dual-split 6T SRAM-based computing-in-memory unit-macro with fully parallel product-sum operation for binarized DNN edge processors,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 66, no. 11, pp. 4172–4185, Nov. 2019.
[10] W.-S. Khwa et al., “A 65 nm 4 Kb algorithm-dependent computing-in-memory SRAM unit-macro with 2.3 Ns and 55.8 TOPS/W fully parallel product-sum operation for binary DNN edge processors,” in IEEE ISSCC Dig. Tech. Papers, Feb. 2018, pp. 496–498.
[11] A. Biswas and A. P. Chandrakasan, “CONV-SRAM: An energy-efficient SRAM with in-memory dot-product computation for low-power convolutional neural networks,” IEEE J. Solid-State Circuits, vol. 54, no. 1, pp. 217–230, Jan. 2019.
[12] X. Si et al., “24.5 A twin-8T SRAM computation-in-memory macro for multiple-bit CNN-based machine learning,” in IEEE ISSCC Dig. Tech. Papers, Feb. 2019, pp. 396–398.
[13] J. Yang et al., “24.4 sandwich-RAM: An energy-efficient in-memory BWN architecture with pulse-width modulation,” in IEEE ISSCC Dig. Tech. Papers, Feb. 2019, pp. 394–396.
[14] A. Jaiswal, I. Chakraborty, A. Agrawal, and K. Roy, “8T SRAM cell as a multibit dot-product engine for beyond von Neumann computing,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 27, no. 11, pp. 2556–2567, Nov. 2019.
[15] M. Ali, A. Jaiswal, S. Kodge, A. Agrawal, I. Chakraborty, and K. Roy, “IMAC: In-memory multi-bit multiplication and accumulation in 6T SRAM array,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 67, no. 8, pp. 2521–2531, Aug. 2020.
[16] J. Zhang, Z. Wang, and N. Verma, “In-memory computation of a machine-learning classifier in a standard 6T SRAM array,” IEEE J. Solid-State Circuits, vol. 52, no. 4, pp. 915–924, Apr. 2017.
[17] M. Kang, S. K. Gonugondla, A. Patil, and N. R. Shanbhag, “A multi-functional in-memory inference processor using a standard 6T SRAM array,” IEEE J. Solid-State Circuits, vol. 53, no. 2, pp. 642–655, Feb. 2018.
[18] X. Si et al., “15.5 A 28 nm 64 Kb 6T SRAM computing-in-memory macro with 8b MAC operation for AI edge chips,” in IEEE ISSCC Dig. Tech. Papers, Oct. 2020, pp. 246–248.
[19] M. Kang, S. K. Gonugondla, and N. R. Shanbhag, “Deep in-memory architectures in SRAM: An analog approach to approximate computing,” Proc. IEEE, vol. 108, no. 12, pp. 2251–2275, Dec. 2020.
[20] Z. Liu et al., “NS-CIM: A current-mode computation-in-memory architecture enabling near-sensor processing for intelligent IoT vision nodes,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 67, no. 9, pp. 2909–2922, Sep. 2020.
[21] C. Yu, T. Yoo, T. T.-H. Kim, K. C. Tshun Chuan, and B. Kim, “A 16 K current-based 8T SRAM compute-in-memory macro with decoupled read/write and 1-5bit column ADC,” in Proc. IEEE Custom Integr. Circuits Conf. (CICC), Mar. 2020, pp. 1–4.
[22] S. Jeloka, N. B. Akesh, D. Sylvester, and D. Blaauw, “A 28 nm configurable memory (TCAM/BCAM/SRAM) using push-rule 6T bit cell enabling logic-in-memory,” IEEE J. Solid-State Circuits, vol. 51, no. 4, pp. 1009–1021, Apr. 2016.
[23] Q. Dong et al., “A 4+2T SRAM for searching and in-memory computing with 0.3-V VDDmin,” IEEE J. Solid-State Circuits, vol. 53, no. 4, pp. 1006–1015, Apr. 2018.
[24] Z. Lin et al., “In-memory computing with double word lines and three read ports for four operands,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 28, no. 5, pp. 1316–1320, May 2020.
[25] K. Lee, J. Jeong, S. Cheon, W. Choi, and J. Park, “Bit parallel 6T SRAM in-memory computing with reconfigurable bit-precision,” in Proc. 57th ACM/IEEE Design Autom. Conf. (DAC), Dec. 2020, pp. 1–6.
[26] J. Chen, W. Zhao, Y. Wang, and Y. Ha, “Analysis and optimization strategies toward reliable and high-speed 6T compute SRAM,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 68, no. 4, pp. 1520–1531, Apr. 2021.
[27] Z. Lin et al., “Two-direction in-memory computing based on 10T SRAM with horizontal and vertical decoupled read ports,” IEEE J. Solid-State Circuits, vol. 56, no. 9, pp. 2832–2844, Sep. 2021.
[28] W. Simon, J. Galicia, A. Levisse, M. Zapater, and D. Atienza, “A fast, reliable and wide-voltage-range in-memory computing architecture,” in Proc. 56th ACM/IEEE Annu. Design Autom. Conf. (DAC). ACM, Jun. 2019, pp. 1–6.
[29] A. Jaiswal, A. Agrawal, M. F. Ali, S. Sharmin, and K. Roy, “I-SRAM: Interleaved wordlines for vector Boolean operations using SRAMs,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 67, no. 12, pp. 4651–4659, Dec. 2020.
[30] A. Agrawal, A. Jaiswal, C. Lee, and K. Roy, “X-SRAM: Enabling in-memory Boolean computations in CMOS static random access memories,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 65, no. 12, pp. 4219–4232, Dec. 2018.
[31] Y. Zhang, L. Xu, Q. Dong, J. Wang, D. Blaauw, and D. Sylvester, “Recryptor: A reconfigurable cryptographic cortex-M0 processor with in-memory and near-memory computing for IoT security,” IEEE J. Solid-State Circuits, vol. 53, no. 4, pp. 995–1005, Apr. 2018.
[32] M. Nabavi and M. Sachdev, “A 290-mV, 3.34-MHz, 6T SRAM with pMOS access transistors and boosted wordline in 65-nm CMOS technology,” IEEE J. Solid-State Circuits, vol. 53, no. 2, pp. 656–667, Feb. 2018.
[33] S. Lutkemeier, T. Jungeblut, H. K. O. Berge, S. Aunet, M. Porrmann, and U. Ruckert, “A 65 nm 32 b subthreshold processor with 9T multi-Vt SRAM and adaptive supply voltage control,” IEEE J. Solid-State Circuits, vol. 48, no. 1, pp. 8–19, Jan. 2013.
[34] J. Wang et al., “A 28-nm compute SRAM with bit-serial logic/arithmetic operations for programmable in-memory vector computing,” IEEE J. Solid-State Circuits, vol. 55, no. 1, pp. 76–86, Jan. 2020.
[35] H. Kim, T. Yoo, T. T.-H. Kim, and B. Kim, “Colonnade: A reconfigurable SRAM-based digital bit-serial compute-in-memory macro for processing neural networks,” IEEE J. Solid-State Circuits, vol. 56, no. 7, pp. 2221–2233, Jul. 2021.
[36] T. Na, S.-H. Woo, J. Kim, H. Jeong, and S.-O. Jung, “Comparative study of various latch-type sense amplifiers,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 22, no. 2, pp. 425–429, Feb. 2014.

Jian Chen (Graduate Student Member, IEEE) received the B.S. degree from Huazhong University of Science and Technology, Wuhan, China, in 2016. He is currently working toward the Ph.D. degree at ShanghaiTech University, Shanghai, China, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai, and the University of Chinese Academy of Sciences, Beijing, China.
His current research interests include memory design, in-memory computing, and ultralow power VLSI designs.

Weixiong Jiang (Graduate Student Member, IEEE) received the B.S. degree from Harbin Institute of Technology, Harbin, China, in 2017. He is currently working toward the Ph.D. degree at the Reconfigurable and Intelligent Computing Laboratory, ShanghaiTech University, Shanghai, China.
His current research interests include energy efficient DNN acceleration and online slack measurement on FPGA.