1276 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS, VOL. 70, NO. 4, APRIL 2023

A Brain-Inspired ADC-Free SRAM-Based In-Memory Computing Macro With High-Precision MAC for AI Application
Zihao Xuan, Chang Liu, Yue Zhang, Yuan Li, and Yi Kang

Abstract—In this brief, an ADC-free SRAM-based in-memory computing (IMC) macro is proposed, which enables energy-efficient and high-precision multiply-and-accumulate (MAC) operation through brain-inspired computing. We identify two key features that allow the IMC macro to achieve high energy efficiency and high-precision MAC calculation. First, a temporal-coding spiking neuron circuit replaces the analog-to-digital converter (ADC) to achieve high-efficiency data conversion. Second, digital adder tree logic eliminates the cost of moving partial sums between PEs and increases the parallelism of the calculations. The mixed-signal SRAM-based IMC macro is designed for processing artificial intelligence (AI) algorithms with reconfigurable precision based on bit-wise inputs and weights. The proposed SRAM-based IMC macro, with an area of 3.41 mm², was designed in 0.18-µm CMOS technology. Post-layout simulation results indicate that the 16-Kb IMC macro achieves 10.8–13.5 TOPS/W with 4-b inputs, 4-b weights, and 14-b MAC-value outputs.
Index Terms—In-memory computing, SRAM, neuromorphic
hardware, MAC, AI accelerator.

Manuscript received 30 June 2022; revised 2 September 2022 and 29 October 2022; accepted 19 November 2022. Date of publication 23 November 2022; date of current version 29 March 2023. This work was supported in part by the National Key Research and Development Program of China under Grant 2019YFB2204800, and in part by the Strategic Priority Research Program of Chinese Academy of Sciences under Grant XDB44000000. This brief was recommended by Associate Editor J. Kulkarni. (Corresponding author: Yi Kang.)
The authors are with the School of Microelectronics, University of Science and Technology of China, Hefei 230026, Anhui, China (e-mail: zh11@mail.ustc.edu.cn; [email protected]).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TCSII.2022.3224049.
Digital Object Identifier 10.1109/TCSII.2022.3224049
1549-7747 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION
Deep neural networks (DNNs) have proven highly effective in realizing complex artificial intelligence (AI) applications, such as image recognition, autopilot, object detection, and natural language processing (NLP) [1]. A large number of accelerators based on the Von Neumann architecture have been proposed to achieve high-performance DNN acceleration. Among them, general-purpose DNN accelerators represented by DaDianNao [2] and domain-specific CNN accelerators represented by Eyeriss [3] have been widely reported. These DNN accelerators reduce off-chip memory access by enhancing data reuse to achieve effective algorithm acceleration. However, limited by the inherent “memory wall” problem of the Von Neumann architecture, which separates memory from processing elements (PEs), these accelerators cannot meet, through data reuse alone, the growing computing requirements of DNNs with continually increasing parameter counts (see Fig. 1 (a)).

Fig. 1. (a) Difference between the Von Neumann architecture and the SRAM-based in-memory computing architecture. (b) Conventional designs adopt an ADC as the readout circuit, whereas the proposed design adopts a spiking neuron as the readout circuit.

In-memory computing (IMC) can address this problem by performing in situ multiply-and-accumulate (MAC) operations in memory, which minimizes the energy overhead caused by off-chip memory access [4]. It is a promising approach to breaking through the limitations of the Von Neumann architecture and achieving a low-power, highly parallel, high-throughput computing system. Previous works have explored novel IMC macros based on different memory technologies [5], [6]. Among the many alternatives, the static random-access memory (SRAM) based IMC macro stands out because of its stability and maturity.
Previous works [7], [8], [9], [10] have demonstrated the robustness of analog SRAM-based IMC macros in achieving energy-efficient, low-latency, and highly parallel computations. In the analog-domain approach, the result of the calculation is represented as a continuous voltage signal and converted into a digital output by an analog-to-digital converter (ADC) (see Fig. 1 (b)). However, the limited voltage margin and the excessive overhead of ADCs can limit the system's accuracy and energy-efficiency gains.
In this brief, we propose an ADC-free SRAM-based IMC macro, which adopts a spiking neuron as a readout circuit

Fig. 2. Rate coding and temporal coding IF spiking neuron.

($V_{MAC} \in [0, V_{DD}]$), an inappropriate integration time $t$ and capacitance $C$ can lead to overflow of the integrated voltage, which increases the output error.
In spiking neurons, the membrane capacitance integrates the input current to generate the membrane potential, and when the membrane potential reaches the threshold, the neuron fires a spike [11]. Following this biological operating mechanism, the $I_{MAC}$ current can be converted directly into a spike firing frequency or firing time, which avoids the manipulation of analog voltages. Therefore, we can rewrite equation (2) as follows:

$$t_{MAC} = \frac{1}{f_{MAC}} = \frac{V \cdot C}{D_{MAC} \cdot I_0} \tag{3}$$

where $V$ is the neuron threshold voltage and is a constant. In this equation, the time $t_{MAC}$ or the frequency $f_{MAC}$, instead of a voltage, serves as the intermediate signal for data conversion. This brings two benefits: data overflow is avoided, and the analog neuron circuit can use a very low supply voltage to obtain ultra-low power and energy consumption. Therefore, spiking neuron circuits are promising candidates for energy-efficient data conversion circuits.
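The contrast between the voltage-domain readout of equation (2) and the time-domain readout of equation (3) can be illustrated numerically. The sketch below is only an illustration under assumed values: the unit current, capacitance, integration time, supply, and threshold are hypothetical and are not taken from the brief, and the function names `v_mac` and `t_mac` are introduced here for convenience.

```python
import math

def v_mac(d_mac, i0, t, c, vdd):
    """Eq. (2): capacitor voltage after integrating I_MAC = D_MAC * I0 for time t.

    Returns (voltage, overflowed); the voltage is clipped at the supply rail vdd.
    """
    ideal = d_mac * i0 * t / c
    return min(ideal, vdd), ideal > vdd

def t_mac(d_mac, i0, v_th, c):
    """Eq. (3): time for the membrane voltage to reach the threshold v_th."""
    if d_mac == 0:
        return math.inf  # assumption: with no input current the neuron never fires
    return v_th * c / (d_mac * i0)

# Hypothetical operating point (illustrative values, not from the brief).
I0, C, T_INT, VDD, V_TH = 1e-6, 1e-12, 50e-9, 1.0, 0.5

for d in (1, 10, 32):
    volts, overflowed = v_mac(d, I0, T_INT, C, VDD)
    print(f"D={d:2d}  V_MAC={volts:.3f} V  overflow={overflowed}  "
          f"t_MAC={t_mac(d, I0, V_TH, C) * 1e9:.2f} ns")
```

With these assumed values the largest codes saturate the voltage readout, while every nonzero code still maps to a distinct firing time.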
Fig. 2 shows the relationship between the output spikes and the input current of rate-coding and temporal-coding spiking neurons. Compared with a rate-coding neuron, a temporal-coding spiking neuron needs to fire only one spike per data conversion, which saves energy. In this brief, we adopt temporal-coding spiking neurons for the first time as efficient data conversion interface circuits to overcome the limitations of current IMC systems.
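The difference in spike count between the two coding schemes can be seen in a toy discrete-time integrate-and-fire (IF) model. This is a sketch under stated assumptions, not the circuit in the brief: time is discretized, charge is counted in integer units, the rate-coding neuron uses a subtractive reset, and the names `integrate_and_fire`, `THRESHOLD`, and `N_STEPS` are hypothetical.

```python
def integrate_and_fire(d_mac, threshold, n_steps, single_spike):
    """Discrete-time IF neuron: each step adds d_mac charge units to the membrane.

    Rate coding (single_spike=False) resets by subtraction and keeps firing;
    temporal coding (single_spike=True) stops after the first spike.
    Returns the list of step indices at which the neuron fired.
    """
    membrane = 0
    spike_steps = []
    for step in range(1, n_steps + 1):
        membrane += d_mac
        if membrane >= threshold:
            spike_steps.append(step)
            if single_spike:
                break
            membrane -= threshold
    return spike_steps

THRESHOLD, N_STEPS = 32, 32  # hypothetical values chosen for illustration

# Total spikes needed to convert every code 1..32 once.
RATE_TOTAL = sum(len(integrate_and_fire(d, THRESHOLD, N_STEPS, False))
                 for d in range(1, 33))
TEMPORAL_TOTAL = sum(len(integrate_and_fire(d, THRESHOLD, N_STEPS, True))
                     for d in range(1, 33))
print(RATE_TOTAL, TEMPORAL_TOTAL)
```

In this model the rate-coded value is read from the number of spikes in the window, whereas the temporally coded value is read from the step of the single spike, so the number of firing events per conversion no longer grows with the input.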
(see Fig. 1 (b)), for AI applications. Three features enable the proposed macro to achieve high energy efficiency and accurate calculation: 1) a temporal-coding spiking neuron replaces the ADC for low-error, low-power data conversion; 2) digital adder tree logic keeps the movement of partial sums within the local domain; 3) bit-wise inputs and weights enable DNN algorithm acceleration with reconfigurable fixed-point precision.

II. SRAM-BASED BRAIN-INSPIRED COMPUTING
In an SRAM-based analog computing macro, each SRAM cell is equivalent to an analog 1-bit multiplier. The multiplication results are collected on the bit line (BL) in the form of current to obtain the MAC current $I_{MAC}$. The relationship between $I_{MAC}$ and the digital MAC code $D_{MAC}$ can be expressed as:

$$I_{MAC} = D_{MAC} \cdot I_0 \tag{1}$$

where $I_0$ is the unit current. To obtain the corresponding digital code $D_{MAC}$, $I_{MAC}$ is traditionally converted into a MAC voltage $V_{MAC}$ via a capacitor for quantization by an ADC circuit. The relationship between $V_{MAC}$, $I_{MAC}$, and $D_{MAC}$ can be expressed as:

$$V_{MAC} = \frac{I_{MAC} \cdot t}{C} = D_{MAC} \frac{I_0 \cdot t}{C} \tag{2}$$

where $C$ and $t$ are the integration capacitance and the integration time, respectively. However, because of the limited voltage margins

III. PROPOSED IMC MACRO
A. Overall Architecture
The proposed SRAM-based IMC macro (see Fig. 3) comprises input buffers and WL drivers, a global reference block, shifters and adders (S&A), an R/W interface, a controller, and an SRAM computing array. This macro can perform energy-efficient vector-matrix multiplication (VMM) for offline DNN inference. The computing array is divided into several slices, which implement sub-column MACs within slices in the analog time domain and column MACs between slices in the digital domain. A sub-column MAC is implemented by a local computing block (LCB), and a column MAC is implemented by adder tree (AT) logic. The S&A circuits support bit-wise inputs and weights for reconfigurable calculation precision. The LCB, AT, and S&A can be executed in a three-stage pipeline to increase computing speed.

Fig. 3. Proposed mixed-signal ADC-free SRAM-based IMC macro structure and data flow.

B. Local Computing and Global Reference Block Circuit
Section II showed that temporal-coding spiking neurons have clear advantages in low-power computing. In this subsection, we propose a novel local computing block (LCB) circuit to perform energy-efficient sub-column MAC operations without ADC overhead. As shown in Fig. 4 (a), each LCB consists of 32 8T-SRAM cells and a spiking neuron circuit. Each 8T-SRAM cell is equivalent to an analog 1-bit multiplier. When the multiplication result is “1”, the SRAM cell generates a unit current $I_0$; otherwise, no current is generated. This unit current is pooled on the local read bit line (RBL)


a self-supervised approach. As shown in Fig. 4 (b), the GRB circuit mainly consists of 33 LCB circuits with fixed bias current inputs (Fix-LCBs), which increase sequentially from $I_0$ to $33I_0$ ($I_0$ is the unit current generated by an SRAM cell). The spike firing times ($t_0, t_1, \ldots, t_{32}$) of the first 32 Fix-LCB circuits cover all possible cases of an LCB output time. Passing these 32 output signals “SN0–SN31” through a parallel counter (PC) yields a time-varying digital reference signal $G_{REF}$. When the output signal of the LCB flips, the value of $G_{REF}$ at that moment represents the digital output corresponding to $t_{MAC}$. This is a novel time-to-digital converter realized through the self-supervision of the LCB. The control signals S2, S3, and S4 in the LCB are likewise generated from SN32, SN5, and SN16 of the GRB circuit, respectively.
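The reference-based time-to-digital conversion described above can be sketched in a few lines. The model below makes several simplifying assumptions that are not stated in the brief: it uses 32 reference neurons biased at $1 \cdot I_0$ to $32 \cdot I_0$, the single-slope firing time of equation (3) for both the LCB and the references, and a hypothetical decode rule that maps the counter value back to the code. All function names and parameter values are illustrative.

```python
def fire_time(n_units, v_th, c_m, i0):
    """Single-slope IF model from eq. (3): t = V * C / (n * I0)."""
    return v_th * c_m / (n_units * i0)

def reference_times(v_th, c_m, i0, n_refs=32):
    """Firing times of reference neurons biased at 1*I0 .. n_refs*I0 (assumed)."""
    return [fire_time(k, v_th, c_m, i0) for k in range(1, n_refs + 1)]

def g_ref(t, ref_times):
    """Parallel-counter output: number of reference neurons that fired by time t."""
    return sum(1 for t_k in ref_times if t_k <= t)

def decode(t_mac, ref_times):
    """Assumed mapping from the counter value to the MAC code.

    Larger currents fire earlier, so when the LCB fires, exactly the references
    with an equal or larger bias current have already fired.
    """
    return len(ref_times) + 1 - g_ref(t_mac, ref_times)

# Hypothetical parameters; the second set scales the capacitance by 10 percent
# for the LCB and the references alike.
for v_th, c_m, i0 in ((0.5, 1.0e-12, 1e-6), (0.5, 1.1e-12, 1e-6)):
    refs = reference_times(v_th, c_m, i0)
    codes = [decode(fire_time(n, v_th, c_m, i0), refs) for n in range(1, 33)]
    print(codes == list(range(1, 33)))
```

In this idealized sketch the references are evaluated with the same timing model as the LCB, so a parameter shift that affects both equally leaves the decoded code unchanged. Mismatch between individual neurons, which the simulations in the brief account for, is not modeled here.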
Fig. 4. Local computing (a) and global reference (b) block circuit.

Fig. 5. Timing diagram of LCB and GRB.

Fig. 5 shows the timing diagram of the LCB. The workflow includes two phases: (1) PH1: Reset. The LCB is reset by keeping the S1 switch “ON” and the S2 switch “OFF”. In this phase, both $V_m$ and $V_A$ in the LCB are reset to GND. (2) PH2: Data conversion. At time $t_{initial}$, the S1 switch turns “OFF” and $I_{MAC}$ is injected into the SN to generate a continuously increasing membrane potential $V_m$. At time $t_{32}$, the S2 switch turns “ON” and the SE circuit is enabled, which reduces the energy consumed by the SE due to direct current (DC) input before time $t_{32}$. At time $t_{16}$, the S4 switch turns “ON”, P4 is turned on, the current copy ratio of the VGCM changes, and the SN membrane current $I_m$ becomes $3I_{MAC}$. At time $t_5$, the S3 switch turns “ON” briefly and $V_m$ is pulled directly up to $V_{REF}$. In all cases, once $V_m > V_{th}$, the SN fires a spike and simultaneously cuts off the input from the SRAM. Depending on the amplitude of the input $I_{MAC}$, the relationship between the LCB input current $I_{MAC}$ and the output time $t_{MAC}$ is expressed as follows:

$$t_{MAC} = \begin{cases} t_{initial} + \dfrac{V_{th} C_m}{I_{MAC}}, & 16I_0 \le I_{MAC} < 32I_0 \\ t_{16} + \dfrac{V_{th} C_m - t_{16} I_{MAC}}{3 I_{MAC}}, & 5I_0 \le I_{MAC} < 16I_0 \\ t_5 + \dfrac{(V_{th} - V_{REF}) C_m}{3 I_{MAC}}, & I_{MAC} < 5I_0 \end{cases} \tag{4}$$
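The piecewise relationship in equation (4) can be evaluated directly. The sketch below transcribes the three branches literally, taking $C_m$ as the membrane capacitance and deriving $t_{16}$ and $t_5$ as the output times at $16I_0$ and $5I_0$. The numeric parameters are hypothetical, time is measured from the start of integration ($t_{initial} = 0$ by default), and the upper bound of $32I_0$ is not enforced.

```python
import math

def lcb_output_time(i_mac, i0, v_th, v_ref, c_m, t_initial=0.0):
    """Evaluate eq. (4) literally; all parameters are in SI units."""
    if i_mac <= 0:
        return math.inf  # assumption: with no input current the neuron never fires
    # t16 and t5 are the output times at 16*I0 (first branch) and 5*I0 (second branch).
    t16 = t_initial + v_th * c_m / (16 * i0)
    t5 = t16 + (v_th * c_m - t16 * 5 * i0) / (3 * 5 * i0)
    if i_mac >= 16 * i0:
        return t_initial + v_th * c_m / i_mac
    if i_mac >= 5 * i0:
        return t16 + (v_th * c_m - t16 * i_mac) / (3 * i_mac)
    return t5 + (v_th - v_ref) * c_m / (3 * i_mac)

# Hypothetical values, chosen only to make the shape of the curve visible.
I0, V_TH, V_REF, C_M = 1e-6, 0.5, 0.4, 1e-12
TIMES = {n: lcb_output_time(n * I0, I0, V_TH, V_REF, C_M) for n in range(1, 33)}

for n in (1, 4, 5, 15, 16, 32):
    print(f"I_MAC = {n:2d}*I0  ->  t_MAC = {TIMES[n] * 1e9:.2f} ns")
```

For these values the output time decreases monotonically as the number of unit currents grows, and the tripled membrane current together with the jump to the reference voltage shortens the conversion time for small inputs compared with a single-slope integration.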

In (4), $C_m$ is the membrane capacitance of the neuron, equal to $C_1 + C_2$; $V_{th}$ is the spiking neuron threshold voltage; $t_{initial}$ is the initial integration time of the neuron; $t_{16}$ is the output MAC time $t_{MAC}$ when $I_{MAC} = 16I_0$; and $t_5$ is the output MAC time $t_{MAC}$ when $I_{MAC} = 5I_0$.

in the vertical direction as part of the sub-column MAC current $I_{MAC}$. The spiking neuron (SN) circuit converts $I_{MAC}$ into a digital time signal $t_{dMAC}$. Its block diagram is shown in the orange dashed box in Fig. 4 (a); it contains a variable-gain current mirror (VGCM), a self-emitter circuit (SE), a register under digital clock control, and control circuits. The VGCM (P0–P5) proportionally replicates $I_{MAC}$ onto the capacitors to generate the SN membrane potential $V_m$. The SE comprises two tri-state inverters (N3–N6, P7, P8) in series and two integrating capacitors $C_1$ and $C_2$. The series tri-state inverters are equivalent to a feed-forward amplifier (FA), and the output of the FA is connected to its input through capacitor $C_2$ to form a positive feedback system. When the common-node voltage of $C_1$ and $C_2$ exceeds the inverter flip threshold $V_{th}$, the output $V_A$ of the FA is inverted, $C_1$ and $C_2$ switch from series to parallel, the membrane potential $V_m$ jumps to $V_{DD}$, and the SN generates an edge-like output spike at time $t_{MAC}$.
Because the MAC time signals $t_{MAC}$ are not uniformly distributed, they are difficult to quantize with a conventional time-to-digital converter (TDC). Recently, [12] reported a non-uniform time quantization scheme that uses an over-sampling TDC plus a time calibration circuit. The overhead of such a scheme becomes much worse in a high-precision SRAM-based IMC design. To deal with the non-uniform quantization issue, we propose a global reference block (GRB) to convert the unevenly distributed time signal to a digital signal through

C. Adder Tree and S&A Circuit
In the proposed IMC macro, each LCB performs 32 MACs of 1-bit input activation and 1-bit weight, and the results of the LCBs in the same column are fed into the adder tree to generate a column MAC with 1-bit input activation and 1-bit weight (MAC@1bIN-1bW). The adder tree keeps partial-sum movement within the local domain and improves the parallelism of MAC computing without accuracy loss. To meet the demand for multi-bit computing in current AI tasks, shifter and adder (S&A) circuits are proposed, which extend the MAC operation to reconfigurable multi-bit weights and input activations. As shown in Fig. 6, the S&A consists of a weighted summation circuit (WSC) and a weighted accumulation circuit (WAC) [10], which realize multi-bit programmable signed/unsigned weights and inputs for the MAC operation, respectively. The WSC (see Fig. 6 (a)) consists of three shifters for weighting, two adders for summation, and an original-code-to-complement-code (O2C) conversion circuit and a complement-code-to-original-code (C2O) conversion circuit for signed expansion. In the WSC, four column-MAC@1bIN-1bW results ($MAC_0$, $MAC_1$, $MAC_2$,


Fig. 6. S&A consists of (a) weighted summation circuit (WSC) for MAC operation with multi-bit programmable signed/unsigned weights. (b) weighted accumulation circuit (WAC) for MAC operation with multi-bit programmable signed/unsigned inputs.

Fig. 7. The relationship between the number of activated RWLs, $t_{MAC}$, and $D_{MAC}$ of the proposed LCB circuit by Monte-Carlo simulation at room temperature and TT corner.

Fig. 8. (a) The simulated output max positive error (red line) and negative error (blue line) under TT corner and room temperature. The LCB output precision from different (b) operating voltages, (c) temperatures, and (d) process corners simulation.

IV. RESULT AND DISCUSSION
Fig. 7 illustrates the relationship between the number of activated RWLs ($N_{MAC}$), $t_{MAC}$, and $D_{MAC}$ of the proposed LCB circuit. The red scatter plot shows the results of the Monte-Carlo simulation at a room temperature of 25 °C and the TT corner. It shows the offset of the output time $t_{MAC}$ caused by non-ideal factors, such as process mismatch and variation. The interval enclosed by the two grey envelope lines represents all possible relationships between $D_{MAC}$ and $N_{MAC}$. Fig. 8 (a) shows the relationship between the digital output code $D_{MAC}$ and the output error. The red and blue lines represent the maximum positive error and the maximum negative error, respectively. The maximum absolute error is less than 1.5 LSB, and the output precision of an LCB is greater than 4 bits. To further illustrate the LCB output accuracy, we provide simulation results from different process corners, operating voltages, and temperatures. As shown in Fig. 8 (b), (c), and (d), the lowering of the operating voltage and the fast NMOS (ff, fnsp) process corners will lead to a drop in the
output accuracy of the LCB to 3 bits. These results show that the output precision of our proposed LCB circuit can reach at least 3 bits. This conclusion remains valid under worst-case conditions (125 °C, FF corner, 1-V supply voltage).
Fig. 9 shows the layout photograph, area and energy breakdown, and performance summary of the proposed IMC macro, which was implemented in 0.18-µm CMOS technology and occupies 3.41 mm² with 16 Kb of SRAM memory. Each sub-module of the proposed macro is individually post-layout simulated in the Cadence tool. The accuracy, delay, and energy efficiency of the whole macro are obtained by aggregating the simulation results of the sub-modules. The energy efficiency of our SRAM-based IMC macro can reach 10.8–13.5 TOPS/W and the throughput can reach 10.24 GOPS at a nearly full output ratio (4bIN-4bW-14bOUT). The time for the macro to complete one vector-matrix multiplication (VMM) is 370 ns. We adopt VGG-8 as a benchmark (the software baseline is 93.5%), and our design achieves 92.5% and 93.4% inference accuracy on the CIFAR-10 dataset using 4-bit and 8-bit input and weight precision, respectively.
Design assessment shows that the spike-time approach has excellent technology scaling capability. We also estimated the performance parameters of the entire IMC macro in 28-nm technology by pre-layout simulation, in which the average energy efficiency can reach 98 TOPS/W with 4-bit MAC

$MAC_3$) are fed into the WSC to generate a column-MAC@1bIN-4bW (PSUM) result. The relationship between the output and the inputs of the WSC can be expressed as follows:

$$PSUM = 2^0 MAC_0 + \cdots + 2^2 MAC_2 + (-1)^{Sign} 2^3 MAC_3 \tag{5}$$

Input activations are fed simultaneously into the SRAM array in an MSB-first bit-serial manner. The WAC accumulates the column MAC@1bIN-4bW results of each cycle. As shown in Fig. 6 (b), the WAC consists of a shifter for weighting, a 21-bit adder for accumulation, an O2C circuit for signed expansion of the input, and registers for pipelining. For input activations with 4-bit precision, 5 cycles are required to complete a column MAC@4bIN-4bW. The final output (SUM) of the WAC can be expressed in terms of the bit-serial input as:

$$SUM = 2^0 PSUM_0 + \cdots + 2^2 PSUM_2 + (-1)^{Sign \ast MSB} 2^3 PSUM_3 \tag{6}$$

With the WSC and WAC scheme, the macro can flexibly meet MAC computation requirements with adjustable weight and input-activation precision.
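Equations (5) and (6) amount to a shift-and-add recombination of 1-bit partial results, which can be checked against an ordinary dot product. The sketch below is a purely digital illustration: it assumes 4-bit two's-complement inputs and weights, sums a whole column in one step instead of modeling 32-cell LCBs and the adder tree, and ignores the analog quantization error and the pipeline cycle count. The function names are hypothetical.

```python
def bit(value, k):
    """Bit k of an integer in two's complement (Python ints sign-extend)."""
    return (value >> k) & 1

def column_mac_1b(input_bits, weight_bits):
    """MAC@1bIN-1bW: AND of 1-bit inputs and 1-bit weights, summed over a column."""
    return sum(x & w for x, w in zip(input_bits, weight_bits))

def wsc(mac, signed=True):
    """Eq. (5): weight the four 1-bit-weight results; MAC_3 is negated if signed."""
    sign = -1 if signed else 1
    return mac[0] + (mac[1] << 1) + (mac[2] << 2) + sign * (mac[3] << 3)

def wac(psums_msb_first, signed=True):
    """Eq. (6), evaluated MSB-first: shift the accumulator, then add the next PSUM."""
    acc = 0
    for index, psum in enumerate(psums_msb_first):
        is_msb = index == 0
        acc = (acc << 1) + (-psum if (signed and is_msb) else psum)
    return acc

def macro_mac_4b(inputs, weights, signed=True):
    """Column MAC@4bIN-4bW built from bit planes, WSC, and WAC."""
    psums = []
    for input_bit in (3, 2, 1, 0):  # MSB-first bit-serial inputs
        x_bits = [bit(x, input_bit) for x in inputs]
        mac = [column_mac_1b(x_bits, [bit(w, wb) for w in weights])
               for wb in range(4)]
        psums.append(wsc(mac, signed))
    return wac(psums, signed)

inputs, weights = [3, -2, 7, -8], [-1, 5, -8, 7]
print(macro_mac_4b(inputs, weights), sum(x * w for x, w in zip(inputs, weights)))
```

Negating only the most significant term in both stages is what lets the same hardware handle signed and unsigned operands: in two's complement the top bit carries a weight of $-2^3$, while all lower bits keep positive weights.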


under 0.65-V analog-domain supply voltage and 0.81-V digital supply voltage. The proposed spike-time-based IMC macro is up to 1.9× and 2.04× more energy-efficient than the state-of-the-art digital-domain and analog-domain macros, respectively, and is on par with the state-of-the-art time-domain macro. Table I compares this work with previous SRAM-based IMC works [7], [8], [10], [13], [14], [15], and Fig. 10 shows the Figure-of-Merit comparison (FoM = $EF_{MAC}$ × Input-precision × Weight-precision × Output-precision).

TABLE I
FEATURE SUMMARY AND COMPARISON TO PRIOR WORKS

Fig. 9. Layout photograph, area and energy breakdown, and summary of proposed macro.

Fig. 10. Figure of Merit (FoM) Comparison.

V. CONCLUSION
This brief presents an ADC-free SRAM-based IMC macro that supports high-precision MAC operation for AI applications based on brain-inspired computing. A 0.18-µm 16-Kb IMC SRAM macro implementation is demonstrated, and simulation shows that the energy efficiency can reach 10.8–13.5 TOPS/W while performing MAC operations with 4-bit inputs, 4-bit weights, and 14-bit precision outputs.

REFERENCES
[1] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[2] Y. Chen et al., “DaDianNao: A machine-learning supercomputer,” in Proc. 47th Annu. IEEE/ACM Int. Symp. Microarchit., Dec. 2014, pp. 609–622, doi: 10.1109/MICRO.2014.58.
[3] Y.-H. Chen, T. Krishna, J. S. Emer, and V. Sze, “Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks,” IEEE J. Solid-State Circuits, vol. 52, no. 1, pp. 127–138, Jan. 2017, doi: 10.1109/JSSC.2016.2616357.
[4] S. Yu, H. Jiang, S. Huang, X. Peng, and A. Lu, “Compute-in-memory chips for deep learning: Recent trends and prospects,” IEEE Circuits Syst. Mag., vol. 21, no. 3, pp. 31–56, 3rd Quart., 2021, doi: 10.1109/MCAS.2021.3092533.
[5] T. P. Xiao, C. H. Bennett, B. Feinberg, S. Agarwal, and M. J. Marinella, “Analog architectures for neural network acceleration based on non-volatile memory,” Appl. Phys. Rev., vol. 7, no. 3, 2020, Art. no. 31301, doi: 10.1063/1.5143815.
[6] M. Kang, S. K. Gonugondla, and N. R. Shanbhag, “Deep in-memory architectures in SRAM: An analog approach to approximate computing,” Proc. IEEE, vol. 108, no. 12, pp. 2251–2275, Dec. 2020, doi: 10.1109/JPROC.2020.3034117.
[7] X. Si et al., “A twin-8T SRAM computation-in-memory unit-macro for multibit CNN-based AI edge processors,” IEEE J. Solid-State Circuits, vol. 55, no. 1, pp. 189–202, Jan. 2020.
[8] A. Biswas and A. P. Chandrakasan, “Conv-RAM: An energy-efficient SRAM with embedded convolution computation for low-power CNN-based machine learning applications,” in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), Feb. 2018, pp. 488–490, doi: 10.1109/ISSCC.2018.8310397.
[9] Z. Chen et al., “CAP-RAM: A charge-domain in-memory computing 6T-SRAM for accurate and precision-programmable CNN inference,” IEEE J. Solid-State Circuits, vol. 56, no. 6, pp. 1924–1935, Jun. 2021, doi: 10.1109/JSSC.2021.3056447.
[10] X. Si et al., “A local computing cell and 6T SRAM-based computing-in-memory macro with 8-b MAC operation for edge AI chips,” IEEE J. Solid-State Circuits, vol. 56, no. 9, pp. 2817–2831, Sep. 2021, doi: 10.1109/JSSC.2021.3073254.
[11] W. Gerstner, W. M. Kistler, R. Naud, and L. Paninski, Neuronal Dynamics: From Single Neurons to Networks and Models of Cognition. Cambridge, U.K.: Cambridge Univ. Press, 2014.
[12] J.-M. Hung et al., “An 8-Mb DC-current-free binary-to-8b precision ReRAM nonvolatile computing-in-memory macro using time-space-readout with 1286.4-21.6TOPS/W for edge-AI devices,” presented at the IEEE Int. Solid-State Circuits Conf. (ISSCC), 2022.
[13] Y.-D. Chih et al., “An 89TOPS/W and 16.3 TOPS/mm² all-digital SRAM-based full-precision compute-in memory macro in 22nm for machine-learning edge applications,” in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), vol. 64, 2021, pp. 252–254.
[14] A. Sayal, S. S. T. Nibhanupudi, S. Fathima, and J. P. Kulkarni, “A 12.08-TOPS/W all-digital time-domain CNN engine using bi-directional memory delay lines for energy efficient edge computing,” IEEE J. Solid-State Circuits, vol. 55, no. 1, pp. 60–75, Jan. 2020, doi: 10.1109/JSSC.2019.2939888.
[15] P.-C. Wu et al., “A 28nm 1Mb time-domain computing-in-memory 6T-SRAM macro with a 6.6ns latency, 1241GOPS and 37.01TOPS/W for 8b-MAC operations for edge-AI devices,” in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), vol. 65, 2022, pp. 1–3.