
2023 IEEE 66th International Midwest Symposium on Circuits and Systems (MWSCAS)

Phoenix, Arizona, USA, August 6-9, 2023

ADC-less 3D-NAND Compute-in-Memory Architecture using Margin Propagation

Aswin Chowdary Undavalli¹, Gert Cauwenberghs², Arun Natarajan³, Shantanu Chakrabartty¹ and Aravind Nagulu¹,*
¹Department of Electrical and Systems Engineering, Washington University in St. Louis, MO, USA
²Department of Bioengineering, University of California at San Diego, CA, USA
³School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
*email - [email protected]

979-8-3503-0210-3/23/$31.00 ©2023 IEEE | DOI: 10.1109/MWSCAS57524.2023.10406082

Abstract—Compute-In-Memory (CIM) has gained significant attention in recent years due to its potential to overcome the memory bottleneck in von Neumann computing architectures. While most CIM architectures use non-volatile memory elements in a NOR-based configuration, NAND-based configurations, and in particular 3D-NAND flash memories, are attractive because of their potential to achieve ultra-high memory density and ultra-low cost per bit of storage. Unfortunately, the standard multiply-and-accumulate (MAC) CIM paradigm cannot be directly applied to NAND-flash memories. In this paper, we report a NAND-Flash-based CIM architecture that combines conventional 3D-NAND flash with a Margin-Propagation (MP) based approximate computing technique. We show its application for implementing matrix-vector multipliers (MVMs) that do not require analog-to-digital converters (ADCs) for read-out. Using simulation results, we show that this approach has the potential to provide a 100× improvement in compute density, read speed, and computation efficiency compared to the current state-of-the-art.

Index Terms—Compute-In-Memory, Multiply and Accumulate, NAND Flash, Margin Propagation, Matrix Vector Multipliers.

I. INTRODUCTION

Conventional processors based on the von Neumann architecture suffer from the memory-wall bottleneck [1], which arises from the energy and bandwidth limitations of memory access and of data movement between the memory and the processor. The compute-in-memory (CIM) paradigm [2] can potentially alleviate this bottleneck by integrating some core computing function with the memory and by exploiting highly parallel analog computing techniques. In the literature, several CIM architectures have been proposed and applied to different types of volatile and non-volatile memories (NVMs) [3] as Multiply Compute Elements (MCEs). NVM CIMs are more desirable because the stored parameters, once programmed, can be retained across brown-outs and reused without incurring any data upload costs. Amongst all the non-volatile memory technologies, 3D NAND flash boasts one of the highest integration densities, with memory capacities exceeding 14 GB/mm². Also, the technology can vertically stack more than 200 layers of floating-gate memory cells, which implies each stack occupies a cross-sectional area of less than 0.01F² per bit. Thus, if it were possible to implement an analog compute-in-memory architecture on a 3D NAND flash array, similar to a NOR configuration [2], then the resulting design would easily exceed the state-of-the-art performance specifications. The key technical challenge is that, unlike a NOR configuration, where floating-gate [3] or FeFET transistors are connected in parallel to a current summation line, a NAND flash comprises a cascade of floating-gate transistors. Thus, traditional parallel multiply-and-accumulate (MAC) approaches cannot be applied to NAND architectures to compute inner products or matrix-vector multiplications (MVMs). However, our recent results have shown that inner product computation, like any pattern-matching computation, has sufficient ensemble redundancy to tolerate minor approximation errors in individual MCEs. We leveraged this observation to develop a 3D NAND-based Multiply-Accumulate Macro (MAM) to achieve multi-bit precision for MVM computations. Specifically, we use a Margin-Propagation (MP) based approximate computing architecture [4] whose operational principle perfectly matches the operating physics of a 3-D NAND flash memory, as shown in Fig. 1(a).

Fig. 1. (a) Analog bottom-up, top-down co-design that exploits the computational redundancy in inner-product and computational primitives inherent in margin-propagation. (b) System-level architecture of the differential MP-based 3-D NAND Flash Matrix Vector Multiplier.

The proposed approach also naturally lends itself to an ADC-less readout. As shown in Fig. 1(b), when the NAND stack is coupled to a pre-charged peripheral capacitor, the temporal voltage decay on the capacitor can be used to time-encode the MAC output. In this manner, the proposed NAND-based architecture also eliminates the need for peripheral ADCs. The proposed 3D NAND-based in-memory compute array can therefore result in:

• Highest memory density of all non-volatile memories, by leveraging the density of the 3D NAND stack.
• Parallel computation and single-shot readout per MAM, thus significantly reducing the read time and enhancing compute efficiency.
• ADC-less digital output and readout, by leveraging a capacitor discharging/charging scheme to directly encode the MAM output into time.

Overall, the proposed scheme advances state-of-the-art architectures in NVM computing.

II. PRINCIPLE OF OPERATION

A. 3D NAND Flash

3D NAND flash is a variation on traditional floating-gate non-volatile memories in which the floating-gate transistors are vertically stacked on top of each other, as shown in Fig. 2. This not only increases the storage density but also reduces interference across memory cells, which is important for CIM architectures. The memory cells are vertically arranged in a grid of rows and columns, with each cell consisting of a transistor and a floating gate, as shown in Fig. 2. A thin oxide layer insulates the floating gate from the transistor and can store electrical charges that can represent multi-bit information [5].

Fig. 2. (a) 3D-NAND Flash Topology, where DL = ‘Data Line’, CL = ‘Control Line’, GSL = ‘Ground Select Line’ and CEL = ‘Chip Enable Line’. (b) Single-cell NAND Flash Structure.

To read data from a NAND flash memory cell, a voltage is applied to the cell’s control gate, which switches on the transistor and allows current to flow through the channel. The amount of current that flows depends on the amount of charge stored in the floating gate. This current is detected by a sense amplifier and converted into a digital signal that represents the data stored in the cell. To write data to a NAND flash memory cell, a voltage is applied to the control gate, and another voltage is applied to the cell’s drain, creating an electric field that allows electrons to tunnel through the oxide layer and into the floating gate. NAND flash memory is organized into blocks, with each block consisting of multiple memory cells. To erase data from a block, a higher voltage is applied to the block’s control gate, which releases all electrical charges from the floating gates in the block. Overall, the working of NAND flash involves complex interactions between electrical charges and transistors, but its ability to store data without power and its fast read and write times have made it a popular storage technology in a wide range of applications.

B. System Architecture

In the proposed CIM architecture, we exploit the 3D NAND stacking to efficiently perform a simultaneous Inner Product (IP) calculation between a vector X, with ‘n’ components, and ‘m’ different weight vectors (Wi, for i = 1 to m) in a single read cycle. We propose to use non-volatile 3-D NAND vertical stacks to store the weight vectors (Wi = [w1i w2i ... wni]) along the memory cells in a vertical stack. Multiple weight vectors (Wi, for i = 1 to m) are stored in parallel NAND stacks (as illustrated in Fig. 1(b) in pink ink). The input vector (X = [x1 x2 ... xn]) is applied simultaneously to all the stacks, and the individual inputs (xp, for p = 1 to n) are directed to individual memory cells within each stack (as shown in Fig. 1(b) in cyan ink). From MP-computation theory, we have shown that the difference in resistance between the positive and negative stacks indicates the value of the inner product (discussed in Sections III-A & III-B). During the sense phase, pre-charged capacitors are connected to the output nodes, and the decay of their charge is tracked to encode the inner product value into the pulse width of the digital output pulse (as illustrated in Fig. 1(b) in red ink). In summary, the envisioned system can simultaneously activate all the vertical NAND stacks and the individual memory cells within each stack to perform the inner product calculation between X and Wi, for all i = 1 to m, in a single readout cycle. This architecture enables substantial improvements in efficiency, readout speed, and compute density, making it well suited for in-memory computing applications.

C. Multiplier-less MCEs based on Margin Propagation

In [6], it was demonstrated that an inner product between two vectors X and W can be approximated using a differential MP architecture that computes z+ = MP([X + W, −X − W], γ), z− = MP([X − W, −X + W], γ), and (z+ − z−) ≈ XᵀW. Previously, MP-based vector multipliers were implemented using both digital [7] and analog current-domain [4] methods. In this work, we extend this concept to an in-memory compute architecture based on NAND flash memory, as depicted in Fig. 1.

Starting with a method to realize an MCE: the multi-level input differential weight (w and −w) is stored on 2 units of 2-stacked NAND flash memory cells, and the differential input (x and −x) is incident on the input gates with the appropriate gate bias (Vg) (as shown in Fig. 3(a)). In the common-source configuration, the source terminals (S+ and S−) are grounded, and a current Isense is injected into the drain terminals. When biased in the triode region, the ON-resistance of the floating-gate transistor is RDS = 1/[kn(VGS − VT)]. So the ON-resistance of the left branch can be written as RDS+ = R1+ + R2+, where R1+ and R2+ are the device resistances with x + w and −x − w as effective inputs.
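The differential MP approximation of multiplication described above can be sketched numerically. The following is our own scalar illustration based on the formulation in [6], not the authors' code; the bisection solver and the value of γ are assumptions made for the sketch.

```python
# Sketch (not from the paper) of the differential margin-propagation (MP)
# approximation used for multiplier-less MCEs. MP(v, gamma) is the unique z
# solving sum_i max(v_i - z, 0) = gamma, found here by bisection.

def mp(v, gamma, iters=60):
    lo, hi = min(v) - gamma, max(v)   # the root is bracketed in [lo, hi]
    for _ in range(iters):
        z = 0.5 * (lo + hi)
        if sum(max(vi - z, 0.0) for vi in v) > gamma:
            lo = z                    # constraint sum too large -> raise z
        else:
            hi = z
    return 0.5 * (lo + hi)

def mp_multiply(x, w, gamma=0.1):
    """Differential MP approximation of x * w (up to a scale factor)."""
    z_pos = mp([x + w, -x - w], gamma)
    z_neg = mp([x - w, -x + w], gamma)
    return z_pos - z_neg

# The approximation preserves the sign of the true product:
for x in (-0.8, -0.3, 0.4, 0.9):
    for w in (-0.7, 0.5):
        assert mp_multiply(x, w) * (x * w) > 0
```

For small γ the difference reduces to |x + w| − |x − w|, which is monotonic in x·w; the residual nonlinearity is part of the approximation error that the ensemble redundancy of the inner product tolerates.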
Fig. 3. (a) Schematic of a 2-stacked NAND MCE. Output of a 2-stacked NAND MCE across varying x and w when operating in (b) common-source configuration, and (c) common-drain configuration.

These effective inputs correspond to the actual gate input plus the VT change due to the trapped charges, and the resistances can be expressed as

R1+ = 1/[kn(VgT + (x − w))],   R2+ = 1/[kn(VgT − (x − w))].

Similarly, the ON-resistance of the right branch can be written as RDS− = R1− + R2−, where R1− and R2− are the device resistances with −x + w and x − w as inputs and can be expressed as

R1− = 1/[kn(VgT + (x + w))],   R2− = 1/[kn(VgT − (x + w))].

Finally, we can show that

RDS+ = 2VgT/[kn(VgT² − (x − w)²)],   RDS− = 2VgT/[kn(VgT² − (x + w)²)],

VD = −[8 VgT Isense / (kn(VgT² − (x + w)²)(VgT² − (x − w)²))] · (x × w),

VD ∝ (x × w) for large VgT,

where RDS+ and RDS− represent the resistances of the positive and negative stacks respectively, and VgT = Vg − VT. As can be seen, the differential drain voltage tracks the product, VD ∝ x × w (CS-NAND MCE). Similar behavior is also observed in our simulations shown in Fig. 3(b). Due to the lack of availability of NAND flash technology at present, the simulations were performed using conventional 22nm transistors, where the threshold change of the flash memory cells is emulated using a voltage source in series with the transistor gate.

Additionally, the MP-based NAND Flash MCE can also operate in the common-drain configuration, where the drain terminals are biased at VDD and the current Isense is drawn out of the source terminals. The analysis of such a system is more involved and has been omitted for brevity. Essentially, in this case, the differential output voltage VS = (VS+ − VS−) tracks the multiplication and is monotonic in (x × w) (CD-NAND MCE), as shown in Fig. 3(c). The nonlinearity of the output with respect to the product (x × w) can be easily calibrated during the calibration phase. In summary, to realize a single analog multiplication, we use four non-volatile flash memory units (either in the CD or CS configuration) occupying a feature size of 2F × 2F.

III. MP-BASED MVMs

A. MVMs Using 3D MP-NAND Flash

Expanding upon the single-unit NAND-based MCE, vector inner product calculations can be easily achieved by stacking unit cells on top of each other, as illustrated in Fig. 4(a). This approach would be highly efficient in terms of area utilization if implemented with 3-D NAND Flash technology. In fact, a single read operation of a 128-stacked NAND flash memory can evaluate an inner product of a vector with 64 elements. As the number of NAND cells per stack increases, the mismatch errors are averaged. When the 128-stacked NAND flash architecture is biased in the CD configuration, the MP-based NAND Flash MVM yielded an inner-product computation with an RMS error of −30.5 dB, equivalent to 5-bit precision (see Fig. 4(b)). When biased in the CS configuration, the MP-based NAND Flash MAM yielded an inner-product computation with an RMS error of −43.5 dB, equivalent to 7.2-bit precision (see Fig. 4(c)). The reduced computing precision of the CD-NAND MVM compared to the CS-NAND MVM is due to the higher non-linearity in the CD-NAND MCE, as shown in Fig. 3(c).

B. Parallel Computation and Single-Shot Readout per MVM

In the most recent NAND flash-based in-memory compute blocks, only one unit cell within the vertical stack is activated per read cycle, and this approach limits the read time and computing efficiency [8]. However, our proposed scheme improves upon this limitation by activating all the unit cells in a vertical stack. This allows the inner product between the stored weights and the input vector to be computed within a single read cycle, significantly improving the read time and compute efficiency by a factor of (0.5 × no. of stacked unit cells) (discussed in Section II-C). Additionally, when implemented on a 3-D NAND flash technology, this improvement can enable extremely compact implementations. For instance, with the current state-of-the-art 3-D NAND flash, it is possible to stack up to 256 unit cells, resulting in a 128× improvement in read time, compute efficiency, and implementation area.

C. ADC-Less Time-Encoded Output and Readout

Peripheral circuits, such as the output ADC, can limit the efficiency of a system. To address this issue, we propose an ADC-less readout scheme, shown in Fig. 5(a).
Fig. 4. (a) Schematic of the MP-based 3-D NAND Flash MAM. Representative simulation of a 128-stacked NAND Flash MAM across varying inner product
when operating in (b) common-drain configuration resulting in 5-bit precision and (c) common-source configuration resulting in 7.2-bit precision.
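The CS-stack behavior behind these results can be sanity-checked with a short numerical model. This is our own sketch, not the paper's simulation: the per-cell resistances use the closed-form expressions from Section II-C, and the kn, VgT, and Isense values are purely illustrative.

```python
# Sketch of the CS-configuration stack model: each vector element adds a
# series resistance to the positive and negative branches, and the
# differential drain voltage VD = Isense * (R+ - R-) tracks the inner
# product X.W (with a negative scale factor, per the derivation).

def stack_vd(x, w, vgt=2.0, kn=1.0, isense=1.0):
    """Differential drain voltage of a CS-NAND stack (triode model)."""
    r_pos = sum(2 * vgt / (kn * (vgt**2 - (xi - wi)**2)) for xi, wi in zip(x, w))
    r_neg = sum(2 * vgt / (kn * (vgt**2 - (xi + wi)**2)) for xi, wi in zip(x, w))
    return isense * (r_pos - r_neg)

x = [0.1, -0.2, 0.15]
w = [0.05, 0.1, -0.1]
dot = sum(xi * wi for xi, wi in zip(x, w))      # true inner product: -0.03
ideal = -8 * 1.0 * dot / (1.0 * 2.0**3)         # large-VgT limit of VD
vd = stack_vd(x, w)
assert abs(vd - ideal) < 0.05 * abs(ideal)      # within a few % for small x, w
```

The residual deviation from the ideal product grows as |x ± w| approaches VgT, which is the nonlinearity that bounds the achievable precision reported above.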

The scheme encodes the output in the pulse width of a digital signal. Initially, the output capacitors are pre-charged. During the compute phase, the output capacitors discharge through the MCE with different time constants in the positive and negative branches, causing the output nodes to reach the comparator reference voltage at different time instances. We can extract the difference between these instances using XOR and AND gates. Simulations suggest that the pulse width can successfully encode the multiplication of x and w. Fig. 5(b) illustrates the pulse width of the XOR output across varying inner-product values between two vectors X and W, creating a profile equivalent to an ideal multiplication. For a total discharge time (i.e., readout time per vector inner product) of 45 ns, the overall pulse width ranges from −4 ns to +4 ns, as shown in Fig. 5(b).

Fig. 5. (a) Schematic representation of a 2-stacked NAND MCE with time-encoded output, including a calibration phase that calibrates the pulse width due to comparator offsets. (b) The output of a CS-NAND MCE across varying inner product between two vectors X and W.
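The time encoding above can be illustrated with a first-order RC model. This is our own sketch, not the paper's circuit simulation, and the component values are arbitrary placeholders.

```python
import math

# First-order model of the ADC-less readout: a capacitor pre-charged to V0
# discharges through the branch resistance R and crosses the comparator
# reference Vref at t = R*C*ln(V0/Vref). The XOR pulse width is the
# difference of the two crossing times, so it is linear in (R+ - R-),
# which in turn tracks the inner product.

def crossing_time(r, c=1.0, v0=1.0, vref=0.5):
    """Time for an RC discharge from v0 to cross vref."""
    return r * c * math.log(v0 / vref)

def pulse_width(r_pos, r_neg):
    """Width of the XOR pulse between the two comparator flips."""
    return crossing_time(r_pos) - crossing_time(r_neg)

# Pulse width is proportional to the branch-resistance difference:
assert abs(pulse_width(1.2, 1.0) - 0.2 * math.log(2)) < 1e-12
```

The sign of the pulse width (which comparator flips first) carries the sign of the inner product, which is why the AND gate is needed alongside the XOR in the readout logic.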
This pulse width can be digitized to 8-bit accuracy assuming a time-to-digital converter with an LSB of ∼31 ps, which can be accomplished with sub-mW power in scaled CMOS technologies [9]. This concept can be easily extended to the 3D-NAND MAM-based MVM system discussed earlier, resulting in an ADC-less readout scheme, thereby enhancing the overall system efficiency.

IV. CONCLUSION

Combining the proposed MP-based MAM with 3-D NAND Flash will enable a 100× increase in read speed without compromising the computing efficiency. The NAND flash MAM's parallel compute and single-stack architecture can enable a 100× improvement in memory density compared to CIM architectures based on NOR Flash memory. The proposed ADC-less readout scheme, which encodes the output in the pulse width of a digital signal, would reduce the energy overhead from the peripheral circuits.

REFERENCES

[1] M. Horowitz, "1.1 Computing's energy problem (and what we can do about it)," in IEEE ISSCC, pp. 10–14, 2014.
[2] N. R. Shanbhag and S. K. Roy, "Comprehending in-memory computing trends via proper benchmarking," in IEEE CICC, pp. 1–7, 2022.
[3] S. Chakrabartty and G. Cauwenberghs, "Sub-microwatt analog VLSI trainable pattern classifier," IEEE JSSC, vol. 42, no. 5, pp. 1169–1179, 2007.
[4] M. Gu and S. Chakrabartty, "Synthesis of bias-scalable CMOS analog computational circuits using margin propagation," IEEE TCAS I: Regular Papers, vol. 59, no. 2, pp. 243–254, 2011.
[5] A. Goda, "Recent progress on 3D NAND flash technologies," Electronics, vol. 10, no. 24, p. 3156, 2021.
[6] A. R. Nair et al., "Multiplierless MP-kernel machine for energy-efficient edge devices," IEEE Trans. on VLSI Systems, vol. 30, no. 11, pp. 1601–1614, 2022.
[7] M. Gu and S. Chakrabartty, "A 100 pJ/bit, (32,8) CMOS analog low-density parity-check decoder based on margin propagation," IEEE JSSC, vol. 46, no. 6, pp. 1433–1442, 2011.
[8] M. Kim et al., "An embedded NAND flash-based compute-in-memory array demonstrated in a standard logic process," IEEE JSSC, vol. 57, no. 2, pp. 625–638, 2022.
[9] H. Kim et al., "19.3 A 2.4GHz 1.5mW digital MDLL using pulse-width comparator and double injection technique in 28nm CMOS," in IEEE ISSCC, pp. 328–329, 2016.
