14.2 A Compute SRAM with Bit-Serial Integer/Floating-Point Operations for Programmable In-Memory Vector Acceleration

Jingcheng Wang, Xiaowei Wang, Charles Eckert, Arun Subramaniyan, Reetuparna Das, David Blaauw, Dennis Sylvester

University of Michigan, Ann Arbor, MI

Data movement and memory bandwidth are dominant factors in the energy and performance of both general-purpose CPUs and GPUs. This has led to extensive research focused on in-memory computing, which moves computation to where the data is located. With this approach, computation is often performed on the memory bit-lines in the analog domain using current summing [1-3], which requires expensive analog-to-digital and digital-to-analog conversions at the array boundary. In addition, such analog computation is very sensitive to PVT variations, limiting precision. More recently, full-rail (digital) binary in-memory computing was proposed to avoid this conversion overhead and improve robustness [4, 5]. However, both prior in-memory approaches suffer from the same major limitations: they accelerate only one type of algorithm and are inherently restricted to a very specific application domain due to their limited and fixed bit-width precision and non-programmable architecture. Software algorithms, on the other hand, continue to evolve rapidly, especially in novel application domains such as neural networks, vision, and graph processing, making rigid accelerators of limited use. Furthermore, most available SRAM in today's chips is located in the caches of CPUs or GPUs. These large CPU and GPU SRAM stores present an opportunity for extensive in-memory computing and have, to date, remained largely untapped.

In this paper, we present a general-purpose hybrid in-/near-memory compute SRAM (CRAM) that combines the efficiency of in-memory computation with the flexibility and programmability necessary for evolving software algorithms. CRAM augments conventional SRAM in a CPU with vector-based, bit-serial [6, 7] in-memory arithmetic. It can accommodate a wide range of bit-widths, from a single bit to 32b or 64b, and operation types, including integer and floating-point addition, multiplication, and division. To maintain compatibility with CPU/GPU operation, CRAM writes/reads operands conventionally with horizontal word-lines and vertical bit-lines. Then, using a transposable bitcell [8], CRAM operates directly on the stored operands in memory with additional horizontal compute bit-lines. This enables the same bit position of two vector elements to be accessed simultaneously on a single bit-line. Logic operations are performed on the bit-line (in-memory), while small additional in-column logic (near-memory, with 4.5% SRAM bank area overhead) enables carry propagation between successive bit-serial calculations, enabling multi-bit arithmetic operations in SIMD fashion across all vector elements. To maintain versatility, the memories can function either as traditional or compute memories. The approach was implemented in a small IoT processor in 28nm CMOS, consisting of a Cortex-M0 CPU and 8 CRAM banks of 16KB each (128KB total). The system achieves 475MHz operation and, with all CRAMs active, produces 30GOPS or 1.4GFLOPS on 32b operands for graph, neural, and DSP applications.
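
To make the transposed, bit-serial data layout concrete, here is a minimal Python sketch (ours, not from the paper; the dimensions and helper names are illustrative). Each row of the bit matrix stands for one compute bit-line holding one vector element, and each column for one bit position, so a single column access touches the same bit of every element in parallel.

```python
import numpy as np

ROWS = 256   # compute bit-lines (vector elements) in one bank; illustrative
COLS = 256   # bit positions addressable by compute word-lines; illustrative

# A compute-configured bank modeled as a bit matrix:
# row r holds vector element r, column c holds bit c of that element.
bank = np.zeros((ROWS, COLS), dtype=np.uint8)

def store_vector(bank, base_col, values, width):
    """Write a vector in transposed form: bit i of every element lands in
    column base_col + i (LSB first)."""
    for r, v in enumerate(values):
        for i in range(width):
            bank[r, base_col + i] = (v >> i) & 1

def read_column(bank, col):
    """One compute word-line access: bit `col` of ALL elements at once."""
    return bank[:, col]

# Example: 8b operand vectors A and B stored at column bases 0 and 8.
A = np.random.randint(0, 256, ROWS)
B = np.random.randint(0, 256, ROWS)
store_vector(bank, 0, A, 8)
store_vector(bank, 8, B, 8)

# Reading columns 0 and 8 yields the LSBs of every A and B element, which is
# exactly the pair of bit-vectors one bit-serial compute cycle operates on.
assert np.array_equal(read_column(bank, 0), A & 1)
assert np.array_equal(read_column(bank, 8), B & 1)
```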

Figure 14.2.1 shows the overall organization of the IoT processor. The ARM core can access all 8 memory banks and load/store data using the horizontal word-lines and vertical bit-lines. Then, in-memory instructions can be streamed from one bank to one or more compute-configured banks, while the M0 simultaneously performs other processing with the remaining memory banks. Banks performing in-memory computing use the horizontal compute bit-lines (CBLs) and vertical compute word-lines (CWLs).

Figure 14.2.2 shows the architecture of the 128×256 CRAM sub-array, which is one quarter of a 16KB CRAM macro. An 8T transposable bitcell is used to provide bidirectional access. Fig. 14.2.2 also shows an example of the data flow for a 1b addition performed in one cycle of the bit-serial computation. Here, we add the second bit positions of vector A (A1=0) and vector B (B1=1) with carry-in C (=1) from the previous cycle, and store the result back to vector D. First, the CRAM instruction decoder receives the ADD instruction and the 3 column addresses of bits A1, B1 and D1. It activates the CWLs of A1 and B1 simultaneously to compute ‘A AND B’ on CBL and ‘(NOT A) AND (NOT B)’ on CBLB. Since A=0 and B=1, both CBL and CBLB discharge. Then, after the dual sense amps, the results propagate to the near-memory logic located at the end of each CBL. The NOR gate generates ‘A XOR B’, which, combined with Cin from the carry latch, produces Sum=0 and Cout=1. Sum is then written back to D, and Cout is stored in the carry latch, which provides Cin for the next cycle, thus completing one full bit-serial addition in one clock cycle.
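
The per-cycle logic described above can be captured in a short behavioral model. This is a sketch of the sensing and near-memory logic as described in the text, not the actual circuit; in particular, the carry expression below is the standard full-adder form, which is consistent with (but not spelled out by) the description.

```python
def bit_serial_add_cycle(a, b, cin):
    """Behavioral model of one CRAM bit-serial addition cycle (inputs in {0, 1})."""
    cbl  = a & b                   # wired-AND of the two selected bits sensed on CBL
    cblb = (1 - a) & (1 - b)       # wired-AND of their complements sensed on CBLB
    xor_ab = 1 - (cbl | cblb)      # near-memory NOR of the sensed values = A XOR B
    s    = xor_ab ^ cin            # sum, written back to the destination column
    cout = cbl | (xor_ab & cin)    # carry into the next cycle (assumed full-adder form)
    return s, cout

# Worked example from the text: A1 = 0, B1 = 1, Cin = 1  ->  Sum = 0, Cout = 1.
assert bit_serial_add_cycle(0, 1, 1) == (0, 1)

# Exhaustive check against ordinary binary addition.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            s, co = bit_serial_add_cycle(a, b, c)
            assert 2 * co + s == a + b + c
```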

Figure 14.2.3, left, shows how two vectors of 2b numbers (A and B) are added bit-by-bit starting from the least significant bit (LSB). Note that while only one bit of a multi-bit operand is processed in each cycle, all compute bit-lines operate simultaneously, resulting in massive parallelism (2048 CBLs in our design). Subtraction is performed by first inverting B and then adding to A with Cin pre-set to 1. As shown in Fig. 14.2.3, multiplication is more complicated as it requires predication. For this, the tag latch (Fig. 14.2.2) is used to enable the write-back driver, resulting in a conditional copy/addition. First, 4 empty columns in the array are reserved for the product and initialized to zero. In the first cycle, the LSB of the multiplier is loaded to the tag latch. In cycles 2 and 3, the multiplicands are copied to the product columns only if their tag is 1. In cycle 4, the second bit of the multiplier is loaded to the tag latch. In the next 2 cycles, for rows with tag = 1, the multiplicands are added to the second and third bits of the product, shifting the multiplicands by 1 to account for the multiplier bit position. Finally, we store Cout in the most significant bit (MSB) of the product to complete the multiplication. Note that partial products are implicitly shifted as they are added using appropriate bit addressing in the bit-serial operation, and no explicit shift is performed. Division is conducted similarly, by implicit shifting and subtraction from a partial result. Floating-point arithmetic is implemented using repeated integer add/sub/mult/div with predication. Fig. 14.2.3 provides a list of supported computations and their performance, demonstrating both the versatility of CRAM and its high performance due to bit-line parallelism.
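
To see how the multi-bit operations fall out of this single-bit cycle, the following Python model (ours, purely illustrative; `add_columns`, `multiply_2b`, and the column layout are assumptions for demonstration) walks the columns LSB-first for a tag-predicated addition and then reproduces the conditional-copy / shifted-add sequence of the 2b multiplication, with the final carry landing in the product MSB.

```python
def full_add(a, b, cin):
    # Same truth table as the bit-serial cycle modeled above: (sum, carry-out).
    s = a ^ b ^ cin
    return s, (a & b) | ((a ^ b) & cin)

def add_columns(mem, base_a, base_b, base_d, width, tag):
    """Bit-serial add, one column (bit position) per cycle: for rows where
    tag == 1, write D = A + B. mem[r][c] is bit c of row r, LSB at the base."""
    carry = [0] * len(mem)
    for i in range(width):
        for r, row in enumerate(mem):
            if not tag[r]:
                continue                      # tag latch gates the write-back driver
            s, carry[r] = full_add(row[base_a + i], row[base_b + i], carry[r])
            row[base_d + i] = s
    for r, row in enumerate(mem):             # final carry lands in the next column up
        if tag[r]:
            row[base_d + width] = carry[r]

def multiply_2b(mem, base_a, base_b, base_p):
    """2b x 2b multiply into a 4-column product at base_p, mirroring the text:
    predicated copy for multiplier bit 0, then a predicated add at a shifted
    column address for multiplier bit 1 (the shift is implicit in the addressing)."""
    rows = range(len(mem))
    for row in mem:                           # product columns start out zeroed
        for i in range(4):
            row[base_p + i] = 0
    tag = [mem[r][base_b + 0] for r in rows]  # cycle 1: multiplier LSB -> tag latch
    for r in rows:                            # cycles 2-3: conditional copy of multiplicand
        if tag[r]:
            mem[r][base_p + 0] = mem[r][base_a + 0]
            mem[r][base_p + 1] = mem[r][base_a + 1]
    tag = [mem[r][base_b + 1] for r in rows]  # cycle 4: second multiplier bit -> tag latch
    # Remaining cycles: add the multiplicand into product bits 1..2; the final
    # carry is written to the product MSB (base_p + 3).
    add_columns(mem, base_a, base_p + 1, base_p + 1, 2, tag)

# Usage: two rows; multiplicand in columns 0-1, multiplier in 2-3, product in 4-7.
mem = [[1, 1, 1, 1, 0, 0, 0, 0],   # 3 x 3
       [0, 1, 1, 1, 0, 0, 0, 0]]   # 2 x 3
multiply_2b(mem, base_a=0, base_b=2, base_p=4)
assert [sum(row[4 + i] << i for i in range(4)) for row in mem] == [9, 6]
```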

Figure 14.2.4 shows measurement results from the prototype chip fabricated in 28nm CMOS that contains 8 CRAM banks (128KB memory with 2048 computing rows) and a Cortex-M0 processor. The figure shows measured frequency and energy efficiency of 8b addition and multiplication across supply voltage. At 1.1V, the maximum frequency of 475MHz results in 122GOPS for 8b addition and 9.4GOPS for 8b multiplication. The best energy efficiency is achieved at 0.6V and 114MHz, resulting in 0.56TOPS/W for 8b multiplication and 5.27TOPS/W for 8b addition. Fig. 14.2.4 shows measured frequency and leakage power distributions for 21 measured dies.
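
As a back-of-envelope consistency check (our own arithmetic, not a figure from the paper), these throughput numbers imply the per-element cycle counts computed below, assuming each of the 2048 compute bit-lines delivers one result per completed operation.

```python
# Implied cycles per 8b operation at 1.1V / 475MHz with 2048 parallel compute bit-lines.
lanes, f_clk = 2048, 475e6
peak = lanes * f_clk   # elementwise results per second if an op took a single cycle

for name, gops in (("8b add", 122e9), ("8b multiply", 9.4e9)):
    print(f"{name}: ~{peak / gops:.0f} cycles per element")
# -> ~8 cycles for the bit-serial 8b add (about one column per cycle) and
#    ~104 cycles for the 8b multiply, consistent with a shift-and-add sequence.
```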

Figure 14.2.5 shows the performance of the test chip for diverse computationally intensive tasks ranging from neural networks to graph and signal processing. The total latency in cycles is compared with a baseline operation, where CRAMs are only used as data memories and the computation is entirely performed on the ARM CPU. The first benchmark is the 1st convolutional layer from Cuda-convnet and the second is the last fully connected layer from AlexNet. Due to their size, these layers must be executed in multiple smaller sub-sections. The third application consists of 512 simultaneous 32-tap FIR filters, and the fourth application performs traversal of a directed graph represented by a 192×192 adjacency matrix. The workload breakdown shows the percentage of time spent on input loading and output loading vs. in-memory computation. Speedup, compared to executing the same workload with the ARM Cortex-M0, varies from 7.2× to 114×, with the greatest gains obtained when the operation is compute-heavy and low on input/output movement.

Figure 14.2.6 compares the proposed approach with other state-of-the-art in-memory accelerators. The proposed work is the only solution to provide a wide range of instructions and flexible bit-width. It repurposes the memory storage already available in processors, thereby accelerating computation while maintaining programmability.

Acknowledgements:
We gratefully acknowledge the TSMC University Shuttle Program for chip fabrication. This work was supported in part by ADA, one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA.

References:
[1] J. Zhang, et al., "In-Memory Computation of a Machine Learning Classifier in a Standard 6T SRAM Array," IEEE JSSC, vol. 52, no. 4, pp. 915-924, 2017.
[2] A. Biswas, et al., "Conv-RAM: An Energy-Efficient SRAM with Embedded Convolution Computation for Low-Power CNN-based Machine Learning Applications," ISSCC, pp. 488-489, 2018.
[3] S. Gonugondla, et al., "A 42pJ/Decision 3.12TOPS/W Robust In-Memory Machine Learning Classifier with On-Chip Training," ISSCC, pp. 490-491, 2018.
[4] W. Khwa, et al., "A 65nm 4Kb Algorithm-Dependent Computing-in-Memory SRAM Unit-Macro with 2.3ns and 55.8TOPS/W Fully Parallel Product-Sum Operation for Binary DNN Edge Processors," ISSCC, pp. 496-497, 2018.
[5] Y. Zhang, et al., "Recryptor: A Reconfigurable In-Memory Cryptographic Cortex-M0 Processor for IoT," IEEE Symp. VLSI Circuits, 2017.
[6] K. Batcher, "Bit-Serial Parallel Processing Systems," IEEE Trans. on Computers, vol. 31, no. 5, pp. 377-384, 1982.
[7] C. Eckert, et al., "Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks," ACM/IEEE ISCA, pp. 383-396, 2018.
[8] J. Seo, et al., "A 45nm CMOS Neuromorphic Chip with a Scalable Architecture for Learning in Networks of Spiking Neurons," IEEE CICC, 2011.


Figure 14.2.1: Chip architecture and storage and computation of data in transposable memory array.

Figure 14.2.2: CRAM array architecture (top-left), 8T transposable bitcell (top-right), in-memory computing part (bottom-left) and near-memory computing part (bottom-right) of 1-bit addition. Addition of near-memory logic increased array size by 4.5%.


Figure 14.2.3: 2-bit addition cycle-by-cycle demonstration (top-left), 2-bit multiplication cycle-by-cycle demonstration (top-mid & right), and list of CRAM instructions and their performance (bottom).

Figure 14.2.4: Frequency and energy efficiency of 8-bit multiplication and addition at different VDD (top), maximum frequency and leakage power distribution of 21 dies at 1.1V (bottom).

Figure 14.2.5: Performance comparison between CRAM and baseline scenario (top), workload breakdown (bottom).

Figure 14.2.6: Comparison table.




Figure 14.2.7: Die photo.

