
Binary Convolutional Neural Network on RRAM
Tianqi Tang, Lixue Xia, Boxun Li, Yu Wang, Huazhong Yang

Dept. of E.E., Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing, China
e-mail: [email protected]
Outline

• Background & Motivation

• RRAM-based Binary CNN Accelerator Design


– System Overview
– Convolver Circuit
– Line Buffer & Pipeline

• Experimental Results
– Comparison between “BCNN on RRAM” and “Multi-bit CNN on RRAM”
– Recognition Accuracy under Device Variation
– Area and Energy Cost Saving

• Conclusion

2
Convolutional Neural Network

• Popular over the Recent Ten Years, with Good Performance for a Wide Range of Applications

Generalized Recognition (CNN): Object Detection & Localization, Image Captioning

Specialized Recognition (CNN): Lane Detection & Vehicle Detection, Pedestrian Detection, Face Recognition

Beyond Vision Tasks (CNN + other): Speech Recognition (CNN + RNN), Natural Language Processing (CNN + LSTM), Chess & Go (CNN + Reinforcement Learning)
3
Convolutional Neural Network

• Popular over the Recent Ten Years, with Good Performance for a Wide Range of Applications
Winners of the Image-Net Large-Scale Visual Recognition Challenge (ILSVRC)
Task: Classification & Localization with Provided Data
Year         2011     2012         2013      2014     2015        2016
Team         XRCE     SuperVision  Clarifai  VGG      MSRA        Trimps-Soushen
Model        Not CNN  AlexNet      ZF        VGG-16   ResNet-152  Ensemble
Err (Top-5)  25.8%    16.4%        11.7%     7.4%     3.57%       2.99%

For the current popular CNN models (ResNet series):
• <5% Top-5 Error Rate
• 10-50M Weights per Single Model
• >10G Operations per Inference

A High-Energy-Efficiency Platform is Required for CNN Applications!

[Figure: accuracy vs. computation of recent CNN models, from arXiv:1605.07678]
4
Convolutional Neural Network

• Layer-wise Structure & Basic Operations


LeNet-5

CNN Operations: Fully-Connected, Neuron, Convolution, Pooling, Normalization
DNN Operations: Fully-Connected, Neuron
5
Convolutional Neural Network

• Layer-wise Structure & Basic Operations


LeNet-5

CNN on RRAM ?
Operations: Fully-Connected, Neuron, Convolution, Pooling, Normalization
DNN on RRAM ✓ (PRIME [ISCA 2016], ISAAC [ISCA 2016])
6
Convolutional Neural Network

• Layer-wise Structure & Basic Operations

Convolution vs. Fully-Connected
• Matrix-Vector Multiplication y = W·x → Convolver Circuit (see the code sketch below)
  • Weight Mapping
  • Interface Design
• Sliding Window → Data Buffering & Fetching

Pooling
• One-to-One Mapping

[Figure: the layer pipeline from Input Image → Conv Layer 1 → Pooling Layer 1 → … → Conv Layer N → Pooling Layer N → FC Layer (N+1) → … → FC Layer (N+M) → Recognition Result]
7
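To make the "Convolution → Matrix-Vector Multiplication" point concrete, here is a minimal NumPy sketch of how a Conv layer is flattened into y = W·x before being mapped onto a crossbar. The im2col helper, the shapes, and the layer sizes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def im2col(fmap, kh, kw):
    """Unroll every kh x kw sliding window of fmap (C, H, W) into one column."""
    C, H, W = fmap.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    cols = np.empty((C * kh * kw, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            cols[:, i * out_w + j] = fmap[:, i:i + kh, j:j + kw].ravel()
    return cols

# Each Conv kernel becomes one row of W (programmed into the crossbar);
# each sliding-window position becomes one input vector x.
C_in, C_out, kh, kw = 3, 8, 3, 3
kernels = np.random.randn(C_out, C_in, kh, kw)
fmap = np.random.randn(C_in, 6, 6)

W_mat = kernels.reshape(C_out, -1)   # (C_out, C_in*kh*kw): the crossbar weight matrix
x_cols = im2col(fmap, kh, kw)        # (C_in*kh*kw, number of window positions)
y = W_mat @ x_cols                   # the convolver circuit's matrix-vector multiplications
print(y.shape)                       # (8, 16): C_out outputs per window position
```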
Convolutional Neural Network

• Layer-wise Structure & Basic Operations

Convolution vs. Fully-Connected
• Matrix-Vector Multiplication → Convolver Circuit
• Sliding Window → Data Buffering & Fetching

Normalization vs. Neuron
• One-to-One Mapping → Peripheral Circuit
• Linear or Non-Linear Function

Pooling
• Sliding Window → Data Buffering & Fetching

[Figure: the layer pipeline from Input Image → Conv Layer 1 → Pooling Layer 1 → … → Conv Layer N → Pooling Layer N → FC Layer (N+1) → … → FC Layer (N+M) → Recognition Result]
8
BCNN on RRAM

• Two Main Concerns for “CNN on RRAM” Design


• Convolver Circuit Design

• Intermediate Result Buffering & Fetching

9
BCNN on RRAM

• Two Main Concerns for “CNN on RRAM” Design


• Convolver Circuit Design

• X-axis: GOPS, NOT GFLOPS!

• Quantization is always introduced:

y = W·x
→ Weight Quantization
→ Interface Quantization

• Intermediate Result Buffering & Fetching

10
BCNN on RRAM

• Two Main Concerns for “CNN on RRAM” Design


• Convolver Circuit Design
• Weight Mapping: Lower Bit-Level of Weights
• Lower Requirement on RRAM Representation Ability
• Interface Design: Lower Bit-Level of Neurons
• Lower Cost of ADC/DAC Interfaces
• Extreme Case: Binary Weights & Binary Interfaces
→ Binary CNN on RRAM

• Intermediate Result Buffering & Fetching


• Line Buffer Structure & Pipelining Strategy

11
BCNN on RRAM

• Binary CNN
• Training Workflow: Binarize while Training
• BinaryNet [arXiv:1602.02830]
• XNOR-Net [Rastegari ECCV 2016]

• Inference: How to Map the Well-Trained BCNN onto RRAM?


y_b = Binarize(W_b · x_b)  (see the code sketch below)
• The Case when “RRAM Crossbar Size > Weight Matrix Size”
→ Direct Mapping
• The Case when “RRAM Crossbar Size < Weight Matrix Size”
e.g. crossbar column length = 128, Conv kernel size of VGG19 = 3*3*512
→ Matrix Splitting

12
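A minimal sketch of the inference-time mapping y_b = Binarize(W_b · x_b) together with the direct-mapping vs. matrix-splitting decision. The ±1 sign binarization follows BinaryNet; the 128-row crossbar limit and all names are illustrative assumptions, not the paper's circuit.

```python
import numpy as np

CROSSBAR_ROWS = 128  # assumed RRAM crossbar column length

def binarize(x):
    """Sign binarization to {-1, +1}, as in BinaryNet-style models."""
    return np.where(x >= 0, 1, -1)

def bcnn_layer(W_b, x_b):
    """y_b = Binarize(W_b . x_b), splitting the matrix when it is taller than one crossbar."""
    n_in = W_b.shape[1]
    if n_in <= CROSSBAR_ROWS:
        return binarize(W_b @ x_b)            # direct mapping: one crossbar suffices
    partial = np.zeros(W_b.shape[0])          # matrix splitting: sum crossbar partial products
    for s in range(0, n_in, CROSSBAR_ROWS):
        partial += W_b[:, s:s + CROSSBAR_ROWS] @ x_b[s:s + CROSSBAR_ROWS]
    return binarize(partial)

# e.g. a VGG19-like kernel: 3*3*512 = 4608 inputs > 128 rows, so splitting is required
W_b = binarize(np.random.randn(64, 3 * 3 * 512))
x_b = binarize(np.random.randn(3 * 3 * 512))
print(bcnn_layer(W_b, x_b)[:8])
```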
BCNN on RRAM: Convolver Circuit

How to map a large matrix onto a group of RRAM crossbars?

• Column Splitting (# RRAM Columns < # Kernels)
• Row Splitting (RRAM Column Length < Kernel Size): the partial products of several crossbars are summed
• Signal Splitting: Weight = Pos - Neg

[Figure: a large weight matrix tiled over several crossbars; row splitting adds the partial sums]

How to decide the bit-level of the partial sum?
• Current Solution: a 4-bit interface, chosen by brute-force search to trade off accuracy and energy efficiency.
• Future Work: merge the partial-sum quantization into the training process.
(A code sketch of the splitting and the 4-bit partial-sum interface follows this slide.)
13
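The column/row/signal splitting and the 4-bit partial-sum interface can be sketched as below. The uniform quantizer, the tile sizes, and the Pos/Neg decomposition into two non-negative crossbars are illustrative assumptions based on the slide, not the exact circuit.

```python
import numpy as np

def quantize(x, bits=4):
    """Uniform quantization of a partial-sum vector to a bits-bit interface (illustrative)."""
    levels = 2 ** bits - 1
    lo, hi = x.min(), x.max()
    step = (hi - lo) / levels if hi > lo else 1.0
    return lo + np.round((x - lo) / step) * step

def crossbar_mvm(W_tile, x_tile, bits=4):
    """One crossbar pair: signal splitting keeps conductances non-negative (W = Pos - Neg)."""
    W_pos, W_neg = np.clip(W_tile, 0, None), np.clip(-W_tile, 0, None)
    partial = W_pos @ x_tile - W_neg @ x_tile
    return quantize(partial, bits)            # partial sum leaves through a 4-bit interface

def split_mvm(W, x, xbar_rows=128, xbar_cols=128, bits=4):
    """Column splitting over kernels, row splitting over inputs; quantized partial sums are added."""
    y = np.zeros(W.shape[0])
    for r in range(0, W.shape[0], xbar_cols):        # column splitting: kernels per crossbar
        for c in range(0, W.shape[1], xbar_rows):    # row splitting: inputs per crossbar column
            y[r:r + xbar_cols] += crossbar_mvm(W[r:r + xbar_cols, c:c + xbar_rows],
                                               x[c:c + xbar_rows], bits)
    return y

W = np.random.randn(256, 3 * 3 * 512)   # 256 kernels, 4608 inputs: needs both splittings
x = np.random.randn(3 * 3 * 512)
print(split_mvm(W, x)[:4])
```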
BCNN on RRAM: Line Buffer & Pipeline

• Sliding Window
– It is unnecessary to buffer the whole input feature map.
– The convolver circuit can wake (A) from sleep (S) once input data covering one Conv kernel window has arrived.
– A line-buffer structure is introduced to cache and fetch the intermediate feature maps (see the sketch below).
[Figure (f): C_in^(i) line buffers, each W entries wide, sit between the previous convolver circuit (IN) and the next convolver circuit (OUT); a MUX and the CONV_EN control signal select the h · w · C_in^(i) entries of the current window.]
14
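A behavioural sketch of the line-buffer idea above: only kh rows of width W are cached per channel, and the convolver wakes (CONV_EN high) once a full kernel window is available. The class name, the single-channel simplification, and the interface are assumptions for illustration, not the hardware design.

```python
from collections import deque
import numpy as np

class LineBuffer:
    """Caches only kh rows of width `width` instead of the whole input feature map."""
    def __init__(self, width, kh, kw):
        self.width, self.kh, self.kw = width, kh, kw
        self.rows = deque(maxlen=kh)   # kh line buffers; the oldest row is dropped automatically
        self.current = []

    def push(self, pixel):
        """Accept one pixel streamed in from the previous convolver circuit."""
        self.current.append(pixel)
        if len(self.current) == self.width:
            self.rows.append(self.current)   # one full row cached
            self.current = []

    def windows(self):
        """Yield every kh x kw window that is ready (CONV_EN high, convolver awake)."""
        if len(self.rows) < self.kh:
            return                           # not enough rows yet: convolver stays asleep
        block = np.array(self.rows)
        for j in range(self.width - self.kw + 1):
            yield block[:, j:j + self.kw]

# Stream an 8x8 single-channel feature map through the buffer, one row at a time.
lb = LineBuffer(width=8, kh=3, kw=3)
for row in np.arange(64).reshape(8, 8):
    for px in row:
        lb.push(px)
    print("windows ready after this row:", sum(1 for _ in lb.windows()))
```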
BCNN on RRAM: Line Buffer & Pipeline

• Basic Convolution: a 3*3 kernel slides over the feature map of layer i.
• Line Buffer Size << Feature Map Size
• Zero Padding: keeps the input and output feature map sizes the same.
• Awake when sliding along a row; sleep when meeting the end of one row (a scheduling sketch follows this slide).

[Figure: 3*3 kernel positions on the layer-i feature map for the four cases above]
15
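A small scheduling sketch for the awake/sleep behaviour with zero padding; the CONV_EN naming, the stride-1 assumption, and the cycle accounting are illustrative, not the paper's control logic.

```python
def conv_en_schedule(H, W, k=3, pad=1):
    """Yield (row, col, enable) for a k x k kernel over a zero-padded H x W feature map.

    enable is True (awake) while the window slides along a row and False (sleep)
    once the window would run past the end of the padded row.
    """
    padded_w = W + 2 * pad
    for i in range(H + 2 * pad - k + 1):          # output rows (stride 1)
        for j in range(padded_w):
            yield i, j, j <= padded_w - k         # sleep at the end of each row

# Zero padding keeps a 5x5 map 5x5 at the output; count awake vs. sleep positions.
sched = list(conv_en_schedule(5, 5))
awake = sum(1 for _, _, en in sched if en)
print("awake:", awake, "sleep:", len(sched) - awake)
```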
Experimental Results

• Experimental Setup:
– Small Case:
• LeNet on MNIST (C5*20-S2-C5*50-S2-FC100-FC10)
• Matrix splitting is unnecessary
• Effects of device variation on the multi-bit/binary CNN model mapped onto N-bit RRAM devices
– Large Case:
• AlexNet on ImageNet
• Matrix splitting is necessary (4-bit ADC for the partial-sum interface)
• Area and energy estimation of the multi-bit/binary CNN model mapped onto an N-bit RRAM platform
– Other Settings:
• Crossbar Size: 128x128

16
Experimental Results

• Effects of Device Variation under Different Bit-Levels


CNN Model
【Multi-bit Model】(Quantization Error)
  The well-trained floating-point model is dynamically quantized into M bits [Qiu, FPGA 2016].
【Binary Model】(Training Error)
  A model trained with the BinaryNet training workflow.

RRAM Model
• N-bit representation ability
• Assumption: symmetric conductance ranges; the k-th conductance range is (g(k) − Δg, g(k) + Δg)
• Device variation: (−Δg, +Δg)

(Mapping Error)
【Full Bit-Level Mode】 All 2^N conductance ranges are in use.
【Binary Mode】 Only two conductance ranges are picked from the 2^N ones.
(A code sketch of this variation model follows this slide.)
17
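The conductance model on this slide can be prototyped as below. The evenly spaced levels, the uniform noise in (−Δg, +Δg), and all function names are illustrative assumptions within the slide's (g(k) − Δg, g(k) + Δg) range model.

```python
import numpy as np

def conductance_levels(n_bits, g_min=1e-6, g_max=1e-4):
    """2^N target conductances g(k), assumed evenly spaced over the device range."""
    return np.linspace(g_min, g_max, 2 ** n_bits)

def map_with_variation(level_idx, n_bits, delta_g, binary_mode=False, rng=None):
    """Map quantized weight levels onto RRAM conductances with variation in (-dg, +dg).

    Full bit-level mode uses all 2^N conductance ranges; binary mode picks only the
    lowest and highest of the 2^N ranges.
    """
    rng = rng or np.random.default_rng()
    g = conductance_levels(n_bits)
    used = np.array([0, 2 ** n_bits - 1]) if binary_mode else np.arange(2 ** n_bits)
    target = g[used[level_idx]]                              # ideal conductance g(k)
    return target + rng.uniform(-delta_g, delta_g, size=target.shape)

# Example: 4-bit devices; binary weights use only 2 of the 16 conductance ranges.
rng = np.random.default_rng(0)
w_binary = rng.integers(0, 2, size=10)                       # 0 -> lowest level, 1 -> highest level
print(map_with_variation(w_binary, n_bits=4, delta_g=1e-6, binary_mode=True, rng=rng))
```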
Experimental Results

• Multi-bit/Binary CNN Error Rate of LeNet on MNIST:


Effects of Device Variation Under Different Bit-Levels

• No Variation: Precise Mapping from the Quantized Model to RRAM (Baseline)


– Full Bit-Level Mode: Dynamic Quantization Error Only
– Binary Mode: Training Error Only

• With Variation:
– Full Bit-Level Mode: Dynamic Quantization Error + Variation
– Binary Mode: Training Error + Variation
Binary Mode shows Better Robustness than Full Bit-Level Mode
18
Experimental Results

• Area and Energy Estimation of Different RRAM-based Crossbar PEs

• Area: FC Layer takes the largest part

• Energy: Conv Layer takes the largest part


• Input Interface: Mostly Saved
• Output Interface: Still takes a large portion

19
Conclusion

• In this paper, an RRAM crossbar-based accelerator is proposed for the BCNN forward (inference) process, addressing:
– the matrix splitting problem;
– the pipelined implementation.

• The robustness of BCNN on RRAM under device variation is demonstrated.
– Experimental results show that BCNN introduces negligible recognition accuracy loss for LeNet on MNIST.
– For AlexNet on ImageNet, the RRAM-based BCNN accelerator saves 58.2% of the energy consumption and 56.8% of the area compared with the multi-bit CNN structure.

20
Thanks for your Attention!

21
