
Binary Convolutional Neural Network on RRAM
Tianqi Tang, Lixue Xia, Boxun Li, Yu Wang, Huazhong Yang

Dept. of E.E., Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing, China
e-mail: [email protected]
Outline

• Background & Motivation

• RRAM-based Binary CNN Accelerator Design


– System Overview
– Convolver Circuit
– Line Buffer & Pipeline

• Experimental Results
– Comparison between “BCNN on RRAM” and “Multi-bit CNN on RRAM”
– Recognition Accuracy under Device Variation
– Area and Energy Cost Saving

• Conclusion

2
Convolutional Neural Network

• Popular over the Recent Ten Years, with Good Performance for a Wide Range of Applications

Generalized Recognition (CNN): Object Detection & Localization, Image Captioning

Specialized Recognition (CNN): Lane Detection & Vehicle Detection, Pedestrian Detection, Face Recognition

Beyond Vision Tasks (CNN + other): Speech Recognition (CNN + RNN), Natural Language Processing (CNN + LSTM), Chess & Go (CNN + Reinforcement Learning)
3
Convolutional Neural Network

• Popular over the Recent Ten Years, with Good Performance for a Wide Range of Applications
Winners of the Image-Net Large-Scale Visual Recognition Challenge (ILSVRC)
Task: Classification & Localization with Provided Data
Year         2011     2012         2013      2014     2015        2016
Team         XRCE     SuperVision  Clarifai  VGG      MSRA        Trimps-Soushen
Model        Not CNN  AlexNet      ZF        VGG-16   ResNet-152  Ensemble
Err (Top-5)  25.8%    16.4%        11.7%     7.4%     3.57%       2.99%

For the current popular CNN models (ResNet series):
• <5% Top-5 Error Rate
• 10-50M Weights per Single Model
• >10G Operations per Inference

A High-Energy-Efficiency Platform is Required for CNN Applications!

[Figure: accuracy vs. computation of recent CNN models, from arXiv:1605.07678]
4
Convolutional Neural Network

• Layer-wise Structure & Basic Operations


LeNet-5

CNN Operations: Fully-Connected, Neuron, Convolution, Pooling, Normalization
DNN Operations: Fully-Connected, Neuron
5
Convolutional Neural Network

• Layer-wise Structure & Basic Operations


LeNet-5

CNN on RRAM ?
Operations: Fully-Connected, Neuron, Convolution, Pooling, Normalization
DNN on RRAM ✓ (PRIME [ISCA 2016], ISAAC [ISCA 2016])
6
Convolutional Neural Network

• Layer-wise Structure & Basic Operations

Convolution vs. Fully-Connected
• Matrix-Vector Multiplication y = W·x → Convolver Circuit (see the code sketch below)
  • Weight Mapping
  • Interface Design
• Sliding Window → Data Buffering & Fetching

Pooling
• One-to-One Mapping

[Figure: the layer pipeline from Input Image → Conv Layer 1 → Pooling Layer 1 → … → Conv Layer N → Pooling Layer N → FC Layer (N+1) → … → FC Layer (N+M) → Recognition Result]
7
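To make the "Convolution → Matrix-Vector Multiplication" point concrete, here is a minimal NumPy sketch of how a Conv layer is flattened into y = W·x before being mapped onto a crossbar. The im2col helper, the shapes, and the layer sizes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def im2col(fmap, kh, kw):
    """Unroll every kh x kw sliding window of fmap (C, H, W) into one column."""
    C, H, W = fmap.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    cols = np.empty((C * kh * kw, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            cols[:, i * out_w + j] = fmap[:, i:i + kh, j:j + kw].ravel()
    return cols

# Each Conv kernel becomes one row of W (programmed into the crossbar);
# each sliding-window position becomes one input vector x.
C_in, C_out, kh, kw = 3, 8, 3, 3
kernels = np.random.randn(C_out, C_in, kh, kw)
fmap = np.random.randn(C_in, 6, 6)

W_mat = kernels.reshape(C_out, -1)   # (C_out, C_in*kh*kw): the crossbar weight matrix
x_cols = im2col(fmap, kh, kw)        # (C_in*kh*kw, number of window positions)
y = W_mat @ x_cols                   # the convolver circuit's matrix-vector multiplications
print(y.shape)                       # (8, 16): C_out outputs per window position
```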
Convolutional Neural Network

• Layer-wise Structure & Basic Operations

Convolution vs. Fully-Connected
• Matrix-Vector Multiplication → Convolver Circuit
• Sliding Window → Data Buffering & Fetching

Normalization vs. Neuron
• One-to-One Mapping → Peripheral Circuit
• Linear or Non-Linear Function

Pooling
• Sliding Window → Data Buffering & Fetching

[Figure: the layer pipeline from Input Image → Conv Layer 1 → Pooling Layer 1 → … → Conv Layer N → Pooling Layer N → FC Layer (N+1) → … → FC Layer (N+M) → Recognition Result]
8
BCNN on RRAM

• Two Main Concerns for “CNN on RRAM” Design


• Convolver Circuit Design

• Intermediate Result Buffering & Fetching

9
BCNN on RRAM

• Two Main Concerns for “CNN on RRAM” Design


• Convolver Circuit Design

• X-axis: GOPS, NOT GFLOPS!

• Quantization is always introduced:

y = W·x
→ Weight Quantization
→ Interface Quantization

• Intermediate Result Buffering & Fetching

10
BCNN on RRAM

• Two Main Concerns for “CNN on RRAM” Design


• Convolver Circuit Design
• Weight Mapping: Lower Bit-Level of Weights
• Lower Requirement on RRAM Representation Ability
• Interface Design: Lower Bit-Level of Neurons
• Lower Cost of ADC/DAC Interfaces
• Extreme Case: Binary Weights & Binary Interfaces
→ Binary CNN on RRAM

• Intermediate Result Buffering & Fetching


• Line Buffer Structure & Pipelining Strategy

11
BCNN on RRAM

• Binary CNN
• Training Workflow: Binarize while Training
• BinaryNet [arXiv:1602.02830]
• XNOR-Net [Rastegari ECCV 2016]

• Inference: How to Map the Well-Trained BCNN onto RRAM?


y_b = Binarize(W_b · x_b)  (see the code sketch below)
• The Case when “RRAM Crossbar Size > Weight Matrix Size”
→ Direct Mapping
• The Case when “RRAM Crossbar Size < Weight Matrix Size”
e.g. crossbar column length = 128, Conv kernel size of VGG19 = 3*3*512
→ Matrix Splitting

12
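A minimal sketch of the inference-time mapping y_b = Binarize(W_b · x_b) together with the direct-mapping vs. matrix-splitting decision. The ±1 sign binarization follows BinaryNet; the 128-row crossbar limit and all names are illustrative assumptions, not the paper's circuit.

```python
import numpy as np

CROSSBAR_ROWS = 128  # assumed RRAM crossbar column length

def binarize(x):
    """Sign binarization to {-1, +1}, as in BinaryNet-style models."""
    return np.where(x >= 0, 1, -1)

def bcnn_layer(W_b, x_b):
    """y_b = Binarize(W_b . x_b), splitting the matrix when it is taller than one crossbar."""
    n_in = W_b.shape[1]
    if n_in <= CROSSBAR_ROWS:
        return binarize(W_b @ x_b)            # direct mapping: one crossbar suffices
    partial = np.zeros(W_b.shape[0])          # matrix splitting: sum crossbar partial products
    for s in range(0, n_in, CROSSBAR_ROWS):
        partial += W_b[:, s:s + CROSSBAR_ROWS] @ x_b[s:s + CROSSBAR_ROWS]
    return binarize(partial)

# e.g. a VGG19-like kernel: 3*3*512 = 4608 inputs > 128 rows, so splitting is required
W_b = binarize(np.random.randn(64, 3 * 3 * 512))
x_b = binarize(np.random.randn(3 * 3 * 512))
print(bcnn_layer(W_b, x_b)[:8])
```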
BCNN on RRAM: Convolver Circuit

How to map a large matrix onto a group of RRAM crossbars?

• Column Splitting (# RRAM Columns < # Kernels)
• Row Splitting (RRAM Column Length < Kernel Size): the partial products of several crossbars are summed
• Signal Splitting: Weight = Pos - Neg

[Figure: a large weight matrix tiled over several crossbars; row splitting adds the partial sums]

How to decide the bit-level of the partial sum?
• Current Solution: a 4-bit interface, chosen by brute-force search to trade off accuracy and energy efficiency.
• Future Work: merge the partial-sum quantization into the training process.
(A code sketch of the splitting and the 4-bit partial-sum interface follows this slide.)
13
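The column/row/signal splitting and the 4-bit partial-sum interface can be sketched as below. The uniform quantizer, the tile sizes, and the Pos/Neg decomposition into two non-negative crossbars are illustrative assumptions based on the slide, not the exact circuit.

```python
import numpy as np

def quantize(x, bits=4):
    """Uniform quantization of a partial-sum vector to a bits-bit interface (illustrative)."""
    levels = 2 ** bits - 1
    lo, hi = x.min(), x.max()
    step = (hi - lo) / levels if hi > lo else 1.0
    return lo + np.round((x - lo) / step) * step

def crossbar_mvm(W_tile, x_tile, bits=4):
    """One crossbar pair: signal splitting keeps conductances non-negative (W = Pos - Neg)."""
    W_pos, W_neg = np.clip(W_tile, 0, None), np.clip(-W_tile, 0, None)
    partial = W_pos @ x_tile - W_neg @ x_tile
    return quantize(partial, bits)            # partial sum leaves through a 4-bit interface

def split_mvm(W, x, xbar_rows=128, xbar_cols=128, bits=4):
    """Column splitting over kernels, row splitting over inputs; quantized partial sums are added."""
    y = np.zeros(W.shape[0])
    for r in range(0, W.shape[0], xbar_cols):        # column splitting: kernels per crossbar
        for c in range(0, W.shape[1], xbar_rows):    # row splitting: inputs per crossbar column
            y[r:r + xbar_cols] += crossbar_mvm(W[r:r + xbar_cols, c:c + xbar_rows],
                                               x[c:c + xbar_rows], bits)
    return y

W = np.random.randn(256, 3 * 3 * 512)   # 256 kernels, 4608 inputs: needs both splittings
x = np.random.randn(3 * 3 * 512)
print(split_mvm(W, x)[:4])
```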
BCNN on RRAM: Line Buffer & Pipeline

• Sliding Window
– It is unnecessary to buffer the whole input feature map.
– The convolver circuit can wake (A) from sleep (S) once input data covering one Conv kernel window has arrived.
– A line-buffer structure is introduced to cache and fetch the intermediate feature maps (see the sketch below).
[Figure (f): C_in^(i) line buffers, each W entries wide, sit between the previous convolver circuit (IN) and the next convolver circuit (OUT); a MUX and the CONV_EN control signal select the h · w · C_in^(i) entries of the current window.]
14
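A behavioural sketch of the line-buffer idea above: only kh rows of width W are cached per channel, and the convolver wakes (CONV_EN high) once a full kernel window is available. The class name, the single-channel simplification, and the interface are assumptions for illustration, not the hardware design.

```python
from collections import deque
import numpy as np

class LineBuffer:
    """Caches only kh rows of width `width` instead of the whole input feature map."""
    def __init__(self, width, kh, kw):
        self.width, self.kh, self.kw = width, kh, kw
        self.rows = deque(maxlen=kh)   # kh line buffers; the oldest row is dropped automatically
        self.current = []

    def push(self, pixel):
        """Accept one pixel streamed in from the previous convolver circuit."""
        self.current.append(pixel)
        if len(self.current) == self.width:
            self.rows.append(self.current)   # one full row cached
            self.current = []

    def windows(self):
        """Yield every kh x kw window that is ready (CONV_EN high, convolver awake)."""
        if len(self.rows) < self.kh:
            return                           # not enough rows yet: convolver stays asleep
        block = np.array(self.rows)
        for j in range(self.width - self.kw + 1):
            yield block[:, j:j + self.kw]

# Stream an 8x8 single-channel feature map through the buffer, one row at a time.
lb = LineBuffer(width=8, kh=3, kw=3)
for row in np.arange(64).reshape(8, 8):
    for px in row:
        lb.push(px)
    print("windows ready after this row:", sum(1 for _ in lb.windows()))
```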
BCNN on RRAM: Line Buffer & Pipeline

• Basic Convolution: a 3*3 kernel slides over the feature map of layer i.
• Line Buffer Size << Feature Map Size
• Zero Padding: keeps the input and output feature map sizes the same.
• Awake when sliding along a row; sleep when meeting the end of one row (a scheduling sketch follows this slide).

[Figure: 3*3 kernel positions on the layer-i feature map for the four cases above]
15
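A small scheduling sketch for the awake/sleep behaviour with zero padding; the CONV_EN naming, the stride-1 assumption, and the cycle accounting are illustrative, not the paper's control logic.

```python
def conv_en_schedule(H, W, k=3, pad=1):
    """Yield (row, col, enable) for a k x k kernel over a zero-padded H x W feature map.

    enable is True (awake) while the window slides along a row and False (sleep)
    once the window would run past the end of the padded row.
    """
    padded_w = W + 2 * pad
    for i in range(H + 2 * pad - k + 1):          # output rows (stride 1)
        for j in range(padded_w):
            yield i, j, j <= padded_w - k         # sleep at the end of each row

# Zero padding keeps a 5x5 map 5x5 at the output; count awake vs. sleep positions.
sched = list(conv_en_schedule(5, 5))
awake = sum(1 for _, _, en in sched if en)
print("awake:", awake, "sleep:", len(sched) - awake)
```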
Experimental Results

• Experimental Setup:
– Small Case:
• LeNet on MNIST (C5*20-S2-C5*50-S2-FC100-FC10)
• Matrix splitting is unnecessary
• Effects of device variation on the multi-bit/binary CNN model mapped onto N-bit RRAM devices
– Large Case:
• AlexNet on ImageNet
• Matrix splitting is necessary (4-bit ADC for the partial-sum interface)
• Area and energy estimation of the multi-bit/binary CNN model mapped onto an N-bit RRAM platform
– Other Settings:
• Crossbar Size: 128x128

16
Experimental Results

• Effects of Device Variation under Different Bit-Levels


CNN Model
【Multi-bit Model】(Quantization Error)
  The well-trained floating-point model is dynamically quantized into M bits [Qiu, FPGA 2016].
【Binary Model】(Training Error)
  A model trained with the BinaryNet training workflow.

RRAM Model
• N-bit representation ability
• Assumption: symmetric conductance ranges; the k-th conductance range is (g(k) − Δg, g(k) + Δg)
• Device variation: (−Δg, +Δg)

(Mapping Error)
【Full Bit-Level Mode】 All 2^N conductance ranges are in use.
【Binary Mode】 Only two conductance ranges are picked from the 2^N ones.
(A code sketch of this variation model follows this slide.)
17
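The conductance model on this slide can be prototyped as below. The evenly spaced levels, the uniform noise in (−Δg, +Δg), and all function names are illustrative assumptions within the slide's (g(k) − Δg, g(k) + Δg) range model.

```python
import numpy as np

def conductance_levels(n_bits, g_min=1e-6, g_max=1e-4):
    """2^N target conductances g(k), assumed evenly spaced over the device range."""
    return np.linspace(g_min, g_max, 2 ** n_bits)

def map_with_variation(level_idx, n_bits, delta_g, binary_mode=False, rng=None):
    """Map quantized weight levels onto RRAM conductances with variation in (-dg, +dg).

    Full bit-level mode uses all 2^N conductance ranges; binary mode picks only the
    lowest and highest of the 2^N ranges.
    """
    rng = rng or np.random.default_rng()
    g = conductance_levels(n_bits)
    used = np.array([0, 2 ** n_bits - 1]) if binary_mode else np.arange(2 ** n_bits)
    target = g[used[level_idx]]                              # ideal conductance g(k)
    return target + rng.uniform(-delta_g, delta_g, size=target.shape)

# Example: 4-bit devices; binary weights use only 2 of the 16 conductance ranges.
rng = np.random.default_rng(0)
w_binary = rng.integers(0, 2, size=10)                       # 0 -> lowest level, 1 -> highest level
print(map_with_variation(w_binary, n_bits=4, delta_g=1e-6, binary_mode=True, rng=rng))
```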
Experimental Results

• Multi-bit/Binary CNN Error Rate of LeNet on MNIST:


Effects of Device Variation Under Different Bit-Levels

• No Variation: Precise Mapping from the Quantized Model to RRAM (Baseline)


– Full Bit-Level Mode: Dynamic Quantization Error Only
– Binary Mode: Training Error Only

• With Variation:
– Full Bit-Level Mode: Dynamic Quantization Error + Variation
– Binary Mode: Training Error + Variation
Binary Mode shows Better Robustness than Full Bit-Level Mode
18
Experimental Results

• Area and Energy Estimation of Different RRAM-based Crossbar PEs

• Area: FC Layer takes the largest part

• Energy: Conv Layer takes the largest part


• Input Interface: Mostly Saved
• Output Interface: Still takes a large portion

19
Conclusion

• In this paper, an RRAM crossbar-based accelerator is proposed for the BCNN forward (inference) process, addressing:
– the matrix splitting problem;
– the pipelined implementation.

• The robustness of BCNN on RRAM under device variation is demonstrated.
– Experimental results show that BCNN introduces negligible recognition accuracy loss for LeNet on MNIST.
– For AlexNet on ImageNet, the RRAM-based BCNN accelerator saves 58.2% of the energy consumption and 56.8% of the area compared with the multi-bit CNN structure.

20
Thanks for your Attention!

21
