
Manual of MNSIM Python: A Behavior-Level Modeling Tool for NVM-based CNN Accelerators
Zhenhua Zhu1,* , Hanbo Sun1 , Kaizhong Qiu1 , Lixue Xia2 , Gokul Krishnan6 , Dimin Niu2 ,
Qiuwen Lou3 , Xiaoming Chen4 , Yuan Xie2,5 , Yu Cao6 , X. Sharon Hu3 , Yu Wang1,* , and
Huazhong Yang1
1 Dept. of EE, BNRist, Tsinghua University
2 Alibaba Group
3 University of Notre Dame
4 Institute of Computing Technology, Chinese Academy of Sciences
5 University of California, Santa Barbara
6 Arizona State University
* [email protected], [email protected]

ABSTRACT
MNSIM Python is a behavior-level modeling tool for NVM-based CNN accelerators; version 1.0 is still a beta version. If you have any questions or suggestions about MNSIM Python, please contact us via e-mail. We hope that MNSIM Python can be helpful to your research work, and we sincerely invite every Processing-In-Memory researcher to add their ideas to MNSIM Python and enlarge its functionality.

CONTENTS
1 Introduction
2 Running MNSIM Python
2.1 Basic running method
2.2 Parser information
2.3 Hardware description and modification
2.4 CNN description and weights file
2.5 Case study: VGG8
3 Architecture design used in MNSIM Python
4 Entire modeling flow
5 Future work and update
References

1 INTRODUCTION
MNSIM Python is a behavior-level modeling tool for NVM-based CNN accelerators, developed in Python. Compared with the former version of MNSIM (available at https://github.com/Zhu-Zhenhua/MNSIM_V1.1), MNSIM Python models the CNN computing accuracy and hardware performance (i.e., area, power, energy, and latency) at the behavior level. As shown in Figure 1, this tool is developed for Non-Volatile Memory (NVM) based Processing-In-Memory (PIM) architecture designers and CNN algorithm researchers who want to quickly evaluate the CNN accuracy and hardware performance of their architecture or algorithm designs. It should be noted that this tool is mainly used to estimate and compare the relative advantages and disadvantages of different architecture/NN design solutions. To obtain more accurate simulation results, please use circuit-level simulators.
MNSIM Python is designed based on these papers:
[IEEE TCAD] Lixue Xia, Boxun Li, Tianqi Tang, Peng Gu, Pai-yu Chen, Shimeng Yu, Yu Cao, Yu Wang, Yuan Xie, Huazhong Yang, MNSIM: Simulation Platform for Memristor-based Neuromorphic Computing System, in IEEE TCAD, vol. 37, no. 5, 2018, pp. 1009-1022.
Figure 1. The overview of MNSIM Python: user inputs (algorithm model parameters and mapping method, architecture and circuit descriptions, user-defined modules, and device characteristics and non-ideal factors from SPICE simulation or chip test results) feed MNSIM, which outputs the algorithm calculation accuracy, the hardware performance and costs (area, power & energy consumption, latency), the resource utilization, and a security assessment.

[DAC’19] Zhenhua Zhu, Hanbo Sun, Yujun Lin, Guohao Dai, Lixue Xia, Song Han, Yu Wang,
Huazhong Yang, A Configurable Multi-Precision CNN Computing Framework Based on Single Bit
RRAM, in DAC, 2019.
[ASPDAC'20] Hanbo Sun, Zhenhua Zhu, Yi Cai, Xiaoming Chen, Yu Wang, Huazhong Yang, An Energy-Efficient Quantized and Regularized Training Framework for Processing-In-Memory Accelerators, to appear in ASP-DAC 2020, 2020.
[ASPDAC’17] Tianqi Tang, Lixue Xia, Boxun Li, Yu Wang, Huazhong Yang, Binary Convolutional
Neural Network on RRAM, in ASP-DAC 2017, 2017, pp.782-787.
Thanks for using MNSIM Python.

2 RUNNING MNSIM PYTHON


2.1 Basic running method
1st: Make sure the MNSIM Python location is added to the system environment variables:
e.g.: export PYTHONPATH=$PYTHONPATH:/Users/user1/MNSIM_Python/

2nd: Download the default weights files to the folder /MNSIM_Python/:
https://cloud.tsinghua.edu.cn/d/e566b3daaed44804b640/.

3rd: Go to the tool directory and run MNSIM Python:

e.g.: cd /Users/user1/MNSIM_Python/

python main.py

2.2 Parser information


The detailed parser information is listed in Table 1.

Table 1. Parser information

| Parser | Description | Default |
|---|---|---|
| -HWdes / --hardware_description | Hardware description file location and file name | /MNSIM_Python/SimConfig.ini |
| -Weights / --weights | NN weights file location and file name | /MNSIM_Python/vgg8_params.pth |
| -NN / --NN | NN model name | vgg8 |
| -DisHW / --disable_hardware_modeling | Disable hardware modeling | False |
| -DisAccu / --disable_accuracy_simulation | Disable accuracy simulation | False |
| -SAF / --enable_SAF | Enable MNSIM Python to simulate the effect of Stuck-At-Faults | False |
| -Var / --enable_variation | Enable MNSIM Python to simulate the effect of device variation | False |
| -FixRange / --enable_fixed_Qrange | Enable MNSIM Python to fix the ADC quantization range to (-|max|, |max|) | False |
| -DisPipe / --disable_inner_pipeline | Disable the inner-layer pipeline structure modeling in MNSIM Python | False |
| -D / --device | Determine the device (platform) running MNSIM Python (input the GPU id; None means CPU) | CPU |
| -DisModOut / --disable_module_output | Disable module simulation results output; only output the entire system simulation results | False |
| -DisLayOut / --disable_layer_output | Disable layer-wise simulation results output; only output the whole NN model simulation results | False |

Here we show two examples of how to use these parser options:

e.g.: Simulate the NN computing accuracy considering SAFs and device variations

python main.py -SAF -Var

e.g.: Simulate AlexNet with the weights file stored in /example/AlexNet.pth

python main.py -NN 'alexnet' -Weights '/example/AlexNet.pth'
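Two more combinations follow directly from the flags in Table 1 (these exact pairings are our own illustrations, not examples from the original manual):

e.g.: Skip the accuracy simulation and report hardware performance only

python main.py -DisAccu

e.g.: Skip the hardware modeling and run the accuracy simulation on GPU 0

python main.py -DisHW -D 0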

2.3 Hardware description and modification


In MNSIM Python, we propose a basic PIM architecture assumption, as shown in Figure 2. Users can describe their own PIM architecture designs with a few modifications (e.g., changing the crossbar size or PE number, or adding new hardware modules). More details of the architecture design are discussed in Section 3.

[SimConfig.ini] is the hardware configuration description file, which contains eight parts:

1. [Device level]: model the device characteristics (e.g., device area, read/write latency, etc.)
2. [Crossbar level]: model the crossbar configuration (e.g., crossbar size)
3. [Interface level]: describe the characteristics of Analog-to-Digital Converters (ADCs) and Digital-to-Analog Converters (DACs) (e.g., AD/DA area, resolution, power, etc.)
4. [Process element level]: model the PE configuration (Figure 2(3), e.g., the number of crossbars in one PE)
5. [Digital module level]: model the digital module configuration (e.g., registers, shifters, adders)
6. [Tile level]: model the tile configuration (Figure 2(2), e.g., the number of PEs in one tile)
7. [Architecture level]: describe the architecture-level configuration and buffer design (e.g., buffer type and size)
8. [Algorithm level]: configure the simulation settings (to be extended in later versions)
For more details about [SimConfig.ini], users can refer to the default configuration file (file path: /MNSIM_Python/SimConfig.ini).
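As a minimal illustration of how such an INI file can be inspected or tweaked programmatically before a run, the sketch below uses Python's standard configparser. The section name follows the eight-part list above, but the option key 'Xbar_Size' and its value format are assumptions; check the shipped SimConfig.ini for the actual keys.

import configparser

# A minimal sketch for reading and modifying SimConfig.ini programmatically.
config = configparser.ConfigParser()
config.read('SimConfig.ini')

# Inspect the crossbar configuration (section names follow the eight parts above);
# 'Xbar_Size' is an assumed key, shown here for illustration only.
print(config.get('Crossbar level', 'Xbar_Size', fallback='128,128'))

# Enlarge the crossbar size and write the file back before re-running main.py.
config.set('Crossbar level', 'Xbar_Size', '256,256')
with open('SimConfig.ini', 'w') as f:
    config.write(f)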

2.4 CNN description and weights file


In MNSIM Python, we provide four basic network models and weights files, i.e., LeNet (-NN 'lenet'), AlexNet (-NN 'alexnet'), VGG-8 (-NN 'vgg8'), and VGG-16 (-NN 'vgg16'). These basic network models are trained on Cifar-10. The basic weights files can be downloaded from:
https://cloud.tsinghua.edu.cn/d/e566b3daaed44804b640/.
If users want to test their own CNN models other than the default ones, two steps are needed:

1. Describe the user-designed network structure in get_net(hardware_config, cate) in /MNSIM_Python/MNSIM/Interface/network.py. Here, the input variable string 'alexnet' is the NN model name used in the input parser; the other required per-layer information is shown in Table 2. For example, the code describing AlexNet is shown below.
if cate.startswith('alexnet'):
    layer_config_list.append({'type': 'conv', 'in_channels': 3, 'out_channels': 64,
                              'kernel_size': 3, 'padding': 1, 'stride': 2})
    layer_config_list.append({'type': 'relu'})
    layer_config_list.append({'type': 'pooling', 'mode': 'MAX', 'kernel_size': 2, 'stride': 2})
    layer_config_list.append({'type': 'conv', 'in_channels': 64,
                              'out_channels': 192, 'kernel_size': 3, 'padding': 1})
    layer_config_list.append({'type': 'relu'})
    layer_config_list.append({'type': 'pooling', 'mode': 'MAX', 'kernel_size': 2, 'stride': 2})
    layer_config_list.append({'type': 'conv', 'in_channels': 192,
                              'out_channels': 384, 'kernel_size': 3, 'padding': 1})
    layer_config_list.append({'type': 'relu'})
    layer_config_list.append({'type': 'conv', 'in_channels': 384,
                              'out_channels': 256, 'kernel_size': 3, 'padding': 1})
    layer_config_list.append({'type': 'relu'})
    layer_config_list.append({'type': 'conv', 'in_channels': 256,
                              'out_channels': 256, 'kernel_size': 3, 'padding': 1})
    layer_config_list.append({'type': 'relu'})
    layer_config_list.append({'type': 'pooling', 'mode': 'MAX', 'kernel_size': 2, 'stride': 2})
    layer_config_list.append({'type': 'view'})
    layer_config_list.append({'type': 'fc', 'in_features': 1024, 'out_features': 4096})
    layer_config_list.append({'type': 'relu'})
    layer_config_list.append({'type': 'fc', 'in_features': 4096, 'out_features': 4096})
    layer_config_list.append({'type': 'relu'})
    layer_config_list.append({'type': 'fc', 'in_features': 4096, 'out_features': 10})

What is more, MNSIM Python also supports multi-precision CNNs (i.e., different layers can have different weight, input activation, and output activation precisions). Users need to add descriptions of the precision parameters of each layer after defining the CNN structure in /MNSIM_Python/MNSIM/Interface/network.py. For example, if we want to specify that the parameters of each layer are the same (the weight precision is 9-bit, the activation precision is 9-bit, and the fixed-point decimal point position is -2), the code is shown below:
for i in range(len(layer_config_list)):
    quantize_config_list.append({'weight_bit': 9, 'activation_bit': 9, 'point_shift': -2})
    input_index_list.append([-1])

2. Provide the weights file (*.pth) of the user-designed network. The weights file must be generated by PyTorch (with torch.save), as sketched after Table 2.

Table 2. Layer required information

| Layer Type | Variable | Description |
|---|---|---|
| conv (Convolutional layer + batch_norm operations) | in_channels | Input channel number |
| | out_channels | Output channel number |
| | kernel_size | The convolutional kernel size |
| | stride | The stride size of the sliding window |
| relu (Nonlinear activation layer) | – | Current version only supports ReLU |
| pooling (Pooling layer) | mode | Pooling function type: max pooling (MAX) or average (AVG) pooling |
| | kernel_size | Pooling window's size |
| | stride | The stride size of the sliding window |
| view (transition between conv and fc layer) | – | Change the 3D matrix to a 1D vector |
| fc (Fully-connected layer) | in_features | The length of the fc layer's input vector |
| | out_features | The length of the fc layer's output vector |
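For step 2, the weights file can be produced with a few lines of PyTorch. In the sketch below, MyCNN and the file name are placeholders for illustration; the layer shapes must match the structure described in network.py during step 1, and whether MNSIM Python loads a state_dict or a full module is determined by its loader in /MNSIM_Python/MNSIM/Interface/.

import torch
import torch.nn as nn

# Minimal sketch of exporting a weights file (*.pth) with torch.save.
# MyCNN and 'my_cnn_params.pth' are placeholders, not MNSIM Python defaults.
class MyCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.fc = nn.Linear(64 * 16 * 16, 10)

    def forward(self, x):                       # x: (batch, 3, 32, 32) for Cifar-10
        x = self.pool(self.relu(self.conv(x)))
        return self.fc(x.flatten(1))

model = MyCNN()
# ... train the model on Cifar-10 here ...
torch.save(model.state_dict(), 'my_cnn_params.pth')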

2.5 Case study: VGG8


In this section, we show a case study: the simulation and modeling of VGG8.
First, download the source code from GitHub. Then, add the MNSIM Python file path to the system environment variables and go to the tool directory. Next, run MNSIM Python; the Cifar-10 data set will be downloaded automatically at the beginning.
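Since the original screenshots are not reproduced here, the three steps above correspond to commands of the following form (the repository URL and the local path are illustrative assumptions; substitute your own clone location):

e.g.: git clone https://github.com/Zhu-Zhenhua/MNSIM_Python.git

export PYTHONPATH=$PYTHONPATH:/Users/user1/MNSIM_Python/

cd /Users/user1/MNSIM_Python/

python main.py -NN 'vgg8'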

The output information contains four parts:

1. The CNN model information;
2. Hardware utilization and performance;
3. Computing latency (the latency of each layer and of the entire model);
4. CNN classification accuracy (accuracy results based on GPUs and on the PIM system).

3 ARCHITECTURE DESIGN USED IN MNSIM PYTHON


In order to model the computing accuracy and hardware performance of PIM accelerators under different
architecture design parameters, we propose a basic architecture assumption for MNSIM Python, which is
shown in Figure 2. The architecture design refers to our DAC'19 paper: Zhenhua Zhu, Hanbo Sun, Yujun Lin, Guohao Dai, Lixue Xia, Song Han, Yu Wang, Huazhong Yang, A Configurable Multi-Precision CNN Computing Framework Based on Single Bit RRAM, in Design Automation Conference (DAC), 2019.

Figure 2. The overall architecture assumption used in MNSIM Python: (1) an RRAM-based CNN accelerator composed of RRAM banks; (2) a bank organized as a tile array connected by an NoC-style interconnect, with pooling modules and data buffers; (3) a Process Element (PE) containing crossbars (XBARs) with DACs, ADCs, input/output registers, shift-and-add units, and an adder tree; (4) a joint module that concatenates or adds upstream data and forwards the result downstream.
In this paper, we demonstrated that multi-precision CNN quantization can improve the classification accuracy while further reducing the storage burden and computing latency. To support the acceleration of multi-precision CNNs on PIM accelerators built from limited-precision devices (e.g., 1-bit RRAM), a data splitting scheme is proposed, as shown in Figure 3. In our architecture design, we use multiple crossbars to store multi-bit weights. For the input activations, due to the limited resolution of DACs, multiple cycles are needed to fetch the data.

Figure 3. Data splitting scheme: multi-bit weight kernels are split bit by bit across multiple crossbars, and multi-bit input activations are split into low-precision slices fed over multiple DAC cycles.

The architecture is mainly composed of several NVM banks. In each NVM bank, an array of NVM tiles is organized and connected in a way similar to a Network-on-Chip (NoC). To reduce the complexity of the control logic and data path, we specify that each tile only processes one layer of the CNN; for some large-scale layers, matrix splitting across multiple tiles is needed. Each NVM tile is adjacent to a data forwarding unit, which receives data from other tiles, merges the data (i.e., adds or concatenates them), and outputs the result to the local tile or to other tiles. According to the layer type, i.e., CONV layers, pooling layers, or FC layers, the NVM tile can be configured as a pooling module or an MVM module, which are realized by the pooling module and the crossbar Process Element (PE) array, respectively. The NVM PEs in one tile are linked in an H-Tree structure to reduce the intra-tile interconnection overhead. Each connection node of the H-Tree is a joint module, which manages the data forwarding and the summation of PE results. To overcome the limited NVM device precision and to support multi-precision algorithms, multiple low-precision NVM crossbars each represent and store a part of a high-precision weight value. For example, eight 1-bit NVM crossbars are required to store 8-bit CONV kernels. The computing results of the different crossbars are merged by the shift-and-add (adder tree) logic, as sketched below.
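The following minimal sketch illustrates the splitting of Figure 3 together with the shift-and-add merging. The function names, matrix sizes, and the unsigned fixed-point assumption are ours for illustration, not MNSIM Python internals.

import numpy as np

# Illustrative sketch: an 8-bit weight matrix is sliced into eight 1-bit crossbar
# planes, and an 8-bit input vector is fed as eight 1-bit DAC cycles.
def split_weights(weights, w_bits=8):
    """Slice unsigned fixed-point weights into per-bit crossbar planes (LSB first)."""
    return [(weights >> b) & 1 for b in range(w_bits)]

def split_inputs(inputs, in_bits=8):
    """Slice unsigned inputs into per-bit vectors, one per DAC cycle (LSB first)."""
    return [(inputs >> b) & 1 for b in range(in_bits)]

weights = np.random.randint(0, 256, size=(4, 4), dtype=np.uint8)   # 8-bit weights
inputs = np.random.randint(0, 256, size=4, dtype=np.uint8)         # 8-bit inputs

# Shift-and-add merge of all bit-plane partial products reproduces the full MVM.
result = sum((w_plane.astype(np.int64) @ x_slice.astype(np.int64)) << (wb + xb)
             for wb, w_plane in enumerate(split_weights(weights))
             for xb, x_slice in enumerate(split_inputs(inputs)))

assert np.array_equal(result, weights.astype(np.int64) @ inputs.astype(np.int64))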

4 ENTIRE MODELING FLOW


The entire modeling flow is shown in Figure 4. The input variables include the weights & input features (*.pth), the CNN structure (/Interface/network.py), and the architecture design (SimConfig.ini). The modeling process can be divided into two parts: accuracy simulation and hardware performance modeling.

Figure 4. Entire modeling flow: the accuracy simulation part (matrix split, non-ideal factors, matrix-vector multiplication, ADC quantization, error propagation) outputs the CNN classification accuracy, while the hardware modeling part (CNN mapping, latency/power/area modeling) outputs the latency, power, and area results, the resource utilization, and the CNN computing energy efficiency.

Figure 5 illustrates the detailed accuracy evaluation process of PIM-based CNN computing, which contains five steps (/MNSIM_Python/MNSIM/Interface/). Firstly, considering the crossbar size, NVM device precision, and DAC resolution, we split the weight matrices and feature data into sub-matrices and sub-vectors. Secondly, non-ideal factors are introduced to update the sub-matrix values. Here we only take Stuck-At-Faults (SAFs) and resistance variations into consideration; other non-ideal factors will be added in a later version. Thirdly, Matrix-Vector Multiplications (MVMs) are performed between the updated sub-matrices and sub-vectors. Fourthly, the MVM results are quantized according to the ADC resolution. In MNSIM Python, we provide two quantization modes:
1. Normal fixed quantization range: determine the quantization range according to the crossbar size $M \times N$, the device precision $p_{NVM}$, and the DAC resolution $p_{DAC}$:

$$[0,\; 2^{\,p_{DAC} + \log_2 M + p_{NVM}} - 1]$$

For example, a $128 \times 128$ crossbar ($M = 128$) with 1-bit devices and 1-bit DACs gives the range $[0, 2^{1+7+1} - 1] = [0, 511]$. A sketch of this mode follows the list below.

2. Dynamic quantization range: determine the quantization range according to the data distribution, obtained through NN training on the training dataset. For this mode, please refer to our ASPDAC'20 paper for more information: Hanbo Sun, Zhenhua Zhu, Yi Cai, Xiaoming Chen, Yu Wang, Huazhong Yang, An Energy-Efficient Quantized and Regularized Training Framework for Processing-In-Memory Accelerators, to appear in the 25th Asia and South Pacific Design Automation Conference (ASP-DAC 2020), 2020.
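A sketch of the fixed-range mode (mode 1) is shown below. Mapping the MVM result onto $2^{p_{ADC}}$ uniform levels across the fixed range is our illustrative assumption about how the quantization module digitizes the value, not MNSIM Python's exact implementation.

import numpy as np

# Illustrative sketch of mode 1 (fixed ADC quantization range).
def adc_quantize_fixed(mvm_result, p_dac=1, p_nvm=1, m_rows=128, p_adc=8):
    q_max = 2 ** (p_dac + int(np.log2(m_rows)) + p_nvm) - 1  # fixed range upper bound
    levels = 2 ** p_adc - 1
    code = np.round(np.clip(mvm_result, 0, q_max) / q_max * levels)  # digitize
    return code / levels * q_max  # de-quantized value used by later layers

# With M = 128, 1-bit devices, and 1-bit DACs, the fixed range is [0, 511].
print(adc_quantize_fixed(np.array([0.0, 100.0, 600.0]), p_adc=4))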
Finally, the quantized MVM results are merged into the CONV results and propagated to the later layers to obtain the final classification accuracy.
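Returning to step 2 of the flow above, a minimal sketch of the non-ideal factor injection is given below. The fault rates and the lognormal variation model are placeholder assumptions, not MNSIM Python's calibrated parameters.

import numpy as np

# Illustrative sketch of step 2: injecting Stuck-At-Faults (SAFs) and device
# variation into a normalized conductance sub-matrix.
def apply_non_ideal(g, saf0_rate=0.01, saf1_rate=0.01, var_sigma=0.1,
                    g_min=0.0, g_max=1.0, seed=0):
    rng = np.random.default_rng(seed)
    g = g.copy()
    # SAF: some cells are stuck at the lowest (SA0) or highest (SA1) conductance.
    g[rng.random(g.shape) < saf0_rate] = g_min
    g[rng.random(g.shape) < saf1_rate] = g_max
    # Device variation: multiplicative lognormal perturbation on every cell.
    g = g * rng.lognormal(mean=0.0, sigma=var_sigma, size=g.shape)
    return np.clip(g, g_min, g_max)

sub_matrix = np.random.default_rng(1).random((128, 128))  # normalized conductances
noisy_sub_matrix = apply_non_ideal(sub_matrix)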
The hardware modeling part is based on our previous work: Lixue Xia, Boxun Li, Tianqi Tang, Peng Gu, Pai-yu Chen, Shimeng Yu, Yu Cao, Yu Wang, Yuan Xie, Huazhong Yang, MNSIM: Simulation Platform for Memristor-based Neuromorphic Computing System, in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 37, no. 5, 2018, pp. 1009-1022. Firstly, according to the CNN structure and the architecture design, the hardware resource usage and the tile-level data dependency description are generated (/MNSIM_Python/MNSIM/Mapping_Model/). Secondly, in terms of the mapping results, the power and area are modeled from the bottom level (e.g., device) to the top level (e.g., tile) (/MNSIM_Python/MNSIM/Hardware_Model/). Please note that the area and power results are based on behavior-level modeling analysis; the parameters in the modules come from circuit-level simulation results, published results, and other simulators (i.e., CACTI [3, 4] and NVSim [2]). Thirdly, the computing latency is estimated with or without considering the inner-layer pipeline (/MNSIM_Python/MNSIM/Latency_Model/). The inner-layer pipeline structure is discussed in our paper: Tianqi Tang, Lixue Xia, Boxun Li, Yu Wang, Huazhong Yang, Binary Convolutional Neural Network on RRAM, in Proceedings of the 22nd Asia and South Pacific Design Automation Conference (ASP-DAC), 2017, pp. 782-787. Finally, the latency results and power results are used to calculate the computing energy efficiency, as illustrated below.
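The energy-efficiency calculation amounts to simple arithmetic on the modeled results: energy = power × latency, and efficiency = operations / energy. All numbers in the sketch below are placeholders, not MNSIM Python outputs.

# Back-of-the-envelope energy-efficiency computation from modeled results.
total_ops = 2 * 48_216_064      # 2 x (multiply-accumulate count) of some CNN
latency_s = 3.2e-3              # modeled end-to-end inference latency, seconds
power_w = 0.85                  # modeled average power, watts

energy_j = power_w * latency_s
gops_per_w = total_ops / energy_j / 1e9
print(f'Energy efficiency: {gops_per_w:.2f} GOPS/W')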

Figure 5. Accuracy evaluation of PIM-based CNN inference: the weight matrix split module and the feature map split module (driven by the crossbar size, NVM device precision, and DAC resolution) produce sub-matrices and sub input vectors; the matrix update module injects non-ideal factors (SAF and variation); the matrix-vector multiplication and quantization modules (driven by the ADC resolution) produce per-layer CONV results, which propagate layer by layer to the final classification accuracy.

5 FUTURE WORK AND UPDATE


There are still many incomplete features and imperfections in the current version of MNSIM Python, and we will continue to update and improve it.

Figure 6. The planned complete version of MNSIM Python: an NAS module and a scheduler module generate NN models and instruction sequences; an NN training module for NVM, an NN mapping module, an NVM computing error analyzer, and a hardware simulator exchange IR and architecture descriptions to output the test accuracy, the NN computing error, and the hardware performance simulation results (latency, power, area).

The planned complete version of MNSIM Python is shown in Figure 6. Compared with the current version, we will add a Neural Architecture Search (NAS) module for PIM systems to generate "suitable" CNN structures for PIM, and an NN training module in PIM to model NVM-based on-line training architectures [1].
Here is our update plan:
Recent updates:
1. Complement the missing digital module simulation data;
2. Update the buffer modeling and add more buffer design options (e.g., NVM-based buffer designs);
3. Design an automatic extraction module for network structure parameters;
4. Optimize the modeling accuracy;

5. Support more kinds of non-ideal factors;
6. Add the NN training module for NVM-based PIM systems.

Long-term planning:


1. Add PIM-based on-line training module;
2. Add NAS module for PIM;

3. Design the interface between MNSIM Python and other circuit-level simulators.

REFERENCES
[1] Cheng, M. et al. (2017). TIME: A training-in-memory architecture for memristor-based deep neural networks. In DAC, 2017. ACM.
[2] Dong, X. et al. (2012). NVSim: A circuit-level performance, energy, and area model for emerging nonvolatile memory. IEEE TCAD.
[3] Muralimanohar, N., Balasubramonian, R., and Jouppi, N. P. (2009). CACTI 6.0: A tool to model large caches. HP Laboratories, 27:28.
[4] Wilton, S. J. E. and Jouppi, N. P. (1996). CACTI: An enhanced cache access and cycle time model. JSSC, 31(5):677-688.
