MNSIM Manual
ABSTRACT
MNSIM Python is a behavior-level modeling tool for NVM-based CNN accelerators; version 1.0 is still a beta release. If you have any questions or suggestions about MNSIM Python, please contact us via e-mail. We hope that MNSIM Python is helpful to your research work, and we sincerely invite every Processing-In-Memory researcher to contribute ideas to MNSIM Python and extend its functionality.
CONTENTS
1 Introduction
2 Running MNSIM Python
  2.1 Basic running method
  2.2 Parser information
  2.3 Hardware description and modification
  2.4 CNN description and weights file
  2.5 Case study: VGG8
3 Architecture design used in MNSIM Python
4 Entire modeling flow
5 Future work and update
References
1 INTRODUCTION
MNSIM Python is a behavior-level modeling tool for NVM-based CNN accelerators developed in Python. Compared with the former version MNSIM (available at https://github.com/Zhu-Zhenhua/MNSIM_V1.1), MNSIM Python models the CNN computing accuracy and the hardware performance (i.e., area, power, energy, and latency) at the behavior level. As shown in Figure 1, this tool is developed for Non-Volatile Memory (NVM) based Processing-In-Memory (PIM) architecture designers and CNN algorithm researchers who want to quickly evaluate the CNN accuracy and hardware performance of their architecture or algorithm design. It should be noted that this tool is mainly intended to estimate and compare the relative advantages and disadvantages of different architecture/NN design solutions. For more accurate simulation results, please use circuit-level simulators.
MNSIM Python is designed based on these papers:
[IEEE TCAD] Lixue Xia, Boxun Li, Tianqi Tang, Peng Gu, Pai-Yu Chen, Shimeng Yu, Yu Cao, Yu Wang, Yuan Xie, Huazhong Yang, MNSIM: Simulation Platform for Memristor-based Neuromorphic Computing System, in IEEE TCAD, vol. 37, no. 5, pp. 1009-1022, 2018.
[Figure 1. Overview of MNSIM Python. User inputs from different researchers: algorithm model (operator, algorithm parameters, connections), architecture (mapping method, buffer, user-defined module, interfaces), circuits (technology, crossbar size), and device (device type, device characteristics, non-ideal factors calibrated against SPICE simulation results or chip test results). Output results: algorithm calculation accuracy; hardware performance and costs (area, power and energy consumption, latency, etc.); resource utilization; security assessment.]
[DAC'19] Zhenhua Zhu, Hanbo Sun, Yujun Lin, Guohao Dai, Lixue Xia, Song Han, Yu Wang, Huazhong Yang, A Configurable Multi-Precision CNN Computing Framework Based on Single Bit RRAM, in DAC, 2019.
[ASPDAC'20] Hanbo Sun, Zhenhua Zhu, Yi Cai, Xiaoming Chen, Yu Wang, Huazhong Yang, An Energy-Efficient Quantized and Regularized Training Framework for Processing-In-Memory Accelerators, to appear in ASP-DAC 2020, 2020.
[ASPDAC'17] Tianqi Tang, Lixue Xia, Boxun Li, Yu Wang, Huazhong Yang, Binary Convolutional Neural Network on RRAM, in ASP-DAC 2017, 2017, pp. 782-787.
Thanks for using MNSIM Python.
2 RUNNING MNSIM PYTHON
2.1 Basic running method
2nd: Download the default weights files into the directory /MNSIM Python/: https://cloud.tsinghua.edu.cn/d/e566b3daaed44804b640/.
3rd: Go to the tool directory and run MNSIM Python:
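A minimal sketch of this step is shown below; the entry script name (main.py) and the absence of required flags are assumptions here, so please check the repository README for the actual command-line interface:

cd MNSIM_Python        # hypothetical checkout directory
python main.py         # entry script name is an assumption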
2.2 Parser information
[Table 1. Parser information]

2.3 Hardware description and modification
The hardware architecture is described in the configuration file [SimConfig.ini], which is organized into the following eight levels:
1. [Device level]: model the device characteristics (e.g., device area, read/write latency, etc.)
2. [Crossbar level]: model the crossbar configuration (e.g., crossbar size)
3. [Interface level]: describe characteristics of Analog-to-Digital Converters (ADCs) and Digital-to-
Analog Converters (DACs) (e.g., ADDA area, resolution, power, etc.)
4. [Process element level]: model the PE configuration (Figure 2(3); e.g., the number of crossbars in one PE)
5. [Digital module level]: model the digital module configuration (e.g., registers, shifters, adders)
6. [Tile level]: model the tile configuration (Figure 2(2); e.g., the number of PEs in one tile)
7. [Architecture level]: describe the architecture level configuration and buffer design (e.g., buffer
type and size)
8. [Algorithm level]: configure the simulation settings (to be updated in a later version)
For more details about [SimConfig.ini], users can refer to the default configuration file (file path: /MNSIM Python/SimConfig.ini).
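Because [SimConfig.ini] follows the standard INI format, it can also be inspected or modified programmatically. The sketch below uses Python's configparser; the section and key names shown in the comments are assumptions and should be checked against the shipped default file:

import configparser

# Load the default hardware description file
config = configparser.ConfigParser()
config.read('SimConfig.ini')

# Print all configuration sections (device, crossbar, interface levels, etc.)
print(config.sections())

# Hypothetical example: override a crossbar-size key and save a variant
# config['Crossbar level']['Xbar_Size'] = '256,256'
# with open('SimConfig_variant.ini', 'w') as f:
#     config.write(f)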
2.4 CNN description and weights file
1. Define the CNN structure in /MNSIM Python/MNSIM/Interface/network.py. For example, the AlexNet description is shown below:
if cate.startswith('alexnet'):
    layer_config_list.append({'type': 'conv', 'in_channels': 3, 'out_channels': 64, 'kernel_size': 3, 'padding': 1, 'stride': 2})
    layer_config_list.append({'type': 'relu'})
    layer_config_list.append({'type': 'pooling', 'mode': 'MAX', 'kernel_size': 2, 'stride': 2})
    layer_config_list.append({'type': 'conv', 'in_channels': 64, 'out_channels': 192, 'kernel_size': 3, 'padding': 1})
    layer_config_list.append({'type': 'relu'})
    layer_config_list.append({'type': 'pooling', 'mode': 'MAX', 'kernel_size': 2, 'stride': 2})
    layer_config_list.append({'type': 'conv', 'in_channels': 192, 'out_channels': 384, 'kernel_size': 3, 'padding': 1})
    layer_config_list.append({'type': 'relu'})
    layer_config_list.append({'type': 'conv', 'in_channels': 384, 'out_channels': 256, 'kernel_size': 3, 'padding': 1})
    layer_config_list.append({'type': 'relu'})
    layer_config_list.append({'type': 'conv', 'in_channels': 256, 'out_channels': 256, 'kernel_size': 3, 'padding': 1})
    layer_config_list.append({'type': 'relu'})
    layer_config_list.append({'type': 'pooling', 'mode': 'MAX', 'kernel_size': 2, 'stride': 2})
    layer_config_list.append({'type': 'view'})
    layer_config_list.append({'type': 'fc', 'in_features': 1024, 'out_features': 4096})
    layer_config_list.append({'type': 'relu'})
    layer_config_list.append({'type': 'fc', 'in_features': 4096, 'out_features': 4096})
    layer_config_list.append({'type': 'relu'})
    layer_config_list.append({'type': 'fc', 'in_features': 4096, 'out_features': 10})
MNSIM Python supports layer-wise mixed-precision CNNs (i.e., different layers can have different weight, input activation, and output activation precisions). Users need to add descriptions of the precision parameters of each layer after defining the CNN structure in /MNSIM Python/MNSIM/Interface/network.py. For example, if we want to specify the same parameters for every layer (9-bit weight precision, 9-bit activation precision, and a fixed-point decimal point position of -2), the code is shown below:
for i in range(len(layer_config_list)):
    quantize_config_list.append({'weight_bit': 9, 'activation_bit': 9, 'point_shift': -2})
    input_index_list.append([-1])
2. Provide the weights file (*.pth) of the user-designed network. The weights file is required to be generated by PyTorch (with torch.save).
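For example, a compatible weights file could be produced as follows (a minimal sketch: the net variable and the output file name are placeholders, and whether MNSIM Python expects a plain state_dict or a fuller checkpoint should be confirmed against the Interface code):

import torch

# net is the user-defined torch.nn.Module matching layer_config_list
torch.save(net.state_dict(), 'alexnet_params.pth')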
Then, add the MNSIM Python file path into the system environment variables and go to the directory:
Next, run MNSIM Python; the dataset will be downloaded at the beginning.
4. CNN classification accuracy (accuracy results based on GPUs and PIM systems):
3 ARCHITECTURE DESIGN USED IN MNSIM PYTHON
[Figure 2. Architecture design used in MNSIM Python. (1) RRAM-based CNN accelerator: a CPU and DRAM connected to RRAM banks; each bank contains an array of tiles connected by an interconnect and a bank-level output buffer. (2) Tile: in/out ports in four directions (N/S/E/W) with merge and bypass logic, a PE-level input activation buffer, a PE array, a pooling module, a data forwarding unit, a data buffer, and a tile-level output buffer. (3) PE: input registers (iReg), DACs, crossbars (XBAR), ADCs, shift-and-add units, and output registers (oReg). (4) Joint module: a data distributor routes downstream data to the left/right subtrees, and upstream data (Data-1, Data-2) are merged by concatenation or addition and output to the next level.]
The architecture design used in MNSIM Python (Figure 2) is based on our DAC'19 paper: Zhenhua Zhu, Hanbo Sun, Yujun Lin, Guohao Dai, Lixue Xia, Song Han, Yu Wang, Huazhong Yang, A Configurable Multi-Precision CNN Computing Framework Based on Single Bit RRAM, in Design Automation Conference (DAC), 2019.
In this paper, we demonstrated that multi-precision CNN quantization can improve the classification accuracy while further reducing the storage burden and computing latency. To support the acceleration of multi-precision CNNs in PIM accelerators based on limited-precision devices (e.g., 1-bit RRAM), a data splitting scheme is proposed, as shown in Figure 3. In our architecture design, we use multiple crossbars to store multi-bit weights. For the input activations, due to the limited resolution of DACs, multiple cycles are needed to feed in these data.
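As a minimal sketch of this splitting scheme (illustrative only, not MNSIM's actual implementation), an 8-bit weight matrix can be decomposed into eight 1-bit planes for eight crossbars, and an 8-bit input can be fed to a 1-bit DAC over eight cycles:

import numpy as np

def split_bits(x, n_bits):
    # Decompose non-negative integers into binary planes, LSB first
    return [(x >> b) & 1 for b in range(n_bits)]

# 8-bit weights -> eight 1-bit planes, one per crossbar
weights = np.random.randint(0, 256, size=(128, 128))
weight_planes = split_bits(weights, 8)

# 8-bit inputs with a 1-bit DAC -> eight input cycles
inputs = np.random.randint(0, 256, size=(128,))
input_cycles = split_bits(inputs, 8)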
[Figure 3. Data splitting scheme: each high-precision weight kernel is split into multiple low-precision sub-kernels that are stored in separate crossbars.]
The architecture is mainly composed of several NVM banks. In each NVM bank, an array of NVM tiles is organized and connected in a way similar to a Network-on-Chip (NoC). To reduce the complexity of the control logic and data path, we specify that each tile processes only one layer of the CNN, while for some large-scale layers, matrix splitting across multiple tiles is needed. Each NVM tile is adjacent to a data forwarding unit, which receives data from other tiles, merges them (i.e., by addition or concatenation), and outputs the result to the local tile or other tiles. According to the layer type, i.e., CONV layer, pooling layer, or FC layer, the NVM tile can be configured as a pooling module or an MVM module, which are realized by the pooling module and the array of crossbar processing elements (PEs). The NVM PEs in one tile are linked in an H-Tree structure to reduce the intra-tile interconnection overhead. Each connection node
of the H-Tree is a joint module, which manages the data forwarding and the summation of PE results. To overcome the limited NVM device precision and to support multi-precision algorithms, multiple low-precision NVM crossbars each represent and store a part of the high-precision weight values. For example, eight 1-bit NVM crossbars are required to store 8-bit CONV kernels. The computing results of the different crossbars are merged together by shifters and an adder tree.
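A self-contained sketch of this merging step (illustrative only): each bit-sliced partial product is weighted by its bit significance before accumulation, which is what the shifters and adder tree implement in hardware:

import numpy as np

# Bit-slice an 8-bit weight matrix and an 8-bit input vector (LSB first)
weights = np.random.randint(0, 256, size=(128, 128), dtype=np.int64)
inputs = np.random.randint(0, 256, size=(128,), dtype=np.int64)
weight_planes = [(weights >> b) & 1 for b in range(8)]
input_cycles = [(inputs >> c) & 1 for c in range(8)]

# Shift-and-add merge: plane b combined with input cycle c has weight 2^(b+c)
result = np.zeros(128, dtype=np.int64)
for b, w_plane in enumerate(weight_planes):
    for c, in_bit in enumerate(input_cycles):
        result += (w_plane.T @ in_bit) << (b + c)

# The merged result equals the full-precision matrix-vector product
assert np.array_equal(result, weights.T @ inputs)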
4 ENTIRE MODELING FLOW
Figure 5 illustrates the detailed accuracy evaluation process of PIM-based CNN computing, which contains five steps (/MNSIM Python/MNSIM/Interface/). Firstly, considering the crossbar size, NVM device precision, and DAC resolution, we split the weight matrix and feature data into sub-matrices and sub-vectors. Secondly, non-ideal factors are introduced to update the sub-matrix values. Here we only take Stuck-At-Faults (SAFs) and resistance variations into consideration; other non-ideal factors will be added in a later version. Thirdly, Matrix-Vector Multiplications (MVMs) are performed between the updated sub-matrices and sub-vectors. Fourthly, the MVM results are quantized according to the ADC resolution. In MNSIM Python, we provide two quantization modes:
1. Normal fixed quantization range: determine the quantization range according to the crossbar size M × N, the device precision (p_NVM), and the DAC resolution (p_DAC); a plausible form of this range is sketched after this list.
2. Dynamic quantization range: determine the quantization range according to the data distribution obtained through NN training on the training dataset. For this mode, please refer to our ASPDAC'20 paper for more information: Hanbo Sun, Zhenhua Zhu, Yi Cai, Xiaoming Chen, Yu Wang, Huazhong Yang, An Energy-Efficient Quantized and Regularized Training Framework for Processing-In-Memory Accelerators, to appear in the 25th Asia and South Pacific Design Automation Conference (ASP-DAC 2020), 2020.
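For the fixed range in mode 1, a plausible reconstruction (an assumption, not necessarily the exact formula used in MNSIM Python) bounds the largest possible analog sum of one crossbar column:

# Hedged sketch: with N rows per column, p_NVM-bit cells, and p_DAC-bit
# DACs, one column's accumulated output lies in [0, range_max], and the
# ADC quantization range can be fixed to this interval.
def fixed_quant_range(n_rows, p_nvm, p_dac):
    return n_rows * (2 ** p_nvm - 1) * (2 ** p_dac - 1)

print(fixed_quant_range(256, 1, 1))  # e.g., 256 x 256 crossbar, 1-bit cells and DACs -> 256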
Finally, the quantized MVM results are merged into the CONV results and propagated to the subsequent layers to obtain the final classification accuracy.
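As an illustration of the second step (a minimal sketch, assuming SA0 faults pin a cell to its lowest conductance, SA1 faults pin it to its highest, and variation is multiplicative Gaussian noise; the actual models and the default fault rates here are placeholders and may differ from MNSIM Python's):

import numpy as np

def inject_nonideal(W, p_sa0=0.1, p_sa1=0.01, sigma=0.05, w_max=1.0):
    # W: conductance-domain weight sub-matrix with values in [0, w_max]
    rng = np.random.default_rng()
    W = W * rng.normal(1.0, sigma, W.shape)   # cell-to-cell variation
    r = rng.random(W.shape)
    W = np.where(r < p_sa0, 0.0, W)           # stuck-at-0 cells
    W = np.where(r > 1.0 - p_sa1, w_max, W)   # stuck-at-1 cells
    return np.clip(W, 0.0, w_max)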
The hardware modeling part is based on our previous work: Lixue Xia, Boxun Li, Tianqi Tang, Peng Gu, Pai-Yu Chen, Shimeng Yu, Yu Cao, Yu Wang, Yuan Xie, Huazhong Yang, MNSIM: Simulation Platform for Memristor-based Neuromorphic Computing System, in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 37, no. 5, pp. 1009-1022, 2018. Firstly, according to the CNN structure and the architecture design, the hardware resource usage and the tile-level data dependency description are generated (/MNSIM Python/MNSIM/Mapping Model/). Secondly, in terms of the mapping results, the power and area are modeled from the bottom level (e.g., device) to the top level (e.g., tile) (/MNSIM Python/MNSIM/Hardware Model/). Please note that the area and power results are based on behavior-level modeling analysis. The parameters in the modules come from circuit-level simulation results, existing paper results, and other simulators (i.e., CACTI [4, 3]
and NVSim [2]). Thirdly, the computing latency is estimated with or without considering the inner-layer pipeline (/MNSIM Python/MNSIM/Latency Model/). The inner-layer pipeline structure is discussed in our paper: Tianqi Tang, Lixue Xia, Boxun Li, Yu Wang, Huazhong Yang, Binary Convolutional Neural Network on RRAM, in Proceedings of the 22nd Asia and South Pacific Design Automation Conference (ASP-DAC), 2017, pp. 782-787. Finally, the latency results and power results are used to calculate the computing energy efficiency.
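The last step reduces to a one-line computation (a trivial sketch, assuming energy = power × latency and efficiency expressed in operations per joule):

# Energy efficiency from the modeled power and latency results
def energy_efficiency(total_ops, power_w, latency_s):
    return total_ops / (power_w * latency_s)  # operations per joule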
[Figure 5. Accuracy evaluation flow: according to the crossbar size and DAC resolution, the split module partitions the weight matrix W of layer i and the input feature data into weight sub-matrices W_NN and sub-input vectors 1..N; non-ideal factors update each sub-matrix to W'_NN; matrix-vector multiplication is performed; the quantization module quantizes the results according to the ADC resolution; the quantized results are merged into the CONV results and propagated through layers 1 to N to obtain the classification accuracy.]
[Figure 6. Planned complete version of MNSIM Python, adding: a NAS module and a scheduler module (generating NN models and instruction sequences); an NN training module for NVM, which refines the initial NN model using the test accuracy and the NN computing error; an NN mapping module, which produces the NN mapping strategy and an IR description of the NN training procedure in NVM; an NVM computing error analyzer; and a hardware simulator that takes the hardware architecture descriptions and outputs hardware performance simulation results (latency, power, area).]
5 FUTURE WORK AND UPDATE
The planned complete version of MNSIM Python is shown in Figure 6. Compared with the current version, we will add a Network Architecture Search (NAS) module for PIM systems to generate "suitable" CNN structures for PIM, and an NN training module in PIM to model NVM-based on-line training architectures [1].
Here is our update plan:
Recent updates:
1. Complement the missing digital module simulation data;
2. Update the buffer modeling and add more buffer design options (e.g., NVM-based buffer design);
3. Design the automatic extraction module for network structure parameters;
4. Optimize the modeling accuracy;
5. Support more kinds of non-ideal factors;
6. Add the NN training module for NVM-based PIM system.
7. Design the interface between MNSIM Python and other circuit-level simulators.
REFERENCES
[1] Cheng, M. et al. (2017). TIME: A training-in-memory architecture for memristor-based deep neural networks. In DAC, 2017. ACM.
[2] Dong, X. et al. (2012). NVSim: A circuit-level performance, energy, and area model for emerging nonvolatile memory. IEEE TCAD, 31(7), 994-1007.