
Manual of MNSIM Python: A Behavior-Level Modeling Tool for NVM-based CNN Accelerators
Zhenhua Zhu1,* , Hanbo Sun1 , Kaizhong Qiu1 , Lixue Xia2 , Gokul Krishnan6 , Dimin Niu2 ,
Qiuwen Lou3 , Xiaoming Chen4 , Yuan Xie2,5 , Yu Cao6 , X. Sharon Hu3 , Yu Wang1,* , and
Huazhong Yang1
1 Dept. of EE, BNRist, Tsinghua University
2 Alibaba Group
3 University of Notre Dame
4 Institute of Computing Technology, Chinese Academy of Sciences
5 University of California, Santa Barbara
6 Arizona State University
* [email protected], [email protected]

ABSTRACT
MNSIM Python is a behavior-level modeling tool for NVM-based CNN accelerators; version 1.0 is still a beta version. If you have any questions or suggestions about MNSIM Python, please contact us via e-mail. We hope that MNSIM Python can be helpful to your research work, and we sincerely invite every Processing-In-Memory researcher to add their ideas to MNSIM Python and enlarge its functionality.

CONTENTS
1 Introduction
2 Running MNSIM Python
2.1 Basic running method
2.2 Parser information
2.3 Hardware description and modification
2.4 CNN description and weights file
2.5 Case study: VGG8
3 Architecture design used in MNSIM Python
4 Entire modeling flow
5 Future work and update
References

1 INTRODUCTION
MNSIM Python is a behavior-level modeling tool for NVM-based CNN accelerators, developed in Python. Compared with the former version of MNSIM (available at https://github.com/Zhu-Zhenhua/MNSIM_V1.1), MNSIM Python models the CNN computing accuracy and hardware performance (i.e., area, power, energy, and latency) at the behavior level. As shown in Figure 1, this tool is developed for Non-Volatile Memory (NVM) based Processing-In-Memory (PIM) architecture designers and CNN algorithm researchers who want to quickly evaluate the CNN accuracy and hardware performance of their architecture or algorithm designs. It should be noted that this tool is mainly used to estimate and compare the relative advantages and disadvantages of different architecture/NN design solutions. To obtain more accurate simulation results, please use circuit-level simulators.
MNSIM Python is designed based on these papers:
[IEEE TCAD] Lixue Xia, Boxun Li, Tianqi Tang, Peng Gu, Pai-yu Chen, Shimeng Yu, Yu Cao, Yu Wang, Yuan Xie, Huazhong Yang, MNSIM: Simulation Platform for Memristor-based Neuromorphic Computing System, in IEEE TCAD, vol. 37, no. 5, 2018, pp. 1009-1022.
Figure 1. The overview of MNSIM Python: user inputs (algorithm model parameters and mapping method, architecture and circuit descriptions, user-defined modules, and device characteristics and non-ideal factors from SPICE simulation or chip test results) feed MNSIM, which outputs the algorithm calculation accuracy, the hardware performance and costs (area, power & energy consumption, latency), the resource utilization, and a security assessment.

[DAC’19] Zhenhua Zhu, Hanbo Sun, Yujun Lin, Guohao Dai, Lixue Xia, Song Han, Yu Wang,
Huazhong Yang, A Configurable Multi-Precision CNN Computing Framework Based on Single Bit
RRAM, in DAC, 2019.
[ASPDAC'20] Hanbo Sun, Zhenhua Zhu, Yi Cai, Xiaoming Chen, Yu Wang, Huazhong Yang, An Energy-Efficient Quantized and Regularized Training Framework for Processing-In-Memory Accelerators, to appear in ASP-DAC 2020, 2020.
[ASPDAC’17] Tianqi Tang, Lixue Xia, Boxun Li, Yu Wang, Huazhong Yang, Binary Convolutional
Neural Network on RRAM, in ASP-DAC 2017, 2017, pp.782-787.
Thanks for using MNSIM Python.

2 RUNNING MNSIM PYTHON


2.1 Basic running method
1st: Make sure the MNSIM Python location is added to the system environment variables:
e.g.: export PYTHONPATH=$PYTHONPATH:/Users/user1/MNSIM_Python/

2nd: Download the default weights files to the folder /MNSIM_Python/:
https://cloud.tsinghua.edu.cn/d/e566b3daaed44804b640/.

3rd: Go to the tool directory and run MNSIM Python:

e.g.: cd /Users/user1/MNSIM_Python/

python main.py

2.2 Parser information


The detailed parser information is listed in Table 1.

Table 1. Parser information

| Parser | Description | Default |
|---|---|---|
| -HWdes / --hardware_description | Hardware description file location and file name | /MNSIM_Python/SimConfig.ini |
| -Weights / --weights | NN weights file location and file name | /MNSIM_Python/vgg8_params.pth |
| -NN / --NN | NN model name | vgg8 |
| -DisHW / --disable_hardware_modeling | Disable hardware modeling | False |
| -DisAccu / --disable_accuracy_simulation | Disable accuracy simulation | False |
| -SAF / --enable_SAF | Enable MNSIM Python to simulate the effect of Stuck-At-Faults | False |
| -Var / --enable_variation | Enable MNSIM Python to simulate the effect of device variation | False |
| -FixRange / --enable_fixed_Qrange | Enable MNSIM Python to fix the ADC quantization range to (-|max|, |max|) | False |
| -DisPipe / --disable_inner_pipeline | Disable the inner-layer pipeline structure modeling in MNSIM Python | False |
| -D / --device | Determine the device (platform) running MNSIM Python (input the GPU id; None means CPU) | CPU |
| -DisModOut / --disable_module_output | Disable module simulation results output; only output the entire system simulation results | False |
| -DisLayOut / --disable_layer_output | Disable layer-wise simulation results output; only output the whole NN model simulation results | False |

Here we show two examples of how to use these parser options:

e.g.: Simulate the NN computing accuracy considering SAFs and device variations

python main.py -SAF -Var

e.g.: Simulate AlexNet with the weights file stored in /example/AlexNet.pth

python main.py -NN 'alexnet' -Weights '/example/AlexNet.pth'
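Two more combinations follow directly from the flags in Table 1 (these exact pairings are our own illustrations, not examples from the original manual):

e.g.: Skip the accuracy simulation and report hardware performance only

python main.py -DisAccu

e.g.: Skip the hardware modeling and run the accuracy simulation on GPU 0

python main.py -DisHW -D 0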

2.3 Hardware description and modification


In MNSIM Python, we propose a basic PIM architecture assumption, as shown in Figure 2. Users can describe their own PIM architecture designs with a few modifications (e.g., changing the crossbar size or PE number, or adding new hardware modules). More details of the architecture design are discussed in Section 3.

[SimConfig.ini] is the hardware configuration description file, which contains eight parts:

1. [Device level]: model the device characteristics (e.g., device area, read/write latency, etc.)
2. [Crossbar level]: model the crossbar configuration (e.g., crossbar size)
3. [Interface level]: describe the characteristics of Analog-to-Digital Converters (ADCs) and Digital-to-Analog Converters (DACs) (e.g., AD/DA area, resolution, power, etc.)
4. [Process element level]: model the PE configuration (Figure 2(3), e.g., the number of crossbars in one PE)
5. [Digital module level]: model the digital module configuration (e.g., registers, shifters, adders)
6. [Tile level]: model the tile configuration (Figure 2(2), e.g., the number of PEs in one tile)
7. [Architecture level]: describe the architecture-level configuration and buffer design (e.g., buffer type and size)
8. [Algorithm level]: configure the simulation settings (to be extended in later versions)
For more details about [SimConfig.ini], users can refer to the default configuration file (file path: /MNSIM_Python/SimConfig.ini).
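As a minimal illustration of how such an INI file can be inspected or tweaked programmatically before a run, the sketch below uses Python's standard configparser. The section name follows the eight-part list above, but the option key 'Xbar_Size' and its value format are assumptions; check the shipped SimConfig.ini for the actual keys.

import configparser

# A minimal sketch for reading and modifying SimConfig.ini programmatically.
config = configparser.ConfigParser()
config.read('SimConfig.ini')

# Inspect the crossbar configuration (section names follow the eight parts above);
# 'Xbar_Size' is an assumed key, shown here for illustration only.
print(config.get('Crossbar level', 'Xbar_Size', fallback='128,128'))

# Enlarge the crossbar size and write the file back before re-running main.py.
config.set('Crossbar level', 'Xbar_Size', '256,256')
with open('SimConfig.ini', 'w') as f:
    config.write(f)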

2.4 CNN description and weights file


In MNSIM Python, we provide four basic network models and weights files, i.e., LeNet (-NN 'lenet'), AlexNet (-NN 'alexnet'), VGG-8 (-NN 'vgg8'), and VGG-16 (-NN 'vgg16'). These basic network models are trained on Cifar-10. The basic weights files can be downloaded from:
https://cloud.tsinghua.edu.cn/d/e566b3daaed44804b640/.
If users want to test their own CNN models other than the default ones, two steps are needed:

1. Describe the user-designed network structure in get_net(hardware_config, cate) in /MNSIM_Python/MNSIM/Interface/network.py. Here, the input variable string 'alexnet' is the NN model name used in the input parser; the other required per-layer information is shown in Table 2. For example, the code describing AlexNet is shown below.
if cate.startswith('alexnet'):
    layer_config_list.append({'type': 'conv', 'in_channels': 3, 'out_channels': 64,
                              'kernel_size': 3, 'padding': 1, 'stride': 2})
    layer_config_list.append({'type': 'relu'})
    layer_config_list.append({'type': 'pooling', 'mode': 'MAX', 'kernel_size': 2, 'stride': 2})
    layer_config_list.append({'type': 'conv', 'in_channels': 64,
                              'out_channels': 192, 'kernel_size': 3, 'padding': 1})
    layer_config_list.append({'type': 'relu'})
    layer_config_list.append({'type': 'pooling', 'mode': 'MAX', 'kernel_size': 2, 'stride': 2})
    layer_config_list.append({'type': 'conv', 'in_channels': 192,
                              'out_channels': 384, 'kernel_size': 3, 'padding': 1})
    layer_config_list.append({'type': 'relu'})
    layer_config_list.append({'type': 'conv', 'in_channels': 384,
                              'out_channels': 256, 'kernel_size': 3, 'padding': 1})
    layer_config_list.append({'type': 'relu'})
    layer_config_list.append({'type': 'conv', 'in_channels': 256,
                              'out_channels': 256, 'kernel_size': 3, 'padding': 1})
    layer_config_list.append({'type': 'relu'})
    layer_config_list.append({'type': 'pooling', 'mode': 'MAX', 'kernel_size': 2, 'stride': 2})
    layer_config_list.append({'type': 'view'})
    layer_config_list.append({'type': 'fc', 'in_features': 1024, 'out_features': 4096})
    layer_config_list.append({'type': 'relu'})
    layer_config_list.append({'type': 'fc', 'in_features': 4096, 'out_features': 4096})
    layer_config_list.append({'type': 'relu'})
    layer_config_list.append({'type': 'fc', 'in_features': 4096, 'out_features': 10})

What is more, MNSIM Python also supports multi-precision CNNs (i.e., different layers can have different weight, input activation, and output activation precisions). Users need to add descriptions of the precision parameters of each layer after defining the CNN structure in /MNSIM_Python/MNSIM/Interface/network.py. For example, if we want to specify that the parameters of each layer are the same (the weight precision is 9-bit, the activation precision is 9-bit, and the fixed-point decimal point position is -2), the code is shown below:
for i in range(len(layer_config_list)):
    quantize_config_list.append({'weight_bit': 9, 'activation_bit': 9, 'point_shift': -2})
    input_index_list.append([-1])

2. Provide the weights file (*.pth) of the user-designed network. The weights file must be generated by PyTorch (with torch.save), as sketched after Table 2.

Table 2. Layer required information

| Layer Type | Variable | Description |
|---|---|---|
| conv (Convolutional layer + batch_norm operations) | in_channels | Input channel number |
| | out_channels | Output channel number |
| | kernel_size | The convolutional kernel size |
| | stride | The stride size of the sliding window |
| relu (Nonlinear activation layer) | – | Current version only supports ReLU |
| pooling (Pooling layer) | mode | Pooling function type: max pooling (MAX) or average (AVG) pooling |
| | kernel_size | Pooling window's size |
| | stride | The stride size of the sliding window |
| view (transition between conv and fc layer) | – | Change the 3D matrix to a 1D vector |
| fc (Fully-connected layer) | in_features | The length of the fc layer's input vector |
| | out_features | The length of the fc layer's output vector |
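For step 2, the weights file can be produced with a few lines of PyTorch. In the sketch below, MyCNN and the file name are placeholders for illustration; the layer shapes must match the structure described in network.py during step 1, and whether MNSIM Python loads a state_dict or a full module is determined by its loader in /MNSIM_Python/MNSIM/Interface/.

import torch
import torch.nn as nn

# Minimal sketch of exporting a weights file (*.pth) with torch.save.
# MyCNN and 'my_cnn_params.pth' are placeholders, not MNSIM Python defaults.
class MyCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.fc = nn.Linear(64 * 16 * 16, 10)

    def forward(self, x):                       # x: (batch, 3, 32, 32) for Cifar-10
        x = self.pool(self.relu(self.conv(x)))
        return self.fc(x.flatten(1))

model = MyCNN()
# ... train the model on Cifar-10 here ...
torch.save(model.state_dict(), 'my_cnn_params.pth')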

2.5 Case study: VGG8


In this section, we show a case study: the simulation and modeling of VGG8.
First, download the source code from GitHub. Then, add the MNSIM Python file path to the system environment variables and go to the tool directory. Next, run MNSIM Python; the Cifar-10 data set will be downloaded automatically at the beginning.
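Since the original screenshots are not reproduced here, the three steps above correspond to commands of the following form (the repository URL and the local path are illustrative assumptions; substitute your own clone location):

e.g.: git clone https://github.com/Zhu-Zhenhua/MNSIM_Python.git

export PYTHONPATH=$PYTHONPATH:/Users/user1/MNSIM_Python/

cd /Users/user1/MNSIM_Python/

python main.py -NN 'vgg8'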

The output information contains four parts:

1. The CNN model information;
2. Hardware utilization and performance;
3. Computing latency (the latency of each layer and of the entire model);
4. CNN classification accuracy (accuracy results based on GPUs and on the PIM system).

3 ARCHITECTURE DESIGN USED IN MNSIM PYTHON


In order to model the computing accuracy and hardware performance of PIM accelerators under different
architecture design parameters, we propose a basic architecture assumption for MNSIM Python, which is
shown in Figure 2. The architecture design refers to our DAC'19 paper: Zhenhua Zhu, Hanbo Sun, Yujun Lin, Guohao Dai, Lixue Xia, Song Han, Yu Wang, Huazhong Yang, A Configurable Multi-Precision CNN Computing Framework Based on Single Bit RRAM, in Design Automation Conference (DAC), 2019.

Figure 2. The overall architecture assumption used in MNSIM Python: (1) an RRAM-based CNN accelerator composed of RRAM banks; (2) a bank organized as a tile array connected by an NoC-style interconnect, with pooling modules and data buffers; (3) a Process Element (PE) containing crossbars (XBARs) with DACs, ADCs, input/output registers, shift-and-add units, and an adder tree; (4) a joint module that concatenates or adds upstream data and forwards the result downstream.
In this paper, we demonstrated that multi-precision CNN quantization can improve the classification accuracy while further reducing the storage burden and computing latency. To support the acceleration of multi-precision CNNs on PIM accelerators built from limited-precision devices (e.g., 1-bit RRAM), a data splitting scheme is proposed, as shown in Figure 3. In our architecture design, we use multiple crossbars to store multi-bit weights. For the input activations, due to the limited resolution of DACs, multiple cycles are needed to fetch the data.

Figure 3. Data splitting scheme: multi-bit weight kernels are split bit by bit across multiple crossbars, and multi-bit input activations are split into low-precision slices fed over multiple DAC cycles.

The architecture is mainly composed of several NVM banks. In each NVM bank, an array of NVM tiles is organized and connected in a way similar to a Network-on-Chip (NoC). To reduce the complexity of the control logic and data path, we specify that each tile only processes one layer of the CNN; for some large-scale layers, matrix splitting across multiple tiles is needed. Each NVM tile is adjacent to a data forwarding unit, which receives data from other tiles, merges the data (i.e., adds or concatenates them), and outputs the result to the local tile or to other tiles. According to the layer type, i.e., CONV layers, pooling layers, or FC layers, the NVM tile can be configured as a pooling module or an MVM module, which are realized by the pooling module and the crossbar Process Element (PE) array, respectively. The NVM PEs in one tile are linked in an H-Tree structure to reduce the intra-tile interconnection overhead. Each connection node of the H-Tree is a joint module, which manages the data forwarding and the summation of PE results. To overcome the limited NVM device precision and to support multi-precision algorithms, multiple low-precision NVM crossbars each represent and store a part of a high-precision weight value. For example, eight 1-bit NVM crossbars are required to store 8-bit CONV kernels. The computing results of the different crossbars are merged by the shift-and-add (adder tree) logic, as sketched below.
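The following minimal sketch illustrates the splitting of Figure 3 together with the shift-and-add merging. The function names, matrix sizes, and the unsigned fixed-point assumption are ours for illustration, not MNSIM Python internals.

import numpy as np

# Illustrative sketch: an 8-bit weight matrix is sliced into eight 1-bit crossbar
# planes, and an 8-bit input vector is fed as eight 1-bit DAC cycles.
def split_weights(weights, w_bits=8):
    """Slice unsigned fixed-point weights into per-bit crossbar planes (LSB first)."""
    return [(weights >> b) & 1 for b in range(w_bits)]

def split_inputs(inputs, in_bits=8):
    """Slice unsigned inputs into per-bit vectors, one per DAC cycle (LSB first)."""
    return [(inputs >> b) & 1 for b in range(in_bits)]

weights = np.random.randint(0, 256, size=(4, 4), dtype=np.uint8)   # 8-bit weights
inputs = np.random.randint(0, 256, size=4, dtype=np.uint8)         # 8-bit inputs

# Shift-and-add merge of all bit-plane partial products reproduces the full MVM.
result = sum((w_plane.astype(np.int64) @ x_slice.astype(np.int64)) << (wb + xb)
             for wb, w_plane in enumerate(split_weights(weights))
             for xb, x_slice in enumerate(split_inputs(inputs)))

assert np.array_equal(result, weights.astype(np.int64) @ inputs.astype(np.int64))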

4 ENTIRE MODELING FLOW


The entire modeling flow is shown in Figure 4. The input variables include the weights & input features (*.pth), the CNN structure (/Interface/network.py), and the architecture design (SimConfig.ini). The modeling process can be divided into two parts: accuracy simulation and hardware performance modeling.

Figure 4. Entire modeling flow: the accuracy simulation part (matrix split, non-ideal factors, matrix-vector multiplication, ADC quantization, error propagation) outputs the CNN classification accuracy, while the hardware modeling part (CNN mapping, latency/power/area modeling) outputs the latency, power, and area results, the resource utilization, and the CNN computing energy efficiency.

Figure 5 illustrates the detailed accuracy evaluation process of PIM-based CNN computing, which contains five steps (/MNSIM_Python/MNSIM/Interface/). Firstly, considering the crossbar size, NVM device precision, and DAC resolution, we split the weight matrices and feature data into sub-matrices and sub-vectors. Secondly, non-ideal factors are introduced to update the sub-matrix values. Here we only take Stuck-At-Faults (SAFs) and resistance variations into consideration; other non-ideal factors will be added in a later version. Thirdly, Matrix-Vector Multiplications (MVMs) are performed between the updated sub-matrices and sub-vectors. Fourthly, the MVM results are quantized according to the ADC resolution. In MNSIM Python, we provide two quantization modes:
1. Normal fixed quantization range: determine the quantization range according to the crossbar size $M \times N$, the device precision $p_{NVM}$, and the DAC resolution $p_{DAC}$:

$$[0,\; 2^{\,p_{DAC} + \log_2 M + p_{NVM}} - 1]$$

For example, a $128 \times 128$ crossbar ($M = 128$) with 1-bit devices and 1-bit DACs gives the range $[0, 2^{1+7+1} - 1] = [0, 511]$. A sketch of this mode follows the list below.

2. Dynamic quantization range: determine the quantization range according to the data distribution, obtained through NN training on the training dataset. For this mode, please refer to our ASPDAC'20 paper for more information: Hanbo Sun, Zhenhua Zhu, Yi Cai, Xiaoming Chen, Yu Wang, Huazhong Yang, An Energy-Efficient Quantized and Regularized Training Framework for Processing-In-Memory Accelerators, to appear in the 25th Asia and South Pacific Design Automation Conference (ASP-DAC 2020), 2020.
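A sketch of the fixed-range mode (mode 1) is shown below. Mapping the MVM result onto $2^{p_{ADC}}$ uniform levels across the fixed range is our illustrative assumption about how the quantization module digitizes the value, not MNSIM Python's exact implementation.

import numpy as np

# Illustrative sketch of mode 1 (fixed ADC quantization range).
def adc_quantize_fixed(mvm_result, p_dac=1, p_nvm=1, m_rows=128, p_adc=8):
    q_max = 2 ** (p_dac + int(np.log2(m_rows)) + p_nvm) - 1  # fixed range upper bound
    levels = 2 ** p_adc - 1
    code = np.round(np.clip(mvm_result, 0, q_max) / q_max * levels)  # digitize
    return code / levels * q_max  # de-quantized value used by later layers

# With M = 128, 1-bit devices, and 1-bit DACs, the fixed range is [0, 511].
print(adc_quantize_fixed(np.array([0.0, 100.0, 600.0]), p_adc=4))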
Finally, the quantized MVM results are merged into the CONV results and propagated to the later layers to obtain the final classification accuracy.
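Returning to step 2 of the flow above, a minimal sketch of the non-ideal factor injection is given below. The fault rates and the lognormal variation model are placeholder assumptions, not MNSIM Python's calibrated parameters.

import numpy as np

# Illustrative sketch of step 2: injecting Stuck-At-Faults (SAFs) and device
# variation into a normalized conductance sub-matrix.
def apply_non_ideal(g, saf0_rate=0.01, saf1_rate=0.01, var_sigma=0.1,
                    g_min=0.0, g_max=1.0, seed=0):
    rng = np.random.default_rng(seed)
    g = g.copy()
    # SAF: some cells are stuck at the lowest (SA0) or highest (SA1) conductance.
    g[rng.random(g.shape) < saf0_rate] = g_min
    g[rng.random(g.shape) < saf1_rate] = g_max
    # Device variation: multiplicative lognormal perturbation on every cell.
    g = g * rng.lognormal(mean=0.0, sigma=var_sigma, size=g.shape)
    return np.clip(g, g_min, g_max)

sub_matrix = np.random.default_rng(1).random((128, 128))  # normalized conductances
noisy_sub_matrix = apply_non_ideal(sub_matrix)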
The hardware modeling part is based on our previous work: Lixue Xia, Boxun Li, Tianqi Tang, Peng Gu, Pai-yu Chen, Shimeng Yu, Yu Cao, Yu Wang, Yuan Xie, Huazhong Yang, MNSIM: Simulation Platform for Memristor-based Neuromorphic Computing System, in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 37, no. 5, 2018, pp. 1009-1022. Firstly, according to the CNN structure and the architecture design, the hardware resource usage and the tile-level data dependency description are generated (/MNSIM_Python/MNSIM/Mapping_Model/). Secondly, in terms of the mapping results, the power and area are modeled from the bottom level (e.g., device) to the top level (e.g., tile) (/MNSIM_Python/MNSIM/Hardware_Model/). Please note that the area and power results are based on behavior-level modeling analysis; the parameters in the modules come from circuit-level simulation results, published results, and other simulators (i.e., CACTI [3, 4] and NVSim [2]). Thirdly, the computing latency is estimated with or without considering the inner-layer pipeline (/MNSIM_Python/MNSIM/Latency_Model/). The inner-layer pipeline structure is discussed in our paper: Tianqi Tang, Lixue Xia, Boxun Li, Yu Wang, Huazhong Yang, Binary Convolutional Neural Network on RRAM, in Proceedings of the 22nd Asia and South Pacific Design Automation Conference (ASP-DAC), 2017, pp. 782-787. Finally, the latency results and power results are used to calculate the computing energy efficiency, as illustrated below.
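The energy-efficiency calculation amounts to simple arithmetic on the modeled results: energy = power × latency, and efficiency = operations / energy. All numbers in the sketch below are placeholders, not MNSIM Python outputs.

# Back-of-the-envelope energy-efficiency computation from modeled results.
total_ops = 2 * 48_216_064      # 2 x (multiply-accumulate count) of some CNN
latency_s = 3.2e-3              # modeled end-to-end inference latency, seconds
power_w = 0.85                  # modeled average power, watts

energy_j = power_w * latency_s
gops_per_w = total_ops / energy_j / 1e9
print(f'Energy efficiency: {gops_per_w:.2f} GOPS/W')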

Figure 5. Accuracy evaluation of PIM-based CNN inference: the weight matrix split module and the feature map split module (driven by the crossbar size, NVM device precision, and DAC resolution) produce sub-matrices and sub input vectors; the matrix update module injects non-ideal factors (SAF and variation); the matrix-vector multiplication and quantization modules (driven by the ADC resolution) produce per-layer CONV results, which propagate layer by layer to the final classification accuracy.

5 FUTURE WORK AND UPDATE


There are still many incomplete features and imperfections in the current version of MNSIM Python, and we will continue to update and improve it.

Figure 6. The planned complete version of MNSIM Python: an NAS module and a scheduler module generate NN models and instruction sequences; an NN training module for NVM, an NN mapping module, an NVM computing error analyzer, and a hardware simulator exchange IR and architecture descriptions to output the test accuracy, the NN computing error, and the hardware performance simulation results (latency, power, area).

The planned complete version of MNSIM Python is shown in Figure 6. Compared with the current version, we will add a Neural Architecture Search (NAS) module for PIM systems to generate "suitable" CNN structures for PIM, and an NN training module in PIM to model NVM-based on-line training architectures [1].
Here is our update plan:
Recent updates:
1. Complement the missing digital module simulation data;
2. Update the buffer modeling and add more buffer design options (e.g., NVM-based buffer designs);
3. Design an automatic extraction module for network structure parameters;
4. Optimize the modeling accuracy;

5. Support more kinds of non-ideal factors;
6. Add the NN training module for NVM-based PIM systems.

Long-term planning:


1. Add PIM-based on-line training module;
2. Add NAS module for PIM;

3. Design the interface between MNSIM Python and other circuit-level simulators.

REFERENCES
[1] Cheng, M. et al. (2017). TIME: A training-in-memory architecture for memristor-based deep neural networks. In DAC, 2017. ACM.
[2] Dong, X. et al. (2012). NVSim: A circuit-level performance, energy, and area model for emerging nonvolatile memory. IEEE TCAD.
[3] Muralimanohar, N., Balasubramonian, R., and Jouppi, N. P. (2009). CACTI 6.0: A tool to model large caches. HP Laboratories, 27:28.
[4] Wilton, S. J. E. and Jouppi, N. P. (1996). CACTI: An enhanced cache access and cycle time model. JSSC, 31(5):677-688.
