Uploaded by f20212831

TileNET

Working Principle & Architecture Focused Area:
A novel, generic, and scalable tiled architecture for ternary-weighted CNNs.
Systolic and streaming implementations: in the streaming architecture, all layers operate in parallel; in the systolic architecture, only one layer is computed in its entirety at a time.
On both smaller and larger datasets, ternary-weighted networks come closer to full-precision accuracy than binary-weighted networks.
Training is done in full precision; inference uses 8-bit inputs and ternary weights (-1, 0, 1).
A deeply pipelined custom architecture developed exclusively for ternarised CNNs (very high throughput).
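The full-precision-training / ternary-inference flow above can be sketched with a simple threshold-based ternarization. The threshold value and scheme here are illustrative assumptions, not necessarily TileNET's exact quantizer:

```python
import numpy as np

def ternarize(W, t=0.05):
    # Map full-precision weights to {-1, 0, +1} by thresholding.
    # t is an assumed threshold; real schemes often derive it from |W|.
    Wt = np.zeros_like(W, dtype=np.int8)
    Wt[W > t] = 1
    Wt[W < -t] = -1
    return Wt

print(ternarize(np.array([0.2, -0.01, -0.3])))  # [ 1  0 -1]
```

After ternarization, each weight needs only a couple of bits of storage instead of 32, which is what enables the dense tiling of MAC units on the FPGA fabric.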

Motivation:
Quantization: lowering the precision of the MAC operations reduces the resources required for a single MAC operation.
This reduction in resource requirements allows a higher number of MAC operators to be accommodated and used in parallel.
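The resource saving is easy to see at the level of a single MAC: with a ternary weight, the multiply collapses into an add, a subtract, or a skip, so no hardware multiplier is needed at all. A minimal sketch (my own illustration, not code from the paper):

```python
def ternary_mac(acc, x, w):
    # With w in {-1, 0, +1}, multiply-accumulate reduces to add/sub/skip,
    # eliminating the multiplier entirely.
    if w == 1:
        return acc + x
    if w == -1:
        return acc - x
    return acc

print(ternary_mac(0, 5, 1))   # 5
print(ternary_mac(5, 3, -1))  # 2
print(ternary_mac(5, 3, 0))   # 5
```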
Results:
Up to 16 tera-operations per second (TOPs) for LeNet, AlexNet, ResNet-50 and VGG-16.

Limitations:
Accuracy?

Platform:
Xilinx Virtex, Artix, Kintex and Zynq devices.


Note: for each paper, write the working principle, architecture focused area, motivation (why?), advancements (performance impact), limitations of the research, and platforms used to simulate.

FPGA-Based Hardware Acceleration Using PYNQ-Z2

FPGAs are efficient for hardware acceleration.

Working Principle & Architecture Focused Area:

Motivation:
More a demonstration of how they have used FPGAs than a novel architecture.

Results:

Limitation:

Platform:
PYNQ-Z2; applications: OCR and image recognition.
A long short-term memory (LSTM) neural network is used to implement OCR, and a binarized neural network (BNN) is used to implement image recognition.
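The BNN side of this work relies on the standard trick that makes binarized networks FPGA-friendly: with activations and weights in {-1, +1} packed into bit vectors, a dot product becomes an XNOR plus a popcount. A sketch of that kernel (my own illustration of the general BNN technique, not the paper's code):

```python
def bnn_dot(a_bits, b_bits, n):
    # a_bits, b_bits: n-bit integers where bit=1 encodes +1 and bit=0 encodes -1.
    # Mismatched bits contribute -1 each, matched bits +1 each, so:
    # dot = n - 2 * (number of mismatches).
    mismatches = bin(a_bits ^ b_bits).count("1")
    return n - 2 * mismatches

# [+1,-1,+1,+1] . [+1,-1,-1,+1] = 1 + 1 - 1 + 1 = 2
print(bnn_dot(0b1011, 0b1001, 4))  # 2
```

On an FPGA this maps to wide XOR gates and a popcount tree, which is far cheaper than an array of multipliers.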

An Efficient Hardware Accelerator for Block Sparse Convolutional Neural Networks on FPGA

Working Principle & Architecture Focused Area:
Create a dataflow format with high parallelism that avoids zero-weight calculations in convolution.
Two key design questions: (1) the choice of sparse mode and the coding mode of the sparse data, and (2) how to design the hardware architecture to match the data flow during calculation.
Block sparsity is a pruning method whose pruning granularity lies between weight pruning and filter pruning. It achieves a better balance between accuracy and efficiency, and the sparse data obtained this way is easier to store and encode.
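Block pruning itself can be sketched in a few lines: score each weight block by magnitude and zero out the weakest ones. The block size, scoring by L1 norm, and keep ratio below are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def block_prune(W, block=2, keep_ratio=0.5):
    # Zero out the blocks with the smallest L1 norm (illustrative sketch).
    h, w = W.shape
    out = W.copy()
    norms = []
    for i in range(0, h, block):
        for j in range(0, w, block):
            norms.append((np.abs(W[i:i+block, j:j+block]).sum(), i, j))
    norms.sort()  # weakest blocks first
    n_drop = int(len(norms) * (1 - keep_ratio))
    for _, i, j in norms[:n_drop]:
        out[i:i+block, j:j+block] = 0.0
    return out
```

Because zeros come in whole blocks, the surviving data can be stored as (block index, dense block) pairs, which is the storage/encoding advantage the notes mention.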
Motivation:
Model compression, block pruning, sparse CNNs.

Results:

Limitation:
MobileNet-V2

Platform:
FPGA (Xilinx Zynq)

Hardware Pruning

Motivation: roofline-based model…
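The roofline model mentioned above bounds attainable throughput by the lesser of peak compute and memory bandwidth times arithmetic intensity. A minimal sketch with made-up example numbers:

```python
def roofline(peak_flops, mem_bw_bytes, intensity_flops_per_byte):
    # Attainable performance = min(compute roof, memory roof).
    return min(peak_flops, mem_bw_bytes * intensity_flops_per_byte)

# Example (hypothetical device: 16 TFLOP/s peak, 100 GB/s bandwidth):
print(roofline(16e12, 100e9, 10))    # 1e12  -> memory-bound
print(roofline(16e12, 100e9, 1000))  # 1.6e13 -> compute-bound
```

This is why pruning helps on bandwidth-limited hardware: removing weights raises the effective arithmetic intensity per byte fetched.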
