0% found this document useful (0 votes)

248 views

hls4ml Tutorial

This document provides instructions for setting up and running a hands-on tutorial for the hls4ml package. It instructs users to open a Jupyter notebook hosted online, authenticate with their GitHub account, and execute code cells using shift + enter. For local work, it recommends installing necessary software like Vivado and following conda instructions on the hls4ml GitHub page.

Uploaded by

pelayo leguina

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

248 views

hls4ml Tutorial

Uploaded by

pelayo leguina

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 49

Hands On - Setup

● The interactive part is served with Python notebooks

● Open https://cern.ch/ssummers/hls4ml-tutorial in your web browser
● Authenticate with your Github account (login if necessary)
● If you’re new to Jupyter notebooks, select a cell and hit “shift + enter” to execute the code
● If you have Vivado install yourself, you might prefer to work locally, see ‘conda’ section at:
https://github.com/fastmachinelearning/hls4ml-tutorial

Aug 18, 2021 hls4ml tutorial

1
hls4ml tutorial
FastML Workshop 2020
Sioni Summers et al. for the hls4ml team
2
Introduction
● hls4ml is a package for translating neural networks to FPGA ﬁrmware for inference with extremely
low latency on FPGAs
○ https://github.com/hls-fpga-machine-learning/hls4ml
○ https://fastmachinelearning.org/hls4ml/
○ pip install hls4ml

● In this session you will get hands on experience with the hls4ml package
● We’ll learn how to:
○ Translate models into synthesizable FPGA code
○ Explore the diﬀerent handles provided by the tool to optimize the inference
■ Latency, throughput, resource usage
● Make our inference more computationally eﬃcient with pruning and quantization

Aug 18, 2021 hls4ml tutorial

3
Why FPGAs?

Aug 18, 2021 hls4ml tutorial

4
LHC Experiment Data Flow

r l
s
lli Hz

ge ve

pu ine
g
on

tin
ig e
co M

om ﬄ
Tr h-L
si
pp 40

O
ig
Tr

H
L1

C
DATA FLOW
L1 trigger:
∙ 40 MHz in / 100 KHz out
∙ Process 100s TB/s
∙ Trigger decision to be made in ≈ 10 μs
∙ Coarse local reconstruction
∙ FPGAs / Hardware implemented
hls4ml tutorial
5
hls4ml origins: triggering at (HL-)LHC

Extreme collision frequency of 40 MHz → extreme data rates O(100 TB/s)

Most collision “events” don’t produce interesting physics
“Triggering” = filter events to reduce data rates to manageable levels

hls4ml tutorial 6
LHC Experiment Data Flow

r l
s
lli Hz

ge ve

pu ine
g
on

tin
ig e
co M

om ﬄ
Tr h-L
si
pp 40

O
ig
Tr

H
L1

C
DATA FLOW

Deploy ML algorithms very early in the game

Challenge: strict latency constraints!

hls4ml tutorial
7
The challenge: triggering at (HL-)LHC
The trigger discards events forever, so selection must be very precise
ML can improve sensitivity to rare physics
Needs to be fast!
Enter: hls4ml (high level synthesis for machine learning)

hls4ml tutorial 8
Muon trigger example

Aug 18, 2021 hls4ml tutorial

9
hls4ml: progression
● Previous slides showed the original motivation for hls4ml
○ Extreme low latency, high throughput domain
● Since then, we have been expanding!
○ Longer latency domains, larger models, resource constrained
○ Different FPGA vendors
○ New applications, new architectures
● While maintaining core characteristics:
○ “Layer-unrolled” HLS library → not another DPU
○ Extremely configurable: precision, resource vs latency/throughput tradeoff
○ Research project, application- and user-driven
○ Accessible, easy to use

Aug 18, 2021 hls4ml tutorial

10
Recent Developments
hls4ml community is very active!
● Binary & Ternary neural networks:
[2020 Mach. Learn.: Sci. Technol]
○ Compressed weights for low resource inference
● Boosted Decision Trees: [JINST 15 P05026 (2020)]
○ Low latency for Decision Tree ensembles
● GarNet / GravNet: [arXiv: 2008.03601]
○ Distance weighted graph neural networks suitable
for sparse/irregular point-cloud data
● Quantization aware training QKeras + support in
hls4ml: [arXiv: 2006.10159]
● Convolutional neural networks
Mach. Learn.: Sci. Technol. 2 045015 (2021)

Aug 18, 2021 hls4ml tutorial

11
Coming Soon
● A few exciting new things are being developed
and should become available soon:
○ Intel Quartus HLS, Mentor Catapult HLS,
Intel One API ‘Backends’
○ Recurrent Neural Networks
○ More integrated ‘end-to-end’ flow with bitfile
generation and host bindings for platforms
like Alveo, PYNQ
■ Bundled into MLCommons Tiny
submission -- image classification and
anomaly detection
https://mlcommons.org/en/news/mlperf-tiny-v05/

Aug 18, 2021 hls4ml tutorial

12
What are FPGAs?
Field Programmable Gate Arrays are reprogrammable
FPGA diagram
integrated circuits

Contain many diﬀerent building blocks (‘resources’) which are

connected together as you desire

Originally popular for prototyping ASICs, but now also for high
performance computing

Aug 18, 2021 hls4ml tutorial

13
What are FPGAs?
FPGA diagram
Field Programmable Gate Arrays are reprogrammable
integrated circuits

Logic cells / Look Up Tables perform arbitrary functions on

small bitwidth inputs (2-6)

These can be used for boolean operations, arithmetic, small

memories

Flip-Flops register data in time with the clock pulse

Logic cell

Look-up
Flip-ﬂop
table
(logic) (registers)

Aug 18, 2021 hls4ml tutorial

14
What are FPGAs?
Field Programmable Gate Arrays are reprogrammable integrated
FPGA diagram
circuits

DSPs (Digital Signal Processor) are specialized units for

multiplication and arithmetic

Faster and more eﬃcient than using LUTs for these types of
operations

And for Neural Nets, DSPs are often the most scarce

DSP
(multiplication)

Aug 18, 2021 hls4ml tutorial

15
What are FPGAs?
FPGA diagram
Field Programmable Gate Arrays are reprogrammable
integrated circuits

BRAMs are small, fast memories - RAMs, ROMs, FIFOs (18Kb

each in Xilinx)
Memories using BRAMs more eﬃcient than using LUTs

A big FPGA has nearly 100Mb of BRAM, chained together as

needed

Aug 18, 2021 hls4ml tutorial

16
What are FPGAs?
FPGA diagram
In addition, there are specialised blocks for I/O, making FPGAs
popular in embedded systems and HEP triggers

High speed transceivers with Tb/s total bandwidth

PCIe, (Multi) Gigabit Ethernet, Inﬁniband

AND: Support highly parallel algorithm implementations

Low power per Op (relative to CPU/GPU)

Aug 18, 2021 hls4ml tutorial

17
Why are FPGAs Fast?
● Fine-grained / resource parallelism
○ Use the many resources to work on
diﬀerent parts of the problem
simultaneously
○ Allows us to achieve low latency
● Most problems have at least some sequential
aspect, limiting how low latency we can go
○ But we can still take advantage of it
with…
● Pipeline parallelism
○ Use the register pipeline to work on Like a production line for data…
diﬀerent data simultaneously
○ Allows us to achieve high throughput

Aug 18, 2021 hls4ml tutorial

18
How are FPGAs programmed?
Hardware Description Languages

HDLs are programming languages which describe electronic

circuits

High Level Synthesis

Compile from C/C++ to VHDL

Pre-processor directives and constraints used to optimize the
design
Drastic decrease in ﬁrmware development time!

Today we’ll use Xilinx Vivado HLS [*]

[*] https://www.xilinx.com/support/documentation/sw_manuals/xilinx2020_1/ug902-vivado-high-level-synthesis.pdf
Aug 18, 2021 hls4ml tutorial
19
Jargon
● LUT - Look Up Table aka ‘logic’ - generic functions on small bitwidth inputs. Combine many to build the
algorithm
● FF - Flip Flops - control the flow of data with the clock pulse. Used to build the pipeline and achieve high
throughput
● DSP - Digital Signal Processor - performs multiplication and other arithmetic in the FPGA
● BRAM - Block RAM - hardened RAM resource. More efficient memories than using LUTs for more than a few
elements
● HLS - High Level Synthesis - compiler for C, C++, SystemC into FPGA IP cores
● HDL - Hardware Description Language - low level language for describing circuits
● RTL - Register Transfer Level - the very low level description of the function and connection of logic gates
● Latency - time between starting processing and receiving the result
○ Measured in clock cycles or seconds
● II - Initiation Interval - time from accepting first input to accepting next input

Aug 18, 2021 hls4ml tutorial

20
high level synthesis for machine learning

Catapult
Coming Soon

https://fastmachinelearning.org/hls4ml/
Aug 18, 2021 hls4ml tutorial
21
Neural network inference

Aug 18, 2021 hls4ml tutorial

22
Neural network inference
Ln
L1
LN precomputed and DSPs logic cells
stored in BRAMs

Aug 18, 2021 hls4ml tutorial

23
Neural network inference
Ln
L1
LN precomputed and DSPs logic cells
stored in BRAMs

How many resources? DSPs,

LUTs, FFs?
Does the model ﬁt in the latency
requirements?

Aug 18, 2021 hls4ml tutorial

24
Neural network inference
Ln
L1
LN precomputed and DSPs logic cells
stored in BRAMs

How many resources? DSPs,

LUTs, FFs?
Does the model ﬁt in the latency
requirements?

Aug 18, 2021 hls4ml tutorial

25
Efficient NN design for FPGAs
FPGAs provide huge ﬂexibility Constraints:
Performance depends on how well you take Input bandwidth
advantage of this FPGA resources
Latency

Today you will learn how to optimize your project through:

- compression: reduce number of synapses or neurons

a i ning
tr
NN
- quantization: reduces the precision of the calculations (inputs,
weights, biases)
r o ject
p
- parallelization: tune how much to parallelize to make the inference GA ing
FP sign
faster/slower versus FPGA resources de

Aug 18, 2021 hls4ml tutorial

26
What we won’t cover today
● Two new tutorial notebooks are not yet ready, but will be soon!
○ Boosted decision trees: implemented in a companion package to hls4ml
■ https://github.com/thesps/conifer
○ Convolutional NNs: convolutional layers can quickly increase in number of
operations, recently available in hls4ml at larger scales
■ https://arxiv.org/abs/2101.05108
● What comes after hls4ml… you would need to integrate the ‘IP core’ into a larger
design
○ For a custom board, you’d need to do this by hand (e.g. CMS L1 Trigger, National
Instruments DAQ framework)
○ For more oﬀ-the-shelf boards, integration with system-on-chip or host CPU can be
more straightforward
■ https://github.com/mlcommons/tiny_results_v0.5/tree/main/open/hls4ml

Aug 18, 2021 hls4ml tutorial

27
Today’s hls4ml hands on
∙ Part 1:
- Get started with hls4ml: train a basic model and run the conversion, simulation & c-synthesis
steps

∙ Part 2:
- Learn how to tune inference performance with quantization & ReuseFactor

∙ Part 3:
- Perform model compression and observe its eﬀect on the FPGA resources/latency

∙ Part 4:
- Train using QKeras “quantization aware training” and study impact on FPGA metrics

Aug 18, 2021 hls4ml tutorial

28
hls4ml tutorial
Part 1: Model Conversion

29
Physics case: jet tagging
Study a multi-classiﬁcation task to be implemented on FPGA: discrimination between highly
energetic (boosted) q, g, W, Z, t initiated jets

Jet = collimated ‘spray’ of particles

top Z W other quark gluon

t→bW→bqq Z→qq W→qq q/g background

3-prong jet 2-prong jet 2-prong jet no substructure
and/or mass ~ 0
Reconstructed as one massive jet with substructure

Aug 18, 2021 hls4ml tutorial 30

Physics case: jet tagging

top Z W other quark gluon

Input variables: several observables known to have high discrimination power

from oﬄine data analyses and published studies [*]
[*] D. Guest at al. PhysRevD.94.112002, G. Kasieczka et al. JHEP05(2017)006, J. M.
Butterworth et al. PhysRevLett.100.242001, etc..

Aug 18, 2021 hls4ml tutorial

31
Physics case: jet tagging
● We’ll train the ﬁve class multi-classiﬁer on a sample of ~ 1M events with two boosted
WW/ZZ/tt/qq/gg anti-kT jets
○ Dataset DOI: 10.5281/zenodo.3602254
○ OpenML: https://www.openml.org/d/42468
● Fully connected neural network with 16 expert-level inputs:
○ Relu activation function for intermediate layers
○ Softmax activation function for output layer

better
AUC = area under ROC curve
(100% is perfect, 20% is random)
Aug 18, 2021 hls4ml tutorial
32
Hands On - Setup
● The interactive part is served with Python notebooks
● Open https://cern.ch/ssummers/hls4ml-tutorial in your web browser
● Authenticate with your Github account (login if necessary)
● Open and start running through “part1_getting_started” !
● If you’re new to Jupyter notebooks, select a cell and hit “shift + enter” to execute the code
● If you have Vivado install yourself, you might prefer to work locally, see ‘conda’ section at:
https://github.com/fastmachinelearning/hls4ml-tutorial

Aug 18, 2021 hls4ml tutorial

33
hls4ml Tutorial
Part 2: Advanced Conﬁguration

34
Efficient NN design: quantization
ap_fixed<width bits, integer bits> ∙ In the FPGA we use fixed point representation
0101.1011101010 − Operations are integer ops, but we can represent fractional
integer fractional values
width ∙ But we have to make sure we’ve used the correct data types!
Scan integer bits Scan fractional bits
Fractional bits fixed to 8 Integer bits fixed to 6

FPGA AUC / Expected AUC

Full performance at 6
Full performance at 8
integer bits
fractional bits

Aug 18, 2021 hls4ml tutorial

35
Efficient NN design: quantization

Aug 18, 2021 hls4ml tutorial

36
Efficient NN design: parallelization
∙ Trade-oﬀ between latency and FPGA resource usage determined by the parallelization of the
calculations in each layer

∙ Conﬁgure the “reuse factor” = number of times a multiplier is used to do a computation

Fewer resources,
Lower throughput,
Fully serial Higher latency

More resources,
Fully parallel Higher throughput,
Lower latency
Reuse factor: how much to parallelize operations in a hidden layer
Aug 18, 2021 hls4ml tutorial
37
Parallelization: DSP usage
More resources
Fully parallel
Each mult. used 1x

Each mult. used 2x

Each mult. used 3x

…
Longer latency

Aug 18, 2021 hls4ml tutorial

38
Parallelization: Timing
Latency of layer m

Longer latency

~ 175 ns
Latency (clock

Each mult. used 6x

cycles)

… …
Each mult. used 3x

Fully parallel
~ 75 ns Each mult. used 1x

More resources
Aug 18, 2021 hls4ml tutorial
39
Large MLP
● ‘Strategy: Resource’ for IOType: io_parallel # options: io_serial/io_parallel
larger networks and higher HLSConfig:
reuse factor Model:
● Uses a slightly different HLS
Precision: ap_fixed<16,6>
implementation of the dense
layer to compile faster and ReuseFactor: 128
better for large layers Strategy: Resource
● Here, we use a different LayerName:
partitioning on the first layer dense1:
for the best partitioning of
ReuseFactor: 112
arrays

This config is for a model trained on the MNIST digits classification dataset
Architecture (fully connected): 784 → 128 → 128 → 128 → 10
Model accuracy: ~97%
We can work out how many DSPs this should use...

Aug 18, 2021 hls4ml tutorial

40
Large MLP
∙ It takes a while to synthesise, so here’s one I made earlier…

∙ The DSPs should be: (784 x 128) / 112 + (2 x 128 x 128 + 128 x 10) / 128 = 1162 🤞
============================
============================
+ Timing (ns): =====================================
== Utilization Estimates
* Summary:
=====================================
+--------+-------+----------+------------+ +---------------------+---------+-------+---------+--------+
| Clock | Target| Estimated| Uncertainty| | Name | BRAM_18K| DSP48E| FF | LUT |
+---------------------+---------+-------+---------+--------+
+--------+-------+----------+------------+ ...
|ap_clk | 5.00| 4.375| 0.62| +---------------------+---------+-------+---------+--------+
+--------+-------+----------+------------+ |Total | 1962| 1162| 169979| 222623|
+---------------------+---------+-------+---------+--------+
|Available SLR | 2160| 2760| 663360| 331680|
+ Latency (clock cycles): +---------------------+---------+-------+---------+--------+
|Utilization SLR (%) | 90| 42| 25| 67|
* Summary:
+---------------------+---------+-------+---------+--------+
+-----+-----+-----+-----+----------+ |Available | 4320| 5520| 1326720| 663360|
| Latency | Interval | Pipeline | +---------------------+---------+-------+---------+--------+
|Utilization (%) | 45| 21| 12| 33|
| min | max | min | max | Type | +---------------------+---------+-------+---------+--------+
+-----+-----+-----+-----+----------+
| 518| 522| 128| 128| dataflow |
+-----+-----+-----+-----+----------+

II determined by the largest reuse factor

Aug 18, 2021 hls4ml tutorial
41
hls4ml Tutorial
Part 3: Compression

42
NN compression methods
● Network compression is a widespread technique to reduce the size, energy consumption, and
overtraining of deep neural networks
● Several approaches have been studied:
○ parameter pruning: selective removal of weights based on a particular ranking
[arxiv.1510.00149, arxiv.1712.01312]
○ low-rank factorization: using matrix/tensor decomposition to estimate informative parameters
[arxiv.1405.3866]
○ transferred/compact convolutional filters: special structural convolutional filters to save
parameters [arxiv.1602.07576]
○ knowledge distillation: training a compact network with distilled knowledge of a large network
[doi:10.1145/1150402.1150464]
● Today we’ll use the tensorflow model sparsity toolkit
○ https://blog.tensorflow.org/2019/05/tf-model-optimization-toolkit-pruning-API.html
● But you can use other methods!

Aug 18, 2021 hls4ml tutorial

43
TF Sparsity
● Iteratively remove low magnitude weights, starting with 0 sparsity, smoothly increasing
up to the set target as training proceeds

Aug 18, 2021 hls4ml tutorial

44
Eﬃcient NN design: compression

Fully parallelized
(max DSP use)

compression
Number of DSPs available
● DSPs (used for multiplication) are often
limiting resource
○ maximum use when fully parallelized
○ DSPs have a max size for input (e.g.
70% compression ~ 70% fewer DSPs 27x18 bits), so number of DSPs per
multiplication changes with precision

Aug 18, 2021 hls4ml tutorial

45
hls4ml Tutorial
Part 4: Quantization

46
Eﬃcient NN design: quantization
● hls4ml allows you to use diﬀerent data types
everywhere, we saw how to tune that in part 2
● We will also try quantization-aware training with
QKeras (part 4)
● With quantization-aware we can even go down to
just 1 or 2 bits
○ See our recent work:
https://arxiv.org/abs/2003.06308
● See other talks on quantization at this workshop:
Amir, Thea, Benjamin

Aug 18, 2021 hls4ml tutorial

47
QKeras
● QKeras is a library to train models with
quantization in the training
○ Developed & maintained by Google
● Easy to use, drop-in replacements for Keras
layers
○ e.g. Dense → QDense
○ e.g. Conv2D → QConv2D
○ Use ‘quantizers’ to specify how many bits
to use where
○ Same kind of granularity as hls4ml
● Can achieve good performance with very few
bits
● We’ve recently added support for
QKeras-trained models to hls4ml
○ The number of bits used in training is also
used in inference
○ The intermediate model is adjusted to
capture all optimizations possible with
QKeras
hls4ml tutorial
48
Summary
● After this session you’ve gained some hands on experience with hls4ml
○ Translated neural networks to FPGA firmware, run simulation and synthesis
● Tuned network inference performance with precision and ReuseFactor
○ Used profiling and trace tools to guide tuning
● Learned how to simply prune a neural network and the impact on resources
● Trained a model with small number of bits using QKeras, and use the same spec in inference easily
with hls4ml
● The tutorial server is always available at https://cern.ch/ssummers/hls4ml-tutorial
● You can find these tutorial notebooks to run locally at:
https://github.com/fastmachinelearning/hls4ml-tutorial
● You can run the tutorial Docker image yourself like:
○ docker run -p 8888:8888 gitlab-registry.cern.ch/ssummers/hls4ml-tutorial:12.v
○ 15 GB download! Or remove ‘.v’ for a much smaller image but without Xilinx tools (so no ‘build’)
● Use hls4ml in your own environment: pip install hls4ml[profiling]

Aug 18, 2021 hls4ml tutorial

Download Full Digital VLSI Design and Simulation with Verilog 1st Edition Lata Tripathi Suman Saxena Sobhit Sinha Sanjeet K Patel Govind S PDF All Chapters
100% (3)
Download Full Digital VLSI Design and Simulation with Verilog 1st Edition Lata Tripathi Suman Saxena Sobhit Sinha Sanjeet K Patel Govind S PDF All Chapters
33 pages
DSDV Lab Manual PDF
100% (3)
DSDV Lab Manual PDF
15 pages
Sram Part1
No ratings yet
Sram Part1
36 pages
BEC7L1 - Digital Cmos Vlsi-Lab
No ratings yet
BEC7L1 - Digital Cmos Vlsi-Lab
66 pages
Chapter10 Verilog
No ratings yet
Chapter10 Verilog
62 pages
Design and VLSI Implementation of High-Performance Face-Detection Engine For Mobile Applications
No ratings yet
Design and VLSI Implementation of High-Performance Face-Detection Engine For Mobile Applications
2 pages
Basic Uvm
No ratings yet
Basic Uvm
13 pages
M.Tech. VLSI Design, Test and Manufacturing Curriculum and Syllabus R2015
No ratings yet
M.Tech. VLSI Design, Test and Manufacturing Curriculum and Syllabus R2015
36 pages
Programmable ASIC Design: Haibo Wang ECE Department Southern Illinois University Carbondale, IL 62901
No ratings yet
Programmable ASIC Design: Haibo Wang ECE Department Southern Illinois University Carbondale, IL 62901
25 pages
00-The Art of Analog Cover)
No ratings yet
00-The Art of Analog Cover)
17 pages
Cadence Spectre Matlab Toolbox
0% (1)
Cadence Spectre Matlab Toolbox
11 pages
Vlsi Module
No ratings yet
Vlsi Module
2 pages
105 Machine Learning Paper
No ratings yet
105 Machine Learning Paper
6 pages
VLSI Design Lab
No ratings yet
VLSI Design Lab
130 pages
Introduction To Uvm
No ratings yet
Introduction To Uvm
7 pages
AMS-RF PDK Flow
No ratings yet
AMS-RF PDK Flow
30 pages
Computer Vision Lab Report
No ratings yet
Computer Vision Lab Report
7 pages
Layout Design8
No ratings yet
Layout Design8
34 pages
Two Day FDP On ASIC Physical Design Using Cadence First Encounter
No ratings yet
Two Day FDP On ASIC Physical Design Using Cadence First Encounter
2 pages
Integrated Circuits: Mohamed Dessouky
No ratings yet
Integrated Circuits: Mohamed Dessouky
18 pages
Verilog For Verification
No ratings yet
Verilog For Verification
23 pages
Verilog Lab Instructor Manual
100% (1)
Verilog Lab Instructor Manual
90 pages
Using Verilog and System Verilog Design and Verify The Communication Bridge Between APB and I2C Protocol
No ratings yet
Using Verilog and System Verilog Design and Verify The Communication Bridge Between APB and I2C Protocol
3 pages
Verilog Tutorial: Chin-Lung Su
No ratings yet
Verilog Tutorial: Chin-Lung Su
42 pages
Lamda
No ratings yet
Lamda
23 pages
Flow Setup
No ratings yet
Flow Setup
252 pages
About VIT University About VLSI Design Lab: Course Description
No ratings yet
About VIT University About VLSI Design Lab: Course Description
2 pages
EE 523-VLSI Design-Dr. Shahid Masud PDF
No ratings yet
EE 523-VLSI Design-Dr. Shahid Masud PDF
2 pages
Vlsi Lab Manual 2010
No ratings yet
Vlsi Lab Manual 2010
88 pages
CS Amplifier
No ratings yet
CS Amplifier
7 pages
Ec1461 Vlsi Design
No ratings yet
Ec1461 Vlsi Design
17 pages
Mealy and Moore Machines
No ratings yet
Mealy and Moore Machines
15 pages
ESE 570: Digital Integrated Circuits and VLSI Fundamentals: Lec 1: January 12, 2017 Introduction and Overview
No ratings yet
ESE 570: Digital Integrated Circuits and VLSI Fundamentals: Lec 1: January 12, 2017 Introduction and Overview
42 pages
RNS VLSI Lab Manual
No ratings yet
RNS VLSI Lab Manual
30 pages
Dvcon2017 Uvm Debug Lib Final
No ratings yet
Dvcon2017 Uvm Debug Lib Final
9 pages
Chapter07 - Electronic Analysis of CMOS Logic Gates
No ratings yet
Chapter07 - Electronic Analysis of CMOS Logic Gates
38 pages
Concepts in VLSI Design
No ratings yet
Concepts in VLSI Design
59 pages
Analog Labs Manual: IC614 ASSURA410
No ratings yet
Analog Labs Manual: IC614 ASSURA410
108 pages
A Practical Guide To The UofU VLSI CAD Flow
No ratings yet
A Practical Guide To The UofU VLSI CAD Flow
8 pages
FSM Design Using Verilog - Electrosofts
No ratings yet
FSM Design Using Verilog - Electrosofts
7 pages
Open File 2
No ratings yet
Open File 2
68 pages
SystemVerilog Switch Lab Final
No ratings yet
SystemVerilog Switch Lab Final
6 pages
Verilog HDL: A Guide To Digital Design DS TH I and Synthesis
No ratings yet
Verilog HDL: A Guide To Digital Design DS TH I and Synthesis
62 pages
Layout Design Rules
No ratings yet
Layout Design Rules
21 pages
Design of High Speed & Power Optimized Sense Amplifier Using Deep Nano CMOS VLSI Technology
No ratings yet
Design of High Speed & Power Optimized Sense Amplifier Using Deep Nano CMOS VLSI Technology
8 pages
Design Flow Vlsi
No ratings yet
Design Flow Vlsi
42 pages
Asic Design
No ratings yet
Asic Design
59 pages
VHDL Notes
100% (2)
VHDL Notes
14 pages
VMM Golden Reference Guide Jan 2010
No ratings yet
VMM Golden Reference Guide Jan 2010
364 pages
PDK Reference Manual
No ratings yet
PDK Reference Manual
85 pages
Designing Finite State Machines (FSM) Using Verilog
No ratings yet
Designing Finite State Machines (FSM) Using Verilog
8 pages
Analog Layout UTL
No ratings yet
Analog Layout UTL
3 pages
Tanner Manual
100% (1)
Tanner Manual
60 pages
New DSD Manual Rvitm (4-7)
No ratings yet
New DSD Manual Rvitm (4-7)
72 pages
System Ver I Log
No ratings yet
System Ver I Log
8 pages
Application-Specific Integrated Circuit ASIC A Complete Guide
From Everand
Application-Specific Integrated Circuit ASIC A Complete Guide
Gerardus Blokdyk
No ratings yet
Logic synthesis Standard Requirements
From Everand
Logic synthesis Standard Requirements
Gerardus Blokdyk
No ratings yet
Full Text
No ratings yet
Full Text
25 pages
Introduction 2
No ratings yet
Introduction 2
21 pages
Algosup Fpga Course Day 1 Slides
No ratings yet
Algosup Fpga Course Day 1 Slides
111 pages
Lecture11 FPGA
No ratings yet
Lecture11 FPGA
38 pages
Week 1 (Part 2) ECE-852 pak Austria
No ratings yet
Week 1 (Part 2) ECE-852 pak Austria
45 pages
Digital System Implementation
No ratings yet
Digital System Implementation
37 pages
22639-2023-Summer-question-paper[Msbte study resources]
No ratings yet
22639-2023-Summer-question-paper[Msbte study resources]
4 pages
A New VLSI Architecture of Parallel Multiplier-Accumulator Based On Radix-2 Modified Booth Algorithm
No ratings yet
A New VLSI Architecture of Parallel Multiplier-Accumulator Based On Radix-2 Modified Booth Algorithm
8 pages
Venkata Subbaiah Maven Silicon
No ratings yet
Venkata Subbaiah Maven Silicon
2 pages
Finn RTL
No ratings yet
Finn RTL
22 pages
Vlsi Design Unit 2
No ratings yet
Vlsi Design Unit 2
36 pages
Ddco Lab Manual-1
No ratings yet
Ddco Lab Manual-1
15 pages
Dsd&Dica Lab
No ratings yet
Dsd&Dica Lab
104 pages
FPGA Design Tutorial: ECE 554 - Digital Engineering Laboratory
No ratings yet
FPGA Design Tutorial: ECE 554 - Digital Engineering Laboratory
30 pages
Engr. Fahad Islam Cheema Mobile# 0092-342-6714593
No ratings yet
Engr. Fahad Islam Cheema Mobile# 0092-342-6714593
4 pages
Ug901 Vivado Synthesis
No ratings yet
Ug901 Vivado Synthesis
284 pages
Quick Notes On LINT
No ratings yet
Quick Notes On LINT
2 pages
Using The New Verilog-2001 Standard Part Two: Verifying Designs
No ratings yet
Using The New Verilog-2001 Standard Part Two: Verifying Designs
32 pages
Ug1119 Vivado Creating Packaging Ip Tutorial
No ratings yet
Ug1119 Vivado Creating Packaging Ip Tutorial
30 pages
RTL Coding Hints
No ratings yet
RTL Coding Hints
17 pages
CS8351 Syllabus
No ratings yet
CS8351 Syllabus
2 pages
CS8351 Syllabus
No ratings yet
CS8351 Syllabus
1 page
Level To Pulse
No ratings yet
Level To Pulse
18 pages
DDTV Unit 1
No ratings yet
DDTV Unit 1
131 pages
Chapter 2: Structural Modeling: Prof. Soo-Ik Chae
No ratings yet
Chapter 2: Structural Modeling: Prof. Soo-Ik Chae
27 pages
SystemC and Codesign Additional Lectures
No ratings yet
SystemC and Codesign Additional Lectures
58 pages
CV_202312081222137483_12011859
No ratings yet
CV_202312081222137483_12011859
1 page
Synthesis Flow Overview (VLSI) - Introduction - by ANKIT MAHAJAN - Medium
No ratings yet
Synthesis Flow Overview (VLSI) - Introduction - by ANKIT MAHAJAN - Medium
2 pages
Implementation and Testing of Adders Using Bist
50% (2)
Implementation and Testing of Adders Using Bist
52 pages
INTRO TO HDL Course Guide
No ratings yet
INTRO TO HDL Course Guide
3 pages
Sequential cicuit timing
No ratings yet
Sequential cicuit timing
32 pages
Verilog Tips and Interview Questions - Verilog
No ratings yet
Verilog Tips and Interview Questions - Verilog
8 pages
SUG284-2.1E - Gowin IP Core Generator User Guide
No ratings yet
SUG284-2.1E - Gowin IP Core Generator User Guide
56 pages

hls4ml Tutorial

Uploaded by

hls4ml Tutorial

Uploaded by

Hands On - Setup

● The interactive part is served with Python notebooks

Aug 18, 2021 hls4ml tutorial

Aug 18, 2021 hls4ml tutorial

Aug 18, 2021 hls4ml tutorial

Extreme collision frequency of 40 MHz → extreme data rates O(100 TB/s)

Deploy ML algorithms very early in the game

Aug 18, 2021 hls4ml tutorial

Aug 18, 2021 hls4ml tutorial

Aug 18, 2021 hls4ml tutorial

Aug 18, 2021 hls4ml tutorial

Contain many diﬀerent building blocks (‘resources’) which are

Aug 18, 2021 hls4ml tutorial

Logic cells / Look Up Tables perform arbitrary functions on

These can be used for boolean operations, arithmetic, small

Flip-Flops register data in time with the clock pulse

Aug 18, 2021 hls4ml tutorial

DSPs (Digital Signal Processor) are specialized units for

Aug 18, 2021 hls4ml tutorial

BRAMs are small, fast memories - RAMs, ROMs, FIFOs (18Kb

A big FPGA has nearly 100Mb of BRAM, chained together as

Aug 18, 2021 hls4ml tutorial

High speed transceivers with Tb/s total bandwidth

AND: Support highly parallel algorithm implementations

Low power per Op (relative to CPU/GPU)

Aug 18, 2021 hls4ml tutorial

Aug 18, 2021 hls4ml tutorial

HDLs are programming languages which describe electronic

High Level Synthesis

Compile from C/C++ to VHDL

Today we’ll use Xilinx Vivado HLS [*]

Aug 18, 2021 hls4ml tutorial

Aug 18, 2021 hls4ml tutorial

Aug 18, 2021 hls4ml tutorial

How many resources? DSPs,

Aug 18, 2021 hls4ml tutorial

How many resources? DSPs,

Aug 18, 2021 hls4ml tutorial

Today you will learn how to optimize your project through:

- compression: reduce number of synapses or neurons

Aug 18, 2021 hls4ml tutorial

Aug 18, 2021 hls4ml tutorial

Aug 18, 2021 hls4ml tutorial

Jet = collimated ‘spray’ of particles

top Z W other quark gluon

t→bW→bqq Z→qq W→qq q/g background

Aug 18, 2021 hls4ml tutorial 30

top Z W other quark gluon

Input variables: several observables known to have high discrimination power

Aug 18, 2021 hls4ml tutorial

Aug 18, 2021 hls4ml tutorial

FPGA AUC / Expected AUC

Aug 18, 2021 hls4ml tutorial

Aug 18, 2021 hls4ml tutorial

∙ Conﬁgure the “reuse factor” = number of times a multiplier is used to do a computation

Each mult. used 2x

Each mult. used 3x

Aug 18, 2021 hls4ml tutorial

Each mult. used 6x

Aug 18, 2021 hls4ml tutorial

II determined by the largest reuse factor

Aug 18, 2021 hls4ml tutorial

Aug 18, 2021 hls4ml tutorial

Aug 18, 2021 hls4ml tutorial

Aug 18, 2021 hls4ml tutorial

Aug 18, 2021 hls4ml tutorial

You might also like