Introduction
CMS Trigger
[Figure: DATA FLOW. pp collisions at 40 MHz → L1 Trigger → 100 kHz → High-Level Trigger → 1 kHz, 1 MB/evt → Offline computing]
• Level-1 Trigger (hardware): 99.75% rejected, decision in ~4 μs
• High-Level Trigger (software): 99% rejected, decision in ~100s ms
Level-1 Trigger
• 40 MHz in / 100 kHz out
• Absorbs 100s TB/s
• Trigger decision to be made in ~10 μs
• Coarse local reconstruction
• FPGAs / hardware implemented
High-Level Trigger
• 100 kHz in / 1 kHz out
• Output: ~500 KB/event
• Processing time ~300 ms
• Simplified global reconstruction
• Software implemented on CPUs
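The rejection fractions quoted for the two trigger stages follow directly from the input and output rates. A minimal sketch of that arithmetic, using only the rates given on the slides (the function and variable names are illustrative):

```python
# Rates quoted on the slides, in Hz.
COLLISION_RATE_HZ = 40e6    # pp bunch crossings: 40 MHz
L1_OUTPUT_RATE_HZ = 100e3   # Level-1 Trigger output: 100 kHz
HLT_OUTPUT_RATE_HZ = 1e3    # High-Level Trigger output: 1 kHz


def rejection_fraction(rate_in_hz: float, rate_out_hz: float) -> float:
    """Fraction of incoming events that a trigger stage discards."""
    return 1.0 - rate_out_hz / rate_in_hz


l1_rejection = rejection_fraction(COLLISION_RATE_HZ, L1_OUTPUT_RATE_HZ)
hlt_rejection = rejection_fraction(L1_OUTPUT_RATE_HZ, HLT_OUTPUT_RATE_HZ)
overall_reduction = COLLISION_RATE_HZ / HLT_OUTPUT_RATE_HZ

print(f"Level-1 Trigger rejects {l1_rejection:.2%} of events")
print(f"High-Level Trigger rejects {hlt_rejection:.2%} of events")
print(f"Overall rate reduction factor: {overall_reduction:,.0f}")
```

The two stages together keep one event out of every 40,000 bunch crossings.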
Offline computing
• Output: max. 1 MB/event
• Processing time ~20 s
• Accurate global reconstruction
• Software implemented on CPUs
[Figure: CMS trigger data flow on a time axis with ticks at 1 ns, 1 μs, 100 ms, 1 s]
• Deploy ML algorithms very early in the game
• Challenge: strict latency constraints!
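To get a feeling for how strict the Level-1 latency constraint is, the sketch below compares the ~4 μs decision time with the 40 MHz collision rate. Both numbers come from the slides; the 200 MHz FPGA clock is a purely hypothetical value chosen for illustration.

```python
BUNCH_CROSSING_RATE_HZ = 40e6   # from the slides: 40 MHz
L1_DECISION_TIME_S = 4e-6       # from the slides: decision in ~4 μs
ASSUMED_FPGA_CLOCK_HZ = 200e6   # hypothetical clock, NOT from the slides


def events_in_flight(rate_hz: float, latency_s: float) -> int:
    """Number of new events that arrive while one decision is being made."""
    return round(rate_hz * latency_s)


bunch_spacing_ns = 1e9 / BUNCH_CROSSING_RATE_HZ
in_flight = events_in_flight(BUNCH_CROSSING_RATE_HZ, L1_DECISION_TIME_S)
clock_cycles = round(ASSUMED_FPGA_CLOCK_HZ * L1_DECISION_TIME_S)

print(f"Time between bunch crossings: {bunch_spacing_ns:.0f} ns")
print(f"Events arriving during one L1 decision: {in_flight}")
print(f"Clock cycles available at the assumed clock: {clock_cycles}")
```

A new event arrives long before the previous decision is finished, which suggests that any ML algorithm deployed at this stage has to process events in a pipelined fashion and fit within a few hundred clock cycles.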
Field Programmable Gate Arrays (FPGAs) are reprogrammable integrated circuits
Contain array of logic cells used to configure low-level operations (bit masking, shifting, addition)
[Figure: FPGA diagram. Labels: DSPs (multiply-accumulate, etc.), flip-flops (registers/distributed memory), LUTs (logic), Block RAMs (memories). Inset "Logic cell": look-up table (logic), flip-flop (registers)]
Also contain embedded components:
• Digital Signal Processors (DSPs): logic units used for multiplications
• Random-access memories (RAMs): embedded memory elements
Contain array of logic cells embedded with DSPs, BRAMs, etc.
Support highly parallel algorithm implementations
• Flip-flops (FFs) and look-up tables (LUTs) for additions
[*] https://www.xilinx.com/support/documentation/sw_manuals/xilinx2014_1/ug902-vivado-high-level-synthesis.pdf
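As a toy illustration of how look-up tables can perform additions, the sketch below stores the truth table of a one-bit full adder in two small tables and chains them bit by bit. This is a simplified software model under stated assumptions, not a description of how a real device maps an adder (real FPGAs typically also provide dedicated carry logic); all names are illustrative.

```python
from itertools import product

# A 3-input LUT is a truth table with 2**3 entries, indexed by the input bits.
# These two tables configure a full adder: (a, b, carry_in) -> sum, carry_out.
SUM_LUT = {bits: bits[0] ^ bits[1] ^ bits[2] for bits in product((0, 1), repeat=3)}
CARRY_LUT = {bits: int(sum(bits) >= 2) for bits in product((0, 1), repeat=3)}


def lut_add(a: int, b: int, width: int = 8) -> int:
    """Add two unsigned integers bit by bit using only table look-ups."""
    carry = 0
    result = 0
    for i in range(width):
        bits = ((a >> i) & 1, (b >> i) & 1, carry)
        result |= SUM_LUT[bits] << i
        carry = CARRY_LUT[bits]
    # The final carry is dropped, so the result wraps modulo 2**width,
    # like a fixed-width register.
    return result


print(lut_add(23, 42))
```

Changing the contents of the tables changes the operation, which is the sense in which the logic is reprogrammable.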
06.02.2019 fpa4hep: real-time deep learning on FPGAs 16
Neural network inference
$x_n = g_n(W_{n,n-1}\, x_{n-1} + b_n)$
• activation function $g_n$: precomputed and stored in BRAMs
• multiplication $W_{n,n-1}\, x_{n-1}$: DSPs
• addition of $b_n$: logic cells
[Figure: fully connected network with M hidden layers]
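The layer equation can be written out as a short sketch in plain Python. The function names and the small test values are illustrative assumptions; the code only shows which operations (multiplications, additions, activation) one layer requires.

```python
def relu(values):
    """ReLU activation applied element-wise."""
    return [max(v, 0.0) for v in values]


def dense_layer(x_prev, weights, biases, activation):
    """Compute x_n = g_n(W_{n,n-1} x_{n-1} + b_n) for one layer.

    `weights` holds one row per output node, so each output needs
    len(x_prev) multiplications followed by additions.
    """
    pre_activation = [
        sum(w * x for w, x in zip(row, x_prev)) + b
        for row, b in zip(weights, biases)
    ]
    return activation(pre_activation)


# Tiny made-up example: 2 inputs feeding 2 nodes.
x_prev = [1.0, 2.0]
weights = [[1.0, -1.0], [0.5, 0.5]]
biases = [0.0, 1.0]
x_n = dense_layer(x_prev, weights, biases, relu)
print(x_n)
```

On an FPGA each of the products in the inner sum is a candidate for a DSP, while the running sum and the bias addition go to logic cells.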
[Figure: example network; labels: input layer, layer m, output layer]
• 16 inputs (input layer)
• 64 nodes, activation: ReLU
• 32 nodes, activation: ReLU
• 32 nodes, activation: ReLU
• 5 outputs, activation: SoftMax (output layer)
$N_{\text{multiplications}} = \sum_{n=2}^{N} L_{n-1} \times L_n$
where $L_n$ is the number of nodes in layer $n$
How many resources? DSPs, LUTs, FFs?
Does the model fit in the latency requirements?
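The multiplication count can be evaluated for the example network. The sketch assumes fully connected layers and reads the layer list as 16, 64, 32, 32, 5; the helper name is illustrative.

```python
# Layer sizes read from the example network: inputs, three hidden layers, outputs.
LAYER_SIZES = [16, 64, 32, 32, 5]


def count_multiplications(layer_sizes):
    """Sum of L_{n-1} * L_n over all pairs of consecutive layers."""
    return sum(
        n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


per_layer = [
    n_in * n_out for n_in, n_out in zip(LAYER_SIZES, LAYER_SIZES[1:])
]
total = count_multiplications(LAYER_SIZES)

print("Multiplications per layer:", per_layer)
print("Total multiplications:", total)
```

This total gives a first handle on the resource question: it is the number of multiplications that must be shared among the available DSPs for each inference.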
hls4ml
https://hls-fpga-machine-learning.github.io/hls4ml/
Figure 1: A typical workflow to translate a model into a firmware implementation using hls4ml.
Efficient NN design for FPGAs
FPGAs provide huge flexibility
Performance depends on how well you take advantage of this
Constraints:
• Input bandwidth
• FPGA resources
• Latency
• First part:
- get familiar with the package, its functionalities and design synthesis by running with one of the provided trained NNs
• Second part:
- learn how to read out an estimate of FPGA resources and latency for a NN after synthesis
• Third part:
- learn how to do model compression and see its effect on the FPGA resources/latency
• Fourth part:
- learn how to accelerate NN inference firmware on a real FPGA (provided on Amazon cloud) with SDAccel
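As a rough idea of why model compression affects FPGA resources, the sketch below applies magnitude-based pruning to a random weight list and counts the multiplications that remain. Pruning is used here only as one assumed example of a compression technique; the slides do not say which method the tutorial uses, and the 70% fraction and the random weights are made up for illustration.

```python
import random


def prune_by_magnitude(weights, fraction):
    """Zero out the `fraction` of weights with the smallest absolute value."""
    n_pruned = int(len(weights) * fraction)
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_pruned]:
        pruned[i] = 0.0
    return pruned


random.seed(0)
# Placeholder weights for a 16 x 64 fully connected layer.
weights = [random.gauss(0.0, 1.0) for _ in range(16 * 64)]
pruned = prune_by_magnitude(weights, fraction=0.7)
remaining = sum(1 for w in pruned if w != 0.0)

print(f"{remaining} of {len(weights)} multiplications remain after pruning")
```

A multiplication by a weight that is exactly zero does not need to be implemented, so fewer nonzero weights can translate into fewer DSPs.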