2022 06 15 FPGA Lecture HS

1) An FPGA is a programmable integrated circuit that can be configured to perform the function of digital circuits.
2) Most FPGAs use SRAM memory cells to store the configuration and can be reprogrammed by loading a new configuration via external memory or a serial interface like JTAG.
3) The basic building blocks of FPGAs include programmable logic blocks, memory blocks, DSP blocks, clock management resources and I/O pins that can be connected together to implement a wide range of digital circuits and systems.


Introduction to Field Programmable Gate Arrays
Hannes Sakulin
CERN / EP-CMD

12th International School of Trigger and Data Acquisition (ISOTDAQ)
Catania, Italy, June 15th, 2022
What is a Field Programmable Gate Array?
… a quick answer for the impatient
— An FPGA is an integrated circuit
  — Mostly digital electronics
— An FPGA is programmable in the field (= outside the factory), hence the name “field programmable”
— Design is specified by schematics or with a hardware description language
— Tools compute a programming file for the FPGA
— The FPGA is configured with the design (gateware / firmware)
— Your electronic circuit is ready to use

With an FPGA you can build electronic circuits …
… without using a breadboard or soldering iron
… without plugging together NIM modules
… without having a chip produced at a factory
2
Outline
— Quick look at digital electronics
— Short history of programmable logic devices
— FPGAs and their features
— Programming techniques
— Design flow
— Example Applications in the Trigger and DAQ domain

3
Digital electronics

4
The building blocks: logic gates
(Figure: gate symbols with inputs A, B and output Q, with their truth tables)

Gate                       C equivalent
AND gate                   q = a && b;
OR gate                    q = a || b;
Exclusive OR (XOR) gate    q = a != b;

5
Combinatorial logic (asynchronous)
Outputs are determined by inputs only

Example: Full adder with carry-in, carry-out

A  B  Cin | S  Cout
0  0  0   | 0  0
1  0  0   | 1  0
0  1  0   | 1  0
1  1  0   | 0  1
0  0  1   | 1  0
1  0  1   | 0  1
0  1  1   | 0  1
1  1  1   | 1  1

Combinatorial logic may be implemented using Look-Up Tables (LUTs)
LUT = small memory
6
(Synchronous) sequential logic
Outputs are determined by inputs and their history (sequence)
The logic has an internal state

2-bit binary counter
(Figure source: https://www.zeepedia.com/read.php?b=9&c=32&d_flip-flop_based_implementation_digital_logic_design)

D flip-flop (D = data, delay)
Pins: data, clock, set, reset, output, inverted output
The D flip-flop samples the data at the rising (or falling) edge of the clock.
The output will be equal to the last sampled input until the next rising (or falling) clock edge.
7


Synchronous sequential logic

Using look-up tables and flip-flops, any kind of digital electronics may be implemented.

Of course, there are some details to be learnt about electronics design …
8
Quick history of programmable
digital electronics

9
Long long time ago …

10
Simple Programmable Logic Devices (sPLDs)
a) Programmable Read Only Memory (PROMs)

Late 1960s

Unprogrammed PROM (Fixed AND Array, Programmable OR Array)


11
Simple Programmable Logic Devices (sPLDs)
b) Programmable Logic Arrays (PLAs)

Programmable AND array

Unprogrammed PLA (Programmable AND and OR Arrays)


1975
Most flexible, but slower
12
Simple Programmable Logic Devices (sPLDs)
c) Programmable Array Logic (PAL)

Unprogrammed PAL (Programmable AND Array, Fixed OR Array)


13
Complex PLDs (CPLDs)

and flip-flops

Coarse grained: 100’s of blocks, restrictive structure
(EE)PROM based
14
FPGAs …

15
FPGAs
(extremely flexible)
Fine-grained: 100,000s of blocks
Today: up to 5 million logic blocks
Programmable input / output pins
16
LUT-based Fabrics

17
Typical LUT-based Logic Cell
Xilinx: logic cell; Altera: logic element
— The LUT may implement any function of its inputs
— The flip-flop registers the LUT output
— The cell may use only the LUT or only the flip-flop
— The LUT may alternatively be configured as a shift register
— Additional elements (not shown): fast carry logic
18
Clock Trees

Clock trees are designed so that the clock arrives at all flip-flops at (nearly) the same time, i.e. with minimal skew
19
Clock Managers

Daughter clocks may have a multiple or a fraction of the input clock frequency

20
Embedded RAM blocks

Today: up to ~500 Mbit of RAM
21


Embedded Multipliers & DSPs

22
Digital Signal Processor (DSP)

DSP block (Xilinx 7-series)


Up to several thousand per chip

23
Soft and Hard Processor Cores
— Soft core
— Design implemented with
the programmable
resources (logic cells) in
the chip

— Hard core
— Processor core that is
available in addition to the
programmable resources
— E.g. PowerPC, ARM

24
General-Purpose Input/Output (GPIO)

Today: up to 1200 user I/O pins
Input and / or output
Voltages from 1.2 V (in some devices 1.0 V) up to 3.3 V
Many I/O standards
Single-ended: LVTTL, LVCMOS, …
Differential pairs: LVDS, …
25
High-Speed Serial Interconnect
— Using differential pairs
— Standard I/O pins limited to about 1 Gbit/s
— Latest serial transceivers (SERDES): typically 10 Gb/s, 25 Gb/s
  — up to 32.75 Gb/s
  — up to 112 Gb/s with Pulse Amplitude Modulation (PAM)
— FPGAs with multi-Tbit/s I/O bandwidth
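What PAM buys can be seen from the relation between bit rate and symbol rate. Assuming the four-level variant PAM4 (the usual choice at this rate; the slide does not say which variant), each symbol carries log2(M) bits, so the symbol rate on the wire is half that of two-level signalling at the same bit rate. Coding and error-correction overheads are ignored in this rough estimate:

```latex
R_\text{bit} = R_\text{symbol}\,\log_2 M
\quad\Rightarrow\quad
R_\text{symbol} = \frac{112\ \text{Gb/s}}{\log_2 4} = 56\ \text{GBd}
```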

26
Components in a modern FPGA

27
Programming techniques

28
Fusible Links (not used in FPGAs)

29
Antifuse Technology

30
EPROM Technology
Erasable Programmable Read Only Memory

Intel, 1971
31
EEPROM and FLASH Technology
Electrically Erasable Programmable Read Only Memory

EEPROM: erasable word by word


FLASH: erasable by block or by device

Erasure by Fowler-Nordheim tunneling
32


SRAM-Based Devices

Multi-transistor SRAM cell


Keeps configuration only while powered
33
Programming a 3-bit wide LUT

34
Summary of Technologies

Rad-tolerant
secure

Rad-tolerant
(e.g. ALICE)
Used in most
FPGAs

35
Design Considerations (SRAM Config.)

36
Configuration at power-up

Flash PROM: stores a single design or multiple designs
Serial bit-stream (may be encrypted) from the Flash PROM to the FPGA (SRAM based)

Typical FPGA configuration time: milliseconds


37
Programming via JTAG
Joint Test Action Group

(Figure: JTAG connector, Flash PROM and SRAM-based FPGAs on the JTAG chain)

JTAG is a serial bus that can be used to
- Program Flash PROMs
- Program FPGAs
- Read / write the status of all FPGA I/Os (= boundary scan)
38
Remote programming

(Figure: Flash PROM and SRAM-based FPGAs on a JTAG bus, driven by an FPGA with a PCI or VME interface)

The JTAG bus may be driven by an FPGA which contains an interface to a host PC via PCI or VME.

The gateware can then be updated remotely.


39
Major Manufacturers
— AMD Xilinx
— First company to produce FPGAs in 1985
— About 55% market share, today
— SRAM based CMOS devices
— Bought by AMD in 2022

— Intel FPGA (formerly Altera)


— About 35% market share
— SRAM based CMOS devices

— Microchip (Microsemi, Actel)
  — Anti-fuse FPGAs
  — Flash based FPGAs
  — Mixed signal

— Lattice Semiconductor
  — SRAM based with integrated Flash PROM
  — Low power
40
Trends

41
Ever-decreasing feature size
— Higher capacity
— Higher speed
— Lower power consumption

130 nm: Xilinx Virtex-2 (widely used at LHC startup)
28 nm: Xilinx Virtex-7 / Altera Stratix V
16 nm: Xilinx UltraScale+
14 nm: Intel Stratix 10
7 nm (2019): Xilinx Versal ACAP(*)
(*) Adaptive Compute Acceleration Platform
42
Trends
— Speed of logic increasing
— Look-up tables with more inputs (5 or 6)
— Speed of serial links increasing (multiple Gb/s)
— More integrated memory
  — Integrated High Bandwidth Memory (HBM) in-package
  — 10x faster than DDR4 (Xilinx: up to 8 GB, Intel: up to 16 GB)
— Additional flip-flops in routing resources (Intel Hyperflex)
— More and more hard macro cores on the FPGA
  — PCI Express
    — Gen2: 5 GT/s per lane (8b/10b encoding)
    — Gen3: 8 GT/s per lane, 128b/130b encoding (typically up to 16 lanes)
    — Gen4: 16 GT/s per lane (128b/130b encoding)
  — 10 Gb/s, 40 Gb/s, 100 Gb/s Ethernet, 150 Gb/s Interlaken
— Sophisticated soft macros
  — CPUs
  — Gb/s MACs
  — Memory interfaces (DDR2/3/4)
— Processor-centric architectures (see next slide)
43


System-On-a-Chip (SoC) FPGAs

Xilinx Zynq

Intel Stratix 10 SoC

CPU(s) + peripherals + FPGA in one package
44


FPGA – ASIC comparison

FPGA
— Rapid development cycle (minutes / hours)
— May be reprogrammed in the field (gateware upgrade)
  — New features
  — Bug fixes
— Low development cost
  — You can get started with a development board (< $100) and free software
— High-end FPGAs rather expensive

ASIC
— Higher performance
  — Speed, area, power
— Analog designs possible
— Better radiation hardness
— Long development cycle (weeks / months)
— Design cannot be changed once it is produced
— Extremely high development cost
  — ASICs are produced at a semiconductor fabrication facility (“fab”) according to your design
— Lower cost per device compared to FPGA, when large quantities are needed
45
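The last ASIC point can be made quantitative with a simple break-even estimate. Assume (symbols introduced here for illustration) a one-time ASIC development cost C_NRE, per-device prices c_FPGA > c_ASIC, and a negligible FPGA development cost. For N devices:

```latex
C_\text{FPGA}(N) = N\,c_\text{FPGA}, \qquad
C_\text{ASIC}(N) = C_\text{NRE} + N\,c_\text{ASIC}, \qquad
N^{*} = \frac{C_\text{NRE}}{c_\text{FPGA} - c_\text{ASIC}}
```

Above N* devices the ASIC is cheaper in total; below it, the FPGA is. Development time, risk and the value of reprogrammability are not captured by this estimate.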
FPGA development

46
Design entry

Schematics
— Graphical overview
— Can draw entire design
— Use pre-defined blocks

Hardware description language: VHDL, Verilog
— Can generate blocks using loops
— Can synthesize algorithms
— Independent of design tool
— May use tools used in SW development (git, CI …)

    library ieee;
    use ieee.std_logic_1164.all;

    -- Entity (interface) only: the vector ports are unconstrained and
    -- take their width from the design that instantiates DelayLine.
    entity DelayLine is
      generic (
        n_halfcycles : integer := 2);
      port (
        x         : in  std_logic_vector;
        x_delayed : out std_logic_vector;
        clk       : in  std_logic);
    end entity DelayLine;

Mostly a personal choice depending on previous experience
47
Schematics

48
Hardware Description Language
— Looks similar to a programming language
— BUT be aware of the difference
— Programming Language => translated into machine
instructions that are executed by a CPU
— HDL => translated into gateware (logic gates & flip-flops)

— Common HDLs
— VHDL
— Verilog
— Newer trends
— High Level Synthesis (HLS) from C/C++
— Other C-like languages (Handel-C, SystemC)
— LabVIEW
49
Example: VHDL
(Figure: VHDL code for asynchronous and synchronous logic)
Asynchronous logic: all signals in the sensitivity list
Synchronous logic: only clock (and reset) in the sensitivity list
— Looks like a programming language
— All statements are executed in parallel, except inside processes

50
Schematics & HDL combined

51
Design flow

Design entry
— C/C++ via High Level Synthesis
— IP Integrator
— Schematics, VHDL / Verilog: state machines, counters, FIFOs, …
— Commercial Intellectual Property (IP) cores: processors, interfaces, controllers, …
Constraints: pins, timing, area, …

Flow
— Behavioral simulation of the Register Transfer Level (RTL) model
— Synthesis
— Implementation: map, place & route
— Static timing analysis, timing simulation
— Programming file
52
Floorplan (Xilinx Virtex 2)

53
Manual Floor planning

— For large designs, manual floor planning may be necessary
(Figure: routing congestion in a Xilinx Virtex 7 design (Vivado))
54
Simulation

55
Embedded Logic Analyzers

A great tool for debugging your design


56
FPGA applications
in the Trigger & DAQ domain

57
First-Level Trigger at a Collider
Timing: beam crossings (LHC: every 25 ns)
— The detector sends coarse-grain data to the first-level trigger (pipelined logic)
— The full (fine-grain) data wait in a delay FIFO for N beam crossings
— Fixed latency (= processing time of the first-level trigger)
— Trigger decision YES / NO for every beam crossing
— On a YES decision, the data are passed on through a de-randomizer FIFO
— The latency should be short in order to limit the length of the delay FIFOs
58
Pipelined Logic
(Figure: stages of combinatorial logic separated by flip-flops)
Processing data from beam crossing 4 | Processing data from beam crossing 3 | Processing data from beam crossing 2 | Trigger decision for beam crossing 1 | …
Flip-flops clocked with the same clock as the collider
59
Pipelined Logic – a clock cycle later
(Figure: stages of combinatorial logic separated by flip-flops)
Processing data from beam crossing 5 | Processing data from beam crossing 4 | Processing data from beam crossing 3 | Trigger decision for beam crossing 2 | …
Flip-flops clocked with the same clock as the collider
60
Why are FPGAs ideal for First-Level Triggers?
— They are fast
  — Much faster than discrete electronics (shorter connections): low latency
— Many inputs
  — Data from many parts of the detector have to be combined
— All operations are performed in parallel
  — Can build pipelined logic: high performance
— They can be re-programmed
  — Trigger algorithms can be optimized
61
Trigger algorithms implemented in FPGAs
— Trigger
— Peak finding
— Pattern Recognition
— Track Finding
— Clustering / Energy summing
— Topological Algorithms (invariant mass)
— Vertex Finding
— Particle flow (reconstruction of jets etc. from individual particles)
— Inference with Neural Networks
— Many more …

— Trigger Control system


— Fast (busy) signal merging & monitoring
— Generation of random triggers
— Generation of calibration sequences
— Automatic recovery sequences
— Monitoring (dead times, rates, …)
62
CMS Run-1 Global Muon Trigger
— Input: ~1000 bits @ 40 and 80 MHz
— Output: ~50 bits @ 80 MHz
— Processing time: 250 ns
— Pipelined logic: one new result every 25 ns
— 10 Xilinx Virtex-II FPGAs
  — Up to 500 user I/Os per chip
  — Up to 25000 LUTs per chip used
  — Up to 96 x 18 kbit RAM used
— In use in the CMS trigger 2008-2015

— The CMS Global Muon Trigger received 16 muon candidates from the three muon systems of CMS
— It merged different measurements for the same muon and found the best 4 overall muon candidates
63
CMS Run-1 Global Muon Trigger main FPGA

64
Micro-TCA board for the Run-2 & Run-3 CMS trigger based on Virtex 7
(Figure: MP7 board with optical Rx / Tx modules, 36 x 10 Gb/s = 360 Gb/s)

MP7, Imperial College
— Virtex 7 with 690k logic cells
— 80 x 10 Gb/s transceivers
— 72 of them as optical links on the front panel
— 0.75 + 0.75 Tb/s
— Being used in the CMS trigger since 2015

Input/output: up to 14k bits per 40 MHz clock, bi-directional
Same board used for different functions (different gateware)
Separation of framework and algorithm firmware
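The quoted "up to 14k bits per 40 MHz clock" can be checked from the link count, assuming the 72 front-panel links run at 10 Gb/s with 8b/10b encoding (8 payload bits per 10 line bits; the encoding is an assumption, not stated on the slide):

```latex
\frac{72 \times 10\ \text{Gb/s} \times \frac{8}{10}}{40\ \text{MHz}}
  = \frac{576 \times 10^{9}\ \text{bit/s}}{40 \times 10^{6}\ \text{s}^{-1}}
  = 14\,400\ \text{bits per clock}
```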
CMS ATCA Trigger boards for HL-LHC
(Figure: Serenity (UK) and APX (US) boards, 120 x 25 Gb/s)
— Few types of generic boards, ATCA standard
— Xilinx Virtex / Kintex UltraScale+ FPGAs
— 25-28 Gb/s optical links
— SoC FPGAs used for board control (on some boards)
— Advanced firmware algorithms
  — Vertex finding
  — Particle flow
  — Neural network classifiers
66
Neural Networks in Trigger
(Figure: neural network with one or many hidden layers. By Glosser.ca, own work, derivative of File:Artificial neural network.svg, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=24913461)

— Principle
  — Each node is assigned a value by applying an activation function to the weighted sum of the nodes in the previous layer
  — Maps well to DSP resources in the FPGA (multiplier + adder)
— Applications
  — Jet classification
  — Assignment of transverse momentum based on many measurements
  — …
— Tools
  — Many commercial tools
  — hls4ml (optimized for latency)
    — Firmware generation from a high-level model using Vivado HLS
67
FPGAs in Data Acquisition
— Frontend & Backend Electronics
— Pedestal subtraction
— Zero suppression
— Compression
— …

— Custom data links


— E.g. SLINK-64 over copper
— Several serial LVDS links in parallel
— Up to 400 MB/s
— SLINK Express over optical fibre
— lpGBT
— Interface from custom hardware to commercial electronics
— PCI/PCIe, VME bus, Myrinet, 10/40/100 Gb/s Ethernet etc.
68
C-RORC (ALICE) / Robin NP (ATLAS) for Run-2
— Xilinx Virtex-6 FPGA
— Custom data link in: SLINK (ATLAS), DDL (ALICE)
— Direct Memory Access transfer to host memory
69
CMS Front-end Readout Link (Run-1)
— SLINK Sender Mezzanine Card: 400 MB/s (interface to custom backend electronics; custom data link out)
  — 1 FPGA (Altera)
  — CRC check
  — Automatic link test
— Front-end Readout Link Card (custom data link in; interface to commercial hardware)
  — 1 main FPGA (Altera)
  — 1 FPGA as PCI interface
  — Custom Compact PCI card
  — Receives 1 or 2 SLINK64
  — 2nd CRC check
  — Monitoring, histogramming
  — Event spy
— Commercial Myrinet Network Interface Card on internal PCI bus
70
CMS Readout Link for Run-2&3 (in use since 2015)
— Commercial data link out: 10 Gb/s TCP/IP
— Myrinet NIC replaced by custom-built card (“FEROL”)
— Cost-effective solution (need many boards): rather inexpensive FPGA + commercial chip to combine 3 Gb/s links to 10 Gb/s
— Custom data link in: SLINK-64 input, LVDS / copper

FEROL (Front End Readout Optical Link)
Input: 1x or 2x SLINK (copper), 1x or 2x 5 Gb/s optical, or 1x 10 Gb/s optical
Output: 10 Gb/s Ethernet optical, TCP/IP sender in FPGA
71
CMS Readout Link for Run-2&3 (in use since 2015)
— Commercial data link out: 10 Gb/s TCP/IP
— Custom data links in: SLINK-64 input (LVDS / copper), 1 or 2 x 5 Gb/s SLINK Express, 10 Gb/s SLINK Express

FEROL (Front End Readout Optical Link)
Input: 1x or 2x SLINK (copper), 1x or 2x 5 Gb/s optical, or 1x 10 Gb/s optical
Output: 10 Gb/s Ethernet optical, TCP/IP sender in FPGA
72
PCIe40 – LHCb and ALICE Run-3
— Custom data link in
— Clock, control
— Direct Memory Access transfer to host memory
(Figure: J.P. Cachemiche, ACES 2018)
73
CMS DTH (DAQ and Timing Hub) for HL-LHC
(Figure: DTH prototype 2. Main board with DAQ FPGA and Zynq SoC FPGA for control, plus rear transition module; custom data links in, commercial data link (TCP/IP) out, clock & control uplink, clock & control distribution via backplane)
— ATCA board using Xilinx Virtex UltraScale+ FPGAs
— One or two DAQ units per board
— Up to 24 inputs at 25 Gb/s
— 5x 100 Gb/s Ethernet to commercial network
— TCP/IP in FPGA
— Board contains switch for control network
74
FPGAs in other domains
— ASIC prototyping
— Advanced Driver Assistance Systems (image processing)
— Speech recognition
— Cryptography
— Aerospace / defense
— 5G wireless
— Medical imaging
— Bioinformatics (genome sequencing)
— (Bitcoin mining)
— Financial
— Compute accelerators
  — Accelerator cards
  — Also available in the cloud
— Inferencing
— Video transcoding
— …
75
Lab Session 5: Programming an FPGA

You are going to design the digital electronics inside this FPGA !
76
Lab Session 13: System-on-a-chip FPGA

PYNQ-Z2 board
Xilinx Zynq with dual-core ARM

Design the digital electronics and software in this SoC FPGA!


77
Thank you!

78
Acknowledgement
— Parts of this lecture are based on material by Clive Maxfield, author
of several books on FPGAs. Many thanks for his kind permission to
use his material!

Re-use
— Re-use of the material is permitted only with the written
authorization of both Hannes Sakulin ([email protected])
and Clive Maxfield.

79
