
CS295: Modern Systems: What Are FPGAs and Why Should You Care

This document provides an overview of field-programmable gate arrays (FPGAs):
- FPGAs can be configured to act like any circuit and are commonly used for computation acceleration. Their hardware is not fixed like CPUs/GPUs and can be customized for specific applications.
- FPGAs offer fine-grained parallelism through specialized circuits optimized for applications. They can achieve GPU-level performance with higher power efficiency than CPUs/GPUs.
- Programming FPGAs involves hardware description languages like Verilog or high-level synthesis from C/C++. The FPGA is programmed by compiling the code to a configuration bitfile.

Uploaded by

Giovanni Orsari
Copyright
© All Rights Reserved

CS295: Modern Systems

What Are FPGAs and Why Should You Care

Sang-Woo Jun
Spring, 2019
What Are FPGAs
Field-Programmable Gate Array
Can be configured to act like any circuit – More later!
Can do many things, but we focus on computation acceleration
FPGAs Come In Many Forms

o PCIe-attached
o In-storage
o CPU-integrated
o In-network


How Is It Different From CPU/GPUs
GPU – The other major accelerator
CPU/GPU hardware is fixed
o “General purpose”
o we write programs (sequence of instructions) for them
FPGA hardware is not fixed
o “Special purpose”
o Hardware can be whatever we want
o Will our hardware require/support software? Maybe!
Optimized hardware is very efficient
o GPU-level performance**
o ~10x better power efficiency (e.g., 30 W instead of 300 W)
Analogy
CPU/GPU comes with fixed circuits
FPGA gives you a big bag of components to build whatever you want
o Could be a CPU/GPU!

“The Z-Berry”
“Experimental Investigations on Radiation Characteristics of IC Chips”
benryves.com “Z80 Computer”
Shadi Soundation: Homebrew 4 bit CPU
Fine-Grained Parallelism of Special-Purpose Circuits
Example: calculating gravitational force
Ret = (G × m1 × m2) / ((x1 - x2)^2 + (y1 - y2)^2)
8 instructions on a CPU → 8 cycles**
Much fewer cycles on a special-purpose circuit
4 cycles with basic operations
o Cycle 1: A = G × m1, C = x1 - x2, E = y1 - y2
o Cycle 2: B = A × m2, D = C^2, F = E^2
o Cycle 3: H = D + F
o Cycle 4: Ret = B / H
3 cycles with compound operations
o Cycle 1: A = G × m1 × m2, B = (x1 - x2)^2, C = (y1 - y2)^2
o Cycle 2: D = B + C
o Cycle 3: Ret = A / D
1 cycle with even further compound operations
o Ret = (G × m1 × m2) / ((x1 - x2)^2 + (y1 - y2)^2)
Compound operations may slow down the clock
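The cycle counts above follow from data dependencies: operations that do not depend on each other can run in the same cycle. The sketch below is only an illustration in software, not a circuit description. The function names are hypothetical, and the denominator sum is called `h` to keep it distinct from the gravitational constant `g`. Each commented stage groups operations that a special-purpose circuit could evaluate side by side.

```cpp
#include <cassert>
#include <cmath>

// Direct form: the whole expression as one compound operation
// (corresponds to the "1 cycle" variant).
double force_direct(double g, double m1, double m2,
                    double x1, double y1, double x2, double y2) {
    return (g * m1 * m2) /
           ((x1 - x2) * (x1 - x2) + (y1 - y2) * (y1 - y2));
}

// Staged form: statements within one stage are independent of each other,
// so hardware could compute them in parallel (the "4 cycles" variant).
double force_staged(double g, double m1, double m2,
                    double x1, double y1, double x2, double y2) {
    // Stage 1: three independent basic operations
    double a = g * m1;
    double c = x1 - x2;
    double e = y1 - y2;
    // Stage 2: each depends only on stage 1 results
    double b = a * m2;
    double d = c * c;
    double f = e * e;
    // Stage 3: combine the squared distances
    double h = d + f;
    // Stage 4: final division
    return b / h;
}
```

Both functions compute the same value; only the grouping of operations differs, which is what determines the cycle count in hardware.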
Coarse-Grained Parallelism of Special-Purpose Circuits
The typical unit of parallelism for general-purpose processors is the thread (~= one per core)
Special-purpose processing units can also be replicated for parallelism
o Large, complex processing units: Few can fit in chip
o Small, simple processing units: Many can fit in chip
Only hardware useful for the application is generated
o Instruction fetch? Decoding? Caches? Coherence? Only if the application needs them
How Is It Different From ASICs
ASIC (Application-Specific Integrated Circuit)
o Special chip purpose-built for an application
o E.g., ASIC bitcoin miner, Intel neural network accelerator
o Function cannot be changed once built, and building is expensive
+ FPGAs can be field-programmed
o Function can be changed completely whenever
o FPGA fabric emulates custom circuits
- Emulated circuits are not as efficient as bare-metal
o ASICs have ~10x the performance (larger circuits, faster clock)
o ASICs have ~10x the power efficiency
Basic FPGA Architecture
“Configurable logic block (CLB)”
o 6-input Look-Up Table (LUT)
o Flip-flop (FF) / latch: sequential circuit construction
Programmable I/O block
Programmable interconnect
Ex) 2-LUT for “AND”
Input 1   Input 2   Output
0         0         0
0         1         0
1         0         0
1         1         1
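A look-up table is just a stored truth table: the inputs select which stored bit appears at the output. The sketch below models the 2-LUT from the slide in software. The struct name `Lut2`, the bit packing (bit i holds the output for input combination i), and the XOR example are assumptions made for illustration; real FPGA LUTs are configured through the bitfile, not through code like this.

```cpp
#include <cassert>
#include <cstdint>

// A 2-input LUT stores one output bit per input combination (2^2 = 4 bits).
// Bit i of `config` is the output for the inputs whose binary encoding is i,
// with in1 as the high bit and in2 as the low bit.
struct Lut2 {
    std::uint8_t config;  // 4 configuration bits

    bool eval(bool in1, bool in2) const {
        unsigned index = (static_cast<unsigned>(in1) << 1) |
                         static_cast<unsigned>(in2);
        return (config >> index) & 1u;
    }
};

// AND: only the (1, 1) row outputs 1, so only bit 3 is set.
const Lut2 kAndLut{0x8};
// XOR: rows (0, 1) and (1, 0) output 1, so bits 1 and 2 are set.
const Lut2 kXorLut{0x6};
```

Changing the four configuration bits changes the logic function without changing the hardware, which is the sense in which the fabric is programmable.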
Basic FPGA Architecture – DSP Blocks
“DSP block”
CLBs act as gates: many are needed to implement high-level logic
Arithmetic operations are provided as efficient ALU blocks
o “Digital Signal Processing (DSP) blocks”
o Each block provides a multiplier (×) and an adder/subtractor (+/-)
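A multiplier feeding an adder is the shape of a multiply-accumulate (MAC) step, which is why such blocks suit dot products and filters. The following is a software sketch of that pattern only; the function names and the 32-bit inputs with a 64-bit accumulator are illustrative assumptions, not the operand widths of any particular DSP block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One multiply-accumulate step: the product feeds the adder.
std::int64_t mac(std::int64_t acc, std::int32_t a, std::int32_t b) {
    return acc + static_cast<std::int64_t>(a) * b;
}

// A dot product is a chain of MAC steps over paired elements.
// Extra elements in the longer vector are ignored.
std::int64_t dot(const std::vector<std::int32_t>& xs,
                 const std::vector<std::int32_t>& ys) {
    std::int64_t acc = 0;
    for (std::size_t i = 0; i < xs.size() && i < ys.size(); ++i) {
        acc = mac(acc, xs[i], ys[i]);
    }
    return acc;
}
```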
Basic FPGA Architecture – Block RAM
“Block RAM”
CLB can act as flip-flops
o (~1 bit/block) – tiny!
Some on-chip SRAM provided as blocks
o ~18/36 Kbit/block, MBs per chip
o Massively parallel access to data → multi-TB/s bandwidth
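The multi-TB/s figure comes from multiplying many small memories that can all be accessed in the same cycle. The arithmetic below is a worked example under loudly hypothetical parameters: the block count, port width, and clock frequency are assumptions chosen for round numbers, not values from the slides or from any datasheet.

```cpp
#include <cassert>
#include <cstdint>

// Aggregate bandwidth if every block is accessed once per clock cycle.
constexpr std::uint64_t aggregate_bits_per_second(std::uint64_t blocks,
                                                  std::uint64_t port_width_bits,
                                                  std::uint64_t clock_hz) {
    return blocks * port_width_bits * clock_hz;
}

// Hypothetical chip parameters (assumptions for illustration only).
constexpr std::uint64_t kBlocks = 2000;           // number of block RAMs
constexpr std::uint64_t kPortWidthBits = 36;      // bits read per block per cycle
constexpr std::uint64_t kClockHz = 300000000ULL;  // 300 MHz

// 2000 * 36 * 300e6 = 21.6 Tbit/s, which is 2.7 TB/s.
constexpr std::uint64_t kBytesPerSecond =
    aggregate_bits_per_second(kBlocks, kPortWidthBits, kClockHz) / 8;
```

A design only reaches this kind of number if its logic actually consumes data from all blocks every cycle.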
Basic FPGA Architecture – Hard Cores
Some functions are provided as efficient, non-configurable “hard cores”
o Multi-core ARM cores (“Zynq” series)
o Multi-Gigabit Transceivers
o PCIe/Ethernet PHY
o Memory controllers
o …
[Figure: FPGA with hard cores labeled Memory, Ethernet, ARM, and PCIe]
Example Accelerator Card Architecture
“FPGA Mezzanine Card” (FMC) expansion
o Network ports, memory, storage, PCIe, …
[Figure: Example Accelerator Card (VCU108). Labels: general-purpose I/O pins, multi-gigabit transceivers, FMC, 1GbE, 40GbE, DRAM (shown twice), FPGA, PCIe]
Programming FPGAs
Languages and tools overlap with ASIC/VLSI design
FPGA development for acceleration is typically done with either
o Hardware Description Languages (HDL): Register-Transfer Level (RTL) languages
o High-Level Synthesis: Compiler translates software programming languages to RTL
RTL models a circuit using:
o Registers (state), and
o Combinational logic (computation)
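The register/combinational split can be mimicked in software as a state struct plus a pure next-state function that is applied once per clock edge. The sketch below models a resettable accumulator this way. All names (`State`, `next_state`, `run`) are hypothetical, and this is a mental model rather than actual RTL.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Registers: state that holds its value between clock edges.
struct State {
    std::uint32_t sum;
};

// Combinational logic: a pure function of the current state and inputs.
// It has no memory of its own; it only computes the next register values.
State next_state(State cur, std::uint32_t in, bool reset) {
    return State{reset ? 0u : cur.sum + in};
}

// Simulation loop: each iteration stands for one clock edge, at which
// the registers are overwritten with the combinational result.
State run(State s, const std::vector<std::uint32_t>& inputs) {
    for (std::uint32_t in : inputs) {
        s = next_state(s, in, false);
    }
    return s;
}
```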
Hardware Description Language
Software programming languages describe a process
Hardware description languages describe structure

Software (C++): the queues exist in memory, and the loop becomes instructions for the CPU
    std::queue<float> input_queue;
    std::queue<float> output_queue;
    float factor;
    while (true) {
        if ( !input_queue.empty() ) {
            float ret = input_queue.front() * factor;
            output_queue.push(ret);
            input_queue.pop();
        }
    }

Hardware (Bluespec): the FIFOs exist on chip, and the rules create circuits
    FIFO#(Float) input_queue <- mkFIFO;
    FIFO#(Float) output_queue <- mkFIFO;
    Reg#(Float) factor <- mkReg;
    FloatMultIfc mult <- mkFloatMult;
    rule in;
        mult.enq(factor, input_queue.first);
        input_queue.deq;
    endrule
    rule out;
        let ret <- mult.result;
        output_queue.enq(ret);
    endrule
Major Hardware Description Languages
Verilog: Most widely used in industry
o Relatively low-level language supported by everyone
Chisel – Compiles to Verilog
o Relatively high-level language from Berkeley
o Embedded in the Scala programming language
o Prominently used in RISC-V development (Rocket core, etc)
Bluespec – Compiles to Verilog
o Relatively high-level language from MIT
o Supports types, interfaces, etc
o Also actively used in RISC-V development (Piccolo, etc)
High-Level Synthesis
Compiler translates software programming languages to RTL
High-Level Synthesis compilers from Xilinx, Altera/Intel
o Compile C/C++, annotated with #pragmas, into RTL
o Theory/history behind it is a complex can of worms we won’t go into
o Personal experience: needs to be HEAVILY annotated to get performance
o Anecdote: Naïve RISC-V in Vivado HLS achieves IPC of 0.0002 [1], 0.04 after optimizations [2]
OpenCL
o Inherently parallel language more efficiently translated to hardware
o Stable software interface

[1] http://msyksphinz.hatenablog.com/entry/2019/02/20/040000
[2] http://msyksphinz.hatenablog.com/entry/2019/02/27/040000
FPGA Compilation Toolchain
[Figure: compilation flow, left to right]
High-level HDL code → language compiler → Verilog/VHDL → synthesize → netlist → map/place/route → bitfile
o Alternative front end: high-level language vendor tool
o Functional simulation and cycle-level simulation
o Constraint file: “Which transceiver instance should top_transceiver_01 map to?” And so, so much more…
o FPGA vendor toolchain (few open source)
Programming/Using an FPGA Accelerator
Bitfile is programmed to FPGA over “JTAG” interface
o Typically used over USB cable
o Supports FPGA programming, limited debugging access, etc
PCIe-attached FPGA accelerator card is typically used similarly to GPUs
o Program FPGA, execute software
o Software copies data to the FPGA board and notifies the FPGA
-> FPGA logic performs computations
-> Software copies data back from the FPGA
FPGA flexibility gives immense freedom of usage patterns
o Streaming, coherent memory, …
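The GPU-like usage pattern above (copy in, notify, compute, copy back) can be sketched from the host's point of view. Everything here is hypothetical: `SimulatedAccelerator` is a plain C++ stand-in written for this example, its method names are invented, and a real card would be driven through a vendor driver or runtime rather than a class like this. The "FPGA logic" simply scales each value by a factor.

```cpp
#include <cassert>
#include <vector>

// Hypothetical software stand-in for a PCIe-attached FPGA board.
class SimulatedAccelerator {
public:
    // Constructing the object stands for programming the FPGA.
    explicit SimulatedAccelerator(float factor) : factor_(factor) {}

    // Host -> board copy.
    void copy_to_device(const std::vector<float>& host) { device_in_ = host; }

    // Notify the FPGA; here the "logic" runs immediately.
    void start() {
        device_out_.clear();
        for (float v : device_in_) {
            device_out_.push_back(v * factor_);
        }
    }

    // Board -> host copy.
    std::vector<float> copy_from_device() const { return device_out_; }

private:
    float factor_;
    std::vector<float> device_in_;
    std::vector<float> device_out_;
};

// Host-side flow in the same order as the steps on the slide.
std::vector<float> offload_scale(const std::vector<float>& input, float factor) {
    SimulatedAccelerator fpga(factor);  // program FPGA
    fpga.copy_to_device(input);         // software copies data to the board
    fpga.start();                       // notify FPGA, logic computes
    return fpga.copy_from_device();     // software copies data back
}
```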
Partial Reconfiguration

[Figure: FPGA divided into sub-components]
Parts of the FPGA can be swapped out dynamically without turning off the FPGA
o Physical area is drawn on chip
Used in Amazon F1, etc
Toolchain support for isolation
FPGAs In The Cloud
Amazon EC2 F1 instance (1 – 4 FPGAs)
Microsoft Azure, etc…
