HLS_Tutorial

The document provides an introduction to High-Level Synthesis (HLS) and its application in programming Field Programmable Gate Arrays (FPGAs) using C/C++. It covers the basics of HLS, the workflow for developing HLS projects, and optimization techniques such as pipelining and unrolling to enhance performance. Additionally, it includes practical tips for setting up projects and testing synthesized designs.


Introduction to HLS

Simone Bologna
[email protected]

University of Bristol

23 October 2019
Outline

● Introduction to FPGA, VHDL, and HLS


● Getting started with HLS
● Life of a toy project from conception to (almost) implementation
● Tips and tricks
● Using C++ constructs in Vivado HLS

Introduction to HLS, Simone Bologna - 23 October 2019 2/42


Introduction
Field Programmable Gate Arrays (FPGA)

● FPGAs are integrated circuits that can be programmed in the field
● FPGAs are powerful and flexible devices
● Components of an FPGA
– Flip-Flops (FF), small memory components able to store one bit
● Typically used as fast registers to store data
– Look-Up Tables (LUT), small memories used to store truth tables and
perform logic functions
● Typically used to perform operations such as “and”, “or”, sums, or
subtractions
– Digital Signal Processors (DSP), small processors able to quickly perform
mathematical operations on streaming digital signals
● Typically used for multiplications and additions
– Block RAM (BRAM), memory able to store data
● Can store a fair amount of data, but is slow and has a limited number of
ports, which limits memory throughput
VHDL and HLS

● What is VHDL?
– VHSIC Hardware Description Language
● VHSIC: Very High Speed Integrated Circuit
– … ergh…
– Used to describe, via code, circuits that will be implemented on an FPGA
– Not covered here!
● High-Level Synthesis (HLS) enables users to transform (synthesise)
C/C++/SystemC code into VHDL
– Enables users to program FPGAs in high-level languages!
– Focusing on C++
● Analogies with assembly and high-level languages are stretched
– Each language works better in specific situations


When to use each language?

● Collecting here opinions I have heard from various experts
● When to use VHDL?
– When you want full control over how your design is going to be
implemented
– When your application depends on exact clock-cycle timing
● e.g. receiving data and holding it for three clock cycles
– Receiving data and sorting it in specific manners
● When to use HLS?
– Rapid prototyping
● When in doubt, I would suggest using it: implementing something in HLS
will generally take less time than in VHDL
– Designing a processing/analysis block
● e.g. developing a particle identification algorithm


Getting started
HLS in Bristol

● excession.phy.bris.ac.uk is the FPGA development machine


● Two strategies to develop in HLS:
– Write code in your favourite editor and use Vivado HLS’ command line
interface (CLI)
– Use Vivado HLS’s GUI to do both editing and synthesis
● Vivado HLS’ command line does not provide all the tools
– Vivado HLS GUI is required when you need to investigate design
performance in detail
● Using editor + Vivado HLS CLI here
● I recommend using VNC to log into excession if you want to use the
Vivado HLS GUI
– Feel free to ask me for help setting it up :)


How to begin

● Get an account on excession


● Add Vivado to your environment
– source /software/CAD/Xilinx/2018.2/Vivado/2018.2/settings64.sh

– 2019.1 is available, I started on 2018.2 and I am keeping it for consistency


● Run vivado_hls in your terminal to open the vivado_hls GUI
– Use if you have mounted /software locally or if you are working via VNC
● vivado_hls -i opens the interactive TCL shell
– Development tools through command line
● vivado_hls <.tcl file> runs a .tcl script
– Typically I use it to build my firmware and test it
● Back to tcl in a sec
● vivado_hls -f <.tcl file> runs a .tcl script and keeps the console open
– Useful for .tcl scripts that set up your project before running some
interactive operation
Terminology

● HLS file, C/C++ code that will be synthesised and run on FPGA
● Test bench (TB) file, C/C++ code that is run to test the HLS code. It
calls the HLS functions and can run tests on their output, e.g. C
asserts.
● Tcl scripts, set of tcl instructions executed by the Vivado HLS shell
● Synthesis, C/C++ → HDL (VHDL/Verilog)
● Project, collection of HLS and test bench (TB) files
– Has a top-level function name that is the starting point for synthesis
● Solution, specific implementation of a project
– Runs on a specific device at a specific clock frequency
● C simulation, HLS + TB files are compiled with gcc against the HLS
headers and libraries and run like any other executable
● C/RTL cosimulation, the synthesised HLS code is run on an HDL simulator and
its results are checked by the C/C++ test bench
Setting up your first project

● In a basic project you will typically have
– At least one HLS .c/.cpp file
– A header used to link the HLS code to the test bench code
– At least one TB .c/.cpp file
– A .tcl script to set up your Vivado HLS project and solution


General workflow

● Define the problem
● Define your inputs and outputs
– They translate into the parameters of your HLS top-level function
● Write your code
● Test your C++ code
● Synthesise, i.e. convert to VHDL code
– Optimise it to get the desired performance while staying within your hardware limits
● Test the synthesised design
● Export the design, typically in Vivado IP (Intellectual Property) format
● Implement it in Vivado on an actual FPGA


Building and optimising a project
Our problem

● Problem definition:
– We want to design a high-throughput vector adder and multiplier
● Throughput: amount of data passing through the design per unit of time
● Input & output definition
– We receive two 100-dimensional vectors of 16-bit signed integers
– We output a 100-dimensional vector of 16-bit signed integers as the sum
and an additional 16-bit integer as the product
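The inputs and outputs defined here map directly onto the parameters of a top-level function. The following is only a sketch under stated assumptions: the function name vectorAddMul and the parameter names outVector and outProduct are hypothetical, int16_t from <cstdint> stands in for a 16-bit signed integer, and "the product" is assumed to be the scalar product wrapped to 16 bits. The loop labels sumLoop and productLoop follow the names used later in these slides.

```cpp
#include <cassert>
#include <cstdint>

const int VECTOR_SIZE = 100;  // vector length from the problem definition

// Hypothetical top-level function: every parameter becomes a port.
// Assumption: the product is the scalar product, wrapped to 16 bits.
void vectorAddMul(const int16_t inVector1[VECTOR_SIZE],
                  const int16_t inVector2[VECTOR_SIZE],
                  int16_t outVector[VECTOR_SIZE],
                  int16_t &outProduct) {
  int16_t product = 0;
  sumLoop: for (int i = 0; i < VECTOR_SIZE; i++) {
    outVector[i] = static_cast<int16_t>(inVector1[i] + inVector2[i]);
  }
  productLoop: for (int i = 0; i < VECTOR_SIZE; i++) {
    product = static_cast<int16_t>(product + inVector1[i] * inVector2[i]);
  }
  outProduct = product;
}

// Small usage example with inputs whose results are known in advance.
void checkVectorAddMul() {
  int16_t a[VECTOR_SIZE], b[VECTOR_SIZE], sum[VECTOR_SIZE];
  int16_t product = 0;
  for (int i = 0; i < VECTOR_SIZE; i++) { a[i] = 1; b[i] = 2; }
  vectorAddMul(a, b, sum, product);
  for (int i = 0; i < VECTOR_SIZE; i++) assert(sum[i] == 3);
  assert(product == 200);  // 100 elements, each contributing 1 * 2
}
```

Labelling the loops costs nothing in C++ and makes them easy to find in the synthesis report.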



Write up your code

Code time!

https://github.com/simonecid/VivadoTutorial
Testing

● Before optimising your design, you need a reliable way to check
that it works as expected
● Test bench!
– C++ code that runs your HLS function on a defined set of inputs for
which you already know the output
● e.g. two vectors you know the sum and product of
● Having a test bench that runs through tests is extremely beneficial
– You can use it to keep checking that your code still works
after you have altered it
● After going through synthesis you might want to redesign parts of it in
order to better suit your needs or optimise it
● A typical test runs the function and checks its results via C asserts
– More extensive and sophisticated unit test libraries, e.g. CppUnit, are
available, but let’s keep it simple :)
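An assert-based test bench of the kind described here could look roughly like the sketch below. The function under test, vectorAddMul, is a hypothetical stand-in and not necessarily the code in the tutorial repository. The test body is written as runTestBench() so that the snippet stays a plain library file; in a real test bench file it would be the body of main().

```cpp
#include <cassert>
#include <cstdint>

const int VECTOR_SIZE = 100;

// Stand-in for the HLS function under test (hypothetical name and signature).
void vectorAddMul(const int16_t in1[VECTOR_SIZE], const int16_t in2[VECTOR_SIZE],
                  int16_t outVector[VECTOR_SIZE], int16_t &outProduct) {
  int16_t product = 0;
  for (int i = 0; i < VECTOR_SIZE; i++) {
    outVector[i] = static_cast<int16_t>(in1[i] + in2[i]);
    product = static_cast<int16_t>(product + in1[i] * in2[i]);
  }
  outProduct = product;
}

// Test body: feeds known inputs and checks the outputs with C asserts.
int runTestBench() {
  int16_t in1[VECTOR_SIZE], in2[VECTOR_SIZE], sum[VECTOR_SIZE];
  int16_t product = 0;
  int expectedProduct = 0;
  for (int i = 0; i < VECTOR_SIZE; i++) {
    in1[i] = static_cast<int16_t>(i);
    in2[i] = -2;
    expectedProduct += -2 * i;  // ends at -9900, which fits in 16 bits
  }
  vectorAddMul(in1, in2, sum, product);
  for (int i = 0; i < VECTOR_SIZE; i++) {
    assert(sum[i] == i - 2);
  }
  assert(product == expectedProduct);
  return 0;  // a test bench main() returns 0 to report success
}
```

Choosing inputs whose expected outputs can be computed independently of the function under test keeps the check honest.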



Testing

● Add test bench files with
– add_files -tb "FILE"
● Run your test bench with
– csim
● Abbreviation of csim_design


Synthesis

● If the design is working and has been tested, you can proceed with the
synthesis
– Run csyn (abbreviation of csynth_design)
● Vivado HLS synthesises VHDL and Verilog (another HDL) from
your C++ code
● Synthesis starts from a top-level function, declared in your .tcl file with
set_top
● Parameters of the top-level function are translated into ports; by
default:
– N-bit variables are translated into STD_LOGIC_VECTORs, i.e. arrays of 1-bit
ports
– Structs and classes are converted to ports by creating a port for each
of their attributes
– Arrays are translated into ports able to read from an external memory
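To make the default mapping concrete, here is a small sketch of a top-level signature with one parameter of each kind. The struct Hit, the function weightedEnergy, and all parameter names are hypothetical; the comments only restate the default rules listed in these slides and make no claim about the exact port names Vivado HLS generates.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical struct: by default each attribute gets its own port.
struct Hit {
  int16_t energy;
  int16_t position;
};

// Hypothetical top-level function.
//   threshold -> a 16-bit port (STD_LOGIC_VECTOR)
//   hit       -> one port per attribute (energy, position)
//   weights   -> a port reading from an external memory
int16_t weightedEnergy(int16_t threshold, Hit hit, const int16_t weights[4]) {
  if (hit.energy < threshold) {
    return 0;
  }
  // Keep the index within the 4-element array.
  return static_cast<int16_t>(hit.energy * weights[hit.position & 3]);
}

// Small usage example.
void checkWeightedEnergy() {
  const int16_t weights[4] = {1, 2, 3, 4};
  Hit hit = {10, 2};
  assert(weightedEnergy(5, hit, weights) == 30);  // 10 * weights[2]
  assert(weightedEnergy(20, hit, weights) == 0);  // below threshold
}
```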



Post-synthesis analysis

● After synthesis, HLS produces a report describing the performance of
your design under <ProjectName>/<SolutionName>/syn/report/ in
.rpt format, human readable, and .xml, useful for automated analysis
● Utilisation estimates: breakdown of resource usage
– Note: LUTs and FFs are typically overestimated, even by a factor of 2
● Clock estimate: gives an initial estimate of whether your design meets
the required clock period
– Note: the final clock can only be known after implementation on the actual device,
sometimes HLS really messes up
● Latency: minimum and maximum number of clocks to finish processing, may change if
you have variable-length loops
● Initiation Interval (II): number of clocks before new data can be processed
● Pipeline: whether the function has been pipelined (more on this soon)
● Loop breakdown: label your loops to make sure you can see and study their performance
– Trip count is the number of iterations of the loop


Post-synthesis analysis breakdown

● You can see how your resources are being used
● 1 DSP used by multiplication
● 75 for the sums
● 108 used for temporary memory
Optimising your design
● Base design throughput: 1.2 Gb/s
● Let’s work on improving this throughput
● Introducing three new concepts:
– Pipelining: enables an iteration of a function or a loop to start before the
previous one is over
● Increases throughput with a minimal increase in resource usage
– Unrolling: enables multiple iterations of a for loop to run in parallel, if
independent
● Greatly reduces latency and increases throughput
● Can have an impact on resource usage, depending on loop size
– Memory partitioning: splits arrays (implemented in BRAM1P/2P or as memory ports
by default) into single registers or ports, enabling fast parallel memory access
Pipelining
● Let’s partition the memories and pipeline the main body of the loops
– Partition in the body of the function where the variable or the parameter is
declared; in the main function (one pragma per array):
#pragma HLS array_partition variable=inVector1/2/3
● Breaks down the memory interface into single 16-bit ports
– Put this pragma in the loop body to pipeline it; in sumLoop and productLoop:
#pragma HLS pipeline
● Following pipelining and partitioning
– Latency: 802 → 206
– II: 802 → 206
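The placement of the two pragmas can be sketched as follows. The function and the output parameter names are hypothetical; only inVector1, sumLoop, and productLoop come from these slides. An ordinary C++ compiler ignores the HLS pragmas (possibly with a warning), so the behaviour in C simulation is unchanged.

```cpp
#include <cassert>
#include <cstdint>

const int VECTOR_SIZE = 100;

// Hypothetical top-level function with partitioned arrays and pipelined loops.
void vectorAddMul(const int16_t inVector1[VECTOR_SIZE],
                  const int16_t inVector2[VECTOR_SIZE],
                  int16_t outVector[VECTOR_SIZE],
                  int16_t &outProduct) {
  // One pragma per array, in the function where the array is declared.
#pragma HLS array_partition variable=inVector1
#pragma HLS array_partition variable=inVector2
#pragma HLS array_partition variable=outVector
  int16_t product = 0;
  sumLoop: for (int i = 0; i < VECTOR_SIZE; i++) {
#pragma HLS pipeline
    outVector[i] = static_cast<int16_t>(inVector1[i] + inVector2[i]);
  }
  productLoop: for (int i = 0; i < VECTOR_SIZE; i++) {
#pragma HLS pipeline
    product = static_cast<int16_t>(product + inVector1[i] * inVector2[i]);
  }
  outProduct = product;
}

// The pragmas must not change the results seen by the test bench.
void checkPipelinedVersion() {
  int16_t a[VECTOR_SIZE], b[VECTOR_SIZE], sum[VECTOR_SIZE];
  int16_t product = 0;
  for (int i = 0; i < VECTOR_SIZE; i++) { a[i] = 3; b[i] = 4; }
  vectorAddMul(a, b, sum, product);
  for (int i = 0; i < VECTOR_SIZE; i++) assert(sum[i] == 7);
  assert(product == 1200);  // 100 * (3 * 4)
}
```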

[Figure: throughput of the base design (1.2 Gb/s) and of the pipelined design (4.7 Gb/s)]


Unrolling

● Let’s unroll the loops
– Instead of instantiating the logic for a single iteration and executing it 100 times,
instantiate the logic for each iteration and execute them all in parallel
● Essentially you increase resource usage by a factor of 100
– DSP: 1 → 100
– Put this pragma in the loop body to unroll it; in sumLoop and productLoop:
#pragma HLS unroll
● Latency: 206 → 8
● II: 206 → 8
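What unrolling amounts to can be illustrated by comparing a loop carrying the unroll pragma with a hand-unrolled equivalent. This is only a sketch: the vector size is reduced to a hypothetical 4 elements to keep the hand-written version short, and both function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

const int N = 4;  // hypothetical small size, only to keep the example short

// Rolled loop with the unroll pragma: HLS instantiates the adder once per iteration.
void sumUnrolledByPragma(const int16_t a[N], const int16_t b[N], int16_t out[N]) {
  sumLoop: for (int i = 0; i < N; i++) {
#pragma HLS unroll
    out[i] = static_cast<int16_t>(a[i] + b[i]);
  }
}

// The same computation unrolled by hand: N independent additions
// with no loop counter, which hardware can execute in parallel.
void sumUnrolledByHand(const int16_t a[N], const int16_t b[N], int16_t out[N]) {
  out[0] = static_cast<int16_t>(a[0] + b[0]);
  out[1] = static_cast<int16_t>(a[1] + b[1]);
  out[2] = static_cast<int16_t>(a[2] + b[2]);
  out[3] = static_cast<int16_t>(a[3] + b[3]);
}

// Both versions must produce identical results.
void checkUnrolling() {
  const int16_t a[N] = {1, 2, 3, 4};
  const int16_t b[N] = {10, 20, 30, 40};
  int16_t byPragma[N], byHand[N];
  sumUnrolledByPragma(a, b, byPragma);
  sumUnrolledByHand(a, b, byHand);
  for (int i = 0; i < N; i++) assert(byPragma[i] == byHand[i]);
}
```

The additions are independent of one another, which is the condition for running them in parallel.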

[Figure: throughput of the base design (1.2 Gb/s), the pipelined design (4.7 Gb/s), and the unrolled design (120 Gb/s)]


Pipelining the top-level function

● The pipeline pragma pipelines the function in which it is located and
unrolls and pipelines every underlying loop
– If we place a pipeline pragma in the top-level function body, everything
will be unrolled and pipelined, maximising performance
● Latency: 8 → 8
● II: 8 → 1, data can be input every clock cycle, max. throughput
[Figure: throughput of the base design (1.2 Gb/s), the pipelined design (4.7 Gb/s), the unrolled design (120 Gb/s), and the fully pipelined design (960 Gb/s)]
Finishing touches

● Whenever you create a function, HLS creates a separate logic block
and connects it to the logic block of the main function
– Increases latency
– Prevents HLS from running optimisations that reduce resource usage
– In the function body (not top-level): #pragma HLS inline
● Inlines and integrates the sub-function into the calling one
● Latency: 8 → 7
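A minimal sketch of where the inline pragma goes, assuming a hypothetical helper multiplyAccumulate called from a hypothetical top-level function dotProduct. The pragma sits in the body of the sub-function, not in the caller.

```cpp
#include <cassert>
#include <cstdint>

const int N = 4;  // hypothetical small size for illustration

// Hypothetical sub-function: without the pragma, HLS would build it as a
// separate logic block connected to the caller.
int16_t multiplyAccumulate(int16_t accumulator, int16_t a, int16_t b) {
#pragma HLS inline
  return static_cast<int16_t>(accumulator + a * b);
}

// Hypothetical top-level function calling the helper once per element.
int16_t dotProduct(const int16_t a[N], const int16_t b[N]) {
  int16_t result = 0;
  productLoop: for (int i = 0; i < N; i++) {
    result = multiplyAccumulate(result, a[i], b[i]);
  }
  return result;
}

// Small usage example.
void checkDotProduct() {
  const int16_t a[N] = {1, 2, 3, 4};
  const int16_t b[N] = {5, 6, 7, 8};
  assert(dotProduct(a, b) == 70);  // 5 + 12 + 21 + 32
}
```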



Testing and exporting the synthesised design

● The synthesised design can be tested in an HDL
simulator using the C test bench
– Run cosim (abbreviation of cosim_design)
● First tests the C code, then the synthesised
design
● If everything looks good, you can export it for
actual implementation
– Using IP catalog here, but other formats are
available
– Final product of Vivado HLS
– export_design -format ip_catalog
● Exported design can be found in
<ProjectName>/<SolutionName>/impl/ip
● From here on it is Vivado’s domain, not covered
here, but you can load the IP and implement it
Tips and tricks
Various tips and tricks

● You can use C++11 and higher constructs, e.g. auto or constexpr:
add_files -cflags "-std=c++11" "<HLS_FILE>"
● Run thorough tests in software, do not be lazy like me!
– Debugging at later stages is just way harder and more confusing
● If you do not trust me, ask Aaron!
● Read the list of pragmas and experiment a lot with them
– array_partition, pipeline, and unroll accept options, study them!
– Pragmas try to bridge the gap between C++ and HLS, master them
● HLS likes ternary operators, if possible use them instead of if
statements!
[Figure: a ternary operator and its equivalent if statement]
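As an illustration of the ternary-operator tip, here is a hypothetical clamp function written both ways. It is not the example from the original slide, only a sketch showing that the two forms are equivalent.

```cpp
#include <cassert>
#include <cstdint>

// Ternary operator version (hypothetical example).
int16_t clampTernary(int16_t value, int16_t limit) {
  return (value > limit) ? limit : value;
}

// Equivalent if statement.
int16_t clampIf(int16_t value, int16_t limit) {
  if (value > limit) {
    return limit;
  }
  return value;
}

// Both forms must agree on every input.
void checkClamp() {
  for (int v = -5; v <= 5; v++) {
    const int16_t value = static_cast<int16_t>(v);
    assert(clampTernary(value, 2) == clampIf(value, 2));
  }
}
```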


Various tips and tricks
Splitting designs
● Big designs take long to synthesise
● Split your problem into smaller projects
● Each project can be exported in IP format
and then linked in a chain
● Saves lots of synthesis time
● Increases flexibility
– Blocks can be run at different clock
speeds
● Example: the jet trigger algorithm I work
on is made of three blocks
– Histogrammer
– Data buffer
– Jet finder
● Divide et impera reigns!
– History taught us that this strategy works!
Various tips and tricks
Scaling designs

● Your time is precious!
Do not waste it implementing large broken designs.
● Start small and write code that can be easily scaled up!
● For instance, let’s say you need to do some processing on a large
number of inputs
– Make the number of inputs a parameter of your code with a
#define NUMBER_OF_INPUTS XX
and make your code depend on it
– Do your initial testing on a scaled-down version of your code, i.e. with
few inputs, then scale it up
● It takes far less time to implement a smaller design
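A sketch of code that scales with NUMBER_OF_INPUTS. The value 8 and the function sumInputs are hypothetical; the point is that array sizes and loop bounds all derive from the single #define, so scaling up means changing one number.

```cpp
#include <cassert>
#include <cstdint>

#ifndef NUMBER_OF_INPUTS
#define NUMBER_OF_INPUTS 8  // hypothetical scaled-down value for early tests
#endif

// Hypothetical processing block: array size and loop bound both follow
// NUMBER_OF_INPUTS, so no other line changes when the design is scaled up.
int32_t sumInputs(const int16_t inputs[NUMBER_OF_INPUTS]) {
  int32_t total = 0;
  accumulateLoop: for (int i = 0; i < NUMBER_OF_INPUTS; i++) {
    total += inputs[i];
  }
  return total;
}

// The test also depends only on NUMBER_OF_INPUTS, so it scales with the design.
void checkSumInputs() {
  int16_t inputs[NUMBER_OF_INPUTS];
  for (int i = 0; i < NUMBER_OF_INPUTS; i++) inputs[i] = 2;
  assert(sumInputs(inputs) == 2 * NUMBER_OF_INPUTS);
}
```

The #ifndef guard is an optional extra that lets the value be overridden from the compiler flags without editing the source.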



Various tips and tricks
Getting more accurate estimates

● Final timing and resource usage results are only obtainable after
implementation
● Vivado HLS provides tools to implement the design without using Vivado
– Not sure how it works, I presume it makes some basic assumptions on
how you are going to place your design in an FPGA and implements it
● By running it you can get more accurate estimates of timing and
resource usage; although not final, they tend to be much closer
● Run export_design -format ip_catalog -evaluate vhdl
– This implements the VHDL design on the FPGA
– 10 minutes to run for the small test design, against XX for synthesis
– Results in <ProjectName>/<SolutionName>/impl/report/vhdl/




Various tips and tricks
HLS libraries

● FOR THE LOVE OF GOD DO NOT USE THE C/C++ STANDARD LIBRARY!
– I have heard it gives horrible results
● I do not even know how they managed to get HLS to synthesise it
● Do not reinvent the wheel!
– Vivado HLS has libraries doing many interesting things
● It is all in the manual
– For instance, #include <hls_math.h> for the HLS math library


Using C++ constructs
Using C++ constructs

Code time!

https://github.com/simonecid/VivadoTutorial/tree/cpp_version
Using C++ constructs

● Rewrote the vector adder and
multiplier as a generic
Vector class via templates
– Generic, flexible, easy to use
● N-dimensional
● Uses any type
● Same resource usage
● Clever usage of C++ constructs
provides great flexibility without
resource usage penalties
● Note:
– Partitioning of class attributes
must be invoked in constructors
– Inline every class method!
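A templated Vector class along these lines might be sketched as below. The class layout, the attribute name data, and the methods are hypothetical and not taken from the cpp_version branch; the sketch only shows the two notes in practice: the partition pragma in the constructor and an inline pragma in every method.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical generic vector: T is the element type, N the dimension.
template <typename T, int N>
class Vector {
 public:
  Vector() {
    // Partitioning of a class attribute is requested in the constructor.
#pragma HLS array_partition variable=data
    initLoop: for (int i = 0; i < N; i++) {
#pragma HLS unroll
      data[i] = T();
    }
  }

  T &operator[](int i) {
#pragma HLS inline
    return data[i];
  }

  Vector operator+(const Vector &other) const {
#pragma HLS inline
    Vector result;
    sumLoop: for (int i = 0; i < N; i++) {
#pragma HLS unroll
      result.data[i] = static_cast<T>(data[i] + other.data[i]);
    }
    return result;
  }

  // Scalar product, wrapped to the element type.
  T dot(const Vector &other) const {
#pragma HLS inline
    T product = T();
    productLoop: for (int i = 0; i < N; i++) {
#pragma HLS unroll
      product = static_cast<T>(product + data[i] * other.data[i]);
    }
    return product;
  }

 private:
  T data[N];
};

// Small usage example with a 4-dimensional vector of 16-bit integers.
void checkVector() {
  Vector<int16_t, 4> a, b;
  for (int i = 0; i < 4; i++) {
    a[i] = static_cast<int16_t>(i + 1);
    b[i] = 10;
  }
  Vector<int16_t, 4> sum = a + b;
  assert(sum[0] == 11);
  assert(sum[3] == 14);
  assert(a.dot(b) == 100);  // 10 + 20 + 30 + 40
}
```

Because the dimension and the type are template parameters, the same class covers the 100-dimensional 16-bit case and scaled-down test versions alike.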



Summary
● HLS enables users to write FPGA firmware in high-level languages
– More flexible and easier to use
● HLS pragmas can be used to produce high-throughput designs
– Pipeline functions, unroll loops, and partition memory
– Used them on a vector adder and multiplier
● The machine excession is available in Bristol for FPGA development
● Went through a number of tips and tricks
● Using C++ classes and templates does not affect resource usage while
improving code flexibility and ease of use
● Collection of my FPGA bookmarks on the next slide
● Contacts:
– [email protected]
– Skype: simonecid
– Office: 4.57
Useful links

● HLS guide by Xilinx,
https://www.xilinx.com/support/documentation/sw_manuals/xilinx2018_2/ug902-vivado-high-level-synthesis.pdf
● Optimisation in HLS by Xilinx,
https://www.xilinx.com/support/documentation/sw_manuals/xilinx2018_1/ug1270-vivado-hls-opt-methodology-guide.pdf
● Pipelining and unrolling tips,
https://www.xilinx.com/support/documentation/sw_manuals/xilinx2015_2/sdsoc_doc/topics/calling-coding-guidelines/concept_pipelining_loop_unrolling.html
● Parallelising function tip,
https://forums.xilinx.com/t5/Vivado-High-Level-Synthesis-HLS/How-to-set-the-two
● HLS tips,
https://fling.seas.upenn.edu/~giesen/dynamic/wordpress/vivado-hls-learnings/
● HLS pragma list,
https://www.xilinx.com/html_docs/xilinx2018_3/sdsoc_doc/hls-pragmas-okr1504034364623.html
● Introductory slides to HLS,
http://home.mit.bme.hu/~szanto/education/vimima15/heterogen_vivado_hls.pdf
● Improving performance in HLS,
http://users.ece.utexas.edu/~gerstl/ee382v_f14/soc/vivado_hls/VivadoHLS_Improving_Performance.pdf

A first approach to HLS, Simone Bologna - 12 March 2019 42/42
