HLS Tips and Tricks

The document provides tips and tricks for using Vivado HLS, focusing on coding techniques, design steps, and performance optimization strategies. It emphasizes the importance of methodologies such as pipelining, loop unrolling, and dataflow for enhancing system performance in FPGA designs. The presentation also discusses the growing acceptance of Vivado HLS in the industry, particularly for deep learning applications on FPGAs.

Vivado HLS – Tips and Tricks

Presented By

Frédéric Rivoallon
Marketing Product Manager
October 2018

© Copyright 2018 Xilinx


Vivado HLS

˃ Abstracted C-based descriptions
Algorithms in C / C++ / OpenCL
Coding techniques: micro-architecture, RAM adaptation, data type optimization
Design steps: C simulation, C synthesis, co-simulation

˃ Higher productivity
Concise code
Automated RTL verification (co-sim)
Interface synthesis (AXI I/F)
Optimized libraries (open-source Vivado HLS Library)
Fast C simulation

Flow: C/C++ → Vivado HLS → RTL IP → IP Integrator (platform awareness, AXI-4 interface integration) → Vivado synthesis, P&R


Vivado HLS Acceptance Grows…

˃ 5,000+ papers since 2014!
˃ High demand for deep learning accelerators on FPGAs
˃ Software-programmable FPGA SoCs become available

[Graph: HLS-related publications per year, 2000–2017 (peak value shown: 1,370). Based on graph from Cornell University.]
Factors for Overall System Performance

˃ Platform (fixed performance…)
Off-chip memory (e.g. DDR), data links (e.g. PCIe)
Connectivity IPs, typically Xilinx IPs

˃ Compute customization (malleable performance… data processing in RTL or HLS)
Micro-architecture, parallelism, operators

˃ Memory adaptation
On-chip memory, shift registers, piping

˃ Datatype optimization
Customized data types (adjusted to requirement)


Identify the Performance Challenge

˃ Compute-bound or memory-bound?

˃ What kind of parallelism is required?

Algorithm examples: Cornell University Rosetta benchmarks – http://www.csl.cornell.edu/~zhiruz/pdfs/rosetta-fpga2018.pdf
Proceed Methodologically

Adjusting C code and pragmas to find the “right” micro-architecture is a major design step…

˃ 5 steps to design closure – UG1197 (Chapter 4), the UltraFast High-Level Productivity Design Methodology Guide (Design Hub):

• Define interfaces and data packing
• Define loop trip counts
• Pipeline and Dataflow (compute instruction- and task-level parallelism)
• Partition memories and ports
• Remove false dependencies
• Optionally recover resources through sharing
• Fine-tune operator sharing and constraints
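As a hedged illustration of how those steps typically surface in code (the function name, sizes, and pragma placement below are assumptions for the sketch, not taken from UG1197):

```c
#include <assert.h>

#define SZ 16

/* Illustrative skeleton only: each pragma maps to one methodology step.
   A standard C compiler ignores the HLS pragmas, so the function can be
   checked in plain C simulation first. */
void top_sketch(int in[SZ], int out[SZ]) {
#pragma HLS INTERFACE ap_fifo port=in                     /* define interfaces  */
#pragma HLS INTERFACE ap_fifo port=out
    int buf[SZ];
#pragma HLS ARRAY_PARTITION variable=buf cyclic factor=2  /* partition memories */
    stage: for (int i = 0; i < SZ; i++) {                 /* known trip count   */
#pragma HLS PIPELINE                                      /* instruction-level parallelism */
        buf[i] = in[i] * 3;
        out[i] = buf[i] + 1;
    }
}
```

Removing false dependencies (DEPENDENCE pragma) and tuning sharing (ALLOCATION pragma) would follow the same pattern once a bottleneck is identified.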


Interface Synthesis

˃ Simple code quickly becomes a “real” circuit
HLS provides block-level I/O and interface pragmas to customize the circuit

void F(int in[20], int out[20])
{
  int a, b, c, x, y;
  for (int i = 0; i < 20; i++) {
    x = in[i];
    y = a*x + b + c;
    out[i] = y;
  }
}

[Diagram: the loop body becomes a multiply-add datapath (in → ×a → +b → +c → out) driven by a control FSM; the array ports can be implemented as BRAM or FIFO interfaces.]

˃ The default interface for C arrays (BRAM) can be changed to “FIFO” via a single-line pragma (a.k.a. directive)…

HLS Adapts Logic to the Design Interface
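A minimal sketch of that single-line change (the coefficient values are illustrative assumptions — a, b, c were uninitialized in the slide's fragment):

```c
#include <assert.h>

/* Same loop as above, with INTERFACE pragmas asking Vivado HLS to
   stream the arrays (ap_fifo) instead of mapping them to BRAM ports.
   A standard C compiler simply ignores the pragmas, so functional
   behavior is unchanged. */
void F(int in[20], int out[20]) {
#pragma HLS INTERFACE ap_fifo port=in
#pragma HLS INTERFACE ap_fifo port=out
    int a = 3, b = 5, c = 7;           /* illustrative constants */
    for (int i = 0; i < 20; i++) {
        int x = in[i];
        int y = a * x + b + c;         /* multiply-add datapath  */
        out[i] = y;
    }
}
```

The same directive can also be kept out of the source and applied from a Tcl script instead, which is what the GUI's “Directives” pane generates.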
Apply Instruction Level Parallelism with PIPELINE

˃ PIPELINE applies to loops or functions
Instructs HLS to process variables continuously
Initiation Interval (II): number of clock cycles before the function can accept new inputs

void F (...) {
  ...
  add: for (i=0;i<=3;i++) {
#pragma HLS PIPELINE
    op_READ;
    op_COMPUTE;
    op_WRITE;
  }
  ...
}

Default: READ, COMPUTE and WRITE execute sequentially in each iteration; throughput = 3, loop latency = 12.
With PIPELINE: iterations overlap so a new READ starts every cycle; throughput = 1, loop latency = 6.

Loop pipelining example

˃ Allows loops or functions to process inputs continuously
Improves throughput (II gets lower)
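The slide's READ/COMPUTE/WRITE loop made concrete, as a hedged sketch (the operation bodies are illustrative stand-ins; the II numbers come from the HLS report, not from C execution):

```c
#include <assert.h>

#define N 4

/* With PIPELINE, HLS overlaps iterations so a new input is read every
   cycle (II = 1); the C-level result is identical either way. */
void run_add_loop(const int in[N], int out[N]) {
    add: for (int i = 0; i < N; i++) {
#pragma HLS PIPELINE
        int x = in[i];        /* op_READ    */
        int y = x * x + 1;    /* op_COMPUTE */
        out[i] = y;           /* op_WRITE   */
    }
}
```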


Loop Unrolling

˃ Unroll forces the parallel execution of the instructions in the loop

void F (...) {
  ...
  add: for (i=0;i<=3;i++) {
    b = a[i] + b;
  ...

Default: 4 cycles (iterations 0, 1, 2, 3 execute sequentially).
Unrolled: 1 cycle – a[0]…a[3] feed an adder tree into b (parallel execution, but more area).
Note: a tight timing constraint could lead to a latency different than 1 clock cycle.

In the “Directives” pane, select loop label “add”, right-click and select unroll…

High-performance execution when array elements are available in parallel… otherwise no benefit from unrolling this loop.

Example: fully unrolled loop.
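The same directive can be written as a pragma instead of being applied through the Directives pane — a minimal sketch (function name and array size are illustrative):

```c
#include <assert.h>

/* Fully unrolling lets HLS schedule the four additions as an adder
   tree, provided all elements of a[] are accessible in parallel
   (e.g. after array partitioning). Ignored by a plain C compiler. */
int accumulate(const int a[4]) {
    int b = 0;
    add: for (int i = 0; i <= 3; i++) {
#pragma HLS UNROLL
        b = a[i] + b;
    }
    return b;
}
```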


PIPELINE and Automatic Loop Unrolling

˃ PIPELINE automatically unrolls loops…
(Initiation Interval (II): number of clock cycles before the function can accept new inputs)

void fir(data_t x, coef_t c[N], acc_t *y) {
#pragma HLS PIPELINE
  static data_t shift_x[N];
  acc_t acc;
  data_t data;

  acc = 0;
  for (int i = N-1; i >= 0; i--) {
    if (i == 0) {
      shift_x[0] = x;
      data = x;
    } else {
      shift_x[i] = shift_x[i-1];
      data = shift_x[i];
    }
    acc += data * c[i];
  }
  *y = acc;
}

QUIZ: Which other pragmas might be useful?
a) “interface ap_stable” for the coefficients
b) “array partition” for shift_x
c) “expression_balance” to control the adder tree
d) All of the above

Answer: d)
• ap_stable helps reduce logic for “c” if the coefficients are expected to be constant
• Array partitioning the shifter then ensures all “x” can be accessed in parallel
• Expression balance to preserve the inherent multiply-add cascade chain implied in the C code (longer latency but more efficient once mapped onto DSP blocks)
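A quick C-simulation sanity check for the FIR above — a sketch assuming plain `int` stand-ins for the `data_t`/`coef_t`/`acc_t` types (a real design would use `ap_int`/`ap_fixed`):

```c
#include <assert.h>

#define N 4
typedef int data_t;   /* illustrative stand-ins for HLS arbitrary-precision types */
typedef int coef_t;
typedef int acc_t;

/* The slide's FIR: a static shift register streams samples through. */
void fir(data_t x, coef_t c[N], acc_t *y) {
#pragma HLS PIPELINE
    static data_t shift_x[N];
    acc_t acc = 0;
    data_t data;
    for (int i = N - 1; i >= 0; i--) {
        if (i == 0) { shift_x[0] = x; data = x; }
        else        { shift_x[i] = shift_x[i - 1]; data = shift_x[i]; }
        acc += data * c[i];
    }
    *y = acc;
}

/* Testbench: feed an impulse; the outputs replay the coefficients,
   the classic FIR sanity check used during C simulation. */
void run_impulse(acc_t out[N]) {
    coef_t c[N] = {1, 2, 3, 4};
    data_t impulse[N] = {1, 0, 0, 0};
    for (int k = 0; k < N; k++)
        fir(impulse[k], c, &out[k]);
}
```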


Removing Inter-Loop Bubbles

˃ The rewind option of PIPELINE lets the next loop execution start as soon as possible
Removes inter-loop gaps

loop: for(i=1;i<N;i++) {
#pragma HLS PIPELINE rewind
  op_Read;
  op_Compute;
  op_Write;
}

Without rewind, the next loop invocation starts only after the previous one has finished (RD0 … WRN complete, then the next RD0 begins). With rewind, the next invocation reads immediately, overlapping the tail of the previous run.

˃ See the user guide for more information (including the “flush” option)


C Arrays

˃ C arrays describe memories…
Vivado HLS default memory model assumes 2-port BRAMs

˃ Default number of memory ports defined by…
How elements of the array are accessed
The target throughput (a.k.a. initiation interval, also referred to as II)

void foo (...) {
  ...
  SUM_LOOP: for(i=2;i<N;++i) {
    sum += mem[i] + mem[i-1] + mem[i-2];
    ...
  }
}

Example: the code implies three reads from a RAM per iteration, which prevents full throughput on a 2-port BRAM.
See UG902 (Chap. 3 – Array Accesses and Performance) to get full throughput on this example.

˃ Arrays can be reshaped and/or partitioned to remove bottlenecks
Changes to array layout do not require changes to the original code
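One way to remove that bottleneck, sketched with an explicit pragma (the partition factor and array size are illustrative assumptions; UG902 also discusses shift-register rewrites for this pattern):

```c
#include <assert.h>

#define N 16

/* Cyclic partitioning by 3 places mem[i], mem[i-1] and mem[i-2] in
   three different physical RAMs, so the three reads per iteration no
   longer compete for the two ports of a single BRAM. The C code and
   its result are unchanged. */
int sum3(const int mem[N]) {
#pragma HLS ARRAY_PARTITION variable=mem cyclic factor=3
    int sum = 0;
    SUM_LOOP: for (int i = 2; i < N; ++i) {
#pragma HLS PIPELINE
        sum += mem[i] + mem[i - 1] + mem[i - 2];
    }
    return sum;
}
```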


Partition, Reshape Your C Arrays

˃ Partitioning splits an array into independent arrays
Arrays can be partitioned on any of their dimensions for better throughput

Example (C array of N elements → RTL arrays):
• block, factor of 2: elements 0 … N/2-1 in one RTL array, N/2 … N-1 in another
• cyclic, factor of 2: even elements (0, 2, …, N-2) in one RTL array, odd elements (1, 3, …, N-1) in another
• complete: individual elements

˃ Reshaping combines array elements into wider containers
Different arrays into a single physical memory

New RTL memories are automatically generated without changes to C code
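A hedged sketch of reshaping (the function, sizes, and factor are illustrative assumptions): packing pairs of elements into one wider word means a single physical access returns both operands of each addition.

```c
#include <assert.h>

#define N 8

/* ARRAY_RESHAPE (cyclic, factor 2) packs buf[2k] and buf[2k+1] into
   one wider memory word, so each iteration needs only one physical
   read. The C view of buf is unchanged; a plain compiler ignores
   the pragmas. */
int pairwise_sum(const int buf[N]) {
#pragma HLS ARRAY_RESHAPE variable=buf cyclic factor=2
    int sum = 0;
    for (int i = 0; i < N; i += 2) {
#pragma HLS PIPELINE
        sum += buf[i] + buf[i + 1];  /* both halves come from one access */
    }
    return sum;
}
```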


Dataflow Pragma – Task Level Parallelism

˃ By default a C function producing data for another is fully executed first

// This memory can be a FIFO during optimization
rgb_pixel inter_pix[MAX_HEIGHT][MAX_WIDTH];

// Primary processing functions
sepia_filter(in_pix, inter_pix);
sobel_filter(inter_pix, out_pix2);

Default: the Sepia filter finishes all writes to inter_pix[N]… then the Sobel filter starts accessing inter_pix[N].

˃ Dataflow allows Sobel to start as soon as data is ready
Functions operate concurrently and continuously, processing samples [0] [1] [2] [3] … as they are produced
The interval (hence throughput) is improved
With a ping-pong channel, the buffer has to be filled before it is consumed

˃ Dataflow creates memory channels
Created between loops or functions to store data samples
“Ping-pong” channel (RAM) holds all the data
“FIFO” channel for sequential access, no need to store all the data
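A compilable reduction of the producer/consumer pattern above — the filter bodies are illustrative stand-ins (the real sepia/sobel kernels are not shown in the slide):

```c
#include <assert.h>

#define W 8

/* Under DATAFLOW, HLS turns tmp into a channel (ping-pong RAM or
   FIFO), so the second stage starts consuming as soon as the first
   produces data, instead of waiting for it to finish. INLINE off
   keeps each stage as a distinct task. */
static void stage1(const int in[W], int out[W]) {   /* stand-in "sepia" */
#pragma HLS INLINE off
    for (int i = 0; i < W; i++) out[i] = in[i] * 2;
}

static void stage2(const int in[W], int out[W]) {   /* stand-in "sobel" */
#pragma HLS INLINE off
    for (int i = 0; i < W; i++) out[i] = in[i] + 1;
}

void top_df(const int in[W], int out[W]) {
#pragma HLS DATAFLOW
    int tmp[W];   /* becomes the channel between the two tasks */
    stage1(in, tmp);
    stage2(tmp, out);
}
```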
Video Applications and DATAFLOW

˃ The FIFO channel with DATAFLOW avoids storing frames between tasks

Default: exclusive full-function execution – Sepia then Sobel, with RAM frame buffers at the interfaces and between the filters.
Pipelined: instruction parallelism per pixel within each filter; still RAM frame buffers between the filters.
Dataflow: task parallelism – the filters stream pixels through FIFO channels end to end.

              Default       Pipelined    Dataflow
BRAM          2792          2790         24
FF            891           1136         883
LUT           2315          2114         1606
Interval (II) 128,744,588   4,150,224    2,076,613


Dataflow Hardware Implementation

˃ HLS inserts a “channel” between the functions
vecIn[10] → func1 → channel → func2 → vecOut[10]

˃ Channel implementation
Ping-pong buffer ‒ RAM buffers
FIFO ‒ sequential access

˃ Vivado implementation (RTL view)
func1 and func2 appear as separate RTL blocks connected by the channel

Note: Apply the “inline off” pragma to small functions so that they show as a level of hierarchy…
Dataflow Example

˃ DATAFLOW allows concurrent execution of two (or more) functions

void top(int vecIn[10], int vecOut[10]) {
#pragma HLS DATAFLOW
  int tmp[10];

  func1(vecIn, tmp);
  func2(tmp, vecOut);
}

void func1(int f1In[10], int f1Out[10]) {
#pragma HLS INLINE off
#pragma HLS PIPELINE
  for(int i=0; i<10; i++) {
    f1Out[i] = f1In[i] * 10;
  }
}

void func2(int f2In[10], int f2Out[10]) {
#pragma HLS INLINE off
#pragma HLS PIPELINE
  for(int i=0; i<10; i++) {
    f2Out[i] = f2In[i] + 2;
  }
}

˃ Vector I/O is modeled as coming from/going to a RAM

˃ The code above has an II of 5
i.e. vector size of 10 and 2 elements per cycle: the input vector is “BRAM” by default, so only 2 reads in one cycle, hence II is 5
Review the top-level and function II in the DATAFLOW viewer


Analyzing Dataflow Results

˃ View simulation waveforms after RTL cosimulation
Open Wave Viewer toolbar button
Top-level signals in the waveform view, pre-grouped into useful bundles

1. Run C/RTL Cosimulation with the Vivado Simulator (or Auto)
2. Select Dump Trace: “all” or “port”
3. Click OK
4. Click the Open Wave Viewer icon
5. Pre-grouped signals: block-level I/O (ap_start, ap_done, ap_idle, ap_ready), C inputs, C outputs
6. Select a function to add its signals to the waveforms

Note: Apply the “inline off” pragma to small functions so that they remain a level of hierarchy in HLS…


Analyze Simulation Waveforms

˃ New Dataflow waveform viewer(*)
Shows task-level parallelism
Confirms optimizations took place
(Co-Simulation Waveforms in v2018.2)

˃ HLS Schedule Viewer
Shows operator timing and clock margin
Shows data dependencies
Cross-probing from operations to source code
(HLS Schedule Viewer in v2018.2)

(*) 2018.2: visible when Dataflow is applied, all traces dumped, using the Vivado simulator and checking waveform debug


Target Markets for HLS

Aerospace and Defense: radar, sonar; signals intelligence
Communications: LTE MIMO receiver; advanced wireless antenna positioning
Industrial, Scientific, Medical: ultrasound systems; motor controllers
Audio, Video, Broadcast: 3D cameras; video transport
Automotive: infotainment; driver assistance
Consumer: 3D television; eReaders
Test & Measurement: communications instruments; semiconductor ATE
Computing & Storage: high-performance computing; database acceleration


Vivado HLS Resources

˃ Vivado HLS is included in all Vivado HLx Editions (free in WebPACK)

˃ Videos on xilinx.com and YouTube

˃ DocNav: tutorials, user guides, application notes, videos, etc.

˃ Application notes on xilinx.com (also linked from the hub)

˃ Code examples within the tool itself and on GitHub

˃ Instructor-led training


Summary

Performance Boosters for HLS…


˃ Compute customization, memory adaptation, datatype optimization

Throughput Optimizations…
˃ Apply task and instruction level parallelism

Vivado HLS is not just C synthesis…


˃ It’s C simulation, automated RTL simulation, interface synthesis, waveform analysis
