International Conference on Intelligent Computing and Control Systems

ICICCS 2017

Hardware Acceleration of Image Processing Algorithms using Vivado High Level Synthesis Tool
Prof. Bhaumik Vaidya, Assistant Professor, SCET, Surat
Prof. Mustafa Surti, Assistant Professor, SCET, Surat
Parth Vaghasiya, Student, SCET, Surat
Jay Bordiya, Student, SCET, Surat
Jenish Jain, Student, SCET, Surat

Abstract: In this paper, image processing algorithms are designed using high level synthesis and implemented on different hardware platforms. The implementations are compared in terms of speed, latency and resource utilization. The use of pipelining to reduce latency is illustrated with an example. Using high level synthesis, the designer can employ libraries similar to OpenCV and take advantage of the productivity benefits of working at a higher level of abstraction while still creating high-performance hardware. Basic image processing algorithms, namely histogram calculation, histogram equalization, the averaging filter and the Laplacian filter, are chosen to explain hardware acceleration using high level synthesis. The workflow of implementing high level C / C++ / SystemC code in hardware using the Vivado high level synthesis tool is explained, along with implementation results for the various image processing algorithms.

Keywords: image processing; Vivado HLS; Zynq SoC; Virtex 7; Virtex 6; Spartan 6; Histogram; Laplacian filter.

I. INTRODUCTION

Reconfigurable computing has gained increasing attention from researchers and industry over the last few years, as it offers an attractive combination of hardware performance and software flexibility. The complexity of hardware design is growing day by day, and with it the number of lines of Hardware Description Language (HDL) code. Most engineers have to spend a significant amount of time learning to program FPGAs in hardware description languages such as Verilog and VHDL, because modeling hardware is vastly different from designing software and requires a good knowledge of hardware. To overcome this, the concepts of hardware / software co-design and high level synthesis were introduced.

High level synthesis tools like Vivado HLS from Xilinx convert high level C / C++ / SystemC code into a Register Transfer Level (RTL) implementation that synthesizes into Xilinx FPGAs. [1] Vivado HLS lets a software designer accelerate the computationally complex parts of an application on hardware that provides a massively parallel architecture, with benefits in performance, cost and power over traditional processors. [1] It also allows hardware designers who implement designs in an FPGA to take advantage of the productivity benefits of working at a higher level of abstraction while creating high-performance hardware. The HDL coding and behavioral simulation steps of the conventional FPGA design flow can be replaced by the Vivado HLS workflow [2].

The primary benefits of an HLS design methodology are improved productivity for hardware designers and improved system performance for software designers. Development time is reduced because the designer is abstracted from implementation details. Verification at the C level allows the functional correctness of the design to be validated orders of magnitude faster than traditional hardware description languages allow. Controlling the C synthesis process through optimization directives allows the creation of specific high-performance hardware implementations, and allows many different implementations to be created quickly from the same C source code, which enables easy design space exploration and improves the likelihood of finding an optimal implementation. [2]

Image processing algorithm design benefits from a high-level synthesis tool because the tool allows the designer to employ libraries similar to OpenCV, a library that is well known and widely used by software designers for computer vision applications. [1, 9] High level synthesis helps reduce time to market because designers are already familiar with these libraries. However, high-level synthesis tools are far from perfect. Developers still need embedded hardware knowledge and experience to accomplish a successful design. [3]

The paper is organized as follows. Section II explores similar implementations. Section III discusses the design flow for Vivado HLS. Section IV presents the basic image processing algorithms together with the implementation and comparison results.

II. RELATED WORK

Various studies have been presented in the literature on the use of reconfigurable architectures to accelerate image processing applications. Developments in HLS attract many software and hardware designers to enhance the implementation of different solutions [2]. To improve the design productivity of FPGA-based image processing, several researchers have demonstrated edge detector applications using high-level synthesis tools.

Hanaa M Abdelgawad, Mona Safar, and Ayman M Wahba proposed high level synthesis of the Canny edge detection algorithm on the Zynq platform. [2] K V Ramana Reddy implemented the Canny Edge Detector (CED) algorithm on a Spartan 3E FPGA platform with a Video Graphics Array (VGA) interface for displaying the images on a monitor. [4] The maximum image size implemented was 128 x 128, using BRAM to store the

978-1-5386-2745-7/17/$31.00 ©2017 IEEE 29



image. Malathy H Lohithaswa [5] implemented a Canny edge detection design in Verilog HDL and executed it on a Spartan 3E FPGA platform. The high-level implementation was done using MATLAB. The estimated logic utilization was: Slices (16%), Flip Flops (18%), LUTs (11%), bonded IOBs (7%), MULT18X18SIOs (10%) and GCLKs (4%).

J. Monson, M. Wirthlin, and B. L. Hutchings [6] implemented a Sobel filter targeting a Zynq-based platform. They used HLS to restructure an existing Sobel filter written in C into a synthesizable version, replacing any non-synthesizable portions with synthesizable ones. Incremental optimization helped their design achieve a performance of 388 FPS at a resolution of 640x480.

Chaitali Chakrabarti, Srenivas Varadarajan and Lina J. Karam [7] implemented the Canny edge detection algorithm using a 32 computing engine architecture and synthesized it for a Xilinx Virtex-5 FPGA. It occupies 64% of the total number of slices and 87% of the local memory
and takes 0.721 ms to detect edges in 512x512 images when clocked at 100 MHz. All the results were obtained with MATLAB simulation for implementation and ModelSim for testing with the platform. P H Pawar and R P Patil [8] implemented the Canny edge detection algorithm on a Virtex-5 evaluation board. The input images were converted from RGB to grayscale and resized in MATLAB code. Edge detection was done using Xilinx 14.1. The output was successfully displayed on a VGA monitor interfaced with the board through a DVI connector.

Most implementations in the literature still use either MATLAB or a hardware description language like Verilog or VHDL for some part of the image processing algorithms. Very few implementations are made entirely with high level synthesis. There is therefore a need to explore this area in detail for image processing applications, which is the goal of this paper.

III. WORKFLOW OF VIVADO HLS

Fig 1 shows the workflow followed by the Xilinx Vivado High Level Synthesis tool. The primary input to Vivado HLS is a function written in C, C++ or SystemC. This function may contain a hierarchy of sub-functions. Additional inputs include constraints and directives. The constraints are mandatory and include the clock period, the clock uncertainty and the FPGA target. The directives are optional, and Vivado HLS uses them to direct the synthesis process to implement a specific behavior or implementation. The final type of input is the C test bench and any associated files. [1]

High-Level Synthesis uses the C test bench to simulate the C function to be synthesized. It later re-uses the C test bench to automatically verify the RTL output using C / RTL co-simulation.

Fig 1 Vivado HLS Workflow

The primary output from Vivado HLS is the implementation in RTL format in languages like Verilog and VHDL. This RTL can be synthesized into a gate-level implementation and an FPGA bit stream file by logic synthesis using the Xilinx PlanAhead EDA tool suite. The implementation files are packaged as an IP block for use within other tools in the Xilinx design flow. [1, 11]

IV. IMPLEMENTATION DETAILS AND METHODS

This paper focuses on developing hardware acceleration for a number of popular image processing algorithms using the Vivado high level synthesis tool. The goal of the hardware implementation is to minimize latency without sacrificing precision. The boards used for the algorithm implementations are the Xilinx Zynq ZC 602 board, the Virtex 7 VC 709 evaluation board, the Virtex 6 ML 605 evaluation board and the Spartan-6 FPGA on an SP605 board. The resources available on each hardware platform are shown in Table 1.

Table 1 Resources available in various hardware platforms

Board              No of LUT   No of FF   No of DSP48E   No of BRAM
Zynq ZC 602        53200       106400     220            280
Virtex 7 VC 709    433200      866400     3600           2940
Virtex 6 ML 605    150720      301440     768            832
Spartan-6 SP 605   27288       54576      58             116

The timing performance and resource utilization of the image processing algorithms are compared across all of the above hardware platforms. The image size is taken as 256x256 for simplicity. For timing, the clock is constrained to 10 ns with a clock uncertainty of 1.25 ns. Timing performance is compared in terms of clock period, maximum frequency of operation and latency, where latency is the number of clock cycles needed to complete the operation.
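The three timing quantities named above are related by simple arithmetic, which the following minimal sketch makes explicit. It is an illustration only: the function names are hypothetical, and it assumes that the reported maximum frequency is the reciprocal of the achieved clock period, which is consistent with most rows of the tables in this paper.

```cpp
#include <cassert>
#include <cstdint>

// Maximum clock frequency in MHz for an achieved clock period given in ns.
// (Hypothetical helper, not part of any Xilinx API.)
double max_frequency_mhz(double clock_period_ns) {
    assert(clock_period_ns > 0.0);
    return 1000.0 / clock_period_ns;
}

// Execution time in milliseconds for a latency expressed in clock cycles.
double execution_time_ms(std::uint64_t latency_cycles, double clock_period_ns) {
    assert(clock_period_ns > 0.0);
    // cycles * ns per cycle = ns; 1e-6 converts ns to ms.
    return static_cast<double>(latency_cycles) * clock_period_ns * 1e-6;
}
```

For example, the Zynq row of Table 2 (7.86 ns, 131330 cycles) gives about 127.2 MHz, matching the table, and an execution time of roughly 1.03 ms per 256x256 image. The execution time is derived here and is not a figure reported in the paper. The 10 ns constraint corresponds to a 100 MHz target.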


A. Histogram and Histogram Equalization

An image histogram shows the frequency of pixel intensity values. The X axis shows the gray level intensities and the Y axis shows the frequency of these intensities. [13] The histogram gives contrast information about the image and indicates whether the image is dark or light. It is widely used as an image feature in computer vision applications and is also used in image thresholding.

The flowchart for the C++ implementation of histogram calculation is shown in Fig 2. A test bench is written in C++ to verify the code by supplying a 256 x 256 image as input and storing the result in a text file for plotting in MATLAB.

Fig 2 Flowchart of Histogram code in C++

The histogram code is synthesized using Vivado HLS for the different hardware platforms, and the comparison between them is shown in Table 2.

Table 2 Comparison of Histogram HLS for different hardware platforms

Board              Clock Period (ns)   Freq. (MHz)   Latency (cycles)   LUT   FF
Zynq ZC 602        7.86                127.2         131330             205   91
Virtex 7 VC 709    5.73                174.5         131330             205   91
Virtex 6 ML 605    7.20                138.9         131330             205   91
Spartan-6 SP 605   7.41                135           131330             224   91

The Histogram Equalization algorithm enhances the contrast of images by transforming the values in an intensity image so that the histogram of the output image is approximately flat, using Equation 1.

(1)

Here x is the input intensity value, y is the output intensity after equalization, and the other two quantities in the equation are the maximum and minimum intensity values. This approach is computationally intensive, and for that reason a hardware implementation of the algorithm is sought for the enhancement of real-time image sequences. [10]

Fig 3 Flowchart for Histogram equalization code in C++

Fig 3 shows the flowchart for the C++ implementation of histogram equalization. A standard image of size 256x256 is applied as input to the code from the test bench, and the output of the code is reconstructed as an image in the test bench using the OpenCV library. The simulation results for both histogram and histogram equalization are shown in Fig 4.

Fig 4 (a) Original image (b) Histogram output from Vivado HLS (c) Histogram equalized image (d) Histogram of equalized image

In histogram equalization, each output pixel value can be calculated independently of the others using Equation 1, so in hardware these calculations can be overlapped instead of being performed strictly one after another. This overlapping of successive operations is called pipelining. The Vivado HLS tool can be configured to introduce pipelining by using pragmas, that is, compiler directives for the tool. Table 3


and Table 4 show the results without and with pipelining, respectively.

Table 3 Comparison of Histogram Equalization without pipelining HLS for different hardware platforms

Board              Clock Period (ns)   Freq. (MHz)   Latency (cycles)   LUT    FF     DSP48E
Zynq ZC 602        8.13                123           1769479            2351   1528   3
Virtex 7 VC 709    8.73                114.5         983044             1738   886    3
Virtex 6 ML 605    8.50                117.6         1245189            1724   1038   3
Spartan-6 SP 605   8.53                117.2         1507333            1768   1110   5

Table 4 Comparison of Histogram Equalization with pipelining HLS for different hardware platforms

Board              Clock Period (ns)   Freq. (MHz)   Latency (cycles)   LUT    FF     DSP48E
Zynq ZC 602        8.13                123           65569              2297   1514   3
Virtex 7 VC 709    8.73                114.5         65554              1723   872    3
Virtex 6 ML 605    8.50                117.6         65554              1707   1024   3
Spartan-6 SP 605   8.53                117.2         65563              1748   1096   5

As the comparison between the results without and with pipelining (Table 3 and Table 4) shows, latency is reduced drastically, so the throughput of the design increases proportionally for the same clock frequency. Overall, it can be concluded from the results that the Virtex 7 board performs well in terms of clock frequency, latency and resource utilization.

B. Smoothing Filter

A useful filter in video and image processing is the smoothing filter, also known as the averaging or blurring filter. This filter is widely used to reduce noise in an image, particularly white noise. It is also used as a pre-processing stage in computer vision algorithms to enhance images for use in later stages of image processing and computer vision. It is considered a low pass filter: it removes sharp edges and quick changes in pixel values, i.e. the high frequency part of the image. [13] The averaging filter runs through the image pixel by pixel and replaces each pixel with the average value of the window of pixels around it. For a window size of 3x3, the averaging filter applies the following window to all pixels:

\[ \frac{1}{9} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} \tag{2} \]

Fig 5 shows the flowchart for implementing the averaging filter in C++. In the implementation, row buffers are used to store past pixel values so that they can be reused for the windowing operation, which needs a 3 x 3 neighborhood, and all nine values do not have to be fetched every time. The center pixel is returned to the test bench, where the output image is created using the OpenCV library.

Fig 5 Flowchart for Averaging filter in C++

Fig 6 shows the result of applying the 3 x 3 averaging mask to an image. As the result shows, the filter blurs the input image. This filter works well if the image contains white noise, but it works poorly in the case of salt and pepper noise. [13]

Fig 6 Result of application of smoothing filter on image

Table 5 shows the comparison of high level synthesis of the averaging filter code for the various hardware platforms.

Table 5 Comparison of Averaging filter HLS for different hardware platforms

Board              Clock Period (ns)   Freq. (MHz)   Latency (cycles)   LUT   FF    DSP48E   BRAM_18K
Zynq ZC 602        7.79                128.3         327680             401   219   2        2
Virtex 7 VC 709    6.76                148.4         196608             407   217   2        2
Virtex 6 ML 605    8.72                114.5         196608             411   217   2        2
Spartan-6 SP 605   8.71                115.0         458752             459   221   2        2
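The row-buffer scheme described for the averaging filter can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code, which the paper does not list. The function name `average3x3`, the flat array interface, the truncating division and the decision to leave the one-pixel border of the output unwritten are all assumptions made for the example. `#pragma HLS PIPELINE` is a Vivado HLS directive and is ignored by an ordinary C++ compiler.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kRows = 256;  // image size used in the paper
constexpr int kCols = 256;

// Hypothetical top-level function: 3x3 mean filter over a raster-scanned image.
// The one-pixel border of `out` is not written (border policy is an assumption).
void average3x3(const std::uint8_t in[kRows * kCols],
                std::uint8_t out[kRows * kCols]) {
    std::uint8_t row_buf[2][kCols] = {};  // pixels of the two previous rows
    std::uint8_t window[3][3] = {};       // current 3x3 neighbourhood

    for (int r = 0; r < kRows; ++r) {
        for (int c = 0; c < kCols; ++c) {
#pragma HLS PIPELINE II=1
            // The only pixel fetched from the input in this step.
            const std::uint8_t pixel = in[r * kCols + c];

            // Slide the window one column to the left.
            for (int i = 0; i < 3; ++i) {
                window[i][0] = window[i][1];
                window[i][1] = window[i][2];
            }
            // Fill the new right-hand column from the row buffers and the new pixel.
            window[0][2] = row_buf[0][c];
            window[1][2] = row_buf[1][c];
            window[2][2] = pixel;

            // Age the row buffers at this column.
            row_buf[0][c] = row_buf[1][c];
            row_buf[1][c] = pixel;

            // From row 2, column 2 onward the window is centred on (r-1, c-1).
            if (r >= 2 && c >= 2) {
                unsigned sum = 0;
                for (int i = 0; i < 3; ++i)
                    for (int j = 0; j < 3; ++j)
                        sum += window[i][j];
                assert(sum <= 9u * 255u);  // nine 8-bit values
                out[(r - 1) * kCols + (c - 1)] =
                    static_cast<std::uint8_t>(sum / 9);  // truncating mean
            }
        }
    }
}
```

Each iteration reads one new pixel and takes the other eight window values from on-chip storage, which is the point of the row buffers. Replacing the plain sum with a weighted sum would turn the same structure into other 3x3 masks.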


C. Laplacian Filter

The Laplacian filter is a derivative operator used to find edges in an image. The major difference between the Laplacian and other operators like Prewitt, Sobel, Robinson and Kirsch is that those are all first order derivative masks, whereas the Laplacian is a second order derivative mask. [13] The Laplacian filter highlights gray level discontinuities in an image and de-emphasizes regions with slowly varying gray levels. This operation produces images that have grayish edge lines and other discontinuities on a dark background, and it produces inward and outward edges in the image. The Laplacian operator can be positive or negative, and it is rotation invariant. [12] In this paper the following mask is used as the Laplacian window.

(3)

Fig 7 shows the flowchart for implementing the Laplacian filter in C++. Again, row buffers are used to store past pixel values. The center pixel is returned to the test bench, where the output image is created using the OpenCV library. The simulation results can be seen in Fig 8.

Fig 7 Flowchart of Laplacian filter in C++

As the results in Fig 8 show, applying the Laplacian filter with a 3x3 window finds the edges in the input image. The discontinuities are indicated by white pixels, whereas uniform regions are shown in black.

Fig 8 Result of application of Laplacian filter on image

Table 6 shows the comparison of high level synthesis of the Laplacian filter code for the various hardware platforms.

Table 6 Comparison of Laplacian filter HLS for different hardware platforms

Board              Clock Period (ns)   Freq. (MHz)   Latency (cycles)   LUT   FF    BRAM_18K
Zynq ZC 602        8.73                114.5         65540              475   323   2
Virtex 7 VC 709    8.18                122.2         196608             398   213   2
Virtex 6 ML 605    8.39                119.2         196608             428   213   2
Spartan-6 SP 605   8.56                116.8         262144             491   243   2

From the comparison of hardware platforms across the different image processing algorithms, it can be concluded that the Virtex 7 VC 709 board performs well in terms of maximum frequency of operation, throughput, latency and resource utilization. The introduction of pipelining, the use of block RAMs and the use of row buffers also help reduce the latency of the design significantly. After the RTL for the algorithms is generated using HLS and co-simulated using the C++ test bench, it is taken to the Vivado Design Suite to generate the gate level netlist and the FPGA bit stream. That bit stream is programmed into the FPGA for hardware verification.
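Pipelining, which this section credits with much of the latency reduction, is requested in Vivado HLS by placing a pragma inside the loop to be pipelined. The sketch below shows the idea on histogram equalization. It is an assumption-laden illustration rather than the authors' design: the function name `equalize_histogram` and the array interface are hypothetical, and the mapping used is the textbook cumulative-histogram form, not necessarily Equation 1 of this paper.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kPixels = 256 * 256;  // image size used in the paper
constexpr int kLevels = 256;        // 8-bit gray levels (assumed)

// Hypothetical top-level function; name and interface are illustrative only.
void equalize_histogram(const std::uint8_t in[kPixels],
                        std::uint8_t out[kPixels]) {
    std::uint32_t hist[kLevels];
    std::uint8_t lut[kLevels];

    // Stage 1: clear the bins.
    for (int i = 0; i < kLevels; ++i) {
#pragma HLS PIPELINE II=1
        hist[i] = 0;
    }

    // Stage 2: count pixels per gray level. The read-modify-write on hist[]
    // may lead the tool to settle for an initiation interval larger than 1.
    for (int p = 0; p < kPixels; ++p) {
#pragma HLS PIPELINE
        hist[in[p]] += 1;
    }

    // Stage 3: the cumulative histogram, scaled to 0..255, is the mapping table.
    std::uint32_t cdf = 0;
    for (int i = 0; i < kLevels; ++i) {
#pragma HLS PIPELINE
        cdf += hist[i];
        assert(cdf <= static_cast<std::uint32_t>(kPixels));
        lut[i] = static_cast<std::uint8_t>((cdf * (kLevels - 1)) / kPixels);
    }

    // Stage 4: remap every pixel. Iterations are independent of each other,
    // so one pixel per clock cycle is a realistic target for this loop.
    for (int p = 0; p < kPixels; ++p) {
#pragma HLS PIPELINE II=1
        out[p] = lut[in[p]];
    }
}
```

For scale, a 256x256 image has 65,536 pixels, so any single loop that accepts one pixel per clock cycle needs roughly that many cycles plus a short pipeline fill. The pipelined latencies in Table 4 are of this order, although the paper does not state how its loops were structured.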
CONCLUSION
In this paper, the possibility of accelerating computationally expensive image processing algorithms using Vivado HLS and the OpenCV library is explored. Experimental results for the simulation and implementation of four image processing algorithms on FPGAs using Vivado High Level Synthesis are shown and compared for various commercially available hardware platforms. The performance of the algorithms is compared in terms of latency, clock frequency and resource utilization. Analysis of the results for all the algorithms shows that the Virtex 7 VC 709 platform performs well. HLS offers many benefits over RTL implementation in terms of programmability for FPGA-based design and time to market. However, to gain performance the designer still has to understand the application and analyze it in order to perform application optimization and mapping onto accelerator-based hardware platforms.
REFERENCES
[1] Xilinx, "High-Level Synthesis," http://www.xilinx.com/support/documentation/sw_manuals/xilinx2015_4/ug902-vivado-high-level-synthesis.pdf, November 2015.
[2] H. M. Abdelgawad, M. Safar, and A. M. Wahba, "High level synthesis of canny edge detection algorithm on zynq platform," Int. J. Comput. Electr. Autom. Control Inf. Eng., 9(1):148–152, 2015.
[3] Z. Shi, "Rapid Prototyping of an FPGA-Based Video Processing System," PhD thesis, Virginia Tech, 2016.
[4] K. V. Ramana Reddy, "Implementation of canny edge detection algorithm on fpga and displaying image through vga interface," International Journal of Engineering and Advanced Technology (IJEAT), 2013.
[5] M. H. Lohithaswa, "Canny edge detection algorithm on fpga," IOSR Journal of Electronics and Communication Engineering (IOSR-JECE), 2015.
[6] J. Monson, M. Wirthlin, and B. L. Hutchings, "Optimization techniques for a high level synthesis implementation of the sobel filter," in 2013 International Conference on Reconfigurable Computing and FPGAs (ReConFig), pp. 1–6, Dec. 2013.
[7] C. Chakrabarti, Q. Xu, S. Varadarajan and J. Karam, "A distributed canny edge detector: Algorithm and fpga implementation," IEEE Transactions on Image Processing, Jul. 2014.
[8] P. H. Pawar and R. P. Patil, "Fpga implementation of canny edge detection algorithm," International Journal of Engineering and Computer Science, 3:8704–8709, 2014.
[9] A. Cortes, I. Velez and A. Irizar, "High level synthesis using Vivado HLS for Zynq SoC: Image processing case studies," 2016 Conference on Design of Circuits and Integrated Systems (DCIS), Granada, 2016, pp. 1–6.
[10] A. M. Reza, "Realization of the contrast limited adaptive histogram equalization (CLAHE) for real-time image enhancement," The Journal of VLSI Signal Processing, 38(1):35–44, 2004.
[11] D. O'Loughlin, A. Coffey, F. Callaly, D. Lyons, F. Morgan, "Xilinx Vivado High Level Synthesis: Case studies," Irish Signals & Systems Conference 2014 and 2014 China-Ireland International Conference on Information and Communications Technologies (ISSC 2014/CIICT 2014), 25th IET, pp. 352–356, 2014.
[12] E. Nadernejad, S. Sharifzadeh, and H. Hassanpour, "Edge detection techniques: Evaluations and comparison," Applied Mathematical Sciences, 2(31):1507–1520, 2008.
[13] R. Gonzalez and R. Woods, "Digital Image Processing," Addison-Wesley Publishing Company, 1992, p. 191.
