IEEE International Conference on Communication information and Computing Technology (ICCICT), June 25-27, 2021, Mumbai, India

Analysis of Image Compression Algorithms on Vivado HLS
2021 International Conference on Communication information and Computing Technology (ICCICT) | 978-1-6654-0430-3/21/$31.00 ©2021 IEEE | DOI: 10.1109/ICCICT50803.2021.9510100

1st Ishan Khichadia
Electronics Engineering
Sardar Patel Institute of Technology
Mumbai, India
[email protected]

2nd Dhrumil Parikh
Electronics Engineering
Sardar Patel Institute of Technology
Mumbai, India
[email protected]

3rd Yash Rathod
Electronics Engineering
Sardar Patel Institute of Technology
Mumbai, India
[email protected]

4th Surendra Rathod
Electronics Engineering
Sardar Patel Institute of Technology
Mumbai, India
surendra [email protected]

Abstract—The resolution of cameras is rising day by day due to developments in optical technology. Such high-resolution images are large, putting a strain on the available storage. Thus, there is a need to compress the image without losing important data from the actual image. In this paper, we therefore present an implementation and a detailed analysis of different lossy image compression algorithms: DCT (Discrete Cosine Transform), FDCT (Fast Discrete Cosine Transform) and the Haar Wavelet Transform. These algorithms are simulated and synthesized using Xilinx's VIVADO HLS platform for a Xilinx Artix 7 family board. The algorithms are accelerated using the pragmas provided by VIVADO HLS to optimize the application. The paper also provides an analysis of the tradeoff between these algorithms and of the usage of accelerators in the algorithms. We have calculated the Mean Square Error (MSE) and Peak Signal to Noise Ratio (PSNR) and achieved PSNR values in the permissible range of 30 dB to 50 dB.

Index Terms—Vivado, High-Level Synthesis, DCT, Fast DCT, Haar wavelet, lossy compression, Matlab, FPGA, Acceleration, pipelining, accelerators, C language.

I. INTRODUCTION

Pictures are part and parcel of human life in this modern age. High-resolution images can be compressed to reduce the redundancy in the image data, which reduces storage space utilization. Image compression algorithms are of two types: lossless and lossy. Lossless compression reduces the size of an image while preserving the quality of the image. Lossy compression discards certain portions of the image in order to give the image an even smaller size. Image compression algorithms perform dozens of operations on each pixel, which results in an extremely heavy load when the algorithm runs in software. In order to reduce the resource dependency on the Central Processing Unit, the usage of a Field Programmable Gate Array (FPGA) is beneficial for parallelization of operations [1]. An FPGA gives the developer the ability to have the hardware reconfigured based on the requirements of the application. Moreover, the faster time-to-market as compared to an ASIC makes it advantageous to develop on the FPGA. The development of the algorithm is made easier by the High Level Synthesis (HLS) workflow. VIVADO HLS from Xilinx offers major advantages for the development and deployment of applications such as signal processing and image processing. These tools allow hardware-based algorithms to be built and tested using higher-level languages (such as C and C++) before the HDL-based implementation, verification and validation. HLS significantly reduces algorithm development time.

II. RELATED WORK

Various compression and decompression techniques have been proposed by researchers. A few of the recent developments in image compression are described here.

Yuecheng Li et al. [2] describe the implementation of image compression using a JPEG baseline encoder. An HLS tool is utilized for system design. The design uses the 8x8 DCT algorithm. The pixels are quantized and Huffman coding is then applied. The AC components of pixels are encoded using zig-zag scanning. This paper focuses on very low hardware utilization.

Ahmad Shawahna et al. [3] describe JPEG compression using DCT. The implementation is done in VHDL (VHSIC Hardware Description Language). The paper proposes 5 steps for image compression: Color Space Conversion, Down Sampling, 2-D DCT, Quantization and encoding. A compression ratio of around 82% to 85% was achieved. The paper focuses on designing a parallel architecture.

M. B. Mutgekar et al. [4] propose DCT and Fast DCT using FFT on a Nexys 4 DDR board. The main focus of this paper is the implementation and testing of 2D DCT, but it also reports that Fast DCT has improved performance.

Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY WARANGAL. Downloaded on October 11,2022 at 12:50:52 UTC from IEEE Xplore. Restrictions apply.
T. G. Anitha et al. [5] propose image compression techniques using 2D-FFT and IFFT on Matlab. The paper focuses on processing speed. The decompressed images were reported as identical to the source image, with quality better than 35 dB.

R. Praisline Jasmi et al. [6] propose Huffman coding, Discrete Wavelet Transform (DWT) and fractal coding for image compression. The paper claims that DWT can be used to boost the quality of compressed images, while the Fractal Algorithm offers better compression ratios (CR) and peak signal to noise ratios (PSNR).

III. IMPLEMENTATION DETAILS AND METHODS

Fig. 1: General Block Diagram of Image Compression

Fig. 1 shows the general block diagram of the proposed image compression system. First, the original image is converted into a pixel matrix using Matlab and stored in a text file, which is read by Vivado HLS. The compression algorithm is then run block by block, with the block size depending on the algorithm. The next step is quantization, which is the most important step of the compression algorithm because it scales down the original values of the image. A fixed quantization matrix is used, depending on the algorithm. Dividing by the quantization matrix reduces some of the values to zero. After this step, the inverse of the image compression algorithm is performed and the pixel values are stored back into a text file. The whole procedure is performed and co-simulated in Vivado HLS. The text file is then converted back to an image with the help of Matlab.

A. Algorithms Used

The algorithms used are DCT, FDCT and the Haar Wavelet Transform. The steps for each of these algorithms are given below.

1) Discrete Cosine Transform (DCT):
• Divide the image into 8x8 blocks of pixel values.
• From left to right and top to bottom, DCT is applied to every such block of pixels. For this, the DCT matrix is multiplied with the 8x8 block and then with the transpose of the DCT matrix [7].

\[ C = D A D^{T} \tag{1} \]

where:
C = resultant matrix of the DCT, which contains values of different frequencies, with the top left being DC and the bottom right being the maximum frequency.
D = DCT matrix, which is calculated using

\[ D_{i,j} = \begin{cases} \dfrac{1}{\sqrt{N}} & \text{if } i = 0 \\ \sqrt{\dfrac{2}{N}} \cos\dfrac{(2j+1)\,i\,\pi}{2N} & \text{otherwise} \end{cases} \]

A = 8x8 block of pixels.
D^T = transpose of the DCT matrix.
• Each resultant matrix is divided element-wise by the quantization matrix for lossy compression and then rounded to the nearest integer. Here, the low frequencies, which contain the maximum information of the image, are divided by smaller numbers than the high-frequency elements, which are divided by comparatively larger numbers.

\[ B_{i,j} = \mathrm{round}\left(\frac{C_{i,j}}{Q_{i,j}}\right) \tag{2} \]

where:
C = output matrix of the previous step.
Q = quantization matrix.
B = resultant matrix after quantization is applied on the C matrix.
• Now, the output matrix from the previous step is multiplied element-wise with the quantization matrix.

\[ X_{i,j} = \mathrm{round}(B_{i,j} \cdot Q_{i,j}) \tag{3} \]

where:
X = resultant matrix after quantization multiplication.
B = output matrix of the previous step.
Q = quantization matrix.
• After quantization, we again run the DCT on the output matrix of the previous step, in the reverse order.

\[ Y = D^{T} X D \tag{4} \]

where:
X = output matrix of the previous step.
D = DCT matrix.
D^T = transpose of the DCT matrix.
Y = final block of compressed 8x8 pixels.
• Combine all the 8x8 blocks of compressed pixels to construct the compressed image.

2) Fast Discrete Cosine Transform (FDCT):
• Divide the image into 8 blocks of pixel values.
• On that block of values, the Fast DCT is implemented using the signal flow graph [8] shown in Fig. 2. The directions of the signal flow graph are explained using

Fig. 3, and each block of the signal flow graph is explained using Fig. 4.

Fig. 2: Signal Flow Graph of FDCT

Fig. 3: Different blocks of Signal Flow Graph of FDCT

Fig. 4: Values of blocks of Signal Flow Graph of FDCT

• Each resultant matrix is divided by the quantization matrix for lossy compression and then rounded to the nearest integer.
• Now, the output matrix from the previous step is multiplied with the quantization matrix.
• After quantization, we again run the FDCT in reversed order to get the block of pixels in compressed form.
• Combine all the 8 blocks of compressed pixels to construct the compressed image.

3) Haar Wavelet Transform:
• Divide the image into 8x8 blocks of pixel values.
• From left to right and top to bottom, the Haar wavelet is applied to every such block of pixels. For this, the transpose of the Haar Transform matrix is multiplied with the 8x8 block and then with the Haar Transform matrix.

\[ B = H^{T} A H \tag{5} \]

where:
B = resultant matrix after the Haar Transformation is applied.
H = Haar matrix, which is given by

\[ H = \begin{bmatrix}
0.125 & 0.125 & 0.25 & 0 & 0.5 & 0 & 0 & 0 \\
0.125 & 0.125 & 0.25 & 0 & -0.5 & 0 & 0 & 0 \\
0.125 & 0.125 & -0.25 & 0 & 0 & 0.5 & 0 & 0 \\
0.125 & 0.125 & -0.25 & 0 & 0 & -0.5 & 0 & 0 \\
0.125 & -0.125 & 0 & 0.25 & 0 & 0 & 0.5 & 0 \\
0.125 & -0.125 & 0 & 0.25 & 0 & 0 & -0.5 & 0 \\
0.125 & -0.125 & 0 & -0.25 & 0 & 0 & 0 & 0.5 \\
0.125 & -0.125 & 0 & -0.25 & 0 & 0 & 0 & -0.5
\end{bmatrix} \]

A = 8x8 block of pixels.
H^T = transpose of the Haar Transform matrix.
• The values of the resultant matrix are then compared with a predetermined threshold value. If a value is less than the threshold, it is changed to 0; otherwise it is left unchanged.

\[ B_{i,j} = \begin{cases} B_{i,j} & \text{if } B_{i,j} \ge \text{Threshold value} \\ 0 & \text{otherwise} \end{cases} \]

• Now, we again run the Haar transform, but this time using the inverse of the Haar transform matrix and the inverse of its transpose.

\[ X = (H^{T})^{-1} B H^{-1} \tag{6} \]

where:
X = final block of compressed 8x8 pixels.
(H^T)^{-1} = inverse of the transpose of the Haar Transform matrix.
B = output matrix of the previous step.
H^{-1} = inverse of the Haar Transform matrix.
• Combine all the 8x8 blocks of compressed pixels to construct the compressed image.

IV. SIMULATION RESULTS

The simulation results have been obtained using the Vivado HLS C simulation feature, which builds and simulates the code. The text file containing the compressed pixel values generated by C simulation is transformed into a JPEG image using Matlab.

A. Image Output

Fig. 5 shows the original image and the images compressed using the various compression algorithms. The original image is of a man; it has dimensions of 1024*1024 and a size of 1024 KB.

Table I shows the image compression results for all three algorithms. Each algorithm is run on four images of different dimensions and sizes to validate its performance. A size reduction of roughly 74% to 89% was achieved (Table I).

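The Haar steps of Eqs. (5) and (6) can be sketched in the same way. The matrix H below is the one given in Section III; everything else is an illustrative assumption: the function names are hypothetical, the threshold is a caller-chosen value, and the inverse is computed from the observation that the columns of H are mutually orthogonal, so that H^-1 = diag(1/c) H^T, where c holds the squared column norms. The paper does not state how its inverse matrices were obtained.

```c
#define N 8

/* Haar matrix H as given in Section III. */
static const double H[N][N] = {
    {0.125,  0.125,  0.25,  0.0,   0.5,  0.0,  0.0,  0.0},
    {0.125,  0.125,  0.25,  0.0,  -0.5,  0.0,  0.0,  0.0},
    {0.125,  0.125, -0.25,  0.0,   0.0,  0.5,  0.0,  0.0},
    {0.125,  0.125, -0.25,  0.0,   0.0, -0.5,  0.0,  0.0},
    {0.125, -0.125,  0.0,   0.25,  0.0,  0.0,  0.5,  0.0},
    {0.125, -0.125,  0.0,   0.25,  0.0,  0.0, -0.5,  0.0},
    {0.125, -0.125,  0.0,  -0.25,  0.0,  0.0,  0.0,  0.5},
    {0.125, -0.125,  0.0,  -0.25,  0.0,  0.0,  0.0, -0.5}
};

/* Squared norm of each column of H, so that H^T H = diag(COLNORM2). */
static const double COLNORM2[N] = {0.125, 0.125, 0.25, 0.25, 0.5, 0.5, 0.5, 0.5};

/* Eq. (5): B = H^T A H. */
void haar_forward(double A[N][N], double B[N][N]) {
    double T[N][N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double acc = 0.0;
            for (int k = 0; k < N; k++) acc += H[k][i] * A[k][j];   /* T = H^T A */
            T[i][j] = acc;
        }
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double acc = 0.0;
            for (int k = 0; k < N; k++) acc += T[i][k] * H[k][j];   /* B = T H */
            B[i][j] = acc;
        }
}

/* Thresholding as stated in the text: values below the threshold become 0.
   The comparison is applied literally (no absolute value), an assumption. */
void haar_threshold(double B[N][N], double threshold) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (B[i][j] < threshold) B[i][j] = 0.0;
}

/* Eq. (6): X = (H^T)^-1 B H^-1, using (H^T)^-1 = H diag(1/c)
   and H^-1 = diag(1/c) H^T. */
void haar_inverse(double B[N][N], double X[N][N]) {
    double T[N][N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double acc = 0.0;
            for (int k = 0; k < N; k++) acc += H[i][k] / COLNORM2[k] * B[k][j];
            T[i][j] = acc;
        }
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double acc = 0.0;
            for (int k = 0; k < N; k++) acc += T[i][k] / COLNORM2[k] * H[j][k];
            X[i][j] = acc;
        }
}
```

Calling `haar_forward` followed directly by `haar_inverse` reproduces the input block; the loss is introduced only by `haar_threshold` in between.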
Fig. 5: Original and Compressed Image of Man. (a) Original Image, (b) Compressed Image using DCT, (c) Compressed Image using FDCT, (d) Compressed Image using HAAR

TABLE I: Image Compression Results

Image        Dimension    Original Size (KB)   Compressed Size (KB)   Reduction (%)
DCT Algorithm
Man          1024*1024    1024                 123                    87.98
Flower       3168*4752    1903                 426                    77.61
Promenade    3684*5540    4357                 749                    82.82
City         3264*4896    7926                 1296                   83.64
FDCT Algorithm
Man          1024*1024    1024                 122                    88.05
Flower       3168*4752    1903                 443                    76.72
Promenade    3684*5540    4357                 760                    82.56
City         3264*4896    7926                 1303                   83.56
HAAR Algorithm
Man          1024*1024    1024                 116                    88.67
Flower       3168*4752    1903                 495                    73.98
Promenade    3684*5540    4357                 795                    81.76
City         3264*4896    7926                 1300                   83.59

B. Performance Parameters

The quality of an image can be assessed using two methods: subjective and objective [9]. The subjective method refers to assessment by the human eye, whereas the objective method refers to mathematical criteria. The objective assessment can be obtained from performance parameters such as Mean Square Error (MSE), Peak Signal to Noise Ratio (PSNR) and Compression Ratio (CR). These are the most common quality assessment parameters for an image compression algorithm [10]. The CR, MSE and PSNR are calculated using the following formulas:

1) Compression Ratio (CR): This ratio defines the compression achieved for a given image. The equation is as follows:

\[ CR = \frac{\text{Size of Uncompressed Image}}{\text{Size of Compressed Image}} \tag{7} \]

2) Mean Square Error (MSE) of an image represents the cumulative square error between the compressed image and the original image. The equation is as follows:

\[ MSE = \frac{1}{PQ} \sum_{j=1}^{P} \sum_{i=1}^{Q} \left[ f_1(i,j) - f_2(i,j) \right]^2 \tag{8} \]

where:
P = height of the image.
Q = width of the image.
f_1 = pixel values of the original image.
f_2 = pixel values of the compressed image.

3) Peak Signal to Noise Ratio (PSNR) represents the measure of peak error. The equation is as follows:

\[ PSNR = 20 \log_{10} \frac{255}{\sqrt{MSE}} \tag{9} \]

In our work, we have calculated the MSE and PSNR while keeping the CR constant for an adequate comparison between the three algorithms.

TABLE II: Performance Parameters Results

Image        CR    MSE        PSNR (dB)
DCT Algorithm
Man          8     31.365     33.166
Flower       5     8.0327     39.082
Promenade    6     53.026     30.886
City         6     36.119     32.553
FDCT Algorithm
Man          8     63.667     30.094
Flower       5     5.15       41.00
Promenade    6     50.192     31.124
City         6     42.036     31.895
HAAR Algorithm
Man          8     44.869     31.611
Flower       6     9.112      38.535
Promenade    6     60.746     30.296
City         6     38.7456    32.249

Table II shows the performance parameter results for the obtained images. PSNR values for all algorithms were greater than 30 dB. For lossy image compression, the standard PSNR values for images having 8-bit depth are 30 dB to 50 dB [11].

C. Synthesis Report

We have used Xilinx's Vivado HLS workflow for designing the image compression algorithms. HLS provides a synthesis report about the performance and logic of the custom-generated hardware. Such knowledge provides advanced users with many insights into the effects of implementing the code on actual FPGA hardware. The Resources tab shows the resources utilized by the application.

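The comparison in this section contrasts each algorithm with and without "accelerators", that is, Vivado HLS pragmas. The paper names pipelining but does not list the exact directives or where they were placed, so the following is only an illustrative sketch: `#pragma HLS PIPELINE` is a real Vivado HLS directive, but its placement on this loop, the `II=1` target and the function name `matmul8_hls` are assumptions. A regular C compiler ignores the pragma, so the function also runs in C simulation.

```c
#define N 8

/* 8x8 integer matrix product of the kind used in Eqs. (1) and (5). */
void matmul8_hls(int a[N][N], int b[N][N], int out[N][N]) {
row_loop:
    for (int i = 0; i < N; i++) {
col_loop:
        for (int j = 0; j < N; j++) {
/* Assumed placement: ask HLS to start a new (i, j) iteration every clock
   cycle; the inner product loop is then unrolled by the tool. */
#pragma HLS PIPELINE II=1
            int acc = 0;
product_loop:
            for (int k = 0; k < N; k++) {
                acc += a[i][k] * b[k][j];
            }
            out[i][j] = acc;
        }
    }
}
```

Pipelining of this kind needs several array elements in the same clock cycle, which is consistent with the higher BRAM, LUT and flip-flop utilization reported below for the accelerated DCT and HAAR designs.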
We have made a comparison of the three algorithms based on each of these reports. We have used the image man.bmp for our analysis.

1) Latency is defined as the number of clock cycles required to produce an output [12].

Fig. 6: Latency achieved

From Fig. 6, we can see that the use of accelerators within the C code implementation results in a decrease of the latency. On the contrary, latency increases in the FDCT algorithm, indicating that using acceleration increases the wait state for the execution of functions because of loop-carried dependencies, thereby causing delays. The FDCT algorithm is the quickest algorithm of the three. This is because the algorithm runs in O(n log n), as opposed to O(n^3) for the DCT and HAAR algorithms.

2) Hardware Utilization: The resources utilized by each algorithm are indicated in the charts.

a) BRAM 18K
BRAM 18K is a dual-port RAM module which provides on-chip data storage. 18K indicates the number of bits that can be stored [12].

Fig. 7: BRAM 18K Utilization

Fig. 7 shows the BRAM utilization increasing when accelerators are used in DCT and HAAR. This is because the pipelining feature of the accelerators increases the parallel, same-clock-cycle memory accesses to different locations, thereby increasing the BRAM utilization.

b) DSP 48E
DSP 48E is the Arithmetic and Logical Unit of the FPGA [12]. It is responsible for complex computations such as addition, subtraction, multiplication and summation.

Fig. 8: DSP 48E Utilization

The DSP 48E block utilization can be seen in Fig. 8. The utilization when accelerators are not used is high, as the computation of each equation makes use of multiple DSP blocks and keeps using them until the complete computation is over. Using accelerators decreases these parallel computations within an equation and frees up the resource as soon as that portion of the computation is over, making it available for other computations.

c) Look Up Tables
A Look Up Table (LUT) is the block which implements any logic function of n variables [12].

Fig. 9: Look Up Tables Utilization

Fig. 9 depicts the increase of LUT utilization in DCT and HAAR when accelerators are used, as multiple functions are implemented simultaneously. The FDCT, on the other hand, shows decreased use.

d) Flip Flops
The Flip Flop performs the function of basic data input, clock input and output, acting as a latch. It holds the data for a time period beyond a clock cycle [12].

Fig. 10: Flip Flops Utilization

From Fig. 10 it can be seen that the flip flop utilization increases when accelerators are used in DCT and HAAR. This is because more flip flops are required to hold the data for more clock cycles, as multiple concurrent computations take place.

V. CONCLUSION & FUTURE WORKS

In this paper, we have implemented three different image compression algorithms on the Vivado HLS platform using the C language. The C simulation and synthesis of each algorithm are generated and the resultant image output is obtained using Matlab. A thorough analysis concludes that each of these algorithms has its own share of advantages and disadvantages. This aids the appropriate selection of the algorithm for a particular image application.

For a given compression ratio, DCT provides the best PSNR, that is, the best image quality, while compromising on speed. For high-speed and critical applications, FDCT has the lowest latency, approximately 1% of the latency required by the DCT, making it the fastest, although it suffers from poor PSNR. The Haar algorithm, on the other hand, is a middle ground between DCT and FDCT. It is faster than DCT but has a poorer PSNR, whereas it has a better PSNR than FDCT but is slower.

We have used the Vivado HLS workflow, which has reduced the development time of the application. We have made use of an Artix-7 FPGA having the part number XC7A100T-1CSG324C. It has the following features: Logic Slices 15,850 (each with four 6-input LUTs and 8 flip-flops), Block RAM 4,860 Kbits, Clock Tiles 6 (each with PLL), DSP Slices 240, Internal clock 450MHz+. The utilization of its resources by our algorithms is analyzed using the Synthesis Report generated within Vivado HLS. In general, using the accelerator pragmas provided by Xilinx optimized the algorithms' performance, which resulted in better throughput and better speed but with an overall increase in hardware utilization.

The current work is limited to the Vivado HLS platform. Further development of the application can be done on the Vivado Design Suite. The HLS C application can be exported as an Intellectual Property (IP) to the Vivado Design Suite and further development of the hardware design can be done around it. This design can be used to program FPGA hardware.

VI. ACKNOWLEDGMENT

We would like to express our gratitude towards the Electronics Engineering Department of Sardar Patel Institute of Technology for providing us constant guidance and facilities.

REFERENCES

[1] D. Tsiktsiris, D. Ziouzios, and M. Dasygenis, "A High-Level Synthesis Implementation and Evaluation of an Image Processing Accelerator," Technologies, vol. 7, no. 1, p. 4, Dec. 2018.

[2] Y. Li, W. Jia, B. Luan, Z. H. Mao, H. Zhang, and M. Sun, "A FPGA Implementation of JPEG Baseline Encoder for Wearable Devices," Proc. IEEE Annu. Northeast Bioeng. Conf., 2015, doi: 10.1109/NEBEC.2015.7117173.

[3] A. Shawahna, M. Haque, and A. Amin, "JPEG Image Compression using the Discrete Cosine Transform: An Overview, Applications, and Hardware Implementation," 2019.

[4] M. B. Mutgekar and P. C. Bhaskar, "Analysis of DCT and FAST DCT using soft core processor," 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI), Tirunelveli, India, 2019, pp. 1128-1132, doi: 10.1109/ICOEI.2019.8862715.

[5] T. G. Anitha and S. Ramachandran, "Novel algorithms for 2-D FFT and its inverse for image compression," 2013 International Conference on Signal Processing, Image Processing & Pattern Recognition, Coimbatore, 2013, pp. 62-65, doi: 10.1109/ICSIPR.2013.6497959.

[6] R. Praisline Jasmi, B. Perumal, and M. Pallikonda Rajasekaran, "Comparison of image compression techniques using huffman coding, DWT and fractal algorithm," 2015 International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, 2015, pp. 1-5, doi: 10.1109/ICCCI.2015.7218137.

[7] K. Cabeen and P. Gent, "Image Compression and the Discrete Cosine Transform," Math 45, College of the Redwoods.

[8] Stanford University, EE398A - Image and Video Compression: Transform Coding. (Fast scaled length-8 DCT-II and DCT-III algorithms, by Arai, Agui, Nakajima, 1988.)

[9] A. Horé and D. Ziou, "Image Quality Metrics: PSNR vs. SSIM," 2010 20th International Conference on Pattern Recognition, Istanbul, 2010, pp. 2366-2369, doi: 10.1109/ICPR.2010.579.

[10] I. Avcibas, B. Sankur, and K. Sayood, "Statistical evaluation of image quality measures," Journal of Electronic Imaging, vol. 11, no. 2, pp. 206-223, 2002.

[11] S. Ilic, M. Petrovic, B. Jaksic, P. Spalevic, Lj. Lazic, and M. Milosevic, "Experimental analysis of picture quality after compression by different methods," Przeglad Elektrotechniczny, vol. 89, no. 11, pp. 190-194, 2013.

[12] Xilinx Inc., "Vivado Design Suite User Guide: High-Level Synthesis," Xilinx's SDAccel Development Environment Help, UG902, 2018.
