Design and Implementation of Sorting Algorithms Based On FPGA
Abstract— Analysis of the efficiency of sorting algorithms is often confined to software simulation. In practice, real-time operation needs a faster sorting system, because software sorting is less effective. With the advancement of VLSI design, a sorting algorithm can easily be implemented as a block in any system and used whenever necessary. In this paper, the three most common sorting algorithms (bubble sort, selection sort and insertion sort) are implemented using Verilog HDL. For all algorithms the RTL (Register Transfer Level) diagram is examined and the associated timing diagram is analyzed for the worst-case scenario. A comparative analysis of the algorithms is then given from the analysis and synthesis report and from some operating parameters. The algorithm with the better hardware performance can be used as a block in any system with parallelism.

Keywords— FPGA (Field Programmable Gate Array), RTL, Verilog HDL, sorter implementation, timing analysis.

I. INTRODUCTION

Sorting is one of the fundamental problems in computer science and has been studied for decades. Almost all fields of technology require sorting operations, such as image processing, multimedia, scientific computing and networking. Over the years many sorting algorithms have been proposed, each characterized by a measure of how much time it takes to complete the sort as the problem size grows. While all algorithms take more time as the number of elements increases, some are slower than others. It is not efficient to execute sort programs on today's general-purpose computers, due to the inherent algorithmic difference between computation and sorting in real-time operation. Also, a software algorithm cannot achieve high speed. In every general-purpose computer there is a block for arithmetic and logical operations, which sometimes needs sorted data [1] [2] [3]. So, it is natural to implement a dedicated hardware block inside the ALU (Arithmetic Logic Unit) for sorting.

With the rapid development of the VLSI design field, such a sorting module can be implemented using an FPGA (Field Programmable Gate Array) on a single chip [2]. An FPGA is a type of reconfigurable gate array in which the designer can easily change the design with gate-level parallelism [4]. A large amount of data proceeds from a storage device to an ALU, and the sorted result is written back to the storage device. This means that sorting a data set requires sequential transfer to and from a sorter. In an FPGA a large amount of memory is available and can be accessed with a small number of pins. So, an FPGA implementation gives better speed performance with parallelism. Among the sorting algorithms, some are most efficient for small amounts of data and some are effective for large amounts of data. Bubble sort, selection sort and insertion sort are the most common and basic sorting algorithms, and for small amounts of data they are faster than complex sorting algorithms such as merge sort or quick sort. In this work, the bubble sort, selection sort and insertion sort algorithms are implemented using Verilog HDL. The RTL diagram for each implementation is given and the associated timing diagram is examined for the worst-case situation (for example, the worst case for bubble sort is data in reverse order). Some operating parameters are studied. For the comparative analysis, the synthesis report and the timing summary report are tabulated, and the comparison result is given based on analysis of the parameters.

The rest of this paper is organized in three sections. Section II describes the methodology of the implementation for all three algorithms. Section III describes the RTL diagram and comparative timing analysis. Section IV gives the concluding part.

II. METHODOLOGY

A. Bubble sort

In this sorting algorithm, the list to be sorted is stepped through repeatedly; each pair of adjacent elements is compared and swapped if they are in the wrong order (the latter one is smaller than the previous one). After each iteration, one less element (the last one) needs to be compared, until there are no more elements left to compare. If we have n elements in total, then we need to repeat the swapping process n-1 times. The implementation of bubble sort is simple compared to other algorithms, but its performance makes it impractical for large data sets. In the worst-case scenario (numbers in reverse order) bubble sort has a complexity of O(n^2), where n is the number of items being sorted [3]. The flowchart for the hardware implementation of this sorting algorithm is given in Fig 1.

B. Selection sort

The selection sort algorithm sorts an array of numbers by repeatedly finding the smallest element in the unsorted part and putting it in the first position of that part. It maintains two subarrays within the given array: the subarray that is already sorted and the remaining subarray that is unsorted. The procedure is: locate the smallest element in the array, interchange it with the element in the first position, locate the second smallest element and interchange it with the element in the second position, and keep going until the array is sorted. For the worst, average and best cases the complexity is the same as the worst case of bubble sort, O(n^2), because the number of comparisons depends only on n and not on the amount of order in the file. Figure 2 represents the implementation procedure of the selection sort algorithm.

Authorized licensed use limited to: University of Wollongong. Downloaded on May 31,2020 at 08:16:07 UTC from IEEE Xplore. Restrictions apply.
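The selection sort procedure described above can be sketched as a small software reference model. This is only an illustrative Python model, not the Verilog implementation of this paper; the function name `selection_sort` and the comparison counter are assumptions introduced here to show that the amount of comparison work is the same for any input order.

```python
def selection_sort(values):
    """Return a sorted copy of `values` and the number of comparisons made.

    Hypothetical reference model; the names are illustrative only.
    """
    data = list(values)          # work on a copy, like an internal register array
    n = len(data)
    comparisons = 0
    for i in range(n - 1):
        smallest = i             # index of the smallest element seen so far
        for j in range(i + 1, n):
            comparisons += 1
            if data[j] < data[smallest]:
                smallest = j
        # Move the smallest remaining element to the front of the unsorted part.
        data[i], data[smallest] = data[smallest], data[i]
    return data, comparisons


print(selection_sort([9, 7, 5, 3, 1]))   # ([1, 3, 5, 7, 9], 10)
print(selection_sort([5, 1, 9, 3, 7]))   # ([1, 3, 5, 7, 9], 10)
```

Both calls perform n(n-1)/2 = 10 comparisons for n = 5, which is consistent with the complexity being the same in the best, average and worst cases.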
Fig 1. Flowchart for bubble sort implementation. [Flowchart steps: Start; input, output, and register declaration; on the positive clock edge, Input -> data (reg); for i = 1 to n; for j = 1 to n-i; if true, exchange array[i] and array[i+1]; end.]
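The loop structure of the bubble sort flowchart in Fig 1 (an outer loop over passes and an inner loop over adjacent pairs) can be sketched as a software reference model. This is a minimal Python sketch under stated assumptions, not the Verilog code of this paper: the function name `bubble_sort` and the swap counter are hypothetical, and the model uses n-1 passes with zero-based indices.

```python
def bubble_sort(values):
    """Return a sorted copy of `values` and the number of swaps performed.

    Hypothetical reference model; the names are illustrative only.
    """
    data = list(values)                  # copy of the input, like the internal array
    n = len(data)
    swaps = 0
    for i in range(1, n):                # n-1 passes over the data
        for j in range(n - i):           # the last i-1 elements are already in place
            if data[j] > data[j + 1]:    # adjacent pair in the wrong order
                data[j], data[j + 1] = data[j + 1], data[j]
                swaps += 1
    return data, swaps


print(bubble_sort([9, 7, 5, 3, 1]))   # ([1, 3, 5, 7, 9], 10)
print(bubble_sort([1, 3, 5, 7, 9]))   # ([1, 3, 5, 7, 9], 0)
```

The reverse-ordered input needs n(n-1)/2 = 10 swaps for n = 5, while the already sorted input needs none, which illustrates why reverse order is the worst case.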
III. RESULT

The methodologies described in the above sections are implemented using Quartus Prime software and the Xilinx ISE 4.7 design suite for simulating chip performance. The main working procedures are implementing the processes, giving the entire view of the RTL diagram and timing diagram, and analyzing the operating parameters to determine the best algorithm.

A. RTL diagram

For all three sorting algorithms the initial implementation procedures are the same, as shown in the flow diagrams. At the positive clock edge, the input numbers to be sorted are stored in intermediate registers. The numbers are then passed to another register that acts as a one-dimensional array. The main algorithm is then performed on this array. Lastly, the sorted data are stored in a register for output. For all the algorithms, 8-bit registers are used for all operations. Fig 4, 5 and 6 show the RTL diagrams for bubble, selection and insertion sort respectively. The rectangular boxes shown in the figures are the input, output and clock pins.

Fig 6. RTL diagram for insertion sort.

B. Timing diagram

To simulate the implemented chip, the ISim simulator is used. For bubble sort and insertion sort, the numbers are given in reverse order, as this is the worst-case situation. As selection sort does not depend on the initial order of the numbers, a random input is given. The timing diagrams for bubble, selection and insertion sort are given in Fig 7, 8 and 9 respectively.

Fig 7. Timing diagram for bubble sort.

C. Comparative analysis

TABLE I. BASIC OPERATING PARAMETER

Parameter                         Value
Vcc                               1.1 V
I/O standard voltage              2.5 V
Junction temperature              27.7 °C
Total thermal power estimation    359.5 mW
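The worst-case stimulus used for the timing diagrams (reverse-ordered numbers held in 8-bit registers) can be illustrated with a small software model. This is a hedged Python sketch and not the test bench of this paper: `reverse_ordered_stimulus` and `insertion_sort` are hypothetical names, and the insertion sort shown is the textbook algorithm, assumed here because the implementation itself is not listed.

```python
def reverse_ordered_stimulus(count, width=8):
    """Return `count` distinct descending values that fit in `width` bits."""
    max_value = (1 << width) - 1
    if not 0 <= count <= max_value + 1:
        raise ValueError("count must fit in the value range of the register width")
    return [max_value - k for k in range(count)]


def insertion_sort(values):
    """Return a sorted copy of `values` and the number of element shifts."""
    data = list(values)
    shifts = 0
    for i in range(1, len(data)):
        key = data[i]
        j = i - 1
        # Shift larger elements one place right to open a slot for `key`.
        while j >= 0 and data[j] > key:
            data[j + 1] = data[j]
            shifts += 1
            j -= 1
        data[j + 1] = key
    return data, shifts


stimulus = reverse_ordered_stimulus(8)
sorted_data, shifts = insertion_sort(stimulus)
print(stimulus)      # [255, 254, 253, 252, 251, 250, 249, 248]
print(sorted_data)   # [248, 249, 250, 251, 252, 253, 254, 255]
print(shifts)        # 28, i.e. 8 * 7 / 2
```

With reverse-ordered input every element must be shifted past all elements before it, so the shift count reaches its maximum of n(n-1)/2; an already sorted input needs no shifts.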
TABLE I shows some basic operating parameters. These parameters mainly depend on the vendor family, i.e. on which chip is selected to implement and simulate the design. In all three implementations the Vcc and I/O voltage are the same. Only a slight change is seen in junction temperature and total thermal power estimation, and it is negligible. So, a per-algorithm comparison is not shown in TABLE I.

TABLE II. ANALYSIS AND SYNTHESIS REPORT

                                        Bubble sort    Selection sort    Insertion sort
ALM needed                              406            692               438
Average fanout                          3.34           3.94              3.44
Number of slices LUTs                   528            1675              983
Average LUT depth                       23.74          77.11             19.88
Total current drawn                     51.58 mA       51.56 mA          51.55 mA
Clock to setup on destination clock     19.795         47.056            17.364

TABLE II shows the comparative analysis and synthesis report for the implemented designs. Firstly, the Adaptive Logic Module (ALM) is the basic building block of the supported device families and is designed to maximize performance and resource usage. An implementation needs to use ALMs efficiently. Selection sort uses far more ALMs than the others, so it is inefficient. Bubble sort uses slightly fewer ALMs than insertion sort, so it seems to be a little more efficient. But an implementation should also require less time to process its input, and the efficiency of insertion sort lies in its timing performance. LUTs (Look Up Tables) are usually read-only and their content can only be changed during FPGA configuration. Adding more LUTs is either too expensive or not very useful. Considering the number of slices LUTs, insertion sort uses a medium number of LUTs and bubble sort uses the fewest. But when average LUT depth is considered, insertion sort has the smallest depth of the three. So, insertion sort is more efficient and faster than the others. The current drawn is the same for all implementations. For the clock to setup on destination clock, insertion sort has the smallest value.

TABLE III. TIMING SUMMARY

                                            Bubble sort    Selection sort    Insertion sort
Clock period                                16.729 ns      41.499 ns         14.931 ns
Clock frequency                             59.775 MHz     24.097 MHz        66.975 MHz
Minimum input arrival time before clock     0.302 ns       0.307 ns          0.288 ns
Maximum output required time after clock    0.640 ns       0.640 ns          0.640 ns

TABLE III shows the estimated timing summary for the synthesis of all programs. Here, the minimum period after synthesis is an estimate of the clock period for signals inside the implementation. It is calculated from the worst-case path timing from clock edge to clock edge for flip-flops within the design. For insertion sort it is smaller than for the other two, so insertion sort can operate at a higher clock frequency than the other two. The clock frequency is the reciprocal of the clock period; for insertion sort, 1/14.931 ns is approximately 66.97 MHz. Whether the designs will actually work at the higher clock depends on the remaining two entries in TABLE III. The minimum input arrival time before clock is the required setup time from the worst-case top-level design input to the clock. The maximum output required time after clock is the worst-case delay of the top-level design from the clock to the external outputs. The main significance of these two values is that, if they are less than the minimum period, the implementation may run at its highest clock frequency. Insertion sort performs better than bubble sort and selection sort, and selection sort has the worst performance.

IV. CONCLUSION

In this paper three common sorting algorithms are implemented using FPGA. The RTL diagram, timing diagram and comparative analysis are discussed. From our point of view, the insertion sort algorithm shows much faster operation when implemented in hardware. Although these algorithms are less effective on large data sets, if the data set can be divided into several parts and passed to several blocks (designing several sorter modules within a larger module) for sorting, then the operation can be faster and will preserve the parallelism [3] [5] of the system. In future, we will develop a new optimized sorting algorithm for computer processing.

REFERENCES

[1] Magesh V., Megavarnan S., Pragadish A., Saravanan S., "FPGA Implementation of Sorting Algorithms," International Journal for Technological Research in Engineering, Volume 5, Issue 8, April 2018.
[2] Gayathri K, Harshini V S, Dr Senthil Kumar K K, "Hardware implementation of sorting algorithm using FPGA," IJARIIE, ISSN(O)-2395-4396, Vol-4, Issue-2, 2018.
[3] Ashrak Rahman Lipu, Ruhul Amin, Md. Nazrul Islam Mondal, Md. Al Mamun, "Exploiting Parallelism for Faster Implementation of Bubble Sort Algorithm Using FPGA," 2nd International Conference on Electrical, Computer & Telecommunication Engineering (ICECTE), 8-10 December 2016, Rajshahi-6204, Bangladesh.
[4] Dmitri Mihhailov, Valery Sklyarov, Iouliia Skliarova, Alexander Sudnitson, "Hardware implementation of recursive sorting algorithms," 2011 International Conference on Electronic Devices, System and Applications (ICEDSA).
[5] Stephan Olariu, M. Cristina Pinotti, Si Qing Zheng, "An Optimal Hardware-Algorithm for Sorting Using a Fixed-Size Parallel Sorting Device," IEEE Transactions on Computers, Vol. 49, No. 12, December 2000.