Comparison of Parallel Quick and Merge Sort Algorithms On Architecture With Shared Memory
Abstract — Sorting algorithms are widely used in more complex algorithms that rely on the correctness of their output. Slowly but surely, the sequential approach is losing its significance because of modern hardware constraints. Programs and algorithms must be redesigned and adjusted so that they can use the full potential of new hardware, and they should be created using a parallel approach. Quick-Sort and Merge-Sort are sorting algorithms that are easy to understand, well understood in parallel algorithm theory, and popular representatives of the rich class of divide and conquer methods. In this paper, a comparison of the performance of sequential and parallel implementations of the Merge-Sort and Quick-Sort algorithms is presented. The implementations use the parallel programming platform OpenMP and are run on mainstream multi-core computers. Performance was measured on multiple Intel i3, Intel i5 and Intel i7 processors and on an Intel Xeon CPU E5-2676.

The OpenMP environment and the way the mentioned algorithms work are described briefly. Results, speedups and comparisons are illustrated in this paper.

Keywords — Sorting algorithm; Parallel Quick-Sort; Parallel Merge-Sort; Speedup

I. INTRODUCTION

Sorting is one of the most critical problems in computer science, and sorting algorithms are frequently used by computer scientists in search algorithms when picking relevant results, sorting data, converting data and producing human-readable output.

Because of their wide usage and their inclusion in other, more complex algorithms, it is mandatory that they produce correct results and do so as fast as possible. For all these reasons, many sorting algorithms have been developed, such as Quick-Sort, Merge-Sort, Selection-Sort and Insertion-Sort. Quick-Sort and Merge-Sort have been chosen because they are efficient divide and conquer sorting algorithms and they are easier to understand than other divide and conquer methods.

OpenMP was chosen for parallelization in this experiment because it is easier to understand than other thread libraries, works on a large number of shared memory computers, and is standardized and well documented.

II. SORTING ALGORITHMS AND THEIR PARALLELIZATION

A. Quick-Sort

Quick-Sort is an efficient sorting algorithm that serves as a systematic method for placing the elements of an array in order. Developed by Tony Hoare in 1959 and published in 1961, it is still a commonly used sorting algorithm. When implemented well, it can be about two or three times faster than its main competitors, Merge-Sort and Heap-Sort [1]. Mathematical analysis of Quick-Sort shows that, on average, the algorithm takes O(n log n) comparisons to sort n items. In the worst case it makes O(n^2) comparisons, though this behavior is rare.

Quick-Sort is a divide and conquer algorithm. It first divides the main array into two smaller subarrays: the low elements and the high elements. Quick-Sort can then recursively sort the subarrays.

The steps are:
1. Pick an element, called a pivot, from the array.
2. Divide the array into two subarrays: a lower subarray containing numbers smaller than the pivot, and an upper subarray containing numbers larger than or equal to the pivot.
3. Recursively apply the above steps to the subarray of elements with smaller values and separately to the subarray of elements with greater values.

The base case of the recursion is an array of size zero or one, which is in order by definition and never needs to be sorted. The final sorted result is the concatenation of the sorted lower array, the pivot, and the sorted upper array.
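The Quick-Sort steps listed above can be sketched in C as follows. This is a minimal illustration and not the implementation measured in the paper: the choice of C, the function names `quick_sort`, `quick_sort_range` and `swap_int`, and the use of an in-place Hoare-style partition are all assumptions made for the example. The pivot is taken from the middle index of the current partition, which is one of several possible pivot choices. In this in-place variant, elements equal to the pivot may end up on either side of the split.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: exchange two integers. */
static void swap_int(int *x, int *y)
{
    int t = *x;
    *x = *y;
    *y = t;
}

/* Sort a[lo..hi] (inclusive bounds) in place.
 * Assumption: the pivot is the element at the middle index. */
static void quick_sort_range(int *a, long lo, long hi)
{
    if (lo >= hi)
        return; /* zero or one element: already sorted */

    int pivot = a[lo + (hi - lo) / 2];
    long i = lo, j = hi;

    /* Hoare-style partition: move small values left, large values right. */
    while (i <= j) {
        while (a[i] < pivot)
            i++;
        while (a[j] > pivot)
            j--;
        if (i <= j) {
            swap_int(&a[i], &a[j]);
            i++;
            j--;
        }
    }

    /* Now a[lo..j] <= pivot and a[i..hi] >= pivot, with j < i. */
    quick_sort_range(a, lo, j);
    quick_sort_range(a, i, hi);
}

/* Hypothetical entry point: sort an array of n integers. */
static void quick_sort(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (n > 1)
        quick_sort_range(a, 0, (long)n - 1);
}
```

Calling `quick_sort(v, n)` on an array `v` of `n` integers sorts it in ascending order without allocating additional arrays.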
The pivot selection and partitioning steps can be done in several different ways, and the choice of a specific implementation scheme greatly affects the performance of the algorithm. The pivot can be selected randomly, as the first element of the array, as the last element of the array, as the element at the middle index of the partition, or as the median of the first, middle and last elements of the partition. For this experiment, the pivot was chosen as the element at the middle index of the partition.

B. Parallel Quick-Sort

The parallel Quick-Sort algorithm can be implemented in different ways. This section summarizes the one used for this experiment. The main idea is to partition the original array using a single thread and then assign the lower subarray to the same thread and the upper subarray to another thread for further partitioning. This is the case when two threads are available. When four threads are available, this step is repeated one more time.

Fig. 1. Visualization of parallel Quick-Sort [2]

C. Merge-Sort

Merge-Sort, the main competitor of efficient sorting algorithms like Quick-Sort, is one of the most efficient algorithms for sorting the elements of a given array.

This algorithm was developed by John von Neumann in 1945. A bottom-up version was published by Goldstine and von Neumann in 1948 [3].

Time complexity analysis shows that Merge-Sort has O(n log n) time complexity in the best case. The time complexity is the same in the worst and average cases [4].

The problem is that this algorithm is memory inefficient, because every time the array is divided, two new arrays are allocated in memory. In contrast, Quick-Sort works on the same array throughout its execution. Because of that, Merge-Sort is worse than Quick-Sort in terms of data caching.

Just like Quick-Sort, Merge-Sort is a divide and conquer algorithm. It divides the array into two subarrays of the same length and recursively calls itself on both subarrays. The algorithm keeps dividing the subarrays into new subarrays until their length is 0 or 1, because then they are sorted by definition. When the subarrays are sorted, they are merged [5].

The steps of the Merge-Sort algorithm are:
1. Find the middle point of the main array A, where it will be divided into two halves: subarray B from the first to the middle element of A, and subarray C from the element after the middle to the last element of A:
   middle = (first(A) + last(A)) / 2;
2. Call Merge-Sort on subarray B:
   merge-sort(A, first, middle);
3. Call Merge-Sort on subarray C:
   merge-sort(A, middle + 1, last);
4. Merge the two sorted subarrays B and C:
   merge(A, first, middle, last);

These steps describe the basic 2-way algorithm, where the array is divided into 2 subarrays. Merge-Sort can also be k-way, where k is the number of subarrays obtained by dividing an array.

D. Parallel Merge-Sort

The idea behind the parallelization of Merge-Sort is to call the sequential version of the algorithm on both subarrays obtained from the given array, with the Merge-Sort of each subarray assigned to a different thread. In contrast, the Quick-Sort parallelization is based on recursive calls of the algorithm itself on different threads.

The steps of the parallel Merge-Sort algorithm are the same as in the basic sequential algorithm. The only difference is that steps 2 and 3 are performed in parallel on two different threads. The expected speedup, defined as the quotient of the execution time on 1 thread and the execution time on 2 threads, is slightly less than 2 because of the parallelization overhead. Measurements and performance comparisons are shown in one of the following sections.

The problem with the implementation used in the experiment is that it is not scalable, which means that performance will not improve when more threads are used. Following this concept, a 3-way Merge-Sort should be used for 3 threads, a 4-way Merge-Sort for 4 threads, and a K-way Merge-Sort for K threads. In theory, when using K threads and K-way Merge-Sort, the speedup should be close to K.

There are other implementations that may or may not be scalable.

It is important to note that in a large number of Merge-Sort implementations, the function that merges the sorted subarrays in the correct order is not parallel.
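The two-thread scheme described for parallel Merge-Sort could look roughly like the following C sketch. It is an illustration under stated assumptions, not the code used in the experiment: the names `merge`, `merge_sort_seq` and `merge_sort_two_threads` are hypothetical, the OpenMP `sections` construct is only one way to place the two halves on different threads, and a single temporary array of the same length as the input is used. The final merge is sequential, as in the implementations mentioned above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Merge the sorted runs a[lo..mid] and a[mid+1..hi] using tmp as scratch. */
static void merge(int *a, int *tmp, size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid + 1, k = lo;

    while (i <= mid && j <= hi)
        tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
    while (i <= mid)
        tmp[k++] = a[i++];
    while (j <= hi)
        tmp[k++] = a[j++];

    memcpy(a + lo, tmp + lo, (hi - lo + 1) * sizeof *a);
}

/* Sequential 2-way Merge-Sort on a[lo..hi] (inclusive bounds). */
static void merge_sort_seq(int *a, int *tmp, size_t lo, size_t hi)
{
    if (lo >= hi)
        return; /* one element: sorted by definition */

    size_t mid = lo + (hi - lo) / 2;
    merge_sort_seq(a, tmp, lo, mid);
    merge_sort_seq(a, tmp, mid + 1, hi);
    merge(a, tmp, lo, mid, hi);
}

/* Hypothetical entry point: sort each half on its own thread, then merge.
 * Returns 0 on success and -1 if the temporary array cannot be allocated. */
static int merge_sort_two_threads(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (n < 2)
        return 0;

    int *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;

    size_t mid = (n - 1) / 2;

    /* The halves touch disjoint index ranges of a and tmp, so the two
     * sections do not race. Without OpenMP support the pragmas are
     * ignored and the halves are simply sorted one after the other. */
    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        merge_sort_seq(a, tmp, 0, mid);

        #pragma omp section
        merge_sort_seq(a, tmp, mid + 1, n - 1);
    }

    merge(a, tmp, 0, mid, n - 1); /* sequential final merge */
    free(tmp);
    return 0;
}
```

With GCC or Clang, the sketch would typically be compiled with `-fopenmp` so that the pragmas take effect. Because only two sections exist, adding more threads does not speed it up, which matches the scalability limitation described above.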
An improvement in the parallelization of the Merge-Sort algorithm is achieved by parallelizing the merging operation. This is done by converting to OpenMP a platform-specific technique developed for the .NET Task Parallel Library [6].

... obtained using Intel Core i3 and Intel Core i5 processors, while those shown by Fig. 3 are obtained using Intel Core i7 and Intel Xeon E5 processors.
B. Experiment Results for Merge-Sort
For this experiment, an array of 1024000 unsorted random integers between 1 and 1024000 was used, which is approximately 4000 KB. Merge-Sort is memory inefficient, as mentioned earlier, so an array with more elements could not be used under the current conditions. One temporary array of the same length is needed for the Merge-Sort algorithm, so approximately 8000 KB of memory is allocated at the start of the experiment. After that, the same amount of memory is allocated in every recursion step.
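The memory figures quoted above can be checked with a small calculation. The sketch below assumes 4-byte (32-bit) integers and 1 KB = 1024 bytes, neither of which is stated explicitly in the text. The helper name `arrays_kib` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size in KB (1 KB = 1024 bytes) of `count` arrays of n 32-bit integers.
 * Assumption: each element occupies sizeof(int32_t) = 4 bytes. */
static size_t arrays_kib(size_t n, size_t count)
{
    assert(count > 0);
    return n * sizeof(int32_t) * count / 1024;
}
```

Under these assumptions, `arrays_kib(1024000, 1)` gives 4000 for the input array alone, and `arrays_kib(1024000, 2)` gives 8000 for the input array together with one temporary array of the same length.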
Fig. 6 shows the results obtained from executing sequential Merge-Sort (marked green) and parallel Merge-Sort (marked blue) on Intel Core i3 and i5 processors. Two threads are used for parallel Merge-Sort. Time is measured in milliseconds.