IEEE INDICON 2015 1570168483
978-1-4673-6540-6/15/$31.00 ©2015 IEEE
Authorized licensed use limited to: Ingrid Nurtanio. Downloaded on October 01, 2020 at 02:18:44 UTC from IEEE Xplore. Restrictions apply.

A Detailed Experimental Analysis of Library Sort Algorithm

Neetu Faujdar
Department of CSE
Jaypee University of Information Technology, Waknaghat
Solan, India
[email protected]

Satya Prakash Ghrera
Department of CSE & IT
Jaypee University of Information Technology, Waknaghat
Solan, India
[email protected]

Abstract: One of the basic problems in computer science is to arrange items in lexicographic order, and sorting remains a major research topic with a large number of algorithms. This paper presents the implementation and a detailed analysis of library sort, also called gapped insertion sort: a sorting algorithm that uses insertion sort with gaps. Insertion sort takes $O(n^2)$ time because each insertion takes $O(n)$ time, whereas library sort has insertion time $O(\log n)$ with high probability, so its total running time is $O(n \log n)$ with high probability. Library sort thus has a better running time than insertion sort, but it also has some open issues. The first issue is the value of the gap, denoted by 'ε': a range for the gap is given, but it has not yet been implemented to check that the given range satisfies the concept of the library sort algorithm. The second issue is re-balancing, which adds to the cost and running time of the algorithm. The third issue is that only a theoretical concept of library sort has been given; it has not been implemented. To address these issues, we have implemented library sort, carried out a detailed experimental analysis, and measured its performance on a dataset.

Keywords: Sorting; Insertion sort; Library sort; Time Complexity; Space Complexity.

I. INTRODUCTION

In computer science, a sorting algorithm [2] is an algorithm that arranges a list of items in a certain order. Insertion sort iterates over the input, takes one input element on each repetition, and puts it into the sorted output list; the process repeats until no input element remains unprocessed. Insertion sort [10] is inefficient on a large number of items, as it takes $O(n^2)$ time in the worst case; its best case occurs when the data is already sorted, where it takes $O(n)$ time. Insertion sort is an adaptive [3] sorting algorithm and also a stable sorting algorithm [4].

Michael A. Bender et al. proposed the library sort algorithm, or gapped insertion sort [1]. Library sort is a sorting algorithm derived from insertion sort in which a space is left after each element of the array to accelerate subsequent insertions. Library sort is an adaptive and also a stable sorting algorithm [9]. The more space we leave, the fewer elements we move on insertions. The authors achieve $O(\log n)$ insertions with high probability using evenly distributed gaps, so the algorithm runs in $O(n \log n)$ time with high probability, which is better than $O(n^2)$. But library sort also has some issues. The first issue is the value of the gap: a range for the gap is given, but only after implementing it can we decide whether the given range satisfies the concept of library sort. The second issue is re-balancing: it is done after $2^i$ elements have been inserted, and it adds to the cost and running time of the algorithm. The third issue is that Bender et al. give only a theoretical concept of library sort and did not implement it. In this paper, to address these issues, we have implemented the concept, carried out a detailed experimental analysis, and measured the performance on a dataset. Leaving gaps for insertions in a data structure was used by Itai, Konheim, and Rodeh [8]. This idea has found recent application in external-memory and cache-oblivious algorithms, in the packed memory structure of Bender, Demaine and Farach-Colton [1], and was later used in [6, 7].

The remainder of this paper is organized as follows. The library sort algorithm is detailed in Section II, and time-complexity testing on the dataset is done in Section III. Space-complexity testing on the dataset is done in Section IV [13]. Re-balancing testing is done in Section V. We analyze the performance of library sort in Section VI and present the conclusion and future work, with a few comments, in Sections VII and VIII.

II. LIBRARY SORT ALGORITHM

The library sort algorithm has three steps.

1. Binary search with blanks: In library sort we have to search for a number, and the best search for a sorted array is binary search. The array 'S' is sorted but contains gaps. Since in a computer a gap in memory must still hold some value, each gap is set to a sentinel value, '-1'. For this reason we cannot use binary search directly, so we have modified it. After computing mid, if S[mid] turns out to be '-1', we move linearly left and right until we reach a value that is not '-1'. The positions of these values are named m1 and m2. Based on them we define the new low, high and mid for the search. Another difference in the binary search
presented below is that it not only searches for the element in the list but also reports the correct position where the number must be inserted.

ALGORITHM 1. Library Sort Binary Search with blanks
Input: Data to be sorted n and number to be searched k.
Output: Position to enter the element d.
while(low < high)
    mid = (low + high)/2
    if(S[mid] == -1) then
        m1 = m2 = mid
        while(S[m1] == -1 && m1 >= low)
            m1 = m1 - 1
        endwhile
        while(S[m2] == -1 && m2 <= high)
            m2 = m2 + 1
        endwhile
        if(m1 < 0 && m2 >= high+1) then
            low = high = m1 + 1
        endif
        if(m1 == 0 && m2 >= high+1)
            if(k < S[m1]) then
                low = high = m1
            else
                low = high = m1 + 1
            endif
        endif
        if(m1 > 0 && m2 >= high+1)
            if(k >= S[m1]) then
                high = m1 + 1
            else
                high = m1 - 1
            endif
        endif
        if(m1 > 0 && m2 < high+1)
            if(k <= S[m1])
                if(k == S[m1]) then
                    low = high = m1
                else
                    high = m1 - 1
                endif
            endif
        endif
        if(k > S[m1] && k < S[m2]) then
            low = m1 + 1
            high = m2 - 1
        endif
        if(k >= S[m2])
            if(m2 < high) then
                low = m2 + 1
            else
                low = m2
            endif
        endif
        if(m1 == 0 && m2 <= high)
            if(k <= S[m1]) then
                high = m1
            endif
        endif
        if(k > S[m1] && k < S[m2]) then
            low = m1 + 1
            high = m2 - 1
        endif
        if(k >= S[m2])
            if(m2 <= high) then
                low = m2 + 1
            else
                low = m2
            endif
        endif
    else
        if(S[mid] < k) then
            low = mid + 1
        endif
        if(S[mid] > k) then
            high = mid - 1
        endif
        if(S[mid] == k) then
            return mid
        endif
    endif
endwhile
if(low == high)
    if(S[low] < k) then
        low++
        high++
    endif
endif
return low
end

2. Insertion: Library sort is also known as 'gapped insertion sort'. If the position where the value is to be inserted is a gap, the value is simply stored there; but if that position already holds an element, we have to shift the elements until we find the next gap.

ALGORITHM 2. Library Sort Insertion
Input: Data to be sorted n and pass number i.
Output: Sorted list but without gaps.
Set i1 = 0 and c1 = 0
if(i == 1) then
    i1 = i - 1
    c1 = 0
endif
S1 = pow(2,i)
if(S1 > size) then
    S1 = size
endif
for j = (pow(2,i1-1) - c1) to S1 do
    k = search(pow(2,i)+pow(2,i+1), a[j])
    if(S[k] != -1) then
        managetill(k)
    endif
    S[k] = a[j]
endfor
end

3. Re-balancing: Re-balancing is done after inserting $2^i$ elements.

ALGORITHM 3. Library Sort Re-balancing
Input: Sorted data but not uniformly gapped and re-balancing factor e.
Output: Sorted list of n items.
while(l < n) do
    if(S[j] != -1) then
        reba[i] = S[j]
        i++
        j++
        l++
        for k = 0 to e do
            reba[i] = -1
            i++
        endfor
    else
        j++
    endif
endwhile
for k = 0 to i do
    S[k] = reba[k]
endfor
end

Fig. 1. Graph showing the execution time on random data for increasing gap values.
Fig. 2. Graph showing the execution time on nearly sorted data for increasing gap values.
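The three steps above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names GAP, search_with_blanks, insert_key, rebalance and library_sort are hypothetical; the search is a simplified variant that skips blanks only to the right instead of following Algorithm 1 case by case; keys are assumed to differ from the sentinel -1; and the gap value eps is assumed to be at least 1. Re-balancing is done after 1, 2, 4, ... keys are present, as in step 3.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define GAP (-1) /* blank-slot sentinel from the paper; keys must differ from it */

/* Step 1: return r such that every key left of slot r is <= k and every
   key at or right of slot r is > k. Blanks are skipped by scanning right. */
static size_t search_with_blanks(const int *S, size_t cap, int k)
{
    size_t lo = 0, hi = cap;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2, m = mid;
        while (m < hi && S[m] == GAP)
            m++;                      /* first occupied slot in [mid, hi) */
        if (m < hi && S[m] <= k)
            lo = m + 1;               /* k belongs to the right of S[m] */
        else
            hi = mid;                 /* only blanks or larger keys from mid on */
    }
    return lo;
}

/* Step 2: place k, shifting keys toward the nearest blank if needed.
   The caller must guarantee that S holds at least one blank. */
static void insert_key(int *S, size_t cap, int k)
{
    size_t r = search_with_blanks(S, cap, k), g = r;
    if (r > 0 && S[r - 1] == GAP) {   /* a blank sits right where k belongs */
        S[r - 1] = k;
        return;
    }
    while (g < cap && S[g] != GAP)
        g++;                          /* next blank at or right of r */
    if (g < cap) {
        memmove(&S[r + 1], &S[r], (g - r) * sizeof *S);
        S[r] = k;
        return;
    }
    for (g = r; g > 0 && S[g - 1] != GAP; g--)
        ;                             /* otherwise nearest blank on the left */
    assert(g > 0);
    memmove(&S[g - 1], &S[g], (r - g) * sizeof *S);
    S[r - 1] = k;
}

/* Step 3: respread the keys in S[0..cap) so that each key is followed by
   eps blanks. S must have room for count * (1 + eps) slots. */
static size_t rebalance(int *S, size_t cap, size_t eps)
{
    size_t count = 0;
    for (size_t i = 0; i < cap; i++)  /* compact the keys to the front */
        if (S[i] != GAP)
            S[count++] = S[i];
    for (size_t i = count; i-- > 0;) {/* then spread them out, back to front */
        size_t dst = i * (1 + eps);
        S[dst] = S[i];
        for (size_t b = 1; b <= eps; b++)
            S[dst + b] = GAP;
    }
    return count * (1 + eps);
}

/* Sorts a[0..n) in place; returns 0 on success, -1 on bad input or
   allocation failure. Re-balances after 1, 2, 4, ... keys are present. */
int library_sort(int *a, size_t n, size_t eps)
{
    if (n == 0)
        return 0;
    if (eps == 0)
        return -1;                    /* at least one blank per key is needed */
    int *S = malloc(n * (1 + eps) * sizeof *S);
    if (S == NULL)
        return -1;
    S[0] = a[0];
    size_t cap = rebalance(S, 1, eps), next = 2;
    for (size_t i = 1; i < n; i++) {
        insert_key(S, cap, a[i]);
        if (i + 1 == next) {          /* i + 1 keys are now stored */
            cap = rebalance(S, cap, eps);
            next *= 2;
        }
    }
    size_t j = 0;
    for (size_t i = 0; i < cap; i++)  /* copy the keys back without blanks */
        if (S[i] != GAP)
            a[j++] = S[i];
    free(S);
    return 0;
}
```

For example, calling library_sort(a, 10, 1) on the array {5, 3, 9, 1, 3, 8, 0, 7, 2, 6} leaves it as {0, 1, 2, 3, 3, 5, 6, 7, 8, 9}. Re-balancing in place, rather than through the auxiliary array reba of Algorithm 3, is a design choice of this sketch.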

III. EXECUTION TIME TESTING OF LIBRARY SORT ON A DATASET

We have tested the library sort algorithm on a dataset [T10I4D100K (.gz)] [5, 12] by increasing the value of the gap (ε). The dataset contains 1010228 items. We have tested four cases:
(1) Random with repeated data (Random data)
(2) Reverse sorted with repeated data (Reverse sorted data)
(3) Sorted with repeated data (Sorted data)
(4) Nearly sorted with repeated data (Nearly sorted data)
Table I shows the execution time, in microseconds, of the library sort algorithm on the dataset.
From Table I we can see that the execution time decreases as the gap value between the elements increases. The following figures show this effect. In all the figures, the X-axis represents the increasing value of the gap and the Y-axis shows the time in microseconds.

TABLE I. EXECUTION TIME OF LIBRARY SORT IN MICROSECONDS BASED ON GAP VALUE

Library sort algorithm, time in microseconds
Value of ε   Random      Nearly Sorted   Reverse Sorted   Sorted
ε = 1        981267433   864558882       1450636163       861929937
ε = 2        729981576   620115904       1065247938       609647355
ε = 3        119727535   358670053       278810310        356489846
ε = 4        23003046    117188830       263693774        116590140

Fig. 3. Graph showing the execution time on reverse sorted data for increasing gap values.
Fig. 4. Graph showing the execution time on sorted data for increasing gap values.

We have plotted Figures 1, 2, 3 and 4 using Table I. By examining these figures, we can see how the execution
time decreases as the gap value between items increases. Figures 1 to 4 show the execution time in microseconds for all four cases of the dataset.

The value of epsilon: when we increase the value of epsilon, the execution time decreases, but at some point the benefit saturates, because we are then allocating more gaps than the operation actually requires; the extra gaps are only memory overhead. In this way the space complexity of the algorithm increases linearly as we increase the value of epsilon. Space complexity is examined in the next section with the help of a graph.

IV. SPACE COMPLEXITY TESTING OF LIBRARY SORT ON A DATASET

The auxiliary space complexity of library sort is O(n), but space complexity is not limited to auxiliary space. It is the total space taken by the program, which includes the following [11].
(1) Primary memory required to store input data (Mip)
(2) Secondary memory required to store input data (Mis)
(3) Primary memory required to store output data (Mop)
(4) Secondary memory required to store output data (Mos)
(5) Memory required for holding the code (Mc)
(6) Memory required for working space (temporary memory): variables + stack (Mw)
Table II shows the total space taken by the library sort algorithm on the dataset for each gap value and re-balancing factor.

TABLE II. TOTAL SPACE COMPLEXITY IN BYTES OF LIBRARY SORT WITH INCREASING VALUE OF GAP AND RE-BALANCING FACTOR

Re-balancing   ε   Mip       Mis       Mop   Mos       Mc      Mw         Total
2              1   4040932   4932283   4     4932283   81920   16163752   30151174
2              2   4040932   4932283   4     4932283   81920   24245576   38232998
2              3   4040932   4932283   4     4932283   81920   32327400   46314822
2              4   4040932   4932283   4     4932283   81920   40409224   54396646
3              1   4040932   4932283   4     4932283   81920   16163752   30151174
3              2   4040932   4932283   4     4932283   81920   24245576   38232998
3              3   4040932   4932283   4     4932283   81920   32327400   46314822
3              4   4040932   4932283   4     4932283   81920   40409224   54396646
4              1   4040932   4932283   4     4932283   81920   16163752   30151174
4              2   4040932   4932283   4     4932283   81920   24245576   38232998
4              3   4040932   4932283   4     4932283   81920   32327400   46314822
4              4   4040932   4932283   4     4932283   81920   40409224   54396646

From Table II we can see that the re-balancing factor has no effect on the total space, but the value of epsilon does: when we increase the gap value, the space taken by the program also increases. This effect is shown in Figure 5, where the X-axis represents the value of epsilon and the Y-axis represents the memory occupied by the library sort algorithm in bytes. The space complexity of library sort increases linearly as we increase the value of epsilon, that is, the gaps between the elements, because more memory is required to store the elements. The memory required grows in direct proportion to epsilon, since the gapped array has size $(1+\varepsilon)n$.

Fig. 5. Graph showing the space complexity of library sort.

V. RE-BALANCING TESTING OF LIBRARY SORT ON A DATASET

Re-balancing is done after inserting $a^i$ elements, and it increases the size of the array; the new size depends on 'ε'. The process requires an auxiliary array of the same size, into which a copy with gaps is made. Re-balancing is necessary after inserting $a^i$ elements, but it also adds to the cost and running time of the algorithm, so the question is what value of 'a' is suitable. We have measured re-balancing at $a^i$ for a = 2, 3, 4 and i = 0, 1, 2, 3, 4, ..., with gap values ε = 1, 2, 3, 4.

(A) Example of how re-balancing is performed when ε = 1 and a = 2.
$2^i = 2^0, 2^1, 2^2, 2^3, 2^4, \ldots = 1, 2, 4, 8, 16, \ldots$

1. Re-balance for $2^0 = 1$:

1 -1

2. Re-balance for $2^1 = 2$:

1 2

After re-balancing, this array will be:

1 -1 2 -1

3. Re-balance for $2^2 = 4$:

1 2 3 4

After re-balancing, this array will be:

1 -1 2 -1 3 -1 4 -1

In this manner the array is re-balanced at each power of 2.
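The arrays of example (A) can be reproduced with a short C sketch of the re-balancing copy. It is an illustration only: the function name rebalance_copy and the macro GAP are hypothetical, and the caller is assumed to provide an auxiliary array reba with room for (1 + eps) slots per key, in the spirit of the auxiliary array described at the start of this section.

```c
#include <assert.h>
#include <stddef.h>

#define GAP (-1) /* blank-slot sentinel used in the paper's examples */

/* Copies the keys of S[0..n) into reba, each followed by eps blanks,
   skipping any blanks already present in S. Returns the number of slots
   written, which is (number of keys) * (1 + eps). */
size_t rebalance_copy(const int *S, size_t n, size_t eps, int *reba)
{
    assert(S != NULL && reba != NULL);
    size_t out = 0;
    for (size_t j = 0; j < n; j++) {
        if (S[j] == GAP)
            continue;                 /* old blanks are dropped */
        reba[out++] = S[j];           /* the key itself */
        for (size_t b = 0; b < eps; b++)
            reba[out++] = GAP;        /* followed by eps fresh blanks */
    }
    return out;
}
```

With eps = 1, the input {1, 2} yields {1, -1, 2, -1} and the input {1, 2, 3, 4} yields {1, -1, 2, -1, 3, -1, 4, -1}, matching the arrays shown in example (A).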
(B) Example of how re-balancing is performed when ε = 1 and a = 3.
$3^i = 3^0, 3^1, 3^2, 3^3, \ldots = 1, 3, 9, 27, \ldots$

1. Re-balance for $3^0 = 1$:

1 -1

2. Re-balance for $3^1 = 3$:

In the above array only one space is empty, so we can insert only one element without shifting, giving two elements in total; but the re-balancing factor calls for three elements before the next re-balance, so we have to shift data to insert the new element. The performance of the algorithm therefore degrades, because a larger number of swaps is needed to create space, as in traditional insertion sort. We have thus found that when we increase the re-balancing factor 'a' from 2 to 4, the execution time of the library sort algorithm also increases. This effect can be seen in Table III and in the graphs of Figures 6 to 9.

TABLE III. TIME TAKEN BY LIBRARY SORT ALGORITHM IN MICROSECONDS DURING RE-BALANCING

Type of dataset
Re-balancing   Value of ε   Random       Nearly Sorted   Reverse Sorted   Sorted
2              1            981267433    864558882       1450636163       861929937
2              2            729981576    620115904       1065247938       609647355
2              3            119727535    358670053       278810310        356489846
2              4            23003046     117188830       263693774        116590140
3              1            2622591059   2214715182      2832112301       3011802732
3              2            2103580421   1964645906      2585747568       2651992181
3              3            2043974421   1728175857      2195021514       1962122927
3              4            1620914312   1600879365      2130261056       1620374625
4              1            2942693856   2467933298      3239333534       3281368964
4              2            2705332601   2510103530      3154811065       2923182920
4              3            2676681610   2613423098      3013676930       2378347887
4              4            2611656774   2157740458      2993363707       2222906193

From Table III we can see that the execution time of library sort increases with the re-balancing factor in all cases of the dataset. The following graphs show this effect.

Fig. 6. Graph showing the re-balancing of library sort using the random dataset.
Fig. 7. Graph showing the re-balancing of library sort using the nearly sorted dataset.
Fig. 8. Graph showing the re-balancing of library sort using the reverse sorted dataset.
Fig. 9. Graph showing the re-balancing of library sort using the sorted dataset.

Figures 6 to 9 show that the execution time of library sort increases with the re-balancing factor in all four cases of the dataset. In these figures, the X-axis represents the value of epsilon and the Y-axis represents the execution time in microseconds for re-balancing factors $2^i$, $3^i$ and $4^i$. The figures also show that the nature of the data only marginally changes the effect of the re-balancing factor. If the re-balancing factor is $2^i$, i.e. we re-balance after $2^0, 2^1, \ldots, 2^n$ elements, the performance of the algorithm is good because the array has enough space to insert the new elements. But the performance of the algorithm degrades if the re-balancing factor increases from $2^i$ to $4^i$, because if we use the re-balancing factor $3^i$, i.e. we
have to re-balance the elements in the following manner: $3^0, 3^1, \ldots, 3^n$, then the array does not have enough space to insert the new elements at the rate of $3^i$ or $4^i$. Data must be shifted to insert the new elements because the spaces between many elements have already been consumed, so performance degrades: a larger number of swaps is needed to create space, as in traditional insertion sort.

VI. ANALYSIS OF PERFORMANCE OF LIBRARY SORT ALGORITHM

By execution time analysis, we have found that when we increase the value of epsilon the execution time decreases, but at some point the benefit saturates, because the array then has more spare spaces than the inserted data needs.
By space complexity analysis, we have found that the space complexity of the library sort algorithm increases linearly: when we increase the value of epsilon, the memory consumption increases in the same proportion.
By execution time analysis of re-balancing, we have found that when we increase the re-balancing factor 'a' from 2 to 4, the execution time of the library sort algorithm also increases, as its behavior moves towards that of traditional insertion sort.
So to obtain the best result from the library sort algorithm, the value of epsilon should be optimized and the re-balancing factor should be minimized, ideally equal to 2.

VII. CONCLUSION

In this paper, we have tested the library sort algorithm on a dataset. There are four cases of the dataset, and every case contains 1010228 items. We applied the library sort algorithm to the four cases and compared the performance in each case. We also determined the total space taken by the library sort algorithm, and how the value of epsilon and the re-balancing factor affect its execution time: the value of epsilon should be kept as close to optimal as possible, and the re-balancing factor (a) should be kept at its minimum, ideally equal to 2. The library sort algorithm is implemented in the C language. The program was built with the Borland C++ 5.02 compiler and executed on a machine with an Intel® Core™ i5-3230M CPU @ 2.60 GHz, with the programs running at a 2.2 GHz clock speed.

VIII. FUTURE WORK

In this paper, we have used a uniform gap distribution after each element. To improve the library sort algorithm further, a non-uniform gap distribution after each element can be used. This can be done in parallel using a GPU.

ACKNOWLEDGMENT

We wrote our library sort code in the C language and used a dataset available at the Frequent Itemset Mining Implementations Repository. This work was performed in the frame of a concerted research action. All experiments were carried out in the research lab of Jaypee University of Information Technology, Waknaghat, Solan, India.

REFERENCES

[1] Bender, Michael A., Martin Farach-Colton, and Miguel A. Mosteiro. "Insertion sort is O(n log n)". Theory of Computing Systems, 39.3 (2006), pp. 391-397.
[2] Cormen, Thomas H., et al. "Introduction to algorithms". Cambridge: MIT Press, Vol. 2, 2001.
[3] Estivill-Castro, Vladmir, and Derick Wood. "A survey of adaptive sorting algorithms". ACM Computing Surveys (CSUR), 24.4 (1992), pp. 441-476.
[4] Pardo, Luis Trabb. "Stable sorting and merging with optimal space and time bounds". SIAM Journal on Computing, 6.2 (1977), pp. 351-372.
[5] Frequent Itemset Mining Implementations Repository, http://fimi.cs.helsinki.fi, accessed on 10/11/2014.
[6] Bender, Michael A., et al. "A locality-preserving cache-oblivious dynamic dictionary". Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms, 2002, pp. 1-22.
[7] Brodal, Gerth Stølting, Rolf Fagerberg, and Riko Jacob. "Cache oblivious search trees via binary trees of small height". Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms, 2002, pp. 1-20.
[8] Itai, Alon, Alan G. Konheim, and Michael Rodeh. "A sparse table implementation of priority queues". Springer Berlin Heidelberg, 1981, pp. 417-431.
[9] Thomas, Nathan, et al. "A framework for adaptive algorithm selection in STAPL". Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming, ACM, 2005, pp. 277-288.
[10] Janko, Wolfgang. "A list insertion sort for keys with arbitrary key distribution". ACM Transactions on Mathematical Software (TOMS), 2.2 (1976), pp. 143-153.
[11] Faujdar Neetu, and Satya Prakash Ghrera. "Analysis and Testing of Sorting Algorithms on a Standard Dataset". Communication Systems and Network Technologies (CSNT), Fifth International Conference on. IEEE, 2015, pp. 962-967.
[12] Zubair Khan, Neetu Faujdar, et al. "Modified BitApriori Algorithm: An Intelligent Approach for Mining Frequent Item-Set". Proc. of Int. Conf. on Advance in Signal Processing and Communication, 2013, pp. 813-819.
[13] Faujdar Neetu, and Satya Prakash Ghrera. "Performance Evaluation of Merge and Quick Sort using GPU Computing with CUDA". International Journal of Applied Engineering Research, 10.18 (2015), pp. 39315-39319.
