Sorting and searching algorithms
Objectives
To study and analyze the time efficiency of various sorting algorithms.
To design, implement, and analyze bubble sort.
To design, implement, and analyze merge sort.
To design, implement, and analyze quick sort.
Linear Search
One by one...
Linear Search
Check every element in the list, until the target is found.
For example, our target is 38:

i     0   1   2   3   4   5
a[i]  25  14   9  38  77  45

a[0], a[1], and a[2] are not the target; a[3] == 38: found!
Linear Search
1) Initialize an index variable i
2) Compare a[i] with the target
• If a[i] == target, found
• If a[i] != target:
• If all elements have been checked already, not found
• Otherwise, advance i to the next index and go to step 2
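The steps above can be sketched in Java (a minimal sketch; the method and array names are illustrative):

```java
public class LinearSearch {
    /** Returns the index of target in a, or -1 if not found. */
    public static int linearSearch(int[] a, int target) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == target) {
                return i; // found
            }
        }
        return -1; // all elements checked: not found
    }

    public static void main(String[] args) {
        int[] a = {25, 14, 9, 38, 77, 45};
        System.out.println(linearSearch(a, 38));  // prints 3
        System.out.println(linearSearch(a, 100)); // prints -1
    }
}
```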
Linear Search
Time complexity in worst case?
– If N is number of elements,
– Time complexity = O(N)
Advantage?
Disadvantage?
Binary Search
Chop by half...
Binary Search
Given a SORTED list
(again, our target is 38):

i     0   1   2   3   4   5
a[i]  9   14  25  38  45  77
      L                   R

Compare the target with the middle element: smaller → continue in the left half; larger → continue in the right half; equal → found!
Binary Search
Why always probe the middle, and not some other position, say one-third of the list?
1) Initialize boundaries L and R
2) While L has not passed R (L <= R):
• mid = (L + R) / 2
• If a[mid] > target, set R to mid - 1 and go to step 2
• If a[mid] < target, set L to mid + 1 and go to step 2
• If a[mid] == target, found
3) If L passes R, the target is not in the list
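The steps above can be coded as follows (a minimal sketch; `low` and `high` play the roles of L and R):

```java
public class BinarySearch {
    /** Returns the index of target in the sorted array a, or -1 if not found. */
    public static int binarySearch(int[] a, int target) {
        int low = 0;             // L
        int high = a.length - 1; // R
        while (low <= high) {
            int mid = (low + high) / 2;
            if (a[mid] > target) {
                high = mid - 1;  // continue in the left half
            } else if (a[mid] < target) {
                low = mid + 1;   // continue in the right half
            } else {
                return mid;      // found
            }
        }
        return -1; // low passed high: not found
    }

    public static void main(String[] args) {
        int[] a = {9, 14, 25, 38, 45, 77};
        System.out.println(binarySearch(a, 38)); // prints 3
    }
}
```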
Binary Search
Time complexity in the worst case?
– If N is the number of elements,
– Time complexity = O(lg N)
– Why?
Advantage?
Disadvantage?
What can you learn?
Improve one ‘dimension’ using binary search.
Running linear search a few times can be more efficient than running binary search many times!
– DO NOT underestimate linear search!!!
Why study sorting?
Sorting is a classic subject in computer science. There are three
reasons for studying sorting algorithms.
– First, sorting algorithms illustrate many creative
approaches to problem solving and these approaches can
be applied to solve other problems.
– Second, sorting algorithms are good for practicing
fundamental programming techniques using selection
statements, loops, methods, and arrays.
– Third, sorting algorithms are excellent examples to
demonstrate algorithm performance.
What data to sort?
The data to be sorted might be integers, doubles, characters, or
objects. The Java API contains several overloaded sort methods
for sorting primitive type values and objects in the
java.util.Arrays and java.util.Collections classes.
For simplicity, this section assumes that:
– the data to be sorted are integers,
– the data are sorted in ascending order, and
– the data are stored in an array.
The programs can be easily modified to sort other types of data, to sort in descending order, or to sort data in an ArrayList or a LinkedList.
Bubble Sort
Original:        2 9 5 4 8 1
After 1st pass:  2 5 4 8 1 9
After 2nd pass:  2 4 5 1 8 9
After 3rd pass:  2 4 1 5 8 9
After 4th pass:  2 1 4 5 8 9
After 5th pass:  1 2 4 5 8 9
Bubble sort time: (n − 1) + (n − 2) + ... + 2 + 1 = n²/2 − n/2 = O(n²)
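The passes above can be implemented as follows (a minimal sketch; the `needNextPass` flag is a common optimization that stops early when a pass makes no swaps):

```java
import java.util.Arrays;

public class BubbleSort {
    /** Sorts a in ascending order using bubble sort. */
    public static void bubbleSort(int[] a) {
        boolean needNextPass = true;
        for (int k = 1; k < a.length && needNextPass; k++) {
            needNextPass = false; // assume the array is already sorted
            for (int i = 0; i < a.length - k; i++) {
                if (a[i] > a[i + 1]) {
                    int tmp = a[i];   // swap a[i] with a[i + 1]
                    a[i] = a[i + 1];
                    a[i + 1] = tmp;
                    needNextPass = true; // a swap occurred: another pass is needed
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {2, 9, 5, 4, 8, 1};
        bubbleSort(a);
        System.out.println(Arrays.toString(a)); // [1, 2, 4, 5, 8, 9]
    }
}
```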
Merge Sort
                 2 9 5 4 8 1 6 7
split (divide):  2 9 5 4 | 8 1 6 7
split:           2 9 | 5 4 | 8 1 | 6 7
split:           2 | 9 | 5 | 4 | 8 | 1 | 6 | 7
merge (conquer): 2 9 | 4 5 | 1 8 | 6 7
merge:           2 4 5 9 | 1 6 7 8
merge:           1 2 4 5 6 7 8 9
Merge Two Sorted Lists
Merging the two sorted lists (2 4 5 9) and (1 6 7 8) into temp: current1 and current2 point to the next unmerged element of each list, and current3 points to the next free slot in temp.
(a) After moving 1 to temp:                         temp = 1
(b) After moving all the elements in list2 to temp: temp = 1 2 4 5 6 7 8
(c) After moving 9 to temp:                         temp = 1 2 4 5 6 7 8 9
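The split/merge process above can be sketched as follows (a minimal recursive sketch; the three `current` indices in `merge` correspond to the pointers in the figure):

```java
import java.util.Arrays;

public class MergeSort {
    /** Sorts a in ascending order using merge sort. */
    public static void mergeSort(int[] a) {
        if (a.length > 1) {
            int[] firstHalf = Arrays.copyOfRange(a, 0, a.length / 2);
            int[] secondHalf = Arrays.copyOfRange(a, a.length / 2, a.length);
            mergeSort(firstHalf);          // sort each half recursively
            mergeSort(secondHalf);
            merge(firstHalf, secondHalf, a); // merge the sorted halves back into a
        }
    }

    /** Merges the sorted arrays list1 and list2 into temp. */
    private static void merge(int[] list1, int[] list2, int[] temp) {
        int current1 = 0, current2 = 0, current3 = 0;
        while (current1 < list1.length && current2 < list2.length) {
            if (list1[current1] < list2[current2])
                temp[current3++] = list1[current1++];
            else
                temp[current3++] = list2[current2++];
        }
        while (current1 < list1.length) temp[current3++] = list1[current1++]; // rest of list1
        while (current2 < list2.length) temp[current3++] = list2[current2++]; // rest of list2
    }

    public static void main(String[] args) {
        int[] a = {2, 9, 5, 4, 8, 1, 6, 7};
        mergeSort(a);
        System.out.println(Arrays.toString(a)); // [1, 2, 4, 5, 6, 7, 8, 9]
    }
}
```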
Merge Sort Time
Let T(n) denote the time required for sorting an
array of n elements using merge sort. Without loss
of generality, assume n is a power of 2. The merge
sort algorithm splits the array into two subarrays,
sorts the subarrays using the same algorithm
recursively, and then merges the subarrays. So,
T(n) = T(n/2) + T(n/2) + mergetime
Merge Sort Time
The first T(n/2) is the time for sorting the first
half of the array and the second T(n/2) is the time
for sorting the second half. To merge two
subarrays, it takes at most n-1 comparisons to
compare the elements from the two subarrays and
n moves to move elements to the temporary
array. So, the total time is 2n-1. Therefore,
T(n) = 2T(n/2) + 2n − 1
     = 2(2T(n/4) + 2(n/2) − 1) + 2n − 1
     = 2²T(n/2²) + 2n − 2 + 2n − 1
     ...
     = 2^k T(n/2^k) + 2n − 2^(k−1) + ... + 2n − 2 + 2n − 1
     = 2^(log n) T(n/2^(log n)) + 2n − 2^(log n − 1) + ... + 2n − 2 + 2n − 1
     = n + 2n log n − (2^(log n) − 1)
     = 2n log n + 1 = O(n log n)
Quick Sort
Quick sort, developed by C. A. R. Hoare (1962),
works as follows: The algorithm selects an element,
called the pivot, in the array. Divide the array into
two parts such that all the elements in the first part
are less than or equal to the pivot and all the
elements in the second part are greater than the
pivot. Recursively apply the quick sort algorithm to
the first part and then the second part.
Quick Sort
(a) The original array (pivot 5):                           5 2 9 3 8 4 0 1 6 7
(b) The original array is partitioned:                      4 2 1 3 0 5 8 9 6 7
(c) The partial array (4 2 1 3 0) is partitioned (pivot 4): 0 2 1 3 4
(d) The partial array (0 2 1 3) is partitioned (pivot 0):   0 2 1 3
(e) The partial array (2 1 3) is partitioned (pivot 2):     1 2 3
Partition
(a) Initialize pivot, low, and high:                5 2 9 3 8 4 0 1 6 7  (pivot = 5)
(b) Search forward with low for the first element greater than the pivot, and backward with high for the first element less than or equal to the pivot
(c) 9 is swapped with 1:                            5 2 1 3 8 4 0 9 6 7
(d) Continue the search:                            5 2 1 3 8 4 0 9 6 7
(e) 8 is swapped with 0:                            5 2 1 3 0 4 8 9 6 7
(f) When high < low, the search is over:            5 2 1 3 0 4 8 9 6 7
(g) The pivot is swapped into its final place:      4 2 1 3 0 5 8 9 6 7
The index of the pivot is returned
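The partition steps above can be coded as follows (a minimal sketch that always picks the first element as the pivot; other pivot choices work too):

```java
import java.util.Arrays;

public class QuickSort {
    public static void quickSort(int[] a) {
        quickSort(a, 0, a.length - 1);
    }

    private static void quickSort(int[] a, int first, int last) {
        if (first < last) {
            int pivotIndex = partition(a, first, last);
            quickSort(a, first, pivotIndex - 1); // sort the part before the pivot
            quickSort(a, pivotIndex + 1, last);  // sort the part after the pivot
        }
    }

    /** Partitions a[first..last] around pivot a[first]; returns the pivot's final index. */
    private static int partition(int[] a, int first, int last) {
        int pivot = a[first];
        int low = first + 1;
        int high = last;
        while (high > low) {
            while (low <= high && a[low] <= pivot) low++;  // search forward
            while (low <= high && a[high] > pivot) high--; // search backward
            if (high > low) { // swap the out-of-place pair
                int t = a[high]; a[high] = a[low]; a[low] = t;
            }
        }
        while (high > first && a[high] >= pivot) high--;
        if (pivot > a[high]) { // place the pivot in its final spot
            a[first] = a[high];
            a[high] = pivot;
            return high;
        }
        return first;
    }

    public static void main(String[] args) {
        int[] a = {5, 2, 9, 3, 8, 4, 0, 1, 6, 7};
        quickSort(a);
        System.out.println(Arrays.toString(a));
    }
}
```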
Quick Sort Time
To partition an array of n elements, it takes n-1
comparisons and n moves in the worst case. So,
the time required for partition is O(n).
Worst-Case Time
In the worst case, the pivot divides the array into one big subarray and one empty subarray each time. The size of the big subarray is one less than the size of the array before it is divided. The algorithm requires O(n²) time:

(n − 1) + (n − 2) + ... + 2 + 1 = O(n²)
Best-Case Time
In the best case, each time the pivot divides the
array into two parts of about the same size. Let
T(n) denote the time required for sorting an array
of elements using quick sort. So,
T(n) = T(n/2) + T(n/2) + n = O(n log n)
Average-Case Time
On average, the pivot divides the array neither into two parts of exactly the same size nor into one empty part each time. Statistically, the sizes of the two parts are very close, so the average time is O(n log n). The exact average-case analysis is
beyond the scope of this book.
Heap
A heap is a useful data structure for designing efficient sorting algorithms and priority queues. A heap is a binary tree with the following properties:
It is a complete binary tree.
Each node is greater than or equal to any of its children.
Complete Binary Tree
A binary tree is complete if every level of the tree is full
except that the last level may not be full and all the leaves
on the last level are placed left-most. For example, in
Figure below the binary trees in (a) and (b) are complete,
but the binary trees in (c) and (d) are not complete. Further,
the binary tree in (a) is a heap, but the binary tree in (b) is
not a heap, because the root (39) is less than its right child
(42).
(Figure: four binary trees (a)–(d); (a) and (b) are complete, (c) and (d) are not; (a) is a heap, while (b) is not, because its root 39 is less than its right child 42.)
Representing a Heap
For a node at position i, its left child is at position 2i+1 and
its right child is at position 2i+2, and its parent is (i-1)/2.
For example, the node for element 39 is at position 4, so its
left child (element 14) is at 9 (2*4+1), its right child
(element 33) is at 10 (2*4+2), and its parent (element 42) is
at 1 ((4-1)/2).
index:    [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10][11][12][13]
element:   62  42  59  32  39  44  13  22  29  14  33  30  17   9

The same heap as a tree, level by level:
62
42 59
32 39 44 13
22 29 14 33 30 17 9
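The index formulas can be checked directly against the array above (a small sketch; method names are illustrative):

```java
public class HeapIndex {
    static int leftChild(int i)  { return 2 * i + 1; }
    static int rightChild(int i) { return 2 * i + 2; }
    static int parent(int i)     { return (i - 1) / 2; } // integer division

    public static void main(String[] args) {
        int[] heap = {62, 42, 59, 32, 39, 44, 13, 22, 29, 14, 33, 30, 17, 9};
        int i = 4; // element 39 is at position 4
        System.out.println(heap[leftChild(i)]);  // 14, at position 9
        System.out.println(heap[rightChild(i)]); // 33, at position 10
        System.out.println(heap[parent(i)]);     // 42, at position 1
    }
}
```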
Adding Elements to the Heap
Adding 3, 5, 1, 19, 11, and 22 to a heap, initially empty
(a) After adding 3:  [3]
(b) After adding 5:  [5, 3]
(c) After adding 1:  [5, 3, 1]
(d) After adding 19: [19, 5, 1, 3]
(e) After adding 11: [19, 11, 1, 3, 5]
(f) After adding 22: [22, 11, 19, 3, 5, 1]
Rebuild the heap after adding a new node
Adding 88 to the heap
(a) Add 88 as the last node:   [22, 11, 19, 3, 5, 1, 88]
(b) After swapping 88 with 19: [22, 11, 88, 3, 5, 1, 19]
(c) After swapping 88 with 22: [88, 11, 22, 3, 5, 1, 19]
Removing the Root and Rebuild the Tree
Removing root 62 from the heap
62
42 59
32 39 44 13
22 29 14 33 30 17 9
Removing the Root and Rebuild the Tree
Move 9 (the last node) to the root
9
42 59
32 39 44 13
22 29 14 33 30 17
Removing the Root and Rebuild the Tree
Swap 9 with 59
59
42 9
32 39 44 13
22 29 14 33 30 17
Removing the Root and Rebuild the Tree
Swap 9 with 44
59
42 44
32 39 9 13
22 29 14 33 30 17
Removing the Root and Rebuild the Tree
Swap 9 with 30
59
42 44
32 39 30 13
22 29 14 33 9 17
The Heap Class
Heap<E>
-list: java.util.ArrayList<E>
+Heap() Creates a default empty heap.
+Heap(objects: E[]) Creates a heap with the specified objects.
+add(newObject: E): void Adds a new object to the heap.
+remove(): E Removes the root from the heap and returns it.
+getSize(): int Returns the size of the heap.
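The UML above can be realized as a minimal sketch (method names follow the UML; the sift-up logic in `add` and sift-down logic in `remove` are one standard way to maintain the heap property; elements are assumed Comparable):

```java
import java.util.ArrayList;

public class Heap<E extends Comparable<E>> {
    private ArrayList<E> list = new ArrayList<>();

    /** Creates a default empty heap. */
    public Heap() { }

    /** Creates a heap with the specified objects. */
    public Heap(E[] objects) {
        for (E e : objects) add(e);
    }

    /** Adds a new object to the heap, then sifts it up. */
    public void add(E newObject) {
        list.add(newObject);
        int i = list.size() - 1;
        while (i > 0) {
            int parent = (i - 1) / 2;
            if (list.get(i).compareTo(list.get(parent)) > 0) {
                E tmp = list.get(i); // swap the node with its parent
                list.set(i, list.get(parent));
                list.set(parent, tmp);
            } else {
                break; // heap property restored
            }
            i = parent;
        }
    }

    /** Removes the root from the heap and returns it, then sifts the new root down. */
    public E remove() {
        if (list.isEmpty()) return null;
        E root = list.get(0);
        list.set(0, list.get(list.size() - 1)); // move the last node to the root
        list.remove(list.size() - 1);
        int i = 0;
        while (2 * i + 1 < list.size()) {
            int maxChild = 2 * i + 1; // left child
            int right = 2 * i + 2;
            if (right < list.size() && list.get(right).compareTo(list.get(maxChild)) > 0)
                maxChild = right;     // right child is larger
            if (list.get(i).compareTo(list.get(maxChild)) < 0) {
                E tmp = list.get(i);  // swap with the larger child
                list.set(i, list.get(maxChild));
                list.set(maxChild, tmp);
                i = maxChild;
            } else {
                break; // heap property restored
            }
        }
        return root;
    }

    /** Returns the size of the heap. */
    public int getSize() { return list.size(); }
}
```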
Heap Sort
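Heap sort adds all elements to a heap and removes them one by one; each remove yields the current maximum. A self-contained, array-based sketch (this in-place variant builds the max-heap inside the array rather than using a separate Heap object, which is an implementation choice, not the only one):

```java
import java.util.Arrays;

public class HeapSort {
    /** Sorts a in ascending order using an in-place max-heap. */
    public static void heapSort(int[] a) {
        // Build a max-heap: sift down every internal node, last one first.
        for (int i = a.length / 2 - 1; i >= 0; i--) siftDown(a, i, a.length);
        // Repeatedly move the maximum (the root) to the end and shrink the heap.
        for (int end = a.length - 1; end > 0; end--) {
            int t = a[0]; a[0] = a[end]; a[end] = t;
            siftDown(a, 0, end);
        }
    }

    /** Sifts a[i] down within the first size elements to restore the heap property. */
    private static void siftDown(int[] a, int i, int size) {
        while (2 * i + 1 < size) {
            int child = 2 * i + 1;
            if (child + 1 < size && a[child + 1] > a[child]) child++; // larger child
            if (a[i] >= a[child]) break; // heap property already holds
            int t = a[i]; a[i] = a[child]; a[child] = t;
            i = child;
        }
    }

    public static void main(String[] args) {
        int[] a = {62, 42, 59, 32, 39, 44, 13};
        heapSort(a);
        System.out.println(Arrays.toString(a)); // [13, 32, 39, 42, 44, 59, 62]
    }
}
```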
Heap Sort Time
Let h denote the height of a heap of n elements. Since a heap is a complete binary tree, the first level has 1 node, the second level has 2 nodes, the kth level has 2^(k−1) nodes, the (h−1)th level has 2^(h−2) nodes, and the hth level has at least one node and at most 2^(h−1) nodes. Therefore,

1 + 2 + ... + 2^(h−2) < n ≤ 1 + 2 + ... + 2^(h−2) + 2^(h−1)

that is,

2^(h−1) − 1 < n ≤ 2^h − 1
2^(h−1) < n + 1 ≤ 2^h
h − 1 < log(n + 1) ≤ h

So h ≥ log(n + 1) and h < log(n + 1) + 1.
Bucket Sort and Radix Sort
All sorting algorithms discussed so far are general
sorting algorithms that work for any types of keys
(e.g., integers, strings, and any comparable objects).
These algorithms sort the elements by comparing
their keys. The lower bound for general sorting
algorithms is O(nlogn). So, no sorting algorithms
based on comparisons can perform better than
O(nlogn). However, if the keys are small integers,
you can use bucket sort without having to compare
the keys.
Bucket Sort
Put the elements into buckets indexed by the elements' keys; gathering the buckets in key order yields the sorted result.
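A minimal sketch of this idea, shown here in its simplest counting form, assuming the keys are small non-negative integers no larger than a known maxKey:

```java
import java.util.Arrays;

public class BucketSort {
    /** Sorts an array of small non-negative integer keys; no key comparisons needed. */
    public static void bucketSort(int[] a, int maxKey) {
        int[] buckets = new int[maxKey + 1];    // one bucket (a counter) per possible key
        for (int key : a) buckets[key]++;       // drop each element into its bucket
        int i = 0;
        for (int key = 0; key <= maxKey; key++) // gather the buckets in key order
            for (int c = 0; c < buckets[key]; c++)
                a[i++] = key;
    }

    public static void main(String[] args) {
        int[] a = {5, 2, 9, 3, 8, 4, 0, 1, 6, 7};
        bucketSort(a, 9);
        System.out.println(Arrays.toString(a));
    }
}
```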
Radix Sort
The buckets correspond to the radix: radix sort applies bucket sort repeatedly, one digit position at a time, from the least significant digit to the most significant.
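A minimal sketch of least-significant-digit radix sort for non-negative integers, using ten buckets (radix 10):

```java
import java.util.ArrayList;
import java.util.Arrays;

public class RadixSort {
    /** Sorts non-negative integers with LSD radix sort using ten buckets. */
    public static void radixSort(int[] a) {
        int max = 0;
        for (int v : a) max = Math.max(max, v);
        for (int exp = 1; max / exp > 0; exp *= 10) {     // one pass per digit position
            ArrayList<ArrayList<Integer>> buckets = new ArrayList<>();
            for (int d = 0; d < 10; d++) buckets.add(new ArrayList<>());
            for (int v : a)
                buckets.get((v / exp) % 10).add(v);       // bucket on the current digit
            int i = 0;
            for (ArrayList<Integer> b : buckets)          // gather, preserving bucket order
                for (int v : b) a[i++] = v;
        }
    }

    public static void main(String[] args) {
        int[] a = {230, 331, 839, 124, 538, 9, 2};
        radixSort(a);
        System.out.println(Arrays.toString(a));
    }
}
```

Each pass is stable (elements keep their relative order within a bucket), which is what makes the digit-by-digit approach correct.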
Phase I
Repeatedly bring data from the file to an array,
sort the array using an internal sorting algorithm,
and output the data from the array to a temporary
file.
(Figure: the program repeatedly reads a block of the original file into an array, sorts the array, and writes it out as a sorted segment of a temporary file, producing segments S1, S2, ..., Sk.)
Phase II
Merge a pair of sorted segments (e.g., S1 with S2,
S3 with S4, ..., and so on) into a larger sorted
segment and save the new segment into a new
temporary file. Continue the same process until
one sorted segment results.
S1 S2 S3 S4 S5 S6 S7 S8 ... Sk
Merge: (S1, S2 merged) (S3, S4 merged) (S5, S6 merged) (S7, S8 merged) ...
Merge: (S1, S2, S3, S4 merged) (S5, S6, S7, S8 merged) ...
Merge: (S1, S2, S3, S4, S5, S6, S7, S8 merged) ...
Implementing Phase II
Each merge step merges two sorted segments to form a new segment. The new segment doubles the number of elements, so the number of segments is reduced by half after each merge step. A segment may be too large to be brought into an array in memory. To implement a merge step, copy half of the segments from file f1.dat to a temporary file f2.dat. Then merge the first remaining segment in f1.dat with the first segment in f2.dat into a temporary file named f3.dat.
Implementing Phase II
f1.dat: S1 S2 S3 S4 S5 S6 S7 S8 ... Sk
Copy half of the segments to f2.dat:
f2.dat: S1 S2 S3 S4
f3.dat: (S1, S5 merged) (S2, S6 merged) (S3, S7 merged) (S4, S8 merged)