Sorting and Searching Algorithms

Objectives
 To study and analyze the time efficiency of various sorting algorithms.
 To design, implement, and analyze bubble sort.
 To design, implement, and analyze merge sort.
 To design, implement, and analyze quick sort.
Linear Search
One by one...
Linear Search
 Check every element in the list, until the target is found.
 For example, our target is 38:

i    |  0 |  1 | 2 |  3 |  4 |  5
a[i] | 25 | 14 | 9 | 38 | 77 | 45

The search reports “Not found!” at 25, 14, and 9, then “Found!” at index 3.
Linear Search
1) Initialize an index variable i.
2) Compare a[i] with the target:
• If a[i] == target, found.
• If a[i] != target:
• If all elements have been checked, not found.
• Otherwise, advance i to the next index and go to step 2.
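A minimal Java sketch of these steps (the method name linearSearch and the int[] element type are illustrative, not from the slides):

```java
/** Returns the index of target in a, or -1 if the target is not found. */
public static int linearSearch(int[] a, int target) {
    for (int i = 0; i < a.length; i++) {
        if (a[i] == target) {
            return i;   // found: report the position
        }
    }
    return -1;          // every element checked: not found
}
```

For the example above, linearSearch(new int[]{25, 14, 9, 38, 77, 45}, 38) returns 3.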
Linear Search
 Time complexity in the worst case?
– If N is number of elements,
– Time complexity = O(N)
 Advantage?
 Disadvantage?
Binary Search
Chop by half...
Binary Search
 Given a SORTED list:
 (Again, our target is 38)

i    | 0 |  1 |  2 |  3 |  4 |  5
a[i] | 9 | 14 | 25 | 38 | 45 | 77
       L                        R

Each comparison with the middle element ends in one of three outcomes: “Smaller!”, “Found!”, or “Larger!”.
Binary Search
 Why always in the middle, and not some other position, say one-third of the list?

1) Initialize the boundaries L and R.
2) While L has not passed R:
• mid = (L + R) / 2
• If a[mid] > target, set R to mid - 1 and go to step 2.
• If a[mid] < target, set L to mid + 1 and go to step 2.
• If a[mid] == target, found.
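A minimal Java sketch of these steps (the method name binarySearch and the int[] element type are illustrative):

```java
/** Returns the index of target in the sorted array a, or -1 if not found. */
public static int binarySearch(int[] a, int target) {
    int low = 0;                    // the boundary L
    int high = a.length - 1;        // the boundary R
    while (low <= high) {           // while L has not passed R
        int mid = (low + high) / 2;
        if (a[mid] > target) {
            high = mid - 1;         // target can only be in the left half
        } else if (a[mid] < target) {
            low = mid + 1;          // target can only be in the right half
        } else {
            return mid;             // found
        }
    }
    return -1;                      // boundaries crossed: not found
}
```

For very large arrays, int mid = low + (high - low) / 2 avoids potential int overflow in low + high.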
Binary Search
 Time complexity in the worst case?
– If N is the number of elements,
– Time complexity = O(lg N)
– Why?
 Advantage?
 Disadvantage?
What can you learn?
 Improve one ‘dimension’ using binary search.
 Linear search performed a few times can be more efficient than binary search performed many times!
– DO NOT underestimate linear search!!!
Why study sorting?
Sorting is a classic subject in computer science. There are three
reasons for studying sorting algorithms.
– First, sorting algorithms illustrate many creative approaches to
problem solving, and these approaches can be applied to solve
other problems.
– Second, sorting algorithms are good for practicing fundamental
programming techniques using selection statements, loops,
methods, and arrays.
– Third, sorting algorithms are excellent examples to demonstrate
algorithm performance.
What data to sort?
The data to be sorted might be integers, doubles, characters, or
objects. The Java API contains several overloaded sort methods
for sorting primitive type values and objects in the
java.util.Arrays and java.util.Collections classes.

For simplicity, this section assumes that:
 the data to be sorted are integers,
 the data are sorted in ascending order, and
 the data are stored in an array.
The programs can easily be modified to sort other types of data, to
sort in descending order, or to sort data in an ArrayList or a
LinkedList.
Bubble Sort
(a) 1st pass: 2 9 5 4 8 1 → 2 5 9 4 8 1 → 2 5 4 9 8 1 → 2 5 4 8 9 1 → 2 5 4 8 1 9
(b) 2nd pass: 2 5 4 8 1 9 → 2 4 5 8 1 9 → 2 4 5 1 8 9
(c) 3rd pass: 2 4 5 1 8 9 → 2 4 1 5 8 9
(d) 4th pass: 2 4 1 5 8 9 → 2 1 4 5 8 9
(e) 5th pass: 2 1 4 5 8 9 → 1 2 4 5 8 9

Bubble sort time: in the worst case, bubble sort takes
$$(n-1) + (n-2) + \cdots + 2 + 1 = \frac{n^2}{2} - \frac{n}{2} = O(n^2)$$
comparisons.
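A sketch of bubble sort matching the passes above; the needNextPass flag is a common refinement (not shown in the passes) that stops early once a pass makes no swaps:

```java
/** Sorts a into ascending order using bubble sort. */
public static void bubbleSort(int[] a) {
    boolean needNextPass = true;
    for (int k = 1; k < a.length && needNextPass; k++) {
        needNextPass = false;               // assume the array is already sorted
        for (int i = 0; i < a.length - k; i++) {
            if (a[i] > a[i + 1]) {          // neighbors out of order: swap them
                int temp = a[i];
                a[i] = a[i + 1];
                a[i + 1] = temp;
                needNextPass = true;        // a swap happened: another pass is needed
            }
        }
    }
}
```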
Merge Sort
                 2 9 5 4 8 1 6 7
split (divide):  2 9 5 4 | 8 1 6 7
split:           2 9 | 5 4 | 8 1 | 6 7
split:           2 | 9 | 5 | 4 | 8 | 1 | 6 | 7
merge (conquer): 2 9 | 4 5 | 1 8 | 6 7
merge:           2 4 5 9 | 1 6 7 8
merge:           1 2 4 5 6 7 8 9
Merge Two Sorted Lists
To merge the sorted lists 2 4 5 9 and 1 6 7 8 into temp, keep an index into each list (current1 and current2) and one into temp (current3), and repeatedly move the smaller of the two current elements to temp:
(a) After moving 1 to temp: temp = 1
(b) After moving all the elements in list2 to temp: temp = 1 2 4 5 6 7 8
(c) After moving 9 to temp: temp = 1 2 4 5 6 7 8 9
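A sketch of merge sort following this scheme (the class and method names are illustrative):

```java
import java.util.Arrays;

public class MergeSortSketch {
    /** Sorts a into ascending order using merge sort. */
    public static void mergeSort(int[] a) {
        if (a.length <= 1) return;                        // base case: nothing to split
        int[] firstHalf = Arrays.copyOfRange(a, 0, a.length / 2);
        int[] secondHalf = Arrays.copyOfRange(a, a.length / 2, a.length);
        mergeSort(firstHalf);                             // divide ...
        mergeSort(secondHalf);
        merge(firstHalf, secondHalf, a);                  // ... and conquer
    }

    /** Merges sorted list1 and list2 into temp using current1/current2/current3. */
    private static void merge(int[] list1, int[] list2, int[] temp) {
        int current1 = 0, current2 = 0, current3 = 0;
        while (current1 < list1.length && current2 < list2.length) {
            if (list1[current1] < list2[current2])
                temp[current3++] = list1[current1++];     // move the smaller element
            else
                temp[current3++] = list2[current2++];
        }
        while (current1 < list1.length)                   // copy whatever is left over
            temp[current3++] = list1[current1++];
        while (current2 < list2.length)
            temp[current3++] = list2[current2++];
    }
}
```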
Merge Sort Time
Let T(n) denote the time required for sorting an
array of n elements using merge sort. Without loss
of generality, assume n is a power of 2. The merge
sort algorithm splits the array into two subarrays,
sorts the subarrays using the same algorithm
recursively, and then merges the subarrays. So,
$$T(n) = T\!\left(\frac{n}{2}\right) + T\!\left(\frac{n}{2}\right) + \text{mergetime}$$
Merge Sort Time
The first T(n/2) is the time for sorting the first
half of the array and the second T(n/2) is the time
for sorting the second half. To merge two
subarrays, it takes at most n-1 comparisons to
compare the elements from the two subarrays and
n moves to move elements to the temporary
array. So, the total time is 2n-1. Therefore,
$$T(n) = 2T\!\left(\frac{n}{2}\right) + 2n - 1 = 2\left(2T\!\left(\frac{n}{4}\right) + 2\cdot\frac{n}{2} - 1\right) + 2n - 1 = 2^2 T\!\left(\frac{n}{2^2}\right) + 2n - 2 + 2n - 1$$
$$= 2^k T\!\left(\frac{n}{2^k}\right) + 2n - 2^{k-1} + \cdots + 2n - 2 + 2n - 1$$
$$= 2^{\log n}\, T\!\left(\frac{n}{2^{\log n}}\right) + 2n - 2^{\log n - 1} + \cdots + 2n - 2 + 2n - 1$$
$$= n + 2n\log n - 2^{\log n} + 1 = 2n\log n + 1 = O(n\log n)$$
Quick Sort
Quick sort, developed by C. A. R. Hoare (1962),
works as follows: the algorithm selects an element,
called the pivot, in the array. It divides the array into
two parts such that all the elements in the first part
are less than or equal to the pivot and all the
elements in the second part are greater than the
pivot. It then recursively applies the quick sort
algorithm to the first part and then to the second part.
Quick Sort
(a) The original array (pivot 5):                            5 2 9 3 8 4 0 1 6 7
(b) The original array is partitioned:                       4 2 1 3 0 | 5 | 8 9 6 7
(c) The partial array (4 2 1 3 0) is partitioned (pivot 4):  0 2 1 3 | 4
(d) The partial array (0 2 1 3) is partitioned (pivot 0):    0 | 2 1 3
(e) The partial array (2 1 3) is partitioned (pivot 2):      1 | 2 | 3
Partition
(a) Initialize pivot (5), low, and high:     5 2 9 3 8 4 0 1 6 7
(b) Search forward (low stops at 9) and backward (high stops at 1)
(c) 9 is swapped with 1:                     5 2 1 3 8 4 0 9 6 7
(d) Continue the search (low stops at 8, high stops at 0)
(e) 8 is swapped with 0:                     5 2 1 3 0 4 8 9 6 7
(f) When high < low, the search is over:     5 2 1 3 0 4 8 9 6 7
(g) The pivot is placed in the right spot:   4 2 1 3 0 5 8 9 6 7
The index of the pivot is returned.
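A sketch of quick sort with this partition scheme; the pivot is the first element, and the method names are illustrative (this is one common partition variant, not the only one):

```java
/** Sorts a[first..last] into ascending order using quick sort. */
public static void quickSort(int[] a, int first, int last) {
    if (first < last) {
        int pivotIndex = partition(a, first, last);
        quickSort(a, first, pivotIndex - 1);   // the part before the pivot
        quickSort(a, pivotIndex + 1, last);    // the part after the pivot
    }
}

/** Partitions a[first..last] around a[first]; returns the pivot's final index. */
private static int partition(int[] a, int first, int last) {
    int pivot = a[first];   // the first element is the pivot
    int low = first + 1;    // index for the forward search
    int high = last;        // index for the backward search
    while (high > low) {
        while (low <= high && a[low] <= pivot) low++;   // search forward
        while (low <= high && a[high] > pivot) high--;  // search backward
        if (high > low) {                               // swap the out-of-place pair
            int temp = a[high];
            a[high] = a[low];
            a[low] = temp;
        }
    }
    while (high > first && a[high] >= pivot) high--;    // find the pivot's final slot
    if (pivot > a[high]) {
        a[first] = a[high];                             // place the pivot
        a[high] = pivot;
        return high;
    }
    return first;                                       // pivot is already in place
}
```

Calling quickSort(a, 0, a.length - 1) on the array above reproduces step (g): 4 2 1 3 0 5 8 9 6 7 after the first partition.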


Quick Sort Time
To partition an array of n elements, it takes n-1
comparisons and n moves in the worst case. So,
the time required for partition is O(n).
Worst-Case Time
In the worst case, the pivot divides the array into one
big subarray and one empty subarray each time. The
size of the big subarray shrinks by one with each
division, so the algorithm requires
$$(n-1) + (n-2) + \cdots + 2 + 1 = O(n^2)$$
time.
Best-Case Time
In the best case, the pivot divides the array into two
parts of about the same size each time. Let T(n)
denote the time required for sorting an array of n
elements using quick sort. So,
$$T(n) = T\!\left(\frac{n}{2}\right) + T\!\left(\frac{n}{2}\right) + n = O(n\log n)$$
Average-Case Time
On average, the pivot divides the array neither into
two parts of exactly the same size nor into one empty
part. Statistically, the sizes of the two parts are very
close, so the average time is O(nlogn). The exact
average-case analysis is beyond the scope of this book.
Heap
A heap is a useful data structure for designing efficient
sorting algorithms and priority queues. A heap is a binary
tree with the following properties:
 It is a complete binary tree.
 Each node is greater than or equal to any of its children.
Complete Binary Tree
A binary tree is complete if every level of the tree is full
except that the last level may not be full and all the leaves
on the last level are placed left-most. For example, in
Figure below the binary trees in (a) and (b) are complete,
but the binary trees in (c) and (d) are not complete. Further,
the binary tree in (a) is a heap, but the binary tree in (b) is
not a heap, because the root (39) is less than its right child
(42).

[Figure: four binary trees rooted at 42, 39, 42, and 42. Trees (a) and (b) are complete; (c) and (d) are not, because their last-level leaves are not placed left-most. Tree (a) is a heap; tree (b) is not, since its root 39 is less than its right child 42.]
Representing a Heap
For a node at position i, its left child is at position 2i+1 and
its right child is at position 2i+2, and its parent is (i-1)/2.
For example, the node for element 39 is at position 4, so its
left child (element 14) is at 9 (2*4+1), its right child
(element 33) is at 10 (2*4+2), and its parent (element 42) is
at 1 ((4-1)/2).
Index: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]
Value:  62  42  59  32  39  44  13  22  29  14  33   30   17    9

[Figure: the same elements drawn as a tree: 62 at the root, 42 and 59 as its children, and so on level by level, with the left child, right child, and parent links labeled.]
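In Java, the index arithmetic looks like this (the helper-method names are illustrative); for i = 4 it gives left child 9, right child 10, and parent 1, matching the example above:

```java
/** Index arithmetic for a complete binary tree stored in an array. */
static int leftChild(int i)  { return 2 * i + 1; }
static int rightChild(int i) { return 2 * i + 2; }
static int parent(int i)     { return (i - 1) / 2; } // integer division
```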
Adding Elements to the Heap
Adding 3, 5, 1, 19, 11, and 22 to an initially empty heap (each heap shown level by level):
(a) After adding 3:   3
(b) After adding 5:   5 / 3
(c) After adding 1:   5 / 3 1
(d) After adding 19:  19 / 5 1 / 3
(e) After adding 11:  19 / 11 1 / 3 5
(f) After adding 22:  22 / 11 19 / 3 5 1
Rebuild the heap after adding a new node
Adding 88 to the heap:
(a) Add 88 as the last node:    22 / 11 19 / 3 5 1 88
(b) After swapping 88 with 19:  22 / 11 88 / 3 5 1 19
(c) After swapping 88 with 22:  88 / 11 22 / 3 5 1 19
Removing the Root and Rebuilding the Tree
Removing root 62 from the heap (each heap shown level by level):
(a) The original heap:                   62 / 42 59 / 32 39 44 13 / 22 29 14 33 30 17 9
(b) Move the last node, 9, to the root:  9 / 42 59 / 32 39 44 13 / 22 29 14 33 30 17
(c) Swap 9 with 59:                      59 / 42 9 / 32 39 44 13 / 22 29 14 33 30 17
(d) Swap 9 with 44:                      59 / 42 44 / 32 39 9 13 / 22 29 14 33 30 17
(e) Swap 9 with 30:                      59 / 42 44 / 32 39 30 13 / 22 29 14 33 9 17
The Heap Class
Heap<E>
-list: java.util.ArrayList<E>
+Heap() Creates a default empty heap.
+Heap(objects: E[]) Creates a heap with the specified objects.
+add(newObject: E): void Adds a new object to the heap.
+remove(): E Removes the root from the heap and returns it.
+getSize(): int Returns the size of the heap.

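Below is a minimal sketch of such a class (only add, remove, and getSize; the E[] constructor is omitted), following the swap-with-parent and swap-with-larger-child walks illustrated above:

```java
/** A max-heap backed by an array list, as in the class diagram above. */
public class Heap<E extends Comparable<E>> {
    private java.util.ArrayList<E> list = new java.util.ArrayList<>();

    /** Adds newObject as the last node, then walks it up toward the root. */
    public void add(E newObject) {
        list.add(newObject);
        int i = list.size() - 1;
        while (i > 0) {
            int parent = (i - 1) / 2;
            if (list.get(i).compareTo(list.get(parent)) > 0) {
                E temp = list.get(i);              // child is larger: swap with parent
                list.set(i, list.get(parent));
                list.set(parent, temp);
                i = parent;
            } else {
                break;                             // heap property restored
            }
        }
    }

    /** Removes and returns the root; the last node replaces it and sifts down. */
    public E remove() {
        if (list.isEmpty()) return null;
        E root = list.get(0);
        list.set(0, list.get(list.size() - 1));    // move the last node to the root
        list.remove(list.size() - 1);
        int i = 0;
        while (2 * i + 1 < list.size()) {          // while the node has a left child
            int max = 2 * i + 1;
            int right = 2 * i + 2;
            if (right < list.size() && list.get(right).compareTo(list.get(max)) > 0)
                max = right;                       // max is the larger child
            if (list.get(i).compareTo(list.get(max)) < 0) {
                E temp = list.get(i);              // node is smaller: swap with child
                list.set(i, list.get(max));
                list.set(max, temp);
                i = max;
            } else {
                break;
            }
        }
        return root;
    }

    /** Returns the number of elements in the heap. */
    public int getSize() { return list.size(); }
}
```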


Heap Sort

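Heap sort adds all elements to a heap and then removes them one at a time; each remove() returns the largest remaining element. A sketch using the Heap class above (the method name heapSort is illustrative):

```java
/** Sorts a into ascending order using a max-heap. */
public static <E extends Comparable<E>> void heapSort(E[] a) {
    Heap<E> heap = new Heap<>();
    for (E e : a) heap.add(e);              // build the heap: n adds
    for (int i = a.length - 1; i >= 0; i--)
        a[i] = heap.remove();               // largest first, so fill from the back
}
```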
Heap Sort Time
Let h denote the height of a heap of n elements.
Since a heap is a complete binary tree, the first
level has 1 node, the second level has 2 nodes,
the kth level has $2^{k-1}$ nodes, the (h-1)th level has
$2^{h-2}$ nodes, and the hth level has at least one node
and at most $2^{h-1}$ nodes. Therefore,
$$1 + 2 + \cdots + 2^{h-2} < n \le 1 + 2 + \cdots + 2^{h-2} + 2^{h-1}$$
$$2^{h-1} - 1 < n \le 2^h - 1 \quad\Rightarrow\quad 2^{h-1} < n + 1 \le 2^h$$
$$\log 2^{h-1} < \log(n+1) \le \log 2^h \quad\Rightarrow\quad h - 1 < \log(n+1) \le h$$
so $\log(n+1) \le h < \log(n+1) + 1$. Since each add or remove
walks at most the height of the heap, sorting n elements by
building and emptying the heap takes $O(n\log n)$ time.
Bucket Sort and Radix Sort
All sort algorithms discussed so far are general
sorting algorithms that work for any types of keys
(e.g., integers, strings, and any comparable objects).
These algorithms sort the elements by comparing
their keys. The lower bound for general sorting
algorithms is O(nlogn). So, no sorting algorithms
based on comparisons can perform better than
O(nlogn). However, if the keys are small integers,
you can use bucket sort without having to compare
the keys.
Bucket Sort
Put each element into the bucket that matches its key: an
element with key v goes into bucket v. Collecting the buckets
in key order yields the sorted elements.
Radix Sort
The buckets correspond to the radix: with decimal keys there
are ten buckets, 0 through 9, and bucket sort is applied
repeatedly, one digit position at a time.
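A minimal sketch of least-significant-digit radix sort in this spirit (the names RadixSortSketch, radixSort, and the maxDigits parameter are illustrative, not from the slides):

```java
import java.util.ArrayList;
import java.util.List;

public class RadixSortSketch {
    /** Sorts non-negative ints by distributing them into ten buckets,
        one decimal digit at a time, least significant digit first. */
    public static void radixSort(int[] a, int maxDigits) {
        for (int d = 0, divisor = 1; d < maxDigits; d++, divisor *= 10) {
            List<List<Integer>> buckets = new ArrayList<>();
            for (int b = 0; b < 10; b++) buckets.add(new ArrayList<>());
            for (int value : a)
                buckets.get((value / divisor) % 10).add(value); // key = current digit
            int i = 0;                        // collect buckets 0..9 back into a
            for (List<Integer> bucket : buckets)
                for (int value : bucket)
                    a[i++] = value;
        }
    }
}
```

Because each pass collects the buckets in order without reordering within a bucket, the distribution is stable, which is what makes the digit-by-digit passes correct.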
External Sort
When the data to be sorted are stored in a file too large to fit
in memory, the sort proceeds in two phases.
Phase I
Repeatedly bring data from the file into an array, sort the
array using an internal sorting algorithm, and output the data
from the array to a temporary file.
[Figure: the program reads the original file chunk by chunk into an array, sorts each chunk, and writes it out, producing sorted segments S1, S2, ..., Sk in a temporary file.]
Phase II
Merge a pair of sorted segments (e.g., S1 with S2,
S3 with S4, ..., and so on) into a larger sorted
segment and save the new segment into a new
temporary file. Continue the same process until
one sorted segment results.
S1 S2 S3 S4 S5 S6 S7 S8 ... Sk
Merge: (S1, S2 merged) (S3, S4 merged) (S5, S6 merged) (S7, S8 merged)
Merge: (S1, S2, S3, S4 merged) (S5, S6, S7, S8 merged)
Merge: (S1, S2, S3, S4, S5, S6, S7, S8 merged)
Implementing Phase II
Each merge step merges two sorted segments to form a
new segment. The new segment doubles the number of
elements, so the number of segments is reduced by half
after each merge step. A segment is too large to be
brought into an array in memory. To implement a merge
step, copy half of the segments from file f1.dat to a
temporary file f2.dat. Then merge the first remaining
segment in f1.dat with the first segment in f2.dat into a
temporary file named f3.dat.
Implementing Phase II
f1.dat: S1 S2 S3 S4 S5 S6 S7 S8 ... Sk
Copy the first half of the segments to f2.dat: S1 S2 S3 S4
Merge each remaining segment in f1.dat with the corresponding segment in f2.dat:
f3.dat: (S1, S5 merged) (S2, S6 merged) (S3, S7 merged) (S4, S8 merged)
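As a sketch of a single merge of two sorted sources, the following merges two sorted files holding one integer per line (the file-name parameters and the one-integer-per-line format are assumptions for illustration; the actual program must also track segment boundaries inside f1.dat and f2.dat):

```java
import java.io.*;

public class FileMergeSketch {
    /** Merges two sorted files (one integer per line) into a sorted output file. */
    public static void merge(String in1, String in2, String outName) throws IOException {
        try (BufferedReader r1 = new BufferedReader(new FileReader(in1));
             BufferedReader r2 = new BufferedReader(new FileReader(in2));
             PrintWriter out = new PrintWriter(new FileWriter(outName))) {
            String line1 = r1.readLine();
            String line2 = r2.readLine();
            while (line1 != null && line2 != null) {    // move the smaller value out
                if (Integer.parseInt(line1) <= Integer.parseInt(line2)) {
                    out.println(line1);
                    line1 = r1.readLine();
                } else {
                    out.println(line2);
                    line2 = r2.readLine();
                }
            }
            while (line1 != null) { out.println(line1); line1 = r1.readLine(); } // drain leftovers
            while (line2 != null) { out.println(line2); line2 = r2.readLine(); }
        }
    }
}
```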

You might also like