Chap 5 - Sorting
Sorting Algorithms
Merge Sort:
This sort follows a divide and conquer algorithm. It divides the problem in half, recursively sorts the two subproblems, and then merges the results into a complete sorted sequence.
Divide:
CSE 221: Algorithms [ARN] Ariyan Hossain
Merging/Combining two sorted arrays A and B into another sorted array C in Θ(n):
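The Θ(n) merge step can be sketched as follows (a minimal Python sketch; the names A, B, and C follow the text):

```python
def merge(A, B):
    """Merge two already-sorted arrays A and B into a new sorted array C in Theta(n)."""
    C = []
    i = j = 0
    # Repeatedly take the smaller front element; using <= keeps the merge stable.
    while i < len(A) and j < len(B):
        if A[i] <= B[j]:
            C.append(A[i])
            i += 1
        else:
            C.append(B[j])
            j += 1
    # One input is exhausted; copy the leftovers of the other.
    C.extend(A[i:])
    C.extend(B[j:])
    return C
```

Each element of A and B is visited exactly once, which is where the Θ(n) bound comes from.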
Pseudo Code:
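The divide/sort/merge scheme can be sketched in Python as follows (a minimal sketch; the course's exact pseudocode may differ in details):

```python
def merge_sort(arr):
    """Sort arr by divide and conquer: split in half, sort each half, merge."""
    if len(arr) <= 1:                  # base case: 0 or 1 elements are already sorted
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])       # recursively sort the two halves
    right = merge_sort(arr[mid:])
    # Merge the two sorted halves: the Theta(n) combine step.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:        # <= keeps the sort stable
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]
```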
Advantages:
● Consistent time complexity O(n log n), making it efficient for large datasets
● Stable sorting algorithm, preserving the relative order of equal elements
Disadvantages:
● Requires additional memory space (Out-of-place sorting) for the temporary arrays during the merging process, leading to higher space complexity O(n)
● Slower for small datasets compared to simpler algorithms like insertion sort
Follow-up Question:
Ans: It depends on the number of subproblems, each subproblem's size, and the work done at each step. In each step, there are 2 subproblems where the size gets divided by 2 (n/2) and the work done is n at each step. T(n) = 2T(n/2) + n
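Unrolling this recurrence shows where the n log n comes from (a short expansion, assuming n is a power of 2):

```latex
T(n) = 2T(n/2) + n
     = 4T(n/4) + 2n
     = \dots
     = 2^{k}\,T\!\left(n/2^{k}\right) + kn
```

Setting k = log₂ n gives T(n) = n·T(1) + n·log₂ n = O(n log n).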
Quick Sort:
This sort also follows a divide and conquer algorithm. It partitions the array into subarrays around a pivot x such that the elements in the lower subarray ≤ x ≤ the elements in the upper subarray, recursively sorts the 2 subarrays, and then concatenates the lower subarray, the pivot, and the upper subarray.
Divide / Partition:
Pseudo Code:
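The partition-and-recurse scheme above can be sketched as follows (a minimal sketch using the Lomuto partition with the last element as pivot; the course's exact pseudocode may choose the pivot differently):

```python
def partition(arr, lo, hi):
    """Rearrange arr[lo..hi] around the pivot arr[hi]; return the pivot's final index."""
    pivot = arr[hi]
    i = lo - 1                                   # boundary of the "<= pivot" region
    for j in range(lo, hi):
        if arr[j] <= pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i + 1], arr[hi] = arr[hi], arr[i + 1]    # place pivot between the two regions
    return i + 1

def quick_sort(arr, lo=0, hi=None):
    """Sort arr in place: partition around a pivot, then sort both sides."""
    if hi is None:
        hi = len(arr) - 1
    if lo < hi:
        p = partition(arr, lo, hi)
        quick_sort(arr, lo, p - 1)   # lower subarray (elements <= pivot)
        quick_sort(arr, p + 1, hi)   # upper subarray (elements >= pivot)
```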
Quicksort is unique because its speed depends on the pivot you choose.
Worst Case happens when the pivot is the first or last element of a sorted (ascending or descending) array. The result is that one of the partitions is always empty.
Worst Case:
There are O(n) levels, i.e. the height is O(n), and each level takes O(n) time. The entire algorithm will take O(n) · O(n) = O(n²) time.
Best Case:
There are O(log n) levels, i.e. the height is O(log n), and each level takes O(n) time. The entire algorithm will take O(n) · O(log n) = O(n log n) time.
The best case is also the average case. If you always choose a random element in the array as the pivot, quicksort will complete in O(n log n) time on average.
Advantages:
● Among the fastest comparison sorts in practice, with O(n log n) average time, making it efficient for large datasets
● Does not require additional memory space (In-place sorting)
Disadvantages:
● O(n²) time complexity in the worst-case scenario
● Unstable sorting algorithm, not preserving the relative order of equal elements
Follow-up Question:
Q) If quicksort is O(n log n) on average, but merge sort is O(n log n) always, why not use merge sort? Isn't it faster?
Ans: Even though both functions are the same speed in Big O notation, quicksort is faster in practice. When you write Big O notation like O(n), it really means O(c · n), where c is some fixed amount of time that your algorithm takes. Let's see an example:
Quicksort has a smaller constant than merge sort. So if they're both O(n log n) time, quicksort is faster. And quicksort is faster in practice because it hits the average case far more often than the worst case.
Ans: It depends on the number of subproblems, each subproblem's size, and the work done at each step. In each step there is 1 subproblem whose size is reduced by 1 (n-1) and the work done is n at each step. T(n) = T(n-1) + n
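This recurrence unrolls into an arithmetic series, which is why the worst case is quadratic:

```latex
T(n) = T(n-1) + n = n + (n-1) + \dots + 2 + 1 = \frac{n(n+1)}{2} = O(n^2)
```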
Ans: It depends on the number of subproblems, each subproblem's size, and the work done at each step. In each step, there are 2 subproblems where the size gets divided by 2 (n/2) and the work done is n at each step. T(n) = 2T(n/2) + n
Q) How do you make sure your algorithm never reaches the worst case when choosing the 1st element as a pivot?
Ans: Use Randomized Quicksort. It is the same as the usual Quicksort, but you swap the pivot with a random element of the array within the range and then start partitioning. This gives O(n log n) on average.
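The random-pivot swap can be sketched as follows (a minimal sketch using the Lomuto partition; the random element is swapped into the pivot slot before partitioning, and the expected running time is O(n log n)):

```python
import random

def randomized_quick_sort(arr, lo=0, hi=None):
    """Quicksort, but swap a random element into the pivot slot before partitioning."""
    if hi is None:
        hi = len(arr) - 1
    if lo < hi:
        # Pick a random index in [lo, hi] and move that element to the pivot slot.
        r = random.randint(lo, hi)
        arr[r], arr[hi] = arr[hi], arr[r]
        pivot, i = arr[hi], lo - 1
        for j in range(lo, hi):                   # Lomuto partition around the random pivot
            if arr[j] <= pivot:
                i += 1
                arr[i], arr[j] = arr[j], arr[i]
        arr[i + 1], arr[hi] = arr[hi], arr[i + 1]
        randomized_quick_sort(arr, lo, i)         # sort the lower side
        randomized_quick_sort(arr, i + 2, hi)     # sort the upper side
```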
Heap Sort:
Recap:
● In a Binary Tree, a parent can have a maximum of 2 children and a minimum of 0 children.
● In a Complete Binary Tree, nodes are added to the tree in a left-to-right manner not
skipping any position. A parent can have 0, 1, or 2 children.
A Heap is an Abstract Data Type (ADT) for storing values. Its underlying data structure is an array.
A Heap has to be a complete binary tree and it must satisfy the heap property.
Heap property:
● The value of the parent must be greater than or equal to the values of the children (Max heap).
or
● The value of the parent must be smaller than or equal to the values of the children. (Min
heap).
There are two types of heaps. Max heap is mostly used (the default). A heap can be either a max heap or a min heap but can't be both at the same time.
The Heap data structure provides worst-case O(1) time access to the largest (max-heap) or smallest (min-heap) element, and worst-case Θ(log n) time to extract the largest (max-heap) or smallest (min-heap) element.
Note: Tree is used for efficient tracing. While programming, the data structure is a simple Array.
The benefit of using an Array for a Heap rather than a Linked List is that Arrays give you random access to elements by index: you can pick any element from the Array by just using the corresponding index, so finding a parent and its children is trivial. A Linked List, in contrast, is sequential: you need to keep visiting elements in the linked list until you find the element you are looking for. A Linked List does not allow random access as an Array does. Also, each Linked List node would need three (3) references to traverse the whole tree (parent, left, right).
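With the array representation, the parent/child lookups the text calls trivial are just index arithmetic (a sketch assuming the common 0-indexed convention; some texts use 1-indexed arrays instead):

```python
def parent(i):
    """Index of the parent of the node stored at index i (0-indexed array)."""
    return (i - 1) // 2

def left(i):
    """Index of the left child of the node stored at index i."""
    return 2 * i + 1

def right(i):
    """Index of the right child of the node stored at index i."""
    return 2 * i + 2
```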
Heap Operations:
● Insert:
Inserts an element at the bottom of the Heap. Then we must make sure that the Heap property
remains unchanged. When inserting an element in the Heap, we start from the left available
position to the right.
Here, the Heap property is kept intact. What if we want to insert 102 instead of 3? 102 will be added as a child of 5, but the Heap property will be broken. Therefore, we need to put 102 in its correct position.
Let the new node be 'n' (in this case it is the node that contains 102). Check 'n' against its parent. If the parent is smaller than the node 'n' (n > parent), swap 'n' with the parent. Continue this process until n is in its correct position.
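The insert-then-swim procedure can be sketched on a plain Python list used as a 0-indexed max-heap (a minimal sketch):

```python
def heap_insert(heap, value):
    """Append value at the next free position, then swim it up to restore the max-heap."""
    heap.append(value)                  # O(1) insertion at the bottom of the heap
    i = len(heap) - 1
    # While the parent is smaller, the heap property is broken: swap upward.
    while i > 0 and heap[(i - 1) // 2] < heap[i]:
        heap[(i - 1) // 2], heap[i] = heap[i], heap[(i - 1) // 2]
        i = (i - 1) // 2                # continue from the parent's position; O(log n) swim
```

For example, inserting 102 into the max-heap [10, 5, 3] swims it all the way to the root.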
Best-case Time Complexity is O(1), when a key is inserted in its correct position at the first go. Worst-case Time Complexity occurs when the newest node needs to climb up to the root: O(1) [insertion] + O(log n) [swim] = O(log n).
● Delete:
In a heap, you cannot just randomly delete an item. Deletion is done by replacing the root with the last element. The Heap property will be broken, as a small value will be at the top (root) of the Heap. Therefore we must put it in the right place.
Here, the root 102 will be replaced by the last element 5, and 102 will be removed. The Heap property will be broken. Therefore, we need to put 5 in its correct position.
Let the replaced node be 'n' (in this case it is the node that contains 5). Check 'n' against its children. If the node 'n' is smaller than any child (n < any child), swap 'n' with the largest child. Continue this process until n is in its correct position.
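Deleting the root with the sink step can be sketched as (a minimal sketch on a 0-indexed max-heap list, assuming the heap is non-empty):

```python
def heap_delete_max(heap):
    """Remove and return the root of a max-heap, restoring the property by sinking."""
    root = heap[0]
    heap[0] = heap[-1]          # replace the root with the last element
    heap.pop()                  # drop the now-duplicated last slot
    i, n = 0, len(heap)
    while True:
        largest, l, r = i, 2 * i + 1, 2 * i + 2
        if l < n and heap[l] > heap[largest]:
            largest = l
        if r < n and heap[r] > heap[largest]:
            largest = r
        if largest == i:        # node is not smaller than any child: done
            break
        heap[i], heap[largest] = heap[largest], heap[i]  # swap with the largest child
        i = largest             # O(log n) sink
    return root
```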
The deleted element will always be the maximum element available in a max-heap. Time Complexity is O(1) [deletion] + O(log n) [sink] = O(log n).
Delete + Sink all the nodes of the heap and store them in an array. The array will be sorted in descending order. Reversing the array will give a sorted array in ascending order.
Simulation:
Delete + Sink takes O(log n) and for ‘n’ nodes, Heap Sort will take O(n log n).
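Putting the pieces together, heap sort can be sketched as one self-contained function (a sketch that builds the max-heap by repeated insertion as in the text, then repeatedly deletes the max and sinks):

```python
def heap_sort(arr):
    """Heap sort: build a max-heap, then repeatedly delete the max (delete + sink)."""
    heap = []
    for x in arr:                       # build the heap: n inserts, O(n log n) total
        heap.append(x)
        i = len(heap) - 1
        while i > 0 and heap[(i - 1) // 2] < heap[i]:   # swim the new key up
            heap[(i - 1) // 2], heap[i] = heap[i], heap[(i - 1) // 2]
            i = (i - 1) // 2
    out = []
    while heap:                         # n deletions, O(log n) each
        out.append(heap[0])             # the root is the current maximum
        heap[0] = heap[-1]
        heap.pop()
        i, n = 0, len(heap)
        while True:                     # sink the new root
            big, l, r = i, 2 * i + 1, 2 * i + 2
            if l < n and heap[l] > heap[big]:
                big = l
            if r < n and heap[r] > heap[big]:
                big = r
            if big == i:
                break
            heap[i], heap[big] = heap[big], heap[i]
            i = big
    out.reverse()                       # descending order -> ascending order
    return out
```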
You are given an arbitrary array and you have been asked to build a heap from it. This will take O(n log n).
Advantages:
● Consistent time complexity O(n log n), making it efficient for large datasets
● Does not require additional memory space (In-place sorting)
● The underlying heap is often used as a priority queue
Disadvantages:
● Unstable sorting algorithm, not preserving the relative order of equal elements
Count Sort:
Count sort, also known as counting sort, is a non-comparative integer sorting algorithm. This sorting technique doesn't perform sorting by comparing elements, but rather by using a
frequency array. It is efficient when the range of the input data (i.e., the difference between the
maximum and minimum values) is not significantly greater than the number of elements to be
sorted.
Step 1: Find the maximum element (call it max) in the input array.
Step 2: Initialize a countArray[] of length max+1 with all elements as 0. This array will be used for storing the occurrences of the elements of the input array.
Step 3: In the countArray[], store the count of each unique element of the input array at their respective indices.
Step 4: Store the cumulative sum of the elements of the countArray[] by doing countArray[i] = countArray[i – 1] + countArray[i].
Step 5: Iterate from the end of the input array (to preserve stability) and update outputArray[ countArray[ inputArray[i] ] – 1 ] = inputArray[i], and also update countArray[ inputArray[i] ] = countArray[ inputArray[i] ] – 1 (so that duplicate values are not overwritten).
(Simulation figures: the loop runs from i = 7 down to i = 0.)
Pseudo Code:
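The five steps above can be sketched in Python as follows (a minimal sketch assuming non-negative integers, per Step 2's countArray of length max+1):

```python
def counting_sort(inputArray):
    """Stable counting sort for non-negative integers, following Steps 1-5."""
    if not inputArray:
        return []
    mx = max(inputArray)                      # Step 1: maximum element
    countArray = [0] * (mx + 1)               # Step 2: frequency array, all zeros
    for v in inputArray:                      # Step 3: count occurrences
        countArray[v] += 1
    for i in range(1, mx + 1):                # Step 4: cumulative sums
        countArray[i] += countArray[i - 1]
    outputArray = [0] * len(inputArray)
    for v in reversed(inputArray):            # Step 5: fill from the end for stability
        countArray[v] -= 1                    # decrement so duplicates get distinct slots
        outputArray[countArray[v]] = v
    return outputArray
```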
Advantages:
● Consistent linear time complexity O(n+k), where k is the range of the input values
● Stable sorting algorithm, preserving the relative order of equal elements
Disadvantages:
● Counting sort is inefficient if the range of values to be sorted is very large
● Requires additional memory space (Out-of-place sorting) for countArray and outputArray,
leading to higher space complexity O(n+k)
● Counting sort does not work on decimal values
Follow-Up Question:
Ans: The first loop takes O(k), the second loop takes O(n), the third loop takes O(k), and the last loop takes O(n). Hence, the time complexity is O(n+k).