Chap 5 - Sorting


‭CSE 221: Algorithms [ARN] Ariyan Hossain‬

‭Sorting Algorithms‬

‭Merge Sort:‬

This sort follows the divide-and-conquer paradigm. It divides the problem in half, recursively sorts the two subproblems, and then merges the results into a complete sorted sequence.

‭Divide:‬

‭Conquer & Combine:‬


‭Merging/Combining two sorted arrays A and B into another sorted array C in Θ(n):‬


‭Pseudo Code:‬
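As a concrete stand-in for the pseudocode, here is a minimal Python sketch of merge sort (the function and variable names are illustrative, not the course's):

```python
def merge(A, B):
    """Merge two sorted lists A and B into one sorted list C in Theta(n)."""
    C = []
    i = j = 0
    while i < len(A) and j < len(B):
        if A[i] <= B[j]:           # <= keeps the sort stable
            C.append(A[i]); i += 1
        else:
            C.append(B[j]); j += 1
    C.extend(A[i:])                # append whatever remains of A
    C.extend(B[j:])                # append whatever remains of B
    return C

def merge_sort(arr):
    if len(arr) <= 1:              # base case: 0 or 1 elements is already sorted
        return arr
    mid = len(arr) // 2            # divide in half
    return merge(merge_sort(arr[:mid]), merge_sort(arr[mid:]))
```

Note that each level of recursion copies the subarrays, which is exactly the extra O(n) space the disadvantages below refer to.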

‭Advantages‬‭:‬

●	Consistent time complexity O(n log n), making it efficient for large datasets
‭●‬ ‭Stable sorting algorithm, preserving the relative order of equal elements‬

‭Disadvantages:‬

●	Requires additional memory space (out-of-place sorting) for the temporary arrays during the merging process, leading to a higher space complexity of O(n)
‭●‬ ‭Slower for small datasets compared to simpler algorithms like insertion sort‬

‭Follow-up Question:‬

‭Q) What is the Recurrence Equation?‬

Ans: It depends on the number of subproblems, the size of each subproblem, and the work done at each step. Here, each step produces 2 subproblems whose size is divided by 2 (n/2), and the work done at each step is n. T(n) = 2T(n/2) + n


‭Quick Sort:‬

This sort also follows the divide-and-conquer paradigm. It partitions the array into subarrays around a pivot x such that the elements in the lower subarray ≤ x ≤ the elements in the upper subarray, recursively sorts the two subarrays, and then concatenates the lower subarray, the pivot, and the upper subarray.

‭Divide / Partition:‬

‭Conquer & Combine:‬


‭Partitioning/Dividing an array A in Θ(n):‬


‭Pseudo Code:‬
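As a concrete stand-in for the pseudocode, here is a Python sketch using the Lomuto partition scheme with the last element as the pivot (one common choice; the course's exact scheme may differ):

```python
def partition(A, low, high):
    """Lomuto partition: place the pivot A[high] into its final position
    so that everything left of it is <= pivot and everything right is > pivot."""
    pivot = A[high]
    i = low - 1                    # boundary of the "<= pivot" region
    for j in range(low, high):
        if A[j] <= pivot:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[high] = A[high], A[i + 1]   # drop the pivot between the regions
    return i + 1                            # final index of the pivot

def quick_sort(A, low=0, high=None):
    if high is None:
        high = len(A) - 1
    if low < high:
        p = partition(A, low, high)
        quick_sort(A, low, p - 1)   # recursively sort the lower subarray
        quick_sort(A, p + 1, high)  # recursively sort the upper subarray
    return A
```

Because the pivot is swapped into place inside the same array, no concatenation step is needed; the "combine" is free.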

‭Quicksort is unique because its speed depends on the pivot you choose.‬

‭Worst Case vs Average/Best Case:‬

The worst case happens when the pivot is the first or last element of an already sorted (ascending or descending) array. The result is that one of the partitions is always empty.


‭Worst Case:‬

There are O(n) levels (the height is O(n)), and each level takes O(n) time. The entire algorithm will take O(n) * O(n) = O(n²) time.

‭Best Case:‬


There are O(log n) levels (the height is O(log n)), and each level takes O(n) time. The entire algorithm will take O(n) * O(log n) = O(n log n) time.

The best case is also the average case: if you always choose a random element of the array as the pivot, quicksort will complete in O(n log n) time on average.

‭Advantages‬‭:‬

●	Among the fastest sorting algorithms in practice, with O(n log n) average time, making it efficient for large datasets
‭●‬ ‭Does not require additional memory space (In-place sorting)‬

‭Disadvantages:‬

●	O(n²) time complexity in the worst-case scenario
‭●‬ ‭Unstable sorting algorithm, not preserving the relative order of equal elements‬

‭Follow-up Question:‬

Q) If quicksort is O(n log n) on average, but merge sort is O(n log n) always, why not use merge sort? Isn't it faster?

Ans: Even though both are the same speed in Big O notation, quicksort is faster in practice. When you write Big O notation like O(n), it really means O(c * n), where c is some fixed amount of time that your algorithm takes (the constant factor). Let's see an example:


Quicksort has a smaller constant than merge sort, so if they're both O(n log n) time, quicksort is faster. And quicksort is faster in practice because it hits the average case far more often than the worst case.

‭Q) What is the Recurrence Equation when it is worst-case?‬

Ans: It depends on the number of subproblems, the size of each subproblem, and the work done at each step. Here, each step produces 1 subproblem whose size is reduced by 1 (n-1), and the work done at each step is n. T(n) = T(n-1) + n

‭Q) What is the Recurrence Equation when it is best-case?‬

Ans: It depends on the number of subproblems, the size of each subproblem, and the work done at each step. Here, each step produces 2 subproblems whose size is divided by 2 (n/2), and the work done at each step is n. T(n) = 2T(n/2) + n

Q) How do you make sure your algorithm never reaches the worst case when choosing the 1st element as the pivot?

Ans: Use randomized quicksort. It is the same as the usual quicksort, except that you swap the pivot with a randomly chosen element within the range before partitioning. This gives O(n log n) in expectation; the worst case is still possible, but it becomes vanishingly unlikely.
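A Python sketch of the random-pivot variant (assuming the Lomuto partition scheme; the guarantee is in expectation, not absolute):

```python
import random

def randomized_quick_sort(A, low=0, high=None):
    """Quicksort with a random pivot: expected O(n log n) on any input."""
    if high is None:
        high = len(A) - 1
    if low < high:
        r = random.randint(low, high)     # pick a random index in [low, high]
        A[r], A[high] = A[high], A[r]     # swap it into the pivot slot
        pivot, i = A[high], low - 1       # then partition exactly as usual (Lomuto)
        for j in range(low, high):
            if A[j] <= pivot:
                i += 1
                A[i], A[j] = A[j], A[i]
        A[i + 1], A[high] = A[high], A[i + 1]  # pivot lands at index i + 1
        randomized_quick_sort(A, low, i)       # sort the lower subarray
        randomized_quick_sort(A, i + 2, high)  # sort the upper subarray
    return A
```

The only change from plain quicksort is the one random swap before partitioning, which is why the per-call cost stays Θ(n).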

‭Heap Sort:‬

‭Heap Sort uses a data structure called the heap.‬

‭Heap Data Structure:‬

‭Recap:‬

●	In a binary tree, a parent can have a maximum of 2 children and a minimum of 0 children.

●	In a complete binary tree, nodes are added in a left-to-right manner, not skipping any position. A parent can have 0, 1, or 2 children.


A heap is an Abstract Data Type (ADT) for storing values. Its underlying data structure is an array.

A heap has to be a complete binary tree, and it must satisfy the heap property.

Heap property:

●	The value of the parent must be greater than or equal to the values of its children (max heap),
or
●	The value of the parent must be smaller than or equal to the values of its children (min heap).

There are two types of heaps. The max heap is the most commonly used (the default). A heap can be either a max heap or a min heap, but not both at the same time.

The heap data structure provides worst-case O(1) access to the largest (max-heap) or smallest (min-heap) element, and worst-case Θ(log n) time to extract the largest (max-heap) or smallest (min-heap) element.

Note: The tree is used for efficient tracing and visualization. In a program, the data structure is a simple array.

The benefit of using an array for a heap rather than a linked list is that arrays give you random access to elements by index: you can pick any element just by its index, and finding a parent and its children is trivial. A linked list, by contrast, is sequential, so you must keep visiting elements until you find the one you are looking for; it does not allow random access the way an array does. Also, each linked-list node would need three (3) references to traverse the whole tree (parent, left, right).
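With 0-based indexing (an assumption here; some textbooks use 1-based arrays), the parent/child arithmetic described above is just:

```python
def parent(i):
    return (i - 1) // 2   # index of the parent of node i

def left(i):
    return 2 * i + 1      # index of the left child of node i

def right(i):
    return 2 * i + 2      # index of the right child of node i
```

For example, in the heap array [102, 100, 3, 5], the children of 100 (index 1) sit at indices 3 and 4, with no pointers needed.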

‭Heap Operations:‬

‭●‬ ‭Insert:‬

Inserts an element at the bottom of the heap. Then we must make sure that the heap property remains unchanged. When inserting an element into the heap, we start from the leftmost available position and move to the right.

Here, the heap property is kept intact. What if we want to insert 102 instead of 3? 102 would be added as a child of 5, but the heap property would be broken. Therefore, we need to put 102 in its correct position.

‭●‬ ‭HeapIncreaseKey / Swim:‬

Let the new node be 'n' (in this case, the node that contains 102). Compare 'n' with its parent. If the parent is smaller than the node 'n' (n > parent), swap 'n' with the parent. Continue this process until n is in its correct position.
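A Python sketch of insert followed by swim on a 0-based max-heap array (function names are illustrative):

```python
def swim(heap, i):
    """Bubble heap[i] up while it is larger than its parent; O(log n)."""
    while i > 0 and heap[(i - 1) // 2] < heap[i]:
        parent = (i - 1) // 2
        heap[i], heap[parent] = heap[parent], heap[i]  # swap child with parent
        i = parent

def insert(heap, key):
    heap.append(key)           # O(1): place the key at the leftmost free position
    swim(heap, len(heap) - 1)  # O(log n): climb until the heap property holds
```

Inserting 102 into the heap [100, 5, 3] makes 102 climb past 5 and then past 100 to the root, matching the worked example above.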


The best-case time complexity is O(1), when a key lands in its correct position on the first go. The worst case is when the newest node needs to climb all the way up to the root:
O(1) [insertion] + O(log n) [swim] = O(log n)

‭●‬ ‭Delete:‬

In a heap, you cannot just randomly delete an item. Deletion is done by replacing the root with the last element. The heap property will be broken, as a small value will now be at the top (root) of the heap. Therefore, we must put it in the right place.


Here, the root 102 will be replaced by the last element 5, and 102 will be removed. The heap property will be broken. Therefore, we need to put 5 in its correct position.

‭●‬ ‭MaxHeapify / Sink:‬

Let the replaced node be 'n' (in this case, the node that contains 5). Compare 'n' with its children. If the node 'n' is smaller than any child (n < any child), swap 'n' with the largest child. Continue this process until n is in its correct position.
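A Python sketch of delete followed by sink on a 0-based max-heap array (function names are illustrative):

```python
def sink(heap, i):
    """Push heap[i] down, swapping with its largest child,
    until the max-heap property holds; O(log n)."""
    n = len(heap)
    while True:
        largest, left, right = i, 2 * i + 1, 2 * i + 2
        if left < n and heap[left] > heap[largest]:
            largest = left
        if right < n and heap[right] > heap[largest]:
            largest = right
        if largest == i:          # n is in its correct position
            break
        heap[i], heap[largest] = heap[largest], heap[i]
        i = largest

def delete_max(heap):
    heap[0], heap[-1] = heap[-1], heap[0]  # replace the root with the last element
    top = heap.pop()                       # remove the old maximum
    if heap:
        sink(heap, 0)                      # restore the heap property
    return top
```

On the heap [102, 100, 3, 5], delete_max returns 102 and sinks 5 below 100, matching the worked example.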

The deleted element will always be the maximum element in a max-heap. Time complexity is O(1) [deletion] + O(log n) [sink] = O(log n).


‭●‬ ‭Heap Sort:‬

Delete + sink all the nodes of the heap and store them in an array. This returns an array sorted in descending order; reversing it gives an array sorted in ascending order.

‭Simulation:‬

‭Delete + Sink takes O(log n) and for ‘n’ nodes, Heap Sort will take O(n log n).‬
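The notes describe copying each deleted maximum into a second array and reversing it; an equivalent, commonly used in-place variant swaps each deleted maximum to the end of the shrinking heap, which yields ascending order directly. A sketch, assuming a 0-based array:

```python
def _sink(a, i, n):
    """Sink a[i] within a[0:n], the portion that is still a live heap."""
    while True:
        largest, left, right = i, 2 * i + 1, 2 * i + 2
        if left < n and a[left] > a[largest]:
            largest = left
        if right < n and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest

def heap_sort(a):
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):  # build the max-heap bottom-up
        _sink(a, i, n)
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]      # "delete" the max into its final slot
        _sink(a, 0, end)                 # restore the heap over the shrunken prefix
    return a
```

Each of the n delete + sink steps costs O(log n), so the whole sort is O(n log n), as stated above.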


‭●‬ ‭Build Max Heap:‬

You are given an arbitrary array and asked to build it into a heap. Inserting the n elements one by one takes O(n log n); a tighter analysis of the bottom-up build (sinking each internal node) shows it is actually O(n).

‭Advantages‬‭:‬

●	Consistent time complexity O(n log n), making it efficient for large datasets
‭●‬ ‭Does not require additional memory space (In-place sorting)‬
●	The underlying heap is often used to implement a priority queue

‭Disadvantages:‬

‭●‬ ‭Unstable sorting algorithm, not preserving the relative order of equal elements‬

‭Count Sort:‬

Count sort, also known as counting sort, is a non-comparative integer sorting algorithm. This technique doesn't sort by comparing elements, but rather by using a frequency array. It is efficient when the range of the input data (i.e., the difference between the maximum and minimum values) is not significantly greater than the number of elements to be sorted.

Step 1: Find the maximum element of the given array.


Step 2: Initialize a countArray[] of length max+1 with all elements set to 0. This array will be used to store the occurrences of the elements of the input array.

Step 3: In countArray[], store the count of each unique element of the input array at its respective index.

Step 4: Store the cumulative sum of the elements of countArray[] by doing countArray[i] = countArray[i - 1] + countArray[i].

Step 5: Iterate from the end of the input array (to preserve stability) and update outputArray[ countArray[ inputArray[i] ] - 1 ] = inputArray[i]; also update countArray[ inputArray[i] ] = countArray[ inputArray[i] ] - 1 (so that duplicate values are not overwritten).

‭For i=7‬


‭For i=6‬

...

‭For i=0‬

The outputArray now contains the sorted array.


‭Pseudo Code:‬
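The five steps above can be sketched in Python as follows (this version decrements the count before writing, which is equivalent to the "- 1" in Step 5; it assumes non-negative integers):

```python
def counting_sort(arr):
    """Stable counting sort for non-negative integers; O(n + k) time."""
    if not arr:
        return []
    k = max(arr)                       # Step 1: the maximum element
    count = [0] * (k + 1)              # Step 2: countArray of length max+1
    for x in arr:                      # Step 3: count the occurrences
        count[x] += 1
    for i in range(1, k + 1):          # Step 4: cumulative sums
        count[i] += count[i - 1]
    output = [0] * len(arr)
    for x in reversed(arr):            # Step 5: back-to-front for stability
        count[x] -= 1                  # decrement so duplicates get distinct slots
        output[count[x]] = x
    return output
```

The two loops over the input take O(n) and the two loops over the counts take O(k), giving the O(n+k) total discussed in the follow-up question below.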

‭Advantages:‬

●	Consistent linear time complexity O(n+k)
‭●‬ ‭Stable sorting algorithm, preserving the relative order of equal elements‬

‭Disadvantages:‬

●	Counting sort is inefficient if the range of values to be sorted is very large
●	Requires additional memory space (out-of-place sorting) for countArray and outputArray, leading to a higher space complexity of O(n+k)
●	Counting sort does not work on decimal (non-integer) values

‭Follow-Up Question:‬

‭Q) What is the Time Complexity?‬

Ans: The first loop takes O(k), the second loop takes O(n), the third loop takes O(k), and the last loop takes O(n). Hence, the time complexity is O(n+k).

