6 Sorting
10/19/2024 1
Paper Code(s): CIC-209
Paper: Data Structures
L: 4, P: –, C: 4
Marking Scheme:
1. Teachers Continuous Evaluation: 25 marks
2. Term end Theory Examinations: 75 marks
Instructions for paper setter:
1. There should be 9 questions in the term end examinations question paper.
2. The first (1st) question should be compulsory and cover the entire syllabus. This question should be objective, single-line-answer, or short-answer
type questions of 15 marks in total.
3. Apart from question 1 which is compulsory, rest of the paper shall consist of 4 units as per the syllabus. Every unit shall have two questions
covering the corresponding unit of the syllabus. However, the student shall be asked to attempt only one of the two questions in the unit.
Individual questions may contain up to 5 sub-parts / sub-questions. Each unit shall have a marks weightage of 15.
4. The questions are to be framed keeping in view the learning outcomes of the course / paper. The standard / level of the questions to be asked
should be at the level of the prescribed textbook.
5. The requirement of (scientific) calculators / log tables / data tables may be specified if required.
Course Objectives :
1. To introduce basics of Data structures (Arrays, strings, linked list etc.)
2. To understand the concepts of Stacks, Queues and Trees, related operations and their implementation
3. To understand sets, heaps and graphs
4. To introduce various Sorting and searching Algorithms
Course Outcomes (CO)
CO 1 To be able to understand the difference between structured data and data structures
CO 2 To be able to create common basic data structures and trees
CO 3 To have a knowledge of sets, heaps and graphs
CO 4 To have basic knowledge of sorting and searching algorithms
Course Outcomes (CO) to Programme Outcomes (PO) mapping (scale 1: low, 2: Medium, 3: High)
PO01 PO02 PO03 PO04 PO05 PO06 PO07 PO08 PO09 PO10 PO11 PO12
CO 1 3 2 2 2 3 ‐ ‐ ‐ 2 2 2 3
CO 2 3 2 2 2 3 ‐ ‐ ‐ 2 2 2 3
CO 3 3 2 2 2 3 ‐ ‐ ‐ 2 2 2 3
CO 4 3 2 2 2 3 ‐ ‐ ‐ 2 2 2 3
UNIT – I
Overview of data structure, Basics of Algorithm Analysis including Running Time Calculations, Abstract Data Types, Arrays, Arrays and Pointers,
Multidimensional Array, String processing, General Lists and List ADT, List manipulations, Single, double and circular lists. Stacks and Stack ADT,
Stack Manipulation, Prefix, infix and postfix expressions, recursion. Queues and Queue ADT, Queue manipulation.
UNIT – II
Sparse Matrix Representation (Array and Link List representation) and arithmetic (addition, subtraction and multiplication), polynomials and
polynomial arithmetic.
Trees, Properties of Trees, Binary trees, Binary Tree traversal, Tree manipulation algorithms, Expression trees and their usage, binary search trees,
AVL Trees, Heaps and their implementation, Priority Queues, B‐Trees, B* Tree, B+ Tree
UNIT – III
Sorting concept, order, stability, Selection sorts (straight, heap), insertion sort (Straight Insertion, Shell sort), Exchange Sort (Bubble, quicksort),
Merge sort (External Sorting) (Natural merge, balanced merge and polyphase merge). Searching – List search, sequential search, binary search,
hashing methods, collision resolution in hashing.
UNIT – IV
Disjoint sets representation, union find algorithm, Graphs, Graph representation, Graph Traversals and their implementations (BFS and DFS).
Minimum Spanning Tree algorithms, Shortest Path Algorithms
Textbook(s):
1. Richard Gilberg, Behrouz A. Forouzan, “Data Structures: A Pseudocode Approach with C”, 2nd Edition, Cengage Learning, Oct 2004
2. E. Horowitz, S. Sahni, S. Anderson‐Freed, "Fundamentals of Data Structures in C", 2nd Edition, Silicon Press (US), 2007.
References:
1. Mark Allen Weiss, “Data Structures and Algorithm Analysis in C”, 2nd Edition, Pearson, September, 1996
2. Robert Kruse, “Data Structures and Program Design in C”, 2nd Edition, Pearson, November, 1990
3. Seymour Lipschutz, “Data Structures with C (Schaum's Outline Series)”, McGraw-Hill, 2017
4. A. M. Tenenbaum, “Data structures using C”. Pearson Education, India, 1st Edition 2003.
5. Weiss M.A., “Data structures and algorithm analysis in C++”, Pearson Education, 2014.
Data Structure
• A data structure is a way of collecting and organizing data so that operations can be performed on the data efficiently
• In simple language, data structures are structures programmed to store ordered data, so that various operations can be performed on it easily
Structure of Data
Why Study Sorting Algorithms?
Sorting
• Sorting is one of the most common data-processing applications
Sorts are classified as:
• Internal sort: all of the data are held in primary memory during the sorting process
• External sort: uses primary memory for the data currently being sorted and secondary storage for any data that does not fit in primary memory
Sorting
Sorts
• Internal: insertion, selection, exchange
• External merge: natural, balanced, polyphase
Sorting
• Sort order
Ascending
Descending
• Sort Efficiency
Estimate of number of comparisons and moves required to order an unordered list
• Sort Stability
Data with equal keys maintain their relative input order in the output
Stable sort algorithms
INSERTION: Insertion Sort
Insertion Sort
• The first item is already sorted
• Compare the second item to the first; if it is smaller, swap them
• Compare the third item to the item next to it; swap if needed
• After a swap, compare again
• And so forth…
To insert 3, we need to
make room for it by moving
first 6 and then 4.
Insertion Sort
input array
5 2 4 6 1 3
sorted unsorted
Insertion Sort
One step of insertion sort
Before: the elements to the left of 10 are sorted; 10 is next to be inserted:
3 4 7 12 14 14 20 21 33 38 | 10 | 55 9 23 28 16
temp = 10; the sorted elements greater than 10 are shifted one place right, and 10 is inserted:
3 4 7 10 12 14 14 20 21 33 38 | 55 9 23 28 16
The sorted portion has grown by one.
Algorithm
INSERTION-SORT(A)        ▹ A = ⟨a1, a2, …, a8⟩, indices 1 to 8
    for j ← 2 to n
        do key ← A[j]
           ▹ Insert A[j] into the sorted sequence A[1 . . j − 1]
           i ← j − 1
           while i > 0 and A[i] > key
               do A[i + 1] ← A[i]
                  i ← i − 1
           A[i + 1] ← key
• Insertion sort – sorts the elements in place
Insertion Sort
int a[6] = {5, 2, 4, 6, 1, 3};
int i, j, key;                 // n = 6
for (i = 1; i < n; i++) {
    key = a[i];
    j = i - 1;
    while (j >= 0 && key < a[j]) {
        a[j + 1] = a[j];
        j--;
    }
    a[j + 1] = key;
}
Analysis of Insertion Sort
INSERTION-SORT(A)                                     cost    times
for j ← 2 to n                                        c1      n
    do key ← A[j]                                     c2      n − 1
       ▹ Insert A[j] into the sorted A[1 . . j − 1]   0       n − 1
       i ← j − 1                                      c4      n − 1
       while i > 0 and A[i] > key                     c5      Σ_{j=2}^{n} t_j
           do A[i + 1] ← A[i]                         c6      Σ_{j=2}^{n} (t_j − 1)
              i ← i − 1                               c7      Σ_{j=2}^{n} (t_j − 1)
       A[i + 1] ← key                                 c8      n − 1
tj: number of times the while-loop test is executed at iteration j
Best Case Analysis
The array is already sorted: in “while i > 0 and A[i] > key”, A[i] ≤ key holds the first time the while-loop test is run (when i = j − 1), so
tj = 1
T(n) = c1·n + c2(n − 1) + c4(n − 1) + c5(n − 1) + c8(n − 1)
     = (c1 + c2 + c4 + c5 + c8)·n − (c2 + c4 + c5 + c8)
     = an + b = Θ(n)
T(n) = c1·n + c2(n − 1) + c4(n − 1) + c5 Σ_{j=2}^{n} t_j + c6 Σ_{j=2}^{n} (t_j − 1) + c7 Σ_{j=2}^{n} (t_j − 1) + c8(n − 1)
Worst Case Analysis
• The array is in reverse sorted order: in “while i > 0 and A[i] > key”, A[i] > key always holds
• key has to be compared with all j − 1 elements to the left of the j-th position, so tj = j
• Using Σ_{j=1}^{n} j = n(n + 1)/2, we get Σ_{j=2}^{n} j = n(n + 1)/2 − 1 and Σ_{j=2}^{n} (j − 1) = n(n − 1)/2
• We have:
  T(n) = c1·n + c2(n − 1) + c4(n − 1) + c5·(n(n + 1)/2 − 1) + c6·n(n − 1)/2 + c7·n(n − 1)/2 + c8(n − 1)
       = an² + bn + c, a quadratic function of n
• T(n) = Θ(n²): the order of growth is n²
Complexity: Insertion sort
• Worst Case Time Complexity : O(n2)
• Best Case Time Complexity : O(n)
• Average Time Complexity : O(n2)
• Space Complexity : O(1)
INSERTION: Shell Sort
Shell Sort
• Example array of 16 elements, indices 0 to 15
• Initial gap = length / 2 = 16 / 2 = 8
• Initial sub-array indices:
• {0, 8}, {1, 9}, {2, 10}, {3, 11}, {4, 12}, {5, 13}, {6, 14}, {7, 15}
• Next gap = 8 / 2 = 4
• {0, 4, 8, 12}, {1, 5, 9, 13}, {2, 6, 10, 14}, {3, 7, 11, 15}
• Next gap = 4 / 2 = 2
• {0, 2, 4, 6, 8, 10, 12, 14}, {1, 3, 5, 7, 9, 11, 13, 15}
• Final gap = 2 / 2 = 1
Shell Sort
Example: Shell Sort
46 2 83 41 102 5 17 31 64 49 18
5 2 83 41 102 18 17 31 64 49 46
5 2 17 18 31 41 46 83 49 102 64
• Array sorted
2 5 17 18 31 41 46 49 64 83 102
Shell Sort ‐ Ideal Gap Sequence
Gap Sequence: Three possibilities
• Three possibilities presented:
1. Shell's suggestion: the first gap is N/2; each successive gap is the previous value divided by 2
2. Odd gaps only: like Shell's method, except that if the division produces an even number, add 1; this performs better than Shell's method, since using only odd values eliminates the factor of 2
3. The 2.2 method: like the odd-gaps method, but use a divisor of 2.2 and truncate; this gives the best performance of all, being most nearly a relatively prime sequence
The Knuth gap sequence
SELECTION: Selection Sort
• Idea:
• Find the smallest element in the array
• Exchange it with the element in the first position
• Find the second smallest element and exchange it with the element in the second
position
• Continue until the array is sorted
• Disadvantage:
• Running time depends only slightly on the amount of order in the file
Selection sort
Example
• Original list 8 4 6 9 2 3 1
• Pass 1
1 4 6 9 2 3 8
• Pass 2
1 2 6 9 4 3 8
• Pass 3
1 2 3 9 4 6 8
Example
• Pass 4 1 2 3 4 9 6 8
• Pass 5
1 2 3 4 6 9 8
• Pass 6
1 2 3 4 6 8 9
• Pass 7
1 2 3 4 6 8 9
Sorted
Selection Sort
8 4 6 9 2 3 1
Algorithm: SELECTION-SORT(A)
    n ← length[A]
    for j ← 1 to n − 1
        do smallest ← j
           for i ← j + 1 to n
               do if A[i] < A[smallest]
                      then smallest ← i
           exchange A[j] ↔ A[smallest]
Selection Sort
int i, j, min, temp;
for (i = 0; i < size - 1; i++) {
    min = i;                      // setting min as i
    for (j = i + 1; j < size; j++) {
        if (a[j] < a[min])        // if element at j is less than element at min position
            min = j;              // then set min as j
    }
    temp = a[i];                  // swap a[i] and a[min]
    a[i] = a[min];
    a[min] = temp;
}
Analysis of Selection Sort
Alg.: SELECTION-SORT(A)                       cost    times
n ← length[A]                                 c1      1
for j ← 1 to n − 1                            c2      n
    do smallest ← j                           c3      n − 1
       for i ← j + 1 to n                     c4      Σ_{j=1}^{n−1} (n − j + 1)
           do if A[i] < A[smallest]           c5      Σ_{j=1}^{n−1} (n − j)   ≈ n²/2 comparisons
               then smallest ← i              c6      Σ_{j=1}^{n−1} (n − j)
       exchange A[j] ↔ A[smallest]            c7      n − 1   (≈ n exchanges)

T(n) = c1 + c2·n + c3(n − 1) + c4 Σ_{j=1}^{n−1} (n − j + 1) + c5 Σ_{j=1}^{n−1} (n − j) + c6 Σ_{j=1}^{n−1} (n − j) + c7(n − 1) = Θ(n²)
Complexity: Selection Sort
• Worst Case Time Complexity : O(n2)
• Best Case Time Complexity : O(n2)
• Average Time Complexity : O(n2)
• Space Complexity : O(1)
Example and analysis of selection sort
7 2 8 5 4 → 2 7 8 5 4 → 2 4 8 5 7 → 2 4 5 8 7 → 2 4 5 7 8

• The selection sort might swap an array element with itself
• Analysis:
  The outer loop executes n − 1 times
  The inner loop executes about n/2 times on average (from n down to 2 times)
  The work done in the inner loop is constant (swap two array elements)
  The time required is roughly (n − 1) · (n/2), which is O(n²)
SELECTION: Heap Sort
• A heap is a nearly complete binary tree with the following two
properties:
• Structural property: all levels are full, except possibly the last one, which
is filled from left to right
• Order (Max‐heap) property: for any node x
Parent(x) ≥ x
Heap Sort
Question
• Consider the array
• A = (29, 18, 10, 15, 20, 9, 5, 13, 2, 4, 15)
• Does A satisfy the max‐heap property?
Heap Sort
• Build heap starting from unsorted array
• While the heap is not empty
• Remove the first item from the heap:
Swap it with the last item
• Restore the heap property
Example
Heap Sort
• 1st step: build the heap, O(n) time complexity
• 2nd step: perform n deleteMax operations, each with O(log n) time complexity
• Total time complexity: O(n log n)
Heap Sort
Exchange Sort
EXCHANGE: Bubble Sort
Bubble Sort
• Idea:
  • Repeatedly pass through the array
  • Swap adjacent elements that are out of order

8 4 6 9 2 3 1

• Easier to implement, but slower than insertion sort
Example
Pass i = 1 (j moves from the right, swapping when A[j] < A[j − 1]):
8 4 6 9 2 3 1
8 4 6 9 2 1 3
8 4 6 9 1 2 3
8 4 6 1 9 2 3
8 4 1 6 9 2 3
8 1 4 6 9 2 3
1 8 4 6 9 2 3

After each subsequent pass:
i = 2: 1 2 8 4 6 9 3
i = 3: 1 2 3 8 4 6 9
i = 4: 1 2 3 4 8 6 9
i = 5: 1 2 3 4 6 8 9
i = 6: 1 2 3 4 6 8 9
i = 7: 1 2 3 4 6 8 9
Bubble Sort
Algorithm: BUBBLESORT(A)
    for i ← 1 to length[A]
        do for j ← length[A] downto i + 1
            do if A[j] < A[j − 1]
                then exchange A[j] ↔ A[j − 1]
Bubble Sort
• Pass 1
• 9 2 5 4 7 8 compare 1st & 2nd: out of order, EXCHANGE 9, 2
• 2 9 5 4 7 8 compare 2nd & 3rd: out of order, EXCHANGE 9, 5
• 2 5 9 4 7 8 compare 3rd & 4th: out of order, EXCHANGE 9, 4
• 2 5 4 9 7 8 compare 4th & 5th: out of order, EXCHANGE 9, 7
• 2 5 4 7 9 8 compare 5th & 6th: out of order, EXCHANGE 9, 8
• 2 5 4 7 8 9
• The largest element reaches its final position after pass 1
• All other values may still be out of order
• So we need to repeat this process
Bubble sort
int a[6] = {9, 2, 5, 4, 7, 8};
int i, j, temp;               // n = 6
for (i = 0; i < n; i++) {
    for (j = 0; j < n - i - 1; j++) {
        if (a[j] > a[j + 1]) {
            temp = a[j];
            a[j] = a[j + 1];
            a[j + 1] = temp;
        }
    }
}

• The algorithm isn't efficient: it loops for all six iterations even if the array gets sorted after the second iteration
Bubble sort modified
int a[6] = {9, 2, 5, 4, 7, 8};
int i, j, temp;
for (i = 0; i < n; i++) {
    int flag = 0;                 // taking a flag variable
    for (j = 0; j < n - i - 1; j++) {
        if (a[j] > a[j + 1]) {
            temp = a[j];
            a[j] = a[j + 1];
            a[j + 1] = temp;
            flag = 1;             // setting flag as 1, if swapping occurs
        }
    }
    if (!flag)
        break;                    // if no swapping took place, the array is
                                  // sorted and we can jump out of the for loop
}
Complexity: Bubble Sort
• n‐1 comparisons will be done in 1st pass
• n‐2 in 2nd pass
• n‐3 in 3rd pass and so on
• total number of comparisons will be (n‐1)+(n‐2)+(n‐3)+.....+3+2+1
• Sum = n(n‐1)/2 i.e. O(n2)
• Hence the complexity of Bubble Sort is O(n2).
• Space complexity for Bubble Sort is O(1), because only single additional
memory space is required for temp variable
• Best‐case Time Complexity O(n), when list is already sorted
Bubble‐Sort Running Time
Algorithm: BUBBLESORT(A)                               cost
for i ← 1 to length[A]                                 c1
    do for j ← length[A] downto i + 1                  c2
        do if A[j] < A[j − 1]                          c3   (≈ n²/2 comparisons)
            then exchange A[j] ↔ A[j − 1]              c4

T(n) = c1(n + 1) + c2 Σ_{i=1}^{n} (n − i + 1) + c3 Σ_{i=1}^{n} (n − i) + c4 Σ_{i=1}^{n} (n − i)
     = Θ(n) + (c2 + c3 + c4) Σ_{i=1}^{n} (n − i)

where Σ_{i=1}^{n} (n − i) = Σ_{i=1}^{n} n − Σ_{i=1}^{n} i = n² − n(n + 1)/2 = n²/2 − n/2

Thus, T(n) = Θ(n²)
EXCHANGE: Quick Sort
Picking the Pivot
• Strategy 1: Pick the first element in S
Works only if input is random
Picking the Pivot
• Strategy 2: Pick the pivot randomly
Would usually work well, even for mostly sorted input
Picking the Pivot
• Strategy 3: Median‐of‐three Partitioning
Ideally, the pivot should be the median of input array S
Median = element in the middle of the sorted sequence
• Let i start at the first element and j start at the last but one (i = left, j = right − 1)
• Swap the pivot (the median, 6) with the last element:
  5 6 4 [6] 3 12 19  →  5 6 4 19 3 12 [6]
Partitioning Strategy
• Want to have:
  • A[p] ≤ pivot, for p < i
  • A[p] ≥ pivot, for p > j
• While i < j:
  • Move i right, skipping over elements smaller than the pivot
  • Move j left, skipping over elements greater than the pivot
• When both i and j have stopped: A[i] ≥ pivot and A[j] ≤ pivot

5 6 4 19 3 12 6    (i and j scan toward each other from the two ends)
Partitioning Strategy
When i and j have stopped (i at 19, j at 3), swap A[i] and A[j]:
5 6 4 19 3 12 6  →  5 3 4 19 6 12 6
Partitioning Strategy
5 3 4 19 6 12 6    (j has crossed to the left of i)
• When i and j have crossed:
  • Swap A[i] and the pivot
• Result:
  • A[p] ≤ pivot, for p < i
  • A[p] ≥ pivot, for p > i

5 3 4 6 6 12 19    (after swapping A[i] with the pivot)
Quick Sort
/* a[] array, p is starting index, i. e. 0, and r is the last index of array */
Quick Sort
int partition(int a[], int p, int r)
{
    int i, j, pivot, temp;
    pivot = a[p];
    i = p;
    j = r;

    while (1) {
        while (a[i] < pivot && a[i] != pivot)
            i++;
        while (a[j] > pivot && a[j] != pivot)
            j--;

        if (i < j) {
            temp = a[i];
            a[i] = a[j];
            a[j] = temp;
        } else {
            return j;
        }
    }
}
Complexity: Quick Sort
Merge Sort
• Merge Sort follows the rule of Divide and Conquer
• Viewed bottom-up, it doesn't just divide the list into two halves: the unsorted list is divided into N sublists, each having one element
• A list of one element is considered sorted
Merge Sort
• Merge sort works recursively
• First it divides a data set in half, sorts each half separately
• Next, the first elements from each of the two lists are compared
The lesser element is then removed from its list and
added to the final result list
Merge Sort
External Merge Sort
5. Perform a 9‐way merge and store the result in the output buffer
If the output buffer is full, write it to the final sorted file, and empty it
If any of the 9 input buffers gets empty, fill it with the next 10 MB of its associated 100
MB sorted chunk until no more data from the chunk is available
External Merge Sort
• Example of Two‐Way Sorting:
• N = 14, M = 3
(14 records on tape Ta1, memory capacity: 3 records.)
• Ta1: 17, 3, 29, 56, 24, 18, 4, 9, 10, 6, 45, 36, 11, 43
External Merge Sort
• Ta1: 17, 3, 29, 56, 24, 18, 4, 9, 10, 6, 45, 36, 11, 43
• Step A Sorting of runs:
• Read 3 records in main memory, sort them and store them on Tb1: 17, 3, 29 ‐> 3,
17, 29
• Tb1: 3, 17, 29
• Read the next 3 records in main memory, sort them and store them on Tb2: 56,
24, 18 ‐> 18, 24, 56
• Tb2: 18, 24, 56
• Read the next 3 records in main memory, sort them and store them on Tb1: 4, 9,
10 ‐> 4, 9, 10
• Tb1: 3, 17, 29, 4, 9, 10
• Read the next 3 records in main memory, sort them and store them on Tb2: 6, 45,
36 ‐> 6, 36, 45
• Tb2: 18, 24, 56, 6, 36, 45
• Read the next 3 records in main memory, sort them and store them on Tb1: 11, 43
‐> 11, 43
(there are only two records left)
• Tb1: 3, 17, 29, 4, 9, 10, 11, 43
External Merge Sort
• Tb1: 3, 17, 29 | 4, 9, 10 | 11, 43
• Tb2: 18, 24, 56 | 6, 36, 45 |
External Merge Sort
• Tb1: 3, 17, 29 | 4, 9, 10 | 11, 43
• Tb2: 18, 24, 56 | 6, 36, 45 |
• Thus we have the first two runs on Ta1 and Ta2, each twice the size of the original runs:
• Ta1: 3, 17, 18, 24, 29, 56
• Ta2: 4, 6, 9, 10, 36, 45
External Merge Sort
• Tb1: 3, 17, 29 | 4, 9, 10 | 11, 43
• Tb2: 18, 24, 56 | 6, 36, 45 |
• Next, we merge the third runs on Tb1 and Tb2 and store the result on Ta1
• Only Tb1 contains a third run, so it is simply copied onto Ta1:
• Ta1: 3, 17, 18, 24, 29, 56 | 11, 43
External Merge Sort
• Step B2 Merging runs of length 6 to obtain runs of length 12
• Source tapes: Ta1 and Ta2. Result on Tb1 and Tb2:
• After merging the first two runs from Ta1 and Ta2, we get a run of length 12, stored on Tb1:
• Tb1: 3, 4, 6, 9, 10, 17, 18, 24, 29, 36, 45, 56
• Tb2: 11, 43
External Merge Sort
• Now on each tape there is only one run
• The last step is to merge these two runs and to get the entire file
sorted
• Step B3 Merging the last two runs
• Result: 3, 4, 6, 9, 10, 11, 17, 18, 24, 29, 36, 43, 45, 56
Time Complexity
1. Initial splitting: we split the data into chunks that fit into memory. This takes O(n) time, where n is the total number of elements (here n = 14).
2. Sorting each chunk: sorting a chunk of size M (where M is the memory size) takes O(M log M) time (here M = 3). If there are k chunks, the total time for sorting all chunks is O(k · M log M).
3. Merging chunks: the merging process involves k sorted chunks. The number of merge passes required is log k, and each pass processes n elements, leading to a total merging time of O(n log k).
• Since k = n/M, the merging complexity can be approximated as O(n log(n/M)).
• Overall, the time complexity of the external merge sort is O(n log(n/M)).
Natural merge
• Works well on partially sorted data or data with existing runs (sorted
subarrays)
• Identifying Runs:
• The first step involves identifying and marking the existing runs of sorted elements
in the data.
• A run is a contiguous subarray where all elements are in non‐decreasing order.
• Merging Runs:
• The algorithm then merges these runs pairwise, combining them into larger runs.
• This process is repeated until the entire array is merged into a single run.
Example
• [8,4,7,5,1,2,6,3]
• Step 1: Identify Runs: Identify and mark the existing runs of sorted elements in the array.
A run is a sequence of elements that are in non‐decreasing order:
• [8],[4,7],[5],[1,2,6],[3]
• Step 2: Merge Runs: Merge the identified runs pairwise until the entire array is sorted. The
merging process involves comparing and combining adjacent runs. The new runs become
larger, and the process continues until only one run remains.
1. Merge [8] and [4,7]: [4,7,8]
2. Merge [5] and [1,2,6]: [1,2,5,6]
3. Merge [4,7,8] and [1,2,5,6]: [1,2,4,5,6,7,8]
4. Merge [3] with the sorted array: [1,2,3,4,5,6,7,8]
• Conclusion
• The result is a sorted array:
• [1,2,3,4,5,6,7,8]
Natural merge
• Potentially reduces the number of merge passes, making it more efficient for data that is partially sorted
• Natural merge sort can be especially efficient for real-world data, which often contains sorted or nearly sorted sequences
Balanced Merge
• It is a version of merge sort that ensures a
balanced division of the input list.
• Merge runs in a balanced way to minimize the
number of passes
• Works well when the dataset is large and lacks
inherent order, as it effectively distributes the
sorting load.
• The ability to use multiple tapes/files makes
each pass more efficient, reducing overall
sorting time.
Balanced Merge
• Given
• N = 14 (total records)
• M = 3 (memory capacity, meaning we can handle 3 records at a time
in memory)
Balanced Merge
• Step 1: Split and Sort
• First, we split the records into chunks that fit into memory and sort
each chunk:
• Chunk 1: 17, 3, 29 → Sorted: 3, 17, 29
• Chunk 2: 56, 24, 18 → Sorted: 18, 24, 56
• Chunk 3: 4, 9, 10 → Sorted: 4, 9, 10
• Chunk 4: 6, 45, 36 → Sorted: 6, 36, 45
• Chunk 5: 11, 43 → Sorted: 11, 43
Balanced Merge
• Step 2: Initial Distribution
• Distribute these sorted chunks across the tapes:
• Tape A: 3, 17, 29
• Tape B: 18, 24, 56
• Tape C: 4, 9, 10
• Tape A: 6, 36, 45
• Tape B: 11, 43
Balanced Merge
• Step 3: Merging Passes
• First Merge Pass
• Merge from Tape A and Tape B, write to Tape C:
• Tape A: 3, 17, 29
• Tape B: 18, 24, 56
• Merge result on Tape C: 3, 17, 18, 24, 29, 56
• Merge from Tape C and remaining Tape A, write to Tape B:
• Tape C: 4, 9, 10
• Tape A: 6, 36, 45
• Merge result on Tape B: 4, 6, 9, 10, 36, 45
• Remaining chunk from Tape B to Tape A:
• Tape B: 11, 43
Balanced Merge
• After the first pass, the tapes will look like:
• Tape A: 11, 43
• Tape B: 4, 6, 9, 10, 36, 45
• Tape C: 3, 17, 18, 24, 29, 56
Balanced Merge
• Second Merge Pass
• Merge from Tape A and Tape C, write to Tape B:
• Tape A: 11, 43
• Tape C: 3, 17, 18, 24, 29, 56
• Merge result on Tape B: 3, 11, 17, 18, 24, 29, 43, 56
• Remaining chunk from Tape B to Tape A:
• Tape B: 4, 6, 9, 10, 36, 45
Balanced Merge
• Final Merge Pass
• Merge the final runs from Tape A and Tape B, write to Tape C:
• Tape A: 4, 6, 9, 10, 36, 45
• Tape B: 3, 11, 17, 18, 24, 29, 43, 56
• Merge result on Tape C: 3, 4, 6, 9, 10, 11, 17, 18, 24, 29, 36, 43, 45, 56
Polyphase Merge
• Used for sorting large datasets that don't fit entirely in
memory.
• It minimizes I/O operations, making it efficient for
external storage
• It is a bit more complex but very efficient in minimizing
the number of read and write operations.
• It uses a more dynamic distribution of runs across
tapes
• It ensure number of runs reduces smoothly without
emptying any tape prematurely
10/19/2024 106
Polyphase Merge
• Given
• N = 14 (total records)
• M = 3 (memory capacity, meaning we can handle 3
records at a time in memory)
Polyphase Merge
• Step 1: Split and Sort
• First, we split the records into chunks that fit into memory and sort each chunk (the same five sorted chunks as in the balanced merge example: 3, 17, 29 | 18, 24, 56 | 4, 9, 10 | 6, 36, 45 | 11, 43)
Polyphase Merge
• Step 2: Initial Distribution
• Distribute these sorted chunks across three tapes (A, B, C) in a polyphase manner
• The key here is that one tape remains empty initially (for the efficient merging step):
• Tape A: 3, 17, 29 | 6, 36, 45
• Tape B: 18, 24, 56 | 4, 9, 10 | 11, 43
• Tape C: (empty)
Polyphase Merge
• Step 3: Merging Passes
• First Merge Pass
• Merge from Tape A and Tape B, write to Tape C:
• Merge Result on Tape C:
• Merging: 3, 17, 29 and 18, 24, 56 → 3, 17, 18, 24, 29, 56
• Merging: 6, 36, 45 and 4, 9, 10 → 4, 6, 9, 10, 36, 45
• Remaining run from Tape B to Tape A: 11, 43
Polyphase Merge
• Second Merge Pass
• Merge from Tape A and Tape C, write to Tape B:
• Merge Result on Tape B:
• Merging: 11, 43 and 3, 17, 18, 24, 29, 56 → 3, 11, 17, 18, 24, 29, 43, 56
• Remaining run from Tape C to Tape A: 4, 6, 9, 10, 36, 45
Polyphase Merge
• Final Merge Pass
• Merge from Tape A and Tape B, write to Tape C:
• Merge Result on Tape C:
• Merging: 4, 6, 9, 10, 36, 45 and 3, 11, 17, 18, 24, 29, 43, 56 → 3, 4, 6, 9,
10, 11, 17, 18, 24, 29, 36, 43, 45, 56
Balanced Merge Sort
• Time Complexity:
• Sorting: O(n log M)
• Merging: O(n log k)
• Overall: O(n log(n/M))
• Space Complexity:
• In‐memory space: O(M)
• Disk space: O(n)
• Pros:
• Efficient for large datasets by balancing the load across multiple tapes/files.
• Reduces the number of read/write operations by leveraging multiple tapes.
• Cons:
• Requires multiple tapes/files, which might not always be feasible.
• Implementation can be complex.
Natural Merge Sort (Two‐Way)
• Time Complexity:
• Sorting: O(n log n)
• Merging: Efficient if the dataset has natural runs
• Overall: O(n log n)
• Space Complexity:
• In‐memory space: O(M)
• Disk space: O(n)
• Pros:
• Takes advantage of existing sorted runs in the dataset, potentially reducing work.
• Simpler implementation compared to balanced or polyphase merge sorts.
• Efficient for datasets that are partially sorted.
• Cons:
• Less efficient for completely unsorted datasets, doesn't balance load as effectively.
• Can have more passes compared to balanced merge sort.
Polyphase Merge
• Time Complexity:
• Sorting: O(n log M)
• Merging: Efficient multi‐way merging
• Overall: O(n log n)
• Space Complexity:
• In‐memory space: O(M)
• Disk space: O(n)
• Pros: