6 Sorting

DATA STRUCTURE

DR. ACHAL KAUSHIK

Paper Code(s): CIC-209 (L: 4, P: -, C: 4)
Paper: Data Structures
Marking Scheme:
1. Teachers Continuous Evaluation: 25 marks
2. Term end Theory Examinations: 75 marks
Instructions for paper setter:
1. There should be 9 questions in the term end examinations question paper.
2. The first (1st) question should be compulsory and cover the entire syllabus. This question should be objective, single line answers or short
answer type question of total 15 marks.
3. Apart from question 1 which is compulsory, rest of the paper shall consist of 4 units as per the syllabus. Every unit shall have two questions
covering the corresponding unit of the syllabus. However, the student shall be asked to attempt only one of the two questions in the unit.
Individual questions may contain up to 5 sub-parts / sub-questions. Each unit shall have a marks weightage of 15.
4. The questions are to be framed keeping in view the learning outcomes of the course / paper. The standard / level of the questions to be asked
should be at the level of the prescribed textbook.
5. The requirement of (scientific) calculators / log‐tables / data – tables may be specified if required.
Course Objectives:
1. To introduce basics of Data structures (Arrays, strings, linked list etc.)
2. To understand the concepts of Stacks, Queues and Trees, related operations and their implementation
3. To understand sets, heaps and graphs
4. To introduce various Sorting and searching Algorithms
Course Outcomes (CO)
CO 1 To be able to understand difference between structured data and data structure
CO 2 To be able to create common basic data structures and trees
CO 3 To have a knowledge of sets, heaps and graphs
CO 4 To have basic knowledge of sorting and searching algorithms
Course Outcomes (CO) to Programme Outcomes (PO) mapping (scale 1: low, 2: Medium, 3: High)
PO01 PO02 PO03 PO04 PO05 PO06 PO07 PO08 PO09 PO10 PO11 PO12

CO 1 3 2 2 2 3 ‐ ‐ ‐ 2 2 2 3
CO 2 3 2 2 2 3 ‐ ‐ ‐ 2 2 2 3
CO 3 3 2 2 2 3 ‐ ‐ ‐ 2 2 2 3
CO 4 3 2 2 2 3 ‐ ‐ ‐ 2 2 2 3
UNIT – I
Overview of data structure, Basics of Algorithm Analysis including Running Time Calculations, Abstract Data Types, Arrays, Arrays and Pointers,
Multidimensional Array, String processing, General Lists and List ADT, List manipulations, Single, double and circular lists. Stacks and Stack ADT,
Stack Manipulation, Prefix, infix and postfix expressions, recursion. Queues and Queue ADT, Queue manipulation.

UNIT – II
Sparse Matrix Representation (Array and Link List representation) and arithmetic (addition, subtraction and multiplication), polynomials and
polynomial arithmetic.
Trees, Properties of Trees, Binary trees, Binary Tree traversal, Tree manipulation algorithms, Expression trees and their usage, binary search trees,
AVL Trees, Heaps and their implementation, Priority Queues, B‐Trees, B* Tree, B+ Tree

UNIT – III
Sorting concept, order, stability, Selection sorts (straight, heap), insertion sort (Straight Insertion, Shell sort), Exchange Sort (Bubble, quicksort),
Merge sort (External Sorting) (Natural merge, balanced merge and polyphase merge). Searching – List search, sequential search, binary search,
hashing methods, collision resolution in hashing.

UNIT – IV
Disjoint sets representation, union find algorithm, Graphs, Graph representation, Graph Traversals and their implementations (BFS and DFS).
Minimum Spanning Tree algorithms, Shortest Path Algorithms

Textbook(s):
1. Richard Gilberg, Behrouz A. Forouzan, "Data Structures: A Pseudocode Approach with C", 2nd Edition, Cengage Learning, Oct 2004
2. E. Horowitz, S. Sahni, S. Anderson‐Freed, "Fundamentals of Data Structures in C", 2nd Edition, Silicon Press (US), 2007.

References:
1. Mark Allen Weiss, “Data Structures and Algorithm Analysis in C”, 2nd Edition, Pearson, September, 1996
2. Robert Kruse, “Data Structures and Program Design in C”, 2nd Edition, Pearson, November, 1990
3. Seymour Lipschutz, "Data Structures with C (Schaum's Outline Series)", McGraw-Hill, 2017
4. A. M. Tenenbaum, “Data structures using C”. Pearson Education, India, 1st Edition 2003.
5. Weiss M.A., “Data structures and algorithm analysis in C++”, Pearson Education, 2014.
Data Structure
• A data structure is a way of collecting and organizing data so that operations can be performed on the data effectively
• In simple language, data structures are structures programmed to store ordered data, so that various operations can be performed on it easily
Why Study Sorting Algorithms?

• There are a variety of situations that we can encounter


• Do we have randomly ordered keys?
• Are all keys distinct?
• How large is the set of keys to be ordered?
• Need guaranteed performance?

• Various algorithms are better suited to some of these situations

Sorting
• Sorting is one of the most common data-processing applications
• Classified as:
• Internal sort
All of the data are held in primary memory during the sorting process

• External sort
Uses primary memory for the data currently being sorted and
secondary storage for any data that does not fit in primary memory
Sorting

Sorts
• Internal
• Insertion: Insertion, Shell
• Selection: Selection, Heap
• Exchange: Bubble, Quick
• External (merge)
• Natural
• Balanced
• Polyphase
Sorting

• Sort order
Ascending
Descending

• Sort Efficiency
An estimate of the number of comparisons and moves required to order an unordered list

• Sort Stability
Data with equal keys maintain their relative input order in the output
Stable sort algorithms

• A stable sort keeps equal elements in the same order
• This may matter when you are sorting data according to some characteristic
• Example: sorting students by test score (a sketch follows below)

original array    stably sorted    unstably sorted
Amit 98           Amit 98          Keshav 98
Kapil 90          Keshav 98        Amit 98
Pakhi 75          Kapil 90         Divija 90
Keshav 98         Divija 90        Kapil 90
Ankush 86         Ankush 86        Ankush 86
Divija 90         Mudit 86         Mudit 86
Mudit 86          Pakhi 75         Pakhi 75

In the stably sorted column, Amit appears before Keshav (both 98) because Amit came first in the input; the unstable sort reverses them.
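To make stability concrete, here is a minimal hedged C sketch (the record type, function name, and descending order are illustrative choices, not from the slides): an insertion sort that shifts only on a strict comparison is stable, so students with equal scores keep their input order.

#include <stdio.h>

struct rec { const char *name; int score; };

/* stable insertion sort by score, descending */
void insertion_sort_desc(struct rec a[], int n)
{
    int i, j;
    for (i = 1; i < n; i++)
    {
        struct rec key = a[i];
        j = i - 1;
        /* strict < stops at equal scores, keeping their input order */
        while (j >= 0 && a[j].score < key.score)
        {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;
    }
}

int main(void)
{
    struct rec a[] = {{"Amit", 98}, {"Kapil", 90}, {"Pakhi", 75}, {"Keshav", 98},
                      {"Ankush", 86}, {"Divija", 90}, {"Mudit", 86}};
    int i;
    insertion_sort_desc(a, 7);
    for (i = 0; i < 7; i++)
        printf("%s %d\n", a[i].name, a[i].score);
    return 0;
}

Running it prints exactly the "stably sorted" column above: Amit 98, Keshav 98, Kapil 90, Divija 90, Ankush 86, Mudit 86, Pakhi 75.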
INSERTION: Insertion Sort

• Idea: like sorting a hand of playing cards


• Start with an empty left hand and the cards facing down on the table
• Remove one card at a time from the table, and insert it into the correct position in the
left hand
• compare it with each of the cards already in the hand, from right to left
• The cards held in the left hand are sorted
• these cards were originally the top cards of the pile on the table

Insertion Sort
• The first item is sorted
• Compare the second item to the first
• if smaller, swap
• Third item: compare it to the item next to it
• if a swap is needed, swap
• after the swap, compare again
• And so forth…

To insert 3, we need to make room for it by moving first 6 and then 4.
Insertion Sort

input array

5 2 4 6 1 3

at each iteration, the array is divided into two sub-arrays:

left sub-array: sorted    |    right sub-array: unsorted
One step of insertion sort

sorted: 3 4 7 12 14 14 20 21 33 38 | next to be inserted: 10 | unsorted: 55 9 23 28 16
temp = 10; shift the sorted elements greater than 10 one place right, then insert 10:
sorted: 3 4 7 10 12 14 14 20 21 33 38 | unsorted: 55 9 23 28 16

• This one step takes O(n) time

• We must do it n times; hence, insertion sort is O(n²)
Algorithm

INSERTION-SORT(A)
for j ← 2 to n
    do key ← A[j]
       ▷ Insert A[j] into the sorted sequence A[1 .. j-1]
       i ← j - 1
       while i > 0 and A[i] > key
           do A[i + 1] ← A[i]
              i ← i - 1
       A[i + 1] ← key

• Insertion sort sorts the elements in place
Insertion Sort

int a[6] = {5, 2, 4, 6, 1, 3};
int i, j, key;
int n = 6;                       /* number of elements */
for (i = 1; i < n; i++)
{
    key = a[i];                  /* next element to insert */
    j = i - 1;
    while (j >= 0 && key < a[j])
    {
        a[j + 1] = a[j];         /* shift larger elements one place right */
        j--;
    }
    a[j + 1] = key;              /* insert into the gap */
}
Analysis of Insertion Sort

INSERTION-SORT(A)                                           cost   times
for j ← 2 to n                                              c1     n
    do key ← A[j]                                           c2     n-1
       ▷ Insert A[j] into the sorted sequence A[1 .. j-1]   0      n-1
       i ← j - 1                                            c4     n-1
       while i > 0 and A[i] > key                           c5     Σ_{j=2..n} t_j
           do A[i + 1] ← A[i]                               c6     Σ_{j=2..n} (t_j - 1)
              i ← i - 1                                     c7     Σ_{j=2..n} (t_j - 1)
       A[i + 1] ← key                                       c8     n-1

t_j: number of times the while-loop test is executed at iteration j

T(n) = c1·n + c2(n-1) + c4(n-1) + c5·Σ_{j=2..n} t_j + c6·Σ_{j=2..n} (t_j - 1) + c7·Σ_{j=2..n} (t_j - 1) + c8(n-1)
Best Case Analysis

• The array is already sorted: in "while i > 0 and A[i] > key",
A[i] ≤ key holds the first time the while-loop test is run (when i = j-1)
• t_j = 1
• T(n) = c1·n + c2(n-1) + c4(n-1) + c5(n-1) + c8(n-1)
       = (c1 + c2 + c4 + c5 + c8)n - (c2 + c4 + c5 + c8)
       = an + b = Θ(n)
Worst Case Analysis

• The array is in reverse sorted order: in "while i > 0 and A[i] > key",
A[i] > key always holds
• key must be compared with all elements to the left of the j-th position → compare with
j-1 elements → t_j = j

using
Σ_{j=1..n} j = n(n+1)/2,   Σ_{j=2..n} j = n(n+1)/2 - 1,   Σ_{j=2..n} (j-1) = n(n-1)/2

we have:

T(n) = c1·n + c2(n-1) + c4(n-1) + c5·(n(n+1)/2 - 1) + c6·n(n-1)/2 + c7·n(n-1)/2 + c8(n-1)
     = an² + bn + c, a quadratic function of n
T(n) = Θ(n²): order of growth is n²
Complexity: Insertion sort
• Worst Case Time Complexity: O(n²)
• Best Case Time Complexity: O(n)
• Average Time Complexity: O(n²)
• Space Complexity: O(1)
INSERTION: Shell Sort

• Shell sort is a variant of insertion sort
It is named after Donald Shell
Average performance: O(n^(3/2)) or better

• Divide-and-conquer approach to insertion sort
Sort many smaller subarrays using insertion sort
Sort progressively larger arrays
Finally sort the entire array

• These subarrays are elements separated by a gap
Start with a large gap
Decrease the gap on each pass
Shell Sort
(example: a 16-element array, indices 0–15)

• Initial gap = length / 2 = 16 / 2 = 8
• initial sub-array indices:
• {0, 8}, {1, 9}, {2, 10}, {3, 11}, {4, 12}, {5, 13}, {6, 14}, {7, 15}
• next gap = 8 / 2 = 4
• {0, 4, 8, 12}, {1, 5, 9, 13}, {2, 6, 10, 14}, {3, 7, 11, 15}
• next gap = 4 / 2 = 2
• {0, 2, 4, 6, 8, 10, 12, 14}, {1, 3, 5, 7, 9, 11, 13, 15}
• final gap = 2 / 2 = 1
Shell Sort

• Makes use of the intrinsic strengths of insertion sort

• Insertion sort is fastest when:
The array is nearly sorted
The array contains only a small number of data items

• Shell sort works well because:
It always deals with a small number of elements
Elements are moved a long way through the array with each swap, which leaves it more nearly sorted
Shell Sort

• h1, h2, h3, ..., ht is a sequence of increasing integer values,
used as a sequence (from right to left) of gap values

• Any sequence will work as long as it is increasing and h1 = 1
For any gap value hk we have A[i] ≤ A[i + hk]
An array A for which this is true is hk-sorted

• An array that is hk-sorted and is then h(k-1)-sorted remains hk-sorted
Example: Shell Sort
46 2 83 41 102 5 17 31 64 49 18

• Gap of five. Sort sub array with 46, 5, and 18


5 2 83 41 102 18 17 31 64 49 46

• Gap still five. Sort sub array with 2 and 17

5 2 83 41 102 18 17 31 64 49 46

• Gap still five. Sort sub array with 83 and 31


5 2 31 41 102 18 17 83 64 49 46

• Gap still five. Sort sub array with 41 and 64


5 2 31 41 102 18 17 83 64 49 46

• Gap still five. Sort sub array with 102 and 49


5 2 31 41 49 18 17 83 64 102 46
Shell Sort
• Gap now 2: sort the even-index sub-array 5 31 49 17 64 46
5 2 31 41 49 18 17 83 64 102 46   (before)

• Gap still 2: sort the odd-index sub-array 2 41 18 83 102
5 2 17 41 31 18 46 83 49 102 64   (after the even-index sub-array is sorted)

• Gap of 1 (insertion sort)
5 2 17 18 31 41 46 83 49 102 64   (after the odd-index sub-array is sorted)

• Array sorted
2 5 17 18 31 41 46 49 64 83 102
Shell Sort - Ideal Gap Sequence

• Although any increasing sequence will work (if h1 = 1):
• Best results when all values in the gap sequence are relatively prime
(the sequence does not share any divisors)

• Obtaining a relatively prime sequence is often not practical
practical solutions try to approximate relatively prime sequences

• Time complexity: O(n^r) with 1 < r < 2
• This is better than O(n²) but generally worse than O(n log₂ n)
Gap Sequence: Three possibilities
• Three possibilities presented:
1. Shell's suggestion - first gap is N/2; successive gaps are the previous value divided by 2
2. Odd gaps only - like Shell's method, except if the division produces an even number, add 1
   better performance than Shell's method, since all-odd values eliminate the factor 2
3. 2.2 method - like the odd-gaps method, but use a divisor of 2.2 and truncate
   best performance of all - most nearly a relatively prime sequence
The Knuth gap sequence

• No one knows the optimal sequence of diminishing gaps

• This sequence is attributed to Donald E. Knuth:
• Start with h = 1
• Repeatedly compute h = 3*h + 1
• 1, 4, 13, 40, 121, 364, 1093, ...
• Stop when h is larger than the size of the array; step back to get the first gap
• To get successive gap sizes, apply the inverse formula:
h = (h - 1) / 3

• This sequence seems to work very well (a sketch follows below)

• It turns out that just cutting the array size in half each time does
not work out as well
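As a concrete (hedged) illustration, a minimal C sketch of shell sort driven by the Knuth sequence; the function name and the exact step-back rule (one application of h = (h-1)/3 after overshooting n) are my assumptions:

void shell_sort(int a[], int n)
{
    int h = 1;
    int i, j, key;
    while (h < n)
        h = 3 * h + 1;              /* 1, 4, 13, 40, ... stop once h >= n */
    for (h = (h - 1) / 3; h >= 1; h = (h - 1) / 3)
    {
        /* gapped insertion sort: every h-th element forms a subarray */
        for (i = h; i < n; i++)
        {
            key = a[i];
            j = i - h;
            while (j >= 0 && a[j] > key)
            {
                a[j + h] = a[j];    /* shift within the gapped subarray */
                j -= h;
            }
            a[j + h] = key;
        }
    }
}

With h = 1 the inner loops reduce exactly to the insertion sort shown earlier.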
SELECTION: Selection Sort

• Idea:
• Find the smallest element in the array
• Exchange it with the element in the first position
• Find the second smallest element and exchange it with the element in the second
position
• Continue until the array is sorted

• Disadvantage:
• Running time depends only slightly on the amount of order in the file

Selection sort

• Given an array of length n,


Search elements 0 through n‐1 and select the smallest
Swap it with the element in location 0
Search elements 1 through n-1 and select the smallest
Swap it with the element in location 1
Search elements 2 through n-1 and select the smallest
Swap it with the element in location 2
Search elements 3 through n-1 and select the smallest
Swap it with the element in location 3
Continue in this fashion until there’s nothing left to search

Example

• Original list 8 4 6 9 2 3 1

• Pass 1
1 4 6 9 2 3 8
• Pass 2
1 2 6 9 4 3 8
• Pass 3
1 2 3 9 4 6 8

Example

• Pass 4 1 2 3 4 9 6 8

• Pass 5
1 2 3 4 6 9 8
• Pass 6
1 2 3 4 6 8 9
• Pass 7
1 2 3 4 6 8 9
Sorted

Selection Sort
(example array: 8 4 6 9 2 3 1)

Algorithm: SELECTION-SORT(A)
n ← length[A]
for j ← 1 to n - 1
    do smallest ← j
       for i ← j + 1 to n
           do if A[i] < A[smallest]
               then smallest ← i
       exchange A[j] ↔ A[smallest]
Selection Sort

/* example array: 8 4 6 9 2 3 1 */
int i, j, min, temp;
for (i = 0; i < size - 1; i++)
{
    min = i;                      /* assume position i holds the minimum */
    for (j = i + 1; j < size; j++)
    {
        if (a[j] < a[min])        /* element at j is less than element at min */
        {
            min = j;              /* then set min to j */
        }
    }
    temp = a[i];                  /* swap a[i] with the minimum */
    a[i] = a[min];
    a[min] = temp;
}
Analysis of Selection Sort

SELECTION-SORT(A)                          cost   times
n ← length[A]                              c1     1
for j ← 1 to n - 1                         c2     n
    do smallest ← j                        c3     n-1
       for i ← j + 1 to n                  c4     Σ_{j=1..n-1} (n-j+1)
           do if A[i] < A[smallest]        c5     Σ_{j=1..n-1} (n-j)
               then smallest ← i           c6     Σ_{j=1..n-1} (n-j)
       exchange A[j] ↔ A[smallest]         c7     n-1

≈ n²/2 comparisons, n exchanges

T(n) = c1 + c2·n + c3(n-1) + c4·Σ_{j=1..n-1}(n-j+1) + c5·Σ_{j=1..n-1}(n-j) + c6·Σ_{j=1..n-1}(n-j) + c7(n-1) = Θ(n²)
Complexity: Selection Sort
• Worst Case Time Complexity: O(n²)
• Best Case Time Complexity: O(n²)
• Average Time Complexity: O(n²)
• Space Complexity: O(1)
Example and analysis of selection sort

Passes over 7 2 8 5 4:
7 2 8 5 4 → 2 7 8 5 4 → 2 4 8 5 7 → 2 4 5 8 7 → 2 4 5 7 8

• The selection sort might swap an array element with itself
• Analysis:
The outer loop executes n-1 times
The inner loop executes about n/2 times on average (from n down to 2 times)
Work done in the inner loop is constant (swap two array elements)
Time required is roughly (n-1)·(n/2) → O(n²)
SELECTION: Heap Sort
• A heap is a nearly complete binary tree with the following two
properties:
• Structural property: all levels are full, except possibly the last one, which
is filled from left to right
• Order (Max‐heap) property: for any node x
Parent(x) ≥ x

Heap Sort

• A heap can be stored as an array A:
• Root of the tree is A[1]
• Left child of A[i] = A[2i]
• Right child of A[i] = A[2i + 1]
• Parent of A[i] = A[⌊i/2⌋]
• heapsize[A] ≤ length[A]

• The elements in the subarray A[(⌊n/2⌋ + 1) .. n] are leaves (index helpers are sketched below)
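These bullets translate directly into a few one-line helpers; a hedged C sketch (the function names are mine, and the array is assumed 1-based with slot 0 unused):

/* index helpers for a max-heap stored in a 1-based array A[1..n] */
int parent(int i)      { return i / 2; }      /* ⌊i/2⌋ */
int left_child(int i)  { return 2 * i; }
int right_child(int i) { return 2 * i + 1; }
/* with heap size n, nodes ⌊n/2⌋ + 1 .. n are leaves */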
Question
• Consider the array
• A = (29, 18, 10, 15, 20, 9, 5, 13, 2, 4, 15)
• Does A satisfy the max-heap property?
• (No: node A[2] = 18 has right child A[5] = 20 > 18, violating Parent(x) ≥ x)
Heap Sort
• Build heap starting from unsorted array
• While the heap is not empty
• Remove the first item from the heap:
Swap it with the last item
• Restore the heap property

Heap Sort
• 1st step: build a heap from the unsorted array, O(n) time complexity
• 2nd step: perform n delete-max operations, each with O(log n) time complexity
• Total time complexity = O(n log n); a compact sketch follows below
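The slides give no heap-sort code, so here is a minimal hedged C sketch (0-based indexing rather than the 1-based scheme above; sift_down and heap_sort are illustrative names):

/* restore the max-heap property below node i, for a heap of size n */
void sift_down(int a[], int n, int i)
{
    int largest, l, r, t;
    for (;;)
    {
        largest = i;
        l = 2 * i + 1;               /* children of i in 0-based indexing */
        r = 2 * i + 2;
        if (l < n && a[l] > a[largest]) largest = l;
        if (r < n && a[r] > a[largest]) largest = r;
        if (largest == i)
            return;                  /* heap property holds here */
        t = a[i]; a[i] = a[largest]; a[largest] = t;
        i = largest;                 /* continue sifting down */
    }
}

void heap_sort(int a[], int n)
{
    int i, t;
    for (i = n / 2 - 1; i >= 0; i--)     /* build heap: O(n) */
        sift_down(a, n, i);
    for (i = n - 1; i > 0; i--)          /* n delete-max operations */
    {
        t = a[0]; a[0] = a[i]; a[i] = t; /* swap the max with the last item */
        sift_down(a, i, 0);              /* restore the heap on the prefix */
    }
}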
Heap Sort

• Worst Case Time Complexity: O(n log n)
• Best Case Time Complexity: O(n log n)
• Average Time Complexity: O(n log n)
• Space Complexity: O(1)
• Not a stable sort
• Requires only constant extra space for sorting a list

• Heap sort is very fast and is widely used for sorting
Exchange Sort

• Attempts to improve ordering by comparing elements in pairs,
interchanging them if they are not in sorted order

• The operation is repeated until the table is sorted

• Algorithms differ in how they systematically choose the two elements to be compared
Bubble sort compares adjacent elements
EXCHANGE: Bubble Sort

• Traverse a collection of elements

• Move from the front to the end


• Bubble the largest value to the end using
pair‐wise comparisons and
swapping

Bubble Sort

• Idea:
• Repeatedly pass through the array
• Swap adjacent elements that are out of order

Example array: 8 4 6 9 2 3 1 (i marks the sorted front, j scans from the end)

• Easier to implement, but slower than insertion sort
Example

Pass i=1 (j scans right to left; 1 bubbles to the front):
8 4 6 9 2 3 1 → 8 4 6 9 2 1 3 → 8 4 6 9 1 2 3 → 8 4 6 1 9 2 3
→ 8 4 1 6 9 2 3 → 8 1 4 6 9 2 3 → 1 8 4 6 9 2 3

State at the start of each later pass:
i=2: 1 8 4 6 9 2 3
i=3: 1 2 8 4 6 9 3
i=4: 1 2 3 8 4 6 9
i=5: 1 2 3 4 8 6 9
i=6: 1 2 3 4 6 8 9
i=7: 1 2 3 4 6 8 9 (sorted)
Bubble Sort

Algorithm: BUBBLESORT(A)
for i ← 1 to length[A]
    do for j ← length[A] downto i + 1
        do if A[j] < A[j-1]
            then exchange A[j] ↔ A[j-1]

(example: 8 4 6 9 2 3 1, i = 1)
Bubble Sort
• Pass 1
• 9 2 5 4 7 8   compare 1st & 2nd: out of order, EXCHANGE 9 and 2
• 2 9 5 4 7 8   compare 2nd & 3rd: out of order, EXCHANGE 9 and 5
• 2 5 9 4 7 8   compare 3rd & 4th: out of order, EXCHANGE 9 and 4
• 2 5 4 9 7 8   compare 4th & 5th: out of order, EXCHANGE 9 and 7
• 2 5 4 7 9 8   compare 5th & 6th: out of order, EXCHANGE 9 and 8
• 2 5 4 7 8 9
• The last element reaches its appropriate position after pass 1
• All other values may still be out of order
• So we need to repeat this process
Bubble sort

int a[6] = {9, 2, 5, 4, 7, 8};
int i, j, temp;
int n = 6;
for (i = 0; i < n; i++)
{
    for (j = 0; j < n - i - 1; j++)
    {
        if (a[j] > a[j+1])        /* adjacent pair out of order: swap */
        {
            temp = a[j];
            a[j] = a[j+1];
            a[j+1] = temp;
        }
    }
}

• This algorithm isn't efficient!
• It loops for all six iterations even if the array becomes sorted after the second iteration
Bubble sort modified

int a[6] = {9, 2, 5, 4, 7, 8};
int i, j, temp;
int n = 6;
for (i = 0; i < n; i++)
{
    int flag = 0;                 /* taking a flag variable */
    for (j = 0; j < n - i - 1; j++)
    {
        if (a[j] > a[j+1])
        {
            temp = a[j];
            a[j] = a[j+1];
            a[j+1] = temp;
            flag = 1;             /* set flag to 1 if swapping occurs */
        }
    }
    if (!flag)
        break;                    /* no swapping took place: the array is
                                     already sorted, so jump out of the loop */
}
Complexity: Bubble Sort
• n-1 comparisons are done in the 1st pass
• n-2 in the 2nd pass
• n-3 in the 3rd pass, and so on
• the total number of comparisons is (n-1) + (n-2) + (n-3) + ... + 3 + 2 + 1
• Sum = n(n-1)/2, i.e. O(n²)
• Hence the complexity of bubble sort is O(n²)
• Space complexity for bubble sort is O(1), because only a single additional
memory location is required, for the temp variable
• Best-case time complexity is O(n) (with the early-exit flag), when the list is already sorted
Bubble-Sort Running Time

Algorithm: BUBBLESORT(A)                         cost
for i ← 1 to length[A]                           c1
    do for j ← length[A] downto i + 1            c2
        do if A[j] < A[j-1]                      c3
            then exchange A[j] ↔ A[j-1]          c4

Comparisons: ≈ n²/2     Exchanges: ≈ n²/2

T(n) = c1(n+1) + c2·Σ_{i=1..n}(n-i+1) + c3·Σ_{i=1..n}(n-i) + c4·Σ_{i=1..n}(n-i)
     = Θ(n) + (c2 + c3 + c4)·Σ_{i=1..n}(n-i)

where Σ_{i=1..n}(n-i) = n² - Σ_{i=1..n} i = n² - n(n+1)/2 = n²/2 - n/2

Thus, T(n) = Θ(n²)
Bubble Sort

• Given n numbers to sort

• Repeat the following n-1 times:
For each pair of adjacent numbers:
If the number on the left is greater than the number on the right, swap them

• How efficient is bubble sort?
In general, given n numbers to sort, it performs about n² comparisons
Same as selection sort

• A simple way to improve it:
Stop after a pass that makes no swaps
This will only help some of the time
EXCHANGE: Quick Sort

• Divide-and-conquer approach

• Given array S to be sorted
If size of S ≤ 1, then done
Pick any element v in S as the pivot
Partition S - {v} (the remaining elements in S) into two groups
S1 = {all elements in S - {v} that are smaller than v}
S2 = {all elements in S - {v} that are larger than v}
Return {quicksort(S1) followed by v followed by quicksort(S2)}
Picking the Pivot
• Strategy 1: Pick the first element in S
Works only if input is random

• What if input S is sorted, or even mostly sorted?


All the remaining elements would go into either S1 or S2!
Terrible performance!

• Why worry about sorted input?


Quicksort is recursive, so sub‐problems could be sorted
Plus mostly sorted input is quite frequent

Picking the Pivot
• Strategy 2: Pick the pivot randomly
Would usually work well, even for mostly sorted input

• Random number generation is an expensive operation

Picking the Pivot
• Strategy 3: Median‐of‐three Partitioning
Ideally, the pivot should be the median of input array S
Median = element in the middle of the sorted sequence

Would divide the input into two almost equal partitions


Unfortunately, it's hard to calculate the median quickly without sorting first!

So find the approximate median


Pivot = median of the left‐most, right‐most and center element of the array S
Solves the problem of sorted input

Let input S = {6, 1, 4, 9, 0, 3, 5, 2, 7, 8}

left = 0 and S[0] = 6; right = 9 and S[9] = 8; center = (left + right)/2 = 4 and S[4] = 0
• Pivot = median of S[left], S[right], and S[center] = median of 6, 8, and 0
• = S[left] = 6 (a small sketch follows below)
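A hedged C sketch of this median-of-three selection (the name median3 and returning an index are my choices; production versions usually also reorder the three sampled elements):

/* return the index (left, center, or right) holding the median value */
int median3(int a[], int left, int right)
{
    int center = (left + right) / 2;
    int x = a[left], y = a[center], z = a[right];
    if ((x <= y && y <= z) || (z <= y && y <= x)) return center;
    if ((y <= x && x <= z) || (z <= x && x <= y)) return left;
    return right;
}

For the S above, median3 returns left, since the median of 6, 0, and 8 is 6.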
Partitioning Strategy

• Want to partition an array A[left .. right]

• Put the pivot element at the end,
by swapping it with the last element (swap pivot and A[right])

• Let i start at the first element and j start at the last but one
(i = left, j = right - 1)

5 6 4 [6] 3 12 19  →(swap pivot with A[right])→  5 6 4 19 3 12 [6]
(pivot = 6; i starts at 5, j starts at 12)
Partitioning Strategy
• Want to have:
• A[p] ≤ pivot, for p < i
• A[p] ≥ pivot, for p > j

• While i < j:
• Move i right, skipping over elements smaller than the pivot
• Move j left, skipping over elements greater than the pivot
• When both i and j have stopped:
• A[i] ≥ pivot and A[j] ≤ pivot

5 6 4 19 3 12 6   (i stops at 6, j stops at 3)
Partitioning Strategy

• When i and j have stopped and i is to the left of j:
Swap A[i] and A[j]
The large element is pushed to the right and the small element is pushed to the left
After swapping:
A[i] ≤ pivot
A[j] ≥ pivot
Repeat the process until i and j cross

swap: 5 6 4 19 3 12 6  →  5 3 4 19 6 12 6
Partitioning Strategy
• When i and j have crossed:
• Swap A[i] and the pivot
• Result:
• A[p] ≤ pivot, for p < i
• A[p] ≥ pivot, for p > i

5 3 4 19 6 12 6   (i and j have crossed)
5 3 4 6 6 12 19   (A[i] = 19 swapped with the pivot)
Quick Sort

/* a[] array, p is the starting index (i.e. 0), and r is the last index of the array */

void quicksort(int a[], int p, int r)
{
    if (p < r)
    {
        int q;
        q = partition(a, p, r);
        quicksort(a, p, q);
        quicksort(a, q + 1, r);
    }
}
Quick Sort

/* Hoare-style partition; the original slide version could loop forever when
   both a[i] and a[j] equal the pivot, so the scans below always advance
   i and j before comparing */
int partition(int a[], int p, int r)
{
    int pivot = a[p];
    int i = p - 1;
    int j = r + 1;
    int temp;
    while (1)
    {
        do { i++; } while (a[i] < pivot);   /* skip elements smaller than pivot */
        do { j--; } while (a[j] > pivot);   /* skip elements greater than pivot */
        if (i < j)
        {
            temp = a[i];                    /* push the large element right, */
            a[i] = a[j];                    /* the small element left */
            a[j] = temp;
        }
        else
        {
            return j;                       /* i and j have crossed */
        }
    }
}
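A small hedged driver showing how the two functions fit together (main and the sample array are mine, not from the slides):

#include <stdio.h>

/* assumes quicksort() and partition() as defined above */
int main(void)
{
    int a[] = {6, 1, 4, 9, 0, 3, 5, 2, 7, 8};
    int n = sizeof a / sizeof a[0];
    int i;
    quicksort(a, 0, n - 1);
    for (i = 0; i < n; i++)
        printf("%d ", a[i]);   /* prints: 0 1 2 3 4 5 6 7 8 9 */
    printf("\n");
    return 0;
}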
Complexity: Quick Sort

• Worst Case Time Complexity: O(n²)
• Best Case Time Complexity: O(n log n)
• Average Time Complexity: O(n log n)
• Space Complexity: O(log n) on average for the recursion stack (O(n) in the worst case)
• The extra space required by quicksort is very small:
it sorts in place and needs only the recursion stack

• Quicksort is not a stable sorting technique:
it might change the relative order of two equal elements in the list while sorting
Merge Sort
• Merge sort follows the rule of divide and conquer
• In the bottom-up view, it doesn't divide the list into two halves:
the unsorted list is divided into N sublists,
each having one element
(a list of one element is considered sorted)

• Then it repeatedly merges these sublists
to produce new sorted sublists,
until at last one sorted list is produced
Merge Sort
• Merge sort works recursively
• First it divides the data set in half and sorts each half separately
• Next, the first elements from each of the two lists are compared;
the lesser element is removed from its list and
added to the final result list (a sketch follows below)
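The slides give no merge-sort code, so here is a minimal hedged C sketch (top-down; the function names and the caller-supplied tmp buffer are my choices):

#include <string.h>

/* merge the sorted halves a[lo..mid) and a[mid..hi) */
void merge(int a[], int tmp[], int lo, int mid, int hi)
{
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi)                        /* compare the run heads */
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++]; /* <= keeps equal keys stable */
    while (i < mid) tmp[k++] = a[i++];               /* copy any leftovers */
    while (j < hi)  tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (size_t)(hi - lo) * sizeof(int));
}

void merge_sort(int a[], int tmp[], int lo, int hi)
{
    int mid;
    if (hi - lo < 2)
        return;                        /* one element: already sorted */
    mid = lo + (hi - lo) / 2;
    merge_sort(a, tmp, lo, mid);       /* sort each half separately */
    merge_sort(a, tmp, mid, hi);
    merge(a, tmp, lo, mid, hi);        /* combine the sorted halves */
}

Calling merge_sort(a, tmp, 0, n) with a tmp array of length n sorts a; the tmp buffer is the O(n) extra space noted on the next slide.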
Merge Sort

• Worst Case Time Complexity: O(n log n)
• Best Case Time Complexity: O(n log n)
• Average Time Complexity: O(n log n)
• Space Complexity: O(n)
• Time complexity is O(n log n) in all 3 cases (worst, average, and best),
as merge sort always divides the array into two halves and takes linear time to merge the two halves
• It requires an amount of additional space equal to the unsorted list,
hence it is not recommended for sorting very large in-memory lists

• It is the best sorting technique for sorting linked lists.
External Sorting

• External sorting can handle massive amounts of data


Required when the data do not fit into the main memory
Reside in the slower external memory (usually a hard drive)

• External sorting typically uses a hybrid sort‐merge strategy


• In the sorting phase,
data small enough to fit in main memory are read
sorted and
written out to a temporary file

• In the merge phase,


the sorted sub‐files are combined into a single larger file

External Merge Sort

• Sorting 900 megabytes of data using only 100 megabytes of RAM:


• Read 100 MB of the data in main memory and
1. Sort by some conventional method, like quicksort
2. Write the sorted data to disk
3. Repeat steps 1 and 2 until all of the data is in sorted 100 MB chunks
900MB / 100MB = 9 chunks, need to be merged into one single output file

4. Read the first 10 MB (= 100MB / (9 chunks + 1)) of each sorted chunk


into input buffers in main memory and
allocate the remaining 10 MB for an output buffer
(In practice, performance is better when the output buffer is made larger than the input buffers)

5. Perform a 9‐way merge and store the result in the output buffer
If the output buffer is full, write it to the final sorted file, and empty it
If any of the 9 input buffers gets empty, fill it with the next 10 MB of its associated 100
MB sorted chunk until no more data from the chunk is available
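To make the k-way merge step concrete, a hedged in-memory C sketch (here k = 3, using the runs from the tape example below; a real external sort would refill these arrays from disk as the buffers drain):

#include <stdio.h>

#define K 3

int main(void)
{
    /* three sorted runs standing in for sorted chunks on disk */
    int run0[] = {3, 17, 29}, run1[] = {18, 24, 56}, run2[] = {4, 9, 10};
    int *runs[K] = {run0, run1, run2};
    int len[K] = {3, 3, 3};
    int pos[K] = {0, 0, 0};
    int i, best;

    for (;;)
    {
        best = -1;
        for (i = 0; i < K; i++)       /* linear scan for the smallest run head */
            if (pos[i] < len[i] &&
                (best < 0 || runs[i][pos[i]] < runs[best][pos[best]]))
                best = i;
        if (best < 0)
            break;                    /* all runs exhausted */
        printf("%d ", runs[best][pos[best]++]);  /* stands in for the output buffer */
    }
    printf("\n");                     /* prints: 3 4 9 10 17 18 24 29 56 */
    return 0;
}

With many runs, a min-heap over the run heads would replace the linear scan, giving the O(n log k) merge cost derived later.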
External Merge Sort
• Example of Two‐Way Sorting:
• N = 14, M = 3
 (14 records on tape Ta1, memory capacity: 3 records.)

• Ta1: 17, 3, 29, 56, 24, 18, 4, 9, 10, 6, 45, 36, 11, 43

External Merge Sort
• Ta1: 17, 3, 29, 56, 24, 18, 4, 9, 10, 6, 45, 36, 11, 43
• Step A  Sorting of runs:
• Read 3 records in main memory, sort them and store them on Tb1: 17, 3, 29 ‐> 3,
17, 29
• Tb1: 3, 17, 29
• Read the next 3 records in main memory, sort them and store them on Tb2: 56,
24, 18 ‐> 18, 24, 56
• Tb2: 18, 24, 56
• Read the next 3 records in main memory, sort them and store them on Tb1: 4, 9,
10 ‐> 4, 9, 10
• Tb1: 3, 17, 29, 4, 9, 10
• Read the next 3 records in main memory, sort them and store them on Tb2: 6, 45,
36 ‐> 6, 36, 45
• Tb2: 18, 24, 56, 6, 36, 45
• Read the next 3 records in main memory, sort them and store them on Tb1: 11, 43
‐> 11, 43
(there are only two records left)
• Tb1: 3, 17, 29, 4, 9, 10, 11, 43
External Merge Sort

• Step B  Merging of runs


• B1. Merging runs of length 3 to obtain runs of length 6.

• Source tapes: Tb1 and Tb2, result on Ta1 and Ta2


• Merge the first two runs (on Tb1 and Tb2) and store the result on Ta1.
• Tb1: 3, 17, 29 | 4, 9, 10 | 11, 43
• Tb2: 18, 24, 56 | 6, 36, 45 |

External Merge Sort

• Tb1: 3, 17, 29 | 4, 9, 10 | 11, 43
• Tb2: 18, 24, 56 | 6, 36, 45 |
• (The merge proceeds element by element: at each step the smaller of the two current run heads is written to the output tape.)
External Merge Sort
• Thus we have the first two runs on Ta1 and Ta2, each twice the size
of the original runs:
• Ta1: 3, 17, 18, 24, 29, 56
• Ta2: 4, 6, 9, 10, 36, 45
External Merge Sort
• Next, we merge the third runs on Tb1 and Tb2 and store the result on Ta1
• only Tb1 contains a third run, so it is copied onto Ta1:
• Ta1: 3, 17, 18, 24, 29, 56 | 11, 43
External Merge Sort
• Step B2: merging runs of length 6 to obtain runs of length 12
• Source tapes: Ta1 and Ta2; result on Tb1 and Tb2:
• After merging the first two runs from Ta1 and Ta2,
we get a run of length 12, stored on Tb1:
• Tb1: 3, 4, 6, 9, 10, 17, 18, 24, 29, 36, 45, 56

• The second set of runs is only one run (11, 43), copied to Tb2
External Merge Sort
• Now on each tape there is only one run
• The last step is to merge these two runs to get the entire file sorted
• Step B3: merging the last two runs
• Result: 3, 4, 6, 9, 10, 11, 17, 18, 24, 29, 36, 43, 45, 56
Time Complexity
1. Initial Splitting:
• We split the data into chunks that fit into memory. This takes O(n) time,
where n is the total number of elements (here n = 14).
2. Sorting Each Chunk:
• Sorting each chunk of size M (where M is the memory size) takes O(M log M) time (here M = 3).
• If there are k chunks, the total time for sorting all chunks is O(k · M log M).
3. Merging Chunks:
• The merging process involves k sorted chunks. The number of merge passes required is log k.
• Each merge pass processes n elements, leading to a total merging time of O(n log k).
• Since k = n/M, the merging complexity can be approximated as O(n log(n/M)).
• Overall, the time complexity of the external merge sort is O(n log(n/M)).
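A quick worked check of these formulas on the tape example (my arithmetic, not from the slides): with n = 14 and M = 3 there are k = ⌈14/3⌉ = 5 sorted runs, so the two-way merge needs ⌈log₂ 5⌉ = 3 passes, each touching all 14 records — matching the three merge passes B1, B2 and B3 shown above.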
Natural merge
• Works well on partially sorted data or data with existing runs (sorted
subarrays)
• Identifying Runs:
• The first step involves identifying and marking the existing runs of sorted elements
in the data.
• A run is a contiguous subarray where all elements are in non‐decreasing order.

• Merging Runs:
• The algorithm then merges these runs pairwise, combining them into larger runs.
• This process is repeated until the entire array is merged into a single run.

• Repeat until Sorted:
• Steps 1 and 2 are repeated until the entire array is sorted.
• If the data is already sorted, the algorithm detects this early and avoids unnecessary work
Example
• [8,4,7,5,1,2,6,3]
• Step 1: Identify Runs: Identify and mark the existing runs of sorted elements in the array.
A run is a sequence of elements that are in non‐decreasing order:
• [8],[4,7],[5],[1,2,6],[3]
• Step 2: Merge Runs: Merge the identified runs pairwise until the entire array is sorted. The
merging process involves comparing and combining adjacent runs. The new runs become
larger, and the process continues until only one run remains.
1. Merge [8] and [4,7]: [4,7,8]
2. Merge [5] and [1,2,6]: [1,2,5,6]
3. Merge [4,7,8] and [1,2,5,6]: [1,2,4,5,6,7,8]
4. Merge [3] with the sorted array: [1,2,3,4,5,6,7,8]
• Conclusion
• The result is a sorted array:
• [1,2,3,4,5,6,7,8]

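The run-identification step is easy to express in C; a hedged sketch (run_end is my name for it):

/* returns the index one past the end of the non-decreasing run starting at i */
int run_end(const int a[], int n, int i)
{
    int j = i + 1;
    while (j < n && a[j - 1] <= a[j])   /* extend while still non-decreasing */
        j++;
    return j;
}

Scanning [8, 4, 7, 5, 1, 2, 6, 3] with repeated calls yields exactly the runs found above: [8], [4, 7], [5], [1, 2, 6], [3].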
Natural merge
• Potentially reduces the number of merge passes,
• making it more efficient for data that is partially sorted
• Natural merge sort can be especially efficient for real‐world data
• that often contains sorted or nearly sorted sequences.

Balanced Merge
• It is a version of merge sort that ensures a
balanced division of the input list.
• Merge runs in a balanced way to minimize the
number of passes
• Works well when the dataset is large and lacks
inherent order, as it effectively distributes the
sorting load.
• The ability to use multiple tapes/files makes
each pass more efficient, reducing overall
sorting time.

Balanced Merge
• Given
• N = 14 (total records)
• M = 3 (memory capacity, meaning we can handle 3 records at a time
in memory)

• Initial Dataset (Tape A):


• 17, 3, 29, 56, 24, 18, 4, 9, 10, 6, 45, 36, 11, 43

Balanced Merge
• Step 1: Split and Sort
• First, we split the records into chunks that fit into memory and sort
each chunk:
• Chunk 1: 17, 3, 29 → Sorted: 3, 17, 29
• Chunk 2: 56, 24, 18 → Sorted: 18, 24, 56
• Chunk 3: 4, 9, 10 → Sorted: 4, 9, 10
• Chunk 4: 6, 45, 36 → Sorted: 6, 36, 45
• Chunk 5: 11, 43 → Sorted: 11, 43

Balanced Merge
• Step 2: Initial Distribution
• Distribute these sorted chunks across the tapes:
• Tape A: 3, 17, 29
• Tape B: 18, 24, 56
• Tape C: 4, 9, 10
• Tape A: 6, 36, 45
• Tape B: 11, 43

• Tape A: 3, 17, 29 |6, 36, 45


• Tape B: 18, 24, 56 | 11, 43
• Tape C: 4, 9, 10

Balanced Merge
• Step 3: Merging Passes
• First Merge Pass
• Merge from Tape A and Tape B, write to Tape C:
• Tape A: 3, 17, 29
• Tape B: 18, 24, 56
• Merge result on Tape C: 3, 17, 18, 24, 29, 56
• Merge from Tape C and remaining Tape A, write to Tape B:
• Tape C: 4, 9, 10
• Tape A: 6, 36, 45
• Merge result on Tape B: 4, 6, 9, 10, 36, 45
• Remaining chunk from Tape B to Tape A:
• Tape B: 11, 43

Balanced Merge
• After the first pass, the tapes will look like:

• Tape A: 11, 43
• Tape B: 4, 6, 9, 10, 36, 45
• Tape C: 3, 17, 18, 24, 29, 56

Balanced Merge
• Second Merge Pass
• Merge from Tape A and Tape C, write to Tape B:
• Tape A: 11, 43
• Tape C: 3, 17, 18, 24, 29, 56
• Merge result on Tape B: 3, 11, 17, 18, 24, 29, 43, 56
• Remaining chunk from Tape B to Tape A:
• Tape B: 4, 6, 9, 10, 36, 45

• After the second pass, the tapes will look like:


• Tape A: 4, 6, 9, 10, 36, 45
• Tape B: 3, 11, 17, 18, 24, 29, 43, 56
• Tape C: empty

Balanced Merge
• Final Merge Pass
• Merge the final runs from Tape A and Tape B, write to Tape C:
• Tape A: 4, 6, 9, 10, 36, 45
• Tape B: 3, 11, 17, 18, 24, 29, 43, 56
• Merge result on Tape C: 3, 4, 6, 9, 10, 11, 17, 18, 24, 29, 36, 43, 45, 56

• Now the entire dataset is sorted and written to Tape C:


• Final Tape (Tape C): 3, 4, 6, 9, 10, 11, 17, 18, 24, 29, 36, 43, 45, 56

• By alternating between tapes and balancing the load,
we have efficiently sorted the dataset using balanced merge sort.
Polyphase Merge
• Used for sorting large datasets that don't fit entirely in
memory.
• It minimizes I/O operations, making it efficient for
external storage
• It is a bit more complex but very efficient in minimizing
the number of read and write operations.
• It uses a more dynamic distribution of runs across
tapes
• It ensures the number of runs decreases smoothly without
emptying any tape prematurely
Polyphase Merge
• Given
• N = 14 (total records)
• M = 3 (memory capacity, meaning we can handle 3
records at a time in memory)

• Initial Dataset (Tape A):

• 17, 3, 29, 56, 24, 18, 4, 9, 10, 6, 45, 36, 11, 43

Polyphase Merge
• Step 1: Split and Sort
• First, we split the records into chunks that fit into memory and sort
each chunk:

• Chunk 1: 17, 3, 29 → Sorted: 3, 17, 29


• Chunk 2: 56, 24, 18 → Sorted: 18, 24, 56
• Chunk 3: 4, 9, 10 → Sorted: 4, 9, 10
• Chunk 4: 6, 45, 36 → Sorted: 6, 36, 45
• Chunk 5: 11, 43 → Sorted: 11, 43

Polyphase Merge
• Step 2: Initial Distribution
• Distribute these sorted chunks across three tapes (A, B, C) in a
polyphase manner.
• The key here is that one tape remains empty initially (for efficient
merging step):

• Tape A: 3, 17, 29 (1 run), 6, 36, 45 (1 run) → 2 runs


• Tape B: 18, 24, 56 (1 run), 4, 9, 10 (1 run), 11, 43 (1 run) → 3 runs
• Tape C: Empty

Polyphase Merge
• Step 3: Merging Passes
• First Merge Pass
• Merge from Tape A and Tape B, write to Tape C:
• Merge Result on Tape C:
• Merging: 3, 17, 29 and 18, 24, 56 → 3, 17, 18, 24, 29, 56
• Merging: 6, 36, 45 and 4, 9, 10 → 4, 6, 9, 10, 36, 45
• Remaining run from Tape B to Tape A: 11, 43

• After the first pass, the tapes will look like:


• Tape A: 11, 43
• Tape B: Empty
• Tape C: 3, 17, 18, 24, 29, 56, 4, 6, 9, 10, 36, 45

Polyphase Merge
• Second Merge Pass
• Merge from Tape A and Tape C, write to Tape B:
• Merge Result on Tape B:
• Merging: 11, 43 and 3, 17, 18, 24, 29, 56 → 3, 11, 17, 18, 24, 29, 43, 56
• Remaining run from Tape C to Tape A: 4, 6, 9, 10, 36, 45

• After the second pass, the tapes will look like:


• Tape A: 4, 6, 9, 10, 36, 45
• Tape B: 3, 11, 17, 18, 24, 29, 43, 56
• Tape C: Empty

Polyphase Merge
• Final Merge Pass
• Merge from Tape A and Tape B, write to Tape C:
• Merge Result on Tape C:
• Merging: 4, 6, 9, 10, 36, 45 and 3, 11, 17, 18, 24, 29, 43, 56 → 3, 4, 6, 9,
10, 11, 17, 18, 24, 29, 36, 43, 45, 56

• Now, the entire dataset is sorted and written to Tape C:


• Final Tape (Tape C): 3, 4, 6, 9, 10, 11, 17, 18, 24, 29, 36, 43, 45, 56

Balanced Merge Sort
• Time Complexity:
• Sorting: O(n log M)
• Merging: O(n log k)
• Overall: O(n log(n/M))

• Space Complexity:
• In‐memory space: O(M)
• Disk space: O(n)

• Pros:
• Efficient for large datasets by balancing the load across multiple tapes/files.
• Reduces the number of read/write operations by leveraging multiple tapes.
• Cons:
• Requires multiple tapes/files, which might not always be feasible.
• Implementation can be complex.

Natural Merge Sort (Two‐Way)
• Time Complexity:
• Sorting: O(n log n)
• Merging: Efficient if the dataset has natural runs
• Overall: O(n log n)

• Space Complexity:
• In‐memory space: O(M)
• Disk space: O(n)

• Pros:
• Takes advantage of existing sorted runs in the dataset, potentially reducing work.
• Simpler implementation compared to balanced or polyphase merge sorts.
• Efficient for datasets that are partially sorted.
• Cons:
• Less efficient for completely unsorted datasets, doesn't balance load as effectively.
• Can have more passes compared to balanced merge sort.
Polyphase Merge
• Time Complexity:
• Sorting: O(n log M)
• Merging: Efficient multi‐way merging
• Overall: O(n log n)
• Space Complexity:
• In‐memory space: O(M)
• Disk space: O(n)
• Pros:
• Efficiently minimizes read/write operations by keeping one tape empty
• Ideal for large datasets due to balancing runs
• Cons:
• More complex initial setup and distribution of runs
• Requires careful management of tapes/files
