Module 5
SORTING TECHNIQUES
AND HASHING
Sorting Techniques
1. Bubble sort
2. Insertion sort
3. Selection sort
Techniques 1 to 3 run in O(n^2) time.
4. Quick sort
5. Merge sort
6. Heap sort
Techniques 4 to 6 run in O(n log n) time (for quick sort, on average).
Sorting: the task of rearranging the data in an order.
Classification:
◦ Internal / External
◦ By comparison / By distribution
Insertion Sort
input array
5 2 4 6 1 3
sorted unsorted
Alg.: INSERTION-SORT(A)
[Figure: array positions 1 to 8 holding a1 ... a8, with the element currently being inserted marked as key]
Insertion Sort
#include <stdio.h>

int main()
{
    int n, array[1000], c, d, t;

    printf("Enter number of elements\n");
    scanf("%d", &n);
    printf("Enter %d integers\n", n);
    for (c = 0; c < n; c++) {
        scanf("%d", &array[c]);
    }

    /* array[0..c-1] is already sorted; insert array[c] into it */
    for (c = 1; c <= n - 1; c++) {
        d = c;
        /* move the new element left until its left neighbour is not larger */
        while (d > 0 && array[d] < array[d-1]) {
            t = array[d];
            array[d] = array[d-1];
            array[d-1] = t;
            d--;
        }
    }

    printf("Sorted list in ascending order:\n");
    for (c = 0; c <= n - 1; c++) {
        printf("%d\n", array[c]);
    }
    return 0;
}
Selection Sort
Idea:
◦ Find the smallest element in the array
◦ Exchange it with the element in the first position
◦ Find the second smallest element and exchange it with the element in the second
position
◦ Continue until the array is sorted
Example (array after each exchange):
8 4 6 9 2 3 1
1 4 6 9 2 3 8
1 2 6 9 4 3 8
1 2 3 9 4 6 8
1 2 3 4 9 6 8
1 2 3 4 6 9 8
1 2 3 4 6 8 9
1 2 3 4 6 8 9
Selection Sort
Alg.: SELECTION-SORT(A)
n ← length[A]
for j ← 1 to n - 1
    do smallest ← j
       for i ← j + 1 to n
           do if A[i] < A[smallest]
                 then smallest ← i
       exchange A[j] ↔ A[smallest]
Selection Sort
#include <stdio.h>

int main()
{
    int array[100], n, c, d, small, temp;

    printf("Enter number of elements\n");
    scanf("%d", &n);
    printf("Enter %d integers\n", n);
    for (c = 0; c < n; c++)
        scanf("%d", &array[c]);

    for (c = 0; c < (n - 1); c++)
    {
        small = c;    /* index of the smallest element found so far */
        for (d = c + 1; d < n; d++)
        {
            if (array[small] > array[d])
                small = d;
        }
        /* move the smallest remaining element into position c */
        if (small != c)
        {
            temp = array[c];
            array[c] = array[small];
            array[small] = temp;
        }
    }

    printf("Sorted list in ascending order:\n");
    for (c = 0; c < n; c++)
        printf("%d\n", array[c]);
    return 0;
}
Quicksort
Basic Concept: divide and conquer
Select a pivot and split the data into two groups: (< pivot) and (> pivot):
Example (pivot = 26, the first element):
26 33 35 29 19 12 22
◦ The left marker stops at 33, which belongs to the RIGHT group.
◦ The right marker stops at 22, which belongs to the LEFT group.
◦ Exchange the two elements.
26 22 35 29 19 12 33
Step 5:
◦ Element 35 belongs to the RIGHT group.
◦ Element 12 belongs to the LEFT group.
◦ Exchange, increment left, and decrement right.
26 22 12 29 19 35 33
Step 7:
◦ Element 29 belongs to the RIGHT group.
◦ Element 19 belongs to the LEFT group.
◦ Exchange, increment left, and decrement right.
26 22 12 19 29 35 33
Step 8:
◦ When the left and right markers pass each other, we are done with the partition task.
26 22 12 19 29 35 33
Assemble parts when done:
12 19 22 26 29 33 35
Partitioning example (pivot is the first element, pivot_index = 0):
40 20 10 80 60 50 7 30 100
too_big_index starts just after the pivot; too_small_index starts at the last element.
1. While data[too_big_index] <= data[pivot_index]
++too_big_index
2. While data[too_small_index] > data[pivot_index]
--too_small_index
3. If too_big_index < too_small_index
swap data[too_big_index] and data[too_small_index]
4. While too_small_index > too_big_index, go to 1.
5. Swap data[too_small_index] and data[pivot_index]
Trace:
◦ Start: 40 20 10 80 60 50 7 30 100
◦ After the first swap (80 and 30): 40 20 10 30 60 50 7 80 100
◦ After the second swap (60 and 7): 40 20 10 30 7 50 60 80 100
◦ After step 5 (pivot 40 and 7): 7 20 10 30 40 50 60 80 100, pivot_index = 4
Partition Result
7 20 10 30 40 50 60 80 100
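The five partition steps can be sketched in C as follows. This is a minimal sketch, not a definitive implementation: the function names `partition` and `quick_sort` and the `lo`/`hi` parameters are assumptions made for illustration, and the bounds check in step 1 is an addition that keeps `too_big_index` inside the array when every element is less than or equal to the pivot.

```c
#include <assert.h>

/* Exchange two integers in place. */
static void swap_int(int *a, int *b)
{
    int t = *a;
    *a = *b;
    *b = t;
}

/* Partition data[lo..hi] around the pivot data[lo].
   Returns the final index of the pivot. */
int partition(int data[], int lo, int hi)
{
    assert(lo <= hi);
    int pivot_index = lo;
    int too_big_index = lo + 1;
    int too_small_index = hi;

    do {
        /* Step 1 (the "<= hi" test is an added bounds check) */
        while (too_big_index <= hi && data[too_big_index] <= data[pivot_index])
            ++too_big_index;
        /* Step 2: cannot run past lo, because data[lo] > data[lo] is false */
        while (data[too_small_index] > data[pivot_index])
            --too_small_index;
        /* Step 3 */
        if (too_big_index < too_small_index)
            swap_int(&data[too_big_index], &data[too_small_index]);
    } while (too_small_index > too_big_index);   /* Step 4 */

    /* Step 5: put the pivot between the two groups */
    swap_int(&data[too_small_index], &data[pivot_index]);
    return too_small_index;
}

/* Sort data[lo..hi] by partitioning and then sorting each group. */
void quick_sort(int data[], int lo, int hi)
{
    if (lo >= hi)
        return;
    int p = partition(data, lo, hi);
    quick_sort(data, lo, p - 1);
    quick_sort(data, p + 1, hi);
}
```

Calling `partition(data, 0, 8)` on 40 20 10 80 60 50 7 30 100 returns 4 and leaves the array as 7 20 10 30 40 50 60 80 100, matching the partition result above.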
Second example (the array is already sorted):
pivot_index = 0 2 4 10 12 13 50 57 63 100
Applying steps 1 to 5 above:
◦ Step 1 stops immediately: too_big_index stays at index 1, since 4 > 2.
◦ Step 2 moves too_small_index all the way down to index 0.
◦ Step 3 does nothing, and step 4 ends the loop because the markers have passed each other.
◦ Step 5 swaps the pivot with itself.
pivot_index = 0 2 4 10 12 13 50 57 63 100
The array is unchanged: the left group is empty and the other eight elements all fall in the right group.
Merge Sort
Input: [10, 4, 6, 3, 8, 2, 5, 7]
Split: [10, 4, 6, 3] [8, 2, 5, 7]
Sorted: [2, 3, 4, 5, 6, 7, 8, 10]
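The split of [10, 4, 6, 3, 8, 2, 5, 7] into two halves and the sorted result above follow the merge sort pattern: sort each half, then merge the two sorted halves. A minimal sketch in C is given below. The names `merge_sort` and `merge`, the half-open range convention, and the caller-supplied scratch array `tmp` are assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

/* Merge the sorted runs a[lo..mid-1] and a[mid..hi-1], using tmp as scratch. */
static void merge(int a[], int tmp[], int lo, int mid, int hi)
{
    int i = lo, j = mid, k = lo;

    while (i < mid && j < hi)
        tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];   /* take from the left on ties */
    while (i < mid)
        tmp[k++] = a[i++];
    while (j < hi)
        tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (size_t)(hi - lo) * sizeof(int));
}

/* Sort a[lo..hi-1]. tmp must have room for at least hi elements. */
void merge_sort(int a[], int tmp[], int lo, int hi)
{
    assert(lo <= hi);
    if (hi - lo < 2)
        return;                       /* zero or one element: already sorted */
    int mid = lo + (hi - lo) / 2;
    merge_sort(a, tmp, lo, mid);      /* sort the left half */
    merge_sort(a, tmp, mid, hi);      /* sort the right half */
    merge(a, tmp, lo, mid, hi);
}
```

For the eight-element array above, `merge_sort(a, tmp, 0, 8)` first splits at index 4, which gives the halves [10, 4, 6, 3] and [8, 2, 5, 7].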
Heap and Heap Sort
The highest (or lowest) priority element is always stored at the root, hence the
name "heap".
A heap is not a sorted structure and can be regarded as partially ordered.
Since a heap is a complete binary tree, it has the smallest possible height: a heap with N
nodes always has O(log N) height.
Heap Representations
Linked Structure
Array
Array – more advantageous
1. No wastage of array space – complete binary tree
2. Null entries if any at the tail end only
3. No links needed for parent and descendants
Insertion to a heap
◦ The new element is initially appended to the end of the heap
◦ The heap property is repaired by comparing the added element with its
parent and moving the added element up a level
◦ This process is called "percolation up".
◦ The comparison is repeated until the parent is larger than or equal to the
percolating element.
MaxHeap - Insertion
Algorithm InsertMaxHeap
Input: ITEM, data to be inserted, N-> no. of nodes
Output: ITEM inserted into the heap tree
DS: array A[1….Size]
Steps
1. If (N >= SIZE) then
2.     Print "Heap Tree is saturated"
3.     Exit
4. Else
5.     N = N + 1
6.     A[N] = ITEM
7.     i = N
8.     p = i / 2
9.     While (p > 0) and (A[p] < A[i]) do
10.        temp = A[i]
11.        A[i] = A[p]
12.        A[p] = temp
13.        i = p
14.        p = p / 2
15.    EndWhile
16. EndIf
17. Stop
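The InsertMaxHeap steps translate almost line for line into C. The sketch below assumes a 1-based array (index 0 is left unused, matching A[1..Size]); the function name `insert_max_heap`, the `size` parameter and the 0/1 return value in place of the printed message are choices made for illustration. After each exchange the child index moves up to the parent (`i = p`), which is what lets the item keep percolating up.

```c
#include <assert.h>

/* Insert item into the max-heap stored in A[1..*n] (A[0] is unused).
   size is the largest usable index. Returns 1 on success, 0 if the heap is full. */
int insert_max_heap(int A[], int *n, int size, int item)
{
    assert(*n >= 0);
    if (*n >= size)
        return 0;                 /* heap tree is saturated */

    *n = *n + 1;
    A[*n] = item;                 /* append at the end of the heap */

    int i = *n;
    int p = i / 2;                /* parent of i */
    while (p > 0 && A[p] < A[i]) {
        int temp = A[i];          /* exchange child and parent */
        A[i] = A[p];
        A[p] = temp;
        i = p;                    /* continue from the parent's position */
        p = p / 2;
    }
    return 1;
}
```

Inserting 10, 20, 5 and 30 in that order into an empty heap leaves A[1..4] as 30, 20, 5, 10.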
Delete from MinHeap
The minimum element can be found at the root, which is the first
element of the array.
Remove the root and replace it with the last element of the heap
Then restore the heap property by percolating down.
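The deletion described above can be sketched in C as follows, assuming the same 1-based array layout used for insertion (A[0] unused). The function name `delete_min` and its parameters are assumptions for illustration.

```c
#include <assert.h>

/* Remove and return the minimum of the min-heap stored in A[1..*n].
   Requires *n >= 1. */
int delete_min(int A[], int *n)
{
    assert(*n >= 1);
    int min = A[1];               /* the minimum is at the root */

    A[1] = A[*n];                 /* replace the root with the last element */
    *n = *n - 1;

    /* Percolate down: exchange with the smaller child until neither child is smaller. */
    int i = 1;
    for (;;) {
        int left = 2 * i;
        int right = 2 * i + 1;
        int smallest = i;

        if (left <= *n && A[left] < A[smallest])
            smallest = left;
        if (right <= *n && A[right] < A[smallest])
            smallest = right;
        if (smallest == i)
            break;                /* heap property restored */

        int temp = A[i];
        A[i] = A[smallest];
        A[smallest] = temp;
        i = smallest;
    }
    return min;
}
```

Repeated calls return the stored keys in ascending order, which is the idea behind sorting with a heap.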
Operations on Heaps
Maintain/Restore the max-heap property
◦ MAX-HEAPIFY
Priority queues
Maintaining the Heap Property
[Figure: MAX-HEAPIFY example. A[2] violates the heap property and is exchanged with A[4]; A[4] then violates the heap property and is exchanged with A[9].]
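A minimal C sketch of MAX-HEAPIFY, and of a heap sort built on top of it, is given below. It assumes a 1-based array (A[0] unused) so that the children of A[i] are A[2i] and A[2i+1]; the function names and parameters are assumptions for illustration, and the heap sort shown here is one common formulation rather than a transcription of the slides.

```c
#include <assert.h>

/* Restore the max-heap property at index i of the heap A[1..n],
   assuming both subtrees of i are already max-heaps. */
void max_heapify(int A[], int n, int i)
{
    assert(i >= 1);
    int left = 2 * i;
    int right = 2 * i + 1;
    int largest = i;

    if (left <= n && A[left] > A[largest])
        largest = left;
    if (right <= n && A[right] > A[largest])
        largest = right;

    if (largest != i) {
        int temp = A[i];          /* exchange with the larger child */
        A[i] = A[largest];
        A[largest] = temp;
        max_heapify(A, n, largest);   /* the child may now violate the property */
    }
}

/* Sort A[1..n] in ascending order. */
void heap_sort(int A[], int n)
{
    /* Build a max-heap bottom-up from the last internal node. */
    for (int i = n / 2; i >= 1; i--)
        max_heapify(A, n, i);

    /* Repeatedly move the maximum to the end and shrink the heap. */
    for (int end = n; end >= 2; end--) {
        int temp = A[1];
        A[1] = A[end];
        A[end] = temp;
        max_heapify(A, end - 1, 1);
    }
}
```

Each call to `max_heapify` follows a single root-to-leaf path, so it costs O(log N) on a heap with N nodes.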
A stored item needs to have a data member, called key, that will be used in computing the index
value for the item.
The items that are stored in the hash table are indexed by values from 0 to TableSize – 1.
Advantages:
1. Minimum number of multiplications (handled by shifts!)
2. Avoids overflow, because it takes the mod during the computation
Optimal Hash Function
The best hash function would distribute keys as evenly as
possible in the hash table
“Simple uniform hashing”
◦ Maps each key to a (fixed) random number
◦ Idealized gold standard
◦ Simple to analyze
◦ Can be closely approximated by best hash functions
Example:
Hash Table of size 10
Keys: 10, 19, 35, 43, 62, 59, 31, 49, 77, 33
Hash function:
1. Add the 2 digits in the key
2. Take the digit at unit’s place as index
Example:
K (key)   I = H(K) (index)
10        1
19        0
35        8
43        7
62        8
59        4
31        4
49        3
77        4
33        6

Hash Table
Index   Keys
0       19
1       10
2
3       49
4       59, 31, 77
5
6       33
7       43
8       35, 62
9
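The digit-sum hash function of this example is small enough to write out directly. The sketch below assumes two-digit keys, as in the example; the function name `digit_sum_hash` is hypothetical.

```c
#include <assert.h>

/* Hash a two-digit key: add its two digits, then keep the units digit of the sum. */
int digit_sum_hash(int key)
{
    assert(key >= 10 && key <= 99);
    int sum = key / 10 + key % 10;    /* step 1: add the 2 digits */
    return sum % 10;                  /* step 2: digit at the unit's place */
}
```

For instance, key 59 gives 5 + 9 = 14 and therefore index 4, the same slot as keys 31 and 77, which is why that slot holds three keys.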
Division Method
Fast Hashing method
Widely accepted
1. Choose a number h larger than N (usually a prime number)
3. Hash Function H
H(k) = k mod h, if indices start from 0
H(k) = k mod h + 1, if indices start from 1
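Both variants of the division method are one-liners in C. The sketch below assumes non-negative integer keys; the function names `hash_div0` and `hash_div1` are hypothetical.

```c
#include <assert.h>

/* Division method for tables whose indices start from 0: H(k) = k mod h. */
int hash_div0(int k, int h)
{
    assert(k >= 0 && h > 0);
    return k % h;
}

/* Division method for tables whose indices start from 1: H(k) = k mod h + 1. */
int hash_div1(int k, int h)
{
    assert(k >= 0 && h > 0);
    return k % h + 1;
}
```

With h = 13 (chosen here only as an illustration), key 27 maps to index 1 in a 0-based table and to index 2 in a 1-based table.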
MidSquare Method
Widely accepted
H(k) = x,
Limitation: time consuming computation
2. Fold Shifting
3. Fold Boundary
b. Quadratic Probing
c. Double Hashing
Separate Chaining or chaining
The idea is to keep a list of all elements that hash to
the same value.
◦ The array elements are pointers to the first nodes of the
lists.
Slot: chain
1: 81 → 1
2: (empty)
4: 64 → 4
5: 25
6: 36 → 16
7: (empty)
9: 49 → 9
Chaining Technique
Advantages:
• Better space utilization for large items.
• Simple collision handling: searching linked list.
• Overflow: we can store more items than the hash table
size.
• Deletion is quick and easy: deletion from the linked list.
Disadvantages:
• Cost of maintaining linked lists
• Extra storage space for link fields
Algorithm HashChaining
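A minimal sketch of insertion and search with separate chaining is given below. It assumes integer keys, a table of size 10, the hash function key mod 10, and insertion at the front of each list; the names `struct node`, `chain_insert`, `chain_search` and `TABLE_SIZE` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define TABLE_SIZE 10             /* assumed table size */

struct node {
    int key;
    struct node *next;
};

/* Insert key at the front of the list for its slot.
   Returns 1 on success, 0 if memory allocation fails. */
int chain_insert(struct node *table[], int key)
{
    assert(key >= 0);
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return 0;

    int slot = key % TABLE_SIZE;
    n->key = key;
    n->next = table[slot];        /* the new node points to the old first node */
    table[slot] = n;              /* the array element points to the new first node */
    return 1;
}

/* Return the node holding key, or NULL if key is not in the table. */
struct node *chain_search(struct node *table[], int key)
{
    assert(key >= 0);
    for (struct node *p = table[key % TABLE_SIZE]; p != NULL; p = p->next)
        if (p->key == key)
            return p;
    return NULL;
}
```

Starting from a table of NULL pointers, inserting 16 and then 36 produces the chain 36 → 16 in slot 6. The table can hold more keys than it has slots, at the cost of one link field per stored key.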
Basic idea:
◦ Insertion: if a slot is full, try another one,
until you find an empty one
◦ Search: follow the same sequence of probes
◦ The sequence of slots examined is called the probe sequence.
Generalize hash function notation:
A hash function now takes two arguments:
(i) Key value, and (ii) Probe number
Note: None of these methods can generate more than m^2 different probing
sequences!
Linear Probing
In linear probing, collisions are resolved by sequentially scanning an array
(with wraparound) until an empty cell is found.
◦ i.e. f is a linear function of i, typically f(i)= i.
Example:
◦ Insert items with keys: 89, 18, 49, 58, 9 into an empty hash table.
◦ Table size is 10.
◦ Hash function is hash(x) = x mod 10.
◦ f(i) = i;
Figure: Linear probing hash table after each insertion
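The linear probing example can be reproduced with a short C sketch. It assumes non-negative integer keys and marks an empty cell with -1; the names `lp_insert`, `TABLE_SIZE` and `EMPTY` are hypothetical.

```c
#include <assert.h>

#define TABLE_SIZE 10             /* table size from the example */
#define EMPTY (-1)                /* assumed marker for an empty cell */

/* Insert key using hash(x) = x mod 10 and f(i) = i.
   Returns the slot used, or -1 if the table is full. */
int lp_insert(int table[], int key)
{
    assert(key >= 0);
    int home = key % TABLE_SIZE;

    for (int i = 0; i < TABLE_SIZE; i++) {
        int slot = (home + i) % TABLE_SIZE;   /* scan sequentially with wraparound */
        if (table[slot] == EMPTY) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;
}
```

Starting from a table filled with `EMPTY`, inserting 89, 18, 49, 58 and 9 places them in slots 9, 8, 0, 1 and 2 respectively: 49 wraps around to slot 0, 58 collides three times before reaching slot 1, and 9 collides three times before reaching slot 2.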
Search and Delete
The Search algorithm follows the same probe sequence as the insert
algorithm.
◦ A search for 58 would involve 4 probes.
◦ A search for 19 would involve 5 probes.
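The probe counts quoted above can be checked with a small C sketch of the search. It assumes the table state after inserting 89, 18, 49, 58 and 9 (49, 58 and 9 in slots 0 to 2; 18 and 89 in slots 8 and 9) and uses -1 for an empty cell; the names `lp_search_probes`, `TABLE_SIZE` and `EMPTY` are hypothetical.

```c
#include <assert.h>

#define TABLE_SIZE 10             /* table size from the example */
#define EMPTY (-1)                /* assumed marker for an empty cell */

/* Search for key along the same probe sequence used for insertion.
   Sets *found to 1 or 0 and returns the number of cells probed. */
int lp_search_probes(const int table[], int key, int *found)
{
    assert(key >= 0);
    int home = key % TABLE_SIZE;

    for (int i = 0; i < TABLE_SIZE; i++) {
        int slot = (home + i) % TABLE_SIZE;
        if (table[slot] == key) {
            *found = 1;
            return i + 1;
        }
        if (table[slot] == EMPTY) {   /* an empty cell ends the search */
            *found = 0;
            return i + 1;
        }
    }
    *found = 0;                       /* every cell was probed */
    return TABLE_SIZE;
}
```

On that table, the search for 58 probes slots 8, 9, 0 and 1 (4 probes, found), and the search for 19 probes slots 9, 0, 1, 2 and 3 (5 probes, not found).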
Problem (with quadratic probing):
◦ We may not be sure that we will probe all locations in the table (i.e.
there is no guarantee of finding an empty cell if the table is more than half full).
◦ If the hash table size is not prime, this problem will be much more severe.
= (1 + 8) mod 13 = 9 12
Hashing – performance factors
1. Hash function:
◦ should distribute the keys and entries evenly throughout the entire table
◦ should minimize collisions
3. Table size:
a. Too large a table will waste memory
b. Too small a table will cause increased collisions and eventually force rehashing (creating a new hash table of larger size and copying the contents of the current hash table into it)
c. The size should be appropriate to the hash function used and should typically be a prime number
Applications
Keeping track of customer account information at a
bank
◦ Search through records to check balances and perform
transactions
Keep track of reservations on flights
◦ Search to find empty seats, cancel/modify reservations
Search engine
◦ Looks for all documents containing a given word
Special Case: Dictionaries
Dictionary = data structure that supports mainly two basic
operations: insert a new item and return an item with a given key
Queries: return information about the set S:
◦ Search (S, k)
◦ Minimum (S), Maximum (S)
◦ Successor (S, x), Predecessor (S, x)