Unit V

The document provides an overview of searching, sorting, and hashing techniques in data structures, specifically focusing on linear and binary search methods. It outlines the steps, benefits, and disadvantages of each search method, along with their time complexities. Additionally, it discusses various sorting algorithms and the distinction between internal and external sorting.


PANIMALAR INSTITUTE OF TECHNOLOGY

DEPARTMENT OF CSE

ACADEMIC YEAR :2018-19


BATCH :2017-2021
YEAR/SEM :II/III
SUB CODE : CS8391
SUB TITLE : DATA STRUCTURES
UNIT-V-NOTES
SEARCHING, SORTING AND HASHING TECHNIQUES

CS8391 1 UNIT V - DATA STRUCTURES


PANIMALAR INSTITUTE OF TECHNOLOGY II YEAR/III SEM B.E-CSE

UNIT V SEARCHING, SORTING AND HASHING TECHNIQUES 9


Searching- Linear Search - Binary Search. Sorting - Bubble sort - Selection sort -
Insertion sort - Shell sort – Radix sort. Hashing- Hash Functions – Separate
Chaining – Open Addressing – Rehashing – Extendible Hashing.

SEARCHING
Searching is the technique of finding a particular element in a given set of elements.
Two most commonly used searching techniques are
 Linear searching
 Binary searching
Searching Algorithms
Searching is the process of determining whether the given value exists in a data
structure or in a storage media.
There are two types of searching algorithms
1. Linear Search
2. Binary Search
LINEAR SEARCH
The linear (or sequential) search algorithm on an array is:
 Start from beginning of an array/list and continues until the item is found or
the entire array/list has been searched.
 Sequentially scan the array, comparing each array item with the searched
value.

If a match is found, return the index of the matched element; otherwise return -1.
Note: linear search can be applied to both sorted and unsorted arrays.


Linear Search is implemented using following steps

Step 1: Read the search element from the user

Step 2: Compare the search element with the first element in the list.

Step 3: If both are matching, then display "Given element found!!!" and terminate
the function

Step 4: If both are not matching, then compare search element with the next
element in the list.

Step 5: Repeat steps 3 and 4 until the search element is compared with the last
element in the list.

Step 6: If the last element in the list also does not match, then display "Element
not found!!!" and terminate the function.

Routine for Linear Search


int linear_search(int a[], int n, int search_item)
{
    for (int i = 0; i < n; i++)
    {
        if (a[i] == search_item)
            return i;   /* return the index of the matched element */
    }
    return -1;          /* element not found */
}
Example: Consider the following list to search an element using linear search

65, 20, 10, 55, 32, 12, 50, 99


Search element : 12
Step 1: Search element (12) is compared with first element(65).
0 1 2 3 4 5 6 7
List 65 20 10 55 32 12 50 99
12
Both are not matching. So move to next element.
Step 2: Search element(12) is compared with next element(20).
0 1 2 3 4 5 6 7
List 65 20 10 55 32 12 50 99
12
Both are not matching. So move to next element.
Step 3: Search element(12) is compared with next element(10)
0 1 2 3 4 5 6 7
List 65 20 10 55 32 12 50 99
12
Both are not matching. So move to next element.
Step 4: Search element(12) is compared with next element(55).
0 1 2 3 4 5 6 7
List 65 20 10 55 32 12 50 99
12
Both are not matching. So move to next element.
Step 5: Search element(12) is compared with next element(32).
0 1 2 3 4 5 6 7
List 65 20 10 55 32 12 50 99
12
Both are not matching. So move to next element.
Step 6: Search element(12) is compared with next element(12)
0 1 2 3 4 5 6 7
List 65 20 10 55 32 12 50 99
12
Both are matching. So we stop comparing and display element found at index 5.

Benefits
 Easy to understand
 Array can be in any order
Disadvantages
 Inefficient for array of N elements
 Examines N/2 elements on average for value in array, N elements for value
not in array
Time complexity of linear search is:
a. Best case = O(1)
b. Average case = n(n+1)/2n = O(n)
c. Worst case = O(n)

PROGRAM
#include <stdio.h>
int main()
{
int a[100], key, i, n, count = 0;
printf("Enter the number of elements in array :");
scanf("%d", &n);

printf("Enter %d numbers :", n);


for (i = 0; i < n; i++)
scanf("%d", &a[i]);

printf("Enter the number to search :");


scanf("%d", &key);


for (i = 0; i < n; i++) {


if (a[i] == key) {
printf("%d is present at location %d.\n", key, i+1);
count++;
}
}
if (count == 0)
printf("%d isn't present in the array.\n", key);
else
printf("%d is present %d times in the array.\n", key, count);
return 0;
}
OUTPUT
Enter the number of elements in array : 5
Enter 5 numbers : 5 6 4 2 4
Enter the number to search : 4
4 is present 2 times in the array

BINARY SEARCH
The most efficient method of searching a sequential list is binary search. This
method is applicable only to the elements of a sorted list.
In this method, the search element is compared with the middle element of the list.
If it matches, then the search is successful and it is terminated. But if it does not
match, the list is divided into two halves.
The first half consists of the 0th element up to the middle element, whereas the
second half consists of the elements after the middle element up to the last element.
All elements in the first half are less than or equal to the middle element, and all
elements in the second half are greater than the middle element. If the element to be
searched is greater than the middle element, then searching is done in the second
half, otherwise in the first half.

The same process of comparing the element to be searched with the center element,
and if it is not found, dividing the elements into two halves, is repeated on the first
or second half. This process is repeated till the required element is found or the
division into halves yields a single element.
Binary search looks for an item in an array/list using divide and conquer strategy.
 The algorithm begins at the middle of the array in a binary search.
 If the item being searched for is less than the item in the middle, then the
item cannot be in the second half of the array.
 The "middle" element of the remaining portion is examined again; the process
continues with each comparison cutting in half the portion of the array where
the item might be.
 Note: Each execution of the recursive method reduces the search space by
about a half.
Let A[mid] be the middle element of array A. Then there are three conditions that
need to be tested while searching the array using this method:
1. If KEY == A[mid], then the desired element is present in the list.
2. Otherwise, if KEY < A[mid], then search the left sub list.
3. Otherwise, if KEY > A[mid], then search the right sub list.
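The three conditions above map directly onto a recursive routine. The following is a hedged sketch, not the routine from these notes (which is iterative and appears below); the function name and the index-returning contract are our own additions for illustration:

```c
/* Recursive sketch of the three-way test: returns the index of key in
   a[low..high], or -1 if it is absent.  Illustrates the remark that each
   call reduces the search space by about a half. */
int binary_search_rec(const int a[], int low, int high, int key)
{
    if (low > high)
        return -1;                      /* empty range: key not present */
    int mid = low + (high - low) / 2;   /* middle of the current range */
    if (a[mid] == key)
        return mid;                     /* condition 1: KEY == A[mid] */
    else if (key < a[mid])
        return binary_search_rec(a, low, mid - 1, key);   /* left sub list */
    else
        return binary_search_rec(a, mid + 1, high, key);  /* right sub list */
}
```

For the sorted list 3 15 17 25 33 47, searching for 47 descends into the right sub list twice before matching, while searching for 12 exhausts the range and returns -1.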


This can be represented as

A[0] . . . A[m-1]      A[m]      A[m+1] . . . A[n-1]
                       Key ?
Search here if                   Search here if
KEY < A[m]                       KEY > A[m]

Routine for Binary Search


int binary_search(int x[], int n, int key)
{
    int flag = 0, low = 0;
    int mid, high = n - 1;
    while (low <= high)
    {
        mid = (low + high) / 2;   /* mid is the center position of the list */
        if (x[mid] == key)        /* x[mid] is the center element */
        {
            flag = 1;
            break;
        }
        if (x[mid] > key)
            high = mid - 1;
        else
            low = mid + 1;
    }
    if (flag == 1)
        return 1;
    else
        return -1;
}
Explanation
1. Initialize low to index of the first element.
2. Initialize high to index of the last element.
3. Repeat through step 6 while low <= high
4. Middle = (low + high)/2
5. If search element is equal to middle element then return the index position
of middle element.
6. If search element is less than middle element then high=middle – 1 should
be done to search the element in first half.
else
Search element is greater than middle element then low=middle + 1
should be done to search the element in second half.
7. If the element is not found, return -1.

Efficiency of Binary Search


 The binary search algorithm is extremely fast compared to an algorithm that
tries all array elements in order.
 About half the array is eliminated from consideration right at the start, then a
quarter, then an eighth of the array, and so forth.
Benefit
 Much more efficient than linear search

Disadvantage
 Binary search requires that array elements be sorted.

Time complexity of Binary search is:

a. Best case: O(1)


In the best case, the item X is the middle element of the array A. A constant number
of comparisons (actually just 1) is required.

b. Average case: log n(log n+1)/2 log n = O(log n)

To find the average case, take the sum over all elements of the product of
number of comparisons required to find each element and the probability of
searching for that element. To simplify the analysis, assume that no item
which is not in A will be searched for, and that the probabilities of searching
for each element are uniform.

The difference between O(log(N)) and O(N) is extremely significant when N


is large: for any practical problem it is crucial that we avoid O(N) searches.
For example, suppose your array contains 2 billion (2 * 10**9) values.
Linear search would involve about a billion comparisons; binary search
would require only 32 comparisons
c. Worst case: O(log n)
In the worst case, the item X does not exist in the array A at all. Through
each recursion or iteration of Binary Search, the size of the admissible range
is halved. This halving can be done ceiling (log n ) times. Thus, ceiling (log
n ) comparisons are required
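The ceiling(log n) bound can be checked by counting the halvings directly. The following is a hedged sketch (the function name is our own, not from these notes): it repeatedly takes the ceiling of n/2 until a single element remains, which is exactly the worst-case number of comparison levels.

```c
/* Counts how many times the admissible range of size n can be halved
   before it shrinks to a single element, i.e. ceiling(log2 n).  For
   n = 2 billion this comes out near the "about 32 comparisons" figure
   quoted above. */
int worst_case_comparisons(long n)
{
    int count = 0;
    while (n > 1) {
        n = (n + 1) / 2;   /* ceiling of n/2: the surviving half */
        count++;
    }
    return count;
}
```

For example, worst_case_comparisons(8) is 3, matching log2(8) exactly.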

Example 1: Consider the following element stored in an array


Example 2: Consider the following element stored in an array


PROGRAM
#include <stdio.h>
#include <conio.h>
void main()
{
int a[25], i, n, Key, flag = 0, low, high, mid;
clrscr();
printf("Enter the number of elements :");
scanf("%d", &n);
printf("Enter the elements :");
for(i = 0; i<n; i++)
scanf("%d",&a[i]);

printf("Enter the key to be searched :");


scanf("%d",&Key);
low = 0;
high = n - 1;
while(low <= high)
{
mid = (low+high)/2;
if(a[mid] == Key)
{
flag = 1;
break;
}
else if(Key<a[mid])
high = mid-1;


else
low = mid + 1;
}
if(flag == 1)
printf("Key element is found");
else
printf("Key element not found");
getch(); }
OUTPUT
Enter the number of elements :6
Enter the elements :3 15 17 25 33 47
Enter the key to be searched : 12
Key element not found

Enter the number of elements :6


Enter the elements :3 15 17 25 33 47
Enter the key to be searched : 47
Key element is found
Comparison between Binary and Linear Search

S.No  Binary Search                                Linear Search
1.    Binary search needs sorted elements.         Linear search does not need sorted elements.
2.    Binary search starts from the middle point.  Linear search starts from the starting point and proceeds to the ending point.
3.    Access is faster.                            Access is slower.
4.    Worst case time complexity is O(log n).      Worst case time complexity is O(n).


SORTING
 Sorting is one of the most important operations performed by computers.
 Sorting is a process of reordering a list of items in either increasing or
decreasing order.

There are two types of sorting,


1. Internal Sorting: Internal sorting is a technique in which the data to be sorted
resides entirely in the main memory of the computer.
Some common internal sorting algorithms include:
 Bubble Sort
 Insertion Sort
 Quick Sort
 Heap Sort
 Radix Sort
 Selection sort
2. External Sorting:
External sorting is a term for a class of sorting algorithms that can handle massive
amounts of data. External sorting is required when the data being sorted does not fit
into the main memory of a computing device (usually RAM) and instead must
reside in slower external memory (usually a hard drive). External sorting
typically uses a hybrid sort-merge strategy. In the sorting phase, chunks of data
small enough to fit in main memory are read, sorted, and written out to a temporary
file. In the merge phase, the sorted sub-files are combined into a single larger file.
External sorting is thus a technique in which the data resides on external or
secondary storage devices such as a hard disk.


One example of external sorting is the external merge sort algorithm, which sorts
chunks that each fit in RAM, then merges the sorted chunks together. We first
divide the file into runs such that the size of each run is small enough to fit into
main memory. Then we sort each run in main memory using an internal sorting
algorithm. Finally, we merge the resulting runs together into successively bigger
runs, until the file is sorted.

Some of external sorting algorithms include:

 Two way merge sort


 Multi way merge sort
 Balanced sort
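The merge phase described above can be sketched in miniature. This is a hedged illustration with a name of our own choosing (merge_runs): the two sorted runs are plain in-memory arrays rather than temporary files, so only the merging logic is shown.

```c
/* Sketch of the merge phase of external merge sort.  A real external
   sort streams runs from temporary files; here the two sorted runs are
   arrays so the merging logic stands alone. */
void merge_runs(const int run1[], int n1,
                const int run2[], int n2,
                int out[])
{
    int i = 0, j = 0, k = 0;
    while (i < n1 && j < n2)   /* take the smaller head element each time */
        out[k++] = (run1[i] <= run2[j]) ? run1[i++] : run2[j++];
    while (i < n1)             /* drain whichever run still has elements */
        out[k++] = run1[i++];
    while (j < n2)
        out[k++] = run2[j++];
}
```

Merging runs {2, 5, 12, 16} and {1, 18, 32} this way yields the single sorted run {1, 2, 5, 12, 16, 18, 32}; an external sort repeats this, producing successively bigger runs.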

BUBBLE SORT

Bubble sort is a simple sorting algorithm that works by repeatedly stepping


through the list to be sorted, comparing each pair of adjacent items and swapping
them if they are in the wrong order. The pass through the list is repeated until no
swaps are needed, which indicates that the list is sorted. The algorithm gets its
name from the way smaller elements "bubble" to the top of the list. Because it only
uses comparisons to operate on elements, it is a comparison sort. Although the
algorithm is simple, most of the other sorting algorithms are more efficient for
large lists. Bubble sort is a stable sort: if two equal elements appear in the list, they
keep their order with respect to each other, because only adjacent out-of-order pairs
are ever swapped.
Steps for Bubble Sort

1. Compare A[0] and A[1]. If A[0] is bigger than A[1], swap the elements.
2. Move to the next element, A[1] (which might now contain the result of a
swap from the previous step), and compare it with A[2]. If A[1] is bigger


than A[2], swap the elements. Do this for every pair of elements until the
end of the list.
3. Do steps 1 and 2 n times.

Routine for Bubble Sort

void bubble_sort(int A[], int n)
{
    int temp;
    for (int i = 0; i < n; i++) {
        /* (n-i-1) ignores elements already placed by earlier passes */
        for (int j = 0; j < n - i - 1; j++) {
            if (A[j] > A[j + 1]) {
                /* swap the adjacent pair */
                temp = A[j];
                A[j] = A[j + 1];
                A[j + 1] = temp;
            }
        }
    }
}

Complexity Analysis of Bubble Sort

In Bubble Sort, n-1 comparisons will be done in the 1st pass, n-2 in 2nd pass, n-3
in 3rd pass and so on. So the total number of comparisons will be,

(n-1) + (n-2) + (n-3) + ... + 3 + 2 + 1

Sum = n(n-1)/2

i.e. O(n²)

Hence the time complexity of Bubble Sort is O(n²).

The main advantage of Bubble Sort is the simplicity of the algorithm.

The space complexity for Bubble Sort is O(1), because only a single additional
memory space is required i.e. for temp variable. Also, the best case time
complexity will be O(n), it is when the list is already sorted.

Following are the Time and Space complexities for the Bubble Sort algorithm.

Worst Case Time Complexity [Big-O]: O(n²)
Best Case Time Complexity [Big-omega]: O(n)
Average Time Complexity [Big-theta]: O(n²)
Space Complexity: O(1)
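The O(n) best case quoted above depends on stopping as soon as a full pass makes no swaps; the routine given earlier always runs every pass. The following is a hedged sketch of the early-exit variant (the swapped flag is our addition, not part of the routine above):

```c
/* Bubble sort with an early exit: a pass that performs no swaps proves
   the array is already sorted, which is what gives the O(n) best case
   on an already sorted input. */
void bubble_sort_early_exit(int a[], int n)
{
    for (int i = 0; i < n - 1; i++) {
        int swapped = 0;
        for (int j = 0; j < n - i - 1; j++) {
            if (a[j] > a[j + 1]) {
                int temp = a[j];       /* swap the adjacent pair */
                a[j] = a[j + 1];
                a[j + 1] = temp;
                swapped = 1;
            }
        }
        if (!swapped)   /* no swaps in this pass: stop early */
            break;
    }
}
```

On a sorted input the first pass makes no swaps and the function returns after n-1 comparisons.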
Example 1: Sort the array A[]={7, 3, 1, 4, 2} using the bubble sort algorithm.
A[1] A[2] A[3] A[4] A[5]
I=5 7 3 1 4 2
J=1 3 7 1 4 2
J=2 3 1 7 4 2
J=3 3 1 4 7 2
J=4 3 1 4 2 7
A[1] A[2] A[3] A[4] A[5]
I=4 3 1 4 2 7
J=1 1 3 4 2 7
J=2 1 3 4 2 7
J=3 1 3 2 4 7
A[1] A[2] A[3] A[4] A[5]
I=3 1 3 2 4 7
J=1 1 3 2 4 7
J=2 1 2 3 4 7
A[1] A[2] A[3] A[4] A[5]
I=2 1 2 3 4 7
J=1 1 2 3 4 7
A[1] A[2] A[3] A[4] A[5]
I=1 1 2 3 4 7


Example 2:
Let us take the array of numbers "5 1 4 2 8", and sort the array from lowest number
to greatest number using bubble sort. In each step, elements written in bold are
being compared. Three passes will be required.

First Pass:
(5 1 4 2 8) → (1 5 4 2 8), Here, the algorithm compares the first two elements, and
swaps since 5 > 1.

(1 5 4 2 8) → (1 4 5 2 8), Swap since 5 > 4

(1 4 5 2 8) → (1 4 2 5 8), Swap since 5 > 2

(1 4 2 5 8) → (1 4 2 5 8), Now, since these elements are already in order (8 > 5),
the algorithm does not swap them.
Second Pass:
(1 4 2 5 8) → (1 4 2 5 8)
(1 4 2 5 8) → (1 2 4 5 8), Swap since 4 > 2
(1 2 4 5 8) → (1 2 4 5 8)
(1 2 4 5 8) → (1 2 4 5 8)

Now, the array is already sorted, but our algorithm does not know if it is
completed. The algorithm needs one whole pass without any swap to know it is
sorted.
Third Pass:
(1 2 4 5 8) → (1 2 4 5 8)
(1 2 4 5 8) → (1 2 4 5 8)
(1 2 4 5 8) → (1 2 4 5 8)
(1 2 4 5 8) → (1 2 4 5 8)


Example 3: A [ ] = { 7, 4, 5, 2}

In step 1, 7 is compared with 4. Since 7 > 4, 7 is moved ahead of 4. Since all the
other elements are of a lesser value than 7, 7 is moved to the end of the array.

Now the array is A[] = {4,5,2,7}.

In step 2, 4 is compared with 5. Since 5>4 and both 4 and 5 are in ascending order,
these elements are not swapped. However, when 5 is compared with 2, 5>2 and these
elements are in descending order. Therefore, 5 and 2 are swapped.

Now the array is A[] = {4,2,5,7}.

In step 3, the element 4 is compared with 2. Since 4 > 2 and the elements are in
descending order, 4 and 2 are swapped.
The sorted array is A[] = {2,4,5,7}.

Program for Bubble Sort


#include <stdio.h>
void bubbleSort(int arr[], int n)


{
int i, j, temp;
for(i = 0; i < n; i++)
{ for(j = 0; j < n-i-1; j++)
{ if( arr[j] > arr[j+1])
{
// swap the elements
temp = arr[j];
arr[j] = arr[j+1];
arr[j+1] = temp;
} } }
// print the sorted array
printf("Sorted Array: ");
for(i = 0; i < n; i++)
{
printf("%d ", arr[i]);
}
}
int main()
{
int arr[100], i, n, step, temp;
printf("Enter the number of elements to be sorted: ");
scanf("%d", &n);
for(i = 0; i < n; i++)
{
printf("Enter element no. %d: ", i+1);
scanf("%d", &arr[i]);

}
bubbleSort(arr, n);
return 0;
}
OUTPUT
Enter the number of elements to be sorted: 5
Enter element no. 1: 14
Enter element no. 2: -5
Enter element no. 3: 45
Enter element no. 4: 69
Enter element no. 5: 100
Sorted Array: -5 14 45 69 100

SELECTION SORT
Selection Sort algorithm is used to arrange a list of elements in a particular order
(Ascending or Descending).
In selection sort, the first element in the list is selected and it is compared
repeatedly with remaining all the elements in the list. If any element is smaller than
the selected element (for ascending order), then both are swapped. Then we select
the element at second position in the list and it is compared with remaining all
elements in the list. If any element is smaller than the selected element, then both
are swapped. This procedure is repeated till the entire list is sorted.

Step by Step Process


The selection sort algorithm is performed using following steps...
Step 1: Select the first element of the list (i.e., Element at first position in the list).
Step 2: Compare the selected element with all other elements in the list.


Step 3: For every comparison, if any element is smaller than the selected element
(for ascending order), then these two are swapped.
Step 4: Repeat the same procedure with next position in the list till the entire list is
sorted.
Routine for Selection Sort
void selectionSort(int a[], int n)
{
    for (int i = 0; i < n - 1; i++)
    {
        int small = i;                /* index of smallest element so far */
        for (int j = i + 1; j < n; j++)
            if (a[j] < a[small])
                small = j;
        int temp = a[small];          /* swap smallest into position i */
        a[small] = a[i];
        a[i] = temp;
    }
}
Complexity Analysis of Selection Sort
Selection sort requires two nested for loops in selectionSort: the outer loop picks
each position in turn, and the inner loop finds the index of the minimum element in
the remaining unsorted portion.
Hence for a given input size of n, the following are the time and space complexities
of the selection sort algorithm:
Worst Case Time Complexity [Big-O]: O(n²)
Best Case Time Complexity [Big-omega]: O(n²)
Average Time Complexity [Big-theta]: O(n²)
Space Complexity: O(1)


Example 1: A[] = {3, 6, 1, 8, 4, 5}

In the first pass, the smallest element will be 1, so it will be placed at the first position.
Then leaving the first element, next smallest element will be searched, from the
remaining elements. We will get 3 as the smallest, so it will be then placed at the second
position.
Then leaving 1 and 3(because they are at the correct position), we will search for the
next smallest element from the rest of the elements and put it at third position and keep
doing this until array is sorted.
Example 2: a[]={ 15, 20,10,30,50,18,5,45}

Iteration 1: Select the element at the first position in the list, compare it with all the elements
after it, and whenever a smaller element than the selected element is found, swap those two
elements.


List after 1st Iteration

Iteration 2: Select the element at the second position in the list, compare it with all the
elements after it, and whenever a smaller element than the selected element is found, swap
those two elements.
List after 2nd Iteration

Iteration 3: Select the element at the third position in the list, compare it with all the elements
after it, and whenever a smaller element than the selected element is found, swap those two
elements.
List after 3rd Iteration

Iteration 4: Select the element at the fourth position in the list, compare it with all the
elements after it, and whenever a smaller element than the selected element is found, swap
those two elements.
List after 4th Iteration

Iteration 5: Select the element at the fifth position in the list, compare it with all the elements
after it, and whenever a smaller element than the selected element is found, swap those two
elements.

List after 5th Iteration

Iteration 6: Select the element at the sixth position in the list, compare it with all the elements
after it, and whenever a smaller element than the selected element is found, swap those two
elements.
List after 6th Iteration


Iteration 7: Select the element at the seventh position in the list, compare it with all the
elements after it, and whenever a smaller element than the selected element is found, swap
those two elements.
List after 7th Iteration (Final Sorted List of elements)

Program for Selection Sort


#include <stdio.h>
int main()
{
int array[100], n, c, d, position, swap;

printf("Enter number of elements :");


scanf("%d", &n);

printf("Enter %d integers :", n);

for ( c = 0 ; c < n ; c++ )


scanf("%d", &array[c]);

for ( c = 0 ; c < ( n - 1 ) ; c++ )


{
position = c;

for ( d = c + 1 ; d < n ; d++ )


{
if ( array[position] > array[d] )
position = d;
}
if ( position != c )


{
swap = array[c];
array[c] = array[position];
array[position] = swap;
}
}
printf("Sorted list in ascending order:\n");
for ( c = 0 ; c < n ; c++ )
printf("%d ", array[c]);
return 0;
}
OUTPUT
Enter number of elements : 7
Enter 7 integers : 23 12 3 -7 4 598 38
Sorted list in ascending order:
-7 3 4 12 23 38 598
INSERTION SORT
Sorting is the process of arranging a list of elements in a particular order
(ascending or descending). The insertion sort algorithm arranges a list of elements
in a particular order: in each iteration it moves one element from the unsorted
portion to the sorted portion, until all the elements in the list are sorted.
Basic Idea:
Find the location for an element and move all others up, and insert the element.
The process involved in insertion sort is as follows:
1. The left most value can be said to be sorted relative to itself. Thus, we don’t
need to do anything.


2. Check to see if the second value is smaller than the first one. If it is, swap these
two values. The first two values are now relatively sorted.

3. Next, we need to insert the third value in to the relatively sorted portion so that
after insertion, the portion will still be relatively sorted.

4. Remove the third value first. Slide the second value to make room for
insertion. Insert the value in the appropriate position.

5. Now the first three are relatively sorted.

6. Do the same for the remaining items in the list.

Routine for Insertion Sort


void insertionSort(int A[], int n)
{
    int tmp;
    for (int i = 1; i < n; i++)
    {
        /* slide A[i] left until it reaches its correct position */
        for (int j = i; j > 0 && A[j] < A[j-1]; j--)
        {
            tmp = A[j];
            A[j] = A[j-1];
            A[j-1] = tmp;
        }
    }
}
Example 1:


Example 2:

Advantages of Insertion Sort:


1. Simple to Implement
2. This method is efficient when we want to sort small number of elements and this
method has excellent performance on almost sorted list of elements


3. More efficient than most other simple O(n²) algorithms such as selection sort or
bubble sort
4. It is called in-place sorting algorithm. The in-place sorting algorithm is an
algorithm in which the input is overwritten by output and to execute the sorting
method it does not require any more additional space.

Complexity Analysis of Insertion Sort

Insertion sort is efficient in practice because its inner loop stops as soon as the
element being inserted reaches its correct position, avoiding extra steps once the
sorted portion is in order.

Even so, if we provide an already sorted array to the insertion sort algorithm, it will
still execute the outer for loop, thereby requiring n steps to scan an already sorted
array of n elements, which makes its best case time complexity a linear function
of n.

Worst Case Time Complexity [Big-O]: O(n²)
Best Case Time Complexity [Big-omega]: O(n)
Average Time Complexity [Big-theta]: O(n²)
Space Complexity: O(1)

Program for Insertion Sort


#include<stdio.h>
#include<conio.h>
void main()
{
int size, i, j, temp, list[100];

printf("Enter the size of the list: ");


scanf("%d", &size);
printf("Enter %d integer values: ", size);
for (i = 0; i < size; i++)
scanf("%d", &list[i]);
//Insertion sort logic
for (i = 1; i < size; i++) {
temp = list[i];
j = i - 1;
while ((j >= 0) && (temp < list[j])) {
list[j + 1] = list[j];
j = j - 1;
}
list[j + 1] = temp;
}

printf("List after Sorting is: ");


for (i = 0; i < size; i++)
printf(" %d", list[i]);
getch();
}
OUTPUT
Enter the size of the list: 5
Enter 5 integer values: 23 -5 5 67 89
List after Sorting is: -5 5 23 67 89


SHELL SORT
Shell sort, named after its inventor Donald Shell, is a highly efficient sorting
algorithm based on the insertion sort algorithm. It avoids the large shifts that arise
in insertion sort when a small value sits at the far right and has to be moved to the
far left.
The algorithm first uses insertion sort on widely spaced elements, and then sorts the
less widely spaced elements. This spacing is termed the interval, and in one
common scheme it is calculated based on Knuth's formula:

Knuth's Formula

h = h * 3 + 1
where h is the interval, with initial value 1.

This algorithm is quite efficient for medium-sized data sets; its average and worst
case complexity depend on the gap sequence used, and with the simple halving
sequence used below the worst case is O(n²), where n is the number of items.
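Knuth's formula grows h through the sequence 1, 4, 13, 40, ...; the sort then steps back down through that sequence. The following is a hedged sketch of computing the starting gap (the function name is our own; the routine below uses the simpler halving sequence N/2, N/4, ..., 1 instead):

```c
/* Starting gap from Knuth's formula h = 3*h + 1 (1, 4, 13, 40, ...).
   A shell sort using this sequence would then shrink the gap with
   h = (h - 1) / 3 after each pass. */
int knuth_start_gap(int n)
{
    int h = 1;
    while (h * 3 + 1 < n)   /* grow h while the next gap still fits */
        h = h * 3 + 1;
    return h;
}
```

For the 8-element example that follows, this formula would start with a gap of 4, the same first gap that the halving sequence (8/2) produces.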

Routine for Shell Sort


void shellsort(int A[], int N)
{
    int i, j, k, temp;
    for (k = N/2; k > 0; k = k/2)      /* gap shrinks by half each pass */
        for (i = k; i < N; i++)
        {
            temp = A[i];
            for (j = i; j >= k && A[j-k] > temp; j = j-k)
            {
                A[j] = A[j-k];
            }
            A[j] = temp;
        }
}
Example 1:
Shell sort is quite similar to that of insertion sort with the only difference that in
shell sort, higher values of k are considered. Whereas insertion sort assumes k to
be 1. If k=4, then every 4th element is compared, if it is 3 then every 3rd element
is compared. Thus k=m means every mth element gets compared with each other
and is swapped.
The value of k is decremented after each scan. The file is sorted when k becomes
1 and swapping is performed for k = 1. The sequence of decremented values of k
may be generated by a gap sequence, which gives the successive values of k: say
k=4, then k=2, and finally k=1 (k=3 is skipped), or (k=8, k=4, k=2, k=1), or (k=4,
k=3, k=2, k=1).
If k is taken as 1 only, then it is no more shell sort, it becomes insertion sort.


Example 2: Shell's increment will be (n/2)

18 32 12 5 38 33 16 2
8 numbers are to be sorted, so Shell's increment will be (n/2):
(8/2) = 4
Increment by 4: each element is compared with the element 4 positions away
18 32 12 5 38 33 16 2
1. Only look at 18 and 38 and sort in order; 18 and 38 stay at their current positions
because they are in order.
2. Only look at 32 and 33 and sort in order; 32 and 33 stay at their current positions
because they are in order.
3. Only look at 12 and 16 and sort in order; 12 and 16 stay at their current positions
because they are in order.


4. Only look at 5 and 2 and sort in order; 2 and 5 need to be switched to be in
order.
Resulting numbers after increment 4 pass:
18 32 12 2 38 33 16 5
(4/2)  (2) = 2
Increment by 2: 1 2
18 32 12 2 38 33 16 5
1. Look at 18, 12, 38, 16 and sort them in their appropriate location:
12 32 16 2 18 33 38 5
2. Look at 32, 2, 33, 5 and sort them in their appropriate location:
12 2 16 5 18 32 38 33
(2/2) (1) = 1
increment by 1: 1
12 2 16 5 18 32 38 33
2 5 12 16 18 32 33 38
The last increment or phase of Shell sort is basically an Insertion Sort algorithm.
Time Complexity
Time complexity of the above implementation of shell sort is O(n^2). In the above
implementation the gap is reduced by half in every iteration. There are many other
ways to reduce the gap which lead to better time complexity.
Worst Case Analysis: O(n^2)
Best Case Analysis: O(n log n)
Average Case Analysis: O(n^1.5)
Program for Shell Sort
#include<stdio.h>
#include<conio.h>
void shell(int a[],int n);
int i,j,k,n,temp,a[25];
void main()
{
clrscr();
printf("\n Enter the limit:");
scanf("%d",&n);
printf("\n Enter the elements :");
for(i=0;i<n;i++)
scanf("%d",&a[i]);
printf("\n SHELL SORT");
printf("\n **********");
for(k=n/2; k>0; k=k/2)
for(i=k; i<n; i++)
{
temp=a[i];
for(j=i; j>=k && a[j-k]>temp; j=j-k)
{
a[j]=a[j-k];
}
a[j]=temp;
printf("\n k= %d\t",k);
for(int p=0; p<n; p++)
printf("%d\t",a[p]);
printf("\n");
}
printf("\n The Sorted elements are:");
for(i=0;i<n;i++)
printf("%d\t",a[i]);
getch();
}
OUTPUT
Enter the limit:9
Enter the elements :81 45 67 23 -9 450 56 6 78
SHELL SORT
**********
k=4 -9 45 67 23 81 450 56 6 78
k=4 -9 45 67 23 81 450 56 6 78
k=4 -9 45 56 23 81 450 67 6 78
k=4 -9 45 56 6 81 450 67 23 78
k=4 -9 45 56 6 78 450 67 23 81
k=2 -9 45 56 6 78 450 67 23 81
k=2 -9 6 56 45 78 450 67 23 81
k=2 -9 6 56 45 78 450 67 23 81
k=2 -9 6 56 45 78 450 67 23 81
k=2 -9 6 56 45 67 450 78 23 81
k=2 -9 6 56 23 67 45 78 450 81
k=2 -9 6 56 23 67 45 78 450 81
k=1 -9 6 56 23 67 45 78 450 81
k=1 -9 6 56 23 67 45 78 450 81
k=1 -9 6 23 56 67 45 78 450 81
k=1 -9 6 23 56 67 45 78 450 81
k=1 -9 6 23 45 56 67 78 450 81
k=1 -9 6 23 45 56 67 78 450 81
k=1 -9 6 23 45 56 67 78 450 81
k=1 -9 6 23 45 56 67 78 81 450
The Sorted elements are: -9 6 23 45 56 67 78 81 450

RADIX SORT / BUCKET SORT
Radix sort was developed for sorting large integers, but it treats an integer as a
string of digits, so it is really a string sorting algorithm.
Radix sort is a non-comparative sorting algorithm that sorts data with keys by
grouping keys by the individual digits which share the same significant position
and value.
Radix Sort arranges the elements in order by comparing the digits of the numbers.
LSD radix sort : Least-significant-digit-first radix sort.
LSD radix sorts process the integer representations starting from the least
significant digit and move the processing towards the most significant digit.
MSD radix sort : Most-significant-digit-first radix sort.
MSD radix sort starts processing the keys from the most significant digit, leftmost
digit, to the least significant digit, rightmost digit. This sequence is opposite that of
least significant digit (LSD) radix sorts.
Routine for Radix Sort
void radix_sort(int arr[], int n)
{
    int bucket[10][10], buck[10];
    int i, j, k, l, num = 0, div = 1, large, passes;
    large = arr[0];
    for(i = 0; i < n; i++)
    {
        if(arr[i] > large)
            large = arr[i];
    }
    while(large > 0)   /* count the digits in the largest key */
    {
        num++;
        large = large / 10;
    }
    for(passes = 0; passes < num; passes++)
    {
        for(k = 0; k < 10; k++)
            buck[k] = 0;
        for(i = 0; i < n; i++)
        {
            l = (arr[i] / div) % 10;
            bucket[l][buck[l]++] = arr[i];
        }
        i = 0;
        for(k = 0; k < 10; k++)
            for(j = 0; j < buck[k]; j++)
                arr[i++] = bucket[k][j];
        div *= 10;
    }
}
Steps to perform Radix Sort
Each key is first figuratively dropped into one level of buckets corresponding to
the value of the rightmost digit. Each bucket preserves the original order of the
keys as the keys are dropped into the bucket. There is a one-to-one correspondence
between the number of buckets and the number of values that can be represented
by a digit. Then, the process repeats with the next neighboring digit until there are
no more digits to process. In other words:
1. Take the least significant digit of each key.
2. Group the keys based on that digit, but otherwise keep the original order of
keys.
3. Repeat the grouping process with each more significant digit.
The sort in step 2 is usually done using bucket sort or counting sort, which are
efficient in this case since there are usually only a small number of digits.
Example 1: array = [88, 410, 1772, 20]
Radix sort relies on the positional notation of integers, as shown here:
First, the array is divided into buckets based on the value of the least significant
digit: the ones digit.
These buckets are then emptied in order, resulting in the following partially-sorted
array:
array = [410, 20, 1772, 88]
Next, repeat this procedure for the tens digit:
The relative order of the elements didn’t change this time, but you’ve still got more
digits to inspect.
The next digit to consider is the hundreds digit:
For values that have no hundreds position (or any other position without a value),
the digit will be assumed to be zero.
Reassembling the array based on these buckets gives the following
array = [20, 88, 410, 1772]
Finally, you need to consider the thousands digit:
Reassembling the array from these buckets leads to the final sorted array:
array = [20, 88, 410, 1772]
When multiple numbers end up in the same bucket, their relative ordering doesn’t
change. For example, in the zero bucket for the hundreds position, 20 comes before
88. This is because the previous step put 20 in a lower bucket than 80, so 20 ended
up before 88 in the array.
Example 2:
Original, unsorted list:
170, 45, 75, 90, 802, 24, 2, 66
Sorting by least significant digit (1s place) gives:
170, 90, 802, 2, 24, 45, 75, 66
Sorting by next digit (10s place) gives:
802, 2, 24, 45, 66, 170, 75, 90
Sorting by most significant digit (100s place) gives:
2, 24, 45, 66, 75, 90, 170, 802
It is important to realize that each of the above steps requires just a
single pass over the data, since each item can be placed in its correct
bucket without having to be compared with other items.
Some LSD radix sort implementations allocate space for buckets by
first counting the number of keys that belong in each bucket before
moving keys into those buckets. The number of times that each digit
occurs is stored in an array. Consider the previous list of keys viewed
in a different way:
170, 045, 075,090, 002, 024, 802, 066
The first counting pass starts on the least significant digit of each key,
producing an array of bucket sizes:
2 (bucket size for digits of 0: 170, 090)
2 (bucket size for digits of 2: 002, 802)
1 (bucket size for digits of 4: 024)
2 (bucket size for digits of 5: 045, 075)
1 (bucket size for digits of 6: 066)
A second counting pass on the next more significant digit of each key will
produce an array of bucket sizes:
2 (bucket size for digits of 0: 002, 802)
1 (bucket size for digits of 2: 024)
1 (bucket size for digits of 4: 045)
1 (bucket size for digits of 6: 066)
2 (bucket size for digits of 7: 170, 075)
1 (bucket size for digits of 9: 090)
A third and final counting pass on the most significant digit of each key will
produce an array of bucket sizes:
6 (bucket size for digits of 0: 002, 024, 045, 066, 075, 090)
1 (bucket size for digits of 1: 170)
1 (bucket size for digits of 8: 802)
Program for Radix Sort
#include<stdio.h>
// Function to find largest element
int largest(int a[], int n)
{ int large = a[0], i;
for(i = 1; i < n; i++)
{
if(large < a[i])
large = a[i];
}
return large;
}
// Function to perform sorting
void RadixSort(int a[], int n)
{ int bucket[10][10], bucket_count[10];
int i, j, k, remainder, NOP=0, divisor=1, large, pass;
large = largest(a, n);
printf("The large element %d\n",large);
while(large > 0)
{ NOP++;
large/=10;
}
for(pass = 0; pass < NOP; pass++)
{ for(i = 0; i < 10; i++)
{
bucket_count[i] = 0;
}
for(i = 0; i < n; i++)
{
remainder = (a[i] / divisor) % 10;
bucket[remainder][bucket_count[remainder]] = a[i];
bucket_count[remainder] += 1;
}
i = 0;
for(k = 0; k < 10; k++)
{ for(j = 0; j < bucket_count[k]; j++)
{
a[i] = bucket[k][j];
i++;
}
}
divisor *= 10;
for(i = 0; i < n; i++)
printf("%d ",a[i]);
printf("\n");
}
}
int main()
{ int i, n, a[10];
printf("Enter the number of elements :: ");
scanf("%d",&n);
printf("Enter the elements :: ");
for(i = 0; i < n; i++)
{ scanf("%d",&a[i]);
}
RadixSort(a,n);
printf("The sorted elements are :: ");
for(i = 0; i < n; i++)
printf("%d ",a[i]);
printf("\n");
return 0;
}
OUTPUT
Enter the number of elements :: 7
Enter the elements :: 21 32 11 58 98 45 21
The large element 98
21 11 21 32 45 58 98
11 21 21 32 45 58 98
The sorted elements are :: 11 21 21 32 45 58 98
ADDITIONAL SORTING TECHNIQUES
QUICK SORT
Quick Sort is also based on the concept of Divide and Conquer, just like merge
sort. But in quick sort all the heavy lifting (major work) is done while dividing the
array into subarrays, while in case of merge sort, all the real work happens during
merging the subarrays. In case of quick sort, the combine step does absolutely
nothing.
It is also called partition-exchange sort. This algorithm divides the list into three
main parts:
 Elements less than the Pivot element
 Pivot element
 Elements greater than the pivot element
Pivot element can be any element from the array, it can be the first element, the
last element or any random element. In this tutorial, we will take the rightmost
element or the last element as pivot.
For example: In the array {52, 37, 63, 14, 17, 8, 6, 25}, we take 25 as pivot. So
after the first pass, the list will be changed like this: {6 8 17 14 25 63 37 52}
Hence after the first pass, the pivot will be set at its position, with all the elements
smaller than it on its left and all the elements larger than it on its right. Now 6 8 17 14
and 63 37 52 are considered as two separate subarrays, and the same recursive logic
will be applied on them, and we will keep doing this until the complete array is
sorted.
Steps to perform Quick Sort
Following are the steps involved in quick sort algorithm:
1. After selecting an element as pivot, which is the last index of the array in our
case, we divide the array for the first time.
2. In quick sort, we call this partitioning. It is not simple breaking down of array
into 2 subarrays, but in case of partitioning, the array elements are so positioned
that all the elements smaller than the pivot will be on the left side of the pivot and
all the elements greater than the pivot will be on the right side of it.
3. And the pivot element will be at its final sorted position.
4. The elements to the left and right, may not be sorted.
5. Then we pick subarrays, elements on the left of pivot and elements on the right of
pivot, and we perform partitioning on them by choosing a pivot in the subarrays.
Example 1: Array {9, 7, 5, 11, 12, 2, 14, 3, 10, 6}
In step 1, we select the last element as the pivot, which is 6 in this case, and call for
partitioning, hence re-arranging the array in such a way that 6 will be placed in its
final position and to its left will be all the elements less than it and to its right, we
will have all the elements greater than it.
Then we pick the subarray on the left and the subarray on the right and select a pivot
for them. In the above diagram, we chose 3 as pivot for the left subarray and 11 as
pivot for the right subarray, and we again call for partitioning.
Program for Quick Sort
#include <stdio.h>
void swap(int* a, int* b)
{
    int t = *a;
    *a = *b;
    *b = t;
}
// a[] is the array, low is the starting index and high is the last index
int partition(int a[], int low, int high)
{
    int pivot = a[high];        // selecting last element as pivot
    int i = (low - 1);          // index of smaller element
    for (int j = low; j <= high - 1; j++)
    {
        // If current element is smaller than or equal to pivot
        if (a[j] <= pivot)
        {
            i++;                // increment index of smaller element
            swap(&a[i], &a[j]);
        }
    }
    swap(&a[i + 1], &a[high]);
    return (i + 1);
}
// p is the starting index, that is 0, and r is the last index of the array
void quicksort(int a[], int p, int r)
{
    if (p < r)
    {
        int q;
        q = partition(a, p, r);
        quicksort(a, p, q - 1);
        quicksort(a, q + 1, r);
    }
}
// function to print the array
void printArray(int a[], int size)
{
    int i;
    for (i = 0; i < size; i++)
        printf("%d ", a[i]);
    printf("\n");
}
int main()
{
    int arr[] = {9, 7, 5, 11, 12, 2, 14, 3, 10, 6};
    int n = sizeof(arr)/sizeof(arr[0]);
    // call quicksort function
    quicksort(arr, 0, n - 1);
    printf("Sorted array: \n");
    printArray(arr, n);
    return 0;
}
Complexity Analysis of Quick Sort
For an array in which partitioning leads to completely unbalanced subarrays, for
example when all the elements are greater than the pivot and the left side is empty,
and this imbalance repeats in every partition, the running time is the worst case,
which is O(n^2).
Whereas if partitioning leads to almost equal subarrays, then the running time is
the best, with time complexity O(n*log n).
Worst Case Time Complexity [Big-O]: O(n^2)
Best Case Time Complexity [Big-omega]: O(n*log n)
Average Time Complexity [Big-theta]: O(n*log n)
Space Complexity: O(log n) (for the recursion stack)
As we know now, if the subarrays produced after partitioning are unbalanced,
quick sort will take more time to finish. If someone knows that you pick the last
index as pivot all the time, they can intentionally provide you with an array which
will result in worst-case running time for quick sort.
To avoid this, you can pick a random pivot element too. It won't make any difference
in the algorithm, as all you need to do is pick a random element from the array,
swap it with the element at the last index, make it the pivot and carry on with quick
sort.
The additional space required by quick sort is very small, only O(log n) for the
recursion stack.
Quick sort is not a stable sorting technique, so it might change the relative order
of two equal elements in the list while sorting.
MERGE SORT
In Merge Sort, the given unsorted array with n elements is divided into n
subarrays, each having one element, because a single element is always sorted in
itself. Then, it repeatedly merges these subarrays, to produce new sorted subarrays,
and in the end, one complete sorted array is produced.
The concept of Divide and Conquer involves three steps:
1. DIVIDE: Partition the n-element sequence to be sorted into two
subsequences of n/2 elements each.
2. CONQUER: Sort the two subsequences recursively using the mergesort.
3. COMBINE: Merge the two sorted subsequences of size n/2 each to
produce the sorted sequence consisting of n elements.
Routine for Merge Sort
void merge(int *Arr, int start, int mid, int end) {
    // create a temp array
    int temp[end - start + 1];

    // crawlers for both intervals and for temp
    int i = start, j = mid + 1, k = 0;

    // traverse both arrays and in each iteration add smaller of both elements in temp
    while(i <= mid && j <= end) {
        if(Arr[i] <= Arr[j]) {
            temp[k] = Arr[i];
            k += 1; i += 1;
        }
        else {
            temp[k] = Arr[j];
            k += 1; j += 1;
        }
    }

    // add elements left in the first interval
    while(i <= mid) {
        temp[k] = Arr[i];
        k += 1; i += 1;
    }

    // add elements left in the second interval
    while(j <= end) {
        temp[k] = Arr[j];
        k += 1; j += 1;
    }

    // copy temp to original interval
    for(i = start; i <= end; i += 1) {
        Arr[i] = temp[i - start];
    }
}

// Arr is an array of integer type
// start and end are the starting and ending index of current interval of Arr
void mergeSort(int *Arr, int start, int end) {
    if(start < end) {
        int mid = (start + end) / 2;
        mergeSort(Arr, start, mid);
        mergeSort(Arr, mid+1, end);
        merge(Arr, start, mid, end);
    }
}
Example 1: Array {14, 7, 3, 12, 9, 11, 6, 12}
In merge sort the following steps are performed:
1. We take a variable p and store the starting index of our array in this. And we
take another variable r and store the last index of array in it.
2. Then we find the middle of the array using the formula (p + r)/2 and mark
the middle index as q, and break the array into two subarrays, from p to q
and from q + 1 to r index.
3. Then we divide these 2 subarrays again, just like we divided our main array
and this continues.
4. Once we have divided the main array into subarrays with single elements,
then we start merging the subarrays.
Program for Merge Sort
#include<stdio.h>
void mergesort(int a[],int i,int j);
void merge(int a[],int i1,int j1,int i2,int j2);

int main()
{
    int a[30],n,i;
    printf("Enter no of elements:");
    scanf("%d",&n);
    printf("Enter array elements:");
    for(i=0;i<n;i++)
        scanf("%d",&a[i]);
    mergesort(a,0,n-1);
    printf("\nSorted array is :");
    for(i=0;i<n;i++)
        printf("%d ",a[i]);
    return 0;
}
void mergesort(int a[],int i,int j)
{
    int mid;
    if(i<j)
    {
        mid=(i+j)/2;
        mergesort(a,i,mid);      //left recursion
        mergesort(a,mid+1,j);    //right recursion
        merge(a,i,mid,mid+1,j);  //merging of two sorted sub-arrays
    }
}
void merge(int a[],int i1,int j1,int i2,int j2)
{
    int temp[50];    //array used for merging
    int i,j,k;
    i=i1;            //beginning of the first list
    j=i2;            //beginning of the second list
    k=0;
    while(i<=j1 && j<=j2)    //while elements in both lists
    {
        if(a[i]<a[j])
            temp[k++]=a[i++];
        else
            temp[k++]=a[j++];
    }
    while(i<=j1)    //copy remaining elements of the first list
        temp[k++]=a[i++];
    while(j<=j2)    //copy remaining elements of the second list
        temp[k++]=a[j++];
    //Transfer elements from temp[] back to a[]
    for(i=i1,j=0;i<=j2;i++,j++)
        a[i]=temp[j];
}
OUTPUT
Enter no of elements: 7
Enter array elements: 23 45 7 -6 90 1 55
Sorted array is : -6 1 7 23 45 55 90
Complexity Analysis of Merge Sort
Merge Sort is quite fast, and has a time complexity of O(n*log n). It is also a stable
sort, which means the "equal" elements are ordered in the same order in the sorted
list.
The running time for merge sort is O(n*log n).
As we have already learned in Binary Search, whenever we divide a number into
half in every step, it can be represented using a logarithmic function, which is
log n, and the number of steps can be represented by log n + 1 (at most).
Also, we perform a single step operation to find out the middle of any subarray, i.e.
O(1).
And to merge the subarrays, made by dividing the original array of n elements, a
running time of O(n) will be required.
Hence the total time for mergeSort function will become n(log n + 1), which gives
us a time complexity of O(n*log n).
Worst Case Time Complexity [Big-O]: O(n*log n)
Best Case Time Complexity [Big-omega]: O(n*log n)
Average Time Complexity [Big-theta]: O(n*log n)
Space Complexity: O(n)
Time complexity of Merge Sort is O(n*Log n) in all the 3 cases (worst, average
and best) as merge sort always divides the array in two halves and takes linear time
to merge two halves.
It requires an equal amount of additional space as the unsorted array. Hence it is
not at all recommended for sorting very large unsorted arrays in memory-constrained
settings.
It is the best sorting technique used for sorting Linked Lists.
HEAP SORT
Heap Sort is one of the best sorting methods being in-place and with no quadratic
worst-case running time. Heap sort involves building a Heap data structure from
the given array and then utilizing the Heap to sort the array.
The node's parent, left, and right child can be expressed in terms of the node's
index i: in a 1-based array the parent is at i/2 and the children at 2i and 2i+1,
while in a 0-based array the parent is at (i-1)/2 and the children at 2i+1 and 2i+2.
Heap:
Heap is a special tree-based data structure that satisfies the following special heap
properties:
Shape Property: Heap data structure is always a Complete Binary Tree, which
means all levels of the tree are fully filled, except possibly the last level, which is
filled from left to right.
Heap Property:
All nodes are either greater than or equal to or less than or equal to each of its
children.
There are two kinds of binary heaps: Max-heaps and Min-heaps. Both types of
heaps satisfy an ordering property.
Max-heap Property
If the parent nodes are greater than their child nodes, heap is called a Max-Heap
If A is an array representation of a heap, then in Max-heap.
A[parent(i)] >= A[i]
which means that a node can't have a greater value than its parent. In a max-heap,
the largest element is stored at the root, and the minimum elements are in the
leaves.
Min-heap Property
If the parent nodes are smaller than their child nodes, heap is called Min-Heap.
If A is an array representation of a heap, then in Min-heap.
A[parent(i)] <= A[i]
which means that a parent node can't have a greater value than its children. Thus,
the minimum element is located at the root, and the maximum elements are located
in the leaves.
Procedure to perform Heap Sort
Heap sort algorithm is divided into two basic parts:
 Creating a Heap of the unsorted list/array.
 Then a sorted array is created by repeatedly removing the largest/smallest
element from the heap, and inserting it into the array. The heap is
reconstructed after each removal.
Initially on receiving an unsorted list, the first step in heap sort is to create a Heap
data structure(Max-Heap or Min-Heap). Once heap is built, the first element of the
Heap is either largest or smallest (depending upon Max-Heap or Min-Heap), so we
put the first element of the heap in our array. Then we again make heap using the
remaining elements, to again pick the first element of the heap and put it into the
array. We keep on doing the same repeatedly until we have the complete sorted list
in our array.
In the below algorithm, initially heapsort() function is called, which calls heapify()
to build the heap.
Heapsort Algorithm
The heapsort algorithm has two main parts (that will be broken down further
below): building a max heap and then sorting it. The max heap is built as described
in the above section. Then, heapsort produces a sorted array by repeatedly
removing the largest element from the heap (which is the root of the heap), and
then inserting it into the array. The heap is updated after each removal. Once all
elements have been removed from the heap, the result is a sorted array.
The heapsort algorithm uses the max_heapify function, and all put together, the
heapsort algorithm sorts a heap array like this:
1. Build a max-heap from an unordered array.
2. Find the maximum element, which is located at A[0] because the heap is a
max-heap.
3. Swap elements A[n] and A[0] so that the maximum element is at the end of
the array where it belongs.
4. Decrement the heap size by one (this discards the node we just moved to the
bottom of the heap, which was the largest element). In a manner of speaking,
the sorted part of the list has grown and the heap (which holds the unsorted
elements) has shrunk.
5. Now run max_heapify on the heap in case the new root causes a violation of
the max-heap property. (Its children will still be max heaps.)
6. Return to step 2.
Routine
def max_heapify(A, heap_size, i):
    left = 2 * i + 1
    right = 2 * i + 2
    largest = i
    if left < heap_size and A[left] > A[largest]:
        largest = left
    if right < heap_size and A[right] > A[largest]:
        largest = right
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        max_heapify(A, heap_size, largest)
Complexity Analysis of Heap Sort
Worst Case Time Complexity: O(n*log n)
Best Case Time Complexity: O(n*log n)
Average Time Complexity: O(n*log n)
Space Complexity: O(1)
Heap sort is not a Stable sort, and requires a constant space for sorting a list.
Heap Sort is very fast and is widely used for sorting.
Program for Heap Sort
#include<stdio.h>
#include<conio.h>
void manage(int *, int);
void heapsort(int *, int, int);
void main()
{ int arr[20];
int i,j,size,tmp,k;
clrscr();
printf("\n\t\t\t------- Heap sorting method -------\n\n");
printf("Enter the number of elements to sort : ");
scanf("%d",&size);
printf("Enter The Element In Array\n");
for(i=1; i<=size; i++)
{ scanf("%d",&arr[i]);
manage(arr,i);
}
j=size;
for(i=1; i<=j; i++)
{ tmp=arr[1];
arr[1]=arr[size];
arr[size]=tmp;
size--;
heapsort(arr,1,size);
}
printf("\n\t\t\t------- Heap sorted elements -------\n\n");
size=j;
printf("Sorted Elements:\t");
for(i=1; i<=size; i++)
printf("%d\t ",arr[i]);
getch();
}
void manage(int *arr, int i)
{ int tmp;
tmp=arr[i];
while((i>1) && (arr[i/2]< tmp))
{ arr[i]=arr[i/2];
i=i/2;
}
arr[i]=tmp;
}
void heapsort(int *arr, int i, int size)
{ int tmp,j;
tmp=arr[i];
j=i*2;
while(j<=size)
{ if((j < size) && (arr[j] < arr[j+1]))
j++;
if(arr[j] <= tmp)
break;
arr[j/2]=arr[j];
j=j*2;
}
arr[j/2]=tmp;
}
OUTPUT
------- Heap sorting method -------
Enter the number of elements to sort : 6
Enter The Element In Array 12 3 -7 120 45 90
------- Heap sorted elements -------
Sorted Elements: -7 3 12 45 90 120

Time Complexity comparison of Sorting

Sorting Type     Best         Average      Worst
Quick sort       O(n log(n))  O(n log(n))  O(n^2)
Merge sort       O(n log(n))  O(n log(n))  O(n log(n))
Heap sort        O(n log(n))  O(n log(n))  O(n log(n))
Bubble sort      O(n)         O(n^2)       O(n^2)
Insertion sort   O(n)         O(n^2)       O(n^2)
Selection sort   O(n^2)       O(n^2)       O(n^2)
Bucket sort      O(n+k)       O(n+k)       O(n^2)
Radix sort       O(nk)        O(nk)        O(nk)
HASHING
Hashing is a process of generating an index or address based on the data. For
example, file systems use hash table to generate the disk location using the
filename. A good hash function is the one which generates distinct addresses for
distinct file names. It is used to perform insertions, deletions, and finds in constant
average time.
Hash Table
Hash Table is a data structure used to store data elements in a specific order. The
ideal hash table is a fixed size (TableSize) array containing keys. Each key is
mapped into some number in the range 0 to TableSize - 1, and placed in the
appropriate cells.
HASH FUNCTION
The mapping of key into some number in the range 0 to tablesize-1 of the hash
table is called a hash function.
It is used to put the data in the hash table and also to retrieve the data from the hash
table. Thus hash function is used to implement the hash table.
Hash Key
The integer returned by hash function is called hash key. For numeric keys, one
simple hash function is Key mod TableSize, where TableSize is a prime number.
Characteristics of Good Hashing function
1. The hash function should be simple to compute.
2. Number of collisions should be less while placing the record in the hash table.
Ideally no collision should occur. Such a function is called perfect hash function.
3. Hash functions should produce keys (bucket indices) which will get distributed
uniformly over the array.
4. The hash function should depend upon every bit of the key. Thus the hash
function that simply extracts the portion of a key is not suitable.
TYPES OF HASH FUNCTION
There are different types of hash function. They are:
1. Division method
2. Mid square
3. Multiplicative hash function
4. Digit folding
1. Division method
The hash function depends upon the remainder of division. Typically the divisor
is table length.
Example:
If the record 54,72,89,37 is to be placed in the hash table and if the table size is 10
then
H(key) = record % table size
4=54%10 (Places record 54 at index 4 of hash table)
2=72%10 (Places record 72 at index 2)
9=89%10 (Places record 89 at index 9)
7=37%10 (Places record 37 at index 7)

0
1
2 72
3
4 54
5
6
7 37
8
9 89
2. Mid square:
In the mid square method, the key is squared and the middle or mid part of the
result is used as the index.
Consider that if we want to place a record 3111 then
31112 = 9678321
For the hash table of size 1000
H(3111)=783 (the middle 3 digits)
3. Multiplicative hash function:
The given record is multiplied by some constant value. The formula for
computing the hash key is
H(key) = floor(p * (fractional part of key * A))
Where p is an integer constant and A is a constant real number.
If key = 107, p = 50 and A = 0.61803398987, then
key * A = 107 * 0.61803398987 = 66.12963691...
fractional part = 0.12963691
H(key) = floor(50 * 0.12963691)
= floor(6.48)
= 6
The record 107 will be placed at location 6 in the hash table.
4. Digit folding:
The key is divided into separate parts and, using some simple operation, these
parts are combined to produce the hash key.
Example:
Consider a record 12365412 then it is divided into separate parts 123 654 12 and
these are added together
H(key) = 123+654+12
= 789
The record 12365412 will be placed at location 789 in the hash table.
Collision Definition:
The situation in which the hash function returns the same hash key for more
than one record is called collision.
Similarly, when there is no room for a new pair in the hash table, such a situation
is called overflow. Sometimes when we handle collision it may lead to overflow
conditions. Collision and overflow indicate a poor hash function.
Example
Consider a hash function. H(key) = recordkey%10 having the hash table of
size 10. The record keys to be placed are 131, 44, 43, 78, 19, 36, 57 and 77
0
1 131
2
3 43
4 44
5
6 36
7 57
8 78
9 19
Now if we try to place 77 in the hash table then we get the hash key to be 7
and index 7 already has the record key 57. This situation is called collision.
From the index 7 we look for next vacant position at subsequent indices 8, 9
then we find that there is no room to place 77 in the hash table. This situation is
called overflow.
A simple hash function –routine
typedef unsigned int Index;
Index Hash(int key, int TableSize)
{
unsigned int HashVal = 0;
HashVal = key % TableSize;
return HashVal;
}
COLLISION RESOLUTION TECHNIQUES
The techniques which are used to resolve or overcome collision while inserting
data into the hash table are called collision resolution techniques.
There are two methods for detecting collisions and overflows in the hash table
1. Chaining or Separate chaining.
2. Open addressing
 Linear probing
 Quadratic probing
 Double hashing
1. SEPARATE CHAINING


In this method, a linked list of all elements that hash to the same value is kept. The
linked list has a header node. Any new element inserted will be inserted in the
beginning of the list.
Each slot of the table holds a pointer to a linked list (chain) of the keys
that hash to that slot.

Example:
Consider the keys to be placed in their home buckets are
131, 3, 4, 21, 61, 24, 7, 97, 8, 9
Then we will apply a hash function as
H(key) = key % D
where D is the size of the table. Here D = 10, and since each new key is
inserted at the beginning of its chain, the hash table will be

0 :
1 : 61 -> 21 -> 131
2 :
3 : 3
4 : 24 -> 4
5 :
6 :
7 : 97 -> 7
8 : 8
9 : 9

Implementation
Type declaration for separate chaining
struct ListNode;
typedef struct ListNode *Position;
struct HashTbl;

typedef struct HashTbl *HashTable;


HashTable InitializeTable (int TableSize);
void DestroyTable (HashTable H);
Position Find (ElementType Key, HashTable H );
void Insert (ElementType Key, HashTable H);
ElementType Retrieve (Position P);
struct ListNode
{
ElementType Element;
Position Next;
};
typedef Position List;
struct HashTbl
{
int TableSize;
List *TheLists;
};
Initialization routine for separate chaining
HashTable
InitializeTable (int TableSize)
{
HashTable H;
int i;
if (TableSize < MinTableSize)
{
Error ("Table size too small");
return NULL;

}
/* Allocate table */
H = malloc (sizeof (struct HashTbl));
if (H == NULL)
FatalError ("Out of space!!!");
H->TableSize = NextPrime (TableSize);
/* Allocate array of lists */
H->TheLists = malloc (sizeof (List) * H->TableSize);
if (H->TheLists == NULL)
FatalError ("Out of space!!!");

/* Allocate list headers */


for (i = 0; i < H->TableSize; i++)
{
H->TheLists [i] = malloc (sizeof (struct ListNode));
if (H->TheLists [i] == NULL)
FatalError ("Out of space!!!");
else
H->TheLists [i]->Next = NULL;
}
return H;
}
Explanation
1. This function takes the tablesize as the parameter.
2. H contains the address where the structure hashtabl is created in the
memory.
3. The table size is made to be prime.

4. The variable Thelists contains the base address of the array of the list
formed.
5. Each index in the array of index contains the address of the node listnode.
6. The next address of the Thelists is made to be NULL.

Find routine for separate chaining


Position Find (ElementType Key, HashTable H)
{ Position P;
List L;
L = H->TheLists [Hash (Key, H->TableSize)];
P = L->Next;
while (P != NULL && P->Element != Key) /* Probably need strcmp!! */
P = P->Next;
return P;
}
Explanation
1. This function has two parameters namely the element to be found and the
address of the hashtable.
2. Hash function is called which returns the index where the data is to be
found.
3. The address stored in that index of the list is assigned to L (the header
node).
4. P contains the address of the next pointer of L (i.e. the first node in
the list, which may contain the element).
5. A Loop is formed to find the element. Using P, traversal of Linked list takes
place.
6. The value of P(Address) is returned back to the calling function.


Consider finding the element 97.

The steps involved are:
i. The hash function returns the index 7.
ii. L contains the address of the header node of the linked list formed for
the index 7.
iii. P contains the address of the first node of that list.
iv. Using P the linked list is traversed, and when the element and the value
present in P are equal, the address of the node is returned back.
v. If the value is not found, NULL is returned, since P contains NULL.

Insert routine for separate chaining

void Insert (ElementType Key, HashTable H)


{
Position Pos, NewCell;


List L;
Pos = Find (Key, H);
if (Pos == NULL) /* Key is not found */
{
NewCell = malloc (sizeof (struct ListNode));
if (NewCell == NULL)
FatalError ("Out of space!!!");
else
{
L = H->TheLists [Hash (Key, H->TableSize)];
NewCell->Next = L->Next;
NewCell->Element = Key; /* Probably need strcpy! */
L->Next = NewCell;
}
}
}
Explanation
1. This function has two parameters namely the element to be inserted and the
address of the hashtable.
2. Find function is invoked to check whether the element to be inserted is
present already. If it is present already, it is not inserted again.
3. If find function returns NULL, it implies that the element is not found in the
hash table. Hence insertion takes place.
4. The variable NewCell contains the address of the new node created.
5. The variable L contains the address of the header node of the linked list
where the element is to be inserted.


6. The element is inserted into the new node. The new node is inserted into the
beginning of the list.
2. OPEN ADDRESSING
Open Addressing is an alternative method to resolve collision with linked lists. If a
collision occurs, alternative cells are tried until an empty cell is found. Because all
the data go inside the table, a bigger table is needed for open addressing hashing
than for separate chaining hashing.
There are three methods in open addressing. They are:
i. Linear Probing
ii. Quadratic Probing
iii. Double Hashing

2.1. LINEAR PROBING


This is the easiest method of handling collision. If collision occurs, alternative cells
are tried until an empty cell is found.
In the linear probing method, the hash table is represented as a
one-dimensional array with indices that range from 0 to the desired table
size - 1. Before inserting any elements into this table, we must initialize
the table so that every slot is marked empty. This allows us to detect
collisions and overflows when we insert elements into the table. Then, using
some suitable hash function, the element can be inserted into the hash table.
Example:
Consider following keys that are to be inserted in the hash table.
131, 4, 8, 7, 21, 5, 31, 61, 9, 29
Initially, we will put the following keys in the hash table.
131,4,8,7.


We will use Division hash function. That means the keys are placed using the
H(key) = key % tablesize
For instance the element 131 can be placed at
H(key) = 131%10
=1
131 is placed at the Index 1. Continuing in this fashion we will place 4, 8 and 7.

Index Key
0 NULL
1 131
2 NULL
3 NULL
4 4
5 NULL
6 NULL
7 7
8 8
9 NULL

Now the next key to be inserted is 21. According to the hash function
H(key) = 21%10
=1
But the index 1 location is already occupied by 131, i.e. a collision occurs.
To resolve the collision we move linearly down to the next empty location, so
21 will be placed at index 2. If the next element is 5, we will put it at
index 5.
The Hash table after the insertion of 21 and 5 is given below.

Index Key
0 NULL
1 131
2 21
3 NULL
4 4
5 5
6 NULL
7 7
8 8
9 NULL

After placing record keys 31, 61 the hash table will be

Index Key
0 NULL
1 131
2 21
3 31
4 4
5 5
6 61
7 7
8 8
9 NULL

The next record key that comes is 9. According to the division hash function
it demands the index 9, hence we will place 9 at index 9. Now the final
record key is 29, and it also hashes to index 9. But index 9 is already
occupied, and there is no next empty bucket as the table size is limited to
index 9. The overflow occurs.
Problem with linear probing

One major problem with linear probing is primary clustering. Primary clustering is
a process in which a block of data is formed in the hash table when collision is
resolved.
19 % 10 = 9 18 % 10 = 8 39 % 10 = 9 29 % 10 = 9 8 % 10 = 8

0 39
1 29 Cluster is Formed
2 8
3
4
5 Rest of the table empty
6
7
8 18
9 19

2.2. QUADRATIC PROBING


In quadratic probing, the collision function is quadratic, which eliminates
the primary clustering problem. If a collision occurs, alternative cells are
tried until an empty cell is found. As in linear probing, the hash table is
represented as a one-dimensional array with indices that range from 0 to the
desired table size - 1. The alternative cells are calculated using the
increment function F(i) = i².
The formula to calculate the position when a collision occurs is

H = (Hash(key) + i²) mod m

where m is the table size or any prime number.


Example :
If we have to insert following elements in the hash table with size 10
37, 90, 55, 22, 11, 17, 49, 87. We will fill the hash table step by step

37 % 10 = 7 90 % 10 = 0 55 % 10 = 5 22 % 10 = 2 11 % 10 = 1

0 90
1 11
2 22
3
4
5 55
6
7 37
8
9

Now if we want to place 17, a collision will occur as 17 % 10 = 7 and 37 is
already present in that location. Hence we will apply quadratic probing to
insert this record into the hash table.
H = (Hash(key) + i²) mod 10

When i = 0: (17 + 0²) % 10 = 7, which is already occupied.
When i = 1: (17 + 1²) % 10 = 8. Index 8 is empty, hence we will place the
element at index 8.
Then comes 49, which hashes to 49 % 10 = 9 and is placed at index 9.

0 90
1 11
2 22
3
4
5 55
6
7 37
8 17
9 49

Now to place 87 we will use quadratic probing.

(87 + 0²) % 10 = 7 ... already occupied
(87 + 1²) % 10 = 8 ... already occupied
(87 + 2²) % 10 = 1 ... already occupied
(87 + 3²) % 10 = 6 ... empty, so 87 is placed at index 6

0 90
1 11
2 22
3
4
5 55
6 87
7 37
8 17
9 49

It is observed that if we want to place all the necessary elements in the
hash table, the size of the divisor (m) should be at least twice as large as
the total number of elements.

Type declaration for open addressing


typedef int ElementType;
typedef unsigned int Index;
typedef Index Position;


enum KindOfEntry {Legitimate, Empty, Deleted};


struct HashEntry
{ ElementType Element;
enum KindOfEntry Info;
};
typedef struct HashEntry Cell;
/* Cell *TheCells will be allocated later */
struct HashTbl
{
int TableSize;
Cell *TheCells;
};
struct HashTbl;
typedef struct HashTbl *HashTable;
HashTable InitializeTable (int TableSize);
void DestroyTable (HashTable H);
Position Find (ElementType Key, HashTable H);
void Insert (ElementType Key, HashTable H);
ElementType Retrieve (Position P, HashTable H);
HashTable Rehash (HashTable H);
/* Delete & MakeEmpty are omitted */
Routine to initialize open addressing hash table
HashTable InitializeTable (int TableSize)
{
HashTable H;
int i;
if (TableSize < MinTableSize)

{
Error ("Table size too small");
return NULL;
}
/* Allocate table */
H = malloc (sizeof (struct HashTbl));
if (H == NULL) FatalError ("Out of space!!!");
H->TableSize = NextPrime (TableSize);
/* Allocate array of Cells */
H->TheCells = malloc (sizeof (Cell) * H ->TableSize);
if (H->TheCells == NULL) FatalError ("Out of space!!!");
for (i = 0; i < H->TableSize; i++ )
H->TheCells [i].Info = Empty;
return H;
}
Explanation
1. This function takes the tablesize as the parameter.
2. H contains the address where the structure hashtable is created in the memory.
3. The table size is made to be prime.
4. The variable “TheCells” contains the base address of the array of the cell
formed.
5. For every cell in the array, the enum value empty is assigned.

Routine for finding the element with quadratic probing


Position Find (ElementType Key, HashTable H)
{
Position CurrentPos;

int CollisionNum;
CollisionNum = 0;
CurrentPos = Hash (Key, H->TableSize);
while (H->TheCells [CurrentPos].Info != Empty &&
H->TheCells [CurrentPos].Element != Key)
/* Probably need strcmp!! */
{
CurrentPos += 2 * ++CollisionNum - 1;
if (CurrentPos >= H->TableSize)
CurrentPos -= H->TableSize;
}
return CurrentPos;
}
Explanation
1. This function has two parameters namely the element to be found and the
address of the hashtable.
2. Hash function is called which returns the index where the data is to be
found. The index is stored in currentpos.
3. It checks the index to find the data. If the element is not found, it
checks the next possible index. This is done using the formula
CurrentPos = CurrentPos + 2 * ++CollisionNum - 1, where CollisionNum is
initialized to 0.
4. The function returns the index value of the array where the element is
present.
Insert routine for hash tables with quadratic probing
void Insert( ElementType Key, HashTable H )
{

Position Pos;
Pos = Find( Key, H );
if( H->TheCells[ Pos ].Info != Legitimate )
{
H->TheCells[ Pos ].Info = Legitimate;
H->TheCells[ Pos ].Element = Key;
}
}
Explanation
1. This function has two parameters namely the key to be inserted and the
address of the hash table H.
2. Find Function is invoked to find the index where the data is to be inserted.
3. If the info in the TheCells is not legitimate, it is assigned legitimate. And the
key value is inserted into that position.
Disadvantages of Quadratic Probing
• There is no guarantee of finding an empty cell once the table gets more
than half full, or even before that if the table size is not prime. (If the
table size is prime, a new element can always be inserted while the table is
at least half empty.)
• Standard deletion cannot be performed in an open addressing hash table,
because the cell might have caused a collision to go past it.
• Secondary clustering problem - elements that hash to the same position
probe the same alternative cells.
2.3. DOUBLE HASHING
Double hashing is a technique in which a second hash function is applied to
the key when a collision occurs. Applying the second hash function gives the
number of positions from the point of collision at which to insert.
There are two important rules to be followed for the second function:
• It must never evaluate to zero.

• It must make sure that all cells can be probed.


The formulas to be used for double hashing are

H1(key) = key mod tablesize

H2(key) = M - (key mod M)

where M is a prime number smaller than the size of the table.
Example

Consider the following elements to be placed in the hash table of size 10


37, 90, 45, 22, 17, 49, 55

Initially insert the elements using the first hash function H1.

Insert 37, 90, 45, 22, 49:
H(37) = 37 % 10 = 7
H(90) = 90 % 10 = 0
H(45) = 45 % 10 = 5
H(22) = 22 % 10 = 2
H(49) = 49 % 10 = 9

0 90
1
2 22
3
4
5 45
6
7 37
8
9 49

Now if 17 is to be inserted then
H1(17) = 17 % 10 = 7. A collision occurs, since index 7 is already filled in.
Now, by using the second function with M = 7:
H2(17) = 7 - (17 % 7) = 7 - 3 = 4


That means we have to insert the element 17 at 4 places from 37; in short we
have to take 4 jumps, wrapping around the table. Therefore 17 will be placed
at index 1. Now to insert the number 55:
0 90
1 17
2 22
3
4
5 45
6
7 37
8
9 49
Insert number 55.
H1(55) = 55 % 10 = 5
H2(55) = 7 - (55 % 7) = 7 - 6 = 1
We have to take one jump from index 5 to place 55. Finally the hash table looks
like this.
0 90
1 17
2 22
3
4
5 45
6 55
7 37
8
9 49

Comparison of quadratic probing and double hashing:


 Double hashing requires a second hash function; when collisions are
handled, its probing efficiency approaches that of random probing.
 Double hashing is more complex to implement than quadratic probing, and
quadratic probing is a faster technique than double hashing.
REHASHING
Rehashing is a technique in which the table is resized, i.e., the size of the
table is (roughly) doubled by creating a new table. It is preferable if the
new table size is a prime number. There are situations in which rehashing is
required:
 When table is completely full.
 With quadratic probing when the table is filled half.
 When insertions fail due to overflow.
In such situations, we have to transfer entries from old table to the new table by
re-computing their positions using suitable hash functions.

Consider that we have to insert the elements 37, 90, 55, 22, 17, 49 and 87.
The table size is 10 and we will use the hash function
H(key) = key mod tablesize
37 % 10 = 7   90 % 10 = 0   55 % 10 = 5   22 % 10 = 2   49 % 10 = 9
17 % 10 = 7: collision, solved by linear probing by placing it at index 8

0 90


1
2 22
3
4
5 55
6
7 37
8 17
9 49

Now the table is almost full, and if we try to insert more elements (such as
87) collisions will occur and eventually further insertions will fail. Hence
we will rehash by doubling the table size. The old table size is 10, so we
should double it for the new table; but since 20 is not a prime number, we
prefer to make the new table size 23.
The hash function will be H(key) = key mod 23
00
01
02
03 49
04
05
06
07
08
09 55
10
11
12
13
14 37
15
16
17 17
18 87
19
20
21 90
22 22

Now the hash table is sufficiently large to accommodate new insertions.


Rehashing is simple to implement:
HASH_TABLE rehash( HASH_TABLE H )
{
unsigned int i, old_size;
cell *old_cells;
old_cells = H->the_cells;
old_size = H->table_size;
/* Get a new, empty table */
H = initialize_table( 2 * old_size );
/* Scan through old table, reinserting into new */
for( i = 0; i < old_size; i++ )
if( old_cells[i].info == legitimate )
insert( old_cells[i].element, H );
free( old_cells );
return H;
}
Advantages:
1. This technique provides the programmer a flexibility to enlarge the
table size if required.
2. Only the space gets doubled, and with a simple hash function the
occurrence of collisions is reduced.
EXTENDIBLE HASHING

• Extendible hashing is a technique which handles a large amount of data. The
data is placed in the hash table by extracting a certain number of leading
bits of the key.
• Extendible hashing grows and shrinks in a manner similar to B-trees.
Let us assume that there are N records to store on the disk, and that at most
M records fit in one disk block.
Consider that the data consists of several six-bit integers.

Extendible Hashing: Original Data

The root of the tree contains four pointers determined by the leading two
bits of the data. Each leaf has up to M = 4 elements. It happens that in each
leaf the first two bits are identical; this is indicated by the number in
parentheses.
D will represent the number of bits used by the root, which is sometimes
known as the directory.
The number of entries in the directory is 2^D. dL is the number of leading
bits that all the elements of some leaf L have in common.


dL will depend on the particular leaf, and dL <= D.

Suppose that we want to insert the key 100100. This would go into the third leaf,
but as the third leaf is already full, there is no room. We thus split this leaf into two
leaves, which are now determined by the first three bits. This requires increasing
the directory size to 3. These changes are reflected in the figure below.

Extendible hashing: after insertion of 100100 and directory split

All of the leaves not involved in the split are now pointed to by two adjacent
directory entries. Thus, although an entire directory is rewritten, none of the other
leaves is actually accessed. If the key 000000 is now inserted, then the first leaf is
split, generating two leaves with dL = 3. Since D = 3, the only change required in
the directory is the updating of the 000 and 001 pointers. This is given in the figure
below:


Extendible hashing: after insertion of 000000 and leaf split

This very simple strategy provides quick access times for Insert and Find
operations on large databases. There are a few important details we have not
considered.
