Unit V
SEARCHING
Searching is the technique of finding a particular element in a set: the process of determining whether a given value exists in a data structure or a storage medium.
There are two commonly used searching algorithms:
1. Linear Search
2. Binary Search
LINEAR SEARCH
The linear (or sequential) search algorithm on an array is:
Start from the beginning of the array/list and continue until the item is found or the entire array/list has been searched.
Sequentially scan the array, comparing each array item with the searched value.
If a match is found, return the index of the matched element; otherwise return -1.
Note: linear search can be applied to both sorted and unsorted arrays.
Step 1: Read the search element from the user.
Step 2: Compare the search element with the first element in the list.
Step 3: If both are matching, then display "Given element found!!!" and terminate
the function
Step 4: If both are not matching, then compare search element with the next
element in the list.
Step 5: Repeat steps 3 and 4 until the search element is compared with the last
element in the list.
Step 6: If the last element in the list also does not match, then display "Element
not found!!!" and terminate the function.
Example:
Search element : 12

Index: 0   1   2   3   4   5   6   7
List:  65  20  10  55  32  12  50  99

Step 1: The search element (12) is compared with the first element (65). They do not match, so move to the next element.
Step 2: 12 is compared with the next element (20). No match, so move on.
Step 3: 12 is compared with the next element (10). No match, so move on.
Step 4: 12 is compared with the next element (55). No match, so move on.
Step 5: 12 is compared with the next element (32). No match, so move on.
Step 6: 12 is compared with the next element (12). Both match, so we stop comparing and display that the element was found at index 5.
Benefits
Easy to understand
Array can be in any order
Disadvantages
Inefficient for an array of N elements: examines N/2 elements on average when the value is in the array, and all N elements when it is not.
Time complexity of linear search is:
a. Best case = O(1)
b. Average case = (n+1)/2 comparisons = O(n)
c. Worst case = O(n)
PROGRAM
#include <stdio.h>
int main()
{
    int a[100], key, i, n, count = 0;
    printf("Enter the number of elements in array :");
    scanf("%d", &n);
    printf("Enter the elements :");
    for(i = 0; i < n; i++)
        scanf("%d", &a[i]);
    printf("Enter the key to search :");
    scanf("%d", &key);
    for(i = 0; i < n; i++)
    {
        if(a[i] == key)              /* match found */
        {
            printf("Key found at index %d\n", i);
            count = 1;
            break;
        }
    }
    if(count == 0)
        printf("Key not found\n");
    return 0;
}
BINARY SEARCH
The most efficient method of searching a sequential file is binary search. This
method is applicable to elements of a sorted list only.
In this method, the search element is compared with the middle element of the list. If it matches, the search is successful and terminates. If it does not match, the list is divided into two halves.
The first half consists of the 0th element up to the middle element, whereas the second half consists of the element next to the middle element up to the last element. All elements in the first half are less than or equal to the middle element, and all elements in the second half are greater than the middle element. If the element to be searched is greater than the middle element, searching continues in the second half; otherwise it continues in the first half.
The same process, comparing the search element with the middle element and dividing the remaining elements into two halves if it is not found, is repeated on the first or second half. This continues until the required element is found or the division leaves a single element.
Binary search looks for an item in an array/list using a divide and conquer strategy.
The algorithm begins at the middle of the array.
If the item being searched for is less than the item in the middle, then the item cannot be in the second half of the array.
The "middle" element of the remaining portion is examined again; the process continues with each comparison cutting in half the portion of the array where the item might be.
Note: Each execution of the recursive method reduces the search space by
about a half.
Let A[mid] be the middle element of the array A. Then there are three conditions that need to be tested while searching the array using this method:
1. If KEY == A[mid], then the desired element is present in the list.
2. Otherwise, if KEY < A[mid], then search the left sub list.
3. Otherwise, if KEY > A[mid], then search the right sub list.
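As noted above, each execution of the recursive method reduces the search space by about a half. A minimal recursive sketch of these three conditions (the name BinarySearchRec is illustrative, not from the notes):

int BinarySearchRec(int a[], int key, int low, int high)
{
    int mid;
    if(low > high)
        return -1;                 /* sub list is empty: not found */
    mid = (low + high) / 2;
    if(key == a[mid])
        return mid;                /* condition 1: element found */
    else if(key < a[mid])
        return BinarySearchRec(a, key, low, mid - 1);   /* condition 2: left sub list */
    else
        return BinarySearchRec(a, key, mid + 1, high);  /* condition 3: right sub list */
}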
Routine for Binary Search
int BinarySearch(int a[], int n, int key)
{
    int low = 0, high = n - 1, mid;
    while(low <= high)
    {
        mid = (low + high) / 2;
        if(a[mid] == key)
            return mid;          /* found: return the index */
        else if(key < a[mid])
            high = mid - 1;      /* search the first half */
        else
            low = mid + 1;       /* search the second half */
    }
    return(-1);                  /* not found */
}
Explanation
1. Initialize low to index of the first element.
2. Initialize high to index of the last element.
3. Repeat through step 6 while low <= high.
4. middle = (low + high) / 2
5. If search element is equal to middle element then return the index position
of middle element.
6. If search element is less than middle element then high=middle – 1 should
be done to search the element in first half.
else
Search element is greater than middle element then low=middle + 1
should be done to search the element in second half.
7. If the element is not found, return -1.
Disadvantage
Binary search requires that array elements be sorted.
The worst-case time complexity of binary search is O(log n). To find the average case, take the sum, over all elements, of the number of comparisons required to find each element multiplied by the probability of searching for that element. To simplify the analysis, assume that no item absent from A will be searched for, and that the probabilities of searching for each element are uniform; the average case is then also O(log n).
PROGRAM
#include <stdio.h>
#include <conio.h>
void main()
{
int a[25], i, n, Key, flag = 0, low, high, mid;
clrscr();
printf("Enter the number of elements :");
scanf("%d", &n);
printf("Enter the elements :");
for(i = 0; i<n; i++)
scanf("%d",&a[i]);
printf("Enter the key to be searched : ");
scanf("%d", &Key);
low = 0;
high = n - 1;
while(low <= high)
{
    mid = (low + high) / 2;
    if(a[mid] == Key)
    {
        flag = 1;            /* key found */
        break;
    }
    else if(a[mid] > Key)
        high = mid - 1;
    else
        low = mid + 1;
}
if(flag == 1)
printf("Key element is found");
else
printf("Key element not found");
getch(); }
OUTPUT
Enter the number of elements :6
Enter the elements :3 15 17 25 33 47
Enter the key to be searched : 12
Key element not found
SORTING
Sorting is one of the most important operations performed by computers.
Sorting is a process of reordering a list of items in either increasing or
decreasing order.
One example of external sorting is the external merge sort algorithm, which sorts
chunks that each fit in RAM, then merges the sorted chunks together. We first
divide the file into runs such that the size of a run is small enough to fit into main
memory. Then sort each run in main memory using the merge sort algorithm.
Finally merge the resulting runs together into successively bigger runs, until the
file is sorted.
BUBBLE SORT
1. Compare A[0] and A[1]. If A[0] is bigger than A[1], swap the elements.
2. Move to the next element, A[1] (which might now contain the result of a
swap from the previous step), and compare it with A[2]. If A[1] is bigger
than A[2], swap the elements. Do this for every pair of elements until the
end of the list.
3. Do steps 1 and 2 n times.
Routine for Bubble Sort
void bubbleSort(int A[], int n)
{
    int i, j, temp;
    for(i = 0; i < n; i++)
    {
        for(j = 0; j < n - i - 1; j++)
        {
            if(A[j] > A[j+1])       /* neighbours out of order: swap */
            {
                temp = A[ j ];
                A[ j ] = A[ j+1 ];
                A[ j + 1] = temp;
            }
        }
    }
}
In Bubble Sort, n-1 comparisons will be done in the 1st pass, n-2 in the 2nd pass, n-3 in the 3rd pass and so on. So the total number of comparisons will be
Sum = (n-1) + (n-2) + ... + 1 = n(n-1)/2
i.e. O(n²)
The space complexity for Bubble Sort is O(1), because only a single additional
memory space is required i.e. for temp variable. Also, the best case time
complexity will be O(n), it is when the list is already sorted.
Following are the time and space complexities for the Bubble Sort algorithm:
Worst Case Time Complexity: O(n²)
Average Case Time Complexity: O(n²)
Best Case Time Complexity: O(n)
Space Complexity: O(1)
Example 2:
Let us take the array of numbers "5 1 4 2 8", and sort the array from lowest number
to greatest number using bubble sort. In each step, elements written in bold are
being compared. Three passes will be required.
First Pass:
( 5 1 4 2 8 ) → ( 1 5 4 2 8 ): the algorithm compares the first two elements and swaps, since 5 > 1.
( 1 5 4 2 8 ) → ( 1 4 5 2 8 ): swap, since 5 > 4.
( 1 4 5 2 8 ) → ( 1 4 2 5 8 ): swap, since 5 > 2.
( 1 4 2 5 8 ) → ( 1 4 2 5 8 ): these elements are already in order (8 > 5), so the algorithm does not swap them.
Second Pass:
( 1 4 2 5 8 ) → ( 1 4 2 5 8 )
( 1 4 2 5 8 ) → ( 1 2 4 5 8 ): swap, since 4 > 2.
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
Now, the array is already sorted, but our algorithm does not know if it is
completed. The algorithm needs one whole pass without any swap to know it is
sorted.
Third Pass:
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
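The extra-pass observation above can be built into the routine with a simple flag; the following sketch stops as soon as one whole pass completes without a swap (the swapped flag is an addition for illustration, not part of the notes' routine):

void bubbleSortOptimized(int A[], int n)
{
    int i, j, temp, swapped;
    for(i = 0; i < n - 1; i++)
    {
        swapped = 0;
        for(j = 0; j < n - i - 1; j++)
        {
            if(A[j] > A[j+1])
            {
                temp = A[j];
                A[j] = A[j+1];
                A[j+1] = temp;
                swapped = 1;       /* this pass changed something */
            }
        }
        if(!swapped)               /* a full pass with no swap: sorted */
            break;
    }
}

With this flag, the best case (an already sorted array) takes a single pass, which is where the O(n) best case time quoted earlier comes from.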
Example 3: A[ ] = { 7, 4, 5, 2 }
In step 1, 7 is compared with 4. Since 7 > 4, they are swapped and 7 moves ahead of 4. Since all the other elements are smaller than 7, 7 is moved to the end of the array: { 4, 5, 2, 7 }.
In step 2, 4 is compared with 5. Since 5 > 4 and both are in ascending order, these elements are not swapped. However, when 5 is compared with 2, 5 > 2 and these elements are in descending order, so 5 and 2 are swapped: { 4, 2, 5, 7 }.
In step 3, the element 4 is compared with 2. Since 4 > 2 and the elements are in descending order, 4 and 2 are swapped.
The sorted array is A[ ] = { 2, 4, 5, 7 }.
Program for Bubble Sort
#include <stdio.h>
void bubbleSort(int arr[], int n)
{
int i, j, temp;
for(i = 0; i < n; i++)
{ for(j = 0; j < n-i-1; j++)
{ if( arr[j] > arr[j+1])
{
// swap the elements
temp = arr[j];
arr[j] = arr[j+1];
arr[j+1] = temp;
} } }
// print the sorted array
printf("Sorted Array: ");
for(i = 0; i < n; i++)
{
printf("%d ", arr[i]);
}
}
int main()
{
int arr[100], i, n;
printf("Enter the number of elements to be sorted: ");
scanf("%d", &n);
for(i = 0; i < n; i++)
{
printf("Enter element no. %d: ", i+1);
scanf("%d", &arr[i]);
}
bubbleSort(arr, n);
return 0;
}
OUTPUT
Enter the number of elements to be sorted: 5
Enter element no. 1: 14
Enter element no. 2: -5
Enter element no. 3: 45
Enter element no. 4: 69
Enter element no. 5: 100
Sorted Array: -5 14 45 69 100
SELECTION SORT
Selection Sort algorithm is used to arrange a list of elements in a particular order
(Ascending or Descending).
In selection sort, the first element in the list is selected and repeatedly compared with all the remaining elements in the list. If any element is smaller than the selected element (for ascending order), the two are swapped. Then the element at the second position is selected and compared with all the remaining elements, again swapping whenever a smaller element is found. This procedure is repeated till the entire list is sorted.
Step 1: Select the element at the first position in the list.
Step 2: Compare the selected element with all the other elements in the list.
Step 3: For every comparison, if any element is smaller than the selected element (for ascending order), then the two are swapped.
Step 4: Repeat the same procedure with the next position in the list till the entire list is sorted.
Routine for Selection Sort
void selectionSort(int a[], int n)
{   int i, j, small, temp;
    for (i = 0; i < n-1; i++)
    {
        small = i;                    /* index of the minimum so far */
        for (j = i+1; j < n; j++)
            if (a[j] < a[small])
                small = j;
        temp = a[small];              /* swap the minimum into place */
        a[small] = a[i];
        a[i] = temp;
    } }
Complexity Analysis of Selection Sort
Selection Sort requires two nested for loops to complete itself: the outer loop in selectionSort selects each position in turn, and the inner loop finds the index of the minimum element in the unsorted remainder.
Hence for a given input size of n, following will be the time and space complexity
for selection sort algorithm:
Worst Case Time Complexity [Big-O]: O(n²)
Best Case Time Complexity [Big-omega]: O(n²)
Average Time Complexity [Big-theta]: O(n²)
Space Complexity: O(1)
In the first pass, the smallest element (here 1) is found and placed at the first position. Then, leaving the first element, the next smallest element is searched for among the remaining elements; we get 3 as the smallest, so it is placed at the second position. Then, leaving 1 and 3 (because they are at their correct positions), we search for the next smallest element from the rest of the elements, put it at the third position, and keep doing this until the array is sorted.
Example 2: a[ ] = { 15, 20, 10, 30, 50, 18, 5, 45 }
Iteration 1: Select the element at the first position, compare it with every later element in the list, and whenever a smaller element is found, swap the two.
List after 1st Iteration: 5 20 15 30 50 18 10 45
Iteration 2: Select the element at the second position and compare it with every element after it, swapping whenever a smaller element is found.
List after 2nd Iteration: 5 10 20 30 50 18 15 45
Iteration 3: Repeat the same for the third position.
List after 3rd Iteration: 5 10 15 30 50 20 18 45
Iteration 4: Repeat the same for the fourth position.
List after 4th Iteration: 5 10 15 18 50 30 20 45
Iteration 5: Repeat the same for the fifth position.
List after 5th Iteration: 5 10 15 18 20 50 30 45
Iteration 6: Repeat the same for the sixth position.
List after 6th Iteration: 5 10 15 18 20 30 45 50
Iteration 7: Repeat the same for the seventh position; 45 is already smaller than 50, so nothing changes.
List after 7th Iteration (Final Sorted List of elements): 5 10 15 18 20 30 45 50
Program for Selection Sort
#include <stdio.h>
int main()
{
    int array[100], n, c, d, position, swap;
    printf("Enter number of elements : ");
    scanf("%d", &n);
    printf("Enter %d integers : ", n);
    for ( c = 0 ; c < n ; c++ )
        scanf("%d", &array[c]);
    for ( c = 0 ; c < ( n - 1 ) ; c++ )
    {
        position = c;                      /* assume current is smallest */
        for ( d = c + 1 ; d < n ; d++ )
            if ( array[position] > array[d] )
                position = d;              /* remember the smaller one */
        if ( position != c )
        {
swap = array[c];
array[c] = array[position];
array[position] = swap;
}
}
printf("Sorted list in ascending order:\n");
for ( c = 0 ; c < n ; c++ )
printf("%d", array[c]);
return 0;
}
OUTPUT
Enter number of elements : 7
Enter 7 integers : 23 12 3 -7 4 598 38
Sorted list in ascending order:
-7 3 4 12 23 38 598
INSERTION SORT
Sorting is the process of arranging a list of elements in a particular order (ascending or descending). The insertion sort algorithm arranges a list of elements in a particular order: in every iteration it moves one element from the unsorted portion to the sorted portion, until all the elements in the list are sorted.
Basic Idea:
Find the location for an element and move all others up, and insert the element.
The process involved in insertion sort is as follows:
1. The leftmost value can be said to be sorted relative to itself. Thus, we don't need to do anything.
2. Check to see if the second value is smaller than the first one. If it is, swap these
two values. The first two values are now relatively sorted.
3. Next, we need to insert the third value in to the relatively sorted portion so that
after insertion, the portion will still be relatively sorted.
4. Remove the third value first. Slide the second value to make room for
insertion. Insert the value in the appropriate position.
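The notes give no code for this process, so here is a minimal C sketch of it (the name insertionSort is illustrative). A[0..i-1] is kept sorted; each iteration inserts A[i] into its proper place:

void insertionSort(int A[], int n)
{
    int i, j, key;
    for(i = 1; i < n; i++)
    {
        key = A[i];                     /* value to be inserted */
        j = i - 1;
        while(j >= 0 && A[j] > key)     /* slide larger values one step up */
        {
            A[j + 1] = A[j];
            j--;
        }
        A[j + 1] = key;                 /* insert at the vacated position */
    }
}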
Advantages of Insertion Sort
1. Simple implementation.
2. Efficient for small data sets.
3. More efficient than most other simple O(n²) algorithms such as selection sort or bubble sort.
4. It is called an in-place sorting algorithm. An in-place sorting algorithm is an algorithm in which the input is overwritten by the output, so the sorting method does not require any additional space.
Note that if an already sorted array is given to insertion sort, the algorithm still executes the outer for loop, requiring n steps to confirm an already sorted array of n elements. This makes its best case time complexity a linear function of n, i.e. O(n).
SHELL SORT
Shell sort, named after its inventor Donald Shell, is a highly efficient sorting algorithm based on the insertion sort algorithm. It avoids the large shifts that insertion sort performs when a small value is far to the right and has to be moved to the far left.
This algorithm first uses insertion sort on widely spaced elements, then sorts the less widely spaced elements. This spacing is termed the interval. The interval can be calculated using Knuth's formula:
Knuth's Formula
h = h * 3 + 1
where h is the interval, with initial value 1.
This algorithm is quite efficient for medium-sized data sets: its average and worst case complexity are better than O(n²), and with Knuth's sequence the running time is about O(n^1.5), where n is the number of items.
Routine for Shell Sort
void shellSort(int A[], int n)
{
    int gap, i, j, temp;
    for(gap = n/2; gap > 0; gap /= 2)        /* shrink the interval */
    {
        for(i = gap; i < n; i++)             /* gapped insertion sort */
        {
            temp = A[i];
            for(j = i; j >= gap && A[j-gap] > temp; j -= gap)
            {
                A[j] = A[j-gap];
            }
            A[j] = temp;
        }
    }
}
Example 1:
Shell sort is quite similar to insertion sort, with the only difference that in shell sort higher values of the interval k are considered, whereas insertion sort assumes k to be 1. If k = 4, then every 4th element is compared; if it is 3, then every 3rd element is compared. Thus k = m means every mth element gets compared with each other and possibly swapped.
The value of k is decremented after each scan. The file is sorted when k becomes 1 and swapping is performed for k = 1. The successive values of k are taken from a decreasing gap sequence, for example (k=4, k=2, k=1), skipping k=3, or (k=8, k=4, k=2, k=1), or (k=4, k=3, k=2, k=1).
If k is taken as 1 only, then it is no longer shell sort: it becomes insertion sort.
Time Complexity
The time complexity of the above implementation of shell sort is O(n²). In this implementation the gap is reduced by half in every iteration. There are many other ways to reduce the gap which lead to better time complexity.
Worst Case Analysis: O(n²)
Best Case Analysis: O(n log n)
Average Case Analysis: O(n^1.5)
Program for Shell Sort
#include<stdio.h>
#include<conio.h>
int i, j, k, n, p, temp, a[25];
void main()
{
clrscr();
printf("\n Enter the limit:");
scanf("%d",&n);
printf("\n Enter the elements :");
for(i=0;i<n;i++)
scanf("%d",&a[i]);
printf("\n SHELL SORT");
printf("\n **********");
for(k=n/2; k>=1; k=k/2)      /* interval shrinks by half each time */
{
 for(i=k; i<n; i++)          /* gapped insertion sort */
 {
  temp=a[i];
  for(j=i;j>=k&&a[j-k]>temp;j=j-k)
{
a[j]=a[j-k];
}
a[j]=temp;
  printf("\n k= %d\t",k);
  for(p=0; p<n; p++)
   printf("%d\t",a[p]);
  printf("\n");
 }
}
printf("\n The Sorted elements are:");
for(i=0;i<n;i++)
 printf("%d\t",a[i]);
getch();
}
OUTPUT
Enter the limit:9
Enter the elements :81 45 67 23 -9 450 56 6 78
SHELL SORT
**********
k=4 -9 45 67 23 81 450 56 6 78
k=4 -9 45 67 23 81 450 56 6 78
k=4 -9 45 56 23 81 450 67 6 78
k=4 -9 45 56 6 81 450 67 23 78
k=4 -9 45 56 6 78 450 67 23 81
k=2 -9 45 56 6 78 450 67 23 81
k=2 -9 6 56 45 78 450 67 23 81
k=2 -9 6 56 45 78 450 67 23 81
k=2 -9 6 56 45 78 450 67 23 81
k=2 -9 6 56 45 67 450 78 23 81
k=2 -9 6 56 23 67 45 78 450 81
k=2 -9 6 56 23 67 45 78 450 81
k=1 -9 6 56 23 67 45 78 450 81
k=1 -9 6 56 23 67 45 78 450 81
k=1 -9 6 23 56 67 45 78 450 81
k=1 -9 6 23 56 67 45 78 450 81
k=1 -9 6 23 45 56 67 78 450 81
k=1 -9 6 23 45 56 67 78 450 81
k=1 -9 6 23 45 56 67 78 450 81
k=1 -9 6 23 45 56 67 78 81 450
MSD radix sort starts processing the keys from the most significant (leftmost) digit and moves toward the least significant (rightmost) digit. This sequence is the opposite of least significant digit (LSD) radix sorts. The routine below implements the common LSD variant, which processes the ones digit first.
Routine for Radix Sort
void radix_sort(int arr[], int n)
{ int bucket[10][10], buck[10];
int i,j,k,l,num,div,large,passes;
div=1;
num=0;
large=arr[0];
for(i=0 ; i< n ; i++)
{ if(arr[i] > large)
{ large = arr[i];}
}
while(large > 0)
{ num++;
large = large/10;
}
for(passes=0 ; passes < num ; passes++)
{
for(k=0 ; k< 10 ; k++)
{ buck[k] = 0; }
for(i=0 ; i< n ;i++)
{ l = ((arr[i]/div)%10);
bucket[l][buck[l]++] = arr[i];
}
i=0;
for(k=0 ; k < 10 ; k++)
CS8391 38 UNIT V - DATA STRUCTURES
PANIMALAR INSTITUTE OF TECHNOLOGY II YEAR/III SEM B.E-CSE
Example 1: array = [88, 410, 1772, 20]
First, the array is divided into buckets based on the value of the least significant digit, the ones digit: 410 and 20 fall into bucket 0, 1772 into bucket 2, and 88 into bucket 8.
These buckets are then emptied in order, resulting in the following partially-sorted array:
array = [410, 20, 1772, 88]
Next the tens digit is considered: 410 goes to bucket 1, 20 to bucket 2, 1772 to bucket 7 and 88 to bucket 8. The relative order of the elements didn't change this time, but you've still got more digits to inspect.
The next digit to consider is the hundreds digit: 20 and 88 fall into bucket 0, 410 into bucket 4, and 1772 into bucket 7. For values that have no hundreds position (or any other position without a value), the digit will be assumed to be zero.
Reassembling the array based on these buckets gives the following:
array = [20, 88, 410, 1772]
Finally, only 1772 has a thousands digit; reassembling the array from these buckets leads to the final sorted array:
array = [20, 88, 410, 1772]
When multiple numbers end up in the same bucket, their relative ordering doesn't change. For example, in the zero bucket for the hundreds position, 20 comes before 88. This is because the previous step put 20 in a lower bucket than 88, so 20 ended up before 88 in the array.
Example 2:
Original, unsorted list:
170, 45, 75, 90, 802, 24, 2, 66
Sorting by the ones digit gives:
170, 90, 802, 2, 24, 45, 75, 66
Sorting by the tens digit gives:
802, 2, 24, 45, 66, 170, 75, 90
Sorting by the hundreds digit gives the final sorted list:
2, 24, 45, 66, 75, 90, 170, 802

QUICK SORT
Quick sort is a divide and conquer algorithm: it picks an element as the pivot and partitions the given array around it.
Hence after the first pass, the pivot will be set at its final position, with all the elements smaller than it on its left and all the elements larger than it on its right. For example, 6 8 17 14 and 63 37 52 may then be considered as two separate subarrays, the same recursive logic is applied to each of them, and we keep doing this until the complete array is sorted.
Steps to perform Quick Sort
Following are the steps involved in quick sort algorithm:
1. After selecting an element as pivot, which is the last index of the array in our
case, we divide the array for the first time.
2. In quick sort, we call this partitioning. It is not simple breaking down of array
into 2 subarrays, but in case of partitioning, the array elements are so positioned
that all the elements smaller than the pivot will be on the left side of the pivot and
all the elements greater than the pivot will be on the right side of it.
3. And the pivot element will be at its final sorted position.
4. The elements to the left and right may not be sorted.
5. Then we pick subarrays, elements on the left of pivot and elements on the right of
pivot, and we perform partitioning on them by choosing a pivot in the subarrays.
In step 1, we select the last element as the pivot, which is 6 in this case, and call for
partitioning, hence re-arranging the array in such a way that 6 will be placed in its
final position and to its left will be all the elements less than it and to its right, we
will have all the elements greater than it.
Then we pick the subarray on the left and the subarray on the right and select a pivot for each; in this example, we chose 3 as the pivot for the left subarray and 11 as the pivot for the right subarray, and we again call for partitioning.
Program for Quick Sort
# include <stdio.h>
void swap(int* a, int* b)
{
int t = *a;
*a = *b;
*b = t;
}
// a[] is the array, p is starting index, that is 0, and r is the last index of array.
void quicksort(int a[], int p, int r)
{
if(p < r)
{
int q;
q = partition(a, p, r);
quicksort(a, p, q-1);
quicksort(a, q+1, r);
}
}
int partition (int a[], int low, int high)
{
    int pivot = a[high];     /* the last element is chosen as the pivot */
    int i = low - 1, j;
    for(j = low; j < high; j++)
    {
        if(a[j] < pivot)     /* move smaller elements to the left part */
        {
            i++;
            swap(&a[i], &a[j]);
        }
    }
    swap(&a[i + 1], &a[high]);   /* put the pivot in its final place */
    return (i + 1);
}
void printArray(int a[], int n)
{
    int i;
    for(i = 0; i < n; i++)
        printf("%d ", a[i]);
    printf("\n");
}
int main()
{
    int arr[] = {10, 7, 8, 9, 1, 5};   /* sample data */
    int n = sizeof(arr)/sizeof(arr[0]);
    quicksort(arr, 0, n-1);
    printArray(arr, n);
    return 0;
}
If we keep on getting unbalanced subarrays, the running time is the worst case, which is O(n²), whereas if partitioning leads to almost equal subarrays, the running time is the best, with time complexity O(n log n).
To avoid this, you can pick random pivot element too. It won't make any difference
in the algorithm, as all you need to do is, pick a random element from the array,
swap it with element at the last index, make it the pivot and carry on with quick
sort.
The space required by quick sort is very small: only O(log n) additional space is needed on average, for the recursion stack.
Quick sort is not a stable sorting technique, so it might change the relative order of two equal elements in the list while sorting.
MERGE SORT
In Merge Sort, the given unsorted array with n elements, is divided into n
subarrays, each having one element, because a single element is always sorted in
itself. Then, it repeatedly merges these subarrays, to produce new sorted subarrays,
and in the end, one complete sorted array is produced.
The concept of Divide and Conquer involves three steps:
1. DIVIDE: Partition the n-element sequence to be sorted into two
subsequences of n/2 elements each.
2. CONQUER: Sort the two subsequences recursively using the mergesort.
3. COMBINE: Merge the two sorted subsequences of size n/2 each to produce the sorted sequence consisting of n elements.
// traverse both arrays and in each iteration add smaller of both elements in
temp
while(i <= mid && j <= end) {
if(Arr[i] <= Arr[j]) {
temp[k] = Arr[i];
k += 1; i += 1;
}
else {
temp[k] = Arr[j];
k += 1; j += 1;
}
}
// copy any remaining elements of the first half
while(i <= mid) {
    temp[k] = Arr[i];
    k += 1; i += 1;
}
// copy any remaining elements of the second half
while(j <= end) {
    temp[k] = Arr[j];
    k += 1; j += 1;
}
Example 2:
#include<stdio.h>
void mergesort(int a[],int i,int j);
void merge(int a[],int i1,int j1,int i2,int j2);
int main()
{
int a[30],n,i;
printf("Enter no of elements:");
scanf("%d",&n);
printf("Enter array elements:");
for(i=0;i<n;i++)
scanf("%d",&a[i]);
mergesort(a,0,n-1);
printf("\nSorted array is :");
for(i=0;i<n;i++)
    printf("%d ",a[i]);
return 0;
}
void mergesort(int a[],int i,int j)
{
int mid;
if(i<j)
{
mid=(i+j)/2;
mergesort(a,i,mid); //left recursion
mergesort(a,mid+1,j); //right recursion
merge(a,i,mid,mid+1,j); //merging of two sorted sub-arrays
}
}
void merge(int a[],int i1,int j1,int i2,int j2)
{
int temp[50]; //array used for merging
int i,j,k;
i=i1;    //beginning of the first list
j=i2;    //beginning of the second list
k=0;
while(i<=j1 && j<=j2)    //while elements exist in both lists
{
    if(a[i]<a[j])
        temp[k++]=a[i++];
    else
        temp[k++]=a[j++];
}
while(i<=j1)    //copy remaining elements of the first list
    temp[k++]=a[i++];
while(j<=j2)    //copy remaining elements of the second list
    temp[k++]=a[j++];
for(i=i1,j=0;i<=j2;i++,j++)    //transfer elements from temp[] back to a[]
    a[i]=temp[j];
}
Also, we perform a single step operation to find out the middle of any subarray, i.e.
O(1).
And to merge the subarrays, made by dividing the original array of n elements, a
running time of O(n) will be required.
Hence the total time for mergeSort function will become n(log n + 1), which gives
us a time complexity of O(n*log n).
Time complexity of Merge Sort is O(n*Log n) in all the 3 cases (worst, average
and best) as merge sort always divides the array in two halves and takes linear time
to merge two halves.
It requires an equal amount of additional space as the unsorted array. Hence it is not recommended for sorting extremely large arrays when memory is limited.
HEAP SORT
Heap Sort is one of the best sorting methods being in-place and with no quadratic
worst-case running time. Heap sort involves building a Heap data structure from
the given array and then utilizing the Heap to sort the array.
Heap:
Heap is a special tree-based data structure that satisfies the following special heap
properties:
Shape Property: A heap is always a complete binary tree, which means all levels of the tree are fully filled, except possibly the last, which is filled from left to right.
Heap Property:
All nodes are either greater than or equal to or less than or equal to each of its
children.
There are two kinds of binary heaps: max-heaps and min-heaps. Both types satisfy the heap property, but in opposite directions.
Max-heap Property
If the parent nodes are greater than their child nodes, heap is called a Max-Heap
If A is an array representation of a heap, then in a max-heap
A[Parent(i)] >= A[i]
which means that a node can't have a greater value than its parent. In a max-heap,
the largest element is stored at the root, and the minimum elements are in the
leaves.
Min-heap Property
If the parent nodes are smaller than their child nodes, heap is called Min-Heap.
If A is an array representation of a heap, then in Min-heap.
A[parent[i]] < = A[i]
which means that a parent node can't have a greater value than its children. Thus,
the minimum element is located at the root, and the maximum elements are located
in the leaves.
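When a heap is stored in an array as described above, the parent and children of a node are found by index arithmetic alone. A small sketch for a 0-based array (these macro names are illustrative, not from the notes):

/* 0-based array representation of a binary heap */
#define PARENT(i)  (((i) - 1) / 2)
#define LEFT(i)    (2 * (i) + 1)
#define RIGHT(i)   (2 * (i) + 2)

For example, the children of the root A[0] are A[1] and A[2], and the parent of A[4] is A[1].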
In the algorithm below, heapsort first builds the heap and then repeatedly calls max_heapify() to restore the heap property.
Heapsort Algorithm
The heapsort algorithm has two main parts (that will be broken down further
below): building a max heap and then sorting it. The max heap is built as described
in the above section. Then, heapsort produces a sorted array by repeatedly
removing the largest element from the heap (which is the root of the heap), and
then inserting it into the array. The heap is updated after each removal. Once all
elements have been removed from the heap, the result is a sorted array.
The heapsort algorithm uses the max_heapify function, and all put together, the
heapsort algorithm sorts a heap array like this:
1. Build a max-heap from an unordered array.
2. Find the maximum element, which is located at A[0] because the heap is a
max-heap.
3. Swap elements A[n-1] and A[0] so that the maximum element is at the end of
the array, where it belongs.
4. Decrement the heap size by one (this discards the node we just moved to the
bottom of the heap, which was the largest element). In a manner of speaking,
the sorted part of the list has grown and the heap (which holds the unsorted
elements) has shrunk.
5. Now run max_heapify on the heap in case the new root causes a violation of
the max-heap property. (Its children will still be max heaps.)
6. Return to step 2.
Routine
def max_heapify(A, heap_size, i):
left = 2 * i + 1
right = 2 * i + 2
largest = i
if left < heap_size and A[left] > A[largest]:
largest = left
if right < heap_size and A[right] > A[largest]:
largest = right
if largest != i:
A[i], A[largest] = A[largest], A[i]
max_heapify(A, heap_size, largest)
Program for Heap Sort
#include<stdio.h>
#include<conio.h>
void manage(int *arr, int i);
void heapsort(int *arr, int i, int size);
void main()
{ int arr[25], i, j, size, tmp;
clrscr();
printf("\n\t\t\t------- Heap sorting method -------\n\n");
printf("Enter the number of elements to sort : ");
scanf("%d",&size);
printf("Enter The Element In Array ");
for(i=1; i<=size; i++)
{ scanf("%d",&arr[i]);
manage(arr,i);              /* sift the new element up to keep a max-heap */
}
j=size;                     /* remember the original size for printing */
while(size>1)               /* repeatedly move the maximum to the end */
{ tmp=arr[1];
arr[1]=arr[size];
arr[size]=tmp;
size--;
heapsort(arr,1,size);
}
printf("\n\t\t\t------- Heap sorted elements -------\n\n");
size=j;
printf("Sorted Elements:\t");
for(i=1; i<=size; i++)
printf("%d\t ",arr[i]);
getch();
}
void manage(int *arr, int i)
{ int tmp;
tmp=arr[i];
while((i>1) && (arr[i/2]< tmp))
{ arr[i]=arr[i/2];
i=i/2;
}
arr[i]=tmp;
}
void heapsort(int *arr, int i, int size)
{ int tmp,j;
tmp=arr[i];
j=i*2;
while(j<=size)
{ if((j < size) && (arr[j] < arr[j+1]))
j++;
if(arr[j] > tmp)
{ arr[j/2]=arr[j];   /* promote the larger child */
j=j*2;
}
else
break;
}
arr[j/2]=tmp;
}
OUTPUT
------- Heap sorting method -------
Enter the number of elements to sort : 6
Enter The Element In Array 12 3 -7 120 45 90
------- Heap sorted elements -------
Sorted Elements: -7 3 12 45 90 120
Hash Table
A hash table is a data structure used to store and retrieve data elements in constant average time. The ideal hash table is a fixed-size (TableSize) array containing keys. Each key is mapped into some number in the range 0 to TableSize - 1 and placed in the appropriate cell.
HASH FUNCTION
The mapping of key into some number in the range 0 to tablesize-1 of the hash
table is called a hash function.
It is used to put the data in the hash table and also to retrieve the data from the hash
table. Thus hash function is used to implement the hash table.
Hash Key
The integer returned by hash function is called hash key. For numeric keys, one
simple hash function is Key mod TableSize, where TableSize is a prime number.
Characteristics of a Good Hash Function
1. The hash function should be easy and quick to compute.
2. The hash function should minimize collisions.
3. The hash function should produce keys (bucket indices) which get distributed uniformly over the array.
4. The hash function should depend upon every bit of the key. Thus a hash function that simply extracts a portion of a key is not suitable.
Types of Hash Functions
1. Division method:
The key is divided by the table size and the remainder is taken as the index:
H(key) = key % TableSize
Example: placing the keys 72, 54, 37 and 89 in a table of size 10 gives 72 % 10 = 2, 54 % 10 = 4, 37 % 10 = 7 and 89 % 10 = 9:
0
1
2 72
3
4 54
5
6
7 37
8
9 89
2. Mid square:
In the mid square method, the key is squared and the middle or mid part of the
result is used as the index.
Consider that if we want to place a record 3111 then
3111² = 9678321
For a hash table of size 1000,
H(3111) = 783 (the middle 3 digits)
3. Multiplication method:
The key is multiplied by a constant A (0 < A < 1), the fractional part of the product is multiplied by the table size m, and the floor of the result is taken as the index:
H(key) = floor(m * frac(key * A)), where A ≈ 0.61803398987 (the golden ratio) is a popular choice.
Example: for key 107 and m = 50,
107 * 0.61803398987 = 66.1296...
H(107) = floor(50 * 0.1296...) = floor(6.4818) = 6
The record 107 will be placed at location 6 in the hash table.
4. Digit folding:
The key is divided in to separate parts and using some simple operation these
parts are combined to produce the hash key.
Example:
Consider a record 12365412 then it is divided into separate parts 123 654 12 and
these are added together
H(key) = 123+654+12
= 789
The record 12365412 will be placed at location 789 in the hash table.
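The folding computation is easy to express in C. A sketch, assuming the key is split into three-digit parts from the right (the name foldHash is illustrative, not from the notes):

int foldHash(long key)
{
    int sum = 0;
    while(key > 0)
    {
        sum += key % 1000;    /* take the last three digits */
        key /= 1000;          /* and drop them */
    }
    return sum;               /* 12365412 -> 412 + 365 + 12 = 789 */
}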
Collision Definition:
The situation in which the hash function returns the same hash key for more
than one record is called collision.
Similarly, when there is no room for a new pair in the hash table, the situation is called overflow. Sometimes handling a collision may lead to an overflow condition. Frequent collisions and overflows indicate a poor hash function.
Example
Consider a hash function. H(key) = recordkey%10 having the hash table of
size 10. The record keys to be placed are 131, 44, 43, 78, 19, 36, 57 and 77
0
1 131
2
3 43
4 44
5
6 36
7 57
8 78
9 19
Now if we try to place 77 in the hash table then we get the hash key to be 7
and index 7 already has the record key 57. This situation is called collision.
From the index 7 we look for next vacant position at subsequent indices 8, 9
then we find that there is no room to place 77 in the hash table. This situation is
called overflow.
A simple hash function –routine
typedef unsigned int Index;
Index Hash(int key, int TableSize)
{
unsigned int HashVal = 0;
HashVal = key % TableSize;
return HashVal;
}
COLLISION RESOLUTION TECHNIQUES
The techniques which are used to resolve or overcome collision while inserting
data into the hash table are called collision resolution techniques.
There are two methods for detecting collisions and overflows in the hash table
1. Chaining or Separate chaining.
2. Open addressing
Linear probing
Quadratic probing
Double hashing
1. SEPARATE CHAINING
In this method, a linked list of all elements that hash to the same value is kept. The
linked list has a header node. Any new element inserted will be inserted in the
beginning of the list.
Example:
Consider the keys to be placed in their home buckets are
131, 3, 4, 21, 61, 24, 7, 97, 8, 9
Then we will apply a hash function as
H(key) = key % D
where D is the size of the table (here D = 10). The resulting chains are:
bucket 1: 61 -> 21 -> 131, bucket 3: 3, bucket 4: 24 -> 4, bucket 7: 97 -> 7, bucket 8: 8, bucket 9: 9
(each new element is inserted at the beginning of its list).
Implementation
Type declaration for separate chaining
struct ListNode;
typedef struct ListNode *Position;
struct HashTbl;
typedef struct HashTbl *HashTable;
/* In the implementation file */
struct ListNode
{
    ElementType Element;
    Position Next;
};
typedef Position List;
struct HashTbl
{
    int TableSize;
    List *TheLists;
};

Initialization routine for separate chaining
HashTable InitializeTable (int TableSize)
{
    HashTable H;
    int i;
    if (TableSize < MinTableSize)
    {
        Error ("Table size too small");
        return NULL;
    }
/* Allocate table */
H = malloc (sizeof (struct HashTbl));
if (H == NULL)
FatalError ("Out of space!!!");
H->TableSize = NextPrime (TableSize);
/* Allocate array of lists */
H->TheLists = malloc (sizeof (List) * H->TableSize);
if (H->TheLists == NULL)
FatalError ("Out of space!!!");
4. The variable TheLists contains the base address of the array of lists formed.
5. Each slot of the array contains the address of the header ListNode of a list.
6. The Next pointer of each header node is set to NULL.
Insert routine for separate chaining
void Insert (ElementType Key, HashTable H)
{
Position Pos, NewCell;
List L;
Pos = Find (Key, H);
if (Pos == NULL) /* Key is not found */
{
NewCell = malloc (sizeof (struct ListNode));
if (NewCell == NULL)
FatalError ("Out of space!!!");
else
{
L = H->TheLists [Hash (Key, H->TableSize)];
NewCell->Next = L->Next;
NewCell->Element = Key; /* Probably need strcpy! */
L->Next = NewCell;
}
}
}
Explanation
1. This function has two parameters namely the element to be inserted and the
address of the hashtable.
2. Find function is invoked to check whether the element to be inserted is
present already. If it is present already, it is not inserted again.
3. If find function returns NULL, it implies that the element is not found in the
hash table. Hence insertion takes place.
4. The variable NewCell contains the address of the new node created.
5. The variable L contains the address of the header node of the linked list
where the element is to be inserted.
6. The element is inserted into the new node. The new node is inserted into the
beginning of the list.
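The Insert routine above relies on a Find function that is not shown in these notes; a sketch consistent with the declarations above would be:

Position Find (ElementType Key, HashTable H)
{
    Position P;
    List L;
    L = H->TheLists [Hash (Key, H->TableSize)];   /* header of the proper list */
    P = L->Next;
    while (P != NULL && P->Element != Key)        /* walk down the chain */
        P = P->Next;
    return P;                                     /* NULL if Key is absent */
}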
2. OPEN ADDRESSING
Open Addressing is an alternative method to resolve collision with linked lists. If a
collision occurs, alternative cells are tried until an empty cell is found. Because all
the data go inside the table, a bigger table is needed for open addressing hashing
than for separate chaining hashing.
There are three methods in open addressing. They are:
i. Linear Probing
ii. Quadratic Probing
iii. Double Hashing
2.1. LINEAR PROBING
In linear probing, when a collision occurs, the following cells are examined sequentially until an empty one is found. We will use the division hash function; that means the keys are placed using
H(key) = key % tablesize
For instance the element 131 can be placed at
H(key) = 131%10
=1
131 is placed at the Index 1. Continuing in this fashion we will place 4, 8 and 7.
Index Key
0 NULL
1 131
2 NULL
3 NULL
4 4
5 NULL
6 NULL
7 7
8 8
9 NULL
Now the next key to be inserted is 21. According to the hash function,
H(key) = 21 % 10 = 1
But the index 1 location is already occupied by 131, i.e., a collision occurs. To resolve the collision we move linearly down to the next empty location, so 21 will be placed at index 2. If the next element is 5, we put it directly at index 5.
The Hash table after the insertion of 21 and 5 is given below.
Index Key
0 NULL
1 131
2 21
3 NULL
4 4
5 5
6 NULL
7 7
8 8
9 NULL
Index Key
0 NULL
1 131
2 21
3 31
4 4
5 5
6 61
7 7
8 8
9 NULL

(The keys 31 and 61 also hash to index 1; linear probing places them in the next free slots, at indices 3 and 6.)
The next record key that comes is 9. According to the division hash function it demands index 9, so we place 9 at index 9. The final record key is 29, and it also hashes to index 9. But index 9 is already occupied, and since the table size is limited to index 9, there is no next empty bucket: overflow occurs.
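The linear probing scheme just described can be sketched as a small C routine. The names table, EMPTY and TABLE_SIZE are assumptions for illustration; note that this sketch wraps around to the start of the table, a common variant, whereas the example above stops at the last index:

#define TABLE_SIZE 10
#define EMPTY -1

int table[TABLE_SIZE];

/* returns the index where key was placed, or -1 on overflow */
int insertLinear(int key)
{
    int start = key % TABLE_SIZE;
    int i, pos;
    for(i = 0; i < TABLE_SIZE; i++)
    {
        pos = (start + i) % TABLE_SIZE;   /* probe the next cell, with wraparound */
        if(table[pos] == EMPTY)
        {
            table[pos] = key;
            return pos;
        }
    }
    return -1;                            /* table full: overflow */
}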
Problem with linear probing
One major problem with linear probing is primary clustering: blocks of occupied cells build up in the hash table as collisions are resolved, and any key that hashes into a cluster requires several probes before being placed at its end, which makes the cluster grow further.
19 % 10 = 9, 18 % 10 = 8, 39 % 10 = 9, 29 % 10 = 9, 8 % 10 = 8

Index Key
0 39
1 29    <- a cluster is formed
2 8
3
4
5       <- rest of the table is empty
6
7
8 18
9 19
2.2. QUADRATIC PROBING
In quadratic probing, when a collision occurs at the home bucket H(key), the cells at (H(key) + 1²) % tablesize, (H(key) + 2²) % tablesize, (H(key) + 3²) % tablesize, ... are examined in turn until an empty cell is found.
Example:
If we have to insert the following elements in a hash table of size 10:
37, 90, 55, 22, 11, 17, 49, 87, we fill the hash table step by step.
37 % 10 = 7, 90 % 10 = 0, 55 % 10 = 5, 22 % 10 = 2, 11 % 10 = 1
0 90
1 11
2 22
3
4
5 55
6
7 37
8
9
Next, 17 % 10 = 7 collides with 37. Probing (17 + 1²) % 10 = 8 finds a free cell, so 17 is placed at index 8; 49 % 10 = 9 is free, so 49 goes to index 9:
8 17
9 49
Finally, 87 % 10 = 7, which is already occupied, so we probe:
(87 + 0) % 10 = 7 ... already occupied
(87 + 1²) % 10 = 8 ... already occupied
(87 + 2²) % 10 = 1 ... already occupied
(87 + 3²) % 10 = 6 ... empty, so 87 is placed at index 6

Index Key
0 90
1 11
2 22
3
4
5 55
6 87
7 37
8 17
9 49
It is observed that to be able to place all the necessary elements in the hash table, the size of the divisor (m) should be about twice as large as the total number of elements, i.e., the table should be kept at least half empty.
Initialization routine for closed (open addressing) hashing
HashTable InitializeTable (int TableSize)
{
HashTable H;
int i;
if (TableSize < MinTableSize)
{
Error ("Table size too small");
return NULL;
}
/* Allocate table */
H = malloc (sizeof (struct HashTbl));
if (H == NULL) FatalError ("Out of space!!!");
H->TableSize = NextPrime (TableSize);
/* Allocate array of Cells */
H->TheCells = malloc (sizeof (Cell) * H ->TableSize);
if (H->TheCells == NULL) FatalError ("Out of space!!!");
for (i = 0; i < H->TableSize; i++ )
H->TheCells [i].Info = Empty;
return H;
}
Explanation
1. This function takes the tablesize as the parameter.
2. H contains the address where the structure hashtable is created in the memory.
3. The table size is made to be prime.
4. The variable “TheCells” contains the base address of the array of the cell
formed.
5. For every cell in the array, the enum value empty is assigned.
Find routine for hash tables with quadratic probing
Position Find (ElementType Key, HashTable H)
{
Position CurrentPos;
int CollisionNum;
CollisionNum = 0;
CurrentPos = Hash (Key, H->TableSize);
while (H->TheCells [CurrentPos].Info != Empty &&
H->TheCells [CurrentPos].Element != Key)
/* Probably need strcmp!! */
{
CurrentPos += 2 * ++CollisionNum - 1;
if (CurrentPos >= H->TableSize)
CurrentPos -= H->TableSize;
}
return CurrentPos;
}
Explanation
1. This function has two parameters namely the element to be found and the
address of the hashtable.
2. Hash function is called which returns the index where the data is to be
found. The index is stored in currentpos.
3. It checks that index for the data. If the element is not found there, it checks the next possible index. This is done using the formula
CurrentPos = CurrentPos + 2 * ++CollisionNum - 1, where CollisionNum is
initialized to 0.
4. The function returns the index value of the array where the element is
present.
Insert routine for hash tables with quadratic probing
void Insert( ElementType Key, HashTable H )
{
Position Pos;
Pos =Find( Key, H );
if( H—>TheCells[Pos ].Info ! = Legitimate )
{
H—>TheCells[ Pos ].Info = Legitimate;
H->TheCells [Pos].Element =Key;
}
}
Explanation
1. This function has two parameters namely, element to be inserted in the key and
the address of the hash table H.
2. Find Function is invoked to find the index where the data is to be inserted.
3. If the info in the TheCells is not legitimate, it is assigned legitimate. And the
key value is inserted into that position.
Disadvantages of Quadratic Probing
• The table size must be prime and the table must be kept at least half empty; only then is a new element guaranteed to find an empty cell.
• Standard deletion cannot be performed in an open addressing hash table
because the cell might have caused a collision to go past it.
• Secondary clustering problem - elements hash to the same position will
probe the same alternative cells.
2.3. DOUBLE HASHING
Double hashing is a technique in which a second hash function is applied to the key when a collision occurs. Applying the second hash function gives the number of positions from the point of collision at which to try the insertion.
There are two important rules to be followed for the second function:
• It must never evaluate to zero.
• It must make sure that all cells can be probed.
A popular second hash function is H2(key) = R - (key % R), where R is a prime number smaller than the size of the table.
Example:
Insert the keys 37, 90, 45, 22, 17, 49 into a hash table of size 10 using H1(key) = key % 10 and H2(key) = 7 - (key % 7). The keys 37, 90, 45, 22 and 49 go to their home buckets 7, 0, 5, 2 and 9. The key 17 collides with 37 at index 7, and
H2(17) = 7 - (17 % 7) = 7 - 3 = 4
That means we have to insert the element 17 at 4 places (jumps) from 37. Therefore 17 will be placed at index (7 + 4) % 10 = 1.
0 90
1 17
2 22
3
4
5 45
6
7 37
8
9 49
Insert the number 55:
H1(55) = 55 % 10 = 5
H2(55) = 7 - (55 % 7) = 7 - 6 = 1
We have to take one jump from index 5 to place 55. Finally the hash table looks
like this.
0 90
1 17
2 22
3
4
5 45
6 55
7 37
8
9 49
Double hashing is more complex to implement than quadratic probing, and quadratic probing is a faster technique than double hashing.
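The probe sequence described above can be sketched in C with H1 and H2 as in the example. The names table, EMPTY and TABLE_SIZE are assumptions for illustration; for the probe sequence to reach every cell, the table size should ideally be prime:

#define TABLE_SIZE 10
#define R 7                       /* prime smaller than the table size */
#define EMPTY -1

int table[TABLE_SIZE];

/* returns the index where key was placed, or -1 if no free cell was reached */
int insertDoubleHash(int key)
{
    int h1 = key % TABLE_SIZE;    /* home bucket */
    int h2 = R - (key % R);       /* jump size: never evaluates to zero */
    int i, pos;
    for(i = 0; i < TABLE_SIZE; i++)
    {
        pos = (h1 + i * h2) % TABLE_SIZE;
        if(table[pos] == EMPTY)
        {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}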
REHASHING
Rehashing is a technique in which the table is resized: the size of the table is (at least) doubled by creating a new table. It is preferable that the new table size is a prime number. There are situations in which rehashing is required:
When the table is completely full.
With quadratic probing, when the table is half full.
When insertions fail due to overflow.
In such situations, we have to transfer entries from old table to the new table by
re-computing their positions using suitable hash functions.
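Using the declarations from the open addressing section, the transfer can be sketched as follows (Cell, Legitimate, InitializeTable and Insert are assumed to be the types and routines shown earlier; InitializeTable already rounds the size up to a prime):

HashTable Rehash (HashTable H)
{
    int i, OldSize;
    Cell *OldCells;
    OldCells = H->TheCells;
    OldSize = H->TableSize;
    /* Get a new, empty table of about twice the size */
    H = InitializeTable (2 * OldSize);
    /* Scan the old table, re-inserting each legitimate entry */
    for (i = 0; i < OldSize; i++)
        if (OldCells [i].Info == Legitimate)
            Insert (OldCells [i].Element, H);
    free (OldCells);
    return H;
}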
Consider that we have to insert the elements 37, 90, 55, 22, 17, 49 and 87. The table size is 10 and we will use the hash function
H(key) = key mod tablesize
37 % 10 = 7
90 % 10 = 0
55 % 10 = 5
22 % 10 = 2
17 % 10 = 7 (collision, solved by linear probing: placed at 8)
49 % 10 = 9
0 90
1
2 22
3
4
5 55
6
7 37
8 17
9 49
Now the table is almost full: inserting 87 gives 87 % 10 = 7, and indices 7, 8 and 9 are all occupied, so collisions pile up and further insertions will eventually fail. Hence we will rehash by doubling the table size. The old table size is 10; doubling gives 20, but 20 is not a prime number, so we prefer to make the new table size 23.
The hash function will be H(key) = key mod 23
37 % 23 = 14, 90 % 23 = 21, 55 % 23 = 9, 22 % 23 = 22, 17 % 23 = 17, 49 % 23 = 3, 87 % 23 = 18
00
01
02
03 49
04
05
06
07
08
09 55
10
11
12
13
14 37
15
16
17 17
18 87
19
20
21 90
22 22
EXTENDIBLE HASHING
Extendible hashing is used when the amount of data is too large to fit in main memory. The structure below assumes the data consists of six-bit integers and that each leaf (disk block) holds up to M = 4 elements.
The root of the tree contains four pointers determined by the leading two bits of the data. Each leaf has up to M = 4 elements. It happens that in each leaf the first two bits are identical; this is indicated by the number in parentheses.
D will represent the number of bits used by the root, which is sometimes known as the directory. The number of entries in the directory is 2^D. d_L is the number of leading bits that all the elements of some leaf L have in common.
Suppose that we want to insert the key 100100. This would go into the third leaf,
but as the third leaf is already full, there is no room. We thus split this leaf into two
leaves, which are now determined by the first three bits. This requires increasing
the directory size to 3. These changes are reflected in the figure below.
All of the leaves not involved in the split are now pointed to by two adjacent
directory entries. Thus, although an entire directory is rewritten, none of the other
leaves is actually accessed. If the key 000000 is now inserted, then the first leaf is
split, generating two leaves with d_L = 3. Since D = 3, the only change required in
the directory is the updating of the 000 and 001 pointers. This is given in the figure
below:
This very simple strategy provides quick access times for Insert and Find
operations on large databases. There are a few important details we have not
considered.