Unit 5

The document provides an overview of various searching and sorting techniques, including linear search, binary search, and several sorting algorithms such as bubble sort, insertion sort, selection sort, shell sort, and radix sort. It explains the principles, advantages, and disadvantages of each method, along with example code snippets for implementation. Additionally, it discusses hashing techniques and their applications, highlighting the importance of algorithm efficiency in terms of time and space complexity.


UNIT V SEARCHING, SORTING AND HASHING TECHNIQUES 9

Searching - Linear Search - Binary Search - Sorting – Bubble sort - Insertion sort - Selection
sort – Shell sort – Radix sort. Hashing - Hash functions - Open Hashing - Separate Chaining
- Closed Hashing - Linear Probing, Quadratic Probing, Double Hashing, Random Probing,
Rehashing, Extendible Hashing, Applications – Dictionary - Telephone directory.

SEARCHING
Searching is the process of checking whether a particular element is present in a list.
Types of searching:
Linear search
Binary search
Linear Search
Linear search examines the data items in the given set one by one, in sequential order starting from the
first element, until the required item is found. It is also called sequential search.

Linear Search routine:


void Linear_search ( int a[ ], int n, int search )
{
    int i, count = 0;
    for ( i = 0 ; i < n ; i ++ )
    {
        if ( a[ i ] == search )
        {
            count ++ ;
        }
    }
    if ( count == 0 )
        cout << "Element not Present" ;
    else
        cout << "Element is Present in list" ;
}

Program for Linear search


#include <iostream>
using namespace std;

int main( )
{
    int a[ 10 ], n, i, search, count = 0;
    cout << "Enter the number of elements ";
    cin >> n;
    cout << "Enter " << n << " numbers ";
    for ( i = 0; i < n; i ++ )
        cin >> a[ i ];
    cout << "\nArray Elements ";
    for ( i = 0; i < n; i ++ )
        cout << a[ i ] << " ";
    cout << "\nEnter the Element to be searched: \t";
    cin >> search;
    for ( i = 0; i < n; i ++ )
    {
        if ( search == a[ i ] )
            count ++;
    }
    if ( count == 0 )
        cout << "\nElement " << search << " is not present in the array";
    else
        cout << "\nElement " << search << " is present " << count << " times in the array";
    return 0;
}
OUTPUT:
Enter the number of elements 5
Enter the numbers
20 10 5 25 100
Array Elements
20 10 5 25 100
Enter the Element to be searched: 25
Element 25 is present 1 times in the array
Advantages of Linear search:

• The linear search is simple - It is very easy to understand and implement;

• It does not require the data in the array to be stored in any particular order.
Disadvantages of Linear search:

• Slower than many other search algorithms; in the worst case every element must be examined.

• It has poor efficiency for large lists, taking O(n) time.


Binary Search
Binary search is used to search an item in a sorted list. In this method , initialize the lower limit and
upper limit.
The middle position is computed as (first+last)/2 and check the element in the middle position with the
data item to be searched.
If the data item is greater than the middle value then the lower limit is adjusted to one greater than the
middle value.Otherwise the upper limit is adjusted to one less than the middle value.
Working principle:
The algorithm is quite simple. It can be done either recursively or iteratively:

1. Get the middle element;

2. If the middle element equals the searched value, the algorithm stops;
3. Otherwise, two cases are possible:
o The searched value is less than the middle element. In this case, repeat step
1 for the part of the array before the middle element.
o The searched value is greater than the middle element. In this case, repeat
step 1 for the part of the array after the middle element.

Example 1.
Find 6 in {-1, 5, 6, 18, 19, 25, 46, 78, 102, 114}.
Step 1 (middle element is 19 > 6): -1 5 6 18 19 25 46 78 102 114
Step 2 (middle element is 5 < 6): -1 5 6 18 19 25 46 78 102 114
Step 3 (middle element is 6 == 6): -1 5 6 18 19 25 46 78 102 114

Binary Search routine:


void Binary_search ( int a[ ], int n, int search )
{
    int first = 0, last = n - 1, mid;
    while ( first <= last )
    {
        mid = ( first + last ) / 2;
        if ( search > a[ mid ] )
            first = mid + 1;
        else if ( search == a[ mid ] )
        {
            cout << "Element is present in the list";
            break;
        }
        else
            last = mid - 1;
    }
    if ( first > last )
        cout << "Element Not Found";
}
Program for Binary Search:
#include <iostream>
using namespace std;

void Binary_search ( int a[ ], int n, int search );

int main( )
{
    int a[ 10 ], n, i, search;
    cout << "Enter the number of elements \t";
    cin >> n;
    cout << "\nEnter the numbers\n";
    for ( i = 0; i < n; i ++ )
        cin >> a[ i ];
    cout << "\nArray Elements\n";
    for ( i = 0; i < n; i ++ )
        cout << a[ i ] << "\t";
    cout << "\n\nEnter the Element to be searched:\t";
    cin >> search;
    Binary_search( a, n, search );
    return 0;
}

void Binary_search ( int a[ ], int n, int search )
{
    int first = 0, last = n - 1, mid;
    while ( first <= last )
    {
        mid = ( first + last ) / 2;
        if ( search > a[ mid ] )
            first = mid + 1;
        else if ( search == a[ mid ] )
        {
            cout << "Element is present in the list";
            break;
        }
        else
            last = mid - 1;
    }
    if ( first > last )
        cout << "Element Not Found";
}

SORTING:
Definition:
Sorting is a technique for arranging data in a particular order.
Order of sorting:
Order means the arrangement of data. The sorting order can be ascending or descending. The ascending
order means arranging the data in increasing order and descending order means arranging the data in
decreasing order.
Types of Sorting
Internal Sorting
External Sorting
Internal Sorting
Internal sorting is a type of sorting technique in which the data resides in the main memory of the computer. It is
applicable when the number of elements in the list is small.
E.g. Bubble sort, Insertion sort, Shell sort, Quick sort, Selection sort, Radix sort
External Sorting
External sorting is a type of sorting technique used when there is a huge amount of data that resides on
secondary devices (for e.g. hard disk, magnetic tape and so on) while sorting.
E.g. Merge sort, Multiway merge sort, Polyphase merge sort
Sorting can be classified based on:
1. Computational complexity
2. Memory utilization
3. Stability
4. Number of comparisons
ANALYSIS OF ALGORITHMS:
Efficiency of an algorithm can be measured in terms of:

Space Complexity: Refers to the space required to execute the algorithm


Time Complexity: Refers to the time required to run the program.
Sorting algorithms:
Insertion sort
Selection sort
Shell sort
Bubble sort
Quick sort
Merge sort
Radix sort
INSERTION SORTING:
The insertion sort works by taking elements from the list one by one and inserting them in
the correct position into a new sorted list.
Insertion sort consists of N-1 passes, where N is the number of elements to be sorted. The ith
pass will insert the ith element A[i] into its rightful place among A[1],A[2],…,A[i-1].
After doing this insertion the elements occupying A[1],…A[i] are in sorted order.

How Insertion sort algorithm works?

Insertion Sort routine:


void Insertion_sort ( int a[ ], int n )
{
    int i, j, temp;
    for ( i = 1 ; i < n ; i ++ )
    {
        temp = a[ i ];
        for ( j = i ; j > 0 && a[ j - 1 ] > temp ; j -- )
        {
            a[ j ] = a[ j - 1 ];
        }
        a[ j ] = temp;
    }
}
Program for Insertion sort
#include <iostream>
using namespace std;

int main( )
{
    int n, a[ 25 ], i, j, temp;
    cout << "Enter number of elements ";
    cin >> n;
    cout << "Enter " << n << " integers \n";
    for ( i = 0; i < n; i ++ )
        cin >> a[ i ];
    for ( i = 1; i < n; i ++ )
    {
        temp = a[ i ];
        for ( j = i; j > 0 && a[ j - 1 ] > temp; j -- )
        {
            a[ j ] = a[ j - 1 ];
        }
        a[ j ] = temp;
    }
    cout << "Sorted list in ascending order: ";
    for ( i = 0; i < n; i ++ )
        cout << a[ i ] << " ";
    return 0;
}
OUTPUT:
Enter number of elements 6
Enter 6 integers
20 10 60 40 30 15
Sorted list in ascending order:
10 15 20 30 40 60
Advantage of Insertion sort
Simple implementation.
Efficient for (quite) small data sets.
Efficient for data sets that are already substantially sorted.

Disadvantages of Insertion sort

It is less efficient on lists containing a large number of elements.
As the number of elements increases, the performance of the program slows down. Insertion sort
needs a large number of element shifts.
Selection Sort
Selection sort selects the smallest element in the list and place it in the first position then selects the
second smallest element and place it in the second position and it proceeds in the similar way until the
entire list is sorted. For “n” elements, (n-1) passes are required. At the end of the ith iteration, the ith
smallest element will be placed in its correct position.

Selection Sort routine:


void Selection_sort ( int a[ ], int n )
{
    int i, j, temp, position;
    for ( i = 0 ; i < n - 1 ; i ++ )
    {
        position = i;
        for ( j = i + 1 ; j < n ; j ++ )
        {
            if ( a[ position ] > a[ j ] )
                position = j;
        }
        temp = a[ i ];
        a[ i ] = a[ position ];
        a[ position ] = temp;
    }
}

How Selection sort algorithm works?

Program for Selection sort


#include <iostream>
using namespace std;

int main( )
{
    int a[ 100 ], n, i, j, position, temp;
    cout << "Enter number of elements ";
    cin >> n;
    cout << "Enter " << n << " integers ";
    for ( i = 0; i < n; i ++ )
        cin >> a[ i ];
    for ( i = 0; i < n - 1; i ++ )
    {
        position = i;
        for ( j = i + 1; j < n; j ++ )
        {
            if ( a[ position ] > a[ j ] )
                position = j;
        }
        if ( position != i )
        {
            temp = a[ i ];
            a[ i ] = a[ position ];
            a[ position ] = temp;
        }
    }
    cout << "Sorted list in ascending order: ";
    for ( i = 0; i < n; i ++ )
        cout << a[ i ] << " ";
    return 0;
}
OUTPUT:
Enter number of elements 5
Enter 5 integers
8 3 9 5 1
Sorted list in ascending order:
1 3 5 8 9
Advantages of selection sort

• Memory required is small.

• Selection sort is useful when you have limited memory available.

• Relatively efficient for small arrays.


Disadvantage of selection sort
• Poor efficiency when dealing with a huge list of items.
• The selection sort requires n-squared number of steps for sorting n elements.
• The selection sort is only suitable for a list of few elements that are in random order.
Shell Sort
• Invented by Donald Shell.
• It improves upon bubble sort and insertion sort by moving out of order elements more than
one position at a time.
• In shell sort the whole array is first fragmented into K segments, where K is preferably a
prime number.
• After the first pass the whole array is partially sorted.
• In the next pass, the value of K is reduced which increases the size of each segment and
reduces the number of segments.
• The next value of K is chosen so that it is relatively prime to its previous value.
• The process is repeated until K=1 at which the array is sorted.
• The insertion sort is applied to each segment so each successive segment is partially sorted.
• The shell sort is also called the Diminishing Increment sort, because the value of k decreases
continuously

A Shell Sort with Increments of Three

A Shell Sort after Sorting Each Sublist


Shell Sort: A Final Insertion Sort with Increment of 1
Shell Sort routine:
void Shell_sort ( int a[ ], int n )
{
    int i, j, k, temp;
    for ( k = n / 2 ; k > 0 ; k = k / 2 )
    {
        for ( i = k ; i < n ; i ++ )
        {
            temp = a[ i ];
            for ( j = i ; j >= k && a[ j - k ] > temp ; j = j - k )
            {
                a[ j ] = a[ j - k ];
            }
            a[ j ] = temp;
        }
    }
}
Program for Shell sort
#include <iostream>
using namespace std;

int main( )
{
    int n, a[ 25 ], i, j, k, temp;
    cout << "Enter number of elements ";
    cin >> n;
    cout << "Enter " << n << " integers ";
    for ( i = 0; i < n; i ++ )
        cin >> a[ i ];
    for ( k = n / 2; k > 0; k = k / 2 )
    {
        for ( i = k; i < n; i ++ )
        {
            temp = a[ i ];
            for ( j = i; j >= k && a[ j - k ] > temp; j = j - k )
            {
                a[ j ] = a[ j - k ];
            }
            a[ j ] = temp;
        }
    }
    cout << "Sorted list in ascending order using shell sort: ";
    for ( i = 0; i < n; i ++ )
        cout << a[ i ] << " ";
    return 0;
}
OUTPUT:
Enter number of elements 10
Enter 10 integers
81 94 11 96 12 35 17 95 28 58
Sorted list in ascending order using shell sort:
11 12 17 28 35 58 81 94 95 96
Advantages of Shell sort

• Efficient for medium-size lists.


Disadvantages of Shell sort

• Complex algorithm, not nearly as efficient as the merge, heap and quick sorts
Bubble Sort
Bubble sort is one of the simplest internal sorting algorithms.
Bubble sort works by comparing two consecutive elements; the larger of the two "bubbles" towards the right. At
the end of the first pass the largest element gets sorted and placed at the end of the list.
This process is repeated for all pairs of elements, so that in each iteration the largest remaining element moves
to the end of the unsorted part of the list.
Bubble sort consists of (n-1) passes, where n is the number of elements to be sorted.
In the 1st pass the largest element will be placed in the nth position.
In the 2nd pass the second largest element will be placed in the (n-1)th position. In the (n-1)th pass only the
first two elements are compared.
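The passes described above can be sketched as a short routine in the style of the other sorts in this unit (a sketch for illustration, not code from the original text):

```cpp
// Bubble sort: after pass p, the p largest elements occupy
// the last p positions of the array.
void Bubble_sort ( int a[ ], int n )
{
    for ( int pass = 0 ; pass < n - 1 ; pass ++ )      // (n-1) passes
    {
        for ( int j = 0 ; j < n - 1 - pass ; j ++ )    // compare consecutive elements
        {
            if ( a[ j ] > a[ j + 1 ] )                 // larger value bubbles right
            {
                int temp = a[ j ];
                a[ j ] = a[ j + 1 ];
                a[ j + 1 ] = temp;
            }
        }
    }
}
```

Because the largest remaining element is already in place after each pass, the inner loop shrinks by one comparison per pass.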

Radix Sort
Radix sort is one of the linear sorting algorithms. It is a generalized form of bucket sort. It can be
performed using buckets from 0 to 9.
It is also called bin sort or card sort.
It works by sorting the input based on each digit. In the first pass all the elements are arranged according to
the least significant digit.
In the second pass the elements are arranged according to the next least significant digit, and so on, till the
most significant digit.
The number of passes in a radix sort depends upon the number of digits in the given numbers.

Algorithm for Radix sort

Step 1: Consider 10 buckets (1 for each digit 0 to 9)
Step 2: Consider the LSD (Least Significant Digit) of each number, i.e. the digit in the one's place (e.g.,
in 43 the LSD is 3)
Step 3: Place the elements in their respective buckets according to the LSD of each number
Step 4: Write the numbers from the buckets (0 to 9), bottom to top
Step 5: Repeat the same process with the digits in the 10's place (e.g., in 43 that digit is 4)
Step 6: Repeat the same steps till all the digits of the given numbers are considered

Consider the following numbers to be sorted using Radix sort.


Sorted list of array : 3 15 27 31 37 43 80
Routine for Radix sort
void Radix_sort ( int a[ ], int n )
{
    int bucket[ 10 ][ 10 ], buck[ 10 ];
    int i, j, k, l, num, div, large, passes;
    div = 1;
    num = 0;
    large = a[ 0 ];
    for ( i = 0 ; i < n ; i ++ )          /* find the largest element */
    {
        if ( a[ i ] > large )
        {
            large = a[ i ];
        }
    }
    while ( large > 0 )                   /* count its digits */
    {
        num ++;
        large = large / 10;
    }
    for ( passes = 0 ; passes < num ; passes ++ )   /* one pass per digit */
    {
        for ( k = 0 ; k < 10 ; k ++ )
        {
            buck[ k ] = 0;                /* empty all buckets */
        }
        for ( i = 0 ; i < n ; i ++ )
        {
            l = ( ( a[ i ] / div ) % 10 );              /* current digit */
            bucket[ l ][ buck[ l ] ++ ] = a[ i ];
        }
        i = 0;
        for ( k = 0 ; k < 10 ; k ++ )     /* collect bucket contents in order */
        {
            for ( j = 0 ; j < buck[ k ] ; j ++ )
            {
                a[ i ++ ] = bucket[ k ][ j ];
            }
        }
        div *= 10;
    }
}
Advantages of Radix sort:

• Fast and complexity does not depend on the number of data.

• Radix Sort is very simple.


Disadvantages of Radix sort:

• Radix Sort takes more space than other sorting algorithms, since in addition to the
array that will be sorted, you need to have a sub list for each of the possible digits
or letters.

• Since Radix Sort depends on the digits or letters, Radix Sort is also much less
flexible than other sorts.
Collision:
If two or more keys hash to the same index, the corresponding records cannot be stored in the same
location. This condition is known as a collision.
Characteristics of a Good Hashing Function:
It should be simple to compute.
The number of collisions should be low while placing records in the hash table.
A hash function with no collisions is called a perfect hash function.
The hash function should produce keys which are distributed uniformly in the hash table.
The hash function should depend upon every bit of the key. Thus a hash function that
simply extracts a portion of the key is not suitable.
Collision Resolution Strategies / Techniques (CRT):
If a collision occurs, it should be handled or overcome by applying some technique. Such a
technique is called a CRT.
There are a number of collision resolution techniques, but the most popular are:
Separate chaining (Open Hashing)
Open addressing (Closed Hashing)
Linear Probing
Quadratic Probing
Double Hashing
Separate chaining (Open Hashing)
An open hashing technique.
Implemented using the singly linked list concept; a pointer (ptr) field is added to each record.
When a collision occurs, a separate chain is maintained for the colliding data. Each new element is inserted in
front of the list.
H (key) = key % table size
Two operations are supported:
Insert
Find
Structure Definition for Node

typedef struct node *Position;
struct node                /* defines the nodes */
{
    int data;
    Position next;
};

Structure Definition for Hash Table

typedef Position List;
struct Hashtbl             /* defines the hash table, which contains an array of linked lists */
{
    int Tablesize;
    List *theLists;
};
Insert the following four keys 22 84 35 62 into a hash table of size 10 using separate chaining. The hash
function is
H(key) = key % 10
1. H(22) = 22 % 10 = 2
2. H(84) = 84 % 10 = 4
3. H(35) = 35 % 10 = 5
4. H(62) = 62 % 10 = 2 (collides with 22, so 62 is chained at index 2, inserted in front of the list)
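A minimal working sketch of the two operations, based on the node structure above (the fixed table size and the helper names `insert` and `find` are illustrative assumptions, not from the text):

```cpp
#include <cstddef>

// Separate chaining (sketch): one singly linked chain per table slot.
const int TABLE_SIZE = 10;

struct node { int data; node *next; };
node *theLists[ TABLE_SIZE ];        // all chains start out empty (NULL)

int Hash ( int key ) { return key % TABLE_SIZE; }

void insert ( int key )              // new element goes in front of the chain
{
    int h = Hash( key );
    node *p = new node;
    p->data = key;
    p->next = theLists[ h ];
    theLists[ h ] = p;
}

bool find ( int key )                // walk the chain at the hashed slot
{
    for ( node *p = theLists[ Hash( key ) ]; p != NULL; p = p->next )
        if ( p->data == key )
            return true;
    return false;
}
```

Inserting 22, 84, 35, 62 as in the example puts 22 and 62 on the same chain at index 2, with 62 (the later insert) at the front.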

Advantages
1. More elements can be inserted, using an array of linked lists.
Disadvantages

1. It requires more pointers, which occupy more memory space.

2. Search takes time, since it takes time to evaluate the hash function and also to traverse the
list.
Open Addressing (Closed Hashing)
A collision resolution technique that uses Hi(X) = (Hash(X) + F(i)) mod Tablesize.
When a collision occurs, alternative cells are tried until an empty cell is found. Types:

▪ Linear Probing

▪ Quadratic Probing

▪ Double Hashing

Hash function:

▪ H(key) = key % table size.

Insert Operation (for separate chaining)
▪ To insert a key, use the hash function to identify the list to which the element
should be inserted.

▪ Then traverse the list to check whether the element is already present.
▪ If it exists, increment the count.

▪ Else the new element is placed at the front of the list.

Linear Probing:
The easiest method to handle collisions.
Apply the hash function H(key) = key % table size, then Hi(X) = (Hash(X) + F(i)) mod Tablesize, where
F(i) = i.
How probing works:
First probe: given a key k, hash to H(key).
Second probe: if H(key) + f(1) is occupied, try H(key) + f(2), and so forth.
Probing properties:
We force f(0) = 0.
The ith probe is to (H(key) + f(i)) % table size. If i reaches size-1, the probe has failed.
Depending on f(i), the probe may fail sooner. Long sequences of probes are costly.
The probe sequence is:
H(key) % table size
(H(key) + 1) % table size
(H(key) + 2) % table size
...

1. H(key) = key mod Tablesize

This is the common formula that you should apply for any hashing. If a collision occurs, use formula 2.
2. H(key) = (H(key) + i) mod Tablesize
where i = 1, 2, 3, ... etc.
Example: insert 89 18 49 58 69; Tablesize = 10
1. H(89) = 89 % 10 = 9
2. H(18) = 18 % 10 = 8
3. H(49) = 49 % 10 = 9 (collides with 89, so try the next free cell using formula 2)
h1(49) = (H(49) + 1) % 10 = (9 + 1) % 10 = 10 % 10 = 0
4. H(58) = 58 % 10 = 8 (collides with 18)
h1(58) = (H(58) + 1) % 10 = (8 + 1) % 10 = 9 => again a collision (89), so i = 2
h2(58) = (H(58) + 2) % 10 = (8 + 2) % 10 = 10 % 10 = 0 => again a collision (49), so i = 3
h3(58) = (H(58) + 3) % 10 = (8 + 3) % 10 = 11 % 10 = 1 => 58 is placed at index 1
5. H(69) = 69 % 10 = 9 => collisions at indices 9, 0 and 1, so i = 3
h3(69) = (H(69) + 3) % 10 = (9 + 3) % 10 = 12 % 10 = 2 => 69 is placed at index 2

Index  EMPTY  after 89  after 18  after 49  after 58  after 69
0                                 49        49        49
1                                           58        58
2                                                     69
3
4
5
6
7
8             	        18        18        18        18
9             89        89        89        89        89
Linear probing
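The probe sequence above can be sketched as an insert routine (a sketch assuming -1 marks an empty cell; the names are illustrative):

```cpp
// Linear probing insert (sketch): probe H(key), H(key)+1, ... mod table size
// until an empty cell is found.
const int SIZE = 10;
const int EMPTY = -1;

void Linear_insert ( int table[ ], int key )
{
    int h = key % SIZE;
    for ( int i = 0 ; i < SIZE ; i ++ )    // at most SIZE probes; F(i) = i
    {
        int pos = ( h + i ) % SIZE;
        if ( table[ pos ] == EMPTY )
        {
            table[ pos ] = key;
            return;
        }
    }
    // if we fall through, the table is full and the probe has failed
}
```

Inserting 89, 18, 49, 58, 69 into an empty table of size 10 reproduces the worked example: 89 lands at 9, 18 at 8, 49 at 0, 58 at 1 and 69 at 2.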

Quadratic Probing
To resolve the primary clustering problem, quadratic probing can be used. With quadratic probing,
rather than always moving one spot, move i^2 spots from the point of collision, where i is the number
of attempts to resolve the collision.
It is another collision resolution method which distributes items more evenly.
From the original index H, if the slot is filled, try cells H+1^2, H+2^2, H+3^2, ..., H+i^2, with wrap-
around.
Hi(X) = (Hash(X) + F(i)) mod Tablesize, where F(i) = i^2, i.e. Hi(X) = (Hash(X) + i^2) mod Tablesize
Limitation: at most half of the table can be used as alternative locations to resolve collisions.
This means that once the table is more than half full, it is difficult to find an empty spot. This new
problem is known as secondary clustering, because elements that hash to the same hash key will
always probe the same alternative cells.
Double Hashing
Double hashing uses the idea of applying a second hash function to the key when a collision occurs.
The result of the second hash function is the number of positions from the point of collision at which to
insert.
There are a couple of requirements for the second function:
It must never evaluate to 0, and it must make sure that all cells can be probed.
Hi(X) = (Hash(X) + i * Hash2(X)) mod Tablesize
A popular second hash function is:
Hash2(key) = R - (key % R), where R is a prime number that is smaller than the size of the table.
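The two hash functions can be sketched as follows (a sketch; the table size 10 matches the earlier examples, and R = 7 is an illustrative prime smaller than the table size):

```cpp
// Double hashing (sketch): the ith probe lands at
// Hi(X) = (Hash1(X) + i * Hash2(X)) mod Tablesize.
const int TABLESIZE = 10;
const int R = 7;                              // prime smaller than the table size

int Hash1 ( int key ) { return key % TABLESIZE; }
int Hash2 ( int key ) { return R - ( key % R ); }   // in the range 1..R, never 0

int Probe ( int key, int i )                  // cell tried on the ith attempt
{
    return ( Hash1( key ) + i * Hash2( key ) ) % TABLESIZE;
}
```

For example, for key 49 we get Hash2(49) = 7 - 0 = 7, so the probe sequence is 9, 6, 3, 0, ... and, since the step size is nonzero, the probes spread across the table rather than clustering.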

Rehashing
Once the hash table gets too full, the running time for operations will start to take too long and insertions may
fail. To solve this problem, a table at least twice the size of the original is built and the elements
are transferred to the new table.
Advantages:
The programmer doesn't need to worry about the table size.
Simple to implement.
Can be used in other data structures as well.
The new size of the hash table:
should also be prime
will be used to calculate the new insertion spot (hence the name rehashing)
This is a very expensive operation: O(N), since there are N elements to rehash and the new table size is
roughly 2N. This is acceptable though, since it doesn't happen often.
When should rehashing be applied? Some possible answers:
once the table becomes half full
once an insertion fails
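The rehashing step can be sketched as follows (a sketch assuming an open-addressed table with linear probing and -1 for empty cells; the helper names `nextPrime` and `rehash` are illustrative):

```cpp
#include <vector>

// Rehashing (sketch): build a table of roughly twice the size -- here the
// smallest prime >= 2N, per the "should also be prime" rule -- and reinsert
// every element using the new table size.
bool isPrime ( int n )
{
    if ( n < 2 ) return false;
    for ( int d = 2; d * d <= n; d ++ )
        if ( n % d == 0 ) return false;
    return true;
}

int nextPrime ( int n )                    // smallest prime >= n
{
    while ( !isPrime( n ) ) n ++;
    return n;
}

std::vector<int> rehash ( const std::vector<int> &old )
{
    int newSize = nextPrime( 2 * (int) old.size( ) );
    std::vector<int> fresh( newSize, -1 );
    for ( int k : old )                    // O(N): reinsert each element
        if ( k != -1 )
        {
            int pos = k % newSize;         // recompute the spot with the new size
            while ( fresh[ pos ] != -1 )   // linear probing in the new table
                pos = ( pos + 1 ) % newSize;
            fresh[ pos ] = k;
        }
    return fresh;
}
```

Rehashing a size-7 table produces a table of size 17 (the next prime past 14), and each key lands at `key % 17`.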
Extendible Hashing
Extendible hashing is a mechanism for altering the size of the hash table to accommodate new entries
when buckets overflow.
The common strategy in internal hashing is to double the hash table and rehash each entry. However, this
technique is slow for tables stored on disk, because writing all pages to disk is too expensive.
Therefore, instead of doubling the whole hash table, we use a directory of pointers to buckets, and
double the number of buckets by doubling the directory, splitting just the bucket that overflows.
Since the directory is much smaller than the file, doubling it is much cheaper. Only one page of keys
and pointers is split.
