Unit 5
Searching - Linear Search - Binary Search - Sorting – Bubble sort - Insertion sort - Selection
sort – Shell sort – Radix sort. Hashing - Hash functions - Open Hashing - Separate Chaining
- Closed Hashing - Linear Probing, Quadratic Probing, Double Hashing, Random Probing,
Rehashing, Extendible Hashing, Applications – Dictionary - Telephone directory.
SEARCHING
Searching is the process of checking whether a particular element is present in a list.
Types of searching:
1. Linear Search
2. Binary Search
Linear Search
Linear search looks for a data item in the given set sequentially, starting from the first element
and comparing each element in turn with the item. It is also called sequential search.
count++;
}
if (count == 0)
    cout << "\n Element " << search << " is not present in the array";
else
    cout << "\n Element " << search << " is present " << count << " times in the array";
}
OUTPUT:
Enter the number of elements 5
Enter the numbers
20 10 5 25 100
Array Elements
20 10 5 25 100
Enter the Element to be searched: 25
Element 25 is present 1 times in the array
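The listing above shows only the tail of the program. A minimal self-contained sketch of the same counting search could look like the following; the function name countOccurrences is an assumption made for illustration and does not come from the original program.

```cpp
// Hypothetical helper (name assumed): returns how many times `search`
// occurs in a[0..n-1]. Every element is examined once, first to last.
int countOccurrences(const int a[], int n, int search) {
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (a[i] == search)
            count++;            // one more match found
    }
    return count;               // 0 means the element is not present
}
```

Called on the array 20 10 5 25 100 with search = 25, this sketch returns 1, which agrees with the output shown above.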
Advantages of Linear search:
• It does not require the data in the array to be stored in any particular order.
Disadvantages of Linear search:
Example 1.
Find 6 in {-1, 5, 6, 18, 19, 25, 46, 78, 102, 114}.
Step 1 (middle element is 19 > 6): search -1 5 6 18 19 25 46 78 102 114, then continue in the left half
Step 2 (middle element is 5 < 6): search -1 5 6 18, then continue in the right half
Step 3 (middle element is 6 == 6): search 6 18, element found
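The three steps of Example 1 can be sketched in code as follows. This is a minimal iterative version, assuming the array is already sorted in ascending order; the function name binarySearch and the choice to return -1 for a missing key are assumptions made for illustration.

```cpp
// Returns the index of `key` in the sorted array a[0..n-1], or -1 if absent.
int binarySearch(const int a[], int n, int key) {
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;   // middle of the current range
        if (a[mid] == key)
            return mid;                     // found
        else if (a[mid] > key)
            high = mid - 1;                 // continue in the left half
        else
            low = mid + 1;                  // continue in the right half
    }
    return -1;                              // range is empty: key not present
}
```

For the array of Example 1 and key 6, the middle elements examined are 19, 5 and 6, in that order, and the function returns index 2.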
SORTING:
Definition:
Sorting is a technique for arranging data in a particular order.
Order of sorting:
Order means the arrangement of data. The sorting order can be ascending or descending. Ascending
order arranges the data from smallest to largest, and descending order arranges the data from
largest to smallest.
Types of Sorting
1. Internal Sorting
2. External Sorting
Internal Sorting
Internal Sorting is a type of sorting technique in which all the data resides in the main memory of
the computer. It is applicable when the number of elements in the list is small.
E.g. Bubble Sort, Insertion Sort, Shell Sort, Quick Sort, Selection Sort, Radix Sort
External Sorting
External Sorting is a type of sorting technique in which there is a huge amount of data and it resides on
a secondary device (e.g. hard disk, magnetic tape and so on) while sorting.
E.g. Merge Sort, Multiway Merge Sort, Polyphase Merge Sort
Sorting can be classified based on
1. Computational complexity
2. Memory utilization
3. Stability
ANALYSIS OF ALGORITHMS:
Efficiency of an algorithm can be measured in terms of:
}
a[j]=temp;
}}
Program for Insertion sort
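#include <iostream>
using namespace std;

int main() {
    int n, a[25], i, j, temp;
    cout << "Enter number of elements ";
    cin >> n;                       // n must not exceed 25, the size of a
    cout << "Enter " << n << " integers\n";
    for (i = 0; i < n; i++)
        cin >> a[i];
    for (i = 0; i < n; i++) {
        temp = a[i];                // element to insert into the sorted part a[0..i-1]
        for (j = i; j > 0 && a[j - 1] > temp; j--) {
            a[j] = a[j - 1];        // shift larger elements one place to the right
        }
        a[j] = temp;                // place the element in its correct position
    }
    cout << "Sorted list in ascending order:\n";
    for (i = 0; i < n; i++)
        cout << a[i] << "\n";
    return 0;
}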
OUTPUT:
Enter number of elements 6
Enter 6 integers
20 10 60 40 30 15
Sorted list in ascending order:
10
15
20
30
40
60
Advantages of Insertion sort
• Simple implementation.
• Efficient for (quite) small data sets.
• Efficient for data sets that are already substantially sorted.
Disadvantage of Insertion sort
• For large lists it is not nearly as efficient as the merge, heap and quick sorts, because it needs
O(n^2) comparisons in the worst case.
Bubble Sort
Bubble sort is one of the simplest internal sorting algorithms.
Bubble sort works by comparing two consecutive elements and swapping them if they are out of order, so
that the larger of the two bubbles towards the right. At the end of the first pass the largest element
is in its sorted position at the end of the list.
In each pass this process is repeated for all pairs of consecutive elements in the unsorted part of the
list, until the largest element of that pass has moved to the end of the unsorted part.
Bubble sort consists of (n-1) passes, where n is the number of elements to be sorted.
In the 1st pass the largest element is placed in the nth position.
In the 2nd pass the second largest element is placed in the (n-1)th position.
In the (n-1)th pass only the first two elements are compared.
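The pass structure described above can be sketched as follows. This is a minimal version for an int array sorted in ascending order; the function name bubbleSort and the variable names are assumptions made for illustration.

```cpp
// Sorts a[0..n-1] in ascending order using (n-1) passes.
void bubbleSort(int a[], int n) {
    for (int pass = 1; pass <= n - 1; pass++) {
        // After this pass the largest remaining element sits at index n - pass.
        for (int j = 0; j < n - pass; j++) {
            if (a[j] > a[j + 1]) {      // consecutive pair out of order
                int temp = a[j];        // swap so the larger one moves right
                a[j] = a[j + 1];
                a[j + 1] = temp;
            }
        }
    }
}
```

In the last pass (pass = n-1) the inner loop runs once, so only the first two elements are compared, as stated above.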
Radix Sort
Radix sort is one of the linear sorting algorithms. It is a generalized form of bucket sort. For decimal
numbers it can be performed using ten buckets, numbered 0 to 9.
It is also called bin sort or card sort.
It works by sorting the input one digit at a time. In the first pass all the elements are placed in
buckets according to the least significant digit.
In the second pass the elements are arranged according to the next least significant digit, and so on
up to the most significant digit.
The number of passes in a Radix sort depends upon the number of digits in the largest given number.
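The passes described above can be sketched as follows. This is a minimal least-significant-digit version, assuming non-negative decimal integers and ten buckets (0 to 9); the function name radixSort and the use of std::vector for the buckets are assumptions made for illustration.

```cpp
#include <vector>

// Sorts non-negative integers using buckets 0 to 9, one pass per decimal digit.
void radixSort(std::vector<int>& a) {
    int maxValue = 0;
    for (int x : a)
        if (x > maxValue) maxValue = x;     // largest value decides the number of passes

    // exp selects the digit: 1 = units, 10 = tens, 100 = hundreds, ...
    for (long long exp = 1; maxValue / exp > 0; exp *= 10) {
        std::vector<std::vector<int>> buckets(10);
        for (int x : a)
            buckets[(x / exp) % 10].push_back(x);   // distribute by the current digit

        // Collect the buckets in order 0..9; order inside a bucket is preserved,
        // which is what makes the digit-by-digit passes produce a sorted result.
        int k = 0;
        for (const std::vector<int>& bucket : buckets)
            for (int x : bucket)
                a[k++] = x;
    }
}
```

For example, a list whose largest value is 802 needs three passes (units, tens, hundreds).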
• Radix Sort takes more space than other sorting algorithms, since in addition to the
array that will be sorted, you need to have a sub list for each of the possible digits
or letters.
• Since Radix Sort depends on the digits or letters, Radix Sort is also much less
flexible than other sorts.
Collision:
If two or more keys hash to the same index, the corresponding records cannot be stored in the same
location. This condition is known as a collision.
Characteristics of a Good Hashing Function:
It should be simple to compute.
The number of collisions should be small while placing records in the hash table.
A hash function that produces no collisions is called a perfect hash function.
The hash function should produce indices that are distributed uniformly over the hash table.
The hash function should depend upon every bit of the key. Thus a hash function that
simply extracts a portion of the key is not suitable.
Collision Resolution Strategies / Techniques (CRT):
If a collision occurs, it should be handled by applying some technique. Such a
technique is called a collision resolution technique (CRT).
There are a number of collision resolution techniques, but the most popular are:
1. Separate chaining (Open Hashing)
2. Open addressing (Closed Hashing)
   • Linear Probing
   • Quadratic Probing
   • Double Hashing
Separate chaining (Open Hashing)
Separate chaining is an open hashing technique.
It is implemented using the singly linked list concept. A pointer (ptr) field is added to each record.
When a collision occurs, a separate chain (linked list) is maintained for the colliding data. A new
element is inserted at the front of the list.
H(key) = key % table size
Two operations are supported:
Insert
Find
Structure Definition for Node
typedef struct node *Position;
struct node
{
    int data;          /* defines the nodes */
    Position next;
};
Structure Definition for Hash Table
typedef Position List;
struct Hashtbl
{                      /* defines the hash table, which contains an array of linked lists */
    int Tablesize;
    List *theLists;
};
Insert the following four keys 22 84 35 62 into a hash table of size 10 using separate chaining. The hash
function is
H(key) = key % 10
1. H(22) = 22 % 10 = 2
2. H(84) = 84 % 10 = 4
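The Insert and Find operations can be sketched as follows for the four keys of this example. This is a minimal version, assuming non-negative integer keys and a fixed table size of 10; the names ChainNode, ChainTable, chainFind and chainInsert are assumptions made for illustration and differ from the structure definitions above. Nodes are never freed in this short sketch.

```cpp
struct ChainNode {
    int data;
    ChainNode* next;
};

const int CHAIN_TABLE_SIZE = 10;

struct ChainTable {
    ChainNode* lists[CHAIN_TABLE_SIZE] = {};   // one chain per index, all empty at first
};

// H(key) = key % table size (keys are assumed to be non-negative).
int chainHash(int key) { return key % CHAIN_TABLE_SIZE; }

// Find: walk the chain at H(key); return the node, or nullptr if absent.
ChainNode* chainFind(const ChainTable& t, int key) {
    for (ChainNode* p = t.lists[chainHash(key)]; p != nullptr; p = p->next)
        if (p->data == key) return p;
    return nullptr;
}

// Insert: put the key at the front of its chain unless it is already present.
void chainInsert(ChainTable& t, int key) {
    if (chainFind(t, key) != nullptr) return;
    int index = chainHash(key);
    t.lists[index] = new ChainNode{key, t.lists[index]};
}
```

With this sketch, 35 goes to index 5, and 62 hashes to index 2, where it collides with 22 and is placed in front of it, so the chain at index 2 reads 62, 22.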
Advantages
1. More elements than the table size can be inserted, because each slot holds a linked list.
Disadvantages
▪ Linear Probing
▪ Quadratic Probing
▪ Then traverse the list to check whether the element is already present.
▪ If exists, increment the count.
Linear probing: inserting 89, 18, 49, 58, 69 into a table of size 10
Index  Empty  After 89  After 18  After 49  After 58  After 69
0                                 49        49        49
1                                           58        58
2                                                     69
3
4
5
6
7
8                       18        18        18        18
9             89        89        89        89        89
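The insertions in the table above can be sketched as follows. This is a minimal version, assuming non-negative keys, the hash function H(key) = key % table size, and -1 as the marker for an empty cell; the names linearProbeInsert and LP_EMPTY are assumptions made for illustration.

```cpp
const int LP_EMPTY = -1;   // assumed marker for an empty cell

// Tries cells H, H+1, H+2, ... (with wrap-around) until an empty one is found.
// Returns the index where the key was stored, or -1 if the table is full.
int linearProbeInsert(int table[], int size, int key) {
    int home = key % size;                  // H(key)
    for (int i = 0; i < size; i++) {
        int index = (home + i) % size;      // i-th probe
        if (table[index] == LP_EMPTY) {
            table[index] = key;
            return index;
        }
    }
    return -1;
}
```

Inserting 89, 18, 49, 58, 69 into an empty table of size 10 places them at indices 9, 8, 0, 1 and 2. The keys 49, 58 and 69 all end up next to each other, which illustrates how linear probing builds up clusters.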
Quadratic Probing
To resolve the primary clustering problem, quadratic probing can be used. With quadratic probing,
rather than always moving one spot, move i^2 spots from the point of collision, where i is the number
of attempts to resolve the collision.
It is another collision resolution method, which distributes items more evenly.
From the original index H, if the slot is filled, try cells H + 1^2, H + 2^2, H + 3^2, ..., H + i^2 with
wrap-around.
Hi(X) = (Hash(X) + F(i)) mod Tablesize, where F(i) = i^2
Hi(X) = (Hash(X) + i^2) mod Tablesize
Limitation: at most half of the table can be used as alternative locations to resolve collisions.
This means that once the table is more than half full, it is difficult to find an empty spot.
Quadratic probing also introduces a new problem known as secondary clustering: elements that hash to
the same index always probe the same sequence of alternative cells.
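The probe sequence H, H + 1^2, H + 2^2, ... can be sketched as follows. This is a minimal version, assuming non-negative keys, Hash(X) = X % Tablesize, and -1 as the marker for an empty cell; the names quadraticProbeInsert and QP_EMPTY are assumptions made for illustration.

```cpp
const int QP_EMPTY = -1;   // assumed marker for an empty cell

// Tries cells (Hash(X) + i*i) mod size for i = 0, 1, 2, ...
// Returns the index where the key was stored, or -1 if no probe found a free cell.
int quadraticProbeInsert(int table[], int size, int key) {
    int home = key % size;                      // Hash(X)
    for (int i = 0; i < size; i++) {
        int index = (home + i * i) % size;      // Hi(X)
        if (table[index] == QP_EMPTY) {
            table[index] = key;
            return index;
        }
    }
    return -1;   // may happen even when free cells remain (see the limitation above)
}
```

Inserting 89, 18, 49, 58, 69 into an empty table of size 10 places them at indices 9, 8, 0, 2 and 3. For example, 58 collides at 8, then at 8 + 1 = 9, and is stored at (8 + 4) mod 10 = 2.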
Double Hashing
Double hashing uses the idea of applying a second hash function to the key when a collision occurs.
The result of the second hash function is the number of positions to move from the point of collision
before trying to insert.
There are a couple of requirements for the second function:
It must never evaluate to 0.
It must make sure that all cells can be probed.
Hi(X) = (Hash(X) + i * Hash2(X)) mod Tablesize
A popular second hash function is:
Hash2(key) = R - (key % R), where R is a prime number that is smaller than the size of the table.
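The two formulas above can be combined into the following sketch. It assumes non-negative keys, Hash(X) = X % Tablesize, and -1 as the marker for an empty cell; the names doubleHashInsert and DH_EMPTY, and passing R as a parameter, are assumptions made for illustration.

```cpp
const int DH_EMPTY = -1;   // assumed marker for an empty cell

// Tries cells (Hash(X) + i * Hash2(X)) mod size for i = 0, 1, 2, ...
// r is a prime smaller than size, so Hash2 is always in the range 1..r (never 0).
// Returns the index where the key was stored, or -1 if no probe found a free cell.
int doubleHashInsert(int table[], int size, int r, int key) {
    int home = key % size;          // Hash(X)
    int step = r - (key % r);       // Hash2(X)
    for (int i = 0; i < size; i++) {
        int index = (home + i * step) % size;   // Hi(X)
        if (table[index] == DH_EMPTY) {
            table[index] = key;
            return index;
        }
    }
    return -1;
}
```

With a table of size 10 and R = 7, inserting 89, 18, 49, 58, 69 places them at indices 9, 8, 6, 3 and 0. For example, Hash2(49) = 7 - (49 % 7) = 7, so 49 moves from 9 to (9 + 7) mod 10 = 6. A table size of 10 is used only to match the earlier examples; since it is not prime, some step sizes cannot reach every cell.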
Rehashing
Once the hash table gets too full, operations start to take too long and insertions may fail. To solve
this problem, a new table at least twice the size of the original is built and the elements are
transferred to the new table.
Advantage:
The programmer does not have to worry about the table size.
Simple to implement.
Can be used in other data structures as well.
The new size of the hash table:
should also be prime
will be used to calculate the new insertion spot (hence the name rehashing)
This is a very expensive operation: it takes O(N) time, since there are N elements to rehash and the
table size is roughly 2N. This is acceptable because it does not happen often.
The question is when rehashing should be applied.
Some possible answers:
once the table becomes half full
once an insertion fails
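The rehashing step can be sketched as follows. This is a minimal version, assuming a closed hash table with linear probing, non-negative keys, -1 as the marker for an empty cell, and a new size equal to the first prime that is at least twice the old size; the names rehash, nextPrime, isPrime and RH_EMPTY are assumptions made for illustration.

```cpp
#include <vector>

const int RH_EMPTY = -1;   // assumed marker for an empty cell

bool isPrime(int n) {
    if (n < 2) return false;
    for (int d = 2; d * d <= n; d++)
        if (n % d == 0) return false;
    return true;
}

// Smallest prime that is greater than or equal to n.
int nextPrime(int n) {
    while (!isPrime(n)) n++;
    return n;
}

// Builds a table of prime size >= 2 * old size and re-inserts every key.
std::vector<int> rehash(const std::vector<int>& oldTable) {
    int newSize = nextPrime(2 * static_cast<int>(oldTable.size()));
    std::vector<int> newTable(newSize, RH_EMPTY);
    for (int key : oldTable) {
        if (key == RH_EMPTY) continue;          // skip empty cells
        int index = key % newSize;              // position recomputed with the new size
        while (newTable[index] != RH_EMPTY)
            index = (index + 1) % newSize;      // linear probing in the new table
        newTable[index] = key;
    }
    return newTable;
}
```

For example, a table of size 7 is rehashed into a table of size 17, the first prime that is at least 14. Every key is hashed again with the new size, so keys that collided in the old table do not necessarily collide in the new one.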
Extendible Hashing
Extendible Hashing is a mechanism for altering the size of the hash table to accommodate new entries
when buckets overflow.
A common strategy in internal hashing is to double the hash table and rehash each entry. However, for a
table stored on disk this technique is slow, because writing all pages to disk is too expensive.
Therefore, instead of doubling the whole hash table, we use a directory of pointers to buckets, and
double the number of buckets by doubling the directory, splitting just the bucket that overflows.
Since the directory is much smaller than the file, doubling it is much cheaper. Only one page of keys
and pointers is split.
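The directory-doubling idea can be sketched in memory as follows. This is only an illustration under several assumptions: keys are distinct non-negative integers, the directory is indexed by the low-order bits of the key, and each bucket stands in for one disk page with a small fixed capacity. The names ExtendibleHash, Bucket, globalDepth and localDepth are assumptions made for illustration.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

struct Bucket {
    int localDepth;             // number of key bits this bucket depends on
    std::vector<int> keys;
};

struct ExtendibleHash {
    int globalDepth = 1;                        // number of key bits used by the directory
    std::size_t capacity;                       // keys per bucket (one "page")
    std::vector<std::shared_ptr<Bucket>> dir;   // directory of pointers to buckets

    explicit ExtendibleHash(std::size_t cap) : capacity(cap) {
        dir.push_back(std::make_shared<Bucket>(Bucket{1, {}}));
        dir.push_back(std::make_shared<Bucket>(Bucket{1, {}}));
    }

    // Directory index = the globalDepth low-order bits of the key.
    std::size_t slot(int key) const { return key & ((1 << globalDepth) - 1); }

    void insert(int key) {
        std::shared_ptr<Bucket> b = dir[slot(key)];
        if (b->keys.size() < capacity) { b->keys.push_back(key); return; }

        // Overflow: double the directory only if this bucket already uses all its bits.
        if (b->localDepth == globalDepth) {
            std::size_t oldSize = dir.size();
            for (std::size_t i = 0; i < oldSize; i++) dir.push_back(dir[i]);
            globalDepth++;
        }

        // Split just the overflowing bucket on its next bit.
        std::size_t bit = static_cast<std::size_t>(1) << b->localDepth;
        auto zero = std::make_shared<Bucket>(Bucket{b->localDepth + 1, {}});
        auto one  = std::make_shared<Bucket>(Bucket{b->localDepth + 1, {}});
        for (int k : b->keys)
            ((static_cast<std::size_t>(k) & bit) ? one : zero)->keys.push_back(k);
        for (std::size_t i = 0; i < dir.size(); i++)
            if (dir[i] == b) dir[i] = (i & bit) ? one : zero;

        insert(key);   // retry; splits again if all keys fell on one side
    }
};
```

Doubling the directory copies only pointers, and only the overflowing bucket is split, so all other buckets stay untouched and several directory entries may keep pointing to the same bucket.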