Unit V

The document covers searching, sorting, and hashing techniques in computer science. It details linear and binary search methods, various sorting algorithms like bubble sort, selection sort, and insertion sort, and explains hashing concepts including hash tables and functions. Additionally, it discusses collision resolution strategies such as separate chaining and open addressing.


UNIT V

SEARCHING, SORTING AND HASHING TECHNIQUES

SYLLABUS:
Searching - Linear Search - Binary Search - Sorting - Bubble sort - Selection sort - Insertion sort - Hashing - Hash Functions - Separate Chaining - Open Addressing - Rehashing - Extendible Hashing.


SEARCHING
Searching:
• Search is an operation in which a given list is searched for a particular value.
• The specified element is often called the search key.
• Searching can also be the process of finding the location of the specified element in a list.

• If the process of searching finds the match of the search key with a list element,
the search is said to be successful, otherwise it is unsuccessful.
The most commonly used search techniques are
1. Linear Search (or) Sequential search
2. Binary Search
Linear Search:
• The list is searched sequentially, starting from the first element, until the desired element is found or the end of the list is reached.

• In the worst case the search runs to the end of the list, i.e. for a list of size N, up to N comparisons are required.

A simple approach to implement a linear search is

•Begin with the leftmost element of arr[] and one by one compare x with each element.
•If x matches with an element then return the index.
•If x does not match with any of the elements then return -1.
ROUTINE:
// Returns the index of key in values[0..n-1], or -1 if it is not present
int linearSearch(int values[], int key, int n)
{
for(int i = 0; i < n; i++)
{
if (values[i] == key)
{
return i; // match found
}
}
return -1; // key not in the list
}
Linear Search Algorithm is applied when
• No information is given about the array.
• The given array is unsorted or the elements are unordered.
• The list of data items is smaller.

The time complexity of a linear search is O(n).


Here, n is the number of elements in the linear array.
We just compare the given value with the elements in an array one by one.
So, space complexity is O(1).
Advantages of Linear search:
The linear search is simple. It is very easy to understand and implement.
It does not require the data in the array to be stored in any particular order.

Disadvantages of Linear search

It is slower than many other search algorithms.
It has very poor efficiency on large lists.
Binary search:
• Binary Search is used for searching an element in a sorted array.
• It is a fast search algorithm with run-time complexity of O(log n).
• Binary search works on the principle of divide and conquer.
• This searching technique looks for a particular element by comparing it with the middle most element of the collection.
• It is useful when there are a large number of elements in an array.
Case-01
If the element being searched is found to be the middle most element, its
index is returned.
Case-02
If the element being searched is found to be greater than the middle most
element, then its search is further continued in the right sub array of the
middle most element.
Case-03
If the element being searched is found to be smaller than the middle most
element, then its search is further continued in the left sub array of the
middle most element.
ROUTINE:
for(i=0; i<n; i++)
cin>>a[i];
cin>>key;
low = 0;
high = n-1;
while (low <= high)
{
middle = (low + high) / 2;
if (key == a[middle])
{
cout << key << " found at position " << middle;
break;
}
else if (key < a[middle])
high = middle - 1;
else
low = middle + 1;
}
if (low > high)
cout << "Search key is not found";
The time complexity of binary search is:
• Best Case- O(1) i.e. constant.
• Average Case- O(log n).
• Worst Case- O(log n).
Space Complexity Analysis:
Space complexity is O(1).
Advantages of Binary search:
• The algorithm is faster compared to linear search.
• The number of comparisons is far smaller in binary search compared to linear search.
• The binary search algorithm first finds the middle element and then continues the search in only one half, so each step gives a considerable reduction of the element list.
Disadvantages of Binary search
• This search works only on a sorted list, whereas other techniques such as linear search also work on unordered data.
SORTING
SORTING:
The operation of arranging a set of data in some given order.
Most used orders are numerical order and lexicographical order.
It is an operation in which all the elements of a list are arranged either in ascending or descending order.
Some of the sorting techniques are
• Bubble Sort
• Selection sort
• Insertion Sort
• Quick Sort
• Radix sort
• Merge Sort
• Shell Sort
Sorting Categories

There are two different categories in sorting:

● Internal sorting: If the input data is such that it can be adjusted in the main memory at once, it is
called internal sorting.
● External sorting: If the input data is such that it cannot be adjusted in the memory entirely at
once, it needs to be stored in a hard disk, floppy disk, or any other storage device. This is called
external sorting.
BUBBLE SORT
It is a very simple sorting technique. Bubble Sort is the simplest sorting algorithm; it works by repeatedly swapping adjacent elements that are in the wrong order.
Steps to perform Bubble Sort
• It Proceeds by looking at the list from left to right
• Each adjacent pair of element is compared.
• Whenever the pair is not in order, the elements are exchanged
• Therefore after the first pass, the largest element bubbles up to the end of the list
• Steps above are repeated for (array size – 1) times to get the sorted array.
Routine:
for( i=0; i<n-1; i++) // number of passes
{
for( j=0; j<n-1-i; j++) // compare each adjacent pair
{
if( a[j] > a[j+1] ) // swap if out of order
{
temp = a[j];
a[j] = a[j+1];
a[j+1] = temp;
}
}
}
Time and Space complexity for the Bubble Sort algorithm
• Worst Case Time Complexity [Big-O]: O(n²)
• Best Case Time Complexity [Big-omega]: O(n) (with an early-exit check when a pass performs no swaps)
• Average Time Complexity [Big-theta]: O(n²)
• Space Complexity: O(1)
Advantages
∙ Easy to understand.
∙ Easy to implement.
∙ In-place, no external memory is needed.
∙ Performs greatly when the array is almost sorted.
Disadvantages
∙ Very expensive: O(n²) in the worst and average cases.
∙ It does more element assignments (swaps) than its counterpart, insertion sort.
SELECTION SORT
● It finds the smallest element in the array and exchanges it with the element present
at the head (First) of the list.
● Now the list is divided into two parts sorted and unsorted.
● The same steps repeated for the unsorted array (i.e) smallest is searched in the
unsorted part of the list and exchanged with the element at head of unsorted part.
● Procedure done till array is sorted
● Two important steps in selection sort are selection and exchange.
● For ‘n’ elements , n-1 pass is required.
ROUTINE:

for(i=0;i<n-1;i++) // no of passes
{
int min = i;
for(j=i+1;j<n;j++) // to find minimum element
{
if(a[j]<a[min])
{
min =j;
}
}
temp = a[i];
a[i] = a[min];
a[min] = temp;
}
Time & space complexity:
Worst Case Time Complexity: O(n²)
Best Case Time Complexity: O(n²)
Average Time Complexity: O(n²)
Space Complexity: O(1)
Advantages:
∙ It performs well on a small list.
∙ Because it is an in-place sorting algorithm, no additional temporary storage is required beyond what is needed to hold the original list.
∙ Its performance is not affected by the initial ordering of the items; it always makes the same number of comparisons.

Disadvantages:
∙ Poor efficiency when dealing with a huge list of items.
∙ It requires on the order of n² steps for sorting n elements.
INSERTION SORT
Steps for insertion sort

• Given a list of numbers, it divides the list into two parts: sorted and unsorted.
• The first element becomes the sorted part and the rest of the list becomes the unsorted part.
• Then it picks up one element from the front of the unsorted part and inserts it at its proper position in the sorted part of the list.
• The insertion action is repeated till the unsorted part diminishes.
ROUTINE:
for(i=1; i<n; i++)
{
temp = a[i]; // element to be inserted
for (j=i; j>0 && temp<a[j-1]; j--)
{
a[j] = a[j-1]; // shift larger elements one place right
}
a[j] = temp; // insert at its correct position
}
Time and Space complexity for the Insertion Sort algorithm
• Worst Case Time Complexity [ Big-O ]: O(n2)
• Best Case Time Complexity [Big-omega]: O(n)
• Average Time Complexity [Big-theta]: O(n2)
• Space Complexity: O(1)
Advantages
• It is simple and when the list is small it is efficient
• It is an in place sorting algorithm so the space requirement is minimal
Disadvantages
• When compared to other sorting algorithm it is inefficient.
• It is inefficient when sorting huge list .
HASHING
HASHING :
● Hashing is a technique that is used to store, retrieve and find data in the data structure called a Hash Table.
● It is used to overcome the drawbacks of
Linear Search (comparison overhead) &
Binary Search (requires a sorted list).

● It involves two important concepts-

Hash Table
Hash Function
● A hash table is a data structure that is used to store and retrieve data (keys) very quickly.
● It is an array of some fixed size, containing the keys. Hash table indices run from 0 to Tablesize - 1.
● Each key is mapped into some number in the range 0 to Tablesize - 1. This mapping is called a Hash function.
● Insertion of the data in the hash table is based on the key value obtained from the hash function.
● If the input keys are integers, a commonly used hash function is H ( key ) = key % Tablesize
● Using the same hash key value, the data can be retrieved from the hash table with one or more hash key comparisons.
The load factor of a hash table is calculated using the formula:
(Number of data elements in the hash table) / (Size of the hash table)
Factors affecting Hash Table Design
Hash function
Table size.
Collision handling scheme
Types of Hash Functions
● Division Method
● Mid Square Method
● Multiplicative Hash Function
● Digit Folding
Division Method:
● It depends on remainder of division.
● Divisor is Table Size.
● Formula is ( H ( key ) = key % table size )
Mid Square Method:
● We first square the item, and then extract some portion of the resulting digits(r).
● For example, if the item were 44, we would first compute 44^2=1,936. Extract the middle two
digit 93 from the answer.
● Store the key 44 in the index 93.
● The value of r can be decided based on the size of the table. Suppose the hash table has 100 memory locations. So r = 2
because two digits are required to map the key to the memory location.
Multiplicative Hash Function:
This method involves the following steps:
1. Choose a constant value A such that 0 < A < 1.
2. Multiply the key value with A.
3. Extract the fractional part of kA.
4. Multiply the result of the above step by the size of the hash table i.e. M.
5. The resulting hash value is obtained by taking the floor of the result obtained in step 4.

Formula:
h(K) = floor (M (kA mod 1)
M is the size of the hash table.
k is the key value.
A is a constant value.
EXAMPLE:
k = 12345
A = 0.357840
M = 100
h(12345) = floor[ 100 (12345*0.357840 mod 1)]
= floor[ 100 (4417.5348 mod 1) ]
= floor[ 100 (0.5348) ]
= floor[ 53.48 ]
= 53
Digit Folding Method:
This method involves two steps:
1. Divide the key-value k into a number of parts i.e. k1, k2, k3,….,kn, where each part has the
same number of digits except for the last part that can have lesser digits than the other parts.
2. Add the individual parts. The hash value is obtained by ignoring the last carry if any.

Formula:
k = k1, k2, k3, k4, ….., kn
s = k1+ k2 + k3 + k4 +….+ kn
h(K)= S
s is obtained by adding the parts of the key k
Example:
k = 12345
k1 = 12, k2 = 34, k3 = 5
s = k1 + k2 + k3
= 12 + 34 + 5
= 51
h(K) = 51

Note:
The number of digits in each part varies depending upon the size of the hash table. Suppose for
example the size of the hash table is 100, then each part must have two digits except for the last
part which can have a lesser number of digits.
Collision:
If two or more keys hash to the same index, the corresponding records cannot be stored in the
same location. This condition is known as a collision.

Characteristics of Good Hashing Function:

It should be simple to compute.
The number of collisions should be small while placing records in the hash table. A hash function with no collisions is called a perfect hash function.
The hash function should produce keys which are distributed uniformly in the hash table.
The hash function should depend upon every bit of the key. Thus a hash function that simply extracts a portion of a key is not suitable.
Collision Resolution Strategies / Techniques (CRT)
Separate Chaining:

● The idea behind separate chaining is to implement each slot of the hash table as a linked list called a chain.
● Separate chaining is one of the most popular and commonly used techniques in
order to handle collisions.
● The linked list data structure is used to implement this technique.
● So what happens is, when multiple elements are hashed into the same slot index,
then these elements are inserted into a singly-linked list which is known as a
chain.
● Here, all those elements that hash into the same slot index are inserted into a linked
list.
Example: INSERT THE FOLLOWING KEYS: 64, 36, 81, 49, 25, 4, 9, 16, 1
Advantages
More elements can be inserted, since each slot holds a linked list that can grow.

Disadvantages
It requires more pointers, which occupy more memory space.
Search takes time, since the hash function must be evaluated and the list traversed.
Open Addressing
● Also called Closed Hashing.
● Uses Hi(X) = (Hash(X) + F(i)) mod Tablesize.
● When a collision occurs, alternative cells are tried until an empty cell is found.
Types:-
● Linear Probing
● Quadratic Probing
● Double Hashing
Linear Probing:

In linear probing, the hash table is searched sequentially, starting from the original hash location. If the location that we get is already occupied, then we check the next location.
The function used for probing is: rehash(key) = (n + 1) % table_size, where n is the previously tried location.
Quadratic probing
● It is a method with the help of which we can reduce the problem of primary clustering that linear probing suffers from.
● In this method, we look for the i²-th slot in the ith iteration.
● It always starts from the original hash location. If that location is occupied, then we check the other slots.

If slot hash(x) % S is full, then we try (hash(x) + 1*1) % S


If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S
If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S
DOUBLE HASHING:
● Double hashing is a collision resolution technique used in hash tables.
● It works by using two hash functions to compute two different hash values for a
given key. The first hash function is used to compute the initial hash value, and
the second hash function is used to compute the step size for the probing
sequence.
● Double hashing has the ability to have a low collision rate, as it uses two hash
functions to compute the hash value and the step size.
Double hashing can be done using :

(hash1(key) + i * hash2(key)) % TABLE_SIZE

Here hash1() and hash2() are hash functions and TABLE_SIZE is size of hash table.

(We repeat by increasing i when collision occurs)

hash1(key) = key % TABLE_SIZE

hash2(key) = PRIME – (key % PRIME)

where PRIME is a prime smaller than the TABLE_SIZE.


Rehashing:

● Rehashing is the process of increasing the size of a hashmap and redistributing the elements to new
buckets based on their new hash values.
● It is done to improve the performance of the hashmap and to prevent collisions caused by a high load
factor.
● When a hashmap becomes full, the load factor (i.e., the ratio of the number of elements to the number of
buckets) increases. As the load factor increases, the number of collisions also increases, which can lead
to poor performance.
● Rehashing can be costly in terms of time and space, but it is necessary to maintain the efficiency of the
hashmap.
How Rehashing is done?
Rehashing can be done as follows:

● For each addition of a new entry to the map, check the load factor.
● If it’s greater than its pre-defined value (or default value of 0.75 if not given), then
Rehash.
● For Rehash, make a new array of double the previous size and make it the new bucket
array.
● Then traverse to each element in the old bucket array and call the insert() for each so as to
insert it into the new larger bucket array.
Extendible Hashing is a dynamic hashing method wherein directories and buckets are used to hash data. It is an aggressively flexible method in which the hash function also experiences dynamic changes.
Main features of Extendible Hashing:
The main features in this hashing technique are:

● Directories: The directories store addresses of the buckets in pointers. An id is assigned to each
directory which may change each time when Directory Expansion takes place.
● Buckets: The buckets are used to hash the actual data.
Example of hashing the following elements: 16,4,6,22,24,10,31,7,9,20,26.

Bucket Size: 4

First, calculate the binary forms of each of the given numbers.

16- 10000

4- 00100

6- 00110

22- 10110

24- 11000

10- 01010

31- 11111

7- 00111

9- 01001

20- 10100

26- 11010
