Module V Unit 2 Hashing
Hashing techniques are implemented using a hash function and a hash table.
Hash Table: It is a data structure which
contains indices and their associated data.
Access to data is very fast
if we know the index of the desired data.
Hash Function: It is a function which is
used to map the key to a hash value.
It is represented as h(x).
Collision: If the same index or hash value
is produced by the hash function for
multiple keys, a conflict arises.
This situation is called a collision.
Types of hash functions:
Division method
It is the simplest method of hashing an integer x. This method divides x by m and
uses the remainder as the hash value. In this case, the hash function can be
given as
h(x) = x mod m.
It requires only a single division operation, so this method is very fast.
Example:
Calculate the hash values of keys 1234 and 5642, where m = 97.
h(1234) = 1234 % 97 = 70
h(5642) = 5642 % 97 = 16
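The division method can be sketched in C as follows. This is a minimal illustration, assuming non-negative integer keys; the function name hash_division is hypothetical and not part of these notes.

```c
#include <stdio.h>

/* Division-method hash: the remainder of key divided by the table size m.
   hash_division is an illustrative name (an assumption, not from the notes). */
unsigned int hash_division(unsigned int key, unsigned int m) {
    return key % m;
}

/* Prints the two values computed in the example: h(1234) and h(5642), m = 97. */
void print_division_example(void) {
    printf("h(1234) = %u\n", hash_division(1234, 97));
    printf("h(5642) = %u\n", hash_division(5642, 97));
}
```

Calling print_division_example() from main should print the hash values 70 and 16, matching the hand calculation.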
Multiplication method
The steps involved in the multiplication method are as follows:
Step 1: Choose a constant a such that 0 < a < 1.
Step 2: Multiply the key k by a.
Step 3: Extract the fractional part of ka.
Step 4: Multiply the result of step 3 by the size of the hash table (m) and take the floor of the result.
Hence, the hash function can be given as:
h(k) = floor(m (ka mod 1)), where (ka mod 1) gives the fractional part of ka and m is the total
number of indices in the hash table.
Example:
Given a hash table of size 1000, map the key 12345 to an appropriate location in the hash
table.
We use a = 0.618033, m = 1000, and k = 12345.
h(12345) = floor(1000 (12345 * 0.618033 mod 1))
= floor(1000 (7629.617385 mod 1))
= floor(1000 (0.617385))
= floor(617.385)
= 617
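The four steps of the multiplication method can be sketched in C as below. This is only an illustration under stated assumptions: keys are non-negative integers, the constant a is passed as a double, and the name hash_multiplication is hypothetical.

```c
/* Multiplication-method hash: floor(m * fractional part of k*a).
   hash_multiplication is an illustrative name (an assumption). */
unsigned int hash_multiplication(unsigned int k, double a, unsigned int m) {
    double product = k * a;                                     /* Step 2: ka */
    double frac = product - (double)(unsigned long long)product; /* Step 3: ka mod 1 */
    return (unsigned int)(m * frac);  /* Step 4: truncation equals floor for values >= 0 */
}
```

With k = 12345, a = 0.618033 and m = 1000, the function returns 617, the same slot as in the worked example. Floating-point rounding could shift the result by one slot for keys whose fractional part lies extremely close to a multiple of 1/m.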
[Figure: step-by-step insertion of the keys 52, 36, 54, 11, and 23]
Open Addressing:
In this technique, the hash table contains two types of values: sentinel values (e.g., -1)
and data values. A sentinel value indicates that the location holds no
data value at present and can be used to store one.
When a key is mapped to a particular memory location, the value at that location is
checked. If it is a sentinel value, the location is free and the data value can be
stored in it.
If the location already holds a data value, other slots are examined
systematically in the forward direction to find a free slot. If no free location
is found, we have an overflow condition.
The process of examining memory locations in the hash table is called probing.
Open addressing technique can be implemented using:
1. Linear probing
2. Quadratic probing
3. Double hashing
4. Rehashing.
Linear probing:
The simplest approach to resolving a collision is
linear probing. In this technique, if a value is
already stored at the location generated by h(k),
the following hash function is used to resolve the
collision:
H(k, i) = [h'(k) + i] mod m
where m is the size of the hash table, h'(k) = k mod m,
and i is the probe number, which varies
from 0 to m-1.
Example: Consider a hash table of size 10. Using
linear probing, insert the keys 72, 27, 36, 24, 63,
81, 92 into the table.
Let h'(k) = k mod m, m = 10. Initially, the table is empty, so every slot holds the sentinel value -1.
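The insertions in this example can be traced with a short C sketch. It assumes non-negative keys, a table of size 10, and -1 as the sentinel for a free slot; the names TABLE_SIZE, EMPTY, table_init and insert_linear are hypothetical.

```c
#define TABLE_SIZE 10   /* m in the example */
#define EMPTY (-1)      /* sentinel value marking a free slot */

/* Mark every slot as free. */
void table_init(int table[]) {
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
}

/* Insert key using H(k, i) = (h'(k) + i) mod m with h'(k) = k mod m.
   Returns the index where the key was stored, or -1 on overflow. */
int insert_linear(int table[], int key) {
    int home = key % TABLE_SIZE;
    for (int i = 0; i < TABLE_SIZE; i++) {      /* probe number i = 0 .. m-1 */
        int idx = (home + i) % TABLE_SIZE;
        if (table[idx] == EMPTY) {
            table[idx] = key;
            return idx;
        }
    }
    return -1;  /* no free slot found: overflow */
}
```

Inserting 72, 27, 36, 24, 63 and 81 places each key in its home slot (2, 7, 6, 4, 3 and 1). The key 92 also hashes to slot 2, which is occupied, so slots 3 and 4 are probed next and the key is finally stored in slot 5.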
Advantages:
Easy to compute.
Disadvantages:
Heap sort:
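For the array {1, 12, 9, 5, 6, 10}, the children of the element at index i are at indices 2i + 1 and 2i + 2.
Left child of 1 (index 0)
= element in (2*0 + 1) index
= element in 1 index
= 12
Right child of 1
= element in (2*0 + 2) index
= element in 2 index
= 9
Similarly,
Left child of 12 (index 1)
= element in (2*1 + 1) index
= element in 3 index
= 5
Right child of 12
= element in (2*1 + 2) index
= element in 4 index
= 6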
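Let us also confirm that the rule holds for finding the parent of any
node. The parent of the element at index i is at index floor((i-1)/2).
Parent of 9 (position 2)
= element in floor((2-1)/2) index
= element in floor(0.5) index
= element in 0 index
= 1
Parent of 12 (position 1)
= element in floor((1-1)/2) index
= element in 0 index
= 1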
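The index rules for children and parents can be written as three small C helpers. This is a sketch assuming a 0-based array such as {1, 12, 9, 5, 6, 10} from the driver code in these notes; the names left_child, right_child and parent are hypothetical.

```c
/* Index arithmetic for a binary heap stored in a 0-based array.
   All three names are illustrative assumptions. */
int left_child(int i)  { return 2 * i + 1; }
int right_child(int i) { return 2 * i + 2; }

/* Integer division discards the fraction, so (2-1)/2 gives 0.
   Meaningful only for i >= 1, because the root has no parent. */
int parent(int i)      { return (i - 1) / 2; }
```

For example, with int arr[] = {1, 12, 9, 5, 6, 10}, arr[left_child(0)] is 12, arr[right_child(0)] is 9, and arr[parent(2)] is 1.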
heapify(array)
    Root = array[0]
    Largest = largest(array[0], array[2*0 + 1], array[2*0 + 2])
    if (Root != Largest)
        Swap(Root, Largest)
The example above shows two scenarios - one in which the root
is the largest element and we don't need to do anything. And
another in which the root had a larger element as a child and we
needed to swap to maintain max-heap property.
The top element violates the max-heap property, but all the sub-trees are max-heaps.
To maintain the max-heap property for the entire tree, we will
have to keep pushing the top element (2 in this example) downwards until it reaches its correct
position.
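void heapify(int arr[], int n, int i) {
  // Find the largest among the root, left child and right child
  int largest = i;
  int left = 2 * i + 1;
  int right = 2 * i + 2;
  if (left < n && arr[left] > arr[largest])
    largest = left;
  if (right < n && arr[right] > arr[largest])
    largest = right;
  // Swap and continue heapifying if the root is not the largest
  if (largest != i) {
    swap(&arr[i], &arr[largest]);
    heapify(arr, n, largest);
  }
}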
This function works for both the base case and for a tree of any
size. We can thus move the root element to the correct position to
maintain the max-heap status for any tree size as long as the sub-
trees are max-heaps.
Build max-heap
To build a max-heap from any tree, we can thus start heapifying
each sub-tree from the bottom up and end up with a max-heap
after the function is applied to all the elements including the root
element.
for (int i = n / 2 - 1; i >= 0; i--)  // start at the last non-leaf node
  heapify(arr, n, i);
Steps to build max heap for heap sort
As shown in the above diagram, we start by heapifying the lowest,
smallest sub-trees and gradually move up until we reach the root
element.
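// Heap Sort in C
#include <stdio.h>
void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }
void heapify(int arr[], int n, int i) {  // sift arr[i] down within arr[0..n-1]
  int largest = i, left = 2 * i + 1, right = 2 * i + 2;
  if (left < n && arr[left] > arr[largest]) largest = left;
  if (right < n && arr[right] > arr[largest]) largest = right;
  if (largest != i) {
    swap(&arr[i], &arr[largest]);
    heapify(arr, n, largest);
  }
}
// Heap sort
void heapSort(int arr[], int n) {
  for (int i = n / 2 - 1; i >= 0; i--) heapify(arr, n, i);  // build max-heap
  for (int i = n - 1; i >= 0; i--) {
    swap(&arr[0], &arr[i]);  // move the current maximum to the end
    heapify(arr, i, 0);      // restore the max-heap on the first i elements
  }
}
// Print an array
void printArray(int arr[], int n) {
  for (int i = 0; i < n; ++i) printf("%d ", arr[i]);
  printf("\n");
}
// Driver code
int main() {
  int arr[] = {1, 12, 9, 5, 6, 10};
  int n = sizeof(arr) / sizeof(arr[0]);
  heapSort(arr, n);
  printArray(arr, n);
}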
Time Complexity
Best O(n log n)
Worst O(n log n)
Average O(n log n)
Radix sort:
Let the initial array be [121, 432, 564, 23, 1, 45, 788]. It is
sorted by radix sort, one digit place at a time, as shown in the figure below.
Working of Radix Sort
1. Find the largest element in the array, i.e. max. Let X be the number
of digits in max. X is calculated because we have to go through all
the significant places of all elements.
In this array [121, 432, 564, 23, 1, 45, 788], we have the largest
number 788. It has 3 digits. Therefore, the loop should go up to
hundreds place (3 times).
2. Now, go through each significant place one by one.
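The two steps above rely on counting the digits of the largest element and on reading the digit at a given place. A minimal C sketch of both operations follows, assuming non-negative decimal integers; the names count_digits and digit_at are hypothetical.

```c
/* Number of decimal digits in a non-negative integer (0 counts as 1 digit).
   count_digits is an illustrative name (an assumption). */
int count_digits(int value) {
    int digits = 1;
    while (value >= 10) {
        value /= 10;
        digits++;
    }
    return digits;
}

/* Digit of value at the given place, where place is 1 (units),
   10 (tens), 100 (hundreds), and so on. */
int digit_at(int value, int place) {
    return (value / place) % 10;
}
```

For the largest element 788, count_digits returns 3, so three passes are needed. In the hundreds pass, digit_at(788, 100) is 7, while shorter numbers such as 23 contribute the digit 0.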
radixSort(array)
    d <- number of digits in the largest element
    for i <- 0 to d - 1
        countingSort(array, i)    // stable sort on the i-th digit place
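#include <stdio.h>
// Stable counting sort on the digit selected by place (1, 10, 100, ...)
void countingSort(int array[], int size, int place) {
  int output[size], count[10] = {0};
  for (int i = 0; i < size; i++) count[(array[i] / place) % 10]++;
  for (int i = 1; i < 10; i++) count[i] += count[i - 1];
  for (int i = size - 1; i >= 0; i--)  // going backwards keeps equal digits in order
    output[--count[(array[i] / place) % 10]] = array[i];
  for (int i = 0; i < size; i++) array[i] = output[i];
}
// Radix sort for non-negative integers: one pass per digit of the maximum
void radixsort(int array[], int size) {
  int max = array[0];
  for (int i = 1; i < size; i++) if (array[i] > max) max = array[i];
  for (int place = 1; max / place > 0; place *= 10) countingSort(array, size, place);
}
// Print an array
void printArray(int array[], int size) {
  for (int i = 0; i < size; ++i) {
    printf("%d ", array[i]);
  }
  printf("\n");
}
// Driver code
int main() {
  int array[] = {121, 432, 564, 23, 1, 45, 788};
  int n = sizeof(array) / sizeof(array[0]);
  radixsort(array, n);
  printArray(array, n);
}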
Radix Sort Complexity
Time Complexity
Best O(n+k)
Worst O(n+k)
Average O(n+k)
Stability Yes
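Radix sort uses counting sort as its intermediate stable sort, and every pass needs
extra space for the output and count arrays. This makes radix sort space inefficient,
which is one reason why it is seldom the default sort in software libraries.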