Module V Unit 2: Hashing

Hashing: Hashing is a technique to convert a range of key values into a range of indexes of an array. Hashing is implemented using a hash function and a hash table.
Hash Table: A data structure that contains indexes and their associated data. Access to data becomes very fast if we know the index of the desired data.
Hash Function: A function used to map a key to a hash value. It is represented as h(x).
Collision: If the hash function produces the same index or hash value for multiple keys, a conflict arises. This situation is called a collision.
Types of hash functions:
 Division method
This is the simplest method of hashing an integer x. It divides x by m and uses the remainder as the hash value. In this case, the hash function can be given as
h(x) = x mod m
Since it requires only a single division operation, this method works very fast.
Example: Calculate the hash values of keys 1234 and 5642, where m = 97.
h(1234) = 1234 % 97 = 70
h(5642) = 5642 % 97 = 16
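As a quick illustration, here is a minimal C sketch of the division method (the helper name hash_division is our own, not from the notes):

#include <stdio.h>

// Division method: use the remainder of x / m as the hash value
unsigned int hash_division(unsigned int x, unsigned int m) {
  return x % m;
}

int main() {
  unsigned int m = 97;  // table size from the example above
  printf("h(1234) = %u\n", hash_division(1234, m));  // prints 70
  printf("h(5642) = %u\n", hash_division(5642, m));  // prints 16
}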
 Multiplication method
The steps involved in the multiplication method are as follows:
Step 1: Choose a constant a such that 0 < a < 1.
Step 2: Multiply the key k by a.
Step 3: Extract the fractional part of ka.
Step 4: Multiply the result of step 3 by the size of the hash table (m) and take the floor.
Hence, the hash function can be given as:
h(k) = ⌊m (ka mod 1)⌋
where (ka mod 1) gives the fractional part of ka and m is the total number of indices in the hash table.
Example: Given a hash table of size 1000, map the key 12345 to an appropriate location in the hash table.
We use a = 0.618033, m = 1000, and k = 12345:
h(12345) = ⌊1000 (12345 × 0.618033 mod 1)⌋
= ⌊1000 (7629.617385 mod 1)⌋
= ⌊1000 × 0.617385⌋
= ⌊617.385⌋
= 617
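A minimal C sketch of the multiplication method follows; the function name hash_multiplication and the use of math.h's floor are our own illustrative choices:

#include <stdio.h>
#include <math.h>

// Multiplication method: h(k) = floor(m * (k*a mod 1))
unsigned int hash_multiplication(unsigned int k, unsigned int m, double a) {
  double ka = k * a;
  double fractional = ka - floor(ka);  // (ka mod 1)
  return (unsigned int)(m * fractional);
}

int main() {
  // Reproduces the worked example: h(12345) = 617
  printf("h(12345) = %u\n", hash_multiplication(12345, 1000, 0.618033));
}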

 Mid Square Method
The mid-square method is a good hash function which works in two steps:
Step 1: Square the value of the key, i.e. find k².
Step 2: Extract the middle r digits of the result obtained in step 1.
In the mid-square method, the same r digits must be chosen from all the keys. Therefore, the hash function can be given as:
h(k) = s, where s is obtained by selecting r digits from k².
Example: Calculate the hash values for keys 1234 and 5642 using the mid-square method. The hash table has 100 memory locations.
Note that the hash table has 100 memory locations whose indices vary from 0 to 99. This means that only two digits are needed to map a key to a location in the hash table, so r = 2.
When k = 1234, k² = 1522756, so h(1234) = 27.
When k = 5642, k² = 31832164, so h(5642) = 21.
Observe that the 3rd and 4th digits starting from the right are chosen.
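For the specific case above (r = 2, taking the 3rd and 4th digits from the right), the mid-square method can be sketched in C as follows; extracting the digits by integer division is our own illustrative choice:

#include <stdio.h>

// Mid-square method for r = 2: square the key, drop the two lowest
// digits, and keep the next two digits
unsigned int hash_midsquare(unsigned long k) {
  unsigned long square = k * k;
  return (square / 100) % 100;
}

int main() {
  printf("h(1234) = %u\n", hash_midsquare(1234));  // 1522756 -> 27
  printf("h(5642) = %u\n", hash_midsquare(5642));  // 31832164 -> 21
}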
 Folding method
The folding method works in the following two steps:
Step 1: Divide the key value into a number of parts. That is, divide k into parts k1, k2, ..., kn, where each part has the same number of digits except the last part, which may have fewer digits than the other parts.
Step 2: Add the individual parts. That is, obtain the sum k1 + k2 + ... + kn. The hash value is produced by ignoring the last carry, if any.
Example: Given a hash table of 100 locations, calculate the hash value using the folding method for keys 5678, 321, and 34567.
Since there are 100 memory locations to address, we break each key into parts of two digits (except possibly the last part). The hash values are obtained as follows:
5678: 56 + 78 = 134; ignoring the carry, h(5678) = 34
321: 32 + 1 = 33, so h(321) = 33
34567: 34 + 56 + 7 = 97, so h(34567) = 97
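A possible C sketch of the folding method for a table of 100 locations is shown below; splitting the key through its decimal string is one way (of several) to take the two-digit parts from the left, as the example does:

#include <stdio.h>

// Folding method for 100 locations: split the key into two-digit
// parts from the left, add the parts, and ignore any final carry
unsigned int hash_folding(unsigned long k) {
  char digits[32];
  int len = sprintf(digits, "%lu", k);
  unsigned long sum = 0;
  for (int i = 0; i < len; i += 2) {
    unsigned int part = digits[i] - '0';
    if (i + 1 < len)                      // last part may be one digit
      part = part * 10 + (digits[i + 1] - '0');
    sum += part;
  }
  return sum % 100;  // ignoring the carry keeps two digits
}

int main() {
  printf("h(5678) = %u\n", hash_folding(5678));    // 56+78 = 134 -> 34
  printf("h(321) = %u\n", hash_folding(321));      // 32+1 = 33
  printf("h(34567) = %u\n", hash_folding(34567));  // 34+56+7 = 97
}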
The collision resolution techniques are :
1. Separate chaining
2. Open addressing
Separate chaining:
 In this technique, if the hash function produces the same index for multiple elements, those elements are stored at the same index using a linked list.
 If no element is hashed to a particular index, that index contains NULL.
Example: Insert the keys 7, 24, 18, 52, 36, 54, 11, and 23 in a chained hash table of 9 memory locations. Use the hash function h(k) = k mod m. In this case, m = 9, so:
h(7) = 7, h(24) = 6, h(18) = 0, h(52) = 7, h(36) = 0, h(54) = 0, h(11) = 2, h(23) = 5
After all the insertions, the chains are:
Index 0: 18 → 36 → 54
Index 2: 11
Index 5: 23
Index 6: 24
Index 7: 7 → 52
All other indexes contain NULL.
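The chained table above can be reproduced with the following C sketch (appending at the tail of each chain so the lists keep insertion order; all names here are illustrative):

#include <stdio.h>
#include <stdlib.h>

#define M 9  // number of memory locations, as in the example

typedef struct Node {
  int key;
  struct Node *next;
} Node;

// Append a key to the tail of the chain at index key mod M
void chainInsert(Node *table[], int key) {
  Node *node = malloc(sizeof(Node));
  node->key = key;
  node->next = NULL;
  int index = key % M;
  if (table[index] == NULL) {
    table[index] = node;
  } else {
    Node *p = table[index];
    while (p->next != NULL)
      p = p->next;
    p->next = node;
  }
}

int main() {
  Node *table[M] = {NULL};  // all chains start empty
  int keys[] = {7, 24, 18, 52, 36, 54, 11, 23};
  for (int i = 0; i < 8; i++)
    chainInsert(table, keys[i]);
  for (int i = 0; i < M; i++) {  // print each chain
    printf("Index %d:", i);
    for (Node *p = table[i]; p != NULL; p = p->next)
      printf(" %d", p->key);
    printf("\n");
  }
}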
Open Addressing:
 In this technique, the hash table contains two types of values: sentinel values (e.g., –1)
and data values. The presence of a sentinel value indicates that the location contains no
data value at present but can be used to hold a value.
 When a key is mapped to a particular memory location, then the value it holds is
checked. If it contains a sentinel value, then the location is free and the data value can be
stored in it.
 If the location already has some data value stored in it, then other slots are examined systematically in the forward direction to find a free slot. If no free location is found, we have an overflow condition.
 The process of examining memory locations in the hash table is called probing.
 Open addressing technique can be implemented using:
1. Linear probing
2. Quadratic probing
3. Double hashing
4. Rehashing.
Linear probing:
 The simplest approach to resolving a collision is linear probing. In this technique, if a value is already stored at the location generated by h(k), then the following hash function is used to resolve the collision:
h(k, i) = [h′(k) + i] mod m
where m is the size of the hash table, h′(k) = k mod m, and i is the probe number, which varies from 0 to m−1.
Example: Consider a hash table of size 10. Using linear probing, insert the keys 72, 27, 36, 24, 63, 81, 92 into the table.
Let h′(k) = k mod m, m = 10. Initially the table is empty. The keys 72, 27, 36, 24, 63, and 81 hash to the free locations 2, 7, 6, 4, 3, and 1 respectively. Key 92 hashes to location 2, which is occupied, so we probe locations 3 and 4 (also occupied) and finally store 92 at the free location 5.
Advantages:
 Easy to compute.
Disadvantages:
 The table must be big enough to have a free cell.
 The time to find a free cell may be quite large.
 Primary clustering: any key that hashes into a cluster will require several attempts to resolve the collision.
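The linear probing example above can be reproduced with a short C sketch (the sentinel value -1 follows the open-addressing description earlier; the names are illustrative):

#include <stdio.h>

#define M 10
#define EMPTY -1  // sentinel value marking a free slot

// Insert a key using h(k, i) = (k mod M + i) mod M
int probeInsert(int table[], int key) {
  for (int i = 0; i < M; i++) {
    int index = (key % M + i) % M;
    if (table[index] == EMPTY) {
      table[index] = key;
      return index;
    }
  }
  return -1;  // overflow: no free slot found
}

int main() {
  int table[M];
  for (int i = 0; i < M; i++)
    table[i] = EMPTY;
  int keys[] = {72, 27, 36, 24, 63, 81, 92};
  for (int i = 0; i < 7; i++)
    printf("%d stored at index %d\n", keys[i], probeInsert(table, keys[i]));
}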

Heap Sort Algorithm
Heap Sort is a popular and efficient sorting algorithm in computer programming. Learning how to write the heap sort algorithm requires knowledge of two types of data structures: arrays and trees.
The initial set of numbers that we want to sort is stored in an array, e.g. [10, 3, 76, 34, 23, 32], and after sorting we get a sorted array [3, 10, 23, 32, 34, 76].
Heap sort works by visualizing the elements of the array as a special kind of complete binary tree called a heap.

Relationship between Array Indexes and Tree Elements
A complete binary tree has an interesting property that we can use to find the children and parent of any node.
If the index of any element in the array is i, the element at index 2i+1 will become the left child and the element at index 2i+2 will become the right child. Also, the parent of any element at index i is given by the floor of (i-1)/2.
(Figure: relationship between array and heap indices, for the array [1, 12, 9, 5, 6, 10])
Let's test it out.

Left child of 1 (index 0) = element at index (2*0+1) = element at index 1 = 12
Right child of 1 = element at index (2*0+2) = element at index 2 = 9
Similarly,
Left child of 12 (index 1) = element at index (2*1+1) = element at index 3 = 5
Right child of 12 = element at index (2*1+2) = element at index 4 = 6
Let us also confirm that the rule holds for finding the parent of any node:
Parent of 9 (index 2) = floor((2-1)/2) = floor(0.5) = index 0 = element 1
Parent of 12 (index 1) = floor((1-1)/2) = index 0 = element 1

Understanding this mapping of array indexes to tree positions is critical to understanding how the Heap Data Structure works and how it is used to implement Heap Sort.
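In C, with 0-based array indexes, this mapping reduces to three one-line helpers (integer division performs the floor automatically; the function names are our own):

// Index arithmetic for a heap stored in a 0-based array
int leftChild(int i)  { return 2 * i + 1; }
int rightChild(int i) { return 2 * i + 2; }
int parent(int i)     { return (i - 1) / 2; }  // integer division = floor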
What is Heap Data Structure?
Heap is a special tree-based data structure. A binary tree is said to follow a heap data structure if
● it is a complete binary tree
● all nodes in the tree follow the property that they are greater than their children, i.e. the largest element is at the root and both its children are smaller than the root, and so on. Such a heap is called a max-heap. If instead all nodes are smaller than their children, it is called a min-heap.
The following example diagram shows a Max-Heap and a Min-Heap.
How to "heapify" a tree


Starting from a complete binary tree, we can modify it to become
a Max-Heap by running a function called heapify on all the non-
leaf elements of the heap.
Since heapify uses recursion, it can be difficult to grasp. So let's
first think about how you would heapify a tree with just three
elements.

heapify(array)
Root = array[0]
Largest = largest( array[0] , array [2*0 + 1].
array[2*0+2])
if(Root != Largest)
Swap(Root, Largest)

The example above shows two scenarios - one in which the root
is the largest element and we don't need to do anything. And
another in which the root had a larger element as a child and we
needed to swap to maintain max-heap property.

If you've worked with recursive algorithms before, you've probably identified that this must be the base case.
Now let's think of another scenario in which there is more than one level.

The top element isn't a max-heap, but all the sub-trees are max-heaps.
To maintain the max-heap property for the entire tree, we will have to keep pushing 2 downwards until it reaches its correct position.
Thus, to maintain the max-heap property in a tree where both sub-trees are max-heaps, we need to run heapify on the root element repeatedly until it is larger than its children or it becomes a leaf node.
We can combine both these conditions in one heapify function as
void heapify(int arr[], int n, int i) {
  // Find largest among root, left child and right child
  int largest = i;
  int left = 2 * i + 1;
  int right = 2 * i + 2;

  if (left < n && arr[left] > arr[largest])
    largest = left;

  if (right < n && arr[right] > arr[largest])
    largest = right;

  // Swap and continue heapifying if root is not largest
  if (largest != i) {
    swap(&arr[i], &arr[largest]);
    heapify(arr, n, largest);
  }
}
This function works for both the base case and for a tree of any
size. We can thus move the root element to the correct position to
maintain the max-heap status for any tree size as long as the sub-
trees are max-heaps.
Build max-heap
To build a max-heap from any tree, we can thus start heapifying
each sub-tree from the bottom up and end up with a max-heap
after the function is applied to all the elements including the root
element.

In the case of a complete tree, the index of the last non-leaf node is given by n/2 - 1. All nodes after that are leaf nodes and thus don't need to be heapified.
So, we can build a max-heap as:

// Build heap (rearrange array)
for (int i = n / 2 - 1; i >= 0; i--)
  heapify(arr, n, i);
(Figure: steps to build a max heap for heap sort)
As shown in the diagram above, we start by heapifying the smallest sub-trees at the lowest levels and gradually move up until we reach the root element.

If you've understood everything till here, congratulations, you are on your way to mastering Heap sort.

Working of Heap Sort
1. Since the tree satisfies the Max-Heap property, the largest item is stored at the root node.
2. Swap: Remove the root element and put it at the end of the array (nth position). Put the last item of the tree (heap) in the vacant place.
3. Remove: Reduce the size of the heap by 1.
4. Heapify: Heapify the root element again so that we have the highest element at the root.
5. The process is repeated until all the items of the list are sorted.
The code below shows the operation:

// Heap sort
for (int i = n - 1; i >= 0; i--) {
  swap(&arr[0], &arr[i]);

  // Heapify root element to get highest element at root again
  heapify(arr, i, 0);
}

// Heap Sort in C

#include <stdio.h>

// Function to swap the positions of two elements
void swap(int *a, int *b) {
  int temp = *a;
  *a = *b;
  *b = temp;
}

void heapify(int arr[], int n, int i) {
  // Find largest among root, left child and right child
  int largest = i;
  int left = 2 * i + 1;
  int right = 2 * i + 2;

  if (left < n && arr[left] > arr[largest])
    largest = left;

  if (right < n && arr[right] > arr[largest])
    largest = right;

  // Swap and continue heapifying if root is not largest
  if (largest != i) {
    swap(&arr[i], &arr[largest]);
    heapify(arr, n, largest);
  }
}

// Main function to do heap sort
void heapSort(int arr[], int n) {
  // Build max heap
  for (int i = n / 2 - 1; i >= 0; i--)
    heapify(arr, n, i);

  // Heap sort
  for (int i = n - 1; i >= 0; i--) {
    swap(&arr[0], &arr[i]);

    // Heapify root element to get highest element at root again
    heapify(arr, i, 0);
  }
}

// Print an array
void printArray(int arr[], int n) {
  for (int i = 0; i < n; ++i)
    printf("%d ", arr[i]);
  printf("\n");
}

// Driver code
int main() {
  int arr[] = {1, 12, 9, 5, 6, 10};
  int n = sizeof(arr) / sizeof(arr[0]);

  heapSort(arr, n);

  printf("Sorted array is \n");
  printArray(arr, n);
}
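Compiling and running this program prints:

Sorted array is
1 5 6 9 10 12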

Heap Sort Complexity

Time Complexity:
Best: O(n log n)
Worst: O(n log n)
Average: O(n log n)
Space Complexity: O(1)
Stability: No
Heap Sort Applications
Systems concerned with security, and embedded systems such as the Linux kernel, use Heap Sort because of the O(n log n) upper bound on its running time and the constant O(1) upper bound on its auxiliary storage.
Although Heap Sort has O(n log n) time complexity even in the worst case, it has fewer applications than other sorting algorithms such as Quick Sort and Merge Sort. However, its underlying data structure, the heap, can be used efficiently when we want to extract the smallest (or largest) element from a list of items without the overhead of keeping the remaining items in sorted order, for example in priority queues.

Radix Sort Algorithm
Radix sort is a sorting algorithm that sorts elements by first grouping the individual digits of the same place value, then ordering the elements according to those digits.
Suppose we have an array of 7 elements. First, we sort the elements based on the value of the unit place. Then, we sort the elements based on the value of the tens place. This process goes on until the most significant place.
Let the initial array be [121, 432, 564, 23, 1, 45, 788]. Radix sort proceeds through the passes shown in the steps below.
Working of Radix Sort
1. Find the largest element in the array, i.e. max. Let X be the number of digits in max. X is calculated because we have to go through all the significant places of all elements.
In the array [121, 432, 564, 23, 1, 45, 788], the largest number is 788. It has 3 digits, so the loop should go up to the hundreds place (3 passes).
2. Now, go through each significant place one by one. Use any stable sorting technique to sort the digits at each significant place; we have used counting sort for this.
Sorting the elements based on the unit place digits gives [121, 1, 432, 23, 564, 45, 788].
3. Sorting the elements based on the tens place digits gives [1, 121, 23, 432, 45, 564, 788].
4. Finally, sorting the elements based on the hundreds place digits gives the sorted array [1, 23, 45, 121, 432, 564, 788].

Radix Sort Algorithm

radixSort(array)
  d <- number of digits in the largest element
  create 10 buckets for the digits 0-9
  for i <- 1 to d
    sort the elements according to the digits at the ith place using countingSort

countingSort(array, place)
  max <- largest digit at the given place among all elements
  initialize the count array with all zeros
  for j <- 0 to size
    find the total count of each unique digit at the given place and store the count at the jth index in the count array
  for i <- 1 to max
    find the cumulative sum and store it in the count array itself
  for j <- size down to 1
    restore the elements to the array, decreasing the count of each restored element by 1

// Radix Sort in C Programming

#include <stdio.h>

// Function to get the largest element from an array
int getMax(int array[], int n) {
  int max = array[0];
  for (int i = 1; i < n; i++)
    if (array[i] > max)
      max = array[i];
  return max;
}

// Using counting sort to sort the elements on the basis of significant places
void countingSort(int array[], int size, int place) {
  int output[size + 1];
  int max = (array[0] / place) % 10;

  // Find the largest digit at the current place value
  for (int i = 1; i < size; i++) {
    if (((array[i] / place) % 10) > max)
      max = (array[i] / place) % 10;
  }
  int count[max + 1];

  for (int i = 0; i <= max; ++i)
    count[i] = 0;

  // Calculate count of elements
  for (int i = 0; i < size; i++)
    count[(array[i] / place) % 10]++;

  // Calculate cumulative count
  for (int i = 1; i <= max; i++)
    count[i] += count[i - 1];

  // Place the elements in sorted order
  for (int i = size - 1; i >= 0; i--) {
    output[count[(array[i] / place) % 10] - 1] = array[i];
    count[(array[i] / place) % 10]--;
  }

  for (int i = 0; i < size; i++)
    array[i] = output[i];
}

// Main function to implement radix sort
void radixsort(int array[], int size) {
  // Get maximum element
  int max = getMax(array, size);

  // Apply counting sort to sort elements based on place value
  for (int place = 1; max / place > 0; place *= 10)
    countingSort(array, size, place);
}

// Print an array
void printArray(int array[], int size) {
  for (int i = 0; i < size; ++i) {
    printf("%d ", array[i]);
  }
  printf("\n");
}

// Driver code
int main() {
  int array[] = {121, 432, 564, 23, 1, 45, 788};
  int n = sizeof(array) / sizeof(array[0]);
  radixsort(array, n);
  printArray(array, n);
}
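Running this program prints the sorted array:

1 23 45 121 432 564 788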
Radix Sort Complexity

Time Complexity:
Best: O(n+k)
Worst: O(n+k)
Average: O(n+k)
Space Complexity: O(max)
Stability: Yes

Since radix sort is a non-comparative algorithm, it has advantages over comparative sorting algorithms.
For a radix sort that uses counting sort as an intermediate stable sort, the time complexity is O(d(n+k)). Here, d is the number of passes and O(n+k) is the time complexity of counting sort.
Thus, radix sort has linear time complexity, which is better than the O(n log n) of comparative sorting algorithms.
For numbers with very many digits, or numbers in larger bases such as 32-bit and 64-bit values, radix sort can still run in linear time, but the intermediate sort takes a large amount of space.
This makes radix sort space inefficient, and it is the reason this sort is not used in software libraries.

Radix Sort Applications
Radix sort is implemented in
● the DC3 algorithm (Kärkkäinen-Sanders-Burkhardt) for building a suffix array.
● places where there are numbers in large ranges.
