Hashing

• Hashing is the process of transforming a given key into
another value.
• It maps data to a specific index in a hash table using a hash
function, which enables fast retrieval of information based on
its key.
• The key is transformed by a hash function, and the value the
function returns is called the hash code.
Need for Hash data structure
● The array is a very common data structure.
● For a large data set, an array becomes inefficient for lookups
by key: searching an unsorted array takes O(n) time.
● We therefore want a data structure that can store data and
search it in constant time, i.e. in O(1) time. This is the
motivation for the hash data structure.
● With a hash table, data can be stored and retrieved in
constant time on average.
Components of Hashing
1. Key: A key can be any value, such as a string or an integer,
that is fed as input to the hash function.
2. Hash Function: The hash function receives the input key and
returns the index of an element in an array called a hash table.
The index is known as the hash index.
3. Hash Table: A hash table is a data structure that maps keys to
values using a hash function. It stores the data in an
associative manner in an array, where each data value is
placed at the index computed from its key.
How Hashing Works
The process of hashing can be broken down into three steps:

1. Input: The data to be hashed is input into the hashing algorithm.
2. Hash Function: The hashing algorithm takes the input data and
applies a mathematical function to generate a fixed-size hash
value.
3. Output: The hash value is returned, which is used as an index to
store or retrieve data in a data structure.
Direct Address Table
● A direct address table is an array-based data structure
that maps keys directly to their records.
● In direct address tables, records are placed using
their key values directly as indexes.
● Searching, insertion and deletion each take O(1) time.
Example:
We create an array of size equal to the maximum key value plus one
(assuming 0-based indexing) and then use the keys as indexes. For
example, a record with key 21 is stored at index 21.
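The direct addressing idea can be sketched in C as follows. This is a minimal illustration, not a reference implementation: the names `DirectTable`, `dat_insert`, `dat_search`, `dat_delete` and the limit `MAX_KEY` are assumptions made for this example.

```c
#include <stdbool.h>

#define MAX_KEY 99  /* assumed largest key; the table needs MAX_KEY + 1 slots */

typedef struct {
    bool used[MAX_KEY + 1];   /* true if a record is stored at this index */
    int  value[MAX_KEY + 1];  /* the record stored for each key */
} DirectTable;

/* Store a record; the key itself is the array index.
   Returns false if the key is outside 0..MAX_KEY. */
bool dat_insert(DirectTable *t, int key, int value) {
    if (key < 0 || key > MAX_KEY) return false;
    t->used[key] = true;
    t->value[key] = value;
    return true;
}

/* Look up a key in O(1); writes the record to *out when found. */
bool dat_search(const DirectTable *t, int key, int *out) {
    if (key < 0 || key > MAX_KEY || !t->used[key]) return false;
    *out = t->value[key];
    return true;
}

/* Delete a key in O(1) by marking its slot as unused. */
void dat_delete(DirectTable *t, int key) {
    if (key >= 0 && key <= MAX_KEY) t->used[key] = false;
}
```

A table declared as `DirectTable t = {0};` starts empty, and `dat_insert(&t, 21, 500)` stores the record at index 21. Note that the array always occupies MAX_KEY + 1 slots, no matter how few records are stored.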
Limitations:
● The maximum key value must be known in advance.
● Practically useful only if the maximum value is small.
● Memory is wasted if there is a significant difference
between the number of records and the maximum value.

Hashing can overcome these limitations of direct
address tables.
Hash function
● A hash function maps a key to a value by applying a
mathematical formula to the key.
● The result of the hash function is referred to as a
hash value or hash. The hash value is a
representation of the original key, usually smaller
than the original.
Types of Hash functions
Division method
This method divides the key by the table size m and takes the
remainder as the hash value: h(k) = k mod m. For example, if the
table size is 10 and the key is 23, the hash value is 3 (23 % 10 = 3).

Advantages:
● Simple to implement.
● Works well when m is a prime number. A prime modulus is less
likely to share common factors with the keys, so keys are
distributed more evenly and clustering is reduced, which
improves the performance of the hash table.

Disadvantages:
● Poor distribution if m is not chosen wisely.
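As a small illustration of the division method and of why the table size matters, here is a sketch in C. The function names `hash_division` and `slots_used` are made up for this example, and `slots_used` assumes a table of at most 64 slots.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Division method: h(k) = k mod m. */
unsigned hash_division(unsigned key, unsigned m) {
    assert(m > 0);
    return key % m;
}

/* Count how many distinct slots a set of keys occupies in a table
   of size m (m <= 64 is an assumption of this sketch). */
unsigned slots_used(const unsigned *keys, size_t n, unsigned m) {
    bool seen[64] = { false };
    unsigned count = 0;
    assert(m > 0 && m <= 64);
    for (size_t i = 0; i < n; i++) {
        unsigned h = hash_division(keys[i], m);
        if (!seen[h]) {
            seen[h] = true;
            count++;
        }
    }
    return count;
}
```

For the keys 10, 20, 30, 40, 50, 60, 70, a table of size 10 places every key in slot 0, while a table of size 7 (a prime) spreads them over all seven slots.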


Multiplication Method
● In the multiplication method, the key is multiplied by a constant
A with 0 < A < 1. Knuth suggests A ≈ (√5 - 1)/2 ≈ 0.618033 as a
choice that works well.
● The fractional part of the product is taken, which always lies
in the range [0, 1).
● The fractional part is then multiplied by m, the desired size of
the hash table, which scales the result to the range [0, m).
● The final result is floored to obtain an integer value that
serves as the index.
Example:
Given:
Key to hash = 42
Constant A = 0.618033
Table size m = 10

H(key) = ⌊m × ((key × A) mod 1)⌋

H(42) = ⌊10 × ((42 × 0.618033) mod 1)⌋
H(42) = ⌊10 × (25.957386 mod 1)⌋
H(42) = ⌊10 × 0.957386⌋
H(42) = ⌊9.57386⌋
H(42) = 9
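The calculation above could be written in C roughly as follows. This is a sketch under assumptions: the name `hash_multiplication` and the macro `HASH_A` are chosen for this example, keys are unsigned integers, and floating-point arithmetic is used for simplicity.

```c
#include <assert.h>

#define HASH_A 0.618033  /* constant from the worked example, about (sqrt(5) - 1) / 2 */

/* Multiplication method: H(key) = floor(m * ((key * A) mod 1)). */
unsigned hash_multiplication(unsigned key, unsigned m) {
    assert(m > 0);
    double product = (double)key * HASH_A;
    /* (key * A) mod 1: subtract the integer part to keep the fraction. */
    double frac = product - (double)(unsigned long long)product;
    /* Converting a non-negative double to unsigned truncates, which is the floor. */
    unsigned h = (unsigned)((double)m * frac);
    if (h >= m) h = m - 1;  /* guard against floating-point rounding at the upper edge */
    return h;
}
```

With these values, `hash_multiplication(42, 10)` returns 9, matching the worked example.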
Universal Hashing
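● Universal hashing picks a hash function at random from a family
of hash functions, which keeps the expected number of collisions
low for any given set of inputs.

h(k) = ((a⋅k + b) mod p) mod m

where k is the key, m is the table size, p is a prime number greater
than m and larger than any key, and a and b are randomly chosen
constants with 1 ≤ a ≤ p - 1 and 0 ≤ b ≤ p - 1.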

Advantages:
● Reduces the probability of collisions.

Disadvantages:
● Requires more computation and storage.
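One way to sketch the formula h(k) = ((a⋅k + b) mod p) mod m in C is shown below. The names `UniversalHash`, `uh_pick`, `uh_hash` and the prime `UH_PRIME` are assumptions for this example. It also assumes that keys are smaller than the prime, and it uses the standard library's `rand()` for brevity, which is not a high-quality source of randomness.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define UH_PRIME 2147483647ULL  /* 2^31 - 1, a prime; keys are assumed smaller than this */

typedef struct {
    uint64_t a;  /* random multiplier, 1 <= a <= p - 1 */
    uint64_t b;  /* random offset,     0 <= b <= p - 1 */
    uint64_t m;  /* table size */
} UniversalHash;

/* Pick one function from the family by choosing a and b at random. */
UniversalHash uh_pick(uint64_t m) {
    UniversalHash h;
    assert(m > 0 && m < UH_PRIME);
    h.a = 1 + (uint64_t)rand() % (UH_PRIME - 1);
    h.b = (uint64_t)rand() % UH_PRIME;
    h.m = m;
    return h;
}

/* h(k) = ((a * k + b) mod p) mod m.
   With a, k, b < 2^31 the intermediate value fits in 64 bits. */
uint64_t uh_hash(const UniversalHash *h, uint64_t key) {
    assert(key < UH_PRIME);
    return ((h->a * key + h->b) % UH_PRIME) % h->m;
}
```

The function is picked once, for example when the table is created, and then used for every key. Calling `srand` with a varying seed beforehand gives a different member of the family on each run.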
What is Collision?
● A collision in hashing occurs when two different keys map to
the same hash value.
● The probability of a hash collision depends on the size of the
hash table, the distribution of hash values and the quality of
the hash function.
How to handle Collisions
Separate Chaining
● The idea is to make each cell of the hash table point to a
linked list of records that have the same hash function value.
● This method is implemented using the linked list data
structure.
● When numerous elements are hashed into the same slot index,
those elements are added to a chain, which is a singly-linked
list.
Example:
Hash function: key mod 7
Values: 50, 700, 76, 85, 92, 73, 101
Advantages:
● Easy to implement.
● We can always add more elements to the chain, thus the hash table
never runs out of space.
● Less sensitive to the load factor and to the choice of hash function.
● Typically used when it is unclear how many keys will be stored or
how frequently keys will be added or removed.

Disadvantages:
● Chaining's cache performance is poor since keys are kept in a linked
list.
● Space wastage (some parts of the hash table are never used).
● In the worst case, search time can become O(n) as the chain
lengthens.
● Additional space is used for the links between nodes.
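Separate chaining with the hash function key mod 7 from the example could look roughly like this in C. This is a sketch rather than a complete program: the names `ChainTable`, `chain_insert`, `chain_search`, `chain_delete` and `chain_length` are assumptions, keys are assumed to be non-negative, and new keys are added at the head of the chain.

```c
#include <stdbool.h>
#include <stdlib.h>

#define TABLE_SIZE 7

typedef struct Node {
    int key;
    struct Node *next;
} Node;

typedef struct {
    Node *slots[TABLE_SIZE];  /* each slot points to the head of a chain */
} ChainTable;

/* Hash function from the example: key mod 7 (non-negative keys assumed). */
static unsigned chain_hash(int key) { return (unsigned)key % TABLE_SIZE; }

/* Insert at the head of the chain for the key's slot. */
bool chain_insert(ChainTable *t, int key) {
    Node *n = malloc(sizeof *n);
    if (n == NULL) return false;
    unsigned i = chain_hash(key);
    n->key = key;
    n->next = t->slots[i];
    t->slots[i] = n;
    return true;
}

/* Walk the chain of the key's slot until the key is found. */
bool chain_search(const ChainTable *t, int key) {
    for (const Node *n = t->slots[chain_hash(key)]; n != NULL; n = n->next)
        if (n->key == key) return true;
    return false;
}

/* Unlink and free the first node holding the key. */
bool chain_delete(ChainTable *t, int key) {
    Node **link = &t->slots[chain_hash(key)];
    while (*link != NULL) {
        if ((*link)->key == key) {
            Node *dead = *link;
            *link = dead->next;
            free(dead);
            return true;
        }
        link = &(*link)->next;
    }
    return false;
}

/* Number of keys chained in one slot. */
size_t chain_length(const ChainTable *t, unsigned slot) {
    size_t len = 0;
    for (const Node *n = t->slots[slot]; n != NULL; n = n->next) len++;
    return len;
}
```

Inserting 50, 700, 76, 85, 92, 73, 101 into an empty table (`ChainTable t = {0};`) gives the chains: slot 0 holds 700; slot 1 holds 50, 85, 92; slot 3 holds 73, 101; slot 6 holds 76. The remaining slots stay empty.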
Open Addressing
Unlike chaining, open addressing does not store multiple elements
in the same slot. Each slot either holds a single key or is left
NIL (empty). When a collision occurs, other slots are probed until
a free one is found.

Linear Probing
In linear probing, the hash table is searched sequentially, starting
from the original hash location. If that location is already
occupied, we check the next location, wrapping around to the start
of the table when the end is reached.
Algorithm
1. Calculate the hash index: key = data % size.
2. If hashTable[key] is empty, store the value directly:
hashTable[key] = data.
3. If the hash index already holds a value, move to the next index
using key = (key + 1) % size.
4. If hashTable[key] at the new index is empty, store the value
there. Otherwise, try the next index.
5. Repeat until a free slot is found. If every slot has been
checked, the table is full.
Example: Consider the simple hash function "key mod 5" and the
sequence of keys 50, 70, 76, 85, 93 to be inserted.
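The linear probing steps can be sketched in C as below, using the table size 5 from the example. The names `lp_init`, `lp_insert`, `lp_search` and the sentinel `LP_EMPTY` are assumptions for this sketch, and keys are assumed to be non-negative. The loop is bounded by the table size, so insertion into a full table stops instead of probing forever.

```c
#define LP_SIZE  5
#define LP_EMPTY (-1)  /* sentinel for a free slot; keys are assumed non-negative */

/* Mark every slot as free. */
void lp_init(int table[LP_SIZE]) {
    for (int i = 0; i < LP_SIZE; i++) table[i] = LP_EMPTY;
}

/* Insert data; returns the index used, or -1 if the table is full. */
int lp_insert(int table[LP_SIZE], int data) {
    int key = data % LP_SIZE;              /* step 1: home index */
    for (int probes = 0; probes < LP_SIZE; probes++) {
        if (table[key] == LP_EMPTY) {      /* steps 2 and 4: free slot found */
            table[key] = data;
            return key;
        }
        key = (key + 1) % LP_SIZE;         /* step 3: next index, wrapping around */
    }
    return -1;                             /* every slot was checked */
}

/* Search for data; returns its index, or -1 if it is not in the table. */
int lp_search(const int table[LP_SIZE], int data) {
    int key = data % LP_SIZE;
    for (int probes = 0; probes < LP_SIZE; probes++) {
        if (table[key] == LP_EMPTY) return -1;  /* an empty slot ends the probe run */
        if (table[key] == data) return key;
        key = (key + 1) % LP_SIZE;
    }
    return -1;
}
```

For the keys 50, 70, 76, 85, 93 this places 50 at index 0, 70 at index 1 (index 0 is taken), 76 at index 2 (index 1 is taken), 85 at index 3 (indexes 0, 1 and 2 are taken) and 93 at index 4 (index 3 is taken). Deletion is left out here because it needs extra care in open addressing, for example a separate "deleted" marker.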
Quadratic Probing
Quadratic probing operates by taking the original hash index and
adding successive values of an arbitrary quadratic polynomial until an
open slot is found.

Algorithm
1. If the slot hash(x) % n is full, then we try (hash(x) + 1²) % n.
2. If (hash(x) + 1²) % n is also full, then we try (hash(x) + 2²) % n.
3. If (hash(x) + 2²) % n is also full, then we try (hash(x) + 3²) % n.
4. This process is repeated for increasing values of i until an empty
slot is found.
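Example 1:
Table size = 7, hash function Hash(x) = x % 7. Insert 22, 30,
and 50.
22 goes to index 1 and 30 goes to index 2. 50 hashes to index 1
(occupied), then (1 + 1²) % 7 = 2 (occupied), then (1 + 2²) % 7 = 5,
where it is stored.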
Example 2:
Table size: 10; hash function: h(x) = x % 10
Keys: 2, 12, 22, 32
Quadratic probing formula: (hash(x) + i²) % n

Resulting table:
Index: 0   1   2   3   4   5   6   7   8   9
Key:   -   32  2   12  -   -   22  -   -   -

1. h(2) = 2 % 10 = 2
2. h(12) = 12 % 10 = 2, index 2 is occupied
(2 + 1²) % 10 = 3
3. h(22) = 22 % 10 = 2, index 2 is occupied
(2 + 1²) % 10 = 3, index 3 is occupied
(2 + 2²) % 10 = 6
4. h(32) = 32 % 10 = 2, index 2 is occupied
(2 + 1²) % 10 = 3, index 3 is occupied
(2 + 2²) % 10 = 6, index 6 is occupied
(2 + 3²) % 10 = 1
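The probing formula (hash(x) + i²) % n can be sketched in C as follows. The name `qp_insert` and the sentinel `QP_EMPTY` are assumptions for this example, and keys are assumed to be non-negative. The sketch gives up after n probes, since quadratic probing is not guaranteed to visit every slot.

```c
#define QP_EMPTY (-1)  /* sentinel for a free slot; keys are assumed non-negative */

/* Insert x into a table of n slots using quadratic probing.
   Returns the index used, or -1 if no free slot was found in n probes. */
int qp_insert(int *table, int n, int x) {
    int home = x % n;                     /* hash(x) % n */
    for (int i = 0; i < n; i++) {
        int idx = (home + i * i) % n;     /* (hash(x) + i^2) % n, i = 0 is the home slot */
        if (table[idx] == QP_EMPTY) {
            table[idx] = x;
            return idx;
        }
    }
    return -1;
}
```

With every slot first set to `QP_EMPTY`, inserting 2, 12, 22, 32 into a table of size 10 returns the indexes 2, 3, 6 and 1, as in Example 2. Inserting 22, 30, 50 into a table of size 7 returns 1, 2 and 5.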
Double Hashing
● Double hashing works by using two hash functions to compute two
different hash values for a given key.
● The first hash function is h1(k) which takes the key and gives out a
location on the hash table.
● If that location is occupied (a collision), the secondary hash
function h2(k) is used in combination with the first hash function
h1(k) to find a new location in the hash table.
● h2(k) must never evaluate to 0, otherwise the probe would stay on
the same slot.

This combination of hash functions is of the form

h(k, i) = (h1(k) + i * h2(k)) % n
where:
● i is a non-negative integer that counts the collisions so far
(i = 0 for the first attempt),
● k is the element/key being hashed,
● n is the hash table size.
Example:
Keys to insert: 27, 43, 692, 72
Hash table size: 7
First hash function: h1(k) = k mod 7
Second hash function: h2(k) = 1 + (k mod 5)
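The example can be traced with a short C sketch of h(k, i) = (h1(k) + i * h2(k)) % n. The names `dh_insert`, `dh_h1`, `dh_h2` and the sentinel `DH_EMPTY` are assumptions for this illustration, and keys are assumed to be non-negative.

```c
#define DH_EMPTY (-1)  /* sentinel for a free slot; keys are assumed non-negative */

/* First hash function from the example: h1(k) = k mod n. */
static int dh_h1(int k, int n) { return k % n; }

/* Second hash function from the example: h2(k) = 1 + (k mod 5), never 0. */
static int dh_h2(int k) { return 1 + (k % 5); }

/* Insert k using h(k, i) = (h1(k) + i * h2(k)) % n.
   Returns the index used, or -1 if no free slot was found in n probes. */
int dh_insert(int *table, int n, int k) {
    for (int i = 0; i < n; i++) {
        int idx = (dh_h1(k, n) + i * dh_h2(k)) % n;
        if (table[idx] == DH_EMPTY) {
            table[idx] = k;
            return idx;
        }
    }
    return -1;
}
```

With a table of size 7 whose slots all start as `DH_EMPTY`: 27 goes to index 6 and 43 goes to index 1. 692 hashes to index 6 (occupied), h2(692) = 3, so it moves to (6 + 3) % 7 = 2. 72 hashes to index 2 (occupied), h2(72) = 3, so it moves to (2 + 3) % 7 = 5.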
Lab Activity 1
Create a C program that allows the user to insert, search,
display, and delete integer elements in a hash table. The
program must use the multiplication method as the hash function.
For collision resolution, use separate chaining.
Sources
● https://www.geeksforgeeks.org/introduction-to-hashing-2/#how-to-handle-collisions
● https://www.youtube.com/watch?v=KyUTuwz_b7Q
● https://medium.com/@alejandro.itoaramendia/the-hash-table-data-structure-a-complete-guide-27fb7ebed2ff
