Hashing Algorithms
Hashing
• Hashing is a searching technique that finds an item in constant time on
average: the expected time complexity of a lookup is O(1).
• So far we have covered two searching techniques, linear search and binary
search. The worst-case time complexity is O(n) for linear search and
O(log n) for binary search. In both techniques the search time depends on
the number of elements. Hashing was introduced to make the search time
independent of the number of elements; in the worst case, when many keys
collide, a lookup can still degrade to O(n).
• Hashing is the process of converting a given key into another value with the
help of a hash function. A hash function is a mathematical algorithm that
generates a new value for a given input. The result of a hash function is
called a hash, or a hash value.
• In a hash table, the hash value is used as an index: each key is assigned an
index, and the data associated with that key is stored at that index.
• To look up an item, we give its key as input to the hash function. The
function computes the index corresponding to that key in the hash table,
and the value stored at that index is returned. The index is not guaranteed
to be unique: two different keys can produce the same index, which is
called a collision.
Components of Hashing
There are three major components of hashing:
1.Key: A key can be any value, such as a string or an integer, that is fed as input to
the hash function. The hash function uses it to determine the index or location where
an item is stored in a data structure.
2.Hash Function: The hash function receives the input key and returns the index of
an element in an array called a hash table. The index is known as the hash index.
3.Hash Table: A hash table is a data structure that maps keys to values using a
hash function. It stores the data in an associative manner in an array, where each
value is placed at the index computed from its key.
How does Hashing work?
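• Suppose we have a set of strings {“ab”, “cd”, “efg”} and we would
like to store it in a table.
• Hash function: sum(string) mod 7, where sum(string) adds up the numeric
values of the characters in the string. The result is an index from 0 to 6,
so the table needs 7 slots.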
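The example above can be sketched in Python. The slides do not say which numeric value each character has, so this sketch assumes the character code returned by `ord` (the ASCII code for these letters); the function name `string_hash` is made up for illustration.

```python
def string_hash(s, table_size=7):
    """Sum the character codes of s (assumption: ord values) modulo the table size."""
    return sum(ord(ch) for ch in s) % table_size


# A table with 7 slots, indexed 0 through 6.
table = [None] * 7
for word in ["ab", "cd", "efg"]:
    table[string_hash(word)] = word

print(table)
```

Under that assumption, "ab" sums to 195 and lands at index 6, "cd" sums to 199 and lands at index 3, and "efg" sums to 306 and lands at index 5. A different character numbering (for example a = 1, b = 2) would give different indexes.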
Types of hash functions
The primary types of hash functions are:
1.Division Method.
2.Mid Square Method.
3.Folding Method.
4.Multiplication Method.
Division Method
The division method is the simplest and fastest way to compute a hash value. The key k is
divided by M, and the remainder is used as the hash value.
Formula:
• h(k) = k mod M
• (where k = key value and M = the size of the hash table)
Example:
• Suppose the key to be stored is 15 and the hash table has size 10. Then the key is
stored at
• h(15) = 15 % 10
• h(15) = 5
• So, key value 15 will be stored at index 5.
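A minimal Python sketch of the division method, reproducing the example above. The function name `division_hash` is illustrative only.

```python
def division_hash(key, table_size):
    """Division method: the remainder of key divided by the table size."""
    return key % table_size


print(division_hash(15, 10))  # 5
```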
Mid-Square Method
In the mid-square method, the key is squared, and the middle digits of
the result are taken as the hash value.
Steps:
1.Square the key.
2.Extract the middle r digits of the squared value and use them as the
hash value.
Formula:
• h(k) = middle r digits of (k x k)
• (where k = key value and r = the number of digits used for the index)
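The two steps can be sketched as follows. The slides do not specify how to pick the "middle" when the digits cannot be centred exactly, so this sketch assumes the window starts at position (length - r) // 2 of the decimal string; the name `mid_square_hash` is hypothetical.

```python
def mid_square_hash(key, r):
    """Mid-square method: square the key and keep the middle r decimal digits."""
    squared = str(key * key)
    # Assumption: when the window cannot be centred exactly, it leans left.
    start = max((len(squared) - r) // 2, 0)
    return int(squared[start:start + r])


print(mid_square_hash(60, 2))   # 60 * 60 = 3600, middle two digits are 60
print(mid_square_hash(123, 3))  # 123 * 123 = 15129, middle three digits are 512
```

In practice r is chosen so that the extracted value fits the table, for example r = 2 for a table with 100 slots.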
Folding Method
The folding method for constructing hash functions begins by dividing the key
into equal-size pieces (the last piece may be shorter than the others). These
pieces are then added together, and the sum is reduced modulo the table size
to give the resulting hash value.
Algorithm:
• Divide the key into equal-sized pieces; the last piece may be shorter.
• Add the pieces together and take the result modulo the table size:
H(x) = (a + b + c) mod M, where a, b, and c represent the preconditioned key
broken down into three parts, M is the table size, and mod stands
for modulo.
• In other words, the sum of the parts of the preconditioned key is divided by
the table size. The remainder is the hash value.
Example:
• The task is to fold the key 123456789 into a hash table of ten slots (0
through 9).
• The key is X = 123456789 and the table size is M = 10.
• X can be broken into three parts in several ways; here we divide it evenly.
• Therefore, a = 123, b = 456, c = 789.
• Now, H(x) = (a + b + c) mod M, i.e., H(123456789) = (123 + 456 + 789)
mod 10 = 1368 mod 10 = 8.
• Hence, 123456789 is inserted into the table at address 8.
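The worked example can be checked with a short Python sketch. It assumes the key is split left to right into pieces of a fixed number of decimal digits; the function name `folding_hash` and the parameter `piece_len` are illustrative.

```python
def folding_hash(key, piece_len, table_size):
    """Folding method: split the key's digits into pieces, add them, take mod."""
    digits = str(key)
    # The last piece may be shorter than piece_len.
    pieces = [int(digits[i:i + piece_len]) for i in range(0, len(digits), piece_len)]
    return sum(pieces) % table_size


print(folding_hash(123456789, 3, 10))  # (123 + 456 + 789) % 10 = 8
```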
Multiplication Method
• The multiplication method for creating hash functions operates in two
steps. First, we multiply the key k by a constant A in the range 0 < A <
1 and extract the fractional part of kA. Then, we multiply this value by
m, the size of the hash table, and take the floor of the result.
• The hash function is: h(k) = ⌊m (k A mod 1)⌋
• Where "k A mod 1" means the fractional part of k A, that is, k A - ⌊k
A⌋.
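A small Python sketch of the multiplication method follows. The slides do not fix a value for A; this sketch assumes A = (√5 - 1) / 2 ≈ 0.618, a commonly suggested choice, and the name `multiplication_hash` is made up for illustration.

```python
import math

# Assumption: the constant A is not given in the text; (sqrt(5) - 1) / 2 is a common choice.
DEFAULT_A = (math.sqrt(5) - 1) / 2


def multiplication_hash(key, m, a=DEFAULT_A):
    """Multiplication method: floor(m * fractional part of key * a)."""
    fractional = (key * a) % 1.0
    return math.floor(m * fractional)


print(multiplication_hash(123456, 16384))  # 67
```

Because the fractional part is always in the range [0, 1), the result is always a valid index from 0 to m - 1, whatever the value of m.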
Collisions
A hash collision occurs when two different keys map to the same index
in a hash table. This can happen even with a good hash function, because
there are usually far more possible keys than slots, and it becomes more
likely as the hash table fills up.
Causes of Hash Collisions:
• Poor Hash Function: A hash function that does not distribute keys
evenly across the hash table can lead to more collisions.
• High Load Factor: A high load factor (the ratio of the number of stored
keys to the hash table size) increases the probability of collisions.
• Similar Keys: With a weak hash function, keys that are similar in value
or structure are more likely to collide.
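As a small illustration of a collision and of the load factor, the sketch below reuses the division method h(k) = k mod M with M = 10. The keys 15 and 25 are example values chosen here, not taken from the slides.

```python
def division_hash(key, table_size):
    """Division method: the remainder of key divided by the table size."""
    return key % table_size


def load_factor(num_keys, table_size):
    """Ratio of stored keys to the number of slots in the table."""
    return num_keys / table_size


table_size = 10
# Two different keys that map to the same index: a collision.
print(division_hash(15, table_size), division_hash(25, table_size))  # 5 5
# 8 keys stored in 10 slots give a load factor of 0.8.
print(load_factor(8, table_size))  # 0.8
```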
Open Hashing / Separate Chaining / Closed Addressing
• Separate chaining is one of the most commonly used collision resolution
techniques in data structures; it uses a linked list for each slot.
• Two or more elements that hash to the same index are chained
together in a singly linked list known as a chain.
• Every element that hashes to a given position is stored in the linked list
at that position.
• Separate chaining is also known as open hashing or closed addressing. It
resolves collisions by keeping an array of linked lists, one per index.
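A minimal sketch of separate chaining in Python. For brevity each chain is a Python list rather than a hand-written linked list, and the hash function is assumed to be key mod table size; the class name `ChainedHashTable` and its methods are hypothetical.

```python
class ChainedHashTable:
    """Hash table with separate chaining (each bucket holds a chain of pairs)."""

    def __init__(self, size=5):
        self.size = size
        # One chain per index; a Python list stands in for a linked list.
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        # Assumption: integer keys hashed with the division method.
        return key % self.size

    def insert(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # key already present: overwrite
                return
        bucket.append((key, value))  # new key: add to the end of the chain

    def search(self, key):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return None


table = ChainedHashTable(size=5)
for key in [1, 6, 11, 3]:
    table.insert(key, f"value-{key}")

print(table.buckets[1])  # keys 1, 6 and 11 all hash to index 1 and share a chain
print(table.search(6))
```

A lookup only scans the chain at one index, so its cost grows with the chain length rather than with the total number of stored keys.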
Closed hashing (Open addressing)
• Open addressing stores all entries within the array itself, as
opposed to in linked lists.
• The phrase 'open addressing' refers to the notion that the hash value of an
item does not fully determine its location or address.
• To insert a new entry, the hash index of the key is computed first, and the
array is then checked starting at that index. If the slot at the hashed index
is empty, the entry is inserted there; otherwise, a probing sequence is
followed until an empty slot is found.
• In closed hashing, three techniques are used to resolve collisions:
linear probing, quadratic probing, and double hashing.
Linear Probing
• In linear probing, a collision is resolved by checking the next slot, i.e., the
search proceeds sequentially from the position where the collision occurs until an
empty cell is found.
• Equivalently, key k is inserted at the first free location among (u + i) % m
for i = 0 to m-1, where u = h(k) % m.
Example:
A = 3, 2, 9, 6, 11, 13, 7, 12 where m = 10, and h(k) = 2k+3
• The key values 3, 2, 9, 6 are stored at the indexes 9, 7, 1, 5 respectively. The
calculated index of 11 is 5 (25 % 10), which is already occupied by another key value,
i.e., 6. When linear probing is applied, the next empty cell after index 5 is 6;
therefore, the value 11 is added at index 6.
• The next key value is 13. Its calculated index is 9 (29 % 10), which is already
filled. When linear probing is applied, the search wraps around to the start of the
table and the next empty cell is 0; therefore, the value 13 is added at index 0.
• The remaining keys follow the same rule: 7 hashes to index 7, which is occupied, and
is placed at index 8; 12 also hashes to index 7 and is placed at index 2, the first
empty cell found after probing 7, 8, 9, 0, and 1.
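The example can be traced with a short Python sketch of the rule (u + i) % m. The function name `linear_probe_insert` is illustrative; empty slots are represented by `None`.

```python
def linear_probe_insert(table, key, hash_fn):
    """Insert key at the first free slot among (u + i) % m, for i = 0 .. m-1."""
    m = len(table)
    u = hash_fn(key) % m
    for i in range(m):
        index = (u + i) % m
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("hash table is full")


table = [None] * 10
for key in [3, 2, 9, 6, 11, 13, 7, 12]:
    linear_probe_insert(table, key, lambda k: 2 * k + 3)

print(table)  # [13, 9, 12, None, None, 6, 11, 2, 7, 3]
```

The printed list shows the table contents at indexes 0 through 9 after all eight keys from the example have been inserted.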
Quadratic probing
• In linear probing, the search for a free slot proceeds linearly. In
contrast, quadratic probing is an open addressing technique that
uses a quadratic polynomial to compute the probe sequence until an
empty slot is found.
• Equivalently, key k is inserted at the first free location among
(u + i^2) % m for i = 0 to m-1, where u = h(k) % m.
Example:
• A = 3, 2, 9, 6, 11, 13, 7, 12 where m = 10, and h(k) = 2k+3
Ans: the table contents at indexes 0 through 9 are 13, 9, _, 12, _, 6, 11, 2, 7, 3.
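The answer above can be reproduced with a sketch of the rule (u + i^2) % m. The function name `quadratic_probe_insert` is illustrative; empty slots are represented by `None`.

```python
def quadratic_probe_insert(table, key, hash_fn):
    """Insert key at the first free slot among (u + i*i) % m, for i = 0 .. m-1."""
    m = len(table)
    u = hash_fn(key) % m
    for i in range(m):
        index = (u + i * i) % m
        if table[index] is None:
            table[index] = key
            return index
    # Quadratic probing does not visit every slot, so this can happen
    # even when the table still has free cells.
    raise RuntimeError("no free slot found in the probe sequence")


table = [None] * 10
for key in [3, 2, 9, 6, 11, 13, 7, 12]:
    quadratic_probe_insert(table, key, lambda k: 2 * k + 3)

print(table)  # [13, 9, None, 12, None, 6, 11, 2, 7, 3]
```

For key 12 the probe sequence is 7, 8, 1, 6, 3: the first four slots are occupied, so the key is stored at index 3.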
Double Hashing
• The double hashing technique uses two hash functions. The second hash function
comes into use when the first function causes a collision.
• The formula for the double hashing technique is as follows:
(firstHash(key) + i * secondHash(key)) % sizeOfTable
where i is the probe number. It starts at 0 and is incremented until an empty
slot is found.
In the notation used for the other probing methods:
(u + i*v) % m, where u = h1(k) % m and v = h2(k) % m.
The step v must not be 0; otherwise every probe lands on the same slot.
Steps to follow:
1. Check whether the slot at index hash1(key) is empty. If yes, store the value in this slot.
2. If that slot is not empty, find another slot using hash2(key).
3. Check whether the slot at index hash1(key) + hash2(key) is empty. If yes, store the value in this slot.
4. Otherwise, keep incrementing the counter and repeat with hash1(key) + 2 * hash2(key),
hash1(key) + 3 * hash2(key), and so on until an empty slot is found. Every index is
taken modulo the table size.
Example- Double Hashing
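A minimal sketch of double hashing in Python. The original example is not available in the text, so everything concrete here is an assumption chosen for illustration: the table size m = 7, the keys 10, 17, 24, and the two hash functions h1(k) = k mod m and h2(k) = 1 + (k mod (m - 1)), which keeps the step between 1 and m - 1.

```python
def double_hash_insert(table, key, h1, h2):
    """Insert key at the first free slot among (u + i*v) % m, for i = 0 .. m-1."""
    m = len(table)
    u = h1(key) % m
    v = h2(key) % m
    if v == 0:
        raise ValueError("second hash must give a non-zero step")
    for i in range(m):
        index = (u + i * v) % m
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("no free slot found in the probe sequence")


M = 7  # assumed table size (a prime, so every non-zero step visits all slots)


def h1(key):
    return key % M  # assumed first hash function


def h2(key):
    return 1 + (key % (M - 1))  # assumed second hash function, never 0


table = [None] * M
for key in [10, 17, 24]:
    double_hash_insert(table, key, h1, h2)

print(table)  # [None, None, 17, 10, 24, None, None]
```

All three keys hash to index 3 under h1. Key 10 takes that slot; key 17 has step 6 and moves to (3 + 6) % 7 = 2; key 24 has step 1 and moves to index 4. Keys that collide under the first hash therefore follow different probe sequences.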
Applications of Hash Functions
• Hash Tables: The most common use of hash functions in DSA is in hash tables, which provide an
efficient way to store and retrieve data.
• Data Structures: Hash functions are utilized in various data structures such as Bloom filters and hash
sets.
• Cryptography: Cryptographic applications rely on secure hash algorithms
such as SHA-256.
• Password Storage: Hash functions are commonly used to securely store passwords. Instead of storing the
actual passwords, the system stores their hash values. When a user enters a password, it is hashed and
compared with the stored hash value for authentication.
• Data Integrity: Hashing is used to ensure data integrity by generating hash values (checksums) for
files or messages. By comparing the hash values before and after transmission or storage, it is possible
to detect whether any changes or tampering occurred.
• Data Retrieval: Hashing is used in data structures like hash tables, which provide efficient data retrieval
based on key-value pairs. The hash value serves as an index to store and retrieve data quickly.
• Digital Signatures: Hash functions are an integral part of digital signatures. They are used to generate a
fixed-size hash value for a message, which is then signed with the signer's private key. This allows for
verification of the authenticity and integrity of the message using the signer's public key.
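The data integrity use case above can be illustrated with Python's standard `hashlib` module and SHA-256. The message contents and the helper name `sha256_hex` are made up for this sketch.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


original = b"transfer 100 to alice"
expected = sha256_hex(original)  # digest recorded before transmission

received_ok = b"transfer 100 to alice"
received_changed = b"transfer 900 to alice"

print(sha256_hex(received_ok) == expected)       # True: data is unchanged
print(sha256_hex(received_changed) == expected)  # False: data was modified
```

A plain digest comparison like this detects changes only if the recorded digest itself can be trusted, which is why digital signatures sign the hash value.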