
Hashing

Hashing is an efficient searching technique in data structures that allows for constant time complexity O(1) for searches, unlike traditional methods like linear and binary search which depend on the number of elements. It utilizes a hash table and a hash function to map keys to indices, with various methods for generating hash values and handling collisions, such as separate chaining and open addressing. The document outlines different hashing methods, their advantages and disadvantages, and provides examples of collision resolution techniques.

Uploaded by

milankv2703
Copyright
© All Rights Reserved

Hashing

• In data structures,
1. There are several searching techniques, such as linear search, binary search and search trees.
2. In these techniques, the time taken to search for a particular element depends on the total number of elements.
a) Linear search takes O(n) time on an unsorted array of n elements.
b) Binary search takes O(log n) time on a sorted array of n elements.
c) Searching a balanced binary search tree of n elements takes O(log n) time.
• Drawback-
1. As the number of elements increases, the time taken to perform the search also increases.
2. This becomes problematic when the total number of elements becomes very large.
Hashing in Data Structure
• Hashing is a well-known technique for searching for a particular element among several elements.
• It minimizes the number of comparisons made during the search.
Advantage
• Unlike the other searching techniques, hashing is extremely efficient.
• On average, the time it takes to perform a search does not depend on the total number of elements.
• With a good hash function, it completes the search in constant time O(1) on average.
Hashing Mechanism
In hashing,
• An array data structure called a hash table is used to store the data items.
• Data items are inserted into the hash table based on their hash key value.
Hash Key Value
• The hash key value is a special value that serves as an index for a data item.
• It indicates where the data item should be stored in the hash table.
• The hash key value is generated using a hash function.
Hash Function
• A hash function is a function that maps a large number or a string to a small integer value.
• It takes the data item as input and returns a small integer value as output.
• This small integer value is called the hash value.
• The hash value of the data item is then used as the index at which it is stored in the hash table.
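As a rough illustration of this mechanism, the sketch below maps strings to indices of a small table. The function name `toy_hash`, the character-sum scheme and the table size of 10 are assumptions made for this example only; they are not taken from the slides.

```python
def toy_hash(item, table_size):
    """Map a string to an index in [0, table_size) by summing character codes.

    This is an illustrative scheme only, not a recommended hash function.
    """
    total = sum(ord(ch) for ch in item)
    return total % table_size


# Use the hash value as the index at which each item is stored.
table = [None] * 10
for word in ["cat", "dog"]:
    table[toy_hash(word, len(table))] = word

print(toy_hash("cat", 10), toy_hash("dog", 10))  # 2 4
```

Looking up "cat" later only requires recomputing `toy_hash("cat", 10)` and inspecting that one slot, which is where the constant-time behaviour comes from.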
Types of Hash Functions
There are various types of hash functions available, such as
1. Mid Square Hash Function
2. Division Hash Function
3. Folding Hash Function, etc.
The choice of hash function depends on the application.
Properties of Hash Function
The properties of a good hash function are-
• It is efficiently computable.
• It minimizes the number of collisions.
• It distributes the keys uniformly over the table.
Division Method
• The easiest and quickest way to create a hash value is through division. The key value k is divided by M, and the remainder is used as the hash value.
• Formula:
h(k) = k mod M
(where k = key value and M = the size of the hash table)
Advantages:
• This method works for any value of M, although some values distribute keys better than others.
• It requires only one division operation, so it is quite fast.
Disadvantages:
• Since the hash function maps consecutive keys to consecutive hash values, keys that follow a pattern can cluster and cause poor performance.
• M must be chosen with care; a prime number that is not close to a power of two is a common choice.
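The division method can be sketched in a couple of lines. The function name `division_hash` and the sample keys are illustrative assumptions; only the formula h(k) = k mod M comes from the text.

```python
def division_hash(k, m):
    """Division method: the remainder of k divided by the table size m."""
    return k % m


print([division_hash(k, 7) for k in (50, 700, 76)])   # [1, 0, 6]

# Consecutive keys land in consecutive buckets, which is the
# behaviour the disadvantage above refers to.
print([division_hash(k, 10) for k in range(20, 25)])  # [0, 1, 2, 3, 4]
```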
Mid Square Method
The hash value is calculated in two steps:
• Square the key value k, i.e. compute k × k.
• Take the middle r digits of the result as the hash value.
Formula:
h(k) = middle r digits of (k × k)
(where k = key value)
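A minimal sketch of the mid square method follows. The name `mid_square_hash` is hypothetical, and the rule used when the middle cannot be centred exactly (the window is taken slightly to the left) is an assumption of this example; the slides do not specify it.

```python
def mid_square_hash(k, r):
    """Square k and return the middle r digits of the square as an integer."""
    squared = str(k * k)
    # Assumption: if the digits cannot be centred exactly, lean to the left.
    start = max((len(squared) - r) // 2, 0)
    return int(squared[start:start + r])


print(mid_square_hash(60, 2))  # 60 * 60 = 3600 -> middle digits "60" -> 60
print(mid_square_hash(93, 2))  # 93 * 93 = 8649 -> middle digits "64" -> 64
```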
Folding Method
The process involves two steps:
• Divide the key value k into a predetermined number of parts k1, k2, k3, ..., kn, each with the same number of digits, except for the last part, which may have fewer digits than the others.
• Add the parts together. The hash value is the sum, ignoring the final carry, if any.
• Formula:
k = k1, k2, k3, k4, ….., kn
s = k1 + k2 + k3 + k4 + …. + kn
h(k) = s
The addition is done in one of two ways:
Shift Folding: All parts except the last are shifted so that their least significant digits line up with each other.
For example, the hash function could form groups of three digits from the key 12320324111220, giving 123, 203, 241, 112 and 20. To determine the index to which the key is mapped, the hash function "shifts" each group of digits under the other and adds them: 123 + 203 + 241 + 112 + 20 = 699.
Folding on the boundaries: The key is folded at the part boundaries, and the digits that fall together are added.
The digits of the search key are grouped as in shift folding, but the middle groups are folded on the boundary between the first group and the middle group, so their digits are reversed.
Advantages:
1. Creates a simple hash value by splitting the key value into equal-sized segments.
2. Does not depend on the distribution of keys in the hash table.
Disadvantages:
1. When there are too many collisions, efficiency can suffer.
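Both folding variants can be sketched as below for the key 12320324111220 with groups of three digits. The function names are hypothetical. For folding on the boundaries, this sketch assumes that every second group (the 2nd, 4th, ...) has its digits reversed; the slides describe the reversal only loosely, so treat that rule as one possible reading.

```python
def split_groups(key, size):
    """Split the decimal digits of key into groups of `size` digits.

    The last group may be shorter than the others.
    """
    digits = str(key)
    return [digits[i:i + size] for i in range(0, len(digits), size)]


def shift_fold(key, size):
    """Shift folding: add the groups as they are."""
    return sum(int(group) for group in split_groups(key, size))


def boundary_fold(key, size):
    """Folding on the boundaries: reverse every second group, then add."""
    total = 0
    for position, group in enumerate(split_groups(key, size)):
        if position % 2 == 1:      # assumption: 2nd, 4th, ... groups are reversed
            group = group[::-1]
        total += int(group)
    return total


print(split_groups(12320324111220, 3))   # ['123', '203', '241', '112', '20']
print(shift_fold(12320324111220, 3))     # 123 + 203 + 241 + 112 + 20 = 699
print(boundary_fold(12320324111220, 3))  # 123 + 302 + 241 + 211 + 20 = 897
```

In practice the sum would still be reduced to a valid index, for example by dropping the final carry or taking it modulo the table size.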
Multiplication Method
1. Choose a constant value A, where 0 < A < 1.
2. Multiply the key value k by A.
3. Take the fractional part of kA.
4. Multiply the result of the preceding step by M, the hash table's size, and take the floor.
Formula:
h(k) = floor (M (kA mod 1))
(Where, M = size of the hash table, k = key value and A = constant value)
Example:
k = 5678
A = 0.6829
M = 200
h(5678) = floor[200(5678 x 0.6829 mod 1)]
h(5678) = floor[200(3877.5062 mod 1)]
h(5678) = floor[200(0.5062)]
h(5678) = floor[101.24]
h(5678) = 101
So, with these values, h(5678) is 101.
Advantages:
• Any value of A between 0 and 1 can be used, although some values yield better results than others.
Disadvantages:
• The method is best suited to table sizes that are a power of two, because the index can then be computed from the key very quickly; for other table sizes it loses this speed advantage.
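The formula h(k) = floor(M (kA mod 1)) can be checked with a short sketch. The function name `multiplication_hash` is an assumption of this example, and floating-point arithmetic is used for simplicity, which is adequate for small illustrative keys.

```python
import math


def multiplication_hash(k, a, m):
    """Multiplication method: floor(m * fractional part of k * a)."""
    fractional = (k * a) % 1.0   # keep only the fractional part of k * a
    return math.floor(m * fractional)


# 5678 * 0.6829 = 3877.5062, fractional part 0.5062, 200 * 0.5062 = 101.24
print(multiplication_hash(5678, 0.6829, 200))  # 101
```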
Collision in Hashing
In hashing,
1. A hash function is used to compute the hash value for a key.
2. The hash value is then used as an index to store the key in the hash table.
3. The hash function may return the same hash value for two or more keys.
When the hash value of a key maps to an already occupied bucket of the hash table, it is called a collision.
Collision Resolution Techniques
Collision Resolution Techniques are the techniques
used for resolving or handling the collision.
Separate Chaining
To handle a collision,
1. This technique creates a linked list at the slot where the collision occurs.
2. The new key is then inserted into the linked list.
3. The linked lists attached to the slots look like chains.
4. That is why this technique is called separate chaining.
Time Complexity
• For Searching
1. In the worst case, all the keys might map to the same bucket of the hash table.
2. In that case, all the keys are stored in a single linked list.
3. A sequential search must then be performed on the linked list.
4. So, the time taken for searching in the worst case is O(n).
• For Deletion
1. To delete a key, it must first be searched for and then removed.
2. In the worst case, the time taken for searching is O(n).
3. So, the time taken for deletion in the worst case is O(n).
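Load Factor (α)
Load factor (α) = number of keys stored in the hash table / number of buckets in the hash table
If the load factor (α) is constant, then the expected time complexity of Insert, Search and Delete = Θ(1)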
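As a small illustration, the load factor can be computed as below, assuming the usual definition α = (number of keys stored) / (number of buckets). The bucket contents are a hypothetical chained table chosen for this example.

```python
def load_factor(num_keys, num_buckets):
    """Load factor alpha = keys stored / buckets available."""
    return num_keys / num_buckets


# Hypothetical chained table: 7 buckets holding 7 keys in total.
buckets = [[700], [50, 85, 92], [], [73, 101], [], [], [76]]
num_keys = sum(len(chain) for chain in buckets)

alpha = load_factor(num_keys, len(buckets))
print(alpha)  # 1.0
```

With separate chaining, α is also the average chain length, which is why keeping it bounded by a constant keeps the expected cost of each operation constant.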
Problem
• Using the hash function 'key mod 7', insert the following sequence of keys into the hash table:
50, 700, 76, 85, 92, 73 and 101
Use the separate chaining technique for collision resolution.
• For the given hash function, the possible range of hash values is [0, 6].
• So, start with an empty hash table consisting of 7 buckets, numbered 0 to 6.
• Key 50 maps to bucket 50 mod 7 = 1, so it is inserted in bucket-1.
• Key 700 maps to bucket 700 mod 7 = 0, so it is inserted in bucket-0.
• Key 76 maps to bucket 76 mod 7 = 6, so it is inserted in bucket-6.
• The next key to be inserted in the hash table = 85.
• Bucket of the hash table to which key 85 maps = 85 mod 7 = 1.
• Since bucket-1 is already occupied, a collision occurs.
• Separate chaining handles the collision by creating a linked list at bucket-1.
• So, key 85 is inserted in the linked list of bucket-1.
• The next key to be inserted in the hash table = 92.
• Bucket of the hash table to which key 92 maps = 92 mod 7 = 1.
• Since bucket-1 is already occupied, a collision occurs.
• Separate chaining handles the collision by adding to the linked list at bucket-1.
• So, key 92 is inserted in the linked list of bucket-1.
• The next key to be inserted in the hash table = 73.
• Bucket of the hash table to which key 73 maps = 73 mod 7 = 3.
• So, key 73 is inserted in bucket-3 of the hash table.
• The next key to be inserted in the hash table = 101.
• Bucket of the hash table to which key 101 maps = 101 mod 7 = 3.
• Since bucket-3 is already occupied, a collision occurs.
• Separate chaining handles the collision by creating a linked list at bucket-3.
• So, key 101 is inserted in the linked list of bucket-3.
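The problem above can be reproduced with a minimal separate-chaining sketch. The class name `ChainedHashTable` and its methods are hypothetical, and Python lists stand in for the linked lists; new keys are appended at the end of a chain, which is an assumption, since a chain may equally well grow at the front.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by separate chaining."""

    def __init__(self, num_buckets):
        # One chain per bucket; a Python list stands in for a linked list.
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        return key % len(self.buckets)   # hash function: key mod table size

    def insert(self, key):
        self.buckets[self._index(key)].append(key)

    def search(self, key):
        # Sequential search, but only within the chain of one bucket.
        return key in self.buckets[self._index(key)]

    def delete(self, key):
        chain = self.buckets[self._index(key)]
        if key in chain:
            chain.remove(key)
            return True
        return False


table = ChainedHashTable(7)
for key in (50, 700, 76, 85, 92, 73, 101):
    table.insert(key)

for bucket_number, chain in enumerate(table.buckets):
    print(bucket_number, chain)
# 0 [700]
# 1 [50, 85, 92]
# 2 []
# 3 [73, 101]
# 4 []
# 5 []
# 6 [76]
```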
Another Example
Keys: 0, 1, 4, 9, 16, 25, 36, 49, 64, 81
hash(key) = key % 10

Bucket  Chain
0       0
1       81 → 1
2
3
4       64 → 4
5       25
6       36 → 16
7
8
9       49 → 9
Open Addressing
In open addressing,
1. Unlike separate chaining, all the keys are stored inside the hash table.
2. No key is stored outside the hash table.
• Techniques used for open addressing are
1. Linear Probing
2. Quadratic Probing
3. Double Hashing
Linear Probing
In linear probing,
1. When a collision occurs, we linearly probe for the next bucket.
2. We keep probing until an empty bucket is found.
Advantage-
1. It is easy to compute.
Disadvantage-
1. The main problem with linear probing is clustering.
2. Many consecutive occupied buckets form groups, which lengthens later probe sequences.
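Linear probing can be sketched as follows. The function name `linear_probe_insert`, the use of `None` for an empty bucket and the hash function key mod 7 with the sample keys are assumptions made for this illustration.

```python
def linear_probe_insert(table, key):
    """Insert key using linear probing and return the bucket it lands in."""
    size = len(table)
    home = key % size                 # assumed hash function: key mod table size
    for i in range(size):
        index = (home + i) % size     # try home, home + 1, home + 2, ...
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("hash table is full")


table = [None] * 7
slots = [linear_probe_insert(table, k) for k in (50, 700, 76, 85, 92, 73, 101)]

print(slots)  # [1, 0, 6, 2, 3, 4, 5]
print(table)  # [700, 50, 85, 92, 73, 101, 76]
```

Keys 85 and 92 spill out of bucket 1 into buckets 2 and 3, which then pushes 73 and 101 further along; this run of occupied buckets is the clustering mentioned above.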
Quadratic Probing
In quadratic probing,
1. When a collision occurs, we probe the bucket at offset i² from the original position in the i-th iteration, i.e. bucket (hash(x) + i²) mod TableSize.
2. We keep probing until an empty bucket is found.
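A minimal sketch of quadratic probing follows, assuming the probe sequence (hash(x) + i²) mod TableSize with hash(x) = x mod TableSize. The function name and the sample keys are illustrative.

```python
def quadratic_probe_insert(table, key):
    """Insert key using quadratic probing and return the bucket it lands in."""
    size = len(table)
    home = key % size                    # assumed hash function: key mod table size
    for i in range(size):
        index = (home + i * i) % size    # try home, home + 1, home + 4, home + 9, ...
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("no empty bucket found along the probe sequence")


table = [None] * 7
slots = [quadratic_probe_insert(table, k) for k in (50, 700, 76, 85, 92, 73, 101)]

print(slots)  # [1, 0, 6, 2, 5, 3, 4]
```

Unlike linear probing, the probe sequence does not necessarily visit every bucket, so an insertion can fail even when the table still has empty buckets; the sketch raises an error in that case.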
Double Hashing
In double hashing,
1. We use another hash function hash2(x) and look for the bucket at offset i * hash2(x) in the i-th iteration.
2. It requires more computation time, as two hash functions need to be computed.
• A second hash function is used to drive the collision resolution.
• f(i) = i * hash2(x)
• We apply a second hash function to x and probe at distances hash2(x), 2*hash2(x), … and so on.
• The function hash2(x) must never evaluate to zero.
• e.g. Let hash2(x) = x mod 9 and try to insert 99 in the previous example: hash2(99) = 0, so every probe lands on the same bucket.
• A function such as hash2(x) = R – (x mod R), with R a prime smaller than TableSize, will work well.
• e.g. try R = 7 for the previous example: hash2(x) = 7 – (x mod 7).
Example
H1(k) = k mod 11
H2(k) = 8 – (k mod 8)
Collision: probe (H1(k) + i * H2(k)) mod 11 for i = 1, 2, 3, …
Keys: 20, 34, 45, 70, 56
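This example can be worked through with a short sketch. It assumes the probe sequence (H1(k) + i * H2(k)) mod 11 for i = 0, 1, 2, ..., a table of 11 buckets, and `None` as the marker for an empty bucket; the function names are hypothetical.

```python
def h1(k):
    return k % 11


def h2(k):
    return 8 - (k % 8)    # always between 1 and 8, so never zero


def double_hash_insert(table, key):
    """Insert key using double hashing and return the bucket it lands in."""
    size = len(table)
    for i in range(size):
        index = (h1(key) + i * h2(key)) % size
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("no empty bucket found along the probe sequence")


table = [None] * 11
slots = {key: double_hash_insert(table, key) for key in (20, 34, 45, 70, 56)}

print(slots)  # {20: 9, 34: 1, 45: 4, 70: 6, 56: 3}
```

Keys 20 and 34 go straight to their home buckets 9 and 1. Key 45 collides at bucket 1 and moves by H2(45) = 3 to bucket 4. Key 70 collides at bucket 4 and moves by H2(70) = 2 to bucket 6. Key 56 collides at bucket 1, then at 9 and 6 (step H2(56) = 8), and finally lands in bucket 3.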
