
Hashing

Hashing is a technique for mapping large datasets to tabular indexes using a hash function, allowing for constant time operations (O(1)) for lookups, updates, and retrievals. It involves using hash tables that store elements in key-value pairs, and addresses hash collisions through methods like chaining and open addressing. Hashing has various applications in databases, cryptography, caching, spell checking, and network routing, offering advantages such as fast access and efficient search, but also has limitations like hash collisions and the quality of hash functions.

Uploaded by

fewibi7074

Hashing

Introduction
Hashing

• Hashing is a technique for mapping a large set of arbitrary data to indexes in a table using a hash function. It is a method for implementing dictionaries over large datasets.

• It allows lookup, update, and retrieval operations to run in constant time on average, i.e. O(1).


Why Hashing is Needed?

• After storing a large amount of data, we need to perform various operations on it, and lookups are among the most frequent.

• Linear search and binary search perform lookups with time complexity O(n) and O(log n) respectively. As the dataset grows, these costs grow with it, which can become unacceptable.

• So we need a technique whose lookup cost does not depend on the size of the data. Hashing allows lookups to run in constant time on average, i.e. O(1).


Hash Function
• A hash function maps each element of a dataset to an index in the table.
Hash Table
• The hash table data structure stores elements as key-value pairs, where:
– Key: a unique integer that is used for indexing the values.
– Value: the data associated with a key.
Hashing

• In a hash table, an index is computed from the key, and the element corresponding to that key is stored at that index. This process is called hashing.
• Let k be a key and h(x) be a hash function.
• Here, h(k) gives the index at which to store the element linked with k.
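
As a minimal sketch of this idea, assume a hypothetical hash function h(k) = k mod m and a table of m = 7 slots. The names h, put, and get are illustrative only, and collisions are deliberately not handled here.

```python
def h(k, m=7):
    """Hypothetical hash function: map integer key k to an index in [0, m)."""
    return k % m


def put(table, key, value):
    """Store (key, value) at index h(key). Collisions are NOT handled."""
    table[h(key, len(table))] = (key, value)


def get(table, key):
    """Return the value stored for key, or None if the slot holds no such key."""
    entry = table[h(key, len(table))]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None


table = [None] * 7
put(table, 50, "fifty")           # h(50) = 1
put(table, 700, "seven hundred")  # h(700) = 0
print(get(table, 50))             # fifty
print(get(table, 76))             # None (never inserted)
```

Both put and get touch exactly one slot, which is where the constant-time behaviour comes from.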
Hash Collision
• When the hash function generates the same index for multiple keys, there is a conflict over which value should be stored at that index. This is called a hash collision.

• We can resolve a hash collision using one of the following techniques:
– Collision resolution by chaining
– Open addressing: linear probing, quadratic probing, and double hashing

Collision resolution by chaining

• In chaining, if a hash function produces the same index for multiple elements, these elements are stored at the same index by using a doubly-linked list.
• If j is the slot for multiple elements, it contains a pointer to the head of the list of elements. If no element is present, j contains NIL.
Example
• Let us consider a simple hash function "key mod 7" and the sequence of keys 50, 700, 76, 85, 92, 73, 101. The keys hash to indexes 1, 0, 6, 1, 1, 3, 3 respectively, so slot 1 holds the chain 50, 85, 92 and slot 3 holds the chain 73, 101.
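
The example above can be sketched in a few lines. This is an illustration, not a reference implementation: it uses a Python list for each chain in place of a doubly-linked list, and the function names are hypothetical.

```python
def chained_insert(table, key):
    """Append key to the chain stored at index key mod m."""
    table[key % len(table)].append(key)


def chained_search(table, key):
    """Scan only the chain at index key mod m."""
    return key in table[key % len(table)]


# One empty chain per slot; an empty list plays the role of NIL.
table = [[] for _ in range(7)]
for key in [50, 700, 76, 85, 92, 73, 101]:
    chained_insert(table, key)

print(table)
# [[700], [50, 85, 92], [], [73, 101], [], [], [76]]
print(chained_search(table, 92))  # True
print(chained_search(table, 8))   # False
```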

Advantages:

• Simple to implement.

• The hash table never fills up; we can always add more elements to a chain.

• Less sensitive to the hash function or load factor.

• It is mostly used when it is unknown how many keys will be inserted or deleted, and how frequently.
Disadvantages:

• The cache performance of chaining is poor because keys are stored in linked lists. Open addressing provides better cache performance because everything is stored in the same table.

• Wasted space: some slots of the hash table may never be used.

• If a chain becomes long, search time can become O(n) in the worst case.

• Uses extra space for links.


Open Addressing
• Unlike chaining, open addressing doesn't store multiple elements in the same slot. Here, each slot is either filled with a single key or left NIL.

• Different techniques used in open addressing are:
i. Linear Probing
ii. Quadratic Probing
iii. Double Hashing


Linear Probing
• In linear probing, a collision is resolved by checking the next slot.

• h(k, i) = (h′(k) + i) mod m
where i = 0, 1, …, m − 1, h′(k) is an auxiliary hash function, and m is the size of the table.

• If a collision occurs at h(k, 0), then h(k, 1) is checked. In this way, the value of i is incremented linearly.

• The problem with linear probing is that clusters of adjacent filled slots build up.

• When inserting a new element that hashes into a cluster, the rest of the cluster must be traversed to find a free slot.

• This adds to the time required to perform operations on the hash table.
Example
• Let us consider a simple hash function "key mod 7" and the sequence of keys 50, 700, 76, 85, 92, 73, 101.
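
A small sketch of linear probing for this example follows, assuming h′(k) = k mod 7 and a table of 7 slots. The function name and the use of None for an empty slot are illustrative choices.

```python
def linear_probe_insert(table, key):
    """Insert key using h(k, i) = (h'(k) + i) mod m with h'(k) = k mod m."""
    m = len(table)
    for i in range(m):
        slot = (key % m + i) % m
        if table[slot] is None:      # empty slot found
            table[slot] = key
            return slot
    raise OverflowError("hash table is full")


table = [None] * 7
for key in [50, 700, 76, 85, 92, 73, 101]:
    linear_probe_insert(table, key)

print(table)
# [700, 50, 85, 92, 73, 101, 76]
```

Keys 85 and 92 both hash to slot 1 and are pushed to slots 2 and 3, which in turn displaces 73 and 101 from slot 3. This is the clustering effect described above.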
Quadratic Probing

• It works like linear probing, but the spacing between probed slots grows with each attempt, according to the following relation:

• h(k, i) = (h′(k) + c1·i + c2·i²) mod m

where c1 and c2 are auxiliary constants (c2 ≠ 0) and i = 0, 1, …, m − 1.
Example
• Let us consider a table of size 7, the hash function Hash(x) = x % 7, and the collision resolution strategy f(i) = i². Insert 22, 30, and 50.

• Step 1: Create a table of size 7.

• Step 2: Insert 22 and 30.
– Hash(22) = 22 % 7 = 1. Since the cell at index 1 is empty, we can insert 22 at slot 1.
– Hash(30) = 30 % 7 = 2. Since the cell at index 2 is empty, we can insert 30 at slot 2.

• Step 3: Insert 50.
– Hash(50) = 50 % 7 = 1.
– Slot 1 is already occupied, so we probe slot (1 + 1²) % 7 = 2.
– Slot 2 is also occupied, so we probe slot (1 + 2²) % 7 = 5.
– Slot 5 is not occupied, so we place 50 in slot 5.
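
The steps above can be reproduced with a short sketch, assuming the probe function (x % 7 + i²) % 7. The function name is hypothetical. Note that quadratic probing may fail to find a free slot even when one exists, so the sketch gives up after m probes.

```python
def quadratic_probe_insert(table, key):
    """Insert key using (Hash(key) + i*i) mod m, with Hash(x) = x mod m."""
    m = len(table)
    for i in range(m):
        slot = (key % m + i * i) % m
        if table[slot] is None:      # empty slot found
            table[slot] = key
            return slot
    raise OverflowError("no free slot found in m probes")


table = [None] * 7
for key in [22, 30, 50]:
    quadratic_probe_insert(table, key)

print(table)
# [None, 22, 30, None, None, 50, None]
```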


Double hashing
• If a collision occurs after applying a hash function h1(k), then a second hash function h2(k) is used to compute the step to the next slot.

• h(k, i) = (h1(k) + i·h2(k)) mod m

• Double hashing can be computed as:
(hash1(key) + i * hash2(key)) % TABLE_SIZE
Here hash1() and hash2() are hash functions and TABLE_SIZE is the size of the hash table. We increase i and repeat whenever a collision occurs.
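
As an illustration, here is a sketch with TABLE_SIZE = 7. The slides do not specify the two hash functions, so hash1(key) = key % 7 and hash2(key) = 5 − (key % 5) are assumptions chosen for the example. The second function must never return 0, otherwise the probe would never move away from the first slot.

```python
def hash1(key, m):
    """Assumed primary hash: key mod table size."""
    return key % m


def hash2(key):
    """Assumed secondary hash: result is in 1..5, so it is never 0."""
    return 5 - (key % 5)


def double_hash_insert(table, key):
    """Probe (hash1(key) + i * hash2(key)) % TABLE_SIZE for i = 0, 1, ..."""
    m = len(table)
    for i in range(m):
        slot = (hash1(key, m) + i * hash2(key)) % m
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("no free slot found in m probes")


TABLE_SIZE = 7
table = [None] * TABLE_SIZE
for key in [50, 85, 92]:             # all three have hash1 = 1
    double_hash_insert(table, key)

print(table)
# [None, 50, None, None, 92, None, 85]
```

Although all three keys collide at slot 1, their different hash2 values (5 for 85, 3 for 92) send them to different slots, so no cluster forms next to slot 1.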
Good Hash Functions

A good hash function may not prevent collisions completely; however, it can reduce the number of collisions.

Here, we will look at different methods for constructing a good hash function:

• Division Method

• Multiplication Method

• Mid Square Method

• Digit Folding Method


Division Method

• This is the simplest method to generate a hash value. The hash function divides the key k by M and uses the remainder.

• Formula:
h(k) = k mod M
Here, k is the key value, and M is the size of the hash table.

• Example:
k = 1276
M = 11
h(1276) = 1276 mod 11
= 0
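
A one-line sketch of the division method, using the values from the example (the function name is illustrative):

```python
def division_hash(k, m):
    """Division method: h(k) = k mod M."""
    return k % m


print(division_hash(1276, 11))                             # 0
# Consecutive keys land in consecutive slots:
print([division_hash(k, 11) for k in range(1276, 1280)])   # [0, 1, 2, 3]
```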
• Pros:
– This method works for any value of M, although some choices (such as a prime not close to a power of 2) distribute keys better.
– The division method is very fast since it requires only a single division operation.
• Cons:
– This method can perform poorly because consecutive keys map to consecutive hash values in the hash table.
– Extra care should be taken when choosing the value of M.
Multiplication Method
• This method involves the following steps:

1. Choose a constant value A such that 0 < A < 1.

2. Multiply the key value k by A.

3. Extract the fractional part of kA.

4. Multiply the result of the above step by the size of the hash table, i.e. M.

5. The resulting hash value is obtained by taking the floor of the result obtained in step 4.

• Formula:
h(k) = floor(M (kA mod 1))
Here, M is the size of the hash table, k is the key value, and A is a constant value.

• Example:
k = 12345
A = 0.357840
M = 100
h(12345) = floor[100 (12345 × 0.357840 mod 1)]
= floor[100 (4417.5348 mod 1)]
= floor[100 (0.5348)]
= floor[53.48]
= 53
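
The five steps can be sketched as follows, using floating-point arithmetic and the values from the example. The function name is illustrative; a production version would more likely use integer arithmetic to avoid rounding issues.

```python
import math


def multiplication_hash(k, a, m):
    """Multiplication method: h(k) = floor(M * (k*A mod 1)), with 0 < A < 1."""
    fractional = (k * a) % 1.0       # steps 2-3: fractional part of k*A
    return math.floor(m * fractional)  # steps 4-5: scale by M, take the floor


print(multiplication_hash(12345, 0.357840, 100))  # 53
```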
• Pros:

– It can work with any value of A between 0 and 1, although some values tend to give better results than others.

• Cons:

– It is generally best suited to table sizes that are a power of two; only then is computing the index by multiplication hashing very fast.
Mid Square Method

• The mid-square method is a very good hashing method.

• It involves two steps to compute the hash value:

1. Square the value of the key k, i.e. compute k².

2. Extract the middle r digits as the hash value.

• Formula:
h(k) = middle r digits of (k × k)
Here, k is the key value.

• The value of r can be decided based on the size of the table.

Example:
• Suppose the hash table has 100 memory locations. Then r = 2, because two digits are required to map the key to a memory location.
• k = 60
k × k = 60 × 60
= 3600
h(60) = 60 (the middle two digits of 3600)
• The hash value obtained is 60.
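
A possible sketch of the mid-square method is shown below. The slides do not say how to pick the "middle" when the squared value has an odd number of digits, so the convention used here (start at the integer midpoint, rounding left) is an assumption, as is the function name.

```python
def mid_square_hash(k, r):
    """Mid-square method: square k and return its middle r digits."""
    squared = str(k * k).zfill(r)        # pad so at least r digits exist
    start = (len(squared) - r) // 2      # assumed convention for the middle
    return int(squared[start:start + r])


print(mid_square_hash(60, 2))  # 60, the middle two digits of 3600
```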
• Pros:
– The performance of this method is good because most or all digits of the key contribute to the middle digits of the squared result.

– The result is not dominated by the distribution of the top or bottom digits of the original key value.

• Cons:
– The size of the key is a limitation: if the key is large, its square has about twice as many digits.

– Collisions can still occur, although we can try to reduce them.
Digit Folding Method

• This method involves two steps:

1. Divide the key value k into a number of parts k1, k2, k3, …, kn, where each part has the same number of digits except for the last part, which can have fewer digits than the others.

2. Add the individual parts. The hash value is obtained by ignoring the last carry, if any.

• Formula:
k = k1, k2, k3, k4, …, kn
s = k1 + k2 + k3 + k4 + … + kn
h(k) = s
Here, s is obtained by adding the parts of the key k.

• Example:
k = 12345
k1 = 12, k2 = 34, k3 = 5
s = k1 + k2 + k3
= 12 + 34 + 5
= 51
h(k) = 51
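
The two steps can be sketched as below, assuming parts of two digits each. "Ignoring the last carry" is interpreted here as keeping only the lowest part_size digits of the sum; that interpretation and the function name are assumptions.

```python
def digit_folding_hash(k, part_size=2):
    """Digit folding: split k into part_size-digit parts, add them, drop the carry."""
    digits = str(k)
    parts = [int(digits[i:i + part_size])
             for i in range(0, len(digits), part_size)]
    # Keeping only the lowest part_size digits discards any final carry.
    return sum(parts) % (10 ** part_size)


print(digit_folding_hash(12345))   # 51  (12 + 34 + 5)
print(digit_folding_hash(999999))  # 97  (99 + 99 + 99 = 297, carry 2 dropped)
```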
Applications of Hashing

• Hashing has many applications in computer science, including:

• Databases: Hashing is used to index and search large databases efficiently.

• Cryptography: Hash functions are used to generate message digests, which are used to verify the integrity of data and protect against tampering.

• Caching: Hash tables are used in caching systems to store frequently accessed data and improve performance.

• Spell checking: Hashing is used in spell checkers to quickly search for words in a dictionary.

• Network routing: Hashing is used in load balancing and routing algorithms to distribute network traffic across multiple servers.
Advantages of Hashing
• Fast Access: Hashing provides constant-time access to data on average, making key lookups faster than searching a linked list or an unsorted array.

• Efficient Search: Hashing allows for quick search operations, making it well suited to applications that require frequent searches.

• Space-Efficient: Hashing can be more space-efficient than other data structures, as a fixed-size hash table requires only a fixed amount of memory.
Limitations of Hashing:
• Hash Collisions: Hashing can produce the same hash value for different keys, leading to hash collisions. To handle collisions, we need to use collision resolution techniques like chaining or open addressing.

• Hash Function Quality: The quality of the hash function determines the efficiency of the hashing algorithm. A poor-quality hash function can lead to more collisions, reducing the performance of the hashing algorithm.
