Hashing

Hashing is a technique for mapping large datasets to table indexes using a hash function, allowing lookups, updates, and retrievals in constant time, O(1), on average. It uses hash tables that store elements as key-value pairs, and resolves hash collisions through methods like chaining and open addressing. Hashing has applications in databases, cryptography, caching, spell checking, and network routing. It offers fast access and efficient search, but is limited by hash collisions and by the quality of the hash function.

Uploaded by

fewibi7074

Hashing

Introduction
Hashing

• Hashing is a technique for mapping a large set of arbitrary data to table indexes using a hash function. It is a method for representing dictionaries for large datasets.

• It allows lookup, update, and retrieval operations to run in constant time on average, i.e. O(1).

Why Hashing is Needed?

• After storing a large amount of data, we need to perform various operations on it, and lookups are unavoidable.

• Linear search and binary search perform lookups with time complexity O(n) and O(log n) respectively. As the dataset grows, these costs grow with it, which can become unacceptable.

• So we need a technique whose lookup cost does not depend on the size of the data. Hashing allows lookups to run in constant time on average, i.e. O(1).

Hash Function
• A hash function maps each element of a dataset to an index in the table.
Hash Table
• The hash table data structure stores elements as key-value pairs, where
• Key: a unique value (an integer in these slides) that is used for indexing the values
• Value: the data associated with the key.
Hashing

• In a hash table, an index is computed from the key, and the element corresponding to that key is stored at that index. This process is called hashing.
• Let k be a key and h(x) be a hash function.
• Here, h(k) gives the index at which to store the element linked with k.
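A minimal sketch of this idea, assuming a toy hash function h(k) = k mod m and a table of size 7 (neither is fixed by the slides at this point). The keys are chosen so that no two of them share an index; the `lookup` helper is hypothetical and only illustrates the O(1) access path.

```python
def h(k, m):
    """Toy hash function (an assumption for illustration): key mod table size."""
    return k % m

m = 7                      # hypothetical table size
table = [None] * m         # one slot per index, initially empty

# Store each (key, value) pair at the index computed from its key.
for key, value in [(50, "fifty"), (76, "seventy-six"), (73, "seventy-three")]:
    table[h(key, m)] = (key, value)

def lookup(table, key):
    """Return the value stored for key, or None if the slot does not hold it."""
    entry = table[h(key, len(table))]
    return entry[1] if entry is not None and entry[0] == key else None

print(lookup(table, 76))   # seventy-six
```

The lookup computes one index and inspects one slot, regardless of how many elements the table holds.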
Hash Collision
• When the hash function generates the same index for multiple keys, there is a conflict over which value should be stored at that index. This is called a hash collision.

• We can resolve a hash collision using one of the following techniques.
– Collision resolution by chaining

– Open addressing: linear probing, quadratic probing, and double hashing

Collision resolution by chaining

• In chaining, if a hash function produces the same index for multiple elements, these elements are stored at the same index using a doubly-linked list.
• If j is the slot for multiple elements, it contains a pointer to the head of the list of elements. If no element is present, j contains NIL.
Example
• Let us consider a simple hash function “key mod 7” and the sequence of keys 50, 700, 76, 85, 92, 73, 101.
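The example can be traced with a short sketch. It assumes Python lists stand in for the linked lists described above (a simplification, not the slides' data structure); the function names are hypothetical.

```python
def build_chained_table(keys, m):
    """Insert keys into m chains using h(k) = k mod m."""
    table = [[] for _ in range(m)]      # one (initially empty) chain per slot
    for k in keys:
        table[k % m].append(k)          # colliding keys share the same chain
    return table

def contains(table, k):
    """Search only the chain that k hashes to."""
    return k in table[k % len(table)]

table = build_chained_table([50, 700, 76, 85, 92, 73, 101], 7)
for slot, chain in enumerate(table):
    print(slot, chain)
# 0 [700]
# 1 [50, 85, 92]
# 2 []
# 3 [73, 101]
# 4 []
# 5 []
# 6 [76]
```

Keys 50, 85, and 92 all hash to slot 1, and 73 and 101 both hash to slot 3, so those chains hold more than one element.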
Advantages:

• Simple to implement.

• The hash table never fills up; we can always add more elements to a chain.

• Less sensitive to the hash function or load factor.

• It is mostly used when it is unknown how many keys will be inserted or deleted, and how frequently.
Disadvantages:

• The cache performance of chaining is poor because keys are stored in linked lists. Open addressing provides better cache performance as everything is stored in the same table.

• Wasted space: some slots of the hash table may never be used.

• If a chain becomes long, search time can become O(n) in the worst case.

• Uses extra space for links.

Open Addressing
• Unlike chaining, open addressing doesn't store multiple elements in the same slot. Here, each slot is either filled with a single key or left NIL.

• Different techniques used in open addressing are:

i. Linear Probing

ii. Quadratic Probing

iii. Double hashing

Linear Probing
• In linear probing, a collision is resolved by checking the next slot.

h(k, i) = (h′(k) + i) mod m

where i = 0, 1, …, m − 1 and h′(k) is an auxiliary hash function.

• If a collision occurs at h(k, 0), then h(k, 1) is checked. In this way, the value of i is incremented linearly.
• The problem with linear probing is clustering: runs of adjacent filled slots build up.

• When inserting a new element that hashes into a cluster, the rest of the cluster must be traversed to find a free slot.

• This adds to the time required to perform operations on the hash table.
Example
• Let us consider a simple hash function “key mod 7” and the sequence of keys 50, 700, 76, 85, 92, 73, 101.
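A sketch of linear probing on this example, assuming h′(k) = k mod 7 and a table of exactly 7 slots (the slide's figure is not available, so the table size is an assumption). The function name is hypothetical.

```python
def linear_probe_insert(table, k):
    """Insert k using h(k, i) = (h'(k) + i) mod m with h'(k) = k mod m."""
    m = len(table)
    for i in range(m):
        idx = (k % m + i) % m
        if table[idx] is None:      # first free slot in the probe sequence
            table[idx] = k
            return idx
    raise RuntimeError("hash table is full")

table = [None] * 7
for key in [50, 700, 76, 85, 92, 73, 101]:
    linear_probe_insert(table, key)
print(table)  # [700, 50, 85, 92, 73, 101, 76]
```

Keys 85 and 92 both start at slot 1 and are pushed to slots 2 and 3; that in turn displaces 73 and 101, which is the clustering effect described above.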
Quadratic Probing

• It works similarly to linear probing, but the spacing between probed slots grows (greater than one) according to the following relation.

• h(k, i) = (h′(k) + c1·i + c2·i²) mod m

where c1 and c2 are auxiliary constants with c2 ≠ 0, and i = 0, 1, …, m − 1.
Example
• Let us consider table size = 7, hash function Hash(x) = x % 7, and collision resolution strategy f(i) = i². Insert 22, 30, and 50.

• Step 1: Create a table of size 7.

• Step 2: Insert 22 and 30
– Hash(22) = 22 % 7 = 1. Since the cell at index 1 is empty, we can insert 22 at slot 1.
– Hash(30) = 30 % 7 = 2. Since the cell at index 2 is empty, we can insert 30 at slot 2.

• Step 3: Insert 50
– Hash(50) = 50 % 7 = 1
– Slot 1 is already occupied, so we probe slot 1 + 1², i.e. 1 + 1 = 2.
– Slot 2 is also occupied, so we probe slot 1 + 2², i.e. 1 + 4 = 5.
– Slot 5 is not occupied, so we place 50 in slot 5.
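The three steps can be checked with a small sketch. It assumes c1 = 0 and c2 = 1, which gives the f(i) = i² strategy used in the example; the function name and the limit of m probes are illustrative choices.

```python
def quadratic_probe_insert(table, k, c1=0, c2=1):
    """Insert k using h(k, i) = (k mod m + c1*i + c2*i*i) mod m."""
    m = len(table)
    for i in range(m):
        idx = (k % m + c1 * i + c2 * i * i) % m
        if table[idx] is None:
            table[idx] = k
            return idx
    # Quadratic probing may miss free slots, so give up after m probes.
    raise RuntimeError("no free slot found in m probes")

table = [None] * 7
slots = [quadratic_probe_insert(table, key) for key in (22, 30, 50)]
print(slots)  # [1, 2, 5]
print(table)  # [None, 22, 30, None, None, 50, None]
```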

Double hashing
• If a collision occurs after applying the first hash function h1(k), a second hash function h2(k) is used to compute the step to the next slot.

• h(k, i) = (h1(k) + i·h2(k)) mod m

• In code, double hashing can be written as:

(hash1(key) + i * hash2(key)) % TABLE_SIZE
Here hash1() and hash2() are hash functions and TABLE_SIZE is the size of the hash table. We repeat, increasing i, each time a collision occurs.
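A sketch of double hashing. The slides do not specify the two hash functions, so this assumes h1(k) = k mod m and h2(k) = 1 + (k mod (m − 1)), a common textbook pairing; the table size 7 and the key sequence are reused from the earlier examples for comparison.

```python
def h1(k, m):
    return k % m

def h2(k, m):
    # Assumed secondary hash; never 0, so every probe moves to a new slot.
    return 1 + (k % (m - 1))

def double_hash_insert(table, k):
    """Insert k using h(k, i) = (h1(k) + i * h2(k)) mod m."""
    m = len(table)
    for i in range(m):
        idx = (h1(k, m) + i * h2(k, m)) % m
        if table[idx] is None:
            table[idx] = k
            return idx
    raise RuntimeError("no free slot found in m probes")

table = [None] * 7
for key in [50, 700, 76, 85, 92, 73, 101]:
    double_hash_insert(table, key)
print(table)  # [700, 50, 101, 85, 92, 73, 76]
```

Because the step size depends on the key, 85 and 92 (which both start at slot 1) follow different probe sequences and land in slots 3 and 4 after one extra probe each.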
Good Hash Functions

A good hash function may not prevent collisions completely, but it can reduce their number.

Here, we will look at different methods for constructing a hash function:

• Division Method

• Multiplication Method

• Mid Square Method

• Digit Folding Method

Division Method

• This is the simplest method of generating a hash value. The hash function divides the key k by M and uses the remainder.

• Formula:

• h(k) = k mod M

Here, k is the key value, and M is the size of the hash table.
• Example:
– k = 1276
M = 11
h(1276) = 1276 mod 11
= 0
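The calculation can be reproduced in a few lines. The function name is hypothetical; the second print illustrates how consecutive keys map to consecutive hash values under this method.

```python
def division_hash(k, M):
    """Division method: h(k) = k mod M."""
    return k % M

print(division_hash(1276, 11))                               # 0
print([division_hash(k, 11) for k in range(1276, 1280)])     # [0, 1, 2, 3]
```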
• Pros:
– This method works for any value of M.
– The division method is very fast since it requires only a single division operation.
• Cons:
– It can perform poorly on patterned keys, since consecutive keys map to consecutive hash values in the hash table.
– Care should be taken in choosing the value of M; a prime not close to a power of two is a common choice.
Multiplication Method
• This method involves the following steps:

1. Choose a constant value A such that 0 < A < 1.

2. Multiply the key value k by A.

3. Extract the fractional part of kA.

4. Multiply the result of the above step by the size of the hash table, i.e. M.
5. The resulting hash value is obtained by taking the floor of the result obtained in step 4.
• Formula:
• h(k) = floor(M (kA mod 1))
• Here,
M is the size of the hash table.
k is the key value.
A is a constant value.
• Example:
• k = 12345
A = 0.357840
M = 100
• h(12345) = floor[ 100 (12345 × 0.357840 mod 1) ]
= floor[ 100 (4417.5348 mod 1) ]
= floor[ 100 (0.5348) ]
= floor[ 53.48 ]
= 53
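The same steps in code, as a sketch using floating-point arithmetic (a production implementation would more likely use integer arithmetic; that choice is outside the slides). The function name is hypothetical.

```python
import math

def multiplication_hash(k, A, M):
    """Multiplication method: h(k) = floor(M * (k*A mod 1))."""
    fractional = (k * A) % 1        # fractional part of k*A
    return math.floor(M * fractional)

print(multiplication_hash(12345, 0.357840, 100))  # 53
```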
• Pros:

• It works with any value of A between 0 and 1, although some values tend to give better results than others.

• Cons:

• It is best suited to table sizes that are a power of two, where the index can be computed very quickly.
Mid Square Method

• The mid-square method computes the hash value from the middle digits of the squared key.

• It involves two steps to compute the hash value:

1. Square the value of the key k, i.e. k².

2. Extract the middle r digits as the hash value.

• Formula:

• h(k) = middle r digits of k × k

• Here,

k is the key value.

• The value of r can be decided based on the size of the table.

Example:
• Suppose the hash table has 100 memory locations. So r = 2, because two digits are required to map the key to a memory location.
• k = 60
k × k = 60 × 60
= 3600
h(60) = 60 (the middle two digits of 3600)
• The hash value obtained is 60.
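A sketch of the mid-square computation on decimal digits. The slides do not say how to pick the "middle" when the digits cannot be centred exactly, so this assumes the window is shifted left in that case; the function name is hypothetical.

```python
def mid_square_hash(k, r):
    """Mid-square method: take the middle r decimal digits of k*k."""
    squared = str(k * k).zfill(r)          # pad so at least r digits exist
    start = (len(squared) - r) // 2        # assumed convention: lean left
    return int(squared[start:start + r])

print(mid_square_hash(60, 2))  # 60  (60 * 60 = 3600 -> "60")
```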
• Pros:
• This method performs well because most or all digits of the key contribute to the middle digits of the squared result.

• The result is not dominated by the distribution of the top or bottom digits of the original key value.

• Cons:
• Key size is a limitation: squaring a large key roughly doubles the number of digits.

• Collisions can still occur, although they can be reduced.
Digit Folding Method

• This method involves two steps:

1. Divide the key k into a number of parts k1, k2, k3, …, kn, where each part has the same number of digits except the last, which may have fewer.

2. Add the individual parts. The hash value is obtained by ignoring the final carry, if any.
• Formula:

• k = k1, k2, k3, k4, …, kn

s = k1 + k2 + k3 + k4 + … + kn

h(k) = s

• Here,

s is obtained by adding the parts of the key k.

• Example:

• k = 12345

k1 = 12, k2 = 34, k3 = 5

s = k1 + k2 + k3

= 12 + 34 + 5

= 51

h(k) = 51
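A sketch of digit folding with two-digit parts. It interprets "ignoring the last carry" as keeping only the lowest `part_size` digits of the sum, which is an assumption about the slides' intent; the function name is hypothetical.

```python
def fold_hash(k, part_size=2):
    """Digit folding: split the decimal digits into parts, add them, drop the carry."""
    digits = str(k)
    parts = [int(digits[i:i + part_size])
             for i in range(0, len(digits), part_size)]
    return sum(parts) % (10 ** part_size)   # discard any carry beyond part_size digits

print(fold_hash(12345))    # 51  (12 + 34 + 5)
print(fold_hash(987654))   # 28  (98 + 76 + 54 = 228, carry dropped)
```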
Applications of Hashing

• Hashing has many applications in computer science, including:

• Databases: Hashing is used to index and search large databases efficiently.

• Cryptography: Hash functions are used to generate message digests, which are used to verify the integrity of data and protect against tampering.

• Caching: Hash tables are used in caching systems to store frequently accessed data and improve performance.

• Spell checking: Hashing is used in spell checkers to quickly search for words in a dictionary.

• Network routing: Hashing is used in load balancing and routing algorithms to distribute network traffic across multiple servers.
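As an illustration of the message-digest use case, here is a minimal sketch using Python's standard `hashlib` module. SHA-256 is chosen as an example algorithm (the slides do not name one), and the messages are made up.

```python
import hashlib

def digest(data: bytes) -> str:
    """Return the SHA-256 message digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

original = b"transfer 100 to alice"      # hypothetical message
tampered = b"transfer 900 to alice"      # same message with one digit changed

stored = digest(original)                # digest recorded alongside the message
print(digest(original) == stored)        # True: content unchanged
print(digest(tampered) == stored)        # False: content was modified
```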
Advantages of Hashing
• Fast access: Hashing provides constant-time access to data on average, making lookups by key faster than searching a linked list or an unsorted array.

• Efficient search: Hashing allows for quick search operations, making it well suited to applications that search frequently.

• Space-efficient: Hashing can be more space-efficient than some other data structures when the table is sized to the data, since the hash table itself occupies a fixed amount of memory.
Limitations of Hashing:
• Hash collisions: Hashing can produce the same hash value for different keys, leading to hash collisions. To handle collisions, we need to use collision resolution techniques like chaining or open addressing.

• Hash function quality: The quality of the hash function determines the efficiency of the hashing algorithm. A poor-quality hash function can lead to more collisions, reducing the performance of the hashing algorithm.
