What is Hashing?
Hashing is a method of storing and indexing data. The idea behind hashing is to allow large amounts of data to be indexed using keys that are commonly created by formulas (hash functions).
Example: a "magic function" (the hash function) maps Apple to 18, Application to 20 and Appmillers to 22, and each key is stored in the array cell with that index (cells 0 to 23).
AppMillers
www.appmillers.com
Why Hashing?
It is time efficient for the SEARCH operation.
Time complexity of SEARCH:
Data structure          SEARCH
Array / Python list     O(logN) (sorted, using binary search)
Linked list             O(N)
Tree                    O(logN) (balanced binary search tree)
Hashing                 O(1) on average / O(N) in the worst case
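A minimal Python sketch of the three lookup styles compared above. It assumes the list is kept sorted, so that binary search through the standard `bisect` module is possible, and it uses a built-in `dict` as the hash-based index; the zero-padded key strings are made-up sample data.

```python
import bisect

def linear_search(items, target):
    # O(N): may have to look at every element.
    for i, item in enumerate(items):
        if item == target:
            return i
    return -1

def binary_search(sorted_items, target):
    # O(logN): only valid because the list is sorted.
    i = bisect.bisect_left(sorted_items, target)
    if i < len(sorted_items) and sorted_items[i] == target:
        return i
    return -1

def hash_search(index, target):
    # O(1) on average: the dict hashes the key straight to its slot.
    return index.get(target, -1)

data = sorted(f"key{n:05d}" for n in range(10_000))
index = {key: i for i, key in enumerate(data)}  # key -> position in data
```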
Hashing Terminology
Hash function: a function that maps data of arbitrary size to data of fixed size.
Key: the input data provided by the user.
Hash value: the value returned by the hash function.
Hash table: a data structure that implements an associative array, an abstract data type that maps keys to values.
Collision: occurs when two different keys passed to a hash function produce the same output.
In the example above, Apple, Application and Appmillers are the keys, the magic function is the hash function, 18, 20 and 22 are the hash values, and the array of cells 0 to 23 is the hash table.
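The terms above can be tied together in a small sketch. It assumes a 24-cell Python list as the hash table and Python's built-in `hash()` followed by `% TABLE_SIZE` as the hash function; because string hashes are randomized per process, the cells will not match the 18, 20 and 22 of the example. The helper names `hash_function`, `put` and `contains` are hypothetical.

```python
TABLE_SIZE = 24  # cells 0..23, as in the example

def hash_function(key):
    """Map a key of arbitrary size to a hash value in range(TABLE_SIZE)."""
    # Built-in hash() gives a fixed-size integer; % picks a cell index.
    return hash(key) % TABLE_SIZE

hash_table = [None] * TABLE_SIZE  # one cell per possible hash value

def put(key):
    """Store key in its cell; return False if a different key already occupies it."""
    index = hash_function(key)
    if hash_table[index] is not None and hash_table[index] != key:
        return False  # collision: two different keys produced the same hash value
    hash_table[index] = key
    return True

def contains(key):
    return hash_table[hash_function(key)] == key
```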
Collision example: the hash function maps both ABCD and ABCDEF to 20. ABCD is already stored in cell 20, so ABCDEF collides with it.
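The slide does not say which hash function sends both keys to 20. As a hedged illustration, the sketch below assumes a simple sum-of-character-codes hash (the same idea as the ASCII function later in this section) and groups keys by hash value to find collisions; anagrams such as "ABC" and "CAB" always collide under it.

```python
from collections import defaultdict

def ascii_hash(key, cell_number):
    # Sum of character codes, reduced to a cell index.
    return sum(ord(ch) for ch in key) % cell_number

def find_collisions(keys, cell_number):
    """Return {hash value: [keys]} for every hash value shared by two or more keys."""
    buckets = defaultdict(list)
    for key in keys:
        buckets[ascii_hash(key, cell_number)].append(key)
    return {value: group for value, group in buckets.items() if len(group) > 1}

print(find_collisions(["ABC", "CAB", "ABCD"], 24))  # {6: ['ABC', 'CAB']}
```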
Hash Functions
Mod function
def mod(number, cellNumber):
    return number % cellNumber

mod(400, 24)  # 16
mod(700, 24)  # 4

With a 24-cell table (cells 0 to 23), 700 is stored in cell 4 and 400 in cell 16.
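A short usage sketch of `mod` as a hash function for integer keys, storing 400 and 700 in a 24-cell list. The list-based table and the `CELLS` name are assumptions for illustration.

```python
def mod(number, cell_number):
    return number % cell_number

CELLS = 24
table = [None] * CELLS

for number in (400, 700):
    table[mod(number, CELLS)] = number  # 400 -> cell 16, 700 -> cell 4

# Python's % returns a value in range(cell_number) for a positive
# cell_number, so negative keys also land in a valid cell.
print(mod(-5, CELLS))  # 19
```

This relies on Python's modulo semantics; in languages where `%` can return a negative result, the index would need an extra adjustment.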
Hash Functions
ASCII function
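def modASCII(string, cellNumber):
    total = 0
    for char in string:
        total += ord(char)  # add the ASCII code of each character
    return total % cellNumber

modASCII("ABC", 24)  # 6

The ASCII codes are A = 65, B = 66 and C = 67, so the total is 65 + 66 + 67 = 198. Since 198 = 24 × 8 + 6, the remainder is 6, and ABC is stored in cell 6.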
Hash Functions
Properties of a good hash function
- It distributes hash values uniformly across the hash table
- It has to use all the input data
Example: if the hash function uses only the first three characters, ABCD and ABCDEF both reduce to ABC and get the hash value 18, so they collide in cell 18.
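Both properties can be checked with a small experiment. `prefix_hash` is a deliberately poor, hypothetical hash function that reads only the first three characters; `ascii_hash` sums every character code, like `modASCII`. The slide's unnamed function gives 18 for ABC, whereas this sum-based version gives 6, so only the pattern (not the exact cell numbers) carries over.

```python
def ascii_hash(key, cell_number):
    # Uses every character of the key.
    return sum(ord(ch) for ch in key) % cell_number

def prefix_hash(key, cell_number):
    # Deliberately poor (hypothetical): looks only at the first three characters.
    return ascii_hash(key[:3], cell_number)

def buckets_used(hash_fn, keys, cell_number):
    """Number of distinct cells the keys land in (higher means a more uniform spread)."""
    return len({hash_fn(key, cell_number) for key in keys})

keys = ["ABCD", "ABCDEF", "ABCXYZ", "ABC123"]
print(prefix_hash("ABCD", 24), prefix_hash("ABCDEF", 24))  # 6 6
print(ascii_hash("ABCD", 24), ascii_hash("ABCDEF", 24))    # 2 21
print(buckets_used(prefix_hash, keys, 24), buckets_used(ascii_hash, keys, 24))  # 1 4
```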
Collision Resolution Techniques
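Example: the hash function maps both ABCD and EFGH to 2 in a 16-cell table (cells 0 to 15). ABCD already occupies cell 2, so inserting EFGH causes a collision. The keys ABCD, EFGH and IJKLM are used in the examples that follow.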
Collision Resolution Techniques
Resolution techniques:
- Direct chaining
- Open addressing
  - Linear probing
  - Quadratic probing
  - Double hashing
Collision Resolution Techniques
Direct chaining: implements each bucket as a linked list. Colliding elements are stored in that list.
Example: ABCD, EFGH and IJKLM all hash to 2, and Miller hashes to 7 (16-cell table, cells 0 to 15). Cell 2 points to the list ABCD → EFGH → IJKLM → Null, whose nodes are at addresses 111, 222 and 333; cell 7 points to the single node Miller → Null at address 444.
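A minimal direct-chaining sketch in Python, with each bucket as a hand-written singly linked list. The default sum-of-character-codes hash is an assumption; to reproduce the example, the usage code passes a hypothetical `slide_hash` lookup that returns the hash values shown above.

```python
class Node:
    """One element of a bucket's singly linked list."""
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.next = None

class ChainedHashTable:
    def __init__(self, size=16, hash_fn=None):
        self.size = size
        self.buckets = [None] * size  # each cell holds the head of a linked list, or None
        # Default hash (an assumption): sum of character codes.
        self.hash_fn = hash_fn or (lambda key: sum(ord(ch) for ch in key))

    def _index(self, key):
        return self.hash_fn(key) % self.size

    def insert(self, key, value):
        index = self._index(key)
        node = self.buckets[index]
        if node is None:
            self.buckets[index] = Node(key, value)
            return
        while True:
            if node.key == key:      # existing key: overwrite its value
                node.value = value
                return
            if node.next is None:    # end of chain: append the colliding key
                node.next = Node(key, value)
                return
            node = node.next

    def search(self, key):
        node = self.buckets[self._index(key)]
        while node is not None:
            if node.key == key:
                return node.value
            node = node.next
        return None

    def delete(self, key):
        index = self._index(key)
        prev, node = None, self.buckets[index]
        while node is not None:
            if node.key == key:
                if prev is None:
                    self.buckets[index] = node.next  # unlink the head
                else:
                    prev.next = node.next            # unlink from the middle or tail
                return True
            prev, node = node, node.next
        return False

    def chain(self, index):
        """Keys stored in one bucket, in list order."""
        keys, node = [], self.buckets[index]
        while node is not None:
            keys.append(node.key)
            node = node.next
        return keys

# Hypothetical fixed hash values that reproduce the example.
slide_hash = {"ABCD": 2, "EFGH": 2, "IJKLM": 2, "Miller": 7}
table = ChainedHashTable(16, hash_fn=lambda key: slide_hash[key])
for key in ("ABCD", "EFGH", "IJKLM", "Miller"):
    table.insert(key, len(key))  # the stored value is arbitrary here
```

New keys are appended at the tail so the chain order matches ABCD → EFGH → IJKLM; since `insert` already walks the chain to detect a duplicate key, this costs nothing extra.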
Collision Resolution Techniques
Open addressing: colliding elements are stored in other vacant buckets. During insertion and lookup, these buckets are found through so-called probing.
Linear probing: places the new key in the closest following empty cell.
Example: ABCD, EFGH and IJKLM all hash to 2 in a 16-cell table (cells 0 to 15). ABCD goes into cell 2. Cell 2 is taken, so EFGH goes into cell 3. Cells 2 and 3 are taken, so IJKLM goes into cell 4.
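A minimal linear-probing sketch. The table stores keys only, wraps around at the end, and treats an empty cell as the end of a search. The default hash function and the `hash_fn` parameter are assumptions, and fixed hash values are supplied to reproduce the example.

```python
class LinearProbingTable:
    def __init__(self, size=16, hash_fn=None):
        self.slots = [None] * size
        # Default hash (an assumption): sum of character codes.
        self.hash_fn = hash_fn or (lambda key: sum(ord(ch) for ch in key))

    def _probe(self, key):
        """Yield cell indexes in linear-probing order: h, h+1, h+2, ... (wrapping)."""
        size = len(self.slots)
        start = self.hash_fn(key) % size
        for step in range(size):
            yield (start + step) % size

    def insert(self, key):
        for index in self._probe(key):
            if self.slots[index] is None or self.slots[index] == key:
                self.slots[index] = key
                return index
        raise RuntimeError("hash table is full")

    def search(self, key):
        for index in self._probe(key):
            if self.slots[index] is None:  # an empty cell ends the probe sequence
                return None
            if self.slots[index] == key:
                return index
        return None

slide_hash = {"ABCD": 2, "EFGH": 2, "IJKLM": 2}  # hypothetical fixed hash values
table = LinearProbingTable(16, hash_fn=lambda key: slide_hash[key])
cells = [table.insert(key) for key in ("ABCD", "EFGH", "IJKLM")]  # [2, 3, 4]
```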
Collision Resolution Techniques
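Quadratic probing: successive values of a quadratic polynomial (here 1², 2², 3², 4², ...) are added to the original index until an empty cell is found.
Example: ABCD, EFGH and IJKLM all hash to 2. ABCD goes into cell 2. For EFGH, cell 2 is taken, so the next probe is 2 + 1² = 3 and EFGH goes into cell 3. For IJKLM, cell 2 and cell 2 + 1² = 3 are taken, so the next probe is 2 + 2² = 6 and IJKLM goes into cell 6.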
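A minimal quadratic-probing sketch that adds i² to the hash value for i = 0, 1, 2, ... The function name `quadratic_insert` is hypothetical, and every key is given the hash value 2 to match the example.

```python
def quadratic_insert(slots, key, hash_value):
    """Place key at hash_value + i*i (mod table size) for i = 0, 1, 2, ...; return the cell."""
    size = len(slots)
    for i in range(size):
        index = (hash_value + i * i) % size
        if slots[index] is None:
            slots[index] = key
            return index
    # Quadratic probing may not visit every cell, so give up after `size` attempts.
    raise RuntimeError(f"no free cell found for {key!r}")

slots = [None] * 16
for key in ("ABCD", "EFGH", "IJKLM"):
    quadratic_insert(slots, key, 2)  # every key hashes to 2, as in the example
```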
Collision Resolution Techniques
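Double hashing: the interval between probes is computed by a second hash function.
Example: the first hash function maps ABCD, EFGH and IJKLM to 2, and the second hash function (Hash 2) maps EFGH and IJKLM to 4. ABCD goes into cell 2. For EFGH, cell 2 is taken, so the next probe is 2 + 4 = 6 and EFGH goes into cell 6. For IJKLM, cells 2 and 6 are taken, so the next probe is 2 + (2 × 4) = 10 and IJKLM goes into cell 10.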
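A minimal double-hashing sketch using the probe sequence h1, h1 + h2, h1 + 2 × h2, ... (mod table size), so the third probe for IJKLM is 2 + 2 × 4 = 10. The helper name is hypothetical, and ABCD's second hash value is not given in the example; 1 is assumed, which does not matter because its first probe succeeds.

```python
def double_hash_insert(slots, key, h1, h2):
    """Probe h1, h1 + h2, h1 + 2*h2, ... (mod table size); return the cell used."""
    size = len(slots)
    if h2 % size == 0:
        raise ValueError("second hash value must not be a multiple of the table size")
    for i in range(size):
        index = (h1 + i * h2) % size
        if slots[index] is None:
            slots[index] = key
            return index
    raise RuntimeError(f"no free cell found for {key!r}")

first_hash = {"ABCD": 2, "EFGH": 2, "IJKLM": 2}
second_hash = {"ABCD": 1, "EFGH": 4, "IJKLM": 4}  # ABCD's value is an assumption
slots = [None] * 16
placed = {key: double_hash_insert(slots, key, first_hash[key], second_hash[key])
          for key in ("ABCD", "EFGH", "IJKLM")}
```

With 16 cells and a step of 4, the probes only ever visit cells 2, 6, 10 and 14. Choosing a second hash value that is coprime with the table size (for example, an odd step when the size is a power of two) lets the probes reach every cell.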
Hash Table is Full
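Direct chaining: this situation never arises, because a bucket's linked list can always grow.
Example: a 4-cell table (cells 0 to 3) with hash values ABCD → 2, EFGH → 1, IJKLM → 3, NOPQ → 0 and RSTU → 1. After the first four keys every cell is occupied, yet RSTU can still be stored by appending it to the list in cell 1 (EFGH → RSTU → Null). Cell 0 holds NOPQ, cell 2 holds ABCD and cell 3 holds IJKLM.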
Hash Table is Full
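Open addressing: create a new hash table twice the size of the current one and rehash all current keys into it.
Example: with the hash values ABCD → 2, EFGH → 1, IJKLM → 3, NOPQ → 0 and RSTU → 1, the 4-cell table holds NOPQ in cell 0, EFGH in cell 1, ABCD in cell 2 and IJKLM in cell 3, so RSTU has no free cell. After the table is doubled and the existing keys are rehashed, linear probing places RSTU in cell 4 of the new table.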
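A minimal sketch of "double the table and rehash" on top of linear probing. It resizes only when every cell is taken, to mirror the example; the class name, the default hash and the `hash_fn` parameter are assumptions, and fixed hash values are passed in to reproduce the figure.

```python
class ResizingLinearProbingTable:
    def __init__(self, size=4, hash_fn=None):
        self.slots = [None] * size
        # Default hash (an assumption): sum of character codes.
        self.hash_fn = hash_fn or (lambda key: sum(ord(ch) for ch in key))

    def _find_slot(self, slots, key):
        """Linear probing: first cell holding key or empty, or None if every cell is taken."""
        size = len(slots)
        start = self.hash_fn(key) % size
        for step in range(size):
            index = (start + step) % size
            if slots[index] is None or slots[index] == key:
                return index
        return None

    def _resize(self, new_size):
        old_slots = self.slots
        self.slots = [None] * new_size
        for key in old_slots:  # rehash every existing key into the bigger table
            if key is not None:
                self.slots[self._find_slot(self.slots, key)] = key

    def insert(self, key):
        index = self._find_slot(self.slots, key)
        if index is None:  # table is full: double its size, then retry
            self._resize(2 * len(self.slots))
            index = self._find_slot(self.slots, key)
        self.slots[index] = key
```

Usage with the example's hash values: after inserting ABCD, EFGH, IJKLM and NOPQ into a 4-cell table, inserting RSTU triggers the resize and lands in cell 4. Practical implementations usually resize earlier, once a load-factor threshold is crossed, because probe sequences grow long as the table fills.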
Pros and Cons of Collision resolution techniques
Direct chaining
- The hash table never gets full.
- A very long linked list degrades performance (the time complexity of search becomes O(N)).
Open addressing
- Easy implementation.
- When the hash table is full, creating a new hash table affects performance (the time complexity of search becomes O(N)).
‣ If the input size is known in advance, open addressing is generally used.
‣ If deletions are frequent, direct chaining is generally used.
Pros and Cons of Collision resolution techniques
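Example (linear probing, hash values ABCD → 2, EFGH → 1, IJKLM → 3, NOPQ → 0, RSTU → 1): the table holds NOPQ in cell 0, EFGH in cell 1, ABCD in cell 2, IJKLM in cell 3 and RSTU in cell 4, because RSTU's own cell 1 and the following cells 2 and 3 are already taken. A search for RSTU therefore probes cells 1 to 4. If EFGH were deleted by simply emptying cell 1, that search would stop at the empty cell and miss RSTU, which is why deletion is harder with open addressing than with direct chaining.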
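The deletion problem can be reproduced on the five-key table above, assumed to sit in an 8-cell list with cells 5 to 7 empty. The `DELETED` tombstone marker is a common remedy in open addressing and is used here only for illustration; `probe_search` is a hypothetical helper.

```python
DELETED = object()  # tombstone: marks a cell whose key was removed

def probe_search(slots, key, hash_value):
    """Linear probing from hash_value; an empty cell (None) means the key is absent."""
    size = len(slots)
    for step in range(size):
        index = (hash_value + step) % size
        if slots[index] is None:
            return None
        if slots[index] == key:
            return index
    return None

table = ["NOPQ", "EFGH", "ABCD", "IJKLM", "RSTU", None, None, None]

naive = list(table)
naive[1] = None  # delete EFGH by emptying its cell
print(probe_search(naive, "RSTU", 1))  # None: the probe stops at cell 1

marked = list(table)
marked[1] = DELETED  # delete EFGH with a tombstone instead
print(probe_search(marked, "RSTU", 1))  # 4
```

A tombstone keeps the probe chain intact for searches, while an insert can still reuse the cell.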
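Practical Use of Hashing
Password verification: the user enters a login ([email protected]) and a password (123456) on a personal computer. The Google servers keep a hash value of the password (for example *&71283*a12) rather than the password itself, and a login attempt is verified by comparing its hash value with the stored one.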
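A hedged sketch of password verification with Python's standard library. Password storage uses a slow, salted cryptographic hash (here PBKDF2-HMAC-SHA256 via `hashlib.pbkdf2_hmac`), which is a different tool from the table-index hash functions earlier in this section; the algorithm, salt size and iteration count are illustrative assumptions, not what any particular service uses.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # work factor; an illustrative value

def hash_password(password, salt=None):
    """Return (salt, hash value) for a password using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # random per-user salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password, salt, stored_digest):
    """Re-hash the login attempt and compare hash values in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, stored_digest)

salt, stored = hash_password("123456")  # what the server keeps: salt + hash value
```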
File system: a file path is mapped to its physical location on disk.
Example: the path /Documents/Files/hashing.txt is hashed and mapped to its physical location on disk, sector 4.
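A Python `dict` is itself a hash table, so a path-to-location index can be sketched with one. The `disk_index` mapping and `locate` helper are hypothetical; only the path and sector 4 come from the example, and real file systems use their own on-disk structures.

```python
# Hypothetical index from file path to physical location (sector number).
# The dict hashes each path string internally to choose its slot.
disk_index = {
    "/Documents/Files/hashing.txt": 4,  # sector 4, as in the example
}

def locate(path):
    """Return the sector holding `path`, or None if the path is unknown."""
    return disk_index.get(path)
```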
Pros and Cons of Hashing
✓ On average, insertion, deletion and search take O(1) time.
✗ When the hash function is not good enough, insertion, deletion and search take O(N) time.
Operations   Array / Python list   Linked list   Tree      Hashing (average / worst)
Insertion    O(N)                  O(N)          O(logN)   O(1) / O(N)
Deletion     O(N)                  O(N)          O(logN)   O(1) / O(N)
Search       O(N)                  O(N)          O(logN)   O(1) / O(N)
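The O(1) / O(N) split comes down to chain length. A sketch assuming direct chaining with Python lists as buckets: `sum_hash` spreads 100 sample keys across 24 buckets, while the hypothetical `constant_hash` sends every key to bucket 0, so a search there must scan all 100 keys.

```python
def bucket_sizes(keys, hash_fn, size):
    """Return the chain length of each bucket after hashing keys into `size` buckets."""
    buckets = [[] for _ in range(size)]
    for key in keys:
        buckets[hash_fn(key) % size].append(key)
    return [len(bucket) for bucket in buckets]

def sum_hash(key):
    return sum(ord(ch) for ch in key)  # spreads these keys over many buckets

def constant_hash(key):
    return 0  # hypothetical worst case: every key collides

keys = [f"key{n}" for n in range(100)]
longest_good = max(bucket_sizes(keys, sum_hash, 24))      # well below 100
longest_bad = max(bucket_sizes(keys, constant_hash, 24))  # 100: search becomes a linear scan
```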