Hashing:: Michael Levin
Introduction
Michael Levin
Department of Computer Science and Engineering
University of California, San Diego
Outline
1 Applications
2 Phone Book
4 Hash Functions
5 Chaining
7 Hash Tables
Blockchain
Programming Languages
dict
HashMap
Who’s Calling?
Phone Book
Local Phone Numbers
Like 123-23-23
Typically up to 7 digits
Sufficient for 10^7 = 10 000 000 phone
numbers
Convert Phone Number to Integer
Examples
123-23-23 → 1 232 323
049 12 12 → 491 212
5757575 → 5 757 575
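The conversion in the examples above can be sketched as follows. This is a minimal illustration, assuming that every non-digit character is a separator that can be dropped; the name convert_to_int is simply a Python spelling of the slides' ConvertToInt.

```python
def convert_to_int(phone_number: str) -> int:
    """Drop every non-digit character and read the remaining digits as an integer."""
    digits = "".join(ch for ch in phone_number if ch.isdigit())
    return int(digits)  # leading zeros disappear, as in 049 12 12 -> 491 212


print(convert_to_int("123-23-23"))  # 1232323
print(convert_to_int("049 12 12"))  # 491212
print(convert_to_int("5757575"))    # 5757575
```

A string without any digits would raise ValueError here; the sketch does not try to handle that case.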
Direct Addressing
Phone number    Name
0000000
…
2391717         Sasha
…
5757575         Helen
…
9999999
(10^7 rows)
Direct Addressing
SetName(phoneNumber, name)
    index ← ConvertToInt(phoneNumber)
    phoneBookArray[index] ← name
GetName(phoneNumber)
    index ← ConvertToInt(phoneNumber)
    return phoneBookArray[index]
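A runnable version of the two operations might look like this. It is a sketch under stated assumptions: local numbers of at most 7 digits, None as the marker for "no name stored", and a convert_to_int helper that simply drops non-digit characters.

```python
MAX_DIGITS = 7  # local numbers, as in the slides

# One cell per possible phone number; None marks "no name stored" (an assumption).
phone_book_array = [None] * (10 ** MAX_DIGITS)


def convert_to_int(phone_number):
    """Read the digits of a phone number as one integer."""
    return int("".join(ch for ch in phone_number if ch.isdigit()))


def set_name(phone_number, name):
    index = convert_to_int(phone_number)
    phone_book_array[index] = name


def get_name(phone_number):
    index = convert_to_int(phone_number)
    return phone_book_array[index]


set_name("239-17-17", "Sasha")
print(get_name("2391717"))  # Sasha
```

Both operations touch exactly one cell, but the array is allocated in full even when only a handful of contacts are stored.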
Asymptotics
International Phone Numbers
Like +1-800-700-00-00
Can be up to 15 digits:
+594 700 123 233 455
Using direct addressing requires an array of
size 10^15, which would take about 7PB (7
petabytes, e.g. at 8 bytes per cell) to store one phone book
(1PB = 1024TB, 1TB = 1024GB)
Your phone memory is probably at most
256GB, so you would need 28 672
phones to store your phone book :)
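The arithmetic behind these figures can be checked with a few lines. The cell size of 8 bytes is an assumption (for instance one pointer per cell); the slides only state the rounded total of 7PB.

```python
CELLS = 10 ** 15      # one cell per possible 15-digit number
BYTES_PER_CELL = 8    # assumption: e.g. one 8-byte pointer per cell

GB = 1024 ** 3
TB = 1024 * GB
PB = 1024 * TB

total_bytes = CELLS * BYTES_PER_CELL
print(total_bytes / PB)          # a little over 7

# Phones with 256GB each needed to hold the rounded figure of 7PB.
phones = (7 * PB) // (256 * GB)
print(phones)                    # 28672
```

With 1PB = 1024TB and 1TB = 1024GB, 7PB is 7 · 1024 · 1024 = 7 340 032GB, and dividing by 256GB gives 28 672 phones.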
Idea
Direct addressing requires too much
memory
Array is huge because it has a cell for
every possible phone number
Let’s store only the known phone
numbers
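Put pairs (Phone number, Name) into a
doubly-linked list: lookup then needs a
linear scan, O(n), which is too slow
Better: keep the pairs in an array sorted
by phone number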
Idea
01707773331 Maria
14052391717 Sasha
15025757575 Helen
Operations
Retrieve name by phone number using
binary search in O(log n); this needs the
pairs in an array sorted by phone number
To insert a new contact, find the
appropriate position in O(log n), then
insert in O(n), because we first need to move
part of the array 1 position to the right
Too slow again
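The sorted-array operations can be sketched with Python's standard bisect module. The two parallel lists and the function names are illustrative choices, not taken from the slides; phone numbers are assumed to be already converted to integers.

```python
import bisect

numbers = []  # phone numbers as integers, kept in increasing order
names = []    # names[i] belongs to numbers[i]


def set_name(number, name):
    i = bisect.bisect_left(numbers, number)  # O(log n) binary search
    if i < len(numbers) and numbers[i] == number:
        names[i] = name                      # known number: overwrite the name
    else:
        numbers.insert(i, number)            # O(n): shifts the tail one position right
        names.insert(i, name)


def get_name(number):
    i = bisect.bisect_left(numbers, number)  # O(log n) binary search
    if i < len(numbers) and numbers[i] == number:
        return names[i]
    return None                              # number is not stored


set_name(15025757575, "Helen")
set_name(1707773331, "Maria")
set_name(14052391717, "Sasha")
print(numbers)                # [1707773331, 14052391717, 15025757575]
print(get_name(14052391717))  # Sasha
```

The search is logarithmic in both functions; the cost that makes insertion linear is the list.insert call, which moves every later element.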
Conclusion
Encoding Phone Numbers
Hash Function
Definition
For any set of objects S and any integer
m > 0, a function h : S → {0, 1, …, m − 1}
is called a hash function.
Definition
m is called the cardinality of hash function h.
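As a concrete instance of the definition, taking the remainder modulo m maps any non-negative integer into {0, 1, …, m − 1}. This is only one simple function that satisfies the definition, chosen here for illustration; the default m = 1000 is likewise an arbitrary choice.

```python
def h(number: int, m: int = 1000) -> int:
    """A hash function of cardinality m: the remainder of number modulo m."""
    return number % m


# With m = 1000 the hash value is just the last three digits of the number.
for number in (1707773331, 14052391717, 15025757575):
    print(number, "->", h(number))
# 1707773331 -> 331
# 14052391717 -> 717
# 15025757575 -> 575
```

Every result lies between 0 and m − 1, so it can be used directly as an index into an array of size m.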
Desirable Properties
Hash function should be fast to
compute
Different values for different objects
Direct addressing with O(m) memory
Want small cardinality m
Impossible to have all different values if
number of objects |S| is more than m
(by pigeonhole principle)
Collisions
Definition
When h(o1) = h(o2) and o1 ≠ o2, this is a
collision.
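The pigeonhole argument can be observed directly: with more objects than hash values, some pair must collide. The sketch below assumes the remainder modulo m as the hash function, and find_collision is a hypothetical helper written for this illustration.

```python
def h(number, m):
    """Illustrative hash function of cardinality m."""
    return number % m


def find_collision(objects, m):
    """Return two distinct objects with equal hash values, or None if there are none."""
    seen = {}  # hash value -> first object seen with that value
    for o in objects:
        value = h(o, m)
        if value in seen and seen[value] != o:
            return seen[value], o
        seen[value] = o
    return None


print(find_collision(range(8), 8))  # None: 8 objects fit into 8 values
print(find_collision(range(9), 8))  # (0, 8): the 9th object has to collide
```

Here h(0) = h(8) = 0 while 0 ≠ 8, which is exactly the situation the definition describes.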
Map
Definition
Map from set S of objects to set V of
values is a data structure with methods
HasKey(object), Get(object),
Set(object, value), where
object ∈ S, value ∈ V.
Map
Definition
In a Map from S to V, objects from S are
usually called keys of the Map. Objects from
V are called values of the Map.
Chaining for Phone Book
h(01707773331) = 4
h(14052391717) = 1
h(15025757575) = 4
0
1 → (Sasha, 14052391717)
2
3
4 → (Maria, 01707773331) → (Helen, 15025757575)
5
6
7
Chaining for Phone Book
Select hash function h of cardinality m
Create array Chains of size m
Each element of Chains is a
doubly-linked list of pairs
(name, phoneNumber), called a chain
Pair (name, phoneNumber) goes into
chain at position
h(ConvertToInt(phoneNumber)) in
the array Chains
Chaining for Phone Book
To look up name by phone number, go
to the chain corresponding to phone
number and look through all pairs
To add a contact, create a pair
(name, phoneNumber) and insert it into
the corresponding chain
To remove a contact, go to the
corresponding chain, find the pair
(name, phoneNumber) and remove it
from the chain
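The three phone book operations described above can be sketched as follows. Several things here are assumptions for illustration: m = 8, the remainder modulo m as the hash function (it does not reproduce the hash values shown in the picture), Python lists standing in for doubly-linked lists, and phone numbers written the same way every time so that they can be compared as strings.

```python
M = 8  # cardinality of the hash function (hypothetical choice)


def convert_to_int(phone_number):
    """Read the digits of a phone number as one integer."""
    return int("".join(ch for ch in phone_number if ch.isdigit()))


def h(number):
    """Placeholder hash function: remainder modulo M."""
    return number % M


# chains[i] holds the (name, phone_number) pairs whose hash value is i.
chains = [[] for _ in range(M)]


def chain_for(phone_number):
    return chains[h(convert_to_int(phone_number))]


def add_contact(name, phone_number):
    chain = chain_for(phone_number)
    for i, (_, number) in enumerate(chain):
        if number == phone_number:      # known number: update the name
            chain[i] = (name, phone_number)
            return
    chain.append((name, phone_number))


def look_up(phone_number):
    for name, number in chain_for(phone_number):
        if number == phone_number:
            return name
    return None                         # number is not in the phone book


def remove_contact(phone_number):
    chain = chain_for(phone_number)
    for i, (_, number) in enumerate(chain):
        if number == phone_number:
            del chain[i]
            return
```

Each operation first picks a single chain by hash value and then scans only that chain, so its cost depends on the chain length rather than on the total number of contacts.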
Implementation
Chains: array of chains
Each chain is a list of pairs (object, value)
HasKey(object)
    chain ← Chains[hash(object)]
    for (key, value) in chain:
        if key == object:
            return true
    return false
Implementation
Get(object)
    chain ← Chains[hash(object)]
    for (key, value) in chain:
        if key == object:
            return value
    return N/A
Implementation
Set(object, value)
    chain ← Chains[hash(object)]
    for pair in chain:
        if pair.key == object:
            pair.value ← value
            return
    chain.Append((object, value))
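HasKey, Get and Set can be put together in one small class. This is a sketch rather than a reference implementation: the class name ChainedMap is made up, Python's built-in hash() reduced modulo m stands in for the hash function, Python lists stand in for the chains, and None plays the role of N/A.

```python
class ChainedMap:
    """A map from keys to values implemented with chaining."""

    def __init__(self, m=8):
        self.m = m                               # cardinality of the hash function
        self.chains = [[] for _ in range(m)]     # one chain per hash value

    def _chain(self, key):
        # Assumption: built-in hash() modulo m plays the role of the hash function.
        return self.chains[hash(key) % self.m]

    def has_key(self, key):
        for k, _ in self._chain(key):
            if k == key:
                return True
        return False

    def get(self, key):
        for k, value in self._chain(key):
            if k == key:
                return value
        return None                              # stands in for N/A

    def set(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)          # key present: replace the value
                return
        chain.append((key, value))               # key absent: extend the chain


book = ChainedMap()
book.set(14052391717, "Sasha")
book.set(15025757575, "Helen")
print(book.get(14052391717))      # Sasha
print(book.has_key(1707773331))   # False
```

Because pairs are stored as immutable tuples, set replaces the whole pair instead of assigning to pair.value as the pseudocode does; the effect is the same.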
Asymptotics
Lemma
Let c be the length of the longest chain in
Chains. Then the worst-case running time of
HasKey, Get, Set is Θ(c + 1).
Asymptotics
Proof
If the object hashes to the longest
chain but is not found in it, we scan
all c items: Θ(c) = Θ(c + 1)
If c = 0, we still need O(1) time, thus
the need for “+1”
Asymptotics
Lemma
Let n be the number of different objects
currently in the map and m be the cardinality
of the hash function. Then the memory
consumption for chaining is Θ(n + m).
Asymptotics
Proof
Θ(n) to store n pairs (object, value)
Θ(m) for array Chains of size m
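Both lemmas can be observed on a small example. The sketch below assumes integer keys and the remainder modulo m as the hash function; build_chains and longest_chain are hypothetical helpers written for this illustration.

```python
def build_chains(keys, m):
    """Distribute integer keys over m chains using key % m as the hash function."""
    chains = [[] for _ in range(m)]
    for key in keys:
        chain = chains[key % m]
        if key not in chain:
            chain.append(key)
    return chains


def longest_chain(chains):
    """The value c from the running-time lemma."""
    return max(len(chain) for chain in chains)


spread = build_chains(range(16), 8)            # keys 0, 1, ..., 15
clustered = build_chains(range(0, 128, 8), 8)  # 16 keys, all multiples of 8

print(longest_chain(spread))     # 2
print(longest_chain(clustered))  # 16
```

Both tables use the same memory, m = 8 chains plus n = 16 stored keys, but in the second one every key lands in chain 0, so c = n and an unsuccessful search in that chain scans all 16 keys.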
Set
Definition
Set is a data structure with methods
Add(object), Remove(object),
Find(object).
Set
Examples
Students on campus
Phone numbers of contacts
Keywords in a programming language
Implementing Set
Find(object)
    chain ← Chains[hash(object)]
    for key in chain:
        if key == object:
            return true
    return false
Implementation
Add(object)
    chain ← Chains[hash(object)]
    for key in chain:
        if key == object:
            return
    chain.Append(object)
Implementation
Remove(object)
    if not Find(object):
        return
    chain ← Chains[hash(object)]
    chain.Erase(object)
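Find, Add and Remove translate into Python along the same lines. As before this is only a sketch: ChainedSet is a made-up name, built-in hash() modulo m stands in for the hash function, and Python lists stand in for the chains.

```python
class ChainedSet:
    """A set implemented with chaining."""

    def __init__(self, m=8):
        self.m = m
        self.chains = [[] for _ in range(m)]

    def _chain(self, obj):
        # Assumption: built-in hash() modulo m plays the role of the hash function.
        return self.chains[hash(obj) % self.m]

    def find(self, obj):
        return obj in self._chain(obj)   # linear scan of one chain

    def add(self, obj):
        chain = self._chain(obj)
        if obj not in chain:             # a set stores each object at most once
            chain.append(obj)

    def remove(self, obj):
        chain = self._chain(obj)
        if obj in chain:
            chain.remove(obj)            # erase the object from its chain


keywords = ChainedSet()
for word in ("for", "if", "return", "for"):
    keywords.add(word)
print(keywords.find("return"))  # True
keywords.remove("return")
print(keywords.find("return"))  # False
```

The pseudocode calls Find and then Erase, which walks the chain twice; the sketch keeps that structure, since both passes are bounded by the chain length.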
Hash Table
Definition
An implementation of a Set or a Map using
hashing is called a hash table.
Programming Languages
Set:
unordered_set in C++
HashSet in Java
set in Python
Map:
unordered_map in C++
HashMap in Java
dict in Python
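For Python, the set and map operations from this lecture correspond to built-in syntax on set and dict. The short example below uses phone numbers as strings; the variable names are illustrative.

```python
# Set: Add, Find, Remove
contacts = set()
contacts.add("01707773331")
contacts.add("14052391717")
print("01707773331" in contacts)   # True
contacts.discard("01707773331")    # removes the element if it is present
print("01707773331" in contacts)   # False

# Map: Set, HasKey, Get
phone_book = {}
phone_book["14052391717"] = "Sasha"
phone_book["15025757575"] = "Helen"
print("15025757575" in phone_book)    # True
print(phone_book.get("14052391717"))  # Sasha
print(phone_book.get("00000000000"))  # None
```

dict.get returns None for a missing key, which matches the N/A result of Get in the pseudocode.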
Conclusion