01 Phone-Book-Problem 07 Hash Tables 2 Hashfunctions
Hash Functions
Michael Levin
Higher School of Economics
Data Structures
Data Structures and Algorithms
Outline
1 Good Hash Functions
2 Universal Family
3 Hashing Integers
4 Hashing Strings
Phone Book
Design a data structure to store your
contacts: names of people along with their
phone numbers. The data structure should
be able to do the following quickly:
Add and delete contacts,
Lookup the phone number by name,
Determine who is calling given their
phone number.
We need two Maps:
(phone number → name) and
(name → phone number)
Implement these Maps as hash tables
First, we will focus on the Map from
phone numbers to names
Direct Addressing
int(123-45-67) = 1234567
Create array Name of size $10^L$, where L
is the maximum allowed phone number
length
Store the name corresponding to phone
number P in Name[int(P)]
If no contact with phone number P,
Name[int(P)] = N/A
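The direct addressing scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `DirectAddressPhoneBook`, its method names, and the use of `None` for N/A are hypothetical and do not come from the slides.

```python
class DirectAddressPhoneBook:
    """Direct addressing: one array cell per possible phone number."""

    def __init__(self, max_digits):
        # 10**max_digits cells, one for every number of up to max_digits digits.
        self.name = [None] * (10 ** max_digits)  # None plays the role of N/A

    @staticmethod
    def to_int(phone):
        # "123-45-67" -> 1234567
        return int(phone.replace("-", ""))

    def add(self, phone, name):
        self.name[self.to_int(phone)] = name

    def delete(self, phone):
        self.name[self.to_int(phone)] = None

    def lookup(self, phone):
        return self.name[self.to_int(phone)]


book = DirectAddressPhoneBook(max_digits=7)
book.add("123-45-67", "Natalie")
book.add("223-23-23", "Steve")
print(book.lookup("123-45-67"))  # Natalie
print(book.lookup("555-55-55"))  # None
```

Every operation is a single array access, but the array already has ten million cells for seven-digit numbers.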
Direct Addressing
[Figure: array Name indexed by phone number. Natalie: 123-45-67 → Name[1234567] = Natalie; Steve: 223-23-23 → Name[2232323] = Steve; all other cells hold N/A.]
Direct Addressing
Operations run in O(1)
Memory usage: $O(10^L)$, where L is the
maximum length of a phone number
Problematic with international numbers
of length 12 and more: we will need
$10^{12}$ bytes = 1 TB to store one person's
phone book. This won't fit in anyone's
phone!
Chaining
Select m = 1000
Last Digits
Select m = 1000
Hash function: take last three digits
h(800-123-45-67) = 567
Problem if many phone numbers end
with three zeros
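A small sketch of the last-three-digits hash and its weakness. The function name `last_digits_hash` and the list of numbers ending in 000 are made up for illustration.

```python
from collections import Counter

M = 1000  # number of cells, as on the slide

def last_digits_hash(phone):
    # Keep only the digits, then take the last three.
    digits = int("".join(ch for ch in phone if ch.isdigit()))
    return digits % M

print(last_digits_hash("800-123-45-67"))  # 567

# Hypothetical bad input: 50 different numbers that all end in 000.
bad_numbers = [f"800-{i:03d}-000" for i in range(50)]
cells = Counter(last_digits_hash(p) for p in bad_numbers)
print(cells)  # Counter({0: 50})
```

All 50 contacts land in cell 0, so every lookup has to scan one long chain.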
Random Value
Select m = 1000
Hash function: random number between
0 and 999
Uniform distribution of hash values
Different value when the hash function is
called again: we won't be able to find
anything!
Hash function must be deterministic
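To see why a random value fails as a hash function, here is a minimal sketch. The helper names `random_hash`, `add`, and `lookup` are hypothetical, and the fixed seed is there only to make the run repeatable.

```python
import random

M = 1000

def random_hash(phone):
    # Ignores the key entirely: uniform, but not deterministic.
    return random.randrange(M)

table = [[] for _ in range(M)]

def add(phone, name):
    table[random_hash(phone)].append((phone, name))

def lookup(phone):
    # Searches a freshly drawn cell, usually not the one used by add().
    for stored_phone, name in table[random_hash(phone)]:
        if stored_phone == phone:
            return name
    return None

random.seed(0)  # fixed seed only to make the demonstration repeatable
add("123-45-67", "Natalie")
misses = sum(lookup("123-45-67") is None for _ in range(1000))
print(misses)  # close to 1000: the contact is almost never found
```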
Good Hash Functions
Deterministic
Fast to compute
Distributes keys well into different cells
Few collisions
No Universal Hash Function
Lemma
If the number of possible keys is big (|U| ≫ m),
for any hash function h there is a bad input
resulting in many collisions.
[Figure: the universe U split into m = 3 regions by hash value, h(k) = 0, h(k) = 1 and h(k) = 2; one region is labeled 42%.]
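The lemma is a pigeonhole argument: with m cells, some cell receives at least |U|/m of the keys, and those keys form a bad input. A minimal sketch follows, assuming a small stand-in universe of 1000 integers and an arbitrary fixed hash function; `worst_cell` is a hypothetical helper.

```python
from collections import defaultdict

def worst_cell(h, universe, m):
    """Return the largest group of keys from `universe` that share one cell."""
    cells = defaultdict(list)
    for key in universe:
        cells[h(key)].append(key)
    return max(cells.values(), key=len)

m = 3
universe = range(1000)          # a small stand-in for U
h = lambda k: (7 * k + 5) % m   # any fixed hash function works here

bad_input = worst_cell(h, universe, m)
print(len(bad_input) * m >= len(universe))  # True: at least |U| / m keys
print(len({h(k) for k in bad_input}))       # 1: every key collides
```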
Idea
Remember QuickSort?
Choosing random pivot helped
Use randomization!
Define a family (set) of hash functions
Choose random function from the family
Universal Family
Definition
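Let U be the universe, the set of all
possible keys. A set of hash functions
$\mathcal{H} = \{h : U \to \{0, 1, 2, \dots, m - 1\}\}$
is called a universal family if for any two
keys $x, y \in U$, $x \ne y$, the probability of
collision is
$\Pr[h(x) = h(y)] \le \frac{1}{m}$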
Universal Family
$\Pr[h(x) = h(y)] \le \frac{1}{m}$
means that a collision $h(x) = h(y)$ on
selected keys x and y, $x \ne y$, happens for no
more than $\frac{1}{m}$ of all hash functions $h \in \mathcal{H}$.
How Randomization Works
Lemma
$\mathcal{H}_p = \{ h_p^{a,b}(x) = ((ax + b) \bmod p) \bmod m \}$
for all $a, b$: $1 \le a \le p - 1$, $0 \le b \le p - 1$
is a universal family
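For small parameters the lemma can be checked by brute force. The sketch below is not a proof: it enumerates every function in the family for the illustrative values p = 17 and m = 5 (chosen here, not taken from the slides) and counts, for each pair of distinct keys in 0..p-1, the fraction of functions that collide.

```python
from fractions import Fraction
from itertools import combinations

def h(a, b, p, m, x):
    return ((a * x + b) % p) % m

def max_collision_fraction(p, m):
    """Largest fraction of functions in H_p colliding on some pair x != y."""
    total = (p - 1) * p  # number of (a, b) pairs
    worst = Fraction(0)
    for x, y in combinations(range(p), 2):
        colliding = sum(
            h(a, b, p, m, x) == h(a, b, p, m, y)
            for a in range(1, p)
            for b in range(p)
        )
        worst = max(worst, Fraction(colliding, total))
    return worst

p, m = 17, 5  # small illustrative values; p is prime, keys are 0..p-1
worst = max_collision_fraction(p, m)
print(worst, worst <= Fraction(1, m))  # 21/136 True
```

Exact fractions are used instead of floats so that the comparison with 1/m is not affected by rounding.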
Hashing Phone Numbers
Example
Select a = 34, b = 2, so $h = h_p^{34,2}$, and
consider x = 1 482 567 corresponding to
phone number 148-25-67. p = 10 000 019.
(34 × 1482567 + 2) mod 10000019 = 407185
407185 mod 1000 = 185
h(x) = 185
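The arithmetic of this example can be reproduced directly. The table size m = 1000 is read off the final step (mod 1000); the function name `h` and its default arguments are just a convenient packaging.

```python
def h(x, a=34, b=2, p=10_000_019, m=1000):
    return ((a * x + b) % p) % m

x = 1_482_567
print((34 * x + 2) % 10_000_019)  # 407185
print(h(x))                        # 185
```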
General Case
Define maximum length L of a phone
number
Convert phone numbers to integers from
0 to $10^L - 1$
Choose prime number $p > 10^L$
Choose hash table size m
Choose random hash function from
universal family $\mathcal{H}_p$ (choose random
$a \in [1, p - 1]$ and $b \in [0, p - 1]$)
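The steps above can be put together into a small chained hash table. This is a sketch, not the course's reference implementation: the class `PhoneToName`, its methods `set` and `get`, and the defaults L = 7, p = 10 000 019, m = 1000 are assumptions chosen to match the earlier example.

```python
import random

class PhoneToName:
    def __init__(self, p=10_000_019, m=1000, rng=random):
        # p: prime larger than 10**L (here L = 7); m: table size
        self.p, self.m = p, m
        # Pick one random function from the family: a in [1, p-1], b in [0, p-1].
        self.a = rng.randint(1, p - 1)
        self.b = rng.randint(0, p - 1)
        self.cells = [[] for _ in range(m)]  # one chain per cell

    def _hash(self, x):
        return ((self.a * x + self.b) % self.p) % self.m

    def set(self, phone, name):
        x = int(phone.replace("-", ""))
        chain = self.cells[self._hash(x)]
        for i, (key, _) in enumerate(chain):
            if key == x:
                chain[i] = (x, name)  # overwrite an existing contact
                return
        chain.append((x, name))

    def get(self, phone):
        x = int(phone.replace("-", ""))
        for key, name in self.cells[self._hash(x)]:
            if key == x:
                return name
        return None


book = PhoneToName()
book.set("148-25-67", "Natalie")
print(book.get("148-25-67"))  # Natalie
print(book.get("223-23-23"))  # None
```

The function is chosen once, when the table is created, and then stays fixed, so lookups remain deterministic.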
Lookup Phone Numbers by Name
Now we need to implement the Map
from names to phone numbers
Can also use chaining
Need a hash function defined on names
Hash arbitrary strings of characters
You will learn how string hashing is
implemented in Java!
String Length Notation
Definition
Denote by |S| the length of string S.
Examples
|“a”| = 1
|“ab”| = 2
|“abcde”| = 5
Hashing Strings
Given a string S, compute its hash value
S = S[0]S[1] . . . S[|S| − 1], where S[i] are
individual characters
We should use all the characters in the
hash function
Otherwise there will be many collisions:
For example, if S[0] is not used,
h(“aa”) = h(“ba”) = · · · = h(“za”)
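The collision claim is easy to reproduce with a deliberately flawed hash. The function below, a sum of character codes that skips S[0], is a made-up example and not a hash function from the slides.

```python
def skip_first_char_hash(s, m=1000):
    # Deliberately flawed: S[0] never influences the result.
    return sum(ord(ch) for ch in s[1:]) % m

values = {skip_first_char_hash(c + "a") for c in "abcdefghijklmnopqrstuvwxyz"}
print(values)  # {97}
```

All 26 strings "aa", "ba", ..., "za" receive the same value.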
Preparation
Proof idea
This follows from the fact that the equation
$a_0 + a_1 x + a_2 x^2 + \dots + a_L x^L \equiv 0 \pmod{p}$ for
prime p has at most L different solutions x.
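The bound on the number of solutions can be checked by brute force for a small prime. The helper `roots_mod_p`, the prime p = 101, and the polynomial are illustrative choices; the bound assumes the polynomial is not identically zero modulo p.

```python
def roots_mod_p(coeffs, p):
    """Brute-force all x in 0..p-1 with a_0 + a_1 x + ... + a_L x^L = 0 (mod p)."""
    def value(x):
        result = 0
        for a in reversed(coeffs):  # Horner's rule, highest coefficient first
            result = (result * x + a) % p
        return result
    return [x for x in range(p) if value(x) == 0]

p = 101
coeffs = [6, -5, 1]  # a_0, a_1, a_2 for x^2 - 5x + 6 = (x - 2)(x - 3)
roots = roots_mod_p(coeffs, p)
print(roots)  # [2, 3]
```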
Cardinality Fix
Proof
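$\frac{1}{m} + \frac{L}{p} < \frac{1}{m} + \frac{L}{mL} = \frac{1}{m} + \frac{1}{m} = \frac{2}{m} = O\left(\frac{1}{m}\right)$
(the strict inequality holds when $p > mL$)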
Running Time
For big enough p we again have
$c = O(1 + \alpha)$
Computing PolyHash(S) runs in time
O(|S|)
If lengths of the names in the phone
book are bounded by constant L,
computing h(S) takes O(L) = O(1)
time
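The definition of PolyHash does not appear in this text, so the sketch below assumes the usual polynomial hash, the sum of S[i]·x^i modulo a prime p, evaluated with Horner's rule. The name `poly_hash`, the prime 1 000 000 007, and x = 263 are illustrative assumptions; in the randomized scheme x would be drawn at random from 1 to p − 1.

```python
def poly_hash(s, p, x):
    """Polynomial hash: sum of ord(s[i]) * x**i, taken mod p, in O(|S|) time."""
    result = 0
    for ch in reversed(s):  # Horner's rule, last character first
        result = (result * x + ord(ch)) % p
    return result

p = 1_000_000_007  # a prime; assumed value for illustration
x = 263            # fixed here for repeatability; normally random in [1, p-1]
print(poly_hash("ab", p, x))  # 25871 = 97 + 98 * 263
```

The loop does one multiplication and one addition per character, which is where the O(|S|) running time comes from.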
Conclusion
You learned how to hash integers and
strings
Phone book can be implemented as two
hash tables
Mapping phone numbers to names and
back
Search and modification run on average
in O(1)!