Hash Table

                        insert     find       delete
Unsorted linked-list    O(1)       O(n)       O(n)
Unsorted array          O(1)       O(n)       O(n)
Sorted linked list      O(n)       O(n)       O(n)
Sorted array            O(n)       O(log n)   O(n)
Balanced tree           O(log n)   O(log n)   O(log n)
Magic array             O(1)       O(1)       O(1)
Sufficient magic:
Use key to compute array index for an item in O(1) time [doable]
Have a different index for every item [magic]
11/3/16
What structure is appropriate?
Tree?
List?
Array?
Hash Tables
Basic idea:
hash function: index = h(key)
hash table: an array with indices 0 to TableSize - 1
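To make the basic idea concrete, here is a minimal sketch in Java. The class name, the table size of 7, the sample keys, and the choice h(key) = key mod TableSize are all illustrative assumptions, and collisions are deliberately ignored at this point.

```java
// Minimal sketch: an int key is mapped to an array slot by index = h(key).
// TABLE_SIZE, the class name, and the sample keys are illustrative assumptions.
class BasicHashIdea {
    static final int TABLE_SIZE = 7;

    // h(key): reduce the key to a valid index in 0 .. TABLE_SIZE - 1.
    static int h(int key) {
        return Math.floorMod(key, TABLE_SIZE);
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[TABLE_SIZE];
        int[] keys = {7, 18, 41};          // these three keys happen not to collide
        for (int key : keys) {
            table[h(key)] = key;           // compute the index, then store: O(1)
        }
        for (int i = 0; i < TABLE_SIZE; i++) {
            System.out.println(i + ": " + table[i]);
        }
    }
}
```

With these keys, 7 lands at index 0, 18 at index 4, and 41 at index 6; the other slots stay empty (null).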
Hash Tables
There are m possible keys (m typically large, even
infinite)
We expect our table to have only n items
n is much less than m (often written n << m)
Many dictionaries have this property
Compiler: All possible identifiers allowed by the language vs.
those used in some file of one program
Database: All possible student names vs. students enrolled
AI: All possible chess-board configurations vs. those
considered by the current player
Hash functions
An ideal hash function:
Fast to compute
Rarely hashes two used keys to the same index
Often impossible in theory but easy in practice
Will handle collisions later
[Figure: a hash table with indices 0 to 6 holding the keys 18 and 41, followed by a second table holding the keys 41, 34, 7, 18]
h(K) = f(K) % P
P is typically the TableSize
P is often chosen to be prime:
Reduces likelihood of collisions due to patterns in
data
Is useful for guarantees on certain hashing strategies (as we'll see)
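A small experiment can illustrate why a prime P helps with patterned data. The sketch below assumes f(K) = K and uses keys that are all multiples of 10; the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: compare a non-prime and a prime modulus on patterned keys.
// Assumes f(K) = K, so h(K) = K % p. Names are illustrative.
class PrimeTableSize {
    // Counts how many distinct indices the keys occupy under h(K) = K % p.
    static int distinctIndices(int[] keys, int p) {
        Set<Integer> used = new HashSet<>();
        for (int k : keys) {
            used.add(Math.floorMod(k, p));
        }
        return used.size();
    }

    public static void main(String[] args) {
        int[] keys = new int[10];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = 10 * i;              // 0, 10, 20, ..., 90
        }
        System.out.println("P = 10: " + distinctIndices(keys, 10) + " distinct indices");
        System.out.println("P = 11: " + distinctIndices(keys, 11) + " distinct indices");
    }
}
```

With P = 10 every key lands on index 0, while with P = 11 the ten keys spread over ten different indices. This is one hand-picked pattern, not a proof, but it shows the kind of regularity a prime modulus tends to break up.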
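For a string key K = s_0 s_1 ... s_{m-1} (where each s_i is a character, s_i \in [0, 128]):
1. h(K) = s_0 % TableSize
   H(batman) = H(ballgame)
2. h(K) = ( \sum_{i=0}^{m-1} s_i ) % TableSize
   H(spot) = H(pots)
3. h(K) = ( \sum_{i=0}^{m-1} s_i \cdot 37^i ) % TableSize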
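The three candidate string hash functions can be sketched in Java as follows. The class and method names are hypothetical, keys are assumed to be non-empty, and the third function reduces modulo the table size at every step (an implementation choice made here to avoid overflow; it yields the same result as reducing the full sum once).

```java
// Sketch of the three string hash functions. Names are illustrative;
// keys are assumed to be non-empty strings.
class StringHashes {
    // 1. Only the first character contributes.
    static int h1(String k, int tableSize) {
        return k.charAt(0) % tableSize;
    }

    // 2. Sum of all characters: ignores character order.
    static int h2(String k, int tableSize) {
        int sum = 0;
        for (int i = 0; i < k.length(); i++) {
            sum += k.charAt(i);
        }
        return sum % tableSize;
    }

    // 3. Sum of s_i * 37^i: character position now matters.
    static int h3(String k, int tableSize) {
        long hash = 0;
        long power = 1;                    // holds 37^i mod tableSize
        for (int i = 0; i < k.length(); i++) {
            hash = (hash + k.charAt(i) * power) % tableSize;
            power = (power * 37) % tableSize;
        }
        return (int) hash;
    }

    public static void main(String[] args) {
        int size = 101;                    // arbitrary prime chosen for the demo
        System.out.println(h1("batman", size) == h1("ballgame", size)); // true
        System.out.println(h2("spot", size) == h2("pots", size));       // true
        System.out.println(h3("spot", size) == h3("pots", size));       // false
    }
}
```

The first function collides for any two keys with the same first letter, and the second collides for any two anagrams. The third separates "spot" and "pots" for this table size, at the cost of one multiplication per character.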
What to hash?
We will focus on the two most common things to hash:
ints and strings
For objects with several fields, usually best to have most of
the identifying fields contribute to the hash to avoid
collisions
Example:
class Person {
String first; String middle; String last;
Date birthdate;
}
An inherent trade-off: hashing-time vs. collision-avoidance
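One way to picture this trade-off is to give Person two hash methods. The sketch below is an assumption-laden illustration, not a prescribed design: `cheapHash` is a hypothetical helper that looks at one field only, while `hashCode` combines every identifying field using `java.util.Objects.hash`.

```java
import java.util.Date;
import java.util.Objects;

// Sketch: two ways to hash a Person. The constructor and cheapHash
// are hypothetical additions for illustration.
class Person {
    String first; String middle; String last;
    Date birthdate;

    Person(String first, String middle, String last, Date birthdate) {
        this.first = first;
        this.middle = middle;
        this.last = last;
        this.birthdate = birthdate;
    }

    // Fast to compute, but everyone sharing a last name collides.
    int cheapHash() {
        return Objects.hashCode(last);
    }

    // More work per call, but all identifying fields contribute.
    @Override
    public int hashCode() {
        return Objects.hash(first, middle, last, birthdate);
    }

    // Equal objects must report equal hash codes, so equals uses the same fields.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Person)) return false;
        Person p = (Person) o;
        return Objects.equals(first, p.first) && Objects.equals(middle, p.middle)
            && Objects.equals(last, p.last) && Objects.equals(birthdate, p.birthdate);
    }
}
```

Two people named Smith always share a `cheapHash`, whereas `hashCode` will usually tell them apart because their first names or birthdates differ.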
Deep Breath
Recap
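[Figure: hashing pipeline. The client's key of type E is hashed to an int, the hash table turns the int into a table-index between 0 and TableSize - 1, and a collision, if any, goes to collision resolution.]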
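The first two stages of this pipeline can be sketched as below, assuming the client's part is Java's standard `hashCode()` and the table's part is a modulo reduction. The class and method names are hypothetical, and collision handling is left out.

```java
import java.util.Objects;

// Sketch of the E -> int -> table-index stages. Names are illustrative.
class HashPipeline {
    static <E> int tableIndex(E key, int tableSize) {
        // Stage 1: the key's own hashCode() turns an E into an int
        // (Objects.hashCode returns 0 for a null key).
        int hash = Objects.hashCode(key);
        // Stage 2: reduce the int to a valid index. floorMod is used instead
        // of % because hashCode() may be negative.
        return Math.floorMod(hash, tableSize);
    }

    public static void main(String[] args) {
        System.out.println(tableIndex("spot", 10));
        System.out.println(tableIndex(-3, 10));   // 7, not -3
    }
}
```

Using `Math.floorMod` rather than `%` is a small but important design choice here: a plain `%` would produce a negative index for a negative hash code.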
Collision resolution
Collision:
When two keys map to the same location in
the hash table
We try to avoid it, but number-of-keys exceeds
table size
So hash tables should support collision
resolution
Ideas?
Separate Chaining
Chaining:
All keys that map to the same
table location are kept in a list
(a.k.a. a chain or bucket)
As easy as it sounds
Example:
insert 10, 22, 107, 12, 42
with mod hashing
and TableSize = 10
Resulting table (each new key goes at the front of its chain):
0: 10 /
1: /
2: 42 -> 12 -> 22 /
3: /
4: /
5: /
6: /
7: 107 /
8: /
9: /
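The chaining example can be reproduced with a short Java sketch. The class name and its methods are hypothetical; it assumes int keys, mod hashing, and insertion at the front of each chain, which matches the order in which the keys appear in the slides' bucket for index 2.

```java
import java.util.LinkedList;

// Minimal separate-chaining sketch for int keys. Names are illustrative.
class ChainedHashTable {
    private final LinkedList<Integer>[] table;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int tableSize) {
        table = new LinkedList[tableSize];
        for (int i = 0; i < tableSize; i++) {
            table[i] = new LinkedList<>();   // every slot starts with an empty chain
        }
    }

    // Mod hashing: index = key mod TableSize.
    private int h(int key) {
        return Math.floorMod(key, table.length);
    }

    void insert(int key) {
        LinkedList<Integer> chain = table[h(key)];
        if (!chain.contains(key)) {
            chain.addFirst(key);             // new keys go at the front of the chain
        }
    }

    boolean find(int key) {
        return table[h(key)].contains(key);
    }

    boolean delete(int key) {
        return table[h(key)].remove(Integer.valueOf(key));
    }

    LinkedList<Integer> chainAt(int index) {
        return table[index];
    }

    public static void main(String[] args) {
        ChainedHashTable t = new ChainedHashTable(10);
        for (int key : new int[] {10, 22, 107, 12, 42}) {
            t.insert(key);
        }
        for (int i = 0; i < 10; i++) {
            System.out.println(i + ": " + t.chainAt(i));
        }
    }
}
```

After the five inserts, index 0 holds [10], index 2 holds [42, 12, 22], index 7 holds [107], and all other chains are empty. find and delete only ever scan the one chain selected by the hash, so their cost depends on chain length rather than on the total number of keys.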
number of elements / TableSize