CS 03
Dictionaries
• A dictionary stores elements.
• Each element has an associated key.
• Elements can be retrieved quickly using the key.
• Examples of keys
— Roll number for student records
— Account number for bank accounts
— PAN number for Tax payments
The Dictionary ADT
• size()
• isEmpty()
• elements()
• findElement (k)
• findAllElements (k)
• insertItem (k, e)
• removeElement (k)
• removeAllElements (k)
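As a rough illustration of the operations listed above, here is a minimal sketch of the ADT backed by an unordered Python list. The class name `ListDictionary`, the snake_case method names, and the return values (for example, returning `None` when a key is absent) are assumptions made for this sketch; the slides do not specify them.

```python
class ListDictionary:
    """Dictionary ADT sketch: an unordered list of (key, element) pairs."""

    def __init__(self):
        self._items = []

    def size(self):
        return len(self._items)

    def is_empty(self):
        return not self._items

    def elements(self):
        # All stored elements, in insertion order.
        return [e for _, e in self._items]

    def find_element(self, k):
        # First element stored under key k, or None (assumed convention).
        for key, e in self._items:
            if key == k:
                return e
        return None

    def find_all_elements(self, k):
        return [e for key, e in self._items if key == k]

    def insert_item(self, k, e):
        # Duplicate keys are allowed, which is why the "all" variants exist.
        self._items.append((k, e))

    def remove_element(self, k):
        # Remove and return the first element with key k, or None.
        for i, (key, e) in enumerate(self._items):
            if key == k:
                del self._items[i]
                return e
        return None

    def remove_all_elements(self, k):
        removed = self.find_all_elements(k)
        self._items = [(key, e) for key, e in self._items if key != k]
        return removed
```

Every lookup here scans the whole list, so this version is only a baseline for the faster structures that follow.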
Using an ordered sequence
• If the key-element pairs are stored as a sequence ordered by key, then
searching can be done in O(log₂ n) time using binary search.
5 8 9 13 17 21 22 43 48 50 55 58 59 60 78 90
Binary Search
• Algorithm BinarySearch (A, k, low, high)
  if low > high then return Nil
  else
      mid ← ⌊(low + high)/2⌋
      if k = A[mid] then return mid
      elseif k < A[mid] then
          return BinarySearch (A, k, low, mid-1)
      else return BinarySearch (A, k, mid+1, high)
5 8 9 13 17 21 22 43 48 50 55 58 59 60 78 90
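The recursive algorithm above can be sketched in Python as follows. The function name `binary_search`, the 0-based indexing, and the use of `None` for Nil are choices made for this sketch; the slide's pseudocode does not fix them.

```python
def binary_search(a, k, low, high):
    """Return an index i in low..high with a[i] == k, or None if absent.

    Assumes a is sorted in ascending order.
    """
    if low > high:
        return None
    mid = (low + high) // 2  # integer (floor) division
    if k == a[mid]:
        return mid
    if k < a[mid]:
        return binary_search(a, k, low, mid - 1)
    return binary_search(a, k, mid + 1, high)


keys = [5, 8, 9, 13, 17, 21, 22, 43, 48, 50, 55, 58, 59, 60, 78, 90]
print(binary_search(keys, 22, 0, len(keys) - 1))  # prints 6
print(binary_search(keys, 23, 0, len(keys) - 1))  # prints None
```

Each call halves the remaining range, so at most about log₂ n + 1 elements are compared with k.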
Binary Search (Iterative)
• low ← 1
  high ← n
  while low ≤ high do
      mid ← ⌊(low + high)/2⌋
      if A[mid] = k then return mid
      elseif A[mid] > k then high ← mid-1
      else low ← mid+1
  return Nil
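A possible Python rendering of the iterative version is sketched below, again with 0-based indices and `None` for Nil (both assumptions of this sketch). The loop condition is tested before the first probe, so an empty sequence is handled without reading any element. The standard-library function `bisect.bisect_left` is used only as an independent cross-check.

```python
from bisect import bisect_left


def binary_search_iter(a, k):
    """Iterative binary search on an ascending list; returns an index or None."""
    low, high = 0, len(a) - 1
    while low <= high:
        mid = (low + high) // 2
        if a[mid] == k:
            return mid
        if a[mid] > k:
            high = mid - 1
        else:
            low = mid + 1
    return None


def bisect_search(a, k):
    """Same result via the standard library, for comparison."""
    i = bisect_left(a, k)
    return i if i < len(a) and a[i] == k else None


keys = [5, 8, 9, 13, 17, 21, 22, 43, 48, 50, 55, 58, 59, 60, 78, 90]
print(binary_search_iter(keys, 58))  # prints 11
```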
Using an unordered sequence
• j ← 1
  while j ≤ n and A[j] ≠ k
      do j ← j + 1
  if j ≤ n then return j
  else return Nil
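The linear scan above might look like this in Python. The name `linear_search`, 0-based indexing, and `None` for Nil are assumptions of this sketch. In the worst case (key absent or in the last position) all n elements are examined, so the search takes O(n) time.

```python
def linear_search(a, k):
    """Scan an unordered list left to right; return the first index of k or None."""
    j = 0
    while j < len(a) and a[j] != k:
        j += 1
    return j if j < len(a) else None


unordered = [43, 5, 90, 17, 8, 60]
print(linear_search(unordered, 17))  # prints 3
print(linear_search(unordered, 99))  # prints None
```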
Use an array indexed by key
• Let the keys be in the range 0 to r-1.
• Allocate an array of size r
• Space required: O(r)
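A minimal sketch of such a key-indexed array follows. The class name `DirectAddressTable`, the method names, and the assumption that each key holds at most one element are illustrative choices, not part of the slides.

```python
class DirectAddressTable:
    """Array indexed directly by key; keys must lie in 0..r-1."""

    def __init__(self, r):
        self._r = r
        self._slots = [None] * r  # O(r) space, regardless of how many items are stored

    def _check(self, k):
        if not 0 <= k < self._r:
            raise KeyError(k)

    def insert_item(self, k, e):
        self._check(k)
        self._slots[k] = e  # O(1): one array write

    def find_element(self, k):
        self._check(k)
        return self._slots[k]  # O(1): one array read; None if empty

    def remove_element(self, k):
        self._check(k)
        e, self._slots[k] = self._slots[k], None
        return e
```

All operations take constant time, but the array is as large as the key range. That is wasteful when r is much larger than the number of stored items, which motivates the hash table.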
Hash Table
• Given n (key, element) pairs
• A hash table is a table of size m
• The original key is first mapped to an integer
• The integer is then mapped to a table index by taking it modulo m
Hash Function
• A hash function maps a key to an index in the hash table.
• Multiple keys can get mapped to the same location (a clash, or collision)
Collision Resolution
• Chaining
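[Figure: hash table with slots 0 to 11. Slots 0, 4, and 11 each head a linked list (chain) of three, two, and one elements respectively; the other slots are empty.]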
Collision Resolution — Chaining
• Find/Insert/Delete
  • Use the hash function to get the slot for that key.
  • Find/insert/delete the element in the linked list of the hashed slot.
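The two steps above can be sketched as follows. This is an illustration under stated assumptions: Python lists stand in for the linked lists, the built-in `hash` followed by `% m` plays the role of the hash function, and the class and method names (`ChainedHashTable`, `insert_item`, and so on) are made up for the example.

```python
class ChainedHashTable:
    """Hash table with collision resolution by chaining."""

    def __init__(self, m=12):
        self._m = m
        self._slots = [[] for _ in range(m)]  # one (initially empty) chain per slot

    def _chain(self, k):
        # Step 1: use the hash function to get the slot for this key.
        return self._slots[hash(k) % self._m]

    # Step 2: work only on the chain of the hashed slot.
    def insert_item(self, k, e):
        self._chain(k).append((k, e))

    def find_element(self, k):
        for key, e in self._chain(k):
            if key == k:
                return e
        return None

    def remove_element(self, k):
        chain = self._chain(k)
        for i, (key, e) in enumerate(chain):
            if key == k:
                del chain[i]
                return e
        return None


table = ChainedHashTable(m=12)
for key in (5, 17, 29):  # all three keys are congruent to 5 modulo 12
    table.insert_item(key, f"element-{key}")
print(table.find_element(17))  # prints element-17
```

The three keys collide in the same slot, yet each remains retrievable because the chain is searched by key.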
Analysis of Hashing
• Assume that the hash function h(k) takes Θ(1) time to compute
• The hash function h(·) maps the universe U of keys into the slots of the
hash table T[0 … m − 1]:
h : U → {0, 1, …, m − 1}
• A good hash function distributes the keys evenly amongst the slots.
Analysis of Hashing
• For our analysis, we will consider a simple uniform hash function
• The load factor λ = n/m, where n is the number of items stored in a hash
table with m slots
Analysis of Hashing
• Unsuccessful Search
• Element being searched is not in the linked list
Analysis of Hashing
• Successful Search
• Again assume a simple uniform hash function
• At the time of inserting the i-th element, the expected length of the
list is (i − 1)/m.
Analysis of Hashing
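• The expected number of elements examined is:
$$1 + \frac{1}{n}\sum_{i=1}^{n}\frac{i-1}{m} = 1 + \frac{1}{nm}\sum_{i=1}^{n}(i-1) = 1 + \frac{1}{nm}\cdot\frac{n(n-1)}{2}$$
$$= 1 + \frac{n-1}{2m} = 1 + \frac{n}{2m} - \frac{1}{2m} = 1 + \frac{\lambda}{2} - \frac{1}{2m}$$
• Including the time for computing the hash function, the complexity of search
is $\Theta\left(2 + \frac{\lambda}{2} - \frac{1}{2m}\right) = \Theta(1 + \lambda)$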
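As a sanity check on the algebra, the sum and its closed form can be compared exactly with rational arithmetic. The function names below are hypothetical and exist only for this check; `fractions.Fraction` is from the Python standard library.

```python
from fractions import Fraction


def expected_examined_sum(n, m):
    """1 + (1/n) * sum over i = 1..n of (i - 1)/m, evaluated term by term."""
    return 1 + Fraction(1, n) * sum(Fraction(i - 1, m) for i in range(1, n + 1))


def expected_examined_closed(n, m):
    """Closed form 1 + lambda/2 - 1/(2m), with lambda = n/m."""
    lam = Fraction(n, m)
    return 1 + lam / 2 - Fraction(1, 2 * m)


# With n = m = 12 (load factor 1) both expressions give 35/24, about 1.46.
print(expected_examined_sum(12, 12), expected_examined_closed(12, 12))
```

So at load factor 1, a successful search examines fewer than one and a half elements on average under the uniform hashing assumption.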
Analysis of Hashing
• We generally take the number of hash table slots to be proportional to the
number of elements in the table. ∴ n = O(m)
• λ = n/m = O(m)/m = O(1)
Hash Functions
• A good hash function should
• Be quick to compute
• Distribute keys uniformly over the slots
Hash Code Maps
• Integer Cast
• Component Sum
• Polynomial Accumulation
Use the ASCII/Unicode equivalent of tokens as coefficients of a
polynomial
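To make the difference between the last two hash codes concrete, here is a small sketch for string keys. It assumes that the "components" being summed are the character codes, that the polynomial is evaluated at a = 33 with Horner's rule, and that the result is truncated to 32 bits. None of these specifics come from the slides.

```python
def component_sum(s):
    """Hash code: sum of the character codes of s."""
    return sum(ord(c) for c in s)


def polynomial_hash(s, a=33):
    """Hash code: character codes as polynomial coefficients, evaluated at a.

    Horner's rule computes s[0]*a^(n-1) + ... + s[n-1] with one multiply per
    character; the mask keeps the value within 32 bits (an assumed word size).
    """
    h = 0
    for c in s:
        h = (h * a + ord(c)) & 0xFFFFFFFF
    return h


# Anagrams collide under the component sum but not under the polynomial code.
print(component_sum("stop"), component_sum("pots"))      # prints 454 454
print(polynomial_hash("stop"), polynomial_hash("pots"))  # prints 4262854 4149766
```

Because the polynomial code depends on character positions, reordering the characters changes the result, which the plain sum cannot detect.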
Compression Maps
• h(k) = k mod m               k: key, m: #slots
• h(k) = |ak + b| mod m        a, b: integer constants with a mod m ≠ 0
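Both compression maps are one-liners in Python. The function names and the sample constants a = 3, b = 7, m = 12 are picked only for illustration. The final lines show why a multiple of m is a poor choice for a: every key then lands in the same slot.

```python
def division_map(k, m):
    """h(k) = k mod m."""
    return k % m


def mad_map(k, m, a, b):
    """h(k) = |a*k + b| mod m (multiply, add, then divide)."""
    return abs(a * k + b) % m


m = 12
print(division_map(17, m))        # prints 5
print(mad_map(17, m, a=3, b=7))   # prints 10, since |3*17 + 7| = 58 and 58 mod 12 = 10

# Degenerate case: a is a multiple of m, so a*k vanishes modulo m.
print({mad_map(k, m, a=12, b=7) for k in range(100)})  # prints {7}
```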