CS 240 Tutorial 9 Notes
Dictionary/Associative array: An abstract data type which holds a collection of (key, value) pairs, where
each key appears at most once.
This allows one to store a value in the associative array using a key as an index. The value can later
be extracted if one knows the key it was stored under.
Typical operations are
insert(k, v): add value v, associated to key k
delete(k): remove item associated to key k
search(k): return value associated to key k (if one exists)
For simplicity, keys are usually assumed to be integers. (If not, we can always map the keys to integers
first.) Also, say the maximum key is M.
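The three ADT operations map directly onto Python's built-in dict (itself a hash table under the hood); a quick illustration:

```python
# The dictionary ADT operations, illustrated with Python's built-in dict.
d = {}
d[42] = "answer"        # insert(42, "answer")
print(d.get(42))        # search(42) -> answer
del d[42]               # delete(42)
print(d.get(42))        # search(42) -> None (key no longer present)
```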
Question: How would one go about implementing an associative array?
                                       Insert                        Search                      Delete
Unsorted array/linked list             O(1) (add to end)             O(n) (brute force)          O(n) (brute force)
Sorted array                           O(n) (shifting)               O(log n) (binary search)    O(n) (shifting)
Balanced search tree (e.g., AVL)       O(log n) (search + rotation)  O(log n) (max height)       O(log n)
Direct addressing                      O(1)                          O(1)                        O(1)
  (array A of size M, store v at A[k])
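The direct-addressing row can be sketched in a few lines (the class and variable names here are illustrative, not from the course code):

```python
# A minimal direct-addressing sketch: an array A of size M, storing value v at A[k].
# Every operation is a single array access, hence O(1).
class DirectAddressTable:
    def __init__(self, M):
        self.A = [None] * M          # one slot per possible key 0..M-1

    def insert(self, k, v):
        self.A[k] = v                # write directly at index k

    def search(self, k):
        return self.A[k]             # read directly; None if absent

    def delete(self, k):
        self.A[k] = None             # clear the slot

t = DirectAddressTable(10)
t.insert(3, "apple")
print(t.search(3))   # apple
```

The obvious drawback: the table uses Θ(M) space even when it holds only a handful of keys, which motivates hashing.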
As a running example, take a hash table A with |A| = 5 slots and hash function

    h1(w) = (sum_{i=1}^{k} ascii(w_i)) mod 5,

where w = w_1 w_2 ... w_k. A secondary hash function (used later for double hashing) is

    h2(w) = 1 + ((sum_{i=1}^{k} ascii(w_i) * 3^(k-i)) mod 4).

For the words to be inserted:

    h1(aardvark) = 844 mod 5 = 4
    h1(aback)    = 498 mod 5 = 3
    h1(abacus)   = 623 mod 5 = 3
    h1(abaft)    = 510 mod 5 = 0
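The ASCII sums can be reproduced with a short script (h1 here is an assumed name for the first hash function):

```python
# Sketch of the first hash function: h1(w) = (sum of ASCII codes of w) mod 5.
def h1(w):
    return sum(ord(c) for c in w) % 5

for w in ["aardvark", "aback", "abacus", "abaft"]:
    print(w, sum(ord(c) for c in w), h1(w))
# aardvark 844 4
# aback 498 3
# abacus 623 3
# abaft 510 0
```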
Using chaining:

insert aardvark: h1 = 4; aardvark goes to A[4].
insert aback:    h1 = 3; aback goes to A[3].
insert abacus:   h1 = 3; prepended to the chain, so A[3] holds abacus -> aback.
insert abaft:    h1 = 0; abaft goes to A[0].

Final table: A[0]: abaft, A[1]: (empty), A[2]: (empty), A[3]: abacus -> aback, A[4]: aardvark.
Note: Values are inserted at the start of the linked list. This keeps insertion at O(1) cost.
However, in the worst case all items hash to the same location, and search/delete cost O(n).
But if the hash function is chosen properly this behaviour is unlikely. Assuming each hash value is equally
likely to occur, search/delete cost O(1 + n/|A|) in the average case. If we take |A| ∈ Θ(n) then this is O(1).
This makes sense intuitively: if you want to store n items in A, you probably want |A| ∈ Θ(n) to avoid
excessive chaining.
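A minimal chaining table over the running example can be sketched as follows (class name assumed; Python lists stand in for the linked lists, with new items prepended so insertion stays O(1)):

```python
# Chaining hash table over |A| = 5 slots, using the ASCII-sum hash h1.
def h1(w):
    return sum(ord(c) for c in w) % 5

class ChainedHashTable:
    def __init__(self, size=5):
        self.A = [[] for _ in range(size)]   # one chain per slot

    def insert(self, key, value):
        self.A[h1(key)].insert(0, (key, value))   # prepend: O(1)

    def search(self, key):
        for k, v in self.A[h1(key)]:              # scan the chain: O(chain length)
            if k == key:
                return v
        return None

t = ChainedHashTable()
for w in ["aardvark", "aback", "abacus", "abaft"]:
    t.insert(w, len(w))
print([[k for k, _ in chain] for chain in t.A])
# [['abaft'], [], [], ['abacus', 'aback'], ['aardvark']]
```

Note that abacus ends up in front of aback in slot 3, matching the trace above.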
Using linear probing:

insert aardvark: h1 = 4; A[4] is free, so aardvark goes to A[4].
insert aback:    h1 = 3; A[3] is free, so aback goes to A[3].
insert abacus:   h1 = 3; A[3] and A[4] are taken, so probing wraps around and abacus goes to A[0].
insert abaft:    h1 = 0; A[0] is taken, so abaft goes to A[1].

Final table: A[0]: abacus, A[1]: abaft, A[2]: (empty), A[3]: aback, A[4]: aardvark.
Note: Now insert/delete/search are all O(n) in the worst case. When the hash table is mostly empty this
behaviour is unlikely, but as the table fills up (its load factor increases) it becomes more and more likely.
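The open-addressing scheme traced above is linear probing: on a collision, step to the next slot (wrapping around) until a free one is found. A sketch of insertion:

```python
# Linear-probing insertion into a fixed-size open-addressing table.
def h1(w):
    return sum(ord(c) for c in w) % 5

def lp_insert(A, key):
    i = h1(key)
    while A[i] is not None:      # slot taken: step to the next index, wrapping
        i = (i + 1) % len(A)
    A[i] = key

A = [None] * 5
for w in ["aardvark", "aback", "abacus", "abaft"]:
    lp_insert(A, w)
print(A)   # ['abacus', 'abaft', None, 'aback', 'aardvark']
```

(This sketch assumes the table never completely fills; a real implementation would track the load factor and resize.)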
Using double hashing:
Note that

h2(aardvark) = 1 + ((97*3^7 + 97*3^6 + 114*3^5 + 100*3^4 + 118*3^3 + 97*3^2 + 114*3 + 107) mod 4)
             = 1 + (323162 mod 4) = 1 + 2 = 3
h2(aback)    = 1 + ((97*3^4 + 98*3^3 + 97*3^2 + 99*3 + 107) mod 4)
             = 1 + (11780 mod 4) = 1 + 0 = 1
h2(abacus)   = 1 + ((97*3^5 + 98*3^4 + 97*3^3 + 99*3^2 + 117*3 + 115) mod 4)
             = 1 + (35485 mod 4) = 1 + 1 = 2
h2(abaft)    = 1 + ((97*3^4 + 98*3^3 + 97*3^2 + 102*3 + 116) mod 4)
             = 1 + (11798 mod 4) = 1 + 2 = 3
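These h2 values can be checked with a short script (function name assumed):

```python
# Secondary hash: h2(w) = 1 + ((sum of ascii(w_i) * 3^(k-i)) mod 4), for i = 1..k.
# The "+1" guarantees a nonzero jump amount, here in {1, 2, 3, 4}.
def h2(w):
    k = len(w)
    total = sum(ord(c) * 3 ** (k - 1 - idx) for idx, c in enumerate(w))
    return 1 + total % 4

for w in ["aardvark", "aback", "abacus", "abaft"]:
    print(w, h2(w))
# aardvark 3
# aback 1
# abacus 2
# abaft 3
```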
insert aardvark: h1 = 4, jump amount 3; A[4] is free, so aardvark goes to A[4].
insert aback:    h1 = 3, jump amount 1; A[3] is free, so aback goes to A[3].
insert abacus:   h1 = 3, jump amount 2; A[3] is taken, try (3 + 2) mod 5 = 0: free, so abacus goes to A[0].
insert abaft:    h1 = 0, jump amount 3; A[0] is taken, try (0 + 3) mod 5 = 3: taken, try (3 + 3) mod 5 = 1: free, so abaft goes to A[1].

Final table: A[0]: abacus, A[1]: abaft, A[2]: (empty), A[3]: aback, A[4]: aardvark.
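The double-hashing insertions can be sketched with the two hash functions defined earlier (helper names assumed):

```python
# Double-hashing insertion: on a collision at h1(key), repeatedly jump by
# h2(key) slots (mod |A|) until a free slot is found.
def h1(w):
    return sum(ord(c) for c in w) % 5

def h2(w):
    k = len(w)
    return 1 + sum(ord(c) * 3 ** (k - 1 - i) for i, c in enumerate(w)) % 4

def dh_insert(A, key):
    i, jump = h1(key), h2(key)
    while A[i] is not None:
        i = (i + jump) % len(A)   # each word probes with its own jump amount
    A[i] = key

A = [None] * 5
for w in ["aardvark", "aback", "abacus", "abaft"]:
    dh_insert(A, w)
print(A)   # ['abacus', 'abaft', None, 'aback', 'aardvark']
```

On this small example the final table happens to match linear probing, but the probe sequences differ (abaft jumps 0 -> 3 -> 1 rather than stepping 0 -> 1).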
Note: When the jump amount is 1, double hashing is identical to linear probing. Also, the jump amount
should never be 0, or no alternate positions will ever be checked. In general, if the jump amount evenly
divides the array size, not all alternate positions will be checked. (Making the array size prime and the jump
amount smaller than |A| guards against this.)
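The divisibility caveat is easy to demonstrate: with a table size of 6 and a jump amount of 2, probing from slot 0 only ever visits {0, 2, 4}, so half the table is unreachable; with a prime size of 5 the same jump visits every slot. A sketch (helper name assumed):

```python
# Follow a probe sequence until it revisits a slot, collecting the slots seen.
def probe_sequence(size, start, jump):
    seen, i = [], start
    while i not in seen:
        seen.append(i)
        i = (i + jump) % size
    return seen

print(probe_sequence(6, 0, 2))   # [0, 2, 4] -- only 3 of 6 slots reachable
print(probe_sequence(5, 0, 2))   # [0, 2, 4, 1, 3] -- all 5 slots reachable
```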