
CS 240 Tutorial 9 Notes: (array A of size M, store v at A[k])

The document discusses associative arrays and different methods for implementing them using hashing. An associative array allows storing key-value pairs and looking up values by key. Direct addressing stores values directly in an array using the key as the index, but requires large space. Hashing maps keys to array indices to reduce space. Common hashing methods are chaining, linear probing, and double hashing, which deal with collisions when different keys hash to the same index. The document provides an example comparing how these methods insert different keys into an array of size 5.

Uploaded by DavidKnight

CS 240 Tutorial 9 Notes

Dictionary/Associative array: An abstract data type which holds a collection of (key, value) pairs, where
each key appears at most once.
This allows one to store a value in the associative array using a key as an index. The value can later
be extracted if one knows the key it was stored under.
Typical operations are
insert(k, v): add value v, associated to key k
delete(k): remove item associated to key k
search(k): return value associated to key k (if one exists)
For simplicity, keys are usually assumed to be integers. (If not, we can always map the keys to integers
first.) Also, say every key is less than M (keys lie in 0, ..., M − 1).
Question: How would one go about implementing an associative array?
Implementation                    Insert                        Search                    Delete
Unsorted array/linked list        O(1) (add to end)             O(n) (brute force)        O(n) (brute force)
Sorted array                      O(n) (shifting)               O(log n) (binary search)  O(n) (shifting)
Balanced search tree (e.g., AVL)  O(log n) (search + rotation)  O(log n) (max height)     O(log n) (search + swap + rotations)
Direct addressing                 O(1)                          O(1)                      O(1)

Direct addressing uses an array A of size M and stores v at A[k]; every operation is O(1) because the key
tells us exactly where the item is stored.
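To make the direct-addressing row concrete, here is a minimal Python sketch (the class and method names are hypothetical). It assumes integer keys with 0 <= k < M and uses None to mark an empty slot, so None itself cannot be stored as a value.

```python
class DirectAddressTable:
    """Direct addressing: the value for key k is stored at A[k].

    Assumes integer keys with 0 <= k < M and that None is never a stored value.
    """

    def __init__(self, M: int):
        # One slot per possible key: O(M) space no matter how few items are stored.
        self.A = [None] * M

    def insert(self, k: int, v) -> None:
        self.A[k] = v  # O(1): the key is the index

    def search(self, k: int):
        return self.A[k]  # O(1); None means no item has key k

    def delete(self, k: int) -> None:
        self.A[k] = None  # O(1)


t = DirectAddressTable(10)
t.insert(3, "three")
t.insert(7, "seven")
print(t.search(3))  # three
t.delete(3)
print(t.search(3))  # None
```

Every operation is a single array access, but the list always has M slots, which is exactly the downside discussed next.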

Question: What is the downside to direct addressing?


Space required is O(M), even if n is very small, which is wasteful. If keys could have up to 100 decimal
digits, A would have to be of size 10^100, which is far more than the estimated number of atoms in the
observable universe (roughly 10^80)!
A hash table is a way of maintaining the good behaviour of this approach, while also addressing the downside.
The idea is to use an array of smaller size (e.g., O(n)) and then map the original keys into a smaller range, so
that they are indices for this smaller array.
The process of mapping a key into a small keyspace is known as hashing, and is done by applying a hash
function.
Main problem: Since we are mapping a large keyspace onto a small keyspace, some distinct keys must hash to
the same index (by the pigeonhole principle), so we must somehow deal with these collisions.
Three ideas:
Chaining: Each array location can contain multiple (k, v) pairs by storing them in a linked list.
Linear probing: If the location where you want to insert is already filled, insert in the next available
location (wrapping around to A[0] after the last slot).
Double hashing: Instead of looking sequentially for the next available location, jump ahead by a certain
amount until a space is free. The jump amount is controlled by a second hash function.
Example: Using chaining, linear probing, and double hashing, insert aardvark, aback, abacus, and abaft
into an array of size 5, where the key is the word itself and the hash function is
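$$h(w) = h(w_1 w_2 \cdots w_k) = \left( \sum_{i=1}^{k} \operatorname{ascii}(w_i) \right) \bmod 5.$$

For double hashing, use the secondary hash function

$$h_2(w) = h_2(w_1 w_2 \cdots w_k) = 1 + \left( \left( \sum_{i=1}^{k} \operatorname{ascii}(w_i) \cdot 3^{k-i} \right) \bmod 4 \right).$$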

Answer: Note that

$$
\begin{aligned}
h(\text{aardvark}) &= (97 + 97 + 114 + 100 + 118 + 97 + 114 + 107) \bmod 5 = 844 \bmod 5 = 4 \\
h(\text{aback}) &= (97 + 98 + 97 + 99 + 107) \bmod 5 = 498 \bmod 5 = 3 \\
h(\text{abacus}) &= (97 + 98 + 97 + 99 + 117 + 115) \bmod 5 = 623 \bmod 5 = 3 \\
h(\text{abaft}) &= (97 + 98 + 97 + 102 + 116) \bmod 5 = 510 \bmod 5 = 0
\end{aligned}
$$
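The primary hash values above can be double-checked with a short Python sketch. The helper name `h` is hypothetical; `ord` returns each character's code point, which equals its ASCII code for these lowercase letters.

```python
def h(w: str) -> int:
    """Primary hash from the example: sum of the ASCII codes of w, mod 5."""
    return sum(ord(c) for c in w) % 5


words = ["aardvark", "aback", "abacus", "abaft"]
for w in words:
    print(w, sum(ord(c) for c in w), h(w))
# aardvark 844 4
# aback 498 3
# abacus 623 3
# abaft 510 0
```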

Using chaining:

        insert aardvark   insert aback      insert abacus     insert abaft
A[0]    -                 -                 -                 abaft
A[1]    -                 -                 -                 -
A[2]    -                 -                 -                 -
A[3]    -                 aback             abacus -> aback   abacus -> aback
A[4]    aardvark          aardvark          aardvark          aardvark

Note: Items are inserted at the front of the linked list, which keeps insertion at O(1) cost.
However, in the worst case all items hash to the same location, and search/delete cost O(n).
But if the hash function is chosen properly this behaviour is unlikely. Assuming each hash value is equally
likely to occur, search/delete cost O(1 + n/|A|) in the average case. If we take |A| ≈ n, then n/|A| is about 1
and this is O(1).
This makes sense intuitively: if you want to store n items in A, you probably want |A| ≈ n to avoid
excessive chaining.
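A minimal chaining sketch in Python that replays the example. Assumptions: a `collections.deque` stands in for each linked list (its `appendleft` is O(1), matching the insert-at-front rule), insert does not check whether the key is already present, and the class and method names are hypothetical.

```python
from collections import deque


def h(w: str) -> int:
    """Primary hash from the example: sum of ASCII codes, mod 5."""
    return sum(ord(c) for c in w) % 5


class ChainedHashTable:
    """Hash table with chaining; each slot holds a chain of (key, value) pairs."""

    def __init__(self, size: int, hash_fn):
        self.size = size
        self.hash_fn = hash_fn
        # A deque stands in for the linked list at each slot.
        self.slots = [deque() for _ in range(size)]

    def _chain(self, k):
        return self.slots[self.hash_fn(k) % self.size]

    def insert(self, k, v) -> None:
        # Insert at the front of the chain: O(1). Does not check for an existing k.
        self._chain(k).appendleft((k, v))

    def search(self, k):
        # Walk the chain; cost is proportional to its length.
        for key, value in self._chain(k):
            if key == k:
                return value
        return None

    def delete(self, k) -> bool:
        chain = self._chain(k)
        match = next((pair for pair in chain if pair[0] == k), None)
        if match is None:
            return False
        chain.remove(match)
        return True


table = ChainedHashTable(5, h)
for w in ["aardvark", "aback", "abacus", "abaft"]:
    table.insert(w, len(w))
print([[key for key, _ in chain] for chain in table.slots])
# [['abaft'], [], [], ['abacus', 'aback'], ['aardvark']]
```

Search and delete walk a single chain, so their cost tracks the chain length, which is why keeping n/|A| small matters.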

Using linear probing:

        insert aardvark   insert aback      insert abacus     insert abaft
A[0]    -                 -                 abacus            abacus
A[1]    -                 -                 -                 abaft
A[2]    -                 -                 -                 -
A[3]    -                 aback             aback             aback
A[4]    aardvark          aardvark          aardvark          aardvark

Probes: abacus hashes to A[3] (taken), tries A[4] (taken), then wraps around to A[0]; abaft hashes to A[0]
(taken by abacus) and moves on to A[1].

Note: Now insert/delete/search are all O(n) in the worst case. When the hash table is mostly empty this
behaviour is unlikely, but as the table fills up (its load factor increases) it becomes more and more likely.
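A minimal linear-probing sketch in Python that replays the example above. The names are hypothetical, and delete is left out of this sketch because removing an item from an open-addressing table needs extra care (for example, a "deleted" marker so that later searches do not stop early at the hole).

```python
def h(w: str) -> int:
    """Primary hash from the example: sum of ASCII codes, mod 5."""
    return sum(ord(c) for c in w) % 5


class LinearProbingTable:
    """Open addressing with linear probing: try h(k), h(k)+1, h(k)+2, ... (mod size)."""

    def __init__(self, size: int, hash_fn):
        self.size = size
        self.hash_fn = hash_fn
        self.keys = [None] * size    # None marks an empty slot
        self.values = [None] * size

    def _probe(self, k):
        start = self.hash_fn(k) % self.size
        for j in range(self.size):
            yield (start + j) % self.size

    def insert(self, k, v) -> int:
        # Take the first empty slot along the probe sequence; return its index.
        for pos in self._probe(k):
            if self.keys[pos] is None:
                self.keys[pos] = k
                self.values[pos] = v
                return pos
        raise RuntimeError("hash table is full")

    def search(self, k):
        for pos in self._probe(k):
            if self.keys[pos] is None:
                return None  # an empty slot ends the search: k is absent
            if self.keys[pos] == k:
                return self.values[pos]
        return None


table = LinearProbingTable(5, h)
positions = [table.insert(w, len(w)) for w in ["aardvark", "aback", "abacus", "abaft"]]
print(positions)   # [4, 3, 0, 1]
print(table.keys)  # ['abacus', 'abaft', None, 'aback', 'aardvark']
```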
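Using double hashing:
Note that

$$
\begin{aligned}
h_2(\text{aardvark}) &= 1 + \left( (97 \cdot 3^7 + 97 \cdot 3^6 + 114 \cdot 3^5 + 100 \cdot 3^4 + 118 \cdot 3^3 + 97 \cdot 3^2 + 114 \cdot 3 + 107) \bmod 4 \right) \\
&= 1 + (323162 \bmod 4) = 1 + 2 = 3 \\
h_2(\text{aback}) &= 1 + \left( (97 \cdot 3^4 + 98 \cdot 3^3 + 97 \cdot 3^2 + 99 \cdot 3 + 107) \bmod 4 \right) \\
&= 1 + (11780 \bmod 4) = 1 + 0 = 1 \\
h_2(\text{abacus}) &= 1 + \left( (97 \cdot 3^5 + 98 \cdot 3^4 + 97 \cdot 3^3 + 99 \cdot 3^2 + 117 \cdot 3 + 115) \bmod 4 \right) \\
&= 1 + (35485 \bmod 4) = 1 + 1 = 2 \\
h_2(\text{abaft}) &= 1 + \left( (97 \cdot 3^4 + 98 \cdot 3^3 + 97 \cdot 3^2 + 102 \cdot 3 + 116) \bmod 4 \right) \\
&= 1 + (11798 \bmod 4) = 1 + 2 = 3
\end{aligned}
$$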
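The secondary hash values can be checked the same way. This hypothetical `h2` helper computes the weighted sum with Horner's rule (multiply the running total by 3, then add the next character code), which gives the same total as summing ascii(w_i) · 3^(k−i) term by term.

```python
def weighted_sum(w: str) -> int:
    """Sum of ascii(w_i) * 3**(k - i) for i = 1..k, via Horner's rule."""
    total = 0
    for c in w:
        total = total * 3 + ord(c)
    return total


def h2(w: str) -> int:
    """Secondary hash from the example: 1 + (weighted_sum(w) mod 4), always in 1..4."""
    return 1 + weighted_sum(w) % 4


for w in ["aardvark", "aback", "abacus", "abaft"]:
    print(w, weighted_sum(w), h2(w))
# aardvark 323162 3
# aback 11780 1
# abacus 35485 2
# abaft 11798 3
```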

              insert aardvark   insert aback      insert abacus     insert abaft
jump amount   3                 1                 2                 3
A[0]          -                 -                 abacus            abacus
A[1]          -                 -                 -                 abaft
A[2]          -                 -                 -                 -
A[3]          -                 aback             aback             aback
A[4]          aardvark          aardvark          aardvark          aardvark

Probes: aardvark goes straight to A[4] and aback to A[3]; abacus tries A[3] (taken), then (3 + 2) mod 5 = 0,
so A[0]; abaft tries A[0] (taken), then (0 + 3) mod 5 = 3 (taken), then (3 + 3) mod 5 = 1, so A[1].

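Note: When the jump amount is 1, double hashing is identical to linear probing. Also, the jump amount
should never be 0, or no alternate positions will ever be checked. In general, if the jump amount is greater than 1
and evenly divides the array size, not all alternate positions will be checked; more precisely, every position is
reached only when the jump amount and the array size have no common factor greater than 1. (Making the array
size prime and the jump amount between 1 and |A| − 1 guards against this. In the example, |A| = 5 is prime and
h2 only takes values 1 to 4.)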
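Finally, a minimal double-hashing sketch in Python under similar assumptions (hypothetical helper names, keys stored directly in a plain list with None for empty slots, no delete). The probe sequence is h(k), h(k) + h2(k), h(k) + 2·h2(k), ..., taken mod the array size.

```python
def h(w: str) -> int:
    """Primary hash: sum of ASCII codes, mod 5."""
    return sum(ord(c) for c in w) % 5


def h2(w: str) -> int:
    """Secondary hash (jump amount): 1 + (sum of ascii(w_i) * 3**(k - i)) mod 4."""
    total = 0
    for c in w:
        total = total * 3 + ord(c)  # Horner's rule
    return 1 + total % 4            # in 1..4, so never 0


def probe_sequence(start: int, jump: int, size: int) -> list:
    """Positions start, start + jump, start + 2*jump, ... (mod size), size steps."""
    return [(start + j * jump) % size for j in range(size)]


def double_hash_insert(table: list, k, hash_fn, jump_fn) -> int:
    """Store k in the first empty (None) slot along its probe sequence; return the index."""
    size = len(table)
    for pos in probe_sequence(hash_fn(k) % size, jump_fn(k), size):
        if table[pos] is None:
            table[pos] = k
            return pos
    raise RuntimeError("no empty slot on this key's probe sequence")


table = [None] * 5
for w in ["aardvark", "aback", "abacus", "abaft"]:
    double_hash_insert(table, w, h, h2)
print(table)                    # ['abacus', 'abaft', None, 'aback', 'aardvark']
print(probe_sequence(0, 2, 5))  # [0, 2, 4, 1, 3]: every slot is reached
print(probe_sequence(0, 2, 6))  # [0, 2, 4, 0, 2, 4]: slots 1, 3, 5 are never checked
```

The last two lines illustrate the note above: with |A| = 5 a jump of 2 reaches every slot, while with a hypothetical array of size 6 the same jump only ever checks slots 0, 2 and 4.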
