Open Addressing: Linear Probing: Data Structures and Algorithms
Linear Probing
• The easiest method to probe the bins of the hash table is
to search forward linearly
• Assume we are inserting into bin i:
– if bin i is empty, we occupy it
– otherwise, check bin i + 1, i + 2, and so on, until an empty bin is
found
– if we reach the end of the hash table, we start at the front (bin 0)
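• The steps above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name insert, the use of None to mark an empty bin, and passing the starting bin as the argument home are all assumptions made for the example.

```python
def insert(table, key, home):
    """Place key in the first empty bin at or after bin `home`.

    `table` is a list in which None marks an empty bin (an assumption
    of this sketch). Returns the index of the bin that was used.
    """
    size = len(table)
    for step in range(size):
        i = (home + step) % size  # past the last bin, wrap to bin 0
        if table[i] is None:
            table[i] = key
            return i
    raise RuntimeError("hash table is full")


table = [None] * 5
insert(table, "a", 3)  # bin 3 is empty, so "a" occupies it
insert(table, "b", 3)  # bin 3 is taken, so "b" goes to bin 4
insert(table, "c", 4)  # bin 4 is taken, so "c" wraps around to bin 0
```

• The modulo operation implements the wrap-around to bin 0, and the loop bound ensures that a full table is detected after every bin has been checked once.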
Linear Probing
• For example, suppose that our hash function converts a
2-digit integer into a single digit by taking the least-significant
digit
• This is not necessarily a bad idea: for most natural data, the
least-significant digit is approximately uniformly distributed
• Deviations from such expected digit distributions are used to flag
possible tax fraud
Linear Probing: Insertions
• Insert the numbers
81, 70, 97, 60, 51, 38, 89, 68, 24
into the initially empty hash table:
Bin:     0   1   2   3   4   5   6   7   8   9
Value:  --  --  --  --  --  --  --  --  --  --
(-- marks an empty bin)
Linear Probing: Insertions
• We can easily insert 81, 70, and 97 into their
corresponding bins:
Bin:     0   1   2   3   4   5   6   7   8   9
Value:  70  81  --  --  --  --  --  97  --  --
Linear Probing: Insertions
• Inserting 60 causes a collision in bin 0; we therefore
check:
– bin 1 (also full), and
– bin 2 (empty)
Bin:     0   1   2   3   4   5   6   7   8   9
Value:  70  81  60  --  --  --  --  97  --  --
Linear Probing: Insertions
• Inserting 51 also causes a collision, this time in bin 1;
we therefore check:
– bin 2 (also full), and
– bin 3 (empty)
Bin:     0   1   2   3   4   5   6   7   8   9
Value:  70  81  60  51  --  --  --  97  --  --
Linear Probing: Insertions
• 38 and 89 can be placed into bins 8 and 9 respectively
without collisions
Bin:     0   1   2   3   4   5   6   7   8   9
Value:  70  81  60  51  --  --  --  97  38  89
Linear Probing: Insertions
• Inserting 68 causes a collision in bin 8, and therefore we
check the following bins, wrapping around to the front of the table:
– 9, 0, 1, 2, 3, and finally 4, which is empty
– insert 68 into bin 4
Bin:     0   1   2   3   4   5   6   7   8   9
Value:  70  81  60  51  68  --  --  97  38  89
Linear Probing: Insertions
• Inserting 24 causes a collision in bin 4;
however, the next bin (5) is empty
Bin:     0   1   2   3   4   5   6   7   8   9
Value:  70  81  60  51  68  24  --  97  38  89
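• The sequence of insertions above can be replayed with a short sketch that records which bins each insertion examines. The helper name insert_traced and the use of None for an empty bin are assumptions made for illustration; the hash function is the least-significant digit, written here as key % 10.

```python
def insert_traced(table, key):
    """Insert key by linear probing and return the list of bins examined."""
    size = len(table)
    home = key % size  # least-significant digit when size == 10
    visited = []
    for step in range(size):
        i = (home + step) % size
        visited.append(i)
        if table[i] is None:
            table[i] = key
            return visited
    raise RuntimeError("hash table is full")


table = [None] * 10
trace = {}
for key in [81, 70, 97, 60, 51, 38, 89, 68, 24]:
    trace[key] = insert_traced(table, key)

for key, bins in trace.items():
    print(key, "examined bins", bins)
```

• In this run, 68 examines seven bins (8, 9, 0, 1, 2, 3, 4) before it is placed, while 81, 70, 97, 38, and 89 each examine only one.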
Linear Probing: Searching
• Testing for membership is similar to insertions
• Start at the appropriate bin, and continue searching
forward until either:
– the item is found,
– an empty bin is found, or
– we have traversed the entire array
• The last case will only occur if the hash table is full
Linear Probing: Searching
• Searching for 68, we first examine bin 8, then 9, 0, 1, 2,
3, and 4, finding 68 in bin 4
• Searching for 23, we search bins 3, 4, 5, and bin 6 is
empty, so 23 is not in the table
Bin:     0   1   2   3   4   5   6   7   8   9
Value:  70  81  60  51  68  24  --  97  38  89
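• A membership test with the three stopping conditions listed above might look like the following sketch. The function name contains, the None marker for empty bins, and the hash key % len(table) are assumptions of the example; the table is the one built by the earlier insertions.

```python
def contains(table, key):
    """Return True if key is in the table, searching by linear probing."""
    size = len(table)
    home = key % size
    for step in range(size):
        i = (home + step) % size
        if table[i] is None:
            return False  # an empty bin ends the search: key is absent
        if table[i] == key:
            return True   # the item is found
    return False          # every bin was examined: the table is full


table = [70, 81, 60, 51, 68, 24, None, 97, 38, 89]
print(contains(table, 68))  # probes bins 8, 9, 0, 1, 2, 3, 4
print(contains(table, 23))  # probes bins 3, 4, 5, then the empty bin 6
```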
Linear Probing: Removing
• We cannot simply empty a bin to remove an element from the hash table
• For example, if we delete 89 by emptying bin 9, we can no
longer find 68: the search starts in bin 8 and stops at the now-empty bin 9
Bin:     0   1   2   3   4   5   6   7   8   9
Value:  70  81  60  51  68  24  --  97  38  89
Linear Probing: Removing
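• However, we cannot simply move all entries up to fill the
gap
• Moving 70 to bin 9 would make it impossible to find 70, since a
search for 70 starts in bin 0 and stops at the first empty bin
Bin:      0   1   2   3   4   5   6   7   8   9
Before:  70  81  60  51  68  24  --  97  38  89
After:   81  60  51  68  24  --  --  97  38  70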
Linear Probing: Removing
• Instead, we must probe forward from the gap, moving an
element back into the gap only if this does not place it before
its home bin (the bin selected by its hash value) in probe order
• For example, we remove 89
Bin:     0   1   2   3   4   5   6   7   8   9
Value:  70  81  60  51  68  24  --  97  38  --
Linear Probing: Removing
• We probe forward until we find an entry which can be
moved into bin 9
• We cannot move 70, 81, 60, or 51, since bin 9 comes before their
home bins (0, 1, 0, and 1), but we can move 68, whose home bin is 8
Bin:     0   1   2   3   4   5   6   7   8   9
Value:  70  81  60  51  --  24  --  97  38  68
Linear Probing: Removing
• Next, we search forward again from the new gap in bin 4, and
note that 24 (home bin 4) can be moved into it
• The next cell (bin 6) is already empty, and therefore we are
finished
Bin:     0   1   2   3   4   5   6   7   8   9
Value:  70  81  60  51  24  --  --  97  38  68
Linear Probing: Removing
• Suppose we now remove 60
• Begin searching forward from bin 0
Bin:     0   1   2   3   4   5   6   7   8   9
Value:  70  81  60  51  24  --  --  97  38  68
Linear Probing: Removing
• We find 60 in bin 2, and therefore we remove it
• We search forward and find that we can move 51 (home bin 1)
into bin 2
Bin:     0   1   2   3   4   5   6   7   8   9
Value:  70  81  51  --  24  --  --  97  38  68
Linear Probing: Removing
• We cannot move 24 into bin 3, because its home bin is 4
• The next bin (5) is empty, therefore we are finished
Bin:     0   1   2   3   4   5   6   7   8   9
Value:  70  81  51  --  24  --  --  97  38  68
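• The removal procedure walked through above can be sketched as follows. This is one possible formulation, not necessarily the one intended by these slides: the function name remove, the None marker, and the hash key % len(table) are assumptions. An element in bin i may be moved into the gap when its displacement from its home bin is at least the distance from the gap to bin i, which is the same as saying the gap does not come before its home bin in probe order.

```python
def remove(table, key):
    """Remove key, then shift later entries back so searches still work."""
    size = len(table)
    home = key % size
    hole = None
    for step in range(size):  # locate the key, as in a search
        i = (home + step) % size
        if table[i] is None:
            return False
        if table[i] == key:
            hole = i
            break
    if hole is None:
        return False  # full table, key absent

    table[hole] = None
    i = (hole + 1) % size
    while table[i] is not None:
        displacement = (i - table[i] % size) % size  # bins past its home bin
        gap = (i - hole) % size                      # bins past the hole
        if displacement >= gap:  # the hole lies on this entry's probe path
            table[hole] = table[i]
            table[i] = None
            hole = i
        i = (i + 1) % size
    return True


table = [70, 81, 60, 51, 68, 24, None, 97, 38, 89]
remove(table, 89)
after_89 = list(table)  # 68 moves to bin 9, then 24 moves to bin 4
remove(table, 60)
after_60 = list(table)  # 51 moves to bin 2; 24 stays in its home bin
```

• Each move strictly reduces the displacement of the moved entry, so the shifting loop ends once it reaches an empty bin.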
Primary Clustering
• We have already observed the following phenomenon:
– as we insert more elements into the hash table, the contiguous
regions get larger
• This results in longer search times
Primary Clustering
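• Consider inserting the following entries into a hash table with 25
bins (the bins shown are consistent with taking each entry modulo 25):
81, 70, 97, 63, 76, 38, 85, 68, 21, 9, 55, 73, 57, 60, 72, 74, 85, 16, 61, 7, 49
• The first five insertions cause no collisions
Bin:     0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24
Value:  --  76  --  --  --  --  81  --  --  --  --  --  --  63  --  --  --  --  --  --  70  --  97  --  --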
Primary Clustering
• Inserting 38 causes a collision in bin 13, so it is placed in bin 14
• The next seven insertions do not cause any further collisions
Bin:     0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24
Value:  --  76  --  --  --  55  81  57  --   9  85  --  --  63  38  --  --  --  68  --  70  21  97  73  --
Primary Clustering
• The next four insertions cause collisions:
60 (bin 10)
72 (bin 22)
74 (bin 24)
85 (bin 10)
• We can safely insert 16 into bin 16
Bin:     0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24
Value:  74  76  --  --  --  55  81  57  --   9  85  60  85  63  38  --  16  --  68  --  70  21  97  73  72
Primary Clustering
• The remaining insertions all cause collisions:
61 (bin 11)
7 (bin 7)
49 (bin 24)
• The joining of smaller groups into one large group is
termed coalescing
Bin:     0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24
Value:  74  76  49  --  --  55  81  57   7   9  85  60  85  63  38  61  16  --  68  --  70  21  97  73  72
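• The growth and coalescing of clusters in this example can be checked with a short sketch. It assumes the hash function is the entry modulo 25, which is consistent with the bins shown but is not stated in the slides; the helper names insert and cluster_lengths are hypothetical. Clusters are measured as runs of occupied bins, treating the table as circular.

```python
def insert(table, key):
    """Linear-probing insert; assumes the table is not full."""
    size = len(table)
    i = key % size
    while table[i] is not None:
        i = (i + 1) % size
    table[i] = key


def cluster_lengths(table):
    """Lengths of maximal runs of occupied bins in a circular table."""
    size = len(table)
    empties = [i for i, value in enumerate(table) if value is None]
    if not empties:
        return [size]
    lengths, run = [], 0
    start = empties[0]
    for step in range(1, size + 1):  # one full lap, ending back at start
        i = (start + step) % size
        if table[i] is None:
            if run:
                lengths.append(run)
            run = 0
        else:
            run += 1
    return lengths


keys = [81, 70, 97, 63, 76, 38, 85, 68, 21, 9, 55,
        73, 57, 60, 72, 74, 85, 16, 61, 7, 49]
table = [None] * 25
for key in keys:
    insert(table, key)

print(cluster_lengths(table))
```

• With all 21 entries inserted, the occupied bins form three clusters: bins 5 through 16, bin 18 alone, and bins 20 through 2 (wrapping past bin 24).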
Primary Clustering
• As the load factor increased, the probability of a collision
increased
• Justification:
– suppose that a chain is of length m
– an insertion that hashes to any bin occupied by the chain, or to the
bin immediately before or after it, will increase the length of
the chain
Primary Clustering
• For example, using the last two digits as the hash value, consider
the following hash table
• Any insertion into bins 29 through 34 will increase the
length of the chain
Bin:    ...   28   29   30   31   32   33   34   35  ...
Value:  ...   --   --  230  531  730  432   --   --  ...
Primary Clustering
• Consequently, if a chain is of size m and every bin is equally
likely to be chosen, then the probability that it will be increased
in length is (m + 2)/M, where M is the size of the hash table
• The more a chain grows, the more likely it will grow in
the future
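• As a worked instance, take the chain occupying bins 30 through 33 in the example above, so m = 4. Assuming the table has M = 100 bins (an assumption suggested by hashing on the last two digits) and that every bin is equally likely, the six bins 29 through 34 give:

```latex
P(\text{chain grows}) = \frac{m + 2}{M} = \frac{4 + 2}{100} = 0.06
```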
Primary Clustering
• The length of these chains will affect the number of
probes required to perform insertions, accesses, or
removals
• It is possible to estimate the average number of probes
for a successful search, where $\lambda$ is the load factor:
$$\frac{1}{2}\left(1 + \frac{1}{1 - \lambda}\right)$$
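• To get a feel for how quickly the cost grows, the standard estimate for a successful search under linear probing, (1/2)(1 + 1/(1 - λ)), can be evaluated at a few load factors. The function name below is hypothetical, and the values are approximations from the formula rather than measurements.

```python
def expected_probes_successful(load_factor):
    """Estimated average probes for a successful search at load factor λ."""
    if not 0 <= load_factor < 1:
        raise ValueError("load factor must satisfy 0 <= λ < 1")
    return 0.5 * (1 + 1 / (1 - load_factor))


for load_factor in (0.25, 0.5, 0.75, 0.9):
    probes = expected_probes_successful(load_factor)
    print(f"λ = {load_factor:.2f}: about {probes:.2f} probes")
```

• The estimate is 1.5 probes at λ = 0.5 and about 5.5 probes at λ = 0.9, which reflects the effect of primary clustering as the table fills.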