
Open Addressing: Linear Probing: Data Structures and Algorithms

Linear probing is a method for resolving collisions in open addressing hash tables by searching through table slots sequentially. When a collision occurs, the next empty slot is used. As the load factor increases above 2/3, the average number of probes needed for operations grows substantially due to clustering of elements, hurting performance. Doubling the table size or using different probing methods like quadratic probing can help address this issue.
Copyright
© Attribution Non-Commercial (BY-NC)

Data Structures and Algorithms

Open Addressing:
Linear Probing
Linear Probing
• The easiest method to probe the bins of the hash table is
to search forward linearly
• Assume we are inserting into bin i:
– if bin i is empty, we occupy it
– otherwise, check bin i + 1, i + 2, and so on, until an empty bin is
found
– if we reach the end of the hash table, we start at the front (bin 0)
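The insertion rule above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `insert`, the use of `None` to mark an empty bin, and the hash `key % size` are assumptions made for the example, not part of the slides.

```python
def insert(table, key):
    """Insert key by linear probing and return the bin it ends up in.

    Assumptions of this sketch: None marks an empty bin and the
    starting bin is key % len(table).
    """
    size = len(table)
    start = key % size
    for offset in range(size):
        i = (start + offset) % size  # past the last bin, continue at bin 0
        if table[i] is None:
            table[i] = key
            return i
    raise OverflowError("hash table is full")


table = [None] * 5
insert(table, 3)  # bin 3 is empty, so 3 occupies it
insert(table, 8)  # bin 3 is taken, so 8 goes to bin 4
insert(table, 4)  # bin 4 is taken, so the probe wraps around to bin 0
```

The modulo in the loop is what implements "start at the front" once the end of the table is reached.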
Linear Probing
• For example, suppose that our hash function converts a
2-digit integer into a single digit by taking the least-
significant digit
• Not necessarily a bad idea: for most natural data, the
least-significant digit is approximately random
• Such distributions are used to catch tax fraud
Linear Probing: Insertions
• Insert the numbers
81, 70, 97, 60, 51, 38, 89, 68, 24
into the initially empty hash table:

0 1 2 3 4 5 6 7 8 9
Linear Probing: Insertions
• We can easily insert 81, 70, and 97 into their
corresponding bins:

bin   0   1   2   3   4   5   6   7   8   9
key  70  81   -   -   -   -   -  97   -   -
Linear Probing: Insertions
• Inserting 60 causes a collision in bin 0, so we check:
– bin 1 (also full), and
– bin 2 (empty)

bin   0   1   2   3   4   5   6   7   8   9
key  70  81  60   -   -   -   -  97   -   -
Linear Probing: Insertions
• Inserting 51 also causes a collision, this time in bin 1,
so we check:
– bin 2 (also full), and
– bin 3 (empty)

bin   0   1   2   3   4   5   6   7   8   9
key  70  81  60  51   -   -   -  97   -   -
Linear Probing: Insertions
• 38 and 89 can be placed into bins 8 and 9 respectively
without collisions

bin   0   1   2   3   4   5   6   7   8   9
key  70  81  60  51   -   -   -  97  38  89
Linear Probing: Insertions
• Inserting 68 causes a collision in bin 8, so we check bins:
– 9, 0, 1, 2, 3, and finally 4, which is empty
– insert 68 into bin 4

bin   0   1   2   3   4   5   6   7   8   9
key  70  81  60  51  68   -   -  97  38  89
Linear Probing: Insertions
• Inserting 24 causes a collision in bin 4, but the next
bin (5) is empty, so 24 is placed there

bin   0   1   2   3   4   5   6   7   8   9
key  70  81  60  51  68  24   -  97  38  89
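The whole insertion sequence can be replayed with a short Python sketch that also records which bins each insertion examines. The helper name `insert_traced` and the `None` marker for empty bins are assumptions of this example; the hash is the least-significant digit, as in the slides.

```python
def insert_traced(table, key):
    """Insert key by linear probing; return the list of bins examined."""
    size = len(table)
    probed = []
    i = key % size  # least-significant digit when size == 10
    while table[i] is not None:
        probed.append(i)
        if len(probed) == size:
            raise OverflowError("hash table is full")
        i = (i + 1) % size
    probed.append(i)  # the empty bin that receives the key
    table[i] = key
    return probed


table = [None] * 10
traces = {key: insert_traced(table, key)
          for key in (81, 70, 97, 60, 51, 38, 89, 68, 24)}

for key, probed in traces.items():
    print(key, "examined bins", probed)
```

Under these assumptions the trace for 68 visits bins 8, 9, 0, 1, 2, 3, and 4, which matches the slide.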
Linear Probing: Searching
• Testing for membership is similar to insertions
• Start at the appropriate bin, and continue searching
forward until either:
– the item is found,
– an empty bin is found, or
– we have traversed the entire array
• The last case will only occur if the hash table is full
Linear Probing: Searching
• Searching for 68, we first examine bin 8, then 9, 0, 1, 2,
3, and 4, finding 68 in bin 4
• Searching for 23, we examine bins 3, 4, and 5; bin 6 is
empty, so 23 is not in the table

bin   0   1   2   3   4   5   6   7   8   9
key  70  81  60  51  68  24   -  97  38  89
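A minimal Python sketch of the membership test, covering the three stopping conditions listed earlier. The function name `contains` and the `None` marker for empty bins are assumptions of this example.

```python
def contains(table, key):
    """Return True if key is in a linear-probing table (None = empty bin)."""
    size = len(table)
    start = key % size
    for offset in range(size):
        i = (start + offset) % size
        if table[i] is None:   # an empty bin ends the search
            return False
        if table[i] == key:    # the item is found
            return True
    return False               # every bin examined: only possible if full


table = [70, 81, 60, 51, 68, 24, None, 97, 38, 89]
print(contains(table, 68))  # probes bins 8, 9, 0, 1, 2, 3, 4
print(contains(table, 23))  # probes bins 3, 4, 5, then the empty bin 6
```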
Linear Probing: Removing
• We cannot simply remove elements from the hash table
• For example, if we delete 89 by just emptying bin 9, we
can no longer find 68: the search starts at bin 8 and stops
at the now-empty bin 9

bin   0   1   2   3   4   5   6   7   8   9
key  70  81  60  51  68  24   -  97  38  89
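The problem can be reproduced with a small Python sketch. The `contains` helper and the `None` marker for empty bins are assumptions of this example; the table is the one from the slide.

```python
def contains(table, key):
    """Linear-probing search; None marks an empty bin."""
    size = len(table)
    for offset in range(size):
        i = (key % size + offset) % size
        if table[i] is None:
            return False
        if table[i] == key:
            return True
    return False


table = [70, 81, 60, 51, 68, 24, None, 97, 38, 89]
found_before = contains(table, 68)

table[9] = None  # naive removal of 89: just empty its bin
found_after = contains(table, 68)  # the probe from bin 8 stops at bin 9

print(found_before, found_after)
```

68 is still stored in bin 4, but the search can no longer reach it.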
Linear Probing: Removing
• However, we also cannot simply shift all following entries
back by one bin to fill the gap
• Moving 70 to bin 9 would make it impossible to find 70,
because a search for 70 starts at bin 0

bin   0   1   2   3   4   5   6   7   8   9
key  70  81  60  51  68  24   -  97  38  89

key  81  60  51  68  24   -   -  97  38  70
Linear Probing: Removing
• Instead, we must probe forward from the gap, moving an
entry back into the gap only if that does not place it
before the bin it hashes to
• For example, we remove 89

bin   0   1   2   3   4   5   6   7   8   9
key  70  81  60  51  68  24   -  97  38   -
Linear Probing: Removing
• We probe forward until we find an entry which can be
moved into bin 9
• We cannot move 70, 81, 60, or 51 (they hash to bins 0, 1,
0, and 1), but we can move 68, which hashes to bin 8

bin   0   1   2   3   4   5   6   7   8   9
key  70  81  60  51   -  24   -  97  38  68
Linear Probing: Removing
• Next, we search forward again from the new gap in bin 4,
and note that 24 (which hashes to bin 4) can be moved into it
• The next bin (6) is already empty, and therefore we are
finished

bin   0   1   2   3   4   5   6   7   8   9
key  70  81  60  51  24   -   -  97  38  68
Linear Probing: Removing
• Suppose we now remove 60
• Begin searching forward from bin 0

bin   0   1   2   3   4   5   6   7   8   9
key  70  81  60  51  24   -   -  97  38  68
Linear Probing: Removing
• We find 60 in bin 2, and therefore we remove it
• We search forward and find that we can move 51 into bin 2

bin   0   1   2   3   4   5   6   7   8   9
key  70  81  51   -  24   -   -  97  38  68
Linear Probing: Removing
• We cannot move 24 into bin 3, because 24 hashes to bin 4
• The next bin (5) is empty, therefore we are finished

bin   0   1   2   3   4   5   6   7   8   9
key  70  81  51   -  24   -   -  97  38  68
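The removal procedure walked through above can be sketched in Python as follows. This is one possible formulation, not necessarily the one the slides had in mind: the function name `remove`, the `None` marker for empty bins, and the cyclic-distance test are assumptions of this example. An entry in bin j may move into the gap only if the gap lies on its probe path, that is, its home bin is no closer to j (going backwards around the table) than the gap is.

```python
def remove(table, key):
    """Remove key from a linear-probing table; return True if it was present."""
    size = len(table)
    i = key % size
    for _ in range(size):          # locate the key
        if table[i] is None:
            return False
        if table[i] == key:
            break
        i = (i + 1) % size
    else:
        return False               # full table that does not hold the key

    hole = i
    table[hole] = None
    j = (hole + 1) % size
    while table[j] is not None:    # probe forward until an empty bin
        home = table[j] % size
        # movable only if the hole is not before the entry's home bin
        if (j - home) % size >= (j - hole) % size:
            table[hole] = table[j]
            table[j] = None
            hole = j
        j = (j + 1) % size
    return True


table = [70, 81, 60, 51, 68, 24, None, 97, 38, 89]
remove(table, 89)
after_89 = list(table)   # 68 moves to bin 9, then 24 moves to bin 4
remove(table, 60)        # 51 moves to bin 2; 24 stays in its home bin
```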
Primary Clustering
• We have already observed the following phenomenon:
– as we insert more elements into the hash table, the contiguous
regions get larger
• This results in longer search times
Primary Clustering
• Consider inserting the following entries
81, 70, 97, 63, 76, 38, 85, 68, 21, 9, 55, 73, 57, 60, 72, 74, 85, 16, 61, 7, 49

• Use the number modulo 25 to determine which bin it
should occupy
• The first five don’t cause any collisions

bin   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24
key   -  76   -   -   -   -  81   -   -   -   -   -   -  63   -   -   -   -   -   -  70   -  97   -   -
Primary Clustering
• Inserting 38 causes a collision in bin 13
• The next seven do not cause any further collisions

bin   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24
key   -  76   -   -   -  55  81  57   -   9  85   -   -  63  38   -   -   -  68   -  70  21  97  73   -
Primary Clustering
• The next four insertions cause collisions:
60 (bin 10)
72 (bin 22)
74 (bin 24)
85 (bin 10)
• We can safely insert 16 into bin 16

bin   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24
key  74  76   -   -   -  55  81  57   -   9  85  60  85  63  38   -  16   -  68   -  70  21  97  73  72
Primary Clustering
• The remaining insertions all cause collisions:
61 (bin 11)
7 (bin 7)
49 (bin 24)
• The joining of smaller groups into one large group is
termed coalescing

bin   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24
key  74  76  49   -   -  55  81  57   7   9  85  60  85  63  38  61  16   -  68   -  70  21  97  73  72
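To see the coalescing numerically, the following Python sketch replays the 21 insertions and measures the runs of occupied bins. The helper names `insert` and `cluster_lengths` and the `None` marker for empty bins are assumptions of this example; the table is treated as circular, so a run that wraps from bin 24 to bin 0 counts as one cluster.

```python
def insert(table, key):
    """Linear-probing insert with key % size as the starting bin."""
    size = len(table)
    i = key % size
    while table[i] is not None:    # assumes the table is not full
        i = (i + 1) % size
    table[i] = key


def cluster_lengths(table):
    """Lengths of maximal runs of occupied bins in a circular table."""
    size = len(table)
    if all(slot is not None for slot in table):
        return [size]
    # start just after an empty bin so no run is split by the wrap-around
    first_empty = next(i for i, slot in enumerate(table) if slot is None)
    lengths, run = [], 0
    for offset in range(1, size + 1):
        if table[(first_empty + offset) % size] is not None:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    return lengths


keys = [81, 70, 97, 63, 76, 38, 85, 68, 21, 9, 55,
        73, 57, 60, 72, 74, 85, 16, 61, 7, 49]
table = [None] * 25
for key in keys:
    insert(table, key)

print(sorted(cluster_lengths(table)))
```

With these keys the final table has only three clusters, the largest spanning bins 5 through 16.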
Primary Clustering
• As the load factor increased, the probability of a collision
increased
• Justification:
– suppose that a chain is of length m
– an insertion either into any bin occupied by the chain or into the
locations immediately before or after it will increase the length of
the chain
Primary Clustering
• For example, hashing on the last two digits, consider the
following hash table
• Any insertion into bins 29 through 34 will increase the
length of the chain

bin  ...   28   29   30   31   32   33   34   35  ...
key  ...    -    -  230  531  730  432    -    -  ...
Primary Clustering
• Consequently, if a chain is of size m, then the probability
that it will be increased in length is (m + 2)/M where M is
the size of the hash table
• The more a chain grows, the more likely it will grow in
the future
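The (m + 2)/M figure can be checked by brute force on the example chain. The Python sketch below tries every possible starting bin and counts how many of them lengthen the chain. The helper names are assumptions of this example, and the probability interpretation further assumes that every bin is equally likely to be the starting bin of a new key.

```python
from fractions import Fraction


def chain_length(table, bin_in_chain):
    """Length of the run of occupied bins containing bin_in_chain."""
    size = len(table)
    length = 1
    i = (bin_in_chain - 1) % size
    while table[i] is not None and length < size:
        length += 1
        i = (i - 1) % size
    i = (bin_in_chain + 1) % size
    while table[i] is not None and length < size:
        length += 1
        i = (i + 1) % size
    return length


def growth_probability(table, bin_in_chain):
    """Fraction of starting bins whose insertion lengthens the chain.

    Assumes the table has at least one empty bin.
    """
    size = len(table)
    before = chain_length(table, bin_in_chain)
    grows = 0
    for home in range(size):
        trial = list(table)
        i = home
        while trial[i] is not None:   # linear probe to the first empty bin
            i = (i + 1) % size
        trial[i] = "new"
        if chain_length(trial, bin_in_chain) > before:
            grows += 1
    return Fraction(grows, size)


table = [None] * 100                  # M = 100 (last two digits)
table[30:34] = [230, 531, 730, 432]   # chain of length m = 4
p = growth_probability(table, 30)
print(p)
```

Only starting bins 29 through 34 lengthen the chain, giving 6 of the 100 bins.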
Primary Clustering
• The length of these chains will affect the number of
probes required to perform insertions, accesses, or
removals
• It is possible to estimate the average number of probes
for a successful search, where λ is the load factor:
$$\frac{1}{2}\left(1 + \frac{1}{1 - \lambda}\right)$$

• For example: if λ = 0.5, we require 1.5 probes


Primary Clustering
• The number of probes for an unsuccessful search or for
an insertion is higher:
$$\frac{1}{2}\left(1 + \frac{1}{(1 - \lambda)^2}\right)$$
• For 0 ≤ λ < 1, (1 – λ)² ≤ 1 – λ, and therefore the
reciprocal will be larger
• Again, if λ = 0.5 then we require 2.5 probes
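Both estimates are easy to tabulate. The Python sketch below evaluates the two formulas for a few load factors; the function names are chosen for this example only.

```python
def probes_successful(load):
    """Estimated probes for a successful search at load factor `load` (< 1)."""
    return 0.5 * (1 + 1 / (1 - load))


def probes_unsuccessful(load):
    """Estimated probes for an unsuccessful search or an insertion."""
    return 0.5 * (1 + 1 / (1 - load) ** 2)


for load in (0.25, 0.5, 0.75, 0.9):
    print(f"load {load:.2f}: "
          f"successful {probes_successful(load):.2f}, "
          f"unsuccessful {probes_unsuccessful(load):.2f}")
```

The unsuccessful-search estimate grows much faster as the load factor approaches 1, since its denominator is squared.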
Primary Clustering
• The following plot shows how the number of required
probes increases
Primary Clustering
• Our goal was to keep all operations O(1)
• Unfortunately, as λ grows, so does the run time
• One solution is to keep the load factor under a given
bound
• If we choose λ = 2/3, then the number of probes for
a successful and an unsuccessful search is 2 and 5,
respectively
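As a check, substituting λ = 2/3 into the two probe estimates gives:

```latex
$$\frac{1}{2}\left(1 + \frac{1}{1 - 2/3}\right) = \frac{1}{2}(1 + 3) = 2,
\qquad
\frac{1}{2}\left(1 + \frac{1}{(1 - 2/3)^2}\right) = \frac{1}{2}(1 + 9) = 5$$
```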
Primary Clustering
• Therefore, we have three choices:
– Choose M large enough so that we will not pass this load factor
– Double the number of bins if the chosen load factor is reached
– Choose a different strategy from linear probing
Primary Clustering
• The first solution (choose M sufficiently large) is most
useful if we know all the possible entries
• The second (doubling) is only useful if we have an
environment where we can dynamically allocate memory
• For the third, we will look at quadratic probing and
double hashing
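The doubling strategy might look like the following Python sketch. Everything here is an assumption made for illustration: the class name `LinearProbingSet`, the initial capacity of 8, the bound of 2/3, and the use of Python's built-in `hash`. `None` marks an empty bin, so it cannot be stored as a key.

```python
class LinearProbingSet:
    """Set with linear probing that doubles its bins to cap the load factor."""

    def __init__(self, capacity=8, max_load=2 / 3):
        self._bins = [None] * capacity
        self._count = 0
        self._max_load = max_load  # must be below 1 so an empty bin always exists

    @staticmethod
    def _place(bins, key):
        """Probe linearly; return True if key was newly stored."""
        i = hash(key) % len(bins)
        while bins[i] is not None:
            if bins[i] == key:
                return False
            i = (i + 1) % len(bins)
        bins[i] = key
        return True

    def add(self, key):
        # Double first if one more key would exceed the chosen load factor.
        if (self._count + 1) / len(self._bins) > self._max_load:
            bigger = [None] * (2 * len(self._bins))
            for old in self._bins:
                if old is not None:
                    self._place(bigger, old)  # rehash into the larger table
            self._bins = bigger
        if self._place(self._bins, key):
            self._count += 1

    def __contains__(self, key):
        i = hash(key) % len(self._bins)
        while self._bins[i] is not None:
            if self._bins[i] == key:
                return True
            i = (i + 1) % len(self._bins)
        return False


s = LinearProbingSet()
for k in range(20):
    s.add(k)
print(len(s._bins), 7 in s, 20 in s)
```

Every entry has to be rehashed after doubling, because its starting bin depends on the table size.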
