Lect Hashing
Introduction
• Searching methods studied so far …
• Linear searching (O(n) search time)
• Binary searching (O(lg n) search time)
Hash Table
• A hash table is a data structure that stores elements and allows
insertions, lookups, and deletions to be performed in O(1) time.
• Example hash function: h(k) = k % m, with m = 8
• Where are the pairs stored?
• When applied to equal objects, the hash function returns the same
number for each.
Advantages:
• fast, requires only one operation
Disadvantage:
• Certain values of m are bad, e.g.,
• Power of 2
• Non-prime numbers
Generally a prime number is the best choice of m to spread keys evenly.
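To illustrate why the choice of m matters, here is a minimal sketch of the division hash h(k) = k % m. The function name `division_hash` and the sample keys (multiples of 8) are assumptions chosen for illustration, not taken from the lecture.

```python
def division_hash(key: int, m: int) -> int:
    """Division-method hash: the remainder of key divided by table size m."""
    return key % m


# Hypothetical keys that all share a factor with a power-of-2 table size.
keys = [8, 16, 24, 32, 40, 48]

# With m = 8 (a power of 2) every key lands in the same slot.
slots_m8 = [division_hash(k, 8) for k in keys]

# With m = 7 (a prime) the same keys spread over different slots.
slots_m7 = [division_hash(k, 7) for k in keys]

print(slots_m8)  # [0, 0, 0, 0, 0, 0]
print(slots_m7)  # [1, 2, 3, 4, 5, 6]
```

This is only one unfavorable key pattern; it shows the kind of clustering a poor m can cause, not a proof that primes are always best.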
Hash Function Methods
The Folding Method: The key K is partitioned into a number of parts,
each of which has the same length as the required address with the
possible exception of the last part.
• These parts are then added together, ignoring the final carry, to
form an address.
Example:
• If key=356942781 is to be transformed into a three digit address
• P1=356, P2=942, P3=781 are added to yield 079
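The folding example above can be sketched as follows. The function name `fold_hash` is hypothetical, and the sketch assumes a non-negative decimal key split from left to right.

```python
def fold_hash(key: int, address_digits: int) -> int:
    """Folding method: split the decimal key into parts, add them, drop the carry."""
    digits = str(key)
    # Split left to right into parts of address_digits; the last part may be shorter.
    parts = [int(digits[i:i + address_digits])
             for i in range(0, len(digits), address_digits)]
    # Keeping only the low address_digits digits discards the final carry.
    return sum(parts) % (10 ** address_digits)


address = fold_hash(356942781, 3)
print(f"{address:03d}")  # 356 + 942 + 781 = 2079, carry dropped -> 079
```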
Example (squaring the key and taking middle digits, i.e. the mid-square method):
• If key=123456 is to be transformed
• (123456)^2 = 15241383936
• If a 3-digit address is required, positions 5 to 7 are chosen, giving 138
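The squaring example above (commonly called the mid-square method) can be checked with a short sketch. The function name `mid_square_hash` and its 1-based, left-to-right position parameters are assumptions made to match the wording of the example.

```python
def mid_square_hash(key: int, start: int, end: int) -> int:
    """Square the key and return the digits at 1-based positions start..end
    of the square, counted from the left."""
    squared = str(key * key)
    return int(squared[start - 1:end])


print(123456 * 123456)                # 15241383936
print(mid_square_hash(123456, 5, 7))  # 138
```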
Now what?
(72,e) (33,c) (3,d) (85,f)(22,a)
Separate Chaining
• Idea is to keep a list of all elements that hash to the same value.
• The array elements are pointers to the first nodes of the lists.
• A new item is inserted to the front of the list.
• Advantages:
• Better space utilization for large number of items.
• Simple collision handling: searching linked list.
• Overflow: we can store more items than the hash table size.
• Deletion is quick and easy: deletion from the linked list.
[Figure: separate chaining hash table, cells 1 to 9 shown. Chains: 1 → 81, 1; 4 → 64, 4; 5 → 25; 6 → 36, 16; 9 → 49, 9; cells 2, 3, 7, 8 empty]
• Find:
• Locate the cell using hash function.
• Sequential search on the linked list in that cell.
• Insertion:
• Locate the cell using hash function.
• (If the item does not exist) insert it as the first item in the list.
• Deletion:
• Locate the cell using hash function.
• Delete the item from the linked list.
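The find, insertion, and deletion steps above can be sketched as a small class. This is a minimal illustration under stated assumptions: Python lists stand in for linked lists (so front insertion is not truly O(1) here), the hash function is key % size, and the keys are the first ten perfect squares, chosen because they are consistent with the chains in the lecture's figure. The class name `ChainedHashTable` is hypothetical.

```python
class ChainedHashTable:
    """Separate chaining: each cell holds the chain of keys that hash to it."""

    def __init__(self, size: int = 10):
        self.size = size
        self.cells = [[] for _ in range(size)]

    def _chain(self, key: int) -> list:
        # Locate the cell using the hash function (assumed: key % size).
        return self.cells[key % self.size]

    def find(self, key: int) -> bool:
        # Sequential search on the chain in that cell.
        return key in self._chain(key)

    def insert(self, key: int) -> None:
        chain = self._chain(key)
        if key not in chain:
            chain.insert(0, key)  # new items go to the front of the chain

    def delete(self, key: int) -> None:
        chain = self._chain(key)
        if key in chain:
            chain.remove(key)


table = ChainedHashTable(10)
for k in [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]:
    table.insert(k)

print(table.cells[6])  # [36, 16]
table.delete(36)
print(table.find(36))  # False
```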
• How long does it take to search for an element with a given key?
• Worst case: Θ(n)
• If we can choose TableSize to be about N, search time is constant on average.
• Assuming each element is equally likely to be hashed to any bucket, the
expected running time is constant, independent of N.
Open Addressing
• If we have enough contiguous memory to store all the keys
(TableSize > N) → store the keys in the table itself
• No need to use linked lists anymore
• Generally the load factor should be below 0.5.
• Basic idea:
• Insertion: if a slot is full, try another one, until you find an empty one
• Search: follow the same sequence of probes
• Deletion: more difficult ... (we’ll see why)
• Example:
• Insert items with keys: 89, 18, 49, 58, 9 into an empty hash table.
• Table size is 10.
• Hash function is hash(x) = x mod 10.
Figure 20.4: Linear probing hash table after each insertion
• Three cases:
• Position in table is occupied with an element of equal key
• Position in table is empty
• Position in table occupied with a different element
• For Case 3, probe the next higher index until the element is found
or an empty position is found
• The process wraps around to the beginning of the table
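The linear probing rules above can be sketched on the lecture's own example (keys 89, 18, 49, 58, 9; table size 10; hash(x) = x mod 10). The function name `linear_probe_insert` and the use of `None` for an empty slot are assumptions of this sketch.

```python
def linear_probe_insert(table: list, key: int) -> int:
    """Insert key using linear probing; return the index where it is stored."""
    size = len(table)
    home = key % size
    for step in range(size):
        index = (home + step) % size  # wraps around to the start of the table
        if table[index] is None:      # empty position: store the key here
            table[index] = key
            return index
        if table[index] == key:       # equal key already present
            return index
        # occupied by a different key: probe the next higher index
    raise RuntimeError("hash table is full")


table = [None] * 10
for key in [89, 18, 49, 58, 9]:
    linear_probe_insert(table, key)

print(table)  # [49, 58, 9, None, None, None, None, None, 18, 89]
```

Keys 49, 58, and 9 all collide and wrap around to indices 0, 1, and 2, which shows how linear probing builds up a cluster.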
Quadratic Probing
• Quadratic probing eliminates the primary clustering problem of
linear probing.
• Problem:
• We may not be sure that we will probe all locations in the table
(i.e. there is no guarantee to find an empty cell if table is more
than half full.)
• If the hash table size is not prime, this problem will be much
more severe.
• However, there is a theorem stating that:
• If the table size is prime and load factor is not larger than 0.5,
all probes will be to different locations and an item can always
be inserted.
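A quick check of this behaviour, assuming the usual quadratic probe sequence (home + i^2) mod TableSize. The function name `quadratic_probe_sequence` and the table sizes 11 and 16 are illustrative assumptions.

```python
def quadratic_probe_sequence(home: int, table_size: int, probes: int) -> list:
    """Return the first `probes` slots visited: (home + i*i) % table_size."""
    return [(home + i * i) % table_size for i in range(probes)]


# Prime table size 11: the first 6 probes (over half the table) are all different.
prime_probes = quadratic_probe_sequence(0, 11, 6)

# Non-prime size 16: sixteen probes reach only four distinct slots.
composite_probes = quadratic_probe_sequence(0, 16, 16)

print(prime_probes)                   # [0, 1, 4, 9, 5, 3]
print(sorted(set(composite_probes)))  # [0, 1, 4, 9]
```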
Some Considerations
• How efficient is calculating the quadratic probes?
• Linear probing is easily implemented. Quadratic probing appears
to require * and % operations.
• However by the use of the following trick, this is overcome:
• H_i = (H_{i-1} + 2i - 1) mod M
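The recurrence works because i^2 - (i-1)^2 = 2i - 1, so each probe is the previous one plus an odd increment. A minimal sketch comparing the incremental form with the direct form (H_0 + i^2) mod M follows; the function names and the sample values M = 11, H_0 = 7 are assumptions.

```python
def quadratic_probes_direct(home: int, m: int, probes: int) -> list:
    """Direct form: uses a multiplication and a modulo for every probe."""
    return [(home + i * i) % m for i in range(probes)]


def quadratic_probes_incremental(home: int, m: int, probes: int) -> list:
    """Incremental form: H_i = H_{i-1} + 2i - 1, reduced by subtraction."""
    current = home % m
    slots = [current]
    for i in range(1, probes):
        current += 2 * i - 1
        while current >= m:  # subtraction replaces the % operation
            current -= m
        slots.append(current)
    return slots


print(quadratic_probes_direct(7, 11, 6))       # [7, 8, 0, 5, 1, 10]
print(quadratic_probes_incremental(7, 11, 6))  # [7, 8, 0, 5, 1, 10]
```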
• What happens if load factor gets too high?
• Dynamically expand the table as soon as the load factor reaches 0.5;
this is called rehashing.
• Always roughly double the size, rounding to a prime number.
• When expanding the hash table, reinsert every item into the new table
using the new hash function.
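Rehashing can be sketched as below. This is one possible reading under stated assumptions: the new size is the first prime at or above twice the old size, collisions in the new table are resolved with linear probing, and the hash function is key % size. The helper names `is_prime`, `next_prime`, and `rehash` are hypothetical.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for table sizes."""
    if n < 2:
        return False
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True


def next_prime(n: int) -> int:
    """Smallest prime greater than or equal to n."""
    while not is_prime(n):
        n += 1
    return n


def rehash(old_table: list) -> list:
    """Build a larger table and reinsert every key using the new table size."""
    new_size = next_prime(2 * len(old_table))
    new_table = [None] * new_size
    for key in old_table:
        if key is None:
            continue
        index = key % new_size  # the new hash function uses the new size
        while new_table[index] is not None:
            index = (index + 1) % new_size  # linear probing (assumed)
        new_table[index] = key
    return new_table


old = [49, 58, 9, None, None, None, None, None, 18, 89]  # load factor 0.5
new = rehash(old)
print(len(new))  # 23
```

Note that keys cannot simply be copied to the same indices: each one must be hashed again, because key % 23 generally differs from key % 10.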
Double Hashing
• Use one hash function to determine the first slot
• Use a second hash function to determine the increment for
the probe sequence
• h(k, i) = (h1(k) + i * h2(k)) mod m, i = 0, 1, ...
• Example (m = 13, i = 1): h(14,1) = (h1(14) + h2(14)) mod 13
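A minimal sketch of the double hashing probe sequence for key 14 in a table of size 13. The table size comes from the lecture's example; the two hash functions h1(k) = k mod 13 and h2(k) = 1 + (k mod 11) are assumptions (a common textbook choice), since the lecture text does not define them.

```python
def double_hash_probe(key: int, i: int, m: int = 13) -> int:
    """Return the i-th probe position h(k, i) = (h1(k) + i * h2(k)) mod m."""
    h1 = key % m              # assumed first hash function
    h2 = 1 + (key % (m - 2))  # assumed second hash function, never zero
    return (h1 + i * h2) % m


# First three probe positions for key 14: h1(14) = 1, h2(14) = 4.
print([double_hash_probe(14, i) for i in range(3)])  # [1, 5, 9]
```

Keeping h2(k) nonzero matters: an increment of zero would probe the same slot forever.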
Hashing Applications
• Compilers use hash tables to implement the symbol table (a
data structure to keep track of declared variables).
• Game programs use hash tables to keep track of positions they
have encountered (transposition table).
• Online spelling checkers.
Summary
• Hash tables can be used to implement the insert and find
operations in constant average time.
• The cost depends on the load factor, not on the number of items in the
table.