
Hashing

A hash function maps objects to numbers in such a way that equal objects produce the same number and unequal objects are unlikely to produce the same number. This lets data stored in a hash table be searched very quickly, in constant time on average. A collision occurs when two objects hash to the same location. Two ways to resolve collisions are chaining, which keeps all colliding items in a list at that location, and open addressing, which stores them in alternate free locations. Rehashing resolves a collision by computing another hash function.

Uploaded by stanyatan10
Copyright © All Rights Reserved

Preview
• A hash function is a function that:
  – When applied to an Object, returns a number
  – When applied to equal Objects, returns the same number for each
  – When applied to unequal Objects, is very unlikely to return the same number for each
• Hash functions turn out to be very important for searching, that is, looking things up fast
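In Java, these three properties correspond to the general contract of Object.hashCode(), which must agree with equals(). The sketch below is a minimal illustration. It assumes a hypothetical Point class that is not from the slides, whose hashCode() is built with java.util.Objects.hash from the same fields that equals() compares. The "Aa"/"BB" line shows that "very unlikely" is not "never": those two unequal Strings have the same hash code.

```java
import java.util.Objects;

// Hypothetical value class: equals() compares x and y, and hashCode()
// is computed from exactly those fields, so equal Points get equal numbers.
final class Point {
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y); // combines the same fields equals() uses
    }
}

class HashContractDemo {
    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(1, 2); // a different object, but equal to a
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode()); // true

        // Unequal objects can still share a hash code.
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
    }
}
```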
Searching
• Consider the problem of searching an array for a given value
  – If the array is not sorted, the search requires O(n) time
    – If the value isn’t there, we need to search all n elements
    – If the value is there, we search n/2 elements on average
  – If the array is sorted, we can do a binary search
    – A binary search requires O(log n) time
    – It is about equally fast whether the element is found or not
• It doesn’t seem like we could do much better
  – How about an O(1), that is, constant-time search?
  – We can do it if the array is organized in a particular way
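For comparison with hashing, here is a sketch of the two baseline searches described above. The class and method names are illustrative only. The linear search may have to examine all n elements. The binary search halves the remaining range of a sorted array on each step, which gives O(log n).

```java
class SearchDemo {
    // O(n): may have to examine every element of an unsorted array.
    static int linearSearch(int[] a, int target) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == target) return i;
        }
        return -1; // examined all n elements without finding target
    }

    // O(log n): requires a sorted array; halves the search range each step.
    static int binarySearch(int[] sorted, int target) {
        int lo = 0, hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2; // avoids int overflow of (lo + hi)
            if (sorted[mid] == target) return mid;
            if (sorted[mid] < target) lo = mid + 1;
            else hi = mid - 1;
        }
        return -1; // range became empty: target is not present
    }

    public static void main(String[] args) {
        System.out.println(linearSearch(new int[] {7, 3, 9, 1}, 9)); // 2
        System.out.println(binarySearch(new int[] {1, 3, 7, 9}, 7)); // 2
    }
}
```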
Hashing
• Suppose we were to come up with a “magic function” that, given a value to search for, would tell us exactly where in the array to look
  – If it’s in that location, it’s in the array
  – If it’s not in that location, it’s not in the array
• This function would have no other purpose
• If we look at the function’s inputs and outputs, they probably won’t “make sense”
• This function is called a hash function because it “makes hash” of its inputs
Example (ideal) hash function
• Suppose our hash function gave us the following values:
  hashCode("apple") = 5
  hashCode("watermelon") = 3
  hashCode("grapes") = 8
  hashCode("cantaloupe") = 7
  hashCode("kiwi") = 0
  hashCode("strawberry") = 9
  hashCode("mango") = 6
  hashCode("banana") = 2
  Table:
  0  kiwi
  1
  2  banana
  3  watermelon
  4
  5  apple
  6  mango
  7  cantaloupe
  8  grapes
  9  strawberry
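The ideal example can be sketched as a 10-slot array. The slide's made-up hash values are hard-coded in a hypothetical magicHash method, which is a stand-in rather than a real hash function. Membership then takes a single array access: if the value is not in its slot, it is not in the table.

```java
class IdealHashDemo {
    // The slide's hypothetical hash values, hard-coded for illustration only.
    static int magicHash(String s) {
        switch (s) {
            case "kiwi":       return 0;
            case "banana":     return 2;
            case "watermelon": return 3;
            case "apple":      return 5;
            case "mango":      return 6;
            case "cantaloupe": return 7;
            case "grapes":     return 8;
            case "strawberry": return 9;
            default:           return -1; // a value the slide gives no hash for
        }
    }

    private final String[] table = new String[10];

    void add(String s) {
        table[magicHash(s)] = s; // assumes s is one of the eight fruits above
    }

    // One array access decides membership; no scanning is needed.
    boolean contains(String s) {
        int slot = magicHash(s);
        return slot >= 0 && s.equals(table[slot]);
    }

    public static void main(String[] args) {
        IdealHashDemo set = new IdealHashDemo();
        for (String f : new String[] {"apple", "watermelon", "grapes", "cantaloupe",
                                      "kiwi", "strawberry", "mango", "banana"}) {
            set.add(f);
        }
        System.out.println(set.contains("mango")); // true
        System.out.println(set.contains("lemon")); // false
    }
}
```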
Sets and tables
• Sometimes we just want a set of things: objects are either in it, or they are not in it
• Sometimes we want a map: a way of looking up one thing based on the value of another
  – We use a key to find a place in the map
  – The associated value is the information we are trying to look up
• Hashing works the same for both sets and maps
  – Most of our examples will be sets
  Table (slot, key, value):
  ...
  141
  142  robin    robin info
  143  sparrow  sparrow info
  144  hawk     hawk info
  145  seagull  seagull info
  146
  147  bluejay  bluejay info
  148  owl      owl info
  ...
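Java's standard library provides both forms. java.util.HashSet is a hash-table-backed Set, and java.util.HashMap is a hash-table-backed Map. The sketch below uses the bird names from the table. The helper methods birdSet and birdInfo are illustrative names only.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class SetVsMapDemo {
    // A set: each bird is simply in it or not.
    static Set<String> birdSet() {
        Set<String> birds = new HashSet<>();
        for (String b : new String[] {"robin", "sparrow", "hawk", "seagull", "bluejay", "owl"}) {
            birds.add(b);
        }
        return birds;
    }

    // A map: the bird's name is the key; its "info" string is the associated value.
    static Map<String, String> birdInfo() {
        Map<String, String> info = new HashMap<>();
        for (String b : birdSet()) {
            info.put(b, b + " info");
        }
        return info;
    }

    public static void main(String[] args) {
        System.out.println(birdSet().contains("robin")); // true
        System.out.println(birdSet().contains("cow"));   // false
        System.out.println(birdInfo().get("hawk"));      // hawk info
    }
}
```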
Example imperfect hash function
• Suppose our hash function gave us the following values:
  hash("apple") = 5
  hash("watermelon") = 3
  hash("grapes") = 8
  hash("cantaloupe") = 7
  hash("kiwi") = 0
  hash("strawberry") = 9
  hash("mango") = 6
  hash("banana") = 2
  hash("honeydew") = 6
• Now "honeydew" and "mango" both hash to location 6
  Table:
  0  kiwi
  1
  2  banana
  3  watermelon
  4
  5  apple
  6  mango
  7  cantaloupe
  8  grapes
  9  strawberry
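To make the problem concrete, here is a sketch that places values using the slide's hypothetical hash values (hard-coded in an illustrative hash method) without any collision handling. Once mango occupies slot 6, honeydew has nowhere to go.

```java
class ImperfectHashDemo {
    // The slide's hypothetical hash values; "honeydew" shares slot 6 with "mango".
    static int hash(String s) {
        switch (s) {
            case "kiwi":       return 0;
            case "banana":     return 2;
            case "watermelon": return 3;
            case "apple":      return 5;
            case "mango":
            case "honeydew":   return 6;
            case "cantaloupe": return 7;
            case "grapes":     return 8;
            case "strawberry": return 9;
            default: throw new IllegalArgumentException("no slide hash for " + s);
        }
    }

    final String[] table = new String[10];

    // Naive placement with no collision resolution: returns false on a collision.
    boolean tryPlace(String s) {
        int slot = hash(s);
        if (table[slot] != null && !table[slot].equals(s)) {
            return false; // slot already taken by a different value
        }
        table[slot] = s;
        return true;
    }

    public static void main(String[] args) {
        ImperfectHashDemo d = new ImperfectHashDemo();
        System.out.println(d.tryPlace("mango"));    // true
        System.out.println(d.tryPlace("honeydew")); // false: collision at slot 6
    }
}
```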
Hashing
[Figure: the hash function h maps each actual key in K (k1, ..., k5), drawn from the universe of keys, to a slot between 0 and m–1; h(k2) = h(k5) is a collision.]
(ii) Division Method
• Map a key k into one of the m slots by taking the remainder of k divided by m. That is,
h(k) = k mod m
• Example: m = 31 and k = 78 ⇒ h(k) = 16, since 78 = 2 · 31 + 16.
• Advantage: fast, since it requires just one division operation.
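A direct sketch of h(k) = k mod m follows. The slide does not discuss negative keys. As an added assumption, this version uses Math.floorMod so the result always lands in 0..m−1, because Java's % operator returns a negative remainder for a negative k.

```java
class DivisionHash {
    // h(k) = k mod m, kept in the range 0..m-1 even for negative keys.
    static int hash(int k, int m) {
        return Math.floorMod(k, m);
    }

    public static void main(String[] args) {
        System.out.println(hash(78, 31)); // 16, since 78 = 2 * 31 + 16
        System.out.println(hash(-5, 31)); // 26 (plain -5 % 31 would give -5)
    }
}
```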
(Mid Square Method)
Collisions
• When two values hash to the same array location, this is called a collision
• Collisions are normally treated as “first come, first served”: the first value that hashes to the location gets it
• We have to find something to do with the second and subsequent values that hash to this same location
Methods of Resolution
• Chaining:
  – Store all elements that hash to the same slot in a linked list.
  – Store a pointer to the head of the linked list in the hash table slot.
  [Figure: table slots 0 to m–1, each pointing to a linked list of the keys that hash there: k1 → k4; k5 → k2 → k6; k7 → k3; k8.]
• Open Addressing:
  – All elements are stored in the hash table itself.
  – When collisions occur, use a systematic (consistent) procedure to store elements in free slots of the table.
(Linear Probing)
Insertion, I
• Suppose you want to add seagull to this hash table
• Also suppose:
  – hashCode(seagull) = 143
  – table[143] is not empty
  – table[143] != seagull
  – table[144] is not empty
  – table[144] != seagull
  – table[145] is empty
• Therefore, put seagull at location 145
  Table (other slots not shown): 142 robin, 143 sparrow, 144 hawk, 145 seagull, 147 bluejay, 148 owl; 141 and 146 are empty
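The insertion steps above can be sketched as a minimal linear-probing set. This is not a production table: it has no resizing and no deletion. The hash function is passed in so the slides' numbers can be reproduced. The slides give 143 as the hash code for hawk and seagull, and 147 for cardinal. The codes for robin, sparrow, bluejay and owl are assumptions chosen to match where those birds sit in the table. The table has 149 slots because the last location is 148.

```java
import java.util.Map;
import java.util.function.ToIntFunction;

class LinearProbingSet {
    private final String[] table;
    private final ToIntFunction<String> hash;

    LinearProbingSet(int capacity, ToIntFunction<String> hash) {
        this.table = new String[capacity];
        this.hash = hash;
    }

    // Returns the slot holding key after the call, or -1 if the table is full.
    int add(String key) {
        int m = table.length;
        int i = Math.floorMod(hash.applyAsInt(key), m);
        for (int probes = 0; probes < m; probes++) {
            if (table[i] == null) {     // empty slot: claim it
                table[i] = key;
                return i;
            }
            if (table[i].equals(key)) { // already present: do nothing
                return i;
            }
            i = (i + 1) % m;            // try the next slot, treating the table as circular
        }
        return -1;                      // every slot is occupied
    }
}

class LinearProbingDemo {
    public static void main(String[] args) {
        // Hypothetical hash codes; hawk and seagull use the slides' value 143.
        Map<String, Integer> codes = Map.of(
            "robin", 142, "sparrow", 143, "hawk", 143,
            "seagull", 143, "bluejay", 147, "owl", 148);
        LinearProbingSet set = new LinearProbingSet(149, key -> codes.get(key));
        for (String b : new String[] {"robin", "sparrow", "hawk", "bluejay", "owl"}) {
            set.add(b);
        }
        System.out.println(set.add("seagull")); // 145: slots 143 and 144 are taken
    }
}
```

The same loop returns early when the key is already in the table. It also wraps from the last slot back to 0.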
Searching, I
• Suppose you want to look up seagull in this hash table
• Also suppose:
  – hashCode(seagull) = 143
  – table[143] is not empty
  – table[143] != seagull
  – table[144] is not empty
  – table[144] != seagull
  – table[145] is not empty
  – table[145] == seagull!
• We found seagull at location 145
  Table (other slots not shown): 142 robin, 143 sparrow, 144 hawk, 145 seagull, 147 bluejay, 148 owl; 141 and 146 are empty
Searching, II
• Suppose you want to look up cow in this hash table
• Also suppose:
  – hashCode(cow) = 144
  – table[144] is not empty
  – table[144] != cow
  – table[145] is not empty
  – table[145] != cow
  – table[146] is empty
• If cow were in the table, we should have found it by now
• Therefore, it isn’t here
  Table (other slots not shown): 142 robin, 143 sparrow, 144 hawk, 145 seagull, 147 bluejay, 148 owl; 141 and 146 are empty
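Both searches follow the same rule. Probe from the home slot until you either find the key, meaning it is present, or reach an empty slot, meaning it is absent. In this sketch the home slot is passed in directly, so the slides' hypothetical values (seagull → 143, cow → 144) can be used. The array is filled to match the table shown.

```java
class LinearProbingSearch {
    // Probes from the home slot until the key or an empty slot is found.
    // Visits at most table.length slots, wrapping from the last slot to 0.
    static boolean contains(String[] table, String key, int home) {
        int m = table.length;
        int i = Math.floorMod(home, m);
        for (int probes = 0; probes < m; probes++) {
            if (table[i] == null) return false;    // empty: key would have been stored here
            if (table[i].equals(key)) return true; // found it
            i = (i + 1) % m;
        }
        return false; // table is full and key is not present
    }

    public static void main(String[] args) {
        String[] table = new String[149]; // slots 0..148, as on the slides
        table[142] = "robin";
        table[143] = "sparrow";
        table[144] = "hawk";
        table[145] = "seagull";
        table[147] = "bluejay";
        table[148] = "owl";
        System.out.println(contains(table, "seagull", 143)); // true (found at 145)
        System.out.println(contains(table, "cow", 144));     // false (stops at empty 146)
    }
}
```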
Insertion, II
• Suppose you want to add hawk to this hash table
• Also suppose:
  – hashCode(hawk) = 143
  – table[143] is not empty
  – table[143] != hawk
  – table[144] is not empty
  – table[144] == hawk
• hawk is already in the table, so do nothing
  Table (other slots not shown): 142 robin, 143 sparrow, 144 hawk, 145 seagull, 147 bluejay, 148 owl; 141 and 146 are empty
Insertion, III
• Suppose:
  – You want to add cardinal to this hash table
  – hashCode(cardinal) = 147
  – The last location is 148
  – 147 and 148 are occupied
• Solution:
  – Treat the table as circular; after 148 comes 0
  – Hence, cardinal goes in location 0 (or 1, or 2, or ..., whichever is the first empty location)
  Table (other slots not shown): 142 robin, 143 sparrow, 144 hawk, 145 seagull, 147 bluejay, 148 owl; 141 and 146 are empty
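Treating the table as circular comes down to computing the next slot modulo the table size. The sketch below assumes a 149-slot table and uses hypothetical helpers, next and firstFree. It reproduces the cardinal case: starting at 147, both 147 and 148 are full, so the search wraps to 0.

```java
class CircularProbe {
    // The slot after i in a table of size m; after the last slot (m - 1) comes 0.
    static int next(int i, int m) {
        return (i + 1) % m;
    }

    // Returns the first empty slot at or after home, wrapping around; -1 if full.
    static int firstFree(String[] table, int home) {
        int i = home;
        for (int probes = 0; probes < table.length; probes++) {
            if (table[i] == null) return i;
            i = next(i, table.length);
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] table = new String[149];
        table[147] = "bluejay";
        table[148] = "owl";
        System.out.println(firstFree(table, 147)); // 0: wraps past the last slot, 148
    }
}
```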
Collision Resolution by Chaining
[Figure: keys from the universe of keys hash into slots 0 to m–1, with collisions h(k1) = h(k4), h(k2) = h(k5) = h(k6), and h(k3) = h(k7); k8 hashes alone to slot h(k8).]
Collision Resolution by Chaining
[Figure: each slot from 0 to m–1 holds a linked list of the keys that hash to it: k1 → k4; k5 → k2 → k6; k7 → k3; k8.]
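A minimal separate-chaining set can be sketched with one java.util.LinkedList per slot. The slot index is the division method applied to Java's hashCode(), using Math.floorMod so it is never negative. This index scheme is an assumption; the slides do not say which hash function feeds the chains. New keys go at the head of the list, and the table itself never fills up.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedHashSet<K> {
    private final List<LinkedList<K>> slots;

    ChainedHashSet(int m) {
        slots = new ArrayList<>(m);
        for (int i = 0; i < m; i++) slots.add(new LinkedList<>());
    }

    // The chain for key's slot: hashCode() mod m, kept non-negative.
    private LinkedList<K> chainFor(K key) {
        return slots.get(Math.floorMod(key.hashCode(), slots.size()));
    }

    boolean add(K key) {
        LinkedList<K> chain = chainFor(key);
        if (chain.contains(key)) return false; // already present
        chain.addFirst(key);                   // insert at the head of the list
        return true;
    }

    boolean contains(K key) {
        return chainFor(key).contains(key);
    }

    boolean remove(K key) {
        return chainFor(key).remove(key);
    }
}
```

For example, with 10 slots the Integer keys 12 and 22 both land in slot 2 and simply share one chain. Integer.hashCode() returns the value itself.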
Rehashing
• In the event of a collision, another approach is to rehash: compute another hash function
  – Since we may need to rehash many times, we need an easily computable sequence of functions
• Simple example: in the case of hashing Strings, we might take the previous hash code and add the length of the String to it
  – It is probably better if the length of the String was not a component in computing the original hash function
• Possibly better yet: add the length of the String plus the number of probes made so far
  – Problem: are we sure we will look at every location in the array?
• Rehashing is a fairly uncommon approach, and we won’t pursue it any further here
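The coverage question has a concrete answer for the simple rule of adding the String's length each time. A fixed stride s in a table of size m visits only m / gcd(s, m) distinct slots before it repeats. The sketch below uses a hypothetical helper, distinctSlots, and treats the length directly as the stride. With a 4-character String, a 10-slot table reaches only half its slots, while an 11-slot table reaches all of them.

```java
import java.util.HashSet;
import java.util.Set;

class RehashCoverage {
    // Counts the distinct slots reached in m probes by the rule h' = (h + step) mod m.
    static int distinctSlots(int start, int step, int m) {
        Set<Integer> seen = new HashSet<>();
        int h = Math.floorMod(start, m);
        for (int probe = 0; probe < m; probe++) {
            seen.add(h);
            h = (h + step) % m;
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctSlots(0, 4, 10)); // 5: only slots 0, 4, 8, 2, 6
        System.out.println(distinctSlots(0, 4, 11)); // 11: every slot
    }
}
```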
