Suresh
Once the hash values have been computed, we can insert each item into the hash table
at its designated position.
Now when we want to search for an item, we simply use the hash function to compute the
slot name for the item and then check the hash table to see if it is present. This searching
operation is O(1), since a constant amount of time is required to compute the hash value
and then index the hash table at that location. If everything is where it should be, we have
found a constant time search algorithm.
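
The insert and search steps described above can be sketched as follows. This is only an illustration: the remainder hash (item % size), the table size of 11, and the sample values are assumptions, and no collisions are handled.

```python
# Minimal sketch: fixed-size table, remainder hash, no collision handling.
TABLE_SIZE = 11  # assumed size, matching the 11-slot examples in these notes


def hash_item(item, size=TABLE_SIZE):
    """Assumed hash function: the remainder of item divided by the table size."""
    return item % size


def insert(table, item):
    """Store item in its designated slot (assumes the slot is free)."""
    table[hash_item(item, len(table))] = item


def search(table, item):
    """One hash computation plus one index operation: constant time."""
    return table[hash_item(item, len(table))] == item


table = [None] * TABLE_SIZE
for value in (54, 26, 93, 17, 77, 31):  # sample values chosen so no two collide
    insert(table, value)

print(search(table, 93))  # True: 93 % 11 is 5 and slot 5 holds 93
print(search(table, 20))  # False: 20 % 11 is 9 and slot 9 holds 31
```

The search never scans the table; it looks at exactly one slot, which is why the cost does not grow with the number of stored items.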
Folding Method
• The folding method for constructing hash functions begins
by dividing the item into equal-size pieces (the last piece
may not be of equal size). These pieces are then added
together to give the resulting hash value.
• For example, if our item were the phone number 436-555-4601,
we would take the digits and divide them into groups
of 2 (43, 65, 55, 46, 01).
• After the addition, 43 + 65 + 55 + 46 + 01, we get 210. If we
assume our hash table has 11 slots, then we need to perform
the extra step of dividing by 11 and keeping the remainder.
• In this case 210 % 11 is 1, so the phone number 436-555-4601
hashes to slot 1.
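
A minimal sketch of the folding steps, reproducing the phone number example. The function name fold_hash and its parameters are illustrative choices, not part of the original material.

```python
def fold_hash(key, table_size=11, piece_size=2):
    """Folding method: split the digits into pieces, add them, take the remainder."""
    # Keep only the digits, so "436-555-4601" becomes "4365554601".
    digits = "".join(ch for ch in str(key) if ch.isdigit())
    # Cut into pieces of piece_size digits; the last piece may be shorter.
    pieces = [int(digits[i:i + piece_size])
              for i in range(0, len(digits), piece_size)]
    return sum(pieces) % table_size


print(fold_hash("436-555-4601"))  # 1, since 43 + 65 + 55 + 46 + 1 = 210 and 210 % 11 = 1
```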
Mid-Square Method
• We first square the item, and then extract some
portion of the resulting digits.
• For example, if the item were 44, we would
first compute 44² = 1,936.
• By extracting the middle two digits, 93, and
performing the remainder step, we get 5
(93 % 11).
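
The mid-square steps can be sketched as below. How to pick the "middle" digits when the square has an odd number of digits is not specified in the text, so the centring rule used here is an assumption, as are the function and parameter names.

```python
def mid_square_hash(item, table_size=11, num_digits=2):
    """Mid-square method: square the item, keep the middle digits, take the remainder."""
    squared = str(item * item)
    # Assumed rule: centre a window of num_digits digits, leaning left on ties.
    start = max(0, (len(squared) - num_digits) // 2)
    middle = int(squared[start:start + num_digits])
    return middle % table_size


print(mid_square_hash(44))  # 5, since 44 * 44 = 1936, middle digits 93, and 93 % 11 = 5
```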
Collision Resolution
• What is a collision?
• A collision occurs when two items hash to the same
slot. We must then have a systematic method for
placing the second item in the hash table. This
process is called collision resolution.
• If the hash function is perfect, collisions will
never occur. However, since this is often not
possible, collision resolution becomes a very
important part of hashing.
Open Addressing Technique: Linear Probing
• One method for resolving collisions looks into the hash
table and tries to find another open slot to hold the item that
caused the collision.
• A simple way to do this is to start at the original hash value
position and then move in a sequential manner through the
slots until we encounter the first slot that is empty.
• Note that we may need to go back to the first slot
(circularly) to cover the entire hash table. This collision
resolution process is referred to as open addressing in that
it tries to find the next open slot or address in the hash table.
• By systematically visiting each slot one at a time, we are
performing an open addressing technique called linear
probing.
• A disadvantage to linear probing is the tendency for
clustering; items become clustered in the table. This
means that if many collisions occur at the same hash
value, a number of surrounding slots will be filled by
the linear probing resolution.
• One way to deal with clustering is to extend the linear
probing technique so that instead of looking
sequentially for the next open slot, we skip slots,
thereby more evenly distributing the items that have
caused collisions. This will potentially reduce the
clustering that occurs.
• The general name for this process of looking for another
slot after a collision is rehashing.
• With simple linear probing, the rehash function is
newhashvalue = rehash(oldhashvalue), where
rehash(pos) = (pos + 1) % sizeoftable.
• The “plus 3” rehash can be defined as
rehash(pos) = (pos + 3) % sizeoftable.
• In general, rehash(pos) = (pos + skip) % sizeoftable. It is
important to note that the size of the “skip” must be such
that all the slots in the table will eventually be visited.
Otherwise, part of the table will be unused.
• To ensure this, it is often suggested that the table size be a
prime number, since any skip smaller than a prime table size
shares no common factor with it.
• A variation of the linear probing idea is called
quadratic probing. Instead of using a
constant “skip” value, we use a rehash
function that increments the hash value by 1, 3,
5, 7, 9, and so on. This means that if the first
hash value is h, the successive values are h+1,
h+4, h+9, h+16, and so on.
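
The probing schemes above can be sketched as follows. The remainder hash, the sample values, and all function names are assumptions made for illustration. The gcd check encodes the requirement that the skip must let every slot be visited.

```python
from math import gcd


def probe_insert(table, item, skip=1):
    """Insert with rehash(pos) = (pos + skip) % size; return the slot used."""
    size = len(table)
    if gcd(skip, size) != 1:
        # A skip sharing a factor with the size would never reach some slots.
        raise ValueError("skip must share no common factor with the table size")
    pos = item % size  # assumed remainder hash
    for _ in range(size):
        if table[pos] is None or table[pos] == item:
            table[pos] = item
            return pos
        pos = (pos + skip) % size  # wraps around to the first slot circularly
    raise RuntimeError("hash table is full")


def probe_search(table, item, skip=1):
    """Follow the same probe sequence; an empty slot means the item is absent."""
    size = len(table)
    pos = item % size
    for _ in range(size):
        if table[pos] is None:
            return False
        if table[pos] == item:
            return True
        pos = (pos + skip) % size
    return False


def quadratic_slots(h, size, count):
    """First `count` slots tried by quadratic probing: h, h+1, h+4, h+9, ..."""
    return [(h + i * i) % size for i in range(count)]


table = [None] * 11
for value in (54, 26, 93, 17, 77, 31, 44, 55, 20):
    probe_insert(table, value)  # skip=1 is simple linear probing

print(table)  # [77, 44, 55, 20, 26, 93, 17, None, None, 31, 54]
print(quadratic_slots(0, 11, 5))  # [0, 1, 4, 9, 5]
```

In this run 77, 44, and 55 all hash to slot 0, and 20 wraps around from slot 9, so slots 0 through 3 fill up as one cluster. Passing skip=3 to both functions gives the "plus 3" rehash.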
Chaining Technique
• An alternative method for handling the collision
problem is to allow each slot to hold a reference
to a collection (or chain) of items.
• Chaining allows many items to exist at the same
location in the hash table. When collisions
happen, the item is still placed in the proper slot
of the hash table. As more and more items hash to
the same location, the difficulty of searching for
the item in the collection increases.
• When we want to search for an item, we use
the hash function to generate the slot where it
should reside. Since each slot holds a
collection, we use a searching technique to
decide whether the item is present. The
advantage is that, on average, each slot is
likely to hold far fewer items than the table as
a whole, so searching one chain is typically
cheaper than searching the entire collection.
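
A minimal sketch of chaining, assuming a remainder hash and Python lists as the per-slot collections. The function names and sample values are illustrative only.

```python
def chain_insert(table, item):
    """Append item to the chain in its slot; collisions simply extend the chain."""
    chain = table[item % len(table)]  # assumed remainder hash
    if item not in chain:
        chain.append(item)


def chain_search(table, item):
    """Hash to the slot, then search only that slot's chain."""
    return item in table[item % len(table)]


table = [[] for _ in range(11)]  # one empty chain per slot
for value in (54, 26, 93, 17, 77, 31, 44, 55, 20):
    chain_insert(table, value)

print(table[0])                 # [77, 44, 55]: three items share slot 0
print(chain_search(table, 20))  # True: found in the chain at slot 9
```

The search within a chain here is a linear scan, so its cost grows with the chain length, which is the trade-off described above.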