
Suresh

Hashing is a technique for improving search. Items are stored in a hash table, where a hash function maps each item to a slot, so that retrieval can take O(1) time. Methods for constructing hash functions include the remainder method, the folding method, and the mid-square method, while collision resolution techniques such as linear probing and chaining handle cases where multiple items hash to the same slot. The document also discusses rehashing and the clustering problem that arises with linear probing.

Uploaded by yegom47911
Copyright © All Rights Reserved

HASHING

• We can make improvements in our search algorithms
by taking advantage of information about where items
are stored in the collection with respect to one another.
• For example, by knowing that a list was ordered, we
could search in logarithmic time using a binary search.
• In this section we will attempt to go one step further by
building a data structure that can be searched in O(1)
time. This concept is referred to as hashing.
• If every item is where it should be, then the search can
use a single comparison to discover the presence of an
item.
• A hash table is a collection of items which are stored
in such a way as to make it easy to find them later.
• Each position of the hash table, often called a slot,
can hold an item and is named by an integer value
starting at 0. For example, we will have a slot named
0, a slot named 1, a slot named 2, and so on.
• Initially, the hash table contains no items so every slot
is empty.
• The mapping between an item and the slot
where that item belongs in the hash table is
called the hash function.
• The hash function will take any item in the
collection and return an integer in the range of
slot names, between 0 and m-1, where m is the
number of slots in the table.
Remainder Method
• Assume that we have the set of integer items
54, 26, 93, 17, 77, and 31, and a hash table with 11 slots.
• The “remainder method” simply takes an item
and divides it by the table size, returning the
remainder as its hash value
(h(item) = item % 11).
• Once the hash values have been computed,
we can insert each item into the hash table at
the designated position.
Item Hash Value
54 10
26 4
93 5
17 6
77 0
31 9
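The table above can be reproduced with a short Python sketch. The function name remainder_hash is hypothetical, empty slots are represented by None, and placing items by direct assignment works only because none of these six items collide; collisions are handled later in this section.

```python
def remainder_hash(item, table_size):
    """Remainder method: the hash value is the remainder of item / table_size."""
    return item % table_size


TABLE_SIZE = 11
items = [54, 26, 93, 17, 77, 31]

for item in items:
    print(item, remainder_hash(item, TABLE_SIZE))

# Every slot starts empty (None); each item goes to the slot named by its hash value.
# Direct assignment is safe here only because these six hash values are all distinct.
hash_table = [None] * TABLE_SIZE
for item in items:
    hash_table[remainder_hash(item, TABLE_SIZE)] = item

print(hash_table)  # [77, None, None, None, 26, 93, 17, None, None, 31, 54]
```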


Now when we want to search for an item, we simply use the hash function to compute the
slot name for the item and then check the hash table to see if it is present. This searching
operation is O(1), since a constant amount of time is required to compute the hash value
and then index the hash table at that location. If everything is where it should be, we have
found a constant time search algorithm.
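A minimal sketch of that search, assuming the same 11-slot table and no collisions. The helper names build_table and contains are illustrative, not from the source; the point is that one hash computation and one comparison are all the work needed.

```python
def remainder_hash(item, table_size):
    return item % table_size


def build_table(items, table_size):
    """Place each item at its hash slot (assumes no two items collide)."""
    table = [None] * table_size
    for item in items:
        table[remainder_hash(item, table_size)] = item
    return table


def contains(table, item):
    """One hash computation and one comparison, independent of table size."""
    return table[remainder_hash(item, len(table))] == item


table = build_table([54, 26, 93, 17, 77, 31], 11)
print(contains(table, 93))  # True
print(contains(table, 44))  # False: 44 % 11 = 0, and slot 0 holds 77
```

This constant-time check is only correct while every item sits in its home slot; once collisions displace items, the search must follow the same resolution rule used for insertion.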
Folding Method
• The folding method for constructing hash functions begins
by dividing the item into equal-size pieces (the last piece
may not be of equal size). These pieces are then added
together to give the resulting hash value.
• For example, if our item were the phone number
436-555-4601, we would take the digits and divide them into groups
of 2 (43, 65, 55, 46, 01).
• After the addition, 43+65+55+46+01, we get 210. If we
assume our hash table has 11 slots, then we need to perform
the extra step of dividing by 11 and keeping the remainder.
• In this case 210 % 11 is 1, so the phone number
436-555-4601 hashes to slot 1.
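The folding steps can be sketched as follows. The name folding_hash and its group parameter are hypothetical, and non-digit characters such as the dashes in the phone number are assumed to be discarded before grouping.

```python
def folding_hash(text, table_size, group=2):
    """Folding method: split the digits into pieces of `group` digits
    (the last piece may be shorter), add the pieces, then take the
    remainder modulo table_size."""
    digits = "".join(ch for ch in text if ch.isdigit())  # drop the dashes
    pieces = [int(digits[i:i + group]) for i in range(0, len(digits), group)]
    return sum(pieces) % table_size


print(folding_hash("436-555-4601", 11))  # 1  (43+65+55+46+01 = 210; 210 % 11 = 1)
```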
Mid-square method
• We first square the item, and then extract some
portion of the resulting digits.
• For example, if the item were 44, we would
first compute 44² = 1,936.
• By extracting the middle two digits, 93, and
performing the remainder step, we get 5
(93 % 11).
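A sketch of the mid-square method under one assumption the slide leaves open: when the digits of the squared value cannot be split evenly around a middle pair, this version leans one digit to the left. The name mid_square_hash and the width parameter are illustrative.

```python
def mid_square_hash(item, table_size, width=2):
    """Mid-square method: square the item, extract the middle `width`
    digits, then apply the remainder step."""
    squared = str(item * item)
    # Integer division picks the left-leaning middle when the split is uneven.
    start = (len(squared) - width) // 2
    middle = int(squared[start:start + width])
    return middle % table_size


print(mid_square_hash(44, 11))  # 5  (44**2 = 1936, middle digits 93, 93 % 11 = 5)
```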
Collision Resolution
• What is a collision?
• A collision occurs when two items hash to the same slot. We
must then have a systematic method for placing the
second item in the hash table. This process is
called collision resolution.
• If the hash function is perfect, collisions will
never occur. However, since this is often not
possible, collision resolution becomes a very
important part of hashing.
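To see when collision resolution becomes necessary, the sketch below groups items by their remainder-method hash value. The six items from the remainder-method example occupy distinct slots, but adding 44, 55, and 20 (extra values assumed here for illustration) produces collisions at slots 0 and 9. find_collisions is a hypothetical helper.

```python
from collections import defaultdict


def find_collisions(items, table_size):
    """Group items by remainder-method hash value and keep only the slots
    that more than one item maps to."""
    by_slot = defaultdict(list)
    for item in items:
        by_slot[item % table_size].append(item)
    return {slot: group for slot, group in by_slot.items() if len(group) > 1}


print(find_collisions([54, 26, 93, 17, 77, 31], 11))              # {}
print(find_collisions([54, 26, 93, 17, 77, 31, 44, 55, 20], 11))  # {0: [77, 44, 55], 9: [31, 20]}
```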
Open addressing technique: Linear probing
• One method for resolving collisions looks into the hash
table and tries to find another open slot to hold the item that
caused the collision.
• A simple way to do this is to start at the original hash value
position and then move in a sequential manner through the
slots until we encounter the first slot that is empty.
• Note that we may need to go back to the first slot
(circularly) to cover the entire hash table. This collision
resolution process is referred to as open addressing in that
it tries to find the next open slot or address in the hash table.
• By systematically visiting each slot one at a time, we are
performing an open addressing technique called linear
probing.
• A disadvantage to linear probing is the tendency for
clustering; items become clustered in the table. This
means that if many collisions occur at the same hash
value, a number of surrounding slots will be filled by
the linear probing resolution.
• One way to deal with clustering is to extend the linear
probing technique so that instead of looking
sequentially for the next open slot, we skip slots,
thereby more evenly distributing the items that have
caused collisions. This will potentially reduce the
clustering that occurs.
• The general name for this process of looking for another
slot after a collision is rehashing.
• With simple linear probing, the rehash function is
newhashvalue=rehash(oldhashvalue) where
rehash(pos)=(pos+1)%sizeoftable.
• The “plus 3” rehash can be defined as
rehash(pos)=(pos+3)%sizeoftable.
• In general, rehash(pos)=(pos+skip)%sizeoftable. It is
important to note that the size of the “skip” must be such
that all the slots in the table will eventually be visited,
which holds exactly when skip and the table size have no
common factor other than 1. Otherwise, part of the table will be unused.
• To ensure this, it is often suggested that the table size be a
prime number, since then every skip from 1 to sizeoftable-1 qualifies.
• A variation of the linear probing idea is called
quadratic probing. Instead of using a
constant “skip” value, we use a rehash
function that increments the hash value by 1, 3,
5, 7, 9, and so on. This means that if the first
hash value is h, the successive values are h+1,
h+4, h+9, h+16, and so on.
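Quadratic probing can be sketched the same way. The helper names are hypothetical, and the nine items (the six from the remainder-method example plus 44, 55, and 20, assumed here to force collisions) are placed into an 11-slot table.

```python
def quadratic_probe_positions(h, table_size, count):
    """First `count` slots examined by quadratic probing from hash value h:
    h, h+1, h+4, h+9, h+16, ... (mod table_size). The gaps between
    successive offsets are 1, 3, 5, 7, ..."""
    return [(h + i * i) % table_size for i in range(count)]


def put_quadratic(table, item):
    """Insert item at the first empty slot along its quadratic probe sequence."""
    size = len(table)
    for pos in quadratic_probe_positions(item % size, size, size):
        if table[pos] is None:
            table[pos] = item
            return pos
    # Unlike linear probing, the sequence may revisit slots and miss others,
    # so this can fail even when the table still has empty slots.
    raise RuntimeError("quadratic probing found no empty slot")


table = [None] * 11
for x in [54, 26, 93, 17, 77, 31, 44, 55, 20]:
    put_quadratic(table, x)
print(table)  # [77, 44, 20, 55, 26, 93, 17, None, None, 31, 54]
```

Because i² mod 11 takes only six distinct values, the probe sequence from a given slot never reaches the other five. For an odd prime table size m, the first (m+1)/2 probe positions are all distinct, so an empty slot is guaranteed to be found while the table is less than half full.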
Chaining Technique
• An alternative method for handling the collision
problem is to allow each slot to hold a reference
to a collection (or chain) of items.
• Chaining allows many items to exist at the same
location in the hash table. When collisions
happen, the item is still placed in the proper slot
of the hash table. As more and more items hash to
the same location, the difficulty of searching for
the item in the collection increases.
• When we want to search for an item, we use
the hash function to generate the slot where it
should reside. Since each slot holds a
collection, we use a searching technique to
decide whether the item is present. The
advantage is that, on average, each slot's chain
is likely to hold far fewer items than the whole
collection, so the search is usually efficient.
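A minimal chaining sketch, assuming each slot holds a Python list as its chain and reusing the remainder method as the hash function. The class and method names are hypothetical, and the items 44, 55, and 20 are again assumed extras that collide with earlier items.

```python
class ChainedHashTable:
    """Each slot holds a list (the chain) of every item that hashed there."""

    def __init__(self, size=11):
        self.slots = [[] for _ in range(size)]

    def _hash(self, item):
        return item % len(self.slots)

    def add(self, item):
        chain = self.slots[self._hash(item)]
        if item not in chain:  # avoid storing duplicates
            chain.append(item)

    def contains(self, item):
        # Hash to the slot, then search only that slot's chain sequentially.
        return item in self.slots[self._hash(item)]


table = ChainedHashTable(11)
for x in [54, 26, 93, 17, 77, 31, 44, 55, 20]:
    table.add(x)
print(table.slots[0])      # [77, 44, 55]
print(table.contains(55))  # True
print(table.contains(66))  # False
```

The slots are created with a list comprehension rather than `[[]] * size`, which would make every slot refer to the same list.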
