Introduction to Algorithms

6.046J/18.401J
LECTURE 7
Hashing I
• Direct-access tables
• Resolving collisions by chaining
• Choosing hash functions
• Open addressing

Prof. Charles E. Leiserson


October 3, 2005 Copyright © 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.1
Symbol-table problem
Symbol table S holding n records. Each record x has a key, key[x], and other fields containing satellite data.
Operations on S:
• INSERT(S, x)
• DELETE(S, x)
• SEARCH(S, k)

How should the data structure S be organized?
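As a reference point, the record layout and the three operations can be written down as a typed interface. This is a minimal Python sketch, assuming integer keys; the names `Record`, `satellite`, and `SymbolTable` are hypothetical, and SEARCH is assumed to return `None` (playing the role of NIL) when no record has key k.

```python
from dataclasses import dataclass
from typing import Any, Optional, Protocol


@dataclass
class Record:
    """A record x: its key, key[x], plus arbitrary satellite data."""
    key: int
    satellite: Any = None


class SymbolTable(Protocol):
    """The three symbol-table operations on S (hypothetical interface)."""

    def insert(self, x: Record) -> None: ...           # INSERT(S, x)
    def delete(self, x: Record) -> None: ...           # DELETE(S, x)
    def search(self, k: int) -> Optional[Record]: ...  # SEARCH(S, k); None means NIL
```

Any class with these three methods satisfies the protocol, so the structures that follow can be compared behind one interface.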


Direct-access table
IDEA: Suppose that the keys are drawn from
the set U ⊆ {0, 1, …, m–1}, and keys are
distinct. Set up an array T[0 . . m–1]:
$$T[k] = \begin{cases} x & \text{if } x \in S \text{ and } key[x] = k, \\ \text{NIL} & \text{otherwise.} \end{cases}$$
Then INSERT, DELETE, and SEARCH each take Θ(1) time.
Problem: The range of keys can be large:
• 64-bit numbers (which represent
18,446,744,073,709,551,616 different keys),
• character strings (even larger!).
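The direct-access idea amounts to indexing an array by the key itself. A minimal sketch, assuming distinct integer keys in {0, 1, …, m–1}; `DirectAccessTable` and `Record` are hypothetical names, and `None` stands in for NIL.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Record:
    key: int                 # key[x], assumed to lie in {0, 1, ..., m-1}
    satellite: Any = None


class DirectAccessTable:
    """T[k] holds the record whose key is k, or None (NIL) if there is none."""

    def __init__(self, m: int) -> None:
        self.T: List[Optional[Record]] = [None] * m   # Θ(m) space, however small n is

    def insert(self, x: Record) -> None:
        # Θ(1). Keys are assumed distinct; a record with the same key would be overwritten.
        self.T[x.key] = x

    def delete(self, x: Record) -> None:
        self.T[x.key] = None                          # Θ(1)

    def search(self, k: int) -> Optional[Record]:
        return self.T[k]                              # Θ(1)


t = DirectAccessTable(10)
r = Record(3, "c")
t.insert(r)
print(t.search(3))   # Record(key=3, satellite='c')
```

Every operation is a single array access, but the array has one slot per possible key, which is exactly what becomes infeasible for 64-bit keys or strings.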
Hash functions
Solution: Use a hash function h to map the
universe U of all keys into the slots
{0, 1, …, m–1} of T.
(Figure: the keys k1, …, k5 of S ⊆ U are mapped to slots h(k1), h(k4), h(k2) = h(k5), and h(k3) of T, which is indexed 0 … m–1.)
As each key is inserted, h maps it to a slot of T.
When a record to be inserted maps to an already
occupied slot in T, a collision occurs.
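Collisions cannot be designed away: as soon as more than m distinct keys are possible, two of them must share a slot. The sketch below inserts keys one at a time and reports the first collision; the function name `find_collision` and the hash h(k) = k mod 10 are hypothetical choices for illustration.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def find_collision(keys: Iterable[int], h: Callable[[int], int],
                   m: int) -> Optional[Tuple[int, int]]:
    """Insert keys into slots 0..m-1 using h.

    Returns (occupant, newcomer) for the first key that maps to an
    already-occupied slot, or None if every key gets its own slot.
    """
    T: List[Optional[int]] = [None] * m   # T[j] = key currently occupying slot j
    for k in keys:
        j = h(k)
        if T[j] is not None:
            return T[j], k                # collision: slot j is already taken
        T[j] = k
    return None


def h(k: int) -> int:
    return k % 10                         # hypothetical hash function, m = 10


print(find_collision([11, 25, 31], h, 10))  # (11, 31): both map to slot 1
print(find_collision(range(11), h, 10))     # (0, 10): 11 keys, only 10 slots
```

The second call is the pigeonhole argument in miniature: any m + 1 distinct keys force a collision, so the table needs a policy for resolving them.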
Resolving collisions by chaining
• Link records in the same slot into a list.
(Figure: slot i of T points to the chain 49 → 86 → 52, because h(49) = h(86) = h(52) = i.)
Worst case:
• Every key hashes to the same slot.
• Access time = Θ(n) if |S| = n.
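A minimal sketch of chaining, assuming integer keys and a caller-supplied hash function h into {0, …, m–1}. A `collections.deque` stands in for each linked list, so `delete` scans the chain for x; a doubly linked list holding a pointer to x would avoid that scan. `ChainedHashTable` and `Record` are hypothetical names. The usage line reproduces the worst case above with a deliberately bad, hypothetical constant hash function.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Callable, Deque, List, Optional


@dataclass(eq=False)             # compare records by identity, so DELETE removes exactly x
class Record:
    key: int
    satellite: Any = None


class ChainedHashTable:
    """Each slot T[j] holds a chain of the records whose keys hash to j."""

    def __init__(self, m: int, h: Callable[[int], int]) -> None:
        self.h = h               # assumed to map every key into {0, 1, ..., m-1}
        self.T: List[Deque[Record]] = [deque() for _ in range(m)]

    def insert(self, x: Record) -> None:
        self.T[self.h(x.key)].appendleft(x)   # put x at the head of its chain

    def delete(self, x: Record) -> None:
        self.T[self.h(x.key)].remove(x)       # linear scan of the chain for x

    def search(self, k: int) -> Optional[Record]:
        for x in self.T[self.h(k)]:           # cost grows with the chain length
            if x.key == k:
                return x
        return None


# Worst case: a constant hash sends every key to slot 0, so one chain holds all n records.
t = ChainedHashTable(8, lambda k: 0)
for key in (49, 86, 52):
    t.insert(Record(key))
print([x.key for x in t.T[0]])   # [52, 86, 49]
```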

Average-case analysis of chaining
We make the assumption of simple uniform
hashing:
• Each key k ∈ S is equally likely to be hashed
to any slot of table T, independent of where
other keys are hashed.
Let n be the number of keys in the table, and
let m be the number of slots.
Define the load factor of T to be
α = n/m = average number of keys per slot.
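The load factor is an exact average (every key lies in exactly one chain), while the length of any single chain is random. This sketch simulates simple uniform hashing by sending each key to an independent, uniformly random slot; the helper names and the sizes n = 1000, m = 250 are hypothetical.

```python
import random
from typing import List


def chain_lengths(n: int, m: int, rng: random.Random) -> List[int]:
    """Simulate simple uniform hashing: each of n keys lands in an independent, uniform slot."""
    lengths = [0] * m
    for _ in range(n):
        lengths[rng.randrange(m)] += 1
    return lengths


def load_factor(n: int, m: int) -> float:
    return n / m


rng = random.Random(2005)
n, m = 1000, 250
lengths = chain_lengths(n, m, rng)
print(sum(lengths) / m, load_factor(n, m))  # 4.0 4.0: the average is exactly alpha
print(min(lengths), max(lengths))           # individual chains scatter around alpha
```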
Search cost
The expected time for an unsuccessful
search for a record with a given key is
Θ(1 + α): Θ(1) to apply the hash function and
access the slot, plus Θ(α) to search the list.
Expected search time = Θ(1) if α = O(1),
or equivalently, if n = O(m).
A successful search has the same asymptotic
bound, but a rigorous argument is a little
more complicated. (See textbook.)
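One way to spell out the unsuccessful-search bound, assuming simple uniform hashing and writing n_j (our notation) for the length of the chain in slot j: the search for k hashes once and then scans all of T[h(k)], and because h(k) is uniform and independent of where the stored keys went, the expected length of that chain is α.

```latex
\begin{aligned}
\mathbb{E}[n_j] &= \sum_{x \in S} \Pr\{h(key[x]) = j\} = n \cdot \frac{1}{m} = \alpha
  \qquad \text{for every slot } j, \\
\mathbb{E}[\text{cost}] &= \Theta\bigl(1 + \mathbb{E}[n_{h(k)}]\bigr) = \Theta(1 + \alpha), \\
n \le c\,m \text{ for a constant } c
  &\;\Longrightarrow\; \alpha \le c
  \;\Longrightarrow\; \Theta(1 + \alpha) = \Theta(1).
\end{aligned}
```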
Choosing a hash function
The assumption of simple uniform hashing
is hard to guarantee, but several common
techniques tend to work well in practice as
long as their deficiencies can be avoided.
Desiderata:
• A good hash function should distribute the
keys uniformly into the slots of the table.
• Regularity in the key distribution should
not affect this uniformity.
Division method
Assume all keys are integers, and define
h(k) = k mod m.
Deficiency: Don’t pick an m that has a small
divisor d. A preponderance of keys that are
congruent modulo d can adversely affect
uniformity.
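Why a small divisor hurts: if d divides m, then k mod m ≡ k (mod d), so keys that are all congruent modulo d can reach only m/d of the m slots. The sketch below uses a hypothetical key set, 64 multiples of 8, and counts how many slots h(k) = k mod m actually hits for m = 64 (divisible by 8) versus m = 61 (prime).

```python
from typing import Iterable


def slots_used(keys: Iterable[int], m: int) -> int:
    """Number of distinct slots hit by the division method h(k) = k mod m."""
    return len({k % m for k in keys})


keys = [8 * i for i in range(64)]   # 64 keys, all congruent to 0 modulo d = 8
print(slots_used(keys, 64))         # 8: only the slots 0, 8, ..., 56 are reachable
print(slots_used(keys, 61))         # 61: gcd(8, 61) = 1, so every slot is used
```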
Extreme deficiency: If $m = 2^r$, then the hash
doesn’t even depend on all the bits of k:
• If $k = 1011000111011010_2$ and $r = 6$, then
$h(k) = 011010_2$, the six low-order bits of k.
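The example can be checked directly: reducing modulo 2^r is the same as masking off the low r bits, so flipping any higher bit of k leaves the slot unchanged. The helper name `division_hash` is hypothetical.

```python
def division_hash(k: int, m: int) -> int:
    return k % m


k = 0b1011000111011010
r = 6
print(format(division_hash(k, 2 ** r), "06b"))   # 011010

# k mod 2^r keeps only the low r bits ...
assert division_hash(k, 2 ** r) == k & (2 ** r - 1)
# ... so changing the top bit of k does not change its slot.
assert division_hash(k ^ (1 << 15), 2 ** r) == division_hash(k, 2 ** r)
```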
Division method (continued)
h(k) = k mod m.
Pick m to be a prime not too close to a power
of 2 or 10 and not otherwise used prominently
in the computing environment.
Annoyance:
• Sometimes, making the table size a prime is
inconvenient.
Nevertheless, this method is popular, although the next
method we’ll see is usually superior.
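One simple way to follow the advice is to pick a target table size and take the next prime. A sketch under that assumption: the target of about 700 slots is hypothetical, and trial division is used only because table sizes are small. The result, 701, lies between 2^9 = 512 and 2^10 = 1024 without being especially close to either.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for choosing a table size."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def next_prime(n: int) -> int:
    """Smallest prime >= n."""
    while not is_prime(n):
        n += 1
    return n


m = next_prime(700)    # hypothetical target size of about 700 slots
print(m)               # 701
print(1234567 % m)     # 106: the division-method slot for key 1234567
```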
Multiplication method
Assume that all keys are integers, $m = 2^r$, and our
computer has w-bit words. Define
$h(k) = (A \cdot k \bmod 2^w) \text{ rsh } (w - r)$,
where rsh is the “bitwise right-shift” operator and
A is an odd integer in the range $2^{w-1} < A < 2^w$.
• Don’t pick A too close to $2^{w-1}$ or $2^w$.
• Multiplication modulo $2^w$ is fast compared to
division.
• The rsh operator is fast.
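A direct transcription of the formula, as a sketch: Python integers are unbounded, so "mod 2^w" is done explicitly with a mask rather than by word-size overflow. The 7-bit example values below are hypothetical, and so is the 64-bit constant `A64`, chosen here so that A/2^64 ≈ (√5 – 1)/2 ≈ 0.618 and odd, which keeps it well inside the allowed range.

```python
def multiplication_hash(k: int, A: int, w: int, r: int) -> int:
    """h(k) = (A*k mod 2^w) rsh (w - r), a slot in {0, ..., 2^r - 1}."""
    assert A % 2 == 1 and 2 ** (w - 1) < A < 2 ** w, "A must be odd with 2^(w-1) < A < 2^w"
    product = (A * k) & ((1 << w) - 1)   # A*k mod 2^w: keep the low w bits of the product
    return product >> (w - r)            # the top r of those w bits form the slot


# Small case: w = 7, A = 1011001_2 (89), k = 1101011_2 (107), r = 3.
# A*k = 9523; 9523 mod 128 = 51 = 0110011_2; shifting right by 4 leaves 011_2.
print(multiplication_hash(0b1101011, 0b1011001, w=7, r=3))   # 3

A64 = 0x9E3779B97F4A7C15                 # odd, and 2^63 < A64 < 2^64
print(multiplication_hash(123456789, A64, w=64, r=10))       # a slot in 0..1023
```

Because m = 2^r, no division is needed at all: one multiplication, one mask (free in fixed-width hardware arithmetic), and one shift.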
