
Introduction to Algorithms

6.046J/18.401J LECTURE 8
Hashing II
  Universal hashing
  Universality theorem
  Constructing a set of universal hash functions
  Perfect hashing

Prof. Charles E. Leiserson

October 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.1

A weakness of hashing

Problem: For any hash function $h$, a set of keys exists that can cause the average access time of a hash table to skyrocket. An adversary can pick all keys from $\{k \in U : h(k) = i\}$ for some slot $i$.

IDEA: Choose the hash function at random, independently of the keys. Even if an adversary can see your code, he or she cannot find a bad set of keys, since he or she doesn't know exactly which hash function will be chosen.
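The weakness can be made concrete with a small Python sketch. It is illustrative only: `fixed_hash` (here `k mod m`) is an assumed stand-in for any fixed, publicly known hash function, and `adversarial_keys` is a hypothetical helper that filters a key universe down to the keys landing in one chosen slot.

```python
def fixed_hash(k, m):
    """A fixed, publicly known hash function (assumed for illustration)."""
    return k % m


def adversarial_keys(h, m, slot, universe, count):
    """Return up to `count` keys from `universe` that h maps to `slot`."""
    chosen = []
    for k in universe:
        if h(k, m) == slot:
            chosen.append(k)
            if len(chosen) == count:
                break
    return chosen


m = 8
bad_keys = adversarial_keys(fixed_hash, m, slot=3, universe=range(1000), count=20)
slots_used = {fixed_hash(k, m) for k in bad_keys}
print(len(bad_keys), slots_used)  # every chosen key falls in the same slot
```

With all 20 keys in one slot, a chained table degenerates into a single list of length 20, which is the behavior a randomly chosen hash function is meant to prevent.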

Universal hashing

Definition. Let $U$ be a universe of keys, and let $H$ be a finite collection of hash functions, each mapping $U$ to $\{0, 1, \ldots, m-1\}$. We say $H$ is universal if for all $x, y \in U$, where $x \neq y$, we have

$$|\{h \in H : h(x) = h(y)\}| = |H|/m.$$

That is, the chance of a collision between $x$ and $y$ is $1/m$ if we choose $h$ randomly from $H$.


Universality is good

Theorem. Let $h$ be a hash function chosen (uniformly) at random from a universal set $H$ of hash functions. Suppose $h$ is used to hash $n$ arbitrary keys into the $m$ slots of a table $T$. Then, for a given key $x$, we have

$$E[\#\text{collisions with } x] < n/m.$$


Proof of theorem

Proof. Let $C_x$ be the random variable denoting the total number of collisions of keys in $T$ with $x$, and let

$$c_{xy} = \begin{cases} 1 & \text{if } h(x) = h(y), \\ 0 & \text{otherwise.} \end{cases}$$

Note: $E[c_{xy}] = 1/m$ and $C_x = \sum_{y \in T - \{x\}} c_{xy}$.


Proof (continued)

$$\begin{aligned}
E[C_x] &= E\Big[\sum_{y \in T - \{x\}} c_{xy}\Big] && \text{Take expectation of both sides.} \\
&= \sum_{y \in T - \{x\}} E[c_{xy}] && \text{Linearity of expectation.} \\
&= \sum_{y \in T - \{x\}} 1/m && E[c_{xy}] = 1/m. \\
&= \frac{n-1}{m}. && \text{Algebra.}
\end{aligned}$$
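The expectation can be checked exactly on a tiny instance. The sketch below is not from the lecture: as an assumption for illustration, it uses the family of all functions from a small universe to $\{0, \ldots, m-1\}$, which satisfies the universality definition because exactly a $1/m$ fraction of those functions agree on any two distinct keys. The names `expected_collisions`, `universe`, and `keys` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def expected_collisions(universe, keys, x, m):
    """Exact E[C_x] when h is uniform over ALL functions universe -> {0..m-1}."""
    index = {k: i for i, k in enumerate(universe)}
    total = 0
    num_functions = 0
    # Each tuple `values` encodes one function: key universe[i] maps to values[i].
    for values in product(range(m), repeat=len(universe)):
        hx = values[index[x]]
        total += sum(1 for y in keys if y != x and values[index[y]] == hx)
        num_functions += 1
    return Fraction(total, num_functions)


universe = ["a", "b", "c", "d", "e"]
keys = ["a", "b", "c", "d"]  # the n = 4 keys stored in the table
m = 3
result = expected_collisions(universe, keys, "a", m)
print(result, Fraction(len(keys) - 1, m))  # the two values agree
```

Averaging over all $3^5 = 243$ functions gives exactly $(n-1)/m$, matching the last line of the derivation and staying below $n/m$.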

Constructing a set of universal hash functions


Let $m$ be prime. Decompose key $k$ into $r + 1$ digits, each with value in the set $\{0, 1, \ldots, m-1\}$. That is, let $k = \langle k_0, k_1, \ldots, k_r \rangle$, where $0 \le k_i < m$.

Randomized strategy: Pick $a = \langle a_0, a_1, \ldots, a_r \rangle$ where each $a_i$ is chosen randomly from $\{0, 1, \ldots, m-1\}$.

Define the dot product, modulo $m$:

$$h_a(k) = \sum_{i=0}^{r} a_i k_i \bmod m.$$

How big is $H = \{h_a\}$? $|H| = m^{r+1}$. REMEMBER THIS!
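A minimal Python sketch of this construction follows. The slides do not say how a key is decomposed into digits; writing a nonnegative integer key in base $m$ is an assumption made here, and `digits_base_m` and `make_dot_product_hash` are hypothetical names.

```python
import random


def digits_base_m(k, m, r):
    """Split nonnegative integer k into r + 1 base-m digits, least significant first."""
    digits = []
    for _ in range(r + 1):
        digits.append(k % m)
        k //= m
    if k != 0:
        raise ValueError("key does not fit in r + 1 base-m digits")
    return digits


def make_dot_product_hash(m, r, rng=random):
    """Pick a = <a_0, ..., a_r> at random and return (h_a, a)."""
    a = [rng.randrange(m) for _ in range(r + 1)]

    def h(k):
        # Dot product of a with the digits of k, reduced modulo m.
        return sum(ai * ki for ai, ki in zip(a, digits_base_m(k, m, r))) % m

    return h, a


m, r = 7, 2                      # keys range over 0 .. m**(r+1) - 1 = 342
h, a = make_dot_product_hash(m, r, random.Random(0))
print(a, [h(k) for k in (0, 1, 100, 342)])
```

Since each of the $r + 1$ coefficients has $m$ possible values, there are $m^{r+1}$ distinct vectors $a$, which is where $|H| = m^{r+1}$ comes from.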

Universality of dot-product hash functions


Theorem. The set $H = \{h_a\}$ is universal.

Proof. Let $x = \langle x_0, x_1, \ldots, x_r \rangle$ and $y = \langle y_0, y_1, \ldots, y_r \rangle$ be distinct keys. Thus, they differ in at least one digit position, wlog position 0. For how many $h_a \in H$ do $x$ and $y$ collide? We must have $h_a(x) = h_a(y)$, which implies that

$$\sum_{i=0}^{r} a_i x_i \equiv \sum_{i=0}^{r} a_i y_i \pmod{m}.$$

Proof (continued)

Equivalently, we have

$$\sum_{i=0}^{r} a_i (x_i - y_i) \equiv 0 \pmod{m},$$

or

$$a_0 (x_0 - y_0) + \sum_{i=1}^{r} a_i (x_i - y_i) \equiv 0 \pmod{m},$$

which implies that

$$a_0 (x_0 - y_0) \equiv -\sum_{i=1}^{r} a_i (x_i - y_i) \pmod{m}.$$

Fact from number theory

Theorem. Let $m$ be prime. For any $z \in \mathbb{Z}_m$ such that $z \neq 0$, there exists a unique $z^{-1} \in \mathbb{Z}_m$ such that $z \cdot z^{-1} \equiv 1 \pmod{m}$.

Example: $m = 7$.

z        1  2  3  4  5  6
z^{-1}   1  4  5  2  3  6
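The example table for $m = 7$ can be recomputed with a few lines of Python. This sketch obtains each inverse as $z^{m-2} \bmod m$ via Fermat's little theorem, a technique the slides do not mention and which is valid only for prime $m$; `inverse_table` is a hypothetical helper name.

```python
def inverse_table(m):
    """Map each nonzero z in Z_m to its multiplicative inverse (m must be prime)."""
    # Fermat's little theorem: z**(m-1) = 1 (mod m), so z**(m-2) is the inverse of z.
    return {z: pow(z, m - 2, m) for z in range(1, m)}


table = inverse_table(7)
for z, z_inv in table.items():
    print(z, z_inv, (z * z_inv) % 7)  # the last column is always 1
```

Each nonzero residue appears exactly once in the inverse row, which reflects the uniqueness claimed by the theorem.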

Back to the proof

We have

$$a_0 (x_0 - y_0) \equiv -\sum_{i=1}^{r} a_i (x_i - y_i) \pmod{m},$$

and since $x_0 \neq y_0$, an inverse $(x_0 - y_0)^{-1}$ must exist, which implies that

$$a_0 \equiv \Big(-\sum_{i=1}^{r} a_i (x_i - y_i)\Big) \cdot (x_0 - y_0)^{-1} \pmod{m}.$$

Thus, for any choices of $a_1, a_2, \ldots, a_r$, exactly one choice of $a_0$ causes $x$ and $y$ to collide.

Proof (completed)

Q. How many $h_a$'s cause $x$ and $y$ to collide?

A. There are $m$ choices for each of $a_1, a_2, \ldots, a_r$, but once these are chosen, exactly one choice for $a_0$ causes $x$ and $y$ to collide, namely

$$a_0 = \Big(\Big(-\sum_{i=1}^{r} a_i (x_i - y_i)\Big) \cdot (x_0 - y_0)^{-1}\Big) \bmod m.$$

Thus, the number of $h_a$'s that cause $x$ and $y$ to collide is $m^r \cdot 1 = m^r = |H|/m$.
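The count $m^r$ can be confirmed by brute force for small parameters. The following Python sketch enumerates every coefficient vector $a$ and counts those for which two digit vectors collide; `colliding_count` and the parameter values are hypothetical choices for illustration, with keys given directly as digit tuples.

```python
from itertools import combinations, product


def colliding_count(x, y, m):
    """Count vectors a in {0..m-1}^(r+1) with h_a(x) == h_a(y)."""
    count = 0
    for a in product(range(m), repeat=len(x)):
        hx = sum(ai * xi for ai, xi in zip(a, x)) % m
        hy = sum(ai * yi for ai, yi in zip(a, y)) % m
        if hx == hy:
            count += 1
    return count


m, r = 5, 2
single = colliding_count((1, 2, 3), (4, 2, 3), m)
print(single, m ** r)  # both are 25

# Exhaustive check over every pair of distinct keys for a smaller instance.
small_m, small_r = 3, 1
all_keys = list(product(range(small_m), repeat=small_r + 1))
counts = {colliding_count(x, y, small_m) for x, y in combinations(all_keys, 2)}
print(counts)  # a single value, small_m ** small_r
```

The exhaustive check relies on the modulus being prime; repeating it with a composite modulus such as 4 produces pairs with more than $|H|/m$ colliding functions, because the inverse used in the proof need not exist.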

Perfect hashing

Given a set of $n$ keys, construct a static hash table of size $m = O(n)$ such that SEARCH takes $\Theta(1)$ time in the worst case.

IDEA: Two-level scheme with universal hashing at both levels. No collisions at level 2!
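[Figure: two-level hash table. The level-1 table $T$ has slots 0 to 6; each nonempty slot stores the parameters $m$ and $a$ of its level-2 table ($S_1$: $m = 4$, $a = 31$; $S_4$: $m = 1$, $a = 00$; $S_6$: $m = 9$, $a = 86$). $S_1$ holds keys 14 and 27, $S_4$ holds key 26, and $S_6$ (slots 0 to 8) holds keys 40, 37, and 22. Annotation: $h_{31}(14) = h_{31}(27) = 1$.]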

Collisions at level 2

Theorem. Let $H$ be a class of universal hash functions for a table of size $m = n^2$. Then, if we use a random $h \in H$ to hash $n$ keys into the table, the expected number of collisions is at most $1/2$.

Proof. By the definition of universality, the probability that 2 given keys in the table collide under $h$ is $1/m = 1/n^2$. Since there are $\binom{n}{2}$ pairs of keys that can possibly collide, the expected number of collisions is

$$\binom{n}{2} \cdot \frac{1}{n^2} = \frac{n(n-1)}{2} \cdot \frac{1}{n^2} < \frac{1}{2}.$$

No collisions at level 2

Corollary. The probability of no collisions is at least $1/2$.

Proof. Markov's inequality says that for any nonnegative random variable $X$, we have $\Pr\{X \ge t\} \le E[X]/t$. Applying this inequality with $t = 1$, we find that the probability of 1 or more collisions is at most $1/2$.

Thus, just by testing random hash functions in $H$, we'll quickly find one that works.
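The retry strategy can be sketched as a small two-level table in Python. Several choices here are assumptions rather than the lecture's method: the hash family is the common $((ak + b) \bmod p) \bmod m$ construction instead of the dot-product family (it works for table sizes that are not prime), keys are assumed to be distinct integers in $[0, p)$, the level-1 table has $n$ slots, and the names `random_hash`, `build_perfect_table`, and `search` are hypothetical.

```python
import random

P = 2_147_483_647  # the prime 2**31 - 1; keys are assumed to lie in [0, P)


def random_hash(size, rng):
    """Draw h(k) = ((a*k + b) mod P) mod size from the swapped-in family."""
    a = rng.randrange(1, P)
    b = rng.randrange(P)
    return lambda k: ((a * k + b) % P) % size


def build_perfect_table(keys, rng):
    keys = list(set(keys))           # duplicates could never be separated
    n = len(keys)
    if n == 0:
        raise ValueError("need at least one key")
    h1 = random_hash(n, rng)         # level 1: n slots
    buckets = [[] for _ in range(n)]
    for k in keys:
        buckets[h1(k)].append(k)
    level2 = []
    for bucket in buckets:
        size = len(bucket) ** 2      # n_i**2 slots for a bucket of n_i keys
        if size == 0:
            level2.append((None, []))
            continue
        while True:                  # retry until the bucket has no collisions
            h2 = random_hash(size, rng)
            slots = [None] * size
            for k in bucket:
                j = h2(k)
                if slots[j] is not None:
                    break            # collision: draw a new h2
                slots[j] = k
            else:
                level2.append((h2, slots))
                break
    return h1, level2


def search(table, k):
    """Worst-case constant time: one level-1 hash and one level-2 hash."""
    h1, level2 = table
    h2, slots = level2[h1(k)]
    return bool(slots) and slots[h2(k)] == k


keys = [14, 27, 26, 40, 37, 22]
table = build_perfect_table(keys, random.Random(1))
print([search(table, k) for k in keys], search(table, 99))
```

By the corollary, each draw of a level-2 function succeeds with probability at least 1/2, so the expected number of draws per bucket is at most 2. This sketch does not redraw the level-1 function when the total level-2 storage turns out to be large.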

Analysis of storage

For the level-1 hash table $T$, choose $m = n$, and let $n_i$ be the random variable for the number of keys that hash to slot $i$ in $T$. By using $n_i^2$ slots for the level-2 hash table $S_i$, the expected total storage required for the two-level scheme is therefore

$$E\Big[\sum_{i=0}^{m-1} \Theta(n_i^2)\Big] = \Theta(n),$$

since the analysis is identical to the analysis from recitation of the expected running time of bucket sort. (For a probability bound, apply Markov.)
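The recitation analysis is not reproduced in these slides. One standard way to obtain the bound, offered here as a sketch and not as the recitation's own argument, counts colliding pairs at level 1: a slot holding $n_i$ keys contributes $\binom{n_i}{2}$ colliding pairs, and by universality each of the $\binom{n}{2}$ pairs collides with probability $1/m$.

```latex
\begin{aligned}
\sum_{i=0}^{m-1} n_i^2
  &= \sum_{i=0}^{m-1} \Big( n_i + 2\binom{n_i}{2} \Big)
   = n + 2 \cdot \#\{\text{colliding pairs at level 1}\}, \\
E\Big[\sum_{i=0}^{m-1} n_i^2\Big]
  &= n + 2 \binom{n}{2} \cdot \frac{1}{m}
   = n + \frac{n(n-1)}{m}
   = 2n - 1 \quad \text{when } m = n .
\end{aligned}
```

So the expected level-2 storage is $2n - 1 = \Theta(n)$, and Markov's inequality with $t = 4n$ gives $\Pr\{\sum_i n_i^2 \ge 4n\} \le (2n-1)/(4n) < 1/2$.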