
Universal Classes of Hash Functions

(extended abstract)

J. Lawrence Carter and Mark N. Wegman
IBM Thomas J. Watson Research Center
Yorktown Heights, New York 10598

Abstract:
This paper gives an input independent average linear time algorithm for storage and retrieval on keys. The algorithm makes a random choice of hash function from a suitable class of hash functions. Given any sequence of inputs the expected time (averaging over all functions in the class) to store and retrieve elements is linear in the length of the sequence. The number of references to the data base required by the algorithm for any input is extremely close to the theoretical minimum for any possible hash function with randomly distributed inputs. We present three suitable classes of hash functions which also may be evaluated rapidly. The ability to analyze the cost of storage and retrieval without worrying about the distribution of the input allows as corollaries improvements on the bounds of several algorithms.
Introduction:
One may view different inputs to a program as elements from a class of problems. The answer given by the program is, hopefully, a correct solution to the problem. Ordinarily, when one talks about the average performance of a program, one averages over the class of problems the program can solve. Gill [2], Rabin [7], and Strassen and Solovay [9] have used a different approach on some classes of problems. They suggest that the program randomly choose an algorithm from the class of algorithms to solve the problem. They are able to bound the average performance of the class of algorithms for the worst case input. This average on the worst case can be better than the performance of any known single algorithm on its worst case. Some of the problems which this approach overcomes are the following:
1) Classical analysis (averaging over the class of inputs) must make assumptions about the distribution of the inputs. These assumptions may not hold in certain applications.
2) A consequence of (1) is that one cannot classically analyze the average performance of a subroutine independently of the main routine, since the main routine may skew the distribution of data.
3) If the program is presented with a worst-case input, there is no way to avoid the resulting poor performance. However, if one had a class of algorithms to choose from and was able to realize that a particular algorithm was running slowly on a given input, then it might be possible to choose a different algorithm.

In this paper, we apply these notions to hashing for storage and retrieval, and suggest that a class of hash functions be used. We show that if the class of functions is chosen properly, then the average performance of the program on any input will be nearly as good as if a single function, chosen with knowledge of the input, were used. We present several classes of hash functions which insure that every sample chosen from the input space will be distributed evenly by enough of the functions to compensate for the poor performance of the algorithm when an unlucky choice of function is made.

A brief outline of our paper follows. After introducing some notation, we define a property of a class of functions: universal$_2$. We show that any class of functions that is universal$_2$ has the property that given any sample, a randomly chosen member of that class will be expected to distribute the sample evenly. We then exhibit several universal$_2$ classes of functions which can be evaluated easily. Finally we give several examples of the use of these functions.
Notation:
If $S$ is a set, $|S|$ will denote the number of elements in $S$. $\lceil x \rceil$ means the least integer $\ge x$. If $x$ and $y$ are bit strings, then $x \oplus y$ is the exclusive-or of $x$ and $y$. $Z_n$ will represent the integers mod $n$. All hash functions map a set $A$ into a set $B$. We will always assume $|A| > |B|$. $A$ is sometimes called the set of possible keys, and $B$ the set of indices. If $f$ is a hash function and $x, y \in A$, we let $\delta_f(x,y)$ be 1 if $x \ne y$ and $f(x) = f(y)$, and 0 otherwise. Thus, $\delta_f(x,y)$ is 1 if $x$ and $y$ are distinct elements of $A$ which map to the same value under $f$. If $f$, $x$ or $y$ is replaced in $\delta_f(x,y)$ by a set, we sum over all the elements in the set. Thus, if $H$ is a collection of hash functions, $x \in A$ and $S \subset A$, then $\delta_H(x,S)$ means
$$\sum_{f \in H} \sum_{y \in S} \delta_f(x,y).$$
Notice that the order of summation does not matter.
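
The collision-counting notation can be made concrete with a few lines of Python. This is only an illustrative sketch: the helper names `delta` and `delta_sets`, the key range, and the two toy hash functions are assumptions introduced here, not part of the paper.

```python
from itertools import product

def delta(f, x, y):
    """delta_f(x, y): 1 if x != y and f(x) == f(y), otherwise 0."""
    return int(x != y and f(x) == f(y))

def delta_sets(fs, xs, ys):
    """Sum delta over every f in fs, x in xs and y in ys.

    Passing a one-element list plays the role of a single function or key.
    """
    return sum(delta(f, x, y) for f, x, y in product(fs, xs, ys))

# Hypothetical toy data: keys 0..5 and two hash functions into 3 indices.
A = range(6)
f1 = lambda x: x % 3
f2 = lambda x: (2 * x + 1) % 3

print(delta(f1, 0, 3))               # one pair under one function
print(delta_sets([f1, f2], [0], A))  # delta_H(x, S) with H = {f1, f2}, x = 0, S = A
```

Because `product` simply enumerates all triples, the result does not depend on the order in which the sums are taken, which mirrors the remark about the order of summation.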
Properties of Universal Classes:
Let $H$ be a class of functions from $A$ to $B$. We say that $H$ is universal$_2$ if for all $x, y$ in $A$, $\delta_H(x,y) \le |H|/|B|$. That is, $H$ is universal$_2$ if no pair of distinct keys are mapped into the same index by more than one $|B|$th of the functions. Proposition 1 shows that this bound on $\delta_H(x,y)$ is tight when $|A|$ is much larger than $|B|$. The second proposition follows almost immediately from the definition of universal$_2$.

Proposition 1: Given any collection $H$ of hash functions (not necessarily universal$_2$), there exist $x, y \in A$ such that
$$\delta_H(x,y) > \frac{|H|}{|B|} - \frac{|H|}{|A|}.$$
Proof (sketch): Let $a = |A|$ and $b = |B|$. A counting argument shows that for each $f \in H$,
$$\delta_f(A,A) \ge \frac{a^2}{b} - a.$$
Thus, $\delta_H(A,A) \ge a^2 |H| (1/b - 1/a)$. Therefore, by the pigeon hole principle, there exist $x, y \in A$ such that $\delta_H(x,y) > |H|(1/b - 1/a)$. □

Proposition 2: Let $x$ be any element of $A$ and $S$ any subset of $A$. Let $f$ be a function chosen randomly from a universal$_2$ class of functions (with equal probabilities on the functions). Then the expected number of elements of $S$ that $x$ collides with, i.e. $\delta_f(x,S)$, is $\le |S|/|B|$.
Proof: The mean value of $\delta_f(x,S)$ is
$$\frac{1}{|H|} \sum_{f \in H} \delta_f(x,S) = \frac{1}{|H|} \sum_{y \in S} \delta_H(x,y) \quad \text{(by notation)}$$
$$\le \frac{1}{|H|} \sum_{y \in S} \frac{|H|}{|B|} \quad \text{(by def. of universal}_2\text{)}$$
$$= \frac{|S|}{|B|}. \quad \Box$$
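
Proposition 2 can be checked exhaustively on a tiny example. The sketch below assumes, purely for illustration, that $H$ is the class of all functions from a 4-element key set to a 2-element index set; that class is not one of the classes the paper constructs, but it is small enough to enumerate and satisfies the universal$_2$ condition with equality.

```python
from itertools import product
from fractions import Fraction

A = range(4)   # hypothetical key set
B = range(2)   # hypothetical index set
# H = every function from A to B, encoded as a tuple t with f(x) = t[x].
H = list(product(B, repeat=len(A)))

def delta_H(x, y):
    """Number of functions in H under which the distinct keys x and y collide."""
    return sum(1 for t in H if x != y and t[x] == t[y])

def expected_collisions(x, S):
    """Exact mean, over f drawn uniformly from H, of delta_f(x, S)."""
    return Fraction(sum(delta_H(x, y) for y in S), len(H))

# Universal_2 condition: delta_H(x, y) <= |H| / |B| for all x, y.
is_universal2 = all(delta_H(x, y) * len(B) <= len(H) for x in A for y in A)

S = [1, 2, 3]
mean = expected_collisions(0, S)
print(is_universal2, mean, Fraction(len(S), len(B)))
```

Exact rational arithmetic is used so that the comparison with $|S|/|B|$ is not blurred by floating-point rounding.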
In addition to being useful later, Proposition 2 has some direct applications. For instance, an optical character reader postprocessing system is described in [8]. This system is designed to check if a word $x$ is a member of a set of valid words $S$. The set $\{ f(y) \mid y \in S \}$ is stored in memory. To test whether $x$ is in $S$, a check is made to see if $f(x)$ is in the stored set. Since $f(y)$ is generally shorter than $y$, a considerable amount of space was saved. However, there is a chance of error; if $f(x) = f(y)$ and $y \in S$, then $x$ may erroneously be accepted as valid. Proposition 2 gives a bound on the probability of error when $f$ is chosen from a class of universal$_2$ functions.
We are interested in the cost of using these functions in storage and retrieval operations. Given a sequence $R$ of requests (insertions or retrievals) to some data base, and a hash function $f$, we define the cost of $R$ with respect to $f$, $C(f,R)$, to be the sum of the costs of the individual requests. The cost of an individual request referring to an element $x$ is one plus the number of distinct previously inserted $y$'s for which $f(x) = f(y)$.

This cost function reflects the worst case cost of inserting or finding elements in a storage and retrieval scheme in which each element of $B$ is associated with a linked list, and an element $x$ is stored in the list associated with $f(x)$ (see [1], pages 111-113). Other collision resolution schemes would have other cost functions associated with them. For example, if the keys with the same index were stored in a balanced tree, the corresponding cost function would be smaller.
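
The cost function can be written down directly. The following is a minimal sketch under two assumptions that are ours rather than the paper's: requests are encoded as `(operation, key)` pairs, and the key being requested is itself excluded from the count, which matches the definition of $\delta_f$ (where $x \ne y$ is required).

```python
def cost(f, requests):
    """C(f, R): each request on key x costs 1 plus the number of distinct,
    previously inserted keys y != x with f(y) == f(x)."""
    buckets = {}   # index -> set of distinct keys inserted so far
    total = 0
    for op, x in requests:
        chain = buckets.setdefault(f(x), set())
        total += 1 + len(chain - {x})
        if op == "insert":
            chain.add(x)
    return total

# Hypothetical request sequence and hash function, for illustration only.
R = [("insert", 3), ("insert", 7), ("retrieve", 3), ("insert", 11), ("retrieve", 4)]
f = lambda x: x % 4

print(cost(f, R))             # keys 3, 7 and 11 all share index 3 under f
print(cost(lambda x: x, R))   # a collision-free function costs one per request
```

With a collision-free function the cost equals the number of requests, which is the lower bound $C(f,R) \ge r$ used later in the text.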
The following theorem gives a nice bound on the expected cost of using a universal$_2$ class of hash functions with the linked list method for resolving collisions.

Proposition 3: Let $R$ be a sequence of $r$ requests which includes $k$ insertions. Suppose $H$ is a universal$_2$ class of hash functions. Then if we choose $f$ at random from $H$, the expected cost $C(f,R)$ is $\le r(1 + k/|B|)$.
Proof: The expected cost of $R$ is the sum of the expected costs of the individual requests. Proposition 2 and the definition of cost tell us that an individual request has expected cost no greater than $1 + k/|B|$. □

A special case of this proposition is that if $k$ is roughly the size of $B$ then the expected cost is $2r$. Notice that this linear bound holds for any sequence of requests, not just for the "average" request. For many applications, there is an upper bound on the number of elements to be stored and hence, $B$ can be chosen appropriately. If there is no known upper bound it is possible to dynamically choose a size, and rehash when that choice proves to be too small. Rehashing can be done in linear time, and even in real-time.

We can show that the expected cost (averaging over the hash functions) of any request is virtually the same as the expected cost (averaging over the possible requests) of any single hash function when applied to a random request after random insertions have been made. The reason is as follows: Let $a = |A|$ and $b = |B|$. The counting argument cited in Proposition 1 implies that if $f$ is any hash function and $x$ and $y$ are chosen at random from $A$, then the expectation of $\delta_f(x,y)$ is $\ge (1/b - 1/a)$. It follows that the expectation of $\delta_f(x,S)$ is $\ge |S|(1/b - 1/a)$, where $S$ is the random subset of $A$ which has been previously stored. Thus, the cost of the request is at least $1 + |S|(1/b - 1/a)$. When $A$ is much larger than $B$ (which will be the case in most applications of hashing), this is virtually the same as the cost of a request cited in the proof of Proposition 3.

It is also possible to bound the probability that given a sequence of requests $R$, the performance of a randomly chosen function will be worse than tolerable on $R$. Since we know that $C(f,R)$ must be at least $r$, we can conclude that when $k$ is roughly the size of $B$, the probability that $C(f,R) > t \cdot r$ is less than $1/(t-1)$. For some classes of hash functions (such as the last class mentioned in this paper), it is possible to derive a bound on the standard deviation or higher moments of the cost of a randomly chosen function on a particular $R$. This allows us to get a much better estimate of the probability that $C(f,R)$ will be large.
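
The $1/(t-1)$ bound can be recovered from Proposition 3 with Markov's inequality. The derivation below is a sketch that fills in steps the text leaves implicit; it assumes $t > 1$ and reads "k is roughly the size of B" as $k \le |B|$. Markov's inequality yields a non-strict bound.

```latex
% Excess cost is non-negative because every request costs at least 1:
C(f,R) - r \ge 0 .
% Proposition 3, with k <= |B|:
\mathbb{E}\bigl[C(f,R) - r\bigr] \;\le\; r\,\frac{k}{|B|} \;\le\; r .
% Markov's inequality applied to the non-negative excess cost:
\Pr\bigl[C(f,R) > t\,r\bigr]
  \;=\; \Pr\bigl[C(f,R) - r > (t-1)\,r\bigr]
  \;\le\; \frac{\mathbb{E}\bigl[C(f,R) - r\bigr]}{(t-1)\,r}
  \;\le\; \frac{1}{t-1} .
```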
Some universal$_2$ classes:
The first class of universal$_2$ hash functions we present is suitable for applications where the bit strings which represent the keys can conveniently be multiplied by the computer.

Suppose $A = \{0, 1, \ldots, a-1\}$ and $B = \{0, 1, \ldots, b-1\}$. Let $p$ be a prime with $p \ge a$. Let $g$ be any function from $Z_p$ to $B$ which, as closely as possible, maps the same number of elements of $Z_p$ into each element of $B$. Formally, we require
$$|\{ y \in Z_p \mid g(y) = z \}| \le \lceil p/b \rceil \quad \text{for all } z \in B.$$
A natural choice for $g$ is the residue modulo $b$. When $b = 2^k$ for some $k$, this amounts to taking the last $k$ bits in the binary representation of $y$.

Let $m$ and $n$ be elements of $Z_p$ with $m \ne 0$. We define $h_{m,n}: A \to Z_p$ by $h_{m,n}(x) = (mx + n) \bmod p$. Now define $f_{m,n}(x) = g(h_{m,n}(x))$. The class $H$ is the set $\{ f_{m,n} \mid m, n \in Z_p,\ m \ne 0 \}$. If desired, $p$ can be chosen so the mod $p$ operation can be calculated without a division.
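
A random member of this class is easy to draw in code. The sketch below takes $g$ to be the residue modulo $b$, the "natural choice" named in the text. The function name `make_hash`, the seed, and the parameter values are illustrative assumptions.

```python
import random

def make_hash(p, b, rng=random):
    """Draw f_{m,n} uniformly from H: m in 1..p-1, n in 0..p-1, g = residue mod b."""
    m = rng.randrange(1, p)
    n = rng.randrange(0, p)

    def f(x):
        return ((m * x + n) % p) % b

    f.m, f.n = m, n   # exposed only so the drawn parameters can be inspected
    return f

# Illustrative parameters (assumptions): keys 0..999, so a = 1000;
# p = 1009 is a prime >= a; b = 16 indices.
p, b = 1009, 16
f = make_hash(p, b, random.Random(1977))
indices = [f(x) for x in range(1000)]
print(min(indices), max(indices))
```

Python integers do not overflow, so the product `m * x` is exact here; in a fixed-width language the multiplication would need a wide enough type for $p^2$.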
The following lemma is useful in proving that this class is universal$_2$.

Lemma: When $H$ is defined as above, then for any $x, y \in A$ with $x \ne y$, $\delta_H(x,y)$ equals the number of ordered pairs $(r,s)$ with $r, s \in Z_p$, $r \ne s$ and $g(r) = g(s)$.
Proof: There is a natural correspondence between the functions $h_{m,n}$ and the ordered pairs $(r,s)$ where $r, s \in Z_p$ and $r \ne s$. Specifically, we identify the function $h_{m,n}$ by the ordered pair $(h_{m,n}(x), h_{m,n}(y))$. Since $m \ne 0$, $h_{m,n}(x) \ne h_{m,n}(y)$. This correspondence is one-to-one and onto since the linear equations $xm + n \equiv r \pmod p$ and $ym + n \equiv s \pmod p$ have a unique solution for $m$ and $n$ in the field $Z_p$.
If $(r,s)$ is the pair $(h_{m,n}(x), h_{m,n}(y))$, then $f_{m,n}(x) = f_{m,n}(y)$ if and only if $g(r) = g(s)$. Thus, $\delta_H(x,y)$ is the number of such pairs. □

Proposition 4: The class $H$ defined above is universal$_2$.
Proof: Let $n_i$ be the number of elements in $\{ t \in Z_p \mid g(t) = i \}$. The restriction on $g$ is that for each $i$, $n_i \le \lceil p/b \rceil$. Since $p$ and $b$ are integers, this implies that $n_i \le \frac{p-1}{b} + 1$. Thus, for a given $r$, the number of choices for $s$ such that $r \ne s$ but $g(r) = g(s)$ is $\le \frac{p-1}{b}$. Since there are $p$ choices for $r$,
$$\frac{p(p-1)}{b} \ge \text{the number of ordered pairs } (r,s) \text{ satisfying the condition in the lemma} = \delta_H(x,y).$$
Since $|H| = p(p-1)$, the left side is $|H|/|B|$. Recalling that for $x = y$, $\delta_H(x,y) = 0$, this shows that $H$ is universal$_2$. □
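
For small parameters the Lemma and Proposition 4 can be confirmed by exhaustive enumeration. The sketch below takes $g$ to be the residue modulo $b$ and uses the illustrative values $p = 11$, $b = 4$, with $A$ taken to be all of $\{0, \ldots, p-1\}$; these choices are assumptions made for the example.

```python
from itertools import combinations

def delta_H(x, y, p, b):
    """Count the pairs (m, n), m != 0, for which f_{m,n}(x) == f_{m,n}(y)."""
    return sum(
        1
        for m in range(1, p)
        for n in range(p)
        if ((m * x + n) % p) % b == ((m * y + n) % p) % b
    )

def lemma_count(p, b):
    """Ordered pairs (r, s) in Z_p with r != s and g(r) == g(s), g = residue mod b."""
    return sum(1 for r in range(p) for s in range(p) if r != s and r % b == s % b)

p, b = 11, 4                 # small illustrative parameters
size_H = p * (p - 1)         # number of (m, n) pairs with m != 0
worst = max(delta_H(x, y, p, b) for x, y in combinations(range(p), 2))
print(worst, lemma_count(p, b), size_H / b)
```

Here every pair of distinct keys collides under exactly 20 of the 110 functions, below the universal$_2$ limit of $110/4 = 27.5$.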
Frequently, algorithms are analyzed making the assumption that multiplication takes unit time. The number of multiplications is said to be the cost of the algorithm. This model is appropriate when there are no operations which can be done an unbounded number of times for each multiplication. When a universal$_2$ class of hash functions is used, then the number of memory references per request can be bounded when averaged over all functions in the class as in Proposition 3 (assuming $k$ is not unbounded with respect to $|B|$). Therefore the model is appropriate, and under it the hash functions in the class given above may be applied in constant time. Thus in this model, for any sequence of requests, the algorithm takes average time linear in the number of requests.

It may seem that the addition of $n$ in the class of functions given above plays an unimportant role. This is only partly true. Suppose for $m \in Z_p$ we define $h_m(x) = (mx) \bmod p$, and as before define $f_m(x)$ as $g(h_m(x))$. Let $\hat{H} = \{ f_m \mid m \in Z_p,\ m \ne 0 \}$. It can be shown that this class of functions comes within a factor of two of being universal$_2$, that is, for any $x$ and $y$, $\delta_{\hat{H}}(x,y) \le 2|\hat{H}|/|B|$. On the other hand, this bound cannot be improved significantly. For instance, let $b = |B|$, and choose $k$ so that $p = kb + k + 1$ is prime (there will be infinitely many such $k$'s). Let $g(x)$ be the function $x \pmod b$. Let $x = 1$ and $y = b + 1$. Then the $2k$ functions $f_1, f_2, \ldots, f_k, f_{p-k}, f_{p-k+1}, \ldots, f_{p-1}$ each map $x$ and $y$ to the same bin. Thus, $\delta_{\hat{H}}(x,y) = 2k$ while
$$\frac{|\hat{H}|}{|B|} = \frac{p-1}{b} = \frac{kb+k}{b} = \left(1 + \frac{1}{b}\right)k.$$
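
The counterexample can be reproduced numerically. The sketch below uses $b = 4$ and $k = 2$, so that $p = kb + k + 1 = 11$ is prime; these particular values are chosen here only for illustration.

```python
def delta_no_offset(x, y, p, b):
    """Collisions of x and y over {f_m : m = 1..p-1}, where f_m(x) = ((m*x) % p) % b."""
    return sum(1 for m in range(1, p) if ((m * x) % p) % b == ((m * y) % p) % b)

b, k = 4, 2
p = k * b + k + 1            # 11, which is prime
x, y = 1, b + 1

# Multipliers m whose f_m sends x and y to the same bin.
colliding = [m for m in range(1, p) if ((m * x) % p) % b == ((m * y) % p) % b]
print(colliding)             # → [1, 2, 9, 10]
print(delta_no_offset(x, y, p, b), (p - 1) / b)
```

The colliding multipliers are exactly $1, \ldots, k$ and $p-k, \ldots, p-1$, so the pair collides under $2k = 4$ functions while the universal$_2$ limit would be $(p-1)/b = 2.5$.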
The universal$_2$ class of functions given above may not be convenient when the keys are too long to be multiplied using a single machine instruction. However, the next proposition gives a method of extending a class of functions for long keys.

Proposition 5: Suppose $|B|$ is a power of two and $H$ is a class of functions from $A$ to $B$ with the property that for all distinct $x, y \in A$ and each $i \in B$,
$$|\{ f \in H \mid f(x) \oplus f(y) = i \}| = \frac{|H|}{|B|}.$$
Recall that $\oplus$ is exclusive-or. Then we can define a universal$_2$ class of hash functions from $A \times A$ to $B$ as follows. For $f, g \in H$, define $h_{f,g}((x,y)) = f(x) \oplus g(y)$. Then this new collection of hash functions $J = \{ h_{f,g} \mid f, g \in H \}$ is universal$_2$ and also satisfies the condition of this proposition.

The proof is quite similar to the proof of Proposition 6, and therefore omitted. Proposition 5 can be applied repeatedly to extend the functions to arbitrarily long keys. If the functions in $H$ can be applied in constant time, then the time required to compute an extended function is proportional to the length of the key.
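
The extension can be checked by brute force on a toy class. In the sketch below, $H$ is assumed to be the set of all functions from a 3-element key set to $\{0, 1\}$, which satisfies the premise of the proposition; this choice, like the helper names, is an assumption made to keep the enumeration small.

```python
from itertools import product

A = range(3)                          # hypothetical short-key set
B = range(2)                          # |B| = 2, a power of two
H = list(product(B, repeat=len(A)))   # all functions A -> B, with f(x) = t[x]

def xor_count(x, y, i):
    """Number of f in H with f(x) XOR f(y) == i."""
    return sum(1 for t in H if t[x] ^ t[y] == i)

# Premise of the proposition, checked for distinct x and y.
premise = all(
    xor_count(x, y, i) * len(B) == len(H)
    for x in A for y in A if x != y for i in B
)

# J = { h_{f,g} : f, g in H } on pairs of keys, h_{f,g}((x, y)) = f(x) XOR g(y).
J = list(product(H, H))
keys = list(product(A, A))

def delta_J(u, v):
    """Number of (f, g) in J under which the distinct long keys u and v collide."""
    return sum(1 for f, g in J if u != v and f[u[0]] ^ g[u[1]] == f[v[0]] ^ g[v[1]])

universal2 = all(delta_J(u, v) * len(B) <= len(J) for u in keys for v in keys)
print(premise, universal2)
```

Applying the same construction again to $J$ would give a class on keys of four components, which is the sense in which the proposition extends to arbitrarily long keys.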
The proposition does not quite apply to the universal$_2$ class of functions given earlier, both because $|H|$ is not a power of 2 (so $|H|/|B|$ cannot be an integer) and because the number of functions for which $f(x) \oplus f(y) = 0$, i.e. $\delta_H(x,y)$, is actually less than $|H|/|B|$. Both of these differences add small factors to $\delta_J(x,y)$ which barely prevent $J$ from being universal$_2$. The percentage contributed by these small factors decreases asymptotically to 0 as $p$ is increased. More details will be given in a future paper.

The following is a class of functions which do not require multiplication, which may be better for many applications. Suppose $A$ can be viewed as the set of $i$-digit numbers written in base $\alpha$, and $B$ as the set of binary numbers of length $j$. Then $|A| = \alpha^i$ and $|B| = 2^j$. Let $M$ be the class of arrays of length $i\alpha$, each of whose elements are bit strings of length $j$. For $m \in M$, let $m(k)$ be the bit string which is the $k$th element of $m$, and for $x \in A$, let $x_k$ be the $k$th digit of $x$ written in base $\alpha$. We define
$$f_m(x) = m(x_1 + 1) \oplus m(x_1 + x_2 + 2) \oplus \cdots \oplus m\Bigl(\sum_{k=1}^{i} x_k + i\Bigr).$$
The class $H$ is the set $\{ f_m \mid m \in M \}$.

Another way of defining this class is to give a program for applying a function to an input $x$.
dcl m(i·α) bit(j) init(random);
dcl x(i) digits base α;
dcl value bit(j);
disp := 0;
value := 0;
for k := 1 to i do begin
    disp := disp + x(k) + 1;
    value := value ⊕ m(disp);
end;
return (value);
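
The program above translates almost line for line into Python. This is a hedged sketch: the name `make_table_hash`, the representation of a key as a list of digits, and the parameter values are assumptions made for the example, and bit strings are represented as Python integers.

```python
import random

def make_table_hash(i, alpha, j, rng=random):
    """Draw f_m: a random table m of i*alpha bit strings, each j bits long."""
    m = [rng.getrandbits(j) for _ in range(i * alpha)]

    def f(digits):
        # digits holds the key as its i base-alpha digits x_1 .. x_i
        disp, value = 0, 0
        for d in digits:
            disp += d + 1
            value ^= m[disp - 1]   # the program's m(.) is 1-indexed
        return value

    f.table = m                    # exposed only so the table can be inspected
    return f

# Illustrative parameters (assumptions): 4-digit decimal keys, 8-bit indices.
f = make_table_hash(i=4, alpha=10, j=8, rng=random.Random(0))
print(f([1, 9, 7, 7]))
```

Each key touches exactly $i$ table entries, and the running displacement never exceeds $i\alpha$, so the table length $i\alpha$ is sufficient.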
Proposition 6: The class $H$ defined above is universal$_2$.
Proof: For $x$ and $y$ in $A$, suppose $f_m(x)$ is the exclusive-or of the rows $r_1, r_2, \ldots, r_s$ of $m$, and $f_m(y)$ is $r_{s+1} \oplus \cdots \oplus r_t$. Notice that $f_m(x) = f_m(y)$ if and only if $r_1 \oplus \cdots \oplus r_t = 0$. Assuming $x \ne y$, there will be some $k$ such that $r_k$ is involved in the calculation of only one of $f_m(x)$ or $f_m(y)$. Then $f_m(x)$ will equal $f_m(y)$ if and only if $r_k$ is the exclusive-or of the other $r_i$'s. Since there are $2^j = |B|$ possibilities for that row, $x$ and $y$ will collide for one $|B|$th of the possible functions $f_m$. Thus, the class of all $f_m$'s is universal$_2$. □
For a given $B$, each function in $H$ takes time linear in the length of the key. In addition, we can more accurately describe the distribution of costs of a particular sequence of requests under the different functions. For instance, Markowsky [6] has shown that for any sequence $R$ of $r$ requests with $k$ insertions, and any positive $t$, the probability that
$$C(f,R) - r\left(1 + \frac{k}{|B|}\right) > r \cdot t$$
is less than $\frac{k}{t^2 |B|}$ and also less than $\frac{7k}{t^3 |B|}$.
Importance:
Programmers sometimes spend a considerable amount of time refining hash functions for applications where it is critical that a uniform distribution be achieved ([5], p. 508-513). This may be difficult because it is necessary that the expected input set not be biased in such a way as to make the hash function perform poorly. One of the practical values of a class of universal$_2$ functions is that we know that there are many acceptable functions in the class. Simply choosing a single hash function randomly from such a class gives a high expectation that a uniform distribution will be achieved. Furthermore, if the function is changed each time the program is run, then we can be sure of good performance averaged over all runs.

The theoretical importance is that it allows one to get a good bound on the average performance of an algorithm which uses hashing. The problem with an ordinary hashing scheme is that the algorithm might bias the information being stored and retrieved towards those cases that are distributed unevenly by the particular hash function being used.

Rabin has developed an algorithm which finds the nearest neighbors of a collection of points in a plane, given the coordinates of the points [7]. This algorithm involves making a random choice of points, and it uses hashing. If one also randomly chooses the hash function from a universal$_2$ class, then the expected running time of the algorithm will always be linear in the number of points.
In [3] and [4] an algorithm is suggested for multiplying sparse polynomials, using hashing. We can strengthen the results of these papers. Let two polynomials, $P$ and $Q$, have $n$ and $m$ non-zero terms respectively. If multiplies and adds are viewed as taking constant time, then given any two polynomials $P$ and $Q$, we can multiply them in average time $O(n \cdot m)$. Let $CP_1, CP_2, \ldots, CP_n$ be the coefficients of the $n$ terms of $P$. Let $EP_1, EP_2, \ldots, EP_n$ be the exponents of those terms. Let $CQ_i$ and $EQ_i$ stand for the same quantities of $Q$. The following algorithm will have the performance we suggested, assuming Store and Retrieve are implemented using a universal$_2$ class of hash functions. Their first argument is a key, and the second is the value stored or retrieved. If a value has not been stored previously for a given key, a retrieve will have a zero value.
Begin
  Choose a hash function;
  For i := 1 to n do
    For j := 1 to m do
      Begin
        Coefficient := CP_i * CQ_j;
        Retrieve (EP_i + EQ_j, k);
        Store (EP_i + EQ_j, Coefficient + k);
      End;
  Print all keys and values which have been stored;
End;
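
The algorithm can be sketched in Python with a small chained hash table whose function is drawn from the first class presented earlier, $f_{m,n}(x) = ((mx + n) \bmod p) \bmod b$. The class name `UniversalTable`, the term encoding as `(coefficient, exponent)` pairs, and the default values of `p` and `b` are assumptions for illustration; exponent sums are assumed to be smaller than `p`.

```python
import random

class UniversalTable:
    """Chained hash table whose hash function f_{m,n} is drawn at random."""

    def __init__(self, p, b, rng=random):
        self.p, self.b = p, b
        self.m, self.n = rng.randrange(1, p), rng.randrange(p)
        self.chains = [[] for _ in range(b)]

    def _chain(self, key):
        return self.chains[((self.m * key + self.n) % self.p) % self.b]

    def retrieve(self, key):
        for k, v in self._chain(key):
            if k == key:
                return v
        return 0   # keys never stored read as zero, as the text specifies

    def store(self, key, value):
        chain = self._chain(key)
        for idx, (k, _) in enumerate(chain):
            if k == key:
                chain[idx] = (key, value)
                return
        chain.append((key, value))

    def items(self):
        return sorted(kv for chain in self.chains for kv in chain)

def multiply(P, Q, p=1009, b=64, rng=random):
    """P, Q: lists of (coefficient, exponent) terms; exponent sums must be < p."""
    table = UniversalTable(p, b, rng)
    for cp, ep in P:
        for cq, eq in Q:
            table.store(ep + eq, table.retrieve(ep + eq) + cp * cq)
    return table.items()   # sorted (exponent, coefficient) pairs

# (3x^5 + 2)(x^5 - 4x) = 3x^10 - 12x^6 + 2x^5 - 8x
print(multiply([(3, 5), (2, 0)], [(1, 5), (-4, 1)]))
```

The result is independent of which hash function is drawn; only the running time varies. Terms whose coefficients cancel to zero remain in the table, as they would in the pseudocode.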
Since addition and multiplication are viewed as taking constant time, the first class of functions we presented would seem appropriate for this analysis.

Future research:
There are a number of areas which can be investigated, such as:
1) Improve the bounds cited here on the probability that a particular function from the table look-up class will perform poorly on a particular input.
2) Suppose the store and retrieve algorithm is changed so that the last element retrieved is moved to the first position on the list. Can a smaller bound on the probability that the cost function exceeds a specified value be derived?
3) Extend the analysis to other storage and retrieval algorithms which involve hashing, such as double hashing and open addressing.
4) Generalize the definition of universal$_2$ to universal$_n$ to consider the action of the class of functions on any collection of $n$ elements of $A$. Determine if one can obtain improved results with such a stronger assumption. (One definition of universal$_k$ implies that the expected number of keys from a $k$-element set mapping into a given element of $B$ would be binomially distributed.)
5) When should one decide that a particular function is a poor choice and it would be worth the effort to choose a new function and rehash?
Acknowledgements:
We would like to thank Ashok Chandra for helping suggest and formulate the problem; Dave Glickman for suggesting we examine the table look-up technique; Hania Gajewska and David Smith for suggesting revisions in an earlier manuscript; Walter Rosenbaum for discussions about a practical use of this work; and George Markowsky for help in understanding the distribution of performance of the class obtained by table look-up.
References:
[1] Aho, A.V., Hopcroft, J.E., and Ullman, J.D., The Design and Analysis of Computer Algorithms, Addison-Wesley, Reading, Mass. (1974).
[2] Gill, J.T., III, Computational Complexity of Probabilistic Turing Machines, Proceedings of the Sixth ACM Symposium on the Theory of Computing, May, 1976, Seattle, Washington, p. 91-95.
[3] Goto, E., and Kanada, Y., Hashing Lemmas on Time Complexities with Applications to Formula Manipulation, Proceedings of the 1976 ACM Symposium on Symbolic and Algebraic Computation, August, 1976, Yorktown Heights, New York, p. 149-153.
[4] Gustavson, F., and Yun, D.Y.Y., Arithmetic Complexity of Unordered or Sparse Polynomials, Proceedings of the 1976 ACM Symposium on Symbolic and Algebraic Computation, August, 1976, Yorktown Heights, New York, p. 154-159.
[5] Knuth, D.E., Sorting and Searching, Addison-Wesley, Reading, Mass. (1973).
[6] Markowsky, G., private communication.
[7] Rabin, M.O., Probabilistic algorithms, Proceedings of Symposium on New Directions and Recent Results in Algorithms and Complexity (1976 - to appear).
[8] Rosenbaum, W.S., and Hilliard, J.J., Multifont OCR postprocessing system, IBM Journal of Research and Development (July 1975).
[9] Strassen, V., and Solovay, R., A fast Monte-Carlo test for primality, SIAM Journal on Computing (to appear).