A Simple PDF File

This is a small demonstration .pdf file, just for use in the Virtual Mechanics tutorials. More text. And more
text. And more text. And more text. And more text. And more text.

And more text. And more text. And more text. And more text. And more
text. And more text. Boring, zzzzz. And more text. And more text. And
more text. And more text. And more text. And more text. And more text.
And more text. And more text.

And more text. And more text. And more text. And more text. And more
text. And more text. And more text. Even more. Continued on page 2 ...
Simple PDF File 2
...continued from page 1. Yet more text. And more text. And more text.
And more text. And more text. And more text. And more text. And more
text. Oh, how boring typing this stuff. But not as boring as watching
paint dry. And more text. And more text. And more text. And more text.
Boring. More, a little more text. The end, and just as well.
RESEARCH CONTRIBUTIONS

Faster Methods for Random Sampling

JEFFREY SCOTT VITTER

Lloyd Fosdick
Guest Editor

ABSTRACT: Several new methods are presented for selecting n records at random without replacement from a file containing N records. Each algorithm selects the records for the sample in a sequential manner, in the same order the records appear in the file. The algorithms are online in that the records for the sample are selected iteratively with no preprocessing. The algorithms require a constant amount of space and are short and easy to implement. The main result of this paper is the design and analysis of Algorithm D, which does the sampling in O(n) time, on the average; roughly n uniform random variates are generated, and approximately n exponentiation operations (of the form a^b, for real numbers a and b) are performed during the sampling. This solves an open problem in the literature. CPU timings on a large mainframe computer indicate that Algorithm D is significantly faster than the sampling algorithms in use today.

Some of this research was done while the author was consulting for the IBM Palo Alto Scientific Center. Support was also provided in part by NSF Research Grant MCS-81-05324, by an IBM research contract, and by ONR and DARPA under Contract N00014-83-K-0146 and ARPA Order No. 4786. An extended abstract of this research appears in [10].

© 1984 ACM 0001-0782/84/0700-0703 75¢

1. INTRODUCTION
Many computer science and statistics applications call for a sample of n records selected randomly without replacement from a file containing N records or for a random sample of n integers from the set {1, 2, 3, ..., N}. Both types of random sampling are essentially equivalent; for convenience, in this paper we refer to the former type of sampling, in which records are selected. Some important uses of sampling include market surveys, quality control in manufacturing, and probabilistic algorithms. Interest in this subject stems from work on a new external sorting method called BucketSort that uses random sampling for preprocessing [7].

One way to select the n records is to generate an independent random integer k between 1 and N and to select the kth record if it has not already been selected; this process is repeated until n records have been selected. (If n > N/2, it is faster to select the N − n records not in the sample.) This is an example of a nonsequential algorithm because the records in the sample might not be selected in linear order. For example, the 84th record in the file may be selected before the 16th record in the file is selected. The algorithm requires the generation of O(n) uniform random variates, and it runs in O(n) time if there is enough extra space to check in constant time whether the kth record has already been selected. The checking can be done in O(N) space using a bit array or with O(n) pointers using hashing techniques (e.g., [2, 6]). In either case, the space required may be prohibitive.

Often, we want the n records in the sample to be in the same order that they appear in the file so that they can be accessed sequentially, for example, if they reside on disk or tape. In order to accomplish this using a nonsequential algorithm, we must sort the records by their indices after the sampling is done. This requires O(n log n) time using a comparison-based sorting algorithm like quicksort or heapsort; address-calculation sorting can reduce the sorting time to O(n), on the average, but it requires space for O(n) pointers. Nonsequential algorithms thus take nonlinear time, or their space requirements are very large and the algorithm is somewhat complicated.
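The nonsequential method just described can be sketched in Python for comparison with the sequential algorithms that follow. This is a minimal illustration, not code from the paper: it assumes Python's `random.Random` as the generator, lets a built-in `set` play the role of the hash table, and the function name `nonsequential_sample` is hypothetical.

```python
import random


def nonsequential_sample(n: int, N: int, rng: random.Random) -> list[int]:
    """Select n distinct record indices from 1..N, nonsequentially.

    Draws random integers k in [1, N], keeps each one not already chosen
    (the set stands in for a hash table), then sorts the indices so the
    records can be read in file order.
    """
    if not 0 <= n <= N:
        raise ValueError("need 0 <= n <= N")
    # If n > N/2, it is cheaper to pick the N - n records NOT in the sample.
    complement = n > N // 2
    target = N - n if complement else n
    chosen: set[int] = set()
    while len(chosen) < target:
        k = rng.randint(1, N)  # independent random integer in [1, N]
        chosen.add(k)          # an already-selected k is simply ignored
    if complement:
        chosen = set(range(1, N + 1)) - chosen
    # This sort is the O(n log n) post-processing step.
    return sorted(chosen)
```

The final `sorted` call is the O(n log n) step that sequential methods avoid; the complement branch builds the full index range, so this sketch uses O(N) space in that case.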

July 1984 Volume 27 Number 7 Communications of the ACM 703


More importantly, the n records cannot be output in sequential order online: It takes O(n) time to output the first element since the sorting can begin only after all n records have been selected.

The alternative we take in this paper is to investigate sequential random sampling algorithms, which select the records in the same order that they appear in the file. The sequential sampling algorithms in this paper are ideally suited to online use since they iteratively select the next record for the sample in an efficient way. They also have the advantage of being extremely short and simple to implement.

The measure of performance we use for the algorithms is CPU time, not I/O time. This is reasonable for records stored on random-access devices like RAM or disk since all the algorithms take O(n) I/O time in this case. It is reasonable for tape storage as well since many tape drives have a fast-forward speed that can quickly skip over unwanted records. In terms of sampling n integers out of N, the I/O time is insignificant because there is no file of records being read.

The main result of this paper is the design and analysis of a fast new algorithm, called Algorithm D, which does the sequential sampling in O(n) time, on the average. This yields the optimum running time up to a constant factor, and it solves the open problem listed in exercise 3.4.2-8 in [6]. Approximately n uniform random variates are generated during the algorithm, and roughly n exponentiation operations (of the form a^b = exp(b ln a), for real numbers a and b) are performed. The method is much faster than the previously fastest-known sequential algorithm, and it is faster and simpler than the nonsequential algorithms mentioned above.

In the next section, we discuss Algorithm S, which up until now was the method of choice for sequential random sampling. In Section 3, we state and analyze three new methods (Algorithms A, B, and C); the main result, Algorithm D, is presented in Section 4. The naive implementation of Algorithm D requires the generation of approximately 2n uniform random variates and the computation of roughly 2n exponentiation operations. One of the optimizations given in Section 5 reduces both counts from 2n to n. The analysis in Section 6 shows that the running time of Algorithm D is linear in n. The performance of Algorithm S and the four new methods is summarized in Table I.

TABLE I: Performance of Algorithms

Algorithm   Average number of uniform random variates   Average running time
S           (N + 1)n/(n + 1)                            O(N)
A           n                                           O(N)
B           n                                           O(n²(1 + log log(N/n)))
C           n(n + 1)/2                                  O(n²)
D           ≈ n                                         O(n)

Section 7 gives CPU timings for FORTRAN 77 implementations of Algorithms S, A, C, and D on a large mainframe IBM 3081 computer system; the running times of these four algorithms (in microseconds) are approximately 16N (Algorithm S), 4N (Algorithm A), 8n² (Algorithm C), and 55n (Algorithm D). In Section 8, we draw conclusions and discuss related work. The Appendix gives the Pascal-like versions of the FORTRAN programs used in the CPU timings. A summary of this work appears in [10].

2. ALGORITHM S
In this section, the sequential random sampling method introduced in [3, 4] is discussed. The algorithm sequentially processes the records of the file and determines whether each record should be included in the sample. When n records have been selected, the algorithm terminates. If m records have already been selected from among the first t records in the file, the (t + 1)st record is selected with probability

$$\frac{n - m}{N - t}. \tag{2-1}$$

In the implementation below, the values of n and N decrease during the course of execution. All of the algorithms in this paper follow the convention that n is the number of records remaining to be selected and N is the number of records that have not yet been processed. (This is different from the implementations of Algorithm S in [3, 4, 6] in which n and N remain constant, and auxiliary variables like m and t in (2-1) are used.) With this convention, the probability of selecting the next record for the sample is simply n/N. This can be proved directly by the following short but subtle argument: If at any given time we must select n more records at random from a pool of N remaining records, then the next record should be chosen with probability n/N.

The algorithms in this paper are written in an English-like style used by the majority of papers on random sampling in the literature. In addition, Pascal-like implementations are given in the Appendix.

ALGORITHM S. This method sequentially selects n records at random from a file containing N records, where 0 ≤ n ≤ N. The uniform random variates generated in Step S1 must be independent of one another.
S1. [Generate U.] Generate a random variate U that is uniformly distributed between 0 and 1.
S2. [Test.] If NU > n, go to Step S4.
S3. [Select.] Select the next record in the file for the sample, and set n := n − 1 and N := N − 1. If n > 0, then return to Step S1; otherwise, the sample is complete and the algorithm terminates.
S4. [Don't select.] Skip over the next record (do not include it in the sample), set N := N − 1, and return to Step S1. ∎
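Algorithm S translates almost line for line into Python. The sketch below is illustrative only: `algorithm_s` is a hypothetical name, the records are assumed to arrive as any Python iterable of length N, and `random.Random.random()` supplies U.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def algorithm_s(records: Iterable[T], n: int, N: int, rng: random.Random) -> List[T]:
    """Select n of the N records, in file order, following Steps S1-S4."""
    if not 0 <= n <= N:
        raise ValueError("need 0 <= n <= N")
    sample: List[T] = []
    it = iter(records)
    while n > 0:
        record = next(it)           # the next unprocessed record
        U = rng.random()            # S1: U uniform on [0, 1)
        if N * U > n:               # S2: test
            N -= 1                  # S4: skip this record
        else:
            sample.append(record)   # S3: select it
            n -= 1
            N -= 1
    return sample
```

Since `rng.random()` returns values below 1, the test N·U > n always fails once n = N, so every remaining record is taken and the loop never reads past the end of the file.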


Before the algorithm is run, each record in the file has the same chance of being selected for the sample. Furthermore, the algorithm never runs off the end of the file before n records have been chosen: If at some point in the algorithm we have n = N, then each of the remaining n records in the file will be selected for the sample with probability one. The average number of uniform random variates generated by Algorithm S is (N + 1)n/(n + 1), and the average running time is O(N). Algorithm S is studied further in [6].

3. THREE NEW SEQUENTIAL ALGORITHMS
We define S(n, N) to be the random variable that counts the number of records to skip over before selecting the next record for the sample. The parameter n is the number of records remaining to be selected, and N is the total number of records left in the file. In other words, the (S(n, N) + 1)st record is the next one selected. Often we will abbreviate S(n, N) by S, in which case the parameters n and N will be implicit.

In this section, three new methods (Algorithms A, B, and C) for sequential random sampling are presented and analyzed. A fourth new method (Algorithm D), which is the main result of this paper, is described and analyzed in Sections 4 and 5. Each method decides which record to sample next by generating S and by skipping that many records. The general form of all four algorithms is as follows:

Step 1. Generate a random variate S(n, N).
Step 2. Skip over the next S(n, N) records in the file and select the following one for the sample. Set N := N − S(n, N) − 1 and n := n − 1. Return to Step 1 if n > 0.

The four methods differ from one another in how they perform Step 1. Generating S involves generating one or more random variates that are uniformly distributed between 0 and 1. As in Algorithm S in the last section, all uniform variates are assumed to be independent of one another.

The range of S(n, N) is the set of integers in the interval 0 ≤ s ≤ N − n. The distribution function F(s) = Prob{S ≤ s}, for 0 ≤ s ≤ N − n, can be expressed in two ways:

$$F(s) = 1 - \frac{(N-s-1)^{\underline{n}}}{N^{\underline{n}}} = 1 - \frac{(N-n)^{\underline{s+1}}}{N^{\underline{s+1}}}. \tag{3-1}$$

(We use the notation $a^{\underline{b}}$ to denote the "falling power" a(a − 1)···(a − b + 1) = a!/(a − b)!.) We have F(s) = 0 for s < 0 and F(s) = 1 for s ≥ N − n. The two formulas in (3-1) follow by induction from the relation

$$1 - F(s) = \operatorname{Prob}\{S > s\} = \operatorname{Prob}\{S > s-1\}\Bigl(1 - \frac{n}{N-s}\Bigr) = \bigl(1 - F(s-1)\bigr)\Bigl(1 - \frac{n}{N-s}\Bigr). \tag{3-2}$$

The expression n/(N − s) is the probability that the (s + 1)st record is selected for the sample, given that the first s records are not selected. The probability function f(s) = Prob{S = s}, for 0 ≤ s ≤ N − n, is equal to F(s) − F(s − 1). Substituting (3-1), we get the following two expressions for f(s), 0 ≤ s ≤ N − n:

$$f(s) = \frac{n}{N}\,\frac{(N-s-1)^{\underline{n-1}}}{(N-1)^{\underline{n-1}}} = \frac{n}{N}\,\frac{(N-n)^{\underline{s}}}{(N-1)^{\underline{s}}}. \tag{3-3}$$

When s < 0 or s > N − n, we define f(s) = 0. An alternate derivation of (3-3) follows from the combinatorial identity

The expected value E(S) is equal to

$$E(S) = \sum_{0 \le s \le N-n} s\,f(s) = \frac{N-n}{n+1}, \tag{3-4}$$

and the variance var(S) is equal to

$$\operatorname{var}(S) = \sum_{0 \le s \le N-n} s^2 f(s) - E(S)^2 = \frac{(N+1)(N-n)\,n}{(n+2)(n+1)^2}. \tag{3-5}$$

Both the expected value and the standard deviation of S are ≈ N/n. The probability function f(s) is graphed in Figure 1.

[Figure 1: f(s), cg(s), and h(s) plotted against s; vertical axis marked n/N; horizontal axis marked N/n, N − n, and N.]

FIGURE 1. The probability function f(s) = Prob{S = s} is graphed as a function of s. The mean and standard deviation of S are both approximately N/n. The quantities cg(s) and h(s) that are used in Algorithm D are also graphed for the case in which the random variable X is integer-valued.

3.1 Algorithm A
This is, by far, the simplest of the four methods. It is based on the observation that f(s) is equal to the difference F(s) − F(s − 1). We can generate S by setting it equal to the minimum value s such that U ≤ F(s), where U is uniformly distributed on the unit interval.
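The distribution (3-1), the probability function (3-3), and the rule "take the minimum s with U ≤ F(s)" can be checked with exact rational arithmetic. The helper names below (`falling`, `F`, `f`, `skip_by_inversion`) are illustrative assumptions. Because `skip_by_inversion` recomputes F(s) from scratch for each s, it behaves like the slow method of [3] rather than like the incremental search of Algorithm A.

```python
import random
from fractions import Fraction


def falling(a: int, b: int) -> int:
    """Falling power a(a-1)...(a-b+1); equals 1 when b == 0."""
    result = 1
    for i in range(b):
        result *= a - i
    return result


def F(s: int, n: int, N: int) -> Fraction:
    """Distribution function of the skip S(n, N), first form of (3-1)."""
    if s < 0:
        return Fraction(0)
    if s >= N - n:
        return Fraction(1)
    return 1 - Fraction(falling(N - s - 1, n), falling(N, n))


def f(s: int, n: int, N: int) -> Fraction:
    """Probability function f(s) = F(s) - F(s - 1)."""
    return F(s, n, N) - F(s - 1, n, N)


def skip_by_inversion(n: int, N: int, rng: random.Random) -> int:
    """Return the minimum s with U <= F(s), for U uniform on [0, 1)."""
    U = rng.random()
    s = 0
    while U > F(s, n, N):   # float vs Fraction comparison is exact in Python
        s += 1
    return s
```

For n = 3 and N = 10, for example, the values f(0), ..., f(7) sum to exactly 1, and the mean works out to 7/4 = (N − n)/(n + 1), as (3-4) states.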


By (3-1), we have

$$U \le 1 - \frac{(N-n)^{\underline{S+1}}}{N^{\underline{S+1}}}, \qquad\text{that is,}\qquad \frac{(N-n)^{\underline{S+1}}}{N^{\underline{S+1}}} \le 1 - U.$$

The random variable V = 1 − U is uniformly distributed as U is, so we can generate V directly, as in the following algorithm.

ALGORITHM A. This method sequentially selects n records at random from a file containing N records, where 0 ≤ n ≤ N. The uniform random variates generated in Step A1 must be independent of one another.
A1. [Generate V.] Generate a random variate V that is uniformly distributed between 0 and 1.
A2. [Find minimum s.] Search sequentially for the minimum value s ≥ 0 so that $(N-n)^{\underline{s+1}} \le N^{\underline{s+1}} V$. Set S := s.
A3. [Select the (S + 1)st record.] Skip over the next S(n, N) records in the file and select the following one for the sample. Set N := N − S(n, N) − 1 and n := n − 1. Return to Step A1 if n > 0. ∎

Steps A1-A3 are iterated n times, once for each selected record in the sample. In order to generate S, the inner loop implicit in Step A2 is executed O(S + 1) times; each loop takes constant time. The total time spent executing Step A2 is $O\bigl(\sum_{1 \le i \le n} (S_i + 1)\bigr) = O(N)$, where S_i denotes the ith skip. The total time is thus O(N).

Algorithms S and A both require O(N) time, but the number n of uniform variates generated by Algorithm A is much less than (N + 1)n/(n + 1), which is the average number of variates generated by Algorithm S. Depending on the implementation, Algorithm A can be four to eight times faster.

Algorithm A is similar to the one proposed in [3], except that in the latter method, the minimum s ≥ 0 satisfying U ≤ F(s) is found by recomputing F(s) from scratch for each successive value of s. The resulting algorithm takes O(nN) time. As the authors in [3] noted, that algorithm was definitely slower than their implementation of Algorithm S.

3.2 Algorithm B (Newton's Method)
In Step A2 of Algorithm A, the minimum value s satisfying U ≤ F(s) is found by means of a sequential search. Another way to do that is to find the "approximate root" s of the equation

$$F(s) \approx U \tag{3-6}$$

by using a variant of Newton's interpolation method. This resultant method is called Algorithm B.

Since F(s) does not have a continuous derivative, we use in its place the difference function

$$\Delta F(s) = F(s+1) - F(s) = f(s+1).$$

Newton's method can be shown to converge for this situation. The method requires O(log log S) iterations to find the approximate root s of (3-6) when s is large. Each iteration involves the computation of F(s) and ΔF(s), for some value s. The evaluation of F(s) requires O(n) time, and ΔF(s) = f(s + 1) can be computed from F(s) in constant time using (3-1) and (3-3). Thus, the time per selected record is O(n log log S) for large S. The total sampling time is bounded by $O\bigl(\sum_{1 \le i \le n} n \log\log S_i\bigr)$. Using the constraint that $\sum_{1 \le i \le n} S_i \le N - n$, it is easy to show that $\sum_{1 \le i \le n} \log\log S_i$ is maximized when each S_i is approximately N/n. Hence, the total running time for Algorithm B is O(n²(1 + log log(N/n))) in the worst case. It can be shown that the average running time is not better than the worst-case time by more than a constant factor.

We can obtain a higher order convergence in the search for the minimum s by replacing Newton's method with an interpolation scheme that uses higher order differences $\Delta^k F(s) = \Delta^{k-1}F(s+1) - \Delta^{k-1}F(s)$. Each difference Δ^k F(s), for k > 1, can be computed in constant time from Δ^{k−1}F(s) using the formula

$$\Delta^k F(s) = -\frac{n-k+1}{N-s-k}\,\Delta^{k-1}F(s). \tag{3-7}$$

Algorithm B does not seem to be of practical interest, especially when compared to Algorithms A and D, so further details are omitted.

3.3 Algorithm C (Independence Method)
Let U_1, U_2, ..., U_n be independent and uniformly distributed random variables on the unit interval. The distribution function F(s) = Prob{S ≤ s} can be expressed algebraically as

$$F(s) = 1 - \prod_{1 \le k \le n} \frac{N-s-k}{N-k+1} = 1 - \prod_{1 \le k \le n} \bigl(1 - F_k(s+1)\bigr), \tag{3-8}$$

where we let F_k(x) = Prob{(N − k + 1)U_k ≤ x} = x/(N − k + 1) be the distribution function of the random variable (N − k + 1)U_k. By independence, we have

$$\prod_{1 \le k \le n} \bigl(1 - F_k(s+1)\bigr) = \prod_{1 \le k \le n} \operatorname{Prob}\{(N-k+1)U_k > s+1\} = \operatorname{Prob}\Bigl\{\min_{1 \le k \le n}\{(N-k+1)U_k\} > s+1\Bigr\}.$$

Substituting this back into (3-8), we get

$$
\begin{aligned}
F(s) &= 1 - \operatorname{Prob}\Bigl\{\min_{1 \le k \le n}\{(N-k+1)U_k\} > s+1\Bigr\}\\
&= \operatorname{Prob}\Bigl\{\min_{1 \le k \le n}\{(N-k+1)U_k\} \le s+1\Bigr\}\\
&= \operatorname{Prob}\Bigl\{\Bigl\lfloor \min_{1 \le k \le n}\{(N-k+1)U_k\}\Bigr\rfloor \le s\Bigr\}.
\end{aligned}
\tag{3-9}
$$

(The notation ⌊x⌋, which is read "floor of x," denotes the largest integer ≤ x.) This shows that S has the same distribution as the floor of the minimum of the n independent random variables NU_1, (N − 1)U_2, ..., (N − n + 1)U_n. Algorithm C makes use of this fact to generate S.
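The general two-step form from the beginning of Section 3, driven by the independence method of Algorithm C, might be sketched in Python as follows. The names `skip_c` and `sequential_sample` are hypothetical, and records are identified by 0-based indices. Clamping S to N − n is one of the arbitrary choices Step C2 permits; here it also guards against floating-point rounding of (N − k + 1)U_k up to an integer.

```python
import math
import random
from typing import Callable, List

# A skip generator maps (n, N, rng) to a variate S(n, N).
SkipFn = Callable[[int, int, random.Random], int]


def skip_c(n: int, N: int, rng: random.Random) -> int:
    """Algorithm C, Steps C1-C2: S = floor(min over k of (N - k + 1) * U_k)."""
    m = min((N - k + 1) * rng.random() for k in range(1, n + 1))
    # Step C2 allows any value in 0..N-n if S comes out as N - n + 1; clamping
    # to N - n is one such choice and also absorbs floating-point round-up.
    return min(math.floor(m), N - n)


def sequential_sample(n: int, N: int, skip: SkipFn, rng: random.Random) -> List[int]:
    """General form of Section 3; returns 0-based indices of selected records."""
    if not 0 <= n <= N:
        raise ValueError("need 0 <= n <= N")
    selected: List[int] = []
    pos = 0                          # index of the next unprocessed record
    while n > 0:
        S = skip(n, N, rng)          # Step 1: generate S(n, N)
        pos += S                     # Step 2: skip S records ...
        selected.append(pos)         # ... and select the following one
        pos += 1
        N -= S + 1                   # N := N - S - 1
        n -= 1                       # n := n - 1
    return selected
```

Each call to `skip_c` draws n fresh uniforms, which is exactly why Algorithm C needs n(n + 1)/2 variates in total.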


ALGORITHM C (Independence Method). This method sequentially selects n records at random from a file containing N records, where 0 ≤ n ≤ N. The uniform random variates generated in Step C1 must be independent of one another.
C1. [Generate U_k, 1 ≤ k ≤ n.] Generate n independent random variates U_1, U_2, ..., U_n, each uniformly distributed between 0 and 1.
C2. [Find minimum.] Set S := ⌊min_{1≤k≤n}{(N − k + 1)U_k}⌋. If S = N − n + 1, set S to an arbitrary value between 0 and N − n. (We can ignore this test if the random number generator used in Step C1 can only produce numbers less than 1.)
C3. [Select the (S + 1)st record.] Skip over the next S(n, N) records in the file and select the following one for the sample. Set N := N − S(n, N) − 1 and n := n − 1. Return to Step C1 if n > 0. ∎

Steps C1-C3 are iterated n times, once for each selected record. The selection of the jth record in the sample, where 1 ≤ j ≤ n, requires the generation of n − j + 1 independent uniform random variates and takes O(n − j + 1) time. Hence, Algorithm C requires n + (n − 1) + ··· + 1 = n(n + 1)/2 uniform variates, and it runs in O(n²) time.

4. ALGORITHM D (REJECTION METHOD)
It is interesting to note that if the term (N − k + 1)U_k in (3-9) is replaced by NU_k, the resulting expression would be the distribution function for the minimum of n real numbers in the range from 0 to N. That distribution is the continuous counterpart of S, and it approximates S well. One of the key ideas in this section is that we can generate S in constant time by generating its continuous counterpart and then "correcting" it so that it has exactly the desired distribution function F(s).

Algorithm D has the general form described at the beginning of Section 3. The random variable S is generated by an application of von Neumann's rejection-acceptance method to the discrete case. We use a random variable X that is easy to generate and that has a distribution which approximates F(s) well. For simplicity, we assume that X is either a continuous or an integer-valued random variable. Let g(x) denote the density function of X if X is continuous or else the probability function of X if X is integer-valued. We choose a constant c ≥ 1 so that

$$f(\lfloor x \rfloor) \le c\,g(x) \tag{4-1}$$

for all x in the domain of g(x).

In order to generate S, we generate X and a random variate U that is uniformly distributed on the unit interval. If U > f(⌊X⌋)/cg(X) (which occurs with low probability), we reject ⌊X⌋ and start all over by generating a new X and U. When the condition U ≤ f(⌊X⌋)/cg(X) is finally satisfied, then we accept ⌊X⌋ and make the assignment S := ⌊X⌋. A modification of the following lemma is proven in [6].

LEMMA 1
The random variate S generated by the above procedure has distribution (3-1).

The comparison U > f(⌊X⌋)/cg(X) that is made in order to decide whether ⌊X⌋ should be rejected involves the computation of f(⌊X⌋), which by (3-3) requires O(min{n, ⌊X⌋ + 1}) time. Since the probability of rejection is very small, we can avoid this expense most of the time by substituting for f(s) a more quickly computed function h(s) such that

$$h(s) \le f(s). \tag{4-2}$$

With high probability, we will have U ≤ h(⌊X⌋)/cg(X). When this occurs, it follows that U ≤ f(⌊X⌋)/cg(X), so we can accept ⌊X⌋ and set S := ⌊X⌋. The value of f(⌊X⌋) must be computed only when U > h(⌊X⌋)/cg(X), which happens rarely. This technique is sometimes called a squeeze method since we have h(⌊x⌋) ≤ f(⌊x⌋) ≤ cg(x). Typical values of the functions f(s), cg(s), and h(s) are graphed in Figure 1 for the case in which X is an integer-valued random variable.

When n is large with respect to N, the rejection technique may be slower than the previous algorithms in this section due to the overhead involved in generating X, h(⌊X⌋), and g(X). For large n, Algorithm A is the fastest sampling method. The following algorithm utilizes a constant α that specifies where the tradeoff is: If n < αN, the rejection technique is used to do the sampling; otherwise, if n ≥ αN, the sampling is done by Algorithm A. The value of α depends on the particular computer implementation. Typical values of α can be expected to be in the range 0.05 to 0.15. For the implementation described in Section 7, we have α ≈ 0.07.

ALGORITHM D (Rejection Method). This method sequentially selects n records at random from a file containing N records, where 0 ≤ n ≤ N. At any given point in the algorithm, the variable n stores the number of records that remain to be selected for the sample, and N stores the number of (unprocessed) records left in the file. The uniform random variates generated in Step D2 must be independent of one another. The functions g(x) and h(s) and the constant c ≥ 1 depend on the current values of n and N, and they must satisfy (4-1) and (4-2). The constant α is in the range 0 ≤ α ≤ 1.
D1. [Is n ≥ αN?] If n ≥ αN, use Algorithm A to do the sampling and then terminate the algorithm. (Otherwise, we use the rejection technique of Steps D2-D5.)
D2. [Generate U and X.] Generate a random variate U that is uniformly distributed between 0 and 1 and a random variate X that has density function or probability function g(x).
D3. [Accept?] If U ≤ h(⌊X⌋)/cg(X), then set S := ⌊X⌋ and go to Step D5.
D4. [Accept?] If U ≤ f(⌊X⌋)/cg(X), then set S := ⌊X⌋. Otherwise, return to Step D2.
D5. [Select the (S + 1)st record.] Skip over the next S(n, N) records in the file and select the following one for the sample. Set N := N − S(n, N) − 1 and n := n − 1. Return to Step D2 if n > 0. ∎
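A compact Python sketch of Algorithm D follows. It assumes the first parameter choice (4-3) from Choosing the Parameters (g_1, c_1, h_1, with X_1 generated by the first form of (4-4)) and uses Algorithm A as the Step D1 fallback. It is an illustration under those assumptions, not the paper's FORTRAN or Pascal-like program: it ignores the second choice (4-5) and the β rule, and the function names and the default α = 0.07 (the value quoted for Section 7's implementation) are illustrative.

```python
import random


def f_skip(s: int, n: int, N: int) -> float:
    """f(s) = Prob{S = s} via the first form of (3-3); costs O(n) time."""
    if s < 0 or s > N - n:
        return 0.0
    value = n / N
    for k in range(1, n):
        value *= (N - s - k) / (N - k)
    return value


def skip_a(n: int, N: int, rng: random.Random) -> int:
    """Algorithm A, Steps A1-A2: minimum s >= 0 such that
    (N-n)^(s+1) / N^(s+1) <= V (falling powers), kept as a running product."""
    V = rng.random()
    s = 0
    quot = (N - n) / N                  # ratio for s = 0
    while quot > V:
        s += 1
        quot *= (N - n - s) / (N - s)   # extend both falling powers by one factor
    return s


def skip_d(n: int, N: int, rng: random.Random) -> int:
    """Steps D2-D4 using the first parameter choice (4-3)."""
    c1 = N / (N - n + 1)
    while True:
        U = rng.random()
        X = N * (1.0 - rng.random() ** (1.0 / n))   # X1, first form of (4-4)
        s = int(X)                                  # floor, since X >= 0
        if s > N - n:
            continue                                # f(s) = 0 there: reject
        cg = c1 * (n / N) * (1.0 - X / N) ** (n - 1)       # c1 * g1(X)
        h1 = (n / N) * (1.0 - s / (N - n + 1)) ** (n - 1)
        if U <= h1 / cg:                 # D3: cheap squeeze test
            return s
        if U <= f_skip(s, n, N) / cg:    # D4: exact test, needs O(n) time
            return s
        # otherwise reject and return to D2


def algorithm_d(n: int, N: int, rng: random.Random, alpha: float = 0.07) -> list[int]:
    """Sketch of Algorithm D; returns 0-based indices of the selected records."""
    if not 0 <= n <= N:
        raise ValueError("need 0 <= n <= N")
    skip = skip_a if n >= alpha * N else skip_d   # D1, tested once as in the steps
    selected: list[int] = []
    pos = 0                          # index of the next unprocessed record
    while n > 0:
        S = skip(n, N, rng)
        pos += S                     # D5: skip S records ...
        selected.append(pos)         # ... and select the following one
        pos += 1
        N -= S + 1
        n -= 1
    return selected
```

The squeeze test in D3 avoids the O(n) loop in `f_skip` on almost every iteration, which is what keeps the expected cost per selected record constant.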


Choosing the Parameters

Two good ways (namely, (4-3) and (4-5)) for choosing the parameters $X$, $c$, $g(x)$, and $h(s)$ are presented below. Discovering these two ways is the hard part of this section; once they are determined, it is easy to prove Lemmas 2 and 3, which show that (4-3) and (4-5) satisfy conditions (4-1) and (4-2).

The first way works better when $n^2/N$ is small, and the second way is better when $n^2/N$ is large. An easy rule for deciding which to use is as follows: If $n^2/N \le \beta$, then we use $X_1$, $c_1$, $g_1(x)$, and $h_1(s)$; else if $n^2/N > \beta$, then $X_2$, $c_2$, $g_2(s)$, and $h_2(s)$ are used. The value of the constant $\beta$ is implementation-dependent. We show in Section 6.1 that in order to minimize the average number of uniform variates generated by Algorithm D, we should set $\beta \approx 1$. The running times of the FORTRAN implementations discussed in Section 7 are minimized by $\beta \approx 50$.

Our first choice of the parameters is
$$g_1(x) = \begin{cases} \dfrac{n}{N}\left(1 - \dfrac{x}{N}\right)^{n-1}, & \text{if } 0 \le x \le N;\\ 0, & \text{otherwise;} \end{cases}$$
$$c_1 = \frac{N}{N-n+1}; \tag{4-3}$$
$$h_1(s) = \begin{cases} \dfrac{n}{N}\left(1 - \dfrac{s}{N-n+1}\right)^{n-1}, & \text{if } 0 \le s \le N-n;\\ 0, & \text{otherwise.} \end{cases}$$

The random variable $X_1$ with density $g_1(x)$ has the beta distribution scaled to the interval $[0, N]$ and with parameters $a = 1$ and $b = n$. It is the continuous counterpart of $S$, as mentioned in the beginning of this section: The value of $X_1$ can be thought of as the smallest of $n$ real numbers chosen independently and uniformly from the interval $[0, N]$.

We can generate $X_1$ very quickly with only one uniform or exponential random variate. Let $Z_1, Z_2, \ldots, Z_n$ denote $n$ independent and uniformly chosen real numbers from the interval $[0, N]$. We define $G_1(x)$ to be the distribution function of $X_1$. By independence, we have
$$G_1(x) = \Pr\{X_1 \le x\} = 1 - \Pr\{X_1 > x\} = 1 - \prod_{1 \le k \le n} \Pr\{Z_k > x\} = 1 - \left(1 - \frac{x}{N}\right)^{n}.$$
It is well known that we can generate a random variate $X_1$ with distribution $G_1(x)$ by setting
$$X_1 := G_1^{-1}(U) \quad\text{or}\quad X_1 := G_1^{-1}(e^{-Y}),$$
where $U$ is uniformly distributed on the unit interval and $Y$ is exponentially distributed. By algebraic manipulation, we get
$$G_1^{-1}(y) = N\left(1 - (1 - y)^{1/n}\right).$$
Since $1 - U$ is uniformly distributed when $U$ is (and $e^{-Y}$ is itself uniformly distributed on the unit interval), we can generate $X_1$ by setting
$$X_1 := N\left(1 - U^{1/n}\right) \quad\text{or}\quad X_1 := N\left(1 - e^{-Y/n}\right). \tag{4-4}$$

The following lemma shows that the definitions in (4-3) satisfy requirements (4-1) and (4-2).

LEMMA 2.
The choices of $g_1(x)$, $c_1$, and $h_1(s)$ in (4-3) satisfy the relation
$$h_1(s) \le f(s) \le c_1\,g_1(s+1).$$
Note that since $g_1(x)$ is a nonincreasing function, this immediately implies (4-1).

PROOF
The proof is straightforward. First we prove the second inequality. We have
$$\begin{aligned}
f(s) &= \frac{n\,(N-s-1)^{\underline{n-1}}}{N^{\underline{n}}} = \frac{n}{N-n+1}\cdot\frac{(N-s-1)^{\underline{n-1}}}{N^{\underline{n-1}}}\\
&\le \frac{n}{N-n+1}\left(\frac{N-s-1}{N}\right)^{n-1} = \frac{N}{N-n+1}\cdot\frac{n}{N}\left(1 - \frac{s+1}{N}\right)^{n-1} = c_1\,g_1(s+1).
\end{aligned}$$
The "$\le$" term in the above derivation follows because $(N-s-1-k)/(N-k) \le (N-s-1)/N$, for $0 \le k \le n-2$. The first inequality can be proved in the same way:
$$h_1(s) = \frac{n}{N}\left(\frac{N-s-n+1}{N-n+1}\right)^{n-1} \le \frac{n\,(N-s-1)^{\underline{n-1}}}{N\,(N-1)^{\underline{n-1}}} = f(s).$$
The "$\le$" term follows since
$$\frac{N-s-n+1}{N-n+1} \le \frac{N-s-1-k}{N-1-k}, \quad \text{for } 0 \le k \le n-2. \qquad ∎$$

The second choice for the parameters is
$$g_2(s) = \frac{n-1}{N-1}\left(1 - \frac{n-1}{N-1}\right)^{s}, \quad s \ge 0;$$
$$c_2 = \frac{n}{n-1}\cdot\frac{N-1}{N}; \tag{4-5}$$
$$h_2(s) = \begin{cases} \dfrac{n}{N}\left(1 - \dfrac{n-1}{N-s}\right)^{s}, & \text{if } 0 \le s \le N-n;\\ 0, & \text{otherwise.} \end{cases}$$

The random variable $X_2$ with probability function $g_2(s)$ has the geometric distribution. Its range of values is the set of nonnegative integers. We can generate $X_2$ quickly with a single uniform or exponential random variate by setting


$$X_2 := \left\lfloor \frac{\ln U}{\ln\bigl(1 - \frac{n-1}{N-1}\bigr)} \right\rfloor \quad\text{or}\quad X_2 := \left\lfloor \frac{-Y}{\ln\bigl(1 - \frac{n-1}{N-1}\bigr)} \right\rfloor, \tag{4-6}$$
where $U$ is uniformly distributed on the unit interval and $Y$ is exponentially distributed. This is easy to prove: Let $p$ denote the fraction $(n-1)/(N-1)$. We have $X_2 = s$ if and only if $s \le (\ln U)/\ln(1-p) < s+1$, which (after multiplying through by the negative quantity $\ln(1-p)$) is equivalent to the condition $(1-p)^{s} \ge U > (1-p)^{s+1}$, and this occurs with probability $g_2(s) = p(1-p)^{s}$.

The following lemma shows that (4-5) satisfies requirements (4-1) and (4-2).

LEMMA 3.
The choices $g_2(s)$, $c_2$, and $h_2(s)$ in (4-5) satisfy the relation
$$h_2(s) \le f(s) \le c_2\,g_2(s).$$

PROOF
This proof is along the lines of the proof of Lemma 2. To prove the second inequality, we note that
$$f(s) = \frac{n\,(N-n)^{\underline{s}}}{N\,(N-1)^{\underline{s}}} \le \frac{n}{N}\left(\frac{N-n}{N-1}\right)^{s} = c_2\,g_2(s).$$
The first inequality can be proved in the same way:
$$h_2(s) = \frac{n\,(N-s-n+1)^{s}}{N\,(N-s)^{s}} \le \frac{n\,(N-n)^{\underline{s}}}{N\,(N-1)^{\underline{s}}} = f(s). \qquad ∎$$

5. OPTIMIZING ALGORITHM D
In this section, four modifications of the naive implementation of Algorithm D are given that can improve the running time significantly. In particular, the last two modifications cut the number of uniform random variates generated and the number of exponentiation operations performed by half, which makes the algorithm run roughly twice as fast. Two detailed implementations utilizing these modifications are given in the Appendix.

5.1 When to Test $n \ge \alpha N$
The values of $n$ and $N$ decrease each time $S$ is generated in Step D5. If initially we have $n/N \approx \alpha$, then during the course of execution, the value of $n/N$ will probably be sometimes $< \alpha$ and sometimes $\ge \alpha$. When that is the case, it might be advantageous to modify Algorithm D and do the "Is $n \ge \alpha N$?" test each of the $n$ times $S$ must be generated. If $n < \alpha N$, then we generate $S$ by doing Steps D2-D4; otherwise, Steps A1 and A2 are executed. This can be implemented by changing the "go to" in Step D5 so that it returns to Step D1 instead of to D2, and by the following substitution for Step D1:

D1. [Is $n \ge \alpha N$?] If $n \ge \alpha N$, then generate $S$ by executing Steps A1 and A2 of Algorithm A, and go to Step D5. (Otherwise, $S$ will be generated by Steps D2-D4.)

When $n/N \approx \alpha$, the time required to do the $n - 1$ extra "Is $n \ge \alpha N$?" tests will be compensated for by the decreased time for generating $S$; if $n/N \ll \alpha$, this modification will cause a slight increase in the running time (approximately 1-2 percent). An important advantage of this modification when $X_1$ is used for $X$ in the rejection technique is to guard against "worst case" behavior, which happens when the value of $N$ decreases drastically and becomes roughly equal to $n$ as a result of a very large value of $S$ being generated; in such cases $c_1 = N/(N-n+1)$ becomes large, and the running time of the remainder of the algorithm will be quite large, on the average, unless the modification is used.

The implementations of Algorithm D in the Appendix use a slightly different modification, in which an "Is $n \ge \alpha N$?" test is done at the start of each loop until the test is true, after which Algorithm A is called to finish the sampling. The resulting program is simpler than the first modification, and it still protects against worst-case behavior.

5.2 The Special Case $n = 1$
The second modification speeds up the generation of $S$ when only one record remains to be selected. The random variable $S(1, N)$ is uniformly distributed among the integers $0 \le s \le N - 1$; thus when $n = 1$ we can generate $S$ directly by setting $S := \lfloor NU \rfloor$, where $U$ is uniformly distributed on the unit interval. (The case $U = 1$ happens with zero probability, so when we have $U = 1$, we can assign $S$ arbitrarily.) This modification can be applied to all the sampling algorithms discussed in this paper.

5.3 Reducing the Number of Uniform Random Variates Generated
The third modification allows us to reduce the number of uniform random variates used in Algorithm D by half. Each generation of $X$ as described in (4-4) and (4-6) requires the generation of an independent uniform random variate, which we denote by $V$. (In the case in which an exponential variate is used to generate $X$, we assume that the exponential variate is generated by first generating a uniform variate, which is typically the case.) Except for the first time $X$ is generated, the variate $V$ (and hence $X$) can be computed in an independent way using the values of $U$ and $X$ from the previous loop, as follows: During Steps D3 and possibly D4 of the previous inner loop, it was determined that either $U \le y_1$, $y_1 < U \le y_2$, or $y_2 < U$, where $y_1 = h(\lfloor X \rfloor)/cg(X)$ and $y_2 = f(\lfloor X \rfloor)/cg(X)$. We compute $V$ for the next loop by setting
$$V := \begin{cases} \dfrac{U}{y_1}, & \text{if } U \le y_1;\\[1ex] \dfrac{U - y_1}{y_2 - y_1}, & \text{if } y_1 < U \le y_2;\\[1ex] \dfrac{U - y_2}{1 - y_2}, & \text{if } y_2 < U. \end{cases} \tag{5-1}$$
The following lemma can be proven using the definitions of independence and of $V$:


LEMMA 4.
The value $V$ computed via (5-1) is a uniform random variate that is independent of all previous values of $X$ and of whether or not each $X$ was accepted.

5.4 Reducing the Number of Exponentiation Operations
An exponentiation operation is the computation of the form $a^b = \exp(b \ln a)$, for real numbers $a$ and $b$. It can be done in constant time using the library functions EXP and LOG. For simplicity, each computation of $\exp$ or $\ln$ is regarded as "half" an exponentiation operation.

First, the case in which $X_1$ is used for $X$ is considered. By the last modification, only one uniform random variate must be generated during each loop, but each loop still requires two exponentiation operations: one to compute $X_1$ from $V$ using (4-4) and the other to compute
$$\frac{h_1(\lfloor X_1 \rfloor)}{c_1\,g_1(X_1)} = \frac{N-n+1}{N}\left(\frac{N-n-\lfloor X_1 \rfloor+1}{N-n+1}\cdot\frac{N}{N-X_1}\right)^{n-1}.$$
We can cut down the number of exponentiations to roughly one per loop in the following way: Instead of doing the test $U \le h_1(\lfloor X_1 \rfloor)/c_1 g_1(X_1)$, we use the equivalent test
$$\left(\frac{NU}{N-n+1}\right)^{1/(n-1)} \le \frac{N-n-\lfloor X_1 \rfloor+1}{N-n+1}\cdot\frac{N}{N-X_1}. \tag{5-2}$$
If the test is true, which is almost always the case, we set $V'$ to the quotient of the LHS divided by the RHS; the resulting $V'$ has the same distribution as the $(n-1)$st root of a uniform random variate. Since $n$ decreases by 1 before the start of the next loop, we can generate the next value of $X_1$ without doing an exponentiation by setting
$$X_1 := N(1 - V') \tag{5-3}$$
(cf. (4-4)). Thus, in almost all cases, only one exponentiation operation is required per loop. Another important advantage of using the test (5-2) instead of the test $U \le h_1(\lfloor X_1 \rfloor)/c_1 g_1(X_1)$ is that the possibility of floating point underflow is eliminated.

When $X_2$ is used for $X$, we have a similar situation, though not quite as favorable. The computation of $X_2$ from $V$ requires two $\ln$ operations (which counts as one exponentiation operation, as explained above). Another exponentiation operation in each loop is required to compute
$$\frac{h_2(X_2)}{c_2\,g_2(X_2)} = \left(\frac{N-n-X_2+1}{N-X_2}\cdot\frac{N-1}{N-n}\right)^{X_2}.$$
The number of exponentiations can be cut down to about 1.5 per loop, as follows: Instead of doing the test $U \le h_2(X_2)/c_2 g_2(X_2)$, we use the equivalent test
$$\ln U \le X_2 \times \ln\left(\frac{N-n-X_2+1}{N-X_2}\cdot\frac{N-1}{N-n}\right). \tag{5-4}$$
If the test is true, which is almost always the case, we set $V'$ to the difference of the LHS minus the RHS; the resulting $V'$ has the same distribution as the natural logarithm of a uniform random variate. We can generate the next value of $X_2$ by setting
$$X_2 := \left\lfloor \frac{V'}{\ln\bigl(1 - \frac{n-1}{N-1}\bigr)} \right\rfloor \tag{5-5}$$
(cf. (4-6)). The common term $\ln\bigl(1 - (n-1)/(N-1)\bigr)$ in (5-4) and (5-5) need only be computed once per loop, so the total number of $\ln$ operations is roughly three per loop (which counts as 1.5 exponentiation operations per loop).

The modification discussed in this section is an efficient alternative to using (5-1) for computing $V$, for the case $U \le y_1$. When we have $U > y_1$, which happens with very low probability, it is quicker to generate $V$ by calling a random number generator than it is to use a technique similar to (5-1).

6. ANALYSIS OF ALGORITHM D
In this section, we prove that the average number of uniform random variates generated by Algorithm D and the average running time are both $O(n)$. We also discuss how the correct choice of $X_1$ or $X_2$ during each iteration of the algorithm can further improve performance.

6.1 Average Number $V(n, N)$ of Uniform Random Variates
The average number of uniform random variates generated during Algorithm D is denoted by $V(n, N)$. We use $V'(n, N)$ to denote the average number of uniform variates for the modified version of the algorithm in which each variate $X$ is computed from the previous values of $U$ and $X$. Theorems 1 and 2 show that $V(n, N)$ and $V'(n, N)$ are approximately equal to $2n$ and $n$, respectively.

THEOREM 1.
The average number $V(n, N)$ of uniform random variates used by the unmodified Algorithm D is bounded by
$$V(n, N) \le \begin{cases} \dfrac{2nN}{N-n+1}, & \text{if } n < \alpha N;\\[1ex] n, & \text{if } n \ge \alpha N. \end{cases} \tag{6-1}$$

When $n < \alpha N$, we have $V(n, N) \approx 2n(1 + n/N)$. The basic idea of the proof is that $U$ and $X$ must be generated roughly $1 + n/N$ times, on the average, in order to generate each of the $n$ values of $S$. Thus, approximately $2n(1 + n/N)$ variates are needed for the sampling. The difficult part of the following proof is accounting for the fact that the values of $n$ and $N$ change during the course of execution.

PROOF
It is assumed that $X_1$, $g_1(x)$, $c_1$, and $h_1(s)$, which are defined in (4-3), are used for $X$, $g(x)$, $c$, and $h(s)$ in Algorithm D. If $n \ge \alpha N$, then Algorithm A is used, and exactly $n$ uniform random variates are generated.


For the $n < \alpha N$ case, (6-1) is derived by induction on $n$. In order to generate $S$, the average number of times Steps D2-D4 are executed is $1/(1 - r)$, where $r$ is the probability of rejection. The probability of rejection is
$$r = \int_0^N g(t)\left(1 - \frac{f(\lfloor t \rfloor)}{c\,g(t)}\right)dt = 1 - \frac{1}{c}.$$
By substitution, each generation of $S$ requires an average of $1/(1 - r) = c$ iterations, which corresponds to $2c$ uniform random variates. If $n = 1$, we have $c = 1$, so $\lfloor X \rfloor$ is accepted immediately and we have $V(1, N) = 2$. Now let us assume that (6-1) is true for samples of size $n - 1$; we will show that it remains true for $n$ sampled records. By (6-1) and (3-3), we have
$$\begin{aligned}
V(n, N) &\le \frac{2N}{N-n+1} + \sum_{0 \le s \le N-n} f(s)\,V(n-1, N-s-1)\\
&\le \frac{2N}{N-n+1} + \sum_{0 \le s \le N-n} \frac{n\,(N-s-1)^{\underline{n-1}}}{N^{\underline{n}}}\cdot\frac{2(n-1)(N-s-1)}{N-s-n+1}\\
&= \frac{2N}{N-n+1} + \frac{2n(n-1)}{N^{\underline{n}}}\sum_{0 \le s \le N-n} (N-s-1)\,(N-s-1)^{\underline{n-2}}\\
&= \frac{2N}{N-n+1} + \frac{2n(n-1)}{N^{\underline{n}}}\Biggl(\sum_{0 \le s \le N-n} (N-s)^{\underline{n-1}} - \sum_{0 \le s \le N-n} (N-s-1)^{\underline{n-2}}\Biggr)\\
&= \frac{2N}{N-n+1} + \frac{2n(n-1)}{N^{\underline{n}}}\left(\frac{(N+1)^{\underline{n}}}{n} - (n-1)! - \frac{N^{\underline{n-1}}}{n-1} + (n-2)!\right)\\
&\le \frac{2nN}{N-n+1}.
\end{aligned}$$
In the last step, $(N+1)^{\underline{n}}/N^{\underline{n}} = (N+1)/(N-n+1)$ and $N^{\underline{n-1}}/N^{\underline{n}} = 1/(N-n+1)$, so the terms other than the factorials sum to $(2nN - 2)/(N-n+1)$, while $(n-2)! - (n-1)! \le 0$. This completes the proof of Theorem 1. ∎

THEOREM 2.
The average number $V'(n, N)$ of uniform random variates used by Algorithm D with the second and third modifications described in the previous section is bounded by
$$V'(n, N) \le \begin{cases} \dfrac{nN}{N-n+1}, & \text{if } n < \alpha N;\\[1ex] n, & \text{if } n \ge \alpha N. \end{cases} \tag{6-2}$$

PROOF
We need only consider the case $n < \alpha N$. By the last modification, the variate $X$ is generated using a uniform random variate $V$ computed from the previous values of $U$ and $X$. Thus, the first iteration requires the generation of the two uniform random variates $U$ and $V$, but each successive loop requires only the generation of $U$. By the second modification, the last iteration does not require any random variates to be generated (the last value of $V$ can serve as the uniform variate in $S := \lfloor NU \rfloor$). Thus, the number of uniform random variates that must be generated is exactly $\frac{1}{2}V(n, N)$. The theorem follows from (6-1). ∎

In our derivation of (6-1) and (6-2), we assumed that $X_1$ was used for $X$ throughout Algorithm D. We can do better if we sometimes use $X_2$ for $X$. We showed above that we need an average of $c$ iterations to generate each successive $S$. For $X_1$, we have $c_1 = N/(N-n+1) \approx 1 + n/N$; for $X_2$, we have $c_2 = (n/(n-1))((N-1)/N) \approx 1 + 1/n$. Thus, we could use $X_1$ when $n^2/N \le \beta$, and we could use $X_2$ when $n^2/N > \beta$, where $\beta = 1$.

The following intuitive argument indicates that this might reduce $V'(n, N)$ to
$$V'(n, N) \approx \begin{cases} n\left(1 + \dfrac{n}{N}\right), & \text{if } n^2/N \le 1,\ n < \alpha N;\\[1ex] n\left(1 + \dfrac{1 + \ln(n^2/N)}{n}\right), & \text{if } n^2/N > 1,\ n < \alpha N;\\[1ex] n, & \text{if } n \ge \alpha N. \end{cases} \tag{6-3}$$
The informal justification of (6-3) is based on the observation that the ratio $n/N$ usually does not change much during the execution of Algorithm D. At the expense of mathematical rigor, we will make the simplifying assumption that $n/N$ remains constant during execution. The value of $n^2/N = n(n/N)$ decreases linearly to 0 as $n$ decreases to 0. If initially we have $n^2/N \le 1$ and $n < \alpha N$, then $n^2/N$ will always be $\le 1$ during execution, so $X_1$ will be used throughout Algorithm D; thus, $V'(n, N) \approx n(1 + n/N)$, as in Theorem 2. If instead we have $n^2/N > 1$ and $n < \alpha N$ initially, then $X_2$ will be used for $X$ the first $n - N/n$ times $S$ is generated, after which we will have $n^2/N \approx 1$. The random variable $X_1$ will be used for $X$ the last $N/n$ times $S$ is generated. Hence, the total number of uniform random variates is approximately
$$\sum_{N/n < k \le n}\left(1 + \frac{1}{k}\right) + \frac{N}{n}\left(1 + \frac{n}{N}\right) = n - \frac{N}{n} + H_n - H_{N/n} + \frac{N}{n} + 1 \approx n + 1 + \ln\frac{n^2}{N}.$$
(The symbol $H_n$ denotes the $n$th harmonic number $1 + 1/2 + \cdots + 1/n$.) This completes the argument.

6.2 Average Execution Time $T(n, N)$

TABLE II: Times per Step for Algorithm D.
Step    Time per Step
D1      $d_1$
D2      $d_2$
D3      $d_3$
D4      $d_4 \cdot \min\{n, \lfloor X \rfloor + 1\}$
D5      $d_5$

$T(n, N)$ is used to represent the average total running time of Algorithm D. As shown in Table II, we can
bound the time it takes to execute each step of Algorithm D exactly once by the quantities $d_1$, $d_2$, $d_3$, $d_4 \cdot \min\{n, \lfloor X \rfloor + 1\}$, and $d_5$, where each $d_i$ is a positive real-valued constant.

If initially we have $n/N \ge \alpha$, then Algorithm A is used to do the sampling, and the average running time $T(n, N)$ can be bounded closely by $d_1 + d'N + d''n$, for some constants $d'$ and $d''$. The following theorem shows that $T(n, N)$ is at most linear in $n$.

THEOREM 3.
The average running time $T(n, N)$ of Algorithm D is bounded by
$$T(n, N) \le \begin{cases} d_1 + \dfrac{nN}{N-n+1}\,(d_2 + d_3 + 3d_4) + d_5 n, & \text{if } n < \alpha N;\\[1ex] d_1 + d'N + d''n, & \text{if } n \ge \alpha N. \end{cases} \tag{6-4}$$

PROOF
All that is needed is the $n < \alpha N$ case of (6-4), in which the rejection technique is used. We assume that $X_1$, $g_1(x)$, $c_1$, and $h_1(s)$ are used in place of $X$, $g(x)$, $c$, and $h(s)$ throughout Algorithm D. Steps D2 and D3 are each executed $c_1$ times, on the average, when $S$ is generated. The proof of Theorem 1 shows that the total contribution to $T(n, N)$ from Steps D1, D2, D3, and D5 is bounded by
$$d_1 + \frac{nN}{N-n+1}\,(d_2 + d_3) + d_5 n.$$
The tricky part in this proof is to consider the contribution to $T(n, N)$ from Step D4. The time for each execution of Step D4 is bounded by $d_4 \cdot \min\{n, \lfloor X_1 \rfloor + 1\} \le d_4(\lfloor X_1 \rfloor + 1)$. Step D3 is executed an average of $c_1$ times per generation of $S$. The probability that $U > h_1(\lfloor X \rfloor)/c_1 g_1(X)$ in Step D3 (which is the probability that Step D4 is executed next) is $1 - h_1(\lfloor X \rfloor)/c_1 g_1(X)$. Hence, the time spent executing Step D4 in order to generate $S$ is bounded by
$$c_1 \int_0^N d_4(x+1)\,g_1(x)\left(1 - \frac{h_1(x)}{c_1\,g_1(x)}\right)dx = c_1 d_4 \int_0^N (x+1)\,g_1(x)\,dx - d_4 \int_0^N (x+1)\,h_1(x)\,dx.$$
Since $E(X_1) = N/(n+1)$, the first integral is
$$c_1 d_4\bigl(E(X_1) + 1\bigr) = c_1 d_4\,\frac{N+n+1}{n+1}.$$
The second integral equals
$$\frac{d_4}{c_1}\cdot\frac{N+2}{n+1}.$$
The difference of the two integrals is bounded by $3d_4c_1$. The proof of Theorem 1 shows that the total contribution of Step D4 to $T(n, N)$ is therefore at most
$$3d_4\,\frac{nN}{N-n+1}.$$
This completes the proof of Theorem 3. ∎

We proved the time bound (6-4) using $X_1$ for $X$ throughout the algorithm. We can do better if we instead use $X_1$ for $X$ when $n^2/N \le \beta$ and $X_2$ for $X$ when $n^2/N > \beta$. We showed in Section 6.1 that the value $\beta \approx 1$ minimizes the average number of uniform variates generated. The value of $\beta$ that optimizes the average running time of Algorithm D depends on the computer implementation. For the FORTRAN implementation described in Section 7, we have $\beta \approx 50$.

The constants $d_i$, for $2 \le i \le 5$, have different values when $X_2$ is used for $X$ than when $X_1$ is used. In order to get an intuitive idea of how much faster Algorithm D is when we use $X_1$ and $X_2$, let us assume that the values of the constants $d_i$ are the same for $X_2$ as they are for $X_1$. If we bound the time for Step D4 by $d_4 n$ rather than by $d_4(\lfloor X_1 \rfloor + 1)$ as we did in the proof of Theorem 3, we can show that when $n^2/N \le \beta$ the time required to generate $S$ using $X_1$ for $X$ is at most
$$\frac{N}{N-n+1}\left(d_2 + d_3 + 2d_4\,\frac{n^2}{N}\right) + d_5. \tag{6-5}$$
Similarly, we can prove that the time required to generate $S$ when $n^2/N > \beta$ using $X_2$ for $X$ is bounded roughly by
$$\frac{n}{n-1}\cdot\frac{N-1}{N}\left(d_2 + d_3 + 6d_4\,\frac{N-1}{n(n-1)}\right) + d_5. \tag{6-6}$$
(The proof that Step D4 takes $\approx 6d_4(N-1)/(n(n-1))$ time to generate each $S$ requires intricate approximations.) The bounds (6-5) and (6-6) are equal when $n^2/N \approx \beta$, for some constant $1 \le \beta \le \sqrt{3}$. For simplicity, let us assume that $\beta \approx 1$ (which means that $d_4 \ll d_2 + d_3 + d_5$). By an informal argument similar to the one at the end of the last section, we can show that the running time of Algorithm D is reduced to
$$T(n, N) \approx \begin{cases} d_1 + n\left(1 + \dfrac{n}{N}\right)\left(d_2 + d_3 + d_4\,\dfrac{n^2}{N}\right) + d_5 n, & \text{if } n^2/N \le \beta,\ n < \alpha N;\\[1ex] d_1 + n\left(1 + \dfrac{1 + \ln(n^2/N)}{n}\right)(d_2 + d_3) + d_4\left(\dfrac{N}{n} + 1 + 6\,\dfrac{N}{n}\ln\dfrac{n^2}{N}\right) + d_5 n, & \text{if } n^2/N > \beta,\ n < \alpha N;\\[1ex] d_1 + d'N + d''n, & \text{if } n \ge \alpha N. \end{cases} \tag{6-7}$$

7. EMPIRICAL COMPARISONS
Algorithms S, A, C, and D have been implemented in FORTRAN 77 on an IBM 3081 mainframe computer in
TABLE II1: Average CPU Times (IBM 3081) a ra n d o m s a m p le o f in te g e rs is g e n e ra te db y tru n c a tin g


Average Execution Time e a c h e le m e n t in a ra n d o m s a m p le o f cn u n ifo rm re a l
Algorithm (microseconds) n u m b e rs in th e ra n g e [0, N + 1), for s o m e c o n s ta n t c >
1; th e re a l n u m b e rs c a n b e g e n e ra te ds e q u e n tia lly b y
S =17N th e a lg o rith m in [1]. If th e re s u ltin g s a m p le o f tru n c a te d
A =4N
re a l n u m b e rs c o n ta in s m ~ n d is tin c t in te g e rs , th e n
C =8n 2
D =55n
Alg o rith m S (or b e tte r ye t, Alg o rith m A) is a p p lie d to
th e s a m p le o f s ize m to p ro d u c e th e fina l s a m p le o f s ize
n; if m < n, th e n th e firs t pa s s is re p e a te d . Th e p a ra m e -
te r c > 1 is c h o s e n to be a s s m a ll a s pos s ible , b u t la rge
o rd e r to ge t a g o o d id e a o f th e lim it o f th e ir p e rfo rm - e n o u g h to m a ke it ve ry u n like ly th a t th e firs t pa s s m u s t
a n c e . Th e F O R TR AN im p le m e n ta tio n s a re d ire c t tra n s - b e re p e a te d ;th e o p tim u m va lu e o f c c a n b e d e te rm in e d
la tio n s o f th e P a s c a l-like ve rs io n s g ive n in th e Ap p e n - for a n y g ive n im p le m e n ta tio n . Du rin g th e firs t pa s s , th e
dix. Th e a ve ra g e C P U tim e s a re lis te d in Ta b le III. m d is tin c t in te g e rs a re s to re d in a n a rra y o r lin ke d lis t,
F o r e xa m p le , for th e c a s e n = 103, N = 108, th e C P U wh ic h re q u ire s s p a c e for O(m ) p o in te rs ; h o we ve r, th is
tim e s we re 0.5 h o u rs for Alg o rith m S , 6.3 m in u te s fo r s tora ge re q u ire m e n t c a n b e a vo id e d if th e ra n d o m
Alg o rith m A, 8.3 s e c o n d sfo r Alg o rith m C, a n d 0.052 n u m b e r g e n e ra to rc a n b e re -s e e d e dfor th e s e c o n d pa s s ,
s e c o n d sfo r Alg o rith m D. Th e im p le m e n ta tio n o f Algo- s o th a t th e p ro g ra m c a n re g e n e ra teth e in te g e rs o n th e
rith m D th a t us e s X1 for X is u s u a lly fa s te r th a n th e fly. Wh e n re -s e e d in gis d o n e , a s s u m in g th a t th e firs t
ve rs io n th a t us e s X2 for X, s in c e th e la s t m o d ific a tio n in pa s s d o e s n o t h a ve to b e re p e a te d ,th e p ro g ra m re q u ire s
S e c tio n 5 c a u s e s th e n u m b e r o f e xp o n e n tia tio n o p e ra - m + cn ra n d o m n u m b e r g e n e ra tio n s a n d th e e q u iva le n t
tio n s to b e re d u c e d to ro u g h ly n wh e n X~ is u s e d , b u t to o f a b o u t 2cn e xp o n e n tia tio n o p e ra tio n s . F o r m a xim u m
o n ly a b o u t 1.5n wh e n X2 is u s e d . Wh e n X~ is u s e d fo r X, e ffic ie n c y, two d iffe re n t ra n d o m n u m b e r g e n e ra to rsa re
th e m o d ific a tio n s d is c u s s e din S e ctic~n 5 c u t th e C P U re q u ire d in th e s e c o n d pa s s :o n e for re g e n e ra tin gth e
tim e for Alg o rith m D to ro u g h ly h a lf o f wh a t it wo u ld re a l n u m b e rs a n d th e o th e r for Alg o rith m S o r A. Th e
be o th e rwis e . s e c o n d pa s s c a n b e d o n e with o n ly o n e ra n d o m n u m b e r
Th e s e tim in g s give a g o o d lo we r b o u n d o n h o w fa s t g e n e ra to r,if d u rin g th e firs t pa s s 2cn - 1 ra n d o m va r-
th e s e a lg o rith m s ru n in p rd c tic e a n d s h o w th e re la tive ia te s a re g e n e ra te din s te a d o f cn, with o n ly e ve ry o th e r
s p e e d so f th e a lg o rith m s . O n a s m a lle r c o m p u te r, th e ra n d o m va ria te u s e d a n d th e o th e r h a lf ig n o re d . FOR-
ru n n in g tim e s c a n b e e xp e c te d to b e m u c h longe r. TR AN 77 im p le m e n ta tio n s o f Be n tle y's m e th o d (us ing
Alg o rith m A a n d two ra n d o m n u m b e r g e n e ra to rsfor
th e s e c o n d pa s s ) o n a n IBM 3081 m a in fra m e ru n in
8. CONCLUS IONS AND FUTURE WORK a p p ro xim a te ly 105n m ic ro s e c o n d s .Th e a m o u n t o f c o d e
We have presented several new algorithms for sequential random sampling of n records from a file containing N records. Each algorithm does the sampling with a small constant amount of space. Their performance is summarized in Table I, and empirical timings are shown in Table III. Pascal-like implementations of several of the algorithms are given in the Appendix.

The main result of this paper is the design and analysis of Algorithm D, which runs in O(n) time, on the average; it requires the generation of approximately n uniform random variates and the computation of roughly n exponentiation operations. The inner loop of Algorithm D that generates S gives an optimum average-time solution to the open problem listed in Exercise 3.4.2-8 of [6]. Algorithm D is very efficient and simple to implement, so it is ideally suited for computer implementation.

There are a couple of other interesting methods that have been developed independently. The online sequential algorithms in [5] use a complicated version of the rejection-acceptance method, which does not run in O(n) time. Preliminary analysis indicates that the algorithms run in O(n + N/n) time; they are linear in n only when n is not too small, but not too large. For small or large n, Algorithm D should be much faster.

J. L. Bentley (personal communication, 1983) has proposed a clever two-pass method that is not online, but does run in O(n) time, on the average. In the first pass, ... is comparable to the implementations of Algorithm D in the Appendix.

Empirical study indicates that round-off error is insignificant in the algorithms in this paper. The random variates S generated by Algorithm D pass the standard statistical tests. It is shown in [1] that the rule (4-4) for generating X1 works well numerically. Since one of the ways Algorithm D generates S is by first generating X1, it is not surprising that the generated S values are also valid statistically.

The ideas in this paper have other applications as well. Research is currently underway to see if the rejection technique used in Algorithm D can be extended to generate the kth record of a random sample of size n from a pool of N records in constant time, on the average. The generation of S(n, N) in Algorithm D handles the special case k = 1; iterating the process as in Algorithm D generates the index of the kth record in O(k) time. The distribution of the index of the kth record is an example of the negative hypergeometric distribution. One possible approach to generating the index in constant time is to approximate the negative hypergeometric distribution by the beta distribution with parameters a = k and b = n - k + 1, normalized to the interval [0, N]. An alternate approximation is the negative binomial distribution. Possibly the rejection technique combined with a partitioning approach can give the desired result.
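The beta approximation suggested above can be tried numerically. The Python sketch below is only an illustration and is not a method from this paper. The function names are hypothetical. kth_index_beta scales a Beta(k, n - k + 1) variate, drawn with the standard library's random.betavariate, to [0, N) and truncates it. kth_index_exact draws a whole sample with random.sample and serves as a slow but exact reference. Records are numbered from 0. The beta law is continuous, so the approximation should be most accurate when N is much larger than n.

```python
import random


def kth_index_beta(k: int, n: int, N: int, rng: random.Random) -> int:
    """Approximate 0-based index of the kth selected record: N * Beta(k, n - k + 1), truncated."""
    x = N * rng.betavariate(k, n - k + 1)
    return min(int(x), N - 1)            # keep the result inside 0..N-1


def kth_index_exact(k: int, n: int, N: int, rng: random.Random) -> int:
    """Reference value: draw the whole sample and take its kth smallest index."""
    return sorted(rng.sample(range(N), n))[k - 1]
```

Take k = 10, n = 50 and N = 10000, and average a few thousand draws from each function. Both averages should come out near k(N + 1)/(n + 1) - 1, which is about 1960. That value is the mean 0-based index of the kth record.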

July 1984 Volume 27 Number 7 Communications of the ACM 713



When the number N of records in the file is not known a priori and when reading the file more than once is not allowed or desired, none of the algorithms mentioned in this paper can be used. One way to sample when N is unknown beforehand is the Reservoir Sampling Method, due to A. G. Waterman, which is listed as Algorithm R in [6]. It requires N uniform random variates and runs in O(N) time. In [9, 10], the rejection technique is applied to yield a much faster algorithm that requires an average of only O(n + n ln(N/n)) uniform random variates and O(n + n ln(N/n)) time.
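For comparison with the sequential methods, the reservoir idea can be sketched in a few lines of Python. This is a minimal sketch of a standard formulation of reservoir sampling, not code from this paper or from [6]. The function name reservoir_sample and its rng parameter are illustrative. It makes a single pass, draws one random integer per record after the first n, and so takes O(N) time. Unlike the sequential algorithms in this paper, it does not return the sample in file order.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Iterable[T], n: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Draw n records uniformly without replacement in one pass; N need not be known."""
    if rng is None:
        rng = random.Random()
    reservoir: List[T] = []
    for t, record in enumerate(records):   # t records have been seen before this one
        if t < n:
            reservoir.append(record)       # the first n records fill the reservoir
        else:
            j = rng.randrange(t + 1)       # uniform integer in 0..t
            if j < n:                      # happens with probability n / (t + 1)
                reservoir[j] = record      # evict a uniformly chosen resident
    return reservoir
```

For example, reservoir_sample(range(1000), 10, random.Random(1)) returns 10 distinct values from 0..999. If the input has fewer than n records, all of them are returned.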

while n > 0 do
  begin
    if N × RANDOM( ) < n then
      begin
        Select the next record in the file for the sample;
        n := n - 1
      end
    else Skip over the next record (do not include it in the sample);
    N := N - 1
  end;

ALGORITHM S: All variables have type integer.
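Algorithm S can be transcribed almost line for line into Python. The sketch below assumes that the file is modelled simply as record indices 0, 1, ..., N - 1. It also assumes that random.Random.random(), which returns values in [0, 1), stands in for RANDOM. The name algorithm_s is illustrative. A record is selected exactly when N × RANDOM( ) < n. Given the current n and N, each record is therefore chosen with probability n/N, and the selected indices come out in file order.

```python
import random
from typing import List, Optional


def algorithm_s(n: int, N: int, rng: Optional[random.Random] = None) -> List[int]:
    """Selection sampling: 0-based indices of n records chosen from N, in file order."""
    if not 0 <= n <= N:
        raise ValueError("need 0 <= n <= N")
    if rng is None:
        rng = random.Random()
    chosen: List[int] = []
    index = 0                        # position of the next record in the file
    while n > 0:
        if N * rng.random() < n:     # select with probability n / N
            chosen.append(index)
            n -= 1
        # otherwise the record is skipped
        N -= 1                       # one fewer record left to consider
        index += 1
    return chosen
```

For example, algorithm_s(3, 10, random.Random(0)) returns three increasing indices in 0..9. When n = N, every record is selected.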

top := N - orig_n;
for n := orig_n downto 2 do
  begin
    { Step A1 }
    V := RANDOM( );
    { Step A2 }
    S := 0;
    quot := top / N;
    while quot > V do
      begin
        S := S + 1;
        top := top - 1;
        N := N - 1;
        quot := quot × top / N
      end;
    { Step A3 }
    Skip over the next S records and select the following one for the sample;
    N := N - 1
  end;
{ Special case n = 1 }
S := TRUNC(N × RANDOM( ));
Skip over the next S records and select the following one for the sample;

ALGORITHM A: The variables V and quot have type real. All other variables have type integer.
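A Python sketch of Algorithm A follows. It returns the skip values S instead of reading a file. A small helper converts the skips into 0-based record indices. Both function names are illustrative. The variable n_real plays the role of the real-valued N in the listing, and random.Random.random() stands in for RANDOM. The inner while loop is Step A2. There, quot is the probability that the skip exceeds the current S, and the loop stops at the first S whose value of quot does not exceed V. The work per selection is therefore proportional to S + 1, which is why Algorithm A takes O(N) time overall.

```python
import random
from typing import List, Optional


def algorithm_a_skips(n: int, N: int, rng: Optional[random.Random] = None) -> List[int]:
    """Return the n skips of Algorithm A: S records are passed over before each selection."""
    if not 1 <= n <= N:
        raise ValueError("need 1 <= n <= N")
    if rng is None:
        rng = random.Random()
    skips: List[int] = []
    top = N - n
    n_real = float(N)
    while n >= 2:
        v = rng.random()                 # Step A1
        s = 0                            # Step A2: quot = Prob(skip > s)
        quot = top / n_real
        while quot > v:
            s += 1
            top -= 1
            n_real -= 1.0
            quot = quot * top / n_real
        skips.append(s)                  # Step A3: skip s records, select the next one
        n_real -= 1.0
        n -= 1
    skips.append(int(n_real * rng.random()))   # special case n = 1
    return skips


def skips_to_indices(skips: List[int]) -> List[int]:
    """Turn skip counts into 0-based indices of the selected records."""
    indices, position = [], -1
    for s in skips:
        position += s + 1
        indices.append(position)
    return indices
```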

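Algorithm C, whose Pascal-like listing follows, computes each skip in Steps C1 and C2. It draws the n independent variates mult × RANDOM( ) for mult = N, N - 1, ..., N - n + 1 and takes S as the floor of their minimum. The Python sketch below mirrors that listing. It returns skips rather than reading a file, random.Random.random() stands in for RANDOM, and the function names are illustrative. Each selection draws as many uniform variates as there are records still to be chosen, so the total work grows quadratically in n.

```python
import random
from typing import List, Optional


def algorithm_c_skips(n: int, N: int, rng: Optional[random.Random] = None) -> List[int]:
    """Return the n skips of Algorithm C (S records passed over before each selection)."""
    if not 1 <= n <= N:
        raise ValueError("need 1 <= n <= N")
    if rng is None:
        rng = random.Random()
    skips: List[int] = []
    limit = N - n + 1
    while n >= 2:
        # Steps C1 and C2: minimum of mult * U over mult = N, N-1, ..., limit
        min_x = float(limit)
        for mult in range(N, limit - 1, -1):
            x = mult * rng.random()
            if x < min_x:
                min_x = x
        s = int(min_x)
        skips.append(s)                  # Step C3: skip s records, select the next one
        N = N - s - 1
        limit = limit - s                # keeps limit == N - n + 1 for the new N and n - 1
        n -= 1
    skips.append(int(N * rng.random()))  # special case n = 1
    return skips


def skips_to_indices(skips: List[int]) -> List[int]:
    """Turn skip counts into 0-based indices of the selected records."""
    indices, position = [], -1
    for s in skips:
        position += s + 1
        indices.append(position)
    return indices
```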
limit := N - orig_n + 1;
for n := orig_n downto 2 do
  begin
    { Steps C1 and C2 }
    min_X := limit;
    for mult := N downto limit do
      begin
        X := mult × RANDOM( );
        if X < min_X then min_X := X
      end;
    S := TRUNC(min_X);
    { Step C3 }
    Skip over the next S records and select the following one for the sample;
    N := N - S - 1;
    limit := limit - S
  end;
{ Special case n = 1 }
S := TRUNC(N × RANDOM( ));
Skip over the next S records and select the following one for the sample;

ALGORITHM C: The variables X and min_X have type real. All other variables have type integer.

APPENDIX

This section gives Pascal-like implementations of Algorithms S, A, C, and D. The FORTRAN programs used in Section 7 for the CPU timings are direct translations of the programs in this section.

Two implementations of Algorithm D are given: the first uses X1 for X, and the second uses X2 for X. The first implementation is recommended for general use. These two programs use a non-standard Pascal construct for looping. The statements within the loop appear between the reserved words loop and end loop; the execution of the statement break loop causes the flow of control to exit the current innermost loop.

Liberties have been taken with the syntax of identifier names, for the sake of readability. The × symbol is used for multiplication. Parentheses are used to enclose null arguments in calls to functions (like RANDOM) that have no parameters. Variables of type real should be double precision so that round-off error will be insignificant, even when N is very large. Roughly log10 N digits of precision will suffice. Care should be taken to assure that intermediate calculations are done in full precision. Variables of type integer should be able to store numbers up to value N.

The code for the random number generator RANDOM is not included. For the CPU timings in Section 7, we used a machine-independent version of the linear congruential method, similar to the one given in [8]. The function RANDOM takes no arguments and returns a double-precision uniform random variate in the interval [0, 1). Both implementations of Algorithm D assume that the range of RANDOM is restricted to the open interval (0, 1). This restriction can be lifted for the first implementation of Algorithm D with a couple of simple modifications, which will be described later.

Algorithm D

Two implementations are given for Algorithm D below: X1 is used for X in the first, and X2 is used for X in the second. The optimizations discussed in Section 5 are used. The first implementation given below is preferred and is recommended for all ranges of n and N; the second implementation will work well also, but is slightly slower for the reasons given in Section 7, especially when n is small. The range for the random number function RANDOM is assumed to be the open interval (0, 1).

As explained in Sections 4 and 5, there is a constant α that determines which of Algorithms D and A should be used for the sampling: if n < αN, then the rejection technique is faster; otherwise, Algorithm A should be used. This optimization guards against "worst-case" behavior that occurs when n = N and when X1 is used for X, as explained in Section 5. The value of α is typically in the range 0.05-0.15. For the IBM 3081 implementation discussed in Section 7, we have α ≈ 0.07. Both implementations of Algorithm D use an integer constant alpha_inverse > 1 (which is initialized to 1/α) and an integer variable threshold (which is always equal to alpha_inverse × n).

Sections 4 and 6 mention that there is a constant β such that if n²/N ≤ β, then it is better to use X1, c1, g1(x), and h1(s) in Algorithm D; otherwise, X2, c2, g2(s), and h2(s) should be used. The value of β for the IBM 3081 implementation discussed in Section 7 is β ≈ 50. If maximum efficiency is absolutely necessary, it is recommended that the two programs be combined: X2 should be used for X until the condition n²/N ≤ β becomes true, after which X1 should be used for X. There should be no need to continue testing the condition once it becomes true.
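The two tuning rules above can be summarized as a small decision function. This sketch is only an illustration. The function name, the returned labels and the default constants are assumptions. The defaults are alpha_inverse = 14, roughly 1/0.07, and β = 50, both taken from the IBM 3081 figures quoted above, and they would need to be re-measured on another machine. The test threshold < N, with threshold = alpha_inverse × n, is the integer form of the condition n < αN used by both listings of Algorithm D.

```python
def choose_method(n: int, N: int, alpha_inverse: int = 14, beta: float = 50.0) -> str:
    """Report which generator the Appendix's rules would use for the current n and N."""
    if n <= 1:
        return "special case n = 1"
    threshold = alpha_inverse * n        # as in the listings: alpha_inverse x n
    if threshold >= N:                   # n >= alpha * N: Algorithm A is faster
        return "Algorithm A"
    if n * n / N <= beta:                # n^2 / N <= beta: X1 is the better choice
        return "Algorithm D with X1"
    return "Algorithm D with X2"
```

Consider n = 5000 and N = 100000. The threshold 70000 is below N, but n²/N = 250 exceeds 50. A combined program would therefore start with X2 and switch to X1 once n²/N drops to 50 or below.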


V_prime := EXP(LOG(RANDOM( ))/n);
quant1 := N - n + 1; quant2 := quant1/N;
threshold := alpha_inverse × n;
while (n > 1) and (threshold < N) do
  begin
    loop
      { Step D2: Generate U and X }
      loop
        X := N × (1.0 - V_prime);
        S := TRUNC(X);
        if S < quant1 then break loop;
        V_prime := EXP(LOG(RANDOM( ))/n)
      end loop;
      y := RANDOM( )/quant2;  { U is the value returned by RANDOM }
      { Step D3: Accept? }
      LHS := EXP(LOG(y)/(n - 1));
      RHS := ((quant1 - S)/quant1) × (N/(N - X));
      if LHS < RHS then
        begin { Accept S, since U < h(⌊X⌋)/cg(X) }
          V_prime := LHS/RHS;
          break loop
        end;
      { Step D4: Accept? }
      if n - 1 > S then
        begin bottom := N - n; limit := N - S end
      else begin bottom := N - S - 1; limit := quant1 end;
      for top := N - 1 downto limit do
        begin y := y × top/bottom; bottom := bottom - 1 end;
      if EXP(LOG(y)/(n - 1)) < N/(N - X) then
        begin { Accept S, since U < f(⌊X⌋)/cg(X) }
          V_prime := EXP(LOG(RANDOM( ))/(n - 1));
          break loop
        end;
      V_prime := EXP(LOG(RANDOM( ))/n)
    end loop;
    { Step D5: Select the (S + 1)st record }
    Skip over the next S records and select the following one for the sample;
    N := N - S - 1; n := n - 1;
    quant1 := quant1 - S; quant2 := quant1/N;
    threshold := threshold - alpha_inverse
  end;
if n > 1 then Call Algorithm A to finish the sampling
else begin { Special case n = 1 }
  S := TRUNC(N × V_prime);
  Skip over the next S records and select the following one for the sample
end;

ALGORITHM D: Using X1 for X.
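The X1 listing translates almost line for line into Python. The sketch below is an illustrative transcription, not the author's FORTRAN or Pascal code, and it rests on a few assumptions:
- It returns the skips S instead of reading a file; skips_to_indices converts them to 0-based record numbers.
- The helper _uniform_open rejects 0.0, so LOG is applied only to variates in the open interval (0, 1), as the listing requires.
- Algorithm A is included as the fallback used once threshold ≥ N.
- alpha_inverse defaults to 14, roughly 1/0.07.

Step D3 accepts S cheaply when U < h(⌊X⌋)/cg(X). Otherwise Step D4 makes the exact comparison with f(⌊X⌋)/cg(X), using the ratio of falling factorials accumulated in y.

```python
import math
import random
from typing import List, Optional


def _uniform_open(rng: random.Random) -> float:
    """Uniform variate on the open interval (0, 1), which Algorithm D assumes."""
    u = rng.random()
    while u == 0.0:              # random() can return 0.0, and LOG(0) is undefined
        u = rng.random()
    return u


def _algorithm_a(n: int, N: int, rng: random.Random) -> List[int]:
    """Algorithm A, used to finish the sampling once threshold >= N."""
    skips: List[int] = []
    top, n_real = N - n, float(N)
    while n >= 2:
        v, s = rng.random(), 0
        quot = top / n_real
        while quot > v:
            s, top, n_real = s + 1, top - 1, n_real - 1.0
            quot = quot * top / n_real
        skips.append(s)
        n_real, n = n_real - 1.0, n - 1
    skips.append(int(n_real * rng.random()))
    return skips


def algorithm_d_x1(n: int, N: int, rng: Optional[random.Random] = None,
                   alpha_inverse: int = 14) -> List[int]:
    """Skips S for sampling n of N records with Algorithm D, using X1 for X."""
    if not 1 <= n <= N:
        raise ValueError("need 1 <= n <= N")
    if rng is None:
        rng = random.Random()
    skips: List[int] = []
    v_prime = math.exp(math.log(_uniform_open(rng)) / n)    # nth root of a uniform
    quant1 = N - n + 1
    quant2 = quant1 / N
    threshold = alpha_inverse * n
    while n > 1 and threshold < N:
        while True:
            while True:                                      # Step D2
                x = N * (1.0 - v_prime)
                s = int(x)
                if s < quant1:
                    break
                v_prime = math.exp(math.log(_uniform_open(rng)) / n)
            y = _uniform_open(rng) / quant2                  # U / quant2
            lhs = math.exp(math.log(y) / (n - 1))            # Step D3
            rhs = ((quant1 - s) / quant1) * (N / (N - x))
            if lhs < rhs:
                v_prime = lhs / rhs                           # reuse U for the next X
                break
            if n - 1 > s:                                     # Step D4
                bottom, limit = N - n, N - s
            else:
                bottom, limit = N - s - 1, quant1
            for top in range(N - 1, limit - 1, -1):
                y = y * top / bottom
                bottom -= 1
            if math.exp(math.log(y) / (n - 1)) < N / (N - x):
                v_prime = math.exp(math.log(_uniform_open(rng)) / (n - 1))
                break
            v_prime = math.exp(math.log(_uniform_open(rng)) / n)
        skips.append(s)                                       # Step D5
        N, n = N - s - 1, n - 1
        quant1 -= s
        quant2 = quant1 / N
        threshold -= alpha_inverse
    if n > 1:
        skips.extend(_algorithm_a(n, N, rng))
    else:                                                     # special case n = 1
        skips.append(min(int(N * v_prime), N - 1))            # guard against rounding up to N
    return skips


def skips_to_indices(skips: List[int]) -> List[int]:
    """0-based indices of the selected records."""
    indices, position = [], -1
    for s in skips:
        position += s + 1
        indices.append(position)
    return indices
```

For example, skips_to_indices(algorithm_d_x1(5, 10**6, random.Random(1))) returns five increasing record numbers below 10**6.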


V_prime := LOG(RANDOM( ));
quant1 := N - n + 1;
threshold := alpha_inverse × n;
while (n > 1) and (threshold < N) do
  begin
    quant2 := (quant1 - 1)/(N - 1); quant3 := LOG(quant2);
    loop
      { Step D2: Generate U and X }
      loop
        S := TRUNC(V_prime/quant3);  { X is equal to S }
        if S < quant1 then break loop;
        V_prime := LOG(RANDOM( ))
      end loop;
      LHS := LOG(RANDOM( ));  { U is the value returned by RANDOM }
      { Step D3: Accept? }
      RHS := S × (LOG((quant1 - S)/(N - S)) - quant3);
      if LHS < RHS then
        begin { Accept S, since U < h(⌊X⌋)/cg(X) }
          V_prime := LHS - RHS;
          break loop
        end;
      { Step D4: Accept? }
      y := 1.0;
      if n - 1 > S then
        begin bottom := N - n; limit := N - S end
      else begin bottom := N - S - 1; limit := quant1 end;
      for top := N - 1 downto limit do
        begin y := y × top/bottom; bottom := bottom - 1 end;
      V_prime := LOG(RANDOM( ));
      if quant3 < -(LOG(y) + LHS)/S then
        break loop  { Accept S, since U < f(⌊X⌋)/cg(X) }
    end loop;
    { Step D5: Select the (S + 1)st record }
    Skip over the next S records and select the following one for the sample;
    N := N - S - 1; n := n - 1;
    quant1 := quant1 - S;
    threshold := threshold - alpha_inverse
  end;
if n > 1 then Call Algorithm A to finish the sampling
else begin { Special case n = 1 }
  S := TRUNC(N × RANDOM( ));
  Skip over the next S records and select the following one for the sample
end;

ALGORITHM D: Using X2 for X.
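The X2 listing can be transcribed in the same way. This Python sketch is again an illustrative transcription rather than the author's code. V_prime holds the natural logarithm of a uniform variate, and S = TRUNC(V_prime/quant3) is generated directly. The acceptance tests are done on a logarithmic scale, as in Steps D3 and D4 of the listing. The helper names, the Algorithm A fallback and the default alpha_inverse = 14 are assumptions of the sketch. The division by S in Step D4 is safe: when S = 0, Step D3 has RHS = 0 and LHS = ln U < 0, so S = 0 is always accepted before Step D4 is reached.

```python
import math
import random
from typing import List, Optional


def _uniform_open(rng: random.Random) -> float:
    """Uniform variate on the open interval (0, 1)."""
    u = rng.random()
    while u == 0.0:              # LOG(0) is undefined
        u = rng.random()
    return u


def _algorithm_a(n: int, N: int, rng: random.Random) -> List[int]:
    """Algorithm A, used to finish the sampling once threshold >= N."""
    skips: List[int] = []
    top, n_real = N - n, float(N)
    while n >= 2:
        v, s = rng.random(), 0
        quot = top / n_real
        while quot > v:
            s, top, n_real = s + 1, top - 1, n_real - 1.0
            quot = quot * top / n_real
        skips.append(s)
        n_real, n = n_real - 1.0, n - 1
    skips.append(int(n_real * rng.random()))
    return skips


def algorithm_d_x2(n: int, N: int, rng: Optional[random.Random] = None,
                   alpha_inverse: int = 14) -> List[int]:
    """Skips S for sampling n of N records with Algorithm D, using X2 for X."""
    if not 1 <= n <= N:
        raise ValueError("need 1 <= n <= N")
    if rng is None:
        rng = random.Random()
    skips: List[int] = []
    v_prime = math.log(_uniform_open(rng))                 # natural log of a uniform
    quant1 = N - n + 1
    threshold = alpha_inverse * n
    while n > 1 and threshold < N:
        quant2 = (quant1 - 1) / (N - 1)                     # (N - n) / (N - 1)
        quant3 = math.log(quant2)
        while True:
            while True:                                     # Step D2: here X equals S
                s = int(v_prime / quant3)
                if s < quant1:
                    break
                v_prime = math.log(_uniform_open(rng))
            lhs = math.log(_uniform_open(rng))               # ln U
            rhs = s * (math.log((quant1 - s) / (N - s)) - quant3)   # Step D3
            if lhs < rhs:
                v_prime = lhs - rhs
                break
            y = 1.0                                          # Step D4
            if n - 1 > s:
                bottom, limit = N - n, N - s
            else:
                bottom, limit = N - s - 1, quant1
            for top in range(N - 1, limit - 1, -1):
                y = y * top / bottom
                bottom -= 1
            v_prime = math.log(_uniform_open(rng))
            if quant3 < -(math.log(y) + lhs) / s:          # s >= 1 here
                break
        skips.append(s)                                      # Step D5
        N, n = N - s - 1, n - 1
        quant1 -= s
        threshold -= alpha_inverse
    if n > 1:
        skips.extend(_algorithm_a(n, N, rng))
    else:                                                    # special case n = 1
        skips.append(int(N * rng.random()))
    return skips


def skips_to_indices(skips: List[int]) -> List[int]:
    """0-based indices of the selected records."""
    indices, position = [], -1
    for s in skips:
        position += s + 1
        indices.append(position)
    return indices
```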

Using X1 for X
The variables U, X, V_prime, LHS, RHS, y, and quant2 have type real. The other variables have type integer. The program above can be modified to allow RANDOM to return the value 0.0 by replacing all expressions of the form EXP(LOG(a)/b) by a^(1/b).

The variable V_prime (which is used to generate X) is always set to the nth root of a uniform random variate, for the current value of n. The variables quant1, quant2, and threshold equal N - n + 1, (N - n + 1)/N, and alpha_inverse × n, respectively, for the current values of N and n.

Using X2 for X
The variables U, V_prime, LHS, RHS, y, quant2, and quant3 have type real. The other variables have type integer. Let x > 0 be the smallest possible number returned by RANDOM. The integer variable S must be large enough to store -(log10 x)N.

The variable V_prime (which is used to generate X) is always set to the natural logarithm of a uniform random variate. The variables quant1, quant2, quant3, and threshold equal N - n + 1, (N - n)/(N - 1), ln((N - n)/(N - 1)), and alpha_inverse × n, for the current values of N and n.

Acknowledgments. The author would like to thank Phil Heidelberger for interesting discussions on ways to reduce the number of random variates generated in Algorithm D from two per loop to one per loop. Thanks also go to the two anonymous referees for their helpful comments.

REFERENCES
1. Bentley, J.L. and Saxe, J.B. Generating sorted lists of random numbers. ACM Trans. Math. Softw. 6, 3 (Sept. 1980), 359-364.
2. Ernvall, J. and Nevalainen, O. An algorithm for unbiased random sampling. Comput. J. 25, 1 (January 1982), 45-47.
3. Fan, C.T., Muller, M.E., and Rezucha, I. Development of sampling plans by using sequential (item-by-item) selection techniques and digital computers. Am. Stat. Assn. J. 57 (June 1962), 387-402.
4. Jones, T.G. A note on sampling a tape file. Commun. ACM 5, 6 (June 1962), 343.
5. Kawarasaki, J. and Sibuya, M. Random numbers for simple random sampling without replacement. Keio Math. Sem. Rep. No. 7 (1982), 1-9.
6. Knuth, D.E. The Art of Computer Programming, Vol. 2, Seminumerical Algorithms. Addison-Wesley, Reading, MA (second edition, 1981).
7. Lindstrom, E.E. and Vitter, J.S. The design and analysis of BucketSort for bubble memory secondary storage. Tech. Rep. CS-83-23, Brown University, Providence, RI (September 1983). See also U.S. Patent Application Provisional Serial No. 500741 (filed June 3, 1983).
8. Sedgewick, R. Algorithms. Addison-Wesley, Reading, MA (1983).
9. Vitter, J.S. Random sampling with a reservoir. Tech. Rep. CS-83-17, Brown University, Providence, RI (July 1983).
10. Vitter, J.S. Optimum algorithms for two random sampling problems. In Proceedings of the 24th IEEE Symposium on Foundations of Computer Science, Tucson, AZ (November 1983), 65-75.

CR Categories and Subject Descriptors: G.3 [Mathematics of Computing]: Probability and Statistics--probabilistic algorithms, random number generation, statistical software; G.4 [Mathematics of Computing]: Mathematical Software--algorithm analysis
General Terms: Algorithms, Design, Performance, Theory
Additional Key Words and Phrases: random sampling, analysis of algorithms, rejection method, optimization

Received 8/82; revised 12/83; accepted 2/84

Author's Present Address: Jeffrey S. Vitter, Assistant Professor of Computer Science, Department of Computer Science, Box 1910, Brown University, Providence, RI 02912; jsv.brown @ CSNet-Relay

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

CORRIGENDUM: Human Aspects of Computing

Izak Benbasat and Yair Wand. Command abbreviation behavior in human-computer interaction. Commun. ACM 27, 4 (Apr. 1984), 376-383. Page 380: Table II should read:

TABLE II. Data on Abbreviation Behavior*

Chars in  Command    Times   Percent distribution of chars used      Average  Weighted avg
command   name        used    1    2    3    4    5    6    7    8     chars     for group
4         VARY          86    5    7    3   85                          3.69
4         RUSH         280    5   20    4   71                          3.41
4         SORT           5    0    0    0  100                          4.00
4         HELP          25    0    0    0  100                          4.00
4         EXIT          12    0   25    0   75                          3.50
4         STOP           3    0    0    0  100                          4.00          3.52
5         POINT        442   27    4   17    1   51                     3.46
5         ORDER         27    7    0    7    0   85                     4.56
5         NAMES         28    0    0    7    0   93                     4.86          3.60
6         SELECT        87    0    0   15    0    0   85                5.55
6         REPORT       596    0    0   62    0    0   37                4.09
6         CANCEL        35    0    0   14    0    0   86                5.57          4.34
7         COLUMNS       10    0    0   40    0   20    0   40   --      5.00          5.00
8         QUANTITY     404   40    1   14   17   12    0    0   17      3.45
8         SIMULATE     520    1    0   88    0    0    0    0   11      3.51          3.48
* Excludes users who did not use abbreviations.



The pdf995 suite of products - Pdf995, PdfEdit995, and Signature995 - is a complete solution for your document publishing needs. It
provides ease of use, flexibility in format, and industry-standard security, and all at no cost to you.

Pdf995 makes it easy and affordable to create professional-quality documents in the popular PDF file format. Its easy-to-use interface
helps you to create PDF files by simply selecting the "print" command from any application, creating documents which can be viewed
on any computer with a PDF viewer. Pdf995 supports network file saving, fast user switching on XP, Citrix/Terminal Server, custom
page sizes and large format printing. Pdf995 is a printer driver that works with any Postscript to PDF converter. The pdf995 printer
driver and a free Converter are available for easy download.

PdfEdit995 offers a wealth of additional functionality, such as: combining documents into a single PDF; automatic link insertion;
hierarchical bookmark insertion; PDF conversion to HTML or DOC (text only); integration with Word toolbar with automatic table of
contents and link generation; autoattach to email; stationery and stamping.

Signature995 offers state-of-the-art security and encryption to protect your documents and add digital signatures.

The Pdf995 Suite offers the following features, all at no cost:

• Automatic insertion of embedded links
• Hierarchical Bookmarks
• Support for Digital Signatures
• Support for Triple DES encryption
• Append and Delete PDF Pages
• Batch Print from Microsoft Office
• Asian and Cyrillic fonts
• Integration with Microsoft Word toolbar
• PDF Stationery
• Combining multiple PDFs into a single PDF
• Three auto-name options to bypass Save As dialog
• Imposition of Draft/Confidential stamps
• Support for large format architectural printing
• Convert PDF to JPEG, TIFF, BMP, PCX formats
• Convert PDF to HTML and Word DOC
• Convert PDF to text
• Automatic Table of Contents generation
• Support for XP Fast User Switching and multiple user sessions
• Standard PDF Encryption (restricted printing, modifying, copying text and images)
• Support for Optimized PDF
• Support for custom page sizes
• Option to attach PDFs to email after creation
• Automatic text summarization of PDF documents
• Easy integration with document management and Workflow systems
• n-Up printing
• Automatic page numbering
• Simple Programmers Interface
• Option to automatically display PDFs after creation
• Custom resizing of PDF output
• Configurable Font embedding
• Support for Citrix/Terminal Server
• Support for Windows 2003 Server
• Easy PS to PDF processing
• Specify PDF document properties
• Control PDF opening mode
• Can be configured to add functionality to Acrobat Distiller
• Free: Creates PDFs without annoying watermarks
• Free: Fully functional, not a trial and does not expire

Over 5 million satisfied customers
Over 1000 Enterprise Customers worldwide

Please visit us at www.pdf995.com to learn more.


This document illustrates several features of the Pdf995 Suite of Products.
Introduction
The Virtual Reality Modeling Language (VRML) is a language for describing multi-
participant interactive simulations -- virtual worlds networked via the global Internet and
hyperlinked with the World Wide Web. All aspects of virtual world display, interaction
and internetworking can be specified using VRML. It is the intention of its designers that
VRML become the standard language for interactive simulation within the World Wide
Web.

The first version of VRML allows for the creation of virtual worlds with limited
interactive behavior. These worlds can contain objects which have hyperlinks to other
worlds, HTML documents or other valid MIME types. When the user selects an object
with a hyperlink, the appropriate MIME viewer is launched. When the user selects a link
to a VRML document from within a correctly configured WWW browser, a VRML
viewer is launched. Thus VRML viewers are the perfect companion applications to
standard WWW browsers for navigating and visualizing the Web. Future versions of
VRML will allow for richer behaviors, including animations, motion physics and real-
time multi-user interaction.

This document specifies the features and syntax of Version 1.0 of VRML.

VRML Mission Statement

The history of the development of the Internet has had three distinct phases; first, the
development of the TCP/IP infrastructure which allowed documents and data to be stored
in a proximally independent way; that is, Internet provided a layer of abstraction between
data sets and the hosts which manipulated them. While this abstraction was useful, it was
also confusing; without any clear sense of "what went where", access to Internet was
restricted to the class of sysops/net surfers who could maintain internal cognitive maps of
the data space.

Next, Tim Berners-Lee’s work at CERN, where he developed the hypermedia system
known as World Wide Web, added another layer of abstraction to the existing structure.
This abstraction provided an "addressing" scheme, a unique identifier (the Universal
Resource Locator), which could tell anyone "where to go and how to get there" for any
piece of data within the Web. While useful, it lacked dimensionality; there’s no there
there within the web, and the only type of navigation permissible (other than surfing) is
by direct reference. In other words, I can only tell you how to get to the VRML Forum
home page by saying, "http://www.wired.com/", which is not human-centered data. In
fact, I need to make an effort to remember it at all. So, while the World Wide Web
provides a retrieval mechanism to complement the existing storage mechanism, it leaves
a lot to be desired, particularly for human beings.

Finally, we move to "perceptualized" Internetworks, where the data has been sensualized,
that is, rendered sensually. If something is represented sensually, it is possible to make
sense of it. VRML is an attempt (how successful, only time and effort will tell) to place
humans at the center of the Internet, ordering its universe to our whims. In order to do
that, the most important single element is a standard that defines the particularities of
perception. Virtual Reality Modeling Language is that standard, designed to be a
universal description language for multi-participant simulations.

These three phases, storage, retrieval, and perceptualization are analogous to the human
process of consciousness, as expressed in terms of semantics and cognitive science.
Events occur and are recorded (memory); inferences are drawn from memory
(associations), and from sets of related events, maps of the universe are created (cognitive
perception). What is important to remember is that the map is not the territory, and we
should avoid becoming trapped in any single representation or world-view. Although we
need to design to avoid disorientation, we should always push the envelope in the kinds
of experience we can bring into manifestation!

This document is the living proof of the success of a process that was committed to being
open and flexible, responsive to the needs of a growing Web community. Rather than re-
invent the wheel, we have adapted an existing specification (Open Inventor) as the basis
from which our own work can grow, saving years of design work and perhaps many
mistakes. Now our real work can begin; that of rendering our noospheric space.

History
VRML was conceived in the spring of 1994 at the first annual World Wide Web
Conference in Geneva, Switzerland. Tim Berners-Lee and Dave Raggett organized a
Birds-of-a-Feather (BOF) session to discuss Virtual Reality interfaces to the World Wide
Web. Several BOF attendees described projects already underway to build three
dimensional graphical visualization tools which interoperate with the Web. Attendees
agreed on the need for these tools to have a common language for specifying 3D scene
description and WWW hyperlinks -- an analog of HTML for virtual reality. The term
Virtual Reality Markup Language (VRML) was coined, and the group resolved to begin
specification work after the conference. The word ’Markup’ was later changed to
’Modeling’ to reflect the graphical nature of VRML.
Shortly after the Geneva BOF session, the www-vrml mailing list was created to discuss
the development of a specification for the first version of VRML. The response to the list
invitation was overwhelming: within a week, there were over a thousand members. After
an initial settling-in period, list moderator Mark Pesce of Labyrinth Group announced his
intention to have a draft version of the specification ready by the WWW Fall 1994
conference, a mere five months away. There was general agreement on the list that, while
this schedule was aggressive, it was achievable provided that the requirements for the
first version were not too ambitious and that VRML could be adapted from an existing
solution. The list quickly agreed upon a set of requirements for the first version, and
began a search for technologies which could be adapted to fit the needs of VRML.

The search for existing technologies turned up several worthwhile candidates. After
much deliberation the list came to a consensus: the Open Inventor ASCII File Format
from Silicon Graphics, Inc. The Inventor File Format supports complete descriptions of
3D scenes with polygonally rendered objects, lighting, materials, ambient properties and
realism effects. A subset of the Inventor File Format, with extensions to support
networking, forms the basis of VRML. Gavin Bell of Silicon Graphics has adapted the
Inventor File Format for VRML, with design input from the mailing list. SGI has publicly
stated that the file format is available for use in the open market, and has contributed a
file format parser into the public domain to bootstrap VRML viewer development.

A Graphical Representation of Inverse VRML Uptake
[Chart: inverse usage and inverse log usage plotted against days after download, with series for Programmers, Artists, Technical Writers, Musicians, QA, Politicians, Dentists, and Other.]

Change the number in red below to adjust for download rate and/or bandwidth.

1 The number 1 represents an engineer with an "average" cube *


   CF      Min   fsw    Air  EANx 32%  EANx 36%
 80.0   149.12     0
 61.4   114.43    10
 49.8   92.846    20
 41.9   78.102    30    180
 36.2   67.402    40    120
 31.8   59.275    50   80.0     147.0     192.0
 28.4     52.9    60   57.0      92.0     123.0
 25.6   47.774    70   40.0      65.0      79.0
 23.4   43.543    80   30.0      49.0      59.0
 21.5   40.001    90   24.0      37.0      45.0
 19.9       37   100   19.0      30.0      35.0
 18.5   34.409   110   16.0      25.0      29.0
 17.3   32.154   120   13.0      20.0       n/a
 16.2   30.178   130   10.0      17.0       n/a
 15.1   28.202   140    8.0       n/a       n/a

Adobe Acrobat PDF Files
Adobe® Portable Document Format (PDF) is a universal file format that preserves all
of the fonts, formatting, colours and graphics of any source document, regardless of
the application and platform used to create it.

Adobe PDF is an ideal format for electronic document distribution as it overcomes the
problems commonly encountered with electronic file sharing.

• Anyone, anywhere can open a PDF file. All you need is the free Adobe Acrobat
Reader. Recipients of other file formats sometimes can't open files because they
don't have the applications used to create the documents.

• PDF files always print correctly on any printing device.

• PDF files always display exactly as created, regardless of fonts, software, and
operating systems. Fonts and graphics are not lost due to platform, software, and
version incompatibilities.

• The free Acrobat Reader is easy to download and can be freely distributed by
anyone.

• Compact PDF files are smaller than their source files and download a
page at a time for fast display on the Web.
I am some random PDF file.
