VON NEUMANN'S COMPARISON METHOD FOR RANDOM SAMPLING FROM THE NORMAL AND OTHER DISTRIBUTIONS

BY GEORGE E. FORSYTHE

COMPUTER SCIENCE DEPARTMENT
School of Humanities and Sciences
STANFORD UNIVERSITY
Von Neumann's Comparison Method for Random Sampling from the Normal and Other Distributions
Abstract. The author presents a generalization he worked out in 1950 of von Neumann's method of generating random samples from the exponential distribution by comparisons of uniform random numbers on (0,1). It is shown how to generate samples from any distribution whose probability density function is piecewise both absolutely continuous and monotonic on (-∞, ∞). A special case delivers normal deviates at an average cost of only 4.036 uniform deviates each. This seems more efficient than the Center-Tail method of Dieter and Ahrens, which uses a related, but different, method of generalizing the von Neumann idea to the normal distribution.
This research was supported in part by the Office of Naval Research under Contracts N-00014-67-A-0112-0057 (NR 044-402) and N-00014-67-A-0112-0029 (NR 044-211), and by the National Science Foundation under Grant GJ-992. Reproduction in whole or in part is permitted for any purpose of the United States Government.
1. Introduction.
In 1949, at a symposium on the Monte Carlo method held at the Institute for Numerical Analysis on the campus of the University of California, Los Angeles, John von Neumann [3] lectured on various aspects of generating pseudorandom numbers and variables. At the end he presented an ingenious method for generating a sample from an exponential distribution, based solely on comparisons of uniform deviates. In his last sentence he commented that his "method could be modified to yield a distribution satisfying any first-order differential equation". In 1949 or 1950 I wrote some notes about what I assumed von Neumann had in mind, but I do not recall ever discussing the matter with him. This belated polishing and publication of those notes is stimulated by the appearance of [1] and [2], in which related algorithms are studied, and by a personal discussion with the authors on how the von Neumann idea can be extended.

In Section 2 the general method is presented, and in Section 3 its efficiency is analyzed. In Sections 4 and 5 it is shown how the exponential and normal distributions appear as special cases. In Section 6 the method for a normal distribution is compared with the Center-Tail method of [1] and [2]. In Section 7 possible generalizations are mentioned. Although this introduction has emphasized historical matters, the method of Section 5 is a good one, and is competitive with the best known methods for generating normal deviates. I thank both Professors Ahrens and Dieter for their careful criticism of a first draft of this paper.

2. The general algorithm. Let f(x) > 0 be defined for all x ≥ 0 and satisfy the differential equation
(1)   f'(x) = -b(x) f(x)   (0 ≤ x < ∞),

where b(x) ≥ 0 for all x ≥ 0. Let

(2)   B(x) = ∫_0^x b(t) dt   (0 ≤ x < ∞),

and let

(3)   C = [ ∫_0^∞ e^{-B(t)} dt ]^{-1}.

Then

(4)   f(x) = C e^{-B(x)}

is the unique solution of (1) with ∫_0^∞ f(x) dx = 1, and hence f is the probability density distribution of a nonnegative random variable.

Suppose we have a supply of independent random variables u with a uniform distribution on [0, 1), and that we wish to generate a random variable y with the density distribution f(x). Here is one way to proceed. We first prepare three tables of constants {q_k}, {r_k}, {d_k} for k = 0, 1, ⋯, K, as follows. (K is defined below.) Let q_0 = 0.
For k = 1, 2, ⋯, choose numbers q_k with

(5)   q_k > q_{k-1}   (k = 1, 2, ⋯)

and

(6)   B(q_k) - B(q_{k-1}) ≤ 1   (k = 1, 2, ⋯).

Next, compute

(7)   r_k = ∫_0^{q_k} f(x) dx   (k = 0, 1, ⋯, K).
Here K is chosen as the least index such that r_K exceeds the largest representable number less than 1. (K may be chosen smaller, if one sets r_K ← 1, and if one is willing to truncate the generated variable y to the interval [q_{K-1}, q_K).)
Let

(8)   d_k = q_k - q_{k-1}   (k = 1, 2, ⋯, K)

and

(9)   G_k(x) = B(q_{k-1} + x) - B(q_{k-1})   (k = 1, 2, ⋯, K).
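To fix ideas, the table preparation is easy to mechanize. The following Python sketch is mine, not the paper's; the name prepare_tables and the bisection bound q_max are illustrative assumptions, B is the function of (2), and cdf(x) = ∫_0^x f(t) dt.

    def prepare_tables(B, cdf, q_max=50.0, eps=1e-15):
        # Build the tables {q_k}, {r_k}, {d_k} of (5)-(8).  Each q_k is
        # taken as large as possible subject to (6),
        # B(q_k) - B(q_{k-1}) <= 1, and is located here by bisection.
        q, r, d = [0.0], [cdf(0.0)], [0.0]      # d[0] is a dummy entry
        while r[-1] < 1.0 - eps:
            lo, hi = q[-1], q_max
            for _ in range(100):
                mid = 0.5 * (lo + hi)
                if B(mid) - B(q[-1]) <= 1.0:
                    lo = mid
                else:
                    hi = mid
            q.append(lo)
            d.append(q[-1] - q[-2])
            r.append(cdf(q[-1]))
        r[-1] = 1.0          # the truncation device described above
        return q, r, d

For B(x) = x and cdf(x) = 1 - e^{-x} this yields q_k = k, and for B(x) = x²/2 with the half-normal cdf it yields q_k = (2k)^{1/2}, in agreement with Sections 4 and 5.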
The first three steps of the following algorithm are a preliminary game to determine which interval [q_{k-1}, q_k) the variable y will belong to. The remaining steps determine y within that interval.

1. [Begin the preliminary game.] Set k ← 1. Generate a uniform deviate u on [0, 1).
2. [Test.] If u ≤ r_k, go to step 4.
3. [Increase k.] Set k ← k + 1 and go back to step 2.
4. [Begin a trial.] Generate a uniform deviate w on [0, d_k).
5. Set t ← G_k(w).
6. Generate another uniform deviate u* on [0, 1).
7. [Test.] If u* > t, go to step 11.
8. [Trial continues.] If u* < t, generate another uniform deviate u.
9. [Test.] If u < u*, set t ← u and go back to step 6.
10. [Reject the trial.] If u ≥ u*, go back to step 4.
11. [Finish.] Return y ← q_{k-1} + w as the sample variable.

Since we assume that each u < 1, the test in step 2 must be passed when k = K, if not sooner. Hence an interval [q_{k-1}, q_k) is selected, and the values of r_k were chosen to make the probabilities of choosing the various intervals correct.

Fix k. The remainder of the algorithm can be described as follows: w is selected uniformly from the interval 0 ≤ w < d_k.
Then the algorithm continues to generate independent uniform deviates u_i from [0, 1) until the least n ≥ 1 is found with

(10)   u_1 > G_k(w)   (n = 1),   or   u_n > u_{n-1} < u_{n-2} < ⋯ < u_2 < u_1 < G_k(w)   (n ≥ 2).

If n is odd, w is accepted. If n is even, w is rejected; we then discard all the u_i, choose a new w, and repeat. (With probability 1, the least such n is finite.)
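In code, steps 1-11 together with rule (10) look as follows. This is my sketch, not the paper's; it assumes tables q, r, d as prepared above (with r[K] = 1 and d[0] a dummy) and a function G(k, x) that evaluates G_k(x) of (9).

    import random

    def forsythe_sample(q, r, d, G, rand=random.random):
        # Steps 1-3: preliminary game to choose the interval [q_{k-1}, q_k).
        u = rand()
        k = 1
        while u > r[k]:
            k += 1
        # Steps 4-11: the comparison game within the chosen interval.
        while True:
            w = d[k] * rand()        # step 4: w uniform on [0, d_k)
            t = G(k, w)              # step 5: t <- G_k(w)
            n = 0
            while True:              # steps 6-10: least n with u_n > u_{n-1}
                u_i = rand()
                n += 1
                if u_i > t:
                    break
                t = u_i              # chain still descending; test next deviate
            if n % 2 == 1:           # n odd: accept, by (10)
                return q[k - 1] + w  # step 11
            # n even: reject the trial and draw a fresh w

With q_k = k, d_k = 1, and G(k, x) = x this becomes precisely the exponential game of Section 4.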
We now determine the probability P(k, w) that one w determined in step 4 will be accepted without returning to step 4. Let E_1(k, w) be the universe of all events. For n ≥ 2, let E_n(k, w) be the event

u_{n-1} < u_{n-2} < ⋯ < u_2 < u_1 < G_k(w).

The probability of E_n(k, w) is given by

Prob {E_n(k, w)} = G_k(w)^{n-1} / (n-1)!   (all n).
The occurrence of (10) is the conjunction of E_n(k, w) and not-E_{n+1}(k, w). Since E_{n+1}(k, w) implies E_n(k, w), the probability that (10) occurs for a given n and w is

(11)   Prob {E_n(k, w) and not-E_{n+1}(k, w)} = G_k(w)^{n-1}/(n-1)! - G_k(w)^n/n! .

Since w is accepted exactly when n is odd, summing (11) over odd n gives

(12)   P(k, w) = Σ_{n odd} [ G_k(w)^{n-1}/(n-1)! - G_k(w)^n/n! ] = e^{-G_k(w)} .
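Equation (12) is easy to confirm empirically. The following check (my illustration, not the paper's) plays the comparison chain against a fixed value g and estimates the acceptance probability:

    import math, random

    def accepted(g, rand=random.random):
        # One chain against t = g; True exactly when the stopping n is odd.
        t, n = g, 0
        while True:
            u = rand()
            n += 1
            if u > t:
                return n % 2 == 1
            t = u

    g, trials = 0.7, 100000
    freq = sum(accepted(g) for _ in range(trials)) / trials
    print(freq, math.exp(-g))     # the two values agree to sampling error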
Since 0 ≤ G_k(w) ≤ 1 by (6), we have e^{-1} ≤ P(k, w) ≤ 1. Now the probability that g ≤ w ≤ g + dg is d_k^{-1} dg. Combining this with (12), we see that the probability that g ≤ w ≤ g + dg and that w is accepted is given by

(13)   Prob {g ≤ w ≤ g + dg and w is accepted} = d_k^{-1} e^{-G_k(g)} dg .
Corresponding to an accepted w, the algorithm returns y = q_{k-1} + w as the sample variable. By (13), the probability that y lies in an element dx at x = q_{k-1} + w is proportional to

(14)   d_k^{-1} e^{-G_k(x - q_{k-1})} dx = d_k^{-1} e^{B(q_{k-1})} e^{-B(x)} dx,   by (9),

a constant multiple of e^{-B(x)} dx. Hence y has the desired probability density distribution within the interval [q_{k-1}, q_k). Since, from (13), the probability of an infinite loop back to step 4 is zero, the second half of the algorithm terminates with probability 1. This concludes the demonstration that the algorithm works as claimed.

3. Efficiency of the algorithm. For a general function b, I
shall derive a representation for the expected number of uniformly distributed random variables u that must be used to generate one variable y. A similar derivation appears in [1].
The preliminary game to select k -- steps 1 to 3 of the algorithm -- requires one u.
The rest of the algorithm is different for each k, and we shall first determine the expected number N(k) of uniform deviates used to determine y for a fixed k. To do this, we shall first assume that k is fixed and that w has been picked in the interval 0 ≤ w < d_k. Define E_n(k, w) as in Section 2, and introduce the abbreviations

(16)   e_n = e_n(k, w) = Prob {E_n(k, w)}   (n = 1, 2, ⋯)

and

(17)   g = G_k(w) .
Then, as in Section 2, we have the following expression for the probability P(k, w) of accepting w without returning to step 4:

P(k, w) = (e_1 - e_2) + (e_3 - e_4) + (e_5 - e_6) + ⋯ .

Moreover, given k and w, and given that w is accepted, the expected number of uniform deviates needed (counting w itself) will be

(18)   m_a(k, w) = P(k, w)^{-1} [ 2(e_1 - e_2) + 4(e_3 - e_4) + 6(e_5 - e_6) + ⋯ ]
               = P(k, w)^{-1} Σ_{n odd} (n + 1) [ g^{n-1}/(n-1)! - g^n/n! ] .
Similarly, the probability 1 - P(k, w) that w is rejected is given by

1 - P(k, w) = (e_2 - e_3) + (e_4 - e_5) + (e_6 - e_7) + ⋯ .

Moreover, given k and w, and given that w is rejected, the expected number of uniform deviates needed will be

(19)   m_r(k, w) = [1 - P(k, w)]^{-1} [ 3(e_2 - e_3) + 5(e_4 - e_5) + 7(e_6 - e_7) + ⋯ ]
               = [1 - P(k, w)]^{-1} Σ_{n even} (n + 1) [ g^{n-1}/(n-1)! - g^n/n! ] .
Now, if a w is rejected, the algorithm returns to step 4, a new w is picked, and the process repeats. Let M(k, w) be the expected number of uniform deviates selected until a y is finally selected, given a fixed k and an initially chosen w. Then N(k) is the average of M(k, w) for w uniformly distributed on 0 ≤ w < d_k. We have

(20)   M(k, w) = P(k, w) m_a(k, w) + [1 - P(k, w)] [ m_r(k, w) + N(k) ] ,

since if w is rejected, the whole process is repeated.
Using (18) and (19), we find that

M(k, w) = Σ_{n=1}^∞ (n + 1) [ g^{n-1}/(n-1)! - g^n/n! ] + [1 - P(k, w)] N(k) ,

whence, summing the series,

(21)   M(k, w) = 1 + e^{G_k(w)} + [1 - P(k, w)] N(k) .

Averaging (21) for 0 ≤ w ≤ d_k, and using (12), we get

N(k) = 1 + d_k^{-1} ∫_0^{d_k} e^{G_k(w)} dw + N(k) d_k^{-1} ∫_0^{d_k} [ 1 - e^{-G_k(w)} ] dw ,

whence

(22)   N(k) = [ d_k + ∫_0^{d_k} e^{G_k(w)} dw ] / ∫_0^{d_k} e^{-G_k(w)} dw .
Finally, the expected number N̄ of uniform deviates drawn in the main game until a y is returned is the average of N(k) over the intervals, weighted by the probabilities of selecting the various intervals. That is,

(23)   N̄ = Σ_{k=1}^∞ N(k) ( r_k - r_{k-1} ) .
Using (4), (7), and (9) to express N̄ in terms of B(x), we obtain the ugly representation

(24)   N̄ = [ Σ_{k=1}^∞ ( e^{-B(q_{k-1})} d_k + e^{-2B(q_{k-1})} ∫_{q_{k-1}}^{q_k} e^{B(x)} dx ) ] / ∫_0^∞ e^{-B(x)} dx .
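Formulas (22)-(24) are straightforward to evaluate numerically. The following Python check is mine, not the paper's (simpson and N_of_k are illustrative names): it approximates N(k) by Simpson-rule quadrature, reproducing the value 4.30026 of (25) below for B(x) = x, and the value 3.03585 quoted in Section 5 for B(x) = x²/2.

    import math

    def simpson(h, a, b, m=400):
        # Composite Simpson rule for the integral of h over [a, b]; m even.
        s = h(a) + h(b)
        for i in range(1, m):
            s += h(a + i * (b - a) / m) * (4.0 if i % 2 else 2.0)
        return s * (b - a) / (3.0 * m)

    def N_of_k(B, q0, q1):
        # Formula (22), with G_k(w) = B(q0 + w) - B(q0).
        d = q1 - q0
        G = lambda w: B(q0 + w) - B(q0)
        num = d + simpson(lambda w: math.exp(G(w)), 0.0, d)
        den = simpson(lambda w: math.exp(-G(w)), 0.0, d)
        return num / den

    # Exponential case (Section 4): every interval gives the same value.
    print(N_of_k(lambda x: x, 0.0, 1.0))                 # ~ 4.30026

    # Normal case (Section 5): the weighted sum (23).
    cdf = lambda x: math.erf(x / math.sqrt(2.0))         # r_k for |normal|
    nbar, r_prev = 0.0, 0.0
    for k in range(1, 40):
        q0, q1 = math.sqrt(2.0 * (k - 1)), math.sqrt(2.0 * k)
        nbar += N_of_k(lambda x: 0.5 * x * x, q0, q1) * (cdf(q1) - r_prev)
        r_prev = cdf(q1)
    print(nbar)                                          # ~ 3.03585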
4. Special case: the exponential distribution. If b(x) = 1 in (1), then B(x) = x and f(x) = e^{-x}, corresponding to the exponential distribution treated in [3]. For the algorithm of Section 2 we have q_k = k, d_k = 1, r_k = 1 - e^{-k}, and G_k(x) = x, for all k. Since d_k and G_k(x) are independent of k, steps 4 to 10 of the algorithm are the same for all k. They can therefore be carried out independently of steps 1 to 3.

By (12), the probability that a chosen w is not accepted is 1 - e^{-w} (for all k), and the average value of 1 - e^{-w} over [0, 1) is e^{-1}.
If the preliminary game of steps 1 to 3 were played, the interval [k - 1, k) would be selected with probability r_k - r_{k-1} = e^{-(k-1)} (1 - e^{-1}), for k = 1, 2, ⋯. In the main game each chosen w is accepted with probability 1 - e^{-1} and rejected with probability e^{-1}. Since the rejection ratio for each interval has the same value e^{-1}, which is the a priori probability of rejecting in the main game any w selected in step 4, von Neumann could use the rejection of w as the signal to change the interval from [k - 1, k) to [k, k + 1). Thus the preliminary game of steps 1 to 3 is unnecessary for the exponential distribution. This made von Neumann's game very elegant. I know of no comparable trick for general b(x).

From (22) and (23), since N(k) = N(1) for all k, we see that for the exponential distribution

(25)   N̄ = [ 1 + (e - 1) ] / ( 1 - e^{-1} ) = e / ( 1 - e^{-1} ) ≐ 4.30026 ,

as stated in [1].
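Von Neumann's interval-switching trick is easily rendered in code. The sketch below is mine (the function name is illustrative), not a transcription of von Neumann's own statement; a rejected trial simply advances the interval, so the preliminary game disappears.

    import random

    def von_neumann_exponential(rand=random.random):
        # Exponential deviates by comparisons alone: a rejection is the
        # signal to move from the interval [k-1, k) to [k, k+1).
        k = 0
        while True:
            w = rand()         # here d_k = 1 and G_k(w) = w
            t, n = w, 0
            while True:
                u = rand()
                n += 1
                if u > t:      # least n with u_n > u_{n-1}, taking u_0 = w
                    break
                t = u
            if n % 2 == 1:     # n odd: accept
                return k + w
            k += 1             # n even: reject, and shift the interval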
5. Special case: the normal distribution. If b(x) = x in (1), then B(x) = x²/2 and f(x) = (2/π)^{1/2} e^{-x²/2}, the density of the absolute value of a normal deviate. For the algorithm of Section 2 we have

q_0 = 0,   q_k = (2k)^{1/2}   (k ≥ 1),
d_k = (2k)^{1/2} - (2k - 2)^{1/2}   (k ≥ 1),
G_k(x) = x²/2 + q_{k-1} x   (k ≥ 1).

The table below gives 15-decimal values of q_k, d_k, r_k, and N(k) for k = 1, 2, ⋯, 36, as computed in Fortran on Stanford's IBM 360/67 computer in double precision. To generate normal deviates, one selects K and prestores the values of r_k, q_k, and d_k for k = 1, 2, ⋯, K. Then set q_0 ← 0 and r_K ← 1. (The limit K = 12 permits normal deviates up to ±(24)^{1/2} ≐ 4.9 to be generated, and the deviates will be truncated less than once in a million trials; see the remark in Section 2 on this truncation.)
As suggested in [2], one should start the algorithm with a preliminary determination of the sign of the normal deviate. We do this in steps N1-N3 of the following algorithm; u is a uniform deviate on the interval [0, 1). The other steps repeat those of Section 2, with the sign appended in the last step.

N1. [Begin choice of sign and interval.] Set k ← 1. Generate a uniform deviate u on [0, 1), and set u ← 2u.
N2. [Choose sign.] If u < 1, set s ← 1.
N3. If u ≥ 1, set s ← -1, and set u ← u - 1.
N4. [Test for interval.] If u ≤ r_k, go to step N6.
N5. [Increase interval.] If u > r_k, set k ← k + 1 and go back to step N4.
N6. [Begin a trial.] Generate a uniform deviate w on [0, d_k).
N7. Set t ← G_k(w).
N8. Generate another uniform deviate u* on [0, 1).
N9. [Test.] If u* > t, go to step N13.
N10. [Trial continues.] If u* < t, generate another uniform deviate u on [0, 1).
N11. [Test.] If u < u*, set t ← u and go back to step N8.
N12. [Reject the trial.] If u ≥ u*, go back to step N6.
N13. [Finish.] Return y ← s(q_{k-1} + w) as the sample normal variable.
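For concreteness, here is a compact Python sketch (mine, not the paper's) of steps N1-N13, with q_k = (2k)^{1/2} computed on the fly, r_k prestored via the error function, and K = 12 as suggested above.

    import math, random

    K = 12
    Q = [math.sqrt(2.0 * k) for k in range(K + 1)]   # q_k = sqrt(2k)
    R = [math.erf(x / math.sqrt(2.0)) for x in Q]    # r_k = Prob(|normal| < q_k)
    R[K] = 1.0                                       # truncation device of Section 2

    def normal_deviate(rand=random.random):
        u = 2.0 * rand()                    # N1: one uniform gives sign and interval
        s = 1.0                             # N2
        if u >= 1.0:                        # N3: negative branch
            s, u = -1.0, u - 1.0
        k = 1
        while u > R[k]:                     # N4-N5: choose the interval
            k += 1
        while True:
            w = (Q[k] - Q[k - 1]) * rand()  # N6: w uniform on [0, d_k)
            t = 0.5 * w * w + Q[k - 1] * w  # N7: G_k(w) = w^2/2 + q_{k-1} w
            n = 0
            while True:                     # N8-N12: the comparison chain
                u_i = rand()
                n += 1
                if u_i > t:
                    break
                t = u_i
            if n % 2 == 1:                  # odd chain length: accept
                return s * (Q[k - 1] + w)   # N13

In a long simulation this consumes about 4.04 uniform deviates per normal deviate returned, consistent with the count 1 + N̄ given below.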
As in Section 3, we let N(k) be the expected number of selections of uniform deviates in steps N6-N13, as a function of k. We have, from (22):

N(1) = [ d_1 + ∫_0^{d_1} e^{w²/2} dw ] / ∫_0^{d_1} e^{-w²/2} dw

and, substituting x = q_{k-1} + w in (22),

N(k) = [ d_k + e^{-(k-1)} ∫_{q_{k-1}}^{q_k} e^{x²/2} dx ] / [ e^{k-1} ∫_{q_{k-1}}^{q_k} e^{-x²/2} dx ]   (k ≥ 2).

Numerical values of N(k) are given in the table. Using the asymptotic formula

∫_x^{x+h} e^{±t²/2} dt ~ ± x^{-1} ( e^{±(x+h)²/2} - e^{±x²/2} )   (x → ∞),

one can show that N(k) → e / (1 - e^{-1}) ≐ 4.30026 as k → ∞, the limiting value (25) found for the exponential distribution.
I have used the same computer to establish that

N̄ = Σ_{k=1}^∞ N(k) ( r_k - r_{k-1} ) ≐ 3.03585 ,

so that the expected number of uniform deviates chosen in order to generate one normal deviate is 1 + N̄ ≐ 4.03585 .
The correctness of this algorithm for generating normal deviates, as well as the value of N̄, have been confirmed in unpublished experiments.
6. Comparison with the Center-Tail method. In [1], Dieter and Ahrens give a related but different modification of the von Neumann idea for the generation of normal deviates. There are only two intervals, the center and the tail, and the algorithms are quite different for the two. The expected number of uniform deviates needed is near 6.321, and computation of a square root is required in approximately 16 per cent of the cases. The algorithm of Section 5 above requires no function call, but its main advantage over the Center-Tail method lies in requiring about two-thirds the number of uniform deviates. This should be reflected in a shorter average time of execution.
The Dieter-Ahrens algorithm for the center interval closely resembles my algorithm for each interval, and the proofs are very close to those given above. The big difference is that in [1] all variables u_i have the distribution function x² (0 ≤ x < 1), and the comparisons take the form

u_{n+1} > u_n < u_{n-1} < ⋯ < u_2 < u_1 ,

whereas mine take the form (for the principal case k = 1)

u_{n+1} > u_n < u_{n-1} < u_{n-2} < ⋯ < u_2 < u_1²/2 .
Changing the distribution function in [1] costs an extra uniform deviate and a comparison for each u_i, whereas forming u_1²/2 = G_1(w) in Section 5 is done only once for each chain of u_i's. Moreover, the fact that u_1²/2 is usually small means that most of the time u_1 is accepted immediately. This contributes to keeping N̄ low in my algorithm. Finally, the use of many intervals makes it possible to use the von Neumann technique in any interval in which G_k(w) can be evaluated.

In a more recent manuscript [2], Dieter and Ahrens have improved their Center-Tail method so that the comparisons are simpler and the expected number of uniform deviates needed is reduced to near 5.236. According to the authors, the improved Center-Tail method is still somewhat slower than my algorithm.
7. Further generalizations. Let f(x) (-∞ < x < ∞) be the probability density function of a random variable F. Under what conditions on f could the von Neumann idea be applied to pick a sample from F? It is sufficient that the interval (-∞, ∞) be the union of intervals I_k = [c_{k-1}, c_k) (k = ⋯, -2, -1, 0, 1, 2, ⋯) such that in each closed interval Ī_k either f(x) ≡ 0 or the following three conditions all hold: f(x) > 0, f is absolutely continuous, and f is monotonic.

Then a preliminary game can be played to select an interval I_k. If b(x) = -f'(x)/f(x) ≥ 0 in I_k, the algorithm of Section 2 can be applied to I_k after a translation of the origin. (Subdivision of I_k may be necessary to make inequalities like (6) hold.)
If b(x) < 0 in I_k, change x to -x and apply the same algorithm.
The work of applying the von Neumann idea to a given density f is of two kinds. (a) One must compute integrals of the form ∫_a^x f(t) dt, in order to determine the parameters needed to pick the intervals I_k during execution, and to evaluate the needed constants q_k, r_k, and d_k. These computations have to be done only once in designing the algorithm. (b) One must evaluate G_k(w) for arbitrary w in [0, d_k] during each execution of the algorithm.
Note that

G_k(w) = B(q_{k-1} + w) - B(q_{k-1}) = ∫_{q_{k-1}}^{q_{k-1}+w} b(t) dt = - ∫_{q_{k-1}}^{q_{k-1}+w} [ f'(t)/f(t) ] dt = log f(q_{k-1}) - log f(q_{k-1} + w) .
Since (b) must be done on-line, the success of an algorithm of this type depends on being able to evaluate G_k(w), and hence f(x), rapidly. Thus having f(x) = C exp(-B(x)) with an easily computed B(x) (and hence a solution of (1)) is of great practical advantage, but it is not strictly necessary.
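For instance (my illustration, not the paper's), when only f itself is cheap to evaluate, G_k(w) can be formed on-line directly from the log-density ratio just derived:

    import math

    def G_from_density(f, q_prev, w):
        # G_k(w) = log f(q_{k-1}) - log f(q_{k-1} + w); only the density
        # itself is needed on-line, not the antiderivative B.
        return math.log(f(q_prev)) - math.log(f(q_prev + w))

    # Check against the half-normal case of Section 5, where
    # G_k(w) = w^2/2 + q_{k-1} w:
    f = lambda x: math.sqrt(2.0 / math.pi) * math.exp(-0.5 * x * x)
    print(G_from_density(f, 1.0, 0.3), 0.5 * 0.3 ** 2 + 1.0 * 0.3)  # both 0.345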
References

1. J. H. Ahrens and U. Dieter, "Computer methods for sampling from the exponential and normal distributions", Comm. Assoc. Comput. Mach., vol. 15 (1972), pp. 000-000.

2. U. Dieter and J. Ahrens, "A combinatorial method for the generation of normally distributed random numbers", manuscript.

3. John von Neumann, "Various techniques used in connection with random digits" (summary written by George E. Forsythe), pp. 36-38 of Monte Carlo Method, [U. S.] National Bureau of Standards, Applied Mathematics Series, vol. 12 (1951). Reprinted in John von Neumann, Collected Works, vol. 5, pp. 768-770, Pergamon Press, 1963.

February 9, 1972.
[Table: 15-decimal values of q_k, d_k, r_k, and N(k) for k = 1, 2, ⋯, 36; the digits are not legible in this copy.]