3. Uniform Random Numbers
a pseudo-random number generator only requires a little storage space for both
code and internal data. When re-started in the same state, it re-delivers the
same output.
A second drawback to physical random number generators is that they usu-
ally cannot supply random numbers nearly as fast as pseudo-random numbers
can be generated. A third problem with physical numbers is that they may
still fail some tests for randomness. This failure need not cast doubt on the
underlying physics, or even on the tests. The usual interpretation is that the
hardware that records and processes the random source introduces some flaws.
Because pseudo-random numbers dominate practice, we will usually drop
the distinction, and simply refer to them as random numbers. No pseudo-
random number generator perfectly simulates genuine randomness, so there is
the possibility that any given application will resonate with some flaw in the
generator to give a misleading Monte Carlo answer. When that is a concern,
we can at least recompute the answers using two or more generators with quite
different behavior. In fortunate situations, we can find a version of our problem
that can be done exactly some other way to test the random number generator.
By now there are a number of very good and thoroughly tested generators.
The best of these quickly produce very long streams of numbers, and have
fast and portable implementations in many programming languages. Among
these high quality generators, the Mersenne twister, MT19937, of Matsumoto
and Nishimura (1998) has become the most prominent, though it is not the
only high quality random number generator. We cannot rule out getting a
bad answer from a well tested random number generator, but we usually face
much greater risks. Among these are numerical issues in which round-off errors
accumulate, quantities overflow to ∞ or underflow to 0, as well as programming
bugs and simulating from the wrong assumptions.
Sometimes a very bad random number generator will be embedded in gen-
eral purpose software. The results of very extensive (and intensive) testing
are reported in L’Ecuyer and Simard (2007). Many operating systems, pro-
gramming languages and computing environments were found to have default
random number generators which failed a great many tests of randomness. Any
list of test results will eventually become out of date, as hopefully, the software
it describes gets improved. But it seems like a safe bet that some bad random
number generators will still be used as defaults for quite a while. So it is best to
check the documentation of any given computing environment to be sure that
good random numbers are used. The documentation should name the generator
in use and give references to articles where theoretical properties and empirical
tests have been published.
The most widely used random number generators for Monte Carlo sampling
use simple recursions. It is often possible to observe a sequence of their outputs,
infer their inner state and then predict future values. For some applications,
such as cryptography it is necessary to have pseudo-random number sequences
for which prediction is computationally infeasible. These are known as crypto-
graphically secure random number generators. Monte Carlo sampling does not
require cryptographic security.
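To make the idea of a simple recursion concrete, here is a minimal sketch in Python of a multiplicative congruential generator, using the deliberately tiny parameters M = 59 and a = 33 that reappear in Figure 3.1 below. It is for illustration only, not for Monte Carlo use. Note that the entire internal state is the last value x_i, which is why an observer who sees one output can predict all future ones.

    def mcg(seed, a=33, M=59):
        # Multiplicative congruential generator: x_i = a * x_{i-1} mod M,
        # delivered as u_i = x_i / M in (0, 1). Tiny M chosen for illustration.
        x = seed
        while True:
            x = (a * x) % M
            yield x / M

    gen = mcg(seed=1)
    print([next(gen) for _ in range(5)])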
which makes sure that each of the J simulations starts with the same random
numbers. All the calls to rand take place within dosim or functions that it calls.
Whether the simulations for different j remain synchronized depends on details
within dosim.
Many random number generators provide a function called something like
getseed() that we can use to retrieve the present state of the random number
generator. We can follow savedseed ← getseed() by setseed(savedseed) to
restart a simulation where it left off. Some other simulation might take place in
between these two calls.
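As an illustration, Python's built-in generator exposes this pattern through random.Random.getstate() and setstate(), which play the roles of getseed() and setseed() above. This is a sketch of the save-and-resume idiom, not of any particular package's interface.

    import random

    rng = random.Random(20240101)     # an explicitly seeded generator
    u1 = rng.random()

    savedseed = rng.getstate()        # savedseed <- getseed()
    # ... some other simulation may consume random numbers here,
    #     provided it uses its own generator object ...
    rng.setstate(savedseed)           # setseed(savedseed)
    u2 = rng.random()                 # resumes exactly where we left off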
In moderately complicated simulations, we want to have two or more streams
of random numbers. Each stream should behave like a sequence of independent
U(0, 1) random variables. In addition, the streams need to appear as if they are
statistically independent of each other. For example, we might use one stream
of random numbers to simulate customer arrivals and another to simulate their
service times.
We can get separate streams by carefully using getseed() and setseed().
For example in a queuing application, code like
setseed(arriveseed)
A ← simarrive()
arriveseed ← getseed()
alternating with
setseed(serviceseed)
S ← simservice()
serviceseed ← getseed()
gets us separated streams for these two tasks. Once again the calls to rand()
are hidden, this time in simarrive() and simservice().
Constant switching between streams is cumbersome and prone to error. It
also requires a careful choice of starting values for arriveseed and
serviceseed. Poor choices of these seeds will lead to streams of random
numbers that appear correlated with each other.
given s, n, nstep
setseed(s)
for i = 1, . . . , n
    nextsubstream()
    X_i ← oldsim(nstep)
    restartsubstream()
    Y_i ← newsim(nstep)
end for
return (X_1, Y_1), . . . , (X_n, Y_n)
This algorithm shows pseudo-code for use of one substream per Monte Carlo
replication, to run two simulation methods on the same random scenarios.
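Here is a sketch of that algorithm using NumPy's PCG64 bit generator, whose jumped() method plays the role of nextsubstream() and whose saved state plays the role of restartsubstream(). The functions oldsim and newsim are hypothetical stand-ins for the two simulation methods, and we assume each replication consumes far fewer than the jump length of 2^128 numbers.

    import numpy as np

    def run_paired(s, n, nstep, oldsim, newsim):
        bitgen = np.random.PCG64(s)                  # setseed(s)
        pairs = []
        for i in range(n):
            bitgen = bitgen.jumped()                 # nextsubstream()
            start = bitgen.state                     # remember its start
            x = oldsim(np.random.Generator(bitgen), nstep)
            bitgen.state = start                     # restartsubstream()
            y = newsim(np.random.Generator(bitgen), nstep)
            pairs.append((x, y))
        return pairs

    # Toy usage: two methods applied to the same random scenarios.
    old = lambda g, m: g.random(m).mean()
    new = lambda g, m: np.sqrt(g.random(m)).mean()
    print(run_paired(5, 3, 10, old, new))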
arriverng ← newrng()
servicerng ← newrng()
before. The longer simulations will retrace the initial steps of the shorter ones
before extending them.
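In NumPy, one way to realize the newrng() idiom is to spawn child seeds from a single root SeedSequence; the spawned streams are designed to behave independently of each other. The arrival and service distributions below are illustrative choices only.

    import numpy as np

    root = np.random.SeedSequence(20240101)
    arrive_seed, service_seed = root.spawn(2)
    arriverng = np.random.default_rng(arrive_seed)    # arriverng <- newrng()
    servicerng = np.random.default_rng(service_seed)  # servicerng <- newrng()

    A = arriverng.exponential(1.0)    # a simulated interarrival time
    S = servicerng.exponential(2.0)   # a simulated service time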
Comparatively few random number generators allow the user to have careful
control of streams and substreams. A notable one that does is RngStreams by
L’Ecuyer et al. (2002). Much of the description above is based on features of that
software. They provide a sequence of about 2^198 points in the unit interval. The
generator is partitioned into streams of length 2^127. Each stream can be used
as a random number generator. Every stream is split up into 2^51 substreams of
length 2^76. The design of RngStreams gives a large number of long substreams
that behave nearly independently of each other. The user can set one seed to
adjust all the streams and substreams.
For algorithms that consume random numbers on many different processors,
we need to supply a seed to each processor. Each stream should still simulate
independent U(0, 1) random variables, but now we also want the streams to
behave as if they were independent of each other. Making that work right
depends on subtle details of the random number generator being used, and the
best approach is to use a random number generator designed to supply multiple
independent streams.
While design of random number generators is a mature field, design for
parallel applications is still an active area. It is far from trivial to supply lots
of good seeds to a random number generator. For users it would be convenient
to be able to use consecutive non-negative integers to seed multiple generators.
That may work if the generator has been designed with such seeds in mind, but
otherwise it can fail badly.
Another approach is to use one random number generator to pick the seeds
for another. That can fail too. Matsumoto et al. (2007) study this issue. It
was brought to their attention by somebody simulating baseball games. The
simulation for game i used a random number generator seeded by si . The seeds
si were consecutive outputs of a second random number generator. These seeds
resulted in the 15th batter getting a hit in over 50% of the first 25 simulated
teams, even though the simulation used batting averages near 0.250.
A better approach to getting multiple seeds for the Mersenne Twister is to
take integers 1 to K, write them in binary, and apply a cryptographically secure
hash function to their bits. The resulting hashed values are unlikely to show
a predictable pattern. If they did, it would signal a flaw in the hash function.
The advanced encryption standard (AES) (Daemen and Rijmen, 2002) provides
one such hash function and there are many implementations of it.
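Here is a sketch of that idea, with SHA-256 standing in for the AES-based hash; any cryptographically secure hash should serve the same purpose. Consecutive integers go in, and seeds with no predictable pattern come out.

    import hashlib

    def hashed_seed(i, nbytes=4):
        # Hash the bytes of integer i and keep the first nbytes as a seed.
        # SHA-256 is a stand-in for the AES construction mentioned above.
        digest = hashlib.sha256(i.to_bytes(8, "big")).digest()
        return int.from_bytes(digest[:nbytes], "big")

    seeds = [hashed_seed(k) for k in range(1, 11)]   # seeds for K = 10 generators
    print(seeds)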
we might find that there are no floating point numbers between 1 and 1 − 10^{−17}
while the interval between 0 and 10^{−300} does have some. In single precision
there is an even wider interval around 1 with no represented values. If uniform
random numbers are used in single precision, then that alone could produce
values of U = 1.0.
An LCG is necessarily slower than an MCG and, because the LCG does not bring
much if any quality improvement, MCGs are more widely used. A generalization
of the MCG is the multiple recursive generator or MRG:

x_i = (a_1 x_{i−1} + a_2 x_{i−2} + · · · + a_k x_{i−k}) mod M,

where k > 1 and a_k ≠ 0. Lagged Fibonacci generators, which take the form
x_i = (x_{i−r} + x_{i−s}) mod M for carefully chosen r, s and M, are an important
special case, because they are fast.
A relatively new and quite different generator type is the inversive con-
gruential generator, ICG. For a prime number M the ICG update is
x_i = (a_0 + a_1 x_{i−1}^{−1}) mod M  if x_{i−1} ≠ 0,  and  x_i = a_0  if x_{i−1} = 0.    (3.4)
or s_i = A s_{i−1} mod M for a state vector s_i = (x_i, x_{i−1}, . . . , x_{i−k+1})^T and the
given k by k matrix A with elements in {0, 1, . . . , M − 1}. This matrix represen-
tation makes it easy to jump ahead in the stream of an MRG. To move ahead
2^ν places, we take

s_{i+2^ν} = (A^{2^ν} s_i) mod M = ((A^{2^ν} mod M) s_i) mod M.

The matrix A^{2^ν} mod M can be rapidly precomputed by the iteration
A^{2^ν} = (A^{2^{ν−1}})^2 mod M.
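A sketch of that squaring iteration follows. The helper computes A^{2^ν} mod M, applied here to a toy k = 2 MRG in companion form with illustrative (not recommended) parameters; Python's arbitrary precision integers, via NumPy's object dtype, avoid overflow for realistic moduli.

    import numpy as np

    def mat_pow_mod(A, nu, M):
        # Compute A^(2^nu) mod M by nu successive squarings.
        B = np.array(A, dtype=object) % M
        for _ in range(nu):
            B = (B @ B) % M
        return B

    # Toy k = 2 MRG: x_i = (a1 x_{i-1} + a2 x_{i-2}) mod M in companion form.
    a1, a2, M = 33, 12, 59          # illustrative values only
    A = [[a1, a2],
         [1,  0]]
    B = mat_pow_mod(A, 10, M)       # A^(2^10) mod M
    s = np.array([1, 0], dtype=object)
    print((B @ s) % M)              # the state 2^10 steps ahead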
Equation (3.5) also makes it simple to produce and study thinned out sub-
streams x̃_i ≡ x_{ℓ+ki}, based on taking every k'th output value. Those substreams
are MRGs with Ã = A^k mod M. The thinned out stream is not necessarily
better than the original one. In the case of an MCG, the thinned out sequence
is also an MCG with ã_1 = a_1^k mod M. If this value were clearly better, then we
would probably be using it instead of a_1.
The representation (3.5) includes the MCG as well, for k = 1, and similar,
though more complicated, formulas can be obtained for the LCG after taking
account of the constant a_0.
a    a^2   a^3   a^4   a^5   a^6
0    0     0     0     0     0
1    1     1     1     1     1
2    4     1     2     4     1
3    2     6     4     5     1
4    2     1     4     2     1
5    4     6     2     3     1
6    1     6     1     6     1

Table 3.1: The first column shows values of a ranging from 0 to 6. Subsequent
columns show powers of a taken modulo 7. The last 6 entries of the final column
are all 1's in accordance with Fermat's theorem. The primitive roots are 3 and 5.
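Table 3.1 is easy to reproduce. This snippet recomputes the powers modulo 7 and flags the primitive roots, namely the values of a whose powers run through all six nonzero residues.

    M = 7
    for a in range(M):
        powers = [pow(a, j, M) for j in range(1, M)]
        tag = "primitive root" if len(set(powers)) == M - 1 else ""
        print(a, powers, tag)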
B_a ≡ ∏_{j=1}^{k} [ a_j/2^ℓ, (a_j + 1)/2^ℓ ),

for 0 ≤ a_j < 2^ℓ, has 2^{K−kℓ} of the points (u_i, u_{i+1}, . . . , u_{i+k−1}) for i = 1, . . . , P.
We also say that such a random number generator is (k, ℓ)-equidistributed.
We will see a more powerful form of equidistribution, using not necessarily
cubical regions, in Chapter 15 on quasi-Monte Carlo.
Many random number generators have a period of the form P = 2^K − 1,
because the state vector is not allowed to be all zeros. Such random number
generators are said to be (k, ℓ)-equidistributed if each of the boxes B_a above has
2^{K−kℓ} points in it, except the one for a = (0, 0, . . . , 0), which then has 2^{K−kℓ} − 1
points in it.
The Mersenne twister MT19937 is 623-distributed to 32 bits accuracy. It is
also 1246-distributed to 16 bits accuracy, 2492-distributed to 8 bits accuracy,
4984-distributed to 4 bits accuracy, 9968-distributed to 2 bits accuracy and
19937-distributed to 1 bit accuracy. One reason for its popularity is that the
dimension k = 623 for which equidistribution is proven is relatively large. This
fact may be more important than the enormous period which MT19937 has.
Marsaglia (1968) showed that the consecutive tuples (u_i, . . . , u_{i+k−1}) from
an MCG have a lattice structure. Figure 3.1 shows some examples for k = 2,
using a modulus small enough for us to see the structure.
Figure 3.1: This figure illustrates two very small MCGs. The plotted points are
(u_i, u_{i+1}) for 1 ≤ i ≤ P. The lattice structure is clearly visible. Both MCGs
have M = 59. The multiplier a_1 is 33 in the left panel and 44 in the right. Two
basis vectors are given near the origin. The points lie in systems of parallel lines
as shown.
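The points in Figure 3.1 can be regenerated with a few lines; plotting the pairs with any plotting library makes the lattice visible. We start from x_1 = 1; the full period here is P = 58 because M = 59 is prime and both multipliers have maximal order.

    # Regenerate the left panel of Figure 3.1: M = 59, a = 33
    # (use a = 44 for the right panel). The full period is P = 58.
    M, a = 59, 33
    x, xs = 1, []
    for _ in range(58):
        x = (a * x) % M
        xs.append(x / M)
    pairs = [(xs[i], xs[(i + 1) % len(xs)]) for i in range(len(xs))]
    # e.g. with matplotlib: plt.scatter(*zip(*pairs)) shows the lattice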
A lattice has the form L = {α_1 v_1 + · · · + α_k v_k | α_j ∈ ℤ}, where the v_j are
linearly independent basis vectors in ℝ^k. The tuples from the
MCG are the intersection of the infinite set L with the unit cube [0, 1)^k. The
definition of a lattice closely resembles the definition of a vector space. The
difference is that the coefficients αj are integers in a lattice, whereas they are
real values in a vector space. We will study lattices further in Chapters 15
through 17 on quasi-Monte Carlo. There is more than one way to choose the
vectors v_1, . . . , v_k. Each panel in Figure 3.1 shows one basis (v_1, v_2) by arrows
from the origin.
The lattice points in k dimensions lie within sets of parallel (k − 1)-dimensional
planes. Lattices where those planes are far apart miss much of the space, and so
are poor approximations to the desired uniform distribution. In the left panel
of Figure 3.1, the two marked lines are distance 0.137 apart. The points could
also be placed inside a system of lines 0.116 apart, nearly orthogonal to the set
shown. The first number is the quality measure for these points because we
seek to minimize the maximum separation. The lattice on the right is worse,
because the maximum separation, shown by two marked lines, is 0.243.
The number of planes on which the points lie tends to be smaller when those
planes are farther apart. The relationship is not perfectly monotone because the
number of parallel planes required to span the unit cube depends on both their
separation and on the angles that they make with the coordinate axes of the
cube. For an MCG with period P, Marsaglia (1968) shows that there is always
a system of (k!P)^{1/k} or fewer parallel planes that contain all of the k-tuples.
MCGs are designed in part to optimize some measure of quality for the
planes in which their k-tuples lie. Just as a bad lattice structure is evidence
of a flaw in the random number generator, a good lattice structure, combined
with a large value of P, is proof that the k-dimensional projections of the full
period are very uniform. Let d_k = d_k(u_1, . . . , u_P) be the maximum separation
between planes in [0, 1]^k for the k-tuples formed from u_1, . . . , u_P. We want a
small value for d_1 and d_2, and remembering RANDU, for d_k at larger values of
k. It is customary to use ratios s_k = d*_k/d_k where d*_k is some lower bound for d_k
given the period P. The idea is to seek a generator with large (near 1) values
of s_k for all small k. Computing d_k can be quite difficult and the choice of a
generator involves trade-offs between the quality at multiple values of k. Gentle
(2003, Chapter 2.2) has a good introduction to tests of lattice structure.
Inversive congruential generators produce points that do not have a lattice
structure. Nor do they satisfy strong (k, `)-equidistribution properties. Their
uniformity properties are established by showing that the fraction of their points
within any box ∏_{j=1}^{k} [0, a_j] is close to the volume ∏_{j=1}^{k} a_j of that box. We will
revisit this and other measures of uniformity, called discrepancies, in Chapter 15
on quasi-Monte Carlo.
where O_ℓ is the number of times that ordering ℓ was observed, E_ℓ = n/k!
is the expected number of times, and χ²_(ν) is a chi-squared random variable on
ν degrees of freedom. Small values of p are evidence that the u_i are not IID
uniform. If the u_i are IID U(0, 1), then the distribution of p approaches the
U(0, 1) distribution as n → ∞.
A simple test of uniformity is to take large n and then compute a p-value
such as the one given above. If p is very small then we have evidence of a
problem with the random number generator. It is usually not enough to find
and report whether one p-value from a given test is small or otherwise. Instead
we want small, medium and large p-values to appear with the correct relative
frequency for sample sizes n comparable to ones we might use in practice. In
a second-level test we repeat a test like the one above some large number
N of times and obtain p-values p_1, . . . , p_N. These should be nearly uniformly
distributed. The second level test statistic is a measure of how close p_1, . . . , p_N
are to the U(0, 1) distribution. A popular choice is the Kolmogorov-Smirnov
statistic

KS = sup_{0 ≤ t ≤ 1} | (1/N) Σ_{j=1}^{N} 1{p_j ≤ t} − t |.
The second level p-value is P(Q_N ≥ KS) where Q_N is a random variable that
truly has the distribution of a Kolmogorov-Smirnov statistic applied to N IID
U(0, 1) random variables. That is a known distribution, which makes the sec-
ond level test possible. Small values, such as 10^{−10}, for the second order p-
value indicate a problem with the random number generator. Alternatives to
Kolmogorov-Smirnov, such as the Anderson-Darling test, have more power to
capture departures from U(0, 1) in the tails of the distribution of the p_j.
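Here is a sketch of a second-level test in Python with NumPy and SciPy: each first-level test is a χ² test of uniformity on n draws in 100 bins, and the N resulting p-values are compared to U(0, 1) with the Kolmogorov-Smirnov test. The bin count and sample sizes are illustrative choices.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)        # the generator under test
    N, n, bins = 1000, 10_000, 100
    pvals = np.empty(N)
    for i in range(N):
        counts, _ = np.histogram(rng.random(n), bins=bins, range=(0.0, 1.0))
        chi2 = ((counts - n / bins) ** 2 / (n / bins)).sum()
        pvals[i] = stats.chi2.sf(chi2, df=bins - 1)   # first-level p-value
    result = stats.kstest(pvals, "uniform")           # second-level test
    print(result.statistic, result.pvalue)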
A second-level test will detect random number generators that are too uni-
form, having for example too few p_j below 0.01. It will also capture subtle
flaws like p-values that are neither too high nor too low on average, but instead
tend to be close to 0.5 too often.
There are third-level tests based on the distribution of second-level test
statistics but the most practically relevant tests are at the first or second level.
We finish this section by describing a few more first level tests which can be
used to construct corresponding second order tests.
Marsaglia’s birthday test has parameters m and n. In it, the RNG is used
to simulate m birthdays b_1, . . . , b_m in a hypothetical year of n days. Each b_i
should be U{1, 2, . . . , n}. Marsaglia and Tsang (2002) consider n = 2^32 and
m = 2^12. The sampled b_i values are sorted into b_(1) ≤ b_(2) ≤ · · · ≤ b_(m) and
differenced, forming d_i = b_(i+1) − b_(i) for i = 1, . . . , m − 1. The test statistic is
D, the number of values among the spacings d_i that appear two or more times.
The distribution of D is roughly Poi(λ) where λ = m^3/(4n) for m and n in the
range they use. For the given m and n, we find λ = 4. The test compares N
sample values of D to the Poisson distribution. The birthday test seems strange,
but it is known to detect problems with some lagged Fibonacci generators.
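A sketch of one replicate of the birthday test follows the definition above: simulate m birthdays in a year of n days, sort, difference, and count the spacing values that occur two or more times. N such counts would then be compared to the Poi(4) distribution.

    import numpy as np

    def birthday_stat(rng, m=2**12, n=2**32):
        b = np.sort(rng.integers(0, n, size=m))     # m birthdays in n days
        d = np.diff(b)                              # the m - 1 spacings
        _, counts = np.unique(d, return_counts=True)
        return int((counts >= 2).sum())             # values appearing >= twice

    rng = np.random.default_rng(7)
    D = [birthday_stat(rng) for _ in range(100)]    # compare these to Poi(4)
    print(np.mean(D))                               # should be near 4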
A good multiplicative congruential generator has its k-tuples on a well
separated lattice. Those points are then farther apart from each other than
P random points in [0, 1)^k would be. Perhaps a small sample of n points
v_j = (u_{jk+1}, . . . , u_{(j+1)k}) for j = 1, . . . , n preserves this problem and the points
avoid each other too much. The close-pair tests are based on the distribution
of MD_{n,k} = min_{1≤j<j′≤n} ‖v_j − v_{j′}‖. Here ‖z‖ is a convenient norm. L’Ecuyer
et al. (2000) find that norms with a wraparound feature (treating 0 and 1 as
identical) are convenient because boundary effects disappear, making it easier
to approximate the distribution that MD_{n,k} would have given perfect uniform
numbers.
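Here is a sketch of the close-pair statistic with a wraparound (torus) norm, computed by brute force; for the modest n used in these tests this is fast enough.

    import numpy as np

    def min_dist_wrap(v):
        # Minimum pairwise distance of points v in [0,1)^k, with coordinates
        # wrapping around so that 0 and 1 are identified (a torus norm).
        n = len(v)
        best = np.inf
        for j in range(n - 1):
            d = np.abs(v[j + 1:] - v[j])
            d = np.minimum(d, 1.0 - d)               # wraparound distance
            best = min(best, np.sqrt((d**2).sum(axis=1)).min())
        return best

    rng = np.random.default_rng(3)
    v = rng.random((500, 5))                         # n = 500 points in [0,1)^5
    print(min_dist_wrap(v))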
To run these tests we need to know the true distribution of their test statistics
on IID U(0, 1) inputs. That true distribution is usually easiest to work out if
the k-tuples in the test statistic are non-overlapping, (u_{jk+1}, . . . , u_{(j+1)k}) for
j = 1, . . . , n. Sometimes the tests are run instead on overlapping k-tuples
(u_{j+1}, . . . , u_{j+k}) for j = 1, . . . , n. Tests on overlapping tuples are perfectly
valid, though it may then be much harder to find the proper distribution of the
test statistic.
i ← 0
for j = 1, . . . , n − 1
    for k = j + 1, . . . , n
        i ← i + 1
        X_i ← U_j ⊕ U_k                (3.8)
end double for loop
Proposition 3.1. Let Y = f(X) have mean µ and variance σ² < ∞ when X ∼
U(0, 1)^d. Suppose that X_1, . . . , X_N are pairwise independent U(0, 1)^d random
variables. Let Y_i = f(X_i), Ȳ = (1/N) Σ_{i=1}^{N} Y_i and s² = (N − 1)^{−1} Σ_{i=1}^{N} (Y_i − Ȳ)².
Then

E(Ȳ) = µ,   Var(Ȳ) = σ²/N,   and   E(s²) = σ².
Proof. Exercise 3.10.
Pairwise independence differs from full independence in one crucial way. The
average Ȳ of pairwise independent and identically distributed random variables
does not necessarily satisfy a central limit theorem. The distribution depends
on how the pairwise independent points are constructed.
To get a 99% confidence interval for E(Y) we could form R genuinely inde-
pendent replicates Ȳ_1, . . . , Ȳ_R, each of which combines N pairwise independent
random variables, and then use the interval

Ȳ̄ ± 2.58 ( (1/(R(R − 1))) Σ_{r=1}^{R} (Ȳ_r − Ȳ̄)² )^{1/2},   where   Ȳ̄ = (1/R) Σ_{r=1}^{R} Ȳ_r.
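A sketch of this construction: pairwise independent uniforms from bitwise XOR of random 32-bit words, as in the loop labeled (3.8), averaged within each of R replicates and combined into the 99% interval just given. The integrand f and the sizes are illustrative.

    import numpy as np

    B = 32                                           # bits per word

    def xor_uniforms(rng, n):
        # All n-choose-2 pairwise XORs of n random B-bit words, mapped
        # into (0, 1); these variates are pairwise independent.
        w = rng.integers(0, 2**B, size=n, dtype=np.uint64)
        xs = [int(w[j] ^ w[k]) for j in range(n - 1) for k in range(j + 1, n)]
        return (np.array(xs, dtype=np.float64) + 0.5) / 2**B

    def replicated_ci(f, rng, n, R):
        # 99% interval from R independent replicates of the average.
        ybars = np.array([f(xor_uniforms(rng, n)).mean() for _ in range(R)])
        grand = ybars.mean()
        half = 2.58 * np.sqrt(((ybars - grand)**2).sum() / (R * (R - 1)))
        return grand - half, grand + half

    rng = np.random.default_rng(11)
    print(replicated_ci(np.square, rng, n=64, R=30))  # estimates E(U^2) = 1/3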
GFSRs were introduced by Lewis and Payne (1973) and TGFSRs by Mat-
sumoto and Kurita (1992). The Mersenne twister was introduced in Matsumoto
and Nishimura (1998).
TestU01, a comprehensive test package for random number generators, is
given by L’Ecuyer and Simard (2007). It incorporates the tests from Marsaglia’s
diehard test battery, available on the Internet, as well as tests developed by the
National Institute of Standards and Technology. When a battery of tests is
applied to an ensemble of random number generators, we not only see which
generators fail some test, we also see which tests fail many generators. The latter
pattern can even help spot tests that have errors in them (Leopardi, 2009).
For cautionary notes on parallel random number generation see Hellekalek
(1998), Mascagni and Srinivasan (2004) and Matsumoto et al. (2007). The latter
describe several parametrization techniques for picking seeds. They recommend
a seeding scheme from L’Ecuyer et al. (2002).
Owen (2009) investigates the use of pairwise independent random variables
drawn from physical random numbers, described in §3.7. The average
Ȳ = (n(n − 1)/2)^{−1} Σ_{j=1}^{n−1} Σ_{k=j+1}^{n} f(U_j ⊕ U_k)    (3.9)
Recommendations
For Monte Carlo applications, it is necessary to use a high quality random num-
ber generator with a long enough period. The theoretical principles underlying
the generator and its quality should be published. There should also be a pub-
lished record showing how well the generator does on a standard battery of
tests, such as TestU01.
When using an RNG, it is good practice to explicitly set the seed. That
allows the computations to be reproduced later.
If many independent streams are required then a random number generator
which supports them is preferable. For example RngStreams produces indepen-
dent streams and has been extensively tested.
In moderately large projects, there is an advantage to isolating the random
number generator inside one module. That makes it easier to replace the random
numbers later if there are concerns about their quality.
Exercises
3.1. Let P = 2^19937 − 1 be the period of the Mersenne twister. Using the
equidistribution properties of the Mersenne twister:
a) For how many i = 1, . . . , P will we find max_{1≤j≤623} u_{i+j−1} < 2^{−32}?
b) For how many i = 1, . . . , P will we find min_{1≤j≤623} u_{i+j−1} > 1 − 2^{−32}?
c) Suppose that we use the Mersenne twister to simulate coin tosses, with toss
i being heads if u_i < 1/2 and tails if 1/2 ≤ u_i. Is there any index i in
1 ≤ i ≤ P for which u_i, . . . , u_{i+10000−1} would give 10,000 heads in a row?
How about 20,000 heads in a row?
Note that when i + j − 1 > P the value u_{i+j−1} is still well defined. It is
u_{(i+j−1) mod P}.
3.2. The lattice on the right of Figure 3.1 is not the worst one for p = 59. Find
another value of a for which the period of xi = axi−1 mod 59, starting with
x1 = 1 equals 59, but the 59 points (ui , ui+1 ) for ui = xi /59 lie on parallel lines
more widely separated than those with a = 44. Plot the points and compute
the separation between those lines. [Hint: identify a lattice point on one of
the lines, and drop a perpendicular from it to a line defined by two points on
another of the lines.]
3.3. Suppose that we are using an MCG with P ≤ 2^32.
a) Evaluate Marsaglia’s upper bound on the number of planes which will
contain all consecutive k = 10 tuples from the MCG.
b) Repeat the previous part, but assume now a much larger bound P ≤ 2^64.
c) Repeat the previous two parts for k = 20 and again for k = 100.
3.4. Suppose that an MCG becomes available with period 2^19937 − 1. What is
Marsaglia’s upper bound on the number of planes in [0, 1]^10 that will contain
all 10-tuples from such a generator?
3.5. Consider the inversive generator x_i = (a_0 + a_1 x_{i−1}^{−1}) mod p for p = 59,
a_0 = 1 and a_1 = 17. Here x^{−1} satisfies x x^{−1} mod p = 1 for x ≠ 0, and 0^{−1} is
taken to be 0.
a) What is the period of this generator?
b) Plot the consecutive pairs (x_i, x_{i+1}).
3.6. Here we investigate whether the digits of π appear to be random.
a) Find the first 10,000 digits of π − 3 ≐ 0.14159 after the decimal point,
and report how many of these are 0’s, 1’s and . . . 9’s. These digits are
available in many places on the Internet.
b) Report briefly how you got them into a form suitable for computation.
You might be able to do it within a file editor, or you might prefer to
write your own short C or perl or other program to get the data in a
form suitable for computing. Either list your program or describe your
sequence of edits. Also: indicate which URL you got the π digits from.
At one time, it appeared that not all π listings on the web agreed!
c) A χ² test for uniformity has test statistic

X² = Σ_{j=0}^{9} (E_j − O_j)² / E_j.
3.7. Here we make a simple test of the Mersenne Twister, using a trivial starting
seed. The test requires the original MT19937 code, widely available on the
internet, which comes with a function called init_by_array.
a) Seed the Mersenne Twister using init_by_array with N = 1 and the un-
signed long vector of length N, called init_key in the source code, having
a single entry init_key[0] equal to 0. Make a histogram of the first 10,000
U(0, 1) sample values from this stream (using the function genrand_real3).
Apply the χ² test of uniformity based on the number of these sample val-
ues falling into each of 100 bins [(b − 1)/100, b/100) for b = 1, . . . , 100.
Report the p-value for this test.
b) Continue the previous part, until you have obtained 10,000 p-values, each
based on consecutive blocks of 10,000 sample values from the stream.
Make a histogram of these p-values and report the p-value of this second
level test.
This exercise can be done with the C language version of MT19937. If you use
a translation into a different language, then indicate which one you have used.
3.8 (Use of streams). Let U_i be a sequence of IID U(0, 1) random variables for
integers i, including i ≤ 0. Let T = min{i ≥ 1 | Σ_{j=1}^{i} √U_j ≥ 40} be the first
future time that a barrier is crossed. Similarly, let S = min{i ≥ 0 | Σ_{j=0}^{i} U_{−j}² ≥
20} represent an event defined by the past and present at time 0. These events
are separated by time T + S. (They are determined via T + S + 1 of the U_i.)
a) Estimate µ_{40,20} = E(T + S) by Monte Carlo and give a confidence interval.
Use n = 1000 simulations.
b) Replace the threshold 40 in the definition of T by 30. Estimate µ_{30,20} =
E(T + S) by Monte Carlo and give a confidence interval, using n = 1000
simulations. Explain how you ensure that the same past events are used
for both µ_{40,20} and µ_{30,20}. Explain how you ensure that the same future
points are used in both estimates.