Chapter 02
RANDOM NUMBERS
Simulation using an electronic device requires algorithms that produce streams of numbers
that a user cannot distinguish from numbers generated by a truly random process. This is
normally done with an algorithm that generates numbers between 0 and 1, called a random
number generator. This chapter looks at some of the properties and applications of such
generators.
5. Most importantly, the generated random numbers should pass statistical tests for uniformity
and independence.
One method that is widely used, with enhancements, as a pseudo-random number generator
is known as the linear congruential method. It was initially proposed by D. H. Lehmer in 1951,
and produces a sequence of integers $X_1, X_2, \ldots$ between zero and $m - 1$ according to the
following recursive relationship:
$$X_{i+1} = (aX_i + c) \bmod m, \quad \text{for } i = 0, 1, \ldots, K - 1.$$
The initial value $X_0$ is called the seed, $a$ is called the constant multiplier, $c$ is the increment,
and $m$ is the modulus. The random numbers generated, called the stream, are then calculated as
$$R_i = \frac{X_i}{m}, \quad \text{for } i = 1, 2, \ldots, K.$$
In order to take advantage of the fastest method of doing arithmetic on a computer, the modulus
is usually chosen to be a power of 2. If the increment 𝑐 is zero, the form is known as the
multiplicative congruential method. If 𝑐 ≠ 0, the form is known as a mixed congruential
method.
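The recursion can be sketched in a few lines of Python. This is only an illustration, not the generator of any particular package: the function names `lcg` and `stream` are hypothetical, the seed is taken to be $X_0$, and the parameters $a = 9$, $c = 3$, $m = 16$ are borrowed from Problem 2(e) purely because they are small enough to check by hand.

```python
from typing import Iterator, List


def lcg(seed: int, a: int, c: int, m: int) -> Iterator[int]:
    """Yield X_1, X_2, ... from X_{i+1} = (a * X_i + c) mod m, with X_0 = seed."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x


def stream(seed: int, a: int, c: int, m: int, k: int) -> List[float]:
    """Return the first k random numbers R_i = X_i / m."""
    gen = lcg(seed, a, c, m)
    return [next(gen) / m for _ in range(k)]


# Illustrative parameters only (assumed, not from a production generator).
xs = lcg(0, 9, 3, 16)
first_five = [next(xs) for _ in range(5)]
print(first_five)               # [3, 14, 1, 12, 15]
print(stream(0, 9, 3, 16, 3))   # [0.1875, 0.875, 0.0625]
```

Setting `c = 0` in the same function gives the multiplicative congruential form.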
The random number generator in Example 2.2 has been extensively tested over the years
since it was first proposed. So far, it has been able to withstand this intense inspection, and
remains one of the best random number generators currently known. Most software packages,
however, do not advertise what technique they use for random number generation. Thus we
cannot be certain that this method is used in a spreadsheet. For this reason, let us make a brief
inspection of some of the numbers generated by Excel.
SIMULATION NOTES 2-3
It is often helpful to graph a histogram of the frequencies. To do this, select the frequency
values and use the Column button to open the dialog box shown in Table 2.4. The graph of the
results is given in Figure 2.1. The labels on the horizontal axis have been changed, the legend has
been hidden and the data values have been added above the bars.
Since each of the 10 subintervals should be equally likely, we expect about 10 values to fall into
each interval. Some of the results, however, seem a bit far from 10. But how far is too far? We
need a statistical test to help us decide. Two goodness-of-fit tests are often used in
these types of situations: the Chi-square and the Kolmogorov-Smirnov goodness-of-fit tests.
The null hypothesis, $H_0$, for each is that the underlying distribution is uniform on [0, 1].
To use a Chi-square goodness-of-fit test, we need to divide the interval into subintervals so
that the expected number of values that fall into each is at least 5. Since the expected number of
values in each of the subintervals of length 0.1 is 10, we may use the same intervals as above.
The statistic, which has a Chi-squared distribution with 9 degrees of freedom, is then defined as
$$\chi^2 = \sum_i \frac{(e_i - o_i)^2}{e_i},$$
where $e_i$ is the expected number of values in the $i$th subinterval and $o_i$ is the observed number of
values in the $i$th subinterval. To facilitate this calculation, we construct the spreadsheet shown in
Table 2.5. Having the spreadsheet sum the last column gives a Chi-square statistic of 10.2. The
10 cells give us 9 degrees of freedom. The Excel function =CHIINV(0.05, 9) gives the critical
value of 16.92. Since the value of the statistic should be small whenever the fit is acceptable, and
10.2 is less than 16.92, we do not reject $H_0$ at the 5% significance level (95% confidence). The
p-value of this test may be found to be 0.335, using the function =CHIDIST() as shown in Table 2.5.
This means that the significance level $\alpha$ would have to be at least 0.335 before we would
reject $H_0$ and say that the underlying distribution is statistically different from uniform. Thus
we have no evidence against the claim that the numbers came from a Uniform[0, 1] distribution.
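The same calculation can be sketched outside a spreadsheet. The Python below is a minimal illustration under stated assumptions: the function name `chi_square_uniform` is hypothetical, the sample is an artificial, evenly spaced one (not the data behind Table 2.5), and the critical value 16.92 is simply the one quoted above for 9 degrees of freedom.

```python
from typing import List, Sequence, Tuple


def chi_square_uniform(values: Sequence[float], bins: int = 10) -> Tuple[List[int], float]:
    """Return (observed counts, chi-square statistic) for H0: Uniform[0, 1]."""
    observed = [0] * bins
    for v in values:
        # Clamp so that a value of exactly 1.0 lands in the last subinterval.
        observed[min(int(v * bins), bins - 1)] += 1
    expected = len(values) / bins  # e_i is the same for every subinterval
    statistic = sum((expected - o) ** 2 / expected for o in observed)
    return observed, statistic


# Artificial sample: 100 evenly spaced points, so every count equals 10.
even = [(i + 0.5) / 100 for i in range(100)]
counts, stat = chi_square_uniform(even)
print(counts, stat)

CRITICAL_9_DF = 16.92  # =CHIINV(0.05, 9), as quoted in the text
print("reject H0" if stat > CRITICAL_9_DF else "do not reject H0")
```

Replacing `even` with 100 generated numbers reproduces the frequency count and the sum of the last spreadsheet column in one step.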
As the sample size $N$ becomes larger, the empirical distribution $G_N$ should become a better
approximation of $F$, if the null hypothesis is true. The K-S statistic
$$D = \max_x \left| F(x) - G_N(x) \right|$$
measures the largest absolute difference between the uniform CDF and the empirical
distribution that was observed.
Table 2.6 shows a K-S test done on a sample of $n = 20$ numbers generated using Excel’s
random number generator. The generated numbers in column B were copied and, using the Paste
Special feature, the values were pasted into column C. The numbers were then sorted. If the
sample came from a Uniform[0, 1] distribution, the first datum should lie between 0 and 1/20,
the second between 1/20 and 2/20, etc. The $D$ statistic is the maximum of all the differences
$j/n - R_j$ and $R_j - (j - 1)/n$, where $R_j$ is the $j$th smallest value, and measures the largest
variation from the endpoints of the interval in which the datum $R_j$ should lie. Notice that no
parameters were estimated from the sample. When this is the case, a single table suffices for
critical values of $D$ with degrees of freedom $n$. An accurate approximation scheme which adjusts
$D$ using $n$ and eliminates the degrees of freedom from the table was devised by Stephens. The
adjusted statistic, $adjD$, is given by the formula
$$adjD = \left( \sqrt{n} + 0.12 + \frac{0.11}{\sqrt{n}} \right) D.$$
The better the fit, the smaller $D$ (and hence $adjD$) should be. In the example, the value
calculated for $adjD$ is 1.411. This is larger than the critical values for 0.15, 0.1 and 0.05 and
smaller than the critical values for 0.025 and 0.01. Thus the p-value for this test is larger than
0.025 and smaller than 0.05. At the $\alpha = 0.05$ significance level (95% confidence) we reject the
hypothesis that the numbers were drawn from a population having a Uniform[0, 1] distribution.
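As a rough sketch of the K-S calculation, the Python below sorts a sample, computes $D$ from the two sets of endpoint differences, and applies the Stephens adjustment. The function name `ks_uniform` and the five sample values are assumptions chosen for illustration; they are not the data of Table 2.6.

```python
import math
from typing import Sequence, Tuple


def ks_uniform(sample: Sequence[float]) -> Tuple[float, float]:
    """Return (D, adjD) for H0: the sample comes from Uniform[0, 1]."""
    r = sorted(sample)
    n = len(r)
    # Deviation of each sorted value R_j from the endpoints j/n and (j - 1)/n.
    d_plus = max(j / n - r[j - 1] for j in range(1, n + 1))
    d_minus = max(r[j - 1] - (j - 1) / n for j in range(1, n + 1))
    d = max(d_plus, d_minus)
    adj_d = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    return d, adj_d


# Hypothetical sample of n = 5 values.
d, adj_d = ks_uniform([0.44, 0.81, 0.14, 0.05, 0.93])
print(round(d, 4), round(adj_d, 4))
```

The resulting `adj_d` would then be compared with the tabulated critical values in the same way as the value 1.411 above.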
2.4 Independence
Deciding whether or not the generated numbers are independent is even more elusive than
the uniformity issue. Many statistical tests have been devised to test aspects of
independence. One of the early random number generators was tested for correlation between
consecutive numbers and thought to be a good tool. However, someone eventually tested for
correlation between numbers that were separated by one other (1st and 3rd, 2nd and 4th, etc.).
A strong correlation was found, and the tool is no longer used.
For our purposes, we will simply use the correlation function to test the spreadsheet’s
random number generator. In column B, 100 random numbers were generated using the
generator. The correlation function =CORREL() was then used to find the correlation coefficients
between column B and each of columns C and D. The equation for the correlation coefficient is
$$\rho = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}.$$
If the correlation coefficient is close to 0, there is little correlation between the numbers. If it is
close to one, there is a large positive correlation, and if it is close to negative one there is a large
negative correlation. The results of the calculation are shown in Table 2.8. Since the correlation
between Column B and Column C is 0.168, which is small, there is no indication of dependence
between consecutive numbers. Similarly, the correlation between Column B and Column D is
−0.182, which is also small.
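A comparable check can be sketched in Python. The code below is an assumption-laden illustration rather than a reproduction of Table 2.8: it uses Python's built-in `random` module in place of the spreadsheet generator, the seed is arbitrary, and the slices follow the layout described in Problem 7 (1st through 100th against 2nd through 101st, then against 3rd through 102nd).

```python
import math
import random
from typing import Sequence


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


random.seed(2)  # arbitrary seed so the run is repeatable
numbers = [random.random() for _ in range(102)]
lag1 = correlation(numbers[0:100], numbers[1:101])  # consecutive numbers
lag2 = correlation(numbers[0:100], numbers[2:102])  # separated by one other
print(lag1, lag2)
```

Values of `lag1` and `lag2` near zero give no indication of dependence at those two lags; they say nothing about other lags or other kinds of dependence.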
Any random number generator must pass these, as well as a larger battery of other tests
before it can be certified for extensive application. The National Institute of Standards and
Technology (NIST) maintains a website containing documents and links which provide an
excellent starting point for anyone who wishes to learn more about random number generators.
Reference: Reber, James C., Pseudorandom Number Generators in a Four-Bit Computer
System, Indiana University of Pennsylvania, Indiana, PA.
Problems
A random number generator will be called “suitable to use” if it has the maximum possible cycle length.
1. Investigate the effect that the seed has on the random number generator given in Example
2.1, by determining the resulting sequence for each possible seed value. Determine the length
of the cycle for each seed, and state whether or not that seed would be “suitable to use.”
2. Investigate the effect that the multiplier and the increment have on the random number
generator given in Example 2.1. Which of the following would be “suitable to use”? The
modulus in each case is to be 16.
a. 𝑎 = 2, 𝑐 = 5,
b. 𝑎 = 3, 𝑐 = 1,
c. 𝑎 = 7, 𝑐 = 5,
d. 𝑎 = 15, 𝑐 = 4,
e. 𝑎 = 9, 𝑐 = 3.
3. Determine conditions under which all 16 numbers are generated by a linear congruential
random number generator using modulus 16.
4. Using the multiplicative congruential method, find the period of the generator for $a = 13$,
$m = 2^6 = 64$, and seeds 1, 2, 3, and 4.
5. Generate 100 random numbers using a spreadsheet’s random number generator. Obtain the
frequencies of the values generated in each subinterval of length 0.1, and obtain a graph of
the histogram of the frequencies. Do a Chi-square test for uniformity and write a paragraph
on your analysis.
6. Generate 100 random numbers using a spreadsheet’s random number generator. Do a K-S
test for uniformity on the numbers and write a paragraph on your analysis.
7. Generate 102 random numbers using the spreadsheet random number generator. Determine
the correlation coefficient between the 1st through 100th and the 2nd through 101st. Then
determine the correlation coefficient between the 1st through 100th and the 3rd through 102nd.
Write a paragraph on your analysis.
Partial Solutions
1. All have cycle length 16, which is the maximum possible. Thus all are suitable.
2. a. Not suitable, ends by repeating 11 only. b. Not suitable, cycle length 8. c. Not suitable,
cycle length 4. d. Not suitable, cycle length 2. e. Suitable, cycle length 16 (maximum
possible).
4. 16, 8, 16, 4
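The cycle lengths quoted in solutions 2 and 4 can be checked with a short script. This is a sketch only: the function name `cycle_length` is hypothetical, and seed 1 is an arbitrary starting value for the Problem 2 cases (a different seed can land on a different cycle).

```python
def cycle_length(seed: int, a: int, c: int, m: int) -> int:
    """Length of the cycle that X_{i+1} = (a * X_i + c) mod m eventually enters."""
    first_seen = {}  # value -> step at which it first appeared
    x, step = seed, 0
    while x not in first_seen:
        first_seen[x] = step
        x = (a * x + c) % m
        step += 1
    return step - first_seen[x]


# Problem 2 (modulus 16), starting from the arbitrary seed 1.
problem_2 = [cycle_length(1, a, c, 16) for a, c in [(2, 5), (3, 1), (7, 5), (15, 4), (9, 3)]]
print(problem_2)  # [1, 8, 4, 2, 16]

# Problem 4: multiplicative generator with a = 13, m = 64.
problem_4 = [cycle_length(seed, 13, 0, 64) for seed in (1, 2, 3, 4)]
print(problem_4)  # [16, 8, 16, 4]
```

A cycle length of 1 for case (a) corresponds to the sequence settling on the single value 11.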