\int_{-\infty}^{x} \frac{dy}{\pi(1 + y^2)}
    = \frac{1}{\pi}\Bigl[\tan^{-1} y\Bigr]_{-\infty}^{x}
    = \frac{1}{2} + \frac{1}{\pi}\,\tan^{-1} x.                              (2)
If we set u = F(x) for the Cauchy CDF, then we can solve to obtain
x = tan(π(u − 1/2)). The argument of the tangent is uniformly distributed
on [−π/2, π/2]. Exactly the same effect can be obtained using an angle uniformly
distributed on [0, π], and so we can obtain a drawing from the Cauchy
distribution as tan πU, where U comes straight from the RNG as a U(0, 1)
variate.
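As a concrete illustration, a Cauchy drawing can be produced directly from this inverse-CDF recipe. The little function below is only a sketch of ours, not part of the library: it uses a std::mt19937 from <random> as a stand-in for the chapter's rng->uniform().

#include <cmath>
#include <random>

// Sketch: Cauchy drawing by inversion of the CDF (2).
double cauchy_by_inversion(std::mt19937& gen)
{
    const double pi = 3.14159265358979323846;
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    double u = unif(gen);              // U(0,1) variate
    return std::tan(pi*(u - 0.5));     // x = tan(pi*(u - 1/2))
}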
In fact, we can spare ourselves the need to compute a trigonometric function
at all, at the cost of generating at least two uniform variates. The trick is
identical to the one used for the same purpose with the Box-Muller algorithm.
We thus have the following code for a random tangent. The returned value
from the function Cauchy is a drawing from the distribution of tan πU, as
above.
⟨Cauchy drawing⟩
double Random::Cauchy()
{
    double rr, v1, v2;
    do                                  // draw a point uniformly inside the unit disc
    {
        v1 = 2*rng->uniform()-1;
        v2 = 2*rng->uniform()-1;
        rr = v1*v1+v2*v2;
    }
    while (rr >= 1);
    return v1/v2;                       // the ratio is the tangent of a uniformly distributed angle
}
This code, too, will most often be inlined.
3. The Gamma Distribution
The gamma distribution is characterised by the following PDF, for an argument x > 0, and a parameter a > 0:

f(x) = \frac{x^{a-1} e^{-x}}{\Gamma(a)}.                              (3)
We refer to this as the Γ(a) distribution. Its expectation is

\frac{1}{\Gamma(a)} \int_{0}^{\infty} x^{a} e^{-x}\, dx = \frac{\Gamma(a+1)}{\Gamma(a)} = a,
and, in fact, all of its uncentred moments can be computed similarly. If X ∼ Γ(a), we have

E(X^{n}) = \frac{1}{\Gamma(a)} \int_{0}^{\infty} x^{n+a-1} e^{-x}\, dx = \frac{\Gamma(a+n)}{\Gamma(a)}.

If n is an integer, the ratio of gamma functions reduces to just the product
a(a + 1) · · · (a + n − 1).
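For instance, with a = 2 and n = 3 the product is 2 · 3 · 4 = 24, which is indeed Γ(5)/Γ(2). A minimal numerical check of the identity, using std::lgamma (our choice here; the library itself uses the Cephes lgam), might look as follows.

#include <cmath>
#include <cstdio>

// Sketch: compare Gamma(a+n)/Gamma(a) with the product a(a+1)...(a+n-1).
int main()
{
    double a = 2.5;
    int n = 4;
    double ratio = std::exp(std::lgamma(a + n) - std::lgamma(a));
    double prod = 1;
    for (int i = 0; i < n; ++i) prod *= a + i;
    std::printf("ratio = %.12g, product = %.12g\n", ratio, prod);
}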
The simplest case is that in which a is a positive integer. For a = 1, we just
obtain the exponential distribution with expectation 1. The characteristic
function of that distribution is

\phi(t) \equiv \int_{0}^{\infty} e^{-x} e^{itx}\, dx = (1 - it)^{-1},

and so the characteristic function of the sum of n independent exponentially
distributed variables is (1 − it)^{−n}, and the corresponding density is given by
the inverse Laplace transform

\frac{1}{2\pi} \int_{C} e^{-itx} (1 - it)^{-n}\, dt,                              (4)
where the contour C runs parallel to the real axis, above all singularities of
the integrand. The only singularity here is at t = −i, where there is a pole of
order n. Express the integrand in (4) in terms of s ≡ t + i; we obtain

(-i)^{-n}\, e^{-x}\, e^{-isx} / s^{n}.                              (5)
The residue at t = −i is the coefficient of s^{−1} in the Laurent expansion of (5)
around s = 0. This is equal to (−i)^{−n} e^{−x} times the coefficient of s^{n−1} in the
expansion of e^{−isx}. This coefficient is (−ix)^{n−1}/(n − 1)!. Thus the contour
integral in (4), which is −2πi times the residue (since, on closing C in the lower
half-plane, we obtain a contour traversed in the negative direction), is equal to

-2\pi i\, (-i)^{-n} e^{-x}\, (-i)^{n-1} x^{n-1} / (n-1)!  =  2\pi\, e^{-x} x^{n-1} / (n-1)!,
and so the density is

\frac{x^{n-1} e^{-x}}{(n-1)!} = \frac{x^{n-1} e^{-x}}{\Gamma(n)},                              (6)

that is, the density of the gamma distribution with parameter n.
It follows that, in order to generate a variate from the Γ(n) distribution, we
can add n independent exponential variates of expectation 1. In order to avoid
computing n logarithms, we can multiply uniform variates, and just take one
logarithm at the end. In the following code, we use a instead of n, since that
is the generic symbol for the parameter.
⟨gamma variate for small integer parameter⟩
double gam = 1;
for (size_t i = 0; i < a; ++i) gam *= rng->uniform();   // product of a uniform variates
gam = -log(gam);                                        // one log gives the sum of a exponentials
If the parameter is not an integer, or if it is larger than some cutoff, then it is
recommended by Press et al. (1992) (Numerical Recipes) that one should use
a rejection method. This method is reviewed in Appendix B of this Chapter.
The function used to contain the area under the gamma density is a multiple of
the Cauchy PDF (1), also known as the Lorentzian function, with a translated
and rescaled argument:

g(x) = \frac{c_0}{1 + (x - x_0)^2 / a_0^2},                              (7)

where x_0, c_0, and a_0 have to be chosen appropriately. The integral of (7)
from −∞ to x is, from (2),
\pi a_0 c_0 \Bigl[\frac{1}{2} + \frac{1}{\pi}\,\tan^{-1}\bigl((x - x_0)/a_0\bigr)\Bigr].                              (8)

As x → ∞, this integral tends to π a_0 c_0.
The rejection method begins by generating a variate of which the density is
g(x) / ∫ g(y) dy. It is clear from (8) that, if x follows the distribution with
this density, then (x − x_0)/a_0 follows the Cauchy distribution, and so x itself
can be generated by the formula x = x_0 + a_0 tan πU, with U ∼ U(0, 1).
For simplicity, we now consider the gamma distribution with parameter a + 1.
We require that a > 0, and so we must restrict ourselves to values of the
actual parameter greater than 1. It can be seen from (3) that, for parameter
values less than 1, the density tends to infinity at the origin. Even if it is possible
to draw from the distribution with that density, it would need a wholly different
approach from the one used here.
The generated variate x is rejected if a further U(0, 1) variate is greater than
the ratio of the desired gamma density at x to g(x). This ratio is

\frac{x^{a} e^{-x}\bigl(1 + (x - x_0)^2/a_0^2\bigr)}{c_0\, a!}
    = \frac{x^{a} e^{-x}(1 + t^2)}{c_0\, a!},                              (9)
where t ≡ tan πU is the random tangent. For the method to work correctly,
we require that a_0, c_0, and x_0 should be such that this ratio is never greater
than 1 for positive x, but, subject to that constraint, we wish to make the
rejection rate as small as possible.
The mode of g(x) in (7) is at x = x_0, while that of the gamma density (3) for
parameter a + 1 is at x = a, as we can see by differentiating the density with
respect to x:

\frac{d}{dx}\,\frac{x^{a} e^{-x}}{a!} = \frac{e^{-x} x^{a-1}(a - x)}{a!},                              (10)
and so we choose x_0 = a. Then, in order to minimise the area between the
graphs of the density and g, we set c_0 so that the values of the two functions
coincide at the maximum. The maximum value of the density, that is, its
value at x = a, is a^a e^{−a}/a!, and the maximum value of g(x) is c_0, so that we
set c_0 = a^a e^{−a}/a! and thus obtain for the ratio (9) the value

\Bigl(\frac{x}{a}\Bigr)^{a} e^{a-x}\,(1 + t^2).
Since a = x_0, we can observe that a − x = x_0 − x = −a_0 t.
It remains to fix a_0, and this we can do by matching the second derivatives of
the two functions at the maximum. The second derivative of the density is,
from (10),

\frac{d^2}{dx^2}\,\frac{e^{-x} x^{a}}{a!}
    = \frac{e^{-x} x^{a-2}}{a!}\bigl((x - a)^2 - a\bigr)
    = -\frac{e^{-a} a^{a-1}}{a!}  \quad \text{at } x = a.                              (11)
The derivative of g(x) is

g'(x) = -\frac{2 c_0}{a_0^2}\,
    \frac{x - x_0}{\bigl(1 + ((x - x_0)/a_0)^2\bigr)^{2}},
and so the second derivative is

g''(x) = -\frac{2 c_0}{a_0^2}\Biggl[
    \frac{1}{\bigl(1 + ((x - x_0)/a_0)^2\bigr)^{2}}
    - \frac{4 (x - x_0)^2 / a_0^2}{\bigl(1 + ((x - x_0)/a_0)^2\bigr)^{3}}\Biggr],
which is just −2c_0/a_0^2 at x = x_0, or, evaluating at our chosen value of c_0,
−2a^a e^{−a}/(a! a_0^2). Equating this to the other second derivative, in (11), we
obtain a_0 = (2a)^{1/2}. In fact, we must set a_0 to the slightly higher value
of (2a + 1)^{1/2}, in order for the Lorentzian function to be always greater than
the gamma density.
The above analysis was done using the factorial function; clearly nothing is
changed by replacing a! by Γ(a + 1). We skip a proof that, with the above
choices of the parameters a_0, c_0, and x_0, the value of the function g(x) is
greater than the density at x, except at the maximum, where they coincide.
It is easy to check numerically that this condition is satisfied for a wide range
of values of a. Figure R.1 shows the graphs of the two functions for a = 3.5,
and other values of a give topologically equivalent graphs.
Figure R.1  The gamma density for a = 3.5 and the Lorentzian function
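The numerical check mentioned above is straightforward to script. The sketch below, our own illustration using std::lgamma rather than the Cephes lgam used later in the chapter, scans a grid and reports the smallest value of g(x) − f(x), which should never be negative.

#include <cmath>
#include <cstdio>

// Sketch: check that the Lorentzian g(x) dominates the gamma density
// f(x) = x^a e^{-x} / Gamma(a+1), with x0 = a, c0 = a^a e^{-a}/Gamma(a+1),
// and a0 = sqrt(2a+1), as chosen in the text.
int main()
{
    double a = 3.5;                       // so the gamma parameter is a+1 = 4.5
    double a0 = std::sqrt(2*a + 1);
    double logc0 = a*std::log(a) - a - std::lgamma(a + 1);
    double worst = 1e30;                  // smallest value of g(x) - f(x) found
    for (double x = 1e-6; x < 60; x += 0.001)
    {
        double f = std::exp(a*std::log(x) - x - std::lgamma(a + 1));
        double g = std::exp(logc0)/(1 + (x - a)*(x - a)/(a0*a0));
        if (g - f < worst) worst = g - f;
    }
    std::printf("smallest g(x) - f(x) on the grid: %g\n", worst);
}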
We are now ready to code the rejection method for non-integer values of a
greater than 1, and integer values greater than 9. This cutoff will be justified
by some timing experiments in a moment. We generate x by the formula
x = x_0 + a_0 t, where t is generated by a call to Cauchy so as to obtain a
random tangent.
⟨gamma variate for other parameters⟩
a = a-1;                       // the analysis above writes the parameter as a+1
double a0 = sqrt(2*a+1);
double x;
while (true)
{
    double t = Cauchy();       // random tangent
    x = a+a0*t;                // candidate x = x0 + a0*t, with x0 = a
    if (x <= 0) continue;
    double check = (1+t*t)*exp(-a0*t+a*log(x/a));   // ratio (9), using a - x = -a0*t
    if (check >= rng->uniform()) break;
}
We can now put together the function gamma. We select the appropriate
method of generating the variate on the basis of the argument. An argument
less than 1 leads to an exception, of class BadGamma.
⟨gamma definition⟩
double Random::gamma(double a)
{
    if (a < 1) throw BadGamma();
    if (a < 10 && a == floor(a))
    {
        ⟨gamma variate for small integer parameter⟩
        return gam;
    }
    ⟨gamma variate for other parameters⟩
    return x;
}
An experiment was performed in order to check whether the distribution of
the generated numbers was correct, and to nd the best integer value of the
parameter at which to stop cumulating exponential variates and switch to
the rejection method. The results in Table R2.1 are timings in seconds for
generating a million variates by cumulating exponentials (cum) or by the
rejection method (rej). The processor used was an Athlon at 1 GHz.
Table R2.1 Timings for generating gamma variates
a cum rej a cum rej
1 0.46 * 11 1.60 1.45
2 0.57 1.25 12 1.74 1.46
3 0.68 1.32 15 2.07 1.48
5 0.94 1.38 50 6.04 1.53
7 1.16 1.42 100 * 1.55
9 1.39 1.43 1000 * 1.57
10 1.51 1.46 10000 * 1.60
The stars * indicate experiments not performed because they would have
taken a long time and been uninformative, or, for the rejection method
with a = 1, infeasible. The choice, made in the above code, of a cutoff
of 10 for moving to the rejection method, is clearly justified by the data in
the Table. The slow increase in the time taken by the rejection method as
a grows is presumably due to a higher rejection rate.
The code for the function gamma is in the file gamma.cc.
⟨gamma.cc⟩
#include <rng/randdist.h>
⟨gamma definition⟩
As can be seen, it includes a header file randdist.h, which will be set up in
due course.
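For reference, a timing experiment of the kind reported in Table R2.1 can be scripted along the following lines. The loop count, the clock, and the use of std::mt19937 in place of the chapter's rng are our own choices, and the absolute figures naturally depend on the machine.

#include <chrono>
#include <cmath>
#include <cstdio>
#include <random>

// Sketch: time a million gamma(a) drawings by the cumulated-exponential method.
int main()
{
    std::mt19937 gen(12345);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    const int a = 5;                    // small integer gamma parameter
    const int N = 1000000;
    double sink = 0;                    // keeps the compiler from optimising the loop away
    auto start = std::chrono::steady_clock::now();
    for (int rep = 0; rep < N; ++rep)
    {
        double gam = 1;
        for (int i = 0; i < a; ++i) gam *= unif(gen);
        sink += -std::log(gam);
    }
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    std::printf("time for %d drawings: %.2f s (sample mean %.3f)\n", N, elapsed.count(), sink/N);
}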
4. The Poisson Distribution
Although the Poisson distribution is closely related to the gamma distribution,
it is quite different from it, in the sense that it is a discrete distribution,
taking on only nonnegative integers as values. Like the gamma distribution,
it depends on just one parameter, which can be taken as the expectation of
the distribution. The probabilities for parameter a are

\Pr(Y = n) = \frac{e^{-a} a^{n}}{n!}.

It is clear that \sum_{n=0}^{\infty} \Pr(Y = n) = 1, on noting that the Taylor expansion
of e^{a} is \sum_{n=0}^{\infty} a^{n}/n!.
There is a precise relation between the Poisson distribution with expectation a
and the exponential distribution with expectation 1. It is that, if an exponen-
tial variate is interpreted as a waiting time for some sort of event, the Poisson
variate is the greatest number of such independent waiting times that sum to
less than a. That is, if events occur independently, with the waiting time for
each event starting immediately on the realisation of the preceding event, it
is the number of events that occur in time a.
In order to prove this relation, notice that the probability that the Poisson
variate is equal to n is the probability that the sum of n independent exponential
variates, E_i say, for i = 1, . . . , n, is less than a, while the sum of these plus
another independent exponential variate, E_{n+1}, exceeds a. Let Z ≡ \sum_{i=1}^{n} E_i.
It follows that Z ∼ Γ(n); recall the result in Section 3. Thus the probability
we wish to calculate is Pr(Z < a ∧ Z + E_{n+1} > a). Conditional on Z, we
obtain, for values of Z less than a, that

\Pr(Z + E_{n+1} > a \mid Z) = \Pr(E_{n+1} > a - Z \mid Z) = e^{-a+Z},
since E_{n+1} is independent of Z. To obtain the desired probability, we integrate
this last expression times the density (6) of Z over the admissible range
of [0, a]. This yields

e^{-a} \int_{0}^{a} e^{x}\, \frac{x^{n-1} e^{-x}}{\Gamma(n)}\, dx
    = e^{-a} \int_{0}^{a} \frac{x^{n-1}\, dx}{(n-1)!}
    = \frac{e^{-a} a^{n}}{n!},
as required.
Poisson variates can be generated directly using this result. We generate
independent exponential variates of expectation 1 until their sum exceeds a,
at which point one fewer than the number of exponential variates generated
is the Poisson variate. We saw in the first section of this Chapter that an
exponential variate is the negative of the log of a uniform U(0, 1) variate, and
so, as we did with the gamma distribution, we can save time by computing a
cumulative product of uniforms, and comparing the result with e^{−a}. As with
the gamma distribution, we want to use this method only for small enough
values of a. In the following code, ea has the value e^{−a}.
⟨Poisson variate for small parameter⟩
double g = 1;
int n = -1;
do
{ g *= rng->uniform(); ++n; }    // multiply uniforms until the product drops below e^{-a}
while (g > ea);
For large enough values of a, we can again use a rejection method. The
covering function g is once more of the Lorentzian form (7). However, since the
Poisson variate is discrete, we must construct a continuous random variable
from which we can easily derive the Poisson variate. The trick is to let the
density of the continuous variable be equal to Pr(X = n) on the interval
[n, n + 1]. Having generated a drawing x of this variable, it is clear that the
integer part of x, which we write as ⌊x⌋, follows the Poisson distribution.
The greatest probability for the Poisson distribution with parameter a is
achieved at n = ⌊a⌋. This can be seen by noting that, for n < a,
a^{n−1}/(n − 1)! < a^{n}/n! and, for n > a, a^{n}/n! > a^{n+1}/(n + 1)!. We therefore
set the mode of the Lorentzian function, x_0, equal to a.
For the parameter a_0, we cannot match second derivatives in the discrete
setting, but by making approximate arguments, it is not hard to see that the
choice a_0 = (2a)^{1/2} is just as appropriate as for the gamma distribution.
The parameter that causes the most trouble is c_0, the maximum value of g.
It will not do, unfortunately, to set it equal to the maximum of the continuous
function that the Poisson probabilities interpolate, because the discrete
probabilities can stick out of the continuous envelope. The phenomenon is
illustrated in Figure R.2, where the step function is the constructed density,
shown along with the obvious, but incorrect, Lorentzian function.
Figure R.2  Poisson probabilities for a = 8.1 and a bad Lorentzian function
Instead, c_0 is determined by moving to two integers, one a little to the right
and the other a little to the left of a, and computing the Lorentzian function
for c_0 = 1 at these points. The maximum Poisson probability, p_max ≡ Pr(Y = ⌊a⌋),
is also computed, and then c_0 is set equal to p_max divided by the smaller
of the two Lorentzian values. This ensures that the Lorentzian function is
above the constructed density at the two integer arguments chosen.
Since the constructed density puts mass to the right of the integer it corresponds
to, the argument to the right should be at a distance from ⌊a⌋ one
greater than the distance from the argument on the left. A dimensional argument
shows that this second distance should be roughly proportional to a,
and extensive numerical investigation shows that a suitable value is 1 plus
a/100 rounded to the nearest integer. This leads to the state of affairs shown
in Figure R.3.
The critical values of a for this method are around a = 100m + 49 and a =
100m+51, for a positive integer m, for it is here that the distance out from a
skips by an integer. Experiments were undertaken for various values of m.
Results are similar. By way of illustration, the closest distance between the
graph of the constructed density and the Lorentzian function was 0.000135
for a = 849 and 0.000136 for a = 851. In all the experiments, the Lorentzian
function always lay above the constructed density, as required for the rejection
method.
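A check of this kind is easy to script: on each interval [n, n+1] the constructed density equals Pr(Y = n), and since the Lorentzian decreases away from its mode at a, its minimum over the interval is attained at the endpoint farther from a. The program below is our own illustration, with std::lgamma standing in for the Cephes lgam; it reports the smallest gap between the Lorentzian and the constructed density.

#include <algorithm>
#include <cmath>
#include <cstdio>

// Sketch: check that c0/(1 + ((x-a)/a0)^2) dominates the step-function
// density equal to Pr(Y = n) = e^{-a} a^n / n! on [n, n+1].
int main()
{
    double a = 849;                      // one of the critical values discussed in the text
    double a0 = std::sqrt(2*a);
    double n0 = std::floor(a);
    double pmax = std::exp(-a + n0*std::log(a) - std::lgamma(n0 + 1));
    double ninc = 1 + std::floor(a/100 + 0.5);
    double lu = 1/(1 + (n0 + ninc + 1 - a)*(n0 + ninc + 1 - a)/(a0*a0));
    double ll = 1/(1 + (n0 - ninc - a)*(n0 - ninc - a)/(a0*a0));
    double c0 = std::max(pmax/lu, pmax/ll);

    double gap = 1e30;
    for (double n = 0; n < 3*a; ++n)
    {
        double pn = std::exp(-a + n*std::log(a) - std::lgamma(n + 1));
        // endpoint of [n, n+1] farther from the mode a
        double xfar = (std::fabs(n - a) > std::fabs(n + 1 - a)) ? n : n + 1;
        double g = c0/(1 + (xfar - a)*(xfar - a)/(a0*a0));
        gap = std::min(gap, g - pn);
    }
    std::printf("smallest gap for a = %g: %g\n", a, gap);   // should be positive
}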
Figure R.3  Poisson probabilities for a = 8.1 and a good Lorentzian function

The parameters a_0 and c_0 are computed as follows.

⟨compute parameters for Poisson rejection method⟩
a0 = sqrt(2*a);
double n = floor(a);
double pmax = exp(-a+n*loga-lgam(n+1));
double ninc = 1+floor(a/100+0.5);
double lu = 1/(1+(n+ninc+1-a)*(n+ninc+1-a)/(a0*a0));
double ll = 1/(1+(n-ninc-a)*(n-ninc-a)/(a0*a0));
c0 = std::max(pmax/lu,pmax/ll);
Note that, in order to evaluate the factorial conveniently, we use the function
lgam from the Cephes mathematical library which, for argument z, returns
log Γ(z).
The rejection method can now be coded. The variable check is the ratio of
the Poisson probability at the discrete integer-valued variate that may or
may not be accepted, to the value of the Lorentzian function at the continuous
variate first drawn. The Lorentz value is c_0/(1 + t^2), where t is the random
tangent, and the Poisson probability is e^{−a} a^{x}/x!, where x is the candidate
integer. Thus the ratio is

\frac{(1 + t^2)\,\exp(-a + x \log a)}{c_0\, x!}.

In the following code, the ratio is computed and x is accepted only if a uniform
variate is no greater than it.
⟨Poisson variate for large parameter⟩
double x;
while (true)
{
    double t = Cauchy();
    x = floor(a+a0*t);
    if (x < 0) continue;
    double check = (1+t*t)*exp(-a+x*loga-lgam(x+1))/c0;
    if (check >= rng->uniform()) break;
}
Since there are several quantities, depending on the parameter a, that must
be calculated, a trick we can use is to have static variables in the function, one
for a, and the others for the various functions of a. We need compute these
functions only when the function is called with an argument different from
that for the last call. For the method using cumulated exponential variates,
the only such function is e^{−a}, but there are others for the rejection method.
First, we have the variable for the parameter a itself.

⟨first static variable for Poisson⟩
static double a = 1;

The default argument will be set to 1, hence the choice of the initial value
for the static variable. Next, within the part of the function for cumulated
exponentials, we have just one variable, e^{−a}, that we initialise to e^{−1}, of
course.
⟨second static variable for Poisson⟩
static double ea = exp(-1);

For the rejection method, we need a_0 = (2a)^{1/2}, log a, and, of course, c_0.

⟨third set of static variables for Poisson⟩
static double loga = 0, a0 = sqrt(2), c0 = 1;

We thus obtain the function Poisson. It returns a size_t, since that type
is exactly correct for the Poisson distribution. Since the parameter must be
positive, an exception, BadPoisson, is thrown if a nonpositive argument is
supplied.
⟨Poisson definition⟩
size_t Random::Poisson(double mean)
{
    if (mean <= 0) throw BadPoisson();
    ⟨first static variable for Poisson⟩
    if (mean < 20)
    {
        ⟨second static variable for Poisson⟩
        if (mean != a) { a = mean; ea = exp(-a); }
        ⟨Poisson variate for small parameter⟩
        return size_t(n);
    }
    ⟨third set of static variables for Poisson⟩
    if (mean != a)
    {
        a = mean; loga = log(mean);
        ⟨compute parameters for Poisson rejection method⟩
    }
    ⟨Poisson variate for large parameter⟩
    return size_t(x);
}
Table R2.2 Timings for generating Poisson variates
a cum rej a cum rej
1 0.27 3.33 20 2.00 1.99
2 0.37 2.78 21 2.10 1.96
4 0.55 2.29 22 2.19 1.96
6 0.73 2.13 23 2.26 1.95
9 1.00 2.05 25 2.46 1.94
12 1.26 2.03 50 * 2.01
15 1.53 2.00 100 * 1.96
18 1.82 2.01 1000 * 2.04
19 1.92 1.97 10000 * 2.83
In the above code, the cutoff argument for flipping from the first method to
the rejection method is 20. This choice was based on the results in Table R2.2,
where timings are given in seconds for a million drawings using the method
of cumulating exponentials (cum) and the rejection method (rej).
The stars * indicate that the experiment was not performed, being considered
uninformative. It is of interest to note that timings for the rejection
method, although showing no trend as a varies, are not uniform or monotonic.
This is a consequence of the way in which c_0 is chosen, leading to rejection
rates that do not vary smoothly. In addition, computing time for the rejection
method appears to be a little longer than for the gamma rejection method.
Finally, it should also be noted that the variates generated did indeed have a
distribution quite indistinguishable from the true Poisson distribution.
The code for Poisson is in source file Poisson.cc.
⟨Poisson.cc⟩
#include <algorithm>
#include <rng/randdist.h>
extern "C" { double lgam(double); }
⟨Poisson definition⟩
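By way of illustration, once the pointer rng has been set to one of the generators of Chapter 1 (that is the job of rngpointer.cc, not reproduced here), the new functions can be called as in the following small test program of ours; it is not part of the library, and it must be compiled with the include path and linked against librandom.a.

#include <cstdio>
#include <rng/randdist.h>

int main()
{
    // Assumes Random::rng already points to a working generator.
    double g = Random::gamma(4.5);      // gamma variate with parameter 4.5
    size_t k = Random::Poisson(8.1);    // Poisson variate with expectation 8.1
    std::printf("gamma: %f, Poisson: %zu\n", g, k);
}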
5. The Chi-squared, Beta, t, and Fisher Distributions
The chi-squared distribution with n degrees of freedom is just the distribution
of a variate equal to twice a variate that follows the gamma distribution with
parameter n/2. Its CDF and PDF can be derived from the gamma density (3).
Let Z ∼ Γ(n/2) and let Y = 2Z, so that Y ∼ χ²(n). Then we have

\Pr(Y \le x) = \Pr(Z \le x/2)
    = \frac{1}{\Gamma(n/2)} \int_{0}^{x/2} t^{n/2-1} e^{-t}\, dt
    = \frac{1}{2^{n/2}\,\Gamma(n/2)} \int_{0}^{x} s^{n/2-1} e^{-s/2}\, ds.
The last expression above is the usual formula cited for the chi-squared CDF.
The PDF also follows immediately on removing the integral sign.
An alternative definition of the chi-squared distribution with n degrees of
freedom is the distribution of the sum of the squares of n standard normal
variates. This can be seen by first calculating the characteristic function of a
squared N(0, 1) random variable. Let X ∼ N(0, 1). Then we have

E\bigl(e^{itX^2}\bigr) = \frac{1}{(2\pi)^{1/2}} \int_{-\infty}^{\infty}
    \exp\Bigl(-\frac{x^2}{2}(1 - 2it)\Bigr)\, dx = (1 - 2it)^{-1/2},

where the change of variables y = x(1 − 2it)^{1/2} is used to obtain the last
equality. It follows that the characteristic function of the sum of n independent
N(0, 1) variables squared is (1 − 2it)^{−n/2}. Now the characteristic function of
the gamma distribution with positive integer parameter n is (1 − it)^{−n}; see
Section 3. In fact, this formula applies as well for a non-integer parameter.
Thus the characteristic function of twice a gamma variate with parameter n/2
is equal to that of a chi-squared variate with n degrees of freedom. For non-integer
degrees of freedom, we use the same definition without fear, since the
gamma distribution is defined for any parameter no smaller than 1.
To make it clear that the degrees-of-freedom number need not be an integer,
we now denote it as ν. If ν is an even integer, and is small enough, we can
conveniently generate a χ²(ν) variate as twice the sum of ν/2 exponential
variates of expectation 1. If ν is an odd integer, we generate twice the sum
of (ν − 1)/2 such exponentials and then add the square of a standard normal.
Otherwise, we just call the function gamma developed earlier. The cutoff of 10
for a gamma parameter before resorting to the rejection method translates
into a cutoff of 20 for the degrees of freedom. In the test that selects the direct
method rather than a call to gamma, we also check that the degrees of freedom
are at least 2; otherwise, a negative integer argument would give rise to an
almost infinite loop. The code of the function chi2 is as follows, where we
reuse the trick of multiplying uniforms and taking only one logarithm.
⟨chi-squared drawing⟩
double Random::chi2(double df)
{
    if (df >= 2 && df < 20 && df == floor(df))
    {
        size_t n = size_t(df);
        size_t m = n/2;              // If n is odd, this is (n-1)/2.
        double chi = 1;
        for (size_t i = 0; i < m; ++i) chi *= rng->uniform();
        chi = -2*log(chi);
        if (n%2)                     // i.e., if n is odd
        {
            double sn = rng->normal();
            chi += sn*sn;
        }
        return chi;
    }
    return 2*gamma(df/2);
}
Note that no checking for a bad degrees of freedom number is done; the call
to gamma takes care of that. I was not sure whether to inline this function,
but testing showed that timings were essentially identical whether an existing
function was called or an inline function used. It is therefore appropriate to
put the code in the source file chi2.cc.
⟨chi2.cc⟩
#include <rng/randdist.h>
⟨chi-squared drawing⟩
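The equivalence of the two definitions is also easy to illustrate by simulation: the sum of n squared standard normals should have mean n and variance 2n, exactly like twice a Γ(n/2) variate. The following stand-alone check is our own and uses <random> rather than the library.

#include <cstdio>
#include <random>

// Sketch: moments of the sum of n squared standard normals.
int main()
{
    std::mt19937 gen(7);
    std::normal_distribution<double> normal(0.0, 1.0);
    const int n = 7, N = 200000;
    double sum = 0, sumsq = 0;
    for (int rep = 0; rep < N; ++rep)
    {
        double y = 0;
        for (int i = 0; i < n; ++i) { double z = normal(gen); y += z*z; }
        sum += y; sumsq += y*y;
    }
    double mean = sum/N, var = sumsq/N - mean*mean;
    std::printf("mean %.3f (expect %d), variance %.3f (expect %d)\n", mean, n, var, 2*n);
}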
The Fisher distribution is very easy to code, since the F(n, m) distribution is
by definition the ratio of two independent chi-squared variates, each divided
by its degrees of freedom number. Thus we get the following definition for the
function Fisher.
⟨Fisher drawing⟩
double Random::Fisher(double n, double m)
{ return chi2(n)*m/(chi2(m)*n); }
This code can, of course, be inlined.
The Student's t distribution with n degrees of freedom is by definition the
ratio of a standard normal variate to the square root of an independent chi-squared
variate with n degrees of freedom, divided by n. That is to say,
t(n) = N(0, 1)/(χ²(n)/n)^{1/2}. The code, again to be inlined, defines a function
Student.
⟨Student drawing⟩
double Random::Student(double n)
{ return rng->normal()/sqrt(chi2(n)/n); }
Lastly, the beta distribution is a two-parameter distribution that is defined
in terms of the gamma distribution. The relationship is

B(a, b) = \frac{\Gamma(a)}{\Gamma(a) + \Gamma(b)}, \qquad a, b > 0,

where the two gamma variates are mutually independent. Clearly, a beta variate
can lie only in the [0, 1] interval. Consider the CDF of B(a, b). Evaluated
at x ∈ [0, 1], it is
\Pr\bigl(B(a, b) \le x\bigr)
    = \Pr\Bigl(\frac{\Gamma(a)}{\Gamma(a) + \Gamma(b)} \le x\Bigr)
    = \Pr\bigl(\Gamma(a) \le x\,\Gamma(b)/(1 - x)\bigr)
    = E\Bigl[\Pr\bigl(\Gamma(a) \le x\,\Gamma(b)/(1 - x) \bigm| \Gamma(b)\bigr)\Bigr]
    = E\Bigl[\frac{1}{\Gamma(a)} \int_{0}^{x\Gamma(b)/(1-x)} y^{a-1} e^{-y}\, dy\Bigr]
    = \frac{1}{\Gamma(a)\,\Gamma(b)} \int_{0}^{\infty} dz\, z^{b-1} e^{-z}
      \int_{0}^{xz/(1-x)} y^{a-1} e^{-y}\, dy,
where we twice use the expression (3) of the density of the gamma distribution.
The derivative of the above expression with respect to x is the density
of B(a, b), which is, therefore,

\frac{1}{(1 - x)^2}\,\frac{1}{\Gamma(a)\,\Gamma(b)}
    \int_{0}^{\infty} dz\, z^{b-1} e^{-z}\, z
    \Bigl(\frac{xz}{1 - x}\Bigr)^{a-1} \exp\Bigl(-\frac{xz}{1 - x}\Bigr)
    = \frac{(1 - x)^{b-1} x^{a-1}}{\Gamma(a)\,\Gamma(b)}
      \int_{0}^{\infty} d\Bigl(\frac{z}{1 - x}\Bigr)
      \Bigl(\frac{z}{1 - x}\Bigr)^{a+b-1} \exp\Bigl(-\frac{z}{1 - x}\Bigr)
    = \frac{\Gamma(a + b)}{\Gamma(a)\,\Gamma(b)}\, x^{a-1} (1 - x)^{b-1}.
It is clear from this last expression why the distribution is called the beta
distribution. Note too that, if z ∼ B(a, b), then 1 − z ∼ B(b, a), since

\Pr(1 - z \le x) = \Pr(z \ge 1 - x)
    = \frac{\Gamma(a + b)}{\Gamma(a)\,\Gamma(b)} \int_{1-x}^{1} y^{a-1} (1 - y)^{b-1}\, dy
    = \frac{\Gamma(a + b)}{\Gamma(a)\,\Gamma(b)} \int_{0}^{x} z^{b-1} (1 - z)^{a-1}\, dz,

where, for the last step, we set z = 1 − y. The last expression is, as required,
the probability that a variate distributed as B(b, a) should be less than x.
The code for beta can once again be inlined.
⟨beta drawing⟩
double Random::beta(double a, double b)
{ double ga = gamma(a); return ga/(ga+gamma(b)); }
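As a quick sanity check, the mean of a B(a, b) variate should be a/(a + b). A small simulation along the following lines, our own test program assuming the library has been built and an RNG installed behind Random::rng, can confirm this.

#include <cstdio>
#include <rng/randdist.h>

int main()
{
    double a = 2.0, b = 5.0, sum = 0;
    const int N = 1000000;
    for (int i = 0; i < N; ++i) sum += Random::beta(a, b);
    // The sample mean should be close to a/(a+b) = 2/7.
    std::printf("sample mean %.4f, theoretical %.4f\n", sum/N, a/(a + b));
}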
6. The Header File
The functions developed here are declared in the header file randdist.h,
which also contains the definitions of the inline functions, the declaration of
the pointer rng to the RNG to be used, and of the exceptions that can be
thrown by these functions.
⟨randdist.h⟩
#ifndef _RNG_RANDDIST_H
#define _RNG_RANDDIST_H
#include <cmath>
#include <rng/random.h>
namespace Random
{
    ⟨randdist function declarations⟩
    extern RNG* rng;
    ⟨randdist exceptions⟩
}
inline
⟨Cauchy drawing⟩
inline
⟨exponential drawing⟩
inline
⟨logistic drawing⟩
inline
⟨geometric drawing⟩
inline
⟨Fisher drawing⟩
inline
⟨Student drawing⟩
inline
⟨beta drawing⟩
#endif
The list of functions defined in this chapter contains the functions exponential,
logistic, geometric, Cauchy, gamma, Poisson, chi2, Fisher, Student,
and beta.
⟨randdist function declarations⟩
double exponential(double mean = 1);
double logistic();
size_t geometric(double prob = 0.5);
double Cauchy();
double gamma(double a = 1);
size_t Poisson(double a = 1);
double chi2(double);
double Fisher(double n, double m);
double Student(double n);
double beta(double a, double b);
The exceptions are BadGeometric, BadGamma, and BadPoisson.
⟨randdist exceptions⟩
struct BadGeometric {};
struct BadGamma {};
struct BadPoisson {};
7. The Makefile for the Full Random Library
The library librand.a contains the code only for the different RNGs. The
library librandom.a contains that code plus the code of this chapter. It seems
useful to have two libraries to avoid overburdening executables that need only
the former. On the other hand, the code of this chapter always requires at least
one RNG. A new Makefile is thus required for the new library, Makefile.dist.
⟨Makefile.dist⟩
CC = g++
CFLAGS = -c -O3 -fPIC -Wall
INCDIR = ../include
OBJS = random.o rng250.o jgm.o mersenne.o kiss.o rngpointer.o
DISTOBJS = gamma.o Poisson.o chi2.o
%.o: %.cc
$(CC) $(CFLAGS) -I $(INCDIR) $<
librandom.a: $(OBJS) $(DISTOBJS)
ar -rv $@ $(OBJS) $(DISTOBJS)
ranlib $@
$(OBJS): $(INCDIR)/rng/random.h
$(DISTOBJS): $(INCDIR)/rng/random.h
clean:
rm -f librandom.a $(OBJS) $(DISTOBJS) *.bak core
Appendix A: the Logistic Distribution
The CDF of the logistic distribution, repeated here for convenience, is
Λ(x) = 1/(1 + e^{−x}). Solving the equation u = Λ(x) gives x = log(u/(1 − u)). The
moment-generating function of the logistic distribution is, by definition, the
expectation of exp(sx), where x is a logistic variate, and s is the argument of
the function. Since u ∼ U(0, 1), we see that the moment-generating function
is
E\bigl[\exp(sx)\bigr]
    = E\Bigl[\exp\Bigl(s \log\frac{u}{1 - u}\Bigr)\Bigr]
    = E\bigl[u^{s}(1 - u)^{-s}\bigr]
    = \int_{0}^{1} u^{s}(1 - u)^{-s}\, du = B(s + 1, 1 - s),
where B is the beta function, defined as

B(a, b) \equiv \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a + b)}.

The moment-generating function is therefore equal to

\frac{\Gamma(1 + s)\,\Gamma(1 - s)}{\Gamma(2)} = \Gamma(1 + s)\,\Gamma(1 - s),

since Γ(2) = 1! = 1.
The characteristic function is the moment-generating function evaluated at
s = it for real t, and the cumulant-generating function (cgf) is the log of the
characteristic function. The cgf is, therefore,

\log \Gamma(1 + it) + \log \Gamma(1 - it).
Cumulants can be calculated by calculating the series expansion of the cgf. In
Abramowitz and Stegun (1965), henceforth AS, we find, in equation (6.1.33),
the following expansion, valid for complex z with |z| < 2:

\log \Gamma(1 + z) = -\log(1 + z) + z(1 - \gamma)
    + \sum_{n=2}^{\infty} (-1)^{n}\bigl[\zeta(n) - 1\bigr] z^{n}/n,
where γ is Euler's constant, and ζ is the Riemann zeta function. The cgf can
thus be written as

-\log(1 + t^2)
    + \sum_{n=2}^{\infty} (-1)^{n}\,\frac{\zeta(n) - 1}{n}\,\bigl[(it)^{n} + (-it)^{n}\bigr].
Since i^{n} = (−i)^{n} for even n, while i^{n} = −(−i)^{n} for odd n, it can be seen that
the terms with n odd vanish in the sum above. We can therefore sum over
only the even integers, as follows:
-\log(1 + t^2) + \sum_{n=1}^{\infty} \frac{\zeta(2n) - 1}{n}\,(-1)^{n} t^{2n}
    = \sum_{n=1}^{\infty} (-1)^{n}\,\frac{t^{2n}}{n}
      + \sum_{n=1}^{\infty} \bigl[\zeta(2n) - 1\bigr](-1)^{n}\,\frac{t^{2n}}{n}
    = \sum_{n=1}^{\infty} (-1)^{n}\,\zeta(2n)\,\frac{t^{2n}}{n}.
From this it is clear that the odd cumulants do indeed vanish.
In order to compute the even cumulants, recall that the n-th cumulant, κ_n,
is the coefficient of (it)^{n}/n! in the expansion of the cgf. Thus κ_{2n}, for
n = 1, 2, . . ., is given by the expression (2n)! ζ(2n)/n. Now, from AS equation
(23.2.16), we learn that, for the relevant values of n,
tion (23.2.16), we learn that, for the relevant values of n,
(2n) =
(2)
2n
2(2n)!
[B
2n
[,
where B
2n
is the Bernoulli number of order 2n. Consequently,
2n
=
(2)
2n
2n
[B
2n
[.
For n = 1, we find from Table 23.1 of AS that B_2 = 1/6, and so κ_2, the
second cumulant, which is also the variance, is κ_2 = (2π)²/(2 · 6) = π²/3.
This accords with the well-known result.
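This value is easy to confirm by simulation: draw logistic variates by the inversion x = log(u/(1 − u)) and compare the sample variance with π²/3. The short program below is our own check, with the standard library's uniform generator standing in for the chapter's RNG.

#include <cmath>
#include <cstdio>
#include <random>

int main()
{
    std::mt19937 gen(42);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    const int N = 1000000;
    double sum = 0, sumsq = 0;
    for (int i = 0; i < N; ++i)
    {
        double u = unif(gen);
        double x = std::log(u/(1 - u));   // logistic variate by inversion
        sum += x; sumsq += x*x;
    }
    double var = sumsq/N - (sum/N)*(sum/N);
    double pi = 3.14159265358979323846;
    std::printf("sample variance %.4f, pi^2/3 = %.4f\n", var, pi*pi/3);
}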
Appendix B: the Rejection Method
In order to make a drawing from a distribution characterised by a PDF f(x),
with corresponding CDF F(x), the rejection method makes use of a function
g(x) that satisfies g(x) ≥ f(x) everywhere in the support of f(x). The
support of g(x) may be bounded or unbounded; for ease of notation we write ±∞
for the bounds in what follows.
The second requirement on the function g is that we should be able to invert
its integral. Let G(x) ≡ ∫_{−∞}^{x} g(y) dy / G_0, where G_0 ≡ ∫_{−∞}^{∞} g(y) dy, so
that G is the CDF of the distribution with density g(x)/G_0. The method first
generates a drawing X from this distribution, by inverting G, and then an
independent variate U_2 ∼ U(0, 1). The drawing X is accepted, and returned
as the variate Z, if U_2 ≤ f(X)/g(X); otherwise the pair is rejected and the
procedure is repeated. We wish to show that the CDF of Z,

\Pr\bigl(X \le x \bigm| U_2 \le f(X)/g(X)\bigr),                              (12)

is just F(x). Since U_2 is independent of X, the probability that X ≤ x and
that X is accepted is

\Pr\bigl(X \le x \;\wedge\; U_2 \le f(X)/g(X)\bigr)
    = \int_{-\infty}^{x} \frac{f(y)}{g(y)}\, dG(y)
    = \frac{1}{G_0} \int_{-\infty}^{x} \frac{f(y)}{g(y)}\, g(y)\, dy
    = \frac{1}{G_0} \int_{-\infty}^{x} f(y)\, dy
    = \frac{F(x)}{G_0},
where the step to the second line follows directly from the definition of G, and
the last step is just the definition of the CDF F(x).
By exactly similar reasoning, we see that

\Pr\bigl(U_2 \le f(X)/g(X)\bigr) = E\bigl[f(X)/g(X)\bigr] = 1/G_0.

The expression 1/G_0 is, of course, just the unconditional acceptance rate of
the method.
We can now compute the CDF (12). By Bayes' Theorem, it is

\Pr\bigl(X \le x \bigm| U_2 \le f(X)/g(X)\bigr)
    = \frac{\Pr\bigl(X \le x \;\wedge\; U_2 \le f(X)/g(X)\bigr)}
           {\Pr\bigl(U_2 \le f(X)/g(X)\bigr)}
    = \frac{F(x)/G_0}{1/G_0} = F(x).

Thus Z, as generated by the rejection method, does indeed have CDF F(x),
as required.
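To make the recipe concrete, here is a generic sketch of the accept-reject loop in the style of this chapter's code. The densities f and g and the inverse of G are passed in as std::function objects, and a std::mt19937 stands in for the RNG; none of this is part of the library, it merely restates the algorithm proved correct above.

#include <functional>
#include <random>

// Sketch: generic rejection sampler. Ginv maps a U(0,1) variate to a drawing
// from the distribution with density g/G0; g must dominate f everywhere.
double reject_sample(const std::function<double(double)>& f,
                     const std::function<double(double)>& g,
                     const std::function<double(double)>& Ginv,
                     std::mt19937& gen)
{
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    while (true)
    {
        double x = Ginv(unif(gen));        // candidate from the covering distribution
        double u2 = unif(gen);             // independent U(0,1)
        if (u2 <= f(x)/g(x)) return x;     // accept with probability f(x)/g(x)
    }
}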
References
Abramowitz, Milton, and Irene A. Stegun (1965). Handbook of Mathematical
Functions, Dover reprint of original NBS publication.
Golomb, Solomon W. (1967). Shift Register Sequences, San Francisco, Holden-Day.
Knuth, Donald E. (1998). The Art of Computer Programming, 3rd edition,
Reading, Massachusetts, Addison-Wesley.
Matsumoto, M. and T. Nishimura (1998). "Mersenne Twister: a 623-Dimensionally
Equidistributed Uniform Pseudo-Random Number Generator",
ACM Transactions on Modeling and Computer Simulation, 8, 3-30.
Press, W. H., B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling (1992).
Numerical Recipes in C, Cambridge University Press, Cambridge.
Sutter, Herb (2001). "Virtuality", C/C++ Users Journal, 19 (September),
53-58.
General Index
Abramowitz, Milton, 35
Bernoulli numbers, 16, 36
beta distribution, 31-32
beta function, 35
Box-Muller algorithm, 2-3, 17
Cauchy distribution, 17-18
Cephes mathematical library, 26
Cephes mathematical library functions
  lgam, 26
chi-squared distribution, 29-31
DIEHARD test suite for RNGs, 4, 9
distribution
  Cauchy, 17-18
  chi-squared, 29-31
  exponential, 15
  Fisher, 30
  gamma, 18-23
  geometric, 16-17
  logistic, 16, 35-36
  Poisson, 23-29
Euler's constant, 35
exponential distribution, 15
Fisher distribution, 30
gamma distribution, 18-23
geometric distribution, 16-17
Golomb, Solomon W., 5
KISS, 9-10
Knuth, Donald E., 3, 16
librand.a, 13, 33
librandom.a, 33
logistic distribution, 16, 35-36
  cumulants, 35-36
logit model, 16
Lorentzian function, 19
MacKinnon, James G., 4-5
Marsaglia, George, 4, 9
Matsumoto, Makoto, 7
Mersenne twister, 7-9
multiplicative congruential generators, 3-5
Nishimura, Takuji, 7
Numerical Recipes, 19
Poisson distribution, 23-29
Random, 1-13, 15-33
Random Number Generators (RNG), 1-13
random tangent, 17-18
rejection method
  for gamma variate, 19-22
  for Poisson variate, 24-27
  theory, 37-38
Riemann zeta function, 35
standard normal distribution, 2
Stegun, Irene A., 35
Student's t distribution, 30-31
Sutter, Herb, 1
timing
  of gamma variate generation, 22-23
  of Poisson variate generation, 28
uniform distribution, 2
virtuality, 1
zeta function
  of Riemann, 35
Type Index
C Functions
  lgam, 26
Libraries
  Cephes mathematical library, 26
  librand.a, 13, 33
  librandom.a, 33
Namespaces
  Random, 1-13, 15-33
Random
  Classes
    cmRNG, 3-4, 11
    jgm, 4-5, 11
    KISS, 9-10, 13
    MT19937, 8-9, 12
    RNG, 1-2, 10, 15
    RNG250, 6-7, 12
  Exceptions
    BadGamma, 22, 33
    BadGeometric, 17, 33
    BadPoisson, 27-28, 33
  Functions
    beta, 32-33
    Cauchy, 17-18, 21, 33
    chi2, 29-30, 33
    exponential, 15, 33
    Fisher, 30, 33
    gamma, 22-23, 29, 33
    geometric, 17, 33
    logistic, 16, 33
    Poisson, 27-29, 33
    Student, 30-31, 33
  Member Functions
    cmRNG::Dobits, 3-4
    cmRNG::Douniform, 3-4
    jgm::generate, 5
    KISS::generate, 10
    MT19937::Doputseed, 7
    MT19937::Doseedlength, 7
    MT19937::Dosetseed, 9
    MT19937::generate, 9
    MT19937::MT19937, 8-9
    RNG250::Doputseed, 6
    RNG250::Doseedlength, 6
    RNG250::Dosetseed, 6
    RNG250::generate, 6-7
    RNG250::RNG250, 6-7
    RNG::bits, 2
    RNG::Dobits, 2
    RNG::normal, 2-3
    RNG::putseed, 2
    RNG::seedlength, 2
    RNG::setseed, 2
    RNG::uniform, 2
  Objects
    rng, 10-11, 13, 15, 32-33
  Source files
    chi2.cc, 30
    cmrng.h, 11
    gamma.cc, 23
    jgm.cc, 12
    jgm.h, 11-12
    kiss.cc, 13
    kiss.h, 13
    Makefile, 13-14
    Makefile.dist, 33-34
    mersenne.cc, 12-13
    mersenne.h, 12
    Poisson.cc, 28-29
    randdist.h, 23, 32-33
    random.cc, 11
    random.h, 10-11
    rng250.cc, 12
    rng250.h, 12
    rngpointer.cc, 13
std
  Functions
    rand, 6
    srand, 6
Index of Code Segments
beta drawing
  definition, 32
  use, 32
Cauchy drawing
  definition, 18
  use, 32
chi-squared drawing
  definition, 30
  use, 30
chi2.cc
  definition, 30
cmRNG definition
  definition, 3
  use, 11
cmRNG virtual functions
  definition, 3
  use, 11
cmrng.h
  definition, 11
compute parameters for Poisson rejection method
  definition, 25
  use, 28
exponential drawing
  definition, 15
  use, 32
first static variable for Poisson
  definition, 27
  use, 27
Fisher drawing
  definition, 30
  use, 32
gamma definition
  definition, 22
  use, 23
gamma variate for other parameters
  definition, 22
  use, 22
gamma variate for small integer parameter
  definition, 19
  use, 22
gamma.cc
  definition, 23
geometric drawing
  definition, 17
  use, 32
jgm definition
  definition, 4
  use, 11
jgm generate
  definition, 5
  use, 12
jgm.cc
  definition, 12
jgm.h
  definition, 11
KISS definition
  definition, 9
  use, 13
KISS generate
  definition, 10
  use, 13
kiss.cc
  definition, 13
kiss.h
  definition, 13
logistic drawing
  definition, 16
  use, 32
Makefile.dist
  definition, 33
mersenne.cc
  definition, 13
mersenne.h
  definition, 12
MT19937 constants
  definition, 8
  use, 13
MT19937 constructor
  definition, 8
  use, 13
MT19937 definition
  definition, 8
  use, 12
MT19937 generate
  definition, 9
  use, 13
MT19937 setseed
  definition, 9
  use, 13
Poisson definition
  definition, 27
  use, 28
Poisson variate for large parameter
  definition, 26
  use, 28
Poisson variate for small parameter
  definition, 24
  use, 27
Poisson.cc
  definition, 28
randdist exceptions
  definition, 33
  use, 32
randdist function declarations
  definition, 33
  use, 32
randdist.h
  definition, 32
random.cc
  definition, 11
random.h
  definition, 10
RNG definition
  definition, 1
  use, 11
RNG Makefile
  definition, 13
RNG normal generator
  definition, 2
  use, 11
RNG250 constructor
  definition, 6
  use, 12
RNG250 definition
  definition, 6
  use, 12
RNG250 generate
  definition, 7
  use, 12
rng250.cc
  definition, 12
rng250.h
  definition, 12
rngpointer.cc
  definition, 13
second static variable for Poisson
  definition, 27
  use, 27
Student drawing
  definition, 31
  use, 32
third set of static variables for Poisson
  definition, 27
  use, 27
Random Numbers
Ects source code
Russell Davidson
© January, 2005
Table of Contents
Table of Contents iii
1 Random Number Generators 1
  1 Introduction 1
  2 The Interface 1
  3 Multiplicative Congruential Generators 3
  4 Shift Register Sequences 5
  5 Newer Generators 7
  6 Source Files 10
2 Random Numbers from Various Distributions 15
  1 The Exponential, Logistic, and Geometric Distributions 15
  2 The Cauchy Distribution 17
  3 The Gamma Distribution 18
  4 The Poisson Distribution 23
  5 The Chi-squared, Beta, t, and Fisher Distributions 29
  6 The Header File 32
  7 The Makefile for the Full Random Library 33
Appendix A: the Logistic Distribution 35
Appendix B: the Rejection Method 37
References 39