Univariate Distributions
$$\phi(x) = \frac{1}{c\,A(s)}\; f\!\left(\frac{x-l}{c};\,s\right), \qquad L_b \le x \le L_t \tag{1}$$
where the $L$s give the range of applicability: usually from $-\infty$ to $\infty$, often from 0 to $\infty$, sometimes only over a finite range. The function $f$ is what gives the actual shape of the pdf; the constant $A(s)$ (the area under the function) is needed to normalize the integral of this function, over the limits given, to 1. We call $s$ the shape parameter; not all pdfs have one. Almost all do have two others:
l  A location parameter, which has the effect of translating the pdf on the x-axis. This appears mostly for functions on $(-\infty, \infty)$.

c  A scale parameter, which has the effect of stretching or compressing the pdf along the x-axis.

3.1. Uniform Distribution

The simplest example is the uniform distribution, with probability density function

$$\phi(x) = \begin{cases} 1 & 0 \le x \le 1 \\ 0 & x < 0 \text{ or } x > 1 \end{cases}$$
and cumulative distribution function (cdf)

$$\Phi(x) = \begin{cases} 0 & x \le 0 \\ x & 0 \le x \le 1 \\ 1 & x \ge 1 \end{cases}$$
In this particular case these formulas are the conventional form, and a random variable X with this distribution is said to satisfy $X \sim U(0,1)$. We can of course apply l and c to move the nonzero part to any location, and make it of any finite length, in which case we would have

$$\phi(x) = \frac{1}{c} \qquad \text{for } l < x < l + c$$
This distribution is the basis for many others, and is used in various simulation techniques, some of which we will discuss in Chapter 5. A collection of uniformly distributed random numbers is what is supposed to be produced by repeated calls to a function named ran (or some similar name) in your favorite language or software package; we say supposed to be because the actual numbers produced depart from this in two ways. The first, harmless, one is that any such computer function has to actually return a deterministic set of numbers designed to look random; hence these are really pseudorandom numbers. The second and harmful departure is that many of the computer routines that have actually been used do not meet this requirement well. If you do not know the algorithm for the function you are using, you should use another one whose algorithm you do know. There are several good candidates, and an excellent discussion, in Press et al. (1992),¹ though much has been done since.
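As a concrete illustration (a minimal sketch in Python, assuming the NumPy library; the seed and parameter values are arbitrary), we can draw uniform deviates from a generator whose algorithm is published, and then apply l and c to move them onto any interval:

```python
import numpy as np

# default_rng uses PCG64, a documented algorithm, which satisfies the
# advice above: know what your generator actually does
rng = np.random.default_rng(seed=42)
u = rng.random(100_000)          # uniform deviates on [0, 1)

l, c = 3.0, 2.0                  # example location and scale parameters
x = l + c * u                    # uniform on [l, l + c)

print(u.mean())                  # should be near 0.5
print(x.min(), x.max())          # should be near 3.0 and 5.0
```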
Figure 3.1
¹ Press, W. H., S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery (1992). Numerical Recipes in Fortran: The Art of Scientific Computing, 2nd Ed. (Cambridge: Cambridge University Press).
3.2. Normal (Gaussian) Distribution
We have already met this pdf, but present it again to show our different forms for a pdf, namely

$$\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} = \frac{1}{c\sqrt{2\pi}}\, e^{-(x-l)^2/2c^2} = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2}$$
where the location parameter (conventionally $\mu$) is called the mean and the scale parameter (conventionally $\sigma$) is called the standard deviation. Figure 3.1 shows the pdf and cdf for this distribution, with the dashed lines showing where $\Phi$ attains the values 0.05, 0.50, and 0.95. The 0.05 and 0.95 lines bound 0.9 (90%) of the area under the pdf (often called, since this is a density function, the mass). A summary table of mass against x-value (plus and minus) would include the following:
x       Mass fraction
±1.00   0.68
±1.65   0.90
±1.96   0.95
±2.58   0.99
±3.29   0.999
±3.90   0.9999
so we would, for example, expect that out of 1000 rvs with this distribution, no more than 1 would be more than $3.3\sigma$ away from the central location $\mu$. We described in Chapter 2 why this pdf is special, namely that the central limit theorem says that (for example) sums of rvs approach this distribution somewhat irrespective of their own distribution. It very often simplifies the theory to assume Gaussian behavior (we shall see a number of examples of this), and this behavior is at least approximately satisfied by many datasets. But you should never casually assume it.
This is probably a good place to introduce another term of statistical jargon, namely what a standardized random variable is. We say that any normally distributed random variable $X \sim N(\mu, \sigma)$ may be transformed to the standard normal distribution by creating the new standardized rv $Z = (X - \mu)/\sigma$, which is distributed as $N(0,1)$. We will see other examples of such transformations.
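As a quick numerical check (a sketch in Python with NumPy; the values of $\mu$ and $\sigma$ here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 10.0, 3.0
x = rng.normal(mu, sigma, 100_000)   # X ~ N(mu, sigma)

z = (x - mu) / sigma                 # the standardized rv Z = (X - mu)/sigma
print(z.mean(), z.std())             # should be near 0 and 1: Z ~ N(0, 1)
```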
3.2.1. Generating Normal Deviates
The title of this section contains one of those terms (like Love waves) liable to bring a smile until you get used to it; but deviates is the standard term for what we have called a collection of random numbers. There are quite a few ways of producing random numbers with a Gaussian pdf; with the advent of very large simulations it has become a matter of interest to use methods that are fast but that will also produce the relatively rare large values reliably.² In this section we describe two related methods that allow us to use the procedures developed in the last chapter for finding the pdf of functions of random variables.
Both methods start by getting two rvs with a uniform distribution; this pair of numbers specifies a point in a square with $0 < x_1 < 1$ and $0 < x_2 < 1$; clearly the distribution of these points is a uniform bivariate one, with $\phi(x_1, x_2) = 1$ over this square region. We term these two uniform rvs $U_1$ and $U_2$, though we use $x_1$ and $x_2$ for the actual numbers.
The first method, known as the Box-Muller transform, computes two values given by
² Thomas, D. B., W. Luk, P. H. W. Leong, and J. D. Villasenor (2007). Gaussian random number generators, ACM Computing Surveys, 39, #11.
$$y_1 = \sqrt{-2\ln x_1}\,\cos 2\pi x_2 \qquad \text{and} \qquad y_2 = \sqrt{-2\ln x_1}\,\sin 2\pi x_2$$

and we assert that $y_1$ and $y_2$ are each distributed as a Gaussian.
To show this, we have to find the pdfs of

$$Y_1 = \sqrt{-2\ln U_1}\,\cos 2\pi U_2 \qquad \text{and} \qquad Y_2 = \sqrt{-2\ln U_1}\,\sin 2\pi U_2$$
The first point is to note that if we write the joint pdf in polar coordinates, as $\phi_Y(r, \theta)$, the distribution in angle is uniform because $U_2$ is. The pdf in radius is found from equation (22) of Chapter 2, with $W = g(R) = \sqrt{-2\ln R}$ and R uniformly distributed. Then $g^{-1}(w) = e^{-w^2/2}$, and the application of (22), and the uniformity of R, gives

$$\phi_W = \phi_R\!\left(g^{-1}(w)\right) \left|\frac{d}{dw}\, g^{-1}(w)\right| = w\, e^{-w^2/2} \tag{2}$$
If $Y_1$ and $Y_2$ are iid variables with a Gaussian distribution, the joint pdf will be the product

$$\frac{1}{2\pi}\, e^{-(x_1^2 + x_2^2)/2} = \frac{1}{2\pi}\, e^{-r^2/2}$$
Although this contains r only, it remains a joint pdf. If we integrate over all $\theta$ we get a pdf in r alone:

$$\phi_R = \int_0^{2\pi} \phi_Y\, r\, d\theta = r\, e^{-r^2/2} \tag{3}$$
Equations (2) and (3) give the same result: the pdf found for the transformed variable $W = g(R) = \sqrt{-2\ln R}$ for R uniformly distributed matches that for the radius of the bivariate Gaussian. So we see that taking this function of a uniformly distributed rv, and then multiplying the result by the unit vector in a random direction (which is what $(\cos 2\pi U_2, \sin 2\pi U_2)$ is) will produce a pair of Gaussian rvs.
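In code the Box-Muller transform takes only a few lines (a sketch in Python with NumPy; the function name box_muller is ours):

```python
import numpy as np

def box_muller(n, rng=None):
    """Return 2n Gaussian deviates built from n pairs of uniform deviates."""
    rng = rng or np.random.default_rng()
    x1 = rng.random(n)
    x2 = rng.random(n)
    # rng.random() can return exactly 0, which would make log(x1) blow up
    x1 = np.where(x1 == 0.0, np.finfo(float).tiny, x1)
    r = np.sqrt(-2.0 * np.log(x1))        # the radius sqrt(-2 ln U1)
    return np.concatenate([r * np.cos(2.0 * np.pi * x2),
                           r * np.sin(2.0 * np.pi * x2)])

y = box_muller(50_000)
print(y.mean(), y.std())   # should be near 0 and 1
```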
Another approach to this computation is to observe that if the points are uniformly distributed over the square, they also will be uniformly distributed over the circle inscribed within it, which they can be limited to if we remove all points for which $r^2 = x_1^2 + x_2^2 \ge 1$. Then we form
$$y_1 = \frac{x_1}{r}\sqrt{-2\ln r^2} = x_1 \sqrt{\frac{-2\ln r^2}{r^2}} \qquad \text{and} \qquad y_2 = \frac{x_2}{r}\sqrt{-2\ln r^2} = x_2 \sqrt{\frac{-2\ln r^2}{r^2}}$$
where the second form requires only one square root. The advantage of this method is that it avoids calls to trigonometric functions, and thus is usually faster even though it requires 27% more calls to the random number generator. It is equivalent to the Box-Muller transform because $x_1/r$ and $x_2/r$ are equivalent to a sine and cosine, and will be distributed uniformly in angle, while $r^2$ is also uniformly distributed, making it a valid replacement for $x_1$.
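A sketch of this polar (rejection) variant, again in Python with NumPy (the function name is ours; the loop keeps drawing until enough pairs survive the rejection step):

```python
import numpy as np

def polar_method(n, rng=None):
    """Gaussian deviates via the polar variant: no trigonometric calls."""
    rng = rng or np.random.default_rng()
    pieces, total = [], 0
    while total < n:
        x1 = 2.0 * rng.random(n) - 1.0        # uniform on (-1, 1)
        x2 = 2.0 * rng.random(n) - 1.0
        r2 = x1**2 + x2**2
        keep = (r2 > 0.0) & (r2 < 1.0)        # reject points outside the circle
        f = np.sqrt(-2.0 * np.log(r2[keep]) / r2[keep])   # one sqrt per pair
        pieces += [x1[keep] * f, x2[keep] * f]
        total += 2 * keep.sum()
    return np.concatenate(pieces)[:n]

y = polar_method(100_000)
print(y.mean(), y.std())   # should be near 0 and 1
```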
3.3. A Digression: Some Point Processes
To introduce the next probability distributions, we give a more formal description of the point process we introduced in Chapter 1: the Poisson process, which we used as a simple model for geomagnetic reversals. In this process we suppose that the probability of some event occurring is equal over all time, and is described by a rate $\lambda$ (also called an intensity) which has the dimensions of probability over time. The conditions for a Poisson process with rate $\lambda$ are that, as a small time interval h approaches 0:

(A) the number of events in disjoint time intervals are independent: the number in $(0, t]$ is independent of (does not affect) the number in $(t, t + h)$.

(B) the probability of an event in an interval of length h is, as $h \to 0$, approximately $\lambda h$, plus a remainder that is $o(h)$. (Remember that a function $g(h)$ is $o(h)$ if $\lim_{h \to 0} g(h)/h = 0$.)
(C) the probability of more than one event in an interval of length h is $o(h)$.
The Poisson process is memoryless: there is no memory of what went before, so what happens at any time does not depend on the earlier history. This makes the Poisson process something of a simplest-case point process; it is described by only one parameter and is fundamentally unpredictable. It therefore serves as a kind of reference model, for example in studies of earthquake statistics, as having the fewest complications and the least predictability.
Figure 3.2
A more general type of point process is a renewal process, for which the probability of an event depends in some way on the time since the last event. The Poisson process might be called a renewal process that isn't, since in its case the probability is constant, and the time of the next event is uninfluenced by the time of the previous one. Continuing with the geomagnetic field example, we might suppose that immediately following a geomagnetic reversal there is some change in the probability of occurrence of a new event, e.g., while the geodynamo recovers its stable polarity; so short intervals might be less probable.
3.4. Exponential Distribution
It can be shown that if the occurrences of events are given by a Poisson process, then the difference (time interval, or distance, or whatever) between successive occurrences will have an exponential distribution, with probability density function defined over $[0, \infty)$:

$$\phi(x) = \frac{1}{c}\, e^{-x/c} = \lambda\, e^{-\lambda x}$$

and cumulative distribution function

$$\Phi(x) = 1 - e^{-\lambda x} \tag{4}$$
Since the pdf is peaked towards zero, short intervals have a much higher probability of occurring than long ones do. Figure 3.2 shows this distribution; again, the dotted lines show the values for $\Phi$ equal to 0.05, 0.50, and 0.95: much less symmetric than the same points for the Normal.
Producing random numbers for this distribution is very easy, and illustrates a method that can be applied to some other pdfs. We can think of taking uniformly distributed rvs and placing them on the y-axis of the cdf; then if we map these into the x-axis through the inverse cdf function, the result will have the distribution we want. This result is general; how usable it is depends on how easy it is to compute the inverse function for the cdf. In this case it is easy; from (4) the inverse cdf function is

$$\Phi^{-1}(y) = -\frac{\ln(1 - y)}{\lambda} = -\frac{\ln(y)}{\lambda}$$

where the second form follows because $1 - y$ is itself uniformly distributed whenever y is.
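A sketch of this inverse-cdf recipe in Python with NumPy (the rate $\lambda$ chosen here is arbitrary):

```python
import numpy as np

lam = 2.0                        # the rate parameter lambda
rng = np.random.default_rng(7)
y = rng.random(100_000)          # uniform deviates on the y-axis of the cdf
x = -np.log(1.0 - y) / lam       # mapped through the inverse cdf

print(x.mean())                  # should be near 1/lambda = 0.5
```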
3.5. Poisson Distribution

The Poisson process also gives us a discrete distribution: the number of events in a fixed interval of time. If the expected number of events in the interval is s, the probability of observing exactly k events is $p_k = s^k e^{-s}/k!$, and we can write the pdf, using delta functions, as

$$\phi(x) = \sum_{k=0}^{\infty} p_k\, \delta(x - k)$$

As s becomes large, the distribution starts to look like a discrete version of the Normal. The expected value (first moment) of the distribution is just the shape parameter.
This distribution arises, not just for a time series, but whenever we have a small probability of something happening in each of a large number of instances, and want to know the distribution of the number of events. Our example so far has been reversals (with a small probability in each of very many years); earthquakes would be another point-process example. One that does not have a temporal component might be deaths from being struck by lightning; if we took a number of groups of (say) 100,000 people, we would expect the number of deaths in each group to be Poisson-distributed.³
³ The classic example is, for each corps of the Prussian army, the number of soldiers killed each year by being kicked by a horse.
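A quick simulation of the situation just described (a sketch in Python with NumPy; the probability and group size are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
p, n_people, n_groups = 1.0e-5, 100_000, 2000   # hypothetical per-person probability
# number of events in each group of n_people independent low-probability trials
counts = rng.binomial(n_people, p, size=n_groups)

s = n_people * p                    # the Poisson shape parameter, here 1.0
print(counts.mean(), counts.var())  # both should be near s for a Poisson rv
```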
Figure 3.3
3.6. Gamma Distribution
For the more complicated case of a renewal process, the interval lengths often follow a gamma distribution, which has a pdf on $[0, \infty)$:

$$\phi(x) = \frac{1}{c\,\Gamma(s)} \left(\frac{x}{c}\right)^{s-1} e^{-x/c} = \frac{\lambda^s}{\Gamma(s)}\, x^{s-1}\, e^{-\lambda x} \tag{5}$$
where the gamma function (used to normalize the distribution) is defined by

$$\Gamma(x) = \int_0^{\infty} u^{x-1}\, e^{-u}\, du$$
As in the previous example we have followed the convention for dealing with point processes and written the scale parameter as $c = \lambda^{-1}$. We see from (5) that the exponential distribution is a special case of a gamma distribution, one for which $s = 1$.
Figure 3.3 shows the gamma density function for different values of s. For a renewal point process governed by this kind of probability density function, $\lambda$ describes the rate of events well after each event, while s controls the shape of the probability function immediately following each event. Values of $s < 1$ correspond to an enhanced probability (relative to a Poisson process with the same value of $\lambda$) of another event immediately following one that has just occurred. Values of $s > 1$ indicate a diminished probability of another event immediately following any given one. For geomagnetic reversals, using the renewal model gives $s > 1$. The physical interpretation of this is controversial: it may be a consequence of the known inadequacies of the geological record of geomagnetic reversals, in which very short polarity intervals are missing; or it may reflect fundamental physical processes in the core.⁴ Gamma distributions are also used in statistical studies of the times between
⁴ McFadden, P. L. (1984). Statistical tools for the analysis of geomagnetic reversal sequences, J. Geophys. Res., 89, 3363-3372; McFadden, P. L., and R. T. Merrill (1984). Lower mantle convection and geomagnetism, J. Geophys. Res., 89, 3354-3362.
earthquakes: another example of a process that is not memoryless, since there is an enhanced probability of earthquake occurrence immediately following any given one (the aftershock sequence), compared with the probability later. Indeed, even after removing aftershocks, earthquakes are observed to cluster in time, making a gamma distribution with $s < 1$ a possible model.
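To see the effect of the shape parameter on a renewal process, one can simulate interval lengths directly (a sketch in Python; NumPy's gamma generator is parameterized by shape and scale, so the scale is $c = 1/\lambda$):

```python
import numpy as np

rng = np.random.default_rng(11)
lam = 1.0                                   # rate; the scale parameter is c = 1/lam
for s in (0.5, 1.0, 2.0):
    intervals = rng.gamma(shape=s, scale=1.0/lam, size=100_000)
    frac_short = np.mean(intervals < 0.1)   # fraction of very short intervals
    print(s, intervals.mean(), frac_short)
# s < 1 gives many more short intervals (clustering) than s > 1 does;
# s = 1 reproduces the exponential (Poisson-process) case
```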
Figure 3.5
3.7. Log-Normal Distribution
We next consider a variety of distributions derived in different ways from the normal distribution. One of the simplest comes if we exponentiate X: if X is normally distributed, then $Y = e^X$ is said to be log-normally distributed, since $X = \ln Y$. This is a mapping of the real line into the positive numbers, which may be useful for some quantities. The pdf is

$$\phi(x) = \frac{1}{s\sqrt{2\pi}\,x}\, e^{-(\ln x)^2/2s^2} = \frac{1}{s\sqrt{2\pi}\,(x-l)}\, e^{-(\ln((x-l)/c))^2/2s^2} \qquad \text{for } x > l$$
One area where this distribution is relevant is grain size distributions in sediments;
another is the observed distributions of magnetization of basaltic rocks. This distribution
has also been used, with mixed success, to model times between large earthquakes: yet
another kind of renewal process.
Figure 3.6
3.8. Cauchy Distribution
We have discussed in the previous chapter the distribution of the sum of two Normal random variables, which is again Normal; the difference will be the same. Things get more interesting if we consider the ratio of two such variables, so our new random variable would be $Y = X_1/X_2$, with $X_i \sim N(0, \sigma)$. It is easy to see that it would be relatively common for the denominator $X_2$ to be small, and hence for Y to be large; so we would expect the ratio to be much more heavy-tailed than the Normal is.
We can use equation (26) of Chapter 2 to get the actual distribution, which is

$$\phi(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} |y|\, e^{-y^2/2}\, e^{-x^2 y^2/2}\, dy = \frac{1}{\pi} \int_0^{\infty} y\, e^{-y^2(x^2 + 1)/2}\, dy$$
A change of variables to $u = y^2$ makes this into the integral of an exponential in u:

$$\frac{1}{2\pi} \int_0^{\infty} e^{-u(x^2 + 1)/2}\, du = \frac{1}{\pi(1 + x^2)} = \frac{c}{\pi(c^2 + (x - l)^2)}$$
which is known as the Cauchy distribution. This distribution is integrable (it has to be, to be a pdf), but the first and higher moments do not exist (that is to say, are infinite). The median does exist, however. This pdf is thus as heavy-tailed as a pdf can get, and can serve as a reminder that even such standard procedures as taking the mean may not necessarily be valid.
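This instability of the mean is easy to demonstrate (a sketch in Python with NumPy, generating Cauchy deviates as the ratio of two standard normals):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.standard_normal(1_000_000) / rng.standard_normal(1_000_000)

for n in (100, 10_000, 1_000_000):
    print(n, x[:n].mean())    # the running mean never settles down
print(np.median(x))           # the median, by contrast, is near 0
```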
3.9. Chi-Squared Distribution
If, instead of taking the ratio of two Normal rvs, we take the product, we get the $\chi^2$ distribution, one of several derived from the Normal that are used not so much as models for data as for statistical tests, some of which we discuss in Chapter 5.

We start by squaring a random variable with a normal distribution; that is, if X is a random variable distributed with a normal pdf with mean 0 and standard deviation 1 (i.e., $X \sim N(0,1)$), then the distribution of the random variable $Z = X^2$ is called the chi-square distribution with one degree of freedom, denoted $\chi^2_1$.
The pdf can be derived by the procedure described in Chapter 2:

$$\Phi_z(z) = P(Z < z) = P(-\sqrt{z} < X < \sqrt{z})$$

where the second expression comes from using the inverse function to $x^2$, namely the square root. Rewriting the rightmost expression gives

$$\Phi_z(z) = \Phi_x(\sqrt{z}) - \Phi_x(-\sqrt{z})$$
and differentiating with respect to $z = x^2$ using the chain rule gives

$$\phi_z(z) = \frac{d\Phi(z)}{dz} = \frac{d\Phi_x(\sqrt{z})}{dz} - \frac{d\Phi_x(-\sqrt{z})}{dz} = \frac{1}{2\sqrt{z}} \left[ \phi_x(\sqrt{z}) + \phi_x(-\sqrt{z}) \right]$$
Finally, substituting in the normal pdf for $\phi_x$, we get

$$\phi_z(z) = \frac{1}{\sqrt{2\pi z}}\, e^{-z/2}$$

which is a special case of the gamma distribution, with $s = \frac{1}{2}$.
Next, consider n random variables $Z_1, Z_2, \ldots, Z_n$ which are independent and each distributed as $\chi^2$ with one degree of freedom. The distribution of $Y = Z_1 + Z_2 + \ldots + Z_n$ is called the chi-square distribution with n degrees of freedom, denoted $\chi^2_n$ (that is, $Y \sim \chi^2_n$). Each of the individual variables has a gamma distribution; and it is the case that the sum of independent random variables each with a gamma distribution (and all with the same value of $\lambda$) also follows a gamma distribution, so the $\chi^2$ distribution with n degrees of freedom is a gamma distribution with $s = n/2$ and $\lambda = \frac{1}{2}$. The pdf is, like the gamma distribution, over $[0, \infty)$:
$$\phi(x) = \frac{1}{2^{n/2}\, \Gamma(n/2)}\, x^{(n/2)-1}\, e^{-x/2}$$
The expected value of the $\chi^2_n$ distribution is just n, and the variance is 2n. It is often useful (as we will discuss in later chapters) to look at the reduced chi-square, which is defined as $\chi^2_n/n$. This has an expected value of 1, independent of n.
The $\chi^2$ distribution finds widespread application in model fitting. If we have n observations $o_i$, with predicted values $c_i$ and measurement errors $\sigma_i$, then we can form the standardized residuals

$$r_i^2 = \frac{(o_i - c_i)^2}{\sigma_i^2}$$
where the standardized part, as in the discussion in Section 3.2, comes from scaling the residuals $o - c$ by their errors. Then, if the residuals are distributed according to a Normal pdf, the sum of squared residuals (ssr, $\sum_i r_i^2$) will have a $\chi^2_n$ distribution: so the value of the ssr is often used to see if the model fits the data adequately or not. The reduced ssr is the ssr divided by n, since this (for Gaussian errors) should be distributed as the reduced chi-square; so we would hope that the reduced ssr would be close to one. We will discuss all this in more detail in the chapter on hypothesis testing.
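A sketch of the reduced-ssr computation in Python (the function name and the synthetic data are ours; we build data that really are the model plus Gaussian errors, so the reduced ssr should come out near 1):

```python
import numpy as np

def reduced_ssr(obs, pred, sigma):
    """Sum of squared standardized residuals, divided by n."""
    r2 = ((obs - pred) / sigma) ** 2
    return r2.sum() / len(obs)

rng = np.random.default_rng(9)
n = 500
pred = np.linspace(0.0, 1.0, n)        # hypothetical predicted values c_i
sigma = np.full(n, 0.1)                # assumed measurement errors sigma_i
obs = pred + sigma * rng.standard_normal(n)

print(reduced_ssr(obs, pred, sigma))   # should be close to 1
```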
Figure 3.7
3.10. Student's t Distribution

If X is normally distributed ($X \sim N(0,1)$) and $Z \sim \chi^2_n$, and Z and X are independent, then the distribution of $X/\sqrt{Z/n}$ is called the Student's t distribution with n degrees of freedom.⁵ This has the probability density function:

$$\phi(x) = \frac{\Gamma[(n+1)/2]}{\sqrt{n\pi}\,\Gamma(n/2)} \left(1 + \frac{x^2}{n}\right)^{-(n+1)/2}$$
so that in this case the shape factor is, as with $\chi^2$, an integer. The t distribution is symmetric about zero. As the number of degrees of freedom approaches infinity, the t distribution tends to the standard normal distribution. The t distribution is used in testing whether
⁵ The name Student is the pseudonym that W. S. Gosset used for his statistical publications, to avoid problems with his employer, the Guinness brewery.
samples have statistically distinguishable means; we will, again, discuss this fully when we turn to hypothesis testing.
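Building t deviates from their definition takes one line once we have normal and $\chi^2$ deviates (a sketch in Python with NumPy; the degrees of freedom are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(13)
n = 5                                    # degrees of freedom
x = rng.standard_normal(100_000)         # X ~ N(0, 1)
z = rng.chisquare(n, 100_000)            # Z ~ chi-squared with n dof
t = x / np.sqrt(z / n)                   # Student's t with n dof

# heavier tails than N(0,1): well above the 5% a standard normal would give
print(np.mean(np.abs(t) > 1.96))
```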
3.11. F Distribution
Next, suppose X is a random variable distributed as $\chi^2$ with m degrees of freedom; then $X/m$ is a similarly-distributed rv which has been scaled (standardized) to make its expected value 1. Take Z to be another rv, independent of X, and distributed as $\chi^2$ with n degrees of freedom. Now consider the random variable Y that is the ratio of these after both have been normalized by their degrees of freedom:

$$Y = \frac{X/m}{Z/n}$$

This variable is distributed according to what is called the F distribution with m and n degrees of freedom, denoted $F_{m,n}$. The probability density function of this is given by, for x over $[0, \infty)$:
$$\phi(x) = \frac{\Gamma[(m+n)/2]}{\Gamma(m/2)\,\Gamma(n/2)} \left(\frac{m}{n}\right)^{m/2} x^{m/2 - 1} \left(1 + \frac{m}{n}\,x\right)^{-(m+n)/2}$$
which we do not plot because of the complications of having to show two variables. Like the t and $\chi^2$ distributions, the F distribution is not so much used as a pdf for modeling data as in testing, in its case whether two sets of data (or, more often, residuals) have statistically distinguishable variances. This usage comes from this being a ratio of reduced $\chi^2$'s; if the fit of two models is equally good, we would expect the reduced $\chi^2$ for the residuals to be about the same, and Y to be near 1; if it is not, this may indicate that one model fits better than the other.
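Again the definition translates directly into a simulation (a sketch in Python with NumPy; the values of m and n are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(17)
m, n = 8, 12
x = rng.chisquare(m, 100_000) / m     # a reduced chi-squared, expected value 1
z = rng.chisquare(n, 100_000) / n     # another, independent of the first
y = x / z                             # F-distributed with (m, n) degrees of freedom

print(y.mean())   # near n/(n - 2) = 1.2; close to 1 when n is large
```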
Figure 3.8
3.12. Pareto Distribution
This is a curiosity in geophysical data distributions, since everyone has heard of it but almost no one knows this name, even though it is the standard term in statistics, and particularly in economics, for which this distribution is a very common data model. It is yet another distribution on half of the real line (though on $[l, \infty)$); but it has a much larger tail than the other distributions we have seen for this interval, all of which have an exponential falloff. The Pareto distribution, by contrast, falls off as a power of x, so allowing a greater probability of very large values. The pdf is

$$\phi(x) = s\, l^s\, x^{-(s+1)}$$
where $s > 0$; note that for this distribution the location parameter determines the lower bound of the range. To see how this leads to a famous geophysical result, we first find the cdf

$$\Phi(x) = 1 - l^s x^{-s}$$
This probability distribution can be applied to data (such as a set of earthquakes, or the amount of money people have) by supposing that we have a total of n occurrences, with the probability of getting a value x (of earthquake magnitude or wealth) being given by a Pareto distribution. Then the number of occurrences greater than some value of x, which we call n(x), will be $n[1 - \Phi(x)] = n\, l^s x^{-s}$; taking the log of this we get

$$\ln[n(x)] = \ln(n\, l^s) - s \ln x$$

which has exactly the form of the Gutenberg-Richter relationship for the number of earthquakes greater than magnitude M, $\log(n) = a - bM$. So the b-value, much discussed in seismicity studies, is the shape factor for the Pareto distribution.
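As an illustration, Pareto deviates can be drawn by the inverse-cdf method, and the shape (the b-value analog) recovered by maximum likelihood; the estimator $\hat{s} = n / \sum_i \ln(x_i/l)$ is, up to the change of logarithm base, essentially Aki's maximum-likelihood b-value formula (a sketch in Python with NumPy; the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(19)
s_true, l = 1.0, 1.0                     # shape ("b-value") and lower bound
u = rng.random(100_000)
x = l * (1.0 - u) ** (-1.0 / s_true)     # inverse cdf of the Pareto distribution

s_hat = len(x) / np.log(x / l).sum()     # maximum-likelihood estimate of s
print(s_hat)                             # should be near s_true
```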
Figure 3.9
3.13. von Mises Distribution
Our next distribution is not usually discussed in elementary statistics, but is important for the geophysically common case of circular data, in which we are looking at directions in two dimensions, best described as an angle, and so as a variable in the range $[-\pi, \pi)$. With suitable scaling this distribution applies to any variable which is defined to be modulo some number: for example, we can use this distribution for the fractional part of a real-valued quantity, which would be limited to the range $[0,1)$. The von Mises distribution serves as an analog to the Normal in this case; it has the pdf

$$\phi(x) = \frac{1}{2\pi I_0(s)}\, e^{s \cos(x - l)}$$
where $I_0$ is a modified Bessel function. Note that in this case the shape factor modifies the width of the function, as the scale factor does for the Normal. As this factor becomes very small, $\phi$ approaches a uniform distribution.
3.14. Rayleigh Distribution
Yet another special distribution arises from the following: suppose we add the squares of two Normal rvs together, and then take the square root. This would be, for example, the pdf of the amplitude of a two-dimensional vector, each of whose components was normally distributed; note that this pdf is zero at the origin. We found the answer above in the process
Figure 3.10
of deriving the Box-Muller transformation; another approach is to realize that the rv for the square of the amplitude (call this Z) is distributed as $\chi^2_2$, with cdf $\Phi_0(z) = 1 - e^{-z/2}$. Then, we can find the cdf of $X = Z^{1/2}$ by the same kind of procedure as we used in Section 3.9: taking $\Phi(x)$ to be the cdf of X, we have

$$\Phi(x) = P(X < x) = P(Z < x^2) = \Phi_0(x^2) = 1 - e^{-x^2/2}$$
whence the pdf is

$$\phi(x) = x\, e^{-x^2/2} = \frac{x}{c^2}\, e^{-x^2/2c^2}$$
This is known as the Rayleigh distribution, and is used in the theory of random walks. Note that if we add two orthogonal vectors described by normal distributions, the expected value of the result is the origin; but the most probable amplitude is not zero: the most probable amplitude does not correspond to the most probable location.
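A numerical check (a sketch in Python with NumPy): the amplitude of a vector with two independent N(0,1) components should follow the Rayleigh pdf above, with median $\sqrt{2 \ln 2} \approx 1.18$:

```python
import numpy as np

rng = np.random.default_rng(23)
x = rng.standard_normal(100_000)      # the two orthogonal components
y = rng.standard_normal(100_000)
amp = np.sqrt(x**2 + y**2)            # the amplitude: Rayleigh-distributed

print(np.mean(amp < 0.1))             # small: the pdf vanishes at the origin
print(np.median(amp), np.sqrt(2.0 * np.log(2.0)))   # these should agree
```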
Figure 3.11
3.15. Weibull Distribution
This distribution was invented to describe failure rates, and as such it is popular in modeling renewal processes. The pdf is

$$\phi(x) = s\, x^{s-1}\, e^{-x^s} = \frac{s}{c} \left(\frac{x-l}{c}\right)^{s-1} e^{-((x-l)/c)^s} \qquad \text{for } x \ge l$$
which makes the cdf relatively simple:

$$\Phi(x) = 1 - e^{-x^s} = 1 - e^{-((x-l)/c)^s}$$
The shape and scale parameters are sufficient to provide a flexible distribution, so a nonzero location parameter is less often used. The exponential distribution is a special case, with $s = 1$, and the Rayleigh is another, with $s = 2$.
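Because the cdf is simple, so is inverse-cdf sampling (a sketch in Python with NumPy; the function name weibull_deviates is ours):

```python
import numpy as np

def weibull_deviates(n, s, c=1.0, l=0.0, rng=None):
    """Inverse-cdf sampling: x = l + c * (-ln(1 - u))**(1/s)."""
    rng = rng or np.random.default_rng()
    u = rng.random(n)
    return l + c * (-np.log(1.0 - u)) ** (1.0 / s)

rng = np.random.default_rng(29)
print(weibull_deviates(100_000, s=1.0, rng=rng).mean())      # s = 1: exponential, mean c
print(np.median(weibull_deviates(100_000, s=2.0, rng=rng)))  # s = 2: Rayleigh-like
```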