Machine Learning and Pattern Recognition Week 2 Univariate Gaussian
The Gaussian distribution, also called the normal distribution, is widely used in probabilistic machine learning. This week we’ll see Gaussians in the context of doing some basic
statistics of experimental results. Later in the course we’ll use Gaussians as building blocks
of probabilistic models, and to represent beliefs about unknown quantities in inference
algorithms.
We know that many of you will have seen all of this material before (some of you several
times). However, not everyone has, and large parts of the MLPR course depend on thoroughly
understanding Gaussians. This note is more detailed (slow) than many of the machine learning textbooks, and provides some exercises. A later note on multivariate Gaussians
will also be important.
If x ∼ N(0, 1) is a draw from a standard normal, we can scale it by σ and shift it by µ:
z = σx + µ. (3)
If we scale and shift many draws from a standard normal, the histogram of values will be
stretched horizontally, and then shifted. Scaling by σ multiplies the variance by σ² (see the
notes on expectations), and leaves the mean at zero. Adding µ doesn’t change the width of
the distribution, or its variance, but adds µ to the mean.
The distribution of z maintains the same bell-curve shape, with the points of inflection now
at µ ± σ (note, not ±σ²). We still say the variable is Gaussian distributed, but with different
parameters: z ∼ N(µ, σ²). By convention, the second parameter of the normal distribution is
usually its variance σ², not its width or standard deviation σ. However, if you are reading a
paper, or using a new library routine, it is worth checking the parameterization being used,
just in case. Sometimes people choose to define a Gaussian by its precision, 1/σ², instead of
the variance.
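For example, NumPy’s np.random.normal takes the standard deviation as its scale argument, not the variance (SciPy’s scipy.stats.norm uses the same convention). A quick sketch, with arbitrary values:

import numpy as np

mu, sigma = 1.5, 2.0
z = np.random.normal(loc=mu, scale=sigma, size=100_000)  # scale is sigma, NOT sigma**2
print(np.var(z))  # close to sigma**2 = 4.0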
For a general univariate Gaussian variable z ∼ N(µ, σ²), we can identify a standard normal
by undoing the shift and scale above:
x = (z − µ)/σ. (4)
We now work out the probability density for z, by transforming the density for this x.
[See the further reading if you can’t follow this section, or want more detail.]
Substituting the above expression into the PDF for the standard normal suggests that the shape
of the shifted and scaled distribution that we imagined above is described by the density
p(z) = N(z; µ, σ²) ∝ exp(−(z − µ)²/(2σ²)). (5)
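We can check this shape numerically. A sketch, using the standard normalizing constant 1/(σ√(2π)) so that the density integrates to one (the constant isn’t derived here):

import numpy as np
from matplotlib import pyplot as plt

mu, sigma = 1.5, 2.0
z = sigma * np.random.randn(100_000) + mu
grid = np.linspace(mu - 4*sigma, mu + 4*sigma, 200)
pdf = np.exp(-(grid - mu)**2 / (2*sigma**2)) / (sigma * np.sqrt(2*np.pi))
plt.hist(z, bins=100, density=True)  # normalized histogram of the draws
plt.plot(grid, pdf)                  # the density N(z; mu, sigma**2)
plt.show()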
4 Further reading
https://en.wikipedia.org/wiki/Normal_distribution
If you would like to work through the change of variables more rigorously, or in more detail,
we are using a specific form of the following result:
If an outcome x has probability density p_X(x), and an invertible, differentiable function g is
used to create a new quantity z = g(x), then the probability density of the new quantity is
p_Z(z) = p_X(g⁻¹(z)) |dx/dz|. In our case the derivative is simple: dx/dz = 1/σ.
This method for transforming densities for a change of variables is in Bishop Section 1.2.1,
with more detail and the multivariate case in Murphy Section 2.6.
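To see the general result in action with a nonlinear function, here is a sketch using z = exp(x) as an arbitrary example of an invertible, differentiable g, so that g⁻¹(z) = log z and dx/dz = 1/z:

import numpy as np
from matplotlib import pyplot as plt

x = np.random.randn(100_000)  # standard normal draws
z = np.exp(x)                 # z = g(x)
grid = np.linspace(0.01, 8, 200)
p_x = lambda u: np.exp(-u**2 / 2) / np.sqrt(2*np.pi)  # standard normal PDF
p_z = p_x(np.log(grid)) / grid   # p_Z(z) = p_X(g^{-1}(z)) |dx/dz|
plt.hist(z, bins=100, range=(0, 8), density=True)
plt.plot(grid, p_z)              # should match the histogram
plt.show()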
5 Python code
To generate a million outcomes from a standard normal and plot a histogram:
import numpy as np
from matplotlib import pyplot as plt

x = np.random.randn(int(1e6))  # a million standard normal draws
plt.hist(x, bins=100)
plt.show()