Gaussian ML Estimator
Email: [email protected]
URL: http://www.zabaras.com/
August 7, 2014
Contents: Bayesian Inference for the Gaussian with Known Variance; Bayesian Inference for the Gaussian with Known Mean; Bayesian Inference for the Gaussian with Unknown Mean and Variance
The first two moments of the Gaussian follow by direct integration:

$$\mathbb{E}[X] = \frac{1}{\sqrt{2\pi\sigma^{2}}}\int_{-\infty}^{\infty} \exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right) x\,dx = \mu, \qquad \mathbb{E}[X^{2}] = \mu^{2} + \sigma^{2}, \qquad \operatorname{var}[X] = \mathbb{E}[X^{2}] - \mathbb{E}[X]^{2} = \sigma^{2}.$$
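A quick Monte Carlo check of these moments (a minimal NumPy sketch, not part of the original slides; the sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma = 1.5, 0.8
x = rng.normal(mu, sigma, 1_000_000)
print(x.mean(), mu)        # E[X] = mu
print(x.var(), sigma**2)   # var[X] = sigma^2
```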
[Figure: the standard normal density N(x; 0, 1) plotted over x ∈ [−3, 3].]
The cumulative distribution of the Gaussian can be expressed through the standard normal CDF $\Phi$:

$$\int_{-\infty}^{x} \mathcal{N}(z \mid \mu, \sigma^{2})\,dz = \Phi(\tilde z; 0, 1), \qquad \tilde z = (x-\mu)/\sigma,$$

where

$$\Phi(z; 0, 1) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^{2}/2}\,dt = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(z/\sqrt{2}\right)\right], \qquad \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-t^{2}}\,dt.$$
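The Φ–erf relation is easy to evaluate with the Python standard library; a minimal sketch (the helper name gauss_cdf is ours, not from the slides):

```python
import math

def gauss_cdf(x, mu=0.0, sigma=1.0):
    """Gaussian CDF via the error function: Phi(z) = 0.5*(1 + erf(z/sqrt(2)))."""
    z = (x - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Sanity checks against known values of the standard normal CDF.
print(gauss_cdf(0.0))                      # 0.5 by symmetry
print(gauss_cdf(1.96))                     # ~0.975
print(gauss_cdf(2.0, mu=1.0, sigma=0.5))   # standardizes to Phi(2) ~ 0.977
```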
In the limit of vanishing variance the Gaussian approaches a Dirac delta centered at the mean:

$$\lim_{\sigma \to 0} \mathcal{N}(x \mid \mu, \sigma^{2}) = \delta(x - \mu).$$
[Figure: surface and contour plots of a two-dimensional Gaussian density (gaussPlot2DDemo from PMTK).]
[Figure: two-dimensional Gaussian densities with diagonal and spherical covariance matrices (surface and contour plots).]
The multivariate Gaussian can also be derived as the maximum-entropy density with fixed mean and covariance. Introduce a Lagrange multiplier λ for normalization, a vector m for the mean constraint, and a matrix L for the covariance constraint

$$\operatorname{Tr}\!\left[L\left(\int p(x)(x-\mu)(x-\mu)^{T}\,dx - \Sigma\right)\right].$$

Stationarity of the entropy functional then gives a density of the form

$$p(x) = \exp\!\left(-1 + \lambda + m^{T}x + (x-\mu)^{T}L\,(x-\mu)\right) = e^{-1+\lambda+\mu^{T}m-\frac{1}{4}m^{T}L^{-1}m}\, \exp\!\left(\left(x-\mu+\tfrac{1}{2}L^{-1}m\right)^{T} L \left(x-\mu+\tfrac{1}{2}L^{-1}m\right)\right).$$

Substituting $y = x - \mu + \tfrac{1}{2}L^{-1}m$ into the mean constraint $\int p(x)\,x\,dx = \mu$,

$$e^{-1+\lambda+\mu^{T}m-\frac{1}{4}m^{T}L^{-1}m} \int \left(y + \mu - \tfrac{1}{2}L^{-1}m\right) e^{y^{T}Ly}\,dy = \mu.$$

The 1st term drops from symmetry, the 2nd gives μ from normalization, thus we need to have

$$\tfrac{1}{2}L^{-1}m = 0 \quad\Rightarrow\quad m = 0.$$

The covariance constraint then reads, with $z = x - \mu$,

$$e^{-1+\lambda} \int z z^{T}\, e^{z^{T}Lz}\,dz = \Sigma.$$

Note that with $L = -\Sigma^{-1}/2$, the Gaussian integral gives

$$\int e^{z^{T}Lz}\, z z^{T}\,dz = \Sigma\,(2\pi)^{D/2}\,|\Sigma|^{1/2},$$

so that $e^{-1+\lambda} = (2\pi)^{-D/2}|\Sigma|^{-1/2}$ and $p(x) = \mathcal{N}(x \mid \mu, \Sigma)$.
The differential entropy of the multivariate Gaussian follows by direct computation:

$$H[x] = -\int \mathcal{N}(x\mid\mu,\Sigma)\ln\mathcal{N}(x\mid\mu,\Sigma)\,dx = \frac{1}{2}D\ln(2\pi) + \frac{1}{2}\ln|\Sigma| + \frac{1}{2}\int \mathcal{N}(x\mid\mu,\Sigma)\,\operatorname{tr}\!\left[(x-\mu)(x-\mu)^{T}\Sigma^{-1}\right]dx$$

$$= \frac{1}{2}D\ln(2\pi) + \frac{1}{2}\ln|\Sigma| + \frac{1}{2}\operatorname{tr}\!\left[\left(\int \mathcal{N}(x\mid\mu,\Sigma)\,(x-\mu)(x-\mu)^{T}\,dx\right)\Sigma^{-1}\right]$$

$$= \frac{1}{2}D\ln(2\pi) + \frac{1}{2}\ln|\Sigma| + \frac{1}{2}\operatorname{tr}\!\left[\Sigma\,\Sigma^{-1}\right] = \frac{1}{2}\left[D\ln(2\pi) + \ln|\Sigma| + D\right].$$
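A numerical check of the entropy formula (a small sketch assuming NumPy and SciPy are available; the frozen distribution's entropy() computes the same quantity):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_entropy(cov):
    """H = 0.5 * (D*ln(2*pi) + ln|Sigma| + D) for an MVN with covariance cov."""
    D = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (D * np.log(2 * np.pi) + logdet + D)

Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
print(gaussian_entropy(Sigma))
print(multivariate_normal(mean=np.zeros(2), cov=Sigma).entropy())  # should match
```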
To see that no other distribution with the same second moments has higher entropy, let $p(x) = \mathcal{N}(x\mid\mu,\Sigma)$ and let $q(x)$ be any density with the same mean and covariance, $\int q(x)(x-\mu)(x-\mu)^{T}dx = \Sigma$. Then:

$$0 \le KL(q\,\|\,p) = \int q(x)\ln\frac{q(x)}{p(x)}\,dx = -\int q(x)\ln p(x)\,dx - H[q].$$

Since $\ln p(x)$ is quadratic in $x$ and $p$ and $q$ share the same first and second moments,

$$\int q(x)\ln p(x)\,dx = \int p(x)\ln p(x)\,dx = -H[p],$$

so that $0 \le H[p] - H[q]$, i.e. $H[p] \ge H[q]$.
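To illustrate the maximum-entropy property numerically, the sketch below compares the Gaussian entropy against a uniform and a Laplace density whose variances are matched to σ²; the closed-form entropies used for the latter two are standard results, not from the slides:

```python
import numpy as np

# Differential entropies of distributions with matched variance sigma^2:
sigma = 1.7
h_gauss   = 0.5 * np.log(2 * np.pi * np.e * sigma**2)   # Gaussian
h_uniform = np.log(sigma * np.sqrt(12.0))               # uniform of width sigma*sqrt(12)
h_laplace = 1.0 + np.log(2.0 * sigma / np.sqrt(2.0))    # Laplace with scale b = sigma/sqrt(2)

print(h_gauss, h_uniform, h_laplace)  # the Gaussian entropy is the largest
```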
[Figure: histograms of the mean of N uniform random variables for N = 1, 5, 10; the distribution approaches a Gaussian as N grows (MATLAB code: centralLimitDemo from PMTK).]
The CLT and the Gaussian Distribution
One consequence of this result is that the binomial distribution, which is a distribution over m defined by the sum of N observations of the random binary variable x, will tend to a Gaussian as N → ∞.
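The slides point to the MATLAB centralLimitDemo from PMTK; below is a rough NumPy equivalent (our own sketch) that checks the limiting mean and standard deviation numerically rather than plotting:

```python
import numpy as np

rng = np.random.default_rng(0)
for N in (1, 5, 10):
    # Means of N iid Uniform(0,1) variables; 10,000 replications each.
    means = rng.random((10_000, N)).mean(axis=1)
    # CLT: sample mean -> 0.5, sample std -> sqrt(1/12) / sqrt(N)
    print(N, means.mean(), means.std(), np.sqrt(1.0 / (12.0 * N)))
```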
Consider the sum $x = x_1 + x_2$ of two independent Gaussians, $x_1 \sim \mathcal{N}(\mu_1, \lambda_1^{-1})$ and $x_2 \sim \mathcal{N}(\mu_2, \lambda_2^{-1})$. The joint density has exponent

$$-\frac{\lambda_1}{2}(x_1-\mu_1)^{2} - \frac{\lambda_2}{2}(x - x_1 - \mu_2)^{2}.$$

The 1st term is integrated out after completing the square in $x_1$, and the precision of x is:

$$\lambda = \frac{\lambda_1\lambda_2}{\lambda_1+\lambda_2}, \qquad \text{i.e.} \qquad \operatorname{var}[x] = \frac{1}{\lambda_1} + \frac{1}{\lambda_2}.$$

Thus the entropy of x is:

$$H[x] = \frac{1}{2}\ln\!\left(2\pi e\,\sigma^{2}\right) = \frac{1}{2}\ln\!\left(2\pi e\,\frac{\lambda_1+\lambda_2}{\lambda_1\lambda_2}\right).$$
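A quick sampling check of the variance-addition rule and the resulting entropy (our sketch; sample size arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
lam1, lam2, n = 2.0, 0.5, 200_000
x = rng.normal(0.0, 1/np.sqrt(lam1), n) + rng.normal(0.0, 1/np.sqrt(lam2), n)
print(x.var(), 1/lam1 + 1/lam2)                              # variances add
print(0.5 * np.log(2*np.pi*np.e * (lam1+lam2)/(lam1*lam2)))  # entropy of the sum
```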
Maximum Likelihood for a Gaussian
Suppose that we have a data set of observations D = (x1, . . . , xN)^T, representing N observations of the scalar random variable X. The observations are drawn independently from a Gaussian distribution whose mean μ and variance σ² are unknown.

$$\text{Likelihood function:} \quad p(D \mid \mu, \sigma^{2}) = \prod_{i=1}^{N} \mathcal{N}(x_i \mid \mu, \sigma^{2}).$$
Maximizing the log-likelihood gives

$$\mu_{ML} = \frac{1}{N}\sum_{n=1}^{N} x_n, \qquad \sigma^{2}_{ML} = \frac{1}{N}\sum_{n=1}^{N}\left(x_n - \mu_{ML}\right)^{2} = \frac{1}{N}\sum_{n=1}^{N}\left(x_n - \frac{1}{N}\sum_{m=1}^{N}x_m\right)^{2}.$$

Expanding the square,

$$\sigma^{2}_{ML} = \frac{1}{N}\sum_{n=1}^{N}x_n^{2} - \frac{2}{N^{2}}\sum_{n=1}^{N}x_n\sum_{m=1}^{N}x_m + \frac{1}{N^{2}}\sum_{m=1}^{N}\sum_{l=1}^{N}x_m x_l = \frac{1}{N}\sum_{n=1}^{N}x_n^{2} - \frac{1}{N^{2}}\sum_{m=1}^{N}\sum_{l=1}^{N}x_m x_l.$$

Taking expectations with $\mathbb{E}[x_m x_l] = \mu^{2} + \delta_{ml}\sigma^{2}$:

$$\mathbb{E}\!\left[\sigma^{2}_{ML}\right] = \left(\mu^{2}+\sigma^{2}\right) - \frac{1}{N^{2}}\left[N(N-1)\mu^{2} + N\left(\mu^{2}+\sigma^{2}\right)\right] = \frac{N-1}{N}\,\sigma^{2},$$

so the ML variance estimate is biased; multiplying by N/(N−1) gives the unbiased estimate.
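The bias result is easy to confirm by simulation; a small NumPy sketch (ours) averaging σ²_ML over many synthetic data sets:

```python
import numpy as np

rng = np.random.default_rng(2)
mu_true, sigma2_true, N = 3.0, 4.0, 5

# Draw 100,000 data sets of size N and average the ML variance estimate.
X = rng.normal(mu_true, np.sqrt(sigma2_true), size=(100_000, N))
s2_ml = ((X - X.mean(axis=1, keepdims=True)) ** 2).mean(axis=1)
print(s2_ml.mean())                  # ~ (N-1)/N * sigma^2 = 3.2
print((N - 1) / N * sigma2_true)     # 3.2
```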
For the multivariate case,

$$\ln p(D \mid \mu, \Sigma) = -\frac{ND}{2}\ln(2\pi) - \frac{N}{2}\ln|\Sigma| - \frac{1}{2}\sum_{n=1}^{N}(x_n-\mu)^{T}\Sigma^{-1}(x_n-\mu),$$

$$\mu_{ML} = \frac{1}{N}\sum_{n=1}^{N} x_n, \qquad \Sigma_{ML} = \frac{1}{N}\sum_{n=1}^{N}(x_n-\mu_{ML})(x_n-\mu_{ML})^{T}.$$

Here we used:

$$\frac{\partial \ln|A|}{\partial A} = \left(A^{-1}\right)^{T}, \qquad |A^{-1}| = |A|^{-1}, \qquad \operatorname{tr}(AB) = \operatorname{tr}(BA).$$
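A minimal sketch of these estimators in NumPy (the function name mvn_mle is ours):

```python
import numpy as np

def mvn_mle(X):
    """ML estimates for a multivariate Gaussian: sample mean and the
    (biased, 1/N) sample covariance, matching the formulas above."""
    N = X.shape[0]
    mu = X.mean(axis=0)
    Xc = X - mu
    Sigma = (Xc.T @ Xc) / N
    return mu, Sigma

rng = np.random.default_rng(3)
X = rng.multivariate_normal([1.0, -2.0], [[2.0, 0.5], [0.5, 1.0]], size=50_000)
mu_hat, Sigma_hat = mvn_mle(X)
print(mu_hat)     # ~ [1, -2]
print(Sigma_hat)  # ~ [[2, 0.5], [0.5, 1]]
```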
Appendix: Some Useful Matrix Operations
Show that

$$\frac{\partial}{\partial A}\operatorname{Tr}(AB) = B^{T} \qquad \text{and} \qquad \frac{\partial}{\partial A}\operatorname{Tr}(A^{T}B) = B.$$

Indeed,

$$\frac{\partial}{\partial A_{mn}}\operatorname{Tr}(AB) = \frac{\partial}{\partial A_{mn}}\sum_{i,k} A_{ik}B_{ki} = B_{nm} \quad\Rightarrow\quad \frac{\partial}{\partial A}\operatorname{Tr}(AB) = B^{T}.$$

Show that

$$\frac{\partial}{\partial A}\ln|A| = \left(A^{-1}\right)^{T}.$$
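Both identities can be verified by finite differences; a sketch (the helper num_grad is illustrative, and A is made symmetric positive definite so that ln|A| is well defined):

```python
import numpy as np

rng = np.random.default_rng(4)
A0 = rng.normal(size=(3, 3))
A = A0 @ A0.T + 3.0 * np.eye(3)   # SPD, so det(A) > 0
B = rng.normal(size=(3, 3))
eps = 1e-6

def num_grad(f, A):
    """Central finite-difference gradient of a scalar matrix function f at A."""
    G = np.zeros_like(A)
    for m in range(A.shape[0]):
        for n in range(A.shape[1]):
            E = np.zeros_like(A); E[m, n] = eps
            G[m, n] = (f(A + E) - f(A - E)) / (2 * eps)
    return G

print(np.allclose(num_grad(lambda M: np.trace(M @ B), A), B.T, atol=1e-5))
print(np.allclose(num_grad(lambda M: np.log(np.linalg.det(M)), A),
                  np.linalg.inv(A).T, atol=1e-5))
```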
The ML estimate of the mean can be computed sequentially. Isolating the contribution of the last data point,

$$\mu_{ML}^{(N)} = \frac{1}{N}\sum_{n=1}^{N}x_n = \frac{1}{N}x_N + \frac{N-1}{N}\,\mu_{ML}^{(N-1)} = \mu_{ML}^{(N-1)} + \frac{1}{N}\left(x_N - \mu_{ML}^{(N-1)}\right).$$
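The sequential update in code; a running-mean sketch (ours) that reproduces the batch mean exactly:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(2.0, 1.0, 1000)

mu = 0.0
for N, xN in enumerate(x, start=1):
    mu += (xN - mu) / N      # mu^(N) = mu^(N-1) + (x_N - mu^(N-1)) / N
print(mu, x.mean())          # identical up to floating point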
In updates of this form, the coefficient (1/N, or more generally $a_{N-1}$) plays the role of a learning rate and the bracketed difference $\left(x_N - \mu_{ML}^{(N-1)}\right)$ is the error signal.

* Effectively, we don't know the regression function f(θ), but we have data on a noisy version z of it. We take the regression function to be the expectation $\mathbb{E}[z \mid \theta]$.

Robbins, H. and S. Monro (1951). A stochastic approximation method. Annals of Mathematical Statistics 22, 400–407.
Fukunaga, K. (1990). Introduction to Statistical Pattern Recognition (Second ed.). Academic Press.
Robbins-Monro Algorithm
The regression function is defined as the conditional expectation

$$f(\theta) \equiv \mathbb{E}[z \mid \theta] = \int z\,p(z \mid \theta)\,dz,$$

and the Robbins-Monro iteration for finding a root of f is

$$\theta^{(N)} = \theta^{(N-1)} + a_{N-1}\,z\!\left(\theta^{(N-1)}\right).$$

For the Gaussian mean, take z to be the derivative of the log-likelihood of a single observation,

$$z = \frac{\partial}{\partial \mu_{ML}}\ln p\!\left(x \mid \mu_{ML}, \sigma^{2}\right) = \frac{x - \mu_{ML}}{\sigma^{2}},$$

so that the update becomes

$$\mu_{ML}^{(N)} = \mu_{ML}^{(N-1)} + a_{N-1}\,\frac{x_N - \mu_{ML}^{(N-1)}}{\sigma^{2}}.$$

Since $p(z \mid \mu_{ML})$ is a Gaussian, the regression function is

$$f(\mu_{ML}) = \mathbb{E}[z \mid \mu_{ML}] = \frac{\mu - \mu_{ML}}{\sigma^{2}},$$

which vanishes at the ML solution. Choosing $a_{N-1} = \sigma^{2}/N$ recovers the sequential formula above.
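A sketch (ours) of the Robbins-Monro iteration for the mean; with the choice a_{N−1} = σ²/N it collapses to the running mean, as noted above:

```python
import numpy as np

rng = np.random.default_rng(6)
mu_true, sigma2 = 2.0, 1.5
x = rng.normal(mu_true, np.sqrt(sigma2), 5000)

mu = 0.0
for N, xN in enumerate(x, start=1):
    a = sigma2 / N                 # a_{N-1} = sigma^2 / N
    mu += a * (xN - mu) / sigma2   # Robbins-Monro step with z = (x - mu)/sigma^2
print(mu, x.mean())                # agree exactly
```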
The variance can be treated in the same way. With the mean μ known, the sequential form of the ML estimate is

$$\sigma^{2}_{(N)} = \frac{1}{N}\sum_{n=1}^{N}(x_n-\mu)^{2} = \frac{N-1}{N}\,\sigma^{2}_{(N-1)} + \frac{1}{N}(x_N-\mu)^{2} = \sigma^{2}_{(N-1)} + \frac{1}{N}\left[(x_N-\mu)^{2} - \sigma^{2}_{(N-1)}\right].$$

If we substitute the expression for the Gaussian likelihood into the Robbins-Monro procedure for maximizing the likelihood:

$$\sigma^{2}_{(N)} = \sigma^{2}_{(N-1)} + a_{N-1}\,\frac{\partial}{\partial \sigma^{2}_{(N-1)}}\ln p\!\left(x_N \mid \mu, \sigma^{2}_{(N-1)}\right) = \sigma^{2}_{(N-1)} + a_{N-1}\left[-\frac{1}{2\sigma^{2}_{(N-1)}} + \frac{(x_N-\mu)^{2}}{2\sigma^{4}_{(N-1)}}\right].$$

The two coincide for the choice $a_{N-1} = 2\sigma^{4}_{(N-1)}/N$.
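The same check for the variance update (our sketch); starting from an arbitrary guess, the first Robbins-Monro step overwrites it and the iterate then tracks the batch estimate exactly:

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma2_true = 0.0, 2.5
x = rng.normal(mu, np.sqrt(sigma2_true), 5000)

s2 = 1.0                                 # arbitrary initial guess for sigma^2
for N, xN in enumerate(x, start=1):
    a = 2.0 * s2**2 / N                  # a_{N-1} = 2 sigma^4_{(N-1)} / N
    s2 += a * (-1.0 / (2 * s2) + (xN - mu)**2 / (2 * s2**2))
print(s2, np.mean((x - mu)**2))          # Robbins-Monro vs batch ML estimate
```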
After observing a single data point x1, the posterior for the mean (with the variance σ² known) is

$$\mu \mid x_1 \sim \mathcal{N}(\mu_1, \sigma_1^{2}) \quad \text{with} \quad \frac{1}{\sigma_1^{2}} = \frac{1}{\sigma_0^{2}} + \frac{1}{\sigma^{2}}, \;\text{i.e.}\; \sigma_1^{2} = \frac{\sigma_0^{2}\,\sigma^{2}}{\sigma_0^{2}+\sigma^{2}}, \quad \text{and} \quad \mu_1 = \sigma_1^{2}\left(\frac{x_1}{\sigma^{2}} + \frac{\mu_0}{\sigma_0^{2}}\right).$$

From the above expression note that σ1² is smaller than both σ0² and σ², and that μ1 is a convex combination of the observation and the prior mean:

$$\mu_1 = \frac{\sigma_1^{2}}{\sigma^{2}}\,x_1 + \frac{\sigma_1^{2}}{\sigma_0^{2}}\,\mu_0, \qquad \frac{\sigma_1^{2}}{\sigma^{2}} + \frac{\sigma_1^{2}}{\sigma_0^{2}} = 1.$$

For the predictive distribution we have:

$$f(x \mid x_1) = \int f(x \mid \mu)\,\pi(\mu \mid x_1)\,d\mu = \mathcal{N}\!\left(x \mid \mu_1,\; \sigma^{2} + \sigma_1^{2}\right).$$
With N observations X = (x1, . . . , xN), combining the likelihood with the prior gives

$$\pi(\mu \mid X) \propto \exp\!\left(-\frac{1}{2\sigma^{2}}\sum_{n=1}^{N}(x_n-\mu)^{2}\right) \exp\!\left(-\frac{1}{2\sigma_0^{2}}(\mu-\mu_0)^{2}\right) \propto \exp\!\left(-\frac{1}{2\sigma_N^{2}}(\mu-\mu_N)^{2}\right),$$

where the first factor is the likelihood, the second the prior, and the result is again Gaussian (the posterior).
Bayesian Inference for the Gaussian
So the posterior is a Gaussian as before,

$$\mu \mid X \sim \mathcal{N}(\mu_N, \sigma_N^{2}) \quad \text{with} \quad \frac{1}{\sigma_N^{2}} = \frac{1}{\sigma_0^{2}} + \frac{N}{\sigma^{2}}, \;\text{i.e.}\; \sigma_N^{2} = \frac{\sigma_0^{2}\,\sigma^{2}}{N\sigma_0^{2}+\sigma^{2}},$$

and

$$\mu_N = \sigma_N^{2}\left(\frac{\sum_{n=1}^{N} x_n}{\sigma^{2}} + \frac{\mu_0}{\sigma_0^{2}}\right) = \frac{N\sigma_0^{2}}{N\sigma_0^{2}+\sigma^{2}}\,\mu_{ML} + \frac{\sigma^{2}}{N\sigma_0^{2}+\sigma^{2}}\,\mu_0.$$
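A sketch of the conjugate update (the helper name is ours); it shows μ_N sliding from the prior mean toward μ_ML, and σ_N² shrinking, as N grows:

```python
import numpy as np

def posterior_mean_known_variance(x, mu0, sigma02, sigma2):
    """Posterior N(mu_N, sigma_N^2) for the mean of a Gaussian with known
    variance sigma2, under the conjugate prior N(mu0, sigma02)."""
    N = len(x)
    sigmaN2 = sigma02 * sigma2 / (N * sigma02 + sigma2)
    muN = sigmaN2 * (np.sum(x) / sigma2 + mu0 / sigma02)
    return muN, sigmaN2

rng = np.random.default_rng(8)
x = rng.normal(1.0, 1.0, 10)
for N in (1, 2, 10):
    print(N, posterior_mean_known_variance(x[:N], mu0=0.0, sigma02=1.0, sigma2=1.0))
```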
2 2 2 2
4.5
3.5
3
N=10
2.5
2
N=2