Lab 1
Department of Mathematics
Division of Mathematical Statistics
Mikael Andersson
The purpose of this computer exercise is to give you, for a comparatively simple situation, an
introduction to the general ideas behind MCMC simulation and how one implements it in
practice. These instructions are written for the software package Matlab,
but you may solve the exercise using other software.
The choice of Matlab is mainly due to pedagogical considerations. Today, there are
several programs developed for MCMC simulation (e.g. WinBUGS for Microsoft Windows),
but using such high-level software usually does not give any deeper insight into the details
behind the algorithms. In addition, Matlab is very efficient when it comes to extensive
numerical computations, random number generation and handling large data sets.
No previous experience of Matlab is required, although it would be a great advantage.
The syntax and the specific functions used will be introduced gradually as needed. For
those who wish to learn more about Matlab, there are extensive resources on the web, see
for example www.sgr.nada.kth.se/unix/software/matlab/. It is also possible to get a
short description of every defined function in Matlab using the command help followed by
the name of the function.
In “Workspace” you see all the variables you create, and in “Current Directory” you see all files
in the current directory. To terminate the session, either type exit in the “Command
Window” or open the menu “File” at the top and choose “Exit MATLAB”.
A vector is entered by listing its entries, for example
>> vec=[1 2 3 4]
vec =
1 2 3 4
A matrix is created in a similar way, with semicolons separating the rows (the exact entries used here are unimportant):
>> mat=[1.3 2.7 4.2; 5.6 1.9 8.4]
mat =
1.3000 2.7000 4.2000
5.6000 1.9000 8.4000
To access particular entries from these variables you use indices like
>> vec(2)
ans =
2
>> mat(2,3)
ans =
8.4000
>> vec(2:4)
ans =
2 3 4
>> mat(1:2,2:3)
ans =
2.7000 4.2000
1.9000 8.4000
This is a very crude introduction to the structure of Matlab, but, as mentioned, we will
introduce more aspects and functions as we go along.
3 MCMC simulation of the Normal-Inverse-χ² distribution
As an introduction to MCMC simulation, we will consider the situation with normally
distributed data with unknown mean and variance. As we have seen in the lectures and
in the book, Section 3.3, the family of Normal-Inverse-χ² distributions is conjugate for
the parameter vector θ = (µ, σ²) in the normal distribution. This comparatively simple
situation can be handled analytically, but it is a good idea to try out a simulation
algorithm on a model without too many parameters to keep track of.
In this exercise, we will focus on the case with the non-informative prior distribution
p(µ, σ²) ∝ 1/σ²
according to Section 3.2 in the book. This implies the posterior distribution
p(µ, σ²|y) ∝ (1/σ²)^(n/2+1) exp( −[(n − 1)s² + n(ȳ − µ)²] / (2σ²) )
where ȳ and s² are the average and sample variance of the sample y = (y₁, y₂, . . . , yₙ).
As an application with a biostatistical connection, we will use data from a clinical trial
where the dissolving time of a certain substance in stomach acid from eight patients was
measured. The result (in seconds) was
First create the vector y in Matlab consisting of these observations and calculate the average
and sample variance using the functions mean and var. (Write help mean and help var
for short descriptions of these functions.)
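For example, once the vector y has been entered (the variable names ybar and s2 are my choice here; they are reused further on):
>> ybar=mean(y)
>> s2=var(y)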
3.1 The Metropolis-Hastings algorithm
We are now going to construct an algorithm for simulation of the posterior distribution for
given data. Let us first rename our parameters as θ₁ = µ and θ₂ = σ², which yields more
convenient notation, and let n = 8. The posterior distribution can now be written
p(θ₁, θ₂|y) ∝ θ₂^(−5) exp( −[7s² + 8(ȳ − θ₁)²] / (2θ₂) )
We will now construct the algorithm such that the parameters are updated one at a
time instead of simultaneously. Let us start with the jump distributions.
We will use the uniform distribution for θ1 , mostly because it is easy to simulate but
also because it is symmetric. If we are at step t in the Markov chain, we simulate a new
value for θ1 according to
θ₁* ∼ U[θ₁^(t−1) − d₁, θ₁^(t−1) + d₁]

where U[x₁, x₂] denotes the uniform distribution on the interval x₁ ≤ x ≤ x₂ and d₁ denotes
the maximal jump length. An alternative way to express this is
θ₁* = θ₁^(t−1) + X

where X ∼ U[−d₁, d₁]. Since θ₂ is a variance and must stay positive, we instead simulate a
new value of the transformed parameter φ = log θ₂ according to

φ* = φ^(t−1) + Y
where Y ∼ U[−d₂, d₂]. To express the jump distribution for θ₂ in terms of the jump
distribution for φ we use the relation

Jₜ(θ₂* | θ₂^(t−1)) = Jₜ(φ* | φ^(t−1)) |dφ/dθ₂| = Jₜ(φ* | φ^(t−1)) · (1/θ₂*)
The ratio between the densities for the jump distributions used in the Metropolis-Hastings
algorithm can now be written
Jₜ(θ₂^(t−1) | θ₂*) / Jₜ(θ₂* | θ₂^(t−1)) = θ₂* / θ₂^(t−1)

since the jump distribution for φ is symmetric. To simulate a new value for θ₂ at step t, we thus proceed as follows:
a. Simulate Y ∼ U[−d₂, d₂].
b. Simulate φ* = φ^(t−1) + Y.
c. Set θ₂* = exp(φ*).
Let us now try out a couple of steps in the algorithm. First we have to choose starting
values for the parameters, for example θ₁⁰ = 45 and θ₂⁰ = 2. In this simple situation we
know that the distribution is centered at the point (ȳ, s²), so to get fast convergence we
choose a starting point close to this. To be able to handle data in a convenient way, we
store the simulated values in the vectors theta1 and theta2. We begin by entering the
starting values
>> theta1(1)=45
theta1 =
45
>> theta2(1)=2
theta2 =
2
Vectors are indexed from 1 and up, so θ₁⁰ corresponds to theta1(1), θ₁¹ corresponds to
theta1(2), and so on.
Let d₁ = 0.1 and simulate θ₁* as
>> d1=0.1
d1 =
0.1000
>> t1=theta1(1)+(rand-0.5)*2*d1
t1 =
45.0315
Since this is a random number, you will most likely get something else. (Note that rand returns a uniform random number on [0, 1], so (rand-0.5)*2*d1 is uniform on [−d₁, d₁].)
In the next step, we are going to calculate the ratio
r = p(θ₁*, θ₂⁰ | y) / p(θ₁⁰, θ₂⁰ | y)
for the starting values θ₁⁰, θ₂⁰ and our simulated value θ₁*; since the jump distribution for θ₁
is symmetric, the jump densities cancel from the ratio. Use the expression for the posterior
distribution p(θ₁, θ₂|y) and calculate r (a sketch of this calculation is given below). Then
generate a uniform random number between 0 and 1 and check whether it is smaller than r.
In Matlab you can use an if-statement as
>> if rand<r
theta1(2)=t1
else
theta1(2)=theta1(1)
end
theta1 =
45.0000 45.0315
If the condition after if is satisfied then the first command is executed, if not then the
second command after else is executed. In this case, the new value θ1∗ was accepted and
hence the vector theta1 was updated with the new value 45.0315.
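In the step above, r can be computed from the unnormalized posterior. A minimal sketch, using an anonymous function and assuming that ybar=mean(y) and s2=var(y) were stored earlier:
>> post=@(m,v) v^(-5)*exp(-(7*s2+8*(ybar-m)^2)/(2*v));
>> r=post(t1,theta2(1))/post(theta1(1),theta2(1));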
Let d2 = 0.2 and simulate θ2∗ as
>> d2=0.2
d2 =
0.2000
>> phi=log(theta2(1))+(rand-0.5)*2*d2
phi =
0.6065
>> t2=exp(phi)
t2 =
1.8339
Then, calculate the ratio
r = [p(θ₁¹, θ₂* | y) / Jₜ(θ₂* | θ₂⁰)] / [p(θ₁¹, θ₂⁰ | y) / Jₜ(θ₂⁰ | θ₂*)] = [p(θ₁¹, θ₂* | y) · θ₂*] / [p(θ₁¹, θ₂⁰ | y) · θ₂⁰]
and test, in the same way as above, whether a random number between 0 and 1 is smaller than r.
Finally, update the vector theta2 depending on the result, in a similar way as above (see the sketch below).
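A minimal sketch of this update, reusing the anonymous function post from above (here theta1(2) holds θ₁¹, t2 holds θ₂* and theta2(1) holds θ₂⁰):
>> r=post(theta1(2),t2)*t2/(post(theta1(2),theta2(1))*theta2(1));
>> if rand<r
theta2(2)=t2
else
theta2(2)=theta2(1)
end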
This will produce vectors similar to
>> theta1
theta1 =
45.0000 45.0315
>> theta2
theta2 =
2.0000 1.8339
To reach (almost) convergence of the Markov chain constructed in this way, we naturally
have to repeat these steps a large number of times. To do this more conveniently, we will
now define a new function in Matlab that simulates a Markov chain a fixed number of
steps. Start by clicking the menu “File” in the top left corner in the “Command Window”,
then click “New” and finally “M-file”. A new window then appears, in which we will write our
function.
Write on the first line
function [z1,z2]=MHalg(N,y,theta10,theta20,d1,d2)
The command function says that we are going to define a function with the name MHalg,
where N,y,theta10,theta20,d1,d2 are the arguments of the function and z1 and z2 are
the results, in our case the simulated parameters. We start as before by entering the
starting values and introduce a step counter t as
function [z1,z2]=MHalg(N,y,theta10,theta20,d1,d2)
theta1(1)=theta10;
theta2(1)=theta20;
t=0;
Putting a semicolon (;) at the end of a line has the effect that the result of the operation
on that line is not written out in “Command Window”. Since we are going to carry out
several operations hundreds and even thousands of times, the “Command Window” would
be jammed by all results.
The argument N denotes the number of steps we are going to simulate. To get the
function to repeat the following operations the required number of times, we can use a
while-statement as
while t<N
operation 1
operation 2
...
operation k
end
Write all the steps that you carried out earlier for the simulation of θ₁¹, but remember to use
the step counter t in the right way, as
t1=theta1(t+1)+(rand-0.5)*2*d1;
and
theta1(t+2)=t1;
Write the corresponding steps for the simulation of θ₂¹, using the step counter in the same way
as before, update the vectors again, and increase the step counter as
t=t+1;
Finally, after the end of the while-loop, write
z1=theta1;
z2=theta2;
so that the function yields the vectors as results. When all this is done you click “File”
again, then “Save As” and write MHalg.m on the line “Enter file name:”. Do not forget the
extension .m at the end, so that Matlab will recognise the file.
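For reference, here is a minimal sketch of what the complete file might look like; your version may well differ in details (in this sketch the sample mean and variance are computed inside the function, and the unnormalized posterior is written for n = 8 as above):
function [z1,z2]=MHalg(N,y,theta10,theta20,d1,d2)
% Metropolis-Hastings simulation of the posterior p(theta1,theta2|y)
% for normal data with the non-informative prior and n=8
ybar=mean(y);
s2=var(y);
post=@(m,v) v^(-5)*exp(-(7*s2+8*(ybar-m)^2)/(2*v)); % unnormalized posterior
theta1(1)=theta10;
theta2(1)=theta20;
t=0;
while t<N
    % update theta1 with a symmetric uniform jump
    t1=theta1(t+1)+(rand-0.5)*2*d1;
    r=post(t1,theta2(t+1))/post(theta1(t+1),theta2(t+1));
    if rand<r
        theta1(t+2)=t1;
    else
        theta1(t+2)=theta1(t+1);
    end
    % update theta2 via a symmetric uniform jump on phi=log(theta2)
    phi=log(theta2(t+1))+(rand-0.5)*2*d2;
    t2=exp(phi);
    r=post(theta1(t+2),t2)*t2/(post(theta1(t+2),theta2(t+1))*theta2(t+1));
    if rand<r
        theta2(t+2)=t2;
    else
        theta2(t+2)=theta2(t+1);
    end
    t=t+1;
end
z1=theta1;
z2=theta2;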
By calling the function in “Command Window” as
>> [z1,z2]=MHalg(100,y,45,2,0.1,0.2)
we get a simulated Markov chain in 100 steps with starting values 45 and 2 and maximal
jump lengths 0.1 and 0.2 respectively. Try this and plot the results using
>> plot(0:100,z1)
>> plot(0:100,z2)
>> plot(z1,z2)
It is also possible to illustrate the simulated parameters in histograms, which can be done
in Matlab as
>> hist(z1,10)
>> hist(z2,10)
The second argument, in this example 10, denotes the number of bars in the histogram. A
simple way to test if the simulated values are correct is to calculate the sample means
>> mean(z1)
>> mean(z2)
and compare to ȳ and s². Remember that the mean of the inverse χ²-distribution is not
equal to s² but to (n − 1)s²/(n − 3).
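With n = 8 this reference value is 7s²/5, which can be computed directly as, for example:
>> 7*var(y)/5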
To check the convergence of the algorithm, we simulate several chains from different starting points and compare them. Create a new M-file, in the same way as before, containing the commands
[z1,z2]=MHalg(200,y,45,2,0.1,0.2);
psi1(1,1:100)=z1(102:201);
psi2(1,1:100)=z2(102:201);
[z1,z2]=MHalg(200,y,45.5,2,0.1,0.2);
psi1(2,1:100)=z1(102:201);
psi2(2,1:100)=z2(102:201);
[z1,z2]=MHalg(200,y,45,10,0.1,0.2);
psi1(3,1:100)=z1(102:201);
psi2(3,1:100)=z2(102:201);
[z1,z2]=MHalg(200,y,45.5,10,0.1,0.2);
psi1(4,1:100)=z1(102:201);
psi2(4,1:100)=z2(102:201);
Note that each chain is run for 200 steps but only the last 100 values are kept, discarding the
first half as burn-in. I have chosen four fairly scattered starting points based on the results
from the first simulation. Save the file as steplengths.m; to run the operations, you then simply write
>> steplengths
in “Command Window”.
We have now obtained two matrices psi1 and psi2 consisting of four rows and 100
columns each. Using the Matlab functions mean, var and sum, we can now calculate the
variances between groups and within groups as
>> B1=100/3*sum(power(mean(psi1’)-mean(mean(psi1)),2))
B1 =
0.9307
>> B2=100/3*sum(power(mean(psi2’)-mean(mean(psi2)),2))
B2 =
163.9868
and
>> W1=mean(var(psi1’))
W1 =
0.0206
>> W2=mean(var(psi2’))
W2 =
0.8309
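Here B and W are the between-sequence and within-sequence variances from the book: with m = 4 chains of n = 100 kept values each,

B = (n/(m − 1)) Σⱼ (ψ̄ⱼ − ψ̄)²  and  W = (1/m) Σⱼ s²ⱼ

where ψ̄ⱼ and s²ⱼ are the mean and variance within chain j and ψ̄ is the overall mean; this is exactly what the commands above compute.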
Again, it is quite possible that you will get other values, but the difference should not be
too large. We get the scale reductions
>> R1=sqrt((99/100*W1+1/100*B1)/W1)
R1 =
1.2010
>> R2=sqrt((99/100*W2+1/100*B2)/W2)
R2 =
1.7215
and the effective numbers of simulations

>> neff1=4*100*(99/100*W1+1/100*B1)/B1
neff1 =
12.7543
>> neff2=4*100*(99/100*W2+1/100*B2)/B2
neff2 =
6.0065
According to my simulations, we satisfy neither the condition of a low scale reduction nor
that of a large effective number of simulations. Try increasing the number of simulations until
these criteria are satisfied. Also try longer or shorter jump lengths d1 and d2 to increase
the efficiency of the algorithm. Another convenient feature of Matlab is that previous
commands can be recalled by repeatedly pressing the “arrow up” key.
Once the chains have converged, posterior means for the parameters are easily obtained as
>> mean(z1)
>> mean(z2)
Posterior probability intervals are almost as easily obtained by sorting all values
>> z1=sort(z1);
>> z2=sort(z2);
and removing the 2.5 % smallest and 2.5 % largest values (if we want a 95 % interval). If
we have simulated 1000 values, we get the limits
>> z1(25)
>> z1(975)
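For a general number of simulated values, the limits can be picked out as in this sketch (assuming the sorted values are in z1):
>> N=length(z1);
>> z1(round(0.025*N))
>> z1(round(0.975*N))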
3.4 Exercise
The computer exercise must be summarised in a written report containing a print-out of
the file MHalg.m, plots and histograms of your simulated values, sample means and 95 %
probability intervals, and your choice of optimal jump lengths. It is also possible to send in the
report by e-mail to [email protected], preferably in pdf format.