
Introduction to

Bayesian Time Series Econometrics

Joshua Chan

Australian National University


5 July 2013
Instructor

Name: Joshua Chan (Josh or Dr. Chan)

Website:

http://people.anu.edu.au/joshua.chan/

Email:
[email protected]
Overview of the Workshop

Purpose
◦ prepare you for DSGE lectures
◦ go over basic Bayesian time series models and computations
◦ include MATLAB tutorials

Topics covered
◦ linear regression, autoregressive moving average models
◦ vector autoregressive (VAR) models
◦ state space models
◦ Bayesian model comparison
Computation techniques
◦ Monte Carlo simulation
◦ Markov chain Monte Carlo: Gibbs sampling,
Metropolis-Hastings algorithm

Logistics
◦ 8 hours of lecture (9am to 1pm, two days)
◦ 6 hours of computer lab/exercises (2pm to 5pm, two days)
Plan for Today

Four 50-minute Sessions


◦ Intro to Bayesian Econometrics
◦ 1-parameter normal model (Monte Carlo integration)
◦ 2-parameter normal model (Gibbs sampler)
◦ linear regression with iid errors
◦ linear regression with moving average errors (MH algorithm)
Recommended Readings
Textbooks
◦ Koop (2003) Bayesian Econometrics, Wiley
◦ Koop, Poirier and Tobias (2007) Bayesian Econometric
Methods, Cambridge University Press
◦ Geweke (2005) Contemporary Bayesian Econometrics and
Statistics, Wiley

Survey papers
◦ Del Negro and Schorfheide (2011) “Bayesian
Macroeconometrics,” in Geweke, Koop and van Dijk (eds):
The Oxford Handbook of Bayesian Econometrics, Oxford
University Press
◦ Koop and Korobilis (2010) “Bayesian Multivariate Time Series
Methods for Empirical Macroeconomics,” Foundations and
Trends in Econometrics, 3(4): 267-358
Introduction

Frequentist inference
◦ frequentist interpretation of probabilities—an event’s
probability as the limit of its relative frequency in a large
number of trials
◦ θ is thought to be an unknown, but fixed quantity
◦ knowledge of θ is obtained from an observed sample
y1 , . . . , yn (only)
◦ usually summarized by the likelihood function

L(θ; y) = f (y | θ)
Bayesian inference
◦ probability is a (subjective) measure of the “degree of belief”
of the individual assessing the event
◦ θ is a random quantity and has a distribution f (θ) called the
prior distribution
◦ knowledge of θ comes from an observed sample y1 , . . . , yn and
the prior distribution
◦ the goal of the analysis is to obtain the posterior distribution
f (θ | y) using Bayes’ Theorem

f(θ | y) = f(θ) f(y | θ) / m(y) ∝ f(θ) f(y | θ)

where m(y) = ∫ f(θ) f(y | θ) dθ is called the marginal likelihood
◦ then compute E(θ | y), Cov(θ | y), P(θj > 0)
1-parameter Normal Model

Suppose we take n measurements y1 , . . . , yn of an unknown


quantity µ where the magnitude of measurement error is known

From a small-scale study µ is estimated to be around µ0 (but with


substantial uncertainty)

One model is

(yi | µ) ∼ N(µ, σ²), i = 1, . . . , n,
µ ∼ N(µ0, σ0²),

where σ², µ0, and σ0² are known


Recall that for x ∼ N(a, b²), the density of x is

f(x) = (2πb²)^(−1/2) exp{ −(x − a)²/(2b²) } ∝ exp{ −(1/2) [ (1/b²) x² − 2x (a/b²) ] }

What do we learn from the new measurements?

All the information is summarized by the posterior distribution

f(µ | y) = f(µ) f(y | µ) / m(y),

where y = (y1, . . . , yn)′
Computation: Overview

Immediate goal: derive f (µ | y)

Ignore any constants in f (µ | y) not involving µ

Then try to figure out what density f (µ | y) is


Computation: Details

Now do the computations...


f(µ | y) ∝ f(µ) f(y | µ)
        ∝ exp{ −(µ − µ0)²/(2σ0²) } × Π_{i=1}^n exp{ −(yi − µ)²/(2σ²) }
        ∝ exp{ −(1/2) [ (µ² − 2µµ0)/σ0² + (−2µ Σ_i yi + nµ²)/σ² ] }
        ∝ exp{ −(1/2) [ (1/σ0² + n/σ²) µ² − 2µ (µ0/σ0² + nȳ/σ²) ] }

The exponent is quadratic in µ, hence (µ | y) ∼ N(µ̂, Dµ)
Computation: Details

Now compare

f(µ | y) ∝ exp{ −(1/2) [ (1/σ0² + n/σ²) µ² − 2µ (µ0/σ0² + nȳ/σ²) ] }

with

f(µ | y) ∝ exp{ −(1/2) [ (1/Dµ) µ² − 2µ (µ̂/Dµ) ] }

We have

Dµ = (1/σ0² + n/σ²)⁻¹,   µ̂ = Dµ (µ0/σ0² + nȳ/σ²)
Posterior Mean Interpretation

In particular, the posterior mean is

E(µ | y) = µ̂ = (1/σ0² + n/σ²)⁻¹ (µ0/σ0² + nȳ/σ²)
         = [ (1/σ0²) / (1/σ0² + n/σ²) ] µ0 + [ (1/(σ²/n)) / (1/σ0² + n/σ²) ] ȳ,

a weighted average of the prior mean µ0 and the MLE ȳ


Numerical Examples

Suppose µ0 = 10, n = 3, ȳ = 20 and σ 2 = 1

Case 1: σ02 = 0.1; Case 2: σ02 = 10

[Figure: prior and posterior densities of µ. Left panel: Case 1 (σ0² = 0.1); right panel: Case 2 (σ0² = 10).]
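The figure can be reproduced in a few lines of MATLAB. This is only a sketch based on the posterior formulas above; the variable names are mine, and normpdf requires the Statistics Toolbox.

% sketch: prior vs posterior densities of mu for the two cases
mu0 = 10; n = 3; ybar = 20; sig2 = 1;
mugrid = linspace(0,25,500)';
for sig02 = [0.1 10]                        % Case 1 and Case 2
    Dmu   = 1/(1/sig02 + n/sig2);           % posterior variance
    muhat = Dmu*(mu0/sig02 + n*ybar/sig2);  % posterior mean
    prior = normpdf(mugrid, mu0, sqrt(sig02));
    post  = normpdf(mugrid, muhat, sqrt(Dmu));
    figure; plot(mugrid, prior, '--', mugrid, post, '-');
    legend('prior','posterior');
end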
Quantities of Interest

Since (µ | y) ∼ N(µ̂, Dµ), can easily compute Var(µ | y), P(µ > 0 | y), a 95% credible set for µ, etc.
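For instance, a sketch of these analytic computations for Case 2 (normcdf and norminv are Statistics Toolbox functions; the variable names are mine):

% sketch: analytic posterior quantities for Case 2
mu0 = 10; sig02 = 10; n = 3; ybar = 20; sig2 = 1;
Dmu   = 1/(1/sig02 + n/sig2);
muhat = Dmu*(mu0/sig02 + n*ybar/sig2);
prob_pos = 1 - normcdf(0, muhat, sqrt(Dmu));        % P(mu > 0 | y)
ci95     = norminv([.025 .975], muhat, sqrt(Dmu));  % 95% credible set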

But suppose we wish to obtain E(g (µ) | y) < ∞ for some function
g

Can estimate that using Monte Carlo integration


Monte Carlo Integration

First note that

E(g(µ) | y) = ∫ g(µ) f(µ | y) dµ

(cannot be computed analytically in general)

Monte Carlo integration: generate R draws µ^(1), . . . , µ^(R) from f(µ | y), and compute

ĝ = (1/R) Σ_{r=1}^R g(µ^(r))

By the weak law of large numbers, ĝ converges in probability to E(g(µ) | y) < ∞
Numerical Examples (Continued)

Consider Case 2. Then µ̂ = 19.67 and Dµ = 0.3226

Suppose g(x) = log |x|

Recall that if Z ∼ N(0, 1), then Y = a + bZ follows the N(a, b²) distribution


MATLAB Code

mu0 = 10; sig02 = 10; n=3; ybar = 20; sig2 = 1;


Dmu = 1/(1/sig02 + n/sig2);
muhat = Dmu*(mu0/sig02 + n*ybar/sig2);
R = 10000;
mu = muhat + sqrt(Dmu)*randn(R,1);
ghat = mean(log(abs(mu)));
2-parameter Normal Model

Now we extend the previous model so that σ 2 is unknown

The model is again

(yi | µ) ∼ N(µ, σ 2 ), i = 1, . . . , n,

where both µ and σ 2 are unknown

The same prior for µ : N(µ0 , σ02 )

Need a prior for σ 2 : InvGamma(ν0 , S0 )


Inverse-Gamma Distribution

A random variable Z is said to have an inverse-gamma distribution


with shape parameter α > 0 and scale parameter β > 0 if its pdf is
given by
f(z; α, β) = (β^α / Γ(α)) z^(−(α+1)) e^(−β/z)

We write Z ∼ InvGamma(α, β)

Moments: EZ = β/(α − 1) for α > 1, Var(Z) = β²/((α − 1)²(α − 2)) for α > 2
Sampling from the Inverse-Gamma Distribution

Sample X ∼ Gamma(α, β) (shape α, rate β) and set Z = 1/X

Exercise: Show that the pdf of Z is

f(z; α, β) = (β^α / Γ(α)) z^(−(α+1)) e^(−β/z)

where the pdf of X is

f(x; α, β) = (β^α / Γ(α)) x^(α−1) e^(−βx)
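In MATLAB this is one line. Note that gamrnd uses the shape/scale parameterization, so a rate of β corresponds to a scale of 1/β; this is the same construction used in the Gibbs samplers below. A small sketch with illustrative values:

% sketch: draw R inverse-gamma variates with shape alpha and rate beta
alpha = 3; beta = .5; R = 10000;        % illustrative values
Z = 1./gamrnd(alpha, 1/beta, R, 1);     % gamrnd takes shape and scale
mean(Z)                                 % should be close to beta/(alpha-1) = 0.25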
The Joint Posterior Distribution

By Bayes’ Theorem, the joint posterior is given by

f(µ, σ² | y) ∝ f(µ, σ², y)
            ∝ f(µ) f(σ²) f(y | µ, σ²)
            ∝ exp{ −(µ − µ0)²/(2σ0²) } × (σ²)^(−(ν0+1)) e^(−S0/σ²) × (σ²)^(−n/2) exp{ −(1/(2σ²)) Σ_{i=1}^n (yi − µ)² }

We wish to obtain E(µ | y), Var(σ² | y), or the quantile µq such that P(µ > µq | y) = q

But those quantities cannot be obtained analytically


Monte Carlo Simulation

Idea: Use Monte Carlo methods. Specifically, obtain draws from f(µ, σ² | y), say, µ^(1), . . . , µ^(R) and σ^2(1), . . . , σ^2(R)

Then compute

(1/R) Σ_{r=1}^R µ^(r),    (1/R) Σ_{r=1}^R (σ^2(r) − σ̄²)²,    µ_(qR) (the qR-th smallest µ draw)

One method to do that is the Gibbs Sampler
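Given the stored draws, these estimates take a few lines of MATLAB; a sketch (the variable names mu_draws and sig2_draws are mine):

% sketch: posterior summaries from stored draws
mu_mean  = mean(mu_draws);                            % estimate of E(mu | y)
sig2_var = mean((sig2_draws - mean(sig2_draws)).^2);  % estimate of Var(sig2 | y)
q = .95; R = length(mu_draws);
sorted_mu = sort(mu_draws);
mu_q = sorted_mu(ceil(q*R));                          % estimate of the q-quantile of (mu | y)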


Gibbs Sampler: Overview

Suppose we want to sample from f(Θ) = f(θ1, . . . , θn) (the target distribution)

The Gibbs sampler constructs a Markov chain Θ^(1), Θ^(2), . . . such that the distribution of Θ^(r) converges to f

Use the full conditional distributions f(θi | θ1, . . . , θi−1, θi+1, . . . , θn) as the transition kernels
Gibbs Sampler: Details

Starting from an initial state Θ(0) , repeat the following steps from
r = 1 to R:
1. Given the current state Θ(r ) = Θ, generate Y = (Y1 , . . . , Yn )
as follows:
1.1 Draw Y1 ∼ f (y1 | θ2 , . . . , θn ).
1.2 Draw Yi ∼ f (yi | Y1 , . . . , Yi −1 , θi +1 , . . . , θn ), i = 2, . . . , n − 1.
1.3 Draw Yn ∼ f (yn | Y1 , . . . , Yn−1 ).
2. Set Θ(r +1) = Y.
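As a toy illustration of the algorithm (not from the slides), here is a minimal Gibbs sampler for a bivariate normal target with correlation rho, alternating between the two full conditionals:

% sketch: Gibbs sampler for a bivariate N(0, [1 rho; rho 1]) target
rho = 0.8; R = 10000;
theta = zeros(R,2); th1 = 0; th2 = 0;      % initial state
for r = 1:R
    th1 = rho*th2 + sqrt(1-rho^2)*randn;   % draw theta1 | theta2
    th2 = rho*th1 + sqrt(1-rho^2)*randn;   % draw theta2 | theta1
    theta(r,:) = [th1 th2];
end
corr(theta)                                % should be close to [1 rho; rho 1]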
Common Misconceptions
The Markov chain Θ^(1), Θ^(2), . . . does not converge to a fixed point in R^k

Rather, the distribution of Θ(r ) converges to the target distribution

An example:

[Figure: trace plot of 10,000 draws of a Markov chain; the draws fluctuate between roughly 2.85 and 3.2]
Histograms of the previous Markov chain:

[Figure: two histograms of the draws, both concentrated between roughly 2.9 and 3.1]
2-parameter Normal Model (Continued)

To construct a Gibbs sampler to sample from f (µ, σ 2 | y), we need


to derive f (µ | y, σ 2 ) and f (σ 2 | y, µ)

(1) f(σ² | y, µ):

f(σ² | y, µ) ∝ f(µ, σ² | y)
            ∝ (σ²)^(−(ν0+1)) e^(−S0/σ²) (σ²)^(−n/2) exp{ −(1/(2σ²)) Σ_{i=1}^n (yi − µ)² }
            ∝ (σ²)^(−(ν0+n/2+1)) exp{ −[S0 + Σ_{i=1}^n (yi − µ)²/2] / σ² }

A known distribution?
Compare

f(σ² | y, µ) ∝ (σ²)^(−(ν0+n/2+1)) exp{ −[S0 + Σ_{i=1}^n (yi − µ)²/2] / σ² }

with

f(σ²; α, β) ∝ (σ²)^(−(α+1)) e^(−β/σ²)

Hence,

(σ² | y, µ) ∼ InvGamma( ν0 + n/2, S0 + Σ_{i=1}^n (yi − µ)²/2 )
When σ² is given (i.e., known), the model reduces to the 1-parameter normal model

(2) f(µ | y, σ²):

f(µ | y, σ²) ∝ f(µ) f(y | µ, σ²)
            ∝ exp{ −(1/2) [ (1/σ0² + n/σ²) µ² − 2µ (µ0/σ0² + nȳ/σ²) ] }

Hence, (µ | y, σ²) ∼ N(µ̂, Dµ), where

Dµ = (1/σ0² + n/σ²)⁻¹,   µ̂ = Dµ (µ0/σ0² + nȳ/σ²)
Gibbs Sampler for the 2-parameter Normal Model

Pick some initial values µ^(0) = a0 and σ^2(0) = b0 > 0. Then, repeat the following steps from r = 1 to R:
1. Draw σ^2(r) ∼ f(σ² | y, µ^(r−1)) (inverse-gamma).
2. Draw µ^(r) ∼ f(µ | y, σ^2(r)) (normal).

Usually discard the first m draws as “burn-in”

Then, given µ^(1), . . . , µ^(R) and σ^2(1), . . . , σ^2(R), one can compute, e.g.,

(1/R) Σ_{r=1}^R µ^(r)

as an estimate of E(µ | y)
MATLAB Code

% norm_2para.m
nloop = 10000; burnin = 1000;
n = 500; mu = 3; sig2 = .5;
y = mu + sqrt(sig2)*randn(n,1);

% prior
mu0 = 0; sig20 = 100;
nu0 = 3; S0 = .5;

% initialize the Markov chain


mu = 0; sig2 = 1;
store_theta = zeros(nloop,2);
MATLAB Code

for loop=1:nloop
% sample mu
Dmu = 1/(1/sig20 + n/sig2);
muhat = Dmu*(mu0/sig20 + sum(y)/sig2);
mu = muhat + sqrt(Dmu)*randn;

% sample sig2
sig2 = 1/gamrnd(nu0+n/2,1/(S0+sum((y-mu).^2)/2));

% store the parameters


store_theta(loop,:) = [mu sig2];
end
store_theta = store_theta(burnin+1:end,:);
thetahat = mean(store_theta);
Linear Regression Model

Consider the linear regression model:

yt = β0 + xt,1 β1 + · · · + xt,k βk + ǫt , ǫt ∼ N(0, σ 2 ),

where xt = (1, xt,1 , . . . , xt,k )′ is a vector of regressors and


β = (β0 , β1 , . . . , βk )′ is the associated vector of regression
coefficients

(The 2-parameter normal model is a special case with only an intercept and β0 = µ)
Linear Regression Model: Matrix Form

Hence,

[ y1 ]   [ 1  x1,1  · · ·  x1,k ] [ β0 ]   [ ǫ1 ]
[ y2 ] = [ 1  x2,1  · · ·  x2,k ] [ β1 ] + [ ǫ2 ]
[ ⋮  ]   [ ⋮    ⋮            ⋮  ] [ ⋮  ]   [ ⋮  ]
[ yT ]   [ 1  xT,1  · · ·  xT,k ] [ βk ]   [ ǫT ]

Or equivalently,

y = Xβ + ǫ,   ǫ ∼ N(0, σ²IT),

where IT is the T × T identity matrix


Two Useful Results

(1) An affine transformation (linear transformation followed by a


translation) of normal random variables is a (multivariate) normal
random variable

(2) Suppose U has an expectation vector µU and covariance


matrix ΣU . Let V = AU. Then

µV = AµU , ΣV = AΣU A′ .
Since y is an affine transformation of ǫ ∼ N(0, σ²IT), y has a normal distribution

Moreover, given β and σ 2 , the expectation and covariance matrix


of y are Xβ and σ 2 IT

Therefore, we have

(y | β, σ 2 ) ∼ N(Xβ, σ 2 IT )
Linear Regression Model: Likelihood

Since
(y | β, σ 2 ) ∼ N(Xβ, σ 2 IT ),
the likelihood function is given by:
f(y | β, σ²) = |2πσ²IT|^(−1/2) exp{ −(1/2) (y − Xβ)′(σ²IT)⁻¹(y − Xβ) }
             = (2πσ²)^(−T/2) exp{ −(1/(2σ²)) (y − Xβ)′(y − Xβ) }

Recall that for an n × n matrix A and c ∈ R, |cA| = c^n |A|
Priors and Gibbs Sampler

Independent priors for β and σ 2 :

β ∼ N(β 0 , Vβ ), σ 2 ∼ InvGamma(ν0 , S0 )

Use Gibbs sampler to estimate the model

We need to derive (1) (σ 2 | y, β) and (2) (β | y, σ 2 )


Sample (σ 2 | y, β)

(1) f(σ² | y, β):

f(σ² | y, β) ∝ (σ²)^(−(ν0+1)) e^(−S0/σ²) (σ²)^(−T/2) exp{ −(1/(2σ²)) (y − Xβ)′(y − Xβ) }
            ∝ (σ²)^(−(ν0+T/2+1)) exp{ −[S0 + (y − Xβ)′(y − Xβ)/2] / σ² }

Hence,

(σ² | y, β) ∼ InvGamma( ν0 + T/2, S0 + (y − Xβ)′(y − Xβ)/2 )
Recall (AB)′ = B′A′, and hence

(y − Xβ)′(y − Xβ) = y′y − y′Xβ − β′X′y + β′X′Xβ
                  = β′X′Xβ − 2β′X′y + y′y,

since β′X′y is a scalar, so that β′X′y = (β′X′y)′ = y′Xβ
Sample (β | y, σ 2)

(2) f (β | y, σ 2 ):

f(β | y, σ²) ∝ f(β) f(y | β, σ²)
            ∝ exp{ −(1/2) (β − β0)′Vβ⁻¹(β − β0) } exp{ −(1/(2σ²)) (y − Xβ)′(y − Xβ) }
            ∝ exp{ −(1/2) (β′Vβ⁻¹β − 2β′Vβ⁻¹β0) } exp{ −(1/(2σ²)) (β′X′Xβ − 2β′X′y) }
            ∝ exp{ −(1/2) [ β′(Vβ⁻¹ + X′X/σ²)β − 2β′(Vβ⁻¹β0 + X′y/σ²) ] }

The exponent is quadratic in β. Hence,

(β | y, σ²) ∼ N(β̂, Dβ)
Now compare

f(β | y, σ²) ∝ exp{ −(1/2) [ β′(Vβ⁻¹ + X′X/σ²)β − 2β′(Vβ⁻¹β0 + X′y/σ²) ] }

with

f(β | y, σ²) ∝ exp{ −(1/2) (β′Dβ⁻¹β − 2β′Dβ⁻¹β̂) }

We have

Dβ = (Vβ⁻¹ + X′X/σ²)⁻¹,   β̂ = Dβ (Vβ⁻¹β0 + X′y/σ²)
Sampling from the Multivariate Normal Distribution

To generate R independent draws from N(µ, Σ) of dimension n,


carry out the following steps:
1. Compute the lower Cholesky factorization Σ = BB′ .
2. Generate Z = (Z1 , . . . , Zn )′ by drawing Z1 , . . . , Zn ∼ N(0, 1).
3. Output U = µ + BZ.
4. Repeat Steps 2 and 3 independently R times.
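A minimal MATLAB sketch of these steps (mu and Sig are illustrative placeholders for the mean vector and covariance matrix):

% sketch: R independent draws from N(mu, Sig) via the lower Cholesky factor
mu = [1; 2]; Sig = [2 .5; .5 1]; R = 10000;     % illustrative values
B = chol(Sig, 'lower');                         % Sig = B*B'
U = repmat(mu, 1, R) + B*randn(length(mu), R);  % each column is one draw
mean(U, 2)                                      % should be close to mu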
Gibbs Sampler for the Linear Regression Model

Pick some initial values β (0) = a0 and σ 2(0) = b0 > 0. Then,


repeat the following steps from r = 1 to R:
1. Draw σ 2(r ) ∼ f (σ 2 | y, β (r −1) ) (inverse-gamma).
2. Draw β (r ) ∼ f (β | y, σ 2(r ) ) (multivariate normal).
MATLAB Code

% linreg.m
nloop = 10000; burnin = 1000;
n = 500; beta = [1 5]’; sig2 = .5;
X = [ones(n,1) 1+randn(n,1)];
y = X*beta + sqrt(sig2)*randn(n,1);

% prior
beta0 = [0 0]’; invVbeta0 = speye(2)/100;
nu0 = 3; S0 = .5;

% initialize the Markov chain


beta = (X’*X)\(X’*y); sig2 = 1;
store_theta = zeros(nloop,3);
MATLAB Code
for loop=1:nloop
% sample beta
Dbeta = (invVbeta0 + X’*X/sig2)\speye(2);
betahat = Dbeta*(invVbeta0*beta0 + X’*y/sig2);
C = chol(Dbeta,’lower’);
beta = betahat + C*randn(2,1);

% sample sig2
e = y-X*beta;
sig2 = 1/gamrnd(nu0+n/2,1/(S0+e’*e/2));

% store the parameters


store_theta(loop,:) = [beta’ sig2];
end
store_theta = store_theta(burnin+1:end,:);
thetahat = mean(store_theta);
Linear Regression Model with Moving Average Errors

Consider again the linear regression model:

yt = x′t β + ǫt ,

but with MA(1) error:

ǫt = ut + ψut−1 ,

where u0 = 0 and u1, . . . , uT ∼ N(0, σ²)

(The previous linear regression model is a special case with ψ = 0)


Priors and Estimation

Independent priors for β, σ 2 , and ψ

β ∼ N(β 0 , Vβ ), σ 2 ∼ InvGamma(ν0 , S0 ), ψ ∼ U(−1, 1),

To construct a Gibbs sampler, we need to derive


1. (σ 2 | y, β, ψ),
2. (β | y, σ 2 , ψ),
3. (ψ | y, β, σ 2 ).

Turns out Steps 1 and 2 can be done as before, but f (ψ | y, β, σ 2 )


is not a standard density
Estimation: Overview

We will first derive f (σ 2 | y, β, ψ) (inverse-gamma) and


f (β | y, σ 2 , ψ) (multivariate normal)

Introduce the Metropolis-Hastings algorithm to handle Step 3

(The resulting sampler is sometimes called


Metropolis-within-Gibbs)
Likelihood: Derivation

To derive the likelihood function, rewrite the MA(1) model into


the matrix form (Chan, 2013)

ǫ = Hψ u,

where u = (u1 , . . . , uT )′ ∼ N(0, σ 2 IT ), and


 
       [ 1   0   0   · · ·   0 ]
       [ ψ   1   0   · · ·   0 ]
Hψ =   [ 0   ψ   1   · · ·   0 ]
       [ ⋮        ⋱    ⋱     ⋮ ]
       [ 0   0   · · ·   ψ   1 ]

is a sparse T × T matrix that contains only 2T − 1 non-zero elements
Hence,

y = Xβ + ǫ = Xβ + Hψ u, u ∼ N(0, σ 2 IT )

By a change of variable, we have

(y | β, σ 2 , ψ) ∼ N(Xβ, σ 2 Hψ H′ψ ),

Note that Hψ is a lower triangular matrix with ones on the main


diagonal. Hence, |Hψ | = 1
Likelihood: Computation

The likelihood function can be written as

f(y | β, σ², ψ) = |2πσ²Hψ H′ψ|^(−1/2) exp{ −(1/2) (y − Xβ)′(σ²Hψ H′ψ)⁻¹(y − Xβ) }
               = (2πσ²)^(−T/2) exp{ −(1/(2σ²)) (y − Xβ)′(Hψ H′ψ)⁻¹(y − Xβ) }

The likelihood function can be evaluated quickly

To compute (Hψ H′ψ)⁻¹(y − Xβ), one need not obtain (Hψ H′ψ)⁻¹ explicitly, which is a very time-consuming operation
Instead, solve the system

(Hψ H′ψ )x = (y − Xβ)

for x

This can be done in MATLAB using the backslash operator \

To evaluate the likelihood function for a general MA(q) model, one only needs to redefine the matrix Hψ appropriately
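For example, here is a sketch (not from the slides) of Hψ for an MA(2) error, built the same way as the MA(1) matrix in the code below, with illustrative placeholder data and parameters:

% sketch: H for an MA(2) error and the solve step that avoids inv(H*H')
T = 100; psi1 = .5; psi2 = .2;                               % illustrative values
Hpsi = speye(T) + psi1*sparse(2:T,1:T-1,ones(T-1,1),T,T) ...
                + psi2*sparse(3:T,1:T-2,ones(T-2,1),T,T);
y = randn(T,1); X = [ones(T,1) randn(T,1)]; beta = [1; .5];  % placeholder data
x = (Hpsi*Hpsi')\(y - X*beta);                               % solves (H*H')x = y - X*beta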
Sample (σ 2 | y, β, ψ)

(1) f(σ² | y, β, ψ):

f(σ² | y, β, ψ) ∝ (σ²)^(−(ν0+1)) e^(−S0/σ²) (σ²)^(−T/2) exp{ −(1/(2σ²)) (y − Xβ)′(Hψ H′ψ)⁻¹(y − Xβ) }
               ∝ (σ²)^(−(ν0+T/2+1)) exp{ −[S0 + (y − Xβ)′(Hψ H′ψ)⁻¹(y − Xβ)/2] / σ² }

Hence,

(σ² | y, β, ψ) ∼ InvGamma(ν0 + T/2, S),

where S = S0 + (y − Xβ)′(Hψ H′ψ)⁻¹(y − Xβ)/2
Sample (β | y, σ 2, ψ)

(2) f(β | y, σ², ψ):

f(β | y, σ², ψ) ∝ exp{ −(1/2) (β − β0)′Vβ⁻¹(β − β0) } exp{ −(1/(2σ²)) (y − Xβ)′(Hψ H′ψ)⁻¹(y − Xβ) }
               ∝ exp{ −(1/2) (β′Vβ⁻¹β − 2β′Vβ⁻¹β0) } exp{ −(1/(2σ²)) (β′X′(Hψ H′ψ)⁻¹Xβ − 2β′X′(Hψ H′ψ)⁻¹y) }
               ∝ exp{ −(1/2) [ β′(Vβ⁻¹ + X′(Hψ H′ψ)⁻¹X/σ²)β − 2β′(Vβ⁻¹β0 + X′(Hψ H′ψ)⁻¹y/σ²) ] }

Again, the exponent is quadratic in β, and we have

(β | y, σ², ψ) ∼ N(β̂, Dβ),

where

Dβ = (Vβ⁻¹ + X′(Hψ H′ψ)⁻¹X/σ²)⁻¹,   β̂ = Dβ (Vβ⁻¹β0 + X′(Hψ H′ψ)⁻¹y/σ²)
Sample (ψ | y, β, σ 2)

(3) f(ψ | y, β, σ²):

f(ψ | y, β, σ²) ∝ f(ψ) f(y | β, σ², ψ)
               ∝ exp{ −(1/(2σ²)) (y − Xβ)′(Hψ H′ψ)⁻¹(y − Xβ) }

for −1 < ψ < 1

The density f (ψ | y, β, σ 2 ) is non-standard

We don’t know how to sample from f (ψ | y, β, σ 2 ), but it can be


evaluated quickly (up to a normalization constant)
Metropolis-Hastings Algorithm: Overview

Suppose ψ is the current draw. Obtain a candidate draw ψ c , and


decide (probabilistically) whether to accept it or not

The acceptance probability is computed so that the detailed


balance equations are satisfied

As a result, the limiting distribution of the Markov chain


constructed is guaranteed to converge to the target

See Chib and Greenberg (1995)


Metropolis-Hastings Algorithm: Implementation

Two variants: random walk sampler and independent sampler

Random walk sampler:


1. the candidate is obtained via ψ c = ψ + v , where v ∼ g for
some distribution g (usually g is symmetric around 0)
2. accept ψc with probability

   min{ 1, f(ψc | y, β, σ²) / f(ψ | y, β, σ²) }

3. if ψc is not accepted, stay at ψ

Some comments:
◦ easy to implement
◦ g is typically chosen to be a normal or t distribution with 0 mean
◦ might need some fine-tuning, but works for low-dimensional
problems
◦ could be very inefficient for high-dimensional problems
Independent sampler:
1. the candidate is obtained via ψ c ∼ q(ψ) (called proposal
density)
2. accept ψc with probability

   min{ 1, [f(ψc | y, β, σ²) q(ψ)] / [f(ψ | y, β, σ²) q(ψc)] }

3. if ψc is not accepted, stay at ψ


Some comments:
◦ requires more from user/more difficult to implement
◦ choose q(ψ) as “close” as possible to f (ψ c | y, β, σ 2 )
◦ (if q(ψ) = f (ψ | y, β, σ 2 ), then acceptance prob. = 1 and it
reduces to the Gibbs sampler)
◦ could be very efficient or very inefficient (depends on the
choice of q(ψ))
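No independence-sampler code appears in the slides; as a sketch, one MH step for ψ with a N(psi0, tau2) proposal (psi0 and tau2 are illustrative choices) could look like the following, reusing the ppsi function defined below and the current values of y, X, beta, sig2 and psi:

% sketch: one independence MH step for psi
psi0 = 0; tau2 = .25;                       % illustrative proposal parameters
psic = psi0 + sqrt(tau2)*randn;             % candidate drawn from q, not from psi
if psic > -1 && psic < 1                    % target is zero outside (-1,1)
    logq_c   = -.5*(psic - psi0)^2/tau2;    % log q(psic) up to a constant
    logq_old = -.5*(psi  - psi0)^2/tau2;    % log q(psi)  up to a constant
    alp = ppsi(psic,y,X,beta,sig2) - ppsi(psi,y,X,beta,sig2) ...
          + logq_old - logq_c;              % log acceptance ratio
    if exp(alp) > rand
        psi = psic;
    end
end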
Sample (ψ | y, β, σ 2)

Recall

f(ψ | y, β, σ²) ∝ exp{ −(1/(2σ²)) (y − Xβ)′(Hψ H′ψ)⁻¹(y − Xβ) },   −1 < ψ < 1

Use a random walk sampler with ψc = ψ + v, where v ∼ N(0, 0.1)


MATLAB Code: Evaluating log f (ψ | y, β, σ 2)

% ppsi.m
function llike = ppsi(psi,y,X,beta,sig2)
n = length(y);
Hpsi = speye(n) + ...
psi*sparse(2:n,1:n-1,ones(n-1,1),n,n);
R = Hpsi*Hpsi’;
llike = -.5/sig2*(y-X*beta)’*(R\(y-X*beta));
end
MATLAB Code

% linreg_ma1_RW.m
nloop = 11000; burnin = 1000;
randn(’seed’, 314159);
psi = -.5; beta = [4 .6]’; sig2 = .5;
n = 100; y0 = 10; y = zeros(n,1);
u = sqrt(sig2)*randn(n,1);
for i=1:n
if i==1
y(i) = beta(1) + beta(2)*y0 + u(i);
else
y(i) = beta(1) + beta(2)*y(i-1) ...
+ u(i) + psi*u(i-1);
end
end
X = [ones(n,1) [y0; y(1:end-1)]];
MATLAB Code

% prior
beta0 = [0 0]’; invVbeta0 = 1/100*speye(2);
nu0 = 3; S0 = 1;
% initialize the Markov chain
psi = .1; beta = (X’*X)\(X’*y); sig2 = 1;
store_theta = zeros(nloop,4); count = 0;
MATLAB Code

for loop=1:nloop
% sample beta
Hpsi = speye(n) + ...
psi*sparse(2:n,1:n-1,ones(n-1,1),n,n);
R = Hpsi*Hpsi’;
Dbeta = (invVbeta0 + X’*(R\X)/sig2)\speye(2);
betahat = Dbeta*(invVbeta0*beta0+X’*(R\y)/sig2);
C = chol(Dbeta,’lower’);
beta = betahat + C*randn(2,1);
% sample sig2
e = y-X*beta;
sig2 = 1/gamrnd(nu0+n/2,1/(S0+e’*(R\e)/2));
MATLAB Code

% sample psi
psic = psi + sqrt(.1)*randn;
if psic<1 && psic >-1
alp = ppsi(psic,y,X,beta,sig2) ...
- ppsi(psi,y,X,beta,sig2);
if exp(alp)>rand
psi = psic;
count = count+1;
end
end
% store the parameters
store_theta(loop,:) = [beta’ sig2 psi];
end
store_theta = store_theta(burnin+1:end,:);
thetahat = mean(store_theta);
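The counter count tracks how many candidates were accepted, so count/nloop gives the acceptance rate of the random walk step; if it is very high or very low, the proposal variance (0.1 here) is the natural thing to tune, in line with the earlier comment that the random walk sampler might need some fine-tuning. For example:

acc_rate = count/nloop;   % acceptance rate of the MH step for psi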
