Intro Bayes Time Series 1
Joshua Chan
Website:
http://people.anu.edu.au/joshua.chan/
Email:
[email protected]
Overview of the Workshop
Purpose
◦ prepare you for DSGE lectures
◦ go over basic Bayesian time series models and computations
◦ include MATLAB tutorials
Topics covered
◦ linear regression, autoregressive moving average models
◦ vector autoregressive (VAR) models
◦ state space models
◦ Bayesian model comparison
Computation techniques
◦ Monte Carlo simulation
◦ Markov chain Monte Carlo: Gibbs sampling,
Metropolis-Hastings algorithm
Logistics
◦ 8 hours of lecture (9am to 1pm, two days)
◦ 6 hours of computer lab/exercises (2pm to 5pm, two days)
Plan for Today
Survey papers
◦ Del Negro and Schorfheide (2011) “Bayesian
Macroeconometrics,” in Geweke, Koop and van Dijk (eds):
The Oxford Handbook of Bayesian Econometrics, Oxford
University Press
◦ Koop and Korobilis (2010) “Bayesian Multivariate Time Series
Methods for Empirical Macroeconomics,” Foundations and
Trends in Econometrics, 3(4): 267-358
Introduction
Frequentist inference
◦ frequentist interpretation of probabilities—an event’s
probability as the limit of its relative frequency in a large
number of trials
◦ θ is thought to be an unknown, but fixed quantity
◦ knowledge of θ is obtained from an observed sample
y1 , . . . , yn (only)
◦ usually summarized by the likelihood function
L(θ; y) = f (y | θ)
Bayesian inference
◦ probability is a (subjective) measure of the “degree of belief”
of the individual assessing the event
◦ θ is a random quantity and has a distribution f (θ) called the
prior distribution
◦ knowledge of θ comes from an observed sample y1 , . . . , yn and
the prior distribution
◦ the goal of the analysis is to obtain the posterior distribution
f (θ | y) using Bayes’ Theorem
f(θ | y) = f(θ) f(y | θ) / m(y) ∝ f(θ) f(y | θ),
where m(y) = ∫ f(θ) f(y | θ) dθ is called the marginal likelihood
◦ then compute E(θ | y), Cov(θ | y), P(θj > 0 | y)
1-parameter Normal Model
One model is
(yi | µ) ∼ N(µ, σ²), i = 1, . . . , n,
µ ∼ N(µ0, σ0²),
and by Bayes' Theorem
f(µ | y) = f(µ) f(y | µ) / m(y),
where y = (y1, . . . , yn)′
Computation: Overview
Now compare
f(µ | y) ∝ exp{ −(1/2) [ (1/σ0² + n/σ²) µ² − 2µ (µ0/σ0² + nȳ/σ²) ] }

with

f(µ | y) ∝ exp{ −(1/(2Dµ)) (µ² − 2µ µ̂) }

We have

Dµ = (1/σ0² + n/σ²)⁻¹,   µ̂ = Dµ (µ0/σ0² + nȳ/σ²)

Hence, (µ | y) ∼ N(µ̂, Dµ)
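A minimal MATLAB sketch of these two quantities (the data and prior values below are illustrative, not from the course code):

% posterior of mu in the 1-parameter Normal model (sigma^2 known)
n = 100; sig2 = 1;                      % sample size and known variance
y = 10 + sqrt(sig2)*randn(n,1);         % simulated data
mu0 = 0; sig20 = 100;                   % prior mean and variance of mu
Dmu = 1/(1/sig20 + n/sig2);             % posterior variance
muhat = Dmu*(mu0/sig20 + sum(y)/sig2);  % posterior mean
mu = muhat + sqrt(Dmu)*randn;           % one draw from N(muhat, Dmu)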
Posterior Mean Interpretation
The posterior mean µ̂ is a precision-weighted average of the prior mean µ0 and the sample mean ȳ; as n grows, more weight is placed on ȳ.

[Figure: prior and posterior densities of µ (two panels)]
Quantities of Interest
Since (µ | y) ∼ N(µ̂, Dµ), can easily compute Var(µ | y), P(µ > 0 | y), 95% credible set for µ, etc.

But suppose we wish to obtain E(g(µ) | y) < ∞ for some function g
We write Z ∼ InvGamma(α, β)

Moments: E(Z) = β/(α − 1) for α > 1, Var(Z) = β²/[(α − 1)²(α − 2)] for α > 2
Sampling from the Inverse-Gamma Distribution
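MATLAB has no built-in inverse-gamma generator, but if W ∼ Gamma(α, 1/β) in the shape-scale parameterization used by gamrnd, then 1/W ∼ InvGamma(α, β); this is the trick used in the code later in these notes. A minimal sketch (the values of α and β are illustrative):

% draw from InvGamma(alpha, beta) via gamrnd (shape-scale form)
alpha = 5; beta = 2;
Z = 1./gamrnd(alpha,1/beta,10000,1);  % 10,000 draws
mean(Z)   % should be close to beta/(alpha-1) = 0.5
var(Z)    % should be close to beta^2/((alpha-1)^2*(alpha-2)) = 1/12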
For the 2-parameter Normal model, (yi | µ, σ²) ∼ N(µ, σ²), i = 1, . . . , n, with independent priors µ ∼ N(µ0, σ0²) and σ² ∼ InvGamma(ν0, S0), the joint posterior is

f(µ, σ² | y) ∝ f(µ, σ², y)
            ∝ f(µ) f(σ²) f(y | µ, σ²)
            ∝ exp{ −(µ − µ0)²/(2σ0²) } (σ²)^(−(ν0+1)) exp{ −S0/σ² } ∏_{i=1}^n (σ²)^(−1/2) exp{ −(yi − µ)²/(2σ²) }
Given posterior draws µ^(1), . . . , µ^(R) and σ²^(1), . . . , σ²^(R), then compute

(1/R) ∑_{r=1}^R µ^(r),   (1/R) ∑_{r=1}^R (σ²^(r) − σ̄²)²,   µ^((qR)),

where σ̄² = (1/R) ∑_{r=1}^R σ²^(r) and µ^((1)) ≤ · · · ≤ µ^((R)) are the ordered draws; these estimate E(µ | y), Var(σ² | y) and the q-th posterior quantile of µ
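In MATLAB, assuming the draws are stored in vectors mu_draws and sig2_draws (illustrative names, not from the course code), these estimates are:

% Monte Carlo estimates from R posterior draws
R = length(mu_draws);
post_mean_mu = mean(mu_draws);                             % estimate of E(mu | y)
post_var_sig2 = mean((sig2_draws - mean(sig2_draws)).^2);  % estimate of Var(sig2 | y)
q = 0.95; sorted_mu = sort(mu_draws);
quant_mu = sorted_mu(ceil(q*R));                           % q-th posterior quantile of mu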
Starting from an initial state Θ^(0), repeat the following steps for r = 1, . . . , R:
1. Given the current state Θ^(r−1) = θ = (θ1, . . . , θn), generate Y = (Y1, . . . , Yn) as follows:
   1.1 Draw Y1 ∼ f(y1 | θ2, . . . , θn).
   1.2 Draw Yi ∼ f(yi | Y1, . . . , Yi−1, θi+1, . . . , θn), i = 2, . . . , n − 1.
   1.3 Draw Yn ∼ f(yn | Y1, . . . , Yn−1).
2. Set Θ^(r) = Y.
(A toy MATLAB sketch of such a sampler follows.)
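For concreteness, here is a two-block Gibbs sampler (not from the course code) for a bivariate normal target with zero means, unit variances and correlation ρ, whose full conditionals are N(ρ × other component, 1 − ρ²):

% toy Gibbs sampler for a bivariate normal with correlation rho
rho = 0.8; R = 10000;
theta = [0 0];       % initial state Theta^(0)
store = zeros(R,2);
for r = 1:R
    theta(1) = rho*theta(2) + sqrt(1-rho^2)*randn;  % draw theta1 given theta2
    theta(2) = rho*theta(1) + sqrt(1-rho^2)*randn;  % draw theta2 given the new theta1
    store(r,:) = theta;
end
corrcoef(store)      % off-diagonal entries should be close to rho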
Common Misconceptions
The Markov chain Θ^(1), Θ^(2), . . . does not converge to a fixed point in R^k; rather, it is the distribution of Θ^(r) that converges to the target (posterior) distribution
An example:
[Figure: trace plot of 10,000 draws from the Markov chain]
Histograms of the previous Markov chain:
[Figure: histograms of the Markov chain draws (two panels)]
2-parameter Normal Model (Continued)
(1) f(σ² | y, µ):

f(σ² | y, µ) ∝ f(µ, σ² | y)
            ∝ (σ²)^(−(ν0+1)) exp{ −S0/σ² } (σ²)^(−n/2) exp{ −(1/(2σ²)) ∑_{i=1}^n (yi − µ)² }
            ∝ (σ²)^(−(ν0+n/2+1)) exp{ −[S0 + ∑_{i=1}^n (yi − µ)²/2]/σ² }
A known distribution?
Compare

f(σ² | y, µ) ∝ (σ²)^(−(ν0+n/2+1)) exp{ −[S0 + ∑_{i=1}^n (yi − µ)²/2]/σ² }

with the InvGamma(α, β) kernel

f(σ²; α, β) ∝ (σ²)^(−(α+1)) exp{ −β/σ² }

Hence,

(σ² | y, µ) ∼ InvGamma(ν0 + n/2, S0 + ∑_{i=1}^n (yi − µ)²/2)
When σ² is given (i.e., known), the model reduces to the 1-parameter Normal model
(2) f(µ | y, σ²):

f(µ | y, σ²) ∝ f(µ) f(y | µ, σ²)
            ∝ exp{ −(1/2) [ (1/σ0² + n/σ²) µ² − 2µ (µ0/σ0² + nȳ/σ²) ] }

Hence, (µ | y, σ²) ∼ N(µ̂, Dµ), where

Dµ = (1/σ0² + n/σ²)⁻¹,   µ̂ = Dµ (µ0/σ0² + nȳ/σ²)
Gibbs Sampler for the 2-parameter Normal Model
Then, given the draws µ^(1), . . . , µ^(R) and σ²^(1), . . . , σ²^(R), one can compute, e.g.,

(1/R) ∑_{r=1}^R µ^(r)

as an estimate of E(µ | y)
MATLAB Code
% norm_2para.m
nloop = 10000; burnin = 1000;
n = 500; mu = 3; sig2 = .5;
y = mu + sqrt(sig2)*randn(n,1);
% prior
mu0 = 0; sig20 = 100;
nu0 = 3; S0 = .5;
store_theta = zeros(nloop,2);
for loop=1:nloop
    % sample mu
    Dmu = 1/(1/sig20 + n/sig2);
    muhat = Dmu*(mu0/sig20 + sum(y)/sig2);
    mu = muhat + sqrt(Dmu)*randn;
    % sample sig2
    sig2 = 1/gamrnd(nu0+n/2,1/(S0+sum((y-mu).^2)/2));
    % store the draws
    store_theta(loop,:) = [mu sig2];
end
store_theta = store_theta(burnin+1:end,:);
thetahat = mean(store_theta);
Linear Regression Model

Consider the linear regression model yt = xt′β + ǫt, t = 1, . . . , T. Stacking the T observations:

[ y1 ]   [ x1,1  x1,2  · · ·  x1,k ] [ β1 ]   [ ǫ1 ]
[ y2 ]   [ x2,1  x2,2  · · ·  x2,k ] [ β2 ]   [ ǫ2 ]
[ ⋮  ] = [  ⋮     ⋮            ⋮   ] [ ⋮  ] + [ ⋮  ]
[ yT ]   [ xT,1  xT,2  · · ·  xT,k ] [ βk ]   [ ǫT ]

Or equivalently,

y = Xβ + ǫ,   ǫ ∼ N(0, σ² IT)
Recall that if U ∼ N(µU, ΣU) and V = c + AU, then V ∼ N(µV, ΣV) with

µV = c + A µU,   ΣV = A ΣU A′

Since y = Xβ + ǫ is an affine transformation of ǫ ∼ N(0, σ² IT), y has a normal distribution

Therefore, we have

(y | β, σ²) ∼ N(Xβ, σ² IT)
Linear Regression Model: Likelihood
Since

(y | β, σ²) ∼ N(Xβ, σ² IT),

the likelihood function is given by:

f(y | β, σ²) = |2πσ² IT|^(−1/2) exp{ −(1/2) (y − Xβ)′ (σ² IT)⁻¹ (y − Xβ) }
            = (2πσ²)^(−T/2) exp{ −(1/(2σ²)) (y − Xβ)′ (y − Xβ) },

where we use |cA| = c^n |A| for an n × n matrix A (here n = T), so that |2πσ² IT| = (2πσ²)^T
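As a quick check, the log-likelihood can be evaluated directly in MATLAB (a sketch assuming y, X, beta, sig2 and T are already in the workspace):

% log-likelihood of the linear regression model at (beta, sig2)
loglike = -T/2*log(2*pi*sig2) - (y-X*beta)'*(y-X*beta)/(2*sig2);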
Priors and Gibbs Sampler
β ∼ N(β0, Vβ),   σ² ∼ InvGamma(ν0, S0)

(1) f(σ² | y, β):

f(σ² | y, β) ∝ (σ²)^(−(ν0+1)) exp{ −S0/σ² } (σ²)^(−T/2) exp{ −(1/(2σ²)) (y − Xβ)′(y − Xβ) }
            ∝ (σ²)^(−(ν0+T/2+1)) exp{ −[S0 + (y − Xβ)′(y − Xβ)/2]/σ² }

Hence,

(σ² | y, β) ∼ InvGamma(ν0 + T/2, S0 + (y − Xβ)′(y − Xβ)/2)
Recall (AB)′ = B′A′, and hence

(y − Xβ)′(y − Xβ) = y′y − y′Xβ − β′X′y + β′X′Xβ
                  = β′X′Xβ − 2β′X′y + y′y,

since β′X′y is a scalar, so β′X′y = (β′X′y)′ = y′Xβ
Sample (β | y, σ²)

(2) f(β | y, σ²):

f(β | y, σ²) ∝ f(β) f(y | β, σ²)
            ∝ exp{ −(1/2) (β − β0)′ Vβ⁻¹ (β − β0) } exp{ −(1/(2σ²)) (y − Xβ)′(y − Xβ) }
            ∝ exp{ −(1/2) (β′Vβ⁻¹β − 2β′Vβ⁻¹β0) } exp{ −(1/(2σ²)) (β′X′Xβ − 2β′X′y) }
            ∝ exp{ −(1/2) [ β′(Vβ⁻¹ + (1/σ²)X′X)β − 2β′(Vβ⁻¹β0 + (1/σ²)X′y) ] }

Compare with

f(β | y, σ²) ∝ exp{ −(1/2) (β′Dβ⁻¹β − 2β′Dβ⁻¹β̂) }

We have

Dβ = (Vβ⁻¹ + (1/σ²) X′X)⁻¹,   β̂ = Dβ (Vβ⁻¹β0 + (1/σ²) X′y)

Hence, (β | y, σ²) ∼ N(β̂, Dβ)
Sampling from the Multivariate Normal Distribution
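To draw from N(β̂, Dβ), one standard approach (and the one used in the MATLAB code later in these notes) is the Cholesky factorization: if Dβ = CC′ with C lower triangular and Z ∼ N(0, Ik), then β̂ + CZ ∼ N(β̂, Dβ). A minimal sketch, where muhat and Sigma are placeholders for the desired mean vector and covariance matrix:

% draw from N(muhat, Sigma) via the Cholesky factor of Sigma
k = length(muhat);
C = chol(Sigma,'lower');      % Sigma = C*C'
draw = muhat + C*randn(k,1);  % draw ~ N(muhat, Sigma)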
% linreg.m
nloop = 10000; burnin = 1000;
n = 500; beta = [1 5]'; sig2 = .5;
X = [ones(n,1) 1+randn(n,1)];
y = X*beta + sqrt(sig2)*randn(n,1);
% prior
beta0 = [0 0]'; invVbeta0 = speye(2)/100;
nu0 = 3; S0 = .5;
store_theta = zeros(nloop,3);
for loop=1:nloop
    % sample beta
    Dbeta = (invVbeta0 + X'*X/sig2)\speye(2);
    betahat = Dbeta*(invVbeta0*beta0 + X'*y/sig2);
    beta = betahat + chol(Dbeta,'lower')*randn(2,1);
    % sample sig2
    e = y-X*beta;
    sig2 = 1/gamrnd(nu0+n/2,1/(S0+e'*e/2));
    % store the draws
    store_theta(loop,:) = [beta' sig2];
end
store_theta = store_theta(burnin+1:end,:);
Linear Regression with MA(1) Errors

Now suppose the errors follow an MA(1) process:

yt = xt′β + ǫt,
ǫt = ut + ψ ut−1,   ut ∼ N(0, σ²)

Stacking the observations, ǫ = Hψ u, where Hψ is the T × T lower triangular matrix with ones on the main diagonal and ψ on the first sub-diagonal, so that

y = Xβ + ǫ = Xβ + Hψ u,   u ∼ N(0, σ² IT)

Hence

(y | β, σ², ψ) ∼ N(Xβ, σ² Hψ Hψ′),

and the likelihood is

f(y | β, σ², ψ) = (2πσ²)^(−T/2) exp{ −(1/(2σ²)) (y − Xβ)′ (Hψ Hψ′)⁻¹ (y − Xβ) }

(the determinant term drops out since |Hψ| = 1)
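In MATLAB, Hψ can be built as a sparse matrix (this is also how the function ppsi.m below constructs it), assuming T and psi are already defined:

% H_psi: identity plus psi on the first sub-diagonal
Hpsi = speye(T) + psi*sparse(2:T,1:T-1,ones(T-1,1),T,T);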
(1) f(σ² | y, β, ψ):

f(σ² | y, β, ψ) ∝ (σ²)^(−(ν0+1)) exp{ −S0/σ² } (σ²)^(−T/2) exp{ −(1/(2σ²)) (y − Xβ)′(Hψ Hψ′)⁻¹(y − Xβ) }
              ∝ (σ²)^(−(ν0+T/2+1)) exp{ −[S0 + (y − Xβ)′(Hψ Hψ′)⁻¹(y − Xβ)/2]/σ² }

Hence,

(σ² | y, β, ψ) ∼ InvGamma(ν0 + T/2, S),

where S = S0 + (y − Xβ)′(Hψ Hψ′)⁻¹(y − Xβ)/2
Sample (β | y, σ², ψ)

(2) f(β | y, σ², ψ):

f(β | y, σ², ψ) ∝ exp{ −(1/2) (β − β0)′ Vβ⁻¹ (β − β0) } exp{ −(1/(2σ²)) (y − Xβ)′(Hψ Hψ′)⁻¹(y − Xβ) }
              ∝ exp{ −(1/2) (β′Vβ⁻¹β − 2β′Vβ⁻¹β0) } exp{ −(1/(2σ²)) (β′X′(Hψ Hψ′)⁻¹Xβ − 2β′X′(Hψ Hψ′)⁻¹y) }
              ∝ exp{ −(1/2) [ β′(Vβ⁻¹ + (1/σ²)X′(Hψ Hψ′)⁻¹X)β − 2β′(Vβ⁻¹β0 + (1/σ²)X′(Hψ Hψ′)⁻¹y) ] }

Again, the exponent is quadratic in β, and we have

(β | y, σ², ψ) ∼ N(β̂, Dβ),

where

Dβ = (Vβ⁻¹ + (1/σ²) X′(Hψ Hψ′)⁻¹X)⁻¹,   β̂ = Dβ (Vβ⁻¹β0 + (1/σ²) X′(Hψ Hψ′)⁻¹y)
Sample (ψ | y, β, σ²)

(3) f(ψ | y, β, σ²):

f(ψ | y, β, σ²) ∝ f(ψ) f(y | β, σ², ψ)
              ∝ exp{ −(1/(2σ²)) (y − Xβ)′(Hψ Hψ′)⁻¹(y − Xβ) }

with −1 < ψ < 1 (a uniform prior on the invertibility region)

This density is nonstandard, so ψ is sampled with a random-walk Metropolis-Hastings step; the function ppsi.m below evaluates the log-density up to an additive constant
% ppsi.m
% log conditional density of psi (up to an additive constant)
function llike = ppsi(psi,y,X,beta,sig2)
n = length(y);
Hpsi = speye(n) + ...
    psi*sparse(2:n,1:n-1,ones(n-1,1),n,n);
R = Hpsi*Hpsi';
llike = -.5/sig2*(y-X*beta)'*(R\(y-X*beta));
end
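Since the random-walk proposal ψ^c = ψ + N(0, .1) is symmetric, the Metropolis-Hastings acceptance probability reduces to

min{ 1, f(ψ^c | y, β, σ²) / f(ψ | y, β, σ²) } = min{ 1, exp( ppsi(ψ^c,·) − ppsi(ψ,·) ) },

which is what the code below computes (the candidate is accepted when exp(alp) > rand); candidates outside (−1, 1) are rejected outright.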
MATLAB Code
% linreg_ma1_RW.m
nloop = 11000; burnin = 1000;
randn('seed', 314159);
psi = -.5; beta = [4 .6]'; sig2 = .5;
n = 100; y0 = 10; y = zeros(n,1);
u = sqrt(sig2)*randn(n,1);
for i=1:n
if i==1
y(i) = beta(1) + beta(2)*y0 + u(i);
else
y(i) = beta(1) + beta(2)*y(i-1) ...
    + u(i) + psi*u(i-1);
end
end
X = [ones(n,1) [y0; y(1:end-1)]];
MATLAB Code
% prior
beta0 = [0 0]'; invVbeta0 = 1/100*speye(2);
nu0 = 3; S0 = 1;
% initialize the Markov chain
psi = .1; beta = (X'*X)\(X'*y); sig2 = 1;
store_theta = zeros(nloop,4); count = 0;
MATLAB Code
for loop=1:nloop
% sample beta
Hpsi = speye(n) + ...
psi*sparse(2:n,1:n-1,ones(n-1,1),n,n);
R = Hpsi*Hpsi';
Dbeta = (invVbeta0 + X'*(R\X)/sig2)\speye(2);
betahat = Dbeta*(invVbeta0*beta0+X'*(R\y)/sig2);
C = chol(Dbeta,'lower');
beta = betahat + C*randn(2,1);
% sample sig2
e = y-X*beta;
sig2 = 1/gamrnd(nu0+n/2,1/(S0+e'*(R\e)/2));
MATLAB Code
% sample psi
psic = psi + sqrt(.1)*randn;
if psic<1 && psic >-1
alp = ppsi(psic,y,X,beta,sig2) ...
- ppsi(psi,y,X,beta,sig2);
if exp(alp)>rand
psi = psic;
count = count+1;
end
end
% store the parameters
store_theta(loop,:) = [beta' sig2 psi];
end
store_theta = store_theta(burnin+1:end,:);
thetahat = mean(store_theta);
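Two quick checks on the output (illustrative, not part of the original script): the acceptance rate of the random-walk step, and the posterior means, which should be in the neighborhood of the values used to simulate the data:

% diagnostics after the loop
accept_rate = count/nloop   % fraction of accepted psi candidates
thetahat                    % posterior means of [beta' sig2 psi]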