
Exercises Dynamic Econometrics 2022


Week 1

Theory exercises
1. (W1V2) Consider a stochastic process Xt. We defined a general linear filter as

τt = Σ_{j=−∞}^{∞} aj Xt+j,

where the aj are a set of weights. A popular filtering technique is called exponential
smoothing, which is defined by the recursion

τt = bXt + (1 − b)τt−1,   τ1 = X1.

The name “exponential smoothing” originates from the weights aj. Calculate aj in
terms of b and j.
We can repeatedly substitute the recursion until we reach τ1. We get

τt = bXt + (1 − b)(bXt−1 + (1 − b)τt−2)
   = bXt + (1 − b)bXt−1 + (1 − b)^2 bXt−2 + (1 − b)^3 bXt−3 + . . . + (1 − b)^{t−1} X1
   = (1 − b)^{t−1} X1 + Σ_{j=−(t−2)}^{0} b(1 − b)^{−j} Xt+j.

We conclude that aj = 0 for j < −(t − 1) and j > 0. For j = −(t − 1), we have
a_{−(t−1)} = (1 − b)^{t−1}, and for −(t − 2) ≤ j ≤ 0, aj = b(1 − b)^{−j}. The last expression is
why the filter is referred to as “exponential smoothing”: the weight on past observations
decreases exponentially with j. Also note that it is a one-sided filter, and that it does
not use observations after time t.
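
As a quick numerical check (not part of the original solution), the sketch below applies the smoothing recursion and compares τ_T with the closed-form weights derived above; b, T and the data are illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)
b, T = 0.3, 20
X = rng.normal(size=T)

# recursion: tau_1 = X_1, tau_t = b X_t + (1 - b) tau_{t-1}
tau = np.empty(T)
tau[0] = X[0]
for t in range(1, T):
    tau[t] = b * X[t] + (1 - b) * tau[t - 1]

# closed form at t = T (1-based indexing as in the exercise):
# weight (1-b)^{t-1} on X_1 and b(1-b)^{-j} on X_{t+j} for j = -(t-2), ..., 0
t = T
closed = (1 - b) ** (t - 1) * X[0] + sum(
    b * (1 - b) ** (-j) * X[t - 1 + j] for j in range(-(t - 2), 1)
)
print(np.isclose(tau[-1], closed))   # True
```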

2. (W1V3) Write in lag operator notation

Yt = 0.1Yt−1 + 0.3Yt−3 + εt ,
Yt = 0.1Yt−1 − 0.3Yt−2 + εt ,
Yt = 0.5Yt−2 + εt .

The lag operator L is such that LYt = Yt−1 and Lk Yt = Yt−k . Using these properties,
we see that

(1 − 0.1L − 0.3L3 )Yt = εt ,


(1 − 0.1L + 0.3L2 )Yt = εt ,
(1 − 0.5L2 )Yt = εt .

3. (W1V3) Consider a stochastic process Yt. Define ∆ = 1 − L, where L is the lag operator
we discussed in the lecture. Show that ∆^2 ≠ 1 − L^2.
You can repeatedly apply the operator ∆ and show that this is not the same as
applying the operator 1 − L^2, i.e.

(1 − L)(1 − L)Yt = (1 − L)(Yt − Yt−1)
                 = Yt − Yt−1 − Yt−1 + Yt−2
                 = (1 − 2L + L^2)Yt
                 ≠ (1 − L^2)Yt

Exercises Dynamic Econometrics 2022

Week 2

Theory exercises
1. (W2V0) Show that for a covariance stationary process, the autocovariance function
satisfies γj = γ−j .
This is almost immediate from the definition of covariance stationarity, which tells us
that
γj,t = γj . (1)
The definition of jth autocovariance is

γj = E[(Yt − E[Yt ])(Yt−j − E[Yt−j ])]. (2)

Now relabel t = s + j, then

γj = E[(Ys+j − E[Ys+j ])(Ys − E[Ys ])] = γ−j . (3)

2. (W2V0) Consider the random walk process,

Yt = Σ_{j=1}^{t} εj.    (4)

Assume that E[εt |Yt−1 ] = 0 for t ≥ 1. What is E[Yt |Yt−1 ]? Is this different from E[Yt ]?

The unconditional expectation can be calculated using linearity of the expectation
operator,

E[Yt ] = Σ_{j=1}^{t} E[εj ]
       = Σ_{j=1}^{t} E[E[εj |Yj−1 ]]
       = 0.

To obtain the conditional expectation E[Yt |Yt−1 ], we write the random walk process as
the following recursion
Yt = Yt−1 + εt , Y1 = ε1 . (5)
Taking the conditional expectation left and right, we have

E[Yt |Yt−1 ] = Yt−1 + E[εt |Yt−1 ] = Yt−1 . (6)

We see that the unconditional and conditional expectation are different. E[Yt |Yt−1 ] is
a random variable, while E[Yt ] = 0, which is a constant.

3. (W2V0) Suppose Yt follows a random walk process.

(a) Show that we can write Yt as a function of Yt−j as

Yt = Yt−j + Σ_{s=t−j+1}^{t} εs.    (7)

The random walk process is defined as

Yt = Σ_{s=1}^{t} εs,    (8)

where εs is a white noise process. We also see that

Yt−j = Σ_{s=1}^{t−j} εs.    (9)

Hence,

Yt = Yt−j + Σ_{s=t−j+1}^{t} εs.    (10)

(b) Consider the process
1
Xt = √ Yt .
t
Is this process covariance stationary? Hint: for the calculation of the autocovari-
ance, you can use the result from (a).
This is closely related to the nonstationarity of the random walk process. We
know (see lecture slides) that for a random walk process,

E[Yt ] = 0,
(11)
var[Yt ] = tσ 2 .

Since for any constant a, we have E[aX] = aE[X] and var(aX) = a2 var(X), it
follows that
E[Xt ] = 0,
(12)
var[Xt ] = σ 2 .

So far, so good, but for covariance stationarity, we also need


γj,t = (1/√(t(t − j))) E[Yt Yt−j ] = γj.    (13)

In (a), we have shown that we can write Yt as a function of Yt−j as
Yt = Yt−j + Σ_{s=t−j+1}^{t} εs. Hence,

γj,t = (1/√(t(t − j))) E[Yt−j^2 ] + (1/√(t(t − j))) Σ_{s=t−j+1}^{t} E[Yt−j εs ]
     = ((t − j)/√(t(t − j))) σ^2 + 0.    (14)

The first term uses the fact that the variance of the random walk process at time
t − j is (t − j)σ^2. For the second term, we use that Yt−j = Σ_{k=1}^{t−j} εk to see that
E[Yt−j εs ] = Σ_{k=1}^{t−j} E[εk εs ], where k < s. Since εt is WN, E[εk εs ] = 0 if k ≠ s. This
shows that the second term of (14) is equal to zero. Since the first term on the
second line of (14) depends on t, the process is not covariance stationary.

4. (W2V0) Suppose Yt = Xt + ηt with ηt a Gaussian white noise process with variance σ_η^2,
and where Xt is a covariance stationary stochastic process with autocovariance function
γj^X. You can assume that cov(Xt , ηs ) = 0 for all (t, s).

(a) What is the autocovariance function of Yt ?

Since ηt is GWN, E[Yt ] = E[Xt ]. Then,

γj^Y = E[(Yt − E[Yt ])(Yt−j − E[Yt−j ])]
     = E[(Xt − E[Xt ] + ηt )(Xt−j − E[Xt−j ] + ηt−j )]    (15)
     = γj^X + σ_η^2 · I[j = 0].

(b) Write down the long run variance of Xt and relate it to the long-run variance of
Yt .
The long-run variance is defined as

γ_LR^X = γ0^X + 2 Σ_{j=1}^{∞} γj^X.    (16)

Using the result from (a), the long-run variance of Yt is equal to

γ_LR^Y = γ0^Y + 2 Σ_{j=1}^{∞} γj^Y
       = γ_LR^X + σ_η^2.    (17)

(c) What is the long-run variance of Yt if γj^X = 0 for j > 2?

Using the definition of the long-run variance and the result from (b), the long-run
variance of Yt is then equal to

γ_LR^Y = γ0^X + 2γ1^X + 2γ2^X + σ_η^2.    (18)

5. (W1V3/V4) Suppose εt is a White Noise process with variance σ 2 .

(a) What are the population moments (mean, variance, autocovariance) of ηt = εt − εt−1?

The unconditional mean is

E[ηt ] = E[εt ] − E[εt−1 ] = 0 − 0 = 0.    (19)

The unconditional variance is

γ0 = E[ηt^2 ] = E[εt^2 ] − 2E[εt εt−1 ] + E[εt−1^2 ] = 2σ^2.    (20)

The first autocovariance (j = 1) is

γ1 = E[ηt ηt−1 ] = E[(εt − εt−1 )(εt−1 − εt−2 )] = −σ^2.    (21)

For j > 1, γj = 0.

(b) Is this process covariance stationary?


The unconditional mean and autocovariance function do not depend on t. If
we assume that σ 2 < ∞, then they are also finite. In that case, the process is
covariance stationary.

6. (W2V0) Consider the process

Xt = Zt if t is even,   Xt = Yt if t is odd,

where Zt ~ i.i.d. N(1, 1), while Yt ~ i.i.d. Γ(1, 1). Here ~ i.i.d. means that Zt and Yt are
independent and identically distributed over time. You can also assume that Zt and Ys
are independent for all (t, s). Show that Xt is covariance stationary, but not strictly
stationary.
Note that E[Zt ] = E[Yt ] = 1, and var(Zt ) = var(Yt ) = 1. The process Xt is CS because
the mean and autocovariance do not depend on t (they are the same whether t is odd
or even) and are finite.
However, this process is not strictly stationary, because the joint distribution of
(Xt , Xt+2 ) is different from the joint distribution of (Xt+1 , Xt+3 ). For strict stationar-
ity, the joint distribution should only depend on the difference between t + 2 and t (or
t + 3 and t + 1), i.e. on the scalar 2, but here it will be different depending on whether
t is even or odd.

7. (W2V1) Consider the MA(1) process Yt = µ + εt + θ1 εt−1 where var(εt ) = σ 2 .

(a) For what values of θ1 is this process invertible?

The characteristic equation is

1 + θ1 z = 0,

which has the root z = −1/θ1. The process is invertible if the root is outside of the
unit circle, so if |θ1 | < 1.

(b) Calculate the autocovariance function of this process.


The autocovariance is defined as

γj = E[(Yt − E[Yt ])(Yt−j − E[Yt−j ])].

You can use the MA(1) process as given in the question to show that

γ0 = σ 2 (1 + θ12 ),
γ1 = σ 2 θ1 ,
γj = 0 for j ≥ 2.

(Of course, in an exam you should give the detailed calculations).

(c) Suppose the root of the characteristic equation of the MA(1) process is inside the
unit circle. Find an equivalent representation to the MA(1) process above (in the
sense of having the same autocovariance function), which has its root outside
the unit circle.
Define an MA(1) process Yt = µ + ε̃t + θ̃1 ε̃t−1 where ε̃t has variance σ̃^2.
Define θ̃1 = 1/θ1 and σ̃^2 = σ^2 θ1^2. Then,

γ̃0 = σ̃^2 (1 + θ̃1^2 ) = σ^2 (θ1^2 + 1) = γ0 ,
γ̃1 = σ̃^2 θ̃1 = σ^2 θ1^2 /θ1 = σ^2 θ1 = γ1 .

Since we have an MA(1) process, the fact that the root of the characteristic
equation in the original parametrization is inside the unit circle, implies that
|θ1 | > 1. Since θ̃1 = 1/θ1 , we have |θ̃1 | < 1 and the root of the characteristic
equation after reparametrization is outside of the unit circle.

(d) Is the long-run variance equal in both parametrizations?


Yes. Recall the long-run variance of an MA(1) process

γLR = σ 2 (1 + θ1 )2

You can verify that for σ̃ 2 = σ 2 θ12 and θ̃1 = 1/θ1 , we have that

γ̃LR = σ̃ 2 (1 + θ̃1 )2 = γLR

8. (W2V2) Let εt be WN with variance σ 2 = 1.

(a) Is the following MA(2) process covariance stationary?

Yt = (1 + 2.4L + 0.8L2 )εt .

Denote by σ 2 the variance of εt . Any M A(q) process with finite q is covariance


stationary (if σ 2 < ∞, and the coefficients in front of the lag operators are finite).

(b) Calculate the autocovariance function of the process Yt defined above.


You can verify that

E[Yt ] = 0,
γ0 = var[Yt ] = 1 + 2.4^2 + 0.8^2 ,
γ1 = 2.4 · (1 + 0.8),
γ2 = 0.8,
γj = 0 if j > 2,

where we used that σ^2 = 1.

(c) Is the process invertible?


The process is called invertible if all roots of the characteristic equation

1 + θ1 z + θ2 z 2 + . . . = 0

lie outside the unit circle. In this case

1 + 2.4z + 0.8z 2 = 0,

which has solutions

z = (−2.4 ± √(2.4^2 − 4 · 0.8)) / 1.6 = −1.5 ± 1.

One of the roots is therefore inside the unit circle, and the process is not invertible.

(d) Find an observationally equivalent representation of the MA process that has roots
outside the unit circle. Use that the variance of this observationally equivalent
MA(2) process is σ̃ 2 = 4.

The autocovariance function of an MA(2) process with parameters (σ̃^2, θ̃1, θ̃2) is

γ0 = σ̃^2 (1 + θ̃1^2 + θ̃2^2 ),
γ1 = σ̃^2 θ̃1 (1 + θ̃2 ),    (22)
γ2 = σ̃^2 θ̃2 .

To obtain the same autocovariances as in (b), you need

7.4 = σ̃ 2 (1 + θ̃12 + θ̃22 )


4.32 = σ̃ 2 θ̃1 (1 + θ̃2 )
0.8 = σ̃ 2 θ̃2

These are three equations for three unknowns. Using that σ̃ 2 = 4, we find θ̃1 = 0.9
and θ̃2 = 0.2. The characteristic equation is

1 + 0.9z + 0.2z 2 = 0,

which has solutions z = −2, and z = −2.5. These solutions are both outside the
unit circle, and hence the MA(2) process in terms of σ̃ 2 , θ̃1 , and θ̃2 is invertible.
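
A small numerical sketch (not part of the original answer): check that both parametrizations imply the same MA(2) autocovariances, and inspect the roots with numpy (np.roots expects the highest power first).

```python
import numpy as np

def ma2_acov(sigma2, t1, t2):
    """Autocovariances (gamma_0, gamma_1, gamma_2) of an MA(2) process."""
    return (sigma2 * (1 + t1**2 + t2**2),
            sigma2 * t1 * (1 + t2),
            sigma2 * t2)

print(ma2_acov(1.0, 2.4, 0.8))   # (7.4, 4.32, 0.8)
print(ma2_acov(4.0, 0.9, 0.2))   # (7.4, 4.32, 0.8) -- same autocovariances

# roots of 1 + theta1 z + theta2 z^2
print(np.roots([0.8, 2.4, 1.0]))  # -2.5 and -0.5: one root inside the unit circle
print(np.roots([0.2, 0.9, 1.0]))  # -2.5 and -2.0: both outside -> invertible
```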

9. (W2V2) Show that the long-run variance of an MA(q) process is equal to
γ_LR = σ^2 (Σ_{j=0}^{q} θj )^2 .

The MA(q) process is

Yt = µ + Σ_{j=0}^{q} θj εt−j ,

where θ0 = 1 (normalization), and εt is white noise with variance σ^2. We can derive

E[Yt ] = µ,
γ0 = E[(Yt − µ)^2 ] = σ^2 Σ_{j=0}^{q} θj^2 ,
γi = E[(Yt − µ)(Yt−i − µ)] = σ^2 Σ_{j=0}^{q−i} θj θj+i   for i ≤ q.

Verify the expression for γi .


The long run variance is then

γ_LR = γ0 + 2 Σ_{i=1}^{∞} γi = σ^2 ( Σ_{j=0}^{q} θj^2 + 2 Σ_{i=1}^{q} Σ_{j=0}^{q−i} θj θj+i ).    (23)

We now use that a square of a sum can be rewritten in terms of a sum of squares and
a sum of cross-products, as

(Σ_{j=0}^{q} θj )^2 = Σ_{j=0}^{q} θj^2 + Σ_{k≠m} θk θm .

To get all possible combinations where k ≠ m, consider first all combinations where
there is a difference of 1 between the indices, so {θ0 θ1 , θ1 θ0 , θ1 θ2 , θ2 θ1 , . . .}. Note that
all of these occur twice. You can write this as 2 Σ_{j=0}^{q−1} θj θj+1 . Similarly, if the difference
between the indices is 2, then you get the sum 2 Σ_{j=0}^{q−2} θj θj+2 . You can continue this
for differences of {3, 4, . . . , q}. Summing all these possibilities gives the second term in
the brackets in (23). We can therefore write

γ_LR = σ^2 (Σ_{j=0}^{q} θj )^2 .    (24)
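
A quick numerical check of γ_LR = σ^2 (Σ_j θj)^2 for an arbitrary MA(q); the θ values below are purely illustrative, not taken from any exercise.

```python
import numpy as np

sigma2 = 1.5
theta = np.array([1.0, 0.7, -0.3, 0.2])          # theta_0, ..., theta_q
# gamma_i = sigma^2 * sum_{j=0}^{q-i} theta_j theta_{j+i}
gamma = [sigma2 * np.sum(theta[: len(theta) - i] * theta[i:])
         for i in range(len(theta))]
gamma_LR = gamma[0] + 2 * sum(gamma[1:])
print(np.isclose(gamma_LR, sigma2 * theta.sum() ** 2))   # True
```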

10. (W2V2) Reconsider the MA(1) process in Question 7.

(a) What is the IRF (impulse response function) when a unit shock occurs at time t
for both the invertible and the not invertible process.

Consider the non-invertible process. Suppose that the MA(1) process is at its
steady state (µ) at t − 1. Then, at time t a shock εt = 1 hits. No more shocks
occur after. The process Yt then satisfies

Yt−1 =µ
Yt = µ + εt
Yt+1 = µ + θ 1 εt
Yt+2 =µ

Consider the invertible process. Suppose that the MA(1) process is at its steady
state (µ) at t − 1. Then, at time t a shock ε̃t = 1 hits. No more shocks occur
after. The process Yt then satisfies

Yt−1 = µ
Yt = µ + ε̃t
Yt+1 = µ + θ̃1 ε̃t
Yt+2 = µ

Since θ1 ≠ θ̃1 , the IRFs are different.

(b) Why are these different?


The reason these IRFs are different is because the shocks that we impose are
different. A unit shock in the non-invertible representation is not a unit shock in
the other. If you are consistent, then you say (for example) that a shock εt = 1
hits. This means that in the other representation, a shock ε̃t = θ1 hits. Since
θ̃1 ε̃t = 1, then you get identical IRFs.

11. (W2V3) Consider the second order difference equation yt = φ1 yt−1 + φ2 yt−2 + wt .

(a) Write this in matrix notation as a first order vector difference equation.
(b) Suppose φ1 = 0.6 and φ2 = −0.08. Is yt stable?

(a). In matrix notation,

(yt , yt−1 )' = F (yt−1 , yt−2 )' + (wt , 0)',   with F = [φ1  φ2 ; 1  0].

(b). It was discussed in the lectures that stability requires the [1,1] element of F^j
to go to zero as j → ∞. This happens if the eigenvalues of F are smaller than one in
absolute value. The eigenvalues satisfy

det(F − λI) = det [φ1 − λ  φ2 ; 1  −λ] = λ^2 − φ1 λ − φ2 = 0.

This is solved by

λ = (φ1 ± √(φ1^2 + 4φ2 )) / 2,

so that λ1 = 0.4 and λ2 = 0.2. Indeed, the eigenvalues are smaller than one in absolute
value, so that yt is stable.
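
A minimal check of part (b) with numpy: compute the eigenvalues of the companion matrix and confirm they lie inside the unit circle.

```python
import numpy as np

phi1, phi2 = 0.6, -0.08
F = np.array([[phi1, phi2],
              [1.0,  0.0]])
eig = np.linalg.eigvals(F)
print(eig)                        # 0.4 and 0.2
print(np.all(np.abs(eig) < 1))    # True -> y_t is stable
```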

12. (W2V4) The Wold decomposition theorem shows that under general conditions the
process Yt can be written as an MA(∞) process. The shocks are then white noise, and
in fact (this was not mentioned in the lectures) also satisfy

εt = Yt − E[Yt |It−1 ],   where It−1 = {Yt−1 , Yt−2 , . . .}.

Show that this implies E[Yt−j εt ] = 0 for j > 0.


This follows from the law of iterated expectations,

E[Yt−j εt ] = E[E[Yt−j εt |It−1 ]]


= E[Yt−j E[εt |It−1 ]]
= 0.

13. (W2V4) In the lectures, we derive the population moments of the AR(1) process via
the MA(∞) representation. Derive the mean and variance of a stable AR(1) process
Yt = c + φ1 Yt−1 + εt by invoking (1) stability, (2) covariance-stationarity, and (3)
εt = Yt − E[Yt |It−1 ], where It−1 = {Yt−1 , Yt−2 , . . .}.

Taking the expectation on the left and right-hand side of the AR(1) process, we get

E[Yt ] = c + φ1 E[Yt−1 ]. (25)

By covariance stationarity E[Yt ] = E[Yt−1 ], so that

E[Yt ] = c + φ1 E[Yt ]  ⇒  E[Yt ] = c / (1 − φ1 ) ≡ µ.    (26)

For the variance, we use that Yt − µ = φ1 (Yt−1 − µ) + εt ,

E[(Yt − µ)2 ] = E[(φ1 (Yt−1 − µ) + εt )2 ]


= φ21 E[(Yt−1 − µ)2 ] + 2φ1 E[(Yt−1 − µ)εt ] + E[ε2t ]

The cross-product is zero via property (3) in the question and the answer to question
2. Also, by covariance stationarity E[(Yt − µ)2 ] = var[Yt ] = var[Yt−1 ] = E[(Yt−1 − µ)2 ],
and hence

var[Yt ] = φ1^2 var[Yt ] + σ^2  ⇒  var[Yt ] = σ^2 / (1 − φ1^2 ).    (27)

14. (W2V4) Suppose a stable AR(1) is initialized by Y0 . What assumptions do you need
to make on Y0 to guarantee that E[Y0 ] = E[Yt ] and var(Y0 ) = var(Yt )?
We first use recursive substitution to obtain

Yt = c Σ_{i=0}^{t−1} φ1^i + φ1^t Y0 + Σ_{i=0}^{t−1} φ1^i εt−i .

Now, taking expectations left and right, we have

E[Yt ] = c Σ_{i=0}^{t−1} φ1^i + φ1^t E[Y0 ].

Using the finite geometric series, we see that E[Yt ] = E[Y0 ] for all t requires

E[Y0 ] = c / (1 − φ1 ).

Similarly (verify this!), we obtain that var(Y0 ) = σ^2 / (1 − φ1^2 ).

15. (W2V4) Suppose ηt = σt εt where εt satisfies E[εt^2 |ηt−1 ] = 1, and

σt^2 = α0 + α1 ηt−1^2 .

(a) Define ut = ηt^2 − σt^2 . Show that E[ut ] = 0.

E[ut ] = E[ηt^2 ] − E[σt^2 ]
       = E[(εt^2 − 1)σt^2 ]
       = E[(εt^2 − 1)(α0 + α1 ηt−1^2 )]
       = E[E[(εt^2 − 1)(α0 + α1 ηt−1^2 )|ηt−1 ]]    (28)
       = E[(α0 + α1 ηt−1^2 )E[(εt^2 − 1)|ηt−1 ]]
       = E[(α0 + α1 ηt−1^2 ) · 0]
       = 0.

(b) We can write ηt^2 as ηt^2 = c + φ1 ηt−1^2 + wt . Find c, φ1 and wt in terms of α0 , α1 and
ut . Hint: use ut in the previous subquestion.

ηt^2 = ut + σt^2
     = ut + α0 + α1 ηt−1^2 ,    (29)

so c = α0 , φ1 = α1 and wt = ut .

(c) Suppose ηt^2 is covariance stationary, what is E[ηt^2 ]?

E[ηt^2 ] = E[ut ] + α0 + α1 E[ηt−1^2 ].    (30)

Assuming covariance stationarity, E[ηt^2 ] = E[ηt−1^2 ], and hence

E[ηt^2 ] = α0 / (1 − α1 ).    (31)

16. (W2V5) Suppose εt is WN with variance σ 2 . Is the following AR(2) process stable?

Yt = 0.8Yt−1 − 0.3Yt−2 + εt .

For stability, the roots of the characteristic equation should be outside the unit circle.
The characteristic equation of this AR(2) process is

1 − 0.8z + 0.3z^2 = 0.    (32)

This has two (complex) solutions

z = (0.8 ± √0.56 i) / 0.6.    (33)

For stability, we need |z| > 1. We have

|z| = √( (0.8/0.6)^2 + 0.56/0.6^2 ) = 1.82 > 1.    (34)

So the AR(2) process is stable.
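
A short check (not part of the original answer): compute the roots of 1 − 0.8z + 0.3z^2 with numpy and verify that their modulus exceeds one.

```python
import numpy as np

roots = np.roots([0.3, -0.8, 1.0])   # highest power first
print(roots)                          # approximately 1.333 +/- 1.247i
print(np.abs(roots))                  # both about 1.826 > 1 -> stable
```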

17. (W2V5) Suppose

Yt = 0.8Yt−1 − 0.3Yt−2 + εt .

What are the first three coefficients of the MA(∞) representation of this process?
You can use recursive substitution for this:

Yt = 0.8Yt−1 − 0.3Yt−2 + εt
   = 0.8(0.8Yt−2 − 0.3Yt−3 + εt−1 ) − 0.3Yt−2 + εt
   = (0.8^2 − 0.3)(0.8Yt−3 − 0.3Yt−4 + εt−2 ) − 0.3 · 0.8 Yt−3 + 0.8εt−1 + εt
   = εt + 0.8εt−1 + (0.8^2 − 0.3)εt−2 + (0.8^3 − 2 · 0.3 · 0.8)Yt−3 − (0.8^2 − 0.3) · 0.3 Yt−4 .    (35)

So the first three coefficients in the MA representation are {1, 0.8, 0.8^2 − 0.3}.

18. (W2V5) Suppose

Yt = 0.8Yt−1 − 0.3Yt−2 + εt .

What is the conditional expectation E[Yt+2 |Yt , Yt−1 ]?
We know that

Yt+2 = 0.8Yt+1 − 0.3Yt + εt+2 ,    (36)

so that

E[Yt+2 |Yt , Yt−1 ] = 0.8 E[Yt+1 |Yt , Yt−1 ] − 0.3Yt
                    = (0.8^2 − 0.3)Yt − 0.3 · 0.8 Yt−1 .    (37)

Exercises Dynamic Econometrics 2022

Week 3

Theory exercises
1. (W3V1) Write the ARMA(1,1) model in MA(∞) form and find the MA coefficients.
The ARMA(1,1) model is given by

(1 − φ1 L)Yt = c + (1 + θ1 L)εt .

To get rid of the AR part, multiply this from the left by (ψ0 + ψ1 L + ψ2 L^2 + ψ3 L^3 + . . .).
We should have

(ψ0 + ψ1 L + ψ2 L^2 + ψ3 L^3 + . . .)(1 − φ1 L) = 1.

From this, we find

ψ0 = 1, ψ1 = φ1 , ψ2 = φ1^2 , ψ3 = φ1^3 , . . . , ψj = φ1^j .

On the right hand side, we then have

(1 + φ1 L + φ1^2 L^2 + φ1^3 L^3 + . . .)(1 + θ1 L)
= 1 + (θ1 + φ1 )L + φ1 (θ1 + φ1 )L^2 + φ1^2 (θ1 + φ1 )L^3 + . . .
= 1 + Σ_{i=1}^{∞} ξi L^i ,

where ξi = φ1^{i−1} (φ1 + θ1 ).
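
A minimal numerical check, assuming illustrative values φ1 = 0.5 and θ1 = 0.3: the MA(∞) weights should satisfy ξ0 = 1 and ξi = φ1^(i−1)(φ1 + θ1).

```python
import numpy as np

phi1, theta1, n = 0.5, 0.3, 8

psi = phi1 ** np.arange(n)                  # coefficients of 1/(1 - phi1 L)
xi = np.convolve(psi, [1.0, theta1])[:n]    # multiply by (1 + theta1 L)

closed = np.r_[1.0, phi1 ** np.arange(n - 1) * (phi1 + theta1)]
print(np.allclose(xi, closed))              # True
```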

2. (W3V1) Write the ARMA(1,1) model in AR(∞) form and find the AR coefficients

The ARMA(1,1) model is given by

(1 − φ1 L)Yt = c + (1 + θ1 L)εt .

To get rid of the MA part, multiply this from the left by (ψ0 + ψ1 L + ψ2 L^2 + ψ3 L^3 + . . .).
We should have

(ψ0 + ψ1 L + ψ2 L^2 + ψ3 L^3 + . . .)(1 + θ1 L) = 1.

From this, we find

ψ0 = 1, ψ1 = −θ1 , ψ2 = θ1^2 , ψ3 = −θ1^3 , . . . , ψj = (−1)^j θ1^j .

On the left hand side, we then have

(1 + Σ_{j=1}^{∞} (−1)^j θ1^j L^j )(1 − φ1 L)
= 1 − (θ1 + φ1 )L + θ1 (θ1 + φ1 )L^2 − θ1^2 (θ1 + φ1 )L^3 + . . .
= 1 + Σ_{i=1}^{∞} ξi L^i ,

where ξi = (−1)^i θ1^{i−1} (φ1 + θ1 ).

3. (W3V1) Show that an ARMA(1,1) model with θ1 = −φ1 is white noise.
From the answers above, we see that if θ1 = −φ1 , we obtain Yt = c + εt , so this is white
noise (plus a drift term c).

4. (W3V1) Specify the order (p and q) of the following ARMA processes and determine
whether they are stable and/or invertible.

Yt + 0.19Yt−1 − 0.45Yt−2 = εt ,
Yt + 1.99Yt−1 + 0.88Yt−2 = εt + 0.2εt−1 + 0.8εt−2 ,
Yt + 0.6Yt−2 = εt + 1.2εt−1 .

These are: an AR(2) model, an ARMA(2,2) model, and an ARMA(2,1) model.
For stability, we need the roots of the characteristic equation for the AR part to be
outside of the unit circle.

Process 1: 1 + 0.19z − 0.45z 2 = 0. Roots: [1.7167, −1.2945]


Process 2: 1 + 1.99z + 0.88z 2 = 0. Roots: [−1.5076, −0.7537]
Process 3: 1 + 0.6z 2 = 0. Roots: [−1.2910i, 1.2910i]

We conclude that Process 1 and 3 are stable, and Process 2 is not.


For invertibility, we need the roots of the characteristic equation for the MA part to
be outside of the unit circle.

Process 1: No MA part
Process 2: 1 + 0.2z + 0.8z 2 = 0. Roots: [−0.125 + 1.111i, −0.125 − 1.111i]
Process 3: 1 + 1.2z = 0. Root: [−0.8333]

We conclude the second process is invertible, but the third is not.
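
A sketch of the root checks with numpy (np.roots expects the highest power first); a process is stable/invertible when the listed roots all lie outside the unit circle.

```python
import numpy as np

ar_polys = {"process 1": [-0.45, 0.19, 1.0],   # 1 + 0.19 z - 0.45 z^2
            "process 2": [0.88, 1.99, 1.0],    # 1 + 1.99 z + 0.88 z^2
            "process 3": [0.6, 0.0, 1.0]}      # 1 + 0.6 z^2
ma_polys = {"process 2": [0.8, 0.2, 1.0],      # 1 + 0.2 z + 0.8 z^2
            "process 3": [1.2, 1.0]}           # 1 + 1.2 z

for name, poly in ar_polys.items():
    r = np.roots(poly)
    print(name, "AR roots", np.round(r, 3), "stable:", bool(np.all(np.abs(r) > 1)))
for name, poly in ma_polys.items():
    r = np.roots(poly)
    print(name, "MA roots", np.round(r, 3), "invertible:", bool(np.all(np.abs(r) > 1)))
```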

5. (W3V1) Suppose Yt follows an ARMA(2,1) process. Derive the first four coefficients
in the AR(∞) representation of this process
The ARMA(2,1) process is written as

(1 − φ1 L − φ2 L2 )Yt = (1 + θ1 L)εt

The AR(∞) representation is

(1 − π1 L − π2 L2 − . . .)Yt = εt

Multiplying both sides by (1 + θ1 L), we have

(1 + θ1 L)(1 − π1 L − π2 L2 − . . .)Yt = (1 + θ1 L)εt

We can now find the coefficients πi by matching the left hand side of this equation to
that of the first equation. We find

θ1 − π1 = −φ1
−θ1 π1 − π2 = −φ2
−π3 − π2 θ1 =0
−π4 − π3 θ1 =0

These can be written in terms of π1 , . . . , π4 to answer the question.

6. (W3V1) Consider the process

Y t = X t + εt ,
Xt = φ1 Xt−1 + ηt ,

where εt and ηt are strictly white noise with variance σε2 and ση2 respectively. Also, εt
is independent of ηt .

(a) For which values of φ1 is Xt a stable process?


The characteristic equation is 1 − φ1 z = 0, which has the root 1/φ1 . For this root
to be outside of the unit circle, we need |φ1 | < 1

(b) Show that if φ1 = 1, then ∆Yt = (1 − L)Yt is an ARMA(p, q) process. Determine


p, q and the coefficients of the AR and MA polynomials.

We have

∆Yt = Yt − Yt−1
    = Xt − Xt−1 + εt − εt−1
    = ηt + εt − εt−1 .

If we calculate the autocovariance function of this process, we get

γ0 = σ_η^2 + 2σ_ε^2 ,   γ1 = −σ_ε^2 ,

and γj = 0 if j > 1. For a general MA(1) process, we have

γ0 = σ^2 (1 + θ1^2 ),   γ1 = σ^2 θ1 .

Equating the two, we get

σ^2 θ1 = −σ_ε^2     (1)
σ^2 (1 + θ1^2 ) = σ_η^2 + 2σ_ε^2     (2)

Adding twice the first equation to the second, we get for the second equation

σ^2 (1 + θ1 )^2 = σ_η^2 .

This gives

θ1 = −1 + √(σ_η^2 / σ^2 ).

Using this expression in (1), this can be written as

−σ^2 + σ_η σ + σ_ε^2 = 0.

This gives

σ = (σ_η + √(σ_η^2 + 4σ_ε^2 )) / 2,

(since we are only interested in the positive solution). Finally, using (1) again,

θ1 = −σ_ε^2 /σ^2 = −4σ_ε^2 / (σ_η + √(σ_η^2 + 4σ_ε^2 ))^2 .

The MA(1) process with these parameters θ1 and σ^2 has the same autocovariance
function as ∆Yt .

7. (W3V1) Suppose Yt = φ1 Yt−1 + εt + θ1 εt−1 . Now define the process Zt = Yt + Xt with
Xt white noise with variance σx2 .

(a) Derive the ARMA model for Zt .


First write (1 − φ1 L)Yt = (1 + θ1 L)εt , so that Yt = [(1 + θ1 L)/(1 − φ1 L)] εt . Substituting that
into the equation for Zt and then multiplying by (1 − φ1 L), we have

(1 − φ1 L)Zt = (1 + θ1 L)εt + (1 − φ1 L)Xt .    (1)

The right-hand side has variance

γ0 = (1 + θ1^2 )σ^2 + (1 + φ1^2 )σ_x^2     (2)

and autocovariance

γ1 = θ1 σ^2 − φ1 σ_x^2 .    (3)

Suppose we write the right-hand side as ηt + θ̃1 ηt−1 where E[ηt ] = 0 and E[ηt−1^2 ] = σ̃^2 .
This would have variance γ0 = (1 + θ̃1^2 )σ̃^2 , and γ1 = θ̃1 σ̃^2 . Note that γ0 and
γ1 should be the same as (2) and (3). You can solve for σ̃^2 and θ̃1 in terms of γ0
and γ1 to get a genuine MA(1) representation of the right-hand side of (1).
In conclusion, we find that Zt also follows an ARMA(1,1) model.

(b) Suppose you know the parameters of the model for Zt , can you uniquely determine
the parameters of the model for Yt ?
The process for Zt can be described by three parameters (AR, MA and error
variance). However on the right hand side of Zt = Yt + Xt there are in total
four parameters (AR, MA and two error variance parameters). It is therefore
impossible to retrieve these parameters from the parameters describing Zt .

8. (W3V3) Suppose
Yt = φ1 Yt−1 + β1 Xt + εt ,
where Yt was initialized in the infinite past and |φ1 | < 1.

(a) Show that this is equivalent to the model

Yt = β1 Σ_{j=0}^{∞} φ1^j Xt−j + vt ,
vt = φ1 vt−1 + εt .

By recursive substitution of Yt−j , we have

Yt = φ1 Yt−1 + β1 Xt + εt
   = φ1 (φ1 Yt−2 + β1 Xt−1 + εt−1 ) + β1 Xt + εt
   = ...
   = lim_{j→∞} φ1^j Yt−j + β1 Σ_{j=0}^{∞} φ1^j Xt−j + Σ_{j=0}^{∞} φ1^j εt−j .

Since |φ1 | < 1, the first term can be safely ignored. Since vt = φ1 vt−1 + εt , we have
that vt = Σ_{j=0}^{∞} φ1^j εt−j . In total we have the requested result.

(b) What is the short run effect of Xt on Yt ?

The short run effect is defined as

∂Yt /∂Xt = β1 .    (4)

(c) What is the long run effect of X on Y ?

For the long-run effect, we first suppose we are in equilibrium. In equilibrium

Ỹ = φ1 Ỹ + β1 X̃.

The long-run effect is defined as

∂Ỹ /∂X̃ = β1 / (1 − φ1 ).    (5)

Why is this called the long run effect? We see that the immediate impact of a
unit change in Xt on Yt is equal to β1 . In the next period, this effect is φ1 β1 .
Now think of Yt measuring GDP growth. Then the effect on growth is diminishing
over time, but the effect on the level of GDP is the sum of all the growth effects,
i.e. Σ_{i=0}^{∞} β1 φ1^i = β1 / (1 − φ1 ).

9. (W3V4) Consider the AR(2) process Yt = φ1 Yt−1 +φ2 Yt−2 +εt for t = 2, . . . , T . Suppose
that εt is Gaussian white noise with variance σ 2 .
(a) Write down the density of Yt conditional on Yt−1 and Yt−2 .
Since we assume εt to have a normal distribution, we know that

fYt|Yt−1,Yt−2 (yt |yt−1 , yt−2 ) = (1/√(2πσ^2)) exp( −(yt − φ1 yt−1 − φ2 yt−2 )^2 / (2σ^2) ).    (6)

(b) Rewrite the density fY3 ,Y2 ,Y1 ,Y0 (y3 , y2 , y1 , y0 ) as the product of two conditional den-
sities as in the previous subquestion, and the joint density of Y1 and Y0 .
Note that
fY2 ,Y1 ,Y0 (y2 , y1 , y0 ) = fY2 |Y1 ,Y0 (y2 |y1 , y0 )fY1 ,Y0 (y1 , y0 )
And also

fY3 ,Y2 ,Y1 ,Y0 (y3 , y2 , y1 , y0 ) = fY3 |Y2 ,Y1 ,Y0 (y3 |y2 , y1 , y0 )fY2 ,Y1 ,Y0 (y2 , y1 , y0 )
= fY3 |Y2 ,Y1 (y3 |y2 , y1 )fY2 |Y1 ,Y0 (y2 |y1 , y0 )fY1 ,Y0 (y1 , y0 )

where we used that conditional on Y2 and Y1 , the distribution of Y3 does not


depend on Y0 .

(c) Write down the log-likelihood ℓ(φ1 , φ2 , σ^2 ) using a similar conditioning approach
as shown in the lectures for the AR(1) model.
Continuing in the same fashion as in the previous subquestion, we have that

fYT,YT−1,...,Y0 (yT , yT−1 , . . . , y0 ) = [ Π_{t=2}^{T} fYt|Yt−1,Yt−2 (yt |yt−1 , yt−2 ) ] fY1,Y0 (y1 , y0 ).

Taking logs and using the density from (6), we have

ℓ(φ1 , φ2 , σ^2 ) = (T − 1) [ −(1/2) log(2π) − (1/2) log(σ^2 ) − (1/(2(T − 1))) Σ_{t=2}^{T} (yt − φ1 yt−1 − φ2 yt−2 )^2 / σ^2 ]
                 + log fY1,Y0 (y1 , y0 ).

(d) Write down the concentrated log-likelihood, ℓ(φ1 , φ2 , σ̂^2 (φ1 , φ2 )), where
σ̂^2 (φ1 , φ2 ) = (1/(T − 1)) Σ_{t=2}^{T} (yt − φ1 yt−1 − φ2 yt−2 )^2 .

Substituting σ̂^2 (φ1 , φ2 ) for σ^2 , we see that the concentrated log-likelihood is

ℓ = (T − 1) [ −(1/2) log(2π) − (1/2) log( (1/(T − 1)) Σ_{t=2}^{T} (yt − φ1 yt−1 − φ2 yt−2 )^2 ) − 1/2 ]
    + log fY1,Y0 (y1 , y0 ).

(e) Show that ignoring the contribution to the concentrated likelihood of the first two
observations is equivalent to minimizing (1/(T − 1)) Σ_{t=2}^{T} (yt − φ1 yt−1 − φ2 yt−2 )^2 .

If we ignore log fY1,Y0 (y1 , y0 ), then maximizing the concentrated log-likelihood is
equivalent to minimizing the sum of squared errors (1/(T − 1)) Σ_{t=2}^{T} (yt − φ1 yt−1 − φ2 yt−2 )^2 .

(f) Suppose Yt is covariance stationary. The joint density of the first two observations
is given by a multivariate normal distribution. Define y_I = (y0 , y1 )', then

fY0,Y1 (y0 , y1 ) = (1/√((2π)^2 |Σ|)) exp( −(1/2) (y_I − µ)' Σ^{−1} (y_I − µ) ),

where |Σ| denotes the determinant of Σ, the variance-covariance matrix of (Y0 , Y1 ).
Determine µ, Σ, and Σ^{−1}.
Since Yt is covariance stationary, (Y0 , Y1 ) has the same distribution as (Yt , Yt−1 ).
Since the AR(2) process above does not contain an intercept, µ = E[(Yt , Yt−1 )] = (0, 0)'.
Calculate the variance-covariance matrix Σ element by element. First of all,

var(Yt ) = φ1^2 var(Yt−1 ) + φ2^2 var(Yt−2 ) + 2φ1 φ2 cov(Yt−1 , Yt−2 ) + σ^2 ,
cov(Yt , Yt−1 ) = E[(φ1 Yt−1 + φ2 Yt−2 + εt )Yt−1 ]
               = φ1 var(Yt−1 ) + φ2 E[Yt−2 Yt−1 ].

By covariance stationarity cov(Yt , Yt−1 ) = cov(Yt−1 , Yt−2 ), so that we obtain
cov(Yt , Yt−1 ) = (φ1 /(1 − φ2 )) var(Yt−1 ). Again by covariance stationarity, we now also have

var(Yt ) = σ^2 / (1 − φ1^2 − φ2^2 − 2φ1^2 φ2 /(1 − φ2 )) = (1 − φ2 )σ^2 / ( (1 + φ2 ) [(1 − φ2 )^2 − φ1^2 ] ).

(Of course, this result can also be obtained using the Yule-Walker equations.)
In total, we now have

Σ = σ^2 / ( (1 + φ2 ) [(1 − φ2 )^2 − φ1^2 ] ) · [ 1 − φ2   φ1 ;  φ1   1 − φ2 ].

The inverse of this matrix is given by

Σ^{−1} = σ^{−2} [ 1 − φ2^2   −φ1 (1 + φ2 ) ;  −φ1 (1 + φ2 )   1 − φ2^2 ].

10. (W3V4) Consider the MA(2) process Yt = εt +θ1 εt−1 +θ2 εt−2 for t = 2, . . . , T . Suppose
that εt is Gaussian white noise with variance σ 2 .

(a) What conditioning step would you take to estimate the parameters using conditional
maximum likelihood?
Note that in an MA(2) process

fYt|εt−1,εt−2 (yt ) = (1/√(2πσ^2)) exp( −(yt − θ1 εt−1 − θ2 εt−2 )^2 / (2σ^2) )
                   = (1/√(2πσ^2)) exp( −εt^2 / (2σ^2) ).    (7)

Suppose that we know that ε0 = ε1 = 0, that is, we set the first two shocks equal
to their expected value. This is the essential conditioning step.
We have

fY2|ε1=0,ε0=0 (y2 ) = (1/√(2πσ^2)) exp( −y2^2 / (2σ^2) ).

If in addition we know y2 , we know with certainty that ε2 = y2 . This means that

fY3|Y2,ε1=0,ε0=0 (y3 ) = fY3|ε2,ε1=0,ε0=0 (y3 )
                      = fY3|ε2,ε1=0 (y3 )
                      = (1/√(2πσ^2)) exp( −(y3 − θ1 ε2 )^2 / (2σ^2) ).

Similarly, if in addition we know Y3 , then we know with certainty that
ε3 = y3 − θ1 ε2 . This means that

fY4|Y3,Y2,ε1=0,ε0=0 (y4 ) = fY4|ε3,ε2,ε1=0,ε0=0 (y4 )
                         = fY4|ε3,ε2 (y4 )
                         = (1/√(2πσ^2)) exp( −(y4 − θ1 ε3 − θ2 ε2 )^2 / (2σ^2) ).

Continuing in this fashion, we see that

fYt |Yt−1 ,Yt−2 ,...,Y2 ,ε1 =0,ε0 =0 (yt ) = fYt |εt−1 ,εt−2 (yt )

(b) Write down the conditional log-likelihood.

The conditional likelihood is given by

fYT,...,Y2|ε1=0,ε0=0 (yT , . . . , y2 |ε1 = 0, ε0 = 0)
  = fYT|YT−1,...,Y2,ε1=0,ε0=0 (yT |yT−1 , . . . , y2 , ε1 = 0, ε0 = 0)
    · fYT−1,...,Y2|ε1=0,ε0=0 (yT−1 , . . . , y2 |ε1 = 0, ε0 = 0)
  = fYT|εT−1,εT−2 (yT |εT−1 , εT−2 )
    · fYT−1,...,Y2|ε1=0,ε0=0 (yT−1 , . . . , y2 |ε1 = 0, ε0 = 0)
  = Π_{t=2}^{T} fYt|εt−1,εt−2 (yt |εt−1 , εt−2 ).

Now using (7), the conditional log-likelihood is

ℓ(θ1 , θ2 , σ^2 ) = (T − 1) [ −(1/2) log(2π) − (1/2) log(σ^2 ) − (1/(2(T − 1)σ^2)) Σ_{t=2}^{T} εt^2 ].

Exercises Dynamic Econometrics 2022

Week 4

1. (W4V1) Show that if XT →p cX and ZT →p cZ , where cX and cZ are constants, then

   - XT + ZT →p cX + cZ
   - XT ZT →p cX cZ

Since ZT →p cZ , with cZ a constant, and the fact that convergence in probability implies
convergence in distribution, we also have ZT →d cZ . Applying Slutsky’s theorem, this
shows that

   - XT + ZT →d cX + cZ
   - XT ZT →d cX cZ

Since convergence in distribution to a constant implies convergence in probability, we
have the desired result.

2. (W4V1) Let ST denote the sample average of {Y1 , . . . , YT }. Derive limT→∞ T · var(ST )
if:
(a) Yt is a white noise process with variance σ^2.
For all questions, we know that if Yt has absolutely summable autocovariances,
then

lim_{T→∞} T var(ȳT ) = γLR = γ0 + 2 Σ_{j=1}^{∞} γj .

If Yt is white noise, then the process is uncorrelated over time, and hence
limT→∞ T var(ȳT ) = σ^2 .

(b) Yt is an MA(1) process.

If Yt is an MA(1) process, then γ0 = σ^2 (1 + θ1^2 ), γ1 = σ^2 θ1 , and hence
limT→∞ T var(ȳT ) = σ^2 (1 + θ1 )^2 .

(c) Yt is an AR(1) process.
If Yt follows an AR(1) process, then γj = φ1^j γ0 , and hence

lim_{T→∞} T var(ȳT ) = (σ^2 / (1 − φ1^2 )) ( 1 + 2 Σ_{j=1}^{∞} φ1^j )
                     = (σ^2 / (1 − φ1^2 )) ( 1 + 2/(1 − φ1 ) − 2 )
                     = σ^2 / (1 − φ1 )^2 .

3. (W4V2) Consider the covariance stationary AR(1) process Yt = c + φ1 Yt−1 + εt . Suppose
you estimate the parameter vector β = (c, φ1 )' by least squares. Show that the
asymptotic variance of φ̂1 is equal to 1 − φ1^2 .
Define xt = (1, yt−1 )'. We know that

√T (β̂ − β) →d N (0, σ^2 Q^{−1} ),

where Q = E[xt xt']. We work out σ^2 Q^{−1} explicitly:

σ^2 Q^{−1} = σ^2 ( E[ 1   yt−1 ;  yt−1   yt−1^2 ] )^{−1}
          = σ^2 [ 1   c/(1 − φ1 ) ;  c/(1 − φ1 )   σ^2/(1 − φ1^2 ) + c^2/(1 − φ1 )^2 ]^{−1}
          = (1 − φ1^2 ) [ σ^2/(1 − φ1^2 ) + c^2/(1 − φ1 )^2   −c/(1 − φ1 ) ;  −c/(1 − φ1 )   1 ].

The variance of φ̂1 is given by the [2, 2] element of this matrix, so we have
avar(φ̂1 ) = 1 − φ1^2 .
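
A Monte Carlo sketch of this result (all parameter values are illustrative, not part of the original answer): the sample variance of √T(φ̂1 − φ1) should be close to 1 − φ1^2.

```python
import numpy as np

rng = np.random.default_rng(1)
c, phi1, sigma, T, R = 0.5, 0.7, 1.0, 1000, 1000
est = np.empty(R)
for r in range(R):
    eps = rng.normal(scale=sigma, size=T)
    y = np.empty(T)
    y[0] = c / (1 - phi1) + eps[0] / np.sqrt(1 - phi1**2)   # stationary start
    for t in range(1, T):
        y[t] = c + phi1 * y[t - 1] + eps[t]
    X = np.column_stack([np.ones(T - 1), y[:-1]])           # regress y_t on (1, y_{t-1})
    est[r] = np.linalg.lstsq(X, y[1:], rcond=None)[0][1]

print(T * est.var())   # close to 1 - phi1^2 = 0.51
```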

4. (W4V2) Suppose {Yt } is a mean zero, independent sequence and E[|Yt |] < ∞. Show
that {Yt } is a martingale difference sequence with respect to the information set It =
{Yt , Yt−1 , . . .}, i.e. E[Yt |It−1 ] = 0.
{Yt } is an MDS if E[Yt |It−1 ] = 0. Since Yt is independent of Yt−1 , . . . , Y0 , we have

E[Yt |It−1 ] = E[Yt ] = 0.

We conclude Yt is an MDS.

5. (W4V2) Suppose {Yt } is a martingale difference sequence with respect to the informa-
tion set It = {Yt , Yt−1 , . . .}. Show that E[Yt+m |It ] = 0 for m > 0.
Note first that

E[X|Y, Z] = 0 ⇒ E[E[X|Y, Z]|Z] = E[X|Z] = 0.    (1)

Since Yt is an MDS, we have (by definition) E[Yt+m |It+m−1 ] = 0. Define
C_{t+1}^{t+m−1} = It+m−1 \ It , i.e. C_{t+1}^{t+m−1} is the information that has accumulated between
time t + 1 and t + m − 1. Then we know that

E[Yt+m |It+m−1 ] = E[Yt+m |It , C_{t+1}^{t+m−1} ] = 0.

By (1), we then have E[Yt+m |It ] = 0 for m > 0.

6. (W4V2) Suppose Yt = εt εt−1 , and εt is some stochastic process. Suppose you can show
that Yt is an MDS with respect to the information set It−1 = {εt−1 , εt−2 , . . .}. Is Yt a
martingale difference sequence with respect to the information set Jt = {Yt , Yt−1 , . . .}?
Evidently, E[|Yt |] < ∞ and Yt ∈ Jt . The only thing left to show is that E[Yt |Jt−1 ] = 0
where Jt−1 = {Yt−1 , Yt−2 , . . .}. Note now that

E[εt εt−1 |Jt−1 ] = E [E[εt εt−1 |It−1 ]|Jt−1 ] = 0.

Here we used that Jt−1 ⊂ It−1 . Hence, Yt is an MDS w.r.t. Jt as well.

7. (W4V2) Give an example of a martingale difference sequence that is not covariance


stationary.
Let εt be mean zero and independent, but not identically distributed. For example,
let var(εt ) = σt^2 with σt^2 ≠ σs^2 for t ≠ s. In this case, εt is not covariance stationary.
Because the sequence is independent, we do have E[εt |It−1 ] = 0, so it is a martingale
difference sequence.

8. (W4V3) The usual estimator for the sample mean is µ̂T = (1/T) Σ_{t=1}^{T} yt . Suppose the
underlying process Yt is such that the sample mean is a consistent estimator for the
population mean µ, i.e. µ̂T →p µ. Show that the alternative estimator
µ̄T = (1/(T − k)) Σ_{t=1}^{T} yt , with k > 0, is

(a) consistent if k = 3

Write

µ̄T = (T/(T − k)) µ̂T = (1/(1 − k/T)) µ̂T = cT µ̂T .

Since k/T → 0, we have cT →p 1. Also, it is given that µ̂T →p µ. Now µ̄T = g(µ̂T , cT )
with g(·) a continuous function of its arguments. By the continuous mapping
theorem, we then have g(µ̂T , cT ) →p g(µ, 1), i.e. µ̄T →p µ. We conclude that µ̄T is
a consistent estimator for µ.

(b) inconsistent if k = T/2
In this case, we have cT →p 2. By the same argument as above g(µ̂T , cT ) →p g(µ, 2),
i.e. µ̄T →p 2µ. So in this case µ̄T is an inconsistent estimator for µ.

Hint: use the continuous mapping theorem.

9. (W4V3) Consider the following MA(1) process: Yt = εt + θ1 εt−1 . Assume that εt is
i.i.d. with variance σ^2 < ∞. In addition, suppose E[εt^4 ] < M < ∞.
Suppose a researcher does not know that Yt follows an MA(1) process and instead
assumes Yt follows an AR(1) process (Yt = φ1 Yt−1 + εt ). The researcher estimates the
AR parameter using least squares

φ̂1 = [ (1/(T − 1)) Σ_{t=2}^{T} yt yt−1 ] / [ (1/(T − 1)) Σ_{t=2}^{T} yt−1^2 ].

We will show in steps that the probability limit of φ̂1 is θ1 /(1 + θ1^2 ).

(a) Show that the numerator of φ̂1 can be written as

(1/(T − 1)) Σ_{t=2}^{T} yt yt−1 = σ^2 θ1 + (1/(T − 1)) Σ_{t=2}^{T} (η1t + η2t + η3t + η4t )

with

η1t = θ1 (εt−1^2 − σ^2 )
η2t = εt εt−1
η3t = θ1^2 εt−1 εt−2
η4t = θ1 εt εt−2

(b) Show that η1t , η2t , η3t , η4t are martingale difference sequences with respect to
their respective information sets It = {ηit , ηit−1 , . . .}.

(c) Show that the variance of ηit is bounded for i = 1, . . . , 4.
(d) Invoke the WLLN for mixingales to find the probability limit of the numerator.
(e) Find a decomposition of the denominator analogous to the one for the numerator
provided above to find the probability limit of the denominator.
(f) Use Slutsky’s theorem to show that φ̂1 →p θ1 /(1 + θ1^2 ).

First analyze the numerator. Rewrite this as

(1/(T − 1)) Σ_{t=2}^{T} yt yt−1 = (1/(T − 1)) Σ_{t=2}^{T} (εt + θ1 εt−1 )yt−1
  = θ1 (1/(T − 1)) Σ_{t=2}^{T} εt−1 yt−1 + (1/(T − 1)) Σ_{t=2}^{T} εt yt−1
  = θ1 (1/(T − 1)) Σ_{t=2}^{T} εt−1^2 + (1/(T − 1)) Σ_{t=2}^{T} εt εt−1
    + θ1^2 (1/(T − 1)) Σ_{t=2}^{T} εt−1 εt−2 + θ1 (1/(T − 1)) Σ_{t=2}^{T} εt εt−2
  = σ^2 θ1 + (1/(T − 1)) Σ_{t=2}^{T} (η1t + η2t + η3t + η4t )

with

η1t = θ1 (εt−1^2 − σ^2 )
η2t = εt εt−1
η3t = θ1^2 εt−1 εt−2
η4t = θ1 εt εt−2

The trick is now (1) to show that {η1t }, {η2t }, {η3t }, {η4t } are martingale difference
sequences, (2) check that E[|ηit |^2 ] < ∞ (so the condition in the WLLN for mixingales
with r = 2), and (3) invoke the WLLN.

Step 1: MDS. Define the information set It = {εt , εt−1 , . . .}. Note that
E[|εt εt−1 |] ≤ √(E[εt^2 ]E[εt−1^2 ]) < ∞, εt εt−1 ∈ It and E[εt εt−1 |It−1 ] = 0 since εt is
i.i.d. Hence, {η2t } and {η3t } are MDS with respect to It .
Also, E[|εt εt−2 |] < ∞, εt εt−2 ∈ It and E[εt εt−2 |It−1 ] = 0, so that {η4t } is an MDS w.r.t. It .
Finally E[|εt^2 − σ^2 |] ≤ E[εt^2 ] + σ^2 < ∞, εt^2 ∈ It and E[εt^2 − σ^2 |It−1 ] = 0, and hence {η1t }
is an MDS w.r.t. It .

Step 2: check the condition for the WLLN. Take r = 2. Using Cauchy-Schwarz,
we have E[η2t^2 ] = E[(εt εt−1 )^2 ] ≤ √(E[εt^4 ]E[εt−1^4 ]) < ∞ by the assumption
that the fourth moment of εt is finite. The argument for η3t and η4t is completely
analogous. With regard to η1t , we have E[η1t^2 ] = θ1^2 E[(εt−1^2 − σ^2 )^2 ] = θ1^2 [E[εt−1^4 ] − σ^4 ] < ∞.

Step 3: Invoke the WLLN. Since the condition for the WLLN to hold is
satisfied for all four sums, they all converge to zero in probability, and the only term
that remains is σ^2 θ1 .

Write the denominator as

(1/(T − 1)) Σ_{t=2}^{T} yt−1^2 = σ^2 (1 + θ1^2 ) + (1/(T − 1)) Σ_{t=2}^{T} ηt−1 ,   ηt = yt^2 − σ^2 (1 + θ1^2 ).

Rewrite

ηt = η1t + η2t + η3t ,
η1t = εt^2 − σ^2
η2t = θ1^2 (εt−1^2 − σ^2 )
η3t = 2θ1 εt εt−1

Step 1: MDS. Notice that the unconditional expectation of ηit is 0 for i = 1, 2, 3.
By independence of εt all ηit are MDS with respect to their respective information sets.

Step 2: check the condition for the WLLN. You can follow exactly the
same argument as in Step 2 above.

Step 3: Invoke the WLLN. Since the condition for the WLLN to hold is
satisfied for all three sums, they all converge to zero in probability, and the only term
that remains is σ^2 (1 + θ1^2 ).
Since we have established the probability limits of the numerator and denominator,
one can invoke Slutsky’s theorem to find that

φ̂1 →p θ1 / (1 + θ1^2 ).
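
A simulation sketch of this probability limit (θ1 and T are illustrative values): fitting an AR(1) by least squares to MA(1) data should give an estimate near θ1/(1 + θ1^2).

```python
import numpy as np

rng = np.random.default_rng(2)
theta1, T = 0.5, 200_000
eps = rng.normal(size=T + 1)
y = eps[1:] + theta1 * eps[:-1]

phi_hat = (y[1:] @ y[:-1]) / (y[:-1] @ y[:-1])
print(phi_hat, theta1 / (1 + theta1**2))   # both approximately 0.4
```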

10. Consider an AR(p) model

Yt = µ + Σ_{j=1}^{p} φj Yt−j + εt .

Suppose we are worried that εt = ρ1 εt−1 + ut , where ut is an i.i.d. process with zero
mean. Show that this would imply that Yt follows an AR(p + 1) process. Also, show
that testing whether ρ1 = 0 is equivalent to testing whether φp+1 = 0 in this AR(p + 1)
model.

Write

Yt−1 = µ + Σ_{j=1}^{p} φj Yt−1−j + εt−1 .

Now consider

Yt − ρ1 Yt−1 = µ(1 − ρ1 ) + φ1 Yt−1 + (φ2 − ρ1 φ1 )Yt−2 +
               . . . + (φp − ρ1 φp−1 )Yt−p − ρ1 φp Yt−p−1 + ut .

In other words, Yt follows an AR(p + 1) process:

Yt = µ(1 − ρ1 ) + (φ1 + ρ1 )Yt−1 + (φ2 − ρ1 φ1 )Yt−2 + . . . + (φp − ρ1 φp−1 )Yt−p − ρ1 φp Yt−p−1 + ut .

You can now test whether ρ1 = 0 by testing the coefficient on Yt−p−1 .

Exercises Dynamic Econometrics 2022

Week 5

1. (W5V1) Consider the AR(2) model

Yt = c + φ1 Yt−1 + φ2 Yt−2 + εt

where εt is i.i.d. N (0, σ^2 ).

(a) What is the optimal h-step ahead forecast of the AR(2) model given that you
know (c, φ1 , φ2 , σ 2 ) for h = 1, 2, 3.
We know that the optimal forecast is the conditional mean E[Yt+h |It ]. We have

Yt+1 = c + φ1 Yt + φ2 Yt−1 + εt+1


E[Yt+1 |It ] = c + φ1 Yt + φ2 Yt−1

For the 2-step ahead forecast

Yt+2 = c + φ1 Yt+1 + φ2 Yt + εt+2


= c + φ1 (c + φ1 Yt + φ2 Yt−1 + εt+1 ) + φ2 Yt + εt+2
E[Yt+2 |It ] = c(1 + φ1 ) + (φ21 + φ2 )Yt + φ1 φ2 Yt−1

For the 3-step ahead forecast

Yt+3 = c + φ1 Yt+2 + φ2 Yt+1 + εt+3


= c + φ1 (c + φ1 Yt+1 + φ2 Yt + εt+2 ) + φ2 (c + φ1 Yt
+ φ2 Yt−1 + εt+1 ) + εt+3
= c + φ1 (c + φ1 (c + φ1 Yt + φ2 Yt−1 + εt+1 ) + φ2 Yt + εt+2 )
+ φ2 (c + φ1 Yt + φ2 Yt−1 + εt+1 ) + εt+3
E[Yt+3 |It ] = c(1 + φ1 + φ2 + φ21 ) + φ1 (φ21 + 2φ2 )Yt + φ2 (φ21 + φ2 )Yt−1

(b) What is the MSE for the h-step ahead forecast for h = 1, 2, 3?

Define

et+h = Yt+h − E[Yt+h |It ].

We have for the MSE at horizon h = 1, 2, 3 the following:

MSEt (1) = E[et+1^2 ] = E[εt+1^2 ] = σ^2 ,
MSEt (2) = E[et+2^2 ] = E[φ1^2 εt+1^2 + 2φ1 εt+1 εt+2 + εt+2^2 ] = (1 + φ1^2 )σ^2 ,
MSEt (3) = E[et+3^2 ] = E[([φ1^2 + φ2 ]εt+1 + φ1 εt+2 + εt+3 )^2 ] = σ^2 ( 1 + φ1^2 + (φ1^2 + φ2 )^2 ).


(c) How would you construct a 95%-confidence interval for Yt+3 ?

Notice that et+3 = Yt+3 − Ŷt+3 = [φ1^2 + φ2 ]εt+1 + φ1 εt+2 + εt+3 . Because of the
i.i.d. normal assumption on εt , this is also normally distributed with mean zero
and variance equal to E[et+3^2 ], i.e. the MSEt (3). We therefore have

(Yt+3 − Ŷt+3 ) / √(E[et+3^2 ]) ~ N (0, 1).

You would then construct a 95%-confidence interval as

[ Ŷt+3 − 1.96 σ √(1 + φ1^2 + (φ1^2 + φ2 )^2 ) ,  Ŷt+3 + 1.96 σ √(1 + φ1^2 + (φ1^2 + φ2 )^2 ) ],

where we have used the expression for the MSE of Ŷt+3 obtained in the previous
subquestion.
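
A short sketch of the h = 1, 2, 3 forecasts and 95% intervals; the parameter values and the conditioning values of Yt and Yt−1 below are purely illustrative.

```python
import numpy as np

c, phi1, phi2, sigma = 0.2, 0.5, 0.3, 1.0
y_t, y_tm1 = 1.0, 0.8                       # hypothetical Y_t and Y_{t-1}

f1 = c + phi1 * y_t + phi2 * y_tm1
f2 = c * (1 + phi1) + (phi1**2 + phi2) * y_t + phi1 * phi2 * y_tm1
f3 = (c * (1 + phi1 + phi2 + phi1**2)
      + phi1 * (phi1**2 + 2 * phi2) * y_t
      + phi2 * (phi1**2 + phi2) * y_tm1)

mse = [sigma**2,
       sigma**2 * (1 + phi1**2),
       sigma**2 * (1 + phi1**2 + (phi1**2 + phi2)**2)]

for h, (f, m) in enumerate(zip([f1, f2, f3], mse), start=1):
    lo, hi = f - 1.96 * np.sqrt(m), f + 1.96 * np.sqrt(m)
    print(f"h={h}: forecast {f:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```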

2. (W5V1/V3) Suppose we have an AR(1) process Yt = φ1 Yt−1 + εt , and we are interested


in forecasting Yt+2 . Suppose εt is an i.i.d. sequence with E[ε2t ] = σ 2 and E[ε4t ] ≤ M for
some finite M . You may assume that E[Yt4 ] ≤ M for some finite M for all t.
(a) What is the optimal forecast for Yt+2 conditional on It = {Yt , Yt−1 , . . .} and
knowing φ1 ?
The optimal forecast is again the conditional mean

E[Yt+2 |It ] = E[φ1 Yt+1 + εt+2 |It ]


= E[φ21 Yt + φ1 εt+1 + εt+2 |It ]
= φ21 Yt

(b) We need to forecast YT+2 . Consider the estimator

φ̂1 = Σ_{t=2}^{T} Yt Yt−1 / Σ_{t=2}^{T} Yt−1^2 .

We now construct a forecast as Ŷt+2 = φ̂1^2 Yt . Show that φ̂1^2 →p φ1^2 .
The fact that φ̂1 →p φ1 was proven last week. Since φ̂1^2 is a continuous function of
φ̂1 , the result follows from the continuous mapping theorem.

(c) Now instead of using an iterated forecast, we try to relate Yt and Yt+2 by pretending
that the process is Yt = φ2 Yt−2 + ut . Say we estimate φ2 by least squares, i.e.

φ̂2 = Σ_{t=3}^{T} Yt Yt−2 / Σ_{t=3}^{T} Yt−2^2 .

We now construct a forecast as Ŷt+2 = φ̂2 Yt . Show that φ̂2 →p φ1^2 . You may
assume that (1/(T − 3)) Σ_{t=3}^{T} Yt−2^2 →p γ0 < ∞.

Note that

Yt = φ1^2 Yt−2 + εt + φ1 εt−1 .

Substituting this expression for Yt into the estimator, we see that

φ̂2 = φ1^2 + [ (1/(T − 2)) Σ_{t=3}^{T} εt Yt−2 ] / [ (1/(T − 2)) Σ_{t=3}^{T} Yt−2^2 ]
          + φ1 [ (1/(T − 2)) Σ_{t=3}^{T} εt−1 Yt−2 ] / [ (1/(T − 2)) Σ_{t=3}^{T} Yt−2^2 ].

Define It = {εt , Yt−1 , εt−1 , Yt−2 , . . .}. Since εt is i.i.d. and Yt−2 only depends on
εt−2 , εt−3 , . . ., we have

E[εt Yt−2 |It−1 ] = Yt−2 E[εt |It−1 ] = Yt−2 E[εt ] = 0.

Also,

E[|εt Yt−2 |] ≤ √(E[εt^2 ]E[Yt−2^2 ]) < ∞.

Hence, ηt = εt Yt−2 is an MDS with respect to It . Also,

E[ηt^2 ] = E[εt^2 Yt−2^2 ] ≤ √(E[εt^4 ]E[Yt−2^4 ]) ≤ M < ∞.

We can now invoke a law of large numbers for MDS sequences to show that
(1/(T − 2)) Σ_{t=3}^{T} εt Yt−2 →p 0. A very similar argument shows that (1/(T − 2)) Σ_{t=3}^{T} εt−1 Yt−2 →p 0.
Since in the expression for φ̂2 , the denominator (1/(T − 2)) Σ_{t=3}^{T} Yt−2^2 for both terms
converges in probability to γ0 < ∞, it follows from Slutsky’s theorem (the version
where everything converges in probability, see last week’s exercises) that φ̂2 →p φ1^2 .

3. (W5V2) Suppose Yt = εt + θ1 εt−1 . Use the truncated AR(∞) representation to find


E[Yt+1 |It ]. What condition on θ1 do you need?

Write

Yt = (1 + θ1 L)εt .

We have shown before that when |θ1 | < 1, the MA(1) process is invertible and

(1 − θ1 L + θ1^2 L^2 − θ1^3 L^3 + . . .)Yt = εt .

This is equivalent to

Yt = Σ_{j=1}^{∞} (−1)^{j+1} θ1^j Yt−j + εt .

Increase the time index to t + 1, truncate the sum at j = t, and take the expectation
conditional on It ; we have

Ŷt+1 = Σ_{j=1}^{t} (−1)^{j+1} θ1^j Yt+1−j .

4. (W5V1/V3) Suppose Yt = φ1 Yt−1 + φ2 Yt−2 + εt , and assume Yt is stable and covariance
stationary. We observe data {y1 , . . . , yT }. By mistake, a researcher thinks that Yt
follows an AR(1) model, and considers the estimator

φ̂1 = Σ_{t=2}^{T} Yt Yt−1 / Σ_{t=2}^{T} Yt−1^2 .

(a) Suppose you may assume that φ̂1 →p E[Yt Yt−1 ] / E[Yt−1^2 ]. Calculate the (necessary)
expectations.

E[Yt Yt−1 ] = φ1 E[Yt−1^2 ] + φ2 E[Yt−1 Yt−2 ].

By covariance stationarity,

E[Yt Yt−1 ] = (φ1 / (1 − φ2 )) E[Yt−1^2 ].

This shows that φ̂1 →p φ1 / (1 − φ2 ).

(b) Suppose we construct a one-step ahead forecast as ŶT+1 = (φ1 / (1 − φ2 )) YT . Show that the
MSE of this forecast is at least as high as the MSE when forecasting using the
conditional mean of YT +1 .

The MSE is given by

E[(YT +1 − (φ1 / (1 − φ2 )) YT )^2 ] = E[ ( εT +1 − (φ1 φ2 / (1 − φ2 )) YT + φ2 YT −1 )^2 ]
                                   = σ^2 + E[ ( −(φ1 φ2 / (1 − φ2 )) YT + φ2 YT −1 )^2 ]
                                   ≥ σ^2 .

(c) Now suppose we need to make a two-step ahead forecast. The researcher is still
using his AR(1) model. Show that his iterated forecast is ŶT+2^I = (φ1 / (1 − φ2 ))^2 YT .
If the researcher thinks the model is an AR(1) model, then according to him, the
optimal forecast is

ŶT +2 = φ̃1^2 YT ,

where he will substitute his estimate for φ1 for φ̃1 , i.e. φ̃1 = φ1 / (1 − φ2 ).

(d) Alternatively, the researcher can consider a direct forecast ŶT+2^D = φ̂D YT where

φ̂D = Σ_{t=3}^{T} Yt Yt−2 / Σ_{t=3}^{T} Yt−2^2 .

Show that φ̂D →p φ1^2 / (1 − φ2 ) + φ2 . Hint: use the result from (a), and use that

Σ_{t=3}^{T} εt Yt−2 / Σ_{t=3}^{T} Yt−2^2 →p 0.

To get the direct forecast, estimate

φ̂D = Σ_{t=3}^{T} yt yt−2 / Σ_{t=3}^{T} yt−2^2 .

Use that Yt = φ1 Yt−1 + φ2 Yt−2 + εt , then

φ̂D = φ2 + φ1 Σ_{t=3}^{T} Yt−1 Yt−2 / Σ_{t=3}^{T} Yt−2^2 + Σ_{t=3}^{T} εt Yt−2 / Σ_{t=3}^{T} Yt−2^2
    →p φ2 + φ1 · φ1 / (1 − φ2 ).

From question (a), and noting that asymptotically it does not matter whether we
start the sums at t = 2 or t = 3, we then have the desired result.

5. (W5V2) Consider the ARDL(1,1) model Yt = φ1 Yt−1 + β1 Xt + εt , with E[εt ] = 0, and
E[ε2t ] = σ 2 . Suppose that both Xt and Yt are CS.

(a) What is the optimal h = 1 step ahead forecast if Xt+1 is known?


This is the conditional mean of Yt+1 , i.e. Ŷt+1 = φ1 Yt + β1 Xt+1 .

(b) What is the MSE of this forecast?


The MSE is given by

E[e2t+1 ] = E[(Yt+1 − Ŷt+1 )2 ] = σ 2

(c) Suppose now that Xt+1 is not known. You consider a direct approach, i.e. you
forecast
Ŷt+1 = φ1 Yt + β̃1 Xt .
Write down the forecast error.
We now have

et+1 = Yt+1 − Ŷt+1


= β1 Xt+1 − β̃1 Xt + εt+1

(d) Write down the MSE in terms of the variance and first-order autocovariance of
Xt .
The MSE is

E[et+1^2 ] = β1^2 E[Xt+1^2 ] − 2β1 β̃1 E[Xt Xt+1 ] + β̃1^2 E[Xt^2 ] + σ^2
          = σ^2 + (β1^2 + β̃1^2 ) E[Xt^2 ] − 2β1 β̃1 E[Xt Xt+1 ].

(e) Suppose that Xt follows an AR(1) process, i.e. Xt = ρ1 Xt−1 + εt^X , where εt^X is
white noise with variance σ_X^2 . Find the value of β̃1 that minimizes the MSE.

Under the AR(1) process, we have

E[Xt^2 ] = σ_X^2 / (1 − ρ1^2 ),
E[Xt Xt−1 ] = ρ1 σ_X^2 / (1 − ρ1^2 ).

Hence,

E[et+1^2 ] = σ^2 + (β1^2 + β̃1^2 ) E[Xt^2 ] − 2β1 β̃1 E[Xt Xt+1 ]
           = σ^2 + E[Xt^2 ] ( β1^2 + β̃1^2 − 2ρ1 β1 β̃1 ).

Minimizing this with respect to β̃1 gives β̃1* = β1 ρ1 .

(f) Find the MSE corresponding to the optimal value of β̃1 . Does the value of ρ1 matter?
Substituting the optimal value of β̃1 into the MSE, we have

E[et+1^2 ] = σ^2 + β1^2 (1 − ρ1^2 ) σ_X^2 / (1 − ρ1^2 ) = σ^2 + σ_X^2 β1^2 .

The value of ρ1 turns out not to matter.

6. (W5V4) You observe two sequences of forecasts, that satisfy

(1/T) Σ_{t=1}^{T} |et+1,1 | = 4,   (1/T) Σ_{t=1}^{T} |et+1,2 | = . . . ,
(1/T) Σ_{t=1}^{T} et+1,1^2 = 16,   (1/T) Σ_{t=1}^{T} et+1,2^2 = 4.    (1)

(a) Suppose γLR = 1. What would be your conclusion when testing the null hypothesis
of equal predictive accuracy when the loss function is mean squared error?
Assume you are testing at a significance level of 5%.
The Diebold-Mariano test statistic would be

DM = (16 − 4)/1 = 12 > 1.96,    (2)

and you would reject at any reasonable significance level.

(b) What would be your conclusion when the loss function is based on mean absolute
error?
Although the MAE is not given for model 2, we know that

(1/T) Σ_{t=1}^{T} |et+1,2 | ≤ √( (1/T) Σ_{t=1}^{T} et+1,2^2 ) = 2.    (3)

So we know that

DM ≥ (4 − 2)/1 = 2 > 1.96,    (4)

so we would reject the null hypothesis at the 5% level.

(c) Suppose T = 80, you estimate the long run variance using the Bartlett kernel
with ℓT = ⌊4(T/100)^{2/9}⌋, and accidentally the estimated autocovariance function
is γ̂j = 1/(j + 1)^2 . Would you reject the null hypothesis of equal predictive
accuracy based on the squared error loss?
We would estimate

γ̂LR = γ̂0 + 2 [(2/3)γ̂1 + (1/3)γ̂2 ] = 1 + (1/3) + (2/27) = 38/27 ≈ 1.41.    (5)

This would result in the test statistic

DM = (16 − 4)/√(38/27) = 10.12 > 1.96.    (6)

Yes, you would reject the null of equal predictive accuracy.
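
A sketch of the calculation in (c), following the bandwidth and the Bartlett weights (2/3 and 1/3) used in the answer above.

```python
import numpy as np

T = 80
lT = int(np.floor(4 * (T / 100) ** (2 / 9)))                 # = 3
gamma_hat = np.array([1 / (j + 1) ** 2 for j in range(lT)])  # gamma_0, gamma_1, gamma_2
weights = 1 - np.arange(1, lT) / lT                          # Bartlett weights 2/3, 1/3
gamma_LR = gamma_hat[0] + 2 * np.sum(weights * gamma_hat[1:])
print(gamma_LR)                                              # 38/27 = 1.407...

dm = (16 - 4) / np.sqrt(gamma_LR)
print(dm, dm > 1.96)                                         # about 10.12, True
```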

(d) Does it change the outcome of the test if we would know for sure that γj =
1/(j + 1)2 ?

The long run variance is

γLR = γ0 + 2 Σ_{j=1}^{∞} γj
    = −γ0 + 2 Σ_{j=0}^{∞} γj
    = −1 + 2 Σ_{j=1}^{∞} 1/j^2     (7)
    = −1 + 2 π^2/6
    = 2.2899.

The DM statistic is

DM = (16 − 4)/√2.2899 = 7.93 > 1.96.    (8)

Using the population long run variance instead of the estimate does not change the outcome
of the test.

7. (W5V5) Suppose Yt = c + φ1 Yt−1 + εt , where |φ1 | < 1. You consider two forecasts, one
based on the unconditional mean of Yt , and one that assumes Yt follows a random walk.
Specifically, we have Yt+1,1 = c/(1 − φ1 ) and Yt+1,2 = Yt . Show that a forecast combination
Yt+1,C = ωYt+1,1 + (1 − ω)Yt+1,2 with |ω| < 1 exists that has the same MSE as the
optimal forecast, i.e. the conditional mean Yt+1 = c + φ1 Yt .

Yt+1,C = ω c/(1 − φ1 ) + (1 − ω)Yt .

First, set the intercept equal to that of the optimal forecast, i.e. c = ω c/(1 − φ1 ). Solving
gives ω = 1 − φ1 . This also implies that 1 − ω = φ1 . Hence, with ω = 1 − φ1 , we get

Yt+1,C = c + φ1 Yt ,

which is the optimal forecast itself. This is quite nice: using two misspecified models,
forecast combination can nevertheless help you to achieve the optimal MSE (disregard-
ing parameter uncertainty).

Exercises Dynamic Econometrics 2022

Week 6

1. (W6V1) Suppose that Yt is an AR(1) process with a structural break in the intercept,
so

Yt = ct + φ1 Yt−1 + εt ,    (1)

where εt ∼ WN(0, σ^2 ) and

ct = c1 · I[t < Tb ] + c2 · I[t ≥ Tb ].    (2)

Suppose Yt is initialized at t = 0 with E[Y0 ] = c1 /(1 − φ1 ).
To answer the questions below, the finite geometric series can be useful:

Σ_{i=0}^{t} φ1^i = (1 − φ1^{t+1}) / (1 − φ1 ).

(a) Show that for t < Tb

Et [Yt ] = c1 / (1 − φ1 ).

For t < Tb the process is simply an AR(1) process, so we can iterate it backwards
until Y0 :

Yt = c1 + φ1 Yt−1 + εt
   = c1 Σ_{i=0}^{t−1} φ1^i + φ1^t Y0 + Σ_{i=0}^{t−1} φ1^i εt−i .

Taking the expectation, we get

Et [Yt ] = c1 (1 − φ1^t ) / (1 − φ1 ) + φ1^t c1 / (1 − φ1 )
        = c1 / (1 − φ1 ).

(b) Show that for t ≥ Tb

Et [Yt ] = (c2 / (1 − φ1 )) (1 − φ1^{t−Tb+1} ) + (c1 / (1 − φ1 )) φ1^{t−Tb+1} .

For t ≥ Tb we need to be a bit more careful when iterating backwards,

Yt = c2 + φ1 Yt−1 + εt
   = c2 Σ_{i=0}^{t−Tb} φ1^i + c1 Σ_{i=t−Tb+1}^{t−1} φ1^i + φ1^t Y0 + Σ_{i=0}^{t−1} φ1^i εt−i .

To get the indices right, a nice check is to take t = Tb . In this case, there should
only be one term involving c2 (so the upper limit on the first sum is correct in
that regard) and t − 1 terms involving c1 . Taking the expectation, we get

Et [Yt ] = c2 (1 − φ1^{t−Tb+1} ) / (1 − φ1 ) + c1 φ1^{t−Tb+1} (1 − φ1^{Tb−1} ) / (1 − φ1 ) + φ1^t c1 / (1 − φ1 )
        = (c2 / (1 − φ1 )) (1 − φ1^{t−Tb+1} ) + (c1 / (1 − φ1 )) φ1^{t−Tb+1} .

Beautiful!

2. (W6V1) Suppose that Yt = ct + φ1 Yt−1 + εt , where εt is i.i.d. with finite fourth moment
and where

ct = 0 if t < Tb ,   ct = c otherwise.

Consider the estimator

φ̂1 = [ (1/(T − 1)) Σ_{t=2}^{T} Yt Yt−1 ] / [ (1/(T − 1)) Σ_{t=2}^{T} Yt−1^2 ].

We’re going to argue somewhat loosely that as c → ∞, we get φ̂1 → 1 as long as Tb /T
is some fixed fraction (as T goes to infinity). This argument is important in practice,
because it shows that if we ignore structural breaks, we might (erroneously) conclude
that there are unit roots in the data (and first-difference when we should not).

(a) Show that

φ̂1 = φ1 + c · [ (1/(T − 1)) Σ_{t=Tb}^{T} Yt−1 ] / [ (1/(T − 1)) Σ_{t=2}^{T} Yt−1^2 ]
         + [ (1/(T − 1)) Σ_{t=2}^{T} εt Yt−1 ] / [ (1/(T − 1)) Σ_{t=2}^{T} Yt−1^2 ].    (3)

This follows by substituting the given DGP and using that ct = c for t ≥ Tb .

(b) Assume that (1/(T − 1)) Σ_{t=2}^{T} Yt−1^2 →p a for some constant a. Argue that the last term
is op (1).
The argument is the same as we have seen in Week 4. Since εt is i.i.d. with
finite fourth moment and Yt−1 only depends on {εt−1 , εt−2 , . . .}, the numerator
converges in probability to 0 by the weak law of large numbers for martingale
difference sequences (if you are asked to show something like this on the exam,
you need to go over all the steps, unless it’s explicitly indicated that an informal
argument suffices). Since the denominator converges in probability to a constant,
the last term converges to zero in probability by the continuous mapping theorem.

(c) The results in Question 1 show that it takes some time for E[Yt−1 ] to transition to
c/(1 − φ1 ) after the break occurred. A similar effect is observed for E[Yt−1^2 ]. However,
if both Tb and T − Tb are sufficiently large (so as T → ∞, the ratio Tb /T → η
for some fixed fraction η), and φ1 is not too close to 1, we can safely ignore this
transitioning phase when calculating the probability limit of the second term in
equation (3).
What would you then argue is the probability limit of the numerator? What
about the denominator?
We expect the numerator to converge to its expectation. If we ignore the transitioning
phase, for the numerator the expectation is approximately (1 − η) c/(1 − φ1 ).
For the denominator, the expectation is γ0 = σ^2/(1 − φ1^2 ) for pre-break observations
and γ0 + µ^2 = σ^2/(1 − φ1^2 ) + c^2/(1 − φ1 )^2 for post-break observations. We expect that the
denominator is converging to

σ^2/(1 − φ1^2 ) + (1 − η) c^2/(1 − φ1 )^2 .

(d) What happens to φ̂1 as c increases?

We have argued that

φ̂1 →p φ1 + (1 − φ1 ) · [ (1 − η) c^2/(1 − φ1 )^2 ] / [ σ^2/(1 − φ1^2 ) + (1 − η) c^2/(1 − φ1 )^2 ] ≈ φ1 + (1 − φ1 ) = 1,

where the approximation holds if c is sufficiently large.
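
A small simulation sketch of this effect (the break size, break date, T and φ1 are illustrative values): with a large intercept break, the least-squares AR(1) estimate ends up near one even though φ1 = 0.5.

```python
import numpy as np

rng = np.random.default_rng(3)
T, Tb, phi1, c_big = 2000, 1000, 0.5, 20.0
y = np.zeros(T)
for t in range(1, T):
    c_t = 0.0 if t < Tb else c_big
    y[t] = c_t + phi1 * y[t - 1] + rng.normal()

phi_hat = (y[1:] @ y[:-1]) / (y[:-1] @ y[:-1])
print(phi_hat)      # close to 1, not to 0.5
```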

(e) What would be the outcome of an (augmented) Dickey Fuller test if one ignores
(large) structural breaks?

The test would not be able to reject the null of a unit root.

3. (W6V2) Consider an AR(1) process (with intercept equal to zero) with an innovation
outlier at the end of the sample. Calculate the mean squared forecast error
E[(Yt+1 − Ŷt+1 )^2 ] when Ŷt+1 is

(a) Ŷt+1 = φ1 Yt*

(b) Ŷt+1 = φ1^2 Yt−1*

Since the outlier is an innovation outlier, we have Yt+1 = φ1 Yt* + εt+1 = φ1 (Yt + ζ) + εt+1 .
We also have Yt* = Yt + ζ and Yt−1* = Yt−1 . Then,

E[(Yt+1 − φ1 Yt* )^2 ] = E[εt+1^2 ] = σ^2 .

For the second forecasting strategy, we have

E[(Yt+1 − φ1^2 Yt−1* )^2 ] = E[(φ1 ζ + φ1^2 Yt−1 + εt+1 + φ1 εt − φ1^2 Yt−1 )^2 ]
                          = φ1^2 ζ^2 + σ^2 (1 + φ1^2 ).

4. (W6V2) Consider an AR(1) process (with intercept equal to zero) with an additive
outlier at time t. Calculate the mean squared forecast error E[(Yt+1 − Ŷt+1 )^2 ] when
Ŷt+1 is

(a) Ŷt+1 = φ1 Yt*

(b) Ŷt+1 = φ1^2 Yt−1*

Since the outlier is additive, we have Yt+1 = φ1 Yt + εt+1 . We also have Yt* = Yt + ζ
and Yt−1* = Yt−1 . Then,

E[(Yt+1 − φ1 Yt* )^2 ] = E[(Yt+1 − φ1 (Yt + ζ))^2 ]
                      = E[(φ1 Yt + εt+1 − φ1 Yt − φ1 ζ)^2 ]
                      = σ^2 + φ1^2 ζ^2 .

For the second forecasting strategy, we have

E[(Yt+1 − φ1^2 Yt−1* )^2 ] = E[(Yt+1 − φ1^2 Yt−1 )^2 ]
                          = E[(φ1^2 Yt−1 + εt+1 + φ1 εt − φ1^2 Yt−1 )^2 ]
                          = σ^2 (1 + φ1^2 ).

(c) When does an outlier become ‘harmful’ in the sense that one is better off using
the second, rather than the first forecast above?
Equating the two MSEs, we get that you prefer the first forecast if

σ^2 + φ1^2 ζ^2 ≤ σ^2 (1 + φ1^2 ).

After some rewriting,

ζ^2 ≤ σ^2 .

So even when the outlier is relatively small (slightly above one standard deviation
of the noise), we already prefer the second forecast (based on Yt−1 ).

5. (W6V3) Consider the following two processes

Yt = c + Yt−1 + εt ,
Xt = δ · t + εt ,

where εt ∼ WN(0, σ^2 ). Assume that c ≠ 0 and δ ≠ 0. The first process is a unit root
process, which is also called a process with a stochastic trend. The second process has
a deterministic trend.

(a) Show that these processes are not covariance stationary.

Solution: For the unit root process, we have seen in the lecture slides for
week 1 that Var(Yt ) = σ^2 · t (note: you need to show this). This depends on
t and hence the process cannot be covariance stationary.
For the deterministic trend process, we even have that E[Xt ] = δ · t, which again
depends on t.

(b) Suggest a transformation for both processes such that the resulting process is
stationary. Reflect on the differences in the autocovariances.

Solution: If we first difference the unit root process, we get ∆Yt = c + εt .
This is a white noise process with an intercept c. You can calculate the
autocovariance function and show that this does not depend on t. In fact, all
autocovariances are zero (except for the population variance).
If we first difference the trend stationary process we get ∆Xt = δ + εt − εt−1 .
This is a (non-invertible) MA(1) process. You can calculate the autocovariance
function to show that the process is covariance stationary. Only the
first autocovariance is nonzero. So both processes are stationary, but there is
autocovariance in ∆Xt while not in ∆Yt .

6. (W6V3) Suppose that Yt = δ · t^2 + εt , where εt ∼ WN(0, σ^2 ). However, a researcher
thinks that the model is Yt = δ · t + εt . Suppose that the researcher estimates δ by
least squares based on the linear trend model (while the actual data has a quadratic
trend). Denote the estimator by δ̂.

(a) Show that δ̂/T →p (3/4)δ. You can use that Σ_{t=1}^{T} t = T(T + 1)/2,
Σ_{t=1}^{T} t^2 = T(T + 1)(2T + 1)/6 and Σ_{t=1}^{T} t^3 = T^2(T + 1)^2/4.

Solution: Let xt = t. Then Σ_{t=1}^{T} xt^2 = Σ_{t=1}^{T} t^2 = T(T + 1)(2T + 1)/6. Also,
Σ_{t=1}^{T} xt Yt = δ Σ_{t=1}^{T} t^3 + Σ_{t=1}^{T} t εt = δ T^2(T + 1)^2/4 + Σ_{t=1}^{T} t εt . As such, the OLS
estimator is

δ̂ = (3/2) [T^2 (T + 1)^2 / (T(T + 1)(2T + 1))] δ + [6 / (T(T + 1)(2T + 1))] Σ_{t=1}^{T} t εt .

Notice that the variance of the summation in the second term is Var(Σ_{t=1}^{T} t εt ) =
σ^2 Σ_{t=1}^{T} t^2 , so that the variance of the second term is 6σ^2 / (T(T + 1)(2T + 1)). Hence, by
Chebyshev’s inequality, this term converges in probability to zero. For the
first term, only the leading order terms matter. In the numerator, the leading
term is T^4 . In the denominator, the leading term is 2T^3 . It follows that
δ̂/T →p (3/4)δ.

(b) Calculate the MSE when forecasting h steps ahead with the misspecified model
and using δ̂/T = (3/4)δ.

Solution: The researcher thinks that Y_{T+h} = δ(T+h) + ε_{T+h} and will forecast
Ŷ_{T+h} = δ̂(T+h). This gives Ŷ_{T+h} = (3/4)δT(T+h). Since the true process is
Y_{T+h} = δ(T+h)^2 + ε_{T+h}, the MSE is

E[(Ŷ_{T+h} − Y_{T+h})^2] = σ^2 + ((3/4)δT(T+h) − δ(T+h)^2)^2 = σ^2 + (δ^2/16)(T+h)^2 (T+4h)^2.

Note that the MSE grows at rate T^4.
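
A small simulation (illustrative δ and σ; not part of the original solution) confirms the limit δ̂/T →_p (3/4)δ by fitting the misspecified linear trend by OLS to data generated from the quadratic trend:

import numpy as np

# Fit Y_t = delta*t + error by OLS (no intercept) to quadratic-trend data.
rng = np.random.default_rng(2)
delta, sigma = 0.3, 1.0

for T in (100, 1000, 10000):
    t = np.arange(1, T + 1, dtype=float)
    y = delta * t**2 + rng.normal(0.0, sigma, T)
    delta_hat = (t @ y) / (t @ t)           # OLS slope without intercept
    print(T, delta_hat / T, 0.75 * delta)   # delta_hat / T approaches (3/4)*delta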

Exercises Dynamic Econometrics 2022

Week 7

1. (W7V1) Consider a VMA(1) model

Y t = c + εt + Ψ1 εt−1 ,

where εt follows a (vector-valued) white noise process with covariance matrix Σ. Cal-
culate the population mean, variance matrix and autocovariance matrix.
The population mean is obtained by applying the expectation operator left and right
and noting that E[εt ] = 0 by the definition of a WN process. We obtain E[Y t ] = c.
The population variance is

E[(Y_t − c)(Y_t − c)′] = Σ + Ψ_1 Σ Ψ_1′

The population autocovariance matrix Γ_1 is

E[(Y_t − c)(Y_{t−1} − c)′] = Ψ_1 Σ

Note that Γ_{−1} = Γ_1′, since

E[(Y_{t−1} − c)(Y_t − c)′] = Σ Ψ_1′

The higher order autocovariance matrices are O.
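
A short simulation sketch (the values of c, Ψ_1 and Σ below are illustrative choices, not taken from the exercise) that checks these moment formulas numerically:

import numpy as np

# Simulate a bivariate VMA(1) and compare sample moments with the formulas above.
rng = np.random.default_rng(3)
T = 500_000
c = np.array([1.0, -0.5])
Psi1 = np.array([[0.4, 0.2], [-0.1, 0.3]])
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])

eps = rng.multivariate_normal(np.zeros(2), Sigma, size=T + 1)
y = c + eps[1:] + eps[:-1] @ Psi1.T           # Y_t = c + eps_t + Psi1 eps_{t-1}

yc = y - y.mean(axis=0)
Gamma0_hat = yc.T @ yc / T                    # sample variance matrix
Gamma1_hat = yc[1:].T @ yc[:-1] / (T - 1)     # sample E[(Y_t - mu)(Y_{t-1} - mu)']

print(np.round(Gamma0_hat, 3), np.round(Sigma + Psi1 @ Sigma @ Psi1.T, 3))
print(np.round(Gamma1_hat, 3), np.round(Psi1 @ Sigma, 3))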

2. (W7V2) Consider the VAR(1) model

Y t = c + Φ1 Y t−1 + εt

where ε follows a (vector valued) white noise process with covariance matrix Σ. Show
that this can be written as

Y_t = µ + ∑_{j=0}^∞ Ψ_j ε_{t−j}

and determine µ and Ψj in terms of c and Φ1 .

Write the VAR(1) process in lag polynomial notation, multiply from the left with an
infinite MA lag polynomial, and then match the coefficients to get the desired result:

(Ψ_0 + Ψ_1 L + Ψ_2 L^2 + . . .)(I − Φ_1 L)Y_t = (Ψ_0 + Ψ_1 L + Ψ_2 L^2 + . . .)(c + ε_t)

Matching the coefficients so that the product of the two lag polynomials equals the
identity gives Ψ_0 = I and Ψ_j = Φ_1^j. What remains is to determine µ, which we get as
follows:

µ = (∑_{j=0}^∞ Φ_1^j) c = (I − Φ_1)^{−1} c
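
The following sketch (Φ_1 and c are illustrative, with Φ_1 having eigenvalues inside the unit circle) illustrates this representation numerically: iterating the VAR(1) recursion from zero and evaluating the truncated MA(∞) sum give essentially the same value, and µ = (I − Φ_1)^{−1}c.

import numpy as np

# Compare the VAR(1) recursion with its truncated VMA(infinity) representation.
c = np.array([1.0, 2.0])
Phi1 = np.array([[0.5, 0.1], [0.2, 0.4]])
mu = np.linalg.solve(np.eye(2) - Phi1, c)          # mu = (I - Phi1)^(-1) c

rng = np.random.default_rng(4)
eps = rng.normal(size=(400, 2))

y = np.zeros(2)                                    # iterate Y_t = c + Phi1 Y_{t-1} + eps_t
for e in eps:
    y = c + Phi1 @ y + e

y_ma = mu.copy()                                   # mu + sum_j Phi1^j eps_{t-j}, truncated
for j, e in enumerate(eps[::-1]):
    y_ma = y_ma + np.linalg.matrix_power(Phi1, j) @ e

print(mu)
print(y, y_ma)   # nearly identical; the truncation error is of order Phi1^400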

3. (W7V2) Using the VMA(∞) representation from the previous question, show that for
the VAR(1) model, Γ_k = Φ_1 Γ_{k−1} for k = 1, 2, . . ..

Γ_k = E[(Y_t − µ)(Y_{t−k} − µ)′] = E[ ∑_{j=0}^∞ ∑_{l=0}^∞ Ψ_j ε_{t−j} ε_{t−k−l}′ Ψ_l′ ]

The only nonzero expectations are those where j = k + l, so

Γ_k = ∑_{l=0}^∞ Ψ_{k+l} Σ Ψ_l′
    = ∑_{l=0}^∞ Φ_1^{k+l} Σ (Φ_1′)^l
    = Φ_1 ∑_{l=0}^∞ Φ_1^{k+l−1} Σ (Φ_1′)^l
    = Φ_1 Γ_{k−1}

4. (W7V2) Show how to obtain γj for AR(2) using the companion form.

The companion form of an AR(2) model is (writing [a, b; c, d] for the 2 × 2 matrix with
rows (a, b) and (c, d))

[Y_t; Y_{t−1}] = [φ_1, φ_2; 1, 0] [Y_{t−1}; Y_{t−2}] + [ε_t; 0]

To obtain γ_j for j ≥ 1, multiply from the right with (Y_{t−j}, Y_{t−j−1}) and take expectations.
Notice that all expectations involving ε_t are then zero. This gives

[γ_j, γ_{j+1}; γ_{j−1}, γ_j] = [φ_1, φ_2; 1, 0] [γ_{j−1}, γ_j; γ_{j−2}, γ_{j−1}]
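
A numerical sketch of this recipe (φ_1, φ_2 and σ^2 are illustrative values for a stationary AR(2)): solve the companion-form variance equation for Γ_{c,0} and read off γ_j as the (1,1) element of F^j Γ_{c,0}.

import numpy as np

# Autocovariances of an AR(2) via the companion form.
phi1, phi2, sigma2 = 0.5, 0.3, 1.0
F = np.array([[phi1, phi2], [1.0, 0.0]])        # companion matrix
Q = np.array([[sigma2, 0.0], [0.0, 0.0]])       # covariance of (eps_t, 0)'

# Gamma_c0 solves Gamma_c0 = F Gamma_c0 F' + Q, i.e. vec(Gamma_c0) = (I - F kron F)^(-1) vec(Q).
vecG = np.linalg.solve(np.eye(4) - np.kron(F, F), Q.flatten())
Gamma_c0 = vecG.reshape(2, 2)

# gamma_j is the (1,1) element of F^j Gamma_c0.
gammas = [(np.linalg.matrix_power(F, j) @ Gamma_c0)[0, 0] for j in range(5)]
print(np.round(gammas, 4))

# Yule-Walker check: gamma_j = phi1*gamma_{j-1} + phi2*gamma_{j-2} for j >= 2.
print(np.allclose(gammas[2], phi1 * gammas[1] + phi2 * gammas[0]))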

5. (W7V2) Consider the m-dimensional covariance stationary VAR(2) process

Y t = c + Φ1 Y t−1 + Φ2 Y t−2 + εt

where εt follows a (vector-valued) white noise process with mean zero and covariance
matrix Σ.
(a) Calculate the unconditional mean of Y t .
Taking expectations and invoking covariance stationarity, we have

E[Y t ] = c + Φ1 E[Y t ] + Φ2 E[Y t ]

Solving for E[Y t ] gives

µ = E[Y t ] = (I − Φ1 − Φ2 )−1 c

(b) Show that the unconditional variance of Y_t satisfies Γ_0 = Φ_1 Γ_0 Φ_1′ + Φ_2 Γ_0 Φ_2′ +
Φ_1 Γ_1 Φ_2′ + Φ_2 Γ_1′ Φ_1′ + Σ.
Writing c = (I − Φ_1 − Φ_2)µ and subtracting µ from the left and right hand side
of the expression for Y_t given in the question, we get

Y_t − µ = (I − Φ_1 − Φ_2)µ − µ + Φ_1 Y_{t−1} + Φ_2 Y_{t−2} + ε_t
        = Φ_1 (Y_{t−1} − µ) + Φ_2 (Y_{t−2} − µ) + ε_t

The result follows from

E[(Y_t − µ)(Y_t − µ)′] = E[(Φ_1 (Y_{t−1} − µ) + Φ_2 (Y_{t−2} − µ) + ε_t)
                           (Φ_1 (Y_{t−1} − µ) + Φ_2 (Y_{t−2} − µ) + ε_t)′]
                       = Φ_1 Γ_0 Φ_1′ + Φ_2 Γ_0 Φ_2′ + Σ + Φ_1 Γ_1 Φ_2′ + Φ_2 Γ_1′ Φ_1′.

Make sure to get all the transposes correct, as Γ_1′ ≠ Γ_1.

(c) Write down the VAR(2) model in companion form.
The VAR(2) in companion form is

[Y_t − µ; Y_{t−1} − µ] = [Φ_1, Φ_2; I, O] [Y_{t−1} − µ; Y_{t−2} − µ] + [ε_t; 0]

(d) From Exercise 3, you know that Γ_{c,k} = Φ_{c,1} Γ_{c,k−1}. Use this to show that Γ_1 =
Φ_1 Γ_0 + Φ_2 Γ_1′.
Multiplying the companion form from the right by (Y_{t−1}′ − µ′, Y_{t−2}′ − µ′) and
taking expectations, we get

[Γ_1, Γ_2; Γ_0, Γ_1] = [Φ_1, Φ_2; I, O] [Γ_0, Γ_1; Γ_1′, Γ_0]

Looking at the upper left block, we see that

Γ_1 = Φ_1 Γ_0 + Φ_2 Γ_1′
Γ1 = Φ1 Γ0 + Φ2 Γ01

6. (W7V3) Consider the following VAR(2)

Y_t = c + [.5, .1; .4, .5] Y_{t−1} + [0, 0; .25, 0] Y_{t−2} + ε_t ,    Σ = [.09, 0; 0, .04].

(a) Calculate the first two values of the IRF.

The first two values of the IRF are:

∂Y_t/∂ε_t′ = Ψ_0 = I_2 ,    ∂Y_t/∂ε_{t−1}′ = Ψ_1 = [.5, .1; .4, .5].    (1)

(b) Calculate the first two values of the orthogonalized IRF.
Let us find P, the lower triangular Cholesky factor of Σ:

P = [.3, 0; 0, .2].

The first two elements of the OIRF:

∂Y_t/∂u_t′ = Ψ_0 P = [.3, 0; 0, .2] ,    ∂Y_t/∂u_{t−1}′ = Ψ_1 P = [.15, .02; .12, .10].    (2)

Even though the matrix Σ is diagonal, the OIRF is still different from the IRF.
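
A short sketch reproducing these numbers (the MA recursion Ψ_j = Φ_1 Ψ_{j−1} + Φ_2 Ψ_{j−2} is included for one extra horizon):

import numpy as np

# IRF and orthogonalized IRF of the VAR(2) above.
Phi1 = np.array([[0.5, 0.1], [0.4, 0.5]])
Phi2 = np.array([[0.0, 0.0], [0.25, 0.0]])
Sigma = np.array([[0.09, 0.0], [0.0, 0.04]])

Psi = [np.eye(2), Phi1]                       # Psi_0 = I, Psi_1 = Phi_1
Psi.append(Phi1 @ Psi[1] + Phi2 @ Psi[0])     # Psi_2 = Phi_1 Psi_1 + Phi_2 Psi_0

P = np.linalg.cholesky(Sigma)                 # lower triangular, here diag(0.3, 0.2)
for j in (0, 1):
    print("IRF ", j, Psi[j])                  # I_2 and [[.5, .1], [.4, .5]]
    print("OIRF", j, Psi[j] @ P)              # diag(.3, .2) and [[.15, .02], [.12, .10]]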

7. (W7V3) Consider the following SVAR(1) where Y t is a [3 × 1] vector.

A0 Y t = d + A1 Y t−1 + ut ,

Assume that E[u_t u_t′] is a diagonal matrix. Also consider the corresponding reduced
form VAR(1) process

Y t = c + Φ1 Y t−1 + εt .

How many restrictions do you need to impose on the structural VAR coefficients to be
able to identify the structural coefficients from the reduced form VAR coefficients?
The structural model has 3 intercept parameters, 3 × 3 = 9 parameters relating Y_t to
Y_{t−1}, 6 parameters for contemporaneous relations, and 3 variance parameters, so in
total 21 parameters. The reduced form model has 3 intercept parameters, 3 × 3 = 9
parameters relating Y_t to Y_{t−1}, no parameters for contemporaneous relations, and 6
parameters from the variance-covariance matrix, so in total 18 parameters. Therefore,
we need to impose 3 restrictions on the structural VAR.

8. (W7V3) Consider the following structural VAR process derived from two ARDL(1,2)
models as in the lectures.
Y_t = α [φ_{1,1}^(s) + γ_{1,2}^(s) φ_{2,1}^(s),  φ_{1,2}^(s) + γ_{1,2}^(s) φ_{2,2}^(s);  φ_{2,1}^(s) + γ_{2,1}^(s) φ_{1,1}^(s),  φ_{2,2}^(s) + γ_{2,1}^(s) φ_{1,2}^(s)] Y_{t−1}
      + α [1, γ_{1,2}^(s); γ_{2,1}^(s), 1] [ε_{1,t}^(s); ε_{2,t}^(s)],

where α = (1 − γ_{1,2}^(s) γ_{2,1}^(s))^{−1}. Also, consider the reduced form corresponding to this
process.

Y_t = [φ_{1,1}, φ_{1,2}; φ_{2,1}, φ_{2,2}] Y_{t−1} + [ε_{1,t}; ε_{2,t}]
(a) Suppose you know φ_{1,2} = 0. What does this tell you about φ_{1,2}^(s)?
If φ_{1,2} = 0, then we have that α(φ_{1,2}^(s) + γ_{1,2}^(s) φ_{2,2}^(s)) = 0. Since α cannot be equal to
zero, this implies that φ_{1,2}^(s) + γ_{1,2}^(s) φ_{2,2}^(s) = 0, i.e. φ_{1,2}^(s) = −γ_{1,2}^(s) φ_{2,2}^(s).
However, since we have no information on γ_{1,2}^(s) and φ_{2,2}^(s), the restriction that
φ_{1,2} = 0 does not pin down φ_{1,2}^(s). This means that even though you might
not find a reduced form relation between Y_{1,t} and Y_{2,t−1}, this does not mean that
there is no structural relation between Y_{1,t} and Y_{2,t−1}.

(b) Suppose you know that φ_{1,2}^(s) = 0. What does this tell you about φ_{1,2}?

The answer is similar to that of the previous exercise. If φ_{1,2}^(s) = 0, you know
that φ_{1,2} = αγ_{1,2}^(s) φ_{2,2}^(s). However, without further restrictions on the structural
coefficients you cannot say anything about φ_{1,2}. So this means that even though
there is no structural relation between Y_{1,t} and Y_{2,t−1}, there might be a reduced
form relation between the two.

(c) Assume D = E[ε_t^(s) (ε_t^(s))′] is diagonal, with diagonal elements d_{11} and d_{22}. Denote
Σ = E[ε_t ε_t′]. Suppose that there is no contemporaneous effect of Y_2 on Y_1, i.e.
γ_{1,2}^(s) = 0.

i. Find d_{11}, d_{22}, and γ_{2,1}^(s) in terms of reduced form parameters by comparing the
covariance matrix of the structural errors to that of the reduced form errors.
If γ_{1,2}^(s) = 0, then α = 1. Also

Σ = [1, 0; γ_{2,1}^(s), 1] [d_{11}, 0; 0, d_{22}] [1, γ_{2,1}^(s); 0, 1]
  = [d_{11}, γ_{2,1}^(s) d_{11}; γ_{2,1}^(s) d_{11}, (γ_{2,1}^(s))^2 d_{11} + d_{22}]

We see that

d_{11} = [Σ]_{1,1},
γ_{2,1}^(s) d_{11} = [Σ]_{1,2}  →  γ_{2,1}^(s) = [Σ]_{1,2}/[Σ]_{1,1},
(γ_{2,1}^(s))^2 d_{11} + d_{22} = [Σ]_{2,2}  →  d_{22} = [Σ]_{2,2} − [Σ]_{1,2}^2/[Σ]_{1,1}.

ii. Find the remaining structural coefficients in terms of the reduced form coefficients.
Since γ_{1,2}^(s) = 0, we immediately have φ_{1,2}^(s) = φ_{1,2} and φ_{1,1}^(s) = φ_{1,1}. Using this
and the solution of the previous question, we have that

φ_{2,1}^(s) = φ_{2,1} − ([Σ]_{1,2}/[Σ]_{1,1}) φ_{1,1},
φ_{2,2}^(s) = φ_{2,2} − ([Σ]_{1,2}/[Σ]_{1,1}) φ_{1,2}.
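
A small sketch of this identification step (the numerical Σ and Φ below are illustrative assumptions, not taken from the exercise): recover d_{11}, d_{22}, γ_{2,1}^(s) and the structural Φ^(s), then check that the implied reduced form covariance reproduces Σ.

import numpy as np

# Recover structural parameters under the restriction gamma_12^(s) = 0.
Sigma = np.array([[1.0, 0.4], [0.4, 0.8]])      # illustrative reduced-form covariance
Phi = np.array([[0.5, 0.2], [0.3, 0.4]])        # illustrative reduced-form Phi

d11 = Sigma[0, 0]
gamma21 = Sigma[0, 1] / Sigma[0, 0]
d22 = Sigma[1, 1] - Sigma[0, 1] ** 2 / Sigma[0, 0]

Phi_s = Phi.copy()                              # first row unchanged
Phi_s[1, :] = Phi[1, :] - gamma21 * Phi[0, :]   # phi^(s)_{2,j} = phi_{2,j} - gamma21 * phi_{1,j}

# Check: with gamma_12^(s) = 0 we have eps_t = [1, 0; gamma21, 1] eps_t^(s),
# so the implied reduced-form covariance should reproduce Sigma.
B = np.array([[1.0, 0.0], [gamma21, 1.0]])
D = np.diag([d11, d22])
print(np.allclose(B @ D @ B.T, Sigma))          # True
print(gamma21, d11, d22)
print(Phi_s)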

9. (W7V4) Consider an m-dimensional VAR(p) process. Denote by f_t the optimal forecast
for Y_{t+h}, based on the information set I_t, that is, f_t minimizes the MSE given by

E[(f_t − Y_{t+h})′(f_t − Y_{t+h})].

(a) Show that f t = E[Y t+h |It ].

Define m_t = E[Y_{t+h} | I_t]. Then

E[(f_t − Y_{t+h})′(f_t − Y_{t+h})] = E[(f_t − m_t − (Y_{t+h} − m_t))′(f_t − m_t − (Y_{t+h} − m_t))]
                                   = E[(f_t − m_t)′(f_t − m_t)]
                                     − 2E[(f_t − m_t)′(Y_{t+h} − m_t)]
                                     + E[(Y_{t+h} − m_t)′(Y_{t+h} − m_t)]

The cross term is zero, since f_t and m_t are functions of the information set I_t, so that

E[(f_t − m_t)′(Y_{t+h} − m_t)] = E[E[(f_t − m_t)′(Y_{t+h} − m_t) | I_t]]
                               = E[(f_t − m_t)′ E[(Y_{t+h} − m_t) | I_t]]
                               = E[(f_t − m_t)′ 0] = 0

Since (f_t − m_t)′(f_t − m_t) ≥ 0, the best we can do is set f_t = m_t. We conclude
that f_t = E[Y_{t+h} | I_t] is the optimal forecast, in the sense of minimizing the MSE.

(b) Show that the 1-step ahead MSE under this forecast is trace(Σ), with Σ the
covariance matrix of the errors in the VAR(p) process.
The VAR(p) process is

Y_t = c + ∑_{i=1}^p Φ_i Y_{t−i} + ε_t

We see that Ŷ_{t+1} = E[Y_{t+1} | I_t] = c + ∑_{i=1}^p Φ_i Y_{t+1−i}, and hence, the forecast error
is e_{t+1} = Y_{t+1} − Ŷ_{t+1} = ε_{t+1}. From the definition of the MSE given in the question,
we then have that the MSE is

E[(Ŷ_{t+1} − Y_{t+1})′(Ŷ_{t+1} − Y_{t+1})]
    = E[ε_{t+1}′ ε_{t+1}]
    = trace(E[ε_{t+1}′ ε_{t+1}])      (the trace of a scalar is the same scalar)
    = E[trace(ε_{t+1}′ ε_{t+1})]      (the trace is a sum, so the expectation of a trace is the trace of the expectation)
    = E[trace(ε_{t+1} ε_{t+1}′)]      (cyclicality property of the trace)
    = trace(E[ε_{t+1} ε_{t+1}′])      (reversing the order of trace and expectation again)
    = trace(Σ).
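
A simulation sketch of this result (the VAR(1) parameters c, Φ_1 and Σ below are illustrative choices): with known parameters, the 1-step-ahead forecast errors are just the ε_{t+1}, so their average squared norm is close to trace(Σ).

import numpy as np

# Check that the 1-step-ahead MSE of the conditional-mean forecast equals trace(Sigma).
rng = np.random.default_rng(5)
c = np.array([0.5, -0.2])
Phi1 = np.array([[0.5, 0.1], [0.2, 0.4]])
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])

T = 100_000
eps = rng.multivariate_normal(np.zeros(2), Sigma, size=T)
Y = np.zeros((T, 2))
for s in range(1, T):
    Y[s] = c + Phi1 @ Y[s - 1] + eps[s]

forecast = c + Y[:-1] @ Phi1.T             # E[Y_{t+1} | I_t] with known parameters
err = Y[1:] - forecast                     # equals eps_{t+1}
print(np.mean(np.sum(err**2, axis=1)))     # close to trace(Sigma)
print(np.trace(Sigma))                     # = 1.5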
