Methods For Applied Macroeconomics Research - ch1

This chapter reviews background concepts in probability, stochastic processes, and time series analysis that are useful for analyzing dynamic stochastic general equilibrium (DSGE) models. It is divided into six sections that cover: the definition of stochastic processes; concepts of convergence; time series concepts; laws of large numbers; central limit theorems; and elements of spectral analysis. The chapter aims to summarize key concepts and results in a way that is relevant for confronting DSGE models with economic time series data.


Chapter 1: Preliminaries

This chapter reviews background concepts and a few results used at some stage or another in the rest of the book. The material we present is dispersed in a variety of advanced probability, stochastic process and time series textbooks. Here we only summarize concepts and results which are useful from the point of view we take in this book, i.e. how to confront dynamic stochastic general equilibrium (DSGE) models with the data.

The chapter is divided into six sections. The first defines what a stochastic process is. The second examines the asymptotic behavior of stochastic processes by introducing four concepts of convergence; it characterizes the relationships among the various definitions and highlights their differences. Section 3 introduces time series concepts which are of use in the next chapters. Since the majority of econometric estimators are continuous functions of stochastic processes, we also present a few results concerning the properties of such functions. Section 4 deals with laws of large numbers. Such laws are useful to insure that functions of stochastic processes converge to appropriate limits. We examine three situations often encountered in practice: a case where observations are dependent and identically distributed; one where they are dependent and heterogeneously distributed; and one where they are martingale differences. As we will see, relaxing the homogeneity condition comes together with stronger restrictions on the moment structure of the process. Section 5 describes three central limit theorems corresponding to the three situations analyzed in section 4. Central limit theorems are useful to derive the distribution of functions of stochastic processes and are the basis for (classical) tests of hypotheses and for some model evaluation criteria.

Section 6 presents elements of spectral analysis. Spectral analysis is useful for breaking down economic time series into components (trends, cycles, etc.), for building measures of persistence of shocks, for analyzing formulas for the asymptotic covariance matrix of certain estimators and for defining measures of distance between models and the data. It may be challenging at first. However, once it is realized that most of the functions typically performed by modern electronics use spectral methods (frequency modulation in a stereo; frequency band reception in a cellular phone, etc.), the reader should feel more comfortable with it. Spectral analysis offers an alternative way to look at time series, translating serially dependent time observations into contemporaneously independent frequency observations. This change of coordinates allows us to analyze the primitive cycles which compose time series and to discuss their length, their amplitude and their persistence.
Whenever not explicitly stated, everything presented in this chapter applies to both scalar and vector stochastic processes. The notation $\{y_t(\omega)\}_{t=-\infty}^{\infty}$ indicates the sequence $\{\ldots, y_0(\omega), y_1(\omega), \ldots, y_t(\omega), \ldots\}$ where, for each $j$, the random variable $y_j(\omega)$ is a function of the state of nature $\omega \in K$, i.e. $y_j : K \to R$, where $R$ is the real line and $K$ the space of states of nature. To simplify the notation, at times we simply write $\{y_t(\omega)\}$ or $y_t$. A normal random variable with zero mean and variance $\sigma^2_y$ is denoted by $y_t \sim N(0, \sigma^2_y)$ and a random variable uniformly distributed over the interval $[a_1, a_2]$ is denoted by $y_t \sim U[a_1, a_2]$. Finally, iid indicates identically and independently distributed random variables.
1.1 Stochastic Processes
Definition 1.1 (Stochastic Process 1): A stochastic process $\{y_t(\omega)\}_{t=1}^{\infty}$ is a sequence of random vectors together with mutually consistent joint probability distributions for finite subsequences $\{y_t(\omega)\}_{t=t_1}^{t_i}$, e.g. $\{y_{t_1}, \ldots, y_{t_i}\} \sim N(0, \Sigma_y)$ for all $i$ and for fixed $\omega$.

Definition 1.2 (Stochastic Process 2): A stochastic process $\{y_t(\omega)\}_{t=1}^{\infty}$ is a probability measure defined on sets of sequences of real vectors (the paths of the process).

Definition 1.1 is incomplete in the sense that it does not specify what a consistent joint probability distribution is. For example, unless all variables are independent, $\frac{P(y_{t_1}, \ldots, y_{t_{i'}})}{P(y_{t_1}, \ldots, y_{t_{i'}} \mid y_{t_1}, \ldots, y_{t_i})} \neq P(y_{t_1}, \ldots, y_{t_i})$ for $i' > i$. Since in macro-time series frameworks the case of independently distributed $y_t$ is rare, such a definition is impractical. The second definition implies, among other things, that the set of paths $X = \{y : y_t(\omega) \leq \varrho\}$, for arbitrary $\varrho \in R$ and $t$ fixed, has well-defined probabilities. In other words, choosing different $\varrho \in R$ for a given $t$, and performing countable unions, finite intersections and complementations of the above set of paths, we generate a set of events with proper probabilities. Note also that the $y_t$ path is unrestricted for all $\tau \neq t$: the realization needs to be below $\varrho$ only at $t$. In what follows we will use definition 1.2, and the notation $y_t(\omega)$ indicates that the random variable $y_t$ is a function of both time $t$ and the event $\omega \in K$. Observable time series will be realizations of a stochastic process $\{y_t(\omega)\}_{t=1}^{\infty}$, given $\omega$.
Example 1.1 Three examples of simple stochastic processes are the following:
1) $y_t = e_1 \cos(t\, e_2)$, where $e_1, e_2$ are random variables, $e_1 > 0$ and $e_2 \sim U[0, 2\pi)$. Here $y_t$ is periodic: $e_1$ controls the amplitude and $e_2$ the periodicity of $y_t$.
2) $y_t$ is such that $P[y_t = \pm 1] = 0.5$ for all $t$. Such a process has no memory and flips between $-1$ and $1$ at each $t$.
3) $y_t = \sum_{\tau=1}^{t} e_\tau$, $e_\tau \sim$ iid $(0, 1)$. Then $y_t$ is a random walk with no drift.

Example 1.2 It is easy to generate complex stochastic processes from primitive ones. For example, if $e_{1t} \sim N(0,1)$ and $e_{2t} \sim U(0,1]$ and they are independent of each other, $y_t = e_{2t}\exp\{\frac{e_{1t}}{1+e_{1t}}\}$ is a stochastic process. What are the mean and the variance of $y_t$?
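Processes like those in example 1.1 are straightforward to simulate. The sketch below (an illustration we add here, not part of the original text) draws one state of nature $\omega$ per run, so each execution produces one realization of each path; the parameter values and the seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)   # fixing the seed fixes the "state of nature"
T = 200
t = np.arange(1, T + 1)

# 1) periodic process: y_t = e_1 cos(t e_2), with e_1 > 0, e_2 ~ U[0, 2*pi)
e1 = np.abs(rng.standard_normal()) + 0.1     # any positive draw works
e2 = rng.uniform(0.0, 2.0 * np.pi)
y_periodic = e1 * np.cos(t * e2)

# 2) memoryless flips: P[y_t = 1] = P[y_t = -1] = 0.5
y_flips = rng.choice([-1.0, 1.0], size=T)

# 3) driftless random walk: y_t = sum of iid(0,1) shocks up to date t
y_rw = np.cumsum(rng.standard_normal(T))

print(y_periodic[:3], y_flips[:3], y_rw[:3], sep="\n")
```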
1.2 Concepts of Convergence
1.2.1 Almost sure (a.s.) convergence
The concept of a.s. convergence is an extension of the concept of convergence employed for sequences of real numbers. In real analysis we have the following definition:

Definition 1.3: Let $\{y_1, \ldots, y_j\}$ be a sequence of real numbers. It converges to $y$ if $\lim_{j\to\infty} y_j = y$.

Example 1.3 To illustrate definition 1.3, let $y_j = 1 - \frac{1}{j}$. Then $\lim_{j\to\infty} y_j = 1$. On the other hand, if $y_j = (-1)^j$, $\lim_{j\to\infty} y_j$ is undetermined. Finally, if $y_j = y^j$, $\lim_{j\to\infty} y_j$ exists if $-1 < y \leq 1$.

When we deal with stochastic processes, the objects of interest are functions of the state of nature. However, once $\omega$ is drawn, the sequence $\{y_1(\omega), \ldots, y_t(\omega)\}$ looks like a sequence of non-random numbers. Hence, given $\omega$, we can use a concept of convergence similar to the one used for real numbers.

Definition 1.4 (almost sure convergence): Let $y(\omega) < \infty$. Then $\{y_t(\omega)\} \xrightarrow{a.s.} y(\omega)$ if $\lim_{T\to\infty} P[\|y_t(\omega) - y(\omega)\| \leq \epsilon, \ \forall t > T] = 1$, for almost all $\omega \in K$ and for every $\epsilon > 0$.

Figure 1.1: Almost sure convergence
According to definition 1.4, a sequence $\{y_t(\omega)\}$ converges almost surely (a.s.) if the probability of obtaining a path for $y_t$ which converges to $y(\omega)$ is one, after some $T$ (see figure 1.1 for two such paths). The probability here is taken over $\omega$'s. Note that the definition implies that failure to converge is possible, but it happens on a null set (over which $y_t(\omega) \to \infty$), which has zero Lebesgue measure. When $K$ is infinite dimensional, a.s. convergence is called convergence almost everywhere; sometimes a.s. convergence is termed convergence with probability 1 (w.p.1) or strong consistency.

Since in most applications we will be interested in continuous functions of stochastic processes, we describe the limiting behavior of functions of a.s. convergent sequences.

Proposition 1.1: Let $\{y_t(\omega)\}$ be such that $\{y_t(\omega)\} \xrightarrow{a.s.} y(\omega)$. Let $h$ be an $n \times 1$ vector of functions, continuous at $y(\omega)$. Then $h(y_t(\omega)) \xrightarrow{a.s.} h(y(\omega))$.
Proposition 1.1 is a simple extension of the standard result that continuous functions of convergent sequences are convergent. In fact, for fixed $\omega$, if $\{y_t(\omega)\} \to y(\omega)$ then $h(y_t(\omega)) \to h(y(\omega))$. Since the set $[\omega : y_t(\omega) \to y(\omega)] \subseteq [\omega : h(y_t(\omega)) \to h(y(\omega))]$ and since $1 = \lim_{T\to\infty} P[\|y_t(\omega) - y(\omega)\| \leq \epsilon, \forall t > T] \leq \lim_{T\to\infty} P[\|h(y_t(\omega)) - h(y(\omega))\| < \epsilon, \forall t > T] \leq 1$, it follows that $h(y_t(\omega)) \xrightarrow{a.s.} h(y(\omega))$.
Example 1.4 Let $\{y_t(\omega)\} = 1 - \frac{1}{t}$ for a given $\omega$ and let $h(y_t(\omega)) = \frac{1}{T}\sum_t y_t(\omega)$. Then $h$ is continuous at $\lim_{t\to\infty} y_t(\omega) = 1$ and $h(y_t(\omega)) \xrightarrow{a.s.} 1$.

Exercise 1.1 Suppose $\{y_t(\omega)\} = 1/t$ with probability $1 - 1/t$ and $\{y_t(\omega)\} = t$ with probability $1/t$. Does $\{y_t(\omega)\}$ converge almost surely to 1?
In some applications we will be interested in examining cases where a.s. convergence does not hold. This can be the case when the observations have a probability density function that changes over time or when matrices appearing in the expression for estimators do not converge to fixed limits. In these cases, even though $h(y_{1t}(\omega))$ does not converge to $h(y(\omega))$, it may be the case that the distance between $h(y_{1t}(\omega))$ and $h(y_{2t}(\omega))$ becomes arbitrarily small as $t \to \infty$, where $\{y_{2t}(\omega)\}$ is another sequence of random variables. To examine this type of convergence we need the following definition.

Definition 1.5 (Uniform Continuity): $h$ is uniformly continuous on $R_1 \subseteq R^m$ if for every $\epsilon > 0$ there exists a $\delta(\epsilon) > 0$ such that if $\{y_{1t}(\omega)\}, \{y_{2t}(\omega)\} \in R_1$ and $|y_{1t,i}(\omega) - y_{2t,i}(\omega)| < \delta(\epsilon)$, $i = 1, \ldots, m$, then $|h_j(y_{1t}(\omega)) - h_j(y_{2t}(\omega))| < \epsilon$, $j = 1, \ldots, n$.

The concept of uniform continuity is graphically illustrated in figure 1.2 for $m = n = 1$.

Exercise 1.2 Show that uniform continuity implies continuity but not vice versa. Show that if $R_1$ is compact, continuity implies uniform continuity.
Example 1.5 Functions which are continuous but not uniformly continuous are easy to construct. Let $R_1 = (0, 1)$ and let $h(y_t(\omega)) = \frac{1}{y_t}$. Clearly $h$ is continuous over $R_1$. However, it is not uniformly continuous. To show this it is enough to show that for all $\epsilon > 0$, $\delta > 0$ we can find sequences $y_{1t}(\omega)$ and $y_{2t}(\omega) \in R_1$ for which $|y_{1t} - y_{2t}| < \delta$ and $|h(y_{1t}) - h(y_{2t})| > \epsilon$. For example, set $y_{2t} = 0.5 y_{1t}$. Then the conditions reduce to $|\frac{y_{1t}}{2}| < \delta$ and $|\frac{1}{y_{1t}}| > \epsilon$, which are satisfied for $0 < |y_{1t}| < \min\{\frac{1}{\epsilon}, 2\delta, 1\}$.

Figure 1.2: Uniform Continuity
What is missing in example 1.5, and instead is present in the next proposition, is the condition that $h$ is defined on a compact set.

Proposition 1.2: Let $h$ be continuous on a compact set $R_2 \subseteq R^m$. Suppose $\{y_{1t}(\omega)\}$ and $\{y_{2t}(\omega)\}$ are such that $\{y_{1t}(\omega)\} - \{y_{2t}(\omega)\} \xrightarrow{a.s.} 0$ and there exists an $\epsilon > 0$ such that for all $t > T$, $y_{2t}$ is interior to $R_2$, uniformly in $t$; that is, the set $[|y_{1t} - y_{2t}| < \epsilon] \subseteq R_2$ for $t$ large. Then $h(y_{1t}(\omega)) - h(y_{2t}(\omega)) \xrightarrow{a.s.} 0$.
The logic of proposition 1.2 is simple. Since $R_2$ is compact, $h_j$, $j = 1, \ldots, n$, is uniformly continuous on $R_2$. Choose $\epsilon$ such that $\{y_{1t}(\omega)\} - \{y_{2t}(\omega)\} \to 0$ as $t \to \infty$. Since $y_{2t}(\omega)$ is in the interior of $R_2$ for all $t$ sufficiently large, uniformly in $t$, and since $\{y_{1t}(\omega)\} - \{y_{2t}(\omega)\} \to 0$, $y_{1t}$ is in the interior of $R_2$ for $t$ large. By uniform continuity, for any $\epsilon > 0$ there exists a $\delta(\epsilon) > 0$ such that if $|y_{1t,i}(\omega) - y_{2t,i}(\omega)| < \delta(\epsilon)$, $i = 1, \ldots, m$, then $|h_j(y_{1t}(\omega)) - h_j(y_{2t}(\omega))| < \epsilon$. Since $|y_{1t,i}(\omega) - y_{2t,i}(\omega)| < \delta(\epsilon)$ for all $t$ sufficiently large and almost every $\omega$, $|h_j(y_{1t}(\omega)) - h_j(y_{2t}(\omega))| \xrightarrow{a.s.} 0$ for every $j = 1, \ldots, n$.

One application of proposition 1.2 is the following: suppose $\{y_{1t}(\omega)\}$ is some actual time series, $\{y_{2t}(\omega)\}$ is its simulated counterpart obtained from a model, fixing the parameters and given $\omega$, and $h$ is some continuous statistic, e.g. the mean or the variance. Then the proposition tells us that if simulated and actual paths are close enough as $t \to \infty$, the corresponding statistics generated from these paths will also be close as $t \to \infty$.
1.2.2 Convergence in Probability
Convergence in probability is weaker than almost sure convergence, as is shown next.

Definition 1.6 (Convergence in Probability): If there exists a $y(\omega) < \infty$ such that, for every $\epsilon > 0$, $P[\omega : \|y_t(\omega) - y(\omega)\| > \epsilon] \to 0$ as $t \to \infty$, then $\{y_t(\omega)\} \xrightarrow{P} y(\omega)$.

Definition 1.6 can be restated using the concept of boundedness in probability.

Definition 1.7: $y_t$ is bounded in probability (denoted by $O_p(1)$) if for any arbitrary $\epsilon_1 > 0$ there exists an $\epsilon_2$ such that $P[\omega : \|y_t(\omega) - y(\omega)\| > \epsilon_2] < \epsilon_1$. If $\epsilon_2 \to 0$ as $t \to \infty$, $\{y_t(\omega)\}$ converges in probability to $y(\omega)$.

Exercise 1.3 Use the definitions to show that almost sure convergence implies convergence in probability.
Convergence in probability is weaker than a.s. convergence because in the former we only need the joint distribution of $(y_t, y)$, not the joint distribution of $(y_t, y_\tau, y)$ for all $\tau > T$. Convergence in probability implies that it is less likely that one element of the $\{y_t(\omega)\}$ sequence is more than an $\epsilon$ away from $y$ as $t \to \infty$; a.s. convergence implies that, after $T$, the path of $\{y_t(\omega)\}$ is not far from $y$ as $T \to \infty$. Hence, it is easy to build examples where convergence in probability does not imply a.s. convergence.

Example 1.6 Let $y_t$ and $y_\tau$ be independent $\forall t, \tau$, let $y_t$ be either 0 or 1 and let

$P[y_t = 0] = \begin{cases} 1/2 & t = 1, 2 \\ 2/3 & t = 3, 4 \\ 3/4 & t = 5, \ldots, 8 \\ 4/5 & t = 9, \ldots, 16 \\ \ldots & \end{cases}$

Then $P[y_t = 0] = 1 - \frac{1}{j}$ for $t = 2^{(j-1)} + 1, \ldots, 2^j$, so that $y_t \xrightarrow{P} 0$. This is because the probability that $y_t$ is in one of these classes is $1/j$ and, as $t \to \infty$, the number of classes goes to infinity. However, $y_t$ does not converge almost surely to zero since the probability that a convergent path is drawn is zero; i.e., if at $t$ we draw $y_t = 1$, there is a non-negligible probability that $y_{t+1} = 1$ is drawn. Hence, $y_t \xrightarrow{P} 0$ too slowly to insure that $y_t \xrightarrow{a.s.} 0$.
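A small simulation in the spirit of example 1.6 (our illustration; the number of blocks and of paths are arbitrary) makes the difference concrete: the probability of a one at any given date shrinks, yet essentially every path still contains ones arbitrarily late in the sample.

```python
import numpy as np

# Example 1.6: P[y_t = 1] = 1/j for dates t in the j-th block.
rng = np.random.default_rng(2)
J = 14                                        # number of blocks (arbitrary)
blocks = [np.full(2 if j == 1 else 2 ** (j - 1), 1.0 / j) for j in range(1, J + 1)]
p_one = np.concatenate(blocks)                # P[y_t = 1], date by date
paths = rng.uniform(size=(1000, p_one.size)) < p_one

print("P[y_t = 1] at the last date:", p_one[-1])   # -> 0: convergence in probability
late = paths[:, p_one.size // 2:]                  # second half of the sample
print("share of paths with a 1 late in the sample:",
      late.any(axis=1).mean())                     # ~ 1: no a.s. convergence
```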
Although convergence in probability does not imply almost sure convergence, the following result is useful:

Result 1.1: If $y_t(\omega) \xrightarrow{P} y(\omega)$, there exists a subsequence $y_{t_j}(\omega)$ such that $y_{t_j}(\omega) \xrightarrow{a.s.} y(\omega)$ (see, e.g., Lukacs, 1975, p. 48).

Since convergence in probability allows a more erratic behavior in the converging sequence than almost sure convergence, one can obtain almost sure convergence by disregarding the erratic elements of the sequence. The concept of convergence in probability is useful to show weak consistency of certain estimators.
Example 1.7 (i) Let $y_t$ be a sequence of iid random variables with $E(y_t) < \infty$. Then $\frac{1}{T}\sum_{t=1}^T y_t \xrightarrow{a.s.} E(y_t)$ (Kolmogorov strong law of large numbers).
(ii) Let $y_t$ be a sequence of uncorrelated random variables with $E(y_t) < \infty$, $var(y_t) = \sigma^2_y < \infty$ and $cov(y_t, y_{t-\tau}) = 0$ for all $\tau \neq 0$. Then $\frac{1}{T}\sum_{t=1}^T y_t \xrightarrow{P} E(y_t)$ (Chebyshev weak law of large numbers).

As example 1.7 indicates, strong consistency requires iid random variables, while for weak consistency we just need a set of uncorrelated random variables with identical means and variances. Note also that weak consistency requires restrictions on the second moments of the sequence, restrictions which are not needed in the first case.
The analogs of propositions 1.1 and 1.2 for convergence in probability can be easily obtained.

Proposition 1.3: Let $\{y_t(\omega)\}$ be such that $y_t(\omega) \xrightarrow{P} y(\omega)$. Then $h(y_t(\omega)) \xrightarrow{P} h(y(\omega))$ for any $h$ which is continuous at $y(\omega)$ (for a proof, see White, 1984, p. 23).

Proposition 1.4: Let $h$ be continuous on a compact $R_1 \subseteq R^m$. Let $\{y_{1t}(\omega)\}$ and $\{y_{2t}(\omega)\}$ be such that $y_{1t}(\omega) - y_{2t}(\omega) \xrightarrow{P} 0$ as $t \to \infty$. Then $h(y_{1t}(\omega)) - h(y_{2t}(\omega)) \xrightarrow{P} 0$ (for a proof, see White, 1984, p. 25).
In some time-series applications the stochastic process $y_t$ may converge to a limit which is not in the space of the random variables which make up the sequence; e.g., the sequence defined by $y_t = \sum_j e_j$, where each $e_j$ is stationary, has a limit which is not in the space of stationary variables. In other cases, the limit point $y(\omega)$ may be unknown. In these cases the concept of Cauchy convergence is useful.

Definition 1.8: $\{y_t(\omega)\}$ is Cauchy if, for $\tau > t > T(\epsilon, \omega)$, $\|y_t(\omega) - y_\tau(\omega)\| \leq \epsilon$, where $T(\epsilon, \omega) \in R$, for all $\epsilon > 0$.

Example 1.8 $y_t = \frac{1}{t}$ is a Cauchy sequence but $y_t = (-1)^t$ is not.

Using the concept of Cauchy sequences we can redefine almost sure convergence and convergence in probability as follows:

Definition 1.9 (Convergence a.s. and in Probability): $\{y_t(\omega)\}$ converges a.s. if and only if for every $\epsilon > 0$, $\lim_{T\to\infty} P[\|y_t(\omega) - y_\tau(\omega)\| > \epsilon$, for some $\tau > t \geq T] \to 0$, and converges in probability if and only if for every $\epsilon > 0$, $\lim_{t,\tau\to\infty} P[\|y_t(\omega) - y_\tau(\omega)\| > \epsilon] \to 0$.
1.2.3 Convergence in $L_q$-norm

Definition 1.10 (Convergence in the norm): $\{y_t(\omega)\}$ converges in the $L_q$-norm (or in the $q$th mean, denoted by $y_t(\omega) \xrightarrow{q.m.} y(\omega)$) if there exists a $y(\omega) < \infty$ such that $\lim_{t\to\infty} E[|y_t(\omega) - y(\omega)|^q] = 0$, for some $q > 0$.
While almost sure convergence and convergence in probability look at the path of $y_t$, $L_q$-convergence is concerned with the $q$th moment of $y_t$. $L_q$-convergence is typically analyzed when $q = 2$, in which case we have convergence in mean square; when $q = 1$ (absolute convergence); and when $q = \infty$ (convergence in the minmax norm).

If the $q$th moment of the distribution does not exist, convergence in $L_q$ does not apply (i.e., if $y_t$ is a Cauchy random variable, so that moments do not exist, $L_q$ convergence is meaningless). Convergence in probability applies even when moments do not exist. Intuitively, the difference between the two types of convergence lies in the fact that the latter allows the distance between $y_t$ and $y$ to get large faster than the probability gets smaller, while this is not possible with $L_q$ convergence.

Exercise 1.4 Let $y_t$ converge to 0 in $L_q$. Show that $y_t$ converges to 0 in probability. (Hint: Use Chebyshev's inequality.)

As exercise 1.4 indicates, convergence in $L_q$ is stronger than convergence in probability. In general, the latter does not necessarily imply the former.
Example 1.9 Let $K = \{\omega_1, \omega_2\}$ and let $P(\omega_1) = P(\omega_2) = 0.5$. Let $y_t(\omega_1) = (-1)^t$, $y_t(\omega_2) = (-1)^{t+1}$ and let $y(\omega_1) = y(\omega_2) = 0$. Clearly $y_t$ does not converge in probability to $y$. Hence $y_t$ does not converge to $y = 0$ in $L_q$-norm. To confirm this note, for example, that $\lim_{t\to\infty} E[|y_t(\omega) - y(\omega)|^2] = 1$.
The following result provides the conditions needed to insure that convergence in probability implies $L_q$-convergence.

Result 1.2: If $y_t \xrightarrow{P} y$ and $\sup_t \{\lim_{\lambda\to\infty} E(|y_t|^q\, I_{[|y_t| \geq \lambda]})\} = 0$ (i.e. $|y_t|^q$ is uniformly integrable), where $I$ is the indicator function, then $y_t \xrightarrow{q.m.} y$ (Davidson, 1994, p. 287).
In general, there is no relationship between $L_q$ convergence and almost sure convergence. The following example shows that the two concepts are distinct.

Example 1.10 Let $y_t(\omega) = t$ if $\omega \in [0, 1/t)$ and $y_t(\omega) = 0$ otherwise. Then the set $\{\omega : \lim_{t\to\infty} y_t(\omega) \neq 0\}$ includes only the element $\{0\}$, so $y_t \xrightarrow{a.s.} 0$. However, $E|y_t|^q = 0 \cdot (1 - 1/t) + t^q/t = t^{q-1}$. Since $y_t$ is not uniformly integrable, it fails to converge in the $q$-mean for any $q > 1$ (for $q = 1$, $E|y_t| = 1$, $\forall t$). Hence, the limiting expectation of $y_t$ differs from its almost sure limit.
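The moments in example 1.10 can be checked numerically. The sketch below (ours; the number of draws and the dates are arbitrary) approximates $E|y_t|^q$ by averaging over uniform draws of $\omega$.

```python
import numpy as np

# Example 1.10: y_t(w) = t if w in [0, 1/t), else 0, with w ~ U[0,1].
rng = np.random.default_rng(3)
w = rng.uniform(size=2_000_000)
for t in (10, 100, 1000):
    for q in (1, 2):
        moment = np.mean(np.where(w < 1.0 / t, float(t), 0.0) ** q)
        print(f"t={t:5d} q={q}: E|y_t|^q ~ {moment:9.2f}  (theory {t**(q-1)})")
```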
Exercise 1.5 Let

$y_t = \begin{cases} 1 & \text{with probability } 1 - 1/t^2 \\ t & \text{with probability } 1/t^2 \end{cases}$

Show that the first and second moments of $y_t$ are finite. Show that $y_t \xrightarrow{P} 1$ but that $y_t$ does not converge in quadratic mean to 1.
The next result is useful to show convergence in the $L_{q'}$-norm when we know that convergence obtains in the $L_q$-norm with $q > q'$. To show such a result we need to state Jensen's inequality: let $h$ be a convex function on $R_1 \subseteq R^m$ and $y$ be a random variable such that $P[y \in R_1] = 1$. Then $h[E(y)] \leq E(h(y))$. If $h$ is concave on $R_1$, $h(E(y)) \geq E(h(y))$.

Example 1.11 Let $h(y) = y^{-2}$ for $y > 0$; $h$ is convex, so $E h(y) = E(y^{-2}) \geq 1/[E(y)]^2 = h(E(y))$.
Proposition 1.5: Let $q' < q$. If $y_t(\omega) \xrightarrow{q.m.} y(\omega)$, then $y_t(\omega) \xrightarrow{q'.m.} y(\omega)$.

To illustrate proposition 1.5 set $h(x_t) = x_t^b$, $b < 1$, $x_t \geq 0$ (so that $h$ is concave), with $x_t = |y_t(\omega) - y(\omega)|^q$ and $b = \frac{q'}{q}$. From Jensen's inequality, $E(|y_t(\omega) - y(\omega)|^{q'}) = E(\{|y_t(\omega) - y(\omega)|^q\}^{q'/q}) \leq \{E(|y_t(\omega) - y(\omega)|^q)\}^{q'/q}$. Since $E(|y_t(\omega) - y(\omega)|^q) \to 0$, $E(|y_t(\omega) - y(\omega)|^{q'}) \to 0$.
Example 1.12 Continuing with example 1.9, we have seen that $y_t$ fails to converge to zero in mean square. Does it converge in absolute mean? It is easy to verify that $\lim_{t\to\infty} E[|y_t(\omega) - y(\omega)|] = 1$, so that $y_t$ fails to converge in $L_1$ as well.
1.2.4 Convergence in Distribution

This concept of convergence is useful to show the asymptotic (normal) properties of several types of estimators.

Definition 1.11 (Convergence in Distribution): Let $\{y_t(\omega)\}$ be an $m \times 1$ vector with joint distribution $D_t$. If $D_t(z) \to D(z)$ as $t \to \infty$ for every point of continuity $z$, where $D$ is the distribution function of a random variable $y(\omega)$, then $y_t(\omega) \xrightarrow{D} y(\omega)$.

Convergence in distribution is weak and does not imply, in general, anything about the convergence of a sequence of random variables. In fact, while for the previous three convergence concepts $\{y_1, \ldots, y_t\}$ and the limit $y$ need to be defined on the same probability space, convergence in distribution is meaningful even when this is not the case.

Next, we characterize the relationship between convergence in distribution and convergence in probability.
Proposition 1.6: Suppose $y_t(\omega) \xrightarrow{P} y(\omega)$, $y(\omega)$ constant. Then $y_t(\omega) \xrightarrow{D} D_y$, where $D_y$ is the distribution of a random variable $z$ which takes the value $y(\omega)$ with probability 1. Conversely, if $y_t(\omega) \xrightarrow{D} D_y$, then $y_t \xrightarrow{P} y$ (for a proof see Rao, 1973, p. 120).

Note that this result could be derived as a corollary of proposition 1.3, had we assumed that $D_y$ is a continuous function of $y$.
The next two results will be handy when demonstrating the asymptotic properties of a class of estimators in dynamic models. Note that $y_{1t}(\omega)$ is $O_p(t^j)$ if there exists an $O(1)$ nonstochastic sequence $y_{2t}$ such that $(\frac{1}{t^j} y_{1t}(\omega) - y_{2t}) \xrightarrow{P} 0$, and that $y_{2t}$ is $O(1)$ if, for some $0 < \Delta < \infty$, there exists a $T$ such that $|y_{2t}| < \Delta$ for all $t \geq T$.

Result 1.3: If $y_t \xrightarrow{D} y$, then $y_t$ is bounded in probability (see White, 1984, p. 26).

Result 1.4: (i) If $y_{1t} \xrightarrow{P} \varrho$ and $y_{2t} \xrightarrow{D} y$, then $y_{1t} y_{2t} \xrightarrow{D} \varrho y$ and $y_{1t} + y_{2t} \xrightarrow{D} \varrho + y$, where $\varrho$ is a constant (Davidson, 1994, p. 355).
(ii) If $y_{1t}$ and $y_{2t}$ are sequences of random vectors, $y_{1t} - y_{2t} \xrightarrow{P} 0$ and $y_{2t} \xrightarrow{D} y$ imply that $y_{1t} \xrightarrow{D} y$ (Rao, 1973, p. 123).
Part (ii) of result 1.4 is useful when the asymptotic distribution of $y_{1t}$ cannot be determined directly. The result insures that if we can find a $y_{2t}$ with known asymptotic distribution which converges in probability to $y_{1t}$, then the distributions of $y_{1t}$ and $y_{2t}$ will coincide. We will use this result when discussing two-step estimators in chapter 5.

The limiting behavior of continuous functions of sequences which converge in distribution is easy to characterize. In fact, we have the following:

Result 1.5: Let $y_t \xrightarrow{D} y$. If $h$ is continuous, then $h(y_t) \xrightarrow{D} h(y)$ (Davidson, 1994, p. 355).
1.3 Time Series Concepts
Since this book focuses on time series problems and applications, we next describe a number of useful concepts which will be extensively used in later chapters.

Definition 1.12 (Lag operator): The lag operator $\ell$ is defined by $\ell y_t = y_{t-1}$ and $\ell^{-1} y_t = y_{t+1}$. When applied to a sequence of $m \times m$ matrices $A_j$, $j = 1, 2, \ldots$, the lag operator produces $A(\ell) = A_0 + A_1 \ell + A_2 \ell^2 + \ldots$.

Definition 1.13 (Autocovariance function): The autocovariance function of $\{y_t(\omega)\}_{t=-\infty}^{\infty}$ is $ACF_t(\tau) \equiv E[(y_t(\omega) - E(y_t(\omega)))(y_{t-\tau}(\omega) - E(y_{t-\tau}(\omega)))]$ and its autocorrelation function is $ACRF_t(\tau) \equiv corr(y_t, y_{t-\tau}) = \frac{ACF_t(\tau)}{\sqrt{var(y_t(\omega))\, var(y_{t-\tau}(\omega))}}$.
In general, both the autocovariance and the autocorrelation functions depend on time $t$ and on the gap $\tau$ between the elements.

Definition 1.14 (Stationarity 1): $\{y_t(\omega)\}_{t=-\infty}^{\infty}$ is stationary if and only if, for any set of paths $X = \{y_t(\omega) : y_t(\omega) \leq \varrho, \varrho \in R, \omega \in K\}$, $P(X) = P(\ell^\tau X)$, where $\ell^\tau y_t = y_{t-\tau}$.
A process is stationary if shifting a path over time does not change the probability distribution of that path. In this case the joint distribution of $\{y_{t_1}, \ldots, y_{t_j}\}$ is the same as the joint distribution of $\{y_{t_1+\tau}, \ldots, y_{t_j+\tau}\}$, $\forall \tau$. A weaker stationarity concept is the following:

Definition 1.15 (Stationarity 2): $\{y_t(\omega)\}_{t=-\infty}^{\infty}$ is covariance (weakly) stationary if $E(y_t)$ is constant, $E|y_t|^2 < \infty$ and $ACF_t(\tau)$ is independent of $t$.
Definition 1.15 is weaker than 1.14 in several senses: first, it involves the distribution of $y_t$ at each $t$ and not the joint distribution of a (sub)sequence of $y_t$'s. Second, it only concerns the first two moments of $y_t$. Clearly, a stationary process is weakly stationary. The converse is not true, except when $y_t$ is a normal random variable. In fact, when $y_t$ is normal for each $t$, the first two moments characterize the entire distribution and the joint distribution of a $\{y_t\}_{t=1}^{\infty}$ path is normal.
Example 1.13 Let $y_t = e_1 \cos(\lambda t) + e_2 \sin(\lambda t)$, where $e_1, e_2$ are two uncorrelated random variables with mean zero, unit variance and $\lambda \in [0, 2\pi]$. Clearly, the mean of $y_t$ is constant and $E|y_t|^2 < \infty$. Also $cov(y_t, y_{t+\tau}) = \cos(\lambda t)\cos(\lambda(t+\tau)) + \sin(\lambda t)\sin(\lambda(t+\tau)) = \cos(\lambda\tau)$. Hence $y_t$ is covariance stationary.
Exercise 1.6 Suppose $y_t = e_t$ if $t$ is odd and $y_t = e_t + 1$ if $t$ is even, where $e_t \sim$ iid $(0, 1)$. Show that $y_t$ is not covariance stationary. Show that $y_t = \bar{y} + y_{t-1} + e_t$, $e_t \sim$ iid $(0, \sigma^2_e)$ and $\bar{y}$ a constant, is not stationary but that $\Delta y_t = y_t - y_{t-1}$ is stationary.
When $\{y_t\}$ is stationary, its autocovariance function has three properties: (i) $ACF(0) \geq 0$; (ii) $|ACF(\tau)| \leq ACF(0)$; (iii) $ACF(\tau) = ACF(-\tau)$ for all $\tau$. Furthermore, if $y_{1t}$ and $y_{2t}$ are two stationary uncorrelated sequences, $y_{1t} + y_{2t}$ is stationary and the autocovariance function of $y_{1t} + y_{2t}$ is $ACF_{y_1}(\tau) + ACF_{y_2}(\tau)$.
Exercise 1.7 Suppose $y_{1t} = \bar{y} + at + e_t$, where $e_t \sim$ iid $(0, \sigma^2_e)$ and $\bar{y}, a$ are constants. Define $y_{2t} = \frac{1}{2J+1}\sum_{j=-J}^{J} y_{1t+j}$. Compute the mean and the autocovariance function of $y_{2t}$. Is the process stationary? Is it covariance stationary?
Definition 1.16 (Autocovariance generating function): The autocovariance generating function of a stationary $\{y_t(\omega)\}_{t=-\infty}^{\infty}$ process is $CGF(z) = \sum_{\tau=-\infty}^{\infty} ACF(\tau) z^\tau$, provided that the sum converges for all $z$ in some annulus $\varrho^{-1} < |z| < \varrho$, $\varrho > 1$.
Example 1.14 Consider the stochastic process $y_t = e_t - D e_{t-1} = (1 - D\ell)e_t$, $|D| < 1$, $e_t \sim$ iid $(0, \sigma^2_e)$. Here $cov(y_t, y_{t-j}) = cov(y_t, y_{t+j}) = 0$, $j \geq 2$; $cov(y_t, y_t) = (1 + D^2)\sigma^2_e$; $cov(y_t, y_{t-1}) = cov(y_t, y_{t+1}) = -D\sigma^2_e$. Then the CGF of $y_t$ is

$CGF_y(z) = -D\sigma^2_e z^{-1} + (1 + D^2)\sigma^2_e z^0 - D\sigma^2_e z^1 = \sigma^2_e(-Dz^{-1} + (1 + D^2) - Dz) = \sigma^2_e (1 - Dz)(1 - Dz^{-1}) \qquad (1.1)$
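Equation (1.1) can be checked numerically: the coefficients of the CGF are the autocovariances, so sample autocovariances of a simulated MA(1) should match them. A minimal sketch (ours; the values of $D$, $\sigma^2_e$ and the sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
D, sig2, T = 0.6, 1.0, 1_000_000
e = rng.normal(scale=np.sqrt(sig2), size=T + 1)
y = e[1:] - D * e[:-1]                      # y_t = e_t - D e_{t-1}

def acov(x, tau):                           # sample autocovariance at lag tau
    x = x - x.mean()
    return np.mean(x[tau:] * x[:len(x) - tau])

print("ACF(0):", acov(y, 0), " theory:", (1 + D**2) * sig2)
print("ACF(1):", acov(y, 1), " theory:", -D * sig2)
print("ACF(2):", acov(y, 2), " theory:", 0.0)
```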
The result of example 1.14 can be generalized to more complex processes. In fact, if $y_t = D(\ell)e_t$, $CGF_y(z) = D(z)\,\Sigma_e\, D(z^{-1})'$, and this holds for both univariate and multivariate $y_t$. One interesting special case occurs when $z = e^{-i\omega} = \cos(\omega) - i\sin(\omega)$, $\omega \in (0, 2\pi)$, in which case $S(\omega) = \frac{CGF_y(e^{-i\omega})}{2\pi} = \frac{1}{2\pi}\sum_{\tau=-\infty}^{\infty} ACF(\tau) e^{-i\omega\tau}$ is the spectral density of $y_t$.
Exercise 1.8 (i) Let $y_t = (1 + 0.5\ell + 0.8\ell^2)e_t$, $e_t \sim$ iid $(0, \sigma^2_e)$. Is $y_t$ covariance stationary? If so, show the autocovariance and the autocovariance generating functions.
(ii) Let $(1 - 0.25\ell)y_t = e_t$, where $e_t \sim$ iid $(0, \sigma^2_e)$. Is $y_t$ covariance stationary? If so, show the autocovariance and the autocovariance generating functions.
Exercise 1.9 Let $\{y_{1t}(\omega)\}$ be a stationary process and let $h$ be an $n \times 1$ vector of continuous functions. Show that $y_{2t} = h(y_{1t})$ is also stationary.

Stationarity is a weaker requirement than iid, as the next example shows, but it is stronger than the identically (not necessarily independently) distributed assumption.

Example 1.15 Let $y_t \sim$ iid $(0,1)$ $\forall t$. Since $y_t \sim$ iid $(0,1)$, any finite subsequence $y_{t_1+\tau}, \ldots, y_{t_j+\tau}$ will have the same distribution for any $\tau$ and therefore $y_t$ is stationary. It is easy to build examples where a stationary series is not iid. Take, for instance, $y_t = e_t - D e_{t-1}$. If $|D| < 1$, $y_t$ is stationary but it is not iid.

Exercise 1.10 Give an example of a process $y_t$ which is identically (but not necessarily independently) distributed and which is nonstationary.
Definition 1.17 (Ergodicity 1): A stationary $\{y_t(\omega)\}$ process is ergodic if and only if, for any set of paths $X = \{y_t(\omega) : y_t(\omega) \leq \varrho, \varrho \in R\}$ with $P(\ell^\tau X) = P(X)$ $\forall \tau$, either $P(X) = 0$ or $P(X) = 1$.
Example 1.16 Consider a path on a unit circle (see figure 1.3). Let $X = (y_0, \ldots, y_t)$. Let $P(X)$ be the Lebesgue measure on the circle (i.e., the length of the interval $[y_0, y_t]$). Let $\ell^\tau X = \{y_{0-\tau}, \ldots, y_{t-\tau}\}$ displace $X$ by half a circle. Since $P(\ell^\tau X) = P(X)$, $y_t$ is stationary. However, $P(\ell^\tau X) \neq 1$ or 0, so $y_t$ is not ergodic.

Figure 1.3: Non-ergodicity

A weaker definition of ergodicity is the following:
Definition 1.18 (Ergodicity 2): A (weakly) stationary stochastic process $\{y_t\}$ is ergodic if and only if $\frac{1}{T}\sum_{t=1}^T y_t \xrightarrow{a.s.} E[y_t(\omega)]$, where the expectation is taken with respect to $\omega$.
An important corollary of definition 1.18 is the following:

Result 1.6: For any stationary $\{y_{1t}\}$ and any continuous function $h$ such that $E[|h(y_{1t})|] < \infty$, $\frac{1}{T}\sum_t h(y_{1t}) \xrightarrow{a.s.} y_2$, where $y_2$ is a random variable defined by $E(y_2) = E[h(y_{1t})]$.

Definition 1.17 is stronger than definition 1.18 because it refers to the probability of paths (the latter concerns only their first moment). Note also that the definition of ergodicity applies only to stationary processes. Intuitively, if a process is stationary its path converges to some limit. If it is stationary and ergodic, all paths (indexed by $\omega$) will converge to the same limit. Hence, one path is sufficient to infer the moments of its distribution.
Example 1.17 Let $y_t = e_1 + e_{2t}$, $t = 0, 1, 2, \ldots$, where $e_{2t} \sim$ iid $(0, 1)$ and $e_1$ has mean 1 and variance 1. Clearly $y_t$ is stationary and $E(y_t) = 1$. Note that $\frac{1}{T}\sum_t y_t = e_1 + \frac{1}{T}\sum_t e_{2t}$ and $\lim_{T\to\infty} \frac{1}{T}\sum_t y_t = e_1 + \lim_{T\to\infty} \frac{1}{T}\sum_t e_{2t} = e_1$, because $\frac{1}{T}\sum_t e_{2t} \xrightarrow{a.s.} 0$. Since the time average of $y_t$ (equal to $e_1$) is different from the population average of $y_t$ (equal to 1), $y_t$ is not ergodic.

What is wrong with example 1.17? Intuitively, $y_t$ is not ergodic because there is too much memory in the observations ($e_1$ appears in $y_t$ for every $t$). Hence, roughly speaking, ergodicity holds when the process forgets its past reasonably fast.
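The failure in example 1.17 is easy to see by simulation (our illustration; the number of paths, sample size and seed are arbitrary): the time average settles on a different constant for each path, namely the drawn $e_1$, rather than on the population mean of 1.

```python
import numpy as np

# Example 1.17: y_t = e_1 + e_2t, with e_1 drawn once per path (mean 1, var 1).
rng = np.random.default_rng(5)
n_paths, T = 5, 100_000
e1 = 1.0 + rng.standard_normal(n_paths)        # one permanent draw per path
e2 = rng.standard_normal((n_paths, T))
time_avg = (e1[:, None] + e2).mean(axis=1)
print("time averages:", np.round(time_avg, 3))  # ~ e1, path specific
print("drawn e_1:    ", np.round(e1, 3))        # population mean is 1
```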
Example 1.18 Consider the process $y_t = e_t - 2e_{t-1}$, where $e_t \sim$ iid $(0, \sigma^2_e)$. It is easy to verify that $E(y_t) = 0$, that $var(y_t) = 5\sigma^2_e < \infty$ and that $cov(y_t, y_{t-\tau})$ does not depend on $t$. Therefore the process is covariance stationary. To verify that it is ergodic consider the sample mean $\frac{1}{T}\sum_t y_t$, which is easily shown to converge to 0 as $T \to \infty$. Note that the sample variance of $y_t$, $\frac{1}{T}\sum_t y_t^2 = \frac{1}{T}\sum_t (e_t - 2e_{t-1})^2$, behaves like $\frac{5}{T}\sum_t e_t^2$ and converges to $var(y_t)$ as $T \to \infty$.
Exercise 1.11 Consider the process $y_t = 0.6 y_{t-1} + 0.2 y_{t-2} + e_t$, where $e_t \sim$ iid $(0, 1)$. Is $y_t$ stationary? Is it ergodic? Find the effect of a unitary change in $e_t$ at time $t$ on $y_{t+3}$. Repeat the exercise for $y_t = 0.4 y_{t-1} + 0.8 y_{t-2} + e_t$.
Exercise 1.12 Consider the following bivariate process:

$y_{1t} = 0.3 y_{1t-1} + 0.8 y_{2t-1} + e_{1t}$
$y_{2t} = 0.3 y_{1t-1} + 0.4 y_{2t-1} + e_{2t}$

where $E(e_{1t} e_{1\tau}) = 1$ for $\tau = t$ and 0 otherwise, $E(e_{2t} e_{2\tau}) = 2$ for $\tau = t$ and 0 otherwise, and $E(e_{1t} e_{2\tau}) = 0$ for all $\tau, t$. Is the system covariance stationary? Is it ergodic? Calculate $\frac{\partial y_{1t+\tau}}{\partial e_{2t}}$ for $\tau = 2, 3$. What is the limit of this derivative as $\tau \to \infty$?
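Questions like those in exercises 1.11-1.12 can be checked mechanically: write the system in first-order (companion) form, inspect the eigenvalues of the coefficient matrix, and read the impulse responses off its powers. A sketch for exercise 1.12 (ours, not a full solution):

```python
import numpy as np

# Coefficient matrix of the bivariate VAR(1) in exercise 1.12.
A = np.array([[0.3, 0.8],
              [0.3, 0.4]])
print("eigenvalues:", np.linalg.eigvals(A))      # all inside the unit circle?

# Response of y_{1,t+tau} to a unit e_{2t} impulse: element (1,2) of A^tau.
for tau in (2, 3, 10, 50):
    print(tau, np.linalg.matrix_power(A, tau)[0, 1])
```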
Exercise 1.13 Suppose that at time 0, $\{y_t\}_{t=1}^{\infty}$ is given by

$y_t = \begin{cases} 1 & \text{with probability } 1/2 \\ 0 & \text{with probability } 1/2 \end{cases}$

Show that $y_t$ is stationary but not ergodic. Show that a single path (i.e. a path composed of only 1's or of only 0's) is ergodic.
Exercise 1.14 Let $y_t$ be defined by

$y_t = \begin{cases} (-1)^t & \text{with probability } 1/2 \\ (-1)^{t+1} & \text{with probability } 1/2 \end{cases}$

Calculate the correlation between $y_t$ and $y_{t+\tau}$ and show that it is constant as $\tau \to \infty$. Show that the process is stationary and ergodic. Show that single paths are not ergodic.
Exercise 1.15 Let $y_t = \cos(\pi/2 \cdot t) + e_t$, where $e_t \sim$ iid $(0, \sigma^2_e)$. Show that $y_t$ is neither stationary nor ergodic. Show that $\{y_t, y_{t+4}, y_{t+8}, \ldots, t = 1, 2, \ldots\}$ is stationary and ergodic.

Exercise 1.15 shows an important result: if a process is non-ergodic, it may be possible to find a subsequence which is ergodic.
Exercise 1.16 Show that if $\{y_{1t}(\omega)\}$ is ergodic, then the process defined by $y_{2t} = h(y_{1t})$ is ergodic for any continuous $h$.
A concept related to ergodicity is that of mixing. For this purpose, let $\mathcal{B}^{j}_{-\infty}$ be the Borel algebra generated by values of $y_t$ from the infinite past up to $t = j$ and $\mathcal{B}^{\infty}_{j+i}$ be the Borel algebra generated by values of $y_t$ from $t = j+i$ to infinity. Intuitively, $\mathcal{B}^{j}_{-\infty}$ contains information about the past up to $t = j$ and $\mathcal{B}^{\infty}_{j+i}$ information from $t = j+i$ on.
Definition 1.19 (Dependence of events): Let $\mathcal{B}_1$ and $\mathcal{B}_2$ be two Borel algebras and let $B_1 \in \mathcal{B}_1$ and $B_2 \in \mathcal{B}_2$ be two events. Then $\phi$-mixing and $\alpha$-mixing are defined as follows:

$\phi(\mathcal{B}_1, \mathcal{B}_2) \equiv \sup_{\{B_1 \in \mathcal{B}_1,\, B_2 \in \mathcal{B}_2 :\, P(B_1) > 0\}} |P(B_2 \mid B_1) - P(B_2)|$

$\alpha(\mathcal{B}_1, \mathcal{B}_2) \equiv \sup_{\{B_1 \in \mathcal{B}_1,\, B_2 \in \mathcal{B}_2\}} |P(B_2 \cap B_1) - P(B_2)P(B_1)|$

$\phi$-mixing and $\alpha$-mixing measure how much the probability of the joint occurrence of two events differs from the product of the probabilities of each event occurring. We say that events in $\mathcal{B}_2$ and $\mathcal{B}_1$ are independent if both $\phi$ and $\alpha$ are zero. The function $\phi$ provides a measure of relative dependence while $\alpha$ measures absolute dependence.
For a stochastic process, $\phi$-mixing and $\alpha$-mixing are defined as follows:

Definition 1.20 (Mixing): For a sequence $\{y_t(\omega)\}$, the mixing coefficients $\phi$ and $\alpha$ are defined as: $\phi(i) = \sup_j \phi(\mathcal{B}^{j}_{-\infty}, \mathcal{B}^{\infty}_{j+i})$ and $\alpha(i) = \sup_j \alpha(\mathcal{B}^{j}_{-\infty}, \mathcal{B}^{\infty}_{j+i})$.

$\phi(i)$ and $\alpha(i)$, called respectively uniform and strong mixing, measure how much dependence there is between elements of $\{y_t\}$ separated by $i$ periods. If $\phi(i) = \alpha(i) = 0$, $y_t$ and $y_{t+i}$ are independent. If $\phi(i), \alpha(i) \to 0$ as $i \to \infty$, they are asymptotically independent. Note that because $\alpha(i) \leq \phi(i)$, $\phi$-mixing implies $\alpha$-mixing.
Example 1.19 Let $y_t$ be such that $y_t$ and $y_{t-\tau}$ are independent for $\tau \geq \bar\tau$. Then $\alpha(i) = \phi(i) = 0$ for $i \geq \bar\tau$. Let $y_t = A y_{t-1} + e_t$, where $|A| < 1$, $e_t \sim$ iid $(0, \sigma^2_e)$. Then $\alpha(i) \to 0$ as $i \to \infty$.

Exercise 1.17 Show that if $y_t = A y_{t-1} + e_t$, $|A| < 1$, $e_t \sim$ iid $(0, \sigma^2_e)$, $\phi(i)$ does not go to zero as $i \to \infty$.

Mixing is a stronger memory requirement than ergodicity, as the following result shows:

Result 1.7: Let $y_t$ be stationary. If $\alpha(i) \to 0$ as $i \to \infty$, $y_t$ is ergodic (Rosenblatt (1978)).

Exercise 1.18 Use result 1.7 and the fact that $\alpha(i) \leq \phi(i)$ to show that if $\phi(i) \to 0$ as $i \to \infty$, a $\phi$-mixing process is ergodic.
A concept which is somewhat related to those of ergodicity and mixing, and which is useful when $y_t$ is a heteroskedastic process, is the following:

Definition 1.21 (Asymptotic Uncorrelatedness): $y_t(\omega)$ has asymptotically uncorrelated elements if there exist constants $0 \leq \varrho_\tau \leq 1$, $\tau \geq 0$, such that $\sum_{\tau=0}^{\infty} \varrho_\tau < \infty$ and $cov(y_t, y_{t-\tau}) \leq \varrho_\tau \sqrt{var(y_t)\, var(y_{t-\tau})}$, where $var(y_t) < \infty$ for all $t$.

Note two important features of definition 1.21. First, only $\tau > 0$ matters. Second, when $var(y_t)$ is constant and the covariance of $y_t$ with $y_{t-\tau}$ depends only on $\tau$, asymptotic uncorrelatedness is the same as covariance stationarity.
Exercise 1.19 Show that to have $\sum_\tau \varrho_\tau < \infty$ it is necessary that $\varrho_\tau \to 0$ as $\tau \to \infty$ and it is sufficient that $\varrho_\tau < \tau^{-1-b}$ for some $b > 0$ and all $\tau$ large.

Exercise 1.20 Suppose that $y_t$ is such that the correlation between $y_t$ and $y_{t-\tau}$ goes to zero as $\tau \to \infty$. Is this sufficient to ensure that $y_t$ is ergodic?
Exercise 1.21 Let $y_t = A y_{t-1} + e_t$, $|A| < 1$, $e_t \sim$ iid $(0, \sigma^2_e)$. Show that $y_t$ is asymptotically uncorrelated.
The next two concepts will be extensively used in the next chapters.

Definition 1.22 (Martingale): $\{y_t\}$ is a martingale with respect to the information set $\mathcal{F}_t$ if $y_t \in \mathcal{F}_t$ $\forall t > 0$ and $E_t[y_{t+\tau}] \equiv E[y_{t+\tau}|\mathcal{F}_t] = y_t$ for all $t, \tau$.

Definition 1.23 (Martingale difference): $\{y_t\}$ is a martingale difference with respect to the information set $\mathcal{F}_t$ if $y_t \in \mathcal{F}_t$ $\forall t > 0$ and $E_t[y_{t+\tau}] \equiv E[y_{t+\tau}|\mathcal{F}_t] = 0$ for all $t, \tau$.
Example 1.20 Let $y_t$ be iid with $E(y_t) = 0$, let $\mathcal{F}_t = \{\ldots, y_{t-1}, y_t\}$ and let $\mathcal{F}_{t-1} \subseteq \mathcal{F}_t$. Then $y_t$ is a martingale difference sequence.

Martingale difference is a much weaker requirement than stationarity and ergodicity since it only involves restrictions on the first conditional moment. It is therefore easy to build examples of processes which are martingale differences but are not stationary.

Example 1.21 Suppose that $y_t$ is independently distributed over time with mean zero and variance $\sigma^2_t$. Then $y_t$ is a martingale difference but it is not stationary.
Exercise 1.22 Let $y_{1t}$ be a stochastic process and let $y_{2t} = E[y_{1t}|\mathcal{F}_t]$ be its conditional expectation. Show that $y_{2t}$ is a martingale process.
We conclude this section suggesting an alternative way to look at stationarity. Let $y_t(\omega)$ be a stochastic process with $E|y_t| < \infty$.

Definition 1.24: $\{y_t\}$ is an adapted process if $y_t$ is $\mathcal{F}_t$-measurable $\forall t$ (i.e. the set $X = [y_t \leq \varrho] \in \mathcal{F}_t$), and if $\mathcal{F}_{t-1} \subseteq \mathcal{F}_t$.

Intuitively, measurability implies that $y_t$ is observable at each $t$ and that some news may be revealed at each $t$.

Definition 1.25: A mapping $T : R \to R$ is measurable if $T^{-1}(X) \in \mathcal{F}$ for every event $X \in \mathcal{F}$, and measure preserving if $P(T^{-1}X) = P(X)$ for all sets of paths $X$.

Definition 1.25 insures that sets that are not events cannot be transformed into events via the mapping $T$.
Example 1.22 Let $T$ be the mapping shifting the $y_t$ sequence forward by one time period, i.e. $T y_t(\omega) \equiv y_t(T\omega) = y_{t+1}(\omega)$ $\forall t$. Then $T$ is measurable since both $T y_t(\omega)$ and $T^{-1} y_t(\omega)$ describe events which can be generated from $\mathcal{F}_t$. The mapping will be measure preserving if the probability of drawing a sequence $\{y_1, \ldots, y_{t_j}\}$ is independent of time.

Exercise 1.23 Let $y_t(\omega)$ be a measurable function. Show that $\frac{1}{T}\sum_{t=1}^T y_t$ is measurable.
The measure preserving condition implies that, for any elements $y_{t_1}$ and $y_{t_2}$ of the $y_t$ sequence, $P[y_{t_1} \leq \varrho] = P[y_{t_2} \leq \varrho]$ for any $\varrho \in R$, which is another way of stating the requirement that the sequence $y_t$ is strictly stationary (see definition 1.14).

Exercise 1.24 Let $y$ be a random variable and $T$ be a measure preserving transformation. Construct a stochastic process using $y_1(\omega) = y(\omega)$, $y_2(\omega) = y(T\omega)$, ..., $y_j(\omega) = y(T^{j-1}\omega)$, $\omega \in K$. Show that $y_t$ is stationary.
Using the identity $y_t = y_t - E(y_t|\mathcal{F}_{t-1}) + E(y_t|\mathcal{F}_{t-1}) - E(y_t|\mathcal{F}_{t-2}) + E(y_t|\mathcal{F}_{t-2}) - \ldots$, we can write $y_t = \sum_{j=0}^{\tau-1} Rev_{t-j}(t) + E(y_t|\mathcal{F}_{t-\tau})$ for $\tau = 1, 2, \ldots$, where $Rev_{t-j}(t) \equiv E[y_t|\mathcal{F}_{t-j}] - E[y_t|\mathcal{F}_{t-j-1}]$ is the revision in forecasting $y_t$ made with the new information accrued from $t-j-1$ to $t-j$. $Rev_{t-j}(t)$ plays an important role in deriving the properties of functions of stationary processes.

Exercise 1.25 Show that $Rev_{t-j}(t)$ is a martingale difference sequence.
1.4 Law of Large Numbers
Laws of large numbers provide conditions to insure that quantities like $\frac{1}{T}\sum_t x_t' x_t$ or $\frac{1}{T}\sum_t z_t' x_t$, which appear in the formulas for linear estimators like OLS or IV, stochastically converge to well-defined limits. Since different conditions apply to different kinds of economic data, we will study situations which are typically encountered in macro-time series contexts. Given the results of section 1.2, we will describe only strong laws of large numbers, since weak laws of large numbers hold as a consequence.

Laws of large numbers typically come in the following form: given restrictions on the dependence and the heterogeneity of the observations and/or some moment restrictions, $\frac{1}{T}\sum_t y_t - E(y_t) \xrightarrow{a.s.} 0$. We will consider three situations: (i) dependent and identically distributed observations; (ii) dependent and heterogeneously distributed observations; (iii) martingale difference observations. To better understand the applicability of each of the setups note, first, that in all cases observations are serially correlated. In the first case we restrict the distribution of the observations to be the same for every $t$; in the second case we allow some carefully selected form of heterogeneity (for example, structural breaks in the mean or in the variance, or conditional heteroskedasticity); in the third case we do not restrict the distribution of the process, but impose conditions on its moments.
1.4.1 Dependent and Identically Distributed Observations

To state a law of large numbers (LLN) for stationary sequences we need conditions on the memory of the sequence. Typically, one assumes ergodicity, since this implies average asymptotic independence of the elements of the $\{y_t\}$ sequence.

A LLN for stationary and ergodic processes is as follows: Let $\{y_t(\omega)\}$ be stationary and ergodic with $E|y_t| < \infty$ $\forall t$. Then $\frac{1}{T}\sum_t y_t \xrightarrow{a.s.} E(y_t)$ (see Stout, 1974, p. 181).

To apply this result to econometric estimators recall that, for any $\mathcal{F}_t$-measurable function $h$ producing $y_{2t} = h(y_{1t})$, $y_{2t}$ is stationary and ergodic if $y_{1t}$ is stationary and ergodic.
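To see this LLN at work on a serially correlated but stationary and ergodic process, the sketch below (ours; the AR coefficient, sample size and seed are arbitrary) tracks the sample mean of an AR(1) as the sample grows.

```python
import numpy as np

# Sample mean of a stationary AR(1): y_t = 0.8 y_{t-1} + e_t, with E(y_t) = 0.
rng = np.random.default_rng(7)
T = 10**6
e = rng.standard_normal(T)
y = np.empty(T)
y[0] = e[0]
for t in range(1, T):
    y[t] = 0.8 * y[t - 1] + e[t]
running_mean = np.cumsum(y) / np.arange(1, T + 1)
for T0 in (10**2, 10**4, 10**6):
    print(f"T={T0:>8d}: sample mean = {running_mean[T0 - 1]: .4f}")  # -> 0
```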
Exercise 1.26 (Strong consistency of OLS and IV estimators): Let $y_t = x_t \alpha_0 + e_t$, let $x = [x_1, \ldots, x_T]'$, $z = [z_1, \ldots, z_T]'$, $e = [e_1, \ldots, e_T]'$ and assume:
(i) $\frac{x'e}{T} \xrightarrow{a.s.} 0$; (i') $\frac{z'e}{T} \xrightarrow{a.s.} 0$;
(ii) $\frac{x'x}{T} \xrightarrow{a.s.} \Sigma_{xx}$, $\Sigma_{xx}$ finite, $|\Sigma_{xx}| \neq 0$; (ii') $\frac{z'x}{T} \xrightarrow{a.s.} \Sigma_{zx}$, $\Sigma_{zx}$ finite, $|\Sigma_{zx}| \neq 0$;
(ii'') $\frac{z'x}{T} - \Sigma_{zx,T} \xrightarrow{a.s.} 0$, where $\Sigma_{zx,T}$ is an $O(1)$ random matrix which depends on $T$ and has uniformly full column rank.
Show that $\alpha_{OLS} = (x'x)^{-1}(x'y)$ and $\alpha_{IV} = (z'x)^{-1}(z'y)$ exist almost surely for $T$ large, that $\alpha_{OLS} \xrightarrow{a.s.} \alpha_0$ under (i)-(ii) and that $\alpha_{IV} \xrightarrow{a.s.} \alpha_0$ under (i')-(ii'). Show that under (i')-(ii'') $\alpha_{IV}$ exists almost surely for $T$ large and $\alpha_{IV} \xrightarrow{a.s.} \alpha_0$. (Hint: If $A_T$ is a sequence of $k_1 \times k$ matrices, then $A_T$ has uniformly full column rank if there exists a sequence of $k \times k$ submatrices $\Delta_T$ which is uniformly nonsingular.)
1.4.2 Dependent and Heterogeneously Distributed Observations
To derive a LLN for dependent and heterogeneously distributed processes we drop ergodicity and substitute it with a mixing requirement. In addition, we need to define the size of the mixing conditions:

Definition 1.26: Let $1 \leq a \leq \infty$. Then $\phi(i) = O(i^{-b})$ for $b > a/(2a-1)$ implies that $\phi(i)$ is of size $a/(2a-1)$. If $a > 1$ and $\alpha(i) = O(i^{-b})$ for $b > a/(a-1)$, $\alpha(i)$ is of size $a/(a-1)$.
Definition 1.26 allows precise statements on the memory of the process. Roughly speaking, the memory depends on the moments of the process as measured by $a$. Note that as $a \to \infty$ the dependence increases, while as $a \to 1$ the sequence exhibits less and less dependence.

The LLN for dependent and heterogeneously distributed processes is the following: Let $\{y_t(\omega)\}$ be a sequence with $\phi(i)$ of size $a/(2a-1)$ or $\alpha(i)$ of size $a/(a-1)$, $a > 1$, and $E(y_t) < \infty$ $\forall t$. If for some $0 < b \leq a$, $\sum_{t=1}^{\infty} \left(\frac{E|y_t - E(y_t)|^{a+b}}{t^{a+b}}\right)^{1/a} < \infty$, then $\frac{1}{T}\sum_t y_t - E(y_t) \xrightarrow{a.s.} 0$ (for a proof see McLeish, 1975, theorem 2.10).

The elements of $y_t$ are allowed to have distributions that vary over time (e.g. $E(y_t)$ may depend on $t$), but the summability condition above implies restrictions on the moments of the process. Note that for $a = 1$ and $b = 1$ we have a version of Kolmogorov's law of large numbers. The moment condition can be weakened somewhat if we are willing to impose a bound on the $(a+b)$-th moment.
Corollary 1.1: Let $\{y_t(\omega)\}$ be a sequence with $\phi(i)$ of size $a/(2a-1)$ or $\alpha(i)$ of size $a/(a-1)$, $a > 1$, such that $E|y_t|^{a+b}$ is bounded for all $t$. Then $\frac{1}{T}\sum_t y_t - E(y_t) \xrightarrow{a.s.} 0$.
The next result mirrors the one obtained in the case of stationary ergodic processes.

Result 1.8: Let $h$ be $\mathcal{F}_t$-measurable and $y_{2t} = h(y_{1t}, \ldots, y_{1t+\tau})$, $\tau$ finite. If $y_{1t}$ is mixing such that $\phi(i)$ ($\alpha(i)$) is $O(i^{-b})$ for some $b$, then $y_{2t}$ is mixing such that $\phi(i)$ ($\alpha(i)$) is $O(i^{-b})$.

From this last result it immediately follows that if $\{z_t, x_t, e_t\}$ is a vector of mixing sequences, then $\{x_t' x_t\}$, $\{x_t' e_t\}$, $\{z_t' x_t\}$, $\{z_t' e_t\}$ are mixing sequences of the same size.
Mixing conditions are hard to verify in practice. A useful result when observations are heterogeneous is the following:

Result 1.9: Let $\{y_t(\omega)\}$ be such that $\sum_{t=1}^{\infty} E|y_t| < \infty$. Then $\sum_{t=1}^{\infty} y_t$ converges almost surely and $E(\sum_{t=1}^{\infty} y_t) = \sum_{t=1}^{\infty} E(y_t) < \infty$ (see White, 1984, p. 48).
A LLN for processes which are asymptotically uncorrelated is the following: Let $\{y_t(\omega)\}$ be a process with asymptotically uncorrelated elements, mean $E(y_t)$ and variance $\sigma^2_t < \Delta < \infty$. Then $\frac{1}{T}\sum_t y_t - E(y_t) \xrightarrow{a.s.} 0$.

Compared with corollary 1.1, we have relaxed the dependence restriction from mixing to asymptotic uncorrelation at the cost of altering the restriction on moments of order $a+b$ ($a \geq 1$, $0 < b \leq a$) to second moments. Note that since functions of asymptotically uncorrelated sequences are not necessarily asymptotically uncorrelated, to prove consistency of econometric estimators when the regressors $x_t$ have asymptotically uncorrelated increments we need to make assumptions on quantities like $\{x_t' x_t\}$, $\{x_t' e_t\}$, etc. directly.
1.4.3 Martingale Difference Process

The LLN for martingale difference processes is the following: Let $\{y_t(\omega)\}$ be a martingale difference process. If for some $a \geq 1$, $\sum_{t=1}^{\infty} \frac{E|y_t|^{2a}}{t^{1+a}} < \infty$, then $\frac{1}{T}\sum_t y_t \xrightarrow{a.s.} 0$.
The martingale LLN therefore requires restrictions on the behavior of moments which are slightly stronger than those assumed in the case of independent $y_t$. The analog of corollary 1.1 for martingale differences is the following:

Corollary 1.2: Let $\{y_t(\omega)\}$ be a martingale difference such that $E|y_t|^{2a} < \Delta < \infty$ for some $a \geq 1$ and all $t$. Then $\frac{1}{T}\sum_t y_t \xrightarrow{a.s.} 0$.
Exercise 1.27 Suppose $\{y_{1t}(\omega)\}$ is a martingale difference. Show that $y_{2t} = y_{1t} z_t$ is a martingale difference for any $z_t \in \mathcal{F}_t$.
Exercise 1.28 Let $y_t = x_t \alpha_0 + e_t$ and assume: (i) $e_t$ is a martingale difference; (ii) $E(x_t' x_t)$ is positive and finite. Show that $\alpha_{OLS}$ exists and $\alpha_{OLS} \xrightarrow{a.s.} \alpha_0$.
1.5 Central Limit Theorems
There are also several central limit theorems (CLTs) available in the literature. Clearly, their applicability depends on the type of data a researcher has available. In this section we list CLTs for the three cases we described in the previous section. Loeve (1977) or White (1984) provide theorems for other relevant cases.
1.5.1 Dependent and Identically Distributed Observations
A central limit theorem for dependent and identically distributed observations can be obtained if the condition $E(y_t|\mathcal{F}_{t-\tau}) \to 0$ as $\tau \to \infty$ (referred to as linear regularity in chapter 4) and some restrictions on the variance of the process are imposed. Alternatively, we could use $E[y_t|\mathcal{F}_{t-\tau}] \xrightarrow{q.m.} 0$ as $\tau \to \infty$ and restrictions on the variance of the process. Clearly, the second condition is stronger than the first one.

Proposition 1.7: Let $\{y_t(\omega)\}$ be an adapted process and suppose $E[y_t|\mathcal{F}_{t-\tau}] \xrightarrow{q.m.} 0$ as $\tau \to \infty$. Then $E(y_t) = 0$.
It is easy to show that proposition 1.7 holds. In fact, $E[y_t|\mathcal{F}_{t-\tau}] \xrightarrow{q.m.} 0$ as $\tau \to \infty$ implies $E[E(y_t|\mathcal{F}_{t-\tau})] \to 0$ as $\tau \to \infty$. Hence, for every $\epsilon > 0$, there exists a $T(\epsilon)$ such that $0 \leq E[E(y_t|\mathcal{F}_{t-\tau})] < \epsilon$, $\forall \tau > T(\epsilon)$. By Jensen's inequality, $|E[E(y_t|\mathcal{F}_{t-\tau})]| \leq E(|E(y_t|\mathcal{F}_{t-\tau})|)$ implies $0 \leq |E[E(y_t|\mathcal{F}_{t-\tau})]| < \epsilon$, for all $\tau > T(\epsilon)$. By the law of iterated expectations $E(y_t) = E[E(y_t|\mathcal{F}_{t-\tau})]$, so $0 \leq E(y_t) < \epsilon$. Since $\epsilon$ is arbitrary, $E(y_t) = 0$.
Exercise 1.29 Let $var(y_t) = \sigma^2_y < \infty$. Show that $cov(Rev_{t-j}(t), Rev_{t-j'}(t)) = 0$ for $j < j'$, where $Rev_{t-j}(t)$ was defined right before exercise 1.25. Note that this implies that $var(y_t) = var(\sum_{j=0}^{\infty} Rev_{t-j}(t)) = \sum_{j=0}^{\infty} var(Rev_{t-j}(t))$.

Exercise 1.30 Let $\sigma^2_T = T\, E[(T^{-1}\sum_{t=1}^T y_t)^2]$. Show that $\sigma^2_T = \sigma^2 + 2\sigma^2 \sum_{\tau=1}^{T-1} \varrho_\tau (1 - \tau/T)$, where $\varrho_\tau = E(y_t y_{t-\tau})/\sigma^2$. Give conditions on $y_t$ that make $\varrho_\tau$ independent of $t$. Show that $\sigma^2_T$ can grow without bound as $T \to \infty$.
Exercises 1.29-1.30 show that when $y_t$ is a dependent and identically distributed process, the variance of $y_t$ is the sum of the variances of the forecast revisions made at each $t$, and that without further restrictions there is no guarantee that $\sigma^2_T$ will converge to a finite limit. A sufficient condition insuring convergence is that $\sum_{j=0}^{\infty} (var\, Rev_{t-j}(t))^{1/2} < \infty$.

The CLT is then as follows: Let $\{y_t(\omega)\}$ be an adapted sequence such that (i) $\{y_t(\omega)\}$ is stationary and ergodic; (ii) $E(y_t^2) = \sigma^2_y < \infty$; (iii) $E(y_t|\mathcal{F}_{t-\tau}) \xrightarrow{q.m.} 0$ as $\tau \to \infty$; (iv) $\sum_{j=0}^{\infty} (var\, Rev_{t-j}(t))^{1/2} < \infty$. Then, as $T \to \infty$, $\sigma^2_T \to \sigma^2 < \infty$ and $\frac{\sqrt{T}\,(\frac{1}{T}\sum_t y_t)}{\sigma_T} \xrightarrow{D} N(0, 1)$ (for a proof see Gordin, 1969).
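The role of the scaling factor can be illustrated by simulation: for serially correlated data, $\sqrt{T}$ times the sample mean has the long-run variance, not $var(y_t)$, as its limiting variance. A sketch (ours; the AR(1) design and all constants are our choices, and the long-run variance formula $\sigma^2_e/(1-A)^2$ is specific to this AR(1)):

```python
import numpy as np

# sqrt(T) * sample mean of y_t = A y_{t-1} + e_t, e_t ~ N(0,1), across replications.
rng = np.random.default_rng(8)
A, T, n_rep = 0.5, 2000, 2000
e = rng.standard_normal((n_rep, T))
y = np.empty((n_rep, T))
y[:, 0] = e[:, 0]
for t in range(1, T):                       # vectorized over replications
    y[:, t] = A * y[:, t - 1] + e[:, t]
stats = np.sqrt(T) * y.mean(axis=1)
sigma2 = 1.0 / (1.0 - A) ** 2               # long-run variance for this AR(1)
print("var of sqrt(T)*mean:", stats.var(), " theory:", sigma2)
```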
Exercise 1.31 Assume that (i) $E[x_{t-j,i}\, e_{t-j} | \mathcal{F}_{t-j-1}] = 0$ $\forall t$, $i = 1, \ldots$; $j = 1, \ldots$; (ii) $E[x_{t-j,i}\, e_{t-j}]^2 < \infty$; (iii) $\Sigma_T \equiv var(T^{-1/2} x'e) \to \Sigma \equiv var(x'e)$ as $T \to \infty$, which is nonsingular and positive definite; (iv) $\sum_j (var\, Rev_{t-j}(t))^{1/2} < \infty$; (v) $(x_t, e_t)$ are stationary ergodic sequences; (vi) $E|x_{t-j,i}|^2 < \infty$; (vii) $\Sigma_{xx} \equiv E(x_t' x_t)$ is positive definite. Show that $(\Sigma_{xx}^{-1} \Sigma\, \Sigma_{xx}^{-1})^{-1/2}\sqrt{T}(\alpha_{OLS} - \alpha_0) \xrightarrow{D} N(0, I)$, where $\alpha_{OLS}$ is the OLS estimator of $\alpha$ in the model $y_t = x_t \alpha_0 + e_t$ and $T$ is the number of observations.
1.5.2 Dependent Heterogeneously Distributed Observations
The CLT in this case is the following: Let $\{y_t(\omega)\}$ be a sequence of mixing random scalars such that either $\phi(i)$ or $\alpha(i)$ is of size $a/(a-1)$, $a > 1$, with $E(y_t) = 0$ and $E|y_t|^{2a} < \Delta < \infty$ $\forall t$. Define $\bar{y}_{b,T} = \frac{1}{\sqrt{T}}\sum_{t=b+1}^{b+T} y_t$ and assume there exists a $\sigma^2 < \infty$, $\sigma^2 \neq 0$, such that $E(\bar{y}^2_{b,T}) \to \sigma^2$ as $T \to \infty$, uniformly in $b$. Then $\frac{\sqrt{T}\,(\frac{1}{T}\sum_t y_t)}{\sigma_T} \xrightarrow{D} N(0, 1)$, where $\sigma^2_T \equiv E(\bar{y}^2_{0,T})$ (see White and Domowitz (1984)).

As in the previous CLT, we need the condition that the variance of $y_t$ is consistently estimated. Note also that we have substituted the stationarity-ergodicity assumption with the one of mixing and that we need uniform convergence of $E(\bar{y}^2_{b,T})$ in $b$. This is equivalent to imposing that $y_t$ is asymptotically covariance stationary (see White, 1984, p. 128).
1.5.3 Martingale Dierence Observations
The CLT in this case is as follows: Let $\{y_t(\omega)\}$ be a martingale difference process with $\sigma^2_t \equiv E(y_t^2) < \infty$, $\sigma^2_t \neq 0$, $\mathcal{F}_{t-1} \subseteq \mathcal{F}_t$, $y_t \in \mathcal{F}_t$; let $D_t$ be the distribution function of $y_t$ and let $\bar\sigma^2_T = \frac{1}{T}\sum_{t=1}^T \sigma^2_t$. If for every $\epsilon > 0$, $\lim_{T\to\infty} \bar\sigma_T^{-2} \frac{1}{T}\sum_{t=1}^T \int_{y^2 > \epsilon T \bar\sigma^2_T} y^2\, dD_t(y) = 0$ and $(\frac{1}{T}\sum_{t=1}^T y_t^2)/\bar\sigma^2_T - 1 \xrightarrow{P} 0$, then $\frac{\sqrt{T}\,(\frac{1}{T}\sum_t y_t)}{\bar\sigma_T} \xrightarrow{D} N(0, 1)$ (see McLeish, 1974).
The last condition is somewhat mysterious: it requires that the average contribution of the extreme tails of the distribution to the variance of $y_t$ is zero in the limit. If this condition holds, $y_t$ satisfies a uniform asymptotic negligibility condition. In other words, none of the elements of $\{y_t(\omega)\}$ can have a variance which dominates the variance of $\frac{1}{T}\sum_t y_t$.
Example 1.23 Suppose $\sigma^2_t = \tau^t$, $0 < \tau < 1$. Then $T\bar\sigma^2_T = \sum_{t=1}^T \sigma^2_t = \tau\sum_{t=0}^{T-1} \tau^t \to \frac{\tau}{1-\tau}$ as $T \to \infty$. In this case $\frac{\max_{1\leq t\leq T} \sigma^2_t}{T\bar\sigma^2_T} \to \tau / \frac{\tau}{1-\tau} = 1 - \tau \neq 0$, independent of $T$. Hence the asymptotic negligibility condition is violated. On the other hand, if $\sigma^2_t = \sigma^2$, $\bar\sigma^2_T = \sigma^2$. Then $\frac{\max_{1\leq t\leq T} \sigma^2_t}{T\bar\sigma^2_T} = \frac{1}{T}\frac{\sigma^2}{\sigma^2} \to 0$ as $T \to \infty$ and the asymptotic negligibility condition is satisfied.
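A direct numerical check of example 1.23 (ours; the value of $\tau$ is arbitrary):

```python
import numpy as np

tau = 0.9
for T in (10, 100, 1000, 10000):
    s2 = tau ** np.arange(1, T + 1)      # sigma_t^2 = tau**t
    # max_t sigma_t^2 over T * sigma_bar_T^2 stays bounded away from 0:
    print(T, s2.max() / s2.sum())        # -> 1 - tau, not 0
# In the homoskedastic case the same ratio is 1/T -> 0, so negligibility holds.
```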
The martingale difference assumption allows us to weaken several of the conditions needed to prove a central limit theorem relative to the case of stationary processes and will be the one used in several parts of this book.

A useful result, which we will use repeatedly in later chapters, concerns the asymptotic distribution of continuous functions of converging stochastic processes.

Result 1.10 (Brockwell and Davis): Suppose the $m \times 1$ vector $\{y_t(\omega)\}$ is asymptotically normally distributed with mean $\bar{y}$ and variance $a^2_t \Sigma_y$, where $\Sigma_y$ is a symmetric, non-negative definite matrix and $a_t \to 0$ as $t \to \infty$. Let $h(y) = (h_1(y), \ldots, h_n(y))'$ be such that each $h_j(y)$ is continuously differentiable in a neighborhood of $\bar{y}$, and let $\Sigma_h = \frac{\partial h(\bar{y})}{\partial y'} \Sigma_y \left(\frac{\partial h(\bar{y})}{\partial y'}\right)'$ have nonzero diagonal elements, where $\frac{\partial h(\bar{y})}{\partial y'}$ is an $n \times m$ matrix. Then $h(y_t) \xrightarrow{D} N(h(\bar{y}), a^2_t \Sigma_h)$.
1.6 Elements of Spectral Analysis
Definition 1.27 (Spectral density): The spectral density of a stationary $y_t(\omega)$ process at frequency $\omega_j \in [0, 2\pi]$ is $S_y(\omega_j) = \frac{1}{2\pi}\sum_{\tau=-\infty}^{\infty} ACF_y(\tau) \exp\{-i\omega_j\tau\}$.

As mentioned, the spectral density is a reparametrization of the covariance generating function, obtained setting $z = e^{-i\omega_j} = \cos(\omega_j) - i\sin(\omega_j)$, where $i = \sqrt{-1}$. Definition 1.27 also shows that the spectral density is the Fourier transform of the autocovariance function of $y_t$. Hence, the spectrum repackages the autocovariances of $\{y_t\}$ using sine and cosine functions as weights. Although the information present in the ACF and in the spectral density is the same, the latter is at times more useful since, for $\omega_j$ appropriately chosen, its elements are uncorrelated.
Example 1.24 Two elements of the spectral density are typically of interest: $S(\omega_j = 0)$ and $\sum_j S(\omega_j)$. It is easily verified that $S(\omega_j = 0) = \frac{1}{2\pi}\sum_\tau ACF(\tau) = \frac{1}{2\pi}(ACF(0) + 2\sum_{\tau=1}^{\infty} ACF(\tau))$; that is, the spectral density at frequency zero is the (unweighted) sum of all the elements of the autocovariance function. It is also easy to verify that $\sum_j S(\omega_j) = var(y_t)$; that is, the variance of the process is the area below the spectral density.

To understand how the spectral density transforms the autocovariance function select, for example, $\omega = \frac{\pi}{2}$. At this frequency $\cos(\omega\tau)$ equals $1, 0, -1, 0$ for $\tau = 0, 1, 2, 3$ and $\sin(\omega\tau)$ equals $0, 1, 0, -1$, and these values repeat themselves since the sine and cosine functions are periodic.
Exercise 1.32 Calculate $S(\omega = \pi)$. Which autocovariances enter at frequency $\pi$?

It is typical to evaluate the spectral density at the Fourier frequencies, i.e. at $\omega_j = \frac{2\pi j}{T}$, $j = 1, \ldots, T-1$, since for any two such frequencies $\omega_1 \neq \omega_2$, $S(\omega_1)$ is uncorrelated with $S(\omega_2)$. For a Fourier frequency $\omega_j$, the period of oscillation is $\frac{2\pi}{\omega_j} = \frac{T}{j}$.
Example 1.25 Suppose you have quarterly data. Then at the Fourier frequency $\frac{\pi}{2}$ the period is equal to 4. That is, at frequency $\frac{\pi}{2}$ you have fluctuations with an annual periodicity. Similarly, at the frequency $\pi$ the period is 2, so that at $\pi$ we find biannual cycles.
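The mapping between Fourier frequencies and periods used in example 1.25 (and needed for exercise 1.33 below) can be tabulated directly; a sketch (ours; the sample length is arbitrary):

```python
import numpy as np

# Period (in quarters) attached to each Fourier frequency w_j = 2*pi*j/T.
T = 160                                   # 40 years of quarterly data (arbitrary)
j = np.arange(1, T // 2 + 1)
omega = 2 * np.pi * j / T
period = 2 * np.pi / omega                # = T / j
bc = (period >= 8) & (period <= 32)       # business cycles: 2-8 years = 8-32 quarters
print("business-cycle frequencies:", omega[bc].round(3))
```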
Exercise 1.33 Business cycles are typically thought to occur with a periodicity of 2-8 years. Assuming that you have quarterly data, find the Fourier frequencies characterizing business cycle fluctuations. Repeat the exercise for annual and monthly data.

Figure 1.4: Short and long cycles (amplitude of cycles against time; left panel: short periodicity, right panel: long periodicity)
It is typical to associate low frequencies with cycles displaying long periods of oscillation, that is, when $y_t$ moves infrequently from a peak to a trough, and high frequencies with short periods of oscillation, that is, when $y_t$ moves frequently from a peak to a trough (see figure 1.4). Hence, trends (i.e. cycles with an infinite periodicity) are located at the low frequencies of the spectrum and irregular fluctuations at the high frequencies. Since the spectral density is periodic mod($2\pi$) and symmetric around $\omega_j = 0$, it is sufficient to examine $S(\omega)$ over the interval $[0, \pi]$.

Exercise 1.34 Show that $S(\omega_j) = S(-\omega_j)$.
Example 1.26 Suppose $\{y_t(\omega)\}$ is iid $(0, \sigma^2_y)$. Then $ACF_y(\tau) = \sigma^2_y$ for $\tau = 0$ and zero otherwise, so $S_y(\omega_j) = \frac{\sigma^2_y}{2\pi}$ for all $\omega_j$. That is, the spectral density of an iid process is constant for all $\omega_j \in [0, \pi]$.

Figure 1.5: Spectral density (spectral power against frequency)
Exercise 1.35 Consider a stationary AR(1) process $\{y_t(\omega)\}$ with autoregressive coefficient equal to $A$. Calculate the autocovariance function of the process. Show that the spectral density is increasing monotonically as $\omega_j \to 0$.

Exercise 1.36 Consider a stationary MA(1) stochastic process $\{y_t(\omega)\}$ with MA coefficient equal to $D$. Calculate the autocovariance function and the spectral density of $y_t$. Show its shape when $D > 0$ and when $D < 0$.

Economic time series have a typical bell-shaped spectral density (see figure 1.5), with a large portion of the variance concentrated in the lower part of the spectrum. Given the result of exercise 1.35, it is therefore reasonable to assume that most economic time series can be represented with relatively simple AR processes.

The definitions we have given are valid for univariate processes, but they can be easily extended to vectors of stochastic processes.
Definition 1.28 (Spectral density matrix): The spectral density matrix of an $m \times 1$ vector of stationary processes $\{y_t(\omega)\}_{t=-\infty}^{\infty}$ is $S_y(\omega_j) = \frac{1}{2\pi}\sum_\tau ACF_y(\tau) e^{-i\omega_j\tau}$, where

$S(\omega_j) = \begin{bmatrix} S_{y_1 y_1}(\omega_j) & S_{y_1 y_2}(\omega_j) & \ldots & S_{y_1 y_m}(\omega_j) \\ S_{y_2 y_1}(\omega_j) & S_{y_2 y_2}(\omega_j) & \ldots & S_{y_2 y_m}(\omega_j) \\ \ldots & \ldots & \ldots & \ldots \\ S_{y_m y_1}(\omega_j) & S_{y_m y_2}(\omega_j) & \ldots & S_{y_m y_m}(\omega_j) \end{bmatrix}$
The elements on the diagonal of the spectral density matrix are real, while the elements off the diagonal are typically complex. A measure of the strength of the relationship between two series at frequency $\omega_j$ is given by the coherence.
Definition 1.29 Consider a bivariate stationary process $\{(y_{1t}, y_{2t})\}$. The coherence between $y_{1t}$ and $y_{2t}$ at frequency $\omega_j$ is
$$Co(\omega_j) = \frac{|S_{y_1,y_2}(\omega_j)|}{\sqrt{S_{y_1,y_1}(\omega_j)\, S_{y_2,y_2}(\omega_j)}}.$$
The coherence is the frequency domain version of the correlation coefficient and measures the strength of the association between $y_{1t}$ and $y_{2t}$ at $\omega_j$. Notice that $Co(\omega_j)$ is a real valued function, where $|y|$ indicates the modulus of the complex number $y$.
Example 1.27 Suppose $y_t = D(\ell)e_t$ where $e_t \sim$ iid $(0, \sigma^2_e)$. Then it is immediate to show that the coherence between $e_t$ and $y_t$ is one at all frequencies. Suppose, on the other hand, that $Co(\omega_j)$ monotonically declines to zero as $\omega_j$ moves from 0 to $\pi$. Then $y_t$ and $e_t$ have similar low frequency components and different high frequency ones.
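To see the first claim, note that the standard filtering relations give $S_y(\omega_j) = |D(e^{-i\omega_j})|^2 S_e(\omega_j)$ and $S_{y,e}(\omega_j) = D(e^{-i\omega_j}) S_e(\omega_j)$, so that
$$Co(\omega_j) = \frac{|D(e^{-i\omega_j})|\, S_e(\omega_j)}{\sqrt{|D(e^{-i\omega_j})|^2 S_e(\omega_j)\, S_e(\omega_j)}} = 1 \qquad \forall \omega_j.$$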
Exercise 1.37 Suppose that $e_t \sim$ iid $(0, \sigma^2_e)$ and let $y_t = e_t + D e_{t-1}$. Calculate $Co_{y_t,e_t}(\omega_j)$.
Interesting transformations of the spectrum of $y_t$ can be obtained with the use of filters.
Definition 1.30 A filter is a linear transformation of a stochastic process, i.e. if $y_t = B(\ell)e_t$, $e_t \sim$ iid $(0, \sigma^2_e)$, then $B(\ell)$ is a filter.
A moving average (MA) process is therefore a filter, since it linearly transforms a white noise into another process. In general, stochastic processes can be thought of as filtered versions of some white noise process (the news). To study the spectral properties of filtered processes let $CGF_e(z)$ be the covariance generating function of $e_t$. Then the covariance generating function of $y_t$ is $CGF_y(z) = B(z)B(z^{-1})CGF_e(z) = |B(z)|^2 CGF_e(z)$, where $|B(z)|$ is the modulus of $B(z)$.
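For instance, with the MA(1) filter $B(\ell) = 1 + D\ell$ and $CGF_e(z) = \sigma^2_e$ (a minimal worked case, not taken from the text),
$$CGF_y(z) = (1+Dz)(1+Dz^{-1})\sigma^2_e = \sigma^2_e\left[(1+D^2) + Dz + Dz^{-1}\right],$$
so the coefficients on $z^0$ and $z^{\pm 1}$ deliver $\mathrm{var}(y_t) = (1+D^2)\sigma^2_e$ and $ACF_y(\pm 1) = D\sigma^2_e$.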
Example 1.28 Suppose that $e_t \sim$ iid $(0, \sigma^2_e)$ so that its spectrum is $S_e(\omega_j) = \frac{\sigma^2_e}{2\pi}$, $\forall \omega_j$. Consider now the process $y_t = D(\ell)e_t$ where $D(\ell) = D_0 + D_1\ell + D_2\ell^2 + \ldots$. It is typical to interpret $D(\ell)$ as the response function of $y_t$ to a unitary change in $e_t$. Then $S_y(\omega_j) = |D(e^{-i\omega_j})|^2 S_e(\omega_j)$, where $|D(e^{-i\omega_j})|^2 = D(e^{-i\omega_j})D(e^{i\omega_j})$ and $D(e^{-i\omega_j}) = \sum_\tau D_\tau e^{-i\omega_j \tau}$ measures the effect of a unitary change in $e_t$ at frequency $\omega_j$.
Example 1.29 Suppose that $y_t = a_0 + a_1 t + D(\ell)e_t$ where $e_t \sim$ iid $(0, \sigma^2_e)$. Since $y_t$ has a (linear) trend, it is not stationary in levels and $S(\omega_j)$ does not exist. Differencing the process we have $y_t - y_{t-1} = a_1 + D(\ell)(e_t - e_{t-1})$, so that $y_t - y_{t-1}$ is stationary if $e_t - e_{t-1}$ is stationary and all the roots of $D(\ell)$ are greater than one in absolute value. If these conditions are met, the spectrum of $y_t - y_{t-1}$ is well defined and equals $|D(e^{-i\omega_j})|^2 S_{\Delta e}(\omega_j)$, where $\Delta e_t = e_t - e_{t-1}$.
The quantity $B(e^{-i\omega_j})$ is called the transfer function of the filter. Various functions of this quantity are of interest in time series analysis. For example, $|B(e^{-i\omega_j})|^2$, the square modulus of the transfer function, measures the change in the variance of $e_t$ induced by the filter. Furthermore, since $B(e^{-i\omega_j})$ is, in general, complex, two alternative representations of the transfer function of the filter exist. The first decomposes it into its real and complex parts, i.e. $B(e^{-i\omega_j}) = B_1(\omega_j) + iB_2(\omega_j)$, where both $B_1$ and $B_2$ are real. Then the phase shift $Ph(\omega_j) = \tan^{-1}\left[\frac{B_2(\omega_j)}{B_1(\omega_j)}\right]$ measures how much the lead-lag relationships in $e_t$ are altered by the filter. The second can be obtained using the polar representation $B(e^{-i\omega_j}) = Ga(\omega_j)e^{-i Ph(\omega_j)}$, where $Ga(\omega_j)$ is the gain. Here $Ga(\omega_j) = |B(e^{-i\omega_j})|$ measures the change in the amplitude of cycles induced by the filter.
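A minimal sketch of these calculations, for an assumed two-point moving average filter (the filter choice is an illustration, not taken from the text):

```python
import numpy as np

# Gain and phase of an assumed filter: the two-point moving average
# B(l) = 0.5 + 0.5*l (an illustrative choice).
b = np.array([0.5, 0.5])                  # filter weights B_0, B_1
omega = np.linspace(1e-3, np.pi, 500)
B = b[0] + b[1] * np.exp(-1j * omega)     # transfer function B(e^{-iw})

gain = np.abs(B)                          # Ga(w): change in cycle amplitude
phase = -np.arctan2(B.imag, B.real)       # Ph(w), from B = Ga * e^{-i Ph}

print(gain[0], gain[-1])    # ~1 at w ~ 0, 0 at w = pi: a low pass filter
print(phase[1] / omega[1])  # ~0.5: cycles are delayed by half a time unit
```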
Figure 1.6: Filters. The three panels plot the transfer functions of an ideal low pass, high pass, and band pass filter against frequency in $[0, \pi]$.
Filtering is an operation frequently performed in everyday life (e.g. tuning a radio to a station filters out all other signals (waves)). Several types of filters are used in modern macroeconomics. Figure 1.6 presents three general types of filters: a low pass, a high pass, and a band pass. A low pass filter leaves the low frequencies of the spectrum unchanged but wipes out high frequencies. A high pass filter does exactly the opposite. A band pass filter is a combination of a low pass and a high pass filter: it wipes out very high and very low frequencies and leaves unchanged frequencies in the middle range.
Low pass, high pass and band pass filters are non-realizable, in the sense that with a finite amount of data it is impossible to construct filters that look like those of figure 1.6. In fact, using the inverse Fourier transform, one can show that these three filters (denoted, respectively, by $B(\ell)^{lp}$, $B(\ell)^{hp}$, $B(\ell)^{bp}$) have the time representation
Low pass: $B^{lp}_0 = \frac{\omega_1}{\pi}$; $B^{lp}_j = \frac{\sin(j\omega_1)}{j\pi}$, $j > 0$, some $\omega_1 \in (0, \pi)$.
High pass: $B^{hp}_0 = 1 - B^{lp}_0$; $B^{hp}_j = -B^{lp}_j$, $j > 0$.
Band pass: $B^{bp}_j = B^{lp}_j(\omega_2) - B^{lp}_j(\omega_1)$, $j \geq 0$, $\omega_2 > \omega_1$.
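A minimal sketch of a realizable approximation: truncate the ideal band pass weights at a finite number of lags and inspect the resulting transfer function. Here $J = 12$ and the 8-32 quarter band are illustrative assumptions.

```python
import numpy as np

# Truncated approximation to the ideal band pass filter: weights from the
# formulas above, cut off at +/- J lags (J = 12 is an illustrative choice).
def lowpass_weights(w1, J):
    j = np.arange(1, J + 1)
    b = np.concatenate(([w1 / np.pi], np.sin(j * w1) / (j * np.pi)))
    return np.concatenate((b[:0:-1], b))      # symmetric weights B_{-J}..B_J

w1, w2, J = 2 * np.pi / 32, 2 * np.pi / 8, 12   # 8-32 quarter band
bp = lowpass_weights(w2, J) - lowpass_weights(w1, J)

# Transfer function of the truncated filter at a grid of frequencies
omega = np.linspace(1e-3, np.pi, 500)
jj = np.arange(-J, J + 1)
Bw = (bp * np.exp(-1j * np.outer(omega, jj))).sum(axis=1)

print(np.abs(Bw).max())    # ripples above 1 near the band edges (compression)
```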
When $j$ is finite, the box-like spectral shape of these filters can only be approximated with a bell-shaped function. This means that, relative to the ideal, realizable filters generate a loss of power at the edges of the band (a phenomenon called leakage) and an increase in the importance of the frequencies in the middle of the band (a phenomenon called compression). Approximations to these ideal filters are discussed in chapter 3.
Definition 1.31 The periodogram of a stationary process $y_t$ is $Pe_y(\omega_j) = \sum_\tau \widehat{ACF}_y(\tau) e^{-i\omega_j \tau}$, where $\widehat{ACF}_y(\tau) = \frac{1}{T}\sum_t \left(y_t - \frac{1}{T}\sum_t y_t\right)\left(y_{t-\tau} - \frac{1}{T}\sum_t y_t\right)'$.
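A sketch of the computation implied by this definition, on simulated iid data; all numerical choices (sample size, seed) are illustrative.

```python
import numpy as np

# Periodogram of simulated iid data, computed from sample autocovariances
# as in definition 1.31 (sample size and seed are illustrative assumptions).
rng = np.random.default_rng(0)
T = 256
y = rng.standard_normal(T)            # iid: the true spectrum is flat
yd = y - y.mean()

acf_hat = np.array([(yd[tau:] * yd[:T - tau]).sum() / T for tau in range(T)])
omega_j = 2 * np.pi * np.arange(1, T // 2) / T

# Pe(w_j) = sum_tau ACF_hat(tau) e^{-i w_j tau}, exploiting ACF symmetry
pe = np.array([acf_hat[0] + 2 * (acf_hat[1:] *
               np.cos(w * np.arange(1, T))).sum() for w in omega_j])

print(pe.mean())    # close to var(y_t) = 1
print(pe.std())     # stays large however big T is: the estimator is noisy
```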
Perhaps surprisingly, the periodogram is an inconsistent estimator of the spectrum (see e.g. Priestley (1981, p. 433)). Intuitively, this occurs because the periodogram consistently captures the power of $y_t$ in a band of frequencies but not in each single one of them. To obtain consistent estimates it is necessary to smooth periodogram estimates with a filter. Such a smoothing filter is typically called a kernel.
Definition 1.32 For any $\epsilon > 0$, a filter $B(\tau)$ is a kernel (denoted by $K_T(\tau)$) if $K_T(\tau) \to 0$ uniformly as $T \to \infty$ for $|\tau| > \epsilon$.
Kernels can be applied to both ACF estimates and/or periodogram estimates. When applied to the periodogram, a kernel produces an estimate of the spectrum at $\omega_j$ using a weighted average of the values of the periodogram for $\omega$'s in a neighborhood of $\omega_j$. Note that this neighborhood is shrinking as $T \to \infty$, since the bias in ACF estimates asymptotically disappears. Hence, in the limit, $K_T(\tau)$ looks like a $\delta$-function, i.e. it puts all its mass at one point.
There are several types of kernels. Those used in this book are the following:
1) Box-Car (Truncated): $K^{TR}(\tau) = \begin{cases} 1 & \text{if } |\tau| \leq J(T) \\ 0 & \text{otherwise} \end{cases}$
2) Bartlett: $K^{BT}(\tau) = \begin{cases} 1 - \frac{|\tau|}{J(T)} & \text{if } |\tau| \leq J(T) \\ 0 & \text{otherwise} \end{cases}$
3) Parzen: $K^{PR}(\tau) = \begin{cases} 1 - 6\left(\frac{\tau}{J(T)}\right)^2 + 6\left(\frac{|\tau|}{J(T)}\right)^3 & 0 \leq |\tau| \leq J(T)/2 \\ 2\left(1 - \frac{|\tau|}{J(T)}\right)^3 & J(T)/2 \leq |\tau| \leq J(T) \\ 0 & \text{otherwise} \end{cases}$
4) Quadratic spectral: $K^{QS}(\tau) = \frac{25}{12\pi^2\tau^2}\left(\frac{\sin(6\pi\tau/5)}{6\pi\tau/5} - \cos(6\pi\tau/5)\right)$
Figure 1.7: Kernels. The two panels plot the Bartlett and quadratic spectral kernels against lags from $-25$ to $25$.
Here $J(T)$ is a truncation point, which typically is a function of the sample size $T$. Note that the quadratic spectral kernel has no truncation point. However, for this kernel it is useful to define the first time that $K^{QS}$ crosses zero (call it $J^*(T)$), and this point will play the same role as $J(T)$ in the other three kernels.
The Bartlett kernel and the quadratic spectral kernel are the most popular ones. The Bartlett kernel has the shape of a tent with width $2J(T)$. To insure consistency of the spectral estimates, it is typical to choose $J(T)$ so that $\frac{J(T)}{T} \to 0$ as $T \to \infty$. In figure 1.7 we have selected $J(T) = 20$. The quadratic spectral kernel has the form of a wave with infinite loops, but after the first crossing, side loops are small.
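The following sketch combines definition 1.31 with a Bartlett kernel to produce a smoothed spectral estimate for a simulated MA(1) process; the values of $T$, $J(T)$ and the MA coefficient are illustrative assumptions.

```python
import numpy as np

# Kernel-smoothed spectral estimate for a simulated MA(1) process, weighting
# sample autocovariances with a Bartlett kernel; T, J, D are illustrative.
rng = np.random.default_rng(1)
T, J, D = 512, 20, 0.5
e = rng.standard_normal(T + 1)
y = e[1:] + D * e[:-1]                    # MA(1): y_t = e_t + D e_{t-1}
yd = y - y.mean()

acf_hat = np.array([(yd[tau:] * yd[:T - tau]).sum() / T
                    for tau in range(J + 1)])
k = 1 - np.arange(J + 1) / J              # Bartlett weights, zero beyond J

omega = np.linspace(0, np.pi, 100)
S_hat = np.array([2 * (k * acf_hat * np.cos(w * np.arange(J + 1))).sum()
                  - k[0] * acf_hat[0] for w in omega]) / (2 * np.pi)

S_true = ((1 + D**2) + 2 * D * np.cos(omega)) / (2 * np.pi)
print(np.abs(S_hat - S_true).max())       # modest, and shrinks as T grows
```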
Exercise 1.38 Show that the coherence estimator $\widehat{Co}(\omega) = \frac{|\widehat{S}_{y_1,y_2}(\omega)|}{\sqrt{\widehat{S}_{y_1,y_1}(\omega)\,\widehat{S}_{y_2,y_2}(\omega)}}$ is consistent, where $\widehat{S}_{y_i,y_{i'}} = \frac{1}{2\pi}\sum_{\tau=-T+1}^{T-1} \widehat{ACF}_{y_i,y_{i'}}(\tau) K_T(\tau) e^{-i\omega\tau}$, $K_T(\tau)$ is a kernel and $i, i' = 1, 2$.
While for most of this book we consider stationary processes, we will also deal with processes which are only locally stationary (e.g. processes with time varying coefficients). For these processes, the spectral density is not defined. However, it is possible to define a local spectral density, and practically all the properties we have described here apply also to this alternative construction. For details, see Priestley (1980, chapter 11).