
Department of Probability and Mathematical Statistics

Stochastic Processes 2
Lecture Notes

Zuzana Prášková

2017

Contents
1 Definitions and basic characteristics 4
1.1 Definition of a stochastic process . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Daniell-Kolmogorov theorem . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Autocovariance and autocorrelation function . . . . . . . . . . . . . . . . 5
1.4 Strict and weak stationarity . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5 Properties of autocovariance function . . . . . . . . . . . . . . . . . . . . 8

2 Some important classes of stochastic processes 11


2.1 Markov processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 Independent increment processes . . . . . . . . . . . . . . . . . . . . . . 12
2.3 Martingales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

3 Hilbert space 13
3.1 Inner product space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.2 Convergence in norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

4 Space L2 (Ω, A, P ) 15
4.1 Construction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.2 Mean square convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4.3 Hilbert space generated by a stochastic process . . . . . . . . . . . . . . 16

5 Continuous time processes in L2 (Ω, A, P ) 18


5.1 Mean square continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
5.2 Mean square derivative of the process . . . . . . . . . . . . . . . . . . . . 22
5.3 Riemann integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

6 Spectral decomposition of autocovariance function 25


6.1 Auxiliary assertions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
6.2 Spectral decomposition of autocovariance function . . . . . . . . . . . . . 26
6.3 Existence and computation of spectral density . . . . . . . . . . . . . . . 29

7 Spectral representation of stochastic processes 33


7.1 Orthogonal increment processes . . . . . . . . . . . . . . . . . . . . . . . 33
7.2 Integral with respect to an orthogonal increment process . . . . . . . . . 36
7.3 Spectral decomposition of a stochastic process . . . . . . . . . . . . . . . 41

8 Linear models of time series 46


8.1 White noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
8.2 Moving average sequences . . . . . . . . . . . . . . . . . . . . . . . . . . 46
8.3 Linear process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

8.4 Autoregressive sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
8.5 ARMA sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
8.6 Linear filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

9 Selected limit theorems 65


9.1 Laws of large numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
9.2 Central limit theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

10 Prediction in time domain 79


10.1 Projection in Hilbert space . . . . . . . . . . . . . . . . . . . . . . . . . . 79
10.2 Prediction based on finite history . . . . . . . . . . . . . . . . . . . . . . 80
10.3 Prediction from infinite history . . . . . . . . . . . . . . . . . . . . . . . 91

11 Prediction in the spectral domain 95

12 Filtration of signal and noise 99


12.1 Filtration in finite stationary sequences . . . . . . . . . . . . . . . . . . . 99
12.2 Filtration in an infinite stationary sequence . . . . . . . . . . . . . . . . . 100

13 Partial autocorrelation function 103

14 Estimators of the mean and the autocorrelation function 107


14.1 Estimation of the mean . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
14.2 Estimators of the autocovariance and the autocorrelation function . . . . 110

15 Estimation of parameters in ARMA models 115


15.1 Estimation in AR sequences . . . . . . . . . . . . . . . . . . . . . . . . . 115
15.2 Estimation of parameters in MA and ARMA models . . . . . . . . . . . 119

16 Periodogram 122

References 128

List of symbols 130

1 Definitions and basic characteristics
1.1 Definition of a stochastic process
Definition 1. Let (Ω, A, P) be a probability space, (S, E) a measurable space, and
T ⊂ R. A family of random variables {Xt , t ∈ T } defined on (Ω, A, P) with values in S
is called a stochastic (random) process.
If S = R, {Xt , t ∈ T } is called a real-valued stochastic process.
If T = Z = {0, ±1, ±2, . . . } or T ⊂ Z, {Xt , t ∈ T } is called a discrete time stochastic process (a time series).
If T = [a, b], −∞ ≤ a < b ≤ ∞, {Xt , t ∈ T } is called a continuous time stochastic process.
For any ω ∈ Ω fixed, Xt (ω) is a function on T with values in S which is called a trajectory
of the process.

Definition 2. The pair (S, E), where S is the set of values of random variables Xt and
E is the σ−algebra of subsets of S, is called the state space of the process {Xt , t ∈ T }.

Definition 3. A real-valued stochastic process {Xt , t ∈ T } is said to be measurable, if


the mapping (ω, t) → Xt (ω) is A ⊗ BT −measurable, where BT is the σ−algebra of Borel
subsets of T and A ⊗ BT is the product σ−algebra.

Finite-dimensional distributions of a stochastic process:

Let {Xt , t ∈ T } be a stochastic process. Then, ∀n ∈ N and any finite subset


{t1 , . . . , tn } ⊂ T there is a system of random variables Xt1 , . . . , Xtn with the joint distri-
bution function

P [Xt1 ≤ x1 , . . . , Xtn ≤ xn ] = Ft1 ,...,tn (x1 , . . . , xn )

for all real valued x1 , . . . , xn .

A system of distribution functions is said to be consistent, if


1. Fti1 ,...,tin (xi1 , . . . , xin ) = Ft1 ,...,tn (x1 , . . . , xn ) for any permutation (i1 , . . . , in ) of in-
dices 1, . . . , n (symmetry)

2. limxn →∞ Ft1 ,...,tn (x1 , . . . , xn ) = Ft1 ,...,tn−1 (x1 , . . . , xn−1 ) (consistency)


The characteristic function of a random vector X = (X1 , . . . , Xn ) is

ϕX (u) := ϕ(u1 , . . . , un ) = E e^{i u⊤X} = E e^{i ∑_{j=1}^{n} uj Xj },   u = (u1 , . . . , un ) ∈ Rn .

A system of characteristic functions is said to be consistent if

1. ϕ(ui1 , . . . , uin ) = ϕ(u1 , . . . , un ) for any permutation (i1 , . . . , in ) of (1, . . . , n), (sym-
metry)

2. limun →0 ϕXt1 ,...,Xtn (u1 , . . . , un ) = ϕXt1 ,...,Xtn−1 (u1 , . . . , un−1 ) (consistency)

1.2 Daniell-Kolmogorov theorem


For any stochastic process there exists a consistent system of distribution functions. On
the other hand, the following theorem holds.

Theorem 1. Let {Ft1 ,...,tn (x1 , . . . , xn )} be a consistent system of distribution functions.


Then there exists a stochastic process {Xt , t ∈ T } such that for any n ∈ N, any
t1 , . . . , tn ∈ T and any real x1 , . . . , xn

P [Xt1 ≤ x1 , . . . , Xtn ≤ xn ] = Ft1 ,...,tn (x1 , . . . , xn ).

Proof. Štěpán (1987), Theorem I.10.3.

1.3 Autocovariance and autocorrelation function


Definition 4. A complex-valued random variable X is defined by X = Y + iZ, where Y and Z are real random variables and i = √−1.
The mean value of a complex-valued random variable X = Y + iZ is defined by EX = EY + iEZ, provided the mean values EY and EZ exist.
The variance of a complex-valued random variable X = Y + iZ is defined by var X := E[(X − EX)\overline{(X − EX)}] = E|X − EX|² ≥ 0, provided the second moments of the random variables Y and Z exist.

Definition 5. A complex-valued stochastic process is a family of complex-valued random


variables on (Ω, A, P).

Definition 6. Let {Xt , t ∈ T } be a stochastic process such that EXt := µt exists for all
t ∈ T. Then the function {µt , t ∈ T } defined on T is called the mean value of the process
{Xt , t ∈ T }. We say that the process is centered if its mean value is zero, i.e., µt = 0 for
all t ∈ T.

Definition 7. Let {Xt , t ∈ T } be a process with finite second order moments, i.e., E|Xt |² < ∞ for all t ∈ T. Then the (complex-valued) function defined on T × T by

R(s, t) = E[(Xs − µs )\overline{(Xt − µt )}]

is called the autocovariance function of the process {Xt , t ∈ T }. The value R(t, t) is the variance of the process at time t.

Definition 8. The autocorrelation function of the process {Xt , t ∈ T } with positive variances is defined by

r(s, t) = R(s, t) / ( √R(s, s) · √R(t, t) ),   s, t ∈ T.

Definition 9. A stochastic process {Xt , t ∈ T } is called Gaussian, if for any n ∈ N and t1 , . . . , tn ∈ T, the vector (Xt1 , . . . , Xtn )⊤ is normally distributed Nn (mt , Vt ), where mt = (EXt1 , . . . , EXtn )⊤ and

Vt = \begin{pmatrix}
var Xt1 & cov(Xt1 , Xt2 ) & \cdots & cov(Xt1 , Xtn ) \\
cov(Xt2 , Xt1 ) & var Xt2 & \cdots & cov(Xt2 , Xtn ) \\
\vdots & \vdots & \ddots & \vdots \\
cov(Xtn , Xt1 ) & cov(Xtn , Xt2 ) & \cdots & var Xtn
\end{pmatrix}.

1.4 Strict and weak stationarity


Definition 10. A stochastic process {Xt , t ∈ T } is said to be strictly stationary, if for
any n ∈ N, for any x1 , . . . , xn ∈ R and for any t1 , . . . , tn and h such that tk ∈ T, tk + h ∈
T, 1 ≤ k ≤ n,
Ft1 ,...,tn (x1 , . . . , xn ) = Ft1 +h,...,tn +h (x1 , . . . , xn ).

Definition 11. A stochastic process {Xt , t ∈ T } with finite second order moments is
said to be weakly stationary or second order stationary, if its mean value is constant,
µt = µ, ∀t ∈ T, and its autocovariance function R(s, t) is a function of s − t, only. If
only the latter condition is satisfied, the process is called covariance stationary.
The autocovariance function of a weakly stationary process is a function of one variable:
R(t) := R(t, 0), t ∈ T.

The autocorrelation function in such a case is

r(t) = R(t) / R(0),   t ∈ T.

Theorem 2. Any strictly stationary stochastic process {Xt , t ∈ T } with finite second
order moments is also weakly stationary.
Proof. If {Xt , t ∈ T } is strictly stationary with finite second order moments, Xt are
equally distributed for all t ∈ T with the mean value

EXt = EXt+h , ∀t ∈ T, ∀h : t + h ∈ T.

Especially, for h = −t, EXt = EX0 = const.


Similarly, (Xt , Xs ) are equally distributed and

E [Xt Xs ] = E [Xt+h Xs+h ] ∀s, t ∈ T, ∀h : s + h ∈ T, t + h ∈ T.

Especially, for h = −t, E [Xt Xs ] = E [X0 Xs−t ] is a function of s − t.

Example 1. Let {Xt , t ∈ Z} be a sequence of i.i.d. random variables with a distribution function F. Since for all n ∈ N and all t1 , . . . , tn , h ∈ Z,

Ft1,...,tn (x1 , . . . , xn ) = P[Xt1 ≤ x1 , . . . , Xtn ≤ xn ] = ∏_{i=1}^{n} P[Xti ≤ xi ] = ∏_{i=1}^{n} F(xi ),

Ft1+h,...,tn+h (x1 , . . . , xn ) = P[Xt1+h ≤ x1 , . . . , Xtn+h ≤ xn ] = ∏_{i=1}^{n} P[Xti+h ≤ xi ] = ∏_{i=1}^{n} F(xi ),

the sequence {Xt , t ∈ Z} is strictly stationary.

Example 2. Let {Xt , t ∈ Z} be a sequence defined by Xt = (−1)^t X, where X is a random variable such that

X = −1/4 with probability 3/4,   X = 3/4 with probability 1/4.

Then {Xt , t ∈ Z} is weakly stationary, since

EXt = 0,   var Xt = σ² = 3/16,   R(s, t) = σ²(−1)^{s+t} = σ²(−1)^{s−t},

but it is not strictly stationary (the variables X and −X are not equally distributed).
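The following short Python sketch (an illustration added here, not part of the original notes; the sample size and time points are arbitrary) simulates Example 2 and checks the constant zero mean and the autocovariance R(s, t) = σ²(−1)^{s−t} empirically.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 100_000, 6            # number of simulated paths, number of time points

# X takes -1/4 with probability 3/4 and 3/4 with probability 1/4
X = rng.choice([-0.25, 0.75], size=N, p=[0.75, 0.25])
t = np.arange(T)
paths = ((-1.0) ** t)[None, :] * X[:, None]     # X_t = (-1)^t X, one row per path

print("sample means:", paths.mean(axis=0))      # all close to 0
emp_R = paths.T @ paths / N                     # empirical R(s, t)
theo_R = (3 / 16) * (-1.0) ** (t[:, None] - t[None, :])
print("max |empirical - theoretical|:", np.abs(emp_R - theo_R).max())
```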

Theorem 3. Any weakly stationary Gaussian process {Xt , t ∈ T } is also strictly sta-
tionary.

Proof. Weak stationarity of the process {Xt , t ∈ T } implies EXt = µ, cov (Xt , Xs ) =
R(t − s) = cov (Xt+h , Xs+h ), t, s ∈ T, thus, for all n ∈ N and all t1 , . . . , tn , h ∈ Z,

E(Xt1 , . . . , Xtn ) = E(Xt1 +h , . . . , Xtn +h ) = (µ, . . . , µ) := µ,

var (Xt1 , . . . , Xtn ) = var (Xt1 +h , . . . , Xtn +h ) := Σ


where

Σ = \begin{pmatrix}
R(0) & R(t2 − t1 ) & \cdots & R(tn − t1 ) \\
R(t2 − t1 ) & R(0) & \cdots & R(tn − t2 ) \\
\vdots & \vdots & \ddots & \vdots \\
R(tn − t1 ) & R(tn − t2 ) & \cdots & R(0)
\end{pmatrix}.

Since the joint distribution of a normal vector is uniquely defined by the vector of mean
values and the variance matrix, (Xt1 , . . . , Xtn ) ∼ N (µ, Σ), and (Xt1 +h , . . . , Xtn +h ) ∼
N (µ, Σ) from which the strict stationarity of {Xt , t ∈ T } follows.

1.5 Properties of autocovariance function

Theorem 4. Let {Xt , t ∈ T } be a process with finite second moments. Then its autocovariance function satisfies

R(t, t) ≥ 0,
|R(s, t)| ≤ √R(s, s) · √R(t, t).

Proof. The first assertion follows from the definition of the variance. The second one follows from the Schwarz inequality, since

|R(s, t)| = |E[(Xs − EXs )\overline{(Xt − EXt )}]| ≤ (E|Xs − EXs |²)^{1/2} (E|Xt − EXt |²)^{1/2} = √R(s, s) · √R(t, t).

Thus, for the autocovariance function of a weakly stationary process we have R(0) ≥ 0 and |R(t)| ≤ R(0).

Definition 12. Let f be a complex-valued function defined on T × T , T ⊂ R. We say that f is positive semidefinite, if ∀n ∈ N, any complex numbers c1 , . . . , cn and any t1 , . . . , tn ∈ T,

∑_{j=1}^{n} ∑_{k=1}^{n} cj \overline{ck} f (tj , tk ) ≥ 0.

We say that a complex-valued function g on T is positive semidefinite, if ∀n ∈ N, any complex numbers c1 , . . . , cn and any t1 , . . . , tn ∈ T such that tj − tk ∈ T ,

∑_{j=1}^{n} ∑_{k=1}^{n} cj \overline{ck} g(tj − tk ) ≥ 0.

Definition 13. We say that a complex-valued function f on T × T is Hermitian, if f (s, t) = \overline{f (t, s)} ∀s, t ∈ T . A complex-valued function g on T is called Hermitian, if g(−t) = \overline{g(t)} ∀t ∈ T .

Theorem 5. Any positive semidefinite function is also Hermitian.


Proof. Use the definition of positive semidefiniteness and for n = 1 choose c1 = 1; for n = 2 choose c1 = 1, c2 = 1 and then c1 = 1, c2 = i (= √−1).

Remark 1. A positive semidefinite real-valued function f on T × T is symmetric, i.e.,


f (s, t) = f (t, s) for all s, t ∈ T. A positive semidefinite real-valued function g on T is
symmetric, i.e, g(t) = g(−t) for all t ∈ T .

Theorem 6. Let {Xt , t ∈ T } be a process with finite second order moments. Then its
autocovariance function is positive semidefinite on T × T.
Proof. W. l. o. g., suppose that the process is centered. Then for any n ∈ N, complex constants c1 , . . . , cn and t1 , . . . , tn ∈ T

0 ≤ E| ∑_{j=1}^{n} cj Xtj |² = E[ ( ∑_{j=1}^{n} cj Xtj ) \overline{( ∑_{k=1}^{n} ck Xtk )} ]
= ∑_{j=1}^{n} ∑_{k=1}^{n} cj \overline{ck} E[Xtj \overline{Xtk}] = ∑_{j=1}^{n} ∑_{k=1}^{n} cj \overline{ck} R(tj , tk ).

Theorem 7. To any positive semidefinite function R on T × T there exists a stochas-
tic process {Xt , t ∈ T } with finite second order moments such that its autocovariance
function is R.
Proof. The proof will be given for real-valued function R, only. For the proof with
complex-valued R see, e.g., Loève (1955), Chap. X, Par. 34.

Since R is positive semidefinite, for any n ∈ N and any real t1 , . . . , tn ∈ T, the matrix

Vt = \begin{pmatrix}
R(t1 , t1 ) & R(t1 , t2 ) & \cdots & R(t1 , tn ) \\
R(t2 , t1 ) & R(t2 , t2 ) & \cdots & R(t2 , tn ) \\
\vdots & \vdots & \ddots & \vdots \\
R(tn , t1 ) & R(tn , t2 ) & \cdots & R(tn , tn )
\end{pmatrix}

is positive semidefinite. The function

ϕ(u) = exp( −½ u⊤ Vt u ),   u ∈ Rn ,
is the characteristic function of the normal distribution Nn (0, Vt ). In this way, ∀n ∈ N
and any real t1 , . . . , tn ∈ T we get the consistent system of characteristic functions. The
corresponding system of the distribution functions is also consistent. Thus according to
the Daniell-Kolmogorov theorem (Theorem 1), there exists a Gaussian stochastic process
covariances of which are the values of the function R(s, t); hence, R is the autocovariance
function of this process.

Example 3. Decide whether the function cos t, t ∈ T = (−∞, ∞), is an autocovariance function of a stochastic process.
Solution: It suffices to show that cos t is a positive semidefinite function. Consider n ∈ N, c1 , . . . , cn ∈ C and t1 , . . . , tn ∈ R. Then we have

∑_{j=1}^{n} ∑_{k=1}^{n} cj \overline{ck} cos(tj − tk ) = ∑_{j=1}^{n} ∑_{k=1}^{n} cj \overline{ck} (cos tj cos tk + sin tj sin tk )
= | ∑_{j=1}^{n} cj cos tj |² + | ∑_{k=1}^{n} ck sin tk |² ≥ 0.

The function cos t is positive semidefinite, and according to Theorem 7 there exists a (Gaussian) stochastic process {Xt , t ∈ T }, the autocovariance function of which is R(s, t) = cos(s − t).
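A quick numerical illustration of this computation (an added sketch, not from the notes; the time points are arbitrary): the matrix (cos(tj − tk))j,k has no negative eigenvalues, i.e., it is positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(4)
t = rng.uniform(-10, 10, size=8)              # arbitrary time points t_1, ..., t_n
M = np.cos(t[:, None] - t[None, :])           # matrix (cos(t_j - t_k))_{j,k}
print(np.linalg.eigvalsh(M).min() >= -1e-10)  # True: all eigenvalues >= 0 up to rounding
```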
Theorem 8. The sum of two positive semidefinite functions is a positive semidefinite
function.

Proof. It follows from the definition of a positive semidefinite function. If f and g are positive semidefinite and h = f + g, then for any n ∈ N, complex c1 , . . . , cn and t1 , . . . , tn ∈ T

∑_{j=1}^{n} ∑_{k=1}^{n} cj \overline{ck} h(tj , tk ) = ∑_{j=1}^{n} ∑_{k=1}^{n} cj \overline{ck} [f (tj , tk ) + g(tj , tk )]
= ∑_{j=1}^{n} ∑_{k=1}^{n} cj \overline{ck} f (tj , tk ) + ∑_{j=1}^{n} ∑_{k=1}^{n} cj \overline{ck} g(tj , tk ) ≥ 0.

Corollary 1. Sum of two autocovariance functions is the autocovariance function of a


stochastic process with finite second moments.
Proof. It follows from Theorems 6–8.

2 Some important classes of stochastic processes


2.1 Markov processes
Definition 14. We say that {Xt , t ∈ T } is a Markov process with the state space (S, E), if for any t0 , t1 , . . . , tn , 0 ≤ t0 < t1 < · · · < tn ,

P(Xtn ≤ x|Xtn−1 , . . . , Xt0 ) = P(Xtn ≤ x|Xtn−1 )   a. s.   (1)

for all x ∈ R.
Relation (1) is called the Markovian property. Simple cases are discrete state Markov
processes, i.e., discrete and continuous time Markov chains.

Example 4. Consider a Markov chain {Xt , t ≥ 0} with the state space S = {0, 1}, the initial distribution P(X0 = 0) = 1, P(X0 = 1) = 0 and the intensity matrix

Q = \begin{pmatrix} −α & α \\ β & −β \end{pmatrix},   α > 0, β > 0.

Treat the stationarity of this process.
We know that all the finite dimensional distributions of a continuous time Markov chain are determined by the initial distribution p(0) = {pj (0), j ∈ S}⊤ and the transition probability matrix P(t) = {pij (t), i, j ∈ S}. In our case, P(t) = exp(Qt), where

P(t) = \frac{1}{α + β} \begin{pmatrix} β + αe^{−(α+β)t} & α − αe^{−(α+β)t} \\ β − βe^{−(α+β)t} & α + βe^{−(α+β)t} \end{pmatrix}

(see, e.g., Prášková and Lachout, 2012, pp. 93–95) and due to the initial distribution, the absolute distribution is

p(t)⊤ = p(0)⊤P(t) = (1, 0)P(t) = (p00 (t), p01 (t)).

Then we have

EXt = P(Xt = 1) = p01 (t) = (α − αe^{−(α+β)t})/(α + β),

which depends on t; thus, the process is neither strictly nor weakly stationary.
On the other hand, if the initial distribution is the stationary distribution of the Markov chain, i.e., the probability distribution that satisfies π⊤ = π⊤P(t), then {Xt , t ≥ 0} is a strictly stationary process (Prášková and Lachout, 2012, Theorem 3.12).
In our case, the solution of π⊤ = π⊤P(t) gives

π0 = β/(α + β),   π1 = α/(α + β),

and from here we get the constant mean EXt = α/(α + β) and the autocovariance function

R(s, t) = (αβ/(α + β)²) e^{−(α+β)|s−t|}.
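The closed-form results of Example 4 can be checked numerically. The following sketch (an added illustration, not from the notes; the values of α and β are arbitrary) verifies P(t) = exp(Qt) with scipy.linalg.expm and confirms that π = (β/(α+β), α/(α+β)) satisfies π⊤P(t) = π⊤.

```python
import numpy as np
from scipy.linalg import expm

alpha, beta = 2.0, 0.5                 # arbitrary positive intensities
Q = np.array([[-alpha, alpha],
              [beta, -beta]])

def P_closed(t):
    """Closed-form transition matrix of the two-state chain."""
    e = np.exp(-(alpha + beta) * t)
    return np.array([[beta + alpha * e, alpha - alpha * e],
                     [beta - beta * e, alpha + beta * e]]) / (alpha + beta)

t = 0.7
print(np.allclose(expm(Q * t), P_closed(t)))   # True: P(t) = exp(Qt)

pi = np.array([beta, alpha]) / (alpha + beta)  # candidate stationary distribution
print(np.allclose(pi @ P_closed(t), pi))       # True: pi^T P(t) = pi^T
```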

2.2 Independent increment processes


Definition 15. A process {Xt , t ∈ T }, where T is an interval, has independent incre-
ments, if for any t1 , t2 , . . . , tn ∈ T such that t1 < t2 < · · · < tn , the random variables
Xt2 − Xt1 , . . . , Xtn − Xtn−1 are independent.
If for any s, t ∈ T, s < t, the distribution of the increments Xt − Xs depends only on
t − s, we say that {Xt , t ∈ T } has stationary increments.

Example 5. Poisson process with intensity λ is a continuous time Markov chain {Xt , t ≥
0} such that X0 = 0 a. s. and for t > 0, Xt has the Poisson distribution with parameter
λt. Increments Xt − Xs , s < t have the Poisson distribution with the parameter λ(t − s).
The Poisson process is neither strictly nor weakly stationary.

Example 6. Wiener process (Brownian motion process) is a Gaussian stochastic process


{Wt , t ≥ 0} with the properties

1. W0 = 0 a. s. and {Wt , t ≥ 0} has continuous trajectories

2. For any 0 ≤ t1 < t2 < · · · < tn , Wt1 , Wt2 − Wt1 , Wt3 − Wt2 , . . . , Wtn − Wtn−1 are
independent random variables (independent increments).

3. For any 0 ≤ t < s, the increments Ws − Wt have normal distribution with zero
mean and the variance σ 2 (s − t), where σ 2 is a positive constant. Especially, for
any t ≥ 0, EWt = 0 and var Wt = σ 2 t.

The Wiener process is Markov but it is neither strictly nor weakly stationary.
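Properties 1–3 translate directly into a simulation recipe: sample independent N(0, σ²∆t) increments and take cumulative sums. The sketch below is an added illustration (not part of the notes; σ, the horizon and the grid size are arbitrary choices) and produces one discretized trajectory of the kind shown later in Figure 1.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, T, n = 1.0, 1.0, 1000          # variance parameter, horizon, grid size
dt = T / n
t = np.linspace(0.0, T, n + 1)

# independent stationary Gaussian increments N(0, sigma^2 * dt), W_0 = 0
dW = rng.normal(0.0, sigma * np.sqrt(dt), size=n)
W = np.concatenate(([0.0], np.cumsum(dW)))

print(W[:5])                          # a discretized trajectory of {W_t, 0 <= t <= T}
```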

2.3 Martingales
Definition 16. Let {Ω, A, P } be a probability space, T ⊂ R, T 6= ∅. Let for any t ∈
T, Ft ⊂ A be a σ− algebra (σ−field). A system of σ-fields {Ft , t ∈ T } such that Fs ⊂ Ft
for any s, t ∈ T, s < t is called a filtration.

Definition 17. Let {Xt , t ∈ T } be a stochastic process defined on {Ω, A, P }, and let
{Ft , t ∈ T } be a filtration. We say that {Xt , t ∈ T } is adapted to {Ft , t ∈ T } if for any
t ∈ T, Xt is Ft measurable.

Definition 18. Let {Xt , t ∈ T } be adapted to {Ft , t ∈ T } and E|Xt | < ∞ for all t ∈ T. Then {Xt , t ∈ T } is said to be a martingale if E[Xt | Fs ] = Xs a.s. for any s < t, s, t ∈ T.

3 Hilbert space
3.1 Inner product space
Definition 19. A complex vector space H is said to be an inner product space, if for
any x, y ∈ H there exists a number hx, yi ∈ C, called the inner product of elements x, y
such that

1. hx, yi = hy, xi.

2. ∀α ∈ C, ∀x, y ∈ H : hαx, yi = αhx, yi.

3. ∀x, y, z ∈ H hx + y, zi = hx, zi + hy, zi.

4. ∀x ∈ H : hx, xi ≥ 0; hx, xi = 0 ⇔ x = 0 (0 is the zero element in H).

The number kxk := √hx, xi, ∀x ∈ H, is called the norm of an element x.
Theorem 9. For the norm kxk := √hx, xi the following properties hold:

1. kxk ≥ 0 ∀x ∈ H and kxk = 0 ⇔ x = 0.

2. ∀α ∈ C, ∀x ∈ H: kαxk = |α| · kxk.

3. ∀x, y ∈ H: kx + yk ≤ kxk + kyk.

4. ∀x, y ∈ H : |hx, yi| ≤ kxk · kyk = √hx, xi · √hy, yi
(the Cauchy-Schwarz inequality).
Proof. It can be found in any textbook on Functional Analysis, see, e.g., Rudin (2003),
Chap. 4.

3.2 Convergence in norm


Definition 20. We say that a sequence {xn , n ∈ N} of elements of an inner product space H converges in norm to an element x ∈ H, if kxn − xk → 0 as n → ∞.

Theorem 10. (The inner product continuity)


Let {xn , n ∈ N} and {yn , n ∈ N} be sequences of elements of H. Let x, y ∈ H and
xn → x, yn → y in norm as n → ∞. Then
||xn || → ||x||
hxn , yn i → hx, yi.
Proof. From the triangle inequality we get

||x|| ≤ ||x − y|| + ||y||,
||y|| ≤ ||y − x|| + ||x||,

hence | ||x|| − ||y|| | ≤ ||x − y||. From here we get the first assertion, since

| ||xn || − ||x|| | ≤ ||xn − x||.
The second assertion is obtained by using the Cauchy-Schwarz inequality:
|hxn , yn i − hx, yi| = |hxn − x + x, yn − y + yi − hx, yi|
≤ |hxn − x, yn − yi| + |hx, yn − yi|+
+ |hxn − x, yi|
≤ kxn − xk · kyn − yk + kxk · kyn − yk+
+ kxn − xk · kyk.

Definition 21. A sequence {xn , n ∈ N} of elements of H is said to be a Cauchy se-


quence, if kxn − xm k → 0 as n, m → ∞.

Definition 22. An inner product space H is defined to be a Hilbert space, if it is


complete, i.e., if any Cauchy sequence of elements of H converges in norm to some
element of H.

4 Space L2(Ω, A, P )
4.1 Construction
Let L be the set of all random variables with finite second order moments defined on a probability space (Ω, A, P). We can easily verify that L is a vector space:

1. ∀ X, Y ∈ L, X + Y ∈ L, since

E|X + Y |² ≤ 2(E|X|² + E|Y |²) < ∞.




2. ∀X ∈ L and ∀α ∈ C, αX ∈ L, since

E|αX|2 = |α|2 · E|X|2 < ∞.

3. The null element of L is the random variable identically equal to zero.

On the space L we define classes of equivalent random variables by

X ∼ Y ⇐⇒ P[X = Y ] = 1,

and on the set of classes of equivalent random variables from L we define the relation

hX, Y i = E[X \overline{Y}],   ∀X ∈ X̃, Y ∈ Ỹ,

where X̃, Ỹ denote the classes of equivalence.

The space of classes of equivalence on L with the above relation h·, ·i is denoted L2 (Ω, A, P). The relation hX, Y i satisfies the properties of the inner product on L2 (Ω, A, P): for every X, Y, Z ∈ L2 (Ω, A, P) and every α ∈ C it holds

1. hαX, Y i = E[αX \overline{Y}] = αE[X \overline{Y}] = αhX, Y i.

2. hX + Y, Zi = E[(X + Y )\overline{Z}] = E[X \overline{Z}] + E[Y \overline{Z}] = hX, Zi + hY, Zi.

3. hX, Xi = E[X \overline{X}] = E|X|² ≥ 0.

4. hX, Xi = E|X|² = 0 ⇔ X ∼ 0.

4.2 Mean square convergence
We have defined L2 (Ω, A, P) to be the space of classes of equivalence on L with the inner product hX, Y i = E[X \overline{Y}]; the norm is therefore defined by

||X|| := √(E|X|²)

and the convergence in L2 (Ω, A, P) is the convergence in this norm.

Definition 23. We say that a sequence of random variables Xn such that E|Xn |2 < ∞
converges in the mean square (or in the squared mean) to a random variable X, if it
converges to X in L2 (Ω, A, P), i. e.,

||Xn − X||2 = E|Xn − X|2 → 0 as n → ∞.

Notation: X = l. i. m. Xn (limit in the (squared) mean).

Theorem 11. The space L2 (Ω, A, P) is complete.

Proof. See, e. g., Brockwell and Davis (1991), Par. 2.10, or Rudin (2003), Theorem
3.11.
The space L2 (Ω, A, P) is the Hilbert space.
Convention: A stochastic process {Xt , t ∈ T } such that E|Xt |2 < ∞ will be called a
second order process.

4.3 Hilbert space generated by a stochastic process


Definition 24. Let {Xt , t ∈ T } be a stochastic process with finite second moments on (Ω, A, P). The set M{Xt , t ∈ T } of all finite linear combinations of random variables from {Xt , t ∈ T } is the linear span of the process {Xt , t ∈ T }, i.e.,

M{Xt , t ∈ T } = { ∑_{k=1}^{n} ck Xtk : n ∈ N, c1 , . . . , cn ∈ C, t1 , . . . , tn ∈ T }.

Equivalence classes in M{Xt , t ∈ T } and the inner product hX, Y i are defined as
above.

Definition 25. A closure M{Xt , t ∈ T } of the linear span M{Xt , t ∈ T } consists of all
the elements of M{Xt , t ∈ T } and the mean square limits of all convergent sequences
of elements of M{Xt , t ∈ T }.

Then M{Xt , t ∈ T } is a closed subspace of the complete space L2 (Ω, A, P) and thus
a complete inner product space. It is called the Hilbert space generated by a stochastic
process {Xt , t ∈ T }, notation H{Xt , t ∈ T }.
Definition 26. Let {Xt^h , t ∈ T }h∈S , T ⊂ R, S ⊂ R, be a collection of stochastic processes in L2 (Ω, A, P) (shortly: second order processes). We say that the processes {Xt^h , t ∈ T }h∈S converge in mean square to a second order process {Xt , t ∈ T } as h → h0 , if

∀t ∈ T : Xt^h → Xt in mean square as h → h0 , i.e., E|Xt^h − Xt |² → 0 as h → h0 .

Briefly, we write

{Xt^h , t ∈ T }h∈S → {Xt , t ∈ T } in mean square as h → h0 .

Theorem 12. Centered second order processes {Xt^h , t ∈ T }h∈S converge in mean square to a centered second order process {Xt , t ∈ T } as h → h0 if and only if

E[Xt^h \overline{Xt^{h′}}] → b(t) as h, h′ → h0 ,

where b(·) is a finite function on T .

When the processes {Xt^h , t ∈ T }h∈S converge to a process {Xt , t ∈ T } in mean square as h → h0 , the autocovariance functions of the processes {Xt^h , t ∈ T }h∈S converge to the autocovariance function of {Xt , t ∈ T } as h → h0 .
Proof. 1. Let {Xt^h , t ∈ T }h∈S → {Xt , t ∈ T } in mean square as h → h0 . Then ∀t, t′ ∈ T

Xt^h → Xt in mean square as h → h0 ,
Xt′^{h′} → Xt′ in mean square as h′ → h0 .

From the continuity of the inner product we get, as h, h′ → h0 ,

E[Xt^h \overline{Xt′^{h′}}] → E[Xt \overline{Xt′}].

Thus for t = t′ and h, h′ → h0 we have

E[Xt^h \overline{Xt^{h′}}] → E[Xt \overline{Xt}] = E|Xt |² := b(t) < ∞,

since {Xt , t ∈ T } is a second order process. For h′ = h, we get

E[Xt^h \overline{Xt′^h}] → E[Xt \overline{Xt′}] as h → h0 , t, t′ ∈ T,

where E[Xt^h \overline{Xt′^h}] = Rh (t, t′) is the autocovariance function of the process {Xt^h , t ∈ T } and E[Xt \overline{Xt′}] = R(t, t′) is the autocovariance function of the process {Xt , t ∈ T }.

2. Let {Xt^h , t ∈ T }h∈S be centered second order processes for which

E[Xt^h \overline{Xt^{h′}}] → b(t) < ∞ as h, h′ → h0 , ∀t ∈ T.

Then

kXt^h − Xt^{h′}k² → 0 as h, h′ → h0 , ∀t ∈ T,

since ∀t ∈ T

kXt^h − Xt^{h′}k² = E[(Xt^h − Xt^{h′})\overline{(Xt^h − Xt^{h′})}]
= E[Xt^h \overline{Xt^h}] − E[Xt^h \overline{Xt^{h′}}] − E[Xt^{h′} \overline{Xt^h}] + E[Xt^{h′} \overline{Xt^{h′}}]
→ b(t) − b(t) − b(t) + b(t) = 0

as h, h′ → h0 .
We have proved that the processes {Xt^h , t ∈ T }h∈S satisfy the Cauchy property for any t ∈ T. Due to the completeness of L2 (Ω, A, P), ∀t ∈ T there exists Xt ∈ L2 (Ω, A, P) such that Xt^h → Xt in mean square as h → h0 ; thus E|Xt |² < ∞ ∀t ∈ T. Therefore there exists a limit process {Xt , t ∈ T } in L2 (Ω, A, P). We prove that {Xt , t ∈ T } is centered:

EXt = EXt − EXt^h + EXt^h = E[Xt − Xt^h].

Then

|EXt | = |E[Xt − Xt^h]| ≤ √(E|Xt − Xt^h|²) → 0

as h → h0 , ∀t ∈ T .

5 Continuous time processes in L2(Ω, A, P )


5.1 Mean square continuity
Definition 27. Let {Xt , t ∈ T } be a second order process, T ⊂ R an open interval. We
say that the process {Xt , t ∈ T } is mean square continuous (or L2 -continuous) at point
t0 ∈ T , if
E|Xt − Xt0 |2 → 0 as t → t0 .

We say that the process {Xt , t ∈ T } is mean square continuous, if it is continuous at
each point of T .

Remark 2. A second order process that is mean square continuous is also stochastically continuous (continuous in probability), since

P[|Xt − Xt0 | > ε] ≤ ε^{−2} · E|Xt − Xt0 |².

Theorem 13. Let {Xt , t ∈ T } be a centered second order process, T ⊂ R be an interval.


Then {Xt , t ∈ T } is mean square continuous if and only if its autocovariance function
R(s, t) is continuous at points [s, t], such that s = t.

Proof. 1. Let {Xt , t ∈ T } be a centered mean square continuous process. We prove that its autocovariance function is continuous at every point of T × T. Since EXt = 0, we have, for all s0 , t0 ∈ T and s → s0 , t → t0 ,

|R(s, t) − R(s0 , t0 )| = |E[Xs \overline{Xt}] − E[Xs0 \overline{Xt0}]| = |hXs , Xt i − hXs0 , Xt0 i| → 0,

which follows from the continuity of the inner product, since Xt → Xt0 as t → t0 and Xs → Xs0 as s → s0 , due to the continuity of the process.

2. Let R(s, t) be continuous at the points [s, t] such that s = t. Then ∀t0 ∈ T

E|Xt − Xt0 |² = E[(Xt − Xt0 )\overline{(Xt − Xt0 )}]
= E[Xt \overline{Xt}] − E[Xt \overline{Xt0}] − E[Xt0 \overline{Xt}] + E[Xt0 \overline{Xt0}]
= R(t, t) − R(t, t0 ) − R(t0 , t) + R(t0 , t0 ).

The right-hand side tends to zero as t → t0 , hence so does the left-hand side.

Theorem 14. Let {Xt , t ∈ T } be a second order process with a mean value {µt , t ∈ T }
and an autocovariance function R(s, t) defined on T × T . Then {Xt , t ∈ T } is mean
square continuous if {µt , t ∈ T } is continuous on T and R(s, t) is continuous at points
[s, t], such that s = t.

Proof. We have

E|Xt − Xt0 |² = E|Xt − µt + µt − Xt0 |² = E|Xt − µt − (Xt0 − µt0 ) + µt − µt0 |².

Put Yt := Xt − µt , ∀t ∈ T . Then {Yt , t ∈ T } is a centered process with the same autocovariance function R(s, t) and

E|Xt − Xt0 |² = E|Yt − Yt0 + µt − µt0 |² ≤ 2E|Yt − Yt0 |² + 2|µt − µt0 |²,

and both terms on the right-hand side tend to zero as t → t0 , by Theorem 13 and the continuity of {µt , t ∈ T }.

Theorem 15. Let {Xt , t ∈ T } be a centered weakly stationary process with an autoco-
variance function R(t). Then {Xt , t ∈ T } is mean square continuous if and only if R(t)
is continuous at zero.

Proof. Due to the weak stationarity, R(s, t) = R(s − t). Then the assertion follows from
the previous theorem.

Example 7. A centered weakly stationary process with the autocovariance function


R(t) = cos(t), t ∈ R, is mean square continuous.

Example 8. Let {Xt , t ∈ T }, T = R, be a process of uncorrelated random variables


with EXt = 0, t ∈ R and the same variance 0 < σ 2 < ∞. The autocovariance function
is R(s, t) = σ²δ(s − t), where δ(x) = 1 if x = 0 and δ(x) = 0 if x ≠ 0.

The process is weakly stationary, but not mean square continuous (the autocovariance
function is not continuous at zero).

Example 9. Wiener process {Wt , t ≥ 0} is a Gaussian process with independent and


stationary increments, EWt = 0, R(s, t) = E[Ws Wt ] = σ² · min{s, t}. The process is neither weakly nor strictly stationary (though Gaussian).
The process is centered and R(s, t) is continuous (in particular at the points [s, t] with s = t). The process is mean square continuous.

Example 10. The Poisson process {Xt , t ≥ 0} with intensity λ > 0 is a process with stationary and independent increments, Xt ∼ Po(λt). Since EXt = µt = λt, t ≥ 0, and cov(Xs , Xt ) = λ · min{s, t}, the process is not weakly stationary.
Since µt is continuous and R(s, t) is continuous, the process is mean square continuous.
[Figure 1: A trajectory of a Wiener process]

[Figure 2: The autocovariance function of a Wiener process]

5.2 Mean square derivative of the process
Definition 28. Let {Xt , t ∈ T } be a second order process, T ⊂ R an open interval. We
say that the process is mean square differentiable (L2 -differentiable) at point t0 ∈ T if
there exists the mean square limit

l. i. m._{h→0} (Xt0+h − Xt0 )/h =: X′t0 .

This limit is called the mean square derivative (L2 -derivative) of the process at t0 .

We say that the process {Xt , t ∈ T } is mean square differentiable, if it is mean square
differentiable at every point t ∈ T .

Theorem 16. A centered second order process {Xt , t ∈ T } is mean square differentiable
if and only if there exists a finite generalized second-order partial derivative of its auto-
covariance function R(s, t) at points [s, t], where s = t, i.e., if at these points there exists
the finite limit

lim_{h,h′→0} (1/(hh′)) [R(s + h, t + h′) − R(s, t + h′) − R(s + h, t) + R(s, t)].

Proof. According to Theorem 12, the necessary and sufficient condition for the mean square convergence of (Xt+h − Xt )/h is the existence of the finite limit

lim_{h,h′→0} E[ ((Xt+h − Xt )/h) · \overline{((Xt+h′ − Xt )/h′)} ]
= lim_{h,h′→0} (1/(hh′)) [R(t + h, t + h′) − R(t, t + h′) − R(t + h, t) + R(t, t)].

Remark 3. A sufficient condition for the generalized second-order partial derivative of R(s, t) to exist is the following one: Let [s, t] be an interior point of T × T . If the partial derivatives ∂²R(s, t)/∂s∂t and ∂²R(s, t)/∂t∂s exist and are continuous, then the generalized second-order partial derivative of R(s, t) exists and equals ∂²R(s, t)/∂s∂t (Anděl, 1976, p. 20).

Theorem 17. A second order process {Xt , t ∈ T } with the mean value {µt , t ∈ T } is
mean square differentiable, if {µt , t ∈ T } is differentiable and the generalized second-
order partial derivative of the autocovariance function exists and is finite at points [s, t],
such that s = t.

Proof. A sufficient condition for the mean square limit of (Xt+h − Xt )/h to exist is the Cauchy condition

E| (Xt+h − Xt )/h − (Xt+h′ − Xt )/h′ |² → 0 as h → 0, h′ → 0,

∀t ∈ T. It holds, since

E| (Xt+h − Xt )/h − (Xt+h′ − Xt )/h′ |² ≤ 2E| (Yt+h − Yt )/h − (Yt+h′ − Yt )/h′ |² + 2| (µt+h − µt )/h − (µt+h′ − µt )/h′ |²,

where Yt = Xt − µt . According to Theorem 16, the process {Yt , t ∈ T } is mean square differentiable, and the first term on the right hand side of the previous inequality converges to zero as h → 0, h′ → 0. The second term converges to zero, since the function {µt , t ∈ T } is differentiable.

Example 11. A centered weakly stationary process with the autocovariance function R(s, t) = cos(s − t), s, t ∈ R, is mean square differentiable, since ∂² cos(s − t)/∂s∂t and ∂² cos(s − t)/∂t∂s exist and are continuous.
Example 12. Poisson process {Xt , t > 0} has the mean value µt = λt, which is continu-
ous and differentiable for all t > 0 and the autocovariance function R(s, t) = λ min(s, t).
The generalized second-order partial derivative of R(s, t), however, is not finite: for s = t we have

lim_{h→0+} (λ/h²)[s + h − min(s + h, s) − min(s, s + h) + s] = lim_{h→0+} λ/h = +∞,
lim_{h→0−} (λ/h²)[s + h − min(s + h, s) − min(s, s + h) + s] = lim_{h→0−} (−λ/h) = +∞.

The Poisson process is not mean square differentiable.

5.3 Riemann integral


Definition 29. Let T = [a, b] be a closed interval, −∞ < a < b < +∞. Let Dn = {tn,0 , tn,1 , . . . , tn,n }, where a = tn,0 < tn,1 < · · · < tn,n = b, ∀n ∈ N, be a partition of the interval [a, b]. Denote the norm of the partition Dn by

∆n := max_{0≤i≤n−1} (tn,i+1 − tn,i )

and define the partial sums In of a centered second order process {Xt , t ∈ [a, b]} by

In := ∑_{i=0}^{n−1} Xtn,i (tn,i+1 − tn,i ),   n ∈ N.

If the sequence {In , n ∈ N} has the mean square limit I for any partition of the interval [a, b] such that ∆n → 0 as n → ∞, we call it the Riemann integral of the process {Xt , t ∈ [a, b]} and write

I = ∫_a^b Xt dt.

If the process {Xt , t ∈ T } has the mean value {µt , t ∈ T }, we define the Riemann integral of the process {Xt , t ∈ [a, b]} to be

∫_a^b Xt dt = ∫_a^b (Xt − µt ) dt + ∫_a^b µt dt,

if the centered process {Xt − µt } is Riemann integrable and ∫_a^b µt dt exists and is finite.

Theorem 18. Let {Xt , t ∈ [a, b]} be a centered second order process with the autocovariance function R(s, t). Then the Riemann integral ∫_a^b Xt dt exists if the Riemann integral ∫_a^b ∫_a^b R(s, t) ds dt exists and is finite.

Proof. Let Dm = {sm,0 , . . . , sm,m }, Dn = {tn,0 , . . . , tn,n } be partitions of the interval [a, b], the norms ∆m , ∆n of which converge to zero as m, n → ∞. Put

Im := ∑_{j=0}^{m−1} (sm,j+1 − sm,j ) Xsm,j ,   In := ∑_{k=0}^{n−1} (tn,k+1 − tn,k ) Xtn,k .

Similarly as in the proof of Theorem 12 we can see that ∫_a^b Xt dt exists if there exists the finite limit

E[Im \overline{In}] = lim E[ ( ∑_{j=0}^{m−1} Xsm,j (sm,j+1 − sm,j ) ) · \overline{( ∑_{k=0}^{n−1} Xtn,k (tn,k+1 − tn,k ) )} ]
= lim ∑_{j=0}^{m−1} ∑_{k=0}^{n−1} R(sm,j , tn,k )(sm,j+1 − sm,j )(tn,k+1 − tn,k )

as m, n → ∞, ∆m , ∆n → 0, which follows from the existence of ∫_a^b ∫_a^b R(s, t) ds dt.

Example 13. The Riemann integral ∫_a^b Xt dt of a centered continuous time process with the autocovariance function R(s, t) = cos(s − t) exists, since R(s, t) is continuous on [a, b] × [a, b].
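To make Definition 29 and Theorem 18 concrete, the following added sketch (not from the notes; all numerical parameters are arbitrary) forms the partial sums In for a discretized Wiener process on [0, 1] and compares E I² with ∫_0^1 ∫_0^1 σ² min(s, t) ds dt = σ²/3.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, n, n_paths = 1.0, 500, 10_000     # arbitrary illustration parameters
dt = 1.0 / n

# discretized Wiener trajectories on [0, 1], W_0 = 0
dW = rng.normal(0.0, sigma * np.sqrt(dt), size=(n_paths, n))
W = np.hstack([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)])

# partial sums I_n = sum_i W_{t_i} (t_{i+1} - t_i) over the left endpoints t_0, ..., t_{n-1}
I = W[:, :-1].sum(axis=1) * dt

# Theorem 18 applies: R(s, t) = sigma^2 min(s, t) is continuous, and
# E I = 0, E I^2 = ∫∫ sigma^2 min(s, t) ds dt = sigma^2 / 3
print(I.mean())                  # close to 0
print(I.var(), sigma**2 / 3)     # close to each other
```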
Example 14. Let {Xt , t ∈ R} be a centered second order process. We define

∫_{−∞}^{∞} Xt dt := l. i. m. ∫_a^b Xt dt as a → −∞, b → ∞,

if the limit and the Riemann integral on the right hand side exist.
Example 15. Poisson process {Xt , t ≥ 0} is Riemann integrable on any finite interval
[a, b] ⊂ [0, ∞), since its autocovariance function is continuous on [a, b] × [a, b].

6 Spectral decomposition of autocovariance function


6.1 Auxiliary assertions
Lemma 1. a) Let µ, ν be finite measures on the σ-field of Borel subsets of the interval [−π, π]. If for every t ∈ Z,

∫_{−π}^{π} e^{itλ} dµ(λ) = ∫_{−π}^{π} e^{itλ} dν(λ),

then µ(B) = ν(B) for every Borel set B ⊂ (−π, π) and µ({−π} ∪ {π}) = ν({−π} ∪ {π}).
b) Let µ, ν be finite measures on (R, B). If for every t ∈ R

∫_{−∞}^{∞} e^{itλ} dµ(λ) = ∫_{−∞}^{∞} e^{itλ} dν(λ),

then µ(B) = ν(B) for all B ∈ B.


Proof. See Anděl (1976), III.1, Theorems 5 and 6.
Lemma 2 (Helly theorem). Let {Fn , n ∈ N} be a sequence of non-decreasing uniformly bounded functions. Then there exists a subsequence {Fnk } that, as k → ∞, nk → ∞, converges weakly to a non-decreasing right-continuous function F , i.e., converges at every continuity point of F .
Proof. Rao (1978), Theorem 2c.4, I.
Lemma 3 (Helly-Bray). Let {Fn , n ∈ N} be a sequence of non-decreasing uniformly
bounded functions that, as n → ∞, converges weakly to a non-decreasing bounded right-
continuous function F, and lim Fn (−∞) = F (−∞), lim Fn (+∞) = F (+∞). Let f be a
continuous bounded function. Then
∫_{−∞}^{∞} f(x) dFn(x) → ∫_{−∞}^{∞} f(x) dF(x) as n → ∞.

Proof. Rao (1978), Theorem 2c.4, II.
Remark 4. The integral in the Helly-Bray theorem is the Riemann-Stieltjes integral of a function f with respect to a function F . If [a, b] is a bounded interval and F is right-continuous, we will understand that

∫_a^b f(x) dF(x) := ∫_{(a,b]} f(x) dF(x).

6.2 Spectral decomposition of autocovariance function


Theorem 19. A complex-valued function R(t), t ∈ Z, is the autocovariance function of a stationary random sequence if and only if for any t ∈ Z,

R(t) = ∫_{−π}^{π} e^{itλ} dF(λ)   (2)

where F is a right-continuous non-decreasing bounded function on [−π, π] with F(−π) = 0. The function F is determined by formula (2) uniquely.
Proof. 1. Suppose that (2) holds for a complex-valued function R on Z. Then R is positive semidefinite, since for any n ∈ N, any constants c1 , . . . , cn ∈ C and all t1 , . . . , tn ∈ Z

∑_{j=1}^{n} ∑_{k=1}^{n} cj \overline{ck} R(tj − tk ) = ∑_{j=1}^{n} ∑_{k=1}^{n} cj \overline{ck} ∫_{−π}^{π} e^{i(tj − tk )λ} dF(λ)
= ∫_{−π}^{π} [ ∑_{j=1}^{n} ∑_{k=1}^{n} cj \overline{ck} e^{itj λ} e^{−itk λ} ] dF(λ)
= ∫_{−π}^{π} | ∑_{j=1}^{n} cj e^{itj λ} |² dF(λ) ≥ 0,

because F is non-decreasing on [−π, π]. It means that R is the autocovariance function of a stationary random sequence.

2. Let R be the autocovariance function of a stationary random sequence; then it must be positive semidefinite, i.e.,

∑_{j=1}^{n} ∑_{k=1}^{n} cj \overline{ck} R(tj − tk ) ≥ 0 for all n ∈ N, c1 , . . . , cn ∈ C and t1 , . . . , tn ∈ Z.

Put tj = j, cj = e^{−ijλ} for a λ ∈ [−π, π]. Then for every n ∈ N, λ ∈ [−π, π],

ϕn (λ) := (1/(2πn)) ∑_{j=1}^{n} ∑_{k=1}^{n} e^{−i(j−k)λ} R(j − k) ≥ 0.

From here we get

ϕn (λ) = (1/(2πn)) ∑_{j=1}^{n} ∑_{k=1}^{n} e^{−i(j−k)λ} R(j − k)
= (1/(2πn)) ∑_{κ=−n+1}^{n−1} e^{−iκλ} R(κ) ∑_{j=max(1, κ+1)}^{min(n, κ+n)} 1
= (1/(2πn)) ∑_{κ=−n+1}^{n−1} e^{−iκλ} R(κ)(n − |κ|).

For any n ∈ N let us define the function

Fn (x) = 0 for x ≤ −π,   Fn (x) = ∫_{−π}^{x} ϕn (λ) dλ for x ∈ [−π, π],   Fn (x) = Fn (π) for x ≥ π.

Obviously, Fn (−π) = 0 and Fn (x) is non-decreasing on [−π, π]. Compute Fn (π):

Fn (π) = ∫_{−π}^{π} ϕn (λ) dλ = (1/(2πn)) ∫_{−π}^{π} [ ∑_{κ=−n+1}^{n−1} e^{−iκλ} R(κ)(n − |κ|) ] dλ
= (1/(2πn)) ∑_{κ=−n+1}^{n−1} R(κ)(n − |κ|) ∫_{−π}^{π} e^{−iκλ} dλ = R(0),

since the last integral is 2πδ(κ).

The sequence {Fn , n ∈ N} is a sequence of non-decreasing functions, 0 ≤ Fn (x) ≤ R(0) < ∞ for all x ∈ R and all n ∈ N. According to the Helly theorem there exists a subsequence {Fnk } ⊂ {Fn }, Fnk → F weakly as k → ∞, nk → ∞, where F is a non-decreasing bounded right-continuous function and F(x) = 0 for x ≤ −π, F(x) = R(0) for x > π.
From the Helly-Bray theorem for f(x) = e^{itx}, where t ∈ Z,

∫_{−π}^{π} e^{itλ} dFnk (λ) → ∫_{−π}^{π} e^{itλ} dF(λ) as k → ∞, nk → ∞.

On the other hand,

∫_{−π}^{π} e^{itλ} dFnk (λ) = ∫_{−π}^{π} e^{itλ} ϕnk (λ) dλ
= (1/(2πnk )) ∫_{−π}^{π} e^{itλ} [ ∑_{κ=−nk+1}^{nk−1} e^{−iκλ} R(κ)(nk − |κ|) ] dλ
= (1/(2πnk )) ∑_{κ=−nk+1}^{nk−1} R(κ)(nk − |κ|) ∫_{−π}^{π} e^{i(t−κ)λ} dλ,

thus,

∫_{−π}^{π} e^{itλ} dFnk (λ) = R(t)(1 − |t|/nk ) for |t| < nk , and = 0 elsewhere.

We get

lim_{k→∞} ∫_{−π}^{π} e^{itλ} dFnk (λ) = lim_{k→∞} R(t)(1 − |t|/nk ) = R(t) = ∫_{−π}^{π} e^{itλ} dF(λ).

To prove the uniqueness, suppose that R(t) = ∫_{−π}^{π} e^{itλ} dG(λ), where G is a right-continuous non-decreasing bounded function on [−π, π] and G(−π) = 0.
Then

∫_{−π}^{π} e^{itλ} dµF = ∫_{−π}^{π} e^{itλ} dµG ,

where µF and µG are finite measures on Borel subsets of the interval [−π, π] induced by the functions F and G, respectively. The rest of the proof follows from Lemma 1, which yields µF (B) = µG (B) for any B ⊂ (−π, π) and µF ({−π} ∪ {π}) = µG ({−π} ∪ {π}).

Formula (2) is called the spectral decomposition (representation) of an autocovariance function of a stationary random sequence. The function F is called the spectral distribution function of a stationary random sequence.
If there exists a function f(λ) ≥ 0 for λ ∈ [−π, π] such that F(λ) = ∫_{−π}^{λ} f(x) dx (F is absolutely continuous), then f is called the spectral density. Obviously f = F′.
In case the spectral density exists, the spectral decomposition of the autocovariance function is of the form

R(t) = ∫_{−π}^{π} e^{itλ} f(λ) dλ,   t ∈ Z.   (3)

Theorem 20. A complex-valued function R(t), t ∈ R, is the autocovariance function of a centered stationary mean square continuous process if and only if

R(t) = ∫_{−∞}^{∞} e^{itλ} dF(λ),   t ∈ R,   (4)

where F is a non-decreasing right-continuous function such that

lim_{x→−∞} F(x) = 0,   lim_{x→∞} F(x) = R(0) < ∞.

The function F is determined uniquely.


Proof. 1. Let R be a complex-valued function on R that satisfies (4), where F is a
non-decreasing right-continuous function, F (−∞) = 0, F (+∞) = R(0) < ∞. Then R
is positive semidefinite, moreover, it is continuous. According to Theorem 7, there exists
a stationary centered process with the autocovariance function R. Since R is continuous
(hence, continuous at zero), this process is mean square continuous which follows from
Theorem 15.
2. Suppose that R is the autocovariance function of a centered stationary mean
square continuous process. Then, it is positive semidefinite and continuous at zero. For
the proof that R satisfies (4), see, e.g., Anděl (1976), IV.1, Theorem 2.
Function F from Theorem 20 is called the spectral distribution function of a stationary mean square continuous stochastic process. If the spectral distribution function in (4) is absolutely continuous, its derivative f is again called the spectral density and (4) can be written in the form

R(t) = ∫_{−∞}^{∞} e^{itλ} f(λ) dλ,   t ∈ R.   (5)

Remark 5. Two different stochastic processes may have the same spectral distribution
functions and thus the same autocovariance functions.

6.3 Existence and computation of spectral density


Theorem 21. Let K be a complex-valued function of an integer-valued argument t ∈ Z, and let ∑_{t=−∞}^{∞} |K(t)| < ∞. Then

K(t) = ∫_{−π}^{π} e^{itλ} f(λ) dλ,   t ∈ Z,

where

f(λ) = (1/2π) ∑_{t=−∞}^{∞} e^{−itλ} K(t),   λ ∈ [−π, π].

Proof. Let K be such that ∑_{t=−∞}^{∞} |K(t)| < ∞ and f(λ) = (1/2π) ∑_{t=−∞}^{∞} e^{−itλ} K(t). Since the series ∑_{t=−∞}^{∞} e^{−itλ} K(t) converges absolutely and uniformly for λ ∈ [−π, π], we can interchange the integration and the summation, and for any t ∈ Z we get

∫_{−π}^{π} e^{itλ} f(λ) dλ = ∫_{−π}^{π} e^{itλ} [ (1/2π) ∑_{k=−∞}^{∞} e^{−ikλ} K(k) ] dλ
= (1/2π) ∑_{k=−∞}^{∞} K(k) ∫_{−π}^{π} e^{i(t−k)λ} dλ
= (1/2π) ∑_{k=−∞}^{∞} K(k) 2πδ(t − k) = K(t).

Theorem 22. Let {Xt , t ∈ Z} be a stationary sequence such that its autocovariance function R is absolutely summable, i.e., ∑_{t=−∞}^{∞} |R(t)| < ∞. Then the spectral density of the sequence {Xt , t ∈ Z} exists and for every λ ∈ [−π, π]

f(λ) = (1/2π) ∑_{k=−∞}^{∞} e^{−ikλ} R(k).   (6)
Proof. Since ∑_{t=−∞}^{∞} |R(t)| < ∞, it follows from the previous theorem that

R(t) = ∫_{−π}^{π} e^{itλ} f(λ) dλ,   t ∈ Z,

where

f(λ) = (1/2π) ∑_{t=−∞}^{∞} e^{−itλ} R(t),   λ ∈ [−π, π].

To prove that f is the spectral density, due to the uniqueness of the spectral decomposition (3), it suffices to prove that f(λ) ≥ 0 for every λ ∈ [−π, π].
We know from the proof of Theorem 19 that for every λ ∈ [−π, π],

ϕn (λ) = (1/(2πn)) ∑_{κ=−n+1}^{n−1} e^{−iκλ} R(κ)(n − |κ|) ≥ 0.

We will show that f(λ) = lim_{n→∞} ϕn (λ).

We have, as n → ∞,

|f(λ) − ϕn (λ)| ≤ (1/2π) | ∑_{|k|≥n} e^{−ikλ} R(k) | + (1/(2πn)) | ∑_{κ=−n+1}^{n−1} e^{−iκλ} R(κ)|κ| |
≤ (1/2π) ∑_{|k|≥n} |R(k)| + (1/(2πn)) ∑_{κ=−n+1}^{n−1} |R(κ)| |κ| → 0,

where we have used the assumption on the absolute summability of the autocovariance function and the Kronecker lemma.¹
Formula (6) is called the inverse formula for computing the spectral density of a
stationary random sequence.

Theorem 23. Let {Xt , t ∈ R} be a centered weakly stationary mean square continuous process. Let its autocovariance function R satisfy the condition ∫_{−∞}^{∞} |R(t)| dt < ∞. Then the spectral density of the process exists and it holds

f(λ) = (1/2π) ∫_{−∞}^{∞} e^{−itλ} R(t) dt,   λ ∈ (−∞, ∞).   (7)

The proof is quite analogous to the computation of a probability density function by


using the inverse Fourier transformation of the characteristic function (see, e. g. Štěpán,
1987, IV.5.3.)
Example 16. (White noise) Let {Xt , t ∈ Z} be a sequence of uncorrelated random variables with zero mean and a finite positive variance σ². The autocovariance function is cov(Xs , Xt ) = σ²δ(s − t) = R(s − t), the sequence is weakly stationary, and since ∑_{t=−∞}^{∞} |R(t)| = σ² < ∞ the spectral density exists and according to inverse formula (6)

f(λ) = (1/2π) ∑_{k=−∞}^{∞} e^{−ikλ} R(k) = (1/2π) R(0) = σ²/(2π),   λ ∈ [−π, π].

The spectral distribution function of the white noise sequence is

F(λ) = 0, λ ≤ −π,
     = σ²(λ + π)/(2π), λ ∈ [−π, π],
     = σ², λ ≥ π.

Notation: WN(0, σ²) (white noise)
¹ Kronecker lemma: ∑_{k=1}^{∞} ak < ∞ ⇒ (1/n) ∑_{k=1}^{n} k ak → 0 as n → ∞.

Example 17. Consider a stationary sequence with the autocovariance function R(t) = a^{|t|}, t ∈ Z, |a| < 1. Since

∑_{t=−∞}^{∞} |R(t)| = ∑_{t=−∞}^{∞} |a|^{|t|} = 1 + 2 ∑_{t=1}^{∞} |a|^t < ∞,

the spectral density exists and according to inverse formula (6)

f(λ) = (1/2π) ∑_{k=−∞}^{∞} e^{−ikλ} a^{|k|}
= (1/2π) ∑_{k=0}^{∞} e^{−ikλ} a^k + (1/2π) ∑_{k=−∞}^{−1} e^{−ikλ} a^{−k}
= (1/2π) ∑_{k=0}^{∞} (a e^{−iλ})^k + (1/2π) ∑_{k=1}^{∞} (a e^{iλ})^k
= (1/2π) · 1/(1 − a e^{−iλ}) + (1/2π) · a e^{iλ}/(1 − a e^{iλ})
= (1/2π) · (1 − a²)/|1 − a e^{−iλ}|² = (1/2π) · (1 − a²)/(1 − 2a cos λ + a²).
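A quick numerical sanity check of Example 17 (an added sketch, not from the notes; a and λ are arbitrary): the truncated sum in inverse formula (6) agrees with the closed-form density.

```python
import numpy as np

a, lam = 0.8, 0.7            # arbitrary |a| < 1 and frequency in [-pi, pi]
k = np.arange(-2000, 2001)   # truncation of the infinite sum in formula (6)

f_sum = (np.exp(-1j * k * lam) * a ** np.abs(k)).sum().real / (2 * np.pi)
f_closed = (1 - a**2) / (2 * np.pi * (1 - 2 * a * np.cos(lam) + a**2))
print(f_sum, f_closed)       # the two values agree up to the truncation error
```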
Example 18. A centered weakly stationary process with the autocovariance function R(t) = ce^{−α|t|}, t ∈ R, c > 0, α > 0, is mean square continuous. It holds

∫_{−∞}^{∞} |R(t)| dt = ∫_{−∞}^{∞} c e^{−α|t|} dt < ∞,

thus, the spectral density exists and by formula (7)

f(λ) = (1/2π) ∫_{−∞}^{∞} e^{−itλ} R(t) dt = (1/2π) ∫_{−∞}^{∞} e^{−itλ} c e^{−α|t|} dt
= (c/2π) ∫_{−∞}^{∞} (cos λt − i sin λt) e^{−α|t|} dt
= (c/π) ∫_{0}^{∞} cos(λt) e^{−αt} dt = (c/π) · α/(α² + λ²)

for every λ ∈ R.
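Similarly, formula (7) in Example 18 can be checked by discretizing and truncating the integral (an added sketch, not from the notes; c, α and λ are arbitrary).

```python
import numpy as np

c, alpha, lam = 1.0, 1.0, 2.0              # arbitrary parameters and frequency
t = np.linspace(-60.0, 60.0, 1_200_001)    # truncated, discretized version of the integral in (7)
dt = t[1] - t[0]

integrand = np.exp(-1j * t * lam) * c * np.exp(-alpha * np.abs(t))
f_num = (integrand.sum() * dt).real / (2 * np.pi)
f_closed = c * alpha / (np.pi * (alpha**2 + lam**2))
print(f_num, f_closed)                     # agree up to truncation/discretization error
```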
Example 19. Consider a centered mean square continuous process with the spectral distribution function

F(λ) = 0, λ < −1,
     = 1/2, −1 ≤ λ < 1,
     = 1, λ ≥ 1.
[Figure 3: Trajectories of a process with the autocovariance function R(t) = a^{|t|}; top: a = 0.8, bottom: a = −0.8]

The spectral distribution function is not absolutely continuous; the spectral density of the process does not exist. According to (4) the autocovariance function is

R(t) = ∫_{−∞}^{∞} e^{itλ} dF(λ) = (1/2)e^{−it} + (1/2)e^{it} = cos t,   t ∈ R.

The process has a discrete spectrum with non-zero values at the frequencies λ1 = −1, λ2 = 1.
Example 20. The process {Xt , t ∈ R} of uncorrelated random variables with zero mean
and a finite positive variance does not satisfy decomposition (4), since it is not mean
square continuous.

7 Spectral representation of stochastic processes


7.1 Orthogonal increment processes
Definition 30. Let {Xt , t ∈ T }, T an interval, be a (generally complex-valued) second order process on (Ω, A, P ). We say that {Xt , t ∈ T } is an orthogonal increment process, if for any t1 , . . . , t4 ∈ T such that (t1 , t2 ] ∩ (t3 , t4 ] = ∅,

E[(Xt2 − Xt1 )\overline{(Xt4 − Xt3 )}] = 0.

[Figure 4: Autocovariance function R(t) = a^{|t|} (left) and the spectral density (right), a = 0.8]

[Figure 5: Autocovariance function R(t) = a^{|t|} (left) and the spectral density (right), a = −0.8]

[Figure 6: Autocovariance function R(t) = ce^{−α|t|}, t ∈ R (left) and the spectral density (right), c = 1, α = 1]

We also say that the increments of the process are orthogonal random variables.

In what follows we will consider only centered right-mean square continuous processes
i.e., such that E|Xt − Xt0 |2 → 0 as t → t0 + for any t0 ∈ T.

Theorem 24. Let {Zλ , λ ∈ [a, b]} be a centered orthogonal increment right-mean square
continuous process, [a, b] a bounded interval. Then there exists a unique non-decreasing
right-continuous function F such that

F (λ) = 0, λ ≤ a,
= F (b), λ ≥ b, (8)
F (λ2 ) − F (λ1 ) = E|Zλ2 − Zλ1 |2 , a ≤ λ1 < λ2 ≤ b.

Proof. Define function

F (λ) = E|Zλ − Za |2 , λ ∈ [a, b]


= 0, λ ≤ a,
= F (b), λ ≥ b.

We will show that this function is non-decreasing, right-continuous and satisfies the
condition of the theorem. Obviously, it suffices to consider λ ∈ [a, b], only.
Let a < λ1 < λ2 < b. Then

F(λ2 ) = E|Zλ2 − Za |² = E|Zλ2 − Zλ1 + Zλ1 − Za |²
= E|Zλ2 − Zλ1 |² + E|Zλ1 − Za |² + E[(Zλ2 − Zλ1 )\overline{(Zλ1 − Za )}] + E[(Zλ1 − Za )\overline{(Zλ2 − Zλ1 )}]
= E|Zλ2 − Zλ1 |² + F(λ1 ),

since the increments Zλ2 − Zλ1 and Zλ1 − Za are orthogonal. From here we have

F (λ2 ) − F (λ1 ) = E|Zλ2 − Zλ1 |2 ≥ 0,

which means that F is non-decreasing and also right-continuous, due to the right-
continuity of the process {Zλ , λ ∈ [a, b]}. Condition (8) is satisfied.
Now, let G be a non-decreasing right-continuous function that satisfies conditions of
the theorem. Then G(a) = 0 = F (a) and for λ ∈ (a, b] it holds G(λ) = G(λ) − G(a) =
E|Zλ − Za |2 = F (λ) − F (a) = F (λ), which proves the uniqueness of function F .

The function F is bounded, non-decreasing, right-continuous, and we call it distribu-


tion function associated with the orthogonal increment process.

Example 21. The Wiener process on [0, T ] is a centered mean square continuous Gaussian process with independent and stationary increments, therefore with orthogonal increments, such that W0 = 0, Ws − Wt ∼ N (0, σ²|s − t|), 0 ≤ s, t ≤ T. The associated distribution function on [0, T ] is

F(λ) = 0, λ ≤ 0,
     = E|Wλ − W0 |² = σ²λ, 0 ≤ λ ≤ T,
     = σ²T, λ ≥ T.

Example 22. Let W̃λ be a transformation of the Wiener process on the interval [−π, π] given by W̃λ = W(λ+π)/(2π) , λ ∈ [−π, π].
The process {W̃λ , λ ∈ [−π, π]} is a Gaussian process with orthogonal increments and the associated distribution function

F(λ) = 0, λ ≤ −π,
     = σ²(λ + π)/(2π), λ ∈ [−π, π],
     = σ², λ ≥ π.

7.2 Integral with respect to an orthogonal increment process


Let {Zλ , λ ∈ [a, b]} be a centered right-mean square continuous process with orthogonal
increments on (Ω, A, P ), [a, b] a bounded interval, let F be the associated distribution
function of this process. Let µF be a measure induced by F.
Consider the space of complex-valued functions L2 ([a, b], B, µF ) := L2 (F ), i.e., the space of measurable functions f on [a, b] such that

∫_a^b |f(λ)|² dµF (λ) = ∫_a^b |f(λ)|² dF(λ) < ∞.

Recall the basic properties of this space.

Properties of L2 (F ) :

• The inner product on the space of functions L2 (F ) (more exactly, on equivalence classes of L2 (F ) with respect to the measure µF )² is defined by

hf, gi = ∫_a^b f(λ)\overline{g(λ)} dF(λ),   f, g ∈ L2 (F );

² f ∼ g if f = g µF -almost everywhere

• The norm in L2 (F ) is given by kf k = [ ∫_a^b |f(λ)|² dF(λ) ]^{1/2};

• Convergence in L2 (F ) means that

fn → f in L2 (F ) as n → ∞, if kfn − f k → 0, i.e.,

∫_a^b |fn (λ) − f(λ)|² dF(λ) → 0, n → ∞;

• Space L2 (F ) is complete (Rudin, 2003, Theorem 3.11).

Definition of the integral

I. Let f ∈ L2 (F ) be a simple function, i.e., for a = λ0 < λ1 < · · · < λn = b

f(λ) = ∑_{k=1}^{n} ck J(λk−1 , λk ] (λ),   (9)

where JA (y) = 1 for y ∈ A and JA (y) = 0 otherwise is the indicator function of a set A, c1 , . . . , cn are complex-valued constants, ck ≠ ck+1 , 1 ≤ k ≤ n − 1. We define

∫_{(a,b]} f(λ) dZ(λ) := ∑_{k=1}^{n} ck (Zλk − Zλk−1 ),   (10)

which is a random variable from the space L2 (Ω, A, P ).

Convention: Instead of ∫_{(a,b]} f(λ) dZ(λ) we will write ∫_a^b f(λ) dZ(λ).
Notation: ∫_a^b f(λ) dZ(λ) := I(f).
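To illustrate definition (10) and the properties listed in Theorem 25 below, here is an added simulation sketch (not from the notes); it takes Z to be a Wiener process on [0, 1], so that F(λ) = σ²λ as in Example 21, and an arbitrary simple function f with three steps.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma, n_paths = 1.0, 50_000                 # arbitrary illustration parameters
lam = np.array([0.0, 0.3, 0.7, 1.0])         # partition a = λ0 < λ1 < λ2 < λ3 = b
c = np.array([2.0, -1.0, 0.5])               # values of f on (λ_{k-1}, λ_k]

# orthogonal-increment process Z = Wiener process on [0, 1]:
# increments Z_{λk} - Z_{λ(k-1)} ~ N(0, sigma^2 (λk - λ(k-1))), independent
dZ = rng.normal(0.0, sigma * np.sqrt(np.diff(lam)), size=(n_paths, len(c)))

I_f = dZ @ c                                  # I(f) = sum_k c_k (Z_{λk} - Z_{λ(k-1)}), eq. (10)

print(I_f.mean())                             # ~0, property 1 of Theorem 25
print(I_f.var(), (c**2 * sigma**2 * np.diff(lam)).sum())
# the two numbers agree: E|I(f)|^2 = ∫ |f|^2 dF with F(λ) = σ^2 λ (property 3)
```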
Properties of the integral for simple functions:

Theorem 25. Let {Zλ , λ ∈ [a, b]} be a centered right-mean square continuous process with orthogonal increments and the associated distribution function F , let f , g be simple functions in L2 (F ), and α, β complex-valued constants. Then

1. E ∫_a^b f(λ) dZ(λ) = 0.

2. ∫_a^b [αf(λ) + βg(λ)] dZ(λ) = α ∫_a^b f(λ) dZ(λ) + β ∫_a^b g(λ) dZ(λ).

3. E[ ∫_a^b f(λ) dZ(λ) · \overline{∫_a^b g(λ) dZ(λ)} ] = ∫_a^b f(λ)\overline{g(λ)} dF(λ).

Proof. 1. Let f(λ) = ∑_{k=1}^{n} ck J(λk−1 , λk ] (λ). Then

E ∫_a^b f(λ) dZ(λ) = E ∑_{k=1}^{n} ck (Zλk − Zλk−1 ) = ∑_{k=1}^{n} ck E(Zλk − Zλk−1 ) = 0,

since {Zλ , λ ∈ [a, b]} is centered.

2. W. l. o. g., let

f(λ) = ∑_{k=1}^{n} ck J(λk−1 , λk ] (λ),   g(λ) = ∑_{k=1}^{n} dk J(λk−1 , λk ] (λ).

Then

∫_a^b [αf(λ) + βg(λ)] dZ(λ) = ∑_{k=1}^{n} (αck + βdk )(Zλk − Zλk−1 )
= α ∑_{k=1}^{n} ck (Zλk − Zλk−1 ) + β ∑_{k=1}^{n} dk (Zλk − Zλk−1 )
= α ∫_a^b f(λ) dZ(λ) + β ∫_a^b g(λ) dZ(λ).

3. Let

f(λ) = ∑_{k=1}^{n} ck J(λk−1 , λk ] (λ),   g(λ) = ∑_{k=1}^{n} dk J(λk−1 , λk ] (λ).

Then

E[ ∫_a^b f(λ) dZ(λ) · \overline{∫_a^b g(λ) dZ(λ)} ]
= E[ ∑_{k=1}^{n} ck (Zλk − Zλk−1 ) · \overline{∑_{k=1}^{n} dk (Zλk − Zλk−1 )} ]
= ∑_{k=1}^{n} ck \overline{dk} E|Zλk − Zλk−1 |²
= ∑_{k=1}^{n} ck \overline{dk} (F(λk ) − F(λk−1 )) = ∫_a^b f(λ)\overline{g(λ)} dF(λ).

II. Let f ∈ L2 (F ) be a measurable function. The set of simple functions is dense in L2 (F ) and its closure is L2 (F ) (Rudin, 2003, Theorem 3.13); it means that there exists a sequence of simple functions fn ∈ L2 (F ) such that fn → f in L2 (F ) as n → ∞.
The integral I(fn ) is defined for simple functions and I(fn ) ∈ L2 (Ω, A, P ). The sequence {I(fn )} is a Cauchy sequence in L2 (Ω, A, P ), because

E|I(fm ) − I(fn )|² = E[(I(fm ) − I(fn ))\overline{(I(fm ) − I(fn ))}]
= E[ ∫_a^b (fm (λ) − fn (λ)) dZ(λ) · \overline{∫_a^b (fm (λ) − fn (λ)) dZ(λ)} ]
= ∫_a^b (fm (λ) − fn (λ))\overline{(fm (λ) − fn (λ))} dF(λ)
= ∫_a^b |fm (λ) − fn (λ)|² dF(λ) → 0

as m, n → ∞, since fn → f in L2 (F ).

Since {I(fn )} is a Cauchy sequence in L2 (Ω, A, P ), it has a mean square limit

I(f) = l. i. m._{n→∞} I(fn ) := ∫_a^b f(λ) dZ(λ),   (11)

which is called the integral of the function f with respect to the process with orthogonal increments, or the stochastic integral.

Notice that I(f) does not depend on the choice of the sequence {fn }. Let f ∈ L2 (F ) and let fn and gn be simple, fn → f and gn → f in L2 (F ). Then I(fn ), I(gn ) have mean square limits I, J, respectively.
Define the sequence {hn } = {f1 , g1 , f2 , g2 , . . . }, which is simple and hn → f in L2 (F ). Then I(hn ) → K in mean square. Since the selected subsequences {I(fn )} and {I(gn )} have mean square limits, I ≡ J ≡ K.
Theorem 26. Let {Zλ , λ ∈ [a, b]} be a centered right-mean square continuous process
with orthogonal increments and the associated distribution function F . Then integral
(11) has the following properties.
Rb
1. Let f ∈ L2 (F ). Then EI(f ) = E a f (λ)dZ(λ) = 0.

2. Let f, g ∈ L2 (F ), α, β ∈ C be constants. Then I(αf + βg) = αI(f ) + βI(g).

3. Let f, g ∈ L2 (F ). Then
Z b
EI(f )I(g) = f (λ)g(λ)dF (λ). (12)
a

39
4. Let {fn , n ∈ N} and f be functions in L2 (F ), respectively. Then as n → ∞

fn → f in L2 (F ) ⇐⇒ I(fn ) → I(f ) in L2 (Ω, A, P ). (13)

Proof. 1. Let f ∈ L2 (F ) and let {fn , n ∈ N} be a sequence of simple functions in L2 (F )


such that fn → f in L2 (F ). Then I(f ) = l. i. m. I(fn ). Since EI(fn ) = 0, then also
EI(f ) = 0 (from the properties of the mean square convergence)

2. Let f, g ∈ L2 (F ) and let {fn , n ∈ N}, respectively {gn , n ∈ N} be sequences of


simple functions in L2 (F ) such that fn → f and gn → g in L2 (F ) respectively; thus,
I(fn ) → I(f ) and I(gn ) → I(g) in L2 (Ω, A, P ) (in mean square).
The sequence of simple functions hn = αfn + βgn converges to h = αf + βg in L2 (F ),
since
Z b
|αfn (λ) + βgn (λ) − (αf (λ) + βg(λ))|2 dF (λ)
a
Z b
2
≤ 2|α| |fn (λ) − f (λ)|2 dF (λ)
a
Z b
+ 2|β|2 |gn (λ) − g(λ)|2 dF (λ) → 0
a

We have:

• hn = αfn + βgn simple

• I(hn ) = I(αfn + βgn ) = αI(fn ) + βI(gn )

• hn → h in L2 (F ) ⇒ I(hn ) → I(h) in mean square

• h = αf + βg

• I(hn ) → αI(f ) + βI(g) in mean square, since

E|αI(fn ) + βI(gn ) − (αI(f ) + βI(g))|2


= E|α(I(fn ) − I(f )) + β(I(gn ) − I(g))|2
≤ 2|α|2 E|I(fn ) − I(f )|2 + 2|β|2 E|I(gn ) − I(g)|2 −→ 0

⇒ I(h) = I(αf + βg) = αI(f ) + βI(g).

3. Let f, g ∈ L2 (F ), {fn , n ∈ N} and {gn , n ∈ N} be sequences of simple functions,


fn → f and gn → g in L2 (F ). Thus, I(fn ) → I(f ) and I(gn ) → I(g) in mean square.

40
From the continuity of the inner product in L2 (Ω, A, P ) we have

EI(fn )I(gn ) = hI(fn ), I(gn )i → hI(f ), I(g)i = EI(f )I(g).

From the continuity of the inner product in L2 (F ) we have


Z b
EI(fn )I(gn ) = fn (λ)gn (λ)dF (λ) = hfn , gn i → hf, gi
a
Z b
= f (λ)g(λ)dF (λ),
a

from here (12) follows.

4. Letfn , f ∈ L2 (F ). According to 2 and 3,


Z b
2 2
E|I(fn ) − I(f )| = E|I(fn − f )| = |fn (λ) − f (λ)|2 dF (λ),
a

from which (13) follows.

Remark 6. let {Zλ , λ ∈ R} be a centered right-mean square continuous process with


orthogonal increments. Function F defined by

F (λ2 ) − F (λ1 ) = E|Zλ2 − Zλ1 |2 , −∞ < λ1 < λ2 < ∞

is non-decreasing, right-continuous and unique (up to an additive constant). If F is


bounded it induces a finite measure µF , and for f such that
Z ∞ Z ∞
2
|f (λ)| dµF (λ) = |f (λ)|2 dF (λ) < ∞,
−∞ −∞

Z ∞ Z b
f (λ)dZ(λ) := l. i. m. f (λ)dZ(λ) as a → −∞, b → ∞.
−∞ a

7.3 Spectral decomposition of a stochastic process


Theorem 27. Let Xt , t ∈ Z, be random variables such that
Z π
Xt = eitλ dZ(λ),
−π

where {Zλ , λ ∈ [−π, π]} is a centered right-mean square continuous process with orthog-
onal increments on [−π, π] and associated distribution function F . Then {Xt , t ∈ Z} is
a centered weakly stationary sequence with the spectral distribution function F .

41
Proof. The associated distribution function F of the process {Zλ , λ ∈ [−π, π]} is bounded,
non-decreasing and right-continuous, F (λ) = 0 for λ ≤ −π, F (λ) = F (π) < ∞ for λ ≥ π.
For t ∈ Z define function et to be

et (λ) = eitλ , −π ≤ λ ≤ π.

Then Z π Z π
2
|et (λ)| dF (λ) = |eitλ |2 dF (λ) = F (π) − F (−π) < ∞,
−π −π

it means that et ∈ L2 (F ) and eitλ dZ(λ) is well defined random variable.
Xt = −π
According to Theorem 26 we have

1. EXt = E −π eitλ dZ(λ) = 0 for any t ∈ Z

2.
Z π 2 Z π Z π
itλ itλ
2

E|Xt | = E
e dZ(λ) = E
e dZ(λ) eitλ dZ(λ)
−π −π −π
Z π Z π
itλ 2
= |e | dF (λ) = dF (λ) < ∞.
−π −π

3.
Z π Z π
i(t+h)λ
cov(Xt+h , Xt ) = E e dZ(λ) eitλ dZ(λ)
−π −π
Z π
ihλ
= e dF (λ) := R(h).
−π

From here we can conclude that



• cov(Xt+h , Xt ) = R(h) = −π eihλ dF (λ) depends on h only

• sequence {Xt , t ∈ Z} is centered and weakly stationary

• function F has the same properties as the spectral distribution function (see spec-
tral decomposition of the autocovariance function, Theorem 19)

• from the uniqueness of the spectral decomposition (2) it follows that F is the
spectral distribution function of the sequence {Xt , t ∈ Z}.

42
Example 23. Let Wfλ be a transformation of the Wiener process to the interval [−π, π]
fλ = W(λ+π)/2π , λ ∈ [−π, π]. Then random variables
given by W
Z π
Xt = eitλ dW
f (λ), t ∈ Z
−π

are centered, uncorrelated with the same variance σ 2 ; the sequence {Xt , t ∈ Z} is the
Gaussian white noise, see also Examples 16 and 22.
Example 24. Consider a sequence of functions {ftk , t ∈ Z} on [−π, π] defined by
k
X
ftk (λ) = eitλj J(λj−1 ,λj ] (λ),
j=1

where −π = λ0 < λ1 < · · · < λk = π and k ∈ N is given. Let {Zλ , λ ∈ [−π, π]} be
a centered right-mean square continuous process with orthogonal increments on [−π, π]
with the associated distribution function F . We can see that ftk are simple functions in
L2 (F ), thus we can compute
Z π Xk k
X
Xtk := ftk (λ)dZ(λ) = eitλj (Zλj − Zλj−1 ) = eitλj Zej .
−π j=1 j=1

Here Zej = Zλj − Zλj−1 , j = 1, . . . , k, are uncorrelated random variables with zero mean
and the variance E|Zej |2 = E|Zλj − Zλj−1 |2 = F (λj ) − F (λj−1 ) := σj2 . Then we have
k
X
EXtk = 0, EX(t+h),k Xtk = eihλj σj2 := Rk (h).
j=1

We see that {Xtk , t ∈ Z} is stationary and its autocovariance function has the spectral
decomposition Z π
Rk (h) = eihλ dFXk (λ)
−π
where FXk is the spectral distribution function of {Xtk , t ∈ Z}; it has jumps at points λj
such that FXk (λj ) − FXk (λj−1 ) = σj2 . On the other hand, σj2 = F (λj ) − F (λj−1 ). Since
F (−π) = FXk (−π) = 0, F equals to FXk at least at points λj , j = 0, 1, . . . , k.

Theorem 28. Let {Xt , t ∈ Z} be a centered weakly stationary sequence with spec-
tral distribution function F . Then there exists a centered orthogonal increment process
{Zλ , λ ∈ [−π, π]} such that
Z π
Xt = eitλ dZ(λ), t ∈ Z (14)
−π

and
E|Z(λ) − Z(−π)|2 = F (λ), −π ≤ λ ≤ π.

43
Proof. Brockwell and Davis (1991), Theorem 4.8.2 or Prášková (2016), Theorem 4.4.
Relation (14) is called the spectral decomposition of a stationary random sequence.

Remark 7. Theorem 28 says that any random variable of a centered stationary P itλj ran-
dom sequence can be approximated (in the mean square limit) by a sum e Yj of
uncorrelated random variables Yj , the variance of which is an increment of the spectral
distribution function at points (frequencies) λj−1 and λj .

Theorem 29. Let {Xt , t ∈ R} be a centered weakly stationary mean square continuous
process. Then there exists an orthogonal increment process {Zλ , λ ∈ R} such that
Z ∞
Xt = eitλ dZ(λ), t ∈ R, (15)
−∞

and the associated distribution function of the process {Zλ , λ ∈ R} is the spectral distri-
bution function of the process {Xt , t ∈ R}.

Proof. Priestley (1981), Chap. 4.11


Relation (15) is said to be spectral decomposition of a stationary mean square con-
tinuous process.

Theorem 30. Let {Xt , t ∈ Z} be a centered stationary sequence with a spectral distri-
bution function F. Let H{Xt , t ∈ Z} be the Hilbert space generated by {Xt , t ∈ Z}. Then
U ∈ H{Xt , t ∈ Z} if and only if
Z π
U= ϕ(λ)dZ(λ), (16)
−π

where ϕ ∈ L2 (F ) and {Zλ , λ ∈ [−π, π]} is the orthogonal increment process as given in
the spectral decomposition of the sequence {Xt , t ∈ Z}.

Proof. 1. Let U ∈ H{Xt , t ∈ Z}. Then either U ∈ M{Xt , t ∈ Z} (linear span), or


U = l. i. m. n→∞ Un , Un ∈ M{Xt , t ∈ Z}.
a) Let U ∈ M{Xt , t ∈ Z}; then U = N
P
j=1 cj Xtj , for c1 , . . . , cN ∈ C and t1 , . . . , tN ∈ Z.
From the spectral decomposition (14)
N
X N
X Z π 
itj λ
U = cj Xtj = cj e dZ(λ)
j=1 j=1 −π
" N
#
Z π X Z π
itj λ
= cj e dZ(λ) = ϕ(λ)dZ(λ),
−π j=1 −π

44
where ϕ(λ) = N itj λ
P
j=1 cj e . Obviously, ϕ is a finite linear combination of functions from
L2 (F ), thus ϕ ∈ L2 (F ).
b) Let U = l. i. m. n→∞ Un , Un ∈ M{Xt , t ∈ Z}. According to a)
Z π
Un = ϕn (λ)dZ(λ), ϕn ∈ L2 (F ).
−π

Here, {Un } is the Cauchy sequence in H{Xt , t ∈ Z} (it is convergent there) thus, {ϕn }
is the Cauchy sequence in L2 (F ), since
Z π Z π 2
2

E|Um − Un | = E ϕm (λ)dZ(λ) − ϕn (λ)dZ(λ) =
−π −π
Z π 2 Z π

E [ϕm (λ) − ϕn (λ)]dZ(λ) = |ϕm (λ) − ϕn (λ)|2 dF (λ).
−π −π

Hence, there exists ϕ ∈ L2 (F ) such that ϕn → ϕ v L2 (F ). By (13)


Z π Z π
Un = ϕn (λ)dZ(λ) → ϕ(λ)dZ(λ) v L2 (Ω, A, P ),
−π −π

thus U = −π
ϕ(λ)dZ(λ).
2. Let U be a random variable that satisfies (16). Since ϕ ∈ L2 (F ), there exists a
(n)
sequence of trigonometric polynomials ϕn (λ) = nk=−n ck eiλtk on [−π, π] such that
P (n)

ϕn → ϕ v L2 (F ). According to (13)
Z π Z π
ϕ(λ)dZ(λ) = l. i. m. ϕn (λ)dZ(λ),
−π n→∞ −π

hence
" n
#
Z π Z π X (n)
(n) iλtk
U = ϕ(λ)dZ(λ) = l. i. m. ck e dZ(λ)
−π n→∞ −π k=−n
n Z π 
X (n)
(n) iλtk
= l. i. m. ck e dZ(λ)
n→∞ −π
k=−n
Xn
(n)
= l. i. m. ck Xt(n) ∈ H{Xt , t ∈ Z}.
n→∞ k
k=−n

45
8 Linear models of time series
8.1 White noise
Recall that the white noise sequence WN(0, σ 2 ) is defined as a sequence {Yt , t ∈ Z} of
uncorrelated random variables with mean zero and variance 0 < σ 2 < ∞, the autoco-
variance function
RY (t) = σ 2 δ(t), t ∈ Z
and the spectral density
σ2
fY (λ) = , λ ∈ [−π, π].

Moreover, Z π
Yt = eitλ dZY (λ),
−π

where ZY = {Zλ , λ ∈ [−π, π]} is a process with orthogonal increments and associated
distribution function
σ2
F (λ) = (λ + π), λ ∈ [−π, π]

that is same as the spectral distribution function FY (λ) of {Yt , t ∈ Z}.

8.2 Moving average sequences


Definition 31. A random sequence {Xt , t ∈ Z} defined by

Xt = b0 Yt + b1 Yt−1 + · · · + bn Yt−n , t ∈ Z, (17)

where {Yt , t ∈ Z} is a white noise WN(0, σ 2 ) and b0 , b1 , . . . , bn are real- or complex-valued


constants, b0 6= 0, bn 6= 0, is called to be a moving average sequence of order n.

Notation: MA(n)
1
Remark 8. In special case, bi = n+1
,i = 0, . . . , n.

The basic properties of MA(n) :


1. EXt = 0 for all t ∈ Z.

46
2. The autocovariance function (computation in time domain): For t ≥ 0,
n n
!
X X
cov(Xs+t , Xs ) = EXs+t Xs = E bj Ys+t−j bk Ys−k
j=0 k=0
n X
X n
= bj bk E(Ys+t−j Ys−k )
j=0 k=0
Xn X n
2
=σ bj bk δ(t − j + k)
j=0 k=0
n−t
X
= σ2 bt+k bk , 0≤t≤n
k=0
= 0, t > n.
For t ≤ 0 we proceed analogously. Since cov(Xs+t , Xs ) depends on t, only, we can
conclude that the sequence is weakly stationary.
3. Spectral decomposition. By using the spectral decomposition of the white noise
we obtain
X n n
X Z π 
i(t−j)λ
Xt = bj Yt−j = bj e dZY (λ)
j=0 j=0 −π
" n
#
Z π X
= bj ei(t−j)λ dZY (λ)
−π j=0
" n #
Z π X
= eitλ bj e−ijλ dZY (λ)
−π j=0
Z π
= eitλ g(λ)dZY (λ),
−π
Pn
where g(λ) = j=0 bj e−ijλ ∈ L2 (F ). From the properties of the stochastic integral we
again get EXt = 0, and for the autocovariance function (in spectral domain) we have
Z π Z π
i(s+t)λ
EXs+t Xs = E e g(λ)dZY (λ) eisλ g(λ)dZY (λ)
Z π−π −π

= ei(s+t)λ g(λ)e−isλ g(λ)dFY (λ)


Z−π
π
= eitλ |g(λ)|2 fY (λ)dλ
−π
Z π
σ2
= eitλ |g(λ)|2 dλ = RX (t),
−π 2π

47
that is again a function of t which confirms the weak stationarity. Due to the unique-
ness of the spectral decomposition of the autocovariance function (Theorem 19) we can
σ2
conclude that function |g(λ)|2 2π is the spectral density of the sequence {Xt , t ∈ Z}.
We have just proved the following theorem.

Theorem 31. The moving average sequence {Xt , t ∈ Z} of order n defined by (17) is
centered and weakly stationary, with the autocovariance function
n−t
X
2
RX (t) = σ bk+t bk , 0 ≤ t ≤ n, (18)
k=0

= RX (−t), −n ≤ t ≤ 0,

= 0, |t| > n.

The spectral density fX of sequence (17) exists and is given by


n 2
σ 2 X −ikλ
fX (λ) = bk e , λ ∈ [−π, π]. (19)



k=0

Remark 9. For real-valued constants b0 , . . . , bn the autocovariance function of the se-


quence MA(n) takes form
n−|t|
X
2
RX (t) = σ bk bk+|t| , |t| ≤ n, (20)
k=0
= 0, |t| > n.

8.3 Linear process


Theorem 32. Let {Yt , t ∈ Z} be a white noise WN(0, σ 2 ) and {cj , j ∈ N0 } be a sequence
of complex-valued
P∞ constants. P∞
2
1. If j=0 |cj | < ∞, the series j=0 cj Yt−j converges in mean square for every
t ∈ Z, i.e., for every t ∈ Z there exists a random variable Xt such that
n
X
Xt = l. i. m. cj Yt−j .
n→∞
j=0

2. If ∞
P P∞
j=0 |cj | < ∞, the series j=0 cj Yt−j converges for every t ∈ Z absolutely with
probability one.

48
0.2
−0.5

0.1
−1.0

0.0
0 2 4 6 8 10 −π −π 2 0 π 2 π

t λ

Obrázek 5.1: Autokorelační funkce (vlevo) a spektrální hustota (vpravo) posloupnosti


MA(1): Xt = Yt + 0,8 Yt−1 ; Yt ∼ N (0, 1)

1.0

1.5
0.5

1.0
f(λ)
0.0
r(t)

0.5
−0.5
−1.0

0.0
0 2 4 6 8 10 −π −π 2 0 π 2 π

t λ

Obrázek 5.2: Autokorelační funkce (vlevo) a spektrální hustota (vpravo) posloupnosti


Figure 7: Autocorrelation function
MA(2): Xt = Yt + Yt−1 − 2 Yt−2 ; Y(left) and spectral density (right) of the MA(2) se-
t ∼ N (0, 1)

quence Xt = Yt + Yt−1 − 2 Yt−2 . Yt : Gaussian white noise ∼ N (0, 1)

Proof. 1. We will show that { nj=0 cj Yt−j , n ∈ N} is the Cauchy sequence in L2 (Ω, A, P )
P
for every t ∈ Z.
W. l. o. g., assume that m < n. Since Yk are uncorrelated with a constant variance σ 2
we easily get
2 2
Xm n
X Xn
E cj Yt−j − ck Yt−k = E cj Yt−j


j=0 k=0 j=m+1
Xn n
X
= |cj |2 E|Yt−j |2 = σ 2 |cj |2 → 0
j=m+1 j=m+1

asPm, n → ∞ which means thatPthere exists a mean square limit of the sequence
{ nj=0 cj Yt−j }, that we denote by ∞ cY .
√j=0 j t−j
2 12
2. Since E|Yt−j | ≤ (E|Yt−j | ) = σ 2 < ∞, we can see that

X ∞
X ∞
X
E|cj Yt−j | = |cj |E|Yt−j | ≤ σ |cj | < ∞,
j=0 j=0 j=0

P∞
and thus j=0 |cj Yt−j | converges almost surely (Rudin, 2003, Theorem 1.38).

Theorem 33. Let {Xt , t ∈ Z} be a weakly stationary centered random sequence with an
P function R, let {cj , j ∈ N0 } be a sequence
autocovariance P∞of complex-valued constants
such that ∞ |c
j=0 j | < ∞. Then for any t ∈ Z the series j=0 cj Xt−j converges in mean
square and also absolutely with probability one.

49
Proof. 1. For m < n we have
2 !2
Xn n
X
E cj Xt−j ≤ E |cj ||Xt−j |


j=m+1 j=m+1
n
X X n
= |cj ||ck |E|Xt−j ||Xt−k |.
j=m+1 k=m+1

The weak stationarity and the Schwarz inequality imply


1 1
E|Xt−j ||Xt−k | ≤ (E|Xt−j |2 ) 2 (E|Xt−k |2 ) 2 = R(0),
thus, 2 !2
n
X n
X
E cj Xt−j ≤ R(0) |cj | →0


j=m+1 j=m+1

as m, n → ∞. We have proved the mean square convergence.


2. From the weak stationarity we also get

X ∞
X ∞
X
p
E|cj Xt−j | = |cj |E|Xt−j | ≤ R(0) |cj | < ∞,
j=0 j=0 j=0

from which the rest of the proof follows.


2
Definition 32. Let {YPt , t ∈ Z} be a white noise WN(0, σ ) and {cj , j ∈ N0 } a sequence

of constants such that j=0 |cj | < ∞. A random sequence {Xt , t ∈ Z} defined by

X
Xt = cj Yt−j , t∈Z (21)
j=0

is called causal linear process.


Remark 10. The causality means that the random variable Xt depends on Ys , s ≤ t
(contemporary and past variables, only). Sometimes we also use notation MA(∞).
Remark 11. A weaker condition ∞ 2
P
j=0 |cj | < ∞ implies the mean square convergence
in the series defined by (21), only.
Theorem 34. The causalP linear process {Xt , t ∈ Z} defined by (21), where {Yt , t ∈
Z} is WN(0, σ 2 ) and ∞ j=0 |cj | < ∞, is a centered weakly stationary sequence with the
autocovariance function

X
2
RX (t) = σ ck+t ck , t ≥ 0, (22)
k=0

= RX (−t), t ≤ 0.

50
The spectral density fX of sequence (21) exists and takes values
2

σ 2 X −ikλ
fX (λ) = ck e , λ ∈ [−π, π]. (23)



k=0

Proof. Notice that


(n) (n) Pn
• {Xt , t ∈ Z} where Xt := j=0 cj Yt−j is MA(n) for a fix n.
P∞ (n)
• j=0 |cj | < ∞ ⇒ Xt → Xt in mean square as n → ∞ for every t ∈ Z.
(n)
• {Xt , t ∈ Z} is centered, weakly stationary, with the autocovariance function
(18). It means that {Xt , t ∈ Z} is centered, since it is the mean square limit of
the centered sequence.
(n)
• According to Theorem 12, the autocovariance function of {Xt , t ∈ Z} converges
to the autocovariance function of {Xt , t ∈ Z}.

We have proved (22) and the stationarity of sequence (21). Further, notice that the
(n)
sequence {Xt , t ∈ Z} has the spectral decomposition
Z π n
X
(n)
Xt = itλ
e gn (λ)dZY (λ), gn (λ) = cj e−ijλ ∈ L2 (FY ).
−π j=0

P∞ −ijλ
If we denote g(λ) = j=0 cj e , we have
∞ 2
Z π X Z
π
2 −ijλ
|gn (λ) − g(λ)| dFY (λ) = cj e dFY (λ)


−π −π j=n+1
Z π ∞
! 2 ∞
! 2
X X
2
≤ |cj | fY (λ)dλ = σ |cj | → 0.
−π j=n+1 j=n+1

Thus, gn → g in L2 (FY ). According to Theorem 26,


Z π Z π
(n) itλ
Xt = e gn (λ)dZY (λ) −→ eitλ g(λ)dZY (λ)
−π −π

(n)
in mean square as n → ∞ and simultaneously, Xt → Xt in mean square, which means
that Z π
Xt = eitλ g(λ)dZY (λ).
−π

51
Further, the computation
Z π Z π
i(s+t)λ
EXs+t Xs = E e g(λ)dZY (λ) eisλ g(λ)dZY (λ)
−π −π
Z π Z π 2
itλ 2 itλ 2σ
= e |g(λ)| dFY (λ) = e |g(λ)| dλ
−π −π 2π

results in the spectral decomposition of the autocovariance function of the sequence (21).
σ2
Function 2π |g(λ)|2 is the spectral density of the process (21).

Example 25. Let us consider a causal linear process such that



X
Xt = cj Yt−j , t ∈ Z, cj = ϕj , |ϕ| < 1
j=0

and Yt ∼ W N (0, σ 2 .
The process is centered, weakly stationary, with the autocovariance function

ϕt
RX (t) = σ 2 , t ≥ 0,
1 − ϕ2
= RX (−t), t ≤ 0.

The spectral density is


2

σ2 X
j −ijλ σ2 1
fX (λ) = ϕe = , λ ∈ [−π, π].

2π 2π |1 − ϕe−iλ |2


j=0

Further, we can write



X ∞
X ∞
X
j j
Xt = ϕ Yt−j = Yt + ϕ Yt−j = Yt + ϕ ϕj−1 Yt−j
j=0 j=1 j=1

X
= Yt + ϕ ϕk Yt−1−k
k=0
= ϕXt−1 + Yt . (24)

The sequence {Xt , t ∈ Z} defined by (24) is called autoregressive sequence of order one,
AR(1).

52
8.4 Autoregressive sequences
Definition 33. A random sequence {Xt , t ∈ Z} is called to be an autoregressive sequence
of order n, notation AR(n), if it satisfies equation
Xt = ϕ1 Xt−1 + · · · + ϕn Xt−n + Yt , t ∈ Z, (25)
where ϕ1 , . . . , ϕn are real-valued constants, ϕn 6= 0 and {Yt , t ∈ Z} is a white noise.

Equivalently, Xt can be defined by


n
X
Xt + a1 Xt−1 + · · · + an Xt−n = aj Xt−j = Yt , (26)
j=0

where a0 = 1.

We want to express the AR(n) sequence as a causal linear process. First, we define
a backward-shift operator by
BXt = Xt−1 , B 0 Xt = Xt , B k Xt = B k−1 (BXt ) = Xt−k , k ∈ Z.
Using this operator, relation (26) can be shortly written in the form a(B)Xt = Yt where
a(B) is a polynomial operator, formally identical with the algebraic polynomial a(z) =
1P+ a1 z + · · · + an z n . Similarly, let {ck , k ∈ Z} be a sequence of constants such that

k=−∞ |ck | < ∞. The series
X∞
c(z) = ck z k
k=−∞

is absolutely convergent at least inside the unite circle and defines the operator

X
c(B) = ck B k . (27)
k=−∞

This operator has usual properties of algebraic power series.


Theorem 35. Let {Xt , t ∈ Z} be the autoregressive sequence of order n defined by (26).
If all the roots of the polynomial a(z) = 1 + a1 z + · · · + an z n lie outside the unit circle
in C then {Xt , t ∈ Z} is a causal linear process, i.e.,

X
Xt = cj Yt−j , t ∈ Z,
j=0

where cj are defined by



X 1
c(z) = cj z j = , |z| ≤ 1.
j=0
a(z)

53
The autocovariance function of this sequence is given by (22) and the spectral density
is
σ2 1
fX (λ) = Pn , λ ∈ [−π, π], (28)
2π | j=0 aj e−ijλ |2
where a0 = 1.

Proof. Consider the AR(n) sequence,

Xt + a1 Xt−1 + · · · + an Xt−n = a(B)Xt = Yt .

If all the roots zi , i = 1, . . . , n, of the polynomial a(z) = 1 + a1 z + · · · + an z n are outside


the unit circle, then a(z) 6= 0 for |z| ≤ 1. Since |zi | ≥ min1≤i≤n |zi | ≥ 1 + δ > 1 for a
1
δ > 0, a(z) 6= 0 for |z| < 1 + δ, and c(z) = a(z) is holomorphic in the region |z| < 1 + δ
and has a representation
X∞
c(z) = cj z j , |z| < 1 + δ. (29)
j=0

The series in (29) is


Pabsolutely convergent in any closed circle with the radius r < 1 + δ

which means that j=0 |cj | < ∞ and c(z)a(z) = 1, |z| ≤ 1. Thus,

X
c(B)a(B)Xt = Xt = c(B)Yt = cj Yt−j .
j=0

We have proved that the sequence {Xt , t ∈ Z} is the causal linear process, that satisfies
Theorem 34. It is centered and weakly stationary, with the autocovariance function (22)
and spectral density
2
2 X ∞
σ −ikλ
σ 2 −iλ 2
fX (λ) = e ck = c(e )
2π k=0 2π


σ2 1 σ2 1
= −iλ
= Pn .
2π |a(e )| 2 2π | j=0 aj e−ijλ |2

If all the roots of a(z) are simple, we can obtain coefficients cj in the representation
(29) by using decomposition into partial fractions:

1 A1 A2 An
c(z) = = + + ··· +
a(z) z1 − z z2 − z zn − z

54
where A1 , . . . , An are constants that can be determined. For |z| ≤ 1 and |zj | > 1,
∞  k
Aj Aj Aj X z
=  = ,
zj − z zj 1 − zzj zj k=0 zj
n n ∞  k
X Aj X Aj X z
c(z) = =
z −z
j=1 j
z
j=1 j k=0
zj
∞ n ∞
X X
kAj X
= z k+1
= ck z k ,
k=0
z
j=1 j k=0
n
X Aj
ck = k+1
.
j=1
zj

Since for all i = 1, . . . , n we have |zi | ≥ 1 + δ > 1, it holds


n
1 X
|ck | < |Aj |
(1 + δ)k+1 j=1

from which we conclude that ∞


P
k=0 |ck | < ∞.
If the roots of the polynomial a(z) are multiple, we proceed analogously.
Coefficients ck can be also obtained by solving the system of equations
c0 = 1,
c1 + a1 c0 = 0,
c2 + a1 c1 + a2 c0 = 0,
...
cp + a1 cp−1 + · · · + an cp−n = 0, p = n, n + 1, . . . ,
that we get if we compare coefficients with the same powers of z at both sides of the
relation a(z)c(z) = 1. The system of equations
cp + a1 cp−1 + · · · + an cp−n = 0
for p ≥ n can be solved as a system of homogeneous difference equations of order n with
constant coefficients, and initial conditions c0 , c1 . . . , cn−1 .
Yule-Walker equations
The autocovariance function of a stationary and real-valued autoregression sequence
can be alternatively computed by using so-called Yule-Walker equations. Let us consider
a sequence {Xt , t ∈ Z},
Xt + a1 Xt−1 + · · · + an Xt−n = Yt , (30)

55
that satisfies conditions of Theorem 35, with real-valued coefficients a1 , . . . , an and with
{Yt , t ∈ Z} being the real white noise WN(0, σ 2 ). Since the sequence {Xt , t ∈ Z} is a
real-valued causal linear process and Yt are uncorrelated, it can be easily proved that
EXs Yt = hXs , Yt i = 0 for s < t.

Multiplying (30) by Yt and taking the expectation we get

EXt Yt + a1 EXt−1 Yt + · · · + an EXt−n Yt = EYt2 ,

thus,
EXt Yt = σ 2 .
Multiplying (30) by Xt−k for k ≥ 0 and taking the expectation we get a system of
equations
EXt Xt−k + a1 EXt−1 Xt−k + · · · + an EXt−n Xt−k = EYt Xt−k ,
or, if we put RX (t) = R(t),

R(0) + a1 R(1) + · · · + an R(n) = σ 2 , k = 0, (31)


R(k) + a1 R(k − 1) + · · · + an R(n − k) = 0, k ≥ 1. (32)

Equations (31) and (32) are called Yule-Walker equations.

Solution: Dividing (32) for k ≥ 1 by R(0) we get equations for the autocorrelation
function r(t) = R(t)/R(0).

• First solve the system for k = 1, . . . , n − 1 :

r(1) + a1 + a2 r(1) + a3 r(2) + · · · + an r(n − 1) = 0,


r(2) + a1 r(1) + a2 + a3 r(1) + · · · + an r(n − 2) = 0,
...
r(n − 1) + a1 r(n − 2) + · · · + an r(1) = 0.

• Values r(1), . . . , r(n − 1) together with r(0) = 1 serve as initial conditions to solve
the system of difference equations

r(k) + a1 r(k − 1) + · · · + an r(n − k) = 0, k ≥ n,

with the characteristic polynomial

λn + a1 λn−1 + · · · + an−1 λ + an = L(λ).

56
In this way we get the solution r(t) for t ≥ 0. For a real-valued sequence, r(t) = r(−t).
If we insert R(k) = r(k)R(0) into (31) we get the equation for R(0) :

R(0)[1 + a1 r(1) + · · · + an r(n)] = σ 2 ,

thus
σ2
R(0) = · (33)
1 + a1 r(1) + · · · + an r(n)

Remark 12. If zi , i = 1, . . . , n, are the roots of the polynomial a(z) = 1+a1 z +· · ·+an z n ,
then λi = zi−1 , i = 1, . . . , n, are the roots of the polynomial L(z) = z n + a1 z n−1 + · · · + an .
The AR(n) sequence is a causal linear process, if all the roots of the polynomial L(z)
are inside the unit circle.

Example 26. Consider the AR(1) sequence

Xt + aXt−1 = Yt , Yt ∼ WN(0, σ 2 ), |a| < 1.

Polynomial a(z) = 1 + az has the root − a1 that is outside the unit circle; it means that
{Xt , t ∈ Z} is a weakly stationary causal linear process. The Yule - Walker equations
for the autocovariance function RX (t) = R(t) are now

R(0) + aR(1) = σ 2 ,
R(k) + aR(k − 1) = 0, k ≥ 1,

A general solution to the difference equation for the autocorrelation function is r(k) =
c(−a)k , the initial condition is r(0) = 1 = c. Value R(0) can be determined from formula
(33):
σ2 σ2
R(0) = = ·
1 + a r(1) 1 − a2
Example 27. Consider the AR(2) sequence
3 1
Xt − Xt−1 + Xt−2 = Yt , Yt ∼ WN(0, σ 2 ).
4 8
The polynomial a(z) = 1 − 34 z + 18 z 2 has roots z1 = 2, z2 = 4, {Xt , t ∈ Z} is the causal
linear process that is weakly stationary. The Yule-Walker equations are
3 1
R(0) − R(1) + R(2) = σ 2 ,
4 8
3 1
R(k) − R(k − 1) + R(k − 2) = 0, k ≥ 1.
4 8

57
The equations for the autocorrelation function are of the form
3 1
r(1) −
+ r(1) = 0,
4 8
3 1
r(k) − r(k − 1) + r(k − 2) = 0, k ≥ 2. (34)
4 8
Solving the first equation we get r(1) = 23 . For k ≥ 2 we solve the second order difference
equation with initial conditions r(0) = 1, r(1) = 32 .
Characteristic equation L(λ) = λ2 − 34 λ + 81 = 0 has two different real-valued roots
λ1 = 12 , λ2 = 41 . A general solution of the difference equation (34) is
 k  k
1 1
r(k) = c1 λk1 + c2 λk2 = c1 + c2 .
2 4
Constants c1 , c2 satisfy

c1 + c2 = r(0),
λ1 c1 + λ2 c2 = r(1),

so that c1 = 35 , c2 = − 23 , and
 k  k
5 1 2 1
r(k) = − , k = 0, 1, . . .
3 2 3 4
r(k) = r(−k), k = −1, −2, . . .

The value of R(0) can be obtained from (33).

8.5 ARMA sequences


Definition 34. A random sequence {Xt , t ∈ Z} satisfies an ARMA(m, n) model if

Xt + a1 Xt−1 + · · · + am Xt−m = Yt + b1 Yt−1 + · · · + bn Yt−n , t ∈ Z, (35)

where ai , i = 1, . . . , m, bi , i = 1, . . . , n, are real constants, am 6= 0, bn 6= 0 and the se-


quence {Yt , t ∈ Z} is a white noise.

Equivalently we can write

Xt = ϕ1 Xt−1 + · · · + ϕm Xt−m + Yt + θ1 Yt−1 + · · · + θn Yt−n .

The model called ARMA(m, n) is a mixed model of autoregressive and moving average
sequences.

58
Consider polynomials a(z) = 1 + a1 z + · · · + am z m and b(z) = 1 + b1 z + · · · + bn z n .
Then we can write ARMA(m, n) model in the form

a(B)Xt = b(B)Yt . (36)


Theorem 36. Let {Xt , t ∈ Z} be the random ARMA(m, n) sequence given by (36).
Suppose that the polynomials a(z) and b(z) have no common roots and all the roots of
the polynomial a(z) = 1 + a1 z + · · · + am z m are outside the unit circle. Then Xt is of
the form

X
Xt = cj Yt−j , t ∈ Z,
j=0

where coefficients cj satisfy



X b(z)
c(z) = cj z j = , |z| ≤ 1.
j=0
a(z)

The spectral density of the sequence {Xt , t ∈ Z} is


Pn
σ 2 | j=0 bj e−ijλ |2
fX (λ) = , λ ∈ [−π, π], (37)
2π | m −ikλ |2
P
k=0 ak e

where a0 = 1, b0 = 1.
Proof. We proceed analogously as in the Proof of Theorem 35. Since all the roots of the
polynomial a(z) are lying outside the unit circle, for |z| ≤ 1 it holds
∞ ∞
1 X
j
X
= h(z) = hj z , where |hj | < ∞.
a(z) j=0 j=0

Thus, h(z)a(z) = 1 for |z| ≤ 1 and if we apply the operator h(B) to both sides of
equations (36), we have
h(B)a(B)Xt = Xt = h(B)b(B)Yt = c(B)Yt ,
where c(z) = b(z)/a(z) and ∞
P
j=0 |cj | < ∞.
The sequence {Xt , t ∈ Z} is the causal linear process with the autocovariance function
(22) and the spectral density
∞ 2 2
σ 2 X −ijλ σ 2 −iλ 2 σ 2 b(e−iλ )

fX (λ) = cj e = c(e ) =
2π j=0 2π 2π a(e−iλ )


Pn
σ 2 | j=0 bj e−ijλ |2
= .
2π | m −ikλ |2
P
k=0 ak e

59
Remark 13. If the polynomials a(z) a b(z) have common roots the polynomial c(z) =
b(z)/a(z) defines an ARMA(p, q), process with p < m, q < n.

Example 28. Consider the ARMA(1, 1) model

Xt + aXt−1 = Yt + bYt−1 , t ∈ Z,

where Yt is a white noise WN(0, σ 2 ), a, b 6= 0, a 6= b, |a| < 1.

We have a(z) = 1 + az, b(z) = 1 + bz, the roots za = − a1 , zb = − 1b , respectively, are


different and |za | > 1. All the assumptions of the previous theorem are satisfied.
For |z| ≤ 1,
∞ ∞
1 + bz X
j j
X
c(z) = = (1 + bz) (−a) z = cj z j ,
1 + az j=0 j=0

and if we compare the coefficients with the same powers of z we obtain

c0 = 1, cj = (−a)j−1 (b − a), j ≥ 1.

The autocovariance function of the sequence {Xt , t ∈ Z} is



X
2
RX (k) := R(k) = σ cj cj+|k| , k ∈ Z.
j=0

Computation of R(0) :
∞ ∞
" #
X X 2
R(0) = σ 2 c2j = σ 2 1 + aj−1 (b − a)
j=0 j=1
2
(b − a) 1 − 2ab + b2
 
= σ2 1 + = σ2 .
1 − a2 1 − a2

For k ≥ 1,
∞ ∞
" #
X X
R(k) = σ 2 cj cj+k = σ 2 c0 ck + cj cj+k
j=0 j=1

" #
X
= σ 2 (−a)k−1 (b − a) + (−a)k (b − a)2 (−a)2j
j=0
2
(b − a)
 
= σ 2 (−a)k−1 (b − a) + (−a)k
1 − a2
1 − ab
= σ 2 (−a)k−1 (b − a) = (−a)k−1 R(1).
1−a 2

60
The spectral density is

σ 2 |1 + be−iλ |2 σ 2 1 + 2b cos λ + b2
fX (λ) = · = · , λ ∈ [−π, π].
2π |1 + ae−iλ |2 2π 1 + 2a cos λ + a2

We can also use an analogy of the Yule-Walker equations. Multiplying (35) by


Xt−k , k ≥ 0 and taking the expectation we get
m
X n
X
EXt Xt−k + aj EXt−j Xt−k = EYt Xt−k + bj EYt−j Xt−k .
j=1 j=1

From Theorem 36 and properties of the white noise,


(
σ 2 ck−j , k ≥ j,
EXt−j Yt−k =
0, k<j

and the previous equation for k ≥ 0 can be written in the form


m
X n
X
R(k) + aj R(k − j) = σ 2 bj cj−k , k ≤ n, (38)
j=1 j=k
Xm
R(k) + aj R(k − j) = 0 k > n. (39)
j=1

For k ≥ max(m, n + 1), (39) is solved as a difference equation with initial conditions that
can be obtained from the system of equations for k < max(m, n + 1).

Example 29. Consider again the ARMA(1, 1) model

Xt + aXt−1 = Yt + bYt−1 , t ∈ Z,

where a 6= b 6= 0, |a| < 1, Yt ∼ WN(0, σ 2 ).


Equations (38) a (39) are of the form

R(0) + aR(1) = σ 2 + b(b − a)σ 2 ,


R(1) + aR(0) = σ 2 b,
R(k) + aR(k − 1) = 0, k ≥ 2.

The difference equation R(k) + aR(k − 1) = 0 with an initial condition for R(1) has the
solution R(k) = (−a)k−1 R(1), k ≥ 1.

61
The values of R(1) a R(0) will be computed from the first and the second equations:
1  2
σ (1 − 2ab + b2 ) ,

R(0) =
1 − a2
1  2 
R(1) = σ (b − a)(1 − ab)
1 − a2
which is the same result as before.

Definition 35. Let {Xt , t ∈ Z} be the stationary ARMA(m, n) sequence defined by


(36),
a(B)Xt = b(B)Yt , t ∈ Z,
where {Yt , t ∈ Z} is a white noise WN(0, σ 2 ). The sequence {Xt , t ∈ Z}Pis said to be
invertible, if there exists a sequence of constants {dj , j ∈ N0 } such that ∞
j=0 |dj | < ∞
and ∞
X
Yt = dj Xt−j , t ∈ Z. (40)
j=0

Let us study conditions under which an ARMA sequence is invertible.

Theorem 37. Let {Xt , t ∈ Z} be the stationary ARMA(m, n) random sequence defined
by (36). Let the polynomials a(z) and b(z) have no common roots and the polynomial
b(z) = 1 + b1 z + · · · + bn z n has all the roots outside the unit circle. Then {Xt , t ∈ Z}
is invertible and ∞
X
Yt = dj Xt−j , t ∈ Z,
j=0

where coefficients dj are defined by



X a(z)
d(z) = dj z j = , |z| ≤ 1.
j=0
b(z)

Proof. The theorem can be proved analogously as Theorem 36 by inverting the polyno-
mial b(z). The correctness of all operations is guaranteed by Theorem 33 since we assume
that {Xt , t ∈ Z} is stationary.

Remark 14. Let us notice that the equation d(z)b(z) = a(z) with polynomials a(z) =
1 + a1 z + · · · + am z m , b(z) = 1 + b1 z + · · · + bn z n , respectively, implies d0 = 1. Relation
(40) can be written as
X∞
Xt + dj Xt−j = Yt , t ∈ Z. (41)
j=1

The invertible ARMA(m, n) sequence can be thus expressed as an AR(∞) sequence.

62
8.6 Linear filters
Definition 36. Let {Yt , t ∈ Z} be a centered weakly stationary
P∞ sequence. Let {ck , k ∈
Z} be a sequence of (complex-valued) numbers such that j=−∞ |cj | < ∞.
We say that a random sequence {Xt , t ∈ Z} is obtained by filtration of a sequence
{Yt , t ∈ Z}, if
X∞
Xt = cj Yt−j , t ∈ Z. (42)
j=−∞

The sequence {cj , j ∈ Z} is called time-invariant linear filter. Provided that cj = 0 for
all j < 0, we say that the filter {cj , j ∈ Z} is causal

Theorem 38. Let {Yt , t ∈ Z} be a centered weakly stationary sequence with an auto-
covariance function RY and spectral density fY and P let {ck , k ∈ Z} be a linear filtr such
that ∞ ∞
P
−∞ k|c | < ∞. Then {X t , t ∈ Z}, where X t = k=−∞ ck Yt−k , is a centered weakly
stationary sequence with the autocovariance function

X ∞
X
RX (t) = cj ck RY (t − j + k), t∈Z
j=−∞ k=−∞

and the spectral density

fX (λ) = |Ψ(λ)|2 fY (λ), λ ∈ [−π, π], (43)

where ∞
X
Ψ(λ) = ck e−ikλ
k=−∞

for λ ∈ [π, π] is called the transfer function of the filter.

Proof. Let Xt = nk=−n ck Yt−k ; obviously, for each t ∈ Z, Xt → Xt in mean square


(n) P (n)

as n → ∞. Rπ
For any t ∈ Z, Yt has the spectral decomposition Yt = −π eitλ dZY (λ), where ZY is
a process with orthogonal increments and the associated distribution function FY (λ).
Thus
Xn Xn Z π
(n)
Xt = ck Yt−k = ck ei(t−k)λ dZY (λ)
k=−n k=−n −π
Z π n
X Z π
itλ −ikλ
= e ck e dZY (λ) = eitλ hn (λ)dZY (λ),
−π k=−n −π

63
where hn (λ) = n−n ck e−ikλ . For the same reasons as in the proof
P
P of Theorem 34, hn
converges to a function Ψ in the space L2 (FY ), where Ψ(λ) = ∞ c
k=−∞ k e −ikλ
, and by
Theorem 26, Z π
(n)
Xt = l. i. m. Xt = eitλ Ψ(λ)dZY (λ)
n→∞ −π

for any t ∈ Z.
(n)
Since {Xt , t ∈ Z} is centered, {Xt , t ∈ Z} is also centered, and according to
(n)
Theorem 12 the autocovariance functions {Xt , t ∈ Z} converge to the autocovariance
function of {Xt , t ∈ Z}, and so
n
X n
X
n
EXs+t Xs = lim EXs+t Xsn = lim cj ck E(Ys+t−j Ys−k )
n→∞ n→∞
j=−n k=−n

X ∞
X
= cj ck RY (t − j + k) := RX (t).
j=−∞ k=−∞

Since EXs+t Xs = RX (t) is a function of one variable, only, {Xt , t ∈ Z} is weakly


stationary.
It also holds
Z π Z π
i(t+s)λ
RX (t) = E e Ψ(λ)dZY (λ) eisλ Ψ(λ)dZY (λ)
Z π−π Z−ππ
= eitλ |Ψ(λ)|2 dFY (λ) = eitλ |Ψ(λ)|2 fY (λ)dλ
−π −π

and from the spectral decomposition of the autocovariance function (Theorem 19) it
follows that the function
|Ψ(λ)|2 fY (λ) := fX (λ)
is the spectral density of the sequence {Xt , t ∈ Z}.

Example 30. Let {Yt , t ∈ Z} be P a white noise WN(0, σ 2 ) sequence, {ck , k ∈ Z} be



P∞ such that k=−∞ |ck | < ∞. Then the linear process defined
a sequence of constants
by formula Xt = k=−∞ ck Yt−k is obtained by a linear filtration of the white noise.
Similarly, a causal linear process is obtained by a filtration of the white noise by using a
causal linear filter with ck = 0, k < 0.

Example 31. Let {Xt , t ∈ Z} be a random sequence defined by Xt = ϕXt−1 + Yt , where


Yt are elements of a white noise sequence and |ϕ| > 1.

64
Then {Xt , t ∈ Z} is not a causal linear process, but we can write

X
Xt = − ϕ−k Yt+k .
k=1

In this case, we have the linear filter, such that


(
0, k≥0
ck = k
−(ϕ) , k < 0.

9 Selected limit theorems


9.1 Laws of large numbers
Definition 37. We say that a stationary sequence {Xt , t ∈ Z} with mean value µ is
mean square ergodic or it satisfies the law of large numbers in L2 (Ω, A, P ), if, as n → ∞,
n
1X
Xt → µ in mean square. (44)
n t=1

If {Xt , t ∈ Z} is a sequence that is mean square ergodic then


n
1X P
Xt −→ µ,
n t=1

i.e., {Xt , t ∈ Z} satisfies the weak law of large numbers for stationary sequences.

Theorem 39. A stationary random sequence {Xt , t ∈ Z} with mean value µ and auto-
covariance function R is mean square ergodic if and only if
n
1X
R(t) → 0 as n → ∞. (45)
n t=1

Proof. W. l. o. g. put µ = 0 (otherwise we consider X et := Xt − µ). Consider the spectral


decomposition Z π
Xt = eitλ dZ(λ),
−π

where {Zλ , λ ∈ [−π, π]} is the orthogonal increment process with the associated distri-
bution function F, which is same as the spectral distribution function of {Xt , t ∈ Z}.

65
Then
n n n
1 X π itλ  Z π1 X
Z
1X itλ

Xt = e dZ(λ) = e dZ(λ)
n t=1 n t=1 −π −π n t=1
Z π
= hn (λ)dZ(λ),
−π

where
n iλ
(1−einλ )
(
1 X itλ 1e
, λ 6= 0,
hn (λ) = e = n 1−eiλ
n t=1 1, λ = 0.
Further, let us consider function
(
0, λ 6= 0,
h(λ) =
1, λ=0

and define the random variable


Z π
Z0 = h(λ)dZ(λ).
−π

Obviously, hn (λ) → h(λ) for any λ ∈ [−π, π]. Moreover, hn → h in L2 (F ), since


|hn (λ) − h(λ)|2 ≤ 4 and by the Lebesgue theorem, as n → ∞,
Z π
|hn (λ) − h(λ)|2 dF (λ) → 0.
−π

Hence, as n → ∞
n Z π Z π
1X
Xt = hn (λ)dZ(λ) → h(λ)dZ(λ) = Z0
n t=1 −π −π

in mean square.
Now, it suffices to show that
n
1X
Z0 = 0 a. s. ⇐⇒ R(t) → 0 as n → ∞. (46)
n t=1

From Theorem 26 we have EZ0 = 0; thus Z0 = 0 a. s. if and only if E|Z0 |2 = 0.


Further from Theorem 26,
Z π 2 Z π
2
E|Z0 | = E h(λ)dZ(λ) = |h(λ)|2 dF (λ).

−π −π

66
From the spectral decomposition of the autocovariance function and the Lebesgue
theorem
n n Z n
1X 1 Xh π itλ i Z π1 X 
R(t) = e dF (λ) = eitλ dF (λ)
n t=1 n k=1 −π −π n t=1
Z π Z π Z π
= hn (λ)dF (λ) → h(λ)dF (λ) = |h(λ)|2 dF (λ) (47)
−π −π −π

The rest of the proof follows from (46) and(47).

Example 32. Let us consider the AR(1) process

Xt = ϕXt−1 + Yt , Yt ∼ WN(0, σ 2 ), |ϕ| < 1.

We know that the autocovariance function of {Xt , t ∈ Z} is

σ2
RX (t) = ϕ|t| .
1 − ϕ2
Obviously,
n n
1X σ2 1 X t 1 σ 2 ϕ(1 − ϕn )
RX (t) = ϕ = →0
n t=1 1 − ϕ2 n t=1 n 1 − ϕ2 1 − ϕ
as n → ∞, from which we conclude that {Xt , t ∈ Z} is mean square ergodic.

Example 33. Let {Xt , t ∈ Z} be a stationary mean square ergodic sequence with
expected value µ and autocovariance function RX . Define a random sequence {Zt , t ∈ Z}
by
Zt = Xt + Y, t ∈ Z,
where EY = 0, varY = σ 2 ∈ (0, ∞), and EXt Y = 0 ∀t ∈ Z.
Then EZt = EXt + EY = µ for all t ∈ Z and

E(Zs+t − µ)(Zt − µ) = RX (s) + σ 2 := RZ (s),

from which we get that {Zt , t ∈ Z} is weakly stationary. However, it is not mean square
ergodic, since, as n → ∞,
n n
1X 1X
RZ (t) = RX (t) + σ 2 → σ 2 > 0.
n t=1 n t=1

67
Theorem 40. Let {Xt , t ∈ Z} be a real-valued stationary sequence with mean value µ
and autocovariance function R, such that ∞
P
t=−∞ |R(t)| < ∞. Then, as n → ∞

n
1X
Xn = Xt → µ in mean square, (48)
n t=1

X
n var X n → R(k). (49)
k=−∞
P∞ 1
Pn
Proof. 1. k=−∞ |R(k)| < ∞ ⇒ R(k) → 0 as k → ∞, thus n k=1 R(k) → 0 as
n → ∞ and assertion (48) follows from Theorem 39.
2. We have
n
1 X 
var X n = var Xk
n k=1
n
1 hX XX i
= var Xk + cov (Xj , Xk )
n2 k=1 1≤j6=k≤n
n−1
1h X i
= nR(0) + 2 (n − j)R(j)
n2 j=1
n−1 
1h X j i
= R(0) + 2 1− R(j)
n j=1
n
n−1 
|j|

1 X
= 1− R(j). (50)
n j=−n+1 n

Thus,
n−1 n−1
X 2X
n varX n = R(j) − jR(j).
j=−n+1
n j=1

Assertion (49) now follows from the assumptions of the theorem and from the Kronecker
lemma.

Remark 15. From Theorem 40 we also get



X
lim n var X n = R(k) = 2πf (0)
n→∞
k=−∞

where f is the spectral density of the sequence {Xt , t ∈ Z}.

68
Definition 38. A stationary mean square continuous process {Xt , t ∈ R} with mean
value µ is mean square ergodic if, as τ → ∞,
1 τ
Z
Xt dt → µ in mean square.
τ 0

Remark 16. The existence of the integral 0 Xt dt is guarranted by Theorem 18 since the
autocovariance function of the stationary mean square continuous process is continuous
and the expected value µ is constant.

Theorem 41. A stationary, mean square continuous process {Xt , t ∈ R} is mean square
ergodic if and only if its autocovariance function satisfies condition
1 τ
Z
R(t)dt → 0 as τ → ∞.
τ 0

Proof. Rozanov (1963), Chap. 1, § 6.

Theorem 42. Let {Xt , t ∈ R} be a real-valued stationary, mean Rsquare continuous



process, with mean value µ and autocovariance function R, such that −∞ |R(t)|dt < ∞.
Then, as τ → ∞
1 τ
Z
Xτ = Xt dt → µ in mean square, (51)
τ 0
Z ∞
τ var X τ → R(t)dt. (52)
−∞

Proof. Bosq and Nguyen (1996), Theorems 9.11 and 15.1.

Example 34. Let {Xt , t ∈ R} be a stationary centered stochastic process with the
autocovariance function

R(t) = ce−α|t| , t ∈ R, α > 0, c > 0.

The process is mean square continuous. Moreover,

1 τ c τ −αt c 1 − e−ατ
Z Z
R(t)dt = e dt = →0
τ 0 τ 0 τ α
2c
as τ → ∞, the process {Xt , t ∈ R} is mean square ergodic and τ var X τ → α
.

69
9.2 Central limit theorems
Some preliminary asymptotic results

Theorem 43. (Cramér-Slutsky Theorem) Let {Xn , n ∈ N}, {Yn , n ∈ N} be sequences of


D P
random variables and X be a random variable such that, as n → ∞, Xn −→ X, Yn −→ 0.
D
Then Xn + Yn −→ X as n → ∞.

Proof. Brockwell and Davis (1991), Proposition 6.3.3.

Theorem 44. Let {ξn , n ∈ N}, {Skn , n ∈ N, k ∈ N}, {ψk , k ∈ N} and ψ be random
variables such that
D
1. Skn −→ ψk , n → ∞, for all k = 1, 2, . . . ,
D
2. ψk −→ ψ, k → ∞,

3. limk→∞ limn→∞ P (|ξn − Skn | > ) = 0 for all  > 0.

Then
D
ξn −→ ψ as n → ∞.

Proof. Brockwell and Davis (1991), Proposition 6.3.9.

Theorem 45. (Lévy-Lindeberg CLT) Let {Yt , t ∈ Z} be a sequence of independent iden-


tically distributed
Pn random variables with mean µ and finite positive variance σ 2 . Let
1
Y n = n j=1 Yj . Then, as n → ∞

√ Yn−µ D
n −→ N (0, 1). (53)
σ
Proof. Brockwell and Davis (1991), Theorem 6.4.1.

Theorem 46. (Cramér-Wold Theorem) Let X, X1 , X2 , . . . , be k-dimensional random


vectors. Then
D
Xn −→ X as n → ∞
if and only if for every c ∈ Rk

c0 Xn −→ c0 X
D
pro n → ∞.

Proof. Brockwell and Davis (1991), Proposition 6.3.1.

70
Central limit theorems for stationary sequences
Theorem 47. Let {Xt , t ∈ Z} be a random sequence defined by
m
X
Xt = µ + bj Yt−j ,
j=0

where µ ∈ R, {Yt , t ∈ Z} is a strict white noise, i.e., a sequence of independent iden-


tically distributed (i. i. d.) random variables with zero mean and P finite positive variance
σ 2 . Let b0 = 1 and b1 , . . . , bm be real-valued constants such that m j=0 bj 6= 0. Then, as
n → ∞,
n
1 X D
√ (Xt − µ) −→ N (0, ∆2 ), (54)
n t=1
P 2
m
where ∆2 = σ 2 j=0 jb .

Proof. We can write


n n m
1 X 1 X X 
√ (Xt − µ) = √ bj Yt−j
n t=1 n t=1 j=0
n n n
1 X b1 X bm X
=√ Yt + √ Yt−1 + · · · + √ Yt−m
n t=1 n t=1 n t=1
n n
1 X b1  X 
=√ Yt + √ Yt + Y0 − Yn + . . .
n t=1 n t=1
n 0 n
bm  X X X 
+ √ Yt + Yk − Yj
n t=1 k=−m+1 j=n−m+1
m  1 X n
X 1
= bj √ Yt + √ ξn ,
j=0
n t=1 n

where
m
X m
X  m−1
X m
X 
ξn = Y1−s bj − Yn−s bj
s=1 j=s s=0 j=s+1

is a finite linear combination of 2m i. i. d. random variables Y0 , Y−1 , . . . , Y−m+1 and


Yn , Yn−1 , . . . , Yn−m+1 with zero mean and variance σ 2 .
According to Theorem 45, √1n nt=1 Yt −→ N (0, σ 2 ) as n → ∞. From here
P D

Xm  1 X n Xm 2
D 2 2 2
bj √ Yt −→ N (0, ∆ ), where ∆ = σ bj . (55)
j=0
n t=1 j=0

71
P
Now, using Theorem 43 it suffices to proof that √1n ξn −→ 0 as n → ∞. But it holds,
since as n → ∞
1 σ 2 · const
   
1 1 1 2

P √ ξn >  ≤ 2 E ξn = 2 → 0.
n  n  n

Theorem 48. Let {Xt , t ∈ Z} be a random sequence such that



X
Xt = µ + bj Yt−j ,
j=0

where µ ∈ R, {Yt , t ∈ Z} is a sequence of i. i. d. random variables with zeroPmean and


finitePpositive variance σ 2 . Let bj , j ∈ N0 , be real-valued constants such that ∞
j=0 |bj | <

∞, j=0 bj 6= 0 and b0 = 1. Then, as n → ∞,

n
1 X D
√ (Xt − µ) −→ N (0, ∆2 ),
n t=1
P 2

where ∆2 = σ 2 j=0 bj .

Proof. Choose k ∈ N. Then


k
X ∞
X
Xt − µ = bj Yt−j + bj Yt−j =: Ukt + Vkt ,
j=0 j=k+1

thus n n n
1 X 1 X 1 X
√ (Xt − µ) = √ Ukt + √ Vkt .
n t=1 n t=1 n t=1
If we denote
n n n
1 X 1 X 1 X
ξn = √ (Xt − µ), Skn =√ Ukt , Dkn =√ Vkt ,
n t=1 n t=1 n t=1

we have
ξn = Skn + Dkn .
From Theorem 47 we have, as n → ∞ and every k ∈ N
D
Skn −→ ψk , (56)

72
where ψk ∼ N (0, ∆2k ), ∆2k = σ 2 ( kj=0 bj )2 . Further, from the assumptions of the theorem
P
it follows that !2 !2
X k X∞
∆2k = σ 2 bj → σ2 bj = ∆2 ,
j=0 j=0

as k → ∞, and thus
D
ψk −→ N (0, ∆2 ). (57)
According to the Chebyshev inequality
1
P (|ξn − Skn | > ) = P (|Dkn | > ) ≤ varDkn
2 !
n
1 1 X
= var √ Vkt .
2 n t=1

From the assumption ∞


P
j=0 |bj | < ∞ and Theorem 34 it follows that for any k ∈ N,
{Vkt ,
P∞ t ∈ Z} is the centered stationary sequence with the autocovariance function RV (t) =
2
σ j=k+1 bj bj+|t| . Using formula (50) we can write

n
1  1 X 
P (|ξn − Skn | > ) ≤ var √ Vkt
2 n t=1
n−1 n−1
1 X  |j|  1 X
= RV (j) 1 − ≤ |RV (j)|
2 j=−n+1 n 2 j=−n+1
n−1
1h X i
= 2 RV (0) + 2 |RV (j)|
 j=1
∞ ∞
n−1 X
σ2 h X 2 X i
= b + 2 b b

ν ν+j
2 j=k+1 j

j=1 ν=k+1
∞ n−1 X ∞
σ2 h X 2 X i
≤ 2 bj + 2 |bν ||bν+j |
 j=k+1 j=1 ν=k+1
∞ ∞ ∞ ∞
σ2 h X 2
X X i σ2  X 2
≤ 2 |bj | + 2 |bν | |bν+j | = 2 |bj | ,
 j=k+1 ν=k+1 j=1
 j=k+1

so that !2

σ2 X
lim lim P (|ξn − Skn | > ) ≤ lim 2 |bj | =0 (58)
k→∞ n→∞ k→∞ 
j=k+1

for any  > 0.

73
Combining this result with (56) and (57) we can see that the assumptions of Theorem
44 are met and thus, as n → ∞,
n
1 X D
ξn = √ (Xt − µ) −→ N (0, ∆2 ).
n t=1

Example 35. Let us consider a sequence {Xt , t ∈ Z}, defined by

Xt = µ + Zt , Zt = aZt−1 + Yt ,

where µ ∈ R, |a| < 1 and {Yt , t ∈ Z} is a strict white noise with finite variance σ 2 > 0.
P∞
The assumption |a| < 1 implies that j=0 |a|j < ∞, thus

X
Xt = µ + aj Yt−j , t ∈ Z.
j=0
P∞
Since j=0 aj 6= 0, it holds, as n → ∞
n
1 X D 1
√ (Xt − µ) −→ N (0, ∆2 ), ∆2 = σ 2 .
n t=1 (1 − a)2
 
σ2
For large n, X n ≈ N µ, n(1−a)2 .

Definition 39. We say that a strictly stationary sequence {Xt , t ∈ Z} is m-dependent,


where m ∈ N0 is a given number, if for every t ∈ Z, the sets of random variables
(. . . , Xt−1 , Xt ) and (Xt+m+1 , Xt+m+2 , . . . ) are independent.
Remark 17. A sequence of i. i. d. random variables is m−dependent with m = 0.
Example 36. An MA(m) sequence generated from a strict white noise is the sequence
of m-dependent random variables.
Example 37. Let {Yt , t ∈ Z} be a strict white noise. Define {Xt , t ∈ Z} by

Xt = Yt Yt+m , t ∈ Z,

for some m ∈ N. Then


• EXt = E(Yt Yt+m ) = 0,

• EXs Xt = E(Yt Yt+m Ys Ys+m ) = 0 pro t 6= s.

74
In this case, Xt are mutually uncorrelated but not independent. They are m−dependent.

Theorem 49. Let {Xt , t ∈ Z} be a real-valued strictly stationary centered m-dependent


random sequence with finite second-order moments and autocovariance function R, such
that m
X
2
∆m = R(k) 6= 0.
k=−m

Then, as n → ∞,

n varX n −→ ∆2m , (59)


n
1 X D
√ Xt −→ N (0, ∆2m ). (60)
n t=1

Proof. 1. Since the sequence {Xt , t ∈ Z} is strictly stationary with finite second-order
moments, it is weakly stationary. From m-dependence it follows that R(k) = 0 for
|k| > m. According to Theorem 40 we have

X m
X
lim n varX n = R(k) = R(k) = ∆2m .
n→∞
k=−∞ k=−m

2. Let k > 2m and n = k · r, where k ∈ N, r ∈ N. Then

(X1 , . . . , Xn ) = (U1 , V1 , U2 , V2 , . . . , Ur , Vr ),
Uj = (X(j−1)k+1 , . . . , Xjk−m ), j = 1, . . . , r,
Vj = (Xjk−m+1 , . . . , Xjk ), j = 1, . . . , r.

U1 , . . . , Ur are mutually independent (it follows from m-dependence and the assumption
k > 2m) and identically distributed (from strict stationarity). Similarly, V1 , . . . , Vr are
i. i. d. Thus,
n
X Xr Xr
Xt = Sj + Tj ,
t=1 j=1 j=1

Sj , j = 1, . . . , r, are i. i. d. (Sj is the sum of elements of the vector Uj ,)


Tj , j = 1, . . . , r, are i. i. d. (the sum of elements of the vectors Vj ).

For k > 2m we have ES1 = 0, ET1 = 0 and


m
X
var S1 = var (X1 + · · · + Xk−m ) = (k − m − |ν|)R(ν) = ∆2mk .
ν=−m

75
Similarly, utilizing the strict stationarity we have
var T1 = var (Xk−m+1 + · · · + Xk ) = var (X1 + · · · + Xm )
m−1
X
2
= (m − |ν|)R(ν) = δm .
ν=−m+1

Now, we can write


n
1 X
√ Xt := ξn = Skn + Dkn , (61)
n t=1
where
n/k r
1 X 1 1 X
Skn =√ Sj = √ √ Sj , (62)
n j=1 k r j=1
n/k r
1 X 1 1 X
Dkn =√ Tj = √ √ Tj . (63)
n j=1 k r j=1
Form the Lévy-Lindeberg theorem, as r → ∞,
r
1 X D
√ Sj −→ N (0, ∆2mk ).
r j=1

For a fixed k and r → ∞ also n → ∞, so that


D
Skn −→ ψk , (64)
 
∆2mk
where ψk has the normal distribution N 0, k . As k → ∞,
m
∆2mk X
→ R(j) = ∆2m ,
k j=−m

D
ψk −→ N (0, ∆2m ). (65)
From the Chebyshev inequality,
r
1 1 X 
P (|ξn − Skn | > ) = P (|Dkn | > ) ≤ · var Tj
2 n j=1
1 1 1 2
= · var T1 = δ .
2 k 2 k m
Thus,
lim lim P (|ξn − Skn | > ) = 0 (66)
k→∞ n→∞

and the proof follows from (64), (65), (66) and Theorem 44.

76
Example 38. Consider the sequence {Xt , t ∈ Z}

Xt = µ + Yt + a1 Yt−1 + a2 Yt−2 , t ∈ Z,

where Yt are i. i. d., EYt = 0, var Yt = σ 2 > 0.


The sequence {Xt , t ∈ Z} is strictly stationary and m-dependent, m = 2. The
autocovariance function of {Xt , t ∈ Z} takes values

R(0) = σ 2 (1 + a21 + a22 ),


R(1) = σ 2 (a1 + a1 a2 ) = R(−1),
R(2) = σ 2 a2 = R(−2),
R(k) = 0, |k| > 2.

Therefore
m
X
∆2m = R(k) = R(0) + 2R(1) + 2R(2) = σ 2 (1 + a1 + a2 )2 .
k=−m

Pn D
From the previous theorem, √1 − µ) −→ N (0, ∆2m ), provided ∆2m 6= 0.
n t=1 (Xt

Example 39. Let {Yt , t ∈ Z} be a sequence of i. i. d. random variables, EYt = 0, var Yt =


σ 2 , EYt4 < ∞. Prove that for every k > 0 as n → ∞ it holds
n
1 X 2 D
√ (Yt − σ 2 ) −→ N (0, τ 2 ),
n t=1
n
1 X D
√ Yt Yt+k −→ N (0, σ 4 ),
n t=1
n−k
1 X D
√ Yt Yt+k −→ N (0, σ 4 ),
n t=1
n
1 X D
√ Xt −→ Nk (0, σ 4 I),
n t=1

where τ 2 = var Y12 , Xt = (Yt Yt+1 , . . . , Yt Yt+k )0 and I is the identity matrix of order k.

Solution.
1. Yt2 are i. i. d., EYt2 = σ 2 , var Yt2 = τ 2 . The Central limit theorem (Theorem 45)
implies that
n
1 X 2 D
√ (Y − σ 2 ) −→ N (0, τ 2 ).
n t=1 t

77
2. Denote Xt := Yt Yt+k for k > 0. The sequence {Xt , t ∈ Z} is strictly stationary,
EXt = 0, EXt2 = σ 4 , Xt are mutually uncorrelated but k-dependent. By Theorem 49,
n
1 X D
√ Xt −→ N (0, ∆2k ),
n t=1
Pk
where ∆2k = j=−k RX (j) = σ 4 .
3. We can write
n−k n n
1 X 1 X 1 X
√ Yt Yt+k = √ Yt Yt+k − √ Yt Yt+k .
n t=1 n t=1 n t=n−k+1

From step 2, as n → ∞,
n
1 X D
√ Yt Yt+k −→ N (0, σ 4 ).
n t=1
Form the Chebyshev inequality, as n → ∞,
n
! n
!
1 X 1 X
P √ Yt Yt+k >  = P √ Xt > 

n t=n−k+1 n t=n−k+1
n n
1  X 2 1 X
2 1 k σ4
≤ E X t = EX t = →0
2 n t=n−k+1 2 n t=n−k+1 2 n

since k is fixed.
4. Define Zt := c0 Xt , t ∈ Z, c ∈ Rk . Then

• Random vectors Xt have zero mean and the variance matrix σ 4 I and are mutually
uncorrelated.

• Random variables Zt are centered, with the variance σ 4 c0 Ic, uncorrelated and k-
dependent

• {Zt , t ∈ Z} is strictly stationary.

By Theorem 49,
n
1 X D
√ Zt −→ N (0, ∆2k ),
n t=1

where ∆2k = kj=−k RZ (j) = σ 4 c0 Ic. From here the final result follows when we apply
P
Theorem 46 and properties of normal distribution.

78
10 Prediction in time domain
10.1 Projection in Hilbert space
Definition 40. Let H be a Hilbert space with the inner product h·, ·i and the norm ||.||.
We say that two elements x, y ∈ H are orthogonal (perpendicular) if hx, yi = 0. We write
x ⊥ y.
Let M ⊂ H be a subset of H. We say that an element x ∈ H is orthogonal to M, if it
is orthogonal to every element of M, i.e., hx, yi = 0 for every y ∈ M. We write x ⊥ M.
The set M ⊥ = {y ∈ H : y ⊥ M } is called to be the orthogonal complement of the set M .

Theorem 50. Let H be a Hilbert space, M ⊂ H any subset. Then M ⊥ is a closed


subspace of H.

Proof. The null element 0 ∈ M ⊥ , since h0, xi = 0 for every x ∈ M.


The linearity of the inner product implies that any linear combination of two elements
of M ⊥ is an element of M ⊥ .
Continuity of the inner product implies that any limit of a sequence of elements of M ⊥
is an element of M ⊥ .

Theorem 51. (Projection Theorem) Let M be a closed subspace of a Hilbert space H.


Then for every element x ∈ H there exists a unique decomposition x = x
b + (x − x
b), such
b ∈ M and x − x
that x b ∈ M ⊥ . Further

||x − x
b|| = min ||x − y|| (67)
y∈M

and
||x||2 = ||b
x||2 + ||x − x
b||2 . (68)

Proof. Rudin (2003), Theorem 4.11, or Brockwell and Davis (1992), Theorem 2.3.1.

The element x b ∈ M with property (67) is called to be the orthogonal projection


of x onto the subspace M. The mapping PM : H → M such that PM x ∈ M and
(I − PM )x ∈ M ⊥ where I is the identity mapping, is called the projection mapping.
Obviously, for any x ∈ H

x = PM x + (x − PM x) = PM x + (I − PM )x. (69)

Theorem 52. Let H be a Hilbert space, PM the projection mapping of H onto a closed
subspace M. It holds:

1. For every x, y ∈ H and any α, β ∈ C, PM (αx + βy) = αPM x + βPM y.

2. If x ∈ M , then PM x = x.

79
3. If x ∈ M ⊥ , then PM x = 0.

4. If M1 , M2 are closed subspaces of H such that M1 ⊆ M2 , then PM1 x = PM1 (PM2 x)


for every x ∈ H.

5. If xn , x are elements of H such that ||xn − x|| → 0 as n → ∞, then ||PM xn −


PM x|| → 0.
Proof.
1. By using (69) we get

αx + βy = α(PM x + (x − PM x)) + β(PM y + (y − PM y))


= αPM x + βPM y + α(x − PM x) + β(y − PM y).

Obviously,

αPM x + βPM y ∈ M, α(x − PM x) + β(y − PM y) ∈ M ⊥

since M and M ⊥ are linear subspaces, and thus, αPM x + βPM y = PM (αx + βy).

2. The uniqueness of decomposition (69) implies assertion 2.

3. The uniqueness of decomposition (69) implies assertion 3.

4. Since x = PM2 x + (x − PM2 x), PM2 x ∈ M2 , x − PM2 x ∈ M2⊥ , we have PM1 x =


PM1 (PM2 x) + PM1 (x − PM2 x). Thus,

PM1 (PM2 x) ∈ M1 , and M2⊥ ⊆ M1⊥ ⇒ PM1 (x − PM2 x) = 0.

5. From the linearity of the projection mapping and equation (68)

||PM xn − PM x||2 = ||PM (xn − x)||2 ≤ ||xn − x||2 .

10.2 Prediction based on finite history


Let us consider the following problem: We have random variables X1 , . . . , Xn with zero
mean and finite second order moments. Utilizing observations X1 , . . . , Xn we want to
forecast Xn+h , where h > 0. We would like to approximate Xn+h by a measurable
function g(X1 , . . . , Xn ) (prediction) of observations X1 , . . . , Xn that minimizes

E |Xn+h − g (X1 , . . . , Xn )|2 .

80
It is well known that the best approximation is given by the conditional mean value

g(X1 , . . . , Xn ) = E(Xn+h |X1 , . . . , Xn ).

Indeed, if (for the simplicity, we consider only real-valued random variables) we denote
(X1 , . . . , Xn )0 = Xn , we can write

E (Xn+h − g(Xn ))2


= E (Xn+h − E(Xn+h |Xn ) + E(Xn+h |Xn ) − g(Xn ))2
= E (Xn+h − E(Xn+h |Xn ))2 + E (E(Xn+h |Xn ) − g(Xn ))2
+ 2E [(Xn+h − E(Xn+h |Xn )) (E(Xn+h |Xn ) − g(Xn ))] ,

where the last summand is

E [(Xn+h − E(Xn+h |Xn )) (E(Xn+h |Xn ) − g(Xn ))]


= E [E (Xn+h − E(Xn+h |Xn )) (E(Xn+h |Xn ) − g(Xn )) |Xn ]
= E [(E(Xn+h |Xn ) − E(Xn+h |Xn )) (E(Xn+h |Xn ) − g(Xn ))] = 0.

Thus,

E(Xn+h − g(Xn ))2


= E (Xn+h − E(Xn+h |Xn ))2 + E (E(Xn+h |Xn ) − g(Xn ))2
≥ E (Xn+h − E(Xn+h |Xn ))2

with equality for g(Xn ) = E(Xn+h |Xn ).


In the next, we will confine ourselves to linear functions of X1 , . . . , Xn . Then the
problem to find the best linear approximation of Xn+h can be solved by using the pro-
jection method in a Hilbert space. The best linear prediction of Xn+h from X1 , . . . , Xn
will be denoted by X
bn+h (n).
Direct method
Let H := H{X1 , . . . , Xn , . . . , Xn+h } be the Hilbert space generated by centered ran-
dom variables X1 , . . . , Xn+h and H1n := H{X1 , . . . , Xn } be the Hilbert subspace gener-
ated by random variables X1 , . . . , Xn .
The best linear prediction of Xn+h is the random variable
n
X
X
bn+h (n) = cj Xj ∈ H1n , (70)
j=1

such that
bn+h (n)|2 = kXn+h − X
E|Xn+h − X bn+h (n)k2

81
takes minimum with respect to all linear combinations of X1 , . . . , Xn .
It means that
Xbn+h (n) = PH n (Xn+h ) ∈ H n ,
1 1

bn+h (n) ⊥ H n
Xn+h − X (71)
1

and the element X bn+h (n) is determined uniquely due to the projection theorem.
Since the space H1n is a linear span generated by X1 , . . . , Xn , condition (71) is satisfied
if and only if
Xn+h − X bn+h (n) ⊥ Xj , j = 1, . . . , n,

i.e., if and only if


E(Xn+h − X
bn+h (n))X j = 0, j = 1, . . . , n.

Constants c1 , . . . , cn can be therefore obtained from the equations


n
X 
E Xn+h − ck Xk X j = 0, j = 1, . . . , n. (72)
k=1

For X1 , . . . Xn+h supposed to be elements of a real-valued centered stationary sequence


with the autocovariance function R, system (72) is of the form
n
X
ck R(k − j) = R(n + h − j), j = 1, . . . , n, (73)
k=1

or

c1 R(0) + c2 R(1) + · · · + cn R(n − 1)


= R(n + h − 1),
c1 R(1) + c2 R(0) + · · · + cn R(n − 2)
= R(n + h − 2),
...
c1 R(n − 1) + c2 R(n − 2) + · · · + cn R(0) = R(h).

Equivalently, system (73) can be written in the form

Γn cn = γ nh

where cn := (c1 , . . . , cn )0 , γ nh := (R(n + h − 1), . . . , R(h))0 and


 
R(0) R(1) . . . R(n − 1)
 R(1) R(0) . . . R(n − 2) 
Γn :=  ,
 
.. .. ... ..
 . . . 
R(n − 1) R(n − 2) . . . R(0)

82
Provided that Γ−1 −1
n exists we get cn = Γn γ nh , thus

n
X
X
bn+h (n) = cj Xj = c0n Xn = γ 0nh Γ−1
n Xn . (74)
j=1

It is obvious that Γn = var (X1 , . . . , Xn ) = var Xn = E (Xn X0n ).


The prediction error is

δh2 := E|Xn+h − X
bn+h (n)|2 = kXn+h − X
bn+h (n)k2 .

By (68)
kXn+h k2 = kX
bn+h (n)k2 + kXn+h − X
bn+h (n)k2 ,
so that
δh2 = kXn+h k2 − kX
bn+h (n)k2 . (75)
For a real-valued centered stationary sequence such that Γn is regular,

δh2 = kXn+h k2 − kX bn+h (n)k2 = E|Xn+h |2 − E|X bn+h (n)|2


= R(0) − E(c0n Xn )2 = R(0) − c0n E (Xn X0n )cn
= R(0) − c0n Γn cn = R(0) − γ 0nh Γ−1 −1
n Γn Γn γ nh
= R(0) − γ 0nh Γn−1 γ nh . (76)

Theorem 53. Let {Xt , t ∈ Z} be a real-valued centered stationary sequence with auto-
covariance function R, such that R(0) > 0 and R(k) → 0 as k → ∞. Then the matrix
Γn = var (X1 , . . . , Xn ) is regular for every n ∈ N.
Proof. We will prove the theorem by contradiction: suppose that Γn is singular for an
n ∈ N; then there is a nonzero vector c = (c1 , . . . , cn )0 such that c0 Γn c = 0 and for Xn =
(X1 , . . . , Xn )0 , c0 Xn = 0 a. s. holds true, since E c0 Xn = 0 and var (c0 Xn ) = c0 Γn c = 0.
Thus there exists a positive integer 1 ≤ r < n such that Γr is regular and Γr+1 is
singular, and constants a1 , . . . , ar such that
r
X
Xr+1 = aj X j .
j=1

By stationarity of {Xt , t ∈ Z},

var(X1 , . . . , Xr ) = · · · = var(Xh , . . . , Xh+r−1 ) = Γr .

From here, for any h ≥ 1,


r
X
Xr+h = aj Xj+h−1 .
j=1

83
(n) (n) Pr (n)
For every n ≥ r + 1 there exist constants a1 , . . . , ar such that Xn = j=1 aj X j =
(n) (n)
a(n)0 Xr , where a(n) = (a1 , . . . , ar )0 and Xr = (X1 , . . . , Xr )0 ,

var Xn = a(n)0 var Xr a(n) = a(n)0 Γr a(n) = R(0) > 0.

The matrix Γr is positive definite, therefore there exists a decomposition Γr = PΛP0 ,


where Λ is a diagonal matrix with the eigenvalues of the matrix Γr on the diagonal and
PP0 = I is the identity matrix. Since Γr is positive definite, all its eigenvalues are
positive; w. l. o. g. assume that 0 < λ1 ≤ · · · ≤ λr . Then
r 
X 2
(n)0 0 (n) (n)0 0 (n) (n)
R(0) = a PΛP a ≥ λ1 a PP a = λ1 aj ,
j=1

(n) 2 (n)
from which for every j = 1, . . . , r it follows that aj ≤ R(0)/λ1 , hence, |aj | ≤ C
independently of n, where C is a positive constant.
We also have
 r  r
2 X (n)
X (n)
0 < R(0) = E Xn = E Xn aj X j = aj EXn Xj
j=1 j=1
r
X r
X
(n) (n)
= aj R(n − j) ≤ |aj ||R(n − j)|
j=1 j=1
r
X
≤C |R(n − j)|.
j=1

The last expression converges to zero as n → ∞ due to the assumption R(n) → 0 as


n → ∞, but this contradicts to the assumption R(0) > 0. Thus, we conclude that the
matrix Γn is regular for every n ∈ N.

Recursive methods
Let us introduce the following notation.

• Denote by H1k = H{X1 , . . . , Xk } the Hilbert space generated by X1 , . . . , Xk .

• Put X bk+1 , k ≥ 1, the one-step prediction of Xk+1 , i. e.,


b1 := 0 and denote by X

X
bk+1 := X
bk+1 (k) = PH k (Xk+1 ).
1

Then
H1n = H{X1 , . . . , Xn } = H{X1 − X
b 1 , . . . , Xn − X
bn }.

84
Lemma 4. X1 − X
b 1 , . . . , Xn − X
bn are orthogonal random variables.

Proof. Let i < j. Then Xi ∈ H1i ⊆ H1j−1 and X bi ∈ H1i−1 ⊂ H1j−1 , so Xi − X


bi ∈ H1j−1 .
bj ⊥ H1j−1 , and also
bj = P j−1 (Xj ), therefore Xj − X
Further, X H 1

Xi − X
b i ⊥ Xj − X
bj .

The one-step best linear prediction of Xk+1 computed from X1 , . . . , Xk thus can be
written in the form
X k  
Xk+1 =
b θk j Xk+1−j − X
bk+1−j .
j=1

The error of the one-step prediction of Xk+1 is


bk+1 |2 = ||Xk+1 − X
vk = E|Xk+1 − X bk+1 ||2 , k ≥ 0.

Theorem 54 (Innovation algorithm). Let {Xt , t ∈ Z} be a real-valued centered random


sequence with autocovariance function R(i, j), such that matrix (R(i, j))ni,j=1 is regular
for every n. Then the best linear prediction of Xn+1 computed from X1 , . . . , Xn is

X
b1 = 0,
Xn  
Xn+1 =
b θn j Xn+1−j − X
bn+1−j , n ≥ 1, (77)
j=1

where for k = 0, . . . , n − 1,

v0 = R(1, 1), (78)


k−1
1 X 
θn,n−k = R(n + 1, k + 1) − θk,k−j θn,n−j vj , (79)
vk j=0
n−1
X
2
vn = R(n + 1, n + 1) − θn,n−j vj . (80)
j=0

Proof. Define X
b1 := 0, then

b1 |2 = E|X1 |2 = R(1, 1).


v0 = E|X1 − X

85
Since Xbn+1 = PH n Xn+1 , it must be of the form as given in (77). When multiply both
1

sides of (77) by Xk+1 − Xbk+1 for k < n and take the mean value we get

bn+1 (Xk+1 − X
EX bk+1 )
Xn
 
= θnj E Xn+1−j − Xbn+1−j Xk+1 − X
bk+1
j=1
bk+1 )2 = θn,n−k vk .
= θn,n−k E(Xk+1 − X

bn+1 ∈ H n and Xn+1 − X


Since X bn+1 ⊥ H n we have
1 1

E(Xn+1 − X
bn+1 )(Xk+1 − X
bk+1 ) = 0, k < n,

EXn+1 (Xk+1 − X bn+1 (Xk+1 − X


bk+1 ) = EX bk+1 ) = θn,n−k vk . (81)
From here we get
1 
θn,n−k = EXn+1 Xk+1 − X bk+1
vk
1 
= R(n + 1, k + 1) − EXn+1 X
bk+1 .
vk
bk+1 and replacing k by k − j in formula (81) we get
Further, applying formula (77) to X
k
X  
EXn+1 X
bk+1 = EXn+1 θkj Xk+1−j − X
bk+1−j
j=1
k
X  
= θkj EXn+1 Xk+1−j − X
bk+1−j
j=1
k
X
= θkj θn,n−(k−j) vk−j .
j=1

Combining these results altogether we get


k
1 X 
θn,n−k = R(n + 1, k + 1) − θkj θn,n−(k−j) vk−j
vk j=1
k−1
1 X 
= R(n + 1, k + 1) − θn,n−ν θk,k−ν vν .
vk ν=0

86
Computation of vn is as follows.
vn = E|Xn+1 − Xbn+1 |2 = E|Xn+1 |2 − E|X
bn+1 |2
n
X  2
= R(n + 1, n + 1) − E θnj Xn+1−j − Xn+1−j
b
j=1
Xn
2 bn+1−j )2
= R(n + 1, n + 1) − θnj E(Xn+1−j −X
j=1
Xn
2
= R(n + 1, n + 1) − θnj vn−j
j=1
n−1
X
2
= R(n + 1, n + 1) − θn,n−ν vν .
ν=0

Computational scheme of the innovation algorithm:


Xb1 v0
θ11 Xb2 v1
θ22 θ21 Xb3 v2
θ33 θ32 θ31 Xb 4 v3
... ... ...
Example 40. We have observations X1 , . . . , Xn , of an M A(1) random sequence that is
generated by
Xt = Yt + bYt−1 , Yt ∼ WN(0, σ 2 ), t ∈ Z
We will find X
bn+1 by using the innovation algorithm. We get
Xb1 = 0,
v0 = R(0) = σ 2 (1 + b2 ),
1 b
θ11 = R(1) = ,
v0 1 + b2
Xb2 = θ11 (X1 − Xb1 ) = θ11 X1 ,
2
v1 = R(0) − θ11 v0 ,
1
θ22 = R(2) = 0,
v0
1 R(1)
θ21 = (R(1) − θ22 θ11 v0 ) = ,
v1 v1
Xb3 = θ21 (X2 − Xb2 ),
2
v2 = R(0) − θ21 v1 ,

87
generally,
θnk = 0, k = 2, . . . , n,
R(1)
θn1 = ,
vn−1
bn+1 = θn1 (Xn − X
X bn ),
2
vn = R(0) − θn1 vn−1 .
Example 41. Consider an MA(q) sequence. Then R(k) = 0 pro |k| > q. By using the
recursive computations we get
min(q,n)  
X
X
bn+1 = θnj Xn+1−j − X
bn+1−j , n ≥ 1.
j=1

The coefficients θnj can be determined again by using Theorem 54.

The h- step prediction, h > 1:


Now, we want to use the innovation algorithm to make prediction from X1 , . . . , Xn
for h > 1 steps ahead, i. e., to determine X bn+h (n). Obviously, X
bn+h (n) = PH n (Xn+h ),
1

where H1n = H(X1 − X b 1 , . . . , Xn − Xbn ). Since H n ⊂ H1n+1 ⊂ · · · ⊂ H1n+h−1 , it follows


1
from the properties of the projection mapping and (77), that
   
Xbn+h (n) = PH n Xn+h = PH n P n+h−1 Xn+h = PH n X
1 1 H 1
bn+h
1

n+h−1
X 
= PH1n θn+h−1,j Xn+h−j − X
bn+h−j
j=1
n+h−1
X  
= θn+h−1,j PH1n Xn+h−j − X
bn+h−j
j=1
n+h−1
X  
= θn+h−1,j Xn+h−j − Xn+h−j ,
b (82)
j=h

since Xn+h−j − Xbn+h−j ⊥ X n pro j < h.


1
The h-step prediction error is
bn+h (n) 2 = E Xn+h 2 − E X bn+h (n) 2

δh2 = E Xn+h − X
n+h−1   2
X
= R(n + h, n + h) − E θn+h−1,j Xn+h−j − Xn+h−j
b
j=h
n+h−1
X
2
= R(n + h, n + h) − θn+h−1,j vn+h−j−1 .
j=h

88
Example 42. Le us again consider the MA(1) model from Example 40. We have shown
that the one-step prediction is
bn+1 = PH n (Xn+1 ) = θn1 (Xn − X
X bn ).
1

For h > 1 we have

X
bn+h (n) = PH n (Xn+h ) = PH n (X
1 1
bn+h )
= PH n (θn+h−1,1 (Xn+h−1 − X
1
bn+h−1 )) = 0,

bn+h−1 ) ⊥ H n pro h > 1.


since (Xn+h−1 − X 1

Innovation algorithm for an ARMA process


Consider a causal ARMA(p, q) process

Xt = ϕ1 Xt−1 + · · · + ϕp Xt−p + Yt + θ1 Yt−1 + · · · + θq Yt−q , t ∈ Z,

Yt ∼ WN(0, σ 2 ). We want to find X


bn+1 = PH n (Xn+1 ).
1

• First, let us consider the following transformation:


(1
X,
σ t
t = 1, 2, . . . , m,
Wt = (83)
1
σ
(Xt − ϕ1 Xt−1 · · · − ϕp Xt−p ) , t > m,

where m = max(p, q). Put X


b1 = 0, W
c1 = 0, W
ck = P k−1 (Wk ). It is clear that
H 1

H1n = H(X1 − X b 1 , . . . , Xn − X
bn ) = H(X1 , . . . , Xn )
= H(W1 , . . . , Wn ) = H(W1 − W c1 , . . . , Wn − W
cn ).

• Application of the innovation algorithm to the sequence {W1 , . . . , Wn } gives


Pn
 j=1 θnj (Wn+1−j − W cn+1−j ), 1 ≤ n < m,
W
cn+1 = (84)
Pq
j=1 θnj (Wn+1−j − Wn+1−j ), n≥m
c

(for t > m, Wt ∼ MA(q)).

• Application of the projection mapping onto H1t−1 to both sides of (83) results in

 σ1 X
bt , t ≤ m,
W
ct = (85)
1 b
σ
(Xt − ϕ1 Xt−1 · · · − ϕp Xt−p ), t > m.

89
We can see that for t ≥ 1,
1 
ct |2 = 1
Wt − W
ct = Xt − X
bt , E|Wt − W vt−1 := wt−1 .
σ σ2
Therefore it holds
Pn 
 j=1 θnj Xn+1−j − X
bn+1−j , n<m
X
bn+1 = P (86)
 q  Pp
j=1 θnj Xn+1−j − Xn+1−j + j=1 ϕj Xn−j+1 , n ≥ m,
b

where coefficients θnj a wn are computed by applying the innovation algorithm to the
sequence (83). For this we need to compute the values of the autocovariance function of
{Wt }.
We know that E Xt = 0, thus E Wt = 0. For the covariances RW (s, t) = EWs Wt we
get 
1

 R (s − t), 1 ≤ s, t ≤ m,
σ2 X



 Pp
1
 
2 RX (s − t) − j=1 ϕj RX (|s − t| − j) ,




 σ

RW (s, t) = min(s, t) ≤ m, m < max(s, t) ≤ 2m, (87)




1
Pq−|s−t|
θj θj+|s−t| , s, t > m, |s − t| ≤ q,


σ2 j=0






0, elsewhere

(we have put θ0 = 1).

Innovation algorithm for an AR sequence


Let us consider a causal AR(p) process, i,e.,

Xt = ϕ1 Xt−1 + · · · + ϕp Xt−p + Yt , t ∈ Z, Yt ∼ WN(0, σ 2 )

• Transformation:
(1
σ
Xt , 1 ≤ t ≤ p,
Wt = (88)
1
σ
(Xt − ϕ1 Xt−1 − · · · − ϕp Xt−p ) = σ1 Yt , t > p.

• Innovation algorithm applied to W1 , . . . , Wn :


(Pn
j=1 θnj (Wn+1−j − Wn+1−j ), n < p,
c
W
cn+1 = (89)
0, n ≥ p.

(Wn+1 ⊥ H1n for n ≥ p.)

90
ct = 1 (Xt − X
Again, Wt − W bt ) for t ≥ 1 and from here
σ
(Pn 
j=1 θnj Xn+1−j − X
bn+1−j , n < p,
X
bn+1 = (90)
ϕ1 Xn + ϕ2 Xn−1 + · · · + ϕp Xn−p+1 , n ≥ p.

The autocovariance function needed for the calculation of the coefficients θnj is

1
 σ2 RX (s − t),
 1 ≤ s, t ≤ p,
RW (s, t) = 1, t = s > p, (91)

0 elsewhere.

The one-step prediction error for n ≥ p is

bn+1 |2 = EYn+1
vn = E|Xn+1 − X 2
= σ2.

10.3 Prediction from infinite history


Let us suppose that we know the history Xn , Xn−1 , . . . , and we want to forecast Xn+1 , Xn+2 , . . . .
We will solve this problem again by using projection in Hilbert spaces.
n
Consider Hilbert spaces H = H{Xt , t ∈ Z} and H−∞ = H{. . . Xn−1 , Xn }. Then
the best linear prediction Xbn+h (n) of Xn+h from the infinite history Xn , Xn−1 , . . . is
n
the projection of Xn+h ∈ H onto H−∞ , i.,e., X
bn+h (n) = PH n (Xn+h ). The one-step
−∞

prediction is again denoted by X


bn+1 (n) := X bn+1 .
Prediction in a causal AR(p) process
Consider model

Xt = ϕ1 Xt−1 + · · · + ϕp Xt−p + Yt , t ∈ Z, (92)

where {Yt , t ∈ Z} is WN(0, σ 2 ), and assume that all the roots of the polynomial λp −
ϕ1 λp−1 − · · · − ϕp are inside the unit circle. It means that {Xt , t ∈ Z} is a causal linear
process and Yt ⊥ Xs for all t > s.
The one-step prediction: To get X bn+1 (n) from Xn , Xn−1 , . . . , notice that

• Xn+1 = ϕ1 Xn + · · · + ϕp Xn+1−p + Yn+1


n
• ϕ1 Xn + · · · + ϕp Xn+1−p ∈ H−∞ ,

91
n
• Yn+1 ⊥ Xn , Xn−1 , · · · ⇒ Yn+1 ⊥ H−∞ , (from the linearity and the continuity of the
inner product)
It means that
bn+1 = PH n (Xn+1 ) = ϕ1 Xn + · · · + ϕp Xn+1−p .
X −∞

The prediction error is


bn+1 |2 = E|Yn+1 |2 = σ 2 .
E|Xn+1 − X

The h-step prediction, h > 1:


 
X
bn+h (n) = PH n (Xn+h ) = PH n
−∞ −∞
P H n+h−1 (Xn+h )
−∞

= PH−∞
n (X
bn+h )
= PH−∞
n (ϕ1 Xn+h−1 + · · · + ϕp Xn+h−p )
= ϕ1 [Xn+h−1 ] + ϕ2 [Xn+h−2 ] + · · · + ϕp [Xn+h−p ],

where (
Xn+j , j≤0
[Xn+j ] =
Xn+j (n), j > 0.
b

Example 43. Consider an AR(1) process generated by Xt = ϕXt−1 + Yt , where |ϕ| < 1
and Yt ∼ WN(0, σ 2 ). If we know the whole history Xn , Xn−1 , . . . , we have X
bn+1 = ϕXn .
For h > 1

X bn+h−1 (n) = ϕ2 X
bn+h (n) = ϕ[Xn+h−1 ] = ϕX bn+h−2 (n) = . . .
= ϕ h Xn .

The prediction error is


bn+h |2 = E|Xn+h |2 − E|X
E|Xn+h − X bn+h (n)|2
2
= RX (0) − E ϕh Xn = RX (0) 1 − ϕ2h


1 − ϕ2h
= σ2 .
1 − ϕ2

Prediction in a causal and invertible ARMA(p, q) process


Consider a causal and invertible process

Xt = ϕ1 Xt−1 + · · · + ϕp Xt−p + Yt + θ1 Yt−1 + · · · + θq Yt−q , t ∈ Z, (93)

where Yt ∼ WN(0, σ 2 ).

92
• Due to causality, for any t ∈ Z, Xt = ∞
P P∞
j=0 c j Y t−j , where j=0 |cj | < ∞; from here
it follows that Yt ⊥ Xs for every s < t.

• Due to invertibility, for any t ∈ Z, Yt = ∞


P P∞
j=0 dj Xt−j , where j=0 |dj | < ∞, or,


X
Xt = − dj Xt−j + Yt . (94)
j=1

Since

X N
X
t−1

− dj Xt−j = l. i. m. N →∞ − dj Xt−j ∈ H−∞ ,
j=1 j=1
t−1
and Yt ⊥ H−∞ ,from decomposition (94) it follows that the best linear prediction of
Xn+1 based on the whole history Xn , Xn−1 , . . . , is

X
bn+1 = −
X dj Xn+1−j . (95)
j=1

The prediction error is


bn+1 |2 = E|Yn+1 |2 = σ 2 .
E|Xn+1 − X

From the uniqueness of the decomposition Xn+1 = X


bn+1 + Yn+1 and from formula
(93) we can also see that

bn+1 = ϕ1 Xn + · · · + ϕp Xn+1−p + θ1 Yn + · · · + θq Yn+1−q ,


X

thus, if we use the relation Yt = Xt − X


bt , (noticing that X
bt = P t−1 Xt ) we have
H−∞

bn+1 = ϕ1 Xn + · · · + ϕp Xn+1−p
X
 
+ θ1 Xn − X bn + · · · + θq Xn+1−q − X
bn+1−q .

The h-step prediction for h > 1 is

X
bn+h (n) = PH n (Xn+h ) = PH n (P n+h−1 (Xn+h )) = PH n (X
−∞ −∞ H −∞
bn+h )
−∞

= PH−∞n ϕ1 Xn+h−1 + · · · + ϕp Xn+h−p



+ θ1 Yn+h−1 + · · · + θq Yn+h−q
= ϕ1 [Xn+h−1 ] + · · · + ϕp [Xn+h−p ]
+ θ1 [Yn+h−1 ] + · · · + θq [Yn+h−q ],

93
where (
Xn+j , j ≤ 0,
[Xn+j ] =
Xbn+j (n), j > 0

and (
Xn+j − X
bn+j , j ≤ 0,
[Yn+j ] =
0, j > 0.
If we use (95) we have
∞ ∞
!
X X
X
bn+h (n) = PH n
−∞
− dj Xn+h−j =− [Xn−h+j ].
j=1 d=1

Alternatively, from the causality we get



X 
X
bn+h (n) = PH n (Xn+h ) = PH n
−∞ −∞
cj Yn+h−j .
j=0

Then, from properties of the projection mapping,



X  ∞
X
PH−∞
n cj Yn+h−j = cj PH−∞
n (Yn+h−j ),

j=0 j=0

and thus we can express the h− step prediction also as



X
X
bn+h (n) = cj Yn+h−j .
j=h

The prediction error can be then easily computed by


Xh−1 2 h−1
2 2
X
E Xn+h − Xn+h (n) = E cj Yn+h−j = σ |cj |2 .
b
j=0 j=0

Example 44. Consider the MA(1) model

Xt = Yt + θYt−1 , t ∈ Z, Yt ∼ WN(0, σ 2 ), |θ| < 1.

In this case, {Xt , t ∈ Z} is invertible, Yt = ∞ j


P
j=0 (−θ) Xt−j , and the best linear prediction
is ∞
X
Xn+1 (n) = Xn+1 = −
b b (−θ)j Xn+1−j = θYn = θ(Xn − X bn ).
j=1

bn+1 |2 = EY 2 = σ 2 .
The prediction error is E|Xn+1 − X n+1

94
For h > 1,

bn+h (n) = PH n P n+h−1 (Xn+h ) = PH n (X
X −∞ H −∞
bn+h )
−∞

= θPH−∞
n (Yn+h−1 ) = 0,

n
since for h ≥ 2, Yn+h−1 ⊥ H−∞ . The h-step prediction error is

bn+h (n)|2 = E|Xn+h |2 = RX (0) = σ 2 (1 + θ2 ).


E|Xn+h − X

11 Prediction in the spectral domain


• Let {Xt , t ∈ Z} be a centered stationary sequence with a spectral distribution
function F and spectral density f .

• We know the whole past of the sequence {Xt , t ∈ Z} up to time n − 1, and want
to forecast Xn+h , h = 0, 1, . . . , i.e., want to find Xbn+h (n − 1) = P n−1 (Xn+h ), in
H−∞
n−1
other words, we want to find Xn+h (n − 1) ∈ H−∞ ⊂ H{Xt , t ∈ Z}, such that
b
n−1
Xn+h − X bn+h (n − 1) ⊥ H−∞ .

• Recall spectral decomposition: Xt = −π eitλ dZ(λ), where {Zλ , λ ∈ [−π, π]} is an
orthogonal increment process with the associated distribution function F (Theorem
28).

• Recall that all the elements of the Hilbert space H{Xt , t ∈ Z} are of the form
Z π
ϕ(λ)dZ(λ),
−π

where ϕ ∈ L2 (F ) (Theorem 30).

bn+h (n − 1) should be of the form


Element X
Z π
Xbn+h (n − 1) = einλ Φh (λ)dZ(λ), (96)
−π

n−1
where Φh (λ) ∈ L2 (F ). Condition Xn+h − X
bn+h (n − 1) ⊥ H−∞ will be met if

Xn+h − X
bn+h (n − 1) ⊥ Xn−j , j = 1, 2, . . .

thus for j = 1, 2, . . . ,

E Xn+h − X
bn+h (n − 1) X n−j = 0,

95
or
 
E Xn+h − X bn+h (n − 1) X n−j =
Z π Z π
inλ
= R(h + j) − E e Φh (λ)dZ(λ) ei(n−j)λ dZ(λ)
Z π−π −π

= R(h + j) − einλ Φh (λ)e−i(n−j)λ dF (λ)


−π
Z π Z π
i(h+j)λ
= e dF (λ) − eijλ Φh (λ)dF (λ)
Z−π
π Z−ππ
= ei(h+j)λ f (λ)dλ − eijλ Φh (λ)f (λ)dλ
Z−π
π
−π

eijλ eihλ − Φh (λ) f (λ)dλ = 0.



= (97)
−π

Denote
Ψh (λ) := eihλ − Φh (λ) f (λ).


Then (97) can be written as


Z π
eijλ Ψh (λ)dλ = 0, j = 1, 2, . . . . (98)
−π

It follows from condition (98) that the Fourier expansion of the function Ψh has only
terms with nonnegative powers of eiλ ,

X ∞
X
ikλ
Ψh (λ) = bk e , |bk | < ∞.
k=0 k=0

Provided that ∞ ∞
X X
Φh (λ) = ak e−ikλ , |ak | < ∞,
k=1 k=1

which is a function convergent in L2 (F ),

96
Z π ∞
hX i
bn+h (n − 1) = inλ −ikλ
X e ak e dZ(λ)
−π k=1
Z π N
hX i
= l. i. m. e inλ
ak e−ikλ dZ(λ)
N →∞ −π k=1
N
X hZ π i
= l. i. m. ak ei(n−k)λ dZ(λ)
N →∞ −π
k=1
XN ∞
X
= l. i. m. ak Xn−k = ak Xn−k .
N →∞
k=1 k=1

Theorem 55. Let {Xt , t ∈ Z} be a real-valued centered stationary random sequence


with the autocovariance function R and the spectral density f (λ) = f ∗ (eiλ ), where f ∗ is
a rational function of a complex-valued variable.
Let Φ∗h be a function of a complex-valued variable z holomorphic for |z| ≥ 1 and such
that Φ∗h (∞) = 0.
Let
Ψ∗h (z) = z h − Φ∗h (z) f ∗ (z), z ∈ C,


be a function holomorphic for |z| ≤ 1. Then the best linear prediction of Xn+h from
Xn−1 , Xn−2 , . . . is Z π
Xbn+h (n − 1) = einλ Φh (λ)dZ(λ),
−π

where Φh (λ) = Φ∗h eiλ and {Zλ , λ ∈ [−π, π]} is the orthogonal increment process from


the spectral decomposition of {Xt , t ∈ Z}. The prediction error is

δh2 = E|Xn+h − Xbn+h (n − 1)|2


Z π
= R(0) − |Φh (λ)|2 f (λ)d(λ) (99)
Z−π
π
= R(0) − e−ihλ Φh (λ)f (λ)dλ. (100)
−π

Proof. Anděl (1976), Chap. X, Theorem 8.

The function Φh is called to be spectral characteristic of prediction of Xn+h from


Xn−1 , Xn−2 , . . .

97
Example 45. Consider again the autoregressive sequence

Xt = ϕXt−1 + Yt , |ϕ| < 1, ϕ 6= 0, Yt ∼ WN(0, σ 2 ).

We want to determine prediction X bn+h (n − 1), h ≥ 0 on the basis of Xn−1 , Xn−2 , . . . , in


the spectral domain.
The spectral density of the sequence {Xt , t ∈ Z} is

σ2 1 σ2 1 ∗ iλ

f (λ) = = = f e ,
2π |1 − ϕe−iλ |2 2π (1 − ϕe−iλ )(1 − ϕeiλ )

where
∗ σ2 1 σ2 z
f (z) = −1
=
2π (1 − ϕz )(1 − ϕz) 2π (1 − ϕz)(z − ϕ)
which is a rational function of the complex-valued variable z. The function

σ 2 z(z h − Φ∗h (z))


Ψ∗h (z) h
Φ∗h (z) ∗

= z − f (z) = , z ∈ C,
2π (1 − ϕz)(z − ϕ)

should be holomorphic for |z| ≥ 1. Since |ϕ| < 1, it must be z h − Φ∗h (z) = 0 for z = ϕ,
otherwise Ψ∗h has a pole at z = ϕ.
Thus,
Φ∗h (ϕ) = ϕh . (101)
Put Φ∗h (z) := γz , where γ is a constant. Then Φ∗h is holomorphic for |z| ≥ 1 and
Φ∗h (∞) = 0. The function Ψ∗h is holomorphic for |z| ≤ 1.
The value of constant γ follows from (101): ϕh = Φ∗h (ϕ) = ϕγ , thus γ = ϕh+1 .
Functions
ϕh+1
Φ∗h (z) = = ϕh+1 z −1 , z ∈ C,
z
σ 2
z h+1 − ϕh+1
Ψ∗h (z) = , z ∈ C,
2π (z − ϕ)(1 − ϕz)

satisfy the conditions of Theorem 55. The spectral characteristic of prediction is Φh (λ) =
ϕh+1 e−iλ and the best linear forecast is
Z π
Xn+h (n − 1) =
b einλ Φh (λ)dZ(λ)
Z−π
π
= einλ ϕh+1 e−iλ dZ(λ)
Z−π
π
= ei(n−1)λ dZ(λ)ϕh+1 = ϕh+1 Xn−1
−π

98
which is the same result as obtained in the time domain.
For the prediction error, from (99) we get
bt+h (t − 1) 2 = kXt+h k2 − kX

δh2 = E Xt+h − X bt+h (t − 1)k2
Z π 2
2 itλ

= R(0) − E Xt+h (t − 1) = R(0) − E
b e Φh (λ)dZ(λ)
−π
Z π
itλ 2
= R(0) − e Φh (λ) f (λ)dλ =
−π
Z π
2(h+1)
f (λ)dλ = R(0) 1 − ϕ2(h+1) ,

= R(0) − |ϕ|
−π

which is again in accordance with the result in the time domain.

12 Filtration of signal and noise


Let us consider a sequence {Xt , t ∈ Z}, said to be a signal, and a sequence {Yt , t ∈ Z},
a noise. Further consider the sequence {Vt , t ∈ Z}, where

Vt = Xt + Yt , t ∈ Z,

i.e., {Vt , t ∈ Z}, is a mixture of the signal and the noise. Our aim is to extract the signal
from this mixture.

12.1 Filtration in finite stationary sequences


Let {Xt , t ∈ Z}, {Yt , t ∈ Z}, be real-valued centered stationary sequences, mutually
uncorrelated, with autocovariance functions RX , RY , respectively. Let Vt = Xt + Yt for
t ∈ Z. Then {Vt , t ∈ Z} is also the real-valued centered and stationary sequence with the
autocovariance function RV = RX + RY . Suppose V1 , . . . , Vn to be known observations.
On the basis of V1 , . . . , Vn we want to find the best linear approximation of Xs in the
form Xbs = Pn cj Vj , with coefficients c1 , . . . , cn that minimize the mean square error
j=1
2
E|Xs − Xs | .
b
Denote H1n = H{V1 , . . . , Vn } ⊂ L2 (Ω, A, P ). Then the best linear approximation X bs
of Xs is the projection of Xs ∈ L2 (Ω, A, P ) onto H1n , i.e., X bs ∈ H1n and Xs − X bs ⊥ H1n .
n
Since H1 = H{V1 , . . . , Vn } = M{V1 , . . . , Vn }, it suffices to find constants c1 , . . . , cn such
that n
X
Xs =
b cj V j
j=1

and
Xs − X
bs ⊥ Vt , t = 1, . . . , n,

99
or
E(Xs − X
bs )Vt = 0, t = 1, . . . , n. (102)
Since Vt = Xt +Yt for all t and Xt , Yt are uncorrelated, we can see that EXs Vt = EXs Xt =
RX (s − t), and equations (102) can be written in the form
n
X
RX (s − t) − cj RV (j − t) = 0, t = 1, . . . , n. (103)
j=1

The variable X bs is the best linear filtration of the signal Xs at time s from the mixture
V1 , . . . , Vn .
The filtration error is

δ 2 = E|Xs − Xbs |2 = kXs − X bs k2 = kXs k2 − kX


bs k2
n n X n
X 2 X
= RX (0) − E cj Vj = RX (0) − cj ck RV (j − k).
j=1 j=1 k=1

The system of equations (103) can be written in the obvious matrix form. For the
regularity of the matrix of elements RV (j − t), j, t = 1, . . . , n, see Theorem 53.

12.2 Filtration in an infinite stationary sequence


Consider a signal {Xt , t ∈ Z}, a noise {Yt , t ∈ Z} and the mixture {Vt , t ∈ Z}, where
Vt = Xt + Yt for any t ∈ Z. Our aim is to find the best linear filtration of Xs from the
sequence of observations {Vt , t ∈ Z}.
Theorem 56. Let {Xt , t ∈ Z} and {Yt , t ∈ Z} be centered stationary sequences, mutu-
ally uncorrelated, with spectral densities fX a fY , respectively, that are continuous and
fX (λ) + fY (λ) > 0 for all λ ∈ [−π, π]. Let {Vt , t ∈ Z} be a random sequence such that
Vt = Xt + Yt for all t ∈ Z. Then the best linear filtration of Xs from {Vt , t ∈ Z} is
Z π
Xs =
b eisλ Φ(λ)dZV (λ),
−π

where
fX (λ)
Φ(λ) = , λ ∈ [−π, π], (104)
fV (λ)
fV = fX + fY is the spectral density of {Vt , t ∈ Z} and ZV = {Zλ , λ ∈ [−π, π]} is the
orthogonal increment process from the spectral decomposition of the sequence {Vt , t ∈ Z}.
The filtration error is
Z π Z π
2 fX (λ)fY (λ)
δ = dλ = Φ(λ)fY (λ)dλ.
−π fX (λ) + fY (λ) −π

100
Function Φ is called spectral characteristic of filtration.

18. Notice that if Φ(λ) = ∞ ikλ


, where ∞
P P
Remark k=−∞ ak e k=−∞ |ak | < ∞, then Xs =
b
P∞
k=−∞ ak Vs−k .

Proof. The sequences {Xt , t ∈ Z} and {Yt , t ∈ Z} are centered, stationary and mutually
uncorrelated with spectral densities. It follows that the sequence {Vt , t ∈ Z} is centered
and stationary with the autocovariance function RV = RX + RY . Then the spectral
density of {Vt , t ∈ Z} exists and is equal to fV = fX + fY .
The best linear filtration of Xs from {Vt , t ∈ Z} is the projection of Xs onto the
Hilbert space H = H{Vt , t ∈ Z}, i.e., we are interested in X bs = PH (Xs ).
Let Φ be the function defined in (104). First, we will show that
Z π
Xs :=
b eisλ Φ(λ)dZV (λ) ∈ H.
−π

According to Theorem 30 it suffices to show that Φ ∈ L2 (FV ), where FV is the spectral


distribution function of the sequence {Vt , t ∈ Z}. According to the assumption, fX a fV
are continuous functions and fV takes in [−π, π] positive values, only. Thus,

fX (λ) 2
Z π Z π Z π
|fX (λ)|2

2
|Φ(λ)| dFV (λ) = f V (λ)dλ = dλ < ∞
−π fV (λ) −π fV (λ)

−π

and X bs ∈ H.
Further, Xbs will be the projection of Xs onto H if (Xs −X
bs ) ⊥ H, i.e., if (Xs −Xbs ) ⊥ Vt
for all t ∈ Z. For any t ∈ Z we have
 
E Xs − Xs V t = EXs V t − EX
b bs V t
Z π Z π !
isλ

= EXs X t + Y t − E e Φ(λ)dZV (λ) eitλ dZV (λ)
−π −π
Z π
= EXs X t − eisλ Φ(λ)e−itλ dFV (λ)
−π
Z π
= RX (s − t) − eiλ(s−t) Φ(λ)fV (λ)dλ
Z−π
π
= RX (s − t) − eiλ(s−t) fX (λ)dλ
−π
= RX (s − t) − RX (s − t) = 0.

We have proved that X


bs is the best linear filtration.

101
Let us determine the filtration error:
δ 2 = kXs k2 − kX bs k2 = RX (0) − E|X bs |2
Z π Z π 2
isλ
= fX (λ)dλ − E e Φ(λ)dZV (λ)

Z−ππ Z π −π
Φ(λ) 2 dFV (λ)

= fX (λ)dλ −
Z−ππ Z−ππ
fX (λ) 2

= fX (λ)dλ − fV (λ)dλ
−π fV (λ)

−π
Z π Z π
fX (λ)fY (λ)
= dλ = Φ(λ)fY (λ)dλ.
−π fX (λ) + fY (λ) −π

Example 46. Let the signal {Xt , t ∈ Z} and the noise {Yt , t ∈ Z} be mutually inde-
pendent sequences such that
Xt = ϕXt−1 + Wt , t ∈ Z,
2
where |ϕ| < 1, ϕ 6= 0 and {Wt , t ∈ Z} is a white noise with zero mean and variance σW ,
2
and {Yt , t ∈ Z} is another white noise sequence with zero mean and variance σY . We
observe Vt = Xt + Yt , t ∈ Z.
Obviously, {Xt , t ∈ Z} and {Yt , t ∈ Z} are centered stationary sequences with the
spectral densities
2
σW 1 σY2
fX (λ) = , fY (λ) = , λ ∈ [−π, π]
2π |1 − ϕe−iλ |2 2π
that satisfy conditions of Theorem 56.
The sequence {Vt , t ∈ Z} has the spectral density fV = fX + fY and it can be shown
that
σ 2 |1 − θe−iλ |2
fV (λ) = , λ ∈ [−π, π], (105)
2π |1 − ϕe−iλ |2
where σ 2 = ϕθ σY2 , θ is the root of the equation θ2 − cθ + 1 = 0, the absolute value of
which is less than one and has the same sign as the coefficient ϕ, and
2
σW + σY2 (1 + ϕ2 )
c= .
ϕσY2
(See Prášková, 2016, Problem 8.1 for some hints.) Then
2 2 X∞
fX (λ) σW 1 σW 2
k −ikλ
Φ(λ) = = 2 = 2 θ e

fV (λ) σ |1 − θe−iλ |2 σ k=0

2 ∞
σW 1 X
= θ|k| e−ikλ
σ 1 − θ k=−∞
2 2

102
for all λ ∈ [−π, π].
The best linear filtration of Xs from {Vt , t ∈ Z} is
2 ∞
σW 1 X
Xs = 2
b θ|k| Vs−k . (106)
σ 1 − θ2 k=−∞

The filtration error is


Z π
2 2
δ = E|Xs − Xs | =
b φ(λ)fY (λ)dλ
−π
2 Z π
σY2 σW 1 σY2 σW2
1
= −iλ
dλ = .
2π σ −π |1 − θe |
2 2 σ 1 − θ2
2

Remark 19. It follows from (105) that fV has the same form as the spectral density
of an ARMA(1, 1) sequence. The mixture of the AR(1) sequence {Xt , t ∈ Z} and the
white noise {Vt , t ∈ Z} has the same covariance structure as the stationary sequence
{Zt , t ∈ Z} that is modeled to be

Zt − ϕZt−1 = Ut − θUt−1 , t ∈ Z,

where ϕ 6= 0, |ϕ| < 1, |θ| < 1 and {Ut , t ∈ Z} is a white noise with the variance
σ 2 = ϕθ σY2 . Parameter θ can be determined as given above.
σ2
Remark 20. Function Φ is the transfer function of the linear filter { σW2 1
1−θ2
θ|k| , k ∈ Z}.

13 Partial autocorrelation function


Definition 41. Let {Xt , t ∈ Z} be a real-valued centered stationary sequence. The
partial autocorrelation function of {Xt , t ∈ Z} is defined to be

cov(X1 ,X2 )
corr(X1 , Xk+1 ) = √varX1 √varX2 ,
 k = 1,
α(k) =  
corr X1 − X
 e1 , Xk+1 − X ek+1 , k > 1,

where Xe1 is the linear projection of X1 onto Hilbert space H2k = H{X2 , . . . , Xk } and
ek+1 is the linear projection of Xk+1 onto H k .
X 2

From the properties of the projection mapping it follows that X e1 = c2 X2 +· · ·+ ck Xk


where constants c2 , . . . , ck are determined by conditions E(X1 − X
e1 )Xj = 0, j = 2, . . . , k.
The same holds for X ek+1 . We can see that α(k) represents the correlation coefficient of

103
residuals X1 − X e1 and Xk+1 − Xek+1 of the best linear approximation of the variables X1
and Xk+1 by random variables X2 , . . . , Xk .
The stationarity of the sequence {Xt , t ∈ Z} implies that for h ∈ N, corr(X1 −
X1 , Xk+1 − X
e ek+1 ) = corr(Xh − Xeh , Xk+h − Xek+h ), where X
eh , X
ek+h are linear projec-
tions of random variables Xh , Xk+h onto the Hilbert space H{Xh+1 , . . . , Xh+k−1 }. There-
fore, α(k) is also the correlation coefficient of Xh and Xh+k after the linear dependence
Xh+1 , . . . , Xh+k−1 was eliminated.

Example 47. Consider the causal AR(1) process

Xt = ϕXt−1 + Yt ,

where |ϕ| < 1 and Yt ∼ WN(0, σ 2 ).


According to the definition, α(1) = corr(X1 , X2 ) = rX (1) = ϕ. For k > 1,

E(X1 − Xe1 )(Xk+1 − X


ek+1 )
α(k) = q q .
E(X1 − X1 ) E(Xk+1 − Xk+1 )
e 2 e 2

Due to the causality, X ek+1 = PH k (Xk+1 ) = ϕXk and Xk+1 − X ek+1 = Yk+1 ⊥ H k .
2 2
Further, it follows from causality that Yk+1 ⊥ X1 , thus E(X1 − X1 )(Xk+1 − Xk+1 ) =
e e
E(X1 − Xe1 )Yk+1 = 0, from which we conclude that α(k) = 0 for k > 1.

Remark 21. In the same manner, for a causal AR(p) sequence we could prove that the
partial autocorrelation α(k) = 0 for k > p.

Example 48. Consider the MA(1) process

Xt = Yt + bYt−1 ,

where |b| < 1 and Yt ∼ WN(0, σ 2 ). We know that in this case RX (0) = (1 + b2 )σ 2 ,
RX (1) = bσ 2 = RX (−1) and RX (k) = 0 for |k| > 1.
We compute the partial autocorrelations.
b
First, α(1) = rX (1) = 1+b 2 . Further, α(2) = corr(X1 − X1 , X3 − X3 ). To determine
e e
 
X e1 = PH 2 X1 = cX2 and X1 − X
e1 , notice that X e1 ⊥ X e1 . Thus E(X1 − cX2 )X2 = 0,
2
RX (1) b e1 = b 2 X2 . Quite analogously we get X b
and c = RX (0)
= 1+b2
.
We have X 1+b
e3 = X,
1+b2 2
i.e., X
e1 = X
e3 . We have
 
b b
α(2) = corr X1 − X2 , X3 − X2 .
1 + b2 1 + b2

104
Obviously,
 b  b  2b b2
E X1 − X 2 X 3 − X 2 = RX (2) − RX (1) + RX (0)
1 + b2 1 + b2 1 + b2 (1 + b2 )2
σ 2 b2
=− ,
1 + b2
similarly,
 b 2 b2 2b σ 2 (1 + b2 + b4 )
E X1 − X 2 = RX (0) + RX (0) − RX (1) = ,
1 + b2 (1 + b2 )2 1 + b2 1 + b2
and combining these results we conclude that
b2
α(2) = − .
1 + b2 + b4
Generally, it can be shown that

(−b)k (1 − b2 )
α(k) = − , k ≥ 1.
1 − b2(k+1)

Definition 42 (An alternative definition of the partial correlation function). Let {Xt , t ∈
Z} be a centered stationary sequence, let PH1k (Xk+1 ) be the best linear prediction of Xk+1
on the basis of X1 , . . . , Xk . If H1k = H{X1 , . . . , Xk }, and PH1k (Xk+1 ) = ϕ1 Xk +· · ·+ϕk X1 ,
then the partial autocorrelation function at lag k is defined to be α(k) = ϕk .
Theorem 57. Let {Xt , t ∈ Z} be a real-valued sequence with the autocovariance function
R, such that R(0) > 0, R(t) → 0, as t → ∞. Then the both definitions of the partial
autocorrelation function are equivalent and it holds

α(1) = r(1),


1 r(1) . . . r(k − 2) r(1)

r(1)
1 . . . r(k − 3) r(2)

.. .. .. .. ..
. . . . .

r(k − 1) r(k − 2) . . . r(1) r(k)
α(k) = k > 1, (107)

1 r(1) . . . r(k − 1)

r(1)
1 . . . r(k − 2)
.. .. .. ..
. . . .

r(k − 1) r(k − 2) . . . 1

where r is the autocorrelation function of the sequence {Xt , t ∈ Z}.

105
Proof. Denote H1k = H{X1 , . . . , Xk }, H2k = H{X2 , . . . , Xk }, X
bk+1 = PH k (Xk+1 ), X
1
e1 =
PH k (X1 ), X
2
ek+1 = PH k (Xk+1 ).
2

e1 + (X1 − X
Since X1 = X e1 ∈ H2k , X1 − X
e1 ), where X ek+1 ∈ H2k ,
e1 ⊥ H2k , and X

ek+1 (X1 − X
EX e1 ) = 0. (108)

Consider
bk+1 = ϕ1 Xk + · · · + ϕk X1 .
X
Then    
bk+1 = ϕ1 Xk + · · · + ϕk−1 X2 + ϕk X
X e1 + ϕk (X1 − X
e1 ) ,
 
and the random variables in the brackets are mutually orthogonal. Then ϕk (X1 − X e1 )
can be considered to be the projection of X e = H{X1 − X
bk+1 onto the Hilbert space H e1 } ⊂
k
H1 . It is also the projection of Xk+1 onto the space He and
 
E Xk+1 − ϕk (X1 − X e 1 ) X1 − X
e1 = 0
e1 2 .
 
= EXk+1 X1 − X e1 − ϕk E X1 − X

From here and from (108) we get


  
EXk+1 X1 − Xe1 E Xk+1 − X
ek+1 X1 − X
e1
ϕk = = . (109)
e1 2 e1 2
 
E X1 − X E X1 − X

Since E(X1 − Xe1 )2 = E(Xk+1 − X ek+1 )2 , which holds from the fact that for a stationary
sequence, var (X2 , . . . , Xk ) = var (Xk , . . . , X2 ), we get from (109) that

ϕk = corr(X1 − X
b1 , Xk+1 − X
bk+1 ) = α(k).

Now we will verify (107). We know that X bk+1 = ϕ1 Xk + ϕ2 Xk−1 + · · · + ϕk X1 ∈ H k ,


1
bk+1 ⊥ H k and therefore
Xk+1 − X 1

E Xk+1 − (ϕ1 Xk + · · · + ϕk X1 ) Xk+1−j = 0, j = 1, 2, . . . , k,

which is a system of equations

R(1) − ϕ1 R(0) − · · · − ϕk R(k − 1) = 0


R(2) − ϕ1 R(1) − · · · − ϕk R(k − 2) = 0
..
.
R(k) − ϕ1 R(k − 1) − · · · − ϕk R(0) = 0.

106
Dividing each equation by R(0), we get the system of equations
ϕ1 + ϕ2 r(1) + · · · + ϕk r(k − 1) = r(1)
ϕ1 r(1) + ϕ2 + · · · + ϕk r(k − 2) = r(2)
..
.
ϕ1 r(k − 1) + ϕ2 r(k − 2) + · · · + ϕk = r(k),
or, in the matrix form,
    
1 r(1) . . . r(k − 1) ϕ1 r(1)
 r(1) 1 . . . r(k − 2)  ϕ2   r(2) 
= ,
    
 .. .. ... ..  .. ..
 . . .  .   . 
r(k − 1) r(k − 2) ... 1 ϕk r(k)
The ratio of determinants (107) gives the solution for ϕk .
Example 49. Consider again the causal AR(1) process
Xt = ϕXt−1 + Yt ,
where |ϕ| < 1 and Yt ∼ WN(0, σ 2 ). Let us compute the partial autocorrelation function
according to formula (107). We get

1
ϕ . . . ϕk−2 ϕ

ϕ
1 . . . ϕk−3 ϕ2

.. .. .. .. ..
. . . . .
k−1
ϕk−2 . . . ϕ ϕk

ϕ
α(k) =
k−1
k > 1, (110)
1
ϕ . . . ϕ
ϕ k−2
1 . . . ϕ
.. .. .. ..
. . . .
k−1 k−2
ϕ ϕ ... 1
We can see that the last column of the determinant in the numerator of (110) is
obtained by multiplication of the first column, thus, this determinant equals zero.

14 Estimators of the mean and the autocorrelation


function
14.1 Estimation of the mean
Let {Xt , t ∈ Z} be a stationary sequence with expected value EXt = µ and autocovari-
ance function R(s, t) = R(s − t).

107
1.0

1.0
0.5

0.5
α(t)
r(t)

0.0

0.0
−0.5

−0.5
−1.0

−1.0
0 2 4 6 8 10 1 3 5 7 9

t t

Figure 8: Autocorrelation (left) and partial autocorrelation function (right) of the AR(1) sequence
Xt = −0,8 Xt−1 + Yt
1.0

1.0
0.5

0.5
α(t)
r(t)

0.0

0.0
−0.5

−0.5
−1.0

−1.0

0 2 4 6 8 10 1 3 5 7 9

t t

Figure 9: Autocorrelation function (left) and partial autocorrelation function (right) of the MA(1)
sequence Xt = Yt + 0,8 Yt−1

108
A common estimator of the mean value is the sample mean defined by
n
1X
Xn = Xt .
n t=1

We know that X n is an unbiased estimator of the expected value, since EX n = µ. We


also know that if the sequence {Xt , t ∈ Z} is mean square ergodic, then X n → µ in mean
square and also in probability. It guarantees the (weak) consistency of the estimator.
Recall that a sufficient condition for mean square ergodicity is R(t) → 0 as t → ∞
(compare Theorems 39 and 40).
The variance of the sample mean of a stationary sequence is
n−1
|k|
 
1 X
varX n = R(k) 1 − ,
n k=−n+1 n
P∞ P∞
and if −∞ |R(k)| < ∞, then n varX n → −∞ R(k) = 2πf (0) where f (λ) is the
spectral density of the sequence {Xt , t ∈ Z}, see Theorem 40. We have also proved a
few central limit theorems for selected strictly stationary sequences, saying that X n has
2
asymptotically distribution N (µ, ∆n ), where ∆2 is an asymptotic variance (see Theorems
47, 48 and 49).
However, the sample mean X n is not the best linear estimator of the expected value
of a stationary sequence {Xt , t ∈ Z}. Such estimator can be constructed as follows.
Consider a linear model

Xt = µ + X
et , t = 1, . . . , n, (111)

where X et , t = 1, . . . , n, is a centered stationary sequence with the autocovariance func-


tion R, such that R(0) > 0, R(t) → 0 as t → ∞. Then from the theory of general linear
model (e.g., Anděl, 2002, Theorem 9.2) it holds that the best linear unbiased estimator
of the parameter µ is statistic

bn = (10n Γ−1
µ −1 0 −1
n 1 n ) 1 n Γ n Xn , (112)

where  
R(0) . . . R(n − 1)
R(1)
 R(1) . . . R(n − 2)
R(0) 
Γn = var Xn =  ,
 
.. .. ... ..
 . . . 
R(n − 1) R(n − 2) . . . R(0)
is regular matrix according to Theorem 53, 1n = (1, . . . , 1)0 and Xn = (X1 , . . . , Xn )0 .
The variance of this estimator is

bn = (10n Γ−1
var µ −1
n 1n ) . (113)

109
14.2 Estimators of the autocovariance and the autocorrelation
function
The best linear estimator (112) assumes knowledge of the autocovariance function R.
Similarly, knowledge of the autocovariance function is assumed in prediction problems.
For estimators we usually work with the sample autocovariance function
n−k
1X  
R(k)
b = Xt − X n Xt+k − X n , k = 0, 1, . . . , n − 1 (114)
n t=1

and R(k)
b = R(−k)
b pro k < 0. Let us remark that the sample autocovatriance function
is not an unbiased estimator of the autocovariance function, i.e., ER(k)
b 6= R(k).
The matrix  
R(0)
b R(1)
b b − 1)
. . . R(n
 R(1)
b R(0)
b b − 2) 
. . . R(n
Γn = 
 
b .
. .
. . . .
. 
 . . . . 
b − 1) R(n
R(n b − 2) . . . R(0)
b

will be regular, if R(0)


b > 0. For given X1 , . . . , Xn , function
( P
1 n−|k|
n t=1 (Xt − X n )(Xt+k − X n ), |k| < n,
R(k) =
b (115)
0, |k| ≥ n

can be wieved to be the autocovariance function of an MA(n − 1) sequence and thus,


the regularity of matrix Γ
b n follows from Theorem 53.
If we dispose only n observations X1 , . . . , Xn , we can estimate R(k), k = 0, . . . , n − 1.
From the practical point of view, it is recommended to choose n ≥ 50 and k ≤ n4 .

Further, let us consider the autocorrelation function r(k) = R(k)


R(0)
.
We define the sample autocorrelation function to be
Pn−k  
R(k) t=1 Xt − X n Xt+k − X n
b
rb(k) = = Pn 2 ,
R(0)
b
t=1 X t − X n

2
= n1 nt=1 Xt − X n > 0.
P 
if R(0)
b
Asymptotic behaviour of the sample autocorrelations is described in the following
theorem.
Theorem 58. Let {Xt , t ∈ Z} be a random sequence

X
Xt − µ = αj Yt−j ,
j=−∞

110
where Yt , t ∈ Z, are independent identically distributed random variables with zero mean
and finite positive variance σ 2 , and let E|Yt |4 < ∞ and ∞
P
j=−∞ |αj | < ∞.
Let r(k), k ∈ Z, be the autocorrelation function of the sequence {Xt , t ∈ Z} and rb(k)
be the sample autocorrelation at lag k, based on X1 , . . . , Xn . √ 
Then for each h = 1, 2, . . . , as n → ∞, the random vector n b r(h) − r(h) converges
in distribution to a random vector with normal distribution Nh (0, W), where

r(1), . . . , rb(h))0 , r(h) = (r(1), . . . , r(h))0 ,


r(h) = (b
b

and W is an h × h matrix elements of which are



X   
wij = r(k + i) + r(k − i) − 2r(i)r(k) r(k + j) + r(k − j) − 2r(j)r(k) (116)
k=1

i, j = 1, . . . , h.
Proof. Brockwell and Davis (1991), Theorem 7.2.1.
Remark 22. Formula (116) is called the Bartlett formula. From the assertion of the
theorem we especially get for any i
√  D
n rb(i) − r(i) −→ N (0, wii ), n → ∞,

i.e., for large n,  wii 


rb(i) ∼ N r(i), .
n
Example 50. Consider the AR(1) sequence

Xt = ϕXt−1 + Yt , t ∈ Z,

where |ϕ| < 1 and Yt , t ∈ Z are i. i. d. with zero mean, finite non-zero variance σ 2 and
with finite moments E|Yt |4 . Then r(k) = ϕ|k| , thus r(1) = ϕ and according to Theorem
58, √  D
n rb(1) − ϕ) −→ N (0, w11 ), n → ∞,
where
∞ ∞
X 2 X 2
ϕk−1 − ϕk+1

w11 = r(k + 1) + r(k − 1) − 2r(1)r(k) =
k=1 k=1

X
= (1 − ϕ2 )2 ϕ2(k−1) = 1 − ϕ2 .
k=1

If we denote rb(1) := ϕ,
b we can write
√  D
b − ϕ −→ N (0, 1 − ϕ2 ),
n ϕ n→∞

111
or
√ b−ϕ D
ϕ
np −→ N (0, 1), n → ∞.
1 − ϕ2
P
b −→ ϕ (see, e.g. Brockwell and Davis, 1991, Chap. 6) and
From here it follows that ϕ
also
√ ϕ b−ϕ D
np −→ N (0, 1), n → ∞.
1−ϕb2
The asymptotic 95% confidence interval for ϕ is
r r !
1−ϕ b2 1−ϕb2
b − 1, 96
ϕ , ϕ
b + 1, 96 .
n n

Example 51. Let us suppose that the sequence {Xt , t ∈ Z} is a strict white noise.
Then r(0) = 1 and r(t) = 0 for t 6= 0. The elements of W are
X∞
wii = r(k − i)2 = 1,
k=1

X
wij = r(k − i)r(k − j) = 0, i 6= j,
k=1

i. e., W = I is the identity matrix. It means that for large n, the vector b r(h) =
0
rb(1), . . . , rb(h) has approximately normal distribution Nh (0, n1 I). For large n, there-
fore the random variables rb(1), . . . , rb(h) are approximately independent and identically
distributed with zero mean and variance n1 . In the plot of sample autocorrelations rb(k)
for k = 1, . . . , approximately 95% of them should be in the interval −1, 96 √1n , 1, 96 √1n .


The sample partial autocorrelation function is defined to be α b(k) = ϕbk , where ϕbk
can be obtained e.g. from (107), where we insert the sample autocorrelation
Pn coefficients.
1
The determinant in the denominator of (107) will be non-zero if n t=1 (Xt − X n )2 > 0.
Example 52. In Figure 12 the plot of the Wolf index of the annual number of the
Sunspots (1700-1987)3 is displayed. In Figures 13 and 14 we can see the sample auto-
covariance function and the sample partial autocorrelation function, respectively. The
data was identified (after centering) with the autoregressive AR(9) process a(B)Xt = Yt
where
a(z) = 1 − 1.182z + 0.4248z 2 + 0.1619z 3 − 0.1687z 4
+ 0.1156z 5 − 0.02689z 6 − 0.005769z 7
+ 0.02251z 8 − 0.2062z 9
Yt ∼ WN(0, σ 2 ), σ 2 = 219.58.

3
Source: WDC-SILSO, Royal Observatory of Belgium, Brussels

112
3
2
1
0
−1
−2
−3

0 50 100 150 200

Figure 10: Trajectory of a strict white noise process


1.0
0.5
^r (k)

0.0
−0.5
−1.0

0 2 4 6 8 10 12 14 16 18 20

Figure 11: Sample autocorrelation function of a strict white noise

113
Annual means of Wolf Sunspot numbers, 1700−1987
200

180

160

140

120

100

80

60

40

20

0
1700 1750 1800 1850 1900 1950 2000

Figure 12: Number of Sunspots, the Wolf index

Wolf numbers, sample ACF


1
Sample Autocorrelation

0.5

−0.5
0 5 10 15 20
Lag

Figure 13: Wolf index, estimated autocorrelation function

114
Wolf numbers, sample PACF
1

0.8
Sample Partial Autocorrelations
0.6

0.4

0.2

−0.2

−0.4

−0.6

−0.8
0 5 10 15 20
Lag

Figure 14: Wolf index, estimated partial autocorrelation function

15 Estimation of parameters in ARMA models


15.1 Estimation in AR sequences
Let us consider a real-valued stationary causal AR(p) sequence of known order p,
Xt = ϕ1 Xt−1 + · · · + ϕp Xt−p + Yt , t ∈ Z, (117)
where {Yt , t ∈ Z} denotes a white noise process WN(0, σ 2 ), and ϕ1 , . . . , ϕp , σ 2 are un-
knowns parameters to be estimated on the basis of X1 , . . . , Xn .
Moment methods
The method utilizes Yule - Walker equations for the autocovariance function RX := R
of the sequence {Xt , t ∈ Z} in the form
R(0) = ϕ1 R(1) + · · · + ϕp R(p) + σ 2 , (118)
R(k) = ϕ1 R(k − 1) + · · · + ϕp R(k − p), k ≥ 1. (119)
The system of equations for k = 1, . . . , p can be written in the matrix form
Γϕ = γ, (120)
where
     
ϕ1 . . . R(p − 1)
R(0) R(1)
ϕ =  ...  , Γ= .. ... ..
, γ =  ...  .
     
. .
ϕp R(p − 1) . . . R(0) R(p)

115
If we replace the values of R(k) in Γ and γ by their sample counterparts
n−k
1X  
R(k)
b = Xt − X n Xt+k − X n ,
n t=1

we get the matrix Γ


b and the vector γb . If we plug these estimators into equation (120),
we obtain moment estimators of ϕ1 , . . . , ϕp by solving
b = (ϕ
ϕ bp )0 = Γ
b1 , . . . , ϕ b −1 γ
b, (121)
b −1 exists. From subsection 14.2 we know that a sufficient condition for Γ
provided Γ b to
be regular is
n
1X 2
R(0) =
b Xt − X n > 0.
n t=1
The moment estimate of σ 2 is obtained from (118) as
b2 = R(0)
σ b −ϕ b − ··· − ϕ
b1 R(1) bp R(p)
b b −ϕ
= R(0) b 0γ
b. (122)
Remark 23. The moment estimators based on Yule-Walker equations are sometimes
called Yule -Walker estimators.
Example 53. Consider an AR(1) sequence in the form Xt = ϕXt−1 + Yt , t ∈ Z, where
|ϕ| < 1 and Yt is from WN(0, σ 2 ). Moment estimators of parameters ϕ and σ 2 are
R(1)
b
b2 = R(0) b2 .

ϕ
b= = rb(1), σ b −ϕ bR(1)
b = R(0)
b 1−ϕ
R(0)
b
Moment estimator of the parameter ϕ is in this case the same as the sample autocorre-
lation coefficient rb(1) (compare with Example 50.)
Asymptotic properties of the moment estimators are described in the following the-
orem.
Theorem 59. Let {Xt , t ∈ Z} be an AR(p) sequence generated by Xt = ϕ1 Xt−1 + · · · +
ϕp Xt−p + Yt for t ∈ Z, where {Yt , t ∈ Z} is a sequence of i. i. d. random variables with
zero mean and finite non-zero variance σ 2 . Suppose that all the roots of the characteristic
polynomial λp − ϕ1 λp−1 − · · · − ϕp are inside the unit circle and let ϕ
b = (ϕ bp )0 and
b1 , . . . , ϕ
2 0 2
σ
b be the moment estimators of ϕ = (ϕ1 , . . . , ϕp ) and σ , repectively, computed from
X1 , . . . , X n .
Then √
ϕ − ϕ) −→ Np 0, σ 2 Γ−1 , n → ∞,
D 
n (b
where Γ is the matrix with elements Γij = R(i − j), 1 ≤ i, j ≤ p, R is the autocovariance
function of {Xt , t ∈ Z}.
Further, it holds
P
b2 −→ σ 2 , n → ∞.
σ

116
Proof. Brockwell and Davis (1991), Theorem 8.1.1.
Least squares method

Consider again sequence (117) and suppose X1 , . . . , Xn to be known. The least square
estimators of parameters ϕ1 , . . . , ϕp are obtained by minimizing the sum of squares
n
X
min (Xt − ϕ1 Xt−1 − · · · − ϕp Xt−p )2 .
ϕ1 ,...,ϕp
t=p+1

The problem leads to the solution of the system of equations


n
X
(Xt − ϕ1 Xt−1 − · · · − ϕp Xt−p ) Xt−j = 0, j = 1, . . . , p,
t=p+1

i. e., to the system


n
X n
X n
X
2
ϕ1 Xt−1 + · · · + ϕp Xt−1 Xt−p = Xt Xt−1 ,
t=p+1 t=p+1 t=p+1
..
.
n
X n
X n
X
2
ϕ1 Xt−1 Xt−p + · · · + ϕp Xt−p = Xt Xt−p .
t=p+1 t=p+1 t=p+1

If we write (117) in commonly used form

Xt = ϕ0 X t−1 + Yt ,

where X t−1 = (Xt−1 , . . . , Xt−p )0 , then the solution is of the form


n
!−1 n
ep )0 =
X X
e = (ϕ
ϕ e1 , . . . , ϕ X t−1 X 0t−1 X t−1 Xt . (123)
t=p+1 t=p+1

The least squares estimator of σ 2 is


n
1 X 2
σ
e = 2
e 0 X t−1 .
Xt − ϕ (124)
n − p t=p+1

It can be shown that estimators ϕe and σ e2 have the same asymptotic properties as
the moment estimators. In particular, as n → ∞

ϕ − ϕ) −→ Np 0, σ 2 Γ−1
D 
n (e

117
and
P
e2 −→ σ 2 ,
σ
where Γ is the same matrix as in Theorem 59 (Brockwell and Davis, 1991, Chap. 8).
Maximum likelihood estimators
The maximum likelihood method assumes that the distribution of random variables
from which we are intended to construct estimators of parameters under consideration
is known.
Consider first a sequence {Xt , t ∈ Z}, that satisfies model Xt = ϕXt−1 + Yt , where Yt
are i. i. d. random variables with distribution N (0, σ 2 ). We assume causality, i.e., |ϕ| < 1.
Let us have observations X1 , . . . , Xn . From the causality and independence assumption
it follows that random variables X1 and (Y2 , . . . , Yn ) are jointly independent with the
density
n 1 X n o
2 −(n−1)/2 2

f (x1 , y2 , . . . , yn ) = f1 (x1 )f2 (y2 , . . . , yn ) = f1 (x1 ) 2πσ exp − 2 y .
2σ t=2 t
From the causality it also follows that random variable X1 has the distribution N (0, τ 2 ),
σ2
where τ 2 = 1−ϕ 2 . By the transformation density theorem we easily obtain that the joint

density of X1 , . . . , Xn is
n 1 n
2 −n/2
p X o
2 2
(xt − ϕxt−1 )2 . (125)

f (x1 , . . . , xn ) = 2πσ 1 − ϕ exp − 2 (1−ϕ )x1 +
2
2σ t=2

The likelihood function L(ϕ, σ 2 ) is of the same form as (125). Maximum likelihood
estimates are then ϕ̄, σ̄ 2 , that maximize L(ϕ, σ 2 ) on a given parametric space.
These are the unconditional maximum likelihood estimators and even in this simple
model the task to maximize the likelihood function leads to a non-linear optimization
problem.
More simple solution is provided by using the conditional maximum likelihood method.
We can easily realize that the conditional density of X2 , . . . , Xn given fixed X1 = x1
in our AR(1) model is
n 1 X n o
2 −(n−1)/2 2

f (x2 , . . . , xn |x1 ) = 2πσ exp − 2 (xt − ϕxt−1 ) . (126)
2σ t=2
The conditional maximum likelihood estimators are obtained by maximizing function
(126) with respect to ϕ and σ 2 .
Similarly, if we consider a general causal AR(p) sequence (117), where Yt are i. i. d. with
distribution N (0, σ 2 ), we can prove that the conditional density of (Xp+1 , . . . , Xn )0 given
X1 = x1 , . . . , Xp = xp is
( n
)
−(n−p)/2 1 X 2
f (xp+1 , . . . , xn |x1 , . . . , xp ) = 2πσ 2 (xt − ϕ0 xt−1 ) ,

exp − 2
2σ t=p+1

118
where xt−1 = (xt−1 , . . . , xt−p )0 , and ϕ = (ϕ1 , . . . , ϕp )0 .
By maximization of this function with respect to ϕ1 , . . . , ϕp , σ 2 we get the conditional
maximum likelihood estimators. It can be easily shown that under normality, these
estimators are numerically equivalent to the least squares estimators.

15.2 Estimation of parameters in MA and ARMA models


In the previous paragraph we have seen that in AR models, moment estimators, as well
as the least squares estimators and the conditional maximum likelihood estimators are
computationally very simple since we are dealing with linear regression functions. In MA
and generally in ARMA models the problem is more complicated since the estimation
equations are generally non-linear. We will mention only a few basic methods.
Moment method in MA(q)
Consider the MA(q) sequence defined by

Xt = Yt + θ1 Yt−1 + · · · + θq Yt−q , t ∈ Z,

where {Yt , t ∈ Z} is WN(0, σ 2 ). Suppose that θ1 , . . . , θq , σ 2 are unknown real-valued


parameters to be estimated from X1 , . . . , Xn .
Recall that the autocovariance function of the MA(q) sequence under consideration
is ( P
σ 2 q−|k|
j=0 θj θj+|k| , |k| ≤ q,
RX (k) = (127)
0 elsewhere
(we put θ0 = 1.)
Moment estimators of θ1 , . . . , θq , σ 2 can be obtained by solving the system of equations
(127) for k = 0, 1,. . . , q, where we replace RX (k) by the sample autocovariances R(k)
b =
1 n−k
Xt+k − X n . We get the system of q + 1 equations for θ1 , . . . , θq , σ 2
P
n t=1 Xt − X n

= σ 2 1 + θ12 + · · · + θq2 ,

R(0)
b (128)
R(1)
b = σ 2 (θ1 + θ1 θ2 + · · · + θq−1 θq ) ,
..
.
R(q) = σ 2 θq .
b

This system however need not have the unique solution.

Example 54. Consider MA(1) model Xt = Yt + θYt−1 , where Yt is WN(0, σ 2 ) and θ 6= 0.


Obviously, RX (0) = σ 2 (1 + θ2 ) a RX (1) = σ 2 θ, thus

RX (1) θ
r(1) = = .
RX (0) 1 + θ2

119
It can be shown that in this case, |r(1)| ≤ 21 for all real values of θ. Consequently, solving
1
the last equation with respect to θ, we get either the twofold root θ = 2r(1) or two
real-valued roots p
1 ± 1 − 4r2 (1)
θ1,2 = .
2r(1)
The root with the positive sign is in absolute value larger than 1, while those with the
negative sign is in absolute value less than 1, which corresponds to the invertible process.
The moment estimators of θ a σ 2 now can be obtained from equations (128) that can
be rewritten into the form

R(0)
b = σ 2 (1 + θ2 ),
θ
rb(1) = .
1 + θ2
For θ we have two solutions
p
1± 1 − 4b
r2 (1)
θb1,2 = ,
r(1)
2b

r(1)| ≤ 21 .
that take real values if |b
r(1)| < 12 , the moment estimators are
Provided that the process is invertible and |b
p
1− 1 − 4b
r2 (1)
θb = ,
r(1)
2b
R(0)
b
b2 =
σ .
1 + θb2

r(1)| = 12 , we take
If |b

1 rb(1)
θb = = ,
r(1)
2b |b
r(1)|
1b
b2
σ = R(0).
2

r(1)| > 12 the real-valued solution of (128) does not exist. In such a case we use
For |b
the same estimates as given for |br(1)| = 12 .

Similarly we can proceed to obtain moment estimators in ARMA models.


For a causal and invertible ARMA(p, q) process

Xt = ϕ1 Xt−1 + · · · + ϕp Xt−p + Yt + θ1 Yt−1 + · · · + θq Yt−q ,

120
where Yt ∼ WN(0, σ 2 ), ϕ1 , . . . , ϕp , θ1 , . . . , θq , σ 2 are unknown parameters and X1 , . . . , Xn
are given observations, we can proceed as follows:
First we use an analogy of the Yule-Walker equations for the autocovariances RX (k),
k = q + 1, . . . , q + p. We get equations

RX (k) = ϕ1 RX (k − 1) + · · · + ϕp RX (k − p)

for unknown parameters ϕ1 , . . . , ϕp . If we replace the theoretical values RX by their


bX (k) = 1 Pn−k Xt − X n Xt+k − X n , we obtain estimates of parameters
 
estimates R n t=1
ϕ
b1 , . . . , ϕ
bp .
Further, we put Zt = Xt − ϕ1 Xt−1 − · · · − ϕp Xt−p and want to estimate θ1 , . . . θq and
2
σ in the MA(q) model
Zt = Yt + θ1 Yt−1 + · · · + θq Yt−q .
Compute the autocovariance function of the sequence {Zt , t ∈ Z}. Since
p
X
Zt = βj Xt−j ,
j=0

where β0 = 1, βj = −ϕj , j = 1, . . . , p, we have


p p
X X
RZ (k) = βj βl RX (k + j − l), k ∈ Z.
j=0 l=0

Estimates of θ1 , . . . , θq a σ 2 are obtained from (128) replacing R(k)


b by estimates
p p
X X
R
bZ (k) = bX (k + j − l),
βbj βbl R
j=0 l=0

where βbj = −ϕbj and RbX (k) are sample autocovariances computed from X1 , . . . , Xn .
The moment estimators are under some assumptions consistent and asymptotically
normal, but they are not too stable and must be handled carefully. Nevertheless, they
can serve as preliminary estimates in more advanced estimation procedures.
Two-step least squares estimators in MA and ARMA models
Consider a causal and invertible ARMA(p, q) process

Xt = ϕ1 Xt−1 + · · · + ϕp Xt−p + Yt + θ1 Yt−1 + · · · + θq Yt−q ,

where Yt ∼ WN(0, σ 2 ), ϕ1 , . . . , ϕp , θ1 , . . . , θq , σ 2 are unknown parameters and X1 , . . . , Xn


are given observations.

121
Under the invertibility assumptions, the process has an AR(∞) representation

X ∞
X
Yt = dj Xt−j = Xt + dj Xt−j
j=0 j=1

(see Theorem 37.) This can be used to obtain parameters ϕ1 , . . . , ϕp , θ1 , . . . , θq , σ 2 as


follows.

• Approximate Xt by an autoregressive process of a sufficiently large order m, where


m ≥ p, i. e., consider model

Xt = α1 Xt−1 + · · · + αm Xt−m + Ỹt , t = m + 1, . . . , n

and using X1 , . . . , Xn estimate α1 , . . . , αm by the least squares method. Obtained


estimates are α̃1 , . . . , α̃m .

• Estimate residuals Yb̃ t , t = m + 1, . . . , n and use them as known regressors in the


regression model

Xt = ϕ1 Xt−1 +· · ·+ϕp Xt−p +θ1 Yb̃ t−1 +· · ·+θq Yb̃ t−q +Yt , t = max(p, q, m)+1, . . . , n

and estimate ϕ1 , . . . , ϕp , θ1 , . . . , θq , σ 2 from this regression model with regressors


X , . . . , X , Yb̃ , Yb̃ .
t−1 t−p t−1 t−q

For other estimating methods see, e. g., Prášková, 2016.

16 Periodogram
Definition 43. Let X1 , . . . , Xn be observations of a random sequence {Xt , t ∈ Z}. The
periodogram of X1 , . . . , Xn is defined by
n
1 X 2
−itλ
In (λ) = Xt e , λ ∈ [−π, π]. (129)
2πn t=1

To compute the values of the periodogram, it is more convenient to consider it in the


form
1  2
A (λ) + B 2 (λ) ,

In (λ) = (130)

where r n r n
2X 2X
A(λ) = Xt cos tλ, B(λ) = Xt sin tλ. (131)
n t=1 n t=1

122
For a real-valued sequence, the periodogram can be also expressed by
n n n−1 min(n,n−k)
1 XX −i(t−s)λ 1 X X
In (λ) = Xt X s e = Xs Xs+k e−ikλ
2πn t=1 s=1 2πn k=−n+1
s=max(1,1−k)
n−1
1 X
= e−ikλ Ck , (132)
2π k=−n+1

where
n−k
1X
Ck = Xt Xt+k , k≥0 (133)
n t=1
= C−k , k < 0.

Distribution of values of the periodogram


Theorem 60. Let {Xt , t ∈ Z} be a centered
P weakly stationary real-valued sequence with
the autocovariance function R, such that ∞k=−∞ |R(k)| < ∞. Then

EIn (λ) → f (λ), λ ∈ [−π, π], (134)

where f denotes the spectral density of the sequence {Xt , t ∈ Z}.


Proof. From formula (132), using the stationarity and the centrality we get
n−1
1 X
EIn (λ) = eitλ R(k)(n − |k|).
2πn k=−n+1

Under the assumptions of the theorem and according to Theorem 22, the spectral density
of the sequence {Xt , t ∈ Z} exists and is given by

1 X −ikλ
f (λ) = e R(k).
2π k=−∞

Thus, using the same arguments as in Theorem 22, we have, as n → ∞,


n−1
1 X 1 X
|f (λ) − EIn (λ)| ≤ |R(k)| + |R(k)||k| → 0. (135)
2π 2πn k=−n+1
|k|≥n

2πj
Usually, the periodogram is computed at points λj = n
, λj ∈ [−π, π] (Fourier
frequencies).

123
Theorem 61. Let {Xt , t ∈ Z} be a Gaussian random sequence of i. i. d. random vari-
ables with zero mean and variance σ 2 , 0 < σ 2 < ∞. Put n = 2m + 1 and consider
the periodogram In computed from X1 , . . . , Xn at frequencies λj = 2πj n
, j = 1, . . . , m.
Then random variables In (λ1 ), . . . , In (λm ) are independent and identically distributed as
σ2 2

χ (2), where χ2 (2) denotes the χ2 distribution with two degrees of freedom.
Proof. Consider random vector J = (A(λ1 ), . . . , A(λm ), B(λ1 ), . . . , B(λm ))0 , where the
variables A(λj ), B(λj ) are defined in (131). This vector has jointly normal distribution
since it is a linear transformation of the random vector (X1 , . . . , Xn )0 . Further we prove
that all the components of the vector J are mutually uncorrelated (and thus, indepen-
dent), and identically distributed with zero mean and variance σ 2 . For this we use the
following identities for trigonometric functions
n
X n
cos2 (tλr ) = , r = 1, . . . , m,
t=1
2
n
X n
sin2 (tλr ) = , r = 1, . . . , m,
t=1
2
n
X
sin(tλr ) cos(tλs ) = 0, r, s = 1, . . . , m,
t=1
n
X
sin(tλr ) sin(tλs ) = 0, r, s = 1, . . . , m, r 6= s,
t=1
Xn
cos(tλr ) cos(tλs ) = 0, r, s = 1, . . . , m, r 6= s
t=1

from that the result follows using simple computations. Particularly, we get for any
r = 1, . . . , m that A(λr ) ∼ N (0, σ 2 ), B(λr ) ∼ N (0, σ 2 ), thus
A2 (λr ) + B 2 (λr ) 4π
2
= 2 In (λr ) ∼ χ2 (2).
σ σ

Remark 24. From the assumption that {Xt , t ∈ Z} is a Gaussian random sequence of
i. i. d. random variables with zero mean and variance σ 2 we can easily conclude that the
spectral density of this sequence is
σ2
, λ ∈ [−π, π]
f (λ) =

since {Xt , t ∈ Z} is the white noise. From Theorem 61 and properties of the χ2 distri-
bution we have for r = 1, . . . , m,
σ2 σ2
EIn (λr ) = 2 = = f (λr ),
4π 2π

124
σ4
var In (λr ) = 4 = f 2 (λr ).
16π 2
We can see that the variance of the periodogram in this case does not depend on n. More
generally, it can be proved that for any Gaussian stationary centered sequence with a
continuous spectral density f it holds that
lim var In (λ) = f 2 (λ), λ 6= 0, λ ∈ (−π, π)
n→∞
= 2f 2 (λ), λ = 0, λ = ±π
(Anděl, 1976, p. 103, Theorem 10). We see that the variance of the periodogram does
not converge to zero with increasing n. It means that the periodogram is not consistent
estimator of the spectral density.
The periodogram was originally proposed to detect hidden periodic components in a
time series. To demonstrate it, let us consider a sequence {Xt , t ∈ Z} such that
Xt = αeitλ0 + Yt , Yt ∼ WN(0, σ 2 )
where α is a nonzero constant and λ0 ∈ [−π, π]. Then
n n n
1 X 1 X −itλ 1 X −it(λ−λ0 )
√ Xt e−itλ = √ Yt e +√ αe (136)
n t=1 n t=1 n t=1
and from here we can see that if λ = λ0 , the nonrandom part of the periodogram
represented by the second sum on the right-hand side of (136) tends to ∞ as n → ∞
while for λ 6= λ0 is negligible. It means that if there is a single periodic component at
frequency λ0 the periodogram takes in it the largest value. Since usually the frequency
λ0 is unknown, it is reasonable to consider maximum of the values of the periodogram
at the Fourier frequencies.
Theorem 62. Let {Xt , t ∈ Z} be a Gaussian random sequence of i. i. d. random variables
with zero mean and variance σ 2 . Let n = 2m + 1 and In (λr ) be the periodogram computed
from X1 , . . . , Xn at the frequencies λr = 2πr
n
, r = 1, . . . , m. Then the statistic
max1≤r≤m In (λr )
W = (137)
In (λ1 ) + · · · + In (λm )
has density
[1/x]
m−1
X  
j−1
g(x) = m(m − 1) (−1) (1 − jx)m−2 , 0<x<1
j=1
j−1

and
[1/x]  
X m k
P (W > x) = 1 − (−1) (1 − kx)m−1 , 0 < x < 1. (138)
k=0
k

125
7 Sunspots: Periodogram
x 10
2

1.8

1.6

1.4

1.2

0.8

0.6

0.4

0.2

0
0 0.1 0.2 0.3 0.4 0.5
cycles/year

Figure 15: Periodogram of the Sunspots, Wolf index. The maximum corresponds to the
cycle with period 11.0769 years

Proof. Anděl (1976), pp. 79–82.


Fisher test of periodicity We want to test the null hypothesis of no periodic compo-
nent H0 : “X1 , . . . , Xn are i. i. d. with distribution N (0, σ 2 ) “ against the alternative that
the null hypothesis is violated. The test statistic is based on Theorem 62 and reject the
null hypothesis at level α if W > cα where cα is a critical value that can be computed
from (138).
Estimators of spectral density
We have seen in Theorem 60 that the periodogram is the asymptotically unbiased
estimator of the spectral density, but it is not consistent since its variance does not
converge to zero neither is the simplest case of the Gaussian white noise. It can be
however shown that under some smoothing assumptions,
Z π
In (λ)K(λ)dλ
−π

where K is a kernel function with properties


Z π Z π
K(λ) ≥ 0, K(λ) = K(−λ), K(λ)dλ = 1, K 2 (λ)dλ < ∞,
−π −π

126
is an asymptotically unbiased and consistent estimator of
Z π
f (λ)K(λ)dλ.

Then a consistent estimator of f (λ0 ) is considered to be


Z π
fbn (λ0 ) = K(λ − λ0 )In (λ)dλ.
−π

If we expand function K into the Fourier series with the Fourier coefficients wk and
express the periodogram by using formulas (132) and (133), we get
Z πn−1
1 X −ikλ
fbn (λ0 ) = e Ck K(λ − λ0 )dλ
−π 2π k=−n+1
n−1 Z π
1 X
= Ck e−ikλ K(λ − λ0 )dλ
2π k=−n+1 −π
n−1 Z π ∞
1 X X
= Ck e−ikλ eij(λ−λ0 ) wj dλ
2π k=−n+1 −π j=−∞
n−1 ∞ Z π
1 X X
−ijλ0
= Ck e wj e(ijλ−ikλ) dλ
2π k=−n+1 j=−∞ −π

n−1
X n−1
X
−ikλ0
= e Ck w k = C0 w 0 + 2 Ck wk cos(kλ0 ). (139)
k=−n+1 k=1

One of the commonly used kernel function is so-called Parzen window, which is
usually presented by coefficients
  3
1 − 6 Mk 2 + 6 |k| , |k| ≤ M2
 

  M
3
wk = 2 1 − |k| , M
< |k| ≤ M

 M 2

0, |k| > M

where M is a truncation point that depends on n ( n6 < M < n5 ). For more information
on the choice of K, respectively of wk , see, e.g., Anděl, 1976, or Brockwell and Davis,
Chap. 10.

127
Sunspots: Spectral Density Estimate, Parzen window
45

40

35
Power/frequency (dB/rad/sample)

30

25

20

15

10

5
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Normalized Frequency (×π rad/sample)

Figure 16: Sunspots, Wolf index. Estimator of the spectral density, Parzen kernel

References
[1] Anděl, J. : Statistická analýza časových řad. SNTL, Praha 1976 (In Czech)

[2] Anděl, J.: Základy matematické statistiky. MATFYZPRESS, Praha 2007 (In Czech)

[3] Bosq, D., Nguyen, Hung T. : A Course in Stochastic Processes. Stochastic Models
and Statistical Inference. Kluwer, Dordrecht 1996

[4] Brockwell, P. J., Davis, R. A. : Time Series: Theory and Methods. Springer-Verlag,
New York 1991

[5] Loève, M. : Probability Theory. Van Nostrand, Princeton 1960

[6] Prášková, Z.: Základy náhodných procesů II, Karolinum, Praha 2016. 2nd, extended
edition (In Czech)

[7] Prášková, Z., Lachout, P. : Základy náhodných procesů I, Matfyzpress, Praha 2012.
2nd, extended edition (In Czech)

[8] Priestley, M. B. : Spectral Analysis and Time Series, Vol. I. Academic Press, London
1981

128
[9] Rao, R. C. : Lineárnı́ metody statistické indukce a jejich aplikace. Academia, Praha
1978. Czech translation from Rao, R. C. : Linear Statistical Inference and its Appli-
cations, Wiley, New York, 1973

[10] Rozanov, Yu. A. : Stacionarnyje slučajnyje processy. Gosudarstvennoje izdatelstvo


fiziko-matematičeskoj literatury, Moskva 1963 (In Russian)

[11] Rudin, W. : Analýza v reálném a komplexnı́m oboru. Academia, Praha. 2003. Czech
translation from Rudin, W. : Real and Complex Analysis., 3rd Ed. McGraw Hill,
New York, 1987

[12] Shumway, R. H., Stoffer, D. S. : Time Series Analysis and Its Applications. Springer-
Verlag, New York 2001

[13] Štěpán, J. : Teorie pravděpodobnosti. Matematické základy. Academia, Praha 1987.


In Czech.

[14] WDC-SILSO, Royal Observatory of Belgium, Brussels,


http://www.sidc.be/silso/datafiles

129
List of symbols

N set of natural numbers


N0 set of nonnegative integers
Z set of integers
R set of real numbers
C set of complex numbers
x column vector
A matrix
I identity matrix
|| · || norm in a Hilbert space
B Borel σ−algebra
N (µ, σ 2 ) normal distribution with parameters µ, σ 2
X ∼ N (µ, σ 2 ) random variable with distribution N (µ, σ 2 )
{Xt , t ∈ T } stochastic process indexed by set T
M{Xt , t ∈ T } linear span of {Xt , t ∈ T }
H{Xt , t ∈ T } Hilbert space generated by the stochastic process {Xt , t ∈ T }
AR(p) autoregressive sequence of order p
MA(q0 moving average sequence of order q
ARMA(p, q) mixed ARMA sequence of orders p and q
WN(0, σ 2 ) white noise with zero mean and variance σ 2
X⊥Y orthogonal (perpendicular) random variables
lim limes superior
P
−→ convergence in probability
D
−→ convergence in distribution
l. i. m. convergence in mean square (limit in the mean)

130

You might also like