Stochastic Processes 2
Lecture Notes
Zuzana Prášková
2017
Contents

1 Definitions and basic characteristics
  1.1 Definition of a stochastic process
  1.2 Daniell-Kolmogorov theorem
  1.3 Autocovariance and autocorrelation function
  1.4 Strict and weak stationarity
  1.5 Properties of autocovariance function
3 Hilbert space
  3.1 Inner product space
  3.2 Convergence in norm
4 Space L2(Ω, A, P)
  4.1 Construction
  4.2 Mean square convergence
  4.3 Hilbert space generated by a stochastic process
8.4 Autoregressive sequences
8.5 ARMA sequences
8.6 Linear filters
16 Periodogram
References
1 Definitions and basic characteristics
1.1 Definition of a stochastic process
Definition 1. Let (Ω, A, P) be a probability space, (S, E) a measurable space, and
T ⊂ R. A family of random variables {Xt , t ∈ T } defined on (Ω, A, P) with values in S
is called a stochastic (random) process.
If S = R, {Xt , t ∈ T } is called a real-valued stochastic process.
If T = Z = {0, ±1, ±2, . . . } or T ⊂ Z, {Xt , t ∈ T } is called a discrete time stochastic process, or a time series.
If T = [a, b], −∞ ≤ a < b ≤ ∞, {Xt , t ∈ T } is called a continuous time stochastic process.
For any ω ∈ Ω fixed, Xt (ω) is a function on T with values in S which is called a trajectory
of the process.
Definition 2. The pair (S, E), where S is the set of values of random variables Xt and
E is the σ−algebra of subsets of S, is called the state space of the process {Xt , t ∈ T }.
A system of characteristic functions is said to be consistent if
1. ϕ(u_{i_1}, . . . , u_{i_n}) = ϕ(u_1, . . . , u_n) for any permutation (i_1, . . . , i_n) of (1, . . . , n) (symmetry),
Definition 6. Let {Xt , t ∈ T } be a stochastic process such that EXt := µt exists for all
t ∈ T. Then the function {µt , t ∈ T } defined on T is called the mean value of the process
{Xt , t ∈ T }. We say that the process is centered if its mean value is zero, i.e., µt = 0 for
all t ∈ T.
Definition 7. Let {Xt , t ∈ T } be a process with finite second order moments, i.e.,
E|Xt |2 < ∞, ∀t ∈ T. Then a (complex-valued) function defined on T × T by
$$R(s,t) = E\big[(X_s - \mu_s)\overline{(X_t - \mu_t)}\big]$$
is called the autocovariance function of the process {Xt , t ∈ T }. The value R(t, t) is the
variance of the process at time t.
Definition 11. A stochastic process {Xt , t ∈ T } with finite second order moments is
said to be weakly stationary or second order stationary, if its mean value is constant,
µt = µ, ∀t ∈ T, and its autocovariance function R(s, t) is a function of s − t, only. If
only the latter condition is satisfied, the process is called covariance stationary.
The autocovariance function of a weakly stationary process is a function of one variable:
R(t) := R(t, 0), t ∈ T.
Theorem 2. Any strictly stationary stochastic process {Xt , t ∈ T } with finite second
order moments is also weakly stationary.
Proof. If {Xt , t ∈ T } is strictly stationary with finite second order moments, Xt are
equally distributed for all t ∈ T with the mean value
EXt = EXt+h , ∀t ∈ T, ∀h : t + h ∈ T.
$$EX_t = 0, \qquad \operatorname{var} X_t = \sigma^2 = \tfrac{3}{16}, \qquad R(s,t) = \sigma^2(-1)^{s+t} = \sigma^2(-1)^{s-t},$$
but it is not strictly stationary (the variables X and −X are not equally distributed).
Theorem 3. Any weakly stationary Gaussian process {Xt , t ∈ T } is also strictly sta-
tionary.
Proof. Weak stationarity of the process {Xt , t ∈ T } implies EXt = µ, cov (Xt , Xs ) =
R(t − s) = cov (Xt+h , Xs+h ), t, s ∈ T, thus, for all n ∈ N and all t1 , . . . , tn , h ∈ Z,
Since the joint distribution of a normal vector is uniquely defined by the vector of mean
values and the variance matrix, (Xt1 , . . . , Xtn ) ∼ N (µ, Σ), and (Xt1 +h , . . . , Xtn +h ) ∼
N (µ, Σ) from which the strict stationarity of {Xt , t ∈ T } follows.
Theorem 4. Let {Xt , t ∈ T } be a process with finite second moments. Then its autoco-
variance function satisfies
$$R(t,t) \ge 0, \qquad |R(s,t)| \le \sqrt{R(s,s)}\,\sqrt{R(t,t)}.$$
Proof. The first assertion follows from the definition of the variance. The second one
follows from the Schwarz inequality, since
$$|R(s,t)| = \big|E(X_s - EX_s)\overline{(X_t - EX_t)}\big| \le \big(E|X_s - EX_s|^2\big)^{1/2}\big(E|X_t - EX_t|^2\big)^{1/2} = \sqrt{R(s,s)}\,\sqrt{R(t,t)}.$$
Thus, for the autocovariance function of a weakly stationary process we have R(0) ≥ 0
and |R(t)| ≤ R(0).
Definition 12. Let f be a complex-valued function defined on T × T , T ⊂ R. We
say that f is positive semidefinite, if ∀n ∈ N, any complex numbers c1 , . . . , cn and any
t1 , . . . , tn ∈ T,
$$\sum_{j=1}^{n}\sum_{k=1}^{n} c_j \overline{c_k}\, f(t_j, t_k) \ge 0.$$
Theorem 6. Let {Xt , t ∈ T } be a process with finite second order moments. Then its
autocovariance function is positive semidefinite on T × T.
Proof. W. l. o. g., suppose that the process is centered. Then for any n ∈ N, complex
constants c1 , . . . , cn and t1 , . . . , tn ∈ T
$$0 \le E\Big|\sum_{j=1}^{n} c_j X_{t_j}\Big|^2 = E\Big[\sum_{j=1}^{n} c_j X_{t_j}\, \overline{\sum_{k=1}^{n} c_k X_{t_k}}\Big] = \sum_{j=1}^{n}\sum_{k=1}^{n} c_j \overline{c_k}\, E\big(X_{t_j}\overline{X_{t_k}}\big) = \sum_{j=1}^{n}\sum_{k=1}^{n} c_j \overline{c_k}\, R(t_j, t_k).$$
Theorem 7. To any positive semidefinite function R on T × T there exists a stochas-
tic process {Xt , t ∈ T } with finite second order moments such that its autocovariance
function is R.
Proof. The proof will be given for real-valued function R, only. For the proof with
complex-valued R see, e.g., Loève (1955), Chap. X, Par. 34.
Since R is positive semidefinite, for any n ∈ N and any real t1 , . . . , tn ∈ T, the matrix
$$V_t = \begin{pmatrix} R(t_1,t_1) & R(t_1,t_2) & \dots & R(t_1,t_n)\\ R(t_2,t_1) & R(t_2,t_2) & \dots & R(t_2,t_n)\\ \vdots & \vdots & & \vdots\\ R(t_n,t_1) & R(t_n,t_2) & \dots & R(t_n,t_n) \end{pmatrix}$$
is positive semidefinite. The function
$$\varphi(u) = \exp\Big(-\tfrac{1}{2}\, u^{\top} V_t\, u\Big), \qquad u \in \mathbb{R}^n,$$
is the characteristic function of the normal distribution Nn (0, Vt ). In this way, ∀n ∈ N
and any real t1 , . . . , tn ∈ T we get the consistent system of characteristic functions. The
corresponding system of the distribution functions is also consistent. Thus according to
the Daniell-Kolmogorov theorem (Theorem 1), there exists a Gaussian stochastic process
covariances of which are the values of the function R(s, t); hence, R is the autocovariance
function of this process.
The function cos t is positive semidefinite, and according to Theorem 7 there exists a (Gaussian) stochastic process {Xt , t ∈ T }, the autocovariance function of which is R(s, t) = cos(s − t).
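A small numerical sketch (not part of the original notes; the grid of time points, the parameter choices, and the explicit construction X_t = A cos t + B sin t are mine) illustrating the example: the matrix with entries cos(t_j − t_k) is positive semidefinite, and a centered Gaussian process with exactly this autocovariance is obtained from two independent standard normal variables.

```python
import numpy as np

# Sketch for R(s, t) = cos(s - t): the matrix with entries R(t_j, t_k) is positive
# semidefinite, and a centered Gaussian process with this autocovariance can be
# realized as X_t = A*cos(t) + B*sin(t) with A, B ~ N(0, 1) independent.
t = np.linspace(0.0, 10.0, 50)                 # arbitrary time points t_1, ..., t_n
V = np.cos(t[:, None] - t[None, :])            # V[j, k] = cos(t_j - t_k)
print(np.linalg.eigvalsh(V).min())             # >= 0 up to rounding error

rng = np.random.default_rng(0)
A, B = rng.standard_normal(2)
x = A * np.cos(t) + B * np.sin(t)              # cov(x_j, x_k) = cos(t_j - t_k)
print(x[:5])
```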
Theorem 8. The sum of two positive semidefinite functions is a positive semidefinite
function.
Proof. It follows from the definition of the positive semidefinite function. If f and g
are positive semidefinite and h = f + g, then for any n ∈ N, complex c1 , . . . , cn and
t1 , . . . , tn ∈ T
$$\sum_{j=1}^{n}\sum_{k=1}^{n} c_j \overline{c_k}\, h(t_j, t_k) = \sum_{j=1}^{n}\sum_{k=1}^{n} c_j \overline{c_k}\,\big[f(t_j, t_k) + g(t_j, t_k)\big] = \sum_{j=1}^{n}\sum_{k=1}^{n} c_j \overline{c_k}\, f(t_j, t_k) + \sum_{j=1}^{n}\sum_{k=1}^{n} c_j \overline{c_k}\, g(t_j, t_k) \ge 0.$$
A process {Xt , t ∈ T } is a Markov process if, for all n ∈ N and all t1 < · · · < tn < tn+1 in T,
$$P\big[X_{t_{n+1}} \le x \mid X_{t_1}, \dots, X_{t_n}\big] = P\big[X_{t_{n+1}} \le x \mid X_{t_n}\big] \quad \text{a.s.} \tag{1}$$
for all x ∈ R.
Relation (1) is called the Markovian property. Simple cases are discrete state Markov
processes, i.e., discrete and continuous time Markov chains.
Example 4. Consider a Markov chain {Xt , t ≥ 0} with the state space S = {0, 1}, the
initial distribution P (X0 = 0) = 1, P (X0 = 1) = 0 and the intensity matrix
$$Q = \begin{pmatrix} -\alpha & \alpha\\ \beta & -\beta \end{pmatrix}, \qquad \alpha > 0,\ \beta > 0.$$
Treat the stationarity of this process.
We know that all the finite dimensional distributions of a continuous time Markov
chain are determined by the initial distribution p(0) = {pj (0), j ∈ S}T and the transition
probability matrix P(t) = {pij (t), i, j ∈ S}. In our case, P(t) = exp(Qt), where
$$P(t) = \frac{1}{\alpha+\beta}\begin{pmatrix} \beta + \alpha e^{-(\alpha+\beta)t} & \alpha - \alpha e^{-(\alpha+\beta)t}\\ \beta - \beta e^{-(\alpha+\beta)t} & \alpha + \beta e^{-(\alpha+\beta)t} \end{pmatrix}$$
(see, e.g., Prášková and Lachout, 2012, pp. 93–95) and due to the initial distribution,
the absolute distribution is
$p(t)^{\top} = p(0)^{\top} P(t) = (1, 0)\, P(t) = \big(p_{00}(t), p_{01}(t)\big).$
Then we have
$$EX_t = P(X_t = 1) = p_{01}(t) = \frac{1}{\alpha+\beta}\big(\alpha - \alpha e^{-(\alpha+\beta)t}\big),$$
which depends on t, thus, the process is neither strictly nor weakly stationary.
On the other hand, if the initial distribution is the stationary distribution of the
Markov chain, i.e., such probability distribution that satisfies π T = π T P(t), then
{Xt , t ≥ 0} is the strictly stationary process (Prášková and Lachout, 2012, Theorem
3.12).
In our case, the solution of $\pi^{\top} = \pi^{\top} P(t)$ gives
$$\pi_0 = \frac{\beta}{\alpha+\beta}, \qquad \pi_1 = \frac{\alpha}{\alpha+\beta},$$
and from here we get the constant mean $EX_t = \frac{\alpha}{\alpha+\beta}$ and the autocovariance function
$$R(s,t) = \frac{\alpha\beta}{(\alpha+\beta)^2}\, e^{-(\alpha+\beta)|s-t|}.$$
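A minimal numerical check of Example 4 (the values α = 1, β = 2 and the time t = 0.7 are hypothetical choices of mine): it computes P(t) = exp(Qt), verifies that π is invariant, and compares the stationary autocovariance with the closed form above.

```python
import numpy as np
from scipy.linalg import expm

alpha, beta = 1.0, 2.0
Q = np.array([[-alpha, alpha],
              [beta, -beta]])

t = 0.7
P_t = expm(Q * t)                                      # transition matrix P(t)

pi = np.array([beta, alpha]) / (alpha + beta)          # stationary distribution
print(np.allclose(pi @ P_t, pi))                       # pi P(t) = pi

# Under the stationary initial distribution:
# cov(X_{s+t}, X_s) = P(X_{s+t}=1, X_s=1) - (EX)^2 = pi_1 * p_11(t) - pi_1**2.
R_t = pi[1] * P_t[1, 1] - pi[1] ** 2
closed_form = alpha * beta / (alpha + beta) ** 2 * np.exp(-(alpha + beta) * t)
print(np.isclose(R_t, closed_form))
```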
Example 5. Poisson process with intensity λ is a continuous time Markov chain {Xt , t ≥
0} such that X0 = 0 a. s. and for t > 0, Xt has the Poisson distribution with parameter
λt. Increments Xt − Xs , s < t have the Poisson distribution with the parameter λ(t − s).
The Poisson process is neither strictly nor weakly stationary.
2. For any 0 ≤ t1 < t2 < · · · < tn , Wt1 , Wt2 − Wt1 , Wt3 − Wt2 , . . . , Wtn − Wtn−1 are
independent random variables (independent increments).
3. For any 0 ≤ t < s, the increments Ws − Wt have normal distribution with zero
mean and the variance σ 2 (s − t), where σ 2 is a positive constant. Especially, for
any t ≥ 0, EWt = 0 and var Wt = σ 2 t.
The Wiener process is Markov but it is neither strictly nor weakly stationary.
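A short simulation sketch (my own illustration, not part of the notes; σ = 1 and the grid on [0, 1] are arbitrary choices) based directly on property 3: a Wiener trajectory is built from independent Gaussian increments, and var W_1 = σ² is checked empirically.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, n = 1.0, 1000
dt = 1.0 / n
increments = rng.normal(0.0, sigma * np.sqrt(dt), size=n)     # W_{(k+1)dt} - W_{k dt}
W = np.concatenate(([0.0], np.cumsum(increments)))            # W_0 = 0, partial sums

# var W_t = sigma^2 * t: empirical variance of W_1 over many independent paths.
paths = np.cumsum(rng.normal(0.0, sigma * np.sqrt(dt), size=(5000, n)), axis=1)
print(paths[:, -1].var(), sigma ** 2 * 1.0)                   # both close to sigma^2
```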
2.3 Martingales
Definition 16. Let (Ω, A, P) be a probability space, T ⊂ R, T ≠ ∅. Let, for any t ∈ T, Ft ⊂ A be a σ-algebra (σ-field). A system of σ-fields {Ft , t ∈ T } such that Fs ⊂ Ft for any s, t ∈ T, s < t, is called a filtration.
Definition 17. Let {Xt , t ∈ T } be a stochastic process defined on {Ω, A, P }, and let
{Ft , t ∈ T } be a filtration. We say that {Xt , t ∈ T } is adapted to {Ft , t ∈ T } if for any
t ∈ T, Xt is Ft measurable.
Definition 18. Let {Xt , t ∈ T } be adapted to {Ft , t ∈ T } and E|Xt | < ∞ for all t ∈ T.
Then {Xt , t ∈ T } is said to be a martingale if E(Xt | Fs ) = Xs a.s. for any s < t, s, t ∈ T.
3 Hilbert space
3.1 Inner product space
Definition 19. A complex vector space H is said to be an inner product space, if for
any x, y ∈ H there exists a number hx, yi ∈ C, called the inner product of elements x, y
such that
The number
$$\|x\| := \sqrt{\langle x, x\rangle}, \qquad \forall x \in H,$$
is called the norm of an element x.
Theorem 9. For the norm $\|x\| := \sqrt{\langle x, x\rangle}$ the following properties hold:
3. ∀x, y ∈ H: kx + yk ≤ kxk + kyk.
4. ∀x, y ∈ H: $|\langle x, y\rangle| \le \|x\| \cdot \|y\| = \sqrt{\langle x, x\rangle}\,\sqrt{\langle y, y\rangle}$
(the Cauchy-Schwarz inequality).
Proof. It can be found in any textbook on Functional Analysis, see, e.g., Rudin (2003),
Chap. 4.
4 Space L2(Ω, A, P )
4.1 Construction
Let L be the set of all random variables with finite second order moments defined on a
probability space (Ω, A, P). We can easily verify that L is the vector space:
1. ∀ X, Y ∈ L, X + Y ∈ L, since E|X + Y|² ≤ 2E|X|² + 2E|Y|² < ∞.
2. ∀X ∈ L and ∀α ∈ C, αX ∈ L, since E|αX|² = |α|²E|X|² < ∞.
On L, define the equivalence
$$X \sim Y \iff P[X = Y] = 1,$$
and on the set of classes of equivalent random variables from L define the relation
$$\langle X, Y\rangle = E\, X\overline{Y}, \qquad \forall X \in \widetilde{X},\ Y \in \widetilde{Y},$$
where $\widetilde{X}, \widetilde{Y}$ denote the classes of equivalence.
The space of classes of equivalence on L with the above relation h., .i is denoted
L2 (Ω, A, P). The relation hX, Y i satisfies the properties of the inner product on L2 (Ω, A, P):
For every X, Y, Z ∈ L2 (Ω, A, P) and every α ∈ C it holds
1. $\langle \alpha X, Y\rangle = E\,\alpha X\overline{Y} = \alpha E\, X\overline{Y} = \alpha\langle X, Y\rangle$.
2. $\langle X + Y, Z\rangle = E\,(X+Y)\overline{Z} = E\, X\overline{Z} + E\, Y\overline{Z} = \langle X, Z\rangle + \langle Y, Z\rangle$.
3. $\langle X, X\rangle = E\, X\overline{X} = E|X|^2 \ge 0$.
4. $\langle X, X\rangle = E|X|^2 = 0 \iff X \sim 0$.
4.2 Mean square convergence
We have defined L2 (Ω, A, P) to be the space of classes of equivalence on L with the inner
product
$$\langle X, Y\rangle = E\, X\overline{Y};$$
the norm is therefore defined by
$$\|X\| := \sqrt{E|X|^2}.$$
Definition 23. We say that a sequence of random variables Xn such that E|Xn |² < ∞ converges in the mean square (or in the squared mean) to a random variable X, if it converges to X in L2 (Ω, A, P), i.e., if E|Xn − X|² → 0 as n → ∞.
Proof. See, e. g., Brockwell and Davis (1991), Par. 2.10, or Rudin (2003), Theorem
3.11.
The space L2 (Ω, A, P) is the Hilbert space.
Convention: A stochastic process {Xt , t ∈ T } such that E|Xt |2 < ∞ will be called a
second order process.
Equivalence classes in M{Xt , t ∈ T } and the inner product hX, Y i are defined as
above.
Definition 25. A closure M{Xt , t ∈ T } of the linear span M{Xt , t ∈ T } consists of all
the elements of M{Xt , t ∈ T } and the mean square limits of all convergent sequences
of elements of M{Xt , t ∈ T }.
Then M{Xt , t ∈ T } is a closed subspace of the complete space L2 (Ω, A, P) and thus
a complete inner product space. It is called the Hilbert space generated by a stochastic
process {Xt , t ∈ T }, notation H{Xt , t ∈ T }.
Definition 26. Let {Xth , t ∈ T }h∈S , T ⊂ R, S ⊂ R, be a collection of stochastic pro-
cesses in L2 (Ω, A, P) (shortly: second order processes). We say that processes {Xth , t ∈
T }h∈S converge in mean square to a second order process {Xt , t ∈ T } as h → h0 , if $E|X_t^h - X_t|^2 \to 0$ as h → h0 for every t ∈ T.
Briefly, we write $X_t^h \to X_t$ in mean square as h → h0 .
Theorem 12. Centered second order processes $\{X_t^h, t \in T\}_{h\in S}$ converge in mean square to a centered second order process {Xt , t ∈ T } as h → h0 if and only if
$$E\big[X_t^h \overline{X_t^{h'}}\big] \to b(t) \quad \text{as } h, h' \to h_0,$$
where b(t) is a finite limit, for every t ∈ T.
Proof. Let $X_t^{h'} \to X_t$ in mean square as h' → h0 . Then, by the continuity of the inner product, $E\big[X_t^h \overline{X_t^{h'}}\big] \to E|X_t|^2 =: b(t) < \infty$ as h, h' → h0 , since {Xt , t ∈ T } is a second order process. For h = h' we also get
$$E\big[X_t^h \overline{X_{t'}^h}\big] \to E\big[X_t \overline{X_{t'}}\big] \quad \text{as } h \to h_0,\ t, t' \in T,$$
where $E\big[X_t^h \overline{X_{t'}^h}\big] = R^h(t, t')$ is the autocovariance function of the process $\{X_t^h, t \in T\}$ and $E\big[X_t \overline{X_{t'}}\big] = R(t, t')$ is the autocovariance function of the process {Xt , t ∈ T }.
Conversely, let the condition hold. Then
$$\|X_t^h - X_t^{h'}\|^2 \to 0 \quad \text{as } h, h' \to h_0,\ \forall t \in T,$$
since ∀t ∈ T
$$\|X_t^h - X_t^{h'}\|^2 = E\big[(X_t^h - X_t^{h'})\overline{(X_t^h - X_t^{h'})}\big] = E\big[X_t^h \overline{X_t^h}\big] - E\big[X_t^h \overline{X_t^{h'}}\big] - E\big[X_t^{h'} \overline{X_t^h}\big] + E\big[X_t^{h'} \overline{X_t^{h'}}\big] \to b(t) - b(t) - b(t) + b(t) = 0$$
as h, h' → h0 .
We have proved that processes {Xth , t ∈ T }h∈S satisfy the Cauchy property for any
t ∈ T. Due to the completeness of L2 (Ω, A, P), ∀t ∈ T, ∃Xt ∈ L2 (Ω, A, P) such that
Xth → Xt in mean square as h → h0 , thus E|Xt |2 < ∞ ∀t ∈ T. Therefore there exists a
limit process {Xt , t ∈ T } ∈ L2 (Ω, A, P). We prove that {Xt , t ∈ T } is centered:
Then, since $EX_t^h = 0$,
$$|EX_t| = \big|E(X_t - X_t^h)\big| \le \big(E|X_t - X_t^h|^2\big)^{1/2} \to 0$$
as h → h0 , ∀t ∈ T .
We say that the process {Xt , t ∈ T } is mean square continuous, if it is continuous at
each point of T .
Remark 2. A second order process that is mean square continuous is also stochastically
continuous (continuous in probability), since
$$P\big[|X_t - X_{t_0}| > \varepsilon\big] \le \varepsilon^{-2}\, E|X_t - X_{t_0}|^2.$$
Proof. 1. Let {Xt , t ∈ T } be a centered mean square continuous process. We prove that
its autocovariance function is continuous at every point of T × T. Since EXt = 0, we
have, for all s0 , t0 ∈ T, as s → s0 , t → t0 ,
$$|R(s,t) - R(s_0,t_0)| = \big|E\, X_s\overline{X_t} - E\, X_{s_0}\overline{X_{t_0}}\big| = \big|\langle X_s, X_t\rangle - \langle X_{s_0}, X_{t_0}\rangle\big| \to 0,$$
which follows from the continuity of the inner product, since Xt → Xt0 as t → t0 and
Xs → Xs0 as s → s0 , due to the continuity of the process.
The limit on the right hand side is zero as t → t0 , thus the limit on the left hand side is
zero.
Theorem 14. Let {Xt , t ∈ T } be a second order process with a mean value {µt , t ∈ T }
and an autocovariance function R(s, t) defined on T × T . Then {Xt , t ∈ T } is mean
square continuous if {µt , t ∈ T } is continuous on T and R(s, t) is continuous at points
[s, t], such that s = t.
Proof.
Put Yt := Xt − µt , ∀t ∈ T . Then {Yt , t ∈ T } is centered process with the same
autocovariance function R(s, t) and
Theorem 15. Let {Xt , t ∈ T } be a centered weakly stationary process with an autoco-
variance function R(t). Then {Xt , t ∈ T } is mean square continuous if and only if R(t)
is continuous at zero.
Proof. Due to the weak stationarity, R(s, t) = R(s − t). Then the assertion follows from
the previous theorem.
The process is weakly stationary, but not mean square continuous (the autocovariance
function is not continuous at zero).
Example 10. The Poisson process {Xt , t ≥ 0} with intensity λ > 0 is a process with stationary and independent increments, Xt ∼ Po(λt). Since EXt = µt = λt, t ≥ 0, and cov(Xs , Xt ) = λ · min{s, t}, the process is not weakly stationary.
Since µt is continuous and R(s, t) is continuous, the process is mean square continuous (Theorem 14).
5.2 Mean square derivative of the process
Definition 28. Let {Xt , t ∈ T } be a second order process, T ⊂ R an open interval. We
say that the process is mean square differentiable (L2 -differentiable) at point t0 ∈ T if
there exists the mean square limit
$$\underset{h \to 0}{\text{l.i.m.}}\; \frac{X_{t_0+h} - X_{t_0}}{h} =: X'_{t_0}.$$
This limit is called the mean square derivative (L2 -derivative) of the process at t0 .
We say that the process {Xt , t ∈ T } is mean square differentiable, if it is mean square
differentiable at every point t ∈ T .
Theorem 16. A centered second order process {Xt , t ∈ T } is mean square differentiable
if and only if there exists a finite generalized second-order partial derivative of its auto-
covariance function R(s, t) at points [s, t], where s = t, i.e., if at these points there exists
finite limit
$$\lim_{h, h' \to 0} \frac{1}{h h'}\big[R(s+h, t+h') - R(s, t+h') - R(s+h, t) + R(s, t)\big].$$
Proof. According to Theorem 12, the necessary and sufficient condition for the mean
square convergence of (Xt+h − Xt )/h is the existence of the finite limit
$$\lim_{h, h' \to 0} E\Big[\frac{X_{t+h} - X_t}{h}\cdot\overline{\Big(\frac{X_{t+h'} - X_t}{h'}\Big)}\Big] = \lim_{h, h' \to 0}\frac{1}{h h'}\big[R(t+h, t+h') - R(t, t+h') - R(t+h, t) + R(t, t)\big].$$
If the partial derivatives $\frac{\partial^2 R(s,t)}{\partial s\,\partial t}$ and $\frac{\partial^2 R(s,t)}{\partial t\,\partial s}$ exist and are continuous, then the generalized second-order partial derivative of R(s, t) exists and is equal to $\frac{\partial^2 R(s,t)}{\partial s\,\partial t}$ (Anděl, 1976, p. 20).
Theorem 17. A second order process {Xt , t ∈ T } with the mean value {µt , t ∈ T } is
mean square differentiable, if {µt , t ∈ T } is differentiable and the generalized second-
order partial derivative of the autocovariance function exists and is finite at points [s, t],
such that s = t.
Proof. A sufficient condition for the mean square limit of (X_{t+h} − X_t)/h to exist is the Cauchy condition
$$E\Big|\frac{X_{t+h} - X_t}{h} - \frac{X_{t+h'} - X_t}{h'}\Big|^2 \to 0 \quad \text{as } h \to 0,\ h' \to 0,$$
∀t ∈ T. It holds since
$$E\Big|\frac{X_{t+h} - X_t}{h} - \frac{X_{t+h'} - X_t}{h'}\Big|^2 \le 2\, E\Big|\frac{Y_{t+h} - Y_t}{h} - \frac{Y_{t+h'} - Y_t}{h'}\Big|^2 + 2\,\Big|\frac{\mu_{t+h} - \mu_t}{h} - \frac{\mu_{t+h'} - \mu_t}{h'}\Big|^2,$$
Example 11. A centered weakly stationary process with the autocovariance function
R(s, t) = cos(s − t), s, t ∈ R, is mean square differentiable, since
$\frac{\partial^2 \cos(s-t)}{\partial s\,\partial t}$ and $\frac{\partial^2 \cos(s-t)}{\partial t\,\partial s}$ exist and they are continuous.
Example 12. Poisson process {Xt , t > 0} has the mean value µt = λt, which is continu-
ous and differentiable for all t > 0 and the autocovariance function R(s, t) = λ min(s, t).
The generalized second-order partial derivative of R(s, t), however, is not finite: for s = t we have
$$\lim_{h \to 0+} \frac{1}{h^2}\big[s + h - \min(s+h, s) - \min(s, s+h) + s\big] = +\infty,$$
$$\lim_{h \to 0-} \frac{1}{h^2}\big[s + h - \min(s+h, s) - \min(s, s+h) + s\big] = +\infty.$$
and define partial sums In of a centered second order process {Xt , t ∈ [a, b]} by
$$I_n := \sum_{i=0}^{n-1} X_{t_{n,i}}\big(t_{n,i+1} - t_{n,i}\big), \qquad n \in \mathbb{N}.$$
If the sequence {In , n ∈ N} has the mean square limit I for any partition of the interval
[a, b] such that ∆n → 0 as n → ∞, we call it Riemann integral of the process {Xt , t ∈
[a, b]} and write
$$I = \int_a^b X_t\, dt.$$
If the process {Xt , t ∈ T } has mean value {µt , t ∈ T }, we define the Riemann integral
of the process {Xt , t ∈ [a, b]} to be
$$\int_a^b X_t\, dt = \int_a^b (X_t - \mu_t)\, dt + \int_a^b \mu_t\, dt,$$
if the centered process {Xt − µt } is Riemann integrable and $\int_a^b \mu_t\, dt$ exists and is finite.
Theorem 18. Let {Xt , t ∈ [a, b]} be a centered second order process with the autocovariance function R(s, t). Then the Riemann integral $\int_a^b X_t\, dt$ exists if the Riemann integral $\int_a^b\!\int_a^b R(s,t)\, ds\, dt$ exists and is finite.
Proof. Let Dm = {sm,0 , . . . , sm,m }, Dn = {tn,0 , . . . , tn,n } be partitions of interval [a, b],
the norms ∆m , ∆n of which converge to zero as m, n → ∞. Put
$$I_m := \sum_{j=0}^{m-1}\big(s_{m,j+1} - s_{m,j}\big)\, X_{s_{m,j}}, \qquad I_n := \sum_{k=0}^{n-1}\big(t_{n,k+1} - t_{n,k}\big)\, X_{t_{n,k}}.$$
Similarly as in the proof of Theorem 12 we can see that $\int_a^b X_t\, dt$ exists if there exists the finite limit
$$\lim E\big[I_m \overline{I_n}\big] = \lim E\Big\{\Big[\sum_{j=0}^{m-1} X_{s_{m,j}}\big(s_{m,j+1} - s_{m,j}\big)\Big]\cdot\overline{\Big[\sum_{k=0}^{n-1} X_{t_{n,k}}\big(t_{n,k+1} - t_{n,k}\big)\Big]}\Big\} = \lim \sum_{j=0}^{m-1}\sum_{k=0}^{n-1} R\big(s_{m,j}, t_{n,k}\big)\big(s_{m,j+1} - s_{m,j}\big)\big(t_{n,k+1} - t_{n,k}\big)$$
as m, n → ∞, ∆m , ∆n → 0, which follows from the existence of $\int_a^b\!\int_a^b R(s,t)\, ds\, dt$.
Example 13. The Riemann integral $\int_a^b X_t\, dt$ of a centered continuous time process with the autocovariance function R(s, t) = cos(s − t) exists, since R(s, t) is continuous on [a, b] × [a, b].
Example 14. Let {Xt , t ∈ R} be a centered second order process. We define
$$\int_{-\infty}^{\infty} X_t\, dt := \underset{a \to -\infty,\ b \to \infty}{\text{l.i.m.}} \int_a^b X_t\, dt,$$
if the limit and the Riemann integral on the right hand side exist.
Example 15. Poisson process {Xt , t ≥ 0} is Riemann integrable on any finite interval
[a, b] ⊂ [0, ∞), since its autocovariance function is continuous on [a, b] × [a, b].
then µ(B) = ν(B) for every Borel B ⊂ (−π, π) and µ({−π} ∪ {π}) = ν({−π} ∪ {π}).
b) Let µ, ν be finite measures on (R, B). If for every t ∈ R
$$\int_{-\infty}^{\infty} e^{it\lambda}\, d\mu(\lambda) = \int_{-\infty}^{\infty} e^{it\lambda}\, d\nu(\lambda),$$
then µ = ν.
Proof. Rao (1978), Theorem 2c.4, II.
Remark 4. The integral in the Helly-Bray theorem is the Riemann-Stieltjes integral of a function f with respect to a function F. If [a, b] is a bounded interval and F is right-continuous, we will understand that
$$\int_a^b f(x)\, dF(x) := \int_{(a,b]} f(x)\, dF(x).$$
Put tj = j, cj = e−ijλ for a λ ∈ [−π, π]. Then for every n ∈ N, λ ∈ [−π, π],
$$\varphi_n(\lambda) := \frac{1}{2\pi n}\sum_{j=1}^{n}\sum_{k=1}^{n} e^{-i(j-k)\lambda} R(j-k) \ge 0.$$
From here we get
$$\varphi_n(\lambda) = \frac{1}{2\pi n}\sum_{j=1}^{n}\sum_{k=1}^{n} e^{-i(j-k)\lambda} R(j-k) = \frac{1}{2\pi n}\sum_{\kappa=-n+1}^{n-1} e^{-i\kappa\lambda} R(\kappa)\!\!\sum_{j=\max(1,\,\kappa+1)}^{\min(n,\,\kappa+n)}\!\! 1 = \frac{1}{2\pi n}\sum_{\kappa=-n+1}^{n-1} e^{-i\kappa\lambda} R(\kappa)\,(n - |\kappa|).$$
On the other hand,
$$\int_{-\pi}^{\pi} e^{it\lambda}\, dF_{n_k}(\lambda) = \int_{-\pi}^{\pi} e^{it\lambda}\varphi_{n_k}(\lambda)\, d\lambda = \int_{-\pi}^{\pi} e^{it\lambda}\Big[\frac{1}{2\pi n_k}\sum_{\kappa=-n_k+1}^{n_k-1} e^{-i\kappa\lambda} R(\kappa)\,(n_k - |\kappa|)\Big] d\lambda = \frac{1}{2\pi n_k}\sum_{\kappa=-n_k+1}^{n_k-1} R(\kappa)\,(n_k - |\kappa|)\int_{-\pi}^{\pi} e^{i(t-\kappa)\lambda}\, d\lambda,$$
thus,
$$\int_{-\pi}^{\pi} e^{it\lambda}\, dF_{n_k}(\lambda) = \begin{cases} R(t)\Big(1 - \dfrac{|t|}{n_k}\Big), & |t| < n_k,\\[4pt] 0, & \text{elsewhere.} \end{cases}$$
We get
$$\lim_{k\to\infty}\int_{-\pi}^{\pi} e^{it\lambda}\, dF_{n_k}(\lambda) = \lim_{k\to\infty} R(t)\Big(1 - \frac{|t|}{n_k}\Big) = R(t) = \int_{-\pi}^{\pi} e^{it\lambda}\, dF(\lambda).$$
To prove the uniqueness, suppose that $R(t) = \int_{-\pi}^{\pi} e^{it\lambda}\, dG(\lambda)$, where G is a right-continuous non-decreasing bounded function on [−π, π] and G(−π) = 0. Then
$$\int_{-\pi}^{\pi} e^{it\lambda}\, d\mu_F = \int_{-\pi}^{\pi} e^{it\lambda}\, d\mu_G,$$
where µF and µG are finite measures on Borel subsets of the interval [−π, π] induced by the functions F and G, respectively. The rest of the proof follows from Lemma 1 since µF (B) = µG (B) for any B ⊂ (−π, π) and µF ({−π} ∪ {π}) = µG ({−π} ∪ {π}).
Formula (2) is called the spectral decomposition (representation) of an autocovari-
ance function of a stationary random sequence. The function F is called the spectral
distribution function of a stationary random sequence.
If there exists a function f (λ) ≥ 0 for λ ∈ [−π, π] such that $F(\lambda) = \int_{-\pi}^{\lambda} f(x)\, dx$ (F is absolutely continuous), then f is called the spectral density. Obviously f = F'.
In case that the spectral density exists, the spectral decomposition of the autocovari-
ance function is of the form
$$R(t) = \int_{-\pi}^{\pi} e^{it\lambda} f(\lambda)\, d\lambda, \qquad t \in \mathbb{Z}. \tag{3}$$
Theorem 20. A complex-valued function R(t), t ∈ R, is the autocovariance function of
a centered stationary mean square continuous process if and only if
$$R(t) = \int_{-\infty}^{\infty} e^{it\lambda}\, dF(\lambda), \qquad t \in \mathbb{R}, \tag{4}$$
Remark 5. Two different stochastic processes may have the same spectral distribution
functions and thus the same autocovariance functions.
where
$$f(\lambda) = \frac{1}{2\pi}\sum_{t=-\infty}^{\infty} e^{-it\lambda} K(t), \qquad \lambda \in [-\pi, \pi].$$
Proof. Let K be such that $\sum_{t=-\infty}^{\infty} |K(t)| < \infty$ and $f(\lambda) = \frac{1}{2\pi}\sum_{t=-\infty}^{\infty} e^{-it\lambda} K(t)$. Since the series $\sum_{t=-\infty}^{\infty} e^{-it\lambda} K(t)$ converges absolutely and uniformly for λ ∈ [−π, π], we can interchange the integration and the summation and for any t ∈ Z we get
$$\int_{-\pi}^{\pi} e^{it\lambda} f(\lambda)\, d\lambda = \int_{-\pi}^{\pi} e^{it\lambda}\Big[\frac{1}{2\pi}\sum_{k=-\infty}^{\infty} e^{-ik\lambda} K(k)\Big] d\lambda = \frac{1}{2\pi}\sum_{k=-\infty}^{\infty} K(k)\int_{-\pi}^{\pi} e^{i(t-k)\lambda}\, d\lambda = \frac{1}{2\pi}\sum_{k=-\infty}^{\infty} K(k)\, 2\pi\,\delta(t-k) = K(t).$$
where
$$f(\lambda) = \frac{1}{2\pi}\sum_{t=-\infty}^{\infty} e^{-it\lambda} R(t), \qquad \lambda \in [-\pi, \pi].$$
To prove that f is the spectral density, due to the uniqueness of the spectral decompo-
sition (3), it suffices to prove that f (λ) ≥ 0 for every λ ∈ [−π, π].
We know from the proof of Theorem 19 that for every λ ∈ [−π, π],
$$\varphi_n(\lambda) = \frac{1}{2\pi n}\sum_{\kappa=-n+1}^{n-1} e^{-i\kappa\lambda} R(\kappa)\,(n - |\kappa|) \ge 0.$$
We have, as n → ∞,
$$|f(\lambda) - \varphi_n(\lambda)| \le \frac{1}{2\pi}\Big|\sum_{|k|\ge n} e^{-ik\lambda} R(k)\Big| + \frac{1}{2\pi n}\Big|\sum_{\kappa=-n+1}^{n-1} e^{-i\kappa\lambda} R(\kappa)\,|\kappa|\Big| \le \frac{1}{2\pi}\sum_{|k|\ge n} |R(k)| + \frac{1}{2\pi n}\sum_{\kappa=-n+1}^{n-1} |R(\kappa)|\,|\kappa| \longrightarrow 0,$$
where we have used the assumption on the absolute summability of the autocovariance function and the Kronecker lemma.
Formula (6) is called the inverse formula for computing the spectral density of a
stationary random sequence.
Theorem 23. Let {Xt , t ∈ R} be a centered weakly stationary mean square continuous process. Let its autocovariance function R satisfy the condition $\int_{-\infty}^{\infty} |R(t)|\, dt < \infty$. Then the spectral density of the process exists and it holds
$$f(\lambda) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-it\lambda} R(t)\, dt, \qquad \lambda \in (-\infty, \infty). \tag{7}$$
Example 17. Consider a stationary sequence with the autocovariance function R(t) = a^{|t|}, t ∈ Z, |a| < 1. Since
$$\sum_{t=-\infty}^{\infty} |R(t)| = \sum_{t=-\infty}^{\infty} |a|^{|t|} = 1 + 2\sum_{t=1}^{\infty} |a|^t < \infty,$$
the spectral density exists and, by the inverse formula (6),
$$f(\lambda) = \frac{1}{2\pi}\sum_{t=-\infty}^{\infty} a^{|t|} e^{-it\lambda} = \frac{1}{2\pi}\cdot\frac{1 - a^2}{1 - 2a\cos\lambda + a^2}, \qquad \lambda \in [-\pi, \pi].$$
for every λ ∈ R.
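A numerical sketch of the inverse formula (6) for the autocovariance of Example 17 (the values a = 0.8 and the truncation level N are my own choices): the truncated sum is compared with the closed-form spectral density stated above.

```python
import numpy as np

a, N = 0.8, 500
lam = np.linspace(-np.pi, np.pi, 201)
t = np.arange(-N, N + 1)

# truncated inverse formula (1/2pi) * sum_{|t| <= N} e^{-i t lambda} a^{|t|}
f_sum = (np.exp(-1j * np.outer(lam, t)) * a ** np.abs(t)).sum(axis=1).real / (2 * np.pi)
f_closed = (1 - a ** 2) / (2 * np.pi * (1 - 2 * a * np.cos(lam) + a ** 2))
print(np.max(np.abs(f_sum - f_closed)))   # truncation error of order a**N
```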
Example 19. Consider a centered mean square process with the spectral distribution
function
$$F(\lambda) = \begin{cases} 0, & \lambda < -1,\\ \tfrac{1}{2}, & -1 \le \lambda < 1,\\ 1, & \lambda \ge 1. \end{cases}$$
[Figure 3: Trajectories of a process with the autocovariance function R(t) = a^{|t|}; top: a = 0.8, bottom: a = −0.8.]
The spectral distribution function is not absolutely continuous; the spectral density of the process does not exist. According to (4) the autocovariance function is
$$R(t) = \int_{-\infty}^{\infty} e^{it\lambda}\, dF(\lambda) = \frac{1}{2} e^{-it} + \frac{1}{2} e^{it} = \cos t, \qquad t \in \mathbb{R}.$$
The process has a discrete spectrum with non-zero values at frequencies λ1 = −1, λ2 = 1.
Example 20. The process {Xt , t ∈ R} of uncorrelated random variables with zero mean
and a finite positive variance does not satisfy decomposition (4), since it is not mean
square continuous.
[Figure 4: Autocovariance function R(t) = a^{|t|} (left) and the spectral density (right) of the AR(1) sequence Xt = 0.8 Xt−1 + Yt , Yt ∼ N(0, 1); a = 0.8.]
[Figure 5: Autocovariance function R(t) = a^{|t|} (left) and the spectral density (right) of the AR(1) sequence Xt = −0.8 Xt−1 + Yt , Yt ∼ N(0, 1); a = −0.8.]
[Figure 6: Autocovariance function R(t) = ce^{−α|t|}, t ∈ R (left) and the spectral density (right); c = 1, α = 1.]
We also say that the increments of the process are orthogonal random variables.
In what follows we will consider only centered right-mean square continuous processes
i.e., such that E|Xt − Xt0 |2 → 0 as t → t0 + for any t0 ∈ T.
Theorem 24. Let {Zλ , λ ∈ [a, b]} be a centered orthogonal increment right-mean square
continuous process, [a, b] a bounded interval. Then there exists a unique non-decreasing
right-continuous function F such that
$$F(\lambda) = 0,\ \lambda \le a; \qquad F(\lambda) = F(b),\ \lambda \ge b; \qquad F(\lambda_2) - F(\lambda_1) = E|Z_{\lambda_2} - Z_{\lambda_1}|^2,\ a \le \lambda_1 < \lambda_2 \le b. \tag{8}$$
Put F(λ) := E|Zλ − Za |², λ ∈ [a, b]. We will show that this function is non-decreasing, right-continuous and satisfies the condition of the theorem. Obviously, it suffices to consider λ ∈ [a, b], only.
Let a < λ1 < λ2 < b. Then
$$E|Z_{\lambda_2} - Z_a|^2 = E|Z_{\lambda_2} - Z_{\lambda_1}|^2 + E|Z_{\lambda_1} - Z_a|^2,$$
since the increments Zλ2 − Zλ1 and Zλ1 − Za are orthogonal. From here we have
$$F(\lambda_2) - F(\lambda_1) = E|Z_{\lambda_2} - Z_{\lambda_1}|^2 \ge 0,$$
which means that F is non-decreasing and also right-continuous, due to the right-continuity of the process {Zλ , λ ∈ [a, b]}. Condition (8) is satisfied.
Now, let G be a non-decreasing right-continuous function that satisfies conditions of
the theorem. Then G(a) = 0 = F (a) and for λ ∈ (a, b] it holds G(λ) = G(λ) − G(a) =
E|Zλ − Za |2 = F (λ) − F (a) = F (λ), which proves the uniqueness of function F .
Example 21. The Wiener process on [0, T ] is a centered mean square continuous Gaussian process with independent and stationary increments, therefore with orthogonal increments, such that W0 = 0, Ws − Wt ∼ N(0, σ²|s − t|), 0 ≤ s, t ≤ T. The associated distribution function on [0, T ] is
$$F(\lambda) = \begin{cases} 0, & \lambda \le 0,\\ E|W_\lambda - W_0|^2 = \sigma^2\lambda, & 0 \le \lambda \le T,\\ \sigma^2 T, & \lambda \ge T. \end{cases}$$
Example 22. Let $\widetilde{W}_\lambda$ be a transformation of the Wiener process to the interval [−π, π] given by $\widetilde{W}_\lambda = W_{(\lambda+\pi)/2\pi}$, λ ∈ [−π, π].
The process $\{\widetilde{W}_\lambda,\ \lambda \in [-\pi, \pi]\}$ is a Gaussian process with orthogonal increments and the associated distribution function
$$F(\lambda) = \begin{cases} 0, & \lambda \le -\pi,\\ \dfrac{\sigma^2}{2\pi}(\lambda + \pi), & \lambda \in [-\pi, \pi],\\ \sigma^2, & \lambda \ge \pi. \end{cases}$$
Properties of L2 (F ) :
• The norm in L2 (F ) is given by $\|f\| = \Big[\int_a^b |f(\lambda)|^2\, dF(\lambda)\Big]^{1/2}$;
• fn → f in L2 (F ) as n → ∞, if ‖fn − f‖ → 0, i.e.,
$$\int_a^b |f_n(\lambda) - f(\lambda)|^2\, dF(\lambda) \to 0, \qquad n \to \infty;$$
where JA (y) = 1 for y ∈ A and JA (y) = 0 otherwise is the indicator function of a set A, c1 , . . . , cn are complex-valued constants, ck ≠ ck+1 , 1 ≤ k ≤ n − 1. We define
$$\int_{(a,b]} f(\lambda)\, dZ(\lambda) := \sum_{k=1}^{n} c_k\big(Z_{\lambda_k} - Z_{\lambda_{k-1}}\big), \tag{10}$$
Theorem 25. Let {Zλ , λ ∈ [a, b]} be a centered mean square right-continuous process
with orthogonal increments and the associated distribution function F , let f , g be simple
functions in L2 (F ), α, β complex-valued constants. Then
1. $E\int_a^b f(\lambda)\, dZ(\lambda) = 0$.
2. $\int_a^b [\alpha f(\lambda) + \beta g(\lambda)]\, dZ(\lambda) = \alpha\int_a^b f(\lambda)\, dZ(\lambda) + \beta\int_a^b g(\lambda)\, dZ(\lambda)$.
3. $E\Big[\int_a^b f(\lambda)\, dZ(\lambda)\, \overline{\int_a^b g(\lambda)\, dZ(\lambda)}\Big] = \int_a^b f(\lambda)\overline{g(\lambda)}\, dF(\lambda)$.
Proof. 1. Let $f(\lambda) = \sum_{k=1}^{n} c_k J_{(\lambda_{k-1},\lambda_k]}(\lambda)$. Then
$$E\int_a^b f(\lambda)\, dZ(\lambda) = E\Big[\sum_{k=1}^{n} c_k\big(Z_{\lambda_k} - Z_{\lambda_{k-1}}\big)\Big] = \sum_{k=1}^{n} c_k\, E\big(Z_{\lambda_k} - Z_{\lambda_{k-1}}\big) = 0,$$
2. W. l. o. g., let
$$f(\lambda) = \sum_{k=1}^{n} c_k J_{(\lambda_{k-1},\lambda_k]}(\lambda), \qquad g(\lambda) = \sum_{k=1}^{n} d_k J_{(\lambda_{k-1},\lambda_k]}(\lambda).$$
Then
$$\int_a^b [\alpha f(\lambda) + \beta g(\lambda)]\, dZ(\lambda) = \sum_{k=1}^{n}(\alpha c_k + \beta d_k)\big(Z_{\lambda_k} - Z_{\lambda_{k-1}}\big) = \alpha\sum_{k=1}^{n} c_k\big(Z_{\lambda_k} - Z_{\lambda_{k-1}}\big) + \beta\sum_{k=1}^{n} d_k\big(Z_{\lambda_k} - Z_{\lambda_{k-1}}\big) = \alpha\int_a^b f(\lambda)\, dZ(\lambda) + \beta\int_a^b g(\lambda)\, dZ(\lambda).$$
3. Let
$$f(\lambda) = \sum_{k=1}^{n} c_k J_{(\lambda_{k-1},\lambda_k]}(\lambda), \qquad g(\lambda) = \sum_{k=1}^{n} d_k J_{(\lambda_{k-1},\lambda_k]}(\lambda).$$
Then
$$E\Big[\int_a^b f(\lambda)\, dZ(\lambda)\,\overline{\int_a^b g(\lambda)\, dZ(\lambda)}\Big] = E\Big[\sum_{k=1}^{n} c_k\big(Z_{\lambda_k} - Z_{\lambda_{k-1}}\big)\,\overline{\sum_{k=1}^{n} d_k\big(Z_{\lambda_k} - Z_{\lambda_{k-1}}\big)}\Big] = \sum_{k=1}^{n} c_k\overline{d_k}\, E\big|Z_{\lambda_k} - Z_{\lambda_{k-1}}\big|^2 = \sum_{k=1}^{n} c_k\overline{d_k}\,\big(F(\lambda_k) - F(\lambda_{k-1})\big) = \int_a^b f(\lambda)\overline{g(\lambda)}\, dF(\lambda).$$
II. Let f ∈ L2 (F ) be a measurable function. The set of simple functions is dense in
L2 (F ) and its closure is L2 (F ) (Rudin, 2003, Theorem 3.13), it means that there exists
a sequence of simple functions fn ∈ L2 (F ) such that fn → f in L2 (F ) as n → ∞.
Integral I(fn ) is defined for simple functions and I(fn ) ∈ L2 (Ω, A, P ). The sequence {I(fn )} is a Cauchy sequence in L2 (Ω, A, P ) because
$$E|I(f_n) - I(f_m)|^2 = \int_a^b |f_n(\lambda) - f_m(\lambda)|^2\, dF(\lambda) \to 0$$
as m, n → ∞, since fn → f in L2 (F ).
Since {I(fn )} is a Cauchy sequence in L2 (Ω, A, P ), it has a mean square limit
$$I(f) = \underset{n\to\infty}{\text{l.i.m.}}\; I(f_n) := \int_a^b f(\lambda)\, dZ(\lambda), \tag{11}$$
which is called the integral of the function f with respect to the process with orthogonal increments, or the stochastic integral.
Notice that I(f ) does not depend on the choice of the sequence {fn }. Let f ∈ L2 (F ) and let fn and gn be simple, fn → f and gn → f in L2 (F ). Then I(fn ), I(gn ) have mean square limits I, J, respectively.
Define the sequence {hn } = {f1 , g1 , f2 , g2 , . . . }, which is simple and hn → f in L2 (F ). Then I(hn ) → K in mean square. Since the selected subsequences {I(fn )} and {I(gn )} have mean square limits, I ≡ J ≡ K.
Theorem 26. Let {Zλ , λ ∈ [a, b]} be a centered right-mean square continuous process
with orthogonal increments and the associated distribution function F . Then integral
(11) has the following properties.
1. Let f ∈ L2 (F ). Then $EI(f) = E\int_a^b f(\lambda)\, dZ(\lambda) = 0$.
3. Let f, g ∈ L2 (F ). Then
$$E\, I(f)\overline{I(g)} = \int_a^b f(\lambda)\overline{g(\lambda)}\, dF(\lambda). \tag{12}$$
4. Let {fn , n ∈ N} and f be functions in L2 (F ) such that fn → f in L2 (F ). Then, as n → ∞, I(fn ) → I(f ) in mean square.
We have:
• h = αf + βg
From the continuity of the inner product in L2 (Ω, A, P ) we have
$$\int_{-\infty}^{\infty} f(\lambda)\, dZ(\lambda) := \underset{a \to -\infty,\ b \to \infty}{\text{l.i.m.}}\int_a^b f(\lambda)\, dZ(\lambda).$$
Theorem 27. Let
$$X_t = \int_{-\pi}^{\pi} e^{it\lambda}\, dZ(\lambda), \qquad t \in \mathbb{Z},$$
where {Zλ , λ ∈ [−π, π]} is a centered right-mean square continuous process with orthogonal increments on [−π, π] and associated distribution function F . Then {Xt , t ∈ Z} is a centered weakly stationary sequence with the spectral distribution function F .
Proof. The associated distribution function F of the process {Zλ , λ ∈ [−π, π]} is bounded,
non-decreasing and right-continuous, F (λ) = 0 for λ ≤ −π, F (λ) = F (π) < ∞ for λ ≥ π.
For t ∈ Z define the function et by
$$e_t(\lambda) = e^{it\lambda}, \qquad -\pi \le \lambda \le \pi.$$
Then
$$\int_{-\pi}^{\pi} |e_t(\lambda)|^2\, dF(\lambda) = \int_{-\pi}^{\pi} |e^{it\lambda}|^2\, dF(\lambda) = F(\pi) - F(-\pi) < \infty,$$
which means that et ∈ L2 (F ) and $X_t = \int_{-\pi}^{\pi} e^{it\lambda}\, dZ(\lambda)$ is a well defined random variable.
According to Theorem 26 we have
1. $EX_t = E\int_{-\pi}^{\pi} e^{it\lambda}\, dZ(\lambda) = 0$ for any t ∈ Z.
2.
$$E|X_t|^2 = E\Big|\int_{-\pi}^{\pi} e^{it\lambda}\, dZ(\lambda)\Big|^2 = E\Big[\int_{-\pi}^{\pi} e^{it\lambda}\, dZ(\lambda)\,\overline{\int_{-\pi}^{\pi} e^{it\lambda}\, dZ(\lambda)}\Big] = \int_{-\pi}^{\pi} |e^{it\lambda}|^2\, dF(\lambda) = \int_{-\pi}^{\pi} dF(\lambda) < \infty.$$
3.
$$\operatorname{cov}(X_{t+h}, X_t) = E\Big[\int_{-\pi}^{\pi} e^{i(t+h)\lambda}\, dZ(\lambda)\,\overline{\int_{-\pi}^{\pi} e^{it\lambda}\, dZ(\lambda)}\Big] = \int_{-\pi}^{\pi} e^{ih\lambda}\, dF(\lambda) := R(h).$$
• function F has the same properties as the spectral distribution function (see spec-
tral decomposition of the autocovariance function, Theorem 19)
• from the uniqueness of the spectral decomposition (2) it follows that F is the
spectral distribution function of the sequence {Xt , t ∈ Z}.
Example 23. Let $\widetilde{W}_\lambda$ be a transformation of the Wiener process to the interval [−π, π] given by $\widetilde{W}_\lambda = W_{(\lambda+\pi)/2\pi}$, λ ∈ [−π, π]. Then the random variables
$$X_t = \int_{-\pi}^{\pi} e^{it\lambda}\, d\widetilde{W}(\lambda), \qquad t \in \mathbb{Z},$$
are centered, uncorrelated with the same variance σ 2 ; the sequence {Xt , t ∈ Z} is the
Gaussian white noise, see also Examples 16 and 22.
Example 24. Consider a sequence of functions {ftk , t ∈ Z} on [−π, π] defined by
$$f_{tk}(\lambda) = \sum_{j=1}^{k} e^{it\lambda_j} J_{(\lambda_{j-1},\lambda_j]}(\lambda),$$
where −π = λ0 < λ1 < · · · < λk = π and k ∈ N is given. Let {Zλ , λ ∈ [−π, π]} be
a centered right-mean square continuous process with orthogonal increments on [−π, π]
with the associated distribution function F . We can see that ftk are simple functions in
L2 (F ), thus we can compute
$$X_{tk} := \int_{-\pi}^{\pi} f_{tk}(\lambda)\, dZ(\lambda) = \sum_{j=1}^{k} e^{it\lambda_j}\big(Z_{\lambda_j} - Z_{\lambda_{j-1}}\big) = \sum_{j=1}^{k} e^{it\lambda_j}\widetilde{Z}_j.$$
Here $\widetilde{Z}_j = Z_{\lambda_j} - Z_{\lambda_{j-1}}$, j = 1, . . . , k, are uncorrelated random variables with zero mean and the variance $E|\widetilde{Z}_j|^2 = E|Z_{\lambda_j} - Z_{\lambda_{j-1}}|^2 = F(\lambda_j) - F(\lambda_{j-1}) := \sigma_j^2$. Then we have
$$EX_{tk} = 0, \qquad E\, X_{(t+h),k}\overline{X_{tk}} = \sum_{j=1}^{k} e^{ih\lambda_j}\sigma_j^2 := R_k(h).$$
We see that {Xtk , t ∈ Z} is stationary and its autocovariance function has the spectral decomposition
$$R_k(h) = \int_{-\pi}^{\pi} e^{ih\lambda}\, dF_{X_k}(\lambda),$$
where FXk is the spectral distribution function of {Xtk , t ∈ Z}; it has jumps at the points λj such that FXk (λj ) − FXk (λj−1 ) = σj². On the other hand, σj² = F (λj ) − F (λj−1 ). Since F (−π) = FXk (−π) = 0, F equals FXk at least at the points λj , j = 0, 1, . . . , k.
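A simulation sketch of Example 24 (the number of frequencies k, the distribution function F(λ) = (λ + π)/(2π), and the Gaussian choice of the increments are my own assumptions): uncorrelated complex increments with E|Z̃_j|² = σ_j² are generated and the empirical autocovariance of X_{tk} is compared with R_k(h) = Σ_j e^{ihλ_j} σ_j².

```python
import numpy as np

rng = np.random.default_rng(2)
k = 8
lam = np.linspace(-np.pi, np.pi, k + 1)                # -pi = lambda_0 < ... < lambda_k = pi
sigma2 = np.diff((lam + np.pi) / (2 * np.pi))          # F(lambda_j) - F(lambda_{j-1})

n_rep = 20000
# complex Z_j with mean 0 and E|Z_j|^2 = sigma_j^2 (independent real/imaginary parts)
Z = (rng.normal(size=(n_rep, k)) + 1j * rng.normal(size=(n_rep, k))) * np.sqrt(sigma2 / 2)

def X(t):
    return (np.exp(1j * t * lam[1:]) * Z).sum(axis=1)  # X_t for all replications

h, t0 = 2, 5
emp = np.mean(X(t0 + h) * np.conj(X(t0)))              # empirical E X_{t+h} conj(X_t)
theo = (np.exp(1j * h * lam[1:]) * sigma2).sum()       # R_k(h)
print(emp, theo)
```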
Theorem 28. Let {Xt , t ∈ Z} be a centered weakly stationary sequence with spec-
tral distribution function F . Then there exists a centered orthogonal increment process
{Zλ , λ ∈ [−π, π]} such that
$$X_t = \int_{-\pi}^{\pi} e^{it\lambda}\, dZ(\lambda), \qquad t \in \mathbb{Z}, \tag{14}$$
and
$$E|Z(\lambda) - Z(-\pi)|^2 = F(\lambda), \qquad -\pi \le \lambda \le \pi.$$
Proof. Brockwell and Davis (1991), Theorem 4.8.2 or Prášková (2016), Theorem 4.4.
Relation (14) is called the spectral decomposition of a stationary random sequence.
Remark 7. Theorem 28 says that any random variable of a centered stationary random sequence can be approximated (in the mean square limit) by a sum $\sum_j e^{it\lambda_j} Y_j$ of uncorrelated random variables Yj , the variance of which is the increment of the spectral distribution function at the points (frequencies) λj−1 and λj .
Theorem 29. Let {Xt , t ∈ R} be a centered weakly stationary mean square continuous
process. Then there exists an orthogonal increment process {Zλ , λ ∈ R} such that
$$X_t = \int_{-\infty}^{\infty} e^{it\lambda}\, dZ(\lambda), \qquad t \in \mathbb{R}, \tag{15}$$
and the associated distribution function of the process {Zλ , λ ∈ R} is the spectral distri-
bution function of the process {Xt , t ∈ R}.
Theorem 30. Let {Xt , t ∈ Z} be a centered stationary sequence with a spectral distri-
bution function F. Let H{Xt , t ∈ Z} be the Hilbert space generated by {Xt , t ∈ Z}. Then
U ∈ H{Xt , t ∈ Z} if and only if
$$U = \int_{-\pi}^{\pi} \varphi(\lambda)\, dZ(\lambda), \tag{16}$$
where ϕ ∈ L2 (F ) and {Zλ , λ ∈ [−π, π]} is the orthogonal increment process as given in
the spectral decomposition of the sequence {Xt , t ∈ Z}.
44
where $\varphi(\lambda) = \sum_{j=1}^{N} c_j e^{it_j\lambda}$. Obviously, ϕ is a finite linear combination of functions from L2 (F ), thus ϕ ∈ L2 (F ).
b) Let U = l.i.m.n→∞ Un , Un ∈ M{Xt , t ∈ Z}. According to a),
$$U_n = \int_{-\pi}^{\pi}\varphi_n(\lambda)\, dZ(\lambda), \qquad \varphi_n \in L^2(F).$$
Here, {Un } is a Cauchy sequence in H{Xt , t ∈ Z} (it is convergent there), thus {ϕn } is a Cauchy sequence in L2 (F ), since
$$E|U_m - U_n|^2 = E\Big|\int_{-\pi}^{\pi}\varphi_m(\lambda)\, dZ(\lambda) - \int_{-\pi}^{\pi}\varphi_n(\lambda)\, dZ(\lambda)\Big|^2 = E\Big|\int_{-\pi}^{\pi}[\varphi_m(\lambda) - \varphi_n(\lambda)]\, dZ(\lambda)\Big|^2 = \int_{-\pi}^{\pi}|\varphi_m(\lambda) - \varphi_n(\lambda)|^2\, dF(\lambda).$$
Since L2 (F ) is complete, there exists ϕ ∈ L2 (F ) such that ϕn → ϕ in L2 (F ). According to (13),
$$\int_{-\pi}^{\pi}\varphi(\lambda)\, dZ(\lambda) = \underset{n\to\infty}{\text{l.i.m.}}\int_{-\pi}^{\pi}\varphi_n(\lambda)\, dZ(\lambda),$$
hence
$$U = \int_{-\pi}^{\pi}\varphi(\lambda)\, dZ(\lambda) = \underset{n\to\infty}{\text{l.i.m.}}\int_{-\pi}^{\pi}\Big[\sum_{k=-n}^{n} c_k^{(n)} e^{i\lambda t_k^{(n)}}\Big] dZ(\lambda) = \underset{n\to\infty}{\text{l.i.m.}}\sum_{k=-n}^{n} c_k^{(n)}\int_{-\pi}^{\pi} e^{i\lambda t_k^{(n)}}\, dZ(\lambda) = \underset{n\to\infty}{\text{l.i.m.}}\sum_{k=-n}^{n} c_k^{(n)} X_{t_k^{(n)}} \in H\{X_t, t \in \mathbb{Z}\}.$$
8 Linear models of time series
8.1 White noise
Recall that the white noise sequence WN(0, σ²) is defined as a sequence {Yt , t ∈ Z} of uncorrelated random variables with mean zero and variance 0 < σ² < ∞, the autocovariance function
$$R_Y(t) = \sigma^2\delta(t), \qquad t \in \mathbb{Z},$$
and the spectral density
$$f_Y(\lambda) = \frac{\sigma^2}{2\pi}, \qquad \lambda \in [-\pi, \pi].$$
Moreover,
$$Y_t = \int_{-\pi}^{\pi} e^{it\lambda}\, dZ_Y(\lambda),$$
where ZY = {Zλ , λ ∈ [−π, π]} is a process with orthogonal increments and associated distribution function
$$F(\lambda) = \frac{\sigma^2}{2\pi}(\lambda + \pi), \qquad \lambda \in [-\pi, \pi],$$
which is the same as the spectral distribution function FY (λ) of {Yt , t ∈ Z}.
A moving average sequence of order n is defined by
$$X_t = \sum_{j=0}^{n} b_j Y_{t-j}, \qquad t \in \mathbb{Z}, \tag{17}$$
where b0 , . . . , bn are constants and {Yt , t ∈ Z} is a white noise WN(0, σ²). Notation: MA(n).
Remark 8. In the special case, bi = 1/(n + 1), i = 0, . . . , n.
2. The autocovariance function (computation in time domain): For t ≥ 0,
$$\operatorname{cov}(X_{s+t}, X_s) = E\, X_{s+t} X_s = E\Big[\Big(\sum_{j=0}^{n} b_j Y_{s+t-j}\Big)\Big(\sum_{k=0}^{n} b_k Y_{s-k}\Big)\Big] = \sum_{j=0}^{n}\sum_{k=0}^{n} b_j b_k\, E\big(Y_{s+t-j} Y_{s-k}\big) = \sigma^2\sum_{j=0}^{n}\sum_{k=0}^{n} b_j b_k\,\delta(t-j+k) = \sigma^2\sum_{k=0}^{n-t} b_{t+k} b_k, \quad 0 \le t \le n,$$
and $\operatorname{cov}(X_{s+t}, X_s) = 0$ for t > n.
For t ≤ 0 we proceed analogously. Since cov(Xs+t , Xs ) depends on t, only, we can
conclude that the sequence is weakly stationary.
3. Spectral decomposition. By using the spectral decomposition of the white noise we obtain
$$X_t = \sum_{j=0}^{n} b_j Y_{t-j} = \sum_{j=0}^{n} b_j\int_{-\pi}^{\pi} e^{i(t-j)\lambda}\, dZ_Y(\lambda) = \int_{-\pi}^{\pi}\Big[\sum_{j=0}^{n} b_j e^{i(t-j)\lambda}\Big] dZ_Y(\lambda) = \int_{-\pi}^{\pi} e^{it\lambda}\Big[\sum_{j=0}^{n} b_j e^{-ij\lambda}\Big] dZ_Y(\lambda) = \int_{-\pi}^{\pi} e^{it\lambda} g(\lambda)\, dZ_Y(\lambda),$$
where $g(\lambda) = \sum_{j=0}^{n} b_j e^{-ij\lambda} \in L^2(F)$. From the properties of the stochastic integral we again get EXt = 0, and for the autocovariance function (in spectral domain) we have
$$E\, X_{s+t}\overline{X_s} = E\Big[\int_{-\pi}^{\pi} e^{i(s+t)\lambda} g(\lambda)\, dZ_Y(\lambda)\,\overline{\int_{-\pi}^{\pi} e^{is\lambda} g(\lambda)\, dZ_Y(\lambda)}\Big] = \int_{-\pi}^{\pi} e^{it\lambda}|g(\lambda)|^2\, dF_Y(\lambda) = \int_{-\pi}^{\pi} e^{it\lambda}|g(\lambda)|^2\,\frac{\sigma^2}{2\pi}\, d\lambda,$$
that is again a function of t which confirms the weak stationarity. Due to the unique-
ness of the spectral decomposition of the autocovariance function (Theorem 19) we can
conclude that the function $|g(\lambda)|^2\,\frac{\sigma^2}{2\pi}$ is the spectral density of the sequence {Xt , t ∈ Z}.
We have just proved the following theorem.
Theorem 31. The moving average sequence {Xt , t ∈ Z} of order n defined by (17) is
centered and weakly stationary, with the autocovariance function
$$R_X(t) = \sigma^2\sum_{k=0}^{n-t} b_{k+t} b_k, \quad 0 \le t \le n, \tag{18}$$
$$R_X(t) = R_X(-t), \quad -n \le t \le 0, \qquad R_X(t) = 0, \quad |t| > n.$$
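A minimal simulation check of formula (18) for an MA(2) sequence (the coefficients b = (1, 0.5, −0.3), σ = 1 and the sample size are hypothetical choices of mine): the empirical autocovariances are compared with the finite sums in (18).

```python
import numpy as np

rng = np.random.default_rng(3)
b = np.array([1.0, 0.5, -0.3])      # b_0, b_1, b_2, so n = 2
sigma = 1.0
N = 200000

Y = rng.normal(0.0, sigma, size=N + len(b))
# X_t = sum_j b_j Y_{t-j}, built by shifted slices of Y
X = sum(b[j] * Y[len(b) - j:len(b) - j + N] for j in range(len(b)))

for t in range(4):
    emp = np.mean(X[t:] * X[:N - t])
    theo = sigma ** 2 * sum(b[t + k] * b[k] for k in range(len(b) - t)) if t < len(b) else 0.0
    print(t, round(emp, 3), round(theo, 3))
```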
2. If $\sum_{j=0}^{\infty} |c_j| < \infty$, the series $\sum_{j=0}^{\infty} c_j Y_{t-j}$ converges for every t ∈ Z absolutely with probability one.
[Figure: autocorrelation function r(t) (left) and spectral density f(λ) (right).]
Proof. 1. We will show that $\{\sum_{j=0}^{n} c_j Y_{t-j},\ n \in \mathbb{N}\}$ is a Cauchy sequence in L2 (Ω, A, P ) for every t ∈ Z.
W. l. o. g., assume that m < n. Since Yk are uncorrelated with a constant variance σ² we easily get
$$E\Big|\sum_{j=0}^{m} c_j Y_{t-j} - \sum_{k=0}^{n} c_k Y_{t-k}\Big|^2 = E\Big|\sum_{j=m+1}^{n} c_j Y_{t-j}\Big|^2 = \sum_{j=m+1}^{n}|c_j|^2\, E|Y_{t-j}|^2 = \sigma^2\sum_{j=m+1}^{n}|c_j|^2 \to 0$$
as m, n → ∞, which means that there exists a mean square limit of the sequence $\{\sum_{j=0}^{n} c_j Y_{t-j}\}$, which we denote by $\sum_{j=0}^{\infty} c_j Y_{t-j}$.
2. Since $E|Y_{t-j}| \le \big(E|Y_{t-j}|^2\big)^{1/2} = \sqrt{\sigma^2} < \infty$, we can see that
$$\sum_{j=0}^{\infty} E|c_j Y_{t-j}| = \sum_{j=0}^{\infty}|c_j|\, E|Y_{t-j}| \le \sigma\sum_{j=0}^{\infty}|c_j| < \infty,$$
and thus $\sum_{j=0}^{\infty}|c_j Y_{t-j}|$ converges almost surely (Rudin, 2003, Theorem 1.38).
Theorem 33. Let {Xt , t ∈ Z} be a weakly stationary centered random sequence with an autocovariance function R, and let {cj , j ∈ N0 } be a sequence of complex-valued constants such that $\sum_{j=0}^{\infty}|c_j| < \infty$. Then for any t ∈ Z the series $\sum_{j=0}^{\infty} c_j X_{t-j}$ converges in mean square and also absolutely with probability one.
Proof. 1. For m < n we have
$$E\Big|\sum_{j=m+1}^{n} c_j X_{t-j}\Big|^2 \le E\Big(\sum_{j=m+1}^{n}|c_j||X_{t-j}|\Big)^2 = \sum_{j=m+1}^{n}\sum_{k=m+1}^{n}|c_j||c_k|\, E|X_{t-j}||X_{t-k}|.$$
= RX (−t), t ≤ 0.
The spectral density fX of the sequence (21) exists and equals
$$f_X(\lambda) = \frac{\sigma^2}{2\pi}\Big|\sum_{k=0}^{\infty} c_k e^{-ik\lambda}\Big|^2, \qquad \lambda \in [-\pi, \pi]. \tag{23}$$
We have proved (22) and the stationarity of the sequence (21). Further, notice that the sequence $\{X_t^{(n)}, t \in \mathbb{Z}\}$ has the spectral decomposition
$$X_t^{(n)} = \int_{-\pi}^{\pi} e^{it\lambda} g_n(\lambda)\, dZ_Y(\lambda), \qquad g_n(\lambda) = \sum_{j=0}^{n} c_j e^{-ij\lambda} \in L^2(F_Y).$$
If we denote $g(\lambda) = \sum_{j=0}^{\infty} c_j e^{-ij\lambda}$, we have
$$\int_{-\pi}^{\pi}|g_n(\lambda) - g(\lambda)|^2\, dF_Y(\lambda) = \int_{-\pi}^{\pi}\Big|\sum_{j=n+1}^{\infty} c_j e^{-ij\lambda}\Big|^2\, dF_Y(\lambda) \le \int_{-\pi}^{\pi}\Big(\sum_{j=n+1}^{\infty}|c_j|\Big)^2 f_Y(\lambda)\, d\lambda = \sigma^2\Big(\sum_{j=n+1}^{\infty}|c_j|\Big)^2 \to 0.$$
Hence $\int_{-\pi}^{\pi} e^{it\lambda} g_n(\lambda)\, dZ_Y(\lambda) \to \int_{-\pi}^{\pi} e^{it\lambda} g(\lambda)\, dZ_Y(\lambda)$ in mean square as n → ∞ and simultaneously, $X_t^{(n)} \to X_t$ in mean square, which means that
$$X_t = \int_{-\pi}^{\pi} e^{it\lambda} g(\lambda)\, dZ_Y(\lambda).$$
Further, the computation
$$E\, X_{s+t}\overline{X_s} = E\Big[\int_{-\pi}^{\pi} e^{i(s+t)\lambda} g(\lambda)\, dZ_Y(\lambda)\,\overline{\int_{-\pi}^{\pi} e^{is\lambda} g(\lambda)\, dZ_Y(\lambda)}\Big] = \int_{-\pi}^{\pi} e^{it\lambda}|g(\lambda)|^2\, dF_Y(\lambda) = \int_{-\pi}^{\pi} e^{it\lambda}|g(\lambda)|^2\,\frac{\sigma^2}{2\pi}\, d\lambda$$
results in the spectral decomposition of the autocovariance function of the sequence (21). The function $\frac{\sigma^2}{2\pi}|g(\lambda)|^2$ is the spectral density of the process (21).
and Yt ∼ WN(0, σ²).
The process is centered, weakly stationary, with the autocovariance function
$$R_X(t) = \sigma^2\frac{\varphi^t}{1 - \varphi^2}, \quad t \ge 0, \qquad R_X(t) = R_X(-t), \quad t \le 0.$$
The sequence {Xt , t ∈ Z} defined by (24) is called the autoregressive sequence of order one, AR(1).
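A simulation sketch for the AR(1) sequence (φ = 0.8, σ = 1 and the burn-in length are hypothetical choices of mine): the recursion X_t = φX_{t−1} + Y_t is iterated and the sample autocovariances are compared with R_X(t) = σ²φ^{|t|}/(1 − φ²).

```python
import numpy as np

rng = np.random.default_rng(4)
phi, sigma, N, burn = 0.8, 1.0, 200000, 500

Y = rng.normal(0.0, sigma, size=N + burn)
X = np.zeros(N + burn)
for t in range(1, N + burn):
    X[t] = phi * X[t - 1] + Y[t]       # AR(1) recursion
X = X[burn:]                            # discard the burn-in

for t in range(4):
    emp = np.mean(X[t:] * X[:N - t])
    theo = sigma ** 2 * phi ** t / (1 - phi ** 2)
    print(t, round(emp, 3), round(theo, 3))
```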
8.4 Autoregressive sequences
Definition 33. A random sequence {Xt , t ∈ Z} is said to be an autoregressive sequence of order n, notation AR(n), if it satisfies the equation
Xt = ϕ1 Xt−1 + · · · + ϕn Xt−n + Yt , t ∈ Z, (25)
where ϕ1 , . . . , ϕn are real-valued constants, ϕn ≠ 0, and {Yt , t ∈ Z} is a white noise.
where a0 = 1.
We want to express the AR(n) sequence as a causal linear process. First, we define
a backward-shift operator by
BXt = Xt−1 , B 0 Xt = Xt , B k Xt = B k−1 (BXt ) = Xt−k , k ∈ Z.
Using this operator, relation (26) can be written shortly in the form a(B)Xt = Yt , where a(B) is a polynomial operator, formally identical with the algebraic polynomial a(z) = 1 + a1 z + · · · + an z^n. Similarly, let {ck , k ∈ Z} be a sequence of constants such that $\sum_{k=-\infty}^{\infty}|c_k| < \infty$. The series
$$c(z) = \sum_{k=-\infty}^{\infty} c_k z^k$$
is absolutely convergent at least on the unit circle |z| = 1 and defines the operator
$$c(B) = \sum_{k=-\infty}^{\infty} c_k B^k. \tag{27}$$
The autocovariance function of this sequence is given by (22) and the spectral density is
$$f_X(\lambda) = \frac{\sigma^2}{2\pi}\,\frac{1}{\big|\sum_{j=0}^{n} a_j e^{-ij\lambda}\big|^2}, \qquad \lambda \in [-\pi, \pi], \tag{28}$$
where a0 = 1.
We have proved that the sequence {Xt , t ∈ Z} is the causal linear process that satisfies Theorem 34. It is centered and weakly stationary, with the autocovariance function (22) and the spectral density
$$f_X(\lambda) = \frac{\sigma^2}{2\pi}\Big|\sum_{k=0}^{\infty} c_k e^{-ik\lambda}\Big|^2 = \frac{\sigma^2}{2\pi}\big|c(e^{-i\lambda})\big|^2 = \frac{\sigma^2}{2\pi}\,\frac{1}{|a(e^{-i\lambda})|^2} = \frac{\sigma^2}{2\pi}\,\frac{1}{\big|\sum_{j=0}^{n} a_j e^{-ij\lambda}\big|^2}.$$
If all the roots of a(z) are simple, we can obtain the coefficients cj in the representation (29) by using the decomposition into partial fractions:
$$c(z) = \frac{1}{a(z)} = \frac{A_1}{z_1 - z} + \frac{A_2}{z_2 - z} + \cdots + \frac{A_n}{z_n - z},$$
where A1 , . . . , An are constants that can be determined. For |z| ≤ 1 and |zj | > 1,
$$\frac{A_j}{z_j - z} = \frac{A_j}{z_j}\cdot\frac{1}{1 - \frac{z}{z_j}} = \frac{A_j}{z_j}\sum_{k=0}^{\infty}\Big(\frac{z}{z_j}\Big)^k,$$
$$c(z) = \sum_{j=1}^{n}\frac{A_j}{z_j - z} = \sum_{j=1}^{n}\frac{A_j}{z_j}\sum_{k=0}^{\infty}\Big(\frac{z}{z_j}\Big)^k = \sum_{k=0}^{\infty}\Big[\sum_{j=1}^{n}\frac{A_j}{z_j^{k+1}}\Big] z^k = \sum_{k=0}^{\infty} c_k z^k,$$
$$c_k = \sum_{j=1}^{n}\frac{A_j}{z_j^{k+1}}.$$
that satisfies conditions of Theorem 35, with real-valued coefficients a1 , . . . , an and with
{Yt , t ∈ Z} being the real white noise WN(0, σ 2 ). Since the sequence {Xt , t ∈ Z} is a
real-valued causal linear process and Yt are uncorrelated, it can be easily proved that
EXs Yt = hXs , Yt i = 0 for s < t.
thus,
EXt Yt = σ 2 .
Multiplying (30) by Xt−k for k ≥ 0 and taking the expectation we get a system of
equations
EXt Xt−k + a1 EXt−1 Xt−k + · · · + an EXt−n Xt−k = EYt Xt−k ,
or, if we put RX (t) = R(t),
Solution: Dividing (32) for k ≥ 1 by R(0) we get equations for the autocorrelation
function r(t) = R(t)/R(0).
• Values r(1), . . . , r(n − 1) together with r(0) = 1 serve as initial conditions to solve
the system of difference equations
In this way we get the solution r(t) for t ≥ 0. For a real-valued sequence, r(t) = r(−t).
If we insert R(k) = r(k)R(0) into (31) we get the equation for R(0), thus
$$R(0) = \frac{\sigma^2}{1 + a_1 r(1) + \cdots + a_n r(n)}. \tag{33}$$
Remark 12. If zi , i = 1, . . . , n, are the roots of the polynomial a(z) = 1+a1 z +· · ·+an z n ,
then λi = zi−1 , i = 1, . . . , n, are the roots of the polynomial L(z) = z n + a1 z n−1 + · · · + an .
The AR(n) sequence is a causal linear process, if all the roots of the polynomial L(z)
are inside the unit circle.
The polynomial a(z) = 1 + az has the root −1/a, which lies outside the unit circle (|a| < 1); it means that {Xt , t ∈ Z} is a weakly stationary causal linear process. The Yule-Walker equations for the autocovariance function RX (t) = R(t) are now
$$R(0) + aR(1) = \sigma^2, \qquad R(k) + aR(k-1) = 0, \quad k \ge 1.$$
A general solution to the difference equation for the autocorrelation function is r(k) = c(−a)^k, the initial condition is r(0) = 1 = c. The value R(0) can be determined from formula (33):
$$R(0) = \frac{\sigma^2}{1 + a\, r(1)} = \frac{\sigma^2}{1 - a^2}.$$
Example 27. Consider the AR(2) sequence
$$X_t - \tfrac{3}{4} X_{t-1} + \tfrac{1}{8} X_{t-2} = Y_t, \qquad Y_t \sim \text{WN}(0, \sigma^2).$$
The polynomial $a(z) = 1 - \tfrac{3}{4}z + \tfrac{1}{8}z^2$ has roots z1 = 2, z2 = 4, so {Xt , t ∈ Z} is a causal linear process that is weakly stationary. The Yule-Walker equations are
$$R(0) - \tfrac{3}{4}R(1) + \tfrac{1}{8}R(2) = \sigma^2, \qquad R(k) - \tfrac{3}{4}R(k-1) + \tfrac{1}{8}R(k-2) = 0, \quad k \ge 1.$$
The equations for the autocorrelation function are of the form
$$r(1) - \tfrac{3}{4} + \tfrac{1}{8}r(1) = 0, \qquad r(k) - \tfrac{3}{4}r(k-1) + \tfrac{1}{8}r(k-2) = 0, \quad k \ge 2. \tag{34}$$
Solving the first equation we get r(1) = 2/3. For k ≥ 2 we solve the second order difference equation with initial conditions r(0) = 1, r(1) = 2/3.
The characteristic equation $L(\lambda) = \lambda^2 - \tfrac{3}{4}\lambda + \tfrac{1}{8} = 0$ has two different real-valued roots λ1 = 1/2, λ2 = 1/4. A general solution of the difference equation (34) is
$$r(k) = c_1\lambda_1^k + c_2\lambda_2^k = c_1\Big(\frac{1}{2}\Big)^k + c_2\Big(\frac{1}{4}\Big)^k.$$
The constants c1 , c2 satisfy
$$c_1 + c_2 = r(0), \qquad \lambda_1 c_1 + \lambda_2 c_2 = r(1),$$
so that c1 = 5/3, c2 = −2/3, and
$$r(k) = \frac{5}{3}\Big(\frac{1}{2}\Big)^k - \frac{2}{3}\Big(\frac{1}{4}\Big)^k, \quad k = 0, 1, \dots, \qquad r(k) = r(-k), \quad k = -1, -2, \dots$$
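A short numerical check of Example 27 (only the range of lags is my choice): the closed-form autocorrelation is verified against the Yule-Walker recursion and its initial conditions.

```python
import numpy as np

# r(k) = (5/3)(1/2)^k - (2/3)(1/4)^k should satisfy
# r(k) - (3/4) r(k-1) + (1/8) r(k-2) = 0 for k >= 2, with r(0) = 1, r(1) = 2/3.
k = np.arange(10)
r = (5.0 / 3.0) * 0.5 ** k - (2.0 / 3.0) * 0.25 ** k

print(r[0], r[1])                                                   # 1.0 and 2/3
print(np.allclose(r[2:] - 0.75 * r[1:-1] + 0.125 * r[:-2], 0.0))    # recursion holds
```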
The model called ARMA(m, n) is a mixed model of autoregressive and moving average
sequences.
Consider polynomials a(z) = 1 + a1 z + · · · + am z^m and b(z) = 1 + b1 z + · · · + bn z^n. Then we can write the ARMA(m, n) model in the form
$$a(B)X_t = b(B)Y_t, \qquad t \in \mathbb{Z}, \tag{36}$$
where a0 = 1, b0 = 1.
Proof. We proceed analogously as in the proof of Theorem 35. Since all the roots of the polynomial a(z) lie outside the unit circle, for |z| ≤ 1 it holds
$$\frac{1}{a(z)} = h(z) = \sum_{j=0}^{\infty} h_j z^j, \qquad \text{where } \sum_{j=0}^{\infty}|h_j| < \infty.$$
Thus, h(z)a(z) = 1 for |z| ≤ 1 and if we apply the operator h(B) to both sides of equation (36), we have
$$h(B)a(B)X_t = X_t = h(B)b(B)Y_t = c(B)Y_t,$$
where c(z) = b(z)/a(z) and $\sum_{j=0}^{\infty}|c_j| < \infty$.
The sequence {Xt , t ∈ Z} is the causal linear process with the autocovariance function (22) and the spectral density
$$f_X(\lambda) = \frac{\sigma^2}{2\pi}\Big|\sum_{j=0}^{\infty} c_j e^{-ij\lambda}\Big|^2 = \frac{\sigma^2}{2\pi}\big|c(e^{-i\lambda})\big|^2 = \frac{\sigma^2}{2\pi}\Big|\frac{b(e^{-i\lambda})}{a(e^{-i\lambda})}\Big|^2 = \frac{\sigma^2}{2\pi}\,\frac{\big|\sum_{j=0}^{n} b_j e^{-ij\lambda}\big|^2}{\big|\sum_{k=0}^{m} a_k e^{-ik\lambda}\big|^2}.$$
Remark 13. If the polynomials a(z) and b(z) have common roots, the polynomial c(z) = b(z)/a(z) defines an ARMA(p, q) process with p < m, q < n.
Xt + aXt−1 = Yt + bYt−1 , t ∈ Z,
c0 = 1, cj = (−a)j−1 (b − a), j ≥ 1.
Computation of R(0):
$$R(0) = \sigma^2\sum_{j=0}^{\infty} c_j^2 = \sigma^2\Big[1 + \sum_{j=1}^{\infty}\big(a^{j-1}(b-a)\big)^2\Big] = \sigma^2\Big[1 + \frac{(b-a)^2}{1-a^2}\Big] = \sigma^2\,\frac{1 - 2ab + b^2}{1 - a^2}.$$
For k ≥ 1,
$$R(k) = \sigma^2\sum_{j=0}^{\infty} c_j c_{j+k} = \sigma^2\Big[c_0 c_k + \sum_{j=1}^{\infty} c_j c_{j+k}\Big] = \sigma^2\Big[(-a)^{k-1}(b-a) + (-a)^k(b-a)^2\sum_{j=0}^{\infty}(-a)^{2j}\Big] = \sigma^2\Big[(-a)^{k-1}(b-a) + (-a)^k\,\frac{(b-a)^2}{1-a^2}\Big] = \sigma^2(-a)^{k-1}(b-a)\,\frac{1 - ab}{1 - a^2} = (-a)^{k-1}R(1).$$
The spectral density is
$$f_X(\lambda) = \frac{\sigma^2}{2\pi}\cdot\frac{|1 + be^{-i\lambda}|^2}{|1 + ae^{-i\lambda}|^2} = \frac{\sigma^2}{2\pi}\cdot\frac{1 + 2b\cos\lambda + b^2}{1 + 2a\cos\lambda + a^2}, \qquad \lambda \in [-\pi, \pi].$$
For k ≥ max(m, n + 1), (39) is solved as a difference equation with initial conditions that
can be obtained from the system of equations for k < max(m, n + 1).
Xt + aXt−1 = Yt + bYt−1 , t ∈ Z,
The difference equation R(k) + aR(k − 1) = 0 with an initial condition for R(1) has the
solution R(k) = (−a)k−1 R(1), k ≥ 1.
The values of R(1) and R(0) will be computed from the first and the second equations:
$$R(0) = \frac{1}{1-a^2}\,\sigma^2\big(1 - 2ab + b^2\big), \qquad R(1) = \frac{1}{1-a^2}\,\sigma^2(b-a)(1-ab),$$
which is the same result as before.
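A numerical cross-check of the ARMA(1,1) computation (the values a = −0.5, b = 0.4, σ² = 1 are hypothetical, chosen so that the root of 1 + az lies outside the unit circle): R(0) and R(1) are evaluated from the truncated causal expansion c_0 = 1, c_j = (−a)^{j−1}(b − a) and compared with the closed forms above.

```python
import numpy as np

a, b, sigma2, J = -0.5, 0.4, 1.0, 200

c = np.empty(J)
c[0] = 1.0
c[1:] = (-a) ** np.arange(J - 1) * (b - a)        # c_j = (-a)^{j-1} (b - a), j >= 1

R0 = sigma2 * np.sum(c ** 2)                       # sigma^2 * sum c_j^2
R1 = sigma2 * np.sum(c[:-1] * c[1:])               # sigma^2 * sum c_j c_{j+1}
print(R0, sigma2 * (1 - 2 * a * b + b ** 2) / (1 - a ** 2))
print(R1, sigma2 * (b - a) * (1 - a * b) / (1 - a ** 2))
```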
Theorem 37. Let {Xt , t ∈ Z} be the stationary ARMA(m, n) random sequence defined
by (36). Let the polynomials a(z) and b(z) have no common roots and the polynomial
b(z) = 1 + b1 z + · · · + bn z n has all the roots outside the unit circle. Then {Xt , t ∈ Z}
is invertible and
$$Y_t = \sum_{j=0}^{\infty} d_j X_{t-j}, \qquad t \in \mathbb{Z}, \tag{40}$$
Proof. The theorem can be proved analogously as Theorem 36 by inverting the polyno-
mial b(z). The correctness of all operations is guaranteed by Theorem 33 since we assume
that {Xt , t ∈ Z} is stationary.
Remark 14. Let us notice that the equation d(z)b(z) = a(z) with polynomials a(z) = 1 + a1 z + · · · + am z^m and b(z) = 1 + b1 z + · · · + bn z^n, respectively, implies d0 = 1. Relation (40) can be written as
$$X_t + \sum_{j=1}^{\infty} d_j X_{t-j} = Y_t, \qquad t \in \mathbb{Z}. \tag{41}$$
8.6 Linear filters
Definition 36. Let {Yt , t ∈ Z} be a centered weakly stationary sequence. Let {ck , k ∈ Z} be a sequence of (complex-valued) numbers such that $\sum_{j=-\infty}^{\infty}|c_j| < \infty$.
We say that a random sequence {Xt , t ∈ Z} is obtained by filtration of the sequence {Yt , t ∈ Z}, if
$$X_t = \sum_{j=-\infty}^{\infty} c_j Y_{t-j}, \qquad t \in \mathbb{Z}. \tag{42}$$
The sequence {cj , j ∈ Z} is called a time-invariant linear filter. Provided that cj = 0 for all j < 0, we say that the filter {cj , j ∈ Z} is causal.
Theorem 38. Let {Yt , t ∈ Z} be a centered weakly stationary sequence with an autocovariance function RY and spectral density fY , and let {ck , k ∈ Z} be a linear filter such that $\sum_{k=-\infty}^{\infty}|c_k| < \infty$. Then {Xt , t ∈ Z}, where $X_t = \sum_{k=-\infty}^{\infty} c_k Y_{t-k}$, is a centered weakly stationary sequence with the autocovariance function
$$R_X(t) = \sum_{j=-\infty}^{\infty}\sum_{k=-\infty}^{\infty} c_j\overline{c_k}\, R_Y(t - j + k), \qquad t \in \mathbb{Z},$$
and the spectral density $f_X(\lambda) = |\Psi(\lambda)|^2 f_Y(\lambda)$, where
$$\Psi(\lambda) = \sum_{k=-\infty}^{\infty} c_k e^{-ik\lambda}.$$
as n → ∞.
For any t ∈ Z, Yt has the spectral decomposition $Y_t = \int_{-\pi}^{\pi} e^{it\lambda}\, dZ_Y(\lambda)$, where ZY is a process with orthogonal increments and the associated distribution function FY (λ). Thus
$$X_t^{(n)} = \sum_{k=-n}^{n} c_k Y_{t-k} = \sum_{k=-n}^{n} c_k\int_{-\pi}^{\pi} e^{i(t-k)\lambda}\, dZ_Y(\lambda) = \int_{-\pi}^{\pi} e^{it\lambda}\sum_{k=-n}^{n} c_k e^{-ik\lambda}\, dZ_Y(\lambda) = \int_{-\pi}^{\pi} e^{it\lambda} h_n(\lambda)\, dZ_Y(\lambda),$$
where $h_n(\lambda) = \sum_{k=-n}^{n} c_k e^{-ik\lambda}$. For the same reasons as in the proof of Theorem 34, hn converges to a function Ψ in the space L2 (FY ), where $\Psi(\lambda) = \sum_{k=-\infty}^{\infty} c_k e^{-ik\lambda}$, and by Theorem 26,
$$X_t = \underset{n\to\infty}{\text{l.i.m.}}\; X_t^{(n)} = \int_{-\pi}^{\pi} e^{it\lambda}\,\Psi(\lambda)\, dZ_Y(\lambda)$$
for any t ∈ Z.
Since $\{X_t^{(n)}, t \in \mathbb{Z}\}$ is centered, {Xt , t ∈ Z} is also centered, and according to Theorem 12 the autocovariance functions of $\{X_t^{(n)}, t \in \mathbb{Z}\}$ converge to the autocovariance function of {Xt , t ∈ Z}, and so
$$E\, X_{s+t}\overline{X_s} = \lim_{n\to\infty} E\, X_{s+t}^{(n)}\overline{X_s^{(n)}} = \lim_{n\to\infty}\sum_{j=-n}^{n}\sum_{k=-n}^{n} c_j\overline{c_k}\, E\big(Y_{s+t-j}\overline{Y_{s-k}}\big) = \sum_{j=-\infty}^{\infty}\sum_{k=-\infty}^{\infty} c_j\overline{c_k}\, R_Y(t-j+k) := R_X(t),$$
and from the spectral decomposition of the autocovariance function (Theorem 19) it follows that the function
$$|\Psi(\lambda)|^2 f_Y(\lambda) := f_X(\lambda)$$
is the spectral density of the sequence {Xt , t ∈ Z}.
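A small sketch of Theorem 38 in the white-noise case (the two-sided filter c = (0.25, 1, 0.25) and σ² = 1 are hypothetical choices of mine): f_X = |Ψ|² f_Y is evaluated on a grid, and R_X(1) recovered from it by a Riemann sum is compared with the time-domain double-sum formula, which for white noise reduces to σ² Σ_j c_j c_{j+1}.

```python
import numpy as np

sigma2 = 1.0
ks = np.array([-1, 0, 1])
c = np.array([0.25, 1.0, 0.25])

lam = np.linspace(-np.pi, np.pi, 400, endpoint=False)
Psi = (c[None, :] * np.exp(-1j * np.outer(lam, ks))).sum(axis=1)   # transfer function
f_X = np.abs(Psi) ** 2 * sigma2 / (2 * np.pi)                      # |Psi|^2 * f_Y

t = 1
R_spec = ((np.exp(1j * t * lam) * f_X).sum() * (lam[1] - lam[0])).real  # int e^{it l} f_X dl
R_sum = sigma2 * sum(c[j] * c[j + t] for j in range(len(c) - t))
print(R_spec, R_sum)    # both approximately 0.5
```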
Then {Xt , t ∈ Z} is not a causal linear process, but we can write
$$X_t = -\sum_{k=1}^{\infty}\varphi^{-k} Y_{t+k}.$$
i.e., {Xt , t ∈ Z} satisfies the weak law of large numbers for stationary sequences.
Theorem 39. A stationary random sequence {Xt , t ∈ Z} with mean value µ and auto-
covariance function R is mean square ergodic if and only if
$$\frac{1}{n}\sum_{t=1}^{n} R(t) \to 0 \quad \text{as } n \to \infty. \tag{45}$$
where {Zλ , λ ∈ [−π, π]} is the orthogonal increment process with the associated distri-
bution function F, which is same as the spectral distribution function of {Xt , t ∈ Z}.
Then
$$\frac{1}{n}\sum_{t=1}^{n} X_t = \frac{1}{n}\sum_{t=1}^{n}\int_{-\pi}^{\pi} e^{it\lambda}\, dZ(\lambda) = \int_{-\pi}^{\pi}\Big[\frac{1}{n}\sum_{t=1}^{n} e^{it\lambda}\Big] dZ(\lambda) = \int_{-\pi}^{\pi} h_n(\lambda)\, dZ(\lambda),$$
where
$$h_n(\lambda) = \frac{1}{n}\sum_{t=1}^{n} e^{it\lambda} = \begin{cases} \dfrac{1}{n}\,\dfrac{e^{i\lambda}(1 - e^{in\lambda})}{1 - e^{i\lambda}}, & \lambda \ne 0,\\[6pt] 1, & \lambda = 0. \end{cases}$$
Further, let us consider the function
$$h(\lambda) = \begin{cases} 0, & \lambda \ne 0,\\ 1, & \lambda = 0. \end{cases}$$
Hence, as n → ∞,
$$\frac{1}{n}\sum_{t=1}^{n} X_t = \int_{-\pi}^{\pi} h_n(\lambda)\, dZ(\lambda) \to \int_{-\pi}^{\pi} h(\lambda)\, dZ(\lambda) = Z_0$$
in mean square.
Now, it suffices to show that
$$Z_0 = 0 \ \text{a.s.} \iff \frac{1}{n}\sum_{t=1}^{n} R(t) \to 0 \ \text{as } n \to \infty. \tag{46}$$
From the spectral decomposition of the autocovariance function and the Lebesgue theorem,
$$\frac{1}{n}\sum_{t=1}^{n} R(t) = \frac{1}{n}\sum_{t=1}^{n}\Big[\int_{-\pi}^{\pi} e^{it\lambda}\, dF(\lambda)\Big] = \int_{-\pi}^{\pi}\Big[\frac{1}{n}\sum_{t=1}^{n} e^{it\lambda}\Big] dF(\lambda) = \int_{-\pi}^{\pi} h_n(\lambda)\, dF(\lambda) \to \int_{-\pi}^{\pi} h(\lambda)\, dF(\lambda) = \int_{-\pi}^{\pi}|h(\lambda)|^2\, dF(\lambda). \tag{47}$$
$$R_X(t) = \frac{\sigma^2}{1-\varphi^2}\,\varphi^{|t|}.$$
Obviously,
$$\frac{1}{n}\sum_{t=1}^{n} R_X(t) = \frac{\sigma^2}{1-\varphi^2}\cdot\frac{1}{n}\sum_{t=1}^{n}\varphi^t = \frac{1}{n}\cdot\frac{\sigma^2}{1-\varphi^2}\cdot\frac{\varphi(1-\varphi^n)}{1-\varphi} \to 0$$
as n → ∞, from which we conclude that {Xt , t ∈ Z} is mean square ergodic.
Example 33. Let {Xt , t ∈ Z} be a stationary mean square ergodic sequence with
expected value µ and autocovariance function RX . Define a random sequence {Zt , t ∈ Z}
by
Zt = Xt + Y, t ∈ Z,
where EY = 0, varY = σ 2 ∈ (0, ∞), and EXt Y = 0 ∀t ∈ Z.
Then EZt = EXt + EY = µ for all t ∈ Z and
$$R_Z(t) = \operatorname{cov}(Z_{s+t}, Z_s) = R_X(t) + \sigma^2,$$
from which we get that {Zt , t ∈ Z} is weakly stationary. However, it is not mean square ergodic, since, as n → ∞,
$$\frac{1}{n}\sum_{t=1}^{n} R_Z(t) = \frac{1}{n}\sum_{t=1}^{n} R_X(t) + \sigma^2 \to \sigma^2 > 0.$$
Theorem 40. Let {Xt , t ∈ Z} be a real-valued stationary sequence with mean value µ and autocovariance function R, such that $\sum_{t=-\infty}^{\infty}|R(t)| < \infty$. Then, as n → ∞,
$$\overline{X}_n = \frac{1}{n}\sum_{t=1}^{n} X_t \to \mu \quad \text{in mean square}, \tag{48}$$
$$n\operatorname{var}\overline{X}_n \to \sum_{k=-\infty}^{\infty} R(k). \tag{49}$$
Proof. 1. $\sum_{k=-\infty}^{\infty}|R(k)| < \infty$ implies R(k) → 0 as k → ∞, thus $\frac{1}{n}\sum_{k=1}^{n} R(k) \to 0$ as n → ∞ and assertion (48) follows from Theorem 39.
2. We have
$$\operatorname{var}\overline{X}_n = \frac{1}{n^2}\operatorname{var}\Big(\sum_{k=1}^{n} X_k\Big) = \frac{1}{n^2}\Big[\sum_{k=1}^{n}\operatorname{var}X_k + \sum\!\!\sum_{1\le j\ne k\le n}\operatorname{cov}(X_j, X_k)\Big] = \frac{1}{n^2}\Big[nR(0) + 2\sum_{j=1}^{n-1}(n-j)R(j)\Big] = \frac{1}{n}\Big[R(0) + 2\sum_{j=1}^{n-1}\Big(1 - \frac{j}{n}\Big)R(j)\Big] = \frac{1}{n}\sum_{j=-n+1}^{n-1}\Big(1 - \frac{|j|}{n}\Big)R(j). \tag{50}$$
Thus,
$$n\operatorname{var}\overline{X}_n = \sum_{j=-n+1}^{n-1} R(j) - \frac{2}{n}\sum_{j=1}^{n-1} jR(j).$$
Assertion (49) now follows from the assumptions of the theorem and from the Kronecker lemma.
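A simulation sketch of (48)-(49) for an MA(1) sequence (the choice X_t = Y_t + 0.5 Y_{t−1} with σ = 1, and the sample and replication sizes, are mine): Σ_k R(k) = σ²(1 + 0.5)² = 2.25 and n var X̄_n should be close to this value.

```python
import numpy as np

rng = np.random.default_rng(5)
sigma, b1, n, n_rep = 1.0, 0.5, 2000, 4000

Y = rng.normal(0.0, sigma, size=(n_rep, n + 1))
X = Y[:, 1:] + b1 * Y[:, :-1]               # X_t = Y_t + b1 * Y_{t-1}, mean mu = 0

means = X.mean(axis=1)
print(means.mean())                                   # close to mu = 0, cf. (48)
print(n * means.var(), sigma ** 2 * (1 + b1) ** 2)    # close to sum_k R(k) = 2.25, cf. (49)
```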
Definition 38. A stationary mean square continuous process {Xt , t ∈ R} with mean value µ is mean square ergodic if, as τ → ∞,
$$\frac{1}{\tau}\int_0^{\tau} X_t\, dt \to \mu \quad \text{in mean square}.$$
Remark 16. The existence of the integral $\int_0^{\tau} X_t\, dt$ is guaranteed by Theorem 18 since the autocovariance function of a stationary mean square continuous process is continuous and the expected value µ is constant.
Theorem 41. A stationary, mean square continuous process {Xt , t ∈ R} is mean square ergodic if and only if its autocovariance function satisfies the condition
$$\frac{1}{\tau}\int_0^{\tau} R(t)\, dt \to 0 \quad \text{as } \tau \to \infty.$$
Example 34. Let {Xt , t ∈ R} be a stationary centered stochastic process with the autocovariance function R(t) = c e^{−α|t|}, c > 0, α > 0. Since
$$\frac{1}{\tau}\int_0^{\tau} R(t)\, dt = \frac{c}{\tau}\int_0^{\tau} e^{-\alpha t}\, dt = \frac{c}{\tau}\cdot\frac{1 - e^{-\alpha\tau}}{\alpha} \to 0$$
as τ → ∞, the process {Xt , t ∈ R} is mean square ergodic and $\tau\operatorname{var}\overline{X}_\tau \to \frac{2c}{\alpha}$.
9.2 Central limit theorems
Some preliminary asymptotic results
Theorem 44. Let {ξn , n ∈ N}, {Skn , n ∈ N, k ∈ N}, {ψk , k ∈ N} and ψ be random
variables such that
1. $S_{kn} \xrightarrow{D} \psi_k$ as n → ∞, for all k = 1, 2, . . . ,
2. $\psi_k \xrightarrow{D} \psi$ as k → ∞,
3. $\lim_{k\to\infty}\limsup_{n\to\infty} P(|\xi_n - S_{kn}| > \varepsilon) = 0$ for every ε > 0.
Then
$$\xi_n \xrightarrow{D} \psi \quad \text{as } n \to \infty.$$
$$\sqrt{n}\,\frac{\overline{Y}_n - \mu}{\sigma} \xrightarrow{D} N(0, 1). \tag{53}$$
Proof. Brockwell and Davis (1991), Theorem 6.4.1.
$$c'X_n \xrightarrow{D} c'X \quad \text{as } n \to \infty.$$
Central limit theorems for stationary sequences
Theorem 47. Let {Xt , t ∈ Z} be a random sequence defined by
$$X_t = \mu + \sum_{j=0}^{m} b_j Y_{t-j},$$
where
$$\xi_n = \sum_{s=1}^{m} Y_{1-s}\sum_{j=s}^{m} b_j - \sum_{s=0}^{m-1} Y_{n-s}\sum_{j=s+1}^{m} b_j,$$
$$\Big(\sum_{j=0}^{m} b_j\Big)\frac{1}{\sqrt{n}}\sum_{t=1}^{n} Y_t \xrightarrow{D} N(0, \Delta^2), \qquad \text{where } \Delta^2 = \sigma^2\Big(\sum_{j=0}^{m} b_j\Big)^2. \tag{55}$$
Now, using Theorem 43 it suffices to prove that $\frac{1}{\sqrt{n}}\xi_n \xrightarrow{P} 0$ as n → ∞. But it holds, since as n → ∞,
$$P\Big(\Big|\frac{1}{\sqrt{n}}\xi_n\Big| > \varepsilon\Big) \le \frac{1}{\varepsilon^2}\,\frac{1}{n}\, E|\xi_n|^2 = \frac{1}{\varepsilon^2}\,\frac{\sigma^2\cdot\text{const}}{n} \to 0.$$
$$\frac{1}{\sqrt{n}}\sum_{t=1}^{n}(X_t - \mu) \xrightarrow{D} N(0, \Delta^2),$$
where $\Delta^2 = \sigma^2\big(\sum_{j=0}^{\infty} b_j\big)^2$.
thus
$$\frac{1}{\sqrt{n}}\sum_{t=1}^{n}(X_t - \mu) = \frac{1}{\sqrt{n}}\sum_{t=1}^{n} U_{kt} + \frac{1}{\sqrt{n}}\sum_{t=1}^{n} V_{kt}.$$
If we denote
$$\xi_n = \frac{1}{\sqrt{n}}\sum_{t=1}^{n}(X_t - \mu), \qquad S_{kn} = \frac{1}{\sqrt{n}}\sum_{t=1}^{n} U_{kt}, \qquad D_{kn} = \frac{1}{\sqrt{n}}\sum_{t=1}^{n} V_{kt},$$
we have ξn = Skn + Dkn .
From Theorem 47 we have, as n → ∞ and every k ∈ N,
$$S_{kn} \xrightarrow{D} \psi_k, \tag{56}$$
where $\psi_k \sim N(0, \Delta_k^2)$, $\Delta_k^2 = \sigma^2\big(\sum_{j=0}^{k} b_j\big)^2$. Further, from the assumptions of the theorem it follows that
$$\Delta_k^2 = \sigma^2\Big(\sum_{j=0}^{k} b_j\Big)^2 \to \sigma^2\Big(\sum_{j=0}^{\infty} b_j\Big)^2 = \Delta^2$$
as k → ∞, and thus
$$\psi_k \xrightarrow{D} N(0, \Delta^2). \tag{57}$$
According to the Chebyshev inequality,
$$P(|\xi_n - S_{kn}| > \varepsilon) = P(|D_{kn}| > \varepsilon) \le \frac{1}{\varepsilon^2}\operatorname{var}D_{kn} = \frac{1}{\varepsilon^2}\operatorname{var}\Big(\frac{1}{\sqrt{n}}\sum_{t=1}^{n} V_{kt}\Big).$$
$$P(|\xi_n - S_{kn}| > \varepsilon) \le \frac{1}{\varepsilon^2}\operatorname{var}\Big(\frac{1}{\sqrt{n}}\sum_{t=1}^{n} V_{kt}\Big) = \frac{1}{\varepsilon^2}\sum_{j=-n+1}^{n-1} R_V(j)\Big(1 - \frac{|j|}{n}\Big) \le \frac{1}{\varepsilon^2}\sum_{j=-n+1}^{n-1}|R_V(j)| = \frac{1}{\varepsilon^2}\Big[R_V(0) + 2\sum_{j=1}^{n-1}|R_V(j)|\Big] = \frac{\sigma^2}{\varepsilon^2}\Big[\sum_{j=k+1}^{\infty} b_j^2 + 2\sum_{j=1}^{n-1}\Big|\sum_{\nu=k+1}^{\infty} b_\nu b_{\nu+j}\Big|\Big] \le \frac{\sigma^2}{\varepsilon^2}\Big[\sum_{j=k+1}^{\infty} b_j^2 + 2\sum_{j=1}^{n-1}\sum_{\nu=k+1}^{\infty}|b_\nu||b_{\nu+j}|\Big] \le \frac{\sigma^2}{\varepsilon^2}\Big[\sum_{j=k+1}^{\infty}|b_j|^2 + 2\sum_{\nu=k+1}^{\infty}|b_\nu|\sum_{j=1}^{\infty}|b_{\nu+j}|\Big] = \frac{\sigma^2}{\varepsilon^2}\Big(\sum_{j=k+1}^{\infty}|b_j|\Big)^2,$$
so that
$$\lim_{k\to\infty}\lim_{n\to\infty} P(|\xi_n - S_{kn}| > \varepsilon) \le \lim_{k\to\infty}\frac{\sigma^2}{\varepsilon^2}\Big(\sum_{j=k+1}^{\infty}|b_j|\Big)^2 = 0. \tag{58}$$
Combining this result with (56) and (57) we can see that the assumptions of Theorem
44 are met and thus, as n → ∞,
n
1 X D
ξn = √ (Xt − µ) −→ N (0, ∆2 ).
n t=1
Xt = µ + Zt , Zt = aZt−1 + Yt ,
where µ ∈ R, |a| < 1 and {Yt , t ∈ Z} is a strict white noise with finite variance σ 2 > 0.
P∞
The assumption |a| < 1 implies that j=0 |a|j < ∞, thus
∞
X
Xt = µ + aj Yt−j , t ∈ Z.
j=0
P∞
Since j=0 aj 6= 0, it holds, as n → ∞
n
1 X D 1
√ (Xt − µ) −→ N (0, ∆2 ), ∆2 = σ 2 .
n t=1 (1 − a)2
σ2
For large n, X n ≈ N µ, n(1−a)2 .
Xt = Yt Yt+m , t ∈ Z,
74
In this case, Xt are mutually uncorrelated but not independent. They are m−dependent.
Then, as n → ∞,
Proof. 1. Since the sequence {Xt , t ∈ Z} is strictly stationary with finite second-order
moments, it is weakly stationary. From m-dependence it follows that R(k) = 0 for
|k| > m. According to Theorem 40 we have
∞
X m
X
lim n varX n = R(k) = R(k) = ∆2m .
n→∞
k=−∞ k=−m
(X1 , . . . , Xn ) = (U1 , V1 , U2 , V2 , . . . , Ur , Vr ),
Uj = (X(j−1)k+1 , . . . , Xjk−m ), j = 1, . . . , r,
Vj = (Xjk−m+1 , . . . , Xjk ), j = 1, . . . , r.
U1 , . . . , Ur are mutually independent (it follows from m-dependence and the assumption
k > 2m) and identically distributed (from strict stationarity). Similarly, V1 , . . . , Vr are
i. i. d. Thus,
n
X Xr Xr
Xt = Sj + Tj ,
t=1 j=1 j=1
75
Similarly, utilizing the strict stationarity we have
var T1 = var (Xk−m+1 + · · · + Xk ) = var (X1 + · · · + Xm )
m−1
X
2
= (m − |ν|)R(ν) = δm .
ν=−m+1
D
ψk −→ N (0, ∆2m ). (65)
From the Chebyshev inequality,
r
1 1 X
P (|ξn − Skn | > ) = P (|Dkn | > ) ≤ · var Tj
2 n j=1
1 1 1 2
= · var T1 = δ .
2 k 2 k m
Thus,
lim lim P (|ξn − Skn | > ) = 0 (66)
k→∞ n→∞
and the proof follows from (64), (65), (66) and Theorem 44.
76
Example 38. Consider the sequence {Xt , t ∈ Z}
Xt = µ + Yt + a1 Yt−1 + a2 Yt−2 , t ∈ Z,
Therefore
m
X
∆2m = R(k) = R(0) + 2R(1) + 2R(2) = σ 2 (1 + a1 + a2 )2 .
k=−m
Pn D
From the previous theorem, √1 − µ) −→ N (0, ∆2m ), provided ∆2m 6= 0.
n t=1 (Xt
where τ 2 = var Y12 , Xt = (Yt Yt+1 , . . . , Yt Yt+k )0 and I is the identity matrix of order k.
Solution.
1. Yt2 are i. i. d., EYt2 = σ 2 , var Yt2 = τ 2 . The Central limit theorem (Theorem 45)
implies that
n
1 X 2 D
√ (Y − σ 2 ) −→ N (0, τ 2 ).
n t=1 t
77
2. Denote Xt := Yt Yt+k for k > 0. The sequence {Xt , t ∈ Z} is strictly stationary,
EXt = 0, EXt2 = σ 4 , Xt are mutually uncorrelated but k-dependent. By Theorem 49,
n
1 X D
√ Xt −→ N (0, ∆2k ),
n t=1
Pk
where ∆2k = j=−k RX (j) = σ 4 .
3. We can write
n−k n n
1 X 1 X 1 X
√ Yt Yt+k = √ Yt Yt+k − √ Yt Yt+k .
n t=1 n t=1 n t=n−k+1
From step 2, as n → ∞,
n
1 X D
√ Yt Yt+k −→ N (0, σ 4 ).
n t=1
Form the Chebyshev inequality, as n → ∞,
n
! n
!
1 X 1 X
P √ Yt Yt+k > = P √ Xt >
n t=n−k+1 n t=n−k+1
n n
1 X 2 1 X
2 1 k σ4
≤ E X t = EX t = →0
2 n t=n−k+1 2 n t=n−k+1 2 n
since k is fixed.
4. Define Zt := c0 Xt , t ∈ Z, c ∈ Rk . Then
• Random vectors Xt have zero mean and the variance matrix σ 4 I and are mutually
uncorrelated.
• Random variables Zt are centered, with the variance σ 4 c0 Ic, uncorrelated and k-
dependent
By Theorem 49,
n
1 X D
√ Zt −→ N (0, ∆2k ),
n t=1
where ∆2k = kj=−k RZ (j) = σ 4 c0 Ic. From here the final result follows when we apply
P
Theorem 46 and properties of normal distribution.
78
10 Prediction in time domain
10.1 Projection in Hilbert space
Definition 40. Let H be a Hilbert space with the inner product h·, ·i and the norm ||.||.
We say that two elements x, y ∈ H are orthogonal (perpendicular) if hx, yi = 0. We write
x ⊥ y.
Let M ⊂ H be a subset of H. We say that an element x ∈ H is orthogonal to M, if it
is orthogonal to every element of M, i.e., hx, yi = 0 for every y ∈ M. We write x ⊥ M.
The set M ⊥ = {y ∈ H : y ⊥ M } is called to be the orthogonal complement of the set M .
||x − x
b|| = min ||x − y|| (67)
y∈M
and
||x||2 = ||b
x||2 + ||x − x
b||2 . (68)
Proof. Rudin (2003), Theorem 4.11, or Brockwell and Davis (1992), Theorem 2.3.1.
x = PM x + (x − PM x) = PM x + (I − PM )x. (69)
Theorem 52. Let H be a Hilbert space, PM the projection mapping of H onto a closed
subspace M. It holds:
2. If x ∈ M , then PM x = x.
79
3. If x ∈ M ⊥ , then PM x = 0.
Obviously,
since M and M ⊥ are linear subspaces, and thus, αPM x + βPM y = PM (αx + βy).
80
It is well known that the best approximation is given by the conditional mean value
Indeed, if (for the simplicity, we consider only real-valued random variables) we denote
(X1 , . . . , Xn )0 = Xn , we can write
Thus,
such that
bn+h (n)|2 = kXn+h − X
E|Xn+h − X bn+h (n)k2
81
takes minimum with respect to all linear combinations of X1 , . . . , Xn .
It means that
Xbn+h (n) = PH n (Xn+h ) ∈ H n ,
1 1
bn+h (n) ⊥ H n
Xn+h − X (71)
1
and the element X bn+h (n) is determined uniquely due to the projection theorem.
Since the space H1n is a linear span generated by X1 , . . . , Xn , condition (71) is satisfied
if and only if
Xn+h − X bn+h (n) ⊥ Xj , j = 1, . . . , n,
or
Γn cn = γ nh
82
Provided that Γ−1 −1
n exists we get cn = Γn γ nh , thus
n
X
X
bn+h (n) = cj Xj = c0n Xn = γ 0nh Γ−1
n Xn . (74)
j=1
δh2 := E|Xn+h − X
bn+h (n)|2 = kXn+h − X
bn+h (n)k2 .
By (68)
kXn+h k2 = kX
bn+h (n)k2 + kXn+h − X
bn+h (n)k2 ,
so that
δh2 = kXn+h k2 − kX
bn+h (n)k2 . (75)
For a real-valued centered stationary sequence such that Γn is regular,
Theorem 53. Let {Xt , t ∈ Z} be a real-valued centered stationary sequence with auto-
covariance function R, such that R(0) > 0 and R(k) → 0 as k → ∞. Then the matrix
Γn = var (X1 , . . . , Xn ) is regular for every n ∈ N.
Proof. We will prove the theorem by contradiction: suppose that Γn is singular for an
n ∈ N; then there is a nonzero vector c = (c1 , . . . , cn )0 such that c0 Γn c = 0 and for Xn =
(X1 , . . . , Xn )0 , c0 Xn = 0 a. s. holds true, since E c0 Xn = 0 and var (c0 Xn ) = c0 Γn c = 0.
Thus there exists a positive integer 1 ≤ r < n such that Γr is regular and Γr+1 is
singular, and constants a1 , . . . , ar such that
r
X
Xr+1 = aj X j .
j=1
83
(n) (n) Pr (n)
For every n ≥ r + 1 there exist constants a1 , . . . , ar such that Xn = j=1 aj X j =
(n) (n)
a(n)0 Xr , where a(n) = (a1 , . . . , ar )0 and Xr = (X1 , . . . , Xr )0 ,
(n) 2 (n)
from which for every j = 1, . . . , r it follows that aj ≤ R(0)/λ1 , hence, |aj | ≤ C
independently of n, where C is a positive constant.
We also have
r r
2 X (n)
X (n)
0 < R(0) = E Xn = E Xn aj X j = aj EXn Xj
j=1 j=1
r
X r
X
(n) (n)
= aj R(n − j) ≤ |aj ||R(n − j)|
j=1 j=1
r
X
≤C |R(n − j)|.
j=1
Recursive methods
Let us introduce the following notation.
X
bk+1 := X
bk+1 (k) = PH k (Xk+1 ).
1
Then
H1n = H{X1 , . . . , Xn } = H{X1 − X
b 1 , . . . , Xn − X
bn }.
84
Lemma 4. X1 − X
b 1 , . . . , Xn − X
bn are orthogonal random variables.
Xi − X
b i ⊥ Xj − X
bj .
The one-step best linear prediction of Xk+1 computed from X1 , . . . , Xk thus can be
written in the form
X k
Xk+1 =
b θk j Xk+1−j − X
bk+1−j .
j=1
X
b1 = 0,
Xn
Xn+1 =
b θn j Xn+1−j − X
bn+1−j , n ≥ 1, (77)
j=1
where for k = 0, . . . , n − 1,
Proof. Define X
b1 := 0, then
85
Since Xbn+1 = PH n Xn+1 , it must be of the form as given in (77). When multiply both
1
sides of (77) by Xk+1 − Xbk+1 for k < n and take the mean value we get
bn+1 (Xk+1 − X
EX bk+1 )
Xn
= θnj E Xn+1−j − Xbn+1−j Xk+1 − X
bk+1
j=1
bk+1 )2 = θn,n−k vk .
= θn,n−k E(Xk+1 − X
E(Xn+1 − X
bn+1 )(Xk+1 − X
bk+1 ) = 0, k < n,
86
Computation of vn is as follows.
vn = E|Xn+1 − Xbn+1 |2 = E|Xn+1 |2 − E|X
bn+1 |2
n
X 2
= R(n + 1, n + 1) − E θnj Xn+1−j − Xn+1−j
b
j=1
Xn
2 bn+1−j )2
= R(n + 1, n + 1) − θnj E(Xn+1−j −X
j=1
Xn
2
= R(n + 1, n + 1) − θnj vn−j
j=1
n−1
X
2
= R(n + 1, n + 1) − θn,n−ν vν .
ν=0
87
generally,
θnk = 0, k = 2, . . . , n,
R(1)
θn1 = ,
vn−1
bn+1 = θn1 (Xn − X
X bn ),
2
vn = R(0) − θn1 vn−1 .
Example 41. Consider an MA(q) sequence. Then R(k) = 0 pro |k| > q. By using the
recursive computations we get
min(q,n)
X
X
bn+1 = θnj Xn+1−j − X
bn+1−j , n ≥ 1.
j=1
n+h−1
X
= PH1n θn+h−1,j Xn+h−j − X
bn+h−j
j=1
n+h−1
X
= θn+h−1,j PH1n Xn+h−j − X
bn+h−j
j=1
n+h−1
X
= θn+h−1,j Xn+h−j − Xn+h−j ,
b (82)
j=h
88
Example 42. Le us again consider the MA(1) model from Example 40. We have shown
that the one-step prediction is
bn+1 = PH n (Xn+1 ) = θn1 (Xn − X
X bn ).
1
X
bn+h (n) = PH n (Xn+h ) = PH n (X
1 1
bn+h )
= PH n (θn+h−1,1 (Xn+h−1 − X
1
bn+h−1 )) = 0,
H1n = H(X1 − X b 1 , . . . , Xn − X
bn ) = H(X1 , . . . , Xn )
= H(W1 , . . . , Wn ) = H(W1 − W c1 , . . . , Wn − W
cn ).
• Application of the projection mapping onto H1t−1 to both sides of (83) results in
σ1 X
bt , t ≤ m,
W
ct = (85)
1 b
σ
(Xt − ϕ1 Xt−1 · · · − ϕp Xt−p ), t > m.
89
We can see that for t ≥ 1,
1
ct |2 = 1
Wt − W
ct = Xt − X
bt , E|Wt − W vt−1 := wt−1 .
σ σ2
Therefore it holds
Pn
j=1 θnj Xn+1−j − X
bn+1−j , n<m
X
bn+1 = P (86)
q Pp
j=1 θnj Xn+1−j − Xn+1−j + j=1 ϕj Xn−j+1 , n ≥ m,
b
where coefficients θnj a wn are computed by applying the innovation algorithm to the
sequence (83). For this we need to compute the values of the autocovariance function of
{Wt }.
We know that E Xt = 0, thus E Wt = 0. For the covariances RW (s, t) = EWs Wt we
get
1
R (s − t), 1 ≤ s, t ≤ m,
σ2 X
Pp
1
2 RX (s − t) − j=1 ϕj RX (|s − t| − j) ,
σ
• Transformation:
(1
σ
Xt , 1 ≤ t ≤ p,
Wt = (88)
1
σ
(Xt − ϕ1 Xt−1 − · · · − ϕp Xt−p ) = σ1 Yt , t > p.
90
ct = 1 (Xt − X
Again, Wt − W bt ) for t ≥ 1 and from here
σ
(Pn
j=1 θnj Xn+1−j − X
bn+1−j , n < p,
X
bn+1 = (90)
ϕ1 Xn + ϕ2 Xn−1 + · · · + ϕp Xn−p+1 , n ≥ p.
The autocovariance function needed for the calculation of the coefficients θnj is
1
σ2 RX (s − t),
1 ≤ s, t ≤ p,
RW (s, t) = 1, t = s > p, (91)
0 elsewhere.
bn+1 |2 = EYn+1
vn = E|Xn+1 − X 2
= σ2.
where {Yt , t ∈ Z} is WN(0, σ 2 ), and assume that all the roots of the polynomial λp −
ϕ1 λp−1 − · · · − ϕp are inside the unit circle. It means that {Xt , t ∈ Z} is a causal linear
process and Yt ⊥ Xs for all t > s.
The one-step prediction: To get X bn+1 (n) from Xn , Xn−1 , . . . , notice that
91
n
• Yn+1 ⊥ Xn , Xn−1 , · · · ⇒ Yn+1 ⊥ H−∞ , (from the linearity and the continuity of the
inner product)
It means that
bn+1 = PH n (Xn+1 ) = ϕ1 Xn + · · · + ϕp Xn+1−p .
X −∞
= PH−∞
n (X
bn+h )
= PH−∞
n (ϕ1 Xn+h−1 + · · · + ϕp Xn+h−p )
= ϕ1 [Xn+h−1 ] + ϕ2 [Xn+h−2 ] + · · · + ϕp [Xn+h−p ],
where (
Xn+j , j≤0
[Xn+j ] =
Xn+j (n), j > 0.
b
Example 43. Consider an AR(1) process generated by Xt = ϕXt−1 + Yt , where |ϕ| < 1
and Yt ∼ WN(0, σ 2 ). If we know the whole history Xn , Xn−1 , . . . , we have X
bn+1 = ϕXn .
For h > 1
X bn+h−1 (n) = ϕ2 X
bn+h (n) = ϕ[Xn+h−1 ] = ϕX bn+h−2 (n) = . . .
= ϕ h Xn .
1 − ϕ2h
= σ2 .
1 − ϕ2
where Yt ∼ WN(0, σ 2 ).
92
• Due to causality, for any t ∈ Z, Xt = ∞
P P∞
j=0 c j Y t−j , where j=0 |cj | < ∞; from here
it follows that Yt ⊥ Xs for every s < t.
∞
X
Xt = − dj Xt−j + Yt . (94)
j=1
Since
∞
X N
X
t−1
− dj Xt−j = l. i. m. N →∞ − dj Xt−j ∈ H−∞ ,
j=1 j=1
t−1
and Yt ⊥ H−∞ ,from decomposition (94) it follows that the best linear prediction of
Xn+1 based on the whole history Xn , Xn−1 , . . . , is
∞
X
bn+1 = −
X dj Xn+1−j . (95)
j=1
bn+1 = ϕ1 Xn + · · · + ϕp Xn+1−p
X
+ θ1 Xn − X bn + · · · + θq Xn+1−q − X
bn+1−q .
X
bn+h (n) = PH n (Xn+h ) = PH n (P n+h−1 (Xn+h )) = PH n (X
−∞ −∞ H −∞
bn+h )
−∞
93
where (
Xn+j , j ≤ 0,
[Xn+j ] =
Xbn+j (n), j > 0
and (
Xn+j − X
bn+j , j ≤ 0,
[Yn+j ] =
0, j > 0.
If we use (95) we have
∞ ∞
!
X X
X
bn+h (n) = PH n
−∞
− dj Xn+h−j =− [Xn−h+j ].
j=1 d=1
j=0 j=0
bn+1 |2 = EY 2 = σ 2 .
The prediction error is E|Xn+1 − X n+1
94
For h > 1,
bn+h (n) = PH n P n+h−1 (Xn+h ) = PH n (X
X −∞ H −∞
bn+h )
−∞
= θPH−∞
n (Yn+h−1 ) = 0,
n
since for h ≥ 2, Yn+h−1 ⊥ H−∞ . The h-step prediction error is
• We know the whole past of the sequence {Xt , t ∈ Z} up to time n − 1, and want
to forecast Xn+h , h = 0, 1, . . . , i.e., want to find Xbn+h (n − 1) = P n−1 (Xn+h ), in
H−∞
n−1
other words, we want to find Xn+h (n − 1) ∈ H−∞ ⊂ H{Xt , t ∈ Z}, such that
b
n−1
Xn+h − X bn+h (n − 1) ⊥ H−∞ .
Rπ
• Recall spectral decomposition: Xt = −π eitλ dZ(λ), where {Zλ , λ ∈ [−π, π]} is an
orthogonal increment process with the associated distribution function F (Theorem
28).
• Recall that all the elements of the Hilbert space H{Xt , t ∈ Z} are of the form
Z π
ϕ(λ)dZ(λ),
−π
n−1
where Φh (λ) ∈ L2 (F ). Condition Xn+h − X
bn+h (n − 1) ⊥ H−∞ will be met if
Xn+h − X
bn+h (n − 1) ⊥ Xn−j , j = 1, 2, . . .
thus for j = 1, 2, . . . ,
E Xn+h − X
bn+h (n − 1) X n−j = 0,
95
or
E Xn+h − X bn+h (n − 1) X n−j =
Z π Z π
inλ
= R(h + j) − E e Φh (λ)dZ(λ) ei(n−j)λ dZ(λ)
Z π−π −π
Denote
Ψh (λ) := eihλ − Φh (λ) f (λ).
It follows from condition (98) that the Fourier expansion of the function Ψh has only
terms with nonnegative powers of eiλ ,
∞
X ∞
X
ikλ
Ψh (λ) = bk e , |bk | < ∞.
k=0 k=0
Provided that ∞ ∞
X X
Φh (λ) = ak e−ikλ , |ak | < ∞,
k=1 k=1
96
Z π ∞
hX i
bn+h (n − 1) = inλ −ikλ
X e ak e dZ(λ)
−π k=1
Z π N
hX i
= l. i. m. e inλ
ak e−ikλ dZ(λ)
N →∞ −π k=1
N
X hZ π i
= l. i. m. ak ei(n−k)λ dZ(λ)
N →∞ −π
k=1
XN ∞
X
= l. i. m. ak Xn−k = ak Xn−k .
N →∞
k=1 k=1
be a function holomorphic for |z| ≤ 1. Then the best linear prediction of Xn+h from
Xn−1 , Xn−2 , . . . is Z π
Xbn+h (n − 1) = einλ Φh (λ)dZ(λ),
−π
where Φh (λ) = Φ∗h eiλ and {Zλ , λ ∈ [−π, π]} is the orthogonal increment process from
97
Example 45. Consider again the autoregressive sequence
σ2 1 σ2 1 ∗ iλ
f (λ) = = = f e ,
2π |1 − ϕe−iλ |2 2π (1 − ϕe−iλ )(1 − ϕeiλ )
where
∗ σ2 1 σ2 z
f (z) = −1
=
2π (1 − ϕz )(1 − ϕz) 2π (1 − ϕz)(z − ϕ)
which is a rational function of the complex-valued variable z. The function
should be holomorphic for |z| ≥ 1. Since |ϕ| < 1, it must be z h − Φ∗h (z) = 0 for z = ϕ,
otherwise Ψ∗h has a pole at z = ϕ.
Thus,
Φ∗h (ϕ) = ϕh . (101)
Put Φ∗h (z) := γz , where γ is a constant. Then Φ∗h is holomorphic for |z| ≥ 1 and
Φ∗h (∞) = 0. The function Ψ∗h is holomorphic for |z| ≤ 1.
The value of constant γ follows from (101): ϕh = Φ∗h (ϕ) = ϕγ , thus γ = ϕh+1 .
Functions
ϕh+1
Φ∗h (z) = = ϕh+1 z −1 , z ∈ C,
z
σ 2
z h+1 − ϕh+1
Ψ∗h (z) = , z ∈ C,
2π (z − ϕ)(1 − ϕz)
satisfy the conditions of Theorem 55. The spectral characteristic of prediction is Φh (λ) =
ϕh+1 e−iλ and the best linear forecast is
Z π
Xn+h (n − 1) =
b einλ Φh (λ)dZ(λ)
Z−π
π
= einλ ϕh+1 e−iλ dZ(λ)
Z−π
π
= ei(n−1)λ dZ(λ)ϕh+1 = ϕh+1 Xn−1
−π
98
which is the same result as obtained in the time domain.
For the prediction error, from (99) we get
bt+h (t − 1)2 = kXt+h k2 − kX
δh2 = EXt+h − X bt+h (t − 1)k2
Z π 2
2 itλ
= R(0) − E Xt+h (t − 1) = R(0) − E
b e Φh (λ)dZ(λ)
−π
Z π
itλ 2
= R(0) − e Φh (λ) f (λ)dλ =
−π
Z π
2(h+1)
f (λ)dλ = R(0) 1 − ϕ2(h+1) ,
= R(0) − |ϕ|
−π
Vt = Xt + Yt , t ∈ Z,
i.e., {Vt , t ∈ Z}, is a mixture of the signal and the noise. Our aim is to extract the signal
from this mixture.
and
Xs − X
bs ⊥ Vt , t = 1, . . . , n,
99
or
E(Xs − X
bs )Vt = 0, t = 1, . . . , n. (102)
Since Vt = Xt +Yt for all t and Xt , Yt are uncorrelated, we can see that EXs Vt = EXs Xt =
RX (s − t), and equations (102) can be written in the form
n
X
RX (s − t) − cj RV (j − t) = 0, t = 1, . . . , n. (103)
j=1
The variable X bs is the best linear filtration of the signal Xs at time s from the mixture
V1 , . . . , Vn .
The filtration error is
The system of equations (103) can be written in the obvious matrix form. For the
regularity of the matrix of elements RV (j − t), j, t = 1, . . . , n, see Theorem 53.
where
fX (λ)
Φ(λ) = , λ ∈ [−π, π], (104)
fV (λ)
fV = fX + fY is the spectral density of {Vt , t ∈ Z} and ZV = {Zλ , λ ∈ [−π, π]} is the
orthogonal increment process from the spectral decomposition of the sequence {Vt , t ∈ Z}.
The filtration error is
Z π Z π
2 fX (λ)fY (λ)
δ = dλ = Φ(λ)fY (λ)dλ.
−π fX (λ) + fY (λ) −π
100
Function Φ is called spectral characteristic of filtration.
Proof. The sequences {Xt , t ∈ Z} and {Yt , t ∈ Z} are centered, stationary and mutually
uncorrelated with spectral densities. It follows that the sequence {Vt , t ∈ Z} is centered
and stationary with the autocovariance function RV = RX + RY . Then the spectral
density of {Vt , t ∈ Z} exists and is equal to fV = fX + fY .
The best linear filtration of Xs from {Vt , t ∈ Z} is the projection of Xs onto the
Hilbert space H = H{Vt , t ∈ Z}, i.e., we are interested in X bs = PH (Xs ).
Let Φ be the function defined in (104). First, we will show that
Z π
Xs :=
b eisλ Φ(λ)dZV (λ) ∈ H.
−π
fX (λ) 2
Z π Z π Z π
|fX (λ)|2
2
|Φ(λ)| dFV (λ) = f V (λ)dλ = dλ < ∞
−π fV (λ) −π fV (λ)
−π
and X bs ∈ H.
Further, Xbs will be the projection of Xs onto H if (Xs −X
bs ) ⊥ H, i.e., if (Xs −Xbs ) ⊥ Vt
for all t ∈ Z. For any t ∈ Z we have
E Xs − Xs V t = EXs V t − EX
b bs V t
Z π Z π !
isλ
= EXs X t + Y t − E e Φ(λ)dZV (λ) eitλ dZV (λ)
−π −π
Z π
= EXs X t − eisλ Φ(λ)e−itλ dFV (λ)
−π
Z π
= RX (s − t) − eiλ(s−t) Φ(λ)fV (λ)dλ
Z−π
π
= RX (s − t) − eiλ(s−t) fX (λ)dλ
−π
= RX (s − t) − RX (s − t) = 0.
101
Let us determine the filtration error:
δ 2 = kXs k2 − kX bs k2 = RX (0) − E|X bs |2
Z π Z π 2
isλ
= fX (λ)dλ − E e Φ(λ)dZV (λ)
Z−ππ Z π −π
Φ(λ)2 dFV (λ)
= fX (λ)dλ −
Z−ππ Z−ππ
fX (λ) 2
= fX (λ)dλ − fV (λ)dλ
−π fV (λ)
−π
Z π Z π
fX (λ)fY (λ)
= dλ = Φ(λ)fY (λ)dλ.
−π fX (λ) + fY (λ) −π
Example 46. Let the signal {Xt , t ∈ Z} and the noise {Yt , t ∈ Z} be mutually inde-
pendent sequences such that
Xt = ϕXt−1 + Wt , t ∈ Z,
2
where |ϕ| < 1, ϕ 6= 0 and {Wt , t ∈ Z} is a white noise with zero mean and variance σW ,
2
and {Yt , t ∈ Z} is another white noise sequence with zero mean and variance σY . We
observe Vt = Xt + Yt , t ∈ Z.
Obviously, {Xt , t ∈ Z} and {Yt , t ∈ Z} are centered stationary sequences with the
spectral densities
2
σW 1 σY2
fX (λ) = , fY (λ) = , λ ∈ [−π, π]
2π |1 − ϕe−iλ |2 2π
that satisfy conditions of Theorem 56.
The sequence {Vt , t ∈ Z} has the spectral density fV = fX + fY and it can be shown
that
σ 2 |1 − θe−iλ |2
fV (λ) = , λ ∈ [−π, π], (105)
2π |1 − ϕe−iλ |2
where σ 2 = ϕθ σY2 , θ is the root of the equation θ2 − cθ + 1 = 0, the absolute value of
which is less than one and has the same sign as the coefficient ϕ, and
2
σW + σY2 (1 + ϕ2 )
c= .
ϕσY2
(See Prášková, 2016, Problem 8.1 for some hints.) Then
2 2 X∞
fX (λ) σW 1 σW 2
k −ikλ
Φ(λ) = = 2 = 2 θ e
fV (λ) σ |1 − θe−iλ |2 σ k=0
2 ∞
σW 1 X
= θ|k| e−ikλ
σ 1 − θ k=−∞
2 2
102
for all λ ∈ [−π, π].
The best linear filtration of Xs from {Vt , t ∈ Z} is
2 ∞
σW 1 X
Xs = 2
b θ|k| Vs−k . (106)
σ 1 − θ2 k=−∞
Remark 19. It follows from (105) that fV has the same form as the spectral density
of an ARMA(1, 1) sequence. The mixture of the AR(1) sequence {Xt , t ∈ Z} and the
white noise {Vt , t ∈ Z} has the same covariance structure as the stationary sequence
{Zt , t ∈ Z} that is modeled to be
Zt − ϕZt−1 = Ut − θUt−1 , t ∈ Z,
where ϕ 6= 0, |ϕ| < 1, |θ| < 1 and {Ut , t ∈ Z} is a white noise with the variance
σ 2 = ϕθ σY2 . Parameter θ can be determined as given above.
σ2
Remark 20. Function Φ is the transfer function of the linear filter { σW2 1
1−θ2
θ|k| , k ∈ Z}.
where Xe1 is the linear projection of X1 onto Hilbert space H2k = H{X2 , . . . , Xk } and
ek+1 is the linear projection of Xk+1 onto H k .
X 2
103
residuals X1 − X e1 and Xk+1 − Xek+1 of the best linear approximation of the variables X1
and Xk+1 by random variables X2 , . . . , Xk .
The stationarity of the sequence {Xt , t ∈ Z} implies that for h ∈ N, corr(X1 −
X1 , Xk+1 − X
e ek+1 ) = corr(Xh − Xeh , Xk+h − Xek+h ), where X
eh , X
ek+h are linear projec-
tions of random variables Xh , Xk+h onto the Hilbert space H{Xh+1 , . . . , Xh+k−1 }. There-
fore, α(k) is also the correlation coefficient of Xh and Xh+k after the linear dependence
Xh+1 , . . . , Xh+k−1 was eliminated.
Xt = ϕXt−1 + Yt ,
Due to the causality, X ek+1 = PH k (Xk+1 ) = ϕXk and Xk+1 − X ek+1 = Yk+1 ⊥ H k .
2 2
Further, it follows from causality that Yk+1 ⊥ X1 , thus E(X1 − X1 )(Xk+1 − Xk+1 ) =
e e
E(X1 − Xe1 )Yk+1 = 0, from which we conclude that α(k) = 0 for k > 1.
Remark 21. In the same manner, for a causal AR(p) sequence we could prove that the
partial autocorrelation α(k) = 0 for k > p.
Xt = Yt + bYt−1 ,
where |b| < 1 and Yt ∼ WN(0, σ 2 ). We know that in this case RX (0) = (1 + b2 )σ 2 ,
RX (1) = bσ 2 = RX (−1) and RX (k) = 0 for |k| > 1.
We compute the partial autocorrelations.
b
First, α(1) = rX (1) = 1+b 2 . Further, α(2) = corr(X1 − X1 , X3 − X3 ). To determine
e e
X e1 = PH 2 X1 = cX2 and X1 − X
e1 , notice that X e1 ⊥ X e1 . Thus E(X1 − cX2 )X2 = 0,
2
RX (1) b e1 = b 2 X2 . Quite analogously we get X b
and c = RX (0)
= 1+b2
.
We have X 1+b
e3 = X,
1+b2 2
i.e., X
e1 = X
e3 . We have
b b
α(2) = corr X1 − X2 , X3 − X2 .
1 + b2 1 + b2
104
Obviously,
b b 2b b2
E X1 − X 2 X 3 − X 2 = RX (2) − RX (1) + RX (0)
1 + b2 1 + b2 1 + b2 (1 + b2 )2
σ 2 b2
=− ,
1 + b2
similarly,
b 2 b2 2b σ 2 (1 + b2 + b4 )
E X1 − X 2 = RX (0) + RX (0) − RX (1) = ,
1 + b2 (1 + b2 )2 1 + b2 1 + b2
and combining these results we conclude that
b2
α(2) = − .
1 + b2 + b4
Generally, it can be shown that
(−b)k (1 − b2 )
α(k) = − , k ≥ 1.
1 − b2(k+1)
Definition 42 (An alternative definition of the partial correlation function). Let {Xt , t ∈
Z} be a centered stationary sequence, let PH1k (Xk+1 ) be the best linear prediction of Xk+1
on the basis of X1 , . . . , Xk . If H1k = H{X1 , . . . , Xk }, and PH1k (Xk+1 ) = ϕ1 Xk +· · ·+ϕk X1 ,
then the partial autocorrelation function at lag k is defined to be α(k) = ϕk .
Theorem 57. Let {Xt , t ∈ Z} be a real-valued sequence with the autocovariance function
R, such that R(0) > 0, R(t) → 0, as t → ∞. Then the both definitions of the partial
autocorrelation function are equivalent and it holds
α(1) = r(1),
1 r(1) . . . r(k − 2) r(1)
r(1)
1 . . . r(k − 3) r(2)
.. .. .. .. ..
. . . . .
r(k − 1) r(k − 2) . . . r(1) r(k)
α(k) = k > 1, (107)
1 r(1) . . . r(k − 1)
r(1)
1 . . . r(k − 2)
.. .. .. ..
. . . .
r(k − 1) r(k − 2) . . . 1
105
Proof. Denote H1k = H{X1 , . . . , Xk }, H2k = H{X2 , . . . , Xk }, X
bk+1 = PH k (Xk+1 ), X
1
e1 =
PH k (X1 ), X
2
ek+1 = PH k (Xk+1 ).
2
e1 + (X1 − X
Since X1 = X e1 ∈ H2k , X1 − X
e1 ), where X ek+1 ∈ H2k ,
e1 ⊥ H2k , and X
ek+1 (X1 − X
EX e1 ) = 0. (108)
Consider
bk+1 = ϕ1 Xk + · · · + ϕk X1 .
X
Then
bk+1 = ϕ1 Xk + · · · + ϕk−1 X2 + ϕk X
X e1 + ϕk (X1 − X
e1 ) ,
and the random variables in the brackets are mutually orthogonal. Then ϕk (X1 − X e1 )
can be considered to be the projection of X e = H{X1 − X
bk+1 onto the Hilbert space H e1 } ⊂
k
H1 . It is also the projection of Xk+1 onto the space He and
E Xk+1 − ϕk (X1 − X e 1 ) X1 − X
e1 = 0
e1 2 .
= EXk+1 X1 − X e1 − ϕk E X1 − X
Since E(X1 − Xe1 )2 = E(Xk+1 − X ek+1 )2 , which holds from the fact that for a stationary
sequence, var (X2 , . . . , Xk ) = var (Xk , . . . , X2 ), we get from (109) that
ϕk = corr(X1 − X
b1 , Xk+1 − X
bk+1 ) = α(k).
106
Dividing each equation by R(0), we get the system of equations
ϕ1 + ϕ2 r(1) + · · · + ϕk r(k − 1) = r(1)
ϕ1 r(1) + ϕ2 + · · · + ϕk r(k − 2) = r(2)
..
.
ϕ1 r(k − 1) + ϕ2 r(k − 2) + · · · + ϕk = r(k),
or, in the matrix form,
1 r(1) . . . r(k − 1) ϕ1 r(1)
r(1) 1 . . . r(k − 2) ϕ2 r(2)
= ,
.. .. ... .. .. ..
. . . . .
r(k − 1) r(k − 2) ... 1 ϕk r(k)
The ratio of determinants (107) gives the solution for ϕk .
Example 49. Consider again the causal AR(1) process
Xt = ϕXt−1 + Yt ,
where |ϕ| < 1 and Yt ∼ WN(0, σ 2 ). Let us compute the partial autocorrelation function
according to formula (107). We get
1
ϕ . . . ϕk−2 ϕ
ϕ
1 . . . ϕk−3 ϕ2
.. .. .. .. ..
. . . . .
k−1
ϕk−2 . . . ϕ ϕk
ϕ
α(k) =
k−1
k > 1, (110)
1
ϕ . . . ϕ
ϕ k−2
1 . . . ϕ
.. .. .. ..
. . . .
k−1 k−2
ϕ ϕ ... 1
We can see that the last column of the determinant in the numerator of (110) is
obtained by multiplication of the first column, thus, this determinant equals zero.
107
1.0
1.0
0.5
0.5
α(t)
r(t)
0.0
0.0
−0.5
−0.5
−1.0
−1.0
0 2 4 6 8 10 1 3 5 7 9
t t
Figure 8: Autocorrelation (left) and partial autocorrelation function (right) of the AR(1) sequence
Xt = −0,8 Xt−1 + Yt
1.0
1.0
0.5
0.5
α(t)
r(t)
0.0
0.0
−0.5
−0.5
−1.0
−1.0
0 2 4 6 8 10 1 3 5 7 9
t t
Figure 9: Autocorrelation function (left) and partial autocorrelation function (right) of the MA(1)
sequence Xt = Yt + 0,8 Yt−1
108
A common estimator of the mean value is the sample mean defined by
n
1X
Xn = Xt .
n t=1
Xt = µ + X
et , t = 1, . . . , n, (111)
bn = (10n Γ−1
µ −1 0 −1
n 1 n ) 1 n Γ n Xn , (112)
where
R(0) . . . R(n − 1)
R(1)
R(1) . . . R(n − 2)
R(0)
Γn = var Xn = ,
.. .. ... ..
. . .
R(n − 1) R(n − 2) . . . R(0)
is regular matrix according to Theorem 53, 1n = (1, . . . , 1)0 and Xn = (X1 , . . . , Xn )0 .
The variance of this estimator is
bn = (10n Γ−1
var µ −1
n 1n ) . (113)
109
14.2 Estimators of the autocovariance and the autocorrelation
function
The best linear estimator (112) assumes knowledge of the autocovariance function R.
Similarly, knowledge of the autocovariance function is assumed in prediction problems.
For estimators we usually work with the sample autocovariance function
n−k
1X
R(k)
b = Xt − X n Xt+k − X n , k = 0, 1, . . . , n − 1 (114)
n t=1
and R(k)
b = R(−k)
b pro k < 0. Let us remark that the sample autocovatriance function
is not an unbiased estimator of the autocovariance function, i.e., ER(k)
b 6= R(k).
The matrix
R(0)
b R(1)
b b − 1)
. . . R(n
R(1)
b R(0)
b b − 2)
. . . R(n
Γn =
b .
. .
. . . .
.
. . . .
b − 1) R(n
R(n b − 2) . . . R(0)
b
2
= n1 nt=1 Xt − X n > 0.
P
if R(0)
b
Asymptotic behaviour of the sample autocorrelations is described in the following
theorem.
Theorem 58. Let {Xt , t ∈ Z} be a random sequence
∞
X
Xt − µ = αj Yt−j ,
j=−∞
110
where Yt , t ∈ Z, are independent identically distributed random variables with zero mean
and finite positive variance σ 2 , and let E|Yt |4 < ∞ and ∞
P
j=−∞ |αj | < ∞.
Let r(k), k ∈ Z, be the autocorrelation function of the sequence {Xt , t ∈ Z} and rb(k)
be the sample autocorrelation at lag k, based on X1 , . . . , Xn . √
Then for each h = 1, 2, . . . , as n → ∞, the random vector n b r(h) − r(h) converges
in distribution to a random vector with normal distribution Nh (0, W), where
i, j = 1, . . . , h.
Proof. Brockwell and Davis (1991), Theorem 7.2.1.
Remark 22. Formula (116) is called the Bartlett formula. From the assertion of the
theorem we especially get for any i
√ D
n rb(i) − r(i) −→ N (0, wii ), n → ∞,
Xt = ϕXt−1 + Yt , t ∈ Z,
where |ϕ| < 1 and Yt , t ∈ Z are i. i. d. with zero mean, finite non-zero variance σ 2 and
with finite moments E|Yt |4 . Then r(k) = ϕ|k| , thus r(1) = ϕ and according to Theorem
58, √ D
n rb(1) − ϕ) −→ N (0, w11 ), n → ∞,
where
∞ ∞
X 2 X 2
ϕk−1 − ϕk+1
w11 = r(k + 1) + r(k − 1) − 2r(1)r(k) =
k=1 k=1
∞
X
= (1 − ϕ2 )2 ϕ2(k−1) = 1 − ϕ2 .
k=1
If we denote rb(1) := ϕ,
b we can write
√ D
b − ϕ −→ N (0, 1 − ϕ2 ),
n ϕ n→∞
111
or
√ b−ϕ D
ϕ
np −→ N (0, 1), n → ∞.
1 − ϕ2
P
b −→ ϕ (see, e.g. Brockwell and Davis, 1991, Chap. 6) and
From here it follows that ϕ
also
√ ϕ b−ϕ D
np −→ N (0, 1), n → ∞.
1−ϕb2
The asymptotic 95% confidence interval for ϕ is
r r !
1−ϕ b2 1−ϕb2
b − 1, 96
ϕ , ϕ
b + 1, 96 .
n n
Example 51. Let us suppose that the sequence {Xt , t ∈ Z} is a strict white noise.
Then r(0) = 1 and r(t) = 0 for t 6= 0. The elements of W are
X∞
wii = r(k − i)2 = 1,
k=1
∞
X
wij = r(k − i)r(k − j) = 0, i 6= j,
k=1
i. e., W = I is the identity matrix. It means that for large n, the vector b r(h) =
0
rb(1), . . . , rb(h) has approximately normal distribution Nh (0, n1 I). For large n, there-
fore the random variables rb(1), . . . , rb(h) are approximately independent and identically
distributed with zero mean and variance n1 . In the plot of sample autocorrelations rb(k)
for k = 1, . . . , approximately 95% of them should be in the interval −1, 96 √1n , 1, 96 √1n .
The sample partial autocorrelation function is defined to be α b(k) = ϕbk , where ϕbk
can be obtained e.g. from (107), where we insert the sample autocorrelation
Pn coefficients.
1
The determinant in the denominator of (107) will be non-zero if n t=1 (Xt − X n )2 > 0.
Example 52. In Figure 12 the plot of the Wolf index of the annual number of the
Sunspots (1700-1987)3 is displayed. In Figures 13 and 14 we can see the sample auto-
covariance function and the sample partial autocorrelation function, respectively. The
data was identified (after centering) with the autoregressive AR(9) process a(B)Xt = Yt
where
a(z) = 1 − 1.182z + 0.4248z 2 + 0.1619z 3 − 0.1687z 4
+ 0.1156z 5 − 0.02689z 6 − 0.005769z 7
+ 0.02251z 8 − 0.2062z 9
Yt ∼ WN(0, σ 2 ), σ 2 = 219.58.
3
Source: WDC-SILSO, Royal Observatory of Belgium, Brussels
112
3
2
1
0
−1
−2
−3
0.0
−0.5
−1.0
0 2 4 6 8 10 12 14 16 18 20
113
Annual means of Wolf Sunspot numbers, 1700−1987
200
180
160
140
120
100
80
60
40
20
0
1700 1750 1800 1850 1900 1950 2000
0.5
−0.5
0 5 10 15 20
Lag
114
Wolf numbers, sample PACF
1
0.8
Sample Partial Autocorrelations
0.6
0.4
0.2
−0.2
−0.4
−0.6
−0.8
0 5 10 15 20
Lag
115
If we replace the values of R(k) in Γ and γ by their sample counterparts
n−k
1X
R(k)
b = Xt − X n Xt+k − X n ,
n t=1
116
Proof. Brockwell and Davis (1991), Theorem 8.1.1.
Least squares method
Consider again sequence (117) and suppose X1 , . . . , Xn to be known. The least square
estimators of parameters ϕ1 , . . . , ϕp are obtained by minimizing the sum of squares
n
X
min (Xt − ϕ1 Xt−1 − · · · − ϕp Xt−p )2 .
ϕ1 ,...,ϕp
t=p+1
Xt = ϕ0 X t−1 + Yt ,
It can be shown that estimators ϕe and σ e2 have the same asymptotic properties as
the moment estimators. In particular, as n → ∞
√
ϕ − ϕ) −→ Np 0, σ 2 Γ−1
D
n (e
117
and
P
e2 −→ σ 2 ,
σ
where Γ is the same matrix as in Theorem 59 (Brockwell and Davis, 1991, Chap. 8).
Maximum likelihood estimators
The maximum likelihood method assumes that the distribution of random variables
from which we are intended to construct estimators of parameters under consideration
is known.
Consider first a sequence {Xt , t ∈ Z}, that satisfies model Xt = ϕXt−1 + Yt , where Yt
are i. i. d. random variables with distribution N (0, σ 2 ). We assume causality, i.e., |ϕ| < 1.
Let us have observations X1 , . . . , Xn . From the causality and independence assumption
it follows that random variables X1 and (Y2 , . . . , Yn ) are jointly independent with the
density
n 1 X n o
2 −(n−1)/2 2
f (x1 , y2 , . . . , yn ) = f1 (x1 )f2 (y2 , . . . , yn ) = f1 (x1 ) 2πσ exp − 2 y .
2σ t=2 t
From the causality it also follows that random variable X1 has the distribution N (0, τ 2 ),
σ2
where τ 2 = 1−ϕ 2 . By the transformation density theorem we easily obtain that the joint
density of X1 , . . . , Xn is
n 1 n
2 −n/2
p X o
2 2
(xt − ϕxt−1 )2 . (125)
f (x1 , . . . , xn ) = 2πσ 1 − ϕ exp − 2 (1−ϕ )x1 +
2
2σ t=2
The likelihood function L(ϕ, σ 2 ) is of the same form as (125). Maximum likelihood
estimates are then ϕ̄, σ̄ 2 , that maximize L(ϕ, σ 2 ) on a given parametric space.
These are the unconditional maximum likelihood estimators and even in this simple
model the task to maximize the likelihood function leads to a non-linear optimization
problem.
More simple solution is provided by using the conditional maximum likelihood method.
We can easily realize that the conditional density of X2 , . . . , Xn given fixed X1 = x1
in our AR(1) model is
n 1 X n o
2 −(n−1)/2 2
f (x2 , . . . , xn |x1 ) = 2πσ exp − 2 (xt − ϕxt−1 ) . (126)
2σ t=2
The conditional maximum likelihood estimators are obtained by maximizing function
(126) with respect to ϕ and σ 2 .
Similarly, if we consider a general causal AR(p) sequence (117), where Yt are i. i. d. with
distribution N (0, σ 2 ), we can prove that the conditional density of (Xp+1 , . . . , Xn )0 given
X1 = x1 , . . . , Xp = xp is
( n
)
−(n−p)/2 1 X 2
f (xp+1 , . . . , xn |x1 , . . . , xp ) = 2πσ 2 (xt − ϕ0 xt−1 ) ,
exp − 2
2σ t=p+1
118
where xt−1 = (xt−1 , . . . , xt−p )0 , and ϕ = (ϕ1 , . . . , ϕp )0 .
By maximization of this function with respect to ϕ1 , . . . , ϕp , σ 2 we get the conditional
maximum likelihood estimators. It can be easily shown that under normality, these
estimators are numerically equivalent to the least squares estimators.
Xt = Yt + θ1 Yt−1 + · · · + θq Yt−q , t ∈ Z,
= σ 2 1 + θ12 + · · · + θq2 ,
R(0)
b (128)
R(1)
b = σ 2 (θ1 + θ1 θ2 + · · · + θq−1 θq ) ,
..
.
R(q) = σ 2 θq .
b
RX (1) θ
r(1) = = .
RX (0) 1 + θ2
119
It can be shown that in this case, |r(1)| ≤ 21 for all real values of θ. Consequently, solving
1
the last equation with respect to θ, we get either the twofold root θ = 2r(1) or two
real-valued roots p
1 ± 1 − 4r2 (1)
θ1,2 = .
2r(1)
The root with the positive sign is in absolute value larger than 1, while those with the
negative sign is in absolute value less than 1, which corresponds to the invertible process.
The moment estimators of θ a σ 2 now can be obtained from equations (128) that can
be rewritten into the form
R(0)
b = σ 2 (1 + θ2 ),
θ
rb(1) = .
1 + θ2
For θ we have two solutions
p
1± 1 − 4b
r2 (1)
θb1,2 = ,
r(1)
2b
r(1)| ≤ 21 .
that take real values if |b
r(1)| < 12 , the moment estimators are
Provided that the process is invertible and |b
p
1− 1 − 4b
r2 (1)
θb = ,
r(1)
2b
R(0)
b
b2 =
σ .
1 + θb2
r(1)| = 12 , we take
If |b
1 rb(1)
θb = = ,
r(1)
2b |b
r(1)|
1b
b2
σ = R(0).
2
r(1)| > 12 the real-valued solution of (128) does not exist. In such a case we use
For |b
the same estimates as given for |br(1)| = 12 .
120
where Yt ∼ WN(0, σ 2 ), ϕ1 , . . . , ϕp , θ1 , . . . , θq , σ 2 are unknown parameters and X1 , . . . , Xn
are given observations, we can proceed as follows:
First we use an analogy of the Yule-Walker equations for the autocovariances RX (k),
k = q + 1, . . . , q + p. We get equations
RX (k) = ϕ1 RX (k − 1) + · · · + ϕp RX (k − p)
where βbj = −ϕbj and RbX (k) are sample autocovariances computed from X1 , . . . , Xn .
The moment estimators are under some assumptions consistent and asymptotically
normal, but they are not too stable and must be handled carefully. Nevertheless, they
can serve as preliminary estimates in more advanced estimation procedures.
Two-step least squares estimators in MA and ARMA models
Consider a causal and invertible ARMA(p, q) process
121
Under the invertibility assumptions, the process has an AR(∞) representation
∞
X ∞
X
Yt = dj Xt−j = Xt + dj Xt−j
j=0 j=1
Xt = ϕ1 Xt−1 +· · ·+ϕp Xt−p +θ1 Yb̃ t−1 +· · ·+θq Yb̃ t−q +Yt , t = max(p, q, m)+1, . . . , n
16 Periodogram
Definition 43. Let X1 , . . . , Xn be observations of a random sequence {Xt , t ∈ Z}. The
periodogram of X1 , . . . , Xn is defined by
n
1 X 2
−itλ
In (λ) = Xt e , λ ∈ [−π, π]. (129)
2πn t=1
122
For a real-valued sequence, the periodogram can be also expressed by
n n n−1 min(n,n−k)
1 XX −i(t−s)λ 1 X X
In (λ) = Xt X s e = Xs Xs+k e−ikλ
2πn t=1 s=1 2πn k=−n+1
s=max(1,1−k)
n−1
1 X
= e−ikλ Ck , (132)
2π k=−n+1
where
n−k
1X
Ck = Xt Xt+k , k≥0 (133)
n t=1
= C−k , k < 0.
Under the assumptions of the theorem and according to Theorem 22, the spectral density
of the sequence {Xt , t ∈ Z} exists and is given by
∞
1 X −ikλ
f (λ) = e R(k).
2π k=−∞
2πj
Usually, the periodogram is computed at points λj = n
, λj ∈ [−π, π] (Fourier
frequencies).
123
Theorem 61. Let {Xt , t ∈ Z} be a Gaussian random sequence of i. i. d. random vari-
ables with zero mean and variance σ 2 , 0 < σ 2 < ∞. Put n = 2m + 1 and consider
the periodogram In computed from X1 , . . . , Xn at frequencies λj = 2πj n
, j = 1, . . . , m.
Then random variables In (λ1 ), . . . , In (λm ) are independent and identically distributed as
σ2 2
4π
χ (2), where χ2 (2) denotes the χ2 distribution with two degrees of freedom.
Proof. Consider random vector J = (A(λ1 ), . . . , A(λm ), B(λ1 ), . . . , B(λm ))0 , where the
variables A(λj ), B(λj ) are defined in (131). This vector has jointly normal distribution
since it is a linear transformation of the random vector (X1 , . . . , Xn )0 . Further we prove
that all the components of the vector J are mutually uncorrelated (and thus, indepen-
dent), and identically distributed with zero mean and variance σ 2 . For this we use the
following identities for trigonometric functions
n
X n
cos2 (tλr ) = , r = 1, . . . , m,
t=1
2
n
X n
sin2 (tλr ) = , r = 1, . . . , m,
t=1
2
n
X
sin(tλr ) cos(tλs ) = 0, r, s = 1, . . . , m,
t=1
n
X
sin(tλr ) sin(tλs ) = 0, r, s = 1, . . . , m, r 6= s,
t=1
Xn
cos(tλr ) cos(tλs ) = 0, r, s = 1, . . . , m, r 6= s
t=1
from that the result follows using simple computations. Particularly, we get for any
r = 1, . . . , m that A(λr ) ∼ N (0, σ 2 ), B(λr ) ∼ N (0, σ 2 ), thus
A2 (λr ) + B 2 (λr ) 4π
2
= 2 In (λr ) ∼ χ2 (2).
σ σ
Remark 24. From the assumption that {Xt , t ∈ Z} is a Gaussian random sequence of
i. i. d. random variables with zero mean and variance σ 2 we can easily conclude that the
spectral density of this sequence is
σ2
, λ ∈ [−π, π]
f (λ) =
2π
since {Xt , t ∈ Z} is the white noise. From Theorem 61 and properties of the χ2 distri-
bution we have for r = 1, . . . , m,
σ2 σ2
EIn (λr ) = 2 = = f (λr ),
4π 2π
124
σ4
var In (λr ) = 4 = f 2 (λr ).
16π 2
We can see that the variance of the periodogram in this case does not depend on n. More
generally, it can be proved that for any Gaussian stationary centered sequence with a
continuous spectral density f it holds that
lim var In (λ) = f 2 (λ), λ 6= 0, λ ∈ (−π, π)
n→∞
= 2f 2 (λ), λ = 0, λ = ±π
(Anděl, 1976, p. 103, Theorem 10). We see that the variance of the periodogram does
not converge to zero with increasing n. It means that the periodogram is not consistent
estimator of the spectral density.
The periodogram was originally proposed to detect hidden periodic components in a
time series. To demonstrate it, let us consider a sequence {Xt , t ∈ Z} such that
Xt = αeitλ0 + Yt , Yt ∼ WN(0, σ 2 )
where α is a nonzero constant and λ0 ∈ [−π, π]. Then
n n n
1 X 1 X −itλ 1 X −it(λ−λ0 )
√ Xt e−itλ = √ Yt e +√ αe (136)
n t=1 n t=1 n t=1
and from here we can see that if λ = λ0 , the nonrandom part of the periodogram
represented by the second sum on the right-hand side of (136) tends to ∞ as n → ∞
while for λ 6= λ0 is negligible. It means that if there is a single periodic component at
frequency λ0 the periodogram takes in it the largest value. Since usually the frequency
λ0 is unknown, it is reasonable to consider maximum of the values of the periodogram
at the Fourier frequencies.
Theorem 62. Let {Xt , t ∈ Z} be a Gaussian random sequence of i. i. d. random variables
with zero mean and variance σ 2 . Let n = 2m + 1 and In (λr ) be the periodogram computed
from X1 , . . . , Xn at the frequencies λr = 2πr
n
, r = 1, . . . , m. Then the statistic
max1≤r≤m In (λr )
W = (137)
In (λ1 ) + · · · + In (λm )
has density
[1/x]
m−1
X
j−1
g(x) = m(m − 1) (−1) (1 − jx)m−2 , 0<x<1
j=1
j−1
and
[1/x]
X m k
P (W > x) = 1 − (−1) (1 − kx)m−1 , 0 < x < 1. (138)
k=0
k
125
7 Sunspots: Periodogram
x 10
2
1.8
1.6
1.4
1.2
0.8
0.6
0.4
0.2
0
0 0.1 0.2 0.3 0.4 0.5
cycles/year
Figure 15: Periodogram of the Sunspots, Wolf index. The maximum corresponds to the
cycle with period 11.0769 years
126
is an asymptotically unbiased and consistent estimator of
Z π
f (λ)K(λ)dλ.
/π
If we expand function K into the Fourier series with the Fourier coefficients wk and
express the periodogram by using formulas (132) and (133), we get
Z πn−1
1 X −ikλ
fbn (λ0 ) = e Ck K(λ − λ0 )dλ
−π 2π k=−n+1
n−1 Z π
1 X
= Ck e−ikλ K(λ − λ0 )dλ
2π k=−n+1 −π
n−1 Z π ∞
1 X X
= Ck e−ikλ eij(λ−λ0 ) wj dλ
2π k=−n+1 −π j=−∞
n−1 ∞ Z π
1 X X
−ijλ0
= Ck e wj e(ijλ−ikλ) dλ
2π k=−n+1 j=−∞ −π
n−1
X n−1
X
−ikλ0
= e Ck w k = C0 w 0 + 2 Ck wk cos(kλ0 ). (139)
k=−n+1 k=1
One of the commonly used kernel function is so-called Parzen window, which is
usually presented by coefficients
3
1 − 6 Mk 2 + 6 |k| , |k| ≤ M2
M
3
wk = 2 1 − |k| , M
< |k| ≤ M
M 2
0, |k| > M
where M is a truncation point that depends on n ( n6 < M < n5 ). For more information
on the choice of K, respectively of wk , see, e.g., Anděl, 1976, or Brockwell and Davis,
Chap. 10.
127
Sunspots: Spectral Density Estimate, Parzen window
45
40
35
Power/frequency (dB/rad/sample)
30
25
20
15
10
5
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Normalized Frequency (×π rad/sample)
Figure 16: Sunspots, Wolf index. Estimator of the spectral density, Parzen kernel
References
[1] Anděl, J. : Statistická analýza časových řad. SNTL, Praha 1976 (In Czech)
[2] Anděl, J.: Základy matematické statistiky. MATFYZPRESS, Praha 2007 (In Czech)
[3] Bosq, D., Nguyen, Hung T. : A Course in Stochastic Processes. Stochastic Models
and Statistical Inference. Kluwer, Dordrecht 1996
[4] Brockwell, P. J., Davis, R. A. : Time Series: Theory and Methods. Springer-Verlag,
New York 1991
[6] Prášková, Z.: Základy náhodných procesů II, Karolinum, Praha 2016. 2nd, extended
edition (In Czech)
[7] Prášková, Z., Lachout, P. : Základy náhodných procesů I, Matfyzpress, Praha 2012.
2nd, extended edition (In Czech)
[8] Priestley, M. B. : Spectral Analysis and Time Series, Vol. I. Academic Press, London
1981
128
[9] Rao, R. C. : Lineárnı́ metody statistické indukce a jejich aplikace. Academia, Praha
1978. Czech translation from Rao, R. C. : Linear Statistical Inference and its Appli-
cations, Wiley, New York, 1973
[11] Rudin, W. : Analýza v reálném a komplexnı́m oboru. Academia, Praha. 2003. Czech
translation from Rudin, W. : Real and Complex Analysis., 3rd Ed. McGraw Hill,
New York, 1987
[12] Shumway, R. H., Stoffer, D. S. : Time Series Analysis and Its Applications. Springer-
Verlag, New York 2001
129
List of symbols
130