Machine Learning 2
Unit-II
Time Series Models for Linear Stationary Processes
Dr. Suresh, R
Assistant Professor
Department of Statistics
Bangalore University, Bengaluru-560 056
Contents

1 Time Series (t.s.) as Discrete Parameter Stochastic Process, Definition of Strict and Weak Stationarity of a t.s., Gaussian t.s., Ergodicity
  1.1 Introduction
      1.1.1 Describing (Properties of) Stochastic Processes
  1.2 Stationary Processes
      1.2.1 Examples of Some Processes
      1.2.2 Relationship between Strong and Weak Stationarity
  1.3 Ergodicity
  3.1 Introduction
  3.2 Alternative Form of GLP
  3.3 Properties of GLP
  3.4 Stationarity and Invertibility Conditions of a General Linear Process
  4.7.5 PACF of AR(2) Process
Syllabus
References
[1] Anderson, T. W., The Statistical Analysis of Time Series, Wiley, New York, 1971.
[2] Box, G. E. P., Jenkins, G. M., Reinsel, G. C. and Ljung, G. M., Time Series Analysis: Forecasting and Control, 5/e, Wiley, 2016.
[3] Brockwell, P. J. and Davis, R. A., Introduction to Time Series and Forecasting, 3/e, Springer, Switzerland, 2016.
[4] Brockwell, P. J. and Davis, R. A., Time Series: Theory and Methods, 2/e, Springer, New York, 2009.
[5] Chatfield, C., The Analysis of Time Series: Theory and Practice, 5/e, Chapman and Hall, London, 1996.
[6] Chatfield, C. and Xing, H., The Analysis of Time Series: An Introduction with R, 7/e, CRC Press, 2019.
[7] Cryer, J. D. and Chan, K. S., Time Series Analysis with Applications in R, 2/e, Springer, New York, 2008.
1 Time Series (t.s.) as Discrete Parameter Stochastic Process, Definition of Strict and Weak Stationarity of a t.s., Gaussian t.s., Ergodicity
1.1 Introduction
Definition 1.1 (Time Series (t.s.) as Discrete Parameter Stochastic Process). A time series is a discrete time stochastic process X = {Xt ; t ∈ T}, where t is a time parameter with domain T = {0, ±1, ±2, . . .}.
Note: Let (Ω, A, P) be a given probability space and let T = {0, ±1, ±2, . . .}. A sequence X = {Xt ; t ∈ T} is said to be a discrete time stochastic process if it is a mapping from Ω to R^T (which is equal to ∏_{t∈T} R_t, where for each t, R_t is a copy of the real line R) such that for any integer m ≥ 1, any t1, t2, . . . , tm ∈ T, and all x1, x2, . . . , xm ∈ R, the event {(X_{t1} ≤ x1) ∩ (X_{t2} ≤ x2) ∩ · · · ∩ (X_{tm} ≤ xm)} is a member of the sigma-field A.
Remark: In the case of a stochastic process we have a large (countable or uncountable) number of distribution functions, and in most
Definition 1.3 (Autocovariance Function of a Stochastic Pro-
cess (ACVF)). The Autocovariance Function (ACVF) γt1 ,t2 is de-
fined to be the covariance of Xt1 and Xt2 ,(covariances between ran-
dom variables of the same stochastic process hence known as autoco-
variance) namely
γt1 ,t2 = E{(Xt1 − µt1 )(Xt2 − µt2 )}. (3)
Clearly, the variance function is a special case of the ACVF when
t1 = t2 = t(say), which implies Cov(Xt , Xt ) = V ar(Xt ) = γ0
Remark: Higher moments of a stochastic process may be defined
in an obvious way, but are rarely used in practice.
Note: The size of an autocovariance coefficient depends on the
units in which Xt is measured. Thus, for interpretative purposes, it
is helpful to standardize the ACVF, to produce a function called the
autocorrelation function (ACF). It is also called Serial Correlation.
Definition 1.4 (Autocorrelation Function (ACF)). The autocorrelation function (ACF) ρ_{t1,t2} is defined to be the correlation between X_{t1}
been removed. In other words, the properties of one section of the
data are much like those of any other section.
Loosely speaking, a time series {Xt , t = 0, ±1, ±2, . . .} is said to
be stationary if it has statistical properties similar to those of the
“time-shifted” series {Xt+k , t = 0, ±1, ±2, . . .}, for each integer k.
Strictly speaking, time series data very often violate the stationarity property. However, the phrase is often used for time
series data meaning that they exhibit characteristics that suggest a
stationary model can sensibly be fitted.
Much of the probability theory of time series is concerned with
stationary time series, and for this reason time series analysis often
requires one to transform a non-stationary series into a stationary
one so as to use this theory. For example, it may be of interest to
remove the trend and seasonal variation from a set of data and then
try to model the variation in the residuals by means of a stationary
stochastic process. However, it is also worth stressing that the non-
stationary components, such as the trend, may be of more interest
than the stationary residuals.
Stationarity is an essential concept in time series analysis. Gen-
Notes:
1. Strict Stationarity is also known as Strong Stationarity or Sta-
tionarity in Distribution.
2. In verbal terms, a stationary process is one whose probabilistic laws remain unchanged under shifts of time. In other words, the utility of the observations x1, x2, . . . , xn in drawing inference on the time series is the same as that obtained from data collected after a time shift, say x_{1+k}, x_{2+k}, . . . , x_{n+k} (k being any positive integer). Interestingly, a good number of time series in several areas of research, such as biology, ecology and economics, exhibit this feature.
3. Moments, (even mean) of a strictly stationary time series need
not exist.
Example: A process consisting of a sequence of i.i.d. Cauchy
random variables.
4. An iid sequence is strictly stationary. Converse need not be
true.
and
ρ_{t1,t2} = ρ_{t1+k, t2+k}.   (6)
Letting t1 = t − k and t2 = t, we get the autocovariance coefficient at lag k,
γ_{t1,t2} = γ_{t−k,t} = γ_{t,t+k} = γ_k,   (7)
and the autocorrelation coefficient at lag k,
ρ_{t1,t2} = ρ_{t−k,t} = ρ_{t,t+k} = ρ_k = γ_k / γ_0.   (8)
Thus, for a strictly stationary process with first two moments
finite, the covariance and the correlation between Xt and Xt+k
depend only on the time difference k.
Remark: Strict stationarity is too stringent for verification in
practical applications. It has to be weakened to a level that will be
useful for statistical inference. That is, in practice it is often use-
ful to define stationarity in a less restricted way than that described
above. Such a concept is known as Weak/Second-Order/Covariance
Stationarity, which is defined in terms of the moments of the process.
Definition 1.5 (Weak Stationarity of a Time Series). A Stochas-
tic process is called second-order stationary (or weakly stationary or
Covariance stationary) if its mean is constant and its ACVF depends
only on the lag, so that
E(Xt ) = µ ∀ t ∈ T (9)
and
Cov(Xt , Xt+k ) = γk ∀ k, and t ∈ T, (10)
or
Cov(Xt+s , Xt+r ) = γ|r−s| ∀ r, s, and t ∈ T. (11)
Note:
1. By letting k = 0, we note that the form of a stationary ACVF implies that the variance, as well as the mean, is constant. Further, the autocovariance and autocorrelation are functions of the time difference alone.
2. The definition also implies that both the variance and the mean
must be finite.
3. Sometimes, the terms stationary in the wide sense or covariance stationary are also used to describe a second-order weakly stationary process.
4. It follows from the above definitions (strict and weak) that a strictly stationary process with the first two moments finite is also a second-order weakly or covariance stationary process. Yet, a strictly stationary process may not have finite moments and hence may not be covariance stationary (IID Cauchy random variables are strictly stationary but not weakly stationary because no moments exist).
5. No requirements are placed on the higher order moments.
Remark:
1. This weaker definition of stationarity will generally be used from
now on, as many of the properties of stationary processes depend
only on the structure of the process as specified by its first and second moments (for instance, the Normal process).
2. The assumption of weak stationarity can be checked empirically
provided that a sufficient number of historical data(time series)
are available. For example, one can divide the data into sub-
samples and check the consistency of the results obtained across
the subsamples.
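As a rough illustration of the subsample check described in Remark 2 (this sketch is not part of the source notes; the helper name check_subsample_stability, the simulated series and all numerical values are illustrative assumptions), one can split a series into blocks and compare block means, variances and lag-1 autocorrelations:

```python
import numpy as np

rng = np.random.default_rng(0)

def check_subsample_stability(x, n_blocks=4):
    """Split a series into consecutive blocks and report, per block, the sample
    mean, variance and lag-1 autocorrelation; roughly similar values across
    blocks are consistent with weak stationarity."""
    blocks = np.array_split(np.asarray(x, dtype=float), n_blocks)
    stats = []
    for b in blocks:
        b_c = b - b.mean()
        r1 = np.dot(b_c[:-1], b_c[1:]) / np.dot(b_c, b_c)
        stats.append((b.mean(), b.var(), r1))
    return stats

# Illustrative data: a weakly stationary AR(1)-type series
x = np.zeros(1200)
for t in range(1, len(x)):
    x[t] = 0.6 * x[t - 1] + rng.normal()
for mean, var, r1 in check_subsample_stability(x):
    print(f"mean={mean:6.3f}  var={var:6.3f}  r1={r1:6.3f}")
```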
Example 1 : IID Noise - Perhaps the simplest model for a time series
is one in which there is no trend or seasonal component and
in which the observations are simply independent and identi-
cally distributed (iid) random variables with zero mean. Such
a sequence of random variables X1, X2, . . . is known as an iid sequence, indicated by {Xt} ∼ IID(0, σ²).
Note : In this model there is no dependence between observa-
tions, i.e., knowledge of X1 , X2 , . . . , Xn is of no value in pre-
dicting the behaviour of the future observation Xn+h (say). Al-
though this means that iid noise is a rather uninteresting process
for forecasters, it plays an important role as a building block for
more complicated time series models.
Example 2 : Random Walk (RW) - Suppose that {Xt} is a discrete-time, purely random process with mean µ and variance σ_X². A process {St} is said to be a random walk if
St = St−1 + Xt.   (12)
Note: In the above example, E(St) = tµ and Var(St) = tσ_X².
As the mean and variance change with t, the process is nonsta-
tionary. However, it is interesting to note that the first differ-
ence of a RW, given by
Example 5 : Let {Xt} be a sequence of independent random variables alternately following a standard normal distribution N(0, 1) and a two-sided discrete uniform distribution with equal probability 1/2 of being −1 or 1. Clearly, the process {Xt} is covariance stationary (verify). The process, however, is not strictly stationary; it is, in fact, not stationary in distribution for any order h.
1.2.2 Relationship between Strong and Weak Stationarity
1. First note that finite second moments are not assumed in the definition of strong stationarity; therefore strong stationarity does not necessarily imply weak stationarity.
2. If the process {Xt ; t ∈ Z} is strongly stationary and has finite
second moment, then {Xt ; t ∈ Z} is weakly stationary.
3. Of course, a weakly stationary process is not necessarily strongly stationary, i.e., weak stationarity ⇏ strong stationarity.
4. There is one important case, however, in which weak stationarity implies strong stationarity: if the process {Xt ; t ∈ Z} is a Gaussian process, then weak stationarity implies strict (strong) stationarity.
1.3 Ergodicity
In a strictly stationary or covariance stationary stochastic process no
assumption is made about the strength of dependence between random
variables in the sequence. For example, in a covariance stationary
stochastic process it is possible that ρ1 = Cor(Xt , Xt−1 ) = ρ100 =
Cor(Xt , Xt−100 ) = 0.5 say. However, in many contexts it is rea-
sonable to assume that the strength of dependence between random
variables in a stochastic process diminishes the farther apart they be-
come. That is, ρ1 > ρ2 > · · · and that eventually ρj = 0 for j large
enough. This diminishing dependence assumption is captured by the
concept of ergodicity.
Definition 1.6 (Ergodicity). Intuitively, a stochastic process {Xt} is ergodic if any two collections of random variables partitioned far apart in the sequence are essentially independent.
Note: The stochastic process {Xt } is ergodic if Xt and Xt−j are
essentially independent if j is large enough.
Definition 1.7 (Ergodicity). A covariance stationary time series {Xt} is said to be ergodic if, for every realization x1, x2, . . . , xn, the sample mean and sample autocovariances converge in mean square to their true (ensemble) counterparts, i.e.,

1. lim_{n→∞} E[ ( (1/n) Σ_{t=1}^{n} x_t − µ )² ] = 0, and   (14)

2. lim_{n→∞} E[ ( (1/n) Σ_{t=1}^{n} (x_t − x̄)(x_{t−k} − x̄) − γ_k )² ] = 0, 0 ≤ k ≤ n.   (15)
In other words, a time average like Σ_{t=1}^{n} x_t / n converges to a pop-
erties of stationary processes. More details may be found, for exam-
ple, in Hamilton (1994, p. 46).
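As a hedged numerical illustration of Definition 1.7 (not part of the source notes; the parameter values are arbitrary), the following sketch simulates a long realization of a stationary AR(1) series and checks that the time-average mean and a sample autocovariance are close to their ensemble counterparts, using the standard AR(1) result γ_k = σ_a² φ^k / (1 − φ²):

```python
import numpy as np

rng = np.random.default_rng(42)

phi, sigma_a, n = 0.6, 1.0, 50_000   # illustrative values
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal(0.0, sigma_a)

# Time averages from a single (long) realization
time_mean = x.mean()
k = 3
xc = x - time_mean
sample_gamma_k = np.dot(xc[:-k], xc[k:]) / n

# Ensemble (theoretical) quantities for a stationary AR(1)
true_mean = 0.0
true_gamma_k = sigma_a**2 * phi**k / (1 - phi**2)

print(time_mean, true_mean)            # close to each other
print(sample_gamma_k, true_gamma_k)    # close to each other
```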
Remarks:
1. The notions of stationarity and ergodicity play a central role in the estimation of the statistical quantities of a time series process.
4. The acf does not uniquely identify the underlying model.
5. The acvf and acf are positive semidefinite (nonnegative definite)
in the sense that
Σ_{i=1}^{n} Σ_{j=1}^{n} α_i α_j γ_{|t_i − t_j|} ≥ 0   (16)
and
Σ_{i=1}^{n} Σ_{j=1}^{n} α_i α_j ρ_{|t_i − t_j|} ≥ 0   (17)
for any set of time points t1, t2, . . . , tn and any real numbers α1, α2, . . . , αn.
Note:
1. Although a given stochastic process has a unique covariance
structure, the converse is not in general true. It is usually
possible to find many normal and non-normal processes with
the same ACF, and this creates further difficulty in interpret-
ing sample ACFs. Even for stationary normal processes, which are completely determined by the mean, variance and ACF, we will see later in due course that a requirement, called the invertibility condition, is needed to ensure uniqueness.
2. It is pertinent to know that not every arbitrary function satis-
fying properties 1 to 3 can be an acvf or acf for a process. A
necessary condition for a function to be the acvf or acf of some
process is that it be positive definite.
3. The proofs of properties 1, 2, 3 and 5 can be found in standard
text books on Time Series Analysis. (Kindly refer them and
prepare the respective proofs)
2.2 Partial Autocorrelation Function (PACF)
Apart from the ACVF and ACF tools, Partial Autocorrelation Func-
tion (PACF) is another important tool to analyse Time Series. While
developing a time series model, correlograms will be used extensively.
The interpretation of correlograms is one of the hardest aspects of
time series analysis and practical experience is a ‘must’. Inspection
of the partial autocorrelation function can provide additional help.
Here is an instance which sheds light on the above aspect: Due
to the stability conditions, autocorrelation functions of stationary fi-
nite order autoregressive processes (to be discussed later) are always
sequences that converge to zero but do not break off. This makes
it difficult to distinguish between processes of different orders when
using the autocorrelation function. To cope with this problem, we
introduce a new concept, the partial autocorrelation function.
The partial correlation between two random variables is the cor-
relation that remains if the possible impact of all other random vari-
ables has been eliminated.
Definition 2.1 (Partial Autocorrelation Function (PACF)). Consider the regression of X_{t+k} on its k lagged variables X_{t+k−1}, X_{t+k−2}, . . . , X_t:
X_{t+k} = φ_{k1} X_{t+k−1} + φ_{k2} X_{t+k−2} + · · · + φ_{k(k−1)} X_{t+1} + φ_{kk} X_t + e_{t+k},   (19)
where φ_{ki} denotes the ith regression parameter and e_{t+k} is an error term with mean 0 and uncorrelated with X_{t+k−j} for j = 1(1)k. Multiplying both sides of the above regression equation by X_{t+k−j} and taking expectation, we get
and

φ_11 = ρ_1,

\[
\phi_{22} = \frac{\begin{vmatrix} 1 & \rho_1 \\ \rho_1 & \rho_2 \end{vmatrix}}
                 {\begin{vmatrix} 1 & \rho_1 \\ \rho_1 & 1 \end{vmatrix}}, \qquad
\phi_{33} = \frac{\begin{vmatrix} 1 & \rho_1 & \rho_1 \\ \rho_1 & 1 & \rho_2 \\ \rho_2 & \rho_1 & \rho_3 \end{vmatrix}}
                 {\begin{vmatrix} 1 & \rho_1 & \rho_2 \\ \rho_1 & 1 & \rho_1 \\ \rho_2 & \rho_1 & 1 \end{vmatrix}}, \qquad \ldots,
\]
and in general
\[
\phi_{kk} = \frac{\begin{vmatrix}
1 & \rho_1 & \rho_2 & \cdots & \rho_{k-2} & \rho_1 \\
\rho_1 & 1 & \rho_1 & \cdots & \rho_{k-3} & \rho_2 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
\rho_{k-1} & \rho_{k-2} & \rho_{k-3} & \cdots & \rho_1 & \rho_k
\end{vmatrix}}
{\begin{vmatrix}
1 & \rho_1 & \rho_2 & \cdots & \rho_{k-2} & \rho_{k-1} \\
\rho_1 & 1 & \rho_1 & \cdots & \rho_{k-3} & \rho_{k-2} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
\rho_{k-1} & \rho_{k-2} & \rho_{k-3} & \cdots & \rho_1 & 1
\end{vmatrix}}. \tag{22}
\]
Thus, the partial autocorrelation between Xt and Xt+k can be ob-
tained as the regression coefficient associated with Xt when regress-
ing Xt+k on its lagged variables Xt+k−1 , Xt+k−2 , . . . , Xt+1 , Xt as in
equation (19).
Note:
1. As a function of k, φ_kk is usually referred to as the Partial Autocorrelation Function (PACF).
2. PACF is a useful tool for determining the order p of an Autore-
gressive (AR) model, which will be discussed in the sequel.
3. The plots of ACF and PACF play a crucial role in modeling a
time series.
4. There is an alternative way of obtaining the PACF; refer to Box et al. (5th edition, 2016, p. 64) for the same.
3. hence, the ACF is
   ρ_k = 1 for k = 0, and ρ_k = 0 for k ≠ 0,   (24)
and
4. the PACF is
   φ_kk = 1 for k = 0, and φ_kk = 0 for k ≠ 0.   (25)
Note:
1. The general linear process (26) allows us to represent Xt as a
weighted sum of present and past values of the “white noise”
process at .
2. The white noise process at may be regarded as a series of shocks
(innovations) that drive the system.
3. For Xt to represent a valid stationary process, it is necessary for the coefficients ψj to be absolutely summable, that is, Σ_{j=0}^{∞} |ψ_j| < ∞, in which case we also have Σ_{j=0}^{∞} ψ_j² < ∞, so that Var(Xt) is finite.
4. A process with a nonzero mean µ may be obtained by adding µ to the right-hand side of Equation (26). Since the mean does not affect the covariance properties of a process, we assume a zero mean (taking E(Xt) = 0) for convenience until we begin fitting models to data. If E(Xt) = µ, then the general linear process (26) can be rewritten as Xt = µ + Σ_{j=0}^{∞} ψ_j a_{t−j}, or X̃t = Xt − µ = Σ_{j=0}^{∞} ψ_j a_{t−j}. Here X̃t, which is the deviation from its level µ, follows a GLP.
Xt = π1 X_{t−1} + π2 X_{t−2} + · · · + at.
Later, we will also need to use the forward shift operator F = B^{−1}, such that F Xt = X_{t+1} and hence F^j Xt = X_{t+j}.
Using the backshift operator B, the model (26) can be written as
Xt = (1 + Σ_{j=1}^{∞} ψ_j B^j) at
or
Xt = ψ(B) at,   (29)
where
ψ(B) = 1 + Σ_{j=1}^{∞} ψ_j B^j = Σ_{j=0}^{∞} ψ_j B^j   (with ψ_0 = 1),
or
π(B) Xt = at.   (30)
Thus,
π(B) = 1 − Σ_{j=1}^{∞} π_j B^j
is the generating function of the π weights. After operating on both sides of expression (30) by ψ(B), we obtain
ψ(B) π(B) Xt = ψ(B) at = Xt.
Hence, ψ(B) π(B) = 1, so that
π(B) = ψ^{−1}(B).   (31)
This relationship may be used to derive the π weights, knowing the ψ weights, and vice versa.
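Relation (31) can also be used numerically. The sketch below is not from the source; it is a minimal illustration (the helper name invert_power_series and the chosen ψ weights are illustrative assumptions) that inverts a truncated ψ(B) series by matching coefficients of B^j, thereby recovering the π weights:

```python
import numpy as np

def invert_power_series(coeffs, n_terms):
    """Coefficients of 1 / (c0 + c1 B + c2 B^2 + ...) up to B^(n_terms-1),
    obtained by matching coefficients of B^j in (c0 + c1 B + ...)(d0 + d1 B + ...) = 1."""
    c = np.zeros(n_terms)
    c[:min(len(coeffs), n_terms)] = coeffs[:n_terms]
    d = np.zeros(n_terms)
    d[0] = 1.0 / c[0]
    for j in range(1, n_terms):
        d[j] = -np.dot(c[1:j + 1], d[j - 1::-1]) / c[0]
    return d

# Example: psi_j = phi**j with an arbitrary phi = 0.6, so psi(B) = 1/(1 - 0.6 B)
phi = 0.6
psi = phi ** np.arange(10)          # psi_0 = 1, psi_1, psi_2, ...
inv = invert_power_series(psi, 10)  # coefficients of psi^{-1}(B) = pi(B)
pi = -inv[1:]                       # pi(B) = 1 - pi_1 B - pi_2 B^2 - ...
print(np.round(pi, 6))              # expect [0.6, 0, 0, ...] since pi(B) = 1 - 0.6 B
```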
3.3 Properties of GLP
For a general linear process, as given in expression (26), note that:
1. E(Xt) = E( Σ_{j=0}^{∞} ψ_j a_{t−j} )
         = Σ_{j=0}^{∞} ψ_j E(a_{t−j})
         = Σ_{j=0}^{∞} ψ_j (0),   since E(at) = 0 ∀ t,
   so that
   E(Xt) = 0.   (32)
Note that, if Xt has mean µ, then using Note 4 (above), E(Xt) = E(µ + Σ_{j=0}^{∞} ψ_j a_{t−j}) = µ + E(Σ_{j=0}^{∞} ψ_j a_{t−j}) = µ.
2. Since the autocorrelation function is a basic data analysis tool for identifying models, it is important to know the autocorrelation function of a linear process. The autocovariance at lag k of the linear process Xt = Σ_{j=0}^{∞} ψ_j a_{t−j} is γ_k = σ_a² Σ_{j=0}^{∞} ψ_j ψ_{j+k}.
3. The Autocovariance Generating Function (ACGF) of a GLP is defined as
   γ(B) = Σ_{k=−∞}^{∞} γ_k B^k.   (36)
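As a numerical companion to these properties (not part of the source notes; the helper name glp_acvf and the chosen ψ weights are illustrative), the sketch below computes autocovariances of a GLP from a truncated set of ψ weights, using the autocovariance formula γ_k = σ_a² Σ_j ψ_j ψ_{j+k} stated in property 2 above:

```python
import numpy as np

def glp_acvf(psi, sigma_a2=1.0, max_lag=10):
    """Autocovariances gamma_k = sigma_a^2 * sum_j psi_j psi_{j+k} of a
    general linear process, using a truncated set of psi weights."""
    psi = np.asarray(psi, dtype=float)
    return np.array([sigma_a2 * np.sum(psi[:-k or None] * psi[k:])
                     for k in range(max_lag + 1)])

# Illustrative psi weights: psi_j = 0.5**j (an arbitrary absolutely summable
# choice), truncated at 200 terms.
psi = 0.5 ** np.arange(200)
gamma = glp_acvf(psi, sigma_a2=1.0, max_lag=5)
rho = gamma / gamma[0]           # ACF obtained by standardizing the ACVF
print(np.round(rho, 4))          # geometric decay: 1, 0.5, 0.25, ...
```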
stationarity. For a linear process (26), these conditions are guaranteed by the single condition that Σ_{j=0}^{∞} |ψ_j| < ∞. This condition can also be embodied in the condition that the series ψ(B), which is the generating function of the ψ weights, must converge for |B| ≤ 1, that is, on or within the unit circle.
Note: Having studied the General Linear Process, we now focus
our attention on more realistic models. For practical representation,
it is desirable to employ models that use parameters parsimoniously.
Parsimony may often be achieved by representation of the linear pro-
cess in terms of a small number of autoregressive–moving average
(ARMA) terms. The GLP contains an infinite number of parameters
that are impossible to estimate from a finite number of available ob-
servations. Instead, in modeling a phenomenon, we construct models
with only a finite number of parameters (this is what is known as
principle of parsimony).
Consider the model π(B) Xt = at, where π(B) = 1 − Σ_{j=1}^{∞} π_j B^j, 1 + Σ_{j=1}^{∞} |π_j| < ∞, and {at} ∼ WN(0, σ_a²). In this model, if only a finite number of π weights are nonzero, i.e., π_1 = φ_1, π_2 = φ_2, . . . , π_p = φ_p and π_k = 0 ∀ k > p, then the resulting process (a special case of (30)) is said to be an autoregressive process of order p, and we abbreviate the name to AR(p) process.
Definition 4.1 (Autoregressive Process of order p (AR(p)).
Suppose that {at } is a purely random process with mean zero and
variance σa2 . Then a process {Xt } is said to be an autoregressive
process of order p (abbreviated to an AR(p) process) if
Xt = φ_1 X_{t−1} + φ_2 X_{t−2} + · · · + φ_p X_{t−p} + at.
The AR(p) model can be written in the equivalent form using the backshift operator B:
(1 − φ_1 B − φ_2 B² − · · · − φ_p B^p) Xt = at
or
φ (B)Xt = at . (40)
This implies that
Xt = at / φ(B) = φ^{−1}(B) at ≡ Ψ(B) at.
Hence, the autoregressive process can be thought of as the output Xt from a linear filter with transfer function φ^{−1}(B) = Ψ(B) when the input is white noise at.
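To make the "linear filter driven by white noise" view concrete, here is a small simulation sketch (not from the source; simulate_ar is an illustrative helper, and the parameter values are arbitrary but chosen to be stationary):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ar(phi, n, sigma_a=1.0, burn_in=200):
    """Simulate X_t = phi_1 X_{t-1} + ... + phi_p X_{t-p} + a_t
    with {a_t} ~ WN(0, sigma_a^2) and zero mean."""
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    a = rng.normal(0.0, sigma_a, size=n + burn_in)
    x = np.zeros(n + burn_in)
    for t in range(p, n + burn_in):
        x[t] = phi @ x[t - p:t][::-1] + a[t]   # phi_1 x_{t-1} + ... + phi_p x_{t-p}
    return x[burn_in:]                          # drop burn-in so start-up effects die out

# Example: a stationary AR(2) with illustrative parameters
x = simulate_ar([0.5, -0.3], n=500)
print(x.mean(), x.var())
```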
Note :
1. AR processes are useful in describing situations in which the present value of a time series depends linearly on its immediately previous values together with a random shock.
2. The current value of the series Xt is a linear combination of the
p most recent past values of itself plus an “innovation” term at
that incorporates everything new in the series at time t that is
not explained by the past values. Thus, for every t, we assume
that at is independent of Xt−1 , Xt−2 , Xt−3 , . . . , Xt−p .
3. An AR model is rather like a multiple regression model, but Xt is regressed on past values of Xt itself rather than on separate predictor variables.
where {at} ∼ WN(0, σ_a²) and (1 − φ_1 z − φ_2 z² − · · · − φ_p z^p) is the pth degree polynomial.
Remark: Other relationships between polynomial roots and coef-
ficients may be used to show that the following two inequalities are
necessary for stationarity. That is, for the roots to be greater than 1
in modulus, it is necessary, but not sufficient, that both
φ1 + φ2 + · · · + φp < 1
and φp < 1.
Note:
1. The ψ weights for the AR(p) process are calculated by using the difference equation ψ_j = φ_1 ψ_{j−1} + φ_2 ψ_{j−2} + · · · + φ_p ψ_{j−p}, j > 0, with ψ_0 = 1 and ψ_j = 0 ∀ j < 0, from which the weights ψ_j can easily be computed recursively in terms of the φ_i (see the sketch after these notes).
2. Since the series π(B) = φ(B) = (1 − φ_1 B − φ_2 B² − · · · − φ_p B^p) is finite, no restrictions are required on the parameters of an autoregressive process to ensure invertibility, because Σ_{j=1}^{∞} |π_j| = Σ_{j=1}^{p} |φ_j| < ∞.
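The ψ-weight recursion in Note 1 is easy to implement; the following sketch (not from the source, with an illustrative helper name) computes the first few ψ_j and checks the AR(1) special case ψ_j = φ_1^j:

```python
import numpy as np

def ar_psi_weights(phi, n_weights=20):
    """psi weights of a stationary AR(p) process from the recursion
    psi_j = phi_1 psi_{j-1} + ... + phi_p psi_{j-p}, psi_0 = 1, psi_j = 0 for j < 0."""
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    psi = np.zeros(n_weights)
    psi[0] = 1.0
    for j in range(1, n_weights):
        for i in range(1, p + 1):
            if j - i >= 0:
                psi[j] += phi[i - 1] * psi[j - i]
    return psi

# AR(1) check: psi_j should equal phi_1**j
print(np.round(ar_psi_weights([0.6], 6), 4))      # 1, 0.6, 0.36, ...
# AR(2) with illustrative parameters
print(np.round(ar_psi_weights([0.5, -0.3], 6), 4))
```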
where φ(B) = (1 − φ_1 B − φ_2 B² − · · · − φ_p B^p) and B now operates on k, not t. Then, writing
φ(B) = ∏_{i=1}^{p} (1 − G_i B),
AR process using the expression given in (35), but the {ψ_j} weights (which, for an AR(p) process, are in terms of the {φ_j}) may be algebraically hard to find. We have an alternative way, which is discussed below.
parameters by replacing the theoretical autocorrelations ρ_k by the estimated autocorrelations r_k. Note that, if we write
\[
\boldsymbol{\phi} = \begin{bmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_p \end{bmatrix}, \qquad
\boldsymbol{\rho}_p = \begin{bmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_p \end{bmatrix}, \qquad
\mathbf{P}_p = \begin{bmatrix}
1 & \rho_1 & \rho_2 & \cdots & \rho_{p-1} \\
\rho_1 & 1 & \rho_1 & \cdots & \rho_{p-2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho_{p-1} & \rho_{p-2} & \rho_{p-3} & \cdots & 1
\end{bmatrix},
\]
then
\[
\boldsymbol{\phi} = \mathbf{P}_p^{-1} \boldsymbol{\rho}_p. \tag{49}
\]
\[
\sigma_x^2 = \frac{\sigma_a^2}{1 - \phi_1 \rho_1 - \phi_2 \rho_2 - \cdots - \phi_p \rho_p}. \tag{52}
\]
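Equation (49) can be solved directly with linear algebra. The sketch below is not from the source (the helper name yule_walker_phi and the parameter values are illustrative); it builds P_p from a vector of autocorrelations and recovers the AR coefficients. In practice the ρ_k would be replaced by the sample autocorrelations r_k.

```python
import numpy as np

def yule_walker_phi(rho):
    """Solve phi = P_p^{-1} rho_p (equation (49)) given rho_1, ..., rho_p."""
    rho = np.asarray(rho, dtype=float)
    p = len(rho)
    # P_p is symmetric Toeplitz with (i, j) entry rho_{|i-j|} and ones on the diagonal
    P = np.array([[1.0 if i == j else rho[abs(i - j) - 1] for j in range(p)]
                  for i in range(p)])
    return np.linalg.solve(P, rho)

# AR(2) illustration: theoretical rho_1, rho_2 implied by phi_1 = 0.5, phi_2 = -0.3
phi1, phi2 = 0.5, -0.3
rho1 = phi1 / (1 - phi2)
rho2 = phi1 * rho1 + phi2
print(np.round(yule_walker_phi([rho1, rho2]), 6))   # recovers [0.5, -0.3]
```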
Note 2: The partial autocorrelations can be described in terms of p nonzero functions of the autocorrelations. Denote by φ_kj the jth coefficient in an autoregressive representation of order k, so that φ_kk is the last coefficient. From ρ_k = φ_1 ρ_{k−1} + φ_2 ρ_{k−2} + · · · + φ_p ρ_{k−p}, the φ_kj satisfy the set of equations
φ_11 = ρ_1,

\[
\phi_{22} = \frac{\begin{vmatrix} 1 & \rho_1 \\ \rho_1 & \rho_2 \end{vmatrix}}
                 {\begin{vmatrix} 1 & \rho_1 \\ \rho_1 & 1 \end{vmatrix}}, \qquad
\phi_{33} = \frac{\begin{vmatrix} 1 & \rho_1 & \rho_1 \\ \rho_1 & 1 & \rho_2 \\ \rho_2 & \rho_1 & \rho_3 \end{vmatrix}}
                 {\begin{vmatrix} 1 & \rho_1 & \rho_2 \\ \rho_1 & 1 & \rho_1 \\ \rho_2 & \rho_1 & 1 \end{vmatrix}}, \qquad \ldots
\]
In general, for φ_kk,
\[
\phi_{kk} = \frac{\begin{vmatrix}
1 & \rho_1 & \rho_2 & \cdots & \rho_{k-2} & \rho_1 \\
\rho_1 & 1 & \rho_1 & \cdots & \rho_{k-3} & \rho_2 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
\rho_{k-1} & \rho_{k-2} & \rho_{k-3} & \cdots & \rho_1 & \rho_k
\end{vmatrix}}
{\begin{vmatrix}
1 & \rho_1 & \rho_2 & \cdots & \rho_{k-2} & \rho_{k-1} \\
\rho_1 & 1 & \rho_1 & \cdots & \rho_{k-3} & \rho_{k-2} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
\rho_{k-1} & \rho_{k-2} & \rho_{k-3} & \cdots & \rho_1 & 1
\end{vmatrix}}.
\]
The quantity φkk , regarded as a function of the lag k, is called the
partial autocorrelation function.
Note: For an AR(p) process, the partial autocorrelations φ_kk will be nonzero for k ≤ p and zero for k > p. In other words, the partial autocorrelation function of the AR(p) process has a cutoff after lag p, i.e., for an AR(p) process, φ_kk ≠ 0 for k ≤ p and φ_kk = 0 for k > p.   (55)
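Numerically, φ_kk can be obtained without evaluating the determinants in (22): for each k, solve the order-k Yule-Walker system and take the last coefficient. The sketch below is not from the source (names and parameter values are illustrative); it does this for an AR(2) and exhibits the cutoff after lag p = 2.

```python
import numpy as np

def pacf_from_acf(rho):
    """phi_kk for k = 1..K, obtained as the last coefficient of the order-k
    Yule-Walker system built from the autocorrelations rho_1, ..., rho_K."""
    rho = np.asarray(rho, dtype=float)
    K = len(rho)
    pacf = np.zeros(K)
    for k in range(1, K + 1):
        P = np.array([[1.0 if i == j else rho[abs(i - j) - 1]
                       for j in range(k)] for i in range(k)])
        pacf[k - 1] = np.linalg.solve(P, rho[:k])[-1]
    return pacf

# AR(2) with phi_1 = 0.5, phi_2 = -0.3: build rho_k recursively, then check
# that phi_22 = phi_2 and phi_kk = 0 for k > 2 (the cutoff after lag p).
phi1, phi2 = 0.5, -0.3
rho = [phi1 / (1 - phi2)]
rho.append(phi1 * rho[0] + phi2)
for k in range(2, 6):
    rho.append(phi1 * rho[k - 1] + phi2 * rho[k - 2])
print(np.round(pacf_from_acf(rho), 6))   # approximately [rho_1, -0.3, 0, 0, 0, 0]
```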
provided that the infinite series on the right converges in an appropriate sense. Hence
ψ(B) = (1 − φ_1 B)^{−1} = Σ_{j=0}^{∞} φ_1^j B^j.   (57)
γk = φ1 γk−1 , k ≥ 1, (58)
where we use that ρ_0 = 1. Hence, when |φ_1| < 1 and the process is
stationary, the ACF exponentially decays in one of the two forms
depending on the sign of φ1 . If 0 < φ1 < 1, then all autocorrelations
are positive; if −1 < φ1 < 0, then the sign of the autocorrelations
shows an alternating pattern beginning with a negative sign. The
magnitude of these autocorrelations decreases exponentially in both
cases.
φ_kk = ρ_1 = φ_1 for k = 1, and φ_kk = 0 for k ≥ 2.   (62)
4.7.2 Stationarity and Invertibility of AR(2) Process
(1 − φ1 B − φ2 B2 )Xt = at
i.e. Xt = φ1 Xt−1 + φ2 Xt−2 + at .
" p #
2
−φ1 + φ1 + 4φ2
2φ 2φ
G1 = p2 = p2 p
−φ1 − φ21 + 4φ2 −φ1 − φ21 + 4φ2 −φ1 + φ21 + 4φ2
p p
2φ2 (−φ1 + φ21 + 4φ2 ) φ1 − φ21 + 4φ2
= =
φ21 − (φ21 + 4φ2 ) 2
Similarly,
p
φ1 + φ21 + 4φ2
G2 =
2
The required condition G−1
i > 1 implies that |Gi | < 1 for i = 1, 2.
Hence,
For real roots: Here we have φ_1² + 4φ_2 ≥ 0. Note that |G_i| < 1 for i = 1, 2 if and only if
−1 < G_1 ≤ G_2 < 1,
i.e.,
−1 < (φ_1 − √(φ_1² + 4φ_2)) / 2 ≤ (φ_1 + √(φ_1² + 4φ_2)) / 2 < 1,
or
−2 < φ_1 − √(φ_1² + 4φ_2) ≤ φ_1 + √(φ_1² + 4φ_2) < 2.
The right-hand inequality gives √(φ_1² + 4φ_2) < 2 − φ_1; squaring both sides yields φ_1² + 4φ_2 < 4 − 4φ_1 + φ_1², i.e., φ_2 + φ_1 < 1. Similarly, the left-hand inequality yields φ_2 − φ_1 < 1. Hence,
φ_2 + φ_1 < 1   and   φ_2 − φ_1 < 1.   (65)
These conditions, together with φ_1² + 4φ_2 ≥ 0, define the stationarity region for the real-root case.
For complex roots: Here we have φ_1² + 4φ_2 < 0. The derivation is more involved and is not pursued here.
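A quick way to check the stationarity of a given AR(2) numerically is to find the roots of φ(B) = 1 − φ_1 B − φ_2 B² = 0 and verify that they lie outside the unit circle. The sketch below is not from the source; it simply encodes that root criterion (which, combining the real-root and complex-root cases, is equivalent to the triangle φ_2 + φ_1 < 1, φ_2 − φ_1 < 1, −1 < φ_2 < 1), with illustrative parameter values:

```python
import numpy as np

def ar2_is_stationary(phi1, phi2):
    """Check stationarity of an AR(2) by requiring the roots of
    phi(B) = 1 - phi1*B - phi2*B^2 = 0 to lie outside the unit circle."""
    # np.roots takes coefficients in decreasing powers of B: -phi2 B^2 - phi1 B + 1
    roots = np.roots([-phi2, -phi1, 1.0])
    return np.all(np.abs(roots) > 1.0)

print(ar2_is_stationary(0.5, -0.3))   # True  (inside the stationarity triangle)
print(ar2_is_stationary(0.5, 0.6))    # False (phi1 + phi2 >= 1)
```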
Since the series is assumed to be stationary with zero mean, and since at is independent of X_{t−k}, we obtain E(X_{t−k} at) = 0, and so
γk = φ1 γk−1 + φ2 γk−2 , k ≥ 1, (67)
and the autocorrelation becomes
ρk = φ1 ρk−1 + φ2 ρk−2 , k ≥ 1. (68)
Specifically, when k = 1 and 2,
ρ1 = φ1 + φ2 ρ1
ρ2 = φ1 ρ1 + φ2 ,
where we use that ρ_0 = 1. Hence,
ρ_1 = φ_1 / (1 − φ_2),
ρ_2 = φ_1² / (1 − φ_2) + φ_2 = (φ_1² + φ_2 − φ_2²) / (1 − φ_2),   (69)
and ρ_k for k ≥ 3 is calculated recursively through (68).
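The recursion (68) together with the starting values (69) gives the whole ACF; here is a minimal sketch (not from the source, with illustrative parameters):

```python
import numpy as np

def ar2_acf(phi1, phi2, max_lag=10):
    """Theoretical ACF of a stationary AR(2): rho_1, rho_2 from (69) and the
    recursion rho_k = phi1*rho_{k-1} + phi2*rho_{k-2} from (68)."""
    rho = np.zeros(max_lag + 1)
    rho[0] = 1.0
    rho[1] = phi1 / (1 - phi2)
    for k in range(2, max_lag + 1):
        rho[k] = phi1 * rho[k - 1] + phi2 * rho[k - 2]
    return rho

print(np.round(ar2_acf(0.5, -0.3, max_lag=6), 4))   # tails off, never cuts off
```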
The pattern of the ACF is governed by the difference equation
Consider the AR(2) model Xt = φ_1 X_{t−1} + φ_2 X_{t−2} + at and take variances of both sides to get
γ_0 = φ_1² γ_0 + φ_2² γ_0 + 2φ_1 φ_2 γ_1 + σ_a²,   (70)
where we use that Var(Xt) = γ_0 = σ_x² ∀ t and Var(at) = σ_a². Using γ_1 = φ_1 γ_0 + φ_2 γ_1 and solving for γ_0 yields
σ_x² = γ_0 = (1 − φ_2) σ_a² / [(1 − φ_2)(1 − φ_1² − φ_2²) − 2φ_2 φ_1²].   (71)
Note:
1. Notice the immediate implication of (66).
2. We can also find the variance of the AR(2) process using σ_x² = σ_a² / (1 − φ_1 ρ_1 − φ_2 ρ_2), where ρ_1 and ρ_2 are as given in equation (69).
4.7.5 PACF of AR(2) Process
\[
\phi_{22} = \frac{\begin{vmatrix} 1 & \rho_1 \\ \rho_1 & \rho_2 \end{vmatrix}}
                 {\begin{vmatrix} 1 & \rho_1 \\ \rho_1 & 1 \end{vmatrix}}
          = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2}
          = \frac{\dfrac{\phi_1^2 + \phi_2 - \phi_2^2}{1 - \phi_2} - \left(\dfrac{\phi_1}{1 - \phi_2}\right)^2}
                 {1 - \left(\dfrac{\phi_1}{1 - \phi_2}\right)^2}
          = \frac{\phi_2\left[(1 - \phi_2)^2 - \phi_1^2\right]}{(1 - \phi_2)^2 - \phi_1^2},
\]
so that
\[
\phi_{22} = \phi_2. \tag{74}
\]
\[
\phi_{33} = \frac{\begin{vmatrix} 1 & \rho_1 & \rho_1 \\ \rho_1 & 1 & \rho_2 \\ \rho_2 & \rho_1 & \rho_3 \end{vmatrix}}
                 {\begin{vmatrix} 1 & \rho_1 & \rho_2 \\ \rho_1 & 1 & \rho_1 \\ \rho_2 & \rho_1 & 1 \end{vmatrix}}
          = \frac{\begin{vmatrix} 1 & \rho_1 & \phi_1 + \phi_2\rho_1 \\ \rho_1 & 1 & \phi_1\rho_1 + \phi_2 \\ \rho_2 & \rho_1 & \phi_1\rho_2 + \phi_2\rho_1 \end{vmatrix}}
                 {\begin{vmatrix} 1 & \rho_1 & \rho_2 \\ \rho_1 & 1 & \rho_1 \\ \rho_2 & \rho_1 & 1 \end{vmatrix}}
   \quad \text{(due to equation (72))}.
\]
Since the last column of the numerator is φ_1 times the first column plus φ_2 times the second column, the numerator determinant vanishes, so
\[
\phi_{33} = 0. \tag{75}
\]
σ_a². Then a process {Xt} is said to be a moving average process of order q (abbreviated to an MA(q) process) if
Xt = at − θ_1 a_{t−1} − θ_2 a_{t−2} − · · · − θ_q a_{t−q}.
The MA(q) model can be written in the equivalent form using the backshift operator B:
Xt = (1 − θ_1 B − θ_2 B² − · · · − θ_q B^q) at
or
Xt = θ(B) at.   (77)
This implies that
Xt = θ (B)at ≡ Ψ(B)at .
Hence, the moving average process can be thought of as the output
Xt from a linear filter with transfer function θ (B) = Ψ(B) when the
input is white noise at .
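A short simulation sketch of this filter view (not from the source; simulate_ma is an illustrative helper and the θ values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_ma(theta, n, sigma_a=1.0):
    """Simulate X_t = a_t - theta_1 a_{t-1} - ... - theta_q a_{t-q} by applying
    the weights (1, -theta_1, ..., -theta_q) to white noise."""
    theta = np.asarray(theta, dtype=float)
    q = len(theta)
    a = rng.normal(0.0, sigma_a, size=n + q)
    w = np.concatenate(([1.0], -theta))
    return np.convolve(a, w, mode="full")[q:n + q]

x = simulate_ma([0.8, -0.5], n=500)   # illustrative MA(2) parameters
print(x.mean(), x.var())              # variance close to (1 + 0.8**2 + 0.5**2) * sigma_a**2
```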
Note :
3. The terminology moving average arises from the fact that Xt
is obtained by applying the weights, 1, −θ1 , −θ2 , . . . , −θq to the
variables at , at−1 , at−2 , . . . , at−q and then moving the weights and
applying them to at+1 , at , at−1 , . . . , at−q+1 to obtain Xt+1 and so
on.
4. The symbols θ1 , θ2 , . . . , θq are the finite set of weight parameters,
known as Moving average parameters.
5. The moving average operator is defined to be θ(B) = (1 − θ1 B −
θ2 B2 − · · · − θq Bq ).
6. Recall that we are assuming that Xt has zero mean. We can
always introduce a nonzero mean by replacing Xt by Xt − µ, ∀ t
throughout our equations.
7. Slutzky (1927) and Wold(1938) carried out the original work on
moving average processes. The process arose as a result of the
study by Slutzky on the effect of the moving average on random
events.
Xt = (1 − θ1 B − θ2 B2 − · · · − θq Bq )at (78)
Invertibility: We now derive the conditions that the parameters
θ1 , θ2 , . . . , θq must satisfy to ensure the invertibility of the MA(q)
process:
The general MA(q) process Xt = θ(B) at can be written as
at = θ^{−1}(B) Xt ≡ π(B) Xt = Σ_{j=0}^{∞} π_j X_{t−j}.   (79)
Writing
θ(B) = (1 − H_1 B)(1 − H_2 B) · · · (1 − H_q B),
where H_1^{−1}, H_2^{−1}, . . . , H_q^{−1} are the roots of θ(B) = 0, and expanding θ^{−1}(B) in partial fractions yields
at = θ^{−1}(B) Xt = Σ_{i=1}^{q} [ M_i / (1 − H_i B) ] Xt.
necessary for invertibility. That is, for the roots to be greater than 1 in modulus, it is necessary, but not sufficient, that both
θ_1 + θ_2 + · · · + θ_q < 1
and θ_q < 1.
Note:
1. The π weights for the MA(q) process are calculated by using the difference equation π_j = θ_1 π_{j−1} + θ_2 π_{j−2} + · · · + θ_q π_{j−q}, j > 0, with π_0 = −1 and π_j = 0 ∀ j < 0, from which the weights π_j can easily be computed recursively in terms of the θ_i.
2. Since the series ψ(B) = θ(B) = (1 − θ_1 B − θ_2 B² − · · · − θ_q B^q) is finite, and 1 + θ_1² + θ_2² + · · · + θ_q² < ∞, no restrictions are required on the parameters of a moving average process to ensure stationarity.
Using (81) and (82), the autocorrelation function is
ρ_k = (−θ_k + θ_1 θ_{k+1} + θ_2 θ_{k+2} + · · · + θ_{q−k} θ_q) / (1 + θ_1² + θ_2² + · · · + θ_q²),  k = 1(1)q,
ρ_k = 0,  k > q.   (83)
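Equation (83) translates directly into code. The sketch below is not from the source (the helper name and parameter values are illustrative); it computes the theoretical MA(q) ACF and shows the cutoff after lag q:

```python
import numpy as np

def ma_acf(theta, max_lag=None):
    """Theoretical ACF of an MA(q) with weights (1, -theta_1, ..., -theta_q):
    rho_k = (-theta_k + sum_j theta_j theta_{j+k}) / (1 + sum_j theta_j**2),
    and rho_k = 0 for k > q (equation (83))."""
    theta = np.asarray(theta, dtype=float)
    q = len(theta)
    if max_lag is None:
        max_lag = q + 2
    denom = 1.0 + np.sum(theta ** 2)
    rho = np.zeros(max_lag + 1)
    rho[0] = 1.0
    for k in range(1, max_lag + 1):
        if k <= q:
            rho[k] = (-theta[k - 1] + np.sum(theta[:q - k] * theta[k:])) / denom
    return rho

# MA(2) with illustrative theta_1 = 0.8, theta_2 = -0.5: cuts off after lag 2
print(np.round(ma_acf([0.8, -0.5]), 4))
```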
The PACF for MA models behaves much like the ACF for AR models. We can easily see that the partial autocorrelation function of the general MA(q) process tails off as a mixture of exponential decays and/or damped sine waves, depending on the nature of the roots of (1 − θ_1 B − θ_2 B² − · · · − θ_q B^q) = 0. The PACF will contain damped sine waves if some of the roots are complex.
5.5.2 Invertibility Condition of MA(1) Process
Xt = (1 − θ_1 B) at,
i.e.,
at = (1 − θ_1 B)^{−1} Xt = Σ_{j=0}^{∞} θ_1^j X_{t−j}.
Note: When |θ_1| > 1, the weights increase as the lag increases, so the more distant the observations, the greater their influence on the current error. When |θ_1| = 1, the weights are constant in size, and distant observations have the same influence as recent observations. As neither of these situations makes much sense, we require |θ_1| < 1, so that the most recent observations have higher weight than observations from the more distant past. Thus, the process is invertible when |θ_1| < 1.
For the MA(1) process, ψ(B) = (1 − θ_1 B). Therefore
ρ_1 = −θ_1 / (1 + θ_1²)  and  ρ_k = 0 for k ≥ 2,
which cuts off after lag 1; that is, the process has no correlation beyond lag 1. This fact will be important later when we need to choose suitable models for real data.
Remark: We can see that the first-lag autocorrelation of an MA(1) process is bounded: |ρ_1| = |θ_1| / (1 + θ_1²) ≤ 1/2.
Let
Model A : Xt = at − θ_1 a_{t−1}   and   Model B : Xt = at − (1/θ_1) a_{t−1}.
It can easily be shown that these two different processes have exactly
the same ACF. Thus we cannot identify a MA process uniquely from
a given ACF. Now, if we ‘invert’ models A and B by expressing at
in terms of Xt , Xt−1 , . . ., we find by successive substitution that
and
Model B : at = Xt + (1/θ_1) X_{t−1} + (1/θ_1²) X_{t−2} + · · · .
If |θ_1| < 1, the series of coefficients of X_{t−j} for model A converges
whereas that of B does not. Thus model B cannot be ‘inverted’ in
this way. The imposition of the invertibility condition ensures that
there is a unique invertible first-order MA process for a given ACF.
Using (90) and (22), the PACF of an MA(1) process can easily be seen to be
φ_11 = ρ_1 = −θ_1 / (1 + θ_1²) = −θ_1 (1 − θ_1²) / (1 − θ_1⁴),
φ_22 = −ρ_1² / (1 − ρ_1²) = −θ_1² / (1 + θ_1² + θ_1⁴) = −θ_1² (1 − θ_1²) / (1 − θ_1⁶),
and so on, so that the PACF tails off exponentially.
where {at} ∼ WN(0, σ_a²) and (1 − θ_1 z − θ_2 z²) is the 2nd degree polynomial.
Using (95) and (96), the autocorrelation function becomes
ρ_1 = −θ_1 (1 − θ_2) / (1 + θ_1² + θ_2²),
ρ_2 = −θ_2 / (1 + θ_1² + θ_2²),
ρ_k = 0,  k ≥ 3,   (97)
which cuts off after lag 2.
φ_11 = ρ_1,
φ_22 = (ρ_2 − ρ_1²) / (1 − ρ_1²),
φ_33 = (ρ_1³ − ρ_1 ρ_2 (2 − ρ_2)) / (1 − ρ_2² − 2ρ_1²(1 − ρ_2)),
and so on.   (98)
Note: The exact expression for the partial autocorrelation function of an MA(2) process is complicated, but it is dominated by the sum of two exponentials (it tails off exponentially) if the roots of the characteristic equation (1 − θ_1 B − θ_2 B²) = 0 are real, and by a damped sine wave if the roots are complex. Its behaviour depends also on the signs and magnitudes of θ_1 and θ_2. Thus, it behaves like the autocorrelation function of an AR(2) process (this aspect illustrates the duality between the MA(2) and the AR(2) processes).
φ(B) ψ(B) = 1,   (101)
i.e.,
(1 − φ_1 B − φ_2 B²)(1 + ψ_1 B + ψ_2 B² + · · · ) = 1,
i.e.,
1 + ψ_1 B + ψ_2 B² + ψ_3 B³ + · · ·
  − φ_1 B − ψ_1 φ_1 B² − ψ_2 φ_1 B³ − · · ·   (103)
  − φ_2 B² − ψ_1 φ_2 B³ − · · · = 1.
Thus, we obtain the ψ_j's as follows:
B¹ : ψ_1 − φ_1 = 0  →  ψ_1 = φ_1
B² : ψ_2 − ψ_1 φ_1 − φ_2 = 0  →  ψ_2 = ψ_1 φ_1 + φ_2 = φ_1² + φ_2   (104)
B³ : ψ_3 − ψ_2 φ_1 − ψ_1 φ_2 = 0  →  ψ_3 = ψ_2 φ_1 + ψ_1 φ_2
and so on. Actually, for j ≥ 2, we have
ψ_j = φ_1 ψ_{j−1} + φ_2 ψ_{j−2}.
Similarly, writing the MA(q) model as
Xt = θ(B) at,   (107)
with θ(B) = (1 − θ_1 B − θ_2 B² − · · · − θ_q B^q), we can rewrite it as
π(B) Xt = Xt / θ(B) = at,   (108)
where π(B) = (1 − π_1 B − π_2 B² − · · · ) = 1 / θ(B), such that
θ(B) π(B) = 1.   (109)
The π weights can be derived by equating the coefficients of B^j on both sides of the above expression. For example, we can write the MA(2) process as
(1 − π_1 B − π_2 B² − π_3 B³ − · · · ) Xt = Xt / (1 − θ_1 B − θ_2 B²) = at,   (110)
which implies that
(1 − θ_1 B − θ_2 B²)(1 − π_1 B − π_2 B² − π_3 B³ − · · · ) = 1,
or
1 − π_1 B − π_2 B² − π_3 B³ − · · ·
  − θ_1 B + π_1 θ_1 B² + π_2 θ_1 B³ + · · ·   (111)
  − θ_2 B² + π_1 θ_2 B³ + · · · = 1.
Thus, the π_j weights can be derived by equating the coefficients of B^j as follows:
B¹ : −π_1 − θ_1 = 0  →  π_1 = −θ_1
B² : −π_2 + π_1 θ_1 − θ_2 = 0  →  π_2 = π_1 θ_1 − θ_2 = −θ_1² − θ_2   (112)
B³ : −π_3 + π_2 θ_1 + π_1 θ_2 = 0  →  π_3 = π_2 θ_1 + π_1 θ_2
and so on. In general, for j ≥ 3, we have
π_j = π_{j−1} θ_1 + π_{j−2} θ_2,   (113)
where π_0 = −1. In the special case when θ_2 = 0, we have π_j = −θ_1^j ∀ j ≥ 1.
Therefore,
(1 + θ_1 B + θ_1² B² + · · · ) Xt = Xt / (1 − θ_1 B) = at.   (114)
This equation implies that a finite-order invertible MA process is equivalent to an infinite-order AR process.
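The π-weight recursion (112)-(113) is straightforward to implement; here is a minimal sketch (not from the source, with illustrative parameter values) that also reproduces the special case π_j = −θ_1^j when θ_2 = 0:

```python
import numpy as np

def ma2_pi_weights(theta1, theta2, n_weights=12):
    """pi weights of an invertible MA(2): pi_1 = -theta_1,
    pi_2 = pi_1*theta_1 - theta_2, and pi_j = theta_1*pi_{j-1} + theta_2*pi_{j-2} (eq. 113)."""
    pi = np.zeros(n_weights + 1)
    pi[1] = -theta1
    if n_weights >= 2:
        pi[2] = pi[1] * theta1 - theta2
    for j in range(3, n_weights + 1):
        pi[j] = theta1 * pi[j - 1] + theta2 * pi[j - 2]
    return pi[1:]

# Special case theta_2 = 0 reproduces pi_j = -theta_1**j
print(np.round(ma2_pi_weights(0.6, 0.0, 5), 4))     # [-0.6, -0.36, -0.216, ...]
# An invertible MA(2) with illustrative parameters
print(np.round(ma2_pi_weights(0.8, -0.5, 8), 4))
```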
In summary, a finite-order stationary AR(p) process corresponds to an infinite-order MA process, and a finite-order invertible MA(q) process corresponds to an infinite-order AR process. This dual relationship between the AR(p) and MA(q) processes also exists in the
ACF and PACF. The AR(p) process has its autocorrelations tailing
off and partial autocorrelations cutting off, but the MA(q) process
has its autocorrelations cutting off and partial autocorrelations tail-
ing off.
Note: The below table gives the characteristics of the Autocor-
relation and the Partial Autocorrelation Functions of AR and MA
Processes, which can be used for model identification.
Table 1: Characteristics of the Autocorrelation and the Partial Autocorrelation Functions of AR and MA Processes

            ACF                       PACF
AR(p)       does not break off        breaks off after lag p
MA(q)       breaks off after lag q    does not break off
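Table 1 is typically applied to sample ACF/PACF plots. As a rough, self-contained illustration (not from the source; parameter values are arbitrary), the sketch below computes sample autocorrelations for a simulated AR(2) and a simulated MA(2), showing the tailing-off versus cutting-off behaviour; in practice one would also inspect the sample PACF.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_acf(x, max_lag=10):
    """Sample autocorrelations r_k = c_k / c_0 with
    c_k = (1/n) sum_t (x_t - xbar)(x_{t+k} - xbar)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    c0 = np.dot(x, x) / n
    return np.array([np.dot(x[:n - k], x[k:]) / (n * c0) for k in range(max_lag + 1)])

n = 2000
a = rng.normal(size=n + 2)
ar = np.zeros(n)
for t in range(2, n):
    ar[t] = 0.5 * ar[t - 1] - 0.3 * ar[t - 2] + a[t + 2]   # AR(2)
ma = a[2:] - 0.8 * a[1:-1] + 0.5 * a[:-2]                  # MA(2)

print("AR(2) sample ACF:", np.round(sample_acf(ar, 6), 3))  # tails off
print("MA(2) sample ACF:", np.round(sample_acf(ma, 6), 3))  # near zero beyond lag 2
```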
autoregressive moving average process of order (p,q) (abbreviated to
an ARMA(p,q) process) if
Xt −φ1 Xt−1 −· · ·−φp Xt−p = at −θ1 at−1 −θ2 at−2 −· · ·−θq at−q . (115)
(1 − φ1 B − · · · − φp Bp )Xt = (1 − θ1 B − · · · − θq Bq )at
or in concise form
φ (B)Xt = θ (B)at . (116)
where φ(B) = (1 − φ_1 B − φ_2 B² − · · · − φ_p B^p) is the pth degree polynomial and θ(B) = (1 − θ_1 B − θ_2 B² − · · · − θ_q B^q) is the qth degree polynomial.
φ(B)Xt = et , (117)
Xt = θ (B)bt , (118)
1. These models enable us to describe processes in which neither
the autocorrelation nor the partial autocorrelation function breaks
off after a finite number of lags.
2. Here we express Xt as a linear combination of past observations
Xt−j and white noise at−j .
3. In the statistical analysis of time series, autoregressive–moving-
average (ARMA) models provide a parsimonious description of
a (weakly) stationary stochastic process in terms of two poly-
nomials, one for the autoregression (AR) and the second for
the moving average (MA). The general ARMA model was de-
scribed in the 1951 thesis of Peter Whittle, Hypothesis testing
in time series analysis, and it was popularized in the 1970 book
by George E. P. Box and Gwilym Jenkins.
4. The importance of ARMA processes lies in the fact that a sta-
tionary time series may often be adequately modelled by an
ARMA model involving fewer parameters than a pure MA or
AR process by itself. This is an early example of what is often called the principle of parsimony.
5. It is to be noted that when q = 0 (θ(B) ≡ 1), the model is called an autoregressive model of order p, AR(p), and when p = 0 (φ(B) ≡ 1), the model is called a moving average model of order q, MA(q).
6. Recall that we are assuming that Xt has zero mean. We can
always introduce a nonzero mean by replacing Xt by Xt − µ, ∀ t
throughout our equations.
7. In the stationary ARMA(p, q) process, if the autoregressive op-
erator φ(B) = (1 − φ1 B − φ2 B2 − · · · − φp Bp ) and the moving
average operator θ(B) = (1 − θ1 B − θ2 B2 − · · · − θq Bq ) have
any roots in common say, λi = ηj for some i and j, then the
stationary ARMA(p, q) process is clearly identical to the sta-
tionary ARMA(p-1,q-1) process. (Here, the orders p and q gets
reduced by as many as the number of common roots, i.e., if there
are say j where j = min(p, q) common roots, then the actual
order of the ARMA process is (p − j, q − j) instead of (p,q)).
This is the reason why in the above definition (119 ) we state
that the polynomials will not have common roots.
The ARMA(p, q) process is stationary if all the roots of φ(B) = 0 lie outside the unit circle, and invertible if all the roots of θ(B) = 0 lie outside the unit circle.
Xt = φ1 Xt−1 +φ2 Xt−2 +· · ·+φp Xt−p +at −θ1 at−1 −θ2 at−2 −· · ·−θq at−q ,
multiply it by Xt−k on both sides
Xt−k Xt = φ1 Xt−k Xt−1 +· · ·+φp Xt−k Xt−p +Xt−k at −θ1 Xt−k at−1 −· · ·−θq Xt−k at−q .
Taking expectations and using E[X_{t−k} X_{t−j}] = γ_{k−j} gives
γ_k = φ_1 γ_{k−1} + · · · + φ_p γ_{k−p} + E[X_{t−k} at] − θ_1 E[X_{t−k} a_{t−1}] − · · · − θ_q E[X_{t−k} a_{t−q}].
i.e.,
γ_k = φ_1 γ_{k−1} + · · · + φ_p γ_{k−p}, k = q + 1, q + 2, . . . ,
and hence
ρ_k = φ_1 ρ_{k−1} + · · · + φ_p ρ_{k−p}, k = q + 1, q + 2, . . . ,
or concisely,
φ(B) ρ_k = 0, ∀ k = q + 1, q + 2, . . . .   (123)
Note that (123) does not hold for k ≤ q, owing to correlation
between θk at−k and Xt−k . Hence, an ARMA(p, q) process will have
more complicated autocovariances for lags 1 through q than would
the corresponding AR(p) process.
In the ARMA(1, 1) case, the ACF always decays exponentially: if φ_1 > 0 the ACF damps out smoothly, whereas if φ_1 < 0 it alternates in sign. Note that the sign of ρ_1 is determined by φ_1 and θ_1: if φ_1 > θ_1 then ρ_1 > 0, and if φ_1 < θ_1 then ρ_1 < 0.
γ0 = φ1 γ1 + · · · + φp γp + σa2 (1 − θ1 ψ1 − · · · − θq ψq ) (124)
1. If we are using an ARMA(1, 1) model in which θ1 is close to
φ1 then the data might better be modeled as simple white noise.
2. When φ_1 = 0, (125) reduces to an MA(1) process, and when θ_1 = 0 it reduces to an AR(1) process. Thus, we can regard the AR(1) and MA(1) processes as special cases of the ARMA(1, 1) process.
(1 − φ1 B)Xt = (1 − θ1 B)at .
Recall that E(Xt at) = σ_a². For the term E(Xt a_{t−1}), we note that
E(Xt a_{t−1}) = E[(φ_1 X_{t−1} + at − θ_1 a_{t−1}) a_{t−1}] = φ_1 E(X_{t−1} a_{t−1}) − θ_1 σ_a² = (φ_1 − θ_1) σ_a².
Hence,
γ_0 = φ_1 γ_1 + σ_a² − θ_1 (φ_1 − θ_1) σ_a².   (127)
When k = 1, we have from (126)
γ_1 = φ_1 γ_0 − θ_1 σ_a².   (128)
Substituting (128) in (127), we have
γ_0 = φ_1² γ_0 − φ_1 θ_1 σ_a² + σ_a² − φ_1 θ_1 σ_a² + θ_1² σ_a²,
i.e., the variance of the ARMA(1, 1) process is
γ_0 = (1 + θ_1² − 2φ_1 θ_1) σ_a² / (1 − φ_1²).   (129)
Thus,
γ_1 = φ_1 γ_0 − θ_1 σ_a²
    = [φ_1 (1 + θ_1² − 2φ_1 θ_1) / (1 − φ_1²) − θ_1] σ_a²
    = (φ_1 − θ_1)(1 − φ_1 θ_1) σ_a² / (1 − φ_1²).   (130)
For k ≥ 2, we have from (126)
γ_k = φ_1 γ_{k−1}, k ≥ 2.   (131)
Hence, the ARMA(1, 1) model has the following autocorrelation function:
ρ_k = 1 for k = 0,
ρ_k = (φ_1 − θ_1)(1 − φ_1 θ_1) / (1 + θ_1² − 2φ_1 θ_1) for k = 1,
ρ_k = φ_1 ρ_{k−1} = φ_1^{k−1} ρ_1 for k ≥ 2.   (132)
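A small numerical check of (132), not from the source (illustrative parameter values); setting θ_1 = 0 or φ_1 = 0 recovers the AR(1) and MA(1) special cases mentioned earlier:

```python
import numpy as np

def arma11_acf(phi1, theta1, max_lag=10):
    """Theoretical ACF of an ARMA(1,1) process, equation (132)."""
    rho = np.zeros(max_lag + 1)
    rho[0] = 1.0
    rho[1] = (phi1 - theta1) * (1 - phi1 * theta1) / (1 + theta1 ** 2 - 2 * phi1 * theta1)
    for k in range(2, max_lag + 1):
        rho[k] = phi1 * rho[k - 1]
    return rho

# Illustrative parameters; note the exponential decay from lag 1 onwards
print(np.round(arma11_acf(0.7, 0.4, max_lag=6), 4))
# Sanity checks: theta_1 = 0 gives the AR(1) ACF phi_1**k,
# phi_1 = 0 gives the MA(1) ACF with cutoff after lag 1
print(np.round(arma11_acf(0.7, 0.0, max_lag=4), 4))
print(np.round(arma11_acf(0.0, 0.4, max_lag=4), 4))
```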
Alternative Way of Deriving the ACF of the ARMA(1, 1) Process
Consider the model
(1 − φ_1 B) Xt = (1 − θ_1 B) at,
i.e., φ(B) Xt = θ(B) at, so that
Xt = [θ(B) / φ(B)] at.   (133)
For |φ_1| < 1, we have
θ(B) / φ(B) = (1 + φ_1 B + φ_1² B² + φ_1³ B³ + · · · )(1 − θ_1 B)
            = 1 + φ_1 B + φ_1² B² + φ_1³ B³ + · · · − (θ_1 B + φ_1 θ_1 B² + φ_1² θ_1 B³ + · · · )
            = 1 + (φ_1 − θ_1) B + (φ_1² − φ_1 θ_1) B² + (φ_1³ − φ_1² θ_1) B³ + · · ·
            = 1 + (φ_1 − θ_1) B + (φ_1 − θ_1) φ_1 B² + (φ_1 − θ_1) φ_1² B³ + · · · ,
so that
θ(B) / φ(B) = Σ_{j=0}^{∞} ψ_j B^j,   (134)
with ψ_0 = 1 and ψ_j = (φ_1 − θ_1) φ_1^{j−1} for j ≥ 1.
and
γ_1 = σ_a² Σ_{j=0}^{∞} ψ_j ψ_{j+1}
    = σ_a² [1·(φ_1 − θ_1) + (φ_1 − θ_1)(φ_1 − θ_1)φ_1 + (φ_1 − θ_1)φ_1 (φ_1 − θ_1)φ_1² + (φ_1 − θ_1)φ_1² (φ_1 − θ_1)φ_1³ + · · · ]
    = σ_a² [(φ_1 − θ_1) + (φ_1 − θ_1)² φ_1 (1 + φ_1² + φ_1⁴ + · · · )]
    = σ_a² [(φ_1 − θ_1) + (φ_1 − θ_1)² φ_1 Σ_{j=0}^{∞} φ_1^{2j}]
    = σ_a² [(φ_1 − θ_1) + (φ_1 − θ_1)² φ_1 / (1 − φ_1²)].   (137)
Similar derivations for k ≥ 2 give
γ_k = φ_1 γ_{k−1} = φ_1^{k−1} γ_1, k ≥ 2.   (138)
Dividing γ_1 by γ_0, we obtain
ρ_1 = γ_1 / γ_0 = (φ_1 − θ_1)(1 − φ_1 θ_1) / (1 + θ_1² − 2φ_1 θ_1),   (139)
and for k ≥ 2 we have
ρ_k = φ_1 ρ_{k−1} = φ_1^{k−1} ρ_1.   (140)
6.9 Summary of Properties of AR, MA, and ARMA Processes
***END***